Showing 10 of 10 results
quanteda
quanteda:Quantitative Analysis of Textual Data
A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
Maintained by Kenneth Benoit. Last updated 3 months ago.
corpus, natural-language-processing, quanteda, text-analytics, onetbb, cpp
851 stars 16.65 score 5.4k scripts 52 dependents
data-cleaning
validate:Data Validation Infrastructure
Declare data validation rules and data quality indicators; confront data with them and analyze or visualize the results. The package supports rules that are per-field, in-record, cross-record or cross-dataset. Rules can be automatically analyzed for rule type and connectivity. Supports checks implied by an SDMX DSD file as well. See also Van der Loo and De Jonge (2018) <doi:10.1002/9781118897126>, Chapter 6 and the JSS paper (2021) <doi:10.18637/jss.v097.i10>.
Maintained by Mark van der Loo. Last updated 25 days ago.
419 stars 12.39 score 448 scripts 8 dependents
bioc
VariantAnnotation:Annotation of Genetic Variants
Annotate variants, compute amino acid coding changes, predict coding outcomes.
Maintained by Bioconductor Package Maintainer. Last updated 3 months ago.
dataimport, sequencing, snp, annotation, genetics, variantannotation, curl, bzip2, xz-utils, zlib
11.39 score 1.9k scripts 152 dependents
kurthornik
NLP:Natural Language Processing Infrastructure
Basic classes and methods for Natural Language Processing.
Maintained by Kurt Hornik. Last updated 4 months ago.
6 stars 9.42 score 1.0k scripts 127 dependents
ropensci
textreuse:Detect Text Reuse and Document Similarity
Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
Maintained by Yaoxiang Li. Last updated 1 month ago.
200 stars 9.28 score 226 scripts
bioc
cmapR:CMap Tools in R
The Connectivity Map (CMap) is a massive resource of perturbational gene expression profiles built by researchers at the Broad Institute and funded by the NIH Library of Integrated Network-Based Cellular Signatures (LINCS) program. Please visit https://clue.io for more information. The cmapR package implements methods to parse, manipulate, and write common CMap data objects, such as annotated matrices and collections of gene sets.
Maintained by Ted Natoli. Last updated 5 months ago.
dataimport, datarepresentation, geneexpression, bioconductor, bioinformatics, cmap
90 stars 8.86 score 298 scripts
meireles
spectrolab:Class and Methods for Spectral Data
Input/Output, processing and visualization of spectra taken with different spectrometers, including SVC (Spectra Vista), ASD and PSR (Spectral Evolution). Implements an S3 class spectra that other packages can build on. Provides methods to access, plot, manipulate, splice sensor overlap, vector normalize and smooth spectra.
Maintained by Jose Eduardo Meireles. Last updated 3 months ago.
16 stars 7.39 score 256 scripts
bioc
SIAMCAT:Statistical Inference of Associations between Microbial Communities And host phenoTypes
Pipeline for Statistical Inference of Associations between Microbial Communities And host phenoTypes (SIAMCAT). A primary goal of analyzing microbiome data is to determine changes in community composition that are associated with environmental factors. In particular, linking human microbiome composition to host phenotypes such as diseases has become an area of intense research. For this, robust statistical modeling and biomarker extraction toolkits are crucially needed. SIAMCAT provides a full pipeline supporting data preprocessing, statistical association testing, statistical modeling (LASSO logistic regression) including tools for evaluation and interpretation of these models (such as cross validation, parameter selection, ROC analysis and diagnostic model plots).
Maintained by Jakob Wirbel. Last updated 5 months ago.
immunooncology, metagenomics, classification, microbiome, sequencing, preprocessing, clustering, featureextraction, geneticvariability, multiplecomparison, regression
6.72 score 147 scripts
statistikat
tatoo:Combine and Export Data Frames
Functions to combine data.frames in ways that require additional effort in base R, and to add metadata (id, title, ...) that can be used for printing and xlsx export. The 'Tatoo_report' class is provided as a convenient helper to write several such tables to a workbook, one table per worksheet. Tatoo is built on top of 'openxlsx', but intimate knowledge of that package is not required to use tatoo.
Maintained by Stefan Fleck. Last updated 2 years ago.
7 stars 5.53 score 24 scripts
bioc
Rcwl:An R interface to the Common Workflow Language
The Common Workflow Language (CWL) is an open standard for development of data analysis workflows that is portable and scalable across different tools and working environments. Rcwl provides a simple way to wrap command line tools and build CWL data analysis pipelines programmatically within R. It increases the ease of usage, development, and maintenance of CWL pipelines.
Maintained by Qiang Hu. Last updated 5 months ago.
software, workflowstep, immunooncology
5.52 score 37 scripts 2 dependents