Showing 200 of total 5586 results (show query)
tidymodels
broom:Convert Statistical Objects into Tidy Tibbles
Summarizes key information about statistical objects in tidy tibbles. This makes it easy to report results, create plots and consistently work with large numbers of models at once. Broom provides three verbs that each provide different types of information about a model. tidy() summarizes information about model components such as coefficients of a regression. glance() reports information about an entire model, such as goodness of fit measures like AIC and BIC. augment() adds information about individual observations to a dataset, such as fitted values or influence measures.
Maintained by Simon Couch. Last updated 4 days ago.
1.5k stars 21.58 score 37k scripts 1.5k dependentstidyverse
tidyverse:Easily Install and Load the 'Tidyverse'
The 'tidyverse' is a set of packages that work in harmony because they share common data representations and 'API' design. This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step. Learn more about the 'tidyverse' at <https://www.tidyverse.org>.
Maintained by Hadley Wickham. Last updated 5 months ago.
1.7k stars 20.23 score 664k scripts 125 dependentstidyverse
dbplyr:A 'dplyr' Back End for Databases
A 'dplyr' back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features works with any database that has a 'DBI' back end; more advanced features require 'SQL' translation to be provided by the package author.
Maintained by Hadley Wickham. Last updated 4 months ago.
481 stars 19.72 score 5.2k scripts 736 dependentsplotly
plotly:Create Interactive Web Graphics via 'plotly.js'
Create interactive web graphics from 'ggplot2' graphs and/or a custom interface to the (MIT-licensed) JavaScript library 'plotly.js' inspired by the grammar of graphics.
Maintained by Carson Sievert. Last updated 4 months ago.
d3jsdata-visualizationggplot2javascriptplotlyshinywebgl
2.6k stars 19.43 score 93k scripts 797 dependentssfirke
janitor:Simple Tools for Examining and Cleaning Dirty Data
The main janitor functions can: perfectly format data.frame column names; provide quick counts of variable combinations (i.e., frequency tables and crosstabs); and explore duplicate records. Other janitor functions nicely format the tabulation results. These tabulate-and-report functions approximate popular features of SPSS and Microsoft Excel. This package follows the principles of the "tidyverse" and works well with the pipe function %>%. janitor was built with beginning-to-intermediate R users in mind and is optimized for user-friendliness.
Maintained by Sam Firke. Last updated 3 months ago.
data-analysisdata-cleaningdata-sciencedirty-dataexcelpivot-tablesspsstabulationstidyverse
1.4k stars 19.40 score 35k scripts 231 dependentstopepo
caret:Classification and Regression Training
Misc functions for training and plotting classification and regression models.
Maintained by Max Kuhn. Last updated 4 months ago.
1.6k stars 19.24 score 61k scripts 303 dependentstidymodels
recipes:Preprocessing and Feature Engineering Steps for Modeling
A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps provides preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.
Maintained by Max Kuhn. Last updated 3 days ago.
586 stars 18.80 score 7.2k scripts 383 dependentsbioc
clusterProfiler:A universal enrichment tool for interpreting omics data
This package supports functional characteristics of both coding and non-coding genomics data for thousands of species with up-to-date gene annotation. It provides a univeral interface for gene functional annotation from a variety of sources and thus can be applied in diverse scenarios. It provides a tidy interface to access, manipulate, and visualize enrichment results to help users achieve efficient data interpretation. Datasets obtained from multiple treatments and time points can be analyzed and compared in a single run, easily revealing functional consensus and differences among distinct conditions.
Maintained by Guangchuang Yu. Last updated 4 months ago.
annotationclusteringgenesetenrichmentgokeggmultiplecomparisonpathwaysreactomevisualizationenrichment-analysisgsea
1.1k stars 17.03 score 11k scripts 48 dependentsddsjoberg
gtsummary:Presentation-Ready Data Summary and Analytic Result Tables
Creates presentation-ready tables summarizing data sets, regression models, and more. The code to create the tables is concise and highly customizable. Data frames can be summarized with any function, e.g. mean(), median(), even user-written functions. Regression models are summarized and include the reference rows for categorical variables. Common regression models, such as logistic regression and Cox proportional hazards regression, are automatically identified and the tables are pre-filled with appropriate column headers.
Maintained by Daniel D. Sjoberg. Last updated 7 days ago.
easy-to-usegthtml5regression-modelsreproducibilityreproducible-researchstatisticssummary-statisticssummary-tablestable1tableone
1.1k stars 17.02 score 8.2k scripts 15 dependentsthomasp85
ggraph:An Implementation of Grammar of Graphics for Graphs and Networks
The grammar of graphics as implemented in ggplot2 is a poor fit for graph and network visualizations due to its reliance on tabular data input. ggraph is an extension of the ggplot2 API tailored to graph visualizations and provides the same flexible approach to building up plots layer by layer.
Maintained by Thomas Lin Pedersen. Last updated 1 years ago.
ggplot-extensionggplot2graph-visualizationnetwork-visualizationvisualizationcpp
1.1k stars 16.96 score 9.2k scripts 111 dependentssatijalab
Seurat:Tools for Single Cell Genomics
A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031>, and Hao, Hao, et al (2020) <doi:10.1101/2020.10.12.335331> for more details.
Maintained by Paul Hoffman. Last updated 1 years ago.
human-cell-atlassingle-cell-genomicssingle-cell-rna-seqcpp
2.4k stars 16.86 score 50k scripts 73 dependentsbioc
ggtree:an R package for visualization of tree and annotation data
'ggtree' extends the 'ggplot2' plotting system which implemented the grammar of graphics. 'ggtree' is designed for visualization and annotation of phylogenetic trees and other tree-like structures with their annotation data.
Maintained by Guangchuang Yu. Last updated 5 months ago.
alignmentannotationclusteringdataimportmultiplesequencealignmentphylogeneticsreproducibleresearchsoftwarevisualizationannotationsggplot2phylogenetic-trees
871 stars 16.83 score 5.1k scripts 109 dependentsropensci
skimr:Compact and Flexible Summaries of Data
A simple to use summary function that can be used with pipes and displays nicely in the console. The default summary statistics may be modified by the user as can the default formatting. Support for data frames and vectors is included, and users can implement their own skim methods for specific object types as described in a vignette. Default summaries include support for inline spark graphs. Instructions for managing these on specific operating systems are given in the "Using skimr" vignette and the README.
Maintained by Elin Waring. Last updated 2 months ago.
peer-reviewedropenscisummary-statisticsunconfunconf17
1.1k stars 16.80 score 18k scripts 14 dependentstidymodels
rsample:General Resampling Infrastructure
Classes and functions to create and summarize different types of resampling objects (e.g. bootstrap, cross-validation).
Maintained by Hannah Frick. Last updated 20 days ago.
341 stars 16.72 score 5.2k scripts 79 dependentskassambara
ggpubr:'ggplot2' Based Publication Ready Plots
The 'ggplot2' package is excellent and flexible for elegant data visualization in R. However the default generated plots requires some formatting before we can send them for publication. Furthermore, to customize a 'ggplot', the syntax is opaque and this raises the level of difficulty for researchers with no advanced R programming skills. 'ggpubr' provides some easy-to-use functions for creating and customizing 'ggplot2'- based publication ready plots.
Maintained by Alboukadel Kassambara. Last updated 2 years ago.
1.2k stars 16.68 score 65k scripts 409 dependentsamices
mice:Multivariate Imputation by Chained Equations
Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
Maintained by Stef van Buuren. Last updated 4 days ago.
chained-equationsfcsimputationmicemissing-datamissing-valuesmultiple-imputationmultivariate-datacpp
462 stars 16.64 score 10k scripts 154 dependentstidymodels
tidymodels:Easily Install and Load the 'Tidymodels' Packages
The tidy modeling "verse" is a collection of packages for modeling and statistical analysis that share the underlying design philosophy, grammar, and data structures of the tidyverse.
Maintained by Max Kuhn. Last updated 1 months ago.
783 stars 16.52 score 66k scripts 15 dependentstidyverse
modelr:Modelling Functions that Work with the Pipe
Functions for modelling that help you seamlessly integrate modelling into a pipeline of data manipulation and visualisation.
Maintained by Hadley Wickham. Last updated 1 years ago.
400 stars 16.46 score 6.9k scripts 1.1k dependentstidymodels
parsnip:A Common API to Modeling and Analysis Functions
A common interface is provided to allow users to specify a model without having to remember the different argument names across different functions or computational engines (e.g. 'R', 'Spark', 'Stan', 'H2O', etc).
Maintained by Max Kuhn. Last updated 19 days ago.
612 stars 16.37 score 3.4k scripts 69 dependentsggobi
GGally:Extension to 'ggplot2'
The R package 'ggplot2' is a plotting system based on the grammar of graphics. 'GGally' extends 'ggplot2' by adding several functions to reduce the complexity of combining geometric objects with transformed data. Some of these functions include a pairwise plot matrix, a two group pairwise plot matrix, a parallel coordinates plot, a survival plot, and several functions to plot networks.
Maintained by Barret Schloerke. Last updated 11 months ago.
597 stars 16.15 score 17k scripts 154 dependentsbioc
biomaRt:Interface to BioMart databases (i.e. Ensembl)
In recent years a wealth of biological data has become available in public data repositories. Easy access to these valuable data resources and firm integration with data analysis is needed for comprehensive bioinformatics data analysis. biomaRt provides an interface to a growing collection of databases implementing the BioMart software suite (<http://www.biomart.org>). The package enables retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas or write complex SQL queries. The most prominent examples of BioMart databases are maintain by Ensembl, which provides biomaRt users direct access to a diverse set of data and enables a wide range of powerful online queries from gene annotation to database mining.
Maintained by Mike Smith. Last updated 18 days ago.
annotationbioconductorbiomartensembl
38 stars 15.99 score 13k scripts 230 dependentskassambara
survminer:Drawing Survival Curves using 'ggplot2'
Contains the function 'ggsurvplot()' for drawing easily beautiful and 'ready-to-publish' survival curves with the 'number at risk' table and 'censoring count plot'. Other functions are also available to plot adjusted curves for `Cox` model and to visually examine 'Cox' model assumptions.
Maintained by Alboukadel Kassambara. Last updated 5 months ago.
524 stars 15.87 score 7.0k scripts 55 dependentstidymodels
infer:Tidy Statistical Inference
The objective of this package is to perform inference using an expressive statistical grammar that coheres with the tidy design framework.
Maintained by Simon Couch. Last updated 6 months ago.
736 stars 15.75 score 3.5k scripts 18 dependentsbioc
enrichplot:Visualization of Functional Enrichment Result
The 'enrichplot' package implements several visualization methods for interpreting functional enrichment results obtained from ORA or GSEA analysis. It is mainly designed to work with the 'clusterProfiler' package suite. All the visualization methods are developed based on 'ggplot2' graphics.
Maintained by Guangchuang Yu. Last updated 3 months ago.
annotationgenesetenrichmentgokeggpathwayssoftwarevisualizationenrichment-analysispathway-analysis
239 stars 15.71 score 3.1k scripts 58 dependentsnjtierney
naniar:Data Structures, Summaries, and Visualisations for Missing Data
Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. 'naniar' provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of 'ggplot2' and tidy data. The work is fully discussed at Tierney & Cook (2023) <doi:10.18637/jss.v105.i07>.
Maintained by Nicholas Tierney. Last updated 19 days ago.
data-visualisationggplot2missing-datamissingnesstidy-data
657 stars 15.63 score 5.1k scripts 9 dependentsprophet:Automatic Forecasting Procedure
Implements a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
Maintained by Sean Taylor. Last updated 5 months ago.
19k stars 15.59 score 976 scripts 13 dependentsr-forge
car:Companion to Applied Regression
Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, 2019.
Maintained by John Fox. Last updated 5 months ago.
15.38 score 43k scripts 919 dependentsrich-iannone
DiagrammeR:Graph/Network Visualization
Build graph/network structures using functions for stepwise addition and deletion of nodes and edges. Work with data available in tables for bulk addition of nodes, edges, and associated metadata. Use graph selections and traversals to apply changes to specific nodes or edges. A wide selection of graph algorithms allow for the analysis of graphs. Visualize the graphs and take advantage of any aesthetic properties assigned to nodes and edges.
Maintained by Richard Iannone. Last updated 2 months ago.
graphgraph-functionsnetwork-graphproperty-graphvisualization
1.7k stars 15.29 score 3.8k scripts 86 dependentskassambara
rstatix:Pipe-Friendly Framework for Basic Statistical Tests
Provides a simple and intuitive pipe-friendly framework, coherent with the 'tidyverse' design philosophy, for performing basic statistical tests, including t-test, Wilcoxon test, ANOVA, Kruskal-Wallis and correlation analyses. The output of each test is automatically transformed into a tidy data frame to facilitate visualization. Additional functions are available for reshaping, reordering, manipulating and visualizing correlation matrix. Functions are also included to facilitate the analysis of factorial experiments, including purely 'within-Ss' designs (repeated measures), purely 'between-Ss' designs, and mixed 'within-and-between-Ss' designs. It's also possible to compute several effect size metrics, including "eta squared" for ANOVA, "Cohen's d" for t-test and 'Cramer V' for the association between categorical variables. The package contains helper functions for identifying univariate and multivariate outliers, assessing normality and homogeneity of variances.
Maintained by Alboukadel Kassambara. Last updated 2 years ago.
458 stars 15.27 score 11k scripts 432 dependentsbbolker
broom.mixed:Tidying Methods for Mixed Models
Convert fitted objects from various R mixed-model packages into tidy data frames along the lines of the 'broom' package. The package provides three S3 generics for each model: tidy(), which summarizes a model's statistical findings such as coefficients of a regression; augment(), which adds columns to the original data such as predictions, residuals and cluster assignments; and glance(), which provides a one-row summary of model-level statistics.
Maintained by Ben Bolker. Last updated 8 days ago.
230 stars 15.22 score 4.0k scripts 37 dependentssparklyr
sparklyr:R Interface to Apache Spark
R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.
Maintained by Edgar Ruiz. Last updated 14 days ago.
apache-sparkdistributeddplyridelivymachine-learningremote-clusterssparksparklyr
959 stars 15.20 score 4.0k scripts 21 dependentslarmarange
labelled:Manipulating Labelled Data
Work with labelled data imported from 'SPSS' or 'Stata' with 'haven' or 'foreign'. This package provides useful functions to deal with "haven_labelled" and "haven_labelled_spss" classes introduced by 'haven' package.
Maintained by Joseph Larmarange. Last updated 1 months ago.
havenlabelsmetadatasasspssstata
76 stars 15.04 score 2.4k scripts 98 dependentshojsgaard
doBy:Groupwise Statistics, LSmeans, Linear Estimates, Utilities
Utility package containing: 1) Facilities for working with grouped data: 'do' something to data stratified 'by' some variables. 2) LSmeans (least-squares means), general linear estimates. 3) Restrict functions to a smaller domain. 4) Miscellaneous other utilities.
Maintained by Søren Højsgaard. Last updated 2 days ago.
1 stars 14.99 score 3.2k scripts 948 dependentsbioc
MultiAssayExperiment:Software for the integration of multi-omics experiments in Bioconductor
Harmonize data management of multiple experimental assays performed on an overlapping set of specimens. It provides a familiar Bioconductor user experience by extending concepts from SummarizedExperiment, supporting an open-ended mix of standard data classes for individual assays, and allowing subsetting by genomic ranges or rownames. Facilities are provided for reshaping data into wide and long formats for adaptability to graphing and downstream analysis.
Maintained by Marcel Ramos. Last updated 2 months ago.
infrastructuredatarepresentationbioconductorbioconductor-packagegenomicsnci-itcrtcgau24ca289073
71 stars 14.94 score 670 scripts 126 dependentscynkra
dm:Relational Data Models
Provides tools for working with multiple related tables, stored as data frames or in a relational database. Multiple tables (data and metadata) are stored in a compound object, which can then be manipulated with a pipe-friendly syntax.
Maintained by Kirill Müller. Last updated 3 months ago.
data-modeldata-warehousingdatawarehousingdbidbplyrrelational-databases
511 stars 14.81 score 410 scripts 8 dependentsbioc
GSVA:Gene Set Variation Analysis for Microarray and RNA-Seq Data
Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment through the samples of a expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNV-pathway analysis or cross-tissue pathway analysis, in a pathway-centric manner.
Maintained by Robert Castelo. Last updated 11 days ago.
functionalgenomicsmicroarrayrnaseqpathwaysgenesetenrichmentgene-set-enrichmentgenomicspathway-enrichment-analysis
212 stars 14.74 score 1.6k scripts 19 dependentsflorianhartig
DHARMa:Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models
The 'DHARMa' package uses a simulation-based approach to create readily interpretable scaled (quantile) residuals for fitted (generalized) linear mixed models. Currently supported are linear and generalized linear (mixed) models from 'lme4' (classes 'lmerMod', 'glmerMod'), 'glmmTMB', 'GLMMadaptive', and 'spaMM'; phylogenetic linear models from 'phylolm' (classes 'phylolm' and 'phyloglm'); generalized additive models ('gam' from 'mgcv'); 'glm' (including 'negbin' from 'MASS', but excluding quasi-distributions) and 'lm' model classes. Moreover, externally created simulations, e.g. posterior predictive simulations from Bayesian software such as 'JAGS', 'STAN', or 'BUGS' can be processed as well. The resulting residuals are standardized to values between 0 and 1 and can be interpreted as intuitively as residuals from a linear regression. The package also provides a number of plot and test functions for typical model misspecification problems, such as over/underdispersion, zero-inflation, and residual spatial, phylogenetic and temporal autocorrelation.
Maintained by Florian Hartig. Last updated 28 days ago.
glmmregressionregression-diagnosticsresidual
226 stars 14.74 score 2.8k scripts 10 dependentsthomasp85
tidygraph:A Tidy API for Graph Manipulation
A graph, while not "tidy" in itself, can be thought of as two tidy data frames describing node and edge data respectively. 'tidygraph' provides an approach to manipulate these two virtual data frames using the API defined in the 'dplyr' package, as well as provides tidy interfaces to a lot of common graph algorithms.
Maintained by Thomas Lin Pedersen. Last updated 2 months ago.
graph-algorithmsgraph-manipulationigraphnetwork-analysistidyversecpp
553 stars 14.74 score 4.6k scripts 136 dependentsmjskay
tidybayes:Tidy Data and 'Geoms' for Bayesian Models
Compose data for and extract, manipulate, and visualize posterior draws from Bayesian models ('JAGS', 'Stan', 'rstanarm', 'brms', 'MCMCglmm', 'coda', ...) in a tidy data format. Functions are provided to help extract tidy data frames of draws from Bayesian models and that generate point summaries and intervals in a tidy format. In addition, 'ggplot2' 'geoms' and 'stats' are provided for common visualization primitives like points with multiple uncertainty intervals, eye plots (intervals plus densities), and fit curves with multiple, arbitrary uncertainty bands.
Maintained by Matthew Kay. Last updated 7 months ago.
bayesian-data-analysisbrmsggplot2jagsstantidy-datavisualization
733 stars 14.72 score 7.3k scripts 20 dependentshusson
FactoMineR:Multivariate Exploratory Data Analysis and Data Mining
Exploratory data analysis methods to summarize, visualize and describe datasets. The main principal component methods are available, those with the largest potential in terms of applications: principal component analysis (PCA) when variables are quantitative, correspondence analysis (CA) and multiple correspondence analysis (MCA) when variables are categorical, Multiple Factor Analysis when variables are structured in groups, etc. and hierarchical cluster analysis. F. Husson, S. Le and J. Pages (2017).
Maintained by Francois Husson. Last updated 4 months ago.
47 stars 14.71 score 5.6k scripts 112 dependentsdcomtois
summarytools:Tools to Quickly and Neatly Summarize Data
Data frame summaries, cross-tabulations, weight-enabled frequency tables and common descriptive (univariate) statistics in concise tables available in a variety of formats (plain ASCII, Markdown and HTML). A good point-of-entry for exploring data, both for experienced and new R users.
Maintained by Dominic Comtois. Last updated 6 days ago.
descriptive-statisticsfrequency-tablehtml-reportmarkdownpanderpandocpandoc-markdownrmarkdownrstudio
527 stars 14.62 score 2.9k scripts 6 dependentssinhrks
ggfortify:Data Visualization Tools for Statistical Analysis Results
Unified plotting tools for statistics commonly used, such as GLM, time series, PCA families, clustering and survival analysis. The package offers a single plotting interface for these analysis results and plots in a unified style using 'ggplot2'.
Maintained by Yuan Tang. Last updated 9 months ago.
528 stars 14.60 score 9.1k scripts 24 dependentshojsgaard
pbkrtest:Parametric Bootstrap, Kenward-Roger and Satterthwaite Based Methods for Test in Mixed Models
Computes p-values based on (a) Satterthwaite or Kenward-Rogers degree of freedom methods and (b) parametric bootstrap for mixed effects models as implemented in the 'lme4' package. Implements parametric bootstrap test for generalized linear mixed models as implemented in 'lme4' and generalized linear models. The package is documented in the paper by Halekoh and Højsgaard, (2012, <doi:10.18637/jss.v059.i09>). Please see 'citation("pbkrtest")' for citation details.
Maintained by Søren Højsgaard. Last updated 1 days ago.
6 stars 14.53 score 648 scripts 929 dependentsjacob-long
jtools:Analysis and Presentation of Social Scientific Data
This is a collection of tools for more efficiently understanding and sharing the results of (primarily) regression analyses. There are also a number of miscellaneous functions for statistical and programming purposes. Support for models produced by the survey and lme4 packages are points of emphasis.
Maintained by Jacob A. Long. Last updated 7 months ago.
167 stars 14.48 score 4.0k scripts 14 dependentsbioc
TCGAbiolinks:TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data
The aim of TCGAbiolinks is : i) facilitate the GDC open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses and iv) to easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.
Maintained by Tiago Chedraoui Silva. Last updated 1 months ago.
dnamethylationdifferentialmethylationgeneregulationgeneexpressionmethylationarraydifferentialexpressionpathwaysnetworksequencingsurvivalsoftwarebiocbioconductorgdcintegrative-analysistcgatcga-datatcgabiolinks
310 stars 14.47 score 1.6k scripts 6 dependentsindrajeetpatil
ggstatsplot:'ggplot2' Based Plots with Statistical Details
Extension of 'ggplot2', 'ggstatsplot' creates graphics with details from statistical tests included in the plots themselves. It provides an easier syntax to generate information-rich plots for statistical analysis of continuous (violin plots, scatterplots, histograms, dot plots, dot-and-whisker plots) or categorical (pie and bar charts) data. Currently, it supports the most common types of statistical approaches and tests: parametric, nonparametric, robust, and Bayesian versions of t-test/ANOVA, correlation analyses, contingency table analysis, meta-analysis, and regression analyses. References: Patil (2021) <doi:10.21105/joss.03236>.
Maintained by Indrajeet Patil. Last updated 1 months ago.
bayes-factorsdatasciencedatavizeffect-sizeggplot-extensionhypothesis-testingnon-parametric-statisticsregression-modelsstatistical-analysis
2.1k stars 14.46 score 3.0k scripts 1 dependentsstatistikat
VIM:Visualization and Imputation of Missing Values
New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on this structure of the missing values, the corresponding methods may help to identify the mechanism generating the missing values and allows to explore the data including missing values. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface available in the separate package VIMGUI allows an easy handling of the implemented plot methods.
Maintained by Matthias Templ. Last updated 8 months ago.
hotdeckimputation-methodsmodel-predictionsvisualizationcpp
85 stars 14.44 score 2.6k scripts 19 dependentssingmann
afex:Analysis of Factorial Experiments
Convenience functions for analyzing factorial experiments using ANOVA or mixed models. aov_ez(), aov_car(), and aov_4() allow specification of between, within (i.e., repeated-measures), or mixed (i.e., split-plot) ANOVAs for data in long format (i.e., one observation per row), automatically aggregating multiple observations per individual and cell of the design. mixed() fits mixed models using lme4::lmer() and computes p-values for all fixed effects using either Kenward-Roger or Satterthwaite approximation for degrees of freedom (LMM only), parametric bootstrap (LMMs and GLMMs), or likelihood ratio tests (LMMs and GLMMs). afex_plot() provides a high-level interface for interaction or one-way plots using ggplot2, combining raw data and model estimates. afex uses type 3 sums of squares as default (imitating commercial statistical software).
Maintained by Henrik Singmann. Last updated 7 months ago.
124 stars 14.43 score 1.4k scripts 15 dependentsbioc
xcms:LC-MS and GC-MS Data Analysis
Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.
Maintained by Steffen Neumann. Last updated 18 days ago.
immunooncologymassspectrometrymetabolomicsbioconductorfeature-detectionmass-spectrometrypeak-detectioncpp
196 stars 14.31 score 984 scripts 11 dependentstidymodels
tune:Tidy Tuning Tools
The ability to tune models is important. 'tune' contains functions and classes to be used in conjunction with other 'tidymodels' packages for finding reasonable values of hyper-parameters in models, pre-processing methods, and post-processing steps.
Maintained by Max Kuhn. Last updated 27 days ago.
293 stars 14.27 score 756 scripts 39 dependentstalgalili
heatmaply:Interactive Cluster Heat Maps Using 'plotly' and 'ggplot2'
Create interactive cluster 'heatmaps' that can be saved as a stand- alone HTML file, embedded in 'R Markdown' documents or in a 'Shiny' app, and available in the 'RStudio' viewer pane. Hover the mouse pointer over a cell to show details or drag a rectangle to zoom. A 'heatmap' is a popular graphical method for visualizing high-dimensional data, in which a table of numbers are encoded as a grid of colored cells. The rows and columns of the matrix are ordered to highlight patterns and are often accompanied by 'dendrograms'. 'Heatmaps' are used in many fields for visualizing observations, correlations, missing values patterns, and more. Interactive 'heatmaps' allow the inspection of specific value by hovering the mouse over a cell, as well as zooming into a region of the 'heatmap' by dragging a rectangle around the relevant area. This work is based on the 'ggplot2' and 'plotly.js' engine. It produces similar 'heatmaps' to 'heatmap.2' with the advantage of speed ('plotly.js' is able to handle larger size matrix), the ability to zoom from the 'dendrogram' panes, and the placing of factor variables in the sides of the 'heatmap'.
Maintained by Tal Galili. Last updated 9 months ago.
d3-heatmapdendextenddendrogramggplot2heatmapplotly
386 stars 14.21 score 2.0k scripts 45 dependentsbusiness-science
timetk:A Tool Kit for Working with Time Series
Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.
Maintained by Matt Dancho. Last updated 1 years ago.
coercioncoercion-functionsdata-miningdplyrforecastforecastingforecasting-modelsmachine-learningseries-decompositionseries-signaturetibbletidytidyquanttidyversetimetime-seriestimeseries
626 stars 14.20 score 4.0k scripts 16 dependentsdkahle
ggmap:Spatial Visualization with ggplot2
A collection of functions to visualize spatial data and models on top of static maps from various online sources (e.g Google Maps and Stamen Maps). It includes tools common to those tasks, including functions for geolocation and routing.
Maintained by David Kahle. Last updated 1 years ago.
770 stars 14.17 score 12k scripts 31 dependentscorybrunson
ggalluvial:Alluvial Plots in 'ggplot2'
Alluvial plots use variable-width ribbons and stacked bar plots to represent multi-dimensional or repeated-measures data with categorical or ordinal variables; see Riehmann, Hanfler, and Froehlich (2005) <doi:10.1109/INFVIS.2005.1532152> and Rosvall and Bergstrom (2010) <doi:10.1371/journal.pone.0008694>. Alluvial plots are statistical graphics in the sense of Wilkinson (2006) <doi:10.1007/0-387-28695-0>; they share elements with Sankey diagrams and parallel sets plots but are uniquely determined from the data and a small set of parameters. This package extends Wickham's (2010) <doi:10.1198/jcgs.2009.07098> layered grammar of graphics to generate alluvial plots from tidy data.
Maintained by Jason Cory Brunson. Last updated 8 months ago.
alluvial-diagramsalluvial-plotscategorical-data-visualizationggplot2repeated-measures-data
507 stars 14.14 score 3.0k scripts 21 dependentskassambara
factoextra:Extract and Visualize the Results of Multivariate Data Analyses
Provides some easy-to-use functions to extract and visualize the output of multivariate data analyses, including 'PCA' (Principal Component Analysis), 'CA' (Correspondence Analysis), 'MCA' (Multiple Correspondence Analysis), 'FAMD' (Factor Analysis of Mixed Data), 'MFA' (Multiple Factor Analysis) and 'HMFA' (Hierarchical Multiple Factor Analysis) functions from different R packages. It contains also functions for simplifying some clustering analysis steps and provides 'ggplot2' - based elegant data visualization.
Maintained by Alboukadel Kassambara. Last updated 5 years ago.
363 stars 14.13 score 15k scripts 52 dependentswalkerke
tidycensus:Load US Census Boundary and Attribute Data as 'tidyverse' and 'sf'-Ready Data Frames
An integrated R interface to several United States Census Bureau APIs (<https://www.census.gov/data/developers/data-sets.html>) and the US Census Bureau's geographic boundary files. Allows R users to return Census and ACS data as tidyverse-ready data frames, and optionally returns a list-column with feature geometry for mapping and spatial analysis.
Maintained by Kyle Walker. Last updated 2 months ago.
648 stars 14.02 score 7.5k scripts 10 dependentspharmaverse
admiral:ADaM in R Asset Library
A toolbox for programming Clinical Data Interchange Standards Consortium (CDISC) compliant Analysis Data Model (ADaM) datasets in R. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide" (CDISC Analysis Data Model Team, 2021, <https://www.cdisc.org/standards/foundational/adam>).
Maintained by Ben Straub. Last updated 7 days ago.
cdiscclinical-trialsopen-source
239 stars 13.97 score 486 scripts 4 dependentstidymodels
workflows:Modeling Workflows
Managing both a 'parsnip' model and a preprocessor, such as a model formula or recipe from 'recipes', can often be challenging. The goal of 'workflows' is to streamline this process by bundling the model alongside the preprocessor, all within the same object.
Maintained by Simon Couch. Last updated 1 months ago.
207 stars 13.97 score 876 scripts 43 dependentsjbkunst
highcharter:A Wrapper for the 'Highcharts' Library
A wrapper for the 'Highcharts' library including shortcut functions to plot R objects. 'Highcharts' <https://www.highcharts.com/> is a charting library offering numerous chart types with a simple configuration syntax.
Maintained by Joshua Kunst. Last updated 1 years ago.
highchartshtmlwidgetsshinyshiny-rvisualizationwrapper
725 stars 13.93 score 4.9k scripts 18 dependentsbioc
AnnotationHub:Client to access AnnotationHub resources
This package provides a client for the Bioconductor AnnotationHub web resource. The AnnotationHub web resource provides a central location where genomic files (e.g., VCF, bed, wig) and other resources from standard locations (e.g., UCSC, Ensembl) can be discovered. The resource includes metadata about each resource, e.g., a textual description, tags, and date of modification. The client creates and manages a local cache of files retrieved by the user, helping with quick and reproducible access.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
infrastructuredataimportguithirdpartyclientcore-packageu24ca289073
17 stars 13.88 score 2.7k scripts 104 dependentsgergness
srvyr:'dplyr'-Like Syntax for Summary Statistics of Survey Data
Use piping, verbs like 'group_by' and 'summarize', and other 'dplyr' inspired syntactic style when calculating summary statistics on survey data using functions from the 'survey' package.
Maintained by Greg Freedman Ellis. Last updated 2 months ago.
215 stars 13.88 score 1.8k scripts 15 dependentsbiomodhub
biomod2:Ensemble Platform for Species Distribution Modeling
Functions for species distribution modeling, calibration and evaluation, ensemble of models, ensemble forecasting and visualization. The package permits to run consistently up to 10 single models on a presence/absences (resp presences/pseudo-absences) dataset and to combine them in ensemble models and ensemble projections. Some bench of other evaluation and visualisation tools are also available within the package.
Maintained by Maya Guéguen. Last updated 13 hours ago.
95 stars 13.85 score 536 scripts 7 dependentsbioc
BiocFileCache:Manage Files Across Sessions
This package creates a persistent on-disk cache of files that the user can add, update, and retrieve. It is useful for managing resources (such as custom Txdb objects) that are costly or difficult to create, web resources, and data files used across sessions.
Maintained by Lori Shepherd. Last updated 2 months ago.
dataimportcore-packageu24ca289073
13 stars 13.76 score 486 scripts 436 dependentsbioc
mixOmics:Omics Data Integration Project
Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non exhaustive list of methods include variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.
Maintained by Eva Hamrud. Last updated 4 days ago.
immunooncologymicroarraysequencingmetabolomicsmetagenomicsproteomicsgenepredictionmultiplecomparisonclassificationregressionbioconductorgenomicsgenomics-datagenomics-visualizationmultivariate-analysismultivariate-statisticsomicsr-pkgr-project
185 stars 13.75 score 1.3k scripts 22 dependentsyulab-smu
scatterpie:Scatter Pie Plot
Creates scatterpie plots, especially useful for plotting pies on a map.
Maintained by Guangchuang Yu. Last updated 3 months ago.
62 stars 13.60 score 820 scripts 68 dependentsdieghernan
tidyterra:'tidyverse' Methods and 'ggplot2' Helpers for 'terra' Objects
Extension of the 'tidyverse' for 'SpatRaster' and 'SpatVector' objects of the 'terra' package. It includes also new 'geom_' functions that provide a convenient way of visualizing 'terra' objects with 'ggplot2'.
Maintained by Diego Hernangómez. Last updated 5 hours ago.
terraggplot-extensionr-spatialrspatial
190 stars 13.59 score 1.9k scripts 26 dependentskaz-yos
tableone:Create 'Table 1' to Describe Baseline Characteristics with or without Propensity Score Weights
Creates 'Table 1', i.e., description of baseline patient characteristics, which is essential in every medical research. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. Weighted data are supported via the 'survey' package.
Maintained by Kazuki Yoshida. Last updated 3 years ago.
baseline-characteristicsdescriptive-statisticsstatistics
221 stars 13.55 score 2.3k scripts 12 dependentstidyverts
fable:Forecasting Models for Tidy Time Series
Provides a collection of commonly used univariate and multivariate time series forecasting models including automatically selected exponential smoothing (ETS) and autoregressive integrated moving average (ARIMA) models. These models work within the 'fable' framework provided by the 'fabletools' package, which provides the tools to evaluate, visualise, and combine models in a workflow consistent with the tidyverse.
Maintained by Mitchell OHara-Wild. Last updated 4 months ago.
569 stars 13.54 score 2.1k scripts 6 dependentsbioc
GEOquery:Get data from NCBI Gene Expression Omnibus (GEO)
The NCBI Gene Expression Omnibus (GEO) is a public repository of microarray data. Given the rich and varied nature of this resource, it is only natural to want to apply BioConductor tools to these data. GEOquery is the bridge between GEO and BioConductor.
Maintained by Sean Davis. Last updated 5 months ago.
microarraydataimportonechanneltwochannelsagebioconductorbioinformaticsdata-sciencegenomicsncbi-geo
93 stars 13.48 score 4.1k scripts 45 dependentsyulab-smu
tidytree:A Tidy Tool for Phylogenetic Tree Data Manipulation
Phylogenetic tree generally contains multiple components including node, edge, branch and associated data. 'tidytree' provides an approach to convert tree object to tidy data frame as well as provides tidy interfaces to manipulate tree data.
Maintained by Guangchuang Yu. Last updated 8 months ago.
phylogenetic-treetidyversetree-data
56 stars 13.36 score 584 scripts 128 dependentsbusiness-science
tidyquant:Tidy Quantitative Financial Analysis
Bringing business and financial analysis to the 'tidyverse'. The 'tidyquant' package provides a convenient wrapper to various 'xts', 'zoo', 'quantmod', 'TTR' and 'PerformanceAnalytics' package functions and returns the objects in the tidy 'tibble' format. The main advantage is being able to use quantitative functions with the 'tidyverse' functions including 'purrr', 'dplyr', 'tidyr', 'ggplot2', 'lubridate', etc. See the 'tidyquant' website for more information, documentation and examples.
Maintained by Matt Dancho. Last updated 2 months ago.
dplyrfinancial-analysisfinancial-datafinancial-statementsmultiple-stocksperformance-analysisperformanceanalyticsquantmodstockstock-exchangesstock-indexesstock-listsstock-performancestock-pricesstock-symboltidyversetime-seriestimeseriesxts
872 stars 13.34 score 5.2k scriptsprojectmosaic
mosaic:Project MOSAIC Statistics and Mathematics Teaching Utilities
Data sets and utilities from Project MOSAIC (<http://www.mosaic-web.org>) used to teach mathematics, statistics, computation and modeling. Funded by the NSF, Project MOSAIC is a community of educators working to tie together aspects of quantitative work that students in science, technology, engineering and mathematics will need in their professional lives, but which are usually taught in isolation, if at all.
Maintained by Randall Pruim. Last updated 1 years ago.
93 stars 13.32 score 7.2k scripts 7 dependentsropensci
visdat:Preliminary Visualisation of Data
Create preliminary exploratory data visualisations of an entire dataset to identify problems or unexpected features using 'ggplot2'.
Maintained by Nicholas Tierney. Last updated 9 months ago.
exploratory-data-analysismissingnesspeer-reviewedropenscivisualisation
452 stars 13.31 score 2.1k scripts 11 dependentschjackson
flexsurv:Flexible Parametric Survival and Multi-State Models
Flexible parametric models for time-to-event data, including the Royston-Parmar spline model, generalized gamma and generalized F distributions. Any user-defined parametric distribution can be fitted, given at least an R function defining the probability density or hazard. There are also tools for fitting and predicting from fully parametric multi-state models, based on either cause-specific hazards or mixture models.
Maintained by Christopher Jackson. Last updated 2 months ago.
57 stars 13.31 score 632 scripts 43 dependentsoscarkjell
text:Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning
Link R with Transformers from Hugging Face to transform text variables to word embeddings; where the word embeddings are used to statistically test the mean difference between set of texts, compute semantic similarity scores between texts, predict numerical variables, and visual statistically significant words according to various dimensions etc. For more information see <https://www.r-text.org>.
Maintained by Oscar Kjell. Last updated 9 days ago.
deep-learningmachine-learningnlptransformersopenjdk
145 stars 13.21 score 436 scripts 1 dependentslarmarange
ggstats:Extension to 'ggplot2' for Plotting Stats
Provides new statistics, new geometries and new positions for 'ggplot2' and a suite of functions to facilitate the creation of statistical plots.
Maintained by Joseph Larmarange. Last updated 22 days ago.
37 stars 13.08 score 190 scripts 156 dependentskeaven
gsDesign:Group Sequential Design
Derives group sequential clinical trial designs and describes their properties. Particular focus on time-to-event, binary, and continuous outcomes. Largely based on methods described in Jennison, Christopher and Turnbull, Bruce W., 2000, "Group Sequential Methods with Applications to Clinical Trials" ISBN: 0-8493-0316-8.
Maintained by Keaven Anderson. Last updated 27 days ago.
biostatisticsboundariesclinical-trialsdesignspending-functions
51 stars 13.05 score 338 scripts 5 dependentsbioc
Gviz:Plotting data and annotation information along genomic coordinates
Genomic data analyses requires integrated visualization of known genomic information and new experimental data. Gviz uses the biomaRt and the rtracklayer packages to perform live annotation queries to Ensembl and UCSC and translates this to e.g. gene/transcript structures in viewports of the grid graphics package. This results in genomic information plotted together with your data.
Maintained by Robert Ivanek. Last updated 5 months ago.
visualizationmicroarraysequencing
79 stars 13.05 score 1.4k scripts 46 dependentsbioc
ChIPseeker:ChIPseeker for ChIP peak Annotation, Comparison, and Visualization
This package implements functions to retrieve the nearest genes around the peak, annotate genomic region of the peak, statstical methods for estimate the significance of overlap among ChIP peak data sets, and incorporate GEO database for user to compare the own dataset with those deposited in database. The comparison can be used to infer cooperative regulation and thus can be used to generate hypotheses. Several visualization functions are implemented to summarize the coverage of the peak experiment, average profile and heatmap of peaks binding to TSS regions, genomic annotation, distance to TSS, and overlap of peaks or genes.
Maintained by Guangchuang Yu. Last updated 5 months ago.
annotationchipseqsoftwarevisualizationmultiplecomparisonatac-seqchip-seqcomparisonepigeneticsepigenomics
233 stars 13.05 score 1.6k scripts 5 dependentsgavinsimpson
gratia:Graceful 'ggplot'-Based Graphics and Other Functions for GAMs Fitted Using 'mgcv'
Graceful 'ggplot'-based graphics and utility functions for working with generalized additive models (GAMs) fitted using the 'mgcv' package. Provides a reimplementation of the plot() method for GAMs that 'mgcv' provides, as well as 'tidyverse' compatible representations of estimated smooths.
Maintained by Gavin L. Simpson. Last updated 9 hours ago.
distributional-regressiongamgammgeneralized-additive-mixed-modelsgeneralized-additive-modelsggplot2glmlmmgcvpenalized-splinerandom-effectssmoothingsplines
216 stars 12.96 score 1.6k scripts 2 dependentsdgrtwo
fuzzyjoin:Join Tables Together on Inexact Matching
Join tables together based not on whether columns match exactly, but whether they are similar by some comparison. Implementations include string distance and regular expression matching.
Maintained by David Robinson. Last updated 5 years ago.
679 stars 12.96 score 1.5k scripts 21 dependentsopenair-project
openair:Tools for the Analysis of Air Pollution Data
Tools to analyse, interpret and understand air pollution data. Data are typically regular time series and air quality measurement, meteorological data and dispersion model output can be analysed. The package is described in Carslaw and Ropkins (2012, <doi:10.1016/j.envsoft.2011.09.008>) and subsequent papers.
Maintained by David Carslaw. Last updated 4 days ago.
air-qualityair-quality-datameteorologyopenaircpp
316 stars 12.94 score 1.2k scripts 12 dependentsjuba
questionr:Functions to Make Surveys Processing Easier
Set of functions to make the processing and analysis of surveys easier : interactive shiny apps and addins for data recoding, contingency tables, dataset metadata handling, and several convenience functions.
Maintained by Julien Barnier. Last updated 11 days ago.
83 stars 12.93 score 1.1k scripts 19 dependentsmjockers
syuzhet:Extracts Sentiment and Sentiment-Derived Plot Arcs from Text
Extracts sentiment and sentiment-derived plot arcs from text using a variety of sentiment dictionaries conveniently packaged for consumption by R users. Implemented dictionaries include "syuzhet" (default) developed in the Nebraska Literary Lab "afinn" developed by Finn Årup Nielsen, "bing" developed by Minqing Hu and Bing Liu, and "nrc" developed by Mohammad, Saif M. and Turney, Peter D. Applicable references are available in README.md and in the documentation for the "get_sentiment" function. The package also provides a hack for implementing Stanford's coreNLP sentiment parser. The package provides several methods for plot arc normalization.
Maintained by Matthew Jockers. Last updated 2 years ago.
336 stars 12.92 score 1.4k scripts 31 dependentsfriendly
matlib:Matrix Functions for Teaching and Learning Linear Algebra and Multivariate Statistics
A collection of matrix functions for teaching and learning matrix linear algebra as used in multivariate statistical methods. Many of these functions are designed for tutorial purposes in learning matrix algebra ideas using R. In some cases, functions are provided for concepts available elsewhere in R, but where the function call or name is not obvious. In other cases, functions are provided to show or demonstrate an algorithm. In addition, a collection of functions are provided for drawing vector diagrams in 2D and 3D and for rendering matrix expressions and equations in LaTeX.
Maintained by Michael Friendly. Last updated 17 days ago.
diagramslinear-equationsmatrixmatrix-functionsmatrix-visualizervectorvignette
65 stars 12.89 score 900 scripts 11 dependentspaleolimbot
ggspatial:Spatial Data Framework for ggplot2
Spatial data plus the power of the ggplot2 framework means easier mapping when input data are already in the form of spatial objects.
Maintained by Dewey Dunnington. Last updated 2 years ago.
379 stars 12.85 score 4.1k scripts 24 dependentsbioc
minfi:Analyze Illumina Infinium DNA methylation arrays
Tools to analyze & visualize Illumina Infinium methylation arrays.
Maintained by Kasper Daniel Hansen. Last updated 4 months ago.
immunooncologydnamethylationdifferentialmethylationepigeneticsmicroarraymethylationarraymultichanneltwochanneldataimportnormalizationpreprocessingqualitycontrol
60 stars 12.82 score 996 scripts 27 dependentsbioc
MSnbase:Base Functions and Classes for Mass Spectrometry and Proteomics
MSnbase provides infrastructure for manipulation, processing and visualisation of mass spectrometry and proteomics data, ranging from raw to quantitative and annotated data.
Maintained by Laurent Gatto. Last updated 18 days ago.
immunooncologyinfrastructureproteomicsmassspectrometryqualitycontroldataimportbioconductorbioinformaticsmass-spectrometryproteomics-datavisualisationcpp
131 stars 12.76 score 772 scripts 36 dependentsinsightsengineering
teal:Exploratory Web Apps for Analyzing Clinical Trials Data
A 'shiny' based interactive exploration framework for analyzing clinical trials data. 'teal' currently provides a dynamic filtering facility and different data viewers. 'teal' 'shiny' applications are built using standard 'shiny' modules.
Maintained by Dawid Kaledkowski. Last updated 1 months ago.
clinical-trialsnestshinywebapp
206 stars 12.65 score 176 scripts 5 dependentsbioc
SpatialExperiment:S4 Class for Spatially Resolved -omics Data
Defines an S4 class for storing data from spatial -omics experiments. The class extends SingleCellExperiment to support storage and retrieval of additional information from spot-based and molecule-based platforms, including spatial coordinates, images, and image metadata. A specialized constructor function is included for data from the 10x Genomics Visium platform.
Maintained by Dario Righelli. Last updated 5 months ago.
datarepresentationdataimportinfrastructureimmunooncologygeneexpressiontranscriptomicssinglecellspatial
59 stars 12.63 score 1.8k scripts 71 dependentsohdsi
DatabaseConnector:Connecting to Various Database Platforms
An R 'DataBase Interface' ('DBI') compatible interface to various database platforms ('PostgreSQL', 'Oracle', 'Microsoft SQL Server', 'Amazon Redshift', 'Microsoft Parallel Database Warehouse', 'IBM Netezza', 'Apache Impala', 'Google BigQuery', 'Snowflake', 'Spark', 'SQLite', and 'InterSystems IRIS'). Also includes support for fetching data as 'Andromeda' objects. Uses either 'Java Database Connectivity' ('JDBC') or other 'DBI' drivers to connect to databases.
Maintained by Martijn Schuemie. Last updated 2 months ago.
56 stars 12.63 score 772 scripts 11 dependentshrbrmstr
ggalt:Extra Coordinate Systems, 'Geoms', Statistical Transformations, Scales and Fonts for 'ggplot2'
A compendium of new geometries, coordinate systems, statistical transformations, scales and fonts for 'ggplot2', including splines, 1d and 2d densities, univariate average shifted histograms, a new map coordinate system based on the 'PROJ.4'-library along with geom_cartogram() that mimics the original functionality of geom_map(), formatters for "bytes", a stat_stepribbon() function, increased 'plotly' compatibility and the 'StateFace' open source font 'ProPublica'. Further new functionality includes lollipop charts, dumbbell charts, the ability to encircle points and coordinate-system-based text annotations.
Maintained by Bob Rudis. Last updated 2 years ago.
geomggplot-extensionggplot2ggplot2-geomggplot2-scales
676 stars 12.60 score 2.3k scripts 7 dependentsmassimoaria
bibliometrix:Comprehensive Science Mapping Analysis
Tool for quantitative research in scientometrics and bibliometrics. It implements the comprehensive workflow for science mapping analysis proposed in Aria M. and Cuccurullo C. (2017) <doi:10.1016/j.joi.2017.08.007>. 'bibliometrix' provides various routines for importing bibliographic data from 'SCOPUS', 'Clarivate Analytics Web of Science' (<https://www.webofknowledge.com/>), 'Digital Science Dimensions' (<https://www.dimensions.ai/>), 'OpenAlex' (<https://openalex.org/>), 'Cochrane Library' (<https://www.cochranelibrary.com/>), 'Lens' (<https://lens.org>), and 'PubMed' (<https://pubmed.ncbi.nlm.nih.gov/>) databases, performing bibliometric analysis and building networks for co-citation, coupling, scientific collaboration and co-word analysis.
Maintained by Massimo Aria. Last updated 13 days ago.
bibliometric-analysisbibliometricscitationcitation-networkcitationsco-authorsco-occurenceco-word-analysiscorrespondence-analysiscouplingisi-webjournalmanuscriptquantitative-analysisscholarssciencescience-mappingscientificscientometricsscopus
545 stars 12.54 score 518 scripts 2 dependentsbilldenney
PKNCA:Perform Pharmacokinetic Non-Compartmental Analysis
Compute standard Non-Compartmental Analysis (NCA) parameters for typical pharmacokinetic analyses and summarize them.
Maintained by Bill Denney. Last updated 1 months ago.
ncanoncompartmental-analysispharmacokinetics
73 stars 12.53 score 214 scripts 4 dependentsbioc
microbiome:Microbiome Analytics
Utilities for microbiome analysis.
Maintained by Leo Lahti. Last updated 5 months ago.
metagenomicsmicrobiomesequencingsystemsbiologyhitchiphitchip-atlashuman-microbiomemicrobiologymicrobiome-analysisphyloseqpopulation-study
293 stars 12.51 score 2.0k scripts 5 dependentsinsightsengineering
tern:Create Common TLGs Used in Clinical Trials
Table, Listings, and Graphs (TLG) library for common outputs used in clinical trials.
Maintained by Joe Zhu. Last updated 2 months ago.
clinical-trialsgraphslistingsnestoutputstables
83 stars 12.50 score 186 scripts 9 dependentsropensci
treeio:Base Classes and Functions for Phylogenetic Tree Input and Output
'treeio' is an R package to make it easier to import and store phylogenetic tree with associated data; and to link external data from different sources to phylogeny. It also supports exporting phylogenetic tree with heterogeneous associated data to a single tree file and can be served as a platform for merging tree with associated data and converting file formats.
Maintained by Guangchuang Yu. Last updated 5 months ago.
softwareannotationclusteringdataimportdatarepresentationalignmentmultiplesequencealignmentphylogeneticsexporterparserphylogenetic-trees
102 stars 12.46 score 1.3k scripts 122 dependentstidyverts
feasts:Feature Extraction and Statistics for Time Series
Provides a collection of features, decomposition methods, statistical summaries and graphics functions for the analysing tidy time series data. The package name 'feasts' is an acronym comprising of its key features: Feature Extraction And Statistics for Time Series.
Maintained by Mitchell OHara-Wild. Last updated 5 months ago.
300 stars 12.38 score 1.4k scripts 7 dependentsouhscbbmc
REDCapR:Interaction Between R and REDCap
Encapsulates functions to streamline calls from R to the REDCap API. REDCap (Research Electronic Data CAPture) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The Application Programming Interface (API) offers an avenue to access and modify data programmatically, improving the capacity for literate and reproducible programming.
Maintained by Will Beasley. Last updated 3 months ago.
118 stars 12.36 score 438 scripts 6 dependentsbioc
ReactomePA:Reactome Pathway Analysis
This package provides functions for pathway analysis based on REACTOME pathway database. It implements enrichment analysis, gene set enrichment analysis and several functions for visualization. This package is not affiliated with the Reactome team.
Maintained by Guangchuang Yu. Last updated 5 months ago.
pathwaysvisualizationannotationmultiplecomparisongenesetenrichmentreactomeenrichment-analysisreactome-pathway-analysisreactomepa
40 stars 12.25 score 1.5k scripts 7 dependentsbioc
ggbio:Visualization tools for genomic data
The ggbio package extends and specializes the grammar of graphics for biological data. The graphics are designed to answer common scientific questions, in particular those often asked of high throughput genomics data. All core Bioconductor data structures are supported, where appropriate. The package supports detailed views of particular genomic regions, as well as genome-wide overviews. Supported overviews include ideograms and grand linear views. High-level plots include sequence fragment length, edge-linked interval to data view, mismatch pileup, and several splicing summaries.
Maintained by Michael Lawrence. Last updated 5 months ago.
111 stars 12.23 score 734 scripts 16 dependentsrsquaredacademy
olsrr:Tools for Building OLS Regression Models
Tools designed to make it easier for users, particularly beginner/intermediate R users to build ordinary least squares regression models. Includes comprehensive regression output, heteroskedasticity tests, collinearity diagnostics, residual diagnostics, measures of influence, model fit assessment and variable selection procedures.
Maintained by Aravind Hebbali. Last updated 5 months ago.
collinearity-diagnosticslinear-modelsregressionstepwise-regression
103 stars 12.19 score 1.4k scripts 4 dependentsstuart-lab
Signac:Analysis of Single-Cell Chromatin Data
A framework for the analysis and exploration of single-cell chromatin data. The 'Signac' package contains functions for quantifying single-cell chromatin data, computing per-cell quality control metrics, dimension reduction and normalization, visualization, and DNA sequence motif analysis. Reference: Stuart et al. (2021) <doi:10.1038/s41592-021-01282-5>.
Maintained by Tim Stuart. Last updated 7 months ago.
atacbioinformaticssingle-cellzlibcpp
355 stars 12.18 score 3.7k scripts 1 dependentstidyverts
fabletools:Core Tools for Packages in the 'fable' Framework
Provides tools, helpers and data structures for developing models and time series functions for 'fable' and extension packages. These tools support a consistent and tidy interface for time series modelling and analysis.
Maintained by Mitchell OHara-Wild. Last updated 2 months ago.
91 stars 12.18 score 396 scripts 18 dependentstidymodels
probably:Tools for Post-Processing Predicted Values
Models can be improved by post-processing class probabilities, by: recalibration, conversion to hard probabilities, assessment of equivocal zones, and other activities. 'probably' contains tools for conducting these operations as well as calibration tools and conformal inference techniques for regression models.
Maintained by Max Kuhn. Last updated 6 months ago.
115 stars 12.09 score 21k scripts 1 dependentsmrc-ide
EpiEstim:Estimate Time Varying Reproduction Numbers from Epidemic Curves
Tools to quantify transmissibility throughout an epidemic from the analysis of time series of incidence as described in Cori et al. (2013) <doi:10.1093/aje/kwt133> and Wallinga and Teunis (2004) <doi:10.1093/aje/kwh255>.
Maintained by Anne Cori. Last updated 7 months ago.
95 stars 12.06 score 1.0k scripts 7 dependentstidymodels
workflowsets:Create a Collection of 'tidymodels' Workflows
A workflow is a combination of a model and preprocessors (e.g, a formula, recipe, etc.) (Kuhn and Silge (2021) <https://www.tmwr.org/>). In order to try different combinations of these, an object can be created that contains many workflows. There are functions to create workflows en masse as well as training them and visualizing the results.
Maintained by Simon Couch. Last updated 5 months ago.
94 stars 12.04 score 294 scripts 19 dependentscrsh
papaja:Prepare American Psychological Association Journal Articles with R Markdown
Tools to create dynamic, submission-ready manuscripts, which conform to American Psychological Association manuscript guidelines. We provide R Markdown document formats for manuscripts (PDF and Word) and revision letters (PDF). Helper functions facilitate reporting statistical analyses or create publication-ready tables and plots.
Maintained by Frederik Aust. Last updated 1 months ago.
apaapa-guidelinesjournalmanuscriptpsychologyreproducible-paperreproducible-researchrmarkdown
663 stars 12.00 score 1.7k scripts 2 dependentszachmayer
caretEnsemble:Ensembles of Caret Models
Functions for creating ensembles of caret models: caretList() and caretStack(). caretList() is a convenience function for fitting multiple caret::train() models to the same dataset. caretStack() will make linear or non-linear combinations of these models, using a caret::train() model as a meta-model.
Maintained by Zachary A. Deane-Mayer. Last updated 3 months ago.
226 stars 11.98 score 780 scripts 1 dependentsbioc
ExperimentHub:Client to access ExperimentHub resources
This package provides a client for the Bioconductor ExperimentHub web resource. ExperimentHub provides a central location where curated data from experiments, publications or training courses can be accessed. Each resource has associated metadata, tags and date of modification. The client creates and manages a local cache of files retrieved enabling quick and reproducible access.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
infrastructuredataimportguithirdpartyclientcore-packageu24ca289073
10 stars 11.94 score 764 scripts 57 dependentsbioc
GenomicDataCommons:NIH / NCI Genomic Data Commons Access
Programmatically access the NIH / NCI Genomic Data Commons RESTful service.
Maintained by Sean Davis. Last updated 2 months ago.
dataimportsequencingapi-clientbioconductorbioinformaticscancercore-servicesdata-sciencegenomicsncitcgavignette
87 stars 11.94 score 238 scripts 12 dependentsjinghuazhao
gap:Genetic Analysis Package
As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over years, the package has been developed in-between many projects hence also in line with the name (gap).
Maintained by Jing Hua Zhao. Last updated 7 days ago.
12 stars 11.94 score 448 scripts 16 dependentsxfim
ggmcmc:Tools for Analyzing MCMC Simulations from Bayesian Inference
Tools for assessing and diagnosing convergence of Markov Chain Monte Carlo simulations, as well as for graphically display results from full MCMC analysis. The package also facilitates the graphical interpretation of models by providing flexible functions to plot the results against observed variables, and functions to work with hierarchical/multilevel batches of parameters (Fernández-i-Marín, 2016 <doi:10.18637/jss.v070.i09>).
Maintained by Xavier Fernández i Marín. Last updated 2 years ago.
bayesian-data-analysisggplot2graphicaljagsmcmcstan
111 stars 11.94 score 1.6k scripts 8 dependentspecanproject
PEcAn.DB:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PECAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by David LeBauer. Last updated 3 hours ago.
bayesiancyberinfrastructuredata-assimilationdata-scienceecosystem-modelecosystem-scienceforecastingmeta-analysisnational-science-foundationpecanplants
216 stars 11.90 score 127 scripts 27 dependentsbioc
QFeatures:Quantitative features for mass spectrometry data
The QFeatures infrastructure enables the management and processing of quantitative features for high-throughput mass spectrometry assays. It provides a familiar Bioconductor user experience to manages quantitative data across different assay levels (such as peptide spectrum matches, peptides and proteins) in a coherent and tractable format.
Maintained by Laurent Gatto. Last updated 28 days ago.
infrastructuremassspectrometryproteomicsmetabolomicsbioconductormass-spectrometry
27 stars 11.87 score 278 scripts 49 dependentshannameyer
CAST:'caret' Applications for Spatial-Temporal Models
Supporting functionality to run 'caret' with spatial or spatial-temporal data. 'caret' is a frequently used package for model training and prediction using machine learning. CAST includes functions to improve spatial or spatial-temporal modelling tasks using 'caret'. It includes the newly suggested 'Nearest neighbor distance matching' cross-validation to estimate the performance of spatial prediction models and allows for spatial variable selection to selects suitable predictor variables in view to their contribution to the spatial model performance. CAST further includes functionality to estimate the (spatial) area of applicability of prediction models. Methods are described in Meyer et al. (2018) <doi:10.1016/j.envsoft.2017.12.001>; Meyer et al. (2019) <doi:10.1016/j.ecolmodel.2019.108815>; Meyer and Pebesma (2021) <doi:10.1111/2041-210X.13650>; Milà et al. (2022) <doi:10.1111/2041-210X.13851>; Meyer and Pebesma (2022) <doi:10.1038/s41467-022-29838-9>; Linnenbrink et al. (2023) <doi:10.5194/egusphere-2023-1308>; Schumacher et al. (2024) <doi:10.5194/egusphere-2024-2730>. The package is described in detail in Meyer et al. (2024) <doi:10.48550/arXiv.2404.06978>.
Maintained by Hanna Meyer. Last updated 2 months ago.
autocorrelationcaretfeature-selectionmachine-learningoverfittingpredictive-modelingspatialspatio-temporalvariable-selection
114 stars 11.85 score 298 scripts 1 dependentsr-causal
ggdag:Analyze and Create Elegant Directed Acyclic Graphs
Tidy, analyze, and plot directed acyclic graphs (DAGs). 'ggdag' is built on top of 'dagitty', an R package that uses the 'DAGitty' web tool (<https://dagitty.net/>) for creating and analyzing DAGs. 'ggdag' makes it easy to tidy and plot 'dagitty' objects using 'ggplot2' and 'ggraph', as well as common analytic and graphical functions, such as determining adjustment sets and node relationships.
Maintained by Malcolm Barrett. Last updated 8 months ago.
causal-inferencedagggplot-extension
443 stars 11.78 score 1.8k scripts 5 dependentsfriendly
heplots:Visualizing Hypothesis Tests in Multivariate Linear Models
Provides HE plot and other functions for visualizing hypothesis tests in multivariate linear models. HE plots represent sums-of-squares-and-products matrices for linear hypotheses and for error using ellipses (in two dimensions) and ellipsoids (in three dimensions). The related 'candisc' package provides visualizations in a reduced-rank canonical discriminant space when there are more than a few response variables.
Maintained by Michael Friendly. Last updated 8 days ago.
linear-hypothesesmatricesmultivariate-linear-modelsplotrepeated-measure-designsvisualizing-hypothesis-tests
9 stars 11.78 score 1.1k scripts 7 dependentsdatalorax
equatiomatic:Transform Models into 'LaTeX' Equations
The goal of 'equatiomatic' is to reduce the pain associated with writing 'LaTeX' formulas from fitted models. The primary function of the package, extract_eq(), takes a fitted model object as its input and returns the corresponding 'LaTeX' code for the model.
Maintained by Philippe Grosjean. Last updated 22 days ago.
619 stars 11.75 score 424 scripts 5 dependentsbioc
variancePartition:Quantify and interpret drivers of variation in multilevel gene expression experiments
Quantify and interpret multiple sources of biological and technical variation in gene expression experiments. Uses a linear mixed model to quantify variation in gene expression attributable to individual, tissue, time point, or technical variables. Includes dream differential expression analysis for repeated measures.
Maintained by Gabriel E. Hoffman. Last updated 3 months ago.
rnaseqgeneexpressiongenesetenrichmentdifferentialexpressionbatcheffectqualitycontrolregressionepigeneticsfunctionalgenomicstranscriptomicsnormalizationpreprocessingmicroarrayimmunooncologysoftware
7 stars 11.69 score 1.1k scripts 3 dependentshaleyjeppson
ggmosaic:Mosaic Plots in the 'ggplot2' Framework
Mosaic plots in the 'ggplot2' framework. Mosaic plot functionality is provided in a single 'ggplot2' layer by calling the geom 'mosaic'.
Maintained by Haley Jeppson. Last updated 6 months ago.
167 stars 11.63 score 1.8k scripts 4 dependentspecanproject
PEcAn.data.atmosphere:PEcAn Functions Used for Managing Climate Driver Data
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The PECAn.data.atmosphere package converts climate driver data into a standard format for models integrated into PEcAn. As a standalone package, it provides an interface to access diverse climate data sets.
Maintained by David LeBauer. Last updated 3 hours ago.
bayesiancyberinfrastructuredata-assimilationdata-scienceecosystem-modelecosystem-scienceforecastingmeta-analysisnational-science-foundationpecanplants
216 stars 11.62 score 64 scripts 14 dependentsggseg
ggseg:Plotting Tool for Brain Atlases
Contains 'ggplot2' geom for plotting brain atlases using simple features. The largest component of the package is the data for the two built-in atlases. Mowinckel & Vidal-Piñeiro (2020) <doi:10.1177/2515245920928009>.
Maintained by Athanasia Mo Mowinckel. Last updated 2 years ago.
221 stars 11.57 score 590 scripts 14 dependentsprojectmosaic
ggformula:Formula Interface to the Grammar of Graphics
Provides a formula interface to 'ggplot2' graphics.
Maintained by Randall Pruim. Last updated 1 years ago.
38 stars 11.55 score 1.7k scripts 25 dependentsbioc
mia:Microbiome analysis
mia implements tools for microbiome analysis based on the SummarizedExperiment, SingleCellExperiment and TreeSummarizedExperiment infrastructure. Data wrangling and analysis in the context of taxonomic data is the main scope. Additional functions for common task are implemented such as community indices calculation and summarization.
Maintained by Tuomas Borman. Last updated 5 days ago.
microbiomesoftwaredataimportanalysisbioconductorcpp
51 stars 11.51 score 316 scripts 5 dependentstidymodels
stacks:Tidy Model Stacking
Model stacking is an ensemble technique that involves training a model to combine the outputs of many diverse statistical models, and has been shown to improve predictive performance in a variety of settings. 'stacks' implements a grammar for 'tidymodels'-aligned model stacking.
Maintained by Simon Couch. Last updated 5 months ago.
298 stars 11.46 score 840 scriptsjohncoene
echarts4r:Create Interactive Graphs with 'Echarts JavaScript' Version 5
Easily create interactive charts by leveraging the 'Echarts Javascript' library which includes 36 chart types, themes, 'Shiny' proxies and animations.
Maintained by David Munoz Tord. Last updated 18 days ago.
echartshacktoberfesthtmlwidgethtmlwidgetsvisualization
603 stars 11.45 score 1.3k scripts 11 dependentslarmarange
broom.helpers:Helpers for Model Coefficients Tibbles
Provides suite of functions to work with regression model 'broom::tidy()' tibbles. The suite includes functions to group regression model terms by variable, insert reference and header rows for categorical variables, add variable labels, and more.
Maintained by Joseph Larmarange. Last updated 25 days ago.
22 stars 11.45 score 165 scripts 2 dependentsbioc
destiny:Creates diffusion maps
Create and plot diffusion maps.
Maintained by Philipp Angerer. Last updated 4 months ago.
cellbiologycellbasedassaysclusteringsoftwarevisualizationdiffusion-mapsdimensionality-reductioncpp
82 stars 11.44 score 792 scripts 1 dependentsewenharrison
finalfit:Quickly Create Elegant Regression Results Tables and Plots when Modelling
Generate regression results tables and plots in final format for publication. Explore models and export directly to PDF and 'Word' using 'RMarkdown'.
Maintained by Ewen Harrison. Last updated 10 days ago.
270 stars 11.43 score 1.0k scriptsdarwin-eu
CDMConnector:Connect to an OMOP Common Data Model
Provides tools for working with observational health data in the Observational Medical Outcomes Partnership (OMOP) Common Data Model format with a pipe friendly syntax. Common data model database table references are stored in a single compound object along with metadata.
Maintained by Adam Black. Last updated 1 months ago.
12 stars 11.43 score 502 scripts 12 dependentsluukvdmeer
sfnetworks:Tidy Geospatial Networks
Provides a tidy approach to spatial network analysis, in the form of classes and functions that enable a seamless interaction between the network analysis package 'tidygraph' and the spatial analysis package 'sf'.
Maintained by Lucas van der Meer. Last updated 3 months ago.
geospatial-networksnetwork-analysisrspatialsimple-featuresspatial-analysisspatial-data-sciencespatial-networkstidygraphtidyverse
373 stars 11.43 score 332 scripts 7 dependentsjacob-long
interactions:Comprehensive, User-Friendly Toolkit for Probing Interactions
A suite of functions for conducting and interpreting analysis of statistical interaction in regression models that was formerly part of the 'jtools' package. Functionality includes visualization of two- and three-way interactions among continuous and/or categorical variables as well as calculation of "simple slopes" and Johnson-Neyman intervals (see e.g., Bauer & Curran, 2005 <doi:10.1207/s15327906mbr4003_5>). These capabilities are implemented for generalized linear models in addition to the standard linear regression context.
Maintained by Jacob A. Long. Last updated 8 months ago.
interactionsmoderationsocial-sciencesstatistics
131 stars 11.40 score 1.2k scripts 5 dependentslazappi
clustree:Visualise Clusterings at Different Resolutions
Deciding what resolution to use can be a difficult question when approaching a clustering analysis. One way to approach this problem is to look at how samples move as the number of clusters increases. This package allows you to produce clustering trees, a visualisation for interrogating clusterings as resolution increases.
Maintained by Luke Zappia. Last updated 1 years ago.
clusteringclustering-treesvisualisationvisualization
219 stars 11.40 score 1.9k scripts 5 dependentsinsightsengineering
cards:Analysis Results Data
Construct CDISC (Clinical Data Interchange Standards Consortium) compliant Analysis Results Data objects. These objects are used and re-used to construct summary tables, visualizations, and written reports. The package also exports utilities for working with these objects and creating new Analysis Results Data objects.
Maintained by Daniel D. Sjoberg. Last updated 1 months ago.
40 stars 11.40 score 100 scripts 19 dependentsbioc
PharmacoGx:Analysis of Large-Scale Pharmacogenomic Data
Contains a set of functions to perform large-scale analysis of pharmaco-genomic data. These include the PharmacoSet object for storing the results of pharmacogenomic experiments, as well as a number of functions for computing common summaries of drug-dose response and correlating them with the molecular features in a cancer cell-line.
Maintained by Benjamin Haibe-Kains. Last updated 3 months ago.
geneexpressionpharmacogeneticspharmacogenomicssoftwareclassificationdatasetspharmacogenomicpharmacogxcpp
68 stars 11.39 score 442 scripts 3 dependentsr4ss
r4ss:R Code for Stock Synthesis
A collection of R functions for use with Stock Synthesis, a fisheries stock assessment modeling platform written in ADMB by Dr. Richard D. Methot at the NOAA Northwest Fisheries Science Center. The functions include tools for summarizing and plotting results, manipulating files, visualizing model parameterizations, and various other common stock assessment tasks. This version of '{r4ss}' is compatible with Stock Synthesis versions 3.24 through 3.30 (specifically version 3.30.23.1, from December 2024). Support for 3.24 models is only through the core functions for reading output and plotting.
Maintained by Ian G. Taylor. Last updated 20 days ago.
fisheriesfisheries-stock-assessmentstock-synthesis
43 stars 11.38 score 1.0k scripts 2 dependentsdoi-usgs
nhdplusTools:NHDPlus Tools
Tools for traversing and working with National Hydrography Dataset Plus (NHDPlus) data. All methods implemented in 'nhdplusTools' are available in the NHDPlus documentation available from the US Environmental Protection Agency <https://www.epa.gov/waterdata/basic-information>.
Maintained by David Blodgett. Last updated 1 months ago.
87 stars 11.38 score 348 scripts 5 dependentsrolkra
explore:Simplifies Exploratory Data Analysis
Interactive data exploration with one line of code, automated reporting or use an easy to remember set of tidy functions for low code exploratory data analysis.
Maintained by Roland Krasser. Last updated 1 days ago.
data-explorationdata-visualisationdecision-treesedarmarkdownshinytidy
230 stars 11.36 score 221 scripts 1 dependentsropensci
biomartr:Genomic Data Retrieval
Perform large scale genomic data retrieval and functional annotation retrieval. This package aims to provide users with a standardized way to automate genome, proteome, 'RNA', coding sequence ('CDS'), 'GFF', and metagenome retrieval from 'NCBI RefSeq', 'NCBI Genbank', 'ENSEMBL', and 'UniProt' databases. Furthermore, an interface to the 'BioMart' database (Smedley et al. (2009) <doi:10.1186/1471-2164-10-22>) allows users to retrieve functional annotation for genomic loci. In addition, users can download entire databases such as 'NCBI RefSeq' (Pruitt et al. (2007) <doi:10.1093/nar/gkl842>), 'NCBI nr', 'NCBI nt', 'NCBI Genbank' (Benson et al. (2013) <doi:10.1093/nar/gks1195>), etc. with only one command.
Maintained by Hajk-Georg Drost. Last updated 2 months ago.
biomartgenomic-data-retrievalannotation-retrievaldatabase-retrievalncbiensemblbiological-data-retrievalensembl-serversgenomegenome-annotationgenome-retrievalgenomicsmeta-analysismetagenomicsncbi-genbankpeer-reviewedproteomesequenced-genomes
218 stars 11.35 score 129 scripts 3 dependentsdsy109
mixtools:Tools for Analyzing Finite Mixture Models
Analyzes finite mixture models for various parametric and semiparametric settings. This includes mixtures of parametric distributions (normal, multivariate normal, multinomial, gamma), various Reliability Mixture Models (RMMs), mixtures-of-regressions settings (linear regression, logistic regression, Poisson regression, linear regression with changepoints, predictor-dependent mixing proportions, random effects regressions, hierarchical mixtures-of-experts), and tools for selecting the number of components (bootstrapping the likelihood ratio test statistic, mixturegrams, and model selection criteria). Bayesian estimation of mixtures-of-linear-regressions models is available as well as a novel data depth method for obtaining credible bands. This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772 and the Chan Zuckerberg Initiative: Essential Open Source Software for Science (Grant No. 2020-255193).
Maintained by Derek Young. Last updated 10 months ago.
mixture-modelsmixture-of-expertssemiparametric-regression
20 stars 11.34 score 1.4k scripts 56 dependentsmoderndive
moderndive:Tidyverse-Friendly Introductory Linear Regression
Datasets and wrapper functions for tidyverse-friendly introductory linear regression, used in "Statistical Inference via Data Science: A ModernDive into R and the Tidyverse" available at <https://moderndive.com/>.
Maintained by Albert Y. Kim. Last updated 3 months ago.
88 stars 11.32 score 1.8k scriptsmrcieu
TwoSampleMR:Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database
A package for performing Mendelian randomization using GWAS summary data. It uses the IEU OpenGWAS database <https://gwas.mrcieu.ac.uk/> to automatically obtain data, and a wide range of methods to run the analysis.
Maintained by Gibran Hemani. Last updated 4 days ago.
476 stars 11.27 score 1.7k scripts 1 dependentsbioc
decoupleR:decoupleR: Ensemble of computational methods to infer biological activities from omics data
Many methods allow us to extract biological activities from omics data using information from prior knowledge resources, reducing the dimensionality for increased statistical power and better interpretability. Here, we present decoupleR, a Bioconductor package containing different statistical methods to extract these signatures within a unified framework. decoupleR allows the user to flexibly test any method with any resource. It incorporates methods that take into account the sign and weight of network interactions. decoupleR can be used with any omic, as long as its features can be linked to a biological process based on prior knowledge. For example, in transcriptomics gene sets regulated by a transcription factor, or in phospho-proteomics phosphosites that are targeted by a kinase.
Maintained by Pau Badia-i-Mompel. Last updated 5 months ago.
differentialexpressionfunctionalgenomicsgeneexpressiongeneregulationnetworksoftwarestatisticalmethodtranscription
230 stars 11.27 score 316 scripts 3 dependentsadeverse
adespatial:Multivariate Multiscale Spatial Analysis
Tools for the multiscale spatial analysis of multivariate data. Several methods are based on the use of a spatial weighting matrix and its eigenvector decomposition (Moran's Eigenvectors Maps, MEM). Several approaches are described in the review Dray et al (2012) <doi:10.1890/11-1183.1>.
Maintained by Aurélie Siberchicot. Last updated 12 hours ago.
36 stars 11.25 score 398 scripts 2 dependentsfishr-core-team
FSA:Simple Fisheries Stock Assessment Methods
A variety of simple fish stock assessment methods.
Maintained by Derek H. Ogle. Last updated 2 months ago.
fishfisheriesfisheries-managementfisheries-stock-assessmentpopulation-dynamicsstock-assessment
69 stars 11.16 score 1.7k scripts 6 dependentsjuliasilge
widyr:Widen, Process, then Re-Tidy Data
Encapsulates the pattern of untidying data into a wide matrix, performing some processing, then turning it back into a tidy form. This is useful for several operations such as co-occurrence counts, correlations, or clustering that are mathematically convenient on wide matrices.
Maintained by Julia Silge. Last updated 2 years ago.
329 stars 11.12 score 1.7k scripts 2 dependentsfmichonneau
phylobase:Base Package for Phylogenetic Structures and Comparative Data
Provides a base S4 class for comparative methods, incorporating one or more trees and trait data.
Maintained by Francois Michonneau. Last updated 1 years ago.
18 stars 11.10 score 394 scripts 18 dependentscovid19datahub
COVID19:COVID-19 Data Hub
Unified datasets for a better understanding of COVID-19.
Maintained by Emanuele Guidotti. Last updated 1 months ago.
2019-ncovcoronaviruscovid-19covid-datacovid19-data
252 stars 11.08 score 265 scriptsropengov
eurostat:Tools for Eurostat Open Data
Tools to download data from the Eurostat database <https://ec.europa.eu/eurostat> together with search and manipulation utilities.
Maintained by Leo Lahti. Last updated 1 months ago.
242 stars 11.07 score 892 scripts 4 dependentstidymodels
tidypredict:Run Predictions Inside the Database
It parses a fitted 'R' model object, and returns a formula in 'Tidy Eval' code that calculates the predictions. It works with several databases back-ends because it leverages 'dplyr' and 'dbplyr' for the final 'SQL' translation of the algorithm. It currently supports lm(), glm(), randomForest(), ranger(), earth(), xgb.Booster.complete(), cubist(), and ctree() models.
Maintained by Emil Hvitfeldt. Last updated 3 months ago.
262 stars 11.05 score 241 scripts 2 dependentschoonghyunryu
dlookr:Tools for Data Diagnosis, Exploration, Transformation
A collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between the target variable and predictor. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. And it creates automated reports that support these three tasks.
Maintained by Choonghyun Ryu. Last updated 10 months ago.
212 stars 11.05 score 748 scripts 2 dependentsmhahsler
arulesViz:Visualizing Association Rules and Frequent Itemsets
Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration. Michael Hahsler (2017) <doi:10.32614/RJ-2017-047>.
Maintained by Michael Hahsler. Last updated 7 months ago.
arulesassociation-rulesfrequent-itemsetsinteractive-visualizationsvisualization
54 stars 11.03 score 1.7k scripts 2 dependentsbioc
Maaslin2:"Multivariable Association Discovery in Population-scale Meta-omics Studies"
MaAsLin2 is comprehensive R package for efficiently determining multivariable association between clinical metadata and microbial meta'omic features. MaAsLin2 relies on general linear models to accommodate most modern epidemiological study designs, including cross-sectional and longitudinal, and offers a variety of data exploration, normalization, and transformation methods. MaAsLin2 is the next generation of MaAsLin.
Maintained by Lauren McIver. Last updated 5 months ago.
metagenomicssoftwaremicrobiomenormalizationbiobakerybioconductordifferential-abundance-analysisfalse-discovery-ratemultiple-covariatespublicrepeated-measurestools
133 stars 11.03 score 532 scripts 3 dependentsuupharmacometrics
xpose:Diagnostics for Pharmacometric Models
Diagnostics for non-linear mixed-effects (population) models from 'NONMEM' <https://www.iconplc.com/solutions/technologies/nonmem/>. 'xpose' facilitates data import, creation of numerical run summary and provide 'ggplot2'-based graphics for data exploration and model diagnostics.
Maintained by Benjamin Guiastrennec. Last updated 3 months ago.
diagnosticsggplot2nonmempharmacometricsxpose
62 stars 11.02 score 183 scripts 6 dependentsbioc
CATALYST:Cytometry dATa anALYSis Tools
CATALYST provides tools for preprocessing of and differential discovery in cytometry data such as FACS, CyTOF, and IMC. Preprocessing includes i) normalization using bead standards, ii) single-cell deconvolution, and iii) bead-based compensation. For differential discovery, the package provides a number of convenient functions for data processing (e.g., clustering, dimension reduction), as well as a suite of visualizations for exploratory data analysis and exploration of results from differential abundance (DA) and state (DS) analysis in order to identify differences in composition and expression profiles at the subpopulation-level, respectively.
Maintained by Helena L. Crowell. Last updated 4 months ago.
clusteringdataimportdifferentialexpressionexperimentaldesignflowcytometryimmunooncologymassspectrometrynormalizationpreprocessingsinglecellsoftwarestatisticalmethodvisualization
67 stars 10.99 score 362 scripts 2 dependentsbioc
infercnv:Infer Copy Number Variation from Single-Cell RNA-Seq Data
Using single-cell RNA-Seq expression to visualize CNV in cells.
Maintained by Christophe Georgescu. Last updated 5 months ago.
softwarecopynumbervariationvariantdetectionstructuralvariationgenomicvariationgeneticstranscriptomicsstatisticalmethodbayesianhiddenmarkovmodelsinglecelljagscpp
601 stars 10.92 score 674 scriptsindrajeetpatil
statsExpressions:Tidy Dataframes and Expressions with Statistical Details
Utilities for producing dataframes with rich details for the most common types of statistical approaches and tests: parametric, nonparametric, robust, and Bayesian t-test, one-way ANOVA, correlation analyses, contingency table analyses, and meta-analyses. The functions are pipe-friendly and provide a consistent syntax to work with tidy data. These dataframes additionally contain expressions with statistical details, and can be used in graphing packages. This package also forms the statistical processing backend for 'ggstatsplot'. References: Patil (2021) <doi:10.21105/joss.03236>.
Maintained by Indrajeet Patil. Last updated 1 months ago.
bayesian-inferencebayesian-statisticscontingency-tablecorrelationeffectsizemeta-analysisparametricrobustrobust-statisticsstatistical-detailsstatistical-teststidy
312 stars 10.92 score 146 scripts 2 dependentstidymodels
textrecipes:Extra 'Recipes' for Text Processing
Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allows for tokenization, filtering, counting (tf and tfidf) and feature hashing.
Maintained by Emil Hvitfeldt. Last updated 12 days ago.
160 stars 10.86 score 964 scripts 1 dependentsohdsi
PatientLevelPrediction:Develop Clinical Prediction Models Using the Common Data Model
A user friendly way to create patient level prediction models using the Observational Medical Outcomes Partnership Common Data Model. Given a cohort of interest and an outcome of interest, the package can use data in the Common Data Model to build a large set of features. These features can then be used to fit a predictive model with a number of machine learning algorithms. This is further described in Reps (2017) <doi:10.1093/jamia/ocy032>.
Maintained by Egill Fridgeirsson. Last updated 24 days ago.
190 stars 10.85 score 297 scriptsfriendly
vcdExtra:'vcd' Extensions and Additions
Provides additional data sets, methods and documentation to complement the 'vcd' package for Visualizing Categorical Data and the 'gnm' package for Generalized Nonlinear Models. In particular, 'vcdExtra' extends mosaic, assoc and sieve plots from 'vcd' to handle 'glm()' and 'gnm()' models and adds a 3D version in 'mosaic3d'. Additionally, methods are provided for comparing and visualizing lists of 'glm' and 'loglm' objects. This package is now a support package for the book, "Discrete Data Analysis with R" by Michael Friendly and David Meyer.
Maintained by Michael Friendly. Last updated 7 days ago.
categorical-data-visualizationgeneralized-linear-modelsmosaic-plots
24 stars 10.85 score 472 scripts 3 dependentsbioc
muscat:Multi-sample multi-group scRNA-seq data analysis tools
`muscat` provides various methods and visualization tools for DS analysis in multi-sample, multi-group, multi-(cell-)subpopulation scRNA-seq data, including cell-level mixed models and methods based on aggregated “pseudobulk” data, as well as a flexible simulation platform that mimics both single and multi-sample scRNA-seq data.
Maintained by Helena L. Crowell. Last updated 5 months ago.
immunooncologydifferentialexpressionsequencingsinglecellsoftwarestatisticalmethodvisualization
184 stars 10.74 score 686 scripts 1 dependentsrstudio
pointblank:Data Validation and Organization of Metadata for Local and Remote Tables
Validate data in data frames, 'tibble' objects, 'Spark' 'DataFrames', and database tables. Validation pipelines can be made using easily-readable, consecutive validation steps. Upon execution of the validation plan, several reporting options are available. User-defined thresholds for failure rates allow for the determination of appropriate reporting actions. Many other workflows are available including an information management workflow, where the aim is to record, collect, and generate useful information on data tables.
Maintained by Richard Iannone. Last updated 5 days ago.
data-assertionsdata-checkerdata-dictionariesdata-framesdata-inferencedata-managementdata-profilerdata-qualitydata-validationdata-verificationdatabase-tableseasy-to-understandreporting-toolschema-validationtesting-toolsyaml-configuration
942 stars 10.73 score 284 scriptsapache
apache.sedona:R Interface for Apache Sedona
R interface for 'Apache Sedona' based on 'sparklyr' (<https://sedona.apache.org>).
Maintained by Apache Sedona. Last updated 4 hours ago.
cluster-computinggeospatialjavapythonscalaspatial-analysisspatial-queryspatial-sql
2.0k stars 10.72 score 105 scriptspecanproject
PEcAn.benchmark:PEcAn Functions Used for Benchmarking
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation. The PEcAn.benchmark package provides utilities for comparing models and data, including a suite of statistical metrics and plots.
Maintained by Mike Dietze. Last updated 3 hours ago.
bayesiancyberinfrastructuredata-assimilationdata-scienceecosystem-modelecosystem-scienceforecastingmeta-analysisnational-science-foundationpecanplants
216 stars 10.72 score 416 scripts 11 dependentsjimmyday12
fitzRoy:Easily Scrape and Process AFL Data
An easy package for scraping and processing Australia Rules Football (AFL) data. 'fitzRoy' provides a range of functions for accessing publicly available data from 'AFL Tables' <https://afltables.com/afl/afl_index.html>, 'Footy Wire' <https://www.footywire.com> and 'The Squiggle' <https://squiggle.com.au>. Further functions allow for easy processing, cleaning and transformation of this data into formats that can be used for analysis.
Maintained by James Day. Last updated 11 days ago.
136 stars 10.72 score 324 scriptscjvanlissa
tidySEM:Tidy Structural Equation Modeling
A tidy workflow for generating, estimating, reporting, and plotting structural equation models using 'lavaan', 'OpenMx', or 'Mplus'. Throughout this workflow, elements of syntax, results, and graphs are represented as 'tidy' data, making them easy to customize. Includes functionality to estimate latent class analyses, and to plot 'dagitty' and 'igraph' objects.
Maintained by Caspar J. van Lissa. Last updated 23 days ago.
58 stars 10.69 score 330 scripts 1 dependentsbioc
GWASTools:Tools for Genome Wide Association Studies
Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis.
Maintained by Stephanie M. Gogarten. Last updated 13 days ago.
snpgeneticvariabilityqualitycontrolmicroarray
17 stars 10.67 score 396 scripts 5 dependentsneonscience
neonUtilities:Utilities for Working with NEON Data
NEON data packages can be accessed through the NEON Data Portal <https://www.neonscience.org> or through the NEON Data API (see <https://data.neonscience.org/data-api> for documentation). Data delivered from the Data Portal are provided as monthly zip files packaged within a parent zip file, while individual files can be accessed from the API. This package provides tools that aid in discovering, downloading, and reformatting data prior to use in analyses. This includes downloading data via the API, merging data tables by type, and converting formats. For more information, see the readme file at <https://github.com/NEONScience/NEON-utilities>.
Maintained by Claire Lunch. Last updated 2 months ago.
57 stars 10.66 score 944 scripts 15 dependentsohdsi
FeatureExtraction:Generating Features for a Cohort
An R interface for generating features for a cohort using data in the Common Data Model. Features can be constructed using default or custom made feature definitions. Furthermore it's possible to aggregate features and get the summary statistics.
Maintained by Ger Inberg. Last updated 11 days ago.
62 stars 10.64 score 209 scripts 2 dependentscolearendt
tidyjson:Tidy Complex 'JSON'
Turn complex 'JSON' data into tidy data frames.
Maintained by Cole Arendt. Last updated 2 years ago.
192 stars 10.64 score 522 scripts 7 dependentsbusiness-science
modeltime:The Tidymodels Extension for Time Series Modeling
The time series forecasting framework for use with the 'tidymodels' ecosystem. Models include ARIMA, Exponential Smoothing, and additional time series models from the 'forecast' and 'prophet' packages. Refer to "Forecasting Principles & Practice, Second edition" (<https://otexts.com/fpp2/>). Refer to "Prophet: forecasting at scale" (<https://research.facebook.com/blog/2017/02/prophet-forecasting-at-scale/>.).
Maintained by Matt Dancho. Last updated 5 months ago.
arimadata-sciencedeep-learningetsforecastingmachine-learningmachine-learning-algorithmsmodeltimeprophettbatstidymodelingtidymodelstimetime-seriestime-series-analysistimeseriestimeseries-forecasting
551 stars 10.61 score 1.1k scripts 7 dependentsbioc
tximeta:Transcript Quantification Import with Automatic Metadata
Transcript quantification import from Salmon and other quantifiers with automatic attachment of transcript ranges and release information, and other associated metadata. De novo transcriptomes can be linked to the appropriate sources with linkedTxomes and shared for computational reproducibility.
Maintained by Michael Love. Last updated 2 months ago.
annotationgenomeannotationdataimportpreprocessingrnaseqsinglecelltranscriptomicstranscriptiongeneexpressionfunctionalgenomicsreproducibleresearchreportwritingimmunooncology
67 stars 10.58 score 466 scripts 1 dependentshojsgaard
geepack:Generalized Estimating Equation Package
Generalized estimating equations solver for parameters in mean, scale, and correlation structures, through mean link, scale link, and correlation link. Can also handle clustered categorical responses. See e.g. Halekoh and Højsgaard, (2005, <doi:10.18637/jss.v015.i02>), for details.
Maintained by Søren Højsgaard. Last updated 8 months ago.
1 stars 10.57 score 1.7k scripts 43 dependentspharmaverse
ggsurvfit:Flexible Time-to-Event Figures
Ease the creation of time-to-event (i.e. survival) endpoint figures. The modular functions create figures ready for publication. Each of the functions that add to or modify the figure are written as proper 'ggplot2' geoms or stat methods, allowing the functions from this package to be combined with any function or customization from 'ggplot2' and other 'ggplot2' extension packages.
Maintained by Daniel D. Sjoberg. Last updated 2 months ago.
78 stars 10.57 score 640 scripts 2 dependentsbioc
ORFik:Open Reading Frames in Genomics
R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed, as the name hints to it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as it's primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. Package allows to reassign starts of the transcripts with the use of CAGE-Seq data, automatic shifting of RiboSeq reads, finding of Open Reading Frames for whole genomes and much more.
Maintained by Haakon Tjeldnes. Last updated 1 months ago.
immunooncologysoftwaresequencingriboseqrnaseqfunctionalgenomicscoveragealignmentdataimportcpp
33 stars 10.56 score 115 scripts 2 dependentsjmsigner
amt:Animal Movement Tools
Manage and analyze animal movement data. The functionality of 'amt' includes methods to calculate home ranges, track statistics (e.g. step lengths, speed, or turning angles), prepare data for fitting habitat selection analyses, and simulation of space-use from fitted step-selection functions.
Maintained by Johannes Signer. Last updated 5 months ago.
41 stars 10.54 score 418 scriptsbioc
miloR:Differential neighbourhood abundance testing on a graph
Milo performs single-cell differential abundance testing. Cell states are modelled as representative neighbourhoods on a nearest neighbour graph. Hypothesis testing is performed using either a negative bionomial generalized linear model or negative binomial generalized linear mixed model.
Maintained by Mike Morgan. Last updated 5 months ago.
singlecellmultiplecomparisonfunctionalgenomicssoftwareopenblascppopenmp
362 stars 10.49 score 340 scripts 1 dependentsjknowles
merTools:Tools for Analyzing Mixed Effect Regression Models
Provides methods for extracting results from mixed-effect model objects fit with the 'lme4' package. Allows construction of prediction intervals efficiently from large scale linear and generalized linear mixed-effects models. This method draws from the simulation framework used in the Gelman and Hill (2007) textbook: Data Analysis Using Regression and Multilevel/Hierarchical Models.
Maintained by Jared E. Knowles. Last updated 1 years ago.
105 stars 10.49 score 768 scriptsthie1e
cutpointr:Determine and Evaluate Optimal Cutpoints in Binary Classification Tasks
Estimate cutpoints that optimize a specified metric in binary classification tasks and validate performance using bootstrapping. Some methods for more robust cutpoint estimation are supported, e.g. a parametric method assuming normal distributions, bootstrapped cutpoints, and smoothing of the metric values per cutpoint using Generalized Additive Models. Various plotting functions are included. For an overview of the package see Thiele and Hirschfeld (2021) <doi:10.18637/jss.v098.i11>.
Maintained by Christian Thiele. Last updated 4 months ago.
bootstrappingcutpoint-optimizationroc-curvecpp
88 stars 10.44 score 322 scripts 1 dependentsbioc
GENESIS:GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness
The GENESIS package provides methodology for estimating, inferring, and accounting for population and pedigree structure in genetic analyses. The current implementation provides functions to perform PC-AiR (Conomos et al., 2015, Gen Epi) and PC-Relate (Conomos et al., 2016, AJHG). PC-AiR performs a Principal Components Analysis on genome-wide SNP data for the detection of population structure in a sample that may contain known or cryptic relatedness. Unlike standard PCA, PC-AiR accounts for relatedness in the sample to provide accurate ancestry inference that is not confounded by family structure. PC-Relate uses ancestry representative principal components to adjust for population structure/ancestry and accurately estimate measures of recent genetic relatedness such as kinship coefficients, IBD sharing probabilities, and inbreeding coefficients. Additionally, functions are provided to perform efficient variance component estimation and mixed model association testing for both quantitative and binary phenotypes.
Maintained by Stephanie M. Gogarten. Last updated 2 months ago.
snpgeneticvariabilitygeneticsstatisticalmethoddimensionreductionprincipalcomponentgenomewideassociationqualitycontrolbiocviews
36 stars 10.44 score 342 scripts 1 dependentsmbojan
alluvial:Alluvial Diagrams
Creating alluvial diagrams (also known as parallel sets plots) for multivariate and time series-like data.
Maintained by Michał Bojanowski. Last updated 3 years ago.
141 stars 10.43 score 287 scripts 9 dependentsbioc
scRepertoire:A toolkit for single-cell immune receptor profiling
scRepertoire is a toolkit for processing and analyzing single-cell T-cell receptor (TCR) and immunoglobulin (Ig). The scRepertoire framework supports use of 10x, AIRR, BD, MiXCR, Omniscope, TRUST4, and WAT3R single-cell formats. The functionality includes basic clonal analyses, repertoire summaries, distance-based clustering and interaction with the popular Seurat and SingleCellExperiment/Bioconductor R workflows.
Maintained by Nick Borcherding. Last updated 12 days ago.
softwareimmunooncologysinglecellclassificationannotationsequencingcpp
327 stars 10.42 score 240 scriptsdanchaltiel
crosstable:Crosstables for Descriptive Analyses
Create descriptive tables for continuous and categorical variables. Apply summary statistics and counting function, with or without a grouping variable, and create beautiful reports using 'rmarkdown' or 'officer'. You can also compute effect sizes and statistical tests if needed.
Maintained by Dan Chaltiel. Last updated 2 months ago.
descriptive-statisticsflextablefrequency-tablehtml-reportmswordofficer
116 stars 10.40 score 340 scriptsmike-lawrence
ez:Easy Analysis and Visualization of Factorial Experiments
Facilitates easy analysis of factorial experiments, including purely within-Ss designs (a.k.a. "repeated measures"), purely between-Ss designs, and mixed within-and-between-Ss designs. The functions in this package aim to provide simple, intuitive and consistent specification of data analysis and visualization. Visualization functions also include design visualization for pre-analysis data auditing, and correlation matrix visualization. Finally, this package includes functions for non-parametric analysis, including permutation tests and bootstrap resampling. The bootstrap function obtains predictions either by cell means or by more advanced/powerful mixed effects models, yielding predictions and confidence intervals that may be easily visualized at any level of the experiment's design.
Maintained by Michael A. Lawrence. Last updated 8 years ago.
53 stars 10.39 score 2.7k scripts 12 dependentsdicook
nullabor:Tools for Graphical Inference
Tools for visual inference. Generate null data sets and null plots using permutation and simulation. Calculate distance metrics for a lineup, and examine the distributions of metrics.
Maintained by Di Cook. Last updated 2 months ago.
57 stars 10.38 score 370 scripts 2 dependentsegeulgen
pathfindR:Enrichment Analysis Utilizing Active Subnetworks
Enrichment analysis enables researchers to uncover mechanisms underlying a phenotype. However, conventional methods for enrichment analysis do not take into account protein-protein interaction information, resulting in incomplete conclusions. 'pathfindR' is a tool for enrichment analysis utilizing active subnetworks. The main function identifies active subnetworks in a protein-protein interaction network using a user-provided list of genes and associated p values. It then performs enrichment analyses on the identified subnetworks, identifying enriched terms (i.e. pathways or, more broadly, gene sets) that possibly underlie the phenotype of interest. 'pathfindR' also offers functionalities to cluster the enriched terms and identify representative terms in each cluster, to score the enriched terms per sample and to visualize analysis results. The enrichment, clustering and other methods implemented in 'pathfindR' are described in detail in Ulgen E, Ozisik O, Sezerman OU. 2019. 'pathfindR': An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet. <doi:10.3389/fgene.2019.00858>.
Maintained by Ege Ulgen. Last updated 1 months ago.
active-subnetworksenrichmentpathwaypathway-enrichment-analysissubnetwork
187 stars 10.38 score 138 scriptstidymodels
themis:Extra Recipes Steps for Dealing with Unbalanced Data
A dataset with an uneven number of cases in each class is said to be unbalanced. Many models produce a subpar performance on unbalanced datasets. A dataset can be balanced by increasing the number of minority cases using SMOTE 2011 <doi:10.48550/arXiv.1106.1813>, BorderlineSMOTE 2005 <doi:10.1007/11538059_91> and ADASYN 2008 <https://ieeexplore.ieee.org/document/4633969>. Or by decreasing the number of majority cases using NearMiss 2003 <https://www.site.uottawa.ca/~nat/Workshop2003/jzhang.pdf> or Tomek link removal 1976 <https://ieeexplore.ieee.org/document/4309452>.
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
143 stars 10.37 score 1.3k scripts 2 dependentsbcgov
bcdata:Search and Retrieve Data from the BC Data Catalogue
Search, query, and download tabular and 'geospatial' data from the British Columbia Data Catalogue (<https://catalogue.data.gov.bc.ca/>). Search catalogue data records based on keywords, data licence, sector, data format, and B.C. government organization. View metadata directly in R, download many data formats, and query 'geospatial' data available via the B.C. government Web Feature Service ('WFS') using 'dplyr' syntax.
Maintained by Andy Teucher. Last updated 5 days ago.
83 stars 10.36 score 186 scripts 4 dependentsnacnudus
unpivotr:Unpivot Complex and Irregular Data Layouts
Tools for converting data from complex or irregular layouts to a columnar structure. For example, tables with multilevel column or row headers, or spreadsheets. Header and data cells are selected by their contents and position, as well as formatting and comments where available, and are associated with one other by their proximity in given directions. Functions for data frames and HTML tables are provided.
Maintained by Duncan Garmonsway. Last updated 2 months ago.
186 stars 10.35 score 368 scripts 3 dependentsssnn-airr
alakazam:Immunoglobulin Clonal Lineage and Diversity Analysis
Provides methods for high-throughput adaptive immune receptor repertoire sequencing (AIRR-Seq; Rep-Seq) analysis. In particular, immunoglobulin (Ig) sequence lineage reconstruction, lineage topology analysis, diversity profiling, amino acid property analysis and gene usage. Citations: Gupta and Vander Heiden, et al (2017) <doi:10.1093/bioinformatics/btv359>, Stern, Yaari and Vander Heiden, et al (2014) <doi:10.1126/scitranslmed.3008879>.
Maintained by Susanna Marquez. Last updated 3 months ago.
10.33 score 424 scripts 7 dependentsludvigolsen
cvms:Cross-Validation for Model Selection
Cross-validate one or multiple regression and classification models and get relevant evaluation metrics in a tidy format. Validate the best model on a test set and compare it to a baseline evaluation. Alternatively, evaluate predictions from an external model. Currently supports regression and classification (binary and multiclass). Described in chp. 5 of Jeyaraman, B. P., Olsen, L. R., & Wambugu M. (2019, ISBN: 9781838550134).
Maintained by Ludvig Renbo Olsen. Last updated 25 days ago.
39 stars 10.31 score 492 scripts 5 dependentsbioc
pRoloc:A unifying bioinformatics framework for spatial proteomics
The pRoloc package implements machine learning and visualisation methods for the analysis and interogation of quantitiative mass spectrometry data to reliably infer protein sub-cellular localisation.
Maintained by Lisa Breckels. Last updated 5 days ago.
immunooncologyproteomicsmassspectrometryclassificationclusteringqualitycontrolbioconductorproteomics-dataspatial-proteomicsvisualisationopenblascpp
15 stars 10.31 score 101 scripts 2 dependentsbioc
CAMERA:Collection of annotation related methods for mass spectrometry data
Annotation of peaklists generated by xcms, rule based annotation of isotopes and adducts, isotope validation, EIC correlation based tagging of unknown adducts and fragments
Maintained by Steffen Neumann. Last updated 5 months ago.
immunooncologymassspectrometrymetabolomics
11 stars 10.27 score 175 scripts 6 dependentsfacebookexperimental
Robyn:Semi-Automated Marketing Mix Modeling (MMM) from Meta Marketing Science
Semi-Automated Marketing Mix Modeling (MMM) aiming to reduce human bias by means of ridge regression and evolutionary algorithms, enables actionable decision making providing a budget allocation and diminishing returns curves and allows ground-truth calibration to account for causation.
Maintained by Gufeng Zhou. Last updated 13 days ago.
adstockingbudget-allocationcost-response-curveeconometricsevolutionary-algorithmgradient-based-optimisationhyperparameter-optimizationmarketing-mix-modelingmarketing-mix-modellingmarketing-sciencemmmridge-regression
1.3k stars 10.27 score 95 scriptsbioc
EDASeq:Exploratory Data Analysis and Normalization for RNA-Seq
Numerical and graphical summaries of RNA-Seq read data. Within-lane normalization procedures to adjust for GC-content effect (or other gene-level effects) on read counts: loess robust local regression, global-scaling, and full-quantile normalization (Risso et al., 2011). Between-lane normalization procedures to adjust for distributional differences between lanes (e.g., sequencing depth): global-scaling and full-quantile normalization (Bullard et al., 2010).
Maintained by Davide Risso. Last updated 5 months ago.
immunooncologysequencingrnaseqpreprocessingqualitycontroldifferentialexpression
5 stars 10.24 score 594 scripts 9 dependentsropensci
qualtRics:Download 'Qualtrics' Survey Data
Provides functions to access survey results directly into R using the 'Qualtrics' API. 'Qualtrics' <https://www.qualtrics.com/about/> is an online survey and data collection software platform. See <https://api.qualtrics.com/> for more information about the 'Qualtrics' API. This package is community-maintained and is not officially supported by 'Qualtrics'.
Maintained by Julia Silge. Last updated 7 months ago.
apiqualtricsqualtrics-apisurveysurvey-data
221 stars 10.23 score 272 scripts