R-universe search: compare

ngmarchant

comparator:Comparison Functions for Clustering and Record Linkage

Implements functions for comparing strings, sequences and numeric vectors for clustering and record linkage applications. Supported comparison functions include: generalized edit distances for comparing sequences/strings, Monge-Elkan similarity for fuzzy comparison of token sets, and L-p distances for comparing numeric vectors. Where possible, comparison functions are implemented in C/C++ to ensure good performance.

Maintained by Neil Marchant. Last updated 3 years ago.

clustering distance-measures distance-metrics entity-resolution record-linkage similarity-measures string-similarity cpp

94.7 match 18 stars 4.63 score 47 scripts
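The comparison functions named in the comparator description above (edit distance, Monge-Elkan similarity, L-p distance) can be illustrated with a short Python sketch. It is not the package's implementation or API: the function names are hypothetical, the edit distance uses unit costs only, and normalized Levenshtein similarity as the inner token similarity for Monge-Elkan is an assumption.

```python
# Illustrative sketch only; all names here are hypothetical and are not
# part of the comparator package.

def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def token_similarity(a: str, b: str) -> float:
    """Assumed inner similarity: 1 - normalized Levenshtein distance."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def monge_elkan(tokens_a, tokens_b) -> float:
    """Mean, over tokens in A, of the best-matching similarity in B.

    Note that this measure is asymmetric in its arguments.
    """
    if not tokens_a or not tokens_b:
        return 0.0
    best = [max(token_similarity(a, b) for b in tokens_b) for a in tokens_a]
    return sum(best) / len(best)


def lp_distance(x, y, p: float = 2.0) -> float:
    """L-p (Minkowski) distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


print(levenshtein("kitten", "sitting"))                    # 3
print(monge_elkan(["jon", "smith"], ["john", "smith"]))    # 0.875
print(lp_distance([0, 0], [3, 4], p=1))                    # 7.0
```

Token order does not affect the Monge-Elkan score, which is why it suits fuzzy comparison of token sets such as names.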

cran

compare:Comparing Objects for Differences

Functions to compare a model object to a comparison object. If the objects are not identical, the functions can be instructed to explore various modifications of the objects (e.g., sorting rows, dropping names) to see if the modified versions are identical.

Maintained by Paul Murrell. Last updated 10 years ago.

80.3 match 4.68 score 5 dependents
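The idea in the compare description above, testing for identity and then retrying after modifications such as sorting rows or dropping names, can be sketched in Python as follows. This is a minimal illustration under assumptions: the table representation, the transform functions, and `compare_with_transforms` are hypothetical and do not reflect the package's actual interface.

```python
# Illustrative sketch only; the data layout and all function names are
# assumptions made for this example.

def sort_rows(table):
    """Return a copy of the table with its rows in sorted order."""
    return {"names": table["names"], "rows": sorted(table["rows"])}


def drop_names(table):
    """Return a copy of the table with column names removed."""
    return {"names": None, "rows": table["rows"]}


def compare_with_transforms(model, comparison, transforms):
    """Check identity, then apply transforms cumulatively until equal.

    Returns (equal, applied), where applied lists the transform names
    that were tried, in order.
    """
    if model == comparison:
        return True, []
    applied = []
    m, c = model, comparison
    for name, fn in transforms:
        m, c = fn(m), fn(c)
        applied.append(name)
        if m == c:
            return True, applied
    return False, applied


model = {"names": ("id", "x"), "rows": [(1, "a"), (2, "b")]}
comparison = {"names": ("ID", "X"), "rows": [(2, "b"), (1, "a")]}

equal, applied = compare_with_transforms(
    model, comparison,
    [("sort_rows", sort_rows), ("drop_names", drop_names)],
)
print(equal, applied)  # True ['sort_rows', 'drop_names']
```

Reporting which modifications were needed is the useful part: it tells the user how the two objects differ, not just that they do.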

emmanuelparadis

ape:Analyses of Phylogenetics and Evolution

Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.

Maintained by Emmanuel Paradis. Last updated 1 month ago.

openblas cpp

19.4 match 64 stars 17.18 score 13k scripts 601 dependents

collinerickson

comparer:Compare Output and Run Time

Quickly run experiments to compare the run time and output of code blocks. The function mbc() can make fast comparisons of code, and will calculate statistics comparing the resulting outputs. It can be used to compare model fits to the same data or see which function runs faster. The R6 class ffexp$new() runs a function using all possible combinations of selected inputs. This is useful for comparing the effect of different parameter values. It can also run in parallel and automatically save intermediate results, which is very useful for long computations.

Maintained by Collin Erickson. Last updated 5 months ago.

60.6 match 4 stars 5.38 score 20 scripts
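The kind of comparison described for comparer above, running several code blocks and comparing both run time and output, can be sketched in Python with the standard library. This is not a port of mbc(); `compare_blocks` and `manual_sum` are hypothetical names, and the choice of median and minimum as summary statistics is an assumption.

```python
# Illustrative sketch only; compare_blocks and manual_sum are hypothetical.
import statistics
import time


def manual_sum(n: int) -> int:
    """Sum the integers 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def compare_blocks(blocks, repeats: int = 5):
    """Time each callable `repeats` times and keep its last output."""
    summary = {}
    for name, fn in blocks.items():
        times = []
        output = None
        for _ in range(repeats):
            start = time.perf_counter()
            output = fn()
            times.append(time.perf_counter() - start)
        summary[name] = {
            "median_seconds": statistics.median(times),
            "min_seconds": min(times),
            "output": output,
        }
    return summary


result = compare_blocks({
    "builtin_sum": lambda: sum(range(10_000)),
    "manual_loop": lambda: manual_sum(10_000),
})

for name, stats in result.items():
    print(name, stats["median_seconds"], stats["output"])

# Both blocks should agree on the output even if their timings differ.
print(result["builtin_sum"]["output"] == result["manual_loop"]["output"])  # True
```

Timings vary between runs and machines, so only the outputs are compared for equality here.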

rspatial

terra:Spatial Data Analysis

Methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data. Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data. Processing of very large files is supported. See the manual and tutorials on <https://rspatial.org/> to get started. 'terra' replaces the 'raster' package ('terra' can do more, and it is faster and easier to use).

Maintained by Robert J. Hijmans. Last updated 4 hours ago.

geospatial raster spatial vector onetbb proj gdal geos cpp

15.9 match 559 stars 17.64 score 17k scripts 851 dependents

davidorme

caper:Comparative Analyses of Phylogenetics and Evolution in R

Functions for performing phylogenetic comparative analyses.

Maintained by David Orme. Last updated 1 year ago.

32.3 match 1 star 7.41 score 928 scripts 5 dependents

bioc

maftools:Summarize, Analyze and Visualize MAF Files

Analyze and visualize Mutation Annotation Format (MAF) files from large-scale sequencing studies. This package provides various functions to perform the most commonly used analyses in cancer genomics and to create feature-rich, customizable visualizations with minimal effort.

Maintained by Anand Mayakonda. Last updated 5 months ago.

datarepresentation dnaseq visualization drivermutation variantannotation featureextraction classification somaticmutation sequencing functionalgenomics survival bioinformatics cancer-genome-atlas cancer-genomics genomics maf-files tcga curl bzip2 xz-utils zlib

15.1 match 459 stars 14.63 score 948 scripts 18 dependents

geomorphr

geomorph:Geometric Morphometric Analyses of 2D and 3D Landmark Data

Read, manipulate, and digitize landmark data, generate shape variables via Procrustes analysis for points, curves and surfaces, perform shape analyses, and provide graphical depictions of shapes and patterns of shape variation.

Maintained by Dean Adams. Last updated 1 month ago.

16.1 match 76 stars 12.05 score 700 scripts 6 dependents

amices

mice:Multivariate Imputation by Chained Equations

Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.

Maintained by Stef van Buuren. Last updated 6 days ago.

chained-equations fcs imputation mice missing-data missing-values multiple-imputation multivariate-data cpp

11.0 match 462 stars 16.50 score 10k scripts 154 dependents

capitalone

dataCompareR:Compare Two Data Frames and Summarise the Difference

Easy comparison of two tabular data objects in R. Specifically designed to show differences between two sets of data in a useful way that should make it easier to understand the differences, and if necessary, help you work out how to remedy them. Aims to offer a more useful output than all.equal() when your two data sets do not match, but isn't intended to replace all.equal() as a way to test for equality.

Maintained by Sarah Johnston. Last updated 2 years ago.

compare-data data data-analysis data-science

22.5 match 76 stars 7.24 score 76 scripts

hfgolino

EGAnet:Exploratory Graph Analysis – a Framework for Estimating the Number of Dimensions in Multivariate Data using Network Psychometrics

Implements the Exploratory Graph Analysis (EGA) framework for dimensionality and psychometric assessment. EGA estimates the number of dimensions in psychological data using network estimation methods and community detection algorithms. A bootstrap method is provided to assess the stability of dimensions and items. Fit is evaluated using the Entropy Fit family of indices. Unique Variable Analysis evaluates the extent to which items are locally dependent (or redundant). Network loadings provide similar information to factor loadings and can be used to compute network scores. A bootstrap and permutation approach are available to assess configural and metric invariance. Hierarchical structures can be detected using Hierarchical EGA. Time series and intensive longitudinal data can be analyzed using Dynamic EGA, supporting individual, group, and population level assessments.

Maintained by Hudson Golino. Last updated 9 days ago.

20.0 match 47 stars 7.80 score 61 scripts 1 dependent

rspatial

raster:Geographic Data Analysis and Modeling

Reading, writing, manipulating, analyzing and modeling of spatial data. This package has been superseded by the "terra" package <https://CRAN.R-project.org/package=terra>.

Maintained by Robert J. Hijmans. Last updated 2 months ago.

cpp

8.8 match 164 stars 17.05 score 58k scripts 555 dependents

bioc

survcomp:Performance Assessment and Comparison for Survival Analysis

Assessment and Comparison for Performance of Risk Prediction (Survival) Models.

Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.

geneexpression differentialexpression visualization cpp

17.7 match 8.46 score 448 scripts 12 dependents

bnowok

synthpop:Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control

A tool for producing synthetic versions of microdata containing confidential information so that they are safe to be released to users for exploratory analysis. The key objective of generating synthetic data is to replace sensitive original values with synthetic ones causing minimal distortion of the statistical information contained in the data set. Variables, which can be categorical or continuous, are synthesised one-by-one using sequential modelling. Replacements are generated by drawing from conditional distributions fitted to the original data using parametric or classification and regression trees models. Data are synthesised via the function syn() which can be largely automated, if default settings are used, or with methods defined by the user. Optional parameters can be used to influence the disclosure risk and the analytical quality of the synthesised data. For a description of the implemented method see Nowok, Raab and Dibben (2016) <doi:10.18637/jss.v074.i11>.

Maintained by Beata Nowok. Last updated 3 years ago.

19.1 match 44 stars 7.85 score 536 scripts

talgalili

dendextend:Extending 'dendrogram' Functionality in R

Offers a set of functions for extending 'dendrogram' objects in R, letting you visualize and compare trees of 'hierarchical clusterings'. You can (1) Adjust a tree's graphical parameters - the color, size, type, etc. of its branches, nodes and labels. (2) Visually and statistically compare different 'dendrograms' to one another.

Maintained by Tal Galili. Last updated 2 months ago.

8.5 match 154 stars 17.02 score 6.0k scripts 164 dependents

thackl

gggenomes:A Grammar of Graphics for Comparative Genomics

An extension of 'ggplot2' for creating complex genomic maps. It builds on the power of 'ggplot2' and 'tidyverse' adding new 'ggplot2'-style geoms & positions and 'dplyr'-style verbs to manipulate the underlying data. It implements a layout concept inspired by 'ggraph' and introduces tracks to bring tidiness to the mess that is genomics data.

Maintained by Thomas Hackl. Last updated 1 month ago.

biological-data comparative-genomics genomics-visualization ggplot-extension ggplot2

14.8 match 650 stars 9.56 score 123 scripts

tidyverse

tibble:Simple Data Frames

Provides a 'tbl_df' class (the 'tibble') with stricter checking and better formatting than the traditional data frame.

Maintained by Kirill Müller. Last updated 3 months ago.

tidy-data

6.1 match 692 stars 22.78 score 47k scripts 11k dependents

braverock

PerformanceAnalytics:Econometric Tools for Performance and Risk Analysis

Collection of econometric functions for performance and risk analysis. In addition to standard risk and performance metrics, this package aims to aid practitioners and researchers in utilizing the latest research in analysis of non-normal return streams. In general, it is most tested on return (rather than price) data on a regular scale, but most functions will work with irregular return data as well, and increasing numbers of functions will work with P&L or price data where possible.

Maintained by Brian G. Peterson. Last updated 3 months ago.

8.4 match 222 stars 15.93 score 4.8k scripts 20 dependents

liamrevell

phytools:Phylogenetic Tools for Comparative Biology (and Other Things)

A wide range of methods for phylogenetic analysis - concentrated in phylogenetic comparative biology, but also including numerous techniques for visualizing, analyzing, manipulating, reading or writing, and even inferring phylogenetic trees. Included among the functions in phylogenetic comparative biology are various for ancestral state reconstruction, model-fitting, and simulation of phylogenies and trait data. A broad range of plotting methods for phylogenies and comparative data include (but are not restricted to) methods for mapping trait evolution on trees, for projecting trees into phenotype space or onto a geographic map, and for visualizing correlated speciation between trees. Lastly, numerous functions are designed for reading, writing, analyzing, inferring, simulating, and manipulating phylogenetic trees and comparative data. For instance, there are functions for computing consensus phylogenies from a set, for simulating phylogenetic trees and data under a range of models, for randomly or non-randomly attaching species or clades to a tree, as well as for a wide range of other manipulations and analyses that phylogenetic biologists might find useful in their research.

Maintained by Liam J. Revell. Last updated 27 days ago.

9.7 match 218 stars 13.85 score 4.8k scripts 76 dependents

kingaa

ouch:Ornstein-Uhlenbeck Models for Phylogenetic Comparative Hypotheses

Fit and compare Ornstein-Uhlenbeck models for evolution along a phylogenetic tree.

Maintained by Aaron A. King. Last updated 4 months ago.

adaptive-regime brownian-motion ornstein-uhlenbeck ornstein-uhlenbeck-models ouch phylogenetic-comparative-hypotheses phylogenetic-comparative-methods phylogenetic-data react

18.6 match 15 stars 6.87 score 68 scripts 4 dependents

stan-dev

loo:Efficient Leave-One-Out Cross-Validation and WAIC for Bayesian Models

Efficient approximate leave-one-out cross-validation (LOO) for Bayesian models fit using Markov chain Monte Carlo, as described in Vehtari, Gelman, and Gabry (2017) <doi:10.1007/s11222-016-9696-4>. The approximation uses Pareto smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. As a byproduct of the calculations, we also obtain approximate standard errors for estimated predictive errors and for the comparison of predictive errors between models. The package also provides methods for using stacking and other model weighting techniques to average Bayesian predictive distributions.

Maintained by Jonah Gabry. Last updated 2 days ago.

bayes bayesian bayesian-data-analysis bayesian-inference bayesian-methods bayesian-statistics cross-validation information-criterion model-comparison stan

7.2 match 152 stars 17.30 score 2.6k scripts 297 dependents

walkerke

mapgl:Interactive Maps with 'Mapbox GL JS' and 'MapLibre GL JS'

Provides an interface to the 'Mapbox GL JS' (<https://docs.mapbox.com/mapbox-gl-js/guides>) and the 'MapLibre GL JS' (<https://maplibre.org/maplibre-gl-js/docs/>) interactive mapping libraries to help users create custom interactive maps in R. Users can create interactive globe visualizations; layer 'sf' objects to create filled maps, circle maps, 'heatmaps', and three-dimensional graphics; and customize map styles and views. The package also includes utilities to use 'Mapbox' and 'MapLibre' maps in 'Shiny' web applications.

Maintained by Kyle Walker. Last updated 10 hours ago.

15.2 match 114 stars 8.06 score 138 scripts

skranz

RTutor:Interactive R problem sets with automatic testing of solutions and automatic hints

Interactive R problem sets with automatic testing of solutions and automatic hints

Maintained by Sebastian Kranz. Last updated 1 year ago.

economics learn-to-code problem-set rstudio rtutor shiny teaching

19.6 match 205 stars 5.83 score 111 scripts 1 dependent

paternogbc

sensiPhy:Sensitivity Analysis for Comparative Methods

An implementation of sensitivity analysis for phylogenetic comparative methods. The package is an umbrella of statistical and graphical methods that estimate and report different types of uncertainty in PCM: (i) Species Sampling uncertainty (sample size; influential species and clades). (ii) Phylogenetic uncertainty (different topologies and/or branch lengths). (iii) Data uncertainty (intraspecific variation and measurement error).

Maintained by Gustavo Paterno. Last updated 5 years ago.

comparative-methods ecology evolution phylogenetics sensitivity-analysis

17.6 match 13 stars 6.38 score 61 scripts

cran

nlme:Linear and Nonlinear Mixed Effects Models

Fit and compare Gaussian linear and nonlinear mixed-effects models.

Maintained by R Core Team. Last updated 2 months ago.

fortran

8.6 match 6 stars 13.00 score 13k scripts 8.7k dependents

ax3man

phylopath:Perform Phylogenetic Path Analysis

A comprehensive and easy to use R implementation of confirmatory phylogenetic path analysis as described by Von Hardenberg and Gonzalez-Voyer (2012) <doi:10.1111/j.1558-5646.2012.01790.x>.

Maintained by Wouter van der Bijl. Last updated 6 months ago.

analysis comparative-methods path phylogenetics

13.7 match 13 stars 8.10 score 81 scripts 1 dependent

ms609

TreeDist:Calculate and Map Distances Between Phylogenetic Trees

Implements measures of tree similarity, including information-based generalized Robinson-Foulds distances (Phylogenetic Information Distance, Clustering Information Distance, Matching Split Information Distance; Smith 2020) <doi:10.1093/bioinformatics/btaa614>; Jaccard-Robinson-Foulds distances (Bocker et al. 2013) <doi:10.1007/978-3-642-40453-5_13>, including the Nye et al. (2006) metric <doi:10.1093/bioinformatics/bti720>; the Matching Split Distance (Bogdanowicz & Giaro 2012) <doi:10.1109/TCBB.2011.48>; Maximum Agreement Subtree distances; the Kendall-Colijn (2016) distance <doi:10.1093/molbev/msw124>, and the Nearest Neighbour Interchange (NNI) distance, approximated per Li et al. (1996) <doi:10.1007/3-540-61332-3_168>. Includes tools for visualizing mappings of tree space (Smith 2022) <doi:10.1093/sysbio/syab100>, for identifying islands of trees (Silva and Wilkinson 2021) <doi:10.1093/sysbio/syab015>, for calculating the median of sets of trees, and for computing the information content of trees and splits.

Maintained by Martin R. Smith. Last updated 1 month ago.

phylogenetics tree-distance phylogenetic-trees tree-distances trees cpp

10.3 match 32 stars 10.32 score 97 scripts 5 dependents

lem-usp

evolqg:Evolutionary Quantitative Genetics

Provides functions for covariance matrix comparisons, estimation of repeatabilities in measurements and matrices, and general evolutionary quantitative genetics tools. Melo D, Garcia G, Hubbe A, Assis A P, Marroig G. (2016) <doi:10.12688/f1000research.7082.3>.

Maintained by Diogo Melo. Last updated 11 months ago.

openblas cpp

16.7 match 10 stars 6.26 score 114 scripts

mjskay

tidybayes:Tidy Data and 'Geoms' for Bayesian Models

Compose data for and extract, manipulate, and visualize posterior draws from Bayesian models ('JAGS', 'Stan', 'rstanarm', 'brms', 'MCMCglmm', 'coda', ...) in a tidy data format. Functions are provided to help extract tidy data frames of draws from Bayesian models and that generate point summaries and intervals in a tidy format. In addition, 'ggplot2' 'geoms' and 'stats' are provided for common visualization primitives like points with multiple uncertainty intervals, eye plots (intervals plus densities), and fit curves with multiple, arbitrary uncertainty bands.

Maintained by Matthew Kay. Last updated 6 months ago.

bayesian-data-analysis brms ggplot2 jags stan tidy-data visualization

7.0 match 732 stars 14.88 score 7.3k scripts 19 dependents

igraph

igraph:Network Analysis and Visualization

Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.

Maintained by Kirill Müller. Last updated 2 days ago.

complex-networks graph-algorithms graph-theory mathematics network-analysis network-graph fortran libxml2 glpk openblas cpp

4.8 match 581 stars 21.10 score 31k scripts 1.9k dependents

bioc

SparseArray:High-performance sparse data representation and manipulation in R

The SparseArray package provides array-like containers for efficient in-memory representation of multidimensional sparse data in R (arrays and matrices). The package defines the SparseArray virtual class and two concrete subclasses: COO_SparseArray and SVT_SparseArray. Each subclass uses its own internal representation of the nonzero multidimensional data: the "COO layout" and the "SVT layout", respectively. SVT_SparseArray objects mimic as much as possible the behavior of ordinary matrix and array objects in base R. In particular, they support most of the "standard matrix and array API" defined in base R and in the matrixStats package from CRAN.

Maintained by Hervé Pagès. Last updated 23 days ago.

infrastructure datarepresentation bioconductor-package core-package openmp

7.9 match 8 stars 12.68 score 79 scripts 1.2k dependents

kbroman

qtl:Tools for Analyzing QTL Experiments

Analysis of experimental crosses to identify genes (called quantitative trait loci, QTLs) contributing to variation in quantitative traits. Broman et al. (2003) <doi:10.1093/bioinformatics/btg112>.

Maintained by Karl W Broman. Last updated 7 months ago.

openblas

7.5 match 80 stars 12.79 score 2.4k scripts 29 dependents

andrewljackson

SIBER:Stable Isotope Bayesian Ellipses in R

Fits bi-variate ellipses to stable isotope data using Bayesian inference with the aim being to describe and compare their isotopic niche.

Maintained by Andrew Jackson. Last updated 10 months ago.

community-ecology ecology niche-modelling stable-isotopes jags cpp

10.5 match 36 stars 9.13 score 187 scripts 1 dependent

bioc

YAPSA:Yet Another Package for Signature Analysis

This package provides functions and routines for supervised analyses of mutational signatures (i.e., the signatures have to be known, cf. L. Alexandrov et al., Nature 2013 and L. Alexandrov et al., bioRxiv 2018). In particular, the family of functions LCD (LCD = linear combination decomposition) can use optimal signature-specific cutoffs, which account for the different detectability of the different signatures. Moreover, the package provides different sets of mutational signatures, including the COSMIC and PCAWG SNV signatures and the PCAWG Indel signatures; the latter implies that, with YAPSA, the concept of supervised analysis of mutational signatures is extended to Indel signatures. YAPSA also provides confidence intervals as computed by profile likelihoods and can perform signature analysis on a stratified mutational catalogue (SMC = stratify mutational catalogue) in order to analyze enrichment and depletion patterns for the signatures in different strata.

Maintained by Zuguang Gu. Last updated 5 months ago.

sequencing dnaseq somaticmutation visualization clustering genomicvariation statisticalmethod biologicalquestion

14.8 match 6.41 score 57 scripts

bioc

cogeqc:Systematic quality checks on comparative genomics analyses

cogeqc aims to facilitate systematic quality checks on standard comparative genomics analyses to help researchers detect issues and select the most suitable parameters for each data set. cogeqc can be used to assess: i. genome assembly and annotation quality with BUSCOs and comparisons of statistics with publicly available genomes on the NCBI; ii. orthogroup inference using a protein domain-based approach; and iii. synteny detection using synteny network properties. There are also data visualization functions to explore QC summary statistics.

Maintained by Fabrício Almeida-Silva. Last updated 5 months ago.

software genomeassembly comparativegenomics functionalgenomics phylogenetics qualitycontrol network comparative-genomics evolutionary-genomics

15.6 match 10 stars 6.08 score 20 scripts

stmcg

metamedian:Meta-Analysis of Medians

Implements several methods to meta-analyze studies that report the sample median of the outcome. The methods described by McGrath et al. (2019) <doi:10.1002/sim.8013>, Ozturk and Balakrishnan (2020) <doi:10.1002/sim.8738>, and McGrath et al. (2020a) <doi:10.1002/bimj.201900036> can be applied to directly meta-analyze the median or difference of medians between groups. Additionally, a number of methods (e.g., McGrath et al. (2020b) <doi:10.1177/0962280219889080>, Cai et al. (2021) <doi:10.1177/09622802211047348>, and McGrath et al. (2023) <doi:10.1177/09622802221139233>) are implemented to estimate study-specific (difference of) means and their standard errors in order to estimate the pooled (difference of) means. Methods for meta-analyzing median survival times (McGrath et al. (2025) <doi:10.48550/arXiv.2503.03065>) are also implemented. See McGrath et al. (2024) <doi:10.1002/jrsm.1686> for a detailed guide on using the package.

Maintained by Sean McGrath. Last updated 8 days ago.

19.5 match 9 stars 4.86 score 16 scripts

gefeizhang

statVisual:Statistical Visualization Tools

Visualization functions in the applications of translational medicine (TM) and biomarker (BM) development to compare groups by statistically visualizing data and/or results of analyses, such as visualizing data by displaying in one figure different groups' histograms, boxplots, densities, scatter plots, error-bar plots, or trajectory plots, by displaying scatter plots of top principal components or dendrograms with data points colored based on group information, or visualizing volcano plots to check the results of whole genome analyses for gene differential expression.

Maintained by Wenfei Zhang. Last updated 5 years ago.

31.5 match 3.00 score 3 scripts

insightsengineering

rtables:Reporting Tables

Reporting tables often have structure that goes beyond simple rectangular data. The 'rtables' package provides a framework for declaring complex multi-level tabulations and then applying them to data. This framework models both tabulation and the resulting tables as hierarchical, tree-like objects which support sibling sub-tables, arbitrary splitting or grouping of data in row and column dimensions, cells containing multiple values, and the concept of contextual summary computations. A convenient pipe-able interface is provided for declaring table layouts and the corresponding computations, and then applying them to data.

Maintained by Joe Zhu. Last updated 2 months ago.

pharmaceuticals tables

6.7 match 232 stars 13.65 score 238 scripts 17 dependents

donaldrwilliams

BGGM:Bayesian Gaussian Graphical Models

Fit Bayesian Gaussian graphical models. The methods are separated into two Bayesian approaches for inference: hypothesis testing and estimation. There are extensions for confirmatory hypothesis testing, comparing Gaussian graphical models, and node wise predictability. These methods were recently introduced in the Gaussian graphical model literature, including Williams (2019) <doi:10.31234/osf.io/x8dpr>, Williams and Mulder (2019) <doi:10.31234/osf.io/ypxd8>, Williams, Rast, Pericchi, and Mulder (2019) <doi:10.31234/osf.io/yt386>.

Maintained by Philippe Rast. Last updated 3 months ago.

bayes-factors bayesian-hypothesis-testing gaussian-graphical-models openblas cpp openmp

9.4 match 55 stars 9.64 score 102 scripts 1 dependent

loukiaspin

rnmamod:Bayesian Network Meta-Analysis with Missing Participants

A comprehensive suite of functions to perform and visualise pairwise and network meta-analysis with aggregate binary or continuous missing participant outcome data. The package covers core Bayesian one-stage models implemented in a systematic review with multiple interventions, including fixed-effect and random-effects network meta-analysis, meta-regression, evaluation of the consistency assumption via the node-splitting approach and the unrelated mean effects model (original and revised model proposed by Spineli, (2022) <doi:10.1177/0272989X211068005>), and sensitivity analysis (see Spineli et al., (2021) <doi:10.1186/s12916-021-02195-y>). Missing participant outcome data are addressed in all models of the package (see Spineli, (2019) <doi:10.1186/s12874-019-0731-y>, Spineli et al., (2019) <doi:10.1002/sim.8207>, Spineli, (2019) <doi:10.1016/j.jclinepi.2018.09.002>, and Spineli et al., (2021) <doi:10.1002/jrsm.1478>). The robustness to primary analysis results can also be investigated using a novel intuitive index (see Spineli et al., (2021) <doi:10.1177/0962280220983544>). Methods to evaluate the transitivity assumption quantitatively are provided (see Spineli, (2024) <doi:10.1186/s12874-024-02436-7>). A novel index to facilitate interpretation of local inconsistency is also available (see Spineli, (2024) <doi:10.1186/s13643-024-02680-4>). The package also offers a rich, user-friendly visualisation toolkit that aids in appraising and interpreting the results thoroughly and preparing the manuscript for journal submission. The visualisation tools comprise the network plot, forest plots, panel of diagnostic plots, heatmaps on the extent of missing participant outcome data in the network, league heatmaps on estimation and prediction, rankograms, Bland-Altman plot, leverage plot, deviance scatterplot, heatmap of robustness, barplot of Kullback-Leibler divergence, heatmap of comparison dissimilarities and dendrogram of comparison clustering. The package also allows the user to export the results to an Excel file at the working directory.

Maintained by Loukia Spineli. Last updated 9 days ago.

jags cpp

13.5 match 5 stars 6.64 score 12 scripts

biodiverse

ubms:Bayesian Models for Data from Unmarked Animals using 'Stan'

Fit Bayesian hierarchical models of animal abundance and occurrence via the 'rstan' package, the R interface to the 'Stan' C++ library. Supported models include single-season occupancy, dynamic occupancy, and N-mixture abundance models. Covariates on model parameters are specified using a formula-based interface similar to package 'unmarked', while also allowing for estimation of random slope and intercept terms. References: Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>; Fiske and Chandler (2011) <doi:10.18637/jss.v043.i10>.

Maintained by Ken Kellner. Last updated 17 days ago.

distance-sampling hierarchical-models n-mixture-model occupancy stan openblas cpp

11.3 match 35 stars 7.88 score 73 scripts

richarddmorey

BayesFactor:Computation of Bayes Factors for Common Designs

A suite of functions for computing various Bayes factors for simple designs, including contingency tables, one- and two-sample designs, one-way designs, general ANOVA designs, and linear regression.

Maintained by Richard D. Morey. Last updated 1 year ago.

cpp

6.5 match 133 stars 13.70 score 1.7k scripts 21 dependents

bioc

ComplexHeatmap:Make Complex Heatmaps

Complex heatmaps are efficient to visualize associations between different sources of data sets and reveal potential patterns. Here the ComplexHeatmap package provides a highly flexible way to arrange multiple heatmaps and supports various annotation graphics.

Maintained by Zuguang Gu. Last updated 5 months ago.

software visualization sequencing clustering complex-heatmaps heatmap

5.2 match 1.3k stars 16.93 score 16k scripts 151 dependents

bioc

musicatk:Mutational Signature Comprehensive Analysis Toolkit

Mutational signatures are carcinogenic exposures or aberrant cellular processes that can cause alterations to the genome. We created musicatk (MUtational SIgnature Comprehensive Analysis ToolKit) to address shortcomings in versatility and ease of use in other pre-existing computational tools. Although many different types of mutational data have been generated, current software packages do not have a flexible framework to allow users to mix and match different types of mutations in the mutational signature inference process. Musicatk enables users to count and combine multiple mutation types, including SBS, DBS, and indels. Musicatk calculates replication strand, transcription strand and combinations of these features along with discovery from unique and proprietary genomic features associated with any mutation type. Musicatk also implements several methods for discovery of new signatures as well as methods to infer exposure given an existing set of signatures. Musicatk provides functions for visualization and downstream exploratory analysis including the ability to compare signatures between cohorts and find matching signatures in COSMIC V2 or COSMIC V3.

Maintained by Joshua D. Campbell. Last updated 5 months ago.

software biologicalquestion somaticmutation variantannotation

12.4 match 13 stars 7.02 score 20 scripts

kajlinko

testCompareR:Comparing Two Diagnostic Tests with Dichotomous Results using Paired Data

Provides a method for comparing the results of two binary diagnostic tests using paired data. Users can rapidly perform descriptive and inferential statistics in a single function call. Options permit users to select which parameters they are interested in comparing and methods for correction for multiple comparisons. Confidence intervals are calculated using the methods with the best coverage. Hypothesis tests use the methods with the best asymptotic performance. A summary of the methods is available in Roldán-Nofuentes (2020) <doi:10.1186/s12874-020-00988-y>. This package is targeted at clinical researchers who want to rapidly and effectively compare results from binary diagnostic tests.

Maintained by Kyle J. Wilson. Last updated 4 months ago.

19.8 match 4.30 score 4 scripts

bioc

variancePartition:Quantify and interpret drivers of variation in multilevel gene expression experiments

Quantify and interpret multiple sources of biological and technical variation in gene expression experiments. Uses a linear mixed model to quantify variation in gene expression attributable to individual, tissue, time point, or technical variables. Includes dream differential expression analysis for repeated measures.

Maintained by Gabriel E. Hoffman. Last updated 2 months ago.

rnaseq geneexpression genesetenrichment differentialexpression batcheffect qualitycontrol regression epigenetics functionalgenomics transcriptomics normalization preprocessing microarray immunooncology software

7.2 match 7 stars 11.69 score 1.1k scripts 3 dependents

joe-chelladurai

uxr:User Experience Research

Provides convenience functions for user experience research with an emphasis on quantitative user experience testing and reporting. The functions are designed to translate statistical approaches to applied user experience research.

Maintained by Joe Chelladurai. Last updated 2 years ago.

quantitative statistics ux-research

22.7 match 1 star 3.70 score 10 scripts

bioc

clustifyr:Classifier for Single-cell RNA-seq Using Cell Clusters

Package designed to aid in classifying cells from single-cell RNA sequencing data using external reference data (e.g., bulk RNA-seq, scRNA-seq, microarray, gene lists). A variety of correlation based methods and gene list enrichment methods are provided to assist cell type assignment.

Maintained by Rui Fu. Last updated 5 months ago.

singlecell annotation sequencing microarray geneexpression assign-identities clusters marker-genes rna-seq single-cell-rna-seq

8.5 match 119 stars 9.63 score 296 scripts

dgbonett

vcmeta:Varying Coefficient Meta-Analysis

Implements functions for varying coefficient meta-analysis methods. These methods do not assume effect size homogeneity. Subgroup effect size comparisons, general linear effect size contrasts, and linear models of effect sizes based on varying coefficient methods can be used to describe effect size heterogeneity. Varying coefficient meta-analysis methods do not require the unrealistic assumptions of the traditional fixed-effect and random-effects meta-analysis methods. For details see: Statistical Methods for Psychologists, Volume 5, <https://dgbonett.sites.ucsc.edu/>.

Maintained by Douglas G. Bonett. Last updated 8 months ago.

27.1 match 1 star 3.00 score 8 scripts

pecanproject

PEcAn.data.atmosphere:PEcAn Functions Used for Managing Climate Driver Data

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The PEcAn.data.atmosphere package converts climate driver data into a standard format for models integrated into PEcAn. As a standalone package, it provides an interface to access diverse climate data sets.

Maintained by David LeBauer. Last updated 1 day ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

6.7 match 216 stars 11.59 score 64 scripts 14 dependents

jiscah

sequoia:Pedigree Inference from SNPs

Multi-generational pedigree inference from incomplete data on hundreds of SNPs, including parentage assignment and sibship clustering. See Huisman (2017) (<DOI:10.1111/1755-0998.12665>) for more information.

Maintained by Jisca Huisman. Last updated 9 months ago.

pedigree pedigree-reconstruction pedigrees sequoia snp snp-data fortran

10.5 match 26 stars 7.40 score 79 scripts

jclavel

mvMORPH:Multivariate Comparative Tools for Fitting Evolutionary Models to Morphometric Data

Fits multivariate (Brownian Motion, Early Burst, ACDC, Ornstein-Uhlenbeck and Shifts) models of continuous traits evolution on trees and time series. 'mvMORPH' also proposes high-dimensional multivariate comparative tools (linear models using Generalized Least Squares and multivariate tests) based on penalized likelihood. See Clavel et al. (2015) <DOI:10.1111/2041-210X.12420>, Clavel et al. (2019) <DOI:10.1093/sysbio/syy045>, and Clavel & Morlon (2020) <DOI:10.1093/sysbio/syaa010>.

Maintained by Julien Clavel. Last updated 1 month ago.

openblas

8.2 match 17 stars 9.46 score 189 scripts 3 dependents

bioc

SPIA:Signaling Pathway Impact Analysis (SPIA) using combined evidence of pathway over-representation and unusual signaling perturbations

This package implements the Signaling Pathway Impact Analysis (SPIA), which uses the information from a list of differentially expressed genes and their log fold changes, together with signaling pathway topology, to identify the pathways most relevant to the condition under study.

Maintained by Adi Laurentiu Tarca. Last updated 2 months ago.

microarray graphandnetwork

11.7 match 6.62 score 113 scripts 4 dependents

r-lib

waldo:Find Differences Between R Objects

Compare complex R objects and reveal the key differences. Designed particularly for use in testing packages where being able to quickly isolate key differences makes understanding test failures much easier.

Maintained by Hadley Wickham. Last updated 4 months ago.

diff testing

5.5 match 291 stars 13.95 score 143 scripts 480 dependents

cristianetaniguti

onemap:Construction of Genetic Maps in Experimental Crosses

Analysis of molecular marker data from model (backcrosses, F2 and recombinant inbred lines) and non-model systems (i.e., outcrossing species). For the latter, it allows statistical analysis by simultaneously estimating linkage and linkage phases (genetic map construction) according to Wu et al. (2002) <doi:10.1006/tpbi.2002.1577>. All analyses are based on multipoint approaches using hidden Markov models.

Maintained by Cristiane Taniguti. Last updated 2 months ago.

cpp

11.5 match 3 stars 6.58 score 183 scripts

vandomed

tab:Create Summary Tables for Statistical Reports

Contains functions for creating various types of summary tables, e.g. comparing characteristics across levels of a categorical variable and summarizing fitted generalized linear models, generalized estimating equations, and Cox proportional hazards models. Functions are available to handle data from simple random samples as well as complex surveys.

Maintained by Dane R. Van Domelen. Last updated 4 years ago.

manuscripts reports reproducible-research statistics tables

10.8 match 2 stars 6.97 score 86 scripts 9 dependents

bioc

cola:A Framework for Consensus Partitioning

Subgroup classification is a basic task in genomic data analysis, especially for gene expression and DNA methylation data analysis. It can also be used to test the agreement with known clinical annotations, or to test whether there exist significant batch effects. The cola package provides a general framework for subgroup classification by consensus partitioning. It has the following features: 1. It modularizes the consensus partitioning process so that various methods can be easily integrated. 2. It provides rich visualizations for interpreting the results. 3. It allows running multiple methods at the same time and provides functionality to compare results straightforwardly. 4. It provides a new method to extract features that separate subgroups more efficiently. 5. It automatically generates detailed reports for the complete analysis. 6. It allows applying consensus partitioning in a hierarchical manner.

Maintained by Zuguang Gu. Last updated 1 month ago.

clustering geneexpression classification software consensus-clustering cpp

9.9 match 61 stars 7.49 score 112 scripts

alexsanjoseph

compareDF:Do a Git Style Diff of the Rows Between Two Dataframes with Similar Structure

Compares two dataframes which have the same column structure to show the rows that have changed. Also gives a git style diff format to quickly see what has changed in addition to summary statistics.

Maintained by Alex Joseph. Last updated 1 year ago.

compare-data

10.0 match 93 stars 7.30 score 119 scripts 2 dependents

winvector

WVPlots:Common Plots for Analysis

Select data analysis plots, under a standardized calling interface implemented on top of 'ggplot2' and 'plotly'. Plots of interest include: 'ROC', gain curve, scatter plot with marginal distributions, conditioned scatter plot with marginal densities, box and stem with matching theoretical distribution, and density with matching theoretical distribution.

Maintained by John Mount. Last updated 11 months ago.

9.0 match 85 stars 8.00 score 280 scripts

bioc

sesame:SEnsible Step-wise Analysis of DNA MEthylation BeadChips

Tools for analyzing Illumina Infinium DNA methylation arrays. SeSAMe provides utilities to support analyses of multiple generations of Infinium DNA methylation BeadChips, including preprocessing, quality control, visualization and inference. SeSAMe features accurate detection calling, intelligent inference of ethnicity, sex and advanced quality control routines.

Maintained by Wanding Zhou. Last updated 2 months ago.

dnamethylation methylationarray preprocessing qualitycontrol bioinformatics dna-methylation microarray

7.9 match 69 stars 9.08 score 258 scripts 1 dependent

bioc

Biobase:Biobase: Base functions for Bioconductor

Functions that are needed by many other packages or which replace R functions.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

infrastructure bioconductor-package core-package

4.3 match 9 stars 16.45 score 6.6k scripts 1.8k dependents

marce10

warbleR:Streamline Bioacoustic Analysis

Functions aiming to facilitate the analysis of the structure of animal acoustic signals in 'R'. 'warbleR' makes use of the basic sound analysis tools from the packages 'tuneR' and 'seewave', and offers new tools to explore and quantify acoustic signal structure. The package allows users to organize and manipulate multiple sound files, create spectrograms of complete recordings or individual signals in different formats, run several measures of acoustic structure, and characterize different structural levels in acoustic signals.

Maintained by Marcelo Araya-Salas. Last updated 2 months ago.

animal-acoustic-signals audio-processing bioacoustics spectrogram streamline-analysis cpp

6.5 match 54 stars 11.01 score 270 scripts 4 dependents

framverse

framrsquared:FRAM Database Interface

A convenient tool for interfacing with FRAM access databases in R environments.

Maintained by Ty Garber. Last updated 2 months ago.

14.0 match 6 stars 5.06 score 9 scripts

data-cleaning

validate:Data Validation Infrastructure

Declare data validation rules and data quality indicators; confront data with them and analyze or visualize the results. The package supports rules that are per-field, in-record, cross-record or cross-dataset. Rules can be automatically analyzed for rule type and connectivity. Supports checks implied by an SDMX DSD file as well. See also Van der Loo and De Jonge (2018) <doi:10.1002/9781118897126>, Chapter 6 and the JSS paper (2021) <doi:10.18637/jss.v097.i10>.

Maintained by Mark van der Loo. Last updated 11 days ago.

data-cleaning validation

5.7 match 418 stars 12.50 score 448 scripts 9 dependents

lukejharmon

geiger:Analysis of Evolutionary Diversification

Methods for fitting macroevolutionary models to phylogenetic trees. See Pennell (2014) <doi:10.1093/bioinformatics/btu181>.

Maintained by Luke Harmon. Last updated 2 years ago.

openblas cpp

9.0 match 1 star 7.84 score 2.3k scripts 28 dependents

jhelvy

logitr:Logit Models w/Preference & WTP Space Utility Parameterizations

Fast estimation of multinomial (MNL) and mixed logit (MXL) models in R. Models can be estimated using "Preference" space or "Willingness-to-pay" (WTP) space utility parameterizations. Weighted models can also be estimated. An option is available to run a parallelized multistart optimization loop with random starting points in each iteration, which is useful for non-convex problems like MXL models or models with WTP space utility parameterizations. The main optimization loop uses the 'nloptr' package to minimize the negative log-likelihood function. Additional functions are available for computing and comparing WTP from both preference space and WTP space models and for predicting expected choices and choice probabilities for sets of alternatives based on an estimated model. Mixed logit models can include uncorrelated or correlated heterogeneity covariances and are estimated using maximum simulated likelihood based on the algorithms in Train (2009) <doi:10.1017/CBO9780511805271>. More details can be found in Helveston (2023) <doi:10.18637/jss.v105.i10>.

Maintained by John Helveston. Last updated 4 months ago.

log-likelihood logit logit-model mixed-logit mlogit multinomial-regression mxl mxl-models preference-space preferences willingness-to-pay wtp

7.8 match 54 stars 9.10 score 119 scripts 1 dependent

rqtl

qtl2:Quantitative Trait Locus Mapping in Experimental Crosses

Provides a set of tools to perform quantitative trait locus (QTL) analysis in experimental crosses. It is a reimplementation of the 'R/qtl' package to better handle high-dimensional data and complex cross designs. Broman et al. (2019) <doi:10.1534/genetics.118.301595>.

Maintained by Karl W Broman. Last updated 8 days ago.

cpp

7.4 match 34 stars 9.48 score 1.1k scripts 5 dependents

bluefoxr

COINr:Composite Indicator Construction and Analysis

A comprehensive high-level package, for composite indicator construction and analysis. It is a "development environment" for composite indicators and scoreboards, which includes utilities for construction (indicator selection, denomination, imputation, data treatment, normalisation, weighting and aggregation) and analysis (multivariate analysis, correlation plotting, short cuts for principal component analysis, global sensitivity analysis, and more). A composite indicator is completely encapsulated inside a single hierarchical list called a "coin". This allows a fast and efficient work flow, as well as making quick copies, testing methodological variations and making comparisons. It also includes many plotting options, both statistical (scatter plots, distribution plots) as well as for presenting results.

Maintained by William Becker. Last updated 2 months ago.

7.8 match 26 stars 9.07 score 73 scripts 1 dependent

tarnduong

ks:Kernel Smoothing

Kernel smoothers for univariate and multivariate data, with comprehensive visualisation and bandwidth selection capabilities, including for densities, density derivatives, cumulative distributions, clustering, classification, density ridges, significant modal regions, and two-sample hypothesis tests. Chacon & Duong (2018) <doi:10.1201/9780429485572>.

Maintained by Tarn Duong. Last updated 6 months ago.

6.9 match 6 stars 10.14 score 920 scripts 262 dependents

arcaldwell49

TOSTER:Two One-Sided Tests (TOST) Equivalence Testing

Two one-sided tests (TOST) procedure to test equivalence for t-tests, correlations, differences between proportions, and meta-analyses, including power analysis for t-tests and correlations. Allows you to specify equivalence bounds in raw scale units or in terms of effect sizes. See: Lakens (2017) <doi:10.1177/1948550617697177>.

Maintained by Aaron Caldwell. Last updated 1 month ago.

10.3 match 6.77 score 266 scripts

gbradburd

conStruct:Models Spatially Continuous and Discrete Population Genetic Structure

A method for modeling genetic data as a combination of discrete layers, within each of which relatedness may decay continuously with geographic distance. This package contains code for running analyses (which are implemented in the modeling language 'rstan') and visualizing and interpreting output. See the paper for more details on the model and its utility.

Maintained by Gideon Bradburd. Last updated 1 year ago.

cpp

8.3 match 35 stars 8.39 score 70 scripts

gagolews

stringi:Fast and Portable Character String Processing Facilities

A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalisation, date-time formatting and parsing, and many more. They are fast, consistent, convenient, and - thanks to 'ICU' (International Components for Unicode) - portable across all locales and platforms. Documentation about 'stringi' is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).

Maintained by Marek Gagolewski. Last updated 1 month ago.

icu icu4c natural-language-processing nlp regex regexp string-manipulation stringi stringr text text-processing tidy-data unicode cpp

3.8 match 309 stars 18.31 score 10k scripts 8.6k dependents

r-lib

cli:Helpers for Developing Command Line Interfaces

A suite of tools to build attractive command line interfaces ('CLIs'), from semantic elements: headings, lists, alerts, paragraphs, etc. Supports custom themes via a 'CSS'-like language. It also contains a number of lower level 'CLI' elements: rules, boxes, trees, and 'Unicode' symbols with 'ASCII' alternatives. It supports ANSI colors and text styles as well.

Maintained by Gábor Csárdi. Last updated 17 hours ago.

cli

3.5 match 664 stars 19.33 score 1.4k scripts 14k dependents

dwarton

ecostats:Code and Data Accompanying the Eco-Stats Text (Warton 2022)

Functions and data supporting the Eco-Stats text (Warton, 2022, Springer), and solutions to exercises. Functions include tools for using simulation envelopes in diagnostic plots, and a function for diagnostic plots of multivariate linear models. Datasets mentioned in the package are included here (where not available elsewhere) and there is a vignette for each chapter of the text with solutions to exercises.

Maintained by David Warton. Last updated 1 year ago.

10.4 match 8 stars 6.58 score 53 scripts

luisdva

unheadr:Handle Data with Messy Header Rows and Broken Values

Verb-like functions to work with messy data, often derived from spreadsheets or parsed PDF tables. Includes functions for unwrapping values broken up across rows, relocating embedded grouping values, and annotating meaningful formatting in spreadsheet files.

Maintained by Luis D. Verde Arregoitia. Last updated 10 months ago.

10.6 match 61 stars 6.44 score 45 scripts

adeverse

adephylo:Exploratory Analyses for the Phylogenetic Comparative Method

Multivariate tools to analyze comparative data, i.e. a phylogeny and some traits measured for each taxon. The package contains functions to represent comparative data, compute phylogenetic proximities, perform multivariate analysis with phylogenetic constraints and test for the presence of phylogenetic autocorrelation. The package is described in Jombart et al (2010) <doi:10.1093/bioinformatics/btq292>.

Maintained by Aurélie Siberchicot. Last updated 2 days ago.

6.7 match 9 stars 10.05 score 312 scripts 4 dependents

yulab-smu

scholar:Analyse Citation Data from Google Scholar

Provides functions to extract citation data from Google Scholar. Convenience functions are also provided for comparing multiple scholars and predicting future h-index values.

Maintained by Guangchuang Yu. Last updated 1 year ago.

7.0 match 43 stars 9.63 score 468 scripts 3 dependents

tushiqi

MAnorm2:Tools for Normalizing and Comparing ChIP-seq Samples

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the premier technology for profiling genome-wide localization of chromatin-binding proteins, including transcription factors and histones with various modifications. This package provides a robust method for normalizing ChIP-seq signals across individual samples or groups of samples. It also designs a self-contained system of statistical models for calling differential ChIP-seq signals between two or more biological conditions as well as for calling hypervariable ChIP-seq signals across samples. Refer to Tu et al. (2021) <doi:10.1101/gr.262675.120> and Chen et al. (2022) <doi:10.1186/s13059-022-02627-9> for associated statistical details.

Maintained by Shiqi Tu. Last updated 2 years ago.

chip-seq differential-analysis empirical-bayes winsorize-values

12.1 match 32 stars 5.48 score 19 scripts

deboerk

cocor:Comparing Correlations

Statistical tests for the comparison between two correlations based on either independent or dependent groups. Dependent correlations can either be overlapping or nonoverlapping. A web interface is available on the website <http://comparingcorrelations.org>. A plugin for the R GUI and IDE RKWard is included. Please install RKWard from <https://rkward.kde.org> to use this feature. The respective R package 'rkward' cannot be installed directly from a repository, as it is a part of RKWard.

Maintained by Birk Diedenhofen. Last updated 3 years ago.

12.9 match 1 star 5.15 score 151 scripts 9 dependents

bioc

PDATK:Pancreatic Ductal Adenocarcinoma Tool-Kit

Pancreatic ductal adenocarcinoma (PDA) has a relatively poor prognosis and is one of the most lethal cancers. Molecular classification of gene expression profiles holds the potential to identify meaningful subtypes which can inform therapeutic strategy in the clinical setting. The Pancreatic Cancer Adenocarcinoma Tool-Kit (PDATK) provides an S4 class-based interface for performing unsupervised subtype discovery, cross-cohort meta-clustering, gene-expression-based classification, and subsequent survival analysis to identify prognostically useful subtypes in pancreatic cancer and beyond. Two novel methods, Consensus Subtypes in Pancreatic Cancer (CSPC) and Pancreatic Cancer Overall Survival Predictor (PCOSP) are included for consensus-based meta-clustering and overall-survival prediction, respectively. Additionally, four published subtype classifiers and three published prognostic gene signatures are included to allow users to easily recreate published results, apply existing classifiers to new data, and benchmark the relative performance of new methods. The use of existing Bioconductor classes as input to all PDATK classes and methods enables integration with existing Bioconductor datasets, including the 21 pancreatic cancer patient cohorts available in the MetaGxPancreas data package. PDATK has been used to replicate results from Sandhu et al (2019) [https://doi.org/10.1200/cci.18.00102] and an additional paper is in the works using CSPC to validate subtypes from the included published classifiers, both of which use the data available in MetaGxPancreas. The inclusion of subtype centroids and prognostic gene signatures from these and other publications will enable researchers and clinicians to classify novel patient gene expression data, allowing the direct clinical application of the classifiers included in PDATK. Overall, PDATK provides a rich set of tools to identify and validate useful prognostic and molecular subtypes based on gene-expression data, benchmark new classifiers against existing ones, and apply discovered classifiers on novel patient data to inform clinical decision making.

Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.

geneexpression pharmacogenetics pharmacogenomics software classification survival clustering geneprediction

15.3 match 1 star 4.31 score 17 scripts

kornl

mutoss:Unified Multiple Testing Procedures

Designed to ease the application and comparison of multiple hypothesis testing procedures for FWER, gFWER, FDR and FDX. Methods are standardized and usable by the accompanying 'mutossGUI'.

Maintained by Kornelius Rohmeyer. Last updated 12 months ago.

7.8 match 4 stars 8.44 score 24 scripts 16 dependents

venelin

PCMBase:Simulation and Likelihood Calculation of Phylogenetic Comparative Models

Phylogenetic comparative methods represent models of continuous trait data associated with the tips of a phylogenetic tree. Examples of such models are Gaussian continuous time branching stochastic processes such as Brownian motion (BM) and Ornstein-Uhlenbeck (OU) processes, which regard the data at the tips of the tree as an observed (final) state of a Markov process starting from an initial state at the root and evolving along the branches of the tree. The PCMBase R package provides a general framework for manipulating such models. This framework consists of an application programming interface for specifying data and model parameters, and efficient algorithms for simulating trait evolution under a model and calculating the likelihood of model parameters for an assumed model and trait data. The package implements a growing collection of models, which currently includes BM, OU, BM/OU with jumps, two-speed OU as well as mixed Gaussian models, in which different types of the above models can be associated with different branches of the tree. The PCMBase package is limited to trait-simulation and likelihood calculation of (mixed) Gaussian phylogenetic models. The PCMFit package provides functionality for inference of these models to tree and trait data. The package web-site <https://venelin.github.io/PCMBase/> provides access to the documentation and other resources.

Maintained by Venelin Mitov. Last updated 10 months ago.

8.7 match 6 stars 7.56 score 85 scripts 3 dependents

ropensci

rotl:Interface to the 'Open Tree of Life' API

An interface to the 'Open Tree of Life' API to retrieve phylogenetic trees, information about studies used to assemble the synthetic tree, and utilities to match taxonomic names to 'Open Tree identifiers'. The 'Open Tree of Life' aims at assembling a comprehensive phylogenetic tree for all named species.

Maintained by Francois Michonneau. Last updated 2 years ago.

metadata ropensci phylogenetics independant-contrasts biodiversity peer-reviewed phylogeny taxonomy

5.5 match 40 stars 12.05 score 356 scripts 29 dependents

jm-umn

distfreereg:Distribution-Free Goodness-of-Fit Testing for Regression

Implements distribution-free goodness-of-fit regression testing for the mean structure of parametric models introduced in Khmaladze (2021) <doi:10.1007/s10463-021-00786-3>.

Maintained by Jesse Miller. Last updated 4 months ago.

15.3 match 4.25 score 178 scripts

rcalinjageman

esci:Estimation Statistics with Confidence Intervals

A collection of functions and 'jamovi' module for the estimation approach to inferential statistics, the approach which emphasizes effect sizes, interval estimates, and meta-analysis. Nearly all functions are based on 'statpsych' and 'metafor'. This package is still under active development, and breaking changes are likely, especially with the plot and hypothesis test functions. Data sets are included for all examples from Cumming & Calin-Jageman (2024) <ISBN:9780367531508>.

Maintained by Robert Calin-Jageman. Last updated 21 days ago.

jamovi jasp science statistics visualization

11.7 match 22 stars 5.42 score 12 scripts

ovvo-financial

NNS:Nonlinear Nonparametric Statistics

Nonlinear nonparametric statistics using partial moments. Partial moments are the elements of variance and asymptotically approximate the area of f(x). These robust statistics provide the basis for nonlinear analysis while retaining linear equivalences. NNS offers: Numerical integration, Numerical differentiation, Clustering, Correlation, Dependence, Causal analysis, ANOVA, Regression, Classification, Seasonality, Autoregressive modeling, Normalization, Stochastic dominance and Advanced Monte Carlo sampling. All routines based on: Viole, F. and Nawrocki, D. (2013), Nonlinear Nonparametric Statistics: Using Partial Moments (ISBN: 1490523995).

Maintained by Fred Viole. Last updated 4 days ago.

clustering econometrics machine-learning nonlinear nonparametric partial-moments statistics time-series cpp

5.8 match 71 stars 10.96 score 66 scripts 3 dependents

r-lib

testthat:Unit Testing for R

Software testing is important, but, in part because it is frustrating and boring, many of us avoid it. 'testthat' is a testing framework for R that is easy to learn and use, and integrates with your existing 'workflow'.

Maintained by Hadley Wickham. Last updated 15 days ago.

unit-testing cpp

3.0 match 900 stars 20.97 score 74k scripts 465 dependents

tidyverse

lubridate:Make Dealing with Dates a Little Easier

Functions to work with date-times and time-spans: fast and user friendly parsing of date-time data, extraction and updating of components of a date-time (years, months, days, hours, minutes, and seconds), algebraic manipulation on date-time and time-span objects. The 'lubridate' package has a consistent and memorable syntax that makes working with dates easy and fun.

Maintained by Vitalie Spinu. Last updated 3 months ago.

date date-time

3.0 match 757 stars 20.95 score 135k scripts 1.9k dependents

sonsoleslp

tna:Transition Network Analysis (TNA)

Provides tools for performing Transition Network Analysis (TNA) to study relational dynamics, including functions for building and plotting TNA models, calculating centrality measures, and identifying dominant events and patterns. TNA statistical techniques (e.g., bootstrapping and permutation tests) ensure the reliability of observed insights and confirm that identified dynamics are meaningful. See (Saqr et al., 2025) <doi:10.1145/3706468.3706513> for more details on TNA.

Maintained by Sonsoles López-Pernas. Last updated 2 days ago.

educational-data-mining learning-analytics markov-model temporal-analysis

9.7 match 4 stars 6.48 score 5 scripts

hugaped

MBNMAtime:Run Time-Course Model-Based Network Meta-Analysis (MBNMA) Models

Fits Bayesian time-course models for model-based network meta-analysis (MBNMA) that allow inclusion of multiple time-points from studies. Repeated measures over time are accounted for within studies by applying different time-course functions, following the method of Pedder et al. (2019) <doi:10.1002/jrsm.1351>. The method allows synthesis of studies with multiple follow-up measurements that can account for time-course for a single or multiple treatment comparisons. Several general time-course functions are provided; others may be added by the user. Various characteristics can be flexibly added to the models, such as correlation between time points and shared class effects. The consistency of direct and indirect evidence in the network can be assessed using unrelated mean effects models and/or by node-splitting.

Maintained by Hugo Pedder. Last updated 1 month ago.

jags cpp

10.2 match 7 stars 6.10 score

thibautjombart

treespace:Statistical Exploration of Landscapes of Phylogenetic Trees

Tools for the exploration of distributions of phylogenetic trees. This package includes a 'shiny' interface which can be started from R using treespaceServer(). For further details see Jombart et al. (2017) <DOI:10.1111/1755-0998.12676>.

Maintained by Michelle Kendall. Last updated 2 years ago.

cpp

8.4 match 28 stars 7.39 score 63 scripts

sym33

RecordLinkage:Record Linkage Functions for Linking and Deduplicating Data Sets

Provides functions for linking and deduplicating data sets. Methods based on a stochastic approach are implemented as well as classification algorithms from the machine learning domain. For details, see our paper "The RecordLinkage Package: Detecting Errors in Data" Sariyar M / Borg A (2010) <doi:10.32614/RJ-2010-017>.

Maintained by Murat Sariyar. Last updated 2 years ago.

6.8 match 6 stars 9.00 score 454 scripts 8 dependents

marc-girondot

HelpersMG:Tools for Environmental Analyses, Ecotoxicology and Various R Functions

Contains miscellaneous functions useful for managing 'NetCDF' files (see <https://en.wikipedia.org/wiki/NetCDF>), get moon phase and time for sun rise and fall, tide level, analyse and reconstruct periodic time series of temperature with irregular sinusoidal pattern, show scales and wind rose in plot with change of color of text, Metropolis-Hastings algorithm for Bayesian MCMC analysis, plot graphs or boxplot with error bars, search files on disk by their names or their content, read the contents of all files from a folder at one time.

Maintained by Marc Girondot. Last updated 2 months ago.

13.1 match 4 stars 4.59 score 160 scripts 4 dependents

bioc

S4Vectors:Foundation of vector-like and list-like containers in Bioconductor

The S4Vectors package defines the Vector and List virtual classes and a set of generic functions that extend the semantics of ordinary vectors and lists in R. Package developers can easily implement vector-like or list-like objects as concrete subclasses of Vector or List. In addition, a few low-level concrete subclasses of general interest (e.g. DataFrame, Rle, Factor, and Hits) are implemented in the S4Vectors package itself (many more are implemented in the IRanges package and in other Bioconductor infrastructure packages).

Maintained by Hervé Pagès. Last updated 1 month ago.

infrastructure datarepresentation bioconductor-package core-package

3.8 match 18 stars 16.05 score 1.0k scripts 1.9k dependents

melff

memisc:Management of Survey Data and Presentation of Analysis Results

An infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) 'SPSS' and 'Stata' files is provided. Further, the package can produce tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates, which can be exported to 'LaTeX' and HTML.

Maintained by Martin Elff. Last updated 11 days ago.

survey-data

4.9 match 46 stars 12.34 score 1.2k scripts 13 dependents

nhanhocu

metamicrobiomeR:an R package for analysis of microbiome relative abundance data using zero inflated beta GAMLSS and meta-analysis across studies using random effect model

The metamicrobiomeR package implements Generalized Additive Model for Location, Scale and Shape (GAMLSS) with zero inflated beta (BEZI) family for analysis of microbiome relative abundance data (with various options for data transformation/normalization to address compositional effects) and random effect meta-analysis models for meta-analysis pooling estimates across microbiome studies. Random Forest model to predict microbiome age based on relative abundances of shared bacterial genera with the Bangladesh data (Subramanian et al 2014), comparison of multiple diversity indexes using linear/linear mixed effect models and some data display/visualization are also implemented.

Maintained by Nhan Ho. Last updated 4 years ago.

12.3 match 33 stars 4.90 score 12 scripts

cschwarz-stat-sfu-ca

SPAS:Stratified-Petersen Analysis System

The Stratified-Petersen Analysis System (SPAS) is designed to estimate abundance in two-sample capture-recapture experiments where the capture and recaptures are stratified. This is a generalization of the simple Lincoln-Petersen estimator. Strata may be defined in time or in space or both, and the s strata in which marking takes place may differ from the t strata in which recoveries take place. When s=t, SPAS reduces to the method described by Darroch (1961) <doi:10.2307/2332748>. When s<t, SPAS implements the methods described in Plante, Rivest, and Tremblay (1988) <doi:10.2307/2533994>. Schwarz and Taylor (1998) <doi:10.1139/f97-238> describe the use of SPAS in estimating return of salmon stratified by time and geography. A related package, BTSPAS, deals with temporal stratification where a spline is used to model the distribution of the population over time as it passes the second capture location. This is the R-version of the (now obsolete) standalone Windows program of the same name.

Maintained by Carl James Schwarz. Last updated 1 month ago.

cpp

9.1 match 2 stars 6.55 score 28 scripts 1 dependents

matteo21q

dani:Design and Analysis of Non-Inferiority Trials

Provides tools to help with the design and analysis of non-inferiority trials. These include functions for doing sample size calculations and for analysing non-inferiority trials, using a variety of outcome types and population-level summary measures. It also features functions to make trials more resilient by using the concept of non-inferiority frontiers, as described in Quartagno et al. (2019) <arXiv:1905.00241>. Finally, it includes functions to design and analyse MAMS-ROCI (aka DURATIONS) trials.

Maintained by Matteo Quartagno. Last updated 7 months ago.

11.1 match 2 stars 5.33 score 27 scripts

ncss-tech

aqp:Algorithms for Quantitative Pedology

The Algorithms for Quantitative Pedology (AQP) project was started in 2009 to organize a loosely-related set of concepts and source code on the topic of soil profile visualization, aggregation, and classification into this package (aqp). Over the past 8 years, the project has grown into a suite of related R packages that enhance and simplify the quantitative analysis of soil profile data. Central to the AQP project is a new vocabulary of specialized functions and data structures that can accommodate the inherent complexity of soil profile information; freeing the scientist to focus on ideas rather than boilerplate data processing tasks <doi:10.1016/j.cageo.2012.10.020>. These functions and data structures have been extensively tested and documented, applied to projects involving hundreds of thousands of soil profiles, and deeply integrated into widely used tools such as SoilWeb <https://casoilresource.lawr.ucdavis.edu/soilweb-apps>. Components of the AQP project (aqp, soilDB, sharpshootR, soilReports packages) serve an important role in routine data analysis within the USDA-NRCS Soil Science Division. The AQP suite of R packages offer a convenient platform for bridging the gap between pedometric theory and practice.

Maintained by Dylan Beaudette. Last updated 28 days ago.

digital-soil-mapping ncss-tech nrcs pedology pedometrics soil soil-survey usda

5.0 match 55 stars 11.77 score 1.2k scripts 2 dependents

xfim

ggmcmc:Tools for Analyzing MCMC Simulations from Bayesian Inference

Tools for assessing and diagnosing convergence of Markov Chain Monte Carlo simulations, as well as for graphically displaying results from full MCMC analysis. The package also facilitates the graphical interpretation of models by providing flexible functions to plot the results against observed variables, and functions to work with hierarchical/multilevel batches of parameters (Fernández-i-Marín, 2016 <doi:10.18637/jss.v070.i09>).

Maintained by Xavier Fernández i Marín. Last updated 2 years ago.

bayesian-data-analysis ggplot2 graphical jags mcmc stan

4.9 match 112 stars 12.02 score 1.6k scripts 8 dependents

brry

berryFunctions:Function Collection Related to Plotting and Hydrology

Draw horizontal histograms, color scattered points by 3rd dimension, enhance date- and log-axis plots, zoom in X11 graphics, trace errors and warnings, use the unit hydrograph in a linear storage cascade, convert lists to data.frames and arrays, fit multiple functions.

Maintained by Berry Boessenkool. Last updated 1 month ago.

6.3 match 13 stars 9.43 score 350 scripts 16 dependents

bioc

orthogene:Interspecies gene mapping

orthogene is an R package for easy mapping of orthologous genes across hundreds of species. It pulls up-to-date gene ortholog mappings across 700+ organisms. It also provides various utility functions to aggregate/expand common objects (e.g. data.frames, gene expression matrices, lists) using 1:1, many:1, 1:many or many:many gene mappings, both within- and between-species.

Maintained by Brian Schilder. Last updated 5 months ago.

genetics comparativegenomics preprocessing phylogenetics transcriptomics geneexpression animal-models bioconductor bioconductor-package bioinformatics biomedicine comparative-genomics evolutionary-biology genes genomics ontologies translational-research

7.5 match 42 stars 7.85 score 31 scripts 2 dependents

asardaes

dtwclust:Time Series Clustering Along with Optimizations for the Dynamic Time Warping Distance

Time series clustering along with optimized techniques related to the Dynamic Time Warping distance and its corresponding lower bounds. Implementations of partitional, hierarchical, fuzzy, k-Shape and TADPole clustering are available. Functionality can be easily extended with custom distance measures and centroid definitions. Implementations of DTW barycenter averaging, a distance based on global alignment kernels, and the soft-DTW distance and centroid routines are also provided. All included distance functions have custom loops optimized for the calculation of cross-distance matrices, including parallelization support. Several cluster validity indices are included.

Maintained by Alexis Sarda. Last updated 8 months ago.

clustering dtw time-series openblas cpp

4.7 match 261 stars 12.39 score 406 scripts 14 dependents

andrisignorell

DescTools:Tools for Descriptive Statistics

A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox, which facilitates the (notoriously time consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. The package contains furthermore functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and arguments naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in the absence of convincing alternatives). The 'BigCamelCase' style was consequently applied to functions borrowed from contributed R packages as well.

Maintained by Andri Signorell. Last updated 9 days ago.

fortran cpp

3.4 match 87 stars 16.68 score 7.7k scripts 99 dependents

salvatoremangiafico

rcompanion:Functions to Support Extension Education Program Evaluation

Functions and datasets to support Summary and Analysis of Extension Program Evaluation in R, and An R Companion for the Handbook of Biological Statistics. Vignettes are available at <https://rcompanion.org>.

Maintained by Salvatore Mangiafico. Last updated 30 days ago.

7.0 match 4 stars 8.01 score 2.4k scripts 5 dependents

modeloriented

DALEXtra:Extension for 'DALEX' Package

Provides wrappers for various machine learning models. In applied machine learning, there is a strong belief that we need to strike a balance between interpretability and accuracy. However, in the field of interpretable machine learning, there are more and more new ideas for explaining black-box models that are implemented in 'R'. 'DALEXtra' creates 'DALEX' Biecek (2018) <arXiv:1806.08915> explainers for many types of models, including those created using the 'python' 'scikit-learn' and 'keras' libraries and the 'java' 'h2o' library. An important part of the package is Champion-Challenger analysis and an innovative approach to model performance across subsets of test data presented in a Funnel Plot.

Maintained by Szymon Maksymiuk. Last updated 2 years ago.

extension-for-dalex-package

7.2 match 67 stars 7.71 score 400 scripts 1 dependents

annennenne

causalDisco:Tools for Causal Discovery on Observational Data

Various tools for inferring causal models from observational data. The package includes an implementation of the temporal Peter-Clark (TPC) algorithm; see Petersen, Osler and Ekstrøm (2021) <doi:10.1093/aje/kwab087>. It also includes general tools for evaluating differences in adjacency matrices, which can be used for evaluating performance of causal discovery procedures.

Maintained by Anne Helby Petersen. Last updated 13 days ago.

11.7 match 19 stars 4.76 score 10 scripts

veseshan

clinfun:Clinical Trial Design and Data Analysis Functions

Utilities to make your clinical collaborations easier if not fun. It contains functions for designing studies such as Simon 2-stage and group sequential designs and for data analysis such as Jonckheere-Terpstra test and estimating survival quantiles.

Maintained by Venkatraman E. Seshan. Last updated 1 year ago.

fortran

7.0 match 5 stars 7.86 score 124 scripts 8 dependents

femiguez

apsimx:Inspect, Read, Edit and Run 'APSIM' "Next Generation" and 'APSIM' Classic

The functions in this package inspect, read, edit and run files for 'APSIM' "Next Generation" ('JSON') and 'APSIM' "Classic" ('XML'). The files with an 'apsim' extension correspond to 'APSIM' Classic (7.x) - Windows only - and the ones with an 'apsimx' extension correspond to 'APSIM' "Next Generation". For more information about 'APSIM' see (<https://www.apsim.info/>) and for 'APSIM' next generation (<https://apsimnextgeneration.netlify.app/>).

Maintained by Fernando Miguez. Last updated 2 days ago.

apsim apsimx

5.7 match 59 stars 9.71 score 68 scripts 2 dependents

steve-the-bayesian

Boom:Bayesian Object Oriented Modeling

A C++ library for Bayesian modeling, with an emphasis on Markov chain Monte Carlo. Although boom contains a few R utilities (mainly plotting functions), its primary purpose is to install the BOOM C++ library on your system so that other packages can link against it.

Maintained by Steven L. Scott. Last updated 1 year ago.

11.4 match 9 stars 4.82 score 57 scripts 6 dependents

dkaschek

dMod:Dynamic Modeling and Parameter Estimation in ODE Models

The framework provides functions to generate ODEs of reaction networks, parameter transformations, observation functions, residual functions, etc. The framework follows the paradigm that derivative information should be used for optimization whenever possible. Therefore, all major functions produce and can handle expressions for symbolic derivatives.

Maintained by Daniel Kaschek. Last updated 9 days ago.

6.5 match 20 stars 8.35 score 251 scripts

mlcollyer

RRPP:Linear Model Evaluation with Randomized Residuals in a Permutation Procedure

Linear model calculations are made for many random versions of data. Using residual randomization in a permutation procedure, sums of squares are calculated over many permutations to generate empirical probability distributions for evaluating model effects. Additionally, coefficients, statistics, fitted values, and residuals generated over many permutations can be used for various procedures including pairwise tests, prediction, classification, and model comparison. This package should provide most tools one could need for the analysis of high-dimensional data, especially in ecology and evolutionary biology, but certainly other fields, as well.

Maintained by Michael Collyer. Last updated 25 days ago.

5.5 match 4 stars 9.84 score 173 scripts 7 dependents

cran

bnlearn:Bayesian Network Structure Learning, Parameter Learning and Inference

Bayesian network structure learning, parameter learning and inference. This package implements constraint-based (PC, GS, IAMB, Inter-IAMB, Fast-IAMB, MMPC, Hiton-PC, HPC), pairwise (ARACNE and Chow-Liu), score-based (Hill-Climbing and Tabu Search) and hybrid (MMHC, RSMAX2, H2PC) structure learning algorithms for discrete, Gaussian and conditional Gaussian networks, along with many score functions and conditional independence tests. The Naive Bayes and the Tree-Augmented Naive Bayes (TAN) classifiers are also implemented. Some utility functions (model comparison and manipulation, random data generation, arc orientation testing, simple and advanced plots) are included, as well as support for parameter estimation (maximum likelihood and Bayesian) and inference, conditional probability queries, cross-validation, bootstrap and model averaging. Development snapshots with the latest bugfixes are available from <https://www.bnlearn.com/>.

Maintained by Marco Scutari. Last updated 2 months ago.

openblas

7.0 match 57 stars 7.72 score 32 dependents

singmann

afex:Analysis of Factorial Experiments

Convenience functions for analyzing factorial experiments using ANOVA or mixed models. aov_ez(), aov_car(), and aov_4() allow specification of between, within (i.e., repeated-measures), or mixed (i.e., split-plot) ANOVAs for data in long format (i.e., one observation per row), automatically aggregating multiple observations per individual and cell of the design. mixed() fits mixed models using lme4::lmer() and computes p-values for all fixed effects using either Kenward-Roger or Satterthwaite approximation for degrees of freedom (LMM only), parametric bootstrap (LMMs and GLMMs), or likelihood ratio tests (LMMs and GLMMs). afex_plot() provides a high-level interface for interaction or one-way plots using ggplot2, combining raw data and model estimates. afex uses type 3 sums of squares as default (imitating commercial statistical software).

Maintained by Henrik Singmann. Last updated 7 months ago.

3.8 match 123 stars 14.50 score 1.4k scripts 15 dependents

welch-lab

rliger:Linked Inference of Genomic Experimental Relationships

Uses an extension of nonnegative matrix factorization to identify shared and dataset-specific factors. See Welch J, Kozareva V, et al (2019) <doi:10.1016/j.cell.2019.05.006>, and Liu J, Gao C, Sodicoff J, et al (2020) <doi:10.1038/s41596-020-0391-8> for more details.

Maintained by Yichen Wang. Last updated 2 months ago.

nonnegative-matrix-factorization single-cell openblas cpp

5.0 match 402 stars 10.80 score 334 scripts 1 dependents

wraff

wrMisc:Analyze Experimental High-Throughput (Omics) Data

The efficient treatment and convenient analysis of experimental high-throughput (omics) data gets facilitated through this collection of diverse functions. Several functions address advanced object-conversions, like manipulating lists of lists or lists of arrays, reorganizing lists to arrays or into separate vectors, merging of multiple entries, etc. Another set of functions provides speed-optimized calculation of standard deviation (sd), coefficient of variance (CV) or standard error of the mean (SEM) for data in matrixes or means per line with respect to additional grouping (e.g. n groups of replicates). A group of functions facilitate dealing with non-redundant information, by indexing unique, adding counters to redundant or eliminating lines with respect to redundancy in a given reference-column, etc. Help is provided to identify very closely matching numeric values to generate (partial) distance matrixes for very big data in a memory efficient manner or to reduce the complexity of large data-sets by combining very close values. Other functions help aligning a matrix or data.frame to a reference using partial matching or to mine an experimental setup to extract patterns of replicate samples. Many times large experimental datasets need some additional filtering, adequate functions are provided. Convenient data normalization is supported in various different modes, parameter estimation via permutations or boot-strap as well as flexible testing of multiple pair-wise combinations using the framework of 'limma' is provided, too. Batch reading (or writing) of sets of files and combining data to arrays is supported, too.

Maintained by Wolfgang Raffelsberger. Last updated 7 months ago.

12.1 match 4.44 score 33 scripts 4 dependents

neptune-ai

neptune:MLOps Metadata Store - Experiment Tracking and Model Registry for Production Teams

An interface to Neptune. A metadata store for MLOps, built for teams that run a lot of experiments. It gives you a single place to log, store, display, organize, compare, and query all your model-building metadata. Neptune is used for: • Experiment tracking: Log, display, organize, and compare ML experiments in a single place. • Model registry: Version, store, manage, and query trained models, and model building metadata. • Monitoring ML runs live: Record and monitor model training, evaluation, or production runs live. For more information see <https://neptune.ai/>.

Maintained by Rafal Jankowski. Last updated 2 years ago.

compare language log management metadata metrics mlops models monitoring organize parameters store tracker visualization

10.8 match 14 stars 4.89 score 16 scripts

polkas

pacs:Supplementary Tools for R Packages Developers

Supplementary utils for CRAN maintainers and R packages developers. Validating the library, packages and lock files. Exploring the complexity of a specific package, such as evaluating its size in bytes with all dependencies. The complexity of a shiny app can be explored too. Assessing the life duration of a specific package version. Checking a CRAN package check page status for any errors and warnings. Retrieving a DESCRIPTION or NAMESPACE file for any package version. Comparing DESCRIPTION or NAMESPACE files between different package versions. Getting a list of all releases for a specific package. Bioconductor is partly supported.

Maintained by Maciej Nasinski. Last updated 6 months ago.

bioconductor dependencies library lifeduration renv shiny tools utils

9.2 match 25 stars 5.70 score 8 scripts

bioc

MutationalPatterns:Comprehensive genome-wide analysis of mutational processes

Mutational processes leave characteristic footprints in genomic DNA. This package provides a comprehensive set of flexible functions that allows researchers to easily evaluate and visualize a multitude of mutational patterns in base substitution catalogues of e.g. healthy samples, tumour samples, or DNA-repair deficient cells. The package covers a wide range of patterns including: mutational signatures, transcriptional and replicative strand bias, lesion segregation, genomic distribution and association with genomic features, which are collectively meaningful for studying the activity of mutational processes. The package works with single nucleotide variants (SNVs), insertions and deletions (Indels), double base substitutions (DBSs) and larger multi base substitutions (MBSs). The package provides functionalities for both extracting mutational signatures de novo and determining the contribution of previously identified mutational signatures on a single sample level. MutationalPatterns integrates with common R genomic analysis workflows and allows easy association with (publicly available) annotation data.

Maintained by Mark van Roosmalen. Last updated 5 months ago.

genetics somaticmutation

7.2 match 7.27 score 251 scripts 1 dependents

soerenpannier

emdi:Estimating and Mapping Disaggregated Indicators

Functions that support estimating, assessing and mapping regional disaggregated indicators. So far, estimation methods comprise direct estimation, the model-based unit-level approach Empirical Best Prediction (see "Small area estimation of poverty indicators" by Molina and Rao (2010) <doi:10.1002/cjs.10051>), the area-level model (see "Estimates of income for small places: An application of James-Stein procedures to Census Data" by Fay and Herriot (1979) <doi:10.1080/01621459.1979.10482505>) and various extensions of it (adjusted variance estimation methods, log and arcsin transformation, spatial, robust and measurement error models), as well as their precision estimates. The assessment of the used model is supported by a summary and diagnostic plots. For a suitable presentation of estimates, map plots can be easily created. Furthermore, results can easily be exported to Excel. For a detailed description of the package and the methods used see "The R Package emdi for Estimating and Mapping Regionally Disaggregated Indicators" by Kreutzmann et al. (2019) <doi:10.18637/jss.v091.i07> and the second package vignette "A Framework for Producing Small Area Estimates Based on Area-Level Models in R".

Maintained by Soeren Pannier. Last updated 1 year ago.

7.1 match 15 stars 7.26 score 45 scripts 1 dependents

acorg

Racmacs:Antigenic Cartography Macros

A toolkit for making antigenic maps from immunological assay data, in order to quantify and visualize antigenic differences between different pathogen strains as described in Smith et al. (2004) <doi:10.1126/science.1097211> and used in the World Health Organization influenza vaccine strain selection process. Additional functions allow for the diagnostic evaluation of antigenic maps and an interactive viewer is provided to explore antigenic relationships amongst several strains and incorporate the visualization of associated genetic information.

Maintained by Sam Wilks. Last updated 9 months ago.

openblas cpp openmp

6.4 match 21 stars 8.06 score 362 scripts

r-forge

Matrix:Sparse and Dense Matrix Classes and Methods

A rich hierarchy of sparse and dense matrix classes, including general, symmetric, triangular, and diagonal matrices with numeric, logical, or pattern entries. Efficient methods for operating on such matrices, often wrapping the 'BLAS', 'LAPACK', and 'SuiteSparse' libraries.

Maintained by Martin Maechler. Last updated 6 days ago.

openblas

3.0 match 1 stars 17.23 score 33k scripts 12k dependents

kwb-r

kwb.utils:General Utility Functions Developed at KWB

This package contains some small helper functions that aim at improving the quality of code developed at Kompetenzzentrum Wasser gGmbH (KWB).

Maintained by Hauke Sonnenberg. Last updated 12 months ago.

7.0 match 8 stars 7.33 score 12 scripts 78 dependents

ravingmantis

unittest:TAP-Compliant Unit Testing

Concise TAP <http://testanything.org/> compliant unit testing package. Authored tests can be run using CMD check with minimal implementation overhead.

Maintained by Jamie Lentin. Last updated 7 months ago.

6.9 match 4 stars 7.43 score 224 scripts

gksmyth

statmod:Statistical Modeling

A collection of algorithms and functions to aid statistical modeling. Includes limiting dilution analysis (aka ELDA), growth curve comparisons, mixed linear models, heteroscedastic regression, inverse-Gaussian probability calculations, Gauss quadrature and a secure convergence algorithm for nonlinear models. Also includes advanced generalized linear model functions including Tweedie and Digamma distributional families, secure convergence and exact distributional calculations for unit deviances.

Maintained by Gordon Smyth. Last updated 2 years ago.

fortran

5.3 match 1 stars 9.62 score 2.2k scripts 849 dependents

dariah-fi-survey-concept-network

finnsurveytext:Analyse Open-Ended Survey Responses in Finnish

Annotates Finnish textual survey responses into CoNLL-U format using Finnish treebanks from <https://universaldependencies.org/format.html> using UDPipe as described in Straka and Straková (2017) <doi:10.18653/v1/K17-3009>. Formatted data is then analysed using single or comparison n-gram plots, wordclouds, summary tables and Concept Network plots. The Concept Network plots use the TextRank algorithm as outlined in Mihalcea, Rada & Tarau, Paul (2004) <https://aclanthology.org/W04-3252/>.

Maintained by Adeline Clarke. Last updated 9 days ago.

dariah-fi

9.4 match 5.39 score 27 scripts

adamlilith

fasterRaster:Faster Raster and Spatial Vector Processing Using 'GRASS GIS'

Processing of large-in-memory/large-on-disk rasters and spatial vectors using 'GRASS GIS' <https://grass.osgeo.org/>. Most functions in the 'terra' package are recreated. Processing of medium-sized and smaller spatial objects will nearly always be faster using 'terra' or 'sf', but for large-in-memory/large-on-disk objects, 'fasterRaster' may be faster. To use most of the functions, you must have the stand-alone version (not the 'OSGeo4W' installer version) of 'GRASS GIS' 8.0 or higher.

Maintained by Adam B. Smith. Last updated 18 days ago.

aspect distance fragmentation fragmentation-indices gis grass grass-gis raster raster-projection rasterize slope topography vectorization

6.6 match 58 stars 7.69 score 8 scripts

bioc

SummarizedExperiment:A container (S4 class) for matrix-like assays

The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.

Maintained by Hervé Pagès. Last updated 5 months ago.

genetics infrastructure sequencing annotation coverage genomeannotation bioconductor-package core-package

3.0 match 34 stars 16.85 score 8.6k scripts 1.2k dependents

bioc

countsimQC:Compare Characteristic Features of Count Data Sets

countsimQC provides functionality to create a comprehensive report comparing a broad range of characteristics across a collection of count matrices. One important use case is the comparison of one or more synthetic count matrices to a real count matrix, possibly the one underlying the simulations. However, any collection of count matrices can be compared.

Maintained by Charlotte Soneson. Last updated 3 months ago.

microbiome rnaseq singlecell experimentaldesign qualitycontrol reportwriting visualization immunooncology

6.5 match 27 stars 7.69 score 24 scripts

jinkim3

kim:A Toolkit for Behavioral Scientists

A collection of functions for analyzing data typically collected or used by behavioral scientists. Examples of the functions include a function that compares groups in a factorial experimental design, a function that conducts two-way analysis of variance (ANOVA), and a function that cleans a data set generated by Qualtrics surveys. Some of the functions will require installing additional package(s). Such packages and other references are cited within the section describing the relevant functions. Many functions in this package rely heavily on these two popular R packages: Dowle et al. (2021) <https://CRAN.R-project.org/package=data.table>. Wickham et al. (2021) <https://CRAN.R-project.org/package=ggplot2>.

Maintained by Jin Kim. Last updated 18 days ago.

10.8 match 7 stars 4.66 score 3 scripts

klausvigo

phangorn:Phylogenetic Reconstruction and Analysis

Allows for estimation of phylogenetic trees and networks using Maximum Likelihood, Maximum Parsimony, distance methods and Hadamard conjugation (Schliep 2011). Offers methods for tree comparison, model selection and visualization of phylogenetic networks as described in Schliep et al. (2017).

Maintained by Klaus Schliep. Last updated 1 month ago.

software technology qualitycontrol phylogenetic-analysis phylogenetics openblas cpp

3.0 match 206 stars 16.69 score 2.5k scripts 135 dependents

quanteda

quanteda:Quantitative Analysis of Textual Data

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.

Maintained by Kenneth Benoit. Last updated 2 months ago.

corpus natural-language-processing quanteda text-analytics onetbb cpp

3.0 match 851 stars 16.68 score 5.4k scripts 51 dependents

bioc

syntenet:Inference And Analysis Of Synteny Networks

syntenet can be used to infer synteny networks from whole-genome protein sequences and analyze them. Anchor pairs are detected with the MCScanX algorithm, which was ported to this package with the Rcpp framework for R and C++ integration. Anchor pairs from synteny analyses are treated as an undirected unweighted graph (i.e., a synteny network), and users can perform: i. network clustering; ii. phylogenomic profiling (by identifying which species contain which clusters) and; iii. microsynteny-based phylogeny reconstruction with maximum likelihood.

Maintained by Fabrício Almeida-Silva. Last updated 3 months ago.

software networkinference functionalgenomics comparativegenomics phylogenetics systemsbiology graphandnetwork wholegenome network comparative-genomics evolutionary-genomics network-science phylogenomics synteny synteny-network cpp

7.5 match 26 stars 6.67 score 12 scripts 1 dependent

hojsgaard

geepack:Generalized Estimating Equation Package

Generalized estimating equations solver for parameters in mean, scale, and correlation structures, through mean link, scale link, and correlation link. Can also handle clustered categorical responses. See e.g. Halekoh and Højsgaard, (2005, <doi:10.18637/jss.v015.i02>), for details.

Maintained by Søren Højsgaard. Last updated 7 months ago.

cpp

5.2 match 1 star 9.59 score 1.7k scripts 43 dependents

mdsteiner

EFAtools:Fast and Flexible Implementations of Exploratory Factor Analysis Tools

Provides functions to perform exploratory factor analysis (EFA) procedures and compare their solutions. The goal is to provide state-of-the-art factor retention methods and a high degree of flexibility in the EFA procedures. This way, for example, implementations from R 'psych' and 'SPSS' can be compared. Moreover, functions for Schmid-Leiman transformation and the computation of omegas are provided. To speed up the analyses, some of the iterative procedures, like principal axis factoring (PAF), are implemented in C++.

Maintained by Markus Steiner. Last updated 3 months ago.

openblas cpp openmp

7.5 match 10 stars 6.57 score 83 scripts 1 dependent

r-lib

rcmdcheck:Run 'R CMD check' from 'R' and Capture Results

Run 'R CMD check' from 'R' and capture the results of the individual checks. Supports running checks in the background, timeouts, pretty printing and comparing check results.

Maintained by Gábor Csárdi. Last updated 5 months ago.

4.0 match 116 stars 12.34 score 102 scripts 158 dependents

tdhock

mlr3resampling:Resampling Algorithms for 'mlr3' Framework

A supervised learning algorithm inputs a train set, and outputs a prediction function, which can be used on a test set. If each data point belongs to a subset (such as geographic region, year, etc.), then how do we know if subsets are similar enough so that we can get accurate predictions on one subset, after training on Other subsets? And how do we know if training on All subsets would improve prediction accuracy, relative to training on the Same subset? SOAK, Same/Other/All K-fold cross-validation, <doi:10.48550/arXiv.2410.08643> can be used to answer these questions, by fixing a test subset, training models on Same/Other/All subsets, and then comparing test error rates (Same versus Other and Same versus All). Also provides code for estimating how many train samples are required to get accurate predictions on a test set.

Maintained by Toby Hocking. Last updated 1 month ago.

10.5 match 3 stars 4.68 score

djvanderlaan

reclin2:Record Linkage Toolkit

Functions to assist in performing probabilistic record linkage and deduplication: generating pairs, comparing records, EM algorithm for estimating m- and u-probabilities (I. Fellegi & A. Sunter (1969) <doi:10.1080/01621459.1969.10501049>, T.N. Herzog, F.J. Scheuren, & W.E. Winkler (2007), "Data Quality and Record Linkage Techniques", ISBN:978-0-387-69502-0), forcing one-to-one matching. Can also be used for pre- and post-processing for machine learning methods for record linkage. Focus is on memory, CPU performance and flexibility.

Maintained by Jan van der Laan. Last updated 1 year ago.

cpp

6.7 match 43 stars 7.36 score 89 scripts 1 dependent
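
As a rough illustration of the m- and u-probabilities mentioned in the reclin2 description, the sketch below scores a record pair with Fellegi-Sunter style log-likelihood-ratio weights. It is written in Python rather than R and does not use reclin2's API; the function name `fs_weight`, the field names, and the probability values are all hypothetical.

```python
import math

def fs_weight(agreements, m, u):
    """Sum of log2 likelihood-ratio weights for one record pair.

    agreements: field -> True/False (did the two records agree on it?)
    m: field -> P(agree | records are a true match)       (assumed values)
    u: field -> P(agree | records are not a true match)   (assumed values)
    """
    total = 0.0
    for field, agrees in agreements.items():
        if agrees:
            total += math.log2(m[field] / u[field])
        else:
            total += math.log2((1 - m[field]) / (1 - u[field]))
    return total

# Hypothetical m- and u-probabilities for two comparison fields.
m = {"surname": 0.95, "birth_year": 0.90}
u = {"surname": 0.01, "birth_year": 0.05}

match_like = fs_weight({"surname": True, "birth_year": True}, m, u)
nonmatch_like = fs_weight({"surname": False, "birth_year": False}, m, u)
print(match_like, nonmatch_like)
```

Pairs that agree on fields where agreement is rare among non-matches receive large positive weights; in practice the m- and u-probabilities would be estimated (for example with an EM algorithm) rather than fixed by hand.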

cran

survRM2:Comparing Restricted Mean Survival Time

Performs two-sample comparisons using the restricted mean survival time (RMST) as a summary measure of the survival time distribution. Three kinds of between-group contrast metrics (i.e., the difference in RMST, the ratio of RMST and the ratio of the restricted mean time lost (RMTL)) are computed. It performs an ANCOVA-type covariate adjustment as well as unadjusted analyses for those measures.

Maintained by Hajime Uno. Last updated 3 years ago.

9.3 match 2 stars 5.26 score 5 dependents
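
To make the RMST summary measure in the survRM2 description concrete: the restricted mean survival time up to a truncation time tau is the area under the survival curve from 0 to tau. The sketch below is a plain Python illustration, not survRM2's code; the function name `km_rmst` and the toy data are hypothetical, and it assumes the Kaplan-Meier estimator with events coded as 1 and censoring as 0.

```python
def km_rmst(times, events, tau):
    """Area under the Kaplan-Meier curve from 0 to tau.

    times:  observed follow-up times
    events: 1 if the event occurred at that time, 0 if censored
    tau:    truncation time for the restricted mean
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, last_t, area = 1.0, 0.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        deaths = leaving = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        area += surv * (t - last_t)            # rectangle up to time t
        if deaths:
            surv *= 1 - deaths / n_at_risk     # Kaplan-Meier step down
        n_at_risk -= leaving
        last_t = t
    # Final rectangle from the last observed time up to tau.
    # If the last observation is censored, the curve is held flat here.
    area += surv * (tau - last_t)
    return area

# Toy two-group comparison (made-up data).
rmst_a = km_rmst([1, 2, 3, 4], [1, 1, 1, 1], tau=4)
rmst_b = km_rmst([2, 3, 4, 5], [1, 0, 1, 1], tau=4)
print(rmst_b - rmst_a)  # difference in RMST between the groups
```

The ratio of RMST is `rmst_b / rmst_a`, and the restricted mean time lost is `tau` minus the RMST, so the RMTL ratio follows from the same two numbers.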

bioc

bluster:Clustering Algorithms for Bioconductor

Wraps common clustering algorithms in an easily extended S4 framework. Backends are implemented for hierarchical, k-means and graph-based clustering. Several utilities are also provided to compare and evaluate clustering results.

Maintained by Aaron Lun. Last updated 5 months ago.

immunooncology software geneexpression transcriptomics singlecell clustering cpp

5.2 match 9.43 score 636 scripts 51 dependents

bioc

MSnbase:Base Functions and Classes for Mass Spectrometry and Proteomics

MSnbase provides infrastructure for manipulation, processing and visualisation of mass spectrometry and proteomics data, ranging from raw to quantitative and annotated data.

Maintained by Laurent Gatto. Last updated 1 day ago.

immunooncology infrastructure proteomics massspectrometry qualitycontrol dataimport bioconductor bioinformatics mass-spectrometry proteomics-data visualisation cpp

3.8 match 130 stars 12.81 score 772 scripts 36 dependents

bioc

MPRAnalyze:Statistical Analysis of MPRA data

MPRAnalyze provides a statistical framework for the analysis of data generated by Massively Parallel Reporter Assays (MPRAs), used to directly measure enhancer activity. MPRAnalyze can be used for quantification of enhancer activity, classification of active enhancers and comparative analyses of enhancer activity between conditions. MPRAnalyze constructs a nested pair of generalized linear models (GLMs) to relate the DNA and RNA observations, easily adjustable to various experimental designs and conditions, and provides a set of rigorous statistical testing schemes.

Maintained by Tal Ashuach. Last updated 5 months ago.

immunooncology software statisticalmethod sequencing geneexpression cellbiology cellbasedassays differentialexpression experimentaldesign classification

7.1 match 12 stars 6.86 score 30 scripts

revelle

psych:Procedures for Psychological, Psychometric, and Personality Research

A general purpose toolbox developed originally for personality, psychometric theory and experimental psychology. Functions are primarily for multivariate analysis and scale construction using factor analysis, principal component analysis, cluster analysis and reliability analysis, although others provide basic descriptive statistics. Item Response Theory is done using factor analysis of tetrachoric and polychoric correlations. Functions for analyzing data at multiple levels include within and between group statistics, including correlations and factor analysis. Validation and cross validation of scales developed using basic machine learning algorithms are provided, as are functions for simulating and testing particular item and test structures. Several functions serve as a useful front end for structural equation modeling. Graphical displays of path diagrams, including mediation models, factor analysis and structural equation models are created using basic graphics. Some of the functions are written to support a book on psychometric theory as well as publications in personality research. For more information, see the <https://personality-project.org/r/> web page.

Maintained by William Revelle. Last updated 3 months ago.

3.5 match 52 stars 13.94 score 29k scripts 317 dependents

bioc

doubletrouble:Identification and classification of duplicated genes

doubletrouble aims to identify duplicated genes from whole-genome protein sequences and classify them based on their modes of duplication. The duplication modes are i. segmental duplication (SD); ii. tandem duplication (TD); iii. proximal duplication (PD); iv. transposed duplication (TRD) and; v. dispersed duplication (DD). Transposon-derived duplicates (TRD) can be further subdivided into rTRD (retrotransposon-derived duplication) and dTRD (DNA transposon-derived duplication). If users want a simpler classification scheme, duplicates can also be classified into SD- and SSD-derived (small-scale duplication) gene pairs. Besides classifying gene pairs, users can also classify genes, so that each gene is assigned a unique mode of duplication. Users can also calculate substitution rates per substitution site (i.e., Ka and Ks) from duplicate pairs, find peaks in Ks distributions with Gaussian Mixture Models (GMMs), and classify gene pairs into age groups based on Ks peaks.

Maintained by Fabrício Almeida-Silva. Last updated 3 days ago.

software wholegenome comparativegenomics functionalgenomics phylogenetics network classification bioinformatics comparative-genomics gene-duplication molecular-evolution whole-genome-duplication

7.5 match 23 stars 6.44 score 17 scripts

lmarusich

rmcorr:Repeated Measures Correlation

Compute the repeated measures correlation, a statistical technique for determining the overall within-individual relationship among paired measures assessed on two or more occasions, first introduced by Bland and Altman (1995). Includes functions for diagnostics, p-value, effect size with confidence interval including optional bootstrapping, as well as graphing. Also includes several example datasets. For more details, see the web documentation <https://lmarusich.github.io/rmcorr/index.html> and the original paper: Bakdash and Marusich (2017) <doi:10.3389/fpsyg.2017.00456>.

Maintained by Laura R. Marusich. Last updated 7 months ago.

5.3 match 7 stars 9.18 score 304 scripts

yonicd

ggedit:Interactive 'ggplot2' Layer and Theme Aesthetic Editor

Interactively edit 'ggplot2' layer and theme aesthetics definitions.

Maintained by Jonathan Sidi. Last updated 10 months ago.

ggplot2 shiny

6.0 match 250 stars 7.95 score 116 scripts 3 dependents

merck

psm3mkv:Evaluate Partitioned Survival and State Transition Models

Fits and evaluates three-state partitioned survival analyses (PartSAs) and Markov models (clock forward or clock reset) to progression and overall survival data typically collected in oncology clinical trials. These model structures are typically considered in cost-effectiveness modeling in advanced/metastatic cancer indications. Muston (2024). "Informing structural assumptions for three state oncology cost-effectiveness models through model efficiency and fit". Applied Health Economics and Health Policy.

Maintained by Dominic Muston. Last updated 9 months ago.

7.4 match 10 stars 6.43 score 1 script

jeffreyevans

yaImpute:Nearest Neighbor Observation Imputation and Evaluation Tools

Performs nearest neighbor-based imputation using one or more alternative approaches to processing multivariate data. These include methods based on canonical correlation analysis, canonical correspondence analysis, and a multivariate adaptation of the random forest classification and regression techniques of Leo Breiman and Adele Cutler. Additional methods are also offered. The package includes functions for comparing the results from running alternative techniques, detecting imputation targets that are notably distant from reference observations, detecting and correcting for bias, bootstrapping and building ensemble imputations, and mapping results.

Maintained by Jeffrey S. Evans. Last updated 6 months ago.

imputation cpp

6.4 match 3 stars 7.40 score 94 scripts 12 dependents

hdvinod

generalCorr:Generalized Correlations, Causal Paths and Portfolio Selection

Function gmcmtx0() computes a more reliable (general) correlation matrix. Since causal paths from data are important for all sciences, the package provides many sophisticated functions. causeSummBlk() and causeSum2Blk() give easy-to-interpret causal paths. Let Z denote control variables and compare two flipped kernel regressions: X=f(Y, Z)+e1 and Y=g(X, Z)+e2. Our criterion Cr1 says that if |e1*Y|>|e2*X| then variation in X is more "exogenous or independent" than in Y, and the causal path is X to Y. Criterion Cr2 requires |e2|<|e1|. These inequalities between many absolute values are quantified by four orders of stochastic dominance. Our third criterion Cr3, for the causal path X to Y, requires new generalized partial correlations to satisfy |r*(x|y,z)|< |r*(y|x,z)|. The function parcorVec() reports generalized partials between the first variable and all others. The package provides several R functions including get0outliers() for outlier detection, bigfp() for numerical integration by the trapezoidal rule, stochdom2() for stochastic dominance, pillar3D() for 3D charts, canonRho() for generalized canonical correlations, depMeas() measures nonlinear dependence, and causeSummary(mtx) reports summary of causal paths among matrix columns. Portfolio selection: decileVote(), momentVote(), dif4mtx(), exactSdMtx() can rank several stocks. Functions whose names begin with 'boot' provide bootstrap statistical inference, including a new bootGcRsq() test for "Granger-causality" allowing nonlinear relations. A new tool for evaluation of out-of-sample portfolio performance is outOFsamp(). Panel data implementation is now included. See eight vignettes of the package for theory, examples, and usage tips. See Vinod (2019) <doi:10.1080/03610918.2015.1122048>.

Maintained by H. D. Vinod. Last updated 1 year ago.

10.6 match 2 stars 4.48 score 63 scripts 1 dependent

open-aims

bayesnec:A Bayesian No-Effect- Concentration (NEC) Algorithm

Implementation of No-Effect-Concentration estimation that uses 'brms' (see Burkner (2017)<doi:10.18637/jss.v080.i01>; Burkner (2018)<doi:10.32614/RJ-2018-017>; Carpenter 'et al.' (2017)<doi:10.18637/jss.v076.i01>) to fit concentration(dose)-response data using Bayesian methods for the purpose of estimating 'ECx' values, but more particularly 'NEC' (see Fox (2010)<doi:10.1016/j.ecoenv.2009.09.012>), 'NSEC' (see Fisher and Fox (2023)<doi:10.1002/etc.5610>), and 'N(S)EC' (see Fisher et al. 2023<doi:10.1002/ieam.4809>). A full description of this package can be found in Fisher 'et al.' (2024)<doi:10.18637/jss.v110.i05>. This package expands and supersedes an original version implemented in 'R2jags' (see Su and Yajima (2020)<https://CRAN.R-project.org/package=R2jags>; Fisher et al. (2020)<doi:10.5281/ZENODO.3966864>).

Maintained by Rebecca Fisher. Last updated 7 months ago.

bayesian-inference concentration-response ecotoxicology no-effect-concentration non-linear-decay threshold-derivation toxicology

5.8 match 12 stars 8.11 score 360 scripts

jepusto

scdhlm:Estimating Hierarchical Linear Models for Single-Case Designs

Provides a set of tools for estimating hierarchical linear models and effect sizes based on data from single-case designs. Functions are provided for calculating standardized mean difference effect sizes that are directly comparable to standardized mean differences estimated from between-subjects randomized experiments, as described in Hedges, Pustejovsky, and Shadish (2012) <DOI:10.1002/jrsm.1052>; Hedges, Pustejovsky, and Shadish (2013) <DOI:10.1002/jrsm.1086>; Pustejovsky, Hedges, and Shadish (2014) <DOI:10.3102/1076998614547577>; and Chen, Pustejovsky, Klingbeil, and Van Norman (2023) <DOI:10.1016/j.jsp.2023.02.002>. Includes an interactive web interface.

Maintained by James Pustejovsky. Last updated 1 year ago.

8.4 match 4 stars 5.62 score 52 scripts

bioc

HiCcompare:HiCcompare: Joint normalization and comparative analysis of multiple Hi-C datasets

HiCcompare provides functions for joint normalization and difference detection in multiple Hi-C datasets. HiCcompare operates on processed Hi-C data in the form of chromosome-specific chromatin interaction matrices. It accepts three-column tab-separated text files storing chromatin interaction matrices in a sparse matrix format which are available from several sources. HiCcompare is designed to give the user the ability to perform a comparative analysis on the 3-Dimensional structure of the genomes of cells in different biological states. `HiCcompare` differs from other packages that attempt to compare Hi-C data in that it works on processed data in chromatin interaction matrix format instead of pre-processed sequencing data. In addition, `HiCcompare` provides a non-parametric method for the joint normalization and removal of biases between two Hi-C datasets for the purpose of comparative analysis. `HiCcompare` also provides a simple yet robust method for detecting differences between Hi-C datasets.

Maintained by Mikhail Dozmorov. Last updated 5 months ago.

software hic sequencing normalization difference-detection hi-c visualization

5.5 match 19 stars 8.61 score 51 scripts 5 dependents

predictiveecology

Require:Installing and Loading R Packages for Reproducible Workflows

A single key function, 'Require', that provides rerun-tolerant versions of 'install.packages' and 'require' for CRAN packages, packages no longer on CRAN (i.e., archived), specific versions of packages, and GitHub packages. This approach is developed to create reproducible workflows that are flexible and fast enough to use while in development stages, while able to build snapshots once a stable package collection is found. As with other functions in a reproducible workflow, this package emphasizes functions that return the same result whether it is the first or subsequent times running the function, with subsequent times being sufficiently fast that they can be run every time without undue waiting burden on the user or developer.

Maintained by Eliot J B McIntire. Last updated 14 days ago.

5.0 match 22 stars 9.42 score 144 scripts 13 dependents

cjvanlissa

tidySEM:Tidy Structural Equation Modeling

A tidy workflow for generating, estimating, reporting, and plotting structural equation models using 'lavaan', 'OpenMx', or 'Mplus'. Throughout this workflow, elements of syntax, results, and graphs are represented as 'tidy' data, making them easy to customize. Includes functionality to estimate latent class analyses, and to plot 'dagitty' and 'igraph' objects.

Maintained by Caspar J. van Lissa. Last updated 7 days ago.

4.4 match 58 stars 10.69 score 330 scripts 1 dependent

drostlab

philentropy:Similarity and Distance Quantification Between Probability Functions

Computes 46 optimized distance and similarity measures for comparing probability functions (Drost (2018) <doi:10.21105/joss.00765>). These comparisons between probability functions have their foundations in a broad range of scientific disciplines from mathematics to ecology. The aim of this package is to provide a core framework for clustering, classification, statistical inference, goodness-of-fit, non-parametric statistics, information theory, and machine learning tasks that are based on comparing univariate or multivariate probability functions.

Maintained by Hajk-Georg Drost. Last updated 3 months ago.

distance-measures distance-quantification information-theory jensen-shannon-divergence parametric-distributions similarity-measures statistics cpp

3.8 match 137 stars 12.44 score 484 scripts 24 dependents
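
One of the measures named in the philentropy tags is the Jensen-Shannon divergence. As a hedged illustration of what comparing two probability functions means here, the Python sketch below computes it for discrete probability vectors using base-2 logarithms. It is not philentropy's implementation or API; the helper names `kl` and `jensen_shannon` are hypothetical.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute zero by convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two probability vectors."""
    # The mixture m is positive wherever p or q is, so kl() never divides by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.5, 0.3, 0.2]
q = [0.1, 0.4, 0.5]
print(jensen_shannon(p, q))
```

Unlike the Kullback-Leibler divergence, this measure is symmetric in its arguments and, with base-2 logarithms, bounded between 0 (identical distributions) and 1 (disjoint support).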

richfitz

diversitree:Comparative 'Phylogenetic' Analyses of Diversification

Contains a number of comparative 'phylogenetic' methods, mostly focusing on analysing diversification and character evolution. Contains implementations of 'BiSSE' (Binary State 'Speciation' and Extinction) and its unresolved tree extensions, 'MuSSE' (Multiple State 'Speciation' and Extinction), 'QuaSSE', 'GeoSSE', and 'BiSSE-ness'. Other methods include Markov models of discrete and continuous trait evolution and constant rate 'speciation' and extinction.

Maintained by Richard G. FitzJohn. Last updated 6 months ago.

fftw3 gsl openblas cpp

5.5 match 33 stars 8.51 score 524 scripts 4 dependents

tidyverse

dplyr:A Grammar of Data Manipulation

A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

Maintained by Hadley Wickham. Last updated 12 days ago.

data-manipulation grammar cpp

1.9 match 4.8k stars 24.68 score 659k scripts 7.8k dependents

chrisaberson

pwr2ppl:Power Analyses for Common Designs (Power to the People)

Statistical power analysis for designs including t-tests, correlations, multiple regression, ANOVA, mediation, and logistic regression. Functions accompany Aberson (2019) <doi:10.4324/9781315171500>.

Maintained by Chris Aberson. Last updated 3 years ago.

11.1 match 17 stars 4.16 score 17 scripts

bioc

ChIPpeakAnno:Batch annotation of the peaks identified from either ChIP-seq, ChIP-chip experiments, or any experiments that result in large number of genomic interval data

The package encompasses a range of functions for identifying the closest gene, exon, miRNA, or custom features—such as highly conserved elements and user-supplied transcription factor binding sites. Additionally, users can retrieve sequences around the peaks and obtain enriched Gene Ontology (GO) or Pathway terms. In version 2.0.5 and beyond, new functionalities have been introduced. These include features for identifying peaks associated with bi-directional promoters along with summary statistics (peaksNearBDP), summarizing motif occurrences in peaks (summarizePatternInPeaks), and associating additional identifiers with annotated peaks or enrichedGO (addGeneIDs). The package integrates with various other packages such as biomaRt, IRanges, Biostrings, BSgenome, GO.db, multtest, and stat to enhance its analytical capabilities.

Maintained by Jianhong Ou. Last updated 2 months ago.

annotation chipseq chipchip

5.2 match 8.75 score 584 scripts 6 dependents

ewenharrison

finalfit:Quickly Create Elegant Regression Results Tables and Plots when Modelling

Generate regression results tables and plots in final format for publication. Explore models and export directly to PDF and 'Word' using 'RMarkdown'.

Maintained by Ewen Harrison. Last updated 7 months ago.

4.0 match 270 stars 11.43 score 1.0k scripts

bioc

XVector:Foundation of external vector representation and manipulation in Bioconductor

Provides memory efficient S4 classes for storing sequences "externally" (e.g. behind an R external pointer, or on disk).

Maintained by Hervé Pagès. Last updated 2 months ago.

infrastructure datarepresentation bioconductor-package core-package zlib

4.0 match 2 stars 11.36 score 67 scripts 1.7k dependents

bioc

compcodeR:RNAseq data simulation, differential expression analysis and performance comparison of differential expression methods

This package provides extensive functionality for comparing results obtained by different methods for differential expression analysis of RNAseq data. It also contains functions for simulating count data. Finally, it provides convenient interfaces to several packages for performing the differential expression analysis. These can also be used as templates for setting up and running a user-defined differential analysis workflow within the framework of the package.

Maintained by Charlotte Soneson. Last updated 3 months ago.

immunooncology rnaseq differentialexpression

5.6 match 11 stars 8.06 score 26 scripts

ms609

Quartet:Comparison of Phylogenetic Trees Using Quartet and Split Measures

Calculates the number of four-taxon subtrees consistent with a pair of cladograms, calculating the symmetric quartet distance of Bandelt & Dress (1986), Reconstructing the shape of a tree from observed dissimilarity data, Advances in Applied Mathematics, 7, 309-343 <doi:10.1016/0196-8858(86)90038-2>, and using the tqDist algorithm of Sand et al. (2014), tqDist: a library for computing the quartet and triplet distances between binary or general trees, Bioinformatics, 30, 2079–2080 <doi:10.1093/bioinformatics/btu157> for pairs of binary trees.

Maintained by Martin R. Smith. Last updated 2 months ago.

bioinformatics comparison phylogenetic-trees phylogenetics quartet quartet-distance research-tool tree cpp

5.6 match 14 stars 8.00 score 40 scripts

simongrund1

mitml:Tools for Multiple Imputation in Multilevel Modeling

Provides tools for multiple imputation of missing data in multilevel modeling. Includes a user-friendly interface to the packages 'pan' and 'jomo', and several functions for visualization, data management and the analysis of multiply imputed data sets.

Maintained by Simon Grund. Last updated 1 year ago.

imputation missing-data mixed-effects multilevel-data multilevel-models

3.6 match 29 stars 12.36 score 246 scripts 153 dependents

nicebread

RSA:Response Surface Analysis

Advanced response surface analysis. The main function RSA computes and compares several nested polynomial regression models (full second- or third-order polynomial, shifted and rotated squared difference model, rising ridge surfaces, basic squared difference model, asymmetric or level-dependent congruence effect models). The package provides plotting functions for 3d wireframe surfaces, interactive 3d plots, and contour plots. Calculates many surface parameters (a1 to a5, principal axes, stationary point, eigenvalues) and provides standard, robust, or bootstrapped standard errors and confidence intervals for them.

Maintained by Felix Schönbrodt. Last updated 11 months ago.

7.1 match 17 stars 6.30 score 26 scripts 1 dependent

desctable

desctable:Produce Descriptive and Comparative Tables Easily

Easily create descriptive and comparative tables. It makes use of and integrates directly with the tidyverse family of packages, and pipes. Tables are produced as (nested) dataframes for easy manipulation.

Maintained by Maxime Wack. Last updated 3 years ago.

markdown statistics tidyverse

6.5 match 52 stars 6.85 score 45 scripts

psavary3

graph4lg:Build Graphs for Landscape Genetics Analysis

Build graphs for landscape genetics analysis. This set of functions can be used to import and convert spatial and genetic data initially in different formats, import landscape graphs created with 'GRAPHAB' software (Foltete et al., 2012) <doi:10.1016/j.envsoft.2012.07.002>, make diagnosis plots of isolation by distance relationships in order to choose how to build genetic graphs, create graphs with a large range of pruning methods, weight their links with several genetic distances, plot and analyse graphs, compare them with other graphs. It uses functions from other packages such as 'adegenet' (Jombart, 2008) <doi:10.1093/bioinformatics/btn129> and 'igraph' (Csardi et Nepusz, 2006) <https://igraph.org/>. It also implements methods commonly used in landscape genetics to create graphs, described by Dyer et Nason (2004) <doi:10.1111/j.1365-294X.2004.02177.x> and Greenbaum et Fefferman (2017) <doi:10.1111/mec.14059>, and to analyse distance data (van Strien et al., 2015) <doi:10.1038/hdy.2014.62>.

Maintained by Paul Savary. Last updated 2 years ago.

9.9 match 3 stars 4.51 score 54 scripts

computationalstylistics

stylo:Stylometric Multivariate Analyses

Supervised and unsupervised multivariate methods, supplemented by GUI and some visualizations, to perform various analyses in the field of computational stylistics, authorship attribution, etc. For further reference, see Eder et al. (2016), <https://journal.r-project.org/archive/2016/RJ-2016-007/index.html>. You are also encouraged to visit the Computational Stylistics Group's website <https://computationalstylistics.github.io/>, where a reasonable amount of information about the package and related projects is provided.

Maintained by Maciej Eder. Last updated 2 months ago.

5.2 match 186 stars 8.59 score 462 scripts

yihui

knitr:A General-Purpose Package for Dynamic Report Generation in R

Provides a general-purpose tool for dynamic report generation in R using Literate Programming techniques.

Maintained by Yihui Xie. Last updated 1 day ago.

dynamic-documents knitr literate-programming rmarkdown sweave

1.9 match 2.4k stars 23.62 score 116k scripts 4.2k dependents

choonghyunryu

dlookr:Tools for Data Diagnosis, Exploration, Transformation

A collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between the target variable and predictor. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. And it creates automated reports that support these three tasks.

Maintained by Choonghyun Ryu. Last updated 9 months ago.

4.0 match 212 stars 11.05 score 748 scripts 2 dependents

hneth

unikn:Graphical Elements of the University of Konstanz's Corporate Design

Define and use graphical elements of corporate design manuals in R. The 'unikn' package provides color functions (by defining dedicated colors and color palettes, and commands for finding, changing, viewing, and using them) and styled text elements (e.g., for marking, underlining, or plotting colored titles). The pre-defined range of colors and text decoration functions is based on the corporate design of the University of Konstanz <https://www.uni-konstanz.de/>, but can be adapted and extended for other purposes or institutions.

Maintained by Hansjoerg Neth. Last updated 3 months ago.

branding color color-palette colorscheme corporate-design palette text-decoration university-colors visual-identity

5.0 match 39 stars 8.82 score 156 scripts 2 dependents

radiant-rstats

radiant.basics:Basics Menu for Radiant: Business Analytics using R and Shiny

The Radiant Basics menu includes interfaces for probability calculation, central limit theorem simulation, comparing means and proportions, goodness-of-fit testing, cross-tabs, and correlation. The application extends the functionality in 'radiant.data'.

Maintained by Vincent Nijs. Last updated 10 months ago.

7.9 match 8 stars 5.56 score 79 scripts 3 dependents

gergness

srvyr:'dplyr'-Like Syntax for Summary Statistics of Survey Data

Use piping, verbs like 'group_by' and 'summarize', and other 'dplyr' inspired syntactic style when calculating summary statistics on survey data using functions from the 'survey' package.

Maintained by Greg Freedman Ellis. Last updated 1 month ago.

survey

3.1 match 215 stars 13.88 score 1.8k scripts 15 dependents

nakarinp

longreadvqs:Viral Quasispecies Comparison from Long-Read Sequencing Data

Performs a variety of viral quasispecies diversity analyses [see Pamornchainavakul et al. (2024) <doi:10.21203/rs.3.rs-4637890/v1>] based on long-read sequence alignment. Main functions include 1) sequencing error and other noise minimization and read sampling, 2) single nucleotide variant (SNV) profile comparison, and 3) viral quasispecies profile comparison and visualization.

Maintained by Nakarin Pamornchainavakul. Last updated 7 months ago.

9.3 match 4.65 score 4 scripts

cran

epiR:Tools for the Analysis of Epidemiological Data

Tools for the analysis of epidemiological and surveillance data. Contains functions for directly and indirectly adjusting measures of disease frequency, quantifying measures of association on the basis of single or multiple strata of count data presented in a contingency table, computation of confidence intervals around incidence risk and incidence rate estimates and sample size calculations for cross-sectional, case-control and cohort studies. Surveillance tools include functions to calculate an appropriate sample size for 1- and 2-stage representative freedom surveys, functions to estimate surveillance system sensitivity and functions to support scenario tree modelling analyses.

Maintained by Mark Stevenson. Last updated 2 months ago.

5.3 match 10 stars 8.18 score 10 dependents

alexkowa

EnvStats:Package for Environmental Statistics, Including US EPA Guidance

Graphical and statistical analyses of environmental data, with focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book "EnvStats: An R Package for Environmental Statistics" (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, <doi:10.1007/978-1-4614-8456-1>).

Maintained by Alexander Kowarik. Last updated 16 days ago.

3.4 match 26 stars 12.80 score 2.4k scripts 46 dependents

briencj

growthPheno:Functional Analysis of Phenotypic Growth Data to Smooth and Extract Traits

Assists in the plotting and functional smoothing of traits measured over time and the extraction of features from these traits, implementing the SET (Smoothing and Extraction of Traits) method described in Brien et al. (2020) Plant Methods, 16. Smoothing of growth trends for individual plants using natural cubic smoothing splines or P-splines is available for removing transient effects and segmented smoothing is available to deal with discontinuities in growth trends. There are graphical tools for assessing the adequacy of trait smoothing, both when using this and other packages, such as those that fit nonlinear growth models. A range of per-unit (plant, pot, plot) growth traits or features can be extracted from the data, including single time points, interval growth rates and other growth statistics, such as maximum growth or days to maximum growth. The package also has tools adapted to inputting data from high-throughput phenotyping facilities, such as from a Lemna-Tec Scananalyzer 3D (see <https://www.youtube.com/watch?v=MRAF_mAEa7E/> for more information). The package 'growthPheno' can also be installed from <http://chris.brien.name/rpackages/>.

Maintained by Chris Brien. Last updated 2 months ago.

6.6 match 6 stars 6.53 score 42 scripts

flr

FLCore:Core Package of FLR, Fisheries Modelling in R

Core classes and methods for FLR, a framework for fisheries modelling and management strategy simulation in R. Developed by a team of fisheries scientists in various countries. More information can be found at <http://flr-project.org/>.

Maintained by Iago Mosqueira. Last updated 9 days ago.

fisheries flr fisheries-modelling

4.9 match 16 stars 8.78 score 956 scripts 23 dependents

bioc

TEKRABber:An R package estimates the correlations of orthologs and transposable elements between two species

TEKRABber provides a user-friendly pipeline for comparing orthologs and transposable elements (TEs) between two species. It uses the orthology confidence between the two species from BioMart to normalize expression counts and detect differentially expressed orthologs/TEs. It then provides one-to-one correlation analysis for the desired orthologs and TEs. There is also an app function for a first look at the results. Users can prepare ortholog/TE RNA-seq expression data however they prefer and run TEKRABber by following the data structure described in the vignettes.

Maintained by Yao-Chung Chen. Last updated 20 days ago.

differentialexpression normalization transcription geneexpression bioconductor cpp

8.0 match 3 stars 5.33 score 18 scripts

gavinsimpson

analogue:Analogue and Weighted Averaging Methods for Palaeoecology

Fits Modern Analogue Technique and Weighted Averaging transfer function models for prediction of environmental data from species data, and related methods used in palaeoecology.

Maintained by Gavin L. Simpson. Last updated 6 months ago.

4.8 match 14 stars 8.96 score 185 scripts 4 dependents

bryanhanson

LearnPCA:Functions, Data Sets and Vignettes to Aid in Learning Principal Components Analysis (PCA)

Principal component analysis (PCA) is one of the most widely used data analysis techniques. This package provides a series of vignettes explaining PCA starting from basic concepts. The primary purpose is to serve as a self-study resource for anyone wishing to understand PCA better. A few convenience functions are provided as well.

Maintained by Bryan A. Hanson. Last updated 10 months ago.

6.8 match 10 stars 6.20 score 1 script

civilstat

RankingProject:The Ranking Project: Visualizations for Comparing Populations

Functions to generate plots and tables for comparing independently-sampled populations. Companion package to "A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals" by Wright, Klein, and Wieczorek (2019) <DOI:10.1080/00031305.2017.1392359> and "A Joint Confidence Region for an Overall Ranking of Populations" by Klein, Wright, and Wieczorek (2020) <DOI:10.1111/rssc.12402>.

Maintained by Jerzy Wieczorek. Last updated 3 years ago.

8.4 match 7 stars 5.02 score 10 scripts

ropensci

redland:RDF Library Bindings in R

Provides methods to parse, query and serialize information stored in the Resource Description Framework (RDF). RDF is described at <https://www.w3.org/TR/rdf-primer/>. This package supports RDF by implementing an R interface to the Redland RDF C library, described at <https://librdf.org/docs/api/index.html>. In brief, RDF provides a structured graph consisting of Statements composed of Subject, Predicate, and Object Nodes.

Maintained by Matthew B. Jones. Last updated 1 year ago.

redland

5.4 match 17 stars 7.85 score 98 scripts 13 dependents

emcramer

CHOIRBM:Plots the CHOIR Body Map

Collection of utility functions for visualizing body map data collected with the Collaborative Health Outcomes Information Registry.

Maintained by Eric Cramer. Last updated 1 year ago.

body-map cbm choir data-visualization visualization

7.6 match 5 stars 5.51 score 26 scripts

xiw021

ccmEstimator:Comparative Causal Mediation Estimation

Functions to perform comparative causal mediation analysis to compare the mediation effects of different treatments via a common mediator. Results contain the estimates and confidence intervals for the two comparative causal mediation analysis estimands, as well as the ATE and ACME for each treatment. Functions provided in the package will automatically assess the comparative causal mediation analysis scope conditions (i.e. for each comparative causal mediation estimand, a numerator and denominator that are both estimated with the desired statistical significance and of the same sign). Results will be returned for each comparative causal mediation estimand only if scope conditions are met for it. See details in Bansak (2020) <doi:10.1017/pan.2019.31>.

Maintained by Xiaohan Wu. Last updated 4 years ago.

11.3 match 3.70 score 3 scripts

jgoungounga

xhaz:Excess Hazard Modelling Considering Inappropriate Mortality Rates

Fits relative survival regression models with or without proportional excess hazards and with the additional possibility to correct for background mortality by one or more parameter(s). These models are relevant when the observed mortality in the studied group is not comparable to that of the general population or in population-based studies where the available life tables used for net survival estimation are insufficiently stratified. In the latter case, the proposed model by Touraine et al. (2020) <doi:10.1177/0962280218823234> can be used. The user can also fit a model that relaxes the proportional expected hazards assumption considered in the Touraine et al. excess hazard model. This extension was proposed by Mba et al. (2020) <doi:10.1186/s12874-020-01139-z> to allow non-proportional effects of the additional variable on the general population mortality. In non-population-based studies, researchers can identify non-comparability source of bias in terms of expected mortality of selected individuals. An excess hazard model correcting this selection bias is presented in Goungounga et al. (2019) <doi:10.1186/s12874-019-0747-3>. This class of model with a random effect at the cluster level on excess hazard is presented in Goungounga et al. (2023) <doi:10.1002/bimj.202100210>.

Maintained by Juste Goungounga. Last updated 9 months ago.

13.7 match 3.04 score 11 scripts

rstudio

tfruns:Training Run Tools for 'TensorFlow'

Create and manage unique directories for each 'TensorFlow' training run. Provides a unique, time stamped directory for each run along with functions to retrieve the directory of the latest run or latest several runs.

Maintained by Tomasz Kalinowski. Last updated 11 months ago.

3.5 match 34 stars 11.80 score 325 scripts 77 dependents

bjoernrohr

sampcompR:Comparing and Visualizing Differences Between Surveys

Easily analyze and visualize differences between samples (e.g., benchmark comparisons, nonresponse comparisons in surveys) on three levels. The comparisons can be univariate, bivariate or multivariate. On the univariate level, the variables of interest of a survey and a comparison survey (i.e. benchmark) are compared by calculating one of several difference measures (e.g., relative difference in mean) and an average difference between the surveys. On the bivariate level, a function can calculate significant differences in correlations for the surveys. On the multivariate level, a function can calculate significant differences in model coefficients between the surveys being compared. All of those differences can be easily plotted and output as a table. For more detailed information on the methods and example use see Rohr, B., Silber, H., & Felderer, B. (2024). Comparing the Accuracy of Univariate, Bivariate, and Multivariate Estimates across Probability and Nonprobability Surveys with Population Benchmarks. Sociological Methodology <doi:10.1177/00811750241280963>.

Maintained by Bjoern Rohr. Last updated 3 days ago.

9.2 match 4 stars 4.51 score 4 scripts

insightsengineering

tern:Create Common TLGs Used in Clinical Trials

Tables, Listings, and Graphs (TLG) library for common outputs used in clinical trials.

Maintained by Joe Zhu. Last updated 2 months ago.

clinical-trials graphs listings nest outputs tables

3.3 match 79 stars 12.62 score 186 scripts 9 dependents

bioc

netZooR:Unified methods for the inference and analysis of gene regulatory networks

netZooR unifies the implementations of several Network Zoo methods (netzoo, netzoo.github.io) into a single package by creating interfaces between network inference and network analysis methods. Currently, the package has 3 methods for network inference including PANDA and its optimized implementation OTTER (network reconstruction using multiple lines of biological evidence), LIONESS (single-sample network inference), and EGRET (genotype-specific networks). Network analysis methods include CONDOR (community detection), ALPACA (differential community detection), CRANE (significance estimation of differential modules), and MONSTER (estimation of network transition states). In addition, YARN allows processing of gene expression data for tissue-specific analyses and SAMBAR infers missing mutation data based on pathway information.

Maintained by Tara Eicher. Last updated 8 days ago.

networkinference network generegulation geneexpression transcription microarray graphandnetwork gene-regulatory-network transcription-factors

5.1 match 105 stars 7.98 score

bioc

PhyloProfile:PhyloProfile

PhyloProfile is a tool for exploring complex phylogenetic profiles. Phylogenetic profiles, presence/absence patterns of genes over a set of species, are commonly used to trace the functional and evolutionary history of genes across species and time. With PhyloProfile we can enrich regular phylogenetic profiles with further data like sequence/structure similarity, to make phylogenetic profiling more meaningful. Besides the interactive visualisation powered by R-Shiny, the package offers a set of further analysis features to gain insights like the gene age estimation or core gene identification.

Maintained by Vinh Tran. Last updated 6 days ago.

software visualization datarepresentation multiplecomparison functionalprediction dimensionreduction bioinformatics heatmap interactive-visualizations orthologs phylogenetic-profile shiny

5.3 match 33 stars 7.77 score 10 scripts

bioc

genefu:Computation of Gene Expression-Based Signatures in Breast Cancer

This package contains functions implementing various tasks usually required by gene expression analysis, especially in breast cancer studies: gene mapping between different microarray platforms, identification of molecular subtypes, implementation of published gene signatures, gene selection, and survival analysis.

Maintained by Benjamin Haibe-Kains. Last updated 4 months ago.

differentialexpression geneexpression visualization clustering classification

5.5 match 7.42 score 193 scripts 3 dependents

csafe-isu

handwriterRF:Handwriting Analysis with Random Forests

Perform forensic handwriting analysis of two scanned handwritten documents. This package implements the statistical method described by Madeline Johnson and Danica Ommen (2021) <doi:10.1002/sam.11566>. Similarity measures and a random forest produce a score-based likelihood ratio that quantifies the strength of the evidence in favor of the documents being written by the same writer or different writers.

Maintained by Stephanie Reinders. Last updated 8 days ago.

jags cpp

6.6 match 2 stars 6.18 score 15 scripts 1 dependent

jglev

veccompare:Perform Set Operations on Vectors, Automatically Generating All n-Wise Comparisons, and Create Markdown Output

Automates set operations (i.e., comparisons of overlap) between multiple vectors. It also contains a function for automating reporting in 'RMarkdown', by generating markdown output for easy analysis, as well as an 'RMarkdown' template for use with 'RStudio'.

Maintained by Jacob Gerard Levernier. Last updated 8 years ago.

11.3 match 8 stars 3.60 score 10 scripts

xrobin

pROC:Display and Analyze ROC Curves

Tools for visualizing, smoothing and comparing receiver operating characteristic (ROC) curves. (Partial) area under the curve (AUC) can be compared with statistical tests based on U-statistics or bootstrap. Confidence intervals can be computed for (p)AUC or ROC curves.

Maintained by Xavier Robin. Last updated 4 months ago.

bootstrapping covariance hypothesis-testing machine-learning plot plotting roc roc-curve variance cpp

2.7 match 125 stars 15.18 score 16k scripts 445 dependents

rolkra

explore:Simplifies Exploratory Data Analysis

Interactive data exploration with one line of code, automated reporting, or an easy-to-remember set of tidy functions for low-code exploratory data analysis.

Maintained by Roland Krasser. Last updated 3 months ago.

data-exploration data-visualisation decision-trees eda rmarkdown shiny tidy

3.5 match 228 stars 11.43 score 221 scripts 1 dependent

stan-dev

rstanarm:Bayesian Applied Regression Modeling via Stan

Estimates previously compiled regression models using the 'rstan' package, which provides the R interface to the Stan C++ library for Bayesian estimation. Users specify models via the customary R syntax with a formula and data.frame plus some additional arguments for priors.

Maintained by Ben Goodrich. Last updated 9 months ago.

bayesian bayesian-data-analysis bayesian-inference bayesian-methods bayesian-statistics multilevel-models rstan rstanarm stan statistical-modeling cpp

2.6 match 393 stars 15.68 score 5.0k scripts 13 dependents

bioc

DESeq2:Differential gene expression analysis based on the negative binomial distribution

Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.

Maintained by Michael Love. Last updated 10 days ago.

sequencing rnaseq chipseq geneexpression transcription normalization differentialexpression bayesian regression principalcomponent clustering immunooncology openblas cpp

2.5 match 375 stars 16.11 score 17k scripts 115 dependents

biostatomics

Coxmos:Cox MultiBlock Survival

This software package provides Cox survival analysis for high-dimensional and multiblock datasets. It encompasses a suite of functions ranging from classical Cox regression to newer analyses, including the Cox proportional hazards model, stepwise Cox regression, Elastic-Net Cox regression, Sparse Partial Least Squares Cox regression (sPLS-COX) incorporating three distinct strategies, and two Multiblock-PLS Cox regression (MB-sPLS-COX) methods. This tool is designed to handle high-dimensional data, and provides tools for cross-validation, plot generation, and additional resources for interpreting results. While references are available within the corresponding functions, key literature is mentioned below. Terry M Therneau (2024) <https://CRAN.R-project.org/package=survival>, Noah Simon et al. (2011) <doi:10.18637/jss.v039.i05>, Philippe Bastien et al. (2005) <doi:10.1016/j.csda.2004.02.005>, Philippe Bastien (2008) <doi:10.1016/j.chemolab.2007.09.009>, Philippe Bastien et al. (2014) <doi:10.1093/bioinformatics/btu660>, Kassu Mehari Beyene and Anouar El Ghouch (2020) <doi:10.1002/sim.8671>, Florian Rohart et al. (2017) <doi:10.1371/journal.pcbi.1005752>.

Maintained by Pedro Salguero García. Last updated 10 days ago.

7.6 match 1 star 5.30 score 5 scripts

csgillespie

poweRlaw:Analysis of Heavy Tailed Distributions

An implementation of maximum likelihood estimators for a variety of heavy tailed distributions, including both the discrete and continuous power law distributions. Additionally, a goodness-of-fit based approach is used to estimate the lower cut-off for the scaling region.

Maintained by Colin Gillespie. Last updated 1 month ago.

clauset powerlaw

3.1 match 112 stars 12.79 score 332 scripts 32 dependents