R-universe search: variability

pharmaverse

admiral:ADaM in R Asset Library

A toolbox for programming Clinical Data Interchange Standards Consortium (CDISC) compliant Analysis Data Model (ADaM) datasets in R. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide" (CDISC Analysis Data Model Team, 2021, <https://www.cdisc.org/standards/foundational/adam>).

Maintained by Ben Straub. Last updated 4 days ago.

cdisc clinical-trials open-source

100.5 match 236 stars 13.89 score 486 scripts 4 dependents

crunch-io

crunch:Crunch.io Data Tools

The Crunch.io service <https://crunch.io/> provides a cloud-based data store and analytic engine, as well as an intuitive web interface. Using this package, analysts can interact with and manipulate Crunch datasets from within R. Importantly, this allows technical researchers to collaborate naturally with team members, managers, and clients who prefer a point-and-click interface.

Maintained by Greg Freedman Ellis. Last updated 11 days ago.

118.8 match 9 stars 10.53 score 200 scripts 2 dependents

rolkra

explore:Simplifies Exploratory Data Analysis

Interactive data exploration with one line of code, automated reporting, or an easy-to-remember set of tidy functions for low-code exploratory data analysis.

Maintained by Roland Krasser. Last updated 3 months ago.

data-exploration data-visualisation decision-trees eda rmarkdown shiny tidy

92.4 match 228 stars 11.43 score 221 scripts 1 dependent

stan-dev

posterior:Tools for Working with Posterior Distributions

Provides useful tools for both users and developers of packages for fitting Bayesian models or working with output from Bayesian models. The primary goals of the package are to: (a) Efficiently convert between many different useful formats of draws (samples) from posterior or prior distributions. (b) Provide consistent methods for operations commonly performed on draws, for example, subsetting, binding, or mutating draws. (c) Provide various summaries of draws in convenient formats. (d) Provide lightweight implementations of state of the art posterior inference diagnostics. References: Vehtari et al. (2021) <doi:10.1214/20-BA1221>.

Maintained by Paul-Christian Bürkner. Last updated 11 days ago.

bayes bayesian mcmc

63.4 match 168 stars 16.13 score 3.3k scripts 342 dependents

choonghyunryu

dlookr:Tools for Data Diagnosis, Exploration, Transformation

A collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between the target variable and predictor. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. And it creates automated reports that support these three tasks.

Maintained by Choonghyun Ryu. Last updated 9 months ago.

81.5 match 212 stars 11.05 score 748 scripts 2 dependents

adeverse

ade4:Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences

Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.

Maintained by Aurélie Siberchicot. Last updated 13 days ago.

openblas cpp

52.8 match 39 stars 14.96 score 2.2k scripts 256 dependents

revelle

psych:Procedures for Psychological, Psychometric, and Personality Research

A general purpose toolbox developed originally for personality, psychometric theory and experimental psychology. Functions are primarily for multivariate analysis and scale construction using factor analysis, principal component analysis, cluster analysis and reliability analysis, although others provide basic descriptive statistics. Item Response Theory is done using factor analysis of tetrachoric and polychoric correlations. Functions for analyzing data at multiple levels include within and between group statistics, including correlations and factor analysis. Validation and cross validation of scales developed using basic machine learning algorithms are provided, as are functions for simulating and testing particular item and test structures. Several functions serve as a useful front end for structural equation modeling. Graphical displays of path diagrams, including mediation models, factor analysis and structural equation models are created using basic graphics. Some of the functions are written to support a book on psychometric theory as well as publications in personality research. For more information, see the <https://personality-project.org/r/> web page.

Maintained by William Revelle. Last updated 3 months ago.

56.2 match 52 stars 13.94 score 29k scripts 317 dependents

tidymodels

recipes:Preprocessing and Feature Engineering Steps for Modeling

A recipe prepares your data for modeling. The package provides an extensible framework for pipeable sequences of feature engineering steps, which serve as preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.

Maintained by Max Kuhn. Last updated 6 days ago.

40.9 match 584 stars 18.71 score 7.2k scripts 380 dependents

tidyverse

dplyr:A Grammar of Data Manipulation

A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

Maintained by Hadley Wickham. Last updated 13 days ago.

data-manipulation grammar cpp

27.4 match 4.8k stars 24.68 score 659k scripts 7.8k dependents

beckerbenj

eatGADS:Data Management of Large Hierarchical Data

Import 'SPSS' data, handle and change 'SPSS' meta data, store and access large hierarchical data in 'SQLite' data bases.

Maintained by Benjamin Becker. Last updated 23 days ago.

84.5 match 1 star 7.36 score 34 scripts 1 dependent

gdemin

expss:Tables, Labels and Some Useful Functions from Spreadsheets and 'SPSS' Statistics

Package computes and displays tables with support for 'SPSS'-style labels, multiple and nested banners, weights, multiple-response variables and significance testing. There are facilities for nice output of tables in 'knitr', 'Shiny', '*.xlsx' files, R and 'Jupyter' notebooks. Methods for labelled variables add value labels support to base R functions and to some functions from other packages. Additionally, the package brings popular data transformation functions from 'SPSS' Statistics and 'Excel': 'RECODE', 'COUNT', 'COUNTIF', 'VLOOKUP', etc. These functions are very useful for data processing in marketing research surveys. The package is intended to help people move data processing from 'Excel' and 'SPSS' to R.

Maintained by Gregory Demin. Last updated 11 months ago.

excel labels labels-support msexcel pivot-tables recode spss spss-statistics tables variable-labels vlookup

54.9 match 84 stars 11.00 score 1.8k scripts 4 dependents

njtierney

naniar:Data Structures, Summaries, and Visualisations for Missing Data

Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. 'naniar' provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of 'ggplot2' and tidy data. The work is fully discussed at Tierney & Cook (2023) <doi:10.18637/jss.v105.i07>.

Maintained by Nicholas Tierney. Last updated 4 days ago.

data-visualisation ggplot2 missing-data missingness tidy-data

37.4 match 657 stars 15.63 score 5.1k scripts 9 dependents

afialkowski

SimMultiCorrData:Simulation of Correlated Data with Multiple Variable Types

Generate continuous (normal or non-normal), binary, ordinal, and count (Poisson or Negative Binomial) variables with a specified correlation matrix. It can also produce a single continuous variable. This package can be used to simulate data sets that mimic real-world situations (i.e. clinical or genetic data sets, plasmodes). All variables are generated from standard normal variables with an imposed intermediate correlation matrix. Continuous variables are simulated by specifying mean, variance, skewness, standardized kurtosis, and fifth and sixth standardized cumulants using either Fleishman's third-order (<DOI:10.1007/BF02293811>) or Headrick's fifth-order (<DOI:10.1016/S0167-9473(02)00072-5>) polynomial transformation. Binary and ordinal variables are simulated using a modification of the ordsample() function from 'GenOrd'. Count variables are simulated using the inverse cdf method. There are two simulation pathways which differ primarily according to the calculation of the intermediate correlation matrix. In Correlation Method 1, the intercorrelations involving count variables are determined using a simulation based, logarithmic correlation correction (adapting Yahav and Shmueli's 2012 method, <DOI:10.1002/asmb.901>). In Correlation Method 2, the count variables are treated as ordinal (adapting Barbiero and Ferrari's 2015 modification of GenOrd, <DOI:10.1002/asmb.2072>). There is an optional error loop that corrects the final correlation matrix to be within a user-specified precision value of the target matrix. The package also includes functions to calculate standardized cumulants for theoretical distributions or from real data sets, check if a target correlation matrix is within the possible correlation bounds (given the distributions of the simulated variables), summarize results (numerically or graphically), to verify valid power method pdfs, and to calculate lower standardized kurtosis bounds.

Maintained by Allison Cynthia Fialkowski. Last updated 7 years ago.

76.1 match 12 stars 7.58 score 44 scripts 6 dependents

yrosseel

lavaan:Latent Variable Analysis

Fit a variety of latent variable models, including confirmatory factor analysis, structural equation modeling and latent growth curve models.

Maintained by Yves Rosseel. Last updated 6 hours ago.

factor-analysis growth-curve-models latent-variables missing-data multilevel-models multivariate-analysis path-analysis psychometrics statistical-modeling structural-equation-modeling

32.5 match 453 stars 16.83 score 8.4k scripts 216 dependents

easystats

datawizard:Easy Data Wrangling and Statistical Transformations

A lightweight package to assist in key steps involved in any data analysis workflow: (1) wrangling the raw data to get it in the needed form, (2) applying preprocessing steps and statistical transformations, and (3) computing statistical summaries of data properties and distributions. It is also the data wrangling backend for packages in the 'easystats' ecosystem. References: Patil et al. (2022) <doi:10.21105/joss.04684>.

Maintained by Etienne Bacher. Last updated 10 days ago.

data dplyr hacktoberfest janitor manipulation reshape tidyr wrangling

36.3 match 222 stars 14.71 score 436 scripts 119 dependents

nlmixr2

rxode2:Facilities for Simulating from ODE-Based Models

Facilities for running simulations from ordinary differential equation ('ODE') models, such as pharmacometrics and other compartmental models. A compilation manager translates the ODE model into C, compiles it, and dynamically loads the object code into R for improved computational efficiency. An event table object facilitates the specification of complex dosing regimens (optional) and sampling schedules. NB: The use of this package requires both C and Fortran compilers, for details on their use with R please see Section 6.3, Appendix A, and Appendix D in the "R Administration and Installation" manual. Also the code is mostly released under GPL. The 'VODE' and 'LSODA' are in the public domain. The information is available in the inst/COPYRIGHTS.

Maintained by Matthew L. Fidler. Last updated 30 days ago.

fortran openblas cpp openmp

46.7 match 40 stars 11.24 score 220 scripts 13 dependents

ewenharrison

finalfit:Quickly Create Elegant Regression Results Tables and Plots when Modelling

Generate regression results tables and plots in final format for publication. Explore models and export directly to PDF and 'Word' using 'RMarkdown'.

Maintained by Ewen Harrison. Last updated 7 months ago.

45.8 match 270 stars 11.43 score 1.0k scripts

bioc

mixOmics:Omics Data Integration Project

Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non-exhaustive list of methods includes variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.

Maintained by Eva Hamrud. Last updated 4 days ago.

immunooncology microarray sequencing metabolomics metagenomics proteomics geneprediction multiplecomparison classification regression bioconductor genomics genomics-data genomics-visualization multivariate-analysis multivariate-statistics omics r-pkg r-project

37.5 match 182 stars 13.71 score 1.3k scripts 22 dependents

kkholst

lava:Latent Variable Models

A general implementation of Structural Equation Models with latent variables (MLE, 2SLS, and composite likelihood estimators) with both continuous, censored, and ordinal outcomes (Holst and Budtz-Joergensen (2013) <doi:10.1007/s00180-012-0344-y>). Mixture latent variable models and non-linear latent variable models (Holst and Budtz-Joergensen (2020) <doi:10.1093/biostatistics/kxy082>). The package also provides methods for graph exploration (d-separation, back-door criterion), simulation of general non-linear latent variable models, and estimation of influence functions for a broad range of statistical models.

Maintained by Klaus K. Holst. Last updated 2 months ago.

latent-variable-models simulation statistics structural-equation-models

39.1 match 33 stars 12.85 score 610 scripts 476 dependents

heli-xu

findSVI:Calculate Social Vulnerability Index for Communities

Developed by CDC/ATSDR (Centers for Disease Control and Prevention/ Agency for Toxic Substances and Disease Registry), Social Vulnerability Index (SVI) serves as a tool to assess the resilience of communities by taking into account socioeconomic and demographic factors. Provided with year(s), region(s) and a geographic level of interest, 'findSVI' retrieves required variables from US census data and calculates SVI for communities in the specified area based on CDC/ATSDR SVI documentation. Reference for the calculation methods: Flanagan BE, Gregory EW, Hallisey EJ, Heitgerd JL, Lewis B (2011) <doi:10.2202/1547-7355.1792>.

Maintained by Heli Xu. Last updated 1 month ago.

86.5 match 12 stars 5.68 score 16 scripts

mjskay

tidybayes:Tidy Data and 'Geoms' for Bayesian Models

Compose data for and extract, manipulate, and visualize posterior draws from Bayesian models ('JAGS', 'Stan', 'rstanarm', 'brms', 'MCMCglmm', 'coda', ...) in a tidy data format. Functions are provided to help extract tidy data frames of draws from Bayesian models and that generate point summaries and intervals in a tidy format. In addition, 'ggplot2' 'geoms' and 'stats' are provided for common visualization primitives like points with multiple uncertainty intervals, eye plots (intervals plus densities), and fit curves with multiple, arbitrary uncertainty bands.

Maintained by Matthew Kay. Last updated 6 months ago.

bayesian-data-analysis brms ggplot2 jags stan tidy-data visualization

33.3 match 733 stars 14.72 score 7.3k scripts 20 dependents

winvector

vtreat:A Statistically Sound 'data.frame' Processor/Conditioner

A 'data.frame' processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. 'vtreat' prepares variables so that data has fewer exceptional cases, making it easier to safely use models in production. Common problems 'vtreat' defends against: 'Inf', 'NA', too many categorical levels, rare categorical levels, and new categorical levels (levels seen during application, but not during training). Reference: "'vtreat': a data.frame Processor for Predictive Modeling", Zumel, Mount, 2016, <DOI:10.5281/zenodo.1173313>.

Maintained by John Mount. Last updated 2 months ago.

categorical-variables machine-learning-algorithms nested-models prepare-data

43.1 match 285 stars 11.19 score 328 scripts 1 dependent

rhartmano

labelr:Label Data Frames, Variables, and Values

Create and use data frame labels for data frame objects (frame labels), their columns (name labels), and individual values of a column (value labels). Value labels include one-to-one and many-to-one labels for nominal and ordinal variables, as well as numerical range-based value labels for continuous variables. Convert value-labeled variables so each value is replaced by its corresponding value label. Add values-converted-to-labels columns to a value-labeled data frame while preserving parent columns. Filter and subset a value-labeled data frame using labels, while returning results in terms of values. Overlay labels in place of values in common R commands to increase interpretability. Generate tables of value frequencies, with categories expressed as raw values or as labels. Access data frames that show value-to-label mappings for easy reference.

Maintained by Robert Hartman. Last updated 7 months ago.

83.0 match 3 stars 5.65 score 10 scripts

t-kalinowski

keras:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.

Maintained by Tomasz Kalinowski. Last updated 11 months ago.

40.9 match 10.82 score 10k scripts 54 dependents

dreamrs

fresh:Create Custom 'Bootstrap' Themes to Use in 'Shiny'

Customize 'Bootstrap' and 'Bootswatch' themes, like colors, fonts, grid layout, to use in 'Shiny' applications, 'rmarkdown' documents and 'flexdashboard'.

Maintained by Victor Perrier. Last updated 9 months ago.

bootstrap shiny shiny-applications shiny-themes

36.2 match 228 stars 11.99 score 546 scripts 47 dependents

big-life-lab

recodeflow:Contains functions to interface with variable details sheets, including recoding variables and converting them to PMML

Recode and harmonize data using variable and details sheets.

Maintained by Yulric Sequeria. Last updated 6 days ago.

63.1 match 6 stars 6.75 score 7 scripts

harrelfe

Hmisc:Harrell Miscellaneous

Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, simulation, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, recoding variables, caching, simplified parallel computing, encrypting and decrypting data using a safe workflow, general moving window statistical estimation, and assistance in interpreting principal component analysis.

Maintained by Frank E Harrell Jr. Last updated 9 hours ago.

fortran

22.8 match 210 stars 17.61 score 17k scripts 750 dependents

mayoverse

arsenal:An Arsenal of 'R' Functions for Large-Scale Statistical Summaries

An Arsenal of 'R' functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in 'R' and 'RStudio' and which use formulas and versatile summary statistics for summary tables and models. The primary functions include tableby(), a Table-1-like summary of multiple variable types 'by' the levels of one or more categorical variables; paired(), a Table-1-like summary of multiple variable types paired across two time points; modelsum(), which performs simple model fits on one or more endpoints for many variables (univariate or adjusted for covariates); freqlist(), a powerful frequency table across many categorical variables; comparedf(), a function for comparing data.frames; and write2(), a function to output tables to a document.

Maintained by Ethan Heinzen. Last updated 7 months ago.

baseline-characteristics descriptive-statistics modeling paired-comparisons reporting statistics tableone

29.7 match 225 stars 13.45 score 1.2k scripts 16 dependents

daya6489

SmartEDA:Summarize and Explore the Data

Exploratory analysis on any input data describing the structure and the relationships present in the data. The package automatically selects the variables and computes the related descriptive statistics. Information value, weight of evidence, custom tables, summary statistics, and graphical techniques are provided for both numeric and categorical predictors.

Maintained by Dayanand Ubrangala. Last updated 1 year ago.

analysis exploratory-data-analysis

54.4 match 42 stars 7.25 score 214 scripts

biomodhub

biomod2:Ensemble Platform for Species Distribution Modeling

Functions for species distribution modeling, calibration and evaluation, ensemble of models, ensemble forecasting and visualization. The package allows up to 10 single models to be run consistently on a presence/absence (or presence/pseudo-absence) dataset and combined into ensemble models and ensemble projections. A number of other evaluation and visualisation tools are also available within the package.

Maintained by Maya Gueguen. Last updated 5 days ago.

28.3 match 95 stars 13.88 score 536 scripts 7 dependents

koalaverse

vip:Variable Importance Plots

A general framework for constructing variable importance plots from various types of machine learning models in R. Aside from some standard model-specific variable importance measures, this package also provides model-agnostic approaches that can be applied to any supervised learning algorithm. These include 1) an efficient permutation-based variable importance measure, 2) variable importance based on Shapley values (Strumbelj and Kononenko, 2014) <doi:10.1007/s10115-013-0679-x>, and 3) the variance-based approach described in Greenwell et al. (2018) <arXiv:1805.04755>. A variance-based method for quantifying the relative strength of interaction effects is also included (see the previous reference for details).

Maintained by Brandon M. Greenwell. Last updated 2 years ago.

interaction-effect machine-learning partial-dependence-plot supervised-learning-algorithms variable-importance variable-importance-plots

33.2 match 187 stars 11.61 score 3.5k scripts 6 dependents

uupharmacometrics

xpose4:Diagnostics for Nonlinear Mixed-Effect Models

A model building aid for nonlinear mixed-effects (population) model analysis using NONMEM, facilitating data set checkout, exploration and visualization, model diagnostics, candidate covariate identification and model comparison. The methods are described in Keizer et al. (2013) <doi:10.1038/psp.2013.24>, and Jonsson et al. (1999) <doi:10.1016/s0169-2607(98)00067-4>.

Maintained by Andrew C. Hooker. Last updated 1 year ago.

diagnostics nonmem pharmacometrics population-model xpose

51.5 match 35 stars 7.30 score 315 scripts

r-forge

variables:Variable Descriptions

Abstract descriptions of (yet) unobserved variables.

Maintained by Torsten Hothorn. Last updated 4 days ago.

62.5 match 5.92 score 11 scripts 12 dependents

afialkowski

SimCorrMix:Simulation of Correlated Data with Multiple Variable Types Including Continuous and Count Mixture Distributions

Generate continuous (normal, non-normal, or mixture distributions), binary, ordinal, and count (regular or zero-inflated, Poisson or Negative Binomial) variables with a specified correlation matrix, or one continuous variable with a mixture distribution. This package can be used to simulate data sets that mimic real-world clinical or genetic data sets (i.e., plasmodes, as in Vaughan et al., 2009 <DOI:10.1016/j.csda.2008.02.032>). The methods extend those found in the 'SimMultiCorrData' R package. Standard normal variables with an imposed intermediate correlation matrix are transformed to generate the desired distributions. Continuous variables are simulated using either Fleishman (1978)'s third order <DOI:10.1007/BF02293811> or Headrick (2002)'s fifth order <DOI:10.1016/S0167-9473(02)00072-5> polynomial transformation method (the power method transformation, PMT). Non-mixture distributions require the user to specify mean, variance, skewness, standardized kurtosis, and standardized fifth and sixth cumulants. Mixture distributions require these inputs for the component distributions plus the mixing probabilities. Simulation occurs at the component level for continuous mixture distributions. The target correlation matrix is specified in terms of correlations with components of continuous mixture variables. These components are transformed into the desired mixture variables using random multinomial variables based on the mixing probabilities. However, the package provides functions to approximate expected correlations with continuous mixture variables given target correlations with the components. Binary and ordinal variables are simulated using a modification of ordsample() in package 'GenOrd'. Count variables are simulated using the inverse CDF method. There are two simulation pathways which calculate intermediate correlations involving count variables differently. 
Correlation Method 1 adapts Yahav and Shmueli's 2012 method <DOI:10.1002/asmb.901> and performs best with large count variable means and positive correlations or small means and negative correlations. Correlation Method 2 adapts Barbiero and Ferrari's 2015 modification of the 'GenOrd' package <DOI:10.1002/asmb.2072> and performs best under the opposite scenarios. The optional error loop may be used to improve the accuracy of the final correlation matrix. The package also contains functions to calculate the standardized cumulants of continuous mixture distributions, check parameter inputs, calculate feasible correlation boundaries, and summarize and plot simulated variables.

Maintained by Allison Cynthia Fialkowski. Last updated 7 years ago.

70.5 match 5 stars 5.24 score 14 scripts

r-lib

tidyselect:Select from a Set of Strings

A backend for the selecting functions of the 'tidyverse'. It makes it easy to implement select-like functions in your own packages in a way that is consistent with other 'tidyverse' interfaces for selection.

Maintained by Lionel Henry. Last updated 4 months ago.

19.7 match 130 stars 18.31 score 1.9k scripts 8.2k dependents

ekstroem

dataMaid:A Suite of Checks for Identification of Potential Errors in a Data Frame as Part of the Data Screening Process

Data screening is an important first step of any statistical analysis. dataMaid auto-generates a customizable data report with a thorough summary of the checks and the results that a human can use to identify possible errors. It provides an extendable suite of tests for common potential errors in a dataset.

Maintained by Claus Thorn Ekstrøm. Last updated 3 years ago.

data-cleaning data-screening reproducible-research

47.6 match 143 stars 7.53 score 236 scripts

vegandevs

vegan:Community Ecology Package

Ordination methods, diversity analysis and other functions for community and vegetation ecologists.

Maintained by Jari Oksanen. Last updated 16 days ago.

ecological-modelling ecology ordination fortran openblas

18.1 match 472 stars 19.41 score 15k scripts 440 dependents

rdatatable

data.table:Extension of `data.frame`

Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.

Maintained by Tyson Barrett. Last updated 4 hours ago.

14.5 match 3.7k stars 23.52 score 230k scripts 4.6k dependents

insightsengineering

rtables:Reporting Tables

Reporting tables often have structure that goes beyond simple rectangular data. The 'rtables' package provides a framework for declaring complex multi-level tabulations and then applying them to data. This framework models both tabulation and the resulting tables as hierarchical, tree-like objects which support sibling sub-tables, arbitrary splitting or grouping of data in row and column dimensions, cells containing multiple values, and the concept of contextual summary computations. A convenient pipe-able interface is provided for declaring table layouts and the corresponding computations, and then applying them to data.

Maintained by Joe Zhu. Last updated 2 months ago.

pharmaceuticals tables

24.2 match 232 stars 13.65 score 238 scripts 17 dependents

nicolas-robette

GDAtools:Geometric Data Analysis

Many tools for Geometric Data Analysis (Le Roux & Rouanet (2005) <doi:10.1007/1-4020-2236-0>), such as MCA variants (Specific Multiple Correspondence Analysis, Class Specific Analysis), many graphical and statistical aids to interpretation (structuring factors, concentration ellipses, inductive tests, bootstrap validation, etc.) and multiple-table analysis (Multiple Factor Analysis, between- and inter-class analysis, Principal Component Analysis and Correspondence Analysis with Instrumental Variables, etc.).

Maintained by Nicolas Robette. Last updated 10 months ago.

55.5 match 10 stars 5.93 score 94 scripts 2 dependents

rsquaredacademy

olsrr:Tools for Building OLS Regression Models

Tools designed to make it easier for users, particularly beginner/intermediate R users to build ordinary least squares regression models. Includes comprehensive regression output, heteroskedasticity tests, collinearity diagnostics, residual diagnostics, measures of influence, model fit assessment and variable selection procedures.

Maintained by Aravind Hebbali. Last updated 4 months ago.

collinearity-diagnostics linear-models regression stepwise-regression

26.9 match 103 stars 12.19 score 1.4k scripts 4 dependents

bdwilliamson

vimp:Perform Inference on Algorithm-Agnostic Variable Importance

Calculate point estimates of and valid confidence intervals for nonparametric, algorithm-agnostic variable importance measures in high and low dimensions, using flexible estimators of the underlying regression functions. For more information about the methods, please see Williamson et al. (Biometrics, 2020), Williamson et al. (JASA, 2021), and Williamson and Feng (ICML, 2020).

Maintained by Brian D. Williamson. Last updated 1 month ago.

machine-learning nonparametric-statistics statistical-inference variable-importance

47.5 match 23 stars 6.79 score 67 scripts

topepo

caret:Classification and Regression Training

Misc functions for training and plotting classification and regression models.

Maintained by Max Kuhn. Last updated 3 months ago.

16.6 match 1.6k stars 19.24 score 61k scripts 303 dependents

larmarange

broom.helpers:Helpers for Model Coefficients Tibbles

Provides a suite of functions to work with regression model 'broom::tidy()' tibbles. The suite includes functions to group regression model terms by variable, insert reference and header rows for categorical variables, add variable labels, and more.

Maintained by Joseph Larmarange. Last updated 10 days ago.

27.7 match 22 stars 11.45 score 165 scripts 2 dependents

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 year ago.

38.0 match 3 stars 8.20 score 7.8k scripts 11 dependents

tbates

umx:Structural Equation Modeling and Twin Modeling in R

Quickly create, run, and report structural equation models, and twin models. See '?umx' for help, and umx_open_CRAN_page("umx") for NEWS. Timothy C. Bates, Michael C. Neale, Hermine H. Maes, (2019). umx: A library for Structural Equation and Twin Modelling in R. Twin Research and Human Genetics, 22, 27-41. <doi:10.1017/thg.2019.2>.

Maintained by Timothy C. Bates. Last updated 2 days ago.

behavior-genetics genetics openmx psychology sem statistics structural-equation-modeling tutorials twin-models umx

31.7 match 44 stars 9.45 score 472 scripts

pharmaverse

admiralvaccine:Vaccine Extension Package for ADaM in 'R' Asset Library

Programming vaccine specific Clinical Data Interchange Standards Consortium (CDISC) compliant Analysis Data Model (ADaM) datasets in 'R'. Flat model is followed as per Center for Biologics Evaluation and Research (CBER) guidelines for creating vaccine specific domains. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide" (CDISC Analysis Data Model Team (2021), <https://www.cdisc.org/standards/foundational/adam/adamig-v1-3-release-package>). The package is an extension package of the 'admiral' package.

Maintained by Sukalpo Saha. Last updated 2 months ago.

cdisc

39.8 match 6 stars 7.44 score 23 scripts

pablo14

funModeling:Exploratory Data Analysis and Data Preparation Tool-Box

Around 10% of almost any predictive modeling project is spent on predictive modeling itself; 'funModeling' and the book Data Science Live Book (<https://livebook.datascienceheroes.com/>) are intended to cover the remaining 90%: data preparation, profiling, selecting best variables, 'dataViz', assessing model performance and other functions.

Maintained by Pablo Casas. Last updated 2 years ago.

34.3 match 100 stars 8.57 score 654 scripts

ekstroem

dataReporter:Reproducible Data Screening Checks and Report of Possible Errors

Data screening is an important first step of any statistical analysis. 'dataReporter' automatically generates a customizable data report with a thorough summary of the checks and the results that a human can use to identify possible errors. It provides an extendable suite of tests for common potential errors in a dataset. See Petersen AH, Ekstrøm CT (2019). "dataMaid: Your Assistant for Documenting Supervised Data Quality Screening in R." Journal of Statistical Software, 90(6), 1-38 <doi:10.18637/jss.v090.i06> for more information.

Maintained by Claus Thorn Ekstrøm. Last updated 2 years ago.

47.6 match 86 stars 6.16 score 34 scripts

tidymodels

infer:Tidy Statistical Inference

The objective of this package is to perform inference using an expressive statistical grammar that coheres with the tidy design framework.

Maintained by Simon Couch. Last updated 6 months ago.

18.7 match 734 stars 15.69 score 3.5k scripts 17 dependents

insightsengineering

tern:Create Common TLGs Used in Clinical Trials

Tables, Listings, and Graphs (TLG) library for common outputs used in clinical trials.

Maintained by Joe Zhu. Last updated 2 months ago.

clinical-trials graphs listings nest outputs tables

23.2 match 79 stars 12.62 score 186 scripts 9 dependents

matteo21q

jomo:Multilevel Joint Modelling Multiple Imputation

Similarly to Schafer's package 'pan', 'jomo' is a package for multilevel joint modelling multiple imputation (Carpenter and Kenward, 2013) <doi:10.1002/9781119942283>. Novel aspects of 'jomo' are the possibility of handling binary and categorical data through latent normal variables, the option to use cluster-specific covariance matrices and to impute compatibly with the substantive model.

Maintained by Matteo Quartagno. Last updated 3 years ago.

30.5 match 3 stars 9.58 score 126 scripts 154 dependents

bioc

RImmPort:RImmPort: Enabling Ready-for-analysis Immunology Research Data

The RImmPort package simplifies access to ImmPort data for analysis in the R environment. It provides a standards-based interface to the ImmPort study data that is in a proprietary format.

Maintained by Zicheng Hu. Last updated 5 months ago.

biomedicalinformatics dataimport datarepresentation

67.0 match 4.33 score 27 scripts

tidymodels

textrecipes:Extra 'Recipes' for Text Processing

Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf) and feature hashing.

Maintained by Emil Hvitfeldt. Last updated 9 days ago.

26.6 match 160 stars 10.87 score 964 scripts 1 dependent

andrisignorell

DescTools:Tools for Descriptive Statistics

A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox, which facilitates the (notoriously time consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. The package furthermore contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and arguments naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in absence of convincing alternatives). The 'BigCamelCase' style was consequently applied to functions borrowed from contributed R packages as well.

Maintained by Andri Signorell. Last updated 9 hours ago.

fortran cpp

16.7 match 87 stars 16.70 score 7.7k scripts 99 dependents

dgerbing

lessR:Less Code, More Results

Each function replaces multiple standard R functions. For example, two function calls, Read() and CountAll(), generate summary statistics for all variables in the data frame, plus histograms and bar charts as appropriate. Other functions provide for summary statistics via pivot tables, a comprehensive regression analysis, ANOVA and t-test, visualizations including the Violin/Box/Scatter plot for a numerical variable, bar chart, histogram, box plot, density curves, calibrated power curve, reading multiple data formats with the same function call, variable labels, time series with aggregation and forecasting, color themes, and Trellis (facet) graphics. Also includes a confirmatory factor analysis of multiple indicator measurement models, pedagogical routines for data simulation such as for the Central Limit Theorem, generation and rendering of regression instructions for interpretative output, and interactive visualizations.

Maintained by David W. Gerbing. Last updated 1 month ago.

37.0 match 6 stars 7.47 score 394 scripts 3 dependents

lrberge

fixest:Fast Fixed-Effects Estimations

Fast and user-friendly estimation of econometric models with multiple fixed-effects. Includes ordinary least squares (OLS), generalized linear models (GLM) and the negative binomial. The core of the package is based on optimized parallel C++ code, scaling especially well for large data sets. The method to obtain the fixed-effects coefficients is based on Berge (2018) <https://github.com/lrberge/fixest/blob/master/_DOCS/FENmlm_paper.pdf>. Further provides tools to export and view the results of several estimations with intuitive design to cluster the standard-errors.

Maintained by Laurent Berge. Last updated 7 months ago.

cpp openmp

18.8 match 387 stars 14.69 score 3.8k scripts 25 dependents

jenniniku

gllvm:Generalized Linear Latent Variable Models

Analysis of multivariate data using generalized linear latent variable models (gllvm). Estimation is performed using either the Laplace method, variational approximations, or extended variational approximations, implemented via TMB (Kristensen et al. (2016), <doi:10.18637/jss.v070.i05>).

Maintained by Jenni Niku. Last updated 13 hours ago.

cpp openmp

25.9 match 52 stars 10.53 score 176 scripts 1 dependent

cvxgrp

CVXR:Disciplined Convex Optimization

An object-oriented modeling language for disciplined convex programming (DCP) as described in Fu, Narasimhan, and Boyd (2020, <doi:10.18637/jss.v094.i14>). It allows the user to formulate convex optimization problems in a natural way following mathematical convention and DCP rules. The system analyzes the problem, verifies its convexity, converts it into a canonical form, and hands it off to an appropriate solver to obtain the solution. Interfaces to solvers on CRAN and elsewhere are provided, both commercial and open source.

Maintained by Anqi Fu. Last updated 4 months ago.

cpp

20.9 match 207 stars 12.89 score 768 scripts 51 dependents

larmarange

labelled:Manipulating Labelled Data

Work with labelled data imported from 'SPSS' or 'Stata' with 'haven' or 'foreign'. This package provides useful functions to deal with "haven_labelled" and "haven_labelled_spss" classes introduced by the 'haven' package.

Maintained by Joseph Larmarange. Last updated 26 days ago.

haven labels metadata sas spss stata

17.6 match 76 stars 15.02 score 2.4k scripts 96 dependents

openanalytics

inTextSummaryTable:Creation of in-Text Summary Table

Creation of tables of summary statistics or counts for clinical data (for 'TLFs'). These tables can be exported as an in-text table (with the 'flextable' package) for a Clinical Study Report (Word format) or a 'topline' presentation (PowerPoint format), or as an interactive table (with the 'DT' package) to an html document for clinical data review.

Maintained by Laure Cougnaud. Last updated 9 months ago.

45.9 match 1 star 5.52 score 47 scripts

inzightvit

iNZightTools:Tools for 'iNZight'

Provides a collection of wrapper functions for common variable and dataset manipulation workflows primarily used by 'iNZight', a graphical user interface providing easy exploration and visualisation of data for students of statistics, available in both desktop and online versions. Additionally, many of the functions return the 'tidyverse' code used to obtain the result in an effort to bridge the gap between GUI and coding.

Maintained by Tom Elliott. Last updated 3 months ago.

48.7 match 1 star 5.16 score 18 scripts 2 dependents

joon-e

tidycomm:Data Modification and Analysis for Communication Research

Provides convenience functions for common data modification and analysis tasks in communication research. This includes functions for univariate and bivariate data analysis, index generation and reliability computation, and intercoder reliability tests. All functions follow the style and syntax of the tidyverse, and are construed to perform their computations on multiple variables at once. Functions for univariate and bivariate data analysis comprise summary statistics for continuous and categorical variables, as well as several tests of bivariate association including effect sizes. Functions for data modification comprise index generation and automated reliability analysis of index variables. Functions for intercoder reliability comprise tests of several intercoder reliability estimates, including simple and mean pairwise percent agreement, Krippendorff's Alpha (Krippendorff 2004, ISBN: 9780761915454), and various Kappa coefficients (Brennan & Prediger 1981 <doi: 10.1177/001316448104100307>; Cohen 1960 <doi: 10.1177/001316446002000104>; Fleiss 1971 <doi: 10.1037/h0031619>).

Maintained by Julian Unkel. Last updated 10 months ago.

37.8 match 15 stars 6.59 score 52 scripts

pecanproject

PEcAn.data.atmosphere:PEcAn Functions Used for Managing Climate Driver Data

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The PEcAn.data.atmosphere package converts climate driver data into a standard format for models integrated into PEcAn. As a standalone package, it provides an interface to access diverse climate data sets.

Maintained by David LeBauer. Last updated 2 days ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

21.5 match 216 stars 11.59 score 64 scripts 14 dependents

rfastofficial

Rfast:A Collection of Efficient and Extremely Fast R Functions

A collection of fast (utility) functions for data analysis. Column- and row-wise means, medians, variances, minimums, maximums, many t, F and G-square tests, many regressions (normal, logistic, Poisson), are some of the many fast functions. References: a) Tsagris M., Papadakis M. (2018). Taking R to its limits: 70+ tips. PeerJ Preprints 6:e26605v1 <doi:10.7287/peerj.preprints.26605v1>. b) Tsagris M. and Papadakis M. (2018). Forward regression in R: from the extreme slow to the extreme fast. Journal of Data Science, 16(4): 771-780. <doi:10.6339/JDS.201810_16(4).00006>. c) Chatzipantsiou C., Dimitriadis M., Papadakis M. and Tsagris M. (2020). Extremely Efficient Permutation and Bootstrap Hypothesis Tests Using R. Journal of Modern Applied Statistical Methods, 18(2), eP2898. <doi:10.48550/arXiv.1806.10947>. d) Tsagris M., Papadakis M., Alenazi A. and Alzeley O. (2024). Computationally Efficient Outlier Detection for High-Dimensional Data Using the MDP Algorithm. Computation, 12(9): 185. <doi:10.3390/computation12090185>. e) Tsagris M. and Papadakis M. (2025). Fast and light-weight energy statistics using the R package Rfast. <doi:10.48550/arXiv.2501.02849>.

Maintained by Manos Papadakis. Last updated 17 days ago.

openblas cpp openmp

19.8 match 147 stars 12.54 score 1.2k scripts 166 dependents

radiant-rstats

radiant.data:Data Menu for Radiant: Business Analytics using R and Shiny

The Radiant Data menu includes interfaces for loading, saving, viewing, visualizing, summarizing, transforming, and combining data. It also contains functionality to generate reproducible reports of the analyses conducted in the application.

Maintained by Vincent Nijs. Last updated 5 months ago.

29.8 match 54 stars 8.30 score 146 scripts 6 dependents

juba

questionr:Functions to Make Surveys Processing Easier

Set of functions to make the processing and analysis of surveys easier: interactive shiny apps and addins for data recoding, contingency tables, dataset metadata handling, and several convenience functions.

Maintained by Julien Barnier. Last updated 1 day ago.

19.4 match 83 stars 12.62 score 1.1k scripts 19 dependents

sdctools

sdcMicro:Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation

Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. The theoretical basis for the methods implemented can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface, published in Meindl and Templ (2019) <doi:10.3390/a12090191>, that gives access to various methods of this package.

Maintained by Matthias Templ. Last updated 27 days ago.

cpp

24.1 match 83 stars 9.89 score 258 scripts

consbiol-unibern

SDMtune:Species Distribution Model Selection

User-friendly framework that enables the training and the evaluation of species distribution models (SDMs). The package implements functions for data driven variable selection and model tuning and includes numerous utilities to display the results. All the functions used to select variables or to tune model hyperparameters have an interactive real-time chart displayed in the 'RStudio' viewer pane during their execution.

Maintained by Sergio Vignali. Last updated 3 months ago.

hyperparameter-tuning species-distribution-modelling variable-selection cpp

32.2 match 25 stars 7.37 score 155 scripts

jacobkap

fastDummies:Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables

Creates dummy columns from columns that have categorical variables (character or factor types). You can also specify which columns to make dummies out of, or which columns to ignore. Also creates dummy rows from character, factor, and Date columns. This package provides a significant speed increase from creating dummy variables through model.matrix().

Maintained by Jacob Kaplan. Last updated 2 months ago.

binary-data dummy-columns dummy-data dummy-rows dummy-variable

18.0 match 36 stars 13.14 score 2.5k scripts 131 dependents

fabrice-rossi

mixvlmc:Variable Length Markov Chains with Covariates

Estimates Variable Length Markov Chains (VLMC) models and VLMC with covariates models from discrete sequences. Supports model selection via information criteria and simulation of new sequences from an estimated model. See Bühlmann, P. and Wyner, A. J. (1999) <doi:10.1214/aos/1018031204> for VLMC and Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022) <doi:10.1111/jtsa.12615> for VLMC with covariates.

Maintained by Fabrice Rossi. Last updated 10 months ago.

machine-learning markov-chain markov-model statistics time-series cpp

37.5 match 2 stars 6.23 score 20 scripts

hdvinod

generalCorr:Generalized Correlations, Causal Paths and Portfolio Selection

Function gmcmtx0() computes a more reliable (general) correlation matrix. Since causal paths from data are important for all sciences, the package provides many sophisticated functions. causeSummBlk() and causeSum2Blk() give easy-to-interpret causal paths. Let Z denote control variables and compare two flipped kernel regressions: X=f(Y, Z)+e1 and Y=g(X, Z)+e2. Our criterion Cr1 says that if |e1*Y|>|e2*X| then variation in X is more "exogenous or independent" than in Y, and the causal path is X to Y. Criterion Cr2 requires |e2|<|e1|. These inequalities between many absolute values are quantified by four orders of stochastic dominance. Our third criterion Cr3, for the causal path X to Y, requires new generalized partial correlations to satisfy |r*(x|y,z)|< |r*(y|x,z)|. The function parcorVec() reports generalized partials between the first variable and all others. The package provides several R functions including get0outliers() for outlier detection, bigfp() for numerical integration by the trapezoidal rule, stochdom2() for stochastic dominance, pillar3D() for 3D charts, canonRho() for generalized canonical correlations, depMeas() measures nonlinear dependence, and causeSummary(mtx) reports summary of causal paths among matrix columns. Portfolio selection: decileVote(), momentVote(), dif4mtx(), exactSdMtx() can rank several stocks. Functions whose names begin with 'boot' provide bootstrap statistical inference, including a new bootGcRsq() test for "Granger-causality" allowing nonlinear relations. A new tool for evaluation of out-of-sample portfolio performance is outOFsamp(). Panel data implementation is now included. See eight vignettes of the package for theory, examples, and usage tips. See Vinod (2019) <doi:10.1080/03610918.2015.1122048>.

Maintained by H. D. Vinod. Last updated 1 year ago.

52.1 match 2 stars 4.48 score 63 scripts 1 dependent

lleisong

itsdm:Isolation Forest-Based Presence-Only Species Distribution Modeling

Collection of R functions to do purely presence-only species distribution modeling with isolation forest (iForest) and its variations such as Extended isolation forest and SCiForest. See the details of these methods in references: Liu, F.T., Ting, K.M. and Zhou, Z.H. (2008) <doi:10.1109/ICDM.2008.17>, Hariri, S., Kind, M.C. and Brunner, R.J. (2019) <doi:10.1109/TKDE.2019.2947676>, Liu, F.T., Ting, K.M. and Zhou, Z.H. (2010) <doi:10.1007/978-3-642-15883-4_18>, Guha, S., Mishra, N., Roy, G. and Schrijvers, O. (2016) <https://proceedings.mlr.press/v48/guha16.html>, Cortes, D. (2021) <arXiv:2110.13402>. Additionally, Shapley values are used to explain model inputs and outputs. See details in references: Shapley, L.S. (1953) <doi:10.1515/9781400881970-018>, Lundberg, S.M. and Lee, S.I. (2017) <https://dl.acm.org/doi/abs/10.5555/3295222.3295230>, Molnar, C. (2020) <ISBN:978-0-244-76852-2>, Štrumbelj, E. and Kononenko, I. (2014) <doi:10.1007/s10115-013-0679-x>. itsdm also provides functions to diagnose variable response, analyze variable importance, draw spatial dependence of variables and examine variable contribution. As utilities, the package includes a few functions to download bioclimatic variables including 'WorldClim' version 2.0 (see Fick, S.E. and Hijmans, R.J. (2017) <doi:10.1002/joc.5086>) and 'CMCC-BioClimInd' (see Noce, S., Caporaso, L. and Santini, M. (2020) <doi:10.1038/s41597-020-00726-5>).

Maintained by Lei Song. Last updated 2 years ago.

isolation-forest outlier-detection presence-onlymodel shapley-value species-distribution-modelling

41.0 match 4 stars 5.59 score 65 scripts

ggobi

GGally:Extension to 'ggplot2'

The R package 'ggplot2' is a plotting system based on the grammar of graphics. 'GGally' extends 'ggplot2' by adding several functions to reduce the complexity of combining geometric objects with transformed data. Some of these functions include a pairwise plot matrix, a two group pairwise plot matrix, a parallel coordinates plot, a survival plot, and several functions to plot networks.

Maintained by Barret Schloerke. Last updated 10 months ago.

14.1 match 597 stars 16.15 score 17k scripts 154 dependents

config-i1

greybox:Toolbox for Model Building and Forecasting

Implements functions and instruments for regression model building and its application to forecasting. The main scope of the package is variable selection and model specification for time series data. This includes promotional modelling, selection between different dynamic regressions with non-standard distributions of errors, selection based on cross validation, solutions to the fat regression model problem and more. Models developed in the package are tailored specifically for forecasting purposes. As a result, there are several methods for producing forecasts from these models and visualising them.

Maintained by Ivan Svetunkov. Last updated 2 days ago.

forecasting model-selection model-selection-and-evaluation regression regression-models statistics cpp

20.7 match 30 stars 11.03 score 97 scripts 34 dependents

cdcgov

surveytable:Formatted Survey Estimates

Short and understandable commands that generate tabulated, formatted, and rounded survey estimates. Mostly a wrapper for the 'survey' package (Lumley (2004) <doi:10.18637/jss.v009.i08> <https://CRAN.R-project.org/package=survey>) that identifies low-precision estimates using the National Center for Health Statistics (NCHS) presentation standards (Parker et al. (2017) <https://www.cdc.gov/nchs/data/series/sr_02/sr02_175.pdf>, Parker et al. (2023) <doi:10.15620/cdc:124368>).

Maintained by Alex Strashny. Last updated 4 days ago.

estimates formatted-output pretty-print survey tables

33.6 match 6 stars 6.71 score 19 scripts

rcalinjageman

esci:Estimation Statistics with Confidence Intervals

A collection of functions and 'jamovi' module for the estimation approach to inferential statistics, the approach which emphasizes effect sizes, interval estimates, and meta-analysis. Nearly all functions are based on 'statpsych' and 'metafor'. This package is still under active development, and breaking changes are likely, especially with the plot and hypothesis test functions. Data sets are included for all examples from Cumming & Calin-Jageman (2024) <ISBN:9780367531508>.

Maintained by Robert Calin-Jageman. Last updated 22 days ago.

jamovi jasp science statistics visualization

40.9 match 22 stars 5.42 score 12 scripts

merliseclyde

BAS:Bayesian Variable Selection and Model Averaging using Bayesian Adaptive Sampling

Package for Bayesian Variable Selection and Model Averaging in linear models and generalized linear models using stochastic or deterministic sampling without replacement from posterior distributions. Prior distributions on coefficients are from Zellner's g-prior or mixtures of g-priors corresponding to the Zellner-Siow Cauchy Priors or the mixture of g-priors from Liang et al (2008) <DOI:10.1198/016214507000001337> for linear models or mixtures of g-priors from Li and Clyde (2019) <DOI:10.1080/01621459.2018.1469992> in generalized linear models. Other model selection criteria include AIC, BIC and Empirical Bayes estimates of g. Sampling probabilities may be updated based on the sampled models using sampling w/out replacement or an efficient MCMC algorithm which samples models using a tree structure of the model space as an efficient hash table. See Clyde, Ghosh and Littman (2010) <DOI:10.1198/jcgs.2010.09049> for details on the sampling algorithms. Uniform priors over all models or beta-binomial prior distributions on model size are allowed, and for large p truncated priors on the model space may be used to enforce sampling models that are full rank. The user may force variables to always be included in addition to imposing constraints that higher order interactions are included only if their parents are included in the model. This material is based upon work supported by the National Science Foundation under Division of Mathematical Sciences grant 1106891. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Maintained by Merlise Clyde. Last updated 4 months ago.

bayesian bayesian-inference generalized-linear-models linear-regression logistic-regression mcmc model-selection poisson-regression predictive-modeling regression variable-selection fortran openblas

20.5 match 44 stars 10.81 score 420 scripts 3 dependents

ehrlinger

ggRandomForests:Visually Exploring Random Forests

Graphic elements for exploring Random Forests using the 'randomForest' or 'randomForestSRC' package for survival, regression and classification forests and 'ggplot2' package plotting.

Maintained by John Ehrlinger. Last updated 5 days ago.

24.7 match 148 stars 8.94 score 197 scripts

kaz-yos

tableone:Create 'Table 1' to Describe Baseline Characteristics with or without Propensity Score Weights

Creates 'Table 1', i.e., a description of baseline patient characteristics, which is essential in all medical research. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. Weighted data are supported via the 'survey' package.

Maintained by Kazuki Yoshida. Last updated 3 years ago.

baseline-characteristics descriptive-statistics statistics

16.2 match 221 stars 13.55 score 2.3k scripts 12 dependents

oscarkjell

text:Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning

Link R with Transformers from Hugging Face to transform text variables to word embeddings, where the word embeddings are used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words according to various dimensions etc. For more information see <https://www.r-text.org>.

Maintained by Oscar Kjell. Last updated 4 days ago.

deep-learning machine-learning nlp transformers openjdk

16.1 match 146 stars 13.16 score 436 scripts 1 dependent

geomorphr

geomorph:Geometric Morphometric Analyses of 2D and 3D Landmark Data

Read, manipulate, and digitize landmark data, generate shape variables via Procrustes analysis for points, curves and surfaces, perform shape analyses, and provide graphical depictions of shapes and patterns of shape variation.

Maintained by Dean Adams. Last updated 1 month ago.

17.6 match 76 stars 12.05 score 700 scripts 6 dependents

usepa

httk:High-Throughput Toxicokinetics

Pre-made models that can be rapidly tailored to various chemicals and species using chemical-specific in vitro data and physiological information. These tools allow incorporation of chemical toxicokinetics ("TK") and in vitro-in vivo extrapolation ("IVIVE") into bioinformatics, as described by Pearce et al. (2017) (<doi:10.18637/jss.v079.i04>). Chemical-specific in vitro data characterizing toxicokinetics have been obtained from relatively high-throughput experiments. The chemical-independent ("generic") physiologically-based ("PBTK") and empirical (for example, one compartment) "TK" models included here can be parameterized with in vitro data or in silico predictions which are provided for thousands of chemicals, multiple exposure routes, and various species. High throughput toxicokinetics ("HTTK") is the combination of in vitro data and generic models. We establish the expected accuracy of HTTK for chemicals without in vivo data through statistical evaluation of HTTK predictions for chemicals where in vivo data do exist. The models are systems of ordinary differential equations that are developed in MCSim and solved using compiled (C-based) code for speed. A Monte Carlo sampler is included for simulating human biological variability (Ring et al., 2017 <doi:10.1016/j.envint.2017.06.004>) and propagating parameter uncertainty (Wambaugh et al., 2019 <doi:10.1093/toxsci/kfz205>). Empirically calibrated methods are included for predicting tissue:plasma partition coefficients and volume of distribution (Pearce et al., 2017 <doi:10.1007/s10928-017-9548-7>). These functions and data provide a set of tools for using IVIVE to convert concentrations from high-throughput screening experiments (for example, Tox21, ToxCast) to real-world exposures via reverse dosimetry (also known as "RTK") (Wetmore et al., 2015 <doi:10.1093/toxsci/kfv171>).

Maintained by John Wambaugh. Last updated 1 month ago.

comptox ord

20.6 match 27 stars 10.22 score 307 scripts 1 dependents

trackerproject

trackeR:Infrastructure for Running, Cycling and Swimming Data from GPS-Enabled Tracking Devices

Provides infrastructure for handling running, cycling and swimming data from GPS-enabled tracking devices within R. The package provides methods to extract, clean and organise workout and competition data into session-based and unit-aware data objects of class 'trackeRdata' (S3 class). The information can then be visualised, summarised, and analysed through flexible and extensible methods. Frick and Kosmidis (2017) <doi:10.18637/jss.v082.i07>, which is updated and maintained as one of the vignettes, provides detailed descriptions of the package and its methods, and real-data demonstrations of the package functionality.

Maintained by Ioannis Kosmidis. Last updated 1 year ago.

32.4 match 90 stars 6.37 score 58 scripts 1 dependents

easystats

insight:Easy Access to Model Information for Various Model Objects

A tool to provide easy, intuitive and consistent access to information contained in various R models, like model formulas, model terms, information about random effects, data that was used to fit the model or data from response variables. 'insight' mainly revolves around two types of functions: functions that find (the names of) information, starting with 'find_', and functions that get the underlying data, starting with 'get_'. The package has a consistent syntax and works with many different model objects, where functions to access this information are otherwise missing.

Maintained by Daniel Lüdecke. Last updated 5 days ago.

easystats hacktoberfest insight models names predictors random

11.9 match 412 stars 17.24 score 568 scripts 210 dependents

sollano

forestmangr:Forest Mensuration and Management

Processing forest inventory data with methods such as simple random sampling, stratified random sampling and systematic sampling. There are also functions for yield and growth predictions and model fitting, linear and nonlinear grouped data fitting, and statistical tests. References: Kershaw Jr., Ducey, Beers and Husch (2016). <doi:10.1002/9781118902028>.

Maintained by Sollano Rabelo Braga. Last updated 4 months ago.

25.7 match 17 stars 7.97 score 378 scripts

r-a-dobson

dynamicSDM:Species Distribution and Abundance Modelling at High Spatio-Temporal Resolution

A collection of novel tools for generating species distribution and abundance models (SDM) that are dynamic through both space and time. These highly flexible functions incorporate spatial and temporal aspects across key SDM stages, including when cleaning and filtering species occurrence data, generating pseudo-absence records, assessing and correcting sampling biases and autocorrelation, extracting explanatory variables and projecting distribution patterns. Throughout, functions utilise Google Earth Engine and Google Drive to minimise the computing power and storage demands associated with species distribution modelling at high spatio-temporal resolution.

Maintained by Rachel Dobson. Last updated 27 days ago.

dynamicsdm google-earth-engine googledrive sdm spatiotemporal spatiotemporal-data-analysis spatiotemporal-forecasting species-distribution-modelling species-distributions

32.6 match 6 stars 6.16 score 20 scripts

mountainmath

cancensus:Access, Retrieve, and Work with Canadian Census Data and Geography

Integrated, convenient, and uniform access to Canadian Census data and geography retrieved using the 'CensusMapper' API. This package produces analysis-ready tidy data frames and spatial data in multiple formats, as well as convenience functions for working with Census variables, variable hierarchies, and region selection. API keys are freely available with free registration at <https://censusmapper.ca/api>. Census data and boundary geometries are reproduced and distributed on an "as is" basis with the permission of Statistics Canada (Statistics Canada 2001; 2006; 2011; 2016; 2021).

Maintained by Dmitry Shkolnik. Last updated 1 year ago.

22.8 match 82 stars 8.80 score 414 scripts

r-causal

ggdag:Analyze and Create Elegant Directed Acyclic Graphs

Tidy, analyze, and plot directed acyclic graphs (DAGs). 'ggdag' is built on top of 'dagitty', an R package that uses the 'DAGitty' web tool (<https://dagitty.net/>) for creating and analyzing DAGs. 'ggdag' makes it easy to tidy and plot 'dagitty' objects using 'ggplot2' and 'ggraph', as well as common analytic and graphical functions, such as determining adjustment sets and node relationships.

Maintained by Malcolm Barrett. Last updated 8 months ago.

causal-inference dag ggplot-extension

17.0 match 443 stars 11.78 score 1.8k scripts 5 dependents

chavent

ClustOfVar:Clustering of Variables

Cluster analysis of a set of variables. Variables can be quantitative, qualitative or a mixture of both.

Maintained by Marie Chavent. Last updated 5 years ago.

30.8 match 7 stars 6.47 score 142 scripts 2 dependents

modeloriented

vivo:Variable Importance via Oscillations

Provides an easy-to-calculate local variable importance measure based on Ceteris Paribus profiles and a global variable importance measure based on Partial Dependence Profiles.

Maintained by Anna Kozak. Last updated 4 years ago.

explainable-ai explainable-artificial-intelligence explainable-ml iml interpretable-machine-learning variable-importance xai

36.5 match 14 stars 5.45 score 7 scripts

danchaltiel

crosstable:Crosstables for Descriptive Analyses

Create descriptive tables for continuous and categorical variables. Apply summary statistics and counting functions, with or without a grouping variable, and create beautiful reports using 'rmarkdown' or 'officer'. You can also compute effect sizes and statistical tests if needed.

Maintained by Dan Chaltiel. Last updated 2 months ago.

descriptive-statistics flextable frequency-table html-report msword officer

19.1 match 116 stars 10.37 score 340 scripts

ecmerkle

blavaan:Bayesian Latent Variable Analysis

Fit a variety of Bayesian latent variable models, including confirmatory factor analysis, structural equation models, and latent growth curve models. References: Merkle & Rosseel (2018) <doi:10.18637/jss.v085.i04>; Merkle et al. (2021) <doi:10.18637/jss.v100.i06>.

Maintained by Edgar Merkle. Last updated 4 days ago.

bayesian-statistics factor-analysis growth-curve-models latent-variables missing-data multilevel-models multivariate-analysis path-analysis psychometrics statistical-modeling structural-equation-modeling cpp

18.3 match 92 stars 10.84 score 183 scripts 3 dependents

clewerenz

ilabelled:Simple Handling of Labelled Data

Simple handling of survey data. Smart handling of meta-information such as variable labels, value labels and scale levels. Easy access and validation of meta-information. Usage of value labels and values, respectively, for subsetting and recoding data.

Maintained by Christof Lewerenz. Last updated 2 months ago.

32.6 match 2 stars 6.02 score 13 scripts

rpolars

polars:Lightning-Fast 'DataFrame' Library

Lightning-fast 'DataFrame' library written in 'Rust'. Convert R data to 'Polars' data and vice versa. Perform fast, lazy, larger-than-memory and optimized data queries. 'Polars' is interoperable with the package 'arrow', as both are based on the 'Apache Arrow' Columnar Format.

Maintained by Soren Welling. Last updated 3 days ago.

arrow polars rust

16.3 match 499 stars 12.01 score 1.0k scripts 2 dependents

tyee001

VGAM:Vector Generalized Linear and Additive Models

An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book "Vector Generalized Linear and Additive Models: With an Implementation in R" (Yee, 2015) <DOI:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (100+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, doubly constrained RR-VGLMs, reduced-rank VGAMs, and RCIMs (row-column interaction models); these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Hauck-Donner effect detection is implemented. Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.

Maintained by Thomas Yee. Last updated 1 month ago.

fortran

17.9 match 10 stars 10.67 score 3.6k scripts 169 dependents

rfastofficial

Rfast2:A Collection of Efficient and Extremely Fast R Functions II

A collection of fast statistical and utility functions for data analysis. Functions for regression, maximum likelihood, column-wise statistics and many more have been included. C++ has been utilized to speed up the functions. References: Tsagris M., Papadakis M. (2018). Taking R to its limits: 70+ tips. PeerJ Preprints 6:e26605v1 <doi:10.7287/peerj.preprints.26605v1>.

Maintained by Manos Papadakis. Last updated 1 year ago.

openblas cpp openmp

23.6 match 38 stars 8.09 score 75 scripts 26 dependents

biodiverse

spOccupancy:Single-Species, Multi-Species, and Integrated Spatial Occupancy Models

Fits single-species, multi-species, and integrated non-spatial and spatial occupancy models using Markov Chain Monte Carlo (MCMC). Models are fit using Polya-Gamma data augmentation detailed in Polson, Scott, and Windle (2013) <doi:10.1080/01621459.2013.829001>. Spatial models are fit using either Gaussian processes or Nearest Neighbor Gaussian Processes (NNGP) for large spatial datasets. Details on NNGP models are given in Datta, Banerjee, Finley, and Gelfand (2016) <doi:10.1080/01621459.2015.1044091> and Finley, Datta, and Banerjee (2022) <doi:10.18637/jss.v103.i05>. Provides functionality for data integration of multiple single-species occupancy data sets using a joint likelihood framework. Details on data integration are given in Miller, Pacifici, Sanderlin, and Reich (2019) <doi:10.1111/2041-210X.13110>. Details on single-species and multi-species models are found in MacKenzie, Nichols, Lachman, Droege, Royle, and Langtimm (2002) <doi:10.1890/0012-9658(2002)083[2248:ESORWD]2.0.CO;2> and Dorazio and Royle <doi:10.1198/016214505000000015>, respectively.

Maintained by Jeffrey Doser. Last updated 22 days ago.

openblas cpp openmp

25.8 match 59 stars 7.31 score 204 scripts

psychmeta

psychmeta:Psychometric Meta-Analysis Toolkit

Tools for computing bare-bones and psychometric meta-analyses and for generating psychometric data for use in meta-analysis simulations. Supports bare-bones, individual-correction, and artifact-distribution methods for meta-analyzing correlations and d values. Includes tools for converting effect sizes, computing sporadic artifact corrections, reshaping meta-analytic databases, computing multivariate corrections for range variation, and more. Bugs can be reported to <https://github.com/psychmeta/psychmeta/issues> or <issues@psychmeta.com>.

Maintained by Jeffrey A. Dahlke. Last updated 9 months ago.

hacktoberfest meta-analysis psychology psychometric psychometrics

22.5 match 57 stars 8.25 score 151 scripts

gavinsimpson

gratia:Graceful 'ggplot'-Based Graphics and Other Functions for GAMs Fitted Using 'mgcv'

Graceful 'ggplot'-based graphics and utility functions for working with generalized additive models (GAMs) fitted using the 'mgcv' package. Provides a reimplementation of the plot() method for GAMs that 'mgcv' provides, as well as 'tidyverse' compatible representations of estimated smooths.

Maintained by Gavin L. Simpson. Last updated 12 hours ago.

distributional-regression gam gamm generalized-additive-mixed-models generalized-additive-models ggplot2 glm lm mgcv penalized-spline random-effects smoothing splines

14.3 match 217 stars 12.99 score 1.6k scripts 2 dependents

pauljohn32

rockchalk:Regression Estimation and Presentation

A collection of functions for interpretation and presentation of regression analysis. These functions are used to produce the statistics lectures in <https://pj.freefaculty.org/guides/>. Includes regression diagnostics, regression tables, and plots of interactions and "moderator" variables. The emphasis is on "mean-centered" and "residual-centered" predictors. The vignette 'rockchalk' offers a fairly comprehensive overview. The vignette 'Rstyle' has advice about coding in R. The package title 'rockchalk' refers to our school motto, 'Rock Chalk Jayhawk, Go K.U.'.

Maintained by Paul E. Johnson. Last updated 3 years ago.

26.0 match 7.13 score 584 scripts 18 dependents

pletschm

aldvmm:Adjusted Limited Dependent Variable Mixture Models

The goal of the package 'aldvmm' is to fit adjusted limited dependent variable mixture models of health state utilities. Adjusted limited dependent variable mixture models are finite mixtures of normal distributions with an accumulation of density mass at the limits, and a gap between 100% quality of life and the next smaller utility value. The package 'aldvmm' uses the likelihood and expected value functions proposed by Hernandez Alava and Wailoo (2015) <doi:10.1177/1536867X1501500307> using normal component distributions and a multinomial logit model of probabilities of component membership.

Maintained by Mark Pletscher. Last updated 1 year ago.

clinical-trials cost-effectiveness eq5d finite-mixture health-economics hta hui limited-dependent-variable mapping mixture-model patient-reported-outcomes quality-of-life utilities

41.6 match 5 stars 4.40 score 2 scripts

hofnerb

stabs:Stability Selection with Error Control

Resampling procedures to assess the stability of selected variables with additional finite sample error control for high-dimensional variable selection procedures such as Lasso or boosting. Both standard stability selection (Meinshausen & Buhlmann, 2010, <doi:10.1111/j.1467-9868.2010.00740.x>) and complementary pairs stability selection with improved error bounds (Shah & Samworth, 2013, <doi:10.1111/j.1467-9868.2011.01034.x>) are implemented. The package can be combined with arbitrary user-specified variable selection approaches.

Maintained by Benjamin Hofner. Last updated 4 years ago.

machine-learning r-language resampling stability-selection variable-importance variable-selection

19.0 match 26 stars 9.59 score 53 scripts 31 dependents

bsvars

bsvars:Bayesian Estimation of Structural Vector Autoregressive Models

Provides fast and efficient procedures for Bayesian analysis of Structural Vector Autoregressions. This package estimates a wide range of models, including homo-, heteroskedastic, and non-normal specifications. Structural models can be identified by adjustable exclusion restrictions, time-varying volatility, or non-normality. They all include a flexible three-level equation-specific local-global hierarchical prior distribution for the estimated level of shrinkage for autoregressive and structural parameters. Additionally, the package facilitates predictive and structural analyses such as impulse responses, forecast error variance and historical decompositions, forecasting, verification of heteroskedasticity, non-normality, and hypotheses on autoregressive parameters, as well as analyses of structural shocks, volatilities, and fitted values. Beautiful plots, informative summary functions, and extensive documentation including the vignette by Woźniak (2024) <doi:10.48550/arXiv.2410.15090> complement all this. The implemented techniques align closely with those presented in Lütkepohl, Shang, Uzeda, & Woźniak (2024) <doi:10.48550/arXiv.2404.11057>, Lütkepohl & Woźniak (2020) <doi:10.1016/j.jedc.2020.103862>, and Song & Woźniak (2021) <doi:10.1093/acrefore/9780190625979.013.174>. The 'bsvars' package is aligned regarding objects, workflows, and code structure with the R package 'bsvarSIGNs' by Wang & Woźniak (2024) <doi:10.32614/CRAN.package.bsvarSIGNs>, and they constitute an integrated toolset.

Maintained by Tomasz Woźniak. Last updated 1 month ago.

bayesian-inference econometrics vector-autoregression openblas cpp openmp

23.5 match 46 stars 7.67 score 32 scripts 1 dependents

tidyverse

forcats:Tools for Working with Categorical Variables (Factors)

Helpers for reordering factor levels (including moving specified levels to front, ordering by first appearance, reversing, and randomly shuffling), and tools for modifying factor levels (including collapsing rare levels into other, 'anonymising', and manually 'recoding').

Maintained by Hadley Wickham. Last updated 1 year ago.

factor tidyverse

9.5 match 555 stars 18.77 score 21k scripts 1.2k dependents

bioc

sva:Surrogate Variable Analysis

The sva package contains functions for removing batch effects and other unwanted variation in high-throughput experiments. Specifically, the sva package contains functions for identifying and building surrogate variables for high-dimensional data sets. Surrogate variables are covariates constructed directly from high-dimensional data (like gene expression/RNA sequencing/methylation/brain imaging data) that can be used in subsequent analyses to adjust for unknown, unmodeled, or latent sources of noise. The sva package can be used to remove artifacts in three ways: (1) identifying and estimating surrogate variables for unknown sources of variation in high-throughput experiments (Leek and Storey 2007 PLoS Genetics, 2008 PNAS), (2) directly removing known batch effects using ComBat (Johnson et al. 2007 Biostatistics) and (3) removing batch effects with known control probes (Leek 2014 bioRxiv). Removing batch effects and using surrogate variables in differential expression analysis have been shown to reduce dependence, stabilize error rate estimates, and improve reproducibility; see Leek and Storey 2007 PLoS Genetics, 2008 PNAS or Leek et al. 2011 Nat. Reviews Genetics.

Maintained by Jeffrey T. Leek. Last updated 5 months ago.

immunooncology microarray statisticalmethod preprocessing multiplecomparison sequencing rnaseq batcheffect normalization

17.8 match 10.05 score 3.2k scripts 50 dependents

overton-group

eHDPrep:Quality Control and Semantic Enrichment of Datasets

A tool for the preparation and enrichment of health datasets for analysis (Toner et al. (2023) <doi:10.1093/gigascience/giad030>). Provides functionality for assessing data quality and for improving the reliability and machine interpretability of a dataset. 'eHDPrep' also enables semantic enrichment of a dataset where metavariables are discovered from the relationships between input variables determined from user-provided ontologies.

Maintained by Ian Overton. Last updated 2 years ago.

data-quality health-informatics semantic-enrichment

36.0 match 8 stars 4.90 score 10 scripts

r-forge

car:Companion to Applied Regression

Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, 2019.

Maintained by John Fox. Last updated 5 months ago.

11.5 match 15.29 score 43k scripts 901 dependents

indrajeetpatil

ggstatsplot:'ggplot2' Based Plots with Statistical Details

Extension of 'ggplot2', 'ggstatsplot' creates graphics with details from statistical tests included in the plots themselves. It provides an easier syntax to generate information-rich plots for statistical analysis of continuous (violin plots, scatterplots, histograms, dot plots, dot-and-whisker plots) or categorical (pie and bar charts) data. Currently, it supports the most common types of statistical approaches and tests: parametric, nonparametric, robust, and Bayesian versions of t-test/ANOVA, correlation analyses, contingency table analysis, meta-analysis, and regression analyses. References: Patil (2021) <doi:10.21105/joss.03236>.

Maintained by Indrajeet Patil. Last updated 20 days ago.

bayes-factors datascience dataviz effect-size ggplot-extension hypothesis-testing non-parametric-statistics regression-models statistical-analysis

12.1 match 2.1k stars 14.49 score 3.0k scripts 1 dependents

statisticsnorway

SSBtools:Algorithms and Tools for Tabular Statistics and Hierarchical Computations

Includes general data manipulation functions, algorithms for statistical disclosure control (Langsrud, 2024) <doi:10.1007/978-3-031-69651-0_6> and functions for hierarchical computations by sparse model matrices (Langsrud, 2023) <doi:10.32614/RJ-2023-088>.

Maintained by Øyvind Langsrud. Last updated 3 days ago.

statistics

23.0 match 7 stars 7.62 score 68 scripts 7 dependents

alexanderrobitzsch

miceadds:Some Additional Multiple Imputation Functions, Especially for 'mice'

Contains functions for multiple imputation which complement existing functionality in R. In particular, several imputation methods for the mice package (van Buuren & Groothuis-Oudshoorn, 2011, <doi:10.18637/jss.v045.i03>) are implemented. Main features of the miceadds package include plausible value imputation (Mislevy, 1991, <doi:10.1007/BF02294457>), multilevel imputation for variables at any level or with any number of hierarchical and non-hierarchical levels (Grund, Luedtke & Robitzsch, 2018, <doi:10.1177/1094428117703686>; van Buuren, 2018, Ch.7, <doi:10.1201/9780429492259>), imputation using partial least squares (PLS) for high dimensional predictors (Robitzsch, Pham & Yanagida, 2016), nested multiple imputation (Rubin, 2003, <doi:10.1111/1467-9574.00217>), substantive model compatible imputation (Bartlett et al., 2015, <doi:10.1177/0962280214521348>), and features for the generation of synthetic datasets (Reiter, 2005, <doi:10.1111/j.1467-985X.2004.00343.x>; Nowok, Raab, & Dibben, 2016, <doi:10.18637/jss.v074.i11>).

Maintained by Alexander Robitzsch. Last updated 16 days ago.

missing-data multiple-imputation openblas cpp

19.1 match 16 stars 9.16 score 542 scripts 9 dependents

cardiomoon

autoReg:Automatic Linear and Logistic Regression and Survival Analysis

Make summary tables for descriptive statistics and select explanatory variables automatically in various regression models. Supports linear models, generalized linear models and Cox proportional hazards models. Generate publication-ready tables and plots summarizing the results of regression analyses. The tables and plots can be exported in "HTML", "pdf('LaTex')", "docx('MS Word')" and "pptx('MS Powerpoint')" documents.

Maintained by Keon-Woong Moon. Last updated 1 year ago.

25.0 match 47 stars 7.00 score 69 scripts

edwinkipruto

mfp2:Multivariable Fractional Polynomial Models with Extensions

The multivariable fractional polynomial algorithm simultaneously selects variables and functional forms in both generalized linear models and Cox proportional hazards models. Key references are Royston and Altman (1994) <doi:10.2307/2986270> and Royston and Sauerbrei (2008, ISBN:978-0-470-02842-1). In addition, it can model a sigmoid relationship between variable x and an outcome variable y using the approximate cumulative distribution transformation proposed by Royston (2014) <doi:10.1177/1536867X1401400206>. This feature distinguishes it from a standard fractional polynomial function, which cannot model such relationships.

Maintained by Edwin Kipruto. Last updated 10 months ago.

33.2 match 3 stars 5.26 score 4 scripts 2 dependents

myles-lewis

nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'

Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.

Maintained by Myles Lewis. Last updated 6 days ago.

21.9 match 12 stars 7.92 score 46 scripts

tiledb-inc

tiledb:Modern Database Engine for Complex Data Based on Multi-Dimensional Arrays

The modern database 'TileDB' introduces a powerful on-disk format for storing and accessing any complex data based on multi-dimensional arrays. It supports dense and sparse arrays, dataframes and key-value stores, cloud storage ('S3', 'GCS', 'Azure'), chunked arrays, multiple compression, encryption and checksum filters, uses a fully multi-threaded implementation, supports parallel I/O, data versioning ('time travel'), metadata and groups. It is implemented as an embeddable cross-platform C++ library with APIs from several languages, and integrations. This package provides the R support.

Maintained by Isaiah Norton. Last updated 4 days ago.

array hdfs s3 storage-manager tiledb cpp

14.4 match 107 stars 11.96 score 306 scripts 4 dependents

uupharmacometrics

xpose:Diagnostics for Pharmacometric Models

Diagnostics for non-linear mixed-effects (population) models from 'NONMEM' <https://www.iconplc.com/solutions/technologies/nonmem/>. 'xpose' facilitates data import and the creation of numerical run summaries, and provides 'ggplot2'-based graphics for data exploration and model diagnostics.

Maintained by Benjamin Guiastrennec. Last updated 2 months ago.

diagnostics ggplot2 nonmem pharmacometrics xpose

15.7 match 62 stars 11.02 score 183 scripts 6 dependents

zeileis

ivreg:Instrumental-Variables Regression by '2SLS', '2SM', or '2SMM', with Diagnostics

Instrumental variable estimation for linear models by two-stage least-squares (2SLS) regression or by robust-regression via M-estimation (2SM) or MM-estimation (2SMM). The main ivreg() model-fitting function is designed to provide a workflow as similar as possible to standard lm() regression. A wide range of methods is provided for fitted ivreg model objects, including extensive functionality for computing and graphing regression diagnostics in addition to other standard model tools.

Maintained by Achim Zeileis. Last updated 2 months ago.

instrumental-variables regression-diagnostics two-stage-least-squares-regression

16.8 match 20 stars 10.24 score 360 scripts 4 dependents

luckinet

tabshiftr:Reshape Disorganised Messy Data

Helps the user to build and register schema descriptions of disorganised (messy) tables. Disorganised tables are tables that are not in a topologically coherent form, where packages such as 'tidyr' could be used for reshaping. The schema description documents the arrangement of input tables and is used to reshape them into a standardised (tidy) output format.

Maintained by Steffen Ehrmann. Last updated 30 days ago.

data-management data-reshaping schemas

23.7 match 6 stars 7.18 score 62 scripts 1 dependents

pik-piam

remind2:The REMIND R package (2nd generation)

Contains the REMIND-specific routines for data and model output manipulation.

Maintained by Renato Rodrigues. Last updated 6 days ago.

19.1 match 8.88 score 161 scripts 5 dependents

nixtla

nixtlar:A Software Development Kit for 'Nixtla''s 'TimeGPT'

A Software Development Kit for working with 'Nixtla''s 'TimeGPT', a foundation model for time series forecasting. 'API' is an acronym for 'application programming interface'; this package allows users to interact with 'TimeGPT' via the 'API'. You can set and validate 'API' keys and generate forecasts via 'API' calls. It is compatible with 'tsibble' and base R. For more details visit <https://docs.nixtla.io/>.

Maintained by Mariana Menchero. Last updated 28 days ago.

20.7 match 30 stars 8.16 score 38 scripts

nbarrowman

vtree:Display Information About Nested Subsets of a Data Frame

A tool for calculating and drawing "variable trees". Variable trees display information about nested subsets of a data frame.

Maintained by Nick Barrowman. Last updated 1 day ago.

data-science data-visualization exploratory-data-analysis statistics

23.7 match 76 stars 7.09 score 65 scripts

hrbrmstr

hrbrthemes:Additional Themes, Theme Components and Utilities for 'ggplot2'

A compilation of extra 'ggplot2' themes, scales and utilities, including a spell check function for plot label fields and an overall emphasis on typography. A copy of the 'Google' font 'Roboto Condensed' is also included.

Maintained by Bob Rudis. Last updated 2 days ago.

data-visualization datavisualization ggplot-extension ggplot2 ggplot2-scales ggplot2-themes visualization

12.0 match 1.3k stars 13.92 score 13k scripts 15 dependents

stan-dev

rstanarm:Bayesian Applied Regression Modeling via Stan

Estimates previously compiled regression models using the 'rstan' package, which provides the R interface to the Stan C++ library for Bayesian estimation. Users specify models via the customary R syntax with a formula and data.frame plus some additional arguments for priors.

Maintained by Ben Goodrich. Last updated 9 months ago.

bayesian bayesian-data-analysis bayesian-inference bayesian-methods bayesian-statistics multilevel-models rstan rstanarm stan statistical-modeling cpp

10.7 match 393 stars 15.68 score 5.0k scripts 13 dependents

brian-j-smith

MachineShop:Machine Learning Models and Tools

Meta-package for statistical and machine learning with a unified interface for model fitting, prediction, performance assessment, and presentation of results. Approaches for model fitting and prediction of numerical, categorical, or censored time-to-event outcomes include traditional regression models, regularization methods, tree-based methods, support vector machines, neural networks, ensembles, data preprocessing, filtering, and model tuning and selection. Performance metrics are provided for model assessment and can be estimated with independent test sets, split sampling, cross-validation, or bootstrap resampling. Resample estimation can be executed in parallel for faster processing and nested in cases of model tuning and selection. Modeling results can be summarized with descriptive statistics; calibration curves; variable importance; partial dependence plots; confusion matrices; and ROC, lift, and other performance curves.

Maintained by Brian J Smith. Last updated 7 months ago.

classification-models machine-learning predictive-modeling regression-models survival-models

20.8 match 61 stars 7.95 score 121 scripts

amices

mice:Multivariate Imputation by Chained Equations

Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.

Maintained by Stef van Buuren. Last updated 7 days ago.

chained-equations fcs imputation mice missing-data missing-values multiple-imputation multivariate-data cpp

10.0 match 462 stars 16.50 score 10k scripts 154 dependents

statistikat

VIM:Visualization and Imputation of Missing Values

New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on this structure of the missing values, the corresponding methods may help to identify the mechanism generating the missing values and allow the data, including missing values, to be explored. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface available in the separate package VIMGUI allows easy handling of the implemented plot methods.

Maintained by Matthias Templ. Last updated 7 months ago.

hotdeck imputation-methods model-predictions visualization cpp

11.3 match 85 stars 14.44 score 2.6k scripts 19 dependents

modeloriented

DALEX:moDel Agnostic Language for Exploration and eXplanation

Any unverified black box model is the path to failure. Opaqueness leads to distrust. Distrust leads to ignoration. Ignoration leads to rejection. DALEX package xrays any model and helps to explore and explain its behaviour. Machine Learning (ML) models are widely used and have various applications in classification or regression. Models created with boosting, bagging, stacking or similar techniques are often used due to their high performance. But such black-box models usually lack direct interpretability. DALEX package contains various methods that help to understand the link between input variables and model output. Implemented methods help to explore the model on the level of a single instance as well as a level of the whole dataset. All model explainers are model agnostic and can be compared across different models. DALEX package is the cornerstone for 'DrWhy.AI' universe of packages for visual model exploration. Find more details in (Biecek 2018) <https://jmlr.org/papers/v19/18-416.html>.

Maintained by Przemyslaw Biecek. Last updated 1 month ago.

black-box dalex data-science explainable-ai explainable-artificial-intelligence explainable-ml explanations explanatory-model-analysis fairness iml interpretability interpretable-machine-learning machine-learning model-visualization predictive-modeling responsible-ai responsible-ml xai

12.2 match 1.4k stars 13.40 score 876 scripts 21 dependents

ajsims1704

rdecision:Decision Analytic Modelling in Health Economics

Classes and functions for modelling health care interventions using decision trees and semi-Markov models. Mechanisms are provided for associating an uncertainty distribution with each source variable and for ensuring transparency of the mathematical relationships between variables. The package terminology follows Briggs "Decision Modelling for Health Economic Evaluation" (2006, ISBN:978-0-19-852662-9).

Maintained by Andrew Sims. Last updated 1 month ago.

25.2 match 3 stars 6.46 score 22 scripts

microsoft

wpa:Tools for Analysing and Visualising Viva Insights Data

Opinionated functions that enable easier and faster analysis of Viva Insights data. There are three main types of functions in 'wpa': (1) Standard functions create a 'ggplot' visual or a summary table based on a specific Viva Insights metric; (2) Report Generation functions generate HTML reports on a specific analysis area, e.g. Collaboration; (3) Other miscellaneous functions cover more specific applications (e.g. Subject Line text mining) of Viva Insights data. This package adheres to 'tidyverse' principles and works well with the pipe syntax. 'wpa' is built with beginner-to-intermediate R users in mind, and is optimised for simplicity.

Maintained by Martin Chan. Last updated 4 months ago.

workplace-analytics

24.2 match 30 stars 6.69 score 39 scripts 1 dependent

gergness

srvyr:'dplyr'-Like Syntax for Summary Statistics of Survey Data

Use piping, verbs like 'group_by' and 'summarize', and other 'dplyr' inspired syntactic style when calculating summary statistics on survey data using functions from the 'survey' package.

Maintained by Greg Freedman Ellis. Last updated 1 month ago.

survey

11.6 match 215 stars 13.88 score 1.8k scripts 15 dependents

satijalab

Seurat:Tools for Single Cell Genomics

A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031>, and Hao, Hao, et al (2020) <doi:10.1101/2020.10.12.335331> for more details.

Maintained by Paul Hoffman. Last updated 1 year ago.

human-cell-atlas single-cell-genomics single-cell-rna-seq cpp

9.4 match 2.4k stars 16.86 score 50k scripts 73 dependents

maelstrom-research

madshapR:Support Technical Processes Following 'Maelstrom Research' Standards

Functions to support rigorous processes in data cleaning, evaluation, and documentation across datasets from different studies based on Maelstrom Research guidelines. The package includes the core functions to evaluate and format the main inputs that define the process, diagnose errors, and summarize and evaluate datasets and their associated data dictionaries. The main outputs are clean datasets and associated metadata, and tabular and visual summary reports. The approach is described in the Maelstrom Research guidelines for rigorous retrospective data harmonization (Fortier I et al. (2017) <doi:10.1093/ije/dyw075>).

Maintained by Guillaume Fabre. Last updated 11 months ago.

29.4 match 2 stars 5.40 score 28 scripts 3 dependents

pauljohn32

kutils:Project Management Tools

Tools for data importation, recoding, and inspection. There are functions to create new project folders, R code templates, and uniquely named output directories, and to quickly obtain a visual summary for each variable in a data frame. The main feature here is the systematic implementation of the "variable key" framework for data importation and recoding. We are eager to have community feedback about the variable key and the vignette about it. In version 1.7, the function 'semTable' is removed. It had been deprecated since 1.67. It is now provided in a separate package, 'semTable'.

Maintained by Paul Johnson. Last updated 1 year ago.

27.0 match 5.85 score 110 scripts 20 dependents

insightsengineering

chevron:Standard TLGs for Clinical Trials Reporting

Provides libraries of standard tables, listings, and graphs (TLGs) used in clinical trials. This package implements a structure to reformat the data with 'dunlin' and to create reporting tables using 'rtables' and 'tern' with standardized input arguments, enabling quick generation of standard outputs. In addition, it provides comprehensive data checks and script generation functionality.

Maintained by Joe Zhu. Last updated 24 days ago.

clinical-trials graphs listings nest reporting tables

19.0 match 12 stars 8.24 score 12 scripts

nicolas-robette

descriptio:Descriptive Statistical Analysis

Description of statistical associations between variables: measures of local and global association between variables (phi, Cramér V, correlations, eta-squared, Goodman and Kruskal tau, permutation tests, etc.), multiple graphical representations of the associations between variables (using 'ggplot2') and weighted statistics.

Maintained by Nicolas Robette. Last updated 6 months ago.

31.0 match 4 stars 5.00 score 11 scripts 3 dependents

insightsengineering

teal.modules.general:General Modules for 'teal' Applications

Prebuilt 'shiny' modules containing tools for viewing data, visualizing data, understanding missing and outlier values within your data and performing simple data analysis. This extends the 'teal' framework, which supports reproducible research and analysis.

Maintained by Dawid Kaledkowski. Last updated 16 days ago.

general-purpose modules nest shiny

15.9 match 12 stars 9.76 score 71 scripts

pik-piam

piamInterfaces:Project specific interfaces to REMIND / MAgPIE

Project specific interfaces to REMIND / MAgPIE.

Maintained by Falk Benke. Last updated 2 days ago.

23.0 match 6.63 score 38 scripts 7 dependents

calcita

ech:Downloading and Processing Microdata from ECH-INE (Uruguay)

A consistent tool for downloading ECH data, processing them and generating new indicators: poverty, education, employment, etc. All data are downloaded from the official site of the National Institute of Statistics at <https://www.gub.uy/instituto-nacional-estadistica/datos-y-estadisticas/encuestas/encuesta-continua-hogares>.

Maintained by Gabriela Mathieu. Last updated 1 year ago.

24.7 match 16 stars 6.15 score 22 scripts

jtextor

dagitty:Graphical Analysis of Structural Causal Models

A port of the web-based software 'DAGitty', available at <https://dagitty.net>, for analyzing structural causal models (also known as directed acyclic graphs or DAGs). This package computes covariate adjustment sets for estimating causal effects, enumerates instrumental variables, derives testable implications (d-separation and vanishing tetrads), generates equivalent models, and includes a simple facility for data simulation.

Maintained by Johannes Textor. Last updated 3 months ago.

11.8 match 302 stars 12.83 score 1.7k scripts 11 dependents

a-dudek-ue

clusterSim:Searching for Optimal Clustering Procedure for a Data Set

Distance measures (GDM1, GDM2, Sokal-Michener, Bray-Curtis, for symbolic interval-valued data), cluster quality indices (Calinski-Harabasz, Baker-Hubert, Hubert-Levine, Silhouette, Krzanowski-Lai, Hartigan, Gap, Davies-Bouldin), data normalization formulas (metric data, interval-valued symbolic data), data generation (typical and non-typical data), HINoV method, replication analysis, linear ordering methods, spectral clustering, agreement indices between two partitions, plot functions (for categorical and symbolic interval-valued data). (MILLIGAN, G.W., COOPER, M.C. (1985) <doi:10.1007/BF02294245>, HUBERT, L., ARABIE, P. (1985) <doi:10.1007/BF01908075>, RAND, W.M. (1971) <doi:10.1080/01621459.1971.10482356>, JAJUGA, K., WALESIAK, M. (2000) <doi:10.1007/978-3-642-57280-7_11>, MILLIGAN, G.W., COOPER, M.C. (1988) <doi:10.1007/BF01897163>, JAJUGA, K., WALESIAK, M., BAK, A. (2003) <doi:10.1007/978-3-642-55721-7_12>, DAVIES, D.L., BOULDIN, D.W. (1979) <doi:10.1109/TPAMI.1979.4766909>, CALINSKI, T., HARABASZ, J. (1974) <doi:10.1080/03610927408827101>, HUBERT, L. (1974) <doi:10.1080/01621459.1974.10480191>, TIBSHIRANI, R., WALTHER, G., HASTIE, T. (2001) <doi:10.1111/1467-9868.00293>, BRECKENRIDGE, J.N. (2000) <doi:10.1207/S15327906MBR3502_5>, WALESIAK, M., DUDEK, A. (2008) <doi:10.1007/978-3-540-78246-9_11>).

Maintained by Andrzej Dudek. Last updated 6 months ago.

cpp

23.8 match 2 stars 6.35 score 512 scripts 9 dependents

tidyverse

tidyr:Tidy Messy Data

Tools to help to create tidy data, where each column is a variable, each row is an observation, and each cell contains a single value. 'tidyr' contains tools for changing the shape (pivoting) and hierarchy (nesting and 'unnesting') of a dataset, turning deeply nested lists into rectangular data frames ('rectangling'), and extracting values out of string columns. It also includes tools for working with missing values (both implicit and explicit).

Maintained by Hadley Wickham. Last updated 13 days ago.

tidy-data cpp

6.6 match 1.4k stars 22.88 score 168k scripts 5.5k dependents

pik-piam

mrremind:MadRat REMIND Input Data Package

The mrremind package contains data preprocessing for the REMIND model.

Maintained by Lavinia Baumstark. Last updated 3 days ago.

24.1 match 4 stars 6.25 score 15 scripts 1 dependent

nathaneastwood

poorman:A Poor Man's Dependency Free Recreation of 'dplyr'

A replication of key functionality from 'dplyr' and the wider 'tidyverse' using only 'base'.

Maintained by Nathan Eastwood. Last updated 1 year ago.

base-r data-manipulation grammar

13.9 match 341 stars 10.79 score 156 scripts 27 dependents

mjlajeunesse

switchboard:An Agile Widget Engine for Real-Time, Dynamic Visualizations

An unsorted collection of visualization widgets rendered in 'Tcl/Tk' <https://www.tcl.tk/> to generate agile dashboards for your iterative simulations. Widgets include progress bars, counters, eavesdroppers, injectors, switches, and sliders for dynamic manipulation and visualization of simulation parameters.

Maintained by Marc J. Lajeunesse. Last updated 3 years ago.

30.1 match 18 stars 4.95 score 2 scripts

welch-lab

rliger:Linked Inference of Genomic Experimental Relationships

Uses an extension of nonnegative matrix factorization to identify shared and dataset-specific factors. See Welch J, Kozareva V, et al (2019) <doi:10.1016/j.cell.2019.05.006>, and Liu J, Gao C, Sodicoff J, et al (2020) <doi:10.1038/s41596-020-0391-8> for more details.

Maintained by Yichen Wang. Last updated 2 months ago.

nonnegative-matrix-factorization single-cell openblas cpp

13.8 match 408 stars 10.77 score 334 scripts 1 dependent

christophergandrud

DataCombine:Tools for Easily Combining and Cleaning Data Sets

Tools for combining and cleaning data sets, particularly with grouped and time series data. This includes functions for merging data while reporting duplicates, filling in columns with values of a column in another data frame, and creating continuous time data for interrupted time series.

Maintained by Christopher Gandrud. Last updated 5 years ago.

17.4 match 55 stars 8.50 score 864 scripts 3 dependents

hannameyer

CAST:'caret' Applications for Spatial-Temporal Models

Supporting functionality to run 'caret' with spatial or spatial-temporal data. 'caret' is a frequently used package for model training and prediction using machine learning. CAST includes functions to improve spatial or spatial-temporal modelling tasks using 'caret'. It includes the newly suggested 'Nearest neighbor distance matching' cross-validation to estimate the performance of spatial prediction models and allows for spatial variable selection to select suitable predictor variables in view of their contribution to the spatial model performance. CAST further includes functionality to estimate the (spatial) area of applicability of prediction models. Methods are described in Meyer et al. (2018) <doi:10.1016/j.envsoft.2017.12.001>; Meyer et al. (2019) <doi:10.1016/j.ecolmodel.2019.108815>; Meyer and Pebesma (2021) <doi:10.1111/2041-210X.13650>; Milà et al. (2022) <doi:10.1111/2041-210X.13851>; Meyer and Pebesma (2022) <doi:10.1038/s41467-022-29838-9>; Linnenbrink et al. (2023) <doi:10.5194/egusphere-2023-1308>; Schumacher et al. (2024) <doi:10.5194/egusphere-2024-2730>. The package is described in detail in Meyer et al. (2024) <doi:10.48550/arXiv.2404.06978>.

Maintained by Hanna Meyer. Last updated 2 months ago.

autocorrelation caret feature-selection machine-learning overfitting predictive-modeling spatial spatio-temporal variable-selection

12.3 match 114 stars 11.97 score 298 scripts 1 dependent

husson

FactoMineR:Multivariate Exploratory Data Analysis and Data Mining

Exploratory data analysis methods to summarize, visualize and describe datasets. The main principal component methods are available, those with the largest potential in terms of applications: principal component analysis (PCA) when variables are quantitative, correspondence analysis (CA) and multiple correspondence analysis (MCA) when variables are categorical, Multiple Factor Analysis when variables are structured in groups, etc. and hierarchical cluster analysis. F. Husson, S. Le and J. Pages (2017).

Maintained by Francois Husson. Last updated 3 months ago.

10.0 match 47 stars 14.71 score 5.6k scripts 112 dependents

lebebr01

simglm:Simulate Models Based on the Generalized Linear Model

Simulates regression models, including both simple regression and generalized linear mixed models with up to three levels of nesting. Flexible power simulations that allow the specification of missing data, unbalanced designs, and different random error distributions are built into the package.

Maintained by Brandon LeBeau. Last updated 10 months ago.

power simulation

18.7 match 43 stars 7.87 score 87 scripts

jokergoo

GetoptLong:Parsing Command-Line Arguments and Simple Variable Interpolation

This is a command-line argument parser which wraps the powerful Perl module Getopt::Long, with some adaptations for easier use in R. It also provides a simple way to perform variable interpolation in R.

Maintained by Zuguang Gu. Last updated 2 years ago.

13.3 match 17 stars 10.98 score 478 scripts 155 dependents

kogalur

randomForestSRC:Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)

Fast OpenMP parallel computing of Breiman's random forests for univariate, multivariate, unsupervised, survival, competing risks, class imbalanced classification and quantile regression. New Mahalanobis splitting for correlated outcomes. Extreme random forests and randomized splitting. Suite of imputation methods for missing data. Fast random forests using subsampling. Confidence regions and standard errors for variable importance. New improved holdout importance. Case-specific importance. Minimal depth variable importance. Visualize trees on your Safari or Google Chrome browser. Anonymous random forests for data privacy.

Maintained by Udaya B. Kogalur. Last updated 2 months ago.

openmp

18.3 match 10 stars 7.90 score 1.2k scripts 12 dependents

dirkschumacher

ompr:Model and Solve Mixed Integer Linear Programs

Model mixed integer linear programs in an algebraic way directly in R. The model is solver-independent and thus offers the possibility to solve a model with different solvers. It currently only supports linear constraints and objective functions. See the 'ompr' website <https://dirkschumacher.github.io/ompr/> for more information, documentation and examples.

Maintained by Dirk Schumacher. Last updated 2 years ago.

integer-programming linear-programming milp mip optimization

17.4 match 268 stars 8.33 score 321 scripts 6 dependents

dgrun

RaceID:Identification of Cell Types, Inference of Lineage Trees, and Prediction of Noise Dynamics from Single-Cell RNA-Seq Data

Application of 'RaceID' allows inference of cell types and prediction of lineage trees by the 'StemID2' algorithm (Herman, J.S., Sagar, Grun D. (2018) <DOI:10.1038/nmeth.4662>). 'VarID2' is part of this package and allows quantification of biological gene expression noise at single-cell resolution (Rosales-Alvarez, R.E., Rettkowski, J., Herman, J.S., Dumbovic, G., Cabezas-Wallscheid, N., Grun, D. (2023) <DOI:10.1186/s13059-023-02974-1>).

Maintained by Dominic Grün. Last updated 4 months ago.

cpp

30.3 match 4.74 score 110 scripts

ddsjoberg

gtsummary:Presentation-Ready Data Summary and Analytic Result Tables

Creates presentation-ready tables summarizing data sets, regression models, and more. The code to create the tables is concise and highly customizable. Data frames can be summarized with any function, e.g. mean(), median(), even user-written functions. Regression models are summarized and include the reference rows for categorical variables. Common regression models, such as logistic regression and Cox proportional hazards regression, are automatically identified and the tables are pre-filled with appropriate column headers.

Maintained by Daniel D. Sjoberg. Last updated 2 days ago.

easy-to-use gt html5 regression-models reproducibility reproducible-research statistics summary-statistics summary-tables table1 tableone

8.4 match 1.1k stars 17.00 score 8.2k scripts 15 dependents

pecanproject

PEcAn.utils:PEcAn Functions Used for Ecological Forecasts and Reanalysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.

Maintained by Rob Kooper. Last updated 2 days ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

12.9 match 216 stars 10.92 score 218 scripts 35 dependents

jamiemkass

ENMeval:Automated Tuning and Evaluations of Ecological Niche Models

Runs ecological niche models over all combinations of user-defined settings (i.e., tuning), performs cross validation to evaluate models, and returns data tables to aid in selection of optimal model settings that balance goodness-of-fit and model complexity. Also has functions to partition data spatially (or not) for cross validation, to plot multiple visualizations of results, to run null models to estimate significance and effect sizes of performance metrics, and to calculate range overlap between model predictions, among others. The package was originally built for Maxent models (Phillips et al. 2006, Phillips et al. 2017), but the current version allows possible extensions for any modeling algorithm. The extensive vignette, which guides users through most package functionality but unfortunately has a file size too big for CRAN, can be found here on the package's Github Pages website: <https://jamiemkass.github.io/ENMeval/articles/ENMeval-2.0-vignette.html>.

Maintained by Jamie M. Kass. Last updated 2 months ago.

12.6 match 49 stars 11.25 score 332 scripts 2 dependents

quanteda

quanteda:Quantitative Analysis of Textual Data

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.

Maintained by Kenneth Benoit. Last updated 2 months ago.

corpus natural-language-processing quanteda text-analytics onetbb cpp

8.4 match 851 stars 16.68 score 5.4k scripts 51 dependents

tomohiro-ando

PDMIF:Fits Heterogeneous Panel Data Models

Fits heterogeneous panel data models with interactive effects for linear regression, logistic, count, probit, quantile, and clustering. Based on Ando, T. and Bai, J. (2015) "A simple new test for slope homogeneity in panel data models with interactive effects" <doi:10.1016/j.econlet.2015.09.019>, Ando, T. and Bai, J. (2015) "Asset Pricing with a General Multifactor Structure" <doi:10.1093/jjfinex/nbu026>, Ando, T. and Bai, J. (2016) "Panel data models with grouped factor structure under unknown group membership" <doi:10.1002/jae.2467>, Ando, T. and Bai, J. (2017) "Clustering huge number of financial time series: A panel data approach with high-dimensional predictors and factor structures" <doi:10.1080/01621459.2016.1195743>, Ando, T. and Bai, J. (2020) "Quantile co-movement in financial markets" <doi:10.1080/01621459.2018.1543598>, Ando, T., Bai, J. and Li, K. (2021) "Bayesian and maximum likelihood analysis of large-scale panel choice models with unobserved heterogeneity" <doi:10.1016/j.jeconom.2020.11.013>.

Maintained by Tomohiro Ando. Last updated 2 years ago.

49.1 match 7 stars 2.85 score

laresbernardo

lares:Analytics & Machine Learning Sidekick

Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of function families, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scrapper, it helps the analyst or data scientist to get quick and robust results, without the need for repetitive coding or advanced R programming skills.

Maintained by Bernardo Lares. Last updated 24 days ago.

analytics api automation automl data-science descriptive-statistics h2o machine-learning marketing mmm predictive-modeling puzzle rlanguage robyn visualization

14.2 match 233 stars 9.84 score 185 scripts 1 dependent

tidyverts

tsibble:Tidy Temporal Data Frames and Tools

Provides a 'tbl_ts' class (the 'tsibble') for temporal data in a data- and model-oriented format. The 'tsibble' provides tools to easily manipulate and analyse temporal data, such as filling in time gaps and aggregating over calendar periods.

Maintained by Earo Wang. Last updated 2 months ago.

9.6 match 538 stars 14.47 score 4.4k scripts 42 dependents

evolecolgroup

pastclim:Manipulate Time Series of Climate Reconstructions

Methods to easily extract and manipulate climate reconstructions for ecological and anthropological analyses, as described in Leonardi et al. (2023) <doi:10.1111/ecog.06481>. The package includes datasets of palaeoclimate reconstructions, present observations, and future projections from multiple climate models.

Maintained by Andrea Manica. Last updated 3 days ago.

climate-data paleoclimate species-distribution-modelling

17.0 match 38 stars 8.12 score 49 scripts

paul-buerkner

brms:Bayesian Regression Models using 'Stan'

Fit Bayesian generalized (non-)linear multivariate multilevel models using 'Stan' for full Bayesian inference. A wide range of distributions and link functions are supported, allowing users to fit -- among others -- linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, and even self-defined mixture models all in a multilevel context. Further modeling options include both theory-driven and data-driven non-linear terms, auto-correlation structures, censoring and truncation, meta-analytic standard errors, and quite a few more. In addition, all parameters of the response distribution can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their prior knowledge. Models can easily be evaluated and compared using several methods assessing posterior or prior predictions. References: Bürkner (2017) <doi:10.18637/jss.v080.i01>; Bürkner (2018) <doi:10.32614/RJ-2018-017>; Bürkner (2021) <doi:10.18637/jss.v100.i05>; Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>.

Maintained by Paul-Christian Bürkner. Last updated 3 days ago.

bayesian-inference brms multilevel-models stan statistical-models

8.3 match 1.3k stars 16.61 score 13k scripts 34 dependents

cjvanlissa

tidySEM:Tidy Structural Equation Modeling

A tidy workflow for generating, estimating, reporting, and plotting structural equation models using 'lavaan', 'OpenMx', or 'Mplus'. Throughout this workflow, elements of syntax, results, and graphs are represented as 'tidy' data, making them easy to customize. Includes functionality to estimate latent class analyses, and to plot 'dagitty' and 'igraph' objects.

Maintained by Caspar J. van Lissa. Last updated 7 days ago.

12.9 match 58 stars 10.69 score 330 scripts 1 dependent

albertofranzin

bnstruct:Bayesian Network Structure Learning from Data with Missing Values

Bayesian Network Structure Learning from Data with Missing Values. The package implements the Silander-Myllymaki complete search, the Max-Min Parents-and-Children, the Hill-Climbing, the Max-Min Hill-climbing heuristic searches, and the Structural Expectation-Maximization algorithm. Available scoring functions are BDeu, AIC, BIC. The package also implements methods for generating and using bootstrap samples, imputed data, inference.

Maintained by Alberto Franzin. Last updated 1 year ago.

25.5 match 1 stars 5.40 score 111 scripts 3 dependents

statistikat

simPop:Simulation of Complex Synthetic Data Information

Tools and methods to simulate populations for surveys based on auxiliary data. The tools include model-based methods, calibration and combinatorial optimization algorithms; see Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v079.i10> and Templ (2017) <doi:10.1007/978-3-319-50272-4>. The package was developed with support of the International Household Survey Network, DFID Trust Fund TF011722 and funds from the World Bank.

Maintained by Matthias Templ. Last updated 4 months ago.

cpp

21.1 match 31 stars 6.51 score 104 scripts

briencj

growthPheno:Functional Analysis of Phenotypic Growth Data to Smooth and Extract Traits

Assists in the plotting and functional smoothing of traits measured over time and the extraction of features from these traits, implementing the SET (Smoothing and Extraction of Traits) method described in Brien et al. (2020) Plant Methods, 16. Smoothing of growth trends for individual plants using natural cubic smoothing splines or P-splines is available for removing transient effects, and segmented smoothing is available to deal with discontinuities in growth trends. There are graphical tools for assessing the adequacy of trait smoothing, both when using this and other packages, such as those that fit nonlinear growth models. A range of per-unit (plant, pot, plot) growth traits or features can be extracted from the data, including single time points, interval growth rates and other growth statistics, such as maximum growth or days to maximum growth. The package also has tools adapted to inputting data from high-throughput phenotyping facilities, such as from a Lemna-Tec Scananalyzer 3D (see <https://www.youtube.com/watch?v=MRAF_mAEa7E/> for more information). The package 'growthPheno' can also be installed from <http://chris.brien.name/rpackages/>.

Maintained by Chris Brien. Last updated 12 hours ago.

20.6 match 6 stars 6.66 score 42 scripts

cardiomoon

ggiraphExtra:Make Interactive 'ggplot2'. Extension to 'ggplot2' and 'ggiraph'

Collection of functions to enhance 'ggplot2' and 'ggiraph'. Provides functions for exploratory plots. All plots can be rendered as a 'static' plot or as an 'interactive' plot using 'ggiraph'.

Maintained by Keon-Woong Moon. Last updated 4 years ago.

15.4 match 48 stars 8.93 score 402 scripts 3 dependents

mlverse

torch:Tensors and Neural Networks with 'GPU' Acceleration

Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al (2019) <doi:10.48550/arXiv.1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.

Maintained by Daniel Falbel. Last updated 6 days ago.

autograd deep-learning torch cpp

8.3 match 520 stars 16.52 score 1.4k scripts 38 dependents

dankelley

oce:Analysis of Oceanographic Data

Supports the analysis of Oceanographic data, including 'ADCP' measurements, measurements made with 'argo' floats, 'CTD' measurements, sectional data, sea-level time series, coastline and topographic data, etc. Provides specialized functions for calculating seawater properties such as potential temperature in either the 'UNESCO' or 'TEOS-10' equation of state. Produces graphical displays that conform to the conventions of the Oceanographic literature. This package is discussed extensively by Kelley (2018) "Oceanographic Analysis with R" <doi:10.1007/978-1-4939-8844-0>.

Maintained by Dan Kelley. Last updated 1 day ago.

oceanography fortran cpp

8.8 match 146 stars 15.42 score 4.2k scripts 18 dependents

ropensci

rdhs:API Client and Dataset Management for the Demographic and Health Survey (DHS) Data

Provides a client for (1) querying the DHS API for survey indicators and metadata (<https://api.dhsprogram.com/#/index.html>), (2) identifying surveys and datasets for analysis, (3) downloading survey datasets from the DHS website, (4) loading datasets and associated metadata into R, and (5) extracting variables and combining datasets for pooled analysis.

Maintained by OJ Watson. Last updated 17 days ago.

dataset dhs dhs-api extract peer-reviewed survey-data

13.5 match 35 stars 10.07 score 286 scripts 3 dependents

jacobseedorff21

BranchGLM:Efficient Best Subset Selection for GLMs via Branch and Bound Algorithms

Performs efficient and scalable glm best subset selection using a novel implementation of a branch and bound algorithm. To speed up the model fitting process, a range of optimization methods are implemented in 'RcppArmadillo'. Parallel computation is available using 'OpenMP'.

Maintained by Jacob Seedorff. Last updated 6 months ago.

generalized-linear-models regression statistics subset-selection variable-selection openblas cpp openmp

21.9 match 7 stars 6.20 score 30 scripts

ycroissant

plm:Linear Models for Panel Data

A set of estimators for models and (robust) covariance matrices, and tests for panel data econometrics, including within/fixed effects, random effects, between, first-difference, nested random effects as well as instrumental-variable (IV) and Hausman-Taylor-style models, panel generalized method of moments (GMM) and general FGLS models, mean groups (MG), demeaned MG, and common correlated effects (CCEMG) and pooled (CCEP) estimators with common factors, variable coefficients and limited dependent variables models. Test functions include model specification, serial correlation, cross-sectional dependence, panel unit root and panel Granger (non-)causality. Typical references are general econometrics text books such as Baltagi (2021), Econometric Analysis of Panel Data (<doi:10.1007/978-3-030-53953-5>), Hsiao (2014), Analysis of Panel Data (<doi:10.1017/CBO9781139839327>), and Croissant and Millo (2018), Panel Data Econometrics with R (<doi:10.1002/9781119504641>).

Maintained by Kevin Tappe. Last updated 23 hours ago.

11.2 match 59 stars 12.06 score 39 dependents

modeloriented

iBreakDown:Model Agnostic Instance Level Variable Attributions

Model agnostic tool for decomposition of predictions from black boxes. Supports additive attributions and attributions with interactions. The Break Down Table shows contributions of every variable to a final prediction. The Break Down Plot presents variable contributions in a concise graphical way. This package works for classification and regression models. It is an extension of the 'breakDown' package (Staniak and Biecek 2018) <doi:10.32614/RJ-2018-072>, with new and faster strategies for orderings. It supports interactions in explanations and has interactive visuals (implemented with 'D3.js' library). The underlying methodology is described in the 'iBreakDown' article (Gosiewska and Biecek 2019) <arXiv:1903.11420>. This package is a part of the 'DrWhy.AI' universe (Biecek 2018) <arXiv:1806.08915>.

Maintained by Przemyslaw Biecek. Last updated 1 year ago.

breakdown iml interpretability shapley xai

13.4 match 84 stars 10.07 score 56 scripts 22 dependents

mrcieu

OneSampleMR:One Sample Mendelian Randomization and Instrumental Variable Analyses

Useful functions for one-sample (individual level data) Mendelian randomization and instrumental variable analyses. The package includes implementations of: the Sanderson and Windmeijer (2016) <doi:10.1016/j.jeconom.2015.06.004> conditional F-statistic, the multiplicative structural mean model Hernán and Robins (2006) <doi:10.1097/01.ede.0000222409.00878.37>, and two-stage predictor substitution and two-stage residual inclusion estimators explained by Terza et al. (2008) <doi:10.1016/j.jhealeco.2007.09.009>.

Maintained by Tom Palmer. Last updated 21 days ago.

instrumental-variable instrumental-variables mendelian-randomisation mendelian-randomization mendelianrandomisation mendelianrandomization

20.1 match 19 stars 6.69 score 16 scripts

american-institutes-for-research

EdSurvey:Analysis of NCES Education Survey and Assessment Data

Read in and analyze functions for education survey and assessment data from the National Center for Education Statistics (NCES) <https://nces.ed.gov/>, including National Assessment of Educational Progress (NAEP) data <https://nces.ed.gov/nationsreportcard/> and data from the International Assessment Database: Organisation for Economic Co-operation and Development (OECD) <https://www.oecd.org/en/about/directorates/directorate-for-education-and-skills.html>, including Programme for International Student Assessment (PISA), Teaching and Learning International Survey (TALIS), Programme for the International Assessment of Adult Competencies (PIAAC), and International Association for the Evaluation of Educational Achievement (IEA) <https://www.iea.nl/>, including Trends in International Mathematics and Science Study (TIMSS), TIMSS Advanced, Progress in International Reading Literacy Study (PIRLS), International Civic and Citizenship Study (ICCS), International Computer and Information Literacy Study (ICILS), and Civic Education Study (CivEd).

Maintained by Paul Bailey. Last updated 16 days ago.

17.0 match 10 stars 7.86 score 139 scripts 1 dependents

egenn

rtemis:Machine Learning and Visualization

Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.

Maintained by E.D. Gennatas. Last updated 1 month ago.

data-science data-visualization machine-learning machine-learning-library visualization

18.8 match 145 stars 7.09 score 50 scripts 2 dependents

bioc

singleCellTK:Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data

The Single Cell Toolkit (SCTK) in the singleCellTK package provides an interface to popular tools for importing, quality control, analysis, and visualization of single cell RNA-seq data. SCTK allows users to seamlessly integrate tools from various packages at different stages of the analysis workflow. A general "a la carte" workflow gives users access to multiple methods for data importing, calculation of general QC metrics, doublet detection, ambient RNA estimation and removal, filtering, normalization, batch correction or integration, dimensionality reduction, 2-D embedding, clustering, marker detection, differential expression, cell type labeling, pathway analysis, and data exporting. Curated workflows can be used to run Seurat and Celda. Streamlined quality control can be performed on the command line using the SCTK-QC pipeline. Users can analyze their data using commands in the R console or by using an interactive Shiny Graphical User Interface (GUI). Specific analyses or entire workflows can be summarized and shared with comprehensive HTML reports generated by Rmarkdown. Additional documentation and vignettes can be found at camplab.net/sctk.

Maintained by Joshua David Campbell. Last updated 24 days ago.

singlecell geneexpression differentialexpression alignment clustering immunooncology batcheffect normalization qualitycontrol dataimport gui

13.1 match 181 stars 10.16 score 252 scripts

usdaforestservice

FIESTAutils:Utility Functions for Forest Inventory Estimation and Analysis

A set of tools for data wrangling, spatial data analysis, statistical modeling (including direct, model-assisted, photo-based, and small area tools), and USDA Forest Service database tools. These tools aim to help Foresters, Analysts, and Scientists extract and perform analyses on USDA Forest Service data.

Maintained by Grayson White. Last updated 2 days ago.

cpp

21.0 match 8 stars 6.33 score 1 dependents

bioc

clustifyr:Classifier for Single-cell RNA-seq Using Cell Clusters

Package designed to aid in classifying cells from single-cell RNA sequencing data using external reference data (e.g., bulk RNA-seq, scRNA-seq, microarray, gene lists). A variety of correlation based methods and gene list enrichment methods are provided to assist cell type assignment.

Maintained by Rui Fu. Last updated 5 months ago.

singlecell annotation sequencing microarray geneexpression assign-identities clusters marker-genes rna-seq single-cell-rna-seq

13.8 match 119 stars 9.63 score 296 scripts

julienvollering

MIAmaxent:A Modular, Integrated Approach to Maximum Entropy Distribution Modeling

Tools for training, selecting, and evaluating maximum entropy (and standard logistic regression) distribution models. This package provides tools for user-controlled transformation of explanatory variables, selection of variables by nested model comparison, and flexible model evaluation and projection. It follows principles based on the maximum-likelihood interpretation of maximum entropy modeling, and uses infinitely-weighted logistic regression for model fitting. The package is described in Vollering et al. (2019; <doi:10.1002/ece3.5654>).

Maintained by Julien Vollering. Last updated 7 months ago.

20.3 match 14 stars 6.53 score 30 scripts

tlverse

tmle3:The Extensible TMLE Framework

A general framework supporting the implementation of targeted maximum likelihood estimators (TMLEs) of a diverse range of statistical target parameters through a unified interface. The goal is that the exposed framework be as general as the mathematical framework upon which it draws.

Maintained by Jeremy Coyle. Last updated 4 months ago.

causal-inference machine-learning targeted-learning variable-importance

16.7 match 38 stars 7.91 score 286 scripts 5 dependents

nliulab

AutoScore:An Interpretable Machine Learning-Based Automatic Clinical Score Generator

A novel interpretable machine learning-based framework to automate the development of a clinical scoring model for predefined outcomes. Our novel framework consists of six modules: variable ranking with machine learning, variable transformation, score derivation, model selection, domain knowledge-based score fine-tuning, and performance evaluation. The details are described in our research paper <doi:10.2196/21798>. Users or clinicians could seamlessly generate parsimonious sparse-score risk models (i.e., risk scores), which can be easily implemented and validated in clinical practice. We hope to see its application in various medical case studies.

Maintained by Feng Xie. Last updated 15 days ago.

17.1 match 32 stars 7.70 score 30 scripts

cytomining

cytominer:Methods for Image-Based Cell Profiling

`cytominer` is a suite of common functions used to process high-dimensional readouts from image-based cell profiling experiments.

Maintained by Shantanu Singh. Last updated 2 years ago.

microscopy profiling

19.2 match 50 stars 6.89 score 44 scripts

cdalzell

Lahman:Sean 'Lahman' Baseball Database

Provides the tables from the 'Sean Lahman Baseball Database' as a set of R data.frames. It uses the data on pitching, hitting and fielding performance and other tables from 1871 through 2023, as recorded in the 2024 version of the database. Documentation examples show how many baseball questions can be investigated.

Maintained by Chris Dalzell. Last updated 4 months ago.

11.0 match 79 stars 11.98 score 1.7k scripts 2 dependents

bioc

phyloseq:Handling and analysis of high-throughput microbiome census data

phyloseq provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data.

Maintained by Paul J. McMurdie. Last updated 5 months ago.

immunooncology sequencing microbiome metagenomics clustering classification multiplecomparison geneticvariability

9.5 match 597 stars 13.90 score 8.4k scripts 37 dependents

greta-dev

greta:Simple and Scalable Statistical Modelling in R

Write statistical models in R and fit them by MCMC and optimisation on CPUs and GPUs, using Google 'TensorFlow'. greta lets you write your own model like in BUGS, JAGS and Stan, except that you write models right in R, it scales well to massive datasets, and it’s easy to extend and build on. See the website for more information, including tutorials, examples, package documentation, and the greta forum.

Maintained by Nicholas Tierney. Last updated 6 days ago.

10.4 match 566 stars 12.53 score 396 scripts 6 dependents

atorus-research

xportr:Utilities to Output CDISC SDTM/ADaM XPT Files

Tools to build CDISC compliant data sets and check for CDISC compliance.

Maintained by Eli Miller. Last updated 3 months ago.

clinical-programmers xpt

14.4 match 43 stars 9.01 score 102 scripts

modeloriented

survex:Explainable Machine Learning in Survival Analysis

Survival analysis models are commonly used in medicine and other areas. Many of them are too complex to be interpreted by humans. Exploration and explanation are needed, but standard methods do not give a broad enough picture. 'survex' provides easy-to-apply methods for explaining survival models, both complex black-boxes and simpler statistical models. They include methods specific to survival analysis such as SurvSHAP(t) introduced in Krzyzinski et al., (2023) <doi:10.1016/j.knosys.2022.110234>, SurvLIME described in Kovalev et al., (2020) <doi:10.1016/j.knosys.2020.106164> as well as extensions of existing ones described in Biecek et al., (2021) <doi:10.1201/9780429027192>.

Maintained by Mikołaj Spytek. Last updated 9 months ago.

biostatistics brier-scores censored-data cox-model cox-regression explainable-ai explainable-machine-learning explainable-ml explanatory-model-analysis interpretable-machine-learning interpretable-ml machine-learning probabilistic-machine-learning shap survival-analysis time-to-event variable-importance xai

15.4 match 110 stars 8.40 score 114 scripts

microsoft

vivainsights:Analyze and Visualize Data from 'Microsoft Viva Insights'

Provides a versatile range of functions, including exploratory data analysis, time-series analysis, organizational network analysis, and data validation, while implementing a set of best practices for analyzing and visualizing data specific to 'Microsoft Viva Insights'.

Maintained by Martin Chan. Last updated 24 days ago.

21.1 match 11 stars 6.12 score 68 scripts

pik-piam

quitte:Bits and pieces of code to use with quitte-style data frames

A collection of functions for easily dealing with quitte-style data frames, doing multi-model comparisons and plots.

Maintained by Michaja Pehl. Last updated 3 days ago.

15.7 match 8.22 score 184 scripts 35 dependents

ebmtnprof

rties:Modeling Interpersonal Dynamics

The name of this package grew out of our research on temporal interpersonal emotion systems (TIES), hence 'rties'. It provides tools for using a set of models to investigate temporal processes in bivariate (e.g., dyadic) systems. The general approach is to model, one dyad at a time, the dynamics of a variable that is assessed repeatedly from both partners, extract the parameter estimates for each dyad, and then use those parameter estimates as input to a latent profile analysis to extract groups of dyads with qualitatively distinct dynamics. Finally, the profile memberships can be used to either predict, or be predicted by, another variable of interest. Currently, 2 models are supported: 1) inertia-coordination, and 2) a coupled-oscillator. Extended documentation is provided in vignettes. Theoretical background can be found in Butler (2011) <doi:10.1177/1088868311411164> and Butler & Barnard (2019) <doi:10.1097/PSY.0000000000000703>.

Maintained by Emily Butler. Last updated 3 years ago.

32.4 match 11 stars 3.94 score 16 scripts

bdwilliamson

flevr:Flexible, Ensemble-Based Variable Selection with Potentially Missing Data

Perform variable selection in settings with possibly missing data based on extrinsic (algorithm-specific) and intrinsic (population-level) variable importance. Uses a Super Learner ensemble to estimate the underlying prediction functions that give rise to estimates of variable importance. For more information about the methods, please see Williamson and Huang (2023+) <arXiv:2202.12989>.

Maintained by Brian D. Williamson. Last updated 1 year ago.

26.2 match 5 stars 4.88 score 2 scripts

openanalytics

clinDataReview:Clinical Data Review Tool

Creation of interactive tables, listings and figures ('TLFs') and associated report for exploratory analysis of data in a clinical trial, e.g. for clinical oversight activities. Interactive figures include sunburst, treemap, scatterplot, line plot and barplot of counts data. Interactive tables include table of summary statistics (as counts of adverse events, enrollment table) and listings. Possibility to compare data (summary table or listing) across two data batches/sets. A clinical data review report is created via study-specific configuration files and template 'R Markdown' reports contained in the package.

Maintained by Laure Cougnaud. Last updated 9 months ago.

17.9 match 11 stars 7.10 score 36 scripts

usdaforestservice

FIESTA:Forest Inventory Estimation and Analysis

A research estimation tool for analysts that work with sample-based inventory data from the U.S. Department of Agriculture, Forest Service, Forest Inventory and Analysis (FIA) Program.

Maintained by Grayson White. Last updated 3 days ago.

17.6 match 30 stars 7.24 score 62 scripts

pvanlaake

ncdfCF:Easy Access to NetCDF Files with CF Metadata Conventions

Network Common Data Form ('netCDF') files are widely used for scientific data. Library-level access in R is provided through packages 'RNetCDF' and 'ncdf4'. Package 'ncdfCF' is built on top of 'RNetCDF' and makes the data and its attributes available as a set of R6 classes that are informed by the Climate and Forecasting Metadata Conventions. Access to the data uses standard R subsetting operators and common function forms.

Maintained by Patrick Van Laake. Last updated 2 days ago.

23.5 match 5.41 score 4 scripts

rapler

dst:Using the Theory of Belief Functions

Using the Theory of Belief Functions for evidence calculus. Basic probability assignments, or mass functions, can be defined on the subsets of a set of possible values and combined. A mass function can be extended to a larger frame. Marginalization, i.e., reduction to a smaller frame, can also be done. These features can be combined to analyze small belief networks and take into account situations where information cannot be satisfactorily described by probability distributions.

Maintained by Peiyuan Zhu. Last updated 3 months ago.

21.3 match 6 stars 5.96 score 126 scripts

imbi-heidelberg

DescrTab2:Publication Quality Descriptive Statistics Tables

Provides functions to create descriptive statistics tables for continuous and categorical variables. By default, summary statistics such as mean, standard deviation, quantiles, minimum and maximum for continuous variables and relative and absolute frequencies for categorical variables are calculated. 'DescrTab2' features a sophisticated algorithm to choose appropriate test statistics for your data and provides p-values. On top of this, confidence intervals for group differences of appropriate summary measures are automatically produced for two-group comparisons. Tables generated by 'DescrTab2' can be integrated in a variety of document formats, including .html, .tex and .docx documents. 'DescrTab2' also allows printing tables to console and saving table objects for later use.

Maintained by Jan Meis. Last updated 1 year ago.

categorical-variables continuous-variable descriptive-statistics p-values statistical-tests statistics

18.8 match 9 stars 6.71 score 19 scripts 1 dependents

kurthornik

tseries:Time Series Analysis and Computational Finance

Time series analysis and computational finance.

Maintained by Kurt Hornik. Last updated 6 months ago.

fortran openblas

11.3 match 4 stars 11.22 score 10k scripts 289 dependents

bioc

gdsfmt:R Interface to CoreArray Genomic Data Structure (GDS) Files

Provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files. GDS is portable across platforms with hierarchical structure to store multiple scalable array-oriented data sets with metadata information. It is suited for large-scale datasets, especially for data which are much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers of less than 8 bits, since a diploid genotype, like single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. A GDS file can also be read in parallel by multiple R processes, supported by the package parallel.

Maintained by Xiuwen Zheng. Last updated 2 days ago.

infrastructure dataimport bioinformatics gds-format genomics cpp

11.0 match 18 stars 11.34 score 920 scripts 29 dependents

ropensci

aorsf:Accelerated Oblique Random Forests

Fit, interpret, and compute predictions with oblique random forests. Includes support for partial dependence, variable importance, passing customized functions for variable importance and identification of linear combinations of features. Methods for the oblique random survival forest are described in Jaeger et al., (2023) <DOI:10.1080/10618600.2023.2231048>.

Maintained by Byron Jaeger. Last updated 4 days ago.

data-science oblique random-forest survival openblas cpp openmp

13.5 match 58 stars 9.21 score 60 scripts 1 dependents