Showing 200 of total 4923 results (show query)
pharmaverse
admiral:ADaM in R Asset Library
A toolbox for programming Clinical Data Interchange Standards Consortium (CDISC) compliant Analysis Data Model (ADaM) datasets in R. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide" (CDISC Analysis Data Model Team, 2021, <https://www.cdisc.org/standards/foundational/adam>).
Maintained by Ben Straub. Last updated 4 days ago.
cdiscclinical-trialsopen-source
100.5 match 236 stars 13.89 score 486 scripts 4 dependentscrunch-io
crunch:Crunch.io Data Tools
The Crunch.io service <https://crunch.io/> provides a cloud-based data store and analytic engine, as well as an intuitive web interface. Using this package, analysts can interact with and manipulate Crunch datasets from within R. Importantly, this allows technical researchers to collaborate naturally with team members, managers, and clients who prefer a point-and-click interface.
Maintained by Greg Freedman Ellis. Last updated 11 days ago.
118.8 match 9 stars 10.53 score 200 scripts 2 dependentsrolkra
explore:Simplifies Exploratory Data Analysis
Interactive data exploration with one line of code, automated reporting or use an easy to remember set of tidy functions for low code exploratory data analysis.
Maintained by Roland Krasser. Last updated 3 months ago.
data-explorationdata-visualisationdecision-treesedarmarkdownshinytidy
92.4 match 228 stars 11.43 score 221 scripts 1 dependentsstan-dev
posterior:Tools for Working with Posterior Distributions
Provides useful tools for both users and developers of packages for fitting Bayesian models or working with output from Bayesian models. The primary goals of the package are to: (a) Efficiently convert between many different useful formats of draws (samples) from posterior or prior distributions. (b) Provide consistent methods for operations commonly performed on draws, for example, subsetting, binding, or mutating draws. (c) Provide various summaries of draws in convenient formats. (d) Provide lightweight implementations of state of the art posterior inference diagnostics. References: Vehtari et al. (2021) <doi:10.1214/20-BA1221>.
Maintained by Paul-Christian Bürkner. Last updated 10 days ago.
63.4 match 168 stars 16.13 score 3.3k scripts 342 dependentschoonghyunryu
dlookr:Tools for Data Diagnosis, Exploration, Transformation
A collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between the target variable and predictor. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. And it creates automated reports that support these three tasks.
Maintained by Choonghyun Ryu. Last updated 9 months ago.
81.5 match 212 stars 11.05 score 748 scripts 2 dependentsadeverse
ade4:Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences
Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.
Maintained by Aurélie Siberchicot. Last updated 12 days ago.
52.8 match 39 stars 14.96 score 2.2k scripts 256 dependentstidymodels
recipes:Preprocessing and Feature Engineering Steps for Modeling
A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps provides preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.
Maintained by Max Kuhn. Last updated 6 days ago.
40.9 match 584 stars 18.71 score 7.2k scripts 380 dependentstidyverse
dplyr:A Grammar of Data Manipulation
A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
Maintained by Hadley Wickham. Last updated 13 days ago.
27.4 match 4.8k stars 24.68 score 659k scripts 7.8k dependentsbeckerbenj
eatGADS:Data Management of Large Hierarchical Data
Import 'SPSS' data, handle and change 'SPSS' meta data, store and access large hierarchical data in 'SQLite' data bases.
Maintained by Benjamin Becker. Last updated 23 days ago.
84.5 match 1 stars 7.36 score 34 scripts 1 dependentsgdemin
expss:Tables, Labels and Some Useful Functions from Spreadsheets and 'SPSS' Statistics
Package computes and displays tables with support for 'SPSS'-style labels, multiple and nested banners, weights, multiple-response variables and significance testing. There are facilities for nice output of tables in 'knitr', 'Shiny', '*.xlsx' files, R and 'Jupyter' notebooks. Methods for labelled variables add value labels support to base R functions and to some functions from other packages. Additionally, the package brings popular data transformation functions from 'SPSS' Statistics and 'Excel': 'RECODE', 'COUNT', 'COUNTIF', 'VLOOKUP' and etc. These functions are very useful for data processing in marketing research surveys. Package intended to help people to move data processing from 'Excel' and 'SPSS' to R.
Maintained by Gregory Demin. Last updated 11 months ago.
excellabelslabels-supportmsexcelpivot-tablesrecodespssspss-statisticstablesvariable-labelsvlookup
54.9 match 84 stars 11.00 score 1.8k scripts 4 dependentsnjtierney
naniar:Data Structures, Summaries, and Visualisations for Missing Data
Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. 'naniar' provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of 'ggplot2' and tidy data. The work is fully discussed at Tierney & Cook (2023) <doi:10.18637/jss.v105.i07>.
Maintained by Nicholas Tierney. Last updated 4 days ago.
data-visualisationggplot2missing-datamissingnesstidy-data
37.4 match 657 stars 15.63 score 5.1k scripts 9 dependentsyrosseel
lavaan:Latent Variable Analysis
Fit a variety of latent variable models, including confirmatory factor analysis, structural equation modeling and latent growth curve models.
Maintained by Yves Rosseel. Last updated 4 hours ago.
factor-analysisgrowth-curve-modelslatent-variablesmissing-datamultilevel-modelsmultivariate-analysispath-analysispsychometricsstatistical-modelingstructural-equation-modeling
32.5 match 453 stars 16.83 score 8.4k scripts 216 dependentseasystats
datawizard:Easy Data Wrangling and Statistical Transformations
A lightweight package to assist in key steps involved in any data analysis workflow: (1) wrangling the raw data to get it in the needed form, (2) applying preprocessing steps and statistical transformations, and (3) compute statistical summaries of data properties and distributions. It is also the data wrangling backend for packages in 'easystats' ecosystem. References: Patil et al. (2022) <doi:10.21105/joss.04684>.
Maintained by Etienne Bacher. Last updated 9 days ago.
datadplyrhacktoberfestjanitormanipulationreshapetidyrwrangling
36.3 match 222 stars 14.71 score 436 scripts 119 dependentsnlmixr2
rxode2:Facilities for Simulating from ODE-Based Models
Facilities for running simulations from ordinary differential equation ('ODE') models, such as pharmacometrics and other compartmental models. A compilation manager translates the ODE model into C, compiles it, and dynamically loads the object code into R for improved computational efficiency. An event table object facilitates the specification of complex dosing regimens (optional) and sampling schedules. NB: The use of this package requires both C and Fortran compilers, for details on their use with R please see Section 6.3, Appendix A, and Appendix D in the "R Administration and Installation" manual. Also the code is mostly released under GPL. The 'VODE' and 'LSODA' are in the public domain. The information is available in the inst/COPYRIGHTS.
Maintained by Matthew L. Fidler. Last updated 30 days ago.
46.7 match 40 stars 11.24 score 220 scripts 13 dependentsewenharrison
finalfit:Quickly Create Elegant Regression Results Tables and Plots when Modelling
Generate regression results tables and plots in final format for publication. Explore models and export directly to PDF and 'Word' using 'RMarkdown'.
Maintained by Ewen Harrison. Last updated 7 months ago.
45.8 match 270 stars 11.43 score 1.0k scriptsbioc
mixOmics:Omics Data Integration Project
Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non exhaustive list of methods include variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.
Maintained by Eva Hamrud. Last updated 4 days ago.
immunooncologymicroarraysequencingmetabolomicsmetagenomicsproteomicsgenepredictionmultiplecomparisonclassificationregressionbioconductorgenomicsgenomics-datagenomics-visualizationmultivariate-analysismultivariate-statisticsomicsr-pkgr-project
37.5 match 182 stars 13.71 score 1.3k scripts 22 dependentskkholst
lava:Latent Variable Models
A general implementation of Structural Equation Models with latent variables (MLE, 2SLS, and composite likelihood estimators) with both continuous, censored, and ordinal outcomes (Holst and Budtz-Joergensen (2013) <doi:10.1007/s00180-012-0344-y>). Mixture latent variable models and non-linear latent variable models (Holst and Budtz-Joergensen (2020) <doi:10.1093/biostatistics/kxy082>). The package also provides methods for graph exploration (d-separation, back-door criterion), simulation of general non-linear latent variable models, and estimation of influence functions for a broad range of statistical models.
Maintained by Klaus K. Holst. Last updated 2 months ago.
latent-variable-modelssimulationstatisticsstructural-equation-models
39.1 match 33 stars 12.85 score 610 scripts 476 dependentsheli-xu
findSVI:Calculate Social Vulnerability Index for Communities
Developed by CDC/ATSDR (Centers for Disease Control and Prevention/ Agency for Toxic Substances and Disease Registry), Social Vulnerability Index (SVI) serves as a tool to assess the resilience of communities by taking into account socioeconomic and demographic factors. Provided with year(s), region(s) and a geographic level of interest, 'findSVI' retrieves required variables from US census data and calculates SVI for communities in the specified area based on CDC/ATSDR SVI documentation. Reference for the calculation methods: Flanagan BE, Gregory EW, Hallisey EJ, Heitgerd JL, Lewis B (2011) <doi:10.2202/1547-7355.1792>.
Maintained by Heli Xu. Last updated 1 months ago.
86.5 match 12 stars 5.68 score 16 scriptsmjskay
tidybayes:Tidy Data and 'Geoms' for Bayesian Models
Compose data for and extract, manipulate, and visualize posterior draws from Bayesian models ('JAGS', 'Stan', 'rstanarm', 'brms', 'MCMCglmm', 'coda', ...) in a tidy data format. Functions are provided to help extract tidy data frames of draws from Bayesian models and that generate point summaries and intervals in a tidy format. In addition, 'ggplot2' 'geoms' and 'stats' are provided for common visualization primitives like points with multiple uncertainty intervals, eye plots (intervals plus densities), and fit curves with multiple, arbitrary uncertainty bands.
Maintained by Matthew Kay. Last updated 6 months ago.
bayesian-data-analysisbrmsggplot2jagsstantidy-datavisualization
33.3 match 733 stars 14.72 score 7.3k scripts 20 dependentswinvector
vtreat:A Statistically Sound 'data.frame' Processor/Conditioner
A 'data.frame' processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. 'vtreat' prepares variables so that data has fewer exceptional cases, making it easier to safely use models in production. Common problems 'vtreat' defends against: 'Inf', 'NA', too many categorical levels, rare categorical levels, and new categorical levels (levels seen during application, but not during training). Reference: "'vtreat': a data.frame Processor for Predictive Modeling", Zumel, Mount, 2016, <DOI:10.5281/zenodo.1173313>.
Maintained by John Mount. Last updated 2 months ago.
categorical-variablesmachine-learning-algorithmsnested-modelsprepare-data
43.1 match 285 stars 11.19 score 328 scripts 1 dependentsrhartmano
labelr:Label Data Frames, Variables, and Values
Create and use data frame labels for data frame objects (frame labels), their columns (name labels), and individual values of a column (value labels). Value labels include one-to-one and many-to-one labels for nominal and ordinal variables, as well as numerical range-based value labels for continuous variables. Convert value-labeled variables so each value is replaced by its corresponding value label. Add values-converted-to-labels columns to a value-labeled data frame while preserving parent columns. Filter and subset a value-labeled data frame using labels, while returning results in terms of values. Overlay labels in place of values in common R commands to increase interpretability. Generate tables of value frequencies, with categories expressed as raw values or as labels. Access data frames that show value-to-label mappings for easy reference.
Maintained by Robert Hartman. Last updated 7 months ago.
83.0 match 3 stars 5.65 score 10 scriptst-kalinowski
keras:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.
Maintained by Tomasz Kalinowski. Last updated 11 months ago.
40.9 match 10.82 score 10k scripts 54 dependentsdreamrs
fresh:Create Custom 'Bootstrap' Themes to Use in 'Shiny'
Customize 'Bootstrap' and 'Bootswatch' themes, like colors, fonts, grid layout, to use in 'Shiny' applications, 'rmarkdown' documents and 'flexdashboard'.
Maintained by Victor Perrier. Last updated 9 months ago.
bootstrapshinyshiny-applicationsshiny-themes
36.2 match 228 stars 11.99 score 546 scripts 47 dependentsbig-life-lab
recodeflow:Contains functions to interface with variable details sheets, including recoding variables and converting them to PMML
Recode and harmonize data using variable and details sheets.
Maintained by Yulric Sequeria. Last updated 6 days ago.
63.1 match 6 stars 6.75 score 7 scriptsharrelfe
Hmisc:Harrell Miscellaneous
Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, simulation, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, recoding variables, caching, simplified parallel computing, encrypting and decrypting data using a safe workflow, general moving window statistical estimation, and assistance in interpreting principal component analysis.
Maintained by Frank E Harrell Jr. Last updated 7 hours ago.
22.8 match 210 stars 17.61 score 17k scripts 750 dependentsmayoverse
arsenal:An Arsenal of 'R' Functions for Large-Scale Statistical Summaries
An Arsenal of 'R' functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in 'R' and 'RStudio' and which use formulas and versatile summary statistics for summary tables and models. The primary functions include tableby(), a Table-1-like summary of multiple variable types 'by' the levels of one or more categorical variables; paired(), a Table-1-like summary of multiple variable types paired across two time points; modelsum(), which performs simple model fits on one or more endpoints for many variables (univariate or adjusted for covariates); freqlist(), a powerful frequency table across many categorical variables; comparedf(), a function for comparing data.frames; and write2(), a function to output tables to a document.
Maintained by Ethan Heinzen. Last updated 7 months ago.
baseline-characteristicsdescriptive-statisticsmodelingpaired-comparisonsreportingstatisticstableone
29.7 match 225 stars 13.45 score 1.2k scripts 16 dependentsdaya6489
SmartEDA:Summarize and Explore the Data
Exploratory analysis on any input data describing the structure and the relationships present in the data. The package automatically select the variable and does related descriptive statistics. Analyzing information value, weight of evidence, custom tables, summary statistics, graphical techniques will be performed for both numeric and categorical predictors.
Maintained by Dayanand Ubrangala. Last updated 1 years ago.
analysisexploratory-data-analysis
54.4 match 42 stars 7.25 score 214 scriptsbiomodhub
biomod2:Ensemble Platform for Species Distribution Modeling
Functions for species distribution modeling, calibration and evaluation, ensemble of models, ensemble forecasting and visualization. The package permits to run consistently up to 10 single models on a presence/absences (resp presences/pseudo-absences) dataset and to combine them in ensemble models and ensemble projections. Some bench of other evaluation and visualisation tools are also available within the package.
Maintained by Maya Gueguen. Last updated 5 days ago.
28.3 match 95 stars 13.88 score 536 scripts 7 dependentskoalaverse
vip:Variable Importance Plots
A general framework for constructing variable importance plots from various types of machine learning models in R. Aside from some standard model- specific variable importance measures, this package also provides model- agnostic approaches that can be applied to any supervised learning algorithm. These include 1) an efficient permutation-based variable importance measure, 2) variable importance based on Shapley values (Strumbelj and Kononenko, 2014) <doi:10.1007/s10115-013-0679-x>, and 3) the variance-based approach described in Greenwell et al. (2018) <arXiv:1805.04755>. A variance-based method for quantifying the relative strength of interaction effects is also included (see the previous reference for details).
Maintained by Brandon M. Greenwell. Last updated 2 years ago.
interaction-effectmachine-learningpartial-dependence-plotsupervised-learning-algorithmsvariable-importancevariable-importance-plots
33.2 match 187 stars 11.61 score 3.5k scripts 6 dependentsuupharmacometrics
xpose4:Diagnostics for Nonlinear Mixed-Effect Models
A model building aid for nonlinear mixed-effects (population) model analysis using NONMEM, facilitating data set checkout, exploration and visualization, model diagnostics, candidate covariate identification and model comparison. The methods are described in Keizer et al. (2013) <doi:10.1038/psp.2013.24>, and Jonsson et al. (1999) <doi:10.1016/s0169-2607(98)00067-4>.
Maintained by Andrew C. Hooker. Last updated 1 years ago.
diagnosticsnonmempharmacometricspopulation-modelxpose
51.5 match 35 stars 7.30 score 315 scriptsr-forge
variables:Variable Descriptions
Abstract descriptions of (yet) unobserved variables.
Maintained by Torsten Hothorn. Last updated 4 days ago.
62.5 match 5.92 score 11 scripts 12 dependentsr-lib
tidyselect:Select from a Set of Strings
A backend for the selecting functions of the 'tidyverse'. It makes it easy to implement select-like functions in your own packages in a way that is consistent with other 'tidyverse' interfaces for selection.
Maintained by Lionel Henry. Last updated 4 months ago.
19.7 match 130 stars 18.31 score 1.9k scripts 8.2k dependentsekstroem
dataMaid:A Suite of Checks for Identification of Potential Errors in a Data Frame as Part of the Data Screening Process
Data screening is an important first step of any statistical analysis. dataMaid auto generates a customizable data report with a thorough summary of the checks and the results that a human can use to identify possible errors. It provides an extendable suite of test for common potential errors in a dataset.
Maintained by Claus Thorn Ekstrøm. Last updated 3 years ago.
data-cleaningdata-screeningreproducible-research
47.6 match 143 stars 7.53 score 236 scriptsvegandevs
vegan:Community Ecology Package
Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
Maintained by Jari Oksanen. Last updated 16 days ago.
ecological-modellingecologyordinationfortranopenblas
18.1 match 472 stars 19.41 score 15k scripts 440 dependentsrdatatable
data.table:Extension of `data.frame`
Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
Maintained by Tyson Barrett. Last updated 2 days ago.
14.5 match 3.7k stars 23.53 score 230k scripts 4.6k dependentsinsightsengineering
rtables:Reporting Tables
Reporting tables often have structure that goes beyond simple rectangular data. The 'rtables' package provides a framework for declaring complex multi-level tabulations and then applying them to data. This framework models both tabulation and the resulting tables as hierarchical, tree-like objects which support sibling sub-tables, arbitrary splitting or grouping of data in row and column dimensions, cells containing multiple values, and the concept of contextual summary computations. A convenient pipe-able interface is provided for declaring table layouts and the corresponding computations, and then applying them to data.
Maintained by Joe Zhu. Last updated 2 months ago.
24.2 match 232 stars 13.65 score 238 scripts 17 dependentsnicolas-robette
GDAtools:Geometric Data Analysis
Many tools for Geometric Data Analysis (Le Roux & Rouanet (2005) <doi:10.1007/1-4020-2236-0>), such as MCA variants (Specific Multiple Correspondence Analysis, Class Specific Analysis), many graphical and statistical aids to interpretation (structuring factors, concentration ellipses, inductive tests, bootstrap validation, etc.) and multiple-table analysis (Multiple Factor Analysis, between- and inter-class analysis, Principal Component Analysis and Correspondence Analysis with Instrumental Variables, etc.).
Maintained by Nicolas Robette. Last updated 10 months ago.
55.5 match 10 stars 5.93 score 94 scripts 2 dependentsrsquaredacademy
olsrr:Tools for Building OLS Regression Models
Tools designed to make it easier for users, particularly beginner/intermediate R users to build ordinary least squares regression models. Includes comprehensive regression output, heteroskedasticity tests, collinearity diagnostics, residual diagnostics, measures of influence, model fit assessment and variable selection procedures.
Maintained by Aravind Hebbali. Last updated 4 months ago.
collinearity-diagnosticslinear-modelsregressionstepwise-regression
26.9 match 103 stars 12.19 score 1.4k scripts 4 dependentsbdwilliamson
vimp:Perform Inference on Algorithm-Agnostic Variable Importance
Calculate point estimates of and valid confidence intervals for nonparametric, algorithm-agnostic variable importance measures in high and low dimensions, using flexible estimators of the underlying regression functions. For more information about the methods, please see Williamson et al. (Biometrics, 2020), Williamson et al. (JASA, 2021), and Williamson and Feng (ICML, 2020).
Maintained by Brian D. Williamson. Last updated 1 months ago.
machine-learningnonparametric-statisticsstatistical-inferencevariable-importance
47.5 match 23 stars 6.79 score 67 scriptstopepo
caret:Classification and Regression Training
Misc functions for training and plotting classification and regression models.
Maintained by Max Kuhn. Last updated 3 months ago.
16.6 match 1.6k stars 19.24 score 61k scripts 303 dependentslarmarange
broom.helpers:Helpers for Model Coefficients Tibbles
Provides suite of functions to work with regression model 'broom::tidy()' tibbles. The suite includes functions to group regression model terms by variable, insert reference and header rows for categorical variables, add variable labels, and more.
Maintained by Joseph Larmarange. Last updated 10 days ago.
27.7 match 22 stars 11.45 score 165 scripts 2 dependentstomasfryda
h2o:R Interface for the 'H2O' Scalable Machine Learning Platform
R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
Maintained by Tomas Fryda. Last updated 1 years ago.
38.0 match 3 stars 8.20 score 7.8k scripts 11 dependentstbates
umx:Structural Equation Modeling and Twin Modeling in R
Quickly create, run, and report structural equation models, and twin models. See '?umx' for help, and umx_open_CRAN_page("umx") for NEWS. Timothy C. Bates, Michael C. Neale, Hermine H. Maes, (2019). umx: A library for Structural Equation and Twin Modelling in R. Twin Research and Human Genetics, 22, 27-41. <doi:10.1017/thg.2019.2>.
Maintained by Timothy C. Bates. Last updated 2 days ago.
behavior-geneticsgeneticsopenmxpsychologysemstatisticsstructural-equation-modelingtutorialstwin-modelsumx
31.7 match 44 stars 9.45 score 472 scriptspharmaverse
admiralvaccine:Vaccine Extension Package for ADaM in 'R' Asset Library
Programming vaccine specific Clinical Data Interchange Standards Consortium (CDISC) compliant Analysis Data Model (ADaM) datasets in 'R'. Flat model is followed as per Center for Biologics Evaluation and Research (CBER) guidelines for creating vaccine specific domains. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide" (CDISC Analysis Data Model Team (2021), <https://www.cdisc.org/standards/foundational/adam/adamig-v1-3-release-package>). The package is an extension package of the 'admiral' package.
Maintained by Sukalpo Saha. Last updated 2 months ago.
39.8 match 6 stars 7.44 score 23 scriptspablo14
funModeling:Exploratory Data Analysis and Data Preparation Tool-Box
Around 10% of almost any predictive modeling project is spent in predictive modeling, 'funModeling' and the book Data Science Live Book (<https://livebook.datascienceheroes.com/>) are intended to cover remaining 90%: data preparation, profiling, selecting best variables 'dataViz', assessing model performance and other functions.
Maintained by Pablo Casas. Last updated 2 years ago.
34.3 match 100 stars 8.57 score 654 scriptsekstroem
dataReporter:Reproducible Data Screening Checks and Report of Possible Errors
Data screening is an important first step of any statistical analysis. 'dataReporter' auto generates a customizable data report with a thorough summary of the checks and the results that a human can use to identify possible errors. It provides an extendable suite of test for common potential errors in a dataset. See Petersen AH, Ekstrøm CT (2019). "dataMaid: Your Assistant for Documenting Supervised Data Quality Screening in R." _Journal of Statistical Software_, *90*(6), 1-38 <doi:10.18637/jss.v090.i06> for more information.
Maintained by Claus Thorn Ekstrøm. Last updated 2 years ago.
47.6 match 86 stars 6.16 score 34 scriptstidymodels
infer:Tidy Statistical Inference
The objective of this package is to perform inference using an expressive statistical grammar that coheres with the tidy design framework.
Maintained by Simon Couch. Last updated 6 months ago.
18.7 match 734 stars 15.69 score 3.5k scripts 17 dependentsinsightsengineering
tern:Create Common TLGs Used in Clinical Trials
Table, Listings, and Graphs (TLG) library for common outputs used in clinical trials.
Maintained by Joe Zhu. Last updated 2 months ago.
clinical-trialsgraphslistingsnestoutputstables
23.2 match 79 stars 12.62 score 186 scripts 9 dependentsmatteo21q
jomo:Multilevel Joint Modelling Multiple Imputation
Similarly to Schafer's package 'pan', 'jomo' is a package for multilevel joint modelling multiple imputation (Carpenter and Kenward, 2013) <doi:10.1002/9781119942283>. Novel aspects of 'jomo' are the possibility of handling binary and categorical data through latent normal variables, the option to use cluster-specific covariance matrices and to impute compatibly with the substantive model.
Maintained by Matteo Quartagno. Last updated 3 years ago.
30.5 match 3 stars 9.58 score 126 scripts 154 dependentsbioc
RImmPort:RImmPort: Enabling Ready-for-analysis Immunology Research Data
The RImmPort package simplifies access to ImmPort data for analysis in the R environment. It provides a standards-based interface to the ImmPort study data that is in a proprietary format.
Maintained by Zicheng Hu. Last updated 5 months ago.
biomedicalinformaticsdataimportdatarepresentation
67.0 match 4.33 score 27 scriptstidymodels
textrecipes:Extra 'Recipes' for Text Processing
Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allows for tokenization, filtering, counting (tf and tfidf) and feature hashing.
Maintained by Emil Hvitfeldt. Last updated 9 days ago.
26.6 match 160 stars 10.87 score 964 scripts 1 dependentsdgerbing
lessR:Less Code, More Results
Each function replaces multiple standard R functions. For example, two function calls, Read() and CountAll(), generate summary statistics for all variables in the data frame, plus histograms and bar charts as appropriate. Other functions provide for summary statistics via pivot tables, a comprehensive regression analysis, ANOVA and t-test, visualizations including the Violin/Box/Scatter plot for a numerical variable, bar chart, histogram, box plot, density curves, calibrated power curve, reading multiple data formats with the same function call, variable labels, time series with aggregation and forecasting, color themes, and Trellis (facet) graphics. Also includes a confirmatory factor analysis of multiple indicator measurement models, pedagogical routines for data simulation such as for the Central Limit Theorem, generation and rendering of regression instructions for interpretative output, and interactive visualizations.
Maintained by David W. Gerbing. Last updated 1 months ago.
37.0 match 6 stars 7.47 score 394 scripts 3 dependentslrberge
fixest:Fast Fixed-Effects Estimations
Fast and user-friendly estimation of econometric models with multiple fixed-effects. Includes ordinary least squares (OLS), generalized linear models (GLM) and the negative binomial. The core of the package is based on optimized parallel C++ code, scaling especially well for large data sets. The method to obtain the fixed-effects coefficients is based on Berge (2018) <https://github.com/lrberge/fixest/blob/master/_DOCS/FENmlm_paper.pdf>. Further provides tools to export and view the results of several estimations with intuitive design to cluster the standard-errors.
Maintained by Laurent Berge. Last updated 7 months ago.
18.8 match 387 stars 14.69 score 3.8k scripts 25 dependentsjenniniku
gllvm:Generalized Linear Latent Variable Models
Analysis of multivariate data using generalized linear latent variable models (gllvm). Estimation is performed using either the Laplace method, variational approximations, or extended variational approximations, implemented via TMB (Kristensen et al. (2016), <doi:10.18637/jss.v070.i05>).
Maintained by Jenni Niku. Last updated 11 hours ago.
25.9 match 52 stars 10.53 score 176 scripts 1 dependentscvxgrp
CVXR:Disciplined Convex Optimization
An object-oriented modeling language for disciplined convex programming (DCP) as described in Fu, Narasimhan, and Boyd (2020, <doi:10.18637/jss.v094.i14>). It allows the user to formulate convex optimization problems in a natural way following mathematical convention and DCP rules. The system analyzes the problem, verifies its convexity, converts it into a canonical form, and hands it off to an appropriate solver to obtain the solution. Interfaces to solvers on CRAN and elsewhere are provided, both commercial and open source.
Maintained by Anqi Fu. Last updated 4 months ago.
20.9 match 207 stars 12.89 score 768 scripts 51 dependentslarmarange
labelled:Manipulating Labelled Data
Work with labelled data imported from 'SPSS' or 'Stata' with 'haven' or 'foreign'. This package provides useful functions to deal with "haven_labelled" and "haven_labelled_spss" classes introduced by 'haven' package.
Maintained by Joseph Larmarange. Last updated 26 days ago.
havenlabelsmetadatasasspssstata
17.6 match 76 stars 15.02 score 2.4k scripts 96 dependentsopenanalytics
inTextSummaryTable:Creation of in-Text Summary Table
Creation of tables of summary statistics or counts for clinical data (for 'TLFs'). These tables can be exported as in-text table (with the 'flextable' package) for a Clinical Study Report (Word format) or a 'topline' presentation (PowerPoint format), or as interactive table (with the 'DT' package) to an html document for clinical data review.
Maintained by Laure Cougnaud. Last updated 9 months ago.
45.9 match 1 stars 5.52 score 47 scriptsinzightvit
iNZightTools:Tools for 'iNZight'
Provides a collection of wrapper functions for common variable and dataset manipulation workflows primarily used by 'iNZight', a graphical user interface providing easy exploration and visualisation of data for students of statistics, available in both desktop and online versions. Additionally, many of the functions return the 'tidyverse' code used to obtain the result in an effort to bridge the gap between GUI and coding.
Maintained by Tom Elliott. Last updated 3 months ago.
48.7 match 1 stars 5.16 score 18 scripts 2 dependentspecanproject
PEcAn.data.atmosphere:PEcAn Functions Used for Managing Climate Driver Data
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The PECAn.data.atmosphere package converts climate driver data into a standard format for models integrated into PEcAn. As a standalone package, it provides an interface to access diverse climate data sets.
Maintained by David LeBauer. Last updated 2 days ago.
bayesiancyberinfrastructuredata-assimilationdata-scienceecosystem-modelecosystem-scienceforecastingmeta-analysisnational-science-foundationpecanplants
21.5 match 216 stars 11.59 score 64 scripts 14 dependentsradiant-rstats
radiant.data:Data Menu for Radiant: Business Analytics using R and Shiny
The Radiant Data menu includes interfaces for loading, saving, viewing, visualizing, summarizing, transforming, and combining data. It also contains functionality to generate reproducible reports of the analyses conducted in the application.
Maintained by Vincent Nijs. Last updated 5 months ago.
29.8 match 54 stars 8.30 score 146 scripts 6 dependentsjuba
questionr:Functions to Make Surveys Processing Easier
Set of functions to make the processing and analysis of surveys easier : interactive shiny apps and addins for data recoding, contingency tables, dataset metadata handling, and several convenience functions.
Maintained by Julien Barnier. Last updated 1 days ago.
19.4 match 83 stars 12.62 score 1.1k scripts 19 dependentssdctools
sdcMicro:Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation
Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. The theoretical basis for the methods implemented can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface published in Meindl and Templ (2019) <doi:10.3390/a12090191> that allows to use various methods of this package.
Maintained by Matthias Templ. Last updated 26 days ago.
24.1 match 83 stars 9.89 score 258 scriptsconsbiol-unibern
SDMtune:Species Distribution Model Selection
User-friendly framework that enables the training and the evaluation of species distribution models (SDMs). The package implements functions for data driven variable selection and model tuning and includes numerous utilities to display the results. All the functions used to select variables or to tune model hyperparameters have an interactive real-time chart displayed in the 'RStudio' viewer pane during their execution.
Maintained by Sergio Vignali. Last updated 3 months ago.
hyperparameter-tuningspecies-distribution-modellingvariable-selectioncpp
32.2 match 25 stars 7.37 score 155 scriptsjacobkap
fastDummies:Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables
Creates dummy columns from columns that have categorical variables (character or factor types). You can also specify which columns to make dummies out of, or which columns to ignore. Also creates dummy rows from character, factor, and Date columns. This package provides a significant speed increase from creating dummy variables through model.matrix().
Maintained by Jacob Kaplan. Last updated 2 months ago.
binary-datadummy-columnsdummy-datadummy-rowsdummy-variable
18.0 match 36 stars 13.14 score 2.5k scripts 131 dependentsfabrice-rossi
mixvlmc:Variable Length Markov Chains with Covariates
Estimates Variable Length Markov Chains (VLMC) models and VLMC with covariates models from discrete sequences. Supports model selection via information criteria and simulation of new sequences from an estimated model. See Bühlmann, P. and Wyner, A. J. (1999) <doi:10.1214/aos/1018031204> for VLMC and Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022) <doi:10.1111/jtsa.12615> for VLMC with covariates.
Maintained by Fabrice Rossi. Last updated 10 months ago.
machine-learningmarkov-chainmarkov-modelstatisticstime-seriescpp
37.5 match 2 stars 6.23 score 20 scriptslleisong
itsdm:Isolation Forest-Based Presence-Only Species Distribution Modeling
Collection of R functions to do purely presence-only species distribution modeling with isolation forest (iForest) and its variations such as Extended isolation forest and SCiForest. See the details of these methods in references: Liu, F.T., Ting, K.M. and Zhou, Z.H. (2008) <doi:10.1109/ICDM.2008.17>, Hariri, S., Kind, M.C. and Brunner, R.J. (2019) <doi:10.1109/TKDE.2019.2947676>, Liu, F.T., Ting, K.M. and Zhou, Z.H. (2010) <doi:10.1007/978-3-642-15883-4_18>, Guha, S., Mishra, N., Roy, G. and Schrijvers, O. (2016) <https://proceedings.mlr.press/v48/guha16.html>, Cortes, D. (2021) <arXiv:2110.13402>. Additionally, Shapley values are used to explain model inputs and outputs. See details in references: Shapley, L.S. (1953) <doi:10.1515/9781400881970-018>, Lundberg, S.M. and Lee, S.I. (2017) <https://dl.acm.org/doi/abs/10.5555/3295222.3295230>, Molnar, C. (2020) <ISBN:978-0-244-76852-2>, Štrumbelj, E. and Kononenko, I. (2014) <doi:10.1007/s10115-013-0679-x>. itsdm also provides functions to diagnose variable response, analyze variable importance, draw spatial dependence of variables and examine variable contribution. As utilities, the package includes a few functions to download bioclimatic variables including 'WorldClim' version 2.0 (see Fick, S.E. and Hijmans, R.J. (2017) <doi:10.1002/joc.5086>) and 'CMCC-BioClimInd' (see Noce, S., Caporaso, L. and Santini, M. (2020) <doi:10.1038/s41597-020-00726-5>.
Maintained by Lei Song. Last updated 2 years ago.
isolation-forestoutlier-detectionpresence-onlymodelshapley-valuespecies-distribution-modelling
41.0 match 4 stars 5.59 score 65 scriptsggobi
GGally:Extension to 'ggplot2'
The R package 'ggplot2' is a plotting system based on the grammar of graphics. 'GGally' extends 'ggplot2' by adding several functions to reduce the complexity of combining geometric objects with transformed data. Some of these functions include a pairwise plot matrix, a two group pairwise plot matrix, a parallel coordinates plot, a survival plot, and several functions to plot networks.
Maintained by Barret Schloerke. Last updated 10 months ago.
14.1 match 597 stars 16.15 score 17k scripts 154 dependentsconfig-i1
greybox:Toolbox for Model Building and Forecasting
Implements functions and instruments for regression model building and its application to forecasting. The main scope of the package is in variables selection and models specification for cases of time series data. This includes promotional modelling, selection between different dynamic regressions with non-standard distributions of errors, selection based on cross validation, solutions to the fat regression model problem and more. Models developed in the package are tailored specifically for forecasting purposes. So as a results there are several methods that allow producing forecasts from these models and visualising them.
Maintained by Ivan Svetunkov. Last updated 2 days ago.
forecastingmodel-selectionmodel-selection-and-evaluationregressionregression-modelsstatisticscpp
20.7 match 30 stars 11.03 score 97 scripts 34 dependentscdcgov
surveytable:Formatted Survey Estimates
Short and understandable commands that generate tabulated, formatted, and rounded survey estimates. Mostly a wrapper for the 'survey' package (Lumley (2004) <doi:10.18637/jss.v009.i08> <https://CRAN.R-project.org/package=survey>) that identifies low-precision estimates using the National Center for Health Statistics (NCHS) presentation standards (Parker et al. (2017) <https://www.cdc.gov/nchs/data/series/sr_02/sr02_175.pdf>, Parker et al. (2023) <doi:10.15620/cdc:124368>).
Maintained by Alex Strashny. Last updated 4 days ago.
estimatesformatted-outputpretty-printsurveytables
33.6 match 6 stars 6.71 score 19 scriptsrcalinjageman
esci:Estimation Statistics with Confidence Intervals
A collection of functions and 'jamovi' module for the estimation approach to inferential statistics, the approach which emphasizes effect sizes, interval estimates, and meta-analysis. Nearly all functions are based on 'statpsych' and 'metafor'. This package is still under active development, and breaking changes are likely, especially with the plot and hypothesis test functions. Data sets are included for all examples from Cumming & Calin-Jageman (2024) <ISBN:9780367531508>.
Maintained by Robert Calin-Jageman. Last updated 22 days ago.
jamovijaspsciencestatisticsvisualization
40.9 match 22 stars 5.42 score 12 scriptsmerliseclyde
BAS:Bayesian Variable Selection and Model Averaging using Bayesian Adaptive Sampling
Package for Bayesian Variable Selection and Model Averaging in linear models and generalized linear models using stochastic or deterministic sampling without replacement from posterior distributions. Prior distributions on coefficients are from Zellner's g-prior or mixtures of g-priors corresponding to the Zellner-Siow Cauchy Priors or the mixture of g-priors from Liang et al (2008) <DOI:10.1198/016214507000001337> for linear models or mixtures of g-priors from Li and Clyde (2019) <DOI:10.1080/01621459.2018.1469992> in generalized linear models. Other model selection criteria include AIC, BIC and Empirical Bayes estimates of g. Sampling probabilities may be updated based on the sampled models using sampling w/out replacement or an efficient MCMC algorithm which samples models using a tree structure of the model space as an efficient hash table. See Clyde, Ghosh and Littman (2010) <DOI:10.1198/jcgs.2010.09049> for details on the sampling algorithms. Uniform priors over all models or beta-binomial prior distributions on model size are allowed, and for large p truncated priors on the model space may be used to enforce sampling models that are full rank. The user may force variables to always be included in addition to imposing constraints that higher order interactions are included only if their parents are included in the model. This material is based upon work supported by the National Science Foundation under Division of Mathematical Sciences grant 1106891. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Maintained by Merlise Clyde. Last updated 4 months ago.
bayesianbayesian-inferencegeneralized-linear-modelslinear-regressionlogistic-regressionmcmcmodel-selectionpoisson-regressionpredictive-modelingregressionvariable-selectionfortranopenblas
20.5 match 44 stars 10.81 score 420 scripts 3 dependentsehrlinger
ggRandomForests:Visually Exploring Random Forests
Graphic elements for exploring Random Forests using the 'randomForest' or 'randomForestSRC' package for survival, regression and classification forests and 'ggplot2' package plotting.
Maintained by John Ehrlinger. Last updated 5 days ago.
24.7 match 148 stars 8.94 score 197 scriptskaz-yos
tableone:Create 'Table 1' to Describe Baseline Characteristics with or without Propensity Score Weights
Creates 'Table 1', i.e., description of baseline patient characteristics, which is essential in every medical research. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. Weighted data are supported via the 'survey' package.
Maintained by Kazuki Yoshida. Last updated 3 years ago.
baseline-characteristicsdescriptive-statisticsstatistics
16.2 match 221 stars 13.55 score 2.3k scripts 12 dependentsoscarkjell
text:Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning
Link R with Transformers from Hugging Face to transform text variables to word embeddings; where the word embeddings are used to statistically test the mean difference between set of texts, compute semantic similarity scores between texts, predict numerical variables, and visual statistically significant words according to various dimensions etc. For more information see <https://www.r-text.org>.
Maintained by Oscar Kjell. Last updated 4 days ago.
deep-learningmachine-learningnlptransformersopenjdk
16.1 match 146 stars 13.16 score 436 scripts 1 dependentsgeomorphr
geomorph:Geometric Morphometric Analyses of 2D and 3D Landmark Data
Read, manipulate, and digitize landmark data, generate shape variables via Procrustes analysis for points, curves and surfaces, perform shape analyses, and provide graphical depictions of shapes and patterns of shape variation.
Maintained by Dean Adams. Last updated 1 months ago.
17.6 match 76 stars 12.05 score 700 scripts 6 dependentstrackerproject
trackeR:Infrastructure for Running, Cycling and Swimming Data from GPS-Enabled Tracking Devices
Provides infrastructure for handling running, cycling and swimming data from GPS-enabled tracking devices within R. The package provides methods to extract, clean and organise workout and competition data into session-based and unit-aware data objects of class 'trackeRdata' (S3 class). The information can then be visualised, summarised, and analysed through flexible and extensible methods. Frick and Kosmidis (2017) <doi: 10.18637/jss.v082.i07>, which is updated and maintained as one of the vignettes, provides detailed descriptions of the package and its methods, and real-data demonstrations of the package functionality.
Maintained by Ioannis Kosmidis. Last updated 1 years ago.
32.4 match 90 stars 6.37 score 58 scripts 1 dependentseasystats
insight:Easy Access to Model Information for Various Model Objects
A tool to provide an easy, intuitive and consistent access to information contained in various R models, like model formulas, model terms, information about random effects, data that was used to fit the model or data from response variables. 'insight' mainly revolves around two types of functions: Functions that find (the names of) information, starting with 'find_', and functions that get the underlying data, starting with 'get_'. The package has a consistent syntax and works with many different model objects, where otherwise functions to access these information are missing.
Maintained by Daniel Lüdecke. Last updated 5 days ago.
easystatshacktoberfestinsightmodelsnamespredictorsrandom
11.9 match 412 stars 17.24 score 568 scripts 210 dependentssollano
forestmangr:Forest Mensuration and Management
Processing forest inventory data with methods such as simple random sampling, stratified random sampling and systematic sampling. There are also functions for yield and growth predictions and model fitting, linear and nonlinear grouped data fitting, and statistical tests. References: Kershaw Jr., Ducey, Beers and Husch (2016). <doi:10.1002/9781118902028>.
Maintained by Sollano Rabelo Braga. Last updated 4 months ago.
25.7 match 17 stars 7.97 score 378 scriptsr-a-dobson
dynamicSDM:Species Distribution and Abundance Modelling at High Spatio-Temporal Resolution
A collection of novel tools for generating species distribution and abundance models (SDM) that are dynamic through both space and time. These highly flexible functions incorporate spatial and temporal aspects across key SDM stages; including when cleaning and filtering species occurrence data, generating pseudo-absence records, assessing and correcting sampling biases and autocorrelation, extracting explanatory variables and projecting distribution patterns. Throughout, functions utilise Google Earth Engine and Google Drive to minimise the computing power and storage demands associated with species distribution modelling at high spatio-temporal resolution.
Maintained by Rachel Dobson. Last updated 26 days ago.
dynamicsdmgoogle-earth-enginegoogledrivesdmspatiotemporalspatiotemporal-data-analysisspatiotemporal-forecastingspecies-distribution-modellingspecies-distributions
32.6 match 6 stars 6.16 score 20 scriptsmountainmath
cancensus:Access, Retrieve, and Work with Canadian Census Data and Geography
Integrated, convenient, and uniform access to Canadian Census data and geography retrieved using the 'CensusMapper' API. This package produces analysis-ready tidy data frames and spatial data in multiple formats, as well as convenience functions for working with Census variables, variable hierarchies, and region selection. API keys are freely available with free registration at <https://censusmapper.ca/api>. Census data and boundary geometries are reproduced and distributed on an "as is" basis with the permission of Statistics Canada (Statistics Canada 2001; 2006; 2011; 2016; 2021).
Maintained by Dmitry Shkolnik. Last updated 1 years ago.
22.8 match 82 stars 8.80 score 414 scriptsr-causal
ggdag:Analyze and Create Elegant Directed Acyclic Graphs
Tidy, analyze, and plot directed acyclic graphs (DAGs). 'ggdag' is built on top of 'dagitty', an R package that uses the 'DAGitty' web tool (<https://dagitty.net/>) for creating and analyzing DAGs. 'ggdag' makes it easy to tidy and plot 'dagitty' objects using 'ggplot2' and 'ggraph', as well as common analytic and graphical functions, such as determining adjustment sets and node relationships.
Maintained by Malcolm Barrett. Last updated 8 months ago.
causal-inferencedagggplot-extension
17.0 match 443 stars 11.78 score 1.8k scripts 5 dependentschavent
ClustOfVar:Clustering of Variables
Cluster analysis of a set of variables. Variables can be quantitative, qualitative or a mixture of both.
Maintained by Marie Chavent. Last updated 5 years ago.
30.8 match 7 stars 6.47 score 142 scripts 2 dependentsmodeloriented
vivo:Variable Importance via Oscillations
Provides an easy to calculate local variable importance measure based on Ceteris Paribus profile and global variable importance measure based on Partial Dependence Profiles.
Maintained by Anna Kozak. Last updated 4 years ago.
explainable-aiexplainable-artificial-intelligenceexplainable-mlimlinterpretable-machine-learningvariable-importancexai
36.5 match 14 stars 5.45 score 7 scriptsdanchaltiel
crosstable:Crosstables for Descriptive Analyses
Create descriptive tables for continuous and categorical variables. Apply summary statistics and counting function, with or without a grouping variable, and create beautiful reports using 'rmarkdown' or 'officer'. You can also compute effect sizes and statistical tests if needed.
Maintained by Dan Chaltiel. Last updated 2 months ago.
descriptive-statisticsflextablefrequency-tablehtml-reportmswordofficer
19.1 match 116 stars 10.37 score 340 scriptsecmerkle
blavaan:Bayesian Latent Variable Analysis
Fit a variety of Bayesian latent variable models, including confirmatory factor analysis, structural equation models, and latent growth curve models. References: Merkle & Rosseel (2018) <doi:10.18637/jss.v085.i04>; Merkle et al. (2021) <doi:10.18637/jss.v100.i06>.
Maintained by Edgar Merkle. Last updated 4 days ago.
bayesian-statisticsfactor-analysisgrowth-curve-modelslatent-variablesmissing-datamultilevel-modelsmultivariate-analysispath-analysispsychometricsstatistical-modelingstructural-equation-modelingcpp
18.3 match 92 stars 10.84 score 183 scripts 3 dependentsclewerenz
ilabelled:Simple Handling of Labelled Data
Simple handling of survey data. Smart handling of meta-information like e.g. variable-labels value-labels and scale-levels. Easy access and validation of meta-information. Useage of value labels and values respectively for subsetting and recoding data.
Maintained by Christof Lewerenz. Last updated 2 months ago.
32.6 match 2 stars 6.02 score 13 scriptsrpolars
polars:Lightning-Fast 'DataFrame' Library
Lightning-fast 'DataFrame' library written in 'Rust'. Convert R data to 'Polars' data and vice versa. Perform fast, lazy, larger-than-memory and optimized data queries. 'Polars' is interoperable with the package 'arrow', as both are based on the 'Apache Arrow' Columnar Format.
Maintained by Soren Welling. Last updated 3 days ago.
16.3 match 499 stars 12.01 score 1.0k scripts 2 dependentsrfastofficial
Rfast2:A Collection of Efficient and Extremely Fast R Functions II
A collection of fast statistical and utility functions for data analysis. Functions for regression, maximum likelihood, column-wise statistics and many more have been included. C++ has been utilized to speed up the functions. References: Tsagris M., Papadakis M. (2018). Taking R to its limits: 70+ tips. PeerJ Preprints 6:e26605v1 <doi:10.7287/peerj.preprints.26605v1>.
Maintained by Manos Papadakis. Last updated 1 years ago.
23.6 match 38 stars 8.09 score 75 scripts 26 dependentspsychmeta
psychmeta:Psychometric Meta-Analysis Toolkit
Tools for computing bare-bones and psychometric meta-analyses and for generating psychometric data for use in meta-analysis simulations. Supports bare-bones, individual-correction, and artifact-distribution methods for meta-analyzing correlations and d values. Includes tools for converting effect sizes, computing sporadic artifact corrections, reshaping meta-analytic databases, computing multivariate corrections for range variation, and more. Bugs can be reported to <https://github.com/psychmeta/psychmeta/issues> or <issues@psychmeta.com>.
Maintained by Jeffrey A. Dahlke. Last updated 9 months ago.
hacktoberfestmeta-analysispsychologypsychometricpsychometrics
22.5 match 57 stars 8.25 score 151 scriptsgavinsimpson
gratia:Graceful 'ggplot'-Based Graphics and Other Functions for GAMs Fitted Using 'mgcv'
Graceful 'ggplot'-based graphics and utility functions for working with generalized additive models (GAMs) fitted using the 'mgcv' package. Provides a reimplementation of the plot() method for GAMs that 'mgcv' provides, as well as 'tidyverse' compatible representations of estimated smooths.
Maintained by Gavin L. Simpson. Last updated 10 hours ago.
distributional-regressiongamgammgeneralized-additive-mixed-modelsgeneralized-additive-modelsggplot2glmlmmgcvpenalized-splinerandom-effectssmoothingsplines
14.3 match 217 stars 12.99 score 1.6k scripts 2 dependentspauljohn32
rockchalk:Regression Estimation and Presentation
A collection of functions for interpretation and presentation of regression analysis. These functions are used to produce the statistics lectures in <https://pj.freefaculty.org/guides/>. Includes regression diagnostics, regression tables, and plots of interactions and "moderator" variables. The emphasis is on "mean-centered" and "residual-centered" predictors. The vignette 'rockchalk' offers a fairly comprehensive overview. The vignette 'Rstyle' has advice about coding in R. The package title 'rockchalk' refers to our school motto, 'Rock Chalk Jayhawk, Go K.U.'.
Maintained by Paul E. Johnson. Last updated 3 years ago.
26.0 match 7.13 score 584 scripts 18 dependentspletschm
aldvmm:Adjusted Limited Dependent Variable Mixture Models
The goal of the package 'aldvmm' is to fit adjusted limited dependent variable mixture models of health state utilities. Adjusted limited dependent variable mixture models are finite mixtures of normal distributions with an accumulation of density mass at the limits, and a gap between 100% quality of life and the next smaller utility value. The package 'aldvmm' uses the likelihood and expected value functions proposed by Hernandez Alava and Wailoo (2015) <doi:10.1177/1536867X1501500307> using normal component distributions and a multinomial logit model of probabilities of component membership.
Maintained by Mark Pletscher. Last updated 1 years ago.
clinical-trialscost-effectivenesseq5dfinite-mixturehealth-economicshtahuilimited-dependent-variablemappingmixture-modelpatient-reported-outcomesquality-of-lifeutilities
41.6 match 5 stars 4.40 score 2 scriptshofnerb
stabs:Stability Selection with Error Control
Resampling procedures to assess the stability of selected variables with additional finite sample error control for high-dimensional variable selection procedures such as Lasso or boosting. Both, standard stability selection (Meinshausen & Buhlmann, 2010, <doi:10.1111/j.1467-9868.2010.00740.x>) and complementary pairs stability selection with improved error bounds (Shah & Samworth, 2013, <doi:10.1111/j.1467-9868.2011.01034.x>) are implemented. The package can be combined with arbitrary user specified variable selection approaches.
Maintained by Benjamin Hofner. Last updated 4 years ago.
machine-learningr-languageresamplingstability-selectionvariable-importancevariable-selection
19.0 match 26 stars 9.59 score 53 scripts 31 dependentsbsvars
bsvars:Bayesian Estimation of Structural Vector Autoregressive Models
Provides fast and efficient procedures for Bayesian analysis of Structural Vector Autoregressions. This package estimates a wide range of models, including homo-, heteroskedastic, and non-normal specifications. Structural models can be identified by adjustable exclusion restrictions, time-varying volatility, or non-normality. They all include a flexible three-level equation-specific local-global hierarchical prior distribution for the estimated level of shrinkage for autoregressive and structural parameters. Additionally, the package facilitates predictive and structural analyses such as impulse responses, forecast error variance and historical decompositions, forecasting, verification of heteroskedasticity, non-normality, and hypotheses on autoregressive parameters, as well as analyses of structural shocks, volatilities, and fitted values. Beautiful plots, informative summary functions, and extensive documentation including the vignette by Woźniak (2024) <doi:10.48550/arXiv.2410.15090> complement all this. The implemented techniques align closely with those presented in Lütkepohl, Shang, Uzeda, & Woźniak (2024) <doi:10.48550/arXiv.2404.11057>, Lütkepohl & Woźniak (2020) <doi:10.1016/j.jedc.2020.103862>, and Song & Woźniak (2021) <doi:10.1093/acrefore/9780190625979.013.174>. The 'bsvars' package is aligned regarding objects, workflows, and code structure with the R package 'bsvarSIGNs' by Wang & Woźniak (2024) <doi:10.32614/CRAN.package.bsvarSIGNs>, and they constitute an integrated toolset.
Maintained by Tomasz Woźniak. Last updated 1 months ago.
bayesian-inferenceeconometricsvector-autoregressionopenblascppopenmp
23.5 match 46 stars 7.67 score 32 scripts 1 dependentstidyverse
forcats:Tools for Working with Categorical Variables (Factors)
Helpers for reordering factor levels (including moving specified levels to front, ordering by first appearance, reversing, and randomly shuffling), and tools for modifying factor levels (including collapsing rare levels into other, 'anonymising', and manually 'recoding').
Maintained by Hadley Wickham. Last updated 1 years ago.
9.5 match 555 stars 18.77 score 21k scripts 1.2k dependentsbioc
sva:Surrogate Variable Analysis
The sva package contains functions for removing batch effects and other unwanted variation in high-throughput experiment. Specifically, the sva package contains functions for the identifying and building surrogate variables for high-dimensional data sets. Surrogate variables are covariates constructed directly from high-dimensional data (like gene expression/RNA sequencing/methylation/brain imaging data) that can be used in subsequent analyses to adjust for unknown, unmodeled, or latent sources of noise. The sva package can be used to remove artifacts in three ways: (1) identifying and estimating surrogate variables for unknown sources of variation in high-throughput experiments (Leek and Storey 2007 PLoS Genetics,2008 PNAS), (2) directly removing known batch effects using ComBat (Johnson et al. 2007 Biostatistics) and (3) removing batch effects with known control probes (Leek 2014 biorXiv). Removing batch effects and using surrogate variables in differential expression analysis have been shown to reduce dependence, stabilize error rate estimates, and improve reproducibility, see (Leek and Storey 2007 PLoS Genetics, 2008 PNAS or Leek et al. 2011 Nat. Reviews Genetics).
Maintained by Jeffrey T. Leek. Last updated 5 months ago.
immunooncologymicroarraystatisticalmethodpreprocessingmultiplecomparisonsequencingrnaseqbatcheffectnormalization
17.8 match 10.05 score 3.2k scripts 50 dependentsoverton-group
eHDPrep:Quality Control and Semantic Enrichment of Datasets
A tool for the preparation and enrichment of health datasets for analysis (Toner et al. (2023) <doi:10.1093/gigascience/giad030>). Provides functionality for assessing data quality and for improving the reliability and machine interpretability of a dataset. 'eHDPrep' also enables semantic enrichment of a dataset where metavariables are discovered from the relationships between input variables determined from user-provided ontologies.
Maintained by Ian Overton. Last updated 2 years ago.
data-qualityhealth-informaticssemantic-enrichment
36.0 match 8 stars 4.90 score 10 scriptsr-forge
car:Companion to Applied Regression
Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, 2019.
Maintained by John Fox. Last updated 5 months ago.
11.5 match 15.29 score 43k scripts 901 dependentsindrajeetpatil
ggstatsplot:'ggplot2' Based Plots with Statistical Details
Extension of 'ggplot2', 'ggstatsplot' creates graphics with details from statistical tests included in the plots themselves. It provides an easier syntax to generate information-rich plots for statistical analysis of continuous (violin plots, scatterplots, histograms, dot plots, dot-and-whisker plots) or categorical (pie and bar charts) data. Currently, it supports the most common types of statistical approaches and tests: parametric, nonparametric, robust, and Bayesian versions of t-test/ANOVA, correlation analyses, contingency table analysis, meta-analysis, and regression analyses. References: Patil (2021) <doi:10.21105/joss.03236>.
Maintained by Indrajeet Patil. Last updated 20 days ago.
bayes-factorsdatasciencedatavizeffect-sizeggplot-extensionhypothesis-testingnon-parametric-statisticsregression-modelsstatistical-analysis
12.1 match 2.1k stars 14.49 score 3.0k scripts 1 dependentsstatisticsnorway
SSBtools:Algorithms and Tools for Tabular Statistics and Hierarchical Computations
Includes general data manipulation functions, algorithms for statistical disclosure control (Langsrud, 2024) <doi:10.1007/978-3-031-69651-0_6> and functions for hierarchical computations by sparse model matrices (Langsrud, 2023) <doi:10.32614/RJ-2023-088>.
Maintained by Øyvind Langsrud. Last updated 2 days ago.
23.0 match 7 stars 7.62 score 68 scripts 7 dependentsalexanderrobitzsch
miceadds:Some Additional Multiple Imputation Functions, Especially for 'mice'
Contains functions for multiple imputation which complements existing functionality in R. In particular, several imputation methods for the mice package (van Buuren & Groothuis-Oudshoorn, 2011, <doi:10.18637/jss.v045.i03>) are implemented. Main features of the miceadds package include plausible value imputation (Mislevy, 1991, <doi:10.1007/BF02294457>), multilevel imputation for variables at any level or with any number of hierarchical and non-hierarchical levels (Grund, Luedtke & Robitzsch, 2018, <doi:10.1177/1094428117703686>; van Buuren, 2018, Ch.7, <doi:10.1201/9780429492259>), imputation using partial least squares (PLS) for high dimensional predictors (Robitzsch, Pham & Yanagida, 2016), nested multiple imputation (Rubin, 2003, <doi:10.1111/1467-9574.00217>), substantive model compatible imputation (Bartlett et al., 2015, <doi:10.1177/0962280214521348>), and features for the generation of synthetic datasets (Reiter, 2005, <doi:10.1111/j.1467-985X.2004.00343.x>; Nowok, Raab, & Dibben, 2016, <doi:10.18637/jss.v074.i11>).
Maintained by Alexander Robitzsch. Last updated 16 days ago.
missing-datamultiple-imputationopenblascpp
19.1 match 16 stars 9.16 score 542 scripts 9 dependentscardiomoon
autoReg:Automatic Linear and Logistic Regression and Survival Analysis
Make summary tables for descriptive statistics and select explanatory variables automatically in various regression models. Support linear models, generalized linear models and cox-proportional hazard models. Generate publication-ready tables summarizing result of regression analysis and plots. The tables and plots can be exported in "HTML", "pdf('LaTex')", "docx('MS Word')" and "pptx('MS Powerpoint')" documents.
Maintained by Keon-Woong Moon. Last updated 1 years ago.
25.0 match 47 stars 7.00 score 69 scriptsedwinkipruto
mfp2:Multivariable Fractional Polynomial Models with Extensions
Multivariable fractional polynomial algorithm simultaneously selects variables and functional forms in both generalized linear models and Cox proportional hazard models. Key references are Royston and Altman (1994) <doi:10.2307/2986270> and Royston and Sauerbrei (2008, ISBN:978-0-470-02842-1). In addition, it can model a sigmoid relationship between variable x and an outcome variable y using the approximate cumulative distribution transformation proposed by Royston (2014) <doi:10.1177/1536867X1401400206>. This feature distinguishes it from a standard fractional polynomial function, which lacks the ability to achieve such modeling.
Maintained by Edwin Kipruto. Last updated 10 months ago.
33.2 match 3 stars 5.26 score 4 scripts 2 dependentsmyles-lewis
nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'
Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.
Maintained by Myles Lewis. Last updated 6 days ago.
21.9 match 12 stars 7.92 score 46 scriptstiledb-inc
tiledb:Modern Database Engine for Complex Data Based on Multi-Dimensional Arrays
The modern database 'TileDB' introduces a powerful on-disk format for storing and accessing any complex data based on multi-dimensional arrays. It supports dense and sparse arrays, dataframes and key-values stores, cloud storage ('S3', 'GCS', 'Azure'), chunked arrays, multiple compression, encryption and checksum filters, uses a fully multi-threaded implementation, supports parallel I/O, data versioning ('time travel'), metadata and groups. It is implemented as an embeddable cross-platform C++ library with APIs from several languages, and integrations. This package provides the R support.
Maintained by Isaiah Norton. Last updated 4 days ago.
arrayhdfss3storage-managertiledbcpp
14.4 match 107 stars 11.96 score 306 scripts 4 dependentsuupharmacometrics
xpose:Diagnostics for Pharmacometric Models
Diagnostics for non-linear mixed-effects (population) models from 'NONMEM' <https://www.iconplc.com/solutions/technologies/nonmem/>. 'xpose' facilitates data import, creation of numerical run summary and provide 'ggplot2'-based graphics for data exploration and model diagnostics.
Maintained by Benjamin Guiastrennec. Last updated 2 months ago.
diagnosticsggplot2nonmempharmacometricsxpose
15.7 match 62 stars 11.02 score 183 scripts 6 dependentszeileis
ivreg:Instrumental-Variables Regression by '2SLS', '2SM', or '2SMM', with Diagnostics
Instrumental variable estimation for linear models by two-stage least-squares (2SLS) regression or by robust-regression via M-estimation (2SM) or MM-estimation (2SMM). The main ivreg() model-fitting function is designed to provide a workflow as similar as possible to standard lm() regression. A wide range of methods is provided for fitted ivreg model objects, including extensive functionality for computing and graphing regression diagnostics in addition to other standard model tools.
Maintained by Achim Zeileis. Last updated 2 months ago.
instrumental-variablesregression-diagnosticstwo-stage-least-squares-regression
16.8 match 20 stars 10.24 score 360 scripts 4 dependentsluckinet
tabshiftr:Reshape Disorganised Messy Data
Helps the user to build and register schema descriptions of disorganised (messy) tables. Disorganised tables are tables that are not in a topologically coherent form, where packages such as 'tidyr' could be used for reshaping. The schema description documents the arrangement of input tables and is used to reshape them into a standardised (tidy) output format.
Maintained by Steffen Ehrmann. Last updated 30 days ago.
data-managementdata-reshapingschemas
23.7 match 6 stars 7.18 score 62 scripts 1 dependentspik-piam
remind2:The REMIND R package (2nd generation)
Contains the REMIND-specific routines for data and model output manipulation.
Maintained by Renato Rodrigues. Last updated 6 days ago.
19.1 match 8.88 score 161 scripts 5 dependentsnixtla
nixtlar:A Software Development Kit for 'Nixtla''s 'TimeGPT'
A Software Development Kit for working with 'Nixtla''s 'TimeGPT', a foundation model for time series forecasting. 'API' is an acronym for 'application programming interface'; this package allows users to interact with 'TimeGPT' via the 'API'. You can set and validate 'API' keys and generate forecasts via 'API' calls. It is compatible with 'tsibble' and base R. For more details visit <https://docs.nixtla.io/>.
Maintained by Mariana Menchero. Last updated 28 days ago.
20.7 match 30 stars 8.16 score 38 scriptsnbarrowman
vtree:Display Information About Nested Subsets of a Data Frame
A tool for calculating and drawing "variable trees". Variable trees display information about nested subsets of a data frame.
Maintained by Nick Barrowman. Last updated 1 days ago.
data-sciencedata-visualizationexploratory-data-analysisstatistics
23.7 match 76 stars 7.09 score 65 scriptshrbrmstr
hrbrthemes:Additional Themes, Theme Components and Utilities for 'ggplot2'
A compilation of extra 'ggplot2' themes, scales and utilities, including a spell check function for plot label fields and an overall emphasis on typography. A copy of the 'Google' font 'Roboto Condensed' is also included.
Maintained by Bob Rudis. Last updated 2 days ago.
data-visualizationdatavisualizationggplot-extensionggplot2ggplot2-scalesggplot2-themesvisualization
12.0 match 1.3k stars 13.92 score 13k scripts 15 dependentsstan-dev
rstanarm:Bayesian Applied Regression Modeling via Stan
Estimates previously compiled regression models using the 'rstan' package, which provides the R interface to the Stan C++ library for Bayesian estimation. Users specify models via the customary R syntax with a formula and data.frame plus some additional arguments for priors.
Maintained by Ben Goodrich. Last updated 9 months ago.
bayesianbayesian-data-analysisbayesian-inferencebayesian-methodsbayesian-statisticsmultilevel-modelsrstanrstanarmstanstatistical-modelingcpp
10.7 match 393 stars 15.68 score 5.0k scripts 13 dependentsbrian-j-smith
MachineShop:Machine Learning Models and Tools
Meta-package for statistical and machine learning with a unified interface for model fitting, prediction, performance assessment, and presentation of results. Approaches for model fitting and prediction of numerical, categorical, or censored time-to-event outcomes include traditional regression models, regularization methods, tree-based methods, support vector machines, neural networks, ensembles, data preprocessing, filtering, and model tuning and selection. Performance metrics are provided for model assessment and can be estimated with independent test sets, split sampling, cross-validation, or bootstrap resampling. Resample estimation can be executed in parallel for faster processing and nested in cases of model tuning and selection. Modeling results can be summarized with descriptive statistics; calibration curves; variable importance; partial dependence plots; confusion matrices; and ROC, lift, and other performance curves.
Maintained by Brian J Smith. Last updated 7 months ago.
classification-modelsmachine-learningpredictive-modelingregression-modelssurvival-models
20.8 match 61 stars 7.95 score 121 scriptsamices
mice:Multivariate Imputation by Chained Equations
Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
Maintained by Stef van Buuren. Last updated 6 days ago.
chained-equationsfcsimputationmicemissing-datamissing-valuesmultiple-imputationmultivariate-datacpp
10.0 match 462 stars 16.50 score 10k scripts 154 dependentsstatistikat
VIM:Visualization and Imputation of Missing Values
New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on this structure of the missing values, the corresponding methods may help to identify the mechanism generating the missing values and allows to explore the data including missing values. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface available in the separate package VIMGUI allows an easy handling of the implemented plot methods.
Maintained by Matthias Templ. Last updated 7 months ago.
hotdeckimputation-methodsmodel-predictionsvisualizationcpp
11.3 match 85 stars 14.44 score 2.6k scripts 19 dependentsmodeloriented
DALEX:moDel Agnostic Language for Exploration and eXplanation
Any unverified black box model is the path to failure. Opaqueness leads to distrust. Distrust leads to ignoration. Ignoration leads to rejection. DALEX package xrays any model and helps to explore and explain its behaviour. Machine Learning (ML) models are widely used and have various applications in classification or regression. Models created with boosting, bagging, stacking or similar techniques are often used due to their high performance. But such black-box models usually lack direct interpretability. DALEX package contains various methods that help to understand the link between input variables and model output. Implemented methods help to explore the model on the level of a single instance as well as a level of the whole dataset. All model explainers are model agnostic and can be compared across different models. DALEX package is the cornerstone for 'DrWhy.AI' universe of packages for visual model exploration. Find more details in (Biecek 2018) <https://jmlr.org/papers/v19/18-416.html>.
Maintained by Przemyslaw Biecek. Last updated 1 months ago.
black-boxdalexdata-scienceexplainable-aiexplainable-artificial-intelligenceexplainable-mlexplanationsexplanatory-model-analysisfairnessimlinterpretabilityinterpretable-machine-learningmachine-learningmodel-visualizationpredictive-modelingresponsible-airesponsible-mlxai
12.2 match 1.4k stars 13.40 score 876 scripts 21 dependentsajsims1704
rdecision:Decision Analytic Modelling in Health Economics
Classes and functions for modelling health care interventions using decision trees and semi-Markov models. Mechanisms are provided for associating an uncertainty distribution with each source variable and for ensuring transparency of the mathematical relationships between variables. The package terminology follows Briggs "Decision Modelling for Health Economic Evaluation" (2006, ISBN:978-0-19-852662-9).
Maintained by Andrew Sims. Last updated 1 months ago.
25.2 match 3 stars 6.46 score 22 scriptsmicrosoft
wpa:Tools for Analysing and Visualising Viva Insights Data
Opinionated functions that enable easier and faster analysis of Viva Insights data. There are three main types of functions in 'wpa': (i) Standard functions create a 'ggplot' visual or a summary table based on a specific Viva Insights metric; (2) Report Generation functions generate HTML reports on a specific analysis area, e.g. Collaboration; (3) Other miscellaneous functions cover more specific applications (e.g. Subject Line text mining) of Viva Insights data. This package adheres to 'tidyverse' principles and works well with the pipe syntax. 'wpa' is built with the beginner-to-intermediate R users in mind, and is optimised for simplicity.
Maintained by Martin Chan. Last updated 4 months ago.
24.2 match 30 stars 6.69 score 39 scripts 1 dependentsgergness
srvyr:'dplyr'-Like Syntax for Summary Statistics of Survey Data
Use piping, verbs like 'group_by' and 'summarize', and other 'dplyr' inspired syntactic style when calculating summary statistics on survey data using functions from the 'survey' package.
Maintained by Greg Freedman Ellis. Last updated 1 months ago.
11.6 match 215 stars 13.88 score 1.8k scripts 15 dependentssatijalab
Seurat:Tools for Single Cell Genomics
A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031>, and Hao, Hao, et al (2020) <doi:10.1101/2020.10.12.335331> for more details.
Maintained by Paul Hoffman. Last updated 1 years ago.
human-cell-atlassingle-cell-genomicssingle-cell-rna-seqcpp
9.4 match 2.4k stars 16.86 score 50k scripts 73 dependentsmaelstrom-research
madshapR:Support Technical Processes Following 'Maelstrom Research' Standards
Functions to support rigorous processes in data cleaning, evaluation, and documentation across datasets from different studies based on Maelstrom Research guidelines. The package includes the core functions to evaluate and format the main inputs that define the process, diagnose errors, and summarize and evaluate datasets and their associated data dictionaries. The main outputs are clean datasets and associated metadata, and tabular and visual summary reports. As described in Maelstrom Research guidelines for rigorous retrospective data harmonization (Fortier I and al. (2017) <doi:10.1093/ije/dyw075>).
Maintained by Guillaume Fabre. Last updated 11 months ago.
29.4 match 2 stars 5.40 score 28 scripts 3 dependentspauljohn32
kutils:Project Management Tools
Tools for data importation, recoding, and inspection. There are functions to create new project folders, R code templates, create uniquely named output directories, and to quickly obtain a visual summary for each variable in a data frame. The main feature here is the systematic implementation of the "variable key" framework for data importation and recoding. We are eager to have community feedback about the variable key and the vignette about it. In version 1.7, the function 'semTable' is removed. It was deprecated since 1.67. That is provided in a separate package, 'semTable'.
Maintained by Paul Johnson. Last updated 1 years ago.
27.0 match 5.85 score 110 scripts 20 dependentsinsightsengineering
chevron:Standard TLGs for Clinical Trials Reporting
Provide standard tables, listings, and graphs (TLGs) libraries used in clinical trials. This package implements a structure to reformat the data with 'dunlin', create reporting tables using 'rtables' and 'tern' with standardized input arguments to enable quick generation of standard outputs. In addition, it also provides comprehensive data checks and script generation functionality.
Maintained by Joe Zhu. Last updated 24 days ago.
clinical-trialsgraphslistingsnestreportingtables
19.0 match 12 stars 8.24 score 12 scriptsnicolas-robette
descriptio:Descriptive Statistical Analysis
Description of statistical associations between variables : measures of local and global association between variables (phi, Cramér V, correlations, eta-squared, Goodman and Kruskal tau, permutation tests, etc.), multiple graphical representations of the associations between variables (using 'ggplot2') and weighted statistics.
Maintained by Nicolas Robette. Last updated 6 months ago.
31.0 match 4 stars 5.00 score 11 scripts 3 dependentsinsightsengineering
teal.modules.general:General Modules for 'teal' Applications
Prebuilt 'shiny' modules containing tools for viewing data, visualizing data, understanding missing and outlier values within your data and performing simple data analysis. This extends 'teal' framework that supports reproducible research and analysis.
Maintained by Dawid Kaledkowski. Last updated 16 days ago.
general-purposemodulesnestshiny
15.9 match 12 stars 9.76 score 71 scriptspik-piam
piamInterfaces:Project specific interfaces to REMIND / MAgPIE
Project specific interfaces to REMIND / MAgPIE.
Maintained by Falk Benke. Last updated 2 days ago.
23.0 match 6.63 score 38 scripts 7 dependentscalcita
ech:Downloading and Processing Microdata from ECH-INE (Uruguay)
A consistent tool for downloading ECH data, processing them and generating new indicators: poverty, education, employment, etc. All data are downloaded from the official site of the National Institute of Statistics at <https://www.gub.uy/instituto-nacional-estadistica/datos-y-estadisticas/encuestas/encuesta-continua-hogares>.
Maintained by Gabriela Mathieu. Last updated 1 years ago.
24.7 match 16 stars 6.15 score 22 scriptsjtextor
dagitty:Graphical Analysis of Structural Causal Models
A port of the web-based software 'DAGitty', available at <https://dagitty.net>, for analyzing structural causal models (also known as directed acyclic graphs or DAGs). This package computes covariate adjustment sets for estimating causal effects, enumerates instrumental variables, derives testable implications (d-separation and vanishing tetrads), generates equivalent models, and includes a simple facility for data simulation.
Maintained by Johannes Textor. Last updated 3 months ago.
11.8 match 302 stars 12.83 score 1.7k scripts 11 dependentstidyverse
tidyr:Tidy Messy Data
Tools to help to create tidy data, where each column is a variable, each row is an observation, and each cell contains a single value. 'tidyr' contains tools for changing the shape (pivoting) and hierarchy (nesting and 'unnesting') of a dataset, turning deeply nested lists into rectangular data frames ('rectangling'), and extracting values out of string columns. It also includes tools for working with missing values (both implicit and explicit).
Maintained by Hadley Wickham. Last updated 13 days ago.
6.6 match 1.4k stars 22.88 score 168k scripts 5.5k dependentspik-piam
mrremind:MadRat REMIND Input Data Package
The mrremind packages contains data preprocessing for the REMIND model.
Maintained by Lavinia Baumstark. Last updated 3 days ago.
24.1 match 4 stars 6.25 score 15 scripts 1 dependentsnathaneastwood
poorman:A Poor Man's Dependency Free Recreation of 'dplyr'
A replication of key functionality from 'dplyr' and the wider 'tidyverse' using only 'base'.
Maintained by Nathan Eastwood. Last updated 1 years ago.
base-rdata-manipulationgrammar
13.9 match 341 stars 10.79 score 156 scripts 27 dependentsmjlajeunesse
switchboard:An Agile Widget Engine for Real-Time, Dynamic Visualizations
An unsorted collection of visualization widgets rendered in 'Tcl/Tk'<https://www.tcl.tk/> to generate agile dashboards for your iterative simulations. Widgets include progress bars, counters, eavesdroppers, injectors, switches, and sliders for dynamic manipulation and visualization of simulation parameters.
Maintained by Marc J. Lajeunesse. Last updated 3 years ago.
30.1 match 18 stars 4.95 score 2 scriptswelch-lab
rliger:Linked Inference of Genomic Experimental Relationships
Uses an extension of nonnegative matrix factorization to identify shared and dataset-specific factors. See Welch J, Kozareva V, et al (2019) <doi:10.1016/j.cell.2019.05.006>, and Liu J, Gao C, Sodicoff J, et al (2020) <doi:10.1038/s41596-020-0391-8> for more details.
Maintained by Yichen Wang. Last updated 2 months ago.
nonnegative-matrix-factorizationsingle-cellopenblascpp
13.8 match 408 stars 10.77 score 334 scripts 1 dependentschristophergandrud
DataCombine:Tools for Easily Combining and Cleaning Data Sets
Tools for combining and cleaning data sets, particularly with grouped and time series data. This includes functions for merging data while reporting duplicates, filling in columns with values of a column in another data frame, and creating continuous time data for interupted time series.
Maintained by Christopher Gandrud. Last updated 5 years ago.
17.4 match 55 stars 8.50 score 864 scripts 3 dependentshannameyer
CAST:'caret' Applications for Spatial-Temporal Models
Supporting functionality to run 'caret' with spatial or spatial-temporal data. 'caret' is a frequently used package for model training and prediction using machine learning. CAST includes functions to improve spatial or spatial-temporal modelling tasks using 'caret'. It includes the newly suggested 'Nearest neighbor distance matching' cross-validation to estimate the performance of spatial prediction models and allows for spatial variable selection to selects suitable predictor variables in view to their contribution to the spatial model performance. CAST further includes functionality to estimate the (spatial) area of applicability of prediction models. Methods are described in Meyer et al. (2018) <doi:10.1016/j.envsoft.2017.12.001>; Meyer et al. (2019) <doi:10.1016/j.ecolmodel.2019.108815>; Meyer and Pebesma (2021) <doi:10.1111/2041-210X.13650>; Milà et al. (2022) <doi:10.1111/2041-210X.13851>; Meyer and Pebesma (2022) <doi:10.1038/s41467-022-29838-9>; Linnenbrink et al. (2023) <doi:10.5194/egusphere-2023-1308>; Schumacher et al. (2024) <doi:10.5194/egusphere-2024-2730>. The package is described in detail in Meyer et al. (2024) <doi:10.48550/arXiv.2404.06978>.
Maintained by Hanna Meyer. Last updated 2 months ago.
autocorrelationcaretfeature-selectionmachine-learningoverfittingpredictive-modelingspatialspatio-temporalvariable-selection
12.3 match 114 stars 11.97 score 298 scripts 1 dependentshusson
FactoMineR:Multivariate Exploratory Data Analysis and Data Mining
Exploratory data analysis methods to summarize, visualize and describe datasets. The main principal component methods are available, those with the largest potential in terms of applications: principal component analysis (PCA) when variables are quantitative, correspondence analysis (CA) and multiple correspondence analysis (MCA) when variables are categorical, Multiple Factor Analysis when variables are structured in groups, etc. and hierarchical cluster analysis. F. Husson, S. Le and J. Pages (2017).
Maintained by Francois Husson. Last updated 3 months ago.
10.0 match 47 stars 14.71 score 5.6k scripts 112 dependentslebebr01
simglm:Simulate Models Based on the Generalized Linear Model
Simulates regression models, including both simple regression and generalized linear mixed models with up to three level of nesting. Power simulations that are flexible allowing the specification of missing data, unbalanced designs, and different random error distributions are built into the package.
Maintained by Brandon LeBeau. Last updated 10 months ago.
18.7 match 43 stars 7.87 score 87 scriptsjokergoo
GetoptLong:Parsing Command-Line Arguments and Simple Variable Interpolation
This is a command-line argument parser which wraps the powerful Perl module Getopt::Long and with some adaptations for easier use in R. It also provides a simple way for variable interpolation in R.
Maintained by Zuguang Gu. Last updated 2 years ago.
13.3 match 17 stars 10.98 score 478 scripts 155 dependentskogalur
randomForestSRC:Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)
Fast OpenMP parallel computing of Breiman's random forests for univariate, multivariate, unsupervised, survival, competing risks, class imbalanced classification and quantile regression. New Mahalanobis splitting for correlated outcomes. Extreme random forests and randomized splitting. Suite of imputation methods for missing data. Fast random forests using subsampling. Confidence regions and standard errors for variable importance. New improved holdout importance. Case-specific importance. Minimal depth variable importance. Visualize trees on your Safari or Google Chrome browser. Anonymous random forests for data privacy.
Maintained by Udaya B. Kogalur. Last updated 2 months ago.
18.3 match 10 stars 7.90 score 1.2k scripts 12 dependentsdirkschumacher
ompr:Model and Solve Mixed Integer Linear Programs
Model mixed integer linear programs in an algebraic way directly in R. The model is solver-independent and thus offers the possibility to solve a model with different solvers. It currently only supports linear constraints and objective functions. See the 'ompr' website <https://dirkschumacher.github.io/ompr/> for more information, documentation and examples.
Maintained by Dirk Schumacher. Last updated 2 years ago.
integer-programminglinear-programmingmilpmipoptimization
17.4 match 268 stars 8.33 score 321 scripts 6 dependentsdgrun
RaceID:Identification of Cell Types, Inference of Lineage Trees, and Prediction of Noise Dynamics from Single-Cell RNA-Seq Data
Application of 'RaceID' allows inference of cell types and prediction of lineage trees by the 'StemID2' algorithm (Herman, J.S., Sagar, Grun D. (2018) <DOI:10.1038/nmeth.4662>). 'VarID2' is part of this package and allows quantification of biological gene expression noise at single-cell resolution (Rosales-Alvarez, R.E., Rettkowski, J., Herman, J.S., Dumbovic, G., Cabezas-Wallscheid, N., Grun, D. (2023) <DOI:10.1186/s13059-023-02974-1>).
Maintained by Dominic Grün. Last updated 4 months ago.
30.3 match 4.74 score 110 scriptsddsjoberg
gtsummary:Presentation-Ready Data Summary and Analytic Result Tables
Creates presentation-ready tables summarizing data sets, regression models, and more. The code to create the tables is concise and highly customizable. Data frames can be summarized with any function, e.g. mean(), median(), even user-written functions. Regression models are summarized and include the reference rows for categorical variables. Common regression models, such as logistic regression and Cox proportional hazards regression, are automatically identified and the tables are pre-filled with appropriate column headers.
Maintained by Daniel D. Sjoberg. Last updated 2 days ago.
easy-to-usegthtml5regression-modelsreproducibilityreproducible-researchstatisticssummary-statisticssummary-tablestable1tableone
8.4 match 1.1k stars 17.00 score 8.2k scripts 15 dependentspecanproject
PEcAn.utils:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by Rob Kooper. Last updated 2 days ago.
bayesiancyberinfrastructuredata-assimilationdata-scienceecosystem-modelecosystem-scienceforecastingmeta-analysisnational-science-foundationpecanplants
12.9 match 216 stars 10.92 score 218 scripts 35 dependentsjamiemkass
ENMeval:Automated Tuning and Evaluations of Ecological Niche Models
Runs ecological niche models over all combinations of user-defined settings (i.e., tuning), performs cross validation to evaluate models, and returns data tables to aid in selection of optimal model settings that balance goodness-of-fit and model complexity. Also has functions to partition data spatially (or not) for cross validation, to plot multiple visualizations of results, to run null models to estimate significance and effect sizes of performance metrics, and to calculate range overlap between model predictions, among others. The package was originally built for Maxent models (Phillips et al. 2006, Phillips et al. 2017), but the current version allows possible extensions for any modeling algorithm. The extensive vignette, which guides users through most package functionality but unfortunately has a file size too big for CRAN, can be found here on the package's Github Pages website: <https://jamiemkass.github.io/ENMeval/articles/ENMeval-2.0-vignette.html>.
Maintained by Jamie M. Kass. Last updated 2 months ago.
12.6 match 49 stars 11.25 score 332 scripts 2 dependentsquanteda
quanteda:Quantitative Analysis of Textual Data
A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
Maintained by Kenneth Benoit. Last updated 2 months ago.
corpusnatural-language-processingquantedatext-analyticsonetbbcpp
8.4 match 851 stars 16.68 score 5.4k scripts 51 dependentslaresbernardo
lares:Analytics & Machine Learning Sidekick
Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of family functions, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scrapper, it helps the analyst or data scientist to get quick and robust results, without the need of repetitive coding or advanced R programming skills.
Maintained by Bernardo Lares. Last updated 24 days ago.
analyticsapiautomationautomldata-sciencedescriptive-statisticsh2omachine-learningmarketingmmmpredictive-modelingpuzzlerlanguagerobynvisualization
14.2 match 233 stars 9.84 score 185 scripts 1 dependentstidyverts
tsibble:Tidy Temporal Data Frames and Tools
Provides a 'tbl_ts' class (the 'tsibble') for temporal data in an data- and model-oriented format. The 'tsibble' provides tools to easily manipulate and analyse temporal data, such as filling in time gaps and aggregating over calendar periods.
Maintained by Earo Wang. Last updated 2 months ago.
9.6 match 538 stars 14.47 score 4.4k scripts 42 dependentsevolecolgroup
pastclim:Manipulate Time Series of Climate Reconstructions
Methods to easily extract and manipulate climate reconstructions for ecological and anthropological analyses, as described in Leonardi et al. (2023) <doi:10.1111/ecog.06481>. The package includes datasets of palaeoclimate reconstructions, present observations, and future projections from multiple climate models.
Maintained by Andrea Manica. Last updated 3 days ago.
climate-datapaleoclimatespecies-distribution-modelling
17.0 match 38 stars 8.12 score 49 scriptspaul-buerkner
brms:Bayesian Regression Models using 'Stan'
Fit Bayesian generalized (non-)linear multivariate multilevel models using 'Stan' for full Bayesian inference. A wide range of distributions and link functions are supported, allowing users to fit -- among others -- linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, and even self-defined mixture models all in a multilevel context. Further modeling options include both theory-driven and data-driven non-linear terms, auto-correlation structures, censoring and truncation, meta-analytic standard errors, and quite a few more. In addition, all parameters of the response distribution can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their prior knowledge. Models can easily be evaluated and compared using several methods assessing posterior or prior predictions. References: Bürkner (2017) <doi:10.18637/jss.v080.i01>; Bürkner (2018) <doi:10.32614/RJ-2018-017>; Bürkner (2021) <doi:10.18637/jss.v100.i05>; Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>.
Maintained by Paul-Christian Bürkner. Last updated 3 days ago.
bayesian-inferencebrmsmultilevel-modelsstanstatistical-models
8.3 match 1.3k stars 16.61 score 13k scripts 34 dependentscjvanlissa
tidySEM:Tidy Structural Equation Modeling
A tidy workflow for generating, estimating, reporting, and plotting structural equation models using 'lavaan', 'OpenMx', or 'Mplus'. Throughout this workflow, elements of syntax, results, and graphs are represented as 'tidy' data, making them easy to customize. Includes functionality to estimate latent class analyses, and to plot 'dagitty' and 'igraph' objects.
Maintained by Caspar J. van Lissa. Last updated 7 days ago.
12.9 match 58 stars 10.69 score 330 scripts 1 dependentsalbertofranzin
bnstruct:Bayesian Network Structure Learning from Data with Missing Values
Bayesian Network Structure Learning from Data with Missing Values. The package implements the Silander-Myllymaki complete search, the Max-Min Parents-and-Children, the Hill-Climbing, the Max-Min Hill-climbing heuristic searches, and the Structural Expectation-Maximization algorithm. Available scoring functions are BDeu, AIC, BIC. The package also implements methods for generating and using bootstrap samples, imputed data, inference.
Maintained by Alberto Franzin. Last updated 1 years ago.
25.5 match 1 stars 5.40 score 111 scripts 3 dependentsstatistikat
simPop:Simulation of Complex Synthetic Data Information
Tools and methods to simulate populations for surveys based on auxiliary data. The tools include model-based methods, calibration and combinatorial optimization algorithms, see Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v079.i10>) and Templ (2017) <doi:10.1007/978-3-319-50272-4>. The package was developed with support of the International Household Survey Network, DFID Trust Fund TF011722 and funds from the World bank.
Maintained by Matthias Templ. Last updated 4 months ago.
21.1 match 31 stars 6.51 score 104 scriptscardiomoon
ggiraphExtra:Make Interactive 'ggplot2'. Extension to 'ggplot2' and 'ggiraph'
Collection of functions to enhance 'ggplot2' and 'ggiraph'. Provides functions for exploratory plots. All plot can be a 'static' plot or an 'interactive' plot using 'ggiraph'.
Maintained by Keon-Woong Moon. Last updated 4 years ago.
15.4 match 48 stars 8.93 score 402 scripts 3 dependentsmlverse
torch:Tensors and Neural Networks with 'GPU' Acceleration
Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al (2019) <doi:10.48550/arXiv.1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.
Maintained by Daniel Falbel. Last updated 6 days ago.
8.3 match 520 stars 16.52 score 1.4k scripts 38 dependentsdankelley
oce:Analysis of Oceanographic Data
Supports the analysis of Oceanographic data, including 'ADCP' measurements, measurements made with 'argo' floats, 'CTD' measurements, sectional data, sea-level time series, coastline and topographic data, etc. Provides specialized functions for calculating seawater properties such as potential temperature in either the 'UNESCO' or 'TEOS-10' equation of state. Produces graphical displays that conform to the conventions of the Oceanographic literature. This package is discussed extensively by Kelley (2018) "Oceanographic Analysis with R" <doi:10.1007/978-1-4939-8844-0>.
Maintained by Dan Kelley. Last updated 1 days ago.
8.8 match 146 stars 15.42 score 4.2k scripts 18 dependentsropensci
rdhs:API Client and Dataset Management for the Demographic and Health Survey (DHS) Data
Provides a client for (1) querying the DHS API for survey indicators and metadata (<https://api.dhsprogram.com/#/index.html>), (2) identifying surveys and datasets for analysis, (3) downloading survey datasets from the DHS website, (4) loading datasets and associate metadata into R, and (5) extracting variables and combining datasets for pooled analysis.
Maintained by OJ Watson. Last updated 17 days ago.
datasetdhsdhs-apiextractpeer-reviewedsurvey-data
13.5 match 35 stars 10.07 score 286 scripts 3 dependentsjacobseedorff21
BranchGLM:Efficient Best Subset Selection for GLMs via Branch and Bound Algorithms
Performs efficient and scalable glm best subset selection using a novel implementation of a branch and bound algorithm. To speed up the model fitting process, a range of optimization methods are implemented in 'RcppArmadillo'. Parallel computation is available using 'OpenMP'.
Maintained by Jacob Seedorff. Last updated 6 months ago.
generalized-linear-modelsregressionstatisticssubset-selectionvariable-selectionopenblascppopenmp
21.9 match 7 stars 6.20 score 30 scriptsmodeloriented
iBreakDown:Model Agnostic Instance Level Variable Attributions
Model agnostic tool for decomposition of predictions from black boxes. Supports additive attributions and attributions with interactions. The Break Down Table shows contributions of every variable to a final prediction. The Break Down Plot presents variable contributions in a concise graphical way. This package works for classification and regression models. It is an extension of the 'breakDown' package (Staniak and Biecek 2018) <doi:10.32614/RJ-2018-072>, with new and faster strategies for orderings. It supports interactions in explanations and has interactive visuals (implemented with 'D3.js' library). The methodology behind is described in the 'iBreakDown' article (Gosiewska and Biecek 2019) <arXiv:1903.11420> This package is a part of the 'DrWhy.AI' universe (Biecek 2018) <arXiv:1806.08915>.
Maintained by Przemyslaw Biecek. Last updated 1 years ago.
breakdownimlinterpretabilityshapleyxai
13.4 match 84 stars 10.07 score 56 scripts 22 dependentsmrcieu
OneSampleMR:One Sample Mendelian Randomization and Instrumental Variable Analyses
Useful functions for one-sample (individual level data) Mendelian randomization and instrumental variable analyses. The package includes implementations of; the Sanderson and Windmeijer (2016) <doi:10.1016/j.jeconom.2015.06.004> conditional F-statistic, the multiplicative structural mean model Hernán and Robins (2006) <doi:10.1097/01.ede.0000222409.00878.37>, and two-stage predictor substitution and two-stage residual inclusion estimators explained by Terza et al. (2008) <doi:10.1016/j.jhealeco.2007.09.009>.
Maintained by Tom Palmer. Last updated 21 days ago.
instrumental-variableinstrumental-variablesmendelian-randomisationmendelian-randomizationmendelianrandomisationmendelianrandomization
20.1 match 19 stars 6.69 score 16 scriptsamerican-institutes-for-research
EdSurvey:Analysis of NCES Education Survey and Assessment Data
Read in and analyze functions for education survey and assessment data from the National Center for Education Statistics (NCES) <https://nces.ed.gov/>, including National Assessment of Educational Progress (NAEP) data <https://nces.ed.gov/nationsreportcard/> and data from the International Assessment Database: Organisation for Economic Co-operation and Development (OECD) <https://www.oecd.org/en/about/directorates/directorate-for-education-and-skills.html>, including Programme for International Student Assessment (PISA), Teaching and Learning International Survey (TALIS), Programme for the International Assessment of Adult Competencies (PIAAC), and International Association for the Evaluation of Educational Achievement (IEA) <https://www.iea.nl/>, including Trends in International Mathematics and Science Study (TIMSS), TIMSS Advanced, Progress in International Reading Literacy Study (PIRLS), International Civic and Citizenship Study (ICCS), International Computer and Information Literacy Study (ICILS), and Civic Education Study (CivEd).
Maintained by Paul Bailey. Last updated 16 days ago.
17.0 match 10 stars 7.86 score 139 scripts 1 dependentsegenn
rtemis:Machine Learning and Visualization
Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.
Maintained by E.D. Gennatas. Last updated 1 months ago.
data-sciencedata-visualizationmachine-learningmachine-learning-libraryvisualization
18.8 match 145 stars 7.09 score 50 scripts 2 dependentsbioc
singleCellTK:Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data
The Single Cell Toolkit (SCTK) in the singleCellTK package provides an interface to popular tools for importing, quality control, analysis, and visualization of single cell RNA-seq data. SCTK allows users to seamlessly integrate tools from various packages at different stages of the analysis workflow. A general "a la carte" workflow gives users the ability access to multiple methods for data importing, calculation of general QC metrics, doublet detection, ambient RNA estimation and removal, filtering, normalization, batch correction or integration, dimensionality reduction, 2-D embedding, clustering, marker detection, differential expression, cell type labeling, pathway analysis, and data exporting. Curated workflows can be used to run Seurat and Celda. Streamlined quality control can be performed on the command line using the SCTK-QC pipeline. Users can analyze their data using commands in the R console or by using an interactive Shiny Graphical User Interface (GUI). Specific analyses or entire workflows can be summarized and shared with comprehensive HTML reports generated by Rmarkdown. Additional documentation and vignettes can be found at camplab.net/sctk.
Maintained by Joshua David Campbell. Last updated 24 days ago.
singlecellgeneexpressiondifferentialexpressionalignmentclusteringimmunooncologybatcheffectnormalizationqualitycontroldataimportgui
13.1 match 181 stars 10.16 score 252 scriptsusdaforestservice
FIESTAutils:Utility Functions for Forest Inventory Estimation and Analysis
A set of tools for data wrangling, spatial data analysis, statistical modeling (including direct, model-assisted, photo-based, and small area tools), and USDA Forest Service data base tools. These tools are aimed to help Foresters, Analysts, and Scientists extract and perform analyses on USDA Forest Service data.
Maintained by Grayson White. Last updated 2 days ago.
21.0 match 8 stars 6.33 score 1 dependentsbioc
clustifyr:Classifier for Single-cell RNA-seq Using Cell Clusters
Package designed to aid in classifying cells from single-cell RNA sequencing data using external reference data (e.g., bulk RNA-seq, scRNA-seq, microarray, gene lists). A variety of correlation based methods and gene list enrichment methods are provided to assist cell type assignment.
Maintained by Rui Fu. Last updated 5 months ago.
singlecellannotationsequencingmicroarraygeneexpressionassign-identitiesclustersmarker-genesrna-seqsingle-cell-rna-seq
13.8 match 119 stars 9.63 score 296 scriptsjulienvollering
MIAmaxent:A Modular, Integrated Approach to Maximum Entropy Distribution Modeling
Tools for training, selecting, and evaluating maximum entropy (and standard logistic regression) distribution models. This package provides tools for user-controlled transformation of explanatory variables, selection of variables by nested model comparison, and flexible model evaluation and projection. It follows principles based on the maximum- likelihood interpretation of maximum entropy modeling, and uses infinitely- weighted logistic regression for model fitting. The package is described in Vollering et al. (2019; <doi:10.1002/ece3.5654>).
Maintained by Julien Vollering. Last updated 7 months ago.
20.3 match 14 stars 6.53 score 30 scriptstlverse
tmle3:The Extensible TMLE Framework
A general framework supporting the implementation of targeted maximum likelihood estimators (TMLEs) of a diverse range of statistical target parameters through a unified interface. The goal is that the exposed framework be as general as the mathematical framework upon which it draws.
Maintained by Jeremy Coyle. Last updated 4 months ago.
causal-inferencemachine-learningtargeted-learningvariable-importance
16.7 match 38 stars 7.91 score 286 scripts 5 dependentsnliulab
AutoScore:An Interpretable Machine Learning-Based Automatic Clinical Score Generator
A novel interpretable machine learning-based framework to automate the development of a clinical scoring model for predefined outcomes. Our novel framework consists of six modules: variable ranking with machine learning, variable transformation, score derivation, model selection, domain knowledge-based score fine-tuning, and performance evaluation.The details are described in our research paper<doi:10.2196/21798>. Users or clinicians could seamlessly generate parsimonious sparse-score risk models (i.e., risk scores), which can be easily implemented and validated in clinical practice. We hope to see its application in various medical case studies.
Maintained by Feng Xie. Last updated 15 days ago.
17.1 match 32 stars 7.70 score 30 scriptscytomining
cytominer:Methods for Image-Based Cell Profiling
`cytominer` is a suite of common functions used to process high-dimensional readouts from image-based cell profiling experiments.
Maintained by Shantanu Singh. Last updated 2 years ago.
19.2 match 50 stars 6.89 score 44 scriptscdalzell
Lahman:Sean 'Lahman' Baseball Database
Provides the tables from the 'Sean Lahman Baseball Database' as a set of R data.frames. It uses the data on pitching, hitting and fielding performance and other tables from 1871 through 2023, as recorded in the 2024 version of the database. Documentation examples show how many baseball questions can be investigated.
Maintained by Chris Dalzell. Last updated 4 months ago.
11.0 match 79 stars 11.98 score 1.7k scripts 2 dependentsbioc
phyloseq:Handling and analysis of high-throughput microbiome census data
phyloseq provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data.
Maintained by Paul J. McMurdie. Last updated 5 months ago.
immunooncologysequencingmicrobiomemetagenomicsclusteringclassificationmultiplecomparisongeneticvariability
9.5 match 597 stars 13.90 score 8.4k scripts 37 dependentsgreta-dev
greta:Simple and Scalable Statistical Modelling in R
Write statistical models in R and fit them by MCMC and optimisation on CPUs and GPUs, using Google 'TensorFlow'. greta lets you write your own model like in BUGS, JAGS and Stan, except that you write models right in R, it scales well to massive datasets, and it’s easy to extend and build on. See the website for more information, including tutorials, examples, package documentation, and the greta forum.
Maintained by Nicholas Tierney. Last updated 6 days ago.
10.4 match 566 stars 12.53 score 396 scripts 6 dependentsatorus-research
xportr:Utilities to Output CDISC SDTM/ADaM XPT Files
Tools to build CDISC compliant data sets and check for CDISC compliance.
Maintained by Eli Miller. Last updated 3 months ago.
14.4 match 43 stars 9.01 score 102 scriptsmodeloriented
survex:Explainable Machine Learning in Survival Analysis
Survival analysis models are commonly used in medicine and other areas. Many of them are too complex to be interpreted by human. Exploration and explanation is needed, but standard methods do not give a broad enough picture. 'survex' provides easy-to-apply methods for explaining survival models, both complex black-boxes and simpler statistical models. They include methods specific to survival analysis such as SurvSHAP(t) introduced in Krzyzinski et al., (2023) <doi:10.1016/j.knosys.2022.110234>, SurvLIME described in Kovalev et al., (2020) <doi:10.1016/j.knosys.2020.106164> as well as extensions of existing ones described in Biecek et al., (2021) <doi:10.1201/9780429027192>.
Maintained by Mikołaj Spytek. Last updated 9 months ago.
biostatisticsbrier-scorescensored-datacox-modelcox-regressionexplainable-aiexplainable-machine-learningexplainable-mlexplanatory-model-analysisinterpretable-machine-learninginterpretable-mlmachine-learningprobabilistic-machine-learningshapsurvival-analysistime-to-eventvariable-importancexai
15.4 match 110 stars 8.40 score 114 scriptsmicrosoft
vivainsights:Analyze and Visualize Data from 'Microsoft Viva Insights'
Provides a versatile range of functions, including exploratory data analysis, time-series analysis, organizational network analysis, and data validation, whilst at the same time implements a set of best practices in analyzing and visualizing data specific to 'Microsoft Viva Insights'.
Maintained by Martin Chan. Last updated 23 days ago.
21.1 match 11 stars 6.12 score 68 scriptspik-piam
quitte:Bits and pieces of code to use with quitte-style data frames
A collection of functions for easily dealing with quitte-style data frames, doing multi-model comparisons and plots.
Maintained by Michaja Pehl. Last updated 2 days ago.
15.7 match 8.22 score 184 scripts 35 dependentsbdwilliamson
flevr:Flexible, Ensemble-Based Variable Selection with Potentially Missing Data
Perform variable selection in settings with possibly missing data based on extrinsic (algorithm-specific) and intrinsic (population-level) variable importance. Uses a Super Learner ensemble to estimate the underlying prediction functions that give rise to estimates of variable importance. For more information about the methods, please see Williamson and Huang (2023+) <arXiv:2202.12989>.
Maintained by Brian D. Williamson. Last updated 1 years ago.
26.2 match 5 stars 4.88 score 2 scriptsopenanalytics
clinDataReview:Clinical Data Review Tool
Creation of interactive tables, listings and figures ('TLFs') and associated report for exploratory analysis of data in a clinical trial, e.g. for clinical oversight activities. Interactive figures include sunburst, treemap, scatterplot, line plot and barplot of counts data. Interactive tables include table of summary statistics (as counts of adverse events, enrollment table) and listings. Possibility to compare data (summary table or listing) across two data batches/sets. A clinical data review report is created via study-specific configuration files and template 'R Markdown' reports contained in the package.
Maintained by Laure Cougnaud. Last updated 9 months ago.
17.9 match 11 stars 7.10 score 36 scriptsusdaforestservice
FIESTA:Forest Inventory Estimation and Analysis
A research estimation tool for analysts that work with sample-based inventory data from the U.S. Department of Agriculture, Forest Service, Forest Inventory and Analysis (FIA) Program.
Maintained by Grayson White. Last updated 3 days ago.
17.6 match 30 stars 7.24 score 62 scriptspvanlaake
ncdfCF:Easy Access to NetCDF Files with CF Metadata Conventions
Network Common Data Form ('netCDF') files are widely used for scientific data. Library-level access in R is provided through packages 'RNetCDF' and 'ncdf4'. Package 'ncdfCF' is built on top of 'RNetCDF' and makes the data and its attributes available as a set of R6 classes that are informed by the Climate and Forecasting Metadata Conventions. Access to the data uses standard R subsetting operators and common function forms.
Maintained by Patrick Van Laake. Last updated 2 days ago.
23.5 match 5.41 score 4 scriptsrapler
dst:Using the Theory of Belief Functions
Using the Theory of Belief Functions for evidence calculus. Basic probability assignments, or mass functions, can be defined on the subsets of a set of possible values and combined. A mass function can be extended to a larger frame. Marginalization, i.e. reduction to a smaller frame can also be done. These features can be combined to analyze small belief networks and take into account situations where information cannot be satisfactorily described by probability distributions.
Maintained by Peiyuan Zhu. Last updated 3 months ago.
21.3 match 6 stars 5.96 score 126 scriptsimbi-heidelberg
DescrTab2:Publication Quality Descriptive Statistics Tables
Provides functions to create descriptive statistics tables for continuous and categorical variables. By default, summary statistics such as mean, standard deviation, quantiles, minimum and maximum for continuous variables and relative and absolute frequencies for categorical variables are calculated. 'DescrTab2' features a sophisticated algorithm to choose appropriate test statistics for your data and provides p-values. On top of this, confidence intervals for group differences of appropriated summary measures are automatically produces for two-group comparison. Tables generated by 'DescrTab2' can be integrated in a variety of document formats, including .html, .tex and .docx documents. 'DescrTab2' also allows printing tables to console and saving table objects for later use.
Maintained by Jan Meis. Last updated 1 years ago.
categorical-variablescontinuous-variabledescriptive-statisticsp-valuesstatistical-testsstatistics
18.8 match 9 stars 6.71 score 19 scripts 1 dependentskurthornik
tseries:Time Series Analysis and Computational Finance
Time series analysis and computational finance.
Maintained by Kurt Hornik. Last updated 6 months ago.
11.3 match 4 stars 11.22 score 10k scripts 289 dependentsbioc
gdsfmt:R Interface to CoreArray Genomic Data Structure (GDS) Files
Provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files. GDS is portable across platforms with hierarchical structure to store multiple scalable array-oriented data sets with metadata information. It is suited for large-scale datasets, especially for data which are much larger than the available random-access memory. The gdsfmt package offers the efficient operations specifically designed for integers of less than 8 bits, since a diploid genotype, like single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. It is also allowed to read a GDS file in parallel with multiple R processes supported by the package parallel.
Maintained by Xiuwen Zheng. Last updated 2 days ago.
infrastructuredataimportbioinformaticsgds-formatgenomicscpp
11.0 match 18 stars 11.34 score 920 scripts 29 dependentsropensci
aorsf:Accelerated Oblique Random Forests
Fit, interpret, and compute predictions with oblique random forests. Includes support for partial dependence, variable importance, passing customized functions for variable importance and identification of linear combinations of features. Methods for the oblique random survival forest are described in Jaeger et al., (2023) <DOI:10.1080/10618600.2023.2231048>.
Maintained by Byron Jaeger. Last updated 4 days ago.
data-scienceobliquerandom-forestsurvivalopenblascppopenmp
13.5 match 58 stars 9.21 score 60 scripts 1 dependents