Showing 200 of 871 results
winvector
vtreat:A Statistically Sound 'data.frame' Processor/Conditioner
A 'data.frame' processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. 'vtreat' prepares variables so that data has fewer exceptional cases, making it easier to safely use models in production. Common problems 'vtreat' defends against: 'Inf', 'NA', too many categorical levels, rare categorical levels, and new categorical levels (levels seen during application, but not during training). Reference: "'vtreat': a data.frame Processor for Predictive Modeling", Zumel, Mount, 2016, <DOI:10.5281/zenodo.1173313>.
Maintained by John Mount. Last updated 2 months ago.
categorical-variables, machine-learning-algorithms, nested-models, prepare-data
23.4 match 285 stars 11.19 score 328 scripts 1 dependents
tidymodels
infer:Tidy Statistical Inference
The objective of this package is to perform inference using an expressive statistical grammar that coheres with the tidy design framework.
Maintained by Simon Couch. Last updated 6 months ago.
14.0 match 734 stars 15.69 score 3.5k scripts 17 dependents
crunch-io
crunch:Crunch.io Data Tools
The Crunch.io service <https://crunch.io/> provides a cloud-based data store and analytic engine, as well as an intuitive web interface. Using this package, analysts can interact with and manipulate Crunch datasets from within R. Importantly, this allows technical researchers to collaborate naturally with team members, managers, and clients who prefer a point-and-click interface.
Maintained by Greg Freedman Ellis. Last updated 11 days ago.
19.9 match 9 stars 10.53 score 200 scripts 2 dependents
choonghyunryu
dlookr:Tools for Data Diagnosis, Exploration, Transformation
A collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between the target variable and predictor. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. And it creates automated reports that support these three tasks.
Maintained by Choonghyun Ryu. Last updated 9 months ago.
16.4 match 212 stars 11.05 score 748 scripts 2 dependents
rstudio
keras3:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.
Maintained by Tomasz Kalinowski. Last updated 4 days ago.
12.4 match 845 stars 13.57 score 264 scripts 2 dependents
kaz-yos
tableone:Create 'Table 1' to Describe Baseline Characteristics with or without Propensity Score Weights
Creates 'Table 1', i.e., description of baseline patient characteristics, which is essential in every medical research. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. Weighted data are supported via the 'survey' package.
Maintained by Kazuki Yoshida. Last updated 3 years ago.
baseline-characteristics, descriptive-statistics, statistics
11.4 match 221 stars 13.55 score 2.3k scripts 12 dependents
daya6489
SmartEDA:Summarize and Explore the Data
Exploratory analysis on any input data describing the structure and the relationships present in the data. The package automatically selects the variables and computes the related descriptive statistics. Information value, weight of evidence, custom tables, summary statistics, and graphical techniques are provided for both numeric and categorical predictors.
Maintained by Dayanand Ubrangala. Last updated 1 year ago.
analysis, exploratory-data-analysis
21.2 match 42 stars 7.25 score 214 scripts
larmarange
broom.helpers:Helpers for Model Coefficients Tibbles
Provides a suite of functions to work with regression model 'broom::tidy()' tibbles. The suite includes functions to group regression model terms by variable, insert reference and header rows for categorical variables, add variable labels, and more.
Maintained by Joseph Larmarange. Last updated 10 days ago.
12.5 match 22 stars 11.45 score 165 scripts 2 dependents
jeffreyracine
crs:Categorical Regression Splines
Regression splines that handle a mix of continuous and categorical (discrete) data often encountered in applied settings. I would like to gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC, <https://www.nserc-crsng.gc.ca>), the Social Sciences and Humanities Research Council of Canada (SSHRC, <https://www.sshrc-crsh.gc.ca>), and the Shared Hierarchical Academic Research Computing Network (SHARCNET, <https://www.sharcnet.ca>). We would also like to acknowledge the contributions of the GNU GSL authors. In particular, we adapt the GNU GSL B-spline routine gsl_bspline.c adding automated support for quantile knots (in addition to uniform knots), providing missing functionality for derivatives, and for extending the splines beyond their endpoints.
Maintained by Jeffrey S. Racine. Last updated 3 months ago.
16.6 match 17 stars 8.30 score 90 scripts 1 dependents
alexpghayes
distributions3:Probability Distributions as S3 Objects
Tools to create and manipulate probability distributions using S3. Generics pdf(), cdf(), quantile(), and random() provide replacements for base R's d/p/q/r style functions. Functions and arguments have been named carefully to minimize confusion for students in intro stats courses. The documentation for each distribution contains detailed mathematical notes.
Maintained by Alex Hayes. Last updated 6 months ago.
12.1 match 102 stars 11.35 score 118 scripts 7 dependents
adamlilith
fasterRaster:Faster Raster and Spatial Vector Processing Using 'GRASS GIS'
Processing of large-in-memory/large-on-disk rasters and spatial vectors using 'GRASS GIS' <https://grass.osgeo.org/>. Most functions in the 'terra' package are recreated. Processing of medium-sized and smaller spatial objects will nearly always be faster using 'terra' or 'sf', but for large-in-memory/large-on-disk objects, 'fasterRaster' may be faster. To use most of the functions, you must have the stand-alone version (not the 'OSGeo4W' installer version) of 'GRASS GIS' 8.0 or higher.
Maintained by Adam B. Smith. Last updated 19 days ago.
aspect, distance, fragmentation, fragmentation-indices, gis, grass, grass-gis, raster, raster-projection, rasterize, slope, topography, vectorization
17.2 match 58 stars 7.69 score 8 scripts
sfcheung
manymome:Mediation, Moderation and Moderated-Mediation After Model Fitting
Computes indirect effects, conditional effects, and conditional indirect effects in a structural equation model or path model after model fitting, with no need to define any user parameters or label any paths in the model syntax, using the approach presented in Cheung and Cheung (2024) <doi:10.3758/s13428-023-02224-z>. Can also form bootstrap confidence intervals by doing bootstrapping only once and reusing the bootstrap estimates in all subsequent computations. Supports bootstrap confidence intervals for standardized (partially or completely) indirect effects, conditional effects, and conditional indirect effects as described in Cheung (2009) <doi:10.3758/BRM.41.2.425> and Cheung, Cheung, Lau, Hui, and Vong (2022) <doi:10.1037/hea0001188>. Model fitting can be done by structural equation modeling using lavaan() or regression using lm().
Maintained by Shu Fai Cheung. Last updated 23 days ago.
bootstrapping, confidence-interval, lavaan, manymome, mediation, moderated-mediation, moderation, regression, sem, standardized-effect-size, structural-equation-modeling
16.3 match 1 stars 8.06 score 172 scripts 4 dependents
friendly
vcdExtra:'vcd' Extensions and Additions
Provides additional data sets, methods and documentation to complement the 'vcd' package for Visualizing Categorical Data and the 'gnm' package for Generalized Nonlinear Models. In particular, 'vcdExtra' extends mosaic, assoc and sieve plots from 'vcd' to handle 'glm()' and 'gnm()' models and adds a 3D version in 'mosaic3d'. Additionally, methods are provided for comparing and visualizing lists of 'glm' and 'loglm' objects. This package is now a support package for the book, "Discrete Data Analysis with R" by Michael Friendly and David Meyer.
Maintained by Michael Friendly. Last updated 5 months ago.
categorical-data-visualization, generalized-linear-models, mosaic-plots
12.1 match 24 stars 10.34 score 472 scripts 3 dependents
tidyverse
forcats:Tools for Working with Categorical Variables (Factors)
Helpers for reordering factor levels (including moving specified levels to front, ordering by first appearance, reversing, and randomly shuffling), and tools for modifying factor levels (including collapsing rare levels into other, 'anonymising', and manually 'recoding').
Maintained by Hadley Wickham. Last updated 1 year ago.
6.5 match 555 stars 18.77 score 21k scripts 1.2k dependents
tidymodels
recipes:Preprocessing and Feature Engineering Steps for Modeling
A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps that provide preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.
Maintained by Max Kuhn. Last updated 6 days ago.
6.5 match 584 stars 18.71 score 7.2k scripts 380 dependents
gzt
catsim:Binary and Categorical Image Similarity Index
Computes a structural similarity metric (after the style of MS-SSIM for images) for binary and categorical 2D and 3D images. Can be based on accuracy (simple matching), Cohen's kappa, Rand index, adjusted Rand index, Jaccard index, Dice index, normalized mutual information, or adjusted mutual information. In addition, has fast computation of Cohen's kappa, the Rand indices, and the two mutual informations. Implements the methods of Thompson and Maitra (2020) <doi:10.48550/arXiv.2004.09073>.
Maintained by Geoffrey Thompson. Last updated 6 months ago.
binary-data, binary-image-classification, binary-image-processing, categorical-data, categorical-images, classification, image-processing, cpp
27.1 match 5 stars 4.40 score 5 scripts
winvector
sigr:Succinct and Correct Statistical Summaries for Reports
Succinctly and correctly format statistical summaries of various models and tests (F-test, Chi-Sq-test, Fisher-test, T-test, and rank-significance). This package also includes empirical tests, such as Monte Carlo and bootstrap distribution estimates.
Maintained by John Mount. Last updated 2 years ago.
15.6 match 28 stars 7.18 score 97 scripts 1 dependents
pablo14
funModeling:Exploratory Data Analysis and Data Preparation Tool-Box
Around 10% of almost any predictive modeling project is spent on predictive modeling itself; 'funModeling' and the book Data Science Live Book (<https://livebook.datascienceheroes.com/>) are intended to cover the remaining 90%: data preparation, profiling, selecting best variables 'dataViz', assessing model performance, and other functions.
Maintained by Pablo Casas. Last updated 2 years ago.
12.8 match 100 stars 8.57 score 654 scripts
nicolas-robette
GDAtools:Geometric Data Analysis
Many tools for Geometric Data Analysis (Le Roux & Rouanet (2005) <doi:10.1007/1-4020-2236-0>), such as MCA variants (Specific Multiple Correspondence Analysis, Class Specific Analysis), many graphical and statistical aids to interpretation (structuring factors, concentration ellipses, inductive tests, bootstrap validation, etc.) and multiple-table analysis (Multiple Factor Analysis, between- and inter-class analysis, Principal Component Analysis and Correspondence Analysis with Instrumental Variables, etc.).
Maintained by Nicolas Robette. Last updated 10 months ago.
18.3 match 10 stars 5.93 score 94 scripts 2 dependents
laresbernardo
lares:Analytics & Machine Learning Sidekick
Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of family functions, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scrapper, it helps the analyst or data scientist to get quick and robust results, without the need of repetitive coding or advanced R programming skills.
Maintained by Bernardo Lares. Last updated 24 days ago.
analytics, api, automation, automl, data-science, descriptive-statistics, h2o, machine-learning, marketing, mmm, predictive-modeling, puzzle, rlanguage, robyn, visualization
11.0 match 233 stars 9.84 score 185 scripts 1 dependents
rolkra
explore:Simplifies Exploratory Data Analysis
Interactive data exploration with one line of code, automated reporting, or an easy-to-remember set of tidy functions for low-code exploratory data analysis.
Maintained by Roland Krasser. Last updated 3 months ago.
data-exploration, data-visualisation, decision-trees, eda, rmarkdown, shiny, tidy
9.3 match 228 stars 11.43 score 221 scripts 1 dependents
rstudio
tfdatasets:Interface to 'TensorFlow' Datasets
Interface to 'TensorFlow' Datasets, a high-level library for building complex input pipelines from simple, re-usable pieces. See <https://www.tensorflow.org/guide> for additional details.
Maintained by Tomasz Kalinowski. Last updated 4 days ago.
11.0 match 34 stars 9.32 score 656 scripts 3 dependents
corybrunson
ggalluvial:Alluvial Plots in 'ggplot2'
Alluvial plots use variable-width ribbons and stacked bar plots to represent multi-dimensional or repeated-measures data with categorical or ordinal variables; see Riehmann, Hanfler, and Froehlich (2005) <doi:10.1109/INFVIS.2005.1532152> and Rosvall and Bergstrom (2010) <doi:10.1371/journal.pone.0008694>. Alluvial plots are statistical graphics in the sense of Wilkinson (2006) <doi:10.1007/0-387-28695-0>; they share elements with Sankey diagrams and parallel sets plots but are uniquely determined from the data and a small set of parameters. This package extends Wickham's (2010) <doi:10.1198/jcgs.2009.07098> layered grammar of graphics to generate alluvial plots from tidy data.
Maintained by Jason Cory Brunson. Last updated 7 months ago.
alluvial-diagrams, alluvial-plots, categorical-data-visualization, ggplot2, repeated-measures-data
7.2 match 507 stars 14.14 score 3.0k scripts 21 dependents
kgoldfeld
simstudy:Simulation of Study Data
Simulates data sets in order to explore modeling techniques or better understand data generating processes. The user specifies a set of relationships between covariates, and generates data based on these specifications. The final data sets can represent data from randomized control trials, repeated measure (longitudinal) designs, and cluster randomized trials. Missingness can be generated using various mechanisms (MCAR, MAR, NMAR).
Maintained by Keith Goldfeld. Last updated 8 months ago.
data-generation, data-simulation, simulation, statistical-models, cpp
9.2 match 82 stars 11.00 score 972 scripts 1 dependents
matteo21q
jomo:Multilevel Joint Modelling Multiple Imputation
Similarly to Schafer's package 'pan', 'jomo' is a package for multilevel joint modelling multiple imputation (Carpenter and Kenward, 2013) <doi:10.1002/9781119942283>. Novel aspects of 'jomo' are the possibility of handling binary and categorical data through latent normal variables, the option to use cluster-specific covariance matrices and to impute compatibly with the substantive model.
Maintained by Matteo Quartagno. Last updated 3 years ago.
10.5 match 3 stars 9.58 score 126 scripts 154 dependents
kkholst
lava:Latent Variable Models
A general implementation of Structural Equation Models with latent variables (MLE, 2SLS, and composite likelihood estimators) with both continuous, censored, and ordinal outcomes (Holst and Budtz-Joergensen (2013) <doi:10.1007/s00180-012-0344-y>). Mixture latent variable models and non-linear latent variable models (Holst and Budtz-Joergensen (2020) <doi:10.1093/biostatistics/kxy082>). The package also provides methods for graph exploration (d-separation, back-door criterion), simulation of general non-linear latent variable models, and estimation of influence functions for a broad range of statistical models.
Maintained by Klaus K. Holst. Last updated 2 months ago.
latent-variable-models, simulation, statistics, structural-equation-models
7.7 match 33 stars 12.85 score 610 scripts 476 dependents
blasbenito
collinear:Automated Multicollinearity Management
Effortless multicollinearity management in data frames with both numeric and categorical variables for statistical and machine learning applications. The package simplifies multicollinearity analysis by combining four robust methods: 1) target encoding for categorical variables (Micci-Barreca, D. 2001 <doi:10.1145/507533.507538>); 2) automated feature prioritization to prevent key variable loss during filtering; 3) pairwise correlation for all variable combinations (numeric-numeric, numeric-categorical, categorical-categorical); and 4) fast computation of variance inflation factors.
Maintained by Blas M. Benito. Last updated 2 months ago.
machine-learning, multicollinearity, statistics
17.8 match 11 stars 5.51 score 15 scripts 1 dependents
helske
seqHMM:Mixture Hidden Markov Models for Social Sequence Data and Other Multivariate, Multichannel Categorical Time Series
Designed for fitting hidden (latent) Markov models and mixture hidden Markov models for social sequence data and other categorical time series. Some more restricted versions of these types of models are also available: Markov models, mixture Markov models, and latent class models. The package supports models for one or multiple subjects with one or multiple parallel sequences (channels). External covariates can be added to explain cluster membership in mixture models. The package provides functions for evaluating and comparing models, as well as functions for visualizing multichannel sequence data and hidden Markov models. Models are estimated using maximum likelihood via the EM algorithm and/or direct numerical maximization with analytical gradients. All main algorithms are written in C++ with support for parallel computation. Documentation is available via several vignettes on this page, and the paper by Helske and Helske (2019, <doi:10.18637/jss.v088.i03>).
Maintained by Jouni Helske. Last updated 2 years ago.
categorical-data, em-algorithm, hidden-markov-models, hmm, mixture-markov-models, time-series, openblas, cpp, openmp
10.7 match 97 stars 8.51 score 92 scripts 1 dependents
tlverse
tmle3mopttx:Targeted Maximum Likelihood Estimation of the Mean under Optimal Individualized Treatment
This package estimates the optimal individualized treatment rule for the categorical treatment using Super Learner (sl3). In order to avoid nested cross-validation, it uses split-specific estimates of Q and g to estimate the rule as described by Coyle et al. In addition, it provides the Targeted Maximum Likelihood estimates of the mean performance using CV-TMLE under such estimated rules. This is an adapter package for use with the tmle3 framework and the tlverse software ecosystem for Targeted Learning.
Maintained by Ivana Malenica. Last updated 3 years ago.
categorical-treatment, causal-inference, heterogeneous-effects, machine-learning, optimal-individualized-treatment, targeted-learning, variable-importance
20.0 match 12 stars 4.25 score 49 scripts 1 dependents
modal-inria
cfda:Categorical Functional Data Analysis
Package for the analysis of categorical functional data. The main purpose is to compute an encoding (real functional variable) for each state <doi:10.3390/math9233074>. It also provides functions to perform basic statistical analysis on categorical functional data.
Maintained by Quentin Grimonprez. Last updated 2 months ago.
categorical-data, functional-data-analysis, hacktoberfest
18.2 match 4 stars 4.60 score 3 scripts
certara
tidyvpc:VPC Percentiles and Prediction Intervals
Perform a Visual Predictive Check (VPC), while accounting for stratification, censoring, and prediction correction. Using piping from 'magrittr', the intuitive syntax gives users a flexible and powerful method to generate VPCs using both traditional binning and a new binless approach Jamsen et al. (2018) <doi:10.1002/psp4.12319> with Additive Quantile Regression (AQR) and Locally Estimated Scatterplot Smoothing (LOESS) prediction correction.
Maintained by James Craig. Last updated 4 months ago.
10.5 match 11 stars 7.81 score 47 scripts 1 dependents
config-i1
greybox:Toolbox for Model Building and Forecasting
Implements functions and instruments for regression model building and its application to forecasting. The main scope of the package is in variables selection and models specification for cases of time series data. This includes promotional modelling, selection between different dynamic regressions with non-standard distributions of errors, selection based on cross validation, solutions to the fat regression model problem and more. Models developed in the package are tailored specifically for forecasting purposes. As a result, there are several methods for producing forecasts from these models and visualising them.
Maintained by Ivan Svetunkov. Last updated 2 days ago.
forecasting, model-selection, model-selection-and-evaluation, regression, regression-models, statistics, cpp
7.4 match 30 stars 11.03 score 97 scripts 34 dependents
jprybylski
xpose.xtras:Extra Functionality for the 'xpose' Package
Adding some at-present missing functionality, or functions unlikely to be added to the base 'xpose' package. This includes some diagnostic plots that have been missing in translation from 'xpose4', but also some useful features that truly extend the capabilities of what can be done with 'xpose'. These extensions include the concept of a set of 'xpose' objects, and diagnostics for likelihood-based models.
Maintained by John Prybylski. Last updated 4 months ago.
13.4 match 6.01 score 5 scripts
t-kalinowski
keras:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.
Maintained by Tomasz Kalinowski. Last updated 11 months ago.
7.1 match 10.82 score 10k scripts 54 dependents
inzightvit
iNZightTools:Tools for 'iNZight'
Provides a collection of wrapper functions for common variable and dataset manipulation workflows primarily used by 'iNZight', a graphical user interface providing easy exploration and visualisation of data for students of statistics, available in both desktop and online versions. Additionally, many of the functions return the 'tidyverse' code used to obtain the result in an effort to bridge the gap between GUI and coding.
Maintained by Tom Elliott. Last updated 3 months ago.
14.7 match 1 stars 5.16 score 18 scripts 2 dependents
nicolas-robette
descriptio:Descriptive Statistical Analysis
Description of statistical associations between variables: measures of local and global association between variables (phi, Cramér V, correlations, eta-squared, Goodman and Kruskal tau, permutation tests, etc.), multiple graphical representations of the associations between variables (using 'ggplot2'), and weighted statistics.
Maintained by Nicolas Robette. Last updated 6 months ago.
15.1 match 4 stars 5.00 score 11 scripts 3 dependents
rstudio
tfestimators:Interface to 'TensorFlow' Estimators
Interface to 'TensorFlow' Estimators <https://www.tensorflow.org/guide/estimator>, a high-level API that provides implementations of many different model types including linear models and deep neural networks.
Maintained by Tomasz Kalinowski. Last updated 3 years ago.
8.9 match 57 stars 8.42 score 170 scripts
rspatial
terra:Spatial Data Analysis
Methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data. Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data. Processing of very large files is supported. See the manual and tutorials on <https://rspatial.org/> to get started. 'terra' replaces the 'raster' package ('terra' can do more, and it is faster and easier to use).
Maintained by Robert J. Hijmans. Last updated 23 hours ago.
geospatial, raster, spatial, vector, onetbb, proj, gdal, geos, cpp
4.3 match 559 stars 17.64 score 17k scripts 851 dependents
danchaltiel
crosstable:Crosstables for Descriptive Analyses
Create descriptive tables for continuous and categorical variables. Apply summary statistics and counting function, with or without a grouping variable, and create beautiful reports using 'rmarkdown' or 'officer'. You can also compute effect sizes and statistical tests if needed.
Maintained by Dan Chaltiel. Last updated 2 months ago.
descriptive-statistics, flextable, frequency-table, html-report, msword, officer
7.1 match 116 stars 10.37 score 340 scripts
rcalinjageman
esci:Estimation Statistics with Confidence Intervals
A collection of functions and 'jamovi' module for the estimation approach to inferential statistics, the approach which emphasizes effect sizes, interval estimates, and meta-analysis. Nearly all functions are based on 'statpsych' and 'metafor'. This package is still under active development, and breaking changes are likely, especially with the plot and hypothesis test functions. Data sets are included for all examples from Cumming & Calin-Jageman (2024) <ISBN:9780367531508>.
Maintained by Robert Calin-Jageman. Last updated 22 days ago.
jamovi, jasp, science, statistics, visualization
13.5 match 22 stars 5.42 score 12 scripts
ajwills72
catlearn:Formal Psychological Models of Categorization and Learning
Formal psychological models of categorization and learning, independently-replicated data sets against which to test them, and simulation archives.
Maintained by Andy Wills. Last updated 3 months ago.
categorization, cognitive-science, formal-models, learning, learning-theory, open-models, open-science, psychology, cpp
13.5 match 26 stars 5.25 score 46 scripts
r-spatial
spdep:Spatial Dependence: Weighting Schemes, Statistics
A collection of functions to create spatial weights matrix objects from polygon 'contiguities', from point patterns by distance and tessellations, for summarizing these objects, and for permitting their use in spatial data analysis, including regional aggregation by minimum spanning tree; a collection of tests for spatial 'autocorrelation', including global 'Morans I' and 'Gearys C' proposed by 'Cliff' and 'Ord' (1973, ISBN: 0850860369) and (1981, ISBN: 0850860814), 'Hubert/Mantel' general cross product statistic, Empirical Bayes estimates and 'Assunção/Reis' (1999) <doi:10.1002/(SICI)1097-0258(19990830)18:16%3C2147::AID-SIM179%3E3.0.CO;2-I> Index, 'Getis/Ord' G ('Getis' and 'Ord' 1992) <doi:10.1111/j.1538-4632.1992.tb00261.x> and multicoloured join count statistics, 'APLE' ('Li 'et al.' ) <doi:10.1111/j.1538-4632.2007.00708.x>, local 'Moran's I', 'Gearys C' ('Anselin' 1995) <doi:10.1111/j.1538-4632.1995.tb00338.x> and 'Getis/Ord' G ('Ord' and 'Getis' 1995) <doi:10.1111/j.1538-4632.1995.tb00912.x>, 'saddlepoint' approximations ('Tiefelsdorf' 2002) <doi:10.1111/j.1538-4632.2002.tb01084.x> and exact tests for global and local 'Moran's I' ('Bivand et al.' 2009) <doi:10.1016/j.csda.2008.07.021> and 'LOSH' local indicators of spatial heteroscedasticity ('Ord' and 'Getis') <doi:10.1007/s00168-011-0492-y>. The implementation of most of these measures is described in 'Bivand' and 'Wong' (2018) <doi:10.1007/s11749-018-0599-x>, with further extensions in 'Bivand' (2022) <doi:10.1111/gean.12319>. 'Lagrange' multiplier tests for spatial dependence in linear models are provided ('Anselin et al'. 1996) <doi:10.1016/0166-0462(95)02111-6>, as are 'Rao' score tests for hypothesised spatial 'Durbin' models based on linear models ('Koley' and 'Bera' 2023) <doi:10.1080/17421772.2023.2256810>. A local indicators for categorical data (LICD) implementation based on 'Carrer et al.' (2021) <doi:10.1016/j.jas.2020.105306> and 'Bivand et al.' (2017) <doi:10.1016/j.spasta.2017.03.003> was added in 1.3-7. From 'spdep' and 'spatialreg' versions >= 1.2-1, the model fitting functions previously present in this package are defunct in 'spdep' and may be found in 'spatialreg'.
Maintained by Roger Bivand. Last updated 18 days ago.
spatial-autocorrelation, spatial-dependence, spatial-weights
4.0 match 131 stars 16.62 score 6.0k scripts 107 dependents
rpolars
polars:Lightning-Fast 'DataFrame' Library
Lightning-fast 'DataFrame' library written in 'Rust'. Convert R data to 'Polars' data and vice versa. Perform fast, lazy, larger-than-memory and optimized data queries. 'Polars' is interoperable with the package 'arrow', as both are based on the 'Apache Arrow' Columnar Format.
Maintained by Soren Welling. Last updated 3 days ago.
5.5 match 499 stars 12.01 score 1.0k scripts 2 dependents
bioc
animalcules:Interactive microbiome analysis toolkit
animalcules is an R package for utilizing up-to-date data analytics, visualization methods, and machine learning models to provide users an easy-to-use interactive microbiome analysis framework. It can be used as a standalone software package or users can explore their data with the accompanying interactive R Shiny application. Traditional microbiome analysis such as alpha/beta diversity and differential abundance analysis are enhanced, while new methods like biomarker identification are introduced by animalcules. Powerful interactive and dynamic figures generated by animalcules enable users to understand their data better and discover new insights.
Maintained by Jessica McClintock. Last updated 5 months ago.
microbiome, metagenomics, coverage, visualization
9.5 match 55 stars 6.95 score 23 scripts
brad-cannell
freqtables:Make Quick Descriptive Tables for Categorical Variables
Quickly make tables of descriptive statistics (i.e., counts, percentages, confidence intervals) for categorical variables. This package is designed to work in a Tidyverse pipeline, and consideration has been given to get results from R to Microsoft Word ® with minimal pain.
Maintained by Brad Cannell. Last updated 1 year ago.
categorical-data, data-analysis, descriptive-statistics, epidemiology
10.9 match 12 stars 6.00 score 84 scripts
mayoverse
arsenal:An Arsenal of 'R' Functions for Large-Scale Statistical Summaries
An Arsenal of 'R' functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in 'R' and 'RStudio' and which use formulas and versatile summary statistics for summary tables and models. The primary functions include tableby(), a Table-1-like summary of multiple variable types 'by' the levels of one or more categorical variables; paired(), a Table-1-like summary of multiple variable types paired across two time points; modelsum(), which performs simple model fits on one or more endpoints for many variables (univariate or adjusted for covariates); freqlist(), a powerful frequency table across many categorical variables; comparedf(), a function for comparing data.frames; and write2(), a function to output tables to a document.
Maintained by Ethan Heinzen. Last updated 7 months ago.
baseline-characteristics, descriptive-statistics, modeling, paired-comparisons, reporting, statistics, tableone
4.7 match 225 stars 13.45 score 1.2k scripts 16 dependents
cardiomoon
autoReg:Automatic Linear and Logistic Regression and Survival Analysis
Make summary tables for descriptive statistics and select explanatory variables automatically in various regression models. Supports linear models, generalized linear models, and Cox proportional hazards models. Generates publication-ready tables summarizing the results of regression analysis, as well as plots. The tables and plots can be exported to "HTML", "pdf('LaTeX')", "docx('MS Word')" and "pptx('MS PowerPoint')" documents.
Maintained by Keon-Woong Moon. Last updated 1 year ago.
8.9 match 47 stars 7.00 score 69 scripts
svkucheryavski
mdatools:Multivariate Data Analysis for Chemometrics
Projection-based methods for preprocessing, exploration, and analysis of multivariate data used in chemometrics. S. Kucheryavskiy (2020) <doi:10.1016/j.chemolab.2020.103937>.
Maintained by Sergey Kucheryavskiy. Last updated 8 months ago.
8.4 match 35 stars 7.37 score 220 scripts 1 dependents
nowosad
motif:Local Pattern Analysis
Describes spatial patterns of categorical raster data for any defined regular and irregular areas. Patterns are described quantitatively using built-in signatures based on co-occurrence matrices, and user-defined functions are also supported. It enables spatial analysis such as search, change detection, and clustering to be performed on spatial patterns (Nowosad (2021) <doi:10.1007/s10980-020-01135-0>).
Maintained by Jakub Nowosad. Last updated 7 months ago.
categorical-raster, global-ecology, landscape-ecology, spatial, cpp
8.0 match 63 stars 7.48 score 48 scripts
pharmaverse
admiral:ADaM in R Asset Library
A toolbox for programming Clinical Data Interchange Standards Consortium (CDISC) compliant Analysis Data Model (ADaM) datasets in R. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide" (CDISC Analysis Data Model Team, 2021, <https://www.cdisc.org/standards/foundational/adam>).
Maintained by Ben Straub. Last updated 4 days ago.
cdisc, clinical-trials, open-source
4.3 match 236 stars 13.89 score 486 scripts 4 dependents
jacob-long
interactions:Comprehensive, User-Friendly Toolkit for Probing Interactions
A suite of functions for conducting and interpreting analysis of statistical interaction in regression models that was formerly part of the 'jtools' package. Functionality includes visualization of two- and three-way interactions among continuous and/or categorical variables as well as calculation of "simple slopes" and Johnson-Neyman intervals (see e.g., Bauer & Curran, 2005 <doi:10.1207/s15327906mbr4003_5>). These capabilities are implemented for generalized linear models in addition to the standard linear regression context.
Maintained by Jacob A. Long. Last updated 8 months ago.
interactions, moderation, social-sciences, statistics
5.2 match 131 stars 11.39 score 1.2k scripts 5 dependents
ablack3
icdpicr:'ICD' Programs for Injury Categorization in R
Categorization and scoring of injury severity typically involves trained personnel with access to injured persons or their medical records. 'icdpicr' contains a function that provides automated calculation of Abbreviated Injury Scale ('AIS') and Injury Severity Score ('ISS') from International Classification of Diseases ('ICD') codes and may be a useful substitute to manual injury severity scoring. 'ICDPIC' was originally developed in 'Stata', and 'icdpicr' is an open-access update that accepts both 'ICD-9' and 'ICD-10' codes.
Maintained by Adam Black. Last updated 3 years ago.
13.0 match 6 stars 4.48 score 10 scripts
xiaoruizhu
SurrogateRsq:Goodness-of-Fit Analysis for Categorical Data using the Surrogate R-Squared
To assess and compare the models' goodness of fit, R-squared is one of the most popular measures. For categorical data analysis, however, no universally adopted R-squared measure can resemble the ordinary least squares (OLS) R-squared for linear models with continuous data. This package implements the surrogate R-squared measure for categorical data analysis, which is proposed in the study of Dungang Liu, Xiaorui Zhu, Brandon Greenwell, and Zewei Lin (2022) <doi:10.1111/bmsp.12289>. It can generate a point or interval measure of the surrogate R-squared. It can also provide a ranking measure of the percentage contribution of each variable to the overall surrogate R-squared. This ranking assessment allows one to check the importance of each variable in terms of its explained variance. This package can be used jointly with other existing R packages for variable selection and model diagnostics in the model-building process.
Maintained by Xiaorui (Jeremy) Zhu. Last updated 12 months ago.
categorical-data-analysis, goodness-of-fit, r-squared-statistic, statistics
13.0 match 5 stars 4.48 score 12 scripts
big-life-lab
cchsflow:Transforming and Harmonizing CCHS Variables
Supporting the use of the Canadian Community Health Survey (CCHS) by transforming variables from each cycle into harmonized, consistent versions that span survey cycles (currently, 2001 to 2018). CCHS data used in this library is accessed and adapted in accordance with the Statistics Canada Open Licence Agreement. This package uses rec_with_table(), which was developed from 'sjmisc' rec(). Lüdecke D (2018). "sjmisc: Data and Variable Transformation Functions". Journal of Open Source Software, 3(26), 754. <doi:10.21105/joss.00754>.
Maintained by Kitty Chen. Last updated 1 year ago.
9.6 match 12 stars 6.02 score 192 scripts
fernandotusell
cat:Analysis and Imputation of Categorical-Variable Datasets with Missing Values
Performs analysis of categorical-variable datasets with missing values. Implements methods from Schafer, JL, Analysis of Incomplete Multivariate Data, Chapman and Hall.
Maintained by Fernando Tusell. Last updated 2 years ago.
17.6 match 3.27 score 52 scripts 2 dependents
sfcheung
betaselectr:Betas-Select in Structural Equation Models and Linear Models
It computes betas-select, coefficients after standardization in structural equation models and regression models, standardizing only selected variables. Supports models with moderation, with product terms formed after standardization. It also offers confidence intervals that account for standardization, including bootstrap confidence intervals as proposed by Cheung et al. (2022) <doi:10.1037/hea0001188>.
Maintained by Shu Fai Cheung. Last updated 4 months ago.
bootstrapping, confidence-intervals, generalized-linear-models, lavaan, logistic-regression, regression, sem, standardization, structural-equation-modeling
11.5 match 1 stars 4.95 score 8 scripts
lanl
ezECM:Event Categorization Matrix Classification for Nuclear Detonations
Implementation of an Event Categorization Matrix (ECM) detonation detection model and a Bayesian variant. Functions are provided for importing and exporting data, fitting models, and applying decision criteria for categorizing new events. This package implements methods described in the paper "Bayesian Event Categorization Matrix Approach for Nuclear Detonations" Koermer, Carmichael, and Williams (2024) available on arXiv at <doi:10.48550/arXiv.2409.18227>.
Maintained by Scott Koermer. Last updated 5 months ago.
11.1 match 5.08 score 4 scripts
ropensci
dynamite:Bayesian Modeling and Causal Inference for Multivariate Longitudinal Data
Easy-to-use and efficient interface for Bayesian inference of complex panel (time series) data using dynamic multivariate panel models by Helske and Tikka (2024) <doi:10.1016/j.alcr.2024.100617>. The package supports joint modeling of multiple measurements per individual, time-varying and time-invariant effects, and a wide range of discrete and continuous distributions. Estimation of these dynamic multivariate panel models is carried out via 'Stan'. For an in-depth tutorial of the package, see (Tikka and Helske, 2024) <doi:10.48550/arXiv.2302.01607>.
Maintained by Santtu Tikka. Last updated 19 days ago.
bayesian-inference, panel-data, stan, statistical-models
7.0 match 29 stars 7.92 score 20 scripts
imbi-heidelberg
DescrTab2:Publication Quality Descriptive Statistics Tables
Provides functions to create descriptive statistics tables for continuous and categorical variables. By default, summary statistics such as mean, standard deviation, quantiles, minimum and maximum for continuous variables and relative and absolute frequencies for categorical variables are calculated. 'DescrTab2' features a sophisticated algorithm to choose appropriate test statistics for your data and provides p-values. On top of this, confidence intervals for group differences of appropriate summary measures are automatically produced for two-group comparisons. Tables generated by 'DescrTab2' can be integrated in a variety of document formats, including .html, .tex and .docx documents. 'DescrTab2' also allows printing tables to console and saving table objects for later use.
Maintained by Jan Meis. Last updated 1 year ago.
categorical-variables, continuous-variable, descriptive-statistics, p-values, statistical-tests, statistics
8.3 match 9 stars 6.71 score 19 scripts 1 dependents
eclarke
ggbeeswarm:Categorical Scatter (Violin Point) Plots
Provides two methods of plotting categorical scatter plots such that the arrangement of points within a category reflects the density of data at that region, and avoids over-plotting.
Maintained by Erik Clarke. Last updated 4 months ago.
3.5 match 550 stars 15.45 score 7.6k scripts 84 dependents
eliasmacielr
msu:Multivariate Symmetric Uncertainty and Other Measurements
Estimators for multivariate symmetrical uncertainty based on the work of Gustavo Sosa et al. (2016) <arXiv:1709.08730>, total correlation, information gain and symmetrical uncertainty of categorical variables.
Maintained by Elias Maciel. Last updated 7 years ago.
19.8 match 1 stars 2.74 score 11 scripts
jackdunnnz
iai:Interface to 'Interpretable AI' Modules
An interface to the algorithms of 'Interpretable AI' <https://www.interpretable.ai> from the R programming language. 'Interpretable AI' provides various modules, including 'Optimal Trees' for classification, regression, prescription and survival analysis, 'Optimal Imputation' for missing data imputation and outlier detection, and 'Optimal Feature Selection' for exact sparse regression. The 'iai' package is an open-source project. The 'Interpretable AI' software modules are proprietary products, but free academic and evaluation licenses are available.
Maintained by Jack Dunn. Last updated 5 months ago.
26.9 match 1 stars 2.00 score 7 scripts
const-ae
mixdir:Cluster High Dimensional Categorical Datasets
Scalable Bayesian clustering of categorical datasets. The package implements a hierarchical Dirichlet (Process) mixture of multinomial distributions. It is thus a probabilistic latent class model (LCM) and can be used to reduce the dimensionality of hierarchical data and cluster individuals into latent classes. It can automatically infer an appropriate number of latent classes or find k classes, as defined by the user. The model is based on a paper by Dunson and Xing (2009) <doi:10.1198/jasa.2009.tm08439>, but implements a scalable variational inference algorithm so that it is applicable to large datasets. It is described and tested in the accompanying paper by Ahlmann-Eltze and Yau (2018) <doi:10.1109/DSAA.2018.00068>.
Maintained by Constantin Ahlmann-Eltze. Last updated 2 years ago.
categorical-data, clustering, questionnaires, variational-inference, cpp
12.8 match 14 stars 4.19 score 22 scripts
uupharmacometrics
xpose4:Diagnostics for Nonlinear Mixed-Effect Models
A model building aid for nonlinear mixed-effects (population) model analysis using NONMEM, facilitating data set checkout, exploration and visualization, model diagnostics, candidate covariate identification and model comparison. The methods are described in Keizer et al. (2013) <doi:10.1038/psp.2013.24>, and Jonsson et al. (1999) <doi:10.1016/s0169-2607(98)00067-4>.
Maintained by Andrew C. Hooker. Last updated 1 year ago.
diagnostics, nonmem, pharmacometrics, population-model, xpose
7.3 match 35 stars 7.30 score 315 scripts
babaknaimi
elsa:Entropy-Based Local Indicator of Spatial Association
A framework that provides methods for quantifying the entropy-based local indicator of spatial association (ELSA), which can be used for both continuous and categorical data. In addition, this package offers other methods to measure local indicators of spatial association (LISA). Furthermore, global spatial structure can be measured using a variogram-like diagram, called an entrogram. For more information, see the paper: Naimi, B., Hamm, N. A., Groen, T. A., Skidmore, A. K., Toxopeus, A. G., & Alibakhshi, S. (2019) <doi:10.1016/j.spasta.2018.10.001>.
Maintained by Babak Naimi. Last updated 1 year ago.
10.2 match 14 stars 5.23 score 24 scripts
olihawkins
tabbycat:Tabulate and Summarise Categorical Data
Functions for tabulating and summarising categorical variables. Most functions are designed to work with dataframes, and use the 'tidyverse' idiom of taking the dataframe as the first argument so they work within pipelines. Equivalent functions that operate directly on vectors are also provided where it makes sense. This package aims to make exploratory data analysis involving categorical variables quicker, simpler and more robust.
Maintained by Oliver Hawkins. Last updated 2 years ago.
12.5 match 36 stars 4.26 score 2 scripts
furrer-lab
abn:Modelling Multivariate Data with Additive Bayesian Networks
The 'abn' R package facilitates Bayesian network analysis, a probabilistic graphical model that derives from empirical data a directed acyclic graph (DAG). This DAG describes the dependency structure between random variables. The R package 'abn' provides routines to help determine optimal Bayesian network models for a given data set. These models are used to identify statistical dependencies in messy, complex data. Their additive formulation is equivalent to multivariate generalised linear modelling, including mixed models with independent and identically distributed (iid) random effects. The core functionality of the 'abn' package revolves around model selection, also known as structure discovery. It supports both exact and heuristic structure learning algorithms and does not restrict the data distribution of parent-child combinations, providing flexibility in model creation and analysis. The 'abn' package uses Laplace approximations for metric estimation and includes wrappers to the 'INLA' package. It also employs 'JAGS' for data simulation purposes. For more resources and information, visit the 'abn' website.
Maintained by Matteo Delucchi. Last updated 5 days ago.
bayesian-network, binomial, categorical-data, gaussian, grouped-datasets, mixed-effects, multinomial, multivariate, poisson, structure-learning, gsl, openblas, cpp, openmp, jags
7.5 match 6 stars 6.94 score 90 scripts
ironholds
batman:Convert categorical representations of logicals to actual logicals
Survey systems and other third-party data sources commonly use non-standard representations of logical values when it comes to qualitative data - "Yes", "No" and "N/A", say. batman is a package designed to seamlessly convert these into logicals. It is highly localised, and contains equivalents to boolean values in languages including German, French, Spanish, Italian, Turkish, Chinese and Polish.
Maintained by Oliver Keyes. Last updated 9 years ago.
9.8 match 11 stars 5.28 score 70 scripts
stefan-stein
igate:Guided Analytics for Testing Manufacturing Parameters
An implementation of the initial guided analytics for parameter testing and controlband extraction framework. Functions are available for continuous and categorical target variables as well as for generating standardized reports of the conducted analysis. See <https://doi.org/10.1016/j.commatsci.2020.110053> for the paper.
Maintained by Stefan Stein. Last updated 4 years ago.
14.0 match 1 stars 3.70 score 3 scripts
thomasgstewart
tangram.pipe:Row-by-Row Table Building
Builds tables with customizable rows. Users can specify the type of data to use for each row, as well as how to handle missing data and the types of comparison tests to run on the table columns.
Maintained by Andrew Guide. Last updated 3 years ago.
14.2 match 1 stars 3.60 score 1 scripts
guyabel
tidycat:Expand Tidy Output for Categorical Parameter Estimates
Create additional rows and columns on broom::tidy() output to allow for easier control on categorical parameter estimates.
Maintained by Guy J. Abel. Last updated 1 year ago.
data-visualization, data-viz, glm, model-comparison, regression-analysis, regression-models, statistical-analysis, statistical-modeling
9.1 match 4 stars 5.53 score 56 scripts 1 dependents
paul-buerkner
brms:Bayesian Regression Models using 'Stan'
Fit Bayesian generalized (non-)linear multivariate multilevel models using 'Stan' for full Bayesian inference. A wide range of distributions and link functions are supported, allowing users to fit -- among others -- linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, and even self-defined mixture models all in a multilevel context. Further modeling options include both theory-driven and data-driven non-linear terms, auto-correlation structures, censoring and truncation, meta-analytic standard errors, and quite a few more. In addition, all parameters of the response distribution can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their prior knowledge. Models can easily be evaluated and compared using several methods assessing posterior or prior predictions. References: Bürkner (2017) <doi:10.18637/jss.v080.i01>; Bürkner (2018) <doi:10.32614/RJ-2018-017>; Bürkner (2021) <doi:10.18637/jss.v100.i05>; Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>.
Maintained by Paul-Christian Bürkner. Last updated 3 days ago.
bayesian-inference, brms, multilevel-models, stan, statistical-models
3.0 match 1.3k stars 16.61 score 13k scripts 34 dependents
vincentarelbundock
modelsummary:Summary Tables and Plots for Statistical Models and Data: Beautiful, Customizable, and Publication-Ready
Create beautiful and customizable tables to summarize several statistical models side-by-side. Draw coefficient plots, multi-level cross-tabs, dataset summaries, balance tables (a.k.a. "Table 1s"), and correlation matrices. This package supports dozens of statistical models, and it can produce tables in HTML, LaTeX, Word, Markdown, PDF, PowerPoint, Excel, RTF, JPG, or PNG. Tables can easily be embedded in 'Rmarkdown' or 'knitr' dynamic documents. Details can be found in Arel-Bundock (2022) <doi:10.18637/jss.v103.i01>.
Maintained by Vincent Arel-Bundock. Last updated 15 days ago.
3.7 match 926 stars 13.41 score 6.2k scripts 2 dependents
sdctools
sdcMicro:Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation
Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. The theoretical basis for the methods implemented can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface, published in Meindl and Templ (2019) <doi:10.3390/a12090191>, that provides access to various methods of this package.
Maintained by Matthias Templ. Last updated 26 days ago.
4.9 match 83 stars 9.89 score 258 scripts
alexanderrobitzsch
sirt:Supplementary Item Response Theory Models
Supplementary functions for item response models aiming to complement existing R packages. The functionality includes among others multidimensional compensatory and noncompensatory IRT models (Reckase, 2009, <doi:10.1007/978-0-387-89976-3>), MCMC for hierarchical IRT models and testlet models (Fox, 2010, <doi:10.1007/978-1-4419-0742-4>), NOHARM (McDonald, 1982, <doi:10.1177/014662168200600402>), Rasch copula model (Braeken, 2011, <doi:10.1007/s11336-010-9190-4>; Schroeders, Robitzsch & Schipolowski, 2014, <doi:10.1111/jedm.12054>), faceted and hierarchical rater models (DeCarlo, Kim & Johnson, 2011, <doi:10.1111/j.1745-3984.2011.00143.x>), ordinal IRT model (ISOP; Scheiblechner, 1995, <doi:10.1007/BF02301417>), DETECT statistic (Stout, Habing, Douglas & Kim, 1996, <doi:10.1177/014662169602000403>), local structural equation modeling (LSEM; Hildebrandt, Luedtke, Robitzsch, Sommer & Wilhelm, 2016, <doi:10.1080/00273171.2016.1142856>).
Maintained by Alexander Robitzsch. Last updated 3 months ago.
item-response-theory, openblas, cpp
4.8 match 23 stars 10.01 score 280 scripts 22 dependents
bioc
annotatr:Annotation of Genomic Regions to Genomic Annotations
Given a set of genomic sites/regions (e.g. ChIP-seq peaks, CpGs, differentially methylated CpGs or regions, SNPs, etc.) it is often of interest to investigate the intersecting genomic annotations. Such annotations include those relating to gene models (promoters, 5'UTRs, exons, introns, and 3'UTRs), CpGs (CpG islands, CpG shores, CpG shelves), or regulatory sequences such as enhancers. The annotatr package provides an easy way to summarize and visualize the intersection of genomic sites/regions with genomic annotations.
Maintained by Raymond G. Cavalcante. Last updated 5 months ago.
software, annotation, genomeannotation, functionalgenomics, visualization, genome-annotation
4.9 match 26 stars 9.76 score 246 scripts 5 dependents
simsem
semTools:Useful Tools for Structural Equation Modeling
Provides miscellaneous tools for structural equation modeling, many of which extend the 'lavaan' package. For example, latent interactions can be estimated using product indicators (Lin et al., 2010, <doi:10.1080/10705511.2010.488999>) and simple effects probed; analytical power analyses can be conducted (Jak et al., 2021, <doi:10.3758/s13428-020-01479-0>); and scale reliability can be estimated based on estimated factor-model parameters.
Maintained by Terrence D. Jorgensen. Last updated 3 days ago.
3.4 match 79 stars 13.74 score 1.1k scripts 31 dependents
tidyverse
ggplot2:Create Elegant Data Visualisations Using the Grammar of Graphics
A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
Maintained by Thomas Lin Pedersen. Last updated 9 days ago.
data-visualisation, visualisation
1.9 match 6.6k stars 25.10 score 645k scripts 7.5k dependents
statistikat
simPop:Simulation of Complex Synthetic Data Information
Tools and methods to simulate populations for surveys based on auxiliary data. The tools include model-based methods, calibration and combinatorial optimization algorithms; see Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v079.i10> and Templ (2017) <doi:10.1007/978-3-319-50272-4>. The package was developed with support of the International Household Survey Network, DFID Trust Fund TF011722 and funds from the World Bank.
Maintained by Matthias Templ. Last updated 4 months ago.
7.2 match 31 stars 6.51 score 104 scripts
bioc
biocViews:Categorized views of R package repositories
Infrastructure to support 'views' used to classify Bioconductor packages. 'biocViews' are directed acyclic graphs of terms from a controlled vocabulary. There are three major classifications, corresponding to 'software', 'annotation', and 'experiment data' packages.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
infrastructure, bioconductor-package, core-package
4.8 match 4 stars 9.71 score 30 scripts 14 dependents
insightsengineering
cardx:Extra Analysis Results Data Utilities
Create extra Analysis Results Data (ARD) summary objects. The package supplements the simple ARD functions from the 'cards' package, exporting functions to put statistical results in the ARD format. These objects are used and re-used to construct summary tables, visualizations, and written reports.
Maintained by Daniel D. Sjoberg. Last updated 20 days ago.
5.4 match 19 stars 8.46 score 50 scripts
pedrosfig
BayesSampling:Bayes Linear Estimators for Finite Population
Allows the user to apply the Bayes Linear approach to finite population with the Simple Random Sampling - BLE_SRS() - and the Stratified Simple Random Sampling design - BLE_SSRS() - (both without replacement), to the Ratio estimator (using auxiliary information) - BLE_Ratio() - and to categorical data - BLE_Categorical(). The Bayes linear estimation approach is applied to a general linear regression model for finite population prediction in BLE_Reg() and it is also possible to achieve the design based estimators using vague prior distributions. Based on Gonçalves, K.C.M, Moura, F.A.S and Migon, H.S.(2014) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X201400111886>.
Maintained by Pedro Soares Figueiredo. Last updated 4 years ago.
10.0 match 1 stars 4.56 score 12 scripts
wilkelab
sicegar:Analysis of Single-Cell Viral Growth Curves
Aims to quantify time-intensity data by using sigmoidal and double sigmoidal curves. It fits straight lines, sigmoidal, and double sigmoidal curves to time vs. intensity data. All the fits are then used to decide which model best describes the data. This method was first developed in the context of single-cell viral growth analysis (for details, see Caglar et al. (2018) <doi:10.7717/peerj.4251>), and the package name stands for "SIngle CEll Growth Analysis in R".
Maintained by Claus O. Wilke. Last updated 4 years ago.
6.9 match 9 stars 6.57 score 41 scripts
pegeler
samplesizeCMH:Power and Sample Size Calculation for the Cochran-Mantel-Haenszel Test
Calculates the power and sample size for Cochran-Mantel-Haenszel tests. There are also several helper functions for working with probability, odds, relative risk, and odds ratio values.
Maintained by Paul Egeler. Last updated 1 month ago.
categorical-data, cmh-test, sample-size, statistical-power, statistics
7.5 match 4 stars 5.94 score 36 scripts
trivialfis
xgboost:Extreme Gradient Boosting
Extreme Gradient Boosting, which is an efficient implementation of the gradient boosting framework from Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>. This package is its R interface. The package includes an efficient linear model solver and tree learning algorithms. The package can automatically do parallel computation on a single machine, which could be more than 10 times faster than existing gradient boosting packages. It supports various objective functions, including regression, classification and ranking. The package is made to be extensible, so that users can also easily define their own objectives.
Maintained by Jiaming Yuan. Last updated 8 months ago.
3.8 match 6 stars 11.70 score 13k scripts 112 dependents
easystats
datawizard:Easy Data Wrangling and Statistical Transformations
A lightweight package to assist in key steps involved in any data analysis workflow: (1) wrangling the raw data to get it in the needed form, (2) applying preprocessing steps and statistical transformations, and (3) computing statistical summaries of data properties and distributions. It is also the data wrangling backend for packages in the 'easystats' ecosystem. References: Patil et al. (2022) <doi:10.21105/joss.04684>.
Maintained by Etienne Bacher. Last updated 9 days ago.
data, dplyr, hacktoberfest, janitor, manipulation, reshape, tidyr, wrangling
3.0 match 222 stars 14.71 score 436 scripts 119 dependents
r-spatialecology
landscapemetrics:Landscape Metrics for Categorical Map Patterns
Calculates landscape metrics for categorical landscape patterns in a tidy workflow. 'landscapemetrics' reimplements the most common metrics from 'FRAGSTATS' (<https://www.fragstats.org/>) and new ones from the current literature on landscape metrics. This package supports 'terra' SpatRaster objects as input arguments. It further provides utility functions to visualize patches, select metrics and building blocks to develop new metrics.
Maintained by Maximilian H.K. Hesselbarth. Last updated 1 month ago.
landscape-ecology, landscape-metrics, raster, spatial, cpp
3.5 match 240 stars 12.47 score 584 scripts 4 dependents
jacobkap
fastDummies:Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables
Creates dummy columns from columns that have categorical variables (character or factor types). You can also specify which columns to make dummies out of, or which columns to ignore. Also creates dummy rows from character, factor, and Date columns. This package provides a significant speed increase from creating dummy variables through model.matrix().
Maintained by Jacob Kaplan. Last updated 2 months ago.
binary-data, dummy-columns, dummy-data, dummy-rows, dummy-variable
3.3 match 36 stars 13.14 score 2.5k scripts 131 dependents
schaubert
catdata:Categorical Data
This R-package contains examples from the book "Regression for Categorical Data", Tutz 2012, Cambridge University Press. The names of the examples refer to the chapter and the data set that is used.
Maintained by Gunther Schauberger. Last updated 1 year ago.
6.5 match 6.61 score 158 scripts 2 dependents
benjaminrich
table1:Tables of Descriptive Statistics in HTML
Create HTML tables of descriptive statistics, as one would expect to see as the first table (i.e. "Table 1") in a medical/epidemiological journal article.
Maintained by Benjamin Rich. Last updated 2 years ago.
3.8 match 81 stars 11.17 score 1.5k scripts 6 dependents
koalaverse
sure:Surrogate Residuals for Ordinal and General Regression Models
An implementation of the surrogate approach to residuals and diagnostics for ordinal and general regression models; for details, see Liu and Zhang (2017, <doi:10.1080/01621459.2017.1292915>) and Greenwell et al. (2017, <https://journal.r-project.org/archive/2018/RJ-2018-004/index.html>). These residuals can be used to construct standard residual plots for model diagnostics (e.g., residual-vs-fitted value plots, residual-vs-covariate plots, Q-Q plots, etc.). The package also provides an 'autoplot' function for producing standard diagnostic plots using 'ggplot2' graphics. The package currently supports cumulative link models from packages 'MASS', 'ordinal', 'rms', and 'VGAM'. Support for binary regression models using the standard 'glm' function is also available.
Maintained by Brandon Greenwell. Last updated 13 days ago.
categorical-data, diagnostics, ordinal-regression, residuals
7.5 match 9 stars 5.58 score 47 scripts 1 dependents
epiforecasts
scoringutils:Utilities for Scoring and Assessing Predictions
Facilitate the evaluation of forecasts in a convenient framework based on data.table. It allows users to check their forecasts and diagnose issues, to visualise forecasts and missing data, to transform data before scoring, to handle missing forecasts, to aggregate scores, and to visualise the results of the evaluation. The package mostly focuses on the evaluation of probabilistic forecasts and allows evaluating several different forecast types and input formats. Find more information about the package in the vignettes as well as in the accompanying paper, <doi:10.48550/arXiv.2205.07090>.
Maintained by Nikos Bosse. Last updated 13 days ago.
forecast-evaluation, forecasting
3.7 match 52 stars 11.37 score 326 scripts 7 dependents
bioc
iSEE:Interactive SummarizedExperiment Explorer
Create an interactive Shiny-based graphical user interface for exploring data stored in SummarizedExperiment objects, including row- and column-level metadata. The interface supports transmission of selections between plots and tables, code tracking, interactive tours, interactive or programmatic initialization, preservation of app state, and extensibility to new panel types via S4 classes. Special attention is given to single-cell data in a SingleCellExperiment object with visualization of dimensionality reduction results.
Maintained by Kevin Rue-Albrecht. Last updated 10 days ago.
cellbasedassaysclusteringdimensionreductionfeatureextractiongeneexpressionguiimmunooncologyshinyappssinglecelltranscriptiontranscriptomicsvisualizationdimension-reductionfeature-extractiongene-expressionhacktoberfesthuman-cell-atlasshinysingle-cell
3.2 match 225 stars 12.86 score 380 scripts 9 dependentsopenintrostat
openintro:Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs
Supplemental functions and data for 'OpenIntro' resources, which include open-source textbooks and resources for introductory statistics (<https://www.openintro.org/>). The package contains datasets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of the Windows operating system.
Maintained by Mine Çetinkaya-Rundel. Last updated 3 months ago.
3.6 match 240 stars 11.39 score 6.0k scriptsjeffreypullin
rater:Statistical Models of Repeated Categorical Rating Data
Fit statistical models based on the Dawid-Skene model - Dawid and Skene (1979) <doi:10.2307/2346806> - to repeated categorical rating data. Full Bayesian inference for these models is supported through the Stan modelling language. 'rater' also allows the user to extract and plot key parameters of these models.
Maintained by Jeffrey Pullin. Last updated 2 years ago.
annotationsbayesianbayesian-statisticsstancpp
6.9 match 18 stars 5.86 score 20 scriptskenaho1
asbio:A Collection of Statistical Tools for Biologists
Contains functions from: Aho, K. (2014) Foundational and Applied Statistics for Biologists using R. CRC/Taylor and Francis, Boca Raton, FL, ISBN: 978-1-4398-7338-0.
Maintained by Ken Aho. Last updated 2 months ago.
5.5 match 5 stars 7.32 score 310 scripts 3 dependentsrobindenz1
adjustedCurves:Confounder-Adjusted Survival Curves and Cumulative Incidence Functions
Estimate and plot confounder-adjusted survival curves using either 'Direct Adjustment', 'Direct Adjustment with Pseudo-Values', various forms of 'Inverse Probability of Treatment Weighting', two forms of 'Augmented Inverse Probability of Treatment Weighting', 'Empirical Likelihood Estimation' or 'Targeted Maximum Likelihood Estimation'. Also includes a significance test for the difference between two adjusted survival curves and the calculation of adjusted restricted mean survival times. Additionally enables the user to estimate and plot cause-specific confounder-adjusted cumulative incidence functions in the competing risks setting using the same methods (with some exceptions). For details, see Denz et al. (2023) <doi:10.1002/sim.9681>.
Maintained by Robin Denz. Last updated 28 days ago.
adjustedconfidence-intervalscumulative-incidencesurvival-curves
4.9 match 38 stars 8.12 score 93 scriptsejikeugba
gofcat:Goodness-of-Fit Measures for Categorical Response Models
A post-estimation method for categorical response models (CRM). Inputs from objects of class serp(), clm(), polr(), multinom(), mlogit(), vglm() and glm() are currently supported. Available tests include the Hosmer-Lemeshow tests for binary, multinomial and ordinal logistic regression, and the Lipsitz and the Pulkstenis-Robinson tests for ordinal models. The proportional odds, adjacent-category, and constrained continuation-ratio models are particularly supported at the ordinal level. Tests for the proportional odds assumption in ordinal models are also possible with the Brant and the Likelihood-Ratio tests. Moreover, several summary measures of predictive strength (Pseudo R-squared) and some useful error metrics, including the Brier score, misclassification rate and log loss, are also available for the binary, multinomial and ordinal models. Ugba, E. R. and Gertheiss, J. (2018) <http://www.statmod.org/workshops_archive_proceedings_2018.html>.
Maintained by Ejike R. Ugba. Last updated 2 years ago.
brant-testbrier-scoreshosmer-lemeshow-testlikelihood-ratio-testlipsitz-testlog-loss-score-metriclogistic-regressionmisclassificationordinal-regressionproportional-odds-testpseudo-r2pulkstenis-robinson-test
12.4 match 2 stars 3.18 score 15 scriptsgdurif
plsgenomics:PLS Analyses for Genomics
Routines for PLS-based genomic analyses, implementing PLS methods for classification with microarray data and prediction of transcription factor activities from combined ChIP-chip analysis. The >=1.2-1 versions include two new classification methods for microarray data: GSIM and Ridge PLS. The >=1.3 versions include a new classification method combining variable selection and compression in a logistic regression context: logit-SPLS; and an adaptive version of the sparse PLS.
Maintained by Ghislain Durif. Last updated 12 months ago.
7.0 match 5.55 score 140 scripts 2 dependentsrsquaredacademy
descriptr:Generate Descriptive Statistics
Generate descriptive statistics such as measures of location, dispersion, frequency tables, cross tables, group summaries and multiple one/two way tables.
Maintained by Aravind Hebbali. Last updated 4 months ago.
descriptive-statisticsedasummary-statistics
5.3 match 34 stars 7.37 score 221 scriptscimentadaj
perccalc:Estimate Percentiles from an Ordered Categorical Variable
An implementation of two functions that estimate values for percentiles from an ordered categorical variable as described by Reardon (2011, isbn:978-0-87154-372-1). One function estimates percentile differences from two percentiles while the other returns the values for every percentile from 1 to 100.
Maintained by Jorge Cimentada. Last updated 5 years ago.
6.9 match 4 stars 5.53 score 14 scriptsbetsybersson
fabPrediction:Compute FAB (Frequentist and Bayes) Conformal Prediction Intervals
Computes and plots prediction intervals for numerical data or prediction sets for categorical data using prior information. Empirical Bayes procedures to estimate the prior information from multi-group data are included. See, e.g., Bersson and Hoff (2022) <arXiv:2204.08122> "Optimal Conformal Prediction for Small Areas".
Maintained by Elizabeth Bersson. Last updated 12 months ago.
8.8 match 4.30 score 2 scriptsmaelstrom-research
madshapR:Support Technical Processes Following 'Maelstrom Research' Standards
Functions to support rigorous processes in data cleaning, evaluation, and documentation across datasets from different studies based on Maelstrom Research guidelines. The package includes the core functions to evaluate and format the main inputs that define the process, diagnose errors, and summarize and evaluate datasets and their associated data dictionaries. The main outputs are clean datasets and associated metadata, and tabular and visual summary reports. The approach is described in the Maelstrom Research guidelines for rigorous retrospective data harmonization (Fortier I et al. (2017) <doi:10.1093/ije/dyw075>).
Maintained by Guillaume Fabre. Last updated 11 months ago.
7.0 match 2 stars 5.40 score 28 scripts 3 dependentsgreta-dev
greta:Simple and Scalable Statistical Modelling in R
Write statistical models in R and fit them by MCMC and optimisation on CPUs and GPUs, using Google 'TensorFlow'. greta lets you write your own model like in BUGS, JAGS and Stan, except that you write models right in R, it scales well to massive datasets, and it’s easy to extend and build on. See the website for more information, including tutorials, examples, package documentation, and the greta forum.
Maintained by Nicholas Tierney. Last updated 6 days ago.
3.0 match 566 stars 12.53 score 396 scripts 6 dependentsmoderndive
moderndive:Tidyverse-Friendly Introductory Linear Regression
Datasets and wrapper functions for tidyverse-friendly introductory linear regression, used in "Statistical Inference via Data Science: A ModernDive into R and the Tidyverse" available at <https://moderndive.com/>.
Maintained by Albert Y. Kim. Last updated 3 months ago.
3.2 match 88 stars 11.35 score 1.8k scriptsmjskay
tidybayes:Tidy Data and 'Geoms' for Bayesian Models
Compose data for and extract, manipulate, and visualize posterior draws from Bayesian models ('JAGS', 'Stan', 'rstanarm', 'brms', 'MCMCglmm', 'coda', ...) in a tidy data format. Functions are provided to help extract tidy data frames of draws from Bayesian models and that generate point summaries and intervals in a tidy format. In addition, 'ggplot2' 'geoms' and 'stats' are provided for common visualization primitives like points with multiple uncertainty intervals, eye plots (intervals plus densities), and fit curves with multiple, arbitrary uncertainty bands.
Maintained by Matthew Kay. Last updated 6 months ago.
bayesian-data-analysisbrmsggplot2jagsstantidy-datavisualization
2.5 match 733 stars 14.72 score 7.3k scripts 20 dependentsdarwin-eu
PatientProfiles:Identify Characteristics of Patients in the OMOP Common Data Model
Identify the characteristics of patients in data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model.
Maintained by Marti Catala. Last updated 9 days ago.
3.7 match 1 stars 9.97 score 225 scripts 9 dependentsbioc
scBubbletree:Quantitative visual exploration of scRNA-seq data
scBubbletree is a quantitative method for the visual exploration of scRNA-seq data, preserving key biological properties such as local and global cell distances and cell density distributions across samples. It effectively resolves overplotting and enables the visualization of diverse cell attributes from multiomic single-cell experiments. Additionally, scBubbletree is user-friendly and integrates seamlessly with popular scRNA-seq analysis tools, facilitating comprehensive and intuitive data interpretation.
Maintained by Simo Kitanovski. Last updated 5 months ago.
visualizationclusteringsinglecelltranscriptomicsrnaseqbig-databigdatascrna-seqscrna-seq-analysisvisualvisual-exploration
6.3 match 6 stars 5.82 score 8 scriptsmatloff
dsld:Data Science Looks at Discrimination
Statistical and graphical tools for detecting and measuring discrimination and bias, be it racial, gender, age or other. Detection and remediation of bias in machine learning algorithms. 'Python' interfaces available.
Maintained by Norm Matloff. Last updated 1 month ago.
4.7 match 12 stars 7.81 score 35 scriptsbioc
EnrichedHeatmap:Making Enriched Heatmaps
Enriched heatmap is a special type of heatmap which visualizes the enrichment of genomic signals on specific target regions. Here we implement the enriched heatmap using the ComplexHeatmap package. Since this type of heatmap is just a normal heatmap with some special settings, the functionality of ComplexHeatmap makes it much easier to customize the heatmap and to concatenate it to a list of heatmaps to show correspondence between different data sources.
Maintained by Zuguang Gu. Last updated 5 months ago.
softwarevisualizationsequencinggenomeannotationcoveragecpp
3.3 match 190 stars 10.87 score 330 scripts 1 dependentsylleonv
GLMcat:Generalized Linear Models for Categorical Responses
In statistical modeling, there is a wide variety of regression models for categorical dependent variables (nominal or ordinal data); yet, there is no software embracing all these models together in a uniform and generalized format. Following the methodology proposed by Peyhardi, Trottier, and Guédon (2015) <doi:10.1093/biomet/asv042>, we introduce 'GLMcat', an R package to estimate generalized linear models implemented under the unified specification (r, F, Z), where r represents the ratio of probabilities (reference, cumulative, adjacent, or sequential), F the cumulative distribution function for the linkage, and Z the design matrix.
Maintained by Lorena León. Last updated 6 months ago.
8.1 match 1 stars 4.41 score 17 scriptscran
StatMatch:Statistical Matching or Data Fusion
Integration of two data sources referred to the same target population which share a number of variables. Some functions can also be used to impute missing values in data sets through hot deck imputation methods. Methods to perform statistical matching when dealing with data from complex sample surveys are available too.
Maintained by Marcello DOrazio. Last updated 2 months ago.
8.8 match 4.03 score 10 dependentszjg540066169
AuxSurvey:Survey Analysis with Auxiliary Discretized Variables
Probability surveys often use auxiliary continuous data from administrative records, but the utility of this data is diminished when it is discretized for confidentiality. We provide a set of survey estimators to make full use of information from the discretized variables. See Williams, S.Z., Zou, J., Liu, Y., Si, Y., Galea, S. and Chen, Q. (2024), Improving Survey Inference Using Administrative Records Without Releasing Individual-Level Continuous Data. Statistics in Medicine, 43: 5803-5813. <doi:10.1002/sim.10270> for details.
Maintained by Jungang Zou. Last updated 3 months ago.
auxilary-variablescategorical-variablessurvey-analysis
7.5 match 1 stars 4.70 score 5 scriptsholgerschw
scrime:Analysis of High-Dimensional Categorical Data Such as SNP Data
Tools for the analysis of high-dimensional data developed/implemented at the group "Statistical Complexity Reduction In Molecular Epidemiology" (SCRIME). Main focus is on SNP data. But most of the functions can also be applied to other types of categorical data.
Maintained by Holger Schwender. Last updated 6 years ago.
7.1 match 4.98 score 53 scripts 34 dependentsvandomed
tab:Create Summary Tables for Statistical Reports
Contains functions for creating various types of summary tables, e.g. comparing characteristics across levels of a categorical variable and summarizing fitted generalized linear models, generalized estimating equations, and Cox proportional hazards models. Functions are available to handle data from simple random samples as well as complex surveys.
Maintained by Dane R. Van Domelen. Last updated 4 years ago.
manuscriptsreportsreproducible-researchstatisticstables
5.0 match 2 stars 6.97 score 86 scripts 9 dependentscran
ModelMap:Modeling and Map Production using Random Forest and Related Stochastic Models
Creates sophisticated models of training data and validates the models with an independent test set, cross validation, or Out Of Bag (OOB) predictions on the training data. Creates graphs and tables of the model validation results. Applies these models to GIS .img files of predictors to create detailed prediction surfaces. Handles large predictor files for map making by reading in the .img files in chunks and writing the prediction for each data chunk to the .txt file before reading the next chunk of data.
Maintained by Elizabeth Freeman. Last updated 2 years ago.
10.8 match 1 stars 3.26 scorerempsyc
rempsyc:Convenience Functions for Psychology
Make your workflow faster and easier. Easily customizable plots (via 'ggplot2'), nice APA tables (following the style of the *American Psychological Association*) exportable to Word (via 'flextable'), easily run statistical tests or check assumptions, and automatize various other tasks.
Maintained by Rémi Thériault. Last updated 1 month ago.
convenience-functionsggplot2psychologystatisticsvisualization
3.3 match 43 stars 10.68 score 214 scripts 2 dependentscran
NPBayesImputeCat:Non-Parametric Bayesian Multiple Imputation for Categorical Data
These routines create multiple imputations of missing at random categorical data, and create multiply imputed synthesis of categorical data, with or without structural zeros. Imputations and syntheses are based on Dirichlet process mixtures of multinomial distributions, which is a non-parametric Bayesian modeling approach that allows for flexible joint modeling, described in Manrique-Vallier and Reiter (2014) <doi:10.1080/10618600.2013.844700>.
Maintained by Jingchen Hu. Last updated 2 years ago.
23.5 match 1.48 score 1 dependentsplangfelder
WGCNA:Weighted Correlation Network Analysis
Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>. Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.
Maintained by Peter Langfelder. Last updated 6 months ago.
3.5 match 54 stars 9.65 score 5.3k scripts 32 dependentscran
vcd:Visualizing Categorical Data
Visualization techniques, data sets, summary and inference procedures aimed particularly at categorical data. Special emphasis is given to highly extensible grid graphics. The package was originally inspired by the book "Visualizing Categorical Data" by Michael Friendly and is now the main support package for a new book, "Discrete Data Analysis with R" by Michael Friendly and David Meyer (2015).
Maintained by David Meyer. Last updated 6 months ago.
4.1 match 5 stars 8.19 score 87 dependentsjcfaria
fdth:Frequency Distribution Tables, Histograms and Polygons
Perform frequency distribution tables, associated histograms and polygons from vector, data.frame and matrix objects for numerical and categorical variables.
Maintained by José C. Faria. Last updated 1 year ago.
5.7 match 2 stars 5.87 score 107 scriptsepicentre-msf
dbc:Dictionary-Based Cleaning
Tools for dictionary-based data cleaning.
Maintained by Patrick Barks. Last updated 1 year ago.
13.5 match 2 stars 2.48 score 4 scripts 1 dependentsajwills72
grt:General Recognition Theory
Functions to generate and analyze data for psychology experiments based on the General Recognition Theory.
Maintained by Andy Wills. Last updated 8 years ago.
13.9 match 2.34 score 44 scriptsoverton-group
eHDPrep:Quality Control and Semantic Enrichment of Datasets
A tool for the preparation and enrichment of health datasets for analysis (Toner et al. (2023) <doi:10.1093/gigascience/giad030>). Provides functionality for assessing data quality and for improving the reliability and machine interpretability of a dataset. 'eHDPrep' also enables semantic enrichment of a dataset where metavariables are discovered from the relationships between input variables determined from user-provided ontologies.
Maintained by Ian Overton. Last updated 2 years ago.
data-qualityhealth-informaticssemantic-enrichment
6.6 match 8 stars 4.90 score 10 scriptsrsparapa
BART:Bayesian Additive Regression Trees
Bayesian Additive Regression Trees (BART) provide flexible nonparametric modeling of covariates for continuous, binary, categorical and time-to-event outcomes. For more information see Sparapani, Spanbauer and McCulloch <doi:10.18637/jss.v097.i01>.
Maintained by Rodney Sparapani. Last updated 9 months ago.
4.1 match 14 stars 7.96 score 474 scripts 10 dependentsmaxwestphal
cases:Stratified Evaluation of Subgroup Classification Accuracy
Enables simultaneous statistical inference for the accuracy of multiple classifiers in multiple subgroups (strata). For instance, it allows performing multiple comparisons in diagnostic accuracy studies with the co-primary endpoints sensitivity and specificity (Westphal M, Zapf A. Statistical inference for diagnostic test accuracy studies with multiple comparisons. Statistical Methods in Medical Research. 2024;0(0). <doi:10.1177/09622802241236933>).
Maintained by Max Westphal. Last updated 2 months ago.
7.0 match 1 stars 4.59 score 13 scriptstwolodzko
extraDistr:Additional Univariate and Multivariate Distributions
Density, distribution function, quantile function and random generation for a number of univariate and multivariate distributions. This package implements the following distributions: Bernoulli, beta-binomial, beta-negative binomial, beta prime, Bhattacharjee, Birnbaum-Saunders, bivariate normal, bivariate Poisson, categorical, Dirichlet, Dirichlet-multinomial, discrete gamma, discrete Laplace, discrete normal, discrete uniform, discrete Weibull, Frechet, gamma-Poisson, generalized extreme value, Gompertz, generalized Pareto, Gumbel, half-Cauchy, half-normal, half-t, Huber density, inverse chi-squared, inverse-gamma, Kumaraswamy, Laplace, location-scale t, logarithmic, Lomax, multivariate hypergeometric, multinomial, negative hypergeometric, non-standard beta, normal mixture, Poisson mixture, Pareto, power, reparametrized beta, Rayleigh, shifted Gompertz, Skellam, slash, triangular, truncated binomial, truncated normal, truncated Poisson, Tukey lambda, Wald, zero-inflated binomial, zero-inflated negative binomial, zero-inflated Poisson.
Maintained by Tymoteusz Wolodzko. Last updated 11 days ago.
c-plus-plusc-plus-plus-11distributionmultivariate-distributionsprobabilityrandom-generationrcppstatisticscpp
2.8 match 53 stars 11.60 score 1.5k scripts 107 dependentsalastairrushworth
inspectdf:Inspection, Comparison and Visualisation of Data Frames
A collection of utilities for columnwise summary, comparison and visualisation of data frames. Functions report missingness, categorical levels, numeric distribution, correlation, column types and memory usage.
Maintained by Alastair Rushworth. Last updated 3 years ago.
comparisondataframeedaexploratory-data-analysisvisualizationcpp
4.1 match 251 stars 7.74 score 444 scripts 1 dependentsjonnob
rsetse:Strain Elevation Tension Spring Embedding
An R implementation for the Strain Elevation and Tension embedding algorithm from Bourne (2020) <doi:10.1007/s41109-020-00329-4>. The package embeds graphs and networks using the Strain Elevation and Tension embedding (SETSe) algorithm. SETSe represents the network as a physical system, where edges are elastic, and nodes exert a force either up or down based on node features. SETSe positions the nodes vertically such that the tension in the edges of a node is equal and opposite to the force it exerts for all nodes in the network. The resultant structure can then be analysed by looking at the node elevation and the edge strain and tension. This algorithm works on weighted and unweighted networks as well as networks with or without explicit node features. Edge elasticity can be created from existing edge weights or kept as a constant.
Maintained by Jonathan Bourne. Last updated 3 years ago.
embeddingembedding-graphsgraph-embeddingigraphnetworksnetworkscienceunsupervised-learningcppopenmp
6.5 match 7 stars 4.85 score 8 scriptstkcaccia
KODAMA:Knowledge Discovery by Accuracy Maximization
An unsupervised and semi-supervised learning algorithm that performs feature extraction from noisy and high-dimensional data. It facilitates identification of patterns representing underlying groups on all samples in a data set. Based on Cacciatore S, Tenori L, Luchinat C, Bennett PR, MacIntyre DA. (2017) Bioinformatics <doi:10.1093/bioinformatics/btw705> and Cacciatore S, Luchinat C, Tenori L. (2014) Proc Natl Acad Sci USA <doi:10.1073/pnas.1220873111>.
Maintained by Stefano Cacciatore. Last updated 13 hours ago.
4.5 match 1 stars 7.00 score 63 scripts 1 dependentshusson
FactoMineR:Multivariate Exploratory Data Analysis and Data Mining
Exploratory data analysis methods to summarize, visualize and describe datasets. The main principal component methods are available, those with the largest potential in terms of applications: principal component analysis (PCA) when variables are quantitative, correspondence analysis (CA) and multiple correspondence analysis (MCA) when variables are categorical, Multiple Factor Analysis when variables are structured in groups, etc. and hierarchical cluster analysis. F. Husson, S. Le and J. Pages (2017).
Maintained by Francois Husson. Last updated 3 months ago.
2.1 match 47 stars 14.71 score 5.6k scripts 112 dependentsr-tmap
tmap:Thematic Maps
Thematic maps are geographical maps in which spatial data distributions are visualized. This package offers a flexible, layer-based, and easy to use approach to create thematic maps, such as choropleths and bubble maps.
Maintained by Martijn Tennekes. Last updated 5 days ago.
choropleth-mapsmapsspatialthematic-mapsvisualisation
1.9 match 880 stars 16.73 score 13k scripts 24 dependentsjameslamb
lightgbm:Light Gradient Boosting Machine
Tree-based algorithms can be improved by introducing boosting frameworks. 'LightGBM' is one such framework, based on Ke, Guolin et al. (2017) <https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision>. This package offers an R interface to work with it. It is designed to be distributed and efficient with the following advantages: 1. Faster training speed and higher efficiency. 2. Lower memory usage. 3. Better accuracy. 4. Parallel learning supported. 5. Capable of handling large-scale data. In recognition of these advantages, 'LightGBM' has been widely used in many winning solutions of machine learning competitions. Comparison experiments on public datasets suggest that 'LightGBM' can outperform existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption. In addition, parallel experiments suggest that in certain circumstances, 'LightGBM' can achieve a linear speed-up in training time by using multiple machines.
Maintained by James Lamb. Last updated 1 month ago.
3.7 match 1 stars 8.47 score 1.6k scripts 6 dependentscmmr
rbiom:Read/Write, Analyze, and Visualize 'BIOM' Data
A toolkit for working with Biological Observation Matrix ('BIOM') files. Read/write all 'BIOM' formats. Compute rarefaction, alpha diversity, and beta diversity (including 'UniFrac'). Summarize counts by taxonomic level. Subset based on metadata. Generate visualizations and statistical analyses. CPU intensive operations are coded in C for speed.
Maintained by Daniel P. Smith. Last updated 6 days ago.
3.4 match 15 stars 9.02 score 117 scripts 6 dependentsthomaswiemann
civ:Categorical Instrumental Variables
Implementation of the categorical instrumental variable (CIV) estimator proposed by Wiemann (2023) <arXiv:2311.17021>. CIV allows for optimal instrumental variable estimation in settings with relatively few observations per category. To obtain valid inference in these challenging settings, CIV leverages a regularization assumption that implies existence of a latent categorical variable with fixed finite support achieving the same first stage fit as the observed instrument.
Maintained by Thomas Wiemann. Last updated 1 year ago.
7.7 match 2 stars 4.00 score 5 scriptscaubm
ExactMed:Exact Mediation Analysis for Binary Outcomes
A tool for conducting exact parametric regression-based causal mediation analysis of binary outcomes as described in Samoilenko, Blais and Lefebvre (2018) <doi:10.1353/obs.2018.0013>; Samoilenko, Lefebvre (2021) <doi:10.1093/aje/kwab055>; and Samoilenko, Lefebvre (2023) <doi:10.1002/sim.9621>.
Maintained by Miguel Caubet. Last updated 1 year ago.
8.3 match 3.70 score 5 scriptsewenharrison
finalfit:Quickly Create Elegant Regression Results Tables and Plots when Modelling
Generate regression results tables and plots in final format for publication. Explore models and export directly to PDF and 'Word' using 'RMarkdown'.
Maintained by Ewen Harrison. Last updated 7 months ago.
2.7 match 270 stars 11.43 score 1.0k scriptsmitchelloharawild
distributional:Vectorised Probability Distributions
Vectorised distribution objects with tools for manipulating, visualising, and using probability distributions. Designed to allow model prediction outputs to return distributions rather than their parameters, allowing users to directly interact with predictive distributions in a data-oriented workflow. In addition to providing generic replacements for p/d/q/r functions, other useful statistics can be computed including means, variances, intervals, and highest density regions.
Maintained by Mitchell OHara-Wild. Last updated 2 months ago.
probability-distributionstatisticsvctrs
2.3 match 101 stars 13.50 score 744 scripts 384 dependentsggobi
GGally:Extension to 'ggplot2'
The R package 'ggplot2' is a plotting system based on the grammar of graphics. 'GGally' extends 'ggplot2' by adding several functions to reduce the complexity of combining geometric objects with transformed data. Some of these functions include a pairwise plot matrix, a two group pairwise plot matrix, a parallel coordinates plot, a survival plot, and several functions to plot networks.
Maintained by Barret Schloerke. Last updated 10 months ago.
1.9 match 597 stars 16.15 score 17k scripts 154 dependentslaplacesdemonr
LaplacesDemon:Complete Environment for Bayesian Inference
Provides a complete environment for Bayesian inference using a variety of different samplers (see ?LaplacesDemon for an overview).
Maintained by Henrik Singmann. Last updated 12 months ago.
2.3 match 93 stars 13.45 score 1.8k scripts 60 dependentsdtkaplan
LSTbook:Data and Software for "Lessons in Statistical Thinking"
"Lessons in Statistical Thinking" D.T. Kaplan (2014) <https://dtkaplan.github.io/Lessons-in-statistical-thinking/> is a textbook for a first or second course in statistics that embraces data wrangling, causal reasoning, modeling, statistical adjustment, and simulation. 'LSTbook' supports the student-centered, tidy, pipeline-oriented computing style featured in the book.
Maintained by Daniel Kaplan. Last updated 1 day ago.
4.8 match 4 stars 6.29 score 27 scriptsvegawidget
vegawidget:'Htmlwidget' for 'Vega' and 'Vega-Lite'
'Vega' and 'Vega-Lite' parse text in 'JSON' notation to render chart-specifications into 'HTML'. This package is used to facilitate the rendering. It also provides a means to interact with signals, events, and datasets in a 'Vega' chart using 'JavaScript' or 'Shiny'.
Maintained by Ian Lyttle. Last updated 1 year ago.
3.8 match 68 stars 8.04 score 49 scripts 4 dependentsbioc
metabomxtr:A package to run mixture models for truncated metabolomics data with normal or lognormal distributions
The functions in this package return optimized parameter estimates and log likelihoods for mixture models of truncated data with normal or lognormal distributions.
Maintained by Michael Nodzenski. Last updated 5 months ago.
immunooncologymetabolomicsmassspectrometry
8.2 match 3.60 score 5 scriptsalexpkeil1
qgcompint:Quantile G-Computation Extensions for Effect Measure Modification
G-computation for a set of time-fixed exposures with quantile-based basis functions, possibly under linearity and homogeneity assumptions. Effect measure modification in this method is a way to assess how the effect of the mixture varies by a binary, categorical or continuous variable. Reference: Alexander P. Keil, Jessie P. Buckley, Katie M. OBrien, Kelly K. Ferguson, Shanshan Zhao, and Alexandra J. White (2019) A quantile-based g-computation approach to addressing the effects of exposure mixtures; <doi:10.1289/EHP5838>.
Maintained by Alexander Keil. Last updated 4 days ago.
6.1 match 4 stars 4.89 score 13 scriptsflorianhartig
DHARMa:Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models
The 'DHARMa' package uses a simulation-based approach to create readily interpretable scaled (quantile) residuals for fitted (generalized) linear mixed models. Currently supported are linear and generalized linear (mixed) models from 'lme4' (classes 'lmerMod', 'glmerMod'), 'glmmTMB', 'GLMMadaptive', and 'spaMM'; phylogenetic linear models from 'phylolm' (classes 'phylolm' and 'phyloglm'); generalized additive models ('gam' from 'mgcv'); 'glm' (including 'negbin' from 'MASS', but excluding quasi-distributions) and 'lm' model classes. Moreover, externally created simulations, e.g. posterior predictive simulations from Bayesian software such as 'JAGS', 'STAN', or 'BUGS' can be processed as well. The resulting residuals are standardized to values between 0 and 1 and can be interpreted as intuitively as residuals from a linear regression. The package also provides a number of plot and test functions for typical model misspecification problems, such as over/underdispersion, zero-inflation, and residual spatial, phylogenetic and temporal autocorrelation.
Maintained by Florian Hartig. Last updated 12 days ago.
glmmregressionregression-diagnosticsresidual
2.0 match 226 stars 14.74 score 2.8k scripts 10 dependents
bioc
siggenes:Multiple Testing using SAM and Efron's Empirical Bayes Approaches
Identification of differentially expressed genes and estimation of the False Discovery Rate (FDR) using both the Significance Analysis of Microarrays (SAM) and the Empirical Bayes Analyses of Microarrays (EBAM).
Maintained by Holger Schwender. Last updated 5 months ago.
multiplecomparisonmicroarraygeneexpressionsnpexonarraydifferentialexpression
3.8 match 7.86 score 74 scripts 33 dependents
rstudio
crosstalk:Inter-Widget Interactivity for HTML Widgets
Provides building blocks for allowing HTML widgets to communicate with each other, with Shiny or without (i.e. static .html files). Currently supports linked brushing and filtering.
Maintained by Carson Sievert. Last updated 2 months ago.
2.0 match 292 stars 14.69 score 1.6k scripts 1.5k dependents
deepayan
lattice:Trellis Graphics for R
A powerful and elegant high-level data visualization system inspired by Trellis graphics, with an emphasis on multivariate data. Lattice is sufficient for typical graphics needs, and is also flexible enough to handle most nonstandard requirements. See ?Lattice for an introduction.
Maintained by Deepayan Sarkar. Last updated 11 months ago.
1.7 match 68 stars 17.33 score 27k scripts 13k dependents
nimble-dev
nimble:MCMC, Particle Filtering, and Programmable Hierarchical Modeling
A system for writing hierarchical statistical models largely compatible with 'BUGS' and 'JAGS', writing nimbleFunctions to operate models and do basic R-style math, and compiling both models and nimbleFunctions via custom-generated C++. 'NIMBLE' includes default methods for MCMC, Laplace Approximation, Monte Carlo Expectation Maximization, and some other tools. The nimbleFunction system makes it easy to do things like implement new MCMC samplers from R, customize the assignment of samplers to different parts of a model from R, and compile the new samplers automatically via C++ alongside the samplers 'NIMBLE' provides. 'NIMBLE' extends the 'BUGS'/'JAGS' language by making it extensible: New distributions and functions can be added, including as calls to external compiled code. Although most people think of MCMC as the main goal of the 'BUGS'/'JAGS' language for writing models, one can use 'NIMBLE' for writing arbitrary other kinds of model-generic algorithms as well. A full User Manual is available at <https://r-nimble.org>.
Maintained by Christopher Paciorek. Last updated 4 days ago.
bayesian-inferencebayesian-methodshierarchical-modelsmcmcprobabilistic-programmingopenblascpp
2.3 match 169 stars 12.97 score 2.6k scripts 19 dependents
rmaia
pavo:Perceptual Analysis, Visualization and Organization of Spectral Colour Data
A cohesive framework for the spectral and spatial analysis of colour described in Maia, Eliason, Bitton, Doucet & Shawkey (2013) <doi:10.1111/2041-210X.12069> and Maia, Gruson, Endler & White (2019) <doi:10.1111/2041-210X.13174>.
Maintained by Thomas White. Last updated 1 month ago.
3.0 match 72 stars 9.72 score 151 scripts 1 dependent
business-science
correlationfunnel:Speed Up Exploratory Data Analysis (EDA) with the Correlation Funnel
Speeds up exploratory data analysis (EDA) by providing a succinct workflow and interactive visualization tools for understanding which features have relationships to target (response). Uses binary correlation analysis to determine relationship. Default correlation method is the Pearson method. Lian Duan, W Nick Street, Yanchi Liu, Songhua Xu, and Brook Wu (2014) <doi:10.1145/2637484>.
Maintained by Matt Dancho. Last updated 1 year ago.
correlationexploratory-analysisexploratory-data-analysisexploratory-data-visualizationstidyverse
4.0 match 137 stars 7.20 score 115 scripts
tidymodels
yardstick:Tidy Characterizations of Model Performance
Tidy tools for quantifying how well a model fits a data set, such as confusion matrices, class probability curve summaries, and regression metrics (e.g., RMSE).
Maintained by Emil Hvitfeldt. Last updated 4 days ago.
1.9 match 387 stars 15.47 score 2.2k scripts 60 dependents
mlr-org
mlr3torch:Deep Learning with 'mlr3'
Deep Learning library that extends the mlr3 framework by building upon the 'torch' package. It lets users conveniently build, train, and evaluate deep learning models without having to worry about low-level details. Custom architectures can be created using the graph language defined in 'mlr3pipelines'.
Maintained by Sebastian Fischer. Last updated 1 month ago.
data-sciencedeep-learningmachine-learningmlr3torch
3.8 match 42 stars 7.63 score 78 scripts
ejikeugba
serp:Smooth Effects on Response Penalty for CLM
Implements a regularization method for cumulative link models using the Smooth-Effect-on-Response Penalty (SERP). This method allows flexible modeling of ordinal data by enabling a smooth transition from a general cumulative link model to a simplified version of the same model. As the tuning parameter increases from zero to infinity, the subject-specific effects for each variable converge to a single global effect. The approach addresses common issues in cumulative link models, such as parameter unidentifiability and numerical instability, by maximizing a penalized log-likelihood instead of the standard non-penalized version. Fitting is performed using a modified Newton's method. Additionally, the package includes various model performance metrics and descriptive tools. For details on the implemented penalty method, see Ugba (2021) <doi:10.21105/joss.03705> and Ugba et al. (2021) <doi:10.3390/stats4030037>.
Maintained by Ejike R. Ugba. Last updated 4 months ago.
categorical-dataordinal-regressionpenalized-regressionproportional-odds-regressionregularization-techniques
7.5 match 1 star 3.86 score 44 scripts
tomasfryda
h2o:R Interface for the 'H2O' Scalable Machine Learning Platform
R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
Maintained by Tomas Fryda. Last updated 1 year ago.
3.5 match 3 stars 8.20 score 7.8k scripts 11 dependents
verkehrsbetriebezuerich
catmaply:Heatmap for Categorical Data using 'plotly'
Methods and plotting functions for displaying categorical data on an interactive heatmap using 'plotly'. Provides functionality for strictly categorical heatmaps, heatmaps illustrating categorized continuous data and annotated heatmaps. Also, there are various options to interact with the x-axis to prevent overlapping axis labels, e.g. via simple sliders or range sliders. Besides the viewer pane, resulting plots can be saved as a standalone HTML file, embedded in 'R Markdown' documents or in a 'Shiny' app.
Maintained by Yves Mauron. Last updated 2 months ago.
5.7 match 16 stars 4.98 score 12 scripts
mlverse
torch:Tensors and Neural Networks with 'GPU' Acceleration
Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al (2019) <doi:10.48550/arXiv.1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.
Maintained by Daniel Falbel. Last updated 6 days ago.
1.7 match 520 stars 16.52 score 1.4k scripts 38 dependents
cefet-rj-dal
daltoolbox:Leveraging Experiment Lines to Data Analytics
The natural increase in the complexity of current research experiments and data demands better tools to enhance productivity in Data Analytics. The package is a framework designed to address the modern challenges in data analytics workflows. The package is inspired by Experiment Line concepts. It aims to provide seamless support for users in developing their data mining workflows by offering a uniform data model and method API. It enables the integration of various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction. It also offers options for hyper-parameter tuning and supports integration with existing libraries and languages. Overall, the package provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>.
Maintained by Eduardo Ogasawara. Last updated 1 month ago.
4.3 match 1 star 6.65 score 536 scripts 4 dependents
jobnmadu
Dyn4cast:Dynamic Modeling and Machine Learning Environment
Estimates, predicts, and forecasts dynamic models, and computes machine learning metrics that assist in model selection for further analysis. The package also provides tools and metrics that are useful in machine learning and modeling. For example, there are quick summary, percent sign, and Mallows' Cp tools, among others. The package is aimed at the analysis of economic data for national development. The package has so far been stable, reliable, efficient, and time-saving.
Maintained by Job Nmadu. Last updated 9 hours ago.
data-scienceequal-lenght-forecastforecastingknotsmachine-learningnigeriapredictionregression-modelsspline-modelsstatisticstime-series
5.6 match 4 stars 5.03 score 38 scripts
anloor7
ctsfeatures:Analyzing Categorical Time Series
An implementation of several functions for feature extraction in categorical time series datasets. Specifically, some features related to marginal distributions and serial dependence patterns can be computed. These features can be used to feed clustering and classification algorithms for categorical time series, among others. The package also includes some interesting datasets containing biological sequences. Practitioners from a broad variety of fields could benefit from the general framework provided by 'ctsfeatures'.
Maintained by Angel Lopez-Oriona. Last updated 1 year ago.
28.0 match 1 star 1.00 score 1 script
dominiquemaucieri
quadcleanR:Cleanup and Visualization of Quadrat Data
A tool that can be customized to aid in the clean up of ecological data collected using quadrats and can crop quadrats to ensure comparability between quadrats collected under different methodologies.
Maintained by Dominique Maucieri. Last updated 2 years ago.
6.3 match 4.45 score 14 scripts
ecortesgomez
DiscreteGapStatistic:An Extension of the Gap Statistic for Ordinal/Categorical Data
The gap statistic approach is extended to estimate the number of clusters for categorical response format data. This approach and accompanying software is designed to be used with the output of any clustering algorithm and with distances specifically designed for categorical (i.e. multiple choice) or ordinal survey response data.
Maintained by Eduardo Cortes. Last updated 11 days ago.
7.3 match 3.81 score 4 scripts
statnet
ergm:Fit, Simulate and Diagnose Exponential-Family Models for Networks
An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) <doi:10.18637/jss.v024.i03> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.
Maintained by Pavel N. Krivitsky. Last updated 7 days ago.
1.8 match 100 stars 15.36 score 1.4k scripts 36 dependents
mjskay
ggdist:Visualizations of Distributions and Uncertainty
Provides primitives for visualizing distributions using 'ggplot2' that are particularly tuned for visualizing uncertainty in either a frequentist or Bayesian mode. Both analytical distributions (such as frequentist confidence distributions or Bayesian priors) and distributions represented as samples (such as bootstrap distributions or Bayesian posterior samples) are easily visualized. Visualization primitives include but are not limited to: points with multiple uncertainty intervals, eye plots (Spiegelhalter D., 1999) <https://ideas.repec.org/a/bla/jorssa/v162y1999i1p45-58.html>, density plots, gradient plots, dot plots (Wilkinson L., 1999) <doi:10.1080/00031305.1999.10474474>, quantile dot plots (Kay M., Kola T., Hullman J., Munson S., 2016) <doi:10.1145/2858036.2858558>, complementary cumulative distribution function barplots (Fernandes M., Walls L., Munson S., Hullman J., Kay M., 2018) <doi:10.1145/3173574.3173718>, and fit curves with multiple uncertainty ribbons.
Maintained by Matthew Kay. Last updated 4 months ago.
ggplot2uncertaintyuncertainty-visualizationvisualizationcpp
1.8 match 856 stars 15.24 score 3.1k scripts 61 dependents
elies-ramon
kerntools:Kernel Functions and Tools for Machine Learning Applications
Kernel functions for diverse types of data (including, but not restricted to: nonnegative and real vectors, real matrices, categorical and ordinal variables, sets, strings), plus other utilities like kernel similarity, kernel Principal Components Analysis (PCA) and features' importance for Support Vector Machines (SVMs), which expand other 'R' packages like 'kernlab'.
Maintained by Elies Ramon. Last updated 25 days ago.
5.7 match 1 star 4.73 score 12 scripts
schlosslab
mikropml:User-Friendly R Package for Supervised Machine Learning Pipelines
An interface to build machine learning models for classification and regression problems. 'mikropml' implements the ML pipeline described by Topçuoğlu et al. (2020) <doi:10.1128/mBio.00434-20> with reasonable default options for data preprocessing, hyperparameter tuning, cross-validation, testing, model evaluation, and interpretation steps. See the website <https://www.schlosslab.org/mikropml/> for more information, documentation, and examples.
Maintained by Kelly Sovacool. Last updated 2 years ago.
3.4 match 56 stars 7.83 score 86 scripts
cdcgov
surveytable:Formatted Survey Estimates
Short and understandable commands that generate tabulated, formatted, and rounded survey estimates. Mostly a wrapper for the 'survey' package (Lumley (2004) <doi:10.18637/jss.v009.i08> <https://CRAN.R-project.org/package=survey>) that identifies low-precision estimates using the National Center for Health Statistics (NCHS) presentation standards (Parker et al. (2017) <https://www.cdc.gov/nchs/data/series/sr_02/sr02_175.pdf>, Parker et al. (2023) <doi:10.15620/cdc:124368>).
Maintained by Alex Strashny. Last updated 4 days ago.
estimatesformatted-outputpretty-printsurveytables
4.0 match 6 stars 6.71 score 19 scripts
eitsupi
neopolars:R Bindings for the 'polars' Rust Library
Lightning-fast 'DataFrame' library written in 'Rust'. Convert R data to 'Polars' data and vice versa. Perform fast, lazy, larger-than-memory and optimized data queries. 'Polars' is interoperable with the package 'arrow', as both are based on the 'Apache Arrow' Columnar Format.
Maintained by Tatsuya Shima. Last updated 1 day ago.
5.5 match 40 stars 4.86 score 1 script
pmamouris
ImputeLongiCovs:Longitudinal Imputation of Categorical Variables via a Joint Transition Model
Imputation of longitudinal categorical covariates. We use a methodological framework which ensures that the plausibility of transitions is preserved, overfitting and collinearity issues are resolved, and confounders can be utilized. See Mamouris (2023) <doi:10.1002/sim.9919> for an overview.
Maintained by Pavlos Mamouris. Last updated 1 year ago.
13.3 match 2.00 score 2 scripts
balexanderstats
ggsurvey:Simplifying `ggplot2` for Survey Data
Functions for survey data including svydesign objects from the 'survey' package that call 'ggplot2' to make bar charts, histograms, boxplots, and hexplots of survey data.
Maintained by Brittany Alexander. Last updated 3 years ago.
7.0 match 11 stars 3.78 score 11 scripts
sparklyr
sparklyr:R Interface to Apache Spark
R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.
Maintained by Edgar Ruiz. Last updated 10 days ago.
apache-sparkdistributeddplyridelivymachine-learningremote-clusterssparksparklyr
1.8 match 959 stars 15.16 score 4.0k scripts 21 dependents
mrmaxent
maxnet:Fitting 'Maxent' Species Distribution Models with 'glmnet'
Procedures to fit species distribution models from occurrence records and environmental variables, using 'glmnet' for model fitting. Model structure is the same as for the 'Maxent' Java package, version 3.4.0, with the same feature types and regularization options. See the 'Maxent' website <http://biodiversityinformatics.amnh.org/open_source/maxent> for more details.
Maintained by Steven Phillips. Last updated 2 years ago.
3.0 match 75 stars 8.68 score 169 scripts 7 dependents
michaelhallquist
MplusAutomation:An R Package for Facilitating Large-Scale Latent Variable Analyses in Mplus
Leverages the R language to automate latent variable model estimation and interpretation using 'Mplus', a powerful latent variable modeling program developed by Muthen and Muthen (<https://www.statmodel.com>). Specifically, this package provides routines for creating related groups of models, running batches of models, and extracting and tabulating model parameters and fit statistics.
Maintained by Michael Hallquist. Last updated 2 months ago.
2.0 match 86 stars 12.96 score 664 scripts 13 dependents
statistikat
VIM:Visualization and Imputation of Missing Values
New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on the structure of the missing values, the corresponding methods may help identify the mechanism generating the missing values and allow the data, including missing values, to be explored. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface available in the separate package VIMGUI allows easy handling of the implemented plot methods.
Maintained by Matthias Templ. Last updated 7 months ago.
hotdeckimputation-methodsmodel-predictionsvisualizationcpp
1.8 match 85 stars 14.44 score 2.6k scripts 19 dependents
acdelre
MAd:Meta-Analysis with Mean Differences
A collection of functions for conducting a meta-analysis with mean differences data. It uses recommended procedures as described in The Handbook of Research Synthesis and Meta-Analysis (Cooper, Hedges, & Valentine, 2009).
Maintained by AC Del Re. Last updated 3 years ago.
5.9 match 4.29 score 82 scripts 2 dependents
mhahsler
arules:Mining Association Rules and Frequent Itemsets
Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. Hahsler, Gruen and Hornik (2005) <doi:10.18637/jss.v014.i15>.
Maintained by Michael Hahsler. Last updated 1 month ago.
arulesassociation-rulesfrequent-itemsets
1.8 match 194 stars 13.99 score 3.3k scripts 28 dependents
azure
AzureVision:Interface to Azure Computer Vision Services
An interface to 'Azure Computer Vision' <https://docs.microsoft.com/azure/cognitive-services/Computer-vision/Home> and 'Azure Custom Vision' <https://docs.microsoft.com/azure/cognitive-services/custom-vision-service/home>, building on the low-level functionality provided by the 'AzureCognitive' package. These services allow users to leverage the cloud to carry out visual recognition tasks using advanced image processing models, without needing powerful hardware of their own. Part of the 'AzureR' family of packages.
Maintained by Hong Ooi. Last updated 4 years ago.
azure-cognitive-servicesazure-sdk-rcomputer-visioncustom-vision
5.0 match 5 stars 5.00 score 8 scripts
r-forge
robustbase:Basic Robust Statistics
"Essential" Robust Statistics. Tools for analyzing data with robust methods. This includes regression methodology including model selections and multivariate statistics where we strive to cover the book "Robust Statistics, Theory and Methods" by 'Maronna, Martin and Yohai'; Wiley 2006.
Maintained by Martin Maechler. Last updated 4 months ago.
1.9 match 13.33 score 1.7k scripts 480 dependents
roelandkindt
BiodiversityR:Package for Community Ecology and Suitability Analysis
Graphical User Interface (via the R-Commander) and utility functions (often based on the vegan package) for statistical analysis of biodiversity and ecological communities, including species accumulation curves, diversity indices, Renyi profiles, GLMs for analysis of species abundance and presence-absence, distance matrices, Mantel tests, and cluster, constrained and unconstrained ordination analysis. A book on biodiversity and community ecology analysis is available for free download from the website. In 2012, methods for (ensemble) suitability modelling and mapping were expanded in the package.
Maintained by Roeland Kindt. Last updated 2 months ago.
3.4 match 16 stars 7.42 score 390 scripts 2 dependents
heike
ggparallel:Variations of Parallel Coordinate Plots for Categorical Data
Create hammock plots, parallel sets, and common angle plots with 'ggplot2'.
Maintained by Heike Hofmann. Last updated 1 year ago.
4.7 match 41 stars 5.32 score 51 scripts
gavinsimpson
gratia:Graceful 'ggplot'-Based Graphics and Other Functions for GAMs Fitted Using 'mgcv'
Graceful 'ggplot'-based graphics and utility functions for working with generalized additive models (GAMs) fitted using the 'mgcv' package. Provides a reimplementation of the plot() method for GAMs that 'mgcv' provides, as well as 'tidyverse' compatible representations of estimated smooths.
Maintained by Gavin L. Simpson. Last updated 10 hours ago.
distributional-regressiongamgammgeneralized-additive-mixed-modelsgeneralized-additive-modelsggplot2glmlmmgcvpenalized-splinerandom-effectssmoothingsplines
1.9 match 217 stars 12.99 score 1.6k scripts 2 dependents
rkabacoff
qacBase:Functions to Facilitate Exploratory Data Analysis
Functions for descriptive statistics, data management, and data visualization.
Maintained by Kabacoff Robert. Last updated 3 years ago.
4.7 match 1 star 5.13 score 45 scripts
cardiomoon
moonBook:Functions and Datasets for the Book by Keon-Woong Moon
Several analysis-related functions for the book entitled "R statistics and graph for medical articles" (written in Korean), version 1, by Keon-Woong Moon with Korean demographic data with several plot functions.
Maintained by Keon-Woong Moon. Last updated 1 year ago.
2.5 match 37 stars 9.66 score 278 scripts 5 dependents
giuseppec
iml:Interpretable Machine Learning
Interpretability methods to analyze the behavior and predictions of any machine learning model. Implemented methods are: Feature importance described by Fisher et al. (2018) <doi:10.48550/arxiv.1801.01489>, accumulated local effects plots described by Apley (2018) <doi:10.48550/arxiv.1612.08468>, partial dependence plots described by Friedman (2001) <www.jstor.org/stable/2699986>, individual conditional expectation ('ice') plots described by Goldstein et al. (2013) <doi:10.1080/10618600.2014.907095>, local models (variant of 'lime') described by Ribeiro et al. (2016) <doi:10.48550/arXiv.1602.04938>, the Shapley Value described by Strumbelj et al. (2014) <doi:10.1007/s10115-013-0679-x>, feature interactions described by Friedman et al. <doi:10.1214/07-AOAS148> and tree surrogate models.
Maintained by Giuseppe Casalicchio. Last updated 20 days ago.
1.9 match 494 stars 12.86 score 642 scripts 4 dependents