eatGADS:Data Management of Large Hierarchical Data
Import 'SPSS' data, handle and change 'SPSS' meta data, store and access large hierarchical data in 'SQLite' data bases.
Maintained by Benjamin Becker. Last updated 25 days ago.
expss:Tables, Labels and Some Useful Functions from Spreadsheets and 'SPSS' Statistics
Package computes and displays tables with support for 'SPSS'-style labels, multiple and nested banners, weights, multiple-response variables and significance testing. There are facilities for nice output of tables in 'knitr', 'Shiny', '*.xlsx' files, R and 'Jupyter' notebooks. Methods for labelled variables add value labels support to base R functions and to some functions from other packages. Additionally, the package brings popular data transformation functions from 'SPSS' Statistics and 'Excel': 'RECODE', 'COUNT', 'COUNTIF', 'VLOOKUP', etc. These functions are very useful for data processing in marketing research surveys. The package is intended to help people move data processing from 'Excel' and 'SPSS' to R.
Maintained by Gregory Demin. Last updated 11 months ago.
mde:Missing Data Explorer
Correct identification and handling of missing data is one of the most important steps in any analysis. To aid this process, 'mde' provides a very easy to use yet robust framework to quickly get an idea of where the missing data lies and therefore find the most appropriate action to take. Graham WJ (2009) <doi:10.1146/annurev.psych.58.110405.085530>.
Maintained by Nelson Gonzabato. Last updated 3 years ago.
questionr:Functions to Make Surveys Processing Easier
Set of functions to make the processing and analysis of surveys easier : interactive shiny apps and addins for data recoding, contingency tables, dataset metadata handling, and several convenience functions.
Maintained by Julien Barnier. Last updated 12 hours ago.
recodeflow:Contains functions to interface with variable details sheets, including recoding variables and converting them to PMML
Recode and harmonize data using variable and details sheets.
Maintained by Yulric Sequeria. Last updated 7 days ago.
censable:Making Census Data More Usable
Creates a common framework for organizing, naming, and gathering population, age, race, and ethnicity data from the Census Bureau. Accesses the API <>. Provides tools for adding information to existing data to line up with Census data.
Maintained by Christopher T. Kenny. Last updated 10 months ago.
regions:Processing Regional Statistics
Validating sub-national statistical typologies, re-coding across standard typologies of sub-national statistics, and making valid aggregate level imputation, re-aggregation, re-weighting and projection down to lower hierarchical levels to create meaningful data panels and time series.
Maintained by Daniel Antal. Last updated 2 years ago.
dplyr:A Grammar of Data Manipulation
A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
Maintained by Hadley Wickham. Last updated 14 days ago.
car:Companion to Applied Regression
Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, 2019.
Maintained by John Fox. Last updated 5 months ago.
labelled:Manipulating Labelled Data
Work with labelled data imported from 'SPSS' or 'Stata' with 'haven' or 'foreign'. This package provides useful functions to deal with "haven_labelled" and "haven_labelled_spss" classes introduced by 'haven' package.
Maintained by Joseph Larmarange. Last updated 27 days ago.
gap:Genetic Analysis Package
As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over years, the package has been developed in-between many projects hence also in line with the name (gap).
Maintained by Jing Hua Zhao. Last updated 17 days ago.
qlcData:Processing Data for Quantitative Language Comparison
Functionality to read, recode, and transcode data as used in quantitative language comparison, specifically to deal with multilingual orthographic variation (Moran & Cysouw (2018) <doi:10.5281/zenodo.1296780>) and with the recoding of nominal data.
Maintained by Michael Cysouw. Last updated 9 months ago.
memisc:Management of Survey Data and Presentation of Analysis Results
An infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) 'SPSS' and 'Stata' files is provided. Further, the package allows to produce tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates, which can be exported to 'LaTeX' and HTML.
Maintained by Martin Elff. Last updated 13 days ago.
datawizard:Easy Data Wrangling and Statistical Transformations
A lightweight package to assist in key steps involved in any data analysis workflow: (1) wrangling the raw data to get it in the needed form, (2) applying preprocessing steps and statistical transformations, and (3) computing statistical summaries of data properties and distributions. It is also the data wrangling backend for packages in the 'easystats' ecosystem. References: Patil et al. (2022) <doi:10.21105/joss.04684>.
Maintained by Etienne Bacher. Last updated 11 hours ago.
PCAmixdata:Multivariate Analysis of Mixed Data
Implements principal component analysis, orthogonal rotation and multiple factor analysis for a mixture of quantitative and qualitative variables.
Maintained by Marie Chavent. Last updated 2 years ago.
sdcMicro:Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation
Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. The theoretical basis for the methods implemented can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface published in Meindl and Templ (2019) <doi:10.3390/a12090191> that allows to use various methods of this package.
Maintained by Matthias Templ. Last updated 28 days ago.
poorman:A Poor Man's Dependency Free Recreation of 'dplyr'
A replication of key functionality from 'dplyr' and the wider 'tidyverse' using only 'base'.
Maintained by Nathan Eastwood. Last updated 1 year ago.
fpc:Flexible Procedures for Clustering
Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance based clustering including corrected Rand index. Standardisation of cluster validation statistics by random clusterings and comparison between many clustering methods and numbers of clusters based on this. Cluster-wise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther's prediction strength, Fang and Wang's bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.
Maintained by Christian Hennig. Last updated 6 months ago.
DiagrammeR:Graph/Network Visualization
Build graph/network structures using functions for stepwise addition and deletion of nodes and edges. Work with data available in tables for bulk addition of nodes, edges, and associated metadata. Use graph selections and traversals to apply changes to specific nodes or edges. A wide selection of graph algorithms allow for the analysis of graphs. Visualize the graphs and take advantage of any aesthetic properties assigned to nodes and edges.
Maintained by Richard Iannone. Last updated 2 months ago.
IPEDSuploadables:Transforms Institutional Data into Text Files for IPEDS Automated Import/Upload
Starting from user-supplied institutional data, these scripts transform, aggregate, and reshape the information to produce key-value pair data files that are able to be uploaded to IPEDS (Integrated Postsecondary Education Data System) through their submission portal <>. Starting data specifications can be found in the vignettes. Final files are saved locally to a location of the user's choice. User-friendly readable files can also be produced for purposes of data review and validation.
Maintained by Alison Lanski. Last updated 3 months ago.
likert:Analysis and Visualization Likert Items
An approach to analyzing Likert response items, with an emphasis on visualizations. The stacked bar plot is the preferred method for presenting Likert results. Tabular results are also implemented along with density plots to assist researchers in determining whether Likert responses can be used quantitatively instead of qualitatively. See the likert(), summary.likert(), and plot.likert() functions to get started.
Maintained by Jason Bryer. Last updated 3 years ago.
tidycensus:Load US Census Boundary and Attribute Data as 'tidyverse' and 'sf'-Ready Data Frames
An integrated R interface to several United States Census Bureau APIs (<>) and the US Census Bureau's geographic boundary files. Allows R users to return Census and ACS data as tidyverse-ready data frames, and optionally returns a list-column with feature geometry for mapping and spatial analysis.
Maintained by Kyle Walker. Last updated 2 months ago.
collapse:Advanced and Fast Data Transformation
A C/C++ based package for advanced data transformation and statistical computing in R that is extremely fast, class-agnostic, robust and programmer friendly. Core functionality includes a rich set of S3 generic grouped and weighted statistical functions for vectors, matrices and data frames, which provide efficient low-level vectorizations, OpenMP multithreading, and skip missing values by default. These are integrated with fast grouping and ordering algorithms (also callable from C), and efficient data manipulation functions. The package also provides a flexible and rigorous approach to time series and panel data in R. It further includes fast functions for common statistical procedures, detailed (grouped, weighted) summary statistics, powerful tools to work with nested data, fast data object conversions, functions for memory efficient R programming, and helpers to effectively deal with variable labels, attributes, and missing data. It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf', 'units', 'plm' (panel-series and data frames), and 'xts'/'zoo'.
Maintained by Sebastian Krantz. Last updated 7 days ago.
doBy:Groupwise Statistics, LSmeans, Linear Estimates, Utilities
Utility package containing: 1) Facilities for working with grouped data: 'do' something to data stratified 'by' some variables. 2) LSmeans (least-squares means), general linear estimates. 3) Restrict functions to a smaller domain. 4) Miscellaneous other utilities.
Maintained by Søren Højsgaard. Last updated 6 days ago.
EdSurvey:Analysis of NCES Education Survey and Assessment Data
Read in and analyze functions for education survey and assessment data from the National Center for Education Statistics (NCES) <>, including National Assessment of Educational Progress (NAEP) data <> and data from the International Assessment Database: Organisation for Economic Co-operation and Development (OECD) <>, including Programme for International Student Assessment (PISA), Teaching and Learning International Survey (TALIS), Programme for the International Assessment of Adult Competencies (PIAAC), and International Association for the Evaluation of Educational Achievement (IEA) <>, including Trends in International Mathematics and Science Study (TIMSS), TIMSS Advanced, Progress in International Reading Literacy Study (PIRLS), International Civic and Citizenship Study (ICCS), International Computer and Information Literacy Study (ICILS), and Civic Education Study (CivEd).
Maintained by Paul Bailey. Last updated 17 days ago.
qacBase:Functions to Facilitate Exploratory Data Analysis
Functions for descriptive statistics, data management, and data visualization.
Maintained by Kabacoff Robert. Last updated 3 years ago.
udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. Namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns) sentiment scoring and semantic similarity analysis.
Maintained by Jan Wijffels. Last updated 2 years ago.
finalfit:Quickly Create Elegant Regression Results Tables and Plots when Modelling
Generate regression results tables and plots in final format for publication. Explore models and export directly to PDF and 'Word' using 'RMarkdown'.
Maintained by Ewen Harrison. Last updated 7 months ago.
admisc:Adrian Dusa's Miscellaneous
Contains functions used across packages 'DDIwR', 'QCA' and 'venn'. Interprets and translates, factorizes and negates SOP - Sum of Products expressions, for both binary and multi-value crisp sets, and extracts information (set names, set values) from those expressions. Other functions perform various other checks if possibly numeric (even if all numbers reside in a character vector) and coerce to numeric, or check if the numbers are whole. It also offers, among many others, a highly versatile recoding routine and some more flexible alternatives to the base functions 'with()' and 'within()'. SOP simplification functions in this package use related minimization from package 'QCA', which is recommended to be installed despite not being listed in the Imports field, due to circular dependency issues.
Maintained by Adrian Dusa. Last updated 5 days ago.
lest:Vectorised Nested if-else Statements Similar to CASE WHEN in 'SQL'
Functions for vectorised conditional recoding of variables. case_when() enables you to vectorise multiple if and else statements (like 'CASE WHEN' in 'SQL'). if_else() is a stricter and more predictable version of ifelse() in 'base' that preserves attributes. These functions are forked from 'dplyr' with all package dependencies removed and behave identically to the originals.
Maintained by Stefan Fleck. Last updated 1 year ago.
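The vectorised recoding described in this entry can be sketched as follows. This is only an illustration: it calls the 'dplyr' originals, which the description says behave identically to the 'lest' forks, and the vector `x` and the band labels are invented for the example.

```r
library(dplyr)

# Hypothetical input: scores to be recoded into bands.
x <- c(5, 15, 25, NA)

# case_when() checks the conditions in order, like CASE WHEN in SQL;
# the first condition that is TRUE decides the result for each element.
band <- case_when(
  is.na(x) ~ "missing",
  x < 10   ~ "low",
  x < 20   ~ "medium",
  TRUE     ~ "high"
)

# if_else() is stricter than base ifelse(): both branches must have
# compatible types, and NA conditions are handled via `missing`.
flag <- if_else(x >= 15, "pass", "fail", missing = "unknown")

print(band)
print(flag)
```

Putting the `is.na()` condition first matters here, because a comparison such as `x < 10` returns `NA` for missing values and would otherwise fall through to the final catch-all branch.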
incase:Pipe-Friendly Vector Replacement with Case Statements
Offers a pipe-friendly alternative to the 'dplyr' functions case_when() and if_else(), as well as a number of user-friendly simplifications for common use cases. These functions accept a vector as an optional first argument, allowing conditional statements to be built using the 'magrittr' dot operator. The functions also coerce all outputs to the same type, meaning you no longer have to worry about using specific typed variants of NA or explicitly declaring integer outputs, and evaluate outputs somewhat lazily, so you don't waste time on long operations that won't be used.
Maintained by Alexander Rossell Hayes. Last updated 8 months ago.
netdiffuseR:Analysis of Diffusion and Contagion Processes on Networks
Empirical statistical analysis, visualization and simulation of diffusion and contagion processes on networks. The package implements algorithms for calculating network diffusion statistics such as transmission rate, hazard rates, exposure models, network threshold levels, infectiousness (contagion), and susceptibility. The package is inspired by work published in Valente, et al., (2015) <DOI:10.1016/j.socscimed.2015.10.001>; Valente (1995) <ISBN: 9781881303213>, Myers (2000) <DOI:10.1086/303110>, Iyengar and others (2011) <DOI:10.1287/mksc.1100.0566>, Burt (1987) <DOI:10.1086/228667>; among others.
Maintained by George Vega Yon. Last updated 3 months ago.
arules:Mining Association Rules and Frequent Itemsets
Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. Hahsler, Gruen and Hornik (2005) <doi:10.18637/jss.v014.i15>.
Maintained by Michael Hahsler. Last updated 1 month ago.
bruceR:Broadly Useful Convenient and Efficient R Functions
Broadly useful convenient and efficient R functions that bring users concise and elegant R data analyses. This package includes easy-to-use functions for (1) basic R programming (e.g., set working directory to the path of currently opened file; import/export data from/to files in any format; print tables to Microsoft Word); (2) multivariate computation (e.g., compute scale sums/means/... with reverse scoring); (3) reliability analyses and factor analyses; (4) descriptive statistics and correlation analyses; (5) t-test, multi-factor analysis of variance (ANOVA), simple-effect analysis, and post-hoc multiple comparison; (6) tidy report of statistical models (to R Console and Microsoft Word); (7) mediation and moderation analyses (PROCESS); and (8) additional toolbox for statistics and graphics.
Maintained by Han-Wu-Shuang Bao. Last updated 9 months ago.
celda:CEllular Latent Dirichlet Allocation
Celda is a suite of Bayesian hierarchical models for clustering single-cell RNA-sequencing (scRNA-seq) data. It is able to perform "bi-clustering" and simultaneously cluster genes into gene modules and cells into cell subpopulations. It also contains DecontX, a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. A variety of scRNA-seq data visualization functions is also included.
Maintained by Joshua Campbell. Last updated 29 days ago.
cchsflow:Transforming and Harmonizing CCHS Variables
Supporting the use of the Canadian Community Health Survey (CCHS) by transforming variables from each cycle into harmonized, consistent versions that span survey cycles (currently, 2001 to 2018). CCHS data used in this library is accessed and adapted in accordance with the Statistics Canada Open Licence Agreement. This package uses rec_with_table(), which was developed from 'sjmisc' rec(). Lüdecke D (2018). "sjmisc: Data and Variable Transformation Functions". Journal of Open Source Software, 3(26), 754. <doi:10.21105/joss.00754>.
Maintained by Kitty Chen. Last updated 1 year ago.
lessR:Less Code, More Results
Each function replaces multiple standard R functions. For example, two function calls, Read() and CountAll(), generate summary statistics for all variables in the data frame, plus histograms and bar charts as appropriate. Other functions provide for summary statistics via pivot tables, a comprehensive regression analysis, ANOVA and t-test, visualizations including the Violin/Box/Scatter plot for a numerical variable, bar chart, histogram, box plot, density curves, calibrated power curve, reading multiple data formats with the same function call, variable labels, time series with aggregation and forecasting, color themes, and Trellis (facet) graphics. Also includes a confirmatory factor analysis of multiple indicator measurement models, pedagogical routines for data simulation such as for the Central Limit Theorem, generation and rendering of regression instructions for interpretative output, and interactive visualizations.
Maintained by David W. Gerbing. Last updated 1 day ago.
ape:Analyses of Phylogenetics and Evolution
Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.
Maintained by Emmanuel Paradis. Last updated 2 days ago.
hydrostats:Hydrologic Indices for Daily Time Series Data
Calculates a suite of hydrologic indices for daily time series data that are widely used in hydrology and stream ecology.
Maintained by Nick Bond. Last updated 3 years ago.
tidytable:Tidy Interface to 'data.table'
A tidy interface to 'data.table', giving users the speed of 'data.table' while using tidyverse-like syntax.
Maintained by Mark Fairbanks. Last updated 2 months ago.
dae:Functions Useful in the Design and ANOVA of Experiments
The content falls into the following groupings: (i) Data, (ii) Factor manipulation functions, (iii) Design functions, (iv) ANOVA functions, (v) Matrix functions, (vi) Projector and canonical efficiency functions, and (vii) Miscellaneous functions. A vignette called 'DesignNotes' describes how to use the design functions for randomizing and assessing designs. The ANOVA functions facilitate the extraction of information when the 'Error' function has been used in the call to 'aov'. The package 'dae' can also be installed from <>.
Maintained by Chris Brien. Last updated 4 months ago.
bbw:Blocked Weighted Bootstrap
The blocked weighted bootstrap (BBW) is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. population-proportional sampling or PPS as used in Standardized Monitoring and Assessment of Relief and Transitions or SMART surveys) or posterior weighting (e.g. as used in rapid assessment method or RAM and simple spatial sampling method or S3M surveys) is implemented. See Cameron et al (2008) <doi:10.1162/rest.90.3.414> for application of bootstrap to cluster samples. See Aaron et al (2016) <doi:10.1371/journal.pone.0163176> and Aaron et al (2016) <doi:10.1371/journal.pone.0162462> for application of the blocked weighted bootstrap to estimate indicators from two-stage cluster sampled surveys.
Maintained by Ernest Guevarra. Last updated 2 months ago.
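The general idea behind this entry can be sketched in base R: whole clusters (blocks) are resampled with replacement, and with posterior weighting the chance of drawing a cluster is proportional to its population. This is a simplified illustration, not the 'bbw' API; the data frame `svy`, the populations in `psu_pop` and the helper `boot_once()` are all hypothetical.

```r
set.seed(1)

# Hypothetical two-stage cluster sample: 4 clusters (PSUs), 3 respondents each.
svy <- data.frame(
  psu = rep(1:4, each = 3),
  y   = c(1, 0, 1,  0, 0, 1,  1, 1, 1,  0, 0, 0)
)

# Assumed cluster populations, used as selection weights (posterior weighting).
psu_pop <- c(`1` = 100, `2` = 300, `3` = 200, `4` = 400)

# One bootstrap replicate: draw whole clusters with replacement,
# with probability proportional to population, then estimate the mean.
boot_once <- function(data, pop) {
  ids <- sample(names(pop), size = length(pop), replace = TRUE, prob = pop)
  resampled <- do.call(
    rbind,
    lapply(ids, function(i) data[data$psu == as.integer(i), ])
  )
  mean(resampled$y)
}

estimates <- replicate(200, boot_once(svy, psu_pop))

# Percentile interval and median of the bootstrap estimates.
ci <- quantile(estimates, c(0.025, 0.5, 0.975))
print(ci)
```

Resampling clusters rather than individual respondents keeps the within-cluster correlation of the survey design intact in every replicate.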
gendercoder:Recodes Sex/Gender Descriptions into a Standard Set
Provides functions and dictionaries for recoding of freetext gender responses into more consistent categories.
Maintained by Yaoxiang Li. Last updated 1 month ago.
rcoder:Lightweight Data Structure for Recoding Categorical Data without Factors
A data structure and toolkit for documenting and recoding categorical data that can be shared in other statistical software.
Maintained by Patrick Anker. Last updated 1 year ago.
whomds:Calculate Results from WHO Model Disability Survey Data
The Model Disability Survey (MDS) <> is a World Health Organization (WHO) general population survey instrument to assess the distribution of disability within a country or region, grounded in the International Classification of Functioning, Disability and Health <>. This package provides fit-for-purpose functions for calculating and presenting the results from this survey, as used by the WHO. The package primarily provides functions for implementing Rasch Analysis (see Andrich (2011) <doi:10.1586/erp.11.59>) to calculate a metric scale for disability.
Maintained by Lindsay Lee. Last updated 2 years ago.
epiDisplay:Epidemiological Data Display Package
Package for data exploration and result presentation. Full 'epicalc' package with data management functions is available at '<>'.
Maintained by Virasakdi Chongsuvivatwong. Last updated 3 years ago.
quest:Prepare Questionnaire Data for Analysis
Offers a suite of functions to prepare questionnaire data for analysis (perhaps other types of data as well). By data preparation, I mean data analytic tasks to get your raw data ready for statistical modeling (e.g., regression). There are functions to investigate missing data, reshape data, validate responses, recode variables, score questionnaires, center variables, aggregate by groups, shift scores (i.e., leads or lags), etc. It provides functions for both single level and multilevel (i.e., grouped) data. With a few exceptions (e.g., ncases()), functions without an "s" at the end of their primary word (e.g., center_by()) act on atomic vectors, while functions with an "s" at the end of their primary word (e.g., centers_by()) act on multiple columns of a data.frame.
Maintained by David Disabato. Last updated 1 year ago.
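The vector-versus-columns naming convention described in this entry can be illustrated with a base R sketch of group-mean centering. The functions `center_within()` and `centers_within()` below are hypothetical stand-ins written for this example; they are not the 'quest' functions and do not reproduce their signatures.

```r
# Hypothetical multilevel data: two scores nested within groups.
dat <- data.frame(
  grp = c("a", "a", "b", "b"),
  s1  = c(1, 3, 10, 20),
  s2  = c(2, 4, 6, 10)
)

# Vector version (no "s"): centre one atomic vector within groups.
center_within <- function(x, grp) {
  ave(x, grp, FUN = function(v) v - mean(v))
}

# Column version (with "s"): apply the vector version to several
# columns of a data.frame and return the modified data.frame.
centers_within <- function(data, cols, grp) {
  data[cols] <- lapply(data[cols], center_within, grp = data[[grp]])
  data
}

centered <- centers_within(dat, c("s1", "s2"), "grp")
print(centered)
```

Building the multi-column function on top of the single-vector one keeps the centering logic in one place.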
psychTools:Tools to Accompany the 'psych' Package for Psychological Research
Support functions, data sets, and vignettes for the 'psych' package. Contains several of the biggest data sets for the 'psych' package as well as four vignettes. A few helper functions for file manipulation are included as well. For more information, see the <> web page.
Maintained by William Revelle. Last updated 12 months ago.
stats19:Work with Open Road Traffic Casualty Data from Great Britain
Tools to help download, process and analyse the UK road collision data collected using the 'STATS19' form. The datasets are provided as 'CSV' files with detailed road safety information about the circumstances of car crashes and other incidents on the roads resulting in casualties in Great Britain from 1979 to present. Tables are available on 'collisions' with the circumstances (e.g. speed limit of road), information about 'vehicles' involved (e.g. type of vehicle), and 'casualties' (e.g. age). The statistics relate only to events on public roads that were reported to the police, and subsequently recorded, using the 'STATS19' collision reporting form. See the Department for Transport website <> for more information on these datasets. The package is described in a paper in the Journal of Open Source Software (Lovelace et al. 2019) <doi:10.21105/joss.01181>. See Gilardi et al. (2022) <doi:10.1111/rssa.12823>, Vidal-Tortosa et al. (2021) <doi:10.1016/j.jth.2021.101291>, and Tait et al. (2023) <doi:10.1016/j.aap.2022.106895> for examples of how the data can be used for methodological and empirical road safety research.
Maintained by Robin Lovelace. Last updated 2 months ago.
lordif:Logistic Ordinal Regression Differential Item Functioning using IRT
Performs analysis of Differential Item Functioning (DIF) for dichotomous and polytomous items using an iterative hybrid of ordinal logistic regression and item response theory (IRT) according to Choi, Gibbons, and Crane (2011) <doi:10.18637/jss.v039.i08>.
Maintained by Seung W. Choi. Last updated 2 months ago.
rres:Realized Relatedness Estimation and Simulation
Functions for studying realized genetic relatedness between people. Users will be able to simulate inheritance patterns given pedigree structures, generate SNP marker data given inheritance patterns, and estimate realized relatedness between pairs of individuals using SNP marker data. See Wang (2017) <doi:10.1534/genetics.116.197004>. This work was supported by National Institutes of Health grants R37 GM-046255.
Maintained by Bowen Wang. Last updated 7 years ago.
childfree:Access and Harmonize Childfree Demographic Data
Reads demographic data from a variety of public data sources, extracting and harmonizing variables useful for the study of childfree individuals. The identification of childfree individuals and those with other family statuses uses Neal & Neal's (2024) "A Framework for Studying Adults who Neither have Nor Want Children" <doi:10.1177/10664807231198869>; A pre-print is available at <doi:10.31234/>.
Maintained by Zachary Neal. Last updated 6 months ago.
dataPreparation:Automated Data Preparation
Do most of the painful data preparation for a data science project with a minimum amount of code. Takes advantage of 'data.table' efficiency and uses some algorithmic tricks in order to perform data preparation in a time- and RAM-efficient way.
Maintained by Emmanuel-Lin Toulemonde. Last updated 2 years ago.
questionnaires:Package with functions to calculate components and sums for LCBC questionnaires
Creates summaries and factorials of answers to questionnaires.
Maintained by Athanasia Mo Mowinckel. Last updated 2 years ago.
ETLUtils:Utility Functions to Execute Standard Extract/Transform/Load Operations (using Package 'ff') on Large Data
Provides functions to facilitate the use of the 'ff' package in interaction with big data in 'SQL' databases (e.g. in 'Oracle', 'MySQL', 'PostgreSQL', 'Hive') by allowing easy importing directly into 'ffdf' objects using 'DBI', 'RODBC' and 'RJDBC'. Also contains some basic utility functions to do fast left outer join merging based on 'match', factorisation of data and a basic function for re-coding vectors.
Maintained by Jan Wijffels. Last updated 5 years ago.
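A left outer join based on 'match', as mentioned in this entry, can be sketched in base R as below. This shows the general technique only, not the 'ETLUtils' function itself; the data frames `orders` and `customers` are invented, and the sketch assumes the keys in the lookup table are unique.

```r
# Hypothetical left-hand table and lookup table.
orders    <- data.frame(id = c(1, 2, 3, 2), amount = c(10, 20, 30, 40))
customers <- data.frame(id = c(2, 1), name = c("Ann", "Bob"),
                        stringsAsFactors = FALSE)

# match() returns, for each left-hand key, the row position of the first
# matching key in the lookup table, or NA when there is no match.
idx <- match(orders$id, customers$id)

# Indexing with idx keeps every row of `orders` in its original order;
# unmatched rows get NA, which is the left outer join behaviour.
joined <- cbind(orders, name = customers$name[idx],
                stringsAsFactors = FALSE)
print(joined)
```

Because `match()` only ever returns the first matching position, this approach never duplicates left-hand rows, unlike a general `merge()` against a lookup table with repeated keys.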
rsm:Response-Surface Analysis
Provides functions to generate response-surface designs, fit first- and second-order response-surface models, make surface plots, obtain the path of steepest ascent, and do canonical analysis. A good reference on these methods is Chapter 10 of Wu, C-F J and Hamada, M (2009) "Experiments: Planning, Analysis, and Parameter Design Optimization" ISBN 978-0-471-69946-0. An early version of the package is documented in Journal of Statistical Software <doi:10.18637/jss.v032.i07>.
Maintained by Russell Lenth. Last updated 9 months ago.
EMCluster:EM Algorithm for Model-Based Clustering of Finite Mixture Gaussian Distribution
EM algorithms and several efficient initialization methods for model-based clustering of finite mixture Gaussian distribution with unstructured dispersion in both of unsupervised and semi-supervised learning.
Maintained by Wei-Chen Chen. Last updated 6 months ago.
sirt:Supplementary Item Response Theory Models
Supplementary functions for item response models aiming to complement existing R packages. The functionality includes among others multidimensional compensatory and noncompensatory IRT models (Reckase, 2009, <doi:10.1007/978-0-387-89976-3>), MCMC for hierarchical IRT models and testlet models (Fox, 2010, <doi:10.1007/978-1-4419-0742-4>), NOHARM (McDonald, 1982, <doi:10.1177/014662168200600402>), Rasch copula model (Braeken, 2011, <doi:10.1007/s11336-010-9190-4>; Schroeders, Robitzsch & Schipolowski, 2014, <doi:10.1111/jedm.12054>), faceted and hierarchical rater models (DeCarlo, Kim & Johnson, 2011, <doi:10.1111/j.1745-3984.2011.00143.x>), ordinal IRT model (ISOP; Scheiblechner, 1995, <doi:10.1007/BF02301417>), DETECT statistic (Stout, Habing, Douglas & Kim, 1996, <doi:10.1177/014662169602000403>), local structural equation modeling (LSEM; Hildebrandt, Luedtke, Robitzsch, Sommer & Wilhelm, 2016, <doi:10.1080/00273171.2016.1142856>).
Maintained by Alexander Robitzsch. Last updated 3 months ago.
rempsyc:Convenience Functions for Psychology
Make your workflow faster and easier. Easily customizable plots (via 'ggplot2'), nice APA tables (following the style of the *American Psychological Association*) exportable to Word (via 'flextable'), easily run statistical tests or check assumptions, and automatize various other tasks.
Maintained by Rémi Thériault. Last updated 1 month ago.
demogsurv:Demographic analysis of DHS and other household surveys
This package includes tools for calculating demographic indicators from household survey data. Initially developed for processing and analysis of data from Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS). The package provides tools to calculate standard child mortality, adult mortality, and fertility indicators stratified arbitrarily by age group, calendar period, pre-survey time periods, birth cohorts and other survey variables (e.g. residence, region, wealth status, education, etc.). Design-based standard errors and sample correlations are available for all indicators via Taylor linearisation or jackknife.
Maintained by Jeff Eaton. Last updated 3 years ago.
naniar:Data Structures, Summaries, and Visualisations for Missing Data
Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. 'naniar' provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of 'ggplot2' and tidy data. The work is fully discussed at Tierney & Cook (2023) <doi:10.18637/jss.v105.i07>.
Maintained by Nicholas Tierney. Last updated 5 days ago.
glottospace:Language Mapping and Geospatial Analysis of Linguistic and Cultural Data
Streamlined workflows for geolinguistic analysis, including: accessing global linguistic and cultural databases, data import, data entry, data cleaning, data exploration, mapping, visualization and export.
Maintained by Rui Dong. Last updated 3 months ago.
tidyfst:Tidy Verbs for Fast Data Manipulation
A toolkit of tidy data manipulation verbs with 'data.table' as the backend. Combining the merits of syntax elegance from 'dplyr' and computing performance from 'data.table', 'tidyfst' intends to provide users with state-of-the-art data manipulation tools with least pain. This package is an extension of 'data.table'. While enjoying a tidy syntax, it also wraps combinations of efficient functions to facilitate frequently-used data operations.
Maintained by Tian-Yuan Huang. Last updated 6 months ago.
scrime:Analysis of High-Dimensional Categorical Data Such as SNP Data
Tools for the analysis of high-dimensional data developed/implemented at the group "Statistical Complexity Reduction In Molecular Epidemiology" (SCRIME). Main focus is on SNP data. But most of the functions can also be applied to other types of categorical data.
Maintained by Holger Schwender. Last updated 6 years ago.
datapackage:Creating and Reading Data Packages
Open, read data from and modify Data Packages. Data Packages are an open standard for bundling and describing data sets (<>). When data is read from a Data Package, care is taken to convert the data as much as possible to appropriate R data types. The package can be extended with plugins for additional data types.
Maintained by Jan van der Laan. Last updated 7 days ago.
Rcmdr:R Commander
A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.
Maintained by John Fox. Last updated 5 months ago.
stevemisc:Steve's Miscellaneous Functions
These are miscellaneous functions that I find useful for my research and teaching. The contents include themes for plots, functions for simulating quantities of interest from regression models, functions for simulating various forms of fake data for instructional/research purposes, and many more. All told, the functions provided here are broadly useful for data organization, data presentation, data recoding, and data simulation.
Maintained by Steve Miller. Last updated 7 days ago.
faux:Simulation for Factorial Designs
Create datasets with factorial structure through simulation by specifying variable parameters. Extended documentation at <>. Described in DeBruine (2020) <doi:10.5281/zenodo.2669586>.
Maintained by Lisa DeBruine. Last updated 2 months ago.
poppr:Genetic Analysis of Populations with Mixed Reproduction
Population genetic analyses for hierarchical analysis of partially clonal populations built upon the architecture of the 'adegenet' package. Originally described in Kamvar, Tabima, and Grünwald (2014) <doi:10.7717/peerj.281> with version 2.0 described in Kamvar, Brooks, and Grünwald (2015) <doi:10.3389/fgene.2015.00208>.
Maintained by Zhian N. Kamvar. Last updated 10 months ago.
bupaR:Business Process Analysis in R
Comprehensive Business Process Analysis toolkit. Creates S3-class for event log objects, and related handler functions. Imports related packages for filtering event data, computation of descriptive statistics, handling of 'Petri Net' objects and visualization of process maps. See also packages 'edeaR', 'processmapR', 'eventdataR' and 'processmonitR'.
Maintained by Gert Janssenswillen. Last updated 2 years ago.
DDIwR:DDI with R
Useful functions for various DDI (Data Documentation Initiative) related inputs and outputs. Converts data files to and from DDI, SPSS, Stata, SAS, R and Excel, including user declared missing values.
Maintained by Adrian Dusa. Last updated 3 months ago.
Deducer:A Data Analysis GUI for R
An intuitive, cross-platform graphical data analysis system. It uses menus and dialogs to guide the user efficiently through the data manipulation and analysis process, and has an Excel-like spreadsheet for easy data frame visualization and editing. Deducer works best when used with the Java-based R GUI JGR, but the dialogs can be called from the command line. Dialogs have also been integrated into the Windows Rgui.
Maintained by Ian Fellows. Last updated 9 years ago.
Ecfun:Functions for 'Ecdat'
Functions and vignettes to update data sets in 'Ecdat' and to create, manipulate, plot, and analyze those and similar data sets.
Maintained by Spencer Graves. Last updated 4 months ago.
qtl2:Quantitative Trait Locus Mapping in Experimental Crosses
Provides a set of tools to perform quantitative trait locus (QTL) analysis in experimental crosses. It is a reimplementation of the 'R/qtl' package to better handle high-dimensional data and complex cross designs. Broman et al. (2019) <doi:10.1534/genetics.118.301595>.
Maintained by Karl W Broman. Last updated 9 days ago.
dplyover:Create columns by applying functions to vectors and/or columns in 'dplyr'
Extension of 'dplyr's functionality that builds a family of functions around dplyr::across().
Maintained by Tim Tiefenbach. Last updated 3 years ago.
annotatr:Annotation of Genomic Regions to Genomic Annotations
Given a set of genomic sites/regions (e.g. ChIP-seq peaks, CpGs, differentially methylated CpGs or regions, SNPs, etc.) it is often of interest to investigate the intersecting genomic annotations. Such annotations include those relating to gene models (promoters, 5'UTRs, exons, introns, and 3'UTRs), CpGs (CpG islands, CpG shores, CpG shelves), or regulatory sequences such as enhancers. The annotatr package provides an easy way to summarize and visualize the intersection of genomic sites/regions with genomic annotations.
Maintained by Raymond G. Cavalcante. Last updated 5 months ago.
mokken:Conducts Mokken Scale Analysis
Contains functions for performing Mokken scale analysis on test and questionnaire data. It includes an automated item selection algorithm, and various checks of model assumptions.
Maintained by L. Andries van der Ark. Last updated 9 months ago.
blockmodeling:Generalized and Classical Blockmodeling of Valued Networks
This is primarily meant as an implementation of generalized blockmodeling for valued networks. In addition, measures of similarity or dissimilarity based on structural equivalence and regular equivalence (REGE algorithms) can be computed and partitioned matrices can be plotted: Žiberna (2007) <doi:10.1016/j.socnet.2006.04.002>, Žiberna (2008) <doi:10.1080/00222500701790207>, Žiberna (2014) <doi:10.1016/j.socnet.2014.04.002>.
Maintained by Aleš Žiberna. Last updated 2 years ago.
infoelectoral:Download Spanish Election Results
Download official election results for Spain at polling station, municipality and province level from the Ministry of Interior (<>), format them and import them to the R environment.
Maintained by Héctor Meleiro. Last updated 7 months ago.
noga:noga: Recode according to the General Classification of Economic Activities 2008
This package recodes numeric NOGA values to their value labels and vice versa. The package can recode values from all five NOGA levels (Section, Division, Group, Class and Type). The package can recode values to value labels in four languages (English, German, French and Italian).
Maintained by Johannes Besch. Last updated 8 months ago.
ThurMod:Thurstonian CFA and Thurstonian IRT Modeling
Fit Thurstonian forced-choice models (CFA (simple and factor) and IRT) in R. This package allows for the analysis of item response modeling (IRT) as well as confirmatory factor analysis (CFA) in the Thurstonian framework. Currently, estimation can be performed by 'Mplus' and 'lavaan'. References: Brown & Maydeu-Olivares (2011) <doi:10.1177/0013164410375112>; Jansen, M. T., & Schulze, R. (in review). The Thurstonian linked block design: Improving Thurstonian modeling for paired comparison and ranking data.; Maydeu-Olivares & Böckenholt (2005) <doi:10.1037/1082-989X.10.3.285>.
Maintained by Markus Thomas Jansen. Last updated 1 year ago.
vegdata:Access Vegetation Databases and Treat Taxonomy
Handling of vegetation data from different sources (Turboveg 2.0 <>; the German national repository <>; and others). Taxonomic harmonization (given appropriate taxonomic lists, e.g. the German taxonomic standard list "GermanSL", <>).
Maintained by Florian Jansen. Last updated 1 year ago.
codeditr:Implementing Cause-of-Death Data Checks Based on the WHO CoDEdit Tool
The World Health Organization's CoDEdit electronic tool is intended to help producers of cause-of-death statistics in strengthening their capacity to perform routine checks on their data. This package ports the original tool built using Microsoft Access into R so as to leverage the utility and function of the original tool into a usable application program interface that can be used for building more universal tools or for creating programmatic scientific workflows aimed at routine, automated, and large-scale monitoring of cause-of-death data.
Maintained by Ernest Guevarra. Last updated 4 months ago.
REDCapR:Interaction Between R and REDCap
Encapsulates functions to streamline calls from R to the REDCap API. REDCap (Research Electronic Data CAPture) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The Application Programming Interface (API) offers an avenue to access and modify data programmatically, improving the capacity for literate and reproducible programming.
Maintained by Will Beasley. Last updated 2 months ago.
mark:Miscellaneous, Analytic R Kernels
Miscellaneous functions and wrappers for development in other packages created and maintained by Jordan Mark Barbone.
Maintained by Jordan Mark Barbone. Last updated 1 month ago.
SNPassoc:SNPs-Based Whole Genome Association Studies
Functions to perform most of the common analyses in genome association studies are implemented. These analyses include descriptive statistics and exploratory analysis of missing values, calculation of Hardy-Weinberg equilibrium, analysis of association based on generalized linear models (either for quantitative or binary traits), and analysis of multiple SNPs (haplotype and epistasis analysis). Permutation test and related tests (sum statistic and truncated product) are also implemented. Exact distributions of the max-statistic and the genetic risk-allele score can also be estimated. The methods are described in Gonzalez JR et al., 2007 <doi:10.1093/bioinformatics/btm025>.
Maintained by Dolors Pelegri. Last updated 5 months ago.
TraMineR:Trajectory Miner: a Sequence Analysis Toolkit
Set of sequence analysis tools for manipulating, describing and rendering categorical sequences, and more generally mining sequence data in the field of social sciences. Although this sequence analysis package is primarily intended for state or event sequences that describe time use or life courses such as family formation histories or professional careers, its features also apply to many other kinds of categorical sequence data. It accepts many different sequence representations as input and provides tools for converting sequences from one format to another. It offers several functions for describing and rendering sequences, for computing distances between sequences with different metrics (among which optimal matching), original dissimilarity-based analysis tools, and functions for extracting the most frequent event subsequences and identifying the most discriminating ones among them. A user's guide can be found on the TraMineR web page.
Maintained by Gilbert Ritschard. Last updated 3 months ago.
ShinyItemAnalysis:Test and Item Analysis via Shiny
Package including functions and interactive shiny application for the psychometric analysis of educational tests, psychological assessments, health-related and other types of multi-item measurements, or ratings from multiple raters.
Maintained by Patricia Martinkova. Last updated 1 month ago.
fabricatr:Imagine Your Data Before You Collect It
Helps you imagine your data before you collect it. Hierarchical data structures and correlated data can be easily simulated, either from random number generators or by resampling from existing data sources. This package is faster with 'data.table' and 'mvnfast' installed.
Maintained by Graeme Blair. Last updated 1 month ago.
weights:Weighting and Weighted Statistics
Provides a variety of functions for producing simple weighted statistics, such as weighted Pearson's correlations, partial correlations, Chi-Squared statistics, histograms, and t-tests. Also now includes some software for quickly recoding survey data and plotting estimates from interaction terms in regressions (and multiply imputed regressions) both with and without weights. NOTE: Weighted partial correlation calculations pulled to address a bug.
Maintained by Josh Pasek. Last updated 4 years ago.
kutils:Project Management Tools
Tools for data importation, recoding, and inspection. There are functions to create new project folders, R code templates, create uniquely named output directories, and to quickly obtain a visual summary for each variable in a data frame. The main feature here is the systematic implementation of the "variable key" framework for data importation and recoding. We are eager to have community feedback about the variable key and the vignette about it. In version 1.7, the function 'semTable' is removed. It was deprecated since 1.67. That is provided in a separate package, 'semTable'.
Maintained by Paul Johnson. Last updated 1 year ago.
lfa:Logistic Factor Analysis for Categorical Data
Logistic Factor Analysis is a method for a PCA analogue on Binomial data via estimation of latent structure in the natural parameter. The main method estimates genetic population structure from genotype data. There are also methods for estimating individual-specific allele frequencies using the population structure. Lastly, a structured Hardy-Weinberg equilibrium (HWE) test is developed, which quantifies the goodness of fit of the genotype data to the estimated population structure, via the estimated individual-specific allele frequencies (all of which generalizes traditional HWE tests).
Maintained by Alejandro Ochoa. Last updated 5 months ago.
beezdemand:Behavioral Economic Easy Demand
Facilitates many of the analyses performed in studies of behavioral economic demand. The package supports commonly-used options for modeling operant demand including (1) data screening proposed by Stein, Koffarnus, Snider, Quisenberry, & Bickel (2015; <doi:10.1037/pha0000020>), (2) fitting models of demand such as linear (Hursh, Raslear, Bauman, & Black, 1989, <doi:10.1007/978-94-009-2470-3_22>), exponential (Hursh & Silberberg, 2008, <doi:10.1037/0033-295X.115.1.186>) and modified exponential (Koffarnus, Franck, Stein, & Bickel, 2015, <doi:10.1037/pha0000045>), and (3) calculating numerous measures relevant to applied behavioral economists (Intensity, Pmax, Omax). Also supports plotting and comparing data.
Maintained by Brent Kaplan. Last updated 7 months ago.
rockchalk:Regression Estimation and Presentation
A collection of functions for interpretation and presentation of regression analysis. These functions are used to produce the statistics lectures in <>. Includes regression diagnostics, regression tables, and plots of interactions and "moderator" variables. The emphasis is on "mean-centered" and "residual-centered" predictors. The vignette 'rockchalk' offers a fairly comprehensive overview. The vignette 'Rstyle' has advice about coding in R. The package title 'rockchalk' refers to our school motto, 'Rock Chalk Jayhawk, Go K.U.'.
Maintained by Paul E. Johnson. Last updated 3 years ago.
ilabelled:Simple Handling of Labelled Data
Simple handling of survey data. Smart handling of meta-information such as variable labels, value labels and scale levels. Easy access and validation of meta-information. Usage of value labels and values respectively for subsetting and recoding data.
Maintained by Christof Lewerenz. Last updated 2 months ago.
rosetta:Parallel Use of Statistical Packages in Teaching
When teaching statistics, it can often be desirable to uncouple the content from specific software packages. To ease such efforts, the Rosetta Stats website (<>) allows comparing analyses in different packages. This package is the companion to the Rosetta Stats website, aiming to provide functions that produce output that is similar to output from other statistical packages, thereby facilitating 'software-agnostic' teaching of statistics.
Maintained by Gjalt-Jorn Peters. Last updated 2 years ago.
corpustools:Managing, Querying and Analyzing Tokenized Text
Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., part-of-speech tags, dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, exporting to DTM for compatibility with many text analysis packages, and the possibility to reconstruct original text from tokens to facilitate interpretation.
Maintained by Kasper Welbers. Last updated 6 months ago.
cpsvote:A Toolbox for Using the CPS's Voting and Registration Supplement
Provides automated methods for downloading, recoding, and merging selected years of the Current Population Survey's Voting and Registration Supplement, a large N national survey about registration, voting, and non-voting in United States federal elections. Provides documentation for appropriate use of sample weights to generate statistical estimates, drawing from Hur & Achen (2013) <doi:10.1093/poq/nft042> and McDonald (2018) <>.
Maintained by Jay Lee. Last updated 2 years ago.
midfieldr:Tools and Methods for Working with MIDFIELD Data in 'R'
Provides tools and demonstrates methods for working with individual undergraduate student-level records (registrar's data) in 'R'. Tools include filters for program codes, data sufficiency, and timely completion. Methods include gathering blocs of records, computing quantitative metrics such as graduation rate, and creating charts to visualize comparisons. 'midfieldr' interacts with practice data provided in 'midfielddata', an R data package available at <>. 'midfieldr' also interacts with the full MIDFIELD database for users who have access. This work is supported by the US National Science Foundation through grant numbers 1545667 and 2142087.
Maintained by Richard Layton. Last updated 2 months ago.
BGmisc:An R Package for Extended Behavior Genetics Analysis
Provides functions for behavior genetics analysis, including variance component model identification [Hunter et al. (2021) <doi:10.1007/s10519-021-10055-x>], calculation of relatedness coefficients using path-tracing methods [Wright (1922) <doi:10.1086/279872>; McArdle & McDonald (1984) <doi:10.1111/j.2044-8317.1984.tb00802.x>], inference of relatedness, pedigree conversion, and simulation of multi-generational family data [Lyu et al. (2024) <doi:10.1101/2024.12.19.629449>]. For a full overview, see Garrison et al. (2024) <doi:10.21105/joss.06203>.
Maintained by S. Mason Garrison. Last updated 25 days ago.
essurvey:Download Data from the European Social Survey on the Fly
Download data from the European Social Survey directly from their website <>. There are two families of functions that allow you to download and interactively check all countries and rounds available.
Maintained by Jorge Cimentada. Last updated 3 years ago.
growthcleanr:Data Cleaner for Anthropometric Measurements
Identifies implausible anthropometric (e.g., height, weight) measurements in irregularly spaced longitudinal datasets, such as those from electronic health records.
Maintained by Carrie Daymont. Last updated 17 days ago.
igoR:Intergovernmental Organizations Database
Tools to extract information from the Intergovernmental Organizations ('IGO') Database, version 3, provided by the Correlates of War Project <>. See also Pevehouse, J. C. et al. (2020). Version 3 includes information from 1815 to 2014.
Maintained by Diego Hernangómez. Last updated 6 days ago.
bioCancer:Interactive Multi-Omics Cancers Data Visualization and Analysis
This package is a Shiny App to visualize and analyse interactively Multi-Assays of Cancer Genomic Data.
Maintained by Karim Mezhoud. Last updated 5 months ago.
oneclust:Maximum Homogeneity Clustering for Univariate Data
Maximum homogeneity clustering algorithm for one-dimensional data described in W. D. Fisher (1958) <doi:10.1080/01621459.1958.10501479> via dynamic programming.
Maintained by Nan Xiao. Last updated 1 year ago.
peditools:Pediatric Clinical Data Science Tools
A collection of tools for newborn and pediatric anthropometric calculations and data abstraction from Vermont Oxford Network registry exports. Includes charts based on Lambda, Mu, Sigma (LMS) parameters, including: Fenton 2003, Olsen 2010, Olsen BMI, CDC infant, CDC pediatric, CDC BMI, CDC (Addo) skin, WHO infant, WHO skin, Abdel-Rahman 2017, Mramba 2017, Zemel Down Syndrome, Brooks cerebral palsy, WHO expanded, Cappa 2024 (except BMI). Includes functions to take a Vermont Oxford Network XML or CSV data file export read into a data frame, converting the coded variables into human readable factors.
Maintained by Joseph Chou. Last updated 2 months ago.
psyntur:Helper Tools for Teaching Statistical Data Analysis
Provides functions and data-sets that are helpful for teaching statistics and data analysis. It was originally designed for use when teaching students in the Psychology Department at Nottingham Trent University.
Maintained by Mark Andrews. Last updated 4 months ago.
labelr:Label Data Frames, Variables, and Values
Create and use data frame labels for data frame objects (frame labels), their columns (name labels), and individual values of a column (value labels). Value labels include one-to-one and many-to-one labels for nominal and ordinal variables, as well as numerical range-based value labels for continuous variables. Convert value-labeled variables so each value is replaced by its corresponding value label. Add values-converted-to-labels columns to a value-labeled data frame while preserving parent columns. Filter and subset a value-labeled data frame using labels, while returning results in terms of values. Overlay labels in place of values in common R commands to increase interpretability. Generate tables of value frequencies, with categories expressed as raw values or as labels. Access data frames that show value-to-label mappings for easy reference.
Maintained by Robert Hartman. Last updated 7 months ago.
retroharmonize:Ex Post Survey Data Harmonization
Assist in reproducible retrospective (ex-post) harmonization of data, particularly individual level survey data, by providing tools for organizing metadata, standardizing the coding of variables, and variable names and value labels, including missing values, and documenting the data transformations, with the help of comprehensive s3 classes.
Maintained by Daniel Antal. Last updated 2 months ago.
redcaptools:Tools for exporting and working with REDCap data
Tools for exporting and working with REDCap data (e.g. adding labels, formatting dates).
Maintained by Alan G Haynes. Last updated 4 months ago.
LightLogR:Process Data from Wearable Light Loggers and Optical Radiation Dosimeters
Import, processing, validation, and visualization of personal light exposure measurement data from wearable devices. The package implements features such as the import of data and metadata files, conversion of common file formats, validation of light logging data, verification of crucial metadata, calculation of common parameters, and semi-automated analysis and visualization.
Maintained by Johannes Zauner. Last updated 26 days ago.
DoE.base:Full Factorials, Orthogonal Arrays and Base Utilities for DoE Packages
Creates full factorial experimental designs and designs based on orthogonal arrays for (industrial) experiments. Provides diverse quality criteria. Provides utility functions for the class design, which is also used by other packages for designed experiments.
Maintained by Ulrike Groemping. Last updated 1 year ago.
STATcubeR:R Interface for the 'STATcube' REST API and Open Government Data
Import data from the 'STATcube' REST API or from the open data portal of Statistics Austria. This package includes a client for API requests as well as parsing utilities for data which originates from 'STATcube'. Documentation about 'STATcubeR' is provided by several vignettes included in the package as well as on the public 'pkgdown' page at <>.
Maintained by Bernhard Meindl. Last updated 4 months ago.
Jmisc:Julian Miscellaneous Function
Some handy functions in R.
Maintained by TszKin Julian Chan. Last updated 3 years ago.
Data package for 'dartR'. Provides data sets to run examples in 'dartR'. This was necessary due to the size limit imposed by 'CRAN'. The data in '' is needed to run the examples provided in the 'dartR' functions. All available data sets are either based on actual data (but reduced in size) and/or simulated data sets to allow the fast execution of examples and demonstration of the functions.
Maintained by Bernd Gruber. Last updated 10 months ago.
forcats:Tools for Working with Categorical Variables (Factors)
Helpers for reordering factor levels (including moving specified levels to front, ordering by first appearance, reversing, and randomly shuffling), and tools for modifying factor levels (including collapsing rare levels into other, 'anonymising', and manually 'recoding').
Maintained by Hadley Wickham. Last updated 1 year ago.
eatTools:Miscellaneous Functions for the Analysis of Educational Assessments
Miscellaneous functions for data cleaning and data analysis of educational assessments. Includes functions for descriptive analyses, character vector manipulations and weighted statistics. Mainly a lightweight dependency for the packages 'eatRep', 'eatGADS', 'eatPrep' and 'eatModel' (which will be subsequently submitted to 'CRAN'). The function for defining (weighted) contrasts in weighted effect coding refers to te Grotenhuis et al. (2017) <doi:10.1007/s00038-016-0901-1>. Functions for weighted statistics refer to Wolter (2007) <doi:10.1007/978-0-387-35099-8>.
Maintained by Sebastian Weirich. Last updated 3 months ago.
tntpr:Data Analysis Tools Customized for TNTP
An assortment of functions and templates customized to meet the needs of data analysts at the non-profit organization TNTP. Includes functions for branded colors and plots, credentials management, repository set-up, and other common analytic tasks.
Maintained by Dustin Pashouwer. Last updated 4 months ago.
petersenlab:A Collection of R Functions by the Petersen Lab
A collection of R functions that are widely used by the Petersen Lab. Included are functions for various purposes, including evaluating the accuracy of judgments and predictions, performing scoring of assessments, generating correlation matrices, conversion of data between various types, data management, psychometric evaluation, extensions related to latent variable modeling, various plotting capabilities, and other miscellaneous useful functions. By making the package available, we hope to make our methods reproducible and replicable by others and to help others perform their data processing and analysis methods more easily and efficiently. The codebase is provided in Petersen (2025) <doi:10.5281/zenodo.7602890> and on 'CRAN': <doi:10.32614/CRAN.package.petersenlab>. The package is described in "Principles of Psychological Assessment: With Applied Examples in R" (Petersen, 2024, 2025) <doi:10.1201/9781003357421>, <doi:10.25820/work.007199>, <doi:10.5281/zenodo.6466589>.
Maintained by Isaac T. Petersen. Last updated 27 days ago.
ggquickeda:Quickly Explore Your Data Using 'ggplot2' and 'table1' Summary Tables
Quickly and easily perform exploratory data analysis by uploading your data as a 'csv' file. Start generating insights using 'ggplot2' plots and 'table1' tables with descriptive stats, all using an easy-to-use point and click 'Shiny' interface.
Maintained by Samer Mouksassi. Last updated 1 day ago.
admiralophtha:ADaM in R Asset Library - Ophthalmology
Aids the programming of Clinical Data Interchange Standards Consortium (CDISC) compliant Ophthalmology Analysis Data Model (ADaM) datasets in R. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide" (CDISC Analysis Data Model Team, 2021, <>).
Maintained by Edoardo Mancini. Last updated 2 months ago.
ASRgenomics:Complementary Genomic Functions
Presents a series of molecular and genetic routines in the R environment with the aim of assisting in analytical pipelines before and after the use of 'asreml' or another library to perform analyses such as Genomic Selection or Genome-Wide Association Analyses. Methods and examples are described in Gezan, Oliveira, Galli, and Murray (2022) <>.
Maintained by Salvador Gezan. Last updated 1 year ago.
Hmisc:Harrell Miscellaneous
Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, simulation, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, recoding variables, caching, simplified parallel computing, encrypting and decrypting data using a safe workflow, general moving window statistical estimation, and assistance in interpreting principal component analysis.
Maintained by Frank E Harrell Jr. Last updated 1 day ago.
shipunov:Miscellaneous Functions from Alexey Shipunov
A collection of functions for data manipulation, plotting and statistical computing, to use separately or with the book "Visual Statistics. Use R!": Shipunov (2020) <>. Dr Alexey Shipunov died in December 2022. Most useful functions: Bclust(), Jclust() and BootA() which bootstrap hierarchical clustering; Recode() which does multiple recoding in a fast, simple and flexible way; Misclass() which outputs confusion matrix even if classes are not concerted; Overlap() which measures group separation on any projection; Biarrows() which converts any scatterplot into biplot; and Pleiad() which is fast and flexible correlogram.
Maintained by ORPHANED. Last updated 2 years ago.
gssr:US General Social Survey (GSS) Data for R
The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains the GSS Cumulative Data and GSS Panel Data files packaged for R. Its companion package, gssrdoc, provides the codebook integrated into R's help system. For more information on the GSS see \url{}.
Maintained by Kieran Healy. Last updated 4 months ago.
seqhandbook:Miscellaneous Tools for Sequence Analysis
It provides miscellaneous sequence analysis functions for describing episodes in individual sequences, measuring association between domains in multidimensional sequence analysis (see Piccarreta (2017) <doi:10.1177/0049124115591013>), heat maps of sequence data, Globally Interdependent Multidimensional Sequence Analysis (see Robette et al (2015) <doi:10.1177/0081175015570976>), smoothing sequences for index plots (see Piccarreta (2012) <doi:10.1177/0049124112452394>), coding sequences for Qualitative Harmonic Analysis (see Deville (1982)), measuring stress from multidimensional scaling factors (see Piccarreta and Lior (2010) <doi:10.1111/j.1467-985X.2009.00606.x>), symmetrical (or canonical) Partial Least Squares (see Bry (1996)).
Maintained by Nicolas Robette. Last updated 2 years ago.
tabxplor:User-Friendly Tables with Color Helpers for Data Exploration
Makes it easy to deal with multiple cross-tables in data exploration, by creating them, manipulating them, and adding color helpers to highlight important information (differences from totals, comparisons between lines or columns, contributions to variance, confidence intervals, odds ratios, etc.). All functions are pipe-friendly and render data frames which can be easily manipulated. At the same time, time-consuming operations are done with 'data.table' to run faster on big data frames. Tables can be exported with formats and colors to 'Excel', plot and html.
Maintained by Brice Nocenti. Last updated 8 days ago.
randomUniformForest:Random Uniform Forests for Classification, Regression and Unsupervised Learning
Ensemble model, for classification, regression and unsupervised learning, based on a forest of unpruned and randomized binary decision trees. Each tree is grown by sampling, with replacement, a set of variables at each node. Each cut-point is generated randomly, according to the continuous Uniform distribution. For each tree, data are either bootstrapped or subsampled. The unsupervised mode introduces clustering, dimension reduction and variable importance, using a three-layer engine. Random Uniform Forests mainly aim to lower correlation between trees (or tree residuals), to provide a deep analysis of variable importance and to allow native distributed and incremental learning.
Maintained by Saip Ciss. Last updated 3 years ago.
chipenrich:Gene Set Enrichment For ChIP-seq Peak Data
ChIP-Enrich and Poly-Enrich perform gene set enrichment testing using peaks called from a ChIP-seq experiment. The method empirically corrects for confounding factors such as the length of genes, and the mappability of the sequence surrounding genes.
Maintained by Kai Wang. Last updated 6 days ago.
CTNote:CTN Outcomes, Treatments, and Endpoints
The Clinical Trials Network (CTN) of the U.S. National Institute on Drug Abuse sponsored the CTN-0094 research team to harmonize data sets from three nationally-representative clinical trials for opioid use disorder (OUD). The CTN-0094 team herein provides a coded collection of trial outcomes and endpoints used in various OUD clinical trials over the past 50 years. These coded outcome functions are used to contrast and cluster different clinical outcome functions based on daily or weekly patient urine screenings. Note that we abbreviate urine drug screen as "UDS" and urine opioid screen as "UOS". For the example data sets (based on clinical trials data harmonized by the CTN-0094 research team), UDS and UOS are largely interchangeable.
Maintained by Gabriel Odom. Last updated 1 year ago.
statgenGWAS:Genome Wide Association Studies
Fast single trait Genome Wide Association Studies (GWAS) following the method described in Kang et al. (2010), <doi:10.1038/ng.548>. One of a series of statistical genetic packages for streamlining the analysis of typical plant breeding experiments developed by Biometris.
Maintained by Bart-Jan van Rossum. Last updated 4 months ago.
opendataformat:Reading and Writing Open Data Format Files
The Open Data Format (ODF) is a new, non-proprietary, multilingual, metadata enriched, and zip-compressed data format with metadata structured in the Data Documentation Initiative (DDI) Codebook standard. This package allows reading and writing of data files in the Open Data Format (ODF) in R, and displaying metadata in different languages. For further information on the Open Data Format, see <>.
Maintained by Tom Hartl. Last updated 8 days ago.
dietry:Utilities for Calculating Dietary Intake Indicators for Food Security Assessments
Food security assessments utilise several dietary intake indicators as proxy measures for diet quality, diet sufficiency, and food availability at either the individual or the household level. Utilities for recoding and calculating these indicators help establish consistent and reliable results.
Maintained by Ernest Guevarra. Last updated 3 months ago.
cepumd:Calculate Consumer Expenditure Survey (CE) Annual Estimates
Provides functions and data files to help CE Public-Use Microdata (PUMD) users calculate annual estimated expenditure means, standard errors, and quantiles according to the methods used by the CE with PUMD. For more information on the CE please visit <>. For further reading on CE estimate calculations please see the CE Calculation section of the U.S. Bureau of Labor Statistics (BLS) Handbook of Methods at <>. For further information about CE PUMD please visit <>.
Maintained by Arcenis Rojas. Last updated 11 months ago.
VWPre:Tools for Preprocessing Visual World Data
Gaze data from the Visual World Paradigm requires significant preprocessing prior to plotting and analyzing the data. This package provides functions for preparing visual world eye-tracking data for statistical analysis and plotting. It can prepare data for linear analyses (e.g., ANOVA, Gaussian-family LMER, Gaussian-family GAMM) as well as logistic analyses (e.g., binomial-family LMER and binomial-family GAMM). Additionally, it contains various plotting functions for creating grand average and conditional average plots. See the vignette for samples of the functionality. Currently, the functions in this package are designed for handling data collected with SR Research Eyelink eye trackers using Sample Reports created in SR Research Data Viewer. While we would like to add functionality for data collected with other systems in the future, the current package is considered to be feature-complete; further updates will mainly entail maintenance and the addition of minor functionality.
Maintained by Vincent Porretta. Last updated 4 years ago.
semhelpinghands:Helper Functions for Structural Equation Modeling
An assortment of helper functions for doing structural equation modeling, mainly by 'lavaan' for now. Most of them are time-saving functions for common tasks in doing structural equation modeling and reading the output. This package is not for functions that implement advanced statistical procedures. It is a light-weight package for simple functions that do simple tasks conveniently, with as few dependencies as possible.
Maintained by Shu Fai Cheung. Last updated 5 months ago.
oldr:An Implementation of Rapid Assessment Method for Older People
An implementation of the Rapid Assessment Method for Older People or RAM-OP <>. It provides various functions that allow the user to design and plan the assessment and analyse the collected data. RAM-OP provides accurate and reliable estimates of the needs of older people.
Maintained by Ernest Guevarra. Last updated 1 month ago.
peranavolley:Perana Sports Volleyball Files
Basic functions for reading and working with Perana Sports volleyball scouting files.
Maintained by Ben Raymond. Last updated 10 months ago.
phdcocktail:Enhance the Ease of R Experience as an Emerging Researcher
A toolkit of functions to help: i) effortlessly transform collected data into a publication ready format, ii) generate insightful visualizations from clinical data, iii) report summary statistics in a publication-ready format, iv) efficiently export, save and reload R objects within the framework of R projects.
Maintained by Dahham Alsoud. Last updated 1 year ago.
permutes:Permutation Tests for Time Series Data
Helps you determine the analysis window to use when analyzing densely-sampled time-series data, such as EEG data, using permutation testing (Maris & Oostenveld, 2007) <doi:10.1016/j.jneumeth.2007.03.024>. These permutation tests can help identify the timepoints where significance of an effect begins and ends, and the results can be plotted in various types of heatmap for reporting. Mixed-effects models are supported using an implementation of the approach by Lee & Braun (2012) <doi:10.1111/j.1541-0420.2011.01675.x>.
Maintained by Cesko C. Voeten. Last updated 2 years ago.
rock:Reproducible Open Coding Kit
The Reproducible Open Coding Kit ('ROCK', and this package, 'rock') was developed to facilitate reproducible and open coding, specifically geared towards qualitative research methods. Although it is a general-purpose toolkit, three specific applications have been implemented, specifically an interface to the 'rENA' package that implements Epistemic Network Analysis ('ENA'), means to process notes from Cognitive Interviews ('CIs'), and means to work with decentralized construct taxonomies ('DCTs'). The 'ROCK' and this 'rock' package are described in the ROCK book <> and more information, such as tutorials, is available at <>.
Maintained by Gjalt-Jorn Peters. Last updated 9 days ago.
hmatch:Tools for Cleaning and Matching Hierarchically-Structured Data
Tools for matching raw, potentially messy hierarchical data (e.g. province, county, township) against a reference dataset.
Maintained by Patrick Barks. Last updated 1 year ago.
bsnsing:Bsnsing: A Decision Tree Induction Method Based on Recursive Optimal Boolean Rule Composition
The bsnsing package provides functions for training a decision tree classifier, making predictions and generating LaTeX code for plotting. It solves the two-class and multi-class classification problems under the supervised learning paradigm. While building a decision tree, bsnsing uses a Boolean rule involving multiple variables to split a node. Each split rule is identified by solving an optimization problem. Use the bsnsing function to build a tree, the predict function to make predictions and the show function to plot the tree. The paper is at <arXiv:2205.15263>. Source code and more data sets are at <>.
Maintained by Yanchao Liu. Last updated 3 years ago.
codelist:Working with Code Lists
Functions for working with code lists and vectors with codes. These are an alternative to factors that keeps track of both the codes and labels. Methods allow for transforming between codes and labels. Also supports hierarchical code lists.
Maintained by Jan van der Laan. Last updated 25 days ago.
speedycode:Automate Code for Adding Labels, Recoding and Renaming Variables, and Converting ASCII Files
Label, recode, rename, and convert datasets and ASCII files more efficiently. 'speedycode' automates the code necessary for labeling variables with the 'labelled' package, recoding and renaming variables with 'dplyr' syntax, and converting ASCII files with the 'readroper' package. Most functions require only the name of the dataset and the code will be automatically written. Some convenience functions useful for converting ASCII files are also included.
Maintained by Jacob Harris. Last updated 3 years ago.
BCRA:Breast Cancer Risk Assessment
Functions provide risk projections of invasive breast cancer based on Gail model according to National Cancer Institute's Breast Cancer Risk Assessment Tool algorithm for specified race/ethnic groups and age intervals. Gail MH, Brinton LA, et al (1989) <doi:10.1093/jnci/81.24.1879>. Marthew PB, Gail MH, et al (2016) <doi:10.1093/jnci/djw215>.
Maintained by Fanni Zhang. Last updated 5 years ago.
odkr:'Open Data Kit' ('ODK') R API
Utility functions for working with datasets gathered using 'Open Data Kit' ('ODK') <>. These include an API to interface with 'ODK Briefcase', a 'Java' application for fetching and pushing 'ODK' forms and their contents, that allows pulling of data from either a remote 'ODK Aggregate Server' or a local 'ODK' folder, a rename function to give more human-readable variable names for 'ODK' datasets, a merge function to create a single data frame from a nested 'ODK' dataset and an expand function to disaggregate multiple-choice answers that have been collapsed into a single code by 'ODK'.
Maintained by Ernest Guevarra. Last updated 5 months ago.
limonaid:Working with 'LimeSurvey' Surveys and Responses
'LimeSurvey' is Free/Libre Open Source Software for the development and administration of online studies, using sophisticated tailoring capabilities to support multiple study designs (see <>). This package supports programmatic creation of surveys that can then be imported into 'LimeSurvey', as well as user-friendly import of responses from 'LimeSurvey' studies.
Maintained by Gjalt-Jorn Peters. Last updated 2 months ago.
psycCleaning:Data Cleaning for Psychological Analyses
Useful for preparing and cleaning data. It includes functions to center, reverse-code, dummy-code and effect-code data, and more.
Maintained by Jason Moy. Last updated 11 months ago.
exceldata:Streamline Data Import, Cleaning and Recoding from 'Excel'
A small group of functions to read in a data dictionary and the corresponding data table from 'Excel' and to automate the cleaning, re-coding and creation of simple calculated variables. This package was designed to be a companion to the macro-enabled 'Excel' template available on the GitHub site, but works with any similarly-formatted 'Excel' data.
Maintained by Lisa Avery. Last updated 1 year ago.
vimpclust:Variable Importance in Clustering
An implementation of methods related to sparse clustering and variable importance in clustering. The package currently supports sparse k-means clustering with a group penalty, so that it automatically selects groups of numerical features. It also supports sparse clustering and variable selection on mixed data (categorical and numerical features), by preprocessing each categorical feature as a group of numerical features. Several methods for visualizing and exploring the results are also provided. M. Chavent, J. Lacaille, A. Mourer and M. Olteanu (2020) <>.
Maintained by Madalina Olteanu. Last updated 4 years ago.
wtest:The W-Test for Genetic Interactions Testing
Perform the calculation of W-test, diagnostic checking, calculate minor allele frequency (MAF) and odds ratio.
Maintained by Rui Sun. Last updated 6 years ago.
dinamic:A Method to Analyze Recurrent DNA Copy Number Aberrations in Tumors
In tumor tissue, underlying genomic instability can lead to DNA copy number alterations, e.g., copy number gains or losses. Sporadic copy number alterations occur randomly throughout the genome, whereas recurrent alterations are observed in the same genomic region across multiple independent samples, perhaps because they provide a selective growth advantage. This package implements the DiNAMIC procedure for assessing the statistical significance of recurrent DNA copy number aberrations (Bioinformatics (2011) 27(5) 678 - 685).
Maintained by Vonn Walter. Last updated 1 year ago.
zooimage:Analysis of Numerical Plankton Images
A free (open source) solution for analyzing digital images of plankton. In combination with ImageJ, a free image analysis system, it processes digital images, measures individuals, trains for automatic classification of taxa, and finally, measures plankton samples (abundances, total and partial size spectra or biomasses, etc.).
Maintained by Philippe Grosjean. Last updated 7 years ago.
statgenQTLxT:Multi-Trait and Multi-Trial Genome Wide Association Studies
Fast multi-trait and multi-trial Genome Wide Association Studies (GWAS) following the method described in Zhou and Stephens (2014), <doi:10.1038/nmeth.2848>. One of a series of statistical genetic packages for streamlining the analysis of typical plant breeding experiments developed by Biometris.
Maintained by Bart-Jan van Rossum. Last updated 1 year ago.
ecopower:Power Estimates and Equivalence Testing for Multivariate Data
Estimates power by simulation for multivariate abundance data to be used for sample size estimates. Multivariate equivalence testing by simulation from a Gaussian copula model. The package also provides functions for parameterising multivariate effect sizes and simulating multivariate abundance data jointly. The discrete Gaussian copula approach is described in Popovic et al. (2018) <doi:10.1016/j.jmva.2017.12.002>.
Maintained by Michelle Lim. Last updated 2 years ago.
naomi.utils:Utility Functions For Naomi Datasets
This package contains utility functions for creating and manipulating datasets for the Naomi model and related projects.
Maintained by Jeffrey Eaton. Last updated 12 months ago.
PROreg:Patient Reported Outcomes Regression Analysis
It offers a wide variety of techniques, such as graphics, recoding, or regression models, for a comprehensive analysis of patient-reported outcomes (PRO). Especially novel is the broad range of regression models based on the beta-binomial distribution useful for analyzing binomial data with over-dispersion in cross-sectional, longitudinal, or multidimensional response studies (see Najera-Zuloaga J., Lee D.-J. and Arostegui I. (2019) <doi:10.1002/bimj.201700251>).
Maintained by Josu Najera-Zuloaga. Last updated 1 year ago.
GNOSIS:Genomics explorer using statistical and survival analysis in R
GNOSIS incorporates a range of R packages enabling users to efficiently explore and visualise clinical and genomic data obtained from cBioPortal. GNOSIS uses an intuitive GUI and multiple tab panels supporting a range of functionalities. These include data upload and initial exploration, data recoding and subsetting, multiple visualisations, survival analysis, statistical analysis and mutation analysis, in addition to facilitating reproducible research.
Maintained by Lydia King. Last updated 5 months ago.
multiUS:Functions for the Courses Multivariate Analysis and Computer Intensive Methods
Provides utility functions for multivariate analysis (factor analysis, discriminant analysis, and others). The package is primarily written for the course Multivariate analysis and for the course Computer intensive methods in the master's program of Applied Statistics at the University of Ljubljana.
Maintained by Cugmas Marjan. Last updated 2 years ago.
dfertility:District level estimation of age-specific fertility
This package produces district-level estimates of age-specific fertility from nationally representative household survey data.
Maintained by Oli Stevens. Last updated 1 year ago.
RcmdrPlugin.temis:Graphical Integrated Text Mining Solution
An 'R Commander' plug-in providing an integrated solution to perform a series of text mining tasks such as importing and cleaning a corpus, and analyses like term and document counts, vocabulary tables, term co-occurrences and document similarity measures, time series analysis, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, 'Twitter' queries, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.
Maintained by Milan Bouchet-Valat. Last updated 7 years ago.
OTrecod:Data Fusion using Optimal Transportation Theory
In the context of data fusion, the package provides a set of functions dedicated to solving 'recoding problems' using optimal transportation theory (Gares, Guernec, Savy (2019) <doi:10.1515/ijb-2018-0106> and Gares, Omer (2020) <doi:10.1080/01621459.2020.1775615>). Starting from two databases that overlap only in a subset of shared variables, the functions of the package assist users in obtaining a unique synthetic database, where the missing information is fully completed.
Maintained by Gregory Guernec. Last updated 2 years ago.
