R-universe search: recoding

beckerbenj

eatGADS:Data Management of Large Hierarchical Data

Import 'SPSS' data, handle and change 'SPSS' metadata, store and access large hierarchical data in 'SQLite' databases.

Maintained by Benjamin Becker. Last updated 25 days ago.

34.7 match 1 star 7.36 score 34 scripts 1 dependent

gdemin

expss:Tables, Labels and Some Useful Functions from Spreadsheets and 'SPSS' Statistics

Package computes and displays tables with support for 'SPSS'-style labels, multiple and nested banners, weights, multiple-response variables and significance testing. There are facilities for nice output of tables in 'knitr', 'Shiny', '*.xlsx' files, R and 'Jupyter' notebooks. Methods for labelled variables add value labels support to base R functions and to some functions from other packages. Additionally, the package brings popular data transformation functions from 'SPSS' Statistics and 'Excel': 'RECODE', 'COUNT', 'COUNTIF', 'VLOOKUP', etc. These functions are very useful for data processing in marketing research surveys. The package is intended to help people move data processing from 'Excel' and 'SPSS' to R.

Maintained by Gregory Demin. Last updated 11 months ago.

excel labels labels-support msexcel pivot-tables recode spss spss-statistics tables variable-labels vlookup

19.9 match 84 stars 11.00 score 1.8k scripts 4 dependents

nelson-gon

mde:Missing Data Explorer

Correct identification and handling of missing data is one of the most important steps in any analysis. To aid this process, 'mde' provides a very easy to use yet robust framework to quickly get an idea of where the missing data lies and therefore find the most appropriate action to take. Graham WJ (2009) <doi:10.1146/annurev.psych.58.110405.085530>.

Maintained by Nelson Gonzabato. Last updated 3 years ago.

data-analysis data-cleaning data-exploration data-science datacleaner datacleaning exploratory-data-analysis missing missing-data missing-value-treatment missing-values missingness omit recode replace statistics

35.3 match 4 stars 5.61 score 34 scripts

juba

questionr:Functions to Make Surveys Processing Easier

Set of functions to make the processing and analysis of surveys easier: interactive shiny apps and addins for data recoding, contingency tables, dataset metadata handling, and several convenience functions.

Maintained by Julien Barnier. Last updated 21 hours ago.

13.8 match 83 stars 12.91 score 1.1k scripts 19 dependents

big-life-lab

recodeflow:Contains functions to interface with variable details sheets, including recoding variables and converting them to PMML

Recode and harmonize data using variable and details sheets.

Maintained by Yulric Sequeria. Last updated 8 days ago.

20.8 match 6 stars 6.75 score 7 scripts

christopherkenny

censable:Making Census Data More Usable

Creates a common framework for organizing, naming, and gathering population, age, race, and ethnicity data from the Census Bureau. Accesses the API <https://www.census.gov/data/developers/data-sets.html>. Provides tools for adding information to existing data to line up with Census data.

Maintained by Christopher T. Kenny. Last updated 10 months ago.

24.0 match 8 stars 5.78 score 42 scripts 4 dependents

ropengov

regions:Processing Regional Statistics

Validating sub-national statistical typologies, re-coding across standard typologies of sub-national statistics, and making valid aggregate level imputation, re-aggregation, re-weighting and projection down to lower hierarchical levels to create meaningful data panels and time series.

Maintained by Daniel Antal. Last updated 2 years ago.

observatory regions ropengov statistics

15.2 match 12 stars 8.81 score 67 scripts 5 dependents

tidyverse

dplyr:A Grammar of Data Manipulation

A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

Maintained by Hadley Wickham. Last updated 15 days ago.

data-manipulation grammar cpp

5.3 match 4.8k stars 24.68 score 659k scripts 7.8k dependents

r-forge

car:Companion to Applied Regression

Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, 2019.

Maintained by John Fox. Last updated 5 months ago.

8.3 match 15.29 score 43k scripts 901 dependents

larmarange

labelled:Manipulating Labelled Data

Work with labelled data imported from 'SPSS' or 'Stata' with 'haven' or 'foreign'. This package provides useful functions to deal with "haven_labelled" and "haven_labelled_spss" classes introduced by 'haven' package.

Maintained by Joseph Larmarange. Last updated 28 days ago.

haven labels metadata sas spss stata

7.9 match 76 stars 15.02 score 2.4k scripts 96 dependents

jinghuazhao

gap:Genetic Analysis Package

As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many projects, hence also in line with the name (gap).

Maintained by Jing Hua Zhao. Last updated 18 days ago.

genetics imputation lmm fortran

9.0 match 12 stars 11.88 score 448 scripts 16 dependents

cysouw

qlcData:Processing Data for Quantitative Language Comparison

Functionality to read, recode, and transcode data as used in quantitative language comparison, specifically to deal with multilingual orthographic variation (Moran & Cysouw (2018) <doi:10.5281/zenodo.1296780>) and with the recoding of nominal data.

Maintained by Michael Cysouw. Last updated 9 months ago.

18.2 match 3 stars 5.38 score 40 scripts

melff

memisc:Management of Survey Data and Presentation of Analysis Results

An infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) 'SPSS' and 'Stata' files is provided. Further, the package allows producing tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates, which can be exported to 'LaTeX' and HTML.

Maintained by Martin Elff. Last updated 13 days ago.

survey-data

7.3 match 46 stars 12.34 score 1.2k scripts 13 dependents

andrisignorell

DescTools:Tools for Descriptive Statistics

A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox, which facilitates the (notoriously time consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. The package furthermore contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and arguments naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in absence of convincing alternatives). The 'BigCamelCase' style was consequently applied to functions borrowed from contributed R packages as well.

Maintained by Andri Signorell. Last updated 2 days ago.

fortran cpp

5.3 match 87 stars 16.70 score 7.7k scripts 99 dependents

simonlabcode

bakR:Analyze and Compare Nucleotide Recoding RNA Sequencing Datasets

Several implementations of a novel Bayesian hierarchical statistical model of nucleotide recoding RNA-seq experiments (NR-seq; TimeLapse-seq, SLAM-seq, TUC-seq, etc.) for analyzing and comparing NR-seq datasets (see 'Vock and Simon' (2023) <doi:10.1261/rna.079451.122>). NR-seq is a powerful extension of RNA-seq that provides information about the kinetics of RNA metabolism (e.g., RNA degradation rate constants), which is notably lacking in standard RNA-seq data. The statistical model makes maximal use of these high-throughput datasets by sharing information across transcripts to significantly improve uncertainty quantification and increase statistical power. 'bakR' includes a maximally efficient implementation of this model for conservative initial investigations of datasets. 'bakR' also provides more highly powered implementations using the probabilistic programming language 'Stan' to sample from the full posterior distribution. 'bakR' performs multiple-test adjusted statistical inference with the output of these model implementations to help biologists separate signal from background. Methods to automatically visualize key results and detect batch effects are also provided.

Maintained by Isaac Vock. Last updated 4 months ago.

cpp

13.8 match 6 stars 6.12 score 21 scripts

easystats

datawizard:Easy Data Wrangling and Statistical Transformations

A lightweight package to assist in key steps involved in any data analysis workflow: (1) wrangling the raw data to get it in the needed form, (2) applying preprocessing steps and statistical transformations, and (3) computing statistical summaries of data properties and distributions. It is also the data wrangling backend for packages in the 'easystats' ecosystem. References: Patil et al. (2022) <doi:10.21105/joss.04684>.

Maintained by Etienne Bacher. Last updated 20 hours ago.

data dplyr hacktoberfest janitor manipulation reshape tidyr wrangling

5.3 match 222 stars 14.74 score 436 scripts 119 dependents

chavent

PCAmixdata:Multivariate Analysis of Mixed Data

Implements principal component analysis, orthogonal rotation and multiple factor analysis for a mixture of quantitative and qualitative variables.

Maintained by Marie Chavent. Last updated 2 years ago.

9.1 match 8 stars 8.36 score 91 scripts 6 dependents

sdctools

sdcMicro:Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation

Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. The theoretical basis for the methods implemented can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface published in Meindl and Templ (2019) <doi:10.3390/a12090191> that allows the use of various methods of this package.

Maintained by Matthias Templ. Last updated 28 days ago.

cpp

5.9 match 83 stars 9.89 score 258 scripts

green-striped-gecko

dartR.base:Analysing 'SNP' and 'Silicodart' Data - Basic Functions

Facilitates the import and analysis of 'SNP' (single nucleotide 'polymorphism') and 'silicodart' (presence/absence) data. The main focus is on data generated by 'DarT' (Diversity Arrays Technology), however, data from other sequencing platforms can be used once 'SNP' or related fragment presence/absence data from any source is imported. Genetic datasets are stored in a derived 'genlight' format (package 'adegenet'), that allows for a very compact storage of data and metadata. Functions are available for importing and exporting of 'SNP' and 'silicodart' data, for reporting on and filtering on various criteria (e.g. 'callrate', 'heterozygosity', 'reproducibility', maximum allele frequency). Additional functions are available for visualization (e.g. Principal Coordinate Analysis) and creating a spatial representation using maps. 'dartR.base' is the 'base' package of the 'dartRverse' suite of packages. To install the other packages, we recommend installing the 'dartRverse' package, which supports the installation of all packages in the 'dartRverse'. If you want to cite 'dartR', you can find the information by typing citation('dartR.base') in the console.

Maintained by Bernd Gruber. Last updated 14 days ago.

15.0 match 3.84 score 17 scripts 5 dependents

nathaneastwood

poorman:A Poor Man's Dependency Free Recreation of 'dplyr'

A replication of key functionality from 'dplyr' and the wider 'tidyverse' using only 'base'.

Maintained by Nathan Eastwood. Last updated 1 year ago.

base-r data-manipulation grammar

5.3 match 341 stars 10.79 score 156 scripts 27 dependents

chrhennig

fpc:Flexible Procedures for Clustering

Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance based clustering including corrected Rand index. Standardisation of cluster validation statistics by random clusterings and comparison between many clustering methods and numbers of clusters based on this. Cluster-wise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther's prediction strength, Fang and Wang's bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.

Maintained by Christian Hennig. Last updated 6 months ago.

5.9 match 11 stars 9.25 score 2.6k scripts 70 dependents

rich-iannone

DiagrammeR:Graph/Network Visualization

Build graph/network structures using functions for stepwise addition and deletion of nodes and edges. Work with data available in tables for bulk addition of nodes, edges, and associated metadata. Use graph selections and traversals to apply changes to specific nodes or edges. A wide selection of graph algorithms allow for the analysis of graphs. Visualize the graphs and take advantage of any aesthetic properties assigned to nodes and edges.

Maintained by Richard Iannone. Last updated 2 months ago.

graph graph-functions network-graph property-graph visualization

3.6 match 1.7k stars 15.18 score 3.8k scripts 87 dependents

alisonlanski

IPEDSuploadables:Transforms Institutional Data into Text Files for IPEDS Automated Import/Upload

Starting from user-supplied institutional data, these scripts transform, aggregate, and reshape the information to produce key-value pair data files that are able to be uploaded to IPEDS (Integrated Postsecondary Education Data System) through their submission portal <https://surveys.nces.ed.gov/ipeds/>. Starting data specifications can be found in the vignettes. Final files are saved locally to a location of the user's choice. User-friendly readable files can also be produced for purposes of data review and validation.

Maintained by Alison Lanski. Last updated 3 months ago.

7.8 match 8 stars 7.05 score 39 scripts

jbryer

likert:Analysis and Visualization Likert Items

An approach to analyzing Likert response items, with an emphasis on visualizations. The stacked bar plot is the preferred method for presenting Likert results. Tabular results are also implemented along with density plots to assist researchers in determining whether Likert responses can be used quantitatively instead of qualitatively. See the likert(), summary.likert(), and plot.likert() functions to get started.

Maintained by Jason Bryer. Last updated 3 years ago.

5.3 match 310 stars 10.22 score 480 scripts 2 dependents

walkerke

tidycensus:Load US Census Boundary and Attribute Data as 'tidyverse' and 'sf'-Ready Data Frames

An integrated R interface to several United States Census Bureau APIs (<https://www.census.gov/data/developers/data-sets.html>) and the US Census Bureau's geographic boundary files. Allows R users to return Census and ACS data as tidyverse-ready data frames, and optionally returns a list-column with feature geometry for mapping and spatial analysis.

Maintained by Kyle Walker. Last updated 2 months ago.

3.6 match 647 stars 14.27 score 7.5k scripts 10 dependents

sebkrantz

collapse:Advanced and Fast Data Transformation

A C/C++ based package for advanced data transformation and statistical computing in R that is extremely fast, class-agnostic, robust and programmer friendly. Core functionality includes a rich set of S3 generic grouped and weighted statistical functions for vectors, matrices and data frames, which provide efficient low-level vectorizations, OpenMP multithreading, and skip missing values by default. These are integrated with fast grouping and ordering algorithms (also callable from C), and efficient data manipulation functions. The package also provides a flexible and rigorous approach to time series and panel data in R. It further includes fast functions for common statistical procedures, detailed (grouped, weighted) summary statistics, powerful tools to work with nested data, fast data object conversions, functions for memory efficient R programming, and helpers to effectively deal with variable labels, attributes, and missing data. It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf', 'units', 'plm' (panel-series and data frames), and 'xts'/'zoo'.

Maintained by Sebastian Krantz. Last updated 8 days ago.

data-aggregation data-analysis data-manipulation data-processing data-science data-transformation econometrics high-performance panel-data scientific-computing statistics time-series weighted weights cpp openmp

3.0 match 672 stars 16.63 score 708 scripts 97 dependents

hojsgaard

doBy:Groupwise Statistics, LSmeans, Linear Estimates, Utilities

Utility package containing: 1) Facilities for working with grouped data: 'do' something to data stratified 'by' some variables. 2) LSmeans (least-squares means), general linear estimates. 3) Restrict functions to a smaller domain. 4) Miscellaneous other utilities.

Maintained by Søren Højsgaard. Last updated 6 days ago.

3.3 match 1 star 14.94 score 3.2k scripts 939 dependents

american-institutes-for-research

EdSurvey:Analysis of NCES Education Survey and Assessment Data

Functions to read in and analyze education survey and assessment data from the National Center for Education Statistics (NCES) <https://nces.ed.gov/>, including National Assessment of Educational Progress (NAEP) data <https://nces.ed.gov/nationsreportcard/> and data from the International Assessment Database: Organisation for Economic Co-operation and Development (OECD) <https://www.oecd.org/en/about/directorates/directorate-for-education-and-skills.html>, including Programme for International Student Assessment (PISA), Teaching and Learning International Survey (TALIS), Programme for the International Assessment of Adult Competencies (PIAAC), and International Association for the Evaluation of Educational Achievement (IEA) <https://www.iea.nl/>, including Trends in International Mathematics and Science Study (TIMSS), TIMSS Advanced, Progress in International Reading Literacy Study (PIRLS), International Civic and Citizenship Study (ICCS), International Computer and Information Literacy Study (ICILS), and Civic Education Study (CivEd).

Maintained by Paul Bailey. Last updated 17 days ago.

6.1 match 10 stars 7.86 score 139 scripts 1 dependent

rkabacoff

qacBase:Functions to Facilitate Exploratory Data Analysis

Functions for descriptive statistics, data management, and data visualization.

Maintained by Robert Kabacoff. Last updated 3 years ago.

eda statistics

8.8 match 1 star 5.13 score 45 scripts

bnosac

udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser, namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.

Maintained by Jan Wijffels. Last updated 2 years ago.

conll dependency-parser lemmatization natural-language-processing nlp pos-tagging r-pkg rcpp text-mining tokenizer udpipe cpp

3.8 match 215 stars 11.83 score 1.2k scripts 9 dependents

ewenharrison

finalfit:Quickly Create Elegant Regression Results Tables and Plots when Modelling

Generate regression results tables and plots in final format for publication. Explore models and export directly to PDF and 'Word' using 'RMarkdown'.

Maintained by Ewen Harrison. Last updated 7 months ago.

3.9 match 270 stars 11.43 score 1.0k scripts

dusadrian

admisc:Adrian Dusa's Miscellaneous

Contains functions used across packages 'DDIwR', 'QCA' and 'venn'. Interprets and translates, factorizes and negates SOP - Sum of Products expressions, for both binary and multi-value crisp sets, and extracts information (set names, set values) from those expressions. Other functions check whether values are possibly numeric (even if all numbers reside in a character vector) and coerce them to numeric, or check whether the numbers are whole. It also offers, among many others, a highly versatile recoding routine and some more flexible alternatives to the base functions 'with()' and 'within()'. SOP simplification functions in this package use related minimization from package 'QCA', which is recommended to be installed despite not being listed in the Imports field, due to circular dependency issues.

Maintained by Adrian Dusa. Last updated 5 days ago.

5.8 match 2 stars 7.61 score 20 scripts 92 dependents

s-fleck

lest:Vectorised Nested if-else Statements Similar to CASE WHEN in 'SQL'

Functions for vectorised conditional recoding of variables. case_when() enables you to vectorise multiple if and else statements (like 'CASE WHEN' in 'SQL'). if_else() is a stricter and more predictable version of ifelse() in 'base' that preserves attributes. These functions are forked from 'dplyr' with all package dependencies removed and behave identically to the originals.

Maintained by Stefan Fleck. Last updated 1 year ago.

dplyr ifelse recoding

10.5 match 24 stars 4.08 score 4 scripts

rossellhayes

incase:Pipe-Friendly Vector Replacement with Case Statements

Offers a pipe-friendly alternative to the 'dplyr' functions case_when() and if_else(), as well as a number of user-friendly simplifications for common use cases. These functions accept a vector as an optional first argument, allowing conditional statements to be built using the 'magrittr' dot operator. The functions also coerce all outputs to the same type, meaning you no longer have to worry about using specific typed variants of NA or explicitly declaring integer outputs, and evaluate outputs somewhat lazily, so you don't waste time on long operations that won't be used.

Maintained by Alexander Rossell Hayes. Last updated 8 months ago.

magrittr magrittr-pipes recode-values tidyverse

11.1 match 7 stars 3.85 score 9 scripts

usccana

netdiffuseR:Analysis of Diffusion and Contagion Processes on Networks

Empirical statistical analysis, visualization and simulation of diffusion and contagion processes on networks. The package implements algorithms for calculating network diffusion statistics such as transmission rate, hazard rates, exposure models, network threshold levels, infectiousness (contagion), and susceptibility. The package is inspired by work published in Valente, et al., (2015) <DOI:10.1016/j.socscimed.2015.10.001>; Valente (1995) <ISBN: 9781881303213>, Myers (2000) <DOI:10.1086/303110>, Iyengar and others (2011) <DOI:10.1287/mksc.1100.0566>, Burt (1987) <DOI:10.1086/228667>; among others.

Maintained by George Vega Yon. Last updated 3 months ago.

contagion diffusion-network network-analysis network-visualization openblas cpp openmp

4.8 match 88 stars 8.88 score 217 scripts

mhahsler

arules:Mining Association Rules and Frequent Itemsets

Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. Hahsler, Gruen and Hornik (2005) <doi:10.18637/jss.v014.i15>.

Maintained by Michael Hahsler. Last updated 1 month ago.

arules association-rules frequent-itemsets

3.0 match 194 stars 13.99 score 3.3k scripts 28 dependents

psychbruce

bruceR:Broadly Useful Convenient and Efficient R Functions

Broadly useful convenient and efficient R functions that bring users concise and elegant R data analyses. This package includes easy-to-use functions for (1) basic R programming (e.g., set working directory to the path of currently opened file; import/export data from/to files in any format; print tables to Microsoft Word); (2) multivariate computation (e.g., compute scale sums/means/... with reverse scoring); (3) reliability analyses and factor analyses; (4) descriptive statistics and correlation analyses; (5) t-test, multi-factor analysis of variance (ANOVA), simple-effect analysis, and post-hoc multiple comparison; (6) tidy report of statistical models (to R Console and Microsoft Word); (7) mediation and moderation analyses (PROCESS); and (8) additional toolbox for statistics and graphics.

Maintained by Han-Wu-Shuang Bao. Last updated 9 months ago.

anova data-analysis data-science linear-models linear-regression multilevel-models statistics toolbox

5.3 match 176 stars 7.87 score 316 scripts 3 dependents

bioc

celda:CEllular Latent Dirichlet Allocation

Celda is a suite of Bayesian hierarchical models for clustering single-cell RNA-sequencing (scRNA-seq) data. It is able to perform "bi-clustering" and simultaneously cluster genes into gene modules and cells into cell subpopulations. It also contains DecontX, a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. A variety of scRNA-seq data visualization functions is also included.

Maintained by Joshua Campbell. Last updated 30 days ago.

singlecell geneexpression clustering sequencing bayesian immunooncology dataimport cpp openmp

3.8 match 147 stars 10.47 score 256 scripts 2 dependents

big-life-lab

cchsflow:Transforming and Harmonizing CCHS Variables

Supporting the use of the Canadian Community Health Survey (CCHS) by transforming variables from each cycle into harmonized, consistent versions that span survey cycles (currently, 2001 to 2018). CCHS data used in this library is accessed and adapted in accordance with the Statistics Canada Open Licence Agreement. This package uses rec_with_table(), which was developed from 'sjmisc' rec(). Lüdecke D (2018). "sjmisc: Data and Variable Transformation Functions". Journal of Open Source Software, 3(26), 754. <doi:10.21105/joss.00754>.

Maintained by Kitty Chen. Last updated 1 year ago.

cchs opensci openscience

6.3 match 12 stars 6.02 score 192 scripts

briencj

asremlPlus:Augments 'ASReml-R' in Fitting Mixed Models and Packages Generally in Exploring Prediction Differences

Assists in automating the selection of terms to include in mixed models when 'asreml' is used to fit the models. Procedures are available for choosing models that conform to the hierarchy or marginality principle, for fitting and choosing between two-dimensional spatial models using correlation, natural cubic smoothing spline and P-spline models. A history of the fitting of a sequence of models is kept in a data frame. Also used to compute functions and contrasts of, to investigate differences between and to plot predictions obtained using any model fitting function. The content falls into the following natural groupings: (i) Data, (ii) Model modification functions, (iii) Model selection and description functions, (iv) Model diagnostics and simulation functions, (v) Prediction production and presentation functions, (vi) Response transformation functions, (vii) Object manipulation functions, and (viii) Miscellaneous functions (for further details see 'asremlPlus-package' in help). The 'asreml' package provides a computationally efficient algorithm for fitting a wide range of linear mixed models using Residual Maximum Likelihood. It is a commercial package and a license for it can be purchased from 'VSNi' <https://vsni.co.uk/> as 'asreml-R', who will supply a zip file for local installation/updating (see <https://asreml.kb.vsni.co.uk/>). It is not needed for functions that are methods for 'alldiffs' and 'data.frame' objects. The package 'asremlPlus' can also be installed from <http://chris.brien.name/rpackages/>.

Maintained by Chris Brien. Last updated 29 days ago.

asreml mixed-models

3.9 match 19 stars 9.34 score 200 scripts

dgerbing

lessR:Less Code, More Results

Each function replaces multiple standard R functions. For example, two function calls, Read() and CountAll(), generate summary statistics for all variables in the data frame, plus histograms and bar charts as appropriate. Other functions provide for summary statistics via pivot tables, a comprehensive regression analysis, ANOVA and t-test, visualizations including the Violin/Box/Scatter plot for a numerical variable, bar chart, histogram, box plot, density curves, calibrated power curve, reading multiple data formats with the same function call, variable labels, time series with aggregation and forecasting, color themes, and Trellis (facet) graphics. Also includes a confirmatory factor analysis of multiple indicator measurement models, pedagogical routines for data simulation such as for the Central Limit Theorem, generation and rendering of regression instructions for interpretative output, and interactive visualizations.

Maintained by David W. Gerbing. Last updated 2 days ago.

4.8 match 6 stars 7.42 score 394 scripts 3 dependents

emmanuelparadis

ape:Analyses of Phylogenetics and Evolution

Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.

Maintained by Emmanuel Paradis. Last updated 2 days ago.

openblas cpp

2.0 match 64 stars 17.22 score 13k scripts 599 dependents

nickbond

hydrostats:Hydrologic Indices for Daily Time Series Data

Calculates a suite of hydrologic indices for daily time series data that are widely used in hydrology and stream ecology.

Maintained by Nick Bond. Last updated 3 years ago.

6.0 match 26 stars 5.71 score 65 scripts 1 dependents

markfairbanks

tidytable:Tidy Interface to 'data.table'

A tidy interface to 'data.table', giving users the speed of 'data.table' while using tidyverse-like syntax.

Maintained by Mark Fairbanks. Last updated 2 months ago.

3.0 match 458 stars 11.41 score 732 scripts 10 dependents

briencj

dae:Functions Useful in the Design and ANOVA of Experiments

The content falls into the following groupings: (i) Data, (ii) Factor manipulation functions, (iii) Design functions, (iv) ANOVA functions, (v) Matrix functions, (vi) Projector and canonical efficiency functions, and (vii) Miscellaneous functions. A vignette called 'DesignNotes' describes how to use the design functions for randomizing and assessing designs. The ANOVA functions facilitate the extraction of information when the 'Error' function has been used in the call to 'aov'. The package 'dae' can also be installed from <http://chris.brien.name/rpackages/>.

Maintained by Chris Brien. Last updated 4 months ago.

3.9 match 1 stars 8.62 score 356 scripts 7 dependents

rapidsurveys

bbw:Blocked Weighted Bootstrap

The blocked weighted bootstrap (BBW) is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. population-proportional sampling or PPS as used in Standardized Monitoring and Assessment of Relief and Transitions or SMART surveys) or posterior weighting (e.g. as used in rapid assessment method or RAM and simple spatial sampling method or S3M surveys) is implemented. See Cameron et al (2008) <doi:10.1162/rest.90.3.414> for application of bootstrap to cluster samples. See Aaron et al (2016) <doi:10.1371/journal.pone.0163176> and Aaron et al (2016) <doi:10.1371/journal.pone.0162462> for application of the blocked weighted bootstrap to estimate indicators from two-stage cluster sampled surveys.

Maintained by Ernest Guevarra. Last updated 2 months ago.

bootstrapping-statistics ram surveys

6.0 match 3 stars 5.61 score 9 scripts 1 dependents

ropensci

gendercoder:Recodes Sex/Gender Descriptions into a Standard Set

Provides functions and dictionaries for recoding of freetext gender responses into more consistent categories.

Maintained by Yaoxiang Li. Last updated 1 month ago.

gender-diversity ozunconf18 unconf

5.2 match 46 stars 6.36 score 45 scripts

nyuglobalties

rcoder:Lightweight Data Structure for Recoding Categorical Data without Factors

A data structure and toolkit for documenting and recoding categorical data that can be shared in other statistical software.

Maintained by Patrick Anker. Last updated 1 year ago.

9.3 match 2 stars 3.48 score 7 scripts 1 dependents

lindsayevanslee

whomds:Calculate Results from WHO Model Disability Survey Data

The Model Disability Survey (MDS) <https://www.who.int/activities/collection-of-data-on-disability> is a World Health Organization (WHO) general population survey instrument to assess the distribution of disability within a country or region, grounded in the International Classification of Functioning, Disability and Health <https://www.who.int/standards/classifications/international-classification-of-functioning-disability-and-health>. This package provides fit-for-purpose functions for calculating and presenting the results from this survey, as used by the WHO. The package primarily provides functions for implementing Rasch Analysis (see Andrich (2011) <doi:10.1586/erp.11.59>) to calculate a metric scale for disability.

Maintained by Lindsay Lee. Last updated 2 years ago.

5.8 match 4 stars 5.56 score 15 scripts

jangraffelman

HardyWeinberg:Statistical Tests and Graphics for Hardy-Weinberg Equilibrium

Contains tools for exploring Hardy-Weinberg equilibrium (Hardy, 1908; Weinberg, 1908) for bi and multi-allelic genetic marker data. All classical tests (chi-square, exact, likelihood-ratio and permutation tests) with bi-allelic variants are included in the package, as well as functions for power computation and for the simulation of marker data under equilibrium and disequilibrium. Routines for dealing with markers on the X-chromosome are included (Graffelman & Weir, 2016) <doi:10.1038/hdy.2016.20>, including Bayesian procedures. Some exact and permutation procedures also work with multi-allelic variants. Special test procedures that jointly address Hardy-Weinberg equilibrium and equality of allele frequencies in both sexes are supplied, for the bi and multi-allelic case. Functions for testing equilibrium in the presence of missing data by using multiple imputation are also provided. Implements several graphics for exploring the equilibrium status of a large set of bi-allelic markers: ternary plots with acceptance regions, log-ratio plots and Q-Q plots. The functionality of the package is explained in detail in a related JSS paper <doi:10.18637/jss.v064.i03>.

Maintained by Jan Graffelman. Last updated 12 months ago.

cpp

5.0 match 6.30 score 167 scripts 4 dependents

cran

epiDisplay:Epidemiological Data Display Package

Package for data exploration and result presentation. Full 'epicalc' package with data management functions is available at '<https://medipe.psu.ac.th/epicalc/>'.

Maintained by Virasakdi Chongsuvivatwong. Last updated 3 years ago.

5.5 match 1 stars 5.44 score 758 scripts 2 dependents

ddisab01

quest:Prepare Questionnaire Data for Analysis

Offers a suite of functions to prepare questionnaire data for analysis (perhaps other types of data as well). By data preparation, I mean data analytic tasks to get your raw data ready for statistical modeling (e.g., regression). There are functions to investigate missing data, reshape data, validate responses, recode variables, score questionnaires, center variables, aggregate by groups, shift scores (i.e., leads or lags), etc. It provides functions for both single level and multilevel (i.e., grouped) data. With a few exceptions (e.g., ncases()), functions without an "s" at the end of their primary word (e.g., center_by()) act on atomic vectors, while functions with an "s" at the end of their primary word (e.g., centers_by()) act on multiple columns of a data.frame.

Maintained by David Disabato. Last updated 1 year ago.

14.6 match 1.98 score 12 scripts

revelle

psychTools:Tools to Accompany the 'psych' Package for Psychological Research

Support functions, data sets, and vignettes for the 'psych' package. Contains several of the biggest data sets for the 'psych' package as well as four vignettes. A few helper functions for file manipulation are included as well. For more information, see the <https://personality-project.org/r/> web page.

Maintained by William Revelle. Last updated 12 months ago.

4.7 match 5.89 score 178 scripts 5 dependents

ropensci

stats19:Work with Open Road Traffic Casualty Data from Great Britain

Tools to help download, process and analyse the UK road collision data collected using the 'STATS19' form. The datasets are provided as 'CSV' files with detailed road safety information about the circumstances of car crashes and other incidents on the roads resulting in casualties in Great Britain from 1979 to present. Tables are available on 'collisions' with the circumstances (e.g. speed limit of road), information about 'vehicles' involved (e.g. type of vehicle), and 'casualties' (e.g. age). The statistics relate only to events on public roads that were reported to the police, and subsequently recorded, using the 'STATS19' collision reporting form. See the Department for Transport website <https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-accidents-safety-data> for more information on these datasets. The package is described in a paper in the Journal of Open Source Software (Lovelace et al. 2019) <doi:10.21105/joss.01181>. See Gilardi et al. (2022) <doi:10.1111/rssa.12823>, Vidal-Tortosa et al. (2021) <doi:10.1016/j.jth.2021.101291>, and Tait et al. (2023) <doi:10.1016/j.aap.2022.106895> for examples of how the data can be used for methodological and empirical road safety research.

Maintained by Robin Lovelace. Last updated 2 months ago.

stats19 road-safety transport car-crashes ropensci data

2.8 match 64 stars 9.20 score 193 scripts

choi-phd

lordif:Logistic Ordinal Regression Differential Item Functioning using IRT

Performs analysis of Differential Item Functioning (DIF) for dichotomous and polytomous items using an iterative hybrid of ordinal logistic regression and item response theory (IRT) according to Choi, Gibbons, and Crane (2011) <doi:10.18637/jss.v039.i08>.

Maintained by Seung W. Choi. Last updated 2 months ago.

5.0 match 1 stars 5.12 score 35 scripts 1 dependents

bowenwang7

rres:Realized Relatedness Estimation and Simulation

Functions for studying realized genetic relatedness between people. Users will be able to simulate inheritance patterns given pedigree structures, generate SNP marker data given inheritance patterns, and estimate realized relatedness between pairs of individuals using SNP marker data. See Wang (2017) <doi:10.1534/genetics.116.197004>. This work was supported by National Institutes of Health grant R37 GM-046255.

Maintained by Bowen Wang. Last updated 7 years ago.

cpp

8.4 match 2.95 score 18 scripts

zpneal

childfree:Access and Harmonize Childfree Demographic Data

Reads demographic data from a variety of public data sources, extracting and harmonizing variables useful for the study of childfree individuals. The identification of childfree individuals and those with other family statuses uses Neal & Neal's (2024) "A Framework for Studying Adults who Neither have Nor Want Children" <doi:10.1177/10664807231198869>; a pre-print is available at <doi:10.31234/osf.io/fa89m>.

Maintained by Zachary Neal. Last updated 6 months ago.

5.1 match 4.85 score 1 scripts

eltoulemonde

dataPreparation:Automated Data Preparation

Do most of the painful data preparation for a data science project with a minimum amount of code. Takes advantage of 'data.table' efficiency and uses some algorithmic tricks to perform data preparation in a time- and RAM-efficient way.

Maintained by Emmanuel-Lin Toulemonde. Last updated 2 years ago.

data-preparation data-preprocessing data-science date-conversion speed variable-elimination variable-selection

4.5 match 31 stars 5.46 score 86 scripts

lcbc-uio

questionnaires:Package with functions to calculate components and sums for LCBC questionnaires

Creates summaries and factorials of answers to questionnaires.

Maintained by Athanasia Mo Mowinckel. Last updated 2 years ago.

5.1 match 3 stars 4.63 score 13 scripts

jwijffels

ETLUtils:Utility Functions to Execute Standard Extract/Transform/Load Operations (using Package 'ff') on Large Data

Provides functions to facilitate the use of the 'ff' package in interaction with big data in 'SQL' databases (e.g. in 'Oracle', 'MySQL', 'PostgreSQL', 'Hive') by allowing easy importing directly into 'ffdf' objects using 'DBI', 'RODBC' and 'RJDBC'. Also contains some basic utility functions to do fast left outer join merging based on 'match', factorisation of data and a basic function for re-coding vectors.

Maintained by Jan Wijffels. Last updated 5 years ago.

4.9 match 20 stars 4.75 score 28 scripts

rvlenth

rsm:Response-Surface Analysis

Provides functions to generate response-surface designs, fit first- and second-order response-surface models, make surface plots, obtain the path of steepest ascent, and do canonical analysis. A good reference on these methods is Chapter 10 of Wu, C-F J and Hamada, M (2009) "Experiments: Planning, Analysis, and Parameter Design Optimization" ISBN 978-0-471-69946-0. An early version of the package is documented in Journal of Statistical Software <doi:10.18637/jss.v032.i07>.

Maintained by Russell Lenth. Last updated 9 months ago.

2.3 match 18 stars 10.16 score 192 scripts 8 dependents

revelle

psych:Procedures for Psychological, Psychometric, and Personality Research

A general purpose toolbox developed originally for personality, psychometric theory and experimental psychology. Functions are primarily for multivariate analysis and scale construction using factor analysis, principal component analysis, cluster analysis and reliability analysis, although others provide basic descriptive statistics. Item Response Theory is done using factor analysis of tetrachoric and polychoric correlations. Functions for analyzing data at multiple levels include within and between group statistics, including correlations and factor analysis. Validation and cross validation of scales developed using basic machine learning algorithms are provided, as are functions for simulating and testing particular item and test structures. Several functions serve as a useful front end for structural equation modeling. Graphical displays of path diagrams, including mediation models, factor analysis and structural equation models are created using basic graphics. Some of the functions are written to support a book on psychometric theory as well as publications in personality research. For more information, see the <https://personality-project.org/r/> web page.

Maintained by William Revelle. Last updated 3 months ago.

1.6 match 52 stars 13.94 score 29k scripts 317 dependents

snoweye

EMCluster:EM Algorithm for Model-Based Clustering of Finite Mixture Gaussian Distribution

EM algorithms and several efficient initialization methods for model-based clustering of finite mixture Gaussian distribution with unstructured dispersion in both unsupervised and semi-supervised learning.

Maintained by Wei-Chen Chen. Last updated 6 months ago.

openblas

3.0 match 18 stars 7.53 score 123 scripts 2 dependents

alexanderrobitzsch

sirt:Supplementary Item Response Theory Models

Supplementary functions for item response models aiming to complement existing R packages. The functionality includes among others multidimensional compensatory and noncompensatory IRT models (Reckase, 2009, <doi:10.1007/978-0-387-89976-3>), MCMC for hierarchical IRT models and testlet models (Fox, 2010, <doi:10.1007/978-1-4419-0742-4>), NOHARM (McDonald, 1982, <doi:10.1177/014662168200600402>), Rasch copula model (Braeken, 2011, <doi:10.1007/s11336-010-9190-4>; Schroeders, Robitzsch & Schipolowski, 2014, <doi:10.1111/jedm.12054>), faceted and hierarchical rater models (DeCarlo, Kim & Johnson, 2011, <doi:10.1111/j.1745-3984.2011.00143.x>), ordinal IRT model (ISOP; Scheiblechner, 1995, <doi:10.1007/BF02301417>), DETECT statistic (Stout, Habing, Douglas & Kim, 1996, <doi:10.1177/014662169602000403>), local structural equation modeling (LSEM; Hildebrandt, Luedtke, Robitzsch, Sommer & Wilhelm, 2016, <doi:10.1080/00273171.2016.1142856>).

Maintained by Alexander Robitzsch. Last updated 3 months ago.

item-response-theory openblas cpp

2.3 match 23 stars 10.01 score 280 scripts 22 dependents

rempsyc

rempsyc:Convenience Functions for Psychology

Make your workflow faster and easier. Easily customizable plots (via 'ggplot2'), nice APA tables (following the style of the *American Psychological Association*) exportable to Word (via 'flextable'), easily run statistical tests or check assumptions, and automatize various other tasks.

Maintained by Rémi Thériault. Last updated 1 month ago.

convenience-functions ggplot2 psychology statistics visualization

2.0 match 43 stars 10.68 score 214 scripts 2 dependents

mrc-ide

demogsurv:Demographic analysis of DHS and other household surveys

This package includes tools for calculating demographic indicators from household survey data. It was initially developed for processing and analysis of data from Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS). The package provides tools to calculate standard child mortality, adult mortality, and fertility indicators stratified arbitrarily by age group, calendar period, pre-survey time periods, birth cohorts and other survey variables (e.g. residence, region, wealth status, education, etc.). Design-based standard errors and sample correlations are available for all indicators via Taylor linearisation or jackknife.

Maintained by Jeff Eaton. Last updated 3 years ago.

7.2 match 6 stars 2.92 score 28 scripts

njtierney

naniar:Data Structures, Summaries, and Visualisations for Missing Data

Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. 'naniar' provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of 'ggplot2' and tidy data. The work is fully discussed at Tierney & Cook (2023) <doi:10.18637/jss.v105.i07>.

Maintained by Nicholas Tierney. Last updated 5 days ago.

data-visualisation ggplot2 missing-data missingness tidy-data

1.3 match 657 stars 15.63 score 5.1k scripts 9 dependents

glottospace

glottospace:Language Mapping and Geospatial Analysis of Linguistic and Cultural Data

Streamlined workflows for geolinguistic analysis, including: accessing global linguistic and cultural databases, data import, data entry, data cleaning, data exploration, mapping, visualization and export.

Maintained by Rui Dong. Last updated 3 months ago.

3.7 match 23 stars 5.54 score 6 scripts

hope-data-science

tidyfst:Tidy Verbs for Fast Data Manipulation

A toolkit of tidy data manipulation verbs with 'data.table' as the backend. Combining the merits of syntax elegance from 'dplyr' and computing performance from 'data.table', 'tidyfst' intends to provide users with state-of-the-art data manipulation tools with least pain. This package is an extension of 'data.table'. While enjoying a tidy syntax, it also wraps combinations of efficient functions to facilitate frequently-used data operations.

Maintained by Tian-Yuan Huang. Last updated 6 months ago.

2.0 match 100 stars 10.06 score 118 scripts 4 dependents

holgerschw

scrime:Analysis of High-Dimensional Categorical Data Such as SNP Data

Tools for the analysis of high-dimensional data developed/implemented at the group "Statistical Complexity Reduction In Molecular Epidemiology" (SCRIME). Main focus is on SNP data. But most of the functions can also be applied to other types of categorical data.

Maintained by Holger Schwender. Last updated 6 years ago.

3.9 match 5.10 score 53 scripts 35 dependents

djvanderlaan

datapackage:Creating and Reading Data Packages

Open, read data from and modify Data Packages. Data Packages are an open standard for bundling and describing data sets (<https://datapackage.org>). When data is read from a Data Package, care is taken to convert the data as much as possible to appropriate R data types. The package can be extended with plugins for additional data types.

Maintained by Jan van der Laan. Last updated 8 days ago.

datapackage frictionless

3.5 match 2 stars 5.62 score

john-d-fox

Rcmdr:R Commander

A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.

Maintained by John Fox. Last updated 5 months ago.

2.0 match 4 stars 9.49 score 636 scripts 38 dependents

svmiller

stevemisc:Steve's Miscellaneous Functions

These are miscellaneous functions that I find useful for my research and teaching. The contents include themes for plots, functions for simulating quantities of interest from regression models, functions for simulating various forms of fake data for instructional/research purposes, and many more. All told, the functions provided here are broadly useful for data organization, data presentation, data recoding, and data simulation.

Maintained by Steve Miller. Last updated 7 days ago.

dplyr mixed-effects-models multivariate-normal-distribution tidyverse

2.8 match 10 stars 6.85 score 392 scripts 2 dependents

grunwaldlab

poppr:Genetic Analysis of Populations with Mixed Reproduction

Population genetic analyses for hierarchical analysis of partially clonal populations built upon the architecture of the 'adegenet' package. Originally described in Kamvar, Tabima, and Grünwald (2014) <doi:10.7717/peerj.281> with version 2.0 described in Kamvar, Brooks, and Grünwald (2015) <doi:10.3389/fgene.2015.00208>.

Maintained by Zhian N. Kamvar. Last updated 10 months ago.

clonality genetic-analysis genetic-distances minimum-spanning-networks multilocus-genotypes multilocus-lineages population-genetics populations openmp

1.7 match 69 stars 10.84 score 672 scripts

debruine

faux:Simulation for Factorial Designs

Create datasets with factorial structure through simulation by specifying variable parameters. Extended documentation at <https://debruine.github.io/faux/>. Described in DeBruine (2020) <doi:10.5281/zenodo.2669586>.

Maintained by Lisa DeBruine. Last updated 2 months ago.

data simulation

2.0 match 98 stars 9.14 score 716 scripts 1 dependents

bupaverse

bupaR:Business Process Analysis in R

Comprehensive Business Process Analysis toolkit. Creates an S3 class for event log objects, and related handler functions. Imports related packages for filtering event data, computation of descriptive statistics, handling of 'Petri Net' objects and visualization of process maps. See also packages 'edeaR', 'processmapR', 'eventdataR' and 'processmonitR'.

Maintained by Gert Janssenswillen. Last updated 2 years ago.

2.0 match 55 stars 9.07 score 389 scripts 11 dependents

dusadrian

DDIwR:DDI with R

Useful functions for various DDI (Data Documentation Initiative) related inputs and outputs. Converts data files to and from DDI, SPSS, Stata, SAS, R and Excel, including user declared missing values.

Maintained by Adrian Dusa. Last updated 3 months ago.

3.7 match 15 stars 4.92 score 10 scripts

ifellows

Deducer:A Data Analysis GUI for R

An intuitive, cross-platform graphical data analysis system. It uses menus and dialogs to guide the user efficiently through the data manipulation and analysis process, and has an Excel-like spreadsheet for easy data frame visualization and editing. Deducer works best when used with the Java-based R GUI JGR, but the dialogs can be called from the command line. Dialogs have also been integrated into the Windows Rgui.

Maintained by Ian Fellows. Last updated 9 years ago.

openjdk

5.3 match 3.44 score 91 scripts 1 dependents

sbgraves237

Ecfun:Functions for 'Ecdat'

Functions and vignettes to update data sets in 'Ecdat' and to create, manipulate, plot, and analyze those and similar data sets.

Maintained by Spencer Graves. Last updated 4 months ago.

2.3 match 7.94 score 85 scripts 4 dependents

rqtl

qtl2:Quantitative Trait Locus Mapping in Experimental Crosses

Provides a set of tools to perform quantitative trait locus (QTL) analysis in experimental crosses. It is a reimplementation of the 'R/qtl' package to better handle high-dimensional data and complex cross designs. Broman et al. (2019) <doi:10.1534/genetics.118.301595>.

Maintained by Karl W Broman. Last updated 10 days ago.

cpp

1.9 match 34 stars 9.48 score 1.1k scripts 5 dependents

timteafan

dplyover:Create columns by applying functions to vectors and/or columns in 'dplyr'

Extension of the functionality of 'dplyr' that builds a family of functions around dplyr::across().

Maintained by Tim Tiefenbach. Last updated 3 years ago.

dplyr

3.6 match 61 stars 4.79 score 9 scripts

bioc

annotatr:Annotation of Genomic Regions to Genomic Annotations

Given a set of genomic sites/regions (e.g. ChIP-seq peaks, CpGs, differentially methylated CpGs or regions, SNPs, etc.) it is often of interest to investigate the intersecting genomic annotations. Such annotations include those relating to gene models (promoters, 5'UTRs, exons, introns, and 3'UTRs), CpGs (CpG islands, CpG shores, CpG shelves), or regulatory sequences such as enhancers. The annotatr package provides an easy way to summarize and visualize the intersection of genomic sites/regions with genomic annotations.

Maintained by Raymond G. Cavalcante. Last updated 5 months ago.

software annotation genomeannotation functionalgenomics visualization genome-annotation

1.8 match 26 stars 9.76 score 246 scripts 5 dependents

avdrark

mokken:Conducts Mokken Scale Analysis

Contains functions for performing Mokken scale analysis on test and questionnaire data. It includes an automated item selection algorithm, and various checks of model assumptions.

Maintained by L. Andries van der Ark. Last updated 9 months ago.

cpp

4.9 match 2 stars 3.45 score 68 scripts

nudacc

psHarmonize:Creates a Harmonized Dataset Based on a Set of Instructions

Functions which facilitate harmonization of data from multiple different datasets. Data harmonization involves taking data sources with differing values, creating coding instructions to create a harmonized set of values, then making those data modifications. 'psHarmonize' will assist with data modification once the harmonization instructions are written. Coding instructions are written by the user to create a "harmonization sheet". This sheet catalogs variable names, domains (e.g. clinical, behavioral, outcomes), provides R code instructions for mapping or conversion of data, specifies the variable name in the harmonized data set, and tracks notes. The package will then harmonize the source datasets according to the harmonization sheet to create a harmonized dataset. Once harmonization is finished, the package also has functions that will create descriptive statistics using 'RMarkdown'. Data Harmonization guidelines have been described by Fortier I, Raina P, Van den Heuvel ER, et al. (2017) <doi:10.1093/ije/dyw075>. Additional details of our R package have been described by Stephen JJ, Carolan P, Krefman AE, et al. (2024) <doi:10.1016/j.patter.2024.101003>.

Maintained by John Stephen. Last updated 2 months ago.

3.3 match 2 stars 5.15 score 10 scripts

cran

blockmodeling:Generalized and Classical Blockmodeling of Valued Networks

This is primarily meant as an implementation of generalized blockmodeling for valued networks. In addition, measures of similarity or dissimilarity based on structural equivalence and regular equivalence (REGE algorithms) can be computed and partitioned matrices can be plotted: Žiberna (2007) <doi:10.1016/j.socnet.2006.04.002>, Žiberna (2008) <doi:10.1080/00222500701790207>, Žiberna (2014) <doi:10.1016/j.socnet.2014.04.002>.

Maintained by Aleš Žiberna. Last updated 2 years ago.

fortran

6.0 match 2.78 score 12 dependents

ropenspain

infoelectoral:Download Spanish Election Results

Download official election results for Spain at polling station, municipality and province level from the Ministry of Interior (<https://infoelectoral.interior.gob.es/es/elecciones-celebradas/area-de-descargas/>), format them and import them to the R environment.

Maintained by Héctor Meleiro. Last updated 7 months ago.

data elecciones elections electoral infoelectoral spain

4.0 match 31 stars 4.14 score 9 scripts

swissstatsr

noga:Recode according to the General Classification of Economic Activities 2008

This package recodes numeric NOGA values to their value labels and vice versa. It can recode values from all five NOGA levels (Section, Division, Group, Class and Type), and can recode values to value labels in four languages (English, German, French and Italian).

Maintained by Johannes Besch. Last updated 8 months ago.

5.4 match 3.02 score 4 scripts

markustjansen

ThurMod:Thurstonian CFA and Thurstonian IRT Modeling

Fit Thurstonian forced-choice models (CFA (simple and factor) and IRT) in R. This package allows for the analysis of item response modeling (IRT) as well as confirmatory factor analysis (CFA) in the Thurstonian framework. Currently, estimation can be performed by 'Mplus' and 'lavaan'. References: Brown & Maydeu-Olivares (2011) <doi:10.1177/0013164410375112>; Jansen, M. T., & Schulze, R. (in review). The Thurstonian linked block design: Improving Thurstonian modeling for paired comparison and ranking data.; Maydeu-Olivares & Böckenholt (2005) <doi:10.1037/1082-989X.10.3.285>.

Maintained by Markus Thomas Jansen. Last updated 1 year ago.

5.3 match 3.00 score 2 scripts

florianjansen

vegdata:Access Vegetation Databases and Treat Taxonomy

Handling of vegetation data from different sources (Turboveg 2.0 <https://www.synbiosys.alterra.nl/turboveg/>, the German national repository <https://www.vegetweb.de>, and others). Taxonomic harmonization (given appropriate taxonomic lists, e.g. the German taxonomic standard list "GermanSL", <https://germansl.infinitenature.org>).

Maintained by Florian Jansen. Last updated 1 year ago.

4.0 match 2 stars 3.84 score 38 scripts 3 dependents

oxfordihtm

codeditr:Implementing Cause-of-Death Data Checks Based on the WHO CoDEdit Tool

The World Health Organization's CoDEdit electronic tool is intended to help producers of cause-of-death statistics in strengthening their capacity to perform routine checks on their data. This package ports the original tool built using Microsoft Access into R so as to leverage the utility and function of the original tool into a usable application program interface that can be used for building more universal tools or for creating programmatic scientific workflows aimed at routine, automated, and large-scale monitoring of cause-of-death data.

Maintained by Ernest Guevarra. Last updated 4 months ago.

cod icd

3.3 match 3 stars 4.65 score 6 scripts

ouhscbbmc

REDCapR:Interaction Between R and REDCap

Encapsulates functions to streamline calls from R to the REDCap API. REDCap (Research Electronic Data CAPture) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The Application Programming Interface (API) offers an avenue to access and modify data programmatically, improving the capacity for literate and reproducible programming.

Maintained by Will Beasley. Last updated 2 months ago.

redcap redcap-api

1.3 match 118 stars 12.36 score 438 scripts 6 dependents

jmbarbone

mark:Miscellaneous, Analytic R Kernels

Miscellaneous functions and wrappers for development in other packages, created and maintained by Jordan Mark Barbone.

Maintained by Jordan Mark Barbone. Last updated 1 month ago.

3.0 match 6 stars 4.95 score 9 scripts

isglobal-brge

SNPassoc:SNPs-Based Whole Genome Association Studies

Functions to perform most of the common analyses in genome association studies are implemented. These analyses include descriptive statistics and exploratory analysis of missing values, calculation of Hardy-Weinberg equilibrium, analysis of association based on generalized linear models (either for quantitative or binary traits), and analysis of multiple SNPs (haplotype and epistasis analysis). Permutation test and related tests (sum statistic and truncated product) are also implemented. Exact distributions of the max-statistic and the genetic risk-allele score can also be estimated. The methods are described in Gonzalez JR et al., 2007 <doi:10.1093/bioinformatics/btm025>.

Maintained by Dolors Pelegri. Last updated 5 months ago.

1.6 match 16 stars 9.14 score 89 scripts 6 dependents

traminer

TraMineR:Trajectory Miner: a Sequence Analysis Toolkit

Set of sequence analysis tools for manipulating, describing and rendering categorical sequences, and more generally mining sequence data in the field of social sciences. Although this sequence analysis package is primarily intended for state or event sequences that describe time use or life courses such as family formation histories or professional careers, its features also apply to many other kinds of categorical sequence data. It accepts many different sequence representations as input and provides tools for converting sequences from one format to another. It offers several functions for describing and rendering sequences, for computing distances between sequences with different metrics (among which optimal matching), original dissimilarity-based analysis tools, and functions for extracting the most frequent event subsequences and identifying the most discriminating ones among them. A user's guide can be found on the TraMineR web page.

Maintained by Gilbert Ritschard. Last updated 3 months ago.

cpp

1.8 match 11 stars 8.24 score 534 scripts 13 dependents

patriciamar

ShinyItemAnalysis:Test and Item Analysis via Shiny

Package including functions and interactive shiny application for the psychometric analysis of educational tests, psychological assessments, health-related and other types of multi-item measurements, or ratings from multiple raters.

Maintained by Patricia Martinkova. Last updated 1 months ago.

assessment differential-item-functioning item-analysis item-response-theory psychometrics shiny

1.9 match 44 stars 7.88 score 105 scripts 3 dependents

declaredesign

fabricatr:Imagine Your Data Before You Collect It

Helps you imagine your data before you collect it. Hierarchical data structures and correlated data can be easily simulated, either from random number generators or by resampling from existing data sources. This package is faster with 'data.table' and 'mvnfast' installed.

Maintained by Graeme Blair. Last updated 1 months ago.

1.8 match 93 stars 8.29 score 234 scripts 5 dependents

jmping

weights:Weighting and Weighted Statistics

Provides a variety of functions for producing simple weighted statistics, such as weighted Pearson's correlations, partial correlations, Chi-Squared statistics, histograms, and t-tests. Also now includes some software for quickly recoding survey data and plotting estimates from interaction terms in regressions (and multiply imputed regressions) both with and without weights. NOTE: Weighted partial correlation calculations pulled to address a bug.

Maintained by Josh Pasek. Last updated 4 years ago.

2.3 match 6.20 score 590 scripts 41 dependents

pauljohn32

kutils:Project Management Tools

Tools for data importation, recoding, and inspection. There are functions to create new project folders, R code templates, create uniquely named output directories, and to quickly obtain a visual summary for each variable in a data frame. The main feature here is the systematic implementation of the "variable key" framework for data importation and recoding. We are eager to have community feedback about the variable key and the vignette about it. In version 1.7, the function 'semTable' is removed. It had been deprecated since 1.67 and is now provided in a separate package, 'semTable'.

Maintained by Paul Johnson. Last updated 2 years ago.

2.4 match 5.85 score 110 scripts 20 dependents

bioc

lfa:Logistic Factor Analysis for Categorical Data

Logistic Factor Analysis is a method for a PCA analogue on Binomial data via estimation of latent structure in the natural parameter. The main method estimates genetic population structure from genotype data. There are also methods for estimating individual-specific allele frequencies using the population structure. Lastly, a structured Hardy-Weinberg equilibrium (HWE) test is developed, which quantifies the goodness of fit of the genotype data to the estimated population structure, via the estimated individual-specific allele frequencies (all of which generalizes traditional HWE tests).

Maintained by Alejandro Ochoa. Last updated 5 months ago.

snp dimensionreduction principalcomponent regression openblas

2.0 match 16 stars 7.04 score 57 scripts 1 dependents

brentkaplan

beezdemand:Behavioral Economic Easy Demand

Facilitates many of the analyses performed in studies of behavioral economic demand. The package supports commonly-used options for modeling operant demand including (1) data screening proposed by Stein, Koffarnus, Snider, Quisenberry, & Bickel (2015; <doi:10.1037/pha0000020>), (2) fitting models of demand such as linear (Hursh, Raslear, Bauman, & Black, 1989, <doi:10.1007/978-94-009-2470-3_22>), exponential (Hursh & Silberberg, 2008, <doi:10.1037/0033-295X.115.1.186>) and modified exponential (Koffarnus, Franck, Stein, & Bickel, 2015, <doi:10.1037/pha0000045>), and (3) calculating numerous measures relevant to applied behavioral economists (Intensity, Pmax, Omax). Also supports plotting and comparing data.

Maintained by Brent Kaplan. Last updated 7 months ago.

2.3 match 15 stars 6.12 score 29 scripts 1 dependents

pauljohn32

rockchalk:Regression Estimation and Presentation

A collection of functions for interpretation and presentation of regression analysis. These functions are used to produce the statistics lectures in <https://pj.freefaculty.org/guides/>. Includes regression diagnostics, regression tables, and plots of interactions and "moderator" variables. The emphasis is on "mean-centered" and "residual-centered" predictors. The vignette 'rockchalk' offers a fairly comprehensive overview. The vignette 'Rstyle' has advice about coding in R. The package title 'rockchalk' refers to our school motto, 'Rock Chalk Jayhawk, Go K.U.'.

Maintained by Paul E. Johnson. Last updated 3 years ago.

1.9 match 7.13 score 584 scripts 18 dependents

clewerenz

ilabelled:Simple Handling of Labelled Data

Simple handling of survey data. Smart handling of meta-information such as variable labels, value labels and scale levels. Easy access and validation of meta-information. Usage of value labels and values respectively for subsetting and recoding data.

Maintained by Christof Lewerenz. Last updated 2 months ago.

2.2 match 2 stars 6.02 score 13 scripts

cran

rosetta:Parallel Use of Statistical Packages in Teaching

When teaching statistics, it can often be desirable to uncouple the content from specific software packages. To ease such efforts, the Rosetta Stats website (<https://rosettastats.com>) allows comparing analyses in different packages. This package is the companion to the Rosetta Stats website, aiming to provide functions that produce output that is similar to output from other statistical packages, thereby facilitating 'software-agnostic' teaching of statistics.

Maintained by Gjalt-Jorn Peters. Last updated 2 years ago.

4.9 match 2.70 score

kasperwelbers

corpustools:Managing, Querying and Analyzing Tokenized Text

Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., part-of-speech tags, dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, exporting to DTM for compatibility with many text analysis packages, and the possibility to reconstruct original text from tokens to facilitate interpretation.

Maintained by Kasper Welbers. Last updated 6 months ago.

cpp

1.8 match 31 stars 7.50 score 174 scripts 1 dependents

reed-evic

cpsvote:A Toolbox for Using the CPS’s Voting and Registration Supplement

Provides automated methods for downloading, recoding, and merging selected years of the Current Population Survey's Voting and Registration Supplement, a large N national survey about registration, voting, and non-voting in United States federal elections. Provides documentation for appropriate use of sample weights to generate statistical estimates, drawing from Hur & Achen (2013) <doi:10.1093/poq/nft042> and McDonald (2018) <http://www.electproject.org/home/voter-turnout/voter-turnout-data>.

Maintained by Jay Lee. Last updated 2 years ago.

2.3 match 3 stars 5.58 score 21 scripts

midfieldr

midfieldr:Tools and Methods for Working with MIDFIELD Data in 'R'

Provides tools and demonstrates methods for working with individual undergraduate student-level records (registrar's data) in 'R'. Tools include filters for program codes, data sufficiency, and timely completion. Methods include gathering blocs of records, computing quantitative metrics such as graduation rate, and creating charts to visualize comparisons. 'midfieldr' interacts with practice data provided in 'midfielddata', an R data package available at <https://midfieldr.github.io/midfielddata/>. 'midfieldr' also interacts with the full MIDFIELD database for users who have access. This work is supported by the US National Science Foundation through grant numbers 1545667 and 2142087.

Maintained by Richard Layton. Last updated 2 months ago.

2.3 match 2 stars 5.56 score 26 scripts

r-computing-lab

BGmisc:An R Package for Extended Behavior Genetics Analysis

Provides functions for behavior genetics analysis, including variance component model identification [Hunter et al. (2021) <doi:10.1007/s10519-021-10055-x>], calculation of relatedness coefficients using path-tracing methods [Wright (1922) <doi:10.1086/279872>; McArdle & McDonald (1984) <doi:10.1111/j.2044-8317.1984.tb00802.x>], inference of relatedness, pedigree conversion, and simulation of multi-generational family data [Lyu et al. (2024) <doi:10.1101/2024.12.19.629449>]. For a full overview, see Garrison et al. (2024) <doi:10.21105/joss.06203>.

Maintained by S. Mason Garrison. Last updated 26 days ago.

behavior-genetics

1.8 match 1 stars 6.83 score 35 scripts

ropensci

essurvey:Download Data from the European Social Survey on the Fly

Download data from the European Social Survey directly from their website <http://www.europeansocialsurvey.org/>. There are two families of functions that allow you to download and interactively check all countries and rounds available.

Maintained by Jorge Cimentada. Last updated 3 years ago.

ess

1.8 match 48 stars 6.88 score 79 scripts

carriedaymont

growthcleanr:Data Cleaner for Anthropometric Measurements

Identifies implausible anthropometric (e.g., height, weight) measurements in irregularly spaced longitudinal datasets, such as those from electronic health records.

Maintained by Carrie Daymont. Last updated 18 days ago.

ehr ehr-data

1.8 match 14 stars 6.68 score 41 scripts 1 dependents

dieghernan

igoR:Intergovernmental Organizations Database

Tools to extract information from the Intergovernmental Organizations ('IGO') Database, version 3, provided by the Correlates of War Project <https://correlatesofwar.org/>. See also Pevehouse, J. C. et al. (2020). Version 3 includes information from 1815 to 2014.

Maintained by Diego Hernangómez. Last updated 7 days ago.

igo correlates-of-war intergovernmental-organisations intergovernmental-organizations

1.9 match 6 stars 6.36 score 12 scripts

bioc

bioCancer:Interactive Multi-Omics Cancers Data Visualization and Analysis

This package is a Shiny App to visualize and analyse interactively Multi-Assays of Cancer Genomic Data.

Maintained by Karim Mezhoud. Last updated 5 months ago.

gui datarepresentation network multiplecomparison pathways reactome visualization geneexpression genetarget analysis biocancer-interface cancer cancer-studies rmarkdown

2.0 match 20 stars 5.95 score 7 scripts

nanxstats

oneclust:Maximum Homogeneity Clustering for Univariate Data

Maximum homogeneity clustering algorithm for one-dimensional data described in W. D. Fisher (1958) <doi:10.1080/01621459.1958.10501479> via dynamic programming.

Maintained by Nan Xiao. Last updated 1 years ago.

clustering-algorithm feature-engineering homogeneity peak-calling univariate-data cpp

2.7 match 5 stars 4.40 score

jhchou

peditools:Pediatric Clinical Data Science Tools

A collection of tools for newborn and pediatric anthropometric calculations and data abstraction from Vermont Oxford Network registry exports. Includes charts based on Lambda, Mu, Sigma (LMS) parameters, including: Fenton 2003, Olsen 2010, Olsen BMI, CDC infant, CDC pediatric, CDC BMI, CDC (Addo) skin, WHO infant, WHO skin, Abdel-Rahman 2017, Mramba 2017, Zemel Down Syndrome, Brooks cerebral palsy, WHO expanded, Cappa 2024 (except BMI). Includes functions to take a Vermont Oxford Network XML or CSV data file export read into a data frame, converting the coded variables into human readable factors.

Maintained by Joseph Chou. Last updated 2 months ago.

peditools

3.9 match 5 stars 3.00 score 2 scripts

mark-andrews

psyntur:Helper Tools for Teaching Statistical Data Analysis

Provides functions and data-sets that are helpful for teaching statistics and data analysis. It was originally designed for use when teaching students in the Psychology Department at Nottingham Trent University.

Maintained by Mark Andrews. Last updated 4 months ago.

1.8 match 5 stars 6.41 score 50 scripts

joon-e

tidycomm:Data Modification and Analysis for Communication Research

Provides convenience functions for common data modification and analysis tasks in communication research. This includes functions for univariate and bivariate data analysis, index generation and reliability computation, and intercoder reliability tests. All functions follow the style and syntax of the tidyverse, and are construed to perform their computations on multiple variables at once. Functions for univariate and bivariate data analysis comprise summary statistics for continuous and categorical variables, as well as several tests of bivariate association including effect sizes. Functions for data modification comprise index generation and automated reliability analysis of index variables. Functions for intercoder reliability comprise tests of several intercoder reliability estimates, including simple and mean pairwise percent agreement, Krippendorff's Alpha (Krippendorff 2004, ISBN: 9780761915454), and various Kappa coefficients (Brennan & Prediger 1981 <doi: 10.1177/001316448104100307>; Cohen 1960 <doi: 10.1177/001316446002000104>; Fleiss 1971 <doi: 10.1037/h0031619>).

Maintained by Julian Unkel. Last updated 11 months ago.

1.8 match 15 stars 6.59 score 52 scripts

tjebo

eye:Analysis of Eye Data

There is no ophthalmic researcher who has not had headaches from the handling of visual acuity entries. Different notations, untidy entries. This shall now be a matter of the past. Eye makes it as easy as pie to work with VA data: cleaning and conversion between Snellen, logMAR, ETDRS letters, and qualitative visual acuity shall never pester you again. The eye package automates the pesky task of counting the number of patients and eyes, and can help to clean data with easy re-coding for right and left eyes. It also contains functions to help reshape eye side specific variables between wide and long format. Visual acuity conversion is based on Schulze-Bonsel et al. (2006) <doi:10.1167/iovs.05-0981>, Gregori et al. (2010) <doi:10.1097/iae.0b013e3181d87e04>, Beck et al. (2003) <doi:10.1016/s0002-9394(02)01825-1> and Bach (2007) <http://michaelbach.de/sci/acuity.html>.

Maintained by Tjebo Heeren. Last updated 3 years ago.

2.3 match 6 stars 4.92 score 14 scripts

ropengov

retroharmonize:Ex Post Survey Data Harmonization

Assist in reproducible retrospective (ex-post) harmonization of data, particularly individual level survey data, by providing tools for organizing metadata, standardizing the coding of variables, and variable names and value labels, including missing values, and documenting the data transformations, with the help of comprehensive s3 classes.

Maintained by Daniel Antal. Last updated 2 months ago.

ropengov

1.3 match 10 stars 7.62 score 59 scripts

tscnlab

LightLogR:Process Data from Wearable Light Loggers and Optical Radiation Dosimeters

Import, processing, validation, and visualization of personal light exposure measurement data from wearable devices. The package implements features such as the import of data and metadata files, conversion of common file formats, validation of light logging data, verification of crucial metadata, calculation of common parameters, and semi-automated analysis and visualization.

Maintained by Johannes Zauner. Last updated 26 days ago.

dosimetry light time-series-analysis wearable-devices wearable-sensors

1.7 match 12 stars 5.91 score 28 scripts

ugroempi

DoE.base:Full Factorials, Orthogonal Arrays and Base Utilities for DoE Packages

Creates full factorial experimental designs and designs based on orthogonal arrays for (industrial) experiments. Provides diverse quality criteria. Provides utility functions for the class design, which is also used by other packages for designed experiments.

Maintained by Ulrike Groemping. Last updated 1 years ago.

2.3 match 1 stars 4.50 score 104 scripts 19 dependents

statistikat

STATcubeR:R Interface for the 'STATcube' REST API and Open Government Data

Import data from the 'STATcube' REST API or from the open data portal of Statistics Austria. This package includes a client for API requests as well as parsing utilities for data which originates from 'STATcube'. Documentation about 'STATcubeR' is provided by several vignettes included in the package as well as on the public 'pkgdown' page at <https://statistikat.github.io/STATcubeR/>.

Maintained by Bernhard Meindl. Last updated 4 months ago.

api database ogd open-data sdmx

2.0 match 18 stars 5.03 score 9 scripts

rhartmano

labelr:Label Data Frames, Variables, and Values

Create and use data frame labels for data frame objects (frame labels), their columns (name labels), and individual values of a column (value labels). Value labels include one-to-one and many-to-one labels for nominal and ordinal variables, as well as numerical range-based value labels for continuous variables. Convert value-labeled variables so each value is replaced by its corresponding value label. Add values-converted-to-labels columns to a value-labeled data frame while preserving parent columns. Filter and subset a value-labeled data frame using labels, while returning results in terms of values. Overlay labels in place of values in common R commands to increase interpretability. Generate tables of value frequencies, with categories expressed as raw values or as labels. Access data frames that show value-to-label mappings for easy reference.

Maintained by Robert Hartman. Last updated 7 months ago.

1.8 match 3 stars 5.56 score 10 scripts

ctszkin

Jmisc:Julian Miscellaneous Function

Some handy functions in R.

Maintained by TszKin Julian Chan. Last updated 3 years ago.

5.0 match 1.98 score 95 scripts

ctu-bern

redcaptools:Tools for exporting and working with REDCap data

Tools for exporting and working with REDCap data (e.g. adding labels, formatting dates).

Maintained by Alan G Haynes. Last updated 4 months ago.

api-export database

2.3 match 4 stars 4.38 score 9 scripts

green-striped-gecko

dartR.data:Auxiliary Data Package for Our Main Package 'dartR'

Data package for 'dartR'. Provides data sets to run examples in 'dartR'. This was necessary due to the size limit imposed by 'CRAN'. The data in 'dartR.data' is needed to run the examples provided in the 'dartR' functions. All available data sets are either based on actual data (but reduced in size) and/or simulated data sets to allow the fast execution of examples and demonstration of the functions.

Maintained by Bernd Gruber. Last updated 10 months ago.

1.9 match 5.20 score 4 scripts 7 dependents

tidyverse

forcats:Tools for Working with Categorical Variables (Factors)

Helpers for reordering factor levels (including moving specified levels to front, ordering by first appearance, reversing, and randomly shuffling), and tools for modifying factor levels (including collapsing rare levels into other, 'anonymising', and manually 'recoding').

Maintained by Hadley Wickham. Last updated 1 years ago.

factor tidyverse

0.5 match 555 stars 18.77 score 21k scripts 1.2k dependents

weirichs

eatTools:Miscellaneous Functions for the Analysis of Educational Assessments

Miscellaneous functions for data cleaning and data analysis of educational assessments. Includes functions for descriptive analyses, character vector manipulations and weighted statistics. Mainly a lightweight dependency for the packages 'eatRep', 'eatGADS', 'eatPrep' and 'eatModel' (which will be subsequently submitted to 'CRAN'). The function for defining (weighted) contrasts in weighted effect coding refers to te Grotenhuis et al. (2017) <doi:10.1007/s00038-016-0901-1>. Functions for weighted statistics refer to Wolter (2007) <doi:10.1007/978-0-387-35099-8>.

Maintained by Sebastian Weirich. Last updated 3 months ago.

1.8 match 2 stars 5.38 score 11 scripts 2 dependents

tntp

tntpr:Data Analysis Tools Customized for TNTP

An assortment of functions and templates customized to meet the needs of data analysts at the non-profit organization TNTP. Includes functions for branded colors and plots, credentials management, repository set-up, and other common analytic tasks.

Maintained by Dustin Pashouwer. Last updated 4 months ago.

1.7 match 7 stars 5.83 score 13 scripts

devpsylab

petersenlab:A Collection of R Functions by the Petersen Lab

A collection of R functions that are widely used by the Petersen Lab. Included are functions for various purposes, including evaluating the accuracy of judgments and predictions, performing scoring of assessments, generating correlation matrices, conversion of data between various types, data management, psychometric evaluation, extensions related to latent variable modeling, various plotting capabilities, and other miscellaneous useful functions. By making the package available, we hope to make our methods reproducible and replicable by others and to help others perform their data processing and analysis methods more easily and efficiently. The codebase is provided in Petersen (2025) <doi:10.5281/zenodo.7602890> and on 'CRAN': <doi: 10.32614/CRAN.package.petersenlab>. The package is described in "Principles of Psychological Assessment: With Applied Examples in R" (Petersen, 2024, 2025) <doi:10.1201/9781003357421>, <doi:10.25820/work.007199>, <doi:10.5281/zenodo.6466589>.

Maintained by Isaac T. Petersen. Last updated 27 days ago.

data-analysis data-analysis-in-r data-management psychometrics

2.3 match 1 stars 4.15 score 1 scripts

smouksassi

ggquickeda:Quickly Explore Your Data Using 'ggplot2' and 'table1' Summary Tables

Quickly and easily perform exploratory data analysis by uploading your data as a 'csv' file. Start generating insights using 'ggplot2' plots and 'table1' tables with descriptive stats, all using an easy-to-use point and click 'Shiny' interface.

Maintained by Samer Mouksassi. Last updated 2 days ago.

1.1 match 73 stars 8.34 score 27 scripts

pharmaverse

admiralophtha:ADaM in R Asset Library - Ophthalmology

Aids the programming of Clinical Data Standards Interchange Consortium (CDISC) compliant Ophthalmology Analysis Data Model (ADaM) datasets in R. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide" (CDISC Analysis Data Model Team, 2021, <https://www.cdisc.org/standards/foundational/adam/adamig-v1-3-release-package>).

Maintained by Edoardo Mancini. Last updated 2 months ago.

1.2 match 15 stars 7.94 score 10 scripts

sgezan

ASRgenomics:Complementary Genomic Functions

Presents a series of molecular and genetic routines in the R environment with the aim of assisting in analytical pipelines before and after the use of 'asreml' or another library to perform analyses such as Genomic Selection or Genome-Wide Association Analyses. Methods and examples are described in Gezan, Oliveira, Galli, and Murray (2022) <https://asreml.kb.vsni.co.uk/wp-content/uploads/sites/3/ASRgenomics_Manual.pdf>.

Maintained by Salvador Gezan. Last updated 1 years ago.

4.0 match 1 stars 2.28 score 38 scripts

harrelfe

Hmisc:Harrell Miscellaneous

Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, simulation, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, recoding variables, caching, simplified parallel computing, encrypting and decrypting data using a safe workflow, general moving window statistical estimation, and assistance in interpreting principal component analysis.

Maintained by Frank E Harrell Jr. Last updated 2 days ago.

fortran

0.5 match 210 stars 17.61 score 17k scripts 750 dependents

ashipunov

shipunov:Miscellaneous Functions from Alexey Shipunov

A collection of functions for data manipulation, plotting and statistical computing, to use separately or with the book "Visual Statistics. Use R!": Shipunov (2020) <http://ashipunov.info/shipunov/software/r/r-en.htm>. Dr Alexey Shipunov died in December 2022. Most useful functions: Bclust(), Jclust() and BootA() which bootstrap hierarchical clustering; Recode() which does multiple recoding in a fast, simple and flexible way; Misclass() which outputs a confusion matrix even if classes are not concerted; Overlap() which measures group separation on any projection; Biarrows() which converts any scatterplot into a biplot; and Pleiad() which is a fast and flexible correlogram.

Maintained by ORPHANED. Last updated 2 years ago.

8.8 match 1.00 score 9 scripts

kjhealy

gssr:US General Social Survey (GSS) Data for R

The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains the GSS Cumulative Data and GSS Panel Data files packaged for R. Its companion package, gssrdoc, provides the codebook integrated into R's help system. For more information on the GSS see <http://gss.norc.org>.

Maintained by Kieran Healy. Last updated 4 months ago.

1.3 match 45 stars 6.42 score 147 scripts

nicolas-robette

seqhandbook:Miscellaneous Tools for Sequence Analysis

It provides miscellaneous sequence analysis functions for describing episodes in individual sequences, measuring association between domains in multidimensional sequence analysis (see Piccarreta (2017) <doi:10.1177/0049124115591013>), heat maps of sequence data, Globally Interdependent Multidimensional Sequence Analysis (see Robette et al (2015) <doi:10.1177/0081175015570976>), smoothing sequences for index plots (see Piccarreta (2012) <doi:10.1177/0049124112452394>), coding sequences for Qualitative Harmonic Analysis (see Deville (1982)), measuring stress from multidimensional scaling factors (see Piccarreta and Lior (2010) <doi:10.1111/j.1467-985X.2009.00606.x>), symmetrical (or canonical) Partial Least Squares (see Bry (1996)).

Maintained by Nicolas Robette. Last updated 2 years ago.

1.8 match 6 stars 4.76 score 19 scripts

cran

RALSA:R Analyzer for Large-Scale Assessments

Download, prepare and analyze data from large-scale assessments and surveys with complex sampling and assessment design (see 'Rutkowski', 2010 <doi:10.3102/0013189X10363170>). Such studies are, for example, international assessments like 'TIMSS', 'PIRLS' and 'PISA'. A graphical interface is available for the non-technical user. The package includes functions to convert the original data from 'SPSS' into 'R' data sets keeping the user-defined missing values, merge data from different respondents and/or countries, generate variable dictionaries, modify data, produce descriptive statistics (percentages, means, percentiles, benchmarks) and multivariate statistics (correlations, linear regression, binary logistic regression). The number of supported studies and analysis types will increase in the future. For a general presentation of the package, see 'Mirazchiyski', 2021a (<doi:10.1186/s40536-021-00114-4>). For detailed technical aspects of the package, see 'Mirazchiyski', 2021b (<doi:10.3390/psych3020018>).

Maintained by Plamen V. Mirazchiyski. Last updated 22 days ago.

3.7 match 2.30 score

bricenocenti

tabxplor:User-Friendly Tables with Color Helpers for Data Exploration

Makes it easy to deal with multiple cross-tables in data exploration, by creating them, manipulating them, and adding color helpers to highlight important information (differences from totals, comparisons between lines or columns, contributions to variance, confidence intervals, odds ratios, etc.). All functions are pipe-friendly and return data frames which can be easily manipulated. At the same time, time-consuming operations are done with 'data.table' to run faster on big data frames. Tables can be exported with formats and colors to 'Excel', plot and html.

Maintained by Brice Nocenti. Last updated 9 days ago.

1.8 match 1 stars 4.73 score 12 scripts 1 dependents

cran

randomUniformForest:Random Uniform Forests for Classification, Regression and Unsupervised Learning

Ensemble model, for classification, regression and unsupervised learning, based on a forest of unpruned and randomized binary decision trees. Each tree is grown by sampling, with replacement, a set of variables at each node. Each cut-point is generated randomly, according to the continuous Uniform distribution. For each tree, data are either bootstrapped or subsampled. The unsupervised mode introduces clustering, dimension reduction and variable importance, using a three-layer engine. Random Uniform Forests are mainly aimed to lower correlation between trees (or trees residuals), to provide a deep analysis of variable importance and to allow native distributed and incremental learning.

Maintained by Saip Ciss. Last updated 3 years ago.

cpp

2.3 match 3 stars 3.77 score 99 scripts

bioc

chipenrich:Gene Set Enrichment For ChIP-seq Peak Data

ChIP-Enrich and Poly-Enrich perform gene set enrichment testing using peaks called from a ChIP-seq experiment. The method empirically corrects for confounding factors such as the length of genes, and the mappability of the sequence surrounding genes.

Maintained by Kai Wang. Last updated 7 days ago.

immunooncology chipseq epigenetics functionalgenomics genesetenrichment histonemodification regression

1.7 match 4.94 score 29 scripts

ctn-0094

CTNote:CTN Outcomes, Treatments, and Endpoints

The Clinical Trials Network (CTN) of the U.S. National Institute on Drug Abuse sponsored the CTN-0094 research team to harmonize data sets from three nationally-representative clinical trials for opioid use disorder (OUD). The CTN-0094 team herein provides a coded collection of trial outcomes and endpoints used in various OUD clinical trials over the past 50 years. These coded outcome functions are used to contrast and cluster different clinical outcome functions based on daily or weekly patient urine screenings. Note that we abbreviate urine drug screen as "UDS" and urine opioid screen as "UOS". For the example data sets (based on clinical trials data harmonized by the CTN-0094 research team), UDS and UOS are largely interchangeable.

Maintained by Gabriel Odom. Last updated 1 years ago.

1.7 match 1 stars 4.78 score 20 scripts

biometris

statgenGWAS:Genome Wide Association Studies

Fast single trait Genome Wide Association Studies (GWAS) following the method described in Kang et al. (2010), <doi:10.1038/ng.548>. One of a series of statistical genetic packages for streamlining the analysis of typical plant breeding experiments developed by Biometris.

Maintained by Bart-Jan van Rossum. Last updated 4 months ago.

genetics gwas openblas cpp openmp

1.3 match 14 stars 6.14 score 15 scripts 3 dependents

opendataformat

opendataformat:Reading and Writing Open Data Format Files

The Open Data Format (ODF) is a new, non-proprietary, multilingual, metadata enriched, and zip-compressed data format with metadata structured in the Data Documentation Initiative (DDI) Codebook standard. This package allows reading and writing of data files in the Open Data Format (ODF) in R, and displaying metadata in different languages. For further information on the Open Data Format, see <https://opendataformat.github.io/>.

Maintained by Tom Hartl. Last updated 8 days ago.

1.5 match 5.41 score 7 scripts

nutriverse

dietry:Utilities for Calculating Dietary Intake Indicators for Food Security Assessments

Food security assessments utilise several dietary intake indicators as proxy measures for diet quality, diet sufficiency, and food availability either at individual or household level. Utilities for recoding and calculating these indicators support in establishing consistent and reliable results.

Maintained by Ernest Guevarra. Last updated 3 months ago.

2.3 match 2 stars 3.48 score 3 scripts

bpoconnor

EFA.dimensions:Exploratory Factor Analysis Functions for Assessing Dimensionality

Functions for eleven procedures for determining the number of factors, including functions for parallel analysis and the minimum average partial test. There are also functions for conducting principal components analysis, principal axis factor analysis, maximum likelihood factor analysis, image factor analysis, and extension factor analysis, all of which can take raw data or correlation matrices as input and with options for conducting the analyses using Pearson correlations, Kendall correlations, Spearman correlations, gamma correlations, or polychoric correlations. Varimax rotation, promax rotation, and Procrustes rotations can be performed. Additional functions focus on the factorability of a correlation matrix, the congruences between factors from different datasets, the assessment of local independence, the assessment of factor solution complexity, and internal consistency. Auerswald & Moshagen (2019, ISSN:1939-1463); Field, Miles, & Field (2012, ISBN:978-1-4462-0045-2); Mulaik (2010, ISBN:978-1-4200-9981-2); O'Connor (2000, <doi:10.3758/bf03200807>); O'Connor (2001, ISSN:0146-6216).

Maintained by Brian P. OConnor. Last updated 9 months ago.

5.0 match 1 stars 1.57 score 33 scripts

arcenis-r

cepumd:Calculate Consumer Expenditure Survey (CE) Annual Estimates

Provides functions and data files to help CE Public-Use Microdata (PUMD) users calculate annual estimated expenditure means, standard errors, and quantiles according to the methods used by the CE with PUMD. For more information on the CE please visit <https://www.bls.gov/cex>. For further reading on CE estimate calculations please see the CE Calculation section of the U.S. Bureau of Labor Statistics (BLS) Handbook of Methods at <https://www.bls.gov/opub/hom/cex/calculation.htm>. For further information about CE PUMD please visit <https://www.bls.gov/cex/pumd.htm>.

Maintained by Arcenis Rojas. Last updated 11 months ago.

1.8 match 7 stars 4.24 score 6 scripts

vincentporretta

VWPre:Tools for Preprocessing Visual World Data

Gaze data from the Visual World Paradigm requires significant preprocessing prior to plotting and analyzing the data. This package provides functions for preparing visual world eye-tracking data for statistical analysis and plotting. It can prepare data for linear analyses (e.g., ANOVA, Gaussian-family LMER, Gaussian-family GAMM) as well as logistic analyses (e.g., binomial-family LMER and binomial-family GAMM). Additionally, it contains various plotting functions for creating grand average and conditional average plots. See the vignette for samples of the functionality. Currently, the functions in this package are designed for handling data collected with SR Research Eyelink eye trackers using Sample Reports created in SR Research Data Viewer. While we would like to add functionality for data collected with other systems in the future, the current package is considered to be feature-complete; further updates will mainly entail maintenance and the addition of minor functionality.

Maintained by Vincent Porretta. Last updated 4 years ago.

1.7 match 4.28 score 80 scripts 1 dependents

sfcheung

semhelpinghands:Helper Functions for Structural Equation Modeling

An assortment of helper functions for doing structural equation modeling, mainly by 'lavaan' for now. Most of them are time-saving functions for common tasks in doing structural equation modeling and reading the output. This package is not for functions that implement advanced statistical procedures. It is a light-weight package for simple functions that do simple tasks conveniently, with as few dependencies as possible.

Maintained by Shu Fai Cheung. Last updated 5 months ago.

bootstrapping lavaan structural-equation-modeling

1.3 match 5.13 score 27 scripts

rapidsurveys

oldr:An Implementation of Rapid Assessment Method for Older People

An implementation of the Rapid Assessment Method for Older People or RAM-OP <https://www.helpage.org/resource/rapid-assessment-method-for-older-people-ramop-manual/>. It provides various functions that allow the user to design and plan the assessment and analyse the collected data. RAM-OP provides accurate and reliable estimates of the needs of older people.

Maintained by Ernest Guevarra. Last updated 1 month ago.

assessment data-analysis odk ram-op rapid-assessment

1.3 match 2 stars 5.00 score 4 scripts

openvolley

peranavolley:Perana Sports Volleyball Files

Basic functions for reading and working with Perana Sports volleyball scouting files.

Maintained by Ben Raymond. Last updated 10 months ago.

2.3 match 2.95 score 1 scripts 6 dependents

dahhamalsoud

phdcocktail:Enhance the Ease of R Experience as an Emerging Researcher

A toolkit of functions to help: i) effortlessly transform collected data into a publication ready format, ii) generate insightful visualizations from clinical data, iii) report summary statistics in a publication-ready format, iv) efficiently export, save and reload R objects within the framework of R projects.

Maintained by Dahham Alsoud. Last updated 1 year ago.

1.8 match 3.70 score 1 scripts

cvoeten

permutes:Permutation Tests for Time Series Data

Helps you determine the analysis window to use when analyzing densely-sampled time-series data, such as EEG data, using permutation testing (Maris & Oostenveld, 2007) <doi:10.1016/j.jneumeth.2007.03.024>. These permutation tests can help identify the timepoints where significance of an effect begins and ends, and the results can be plotted in various types of heatmap for reporting. Mixed-effects models are supported using an implementation of the approach by Lee & Braun (2012) <doi:10.1111/j.1541-0420.2011.01675.x>.

Maintained by Cesko C. Voeten. Last updated 2 years ago.

1.5 match 4.23 score 16 scripts

cranhaven

rock:Reproducible Open Coding Kit

The Reproducible Open Coding Kit ('ROCK', and this package, 'rock') was developed to facilitate reproducible and open coding, specifically geared towards qualitative research methods. Although it is a general-purpose toolkit, three specific applications have been implemented, specifically an interface to the 'rENA' package that implements Epistemic Network Analysis ('ENA'), means to process notes from Cognitive Interviews ('CIs'), and means to work with decentralized construct taxonomies ('DCTs'). The 'ROCK' and this 'rock' package are described in the ROCK book <https://rockbook.org> and more information, such as tutorials, is available at <https://rock.science>.

Maintained by Gjalt-Jorn Peters. Last updated 9 days ago.

archived packages r-universe

1.9 match 5 stars 3.40 score

cran

misty:Miscellaneous Functions 'T. Yanagida'

Miscellaneous functions for (1) data management (e.g., grand-mean and group-mean centering, coding variables and reverse coding items, scale and cluster scores, reading and writing Excel and SPSS files), (2) descriptive statistics (e.g., frequency table, cross tabulation, effect size measures), (3) missing data (e.g., descriptive statistics for missing data, missing data pattern, Little's test of Missing Completely at Random, and auxiliary variable analysis), (4) multilevel data (e.g., multilevel descriptive statistics, within-group and between-group correlation matrix, multilevel confirmatory factor analysis, level-specific fit indices, cross-level measurement equivalence evaluation, multilevel composite reliability, and multilevel R-squared measures), (5) item analysis (e.g., confirmatory factor analysis, coefficient alpha and omega, between-group and longitudinal measurement equivalence evaluation), (6) statistical analysis (e.g., bootstrap confidence intervals, collinearity and residual diagnostics, dominance analysis, between- and within-subject analysis of variance, latent class analysis, t-test, z-test, sample size determination), and (7) functions to interact with 'Blimp' and 'Mplus'.

Maintained by Takuya Yanagida. Last updated 8 days ago.

2.3 match 1 stars 2.82 score 1 dependents

epicentre-msf

hmatch:Tools for Cleaning and Matching Hierarchically-Structured Data

Tools for matching raw, potentially messy hierarchical data (e.g. province, county, township) against a reference dataset.

Maintained by Patrick Barks. Last updated 1 year ago.

1.8 match 10 stars 3.43 score 27 scripts

profyliu

bsnsing:Bsnsing: A Decision Tree Induction Method Based on Recursive Optimal Boolean Rule Composition

The bsnsing package provides functions for training a decision tree classifier, making predictions and generating latex code for plotting. It solves the two-class and multi-class classification problems under the supervised learning paradigm. While building a decision tree, bsnsing uses a Boolean rule involving multiple variables to split a node. Each split rule is identified by solving an optimization problem. Use the bsnsing function to build a tree, the predict function to make predictions and the show function to plot the tree. The paper is at <arXiv:2205.15263>. Source code and more data sets are at <https://github.com/profyliu/bsnsing>.

Maintained by Yanchao Liu. Last updated 3 years ago.

1.7 match 7 stars 3.54 score 1 scripts

cran

codelist:Working with Code Lists

Functions for working with code lists and vectors with codes. These are an alternative to factors that keeps track of both the codes and labels. Methods allow for transforming between codes and labels. Also supports hierarchical code lists.

Maintained by Jan van der Laan. Last updated 26 days ago.

1.8 match 3.02 score 21 scripts

jacobwharris

speedycode:Automate Code for Adding Labels, Recoding and Renaming Variables, and Converting ASCII Files

Label, recode, rename, and convert datasets and ASCII files more efficiently. 'speedycode' automates the code necessary for labeling variables with the 'labelled' package, recoding and renaming variables with 'dplyr' syntax, and converting ASCII files with the 'readroper' package. Most functions require only the name of the dataset and the code will be automatically written. Some convenience functions useful for converting ASCII files are also included.

Maintained by Jacob Harris. Last updated 3 years ago.

5.3 match 1.00 score

cran

BCRA:Breast Cancer Risk Assessment

Functions provide risk projections of invasive breast cancer based on Gail model according to National Cancer Institute's Breast Cancer Risk Assessment Tool algorithm for specified race/ethnic groups and age intervals. Gail MH, Brinton LA, et al (1989) <doi:10.1093/jnci/81.24.1879>. Marthew PB, Gail MH, et al (2016) <doi:10.1093/jnci/djw215>.

Maintained by Fanni Zhang. Last updated 5 years ago.

4.0 match 1 stars 1.30 score

rapidsurveys

odkr:'Open Data Kit' ('ODK') R API

Utility functions for working with datasets gathered using 'Open Data Kit' ('ODK') <https://opendatakit.org/>. These include an API to interface with 'ODK Briefcase', a 'Java' application for fetching and pushing 'ODK' forms and their contents, that allows pulling of data from either a remote 'ODK Aggregate Server' or a local 'ODK' folder, a rename function to give more human readable variable names for 'ODK' datasets, a merge function to create a single dataframe from a nested 'ODK' dataset and an expand function to disaggregate multiple choice answers that have been collapsed into a single code by 'ODK'.

Maintained by Ernest Guevarra. Last updated 5 months ago.

odk odk-briefcase open-data-kit openjdk

1.6 match 11 stars 3.32 score 19 scripts

matherion

limonaid:Working with 'LimeSurvey' Surveys and Responses

'LimeSurvey' is Free/Libre Open Source Software for the development and administration of online studies, using sophisticated tailoring capabilities to support multiple study designs (see <https://www.limesurvey.org>). This package supports programmatic creation of surveys that can then be imported into 'LimeSurvey', as well as user friendly import of responses from 'LimeSurvey' studies.

Maintained by Gjalt-Jorn Peters. Last updated 2 months ago.

1.7 match 3.00 score 6 scripts

jasonmoy28

psycCleaning:Data Cleaning for Psychological Analyses

Useful for preparing and cleaning data. It includes functions to center data, reverse code, dummy code, and effect code data, and more.

Maintained by Jason Moy. Last updated 11 months ago.

1.9 match 1 stars 2.70 score 1 scripts

cran

exceldata:Streamline Data Import, Cleaning and Recoding from 'Excel'

A small group of functions to read in a data dictionary and the corresponding data table from 'Excel' and to automate the cleaning, re-coding and creation of simple calculated variables. This package was designed to be a companion to the macro-enabled 'Excel' template available on the GitHub site, but works with any similarly-formatted 'Excel' data.

Maintained by Lisa Avery. Last updated 1 year ago.

2.9 match 1.70 score

cran

vimpclust:Variable Importance in Clustering

An implementation of methods related to sparse clustering and variable importance in clustering. The package currently supports sparse k-means clustering with a group penalty, so that it automatically selects groups of numerical features. It also supports sparse clustering and variable selection on mixed data (categorical and numerical features), by preprocessing each categorical feature as a group of numerical features. Several methods for visualizing and exploring the results are also provided. M. Chavent, J. Lacaille, A. Mourer and M. Olteanu (2020) <https://www.esann.org/sites/default/files/proceedings/2020/ES2020-103.pdf>.

Maintained by Madalina Olteanu. Last updated 4 years ago.

2.0 match 2.30 score 4 scripts

cran

QTLRel:Tools for Mapping of Quantitative Traits of Genetically Related Individuals and Calculating Identity Coefficients from Pedigrees

This software provides tools for quantitative trait mapping in populations such as advanced intercross lines where relatedness among individuals should not be ignored. It can estimate background genetic variance components, impute missing genotypes, simulate genotypes, perform a genome scan for putative quantitative trait loci (QTL), and plot mapping results. It also has functions to calculate identity coefficients from pedigrees, especially suitable for pedigrees that consist of a large number of generations, or estimate identity coefficients from genotypic data in certain circumstances.

Maintained by Riyan Cheng. Last updated 2 years ago.

fortran openblas

2.3 match 2.00 score

anotherruisun

wtest:The W-Test for Genetic Interactions Testing

Performs the W-test and diagnostic checking, and calculates minor allele frequency (MAF) and odds ratios.

Maintained by Rui Sun. Last updated 6 years ago.

4.3 match 1.04 score 11 scripts

vonnwalter23

dinamic:A Method to Analyze Recurrent DNA Copy Number Aberrations in Tumors

In tumor tissue, underlying genomic instability can lead to DNA copy number alterations, e.g., copy number gains or losses. Sporadic copy number alterations occur randomly throughout the genome, whereas recurrent alterations are observed in the same genomic region across multiple independent samples, perhaps because they provide a selective growth advantage. This package implements the DiNAMIC procedure for assessing the statistical significance of recurrent DNA copy number aberrations (Bioinformatics (2011) 27(5) 678 - 685).

Maintained by Vonn Walter. Last updated 1 year ago.

2.0 match 2.18 score 8 scripts 1 dependents

phgrosjean

zooimage:Analysis of Numerical Plankton Images

A free (open source) solution for analyzing digital images of plankton. In combination with ImageJ, a free image analysis system, it processes digital images, measures individuals, trains for automatic classification of taxa, and finally, measures plankton samples (abundances, total and partial size spectra or biomasses, etc.).

Maintained by Philippe Grosjean. Last updated 7 years ago.

3.0 match 1.32 score 21 scripts

loremerdrignac

EpiStats:Tools for Epidemiologists

Provides a set of functions aimed at epidemiologists. The package includes commands for measures of association and impact for case control studies and cohort studies. It may be particularly useful for outbreak investigations including univariable analysis and stratified analysis. The functions for cohort studies include the CS(), CSTable() and CSInter() commands. The functions for case control studies include the CC(), CCTable() and CCInter() commands. References - Cornfield, J. 1956. A statistical problem arising from retrospective studies. In Vol. 4 of Proceedings of the Third Berkeley Symposium, ed. J. Neyman, 135-148. Berkeley, CA - University of California Press. Woolf, B. 1955. On estimating the relation between blood group and disease. Annals of Human Genetics 19 251-253. Reprinted in Evolution of Epidemiologic Ideas Annotated Readings on Concepts and Methods, ed. S. Greenland, pp. 108-110. Newton Lower Falls, MA Epidemiology Resources. Gilles Desve & Peter Makary, 2007. 'CSTABLE Stata module to calculate summary table for cohort study' Statistical Software Components S456879, Boston College Department of Economics. Gilles Desve & Peter Makary, 2007. 'CCTABLE Stata module to calculate summary table for case-control study' Statistical Software Components S456878, Boston College Department of Economics.

Maintained by Lore Merdrignac. Last updated 1 year ago.

1.3 match 2.91 score 82 scripts

bartjanvanrossum

statgenQTLxT:Multi-Trait and Multi-Trial Genome Wide Association Studies

Fast multi-trait and multi-trial Genome Wide Association Studies (GWAS) following the method described in Zhou and Stephens (2014), <doi:10.1038/nmeth.2848>. One of a series of statistical genetic packages for streamlining the analysis of typical plant breeding experiments developed by Biometris.

Maintained by Bart-Jan van Rossum. Last updated 1 year ago.

openblas cpp openmp

1.3 match 2.70 score 2 scripts

cran

ecopower:Power Estimates and Equivalence Testing for Multivariate Data

Estimates power by simulation for multivariate abundance data to be used for sample size estimates. Multivariate equivalence testing by simulation from a Gaussian copula model. The package also provides functions for parameterising multivariate effect sizes and simulating multivariate abundance data jointly. The discrete Gaussian copula approach is described in Popovic et al. (2018) <doi:10.1016/j.jmva.2017.12.002>.

Maintained by Michelle Lim. Last updated 2 years ago.

1.3 match 2.70 score

mrc-ide

naomi.utils:Utility Functions For Naomi Datasets

This package contains utility functions for creating and manipulating datasets for the Naomi model and related projects.

Maintained by Jeffrey Eaton. Last updated 12 months ago.

1.7 match 1 stars 1.64 score 11 scripts

najerazuloaga

PROreg:Patient Reported Outcomes Regression Analysis

It offers a wide variety of techniques, such as graphics, recoding, or regression models, for a comprehensive analysis of patient-reported outcomes (PRO). Especially novel is the broad range of regression models based on the beta-binomial distribution useful for analyzing binomial data with over-dispersion in cross-sectional, longitudinal, or multidimensional response studies (see Najera-Zuloaga J., Lee D.-J. and Arostegui I. (2019) <doi:10.1002/bimj.201700251>).

Maintained by Josu Najera-Zuloaga. Last updated 1 year ago.

2.3 match 1.23 score

bioc

GNOSIS:Genomics explorer using statistical and survival analysis in R

GNOSIS incorporates a range of R packages enabling users to efficiently explore and visualise clinical and genomic data obtained from cBioPortal. GNOSIS uses an intuitive GUI and multiple tab panels supporting a range of functionalities. These include data upload and initial exploration, data recoding and subsetting, multiple visualisations, survival analysis, statistical analysis and mutation analysis, in addition to facilitating reproducible research.

Maintained by Lydia King. Last updated 5 months ago.

software shinyapps survival gui

0.5 match 5 stars 4.70 score 2 scripts

mrmarjan

multiUS:Functions for the Courses Multivariate Analysis and Computer Intensive Methods

Provides utility functions for multivariate analysis (factor analysis, discriminant analysis, and others). The package is primarily written for the course Multivariate analysis and for the course Computer intensive methods at the master's program of Applied Statistics at University of Ljubljana.

Maintained by Cugmas Marjan. Last updated 2 years ago.

1.7 match 1.23 score 17 scripts

mrc-ide

dfertility:District level estimation of age-specific fertility

This package produces district-level estimates of age-specific fertility from nationally representative household survey data.

Maintained by Oli Stevens. Last updated 1 year ago.

1.8 match 1 stars 1.18 score 15 scripts

nalimilan

RcmdrPlugin.temis:Graphical Integrated Text Mining Solution

An 'R Commander' plug-in providing an integrated solution to perform a series of text mining tasks such as importing and cleaning a corpus, and analyses like terms and documents counts, vocabulary tables, terms co-occurrences and documents similarity measures, time series analysis, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, 'Twitter' queries, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.

Maintained by Milan Bouchet-Valat. Last updated 7 years ago.

1.9 match 1.00 score 7 scripts

cran

OTrecod:Data Fusion using Optimal Transportation Theory

In the context of data fusion, the package provides a set of functions dedicated to the solving of 'recoding problems' using optimal transportation theory (Gares, Guernec, Savy (2019) <doi:10.1515/ijb-2018-0106> and Gares, Omer (2020) <doi:10.1080/01621459.2020.1775615>). Starting from two databases with no overlapping part except a subset of shared variables, the functions of the package help users obtain a single synthetic database in which the missing information is fully completed.

Maintained by Gregory Guernec. Last updated 2 years ago.

0.5 match 3.00 score

safai

varbin:Optimal Binning of Continuous and Categorical Variables

Tool for easy and efficient discretization of continuous and categorical data. The package calculates the optimal binning of a given explanatory variable with respect to a user-specified target variable. The purpose is to assign a unique Weight-of-Evidence value to each of the calculated binpoints in order to recode the original variable. The package allows users to impose certain restrictions on the functional form of the resulting binning while maximizing the overall information value in the original data. The package is well suited for logistic scoring models where input variables may be subject to restrictions such as linearity by e.g. regulatory authorities. An excellent source describing in detail the development of scorecards, and the role of Weight-of-Evidence coding in credit scoring, is (Siddiqi 2006, ISBN: 978-0-471-75451-0). The package utilizes the discrete nature of decision trees and Isotonic Regression to accommodate the trade-off between flexible functional forms and maximum information value.

Maintained by Daniel Safai. Last updated 6 years ago.

0.5 match 1.00 score 6 scripts