AGD:Analysis of Growth Data
Tools for the analysis of growth data: to extract an LMS table from a gamlss object, to calculate standard deviation scores and their inverse, and to superpose two wormplots from different models. The package contains several reference tables, especially for the Netherlands.
Maintained by Stef van Buuren. Last updated 11 months ago.
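The standard deviation score and its inverse mentioned in the AGD description can be illustrated with the usual LMS formula. The sketch below is written in Python with the standard library only and does not call AGD itself; the function names `lms_to_z` and `z_to_lms` and the L, M, S values are hypothetical and chosen for illustration.

```python
import math

def lms_to_z(y, L, M, S):
    """Standard deviation score of measurement y under LMS parameters.

    L is the Box-Cox power, M the median, S the coefficient of variation.
    """
    if L == 0:
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)

def z_to_lms(z, L, M, S):
    """Inverse transformation: the measurement that corresponds to score z."""
    if L == 0:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)

# Hypothetical reference values for a single age group (illustration only).
L, M, S = -1.6, 16.0, 0.08
z = lms_to_z(18.0, L, M, S)    # score of a measurement above the median
y_back = z_to_lms(z, L, M, S)  # round trip back to the measurement scale
```

A measurement equal to the median M maps to a score of zero, and applying the inverse to a score recovers the original measurement.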
RSiena:Siena - Simulation Investigation for Empirical Network Analysis
The main purpose of this package is to perform simulation-based estimation of stochastic actor-oriented models for longitudinal network data collected as panel data. Dependent variables can be single or multivariate networks, which can be directed, non-directed, or two-mode; and associated actor variables. There are also functions for testing parameters and checking goodness of fit. An overview of these models is given in Snijders (2017), <doi:10.1146/annurev-statistics-060116-054035>.
Maintained by Tom A.B. Snijders. Last updated 1 month ago.
mice:Multivariate Imputation by Chained Equations
Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
Maintained by Stef van Buuren. Last updated 6 days ago.
vegan:Community Ecology Package
Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
Maintained by Jari Oksanen. Last updated 16 days ago.
gstat:Spatial and Spatio-Temporal Geostatistical Modelling, Prediction and Simulation
Variogram modelling; simple, ordinary and universal point or block (co)kriging; spatio-temporal kriging; sequential Gaussian or indicator (co)simulation; variogram and variogram map plotting utility functions; supports sf and stars.
Maintained by Edzer Pebesma. Last updated 10 days ago.
languageR:Analyzing Linguistic Data: A Practical Introduction to Statistics
Data sets exemplifying statistical methods, and some facilitatory utility functions used in "Analyzing Linguistic Data: A practical introduction to statistics using R", Cambridge University Press, 2008.
Maintained by R. H. Baayen. Last updated 6 years ago.
Data used as examples in the current two books on Generalised Additive Models for Location Scale and Shape introduced by Rigby and Stasinopoulos (2005), <doi:10.1111/j.1467-9876.2005.00510.x>.
Maintained by Mikis Stasinopoulos. Last updated 1 year ago.
lsa:Latent Semantic Analysis
The basic idea of latent semantic analysis (LSA) is that texts have a higher-order (i.e. latent semantic) structure which, however, is obscured by word usage (e.g. through the use of synonyms or polysemy). By using conceptual indices that are derived statistically via a truncated singular value decomposition (a two-mode factor analysis) over a given document-term matrix, this variability problem can be overcome.
Maintained by Fridolin Wild. Last updated 3 years ago.
crfsuite:Conditional Random Fields for Labelling Sequential Data in Natural Language Processing
Wraps the 'CRFsuite' library, allowing users to fit a Conditional Random Field model and to apply it on existing data. The focus of the implementation is in the area of Natural Language Processing where this R package allows you to easily build and apply models for named entity recognition, text chunking, part of speech tagging, intent recognition or classification of any category you have in mind. Next to training, a small web application is included in the package to allow you to easily construct training data.
Maintained by Jan Wijffels. Last updated 2 years ago.
certegis:A Certe R Package for Geographic Information Science
A Certe R package for geographic information science (GIS), using the 'sf' package and Dutch reference data. This package is part of the 'certedata' universe.
Maintained by Matthijs S. Berends. Last updated 3 months ago.
charlatan:Make Fake Data
Make fake data that looks realistic, supporting addresses, person names, dates, times, colors, coordinates, currencies, digital object identifiers ('DOIs'), jobs, phone numbers, 'DNA' sequences, doubles and integers from distributions and within a range.
Maintained by Roel M. Hogervorst. Last updated 1 month ago.
rsyntax:Extract Semantic Relations from Text by Querying and Reshaping Syntax
Various functions for querying and reshaping dependency trees, as for instance created with the 'spacyr' or 'udpipe' packages. This enables the automatic extraction of useful semantic relations from texts, such as quotes (who said what) and clauses (who did what). Method proposed in Van Atteveldt et al. (2017) <doi:10.1017/pan.2016.12>.
Maintained by Kasper Welbers. Last updated 3 years ago.
cocorresp:Co-Correspondence Analysis Methods
Fits predictive and symmetric co-correspondence analysis (CoCA) models to relate one data matrix to another data matrix. More specifically, CoCA maximises the weighted covariance between the weighted averaged species scores of one community and the weighted averaged species scores of another community. CoCA attempts to find patterns that are common to both communities.
Maintained by Gavin L. Simpson. Last updated 5 months ago.
MCMCpack:Markov Chain Monte Carlo (MCMC) Package
Contains functions to perform Bayesian inference using posterior simulation for a number of statistical models. Most simulation is done in compiled C++ written in the Scythe Statistical Library Version 1.0.3. All models return 'coda' mcmc objects that can then be summarized using the 'coda' package. Some useful utility functions such as density functions, pseudo-random number generators for statistical distributions, a general purpose Metropolis sampling algorithm, and tools for visualization are provided.
Maintained by Jong Hee Park. Last updated 7 months ago.
calibrate:Calibration of Scatterplot and Biplot Axes
Package for drawing calibrated scales with tick marks on (non-orthogonal) variable vectors in scatterplots and biplots. Also provides some functions for biplot creation and for multivariate analysis such as principal coordinate analysis.
Maintained by Jan Graffelman. Last updated 5 years ago.
grafzahl:Supervised Machine Learning for Textual Data Using Transformers and 'Quanteda'
Duct tape the 'quanteda' ecosystem (Benoit et al., 2018) <doi:10.21105/joss.00774> to modern Transformer-based text classification models (Wolf et al., 2020) <doi:10.18653/v1/2020.emnlp-demos.6>, in order to facilitate supervised machine learning for textual data. This package mimics the behaviors of 'quanteda.textmodels' and provides a function to set up the 'Python' environment to use the pretrained models from 'Hugging Face'. More information: <doi:10.5117/CCR2023.1.003.CHAN>.
Maintained by Chung-hong Chan. Last updated 25 days ago.
phonTools:Tools for Phonetic and Acoustic Analyses
Contains tools for the organization, display, and analysis of the sorts of data frequently encountered in phonetics research and experimentation, including the easy creation of IPA vowel plots, and the creation and manipulation of WAVE audio files.
Maintained by Santiago Barreda. Last updated 1 year ago.
childsds:Data and Methods Around Reference Values in Pediatrics
Calculation of standard deviation scores and percentiles adduced from different standards (WHO, UK, Germany, Italy, China, etc). Also, references for laboratory values in children and adults are available, e.g., serum lipids, iron-related blood parameters, IGF, liver enzymes. See package documentation for full list.
Maintained by Mandy Vogel. Last updated 2 months ago.
svs:Tools for Semantic Vector Spaces
Various tools for semantic vector spaces, such as correspondence analysis (simple, multiple and discriminant), latent semantic analysis, probabilistic latent semantic analysis, non-negative matrix factorization, latent class analysis, EM clustering, logratio analysis and log-multiplicative (association) analysis. Furthermore, there are specialized distance measures, plotting functions and some helper functions.
Maintained by Koen Plevoets. Last updated 9 months ago.
vegtable:Handling Vegetation Data Sets
Import and handling of data from vegetation-plot databases, especially data stored in 'Turboveg 2'. Import/export routines for exchange of data with 'Juice' are also implemented.
Maintained by Miguel Alvarez. Last updated 8 months ago.
lsbclust:Least-Squares Bilinear Clustering for Three-Way Data
Functions for performing least-squares bilinear clustering of three-way data. The method uses the bilinear decomposition (or bi-additive model) to model two-way matrix slices while clustering over the third way. Up to four different types of clusters are included, one for each term of the bilinear decomposition. In this way, matrices are clustered simultaneously on (a subset of) their overall means, row margins, column margins and row-column interactions. The orthogonality of the bilinear model results in separability of the joint clustering problem into four separate ones. Three of these sub-problems are specific k-means problems, while a special algorithm is implemented for the interactions. Plotting methods are provided, including biplots for the low-rank approximations of the interactions.
Maintained by Pieter Schoonees. Last updated 6 years ago.
r2spss:Format R Output to Look Like SPSS
Create plots and LaTeX tables that look like SPSS output for use in teaching materials. Rather than copying-and-pasting SPSS output into documents, R code that mocks up SPSS output can be integrated directly into dynamic LaTeX documents with tools such as knitr. Functionality includes statistical techniques that are typically covered in introductory statistics classes: descriptive statistics, common hypothesis tests, ANOVA, and linear regression, as well as box plots, histograms, scatter plots, and line plots (including profile plots).
Maintained by Andreas Alfons. Last updated 3 years ago.
longevity:Statistical Methods for the Analysis of Excess Lifetimes
A collection of parametric and nonparametric methods for the analysis of survival data. Parametric families implemented include Gompertz-Makeham, exponential and generalized Pareto models and extended models. The package includes an implementation of the nonparametric maximum likelihood estimator for arbitrary truncation and censoring patterns based on Turnbull (1976) <doi:10.1111/j.2517-6161.1976.tb01597.x>, along with graphical goodness-of-fit diagnostics. Parametric models for positive random variables and peaks over threshold models based on extreme value theory are described in Rootzén and Zholud (2017) <doi:10.1007/s10687-017-0305-5>; Belzile et al. (2021) <doi:10.1098/rsos.202097> and Belzile et al. (2022) <doi:10.1146/annurev-statistics-040120-025426>.
Maintained by Leo Belzile. Last updated 4 months ago.
dscore:D-Score for Child Development
The D-score summarizes the child's performance on a set of milestones into a single number. The package implements four Rasch model keys to convert milestone scores into a D-score. It provides tools to calculate the D-score and its precision from the child's milestone scores, to convert the D-score into the Development-for-Age Z-score (DAZ) using age-conditional references, and to map milestone names into a generic 9-position item naming convention.
Maintained by Stef van Buuren. Last updated 7 months ago.
lgrdata:Example Datasets for a Learning Guide to R
A largish collection of example datasets, including several classics. Many of these datasets are well suited for regression, classification, and visualization.
Maintained by Remko Duursma. Last updated 6 years ago.
tatooheene:Technology Appraisal Toolbox for Health Economic Evaluations in the Netherlands
Functions to support economic modelling in R based on the methods of the Dutch guideline for economic evaluations in healthcare, CBS data, and OECD data.
Maintained by Stijn Peeters. Last updated 3 months ago.
pct:Propensity to Cycle Tool
Functions and example data to teach and increase the reproducibility of the methods and code underlying the Propensity to Cycle Tool (PCT), a research project and web application. For an academic paper on the methods, see Lovelace et al. (2017) <doi:10.5198/jtlu.2016.862>.
Maintained by Robin Lovelace. Last updated 12 days ago.
expectreg:Expectile and Quantile Regression
Expectile and quantile regression of models with nonlinear effects (e.g. spatial, random, ridge) using least asymmetrically weighted squares / absolutes as well as boosting; also supplies expectiles for common distributions.
Maintained by Fabian Otto-Sobotka. Last updated 1 year ago.
nsapi:Connect to the NS (Dutch Railways) API
Access the NS API and download current departure times, disruptions and engineering work, the station list, and travel recommendations from station to station. All results will be returned as a 'data.frame'. NS (Nederlandse Spoorwegen; Dutch Railways) is the largest train travel provider in the Netherlands. To use the API and this package, you will need to obtain a username and password. More information about authentication and the use of the functions is given in the vignette.
Maintained by Roel M. Hogervorst. Last updated 2 years ago.
dynpred:Companion Package to "Dynamic Prediction in Clinical Survival Analysis"
The dynpred package contains functions for dynamic prediction in survival analysis.
Maintained by Hein Putter. Last updated 10 years ago.
aquodom:Access to Aquo domaintables from R (Dutch)
The Aquo Standard is the Dutch standard for the exchange of data in water management. With *aquodom* (short for aquo domaintables) it is easy to use the API to download domaintables of the Aquo Standard and use them in R.
Maintained by Johan van Tent. Last updated 3 years ago.
litRiddle:Dataset and Tools to Research the Riddle of Literary Quality
Dataset and functions to explore quality of literary novels. The package is a part of the Riddle of Literary Quality project, and it contains the data of a reader survey about fiction in Dutch, a description of the novels the readers rated, and the results of stylistic measurements of the novels. The package also contains functions to combine, analyze, and visualize these data. For more details, see: Eder M, van Zundert J, Lensink S, van Dalen-Oskam K (2022). Replicating The Riddle of Literary Quality: The litRiddle package for R. In _Digital Humanities 2022: Conference Abstracts_, 636-637.
Maintained by Maciej Eder. Last updated 2 years ago.
certestyle:A Certe R Package for Applying Certe Organisational Style
A Certe R Package for applying the organisational colours and style of Certe, plus some additional formatting functions. This package is part of the 'certedata' universe.
Maintained by Matthijs S. Berends. Last updated 8 months ago.
SnowballC:Snowball Stemmers Based on the C 'libstemmer' UTF-8 Library
An R interface to the C 'libstemmer' library that implements Porter's word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary. Currently supported languages are Arabic, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Lithuanian, Nepali, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil and Turkish.
Maintained by Milan Bouchet-Valat. Last updated 17 days ago.
con2lki:Calculate the Dutch Air Quality Index (LKI)
Calculates the Dutch air quality index (LKI). This index was created on the basis of scientific studies of the health effects of air pollution. From these studies it can be deduced at what concentrations a certain percentage of the population can be affected.
Maintained by Mark Baas. Last updated 4 years ago.
wec:Weighted Effect Coding
Provides functions to create factor variables with contrasts based on weighted effect coding, and their interactions. In weighted effect coding the estimates from a first-order regression model show the deviations per group from the sample mean. This is especially useful when a researcher has no directional hypotheses and uses a sample from a population in which the number of observations per group is different.
Maintained by Rense Nieuwenhuis. Last updated 7 years ago.
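The coding scheme described for wec can be sketched as follows. This is a plain Python illustration under stated assumptions and not the package's implementation: the function name `weighted_effect_codes` and the toy data are hypothetical. The usual construction gives each observation in the omitted group the code -n_j / n_omitted for level j, so that every coded column sums to zero over the sample; that property is what ties the regression intercept to the sample mean.

```python
from collections import Counter

def weighted_effect_codes(groups, omitted):
    """Return the coded levels and one row of codes per observation.

    A row holds 1 for the observation's own level, 0 for other levels,
    and -n_level / n_omitted when the observation is in the omitted group.
    """
    counts = Counter(groups)
    levels = [g for g in sorted(counts) if g != omitted]
    rows = []
    for g in groups:
        row = []
        for level in levels:
            if g == level:
                row.append(1.0)
            elif g == omitted:
                row.append(-counts[level] / counts[omitted])
            else:
                row.append(0.0)
        rows.append(row)
    return levels, rows

# Toy unbalanced sample: 2 observations in "a", 3 in "b", 5 in "c".
groups = ["a"] * 2 + ["b"] * 3 + ["c"] * 5
levels, rows = weighted_effect_codes(groups, omitted="c")

# Each coded column sums to (numerically) zero over the sample.
column_sums = [sum(r[j] for r in rows) for j in range(len(levels))]
```

With equal group sizes the codes for the omitted group reduce to -1, which is ordinary (unweighted) effect coding.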
lmap:Logistic Mapping
Set of tools for mapping of categorical response variables based on principal component analysis (pca) and multidimensional unfolding (mdu).
Maintained by Mark de Rooij. Last updated 2 months ago.
fSRM:Social Relations Analyses with Roles ("Family SRM")
Social relations analyses with roles ("Family SRM") are computed using a structural equation modeling approach. Groups ranging from three members up to an unlimited number of members are supported and the mean structure can be computed. Means and variances can be compared between different groups of families and between roles.
Maintained by Felix Schönbrodt. Last updated 4 years ago.
hqmisc:Miscellaneous Convenience Functions and Dataset
Miscellaneous convenience functions and wrapper functions to convert frequencies between Hz, semitones, mel and Bark, to create a matrix of dummy columns from a factor, to determine whether x lies in range [a,b], and to add a bracketed line to an existing plot. This package also contains an example data set of a stratified sample of 80 talkers of Dutch.
Maintained by Hugo Quene. Last updated 3 years ago.
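The kinds of conversions listed for hqmisc can be illustrated with widely used formulas. The sketch below is Python with the standard library only and does not reproduce the package's functions: the function names, the 100 Hz default reference, and the choice of the 2595 * log10(1 + f / 700) mel formula are assumptions. Bark is left out because several competing formulas are in use.

```python
import math

def hz_to_semitones(f_hz, ref_hz=100.0):
    """Distance in semitones between f_hz and a reference frequency."""
    return 12.0 * math.log2(f_hz / ref_hz)

def semitones_to_hz(st, ref_hz=100.0):
    """Inverse of hz_to_semitones for the same reference frequency."""
    return ref_hz * 2.0 ** (st / 12.0)

def hz_to_mel(f_hz):
    """Mel value using one common formula (an assumption, see lead-in)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def in_range(x, a, b):
    """True when x lies in the closed interval [a, b]."""
    return a <= x <= b

octave_up = hz_to_semitones(200.0)   # one octave above 100 Hz
back_to_hz = semitones_to_hz(octave_up)
```

Doubling the frequency always adds 12 semitones, whichever reference frequency is chosen.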
SSrat:Two-Dimensional Sociometric Status Determination with Rating Scales
A set of functions for two-dimensional sociometric status determination with rating scales. For each person assessed, SSrat computes probability distributions of the total scores for 'Sympathy' (S), 'Antipathy' (A), social 'Preference' (P) and social 'Impact' (I), and applies a set of criteria for sociometric status categorization.
Maintained by Hans Landsheer. Last updated 7 years ago.
CAvariants:Correspondence Analysis Variants
Provides six variants of two-way correspondence analysis (ca): simple ca, singly ordered ca, doubly ordered ca, non symmetrical ca, singly ordered non symmetrical ca, and doubly ordered non symmetrical ca.
Maintained by Rosaria Lombardo. Last updated 1 year ago.
gssrdoc:Document General Social Survey Variable
The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page.
Maintained by Kieran Healy. Last updated 11 months ago.
CA3variants:Three-Way Correspondence Analysis Variants
Provides four variants of three-way correspondence analysis (ca): three-way symmetrical ca, three-way non-symmetrical ca, three-way ordered symmetrical ca and three-way ordered non-symmetrical ca.
Maintained by Rosaria Lombardo. Last updated 2 years ago.
twn:Taxa Waterbeheer Nederland voor R
The TWN-list (Taxa Waterbeheer Nederland) is the Dutch standard for naming taxa in Dutch water management. This package makes it easier to use the TWN-list for ecological analyses. It consists of two parts. First, it makes the TWN-list itself available in R. Second, it has a few functions that make it easy to perform some basic and often recurring tasks for checking and consulting taxonomic data from the TWN-list.
Maintained by Johan van Tent. Last updated 4 months ago.
