Showing 200 of 682 results
tidyverse
tidyverse: Easily Install and Load the 'Tidyverse'
The 'tidyverse' is a set of packages that work in harmony because they share common data representations and 'API' design. This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step. Learn more about the 'tidyverse' at <https://www.tidyverse.org>.
Maintained by Hadley Wickham. Last updated 5 months ago.
1.7k stars 20.23 score 664k scripts 125 dependents
gesistsa
rio: A Swiss-Army Knife for Data I/O
Streamlined data import and export by making assumptions that the user is probably willing to make: 'import()' and 'export()' determine the data format from the file extension, reasonable defaults are used for data import and export, web-based import is natively supported (including from SSL/HTTPS), compressed files can be read directly, and fast import packages are used where appropriate. An additional convenience function, 'convert()', provides a simple method for converting between file types.
Maintained by Chung-hong Chan. Last updated 3 months ago.
csv csvy data data-science excel io rio sas spss stata
610 stars 17.10 score 7.8k scripts 74 dependents
amices
mice: Multivariate Imputation by Chained Equations
Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
Maintained by Stef van Buuren. Last updated 23 hours ago.
chained-equations fcs imputation mice missing-data missing-values multiple-imputation multivariate-data cpp
462 stars 16.64 score 10k scripts 154 dependents
larmarange
labelled: Manipulating Labelled Data
Work with labelled data imported from 'SPSS' or 'Stata' with 'haven' or 'foreign'. This package provides useful functions to deal with "haven_labelled" and "haven_labelled_spss" classes introduced by 'haven' package.
Maintained by Joseph Larmarange. Last updated 1 month ago.
haven labels metadata sas spss stata
76 stars 15.04 score 2.4k scripts 98 dependents
kaz-yos
tableone: Create 'Table 1' to Describe Baseline Characteristics with or without Propensity Score Weights
Creates 'Table 1', i.e., description of baseline patient characteristics, which is essential in all medical research. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. Weighted data are supported via the 'survey' package.
Maintained by Kazuki Yoshida. Last updated 3 years ago.
baseline-characteristics descriptive-statistics statistics
221 stars 13.55 score 2.3k scripts 12 dependents
projectmosaic
mosaic: Project MOSAIC Statistics and Mathematics Teaching Utilities
Data sets and utilities from Project MOSAIC (<http://www.mosaic-web.org>) used to teach mathematics, statistics, computation and modeling. Funded by the NSF, Project MOSAIC is a community of educators working to tie together aspects of quantitative work that students in science, technology, engineering and mathematics will need in their professional lives, but which are usually taught in isolation, if at all.
Maintained by Randall Pruim. Last updated 1 year ago.
93 stars 13.32 score 7.2k scripts 7 dependents
dreamrs
esquisse: Explore and Visualize Your Data Interactively
A 'shiny' gadget to create 'ggplot2' figures interactively with drag-and-drop to map your variables to different aesthetics. You can quickly visualize your data according to their type, export in various formats, and retrieve the code to reproduce the plot.
Maintained by Victor Perrier. Last updated 1 month ago.
addin data-visualization ggplot2 rstudio-addin visualization
1.8k stars 13.31 score 1.1k scripts 1 dependent
juba
questionr: Functions to Make Surveys Processing Easier
Set of functions to make the processing and analysis of surveys easier: interactive shiny apps and addins for data recoding, contingency tables, dataset metadata handling, and several convenience functions.
Maintained by Julien Barnier. Last updated 8 days ago.
83 stars 12.93 score 1.1k scripts 19 dependents
simongrund1
mitml: Tools for Multiple Imputation in Multilevel Modeling
Provides tools for multiple imputation of missing data in multilevel modeling. Includes a user-friendly interface to the packages 'pan' and 'jomo', and several functions for visualization, data management and the analysis of multiply imputed data sets.
Maintained by Simon Grund. Last updated 1 year ago.
imputation missing-data mixed-effects multilevel-data multilevel-models
29 stars 12.36 score 246 scripts 153 dependents
dreamrs
datamods: Modules to Import and Manipulate Data in 'Shiny'
'Shiny' modules to import data into an application or 'addin' from various sources, and to manipulate them after that.
Maintained by Victor Perrier. Last updated 24 days ago.
144 stars 12.03 score 174 scripts 7 dependents
projectmosaic
ggformula: Formula Interface to the Grammar of Graphics
Provides a formula interface to 'ggplot2' graphics.
Maintained by Randall Pruim. Last updated 1 year ago.
38 stars 11.55 score 1.7k scripts 25 dependents
larmarange
broom.helpers: Helpers for Model Coefficients Tibbles
Provides a suite of functions to work with regression model 'broom::tidy()' tibbles. The suite includes functions to group regression model terms by variable, insert reference and header rows for categorical variables, add variable labels, and more.
Maintained by Joseph Larmarange. Last updated 23 days ago.
22 stars 11.45 score 165 scripts 2 dependents
ewenharrison
finalfit: Quickly Create Elegant Regression Results Tables and Plots when Modelling
Generate regression results tables and plots in final format for publication. Explore models and export directly to PDF and 'Word' using 'RMarkdown'.
Maintained by Ewen Harrison. Last updated 8 days ago.
270 stars 11.43 score 1.0k scripts
choonghyunryu
dlookr: Tools for Data Diagnosis, Exploration, Transformation
A collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between the target variable and predictor. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. And it creates automated reports that support these three tasks.
Maintained by Choonghyun Ryu. Last updated 10 months ago.
212 stars 11.05 score 748 scripts 2 dependents
ipums
ipumsr: An R Interface for Downloading, Reading, and Handling IPUMS Data
An easy way to work with census, survey, and geographic data provided by IPUMS in R. Generate and download data through the IPUMS API and load IPUMS files into R with their associated metadata to make analysis easier. IPUMS data describing 1.4 billion individuals drawn from over 750 censuses and surveys is available free of charge from the IPUMS website <https://www.ipums.org>.
Maintained by Derek Burk. Last updated 1 month ago.
30 stars 11.05 score 720 scripts 2 dependents
bioc
ANCOMBC: Microbiome differential abundance and correlation analyses with bias correction
ANCOMBC is a package containing differential abundance (DA) and correlation analyses for microbiome data. Specifically, the package includes Analysis of Compositions of Microbiomes with Bias Correction 2 (ANCOM-BC2), Analysis of Compositions of Microbiomes with Bias Correction (ANCOM-BC), and Analysis of Composition of Microbiomes (ANCOM) for DA analysis, and Sparse Estimation of Correlations among Microbiomes (SECOM) for correlation analysis. Microbiome data are typically subject to two sources of biases: unequal sampling fractions (sample-specific biases) and differential sequencing efficiencies (taxon-specific biases). Methodologies included in the ANCOMBC package are designed to correct these biases and construct statistically consistent estimators.
Maintained by Huang Lin. Last updated 13 days ago.
differentialexpression microbiome normalization sequencing software ancom ancombc ancombc2 correlation differential-abundance-analysis secom
120 stars 10.79 score 406 scripts 1 dependent
bioc
GWASTools: Tools for Genome Wide Association Studies
Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis.
Maintained by Stephanie M. Gogarten. Last updated 11 days ago.
snp geneticvariability qualitycontrol microarray
17 stars 10.67 score 396 scripts 5 dependents
bioc
GENESIS: GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness
The GENESIS package provides methodology for estimating, inferring, and accounting for population and pedigree structure in genetic analyses. The current implementation provides functions to perform PC-AiR (Conomos et al., 2015, Gen Epi) and PC-Relate (Conomos et al., 2016, AJHG). PC-AiR performs a Principal Components Analysis on genome-wide SNP data for the detection of population structure in a sample that may contain known or cryptic relatedness. Unlike standard PCA, PC-AiR accounts for relatedness in the sample to provide accurate ancestry inference that is not confounded by family structure. PC-Relate uses ancestry representative principal components to adjust for population structure/ancestry and accurately estimate measures of recent genetic relatedness such as kinship coefficients, IBD sharing probabilities, and inbreeding coefficients. Additionally, functions are provided to perform efficient variance component estimation and mixed model association testing for both quantitative and binary phenotypes.
Maintained by Stephanie M. Gogarten. Last updated 2 months ago.
snp geneticvariability genetics statisticalmethod dimensionreduction principalcomponent genomewideassociation qualitycontrol biocviews
36 stars 10.44 score 342 scripts 1 dependent
richardli
SUMMER: Small-Area-Estimation Unit/Area Models and Methods for Estimation in R
Provides methods for spatial and spatio-temporal smoothing of demographic and health indicators using survey data, with particular focus on estimating and projecting under-five mortality rates, described in Mercer et al. (2015) <doi:10.1214/15-AOAS872>, Li et al. (2019) <doi:10.1371/journal.pone.0210645>, Wu et al. (DHS Spatial Analysis Reports No. 21, 2021), and Li et al. (2023) <doi:10.48550/arXiv.2007.05117>.
Maintained by Zehang R Li. Last updated 3 months ago.
bayesian-inference small-area-estimation space-time
23 stars 10.28 score 134 scripts 2 dependents
insightsengineering
teal.modules.clinical: 'teal' Modules for Standard Clinical Outputs
Provides user-friendly tools for creating and customizing clinical trial reports. By leveraging the 'teal' framework, this package provides 'teal' modules to easily create an interactive panel that allows for seamless adjustments to data presentation, thereby streamlining the creation of detailed and accurate reports.
Maintained by Dawid Kaledkowski. Last updated 29 days ago.
clinical-trials modules nest outputs shiny
34 stars 10.25 score 149 scripts
idigbio
ridigbio: Interface to the iDigBio Data API
An interface to iDigBio's search API that allows downloading specimen records. Searches are returned as a data.frame. Other functions such as the metadata end points return lists of information. iDigBio is a US project focused on digitizing and serving museum specimen collections on the web. See <https://www.idigbio.org> for information on iDigBio.
Maintained by Jesse Bennett. Last updated 18 days ago.
16 stars 10.23 score 63 scripts 7 dependents
ropensci
spocc: Interface to Species Occurrence Data Sources
A programmatic interface to many species occurrence data sources, including Global Biodiversity Information Facility ('GBIF'), 'iNaturalist', 'eBird', Integrated Digitized 'Biocollections' ('iDigBio'), 'VertNet', Ocean 'Biogeographic' Information System ('OBIS'), and Atlas of Living Australia ('ALA'). Includes functionality for retrieving species occurrence data, and combining those data.
Maintained by Hannah Owens. Last updated 2 months ago.
specimens api web-services occurrences species taxonomy gbif inat vertnet ebird idigbio obis ala antweb bison data ecoengine inaturalist occurrence species-occurrence spocc
118 stars 10.09 score 552 scripts 5 dependents
jinseob2kim
jstable: Create Tables from Different Types of Regression
Create regression tables from generalized linear model (GLM), generalized estimating equation (GEE), generalized linear mixed-effects model (GLMM), Cox proportional hazards model, survey-weighted generalized linear model (svyglm) and survey-weighted Cox model results for publication.
Maintained by Jinseob Kim. Last updated 22 hours ago.
28 stars 10.08 score 199 scripts 1 dependent
ropensci
rdhs: API Client and Dataset Management for the Demographic and Health Survey (DHS) Data
Provides a client for (1) querying the DHS API for survey indicators and metadata (<https://api.dhsprogram.com/#/index.html>), (2) identifying surveys and datasets for analysis, (3) downloading survey datasets from the DHS website, (4) loading datasets and associated metadata into R, and (5) extracting variables and combining datasets for pooled analysis.
Maintained by OJ Watson. Last updated 30 days ago.
dataset dhs dhs-api extract peer-reviewed survey-data
35 stars 10.07 score 286 scripts 3 dependents
sdctools
sdcMicro: Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation
Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. The theoretical basis for the methods implemented can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface published in Meindl and Templ (2019) <doi:10.3390/a12090191> that allows the use of various methods of this package.
Maintained by Matthias Templ. Last updated 1 month ago.
84 stars 9.63 score 258 scripts
john-d-fox
Rcmdr: R Commander
A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.
Maintained by John Fox. Last updated 5 months ago.
4 stars 9.48 score 636 scripts 38 dependents
georgheinze
logistf: Firth's Bias-Reduced Logistic Regression
Fit a logistic regression model using Firth's bias reduction method, equivalent to penalization of the log-likelihood by the Jeffreys prior. Confidence intervals for regression coefficients can be computed by penalized profile likelihood. Firth's method was proposed as an ideal solution to the problem of separation in logistic regression, see Heinze and Schemper (2002) <doi:10.1002/sim.1047>. If needed, the bias reduction can be turned off such that ordinary maximum likelihood logistic regression is obtained. Two new modifications of Firth's method, FLIC and FLAC, lead to unbiased predictions and are now available in the package as well, see Puhr et al. (2017) <doi:10.1002/sim.7273>.
Maintained by Georg Heinze. Last updated 2 years ago.
12 stars 9.23 score 346 scripts 16 dependents
alexanderrobitzsch
miceadds: Some Additional Multiple Imputation Functions, Especially for 'mice'
Contains functions for multiple imputation which complements existing functionality in R. In particular, several imputation methods for the mice package (van Buuren & Groothuis-Oudshoorn, 2011, <doi:10.18637/jss.v045.i03>) are implemented. Main features of the miceadds package include plausible value imputation (Mislevy, 1991, <doi:10.1007/BF02294457>), multilevel imputation for variables at any level or with any number of hierarchical and non-hierarchical levels (Grund, Luedtke & Robitzsch, 2018, <doi:10.1177/1094428117703686>; van Buuren, 2018, Ch.7, <doi:10.1201/9780429492259>), imputation using partial least squares (PLS) for high dimensional predictors (Robitzsch, Pham & Yanagida, 2016), nested multiple imputation (Rubin, 2003, <doi:10.1111/1467-9574.00217>), substantive model compatible imputation (Bartlett et al., 2015, <doi:10.1177/0962280214521348>), and features for the generation of synthetic datasets (Reiter, 2005, <doi:10.1111/j.1467-985X.2004.00343.x>; Nowok, Raab, & Dibben, 2016, <doi:10.18637/jss.v074.i11>).
Maintained by Alexander Robitzsch. Last updated 28 days ago.
missing-data multiple-imputation openblas cpp
16 stars 9.16 score 542 scripts 9 dependents
nickch-k
vtable: Variable Table for Variable Documentation
Automatically generates HTML variable documentation including variable names, labels, classes, value labels (if applicable), value ranges, and summary statistics. See the vignette "vtable" for a package overview.
Maintained by Nick Huntington-Klein. Last updated 3 months ago.
40 stars 9.10 score 1.2k scripts
bioc
BatchQC: Batch Effects Quality Control Software
Sequencing and microarray samples often are collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in the downstream analysis. BatchQC is a software tool that streamlines batch preprocessing and evaluation by providing interactive diagnostics, visualizations, and statistical analyses to explore the extent to which batch variation impacts the data. BatchQC diagnostics help determine whether batch adjustment needs to be done, and how correction should be applied before proceeding with a downstream analysis. Moreover, BatchQC interactively applies multiple common batch effect approaches to the data and the user can quickly see the benefits of each method. BatchQC is developed as a Shiny App. The output is organized into multiple tabs and each tab features an important part of the batch effect analysis and visualization of the data. The BatchQC interface has the following analysis groups: Summary, Differential Expression, Median Correlations, Heatmaps, Circular Dendrogram, PCA Analysis, Shape, ComBat and SVA.
Maintained by Jessica Anderson. Last updated 11 days ago.
batcheffect graphandnetwork microarray normalization principalcomponent sequencing software visualization qualitycontrol rnaseq preprocessing differentialexpression immunooncology
7 stars 9.06 score 54 scripts
sachaepskamp
bootnet: Bootstrap Methods for Various Network Estimation Routines
Bootstrap methods to assess accuracy and stability of estimated network structures and centrality indices <doi:10.3758/s13428-017-0862-1>. Allows for flexible specification of any undirected network estimation procedure in R, and offers default sets for various estimation routines.
Maintained by Sacha Epskamp. Last updated 5 months ago.
32 stars 8.94 score 155 scripts 3 dependents
mattcowgill
readabs: Download and Tidy Time Series Data from the Australian Bureau of Statistics
Downloads, imports, and tidies time series data from the Australian Bureau of Statistics <https://www.abs.gov.au/>.
Maintained by Matt Cowgill. Last updated 27 days ago.
abs australia australian-bureau-of-statistics australian-data statistics tidy-data time-series
104 stars 8.85 score 180 scripts
atorus-research
xportr: Utilities to Output CDISC SDTM/ADaM XPT Files
Tools to build CDISC compliant data sets and check for CDISC compliance.
Maintained by Eli Miller. Last updated 3 months ago.
43 stars 8.84 score 102 scripts
bioc
SeqVarTools: Tools for variant data
An interface to the fast-access storage format for VCF data provided in SeqArray, with tools for common operations and analysis.
Maintained by Stephanie M. Gogarten. Last updated 5 months ago.
snp geneticvariability sequencing genetics
3 stars 8.76 score 384 scripts 2 dependents
jinseob2kim
jsmodule: 'RStudio' Addins and 'Shiny' Modules for Medical Research
'RStudio' addins and 'Shiny' modules for descriptive statistics, regression and survival analysis.
Maintained by Jinseob Kim. Last updated 11 days ago.
medical rstudio-addins shiny shiny-modules statistics
21 stars 8.69 score 61 scripts
r-box
boxr: Interface for the 'Box.com API'
An R interface for the remote file hosting service 'Box' (<https://www.box.com/>). In addition to uploading and downloading files, this package includes functions which mirror base R operations for local files (e.g. box_load(), box_save(), box_read(), box_setwd(), etc.), as well as 'git' style functions for entire directories (e.g. box_fetch(), box_push()).
Maintained by Ian Lyttle. Last updated 12 months ago.
63 stars 8.65 score 238 scripts
projectmosaic
mosaicCalc: R-Language Based Calculus Operations for Teaching
Software to support the introductory *MOSAIC Calculus* textbook (<https://www.mosaic-web.org/MOSAIC-Calculus/>), one of many data- and modeling-oriented educational resources developed by Project MOSAIC (<https://www.mosaic-web.org/>). Provides symbolic and numerical differentiation and integration, as well as support for applied linear algebra (for data science), and differential equations/dynamics. Includes grammar-of-graphics-based functions for drawing vector fields, trajectories, etc. The software is suitable for general use, but intended mainly for teaching calculus.
Maintained by Daniel Kaplan. Last updated 1 month ago.
13 stars 8.63 score 546 scripts
isubirana
compareGroups: Descriptive Analysis by Groups
Create data summaries for quality control, extensive reports for exploring data, as well as publication-ready univariate or bivariate tables in several formats (plain text, HTML, LaTeX, PDF, Word or Excel). Create figures to quickly visualise the distribution of your data (boxplots, barplots, normality-plots, etc.). Display statistics (mean, median, frequencies, incidences, etc.). Perform the appropriate tests (t-test, Analysis of variance, Kruskal-Wallis, Fisher, log-rank, ...) depending on the nature of the described variable (normal, non-normal or qualitative). Summarize genetic data (Single Nucleotide Polymorphisms), displaying Allele Frequencies and performing Hardy-Weinberg Equilibrium tests among other typical statistics and tests for this kind of data.
Maintained by Isaac Subirana. Last updated 1 month ago.
comparegroups descriptive-statistics plot report table
36 stars 8.46 score 396 scripts 1 dependent
davidhodge931
ggblanket: Simplify 'ggplot2' Visualisation
Simplify 'ggplot2' visualisation with 'ggblanket' wrapper functions.
Maintained by David Hodge. Last updated 11 days ago.
data-visualisation data-visualization ggplot ggplot-extension ggplot2 ggplot2-enhancements visualisation visualization
173 stars 8.42 score 45 scripts
nlmixr2
nlmixr2: Nonlinear Mixed Effects Models in Population PK/PD
Fit and compare nonlinear mixed-effects models in differential equations with flexible dosing information commonly seen in pharmacokinetics and pharmacodynamics (Almquist, Leander, and Jirstrand 2015 <doi:10.1007/s10928-015-9409-1>). Differential equation solving is by compiled C code provided in the 'rxode2' package (Wang, Hallow, and James 2015 <doi:10.1002/psp4.12052>).
Maintained by Matthew Fidler. Last updated 1 month ago.
52 stars 8.38 score 120 scripts 3 dependents
wallaceecomod
wallace: A Modular Platform for Reproducible Modeling of Species Niches and Distributions
The 'shiny' application Wallace is a modular platform for reproducible modeling of species niches and distributions. Wallace guides users through a complete analysis, from the acquisition of species occurrence and environmental data to visualizing model predictions on an interactive map, thus bundling complex workflows into a single, streamlined interface. An extensive vignette, which guides users through most package functionality, can be found on the package's GitHub Pages website: <https://wallaceecomod.github.io/wallace/articles/tutorial-v2.html>.
Maintained by Mary E. Blair. Last updated 22 days ago.
133 stars 8.36 score 96 scripts
rubenarslan
codebook: Automatic Codebooks from Metadata Encoded in Dataset Attributes
Easily automate the following tasks to describe data frames: Summarise the distributions, and labelled missings of variables graphically and using descriptive statistics. For surveys, compute and summarise reliabilities (internal consistencies, retest, multilevel) for psychological scales. Combine this information with metadata (such as item labels and labelled values) that is derived from R attributes. To do so, the package relies on 'rmarkdown' partials, so you can generate HTML, PDF, and Word documents. Codebooks are also available as tables (CSV, Excel, etc.) and in JSON-LD, so that search engines can find your data and index the metadata. The metadata are also available at your fingertips via RStudio Addins.
Maintained by Ruben Arslan. Last updated 3 months ago.
codebook documentation formr json-ld metadata spss webapp
143 stars 8.29 score 229 scripts
dbosak01
libr: Libraries, Data Dictionaries, and a Data Step for R
Contains a set of functions to create data libraries, generate data dictionaries, and simulate a data step. The libname() function will load a directory of data into a library in one line of code. The dictionary() function will generate data dictionaries for individual data frames or an entire library. And the datastep() function will perform row-by-row data processing.
Maintained by David Bosak. Last updated 3 months ago.
27 stars 8.27 score 48 scripts 2 dependents
safetygraphics
safetyGraphics: Interactive Graphics for Monitoring Clinical Trial Safety
A framework for evaluation of clinical trial safety. Users can interactively explore their data using the included 'Shiny' application.
Maintained by Jeremy Wildfire. Last updated 2 years ago.
99 stars 8.19 score 111 scripts
alinetalhouk
diceR: Diverse Cluster Ensemble in R
Performs cluster analysis using an ensemble clustering framework, Chiu & Talhouk (2018) <doi:10.1186/s12859-017-1996-y>. Results from a diverse set of algorithms are pooled together using methods such as majority voting, K-Modes, LinkCluE, and CSPA. There are options to compare cluster assignments across algorithms using internal and external indices, visualizations such as heatmaps, and significance testing for the existence of clusters.
Maintained by Derek Chiu. Last updated 2 months ago.
37 stars 8.13 score 60 scripts 3 dependents
gfellerlab
SuperCell: Simplification of scRNA-seq data by merging together similar cells
Aggregates large single-cell data into a metacell dataset by merging together gene expression of very similar cells.
Maintained by The package maintainer. Last updated 8 months ago.
software coarse-graining scrna-seq-analysis scrna-seq-data
72 stars 8.08 score 93 scripts
salvatoremangiafico
rcompanion: Functions to Support Extension Education Program Evaluation
Functions and datasets to support Summary and Analysis of Extension Program Evaluation in R, and An R Companion for the Handbook of Biological Statistics. Vignettes are available at <https://rcompanion.org>.
Maintained by Salvatore Mangiafico. Last updated 1 month ago.
4 stars 8.01 score 2.4k scripts 5 dependents
john-harrold
formods: 'Shiny' Modules for General Tasks
'Shiny' apps can often make use of the same key elements; this package provides modules for common tasks (data upload, wrangling data, figure generation and saving the app state), and also a framework for developing. These modules can react and interact as well as generate code to create reproducible analyses.
Maintained by John Harrold. Last updated 19 days ago.
8 stars 7.94 score 100 scripts 1 dependent
dataobservatory-eu
dataset: Create Data Frames that are Easier to Exchange and Reuse
The aim of the 'dataset' package is to make tidy datasets easier to release, exchange and reuse. It organizes and formats data frame 'R' objects into well-referenced, well-described, interoperable datasets in release- and reuse-ready form.
Maintained by Daniel Antal. Last updated 4 days ago.
14 stars 7.89 score 76 scripts 1 dependent
psychbruce
bruceR: Broadly Useful Convenient and Efficient R Functions
Broadly useful convenient and efficient R functions that bring users concise and elegant R data analyses. This package includes easy-to-use functions for (1) basic R programming (e.g., set working directory to the path of currently opened file; import/export data from/to files in any format; print tables to Microsoft Word); (2) multivariate computation (e.g., compute scale sums/means/... with reverse scoring); (3) reliability analyses and factor analyses; (4) descriptive statistics and correlation analyses; (5) t-test, multi-factor analysis of variance (ANOVA), simple-effect analysis, and post-hoc multiple comparison; (6) tidy report of statistical models (to R Console and Microsoft Word); (7) mediation and moderation analyses (PROCESS); and (8) additional toolbox for statistics and graphics.
Maintained by Han-Wu-Shuang Bao. Last updated 10 months ago.
anova data-analysis data-science linear-models linear-regression multilevel-models statistics toolbox
176 stars 7.87 score 316 scripts 3 dependents
dbosak01
sassy: Makes 'R' Easier for Everyone
A meta-package that aims to make 'R' easier for everyone, especially programmers who have a background in 'SAS®' software. This set of packages brings many useful concepts to 'R', including data libraries, data dictionaries, formats and format catalogs, a data step, and a traceable log. The 'flagship' package is a reporting package that can output in text, rich text, 'PDF', 'HTML', and 'DOCX' file formats.
Maintained by David Bosak. Last updated 7 days ago.
21 stars 7.87 score 92 scripts
american-institutes-for-research
EdSurvey: Analysis of NCES Education Survey and Assessment Data
Read in and analyze functions for education survey and assessment data from the National Center for Education Statistics (NCES) <https://nces.ed.gov/>, including National Assessment of Educational Progress (NAEP) data <https://nces.ed.gov/nationsreportcard/> and data from the International Assessment Database: Organisation for Economic Co-operation and Development (OECD) <https://www.oecd.org/en/about/directorates/directorate-for-education-and-skills.html>, including Programme for International Student Assessment (PISA), Teaching and Learning International Survey (TALIS), Programme for the International Assessment of Adult Competencies (PIAAC), and International Association for the Evaluation of Educational Achievement (IEA) <https://www.iea.nl/>, including Trends in International Mathematics and Science Study (TIMSS), TIMSS Advanced, Progress in International Reading Literacy Study (PIRLS), International Civic and Citizenship Study (ICCS), International Computer and Information Literacy Study (ICILS), and Civic Education Study (CivEd).
Maintained by Paul Bailey. Last updated 29 days ago.
10 stars 7.86 score 139 scripts 1 dependent
pmair78
smacof: Multidimensional Scaling
Implements the following approaches for multidimensional scaling (MDS) based on stress minimization using majorization (smacof): ratio/interval/ordinal/spline MDS on symmetric dissimilarity matrices, MDS with external constraints on the configuration, individual differences scaling (idioscal, indscal), MDS with spherical restrictions, and ratio/interval/ordinal/spline unfolding (circular restrictions, row-conditional). Various tools and extensions like jackknife MDS, bootstrap MDS, permutation tests, MDS biplots, gravity models, unidimensional scaling, drift vectors (asymmetric MDS), classical scaling, and Procrustes are implemented as well.
Maintained by Patrick Mair. Last updated 6 months ago.
5 stars 7.86 score 152 scripts 24 dependents
joachim-gassen
ExPanDaR:Explore Your Data Interactively
Provides a shiny-based front end (the 'ExPanD' app) and a set of functions for exploratory data analysis. Run as a web-based app, 'ExPanD' enables users to assess the robustness of empirical evidence without providing them access to the underlying data. You can export a notebook containing the analysis of 'ExPanD' and/or use the functions of the package to support your exploratory data analysis workflow. Refer to the vignettes of the package for more information on how to use 'ExPanD' and/or the functions of this package.
Maintained by Joachim Gassen. Last updated 4 years ago.
accountingedaexploratory-data-analysisfinanceopen-sciencereplicationshinyshiny-apps
156 stars 7.80 score 203 scripts
jwiley
JWileymisc:Miscellaneous Utilities and Functions
Miscellaneous tools and functions, including: generate descriptive statistics tables, format output, visualize relations among variables or check distributions, and generic functions for residual and model diagnostics.
Maintained by Joshua F. Wiley. Last updated 3 days ago.
6 stars 7.78 score 241 scripts 4 dependents
obiba
opalr:'Opal' Data Repository Client and 'DataSHIELD' Utils
Data integration Web application for biobanks by 'OBiBa'. 'Opal' is the core database application for biobanks. Participant data, once collected from any data source, must be integrated and stored in a central data repository under a uniform model. 'Opal' is such a central repository. It can import, process, validate, query, analyze, report, and export data. 'Opal' is typically used in a research center to analyze the data acquired at assessment centres. Its ultimate purpose is to achieve seamless data-sharing among biobanks. This 'Opal' client allows users to interact with 'Opal' web services and to perform operations on the R server side. 'DataSHIELD' administration tools are also provided.
Maintained by Yannick Marcon. Last updated 3 months ago.
3 stars 7.76 score 179 scripts 2 dependents
novartis
xgxr:Exploratory Graphics for Pharmacometrics
Supports a structured approach for exploring PKPD data <https://opensource.nibr.com/xgx/>. It also contains helper functions for enabling the modeler to follow best R practices (by appending the program name, figure name location, and draft status to each plot). In addition, it enables the modeler to follow best graphical practices (by providing a theme that reduces chart ink, and by providing time-scale, log-scale, and reverse-log-transform-scale functions for more readable axes). Finally, it provides some data checking and summarizing functions for rapidly exploring pharmacokinetics and pharmacodynamics (PKPD) datasets.
Maintained by Andrew Stein. Last updated 1 year ago.
13 stars 7.76 score 105 scripts 5 dependents
valeriapolicastro
robin:ROBustness in Network
Assesses the robustness of the community structure of a network found by one or more community detection algorithms to give indications about their reliability. It detects if the community structure found by a set of algorithms is statistically significant and compares the different selected detection algorithms on the same network. robin helps to choose among different community detection algorithms the one that better fits the network of interest. Reference in Policastro V., Righelli D., Carissimo A., Cutillo L., De Feis I. (2021) <https://journal.r-project.org/archive/2021/RJ-2021-040/index.html>.
Maintained by Valeria Policastro. Last updated 8 days ago.
19 stars 7.72 score 8 scripts
nliulab
AutoScore:An Interpretable Machine Learning-Based Automatic Clinical Score Generator
A novel interpretable machine learning-based framework to automate the development of a clinical scoring model for predefined outcomes. Our novel framework consists of six modules: variable ranking with machine learning, variable transformation, score derivation, model selection, domain knowledge-based score fine-tuning, and performance evaluation. The details are described in our research paper <doi:10.2196/21798>. Users or clinicians could seamlessly generate parsimonious sparse-score risk models (i.e., risk scores), which can be easily implemented and validated in clinical practice. We hope to see its application in various medical case studies.
Maintained by Feng Xie. Last updated 27 days ago.
32 stars 7.70 score 30 scripts
ekstroem
MESS:Miscellaneous Esoteric Statistical Scripts
A mixed collection of useful and semi-useful diverse statistical functions, some of which may even be referenced in The R Primer book. See Ekstrøm, C. T. (2016). The R Primer. 2nd edition. Chapman & Hall.
Maintained by Claus Thorn Ekstrøm. Last updated 1 month ago.
biostatisticspower-analysisstatistical-analysisstatistical-methodsstatistical-modelsopenblascpp
4 stars 7.69 score 328 scripts 13 dependents
proteomicslab57357
UniprotR:Retrieving Information of Proteins from Uniprot
Connect to Uniprot <https://www.uniprot.org/> to retrieve information about proteins using their accession numbers, such as name or taxonomy information. For detailed information, see the publication <https://www.sciencedirect.com/science/article/pii/S1874391919303859>.
Maintained by Mohamed Soudy. Last updated 3 years ago.
61 stars 7.65 score 89 scripts 1 dependents
ropengov
retroharmonize:Ex Post Survey Data Harmonization
Assist in reproducible retrospective (ex-post) harmonization of data, particularly individual level survey data, by providing tools for organizing metadata, standardizing the coding of variables, variable names and value labels, including missing values, and documenting the data transformations, with the help of comprehensive S3 classes.
Maintained by Daniel Antal. Last updated 2 months ago.
10 stars 7.62 score 59 scripts
uligges
klaR:Classification and Visualization
Miscellaneous functions for classification and visualization, e.g. regularized discriminant analysis, sknn() kernel-density naive Bayes, an interface to 'svmlight' and stepclass() wrapper variable selection for supervised classification, partimat() visualization of classification rules and shardsplot() of cluster results as well as kmodes() clustering for categorical data, corclust() variable clustering, variable extraction from different variable clustering models and weight of evidence preprocessing.
Maintained by Uwe Ligges. Last updated 1 year ago.
5 stars 7.61 score 1.4k scripts 13 dependents
mmollina
mappoly:Genetic Linkage Maps in Autopolyploids
Construction of genetic maps in autopolyploid full-sib populations. Uses pairwise recombination fraction estimation as the first source of information to sequentially position allelic variants in specific homologous chromosomes. For situations where pairwise analysis has limited power, the algorithm relies on the multilocus likelihood obtained through a hidden Markov model (HMM). For more detail, please see Mollinari and Garcia (2019) <doi:10.1534/g3.119.400378> and Mollinari et al. (2020) <doi:10.1534/g3.119.400620>.
Maintained by Marcelo Mollinari. Last updated 23 days ago.
polyploidpolyploid-genetic-mappingpolyploidycpp
27 stars 7.56 score 111 scripts 1 dependents
silvadenisson
electionsBR:R Functions to Download and Clean Brazilian Electoral Data
Offers a set of functions to easily download and clean Brazilian electoral data from the Superior Electoral Court and 'CepespData' websites. Among other features, the package retrieves data on local and federal elections for all positions (city councilor, mayor, state deputy, federal deputy, governor, and president) aggregated by state, city, and electoral zones.
Maintained by Denisson Silva. Last updated 4 months ago.
65 stars 7.54 score 66 scripts
ekstroem
dataMaid:A Suite of Checks for Identification of Potential Errors in a Data Frame as Part of the Data Screening Process
Data screening is an important first step of any statistical analysis. dataMaid auto-generates a customizable data report with a thorough summary of the checks and the results that a human can use to identify possible errors. It provides an extendable suite of tests for common potential errors in a dataset.
Maintained by Claus Thorn Ekstrøm. Last updated 3 years ago.
data-cleaningdata-screeningreproducible-research
143 stars 7.53 score 236 scripts
beckerbenj
eatGADS:Data Management of Large Hierarchical Data
Import 'SPSS' data, handle and change 'SPSS' meta data, store and access large hierarchical data in 'SQLite' data bases.
Maintained by Benjamin Becker. Last updated 2 days ago.
1 stars 7.48 score 34 scripts 1 dependents
cardiomoon
editData:'RStudio' Addin for Editing a 'data.frame'
An 'RStudio' addin for editing a 'data.frame' or a 'tibble'. You can delete, add or update a 'data.frame' without coding. You can get resultant data as a 'data.frame'. In the package, modularized 'shiny' app codes are provided. These modules are intended for reuse across applications.
Maintained by Keon-Woong Moon. Last updated 4 years ago.
32 stars 7.45 score 63 scripts 5 dependents
amices
ggmice:Visualizations for 'mice' with 'ggplot2'
Enhance a 'mice' imputation workflow with visualizations for incomplete and/or imputed data. The plotting functions produce 'ggplot' objects which may be easily manipulated or extended. Use 'ggmice' to inspect missing data, develop imputation models, evaluate algorithmic convergence, or compare observed versus imputed data.
Maintained by Hanne Oberman. Last updated 8 months ago.
32 stars 7.42 score 165 scripts
farhadpishgar
MatchThem:Matching and Weighting Multiply Imputed Datasets
Provides essential tools for the pre-processing techniques of matching and weighting multiply imputed datasets. The package includes functions for matching within and across multiply imputed datasets using various methods, estimating weights for units in the imputed datasets using multiple weighting methods, calculating causal effect estimates in each matched or weighted dataset using parametric or non-parametric statistical models, and pooling the resulting estimates according to Rubin's rules (please see <https://journal.r-project.org/archive/2021/RJ-2021-073/> for more details).
Maintained by Farhad Pishgar. Last updated 5 months ago.
18 stars 7.40 score 112 scripts
ddotta
parquetize:Convert Files to Parquet Format
Collection of functions to get files in parquet format. Parquet is a columnar storage file format <https://parquet.apache.org/>. The files to convert can be of several formats ("csv", "RData", "rds", "RSQLite", "json", "ndjson", "SAS", "SPSS"...).
Maintained by Damien Dotta. Last updated 5 months ago.
conversionconvertconvertercsvparquetsasspsssqlitestata
71 stars 7.36 score 27 scripts 1 dependents
eltebioinformatics
mulea:Enrichment Analysis Using Multiple Ontologies and False Discovery Rate
Background - Traditional gene set enrichment analyses are typically limited to a few ontologies and do not account for the interdependence of gene sets or terms, resulting in overcorrected p-values. To address these challenges, we introduce mulea, an R package offering comprehensive overrepresentation and functional enrichment analysis. Results - mulea employs a progressive empirical false discovery rate (eFDR) method, specifically designed for interconnected biological data, to accurately identify significant terms within diverse ontologies. mulea expands beyond traditional tools by incorporating a wide range of ontologies, encompassing Gene Ontology, pathways, regulatory elements, genomic locations, and protein domains. This flexibility enables researchers to tailor enrichment analysis to their specific questions, such as identifying enriched transcriptional regulators in gene expression data or overrepresented protein domains in protein sets. To facilitate seamless analysis, mulea provides gene sets (in standardised GMT format) for 27 model organisms, covering 22 ontology types from 16 databases and various identifiers resulting in almost 900 files. Additionally, the muleaData ExperimentData Bioconductor package simplifies access to these pre-defined ontologies. Finally, mulea's architecture allows for easy integration of user-defined ontologies, or GMT files from external sources (e.g., MSigDB or Enrichr), expanding its applicability across diverse research areas. Conclusions - mulea is distributed as a CRAN R package. It offers researchers a powerful and flexible toolkit for functional enrichment analysis, addressing limitations of traditional tools with its progressive eFDR and by supporting a variety of ontologies. Overall, mulea fosters the exploration of diverse biological questions across various model organisms.
Maintained by Tamas Stirling. Last updated 4 months ago.
annotationdifferentialexpressiongeneexpressiongenesetenrichmentgographandnetworkmultiplecomparisonpathwaysreactomesoftwaretranscriptionvisualizationenrichmentenrichment-analysisfunctional-enrichment-analysisgene-set-enrichmentontologiestranscriptomicscpp
28 stars 7.36 score 34 scripts
bioc
gDRimport:Package for handling the import of dose-response data
The package is a part of the gDR suite. It helps to prepare raw drug response data for downstream processing. It mainly contains helper functions for importing/loading/validating dose-response data provided in different file formats.
Maintained by Arkadiusz Gladki. Last updated 10 days ago.
softwareinfrastructuredataimport
3 stars 7.29 score 5 scripts 1 dependents
ibecav
CGPfunctions:Powell Miscellaneous Functions for Teaching and Learning Statistics
Miscellaneous functions useful for teaching statistics as well as actually practicing the art. They typically are not new methods but rather wrappers around either base R or other packages.
Maintained by Chuck Powell. Last updated 4 years ago.
27 stars 7.28 score 122 scripts
modesto-escobar
netCoin:Interactive Analytic Networks
Create interactive analytic networks. It joins the data analysis power of R to obtain coincidences, co-occurrences and correlations, and the visualization libraries of 'JavaScript' in one package.
Maintained by Modesto Escobar. Last updated 22 hours ago.
11 stars 7.22 score 47 scripts
bioc
CRISPRseek:Design of guide RNAs in CRISPR genome-editing systems
The package encompasses functions to find potential guide RNAs for the CRISPR-based genome-editing systems including the Base Editors and the Prime Editors when supplied with target sequences as input. Users have the flexibility to filter resulting guide RNAs based on parameters such as the absence of restriction enzyme cut sites or the lack of paired guide RNAs. The package also facilitates genome-wide exploration for off-targets, offering features to score and rank off-targets, retrieve flanking sequences, and indicate whether the hits are located within exon regions. All detected guide RNAs are annotated with the cumulative scores of the top5 and topN off-targets together with the detailed information such as mismatch sites and restriction enzyme cut sites. The package also outputs INDELs and their frequencies for Cas9 targeted sites.
Maintained by Lihua Julie Zhu. Last updated 19 days ago.
immunooncologygeneregulationsequencematchingcrispr
7.18 score 51 scripts 2 dependents
mwheymans
psfmi:Prediction Model Pooling, Selection and Performance Evaluation Across Multiply Imputed Datasets
Pooling, backward and forward selection of linear, logistic and Cox regression models in multiply imputed datasets. Backward and forward selection can be done from the pooled model using Rubin's Rules (RR), the D1, D2, D3, D4 and the median p-values method. This is also possible for Mixed models. The models can contain continuous, dichotomous, categorical and restricted cubic spline predictors and interaction terms between all these types of predictors. The stability of the models can be evaluated using (cluster) bootstrapping. The package further contains functions to pool model performance measures such as ROC/AUC, Reclassification, R-squared, scaled Brier score, H&L test and calibration plots for logistic regression models. Internal validation can be done across multiply imputed datasets with cross-validation or bootstrapping. The adjusted intercept after shrinkage of pooled regression coefficients can be obtained. Backward and forward selection as part of internal validation is possible. A function to externally validate logistic prediction models in multiply imputed datasets is available and a function to compare models. For Cox models a strata variable can be included. Eekhout (2017) <doi:10.1186/s12874-017-0404-7>. Wiel (2009) <doi:10.1093/biostatistics/kxp011>. Marshall (2009) <doi:10.1186/1471-2288-9-57>.
Maintained by Martijn Heymans. Last updated 2 years ago.
cox-regressionimputationimputed-datasetslogisticmultiple-imputationpoolpredictorregressionselectionsplinespline-predictors
10 stars 7.17 score 70 scripts
cardiomoon
autoReg:Automatic Linear and Logistic Regression and Survival Analysis
Make summary tables for descriptive statistics and select explanatory variables automatically in various regression models. Support linear models, generalized linear models and Cox proportional hazards models. Generate publication-ready tables summarizing the results of regression analysis and plots. The tables and plots can be exported in "HTML", "pdf('LaTex')", "docx('MS Word')" and "pptx('MS Powerpoint')" documents.
Maintained by Keon-Woong Moon. Last updated 1 year ago.
49 stars 7.13 score 69 scripts
roelandkindt
BiodiversityR:Package for Community Ecology and Suitability Analysis
Graphical User Interface (via the R-Commander) and utility functions (often based on the vegan package) for statistical analysis of biodiversity and ecological communities, including species accumulation curves, diversity indices, Renyi profiles, GLMs for analysis of species abundance and presence-absence, distance matrices, Mantel tests, and cluster, constrained and unconstrained ordination analysis. A book on biodiversity and community ecology analysis is available for free download from the website. In 2012, methods for (ensemble) suitability modelling and mapping were expanded in the package.
Maintained by Roeland Kindt. Last updated 2 months ago.
17 stars 7.13 score 390 scripts 2 dependents
openanalytics
clinDataReview:Clinical Data Review Tool
Creation of interactive tables, listings and figures ('TLFs') and associated report for exploratory analysis of data in a clinical trial, e.g. for clinical oversight activities. Interactive figures include sunburst, treemap, scatterplot, line plot and barplot of counts data. Interactive tables include table of summary statistics (as counts of adverse events, enrollment table) and listings. Possibility to compare data (summary table or listing) across two data batches/sets. A clinical data review report is created via study-specific configuration files and template 'R Markdown' reports contained in the package.
Maintained by Laure Cougnaud. Last updated 10 months ago.
11 stars 7.10 score 36 scripts
chaisemartinpackages
DIDmultiplegtDYN:Estimation in Difference-in-Difference Designs with Multiple Groups and Periods
Estimation of heterogeneity-robust difference-in-differences estimators, with a binary, discrete, or continuous treatment, in designs where past treatments may affect the current outcome.
Maintained by Diego Ciccia. Last updated 3 days ago.
42 stars 7.10 score 19 scripts 1 dependents
farrellday
miceRanger:Multiple Imputation by Chained Equations with Random Forests
Multiple Imputation has been shown to be a flexible method to impute missing values by Van Buuren (2007) <doi:10.1177/0962280206074463>. Expanding on this, random forests have been shown to be an accurate model by Stekhoven and Buhlmann <arXiv:1105.0828> to impute missing values in datasets. They have the added benefits of returning out of bag error and variable importance estimates, as well as being simple to run in parallel.
Maintained by Sam Wilson. Last updated 3 years ago.
imputation-methodsmachine-learningmicemissing-datamissing-valuesrandom-forests
67 stars 7.09 score 41 scripts 1 dependents
john-harrold
ruminate:A Pharmacometrics Data Transformation and Analysis Tool
Exploration of pharmacometrics data involves both general tools (transformation and plotting) and specific techniques (non-compartmental analysis). This kind of exploration is generally accomplished by utilizing different packages. The purpose of 'ruminate' is to create a 'shiny' interface to make these tools more broadly available while creating reproducible results.
Maintained by John Harrold. Last updated 19 days ago.
2 stars 7.06 score 84 scripts
john-d-fox
RcmdrMisc:R Commander Miscellaneous Functions
Various statistical, graphics, and data-management functions used by the Rcmdr package in the R Commander GUI for R.
Maintained by John Fox. Last updated 2 years ago.
1 stars 7.02 score 432 scripts 42 dependents
bioc
musicatk:Mutational Signature Comprehensive Analysis Toolkit
Mutational signatures are carcinogenic exposures or aberrant cellular processes that can cause alterations to the genome. We created musicatk (MUtational SIgnature Comprehensive Analysis ToolKit) to address shortcomings in versatility and ease of use in other pre-existing computational tools. Although many different types of mutational data have been generated, current software packages do not have a flexible framework to allow users to mix and match different types of mutations in the mutational signature inference process. Musicatk enables users to count and combine multiple mutation types, including SBS, DBS, and indels. Musicatk calculates replication strand, transcription strand and combinations of these features along with discovery from unique and proprietary genomic feature associated with any mutation type. Musicatk also implements several methods for discovery of new signatures as well as methods to infer exposure given an existing set of signatures. Musicatk provides functions for visualization and downstream exploratory analysis including the ability to compare signatures between cohorts and find matching signatures in COSMIC V2 or COSMIC V3.
Maintained by Joshua D. Campbell. Last updated 5 months ago.
softwarebiologicalquestionsomaticmutationvariantannotation
13 stars 6.97 score 20 scripts
cmerow
rangeModelMetadata:Provides Templates for Metadata Files Associated with Species Range Models
Range Modeling Metadata Standards (RMMS) address three challenges: they (i) are designed for convenience to encourage use, (ii) accommodate a wide variety of applications, and (iii) are extensible to allow the community of range modelers to steer it as needed. RMMS are based on a data dictionary that specifies a hierarchical structure to catalog different aspects of the range modeling process. The dictionary balances a constrained, minimalist vocabulary to improve standardization with flexibility for users to provide their own values. Merow et al. (2019) <DOI:10.1111/geb.12993> describe the standards in more detail. Note that users who prefer to use the R package 'ecospat' can obtain it from <https://github.com/ecospat/ecospat>.
Maintained by Cory Merow. Last updated 8 months ago.
ecological-metadata-languageecological-modellingecological-modelsecologyspecies-distribution-modellingspecies-distributions
6 stars 6.96 score 16 scripts 3 dependents
insightsengineering
tern.gee:Tables and Graphs for Generalized Estimating Equations (GEE) Model Fits
Generalized estimating equations (GEE) are a popular choice for analyzing longitudinal binary outcomes. This package provides an interface for fitting GEE, currently for logistic regression, within the 'tern' <https://cran.r-project.org/package=tern> framework (Zhu, Sabanés Bové et al., 2023) and tabulate results easily using 'rtables' <https://cran.r-project.org/package=rtables> (Becker, Waddell et al., 2023). It builds on 'geepack' <doi:10.18637/jss.v015.i02> (Højsgaard, Halekoh and Yan, 2006) for the actual GEE model fitting.
Maintained by Joe Zhu. Last updated 7 months ago.
8 stars 6.94 score 3 scripts 1 dependents
danlwarren
ENMTools:Analysis of Niche Evolution using Niche and Distribution Models
Constructing niche models and analyzing patterns of niche evolution. Acts as an interface for many popular modeling algorithms, and allows users to conduct Monte Carlo tests to address basic questions in evolutionary ecology and biogeography. Warren, D.L., R.E. Glor, and M. Turelli (2008) <doi:10.1111/j.1558-5646.2008.00482.x> Glor, R.E., and D.L. Warren (2011) <doi:10.1111/j.1558-5646.2010.01177.x> Warren, D.L., R.E. Glor, and M. Turelli (2010) <doi:10.1111/j.1600-0587.2009.06142.x> Cardillo, M., and D.L. Warren (2016) <doi:10.1111/geb.12455> D.L. Warren, L.J. Beaumont, R. Dinnage, and J.B. Baumgartner (2019) <doi:10.1111/ecog.03900>.
Maintained by Dan Warren. Last updated 3 months ago.
105 stars 6.91 score 126 scripts
ropensci
essurvey:Download Data from the European Social Survey on the Fly
Download data from the European Social Survey directly from their website <http://www.europeansocialsurvey.org/>. There are two families of functions that allow you to download and interactively check all countries and rounds available.
Maintained by Jorge Cimentada. Last updated 3 years ago.
48 stars 6.88 score 79 scripts
svmiller
stevemisc:Steve's Miscellaneous Functions
These are miscellaneous functions that I find useful for my research and teaching. The contents include themes for plots, functions for simulating quantities of interest from regression models, functions for simulating various forms of fake data for instructional/research purposes, and many more. All told, the functions provided here are broadly useful for data organization, data presentation, data recoding, and data simulation.
Maintained by Steve Miller. Last updated 18 days ago.
dplyrmixed-effects-modelsmultivariate-normal-distributiontidyverse
10 stars 6.85 score 392 scripts 2 dependents
raymondbalise
rUM:R Templates from the University of Miami
This package holds some R Markdown and Quarto templates and a template to create a research project in "R Studio".
Maintained by Raymond Balise. Last updated 8 days ago.
9 stars 6.84 score 16 scripts
stla
qspray:Multivariate Polynomials with Rational Coefficients
Symbolic calculation and evaluation of multivariate polynomials with rational coefficients. This package is strongly inspired by the 'spray' package. It provides a function to compute Gröbner bases (reference <doi:10.1007/978-3-319-16721-3>). It also includes some features for symmetric polynomials, such as the Hall inner product. The header file of the C++ code can be used by other packages. It provides the templated class 'Qspray' that can be used to represent and to deal with multivariate polynomials with another type of coefficients.
Maintained by Stéphane Laurent. Last updated 7 months ago.
4 stars 6.81 score 152 scripts 5 dependents
cardiomoon
webr:Data and Functions for Web-Based Analysis
Several analysis-related functions for the book entitled "Web-based Analysis without R in Your Computer" (written in Korean, ISBN 978-89-5566-185-9) by Keon-Woong Moon. The main function plot.htest() shows the distribution of the statistic for an object of class 'htest'.
Maintained by Keon-Woong Moon. Last updated 5 years ago.
33 stars 6.80 score 181 scripts
openanalytics
clinUtils:General Utility Functions for Analysis of Clinical Data
Utility functions to facilitate the import, the reporting and analysis of clinical data. Example datasets in 'SDTM' and 'ADaM' format, containing a subset of patients/domains from the 'CDISC Pilot 01 study' are also available as R datasets to demonstrate the package functionalities.
Maintained by Laure Cougnaud. Last updated 11 months ago.
3 stars 6.78 score 105 scripts 3 dependents
michaellli
evalITR:Evaluating Individualized Treatment Rules
Provides various statistical methods for evaluating Individualized Treatment Rules under randomized data. The provided metrics include Population Average Value (PAV), Population Average Prescription Effect (PAPE), Area Under Prescription Effect Curve (AUPEC). It also provides the tools to analyze Individualized Treatment Rules under budget constraints. Detailed reference in Imai and Li (2019) <arXiv:1905.05389>.
Maintained by Michael Lingzhi Li. Last updated 2 years ago.
14 stars 6.78 score 36 scripts
harrison4192
autostats:Auto Stats
Automatically do statistical exploration. Create formulas using 'tidyselect' syntax, and then determine cross-validated model accuracy and variable contributions using 'glm' and 'xgboost'. Contains additional helper functions to create and modify formulas. Has a flagship function to quickly determine relationships between categorical and continuous variables in the data set.
Maintained by Harrison Tietze. Last updated 24 days ago.
6 stars 6.76 score 5 scripts 2 dependents
big-life-lab
recodeflow:Contains functions to interface with variable details sheets, including recoding variables and converting them to PMML
Recode and harmonize data using variable and details sheets.
Maintained by Yulric Sequeria. Last updated 19 days ago.
6 stars 6.75 score 7 scripts
paytonjjones
networktools:Tools for Identifying Important Nodes in Networks
Includes assorted tools for network analysis. Bridge centrality; goldbricker; MDS, PCA, & eigenmodel network plotting.
Maintained by Payton Jones. Last updated 1 month ago.
10 stars 6.75 score 93 scripts 5 dependents
richardhooijmaijers
shinyMixR:Interactive 'shiny' Dashboard for 'nlmixr2'
An R shiny user interface for the 'nlmixr2' (Fidler et al (2019) <doi:10.1002/psp4.12445>) package, designed to simplify the modeling process for users. Additionally, this package includes supplementary functions to further enhance the usage of 'nlmixr2'.
Maintained by Richard Hooijmaijers. Last updated 5 months ago.
11 stars 6.74 score 28 scripts
marsicofl
mispitools:Missing Person Identification Tools
An open source software package written in R statistical language. It consists of a set of decision-making tools to conduct missing person searches. Particularly, it allows computing optimal LR threshold for declaring potential matches in DNA-based database search. More recently 'mispitools' incorporates preliminary investigation data based LRs. Statistical weight of different traces of evidence such as biological sex, age and hair color are presented. For citing mispitools please use the following references: Marsico and Caridi, 2023 <doi:10.1016/j.fsigen.2023.102891> and Marsico, Vigeland et al. 2021 <doi:10.1016/j.fsigen.2021.102519>.
Maintained by Franco Marsico. Last updated 3 months ago.
35 stars 6.74 score 19 scripts 1 dependents
imbi-heidelberg
DescrTab2:Publication Quality Descriptive Statistics Tables
Provides functions to create descriptive statistics tables for continuous and categorical variables. By default, summary statistics such as mean, standard deviation, quantiles, minimum and maximum for continuous variables and relative and absolute frequencies for categorical variables are calculated. 'DescrTab2' features a sophisticated algorithm to choose appropriate test statistics for your data and provides p-values. On top of this, confidence intervals for group differences of appropriate summary measures are automatically produced for two-group comparisons. Tables generated by 'DescrTab2' can be integrated in a variety of document formats, including .html, .tex and .docx documents. 'DescrTab2' also allows printing tables to console and saving table objects for later use.
Maintained by Jan Meis. Last updated 1 year ago.
categorical-variablescontinuous-variabledescriptive-statisticsp-valuesstatistical-testsstatistics
9 stars 6.71 score 19 scripts 1 dependents
harrison4192
presenter:Present Data with Style
Consists of custom wrapper functions using packages 'openxlsx', 'flextable', and 'officer' to create highly formatted MS office friendly output of your data frames. These viewer friendly outputs are intended to match expectations of professional looking presentations in business and consulting scenarios. The functions are opinionated in the sense that they expect the input data frame to have certain properties in order to take advantage of the automated formatting.
Maintained by Harrison Tietze. Last updated 2 years ago.
11 stars 6.69 score 15 scripts 4 dependentscarriedaymont
growthcleanr:Data Cleaner for Anthropometric Measurements
Identifies implausible anthropometric (e.g., height, weight) measurements in irregularly spaced longitudinal datasets, such as those from electronic health records.
Maintained by Carrie Daymont. Last updated 29 days ago.
14 stars 6.68 score 41 scripts 1 dependentsagnesdeng
mixgb:Multiple Imputation Through 'XGBoost'
Multiple imputation using 'XGBoost', subsampling, and predictive mean matching as described in Deng and Lumley (2023) <doi:10.1080/10618600.2023.2252501>. The package supports various types of variables, offers flexible settings, and enables saving an imputation model to impute new data. Data processing and memory usage have been optimised to speed up the imputation process.
Maintained by Yongshi Deng. Last updated 2 months ago.
23 stars 6.58 score 82 scriptsstamats
MKinfer:Inferential Statistics
Computation of various confidence intervals (Altman et al. (2000), ISBN:978-0-727-91375-3; Hedderich and Sachs (2018), ISBN:978-3-662-56657-2) including bootstrapped versions (Davison and Hinkley (1997), ISBN:978-0-511-80284-3) as well as Hsu (Hedderich and Sachs (2018), ISBN:978-3-662-56657-2), permutation (Janssen (1997), <doi:10.1016/S0167-7152(97)00043-6>), bootstrap (Davison and Hinkley (1997), ISBN:978-0-511-80284-3), intersection-union (Sozu et al. (2015), ISBN:978-3-319-22005-5) and multiple imputation (Barnard and Rubin (1999), <doi:10.1093/biomet/86.4.948>) t-test; furthermore, computation of intersection-union z-test as well as multiple imputation Wilcoxon tests. Graphical visualization by volcano and Bland-Altman plots (Bland and Altman (1986), <doi:10.1016/S0140-6736(86)90837-8>; Shieh (2018), <doi:10.1186/s12874-018-0505-y>).
Maintained by Matthias Kohl. Last updated 12 months ago.
6 stars 6.56 score 71 scripts 4 dependentsbioc
methylclock:Methylclock - DNA methylation-based clocks
This package estimates chronological and gestational DNA methylation (DNAm) age as well as biological age using different methylation clocks. Chronological DNAm age (in years): Horvath's clock, Hannum's clock, BNN, Horvath's skin+blood clock, PedBE clock and Wu's clock. Gestational DNAm age: Knight's clock, Bohlin's clock, Mayne's clock and Lee's clocks. Biological DNAm clocks: Levine's clock and Telomere Length's clock.
Maintained by Dolors Pelegri-Siso. Last updated 5 months ago.
dnamethylationbiologicalquestionpreprocessingstatisticalmethodnormalizationcpp
39 stars 6.52 score 28 scriptshuanglabumn
oncoPredict:Drug Response Modeling and Biomarker Discovery
Allows for building drug response models using screening data between bulk RNA-Seq and a drug response metric, and provides two additional tools for biomarker discovery that have been developed by the Huang Laboratory at University of Minnesota. There are 3 main functions within this package. (1) calcPhenotype is used to build drug response models on RNA-Seq data and impute them on any other RNA-Seq dataset given to the model. (2) GLDS is used to calculate the general level of drug sensitivity, which can improve biomarker discovery. (3) IDWAS can take the results from calcPhenotype and link the imputed response back to available genomic data (mutation and CNV alterations) to identify biomarkers. Each of these functions comes from a paper from the Huang research laboratory. The relevant paper for each function is given below. calcPhenotype - Geeleher et al, Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. GLDS - Geeleher et al, Cancer biomarker discovery is improved by accounting for variability in general levels of drug sensitivity in pre-clinical models. IDWAS - Geeleher et al, Discovering novel pharmacogenomic biomarkers by imputing drug response in cancer patients from large genomics studies.
Maintained by Robert Gruener. Last updated 12 months ago.
svapreprocesscorestringrbiomartgenefilterorg.hs.eg.dbgenomicfeaturestxdb.hsapiens.ucsc.hg19.knowngenetcgabiolinksbiocgenericsgenomicrangesirangess4vectors
18 stars 6.47 score 41 scriptshusson
Factoshiny:Perform Factorial Analysis from 'FactoMineR' with a Shiny Application
Perform factorial analysis with a menu and draw graphs interactively thanks to 'FactoMineR' and a Shiny application.
Maintained by Francois Husson. Last updated 2 months ago.
9 stars 6.46 score 152 scriptscardiomoon
rrtable:Reproducible Research with a Table of R Codes
Makes documents containing plots and tables from a table of R codes. Can make "HTML", "pdf('LaTex')", "docx('MS Word')" and "pptx('MS Powerpoint')" documents with or without R code. In the package, modularized 'shiny' app codes are provided. These modules are intended for reuse across applications.
Maintained by Keon-Woong Moon. Last updated 2 years ago.
3 stars 6.45 score 76 scripts 2 dependentsagrdatasci
gosset:Tools for Data Analysis in Experimental Agriculture
Methods to analyse experimental agriculture data, from data synthesis to model selection and visualisation. The package is named after W.S. Gosset aka ‘Student’, a pioneer of modern statistics in small sample experimental design and analysis.
Maintained by Kauê de Sousa. Last updated 4 months ago.
experimental-designrankings-data
6 stars 6.44 score 23 scriptsbioc
gwasurvivr:gwasurvivr: an R package for genome wide survival analysis
gwasurvivr is a package to perform survival analysis using Cox proportional hazard models on imputed genetic data.
Maintained by Abbas Rizvi. Last updated 5 months ago.
genomewideassociationsurvivalregressiongeneticssnpgeneticvariabilitypharmacogenomicsbiomedicalinformatics
12 stars 6.43 score 75 scriptskjhealy
gssr:US General Social Survey (GSS) Data for R
The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains the GSS Cumulative Data and GSS Panel Data files packaged for R. Its companion package, gssrdoc, provides the codebook integrated into R's help system. For more information on the GSS see <http://gss.norc.org>.
Maintained by Kieran Healy. Last updated 5 months ago.
45 stars 6.42 score 147 scriptsanthonydevaux
DynForest:Random Forest with Multivariate Longitudinal Predictors
Based on the random forest principle, 'DynForest' is able to include multiple longitudinal predictors to provide individual predictions. Longitudinal predictors are modeled through the random forest. The methodology is fully described for a survival outcome in: Devaux, Helmer, Genuer & Proust-Lima (2023) <doi:10.1177/09622802231206477>.
Maintained by Anthony Devaux. Last updated 5 months ago.
16 stars 6.38 score 8 scriptsacaimo
Bergm:Bayesian Exponential Random Graph Models
Bayesian analysis for exponential random graph models using advanced computational algorithms. More information can be found at: <https://acaimo.github.io/Bergm/>.
Maintained by Alberto Caimo. Last updated 2 months ago.
16 stars 6.37 score 31 scripts 4 dependentsjacobkap
asciiSetupReader:Reads Fixed-Width ASCII Data Files (.txt or .dat) that Have Accompanying Setup Files (.sps or .sas)
Lets you open a fixed-width ASCII file (.txt or .dat) that has an accompanying setup file (.sps or .sas). These file combinations are sometimes referred to as .txt+.sps, .txt+.sas, .dat+.sps, or .dat+.sas. This will only run on a txt-sps or txt-sas pair in which the setup file contains instructions to open that text file. It will NOT open other text files, .sav, .sas, or .por data files. Fixed-width ASCII files with setup files are common in older (pre-2000) government data.
Maintained by Jacob Kaplan. Last updated 8 months ago.
asciidatdata-readerfixed-widthfixed-width-parserfixed-width-tablesfixed-width-textsasspss
11 stars 6.34 score 22 scripts 1 dependentssbg
sevenbridges2:The 'Seven Bridges Platform' API Client
R client and utilities for 'Seven Bridges Platform' API, from 'Cancer Genomics Cloud' to other 'Seven Bridges' supported platforms. API documentation is hosted publicly at <https://docs.sevenbridges.com/docs/the-api>.
Maintained by Marko Trifunovic. Last updated 4 days ago.
api-clientbioinformaticscloudsevenbridges
3 stars 6.32 score 4 scriptsphuse-org
sendigR:Enable Cross-Study Analysis of 'CDISC' 'SEND' Datasets
A system that enables cross-study analysis by extracting and filtering study data for control animals from a 'CDISC' 'SEND' study repository. These data types are supported: body weights, laboratory test results and microscopic findings. These database types are supported: 'SQLite' and 'Oracle'.
Maintained by Wenxian Wang. Last updated 23 days ago.
12 stars 6.28 score 6 scriptsbioc
RAIDS:Accurate Inference of Genetic Ancestry from Cancer Sequences
This package implements specialized algorithms that enable genetic ancestry inference from various cancer sequence sources (RNA, Exome and Whole-Genome sequences). This package also implements a simulation algorithm that generates synthetic cancer-derived data. This code and analysis pipeline was designed and developed for the following publication: Belleau, P et al. Genetic Ancestry Inference from Cancer-Derived Molecular Data across Genomic and Transcriptomic Platforms. Cancer Res 1 January 2023; 83 (1): 49–58.
Maintained by Pascal Belleau. Last updated 5 months ago.
geneticssoftwaresequencingwholegenomeprincipalcomponentgeneticvariabilitydimensionreductionbiocviewsancestrycancer-genomicsexome-sequencinggenomicsinferencer-languagerna-seqrna-sequencingwhole-genome-sequencing
5 stars 6.23 score 19 scriptssjmack
HLAtools:Toolkit for HLA Immunogenomics
A toolkit for the analysis and management of data for genes in the so-called "Human Leukocyte Antigen" (HLA) region. Functions extract reference data from the Anthony Nolan HLA Informatics Group/ImmunoGeneTics HLA 'GitHub' repository (ANHIG/IMGTHLA) <https://github.com/ANHIG/IMGTHLA>, validate Genotype List (GL) Strings, convert between UNIFORMAT and GL String Code (GLSC) formats, translate HLA alleles and GLSCs across ImmunoPolymorphism Database (IPD) IMGT/HLA Database release versions, identify differences between pairs of alleles at a locus, generate customized, multi-position sequence alignments, trim and convert allele-names across nomenclature epochs, and extend existing data-analysis methods.
Maintained by Steven Mack. Last updated 26 days ago.
4 stars 6.21 score 7 scripts 1 dependentsjmping
weights:Weighting and Weighted Statistics
Provides a variety of functions for producing simple weighted statistics, such as weighted Pearson's correlations, partial correlations, Chi-Squared statistics, histograms, and t-tests. Also now includes some software for quickly recoding survey data and plotting estimates from interaction terms in regressions (and multiply imputed regressions) both with and without weights. NOTE: Weighted partial correlation calculations pulled to address a bug.
Maintained by Josh Pasek. Last updated 4 years ago.
6.20 score 590 scripts 40 dependentsekstroem
dataReporter:Reproducible Data Screening Checks and Report of Possible Errors
Data screening is an important first step of any statistical analysis. 'dataReporter' automatically generates a customizable data report with a thorough summary of the checks and the results that a human can use to identify possible errors. It provides an extendable suite of tests for common potential errors in a dataset. See Petersen AH, Ekstrøm CT (2019). "dataMaid: Your Assistant for Documenting Supervised Data Quality Screening in R." _Journal of Statistical Software_, *90*(6), 1-38 <doi:10.18637/jss.v090.i06> for more information.
Maintained by Claus Thorn Ekstrøm. Last updated 2 years ago.
86 stars 6.16 score 34 scriptsnataliepatten
gatoRs:Geographic and Taxonomic Occurrence R-Based Scrubbing
Streamlines downloading and cleaning biodiversity data from Integrated Digitized Biocollections (iDigBio) and the Global Biodiversity Information Facility (GBIF).
Maintained by Natalie N. Patten. Last updated 11 months ago.
11 stars 6.16 score 66 scriptsjoliencremers
bpnreg:Bayesian Projected Normal Regression Models for Circular Data
Fitting Bayesian multiple and mixed-effect regression models for circular data based on the projected normal distribution. Both continuous and categorical predictors can be included. Sampling from the posterior is performed via an MCMC algorithm. Posterior descriptives of all parameters, model fit statistics and Bayes factors for hypothesis tests for inequality constrained hypotheses are provided. See Cremers, Mulder & Klugkist (2018) <doi:10.1111/bmsp.12108> and Nuñez-Antonio & Gutiérrez-Peña (2014) <doi:10.1016/j.csda.2012.07.025>.
Maintained by Jolien Cremers. Last updated 1 year ago.
14 stars 6.15 score 101 scriptscalcita
ech:Downloading and Processing Microdata from ECH-INE (Uruguay)
A consistent tool for downloading ECH data, processing them and generating new indicators: poverty, education, employment, etc. All data are downloaded from the official site of the National Institute of Statistics at <https://www.gub.uy/instituto-nacional-estadistica/datos-y-estadisticas/encuestas/encuesta-continua-hogares>.
Maintained by Gabriela Mathieu. Last updated 1 year ago.
16 stars 6.15 score 22 scriptsnlmixr2
babelmixr2:Use 'nlmixr2' to Interact with Open Source and Commercial Software
Run other estimation and simulation software via the 'nlmixr2' (Fidler et al (2019) <doi:10.1002/psp4.12445>) interface including 'PKNCA', 'NONMEM' and 'Monolix'. While not required, you can get/install the 'lixoftConnectors' package in the 'Monolix' installation, as described at the following url <https://monolixsuite.slp-software.com/r-functions/2024R1/installation-and-initialization>. When 'lixoftConnectors' is available, 'Monolix' can be run directly instead of setting up command line usage.
Maintained by Matthew Fidler. Last updated 18 days ago.
monolixnonmempharmacometricscpp
9 stars 6.11 score 53 scriptsbig-life-lab
cchsflow:Transforming and Harmonizing CCHS Variables
Supports the use of the Canadian Community Health Survey (CCHS) by transforming variables from each cycle into harmonized, consistent versions that span survey cycles (currently, 2001 to 2018). CCHS data used in this library is accessed and adapted in accordance with the Statistics Canada Open Licence Agreement. This package uses rec_with_table(), which was developed from 'sjmisc' rec(). Lüdecke D (2018). "sjmisc: Data and Variable Transformation Functions". Journal of Open Source Software, 3(26), 754. <doi:10.21105/joss.00754>.
Maintained by Kitty Chen. Last updated 1 year ago.
12 stars 6.02 score 192 scriptsdanchaltiel
EDCimport:Import Data from EDC Software
A convenient toolbox to import data exported from Electronic Data Capture (EDC) software 'TrialMaster'.
Maintained by Dan Chaltiel. Last updated 19 days ago.
6.01 score 12 scriptseu-ecdc
epitweetr:Early Detection of Public Health Threats from 'Twitter' Data
It allows you to automatically monitor trends of tweets by time, place and topic, aiming at detecting public health threats early through the detection of signals (e.g. an unusual increase in the number of tweets). It was designed to focus on infectious diseases, and it can be extended to all hazards or other fields of study by modifying the topics and keywords. More information is available in the 'epitweetr' peer-reviewed publication (doi:10.2807/1560-7917.ES.2022.27.39.2200177).
Maintained by Laura Espinosa. Last updated 1 year ago.
early-warning-systemsepidemic-surveillancelucenemachine-learningsignal-detectionsparktwitter
56 stars 5.98 score 86 scriptsgbganalyst
bulkreadr:The Ultimate Tool for Reading Data in Bulk
Designed to simplify and streamline the process of reading and processing large volumes of data in R, this package offers a collection of functions tailored for bulk data operations. It enables users to efficiently read multiple sheets from Microsoft Excel and Google Sheets workbooks, as well as various CSV files from a directory. The data is returned as organized data frames, facilitating further analysis and manipulation. Ideal for handling extensive data sets or batch processing tasks, bulkreadr empowers users to manage data in bulk effortlessly, saving time and effort in data preparation workflows. Additionally, the package seamlessly works with labelled data from SPSS and Stata.
Maintained by Ezekiel Ogundepo. Last updated 7 months ago.
bulkreadercsv-readerdata-importgooglesheetsmissing-valuesxlsxreader
12 stars 5.94 score 12 scriptsbioc
SCOPE:A normalization and copy number estimation method for single-cell DNA sequencing
Whole genome single-cell DNA sequencing (scDNA-seq) enables characterization of copy number profiles at the cellular level. This circumvents the averaging effects associated with bulk-tissue sequencing and has increased resolution yet decreased ambiguity in deconvolving cancer subclones and elucidating cancer evolutionary history. ScDNA-seq data is, however, sparse, noisy, and highly variable even within a homogeneous cell population, due to the biases and artifacts that are introduced during the library preparation and sequencing procedure. Here, we propose SCOPE, a normalization and copy number estimation method for scDNA-seq data. The distinguishing features of SCOPE include: (i) utilization of cell-specific Gini coefficients for quality controls and for identification of normal/diploid cells, which are further used as negative control samples in a Poisson latent factor model for normalization; (ii) modeling of GC content bias using an expectation-maximization algorithm embedded in the Poisson generalized linear models, which accounts for the different copy number states along the genome; (iii) a cross-sample iterative segmentation procedure to identify breakpoints that are shared across cells from the same genetic background.
Maintained by Rujin Wang. Last updated 5 months ago.
singlecellnormalizationcopynumbervariationsequencingwholegenomecoveragealignmentqualitycontroldataimportdnaseq
5.92 score 84 scriptsswissclinicaltrialorganisation
secuTrialR:Handling of Data from the Clinical Data Management System 'secuTrial'
Seamless and standardized interaction with data exported from the clinical data management system (CDMS) 'secuTrial' <https://www.secutrial.com>. The primary data export the package works with is a standard non-rectangular export.
Maintained by Alan G. Haynes. Last updated 10 months ago.
9 stars 5.91 score 15 scriptsbruigtp
REDCapDM:'REDCap' Data Management
'REDCapDM' is an R package for managing data exported directly from 'REDCap' or retrieved through an API connection. It includes several functions for pre-processing data, generating reports of queries such as outliers or missing values, and following up on the identified queries. 'REDCap' (Research Electronic Data CAPture; <https://projectredcap.org>) is a web application developed at Vanderbilt University for creating and managing online surveys and databases. The REDCap API is an interface that allows external applications to connect to REDCap remotely and to programmatically retrieve or modify project data or settings within REDCap, such as importing or exporting data.
Maintained by João Carmezim. Last updated 15 days ago.
4 stars 5.89 score 9 scriptstirgit
missCompare:Intuitive Missing Data Imputation Framework
Offers a convenient pipeline to test and compare various missing data imputation algorithms on simulated and real data. These include simpler methods, such as mean and median imputation and random replacement, but also include more sophisticated algorithms already implemented in popular R packages, such as 'mi', described by Su et al. (2011) <doi:10.18637/jss.v045.i02>; 'mice', described by van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>; 'missForest', described by Stekhoven and Buhlmann (2012) <doi:10.1093/bioinformatics/btr597>; 'missMDA', described by Josse and Husson (2016) <doi:10.18637/jss.v070.i01>; and 'pcaMethods', described by Stacklies et al. (2007) <doi:10.1093/bioinformatics/btm069>. The central assumption behind 'missCompare' is that structurally different datasets (e.g. larger datasets with a large number of correlated variables vs. smaller datasets with non correlated variables) will benefit differently from different missing data imputation algorithms. 'missCompare' takes measurements of your dataset and sets up a sandbox to try a curated list of standard and sophisticated missing data imputation algorithms and compares them assuming custom missingness patterns. 'missCompare' will also impute your real-life dataset for you after the selection of the best performing algorithm in the simulations. The package also provides various post-imputation diagnostics and visualizations to help you assess imputation performance.
Maintained by Tibor V. Varga. Last updated 4 years ago.
comparisoncomparison-benchmarksimputationimputation-algorithmimputation-methodsimputationskolmogorov-smirnovmissingmissing-datamissing-data-imputationmissing-status-checkmissing-valuesmissingnesspost-imputation-diagnosticsrmse
39 stars 5.89 score 40 scriptsflr
FLBEIA:Bio-Economic Impact Assessment of Management Strategies using FLR
A simulation toolbox that describes a fishery system under a Management Strategy Evaluation approach. The objective of the model is to facilitate the Bio-Economic evaluation of Management strategies. It is multistock, multifleet and seasonal. The simulation is divided into 2 main blocks, the Operating Model (OM) and the Management Procedure (MP). In turn, each of these two blocks is divided into 3 components: the biological, the fleets and the covariables on the one hand, and the observation, the assessment and the advice on the other.
Maintained by FLBEIA Team. Last updated 17 days ago.
11 stars 5.89 score 156 scriptsbioc
miRspongeR:Identification and analysis of miRNA sponge regulation
This package provides several functions to explore miRNA sponge (also called ceRNA or miRNA decoy) regulation from putative miRNA-target interactions and/or transcriptomics data (including bulk, single-cell and spatial gene expression data). It provides eight popular methods for identifying miRNA sponge interactions and an integrative method to combine miRNA sponge interactions from different methods, as well as functions to validate miRNA sponge interactions, infer miRNA sponge modules, and conduct enrichment and survival analyses of miRNA sponge modules. By using a sample control variable strategy, it provides a function to infer sample-specific miRNA sponge interactions. In terms of sample-specific miRNA sponge interactions, it implements three similarity methods to construct a sample-sample correlation network.
Maintained by Junpeng Zhang. Last updated 5 months ago.
geneexpressionbiomedicalinformaticsnetworkenrichmentsurvivalmicroarraysoftwaresinglecellspatialrnaseqcernamirnasponge
5 stars 5.88 score 8 scriptsedhofman
ReSurv:Machine Learning Models For Predicting Claim Counts
Prediction of claim counts using the feature based development factors introduced in the manuscript <doi:10.48550/arXiv.2312.14549>. Implementation of Neural Networks, Extreme Gradient Boosting, and Cox model with splines to optimise the partial log-likelihood of proportional hazard models.
Maintained by Emil Hofman. Last updated 5 months ago.
2 stars 5.87 score 21 scriptsdopatendo
ILSAmerge:Merge and Download International Large-Scale Assessments (ILSA) Data
Merges and downloads 'SPSS' data from different International Large-Scale Assessments (ILSA), including: Trends in International Mathematics and Science Study (TIMSS), Progress in International Reading Literacy Study (PIRLS), and others.
Maintained by Andrés Christiansen. Last updated 1 month ago.
2 stars 5.86 score 12 scriptsdickoa
robotoolbox:Client for the 'KoboToolbox' API
Suite of utilities for accessing and manipulating data from the 'KoboToolbox' API. 'KoboToolbox' is a robust platform designed for field data collection in various disciplines. This package aims to simplify the process of fetching and handling data from the API. Detailed documentation for the 'KoboToolbox' API can be found at <https://support.kobotoolbox.org/api.html>.
Maintained by Ahmadou Dicko. Last updated 3 months ago.
open-datakobotoolboxodkkpiapidatadataset
5.86 score 48 scriptscardiomoon
ggplotAssist:'RStudio' Addin for Teaching and Learning 'ggplot2'
An 'RStudio' addin for teaching and learning how to make plots using the 'ggplot2' package. You can learn each step of making a plot by clicking your mouse, without coding. You can get the resulting code for the plot.
Maintained by Keon-Woong Moon. Last updated 7 years ago.
79 stars 5.85 score 18 scriptshsvab
odbr:Download Data from Brazil's Origin Destination Surveys
Download data from Brazil's Origin Destination Surveys. The package covers data from household travel surveys, dictionaries of variables, and the spatial geometries of surveys conducted in different years and across various urban areas in Brazil. For some cities, the package will include enhanced versions of the data sets with variables "harmonized" across different years.
Maintained by Haydee Svab. Last updated 1 month ago.
16 stars 5.85 score 11 scriptsrpruim
fastR2:Foundations and Applications of Statistics Using R (2nd Edition)
Data sets and utilities to accompany the second edition of "Foundations and Applications of Statistics: an Introduction using R" (R Pruim, published by AMS, 2017), a text covering topics from probability and mathematical statistics at an advanced undergraduate level. R is integrated throughout, and access to all the R code in the book is provided via the snippet() function.
Maintained by Randall Pruim. Last updated 1 year ago.
13 stars 5.85 score 108 scriptsbayer-group
adepro:A 'shiny' Application for the (Audio-)Visualization of Adverse Event Profiles
Contains a 'shiny' application called AdEPro (Animation of Adverse Event Profiles) which (audio-)visualizes adverse events occurring in clinical trials. As this data is usually considered sensitive, this tool is provided as a stand-alone application that can be launched from any local machine on which the data is stored.
Maintained by Nicole Rethemeier. Last updated 5 days ago.
adverse-eventsbayer-not-classifiedbayer-reg-nonebeat-not-applicableclinical-trialsdata-insightsshiny-appsvisualization
7 stars 5.84 score 11 scriptsbioc
ISAnalytics:Analyze gene therapy vector insertion sites data identified from genomics next generation sequencing reads for clonal tracking studies
In gene therapy, stem cells are modified using viral vectors to deliver the therapeutic transgene and replace functional properties since the genetic modification is stable and inherited in all cell progeny. The retrieval and mapping of the sequences flanking the virus-host DNA junctions allows the identification of insertion sites (IS), essential for monitoring the evolution of genetically modified cells in vivo. A comprehensive toolkit for the analysis of IS is required to foster clonal tracking studies and support the assessment of safety and long-term efficacy in vivo. This package is aimed at (1) supporting automation of the IS workflow, (2) performing basic and advanced analyses for IS tracking (clonal abundance, clonal expansions and statistics for insertional mutagenesis, etc.), (3) providing basic biology insights of transduced stem cells in vivo.
Maintained by Francesco Gazzo. Last updated 4 months ago.
biomedicalinformaticssequencingsinglecell
3 stars 5.83 score 15 scriptsjdjohn215
pollster:Calculate Crosstab and Topline Tables of Weighted Survey Data
Calculate common types of tables for weighted survey data. Options include topline and (2-way and 3-way) crosstab tables of categorical or ordinal data as well as summary tables of weighted numeric variables. Optionally, include the margin of error at selected confidence intervals including the design effect. The design effect is calculated as described by Kish (1965) <doi:10.1002/bimj.19680100122> beginning on page 257. Output takes the form of tibbles (simple data frames). This package conveniently handles labelled data, such as that commonly used by 'Stata' and 'SPSS.' Complex survey design is not supported at this time.
Maintained by John D. Johnson. Last updated 2 years ago.
9 stars 5.80 score 47 scriptsbioc
benchdamic:Benchmark of differential abundance methods on microbiome data
Starting from a microbiome dataset (16S or WMS with absolute count values), it is possible to perform several analyses to assess the performance of many differential abundance detection methods. A basic and standardized version of the main differential abundance analysis methods is supplied, but users can also add their own method to the benchmark. The analyses focus on 4 main aspects: i) the goodness of fit of each method's distributional assumptions on the observed count data, ii) the ability to control the false discovery rate, iii) the within and between method concordances, iv) the truthfulness of the findings if any a priori knowledge is given. Several graphical functions are available for result visualization.
Maintained by Matteo Calgaro. Last updated 4 months ago.
metagenomicsmicrobiomedifferentialexpressionmultiplecomparisonnormalizationpreprocessingsoftwarebenchmarkdifferential-abundance-methods
8 stars 5.78 score 8 scriptsdrsimonspencer
AMISforInfectiousDiseases:Implement the AMIS Algorithm for Infectious Disease Models
Implements the Adaptive Multiple Importance Sampling (AMIS) algorithm, as described by Retkute et al. (2021, <doi:10.1214/21-AOAS1486>), to estimate key epidemiological parameters by combining outputs from a geostatistical model of infectious diseases (such as prevalence, incidence, or relative risk) with a disease transmission model. Utilising the resulting posterior distributions, the package enables forward projections at the local level.
Maintained by Simon Spencer. Last updated 2 months ago.
5.78 score 6 scriptsrichardli
surveyPrev:Mapping the Prevalence of Binary Indicators using Survey Data in Small Areas
Provides a pipeline to perform small area estimation and prevalence mapping of binary indicators using health and demographic survey data, described in Fuglstad et al. (2022) <doi:10.48550/arXiv.2110.09576> and Wakefield et al. (2020) <doi:10.1111/insr.12400>.
Maintained by Qianyu Dong. Last updated 20 hours ago.
1 star 5.76 score 11 scriptshomerhanumat
tigerstats:R Functions for Elementary Statistics
A collection of data sets and functions that are useful in the teaching of statistics at an elementary level to students who may have little or no previous experience with the command line. The functions for elementary inferential procedures follow a uniform interface for user input. Some of the functions are instructional applets that can only be run on the R Studio integrated development environment with package 'manipulate' installed. Other instructional applets are Shiny apps that may be run locally. In teaching, the package is used alongside the packages 'mosaic', 'mosaicData' and 'abd', which are therefore listed as dependencies.
Maintained by Homer White. Last updated 5 years ago.
16 stars 5.74 score 327 scriptsjwiley
multilevelTools:Multilevel and Mixed Effects Model Diagnostics and Effect Sizes
Effect sizes, diagnostics and performance metrics for multilevel and mixed effects models. Includes marginal and conditional 'R2' estimates for linear mixed effects models based on Johnson (2014) <doi:10.1111/2041-210X.12225>.
Maintained by Joshua F. Wiley. Last updated 3 days ago.
4 stars 5.74 score 136 scriptsbioc
limpca:An R package for the linear modeling of high-dimensional designed data based on ASCA/APCA family of methods
This package aims to provide a method for fitting linear models to high-dimensional designed data. limpca applies a GLM (General Linear Model) version of ASCA and APCA to analyse multivariate sample profiles generated by an experimental design. ASCA/APCA provide powerful visualization tools for multivariate structures in the space of each effect of the statistical model linked to the experimental design and, in contrast to MANOVA, can deal with multivariate datasets having more variables than observations. This method can handle unbalanced designs.
Maintained by Manon Martin. Last updated 5 months ago.
statisticalmethod principalcomponent regression visualization experimentaldesign multiplecomparison geneexpression metabolomics
2 stars 5.73 score 2 scripts
josie-athens
pubh:A Toolbox for Public Health and Epidemiology
A toolbox for making R functions and capabilities more accessible to students and professionals from Epidemiology and Public Health related disciplines. Includes a function to report coefficients and confidence intervals from models using robust standard errors (when available), functions that expand 'ggplot2' plots and functions relevant for introductory papers in Epidemiology or Public Health. Please note that use of the provided data sets is for educational purposes only.
Maintained by Josie Athens. Last updated 6 months ago.
5 stars 5.73 score 72 scripts
bioc
rexposome:Exposome exploration and outcome data analysis
Allows users to explore the exposome and to perform association analyses between exposures and health outcomes.
Maintained by Xavier Escribà Montagut. Last updated 5 months ago.
software biologicalquestion infrastructure dataimport datarepresentation biomedicalinformatics experimentaldesign multiplecomparison classification clustering
5.70 score 28 scripts 1 dependent
maraab23
ggseqplot:Render Sequence Plots using 'ggplot2'
A set of wrapper functions that reproduce most of the sequence plots rendered with TraMineR::seqplot(). Whereas 'TraMineR' uses base R to produce the plots, this library draws on 'ggplot2'. The plots are produced on the basis of a sequence object defined with TraMineR::seqdef(). The package automates the reshaping and plotting of sequence data. Resulting plots are of class 'ggplot', i.e. components can be added and tweaked using '+' and regular 'ggplot2' functions.
Maintained by Marcel Raab. Last updated 4 months ago.
ggplot2 sequence-analysis traminer visualization
14 stars 5.70 score 18 scripts
plant-functional-trait-course
fluxible:Ecosystem Gas Fluxes Calculations for Closed Loop Chamber Setup
Processes the raw data from closed loop flux chamber (or tent) setups into ecosystem gas fluxes usable for analysis. It goes from a data frame of gas concentration over time (which can contain several measurements) and a metadata file indicating which measurement was done when, to a data frame of ecosystem gas fluxes including quality diagnostics. Functions provided include different models (exponential as described in Zhao et al. (2018) <doi:10.1016/j.agrformet.2018.08.022>, quadratic and linear) to estimate the fluxes from the raw data, quality assessment, plotting for visual checks and calculation of fluxes based on the setup-specific parameters (chamber size, plot area, ...).
Maintained by Joseph Gaudard. Last updated 1 day ago.
5.69 score 12 scripts
thermostats
RVA:RNAseq Visualization Automation
Automate downstream visualization & pathway analysis in RNAseq analysis. 'RVA' is a collection of functions that efficiently visualize RNAseq differential expression analysis results from summary statistics tables. It also uses Fisher's exact test to evaluate gene set or pathway enrichment in a convenient and efficient manner.
Maintained by Xingpeng Li. Last updated 3 years ago.
9 stars 5.65 score 6 scripts
bioc
multicrispr:Multi-locus multi-purpose Crispr/Cas design
This package is for designing Crispr/Cas9 and Prime Editing experiments. It contains functions to (1) define and transform genomic targets, (2) find spacers, (4) count offtarget (mis)matches, and (5) compute Doench2016/2014 targeting efficiency. Care has been taken for multicrispr to scale well towards large target sets, enabling the design of large Crispr/Cas9 libraries.
Maintained by Aditya Bhagwat. Last updated 4 months ago.
5.65 score 2 scripts
bioc
MSstatsLiP:LiP Significance Analysis in shotgun mass spectrometry-based proteomic experiments
Tools for LiP peptide and protein significance analysis. Provides functions for summarization, estimation of LiP peptide abundance, and detection of changes across conditions. Utilizes functionality across the MSstats family of packages.
Maintained by Devon Kohler. Last updated 5 months ago.
immunooncology massspectrometry proteomics software differentialexpression onechannel twochannel normalization qualitycontrol cpp
7 stars 5.62 score 5 scripts
tntp
tntpr:Data Analysis Tools Customized for TNTP
An assortment of functions and templates customized to meet the needs of data analysts at the non-profit organization TNTP. Includes functions for branded colors and plots, credentials management, repository set-up, and other common analytic tasks.
Maintained by Dustin Pashouwer. Last updated 4 months ago.
7 stars 5.61 score 13 scripts
ucd-serg
serocalculator:Estimating Infection Rates from Serological Data
Translates antibody levels measured in cross-sectional population samples into estimates of the frequency with which seroconversions (infections) occur in the sampled populations. Replaces the previous `seroincidence` package.
Maintained by Kristina Lai. Last updated 5 days ago.
epidemiology incidence-estimation seroepidemiology
6 stars 5.61 score 13 scripts
bioc
gpuMagic:An openCL compiler with the capacity to compile R functions and run the code on GPU
The package aims to help users write openCL code with little or no effort. It is able to compile a user-defined R function and run it on a device such as a CPU or a GPU. The user can also write and run their own openCL code directly by calling the .kernel function.
Maintained by Jiefei Wang. Last updated 5 months ago.
10 stars 5.60 score 1 script
chaisemartinpackages
TwoWayFEWeights:Estimation of the Weights Attached to the Two-Way Fixed Effects Regressions
Estimates the weights and measure of robustness to treatment effect heterogeneity attached to two-way fixed effects regressions. Clément de Chaisemartin, Xavier D'Haultfœuille (2020) <doi:10.1257/aer.20181169>.
Maintained by Diego Ciccia. Last updated 8 months ago.
18 stars 5.58 score 20 scripts
mrcieu
mrbayes:Bayesian Summary Data Models for Mendelian Randomization Studies
Bayesian estimation of inverse variance weighted (IVW), Burgess et al. (2013) <doi:10.1002/gepi.21758>, and MR-Egger, Bowden et al. (2015) <doi:10.1093/ije/dyv080>, summary data models for Mendelian randomization analyses.
Maintained by Tom Palmer. Last updated 13 days ago.
4 stars 5.56 score 2 scripts
middleton-lab
abd:The Analysis of Biological Data
The abd package contains data sets and sample code for The Analysis of Biological Data by Michael Whitlock and Dolph Schluter (2009; Roberts & Company Publishers).
Maintained by Kevin M. Middleton. Last updated 11 months ago.
6 stars 5.53 score 182 scripts 1 dependent
openanalytics
inTextSummaryTable:Creation of in-Text Summary Table
Creation of tables of summary statistics or counts for clinical data (for 'TLFs'). These tables can be exported as an in-text table (with the 'flextable' package) for a Clinical Study Report (Word format) or a 'topline' presentation (PowerPoint format), or as an interactive table (with the 'DT' package) to an html document for clinical data review.
Maintained by Laure Cougnaud. Last updated 10 months ago.
1 star 5.52 score 47 scripts
svmiller
peacesciencer:Tools and Data for Quantitative Peace Science Research
These are useful tools and data sets for the study of quantitative peace science. The goal for this package is to include tools and data sets for doing original research that mimics what a user would previously have had to get from a software package that may not be well-sourced or well-supported. Those software bundles were useful to the extent that they encouraged replications of long-standing analyses by starting the data-generating process from scratch. However, much of the functionality can be reproduced relatively quickly and more transparently in the R programming language.
Maintained by Steve Miller. Last updated 16 days ago.
29 stars 5.49 score 211 scripts
ropensci
EndoMineR:Functions to mine endoscopic and associated pathology datasets
This package comprises the functions that are used to clean up endoscopic reports and pathology reports, as well as many of the scripts used for analysis. The scripts assume the endoscopy and histopathology data sets are already merged, but they can also be used with the unmerged datasets.
Maintained by Sebastian Zeki. Last updated 7 months ago.
endoscopy gastroenterology peer-reviewed semi-structured-data text-mining
13 stars 5.47 score 30 scripts
ilostat
Rilostat:ILO Open Data via Ilostat Bulk Download Facility
Tools to download data from the ilostat database <https://ilostat.ilo.org> together with search and manipulation utilities.
Maintained by David Bescond. Last updated 19 days ago.
api dataset open-source public-api
34 stars 5.47 score 43 scripts
alvesks
epifitter:Analysis and Simulation of Plant Disease Progress Curves
Analysis and visualization of plant disease progress curve data. Functions for fitting two-parameter population dynamics models (exponential, monomolecular, logistic and Gompertz) to proportion data for single or multiple epidemics using either linear or nonlinear regression. Statistical and visual outputs are provided to aid in model selection. Synthetic curves can be simulated for any of the models given the parameters. See Laurence V. Madden, Gareth Hughes, and Frank van den Bosch (2007) <doi:10.1094/9780890545058> for further information on the methods.
Maintained by Kaique dos S. Alves. Last updated 2 months ago.
5 stars 5.42 score 53 scripts
maelstrom-research
madshapR:Support Technical Processes Following 'Maelstrom Research' Standards
Functions to support rigorous processes in data cleaning, evaluation, and documentation across datasets from different studies based on Maelstrom Research guidelines. The package includes the core functions to evaluate and format the main inputs that define the process, diagnose errors, and summarize and evaluate datasets and their associated data dictionaries. The main outputs are clean datasets and associated metadata, and tabular and visual summary reports. The approach follows the Maelstrom Research guidelines for rigorous retrospective data harmonization (Fortier I et al. (2017) <doi:10.1093/ije/dyw075>).
Maintained by Guillaume Fabre. Last updated 11 months ago.
2 stars 5.40 score 28 scripts 3 dependents
gertstulp
ggplotgui:Create Ggplots via a Graphical User Interface
Easily explore data by creating ggplots through a (shiny-)GUI. R code to recreate the graph is provided.
Maintained by Gert Stulp. Last updated 5 years ago.
139 stars 5.40 score 18 scripts
jeksterslab
semmcci:Monte Carlo Confidence Intervals in Structural Equation Modeling
Monte Carlo confidence intervals for free and defined parameters in models fitted in the structural equation modeling package 'lavaan' can be generated using the 'semmcci' package. 'semmcci' has three main functions, namely, MC(), MCMI(), and MCStd(). The output of 'lavaan' is passed as the first argument to the MC() function or the MCMI() function to generate Monte Carlo confidence intervals. Monte Carlo confidence intervals for the standardized estimates can also be generated by passing the output of the MC() function or the MCMI() function to the MCStd() function. A description of the package and code examples are presented in Pesigan and Cheung (2023) <doi:10.3758/s13428-023-02114-4>.
Maintained by Ivan Jacob Agaloos Pesigan. Last updated 3 months ago.
confidence-intervals monte-carlo structural-equation-modeling
2 stars 5.39 score 76 scripts
choonghyunryu
alookr:Model Classifier for Binary Classification
A collection of tools that support data splitting, predictive modeling, and model evaluation. A typical function splits a dataset into a training dataset and a test dataset, then compares the data distributions of the two datasets. Another feature supports the development of predictive models and compares the performance of several predictive models, helping to select the best model.
Maintained by Choonghyun Ryu. Last updated 1 year ago.
12 stars 5.38 score 9 scripts
btskinner
duawranglr:Securely Wrangle Dataset According to Data Usage Agreement
Create shareable data sets from raw data files that contain protected elements. Relying on master crosswalk files that list restricted variables, package functions warn users about possible violations of data usage agreement and prevent writing protected elements.
Maintained by Benjamin Skinner. Last updated 4 years ago.
data-security data-usage-agreement data-wrangling
9 stars 5.37 score 13 scripts
bioc
SPONGE:Sparse Partial Correlations On Gene Expression
This package provides methods to efficiently detect competitive endogenous RNA interactions between two genes. Such interactions are mediated by one or several miRNAs, so both gene and miRNA expression data for a larger number of samples are needed as input. The SPONGE package now also includes spongEffects: ceRNA modules offer patient-specific insights into the miRNA regulatory landscape.
Maintained by Markus List. Last updated 5 months ago.
geneexpression transcription generegulation networkinference transcriptomics systemsbiology regression randomforest machinelearning
5.36 score 38 scripts 1 dependent
s87jackson
rfars:Download and Analyze Crash Data
Download crash data from the National Highway Traffic Safety Administration and prepare it for research.
Maintained by Steve Jackson. Last updated 12 months ago.
crash fatalities official-statistics transportation
10 stars 5.35 score 15 scripts
junjunlab
transPlotR:Visualize Transcript Structures in Elegant Way
To visualize the gene structure with multiple isoforms better, I developed this package to draw different transcript structures easily.
Maintained by Jun Zhang. Last updated 2 years ago.
bedbigwiggenelinkvistranscriptvisualization
73 stars 5.34 score 60 scripts
matteo21q
dani:Design and Analysis of Non-Inferiority Trials
Provides tools to help with the design and analysis of non-inferiority trials. These include functions for doing sample size calculations and for analysing non-inferiority trials, using a variety of outcome types and population-level summary measures. It also features functions to make trials more resilient by using the concept of non-inferiority frontiers, as described in Quartagno et al. (2019) <arXiv:1905.00241>. Finally, it includes functions to design and analyse MAMS-ROCI (aka DURATIONS) trials.
Maintained by Matteo Quartagno. Last updated 7 months ago.
2 stars 5.33 score 27 scripts
santagos
dad:Three-Way / Multigroup Data Analysis Through Densities
The data consist of a set of variables measured on several groups of individuals. Each group is associated with an estimated probability density function. The package provides tools to create or manage such data and functional methods (principal component analysis, multidimensional scaling, cluster analysis, discriminant analysis...) for such probability densities.
Maintained by Pierre Santagostini. Last updated 4 months ago.
5.32 score 92 scripts
elliecurnow
midoc:A Decision-Making System for Multiple Imputation
A guidance system for analysis with missing data. It incorporates expert, up-to-date methodology to help researchers choose the most appropriate analysis approach when some data are missing. You provide the available data and the assumed causal structure, including the likely causes of missing data. 'midoc' will advise which analysis approaches can be used, and how best to perform them. 'midoc' follows the framework for the treatment and reporting of missing data in observational studies (TARMOS), Lee et al. (2021) <doi:10.1016/j.jclinepi.2021.01.008>.
Maintained by Elinor Curnow. Last updated 6 months ago.
missing-data multiple-imputation
6 stars 5.32 score 8 scripts
mspinillos
ecoregime:Analysis of Ecological Dynamic Regimes
A toolbox for implementing the Ecological Dynamic Regime framework (Sánchez-Pinillos et al., 2023 <doi:10.1002/ecm.1589>) to characterize and compare groups of ecological trajectories in multidimensional spaces defined by state variables. The package includes the RETRA-EDR algorithm to identify representative trajectories, functions to generate, summarize, and visualize representative trajectories, and several metrics to quantify the distribution and heterogeneity of trajectories in an ecological dynamic regime and quantify the dissimilarity between two or more ecological dynamic regimes. The package also includes a set of functions to assess ecological resilience based on ecological dynamic regimes (Sánchez-Pinillos et al., 2024 <doi:10.1016/j.biocon.2023.110409>).
Maintained by Martina Sánchez-Pinillos. Last updated 12 months ago.
7 stars 5.32 score 8 scripts
f-silva-archaeo
skyscapeR:Data Analysis and Visualization for Skyscape Archaeology
Data reduction, visualization and statistical analysis of measurements of orientation of archaeological structures, following Silva (2020) <doi:10.1016/j.jas.2020.105138>.
Maintained by Silva Fabio. Last updated 6 months ago.
5 stars 5.31 score 41 scripts
eldafani
intsvy:International Assessment Data Manager
Provides tools for importing, merging, and analysing data from international assessment studies (TIMSS, PIRLS, PISA, ICILS, and PIAAC).
Maintained by Daniel Caro. Last updated 1 year ago.
22 stars 5.29 score 88 scripts
btskinner
crosswalkr:Rename and Encode Data Frames Using External Crosswalk Files
A pair of functions for renaming and encoding data frames using external crosswalk files. It is especially useful when constructing master data sets from multiple smaller data sets that do not name or encode variables consistently across files. Based on similar commands in 'Stata'.
Maintained by Benjamin Skinner. Last updated 1 year ago.
9 stars 5.26 score 20 scripts
nceas
scicomptools:Tools Developed by the NCEAS Scientific Computing Support Team
Set of tools to import, summarize, wrangle, and visualize data. These functions were originally written based on the needs of the various synthesis working groups that were supported by the National Center for Ecological Analysis and Synthesis (NCEAS). These tools are meant to be useful inside and outside of the context for which they were designed.
Maintained by Angel Chen. Last updated 5 months ago.
9 stars 5.26 score 6 scripts
jiang-junyao
CACIMAR:cross-species analysis of cell identities, markers and regulations
A toolkit to perform cross-species analysis based on scRNA-seq data. CACIMAR contains 5 main features: (1) identify markers in each cluster, (2) annotate cell types, (3) identify conserved markers, (4) identify conserved cell types, and (5) identify conserved modules of regulatory networks.
Maintained by Junyao Jiang. Last updated 4 months ago.
cross-species-analysis scrna-seq
12 stars 5.26 score 6 scripts
harrison4192
validata:Validate Data Frames
Functions for validating the structure and properties of data frames. Answers essential questions about a data set after initial import or modification. What are the unique or missing values? What columns form a primary key? What are the properties of the numeric or categorical columns? What kind of overlap or mapping exists between 2 columns?
Maintained by Harrison Tietze. Last updated 24 days ago.
6 stars 5.26 score 4 scripts 1 dependent
anna-neufeld
splinetree:Longitudinal Regression Trees and Forests
Builds regression trees and random forests for longitudinal or functional data using a spline projection method. Implements and extends the work of Yu and Lambert (1999) <doi:10.1080/10618600.1999.10474847>. This method allows trees and forests to be built while considering either level and shape or only shape of response trajectories.
Maintained by Anna Neufeld. Last updated 6 years ago.
4 stars 5.24 score 29 scripts