R-universe search: needs:haven

tidyverse

tidyverse:Easily Install and Load the 'Tidyverse'

The 'tidyverse' is a set of packages that work in harmony because they share common data representations and 'API' design. This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step. Learn more about the 'tidyverse' at <https://www.tidyverse.org>.

Maintained by Hadley Wickham. Last updated 5 months ago.

data-science tidyverse

1.7k stars 20.23 score 664k scripts 125 dependents

gesistsa

rio:A Swiss-Army Knife for Data I/O

Streamlined data import and export by making assumptions that the user is probably willing to make: 'import()' and 'export()' determine the data format from the file extension, reasonable defaults are used for data import and export, web-based import is natively supported (including from SSL/HTTPS), compressed files can be read directly, and fast import packages are used where appropriate. An additional convenience function, 'convert()', provides a simple method for converting between file types.

Maintained by Chung-hong Chan. Last updated 3 months ago.

csv csvy data data-science excel io rio sas spss stata

610 stars 17.10 score 7.8k scripts 74 dependents

andrisignorell

DescTools:Tools for Descriptive Statistics

A collection of miscellaneous basic statistical functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox that facilitates the (notoriously time-consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. The package furthermore contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and argument naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in the absence of convincing alternatives). The 'BigCamelCase' style was consequently applied to functions borrowed from contributed R packages as well.

Maintained by Andri Signorell. Last updated 23 hours ago.

fortran cpp

86 stars 16.73 score 7.7k scripts 101 dependents

amices

mice:Multivariate Imputation by Chained Equations

Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.

Maintained by Stef van Buuren. Last updated 23 hours ago.

chained-equations fcs imputation mice missing-data missing-values multiple-imputation multivariate-data cpp

462 stars 16.64 score 10k scripts 154 dependents

larmarange

labelled:Manipulating Labelled Data

Work with labelled data imported from 'SPSS' or 'Stata' with 'haven' or 'foreign'. This package provides useful functions to deal with the "haven_labelled" and "haven_labelled_spss" classes introduced by the 'haven' package.

Maintained by Joseph Larmarange. Last updated 1 month ago.

haven labels metadata sas spss stata

76 stars 15.04 score 2.4k scripts 98 dependents

kaz-yos

tableone:Create 'Table 1' to Describe Baseline Characteristics with or without Propensity Score Weights

Creates 'Table 1', i.e., a description of baseline patient characteristics, which is essential in medical research. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. Weighted data are supported via the 'survey' package.

Maintained by Kazuki Yoshida. Last updated 3 years ago.

baseline-characteristics descriptive-statistics statistics

221 stars 13.55 score 2.3k scripts 12 dependents

projectmosaic

mosaic:Project MOSAIC Statistics and Mathematics Teaching Utilities

Data sets and utilities from Project MOSAIC (<http://www.mosaic-web.org>) used to teach mathematics, statistics, computation and modeling. Funded by the NSF, Project MOSAIC is a community of educators working to tie together aspects of quantitative work that students in science, technology, engineering and mathematics will need in their professional lives, but which are usually taught in isolation, if at all.

Maintained by Randall Pruim. Last updated 1 year ago.

93 stars 13.32 score 7.2k scripts 7 dependents

dreamrs

esquisse:Explore and Visualize Your Data Interactively

A 'shiny' gadget to create 'ggplot2' figures interactively with drag-and-drop to map your variables to different aesthetics. You can quickly visualize your data according to their type, export in various formats, and retrieve the code to reproduce the plot.

Maintained by Victor Perrier. Last updated 1 month ago.

addin data-visualization ggplot2 rstudio-addin visualization

1.8k stars 13.31 score 1.1k scripts 1 dependent

juba

questionr:Functions to Make Surveys Processing Easier

A set of functions to make the processing and analysis of surveys easier: interactive shiny apps and addins for data recoding, contingency tables, dataset metadata handling, and several convenience functions.

Maintained by Julien Barnier. Last updated 8 days ago.

83 stars 12.93 score 1.1k scripts 19 dependents

simongrund1

mitml:Tools for Multiple Imputation in Multilevel Modeling

Provides tools for multiple imputation of missing data in multilevel modeling. Includes a user-friendly interface to the packages 'pan' and 'jomo', and several functions for visualization, data management and the analysis of multiply imputed data sets.

Maintained by Simon Grund. Last updated 1 year ago.

imputation missing-data mixed-effects multilevel-data multilevel-models

29 stars 12.36 score 246 scripts 153 dependents

dreamrs

datamods:Modules to Import and Manipulate Data in 'Shiny'

'Shiny' modules to import data into an application or 'addin' from various sources, and to manipulate them after that.

Maintained by Victor Perrier. Last updated 24 days ago.

shiny shiny-modules

144 stars 12.03 score 174 scripts 7 dependents

projectmosaic

ggformula:Formula Interface to the Grammar of Graphics

Provides a formula interface to 'ggplot2' graphics.

Maintained by Randall Pruim. Last updated 1 year ago.

38 stars 11.55 score 1.7k scripts 25 dependents

larmarange

broom.helpers:Helpers for Model Coefficients Tibbles

Provides a suite of functions to work with regression model 'broom::tidy()' tibbles. The suite includes functions to group regression model terms by variable, insert reference and header rows for categorical variables, add variable labels, and more.

Maintained by Joseph Larmarange. Last updated 23 days ago.

22 stars 11.45 score 165 scripts 2 dependents

ewenharrison

finalfit:Quickly Create Elegant Regression Results Tables and Plots when Modelling

Generate regression results tables and plots in final format for publication. Explore models and export directly to PDF and 'Word' using 'RMarkdown'.

Maintained by Ewen Harrison. Last updated 8 days ago.

270 stars 11.43 score 1.0k scripts

jamiemkass

ENMeval:Automated Tuning and Evaluations of Ecological Niche Models

Runs ecological niche models over all combinations of user-defined settings (i.e., tuning), performs cross validation to evaluate models, and returns data tables to aid in selection of optimal model settings that balance goodness-of-fit and model complexity. Also has functions to partition data spatially (or not) for cross validation, to plot multiple visualizations of results, to run null models to estimate significance and effect sizes of performance metrics, and to calculate range overlap between model predictions, among others. The package was originally built for Maxent models (Phillips et al. 2006, Phillips et al. 2017), but the current version allows possible extensions for any modeling algorithm. The extensive vignette, which guides users through most package functionality but unfortunately has a file size too big for CRAN, can be found here on the package's Github Pages website: <https://jamiemkass.github.io/ENMeval/articles/ENMeval-2.0-vignette.html>.

Maintained by Jamie M. Kass. Last updated 2 months ago.

49 stars 11.16 score 332 scripts 2 dependents

choonghyunryu

dlookr:Tools for Data Diagnosis, Exploration, Transformation

A collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between the target variable and predictor. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. And it creates automated reports that support these three tasks.

Maintained by Choonghyun Ryu. Last updated 10 months ago.

212 stars 11.05 score 748 scripts 2 dependents

ipums

ipumsr:An R Interface for Downloading, Reading, and Handling IPUMS Data

An easy way to work with census, survey, and geographic data provided by IPUMS in R. Generate and download data through the IPUMS API and load IPUMS files into R with their associated metadata to make analysis easier. IPUMS data describing 1.4 billion individuals drawn from over 750 censuses and surveys is available free of charge from the IPUMS website <https://www.ipums.org>.

Maintained by Derek Burk. Last updated 1 month ago.

30 stars 11.05 score 720 scripts 2 dependents

bioc

ANCOMBC:Microbiome differential abundance and correlation analyses with bias correction

ANCOMBC is a package containing differential abundance (DA) and correlation analyses for microbiome data. Specifically, the package includes Analysis of Compositions of Microbiomes with Bias Correction 2 (ANCOM-BC2), Analysis of Compositions of Microbiomes with Bias Correction (ANCOM-BC), and Analysis of Composition of Microbiomes (ANCOM) for DA analysis, and Sparse Estimation of Correlations among Microbiomes (SECOM) for correlation analysis. Microbiome data are typically subject to two sources of biases: unequal sampling fractions (sample-specific biases) and differential sequencing efficiencies (taxon-specific biases). Methodologies included in the ANCOMBC package are designed to correct these biases and construct statistically consistent estimators.

Maintained by Huang Lin. Last updated 13 days ago.

differentialexpression microbiome normalization sequencing software ancom ancombc ancombc2 correlation differential-abundance-analysis secom

120 stars 10.79 score 406 scripts 1 dependent

bioc

GWASTools:Tools for Genome Wide Association Studies

Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis.

Maintained by Stephanie M. Gogarten. Last updated 11 days ago.

snp geneticvariability qualitycontrol microarray

17 stars 10.67 score 396 scripts 5 dependents

bioc

GENESIS:GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness

The GENESIS package provides methodology for estimating, inferring, and accounting for population and pedigree structure in genetic analyses. The current implementation provides functions to perform PC-AiR (Conomos et al., 2015, Gen Epi) and PC-Relate (Conomos et al., 2016, AJHG). PC-AiR performs a Principal Components Analysis on genome-wide SNP data for the detection of population structure in a sample that may contain known or cryptic relatedness. Unlike standard PCA, PC-AiR accounts for relatedness in the sample to provide accurate ancestry inference that is not confounded by family structure. PC-Relate uses ancestry representative principal components to adjust for population structure/ancestry and accurately estimate measures of recent genetic relatedness such as kinship coefficients, IBD sharing probabilities, and inbreeding coefficients. Additionally, functions are provided to perform efficient variance component estimation and mixed model association testing for both quantitative and binary phenotypes.

Maintained by Stephanie M. Gogarten. Last updated 2 months ago.

snp geneticvariability genetics statisticalmethod dimensionreduction principalcomponent genomewideassociation qualitycontrol biocviews

36 stars 10.44 score 342 scripts 1 dependent

richardli

SUMMER:Small-Area-Estimation Unit/Area Models and Methods for Estimation in R

Provides methods for spatial and spatio-temporal smoothing of demographic and health indicators using survey data, with particular focus on estimating and projecting under-five mortality rates, described in Mercer et al. (2015) <doi:10.1214/15-AOAS872>, Li et al. (2019) <doi:10.1371/journal.pone.0210645>, Wu et al. (DHS Spatial Analysis Reports No. 21, 2021), and Li et al. (2023) <doi:10.48550/arXiv.2007.05117>.

Maintained by Zehang R Li. Last updated 3 months ago.

bayesian-inference small-area-estimation space-time

23 stars 10.28 score 134 scripts 2 dependents

insightsengineering

teal.modules.clinical:'teal' Modules for Standard Clinical Outputs

Provides user-friendly tools for creating and customizing clinical trial reports. By leveraging the 'teal' framework, this package provides 'teal' modules to easily create an interactive panel that allows for seamless adjustments to data presentation, thereby streamlining the creation of detailed and accurate reports.

Maintained by Dawid Kaledkowski. Last updated 29 days ago.

clinical-trials modules nest outputs shiny

34 stars 10.25 score 149 scripts

idigbio

ridigbio:Interface to the iDigBio Data API

An interface to iDigBio's search API that allows downloading specimen records. Searches are returned as a data.frame. Other functions such as the metadata end points return lists of information. iDigBio is a US project focused on digitizing and serving museum specimen collections on the web. See <https://www.idigbio.org> for information on iDigBio.

Maintained by Jesse Bennett. Last updated 18 days ago.

16 stars 10.23 score 63 scripts 7 dependents

ropensci

spocc:Interface to Species Occurrence Data Sources

A programmatic interface to many species occurrence data sources, including Global Biodiversity Information Facility ('GBIF'), 'iNaturalist', 'eBird', Integrated Digitized 'Biocollections' ('iDigBio'), 'VertNet', Ocean 'Biogeographic' Information System ('OBIS'), and Atlas of Living Australia ('ALA'). Includes functionality for retrieving species occurrence data, and combining those data.

Maintained by Hannah Owens. Last updated 2 months ago.

specimens api web-services occurrences species taxonomy gbif inat vertnet ebird idigbio obis ala antweb bison data ecoengine inaturalist occurrence species-occurrence spocc

118 stars 10.09 score 552 scripts 5 dependents

jinseob2kim

jstable:Create Tables from Different Types of Regression

Create regression tables from generalized linear model (GLM), generalized estimating equation (GEE), generalized linear mixed-effects model (GLMM), Cox proportional hazards model, survey-weighted generalized linear model (svyglm) and survey-weighted Cox model results for publication.

Maintained by Jinseob Kim. Last updated 22 hours ago.

label regression table

28 stars 10.08 score 199 scripts 1 dependent

ropensci

rdhs:API Client and Dataset Management for the Demographic and Health Survey (DHS) Data

Provides a client for (1) querying the DHS API for survey indicators and metadata (<https://api.dhsprogram.com/#/index.html>), (2) identifying surveys and datasets for analysis, (3) downloading survey datasets from the DHS website, (4) loading datasets and associated metadata into R, and (5) extracting variables and combining datasets for pooled analysis.

Maintained by OJ Watson. Last updated 30 days ago.

dataset dhs dhs-api extract peer-reviewed survey-data

35 stars 10.07 score 286 scripts 3 dependents

sdctools

sdcMicro:Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation

Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. The theoretical basis for the methods implemented can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface, published in Meindl and Templ (2019) <doi:10.3390/a12090191>, that gives access to various methods of this package.

Maintained by Matthias Templ. Last updated 1 month ago.

cpp

84 stars 9.63 score 258 scripts

john-d-fox

Rcmdr:R Commander

A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.

Maintained by John Fox. Last updated 5 months ago.

4 stars 9.48 score 636 scripts 38 dependents

georgheinze

logistf:Firth's Bias-Reduced Logistic Regression

Fit a logistic regression model using Firth's bias reduction method, equivalent to penalization of the log-likelihood by the Jeffreys prior. Confidence intervals for regression coefficients can be computed by penalized profile likelihood. Firth's method was proposed as an ideal solution to the problem of separation in logistic regression, see Heinze and Schemper (2002) <doi:10.1002/sim.1047>. If needed, the bias reduction can be turned off such that ordinary maximum likelihood logistic regression is obtained. Two new modifications of Firth's method, FLIC and FLAC, lead to unbiased predictions and are now available in the package as well, see Puhr et al. (2017) <doi:10.1002/sim.7273>.

Maintained by Georg Heinze. Last updated 2 years ago.

12 stars 9.23 score 346 scripts 16 dependents

alexanderrobitzsch

miceadds:Some Additional Multiple Imputation Functions, Especially for 'mice'

Contains functions for multiple imputation which complement existing functionality in R. In particular, several imputation methods for the mice package (van Buuren & Groothuis-Oudshoorn, 2011, <doi:10.18637/jss.v045.i03>) are implemented. Main features of the miceadds package include plausible value imputation (Mislevy, 1991, <doi:10.1007/BF02294457>), multilevel imputation for variables at any level or with any number of hierarchical and non-hierarchical levels (Grund, Luedtke & Robitzsch, 2018, <doi:10.1177/1094428117703686>; van Buuren, 2018, Ch.7, <doi:10.1201/9780429492259>), imputation using partial least squares (PLS) for high dimensional predictors (Robitzsch, Pham & Yanagida, 2016), nested multiple imputation (Rubin, 2003, <doi:10.1111/1467-9574.00217>), substantive model compatible imputation (Bartlett et al., 2015, <doi:10.1177/0962280214521348>), and features for the generation of synthetic datasets (Reiter, 2005, <doi:10.1111/j.1467-985X.2004.00343.x>; Nowok, Raab, & Dibben, 2016, <doi:10.18637/jss.v074.i11>).

Maintained by Alexander Robitzsch. Last updated 28 days ago.

missing-data multiple-imputation openblas cpp

16 stars 9.16 score 542 scripts 9 dependents

nickch-k

vtable:Variable Table for Variable Documentation

Automatically generates HTML variable documentation including variable names, labels, classes, value labels (if applicable), value ranges, and summary statistics. See the vignette "vtable" for a package overview.

Maintained by Nick Huntington-Klein. Last updated 3 months ago.

40 stars 9.10 score 1.2k scripts

bioc

BatchQC:Batch Effects Quality Control Software

Sequencing and microarray samples often are collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in the downstream analysis. BatchQC is a software tool that streamlines batch preprocessing and evaluation by providing interactive diagnostics, visualizations, and statistical analyses to explore the extent to which batch variation impacts the data. BatchQC diagnostics help determine whether batch adjustment needs to be done, and how correction should be applied before proceeding with a downstream analysis. Moreover, BatchQC interactively applies multiple common batch effect approaches to the data and the user can quickly see the benefits of each method. BatchQC is developed as a Shiny App. The output is organized into multiple tabs and each tab features an important part of the batch effect analysis and visualization of the data. The BatchQC interface has the following analysis groups: Summary, Differential Expression, Median Correlations, Heatmaps, Circular Dendrogram, PCA Analysis, Shape, ComBat and SVA.

Maintained by Jessica Anderson. Last updated 11 days ago.

batcheffect graphandnetwork microarray normalization principalcomponent sequencing software visualization qualitycontrol rnaseq preprocessing differentialexpression immunooncology

7 stars 9.06 score 54 scripts

sachaepskamp

bootnet:Bootstrap Methods for Various Network Estimation Routines

Bootstrap methods to assess accuracy and stability of estimated network structures and centrality indices <doi:10.3758/s13428-017-0862-1>. Allows for flexible specification of any undirected network estimation procedure in R, and offers default sets for various estimation routines.

Maintained by Sacha Epskamp. Last updated 5 months ago.

32 stars 8.94 score 155 scripts 3 dependents

mattcowgill

readabs:Download and Tidy Time Series Data from the Australian Bureau of Statistics

Downloads, imports, and tidies time series data from the Australian Bureau of Statistics <https://www.abs.gov.au/>.

Maintained by Matt Cowgill. Last updated 27 days ago.

abs australia australian-bureau-of-statistics australian-data statistics tidy-data time-series

104 stars 8.85 score 180 scripts

atorus-research

xportr:Utilities to Output CDISC SDTM/ADaM XPT Files

Tools to build CDISC compliant data sets and check for CDISC compliance.

Maintained by Eli Miller. Last updated 3 months ago.

clinical-programmers xpt

43 stars 8.84 score 102 scripts

bioc

SeqVarTools:Tools for variant data

An interface to the fast-access storage format for VCF data provided in SeqArray, with tools for common operations and analysis.

Maintained by Stephanie M. Gogarten. Last updated 5 months ago.

snp geneticvariability sequencing genetics

3 stars 8.76 score 384 scripts 2 dependents

jinseob2kim

jsmodule:'RStudio' Addins and 'Shiny' Modules for Medical Research

'RStudio' addins and 'Shiny' modules for descriptive statistics, regression and survival analysis.

Maintained by Jinseob Kim. Last updated 11 days ago.

medical rstudio-addins shiny shiny-modules statistics

21 stars 8.69 score 61 scripts

r-box

boxr:Interface for the 'Box.com API'

An R interface for the remote file hosting service 'Box' (<https://www.box.com/>). In addition to uploading and downloading files, this package includes functions which mirror base R operations for local files (e.g. box_load(), box_save(), box_read(), box_setwd(), etc.), as well as 'git' style functions for entire directories (e.g. box_fetch(), box_push()).

Maintained by Ian Lyttle. Last updated 12 months ago.

63 stars 8.65 score 238 scripts

projectmosaic

mosaicCalc:R-Language Based Calculus Operations for Teaching

Software to support the introductory *MOSAIC Calculus* textbook (<https://www.mosaic-web.org/MOSAIC-Calculus/>), one of many data- and modeling-oriented educational resources developed by Project MOSAIC (<https://www.mosaic-web.org/>). Provides symbolic and numerical differentiation and integration, as well as support for applied linear algebra (for data science), and differential equations/dynamics. Includes grammar-of-graphics-based functions for drawing vector fields, trajectories, etc. The software is suitable for general use, but intended mainly for teaching calculus.

Maintained by Daniel Kaplan. Last updated 1 month ago.

13 stars 8.63 score 546 scripts

isubirana

compareGroups:Descriptive Analysis by Groups

Create data summaries for quality control, extensive reports for exploring data, as well as publication-ready univariate or bivariate tables in several formats (plain text, HTML, LaTeX, PDF, Word or Excel). Create figures to quickly visualise the distribution of your data (boxplots, barplots, normality-plots, etc.). Display statistics (mean, median, frequencies, incidences, etc.). Perform the appropriate tests (t-test, Analysis of variance, Kruskal-Wallis, Fisher, log-rank, ...) depending on the nature of the described variable (normal, non-normal or qualitative). Summarize genetic data (Single Nucleotide Polymorphisms), displaying Allele Frequencies and performing Hardy-Weinberg Equilibrium tests among other typical statistics and tests for this kind of data.

Maintained by Isaac Subirana. Last updated 1 month ago.

comparegroups descriptive-statistics plot report table

36 stars 8.46 score 396 scripts 1 dependent

davidhodge931

ggblanket:Simplify 'ggplot2' Visualisation

Simplify 'ggplot2' visualisation with 'ggblanket' wrapper functions.

Maintained by David Hodge. Last updated 11 days ago.

data-visualisation data-visualization ggplot ggplot-extension ggplot2 ggplot2-enhancements visualisation visualization

173 stars 8.42 score 45 scripts

nlmixr2

nlmixr2:Nonlinear Mixed Effects Models in Population PK/PD

Fit and compare nonlinear mixed-effects models in differential equations with flexible dosing information commonly seen in pharmacokinetics and pharmacodynamics (Almquist, Leander, and Jirstrand 2015 <doi:10.1007/s10928-015-9409-1>). Differential equation solving is by compiled C code provided in the 'rxode2' package (Wang, Hallow, and James 2015 <doi:10.1002/psp4.12052>).

Maintained by Matthew Fidler. Last updated 1 month ago.

52 stars 8.38 score 120 scripts 3 dependents

wallaceecomod

wallace:A Modular Platform for Reproducible Modeling of Species Niches and Distributions

The 'shiny' application Wallace is a modular platform for reproducible modeling of species niches and distributions. Wallace guides users through a complete analysis, from the acquisition of species occurrence and environmental data to visualizing model predictions on an interactive map, thus bundling complex workflows into a single, streamlined interface. An extensive vignette, which guides users through most package functionality, can be found on the package's GitHub Pages website: <https://wallaceecomod.github.io/wallace/articles/tutorial-v2.html>.

Maintained by Mary E. Blair. Last updated 22 days ago.

openjdk

133 stars 8.36 score 96 scripts

rubenarslan

codebook:Automatic Codebooks from Metadata Encoded in Dataset Attributes

Easily automate the following tasks to describe data frames: Summarise the distributions, and labelled missings of variables graphically and using descriptive statistics. For surveys, compute and summarise reliabilities (internal consistencies, retest, multilevel) for psychological scales. Combine this information with metadata (such as item labels and labelled values) that is derived from R attributes. To do so, the package relies on 'rmarkdown' partials, so you can generate HTML, PDF, and Word documents. Codebooks are also available as tables (CSV, Excel, etc.) and in JSON-LD, so that search engines can find your data and index the metadata. The metadata are also available at your fingertips via RStudio Addins.

Maintained by Ruben Arslan. Last updated 3 months ago.

codebook documentation formr json-ld metadata spss webapp

143 stars 8.29 score 229 scripts

dbosak01

libr:Libraries, Data Dictionaries, and a Data Step for R

Contains a set of functions to create data libraries, generate data dictionaries, and simulate a data step. The libname() function will load a directory of data into a library in one line of code. The dictionary() function will generate data dictionaries for individual data frames or an entire library. And the datastep() function will perform row-by-row data processing.

Maintained by David Bosak. Last updated 3 months ago.

cpp

27 stars 8.27 score 48 scripts 2 dependents

safetygraphics

safetyGraphics:Interactive Graphics for Monitoring Clinical Trial Safety

A framework for evaluation of clinical trial safety. Users can interactively explore their data using the included 'Shiny' application.

Maintained by Jeremy Wildfire. Last updated 2 years ago.

99 stars 8.19 score 111 scripts

alinetalhouk

diceR:Diverse Cluster Ensemble in R

Performs cluster analysis using an ensemble clustering framework, Chiu & Talhouk (2018) <doi:10.1186/s12859-017-1996-y>. Results from a diverse set of algorithms are pooled together using methods such as majority voting, K-Modes, LinkCluE, and CSPA. There are options to compare cluster assignments across algorithms using internal and external indices, visualizations such as heatmaps, and significance testing for the existence of clusters.

Maintained by Derek Chiu. Last updated 2 months ago.

cpp

37 stars 8.13 score 60 scripts 3 dependents

gfellerlab

SuperCell:Simplification of scRNA-seq data by merging together similar cells

Aggregates large single-cell data into a metacell dataset by merging together the gene expression of very similar cells.

Maintained by The package maintainer. Last updated 8 months ago.

software coarse-graining scrna-seq-analysis scrna-seq-data

72 stars 8.08 score 93 scripts

salvatoremangiafico

rcompanion:Functions to Support Extension Education Program Evaluation

Functions and datasets to support Summary and Analysis of Extension Program Evaluation in R, and An R Companion for the Handbook of Biological Statistics. Vignettes are available at <https://rcompanion.org>.

Maintained by Salvatore Mangiafico. Last updated 1 month ago.

4 stars 8.01 score 2.4k scripts 5 dependents

john-harrold

formods:'Shiny' Modules for General Tasks

'Shiny' apps can often make use of the same key elements; this package provides modules for common tasks (data upload, wrangling data, figure generation and saving the app state), and also a framework for developing. These modules can react and interact as well as generate code to create reproducible analyses.

Maintained by John Harrold. Last updated 19 days ago.

8 stars 7.94 score 100 scripts 1 dependent

dataobservatory-eu

dataset:Create Data Frames that are Easier to Exchange and Reuse

The aim of the 'dataset' package is to make tidy datasets easier to release, exchange and reuse. It organizes and formats data frame 'R' objects into well-referenced, well-described, interoperable datasets in release- and reuse-ready form.

Maintained by Daniel Antal. Last updated 4 days ago.

dataset metadata-management

14 stars 7.89 score 76 scripts 1 dependent

psychbruce

bruceR:Broadly Useful Convenient and Efficient R Functions

Broadly useful convenient and efficient R functions that bring users concise and elegant R data analyses. This package includes easy-to-use functions for (1) basic R programming (e.g., set working directory to the path of currently opened file; import/export data from/to files in any format; print tables to Microsoft Word); (2) multivariate computation (e.g., compute scale sums/means/... with reverse scoring); (3) reliability analyses and factor analyses; (4) descriptive statistics and correlation analyses; (5) t-test, multi-factor analysis of variance (ANOVA), simple-effect analysis, and post-hoc multiple comparison; (6) tidy report of statistical models (to R Console and Microsoft Word); (7) mediation and moderation analyses (PROCESS); and (8) additional toolbox for statistics and graphics.

Maintained by Han-Wu-Shuang Bao. Last updated 10 months ago.

anova data-analysis data-science linear-models linear-regression multilevel-models statistics toolbox

176 stars 7.87 score 316 scripts 3 dependents

dbosak01

sassy:Makes 'R' Easier for Everyone

A meta-package that aims to make 'R' easier for everyone, especially programmers who have a background in 'SAS®' software. This set of packages brings many useful concepts to 'R', including data libraries, data dictionaries, formats and format catalogs, a data step, and a traceable log. The 'flagship' package is a reporting package that can output in text, rich text, 'PDF', 'HTML', and 'DOCX' file formats.

Maintained by David Bosak. Last updated 7 days ago.

21 stars 7.87 score 92 scripts

american-institutes-for-research

EdSurvey:Analysis of NCES Education Survey and Assessment Data

Functions to read in and analyze education survey and assessment data from the National Center for Education Statistics (NCES) <https://nces.ed.gov/>, including National Assessment of Educational Progress (NAEP) data <https://nces.ed.gov/nationsreportcard/> and data from the International Assessment Database: Organisation for Economic Co-operation and Development (OECD) <https://www.oecd.org/en/about/directorates/directorate-for-education-and-skills.html>, including Programme for International Student Assessment (PISA), Teaching and Learning International Survey (TALIS), Programme for the International Assessment of Adult Competencies (PIAAC), and International Association for the Evaluation of Educational Achievement (IEA) <https://www.iea.nl/>, including Trends in International Mathematics and Science Study (TIMSS), TIMSS Advanced, Progress in International Reading Literacy Study (PIRLS), International Civic and Citizenship Study (ICCS), International Computer and Information Literacy Study (ICILS), and Civic Education Study (CivEd).

Maintained by Paul Bailey. Last updated 29 days ago.

10 stars 7.86 score 139 scripts 1 dependent

pmair78

smacof:Multidimensional Scaling

Implements the following approaches for multidimensional scaling (MDS) based on stress minimization using majorization (smacof): ratio/interval/ordinal/spline MDS on symmetric dissimilarity matrices, MDS with external constraints on the configuration, individual differences scaling (idioscal, indscal), MDS with spherical restrictions, and ratio/interval/ordinal/spline unfolding (circular restrictions, row-conditional). Various tools and extensions like jackknife MDS, bootstrap MDS, permutation tests, MDS biplots, gravity models, unidimensional scaling, drift vectors (asymmetric MDS), classical scaling, and Procrustes are implemented as well.

Maintained by Patrick Mair. Last updated 6 months ago.

5 stars 7.86 score 152 scripts 24 dependents

joachim-gassen

ExPanDaR:Explore Your Data Interactively

Provides a shiny-based front end (the 'ExPanD' app) and a set of functions for exploratory data analysis. Run as a web-based app, 'ExPanD' enables users to assess the robustness of empirical evidence without providing them access to the underlying data. You can export a notebook containing the analysis of 'ExPanD' and/or use the functions of the package to support your exploratory data analysis workflow. Refer to the vignettes of the package for more information on how to use 'ExPanD' and/or the functions of this package.

Maintained by Joachim Gassen. Last updated 4 years ago.

accounting eda exploratory-data-analysis finance open-science replication shiny shiny-apps

156 stars 7.80 score 203 scripts

jwiley

JWileymisc:Miscellaneous Utilities and Functions

Miscellaneous tools and functions, including: generate descriptive statistics tables, format output, visualize relations among variables or check distributions, and generic functions for residual and model diagnostics.

Maintained by Joshua F. Wiley. Last updated 3 days ago.

6 stars 7.78 score 241 scripts 4 dependents

obiba

opalr:'Opal' Data Repository Client and 'DataSHIELD' Utils

Data integration Web application for biobanks by 'OBiBa'. 'Opal' is the core database application for biobanks. Participant data, once collected from any data source, must be integrated and stored in a central data repository under a uniform model. 'Opal' is such a central repository. It can import, process, validate, query, analyze, report, and export data. 'Opal' is typically used in a research center to analyze the data acquired at assessment centres. Its ultimate purpose is to achieve seamless data-sharing among biobanks. This 'Opal' client allows users to interact with 'Opal' web services and to perform operations on the R server side. 'DataSHIELD' administration tools are also provided.

Maintained by Yannick Marcon. Last updated 3 months ago.

3 stars 7.76 score 179 scripts 2 dependents

novartis

xgxr:Exploratory Graphics for Pharmacometrics

Supports a structured approach for exploring PKPD data <https://opensource.nibr.com/xgx/>. It also contains helper functions for enabling the modeler to follow best R practices (by appending the program name, figure name location, and draft status to each plot). In addition, it enables the modeler to follow best graphical practices (by providing a theme that reduces chart ink, and by providing time-scale, log-scale, and reverse-log-transform-scale functions for more readable axes). Finally, it provides some data checking and summarizing functions for rapidly exploring pharmacokinetics and pharmacodynamics (PKPD) datasets.

Maintained by Andrew Stein. Last updated 1 year ago.

13 stars 7.76 score 105 scripts 5 dependents

valeriapolicastro

robin:ROBustness in Network

Assesses the robustness of the community structure of a network found by one or more community detection algorithms to give indications about their reliability. It detects if the community structure found by a set of algorithms is statistically significant and compares the different selected detection algorithms on the same network. robin helps to choose, among different community detection algorithms, the one that best fits the network of interest. Reference in Policastro V., Righelli D., Carissimo A., Cutillo L., De Feis I. (2021) <https://journal.r-project.org/archive/2021/RJ-2021-040/index.html>.

Maintained by Valeria Policastro. Last updated 8 days ago.

19 stars 7.72 score 8 scripts

nliulab

AutoScore:An Interpretable Machine Learning-Based Automatic Clinical Score Generator

A novel interpretable machine learning-based framework to automate the development of a clinical scoring model for predefined outcomes. Our novel framework consists of six modules: variable ranking with machine learning, variable transformation, score derivation, model selection, domain knowledge-based score fine-tuning, and performance evaluation. The details are described in our research paper <doi:10.2196/21798>. Users or clinicians could seamlessly generate parsimonious sparse-score risk models (i.e., risk scores), which can be easily implemented and validated in clinical practice. We hope to see its application in various medical case studies.

Maintained by Feng Xie. Last updated 27 days ago.

32 stars 7.70 score 30 scripts

ekstroem

MESS:Miscellaneous Esoteric Statistical Scripts

A mixed collection of useful and semi-useful diverse statistical functions, some of which may even be referenced in The R Primer book. See Ekstrøm, C. T. (2016). The R Primer. 2nd edition. Chapman & Hall.

Maintained by Claus Thorn Ekstrøm. Last updated 1 month ago.

biostatistics power-analysis statistical-analysis statistical-methods statistical-models openblas cpp

4 stars 7.69 score 328 scripts 13 dependents

proteomicslab57357

UniprotR:Retrieving Information of Proteins from Uniprot

Connect to Uniprot <https://www.uniprot.org/> to retrieve information about proteins using their accession numbers, such as name or taxonomy information. For details, see the publication <https://www.sciencedirect.com/science/article/pii/S1874391919303859>.

Maintained by Mohamed Soudy. Last updated 3 years ago.

61 stars 7.65 score 89 scripts 1 dependent

ropengov

retroharmonize:Ex Post Survey Data Harmonization

Assists in reproducible retrospective (ex-post) harmonization of data, particularly individual-level survey data, by providing tools for organizing metadata; standardizing the coding of variables, variable names and value labels, including missing values; and documenting the data transformations, with the help of comprehensive S3 classes.

Maintained by Daniel Antal. Last updated 2 months ago.

ropengov

10 stars 7.62 score 59 scripts

uligges

klaR:Classification and Visualization

Miscellaneous functions for classification and visualization, e.g. regularized discriminant analysis, sknn() kernel-density naive Bayes, an interface to 'svmlight' and stepclass() wrapper variable selection for supervised classification, partimat() visualization of classification rules and shardsplot() of cluster results as well as kmodes() clustering for categorical data, corclust() variable clustering, variable extraction from different variable clustering models and weight of evidence preprocessing.

Maintained by Uwe Ligges. Last updated 1 year ago.

5 stars 7.61 score 1.4k scripts 13 dependents

mmollina

mappoly:Genetic Linkage Maps in Autopolyploids

Construction of genetic maps in autopolyploid full-sib populations. Uses pairwise recombination fraction estimation as the first source of information to sequentially position allelic variants in specific homologous chromosomes. For situations where pairwise analysis has limited power, the algorithm relies on the multilocus likelihood obtained through a hidden Markov model (HMM). For more detail, please see Mollinari and Garcia (2019) <doi:10.1534/g3.119.400378> and Mollinari et al. (2020) <doi:10.1534/g3.119.400620>.

Maintained by Marcelo Mollinari. Last updated 23 days ago.

polyploid polyploid-genetic-mapping polyploidy cpp

27 stars 7.56 score 111 scripts 1 dependent

silvadenisson

electionsBR:R Functions to Download and Clean Brazilian Electoral Data

Offers a set of functions to easily download and clean Brazilian electoral data from the Superior Electoral Court and 'CepespData' websites. Among other features, the package retrieves data on local and federal elections for all positions (city councilor, mayor, state deputy, federal deputy, governor, and president) aggregated by state, city, and electoral zones.

Maintained by Denisson Silva. Last updated 4 months ago.

65 stars 7.54 score 66 scripts

ekstroem

dataMaid:A Suite of Checks for Identification of Potential Errors in a Data Frame as Part of the Data Screening Process

Data screening is an important first step of any statistical analysis. dataMaid autogenerates a customizable data report with a thorough summary of the checks and the results that a human can use to identify possible errors. It provides an extendable suite of tests for common potential errors in a dataset.

Maintained by Claus Thorn Ekstrøm. Last updated 3 years ago.

data-cleaning data-screening reproducible-research

143 stars 7.53 score 236 scripts

beckerbenj

eatGADS:Data Management of Large Hierarchical Data

Import 'SPSS' data, handle and change 'SPSS' metadata, store and access large hierarchical data in 'SQLite' databases.

Maintained by Benjamin Becker. Last updated 2 days ago.

1 star 7.48 score 34 scripts 1 dependent

cardiomoon

editData:'RStudio' Addin for Editing a 'data.frame'

An 'RStudio' addin for editing a 'data.frame' or a 'tibble'. You can delete, add or update rows of a 'data.frame' without coding, and get the resulting data back as a 'data.frame'. The package also provides modularized 'shiny' app code. These modules are intended for reuse across applications.

Maintained by Keon-Woong Moon. Last updated 4 years ago.

32 stars 7.45 score 63 scripts 5 dependents

amices

ggmice:Visualizations for 'mice' with 'ggplot2'

Enhance a 'mice' imputation workflow with visualizations for incomplete and/or imputed data. The plotting functions produce 'ggplot' objects which may be easily manipulated or extended. Use 'ggmice' to inspect missing data, develop imputation models, evaluate algorithmic convergence, or compare observed versus imputed data.

Maintained by Hanne Oberman. Last updated 8 months ago.

ggplot2 mice visualization

32 stars 7.42 score 165 scripts

farhadpishgar

MatchThem:Matching and Weighting Multiply Imputed Datasets

Provides essential tools for the pre-processing techniques of matching and weighting multiply imputed datasets. The package includes functions for matching within and across multiply imputed datasets using various methods, estimating weights for units in the imputed datasets using multiple weighting methods, calculating causal effect estimates in each matched or weighted dataset using parametric or non-parametric statistical models, and pooling the resulting estimates according to Rubin's rules (please see <https://journal.r-project.org/archive/2021/RJ-2021-073/> for more details).

Maintained by Farhad Pishgar. Last updated 5 months ago.

18 stars 7.40 score 112 scripts

ddotta

parquetize:Convert Files to Parquet Format

Collection of functions to get files in parquet format. Parquet is a columnar storage file format <https://parquet.apache.org/>. The files to convert can be of several formats ("csv", "RData", "rds", "RSQLite", "json", "ndjson", "SAS", "SPSS"...).

Maintained by Damien Dotta. Last updated 5 months ago.

conversion convert converter csv parquet sas spss sqlite stata

71 stars 7.36 score 27 scripts 1 dependent

eltebioinformatics

mulea:Enrichment Analysis Using Multiple Ontologies and False Discovery Rate

Background - Traditional gene set enrichment analyses are typically limited to a few ontologies and do not account for the interdependence of gene sets or terms, resulting in overcorrected p-values. To address these challenges, we introduce mulea, an R package offering comprehensive overrepresentation and functional enrichment analysis. Results - mulea employs a progressive empirical false discovery rate (eFDR) method, specifically designed for interconnected biological data, to accurately identify significant terms within diverse ontologies. mulea expands beyond traditional tools by incorporating a wide range of ontologies, encompassing Gene Ontology, pathways, regulatory elements, genomic locations, and protein domains. This flexibility enables researchers to tailor enrichment analysis to their specific questions, such as identifying enriched transcriptional regulators in gene expression data or overrepresented protein domains in protein sets. To facilitate seamless analysis, mulea provides gene sets (in standardised GMT format) for 27 model organisms, covering 22 ontology types from 16 databases and various identifiers resulting in almost 900 files. Additionally, the muleaData ExperimentData Bioconductor package simplifies access to these pre-defined ontologies. Finally, mulea's architecture allows for easy integration of user-defined ontologies, or GMT files from external sources (e.g., MSigDB or Enrichr), expanding its applicability across diverse research areas. Conclusions - mulea is distributed as a CRAN R package. It offers researchers a powerful and flexible toolkit for functional enrichment analysis, addressing limitations of traditional tools with its progressive eFDR and by supporting a variety of ontologies. Overall, mulea fosters the exploration of diverse biological questions across various model organisms.

Maintained by Tamas Stirling. Last updated 4 months ago.

annotation differentialexpression geneexpression genesetenrichment go graphandnetwork multiplecomparison pathways reactome software transcription visualization enrichment enrichment-analysis functional-enrichment-analysis gene-set-enrichment ontologies transcriptomics cpp

28 stars 7.36 score 34 scripts

bioc

gDRimport:Package for handling the import of dose-response data

The package is a part of the gDR suite. It helps to prepare raw drug response data for downstream processing. It mainly contains helper functions for importing/loading/validating dose-response data provided in different file formats.

Maintained by Arkadiusz Gladki. Last updated 10 days ago.

software infrastructure dataimport

3 stars 7.29 score 5 scripts 1 dependent

ibecav

CGPfunctions:Powell Miscellaneous Functions for Teaching and Learning Statistics

Miscellaneous functions useful for teaching statistics as well as actually practicing the art. They typically are not new methods but rather wrappers around either base R or other packages.

Maintained by Chuck Powell. Last updated 4 years ago.

27 stars 7.28 score 122 scripts

modesto-escobar

netCoin:Interactive Analytic Networks

Create interactive analytic networks. It joins the data analysis power of R to obtain coincidences, co-occurrences and correlations, and the visualization libraries of 'JavaScript' in one package.

Maintained by Modesto Escobar. Last updated 22 hours ago.

11 stars 7.22 score 47 scripts

bioc

CRISPRseek:Design of guide RNAs in CRISPR genome-editing systems

The package encompasses functions to find potential guide RNAs for the CRISPR-based genome-editing systems including the Base Editors and the Prime Editors when supplied with target sequences as input. Users have the flexibility to filter resulting guide RNAs based on parameters such as the absence of restriction enzyme cut sites or the lack of paired guide RNAs. The package also facilitates genome-wide exploration for off-targets, offering features to score and rank off-targets, retrieve flanking sequences, and indicate whether the hits are located within exon regions. All detected guide RNAs are annotated with the cumulative scores of the top5 and topN off-targets together with detailed information such as mismatch sites and restriction enzyme cut sites. The package also outputs INDELs and their frequencies for Cas9 targeted sites.

Maintained by Lihua Julie Zhu. Last updated 19 days ago.

immunooncology generegulation sequencematching crispr

7.18 score 51 scripts 2 dependents

mwheymans

psfmi:Prediction Model Pooling, Selection and Performance Evaluation Across Multiply Imputed Datasets

Pooling, backward and forward selection of linear, logistic and Cox regression models in multiply imputed datasets. Backward and forward selection can be done from the pooled model using Rubin's Rules (RR), the D1, D2, D3, D4 and the median p-values method. This is also possible for mixed models. The models can contain continuous, dichotomous, categorical and restricted cubic spline predictors and interaction terms between all these types of predictors. The stability of the models can be evaluated using (cluster) bootstrapping. The package further contains functions to pool model performance measures such as ROC/AUC, Reclassification, R-squared, scaled Brier score, H&L test and calibration plots for logistic regression models. Internal validation can be done across multiply imputed datasets with cross-validation or bootstrapping. The adjusted intercept after shrinkage of pooled regression coefficients can be obtained. Backward and forward selection as part of internal validation is possible. Functions are also available to externally validate logistic prediction models in multiply imputed datasets and to compare models. For Cox models a strata variable can be included. Eekhout (2017) <doi:10.1186/s12874-017-0404-7>. Wiel (2009) <doi:10.1093/biostatistics/kxp011>. Marshall (2009) <doi:10.1186/1471-2288-9-57>.

Maintained by Martijn Heymans. Last updated 2 years ago.

cox-regression imputation imputed-datasets logistic multiple-imputation pool predictor regression selection spline spline-predictors

10 stars 7.17 score 70 scripts

cardiomoon

autoReg:Automatic Linear and Logistic Regression and Survival Analysis

Make summary tables for descriptive statistics and select explanatory variables automatically in various regression models. Supports linear models, generalized linear models and Cox proportional hazards models. Generates publication-ready tables and plots summarizing the results of regression analysis. The tables and plots can be exported in "HTML", "pdf('LaTeX')", "docx('MS Word')" and "pptx('MS Powerpoint')" documents.

Maintained by Keon-Woong Moon. Last updated 1 year ago.

49 stars 7.13 score 69 scripts

roelandkindt

BiodiversityR:Package for Community Ecology and Suitability Analysis

Graphical User Interface (via the R-Commander) and utility functions (often based on the vegan package) for statistical analysis of biodiversity and ecological communities, including species accumulation curves, diversity indices, Renyi profiles, GLMs for analysis of species abundance and presence-absence, distance matrices, Mantel tests, and cluster, constrained and unconstrained ordination analysis. A book on biodiversity and community ecology analysis is available for free download from the website. In 2012, methods for (ensemble) suitability modelling and mapping were expanded in the package.

Maintained by Roeland Kindt. Last updated 2 months ago.

17 stars 7.13 score 390 scripts 2 dependents

openanalytics

clinDataReview:Clinical Data Review Tool

Creation of interactive tables, listings and figures ('TLFs') and associated report for exploratory analysis of data in a clinical trial, e.g. for clinical oversight activities. Interactive figures include sunburst, treemap, scatterplot, line plot and barplot of counts data. Interactive tables include table of summary statistics (as counts of adverse events, enrollment table) and listings. Possibility to compare data (summary table or listing) across two data batches/sets. A clinical data review report is created via study-specific configuration files and template 'R Markdown' reports contained in the package.

Maintained by Laure Cougnaud. Last updated 10 months ago.

11 stars 7.10 score 36 scripts

chaisemartinpackages

DIDmultiplegtDYN:Estimation in Difference-in-Difference Designs with Multiple Groups and Periods

Estimation of heterogeneity-robust difference-in-differences estimators, with a binary, discrete, or continuous treatment, in designs where past treatments may affect the current outcome.

Maintained by Diego Ciccia. Last updated 3 days ago.

42 stars 7.10 score 19 scripts 1 dependent

farrellday

miceRanger:Multiple Imputation by Chained Equations with Random Forests

Multiple Imputation has been shown to be a flexible method to impute missing values by Van Buuren (2007) <doi:10.1177/0962280206074463>. Expanding on this, random forests have been shown to be an accurate model by Stekhoven and Buhlmann <arXiv:1105.0828> to impute missing values in datasets. They have the added benefits of returning out of bag error and variable importance estimates, as well as being simple to run in parallel.

Maintained by Sam Wilson. Last updated 3 years ago.

imputation-methods machine-learning mice missing-data missing-values random-forests

67 stars 7.09 score 41 scripts 1 dependent

john-harrold

ruminate:A Pharmacometrics Data Transformation and Analysis Tool

Exploration of pharmacometrics data involves both general tools (transformation and plotting) and specific techniques (non-compartmental analysis). This kind of exploration is generally accomplished by utilizing different packages. The purpose of 'ruminate' is to create a 'shiny' interface to make these tools more broadly available while creating reproducible results.

Maintained by John Harrold. Last updated 19 days ago.

2 stars 7.06 score 84 scripts

john-d-fox

RcmdrMisc:R Commander Miscellaneous Functions

Various statistical, graphics, and data-management functions used by the Rcmdr package in the R Commander GUI for R.

Maintained by John Fox. Last updated 2 years ago.

1 star 7.02 score 432 scripts 42 dependents

bioc

musicatk:Mutational Signature Comprehensive Analysis Toolkit

Mutational signatures are carcinogenic exposures or aberrant cellular processes that can cause alterations to the genome. We created musicatk (MUtational SIgnature Comprehensive Analysis ToolKit) to address shortcomings in versatility and ease of use in other pre-existing computational tools. Although many different types of mutational data have been generated, current software packages do not have a flexible framework to allow users to mix and match different types of mutations in the mutational signature inference process. Musicatk enables users to count and combine multiple mutation types, including SBS, DBS, and indels. Musicatk calculates replication strand, transcription strand and combinations of these features along with discovery from unique and proprietary genomic features associated with any mutation type. Musicatk also implements several methods for discovery of new signatures as well as methods to infer exposure given an existing set of signatures. Musicatk provides functions for visualization and downstream exploratory analysis including the ability to compare signatures between cohorts and find matching signatures in COSMIC V2 or COSMIC V3.

Maintained by Joshua D. Campbell. Last updated 5 months ago.

software biologicalquestion somaticmutation variantannotation

13 stars 6.97 score 20 scripts

cmerow

rangeModelMetadata:Provides Templates for Metadata Files Associated with Species Range Models

Range Modeling Metadata Standards (RMMS) address three challenges: they (i) are designed for convenience to encourage use, (ii) accommodate a wide variety of applications, and (iii) are extensible to allow the community of range modelers to steer it as needed. RMMS are based on a data dictionary that specifies a hierarchical structure to catalog different aspects of the range modeling process. The dictionary balances a constrained, minimalist vocabulary to improve standardization with flexibility for users to provide their own values. Merow et al. (2019) <DOI:10.1111/geb.12993> describe the standards in more detail. Note that users who prefer to use the R package 'ecospat' can obtain it from <https://github.com/ecospat/ecospat>.

Maintained by Cory Merow. Last updated 8 months ago.

ecological-metadata-language ecological-modelling ecological-models ecology species-distribution-modelling species-distributions

6 stars 6.96 score 16 scripts 3 dependents

insightsengineering

tern.gee:Tables and Graphs for Generalized Estimating Equations (GEE) Model Fits

Generalized estimating equations (GEE) are a popular choice for analyzing longitudinal binary outcomes. This package provides an interface for fitting GEE, currently for logistic regression, within the 'tern' <https://cran.r-project.org/package=tern> framework (Zhu, Sabanés Bové et al., 2023) and for tabulating results easily using 'rtables' <https://cran.r-project.org/package=rtables> (Becker, Waddell et al., 2023). It builds on 'geepack' <doi:10.18637/jss.v015.i02> (Højsgaard, Halekoh and Yan, 2006) for the actual GEE model fitting.

Maintained by Joe Zhu. Last updated 7 months ago.

8 stars 6.94 score 3 scripts 1 dependent

danlwarren

ENMTools:Analysis of Niche Evolution using Niche and Distribution Models

Constructing niche models and analyzing patterns of niche evolution. Acts as an interface for many popular modeling algorithms, and allows users to conduct Monte Carlo tests to address basic questions in evolutionary ecology and biogeography. References: Warren, D.L., R.E. Glor, and M. Turelli (2008) <doi:10.1111/j.1558-5646.2008.00482.x>; Glor, R.E., and D.L. Warren (2011) <doi:10.1111/j.1558-5646.2010.01177.x>; Warren, D.L., R.E. Glor, and M. Turelli (2010) <doi:10.1111/j.1600-0587.2009.06142.x>; Cardillo, M., and D.L. Warren (2016) <doi:10.1111/geb.12455>; D.L. Warren, L.J. Beaumont, R. Dinnage, and J.B. Baumgartner (2019) <doi:10.1111/ecog.03900>.

Maintained by Dan Warren. Last updated 3 months ago.

105 stars 6.91 score 126 scripts

ropensci

essurvey:Download Data from the European Social Survey on the Fly

Download data from the European Social Survey directly from their website <http://www.europeansocialsurvey.org/>. There are two families of functions that allow you to download and interactively check all countries and rounds available.

Maintained by Jorge Cimentada. Last updated 3 years ago.

ess

48 stars 6.88 score 79 scripts

biogen-inc

tidyCDISC:Quick Table Generation & Exploratory Analyses on ADaM-Ish Datasets

Provides users a quick exploratory dive into common visualizations without writing a single line of code, provided the user's data follows the Analysis Data Model (ADaM) standards put forth by the Clinical Data Interchange Standards Consortium (CDISC) <https://www.cdisc.org>. Prominent modules/features of the application are the Table Generator, Population Explorer, and the Individual Explorer. The Table Generator allows users to drag and drop variables and desired statistics (frequencies, means, ANOVA, t-test, and other summary statistics) into bins that automagically create stunning tables with validated information. The Population Explorer offers various plots to visualize general trends in the population from various vantage points. Plot modules currently include scatter plot, spaghetti plot, box plot, histogram, means plot, and bar plot. Each plot type allows the user to plot uploaded variables against one another, and dissect the population by filtering out certain subjects. Last, the Individual Explorer establishes a cohesive patient narrative, allowing the user to interact with patient metrics (params) by visit or plot important patient events on a timeline. All modules allow for concise filtering and downloading bulk outputs into html or pdf formats to save for later.

Maintained by Aaron Clark. Last updated 2 years ago.

pharma rinpharma

108 stars 6.86 score 19 scripts

svmiller

stevemisc:Steve's Miscellaneous Functions

These are miscellaneous functions that I find useful for my research and teaching. The contents include themes for plots, functions for simulating quantities of interest from regression models, functions for simulating various forms of fake data for instructional/research purposes, and many more. All told, the functions provided here are broadly useful for data organization, data presentation, data recoding, and data simulation.

Maintained by Steve Miller. Last updated 18 days ago.

dplyr mixed-effects-models multivariate-normal-distribution tidyverse

10 stars 6.85 score 392 scripts 2 dependents

raymondbalise

rUM:R Templates from the University of Miami

Provides some 'R Markdown' and 'Quarto' templates and a template to create a research project in 'RStudio'.

Maintained by Raymond Balise. Last updated 8 days ago.

rmarkdown

9 stars 6.84 score 16 scripts

stla

qspray:Multivariate Polynomials with Rational Coefficients

Symbolic calculation and evaluation of multivariate polynomials with rational coefficients. This package is strongly inspired by the 'spray' package. It provides a function to compute Gröbner bases (reference <doi:10.1007/978-3-319-16721-3>). It also includes some features for symmetric polynomials, such as the Hall inner product. The header file of the C++ code can be used by other packages. It provides the templated class 'Qspray' that can be used to represent and to deal with multivariate polynomials with another type of coefficients.

Maintained by Stéphane Laurent. Last updated 7 months ago.

gmp polynomials cpp

4 stars 6.81 score 152 scripts 5 dependents

cardiomoon

webr:Data and Functions for Web-Based Analysis

Several analysis-related functions for the book entitled "Web-based Analysis without R in Your Computer" (written in Korean, ISBN 978-89-5566-185-9) by Keon-Woong Moon. The main function plot.htest() shows the distribution of the statistic for an object of class 'htest'.

Maintained by Keon-Woong Moon. Last updated 5 years ago.

33 stars 6.80 score 181 scripts

openanalytics

clinUtils:General Utility Functions for Analysis of Clinical Data

Utility functions to facilitate the import, the reporting and analysis of clinical data. Example datasets in 'SDTM' and 'ADaM' format, containing a subset of patients/domains from the 'CDISC Pilot 01 study' are also available as R datasets to demonstrate the package functionalities.

Maintained by Laure Cougnaud. Last updated 11 months ago.

3 stars 6.78 score 105 scripts 3 dependents

michaellli

evalITR:Evaluating Individualized Treatment Rules

Provides various statistical methods for evaluating Individualized Treatment Rules under randomized data. The provided metrics include Population Average Value (PAV), Population Average Prescription Effect (PAPE), Area Under Prescription Effect Curve (AUPEC). It also provides the tools to analyze Individualized Treatment Rules under budget constraints. Detailed reference in Imai and Li (2019) <arXiv:1905.05389>.

Maintained by Michael Lingzhi Li. Last updated 2 years ago.

14 stars 6.78 score 36 scripts

harrison4192

autostats:Auto Stats

Automatically do statistical exploration. Create formulas using 'tidyselect' syntax, and then determine cross-validated model accuracy and variable contributions using 'glm' and 'xgboost'. Contains additional helper functions to create and modify formulas. Has a flagship function to quickly determine relationships between categorical and continuous variables in the data set.

Maintained by Harrison Tietze. Last updated 24 days ago.

6 stars 6.76 score 5 scripts 2 dependents

big-life-lab

recodeflow:Contains functions to interface with variable details sheets, including recoding variables and converting them to PMML

Recode and harmonize data using variable and details sheets.

Maintained by Yulric Sequeria. Last updated 19 days ago.

6 stars 6.75 score 7 scripts

paytonjjones

networktools:Tools for Identifying Important Nodes in Networks

Includes assorted tools for network analysis. Bridge centrality; goldbricker; MDS, PCA, & eigenmodel network plotting.

Maintained by Payton Jones. Last updated 1 month ago.

10 stars 6.75 score 93 scripts 5 dependents

richardhooijmaijers

shinyMixR:Interactive 'shiny' Dashboard for 'nlmixr2'

An R shiny user interface for the 'nlmixr2' (Fidler et al (2019) <doi:10.1002/psp4.12445>) package, designed to simplify the modeling process for users. Additionally, this package includes supplementary functions to further enhance the usage of 'nlmixr2'.

Maintained by Richard Hooijmaijers. Last updated 5 months ago.

11 stars 6.74 score 28 scripts

marsicofl

mispitools:Missing Person Identification Tools

An open source software package written in the R statistical language. It consists of a set of decision-making tools to conduct missing person searches. In particular, it allows computing the optimal LR threshold for declaring potential matches in DNA-based database searches. More recently, 'mispitools' has incorporated LRs based on preliminary investigation data. The statistical weight of different traces of evidence, such as biological sex, age and hair color, is presented. For citing mispitools please use the following references: Marsico and Caridi, 2023 <doi:10.1016/j.fsigen.2023.102891> and Marsico, Vigeland et al. 2021 <doi:10.1016/j.fsigen.2021.102519>.

Maintained by Franco Marsico. Last updated 3 months ago.

35 stars 6.74 score 19 scripts 1 dependent

imbi-heidelberg

DescrTab2:Publication Quality Descriptive Statistics Tables

Provides functions to create descriptive statistics tables for continuous and categorical variables. By default, summary statistics such as mean, standard deviation, quantiles, minimum and maximum for continuous variables and relative and absolute frequencies for categorical variables are calculated. 'DescrTab2' features a sophisticated algorithm to choose appropriate test statistics for your data and provides p-values. On top of this, confidence intervals for group differences of appropriate summary measures are automatically produced for two-group comparisons. Tables generated by 'DescrTab2' can be integrated in a variety of document formats, including .html, .tex and .docx documents. 'DescrTab2' also allows printing tables to console and saving table objects for later use.

Maintained by Jan Meis. Last updated 1 year ago.

categorical-variables continuous-variable descriptive-statistics p-values statistical-tests statistics

9 stars 6.71 score 19 scripts 1 dependent

harrison4192

presenter:Present Data with Style

Consists of custom wrapper functions using packages 'openxlsx', 'flextable', and 'officer' to create highly formatted MS office friendly output of your data frames. These viewer friendly outputs are intended to match expectations of professional looking presentations in business and consulting scenarios. The functions are opinionated in the sense that they expect the input data frame to have certain properties in order to take advantage of the automated formatting.

Maintained by Harrison Tietze. Last updated 2 years ago.

excel powerpoint

11 stars 6.69 score 15 scripts 4 dependents

carriedaymont

growthcleanr:Data Cleaner for Anthropometric Measurements

Identifies implausible anthropometric (e.g., height, weight) measurements in irregularly spaced longitudinal datasets, such as those from electronic health records.

Maintained by Carrie Daymont. Last updated 29 days ago.

ehr ehr-data

14 stars 6.68 score 41 scripts 1 dependent

theomargel

ProtE:Processing Proteomics Data, Statistical Analysis and Visualization

The 'Proteomics Eye' ('ProtE') offers a comprehensive and intuitive framework for the univariate analysis of label-free proteomics data. By integrating essential data wrangling and processing steps into a single function, 'ProtE' streamlines pairwise statistical comparisons for categorical variables. It provides quality checks and generates publication-ready visualizations, enabling efficient and robust data analysis. 'ProtE' is compatible with proteomics data outputs from 'MaxQuant' (Cox & Mann, (2008) <doi:10.1038/nbt.1511>), 'DIA-NN' (Demichev et al., (2020) <doi:10.1038/s41592-019-0638-x>), and 'Proteome Discoverer' (Thermo Fisher Scientific, version 2.5). The package leverages 'ggplot2' for visualization (Wickham, (2016) <doi:10.1007/978-3-319-24277-4>) and 'limma' for statistical analysis (Ritchie et al., (2015) <doi:10.1093/nar/gkv007>).

Maintained by Theodoros Margelos. Last updated 10 days ago.

6.61 score 2 scripts

joon-e

tidycomm:Data Modification and Analysis for Communication Research

Provides convenience functions for common data modification and analysis tasks in communication research. This includes functions for univariate and bivariate data analysis, index generation and reliability computation, and intercoder reliability tests. All functions follow the style and syntax of the tidyverse, and are designed to perform their computations on multiple variables at once. Functions for univariate and bivariate data analysis comprise summary statistics for continuous and categorical variables, as well as several tests of bivariate association including effect sizes. Functions for data modification comprise index generation and automated reliability analysis of index variables. Functions for intercoder reliability comprise tests of several intercoder reliability estimates, including simple and mean pairwise percent agreement, Krippendorff's Alpha (Krippendorff 2004, ISBN: 9780761915454), and various Kappa coefficients (Brennan & Prediger 1981 <doi:10.1177/001316448104100307>; Cohen 1960 <doi:10.1177/001316446002000104>; Fleiss 1971 <doi:10.1037/h0031619>).

Maintained by Julian Unkel. Last updated 11 months ago.

15 stars 6.59 score 52 scripts

agnesdeng

mixgb:Multiple Imputation Through 'XGBoost'

Multiple imputation using 'XGBoost', subsampling, and predictive mean matching as described in Deng and Lumley (2023) <doi:10.1080/10618600.2023.2252501>. The package supports various types of variables, offers flexible settings, and enables saving an imputation model to impute new data. Data processing and memory usage have been optimised to speed up the imputation process.

Maintained by Yongshi Deng. Last updated 2 months ago.

cpp openmp

23 stars 6.58 score 82 scripts

stamats

MKinfer:Inferential Statistics

Computation of various confidence intervals (Altman et al. (2000), ISBN:978-0-727-91375-3; Hedderich and Sachs (2018), ISBN:978-3-662-56657-2) including bootstrapped versions (Davison and Hinkley (1997), ISBN:978-0-511-80284-3) as well as Hsu (Hedderich and Sachs (2018), ISBN:978-3-662-56657-2), permutation (Janssen (1997), <doi:10.1016/S0167-7152(97)00043-6>), bootstrap (Davison and Hinkley (1997), ISBN:978-0-511-80284-3), intersection-union (Sozu et al. (2015), ISBN:978-3-319-22005-5) and multiple imputation (Barnard and Rubin (1999), <doi:10.1093/biomet/86.4.948>) t-test; furthermore, computation of intersection-union z-test as well as multiple imputation Wilcoxon tests. Graphical visualization by volcano and Bland-Altman plots (Bland and Altman (1986), <doi:10.1016/S0140-6736(86)90837-8>; Shieh (2018), <doi:10.1186/s12874-018-0505-y>).

Maintained by Matthias Kohl. Last updated 12 months ago.

6 stars 6.56 score 71 scripts 4 dependents

bioc

methylclock:Methylclock - DNA methylation-based clocks

This package allows estimation of chronological and gestational DNA methylation (DNAm) age as well as biological age using different methylation clocks. Chronological DNAm age (in years): Horvath's clock, Hannum's clock, BNN, Horvath's skin+blood clock, PedBE clock and Wu's clock. Gestational DNAm age: Knight's clock, Bohlin's clock, Mayne's clock and Lee's clocks. Biological DNAm clocks: Levine's clock and Telomere Length's clock.

Maintained by Dolors Pelegri-Siso. Last updated 5 months ago.

dnamethylation biologicalquestion preprocessing statisticalmethod normalization cpp

39 stars 6.52 score 28 scripts

huanglabumn

oncoPredict:Drug Response Modeling and Biomarker Discovery

Allows for building drug response models using screening data between bulk RNA-Seq and a drug response metric, and provides two additional tools for biomarker discovery that have been developed by the Huang Laboratory at the University of Minnesota. There are 3 main functions within this package. (1) calcPhenotype is used to build drug response models on RNA-Seq data and impute them on any other RNA-Seq dataset given to the model. (2) GLDS is used to calculate the general level of drug sensitivity, which can improve biomarker discovery. (3) IDWAS can take the results from calcPhenotype and link the imputed response back to available genomic data (mutation and CNV alterations) to identify biomarkers. Each of these functions comes from a paper from the Huang research laboratory. The relevant paper for each function is given below. calcPhenotype - Geeleher et al, Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. GLDS - Geeleher et al, Cancer biomarker discovery is improved by accounting for variability in general levels of drug sensitivity in pre-clinical models. IDWAS - Geeleher et al, Discovering novel pharmacogenomic biomarkers by imputing drug response in cancer patients from large genomics studies.

Maintained by Robert Gruener. Last updated 12 months ago.

sva preprocesscore stringr biomart genefilter org.hs.eg.db genomicfeatures txdb.hsapiens.ucsc.hg19.knowngene tcgabiolinks biocgenerics genomicranges iranges s4vectors

18 stars 6.47 score 41 scripts

husson

Factoshiny:Perform Factorial Analysis from 'FactoMineR' with a Shiny Application

Perform factorial analysis with a menu and draw graphs interactively thanks to 'FactoMineR' and a Shiny application.

Maintained by Francois Husson. Last updated 2 months ago.

9 stars 6.46 score 152 scripts

cardiomoon

rrtable:Reproducible Research with a Table of R Codes

Makes documents containing plots and tables from a table of R codes. Can make "HTML", "pdf('LaTeX')", "docx('MS Word')" and "pptx('MS PowerPoint')" documents with or without R code. In the package, modularized 'shiny' app codes are provided. These modules are intended for reuse across applications.

Maintained by Keon-Woong Moon. Last updated 2 years ago.

3 stars 6.45 score 76 scripts 2 dependents

agrdatasci

gosset:Tools for Data Analysis in Experimental Agriculture

Methods to analyse experimental agriculture data, from data synthesis to model selection and visualisation. The package is named after W.S. Gosset aka ‘Student’, a pioneer of modern statistics in small sample experimental design and analysis.

Maintained by Kauê de Sousa. Last updated 4 months ago.

experimental-design rankings-data

6 stars 6.44 score 23 scripts

bioc

gwasurvivr:gwasurvivr: an R package for genome wide survival analysis

gwasurvivr is a package to perform survival analysis using Cox proportional hazard models on imputed genetic data.

Maintained by Abbas Rizvi. Last updated 5 months ago.

genomewideassociation survival regression genetics snp geneticvariability pharmacogenomics biomedicalinformatics

12 stars 6.43 score 75 scripts

florianstijven

Surrogate:Evaluation of Surrogate Endpoints in Clinical Trials

In a clinical trial, it frequently occurs that the most credible outcome to evaluate the effectiveness of a new therapy (the true endpoint) is difficult to measure. In such a situation, it can be an effective strategy to replace the true endpoint by a (bio)marker that is easier to measure and that allows for a prediction of the treatment effect on the true endpoint (a surrogate endpoint). The package 'Surrogate' allows for an evaluation of the appropriateness of a candidate surrogate endpoint based on the meta-analytic, information-theoretic, and causal-inference frameworks. Part of this software has been developed using funding provided from the European Union's Seventh Framework Programme for research, technological development and demonstration (Grant Agreement no 602552), the Special Research Fund (BOF) of Hasselt University (BOF-number: BOF2OCPO3), GlaxoSmithKline Biologicals, Baekeland Mandaat (HBC.2022.0145), and Johnson & Johnson Innovative Medicine.

Maintained by Wim Van Der Elst. Last updated 1 month ago.

1 star 6.42 score 133 scripts

kjhealy

gssr:US General Social Survey (GSS) Data for R

The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains the GSS Cumulative Data and GSS Panel Data files packaged for R. Its companion package, gssrdoc, provides the codebook integrated into R's help system. For more information on the GSS see <http://gss.norc.org>.

Maintained by Kieran Healy. Last updated 5 months ago.

45 stars 6.42 score 147 scripts

anthonydevaux

DynForest:Random Forest with Multivariate Longitudinal Predictors

Based on the random forest principle, 'DynForest' is able to include multiple longitudinal predictors to provide individual predictions. Longitudinal predictors are modeled through the random forest. The methodology is fully described for a survival outcome in: Devaux, Helmer, Genuer & Proust-Lima (2023) <doi:10.1177/09622802231206477>.

Maintained by Anthony Devaux. Last updated 5 months ago.

16 stars 6.38 score 8 scripts

acaimo

Bergm:Bayesian Exponential Random Graph Models

Bayesian analysis for exponential random graph models using advanced computational algorithms. More information can be found at: <https://acaimo.github.io/Bergm/>.

Maintained by Alberto Caimo. Last updated 2 months ago.

16 stars 6.37 score 31 scripts 4 dependents

jacobkap

asciiSetupReader:Reads Fixed-Width ASCII Data Files (.txt or .dat) that Have Accompanying Setup Files (.sps or .sas)

Lets you open a fixed-width ASCII file (.txt or .dat) that has an accompanying setup file (.sps or .sas). These file combinations are sometimes referred to as .txt+.sps, .txt+.sas, .dat+.sps, or .dat+.sas. This will only work with a txt-sps or txt-sas pair in which the setup file contains instructions to open that text file. It will NOT open other text files, .sav, .sas, or .por data files. Fixed-width ASCII files with setup files are common in older (pre-2000) government data.

Maintained by Jacob Kaplan. Last updated 8 months ago.

ascii dat data-reader fixed-width fixed-width-parser fixed-width-tables fixed-width-text sas spss

11 stars 6.34 score 22 scripts 1 dependent

sbg

sevenbridges2:The 'Seven Bridges Platform' API Client

R client and utilities for 'Seven Bridges Platform' API, from 'Cancer Genomics Cloud' to other 'Seven Bridges' supported platforms. API documentation is hosted publicly at <https://docs.sevenbridges.com/docs/the-api>.

Maintained by Marko Trifunovic. Last updated 4 days ago.

api-client bioinformatics cloud sevenbridges

3 stars 6.32 score 4 scripts

jangraffelman

HardyWeinberg:Statistical Tests and Graphics for Hardy-Weinberg Equilibrium

Contains tools for exploring Hardy-Weinberg equilibrium (Hardy, 1908; Weinberg, 1908) for bi- and multi-allelic genetic marker data. All classical tests (chi-square, exact, likelihood-ratio and permutation tests) with bi-allelic variants are included in the package, as well as functions for power computation and for the simulation of marker data under equilibrium and disequilibrium. Routines for dealing with markers on the X-chromosome are included (Graffelman & Weir, 2016) <doi:10.1038/hdy.2016.20>, including Bayesian procedures. Some exact and permutation procedures also work with multi-allelic variants. Special test procedures that jointly address Hardy-Weinberg equilibrium and equality of allele frequencies in both sexes are supplied, for the bi- and multi-allelic case. Functions for testing equilibrium in the presence of missing data by using multiple imputation are also provided. Implements several graphics for exploring the equilibrium status of a large set of bi-allelic markers: ternary plots with acceptance regions, log-ratio plots and Q-Q plots. The functionality of the package is explained in detail in a related JSS paper <doi:10.18637/jss.v064.i03>.

Maintained by Jan Graffelman. Last updated 12 months ago.

cpp

6.30 score 167 scripts 4 dependents

phuse-org

sendigR:Enable Cross-Study Analysis of 'CDISC' 'SEND' Datasets

A system that enables cross-study analysis by extracting and filtering study data for control animals from a 'CDISC' 'SEND' study repository. These data types are supported: Body Weights, Laboratory test results and Microscopic findings. These database types are supported: 'SQLite' and 'Oracle'.

Maintained by Wenxian Wang. Last updated 23 days ago.

12 stars 6.28 score 6 scripts

bioc

RAIDS:Accurate Inference of Genetic Ancestry from Cancer Sequences

This package implements specialized algorithms that enable genetic ancestry inference from various cancer sequence sources (RNA, Exome and Whole-Genome sequences). This package also implements a simulation algorithm that generates synthetic cancer-derived data. This code and analysis pipeline was designed and developed for the following publication: Belleau, P et al. Genetic Ancestry Inference from Cancer-Derived Molecular Data across Genomic and Transcriptomic Platforms. Cancer Res 1 January 2023; 83 (1): 49–58.

Maintained by Pascal Belleau. Last updated 5 months ago.

genetics software sequencing wholegenome principalcomponent geneticvariability dimensionreduction biocviews ancestry cancer-genomics exome-sequencing genomics inference r-language rna-seq rna-sequencing whole-genome-sequencing

5 stars 6.23 score 19 scripts

sjmack

HLAtools:Toolkit for HLA Immunogenomics

A toolkit for the analysis and management of data for genes in the so-called "Human Leukocyte Antigen" (HLA) region. Functions extract reference data from the Anthony Nolan HLA Informatics Group/ImmunoGeneTics HLA 'GitHub' repository (ANHIG/IMGTHLA) <https://github.com/ANHIG/IMGTHLA>, validate Genotype List (GL) Strings, convert between UNIFORMAT and GL String Code (GLSC) formats, translate HLA alleles and GLSCs across ImmunoPolymorphism Database (IPD) IMGT/HLA Database release versions, identify differences between pairs of alleles at a locus, generate customized, multi-position sequence alignments, trim and convert allele-names across nomenclature epochs, and extend existing data-analysis methods.

Maintained by Steven Mack. Last updated 26 days ago.

4 stars 6.21 score 7 scripts 1 dependent

jmping

weights:Weighting and Weighted Statistics

Provides a variety of functions for producing simple weighted statistics, such as weighted Pearson's correlations, partial correlations, Chi-Squared statistics, histograms, and t-tests. Also now includes some software for quickly recoding survey data and plotting estimates from interaction terms in regressions (and multiply imputed regressions) both with and without weights. NOTE: Weighted partial correlation calculations pulled to address a bug.

Maintained by Josh Pasek. Last updated 4 years ago.

6.20 score 590 scripts 40 dependents

ekstroem

dataReporter:Reproducible Data Screening Checks and Report of Possible Errors

Data screening is an important first step of any statistical analysis. 'dataReporter' auto-generates a customizable data report with a thorough summary of the checks and the results that a human can use to identify possible errors. It provides an extendable suite of tests for common potential errors in a dataset. See Petersen AH, Ekstrøm CT (2019). "dataMaid: Your Assistant for Documenting Supervised Data Quality Screening in R." Journal of Statistical Software, 90(6), 1-38 <doi:10.18637/jss.v090.i06> for more information.

Maintained by Claus Thorn Ekstrøm. Last updated 2 years ago.

86 stars 6.16 score 34 scripts

nataliepatten

gatoRs:Geographic and Taxonomic Occurrence R-Based Scrubbing

Streamlines downloading and cleaning biodiversity data from Integrated Digitized Biocollections (iDigBio) and the Global Biodiversity Information Facility (GBIF).

Maintained by Natalie N. Patten. Last updated 11 months ago.

11 stars 6.16 score 66 scripts

joliencremers

bpnreg:Bayesian Projected Normal Regression Models for Circular Data

Fitting Bayesian multiple and mixed-effect regression models for circular data based on the projected normal distribution. Both continuous and categorical predictors can be included. Sampling from the posterior is performed via an MCMC algorithm. Posterior descriptives of all parameters, model fit statistics and Bayes factors for hypothesis tests for inequality constrained hypotheses are provided. See Cremers, Mulder & Klugkist (2018) <doi:10.1111/bmsp.12108> and Nuñez-Antonio & Gutiérrez-Peña (2014) <doi:10.1016/j.csda.2012.07.025>.

Maintained by Jolien Cremers. Last updated 1 year ago.

openblas cpp openmp

14 stars 6.15 score 101 scripts

calcita

ech:Downloading and Processing Microdata from ECH-INE (Uruguay)

A consistent tool for downloading ECH data, processing them and generating new indicators: poverty, education, employment, etc. All data are downloaded from the official site of the National Institute of Statistics at <https://www.gub.uy/instituto-nacional-estadistica/datos-y-estadisticas/encuestas/encuesta-continua-hogares>.

Maintained by Gabriela Mathieu. Last updated 1 year ago.

16 stars 6.15 score 22 scripts

nlmixr2

babelmixr2:Use 'nlmixr2' to Interact with Open Source and Commercial Software

Run other estimation and simulation software via the 'nlmixr2' (Fidler et al (2019) <doi:10.1002/psp4.12445>) interface including 'PKNCA', 'NONMEM' and 'Monolix'. While not required, you can get/install the 'lixoftConnectors' package in the 'Monolix' installation, as described at the following URL <https://monolixsuite.slp-software.com/r-functions/2024R1/installation-and-initialization>. When 'lixoftConnectors' is available, 'Monolix' can be run directly instead of setting up command line usage.

Maintained by Matthew Fidler. Last updated 18 days ago.

monolix nonmem pharmacometrics cpp

9 stars 6.11 score 53 scripts

big-life-lab

cchsflow:Transforming and Harmonizing CCHS Variables

Supporting the use of the Canadian Community Health Survey (CCHS) by transforming variables from each cycle into harmonized, consistent versions that span survey cycles (currently, 2001 to 2018). CCHS data used in this library is accessed and adapted in accordance with the Statistics Canada Open Licence Agreement. This package uses rec_with_table(), which was developed from 'sjmisc' rec(). Lüdecke D (2018). "sjmisc: Data and Variable Transformation Functions". Journal of Open Source Software, 3(26), 754. <doi:10.21105/joss.00754>.

Maintained by Kitty Chen. Last updated 1 year ago.

cchs opensci openscience

12 stars 6.02 score 192 scripts

danchaltiel

EDCimport:Import Data from EDC Software

A convenient toolbox to import data exported from Electronic Data Capture (EDC) software 'TrialMaster'.

Maintained by Dan Chaltiel. Last updated 19 days ago.

6.01 score 12 scripts

eu-ecdc

epitweetr:Early Detection of Public Health Threats from 'Twitter' Data

It allows you to automatically monitor trends of tweets by time, place and topic, aiming at detecting public health threats early through the detection of signals (e.g. an unusual increase in the number of tweets). It was designed to focus on infectious diseases, and it can be extended to all hazards or other fields of study by modifying the topics and keywords. More information is available in the 'epitweetr' peer-reviewed publication (doi:10.2807/1560-7917.ES.2022.27.39.2200177).

Maintained by Laura Espinosa. Last updated 1 year ago.

early-warning-systems epidemic-surveillance lucene machine-learning signal-detection spark twitter

56 stars 5.98 score 86 scripts

gbganalyst

bulkreadr:The Ultimate Tool for Reading Data in Bulk

Designed to simplify and streamline the process of reading and processing large volumes of data in R, this package offers a collection of functions tailored for bulk data operations. It enables users to efficiently read multiple sheets from Microsoft Excel and Google Sheets workbooks, as well as various CSV files from a directory. The data is returned as organized data frames, facilitating further analysis and manipulation. Ideal for handling extensive data sets or batch processing tasks, bulkreadr empowers users to manage data in bulk effortlessly, saving time and effort in data preparation workflows. Additionally, the package seamlessly works with labelled data from SPSS and Stata.

Maintained by Ezekiel Ogundepo. Last updated 7 months ago.

bulkreader csv-reader data-import googlesheets missing-values xlsxreader

12 stars 5.94 score 12 scripts

bioc

SCOPE:A normalization and copy number estimation method for single-cell DNA sequencing

Whole genome single-cell DNA sequencing (scDNA-seq) enables characterization of copy number profiles at the cellular level. This circumvents the averaging effects associated with bulk-tissue sequencing and has increased resolution yet decreased ambiguity in deconvolving cancer subclones and elucidating cancer evolutionary history. ScDNA-seq data is, however, sparse, noisy, and highly variable even within a homogeneous cell population, due to the biases and artifacts that are introduced during the library preparation and sequencing procedure. Here, we propose SCOPE, a normalization and copy number estimation method for scDNA-seq data. The distinguishing features of SCOPE include: (i) utilization of cell-specific Gini coefficients for quality controls and for identification of normal/diploid cells, which are further used as negative control samples in a Poisson latent factor model for normalization; (ii) modeling of GC content bias using an expectation-maximization algorithm embedded in the Poisson generalized linear models, which accounts for the different copy number states along the genome; (iii) a cross-sample iterative segmentation procedure to identify breakpoints that are shared across cells from the same genetic background.

Maintained by Rujin Wang. Last updated 5 months ago.

singlecell normalization copynumbervariation sequencing wholegenome coverage alignment qualitycontrol dataimport dnaseq

5.92 score 84 scripts

swissclinicaltrialorganisation

secuTrialR:Handling of Data from the Clinical Data Management System 'secuTrial'

Seamless and standardized interaction with data exported from the clinical data management system (CDMS) 'secuTrial' <https://www.secutrial.com>. The primary data export the package works with is a standard non-rectangular export.

Maintained by Alan G. Haynes. Last updated 10 months ago.

9 stars 5.91 score 15 scripts

bruigtp

REDCapDM:'REDCap' Data Management

REDCapDM is an R package that allows users to manage data exported directly from REDCap or retrieved using an API connection. This package includes several functions designed for pre-processing data, generating reports of queries such as outliers or missing values, and following up on the identified queries. 'REDCap' (Research Electronic Data CAPture; <https://projectredcap.org>) is a web application developed at Vanderbilt University, designed for creating and managing online surveys and databases. The REDCap API is an interface that allows external applications to connect to REDCap remotely, and is used to programmatically retrieve or modify project data or settings within REDCap, such as importing or exporting data.

Maintained by João Carmezim. Last updated 15 days ago.

4 stars 5.89 score 9 scripts

tirgit

missCompare:Intuitive Missing Data Imputation Framework

Offers a convenient pipeline to test and compare various missing data imputation algorithms on simulated and real data. These include simpler methods, such as mean and median imputation and random replacement, but also include more sophisticated algorithms already implemented in popular R packages, such as 'mi', described by Su et al. (2011) <doi:10.18637/jss.v045.i02>; 'mice', described by van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>; 'missForest', described by Stekhoven and Buhlmann (2012) <doi:10.1093/bioinformatics/btr597>; 'missMDA', described by Josse and Husson (2016) <doi:10.18637/jss.v070.i01>; and 'pcaMethods', described by Stacklies et al. (2007) <doi:10.1093/bioinformatics/btm069>. The central assumption behind 'missCompare' is that structurally different datasets (e.g. larger datasets with a large number of correlated variables vs. smaller datasets with non-correlated variables) will benefit differently from different missing data imputation algorithms. 'missCompare' takes measurements of your dataset and sets up a sandbox to try a curated list of standard and sophisticated missing data imputation algorithms and compares them assuming custom missingness patterns. 'missCompare' will also impute your real-life dataset for you after the selection of the best performing algorithm in the simulations. The package also provides various post-imputation diagnostics and visualizations to help you assess imputation performance.

Maintained by Tibor V. Varga. Last updated 4 years ago.

comparison comparison-benchmarks imputation imputation-algorithm imputation-methods imputations kolmogorov-smirnov missing missing-data missing-data-imputation missing-status-check missing-values missingness post-imputation-diagnostics rmse

39 stars 5.89 score 40 scripts

flr

FLBEIA:Bio-Economic Impact Assessment of Management Strategies using FLR

A simulation toolbox that describes a fishery system under a Management Strategy Evaluation approach. The objective of the model is to facilitate the Bio-Economic evaluation of Management strategies. It is multistock, multifleet and seasonal. The simulation is divided into 2 main blocks, the Operating Model (OM) and the Management Procedure (MP). In turn, each of these two blocks is divided into 3 components: the biological, the fleets and the covariables on the one hand, and the observation, the assessment and the advice on the other.

Maintained by FLBEIA Team. Last updated 17 days ago.

cpp

11 stars 5.89 score 156 scripts

bioc

miRspongeR:Identification and analysis of miRNA sponge regulation

This package provides several functions to explore miRNA sponge (also called ceRNA or miRNA decoy) regulation from putative miRNA-target interactions and/or transcriptomics data (including bulk, single-cell and spatial gene expression data). It provides eight popular methods for identifying miRNA sponge interactions and an integrative method to combine miRNA sponge interactions from different methods, as well as functions to validate miRNA sponge interactions, infer miRNA sponge modules, and conduct enrichment analysis and survival analysis of miRNA sponge modules. By using a sample control variable strategy, it provides a function to infer sample-specific miRNA sponge interactions. Using the sample-specific miRNA sponge interactions, it implements three similarity methods to construct a sample-sample correlation network.

Maintained by Junpeng Zhang. Last updated 5 months ago.

geneexpression biomedicalinformatics networkenrichment survival microarray software singlecell spatial rnaseq cerna mirna sponge

5 stars 5.88 score 8 scripts

edhofman

ReSurv:Machine Learning Models For Predicting Claim Counts

Prediction of claim counts using the feature based development factors introduced in the manuscript <doi:10.48550/arXiv.2312.14549>. Implementation of Neural Networks, Extreme Gradient Boosting, and Cox model with splines to optimise the partial log-likelihood of proportional hazard models.

Maintained by Emil Hofman. Last updated 5 months ago.

2 stars 5.87 score 21 scripts

dopatendo

ILSAmerge:Merge and Download International Large-Scale Assessments (ILSA) Data

Merges and downloads 'SPSS' data from different International Large-Scale Assessments (ILSA), including: Trends in International Mathematics and Science Study (TIMSS), Progress in International Reading Literacy Study (PIRLS), and others.

Maintained by Andrés Christiansen. Last updated 1 month ago.

2 stars 5.86 score 12 scripts

dickoa

robotoolbox:Client for the 'KoboToolbox' API

Suite of utilities for accessing and manipulating data from the 'KoboToolbox' API. 'KoboToolbox' is a robust platform designed for field data collection in various disciplines. This package aims to simplify the process of fetching and handling data from the API. Detailed documentation for the 'KoboToolbox' API can be found at <https://support.kobotoolbox.org/api.html>.

Maintained by Ahmadou Dicko. Last updated 3 months ago.

open-data kobotoolbox odk kpi api data dataset

5.86 score 48 scripts

cardiomoon

ggplotAssist:'RStudio' Addin for Teaching and Learning 'ggplot2'

An 'RStudio' addin for teaching and learning how to make plots with the 'ggplot2' package. You can learn each step of making a plot by clicking your mouse, without coding, and you can get the resulting code for the plot.

Maintained by Keon-Woong Moon. Last updated 7 years ago.

79 stars 5.85 score 18 scripts

hsvab

odbr:Download Data from Brazil's Origin Destination Surveys

Download data from Brazil's Origin Destination Surveys. The package covers data from household travel surveys, dictionaries of variables, and the spatial geometries of surveys conducted in different years and across various urban areas in Brazil. For some cities, the package will include enhanced versions of the data sets with variables "harmonized" across different years.

Maintained by Haydee Svab. Last updated 1 month ago.

16 stars 5.85 score 11 scripts

rpruim

fastR2:Foundations and Applications of Statistics Using R (2nd Edition)

Data sets and utilities to accompany the second edition of "Foundations and Applications of Statistics: an Introduction using R" (R Pruim, published by AMS, 2017), a text covering topics from probability and mathematical statistics at an advanced undergraduate level. R is integrated throughout, and access to all the R code in the book is provided via the snippet() function.

Maintained by Randall Pruim. Last updated 1 year ago.

13 stars 5.85 score 108 scripts

bayer-group

adepro:A 'shiny' Application for the (Audio-)Visualization of Adverse Event Profiles

Contains a 'shiny' application called AdEPro (Animation of Adverse Event Profiles) which (audio-)visualizes adverse events occurring in clinical trials. As this data is usually considered sensitive, this tool is provided as a stand-alone application that can be launched from any local machine on which the data is stored.

Maintained by Nicole Rethemeier. Last updated 5 days ago.

adverse-events bayer-not-classified bayer-reg-none beat-not-applicable clinical-trials data-insights shiny-apps visualization

7 stars 5.84 score 11 scripts

agdamsbo

REDCapCAST:REDCap Metadata Casting and Castellated Data Handling

Casting metadata for REDCap database creation and handling of castellated data using repeated instruments and longitudinal projects in 'REDCap'. Keeps a focused data export approach by allowing export of only the required data from the database. Also for casting new REDCap databases based on datasets from other sources. Originally forked from the R part of 'REDCapRITS' by Paul Egeler. See <https://github.com/pegeler/REDCapRITS>. 'REDCap' (Research Electronic Data Capture) is a secure, web-based software platform designed to support data capture for research studies, providing 1) an intuitive interface for validated data capture; 2) audit trails for tracking data manipulation and export procedures; 3) automated export procedures for seamless data downloads to common statistical packages; and 4) procedures for data integration and interoperability with external sources (Harris et al (2009) <doi:10.1016/j.jbi.2008.08.010>; Harris et al (2019) <doi:10.1016/j.jbi.2019.103208>).

Maintained by Andreas Gammelgaard Damsbo. Last updated 19 days ago.

1 star 5.84 score 12 scripts

bioc

ISAnalytics:Analyze gene therapy vector insertion sites data identified from genomics next generation sequencing reads for clonal tracking studies

In gene therapy, stem cells are modified using viral vectors to deliver the therapeutic transgene and replace functional properties since the genetic modification is stable and inherited in all cell progeny. The retrieval and mapping of the sequences flanking the virus-host DNA junctions allows the identification of insertion sites (IS), essential for monitoring the evolution of genetically modified cells in vivo. A comprehensive toolkit for the analysis of IS is required to foster clonal tracking studies and support the assessment of safety and long-term efficacy in vivo. This package is aimed at (1) supporting automation of the IS workflow, (2) performing basic and advanced analyses for IS tracking (clonal abundance, clonal expansions and statistics for insertional mutagenesis, etc.), and (3) providing basic biological insights into transduced stem cells in vivo.

Maintained by Francesco Gazzo. Last updated 4 months ago.

biomedicalinformatics sequencing singlecell

3 stars 5.83 score 15 scripts

jdjohn215

pollster:Calculate Crosstab and Topline Tables of Weighted Survey Data

Calculate common types of tables for weighted survey data. Options include topline and (2-way and 3-way) crosstab tables of categorical or ordinal data as well as summary tables of weighted numeric variables. Optionally, include the margin of error at selected confidence intervals including the design effect. The design effect is calculated as described by Kish (1965) <doi:10.1002/bimj.19680100122> beginning on page 257. Output takes the form of tibbles (simple data frames). This package conveniently handles labelled data, such as that commonly used by 'Stata' and 'SPSS.' Complex survey design is not supported at this time.

Maintained by John D. Johnson. Last updated 2 years ago.

9 stars 5.80 score 47 scripts
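
The design-effect adjustment mentioned in the 'pollster' description can be illustrated with Kish's approximation for weighting, deff = n · Σw² / (Σw)². The sketch below is in Python and is not the package's code: the function names are hypothetical, the margin-of-error formula is the textbook one for a proportion inflated by deff, and the package's exact formula may differ (for example in its denominator).

```python
import math


def kish_design_effect(weights):
    """Kish's approximate design effect due to unequal weights."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def margin_of_error(p, weights, z=1.96):
    """Margin of error for a weighted proportion p, inflated by the design effect.

    z = 1.96 corresponds to a 95% confidence level (an assumption of this sketch).
    """
    n = len(weights)
    deff = kish_design_effect(weights)
    return z * math.sqrt(deff * p * (1.0 - p) / n)


# Equal weights carry no penalty; unequal weights inflate the variance.
equal = kish_design_effect([2.0] * 10)
unequal = kish_design_effect([1.0, 1.0, 1.0, 3.0])
moe = margin_of_error(0.5, [1.0, 1.0, 1.0, 3.0])
```

With equal weights the design effect is exactly 1, so the margin of error reduces to the simple-random-sample value; the more variable the weights, the wider the interval.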

bioc

benchdamic:Benchmark of differential abundance methods on microbiome data

Starting from a microbiome dataset (16S or WMS with absolute count values), it is possible to perform several analyses to assess the performance of many differential abundance detection methods. A basic and standardized version of the main differential abundance analysis methods is supplied, but users can also add their own methods to the benchmark. The analyses focus on 4 main aspects: i) the goodness of fit of each method's distributional assumptions on the observed count data, ii) the ability to control the false discovery rate, iii) the within- and between-method concordances, iv) the truthfulness of the findings if any a priori knowledge is given. Several graphical functions are available for result visualization.

Maintained by Matteo Calgaro. Last updated 4 months ago.

metagenomics microbiome differentialexpression multiplecomparison normalization preprocessing software benchmark differential-abundance-methods

8 stars 5.78 score 8 scripts

drsimonspencer

AMISforInfectiousDiseases:Implement the AMIS Algorithm for Infectious Disease Models

Implements the Adaptive Multiple Importance Sampling (AMIS) algorithm, as described by Retkute et al. (2021, <doi:10.1214/21-AOAS1486>), to estimate key epidemiological parameters by combining outputs from a geostatistical model of infectious diseases (such as prevalence, incidence, or relative risk) with a disease transmission model. Utilising the resulting posterior distributions, the package enables forward projections at the local level.

Maintained by Simon Spencer. Last updated 2 months ago.

cpp openmp

5.78 score 6 scripts

richardli

surveyPrev:Mapping the Prevalence of Binary Indicators using Survey Data in Small Areas

Provides a pipeline to perform small area estimation and prevalence mapping of binary indicators using health and demographic survey data, described in Fuglstad et al. (2022) <doi:10.48550/arXiv.2110.09576> and Wakefield et al. (2020) <doi:10.1111/insr.12400>.

Maintained by Qianyu Dong. Last updated 20 hours ago.

1 star 5.76 score 11 scripts

homerhanumat

tigerstats:R Functions for Elementary Statistics

A collection of data sets and functions that are useful in the teaching of statistics at an elementary level to students who may have little or no previous experience with the command line. The functions for elementary inferential procedures follow a uniform interface for user input. Some of the functions are instructional applets that can only be run in the RStudio integrated development environment with the package 'manipulate' installed. Other instructional applets are Shiny apps that may be run locally. In teaching, the package is used alongside the packages 'mosaic', 'mosaicData' and 'abd', which are therefore listed as dependencies.

Maintained by Homer White. Last updated 5 years ago.

16 stars 5.74 score 327 scripts

jwiley

multilevelTools:Multilevel and Mixed Effects Model Diagnostics and Effect Sizes

Effect sizes, diagnostics and performance metrics for multilevel and mixed effects models. Includes marginal and conditional 'R2' estimates for linear mixed effects models based on Johnson (2014) <doi:10.1111/2041-210X.12225>.

Maintained by Joshua F. Wiley. Last updated 3 days ago.

4 stars 5.74 score 136 scripts

bioc

limpca:An R package for the linear modeling of high-dimensional designed data based on ASCA/APCA family of methods

This package aims to provide a method for fitting linear models to high-dimensional designed data. limpca applies a GLM (General Linear Model) version of ASCA and APCA to analyse multivariate sample profiles generated by an experimental design. ASCA/APCA provide powerful visualization tools for multivariate structures in the space of each effect of the statistical model linked to the experimental design and, unlike MANOVA, can deal with multivariate datasets having more variables than observations. This method can handle unbalanced designs.

Maintained by Manon Martin. Last updated 5 months ago.

statisticalmethod principalcomponent regression visualization experimentaldesign multiplecomparison geneexpression metabolomics

2 stars 5.73 score 2 scripts

josie-athens

pubh:A Toolbox for Public Health and Epidemiology

A toolbox for making R functions and capabilities more accessible to students and professionals from Epidemiology and Public Health related disciplines. Includes a function to report coefficients and confidence intervals from models using robust standard errors (when available), functions that expand 'ggplot2' plots and functions relevant for introductory papers in Epidemiology or Public Health. Please note that use of the provided data sets is for educational purposes only.

Maintained by Josie Athens. Last updated 6 months ago.

5 stars 5.73 score 72 scripts

bioc

rexposome:Exposome exploration and outcome data analysis

A package for exploring the exposome and performing association analyses between exposures and health outcomes.

Maintained by Xavier Escribà Montagut. Last updated 5 months ago.

software biologicalquestion infrastructure dataimport datarepresentation biomedicalinformatics experimentaldesign multiplecomparison classification clustering

5.70 score 28 scripts 1 dependent

maraab23

ggseqplot:Render Sequence Plots using 'ggplot2'

A set of wrapper functions that reproduce most of the sequence plots rendered with TraMineR::seqplot(). Whereas 'TraMineR' uses base R to produce the plots, this library draws on 'ggplot2'. The plots are produced on the basis of a sequence object defined with TraMineR::seqdef(). The package automates the reshaping and plotting of sequence data. Resulting plots are of class 'ggplot', i.e. components can be added and tweaked using '+' and regular 'ggplot2' functions.

Maintained by Marcel Raab. Last updated 4 months ago.

ggplot2 sequence-analysis traminer visualization

14 stars 5.70 score 18 scripts

plant-functional-trait-course

fluxible:Ecosystem Gas Fluxes Calculations for Closed Loop Chamber Setup

Processes the raw data from closed loop flux chamber (or tent) setups into ecosystem gas fluxes usable for analysis. It goes from a data frame of gas concentration over time (which can contain several measurements) and a meta data file indicating which measurement was done when, to a data frame of ecosystem gas fluxes including quality diagnostics. Functions provided include different models (exponential as described in Zhao et al (2018) <doi:10.1016/j.agrformet.2018.08.022>, quadratic and linear) to estimate the fluxes from the raw data, quality assessment, plotting for visual check and calculation of fluxes based on the setup specific parameters (chamber size, plot area, ...).

Maintained by Joseph Gaudard. Last updated 1 day ago.

5.69 score 12 scripts

xiaozhangryy

CAESAR.Suite:CAESAR: a Cross-Technology and Cross-Resolution Framework for Spatial Omics Annotation

Biotechnology in spatial omics has advanced rapidly over the past few years, enhancing both throughput and resolution. However, existing annotation pipelines in spatial omics predominantly rely on clustering methods, lacking the flexibility to integrate extensive annotated information from single-cell RNA sequencing (scRNA-seq) due to discrepancies in spatial resolutions, species, or modalities. Here we introduce the CAESAR suite, an open-source software package that provides image-based spatial co-embedding of locations and genomic features. It uniquely transfers labels from a scRNA-seq reference, enabling the annotation of spatial omics datasets across different technologies, resolutions, species, and modalities, based on the conserved relationship between signature genes and cells/locations at an appropriate level of granularity. Notably, CAESAR enriches location-level pathways, allowing for the detection of gradual biological pathway activation within spatially defined domain types. More details on the methods are given in our paper, which is currently under submission. A full reference to the paper will be provided in future versions once the paper is published.

Maintained by Xiao Zhang. Last updated 4 days ago.

openblas cpp

1 star 5.67 score 2 scripts

thermostats

RVA:RNAseq Visualization Automation

Automate downstream visualization & pathway analysis in RNAseq analysis. 'RVA' is a collection of functions that efficiently visualize RNAseq differential expression analysis results from summary statistics tables. It also uses Fisher's exact test to evaluate gene set or pathway enrichment in a convenient and efficient manner.

Maintained by Xingpeng Li. Last updated 3 years ago.

9 stars 5.65 score 6 scripts

bioc

multicrispr:Multi-locus multi-purpose Crispr/Cas design

This package is for designing Crispr/Cas9 and Prime Editing experiments. It contains functions to (1) define and transform genomic targets, (2) find spacers, (4) count offtarget (mis)matches, and (5) compute Doench2016/2014 targeting efficiency. Care has been taken for multicrispr to scale well towards large target sets, enabling the design of large Crispr/Cas9 libraries.

Maintained by Aditya Bhagwat. Last updated 4 months ago.

crispr software

5.65 score 2 scripts

stamats

MKpower:Power Analysis and Sample Size Calculation

Power analysis and sample size calculation for Welch and Hsu (Hedderich and Sachs (2018), ISBN:978-3-662-56657-2) t-tests including Monte-Carlo simulations of empirical power and type-I-error. Power and sample size calculation for Wilcoxon rank sum and signed rank tests via Monte-Carlo simulations. Power and sample size required for the evaluation of a diagnostic test(-system) (Flahault et al. (2005), <doi:10.1016/j.jclinepi.2004.12.009>; Dobbin and Simon (2007), <doi:10.1093/biostatistics/kxj036>) as well as for a single proportion (Fleiss et al. (2003), ISBN:978-0-471-52629-2; Piegorsch (2004), <doi:10.1016/j.csda.2003.10.002>; Thulin (2014), <doi:10.1214/14-ejs909>), comparing two negative binomial rates (Zhu and Lakkis (2014), <doi:10.1002/sim.5947>), ANCOVA (Shieh (2020), <doi:10.1007/s11336-019-09692-3>), reference ranges (Jennen-Steinmetz and Wellek (2005), <doi:10.1002/sim.2177>), multiple primary endpoints (Sozu et al. (2015), ISBN:978-3-319-22005-5), and AUC (Hanley and McNeil (1982), <doi:10.1148/radiology.143.1.7063747>).

Maintained by Matthias Kohl. Last updated 6 months ago.

7 stars 5.65 score 32 scripts

bioc

MSstatsLiP:LiP Significance Analysis in shotgun mass spectrometry-based proteomic experiments

Tools for LiP peptide and protein significance analysis. Provides functions for summarization, estimation of LiP peptide abundance, and detection of changes across conditions. Utilizes functionality across the MSstats family of packages.

Maintained by Devon Kohler. Last updated 5 months ago.

immunooncology massspectrometry proteomics software differentialexpression onechannel twochannel normalization qualitycontrol cpp

7 stars 5.62 score 5 scripts

tntp

tntpr:Data Analysis Tools Customized for TNTP

An assortment of functions and templates customized to meet the needs of data analysts at the non-profit organization TNTP. Includes functions for branded colors and plots, credentials management, repository set-up, and other common analytic tasks.

Maintained by Dustin Pashouwer. Last updated 4 months ago.

7 stars 5.61 score 13 scripts

ucd-serg

serocalculator:Estimating Infection Rates from Serological Data

Translates antibody levels measured in cross-sectional population samples into estimates of the frequency with which seroconversions (infections) occur in the sampled populations. Replaces the previous `seroincidence` package.

Maintained by Kristina Lai. Last updated 5 days ago.

epidemiology incidence-estimation seroepidemiology

6 stars 5.61 score 13 scripts

bioc

gpuMagic:An openCL compiler with the capacity to compile R functions and run the code on GPU

The package aims to help users write openCL code with little or no effort. It is able to compile a user-defined R function and run it on a device such as a CPU or a GPU. The user can also write and run their openCL code directly by calling the .kernel function.

Maintained by Jiefei Wang. Last updated 5 months ago.

infrastructure ocl-icd cpp

10 stars 5.60 score 1 script

maelstrom-research

Rmonize:Support Retrospective Harmonization of Data

Functions to support rigorous retrospective data harmonization processing, evaluation, and documentation across datasets from different studies based on Maelstrom Research guidelines. The package includes the core functions to evaluate and format the main inputs that define the harmonization process, apply specified processing rules to generate harmonized data, diagnose processing errors, and summarize and evaluate harmonized outputs. The main inputs that define the processing are a DataSchema (list and definitions of harmonized variables to be generated) and Data Processing Elements (processing rules to be applied to generate harmonized variables from study-specific variables). The main outputs of processing are harmonized datasets, associated metadata, and tabular and visual summary reports. As described in Maelstrom Research guidelines for rigorous retrospective data harmonization (Fortier I et al. (2017) <doi:10.1093/ije/dyw075>).

Maintained by Guillaume Fabre. Last updated 1 year ago.

5 stars 5.58 score 51 scripts

chaisemartinpackages

TwoWayFEWeights:Estimation of the Weights Attached to the Two-Way Fixed Effects Regressions

Estimates the weights and measure of robustness to treatment effect heterogeneity attached to two-way fixed effects regressions. Clément de Chaisemartin, Xavier D'Haultfœuille (2020) <doi:10.1257/aer.20181169>.

Maintained by Diego Ciccia. Last updated 8 months ago.

18 stars 5.58 score 20 scripts

mrcieu

mrbayes:Bayesian Summary Data Models for Mendelian Randomization Studies

Bayesian estimation of inverse variance weighted (IVW), Burgess et al. (2013) <doi:10.1002/gepi.21758>, and MR-Egger, Bowden et al. (2015) <doi:10.1093/ije/dyv080>, summary data models for Mendelian randomization analyses.

Maintained by Tom Palmer. Last updated 13 days ago.

cpp

4 stars 5.56 score 2 scripts

middleton-lab

abd:The Analysis of Biological Data

The abd package contains data sets and sample code for The Analysis of Biological Data by Michael Whitlock and Dolph Schluter (2009; Roberts & Company Publishers).

Maintained by Kevin M. Middleton. Last updated 11 months ago.

6 stars 5.53 score 182 scripts 1 dependent

openanalytics

inTextSummaryTable:Creation of in-Text Summary Table

Creation of tables of summary statistics or counts for clinical data (for 'TLFs'). These tables can be exported as in-text table (with the 'flextable' package) for a Clinical Study Report (Word format) or a 'topline' presentation (PowerPoint format), or as interactive table (with the 'DT' package) to an html document for clinical data review.

Maintained by Laure Cougnaud. Last updated 10 months ago.

1 star 5.52 score 47 scripts

svmiller

peacesciencer:Tools and Data for Quantitative Peace Science Research

These are useful tools and data sets for the study of quantitative peace science. The goal for this package is to include tools and data sets for doing original research that mimics well what a user would have to previously get from a software package that may not be well-sourced or well-supported. Those software bundles were useful to the extent that they encouraged replications of long-standing analyses by starting the data-generating process from scratch. However, a lot of the functionality can be done relatively quickly and more transparently in the R programming language.

Maintained by Steve Miller. Last updated 16 days ago.

eugene peace-science

29 stars 5.49 score 211 scripts

ropensci

EndoMineR:Functions to mine endoscopic and associated pathology datasets

This script comprises the functions that are used to clean up endoscopic reports and pathology reports as well as many of the scripts used for analysis. The scripts assume the endoscopy and histopathology data sets are already merged, but they can also be used with the unmerged datasets.

Maintained by Sebastian Zeki. Last updated 7 months ago.

endoscopy gastroenterology peer-reviewed semi-structured-data text-mining

13 stars 5.47 score 30 scripts

ilostat

Rilostat:ILO Open Data via Ilostat Bulk Download Facility

Tools to download data from the [ilostat](<https://ilostat.ilo.org>) database together with search and manipulation utilities.

Maintained by David Bescond. Last updated 19 days ago.

api dataset open-source public-api

34 stars 5.47 score 43 scripts

alvesks

epifitter:Analysis and Simulation of Plant Disease Progress Curves

Analysis and visualization of plant disease progress curve data. Functions for fitting two-parameter population dynamics models (exponential, monomolecular, logistic and Gompertz) to proportion data for single or multiple epidemics using either linear or non-linear regression. Statistical and visual outputs are provided to aid in model selection. Synthetic curves can be simulated for any of the models given the parameters. See Laurence V. Madden, Gareth Hughes, and Frank van den Bosch (2007) <doi:10.1094/9780890545058> for further information on the methods.

Maintained by Kaique dos S. Alves. Last updated 2 months ago.

5 stars 5.42 score 53 scripts
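
The "linear regression" route mentioned in the 'epifitter' description rests on linearizing the model. For the logistic model y(t) = 1 / (1 + ((1 - y0) / y0) · exp(-r·t)), the logit transform gives ln(y / (1 - y)) = ln(y0 / (1 - y0)) + r·t, so the rate r is the slope of an ordinary least-squares line. The Python sketch below illustrates that step only; the function name and the simulated curve are assumptions for the example, not the package's interface.

```python
import math


def fit_logistic_linearized(times, proportions):
    """Estimate (y0, r) of a logistic curve by OLS on logit-transformed proportions.

    Proportions must lie strictly between 0 and 1.
    """
    z = [math.log(y / (1.0 - y)) for y in proportions]
    n = len(times)
    t_mean = sum(times) / n
    z_mean = sum(z) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (zi - z_mean) for t, zi in zip(times, z))
    r = sxy / sxx                      # slope = apparent infection rate
    intercept = z_mean - r * t_mean    # intercept = logit(y0)
    y0 = 1.0 / (1.0 + math.exp(-intercept))
    return y0, r


# Hypothetical noise-free epidemic, used only to check that the fit recovers its inputs.
true_y0, true_r = 0.01, 0.3
times = list(range(0, 31, 5))
curve = [1.0 / (1.0 + ((1.0 - true_y0) / true_y0) * math.exp(-true_r * t)) for t in times]
y0_hat, r_hat = fit_logistic_linearized(times, curve)
```

With noisy field data the linearized fit weights observations differently from a non-linear fit on the original scale, which is one reason to compare both.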

bioc

synergyfinder:Calculate and Visualize Synergy Scores for Drug Combinations

Efficient implementations for analyzing pre-clinical multiple drug combination datasets. It provides efficient implementations for 1. the popular synergy scoring models, including HSA, Loewe, Bliss, and ZIP to quantify the degree of drug combination synergy; 2. higher order drug combination data analysis and synergy landscape visualization for unlimited number of drugs in a combination; 3. statistical analysis of drug combination synergy and sensitivity with confidence intervals and p-values; 4. synergy barometer for harmonizing multiple synergy scoring methods to provide a consensus metric of synergy; 5. evaluation of synergy and sensitivity simultaneously to provide an unbiased interpretation of the clinical potential of the drug combinations. Based on this package, we also provide a web application (http://www.synergyfinder.org) for users who prefer a graphical user interface.

Maintained by Shuyu Zheng. Last updated 5 months ago.

software statisticalmethod

5.42 score 44 scripts
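
Two of the scoring models named in the 'synergyfinder' description have simple reference formulas for a pair of drugs. Under Bliss independence the expected combined effect is a + b - a·b, and under HSA (highest single agent) it is max(a, b); the synergy score is the observed effect minus the expected one. The Python sketch below works with fractional inhibition in [0, 1] and hypothetical function names; the package's own scale (for example percent inhibition) and interface may differ, and Loewe and ZIP are not covered here.

```python
def bliss_expected(a, b):
    """Expected combined inhibition if the two drugs act independently."""
    return a + b - a * b


def hsa_expected(a, b):
    """Expected combined inhibition under the highest-single-agent model."""
    return max(a, b)


def synergy_score(observed, expected):
    """Positive values suggest synergy, negative values antagonism."""
    return observed - expected


# Hypothetical single-agent and combination responses (fractions of full inhibition).
drug_a, drug_b, combo = 0.30, 0.40, 0.70
bliss = synergy_score(combo, bliss_expected(drug_a, drug_b))
hsa = synergy_score(combo, hsa_expected(drug_a, drug_b))
```

The same measurement scores differently under each model (here the HSA excess is larger than the Bliss excess), which is the motivation for reporting several models side by side.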

maelstrom-research

madshapR:Support Technical Processes Following 'Maelstrom Research' Standards

Functions to support rigorous processes in data cleaning, evaluation, and documentation across datasets from different studies based on Maelstrom Research guidelines. The package includes the core functions to evaluate and format the main inputs that define the process, diagnose errors, and summarize and evaluate datasets and their associated data dictionaries. The main outputs are clean datasets and associated metadata, and tabular and visual summary reports. As described in Maelstrom Research guidelines for rigorous retrospective data harmonization (Fortier I et al. (2017) <doi:10.1093/ije/dyw075>).

Maintained by Guillaume Fabre. Last updated 11 months ago.

2 stars 5.40 score 28 scripts 3 dependents

gertstulp

ggplotgui:Create Ggplots via a Graphical User Interface

Easily explore data by creating ggplots through a (shiny-)GUI. R code to recreate the graph is provided.

Maintained by Gert Stulp. Last updated 5 years ago.

139 stars 5.40 score 18 scripts

jeksterslab

semmcci:Monte Carlo Confidence Intervals in Structural Equation Modeling

Monte Carlo confidence intervals for free and defined parameters in models fitted in the structural equation modeling package 'lavaan' can be generated using the 'semmcci' package. 'semmcci' has three main functions, namely, MC(), MCMI(), and MCStd(). The output of 'lavaan' is passed as the first argument to the MC() function or the MCMI() function to generate Monte Carlo confidence intervals. Monte Carlo confidence intervals for the standardized estimates can also be generated by passing the output of the MC() function or the MCMI() function to the MCStd() function. A description of the package and code examples are presented in Pesigan and Cheung (2023) <doi:10.3758/s13428-023-02114-4>.

Maintained by Ivan Jacob Agaloos Pesigan. Last updated 3 months ago.

confidence-intervals monte-carlo structural-equation-modeling

2 stars 5.39 score 76 scripts
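
The Monte Carlo confidence interval described in the 'semmcci' entry can be illustrated for a defined parameter such as an indirect effect a·b: draw the estimates from their sampling distribution, compute the defined parameter for each draw, and take percentiles. The Python sketch below is a simplification and not the package's implementation: it assumes the two estimates are independent normal variables (the general method draws from a multivariate normal with the full covariance matrix of the estimates), and the function name and numeric inputs are hypothetical.

```python
import random


def monte_carlo_ci(a_hat, se_a, b_hat, se_b, draws=20000, alpha=0.05, seed=1):
    """Percentile Monte Carlo confidence interval for the product a * b.

    Assumption of this sketch: a and b have zero sampling covariance.
    """
    rng = random.Random(seed)
    products = sorted(
        rng.gauss(a_hat, se_a) * rng.gauss(b_hat, se_b) for _ in range(draws)
    )
    lower = products[int((alpha / 2) * draws)]
    upper = products[int((1 - alpha / 2) * draws) - 1]
    return lower, upper


# Hypothetical path estimates and standard errors from a fitted mediation model.
lower, upper = monte_carlo_ci(a_hat=0.5, se_a=0.1, b_hat=0.4, se_b=0.1)
```

Because the product of two normal variables is skewed, the resulting interval is generally asymmetric around the point estimate 0.5 × 0.4 = 0.2, which a symmetric normal-theory interval would not capture.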

choonghyunryu

alookr:Model Classifier for Binary Classification

A collection of tools that support data splitting, predictive modeling, and model evaluation. A typical function splits a dataset into a training dataset and a test dataset and then compares the data distributions of the two. Another feature is to support the development of predictive models and to compare the performance of several predictive models, helping to select the best model.

Maintained by Choonghyun Ryu. Last updated 1 year ago.

12 stars 5.38 score 9 scripts

insightsengineering

osprey:R Package to Create TLGs

Community effort to collect TLG code and create a catalogue.

Maintained by Nina Qi. Last updated 1 month ago.

catalog graphs listings nest tables

4 stars 5.38 score 1 dependent

btskinner

duawranglr:Securely Wrangle Dataset According to Data Usage Agreement

Create shareable data sets from raw data files that contain protected elements. Relying on master crosswalk files that list restricted variables, package functions warn users about possible violations of data usage agreement and prevent writing protected elements.

Maintained by Benjamin Skinner. Last updated 4 years ago.

data-security data-usage-agreement data-wrangling

9 stars 5.37 score 13 scripts

bioc

SPONGE:Sparse Partial Correlations On Gene Expression

This package provides methods to efficiently detect competitive endogenous RNA interactions between two genes. Such interactions are mediated by one or several miRNAs such that both gene and miRNA expression data for a larger number of samples is needed as input. The SPONGE package now also includes spongEffects: ceRNA modules offer patient-specific insights into the miRNA regulatory landscape.

Maintained by Markus List. Last updated 5 months ago.

geneexpression transcription generegulation networkinference transcriptomics systemsbiology regression randomforest machinelearning

5.36 score 38 scripts 1 dependent

s87jackson

rfars:Download and Analyze Crash Data

Download crash data from the National Highway Traffic Safety Administration and prepare it for research.

Maintained by Steve Jackson. Last updated 12 months ago.

crash fatalities official-statistics transportation

10 stars 5.35 score 15 scripts

junjunlab

transPlotR:Visualize Transcript Structures in Elegant Way

To visualize the gene structure with multiple isoforms better, I developed this package to draw different transcript structures easily.

Maintained by Jun Zhang. Last updated 2 years ago.

bed bigwig gene linkvis transcript visualization

73 stars 5.34 score 60 scripts

matteo21q

dani:Design and Analysis of Non-Inferiority Trials

Provides tools to help with the design and analysis of non-inferiority trials. These include functions for doing sample size calculations and for analysing non-inferiority trials, using a variety of outcome types and population-level summary measures. It also features functions to make trials more resilient by using the concept of non-inferiority frontiers, as described in Quartagno et al. (2019) <arXiv:1905.00241>. Finally, it includes functions to design and analyse MAMS-ROCI (aka DURATIONS) trials.

Maintained by Matteo Quartagno. Last updated 7 months ago.

2 stars 5.33 score 27 scripts

santagos

dad:Three-Way / Multigroup Data Analysis Through Densities

The data consist of a set of variables measured on several groups of individuals. To each group is associated an estimated probability density function. The package provides tools to create or manage such data and functional methods (principal component analysis, multidimensional scaling, cluster analysis, discriminant analysis...) for such probability densities.

Maintained by Pierre Santagostini. Last updated 4 months ago.

5.32 score 92 scripts

elliecurnow

midoc:A Decision-Making System for Multiple Imputation

A guidance system for analysis with missing data. It incorporates expert, up-to-date methodology to help researchers choose the most appropriate analysis approach when some data are missing. You provide the available data and the assumed causal structure, including the likely causes of missing data. 'midoc' will advise which analysis approaches can be used, and how best to perform them. 'midoc' follows the framework for the treatment and reporting of missing data in observational studies (TARMOS). Lee et al (2021). <doi:10.1016/j.jclinepi.2021.01.008>.

Maintained by Elinor Curnow. Last updated 6 months ago.

missing-data multiple-imputation

6 stars 5.32 score 8 scripts

mspinillos

ecoregime:Analysis of Ecological Dynamic Regimes

A toolbox for implementing the Ecological Dynamic Regime framework (Sánchez-Pinillos et al., 2023 <doi:10.1002/ecm.1589>) to characterize and compare groups of ecological trajectories in multidimensional spaces defined by state variables. The package includes the RETRA-EDR algorithm to identify representative trajectories, functions to generate, summarize, and visualize representative trajectories, and several metrics to quantify the distribution and heterogeneity of trajectories in an ecological dynamic regime and quantify the dissimilarity between two or more ecological dynamic regimes. The package also includes a set of functions to assess ecological resilience based on ecological dynamic regimes (Sánchez-Pinillos et al., 2024 <doi:10.1016/j.biocon.2023.110409>).

Maintained by Martina Sánchez-Pinillos. Last updated 12 months ago.

7 stars 5.32 score 8 scripts

f-silva-archaeo

skyscapeR:Data Analysis and Visualization for Skyscape Archaeology

Data reduction, visualization and statistical analysis of measurements of orientation of archaeological structures, following Silva (2020) <doi:10.1016/j.jas.2020.105138>.

Maintained by Silva Fabio. Last updated 6 months ago.

5 stars 5.31 score 41 scripts

eldafani

intsvy:International Assessment Data Manager

Provides tools for importing, merging, and analysing data from international assessment studies (TIMSS, PIRLS, PISA, ICILS, and PIAAC).

Maintained by Daniel Caro. Last updated 1 year ago.

22 stars 5.29 score 88 scripts

btskinner

crosswalkr:Rename and Encode Data Frames Using External Crosswalk Files

A pair of functions for renaming and encoding data frames using external crosswalk files. It is especially useful when constructing master data sets from multiple smaller data sets that do not name or encode variables consistently across files. Based on similar commands in 'Stata'.

Maintained by Benjamin Skinner. Last updated 1 year ago.

crosswalk encode labels rename

9 stars 5.26 score 20 scripts

nceas

scicomptools:Tools Developed by the NCEAS Scientific Computing Support Team

Set of tools to import, summarize, wrangle, and visualize data. These functions were originally written based on the needs of the various synthesis working groups that were supported by the National Center for Ecological Analysis and Synthesis (NCEAS). These tools are meant to be useful inside and outside of the context for which they were designed.

Maintained by Angel Chen. Last updated 5 months ago.

data-science

9 stars 5.26 score 6 scripts

jiang-junyao

CACIMAR:cross-species analysis of cell identities, markers and regulations

A toolkit to perform cross-species analysis based on scRNA-seq data. CACIMAR contains 5 main features: (1) identify markers in each cluster; (2) annotate cell types; (3) identify conserved markers; (4) identify conserved cell types; (5) identify conserved modules of regulatory networks.

Maintained by Junyao Jiang. Last updated 4 months ago.

cross-species-analysis scrna-seq

12 stars 5.26 score 6 scripts

harrison4192

validata:Validate Data Frames

Functions for validating the structure and properties of data frames. Answers essential questions about a data set after initial import or modification. What are the unique or missing values? What columns form a primary key? What are the properties of the numeric or categorical columns? What kind of overlap or mapping exists between 2 columns?

Maintained by Harrison Tietze. Last updated 24 days ago.

6 stars 5.26 score 4 scripts 1 dependents

anna-neufeld

splinetree:Longitudinal Regression Trees and Forests

Builds regression trees and random forests for longitudinal or functional data using a spline projection method. Implements and extends the work of Yu and Lambert (1999) <doi:10.1080/10618600.1999.10474847>. This method allows trees and forests to be built while considering either level and shape or only shape of response trajectories.

Maintained by Anna Neufeld. Last updated 6 years ago.

4 stars 5.24 score 29 scripts