R-universe search: needs:testthat

r-lib

devtools:Tools to Make Developing R Packages Easier

Collection of package development tools.

Maintained by Jennifer Bryan. Last updated 6 months ago.

package-creation

2.4k stars 19.55 score 51k scripts 150 dependents

philchalmers

mirt:Multidimensional Item Response Theory

Analysis of discrete response data using unidimensional and multidimensional item analysis models under the Item Response Theory paradigm (Chalmers (2012) <doi:10.18637/jss.v048.i06>). Exploratory and confirmatory item factor analysis models are estimated with quadrature (EM) or stochastic (MHRM) methods. Confirmatory bi-factor and two-tier models are available for modeling item testlets using dimension reduction EM algorithms, while multiple group analyses and mixed effects designs are included for detecting differential item, bundle, and test functioning, and for modeling item and person covariates. Finally, latent class models such as the DINA, DINO, multidimensional latent class, mixture IRT models, and zero-inflated response models are supported, as well as a wide family of probabilistic unfolding models.

Maintained by Phil Chalmers. Last updated 4 days ago.

irt mirt openblas cpp openmp

212 stars 14.93 score 2.5k scripts 40 dependents

philchalmers

SimDesign:Structure for Organizing Monte Carlo Simulation Designs

Provides tools to safely and efficiently organize and execute Monte Carlo simulation experiments in R. The package controls the structure and back-end of Monte Carlo simulation experiments by utilizing a generate-analyse-summarise workflow. The workflow safeguards against common simulation coding issues, such as automatically re-simulating non-convergent results, prevents inadvertently overwriting simulation files, catches error and warning messages during execution, implicitly supports parallel processing with high-quality random number generation, and provides tools for managing high-performance computing (HPC) array jobs submitted to schedulers such as SLURM. For a pedagogical introduction to the package see Sigal and Chalmers (2016) <doi:10.1080/10691898.2016.1246953>. For a more in-depth overview of the package and its design philosophy see Chalmers and Adkins (2020) <doi:10.20982/tqmp.16.4.p248>.

Maintained by Phil Chalmers. Last updated 3 days ago.

monte-carlo-simulation simulation simulation-framework

62 stars 13.41 score 253 scripts 47 dependents

openpharma

mmrm:Mixed Models for Repeated Measures

Mixed models for repeated measures (MMRM) are a popular choice for analyzing longitudinal continuous outcomes in randomized clinical trials and beyond; see Cnaan, Laird and Slasor (1997) <doi:10.1002/(SICI)1097-0258(19971030)16:20%3C2349::AID-SIM667%3E3.0.CO;2-E> for a tutorial and Mallinckrodt, Lane, Schnell, Peng and Mancuso (2008) <doi:10.1177/009286150804200402> for a review. This package implements MMRM based on the marginal linear model without random effects using Template Model Builder ('TMB') which enables fast and robust model fitting. Users can specify a variety of covariance matrices, weight observations, fit models with restricted or standard maximum likelihood inference, perform hypothesis testing with Satterthwaite or Kenward-Roger adjustment, and extract least square means estimates by using 'emmeans'.

Maintained by Daniel Sabanes Bove. Last updated 24 days ago.

cpp

138 stars 12.15 score 113 scripts 4 dependents

rstudio

shinytest2:Testing for Shiny Applications

Automated unit testing of Shiny applications through a headless 'Chromium' browser.

Maintained by Barret Schloerke. Last updated 5 days ago.

cpp

108 stars 12.13 score 704 scripts 1 dependent

kevinushey

sourcetools:Tools for Reading, Tokenizing and Parsing R Code

Tools for Reading, Tokenizing and Parsing R Code.

Maintained by Kevin Ushey. Last updated 2 years ago.

cpp

78 stars 11.77 score 32 scripts 1.8k dependents

r-lib

mockery:Mocking Library for R

The two main functionalities of this package are creating mock objects (functions) and selectively intercepting calls to a given function that originate in some other function. It can be used with any testing framework available for R. Mock objects can be injected with either this package's own stub() function or a similar with_mock() facility present in the 'testthat' package.

Maintained by Hadley Wickham. Last updated 1 year ago.

100 stars 11.57 score 504 scripts 5 dependents

tylermorganwall

rayshader:Create Maps and Visualize Data in 2D and 3D

Uses a combination of raytracing and multiple hill shading methods to produce 2D and 3D data visualizations and maps. Includes water detection and layering functions, programmable color palette generation, several built-in textures for hill shading, 2D and 3D plotting options, a built-in path tracer, 'Wavefront' OBJ file export, and the ability to save 3D visualizations to a 3D printable format.

Maintained by Tyler Morgan-Wall. Last updated 2 months ago.

cpp

2.1k stars 11.55 score 1.5k scripts 5 dependents

tylermorganwall

rayrender:Build and Raytrace 3D Scenes

Render scenes using pathtracing. Build 3D scenes out of spheres, cubes, planes, disks, triangles, cones, curves, line segments, cylinders, ellipsoids, and 3D models in the 'Wavefront' OBJ file format or the PLY Polygon File Format. Supports several material types, textures, multicore rendering, and tone-mapping. Based on the "Ray Tracing in One Weekend" book series. Peter Shirley (2018) <https://raytracing.github.io>.

Maintained by Tyler Morgan-Wall. Last updated 4 days ago.

libx11 cpp

631 stars 10.87 score 188 scripts 8 dependents

marce10

warbleR:Streamline Bioacoustic Analysis

Functions aiming to facilitate the analysis of the structure of animal acoustic signals in 'R'. 'warbleR' makes use of the basic sound analysis tools from the packages 'tuneR' and 'seewave', and offers new tools to explore and quantify acoustic signal structure. The package allows users to organize and manipulate multiple sound files, create spectrograms of complete recordings or individual signals in different formats, run several measures of acoustic structure, and characterize different structural levels in acoustic signals.

Maintained by Marcelo Araya-Salas. Last updated 2 months ago.

animal-acoustic-signals audio-processing bioacoustics spectrogram streamline-analysis cpp

56 stars 10.86 score 270 scripts 4 dependents

r-lib

vdiffr:Visual Regression Testing and Graphical Diffing

An extension to the 'testthat' package that makes it easy to add graphical unit tests. It provides a Shiny application to manage the test cases.

Maintained by Lionel Henry. Last updated 5 months ago.

ggplot2 graphics testthat libpng cpp

191 stars 10.84 score 254 scripts 5 dependents

rstudio

pointblank:Data Validation and Organization of Metadata for Local and Remote Tables

Validate data in data frames, 'tibble' objects, 'Spark' 'DataFrames', and database tables. Validation pipelines can be made using easily-readable, consecutive validation steps. Upon execution of the validation plan, several reporting options are available. User-defined thresholds for failure rates allow for the determination of appropriate reporting actions. Many other workflows are available including an information management workflow, where the aim is to record, collect, and generate useful information on data tables.

Maintained by Richard Iannone. Last updated 5 days ago.

data-assertions data-checker data-dictionaries data-frames data-inference data-management data-profiler data-quality data-validation data-verification database-tables easy-to-understand reporting-tool schema-validation testing-tools yaml-configuration

942 stars 10.73 score 284 scripts

caseyyoungflesh

MCMCvis:Tools to Visualize, Manipulate, and Summarize MCMC Output

Performs key functions for MCMC analysis using minimal code - visualizes, manipulates, and summarizes MCMC output. Functions support simple and straightforward subsetting of model parameters within the calls, and produce presentable and 'publication-ready' output. MCMC output may be derived from Bayesian model output fit with Stan, NIMBLE, JAGS, and other software.

Maintained by Casey Youngflesh. Last updated 4 months ago.

38 stars 10.52 score 1.8k scripts 5 dependents

insightsengineering

teal.modules.clinical:'teal' Modules for Standard Clinical Outputs

Provides user-friendly tools for creating and customizing clinical trial reports. By leveraging the 'teal' framework, this package provides 'teal' modules to easily create an interactive panel that allows for seamless adjustments to data presentation, thereby streamlining the creation of detailed and accurate reports.

Maintained by Dawid Kaledkowski. Last updated 1 month ago.

clinical-trials modules nest outputs shiny

35 stars 10.21 score 149 scripts

n8thangreen

BCEA:Bayesian Cost Effectiveness Analysis

Produces an economic evaluation of a sample of suitable variables of cost and effectiveness / utility for two or more interventions, e.g. from a Bayesian model in the form of MCMC simulations. This package computes the most cost-effective alternative and produces graphical summaries and probabilistic sensitivity analysis; see Baio et al. (2017) <doi:10.1007/978-3-319-55718-2>.

Maintained by Gianluca Baio. Last updated 2 months ago.

bayesian cost-effectiveness

3 stars 9.90 score 243 scripts 3 dependents

ndphillips

FFTrees:Generate, Visualise, and Evaluate Fast-and-Frugal Decision Trees

Create, visualize, and test fast-and-frugal decision trees (FFTs) using the algorithms and methods described by Phillips, Neth, Woike & Gaissmaier (2017), <doi:10.1017/S1930297500006239>. FFTs are simple and transparent decision trees for solving binary classification problems. FFTs can be preferable to more complex algorithms because they require very little information, are easy to understand and communicate, and are robust against overfitting.

Maintained by Hansjoerg Neth. Last updated 5 months ago.

136 stars 9.53 score 144 scripts

philchalmers

mirtCAT:Computerized Adaptive Testing with Multidimensional Item Response Theory

Provides tools to generate HTML interfaces for adaptive and non-adaptive tests using the shiny package (Chalmers (2016) <doi:10.18637/jss.v071.i05>). Suitable for applying unidimensional and multidimensional computerized adaptive tests (CAT) using item response theory methodology and for creating simple questionnaire forms to collect response data directly in R. Additionally, optimal test designs (e.g., "shadow testing") are supported for tests that contain a large number of item selection constraints. Finally, the package contains tools useful for performing Monte Carlo simulations for studying test item banks.

Maintained by Phil Chalmers. Last updated 5 months ago.

cat irt openblas cpp

95 stars 9.47 score 62 scripts 3 dependents

nealrichardson

httptest:A Test Environment for HTTP Requests

Testing and documenting code that communicates with remote servers can be painful. Dealing with authentication, server state, and other complications can make testing seem too costly to bother with. But it doesn't need to be that hard. This package enables one to test all of the logic on the R sides of the API in your package without requiring access to the remote service. Importantly, it provides three contexts that mock the network connection in different ways, as well as testing functions to assert that HTTP requests were---or were not---made. It also allows one to safely record real API responses to use as test fixtures. The ability to save responses and load them offline also enables one to write vignettes and other dynamic documents that can be distributed without access to a live server.

Maintained by Neal Richardson. Last updated 1 year ago.

http mock test-framework

81 stars 9.46 score 276 scripts 1 dependent

thinkr-open

fusen:Build a Package from Rmarkdown Files

Use Rmarkdown First method to build your package. Start your package with documentation, functions, examples and tests in the same unique file. Everything can be set from the Rmarkdown template file provided in your project, then inflated as a package. Inflating the template copies the relevant chunks and sections in the appropriate files required for package development.

Maintained by Vincent Guyader. Last updated 2 months ago.

hacktoberfest rmd-first

163 stars 9.45 score 35 scripts

nealrichardson

httptest2:Test Helpers for 'httr2'

Testing and documenting code that communicates with remote servers can be painful. This package helps with writing tests for packages that use 'httr2'. It enables testing all of the logic on the R sides of the API without requiring access to the remote service, and it also allows recording real API responses to use as test fixtures. The ability to save responses and load them offline also enables writing vignettes and other dynamic documents that can be distributed without access to a live server.

Maintained by Neal Richardson. Last updated 9 months ago.

http mock testing

33 stars 9.37 score 95 scripts 1 dependent

briencj

asremlPlus:Augments 'ASReml-R' in Fitting Mixed Models and Packages Generally in Exploring Prediction Differences

Assists in automating the selection of terms to include in mixed models when 'asreml' is used to fit the models. Procedures are available for choosing models that conform to the hierarchy or marginality principle, for fitting and choosing between two-dimensional spatial models using correlation, natural cubic smoothing spline and P-spline models. A history of the fitting of a sequence of models is kept in a data frame. Also used to compute functions and contrasts of, to investigate differences between and to plot predictions obtained using any model fitting function. The content falls into the following natural groupings: (i) Data, (ii) Model modification functions, (iii) Model selection and description functions, (iv) Model diagnostics and simulation functions, (v) Prediction production and presentation functions, (vi) Response transformation functions, (vii) Object manipulation functions, and (viii) Miscellaneous functions (for further details see 'asremlPlus-package' in help). The 'asreml' package provides a computationally efficient algorithm for fitting a wide range of linear mixed models using Residual Maximum Likelihood. It is a commercial package and a license for it can be purchased from 'VSNi' <https://vsni.co.uk/> as 'asreml-R', who will supply a zip file for local installation/updating (see <https://asreml.kb.vsni.co.uk/>). It is not needed for functions that are methods for 'alldiffs' and 'data.frame' objects. The package 'asremlPlus' can also be installed from <http://chris.brien.name/rpackages/>.

Maintained by Chris Brien. Last updated 1 month ago.

asreml mixed-models

19 stars 9.37 score 200 scripts

cbielow

PTXQC:Quality Report Generation for MaxQuant and mzTab Results

Generates Proteomics (PTX) quality control (QC) reports for shotgun LC-MS data analyzed with the MaxQuant software suite (from .txt files) or mzTab files (ideally from OpenMS 'QualityControl' tool). Reports are customizable (target thresholds, subsetting) and available in HTML or PDF format. Published in J. Proteome Res., Proteomics Quality Control: Quality Control Software for MaxQuant Results (2015) <doi:10.1021/acs.jproteome.5b00780>.

Maintained by Chris Bielow. Last updated 1 year ago.

drag-and-drop hacktoberfest heatmap match-between-runs maxquant metric mztab openms proteomics quality-control quality-metrics report

42 stars 9.35 score 105 scripts 1 dependent

bioc

BatchQC:Batch Effects Quality Control Software

Sequencing and microarray samples often are collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in the downstream analysis. BatchQC is a software tool that streamlines batch preprocessing and evaluation by providing interactive diagnostics, visualizations, and statistical analyses to explore the extent to which batch variation impacts the data. BatchQC diagnostics help determine whether batch adjustment needs to be done, and how correction should be applied before proceeding with a downstream analysis. Moreover, BatchQC interactively applies multiple common batch effect approaches to the data and the user can quickly see the benefits of each method. BatchQC is developed as a Shiny App. The output is organized into multiple tabs and each tab features an important part of the batch effect analysis and visualization of the data. The BatchQC interface has the following analysis groups: Summary, Differential Expression, Median Correlations, Heatmaps, Circular Dendrogram, PCA Analysis, Shape, ComBat and SVA.

Maintained by Jessica Anderson. Last updated 13 days ago.

batcheffect graphandnetwork microarray normalization principalcomponent sequencing software visualization qualitycontrol rnaseq preprocessing differentialexpression immunooncology

7 stars 9.06 score 54 scripts

rstudio

shinytest:Test Shiny Apps

Please see the shinytest to shinytest2 migration guide at <https://rstudio.github.io/shinytest2/articles/z-migration.html>.

Maintained by Winston Chang. Last updated 10 months ago.

225 stars 9.02 score 352 scripts

bioc

scPipe:Pipeline for single cell multi-omic data pre-processing

A preprocessing pipeline for single cell RNA-seq/ATAC-seq data that starts from the fastq files and produces a feature count matrix with associated quality control information. It can process fastq data generated by CEL-seq, MARS-seq, Drop-seq, Chromium 10x and SMART-seq protocols.

Maintained by Shian Su. Last updated 3 months ago.

immunooncology software sequencing rnaseq geneexpression singlecell visualization sequencematching preprocessing qualitycontrol genomeannotation dataimport curl bzip2 xz-utils zlib cpp

68 stars 9.02 score 84 scripts

r-spatial

link2GI:Linking Geographic Information Systems, Remote Sensing and Other Command Line Tools

Functions and tools for using open GIS and remote sensing command-line interfaces in a reproducible environment.

Maintained by Chris Reudenbach. Last updated 4 months ago.

26 stars 8.99 score 78 scripts 1 dependent

appsilon

rhino:A Framework for Enterprise Shiny Applications

A framework that supports creating and extending enterprise Shiny applications using best practices.

Maintained by Kamil Żyła. Last updated 3 days ago.

rhinoverse shiny

305 stars 8.99 score 145 scripts

pharmar

riskmetric:Risk Metrics to Evaluating R Packages

Facilities for assessing R packages against a number of metrics to help quantify their robustness.

Maintained by Eli Miller. Last updated 7 days ago.

166 stars 8.98 score 43 scripts

ajrgodfrey

BrailleR:Improved Access for Blind Users

Blind users do not have access to the graphical output from R without printing the content of graphics windows to an embosser of some kind. This is not as immediate as is required for efficient access to statistical output. The functions here are created so that blind people can make even better use of R. This includes the text descriptions of graphs, convenience functions to replace the functionality offered in many GUI front ends, and experimental functionality for optimising graphical content to prepare it for embossing as tactile images.

Maintained by A. Jonathan R. Godfrey. Last updated 12 months ago.

123 stars 8.90 score 143 scripts

pik-piam

remind2:The REMIND R package (2nd generation)

Contains the REMIND-specific routines for data and model output manipulation.

Maintained by Renato Rodrigues. Last updated 3 days ago.

8.87 score 161 scripts 5 dependents

dvats

mcmcse:Monte Carlo Standard Errors for MCMC

Provides tools for computing Monte Carlo standard errors (MCSE) in Markov chain Monte Carlo (MCMC) settings. MCSE computation for expectation and quantile estimators is supported as well as multivariate estimations. The package also provides functions for computing effective sample size and for plotting Monte Carlo estimates versus sample size.

Maintained by Dootika Vats. Last updated 2 months ago.

effective-sample-size mcmc output-a openblas cpp

12 stars 8.77 score 314 scripts 17 dependents

insightsengineering

rbmi:Reference Based Multiple Imputation

Implements standard and reference based multiple imputation methods for continuous longitudinal endpoints (Gower-Page et al. (2022) <doi:10.21105/joss.04251>). In particular, this package supports deterministic conditional mean imputation and jackknifing as described in Wolbers et al. (2022) <doi:10.1002/pst.2234>, Bayesian multiple imputation as described in Carpenter et al. (2013) <doi:10.1080/10543406.2013.834911>, and bootstrapped maximum likelihood imputation as described in von Hippel and Bartlett (2021) <doi:10.1214/20-STS793>.

Maintained by Isaac Gravestock. Last updated 1 month ago.

18 stars 8.76 score 33 scripts 1 dependent

cynkra

fledge:Smoother Change Tracking and Versioning for R Packages

Streamlines the process of updating changelogs (NEWS.md) and versioning R packages developed in git repositories.

Maintained by Kirill Müller. Last updated 3 months ago.

changelog git package-creation

188 stars 8.73 score 10 scripts

bioc

memes:motif matching, comparison, and de novo discovery using the MEME Suite

A seamless interface to the MEME Suite family of tools for motif analysis. 'memes' provides data aware utilities for using GRanges objects as entrypoints to motif analysis, data structures for examining & editing motif lists, and novel data visualizations. 'memes' functions and data structures are amenable to both base R and tidyverse workflows.

Maintained by Spencer Nystrom. Last updated 5 months ago.

dataimport functionalgenomics generegulation motifannotation motifdiscovery sequencematching software

50 stars 8.69 score 117 scripts 1 dependent

bioc

lefser:R implementation of the LEfSE method for microbiome biomarker discovery

lefser is the R implementation of the popular microbiome biomarker discovery tool, LEfSe. It uses the Kruskal-Wallis test, Wilcoxon rank-sum test, and Linear Discriminant Analysis to find biomarkers from two-level classes (and optional sub-classes).

Maintained by Sehyun Oh. Last updated 1 month ago.

software sequencing differentialexpression microbiome statisticalmethod classification bioconductor-package r01ca230551

56 stars 8.44 score 56 scripts

ramikrispin

coronavirus:The 2019 Novel Coronavirus COVID-19 (2019-nCoV) Dataset

Provides a daily summary of the Coronavirus (COVID-19) cases by state/province. Data source: Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) Coronavirus <https://systems.jhu.edu/research/public-health/ncov/>.

Maintained by Rami Krispin. Last updated 2 years ago.

covid-19 covid19 covid19-data dataset

499 stars 8.25 score 716 scripts

mi2-warsaw

FSelectorRcpp:'Rcpp' Implementation of 'FSelector' Entropy-Based Feature Selection Algorithms with a Sparse Matrix Support

'Rcpp' (free of 'Java'/'Weka') implementation of 'FSelector' entropy-based feature selection algorithms based on an MDL discretization (Fayyad U. M., Irani K. B.: Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In 13th International Joint Conference on Uncertainty in Artificial Intelligence (IJCAI93), pages 1022-1029, Chambery, France, 1993.) <https://www.ijcai.org/Proceedings/93-2/Papers/022.pdf> with a sparse matrix support.

Maintained by Zygmunt Zawadzki. Last updated 6 months ago.

entropy feature-selection rcpp sparse-matrix cpp

35 stars 8.22 score 78 scripts 1 dependent

r-dbi

DBItest:Testing DBI Backends

A helper that tests DBI back ends for conformity to the interface.

Maintained by Kirill Müller. Last updated 14 days ago.

database testing

24 stars 8.21 score 11 scripts

mrc-ide

malariasimulation:An individual based model for malaria

Specifies the latest and greatest malaria model.

Maintained by Giovanni Charles. Last updated 1 month ago.

cpp

17 stars 8.19 score 146 scripts

brockk

escalation:A Modular Approach to Dose-Finding Clinical Trials

Methods for working with dose-finding clinical trials. We provide implementations of many dose-finding clinical trial designs, including the continual reassessment method (CRM) by O'Quigley et al. (1990) <doi:10.2307/2531628>, the toxicity probability interval (TPI) design by Ji et al. (2007) <doi:10.1177/1740774507079442>, the modified TPI (mTPI) design by Ji et al. (2010) <doi:10.1177/1740774510382799>, the Bayesian optimal interval design (BOIN) by Liu & Yuan (2015) <doi:10.1111/rssc.12089>, EffTox by Thall & Cook (2004) <doi:10.1111/j.0006-341X.2004.00218.x>, the design of Wages & Tait (2015) <doi:10.1080/10543406.2014.920873>, and the 3+3 described by Korn et al. (1994) <doi:10.1002/sim.4780131802>. All designs are implemented with a common interface. We also offer optional additional classes to tailor the behaviour of all designs, including avoiding skipping doses, stopping after n patients have been treated at the recommended dose, stopping when a toxicity condition is met, or demanding that n patients are treated before stopping is allowed. By daisy-chaining together these classes using the pipe operator from 'magrittr', it is simple to tailor the behaviour of a dose-finding design so it behaves how the trialist wants. Having provided a flexible interface for specifying designs, we then provide functions to run simulations and calculate dose-paths for future cohorts of patients.

Maintained by Kristian Brock. Last updated 3 days ago.

15 stars 8.16 score 67 scripts

r-hyperspec

hyperSpec:Work with Hyperspectral Data, i.e. Spectra + Meta Information (Spatial, Time, Concentration, ...)

Comfortable ways to work with hyperspectral data sets, i.e. spatially or time-resolved spectra, or spectra with any other kind of information associated with each of the spectra. The spectra can be data as obtained in XRF, UV/VIS, Fluorescence, AES, NIR, IR, Raman, NMR, MS, etc. More generally, any data that is recorded over a discretized variable, e.g. absorbance = f(wavelength), stored as a vector of absorbance values for discrete wavelengths is suitable.

Maintained by Claudia Beleites. Last updated 10 months ago.

data-wrangling hyperspectral imaging infrared nmr raman spectroscopy uv-vis xrf

16 stars 8.10 score 233 scripts 2 dependents

ramiromagno

gwasrapidd:'REST' 'API' Client for the 'NHGRI'-'EBI' 'GWAS' Catalog

'GWAS' R 'API' Data Download. This package provides easy access to the 'NHGRI'-'EBI' 'GWAS' Catalog data by accessing the 'REST' 'API' <https://www.ebi.ac.uk/gwas/rest/docs/api/>.

Maintained by Ramiro Magno. Last updated 1 year ago.

thirdpartyclient biomedicalinformatics genomewideassociation snp association-studies gwas-catalog human rest-client trait trait-ontology

95 stars 8.10 score 49 scripts 1 dependent

bioc

FLAMES:FLAMES: Full Length Analysis of Mutations and Splicing in long read RNA-seq data

Semi-supervised isoform detection and annotation from both bulk and single-cell long read RNA-seq data. FLAMES provides automated pipelines for analysing isoforms, as well as intermediate functions for manual execution.

Maintained by Changqing Wang. Last updated 11 hours ago.

rnaseq singlecell transcriptomics dataimport differentialsplicing alternativesplicing geneexpression longread zlib curl bzip2 xz-utils cpp

33 stars 8.04 score 12 scripts

mazamascience

MazamaSpatialUtils:Spatial Data Download and Utility Functions

A suite of conversion functions to create internally standardized spatial polygons data frames. Utility functions use these data sets to return values such as country, state, time zone, watershed, etc. associated with a set of longitude/latitude pairs. (They also make cool maps.)

Maintained by Jonathan Callahan. Last updated 5 months ago.

5 stars 8.01 score 282 scripts 2 dependents

bioc

scDD:Mixture modeling of single-cell RNA-seq data to identify genes with differential distributions

This package implements a method to analyze single-cell RNA-seq data utilizing flexible Dirichlet Process mixture models. Genes with differential distributions of expression are classified into several interesting patterns of differences between two conditions. The package also includes functions for simulating data with these patterns from negative binomial distributions.

Maintained by Keegan Korthauer. Last updated 5 months ago.

immunooncology bayesian clustering rnaseq singlecell multiplecomparison visualization differentialexpression

33 stars 7.92 score 50 scripts

ocbe-uio

BayesMallows:Bayesian Preference Learning with the Mallows Rank Model

An implementation of the Bayesian version of the Mallows rank model (Vitelli et al., Journal of Machine Learning Research, 2018 <https://jmlr.org/papers/v18/15-481.html>; Crispino et al., Annals of Applied Statistics, 2019 <doi:10.1214/18-AOAS1203>; Sorensen et al., R Journal, 2020 <doi:10.32614/RJ-2020-026>; Stein, PhD Thesis, 2023 <https://eprints.lancs.ac.uk/id/eprint/195759>). Both Metropolis-Hastings and sequential Monte Carlo algorithms for estimating the models are available. Cayley, footrule, Hamming, Kendall, Spearman, and Ulam distances are supported in the models. The rank data to be analyzed can be in the form of complete rankings, top-k rankings, partially missing rankings, as well as consistent and inconsistent pairwise preferences. Several functions for plotting and studying the posterior distributions of parameters are provided. The package also provides functions for estimating the partition function (normalizing constant) of the Mallows rank model, both with the importance sampling algorithm of Vitelli et al. and asymptotic approximation with the IPFP algorithm (Mukherjee, Annals of Statistics, 2016 <doi:10.1214/15-AOS1389>).

Maintained by Oystein Sorensen. Last updated 2 months ago.

mallows-model openblas cpp openmp

21 stars 7.91 score 36 scripts 1 dependent

patriciamar

ShinyItemAnalysis:Test and Item Analysis via Shiny

Package including functions and interactive shiny application for the psychometric analysis of educational tests, psychological assessments, health-related and other types of multi-item measurements, or ratings from multiple raters.

Maintained by Patricia Martinkova. Last updated 12 days ago.

assessment differential-item-functioning item-analysis item-response-theory psychometrics shiny

45 stars 7.88 score 105 scripts 3 dependents

bioc

EBSeq:An R package for gene and isoform differential expression analysis of RNA-seq data

Differential Expression analysis at both gene and isoform level using RNA-seq data

Maintained by Xiuyu Ma. Last updated 10 days ago.

immunooncology statisticalmethod differentialexpression multiplecomparison rnaseq sequencing cpp

7.86 score 162 scripts 6 dependents

bioc

biodb:biodb, a library and a development framework for connecting to chemical and biological databases

The biodb package provides access to standard remote chemical and biological databases (ChEBI, KEGG, HMDB, ...), as well as to in-house local database files (CSV, SQLite), with easy retrieval of entries, access to web services, search of compounds by mass and/or name, and mass spectra matching for LCMS and MSMS. Its architecture as a development framework facilitates the development of new database connectors for local projects or inside separate published packages.

Maintained by Pierrick Roger. Last updated 5 months ago.

software infrastructure dataimport kegg biology cheminformatics chemistry databases cpp

11 stars 7.85 score 24 scripts 6 dependents

emilopezcano

SixSigma:Six Sigma Tools for Quality Control and Improvement

Functions and utilities to perform Statistical Analyses in the Six Sigma way. Through the DMAIC cycle (Define, Measure, Analyze, Improve, Control), you can manage several Quality Management studies: Gage R&R, Capability Analysis, Control Charts, Loss Function Analysis, etc. Data frames used in the books "Six Sigma with R" [ISBN 978-1-4614-3652-2] and "Quality Control with R" [ISBN 978-3-319-24046-6], are also included in the package.

Maintained by Emilio L. Cano. Last updated 2 years ago.

quality-control quality-improvement six-sigma spc

15 stars 7.82 score 169 scripts 1 dependent

matloff

dsld:Data Science Looks at Discrimination

Statistical and graphical tools for detecting and measuring discrimination and bias, be it racial, gender, age or other. Detection and remediation of bias in machine learning algorithms. 'Python' interfaces available.

Maintained by Norm Matloff. Last updated 2 months ago.

12 stars 7.81 score 35 scripts

epiverse-trace

simulist:Simulate Disease Outbreak Line List and Contacts Data

Tools to simulate realistic raw case data for an epidemic in the form of line lists and contacts using a branching process. Simulated outbreaks are parameterised with epidemiological parameters and can have age-structured populations, age-stratified hospitalisation and death risk and time-varying case fatality risk.

Maintained by Joshua W. Lambert. Last updated 5 days ago.

epidemiology epiverse linelist outbreaks

8 stars 7.79 score 27 scripts

mazamascience

MazamaCoreUtils:Utility Functions for Production R Code

A suite of utility functions providing functionality commonly needed for production level projects such as logging, error handling, cache management and date-time parsing. Functions for date-time parsing and formatting require that time zones be specified explicitly, avoiding a common source of error when working with environmental time series.

Maintained by Jonathan Callahan. Last updated 4 months ago.

4 stars 7.76 score 119 scripts 5 dependents

gesistsa

oolong:Create Validation Tests for Automated Content Analysis

Intended to create standard human-in-the-loop validity tests for typical automated content analysis such as topic modeling and dictionary-based methods. This package offers a standard workflow with functions to prepare, administer and evaluate a human-in-the-loop validity test. This package provides functions for validating topic models using word intrusion, topic intrusion (Chang et al. 2009, <https://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models>) and word set intrusion (Ying et al. 2021) <doi:10.1017/pan.2021.33> tests. This package also provides functions for generating gold-standard data which are useful for validating dictionary-based methods. The default settings of all generated tests match those suggested in Chang et al. (2009) and Song et al. (2020) <doi:10.1080/10584609.2020.1723752>.

Maintained by Chung-hong Chan. Last updated 1 month ago.

textanalysis topicmodeling validation

55 stars 7.58 score 23 scripts

biogenies

tidysq:Tidy Processing and Analysis of Biological Sequences

A tidy approach to analysis of biological sequences. All processing and data-storage functions are heavily optimized to allow the fastest and most efficient data storage.

Maintained by Dominik Rafacz. Last updated 3 months ago.

bioconductor bioinformatics biological-sequences fasta s3 sequences tibble tidy tidyverse vctrs cpp

40 stars 7.56 score 38 scripts

wincowgerdev

OpenSpecy:Analyze, Process, Identify, and Share Raman and (FT)IR Spectra

Raman and (FT)IR spectral analysis tool for plastic particles and other environmental samples (Cowger et al. 2021, <doi:10.1021/acs.analchem.1c00123>). With read_any(), Open Specy provides a single function for reading individual, batch, or map spectral data files like .asp, .csv, .jdx, .spc, .spa, .0, and .zip. process_spec() simplifies processing spectra, including smoothing, baseline correction, range restriction and flattening, intensity conversions, wavenumber alignment, and min-max normalization. Spectra can be identified in batch using an onboard reference library (Cowger et al. 2020, <doi:10.1177/0003702820929064>) using match_spec(). A Shiny app is available via run_app() or online at <https://openanalysis.org/openspecy/>.

Maintained by Win Cowger. Last updated 1 month ago.

29 stars 7.55 score 22 scripts

sem-in-r

seminr:Building and Estimating Structural Equation Models

A powerful, easy-to-use syntax for specifying and estimating complex Structural Equation Models. Models can be estimated using Partial Least Squares Path Modeling, Covariance-Based Structural Equation Modeling, or covariance-based Confirmatory Factor Analysis. Methods described in Ray, Danks, and Valdez (2021).

Maintained by Nicholas Patrick Danks. Last updated 3 years ago.

common-factors composites construct pls-models

62 stars 7.46 score 284 scripts

koheiw

seededlda:Seeded Sequential LDA for Topic Modeling

Seeded Sequential LDA can classify sentences of texts into pre-defined topics with a small number of seed words (Watanabe & Baturo, 2023) <doi:10.1177/08944393231178605>. Implements Seeded LDA (Lu et al., 2010) <doi:10.1109/ICDMW.2011.125> and Sequential LDA (Du et al., 2012) <doi:10.1007/s10115-011-0425-1> with the distributed LDA algorithm (Newman et al., 2009) for parallel computing.

Maintained by Kohei Watanabe. Last updated 2 months ago.

semi-supervised-learning text-classification onetbb cpp

75 stars 7.38 score 177 scripts 1 dependent

bioc

cogena:co-expressed gene-set enrichment analysis

cogena is a workflow for co-expressed gene-set enrichment analysis. It aims to discover smaller-scale but highly correlated cellular events that may be of great biological relevance. A novel pipeline for drug discovery and drug repositioning based on the cogena workflow is proposed. In particular, candidate drugs can be predicted based on the gene expression of disease-related data, or other similar drugs can be identified based on the gene expression of drug-related data. Moreover, the drug mode of action can be disclosed by the associated pathway analysis. In summary, cogena is a flexible workflow for various gene set enrichment analyses of co-expressed genes, with a focus on pathway/GO analysis and drug repositioning.

Maintained by Zhilong Jia. Last updated 5 months ago.

clustering genesetenrichment geneexpression visualization pathways kegg go microarray sequencing systemsbiology datarepresentation dataimport bioconductor bioinformatics

12 stars 7.36 score 32 scripts

hedgehogqa

hedgehog:Property-Based Testing

Hedgehog will eat all your bugs. 'Hedgehog' is a property-based testing package in the spirit of 'QuickCheck'. With 'Hedgehog', one can test properties of their programs against randomly generated input, providing far superior test coverage compared to unit testing. One of the key benefits of 'Hedgehog' is integrated shrinking of counterexamples, which allows one to quickly find the cause of bugs, given salient examples when incorrect behaviour occurs.

Maintained by Huw Campbell. Last updated 4 years ago.

56 stars 7.33 score 63 scripts 1 dependent

nerler

JointAI:Joint Analysis and Imputation of Incomplete Data

Joint analysis and imputation of incomplete data in the Bayesian framework, using (generalized) linear (mixed) models and extensions thereof, survival models, or joint models for longitudinal and survival data, as described in Erler, Rizopoulos and Lesaffre (2021) <doi:10.18637/jss.v100.i20>. Incomplete covariates, if present, are automatically imputed. The package performs some preprocessing of the data and creates a 'JAGS' model, which will then automatically be passed to 'JAGS' <https://mcmc-jags.sourceforge.io/> with the help of the package 'rjags'.

Maintained by Nicole S. Erler. Last updated 12 months ago.

bayesian generalized-linear-models glm glmm imputation imputations jags joint-analysis linear-mixed-models linear-regression-models mcmc-sample mcmc-sampling missing-data missing-values survival cpp

28 stars 7.30 score 59 scripts 1 dependent

hoxo-m

githubinstall:A Helpful Way to Install R Packages Hosted on GitHub

Provides a helpful way to install packages hosted on GitHub.

Maintained by Koji Makiyama. Last updated 7 years ago.

r-language

49 stars 7.29 score 177 scripts

wwiecek

baggr:Bayesian Aggregate Treatment Effects

Running and comparing meta-analyses of data with hierarchical Bayesian models in Stan, including convenience functions for formatting data, plotting and pooling measures specific to meta-analysis. This implements many models from Meager (2019) <doi:10.1257/app.20170299>.

Maintained by Witold Wiecek. Last updated 7 days ago.

bayesian-statistics meta-analysis quantile-regression stan treatment-effects cpp

49 stars 7.24 score 88 scripts

inbo

checklist:A Thorough and Strict Set of Checks for R Packages and Source Code

An opinionated set of rules for R packages and R source code projects.

Maintained by Thierry Onkelinx. Last updated 1 month ago.

checklist continuous-integration continuous-testing quality-assurance

19 stars 7.24 score 21 scripts 2 dependents

insightsengineering

tern.mmrm:Tables and Graphs for Mixed Models for Repeated Measures (MMRM)

Mixed models for repeated measures (MMRM) are a popular choice for analyzing longitudinal continuous outcomes in randomized clinical trials and beyond; see for example Cnaan, Laird and Slasor (1997) <doi:10.1002/(SICI)1097-0258(19971030)16:20%3C2349::AID-SIM667%3E3.0.CO;2-E>. This package provides an interface for fitting MMRM within the 'tern' <https://cran.r-project.org/package=tern> framework by Zhu et al. (2023) and for tabulating results easily using 'rtables' <https://cran.r-project.org/package=rtables> by Becker et al. (2023). It builds on 'mmrm' <https://cran.r-project.org/package=mmrm> by Sabanés Bové et al. (2023) for the actual MMRM computations.

Maintained by Joe Zhu. Last updated 6 months ago.

graphs listings statistical-engineering tables

6 stars 7.23 score 8 scripts 1 dependent

pik-piam

lucode2:Code Manipulation and Analysis Tools

A collection of tools for manipulating and analyzing code.

Maintained by Jan Philipp Dietrich. Last updated 10 days ago.

7.22 score 364 scripts 8 dependents

luckinet

tabshiftr:Reshape Disorganised Messy Data

Helps the user to build and register schema descriptions of disorganised (messy) tables. Disorganised tables are tables that are not in a topologically coherent form, where packages such as 'tidyr' could be used for reshaping. The schema description documents the arrangement of input tables and is used to reshape them into a standardised (tidy) output format.

Maintained by Steffen Ehrmann. Last updated 1 month ago.

data-management data-reshaping schemas

6 stars 7.13 score 62 scripts 1 dependent

melissagwolf

dynamic:DFI Cutoffs for Latent Variable Models

Returns dynamic fit index (DFI) cutoffs for latent variable models that are tailored to the user's model statement, model type, and sample size. This is the counterpart of the Shiny Application, <https://dynamicfit.app>.

Maintained by Melissa G. Wolf. Last updated 3 months ago.

16 stars 7.13 score 139 scripts

ericgiunta

Colossus:"Risk Model Regression and Analysis with Complex Non-Linear Models"

Performs survival analysis using general non-linear models. Risk models can be the sum or product of terms. Each term is the product of exponential/linear functions of covariates. Additionally, sub-terms can be defined as a sum of exponential, linear threshold, and step functions. Cox proportional hazards <https://en.wikipedia.org/wiki/Proportional_hazards_model>, Poisson <https://en.wikipedia.org/wiki/Poisson_regression>, and Fine-Gray competing risks <https://www.publichealth.columbia.edu/research/population-health-methods/competing-risk-analysis> regression are supported. This work was sponsored by NASA Grant 80NSSC19M0161 through a subcontract from the National Council on Radiation Protection and Measurements (NCRP). The computing for this project was performed on the Beocat Research Cluster at Kansas State University, which is funded in part by NSF grants CNS-1006860, EPS-1006860, EPS-0919443, ACI-1440548, CHE-1726332, and NIH P20GM113109.

Maintained by Eric Giunta. Last updated 3 days ago.

cpp openmp

1 star 7.11 score 36 scripts

niehs

amadeus:Accessing and Analyzing Large-Scale Environmental Data

Functions are designed to facilitate access to, and use of, large-scale, publicly available environmental data in R. The package contains functions for downloading raw data files from web URLs (download_data()), processing the raw data files into clean spatial objects (process_covariates()), and extracting values from the spatial data objects at point and polygon locations (calculate_covariates()). These functions call a series of source-specific functions which are tailored to each data source's or dataset's particular URL structure, data format, and spatial/temporal resolution. The functions are tested, versioned, open source, and open access. For sum_edc() method details, see Messier, Akita, and Serre (2012) <doi:10.1021/es203152a>.

Maintained by Kyle Messier. Last updated 1 month ago.

8 stars 7.07 score 13 scripts

loelschlaeger

fHMM:Fitting Hidden Markov Models to Financial Data

Fitting (hierarchical) hidden Markov models to financial data via maximum likelihood estimation. See Oelschläger, L. and Adam, T. "Detecting Bearish and Bullish Markets in Financial Time Series Using Hierarchical Hidden Markov Models" (2021, Statistical Modelling) <doi:10.1177/1471082X211034048> for a reference on the method. A user guide is provided by the accompanying software paper "fHMM: Hidden Markov Models for Financial Time Series in R", Oelschläger, L., Adam, T., and Michels, R. (2024, Journal of Statistical Software) <doi:10.18637/jss.v109.i09>.

Maintained by Lennart Oelschläger. Last updated 8 days ago.

finance hidden-markov-models cpp openmp

17 stars 7.04 score 5 scripts

r-lib

roxygen2md:'Roxygen' to 'Markdown'

Converts elements of 'roxygen' documentation to 'markdown'.

Maintained by Kirill Müller. Last updated 4 months ago.

documentation markdown

68 stars 7.00 score 11 scripts 2 dependents

doccstat

fastcpd:Fast Change Point Detection via Sequential Gradient Descent

Implements a fast change point detection algorithm based on the paper "Sequential Gradient Descent and Quasi-Newton's Method for Change-Point Analysis" by Xianyang Zhang, Trisha Dawn <https://proceedings.mlr.press/v206/zhang23b.html>. The algorithm is based on dynamic programming with pruning and sequential gradient descent. It is able to detect change points an order of magnitude faster than the vanilla Pruned Exact Linear Time (PELT). The package includes examples of linear regression, logistic regression, Poisson regression, and penalized linear regression data, as well as many more examples with custom cost functions in case the user wants to use their own cost function.

Maintained by Xingchi Li. Last updated 12 hours ago.

change-point-detection cpp custom-function gradient-descent lasso linear-regression logistic-regression offline pelt penalized-regression poisson-regression quasi-newton statistics time-series warm-start fortran openblas cpp openmp

21 stars 6.98 score 7 scripts

bioc

CoGAPS:Coordinated Gene Activity in Pattern Sets

Coordinated Gene Activity in Pattern Sets (CoGAPS) implements a Bayesian MCMC matrix factorization algorithm, GAPS, and links it to gene set statistic methods to infer biological process activity. It can be used to perform sparse matrix factorization on any data, and when this data represents biomolecules, to do gene set analysis.

Maintained by Elana J. Fertig. Last updated 19 days ago.

geneexpression transcription genesetenrichment differentialexpression bayesian clustering timecourse rnaseq microarray multiplecomparison dimensionreduction immunooncology cpp

6.97 score 104 scripts

thinkr-open

thinkr:Tools for Cleaning Up Messy Files

Some tools for cleaning up messy 'Excel' files to be suitable for R. People who have been working with 'Excel' for years have built more or less complicated sheets with names, characters, and formats that are not homogeneous. To be able to use them in R nowadays, we built a set of functions that avoid the majority of import problems and preserve the data as well as possible.

Maintained by Vincent Guyader. Last updated 3 years ago.

hacktoberfest thinkr-not-maintained

29 stars 6.96 score 45 scripts

google

patrick:Parameterized Unit Testing

This is an extension of the 'testthat' package that lets you add parameters to your unit tests. Parameterized unit tests are often easier to read and more reliable, since they follow the DNRY (do not repeat yourself) rule.

Maintained by Michael Quinn. Last updated 23 days ago.

139 stars 6.92 score 19 scripts

lcrawlab

mvMAPIT:Multivariate Genome Wide Marginal Epistasis Test

Epistasis, commonly defined as the interaction between genetic loci, is known to play an important role in the phenotypic variation of complex traits. As a result, many statistical methods have been developed to identify genetic variants that are involved in epistasis, and nearly all of these approaches carry out this task by focusing on analyzing one trait at a time. Previous studies have shown that jointly modeling multiple phenotypes can often dramatically increase statistical power for association mapping. In this package, we present the 'multivariate MArginal ePIstasis Test' ('mvMAPIT') – a multi-outcome generalization of a recently proposed epistatic detection method which seeks to detect marginal epistasis or the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact – thus, potentially alleviating much of the statistical and computational burden associated with conventional explicit search based methods. Our proposed 'mvMAPIT' builds upon this strategy by taking advantage of correlation structure between traits to improve the identification of variants involved in epistasis. We formulate 'mvMAPIT' as a multivariate linear mixed model and develop a multi-trait variance component estimation algorithm for efficient parameter inference and P-value computation. Together with reasonable model approximations, our proposed approach is scalable to moderately sized genome-wide association studies. Crawford et al. (2017) <doi:10.1371/journal.pgen.1006869>. Stamp et al. (2023) <doi:10.1093/g3journal/jkad118>.

Maintained by Julian Stamp. Last updated 5 months ago.

cpp epistasis epistasis-analysis gwas gwas-tools linear-mixed-models mapit mvmapit variance-components openblas cpp openmp

11 stars 6.90 score 17 scripts 1 dependent

pik-piam

edgeTransport:Prepare EDGE Transport Data for the REMIND model

EDGE-T is a fork of the GCAM transport module https://jgcri.github.io/gcam-doc/energy.html#transportation with a high level of detail in its representation of technological and modal options. It is a partial equilibrium model with a nested multinomial logit structure and relies on the modified logit formulation. Most of the sources are not publicly available. PIK-internal users can find the sources in the distributed file system in the folder `/p/projects/rd3mod/inputdata/sources/EDGE-Transport-Standalone`.

Maintained by Johanna Hoppe. Last updated 2 days ago.

5 stars 6.84 score 16 scripts 2 dependents

kozodoi

fairness:Algorithmic Fairness Metrics

Offers calculation, visualization and comparison of algorithmic fairness metrics. Fair machine learning is an emerging topic with the overarching aim to critically assess whether ML algorithms reinforce existing social biases. Unfair algorithms can propagate such biases and produce predictions with a disparate impact on various sensitive groups of individuals (defined by sex, gender, ethnicity, religion, income, socioeconomic status, physical or mental disabilities). Fair algorithms possess the underlying foundation that these groups should be treated similarly or have similar prediction outcomes. The fairness R package offers the calculation and comparisons of commonly and less commonly used fairness metrics in population subgroups. These methods are described by Calders and Verwer (2010) <doi:10.1007/s10618-010-0190-x>, Chouldechova (2017) <doi:10.1089/big.2016.0047>, Feldman et al. (2015) <doi:10.1145/2783258.2783311>, Friedler et al. (2018) <doi:10.1145/3287560.3287589> and Zafar et al. (2017) <doi:10.1145/3038912.3052660>. The package also offers convenient visualizations to help understand fairness metrics.

Maintained by Nikita Kozodoi. Last updated 2 years ago.

algorithmic-discrimination algorithmic-fairness discrimination disparate-impact fairness fairness-ai fairness-ml machine-learning

32 stars 6.82 score 69 scripts 1 dependent

ropensci

ohun:Optimizing Acoustic Signal Detection

Facilitates the automatic detection of acoustic signals, providing functions to diagnose and optimize the performance of detection routines. Detections from other software can also be explored and optimized. This package has been peer-reviewed by rOpenSci. Araya-Salas et al. (2022) <doi:10.1101/2022.12.13.520253>.

Maintained by Marcelo Araya-Salas. Last updated 5 months ago.

audio-processing bioacoustics sound-event-detection spectrogram streamline-analysis

14 stars 6.78 score 24 scripts 1 dependent

bstaton1

postpack:Utilities for Processing Posterior Samples Stored in 'mcmc.lists'

The aim of 'postpack' is to provide the infrastructure for a standardized workflow for 'mcmc.list' objects. These objects can be used to store output from models fitted with Bayesian inference using 'JAGS', 'WinBUGS', 'OpenBUGS', 'NIMBLE', 'Stan', or even custom MCMC algorithms. Although the 'coda' R package provides some methods for these objects, it is somewhat limited in easily performing post-processing tasks for specific nodes. Models are ever increasing in their complexity and the number of tracked nodes, and oftentimes a user may wish to summarize/diagnose sampling behavior for only a small subset of nodes at a time for a particular question or figure. Thus, many 'postpack' functions support performing tasks on a subset of nodes, where the subset is specified with regular expressions. The functions in 'postpack' streamline the extraction, summarization, and diagnostics of specific monitored nodes after model fitting. Further, because there is rarely only ever one model under consideration, 'postpack' scales efficiently to perform the same tasks on output from multiple models simultaneously, facilitating rapid assessment of model sensitivity to changes in assumptions.

Maintained by Ben Staton. Last updated 2 years ago.

2 stars 6.75 score 233 scripts 1 dependent

thinkr-open

checkhelper:Deal with Check Outputs

Deal with package 'check' outputs and reduce the risk of rejection by 'CRAN' by following its policies.

Maintained by Sebastien Rochette. Last updated 1 year ago.

34 stars 6.74 score 18 scripts

frbcesab

rcompendium:Create a Package or Research Compendium Structure

Makes it easier to create an R package or research compendium (i.e. a predefined file/folder structure) so that users can focus on the code/analysis instead of wasting time organizing files. A full ready-to-work structure is set up with some additional features: version control, remote repository creation, CI/CD configuration (check package integrity under several OS, test code with 'testthat', and build and deploy website using 'pkgdown'). This package heavily relies on the R packages 'devtools' and 'usethis' and follows recommendations made by Wickham H. (2015) <ISBN:9781491910597> and Marwick B. et al. (2018) <doi:10.7287/peerj.preprints.3192v2>.

Maintained by Nicolas Casajus. Last updated 2 months ago.

reproducible-research research-compendium

40 stars 6.72 score 22 scripts

imbi-heidelberg

DescrTab2:Publication Quality Descriptive Statistics Tables

Provides functions to create descriptive statistics tables for continuous and categorical variables. By default, summary statistics such as mean, standard deviation, quantiles, minimum and maximum for continuous variables and relative and absolute frequencies for categorical variables are calculated. 'DescrTab2' features a sophisticated algorithm to choose appropriate test statistics for your data and provides p-values. On top of this, confidence intervals for group differences of appropriate summary measures are automatically produced for two-group comparisons. Tables generated by 'DescrTab2' can be integrated in a variety of document formats, including .html, .tex and .docx documents. 'DescrTab2' also allows printing tables to console and saving table objects for later use.

Maintained by Jan Meis. Last updated 1 year ago.

categorical-variables continuous-variable descriptive-statistics p-values statistical-tests statistics

9 stars 6.71 score 19 scripts 1 dependent

bioc

syntenet:Inference And Analysis Of Synteny Networks

syntenet can be used to infer synteny networks from whole-genome protein sequences and analyze them. Anchor pairs are detected with the MCScanX algorithm, which was ported to this package with the Rcpp framework for R and C++ integration. Anchor pairs from synteny analyses are treated as an undirected unweighted graph (i.e., a synteny network), and users can perform: i. network clustering; ii. phylogenomic profiling (by identifying which species contain which clusters) and; iii. microsynteny-based phylogeny reconstruction with maximum likelihood.

Maintained by Fabrício Almeida-Silva. Last updated 3 months ago.

software networkinference functionalgenomics comparativegenomics phylogenetics systemsbiology graphandnetwork wholegenome network comparative-genomics evolutionary-genomics network-science phylogenomics synteny synteny-network cpp

28 stars 6.70 score 12 scripts 1 dependent

correlaid

newsanchor:Client for the News API

Interface to gather news from the 'News API', based on a multilevel query <https://newsapi.org/>. A personal API key is required.

Maintained by Yannik Buhl. Last updated 5 years ago.

36 stars 6.70 score 40 scripts

bioc

megadepth:megadepth: BigWig and BAM related utilities

This package provides an R interface to Megadepth by Christopher Wilks available at https://github.com/ChristopherWilks/megadepth. It is particularly useful for computing the coverage of a set of genomic regions across bigWig or BAM files. With this package, you can build base-pair coverage matrices for regions or annotations of your choice from BigWig files. Megadepth was used to create the raw files provided by https://bioconductor.org/packages/recount3.

Maintained by David Zhang. Last updated 4 months ago.

software coverage dataimport transcriptomics rnaseq preprocessing bam bigwig daspter megadepth recount2 recount3

12 stars 6.69 score 7 scripts 3 dependents

ropensci

baRulho:Quantifying (Animal) Sound Degradation

Intended to facilitate acoustic analysis of (animal) sound propagation experiments, which typically aim to quantify changes in signal structure when transmitted in a given habitat by broadcasting and re-recording animal sounds at increasing distances. The package offers a workflow with functions to prepare the data set for analysis as well as to calculate and visualize several degradation metrics, including blur ratio, signal-to-noise ratio, excess attenuation and envelope correlation, among others (Dabelsteen et al. 1993 <doi:10.1121/1.406682>).

Maintained by Marcelo Araya-Salas. Last updated 5 days ago.

acoustic-signals animal behavior bioacoustics

6 stars 6.66 score 18 scripts

simnph

SimNPH:Simulate Non-Proportional Hazards

A toolkit for simulation studies concerning time-to-event endpoints with non-proportional hazards. 'SimNPH' encompasses functions for simulating time-to-event data in various scenarios, simulating different trial designs like fixed-followup, event-driven, and group sequential designs. The package provides functions to calculate the true values of common summary statistics for the implemented scenarios and offers common analysis methods for time-to-event data. Helper functions for running simulations with the 'SimDesign' package and for aggregating and presenting the results are also included. Results of the conducted simulation study are available in the paper: "A Comparison of Statistical Methods for Time-To-Event Analyses in Randomized Controlled Trials Under Non-Proportional Hazards", Klinglmüller et al. (2025) <doi:10.1002/sim.70019>.

Maintained by Tobias Fellinger. Last updated 27 days ago.

clinical-trial-simulations non-proportional-hazards statistical-simulation statistics survival-analysis

6 stars 6.63 score 43 scripts

mazamascience

AirMonitor:Air Quality Data Analysis

Utilities for working with hourly air quality monitoring data with a focus on small particulates (PM2.5). A compact data model is structured as a list with two dataframes. A 'meta' dataframe contains spatial and measuring device metadata associated with deployments at known locations. A 'data' dataframe contains a 'datetime' column followed by columns of measurements associated with each "device-deployment". Algorithms to calculate NowCast and the associated Air Quality Index (AQI) are defined at the US Environmental Protection Agency AirNow program: <https://document.airnow.gov/technical-assistance-document-for-the-reporting-of-daily-air-quailty.pdf>.

Maintained by Jonathan Callahan. Last updated 6 months ago.

7 stars 6.57 score 178 scripts

insightsengineering

tern.rbmi:Create Interface for 'RBMI' and 'tern'

'RBMI' implements standard and reference-based multiple imputation methods for continuous longitudinal endpoints (Gower-Page et al. (2022) <doi:10.21105/joss.04251>). This package provides an interface for 'RBMI' that uses the 'tern' <https://cran.r-project.org/package=tern> framework by Zhu et al. (2023) and tabulates results easily using 'rtables' <https://cran.r-project.org/package=rtables> by Becker et al. (2023).

Maintained by Joe Zhu. Last updated 24 days ago.

graphs listings tables

3 stars 6.53 score 3 scripts

jakesherman

easypackages:Easy Loading and Installing of Packages

Easily load and install multiple packages from different sources, including CRAN and GitHub. The libraries function allows you to load or attach multiple packages in the same function call. The packages function will load one or more packages, and install any packages that are not installed on your system (after prompting you). Also included is a from_import function that allows you to import specific functions from a package into the global environment.

Maintained by Jake Sherman. Last updated 7 years ago.

packages

11 stars 6.52 score 490 scripts

gisma

uavRmp:UAV Mission Planner

The Unmanned Aerial Vehicle Mission Planner provides an easy-to-use workflow for planning autonomous, obstacle-avoiding surveys with ready-to-fly unmanned aerial vehicles to retrieve aerial or spot-related data. It creates either intermediate flight control files for the DJI-Litchi supported series or ready-to-upload control files for the pixhawk-based flight controller as used in the 3DR-Solo or Yuneec series. Additionally, it contains some useful tools for digitizing and data manipulation.

Maintained by Chris Reudenbach. Last updated 10 months ago.

cultural-heritage dji drone flight-planning forest-mapping litchi low-budget-uav mission-planning photogrammetry pixhawk pixhawk-controller qgroundcontrol2litchi solo survey terrain-following terrain-mapping uavs yuneec

25 stars 6.48 score 6 scripts

bachmannpatrick

CLVTools:Tools for Customer Lifetime Value Estimation

A set of state-of-the-art probabilistic modeling approaches to derive estimates of individual customer lifetime values (CLV). Commonly, probabilistic approaches focus on modelling 3 processes, i.e. individuals' attrition, transaction, and spending process. Latent customer attrition models, which are also known as "buy-'til-you-die models", model the attrition as well as the transaction process. They are used to make inferences and predictions about transactional patterns of individual customers such as their future purchase behavior. Moreover, these models have also been used to predict individuals’ long-term engagement in activities such as playing an online game or posting to a social media platform. The spending process is usually modelled by a separate probabilistic model. Combining these results yields lifetime value estimates for individual customers. This package includes fast and accurate implementations of various probabilistic models for non-contractual settings (e.g., grocery purchases or hotel visits). All implementations support time-invariant covariates, which can be used to control for e.g., socio-demographics. If such an extension has been proposed in literature, we further provide the possibility to control for time-varying covariates to control for e.g., seasonal patterns. Currently, the package includes the following latent attrition models to model individuals' attrition and transaction process: [1] Pareto/NBD model (Pareto/Negative-Binomial-Distribution), [2] the Extended Pareto/NBD model (Pareto/Negative-Binomial-Distribution with time-varying covariates), [3] the BG/NBD model (Beta-Gamma/Negative-Binomial-Distribution) and the [4] GGom/NBD (Gamma-Gompertz/Negative-Binomial-Distribution). Further, we provide an implementation of the Gamma/Gamma model to model the spending process of individuals.

Maintained by Patrick Bachmann. Last updated 4 months ago.

clv customer-lifetime-value customer-relationship-management openblas gsl cpp openmp

55 stars 6.47 score 12 scripts

bstatcomp

bayes4psy:User Friendly Bayesian Data Analysis for Psychology

Contains several Bayesian models for data analysis of psychological tests. A user friendly interface for these models should enable students and researchers to perform professional level Bayesian data analysis without advanced knowledge in programming and Bayesian statistics. This package is based on the Stan platform (Carpenter et al. 2017 <doi:10.18637/jss.v076.i01>).

Maintained by Jure Demšar. Last updated 2 years ago.

cpp

14 stars 6.44 score 33 scripts

bioc

doubletrouble:Identification and classification of duplicated genes

doubletrouble aims to identify duplicated genes from whole-genome protein sequences and classify them based on their modes of duplication. The duplication modes are i. segmental duplication (SD); ii. tandem duplication (TD); iii. proximal duplication (PD); iv. transposed duplication (TRD) and; v. dispersed duplication (DD). Transposon-derived duplicates (TRD) can be further subdivided into rTRD (retrotransposon-derived duplication) and dTRD (DNA transposon-derived duplication). If users want a simpler classification scheme, duplicates can also be classified into SD- and SSD-derived (small-scale duplication) gene pairs. Besides classifying gene pairs, users can also classify genes, so that each gene is assigned a unique mode of duplication. Users can also calculate substitution rates per substitution site (i.e., Ka and Ks) from duplicate pairs, find peaks in Ks distributions with Gaussian Mixture Models (GMMs), and classify gene pairs into age groups based on Ks peaks.

Maintained by Fabrício Almeida-Silva. Last updated 18 days ago.

software wholegenome comparativegenomics functionalgenomics phylogenetics network classification bioinformatics comparative-genomics gene-duplication molecular-evolution whole-genome-duplication

23 stars 6.44 score 17 scripts

alexpghayes

modeltests:Testing Infrastructure for Broom Model Generics

Provides a number of testthat tests that can be used to verify that tidy(), glance() and augment() methods meet consistent specifications. This allows methods for the same generic to be spread across multiple packages, since all of those packages can make the same guarantees to users about returned objects.

Maintained by Alex Hayes. Last updated 11 months ago.

6 stars 6.42 score 396 scripts

jakubsob

cucumber:Behavior-Driven Development for R

Write executable specifications in a natural language that describes how your code should behave. Write specifications in feature files using 'Gherkin' language and execute them using functions implemented in R. Use them as an extension to your 'testthat' tests to provide a high level description of how your code works.

Maintained by Jakub Sobolewski. Last updated 11 days ago.

acceptance-testing behavior-driven-development cucumber testing

13 stars 6.37 score 10 scripts

bioc

MSstatsShiny:MSstats GUI for Statistical Analysis of Proteomics Experiments

MSstatsShiny is an R-Shiny graphical user interface (GUI) integrated with the R packages MSstats, MSstatsTMT, and MSstatsPTM. It provides a point and click end-to-end analysis pipeline applicable to a wide variety of experimental designs. These include data-dependent acquisitions (DDA) which are label-free or tandem mass tag (TMT)-based, as well as DIA, SRM, and PRM acquisitions and those targeting post-translational modifications (PTMs). The application automatically saves users' selections and builds an R script that recreates their analysis, supporting reproducible data analysis.

Maintained by Devon Kohler. Last updated 5 months ago.

immunooncology massspectrometry proteomics software shinyapps differentialexpression onechannel twochannel normalization qualitycontrol gui

15 stars 6.31 score 4 scripts

pik-piam

mrremind:MadRat REMIND Input Data Package

The mrremind package contains data preprocessing for the REMIND model.

Maintained by Lavinia Baumstark. Last updated 3 days ago.

4 stars 6.25 score 15 scripts 1 dependent

ropensci

autotest:Automatic Package Testing

Automatic testing of R packages via a simple YAML schema.

Maintained by Mark Padgham. Last updated 5 months ago.

automated-testing fuzzing testing

54 stars 6.21 score 25 scripts

ms-quality-hub

rmzqc:Creation, Reading and Validation of 'mzqc' Files

Reads, writes and validates 'mzQC' files. The 'mzQC' format is a standardized file format for the exchange, transmission, and archiving of quality metrics derived from biological mass spectrometry data, as defined by the HUPO-PSI (Human Proteome Organisation - Proteomics Standards Initiative) Quality Control working group. See <https://hupo-psi.github.io/mzQC/> for details.

Maintained by Chris Bielow. Last updated 3 days ago.

hacktoberfest mass-spectrometry mzqc quality-control

3 stars 6.21 score 10 scripts 3 dependents

mansmeg

markmyassignment:Automatic Marking of R Assignments

Automatic marking of R assignments for students and teachers based on 'testthat' test suites.

Maintained by Mans Magnusson. Last updated 1 year ago.

5 stars 6.19 score 155 scripts

nhs-r-community

NHSRwaitinglist:R-Package to Implement a Waiting List Management Using Queuing Theory

R-package to implement the waiting list management approach described in Fong et al. 2022 <doi:10.1101/2022.08.23.22279117>.

Maintained by Chris Mainey. Last updated 4 hours ago.

nhs nhs-r-community queuing-theory waiting-list

18 stars 6.17 score 17 scripts

lucas-castillo

samplr:Compare Human Performance to Sampling Algorithms

Understand human performance from the perspective of sampling, both looking at how people generate samples and how people use the samples they have generated. A longer overview and other resources can be found at <https://sampling.warwick.ac.uk>.

Maintained by Lucas Castillo. Last updated 6 hours ago.

openblas cpp

2 stars 6.15 score 25 scripts

snystrom

cmdfun:Framework for Building Interfaces to Shell Commands

Writing interfaces to command line software is cumbersome. 'cmdfun' provides a framework for building function calls to seamlessly interface with shell commands by allowing lazy evaluation of command line arguments. 'cmdfun' also provides methods for handling user-specific paths to tool installs or secrets like API keys. Its focus is to equally serve package builders who wish to wrap command line software, and to help analysts stay inside R when they might usually leave to execute non-R software.

Maintained by Spencer Nystrom. Last updated 4 years ago.

15 stars 6.13 score 7 scripts 6 dependents

ramikrispin

covid19italy:The 2019 Novel Coronavirus COVID-19 (2019-nCoV) Italy Dataset

Provides a daily summary of the Coronavirus (COVID-19) cases in Italy by country, region and province level. Data source: Presidenza del Consiglio dei Ministri - Dipartimento della Protezione Civile <https://www.protezionecivile.it/>.

Maintained by Rami Krispin. Last updated 2 years ago.

47 stars 6.07 score 25 scripts

ludvigolsen

xpectr:Generates Expectations for 'testthat' Unit Testing

Helps systematize and ease the process of building unit tests with the 'testthat' package by providing tools for generating expectations.

Maintained by Ludvig Renbo Olsen. Last updated 26 days ago.

37 stars 6.06 score 62 scripts

kforner

srcpkgs:R Source Packages Manager

Manage a collection/library of R source packages. Discover, document, load, test source packages. Use those packages as if they were actually installed. Quickly reload only what is needed on source code change. Run tests and checks in parallel.

Maintained by Karl Forner. Last updated 10 months ago.

11 stars 6.04 score 6 scripts

irods

rirods:R Client for 'iRODS'

The open sourced data management software 'Integrated Rule-Oriented Data System' ('iRODS') offers solutions for the whole data life cycle (<https://irods.org/>). The loosely constructed and highly configurable architecture of 'iRODS' frees the user from strict formatting constraints and single-vendor solutions. This package provides an interface to the 'iRODS' HTTP API, allowing you to manage your data and metadata in 'iRODS' with R. Storage of annotated files and R objects in 'iRODS' ensures findability, accessibility, interoperability, and reusability of data.

Maintained by Martin Schobben. Last updated 1 year ago.

irods irods-client

7 stars 6.04 score 31 scripts

bioc

INDEED:Interactive Visualization of Integrated Differential Expression and Differential Network Analysis for Biomarker Candidate Selection Package

An R package for integrated differential expression and differential network analysis based on omic data for cancer biomarker discovery. Both correlation and partial correlation can be used to generate a differential network that aids the traditional differential expression analysis in identifying changes between biomolecules on both their expression and pairwise association levels. A detailed description of the methodology has been published in Methods journal (PMID: 27592383). An interactive visualization feature allows for the exploration and selection of candidate biomarkers.

Maintained by Ressom group. Last updated 5 months ago.

immunooncology software researchfield biologicalquestion statisticalmethod differentialexpression massspectrometry metabolomics

5 stars 6.02 score 10 scripts

marce10

Rraven:Connecting R and 'Raven' Sound Analysis Software

A tool to exchange data between R and 'Raven' sound analysis software (Cornell Lab of Ornithology). Functions work on data formats compatible with the R package 'warbleR'.

Maintained by Marcelo Araya-Salas. Last updated 3 months ago.

animal raven sounds

10 stars 6.00 score 50 scripts

ixpantia

ixplorer:Easy DataOps for R Users

Create and view tickets in 'gitea', a self-hosted git service <https://about.gitea.com>, using an 'RStudio' addin, and use helper functions to publish documentation and use git.

Maintained by Frans van Dunne. Last updated 5 months ago.

dataops gitea hacktoberfest

2 stars 5.94 score 5 scripts

bioc

PathoStat:PathoStat Statistical Microbiome Analysis Package

The purpose of this package is to perform Statistical Microbiome Analysis on metagenomics results from sequencing data samples. In particular, it supports analyses on the PathoScope generated report files. PathoStat provides various functionalities including Relative Abundance charts, Diversity estimates and plots, tests of Differential Abundance, Time Series visualization, and Core OTU analysis.

Maintained by Solaiappan Manimaran. Last updated 5 months ago.

microbiome metagenomics graphandnetwork microarray patternlogic principalcomponent sequencing software visualization rnaseq immunooncology

8 stars 5.90 score 8 scripts

jeffreyrstevens

flashr:Create Flashcards of Terms and Definitions

Provides functions for creating flashcard decks of terms and definitions. This package creates HTML slides using 'revealjs' that can be viewed in the 'RStudio' viewer or a web browser. Users can create flashcards from either existing built-in decks or create their own from CSV files or vectors of function names.

Maintained by Jeffrey R. Stevens. Last updated 1 year ago.

9 stars 5.89 score 171 scripts

thinhong

denim:Generate and Simulate Deterministic Discrete-Time Compartmental Models

R package to build and simulate deterministic discrete-time compartmental models that can be non-Markov. Length of stay in each compartment can be defined to follow a parametric distribution (d_exponential(), d_gamma(), d_weibull(), d_lognormal()) or a non-parametric distribution (nonparametric()). Other supported types of transition from one compartment to another include fixed transition (constant()), multinomial (multinomial()), and fixed transition probability (transprob()).

Maintained by Anh Phan. Last updated 14 days ago.

cpp

2 stars 5.86 score 8 scripts

jimbrig

lossrx:Actuarial Loss Development and Reserving with R

Actuarial Loss Development and Reserving Helper Functions and ShinyApp.

Maintained by Jimmy Briggs. Last updated 3 months ago.

actuarial-science claims-data claims-reserving data-science insurance modelling property-casualty reserving rshiny workflow

14 stars 5.82 score 7 scripts

guokai8

microbial:Do 16s Data Analysis and Generate Figures

Provides functions to enhance the available statistical analysis procedures in R by providing simple functions to analyze and visualize 16S rRNA data. Here we present a tutorial with minimum working examples to demonstrate usage and dependencies.

Maintained by Kai Guo. Last updated 6 months ago.

software graphandnetwork microbiome microbiome-analysis

13 stars 5.81 score 25 scripts

cran

exact2x2:Exact Tests and Confidence Intervals for 2x2 Tables

Calculates conditional exact tests (Fisher's exact test, Blaker's exact test, or exact McNemar's test) and unconditional exact tests (including score-based tests on differences in proportions, ratios of proportions, and odds ratios, and Boschloo's test) with appropriate matching confidence intervals, and provides power and sample size calculations. Gives melded confidence intervals for the binomial case (Fay, et al, 2015, <DOI:10.1111/biom.12231>). Gives boundary-optimized rejection region test (Gabriel, et al, 2018, <DOI:10.1002/sim.7579>), an unconditional exact test for the situation where the controls are all expected to fail. Gives confidence intervals compatible with exact McNemar's or sign tests (Fay and Lumbard, 2021, <DOI:10.1002/sim.8829>). For review of these kinds of exact tests see Fay and Hunsberger (2021, <DOI:10.1214/21-SS131>).

Maintained by Michael P. Fay. Last updated 1 year ago.

3 stars 5.81 score 6 dependents

modeloriented

modelDown:Make Static HTML Website for Predictive Models

Website generator with HTML summaries for predictive models. This package uses 'DALEX' explainers to describe global model behavior. We can see how well models behave (tabs: Model Performance, Auditor), how much each variable contributes to predictions (tabs: Variable Response) and which variables are the most important for a given model (tabs: Variable Importance). We can also compare Concept Drift for pairs of models (tabs: Drifter). Additionally, data available on the website can be easily recreated in the current R session. Work on this package was financially supported by the NCN Opus grant 2017/27/B/ST6/01307 at Warsaw University of Technology, Faculty of Mathematics and Information Science.

Maintained by Kamil Romaszko. Last updated 4 years ago.

121 stars 5.80 score 15 scripts

appsilon

shiny.benchmark:Benchmark the Performance of 'shiny' Applications

Compare performance between different versions of a 'shiny' application based on 'git' references.

Maintained by Douglas Azevedo. Last updated 12 months ago.

performance-testing rhinoverse shiny

31 stars 5.79 score 6 scripts

socialresearchcentre

testdat:Data Unit Testing for R

Test your data! An extension of the 'testthat' unit testing framework with a family of functions and reporting tools for checking and validating data frames.

Maintained by Danny Smith. Last updated 10 months ago.

8 stars 5.78 score 50 scripts

bioc

MetID:Network-based prioritization of putative metabolite IDs

This package uses an innovative network-based approach that will enhance our ability to determine the identities of significant ions detected by LC-MS.

Maintained by Zhenzhi Li. Last updated 5 months ago.

assaydomain biologicalquestion infrastructure researchfield statisticalmethod technology workflowstep network kegg

1 star 5.74 score 110 scripts

jwiley

multilevelTools:Multilevel and Mixed Effects Model Diagnostics and Effect Sizes

Effect sizes, diagnostics and performance metrics for multilevel and mixed effects models. Includes marginal and conditional 'R2' estimates for linear mixed effects models based on Johnson (2014) <doi:10.1111/2041-210X.12225>.

Maintained by Joshua F. Wiley. Last updated 5 days ago.

4 stars 5.74 score 136 scripts

bioc

atSNP:Affinity test for identifying regulatory SNPs

atSNP performs affinity tests of motif matches with the SNP or the reference genomes and SNP-led changes in motif matches.

Maintained by Sunyoung Shin. Last updated 5 months ago.

software chipseq genomeannotation motifannotation visualization cpp

1 stars 5.73 score 36 scripts

limengbinggz

ddtlcm:Latent Class Analysis with Dirichlet Diffusion Tree Process Prior

Implements a Bayesian algorithm for overcoming weak separation in Bayesian latent class analysis. Reference: Li et al. (2023) <arXiv:2306.04700>.

Maintained by Mengbing Li. Last updated 8 months ago.

6 stars 5.73 score 8 scripts

wa-department-of-agriculture

soils:Visualize and Report Soil Health Data

Collection of soil health data visualization and reporting tools, including an RStudio project template with everything you need to generate custom HTML and Microsoft Word reports for each participant in your soil health sampling project.

Maintained by Jadey N Ryan. Last updated 6 days ago.

10 stars 5.73 score 9 scripts

holgstr

fmeffects:Model-Agnostic Interpretations with Forward Marginal Effects

Create local, regional, and global explanations for any machine learning model with forward marginal effects. You provide a model and data, and 'fmeffects' computes feature effects. The package is based on the theory in: C. A. Scholbeck, G. Casalicchio, C. Molnar, B. Bischl, and C. Heumann (2022) <doi:10.48550/arXiv.2201.08837>.

Maintained by Holger Löwe. Last updated 5 months ago.

2 stars 5.73 score 6 scripts

tiago-simoes

EvoPhylo:Pre- And Postprocessing of Morphological Data from Relaxed Clock Bayesian Phylogenetics

Performs automated morphological character partitioning for phylogenetic analyses and analyzes macroevolutionary parameter outputs from clock (time-calibrated) Bayesian inference analyses, following concepts introduced by Simões and Pierce (2021) <doi:10.1038/s41559-021-01532-x>.

Maintained by Tiago Simoes. Last updated 2 years ago.

4 stars 5.66 score 19 scripts

mazamascience

MazamaLocationUtils:Manage Spatial Metadata for Known Locations

Utility functions for discovering and managing metadata associated with spatially unique "known locations". Applications include all fields of environmental monitoring (e.g. air and water quality) where data are collected at stationary sites.

Maintained by Jonathan Callahan. Last updated 4 months ago.

5.64 score 108 scripts

selcukorkmaz

PubChemR:Interface to the 'PubChem' Database for Chemical Data Retrieval

Provides an interface to the 'PubChem' database via the PUG REST <https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest> and PUG View <https://pubchem.ncbi.nlm.nih.gov/docs/pug-view> services. This package allows users to automatically access chemical and biological data from 'PubChem', including compounds, substances, assays, and various other data types. Functions are available to retrieve data in different formats, perform searches, and access detailed annotations.

Maintained by Selcuk Korkmaz. Last updated 6 months ago.

2 stars 5.62 score 23 scripts

bflammers

ANN2:Artificial Neural Networks for Anomaly Detection

Training of neural networks for classification and regression tasks using mini-batch gradient descent. Special features include a function for training autoencoders, which can be used to detect anomalies, and some related plotting functions. Multiple activation functions are supported, including tanh, relu, step and ramp. For the use of the step and ramp activation functions in detecting anomalies using autoencoders, see Hawkins et al. (2002) <doi:10.1007/3-540-46145-0_17>. Furthermore, several loss functions are supported, including robust ones such as Huber and pseudo-Huber loss, as well as L1 and L2 regularization. The possible options for optimization algorithms are RMSprop, Adam and SGD with momentum. The package contains a vectorized C++ implementation that facilitates fast training through mini-batch learning.

Maintained by Bart Lammers. Last updated 4 years ago.

anomaly-detection artificial-neural-networks autoencoders neural-networks robust-statistics openblas cpp openmp

13 stars 5.59 score 60 scripts

seankross

swirl:Learn R, in R

Use the R console as an interactive learning environment. Users receive immediate feedback as they are guided through self-paced lessons in data science and R programming.

Maintained by Sean Kross. Last updated 5 years ago.

5.57 score 1.8k scripts 1 dependent

bioc

chevreulPlot:Plots used in the chevreulPlot package

Tools for plotting SingleCellExperiment objects in the chevreulPlot package. Includes functions for analysis and visualization of single-cell data. Supported by NIH grants R01CA137124 and R01EY026661 to David Cobrinik.

Maintained by Kevin Stachelek. Last updated 1 month ago.

coverage rnaseq sequencing visualization geneexpression transcription singlecell transcriptomics normalization preprocessing qualitycontrol dimensionreduction dataimport

5.56 score 2 scripts 1 dependent

tomzylkin

penppml:Penalized Poisson Pseudo Maximum Likelihood Regression

A set of tools that enables efficient estimation of penalized Poisson Pseudo Maximum Likelihood regressions, using lasso or ridge penalties, for models that feature one or more sets of high-dimensional fixed effects. The methodology is based on Breinlich, Corradi, Rocha, Ruta, Santos Silva, and Zylkin (2021) <http://hdl.handle.net/10986/35451> and takes advantage of the method of alternating projections of Gaure (2013) <doi:10.1016/j.csda.2013.03.024> for dealing with HDFE, as well as the coordinate descent algorithm of Friedman, Hastie and Tibshirani (2010) <doi:10.18637/jss.v033.i01> for fitting lasso regressions. The package is also able to carry out cross-validation and to implement the plugin lasso of Belloni, Chernozhukov, Hansen and Kozbur (2016) <doi:10.1080/07350015.2015.1102733>.

Maintained by Joao Cruz. Last updated 2 months ago.

cpp

12 stars 5.56 score 10 scripts

hughjonesd

doctest:Generate Tests from Examples Using 'roxygen' and 'testthat'

Creates 'testthat' tests from 'roxygen' examples using simple tags.

Maintained by David Hugh-Jones. Last updated 1 year ago.

33 stars 5.52 score 4 scripts

bioc

methylclock:Methylclock - DNA methylation-based clocks

This package estimates chronological and gestational DNA methylation (DNAm) age as well as biological age using different methylation clocks. Chronological DNAm age (in years): Horvath's clock, Hannum's clock, BNN, Horvath's skin+blood clock, PedBE clock and Wu's clock. Gestational DNAm age: Knight's clock, Bohlin's clock, Mayne's clock and Lee's clocks. Biological DNAm clocks: Levine's clock and Telomere Length's clock.

Maintained by Dolors Pelegri-Siso. Last updated 5 months ago.

dnamethylation biologicalquestion preprocessing statisticalmethod normalization cpp

39 stars 5.52 score 28 scripts

bioc

GeoDiff:Count model based differential expression and normalization on GeoMx RNA data

A series of statistical models using count generating distributions for background modelling, feature and sample QC, normalization and differential expression analysis on GeoMx RNA data. The application of these methods is demonstrated in an example data analysis vignette.

Maintained by Nicole Ortogero. Last updated 5 months ago.

geneexpression differentialexpression normalization openblas cpp openmp

8 stars 5.51 score 9 scripts

ropensci

mcbette:Model Comparison Using 'babette'

'BEAST2' (<https://www.beast2.org>) is a widely used Bayesian phylogenetic tool that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. 'mcbette' performs a Bayesian model comparison over some site and clock models, using 'babette' (<https://github.com/ropensci/babette/>).

Maintained by Richèl J.C. Bilderbeek. Last updated 8 months ago.

openjdk

7 stars 5.50 score 18 scripts

ineelhere

shiny.ollama:R 'shiny' Interface for Chatting with Large Language Models Offline on Local with 'ollama'

Chat with large language models like 'deepseek-r1', 'nemotron', 'llama', 'qwen' and many more on your machine without internet with complete privacy via 'ollama', powered by R 'shiny' interface. For more information on 'ollama', visit <https://ollama.com>.

Maintained by Indraneel Chakraborty. Last updated 19 days ago.

deepseek-r1 llama3 llm local-llm offline-first offline-llm ollama ollama-app ollama-gui shiny shinyapp

9 stars 5.50 score 2 scripts

bioc

MotifPeeker:Benchmarking Epigenomic Profiling Methods Using Motif Enrichment

MotifPeeker is used to compare and analyse datasets from epigenomic profiling methods with motif enrichment as the key benchmark. The package outputs an HTML report consisting of three sections: (1. General Metrics) Overview of peaks-related general metrics for the datasets (FRiP scores, peak widths and motif-summit distances). (2. Known Motif Enrichment Analysis) Statistics for the frequency of user-provided motifs enriched in the datasets. (3. De-Novo Motif Enrichment Analysis) Statistics for the frequency of de-novo discovered motifs enriched in the datasets and compared with known motifs.

Maintained by Hiranyamaya Dash. Last updated 3 months ago.

epigenetics genetics qualitycontrol chipseq multiplecomparison functionalgenomics motifdiscovery sequencematching software alignment bioconductor bioconductor-package chip-seq epigenomics interactive-report motif-enrichment-analysis

2 stars 5.48 score 6 scripts

loelschlaeger

RprobitB:Bayesian Probit Choice Modeling

Bayes estimation of probit choice models, both in the cross-sectional and panel setting. The package can analyze binary, multivariate, ordered, and ranked choices, as well as heterogeneity of choice behavior among deciders. The main functionality includes model fitting via Markov chain Monte Carlo methods, tools for convergence diagnostics, choice data simulation, in-sample and out-of-sample choice prediction, and model selection using information criteria and Bayes factors. The latent class model extension facilitates preference-based decider classification, where the number of latent classes can be inferred via the Dirichlet process or a weight-based updating heuristic. This allows for flexible modeling of choice behavior without the need to impose structural constraints. For a reference on the method see Oelschlaeger and Bauer (2021) <https://trid.trb.org/view/1759753>.

Maintained by Lennart Oelschläger. Last updated 6 months ago.

bayes discrete-choice probit openblas cpp openmp

4 stars 5.45 score 1 script

jonthegeek

beekeeper:Rapidly Scaffold API Client Packages

Automatically generate R package skeletons from 'application programming interfaces (APIs)' that follow the 'OpenAPI Specification (OAS)'. The skeletons implement best practices to streamline package development.

Maintained by Jon Harmon. Last updated 6 months ago.

53 stars 5.42 score 2 scripts

luckinet

arealDB:Harmonise and Integrate Heterogeneous Areal Data

Many relevant applications in the environmental and socioeconomic sciences use areal data, such as biodiversity checklists, agricultural statistics, or socioeconomic surveys. For applications that surpass the spatial, temporal or thematic scope of any single data source, data must be integrated from several heterogeneous sources. Inconsistent concepts, definitions, or messy data tables make this a tedious and error-prone process. 'arealDB' tackles those problems and helps the user to integrate a harmonised database of areal data. Read the paper at Ehrmann, Seppelt & Meyer (2020) <doi:10.1016/j.envsoft.2020.104799>.

Maintained by Steffen Ehrmann. Last updated 2 months ago.

areal-data database

2 stars 5.41 score 15 scripts

crunch-io

crplyr:A 'dplyr' Interface for Crunch

In order to facilitate analysis of datasets hosted on the Crunch data platform <https://crunch.io/>, the 'crplyr' package implements 'dplyr' methods on top of the Crunch backend. The usual methods 'select', 'filter', 'group_by', 'summarize', and 'collect' are implemented in such a way as to perform as much computation on the server and pull as little data locally as possible.

Maintained by Greg Freedman Ellis. Last updated 2 years ago.

6 stars 5.41 score 17 scripts

simon-smart88

shinyscholar:A Template for Creating Reproducible 'shiny' Applications

Create a skeleton 'shiny' application with create_template() that is reproducible, can be saved and meets academic standards for attribution. Forked from 'wallace'. Code is split into modules that are loaded and linked together automatically and each call one function. Guidance pages explain modules to users and flexible logging informs them of any errors. Options enable asynchronous operations, viewing of source code, interactive maps and data tables. Use to create complex analytical applications, following best practices in open science and software development. Includes functions for automating repetitive development tasks and an example application at run_shinyscholar() that requires install.packages("shinyscholar", dependencies = TRUE). A guide to developing applications can be found on the package website.

Maintained by Simon E. H. Smart. Last updated 7 days ago.

22 stars 5.40 score 5 scripts

loelschlaeger

oeli:Utilities for Developing Data Science Software

Some general helper functions that I (and maybe others) find useful when developing data science software.

Maintained by Lennart Oelschläger. Last updated 4 months ago.

openblas cpp

2 stars 5.38 score 1 script 4 dependents

beerda

nuggets:Extensible Data Pattern Searching Framework

Extensible framework for subgroup discovery (Atzmueller (2015) <doi:10.1002/widm.1144>), contrast patterns (Chen (2022) <doi:10.48550/arXiv.2209.13556>), emerging patterns (Dong (1999) <doi:10.1145/312129.312191>), association rules (Agrawal (1994) <https://www.vldb.org/conf/1994/P487.PDF>) and conditional correlations (Hájek (1978) <doi:10.1007/978-3-642-66943-9>). Both crisp (Boolean, binary) and fuzzy data are supported. It generates conditions in the form of elementary conjunctions, evaluates them on a dataset and checks the induced sub-data for interesting statistical properties. A user-defined function may be defined to evaluate on each generated condition to search for custom patterns.

Maintained by Michal Burda. Last updated 18 days ago.

association-rule-mining contrast-pattern-mining data-mining fuzzy knowledge-discovery pattern-recognition cpp openmp

2 stars 5.38 score 10 scripts

angabrio

missingHE:Missing Outcome Data in Health Economic Evaluation

Contains a suite of functions for health economic evaluations with missing outcome data. The package can fit different types of statistical models under a fully Bayesian approach using the software 'JAGS' (which should be installed locally and which is loaded in 'missingHE' via the 'R' package 'R2jags'). Three classes of models can be fitted under a variety of missing data assumptions: selection models, pattern mixture models and hurdle models. In addition to model fitting, 'missingHE' provides a set of specialised functions to assess model convergence and fit, and to summarise the statistical and economic results using different types of measures and graphs. The methods implemented are described in Mason (2018) <doi:10.1002/hec.3793>, Molenberghs (2000) <doi:10.1007/978-1-4419-0300-6_18> and Gabrio (2019) <doi:10.1002/sim.8045>.

Maintained by Andrea Gabrio. Last updated 2 years ago.

cost-effectiveness-analysis health-economic-evaluation individual-level-data jags missing-data parametric-modelling sensitivity-analysis cpp

5 stars 5.38 score 24 scripts

bioc

chevreulProcess:Tools for managing SingleCellExperiment objects as projects

Tools for analyzing SingleCellExperiment objects as projects, for input into the Chevreul app downstream. Includes functions for analysis of single cell RNA sequencing data. Supported by NIH grants R01CA137124 and R01EY026661 to David Cobrinik.

Maintained by Kevin Stachelek. Last updated 2 months ago.

coverage rnaseq sequencing visualization geneexpression transcription singlecell transcriptomics normalization preprocessing qualitycontrol dimensionreduction dataimport

5.38 score 2 scripts 2 dependents

marce10

dynaSpec:Dynamic Spectrogram Visualizations

A set of tools to generate dynamic spectrogram visualizations in video format.

Maintained by Marcelo Araya-Salas. Last updated 1 months ago.

animal-sounds bioacoustics spectrogram

23 stars 5.37 score 34 scripts

tbrown122387

gradeR:Helps Grade Assignment Submissions that are R Scripts

After being given the location of your students' submissions and a test file, the function runs each .R file, and evaluates the results from all the given tests. Results are neatly returned in a data frame that has a row for each student, and a column for each test.

Maintained by Taylor Brown. Last updated 3 years ago.

4 stars 5.36 score 57 scripts

matteo21q

dani:Design and Analysis of Non-Inferiority Trials

Provides tools to help with the design and analysis of non-inferiority trials. These include functions for doing sample size calculations and for analysing non-inferiority trials, using a variety of outcome types and population-level summary measures. It also features functions to make trials more resilient by using the concept of non-inferiority frontiers, as described in Quartagno et al. (2019) <arXiv:1905.00241>. Finally, it includes functions to design and analyse MAMS-ROCI (aka DURATIONS) trials.

Maintained by Matteo Quartagno. Last updated 7 months ago.

2 stars 5.33 score 27 scripts

bioc

MsQuality:MsQuality - Quality metric calculation from Spectra and MsExperiment objects

The MsQuality package provides functionality to calculate quality metrics for mass spectrometry-derived spectral data at the per-sample level. MsQuality relies on the mzQC framework of quality metrics defined by the Human Proteome Organization-Proteomics Standards Initiative (HUPO-PSI). These metrics quantify the quality of spectral raw files using a controlled vocabulary. The package is especially aimed at users who acquire mass spectrometry data on a large scale (e.g. data sets from clinical settings consisting of several thousands of samples). The MsQuality package allows the calculation of low-level quality metrics that require minimum information on mass spectrometry data: retention time, m/z values, and associated intensities. MsQuality relies on the Spectra package, or alternatively the MsExperiment package, and its infrastructure to store spectral data.

Maintained by Thomas Naake. Last updated 2 months ago.

metabolomics proteomics massspectrometry qualitycontrol mass-spectrometry qc

7 stars 5.32 score 2 scripts

henningte

ir:Functions to Handle and Preprocess Infrared Spectra

Functions to import and handle infrared spectra (import from '.csv' and Thermo Galactic's '.spc', baseline correction, binning, clipping, interpolating, smoothing, averaging, adding, subtracting, dividing, multiplying, plotting).

Maintained by Henning Teickner. Last updated 3 years ago.

chemometrics infrared infrared-spectra ir-package mid-infrared-spectra spectroscopy

6 stars 5.32 score 35 scripts

frankiecho

ahpsurvey:Analytic Hierarchy Process for Survey Data

The Analytic Hierarchy Process is a versatile multi-criteria decision-making tool introduced by Saaty (1987) <doi:10.1016/0270-0255(87)90473-8> that allows decision-makers to weigh attributes and evaluate alternatives presented to them. This package provides a consistent methodology for researchers to reformat data and run the analytic hierarchy process in R on data that are formatted using the survey data entry mode. It is optimized for performing the analytic hierarchy process with many decision-makers, and provides tools and options for researchers to aggregate individual preferences and test multiple options. It also allows researchers to quantify, visualize and correct for inconsistency in the decision-makers' comparisons.

Maintained by Frankie Cho. Last updated 4 years ago.

analytic-hierarchy-process operations-research questionnaire survey-data

14 stars 5.28 score 27 scripts

mazamascience

MazamaTimeSeries:Core Functionality for Environmental Time Series

Utility functions for working with environmental time series data from known locations. The compact data model is structured as a list with two dataframes. A 'meta' dataframe contains spatial and measuring device metadata associated with deployments at known locations. A 'data' dataframe contains a 'datetime' column followed by columns of measurements associated with each "device-deployment". Ephemerides calculations are based on code originally found in NOAA's "Solar Calculator" <https://gml.noaa.gov/grad/solcalc/>.

Maintained by Jonathan Callahan. Last updated 1 years ago.

timeseries

5.27 score 62 scripts 1 dependents

boennecd

psqn:Partially Separable Quasi-Newton

Provides quasi-Newton methods to minimize partially separable functions. The methods are largely described by Nocedal and Wright (2006) <doi:10.1007/978-0-387-40065-5>.

Maintained by Benjamin Christoffersen. Last updated 6 months ago.

optimization optimization-algorithms quasi-newton openblas cpp openmp

2 stars 5.26 score 5 scripts 3 dependents

mi2datalab

tidycharts:Generate Tidy Charts Inspired by 'IBCS'

There is a wide range of R packages created for data visualization, but still, there was no simple and easily accessible way to create clean and transparent charts - up to now. The 'tidycharts' package enables the user to generate charts compliant with International Business Communication Standards ('IBCS'). It means unified bar widths, colors, chart sizes, etc. Creating homogeneous reports has never been that easy! Additionally, users can apply semantic notation to indicate different data scenarios (plan, budget, forecast). What's more, it is possible to customize the charts by creating a personal color palette with the possibility of switching to default options after the experiments. We wanted the package to be helpful in writing reports, so we also made it possible to join charts into one clear image. All charts are generated in SVG format and can be shown in the 'RStudio' viewer pane or exported to HTML output of 'knitr'/'markdown'.

Maintained by Bartosz Sawicki. Last updated 3 years ago.

charts clean ibcs visualization

5 stars 5.23 score 17 scripts

bioc

CNVPanelizer:Reliable CNV detection in targeted sequencing applications

A method that allows for the use of a collection of non-matched normal tissue samples. Our approach uses a non-parametric bootstrap subsampling of the available reference samples to estimate the distribution of read counts from targeted sequencing. As inspired by random forest, this is combined with a procedure that subsamples the amplicons associated with each of the targeted genes. The obtained information allows us to reliably classify the copy number aberrations on the gene level.

Maintained by Thomas Wolf. Last updated 5 months ago.

classification sequencing normalization copynumbervariation coverage

5.23 score 12 scripts

bioc

runibic:runibic: row-based biclustering algorithm for analysis of gene expression data in R

This package implements the UniBic algorithm in R. This biclustering algorithm for analysis of gene expression data was introduced by Zhenjia Wang et al. in 2016. It is currently considered the most promising biclustering method for identification of meaningful structures in complex and noisy data.

Maintained by Patryk Orzechowski. Last updated 5 months ago.

microarray clustering geneexpression sequencing coverage cpp openmp

4 stars 5.20 score 7 scripts

boennecd

VAJointSurv:Variational Approximation for Joint Survival and Marker Models

Estimates joint marker (longitudinal) and survival (time-to-event) outcomes using variational approximations. The package supports multivariate markers allowing for correlated error terms and multiple types of survival outcomes which may be left-truncated, right-censored, and recurrent. Time-varying fixed and random covariate effects are supported along with non-proportional hazards.

Maintained by Benjamin Christoffersen. Last updated 3 months ago.

openblas cpp openmp

5 stars 5.20 score 21 scripts

pik-piam

modelstats:Run Analysis Tools

A collection of tools to analyze model runs.

Maintained by Anastasis Giannousakis. Last updated 14 days ago.

1 stars 5.19 score 2 scripts

ramikrispin

covid19sf:The Covid19 San Francisco Dataset

Provides a variety of summary tables of the Covid19 cases in San Francisco. Data source: San Francisco, Department of Public Health - Population Health Division <https://datasf.org/opendata/>.

Maintained by Rami Krispin. Last updated 2 years ago.

12 stars 5.16 score 12 scripts

larsenlab

hlaR:Tools for HLA Data

A streamlined tool for eplet analysis of donor and recipient HLA (human leukocyte antigen) mismatch. Messy, low-resolution HLA typing data is cleaned, and imputed to high-resolution using the NMDP (National Marrow Donor Program) haplotype reference database <https://haplostats.org/haplostats>. High resolution data is analyzed for overall or single antigen eplet mismatch using a reference table (currently supporting 'HLAMatchMaker' <http://www.epitopes.net> versions 2 and 3). Data can enter or exit the workflow at different points depending on the user's aims and initial data quality.

Maintained by Joan Zhang. Last updated 2 years ago.

7 stars 5.15 score 9 scripts

miriamesteve

GSSTDA:Progression Analysis of Disease with Survival using Topological Data Analysis

Designed to carry out Mapper-based survival analysis with transcriptomics data. Mapper-based survival analysis is a modification of Progression Analysis of Disease (PAD) where survival data is taken into account in the filtering function. More details in: J. Fores-Martos, B. Suay-Garcia, R. Bosch-Romeu, M.C. Sanfeliu-Alonso, A. Falco, J. Climent, "Progression Analysis of Disease with Survival (PAD-S) by SurvMap identifies different prognostic subgroups of breast cancer in a large combined set of transcriptomics and methylation studies" <doi:10.1101/2022.09.08.507080>.

Maintained by Miriam Esteve. Last updated 8 months ago.

2 stars 5.15 score 7 scripts

choi-phd

lordif:Logistic Ordinal Regression Differential Item Functioning using IRT

Performs analysis of Differential Item Functioning (DIF) for dichotomous and polytomous items using an iterative hybrid of ordinal logistic regression and item response theory (IRT) according to Choi, Gibbons, and Crane (2011) <doi:10.18637/jss.v039.i08>.

Maintained by Seung W. Choi. Last updated 3 months ago.

1 stars 5.12 score 35 scripts 1 dependents

s-fleck

testthis:Utils and 'RStudio' Addins to Make Testing Even More Fun

Utility functions and 'RStudio' addins for writing, running and organizing automated tests. Integrates tightly with the packages 'testthat', 'devtools' and 'usethis'. Hotkeys can be assigned to the 'RStudio' addins for running tests in a single file or to switch between a source file and the associated test file. In addition, testthis provides functions to manage and run tests in subdirectories of the tests/testthat directory.

Maintained by Stefan Fleck. Last updated 3 years ago.

rstudio rstudio-addin rstudio-addins testing testthat

33 stars 5.12 score 20 scripts

julia-wrobel

mxfda:A Functional Data Analysis Package for Spatial Single Cell Data

Methods and tools for deriving spatial summary functions from single-cell imaging data and performing functional data analyses. Functions can be applied to other single-cell technologies such as spatial transcriptomics. Functional regression and functional principal component analysis methods are from the 'refund' package <https://cran.r-project.org/package=refund>, while calculation of the spatial summary functions is from the 'spatstat' package <https://spatstat.org/>.

Maintained by Alex Soupir. Last updated 1 months ago.

1 stars 5.08 score 8 scripts

bioc

chevreulShiny:Tools for managing SingleCellExperiment objects as projects

Tools for managing SingleCellExperiment objects as projects. Includes functions for analysis and visualization of single-cell data. Also included is a shiny app for visualization of pre-processed scRNA data. Supported by NIH grants R01CA137124 and R01EY026661 to David Cobrinik.

Maintained by Kevin Stachelek. Last updated 28 days ago.

coverage rnaseq sequencing visualization geneexpression transcription singlecell transcriptomics normalization preprocessing qualitycontrol dimensionreduction dataimport

5.08 score

yanrong-stacy-song

creditr:Credit Default Swaps

Price credit default swaps using 'C' code from the International Swaps and Derivatives Association CDS Standard Model. See <https://www.cdsmodel.com/cdsmodel/documentation.html> for more information about the model and <https://www.cdsmodel.com/cdsmodel/cds-disclaimer.html> for license details for the 'C' code.

Maintained by Yanrong Song. Last updated 8 days ago.

5.05 score 32 scripts

philchalmers

Spower:Power Analyses using Monte Carlo Simulations

Provides a general purpose simulation-based power analysis API for routine and customized simulation experimental designs. The package focuses exclusively on Monte Carlo simulation variants of (expected) prospective power analyses, criterion analyses, compromise analyses, sensitivity analyses, and a priori analyses. The default simulation experiment functions found within the package provide stochastic variants of the power analyses subroutines found in the G*Power 3.1 software (Faul, Erdfelder, Buchner, and Lang, 2009) <doi:10.3758/brm.41.4.1149>, along with various other parametric and non-parametric power analysis examples (e.g., mediation analyses). Supporting functions are also included, such as for building empirical power curve estimates, which utilize a similar API structure.

Maintained by Phil Chalmers. Last updated 19 hours ago.

5.04 score

nalimilan

R.temis:Integrated Text Mining Solution

An integrated solution to perform a series of text mining tasks such as importing and cleaning a corpus, and analyses like terms and documents counts, lexical summary, terms co-occurrences and documents similarity measures, graphs of terms, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.

Maintained by Milan Bouchet-Valat. Last updated 3 days ago.

text-mining

28 stars 5.00 score 24 scripts

bioc

GARS:GARS: Genetic Algorithm for the identification of Robust Subsets of variables in high-dimensional and challenging datasets

Feature selection aims to identify and remove redundant, irrelevant and noisy variables from high-dimensional datasets. Selecting informative features affects the subsequent classification and regression analyses by improving their overall performance. Several methods have been proposed to perform feature selection: most of them rely on univariate statistics, correlation, entropy measurements or the usage of backward/forward regressions. Herein, we propose an efficient, robust and fast method that adopts stochastic optimization approaches for high-dimensional data. GARS is an innovative implementation of a genetic algorithm that selects robust features in high-dimensional and challenging datasets.

Maintained by Mattia Chiesa. Last updated 5 months ago.

classification featureextraction clustering openjdk

5.00 score 2 scripts

dieghernan

pkgdev:Helpers to Develop a Package using GitHub Actions

A small set of functions that takes advantage of GitHub Actions to make your life easier as an R package developer. This package is primarily intended for personal use; however, feel free to use it (at your own risk :)).

Maintained by Diego Hernangómez. Last updated 7 days ago.

developer-tools experimental github-actions

4 stars 5.00 score 7 scripts

bioc

broadSeq:broadSeq : for streamlined exploration of RNA-seq data

This package helps users easily analyse RNA-seq data with multiple methods (which usually need many different input formats). The user provides the expression data as a SummarizedExperiment object and gets results from different methods. It helps users quickly evaluate different methods.

Maintained by Rishi Das Roy. Last updated 5 months ago.

geneexpression differentialexpression rnaseq transcriptomics sequencing coverage genesetenrichment go

4 stars 5.00 score 7 scripts

cran

exactci:Exact P-Values and Matching Confidence Intervals for Simple Discrete Parametric Cases

Calculates exact tests and confidence intervals for one-sample binomial and one- or two-sample Poisson cases (see Fay (2010) <doi:10.32614/rj-2010-008>).

Maintained by Michael P. Fay. Last updated 2 years ago.

5.00 score 10 dependents

alphaprime7

normfluodbf:Cleans and Normalizes FLUOstar DBF and DAT Files from 'Liposome' Flux Assays

Cleans and normalizes FLUOstar DBF and DAT files obtained from liposome flux assays. Users should verify extended usage of the package on files from other assay types.

Maintained by Tingwei Adeck. Last updated 5 months ago.

1 stars 4.98 score 12 scripts

pboutros

bedr:Genomic Region Processing using Tools Such as 'BEDTools', 'BEDOPS' and 'Tabix'

Genomic region processing using open-source command line tools such as 'BEDTools', 'BEDOPS' and 'Tabix'. These tools offer scalable and efficient utilities to perform genome arithmetic, e.g. indexing, formatting and merging. The bedr API enhances access to these tools and offers additional utilities for genomic region processing.

Maintained by Paul C. Boutros. Last updated 6 years ago.

4.98 score 264 scripts 2 dependents

dzhakparov

GeneSelectR:Comprehensive Feature Selection Workflow for Bulk RNAseq Datasets

GeneSelectR is a versatile R package designed for efficient RNA sequencing data analysis. Its key innovation lies in the seamless integration of the Python sklearn machine learning framework with R-based bioinformatics tools. This integration enables GeneSelectR to perform robust ML-driven feature selection while simultaneously leveraging the power of Gene Ontology (GO) enrichment and semantic similarity analyses. By combining these diverse methodologies, GeneSelectR offers a comprehensive workflow that optimizes both the computational aspects of ML and the biological insights afforded by advanced bioinformatics analyses. Ideal for researchers in bioinformatics, GeneSelectR stands out as a unique tool for analyzing complex RNAseq datasets with enhanced precision and relevance.

Maintained by Damir Zhakparov. Last updated 10 months ago.

19 stars 4.98 score 7 scripts

ryan-riggs

RivRetrieve:Retrieve Global River Gauge Data

Provides access to global river gauge data from a variety of national-level river agencies. The package interfaces with the national-level agency websites to provide access to river gauge locations, river discharge, and river stage. Currently, the package is available for the following countries: Australia, Brazil, Canada, Chile, France, Japan, South Africa, the United Kingdom, and the United States.

Maintained by Ryan Riggs. Last updated 3 months ago.

9 stars 4.95 score 7 scripts

lcrawlab

smer:Sparse Marginal Epistasis Test

The Sparse Marginal Epistasis Test is a computationally efficient genetics method which detects statistical epistasis in complex traits; see Stamp et al. (2025, <doi:10.1101/2025.01.11.632557>) for details.

Maintained by Julian Stamp. Last updated 2 months ago.

genomewideassociation epistasis genetics snp linearmixedmodel cpp epistasis-analysis epistatis gwas gwas-tools mapit zlib cpp openmp

1 stars 4.95 score 8 scripts

openwashdata

washr:Publication Toolkit for Water, Sanitation and Hygiene (WASH) Data

A toolkit to set up an R data package in a consistent structure. Automates tasks like tidy data export, data dictionary documentation, README and website creation, and citation management.

Maintained by Colin Walder. Last updated 5 months ago.

2 stars 4.95 score 7 scripts

qile0317

FastUtils:Fast, Readable Utility Functions

A wide variety of tools for general data analysis, wrangling, spelling, statistics, visualizations, package development, and more. All functions have vectorized implementations whenever possible. Exported names are designed to be readable, with longer names possessing short aliases.

Maintained by Qile Yang. Last updated 4 months ago.

scientific-computing utilities utility cpp

2 stars 4.95 score 2 scripts

armcn

quickcheck:Property Based Testing

Property based testing, inspired by the original 'QuickCheck'. This package builds on the property based testing framework provided by 'hedgehog' and is designed to seamlessly integrate with 'testthat'.

Maintained by Andrew McNeil. Last updated 1 years ago.

functional-programming property-based-testing

25 stars 4.94 score 70 scripts

tpisel

openmeteo:Retrieve Weather Data from the Open-Meteo API

A client for the Open-Meteo API that retrieves Open-Meteo weather data in a tidy format. No API key is required. The API specification is located at <https://open-meteo.com/en/docs>.

Maintained by Tom Pisel. Last updated 1 years ago.

20 stars 4.93 score 86 scripts

bioc

Oscope:Oscope - A statistical pipeline for identifying oscillatory genes in unsynchronized single cell RNA-seq

Oscope is a statistical pipeline developed to identify and recover the base cycle profiles of oscillating genes in an unsynchronized single cell RNA-seq experiment. The Oscope pipeline includes three modules: a sine model module to search for candidate oscillator pairs; a K-medoids clustering module to cluster candidate oscillators into groups; and an extended nearest insertion module to recover the base cycle order for each oscillator group.

Maintained by Ning Leng. Last updated 5 months ago.

immunooncology statisticalmethod rnaseq sequencing geneexpression

4.92 score 14 scripts 1 dependents

lukeduttweiler

skipTrack:A Bayesian Hierarchical Model that Controls for Non-Adherence in Mobile Menstrual Cycle Tracking

Implements a Bayesian hierarchical model designed to identify skips in menstrual cycle self-tracking on mobile apps. Future developments will allow for the inclusion of covariates affecting cycle mean and regularity, as well as extra information regarding tracking non-adherence. Main methods to be outlined in a forthcoming paper, with alternative models from Li et al. (2022) <doi:10.1093/jamia/ocab182>.

Maintained by Luke Duttweiler. Last updated 2 months ago.

4.90 score 4 scripts

tspsyched

autoFC:Automatic Construction of Forced-Choice Tests

Forced-choice (FC) response has gained increasing popularity and interest for its resistance to faking when well-designed (Cao & Drasgow, 2019 <doi:10.1037/apl0000414>). To establish well-designed FC scales, typically each item within a block should measure a different trait and have a similar level of social desirability (Zhang et al., 2020 <doi:10.1177/1094428119836486>). A recent study also suggests the importance of high inter-item agreement of social desirability between items within a block (Pavlov et al., 2021 <doi:10.31234/osf.io/hmnrc>). In addition to this, FC developers may also need to maximize factor loading differences (Brown & Maydeu-Olivares, 2011 <doi:10.1177/0013164410375112>) or minimize item location differences (Cao & Drasgow, 2019 <doi:10.1037/apl0000414>) depending on scoring models. The decision of which items should be assigned to the same block, termed item pairing, is thus critical to the quality of an FC test. This pairing process is essentially an optimization process which is currently carried out manually. However, given that we often need to simultaneously meet multiple objectives, manual pairing becomes impractical or even infeasible once the number of latent traits and/or number of items per trait are relatively large. To address these problems, autoFC is developed as a practical tool for facilitating the automatic construction of FC tests (Li et al., 2022 <doi:10.1177/01466216211051726>), essentially exempting users from the burden of manual item pairing and reducing the computational costs and biases induced by simple ranking methods. Given characteristics of each item (and item responses), FC tests can be automatically constructed based on user-defined pairing criteria and weights as well as customized optimization behavior. Users can also construct parallel forms of the same test following the same pairing rules.

Maintained by Mengtong Li. Last updated 19 days ago.

4 stars 4.90 score 3 scripts

schlosslab

clustur:Clustering

A tool that implements the clustering algorithms from 'mothur' (Schloss PD et al. (2009) <doi:10.1128/AEM.01541-09>). 'clustur' makes use of the cluster() and make.shared() commands from 'mothur'. Our cluster() function has five different algorithms implemented: 'OptiClust', 'furthest', 'nearest', 'average', and 'weighted'. 'OptiClust' is an optimized clustering method for Operational Taxonomic Units; to learn more, see Westcott SL, Schloss PD (2017) <doi:10.1128/mspheredirect.00073-17>. The make.shared() command is always applied at the end of the clustering command. This functionality allows us to generate clustering and abundance data efficiently.

Maintained by Patrick Schloss. Last updated 4 months ago.

cpp

1 stars 4.85 score 7 scripts

mazamascience

MazamaSpatialPlots:Thematic Plots for Mazama Spatial Datasets

A suite of convenience functions for generating US state and county thematic maps using datasets from the MazamaSpatialUtils package.

Maintained by Jonathan Callahan. Last updated 2 months ago.

4.84 score 23 scripts

bioc

MLSeq:Machine Learning Interface for RNA-Seq Data

This package applies several machine learning methods, including SVM, bagSVM, Random Forest and CART to RNA-Seq data.

Maintained by Gokmen Zararsiz. Last updated 5 months ago.

immunooncology sequencing rnaseq classification clustering

4.81 score 27 scripts 1 dependents

mrcieu

gwasglue:GWAS summary data sources connected to analytical tools

Many tools exist that use GWAS summary data for colocalisation, fine mapping, Mendelian randomization, visualisation, etc. This package is a conduit that connects R packages that can retrieve GWAS summary data to various tools for analysing those data.

Maintained by Gibran Hemani. Last updated 3 years ago.

134 stars 4.79 score 91 scripts

mkearney

pkgverse:Build a Meta-Package Universe

Build your own universe of packages similar to the 'tidyverse' package <https://tidyverse.org/> with this meta-package creator. Create a package-verse, or meta package, by supplying a custom name for the collection of packages and the vector of desired package names to include, and optionally supply a destination directory, an indicator of whether to keep the created package directory, and/or a vector of verbs to implement via the 'usethis' <http://usethis.r-lib.org/> package.

Maintained by Michael Wayne Kearney. Last updated 6 years ago.

package-manager tidyverse

121 stars 4.78 score 6 scripts

ocbe-uio

BayesSurvive:Bayesian Survival Models for High-Dimensional Data

An implementation of Bayesian survival models with graph-structured selection priors for sparse identification of omics features predictive of survival (Madjar et al., 2021 <doi:10.1186/s12859-021-04483-z>) and its extension to use a fixed graph via a Markov Random Field (MRF) prior for capturing known structure of omics features, e.g. disease-specific pathways from the Kyoto Encyclopedia of Genes and Genomes database (Hermansen et al., 2025 <doi:10.48550/arXiv.2503.13078>).

Maintained by Zhi Zhao. Last updated 11 days ago.

bayesian-cox-models bayesian-variable-selection graph-learning high-dimensional-statistics omics-data-integration survival-analysis openblas cpp openmp

4.78 score 1 scripts

simulatr

simrel:Simulation of Multivariate Linear Model Data

Researchers have been using simulated data from a multivariate linear model to compare and evaluate different methods, ideas and models. Additionally, teachers and educators have been using a simulation tool to demonstrate and teach various statistical and machine learning concepts. This package helps users to simulate linear model data with a wide range of properties by tuning a few parameters such as relevant latent components. In addition, a shiny app as an 'RStudio' gadget gives users a simple interface for using the simulation function. See more on: Sæbø, S., Almøy, T., Helland, I.S. (2015) <doi:10.1016/j.chemolab.2015.05.012> and Rimal, R., Almøy, T., Sæbø, S. (2018) <doi:10.1016/j.chemolab.2018.02.009>.

Maintained by Raju Rimal. Last updated 2 years ago.

bivariate-simulation multivariate-simulation relevant-predictor-components simulated-data simulation univariate-simulation

3 stars 4.78 score 40 scripts

mkorvink

archetyper:An Archetype for Data Mining and Data Science Projects

A project template to support the data science workflow.

Maintained by Michael Korvink. Last updated 4 years ago.

6 stars 4.78 score 7 scripts

vgherard

sbo:Text Prediction via Stupid Back-Off N-Gram Models

Utilities for training and evaluating text predictors based on Stupid Back-Off N-gram models (Brants et al., 2007, <https://www.aclweb.org/anthology/D07-1090/>).

Maintained by Valerio Gherardi. Last updated 4 years ago.

natural-language-processing ngram-models predictive-text sbo cpp

10 stars 4.78 score 12 scripts

bioc

biodbChebi:biodbChebi, a library for connecting to the ChEBI Database

The biodbChebi library provides access to the ChEBI Database, using the biodb package framework. It allows entries to be retrieved by their accession number. Web services can be accessed for searching the database by name, mass or other fields.

Maintained by Pierrick Roger. Last updated 5 months ago.

software infrastructure dataimport

2 stars 4.78 score 3 scripts 1 dependents

bioc

mobileRNA:mobileRNA: Investigate the RNA mobilome & population-scale changes

Genomic analysis can be utilised to identify differences between RNA populations in two conditions, both in production and abundance. This includes the identification of RNAs produced by multiple genomes within a biological system. For example, RNA produced by pathogens within a host or mobile RNAs in plant graft systems. The mobileRNA package provides methods to pre-process, analyse and visualise the sRNA and mRNA populations based on the premise of mapping reads to all genotypes at the same time.

Maintained by Katie Jeynes-Cupper. Last updated 5 months ago.

visualization rnaseq sequencing smallrna genomeassembly clustering experimentaldesign qualitycontrol workflowstep alignment preprocessing bioinformatics plant-science

3 stars 4.78 score 2 scripts