R-universe search: batch

usepa

ctxR:Utilities for Interacting with the 'CTX' APIs

Access chemical, hazard, bioactivity, and exposure data from the Computational Toxicology and Exposure ('CTX') APIs <https://www.epa.gov/comptox-tools/computational-toxicology-and-exposure-apis>. 'ctxR' was developed to streamline the process of accessing the information available through the 'CTX' APIs without requiring prior knowledge of how to use APIs. Most data is also available on the CompTox Chemical Dashboard ('CCD') <https://comptox.epa.gov/dashboard/> and other resources found at the EPA Computational Toxicology and Exposure Online Resources <https://www.epa.gov/comptox-tools>.

Maintained by Paul Kruse. Last updated 2 months ago.

ccte comptox ord

76.0 match 10 stars 8.02 score 13 scripts 1 dependents

usepa

ccdR:Utilities for Interacting with the 'CTX' APIs

Access chemical, hazard, bioactivity, and exposure data from the Computational Toxicology and Exposure ('CTX') APIs <https://api-ccte.epa.gov/docs/>. 'ccdR' was developed to streamline the process of accessing the information available through the 'CTX' APIs without requiring prior knowledge of how to use APIs. Most data is also available on the CompTox Chemical Dashboard ('CCD') <https://comptox.epa.gov/dashboard/> and other resources found at the EPA Computational Toxicology and Exposure Online Resources <https://www.epa.gov/comptox-tools>.

Maintained by Paul Kruse. Last updated 8 months ago.

78.7 match 2 stars 6.38 score 7 scripts

pecanproject

PEcAn.assim.batch:PEcAn Functions Used for Ecological Forecasts and Reanalysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.

Maintained by Istem Fer. Last updated 4 days ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

44.7 match 216 stars 9.94 score 20 scripts 2 dependents

edubruell

tidyllm:Tidy Integration of Large Language Models

A tidy interface for integrating large language model (LLM) APIs such as 'Claude', 'Openai', 'Groq', 'Mistral' and local models via 'Ollama' into R workflows. The package supports text and media-based interactions, interactive message history, batch request APIs, and a tidy, pipeline-oriented interface for streamlined integration into data workflows. Web services are available at <https://www.anthropic.com>, <https://openai.com>, <https://groq.com>, <https://mistral.ai/> and <https://ollama.com>.

Maintained by Eduard Brüll. Last updated 6 days ago.

52.2 match 68 stars 7.82 score 26 scripts

wlandau

crew.aws.batch:A Crew Launcher Plugin for AWS Batch

In computationally demanding analysis projects, statisticians and data scientists asynchronously deploy long-running tasks to distributed systems, ranging from traditional clusters to cloud services. The 'crew.aws.batch' package extends the 'mirai'-powered 'crew' package with a worker launcher plugin for AWS Batch. Inspiration also comes from packages 'mirai' by Gao (2023) <https://github.com/shikokuchuo/mirai>, 'future' by Bengtsson (2021) <doi:10.32614/RJ-2021-048>, 'rrq' by FitzJohn and Ashton (2023) <https://github.com/mrc-ide/rrq>, 'clustermq' by Schubert (2019) <doi:10.1093/bioinformatics/btz284>, and 'batchtools' by Lang, Bischl, and Surmann (2017) <doi:10.21105/joss.00135>.

Maintained by William Michael Landau. Last updated 1 month ago.

aws-batch crew high-performance-computing

62.2 match 15 stars 4.99 score 6 scripts

t-kalinowski

keras:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.

Maintained by Tomasz Kalinowski. Last updated 11 months ago.

27.2 match 10.93 score 10k scripts 55 dependents

bioc

batchelor:Single-Cell Batch Correction Methods

Implements a variety of methods for batch correction of single-cell (RNA sequencing) data. This includes methods based on detecting mutually nearest neighbors, as well as several efficient variants of linear regression of the log-expression values. Functions are also provided to perform global rescaling to remove differences in depth between batches, and to perform a principal components analysis that is robust to differences in the numbers of cells across batches.

Maintained by Aaron Lun. Last updated 5 days ago.

sequencing rnaseq software geneexpression transcriptomics singlecell batcheffect normalization cpp

31.6 match 9.10 score 1.2k scripts 10 dependents
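
The mutual-nearest-neighbors idea mentioned in the batchelor description can be illustrated with a short sketch. This is a plain-Python illustration of the pair-detection step only, under the assumption that cells are compared by Euclidean distance; it is not batchelor's R API or its implementation, and the function names (`nearest`, `mutual_nearest_neighbors`) are hypothetical.

```python
import math


def nearest(query, reference, k):
    """Indices of the k points in `reference` closest to `query` (Euclidean)."""
    order = sorted(range(len(reference)),
                   key=lambda j: math.dist(query, reference[j]))
    return set(order[:k])


def mutual_nearest_neighbors(batch_a, batch_b, k=1):
    """Return sorted (i, j) pairs where cell i of batch A and cell j of
    batch B are each among the other's k nearest neighbours in the
    opposite batch."""
    a_to_b = [nearest(cell, batch_b, k) for cell in batch_a]
    b_to_a = [nearest(cell, batch_a, k) for cell in batch_b]
    return sorted(
        (i, j)
        for i, neighbours in enumerate(a_to_b)
        for j in neighbours
        if i in b_to_a[j]
    )


# Two small batches in a 2-D expression space (made-up numbers).
batch_a = [(0.0, 0.0), (10.0, 10.0)]
batch_b = [(0.5, 0.0), (10.0, 11.0), (4.0, 4.0)]
pairs = mutual_nearest_neighbors(batch_a, batch_b, k=1)
```

Here the third cell of batch B has a nearest neighbour in batch A, but no cell of A picks it in return, so it forms no pair. A correction method would then use the matched pairs to estimate the batch shift; that step is not shown.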

bioc

singleCellTK:Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data

The Single Cell Toolkit (SCTK) in the singleCellTK package provides an interface to popular tools for importing, quality control, analysis, and visualization of single cell RNA-seq data. SCTK allows users to seamlessly integrate tools from various packages at different stages of the analysis workflow. A general "a la carte" workflow gives users access to multiple methods for data importing, calculation of general QC metrics, doublet detection, ambient RNA estimation and removal, filtering, normalization, batch correction or integration, dimensionality reduction, 2-D embedding, clustering, marker detection, differential expression, cell type labeling, pathway analysis, and data exporting. Curated workflows can be used to run Seurat and Celda. Streamlined quality control can be performed on the command line using the SCTK-QC pipeline. Users can analyze their data using commands in the R console or by using an interactive Shiny Graphical User Interface (GUI). Specific analyses or entire workflows can be summarized and shared with comprehensive HTML reports generated by Rmarkdown. Additional documentation and vignettes can be found at camplab.net/sctk.

Maintained by Joshua David Campbell. Last updated 25 days ago.

singlecell geneexpression differentialexpression alignment clustering immunooncology batcheffect normalization qualitycontrol dataimport gui

25.6 match 181 stars 10.16 score 252 scripts

cran

biogas:Process Biogas Data and Predict Biogas Production

High- and low-level functions for processing biogas data and predicting biogas production. Molar mass and calculated oxygen demand ('COD') can be determined from a chemical formula. Measured gas volume can be corrected for water vapor and to (possibly user-defined) standard temperature and pressure. Gas quantity can be converted between volume, mass, and moles. Gas composition, cumulative production, or other variables can be interpolated to a specified time. Cumulative biogas and methane production (and rates) can be calculated from raw data obtained using volumetric, manometric, gravimetric, or gas density methods for any number of bottles. With cumulative methane production data and data on bottle contents, biochemical methane potential (BMP) or specific methane production (SMP) can be calculated and summarized, including subtraction of the inoculum contribution and normalization by substrate mass. Cumulative production and production rates can be summarized in several different ways (e.g., omitting normalization) using the same function. Biogas quantity and composition can be predicted from substrate composition and additional, optional data. Inoculum and substrate mass can be determined for planning BMP experiments. Finally, first-order models can be fit to measurements in order to extract estimates of ultimate yield and kinetic constants.

Maintained by Sasha D. Hafner. Last updated 3 months ago.

66.5 match 3 stars 3.78 score

bioc

BatchQC:Batch Effects Quality Control Software

Sequencing and microarray samples often are collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in the downstream analysis. BatchQC is a software tool that streamlines batch preprocessing and evaluation by providing interactive diagnostics, visualizations, and statistical analyses to explore the extent to which batch variation impacts the data. BatchQC diagnostics help determine whether batch adjustment needs to be done, and how correction should be applied before proceeding with a downstream analysis. Moreover, BatchQC interactively applies multiple common batch effect approaches to the data and the user can quickly see the benefits of each method. BatchQC is developed as a Shiny App. The output is organized into multiple tabs and each tab features an important part of the batch effect analysis and visualization of the data. The BatchQC interface has the following analysis groups: Summary, Differential Expression, Median Correlations, Heatmaps, Circular Dendrogram, PCA Analysis, Shape, ComBat and SVA.

Maintained by Jessica McClintock. Last updated 5 months ago.

batcheffect graphandnetwork microarray normalization principalcomponent sequencing software visualization qualitycontrol rnaseq preprocessing differentialexpression immunooncology

26.1 match 7 stars 8.96 score 54 scripts

mlampros

ClusterR:Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans, K-Medoids and Affinity Propagation Clustering

Gaussian mixture models, k-means, mini-batch-kmeans, k-medoids and affinity propagation clustering with the option to plot, validate, predict (new data) and estimate the optimal number of clusters. The package takes advantage of 'RcppArmadillo' to speed up the computationally intensive parts of the functions. For more information, see (i) "Clustering in an Object-Oriented Environment" by Anja Struyf, Mia Hubert, Peter Rousseeuw (1997), Journal of Statistical Software, <doi:10.18637/jss.v001.i04>; (ii) "Web-scale k-means clustering" by D. Sculley (2010), ACM Digital Library, <doi:10.1145/1772690.1772862>; (iii) "Armadillo: a template-based C++ library for linear algebra" by Sanderson et al (2016), The Journal of Open Source Software, <doi:10.21105/joss.00026>; (iv) "Clustering by Passing Messages Between Data Points" by Brendan J. Frey and Delbert Dueck, Science 16 Feb 2007: Vol. 315, Issue 5814, pp. 972-976, <doi:10.1126/science.1136800>.

Maintained by Lampros Mouselimis. Last updated 9 months ago.

affinity-propagation cpp11 gmm kmeans kmedoids-clustering mini-batch-kmeans rcpparmadillo openblas cpp openmp

18.7 match 84 stars 11.08 score 640 scripts 24 dependents
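
Mini-batch k-means, one of the methods named in the ClusterR entry, is described in the cited "Web-scale k-means clustering" paper. The update rule can be sketched as follows. This is a plain-Python illustration with a hypothetical function name and parameters, not ClusterR's R interface or its 'RcppArmadillo' implementation; initial centers are simply drawn at random from the data, which is an assumption of this sketch.

```python
import math
import random


def mini_batch_kmeans(points, k, batch_size, iterations, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # how many points each center has absorbed so far
    for _ in range(iterations):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole mini-batch first, using the centers as they stand.
        assigned = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in batch
        ]
        # Then move each center toward its points with a per-center
        # learning rate of 1 / (number of points seen by that center).
        for p, c in zip(batch, assigned):
            counts[c] += 1
            eta = 1.0 / counts[c]
            centers[c] = [(1 - eta) * m + eta * x
                          for m, x in zip(centers[c], p)]
    return centers


# Made-up data: two loose groups of points.
pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
       (10.0, 10.0), (10.2, 9.9), (9.8, 10.1)]
centers = mini_batch_kmeans(pts, k=2, batch_size=4, iterations=50, seed=1)
```

Because each update is a weighted average of the old center and a data point, every center stays inside the range spanned by the data. Only a small random sample is touched per iteration, which is what makes the method attractive for large data sets.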

lme4

lme4:Linear Mixed-Effects Models using 'Eigen' and S4

Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the 'Eigen' C++ library for numerical linear algebra and 'RcppEigen' "glue".

Maintained by Ben Bolker. Last updated 5 days ago.

cpp

9.8 match 647 stars 20.69 score 35k scripts 1.5k dependents

dylanpieper

hellmer:Batch Processing for Chat Models

Batch processing framework for 'ellmer' chat model interactions. Enables sequential and parallel processing of chat completions. Core capabilities include error handling with backoff, state persistence, progress tracking, and retry management. Parallel processing is implemented via the 'future' framework. Additional features include structured data extraction, tool integration, timeout handling, verbosity control, and sound notifications. Includes methods for returning chat texts, chat objects, progress status, and structured data.

Maintained by Dylan Pieper. Last updated 4 days ago.

batch batch-processing ellmer llm

36.3 match 6 stars 5.18 score
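
"Error handling with backoff" and "retry management", as listed in the hellmer description, usually mean retrying a failed request after progressively longer waits. A minimal sketch of that general pattern follows. The function name and parameters are hypothetical and do not reflect hellmer's actual R interface; exponential growth of the delay is an assumption of this sketch.

```python
import time


def call_with_backoff(fun, max_retries=3, initial_delay=1.0, factor=2.0,
                      sleep=time.sleep):
    """Call `fun()`; on failure wait, then retry with a longer delay.

    The delay is multiplied by `factor` after every failed attempt. The last
    error is re-raised once `max_retries` retries have been used up.
    """
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return fun()
        except Exception:
            if attempt == max_retries:
                raise
            sleep(delay)
            delay *= factor


# Usage: a stand-in for a chat request that fails twice, then succeeds.
attempts = {"n": 0}
waits = []


def flaky_request():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


result = call_with_backoff(flaky_request, sleep=waits.append)
```

Passing `sleep=waits.append` records the delays instead of actually waiting, which keeps the example fast; in real use the default `time.sleep` applies.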

llrs

experDesign:Design Experiments for Batches

Distributes samples in batches while making batches homogeneous according to their description. Allows for an arbitrary number of variables, both numeric and categorical. For quality control it provides functions to subset a representative sample.

Maintained by Lluís Revilla Sancho. Last updated 3 months ago.

batch experiment-design

31.1 match 10 stars 5.54 score 1 script

mlr-org

bbotk:Black-Box Optimization Toolkit

Features highly configurable search spaces via the 'paradox' package and optimizes every user-defined objective function. The package includes several optimization algorithms e.g. Random Search, Iterated Racing, Bayesian Optimization (in 'mlr3mbo') and Hyperband (in 'mlr3hyperband'). bbotk is the base package of 'mlr3tuning', 'mlr3fselect' and 'miesmuschel'.

Maintained by Marc Becker. Last updated 3 months ago.

bbotk black-box-optimization data-science hyperparameter-optimization hyperparameter-tuning machine-learning mlr3 optimization

16.9 match 22 stars 9.83 score 166 scripts 14 dependents

bioc

BERT:High Performance Data Integration for Large-Scale Analyses of Incomplete Omic Profiles Using Batch-Effect Reduction Trees (BERT)

Provides efficient batch-effect adjustment of data with missing values. BERT orders all batch effect correction to a tree of pairwise computations. BERT allows parallelization over sub-trees.

Maintained by Yannis Schumann. Last updated 2 months ago.

batcheffect preprocessing experimentaldesign qualitycontrol batch-effect bioconductor-package bioinformatics data-integration data-science

29.4 match 2 stars 5.40 score 18 scripts

poissonconsulting

batchr:Batch Process Files

Processes multiple files with a user-supplied function. The key design principle is that only files which were last modified before the directory was configured are processed. A hidden file stores the configuration time, function, etc., while successfully processed files are automatically touched to update their modification date. As a result, batch processing can be stopped and restarted, and any files created (or modified or deleted) during processing are ignored.

Maintained by Joe Thorley. Last updated 2 months ago.

batch-processing

31.8 match 6 stars 4.56 score 8 scripts
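
The design principle in the batchr description (process only files last modified before the configuration time, then touch each successfully processed file) can be sketched as follows. This is a plain-Python illustration, not batchr's R API: the function names and the hidden file name `.batch_config.json` are hypothetical, and the sketch assumes that modification times alone decide whether a file is still pending.

```python
import json
import os
import time
from pathlib import Path

CONFIG_NAME = ".batch_config.json"  # hypothetical hidden file name


def configure(directory, now=None):
    """Record the configuration time in a hidden file inside `directory`."""
    config_time = time.time() if now is None else now
    Path(directory, CONFIG_NAME).write_text(json.dumps({"time": config_time}))
    return config_time


def process_pending(directory, fun):
    """Apply `fun` to every file last modified before the configuration time.

    Files processed successfully are touched, so their modification time
    moves past the configuration time and a restarted run skips them.
    """
    config_time = json.loads(Path(directory, CONFIG_NAME).read_text())["time"]
    processed = []
    for path in sorted(Path(directory).iterdir()):
        if path.name == CONFIG_NAME or not path.is_file():
            continue
        if path.stat().st_mtime >= config_time:
            continue  # already processed, or created/modified afterwards
        if fun(path):  # `fun` returns True on success
            os.utime(path, None)  # touch: set the modification time to now
            processed.append(path.name)
    return processed
```

Calling `process_pending()` a second time on the same directory finds nothing left to do, which is what allows a run to be stopped and restarted. A file for which `fun` returns `False` keeps its old modification time and is picked up again on the next run.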

rstudio

keras3:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.

Maintained by Tomasz Kalinowski. Last updated 1 day ago.

10.5 match 845 stars 13.60 score 264 scripts 2 dependents

mllg

batchtools:Tools for Computation on Batch Systems

As a successor of the packages 'BatchJobs' and 'BatchExperiments', this package provides a parallel implementation of the Map function for high performance computing systems managed by schedulers 'IBM Spectrum LSF' (<https://www.ibm.com/products/hpc-workload-management>), 'OpenLava' (<https://www.openlava.org/>), 'Univa Grid Engine'/'Oracle Grid Engine' (<https://www.univa.com/>), 'Slurm' (<https://slurm.schedmd.com/>), 'TORQUE/PBS' (<https://adaptivecomputing.com/cherry-services/torque-resource-manager/>), or 'Docker Swarm' (<https://docs.docker.com/engine/swarm/>). Multicore and socket modes allow parallelization on a local machine, and multiple machines can be hooked up via SSH to create a makeshift cluster. Moreover, the package provides an abstraction mechanism to define large-scale computer experiments in a well-organized and reproducible way.

Maintained by Michel Lang. Last updated 2 years ago.

batchexperiments batchjobs docker-swarm high-performance-computing hpc hpc-clusters lsf openlava parallel-computing reproducibility sge slurm torque

12.3 match 175 stars 11.39 score 772 scripts 14 dependents

markedmondson1234

googleAuthR:Authenticate and Create Google APIs

Create R functions that interact with OAuth2 Google APIs <https://developers.google.com/apis-explorer/> easily, with auto-refresh and Shiny compatibility.

Maintained by Erik Grönroos. Last updated 10 months ago.

api authentication google googleauthr oauth2-flow shiny

10.8 match 178 stars 12.84 score 804 scripts 13 dependents

bioc

CDI:Clustering Deviation Index (CDI)

Single-cell RNA-sequencing (scRNA-seq) is widely used to explore cellular variation. The analysis of scRNA-seq data often starts from clustering cells into subpopulations. This initial step has a high impact on downstream analyses, and hence it is important to be accurate. However, there has not been an unsupervised metric designed for scRNA-seq to evaluate clustering performance. Hence, we propose clustering deviation index (CDI), an unsupervised metric based on the modeling of scRNA-seq UMI counts to evaluate clustering of cells.

Maintained by Jiyuan Fang. Last updated 5 months ago.

singlecell software clustering visualization sequencing rnaseq cellbasedassays

27.6 match 5 stars 5.00 score 4 scripts

bedapub

designit:Blocking and Randomization for Experimental Design

Intelligently assign samples to batches in order to reduce batch effects. Batch effects can have a significant impact on data analysis, especially when the assignment of samples to batches coincides with the contrast groups being studied. By defining a batch container and a scoring function that reflects the contrasts, this package allows users to assign samples in a way that minimizes the potential impact of batch effects on the comparison of interest. Among other functionality, we provide an implementation for OSAT score by Yan et al. (2012, <doi:10.1186/1471-2164-13-689>).

Maintained by Iakov I. Davydov. Last updated 4 months ago.

design-of-experiments randomization

18.3 match 8 stars 7.28 score 24 scripts

bioc

BEclear:Correction of batch effects in DNA methylation data

Provides functions to detect and correct for batch effects in DNA methylation data. The core function is based on latent factor models and can also be used to predict missing values in any other matrix containing real numbers.

Maintained by Livia Rasp. Last updated 5 months ago.

batcheffect dnamethylation software preprocessing statisticalmethod batch-effects bioconductor-package dna-methylation latent-factor-model methylation missing-data missing-values stochastic-gradient-descent cpp

22.4 match 4 stars 5.90 score 11 scripts

mlr-org

mlr3tuning:Hyperparameter Optimization for 'mlr3'

Hyperparameter optimization package of the 'mlr3' ecosystem. It features highly configurable search spaces via the 'paradox' package and finds optimal hyperparameter configurations for any 'mlr3' learner. 'mlr3tuning' works with several optimization algorithms e.g. Random Search, Iterated Racing, Bayesian Optimization (in 'mlr3mbo') and Hyperband (in 'mlr3hyperband'). Moreover, it can automatically optimize learners and estimate the performance of optimized models with nested resampling.

Maintained by Marc Becker. Last updated 3 months ago.

bbotk hyperparameter-optimization hyperparameter-tuning machine-learning mlr3 optimization tune tuning

11.1 match 55 stars 11.53 score 384 scripts 11 dependents

shixiangwang

ezcox:Easily Process a Batch of Cox Models

A tool to operate a batch of univariate or multivariate Cox models and return tidy result.

Maintained by Shixiang Wang. Last updated 1 year ago.

batch-processing cox-model

17.5 match 21 stars 7.22 score 44 scripts 1 dependents

gitdemont

IFC:Tools for Imaging Flow Cytometry

Contains several tools to treat imaging flow cytometry data from 'ImageStream®' and 'FlowSight®' cytometers ('Amnis®' 'Cytek®'). Provides an easy and simple way to read and write .fcs, .rif, .cif and .daf files. Information such as masks, features, regions and populations set within these files can be retrieved for each single cell. In addition, raw data such as stored images can also be accessed. Users may increase their productivity thanks to dedicated functions to extract, visualize, manipulate and export 'IFC' data. A toy data example can be installed through the 'IFCdata' package of approximately 32 MB, which is available in a 'drat' repository <https://gitdemont.github.io/IFCdata/>. See file 'COPYRIGHTS' and file 'AUTHORS' for a list of copyright holders and authors.

Maintained by Yohann Demont. Last updated 11 days ago.

cytometry cytometry-data flow flow-cytometry flow-cytometry-analysis flow-cytometry-data flow-cytometry-files ifc image imaging-flow-cytometry imaging-flow-cytometry-data microscopy cpp

22.1 match 4 stars 5.34 score 12 scripts

stcolema

batchmix:Semi-Supervised Bayesian Mixture Models Incorporating Batch Correction

Semi-supervised and unsupervised Bayesian mixture models that simultaneously infer the cluster/class structure and a batch correction. Densities available are the multivariate normal and the multivariate t. The model sampler is implemented in C++. This package is aimed at analysis of low-dimensional data generated across several batches. See Coleman et al. (2022) <doi:10.1101/2022.01.14.476352> for details of the model.

Maintained by Stephen Coleman. Last updated 10 months ago.

openblas cpp openmp

27.7 match 4.00 score 3 scripts

bioc

PLSDAbatch:PLSDA-batch

A novel framework to correct for batch effects prior to any downstream analysis in microbiome data based on Projection to Latent Structures Discriminant Analysis. The main method is named “PLSDA-batch”. It first estimates treatment and batch variation with latent components, then subtracts batch-associated components from the data whilst preserving biological variation of interest. PLSDA-batch is highly suitable for microbiome data as it is non-parametric, multivariate and allows for ordination and data visualisation. Combined with centered log-ratio transformation for addressing uneven library sizes and compositional structure, PLSDA-batch addresses all characteristics of microbiome data that existing correction methods have ignored so far. Two other variants are proposed for 1/ unbalanced batch x treatment designs that are commonly encountered in studies with small sample sizes, and for 2/ selection of discriminative variables amongst treatment groups to avoid overfitting in classification problems. These two variants have widened the scope of applicability of PLSDA-batch to different data settings.

Maintained by Yiwen (Eva) Wang. Last updated 5 months ago.

statisticalmethod dimensionreduction principalcomponent classification microbiome batcheffect normalization visualization

19.9 match 13 stars 5.37 score 18 scripts

bioc

debrowser:Interactive Differential Expression Analysis Browser

Bioinformatics platform containing interactive plots and tables for differential gene and region expression studies. Allows visualizing expression data much more deeply in an interactive and faster way. By changing the parameters, users can easily discover different parts of the data like never before. Manually creating and looking at these plots takes time. With DEBrowser, users can prepare plots without writing any code. Differential expression, PCA and clustering analysis are made on site and the results are shown in various plots such as scatter, bar, box, volcano, MA plots and heatmaps.

Maintained by Alper Kucukural. Last updated 5 months ago.

sequencing chipseq rnaseq differentialexpression geneexpression clustering immunooncology

13.2 match 61 stars 7.80 score 65 scripts

stopsack

batchtma:Batch Effect Adjustments

Different adjustment methods for batch effects in biomarker data, such as from tissue microarrays. Some methods attempt to retain differences between batches that may be due to between-batch differences in "biological" factors that influence biomarker values.

Maintained by Konrad Stopsack. Last updated 9 months ago.

batch-effects measurement-error tissue-microarray-analysis

27.2 match 1 stars 3.70 score 3 scripts

ropensci

tarchetypes:Archetypes for Targets

Function-oriented Make-like declarative pipelines for statistics and data science are supported in the 'targets' R package. As an extension to 'targets', the 'tarchetypes' package provides convenient user-side functions to make 'targets' easier to use. By establishing reusable archetypes for common kinds of targets and pipelines, these functions help express complicated reproducible pipelines concisely and compactly. The methods in this package were influenced by the 'targets' R package by Will Landau (2018) <doi:10.21105/joss.00550>.

Maintained by William Michael Landau. Last updated 22 days ago.

data-science high-performance-computing peer-reviewed pipeline r-targetopia reproducibility targets workflow

8.7 match 141 stars 11.43 score 1.7k scripts 10 dependents

gojiplus

captr:Client for the Captricity API

Get text from images of text using the Captricity Optical Character Recognition (OCR) API. Captricity allows you to get text from handwritten forms (think surveys) and other structured paper documents. It can output data in the form of a delimited file, keeping field information intact. For more information, read <https://shreddr.captricity.com/developer/overview/>.

Maintained by Gaurav Sood. Last updated 7 years ago.

captricity captricity-api ocr

18.8 match 14 stars 5.29 score 28 scripts

rstudio

tfdatasets:Interface to 'TensorFlow' Datasets

Interface to 'TensorFlow' Datasets, a high-level library for building complex input pipelines from simple, re-usable pieces. See <https://www.tensorflow.org/guide> for additional details.

Maintained by Tomasz Kalinowski. Last updated 6 days ago.

10.3 match 34 stars 9.32 score 656 scripts 3 dependents

cran

batch:Batching Routines in Parallel and Passing Command-Line Arguments to R

Functions to allow you to easily pass command-line arguments into R, and functions to aid in submitting your R code in parallel on a cluster and joining the results afterward (e.g. multiple parameter values for simulations running in parallel, splitting up a permutation test in parallel, etc.). See 'parseCommandArgs(...)' for the main example of how to use this package.

Maintained by Thomas Hoffmann. Last updated 7 years ago.

59.8 match 1.60 score

romanhornung

bapred:Batch Effect Removal and Addon Normalization (in Phenotype Prediction using Gene Data)

Various tools dealing with batch effects, in particular enabling the removal of discrepancies between training and test sets in prediction scenarios. Moreover, addon quantile normalization and addon RMA normalization (Kostka & Spang, 2008) are implemented to enable integrating the quantile normalization step into prediction rules. The following batch effect removal methods are implemented: FAbatch, ComBat, (f)SVA, mean-centering, standardization, Ratio-A and Ratio-G. For each of these we provide an additional function which enables a posteriori ('addon') batch effect removal in independent batches ('test data'). Here, the (already batch effect adjusted) training data is not altered. For evaluating the success of batch effect adjustment, several metrics are provided. Moreover, the package implements a plot for the visualization of batch effects using principal component analysis. The main functions of the package for batch effect adjustment are ba() and baaddon(), which enable batch effect removal and addon batch effect removal, respectively, with one of the seven methods mentioned above. Another important function here is bametric(), which is a wrapper function for all implemented methods for evaluating the success of batch effect removal. For (addon) quantile normalization and (addon) RMA normalization the functions qunormtrain(), qunormaddon(), rmatrain() and rmaaddon() can be used.

Maintained by Roman Hornung. Last updated 3 years ago.

52.8 match 1.78 score 20 scripts
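
Of the batch effect removal methods listed in the bapred entry, mean-centering is the simplest to illustrate: each batch's own mean is subtracted from its values, so that all batches end up centered at zero. The sketch below shows this for a single feature. It is a generic plain-Python illustration with a hypothetical function name, not the implementation behind ba() or baaddon().

```python
from collections import defaultdict


def mean_center_by_batch(values, batches):
    """Subtract each batch's own mean from its values (one feature).

    `values[i]` is the measurement for sample i and `batches[i]` its batch
    label. Returns the adjusted values in the original sample order.
    """
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for v, b in zip(values, batches):
        totals[b] += v
        sizes[b] += 1
    return [v - totals[b] / sizes[b] for v, b in zip(values, batches)]


# Made-up data: batch "b" is shifted upward by 10 relative to batch "a".
values = [1, 2, 3, 11, 12, 13]
batches = ["a", "a", "a", "b", "b", "b"]
adjusted = mean_center_by_batch(values, batches)
# adjusted == [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

After adjustment the constant offset between the two batches is gone while the within-batch differences are unchanged. Note that this also removes any genuine difference in batch means, which is why the choice of method matters when batches differ biologically.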

tudo-r

BatchJobs:Batch Computing with R

Provides Map, Reduce and Filter variants to generate jobs on batch computing systems like PBS/Torque, LSF, SLURM and Sun Grid Engine. Multicore and SSH systems are also supported. For further details see the project web page.

Maintained by Bernd Bischl. Last updated 3 years ago.

10.9 match 85 stars 8.57 score 616 scripts 3 dependents

tiledb-inc

tiledb:Modern Database Engine for Complex Data Based on Multi-Dimensional Arrays

The modern database 'TileDB' introduces a powerful on-disk format for storing and accessing any complex data based on multi-dimensional arrays. It supports dense and sparse arrays, dataframes and key-values stores, cloud storage ('S3', 'GCS', 'Azure'), chunked arrays, multiple compression, encryption and checksum filters, uses a fully multi-threaded implementation, supports parallel I/O, data versioning ('time travel'), metadata and groups. It is implemented as an embeddable cross-platform C++ library with APIs from several languages, and integrations. This package provides the R support.

Maintained by Isaiah Norton. Last updated 6 days ago.

array hdfs s3 storage-manager tiledb cpp

7.8 match 107 stars 11.96 score 306 scripts 4 dependents

bioc

sevenbridges:Seven Bridges Platform API Client and Common Workflow Language Tool Builder in R

R client and utilities for Seven Bridges platform API, from Cancer Genomics Cloud to other Seven Bridges supported platforms.

Maintained by Phil Webster. Last updated 5 months ago.

software dataimport thirdpartyclient api-client bioconductor bioinformatics cloud common-workflow-language sevenbridges

11.5 match 35 stars 7.40 score 24 scripts

r-simmer

simmer:Discrete-Event Simulation for R

A process-oriented and trajectory-based Discrete-Event Simulation (DES) package for R. It is designed as a generic yet powerful framework. The architecture encloses a robust and fast simulation core written in 'C++' with automatic monitoring capabilities. It provides a rich and flexible R API that revolves around the concept of trajectory, a common path in the simulation model for entities of the same type. Documentation about 'simmer' is provided by several vignettes included in this package, via the paper by Ucar, Smeets & Azcorra (2019, <doi:10.18637/jss.v090.i02>), and the paper by Ucar, Hernández, Serrano & Azcorra (2018, <doi:10.1109/MCOM.2018.1700960>); see 'citation("simmer")' for details.

Maintained by Iñaki Ucar. Last updated 6 months ago.

discrete-event simulation cpp

7.3 match 223 stars 11.47 score 440 scripts 6 dependents

stevenmmortimer

salesforcer:An Implementation of 'Salesforce' APIs Using Tidy Principles

Functions connecting to the 'Salesforce' Platform APIs (REST, SOAP, Bulk 1.0, Bulk 2.0, Metadata, Reports and Dashboards) <https://trailhead.salesforce.com/content/learn/modules/api_basics/api_basics_overview>. "API" is an acronym for "application programming interface". Almost all calls from these APIs are supported as they use CSV, XML or JSON data that can be parsed into R data structures. For more information, documentation, and examples, please see the 'Salesforce' API documentation and this package's website <https://stevenmmortimer.github.io/salesforcer/>.

Maintained by Steven M. Mortimer. Last updated 4 months ago.

api-wrappers r-language r-programming salesforce salesforce-apis

8.8 match 82 stars 9.27 score 191 scripts

zheng206

ComBatFamQC:Comprehensive Batch Effect Diagnostics and Harmonization

Provides a comprehensive framework for batch effect diagnostics, harmonization, and post-harmonization downstream analysis. Features include interactive visualization tools, robust statistical tests, and a range of harmonization techniques. Additionally, 'ComBatFamQC' enables the creation of life-span age trend plots with estimated age-adjusted centiles and facilitates the generation of covariate-corrected residuals for analytical purposes. Methods for harmonization are based on approaches described in Johnson et al., (2007) <doi:10.1093/biostatistics/kxj037>, Beer et al., (2020) <doi:10.1016/j.neuroimage.2020.117129>, Pomponio et al., (2020) <doi:10.1016/j.neuroimage.2019.116450>, and Chen et al., (2021) <doi:10.1002/hbm.25688>.

Maintained by Zheng Ren. Last updated 1 day ago.

diagnostic-tool harmonization rshinyapp

14.9 match 2 stars 5.41 score 16 scripts

bioc

standR:Spatial transcriptome analyses of Nanostring's DSP data in R

standR is a user-friendly R package providing functions to assist in conducting good-practice analysis of Nanostring's GeoMX DSP data. All functions in the package are built based on the SpatialExperiment object, allowing integration into various spatial transcriptomics-related packages from Bioconductor. standR allows data inspection, quality control, normalization, batch correction and evaluation with informative visualizations.

Maintained by Ning Liu. Last updated 1 month ago.

spatial transcriptomics geneexpression differentialexpression qualitycontrol normalization experimenthubsoftware

10.8 match 18 stars 7.39 score 45 scripts

davidcsterratt

retistruct:Retinal Reconstruction Program

Reconstructs retinae by morphing a flat surface with cuts (a dissected flat-mount retina) onto a curvilinear surface (the standard retinal shape). It can estimate the position of a point on the intact adult retina to within 8 degrees of arc (3.6% of nasotemporal axis). The coordinates in reconstructed retinae can be transformed to visuotopic coordinates. For more details see Sterratt, D. C., Lyngholm, D., Willshaw, D. J. and Thompson, I. D. (2013) <doi:10.1371/journal.pcbi.1002921>.

Maintained by David C. Sterratt. Last updated 11 days ago.

17.4 match 8 stars 4.60 score

azure

AzureGraph:Simple Interface to 'Microsoft Graph'

A simple interface to the 'Microsoft Graph' API <https://learn.microsoft.com/en-us/graph/overview>. 'Graph' is a comprehensive framework for accessing data in various online Microsoft services. This package was originally intended to provide an R interface only to the 'Azure Active Directory' part, with a view to supporting interoperability of R and 'Azure': users, groups, registered apps and service principals. However it has since been expanded into a more general tool for interacting with Graph. Part of the 'AzureR' family of packages.

Maintained by Hong Ooi. Last updated 2 years ago.

azure-active-directory-graph-api azure-sdk-r microsoft-graph-api

7.5 match 32 stars 10.30 score 36 scripts 21 dependents

bioc

MBECS:Evaluation and correction of batch effects in microbiome data-sets

The Microbiome Batch Effect Correction Suite (MBECS) provides a set of functions to evaluate and mitigate unwanted noise due to processing in batches. To that end it incorporates a host of batch correcting algorithms (BECA) from various packages. In addition it offers a correction and reporting pipeline that provides a preliminary look at the characteristics of a data-set before and after correcting for batch effects.

Maintained by Michael Olbrich. Last updated 5 months ago.

batcheffect microbiome reportwriting visualization normalization qualitycontrol

16.6 match 4 stars 4.60 score 4 scripts

bioc

mbkmeans:Mini-batch K-means Clustering for Single-Cell RNA-seq

Implements the mini-batch k-means algorithm for large datasets, including support for on-disk data representation.

Maintained by Davide Risso. Last updated 5 months ago.

clustering geneexpression rnaseq software transcriptomics sequencing singlecell human-cell-atlas cpp

10.0 match 10 stars 7.41 score 54 scripts 2 dependents

martinloza

Canek:Batch Correction of Single Cell Transcriptome Data

Non-linear/linear hybrid method for batch-effect correction that uses Mutual Nearest Neighbors (MNNs) to identify similar cells between datasets. Reference: Loza M. et al. (NAR Genomics and Bioinformatics, 2020) <doi:10.1093/nargab/lqac022>.

Maintained by Martin Loza. Last updated 1 year ago.

batch-effects bioinformatics single-cell-rna-seq transcriptomics

14.3 match 5 stars 5.06 score 23 scripts

crunch-io

crunch:Crunch.io Data Tools

The Crunch.io service <https://crunch.io/> provides a cloud-based data store and analytic engine, as well as an intuitive web interface. Using this package, analysts can interact with and manipulate Crunch datasets from within R. Importantly, this allows technical researchers to collaborate naturally with team members, managers, and clients who prefer a point-and-click interface.

Maintained by Greg Freedman Ellis. Last updated 12 days ago.

6.9 match 9 stars 10.53 score 200 scripts 2 dependents

yaoxiangli

cmmr:CEU Mass Mediator RESTful API

CEU (CEU San Pablo University) Mass Mediator is an on-line tool for aiding researchers in performing metabolite annotation. 'cmmr' (CEU Mass Mediator RESTful API) allows for programmatic access in R: batch search, batch advanced search, MS/MS (tandem mass spectrometry) search, etc. For more information about the API Endpoint please go to <https://github.com/YaoxiangLi/cmmr>.

Maintained by Yaoxiang Li. Last updated 5 months ago.

batch-search ceu-mass-mediator metablomics ms-search

15.1 match 15 stars 4.73 score 12 scripts

ouhscbbmc

REDCapR:Interaction Between R and REDCap

Encapsulates functions to streamline calls from R to the REDCap API. REDCap (Research Electronic Data CAPture) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The Application Programming Interface (API) offers an avenue to access and modify data programmatically, improving the capacity for literate and reproducible programming.

Maintained by Will Beasley. Last updated 2 months ago.

redcap redcap-api

5.7 match 118 stars 12.36 score 438 scripts 6 dependents

bioc

RPA:RPA: Robust Probabilistic Averaging for probe-level analysis

Probabilistic analysis of probe reliability and differential gene expression on short oligonucleotide arrays.

Maintained by Leo Lahti. Last updated 5 months ago.

geneexpression microarray preprocessing qualitycontrol

12.2 match 5.78 score 20 scripts 1 dependents

mrcieu

ieugwasr:Interface to the 'OpenGWAS' Database API

Interface to the 'OpenGWAS' database API <https://api.opengwas.io/api/>. Includes a wrapper to make generic calls to the API, plus convenience functions for specific queries.

Maintained by Gibran Hemani. Last updated 5 days ago.

6.5 match 89 stars 10.71 score 404 scripts 6 dependents

kharchenkolab

conos:Clustering on Network of Samples

Wires together large collections of single-cell RNA-seq datasets, which allows for both the identification of recurrent cell clusters and the propagation of information between datasets in multi-sample or atlas-scale collections. 'Conos' focuses on the uniform mapping of homologous cell types across heterogeneous sample collections. For instance, users could investigate a collection of dozens of peripheral blood samples from cancer patients combined with dozens of controls, which perhaps includes samples of a related tissue such as lymph nodes. This package interacts with data available through the 'conosPanel' package, which is available in a 'drat' repository. To access this data package, see the instructions at <https://github.com/kharchenkolab/conos>. The size of the 'conosPanel' package is approximately 12 MB.

Maintained by Evan Biederstedt. Last updated 1 year ago.

batch-correction scrna-seq single-cell-rna-seq openblas cpp openmp

9.1 match 204 stars 7.32 score 258 scripts

bioc

NewWave:Negative binomial model for scRNA-seq

A model designed for dimensionality reduction and batch effect removal for scRNA-seq data. It is designed to be massively parallelizable using shared objects that prevent memory duplication, and it can be used with different mini-batch approaches in order to reduce time consumption. It assumes a negative binomial distribution for the data with a dispersion parameter that can be either common across genes or gene-wise.

Maintained by Federico Agostinis. Last updated 5 months ago.

software geneexpression transcriptomics singlecell batcheffect sequencing coverage regression batch-effects dimensionality-reduction negative-binomial scrna-seq

12.4 match 4 stars 5.33 score 27 scripts

bioc

sccomp:Tests differences in cell-type proportion for single-cell data, robust to outliers

A robust and outlier-aware method for testing differences in cell-type proportion in single-cell data. This model can infer changes in tissue composition and heterogeneity, and can produce realistic data simulations based on any existing dataset. This model can also transfer knowledge from a large set of integrated datasets to increase accuracy further.

Maintained by Stefano Mangiola. Last updated 3 days ago.

bayesian regression differentialexpression singlecell metagenomics flowcytometry spatial batch-correction composition cytof differential-proportion microbiome multilevel proportions random-effects single-cell unwanted-variation

7.5 match 99 stars 8.43 score 69 scripts

paws-r

paws:Amazon Web Services Software Development Kit

Interface to Amazon Web Services <https://aws.amazon.com>, including storage, database, and compute services, such as 'Simple Storage Service' ('S3'), 'DynamoDB' 'NoSQL' database, and 'Lambda' functions-as-a-service.

Maintained by Dyfan Jones. Last updated 5 days ago.

aws aws-sdk

5.6 match 332 stars 11.25 score 177 scripts 12 dependents

berrij

profoc:Probabilistic Forecast Combination Using CRPS Learning

Combine probabilistic forecasts using CRPS learning algorithms proposed in Berrisch, Ziel (2021) <doi:10.48550/arXiv.2102.00968> <doi:10.1016/j.jeconom.2021.11.008>. The package implements multiple online learning algorithms like Bernstein online aggregation; see Wintenberger (2014) <doi:10.48550/arXiv.1404.1356>. Quantile regression is also implemented for comparison purposes. Model parameters can be tuned automatically with respect to the loss of the forecast combination. Methods like predict(), update(), plot() and print() are available for convenience. This package utilizes the optim C++ library for numeric optimization <https://github.com/kthohr/optim>.

Maintained by Jonathan Berrisch. Last updated 6 months ago.

openblas cpp openmp

10.8 match 14 stars 5.74 score 13 scripts

bioc

ChemmineR:Cheminformatics Toolkit for R

ChemmineR is a cheminformatics package for analyzing drug-like small molecule data in R. Its latest version contains functions for efficient processing of large numbers of molecules, physicochemical/structural property predictions, structural similarity searching, classification and clustering of compound libraries with a wide spectrum of algorithms. In addition, it offers visualization functions for compound clustering results and chemical structures.

Maintained by Thomas Girke. Last updated 5 months ago.

cheminformatics biomedicalinformatics pharmacogenetics pharmacogenomics microtitreplateassay cellbasedassays visualization infrastructure dataimport clustering proteomics metabolomics cpp

6.5 match 15 stars 9.45 score 253 scripts 12 dependents

bioc

scMerge:scMerge: Merging multiple batches of scRNA-seq data

Like all gene expression data, single-cell data suffers from batch effects and other unwanted variations that make accurate biological interpretations difficult. The scMerge method leverages factor analysis, stably expressed genes (SEGs) and (pseudo-) replicates to remove unwanted variations and merge multiple single-cell data. This package contains all the necessary functions in the scMerge pipeline, including the identification of SEGs, replication-identification methods, and merging of single-cell data.

Maintained by Yingxin Lin. Last updated 5 months ago.

batcheffect geneexpression normalization rnaseq sequencing singlecell software transcriptomics bioinformatics single-cell

6.4 match 67 stars 9.52 score 137 scripts 1 dependents

sciurus365

simlandr:Simulation-Based Landscape Construction for Dynamical Systems

A toolbox for constructing potential landscapes for dynamical systems using Monte Carlo simulation. The method is based on the potential landscape definition by Wang et al. (2008) <doi:10.1073/pnas.0800579105> (also see Zhou & Li, 2016 <doi:10.1063/1.4943096> for further mathematical discussions) and can be used for a large variety of models.

Maintained by Jingmeng Cui. Last updated 1 month ago.

research-tool

9.4 match 6 stars 6.41 score 12 scripts 2 dependents

mikejareds

hermiter:Efficient Sequential and Batch Estimation of Univariate and Bivariate Probability Density Functions and Cumulative Distribution Functions along with Quantiles (Univariate) and Nonparametric Correlation (Bivariate)

Facilitates estimation of full univariate and bivariate probability density functions and cumulative distribution functions along with full quantile functions (univariate) and nonparametric correlation (bivariate) using Hermite series based estimators. These estimators are particularly useful in the sequential setting (both stationary and non-stationary) and one-pass batch estimation setting for large data sets. Based on: Stephanou, Michael, Varughese, Melvin and Macdonald, Iain. "Sequential quantiles via Hermite series density estimation." Electronic Journal of Statistics 11.1 (2017): 570-607 <doi:10.1214/17-EJS1245>, Stephanou, Michael and Varughese, Melvin. "On the properties of Hermite series based distribution function estimators." Metrika (2020) <doi:10.1007/s00184-020-00785-z> and Stephanou, Michael and Varughese, Melvin. "Sequential estimation of Spearman rank correlation using Hermite series estimators." Journal of Multivariate Analysis (2021) <doi:10.1016/j.jmva.2021.104783>.

Maintained by Michael Stephanou. Last updated 7 months ago.

cumulative-distribution-function kendall-correlation-coefficient online-algorithms probability-density-function quantile spearman-correlation-coefficient statistics streaming-algorithms streaming-data cpp

10.7 match 15 stars 5.58 score 17 scripts

bioc

CellMixS:Evaluate Cellspecific Mixing

CellMixS provides metrics and functions to evaluate batch effects, data integration and batch effect correction in single cell transcriptome data with single cell resolution. Results can be visualized and summarised on different levels, e.g. on cell, celltype or dataset level.

Maintained by Almut Lütge. Last updated 5 months ago.

singlecell transcriptomics geneexpression batcheffect

9.4 match 7 stars 6.35 score 64 scripts

bioc

pmp:Peak Matrix Processing and signal batch correction for metabolomics datasets

Methods and tools for (pre-)processing of metabolomics datasets (i.e. peak matrices), including filtering, normalisation, missing value imputation, scaling, and signal drift and batch effect correction methods. Filtering methods are based on: the fraction of missing values (across samples or features); Relative Standard Deviation (RSD) calculated from the Quality Control (QC) samples; the blank samples. Normalisation methods include Probabilistic Quotient Normalisation (PQN) and normalisation to total signal intensity. A unified user interface for several commonly used missing value imputation algorithms is also provided. Supported methods are: k-nearest neighbours (knn), random forests (rf), Bayesian PCA missing value estimator (bpca), mean or median value of the given feature and a constant small value. The generalised logarithm (glog) transformation algorithm is available to stabilise the variance across low and high intensity mass spectral features. Finally, this package provides an implementation of the Quality Control-Robust Spline Correction (QCRSC) algorithm for signal drift and batch effect correction of mass spectrometry-based datasets.

Maintained by Gavin Rhys Lloyd. Last updated 5 months ago.

massspectrometry metabolomics software qualitycontrol batcheffect

12.6 match 4.60 score 33 scripts

bioc

BUScorrect:Batch Effects Correction with Unknown Subtypes

High-throughput experimental data are accumulating exponentially in public databases. However, mining valid scientific discoveries from these abundant resources is hampered by technical artifacts and inherent biological heterogeneity. The former are usually termed "batch effects," and the latter is often modelled by "subtypes." The R package BUScorrect fits a Bayesian hierarchical model, the Batch-effects-correction-with-Unknown-Subtypes model (BUS), to correct batch effects in the presence of unknown subtypes. BUS is capable of (a) correcting batch effects explicitly, (b) grouping samples that share similar characteristics into subtypes, (c) identifying features that distinguish subtypes, and (d) enjoying a linear-order computation complexity.

Maintained by Xiangyu Luo. Last updated 5 months ago.

geneexpression statisticalmethod bayesian clustering featureextraction batcheffect

14.4 match 4.00 score 2 scripts

bioc

MetaCyto:MetaCyto: A package for meta-analysis of cytometry data

This package provides functions for preprocessing, automated gating and meta-analysis of cytometry data. It also provides functions that facilitate the collection of cytometry data from the ImmPort database.

Maintained by Zicheng Hu. Last updated 5 months ago.

immunooncology cellbiology flowcytometry clustering statisticalmethod software cellbasedassays preprocessing

11.9 match 4.73 score 18 scripts

paws-r

paws.compute:'Amazon Web Services' Compute Services

Interface to 'Amazon Web Services' compute services, including 'Elastic Compute Cloud' ('EC2'), 'Lambda' functions-as-a-service, containers, batch processing, and more <https://aws.amazon.com/>.

Maintained by Dyfan Jones. Last updated 5 days ago.

aws aws-sdk

6.1 match 332 stars 9.12 score 16 dependents

rezakj

iCellR:Analyzing High-Throughput Single Cell Sequencing Data

A toolkit that allows scientists to work with data from single cell sequencing technologies such as scRNA-seq, scVDJ-seq, scATAC-seq, CITE-Seq and Spatial Transcriptomics (ST). Single (i) Cell R package ('iCellR') provides unprecedented flexibility at every step of the analysis pipeline, including normalization, clustering, dimensionality reduction, imputation, visualization, and so on. Users can design both unsupervised and supervised models to best suit their research. In addition, the toolkit provides 2D and 3D interactive visualizations, differential expression analysis, filters based on cells, genes and clusters, data merging, normalizing for dropouts, data imputation methods, correcting for batch differences, pathway analysis, tools to find marker genes for clusters and conditions, predict cell types and pseudotime analysis. See Khodadadi-Jamayran, et al (2020) <doi:10.1101/2020.05.05.078550> and Khodadadi-Jamayran, et al (2020) <doi:10.1101/2020.03.31.019109> for more details.

Maintained by Alireza Khodadadi-Jamayran. Last updated 8 months ago.

10xgenomics 3d batch-normalization cell-type-classification cite-seq clustering clustering-algorithm diffusion-maps dropout icellr imputation intractive-graph normalization pseudotime scrna-seq scvdj-seq singel-cell-sequencing umap cpp

9.9 match 121 stars 5.56 score 7 scripts 1 dependents

bioc

onlineFDR:Online error rate control

This package allows users to control the false discovery rate (FDR) or familywise error rate (FWER) for online multiple hypothesis testing, where hypotheses arrive in a stream. In this framework, a null hypothesis is rejected based on the evidence against it and on the previous rejection decisions.

Maintained by David S. Robertson. Last updated 5 months ago.

multiplecomparison software statisticalmethod error-rate-control fdr fwer hypothesis-testing cpp

7.9 match 14 stars 6.88 score 26 scripts

sciviews

svMisc:Miscellaneous Functions for 'SciViews::R'

Functions required for the 'SciViews::R' dialect or for general use: manage a temporary environment attached to the search path, define synonyms for R functions using aka(), test if 'Aqua', 'Mac', 'Win' ... Show progress bar, etc.

Maintained by Philippe Grosjean. Last updated 4 months ago.

gui sciviews

6.4 match 3 stars 8.32 score 380 scripts 16 dependents

bioc

oligoClasses:Classes for high-throughput arrays supported by oligo and crlmm

This package contains class definitions, validity checks, and initialization methods for classes used by the oligo and crlmm packages.

Maintained by Benilton Carvalho. Last updated 5 months ago.

infrastructure

8.8 match 5.85 score 93 scripts 17 dependents

piusdahinden

expirest:Expiry Estimation Procedures

The Australian Regulatory Guidelines for Prescription Medicines (ARGPM), guidance on "Stability testing for prescription medicines", recommends predicting the shelf life of chemically derived medicines from stability data by taking the worst case situation at batch release into account. Consequently, if a change over time is observed, a release limit needs to be specified. Finding a release limit and the associated shelf life is supported, as well as the standard approach that is recommended by guidance Q1E "Evaluation of stability data" from the International Council for Harmonisation (ICH).

Maintained by Pius Dahinden. Last updated 21 days ago.

15.0 match 3.40 score 6 scripts

bioc

Xeva:Analysis of patient-derived xenograft (PDX) data

The Xeva package provides efficient and powerful functions for patient-derived xenograft (PDX) based pharmacogenomic data analysis. This package contains a set of functions to perform analysis of patient-derived xenograft data. This package was developed by the BHKLab, for further information please see our documentation.

Maintained by Benjamin Haibe-Kains. Last updated 4 months ago.

geneexpression pharmacogenetics pharmacogenomics software classification

8.0 match 11 stars 6.35 score 17 scripts

tianshu129

foqat:Field Observation Quick Analysis Toolkit

Tools for quickly processing and analyzing field observation data and air quality data. These tools contain functions that facilitate analysis in atmospheric chemistry (especially in ozone pollution). Some time-series functions are also applicable to other fields. For details, please see the homepage <https://github.com/tianshu129/foqat>. Scientific Reference: 1. The Hydroxyl Radical (OH) Reactivity: Roger Atkinson and Janet Arey (2003) <doi:10.1021/cr0206420>. 2. Ozone Formation Potential (OFP): <https://ww2.arb.ca.gov/sites/default/files/classic/regact/2009/mir2009/mir10.pdf>, Zhang et al. (2021) <doi:10.5194/acp-21-11053-2021>. 3. Aerosol Formation Potential (AFP): Wenjing Wu et al. (2016) <doi:10.1016/j.jes.2016.03.025>. 4. TUV model: <https://www2.acom.ucar.edu/modeling/tropospheric-ultraviolet-and-visible-tuv-radiation-model>.

Maintained by Tianshu Chen. Last updated 6 months ago.

air-pollution air-quality air-quality-data air-quality-measurements air-quality-monitor air-quality-reports air-quality-sensor atmospheric-chemistry atmospheric-modelling atmospheric-science daily-maximum-8-hour-ozone field-observation mir ofp ozone-formation-potential photolysis-rate-coefficients time-series time-series-analysis tuv

11.1 match 35 stars 4.54 score 20 scripts

ropensci

tacmagic:Positron Emission Tomography Time-Activity Curve Analysis

To facilitate the analysis of positron emission tomography (PET) time activity curve (TAC) data, and to encourage open science and replicability, this package supports data loading and analysis of multiple TAC file formats. Functions are available to analyze loaded TAC data for individual participants or in batches. Major functionality includes weighted TAC merging by region of interest (ROI), calculating models including standardized uptake value ratio (SUVR) and distribution volume ratio (DVR, Logan et al. 1996 <doi:10.1097/00004647-199609000-00008>), basic plotting functions and calculation of cut-off values (Aizenstein et al. 2008 <doi:10.1001/archneur.65.11.1509>). Please see the walkthrough vignette for a detailed overview of 'tacmagic' functions.

Maintained by Eric Brown. Last updated 5 years ago.

mri neuroimaging neuroscience neuroscience-methods pet pet-mr positron positron-emission-tomography statistics

10.3 match 5 stars 4.76 score 23 scripts

wraff

wrMisc:Analyze Experimental High-Throughput (Omics) Data

The efficient treatment and convenient analysis of experimental high-throughput (omics) data is facilitated by this collection of diverse functions. Several functions address advanced object-conversions, like manipulating lists of lists or lists of arrays, reorganizing lists to arrays or into separate vectors, merging of multiple entries, etc. Another set of functions provides speed-optimized calculation of standard deviation (sd), coefficient of variation (CV) or standard error of the mean (SEM) for data in matrices or means per line with respect to additional grouping (e.g. n groups of replicates). A group of functions facilitates dealing with non-redundant information, by indexing unique entries, adding counters to redundant ones, or eliminating lines with respect to redundancy in a given reference column, etc. Help is provided to identify very closely matching numeric values to generate (partial) distance matrices for very big data in a memory-efficient manner or to reduce the complexity of large data-sets by combining very close values. Other functions help align a matrix or data.frame to a reference using partial matching or mine an experimental setup to extract patterns of replicate samples. Large experimental datasets often need additional filtering, for which suitable functions are provided. Convenient data normalization is supported in various modes, and parameter estimation via permutations or bootstrap as well as flexible testing of multiple pair-wise combinations using the framework of 'limma' is provided, too. Batch reading (or writing) of sets of files and combining data to arrays is supported, too.

Maintained by Wolfgang Raffelsberger. Last updated 7 months ago.

10.9 match 4.44 score 33 scripts 4 dependents

piusdahinden

disprofas:Non-Parametric Dissolution Profile Analysis

Similarity of dissolution profiles is assessed using the similarity factor f2 according to the EMA guideline (European Medicines Agency 2010) "On the investigation of bioequivalence". Dissolution profiles are regarded as similar if the f2 value is between 50 and 100. For the applicability of the similarity factor f2, the variability between profiles needs to be within certain limits. Often, this constraint is violated. One possibility in this situation is to resample the measured profiles in order to obtain a bootstrap estimate of f2 (Shah et al. (1998) <doi:10.1023/A:1011976615750>). Other alternatives are the model-independent non-parametric multivariate confidence region (MCR) procedure (Tsong et al. (1996) <doi:10.1177/009286159603000427>) or the T2-test for equivalence procedure (Hoffelder (2016) <https://www.ecv.de/suse_item.php?suseId=Z|pi|8430>). Functions for estimation of f1, f2, bootstrap f2, MCR / T2-test for equivalence procedure are implemented.

Maintained by Pius Dahinden. Last updated 9 months ago.

12.8 match 3.70 score 2 scripts

avi-kenny

SimEngine:A Modular Framework for Statistical Simulations in R

An open-source R package for structuring, maintaining, running, and debugging statistical simulations on both local and cluster-based computing environments. See full documentation at <https://avi-kenny.github.io/SimEngine/>.

Maintained by Avi Kenny. Last updated 25 days ago.

6.4 match 12 stars 7.18 score 50 scripts

michaelhallquist

MplusAutomation:An R Package for Facilitating Large-Scale Latent Variable Analyses in Mplus

Leverages the R language to automate latent variable model estimation and interpretation using 'Mplus', a powerful latent variable modeling program developed by Muthen and Muthen (<https://www.statmodel.com>). Specifically, this package provides routines for creating related groups of models, running batches of models, and extracting and tabulating model parameters and fit statistics.

Maintained by Michael Hallquist. Last updated 2 months ago.

3.6 match 86 stars 12.96 score 664 scripts 13 dependents

mlr-org

mlr3torch:Deep Learning with 'mlr3'

Deep Learning library that extends the mlr3 framework by building upon the 'torch' package. It allows users to conveniently build, train, and evaluate deep learning models without having to worry about low-level details. Custom architectures can be created using the graph language defined in 'mlr3pipelines'.

Maintained by Sebastian Fischer. Last updated 1 month ago.

data-science deep-learning machine-learning mlr3 torch

6.0 match 42 stars 7.63 score 78 scripts

tengfei-emory

QuantNorm:Mitigating the Adverse Impact of Batch Effects in Sample Pattern Detection

Modifies the distance matrix obtained from data with batch effects, so as to improve the performance of sample pattern detection, such as clustering, dimension reduction, and construction of networks between subjects. The method has been published in Bioinformatics (Fei et al, 2018, <doi:10.1093/bioinformatics/bty117>). Also available on 'GitHub' <https://github.com/tengfei-emory/QuantNorm>.

Maintained by Teng Fei. Last updated 5 years ago.

batch-effects

12.5 match 9 stars 3.65 score 9 scripts

ohdsi

Andromeda:Asynchronous Disk-Based Representation of Massive Data

Storing very large data objects on a local drive, while still making it possible to manipulate the data in an efficient manner.

Maintained by Martijn Schuemie. Last updated 7 months ago.

hades

4.9 match 11 stars 9.18 score 57 scripts 7 dependents

mmaechler

cluster:"Finding Groups in Data": Cluster Analysis Extended Rousseeuw et al.

Methods for cluster analysis. A much-extended version of the original by Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) "Finding Groups in Data".

Maintained by Martin Maechler. Last updated 6 days ago.

3.8 match 3 stars 11.98 score 14k scripts 2.2k dependents

us-bea

bea.R:Bureau of Economic Analysis API

Provides an R interface for the Bureau of Economic Analysis (BEA) API (see <http://www.bea.gov/API/bea_web_service_api_user_guide.htm> for more information) that serves two core purposes - 1. To Extract/Transform/Load data [beaGet()] from the BEA API as R-friendly formats in the user's work space [transformation done by default in beaGet() can be modified using optional parameters; see, too, bea2List(), bea2Tab()]. 2. To enable the search of descriptive meta data [beaSearch()]. Other features of the library exist mainly as intermediate methods or are in early stages of development. Important Note - You must have an API key to use this library. Register for a key at <http://www.bea.gov/API/signup/index.cfm> .

Maintained by Andrea Batch. Last updated 2 months ago.

9.2 match 118 stars 4.77 score

bioc

sva:Surrogate Variable Analysis

The sva package contains functions for removing batch effects and other unwanted variation in high-throughput experiments. Specifically, the sva package contains functions for identifying and building surrogate variables for high-dimensional data sets. Surrogate variables are covariates constructed directly from high-dimensional data (like gene expression/RNA sequencing/methylation/brain imaging data) that can be used in subsequent analyses to adjust for unknown, unmodeled, or latent sources of noise. The sva package can be used to remove artifacts in three ways: (1) identifying and estimating surrogate variables for unknown sources of variation in high-throughput experiments (Leek and Storey 2007 PLoS Genetics, 2008 PNAS), (2) directly removing known batch effects using ComBat (Johnson et al. 2007 Biostatistics) and (3) removing batch effects with known control probes (Leek 2014 bioRxiv). Removing batch effects and using surrogate variables in differential expression analysis have been shown to reduce dependence, stabilize error rate estimates, and improve reproducibility; see Leek and Storey (2007 PLoS Genetics, 2008 PNAS) or Leek et al. (2011 Nat. Reviews Genetics).

Maintained by Jeffrey T. Leek. Last updated 5 months ago.

immunooncology microarray statisticalmethod preprocessing multiplecomparison sequencing rnaseq batcheffect normalization

4.3 match 10.04 score 3.2k scripts 50 dependents

yufree

enviGCMS:GC/LC-MS Data Analysis for Environmental Science

Gas/Liquid Chromatography-Mass Spectrometer (GC/LC-MS) Data Analysis for Environmental Science. This package covers topics such as molecular isotope ratios, matrix effects, and Short-Chain Chlorinated Paraffins analysis in environmental analysis.

Maintained by Miao YU. Last updated 2 months ago.

environment mass-spectrometry metabolomics

6.7 match 17 stars 6.49 score 30 scripts 1 dependents

cvasi-tktd

cvasi:Calibration, Validation, and Simulation of TKTD Models

Eases the use of ecotoxicological effect models. Can simulate common toxicokinetic-toxicodynamic (TK/TD) models such as General Unified Threshold models of Survival (GUTS) and Lemna. It can derive effects and effect profiles (EPx) from scenarios. It supports the use of 'tidyr' workflows employing the pipe symbol. Time-consuming tasks can be parallelized.

Maintained by Nils Kehrein. Last updated 6 days ago.

ecotoxicology modeling simulation

6.8 match 2 stars 6.26 score 12 scripts

jessecambon

tidygeocoder:Geocoding Made Easy

An intuitive interface for getting data from geocoding services.

Maintained by Jesse Cambon. Last updated 4 months ago.

geocoding rspatial tidyverse

3.8 match 287 stars 11.35 score 1.0k scripts 9 dependents

cbailiss

pivottabler:Create Pivot Tables

Create regular pivot tables with just a few lines of R. More complex pivot tables can also be created, e.g. pivot tables with irregular layouts, multiple calculations and/or derived calculations based on multiple data frames. Pivot tables are constructed using R only and can be written to a range of output formats (plain text, 'HTML', 'Latex' and 'Excel'), including with styling/formatting.

Maintained by Christopher Bailiss. Last updated 1 year ago.

calculations html htmlwidget latex pivot-tables visualization

5.2 match 122 stars 8.08 score 358 scripts 1 dependents

bioc

TCGAbiolinks:TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data

The aim of TCGAbiolinks is to: i) facilitate the GDC open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses, and iv) easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.

Maintained by Tiago Chedraoui Silva. Last updated 28 days ago.

dnamethylation differentialmethylation generegulation geneexpression methylationarray differentialexpression pathways network sequencing survival software bioc bioconductor gdc integrative-analysis tcga tcga-data tcgabiolinks

2.8 match 305 stars 14.45 score 1.6k scripts 6 dependents

zdk123

pulsar:Parallel Utilities for Lambda Selection along a Regularization Path

Model selection for penalized graphical models using the Stability Approach to Regularization Selection ('StARS'), with options for speed-ups including Bounded StARS (B-StARS), batch computing, and other stability metrics (e.g., graphlet stability G-StARS). Christian L. Müller, Richard Bonneau, Zachary Kurtz (2016) <arXiv:1605.07072>.

Maintained by Zachary Kurtz. Last updated 1 year ago.

graphical-models

6.5 match 10 stars 6.16 score 65 scripts

bioc

scMultiSim:Simulation of Multi-Modality Single Cell Data Guided By Gene Regulatory Networks and Cell-Cell Interactions

scMultiSim simulates paired single cell RNA-seq, single cell ATAC-seq and RNA velocity data, while incorporating mechanisms of gene regulatory networks, chromatin accessibility and cell-cell interactions. It allows users to tune various parameters controlling the amount of each biological factor, variation of gene-expression levels, the influence of chromatin accessibility on RNA sequence data, and so on. It can be used to benchmark various computational methods for single cell multi-omics data, and to assist in experimental design of wet-lab experiments.

Maintained by Hechen Li. Last updated 5 months ago.

singlecell transcriptomics geneexpression sequencing experimentaldesign

5.6 match 23 stars 7.08 score 11 scripts

ropensci

workloopR:Analysis of Work Loops and Other Data from Muscle Physiology Experiments

Functions for the import, transformation, and analysis of data from muscle physiology experiments. The work loop technique is used to evaluate the mechanical work and power output of muscle. Josephson (1985) <doi:10.1242/jeb.114.1.493> modernized the technique for application in comparative biomechanics. Although our initial motivation was to provide functions to analyze work loop experiment data, as we developed the package we incorporated the ability to analyze data from experiments that are often complementary to work loops. There are currently three supported experiment types: work loops, simple twitches, and tetanus trials. Data can be imported directly from .ddf files or via an object constructor function. Through either method, data can then be cleaned or transformed via methods typically used in studies of muscle physiology. Data can then be analyzed to determine the timing and magnitude of force development and relaxation (for isometric trials) or the magnitude of work, net power, and instantaneous power among other things (for work loops). Although we do not provide plotting functions, all resultant objects are designed to be friendly to visualization via either base-R plotting or 'tidyverse' functions. This package has been peer-reviewed by rOpenSci (v. 1.1.0).

Maintained by Vikram B. Baliga. Last updated 8 months ago.

ddf muscle-force muscle-physiology-experiments tetanus work-loop workloop

6.6 match 3 stars 5.92 score 46 scripts

cran

zoomGroupStats:Analyze Text, Audio, and Video from 'Zoom' Meetings

Provides utilities for processing and analyzing the files that are exported from a recorded 'Zoom' Meeting. This includes analyzing data captured through video cameras and microphones, the text-based chat, and meta-data. You can analyze aspects of the conversation among meeting participants and their emotional expressions throughout the meeting.

Maintained by Andrew Knight. Last updated 4 years ago.

11.7 match 3.30 score 10 scripts

cyclestreets

cyclestreets:Cycle Routing and Data for Cycling Advocacy

An interface to the cycle routing/data services provided by 'CycleStreets', a not-for-profit social enterprise and advocacy organisation. The application programming interfaces (APIs) provided by 'CycleStreets' are documented at <https://www.cyclestreets.net/api/>. The focus of this package is the journey planning API, which aims to emulate the routes taken by a knowledgeable cyclist. An innovative feature of the routing service is its provision of fastest, quietest and balanced profiles. These represent routes taken to minimise time, avoid traffic and compromise between the two, respectively.

Maintained by Robin Lovelace. Last updated 3 months ago.

cycling routing transport transportation-planning

6.8 match 27 stars 5.62 score 31 scripts

bioc

randRotation:Random Rotation Methods for High Dimensional Data with Batch Structure

A collection of methods for performing random rotations on high-dimensional, normally distributed data (e.g. microarray or RNA-seq data) with batch structure. The random rotation approach allows exact testing of dependent test statistics with linear models following arbitrary batch effect correction methods.

Maintained by Peter Hettegger. Last updated 5 months ago.

software sequencing batcheffect biomedicalinformatics rnaseq preprocessing microarray differentialexpression geneexpression genetics micrornaarray normalization statisticalmethod

10.5 match 3.60 score 3 scripts

bioc

ChIPpeakAnno:Batch annotation of the peaks identified from either ChIP-seq, ChIP-chip experiments, or any experiments that result in large number of genomic interval data

The package encompasses a range of functions for identifying the closest gene, exon, miRNA, or custom features—such as highly conserved elements and user-supplied transcription factor binding sites. Additionally, users can retrieve sequences around the peaks and obtain enriched Gene Ontology (GO) or Pathway terms. In version 2.0.5 and beyond, new functionalities have been introduced. These include features for identifying peaks associated with bi-directional promoters along with summary statistics (peaksNearBDP), summarizing motif occurrences in peaks (summarizePatternInPeaks), and associating additional identifiers with annotated peaks or enrichedGO (addGeneIDs). The package integrates with various other packages such as biomaRt, IRanges, Biostrings, BSgenome, GO.db, multtest, and stat to enhance its analytical capabilities.

Maintained by Jianhong Ou. Last updated 2 months ago.

annotation chipseq chipchip

4.3 match 8.75 score 584 scripts 6 dependents

bioc

ChAMP:Chip Analysis Methylation Pipeline for Illumina HumanMethylation450 and EPIC

The package includes quality control metrics, a selection of normalization methods and novel methods to identify differentially methylated regions and to highlight copy number alterations.

Maintained by Yuan Tian. Last updated 5 months ago.

microarray methylationarray normalization twochannel copynumber dnamethylation

5.6 match 6.54 score 278 scripts

cristianetaniguti

onemap:Construction of Genetic Maps in Experimental Crosses

Analysis of molecular marker data from model (backcrosses, F2 and recombinant inbred lines) and non-model systems (i.e. outcrossing species). For the latter, it allows statistical analysis by simultaneously estimating linkage and linkage phases (genetic map construction) according to Wu et al. (2002) <doi:10.1006/tpbi.2002.1577>. All analyses are based on multipoint approaches using hidden Markov models.

Maintained by Cristiane Taniguti. Last updated 2 months ago.

cpp

5.5 match 3 stars 6.58 score 183 scripts

rqtl

qtl2:Quantitative Trait Locus Mapping in Experimental Crosses

Provides a set of tools to perform quantitative trait locus (QTL) analysis in experimental crosses. It is a reimplementation of the 'R/qtl' package to better handle high-dimensional data and complex cross designs. Broman et al. (2019) <doi:10.1534/genetics.118.301595>.

Maintained by Karl W Broman. Last updated 10 days ago.

cpp

3.8 match 34 stars 9.48 score 1.1k scripts 5 dependents

ropensci

pathviewr:Wrangle, Analyze, and Visualize Animal Movement Data

Tools to import, clean, and visualize movement data, particularly from motion capture systems such as Optitrack's 'Motive', the Straw Lab's 'Flydra', or from other sources. We provide functions to remove artifacts, standardize tunnel position and tunnel axes, select a region of interest, isolate specific trajectories, fill gaps in trajectory data, and calculate 3D and per-axis velocity. For experiments of visual guidance, we also provide functions that use subject position to estimate perception of visual stimuli.

Maintained by Vikram B. Baliga. Last updated 2 years ago.

animal-movement flydra motion movement-data optitrack trajectories trajectory-analysis visual-guidance visual-perception

5.5 match 8 stars 6.56 score 102 scripts

bioc

gwasurvivr:gwasurvivr: an R package for genome wide survival analysis

gwasurvivr is a package to perform survival analysis using Cox proportional hazard models on imputed genetic data.

Maintained by Abbas Rizvi. Last updated 5 months ago.

genomewideassociation survival regression genetics snp geneticvariability pharmacogenomics biomedicalinformatics

5.5 match 12 stars 6.43 score 75 scripts

bioc

Harman:The removal of batch effects from datasets using a PCA and constrained optimisation based technique

Harman is a PCA and constrained optimisation based technique that maximises the removal of batch effects from datasets, with the constraint that the probability of overcorrection (i.e. removing genuine biological signal along with batch noise) is kept to a fraction which is set by the end-user.

Maintained by Jason Ross. Last updated 5 months ago.

batcheffect microarray multiplecomparison principalcomponent normalization preprocessing dnamethylation transcription software statisticalmethod cpp

7.1 match 4.97 score 31 scripts 1 dependents

dylanpieper

batchLLM:Batch Process LLM Text Completions Using a Data Frame

Batch process large language model (LLM) text completions using data frame rows, with support for OpenAI's 'GPT' (<https://chat.openai.com>), Anthropic's 'Claude' (<https://claude.ai>), and Google's 'Gemini' (<https://gemini.google.com>). Includes features such as local storage, metadata logging, API rate limiting delays, and a 'shiny' app addin.

Maintained by Dylan Pieper. Last updated 1 months ago.

depreciated

7.3 match 11 stars 4.85 score 6 scripts

csafe-isu

handwriter:Handwriting Analysis in R

Perform statistical writership analysis of scanned handwritten documents. Webpage provided at: <https://github.com/CSAFE-ISU/handwriter>.

Maintained by Stephanie Reinders. Last updated 1 months ago.

cpp jags

4.0 match 24 stars 8.70 score 27 scripts 2 dependents

r-dbi

DBI:R Database Interface

A database interface definition for communication between R and relational database management systems. All classes in this package are virtual and need to be extended by the various R/DBMS implementations.

Maintained by Kirill Müller. Last updated 3 months ago.

database interface

1.7 match 302 stars 20.88 score 19k scripts 2.9k dependents

bioc

MultiBaC:Multiomic Batch effect Correction

MultiBaC is a strategy to correct batch effects from multiomic datasets distributed across different labs or data acquisition events. MultiBaC is the first batch effect correction algorithm that deals with batch effect correction in multiomics datasets. MultiBaC is able to remove batch effects across different omics generated within separate batches provided that at least one common omic data type is included in all the batches considered.

Maintained by The package maintainer. Last updated 5 months ago.

software statisticalmethod principalcomponent datarepresentation geneexpression transcription batcheffect

10.5 match 3.30 score 7 scripts

mlr-org

mlr3batchmark:Batch Experiments for 'mlr3'

Extends the 'mlr3' package with a connector to the package 'batchtools'. This allows running large-scale benchmark experiments on scheduled high-performance computing clusters.

Maintained by Marc Becker. Last updated 1 year ago.

batchtools cluster-computing high-performance-computing hpc mlr3

7.1 match 5 stars 4.85 score 57 scripts

neurodata

causalBatch:Causal Batch Effects

Software which provides numerous functionalities for detecting and removing group-level effects from high-dimensional scientific data which, when combined with additional assumptions, allow for causal conclusions, as described in our manuscripts Bridgeford et al. (2024) <doi:10.1101/2021.09.03.458920> and Bridgeford et al. (2023) <doi:10.48550/arXiv.2307.13868>. Also provides a number of useful utilities for generating simulations and balancing covariates across multiple groups/batches of data via matching and propensity trimming for more than two groups.

Maintained by Eric W. Bridgeford. Last updated 5 days ago.

7.3 match 4 stars 4.70 score 23 scripts

cran

ELISAtools:ELISA Data Analysis with Batch Correction

Runs data analysis for enzyme-linked immunosorbent assays (ELISAs). Either the five- or four-parameter logistic model is fitted to the data from a single ELISA. In addition, batch effect correction/normalization is carried out when there is more than one batch of ELISAs. Feng (2018) <doi:10.1101/483800>.

Maintained by Feng Feng. Last updated 4 years ago.

10.4 match 1 stars 3.29 score 39 scripts

thej022214

corHMM:Hidden Markov Models of Character Evolution

Fits hidden Markov models of discrete character evolution which allow different transition rate classes on different portions of a phylogeny. Beaulieu et al (2013) <doi:10.1093/sysbio/syt034>.

Maintained by Jeremy Beaulieu. Last updated 29 days ago.

3.6 match 12 stars 9.48 score 422 scripts 2 dependents

cjbarrie

academictwitteR:Access the Twitter Academic Research Product Track V2 API Endpoint

Package to query the Twitter Academic Research Product Track, providing access to full-archive search and other v2 API endpoints. Functions are written with academic research in mind. They provide flexibility in how the user wishes to store collected data, and encourage regular storage of data to mitigate loss when collecting large volumes of tweets. They also provide workarounds to manage and reshape the format in which data is provided on the client side.

Maintained by Christopher Barrie. Last updated 2 years ago.

twitter twitter-api

3.8 match 275 stars 8.94 score 177 scripts

rstudio

tfprobability:Interface to 'TensorFlow Probability'

Interface to 'TensorFlow Probability', a 'Python' library built on 'TensorFlow' that makes it easy to combine probabilistic models and deep learning on modern hardware ('TPU', 'GPU'). 'TensorFlow Probability' includes a wide selection of probability distributions and bijectors, probabilistic layers, variational inference, Markov chain Monte Carlo, and optimizers such as Nelder-Mead, BFGS, and SGLD.

Maintained by Tomasz Kalinowski. Last updated 3 years ago.

3.9 match 54 stars 8.63 score 221 scripts 3 dependents

cran

agena.ai:R Wrapper for 'agena.ai' API

An R wrapper for 'agena.ai' <https://www.agena.ai> which provides users capabilities to work with 'agena.ai' using the R environment. Users can create Bayesian network models from scratch or import existing models in R and export to 'agena.ai' cloud or local API for calculations. Note: running calculations requires a valid 'agena.ai' API license (past the initial trial period of the local API).

Maintained by Eugene Dementiev. Last updated 1 year ago.

9.3 match 3.54 score

raim

segmenTier:Similarity-Based Segmentation of Multidimensional Signals

A dynamic programming solution to segmentation based on maximization of arbitrary similarity measures within segments. The general idea, theory and this implementation are described in Machne, Murray & Stadler (2017) <doi:10.1038/s41598-017-12401-8>. In addition to the core algorithm, the package provides time-series processing and clustering functions as described in the publication. These are generally applicable where a `k-means` clustering yields meaningful results, and have been specifically developed for clustering of the Discrete Fourier Transform of periodic gene expression data (`circadian' or `yeast metabolic oscillations'). This clustering approach is outlined in the supplemental material of Machne & Murray (2012) <doi:10.1371/journal.pone.0037906>, and here is used as a basis of segment similarity measures. Notably, the time-series processing and clustering functions can also be used as stand-alone tools, independent of segmentation, e.g., for transcriptome data already mapped to genes.

Maintained by Rainer Machne. Last updated 4 years ago.

cpp

7.4 match 3 stars 4.48 score 8 scripts

bioc

BiocParallel:Bioconductor facilities for parallel evaluation

This package provides modified versions and novel implementation of functions for parallel evaluation, tailored to use with Bioconductor objects.

Maintained by Martin Morgan. Last updated 27 days ago.

infrastructure bioconductor-package core-package u24ca289073 cpp

1.9 match 67 stars 17.40 score 7.3k scripts 1.1k dependents

bioc

BUSseq:Batch Effect Correction with Unknown Subtypes for scRNA-seq data

The BUSseq R package fits an interpretable Bayesian hierarchical model---the Batch Effects Correction with Unknown Subtypes for scRNA seq Data (BUSseq)---to correct batch effects in the presence of unknown cell types. BUSseq is able to simultaneously correct batch effects, cluster cell types, and take care of the count data nature, the overdispersion, the dropout events, and the cell-specific sequencing depth of scRNA-seq data. After correcting the batch effects with BUSseq, the corrected value can be used for downstream analysis as if all cells were sequenced in a single batch. BUSseq can integrate read count matrices obtained from different scRNA-seq platforms and allow cell types to be measured in some but not all of the batches as long as the experimental design fulfills the conditions listed in our manuscript.

Maintained by Fangda Song. Last updated 5 months ago.

experimentaldesign geneexpression statisticalmethod bayesian clustering featureextraction batcheffect singlecell sequencing cpp openmp

7.3 match 4.48 score 30 scripts

cmstatr

cmstatr:Statistical Methods for Composite Material Data

An implementation of the statistical methods commonly used for advanced composite materials in aerospace applications. This package focuses on calculating basis values (lower tolerance bounds) for material strength properties, as well as performing the associated diagnostic tests. This package provides functions for calculating basis values assuming several different distributions, as well as providing functions for non-parametric methods of computing basis values. Functions are also provided for testing the hypothesis that there is no difference between strength and modulus data from an alternate sample and that from a "qualification" or "baseline" sample. For a discussion of these statistical methods and their use, see the Composite Materials Handbook, Volume 1 (2012, ISBN: 978-0-7680-7811-4). Additional details about this package are available in the paper by Kloppenborg (2020, <doi:10.21105/joss.02265>).

Maintained by Stefan Kloppenborg. Last updated 4 months ago.

composite-material-data data materials-science statistical-analysis statistics

5.2 match 4 stars 6.26 score 23 scripts

kmkuesters

pooledpeaks:Genetic Analysis of Pooled Samples

Analyzing genetic data obtained from pooled samples. This package can read in Fragment Analysis output files, process the data, and score peaks, as well as facilitate various analyses, including cluster analysis, calculation of genetic distances and diversity indices, as well as bootstrap resampling for statistical inference. Specifically tailored to handle genetic data efficiently, researchers can explore population structure, genetic differentiation, and genetic relatedness among samples. We updated some functions from Covarrubias-Pazaran et al. (2016) <doi:10.1186/s12863-016-0365-6> to allow for the use of new file formats and referenced the following to write our genetic analysis functions: Long et al. (2022) <doi:10.1038/s41598-022-04776-0>, Jost (2008) <doi:10.1111/j.1365-294x.2008.03887.x>, Nei (1973) <doi:10.1073/pnas.70.12.3321>, Foulley et al. (2006) <doi:10.1016/j.livprodsci.2005.10.021>, Chao et al. (2008) <doi:10.1111/j.1541-0420.2008.01010.x>.

Maintained by Kathleen Kuesters. Last updated 4 days ago.

6.6 match 1 stars 4.85 score 3 scripts

rorynolan

detrendr:Detrend Images

Detrend fluorescence microscopy image series for fluorescence fluctuation and correlation spectroscopy ('FCS' and 'FFS') analysis. This package contains functionality published in a 2016 paper <doi:10.1093/bioinformatics/btx434> but it has been extended since then with the Robin Hood algorithm and thus contains unpublished work.

Maintained by Rory Nolan. Last updated 2 months ago.

cpp

5.3 match 3 stars 6.08 score 25 scripts 1 dependents

bioc

msmsEDA:Exploratory Data Analysis of LC-MS/MS data by spectral counts

Exploratory data analysis to assess the quality of a set of LC-MS/MS experiments, and visualize the influence of the involved factors.

Maintained by Josep Gregori. Last updated 5 months ago.

immunooncology software massspectrometry proteomics

7.1 match 4.38 score 4 scripts 2 dependents

tuomonieminen

read.gt3x:Parse 'ActiGraph' 'GT3X'/'GT3X+' 'Accelerometer' Data

Implements a high performance C++ parser for 'ActiGraph' 'GT3X'/'GT3X+' data format (with extension '.gt3x') for 'accelerometer' samples. Activity samples can be easily read into a matrix or data.frame. This allows for storing the raw 'accelerometer' samples in the original binary format to save space.

Maintained by Tuomo Nieminen. Last updated 3 years ago.

cpp

7.0 match 4.46 score 24 scripts 4 dependents

asmahani

EnsembleBase:Extensible Package for Parallel, Batch Training of Base Learners for Ensemble Modeling

Extensible S4 classes and methods for batch training of regression and classification algorithms such as Random Forest, Gradient Boosting Machine, Neural Network, Support Vector Machines, K-Nearest Neighbors, Penalized Regression (L1/L2), and Bayesian Additive Regression Trees. These algorithms constitute a set of 'base learners', which can subsequently be combined together to form ensemble predictions. This package provides cross-validation wrappers to allow for downstream application of ensemble integration techniques, including best-error selection. All base learner estimation objects are retained, allowing for repeated prediction calls without the need for re-training. For large problems, an option is provided to save estimation objects to disk, along with prediction methods that utilize these objects. This allows users to train and predict with large ensembles of base learners without being constrained by system RAM.

Maintained by Alireza S. Mahani. Last updated 2 months ago.

openjdk

15.8 match 1.95 score 5 scripts 3 dependents

bioc

pvca:Principal Variance Component Analysis (PVCA)

This package contains the function to assess the batch sources by fitting all "sources" as random effects including two-way interaction terms in the Mixed Model (depends on lme4 package) to selected principal components, which were obtained from the original data correlation matrix. This package accompanies the book "Batch Effects and Noise in Microarray Experiments", chapter 12.

Maintained by Jianying LI. Last updated 5 months ago.

microarray batcheffect

5.4 match 5.67 score 111 scripts 1 dependents

mlampros

SuperpixelImageSegmentation:Superpixel Image Segmentation

Image Segmentation using Superpixels, Affinity Propagation and Kmeans Clustering. The R code is based primarily on the article "Image Segmentation using SLIC Superpixels and Affinity Propagation Clustering, Bao Zhou, International Journal of Science and Research (IJSR), 2013" <https://www.ijsr.net/archive/v4i4/SUB152869.pdf>.

Maintained by Lampros Mouselimis. Last updated 2 years ago.

affinity-propagation kmeans mini-batch-kmeans slic superpixels openblas cpp openmp

6.7 match 18 stars 4.61 score 15 scripts 1 dependents

allenzhuaz

PPQplan:Process Performance Qualification (PPQ) Plans in Chemistry, Manufacturing and Controls (CMC) Statistical Analysis

Assessment for statistically-based PPQ sampling plan, including calculating the passing probability, optimizing the baseline and high performance cutoff points, visualizing the PPQ plan and power dynamically. The analytical idea is based on the simulation methods from the textbook Burdick, R. K., LeBlond, D. J., Pfahler, L. B., Quiroz, J., Sidor, L., Vukovinsky, K., & Zhang, L. (2017). Statistical Methods for CMC Applications. In Statistical Applications for Chemistry, Manufacturing and Controls (CMC) in the Pharmaceutical Industry (pp. 227-250). Springer, Cham.

Maintained by Yalin Zhu. Last updated 3 years ago.

biostatistics pharmaceuticals sampling-methods

7.4 match 1 stars 4.11 score 13 scripts

ravengan

SCIBER:Single-Cell Integrator and Batch Effect Remover

Remove batch effects by projecting query batches into the reference batch space.

Maintained by Dailin Gan. Last updated 2 years ago.

7.1 match 4 stars 4.30 score 8 scripts

alanarnholt

BSDA:Basic Statistics and Data Analysis

Data sets for book "Basic Statistics and Data Analysis" by Larry J. Kitchens.

Maintained by Alan T. Arnholt. Last updated 2 years ago.

3.3 match 7 stars 9.11 score 1.3k scripts 6 dependents

civisanalytics

civis:R Client for the 'Civis Platform API'

A convenient interface for making requests directly to the 'Civis Platform API' <https://www.civisanalytics.com/platform/>. Full documentation available 'here' <https://civisanalytics.github.io/civis-r/>.

Maintained by Peter Cooman. Last updated 2 months ago.

3.9 match 16 stars 7.84 score 144 scripts

r-lib

bit:Classes and Methods for Fast Memory-Efficient Boolean Selections

Provided are classes for boolean and skewed boolean vectors, fast boolean methods, fast unique and non-unique integer sorting, fast set operations on sorted and unsorted sets of integers, and foundations for ff (range index, compression, chunked processing).

Maintained by Michael Chirico. Last updated 7 days ago.

2.0 match 12 stars 15.15 score 131 scripts 3.2k dependents

bioc

metabCombiner:Method for Combining LC-MS Metabolomics Feature Measurements

This package aligns LC-HRMS metabolomics datasets acquired from biologically similar specimens analyzed under similar, but not necessarily identical, conditions. Peak-picked and simply aligned metabolomics feature tables (consisting of m/z, rt, and per-sample abundance measurements, plus optional identifiers & adduct annotations) are accepted as input. The package outputs a combined table of feature pair alignments, organized into groups of similar m/z, and ranked by a similarity score. Input tables are assumed to be acquired using similar (but not necessarily identical) analytical methods.

Maintained by Hani Habra. Last updated 5 months ago.

software massspectrometry metabolomics mass-spectrometry

5.3 match 10 stars 5.65 score 5 scripts

broccolito

bolt4jr:Interface for the 'Neo4j Bolt' Protocol

Querying, extracting, and processing large-scale network data from Neo4j databases using the 'Neo4j Bolt' <https://neo4j.com/docs/bolt/current/bolt/> protocol. This interface supports efficient data retrieval, batch processing for large datasets, and seamless conversion of query results into R data frames, making it ideal for bioinformatics, computational biology, and other graph-based applications.

Maintained by Wanjun Gu. Last updated 6 days ago.

6.3 match 4.65 score 2 scripts

roelandkindt

BiodiversityR:Package for Community Ecology and Suitability Analysis

Graphical User Interface (via the R-Commander) and utility functions (often based on the vegan package) for statistical analysis of biodiversity and ecological communities, including species accumulation curves, diversity indices, Renyi profiles, GLMs for analysis of species abundance and presence-absence, distance matrices, Mantel tests, and cluster, constrained and unconstrained ordination analysis. A book on biodiversity and community ecology analysis is available for free download from the website. In 2012, methods for (ensemble) suitability modelling and mapping were expanded in the package.

Maintained by Roeland Kindt. Last updated 2 months ago.

3.9 match 16 stars 7.42 score 390 scripts 2 dependents

bioc

PAA:PAA (Protein Array Analyzer)

PAA imports single color (protein) microarray data that has been saved in gpr file format - esp. ProtoArray data. After preprocessing (background correction, batch filtering, normalization) univariate feature preselection is performed (e.g., using the "minimum M statistic" approach - hereinafter referred to as "mMs"). Subsequently, a multivariate feature selection is conducted to discover biomarker candidates. For this, either a frequency-based backwards elimination approach or ensemble feature selection can be used. PAA provides a complete toolbox of analysis tools including several different plots for results examination and evaluation.

Maintained by Michael Turewicz. Last updated 5 months ago.

classification microarray onechannel proteomics cpp

6.7 match 4.34 score 11 scripts

pdwaggoner

hdImpute:A Batch Process for High Dimensional Imputation

A correlation-based batch process for fast, accurate imputation for high dimensional missing data problems via chained random forests. See Waggoner (2023) <doi:10.1007/s00180-023-01325-9> for more on 'hdImpute', Stekhoven and Bühlmann (2012) <doi:10.1093/bioinformatics/btr597> for more on 'missForest', and Mayer (2022) <https://github.com/mayer79/missRanger> for more on 'missRanger'.

Maintained by Philip Waggoner. Last updated 2 months ago.

8.5 match 2 stars 3.41 score 13 scripts

mlverse

torch:Tensors and Neural Networks with 'GPU' Acceleration

Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al (2019) <doi:10.48550/arXiv.1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.

Maintained by Daniel Falbel. Last updated 8 days ago.

autograd deep-learning torch cpp

1.8 match 520 stars 16.52 score 1.4k scripts 38 dependents

poissonconsulting

dbflobr:Read and Write Files to SQLite Databases

Reads and writes files to SQLite databases <https://www.sqlite.org/index.html> as flobs (a flob is a blob that preserves the file extension).

Maintained by Evan Amies-Galonski. Last updated 2 months ago.

blob databases flob sqlite

4.9 match 6 stars 5.86 score 5 scripts

truecluster

ff:Memory-Efficient Storage of Large Data on Disk and Fast Access Functions

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) in main memory - the effective virtual memory consumption per ff object. ff supports R's standard atomic data types 'double', 'logical', 'raw' and 'integer' and non-standard atomic types boolean (1 bit), quad (2 bit unsigned), nibble (4 bit unsigned), byte (1 byte signed with NAs), ubyte (1 byte unsigned), short (2 byte signed with NAs), ushort (2 byte unsigned), single (4 byte float with NAs). For example 'quad' allows efficient storage of genomic data as an 'A','T','G','C' factor. The unsigned types support 'circular' arithmetic. There is also support for close-to-atomic types 'factor', 'ordered', 'POSIXct', 'Date' and custom close-to-atomic types. ff has native C support for vectors, matrices and arrays with flexible dimorder (major column-order, major row-order and generalizations for arrays), and there is also an ffdf class not unlike data.frames, as well as import/export filters for csv files. ff objects store raw data in binary flat files in native encoding, and complement this with metadata stored in R as physical and virtual attributes. ff objects have well-defined hybrid copying semantics, which gives rise to certain performance improvements through virtualization. ff objects can be stored and reopened across R sessions. ff files can be shared by multiple ff R objects (using different data en/de-coding schemes) in the same process or from multiple R processes to exploit parallelism. A wide choice of finalizer options makes it possible to work with 'permanent' files as well as to create and remove 'temporary' ff files completely transparently to the user. On certain OS/Filesystem combinations, creating the ff files works without notable delay thanks to using sparse file allocation.
Several access optimization techniques such as Hybrid Index Preprocessing and Virtualization are implemented to achieve good performance even with large datasets, for example virtual matrix transpose without touching a single byte on disk. Further, to reduce disk I/O, 'logicals' and non-standard data types are stored natively and compactly in binary flat files, i.e. logicals take up exactly 2 bits to represent TRUE, FALSE and NA. Beyond basic access functions, the ff package also provides compatibility functions that facilitate writing code for ff and ram objects and support for batch processing on ff objects (e.g. as.ram, as.ff, ffapply). ff interfaces closely with functionality from package 'bit': chunked looping, fast bit operations and coercions between different objects that can store subscript information ('bit', 'bitwhich', ff 'boolean', ri range index, hi hybrid index). This makes it possible to work interactively with selections of large datasets and quickly modify selection criteria. Further high-performance enhancements can be made available upon request.

Maintained by Jens Oehlschlägel. Last updated 2 months ago.

cpp

2.4 match 27 stars 12.01 score 764 scripts 71 dependents

r-dbi

RSQLite:SQLite Interface for R

Embeds the SQLite database engine in R and provides an interface compliant with the DBI package. The source for the SQLite engine and for various extensions in a recent version is included. System libraries will never be consulted because this package relies on static linking for the plugins it includes; this also ensures a consistent experience across all installations.

Maintained by Kirill Müller. Last updated 26 days ago.

database sqlite3 cpp

1.5 match 327 stars 18.73 score 8.1k scripts 1.1k dependents

bioc

systemPipeR:systemPipeR: Workflow Environment for Data Analysis and Report Generation

systemPipeR is a multipurpose data analysis workflow environment that unifies R with command-line tools. It enables scientists to analyze many types of large- or small-scale data on local or distributed computer systems with a high level of reproducibility, scalability and portability. At its core is a command-line interface (CLI) that adopts the Common Workflow Language (CWL). This design allows users to choose for each analysis step the optimal R or command-line software. It supports both end-to-end and partial execution of workflows with built-in restart functionalities. Efficient management of complex analysis tasks is accomplished by a flexible workflow control container class. Handling of large numbers of input samples and experimental designs is facilitated by consistent sample annotation mechanisms. As a multi-purpose workflow toolkit, systemPipeR enables users to run existing workflows, customize them or design entirely new ones while taking advantage of widely adopted data structures within the Bioconductor ecosystem. Another important core functionality is the generation of reproducible scientific analysis and technical reports. For result interpretation, systemPipeR offers a wide range of plotting functionality, while an associated Shiny App offers many useful functionalities for interactive result exploration. The vignettes linked from this page include (1) a general introduction, (2) a description of technical details, and (3) a collection of workflow templates.

Maintained by Thomas Girke. Last updated 5 months ago.

genetics infrastructure dataimport sequencing rnaseq riboseq chipseq methylseq snp geneexpression coverage genesetenrichment alignment qualitycontrol immunooncology reportwriting workflowstep workflowmanagement

2.4 match 53 stars 11.56 score 344 scripts 3 dependents

bioc

limma:Linear Models for Microarray and Omics Data

Data analysis, linear models and differential expression for omics data.

Maintained by Gordon Smyth. Last updated 7 days ago.

exonarray geneexpression transcription alternativesplicing differentialexpression differentialsplicing genesetenrichment dataimport bayesian clustering regression timecourse microarray micrornaarray mrnamicroarray onechannel proprietaryplatforms twochannel sequencing rnaseq batcheffect multiplecomparison normalization preprocessing qualitycontrol biomedicalinformatics cellbiology cheminformatics epigenetics functionalgenomics genetics immunooncology metabolomics proteomics systemsbiology transcriptomics

2.0 match 13.81 score 16k scripts 585 dependents

rorynolan

nandb:Number and Brightness Image Analysis

Calculation of molecular number and brightness from fluorescence microscopy image series. The software was published in a 2016 paper <doi:10.1093/bioinformatics/btx434>. The seminal paper for the technique is Digman et al. 2008 <doi:10.1529/biophysj.107.114645>. A review of the technique was published in 2017 <doi:10.1016/j.ymeth.2017.12.001>.

Maintained by Rory Nolan. Last updated 2 months ago.

cpp

5.3 match 2 stars 5.24 score 29 scripts

bioc

scone:Single Cell Overview of Normalized Expression data

SCONE is an R package for comparing and ranking the performance of different normalization schemes for single-cell RNA-seq and other high-throughput analyses.

Maintained by Davide Risso. Last updated 27 days ago.

immunooncology normalization preprocessing qualitycontrol geneexpression rnaseq software transcriptomics sequencing singlecell coverage

3.0 match 53 stars 9.12 score 104 scripts

bluegreen-labs

daymetr:Interface to the 'Daymet' Web Services

Programmatic interface to the 'Daymet' web services (<http://daymet.ornl.gov>). Allows for easy downloads of 'Daymet' climate data directly to your R workspace or your computer. Routines for both single pixel data downloads and gridded (netCDF) data are provided.

Maintained by Koen Hufkens. Last updated 1 year ago.

climate-data data-science daymet gridded-data netcdf ornl-daac

3.4 match 31 stars 8.13 score 242 scripts 2 dependents

lbbe-software

DRomics:Dose Response for Omics

Several functions are provided for dose-response (or concentration-response) characterization from omics data. 'DRomics' is especially dedicated to omics data obtained using a typical dose-response design, favoring a great number of tested doses (or concentrations) rather than a great number of replicates (no need for replicates). 'DRomics' provides functions 1) to check, normalize and/or transform data, 2) to select monotonic or biphasic significantly responding items (e.g. probes, metabolites), 3) to choose the best-fit model among a predefined family of monotonic and biphasic models to describe each selected item, 4) to derive a benchmark dose or concentration and a typology of response from each fitted curve. In the available version data are supposed to be single-channel microarray data in log2, RNAseq data in raw counts, or already pretreated continuous omics data (such as metabolomic data) in log scale. In order to link responses across biological levels based on a common method, 'DRomics' also handles apical data as long as they are continuous and follow a normal distribution for each dose or concentration, with a common standard error. For further details see Delignette-Muller et al (2023) <DOI:10.24072/pcjournal.325> and Larras et al (2018) <DOI:10.1021/acs.est.8b04752>.

Maintained by Aurelie Siberchicot. Last updated 14 days ago.

4.5 match 4 stars 5.96 score 7 scripts

bioc

scp:Mass Spectrometry-Based Single-Cell Proteomics Data Analysis

Utility functions for manipulating, processing, and analyzing mass spectrometry-based single-cell proteomics data. The package is an extension to the 'QFeatures' package and relies on 'SingleCellExperiment' to enable single-cell proteomics analyses. The package offers the user the functionality to process quantitative tables (as generated by MaxQuant, Proteome Discoverer, and more) into data tables ready for downstream analysis and data visualization.

Maintained by Christophe Vanderaa. Last updated 18 days ago.

geneexpression proteomics singlecell massspectrometry preprocessing cellbasedassays bioconductor mass-spectrometry single-cell software

3.0 match 25 stars 8.94 score 115 scripts

maialba3

LipidMS:Lipid Annotation for LC-MS/MS DDA or DIA Data

Lipid annotation in untargeted LC-MS lipidomics based on fragmentation rules. Alcoriza-Balaguer MI, Garcia-Canaveras JC, Lopez A, Conde I, Juan O, Carretero J, Lahoz A (2019) <doi:10.1021/acs.analchem.8b03409>.

Maintained by M Isabel Alcoriza-Balaguer. Last updated 7 months ago.

cpp

5.0 match 2 stars 5.33 score 12 scripts 1 dependent

talhouklab

nanostringr:Performs Quality Control, Data Normalization, and Batch Effect Correction for 'NanoString nCounter' Data

Provides quality control (QC), normalization, and batch effect correction operations for 'NanoString nCounter' data, Talhouk et al. (2016) <doi:10.1371/journal.pone.0153844>. Various metrics are used to determine which samples passed or failed QC. Gene expression should first be normalized to housekeeping genes, before a reference-based approach is used to adjust for batch effects. Raw NanoString data can be imported in the form of Reporter Code Count (RCC) files.

Maintained by Derek Chiu. Last updated 1 month ago.

5.3 match 5 stars 4.95 score 12 scripts

shixiangwang

regport:Regression Model Processing Port

Provides R6 classes, methods and utilities to construct, analyze, summarize, and visualize regression models.

Maintained by Shixiang Wang. Last updated 26 days ago.

batch-processing regression-models

7.5 match 6 stars 3.48 score 4 scripts

sssydysss

TransProR:Analysis and Visualization of Multi-Omics Data

A tool for comprehensive transcriptomic data analysis, with a focus on transcript-level data preprocessing, expression profiling, differential expression analysis, and functional enrichment. It enables researchers to identify key biological processes, disease biomarkers, and gene regulatory mechanisms. 'TransProR' is aimed at researchers and bioinformaticians working with RNA-Seq data, providing an intuitive framework for in-depth analysis and visualization of transcriptomic datasets. The package includes comprehensive documentation and usage examples to guide users through the entire analysis pipeline. The differential expression analysis methods incorporated in the package include 'limma' (Ritchie et al., 2015, <doi:10.1093/nar/gkv007>; Smyth, 2005, <doi:10.1007/0-387-29362-0_23>), 'edgeR' (Robinson et al., 2010, <doi:10.1093/bioinformatics/btp616>), 'DESeq2' (Love et al., 2014, <doi:10.1186/s13059-014-0550-8>), and Wilcoxon tests (Li et al., 2022, <doi:10.1186/s13059-022-02648-4>), providing flexible and robust approaches to RNA-Seq data analysis. For more information, refer to the package vignettes and related publications.

Maintained by Dongyue Yu. Last updated 21 days ago.

3.4 match 174 stars 7.55 score 34 scripts

bioc

RnBeads:RnBeads

RnBeads facilitates comprehensive analysis of various types of DNA methylation data at the genome scale.

Maintained by Fabian Mueller. Last updated 1 month ago.

dnamethylation methylationarray methylseq epigenetics qualitycontrol preprocessing batcheffect differentialmethylation sequencing cpgisland immunooncology twochannel dataimport

3.8 match 6.85 score 169 scripts 1 dependent

bluegreen-labs

ecmwfr:Interface to 'ECMWF' and 'CDS' Data Web Services

Programmatic interface to the European Centre for Medium-Range Weather Forecasts dataset web services (ECMWF; <https://www.ecmwf.int/>) and Copernicus's Data Stores. Allows for easy downloads of weather forecasts and climate reanalysis data in R. Data stores covered include the Climate Data Store (CDS; <https://cds.climate.copernicus.eu>), Atmosphere Data Store (ADS; <https://ads.atmosphere.copernicus.eu>) and Early Warning Data Store (CEMS; <https://ewds.climate.copernicus.eu>).

Maintained by Koen Hufkens. Last updated 1 month ago.

cds climate-data copernicus ecmwf-api ecmwf-services

2.5 match 111 stars 10.08 score 156 scripts 3 dependents

bioc

lumi:BeadArray Specific Methods for Illumina Methylation and Expression Microarrays

The lumi package provides an integrated solution for Illumina microarray data analysis. It includes functions for Illumina BeadStudio (GenomeStudio) data input, quality control, BeadArray-specific variance stabilization, normalization and gene annotation at the probe level. It also includes functions for processing Illumina methylation microarrays, especially Illumina Infinium methylation microarrays.

Maintained by Lei Huang. Last updated 5 months ago.

microarray onechannel preprocessing dnamethylation qualitycontrol twochannel

4.0 match 6.26 score 294 scripts 5 dependents

cbroeckl

RAMClustR:Mass Spectrometry Metabolomics Feature Clustering and Interpretation

A feature clustering algorithm for non-targeted mass spectrometric metabolomics data. This method is compatible with gas and liquid chromatography coupled mass spectrometry, including indiscriminant tandem mass spectrometry <DOI: 10.1021/ac501530d> data.

Maintained by Helge Hecht. Last updated 7 months ago.

massspectrometry metabolomics

3.6 match 12 stars 6.78 score 20 scripts

bioc

Omixer:Omixer: multivariate and reproducible sample randomization to proactively counter batch effects in omics studies

Omixer is a Bioconductor package for multivariate and reproducible sample randomization, which ensures optimal sample distribution across batches with well-documented methods. It outputs lab-friendly sample layouts, reducing the risk of sample mixups when manually pipetting randomized samples.

Maintained by Lucy Sinke. Last updated 5 months ago.

datarepresentation experimentaldesign qualitycontrol software visualization

6.0 match 4.00 score 2 scripts

bioc

tidytof:Analyze High-dimensional Cytometry Data Using Tidy Data Principles

This package implements an interactive, scientific analysis pipeline for high-dimensional cytometry data built using tidy data principles. It is specifically designed to play well with both the tidyverse and Bioconductor software ecosystems, with functionality for reading/writing data files, data cleaning, preprocessing, clustering, visualization, modeling, and other quality-of-life functions. tidytof implements a "grammar" of high-dimensional cytometry data analysis.

Maintained by Timothy Keyes. Last updated 5 months ago.

singlecell flowcytometry bioinformatics cytometry data-science single-cell tidyverse cpp

3.3 match 18 stars 7.24 score 35 scripts

sbg

sevenbridges2:The 'Seven Bridges Platform' API Client

R client and utilities for 'Seven Bridges Platform' API, from 'Cancer Genomics Cloud' to other 'Seven Bridges' supported platforms. API documentation is hosted publicly at <https://docs.sevenbridges.com/docs/the-api>.

Maintained by Marko Trifunovic. Last updated 22 days ago.

api-client bioinformatics cloud sevenbridges

4.0 match 2 stars 5.90 score 4 scripts

apache

nanoarrow:Interface to the 'nanoarrow' 'C' Library

Provides an 'R' interface to the 'nanoarrow' 'C' library and the 'Apache Arrow' application binary interface. Functions to import and export 'ArrowArray', 'ArrowSchema', and 'ArrowArrayStream' 'C' structures to and from 'R' objects are provided alongside helpers to facilitate zero-copy data transfer among 'R' bindings to libraries implementing the 'Arrow' 'C' data interface.

Maintained by Dewey Dunnington. Last updated 3 days ago.

cpp

2.0 match 183 stars 11.79 score 37 scripts 27 dependents

hiweller

recolorize:Color-Based Image Segmentation

Automatic, semi-automatic, and manual functions for generating color maps from images. The idea is to simplify the colors of an image according to a metric that is useful for the user, using deterministic methods whenever possible. Many images will be clustered well using the out-of-the-box functions, but the package also includes a toolbox of functions for making manual adjustments (layer merging/isolation, blurring, fitting to provided color clusters or those from another image, etc). Also includes export methods for other color/pattern analysis packages (pavo, patternize, colordistance).

Maintained by Hannah Weller. Last updated 15 days ago.

3.0 match 39 stars 7.68 score 87 scripts

kerschke

flacco:Feature-Based Landscape Analysis of Continuous and Constrained Optimization Problems

Tools and features for "Exploratory Landscape Analysis (ELA)" of single-objective continuous optimization problems. Those features are able to quantify rather complex properties, such as the global structure, separability, etc., of the optimization problems.

Maintained by Pascal Kerschke. Last updated 2 years ago.

exploratory-landscape-analysis gui optimization

3.4 match 61 stars 6.70 score 41 scripts

martynplummer

coda:Output Analysis and Diagnostics for MCMC

Provides functions for summarizing and plotting the output from Markov Chain Monte Carlo (MCMC) simulations, as well as diagnostic tests of convergence to the equilibrium distribution of the Markov chain.

Maintained by Martyn Plummer. Last updated 1 year ago.

2.0 match 6 stars 11.33 score 8.3k scripts 1.1k dependents

ropensci

lightr:Read Spectrometric Data and Metadata

Parse various reflectance/transmittance/absorbance spectra file formats to extract spectral data and metadata, as described in Gruson, White & Maia (2019) <doi:10.21105/joss.01857>. Among other formats, it can import files from 'Avantes' <https://www.avantes.com/>, 'CRAIC' <https://www.microspectra.com/>, and 'OceanOptics'/'OceanInsight' <https://www.oceanoptics.com/> brands.

Maintained by Hugo Gruson. Last updated 1 month ago.

file-import reproducibility reproducible-research reproducible-science spectral-data spectroscopy

3.1 match 13 stars 7.11 score 11 scripts 2 dependents

ohdsi

DatabaseConnector:Connecting to Various Database Platforms

An R 'DataBase Interface' ('DBI') compatible interface to various database platforms ('PostgreSQL', 'Oracle', 'Microsoft SQL Server', 'Amazon Redshift', 'Microsoft Parallel Database Warehouse', 'IBM Netezza', 'Apache Impala', 'Google BigQuery', 'Snowflake', 'Spark', 'SQLite', and 'InterSystems IRIS'). Also includes support for fetching data as 'Andromeda' objects. Uses either 'Java Database Connectivity' ('JDBC') or other 'DBI' drivers to connect to databases.

Maintained by Martijn Schuemie. Last updated 2 months ago.

hades openjdk

1.8 match 56 stars 12.63 score 772 scripts 11 dependents

marce10

Rraven:Connecting R and 'Raven' Sound Analysis Software

A tool to exchange data between R and 'Raven' sound analysis software (Cornell Lab of Ornithology). Functions work on data formats compatible with the R package 'warbleR'.

Maintained by Marcelo Araya-Salas. Last updated 2 months ago.

animal raven sounds

3.7 match 10 stars 6.00 score 50 scripts

bioc

structToolbox:Data processing & analysis tools for Metabolomics and other omics

An extensive set of data (pre-)processing and analysis methods and tools for metabolomics and other omics, with a strong emphasis on statistics and machine learning. This toolbox allows the user to build extensive and standardised workflows for data analysis. The methods and tools have been implemented using class-based templates provided by the struct (Statistics in R Using Class-based Templates) package. The toolbox includes pre-processing methods (e.g. signal drift and batch correction, normalisation, missing value imputation and scaling), univariate (e.g. ttest, various forms of ANOVA, Kruskal–Wallis test and more) and multivariate statistical methods (e.g. PCA and PLS, including cross-validation and permutation testing) as well as machine learning methods (e.g. Support Vector Machines). The STATistics Ontology (STATO) has been integrated and implemented to provide standardised definitions for the different methods, inputs and outputs.

Maintained by Gavin Rhys Lloyd. Last updated 27 days ago.

workflowstep metabolomics bioconductor-package dims lc-ms machine-learning multivariate-analysis statistics univariate

3.5 match 10 stars 6.26 score 12 scripts

lau-mel

swamp:Visualization, Analysis and Adjustment of High-Dimensional Data in Respect to Sample Annotations

Collection of functions to connect the structure of the data with the information on the samples. Three types of associations are covered: 1. linear model of principal components. 2. hierarchical clustering analysis. 3. distribution of features-sample annotation associations. Additionally, the inter-relation between sample annotations can be analyzed. Simple methods are provided for the correction of batch effects and removal of principal components.

Maintained by Martin Lauss. Last updated 5 years ago.

9.1 match 2.42 score 29 scripts 1 dependent

bioc

cogeqc:Systematic quality checks on comparative genomics analyses

cogeqc aims to facilitate systematic quality checks on standard comparative genomics analyses to help researchers detect issues and select the most suitable parameters for each data set. cogeqc can be used to assess: i. genome assembly and annotation quality with BUSCOs and comparisons of statistics with publicly available genomes on the NCBI; ii. orthogroup inference using a protein domain-based approach; and iii. synteny detection using synteny network properties. There are also data visualization functions to explore QC summary statistics.

Maintained by Fabrício Almeida-Silva. Last updated 5 months ago.

software genomeassembly comparativegenomics functionalgenomics phylogenetics qualitycontrol network comparative-genomics evolutionary-genomics

3.6 match 10 stars 6.08 score 20 scripts

eagerai

fastai:Interface to 'fastai'

The 'fastai' <https://docs.fast.ai/index.html> library simplifies training fast and accurate neural networks using modern best practices. It is based on research into deep learning best practices undertaken at 'fast.ai', including 'out of the box' support for vision, text, tabular, audio, time series, and collaborative filtering models.

Maintained by Turgut Abdullayev. Last updated 11 months ago.

audio collaborative-filtering darknet darknet-image-classification fastai medical object-detection tabular text vision

2.3 match 118 stars 9.40 score 76 scripts

bioc

GWASTools:Tools for Genome Wide Association Studies

Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis.

Maintained by Stephanie M. Gogarten. Last updated 5 months ago.

snp geneticvariability qualitycontrol microarray

2.0 match 17 stars 10.50 score 396 scripts 5 dependents

niaid

dsb:Normalize & Denoise Droplet Single Cell Protein Data (CITE-Seq)

This lightweight R package provides a method for normalizing and denoising protein expression data from droplet based single cell experiments. Raw protein Unique Molecular Index (UMI) counts from sequencing DNA-conjugated antibody derived tags (ADT) in droplets (e.g. 'CITE-seq') have substantial measurement noise. Our experiments and computational modeling revealed two major components of this noise: 1) protein-specific noise originating from ambient, unbound antibody encapsulated in droplets that can be accurately inferred via the expected protein counts detected in empty droplets, and 2) droplet/cell-specific noise revealed via the shared variance component associated with isotype antibody controls and background protein counts in each cell. This package normalizes and removes both of these sources of noise from raw protein data derived from methods such as 'CITE-seq', 'REAP-seq', 'ASAP-seq', 'TEA-seq', 'proteogenomic' data from the Mission Bio platform, etc. See the vignette for tutorials on how to integrate dsb with 'Seurat' and 'Bioconductor' and how to use dsb in 'Python'. Please see our paper Mulè M.P., Martins A.J., and Tsang J.S. Nature Communications 2022 <https://www.nature.com/articles/s41467-022-29356-8> for more details on the method.

Maintained by Matthew Mulè. Last updated 9 months ago.

cite-seq niaid-tsang-lab

2.7 match 65 stars 7.73 score 104 scripts

cran

qpcR:Modelling and Analysis of Real-Time PCR Data

Model fitting, optimal model selection and calculation of various features that are essential in the analysis of quantitative real-time polymerase chain reaction (qPCR).

Maintained by Andrej-Nikolai Spiess. Last updated 7 years ago.

6.8 match 2 stars 3.06 score 1 dependent

bioc

chevreulProcess:Tools for managing SingleCellExperiment objects as projects

Tools for analyzing SingleCellExperiment objects as projects, for input into the Chevreul app downstream. Includes functions for analysis of single cell RNA sequencing data. Supported by NIH grants R01CA137124 and R01EY026661 to David Cobrinik.

Maintained by Kevin Stachelek. Last updated 1 month ago.

coverage rnaseq sequencing visualization geneexpression transcription singlecell transcriptomics normalization preprocessing qualitycontrol dimensionreduction dataimport

3.8 match 5.38 score 2 scripts 2 dependents

felixfan

FinCal:Time Value of Money, Time Series Analysis and Computational Finance

Package for time value of money calculation, time series analysis and computational finance.

Maintained by Felix Yanhui Fan. Last updated 8 years ago.

3.3 match 23 stars 6.02 score 203 scripts 1 dependent

bioc

PhosR:A set of methods and tools for comprehensive analysis of phosphoproteomics data

PhosR is a package for the comprehensive analysis of phosphoproteomic data. There are two major components to PhosR: processing and downstream analysis. PhosR consists of various processing tools for phosphoproteomics data including filtering, imputation, normalisation, and functional analysis for inferring active kinases and signalling pathways.

Maintained by Taiyun Kim. Last updated 5 months ago.

software researchfield proteomics

4.2 match 4.71 score 51 scripts

palderman

DSSAT:A Comprehensive R Interface for the DSSAT Cropping Systems Model

The purpose of this package is to provide a comprehensive R interface to the Decision Support System for Agrotechnology Transfer Cropping Systems Model (DSSAT-CSM; see <https://dssat.net> for more information). The package provides cross-platform functions to read and write input files, run DSSAT-CSM, and read output files.

Maintained by Phillip D. Alderman. Last updated 1 year ago.

3.5 match 22 stars 5.57 score 34 scripts

danchaltiel

crosstable:Crosstables for Descriptive Analyses

Create descriptive tables for continuous and categorical variables. Apply summary statistics and counting function, with or without a grouping variable, and create beautiful reports using 'rmarkdown' or 'officer'. You can also compute effect sizes and statistical tests if needed.

Maintained by Dan Chaltiel. Last updated 2 months ago.

descriptive-statistics flextable frequency-table html-report msword officer

1.9 match 116 stars 10.37 score 340 scripts

bioc

DESeq2:Differential gene expression analysis based on the negative binomial distribution

Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.

Maintained by Michael Love. Last updated 13 days ago.

sequencing rnaseq chipseq geneexpression transcription normalization differentialexpression bayesian regression principalcomponent clustering immunooncology openblas cpp

1.2 match 375 stars 16.11 score 17k scripts 115 dependents

bioc

MAPFX:MAssively Parallel Flow cytometry Xplorer (MAPFX): A Toolbox for Analysing Data from the Massively-Parallel Cytometry Experiments

MAPFX is an end-to-end toolbox that pre-processes the raw data from MPC experiments (e.g., BioLegend's LEGENDScreen and BD Lyoplates assays), and further imputes the ‘missing’ infinity markers in the wells without those measurements. The pipeline starts by performing background correction on raw intensities to remove the noise from electronic baseline restoration and fluorescence compensation by adapting a normal-exponential convolution model. Unwanted technical variation, from sources such as well effects, is then removed using a log-normal model with plate, column, and row factors, after which infinity markers are imputed using the informative backbone markers as predictors. The completed dataset can then be used for clustering and other statistical analyses. MAPFX can also be used to normalise data from FFC assays.

Maintained by Hsiao-Chi Liao. Last updated 5 months ago.

software flowcytometry cellbasedassays singlecell proteomics clustering

4.3 match 1 star 4.54 score

vimc

orderly:Lightweight Reproducible Reporting

Order, create and store reports from R. By defining a lightweight interface around the inputs and outputs of an analysis, a lot of the repetitive work for reproducible research can be automated. We define a simple format for organising and describing work that facilitates collaborative reproducible research and acknowledges that all analyses are run multiple times over their lifespans.

Maintained by Rich FitzJohn. Last updated 2 years ago.

2.0 match 117 stars 9.63 score 94 scripts 4 dependents

bioc

scuttle:Single-Cell RNA-Seq Analysis Utilities

Provides basic utility functions for performing single-cell analyses, focusing on simple normalization, quality control and data transformations. Also provides some helper functions to assist development of other packages.

Maintained by Aaron Lun. Last updated 5 months ago.

immunooncology singlecell rnaseq qualitycontrol preprocessing normalization transcriptomics geneexpression sequencing software dataimport openblas cpp

1.9 match 10.21 score 1.7k scripts 80 dependents

marce10

warbleR:Streamline Bioacoustic Analysis

Functions aiming to facilitate the analysis of the structure of animal acoustic signals in 'R'. 'warbleR' makes use of the basic sound analysis tools from the packages 'tuneR' and 'seewave', and offers new tools to explore and quantify acoustic signal structure. The package allows users to organize and manipulate multiple sound files, create spectrograms of complete recordings or individual signals in different formats, run several measures of acoustic structure, and characterize different structural levels in acoustic signals.

Maintained by Marcelo Araya-Salas. Last updated 2 months ago.

animal-acoustic-signals audio-processing bioacoustics spectrogram streamline-analysis cpp

1.8 match 56 stars 10.86 score 270 scripts 4 dependents

r-arcgis

arcgisgeocode:A Robust Interface to ArcGIS 'Geocoding Services'

A very fast and robust interface to ArcGIS 'Geocoding Services'. Provides capabilities for reverse geocoding, finding address candidates, character-by-character search autosuggestion, and batch geocoding. The public 'ArcGIS World Geocoder' is accessible for free use via 'arcgisgeocode' for all services except batch geocoding. 'arcgisgeocode' also integrates with 'arcgisutils' to provide access to custom locators or private 'ArcGIS World Geocoder' hosted on 'ArcGIS Enterprise'. Learn more in the 'Geocode service' API reference <https://developers.arcgis.com/rest/geocode/api-reference/overview-world-geocoding-service.htm>.

Maintained by Josiah Parry. Last updated 2 months ago.

rust cargo

2.8 match 41 stars 6.82 score 20 scripts 1 dependents

schochastics

networkdata:Repository of Network Datasets

The package contains a large collection of network datasets from different contexts. This includes social networks, animal networks and movie networks. All datasets are in 'igraph' format.

Maintained by David Schoch. Last updated 12 months ago.

dataset network-analysis

3.8 match 143 stars 5.01 score 143 scripts

zarquon42b

Morpho:Calculations and Visualisations Related to Geometric Morphometrics

A toolset for Geometric Morphometrics and mesh processing. This includes (among other stuff) mesh deformations based on reference points, permutation tests, detection of outliers, processing of sliding semi-landmarks and semi-automated surface landmark placement.

Maintained by Stefan Schlager. Last updated 5 months ago.

openblas cpp openmp

1.9 match 51 stars 10.00 score 218 scripts 13 dependents

bioc

POMA:Tools for Omics Data Analysis

The POMA package offers a comprehensive toolkit designed for omics data analysis, streamlining the process from initial visualization to final statistical analysis. Its primary goal is to simplify and unify the various steps involved in omics data processing, making it more accessible and manageable within a single, intuitive R package. Emphasizing reproducibility and user-friendliness, POMA leverages the standardized SummarizedExperiment class from Bioconductor, ensuring seamless integration and compatibility with a wide array of Bioconductor tools. This approach guarantees maximum flexibility and replicability, making POMA an essential asset for researchers handling omics datasets. See https://github.com/pcastellanoescuder/POMAShiny and the paper Castellano-Escuder et al. (2021) <doi:10.1371/journal.pcbi.1009148> for more details.

Maintained by Pol Castellano-Escuder. Last updated 4 months ago.

batcheffect classification clustering decisiontree dimensionreduction multidimensionalscaling normalization preprocessing principalcomponent regression rnaseq software statisticalmethod visualization bioconductor bioinformatics data-visualization dimension-reduction exploratory-data-analysis machine-learning omics-data-integration pipeline pre-processing statistical-analysis user-friendly workflow

2.3 match 11 stars 8.23 score 20 scripts 1 dependents

welch-lab

rliger:Linked Inference of Genomic Experimental Relationships

Uses an extension of nonnegative matrix factorization to identify shared and dataset-specific factors. See Welch J, Kozareva V, et al (2019) <doi:10.1016/j.cell.2019.05.006>, and Liu J, Gao C, Sodicoff J, et al (2020) <doi:10.1038/s41596-020-0391-8> for more details.

Maintained by Yichen Wang. Last updated 2 months ago.

nonnegative-matrix-factorization single-cell openblas cpp

1.7 match 408 stars 10.77 score 334 scripts 1 dependents

ha-pu

globaltrends:Download and Measure Global Trends Through Google Search Volumes

Google offers public access to global search volumes from its search engine through the Google Trends portal. The package downloads these search volumes provided by Google Trends and uses them to measure and analyze the distribution of search scores across countries or within countries. The package allows researchers and analysts to use these search scores to investigate global trends based on patterns within these scores. This offers insights such as degree of internationalization of firms and organizations or dissemination of political, social, or technological trends across the globe or within single countries. An outline of the package's methodological foundations and potential applications is available as a working paper: <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3969013>.

Maintained by Harald Puhr. Last updated 2 years ago.

google-trends internationalization

3.7 match 18 stars 5.00 score 11 scripts

rfastofficial

Rfast2:A Collection of Efficient and Extremely Fast R Functions II

A collection of fast statistical and utility functions for data analysis. Functions for regression, maximum likelihood, column-wise statistics and many more have been included. C++ has been utilized to speed up the functions. References: Tsagris M., Papadakis M. (2018). Taking R to its limits: 70+ tips. PeerJ Preprints 6:e26605v1 <doi:10.7287/peerj.preprints.26605v1>.

Maintained by Manos Papadakis. Last updated 1 year ago.

openblas cpp openmp

2.3 match 38 stars 8.09 score 75 scripts 26 dependents

usdaforestservice

gdalraster:Bindings to the 'Geospatial Data Abstraction Library' Raster API

Interface to the Raster API of the 'Geospatial Data Abstraction Library' ('GDAL', <https://gdal.org>). Bindings are implemented in an exposed C++ class encapsulating a 'GDALDataset' and its raster band objects, along with several stand-alone functions. These support manual creation of uninitialized datasets, creation from existing raster as template, read/set dataset parameters, low level I/O, color tables, raster attribute tables, virtual raster (VRT), and 'gdalwarp' wrapper for reprojection and mosaicing. Includes 'GDAL' algorithms ('dem_proc()', 'polygonize()', 'rasterize()', etc.), and functions for coordinate transformation and spatial reference systems. Calling signatures resemble the native C, C++ and Python APIs provided by the 'GDAL' project. Includes raster 'calc()' to evaluate a given R expression on a layer or stack of layers, with pixel x/y available as variables in the expression; and raster 'combine()' to identify and count unique pixel combinations across multiple input layers, with optional output of the pixel-level combination IDs. Provides raster display using base 'graphics'. Bindings to a subset of the 'OGR' API are also included for managing vector data sources. Bindings to a subset of the Virtual Systems Interface ('VSI') are also included to support operations on 'GDAL' virtual file systems. These are general utility functions that abstract file system operations on URLs, cloud storage services, 'Zip'/'GZip'/'7z'/'RAR' archives, and in-memory files. 'gdalraster' may be useful in applications that need scalable, low-level I/O, or prefer a direct 'GDAL' API.

Maintained by Chris Toney. Last updated 9 hours ago.

gdal geospatial raster vector cpp

1.9 match 42 stars 9.51 score 32 scripts 3 dependents

ohdsi

ResultModelManager:Result Model Manager

Database data model management utilities for R packages in the Observational Health Data Sciences and Informatics program <https://ohdsi.org>. 'ResultModelManager' provides utility functions to allow package maintainers to migrate existing SQL database models, export and import results in consistent patterns.

Maintained by Jamie Gilbert. Last updated 6 months ago.

openjdk

2.4 match 4 stars 7.38 score 9 scripts 3 dependents

bioc

bnbc:Bandwise normalization and batch correction of Hi-C data

Tools to normalize (several) Hi-C data from replicates.

Maintained by Kipper Fletez-Brant. Last updated 5 months ago.

hic preprocessing normalization software cpp

4.6 match 1 star 3.88 score 15 scripts

bioc

methylKit:DNA methylation analysis from high-throughput bisulfite sequencing results

methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also target-capture methods and whole genome bisulfite sequencing. It also has functions to analyze base-pair resolution 5hmC data from experimental protocols such as oxBS-Seq and TAB-Seq. Methylation calling can be performed directly from Bismark aligned BAM files.

Maintained by Altuna Akalin. Last updated 18 days ago.

dnamethylation sequencing methylseq genome-biology methylation statistical-analysis visualization curl bzip2 xz-utils zlib cpp

1.5 match 220 stars 11.80 score 578 scripts 3 dependents

bioc

cydar:Using Mass Cytometry for Differential Abundance Analyses

Identifies differentially abundant populations between samples and groups in mass cytometry data. Provides methods for counting cells into hyperspheres, controlling the spatial false discovery rate, and visualizing changes in abundance in the high-dimensional marker space.

Maintained by Aaron Lun. Last updated 5 months ago.

immunooncology flowcytometry multiplecomparison proteomics singlecell cpp

3.1 match 5.64 score 48 scripts

tjaki

PK:Basic Non-Compartmental Pharmacokinetics

Estimation of pharmacokinetic parameters using non-compartmental theory.

Maintained by Thomas Jaki. Last updated 2 years ago.

6.8 match 2.59 score 13 scripts 1 dependents

psegaert

mrfDepth:Depth Measures in Multivariate, Regression and Functional Settings

Tools to compute depth measures and implementations of related tasks such as outlier detection, data exploration and classification of multivariate, regression and functional data.

Maintained by Jakob Raymaekers. Last updated 6 years ago.

fortran openblas cpp

3.5 match 3 stars 4.99 score 72 scripts 3 dependents

polar-fhir

fhircrackr:Handling HL7 FHIR® Resources in R

Useful tools for conveniently downloading FHIR resources in xml format and converting them to R data.frames. The package uses FHIR-search to download bundles from a FHIR server, provides functions to save and read xml-files containing such bundles and allows flattening the bundles to data.frames using XPath expressions. FHIR® is the registered trademark of HL7 and is used with the permission of HL7. Use of the FHIR trademark does not constitute endorsement of this product by HL7.

Maintained by Julia Palm. Last updated 13 days ago.

fhir fhir-client

2.3 match 33 stars 7.63 score 46 scripts

bioc

wateRmelon:Illumina DNA methylation array normalization and metrics

15 flavours of betas and three performance metrics, with methods for objects produced by methylumi and minfi packages.

Maintained by Leo C Schalkwyk. Last updated 4 months ago.

dnamethylation microarray twochannel preprocessing qualitycontrol

2.3 match 7.75 score 247 scripts 2 dependents

bioc

scrapper:Bindings to C++ Libraries for Single-Cell Analysis

Implements R bindings to C++ code for analyzing single-cell (expression) data, mostly from various libscran libraries. Each function performs an individual step in the single-cell analysis workflow, ranging from quality control to clustering and marker detection. It is mostly intended for other Bioconductor package developers to build more user-friendly end-to-end workflows.

Maintained by Aaron Lun. Last updated 6 days ago.

normalization rnaseq software geneexpression transcriptomics singlecell batcheffect qualitycontrol differentialexpression featureextraction principalcomponent clustering openblas cpp

3.1 match 5.55 score 32 scripts

tudo-r

BatchExperiments:Statistical Experiments on Batch Computing Clusters

Extends the BatchJobs package to run statistical experiments on batch computing clusters. For further details see the project web page.

Maintained by Michel Lang. Last updated 3 years ago.

3.5 match 17 stars 4.90 score 47 scripts

mlampros

textTinyR:Text Processing for Small or Big Data Files

It offers functions for splitting, parsing, tokenizing and creating a vocabulary for big text data files. Moreover, it includes functions for building a document-term matrix and extracting information from those (term-associations, most frequent terms). It also embodies functions for calculating token statistics (collocations, look-up tables, string dissimilarities) and functions to work with sparse matrices. Lastly, it includes functions for Word Vector Representations (i.e. 'GloVe', 'fasttext') and incorporates functions for the calculation of (pairwise) text document dissimilarities. The source code is based on 'C++11' and exported in R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.

Maintained by Lampros Mouselimis. Last updated 1 year ago.

bh boost cpp11 processing rcpp rcpparmadillo text openblas cpp openmp

2.3 match 38 stars 7.64 score 244 scripts 1 dependents

bioc

bluster:Clustering Algorithms for Bioconductor

Wraps common clustering algorithms in an easily extended S4 framework. Backends are implemented for hierarchical, k-means and graph-based clustering. Several utilities are also provided to compare and evaluate clustering results.

Maintained by Aaron Lun. Last updated 5 months ago.

immunooncology software geneexpression transcriptomics singlecell clustering cpp

1.8 match 9.43 score 636 scripts 51 dependents