R-universe search: needs:mclust

bioc

minfi:Analyze Illumina Infinium DNA methylation arrays

Tools to analyze & visualize Illumina Infinium methylation arrays.

Maintained by Kasper Daniel Hansen. Last updated 4 months ago.

immunooncology dnamethylation differentialmethylation epigenetics microarray methylationarray multichannel twochannel dataimport normalization preprocessing qualitycontrol

60 stars 12.82 score 996 scripts 27 dependents

alexiosg

rugarch:Univariate GARCH Models

ARFIMA, in-mean, external regressors and various GARCH flavors, with methods for fit, forecast, simulation, inference and plotting.

Maintained by Alexios Galanos. Last updated 3 months ago.

cpp

26 stars 12.25 score 1.3k scripts 16 dependents

jamesramsay5

fda:Functional Data Analysis

These functions were developed to support functional data analysis as described in Ramsay, J. O. and Silverman, B. W. (2005) Functional Data Analysis. New York: Springer and in Ramsay, J. O., Hooker, Giles, and Graves, Spencer (2009). Functional Data Analysis with R and Matlab (Springer). The package includes data sets and script files that work through many examples, including all but one of the 76 figures in this latter book. Matlab versions are available by ftp from <https://www.psych.mcgill.ca/misc/fda/downloads/FDAfuns/>.

Maintained by James Ramsay. Last updated 4 months ago.

3 stars 11.88 score 2.0k scripts 142 dependents

bioc

methylKit:DNA methylation analysis from high-throughput bisulfite sequencing results

methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also target-capture methods and whole genome bisulfite sequencing. It also has functions to analyze base-pair resolution 5hmC data from experimental protocols such as oxBS-Seq and TAB-Seq. Methylation calling can be performed directly from Bismark aligned BAM files.

Maintained by Altuna Akalin. Last updated 1 month ago.

dnamethylation sequencing methylseq genome-biology methylation statistical-analysis visualization curl bzip2 xz-utils zlib cpp

224 stars 11.78 score 578 scripts 3 dependents

bioc

Maaslin2:"Multivariable Association Discovery in Population-scale Meta-omics Studies"

MaAsLin2 is a comprehensive R package for efficiently determining multivariable association between clinical metadata and microbial meta'omic features. MaAsLin2 relies on general linear models to accommodate most modern epidemiological study designs, including cross-sectional and longitudinal, and offers a variety of data exploration, normalization, and transformation methods. MaAsLin2 is the next generation of MaAsLin.

Maintained by Lauren McIver. Last updated 5 months ago.

metagenomics software microbiome normalization biobakery bioconductor differential-abundance-analysis false-discovery-rate multiple-covariates public repeated-measures tools

133 stars 11.03 score 532 scripts 3 dependents

r-lum

Luminescence:Comprehensive Luminescence Dating Data Analysis

A collection of various R functions for the purpose of Luminescence dating data analysis. This includes, amongst others, data import, export, application of age models, curve deconvolution, sequence analysis and plotting of equivalent dose distributions.

Maintained by Sebastian Kreutzer. Last updated 13 hours ago.

bayesian-statistics data-science geochronology luminescence luminescence-dating open-science osl plotting radiofluorescence tl xsyg cpp

16 stars 10.67 score 178 scripts 8 dependents

dicook

nullabor:Tools for Graphical Inference

Tools for visual inference. Generate null data sets and null plots using permutation and simulation. Calculate distance metrics for a lineup, and examine the distributions of metrics.

Maintained by Di Cook. Last updated 2 months ago.

57 stars 10.38 score 370 scripts 2 dependents

egeulgen

pathfindR:Enrichment Analysis Utilizing Active Subnetworks

Enrichment analysis enables researchers to uncover mechanisms underlying a phenotype. However, conventional methods for enrichment analysis do not take into account protein-protein interaction information, resulting in incomplete conclusions. 'pathfindR' is a tool for enrichment analysis utilizing active subnetworks. The main function identifies active subnetworks in a protein-protein interaction network using a user-provided list of genes and associated p values. It then performs enrichment analyses on the identified subnetworks, identifying enriched terms (i.e. pathways or, more broadly, gene sets) that possibly underlie the phenotype of interest. 'pathfindR' also offers functionalities to cluster the enriched terms and identify representative terms in each cluster, to score the enriched terms per sample and to visualize analysis results. The enrichment, clustering and other methods implemented in 'pathfindR' are described in detail in Ulgen E, Ozisik O, Sezerman OU. 2019. 'pathfindR': An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet. <doi:10.3389/fgene.2019.00858>.

Maintained by Ege Ulgen. Last updated 1 month ago.

active-subnetworks enrichment pathway pathway-enrichment-analysis subnetwork

187 stars 10.38 score 138 scripts

robjhyndman

hdrcde:Highest Density Regions and Conditional Density Estimation

Computation of highest density regions in one and two dimensions, kernel estimation of univariate density functions conditional on one covariate, and multimodal regression.

Maintained by Rob Hyndman. Last updated 2 years ago.

fortran

24 stars 10.38 score 128 scripts 158 dependents

bioc

pRoloc:A unifying bioinformatics framework for spatial proteomics

The pRoloc package implements machine learning and visualisation methods for the analysis and interrogation of quantitative mass spectrometry data to reliably infer protein sub-cellular localisation.

Maintained by Lisa Breckels. Last updated 4 days ago.

immunooncology proteomics massspectrometry classification clustering qualitycontrol bioconductor proteomics-data spatial-proteomics visualisation openblas cpp

15 stars 10.31 score 101 scripts 2 dependents

tarnduong

ks:Kernel Smoothing

Kernel smoothers for univariate and multivariate data, with comprehensive visualisation and bandwidth selection capabilities, including for densities, density derivatives, cumulative distributions, clustering, classification, density ridges, significant modal regions, and two-sample hypothesis tests. Chacon & Duong (2018) <doi:10.1201/9780429485572>.

Maintained by Tarn Duong. Last updated 6 months ago.

6 stars 10.19 score 920 scripts 262 dependents

bioc

singleCellTK:Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data

The Single Cell Toolkit (SCTK) in the singleCellTK package provides an interface to popular tools for importing, quality control, analysis, and visualization of single cell RNA-seq data. SCTK allows users to seamlessly integrate tools from various packages at different stages of the analysis workflow. A general "a la carte" workflow gives users access to multiple methods for data importing, calculation of general QC metrics, doublet detection, ambient RNA estimation and removal, filtering, normalization, batch correction or integration, dimensionality reduction, 2-D embedding, clustering, marker detection, differential expression, cell type labeling, pathway analysis, and data exporting. Curated workflows can be used to run Seurat and Celda. Streamlined quality control can be performed on the command line using the SCTK-QC pipeline. Users can analyze their data using commands in the R console or by using an interactive Shiny Graphical User Interface (GUI). Specific analyses or entire workflows can be summarized and shared with comprehensive HTML reports generated by Rmarkdown. Additional documentation and vignettes can be found at camplab.net/sctk.

Maintained by Joshua David Campbell. Last updated 1 month ago.

singlecell geneexpression differentialexpression alignment clustering immunooncology batcheffect normalization qualitycontrol dataimport gui

182 stars 10.17 score 252 scripts

refunders

refund:Regression with Functional Data

Methods for regression for functional data, including function-on-scalar, scalar-on-function, and function-on-function regression. Some of the functions are applicable to image data.

Maintained by Julia Wrobel. Last updated 6 months ago.

43 stars 10.11 score 472 scripts 17 dependents

mhahsler

stream:Infrastructure for Data Stream Mining

A framework for data stream modeling and associated data mining tasks such as clustering and classification. The development of this package was supported in part by NSF IIS-0948893, NSF CMMI 1728612, and NIH R21HG005912. Hahsler et al (2017) <doi:10.18637/jss.v076.i14>.

Maintained by Michael Hahsler. Last updated 19 days ago.

data-stream-clustering datastream stream-mining cpp

39 stars 10.05 score 132 scripts 3 dependents

bioc

methylumi:Handle Illumina methylation data

This package provides classes for holding and manipulating Illumina methylation data. Based on eSet, it can contain MIAME information, sample information, feature information, and multiple matrices of data. An "intelligent" import function, methylumiR, can read the Illumina text files and create a MethyLumiSet. methylumIDAT can directly read raw IDAT files from HumanMethylation27 and HumanMethylation450 microarrays. Normalization, background correction, and quality control features for GoldenGate, Infinium, and Infinium HD arrays are also included.

Maintained by Sean Davis. Last updated 5 months ago.

dnamethylation twochannel preprocessing qualitycontrol cpgisland

9 stars 9.90 score 89 scripts 9 dependents

bioc

PureCN:Copy number calling and SNV classification using targeted short read sequencing

This package estimates tumor purity, copy number, and loss of heterozygosity (LOH), and classifies single nucleotide variants (SNVs) by somatic status and clonality. PureCN is designed for targeted short read sequencing data, integrates well with standard somatic variant detection and copy number pipelines, and has support for tumor samples without matching normal samples.

Maintained by Markus Riester. Last updated 1 day ago.

copynumbervariation software sequencing variantannotation variantdetection coverage immunooncology bioconductor-package cell-free-dna copy-number loh tumor-heterogeneity tumor-mutational-burden tumor-purity

132 stars 9.88 score 40 scripts

moviedo5

fda.usc:Functional Data Analysis and Utilities for Statistical Computing

Routines for exploratory and descriptive analysis of functional data such as depth measurements, atypical curves detection, regression models, supervised classification, unsupervised classification and functional analysis of variance.

Maintained by Manuel Oviedo de la Fuente. Last updated 5 months ago.

functional-data-analysis fortran

12 stars 9.72 score 560 scripts 22 dependents

bblonder

hypervolume:High Dimensional Geometry, Set Operations, Projection, and Inference Using Kernel Density Estimation, Support Vector Machines, and Convex Hulls

Estimates the shape and volume of high-dimensional datasets and performs set operations: intersection / overlap, union, unique components, inclusion test, and hole detection. Uses a stochastic geometry approach to high-dimensional kernel density estimation, support vector machine delineation, and convex hull generation. Applications include modeling trait and niche hypervolumes and species distribution modeling.

Maintained by Benjamin Blonder. Last updated 2 months ago.

openblas cpp

23 stars 9.69 score 211 scripts 7 dependents

hafen

trelliscopejs:Create Interactive Trelliscope Displays

Trelliscope is a scalable, flexible, interactive approach to visualizing data (Hafen, 2013 <doi:10.1109/LDAV.2013.6675164>). This package provides methods that make it easy to create a Trelliscope display specification for TrelliscopeJS. High-level functions are provided for creating displays from within 'tidyverse' or 'ggplot2' workflows. Low-level functions are also provided for creating new interfaces.

Maintained by Ryan Hafen. Last updated 1 year ago.

visualization

262 stars 9.61 score 1000 scripts 1 dependent

bioc

Nebulosa:Single-Cell Data Visualisation Using Kernel Gene-Weighted Density Estimation

This package provides an enhanced visualization of single-cell data based on gene-weighted density estimation. Nebulosa recovers the signal from dropped-out features and allows the inspection of the joint expression from multiple features (e.g. genes). Seurat and SingleCellExperiment objects can be used within Nebulosa.

Maintained by Jose Alquicira-Hernandez. Last updated 5 months ago.

software geneexpression singlecell visualization dimensionreduction single-cell single-cell-analysis single-cell-multiomics single-cell-rna-seq

99 stars 9.52 score 494 scripts

immunomind

immunarch:Bioinformatics Analysis of T-Cell and B-Cell Immune Repertoires

A comprehensive framework for bioinformatics exploratory analysis of bulk and single-cell T-cell receptor and antibody repertoires. It provides seamless data loading, analysis and visualisation for AIRR (Adaptive Immune Receptor Repertoire) data, both bulk immunosequencing (RepSeq) and single-cell sequencing (scRNAseq). Immunarch implements most of the widely used AIRR analysis methods, such as: clonality analysis, estimation of repertoire similarities in distribution of clonotypes and gene segments, repertoire diversity analysis, annotation of clonotypes using external immune receptor databases and clonotype tracking in vaccination and cancer studies. A successor to our previously published 'tcR' immunoinformatics package (Nazarov 2015) <doi:10.1186/s12859-015-0613-1>.

Maintained by Vadim I. Nazarov. Last updated 1 year ago.

airr-analysis b-cell-receptor bcr bcr-repertoire bioinformatics ig ig-repertoire immune-repertoire immune-repertoire-analysis immune-repertoire-data immunoglobulin immunoinformatics immunology rep-seq repertoire-analysis single-cell single-cell-analysis t-cell-receptor tcr tcr-repertoire cpp

316 stars 9.49 score 203 scripts

ecospat

ecospat:Spatial Ecology Miscellaneous Methods

Collection of R functions and data sets for the support of spatial ecology analyses with a focus on pre, core and post modelling analyses of species distribution, niche quantification and community assembly. Written by current and former members and collaborators of the ecospat group of Antoine Guisan, Department of Ecology and Evolution (DEE) and Institute of Earth Surface Dynamics (IDYST), University of Lausanne, Switzerland. Read Di Cola et al. (2016) <doi:10.1111/ecog.02671> for details.

Maintained by Olivier Broennimann. Last updated 2 months ago.

32 stars 9.35 score 418 scripts 1 dependent

chrhennig

fpc:Flexible Procedures for Clustering

Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance based clustering including corrected Rand index. Standardisation of cluster validation statistics by random clusterings and comparison between many clustering methods and numbers of clusters based on this. Cluster-wise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther's prediction strength, Fang and Wang's bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.

Maintained by Christian Hennig. Last updated 6 months ago.

11 stars 9.32 score 2.6k scripts 69 dependents

matthias-da

robCompositions:Compositional Data Analysis

Methods for analysis of compositional data including robust methods (<doi:10.1007/978-3-319-96422-5>), imputation of missing values (<doi:10.1016/j.csda.2009.11.023>), methods to replace rounded zeros (<doi:10.1080/02664763.2017.1410524>, <doi:10.1016/j.chemolab.2016.04.011>, <doi:10.1016/j.csda.2012.02.012>), count zeros (<doi:10.1177/1471082X14535524>), methods to deal with essential zeros (<doi:10.1080/02664763.2016.1182135>), (robust) outlier detection for compositional data, (robust) principal component analysis for compositional data, (robust) factor analysis for compositional data, (robust) discriminant analysis for compositional data (Fisher rule), robust regression with compositional predictors, functional data analysis (<doi:10.1016/j.csda.2015.07.007>) and p-splines (<doi:10.1016/j.csda.2015.07.007>), contingency (<doi:10.1080/03610926.2013.824980>) and compositional tables (<doi:10.1111/sjos.12326>, <doi:10.1111/sjos.12223>, <doi:10.1080/02664763.2013.856871>) and (robust) Anderson-Darling normality tests for compositional data as well as popular log-ratio transformations (addLR, cenLR, isomLR, and their inverse transformations). In addition, visualisation and diagnostic tools are implemented as well as high and low-level plot functions for the ternary diagram.

Maintained by Matthias Templ. Last updated 1 month ago.

cpp

11 stars 9.21 score 226 scripts 2 dependents

andrewljackson

SIBER:Stable Isotope Bayesian Ellipses in R

Fits bi-variate ellipses to stable isotope data using Bayesian inference with the aim being to describe and compare their isotopic niche.

Maintained by Andrew Jackson. Last updated 10 months ago.

community-ecology ecology niche-modelling stable-isotopes jags cpp

37 stars 9.15 score 187 scripts 1 dependent

bioc

Banksy:Spatial transcriptomic clustering

Banksy is an R package that incorporates spatial information to cluster cells in a feature space (e.g. gene expression). To incorporate spatial information, BANKSY computes the mean neighborhood expression and azimuthal Gabor filters that capture gene expression gradients. These features are combined with the cell's own expression to embed cells in a neighbor-augmented product space which can then be clustered, allowing for accurate and spatially-aware cell typing and tissue domain segmentation.

Maintained by Joseph Lee. Last updated 28 days ago.

clustering spatial singlecell geneexpression dimensionreduction clustering-algorithm single-cell-omics spatial-omics

90 stars 9.03 score 248 scripts

bioc

scPipe:Pipeline for single cell multi-omic data pre-processing

A preprocessing pipeline for single cell RNA-seq/ATAC-seq data that starts from the fastq files and produces a feature count matrix with associated quality control information. It can process fastq data generated by CEL-seq, MARS-seq, Drop-seq, Chromium 10x and SMART-seq protocols.

Maintained by Shian Su. Last updated 4 months ago.

immunooncology software sequencing rnaseq geneexpression singlecell visualization sequencematching preprocessing qualitycontrol genomeannotation dataimport curl bzip2 xz-utils zlib cpp

68 stars 9.02 score 84 scripts

bioc

scone:Single Cell Overview of Normalized Expression data

SCONE is an R package for comparing and ranking the performance of different normalization schemes for single-cell RNA-seq and other high-throughput analyses.

Maintained by Davide Risso. Last updated 1 month ago.

immunooncology normalization preprocessing qualitycontrol geneexpression rnaseq software transcriptomics sequencing singlecell coverage

53 stars 9.00 score 104 scripts

data-edu

tidyLPA:Easily Carry Out Latent Profile Analysis (LPA) Using Open-Source or Commercial Software

Easily carry out latent profile analysis ("LPA"), determine the correct number of classes based on best practices, and tabulate and plot the results. Provides functionality to estimate commonly-specified models with free means, variances, and covariances for each profile. Follows a tidy approach, in that output is in the form of a data frame that can subsequently be computed on. Models can be estimated using the free open source 'R' packages 'Mclust' and 'OpenMx', or using the commercial program 'MPlus', via the 'MplusAutomation' package.

Maintained by Joshua M Rosenberg. Last updated 1 year ago.

58 stars 8.76 score 121 scripts

r-forge

ClassDiscovery:Classes and Methods for "Class Discovery" with Microarrays or Proteomics

Defines the classes used for "class discovery" problems in the OOMPA project (<http://oompa.r-forge.r-project.org/>). Class discovery primarily consists of unsupervised clustering methods with attempts to assess their statistical significance.

Maintained by Kevin R. Coombes. Last updated 2 months ago.

microarray clustering

8.53 score 85 scripts 9 dependents

alexiosg

rmgarch:Multivariate GARCH Models

Feasible multivariate GARCH models including DCC, GO-GARCH and Copula-GARCH.

Maintained by Alexios Galanos. Last updated 3 months ago.

openblas cpp openmp

14 stars 8.51 score 294 scripts 2 dependents

hmorlon

RPANDA:Phylogenetic ANalyses of DiversificAtion

Implements macroevolutionary analyses on phylogenetic trees. See Morlon et al. (2010) <DOI:10.1371/journal.pbio.1000493>, Morlon et al. (2011) <DOI:10.1073/pnas.1102543108>, Condamine et al. (2013) <DOI:10.1111/ele.12062>, Morlon et al. (2014) <DOI:10.1111/ele.12251>, Manceau et al. (2015) <DOI:10.1111/ele.12415>, Lewitus & Morlon (2016) <DOI:10.1093/sysbio/syv116>, Drury et al. (2016) <DOI:10.1093/sysbio/syw020>, Manceau et al. (2016) <DOI:10.1093/sysbio/syw115>, Morlon et al. (2016) <DOI:10.1111/2041-210X.12526>, Clavel & Morlon (2017) <DOI:10.1073/pnas.1606868114>, Drury et al. (2017) <DOI:10.1093/sysbio/syx079>, Lewitus & Morlon (2017) <DOI:10.1093/sysbio/syx095>, Drury et al. (2018) <DOI:10.1371/journal.pbio.2003563>, Clavel et al. (2019) <DOI:10.1093/sysbio/syy045>, Maliet et al. (2019) <DOI:10.1038/s41559-019-0908-0>, Billaud et al. (2019) <DOI:10.1093/sysbio/syz057>, Lewitus et al. (2019) <DOI:10.1093/sysbio/syz061>, Aristide & Morlon (2019) <DOI:10.1111/ele.13385>, Maliet et al. (2020) <DOI:10.1111/ele.13592>, Drury et al. (2021) <DOI:10.1371/journal.pbio.3001270>, Perez-Lamarque & Morlon (2022) <DOI:10.1111/mec.16478>, Perez-Lamarque et al. (2022) <DOI:10.1101/2021.08.30.458192>, Mazet et al. (2023) <DOI:10.1111/2041-210X.14195>, Drury et al. (2024) <DOI:10.1016/j.cub.2023.12.055>.

Maintained by Hélène Morlon. Last updated 3 months ago.

24 stars 8.50 score 255 scripts

wallaceecomod

wallace:A Modular Platform for Reproducible Modeling of Species Niches and Distributions

The 'shiny' application Wallace is a modular platform for reproducible modeling of species niches and distributions. Wallace guides users through a complete analysis, from the acquisition of species occurrence and environmental data to visualizing model predictions on an interactive map, thus bundling complex workflows into a single, streamlined interface. An extensive vignette, which guides users through most package functionality, can be found on the package's GitHub Pages website: <https://wallaceecomod.github.io/wallace/articles/tutorial-v2.html>.

Maintained by Mary E. Blair. Last updated 24 days ago.

openjdk

133 stars 8.36 score 96 scripts

mlr-org

mlr3verse:Easily Install and Load the 'mlr3' Package Family

The 'mlr3' package family is a set of packages for machine-learning purposes built in a modular fashion. This wrapper package aims to simplify the installation and loading of the core 'mlr3' packages. Get more information about the 'mlr3' project at <https://mlr3book.mlr-org.com/>.

Maintained by Marc Becker. Last updated 3 months ago.

machine-learning meta mlr3

55 stars 8.32 score 720 scripts 1 dependent

cefet-rj-dal

harbinger:A Unified Time Series Event Detection Framework

By analyzing time series, it is possible to observe significant changes in the behavior of observations that frequently characterize events. Events present themselves as anomalies, change points, or motifs. In the literature, there are several methods for detecting events. However, searching for a suitable time series method is a complex task, especially considering that the nature of events is often unknown. This work presents Harbinger, a framework for integrating and analyzing event detection methods. Harbinger contains several state-of-the-art methods described in Salles et al. (2020) <doi:10.5753/sbbd.2020.13626>.

Maintained by Eduardo Ogasawara. Last updated 4 months ago.

18 stars 8.32 score 216 scripts

mlr-org

mlr3cluster:Cluster Extension for 'mlr3'

Extends the 'mlr3' package with cluster analysis.

Maintained by Maximilian Mücke. Last updated 1 month ago.

cluster-analysis clustering mlr3

23 stars 8.31 score 50 scripts 2 dependents

bioc

flowStats:Statistical methods for the analysis of flow cytometry data

Methods and functionality to analyse flow data that is beyond the basic infrastructure provided by the flowCore package.

Maintained by Greg Finak. Last updated 5 months ago.

immunooncology flowcytometry cellbasedassays

14 stars 8.27 score 195 scripts 1 dependent

branchlab

metasnf:Meta Clustering with Similarity Network Fusion

Framework to facilitate patient subtyping with similarity network fusion and meta clustering. The similarity network fusion (SNF) algorithm was introduced by Wang et al. (2014) in <doi:10.1038/nmeth.2810>. SNF is a data integration approach that can transform high-dimensional and diverse data types into a single similarity network suitable for clustering with minimal loss of information from each initial data source. The meta clustering approach was introduced by Caruana et al. (2006) in <doi:10.1109/ICDM.2006.103>. Meta clustering involves generating a wide range of cluster solutions by adjusting clustering hyperparameters, then clustering the solutions themselves into a manageable number of qualitatively similar solutions, and finally characterizing representative solutions to find ones that are best for the user's specific context. This package provides a framework to easily transform multi-modal data into a wide range of similarity network fusion-derived cluster solutions as well as to visualize, characterize, and validate those solutions. Core package functionality includes easy customization of distance metrics, clustering algorithms, and SNF hyperparameters to generate diverse clustering solutions; calculation and plotting of associations between features, between patients, and between cluster solutions; and standard cluster validation approaches including resampled measures of cluster stability, standard metrics of cluster quality, and label propagation to evaluate generalizability in unseen data. Associated vignettes guide the user through using the package to identify patient subtypes while adhering to best practices for unsupervised learning.

Maintained by Prashanth S Velayudhan. Last updated 6 days ago.

bioinformatics clustering metaclustering snf

8 stars 8.21 score 30 scripts

robjhyndman

demography:Forecasting Mortality, Fertility, Migration and Population Data

Functions for demographic analysis including lifetable calculations; Lee-Carter modelling; functional data analysis of mortality rates, fertility rates, net migration numbers; and stochastic population forecasting.

Maintained by Rob Hyndman. Last updated 4 months ago.

actuarial demography forecasting

74 stars 8.21 score 241 scripts 6 dependents

jranke

mkin:Kinetic Evaluation of Chemical Degradation Data

Calculation routines based on the FOCUS Kinetics Report (2006, 2014). Includes a function for conveniently defining differential equation models, model solution based on eigenvalues if possible or using numerical solvers. If a C compiler (on Windows: 'Rtools') is installed, differential equation models are solved using automatically generated C functions. Non-constant errors can be taken into account using variance by variable or two-component error models <doi:10.3390/environments6120124>. Hierarchical degradation models can be fitted using nonlinear mixed-effects model packages as a back end <doi:10.3390/environments8080071>. Please note that no warranty is implied for correctness of results or fitness for a particular purpose.

Maintained by Johannes Ranke. Last updated 2 months ago.

degradation focus-kinetics kinetic-models kinetics ode ode-model

11 stars 8.18 score 78 scripts 1 dependent

rcannood

SCORPIUS:Inferring Developmental Chronologies from Single-Cell RNA Sequencing Data

An accurate and easy tool for performing linear trajectory inference on single cells using single-cell RNA sequencing data. In addition, 'SCORPIUS' provides functions for discovering the most important genes with respect to the reconstructed trajectory, as well as nice visualisation tools. Cannoodt et al. (2016) <doi:10.1101/079509>.

Maintained by Robrecht Cannoodt. Last updated 2 years ago.

59 stars 8.17 score 126 scripts

alinetalhouk

diceR:Diverse Cluster Ensemble in R

Performs cluster analysis using an ensemble clustering framework, Chiu & Talhouk (2018) <doi:10.1186/s12859-017-1996-y>. Results from a diverse set of algorithms are pooled together using methods such as majority voting, K-Modes, LinkCluE, and CSPA. There are options to compare cluster assignments across algorithms using internal and external indices, visualizations such as heatmaps, and significance testing for the existence of clusters.

Maintained by Derek Chiu. Last updated 2 months ago.

cpp

37 stars 8.13 score 60 scripts 3 dependents

andrewcparnell

Bchron:Radiocarbon Dating, Age-Depth Modelling, Relative Sea Level Rate Estimation, and Non-Parametric Phase Modelling

Enables quick calibration of radiocarbon dates under various calibration curves (including user generated ones); age-depth modelling as per the algorithm of Haslett and Parnell (2008) <DOI:10.1111/j.1467-9876.2008.00623.x>; relative sea level rate estimation incorporating time uncertainty in polynomial regression models (Parnell and Gehrels 2015) <DOI:10.1002/9781118452547.ch32>; non-parametric phase modelling via Gaussian mixtures as a means to determine the activity of a site (and as an alternative to the Oxcal function SUM; currently unpublished), and reverse calibration of dates from calibrated into un-calibrated years (also unpublished).

Maintained by Andrew Parnell. Last updated 2 years ago.

36 stars 8.09 score 176 scripts 1 dependent

acorg

Racmacs:Antigenic Cartography Macros

A toolkit for making antigenic maps from immunological assay data, in order to quantify and visualize antigenic differences between different pathogen strains as described in Smith et al. (2004) <doi:10.1126/science.1097211> and used in the World Health Organization influenza vaccine strain selection process. Additional functions allow for the diagnostic evaluation of antigenic maps and an interactive viewer is provided to explore antigenic relationships amongst several strains and incorporate the visualization of associated genetic information.

Maintained by Sam Wilks. Last updated 9 months ago.

openblas cpp openmp

21 stars 8.06 score 362 scripts

sbgraves237

Ecfun:Functions for 'Ecdat'

Functions and vignettes to update data sets in 'Ecdat' and to create, manipulate, plot, and analyze those and similar data sets.

Maintained by Spencer Graves. Last updated 4 months ago.

8.02 score 85 scripts 4 dependents

bioc

netZooR:Unified methods for the inference and analysis of gene regulatory networks

netZooR unifies the implementations of several Network Zoo methods (netzoo, netzoo.github.io) into a single package by creating interfaces between network inference and network analysis methods. Currently, the package has 3 methods for network inference including PANDA and its optimized implementation OTTER (network reconstruction using multiple lines of biological evidence), LIONESS (single-sample network inference), and EGRET (genotype-specific networks). Network analysis methods include CONDOR (community detection), ALPACA (differential community detection), CRANE (significance estimation of differential modules), MONSTER (estimation of network transition states). In addition, YARN processes gene expression data for tissue-specific analyses and SAMBAR infers missing mutation data based on pathway information.

Maintained by Tara Eicher. Last updated 13 days ago.

networkinference network generegulation geneexpression transcription microarray graphandnetwork gene-regulatory-network transcription-factors

105 stars 7.98 score

bioc

scDD:Mixture modeling of single-cell RNA-seq data to identify genes with differential distributions

This package implements a method to analyze single-cell RNA-seq data utilizing flexible Dirichlet Process mixture models. Genes with differential distributions of expression are classified into several interesting patterns of differences between two conditions. The package also includes functions for simulating data with these patterns from negative binomial distributions.

Maintained by Keegan Korthauer. Last updated 5 months ago.

immunooncology bayesian clustering rnaseq singlecell multiplecomparison visualization differentialexpression

33 stars 7.92 score 50 scripts

bioc

AneuFinder:Analysis of Copy Number Variation in Single-Cell-Sequencing Data

AneuFinder implements functions for copy-number detection, breakpoint detection, and karyotype and heterogeneity analysis in single-cell whole genome sequencing and strand-seq data.

Maintained by Aaron Taudt. Last updated 5 days ago.

immunooncology software sequencing singlecell copynumbervariation genomicvariation hiddenmarkovmodel wholegenome cpp

18 stars 7.90 score 37 scripts

bioc

BayesSpace:Clustering and Resolution Enhancement of Spatial Transcriptomes

Tools for clustering and enhancing the resolution of spatial gene expression experiments. BayesSpace clusters a low-dimensional representation of the gene expression matrix, incorporating a spatial prior to encourage neighboring spots to cluster together. The method can enhance the resolution of the low-dimensional representation into "sub-spots", for which features such as gene expression or cell type composition can be imputed.

Maintained by Matt Stone. Last updated 5 months ago.

software clustering transcriptomics geneexpression singlecell immunooncology dataimport openblas cpp openmp

126 stars 7.90 score 278 scripts 1 dependents

davidrusi

mombf:Model Selection with Bayesian Methods and Information Criteria

Model selection and averaging for regression and mixtures, including Bayesian model selection and information criteria (BIC, EBIC, AIC, GIC).

Maintained by David Rossell. Last updated 2 months ago.

openblas cpp openmp

7 stars 7.89 score 73 scripts 1 dependents

niaid

dsb:Normalize & Denoise Droplet Single Cell Protein Data (CITE-Seq)

This lightweight R package provides a method for normalizing and denoising protein expression data from droplet based single cell experiments. Raw protein Unique Molecular Index (UMI) counts from sequencing DNA-conjugated antibody derived tags (ADT) in droplets (e.g. 'CITE-seq') have substantial measurement noise. Our experiments and computational modeling revealed two major components of this noise: 1) protein-specific noise originating from ambient, unbound antibody encapsulated in droplets that can be accurately inferred via the expected protein counts detected in empty droplets, and 2) droplet/cell-specific noise revealed via the shared variance component associated with isotype antibody controls and background protein counts in each cell. This package normalizes and removes both of these sources of noise from raw protein data derived from methods such as 'CITE-seq', 'REAP-seq', 'ASAP-seq', 'TEA-seq', 'proteogenomic' data from the Mission Bio platform, etc. See the vignette for tutorials on how to integrate dsb with 'Seurat' and 'Bioconductor' and how to use dsb in 'Python'. Please see our paper Mulè M.P., Martins A.J., and Tsang J.S. Nature Communications 2022 <https://www.nature.com/articles/s41467-022-29356-8> for more details on the method.

Maintained by Matthew Mulè. Last updated 10 months ago.

cite-seq niaid-tsang-lab

65 stars 7.73 score 104 scripts

bioc

wateRmelon:Illumina DNA methylation array normalization and metrics

15 flavours of betas and three performance metrics, with methods for objects produced by methylumi and minfi packages.

Maintained by Leo C Schalkwyk. Last updated 4 months ago.

dnamethylation microarray twochannel preprocessing qualitycontrol

7.73 score 247 scripts 2 dependents

valeriapolicastro

robin:ROBustness in Network

Assesses the robustness of the community structure of a network found by one or more community detection algorithms to give indications about their reliability. It detects if the community structure found by a set of algorithms is statistically significant and compares the different selected detection algorithms on the same network. robin helps to choose, among different community detection algorithms, the one that best fits the network of interest. Reference in Policastro V., Righelli D., Carissimo A., Cutillo L., De Feis I. (2021) <https://journal.r-project.org/archive/2021/RJ-2021-040/index.html>.

Maintained by Valeria Policastro. Last updated 10 days ago.

19 stars 7.72 score 8 scripts

bioc

MLInterfaces:Uniform interfaces to R machine learning procedures for data in Bioconductor containers

This package provides uniform interfaces to machine learning code for data in R and Bioconductor containers.

Maintained by Vincent Carey. Last updated 5 months ago.

classification clustering

7.63 score 79 scripts 6 dependents

bioc

scDesign3:A unified framework of realistic in silico data generation and statistical model inference for single-cell and spatial omics

We present a statistical simulator, scDesign3, to generate realistic single-cell and spatial omics data, including various cell states, experimental designs, and feature modalities, by learning interpretable parameters from real data. Using a unified probabilistic model for single-cell and spatial omics data, scDesign3 infers biologically meaningful parameters; assesses the goodness-of-fit of inferred cell clusters, trajectories, and spatial locations; and generates in silico negative and positive controls for benchmarking computational tools.

Maintained by Dongyuan Song. Last updated 30 days ago.

software singlecell sequencing geneexpression spatial

89 stars 7.59 score 25 scripts

samhforbes

PupillometryR:A Unified Pipeline for Pupillometry Data

Provides a unified pipeline to clean, prepare, plot, and run basic analyses on pupillometry experiments.

Maintained by Samuel Forbes. Last updated 2 years ago.

44 stars 7.58 score 288 scripts 1 dependents

bioc

TSCAN:Tools for Single-Cell Analysis

Provides methods to perform trajectory analysis based on a minimum spanning tree constructed from cluster centroids. Computes pseudotemporal cell orderings by mapping cells in each cluster (or new cells) to the closest edge in the tree. Uses linear modelling to identify differentially expressed genes along each path through the tree. Several plotting and interactive visualization functions are also implemented.

Maintained by Zhicheng Ji. Last updated 5 months ago.

geneexpression visualization gui

7.58 score 207 scripts 3 dependents

bioc

cola:A Framework for Consensus Partitioning

Subgroup classification is a basic task in genomic data analysis, especially for gene expression and DNA methylation data analysis. It can also be used to test the agreement to known clinical annotations, or to test whether there exist significant batch effects. The cola package provides a general framework for subgroup classification by consensus partitioning. It has the following features: 1. It modularizes the consensus partitioning processes so that various methods can be easily integrated. 2. It provides rich visualizations for interpreting the results. 3. It allows running multiple methods at the same time and provides functionalities to straightforwardly compare results. 4. It provides a new method to extract features which are more efficient to separate subgroups. 5. It automatically generates detailed reports for the complete analysis. 6. It allows applying consensus partitioning in a hierarchical manner.

Maintained by Zuguang Gu. Last updated 2 months ago.

clustering geneexpression classification software consensus-clustering cpp

61 stars 7.49 score 112 scripts

dboslab

expowo:An R package for mining global plant diversity and distribution data

Produces diversity estimates and species lists with associated global distribution for any vascular plant family and genus from 'Plants of the World Online' database <https://powo.science.kew.org/>, by interacting with the source code of each plant taxon page. It also creates global maps of species richness, graphics of species discoveries and nomenclatural changes over time. For more details

Maintained by Debora Zuanny. Last updated 7 days ago.

data-mining extractor

8 stars 7.44 score 64 scripts

bioc

genefu:Computation of Gene Expression-Based Signatures in Breast Cancer

This package contains functions implementing various tasks usually required by gene expression analysis, especially in breast cancer studies: gene mapping between different microarray platforms, identification of molecular subtypes, implementation of published gene signatures, gene selection, and survival analysis.

Maintained by Benjamin Haibe-Kains. Last updated 4 months ago.

differentialexpression geneexpression visualization clustering classification

7.42 score 193 scripts 3 dependents

thibautjombart

treespace:Statistical Exploration of Landscapes of Phylogenetic Trees

Tools for the exploration of distributions of phylogenetic trees. This package includes a 'shiny' interface which can be started from R using treespaceServer(). For further details see Jombart et al. (2017) <DOI:10.1111/1755-0998.12676>.

Maintained by Michelle Kendall. Last updated 2 years ago.

cpp

28 stars 7.39 score 63 scripts

bioc

cogena:co-expressed gene-set enrichment analysis

cogena is a workflow for co-expressed gene-set enrichment analysis. It aims to discover smaller-scale but highly correlated cellular events that may be of great biological relevance. A novel pipeline for drug discovery and drug repositioning based on the cogena workflow is proposed. In particular, candidate drugs can be predicted based on the gene expression of disease-related data, or other similar drugs can be identified based on the gene expression of drug-related data. Moreover, the drug mode of action can be disclosed by the associated pathway analysis. In summary, cogena is a flexible workflow for various gene set enrichment analyses for co-expressed genes, with a focus on pathway/GO analysis and drug repositioning.

Maintained by Zhilong Jia. Last updated 5 months ago.

clustering genesetenrichment geneexpression visualization pathways kegg go microarray sequencing systemsbiology datarepresentation dataimport bioconductor bioinformatics

12 stars 7.36 score 32 scripts

bioc

shinyMethyl:Interactive visualization for Illumina methylation arrays

Interactive tool for visualizing Illumina methylation array data. Both the 450k and EPIC array are supported.

Maintained by Jean-Philippe Fortin. Last updated 5 months ago.

dnamethylation microarray twochannel preprocessing qualitycontrol methylationarray

5 stars 7.34 score 42 scripts

mkossmeier

metaviz:Forest Plots, Funnel Plots, and Visual Funnel Plot Inference for Meta-Analysis

A compilation of functions to create visually appealing and information-rich plots of meta-analytic data using 'ggplot2'. Currently allows creating forest plots, funnel plots, and many of their variants, such as rainforest plots, thick forest plots, additional evidence contour funnel plots, and sunset funnel plots. In addition, functionalities for visual inference with the funnel plot in the context of meta-analysis are provided.

Maintained by Michael Kossmeier. Last updated 5 years ago.

funnel-plots rainforest-plots

17 stars 7.32 score 135 scripts

bioc

missMethyl:Analysing Illumina HumanMethylation BeadChip Data

Normalisation, testing for differential variability and differential methylation and gene set testing for data from Illumina's Infinium HumanMethylation arrays. The normalisation procedure is subset-quantile within-array normalisation (SWAN), which allows Infinium I and II type probes on a single array to be normalised together. The test for differential variability is based on an empirical Bayes version of Levene's test. Differential methylation testing is performed using RUV, which can adjust for systematic errors of unknown origin in high-dimensional data by using negative control probes. Gene ontology analysis is performed by taking into account the number of probes per gene on the array, as well as taking into account multi-gene associated probes.

Maintained by Belinda Phipson. Last updated 29 days ago.

normalization dnamethylation methylationarray genomicvariation geneticvariability differentialmethylation genesetenrichment

7.24 score 300 scripts 1 dependents

siacus

sde:Simulation and Inference for Stochastic Differential Equations

Companion package to the book Simulation and Inference for Stochastic Differential Equations With R Examples, ISBN 978-0-387-75838-1, Springer, NY.

Maintained by Stefano Maria Iacus. Last updated 2 years ago.

7.08 score 178 scripts 15 dependents

bioc

cqn:Conditional quantile normalization

A normalization tool for RNA-Seq data, implementing the conditional quantile normalization method.

Maintained by Kasper Daniel Hansen. Last updated 5 months ago.

immunooncology rnaseq preprocessing differentialexpression

6.93 score 238 scripts 4 dependents

bioc

pRolocGUI:Interactive visualisation of spatial proteomics data

The package pRolocGUI comprises functions to interactively visualise spatial proteomics data on the basis of pRoloc, pRolocdata and shiny.

Maintained by Lisa Breckels. Last updated 5 months ago.

proteomics visualization gui

8 stars 6.90 score 3 scripts

bioc

RnBeads:RnBeads

RnBeads facilitates comprehensive analysis of various types of DNA methylation data at the genome scale.

Maintained by Fabian Mueller. Last updated 2 months ago.

dnamethylation methylationarray methylseq epigenetics qualitycontrol preprocessing batcheffect differentialmethylation sequencing cpgisland immunooncology twochannel dataimport

6.85 score 169 scripts 1 dependents

songw01

MEGENA:Multiscale Clustering of Geometrical Network

Co-Expression Network Analysis by adopting network embedding technique. Song W.-M., Zhang B. (2015) Multiscale Embedded Gene Co-expression Network Analysis. PLoS Comput Biol 11(11): e1004574. <doi: 10.1371/journal.pcbi.1004574>.

Maintained by Won-Min Song. Last updated 1 years ago.

cpp

49 stars 6.82 score 45 scripts 1 dependents

bioc

mnem:Mixture Nested Effects Models

Mixture Nested Effects Models (mnem) is an extension of Nested Effects Models and allows for the analysis of single cell perturbation data provided by methods like Perturb-Seq (Dixit et al., 2016) or Crop-Seq (Datlinger et al., 2017). In those experiments each of many cells is perturbed by a knock-down of a specific gene, i.e. several cells are perturbed by a knock-down of gene A, several by a knock-down of gene B, and so forth. The observed read-out has to be multi-trait and, in the case of Perturb-/Crop-Seq, consists of gene expression profiles for each cell. mnem uses a mixture model to simultaneously cluster the cell population into k clusters and infer k networks causally linking the perturbed genes for each cluster. The mixture components are inferred via an expectation maximization algorithm.

Maintained by Martin Pirkl. Last updated 6 days ago.

pathways systemsbiology networkinference network rnaseq pooledscreens singlecell crispr atacseq dnaseq geneexpression cpp

4 stars 6.81 score 15 scripts 4 dependents

unina-sfere

funcharts:Functional Control Charts

Provides functional control charts for statistical process monitoring of functional data, using the methods of Capezza et al. (2020) <doi:10.1002/asmb.2507>, Centofanti et al. (2021) <doi:10.1080/00401706.2020.1753581>, Capezza et al. (2024) <doi:10.1080/00401706.2024.2327346>, Capezza et al. (2024) <doi:10.1080/00224065.2024.2383674>, Centofanti et al. (2022) <doi:10.48550/arXiv.2205.06256>. The package is thoroughly illustrated in the paper of Capezza et al (2023) <doi:10.1080/00224065.2023.2219012>.

Maintained by Christian Capezza. Last updated 14 days ago.

openblas cpp

2 stars 6.73 score 168 scripts

filzmoserp

chemometrics:Multivariate Statistical Analysis in Chemometrics

R companion to the book "Introduction to Multivariate Statistical Analysis in Chemometrics" written by K. Varmuza and P. Filzmoser (2009).

Maintained by Peter Filzmoser. Last updated 2 years ago.

4 stars 6.72 score 213 scripts 4 dependents

neon-biodiversity

Ostats:O-Stats, or Pairwise Community-Level Niche Overlap Statistics

O-statistics, or overlap statistics, measure the degree of community-level trait overlap. They are estimated by fitting nonparametric kernel density functions to each species’ trait distribution and calculating their areas of overlap. For instance, the median pairwise overlap for a community is calculated by first determining the overlap of each species pair in trait space, and then taking the median overlap of each species pair in a community. This median overlap value is called the O-statistic (O for overlap). The Ostats() function calculates separate univariate overlap statistics for each trait, while the Ostats_multivariate() function calculates a single multivariate overlap statistic for all traits. O-statistics can be evaluated against null models to obtain standardized effect sizes. 'Ostats' is part of the collaborative Macrosystems Biodiversity Project "Local- to continental-scale drivers of biodiversity across the National Ecological Observatory Network (NEON)." For more information on this project, see the Macrosystems Biodiversity Website (<https://neon-biodiversity.github.io/>). Calculation of O-statistics is described in Read et al. (2018) <doi:10.1111/ecog.03641>, and a teaching module for introducing the underlying biological concepts at an undergraduate level is described in Grady et al. (2018) <http://tiee.esa.org/vol/v14/issues/figure_sets/grady/abstract.html>.

Maintained by Quentin D. Read. Last updated 4 months ago.

ecology

7 stars 6.69 score 28 scripts

briencj

growthPheno:Functional Analysis of Phenotypic Growth Data to Smooth and Extract Traits

Assists in the plotting and functional smoothing of traits measured over time and the extraction of features from these traits, implementing the SET (Smoothing and Extraction of Traits) method described in Brien et al. (2020) Plant Methods, 16. Smoothing of growth trends for individual plants using natural cubic smoothing splines or P-splines is available for removing transient effects and segmented smoothing is available to deal with discontinuities in growth trends. There are graphical tools for assessing the adequacy of trait smoothing, both when using this and other packages, such as those that fit nonlinear growth models. A range of per-unit (plant, pot, plot) growth traits or features can be extracted from the data, including single time points, interval growth rates and other growth statistics, such as maximum growth or days to maximum growth. The package also has tools adapted to inputting data from high-throughput phenotyping facilities, such as from a Lemna-Tec Scananalyzer 3D (see <https://www.youtube.com/watch?v=MRAF_mAEa7E/> for more information). The package 'growthPheno' can also be installed from <http://chris.brien.name/rpackages/>.

Maintained by Chris Brien. Last updated 16 days ago.

6 stars 6.66 score 42 scripts

gloewing

fastFMM:Fast Functional Mixed Models using Fast Univariate Inference

Implementation of the fast univariate inference approach (Cui et al. (2022) <doi:10.1080/10618600.2021.1950006>, Loewinger et al. (2024) <doi:10.7554/eLife.95802.2>) for fitting functional mixed models. User guides and Python package information can be found at <https://github.com/gloewing/photometry_FLMM>.

Maintained by Erjia Cui. Last updated 7 days ago.

9 stars 6.51 score 22 scripts

keefe-murphy

MoEClust:Gaussian Parsimonious Clustering Models with Covariates and a Noise Component

Clustering via parsimonious Gaussian Mixtures of Experts using the MoEClust models introduced by Murphy and Murphy (2020) <doi:10.1007/s11634-019-00373-8>. This package fits finite Gaussian mixture models with a formula interface for supplying gating and/or expert network covariates using a range of parsimonious covariance parameterisations from the GPCM family via the EM/CEM algorithm. Visualisation of the results of such models using generalised pairs plots and the inclusion of an additional noise component is also facilitated. A greedy forward stepwise search algorithm is provided for identifying the optimal model in terms of the number of components, the GPCM covariance parameterisation, and the subsets of gating/expert network covariates.

Maintained by Keefe Murphy. Last updated 26 days ago.

gaussian-mixture-models mixture-of-experts model-based-clustering

7 stars 6.51 score 44 scripts 1 dependents

bioc

ChAMP:Chip Analysis Methylation Pipeline for Illumina HumanMethylation450 and EPIC

The package includes quality control metrics, a selection of normalization methods and novel methods to identify differentially methylated regions and to highlight copy number alterations.

Maintained by Yuan Tian. Last updated 5 months ago.

microarray methylationarray normalization twochannel copynumber dnamethylation

6.50 score 278 scripts

mflores72000

ILS:Interlaboratory Study

Performs interlaboratory studies (ILS) to detect laboratories that provide results inconsistent with those of the others. It allows working simultaneously with various testing materials, from both standard univariate and functional data analysis (FDA) perspectives. The univariate approach, based on ASTM E691-08, consists of estimating Mandel's h and k statistics to identify laboratories that provide significantly different results, and also testing for the presence of outliers with the Cochran and Grubbs tests. Analysis of variance (ANOVA) techniques (F and Tukey tests) are provided to test differences in means between laboratories for each material. Taking into account the functional nature of data retrieved in analytical chemistry, applied physics and engineering (spectra, thermograms, etc.), the ILS package provides an FDA approach for finding the distribution of Mandel's k and h statistics by smoothing bootstrap resampling.

Maintained by Miguel Flores. Last updated 2 years ago.

6.48 score 75 scripts

bioc

doubletrouble:Identification and classification of duplicated genes

doubletrouble aims to identify duplicated genes from whole-genome protein sequences and classify them based on their modes of duplication. The duplication modes are i. segmental duplication (SD); ii. tandem duplication (TD); iii. proximal duplication (PD); iv. transposed duplication (TRD) and; v. dispersed duplication (DD). Transposon-derived duplicates (TRD) can be further subdivided into rTRD (retrotransposon-derived duplication) and dTRD (DNA transposon-derived duplication). If users want a simpler classification scheme, duplicates can also be classified into SD- and SSD-derived (small-scale duplication) gene pairs. Besides classifying gene pairs, users can also classify genes, so that each gene is assigned a unique mode of duplication. Users can also calculate substitution rates per substitution site (i.e., Ka and Ks) from duplicate pairs, find peaks in Ks distributions with Gaussian Mixture Models (GMMs), and classify gene pairs into age groups based on Ks peaks.

Maintained by Fabrício Almeida-Silva. Last updated 19 days ago.

software wholegenome comparativegenomics functionalgenomics phylogenetics network classification bioinformatics comparative-genomics gene-duplication molecular-evolution whole-genome-duplication

23 stars 6.44 score 17 scripts

sciurus365

simlandr:Simulation-Based Landscape Construction for Dynamical Systems

A toolbox for constructing potential landscapes for dynamical systems using Monte Carlo simulation. The method is based on the potential landscape definition by Wang et al. (2008) <doi:10.1073/pnas.0800579105> (also see Zhou & Li, 2016 <doi:10.1063/1.4943096> for further mathematical discussions) and can be used for a large variety of models.

Maintained by Jingmeng Cui. Last updated 2 months ago.

research-tool

6 stars 6.41 score 12 scripts 2 dependents

bioc

quantro:A test for when to use quantile normalization

A data-driven test for the assumptions of quantile normalization using raw data such as objects that inherit eSets (e.g. ExpressionSet, MethylSet). Group level information about each sample (such as Tumor / Normal status) must also be provided because the test assesses if there are global differences in the distributions between the user-defined groups.

Maintained by Stephanie Hicks. Last updated 5 months ago.

normalization preprocessing multiplecomparison microarray sequencing

6.40 score 69 scripts 2 dependents

benjaminhlina

nichetools:Complementary Package to 'nicheROVER' and 'SIBER'

Provides functions complementary to packages 'nicheROVER' and 'SIBER' allowing the user to extract Bayesian estimates from data objects created by the packages 'nicheROVER' and 'SIBER'. Please see the following publications for detailed methods on 'nicheROVER' and 'SIBER': Hansen et al. (2015) <doi:10.1890/14-0235.1>, Jackson et al. (2011) <doi:10.1111/j.1365-2656.2011.01806.x>, and Layman et al. (2007) <doi:10.1890/0012-9658(2007)88[42:CSIRPF]2.0.CO;2>, respectively.

Maintained by Benjamin L. Hlina. Last updated 11 days ago.

jags cpp

2 stars 6.39 score 17 scripts

trackerproject

trackeR:Infrastructure for Running, Cycling and Swimming Data from GPS-Enabled Tracking Devices

Provides infrastructure for handling running, cycling and swimming data from GPS-enabled tracking devices within R. The package provides methods to extract, clean and organise workout and competition data into session-based and unit-aware data objects of class 'trackeRdata' (S3 class). The information can then be visualised, summarised, and analysed through flexible and extensible methods. Frick and Kosmidis (2017) <doi: 10.18637/jss.v082.i07>, which is updated and maintained as one of the vignettes, provides detailed descriptions of the package and its methods, and real-data demonstrations of the package functionality.

Maintained by Ioannis Kosmidis. Last updated 1 years ago.

90 stars 6.37 score 58 scripts 1 dependents

bioc

signifinder:Collection and implementation of public transcriptional cancer signatures

signifinder is an R package for computing and exploring a compendium of tumor signatures. It computes a variety of signatures based on gene expression values and returns single-sample scores. Currently, signifinder contains more than 60 distinct signatures collected from the literature, relating to multiple tumors and multiple cancer processes.

Maintained by Stefania Pirrotta. Last updated 3 months ago.

geneexpression genetarget immunooncology biomedicalinformatics rnaseq microarray reportwriting visualization singlecell spatial genesignaling

7 stars 6.28 score 15 scripts

bioc

recountmethylation:Access and analyze public DNA methylation array data compilations

Resources for cross-study analyses of public DNAm array data from NCBI GEO repo, produced using Illumina's Infinium HumanMethylation450K (HM450K) and MethylationEPIC (EPIC) platforms. Provided functions enable download, summary, and filtering of large compilation files. Vignettes detail background about file formats, example analyses, and more. Note the disclaimer on package load and consult the main manuscripts for further info.

Maintained by Sean K Maden. Last updated 5 months ago.

dnamethylation epigenetics microarray methylationarray experimenthub

9 stars 6.28 score 9 scripts

bioc

Linnorm:Linear model and normality based normalization and transformation method (Linnorm)

Linnorm is an algorithm for normalizing and transforming RNA-seq, single cell RNA-seq, ChIP-seq count data or any large scale count data. It has been independently reviewed by Tian et al. in Nature Methods (https://doi.org/10.1038/s41592-019-0425-8). Linnorm can work with raw count, CPM, RPKM, FPKM and TPM.

Maintained by Shun Hang Yip. Last updated 5 months ago.

immunooncology sequencing chipseq rnaseq differentialexpression geneexpression genetics normalization software transcription batcheffect peakdetection clustering network singlecell cpp

6.26 score 61 scripts 5 dependents

bioc

lumi:BeadArray Specific Methods for Illumina Methylation and Expression Microarrays

The lumi package provides an integrated solution for the Illumina microarray data analysis. It includes functions of Illumina BeadStudio (GenomeStudio) data input, quality control, BeadArray-specific variance stabilization, normalization and gene annotation at the probe level. It also includes the functions of processing Illumina methylation microarrays, especially Illumina Infinium methylation microarrays.

Maintained by Lei Huang. Last updated 5 months ago.

microarray onechannel preprocessing dnamethylation qualitycontrol twochannel

6.26 score 294 scripts 5 dependents

angelacar

TwoTimeScales:Analysis of Event Data with Two Time Scales

Analyse time to event data with two time scales by estimating a smooth hazard that varies over two time scales. If covariates are available, estimate a proportional hazards model with such a two-dimensional baseline hazard. Functions are provided to prepare the raw data for estimation, to estimate and to plot the two-dimensional smooth hazard. Extensions to a competing risks model are implemented. For details about the method please refer to Carollo et al. (2024) <doi:10.1002/sim.10297>.

Maintained by Angela Carollo. Last updated 2 months ago.

9 stars 6.26 score 5 scripts

lozalojo

mem:The Moving Epidemic Method

The Moving Epidemic Method, created by T Vega and JE Lozano (2012, 2015) <doi:10.1111/j.1750-2659.2012.00422.x>, <doi:10.1111/irv.12330>, allows the weekly assessment of the epidemic and intensity status to help in routine respiratory infections surveillance in health systems. Allows the comparison of different epidemic indicators, timing and shape with past epidemics and across different regions or countries with different surveillance systems. Also, it gives a measure of the performance of the method in terms of sensitivity and specificity of the alert week.

Maintained by Jose E. Lozano. Last updated 2 years ago.

influenza mem

14 stars 6.24 score 82 scripts 1 dependents

tzerk

RLumShiny:'Shiny' Applications for the R Package 'Luminescence'

A collection of 'shiny' applications for the R package 'Luminescence'. These mainly, but not exclusively, include applications for plotting chronometric data from e.g. luminescence or radiocarbon dating. It further provides access to Bootstrap's tooltip and popover functionality and contains the 'jscolor.js' library with a custom 'shiny' output binding.

Maintained by Christoph Burow. Last updated 6 days ago.

bootstrap jscolor luminescence luminescence-dating shiny shiny-applications tooltip

7 stars 6.23 score 67 scripts 2 dependents

crp2a

BayLum:Chronological Bayesian Models Integrating Optically Stimulated Luminescence and Radiocarbon Age Dating

Bayesian analysis of luminescence data and C-14 age estimates. Bayesian models are based on the following publications: Combes, B. & Philippe, A. (2017) <doi:10.1016/j.quageo.2017.02.003> and Combes et al (2015) <doi:10.1016/j.quageo.2015.04.001>. This includes, amongst others, data import, export, application of age models and palaeodose model.

Maintained by Anne Philippe. Last updated 12 months ago.

archaeometry bayesian-statistics geochronology luminescence-dating radiocarbon-dates jags cpp

9 stars 6.22 score 37 scripts

bioc

iNETgrate:Integrates DNA methylation data with gene expression in a single gene network

The iNETgrate package provides functions to build a correlation network in which nodes are genes. DNA methylation and gene expression data are integrated to define the connections between genes. This network is used to identify modules (clusters) of genes. The biological information in each of the resulting modules is represented by an eigengene. These biological signatures can be used as features e.g., for classification of patients into risk categories. The resulting biological signatures are very robust and give a holistic view of the underlying molecular changes.

Maintained by Habil Zare. Last updated 5 months ago.

geneexpression rnaseq dnamethylation networkinference network graphandnetwork biomedicalinformatics systemsbiology transcriptomics classification clustering dimensionreduction principalcomponent mrnamicroarray normalization geneprediction kegg survival core-services

74 stars 6.21 score 1 script

florianstijven

Surrogate:Evaluation of Surrogate Endpoints in Clinical Trials

In a clinical trial, it frequently occurs that the most credible outcome to evaluate the effectiveness of a new therapy (the true endpoint) is difficult to measure. In such a situation, it can be an effective strategy to replace the true endpoint by a (bio)marker that is easier to measure and that allows for a prediction of the treatment effect on the true endpoint (a surrogate endpoint). The package 'Surrogate' allows for an evaluation of the appropriateness of a candidate surrogate endpoint based on the meta-analytic, information-theoretic, and causal-inference frameworks. Part of this software has been developed using funding provided from the European Union's Seventh Framework Programme for research, technological development and demonstration (Grant Agreement no 602552), the Special Research Fund (BOF) of Hasselt University (BOF-number: BOF2OCPO3), GlaxoSmithKline Biologicals, Baekeland Mandaat (HBC.2022.0145), and Johnson & Johnson Innovative Medicine.

Maintained by Wim Van Der Elst. Last updated 1 month ago.

1 star 6.15 score 133 scripts

feiyoung

DR.SC:Joint Dimension Reduction and Spatial Clustering

Joint dimension reduction and spatial clustering for single-cell RNA sequencing and spatial transcriptomics data; for more details, see Wei Liu, Xu Liao, Yi Yang, Huazhen Lin, Joe Yeong, Xiang Zhou, Xingjie Shi and Jin Liu (2022) <doi:10.1093/nar/gkac219>. The method is computationally efficient, scales with increasing sample size, and can also choose the smoothness parameter and the number of clusters.

Maintained by Wei Liu. Last updated 1 year ago.

dimension-reduction selfsupervised spatial-clustering spatial-transcriptomics openblas cpp

5 stars 6.12 score 29 scripts 2 dependents

jmadinlab

habtools:Tools and Metrics for 3D Surfaces and Objects

A collection of functions for sampling and simulating 3D surfaces and objects and estimating metrics like rugosity, fractal dimension, convexity, sphericity, circularity, second moments of area and volume, and more.

Maintained by Nina Schiettekatte. Last updated 26 days ago.

12 stars 6.10 score 9 scripts

capnrefsmmat

regressinator:Simulate and Diagnose (Generalized) Linear Models

Simulate samples from populations with known covariate distributions, generate response variables according to common linear and generalized linear model families, draw from sampling distributions of regression estimates, and perform visual inference on diagnostics from model fits.

Maintained by Alex Reinhart. Last updated 6 months ago.

statistics

4 stars 6.08 score 25 scripts

chrhennig

prabclus:Functions for Clustering and Testing of Presence-Absence, Abundance and Multilocus Genetic Data

Distance-based parametric bootstrap tests for clustering with spatial neighborhood information. Some distance measures, clustering of presence-absence, abundance and multilocus genetic data for species delimitation, nearest neighbor based noise detection. Genetic distances between communities. Tests whether various distance-based regressions are equal. Try package?prabclus for an overview.

Maintained by Christian Hennig. Last updated 6 months ago.

1 star 6.07 score 90 scripts 70 dependents

bioc

GloScope:Population-level Representation on scRNA-Seq data

This package aims at representing and summarizing the entire single-cell profile of a sample. It allows researchers to perform important bioinformatic analyses at the sample level, such as visualization and quality control. The main functions estimate sample distributions, calculate statistical divergence among samples, and visualize the distance matrix through MDS plots.

Maintained by William Torous. Last updated 5 months ago.

datarepresentation qualitycontrol rnaseq sequencing software singlecell

3 stars 6.05 score 84 scripts

puttickmacroevolution

motmot:Models of Trait Macroevolution on Trees

Functions for fitting models of trait evolution on phylogenies for continuous traits. The majority of functions are described in Thomas and Freckleton (2012) <doi:10.1111/j.2041-210X.2011.00132.x> and include functions that allow for tests of variation in the rates of trait evolution.

Maintained by Mark Puttick. Last updated 5 years ago.

cpp

4 stars 6.05 score 35 scripts

bioc

ENmix:Quality control and analysis tools for Illumina DNA methylation BeadChip

Tools for quality control, analysis and visualization of Illumina DNA methylation array data.

Maintained by Zongli Xu. Last updated 18 days ago.

dnamethylation preprocessing qualitycontrol twochannel microarray onechannel methylationarray batcheffect normalization dataimport regression principalcomponent epigenetics multichannel differentialmethylation immunooncology

6.01 score 115 scripts

zhenkewu

baker:"Nested Partially Latent Class Models"

Provides functions to specify, fit and visualize nested partially-latent class models (Wu, Deloria-Knoll, Hammitt, and Zeger (2016) <doi:10.1111/rssc.12101>; Wu, Deloria-Knoll, and Zeger (2017) <doi:10.1093/biostatistics/kxw037>; Wu and Chen (2021) <doi:10.1002/sim.8804>) for inference of population disease etiology and individual diagnosis. In the motivating Pneumonia Etiology Research for Child Health (PERCH) study, because both quantities of interest sum to one hundred percent, the PERCH scientists frequently refer to them as population etiology pie and individual etiology pie, hence the name of the package.

Maintained by Zhenke Wu. Last updated 11 months ago.

bayesian case-control latent-class-analysis jags cpp

8 stars 6.00 score 21 scripts

mhahsler

streamMOA:Interface for MOA Stream Clustering Algorithms

Interface for data stream clustering algorithms implemented in the MOA (Massive Online Analysis) framework (Albert Bifet, Geoff Holmes, Richard Kirkby, Bernhard Pfahringer (2010). MOA: Massive Online Analysis, Journal of Machine Learning Research 11: 1601-1604).

Maintained by Michael Hahsler. Last updated 7 months ago.

clustering datamining datastream openjdk

13 stars 5.98 score 37 scripts

bioc

consensusOV:Gene expression-based subtype classification for high-grade serous ovarian cancer

This package implements four major subtype classifiers for high-grade serous (HGS) ovarian cancer as described by Helland et al. (PLoS One, 2011), Bentink et al. (PLoS One, 2012), Verhaak et al. (J Clin Invest, 2013), and Konecny et al. (J Natl Cancer Inst, 2014). In addition, the package implements a consensus classifier, which consolidates and improves on the robustness of the proposed subtype classifiers, thereby providing reliable stratification of patients with HGS ovarian tumors of clearly defined subtype.

Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.

classification clustering differentialexpression geneexpression microarray transcriptomics cancer-data cancer-genomics cancer-research expression-database ovarian-cancer

3 stars 5.98 score 15 scripts 1 dependent

bioc

FEAST:FEAture SelecTion (FEAST) for Single-cell clustering

Cell clustering is one of the most important and commonly performed tasks in single-cell RNA sequencing (scRNA-seq) data analysis. An important step in cell clustering is to select a subset of genes (referred to as “features”), whose expression patterns will then be used for downstream clustering. A good set of features should include the ones that distinguish different cell types, and the quality of such a set could have a significant impact on the clustering accuracy. FEAST is an R library for selecting the most representative features before performing the core of scRNA-seq clustering. It can be used as a plug-in for established clustering algorithms such as SC3, TSCAN, SHARP, SIMLR, and Seurat. The core of the FEAST algorithm includes three steps: 1. consensus clustering; 2. gene-level significance inference; 3. validation of an optimized feature set.

Maintained by Kenong Su. Last updated 5 months ago.

sequencing singlecell clustering featureextraction

10 stars 5.97 score 47 scripts

mbinois

GPareto:Gaussian Processes for Pareto Front Estimation and Optimization

Gaussian process regression models, a.k.a. Kriging models, are applied to global multi-objective optimization of black-box functions. Multi-objective Expected Improvement and Step-wise Uncertainty Reduction sequential infill criteria are available. A quantification of uncertainty on Pareto fronts is provided using conditional simulations.

Maintained by Mickael Binois. Last updated 1 year ago.

cpp

16 stars 5.96 score 38 scripts 1 dependent

leoegidi

pivmet:Pivotal Methods for Bayesian Relabelling and k-Means Clustering

Collection of pivotal algorithms for: relabelling the MCMC chains in order to undo the label switching problem in Bayesian mixture models; fitting sparse finite mixtures; initializing the centers of the classical k-means algorithm in order to obtain a better clustering solution. For further details see Egidi, Pappadà, Pauli and Torelli (2018b) <ISBN:9788891910233>.

Maintained by Leonardo Egidi. Last updated 10 months ago.

jags cpp

5 stars 5.94 score 25 scripts

bioc

REMP:Repetitive Element Methylation Prediction

Machine learning-based tools to predict DNA methylation of locus-specific repetitive elements (RE) by learning surrounding genetic and epigenetic information. These tools provide genomewide and single-base resolution of DNA methylation prediction on RE that are difficult to measure using array-based or sequencing-based platforms, which enables epigenome-wide association study (EWAS) and differentially methylated region (DMR) analysis on RE.

Maintained by Yinan Zheng. Last updated 5 months ago.

dnamethylation microarray methylationarray sequencing genomewideassociation epigenetics preprocessing multichannel twochannel differentialmethylation qualitycontrol dataimport

2 stars 5.94 score 18 scripts

sistm

cytometree:Automated Cytometry Gating and Annotation

Given the hypothesis of a bi-modal distribution of cells for each marker, the algorithm constructs a binary tree, the nodes of which are subpopulations of cells. At each node, observed cells and markers are modeled by both a family of normal distributions and a family of bi-modal normal mixture distributions. Splitting is done according to a normalized difference of AIC between the two families. The method is detailed in: Commenges, Alkhassim, Gottardo, Hejblum & Thiebaut (2018) <doi:10.1002/cyto.a.23601>.

Maintained by Boris P Hejblum. Last updated 2 years ago.

cpp

9 stars 5.91 score 15 scripts 1 dependent

feiyoung

ProFAST:Probabilistic Factor Analysis for Spatially-Aware Dimension Reduction

Probabilistic factor analysis for spatially-aware dimension reduction across multi-section spatial transcriptomics data with millions of spatial locations. For more details, see Wei Liu et al. (2023) <doi:10.1101/2023.07.11.548486>.

Maintained by Wei Liu. Last updated 2 months ago.

openblas cpp

2 stars 5.86 score 12 scripts 1 dependent

bioc

epiNEM:epiNEM

epiNEM is an extension of the original Nested Effects Models (NEM). epiNEM is able to take into account double knockouts and infer more complex network signalling pathways. It is tailored towards large-scale double knockout screens.

Maintained by Martin Pirkl. Last updated 5 months ago.

pathways systemsbiology networkinference network

1 star 5.83 score 1 script 3 dependents

bioc

benchdamic:Benchmark of differential abundance methods on microbiome data

Starting from a microbiome dataset (16S or WMS with absolute count values), it is possible to perform several analyses to assess the performance of many differential abundance detection methods. A basic and standardized version of the main differential abundance analysis methods is supplied, but users can also add their own method to the benchmark. The analyses focus on 4 main aspects: i) the goodness of fit of each method's distributional assumptions on the observed count data, ii) the ability to control the false discovery rate, iii) the within and between method concordances, iv) the truthfulness of the findings if any a priori knowledge is given. Several graphical functions are available for result visualization.

Maintained by Matteo Calgaro. Last updated 4 months ago.

metagenomics microbiome differentialexpression multiplecomparison normalization preprocessing software benchmark differential-abundance-methods

8 stars 5.78 score 8 scripts

bioc

deconvR:Simulation and Deconvolution of Omic Profiles

This package provides a collection of functions designed for deconvolution of bulk sample(s) using an atlas of reference omic signature profiles and a user-selected model. Users are given the option to create or extend a reference atlas and also to simulate the desired size of the bulk signature profile of the reference cell types. The package includes a cell-type-specific methylation atlas and Illumina EPIC B5 probe IDs that can be used in deconvolution. Additionally, we included BSmeth2Probe to make mapping WGBS data to their probe IDs easier.

Maintained by Irem B. Gündüz. Last updated 5 months ago.

dnamethylation regression geneexpression rnaseq singlecell statisticalmethod transcriptomics bioconductor-package deconvolution dna-methylation omics

10 stars 5.78 score 15 scripts

drsimonspencer

AMISforInfectiousDiseases:Implement the AMIS Algorithm for Infectious Disease Models

Implements the Adaptive Multiple Importance Sampling (AMIS) algorithm, as described by Retkute et al. (2021, <doi:10.1214/21-AOAS1486>), to estimate key epidemiological parameters by combining outputs from a geostatistical model of infectious diseases (such as prevalence, incidence, or relative risk) with a disease transmission model. Utilising the resulting posterior distributions, the package enables forward projections at the local level.

Maintained by Simon Spencer. Last updated 2 months ago.

cpp openmp

5.78 score 6 scripts

mmukaigawara

geocausal:Causal Inference with Spatio-Temporal Data

Spatio-temporal causal inference based on point process data. You provide the raw data of locations and timings of treatment and outcome events, specify counterfactual scenarios, and the package estimates causal effects over specified spatial and temporal windows. See Papadogeorgou, et al. (2022) <doi:10.1111/rssb.12548> and Mukaigawara, et al. (2024) <doi:10.31219/osf.io/5kc6f>.

Maintained by Mitsuru Mukaigawara. Last updated 12 days ago.

45 stars 5.77 score

robjhyndman

weird:Functions and Data Sets for "That's Weird: Anomaly Detection Using R" by Rob J Hyndman

All functions and data sets required for the examples in the book Hyndman (2024) "That's Weird: Anomaly Detection Using R" <https://OTexts.com/weird/>. All packages needed to run the examples are also loaded.

Maintained by Rob Hyndman. Last updated 3 months ago.

17 stars 5.74 score 18 scripts

r-lum

RLumModel:Solving Ordinary Differential Equations to Understand Luminescence

A collection of functions to simulate luminescence signals in quartz and Al2O3 based on published models.

Maintained by Johannes Friedrich. Last updated 3 years ago.

differential-equations energy-band-model geochronology luminescence luminescence-models modelling quartz simulation openblas cpp

5 stars 5.73 score 18 scripts 1 dependent

bioc

CaMutQC:An R Package for Comprehensive Filtration and Selection of Cancer Somatic Mutations

CaMutQC is able to filter false positive mutations generated due to technical issues, as well as to select candidate cancer mutations through a series of well-structured functions by labeling mutations with various flags. A detailed and vivid filter report is offered after completing a whole filtration or selection section. CaMutQC also integrates several methods and gene panels for Tumor Mutational Burden (TMB) estimation.

Maintained by Xin Wang. Last updated 5 months ago.

software qualitycontrol genetarget cancer-genomics somatic-mutations

7 stars 5.72 score 1 script

bioc

GeoTcgaData:Processing Various Types of Data on GEO and TCGA

Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) provide us with a wealth of data, such as RNA-seq, DNA methylation, SNP and copy number variation data. It's easy to download data from TCGA using the gdc tool, but processing these data into a format suitable for bioinformatics analysis requires more work. This R package was developed to handle these data.

Maintained by Erqiang Hu. Last updated 5 months ago.

geneexpression differentialexpression rnaseq copynumbervariation microarray software dnamethylation differentialmethylation snp atacseq methylationarray

25 stars 5.68 score 19 scripts

xiaozhangryy

CAESAR.Suite:CAESAR: a Cross-Technology and Cross-Resolution Framework for Spatial Omics Annotation

Biotechnology in spatial omics has advanced rapidly over the past few years, enhancing both throughput and resolution. However, existing annotation pipelines in spatial omics predominantly rely on clustering methods, lacking the flexibility to integrate extensive annotated information from single-cell RNA sequencing (scRNA-seq) due to discrepancies in spatial resolutions, species, or modalities. Here we introduce the CAESAR suite, an open-source software package that provides image-based spatial co-embedding of locations and genomic features. It uniquely transfers labels from scRNA-seq reference, enabling the annotation of spatial omics datasets across different technologies, resolutions, species, and modalities, based on the conserved relationship between signature genes and cells/locations at an appropriate level of granularity. Notably, CAESAR enriches location-level pathways, allowing for the detection of gradual biological pathway activation within spatially defined domain types. More details on the methods are given in our paper, which is currently under submission. A full reference to the paper will be provided in future versions once the paper is published.

Maintained by Xiao Zhang. Last updated 7 days ago.

openblas cpp

1 star 5.67 score 2 scripts

markusul

SDModels:Spectrally Deconfounded Models

Screen for and analyze non-linear sparse direct effects in the presence of unobserved confounding using the spectral deconfounding techniques (Ćevid, Bühlmann, and Meinshausen (2020) <jmlr.org/papers/v21/19-545.html>, Guo, Ćevid, and Bühlmann (2022) <doi:10.1214/21-AOS2152>). These methods have been shown to give a good estimate of the true direct effect if we observe many covariates, e.g., high-dimensional settings, and we have fairly dense confounding. Even if the assumptions are violated, it seems like there is not much to lose, and the deconfounded models will, in general, estimate a function closer to the true one than classical least squares optimization. 'SDModels' provides functions SDAM() for Spectrally Deconfounded Additive Models (Scheidegger, Guo, and Bühlmann (2025) <doi:10.1145/3711116>) and SDForest() for Spectrally Deconfounded Random Forests (Ulmer, Scheidegger, and Bühlmann (2025) <doi:10.48550/arXiv.2502.03969>).

Maintained by Markus Ulmer. Last updated 20 hours ago.

2 stars 5.65 score 15 scripts

bioc

netresponse:Functional Network Analysis

Algorithms for functional network analysis. Includes an implementation of a variational Dirichlet process Gaussian mixture model for nonparametric mixture modeling.

Maintained by Leo Lahti. Last updated 5 months ago.

cellbiology clustering geneexpression genetics network graphandnetwork differentialexpression microarray networkinference transcription

3 stars 5.64 score 21 scripts

bioc

flowMeans:Non-parametric Flow Cytometry Data Gating

Identifies cell populations in Flow Cytometry data using non-parametric clustering and segmented-regression-based change point detection. Note: R 2.11.0 or newer is required.

Maintained by Nima Aghaeepour. Last updated 5 months ago.

immunooncology flowcytometry cellbiology clustering

5.64 score 36 scripts 2 dependents

adamlilith

enmSdmX:Species Distribution Modeling and Ecological Niche Modeling

Implements species distribution modeling and ecological niche modeling, including: bias correction, spatial cross-validation, model evaluation, raster interpolation, biotic "velocity" (speed and direction of movement of a "mass" represented by a raster), interpolating across a time series of rasters, and use of spatially imprecise records. The heart of the package is a set of "training" functions which automatically optimize model complexity based on the number of available occurrences. These algorithms include MaxEnt, MaxNet, boosted regression trees/gradient boosting machines, generalized additive models, generalized linear models, natural splines, and random forests. To enhance interoperability with other modeling packages, no new classes are created. The package works with 'PROJ6' geodetic objects and coordinate reference systems.

Maintained by Adam B. Smith. Last updated 1 month ago.

bias-correction biogeography ecological-niche-modeling ecological-niche-modelling niche-modeling niche-modelling species-distribution-modeling openjdk

25 stars 5.57 score 37 scripts

bioc

bandle:An R package for the Bayesian analysis of differential subcellular localisation experiments

The Bandle package enables the analysis and visualisation of differential localisation experiments using mass-spectrometry data. Experimental methods supported include dynamic LOPIT-DC, hyperLOPIT, Dynamic Organellar Maps, Dynamic PCP. It provides Bioconductor infrastructure to analyse these data.

Maintained by Oliver M. Crook. Last updated 2 months ago.

bayesian classification clustering immunooncology qualitycontrol dataimport proteomics massspectrometry openblas cpp openmp

4 stars 5.56 score 3 scripts

jeffreyhanson

raptr:Representative and Adequate Prioritization Toolkit in R

Biodiversity is in crisis. The overarching aim of conservation is to preserve biodiversity patterns and processes. To this end, protected areas are established to buffer species and preserve biodiversity processes. But resources are limited and so protected areas must be cost-effective. This package contains tools to generate plans for protected areas (prioritizations), using spatially explicit targets for biodiversity patterns and processes. To obtain solutions in a feasible amount of time, this package uses the commercial 'Gurobi' software (obtained from <https://www.gurobi.com/>). For more information on using this package, see Hanson et al. (2018) <doi:10.1111/2041-210X.12862>.

Maintained by Jeffrey O Hanson. Last updated 1 year ago.

cpp

8 stars 5.52 score 83 scripts

bioc

methylclock:Methylclock - DNA methylation-based clocks

This package estimates chronological and gestational DNA methylation (DNAm) age as well as biological age using different methylation clocks. Chronological DNAm age (in years): Horvath's clock, Hannum's clock, BNN, Horvath's skin+blood clock, PedBE clock and Wu's clock. Gestational DNAm age: Knight's clock, Bohlin's clock, Mayne's clock and Lee's clocks. Biological DNAm clocks: Levine's clock and Telomere Length's clock.

Maintained by Dolors Pelegri-Siso. Last updated 5 months ago.

dnamethylation biologicalquestion preprocessing statisticalmethod normalization cpp

39 stars 5.52 score 28 scripts

bioc

miRSM:Inferring miRNA sponge modules in heterogeneous data

The package aims to identify miRNA sponge or ceRNA modules in heterogeneous data. It provides several functions to study miRNA sponge modules at single-sample and multi-sample levels, including popular methods for inferring gene modules (candidate miRNA sponge or ceRNA modules), and two functions to identify miRNA sponge modules at single-sample and multi-sample levels, as well as several functions to conduct modular analysis of miRNA sponge modules.

Maintained by Junpeng Zhang. Last updated 5 months ago.

geneexpression biomedicalinformatics clustering genesetenrichment microarray software generegulation genetarget cerna mirna mirna-sponge mirna-targets modules openjdk

4 stars 5.51 score 5 scripts

bioc

conumee:Enhanced copy-number variation analysis using Illumina DNA methylation arrays

This package contains a set of processing and plotting methods for performing copy-number variation (CNV) analysis using Illumina 450k or EPIC methylation arrays.

Maintained by Volker Hovestadt. Last updated 5 months ago.

copynumbervariation dnamethylation methylationarray microarray normalization preprocessing qualitycontrol software

5.48 score 30 scripts

pridiltal

stray:Anomaly Detection in High Dimensional and Temporal Data

This is a modification of the 'HDoutliers' package. The 'HDoutliers' algorithm is a powerful unsupervised algorithm for detecting anomalies in high-dimensional data, with a strong theoretical foundation. However, it suffers from some limitations that significantly hinder its performance under certain circumstances. This package implements the algorithm proposed in Talagala, Hyndman and Smith-Miles (2019) <arXiv:1908.04000> for detecting anomalies in high-dimensional data that addresses these limitations of the 'HDoutliers' algorithm. We define an anomaly as an observation that deviates markedly from the majority with a large distance gap. An approach based on extreme value theory is used for the anomalous threshold calculation.

Maintained by Priyanga Dilini Talagala. Last updated 1 year ago.

stray

58 stars 5.47 score 34 scripts 1 dependent

bioc

bigmelon:Illumina methylation array analysis for large experiments

Methods for working with Illumina arrays using gdsfmt.

Maintained by Leonard C. Schalkwyk. Last updated 5 months ago.

dnamethylation microarray twochannel preprocessing qualitycontrol methylationarray dataimport cpgisland

5.47 score 21 scripts

bioc

UMI4Cats:UMI4Cats: Processing, analysis and visualization of UMI-4C chromatin contact data

UMI-4C is a technique that allows characterization of 3D chromatin interactions with a bait of interest, taking advantage of a sonication step to produce unique molecular identifiers (UMIs) that help remove duplication bias, thus allowing a better differential comparison of chromatin interactions between conditions. This package allows processing of UMI-4C data, starting from FastQ files provided by the sequencing facility. It provides two statistical methods for detecting differential contacts and includes a visualization function to plot integrated information from a UMI-4C assay.

Maintained by Mireia Ramos-Rodriguez. Last updated 5 months ago.

qualitycontrol preprocessing alignment normalization visualization sequencing coverage chromatin chromatin-interaction genomics umi4c

5 stars 5.40 score 7 scripts

flaviomoc

divraster:Diversity Metrics Calculations for Rasterized Data

Alpha and beta diversity for taxonomic (TD), functional (FD), and phylogenetic (PD) dimensions based on rasters. Spatial and temporal beta diversity can be partitioned into replacement and richness difference components. It also calculates standardized effect size for FD and PD alpha diversity and the average individual traits across multilayer rasters. The layers of the raster represent species, while the cells represent communities. Method details can be found in Cardoso et al. 2022 <https://CRAN.R-project.org/package=BAT> and Heming et al. 2023 <https://CRAN.R-project.org/package=SESraster>.

Maintained by Flávio M. M. Mota. Last updated 14 days ago.

10 stars 5.40 score 7 scripts

andrea-havron

clustTMB:Spatio-Temporal Finite Mixture Model using 'TMB'

Fits a spatio-temporal finite mixture model using 'TMB'. Covariate, spatial and temporal random effects can be incorporated into the gating formula using multinomial logistic regression, the expert formula using a generalized linear mixed model framework, or both.

Maintained by Andrea M. Havron. Last updated 6 months ago.

cpp

4 stars 5.38 score 9 scripts

adrientaudiere

cati:Community Assembly by Traits: Individuals and Beyond

Detect and quantify community assembly processes using trait values of individuals or populations, the T-statistics and other metrics, and dedicated null models.

Maintained by Adrien Taudiere. Last updated 5 months ago.

12 stars 5.33 score 15 scripts

bioc

preciseTAD:preciseTAD: A machine learning framework for precise TAD boundary prediction

preciseTAD provides functions to predict the location of boundaries of topologically associated domains (TADs) and chromatin loops at base-level resolution. As an input, it takes BED-formatted genomic coordinates of domain boundaries detected from low-resolution Hi-C data, and coordinates of high-resolution genomic annotations from ENCODE or other consortia. preciseTAD employs several feature engineering strategies and resampling techniques to address class imbalance, and trains an optimized random forest model for predicting low-resolution domain boundaries. Translated on a base-level, preciseTAD predicts the probability for each base to be a boundary. Density-based clustering and scalable partitioning techniques are used to detect precise boundary regions and summit points. Compared with low-resolution boundaries, preciseTAD boundaries are highly enriched for CTCF, RAD21, SMC3, and ZNF143 signal and more conserved across cell lines. The pre-trained model can accurately predict boundaries in another cell line using CTCF, RAD21, SMC3, and ZNF143 annotation data for this cell line.

Maintained by Mikhail Dozmorov. Last updated 5 months ago.

software hic sequencing clustering classification functionalgenomics featureextraction

7 stars 5.29 score 14 scripts

keefe-murphy

IMIFA:Infinite Mixtures of Infinite Factor Analysers and Related Models

Provides flexible Bayesian estimation of Infinite Mixtures of Infinite Factor Analysers and related models, for nonparametrically clustering high-dimensional data, introduced by Murphy et al. (2020) <doi:10.1214/19-BA1179>. The IMIFA model conducts Bayesian nonparametric model-based clustering with factor analytic covariance structures without recourse to model selection criteria to choose the number of clusters or cluster-specific latent factors, mostly via efficient Gibbs updates. Model-specific diagnostic tools are also provided, as well as many options for plotting results, conducting posterior inference on parameters of interest, posterior predictive checking, and quantifying uncertainty.

Maintained by Keefe Murphy. Last updated 1 year ago.

bayesian-nonparametrics dimension-reduction factor-analysis gaussian-mixture-model model-based-clustering

7 stars 5.25 score 51 scripts

anna-neufeld

splinetree:Longitudinal Regression Trees and Forests

Builds regression trees and random forests for longitudinal or functional data using a spline projection method. Implements and extends the work of Yu and Lambert (1999) <doi:10.1080/10618600.1999.10474847>. This method allows trees and forests to be built while considering either level and shape or only shape of response trajectories.

Maintained by Anna Neufeld. Last updated 6 years ago.

4 stars 5.24 score 29 scripts

zhiwent

BCClong:Bayesian Consensus Clustering for Multiple Longitudinal Features

Studies now commonly collect multiple features, and appropriately integrating multiple longitudinal features to define individual clusters is increasingly crucial to understanding population heterogeneity and predicting future outcomes. 'BCClong' implements a Bayesian consensus clustering (BCC) model for multiple longitudinal features via a generalized linear mixed model. Compared to existing packages, several key features make the 'BCClong' package appealing: (a) it allows simultaneous clustering of mixed-type (e.g., continuous, discrete and categorical) longitudinal features, (b) it allows each longitudinal feature to be collected from different sources with measurements taken at distinct sets of time points (known as irregularly sampled longitudinal data), (c) it relaxes the assumption that all features have the same clustering structure by estimating the feature-specific (local) clusterings and consensus (global) clustering.

Maintained by Zhiwen Tan. Last updated 9 months ago.

openblas cpp openmp

4 stars 5.20 score 10 scripts

bioc

maSigPro:Significant Gene Expression Profile Differences in Time Course Gene Expression Data

maSigPro is a regression based approach to find genes for which there are significant gene expression profile differences between experimental groups in time course microarray and RNA-Seq experiments.

Maintained by Maria Jose Nueda. Last updated 5 months ago.

microarray rna-seq differential expression timecourse

5.18 score 76 scripts

bioc

methylCC:Estimate the cell composition of whole blood in DNA methylation samples

A tool to estimate the cell composition of DNA methylation whole blood sample measured on any platform technology (microarray and sequencing).

Maintained by Stephanie C. Hicks. Last updated 5 months ago.

microarray sequencing dnamethylation methylationarray methylseq wholegenome

19 stars 5.18 score 8 scripts

bioc

spatialFDA:A Tool for Spatial Multi-sample Comparisons

spatialFDA is a package to calculate spatial statistics metrics. The package takes a SpatialExperiment object and calculates spatial statistics metrics using the package spatstat. Then it compares the resulting functions across samples/conditions using functional additive models as implemented in the package refund. Furthermore, it provides exploratory visualisations using functional principal component analysis, also implemented in refund.

Maintained by Martin Emons. Last updated 1 month ago.

software spatial transcriptomics

3 stars 5.18 score 6 scripts

salbeke

rKIN:(Kernel) Isotope Niche Estimation

Applies methods used to estimate animal homerange, but instead of geospatial coordinates, we use isotopic coordinates. The estimation methods include: 1) 2-dimensional bivariate normal kernel utilization density estimator, 2) bivariate normal ellipse estimator, and 3) minimum convex polygon estimator, all applied to stable isotope data. Additionally, it provides functions to determine niche area and polygon overlap between groups and levels (confidence contours), as well as plotting capabilities.

Maintained by Shannon E Albeke. Last updated 28 days ago.

4 stars 5.13 score 34 scripts

tsmodels

tstests:Time Series Goodness of Fit and Forecast Evaluation Tests

Goodness of Fit and Forecast Evaluation Tests for time series models. Includes, among others, the Generalized Method of Moments (GMM) Orthogonality Test of Hansen (1982), the Nyblom (1989) parameter constancy test, the sign-bias test of Engle and Ng (1993), and a range of tests for value at risk and expected shortfall evaluation.

Maintained by Alexios Galanos. Last updated 5 months ago.

forecasting statistical-tests

5 stars 5.10 score 3 scripts

bioc

rCGH:Comprehensive Pipeline for Analyzing and Visualizing Array-Based CGH Data

A comprehensive pipeline for analyzing and interactively visualizing genomic profiles generated through commercial or custom aCGH arrays. As inputs, rCGH supports Agilent dual-color Feature Extraction files (.txt), from 44 to 400K, Affymetrix SNP6.0 and cytoScanHD probeset.txt, cychp.txt, and cnchp.txt files exported from ChAS or Affymetrix Power Tools. rCGH also supports custom arrays, provided data complies with the expected format. This package takes over all the steps required for individual genomic profiles analysis, from reading files to profiles segmentation and gene annotations. This package also provides several visualization functions (static or interactive) which facilitate individual profiles interpretation. Input files can be in compressed format, e.g. .bz2 or .gz.

Maintained by Frederic Commo. Last updated 5 months ago.

acgh copynumbervariation preprocessing featureextraction

4 stars 5.10 score 26 scripts 1 dependent

jlp-bioinf

rnaCrosslinkOO:Analysis of RNA Crosslinking Data

Analysis of RNA crosslinking data for RNA structure prediction. The package is suitable for the analysis of RNA structure cross-linking data and chemical probing data.

Maintained by Jonathan Price. Last updated 2 months ago.

comrades psoralen rna-crosslinking rna-structure rna-structure-prediction

1 star 5.08 score 3 scripts

julia-wrobel

mxfda:A Functional Data Analysis Package for Spatial Single Cell Data

Methods and tools for deriving spatial summary functions from single-cell imaging data and performing functional data analyses. Functions can be applied to other single-cell technologies such as spatial transcriptomics. Functional regression and functional principal component analysis methods are in the 'refund' package <https://cran.r-project.org/package=refund> while calculation of the spatial summary functions is from the 'spatstat' package <https://spatstat.org/>.

Maintained by Alex Soupir. Last updated 1 month ago.

1 star 5.08 score 8 scripts

ustervbo

beadplexr:Analysis of Multiplex Cytometric Bead Assays

Reproducible and automated analysis of multiplex bead assays such as CBA (Morgan et al. 2004; <doi:10.1016/j.clim.2003.11.017>), LEGENDplex (Yu et al. 2015; <doi:10.1084/jem.20142318>), and MACSPlex (Miltenyi Biotec 2014; Application note: Data acquisition and analysis without the MACSQuant analyzer; <https://www.miltenyibiotec.com/upload/assets/IM0021608.PDF>). The package provides functions for streamlined reading of fcs files, and identification of bead clusters and analyte expression. The package eases the calculation of standard curves and the subsequent calculation of the analyte concentration.

Maintained by Ulrik Stervbo. Last updated 2 years ago.

5.07 score 39 scripts

martinloza

Canek:Batch Correction of Single Cell Transcriptome Data

Non-linear/linear hybrid method for batch-effect correction that uses Mutual Nearest Neighbors (MNNs) to identify similar cells between datasets. Reference: Loza M. et al. (NAR Genomics and Bioinformatics, 2020) <doi:10.1093/nargab/lqac022>.

Maintained by Martin Loza. Last updated 1 year ago.

batch-effects bioinformatics single-cell-rna-seq transcriptomics

5 stars 5.06 score 23 scripts

emanuelsommer

portvine:Vine Based (Un)Conditional Portfolio Risk Measure Estimation

Following Sommer (2022) <https://mediatum.ub.tum.de/1658240>, portfolio-level risk estimates (e.g. Value at Risk, Expected Shortfall) are estimated by modeling each asset univariately by an ARMA-GARCH model and then their cross dependence via a Vine Copula model in a rolling window fashion. One can even condition on variables/time series at certain quantile levels to stress test the risk measure estimates.

Maintained by Emanuel Sommer. Last updated 1 year ago.

expected-shortfall garch-models value-at-risk vine-copulas cpp

22 stars 5.04 score 6 scripts

acabassi

coca:Cluster-of-Clusters Analysis

Contains the R functions needed to perform Cluster-Of-Clusters Analysis (COCA) and Consensus Clustering (CC). For further details please see Cabassi and Kirk (2020) <doi:10.1093/bioinformatics/btaa593>.

Maintained by Alessandra Cabassi. Last updated 5 years ago.

cluster-analysis cluster-of-clusters clustering coca genomics integrative-clustering multi-omics

6 stars 5.03 score 12 scripts 1 dependent

bioc

shinyepico:ShinyÉPICo

ShinyÉPICo is a graphical pipeline to analyze Illumina DNA methylation arrays (450k or EPIC). It allows users to calculate differentially methylated positions and differentially methylated regions in a user-friendly interface. Moreover, it includes several options to export the results and obtain files to perform downstream analysis.

Maintained by Octavio Morante-Palacios. Last updated 5 months ago.

differentialmethylation dnamethylation microarray preprocessing qualitycontrol

5 stars 5.00 score 1 script

wudongjie

em:Generic EM Algorithm

Implements a generic function for running the Expectation-Maximization (EM) algorithm within a maximum likelihood framework, based on Dempster, Laird, and Rubin (1977) <doi:10.1111/j.2517-6161.1977.tb01600.x>. It can be applied after a model fitting using R's existing functions and packages. The research leading to the software described here has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 851293).

Maintained by Dongjie Wu. Last updated 2 years ago.

openblas cpp openmp

8 stars 4.98 score 24 scripts

uscbiostats

LUCIDus:Latent Unknown Clustering Integrating Multi-View Data

An implementation of the LUCID model (Peng (2019) <doi:10.1093/bioinformatics/btz667>). LUCID conducts integrated clustering using exposures, omics data (and outcome as an option). An EM algorithm is implemented to estimate MLE of the LUCID model. 'LUCIDus' features integrated variable selection, incorporation of missing omics data, bootstrap inference, prediction and visualization of the model.

Maintained by Yinqi Zhao. Last updated 2 years ago.

7 stars 4.98 score 27 scripts

hendersontrent

theftdlc:Analyse and Interpret Time Series Features

Provides a suite of functions for analysing, interpreting, and visualising time-series features calculated from different feature sets from the 'theft' package. Implements statistical learning methodologies described in Henderson, T., Bryant, A., and Fulcher, B. (2023) <arXiv:2303.17809>.

Maintained by Trent Henderson. Last updated 2 months ago.

data-science data-visualization machine-learning statistics time-series

4 stars 4.94 score 11 scripts

martirm

clustAnalytics:Cluster Evaluation on Graphs

Evaluates the stability and significance of clusters on 'igraph' graphs. Supports weighted and unweighted graphs. Implements the cluster evaluation methods defined by Arratia A, Renedo M (2021) <doi:10.7717/peerj-cs.600>. Also includes an implementation of the Reduced Mutual Information introduced by Newman et al. (2020) <doi:10.1103/PhysRevE.101.042304>.

Maintained by Martí Renedo Mirambell. Last updated 1 year ago.

cpp

5 stars 4.92 score 33 scripts

refunders

refund.shiny:Interactive Plotting for Functional Data Analyses

Produces Shiny applications for different types of popular functional data analyses. The functional data analyses are implemented in the refund package, then refund.shiny reads in the refund object and implements an object-specific set of plots based on the object class using S3.

Maintained by Julia Wrobel. Last updated 1 year ago.

4 stars 4.91 score 45 scripts

bioc

tweeDEseq:RNA-seq data analysis using the Poisson-Tweedie family of distributions

Differential expression analysis of RNA-seq using the Poisson-Tweedie (PT) family of distributions. PT distributions are described by a mean, a dispersion and a shape parameter and include Poisson and NB distributions, among others, as particular cases. An important feature of this family is that, while the Negative Binomial (NB) distribution only allows a quadratic mean-variance relationship, the PT distributions generalize this relationship to any order.

Maintained by Dolors Pelegri-Siso. Last updated 5 months ago.

immunooncology statisticalmethod differentialexpression sequencing rnaseq dnaseq

4.91 score 45 scripts 1 dependent

barbarabodinier

sharp:Stability-enHanced Approaches using Resampling Procedures

In stability selection (N Meinshausen, P Bühlmann (2010) <doi:10.1111/j.1467-9868.2010.00740.x>) and consensus clustering (S Monti et al (2003) <doi:10.1023/A:1023949509487>), resampling techniques are used to enhance the reliability of the results. In this package (B Bodinier et al (2025) <doi:10.18637/jss.v112.i05>), hyper-parameters are calibrated by maximising model stability, which is measured under the null hypothesis that all selection (or co-membership) probabilities are identical (B Bodinier et al (2023a) <doi:10.1093/jrsssc/qlad058> and B Bodinier et al (2023b) <doi:10.1093/bioinformatics/btad635>). Functions are readily implemented for the use of LASSO regression, sparse PCA, sparse (group) PLS or graphical LASSO in stability selection, and hierarchical clustering, partitioning around medoids, K means or Gaussian mixture models in consensus clustering.

Maintained by Barbara Bodinier. Last updated 6 days ago.

13 stars 4.91 score 124 scripts

bioc

Melissa:Bayesian clustering and imputation of single cell methylomes

Melissa is a Bayesian probabilistic model for jointly clustering and imputing single cell methylomes. This is done by taking into account local correlations via a Generalised Linear Model approach and global similarities using a mixture modelling approach.

Maintained by C. A. Kapourani. Last updated 5 months ago.

immunooncology dnamethylation geneexpression generegulation epigenetics genetics clustering featureextraction regression rnaseq bayesian kegg sequencing coverage singlecell

4.90 score 7 scripts

andrewdhawan

sigQC:Quality Control Metrics for Gene Signatures

Provides gene signature quality control metrics in publication ready plots. Namely, enables the visualization of properties such as expression, variability, correlation, and comparison of methods of standardisation and scoring metrics.

Maintained by Andrew Dhawan. Last updated 8 months ago.

4 stars 4.89 score 13 scripts

orange-opensource

linkspotter:Bivariate Correlations Calculation and Visualization

Compute and visualize, using the 'visNetwork' package, all the bivariate correlations of a dataframe. Several different types of correlation coefficients (Pearson's r, Spearman's rho, Kendall's tau, distance correlation, maximal information coefficient and equal-freq discretization-based maximal normalized mutual information) are used according to the variable couple type (quantitative vs categorical, quantitative vs quantitative, categorical vs categorical).

Maintained by Alassane Samba. Last updated 1 year ago.

7 stars 4.89 score 22 scripts

sebdejean

CCA:Canonical Correlation Analysis

Provides a set of functions that extend the 'cancor' function with new numerical and graphical outputs. It also includes a regularized extension of the canonical correlation analysis to deal with datasets with more variables than observations.

Maintained by Sébastien Déjean. Last updated 2 years ago.

4.85 score 334 scripts 3 dependents

bioc

evaluomeR:Evaluation of Bioinformatics Metrics

Evaluates the reliability of your own metrics and the measurements done on your own datasets by analysing the stability and goodness of the classifications of such metrics.

Maintained by José Antonio Bernabé-Díaz. Last updated 5 months ago.

clustering classification featureextraction assessment clustering-evaluation evaluome evaluomer metrics

4.82 score 33 scripts

cran

rainbow:Bagplots, Boxplots and Rainbow Plots for Functional Data

Visualizing functional data and identifying functional outliers.

Maintained by Han Lin Shang. Last updated 1 year ago.

4.79 score 150 dependents

cran

fds:Functional Data Sets

Functional data sets.

Maintained by Han Lin Shang. Last updated 6 years ago.

1 star 4.79 score 148 dependents

bioc

airpart:Differential cell-type-specific allelic imbalance

Airpart identifies sets of genes displaying differential cell-type-specific allelic imbalance across cell types or states, utilizing single-cell allelic counts. It makes use of a generalized fused lasso with binomial observations of allelic counts to partition cell types by their allelic imbalance. Alternatively, a nonparametric method for partitioning cell types is offered. The package includes a number of visualizations and quality control functions for examining single cell allelic imbalance datasets.

Maintained by Wancen Mu. Last updated 5 months ago.

singlecell rnaseq atacseq chipseq sequencing generegulation geneexpression transcription transcriptomevariant cellbiology functionalgenomics differentialexpression graphandnetwork regression clustering qualitycontrol

2 stars 4.78 score 2 scripts

tylerjpike

sovereign:State-Dependent Empirical Analysis

A set of tools for state-dependent empirical analysis through both VAR- and local projection-based state-dependent forecasts, impulse response functions, historical decompositions, and forecast error variance decompositions.

Maintained by Tyler J. Pike. Last updated 2 years ago.

econometrics forecasting impulse-response local-projection macroeconomics state-dependent time-series vector-autoregression

12 stars 4.78 score 8 scripts

egarpor

goffda:Goodness-of-Fit Tests for Functional Data

Implementation of several goodness-of-fit tests for functional data. Currently, mostly related to the functional linear model with functional/scalar response and functional/scalar predictor. The package allows for the replication of the data applications considered in García-Portugués, Álvarez-Liébana, Álvarez-Pérez and González-Manteiga (2021) <doi:10.1111/sjos.12486>.

Maintained by Eduardo García-Portugués. Last updated 1 year ago.

functional-data-analysis goodness-of-fit reproducible-research statistics openblas cpp

10 stars 4.76 score 19 scripts 1 dependent

ropensci

phruta:Phylogenetic Reconstruction and Time-dating

The phruta R package is designed to simplify the basic phylogenetic pipeline. Specifically, all code is run within the same program and data from intermediate steps are saved in independent folders. Furthermore, all code is run within the same environment which increases the reproducibility of your analysis. phruta retrieves gene sequences, combines newly downloaded and local gene sequences, and performs sequence alignments.

Maintained by Cristian Roman Palacios. Last updated 9 months ago.

9 stars 4.75 score 14 scripts

okgreece

Cluster.OBeu:Cluster Analysis 'OpenBudgets.eu'

Estimate and return the needed parameters for visualisations designed for 'OpenBudgets' <http://openbudgets.eu/> data. Calculate cluster analysis measures in Budget data of municipalities across Europe, according to the 'OpenBudgets' data model. It involves a set of techniques and algorithms used to find and divide the data into groups of similar observations. It can also be used more generally to extract visualisation parameters, convert them to 'JSON' format, and use them as input in a different graphical interface.

Maintained by Kleanthis Koupidis. Last updated 4 years ago.

cluster cluster-analysis clustering-algorithm clustering-measures estimate-clustering-parameters obeu open-budgets openbudgets

2 stars 4.75 score 14 scripts

abdalkarima

iClusterVB:Fast Integrative Clustering and Feature Selection for High Dimensional Data

A variational Bayesian approach for fast integrative clustering and feature selection, facilitating the analysis of multi-view, mixed type, high-dimensional datasets with applications in fields like cancer research, genomics, and more.

Maintained by Abdalkarim Alnajjar. Last updated 4 months ago.

openblas cpp openmp

1 star 4.74 score 6 scripts

dgrun

RaceID:Identification of Cell Types, Inference of Lineage Trees, and Prediction of Noise Dynamics from Single-Cell RNA-Seq Data

Application of 'RaceID' allows inference of cell types and prediction of lineage trees by the 'StemID2' algorithm (Herman, J.S., Sagar, Grun D. (2018) <DOI:10.1038/nmeth.4662>). 'VarID2' is part of this package and allows quantification of biological gene expression noise at single-cell resolution (Rosales-Alvarez, R.E., Rettkowski, J., Herman, J.S., Dumbovic, G., Cabezas-Wallscheid, N., Grun, D. (2023) <DOI:10.1186/s13059-023-02974-1>).

Maintained by Dominic Grün. Last updated 4 months ago.

cpp

4.74 score 110 scripts

bioc

MesKit:A tool kit for dissecting cancer evolution from multi-region derived tumor biopsies via somatic alterations

MesKit provides commonly used analysis and visualization modules based on mutational data generated by multi-region sequencing (MRS). This package allows users to depict mutational profiles, measure heterogeneity within or between tumors from the same patient, track evolutionary dynamics, and characterize mutational patterns on different levels. A Shiny application was also developed for GUI-based analysis. As a handy tool, MesKit can facilitate the interpretation of tumor heterogeneity and the understanding of evolutionary relationships between regions in MRS studies.

Maintained by Mengni Liu. Last updated 5 months ago.

4.73 score 18 scripts 1 dependent

mflores72000

qcr:Quality Control Review

Univariate and multivariate SQC tools that complete and extend the SQC techniques available in R. Apart from integrating different R packages devoted to SQC ('qcc','MSQC'), provides nonparametric tools that are highly useful when the Gaussian assumption is not met. This package computes standard univariate control charts for individual measurements, X-bar, S, R, p, np, c, u, EWMA and CUSUM. In addition, it includes functions to perform multivariate control charts such as Hotelling T2, MEWMA and MCUSUM. As a representative feature, multivariate nonparametric alternatives based on data depth are implemented in this package: r, Q and S control charts. In addition, Phase I and II control charts for functional data are included. This package also allows the estimation of the most complete set of capability indices from first to fourth generation, covering the nonparametric alternatives, and performing the corresponding capability analysis graphical outputs, including the process capability plots.

Maintained by Miguel Flores. Last updated 2 years ago.

4.71 score 172 scripts 1 dependent

pridiltal

oddstream:Outlier Detection in Data Streams

We propose a framework that provides real time support for early detection of anomalous series within a large collection of streaming time series data. By definition, anomalies are rare in comparison to a system's typical behaviour. We define an anomaly as an observation that is very unlikely given the forecast distribution. The algorithm first forecasts a boundary for the system's typical behaviour using a representative sample of the typical behaviour of the system. An approach based on extreme value theory is used for this boundary prediction process. Then a sliding window is used to test for anomalous series within the newly arrived collection of series. Feature based representation of time series is used as the input to the model. To cope with concept drift, the forecast boundary for the system's typical behaviour is updated periodically. More details regarding the algorithm can be found in Talagala, P. D., Hyndman, R. J., Smith-Miles, K., et al. (2019) <doi:10.1080/10618600.2019.1617160>.

Maintained by Priyanga Dilini Talagala. Last updated 5 years ago.

64 stars 4.71 score 16 scripts

bioc

puma:Propagating Uncertainty in Microarray Analysis (including Affymetrix traditional 3' arrays and exon arrays and Human Transcriptome Array 2.0)

Most analyses of Affymetrix GeneChip data (including traditional 3' arrays and exon arrays and Human Transcriptome Array 2.0) are based on point estimates of expression levels and ignore the uncertainty of such estimates. By propagating uncertainty to downstream analyses we can improve results from microarray analyses. For the first time, the puma package makes a suite of uncertainty propagation methods available to a general audience. In addition to calculating gene expression from Affymetrix 3' arrays, puma also provides methods to process exon arrays and produce gene and isoform expression for alternative splicing studies. puma also offers improvements in terms of scope and speed of execution over previously available uncertainty propagation methods. Included are summarisation, differential expression detection, clustering and PCA methods, together with useful plotting functions.

Maintained by Xuejun Liu. Last updated 10 days ago.

microarray onechannel preprocessing differentialexpression clustering exonarray geneexpression mrnamicroarray chiponchip alternativesplicing differentialsplicing bayesian twochannel dataimport hta2.0

4.71 score 17 scripts

bioc

FuseSOM:A Correlation Based Multiview Self Organizing Maps Clustering For IMC Datasets

A correlation-based multiview self-organizing map for the characterization of cell types in highly multiplexed in situ imaging cytometry assays (`FuseSOM`) is a tool for unsupervised clustering. `FuseSOM` is robust and achieves high accuracy by combining a `Self Organizing Map` architecture and a `Multiview` integration of correlation based metrics. This allows FuseSOM to cluster highly multiplexed in situ imaging cytometry assays.

Maintained by Elijah Willie. Last updated 5 months ago.

singlecell cellbasedassays clustering spatial

1 star 4.71 score 17 scripts

neurodata

causalBatch:Causal Batch Effects

Software which provides numerous functionalities for detecting and removing group-level effects from high-dimensional scientific data which, when combined with additional assumptions, allow for causal conclusions, as described in our manuscripts Bridgeford et al. (2024) <doi:10.1101/2021.09.03.458920> and Bridgeford et al. (2023) <doi:10.48550/arXiv.2307.13868>. Also provides a number of useful utilities for generating simulations and balancing covariates across multiple groups/batches of data via matching and propensity trimming for more than two groups.

Maintained by Eric W. Bridgeford. Last updated 18 days ago.

4 stars 4.70 score 23 scripts

cygei

mixtree:A Statistical Framework for Comparing Sets of Trees

Apply hypothesis testing methods to assess differences between sets of trees.

Maintained by Cyril Geismar. Last updated 1 month ago.

4.70 score

bioc

HGC:A fast hierarchical graph-based clustering method

HGC (short for Hierarchical Graph-based Clustering) is an R package for conducting hierarchical clustering on large-scale single-cell RNA-seq (scRNA-seq) data. The key idea is to construct a dendrogram of cells on their shared nearest neighbor (SNN) graph. HGC provides functions for building graphs and for conducting hierarchical clustering on the graph. Users with an older R version can visit https://github.com/XuegongLab/HGC/tree/HGC4oldRVersion to get the HGC package built for R 3.6.

Maintained by XGlab. Last updated 5 months ago.

singlecell software clustering rnaseq graphandnetwork dnaseq cpp

4.70 score 25 scripts

trackerproject

trackeRapp:Interface for the Analysis of Running, Cycling and Swimming Data from GPS-Enabled Tracking Devices

Provides an integrated user interface and workflow for the analysis of running, cycling and swimming data from GPS-enabled tracking devices through the 'trackeR' <https://CRAN.R-project.org/package=trackeR> R package.

Maintained by Ioannis Kosmidis. Last updated 3 years ago.

data-visualization shiny sports-app web-app web-development

32 stars 4.68 score 2 scripts

bioc

scDDboost:A compositional model to assess expression changes from single-cell rna-seq data

scDDboost is an R package to analyze changes in the distribution of single-cell expression data between two experimental conditions. Compared to other methods that assess differential expression, scDDboost benefits uniquely from information conveyed by the clustering of cells into cellular subtypes. Through a novel empirical Bayesian formulation it calculates gene-specific posterior probabilities that the marginal expression distribution is the same (or different) between the two conditions. The implementation in scDDboost treats gene-level expression data within each condition as a mixture of negative binomial distributions.

Maintained by Xiuyu Ma. Last updated 16 days ago.

singlecell software clustering sequencing geneexpression differentialexpression bayesian cpp

4.68 score 19 scripts

bkeller2

mlmpower:Power Analysis and Data Simulation for Multilevel Models

A declarative language for specifying multilevel models, solving for population parameters based on specified variance-explained effect size measures, generating data, and conducting power analyses to determine sample size recommendations. The specification allows for any number of within-cluster effects, between-cluster effects, covariate effects at either level, and random coefficients. Moreover, the models do not assume orthogonal effects, and predictors can correlate at either level and accommodate models with multiple interaction effects.

Maintained by Brian T. Keller. Last updated 5 months ago.

3 stars 4.65 score 3 scripts

kfinucane

MetSizeR:A Shiny App for Sample Size Estimation in Metabolomic Experiments

Provides a Shiny application to estimate the sample size required for a metabolomic experiment to achieve a desired statistical power. Estimation is possible with or without available data from a pilot study.

Maintained by Kate Finucane. Last updated 4 years ago.

4.65 score 7 scripts

cran

ftsa:Functional Time Series Analysis

Functions for visualizing, modeling, forecasting and hypothesis testing of functional time series.

Maintained by Han Lin Shang. Last updated 1 month ago.

6 stars 4.61 score 10 dependents

bioc

reconsi:Resampling Collapsed Null Distributions for Simultaneous Inference

Improves simultaneous inference under dependence of tests by estimating a collapsed null distribution through resampling. Accounting for the dependence between tests increases the power while reducing the variability of the false discovery proportion. This dependence is common in genomics applications, e.g. when combining flow cytometry measurements with microbiome sequence counts.

Maintained by Stijn Hawinkel. Last updated 5 months ago.

metagenomics microbiome multiplecomparison flowcytometry

2 stars 4.60 score 2 scripts

bioc

methInheritSim:Simulating Whole-Genome Inherited Bisulphite Sequencing Data

Simulate a multigeneration methylation case versus control experiment with inheritance relation using a real control dataset.

Maintained by Pascal Belleau. Last updated 5 months ago.

biologicalquestion epigenetics dnamethylation differentialmethylation methylseq software immunooncology statisticalmethod wholegenome sequencing bisulphite-sequencing inheritance methylation simulation

1 star 4.60 score 1 script

bioc

methylInheritance:Permutation-Based Analysis associating Conserved Differentially Methylated Elements Across Multiple Generations to a Treatment Effect

Permutation analysis, based on Monte Carlo sampling, for testing the hypothesis that the number of conserved differentially methylated elements, between several generations, is associated with an effect inherited from a treatment and that a stochastic effect can be dismissed.

Maintained by Astrid Deschênes. Last updated 5 months ago.

biologicalquestion epigenetics dnamethylation differentialmethylation methylseq software immunooncology statisticalmethod wholegenome sequencing analysis bioconductor bioinformatics cpg differentially-methylated-elements inheritance monte-carlo-sampling permutation

4.60 score 1 script

bioc

iPath:iPath pipeline for detecting perturbed pathways at individual level

iPath is a Bioconductor package for calculating personalized pathway scores and testing their association with survival outcomes. Abundant single-gene biomarkers have been identified and used in the clinic. However, hundreds of oncogenes or tumor-suppressor genes are involved in the process of tumorigenesis. We believe individual-level expression patterns of pre-defined pathways or gene sets are better biomarkers than single genes. In this study, we devised a computational method named iPath to identify prognostic biomarker pathways, one sample at a time. To test its utility, we conducted a pan-cancer analysis across 14 cancer types from The Cancer Genome Atlas and demonstrated that iPath is capable of identifying highly predictive biomarkers for clinical outcomes, including overall survival, tumor subtypes, and tumor stage classifications. We found that pathway-based biomarkers are more robust and effective than single genes.

Maintained by Kenong Su. Last updated 5 months ago.

pathways software geneexpression survival cpp

2 stars 4.60 score 3 scripts

bioc

wavClusteR:Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data

The package provides an integrated pipeline for the analysis of PAR-CLIP data. PAR-CLIP-induced transitions are first discriminated from sequencing errors, SNPs and additional non-experimental sources by a non-parametric mixture model. The protein binding sites (clusters) are then resolved at high resolution and cluster statistics are estimated using a rigorous Bayesian framework. Post-processing of the results, data export for UCSC genome browser visualization and motif search analysis are provided. In addition, the package allows integration of RNA-Seq data to estimate the False Discovery Rate of cluster detection. Key functions support parallel multicore computing. Note: while wavClusteR was designed for PAR-CLIP data analysis, it can be applied to the analysis of other NGS data obtained from experimental procedures that induce nucleotide substitutions (e.g. BisSeq).

Maintained by Federico Comoglio. Last updated 5 months ago.

immunooncology sequencing technology ripseq rnaseq bayesian

4.60 score 3 scripts

bioc

bnem:Training of logical models from indirect measurements of perturbation experiments

bnem combines the use of indirect measurements of Nested Effects Models (package mnem) with the Boolean networks of CellNOptR. Perturbation experiments of signalling nodes in cells are analysed for their effect on the global gene expression profile. Those profiles give evidence for the Boolean regulation of downstream nodes in the network, e.g., whether two parents activate their child independently (OR-gate) or jointly (AND-gate).

Maintained by Martin Pirkl. Last updated 5 months ago.

pathways systemsbiology networkinference network geneexpression generegulation preprocessing

2 stars 4.60 score 5 scripts

manalytics

stppSim:Spatiotemporal Point Patterns Simulation

Generates artificial point patterns marked by their spatial and temporal signatures. The resulting point cloud may exhibit inherent interactions between both signatures. The simulation integrates microsimulation (Holm, E., (2017)<doi:10.1002/9781118786352.wbieg0320>) and agent-based models (Bonabeau, E., (2002)<doi:10.1073/pnas.082080899>), beginning with the configuration of movement characteristics for the specified agents (referred to as 'walkers') and their interactions within the simulation environment. These interactions (Quaglietta, L. and Porto, M., (2019)<doi:10.1186/s40462-019-0154-8>) result in specific spatiotemporal patterns that can be visualized, analyzed, and used for various analytical purposes. Given the growing scarcity of detailed spatiotemporal data across many domains, this package provides an alternative data source for applications in social and life sciences.

Maintained by Monsuru Adepeju. Last updated 8 months ago.

4 stars 4.60 score 5 scripts

modal-inria

cfda:Categorical Functional Data Analysis

Package for the analysis of categorical functional data. The main purpose is to compute an encoding (real functional variable) for each state <doi:10.3390/math9233074>. It also provides functions to perform basic statistical analysis on categorical functional data.

Maintained by Quentin Grimonprez. Last updated 2 months ago.

categorical-data functional-data-analysis hacktoberfest

4 stars 4.60 score 3 scripts

bioc

nempi:Inferring unobserved perturbations from gene expression data

Takes as input an incomplete perturbation profile and differential gene expression in log odds and infers unobserved perturbations and augments observed ones. The inference is done by iteratively inferring a network from the perturbations and inferring perturbations from the network. The network inference is done by Nested Effects Models.

Maintained by Martin Pirkl. Last updated 5 months ago.

software geneexpression differentialexpression differentialmethylation genesignaling pathways network classification neuralnetwork networkinference atacseq dnaseq rnaseq pooledscreens crispr singlecell systemsbiology

2 stars 4.60 score 2 scripts

tarnduong

feature:Local Inferential Feature Significance for Multivariate Kernel Density Estimation

Local inferential feature significance for multivariate kernel density estimation.

Maintained by Tarn Duong. Last updated 4 years ago.

4.60 score 38 scripts 5 dependents

bioc

dce:Pathway Enrichment Based on Differential Causal Effects

Compute differential causal effects (dce) on (biological) networks. Given observational samples from a control experiment and a non-control experiment (e.g., cancer) for two genes A and B, we can compute differential causal effects with a (generalized) linear regression. If the causal effect of gene A on gene B in the control samples is different from the causal effect in the non-control samples, the dce will differ from zero. We regularize the dce computation by the inclusion of prior network information from pathway databases such as KEGG.

Maintained by Kim Philipp Jablonski. Last updated 3 months ago.

software statisticalmethod graphandnetwork regression geneexpression differentialexpression networkenrichment network kegg bioconductor causality

13 stars 4.59 score 4 scripts

francescobartolucci

LMest:Generalized Latent Markov Models

Latent Markov models for longitudinal continuous and categorical data. See Bartolucci, Pandolfi, Pennoni (2017)<doi:10.18637/jss.v081.i04>.

Maintained by Francesco Bartolucci. Last updated 3 months ago.

fortran openblas

3 stars 4.58 score 42 scripts

bioc

flowMerge:Cluster Merging for Flow Cytometry Data

Merging of mixture components for model-based automated gating of flow cytometry data using the flowClust framework. Note: users should have a working copy of flowClust 2.0 installed.

Maintained by Greg Finak. Last updated 5 months ago.

immunooncology clustering flowcytometry

4.56 score 6 scripts 1 dependents

colemanrharris

mxnorm:Apply Normalization Methods to Multiplexed Images

Implements methods to normalize multiplexed imaging data, including statistical metrics and visualizations to quantify technical variation in this data type. Reference for methods listed here: Harris, C., Wrobel, J., & Vandekar, S. (2022). mxnorm: An R Package to Normalize Multiplexed Imaging Data. Journal of Open Source Software, 7(71), 4180, <doi:10.21105/joss.04180>.

Maintained by Coleman Harris. Last updated 2 years ago.

7 stars 4.54 score 7 scripts