R-universe search: identification

bioc

genefu:Computation of Gene Expression-Based Signatures in Breast Cancer

This package contains functions implementing various tasks usually required by gene expression analysis, especially in breast cancer studies: gene mapping between different microarray platforms, identification of molecular subtypes, implementation of published gene signatures, gene selection, and survival analysis.

Maintained by Benjamin Haibe-Kains. Last updated 4 months ago.

differentialexpression geneexpression visualization clustering classification

29.7 match 7.42 score 193 scripts 3 dependents

jaredhuling

personalized:Estimation and Validation Methods for Subgroup Identification and Personalized Medicine

Provides functions for fitting and validation of models for subgroup identification and personalized medicine / precision medicine under the general subgroup identification framework of Chen et al. (2017) <doi:10.1111/biom.12676>. This package is intended for use for both randomized controlled trials and observational studies and is described in detail in Huling and Yu (2021) <doi:10.18637/jss.v098.i05>.

Maintained by Jared Huling. Last updated 3 years ago.

causal-inference heterogeneity-of-treatment-effect individualized-treatment-rules personalized-medicine precision-medicine subgroup-identification treatment-effects treatment-scoring

24.3 match 32 stars 7.38 score 125 scripts 1 dependent

alexanderlange53

svars:Data-Driven Identification of SVAR Models

Implements data-driven identification methods for structural vector autoregressive (SVAR) models as described in Lange et al. (2021) <doi:10.18637/jss.v097.i05>. Based on an existing VAR model object (provided by e.g. VAR() from the 'vars' package), the structural impact matrix is obtained via data-driven identification techniques (i.e. changes in volatility (Rigobon, R. (2003) <doi:10.1162/003465303772815727>), patterns of GARCH (Normandin, M., Phaneuf, L. (2004) <doi:10.1016/j.jmoneco.2003.11.002>), independent component analysis (Matteson, D. S, Tsay, R. S., (2013) <doi:10.1080/01621459.2016.1150851>), least dependent innovations (Herwartz, H., Ploedt, M., (2016) <doi:10.1016/j.jimonfin.2015.11.001>), smooth transition in variances (Luetkepohl, H., Netsunajev, A. (2017) <doi:10.1016/j.jedc.2017.09.001>) or non-Gaussian maximum likelihood (Lanne, M., Meitz, M., Saikkonen, P. (2017) <doi:10.1016/j.jeconom.2016.06.002>)).

Maintained by Alexander Lange. Last updated 2 years ago.

openblas cpp openmp

20.4 match 46 stars 7.22 score 130 scripts

idigbio

ridigbio:Interface to the iDigBio Data API

An interface to iDigBio's search API that allows downloading specimen records. Searches are returned as a data.frame. Other functions such as the metadata end points return lists of information. iDigBio is a US project focused on digitizing and serving museum specimen collections on the web. See <https://www.idigbio.org> for information on iDigBio.

Maintained by Jesse Bennett. Last updated 7 days ago.

10.0 match 16 stars 10.23 score 63 scripts 7 dependents

bioc

MSnID:Utilities for Exploration and Assessment of Confidence of LC-MSn Proteomics Identifications

Extracts MS/MS ID data from mzIdentML (leveraging mzID package) or text files. After collating the search results from multiple datasets it assesses their identification quality and optimizes filtering criteria to achieve the maximum number of identifications while not exceeding a specified false discovery rate. Also contains a number of utilities to explore the MS/MS results and assess missed and irregular enzymatic cleavages, mass measurement accuracy, etc.

Maintained by Vlad Petyuk. Last updated 5 months ago.

proteomics massspectrometry immunooncology

17.6 match 5.06 score 57 scripts

meyer-lab-cshl

plinkQC:Genotype Quality Control with 'PLINK'

Genotyping arrays enable the direct measurement of an individual's genotype at thousands of markers. 'plinkQC' facilitates genotype quality control for genetic association studies as described by Anderson and colleagues (2010) <doi:10.1038/nprot.2010.116>. It makes 'PLINK' basic statistics (e.g. missing genotyping rates per individual, allele frequencies per genetic marker) and relationship functions accessible from 'R' and generates a per-individual and per-marker quality control report. Individuals and markers that fail the quality control can subsequently be removed to generate a new, clean dataset. Removal of individuals based on relationship status is optimised to retain as many individuals as possible in the study.

Maintained by Hannah Meyer. Last updated 3 years ago.

12.6 match 58 stars 6.75 score 49 scripts

lucaweihs

SEMID:Identifiability of Linear Structural Equation Models

Provides routines to check identifiability or non-identifiability of linear structural equation models as described in Drton, Foygel, and Sullivant (2011) <doi:10.1214/10-AOS859>, Foygel, Draisma, and Drton (2012) <doi:10.1214/12-AOS1012>, and other works. The routines are based on the graphical representation of structural equation models.

Maintained by Nils Sturma. Last updated 2 years ago.

19.8 match 4 stars 4.06 score 29 scripts

bsvars

bsvars:Bayesian Estimation of Structural Vector Autoregressive Models

Provides fast and efficient procedures for Bayesian analysis of Structural Vector Autoregressions. This package estimates a wide range of models, including homo-, heteroskedastic, and non-normal specifications. Structural models can be identified by adjustable exclusion restrictions, time-varying volatility, or non-normality. They all include a flexible three-level equation-specific local-global hierarchical prior distribution for the estimated level of shrinkage for autoregressive and structural parameters. Additionally, the package facilitates predictive and structural analyses such as impulse responses, forecast error variance and historical decompositions, forecasting, verification of heteroskedasticity, non-normality, and hypotheses on autoregressive parameters, as well as analyses of structural shocks, volatilities, and fitted values. Beautiful plots, informative summary functions, and extensive documentation including the vignette by Woźniak (2024) <doi:10.48550/arXiv.2410.15090> complement all this. The implemented techniques align closely with those presented in Lütkepohl, Shang, Uzeda, & Woźniak (2024) <doi:10.48550/arXiv.2404.11057>, Lütkepohl & Woźniak (2020) <doi:10.1016/j.jedc.2020.103862>, and Song & Woźniak (2021) <doi:10.1093/acrefore/9780190625979.013.174>. The 'bsvars' package is aligned regarding objects, workflows, and code structure with the R package 'bsvarSIGNs' by Wang & Woźniak (2024) <doi:10.32614/CRAN.package.bsvarSIGNs>, and they constitute an integrated toolset.

Maintained by Tomasz Woźniak. Last updated 1 month ago.

bayesian-inference econometrics vector-autoregression openblas cpp openmp

10.3 match 46 stars 7.67 score 32 scripts 1 dependent

jniedballa

camtrapR:Camera Trap Data Management and Preparation of Occupancy and Spatial Capture-Recapture Analyses

Management of and data extraction from camera trap data in wildlife studies. The package provides a workflow for storing and sorting camera trap photos (and videos), tabulates records of species and individuals, and creates detection/non-detection matrices for occupancy and spatial capture-recapture analyses with great flexibility. In addition, it can visualise species activity data and provides simple mapping functions with GIS export.

Maintained by Juergen Niedballa. Last updated 3 months ago.

occupancy-modeling spatial-capture-recapture wildlife

8.6 match 35 stars 8.65 score 178 scripts

plangfelder

WGCNA:Weighted Correlation Network Analysis

Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>. Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.

Maintained by Peter Langfelder. Last updated 6 months ago.

cpp

7.7 match 54 stars 9.65 score 5.3k scripts 32 dependents

abjur

abjutils:Useful Tools for Jurimetrical Analysis Used by the Brazilian Jurimetrics Association

The Brazilian Jurimetrics Association (ABJ in Portuguese, see <https://abj.org.br/> for more information) is a non-profit organization which aims to investigate and promote the use of statistics and probability in the study of Law and its institutions. This package implements general purpose tools used by ABJ, such as functions for sampling and basic manipulation of Brazilian lawsuit identification numbers. It also implements functions for text cleaning, such as accentuation removal.

Maintained by Caio Lente. Last updated 1 year ago.

jurimetrics toolkit

10.9 match 55 stars 6.76 score 78 scripts 1 dependent

yangcq-ivy

NicheBarcoding:Niche-model-Based Species Identification

Species Identification using DNA Barcodes Integrated with Environmental Niche Models.

Maintained by Cai-qing YANG. Last updated 7 months ago.

openjdk

16.7 match 1 star 4.18 score 7 scripts

jinghuazhao

gap:Genetic Analysis Package

As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many projects, which is also in line with its name (gap).

Maintained by Jing Hua Zhao. Last updated 18 days ago.

genetics imputation lmm fortran

5.7 match 12 stars 11.88 score 448 scripts 16 dependents

bioc

AMOUNTAIN:Active modules for multilayer weighted gene co-expression networks: a continuous optimization approach

A purely data-driven gene network, the weighted gene co-expression network (WGCN), can be constructed from expression profiles alone. Different layers in such networks may represent different time points, multiple conditions or various species. AMOUNTAIN aims to search active modules in multi-layer WGCN using a continuous optimization approach.

Maintained by Dong Li. Last updated 5 months ago.

geneexpression microarray differentialexpression network gsl

16.5 match 3.78 score 1 script 1 dependent

valentint

rrcovHD:Robust Multivariate Methods for High Dimensional Data

Robust multivariate methods for high dimensional data including outlier detection (Filzmoser and Todorov (2013) <doi:10.1016/j.ins.2012.10.017>), robust sparse PCA (Croux et al. (2013) <doi:10.1080/00401706.2012.727746>, Todorov and Filzmoser (2013) <doi:10.1007/978-3-642-33042-1_31>), robust PLS (Todorov and Filzmoser (2014) <doi:10.17713/ajs.v43i4.44>), and robust sparse classification (Ortner et al. (2020) <doi:10.1007/s10618-019-00666-8>).

Maintained by Valentin Todorov. Last updated 7 months ago.

cpp

18.3 match 3.39 score 49 scripts

cpanse

protViz:Visualizing and Analyzing Mass Spectrometry Related Data in Proteomics

Helps with quality checks, visualizations and analysis of mass spectrometry data, coming from proteomics experiments. The package is developed, tested and used at the Functional Genomics Center Zurich <https://fgcz.ch>. We use this package mainly for prototyping, teaching, and having fun with proteomics data. But it can also be used to do data analysis for small scale data sets.

Maintained by Christian Panse. Last updated 1 year ago.

fun mass-spectrometry peptide-identification proteomics quantification visualization cpp

7.5 match 11 stars 7.88 score 72 scripts 2 dependents

r-forge

car:Companion to Applied Regression

Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, 2019.

Maintained by John Fox. Last updated 5 months ago.

3.8 match 15.29 score 43k scripts 901 dependents

santikka

causaleffect:Deriving Expressions of Joint Interventional Distributions and Transport Formulas in Causal Models

Functions for identification and transportation of causal effects. Provides a conditional causal effect identification algorithm (IDC) by Shpitser, I. and Pearl, J. (2006) <http://ftp.cs.ucla.edu/pub/stat_ser/r329-uai.pdf>, an algorithm for transportability from multiple domains with limited experiments by Bareinboim, E. and Pearl, J. (2014) <http://ftp.cs.ucla.edu/pub/stat_ser/r443.pdf>, and a selection bias recovery algorithm by Bareinboim, E. and Tian, J. (2015) <http://ftp.cs.ucla.edu/pub/stat_ser/r445.pdf>. All of the previously mentioned algorithms are based on a causal effect identification algorithm by Tian, J. (2002) <http://ftp.cs.ucla.edu/pub/stat_ser/r309.pdf>.

Maintained by Santtu Tikka. Last updated 2 years ago.

causal-inference causal-models causality-algorithms directed-acyclic-graph graphs identifiability identification igraph

10.9 match 29 stars 5.28 score 44 scripts 1 dependent

pharmaverse

pharmaversesdtm:SDTM Test Data for the 'Pharmaverse' Family of Packages

A set of Study Data Tabulation Model (SDTM) datasets from the Clinical Data Interchange Standards Consortium (CDISC) pilot project used for testing and developing Analysis Data Model (ADaM) datasets inside the pharmaverse family of packages. SDTM dataset specifications are described in the CDISC SDTM implementation guide, accessible by creating a free account on <https://www.cdisc.org/>.

Maintained by Edoardo Mancini. Last updated 1 day ago.

7.5 match 15 stars 7.46 score 143 scripts

bioc

AlpsNMR:Automated spectraL Processing System for NMR

Reads Bruker NMR data directories both zipped and unzipped. It provides automated and efficient signal processing for untargeted NMR metabolomics. It is able to interpolate the samples, detect outliers, exclude regions, normalize, detect peaks, align the spectra, integrate peaks, manage metadata and visualize the spectra. After spectra processing, it can apply multivariate analysis on extracted data. Efficient plotting with 1-D data is also available. Basic reading of 1D ACD/Labs exported JDX samples is also available.

Maintained by Sergio Oller Moreno. Last updated 5 months ago.

software preprocessing visualization classification cheminformatics metabolomics dataimport

7.3 match 15 stars 7.59 score 12 scripts 1 dependent

laurafancello

net4pg:Handle Ambiguity of Protein Identifications from Shotgun Proteomics

In shotgun proteomics, shared peptides (i.e., peptides that might originate from different proteins sharing homology, from different proteoforms due to alternative mRNA splicing, post-translational modifications, proteolytic cleavages, and/or allelic variants) represent a major source of ambiguity in protein identifications. The 'net4pg' package helps assess and handle this ambiguity. It implements methods for two main applications. First, it represents and quantifies ambiguity of protein identifications by means of graph connected components (CCs). In graph theory, CCs are defined as the largest subgraphs in which any two vertices are connected to each other by a path and not connected to any other of the vertices in the supergraph. Here, proteins sharing one or more peptides are thus gathered in the same CC (multi-protein CC), while unambiguous protein identifications constitute CCs with a single protein vertex (single-protein CCs). Therefore, the proportion of single-protein CCs and the size of multi-protein CCs can be used to measure the level of ambiguity of protein identifications. The package implements a strategy to efficiently calculate graph connected components on large datasets and allows users to visually inspect them. Second, the 'net4pg' package exploits the increasing availability of matched transcriptomic and proteomic datasets to reduce ambiguity of protein identifications. More precisely, it implements a transcriptome-based filtering strategy that removes those proteins whose corresponding transcript is not expressed in the sample-matched transcriptome. The underlying assumption is that, according to the central dogma of biology, there can be no proteins without the corresponding transcript. Most importantly, the package allows users to visually inspect the effect of the filtering on protein identifications and to quantify ambiguity before and after filtering by means of graph connected components. As such, it constitutes a reproducible and transparent method to exploit transcriptome information to enhance protein identifications. All methods implemented in the 'net4pg' package are fully described in Fancello and Burger (2022) <doi:10.1186/s13059-022-02701-2>.

Maintained by Laura Fancello. Last updated 3 years ago.

13.7 match 2 stars 4.00 score 3 scripts
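
The connected-component idea in the 'net4pg' description can be illustrated with a small, language-agnostic sketch. The Python below is not the package's code or API; the function names (`connected_components`, `filter_by_transcriptome`, `single_protein_fraction`) and the toy peptide-to-protein mapping are hypothetical, and the graph is simplified to track only protein vertices. It groups proteins that share peptides, then repeats the grouping after dropping proteins without an expressed transcript.

```python
from collections import defaultdict


def connected_components(peptide_to_proteins):
    """Group proteins that share at least one peptide, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for proteins in peptide_to_proteins.values():
        proteins = list(proteins)
        for p in proteins:
            find(p)  # register every protein, even unshared ones
        for other in proteins[1:]:
            union(proteins[0], other)

    groups = defaultdict(set)
    for protein in parent:
        groups[find(protein)].add(protein)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def filter_by_transcriptome(peptide_to_proteins, expressed):
    """Keep only proteins whose transcript is expressed (hypothetical helper)."""
    filtered = {}
    for peptide, proteins in peptide_to_proteins.items():
        kept = [p for p in proteins if p in expressed]
        if kept:  # peptides left without any protein are dropped
            filtered[peptide] = kept
    return filtered


def single_protein_fraction(components):
    """Proportion of components holding exactly one protein."""
    return sum(len(c) == 1 for c in components) / len(components)


# Toy data (assumed): proteins A, B, C are linked through shared peptides.
peptides = {
    "pep1": ["A", "B"],
    "pep2": ["B", "C"],
    "pep3": ["D"],
    "pep4": ["E", "F"],
}
before = connected_components(peptides)
after = connected_components(
    filter_by_transcriptome(peptides, expressed={"A", "C", "D", "E"})
)
```

In this toy case the share of single-protein components rises after filtering, which is the kind of before/after comparison the description mentions.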

bioc

mzR:parser for netCDF, mzXML and mzML and mzIdentML files (mass spectrometry data)

mzR provides a unified API to the common file formats and parsers available for mass spectrometry data. It comes with a subset of the proteowizard library for mzXML, mzML and mzIdentML. The netCDF reading code has previously been used in XCMS.

Maintained by Steffen Neumann. Last updated 1 month ago.

immunooncology infrastructure dataimport proteomics metabolomics massspectrometry zlib cpp

4.3 match 45 stars 12.77 score 204 scripts 44 dependents

cran

apc:Age-Period-Cohort Analysis

Functions for age-period-cohort analysis. Aggregate data can be organised in matrices indexed by age-cohort, age-period or cohort-period. The data can include dose and response or just doses. The statistical model is a generalized linear model (GLM) allowing for 3,2,1 or 0 of the age-period-cohort factors. Individual-level data should have a row for each individual and columns for each of age, period, and cohort. The statistical model for repeated cross-section is a generalized linear model. The statistical model for panel data is ordinary least squares. The canonical parametrisation of Kuang, Nielsen and Nielsen (2008) <DOI:10.1093/biomet/asn026> is used. Thus, the analysis does not rely on ad hoc identification.

Maintained by Bent Nielsen. Last updated 4 years ago.

12.1 match 4.49 score 49 scripts

veronicanava

RamanMP:Analysis and Identification of Raman Spectra of Microplastics

Pre-processing and polymer identification of Raman spectra of plastics. Pre-processing includes normalisation functions, peak identification based on local maxima, smoothing process and removal of spectral region of no interest. Polymer identification can be performed using Pearson correlation coefficient or Euclidean distance (Renner et al. (2019), <doi:10.1016/j.trac.2018.12.004>), and the comparison can be done with a user-defined database or with the database already implemented in the package, which currently includes 356 spectra, with several spectra of plastic colorants.

Maintained by Veronica Nava. Last updated 3 years ago.

15.4 match 6 stars 3.48 score 1 script
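
As a rough illustration of the matching step named in the 'RamanMP' description (Pearson correlation or Euclidean distance against a reference database), here is a minimal Python sketch. It does not use the package's functions; `identify`, the two-entry `library`, and the five-point spectra are made up for illustration, and it assumes all spectra are already normalised and sampled on the same wavenumber grid.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length intensity vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def euclidean(x, y):
    """Euclidean distance between two equal-length intensity vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def identify(sample, library, method="pearson"):
    """Return (best reference name, score) for a sample spectrum."""
    if method == "pearson":
        scores = {name: pearson(sample, ref) for name, ref in library.items()}
        best = max(scores, key=scores.get)  # higher correlation is better
    elif method == "euclidean":
        scores = {name: euclidean(sample, ref) for name, ref in library.items()}
        best = min(scores, key=scores.get)  # smaller distance is better
    else:
        raise ValueError(f"unknown method: {method}")
    return best, scores[best]


# Hypothetical reference library and sample (not real Raman data).
library = {
    "polymer_a": [0.0, 1.0, 0.2, 0.1, 0.0],
    "polymer_b": [0.1, 0.1, 0.0, 1.0, 0.3],
}
sample = [0.05, 0.9, 0.25, 0.1, 0.0]

best_by_correlation = identify(sample, library, method="pearson")
best_by_distance = identify(sample, library, method="euclidean")
```

The two criteria rank in opposite directions (maximise correlation, minimise distance), so the sketch keeps them as separate branches instead of sharing one comparison.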

bioc

survtype:Subtype Identification with Survival Data

Subtypes are defined as groups of samples that have distinct molecular and clinical features. Genomic data can be analyzed for discovering patient subtypes, associated with clinical data, especially for survival information. This package is aimed to identify subtypes that are both clinically relevant and biologically meaningful.

Maintained by Dongmin Jung. Last updated 5 months ago.

software statisticalmethod geneexpression survival clustering sequencing coverage

13.0 match 4.00 score 3 scripts

bioc

betaHMM:A Hidden Markov Model Approach for Identifying Differentially Methylated Sites and Regions for Beta-Valued DNA Methylation Data

A novel approach utilizing a homogeneous hidden Markov model to effectively model untransformed beta values and to identify DMCs while considering the spatial correlation of the adjacent CpG sites.

Maintained by Koyel Majumdar. Last updated 3 months ago.

dnamethylation differentialmethylation immunooncology biomedicalinformatics methylationarray software multiplecomparison sequencing spatial coverage genetarget hiddenmarkovmodel microarray

12.3 match 4.18 score

magnusdv

dvir:Disaster Victim Identification

Joint DNA-based disaster victim identification (DVI), as described in Vigeland and Egeland (2021) <doi:10.21203/rs.3.rs-296414/v1>. Identification is performed by optimising the joint likelihood of all victim samples and reference individuals. Individual identification probabilities, conditional on all available information, are derived from the joint solution in the form of posterior pairing probabilities. 'dvir' is part of the 'pedsuite' collection of packages for pedigree analysis.

Maintained by Magnus Dehli Vigeland. Last updated 3 months ago.

dvi forensic-genetics

10.1 match 3 stars 5.05 score 21 scripts 1 dependent

a91quaini

intrinsicFRP:An R Package for Factor Model Asset Pricing

Functions for evaluating and testing asset pricing models, including estimation and testing of factor risk premia, selection of "strong" risk factors (factors having nonzero population correlation with test asset returns), heteroskedasticity and autocorrelation robust covariance matrix estimation and testing for model misspecification and identification. The functions for estimating and testing factor risk premia implement the Fama-MacBeth (1973) <doi:10.1086/260061> two-pass approach, the misspecification-robust approaches of Kan-Robotti-Shanken (2013) <doi:10.1111/jofi.12035>, and the approaches based on tradable factor risk premia of Quaini-Trojani-Yuan (2023) <doi:10.2139/ssrn.4574683>. The functions for selecting the "strong" risk factors are based on the Oracle estimator of Quaini-Trojani-Yuan (2023) <doi:10.2139/ssrn.4574683> and the factor screening procedure of Gospodinov-Kan-Robotti (2014) <doi:10.2139/ssrn.2579821>. The functions for evaluating model misspecification implement the HJ model misspecification distance of Kan-Robotti (2008) <doi:10.1016/j.jempfin.2008.03.003>, which is a modification of the prominent Hansen-Jagannathan (1997) <doi:10.1111/j.1540-6261.1997.tb04813.x> distance. The functions for testing model identification specialize the Kleibergen-Paap (2006) <doi:10.1016/j.jeconom.2005.02.011> and the Chen-Fang (2019) <doi:10.1111/j.1540-6261.1997.tb04813.x> rank test to the regression coefficient matrix of test asset returns on risk factors. Finally, the function for heteroskedasticity and autocorrelation robust covariance estimation implements the Newey-West (1994) <doi:10.2307/2297912> covariance estimator.

Maintained by Alberto Quaini. Last updated 8 months ago.

factor-models factor-selection finance identification-tests misspecification rcpparmadillo risk-premium openblas cpp openmp

11.5 match 7 stars 4.45 score 1 script

igordot

clustermole:Unbiased Single-Cell Transcriptomic Data Cell Type Identification

Assignment of cell type labels to single-cell RNA sequencing (scRNA-seq) clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging when unexpected or poorly described populations are present. The clustermole R package provides methods to query thousands of human and mouse cell identity markers sourced from a variety of databases.

Maintained by Igor Dolgalev. Last updated 1 year ago.

cell-type cell-type-annotation cell-type-classification cell-type-identification cell-type-matching gene-expression-signatures scrna-seq single-cell

9.5 match 13 stars 5.37 score 36 scripts

bioc

CNEr:CNE Detection and Visualization

Large-scale identification and advanced visualization of sets of conserved noncoding elements.

Maintained by Ge Tan. Last updated 5 months ago.

generegulation visualization dataimport

5.4 match 3 stars 9.28 score 35 scripts 19 dependents

bioc

miRspongeR:Identification and analysis of miRNA sponge regulation

This package provides several functions to explore miRNA sponge (also called ceRNA or miRNA decoy) regulation from putative miRNA-target interactions or/and transcriptomics data (including bulk, single-cell and spatial gene expression data). It provides eight popular methods for identifying miRNA sponge interactions, and an integrative method to integrate miRNA sponge interactions from different methods, as well as the functions to validate miRNA sponge interactions, and infer miRNA sponge modules, conduct enrichment analysis of miRNA sponge modules, and conduct survival analysis of miRNA sponge modules. By using a sample control variable strategy, it provides a function to infer sample-specific miRNA sponge interactions. In terms of sample-specific miRNA sponge interactions, it implements three similarity methods to construct sample-sample correlation network.

Maintained by Junpeng Zhang. Last updated 5 months ago.

geneexpression biomedicalinformatics networkenrichment survival microarray software singlecell spatial rnaseq cerna mirna sponge

8.4 match 5 stars 5.88 score 8 scripts

bioc

MSnbase:Base Functions and Classes for Mass Spectrometry and Proteomics

MSnbase provides infrastructure for manipulation, processing and visualisation of mass spectrometry and proteomics data, ranging from raw to quantitative and annotated data.

Maintained by Laurent Gatto. Last updated 4 days ago.

immunooncology infrastructure proteomics massspectrometry qualitycontrol dataimport bioconductor bioinformatics mass-spectrometry proteomics-data visualisation cpp

3.8 match 130 stars 12.81 score 772 scripts 36 dependents

asalavaty

influential:Identification and Classification of the Most Influential Nodes

Contains functions for the classification and ranking of top candidate features, reconstruction of networks from adjacency matrices and data frames, analysis of the topology of the network and calculation of centrality measures, and identification of the most influential nodes. Also, a function is provided for running SIRIR model, which is the combination of leave-one-out cross validation technique and the conventional SIR model, on a network to unsupervisedly rank the true influence of vertices. Additionally, some functions have been provided for the assessment of dependence and correlation of two network centrality measures as well as the conditional probability of deviation from their corresponding means in opposite direction. Fred Viole and David Nawrocki (2013, ISBN:1490523995). Csardi G, Nepusz T (2006). "The igraph software package for complex network research." InterJournal, Complex Systems, 1695. Adopted algorithms and sources are referenced in function document.

Maintained by Adrian Salavaty. Last updated 5 months ago.

centrality-measures classification-model influence-ranking network-analysis priaritization-model

7.4 match 27 stars 6.54 score 43 scripts 1 dependent

thibautjombart

adegenet:Exploratory Analysis of Genetic and Genomic Data

Toolset for the exploration of genetic and genomic data. Adegenet provides formal (S4) classes for storing and handling various genetic data, including genetic markers with varying ploidy and hierarchical population structure ('genind' class), alleles counts by populations ('genpop'), and genome-wide SNP data ('genlight'). It also implements original multivariate methods (DAPC, sPCA), graphics, statistical tests, simulation tools, distance and similarity measures, and several spatial methods. A range of both empirical and simulated datasets is also provided to illustrate various methods.

Maintained by Zhian N. Kamvar. Last updated 1 month ago.

3.7 match 182 stars 12.60 score 1.9k scripts 29 dependents

bioc

ILoReg:ILoReg: a tool for high-resolution cell population identification from scRNA-Seq data

ILoReg is a tool for identification of cell populations from scRNA-seq data. In particular, ILoReg is useful for finding cell populations with subtle transcriptomic differences. The method utilizes a self-supervised learning method, called Iterative Clustering Projection (ICP), to find cluster probabilities, which are used in noise reduction prior to PCA and the subsequent hierarchical clustering and t-SNE steps. Additionally, functions for differential expression analysis to find gene markers for the populations and gene expression visualization are provided.

Maintained by Johannes Smolander. Last updated 5 months ago.

singlecell software clustering dimensionreduction rnaseq visualization transcriptomics datarepresentation differentialexpression transcription geneexpression

9.3 match 5 stars 4.88 score 2 scripts

cran

deident:Persistent Data Anonymization Pipeline

A framework for the replicable removal of personally identifiable data (PID) in data sets. The package implements a suite of methods to suit different data types based on the suggestions of Garfinkel (2015) <doi:10.6028/NIST.IR.8053> and the ICO "Guidelines on Anonymization" (2012) <https://ico.org.uk/media/1061/anonymisation-code.pdf>.

Maintained by Robert Cook. Last updated 4 months ago.

14.2 match 3.16 score 16 scripts

klausvigo

kknn:Weighted k-Nearest Neighbors

Weighted k-Nearest Neighbors for Classification, Regression and Clustering.

Maintained by Klaus Schliep. Last updated 4 years ago.

nearest-neighbor

4.0 match 23 stars 11.08 score 4.6k scripts 41 dependents

bioc

synapter:Label-free data analysis pipeline for optimal identification and quantitation

The synapter package provides functionality to reanalyse label-free proteomics data acquired on a Synapt G2 mass spectrometer. One or several runs, possibly processed with additional ion mobility separation to increase identification accuracy, can be combined with other quantitation files to maximise identification and quantitation accuracy.

Maintained by Laurent Gatto. Last updated 6 days ago.

immunooncology massspectrometry proteomics qualitycontrol

9.1 match 4 stars 4.73 score 5 scripts

bioc

ASICS:Automatic Statistical Identification in Complex Spectra

With a set of pure metabolite reference spectra, ASICS quantifies concentration of metabolites in a complex spectrum. The identification of metabolites is performed by fitting a mixture model to the spectra of the library with a sparse penalty. The method and its statistical properties are described in Tardivel et al. (2017) <doi:10.1007/s11306-017-1244-5>.

Maintained by Gaëlle Lefort. Last updated 5 months ago.

software dataimport cheminformatics metabolomics

8.2 match 5.18 score 30 scripts

bioc

iPAC:Identification of Protein Amino acid Clustering

iPAC is a novel tool to identify somatic amino acid mutation clustering within proteins while taking into account protein structure.

Maintained by Gregory Ryslik. Last updated 4 days ago.

clustering proteomics

7.7 match 5.56 score 4 scripts 3 dependents

davidhofmeyr

PPCI:Projection Pursuit for Cluster Identification

Implements recently developed projection pursuit algorithms for finding optimal linear cluster separators. The clustering algorithms use optimal hyperplane separators based on minimum density, Pavlidis et al. (2016) <https://jmlr.csail.mit.edu/papers/volume17/15-307/15-307.pdf>; minimum normalised cut, Hofmeyr (2017) <doi:10.1109/TPAMI.2016.2609929>; and maximum variance ratio clusterability, Hofmeyr and Pavlidis (2015) <doi:10.1109/SSCI.2015.116>.

Maintained by David Hofmeyr. Last updated 5 years ago.

openblas cpp

12.5 match 2 stars 3.26 score 18 scripts

bioc

doubletrouble:Identification and classification of duplicated genes

doubletrouble aims to identify duplicated genes from whole-genome protein sequences and classify them based on their modes of duplication. The duplication modes are i. segmental duplication (SD); ii. tandem duplication (TD); iii. proximal duplication (PD); iv. transposed duplication (TRD) and; v. dispersed duplication (DD). Transposon-derived duplicates (TRD) can be further subdivided into rTRD (retrotransposon-derived duplication) and dTRD (DNA transposon-derived duplication). If users want a simpler classification scheme, duplicates can also be classified into SD- and SSD-derived (small-scale duplication) gene pairs. Besides classifying gene pairs, users can also classify genes, so that each gene is assigned a unique mode of duplication. Users can also calculate substitution rates per substitution site (i.e., Ka and Ks) from duplicate pairs, find peaks in Ks distributions with Gaussian Mixture Models (GMMs), and classify gene pairs into age groups based on Ks peaks.

Maintained by Fabrício Almeida-Silva. Last updated 5 days ago.

software wholegenome comparativegenomics functionalgenomics phylogenetics network classification bioinformatics comparative-genomics gene-duplication molecular-evolution whole-genome-duplication

6.3 match 23 stars 6.44 score 17 scripts

tidymodels

modeldata:Data Sets Useful for Modeling Examples

Data sets used for demonstrating or testing model-related packages are contained in this package.

Maintained by Max Kuhn. Last updated 5 months ago.

3.8 match 22 stars 10.66 score 2.2k scripts 17 dependents

rstudio

reticulate:Interface to 'Python'

Interface to 'Python' modules, classes, and functions. When calling into 'Python', R data types are automatically converted to their equivalent 'Python' types. When values are returned from 'Python' to R they are converted back to R types. Compatible with all versions of 'Python' >= 2.7.

Maintained by Tomasz Kalinowski. Last updated 22 hours ago.

cpp

1.9 match 1.7k stars 21.07 score 18k scripts 429 dependents

paterijk

MCDA:Support for the Multicriteria Decision Aiding Process

Support for the analyst in a Multicriteria Decision Aiding (MCDA) process with algorithms, preference elicitation and data visualisation functions. Sébastien Bigaret, Richard Hodgett, Patrick Meyer, Tatyana Mironova, Alexandru Olteanu (2017) Supporting the multi-criteria decision aiding process : R and the MCDA package, Euro Journal On Decision Processes, Volume 5, Issue 1 - 4, pages 169 - 194 <doi:10.1007/s40070-017-0064-1>.

Maintained by Patrick Meyer. Last updated 2 years ago.

6.5 match 30 stars 6.04 score 182 scripts

hdarjus

sparvaride:Variance Identification in Sparse Factor Analysis

This is an implementation of the algorithm described in Section 3 of Hosszejni and Frühwirth-Schnatter (2022) <doi:10.48550/arXiv.2211.00671>. The algorithm is used to verify that the counting rule CR(r,1) holds for the sparsity pattern of the transpose of a factor loading matrix. As detailed in Section 2 of the same paper, if CR(r,1) holds, then the idiosyncratic variances are generically identified. If CR(r,1) does not hold, then we do not know whether the idiosyncratic variances are identified or not.

Maintained by Darjus Hosszejni. Last updated 2 years ago.

econometrics factor-analysis latent-factors parameter-identification cpp

10.5 match 1 stars 3.70 score 4 scripts
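
For the sparvaride entry above, a counting rule of this kind can be illustrated with a brute-force check. This is a minimal sketch, not the package's algorithm (which the description attributes to Section 3 of the cited paper). It assumes CR(r,s) means that every set of q columns of the sparsity pattern (q = 1, ..., r) has at least 2q + s rows with a nonzero entry. The function name `counting_rule_holds` and the row/column orientation (rows are variables, columns are factors) are choices made for this example.

```python
from itertools import combinations

def counting_rule_holds(pattern, s=1):
    """Brute-force check of the assumed rule CR(r, s).

    pattern: list of rows of 0/1 flags; rows are observed variables,
    columns are factors (an orientation assumed for this sketch).
    """
    n_factors = len(pattern[0])
    for q in range(1, n_factors + 1):
        for cols in combinations(range(n_factors), q):
            # Count rows that load on at least one of the chosen factors.
            nonzero_rows = sum(1 for row in pattern if any(row[c] for c in cols))
            if nonzero_rows < 2 * q + s:
                return False
    return True

# Hypothetical two-factor sparsity pattern with five variables.
example_pattern = [
    [1, 0],
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 1],
]
rule_ok = counting_rule_holds(example_pattern)
```

The enumeration visits every subset of columns, so it is only practical for a small number of factors.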

bioc

IVAS:Identification of genetic Variants affecting Alternative Splicing

Identification of genetic variants affecting alternative splicing.

Maintained by Seonggyun Han. Last updated 5 months ago.

immunooncology alternativesplicing differentialexpression differentialsplicing geneexpression generegulation regression rnaseq sequencing snp software transcription

8.1 match 4.78 score 1 scripts 1 dependents

bioc

CluMSID:Clustering of MS2 Spectra for Metabolite Identification

CluMSID is a tool that aids the identification of features in untargeted LC-MS/MS analysis by the use of MS2 spectra similarity and unsupervised statistical methods. It offers functions for a complete and customisable workflow from raw data to visualisations and is interfaceable with the xcms family of preprocessing packages.

Maintained by Tobias Depke. Last updated 5 months ago.

metabolomics preprocessing clustering

6.4 match 10 stars 6.04 score 22 scripts

bioc

PIUMA:Phenotypes Identification Using Mapper from topological data Analysis

The PIUMA package offers a tidy pipeline of Topological Data Analysis frameworks to identify and characterize communities in high and heterogeneous dimensional data.

Maintained by Mattia Chiesa. Last updated 5 months ago.

clustering graphandnetwork dimensionreduction network classification

7.3 match 4 stars 5.08 score 2 scripts

cran

kappalab:Non-Additive Measure and Integral Manipulation Functions

S4 tool box for capacity (or non-additive measure, fuzzy measure) and integral manipulation in a finite setting. It contains routines for handling various types of set functions such as games or capacities. It can be used to compute several non-additive integrals: the Choquet integral, the Sugeno integral, and the symmetric and asymmetric Choquet integrals. An analysis of capacities in terms of decision behavior can be performed through the computation of various indices such as the Shapley value, the interaction index, the orness degree, etc. The well-known Möbius transform, as well as other equivalent representations of set functions can also be computed. Kappalab further contains seven capacity identification routines: three least squares based approaches, a method based on linear programming, a maximum entropy like method based on variance minimization, a minimum distance approach and an unsupervised approach based on parametric entropies. The functions contained in Kappalab can for instance be used in the framework of multicriteria decision making or cooperative game theory.

Maintained by Ivan Kojadinovic. Last updated 1 years ago.

16.7 match 2.21 score 5 dependents
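
For the kappalab entry above, the discrete Choquet integral can be sketched in a few lines. This is an illustration of the general definition only and does not use or mirror kappalab's S4 interface. The function name `choquet_integral`, the dictionary representation of the capacity, and the example values are assumptions made for this sketch.

```python
def choquet_integral(values, capacity):
    """Discrete Choquet integral of non-negative scores.

    values: dict mapping criterion name -> score (>= 0).
    capacity: dict mapping frozenset of criteria -> weight, with the full
    set mapped to 1 and larger sets never weighted less than their subsets.
    """
    # Visit criteria from the lowest score to the highest.
    order = sorted(values, key=values.get)
    total = 0.0
    previous = 0.0
    for i, criterion in enumerate(order):
        # Coalition of all criteria scoring at least as high as this one.
        upper = frozenset(order[i:])
        total += (values[criterion] - previous) * capacity[upper]
        previous = values[criterion]
    return total

# Hypothetical scores on two criteria.
scores = {"a": 2.0, "b": 4.0}

# An additive capacity: the integral reduces to a weighted mean.
additive = {
    frozenset(): 0.0,
    frozenset({"a"}): 0.3,
    frozenset({"b"}): 0.7,
    frozenset({"a", "b"}): 1.0,
}

# A non-additive capacity: single criteria count for little on their own.
complementary = {
    frozenset(): 0.0,
    frozenset({"a"}): 0.2,
    frozenset({"b"}): 0.2,
    frozenset({"a", "b"}): 1.0,
}

additive_result = choquet_integral(scores, additive)
complementary_result = choquet_integral(scores, complementary)
```

With the additive capacity the result equals the weighted mean 0.3 * 2 + 0.7 * 4. With the second capacity the lower score dominates, because neither criterion carries much weight alone.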

thomasjemielita

StratifiedMedicine:Stratified Medicine

A toolkit for stratified medicine, subgroup identification, and precision medicine. Current tools include (1) filtering models (reduce covariate space), (2) patient-level estimate models (counterfactual patient-level quantities, such as the conditional average treatment effect), (3) subgroup identification models (find subsets of patients with similar treatment effects), and (4) treatment effect estimation and inference (for the overall population and discovered subgroups). These tools can be customized and are directly used in PRISM (patient response identifiers for stratified medicine; Jemielita and Mehrotra 2019 <arXiv:1912.03337>). This package is in beta and will be continually updated.

Maintained by Thomas Jemielita. Last updated 3 years ago.

7.8 match 2 stars 4.73 score 27 scripts

xdomingoal

erah:Automated Spectral Deconvolution, Alignment, and Metabolite Identification in GC/MS-Based Untargeted Metabolomics

Automated compound deconvolution, alignment across samples, and identification of metabolites by spectral library matching in Gas Chromatography - Mass spectrometry (GC-MS) untargeted metabolomics. Outputs a table with compound names, matching scores and the integrated area of the compound for each sample. Package implementation is described in Domingo-Almenara et al. (2016) <doi:10.1021/acs.analchem.6b02927>.

Maintained by Xavier Domingo-Almenara. Last updated 1 years ago.

massspectrometry metabolomics

7.7 match 5 stars 4.70 score 20 scripts

bioc

rTRM:Identification of Transcriptional Regulatory Modules from Protein-Protein Interaction Networks

rTRM identifies transcriptional regulatory modules (TRMs) from protein-protein interaction networks.

Maintained by Diego Diez. Last updated 5 months ago.

transcription network generegulation graphandnetwork bioconductor bioinformatics

7.4 match 3 stars 4.86 score 3 scripts 1 dependents

kurthornik

mlbench:Machine Learning Benchmark Problems

A collection of artificial and real-world machine learning benchmark problems, including, e.g., several data sets from the UCI repository.

Maintained by Kurt Hornik. Last updated 3 months ago.

4.0 match 2 stars 8.93 score 5.0k scripts 55 dependents

bioboot

bio3d:Biological Structure Analysis

Utilities to process, organize and explore protein structure, sequence and dynamics data. Features include the ability to read and write structure, sequence and dynamic trajectory data, perform sequence and structure database searches, data summaries, atom selection, alignment, superposition, rigid core identification, clustering, torsion analysis, distance matrix analysis, structure and sequence conservation analysis, normal mode analysis, principal component analysis of heterogeneous structure data, and correlation network analysis from normal mode and molecular dynamics data. In addition, various utility functions are provided to enable the statistical and graphical power of the R environment to work with biological sequence and structural data. Please refer to the URLs below for more information.

Maintained by Barry Grant. Last updated 5 months ago.

zlib cpp

4.2 match 5 stars 8.49 score 1.4k scripts 10 dependents

mlampros

fastText:Efficient Learning of Word Representations and Sentence Classification

An interface to the 'fastText' <https://github.com/facebookresearch/fastText> library for efficient learning of word representations and sentence classification. The 'fastText' algorithm is explained in detail in (i) "Enriching Word Vectors with subword Information", Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov, 2017, <doi:10.1162/tacl_a_00051>; (ii) "Bag of Tricks for Efficient Text Classification", Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov, 2017, <doi:10.18653/v1/e17-2068>; (iii) "FastText.zip: Compressing text classification models", Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Herve Jegou, Tomas Mikolov, 2016, <arXiv:1612.03651>.

Maintained by Lampros Mouselimis. Last updated 1 years ago.

cpp11 fasttext cpp

4.8 match 42 stars 7.37 score 56 scripts

bioc

TADCompare:TADCompare: Identification and characterization of differential TADs

TADCompare is an R package designed to identify and characterize differential Topologically Associated Domains (TADs) between multiple Hi-C contact matrices. It contains functions for finding differential TADs between two datasets, finding differential TADs over time and identifying consensus TADs across multiple matrices. It takes all of the main types of HiC input and returns simple, comprehensive, easy to analyze results.

Maintained by Mikhail Dozmorov. Last updated 5 months ago.

software hic sequencing featureextraction clustering

5.0 match 23 stars 7.04 score 10 scripts

ingorohlfing

MMRcaseselection:Case Classification and Selection Based on Regression Results

Researchers doing a mixed-methods analysis (nested analysis as developed by Lieberman (2005) <doi:10.1017/S0003055405051762>) can use the package for the classification of cases and case selection using results of a linear regression. One can designate cases as typical, deviant, extreme and pathway case and use different case selection strategies for the choice of a case belonging to one of these types.

Maintained by Ingo Rohlfing. Last updated 3 years ago.

8.0 match 1 stars 4.38 score 12 scripts

bioc

ppcseq:Probabilistic Outlier Identification for RNA Sequencing Generalized Linear Models

Relative transcript abundance has proven to be a valuable tool for understanding the function of genes in biological systems. For the differential analysis of transcript abundance using RNA sequencing data, the negative binomial model is by far the most frequently adopted. However, common methods that are based on a negative binomial model are not robust to extreme outliers, which we found to be abundant in public datasets. So far, no rigorous and probabilistic methods for detection of outliers have been developed for RNA sequencing data, leaving the identification mostly to visual inspection. Recent advances in Bayesian computation allow large-scale comparison of observed data against its theoretical distribution given in a statistical model. Here we propose ppcseq, a key quality-control tool for identifying transcripts that include outlier data points in differential expression analysis, which do not follow a negative binomial distribution. Applying ppcseq to analyse several publicly available datasets using popular tools, we show that from 3 to 10 percent of differentially abundant transcripts across algorithms and datasets had statistics inflated by the presence of outliers.

Maintained by Stefano Mangiola. Last updated 5 months ago.

rnaseq differentialexpression geneexpression normalization clustering qualitycontrol sequencing transcription transcriptomics bayesian-inference deseq2 edger negative-binomial outlier stan cpp

6.1 match 8 stars 5.71 score 16 scripts

santikka

dosearch:Causal Effect Identification from Multiple Incomplete Data Sources

Identification of causal effects from arbitrary observational and experimental probability distributions via do-calculus and standard probability manipulations using a search-based algorithm by Tikka, Hyttinen and Karvanen (2021) <doi:10.18637/jss.v099.i05>. Allows for the presence of mechanisms related to selection bias (Bareinboim and Tian, 2015) <doi:10.1609/aaai.v29i1.9679>, transportability (Bareinboim and Pearl, 2014) <http://ftp.cs.ucla.edu/pub/stat_ser/r443.pdf>, missing data (Mohan, Pearl, and Tian, 2013) <http://ftp.cs.ucla.edu/pub/stat_ser/r410.pdf> and arbitrary combinations of these. Also supports identification in the presence of context-specific independence (CSI) relations through labeled directed acyclic graphs (LDAG). For details on CSIs see (Corander et al., 2019) <doi:10.1016/j.apal.2019.04.004>.

Maintained by Santtu Tikka. Last updated 8 months ago.

c-plus-plus causal-inference causal-models causality causality-algorithms directed-acyclic-graph graphs labeled-graphs cpp

6.5 match 7 stars 5.32 score 8 scripts 1 dependents

topepo

caret:Classification and Regression Training

Misc functions for training and plotting classification and regression models.

Maintained by Max Kuhn. Last updated 3 months ago.

1.8 match 1.6k stars 19.24 score 61k scripts 303 dependents

bioc

cardelino:Clone Identification from Single Cell Data

Methods to infer clonal tree configuration for a population of cells using single-cell RNA-seq data (scRNA-seq), and possibly other data modalities. Methods are also provided to assign cells to inferred clones and explore differences in gene expression between clones. These methods can flexibly integrate information from imperfect clonal trees inferred based on bulk exome-seq data, and sparse variant alleles expressed in scRNA-seq data. A flexible beta-binomial error model that accounts for stochastic dropout events as well as systematic allelic imbalance is used.

Maintained by Davis McCarthy. Last updated 5 months ago.

singlecell rnaseq visualization transcriptomics geneexpression sequencing software exomeseq clonal-clustering gibbs-sampling scrna-seq single-cell somatic-mutations

4.9 match 61 stars 7.05 score 62 scripts

bioc

CAMERA:Collection of annotation related methods for mass spectrometry data

Annotation of peaklists generated by xcms, rule based annotation of isotopes and adducts, isotope validation, EIC correlation based tagging of unknown adducts and fragments.

Maintained by Steffen Neumann. Last updated 5 months ago.

immunooncology massspectrometry metabolomics

3.3 match 11 stars 10.27 score 175 scripts 6 dependents

bioc

CEMiTool:Co-expression Modules identification Tool

The CEMiTool package unifies the discovery and the analysis of coexpression gene modules in a fully automatic manner, while providing a user-friendly html report with high quality graphs. Our tool evaluates if modules contain genes that are over-represented by specific pathways or that are altered in a specific sample group. Additionally, CEMiTool is able to integrate transcriptomic data with interactome information, identifying the potential hubs on each network.

Maintained by Helder Nakaya. Last updated 5 months ago.

geneexpression transcriptomics graphandnetwork mrnamicroarray rnaseq network networkenrichment pathways immunooncology

5.9 match 5.76 score 38 scripts

boopsboops

spider:Species Identity and Evolution in R

Analysis of species limits and DNA barcoding data. Included are functions for generating important summary statistics from DNA barcode data, assessing specimen identification efficacy, testing and optimizing divergence threshold limits, assessment of diagnostic nucleotides, and calculation of the probability of reciprocal monophyly. Additionally, a sliding window function offers opportunities to analyse information across a gene, often used for marker design in degraded DNA studies. Further information on the package has been published in Brown et al (2012) <doi:10.1111/j.1755-0998.2011.03108.x>.

Maintained by Rupert A. Collins. Last updated 6 years ago.

dna-barcode edna evolution species-delimitation species-identity

6.5 match 2 stars 5.20 score 66 scripts 1 dependents

bioc

FLAMES:FLAMES: Full Length Analysis of Mutations and Splicing in long read RNA-seq data

Semi-supervised isoform detection and annotation from both bulk and single-cell long read RNA-seq data. FLAMES provides automated pipelines for analysing isoforms, as well as intermediate functions for manual execution.

Maintained by Changqing Wang. Last updated 7 days ago.

rnaseq singlecell transcriptomics dataimport differentialsplicing alternativesplicing geneexpression longread zlib curl bzip2 xz-utils cpp

4.3 match 31 stars 7.95 score 12 scripts

prise6

aVirtualTwins:Adaptation of Virtual Twins Method from Jared Foster

Search for subgroups in randomized clinical trials with a binary outcome and two treatment groups. This is an adaptation of the Jared Foster method (<https://www.ncbi.nlm.nih.gov/pubmed/21815180>).

Maintained by Francois Vieille. Last updated 7 years ago.

subgroup-identification trials

7.5 match 4 stars 4.51 score 16 scripts

bioc

CHETAH:Fast and accurate scRNA-seq cell type identification

CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is an accurate, selective and fast scRNA-seq classifier. Classification is guided by a reference dataset, preferentially also a scRNA-seq dataset. By hierarchical clustering of the reference data, CHETAH creates a classification tree that enables a step-wise, top-to-bottom classification. Using a novel stopping rule, CHETAH classifies the input cells to the cell types of the references and to "intermediate types": more general classifications that ended in an intermediate node of the tree.

Maintained by Jurrian de Kanter. Last updated 5 months ago.

classification rnaseq singlecell clustering geneexpression immunooncology

4.6 match 44 stars 7.27 score 70 scripts

bioc

EventPointer:An effective identification of alternative splicing events using junction arrays and RNA-Seq data

EventPointer is an R package to identify alternative splicing events that involve either simple (case-control experiment) or complex experimental designs such as time course experiments and studies including paired-samples. The algorithm can be used to analyze data from either junction arrays (Affymetrix Arrays) or sequencing data (RNA-Seq). The software returns a data.frame with the detected alternative splicing events: gene name, type of event (cassette, alternative 3', etc.), genomic position, statistical significance and increment of the percent spliced in (Delta PSI) for all the events. The algorithm can generate a series of files to visualize the detected alternative splicing events in IGV. This eases the interpretation of results and the design of primers for standard PCR validation.

Maintained by Juan A. Ferrer-Bonsoms. Last updated 5 months ago.

alternativesplicing differentialsplicing mrnamicroarray rnaseq transcription sequencing timecourse immunooncology

5.4 match 4 stars 6.00 score 6 scripts
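
For the EventPointer entry above, the Delta PSI quantity can be illustrated with a read-count sketch. This shows only the textbook definition (PSI as the inclusion share of reads, Delta PSI as the difference between two conditions). It is not EventPointer's estimation procedure, and the function names and read counts are hypothetical.

```python
def psi(inclusion_reads, exclusion_reads):
    """Percent spliced in: share of reads supporting the inclusion isoform."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no reads cover this event")
    return inclusion_reads / total

def delta_psi(case_counts, control_counts):
    """Difference in PSI between two conditions.

    Each argument is an (inclusion_reads, exclusion_reads) pair.
    """
    return psi(*case_counts) - psi(*control_counts)

# Hypothetical counts for one cassette exon event.
case = (80, 20)      # 80 inclusion reads, 20 exclusion reads
control = (50, 50)
change = delta_psi(case, control)
```

A positive value means the event is included more often in the case condition than in the control.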

emmanuelparadis

ape:Analyses of Phylogenetics and Evolution

Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.

Maintained by Emmanuel Paradis. Last updated 2 days ago.

openblas cpp

1.9 match 64 stars 17.22 score 13k scripts 599 dependents

r-econometrics

lfe:Linear Group Fixed Effects

Transforms away factors with many levels prior to doing an OLS. Useful for estimating linear models with multiple group fixed effects, and for estimating linear models which use factors with many levels as pure control variables. See Gaure (2013) <doi:10.1016/j.csda.2013.03.024>. Includes support for instrumental variables, conditional F statistics for weak instruments, robust and multi-way clustered standard errors, as well as limited mobility bias correction (Gaure 2014 <doi:10.1002/sta4.68>). Since version 3.0, it provides dedicated functions to estimate Poisson models.

Maintained by Mauricio Vargas Sepulveda. Last updated 1 years ago.

openblas

3.1 match 10.30 score 1.8k scripts 5 dependents

jkcshea

ivmte:Instrumental Variables: Extrapolation by Marginal Treatment Effects

The marginal treatment effect was introduced by Heckman and Vytlacil (2005) <doi:10.1111/j.1468-0262.2005.00594.x> to provide a choice-theoretic interpretation to instrumental variables models that maintain the monotonicity condition of Imbens and Angrist (1994) <doi:10.2307/2951620>. This interpretation can be used to extrapolate from the compliers to estimate treatment effects for other subpopulations. This package provides a flexible set of methods for conducting this extrapolation. It allows for parametric or nonparametric sieve estimation, and allows the user to maintain shape restrictions such as monotonicity. The package operates in the general framework developed by Mogstad, Santos and Torgovitsky (2018) <doi:10.3982/ECTA15463>, and accommodates either point identification or partial identification (bounds). In the partially identified case, bounds are computed using either linear programming or quadratically constrained quadratic programming. Support for four solvers is provided. Gurobi and the Gurobi R API can be obtained from <http://www.gurobi.com/index>. CPLEX can be obtained from <https://www.ibm.com/analytics/cplex-optimizer>. CPLEX R APIs 'Rcplex' and 'cplexAPI' are available from CRAN. MOSEK and the MOSEK R API can be obtained from <https://www.mosek.com/>. The lp_solve library is freely available from <http://lpsolve.sourceforge.net/5.5/>, and is included when installing its API 'lpSolveAPI', which is available from CRAN.

Maintained by Joshua Shea. Last updated 7 months ago.

6.0 match 18 stars 5.33 score 30 scripts

bioc

geneAttribution:Identification of candidate genes associated with genetic variation

Identification of the most likely gene or genes through which variation at a given genomic locus in the human genome acts. The most basic functionality assumes that the closer a gene is to the input locus, the more likely the gene is to be causative. Additionally, any empirical data that links genomic regions to genes (e.g. eQTL or genome conformation data) can be used if it is supplied in the UCSC .BED file format.

Maintained by Arthur Wuster. Last updated 5 months ago.

snp geneprediction genomewideassociation variantannotation genomicvariation

7.9 match 4.00 score 3 scripts
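
For the geneAttribution entry above, the distance-based assumption can be sketched as a normalized weighting over nearby genes. This is an illustration only: the exponential decay form, the `decay_rate` value, the function name `distance_weights`, and the gene names and coordinates are all assumptions made for this example, not the package's interface or defaults.

```python
import math

def distance_weights(locus, gene_positions, decay_rate=1e-5):
    """Turn distances to a locus into weights that sum to 1.

    locus: genomic coordinate of the variant (base pairs).
    gene_positions: dict mapping gene name -> coordinate on the same chromosome.
    decay_rate: assumed exponential decay per base pair (illustrative value).
    """
    raw = {
        gene: math.exp(-decay_rate * abs(position - locus))
        for gene, position in gene_positions.items()
    }
    total = sum(raw.values())
    return {gene: weight / total for gene, weight in raw.items()}

# Hypothetical genes around a variant at position 1,000,000.
weights = distance_weights(
    1_000_000,
    {"GENE_A": 1_010_000, "GENE_B": 1_200_000, "GENE_C": 700_000},
)
```

Closer genes receive larger weights, and the weights can be read as relative plausibility under the distance-only assumption.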

ericarcher

banter:BioAcoustic eveNT classifiER

Create a hierarchical acoustic event species classifier out of multiple call type detectors as described in Rankin et al (2017) <doi:10.1111/mms.12381>.

Maintained by Eric Archer. Last updated 1 years ago.

acoustics bioacoustics cetaceans classification dolphins machine-learning noaa random-forest species-identification supervised-learning supervised-machine-learning whales jags cpp

7.5 match 9 stars 4.22 score 37 scripts

atbounds

ATbounds:Bounding Treatment Effects by Limited Information Pooling

Estimation and inference methods for bounding average treatment effects (on the treated) that are valid under an unconfoundedness assumption. The bounds are designed to be robust in challenging situations, for example, when the conditioning variables take on a large number of different values in the observed sample, or when the overlap condition is violated. This robustness is achieved by only using limited "pooling" of information across observations. For more details, see the paper by Lee and Weidner (2021), "Bounding Treatment Effects by Pooling Limited Information across Observations," <arXiv:2111.05243>.

Maintained by Sokbae Lee. Last updated 3 years ago.

causal-inference lack-of-overlap limited-overlap partial-identification treatment-effects unconfoundedness-assumption

7.5 match 3 stars 4.18 score 6 scripts

bioc

BioNERO:Biological Network Reconstruction Omnibus

BioNERO aims to integrate all aspects of biological network inference in a single package, including data preprocessing, exploratory analyses, network inference, and analyses for biological interpretations. BioNERO can be used to infer gene coexpression networks (GCNs) and gene regulatory networks (GRNs) from gene expression data. Additionally, it can be used to explore topological properties of protein-protein interaction (PPI) networks. GCN inference relies on the popular WGCNA algorithm. GRN inference is based on the "wisdom of the crowds" principle, which consists in inferring GRNs with multiple algorithms (here, CLR, GENIE3 and ARACNE) and calculating the average rank for each interaction pair. As all steps of network analyses are included in this package, BioNERO spares users from having to learn the syntaxes of several packages and how to communicate between them. Finally, users can also identify consensus modules across independent expression sets and calculate intra and interspecies module preservation statistics between different networks.

Maintained by Fabricio Almeida-Silva. Last updated 5 months ago.

software geneexpression generegulation systemsbiology graphandnetwork preprocessing network networkinference

4.0 match 27 stars 7.78 score 50 scripts 1 dependents

spluque

diveMove:Dive Analysis and Calibration

Utilities to represent, visualize, filter, analyse, and summarize time-depth recorder (TDR) data. Miscellaneous functions for handling location data are also provided.

Maintained by Sebastian P. Luque. Last updated 5 months ago.

animal-behavior behavioural-ecology biology diving science

4.6 match 6 stars 6.75 score 55 scripts

bioc

IPO:Automated Optimization of XCMS Data Processing parameters

The outcome of XCMS data processing strongly depends on the parameter settings. IPO (`Isotopologue Parameter Optimization`) is a parameter optimization tool that is applicable for different kinds of samples and liquid chromatography coupled to high resolution mass spectrometry devices, fast and free of labeling steps. IPO uses natural, stable 13C isotopes to calculate a peak picking score. Retention time correction is optimized by minimizing the relative retention time differences within features and grouping parameters are optimized by maximizing the number of features showing exactly one peak from each injection of a pooled sample. The different parameter settings are achieved by design of experiment. The resulting scores are evaluated using response surface models.

Maintained by Thomas Lieb. Last updated 5 months ago.

immunooncology metabolomics massspectrometry

3.8 match 34 stars 8.14 score 41 scripts

trevorhastie

mda:Mixture and Flexible Discriminant Analysis

Mixture and flexible discriminant analysis, multivariate adaptive regression splines (MARS), BRUTO, and vector-response smoothing splines. Hastie, Tibshirani and Friedman (2009) "Elements of Statistical Learning (second edition, chap 12)" Springer, New York.

Maintained by Trevor Hastie. Last updated 4 months ago.

fortran

4.0 match 3 stars 7.60 score 428 scripts 17 dependents

bioc

mzID:An mzIdentML parser for R

A parser for mzIdentML files implemented using the XML package. The parser tries to be general and able to handle all types of mzIdentML files with the drawback of having less 'pretty' output than a vendor specific parser. Please contact the maintainer with any problems and supply an mzIdentML file so the problems can be fixed quickly.

Maintained by Laurent Gatto. Last updated 5 months ago.

immunooncology dataimport massspectrometry proteomics

3.9 match 7.83 score 32 scripts 38 dependents

sokbae

ciccr:Causal Inference in Case-Control and Case-Population Studies

Estimation and inference methods for causal relative and attributable risk in case-control and case-population studies under the monotone treatment response and monotone treatment selection assumptions. For more details, see the paper by Jun and Lee (2023), "Causal Inference under Outcome-Based Sampling with Monotonicity Assumptions," <arXiv:2004.08318 [econ.EM]>, accepted for publication in Journal of Business & Economic Statistics.

Maintained by Sokbae Lee. Last updated 1 years ago.

case-control-studies causal-inference partial-identification treatment-effects

7.5 match 2 stars 4.00 score 4 scripts

eriqande

rubias:Bayesian Inference from the Conditional Genetic Stock Identification Model

Implements Bayesian inference for the conditional genetic stock identification model. It allows inference of mixed fisheries and also simulation of mixtures to predict accuracy. A full description of the underlying methods is available in a recently published article in the Canadian Journal of Fisheries and Aquatic Sciences: <doi:10.1139/cjfas-2018-0016>.

Maintained by Eric C. Anderson. Last updated 1 years ago.

noaa-omics-software cpp

5.1 match 3 stars 5.90 score 89 scripts

schaubert

catdata:Categorical Data

This R-package contains examples from the book "Regression for Categorical Data", Tutz 2012, Cambridge University Press. The names of the examples refer to the chapter and the data set that is used.

Maintained by Gunther Schauberger. Last updated 1 years ago.

4.5 match 6.61 score 158 scripts 2 dependents

bioc

DNAfusion:Identification of gene fusions using paired-end sequencing

DNAfusion can identify gene fusions such as EML4-ALK based on paired-end sequencing results. This package was developed using position deduplicated BAM files generated with the AVENIO Oncology Analysis Software. These files are made using the AVENIO ctDNA surveillance kit and Illumina Nextseq 500 sequencing. This is a targeted hybridization NGS approach and includes ALK-specific but not EML4-specific probes.

Maintained by Christoffer Trier Maansson. Last updated 5 months ago.

targetedresequencing genetics genefusiondetection sequencing bioconductor-package circulating-tumor-dna gene-fusion liquid-biopsy next-generation-sequencing targeted-sequencing variant-calling

6.6 match 3 stars 4.48 score 10 scripts

biogenies

CancerGram:Prediction of Anticancer Peptides

Predicts anticancer peptides using random forests trained on the n-gram encoded peptides. The implemented algorithm can be accessed from both the command line and shiny-based GUI. The CancerGram model is too large for CRAN and it has to be downloaded separately from the repository: <https://github.com/BioGenies/CancerGramModel>. For more information see: Burdukiewicz et al. (2020) <doi:10.3390/pharmaceutics12111045>.

Maintained by Michal Burdukiewicz. Last updated 4 years ago.

anticancer-peptides bioinformatics k-mer n-gram peptide-identification random-forests

7.5 match 4 stars 3.90 score 3 scripts

cvxgrp

CVXR:Disciplined Convex Optimization

An object-oriented modeling language for disciplined convex programming (DCP) as described in Fu, Narasimhan, and Boyd (2020, <doi:10.18637/jss.v094.i14>). It allows the user to formulate convex optimization problems in a natural way following mathematical convention and DCP rules. The system analyzes the problem, verifies its convexity, converts it into a canonical form, and hands it off to an appropriate solver to obtain the solution. Interfaces to solvers on CRAN and elsewhere are provided, both commercial and open source.

Maintained by Anqi Fu. Last updated 4 months ago.

cpp

2.3 match 207 stars 12.89 score 768 scripts 51 dependents

dlcarl

TSCI:Tools for Causal Inference with Possibly Invalid Instrumental Variables

Two stage curvature identification with machine learning for causal inference in settings when instrumental variable regression is not suitable because of potentially invalid instrumental variables. Based on Guo and Buehlmann (2022) "Two Stage Curvature Identification with Machine Learning: Causal Inference with Possibly Invalid Instrumental Variables" <arXiv:2203.12808>. The vignette is available in Carl, Emmenegger, Bühlmann and Guo (2023) "TSCI: two stage curvature identification for causal inference with invalid instruments" <arXiv:2304.00513>.

Maintained by David Carl. Last updated 1 years ago.

9.6 match 1 stars 3.00 score 3 scripts

bioc

MethPed:A DNA methylation classifier tool for the identification of pediatric brain tumor subtypes

Classification of pediatric tumors into biologically defined subtypes is challenging, and multifaceted approaches are needed. For this aim, we developed a diagnostic classifier based on DNA methylation profiles. We offer MethPed as an easy-to-use toolbox that allows researchers and clinical diagnosticians to test single samples as well as large cohorts for subclass prediction of pediatric brain tumors. The current version of MethPed can classify the following tumor diagnoses/subgroups: Diffuse Intrinsic Pontine Glioma (DIPG), Ependymoma, Embryonal tumors with multilayered rosettes (ETMR), Glioblastoma (GBM), Medulloblastoma (MB) - Group 3 (MB_Gr3), Group 4 (MB_Gr4), Group WNT (MB_WNT), Group SHH (MB_SHH) and Pilocytic Astrocytoma (PiloAstro).

Maintained by Helena Carén. Last updated 5 months ago.

immunooncology dnamethylation classification epigenetics

7.2 match 4.00 score 1 scripts

ropengov

hetu:Structural Handling of Finnish Personal Identity Codes

Structural handling of Finnish identity codes (natural persons and organizations); extract information, check ID validity and diagnostics.

Maintained by Pyry Kantanen. Last updated 4 months ago.

ropengov

5.9 match 2 stars 4.86 score 18 scripts
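
The ID validity check mentioned in the hetu entry above can be illustrated with a minimal Python sketch of the check-character rule for Finnish personal identity codes. This is not the hetu package's API: the function names are hypothetical, and the sketch assumes the usual 11-character layout (DDMMYY, a century sign, a three-digit individual number, a check character). It does not validate the date or the century sign.

```python
# Check characters indexed by remainder 0..30 (G, I, O, Q and Z are not used).
CHECK_CHARS = "0123456789ABCDEFHJKLMNPRSTUVWXY"


def hetu_check_char(ddmmyy: str, individual: str) -> str:
    """Return the check character for a birth date and individual number."""
    # The nine digits are read as one integer and reduced modulo 31.
    number = int(ddmmyy + individual)
    return CHECK_CHARS[number % 31]


def has_valid_check_char(code: str) -> bool:
    """Check only the final character of a code such as '131052-308T'."""
    if len(code) != 11:
        return False
    # Position 6 holds the century sign, which this sketch does not inspect.
    ddmmyy, individual, check = code[:6], code[7:10], code[10]
    if not (ddmmyy.isdigit() and individual.isdigit()):
        return False
    return hetu_check_char(ddmmyy, individual) == check
```

For the commonly cited sample code 131052-308T, the nine digits 131052308 leave remainder 25 when divided by 31, which maps to T.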

bioc

LOBSTAHS:Lipid and Oxylipin Biomarker Screening through Adduct Hierarchy Sequences

LOBSTAHS is a multifunction package for screening, annotation, and putative identification of mass spectral features in large, HPLC-MS lipid datasets. In silico data for a wide range of lipids, oxidized lipids, and oxylipins can be generated from user-supplied structural criteria with a database generation function. LOBSTAHS then applies these databases to assign putative compound identities to features in any high-mass accuracy dataset that has been processed using xcms and CAMERA. Users can then apply a series of orthogonal screening criteria based on adduct ion formation patterns, chromatographic retention time, and other properties, to evaluate and assign confidence scores to this list of preliminary assignments. During the screening routine, LOBSTAHS rejects assignments that do not meet the specified criteria, identifies potential isomers and isobars, and assigns a variety of annotation codes to assist the user in evaluating the accuracy of each assignment.

Maintained by Henry Holm. Last updated 5 months ago.

immunooncology massspectrometry metabolomics lipidomics dataimport adduct algae bioconductor hplc-esi-ms lipid mass-spectrometry oxidative-stress-biomarkers oxidized-lipids oxylipins plankton

4.3 match 8 stars 6.56 score 9 scripts

conradwasko

hydroEvents:Extract Event Statistics in Hydrologic Time Series

Events from individual hydrologic time series are extracted, and events from multiple time series can be matched to each other. Tang, W. & Carey, S. K. (2017) <doi:10.1002/hyp.11185>. Kaur, S., Horne, A., Stewardson, M.J., Nathan, R., Costa, A.M., Szemis, J.M., & Webb, J.A. (2017) <doi:10.1080/24705357.2016.1276418>. Ladson, A., Brown, R., Neal, B., & Nathan, R. J. (2013) <doi:10.7158/W12-028.2013.17.1>.

Maintained by Conrad Wasko. Last updated 1 months ago.

7.0 match 6 stars 4.03 score 36 scripts

bioc

nnSVG:Scalable identification of spatially variable genes in spatially-resolved transcriptomics data

Method for scalable identification of spatially variable genes (SVGs) in spatially-resolved transcriptomics data. The method is based on nearest-neighbor Gaussian processes and uses the BRISC algorithm for model fitting and parameter estimation. Allows identification and ranking of SVGs with flexible length scales across a tissue slide or within spatial domains defined by covariates. Scales linearly with the number of spatial locations and can be applied to datasets containing thousands or more spatial locations.

Maintained by Lukas M. Weber. Last updated 20 days ago.

spatial singlecell transcriptomics geneexpression preprocessing

3.6 match 17 stars 7.75 score 183 scripts 1 dependents

bioc

cogeqc:Systematic quality checks on comparative genomics analyses

cogeqc aims to facilitate systematic quality checks on standard comparative genomics analyses to help researchers detect issues and select the most suitable parameters for each data set. cogeqc can be used to assess: i. genome assembly and annotation quality with BUSCOs and comparisons of statistics with publicly available genomes on the NCBI; ii. orthogroup inference using a protein domain-based approach; and iii. synteny detection using synteny network properties. There are also data visualization functions to explore QC summary statistics.

Maintained by Fabrício Almeida-Silva. Last updated 5 months ago.

software genomeassembly comparativegenomics functionalgenomics phylogenetics qualitycontrol network comparative-genomics evolutionary-genomics

4.5 match 10 stars 6.08 score 20 scripts

clavellab

maldipickr:Dereplicate and Cherry-Pick Mass Spectrometry Spectra

Convenient wrapper functions for the analysis of matrix-assisted laser desorption/ionization-time-of-flight (MALDI-TOF) spectra data in order to select only representative spectra (also called cherry-picking). The package covers the preprocessing and dereplication steps (based on Strejcek, Smrhova, Junkova and Uhlik (2018) <doi:10.3389/fmicb.2018.01294>) needed to cluster MALDI-TOF spectra before the final cherry-picking step. It enables the easy exclusion of spectra and/or clusters to accommodate complex cherry-picking strategies. Alternatively, cherry-picking using taxonomic identification MALDI-TOF data is made easy with functions to import inconsistently formatted reports.

Maintained by Charlie Pauvert. Last updated 25 days ago.

cherry-pick dereplication maldi-tof-ms

5.1 match 2 stars 5.32 score 8 scripts

cran

mpwR:Standardized Comparison of Workflows in Mass Spectrometry-Based Bottom-Up Proteomics

Useful functions to analyze proteomic workflows including number of identifications, data completeness, missed cleavages, quantitative and retention time precision etc. Various software outputs are supported such as 'ProteomeDiscoverer', 'Spectronaut', 'DIA-NN' and 'MaxQuant'.

Maintained by Oliver Kardell. Last updated 1 years ago.

8.3 match 3.30 score

bioc

GARS:GARS: Genetic Algorithm for the identification of Robust Subsets of variables in high-dimensional and challenging datasets

Feature selection aims to identify and remove redundant, irrelevant and noisy variables from high-dimensional datasets. Selecting informative features affects the subsequent classification and regression analyses by improving their overall performance. Several methods have been proposed to perform feature selection: most of them rely on univariate statistics, correlation, entropy measurements or the usage of backward/forward regressions. Herein, we propose an efficient, robust and fast method that adopts stochastic optimization approaches for high-dimensional data. GARS is an innovative implementation of a genetic algorithm that selects robust features in high-dimensional and challenging datasets.

Maintained by Mattia Chiesa. Last updated 5 months ago.

classification featureextraction clustering openjdk

5.5 match 5.00 score 2 scripts

ncordon

imbalance:Preprocessing Algorithms for Imbalanced Datasets

Class imbalance usually damages the performance of classifiers. Thus, it is important to treat data before applying a classifier algorithm. This package includes recent resampling algorithms in the literature: (Barua et al. 2014) <doi:10.1109/tkde.2012.232>; (Das et al. 2015) <doi:10.1109/tkde.2014.2324567>, (Zhang et al. 2014) <doi:10.1016/j.inffus.2013.12.003>; (Gao et al. 2014) <doi:10.1016/j.neucom.2014.02.006>; (Almogahed et al. 2014) <doi:10.1007/s00500-014-1484-5>. It also includes a useful interface to perform oversampling.

Maintained by Ignacio Cordón. Last updated 5 years ago.

binary-classification imbalanced-data oversampling openblas cpp

3.8 match 36 stars 7.14 score 98 scripts

bioc

GraphPAC:Identification of Mutational Clusters in Proteins via a Graph Theoretical Approach.

Identifies mutational clusters of amino acids in a protein while utilizing the protein's tertiary structure via a graph theoretical model.

Maintained by Gregory Ryslik. Last updated 4 days ago.

clustering proteomics

5.7 match 4.65 score 1 scripts 1 dependents

rtsay1

MTS:All-Purpose Toolkit for Analyzing Multivariate Time Series (MTS) and Estimating Multivariate Volatility Models

Multivariate Time Series (MTS) is a general package for analyzing multivariate linear time series and estimating multivariate volatility models. It also handles factor models, constrained factor models, asymptotic principal component analysis commonly used in finance and econometrics, and principal volatility component analysis. (a) For the multivariate linear time series analysis, the package performs model specification, estimation, model checking, and prediction for many widely used models, including vector AR models, vector MA models, vector ARMA models, seasonal vector ARMA models, VAR models with exogenous variables, multivariate regression models with time series errors, augmented VAR models, and Error-correction VAR models for co-integrated time series. For model specification, the package performs structural specification to overcome the difficulties of identifiability of VARMA models. The methods used for structural specification include Kronecker indices and Scalar Component Models. (b) For multivariate volatility modeling, the MTS package handles several commonly used models, including multivariate exponentially weighted moving-average volatility, Cholesky decomposition volatility models, dynamic conditional correlation (DCC) models, copula-based volatility models, and low-dimensional BEKK models. The package also considers multiple tests for conditional heteroscedasticity, including rank-based statistics. (c) Finally, the MTS package also performs forecasting using diffusion index, transfer function analysis, Bayesian estimation of VAR models, and multivariate time series analysis with missing values. Users can also use the package to simulate VARMA models, to compute impulse response functions of a fitted VARMA model, and to calculate theoretical cross-covariance matrices of a given VARMA model.

Maintained by Ruey S. Tsay. Last updated 3 years ago.

cpp

4.0 match 6 stars 6.52 score 272 scripts 6 dependents

kjhealy

gssrdoc:Document General Social Survey Variable

The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS see <http://gss.norc.org>.

Maintained by Kieran Healy. Last updated 11 months ago.

11.4 match 2.28 score 38 scripts

gabrielelubatti

MitoHEAR:Quantification of Mitochondrial DNA Heteroplasmy

R package that allows the estimation and downstream statistical analysis of the mitochondrial DNA Heteroplasmy calculated from single-cell datasets.

Maintained by Gabriele Lubatti. Last updated 3 years ago.

software

5.8 match 4.45 score 14 scripts

magnusdv

forrel:Forensic Pedigree Analysis and Relatedness Inference

Forensic applications of pedigree analysis, including likelihood ratios for relationship testing, general relatedness inference, marker simulation, and power analysis. 'forrel' is part of the 'pedsuite', a collection of packages for pedigree analysis, further described in the book 'Pedigree Analysis in R' (Vigeland, 2021, ISBN:9780128244302). Several functions deal specifically with power analysis in missing person cases, implementing methods described in Vigeland et al. (2020) <doi:10.1016/j.fsigen.2020.102376>. Data import from the 'Familias' software (Egeland et al. (2000) <doi:10.1016/S0379-0738(00)00147-X>) is supported through the 'pedFamilias' package.

Maintained by Magnus Dehli Vigeland. Last updated 6 days ago.

3.6 match 11 stars 6.98 score 63 scripts 7 dependents

rfael0cm

RTIGER:HMM-Based Model for Genotyping and Cross-Over Identification

Our method integrates information from all sequenced samples, thus avoiding loss of alleles due to low coverage. Moreover, it increases the statistical power to uncover sequencing or alignment errors <doi:10.1093/plphys/kiad191>.

Maintained by Rafael Campos-Martin. Last updated 1 years ago.

genomeannotation hiddenmarkovmodel sequencing

5.8 match 4 stars 4.30 score 5 scripts

petertuwien

mvoutlier:Multivariate Outlier Detection Based on Robust Methods

Various methods for multivariate outlier detection: arw, a Mahalanobis-type method with an adaptive outlier cutoff value; locout, a method incorporating local neighborhood; pcout, a method for high-dimensional data; mvoutlier.CoDa, a method for compositional data. References are provided in the corresponding help files.

Maintained by P. Filzmoser. Last updated 4 years ago.

5.1 match 1 stars 4.84 score 294 scripts 4 dependents

jackmwolf

tehtuner:Fit and Tune Models to Detect Treatment Effect Heterogeneity

Implements methods to fit Virtual Twins models (Foster et al. (2011) <doi:10.1002/sim.4322>) for identifying subgroups with differential effects in the context of clinical trials while controlling the probability of falsely detecting a differential effect when the conditional average treatment effect is uniform across the study population using parameter selection methods proposed in Wolf et al. (2022) <doi:10.1177/17407745221095855>.

Maintained by Jack Wolf. Last updated 2 years ago.

clinical-trials heterogeneity-of-treatment-effect subgroup-identification

7.5 match 4 stars 3.30 score 6 scripts

bioc

CARNIVAL:A CAusal Reasoning tool for Network Identification (from gene expression data) using Integer VALue programming

An upgraded causal reasoning tool from Melas et al. in R with updated assignments of TFs' weights from PROGENy scores. Optimization parameters can be freely adjusted and multiple solutions can be obtained and aggregated.

Maintained by Attila Gabor. Last updated 5 months ago.

transcriptomics geneexpression network causal-models footprints integer-linear-programming pathway-enrichment-analysis

2.7 match 57 stars 9.03 score 90 scripts 1 dependents

bruigtp

REDCapDM:'REDCap' Data Management

REDCapDM is an R package that allows users to manage data exported directly from REDCap or retrieved through an API connection. This package includes several functions designed for pre-processing data, generating reports of queries such as outliers or missing values, and following up on the identified queries. 'REDCap' (Research Electronic Data CAPture; <https://projectredcap.org>) is a web application developed at Vanderbilt University, designed for creating and managing online surveys and databases. The REDCap API is an interface that allows external applications to connect to REDCap remotely, and is used to programmatically retrieve or modify project data or settings within REDCap, such as importing or exporting data.

Maintained by João Carmezim. Last updated 4 days ago.

4.1 match 4 stars 5.89 score 9 scripts

bioc

mixOmics:Omics Data Integration Project

Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non-exhaustive list of methods includes variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.

Maintained by Eva Hamrud. Last updated 5 days ago.

immunooncology microarray sequencing metabolomics metagenomics proteomics geneprediction multiplecomparison classification regression bioconductor genomics genomics-data genomics-visualization multivariate-analysis multivariate-statistics omics r-pkg r-project

1.8 match 182 stars 13.71 score 1.3k scripts 22 dependents

avrodrigues

naturaList:Classify Occurrences by Confidence Levels in the Species ID

Classify occurrence records based on confidence levels of species identification. In addition, implement tools to filter occurrences inside grid cells and to manually check for possible errors with an interactive shiny application.

Maintained by Arthur Vinicius Rodrigues. Last updated 1 years ago.

5.1 match 4.66 score 23 scripts

bioc

Spectra:Spectra Infrastructure for Mass Spectrometry Data

The Spectra package defines an efficient infrastructure for storing and handling mass spectrometry spectra and functionality to subset, process, visualize and compare spectra data. It provides different implementations (backends) to store mass spectrometry data. These comprise backends tuned for fast data access and processing and backends for very large data sets ensuring a small memory footprint.

Maintained by RforMassSpectrometry Package Maintainer. Last updated 11 days ago.

infrastructure proteomics massspectrometry metabolomics bioconductor hacktoberfest mass-spectrometry

1.8 match 41 stars 13.01 score 254 scripts 35 dependents

bioc

planttfhunter:Identification and classification of plant transcription factors

planttfhunter is used to identify plant transcription factors (TFs) from protein sequence data and classify them into families and subfamilies using the classification scheme implemented in PlantTFDB. TFs are identified using pre-built hidden Markov model profiles for DNA-binding domains. Then, auxiliary and forbidden domains are used with DNA-binding domains to classify TFs into families and subfamilies (when applicable). Currently, TFs can be classified in 58 different TF families/subfamilies.

Maintained by Fabrício Almeida-Silva. Last updated 5 months ago.

software transcription functionalprediction genomeannotation functionalgenomics hiddenmarkovmodel sequencing classification functional-genomics gene-families hidden-markov-models plant-genomics plants protein-domains transcription-factors

5.8 match 4.00 score 5 scripts

hanjunwei-lab

SMDIC:Identification of Somatic Mutation-Driven Immune Cells

A computing tool developed to automatically identify somatic mutation-driven immune cells. The operation modes include: i) inferring the relative abundance matrix of tumor-infiltrating immune cells and integrating it with a particular gene mutation status, ii) detecting differential immune cells with respect to the gene mutation status and converting the abundance matrix of significant differential immune cells into two binary matrices (one for up-regulated and one for down-regulated), iii) identifying somatic mutation-driven immune cells by comparing the gene mutation status with each immune cell in the binary matrices across all samples, and iv) visualizing immune cell abundance of samples in different mutation status.

Maintained by Junwei Han. Last updated 5 months ago.

5.8 match 2 stars 4.00 score 5 scripts

bioc

BindingSiteFinder:Binding site definition based on iCLIP data

Precise knowledge of the binding sites of an RNA-binding protein (RBP) is key to understanding (post-)transcriptional regulatory processes. Here we present a workflow that describes how exact binding sites can be defined from iCLIP data. The package provides functions for binding site definition and result visualization. For details please see the vignette.

Maintained by Mirko Brüggemann. Last updated 2 days ago.

sequencing geneexpression generegulation functionalgenomics coverage dataimport binding-site-classification binding-sites bioconductor-package iclip rna-binding-proteins

4.0 match 6 stars 5.73 score 3 scripts

hristostyr

scoringfunctions:A Collection of Loss Functions for Assessing Point Forecasts

Implements multiple consistent scoring functions (Gneiting T (2011) <doi:10.1198/jasa.2011.r10138>) for assessing point forecasts and point predictions. Detailed documentation of scoring functions' properties is included for facilitating interpretation of results.

Maintained by Hristos Tyralis. Last updated 15 days ago.

15.3 match 1 stars 1.48 score
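
The idea of a consistent scoring function in the scoringfunctions entry above can be made concrete with a short Python sketch. The function names below are hypothetical and are not the package's API. The sketch shows three standard scores: squared error (consistent for the mean), absolute error (consistent for the median) and the quantile, or pinball, score (consistent for the alpha-quantile).

```python
def squared_error(forecast, obs):
    """Scoring function that is consistent for the mean."""
    return (forecast - obs) ** 2


def absolute_error(forecast, obs):
    """Scoring function that is consistent for the median."""
    return abs(forecast - obs)


def quantile_score(forecast, obs, alpha):
    """Pinball loss, consistent for the alpha-quantile (0 < alpha < 1)."""
    indicator = 1.0 if forecast >= obs else 0.0
    return (indicator - alpha) * (forecast - obs)


def mean_score(score, forecast, observations, **kwargs):
    """Average a scoring function over observations for one point forecast."""
    total = sum(score(forecast, obs, **kwargs) for obs in observations)
    return total / len(observations)


# A skewed sample: the mean is 22 and the median is 3.
observations = [1, 2, 3, 4, 100]
```

On this sample, forecasting the mean (22) gives a lower average squared error than forecasting the median (3), while the median gives a lower average absolute error. That is why the scoring function should match the functional being forecast.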

rvlenth

emmeans:Estimated Marginal Means, aka Least-Squares Means

Obtain estimated marginal means (EMMs) for many linear, generalized linear, and mixed models. Compute contrasts or linear functions of EMMs, trends, and comparisons of slopes. Plots and other displays. Least-squares means are discussed, and the term "estimated marginal means" is suggested, in Searle, Speed, and Milliken (1980) Population marginal means in the linear model: An alternative to least squares means, The American Statistician 34(4), 216-221 <doi:10.1080/00031305.1980.10483031>.

Maintained by Russell V. Lenth. Last updated 5 days ago.

1.2 match 377 stars 19.19 score 13k scripts 187 dependents

bioc

adductomicsR:Processing of adductomic mass spectral datasets

Processes MS2 data to identify potentially adducted peptides from spectra that have been corrected for mass drift and retention time drift and quantifies MS1 level mass spectral peaks.

Maintained by Josie Hayes. Last updated 5 months ago.

massspectrometry metabolomics software thirdpartyclient dataimport gui

5.6 match 1 stars 4.00 score 5 scripts

santikka

cfid:Identification of Counterfactual Queries in Causal Models

Facilitates the identification of counterfactual queries in structural causal models via the ID* and IDC* algorithms by Shpitser, I. and Pearl, J. (2007, 2008) <arXiv:1206.5294>, <https://jmlr.org/papers/v9/shpitser08a.html>. Provides a simple interface for defining causal diagrams and counterfactual conjunctions. Construction of parallel worlds graphs and counterfactual graphs is carried out automatically based on the counterfactual query and the causal diagram. See Tikka, S. (2023) <doi:10.32614/RJ-2023-053> for a tutorial of the package.

Maintained by Santtu Tikka. Last updated 8 months ago.

causal-inference causal-models causality-algorithms counterfactual counterfactuals directed-acyclic-graph identifiability

5.5 match 7 stars 4.02 score 2 scripts 1 dependents

config-i1

greybox:Toolbox for Model Building and Forecasting

Implements functions and instruments for regression model building and its application to forecasting. The main scope of the package is variable selection and model specification for time series data. This includes promotional modelling, selection between different dynamic regressions with non-standard distributions of errors, selection based on cross validation, solutions to the fat regression model problem and more. Models developed in the package are tailored specifically for forecasting purposes. As a result, there are several methods that allow producing forecasts from these models and visualising them.

Maintained by Ivan Svetunkov. Last updated 4 days ago.

forecasting model-selection model-selection-and-evaluation regression regression-models statistics cpp

2.0 match 30 stars 11.03 score 97 scripts 34 dependents

markmfredrickson

optmatch:Functions for Optimal Matching

Distance based bipartite matching using minimum cost flow, oriented to matching of treatment and control groups in observational studies ('Hansen' and 'Klopfer' 2006 <doi:10.1198/106186006X137047>). Routines are provided to generate distances from generalised linear models (propensity score matching), formulas giving variables on which to limit matched distances, stratified or exact matching directives, or calipers, alone or in combination.

Maintained by Josh Errickson. Last updated 3 months ago.

matching openblas cpp

1.8 match 47 stars 12.22 score 588 scripts 5 dependents

bioc

EpiDISH:Epigenetic Dissection of Intra-Sample-Heterogeneity

EpiDISH is an R package to infer the proportions of a priori known cell-types present in a sample representing a mixture of such cell-types. Right now, the package can be used on DNAm data of blood-tissue of any age, from birth to old-age, generic epithelial tissue and breast tissue. In addition, the package provides a function that allows the identification of differentially methylated cell-types and their directionality of change in Epigenome-Wide Association Studies.

Maintained by Shijie C. Zheng. Last updated 4 months ago.

dnamethylation methylationarray epigenetics differentialmethylation immunooncology

2.1 match 48 stars 10.28 score 166 scripts 4 dependents

hanjunwei-lab

ProgModule:Identification of Prognosis-Related Mutually Exclusive Modules

A novel tool to identify candidate driver modules for predicting the prognosis of patients by integrating exclusive coverage of mutations with clinical characteristics in cancer.

Maintained by Junwei Han. Last updated 3 months ago.

5.8 match 3.70 score 1 scripts

racdale

sindyr:Sparse Identification of Nonlinear Dynamics

This implements the Brunton et al (2016; PNAS <doi:10.1073/pnas.1517384113>) sparse identification algorithm for finding ordinary differential equations for a measured system from raw data (SINDy). The package includes a set of additional tools for working with raw data, with an emphasis on cognitive science applications (Dale and Bhat, 2018 <doi:10.1016/j.cogsys.2018.06.020>). See <https://github.com/racdale/sindyr> for examples and updates.

Maintained by Rick Dale. Last updated 11 months ago.

5.5 match 15 stars 3.92 score 11 scripts

tim-tu

weibulltools:Statistical Methods for Life Data Analysis

Provides statistical methods and visualizations that are often used in reliability engineering. Comprises a compact and easily accessible set of methods and visualization tools that make the examination and adjustment as well as the analysis and interpretation of field data (and bench tests) as simple as possible. Non-parametric estimators like Median Ranks, Kaplan-Meier (Abernethy, 2006, <ISBN:978-0-9653062-3-2>), Johnson (Johnson, 1964, <ISBN:978-0444403223>), and Nelson-Aalen for failure probability estimation within samples that contain failures as well as censored data are included. The package supports methods like Maximum Likelihood and Rank Regression (Genschel and Meeker, 2010, <DOI:10.1080/08982112.2010.503447>) for the estimation of multiple parametric lifetime distributions, as well as the computation of confidence intervals of quantiles and probabilities using the delta method related to Fisher's confidence intervals (Meeker and Escobar, 1998, <ISBN:9780471673279>) and the beta-binomial confidence bounds. If desired, mixture model analysis can be done with segmented regression and the EM algorithm. Besides the well-known Weibull analysis, the package also contains Monte Carlo methods for the correction and completion of imprecisely recorded or unknown lifetime characteristics (Verband der Automobilindustrie e.V. (VDA), 2016, <ISSN:0943-9412>). Plots are created statically ('ggplot2') or interactively ('plotly') and can be customized with functions of the respective visualization package. The graphical technique of probability plotting as well as the addition of regression lines and confidence bounds to existing plots are supported.

Maintained by Tim-Gunnar Hensel. Last updated 2 years ago.

field-data-analysis interactive-visualizations plotly reliability-analysis weibull-analysis weibulltools openblas cpp

3.5 match 13 stars 6.15 score 54 scripts
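
As a rough illustration of the median-rank estimator named in the weibulltools entry above, the Python sketch below uses Benard's approximation for a complete sample with no censoring. The approximation formula and the function names are assumptions made for illustration. They are not taken from the package, which may compute median ranks differently and also handles censored data.

```python
def benard_median_ranks(n):
    """Approximate median ranks for the ordered failures i = 1..n.

    Uses Benard's approximation (i - 0.3) / (n + 0.4), which assumes
    a complete sample, i.e. every unit has failed.
    """
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def failure_probabilities(failure_times):
    """Pair each sorted failure time with its estimated failure probability."""
    ordered = sorted(failure_times)
    return list(zip(ordered, benard_median_ranks(len(ordered))))
```

The resulting (time, probability) pairs are the points that would be drawn on a probability plot before a distribution is fitted to them.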

svilsen

STRMPS:Analysis of Short Tandem Repeat (STR) Massively Parallel Sequencing (MPS) Data

Loading, identifying, aggregating, manipulating, and analysing short tandem repeat regions of massively parallel sequencing data in forensic genetics. The analyses and framework implemented in this package rely on the papers of Vilsen et al. (2017) <doi:10.1016/j.fsigen.2017.01.017> and Vilsen et al. (2018) <doi:10.1016/j.fsigen.2018.04.003>. Note that the parallelisation in the package relies on mclapply() and, thus, speed-ups will only be seen on UNIX-based systems.

Maintained by Søren B. Vilsen. Last updated 4 days ago.

biostrings pwalign shortread iranges

5.0 match 4.30 score

bioc

VariantAnnotation:Annotation of Genetic Variants

Annotate variants, compute amino acid coding changes, predict coding outcomes.

Maintained by Bioconductor Package Maintainer. Last updated 2 months ago.

dataimport sequencing snp annotation genetics variantannotation curl bzip2 xz-utils zlib

1.9 match 11.39 score 1.9k scripts 152 dependents

bdhitt

binGroup2:Identification and Estimation using Group Testing

Methods for the group testing identification problem: 1) Operating characteristics (e.g., expected number of tests) for commonly used hierarchical and array-based algorithms, and 2) Optimal testing configurations for these same algorithms. Methods for the group testing estimation problem: 1) Estimation and inference procedures for an overall prevalence, and 2) Regression modeling for commonly used hierarchical and array-based algorithms.

Maintained by Brianna Hitt. Last updated 1 years ago.

openblas cpp

8.6 match 2.48 score 3 scripts 1 dependents
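
The "expected number of tests" operating characteristic in the binGroup2 entry above can be sketched for the simplest hierarchical scheme, two-stage (Dorfman) testing. The Python below is an illustrative sketch, not the package's API. The function names are hypothetical, and the sketch assumes a perfectly accurate assay and the same prevalence for every individual.

```python
def expected_tests_dorfman(group_size, prevalence):
    """Expected number of tests for one group under two-stage testing.

    One pooled test is always run; if the pool is positive, every
    member is retested individually.
    """
    if group_size == 1:
        return 1.0  # a single individual is simply tested once
    p_group_positive = 1.0 - (1.0 - prevalence) ** group_size
    return 1.0 + group_size * p_group_positive


def best_group_size(prevalence, max_size=50):
    """Group size in 2..max_size minimising expected tests per individual."""
    return min(
        range(2, max_size + 1),
        key=lambda n: expected_tests_dorfman(n, prevalence) / n,
    )
```

Under these assumptions, a prevalence of 1% gives an optimal group size of 11, at roughly 0.196 expected tests per individual.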

marsicofl

mispitools:Missing Person Identification Tools

An open source software package written in the R statistical language. It consists of a set of decision-making tools to conduct missing person searches. In particular, it allows computing the optimal LR threshold for declaring potential matches in DNA-based database searches. More recently, 'mispitools' incorporates LRs based on preliminary investigation data. The statistical weight of different traces of evidence, such as biological sex, age and hair color, is presented. For citing mispitools please use the following references: Marsico and Caridi, 2023 <doi:10.1016/j.fsigen.2023.102891> and Marsico, Vigeland et al. 2021 <doi:10.1016/j.fsigen.2021.102519>.

Maintained by Franco Marsico. Last updated 3 months ago.

3.1 match 35 stars 6.74 score 19 scripts 1 dependents

a-dudek-ue

clusterSim:Searching for Optimal Clustering Procedure for a Data Set

Distance measures (GDM1, GDM2, Sokal-Michener, Bray-Curtis, for symbolic interval-valued data), cluster quality indices (Calinski-Harabasz, Baker-Hubert, Hubert-Levine, Silhouette, Krzanowski-Lai, Hartigan, Gap, Davies-Bouldin), data normalization formulas (metric data, interval-valued symbolic data), data generation (typical and non-typical data), HINoV method, replication analysis, linear ordering methods, spectral clustering, agreement indices between two partitions, plot functions (for categorical and symbolic interval-valued data). (MILLIGAN, G.W., COOPER, M.C. (1985) <doi:10.1007/BF02294245>, HUBERT, L., ARABIE, P. (1985) <doi:10.1007/BF01908075>, RAND, W.M. (1971) <doi:10.1080/01621459.1971.10482356>, JAJUGA, K., WALESIAK, M. (2000) <doi:10.1007/978-3-642-57280-7_11>, MILLIGAN, G.W., COOPER, M.C. (1988) <doi:10.1007/BF01897163>, JAJUGA, K., WALESIAK, M., BAK, A. (2003) <doi:10.1007/978-3-642-55721-7_12>, DAVIES, D.L., BOULDIN, D.W. (1979) <doi:10.1109/TPAMI.1979.4766909>, CALINSKI, T., HARABASZ, J. (1974) <doi:10.1080/03610927408827101>, HUBERT, L. (1974) <doi:10.1080/01621459.1974.10480191>, TIBSHIRANI, R., WALTHER, G., HASTIE, T. (2001) <doi:10.1111/1467-9868.00293>, BRECKENRIDGE, J.N. (2000) <doi:10.1207/S15327906MBR3502_5>, WALESIAK, M., DUDEK, A. (2008) <doi:10.1007/978-3-540-78246-9_11>).

Maintained by Andrzej Dudek. Last updated 6 months ago.

cpp

3.3 match 2 stars 6.35 score 512 scripts 9 dependents

ekstroem

dataMaid:A Suite of Checks for Identification of Potential Errors in a Data Frame as Part of the Data Screening Process

Data screening is an important first step of any statistical analysis. dataMaid automatically generates a customizable data report with a thorough summary of the checks and the results that a human can use to identify possible errors. It provides an extendable suite of tests for common potential errors in a dataset.

Maintained by Claus Thorn Ekstrøm. Last updated 3 years ago.

data-cleaning data-screening reproducible-research

2.7 match 143 stars 7.53 score 236 scripts

dgrun

RaceID:Identification of Cell Types, Inference of Lineage Trees, and Prediction of Noise Dynamics from Single-Cell RNA-Seq Data

Application of 'RaceID' allows inference of cell types and prediction of lineage trees by the 'StemID2' algorithm (Herman, J.S., Sagar, Grun D. (2018) <DOI:10.1038/nmeth.4662>). 'VarID2' is part of this package and allows quantification of biological gene expression noise at single-cell resolution (Rosales-Alvarez, R.E., Rettkowski, J., Herman, J.S., Dumbovic, G., Cabezas-Wallscheid, N., Grun, D. (2023) <DOI:10.1186/s13059-023-02974-1>).

Maintained by Dominic Grün. Last updated 4 months ago.

cpp

4.3 match 4.74 score 110 scripts

okdll

flowTraceR:Tracing Information Flow for Inter-Software Comparisons in Mass Spectrometry-Based Bottom-Up Proteomics

Useful functions to standardize software outputs from ProteomeDiscoverer, Spectronaut, DIA-NN and MaxQuant at the precursor, modified peptide and protein group levels, and to trace software differences for identifications, such as varying protein group denotations for a common precursor.

Maintained by Oliver Kardell. Last updated 3 years ago.

3.9 match 3 stars 5.17 score 11 scripts 1 dependent

mreginato

monographaR:Taxonomic Monographs Tools

Contains functions intended to facilitate the production of plant taxonomic monographs. The package includes functions to convert tables into taxonomic descriptions, lists of collectors, examined specimens, identification keys (dichotomous and interactive), and can generate a monograph skeleton. Additionally, wrapper functions to batch the production of phenology histograms and distributional and diversity maps are also available.

Maintained by Marcelo Reginato. Last updated 1 year ago.

4.3 match 3 stars 4.73 score 18 scripts

ropensci

phylotaR:Automated Phylogenetic Sequence Cluster Identification from 'GenBank'

A pipeline for the identification, within taxonomic groups, of orthologous sequence clusters from 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> as the first step in a phylogenetic analysis. The pipeline depends on a local alignment search tool and is, therefore, not dependent on differences in gene naming conventions and naming errors.

Maintained by Shixiang Wang. Last updated 8 months ago.

blastn genbank peer-reviewed phylogenetics sequence-alignment

3.4 match 23 stars 5.86 score 156 scripts

gertraudmalsinerwalli

telescope:Bayesian Mixtures with an Unknown Number of Components

Fits Bayesian finite mixtures with an unknown number of components using the telescoping sampler and different component distributions. For more details see Frühwirth-Schnatter et al. (2021) <doi:10.1214/21-BA1294>.

Maintained by Gertraud Malsiner-Walli. Last updated 2 months ago.

6.7 match 3.00 score 4 scripts

niuniular

MDDC:Modified Detecting Deviating Cells Algorithm in Pharmacovigilance

Methods for detecting signals related to (adverse event, medical product e.g. drugs, vaccines) pairs, a data generation function for simulating pharmacovigilance datasets, and various utility functions. For more details please see Liu A., Mukhopadhyay R., and Markatou M. <doi:10.48550/arXiv.2410.01168>.

Maintained by Anran Liu. Last updated 5 months ago.

pharmacovigilance cpp

4.4 match 1 star 4.54 score 4 scripts

bnosac

udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser, namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.

Maintained by Jan Wijffels. Last updated 2 years ago.

conll dependency-parser lemmatization natural-language-processing nlp pos-tagging r-pkg rcpp text-mining tokenizer udpipe cpp

1.7 match 215 stars 11.83 score 1.2k scripts 9 dependents

sebkrantz

collapse:Advanced and Fast Data Transformation

A C/C++ based package for advanced data transformation and statistical computing in R that is extremely fast, class-agnostic, robust and programmer friendly. Core functionality includes a rich set of S3 generic grouped and weighted statistical functions for vectors, matrices and data frames, which provide efficient low-level vectorizations, OpenMP multithreading, and skip missing values by default. These are integrated with fast grouping and ordering algorithms (also callable from C), and efficient data manipulation functions. The package also provides a flexible and rigorous approach to time series and panel data in R. It further includes fast functions for common statistical procedures, detailed (grouped, weighted) summary statistics, powerful tools to work with nested data, fast data object conversions, functions for memory efficient R programming, and helpers to effectively deal with variable labels, attributes, and missing data. It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf', 'units', 'plm' (panel-series and data frames), and 'xts'/'zoo'.

Maintained by Sebastian Krantz. Last updated 8 days ago.

data-aggregation data-analysis data-manipulation data-processing data-science data-transformation econometrics high-performance panel-data scientific-computing statistics time-series weighted weights cpp openmp

1.2 match 672 stars 16.63 score 708 scripts 97 dependents

bioc

MetID:Network-based prioritization of putative metabolite IDs

This package uses an innovative network-based approach that will enhance our ability to determine the identities of significant ions detected by LC-MS.

Maintained by Zhenzhi Li. Last updated 5 months ago.

assaydomain biologicalquestion infrastructure researchfield statisticalmethod technology workflowstep network kegg

3.5 match 1 star 5.74 score 110 scripts

bioc

MassSpecWavelet:Peak Detection for Mass Spectrometry data using wavelet-based algorithms

Peak Detection in Mass Spectrometry data is one of the important preprocessing steps. The performance of peak detection affects subsequent processes, including protein identification, profile alignment and biomarker identification. Using Continuous Wavelet Transform (CWT), this package provides a reliable algorithm for peak detection that does not require any type of smoothing or previous baseline correction method, providing more consistent results for different spectra. See <doi:10.1093/bioinformatics/btl355> for further details.

Maintained by Sergio Oller Moreno. Last updated 3 months ago.

immunooncology massspectrometry proteomics peakdetection

2.1 match 9 stars 9.41 score 37 scripts 18 dependents

bioc

Cepo:Cepo for the identification of differentially stable genes

Defining the identity of a cell is fundamental to understanding the heterogeneity of cellular responses to various environmental signals and perturbations. We present Cepo, a new method to explore cell identities from single-cell RNA-sequencing data using differential stability as a new metric to define cell identity genes. Cepo computes cell-type-specific gene statistics pertaining to differentially stable gene expression.

Maintained by Hani Jieun Kim. Last updated 5 months ago.

classification geneexpression singlecell software sequencing differentialexpression

4.3 match 4.62 score 14 scripts 1 dependent

bioc

GSVA:Gene Set Variation Analysis for Microarray and RNA-Seq Data

Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment through the samples of an expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNV-pathway analysis or cross-tissue pathway analysis, in a pathway-centric manner.

Maintained by Robert Castelo. Last updated 6 days ago.

functionalgenomics microarray rnaseq pathways genesetenrichment gene-set-enrichment genomics pathway-enrichment-analysis

1.3 match 210 stars 14.72 score 1.6k scripts 19 dependents

r-forge

mlogit:Multinomial Logit Models

Maximum Likelihood estimation of random utility discrete choice models, as described in Kenneth Train (2009) Discrete Choice Methods with Simulations <doi:10.1017/CBO9780511805271>.

Maintained by Yves Croissant. Last updated 5 years ago.

2.0 match 9.81 score 1.2k scripts 14 dependents

tomaskrehlik

frequencyConnectedness:Spectral Decomposition of Connectedness Measures

Accompanies a paper (Barunik, Krehlik (2018) <doi:10.1093/jjfinec/nby001>) dedicated to spectral decomposition of connectedness measures and their interpretation. We implement all the developed estimators as well as the historical counterparts. For more information, see the help or the GitHub page (<https://github.com/tomaskrehlik/frequencyConnectedness>).

Maintained by Tomas Krehlik. Last updated 2 years ago.

3.3 match 100 stars 5.88 score 50 scripts 1 dependent

mboeck11

BGVAR:Bayesian Global Vector Autoregressions

Estimation of Bayesian Global Vector Autoregressions (BGVAR) with different prior setups and the possibility to introduce stochastic volatility. Built-in priors include the Minnesota, the stochastic search variable selection and Normal-Gamma (NG) prior. For a reference see also Crespo Cuaresma, J., Feldkircher, M. and F. Huber (2016) "Forecasting with Global Vector Autoregressive Models: a Bayesian Approach", Journal of Applied Econometrics, Vol. 31(7), pp. 1371-1391 <doi:10.1002/jae.2504>. Post-processing functions allow for making predictions, structurally identifying the model with short-run or sign restrictions, and computing impulse response functions, historical decompositions and forecast error variance decompositions. Plotting functions are also available. The package has a companion paper: Boeck, M., Feldkircher, M. and F. Huber (2022) "BGVAR: Bayesian Global Vector Autoregressions with Shrinkage Priors in R", Journal of Statistical Software, Vol. 104(9), pp. 1-28 <doi:10.18637/jss.v104.i09>.

Maintained by Maximilian Boeck. Last updated 3 months ago.

openblas cpp

2.6 match 27 stars 7.58 score 156 scripts

tombeesley

eyetools:Analyse Eye Data

Enables the automation of actions across the pipeline, including initial steps of transforming binocular data and gap repair to event-based processing such as fixations, saccades, and entry/duration in Areas of Interest (AOIs). It also offers visualisation of eye movement and AOI entries. These tools take relatively raw (trial, time, x, and y form) data and can be used to return fixations, saccades, and AOI entries and time spent in AOIs. As the tools rely on this basic data format, the functions can work with data from any eye tracking device. Implements fixation and saccade detection using methods proposed by Salvucci and Goldberg (2000) <doi:10.1145/355017.355028>.

Maintained by Tom Beesley. Last updated 3 months ago.

areas-of-interest attention-visualization cognitive-science dwell-time-algorithm eye-tracker eye-tracking eyetracking ggplot2 psychology psychology-experiments saccades tobii tobii-eye-tracker visualization

3.6 match 4 stars 5.45 score 8 scripts

bioc

maftools:Summarize, Analyze and Visualize MAF Files

Analyze and visualize Mutation Annotation Format (MAF) files from large scale sequencing studies. This package provides various functions to perform the most commonly used analyses in cancer genomics and to create feature-rich customizable visualizations with minimal effort.

Maintained by Anand Mayakonda. Last updated 5 months ago.

datarepresentation dnaseq visualization drivermutation variantannotation featureextraction classification somaticmutation sequencing functionalgenomics survival bioinformatics cancer-genome-atlas cancer-genomics genomics maf-files tcga curl bzip2 xz-utils zlib

1.3 match 459 stars 14.63 score 948 scripts 18 dependents

han-siyu

LncFinder:LncRNA Identification and Analysis Using Heterologous Features

Long non-coding RNA identification and analysis. Default models are trained with human, mouse and wheat datasets by employing SVM. Features are based on intrinsic composition of sequence, EIIP value (electron-ion interaction pseudopotential), and secondary structure. This package can also extract other classic features and build new classifiers. Reference: Han S., et al. (2019) <doi:10.1093/bib/bby065>.

Maintained by Siyu Han. Last updated 6 months ago.

5.2 match 2 stars 3.68 score 53 scripts

olechnwin

DIME:Differential Identification using Mixture Ensemble

A robust method for identifying differential binding sites in ChIP-seq (Chromatin Immunoprecipitation Sequencing) data when comparing two samples. It considers an ensemble of finite mixture models combined with a local false discovery rate (fdr), allowing for flexible modeling of data. The Differential Identification using Mixture Ensemble (DIME) method is described in: Taslim et al. (2011) <doi:10.1093/bioinformatics/btr165>.

Maintained by Cenny Taslim. Last updated 3 years ago.

7.3 match 2.63 score 43 scripts

bioc

HEM:Heterogeneous error model for identification of differentially expressed genes under multiple conditions

This package fits heterogeneous error models for analysis of microarray data.

Maintained by HyungJun Cho. Last updated 5 months ago.

microarray differentialexpression

4.4 match 4.30 score 6 scripts

bioc

EnMCB:Predicting Disease Progression Based on Methylation Correlated Blocks using Ensemble Models

Creation of the correlated blocks using DNA methylation profiles. Machine learning models can be constructed to predict differentially methylated blocks and disease progression.

Maintained by Xin Yu. Last updated 5 months ago.

normalization dnamethylation methylationarray supportvectormachine

3.6 match 9 stars 5.26 score 2 scripts

paulhendricks

detector:Detect Data Containing Personally Identifiable Information

Allows users to quickly and easily detect data containing Personally Identifiable Information (PII) through convenience functions.

Maintained by Paul Hendricks. Last updated 8 years ago.

3.5 match 15 stars 5.34 score 29 scripts

wilsonfreitas

numbersBR:Validate, Compare and Format Identification Numbers from Brazil

Validate, format and compare identification numbers used in Brazil. These numbers are used to identify individuals (CPF), vehicles (RENAVAN), companies (CNPJ), and so on. Functions to format, validate and compare these numbers have been implemented in a vectorized way in order to speed up validations and comparisons in big datasets.

Maintained by Wilson Freitas. Last updated 7 years ago.

brasil cnpj cpf renavan

5.2 match 9 stars 3.65 score 5 scripts
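
To illustrate what validating a CPF involves, the following minimal Python sketch applies the publicly documented CPF rule, in which the last two of the eleven digits are modulo-11 check digits. It is not taken from numbersBR: the function names `cpf_check_digits` and `is_valid_cpf` are hypothetical, and the package's own R interface and implementation may differ.

```python
import re


def cpf_check_digits(base9):
    """Return the two CPF check digits for a string of 9 base digits."""
    digits = [int(c) for c in base9]
    for _ in range(2):
        # Weights run from len(digits) + 1 down to 2 (10..2, then 11..2).
        weights = range(len(digits) + 1, 1, -1)
        remainder = sum(d * w for d, w in zip(digits, weights)) % 11
        digits.append(0 if remainder < 2 else 11 - remainder)
    return digits[9], digits[10]


def is_valid_cpf(cpf):
    """Check an 11-digit CPF, ignoring punctuation such as '.' and '-'."""
    s = re.sub(r"\D", "", cpf)
    # Repeated-digit sequences pass the arithmetic but are conventionally rejected.
    if len(s) != 11 or len(set(s)) == 1:
        return False
    d1, d2 = cpf_check_digits(s[:9])
    return s[9:] == f"{d1}{d2}"
```

For example, `is_valid_cpf("529.982.247-25")` holds for this commonly used test number, while changing its last digit makes the check fail.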

noaa-nwfsc

zoid:Bayesian Zero-and-One Inflated Dirichlet Regression Modelling

Fits Dirichlet regression and zero-and-one inflated Dirichlet regression with Bayesian methods implemented in Stan. These models are sometimes referred to as trinomial mixture models; covariates and overdispersion can optionally be included.

Maintained by Eric J. Ward. Last updated 21 hours ago.

mixture-models nwfsc-cb stan cpp

3.0 match 8 stars 6.19 score 12 scripts

weiliu123

PCLassoReg:Group Regression Models for Risk Protein Complex Identification

Two protein complex-based group regression models (PCLasso and PCLasso2) for risk protein complex identification. PCLasso is a prognostic model that identifies risk protein complexes associated with survival. PCLasso2 is a classification model that identifies risk protein complexes associated with classes. For more information, see Wang and Liu (2021) <doi:10.1093/bib/bbab212>.

Maintained by Wei Liu. Last updated 3 years ago.

5.1 match 1 star 3.70 score 1 script

charvey23

AvInertia:Calculate the Inertial Properties of a Flying Bird

Tools to compute the center of gravity and moment of inertia tensor of any flying bird. The tools function by modeling a bird as a composite structure of simple geometric objects. This requires detailed morphological measurements of bird specimens, although those obtained for the associated paper have been included in the package for use. Refer to the vignettes and supplementary material for detailed information on the package function.

Maintained by Christina Harvey. Last updated 3 years ago.

3.8 match 6 stars 5.00 score 33 scripts

bioc

DominoEffect:Identification and Annotation of Protein Hotspot Residues

The functions support identification and annotation of hotspot residues in proteins. These are individual amino acids that accumulate mutations at a much higher rate than their surrounding regions.

Maintained by Marija Buljan. Last updated 5 months ago.

software somaticmutation proteomics sequencematching alignment

5.3 match 3.48 score 1 script

schmidtpk

PointFore:Interpretation of Point Forecasts as State-Dependent Quantiles and Expectiles

Estimate specification models for the state-dependent level of an optimal quantile/expectile forecast. Wald Tests and the test of overidentifying restrictions are implemented. Plotting of the estimated specification model is possible. The package contains two data sets with forecasts and realizations: the daily accumulated precipitation at London, UK from the high-resolution model of the European Centre for Medium-Range Weather Forecasts (ECMWF, <https://www.ecmwf.int/>) and GDP growth Greenbook data by the US Federal Reserve. See Schmidt, Katzfuss and Gneiting (2015) <arXiv:1506.01917> for more details on the identification and estimation of a directive behind a point forecast.

Maintained by Patrick Schmidt. Last updated 4 years ago.

4.1 match 4.48 score 20 scripts

gregorkastner

factorstochvol:Bayesian Estimation of (Sparse) Latent Factor Stochastic Volatility Models

Markov chain Monte Carlo (MCMC) sampler for fully Bayesian estimation of latent factor stochastic volatility models with interweaving <doi:10.1080/10618600.2017.1322091>. Sparsity can be achieved through the usage of Normal-Gamma priors on the factor loading matrix <doi:10.1016/j.jeconom.2018.11.007>.

Maintained by Gregor Kastner. Last updated 1 year ago.

openblas cpp

3.9 match 7 stars 4.73 score 17 scripts 1 dependent

mathewchamberlain

SignacX:Cell Type Identification and Discovery from Single Cell Gene Expression Data

An implementation of neural networks trained with flow-sorted gene expression data to classify cellular phenotypes in single cell RNA-sequencing data. See Chamberlain M et al. (2021) <doi:10.1101/2021.02.01.429207> for more details.

Maintained by Mathew Chamberlain. Last updated 2 years ago.

cellular-phenotypes seurat single-cell-rna-seq

2.8 match 24 stars 6.46 score 34 scripts

thiyangt

denguedatahub:A Tidy Format Datasets of Dengue by Country

Provides weekly, monthly, and yearly summaries of dengue cases by state/province/country.

Maintained by Thiyanga S. Talagala. Last updated 1 month ago.

openjdk

3.5 match 11 stars 5.12 score 34 scripts

laresbernardo

lares:Analytics & Machine Learning Sidekick

Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of function families, such as Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scrapper, it helps the analyst or data scientist get quick and robust results without the need for repetitive coding or advanced R programming skills.

Maintained by Bernardo Lares. Last updated 26 days ago.

analytics api automation automl data-science descriptive-statistics h2o machine-learning marketing mmm predictive-modeling puzzle rlanguage robyn visualization

1.8 match 233 stars 9.84 score 185 scripts 1 dependent

bioc

coMethDMR:Accurate identification of co-methylated and differentially methylated regions in epigenome-wide association studies

coMethDMR identifies genomic regions associated with continuous phenotypes by optimally leveraging covariations among CpGs within predefined genomic regions. Instead of testing all CpGs within a genomic region, coMethDMR carries out an additional step that selects co-methylated sub-regions first without using any outcome information. Next, coMethDMR tests association between methylation within the sub-region and continuous phenotype using a random coefficient mixed effects model, which models both variations between CpG sites within the region and differential methylation simultaneously.

Maintained by Fernanda Veitzman. Last updated 5 months ago.

dnamethylation epigenetics methylationarray differentialmethylation genomewideassociation

2.7 match 7 stars 6.47 score 42 scripts

bart1

move:Visualizing and Analyzing Animal Track Data

Contains functions to access movement data stored in 'movebank.org' as well as tools to visualize and statistically analyze animal movement data, including functions to calculate dynamic Brownian Bridge Movement Models. Move helps address movement ecology questions.

Maintained by Bart Kranstauber. Last updated 4 months ago.

cpp

2.0 match 8.74 score 690 scripts 3 dependents

r-forge

cardidates:Identification of Cardinal Dates in Ecological Time Series

Identification of cardinal dates (begin, time of maximum, end of mass developments) in ecological time series using fitted Weibull functions.

Maintained by Thomas Petzoldt. Last updated 1 year ago.

5.2 match 3.34 score 22 scripts

cran

crosstalkr:Analysis of Graph-Structured Data with a Focus on Protein-Protein Interaction Networks

Provides a general toolkit for drug target identification. We include functionality to reduce large graphs to subgraphs and prioritize nodes. In addition to being optimized for use with generic graphs, we also provide support for analyzing protein-protein interaction networks from online repositories. For more details on the core method, refer to Weaver et al. (2021) <https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008755>.

Maintained by Davis Weaver. Last updated 10 months ago.

cpp

6.4 match 2.70 score

ilapros

rnrfa:UK National River Flow Archive Data from R

Utility functions to retrieve data from the UK National River Flow Archive (<https://nrfa.ceh.ac.uk/>, terms and conditions: <https://nrfa.ceh.ac.uk/costs-terms-and-conditions>). The package contains R wrappers to the UK NRFA data temporary-API. There are functions to retrieve stations falling in a bounding box, generate a map, and extract time series and general information. The package is fully described in Vitolo et al (2016) "rnrfa: An R package to Retrieve, Filter and Visualize Data from the UK National River Flow Archive" <https://journal.r-project.org/archive/2016/RJ-2016-036/RJ-2016-036.pdf>.

Maintained by Ilaria Prosdocimi. Last updated 9 months ago.

3.0 match 2 stars 5.71 score 51 scripts

bioc

SurfR:Surface Protein Prediction and Identification

Identify Surface Protein coding genes from a list of candidates. Systematically download data from GEO and TCGA or use your own data. Perform DGE on bulk RNAseq data. Perform Meta-analysis. Descriptive enrichment analysis and plots.

Maintained by Aurora Maurizio. Last updated 4 days ago.

software sequencing rnaseq geneexpression transcription differentialexpression principalcomponent genesetenrichment pathways batcheffect functionalgenomics visualization dataimport functionalprediction geneprediction go dge enrichment-analysis metaanalysis plots proteins public-data surface surfaceome

3.1 match 3 stars 5.43 score 3 scripts

bioc

PIPETS:Poisson Identification of PEaks from Term-Seq data

PIPETS provides statistically robust analysis for 3'-seq/term-seq data. It utilizes a sliding window approach to apply a Poisson Distribution test to identify genomic positions with termination read coverage that is significantly higher than the surrounding signal. PIPETS then condenses proximal signal and produces strand specific results that contain all significant termination peaks.

Maintained by Quinlan Furumo. Last updated 5 months ago.

sequencing transcription generegulation peakdetection genetics transcriptomics coverage

4.5 match 3.78 score 2 scripts

bioc

TrIdent:TrIdent - Transduction Identification

The `TrIdent` R package automates the analysis of transductomics data by detecting, classifying, and characterizing read coverage patterns associated with potential transduction events. Transductomics is a DNA sequencing-based method for the detection and characterization of transduction events in pure cultures and complex communities. Transductomics relies on mapping sequencing reads from a viral-like particle (VLP)-fraction of a sample to contigs assembled from the metagenome (whole-community) of the same sample. Reads from bacterial DNA carried by VLPs will map back to the bacterial contigs of origin creating read coverage patterns indicative of ongoing transduction.

Maintained by Jessie Maier. Last updated 15 days ago.

coverage metagenomics patternlogic classification sequencing bacteriophage horizontal-gene-transfer pattern-matching phage sequencing-coverage transduction transductomics virus-like-particle

3.3 match 2 stars 5.04 score 7 scripts

hanase

BMA:Bayesian Model Averaging

Package for Bayesian model averaging and variable selection for linear models, generalized linear models and survival models (Cox regression).

Maintained by Hana Sevcikova. Last updated 2 months ago.

fortran

1.8 match 38 stars 9.40 score 152 scripts 14 dependents

zhangab2008

BarcodingR:Species Identification using DNA Barcodes

To perform species identification using DNA barcodes.

Maintained by Ai-bing ZHANG. Last updated 5 years ago.

11.5 match 1 star 1.41 score 26 scripts

cran

hsphase:Phasing, Pedigree Reconstruction, Sire Imputation and Recombination Events Identification of Half-sib Families Using SNP Data

Identification of recombination events, haplotype reconstruction, sire imputation and pedigree reconstruction using half-sib family SNP data.

Maintained by Mohammad Ferdosi. Last updated 1 year ago.

cpp openmp

4.8 match 1 star 3.32 score 1 dependent

ikosmidis

brglm:Bias Reduction in Binomial-Response Generalized Linear Models

Fit generalized linear models with binomial responses using either an adjusted-score approach to bias reduction or maximum penalized likelihood where penalization is by Jeffreys invariant prior. These procedures return estimates with improved frequentist properties (bias, mean squared error) that are always finite even in cases where the maximum likelihood estimates are infinite (data separation). Fitting takes place by fitting generalized linear models on iteratively updated pseudo-data. The interface is essentially the same as 'glm'. More flexibility is provided by the fact that custom pseudo-data representations can be specified and used for model fitting. Functions are provided for the construction of confidence intervals for the reduced-bias estimates.

Maintained by Ioannis Kosmidis. Last updated 4 years ago.

2.3 match 6 stars 7.14 score 86 scripts 11 dependents

hanjunwei-lab

ICDS:Identification of Cancer Dysfunctional Subpathway with Omics Data

Identify cancer dysfunctional sub-pathways by integrating gene expression, DNA methylation and copy number variation, and pathway topological information. 1) First, we calculate the gene risk scores by integrating three kinds of data: DNA methylation, copy number variation, and gene expression. 2) Second, we perform a greedy search algorithm to identify the key dysfunctional sub-pathways within the pathways for which the discriminative scores are locally maximal. 3) Finally, a permutation test is used to calculate the statistical significance level for these key dysfunctional sub-pathways.

Maintained by Junwei Han. Last updated 8 months ago.

4.5 match 3.54 score 3 scripts

trinker

wakefield:Generate Random Data Sets

Generates random data sets including: data.frames, lists, and vectors.

Maintained by Tyler Rinker. Last updated 5 years ago.

data-generation wakefield

2.3 match 256 stars 7.13 score 209 scripts

eddelbuettel

RcppUUID:Generating Universally Unique Identificators

Using the efficient implementation in the Boost C++ library, functions are provided to generate vectors of 'Universally Unique Identifiers (UUID)' from R supporting random (version 4), name (version 5) and time (version 7) 'UUIDs'. The initial repository was at <https://gitlab.com/artemklevtsov/rcppuuid>.

Maintained by Dirk Eddelbuettel. Last updated 1 month ago.

cpp

5.0 match 1 star 3.18 score 1 script
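
The difference between the random (version 4) and name-based (version 5) UUIDs mentioned above can be sketched with Python's standard `uuid` module. This is only an illustration of the UUID versions, not RcppUUID's R interface; the helper names `random_uuids` and `name_uuids` are hypothetical, and the time-based version 7 is left out because standard-library support for it depends on the Python version.

```python
import uuid


def random_uuids(n):
    """Generate n random (version 4) UUIDs as strings."""
    return [str(uuid.uuid4()) for _ in range(n)]


def name_uuids(names, namespace=uuid.NAMESPACE_DNS):
    """Generate name-based (version 5) UUIDs; the same name always maps to the same UUID."""
    return [str(uuid.uuid5(namespace, name)) for name in names]
```

Version 4 UUIDs differ on every call, whereas version 5 UUIDs are derived from a namespace and a name, so they are reproducible across runs.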

cran

ips:Interfaces to Phylogenetic Software in R

Functions that wrap popular phylogenetic software for sequence alignment, masking of sequence alignments, and estimation of phylogenies and ancestral character states.

Maintained by Christoph Heibl. Last updated 11 months ago.

3.7 match 4.28 score 128 scripts 1 dependent

bioc

EBarrays:Unified Approach for Simultaneous Gene Clustering and Differential Expression Identification

EBarrays provides tools for the analysis of replicated/unreplicated microarray data.

Maintained by Ming Yuan. Last updated 5 months ago.

clustering differentialexpression

2.8 match 5.56 score 5 scripts 6 dependents

bioc

VegaMC:VegaMC: A Package Implementing a Variational Piecewise Smooth Model for Identification of Driver Chromosomal Imbalances in Cancer

This package enables the detection of driver chromosomal imbalances, including loss of heterozygosity (LOH), from array comparative genomic hybridization (aCGH) data. VegaMC performs a joint segmentation of a dataset and uses a statistical framework to distinguish between driver and passenger mutations. VegaMC has been implemented so that it can be immediately integrated with the output produced by the PennCNV tool. In addition, VegaMC outputs two web pages that allow rapid navigation between the detected regions and the altered genes. The web page that summarizes the altered genes includes a link to the Ensembl web page of each gene.

Maintained by Sandro Morganella. Last updated 5 months ago.

acgh copynumbervariation

4.3 match 3.60 score 1 scripts

bioc

diffcoexp:Differential Co-expression Analysis

A tool for the identification of differentially coexpressed links (DCLs) and differentially coexpressed genes (DCGs). DCLs are gene pairs with significantly different correlation coefficients under two conditions. DCGs are genes with significantly more DCLs than by chance.
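One common way to test whether a gene pair's correlation differs between two conditions is Fisher's z-transformation. The sketch below illustrates that test only; the function name `fisher_z_diff` and its arguments are hypothetical, and it should not be read as the exact procedure diffcoexp applies.

```python
import math


def fisher_z_diff(r1, n1, r2, n2):
    """Compare two correlation coefficients from independent samples.

    r1, r2: correlations of one gene pair under conditions 1 and 2.
    n1, n2: number of samples in each condition (each must exceed 3).
    Returns the z statistic and a two-sided p-value.
    """
    # Fisher transformation makes the sampling distribution roughly normal.
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    z, p = fisher_z_diff(0.9, 50, -0.9, 50)
    print(z, p)
```

In a full analysis the p-values for all gene pairs would also need a multiple-testing correction, which is omitted here.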

Maintained by Wenbin Wei. Last updated 5 months ago.

geneexpression differentialexpression transcription microarray onechannel twochannel rnaseq sequencing coverage immunooncology

2.2 match 15 stars 6.89 score 37 scripts

bioc

GSgalgoR:An Evolutionary Framework for the Identification and Study of Prognostic Gene Expression Signatures in Cancer

A multi-objective optimization algorithm for disease sub-type discovery based on a non-dominated sorting genetic algorithm. The 'Galgo' framework combines the advantages of clustering algorithms for grouping heterogeneous 'omics' data and the searching properties of genetic algorithms for feature selection. The algorithm searches for the optimal number of clusters, considering the features that maximize the survival difference between sub-types while keeping cluster consistency high.

Maintained by Carlos Catania. Last updated 5 months ago.

geneexpression transcription clustering classification survival

2.8 match 15 stars 5.48 score 6 scripts

bioc

mosbi:Molecular Signature identification using Biclustering

This package is an implementation of the biclustering ensemble method MoSBi (Molecular signature Identification from Biclustering). MoSBi provides standardized interfaces for biclustering results and can combine their results with a multi-algorithm ensemble approach to compute robust ensemble biclusters on molecular omics data. This is done by computing similarity networks of biclusters and filtering for overlaps using a custom error model. After that, Louvain modularity is used to extract bicluster communities from the similarity network, which can then be converted to ensemble biclusters. Additionally, MoSBi includes several network visualization methods to give an intuitive and scalable overview of the results. MoSBi comes with several biclustering algorithms, but can be easily extended to new biclustering algorithms.

Maintained by Tim Daniel Rose. Last updated 5 months ago.

software statisticalmethod clustering network cpp

3.5 match 4.30 score 8 scripts

cran

BioPred:An R Package for Biomarkers Analysis in Precision Medicine

Provides functions for training extreme gradient boosting model using propensity score A-learning and weight-learning methods. For further details, see Liu et al. (2024) <doi:10.1093/bioinformatics/btae592>.

Maintained by Zihuan Liu. Last updated 4 months ago.

5.0 match 3.00 score

bsnatr

tswge:Time Series for Data Science

Accompanies the texts Time Series for Data Science with R by Woodward, Sadler and Robertson & Applied Time Series Analysis with R, 2nd edition by Woodward, Gray, and Elliott. It is helpful for data analysis and for time series instruction.

Maintained by Bivin Sadler. Last updated 2 years ago.

5.5 match 2.70 score 496 scripts

bioc

CiteFuse:CiteFuse: multi-modal analysis of CITE-seq data

The CiteFuse package implements a suite of methods and tools for CITE-seq data from pre-processing to integrative analytics, including doublet detection, network-based modality integration, cell type clustering, differential RNA and protein expression analysis, ADT evaluation, ligand-receptor interaction analysis, and interactive web-based visualisation of the analyses.

Maintained by Yingxin Lin. Last updated 5 months ago.

singlecell geneexpression bioinformatics single-cell cpp

2.3 match 27 stars 6.59 score 18 scripts

igorrigolon

datazoom.amazonia:Simplify Access to Data from the Amazon Region

Functions to download and treat data regarding the Brazilian Amazon region from a variety of official sources.

Maintained by Igor Rigolon Veiga. Last updated 1 year ago.

3.4 match 4.29 score 15 scripts

sunnypig1988

BCSub:A Bayesian Semiparametric Factor Analysis Model for Subtype Identification (Clustering)

Gene expression profiles are commonly utilized to infer disease subtypes and many clustering methods can be adopted for this task. However, existing clustering methods may not perform well when genes are highly correlated and many uninformative genes are included for clustering. To deal with these challenges, we develop a novel clustering method in the Bayesian setting. This method, called BCSub, adopts an innovative semiparametric Bayesian factor analysis model to reduce the dimension of the data to a few factor scores for clustering. Specifically, the factor scores are assumed to follow the Dirichlet process mixture model in order to induce clustering.

Maintained by Jiehuan Sun. Last updated 8 years ago.

openblas cpp

7.3 match 2.00 score 2 scripts

traminer

TraMineR:Trajectory Miner: a Sequence Analysis Toolkit

Set of sequence analysis tools for manipulating, describing and rendering categorical sequences, and more generally mining sequence data in the field of social sciences. Although this sequence analysis package is primarily intended for state or event sequences that describe time use or life courses such as family formation histories or professional careers, its features also apply to many other kinds of categorical sequence data. It accepts many different sequence representations as input and provides tools for converting sequences from one format to another. It offers several functions for describing and rendering sequences, for computing distances between sequences with different metrics (among which optimal matching), original dissimilarity-based analysis tools, and functions for extracting the most frequent event subsequences and identifying the most discriminating ones among them. A user's guide can be found on the TraMineR web page.

Maintained by Gilbert Ritschard. Last updated 3 months ago.

cpp

1.8 match 11 stars 8.24 score 534 scripts 13 dependents

castleli

scBSP:A Fast Tool for Single-Cell Spatially Variable Genes Identifications on Large-Scale Data

Identifying spatially variable genes is critical in linking molecular cell functions with tissue phenotypes. This package utilizes a granularity-based dimension-agnostic tool, single-cell big-small patch (scBSP), implementing sparse matrix operation and KD tree methods for distance calculation, for the identification of spatially variable genes on large-scale data. A detailed description of this method is available in Wang, J. and Li, J. (2023) <doi:10.1038/s41467-023-43256-5>.

Maintained by Jinpu Li. Last updated 1 month ago.

3.2 match 18 stars 4.43 score 2 scripts

bioc

PhyloProfile:PhyloProfile

PhyloProfile is a tool for exploring complex phylogenetic profiles. Phylogenetic profiles, presence/absence patterns of genes over a set of species, are commonly used to trace the functional and evolutionary history of genes across species and time. With PhyloProfile we can enrich regular phylogenetic profiles with further data like sequence/structure similarity, to make phylogenetic profiling more meaningful. Besides the interactive visualisation powered by R-Shiny, the package offers a set of further analysis features to gain insights like the gene age estimation or core gene identification.

Maintained by Vinh Tran. Last updated 8 days ago.

software visualization datarepresentation multiplecomparison functionalprediction dimensionreduction bioinformatics heatmap interactive-visualizations orthologs phylogenetic-profile shiny

1.8 match 33 stars 7.77 score 10 scripts

bioc

cTRAP:Identification of candidate causal perturbations from differential gene expression data

Compare differential gene expression results with those from known cellular perturbations (such as gene knock-down, overexpression or small molecules) derived from the Connectivity Map. Such analyses make it possible not only to infer the molecular causes of the observed difference in gene expression but also to identify small molecules that could drive or revert specific transcriptomic alterations.

Maintained by Nuno Saraiva-Agostinho. Last updated 5 months ago.

differentialexpression geneexpression rnaseq transcriptomics pathways immunooncology genesetenrichment bioconductor bioinformatics cmap gene-expression l1000

2.8 match 5 stars 5.08 score 16 scripts

bioc

Uniquorn:Identification of cancer cell lines based on their weighted mutational/ variational fingerprint

'Uniquorn' enables users to identify cancer cell lines. Cancer cell line misidentification and cross-contamination represent a significant challenge for cancer researchers. Identification is vital and, in this package, is based on the locations (loci) of somatic and germline mutations/variations. The input format is vcf/vcf.gz, and each file has to contain a single cancer cell line sample (i.e. a single member/genotype/gt column in the vcf file).

Maintained by Raik Otto. Last updated 5 months ago.

immunooncology statisticalmethod wholegenome exomeseq

3.3 match 4.30 score

bioc

logicFS:Identification of SNP Interactions

Identification of interactions between binary variables using Logic Regression. Can, e.g., be used to find interesting SNP interactions. Also contains a bagging version of logic regression for classification.

Maintained by Holger Schwender. Last updated 5 months ago.

snp classification genetics

3.9 match 3.60 score 8 scripts

ropensci

datapack:A Flexible Container to Transport and Manipulate Data and Associated Resources

Provides a flexible container to transport and manipulate complex sets of data. These data may consist of multiple data files and associated meta data and ancillary files. Individual data objects have associated system level meta data, and data files are linked together using the OAI-ORE standard resource map which describes the relationships between the files. The OAI-ORE standard is described at <https://www.openarchives.org/ore/>. Data packages can be serialized and transported as structured files that have been created following the BagIt specification. The BagIt specification is described at <https://tools.ietf.org/html/draft-kunze-bagit-08>.

Maintained by Matthew B. Jones. Last updated 3 years ago.

1.6 match 44 stars 8.56 score 195 scripts 4 dependents

setzler

DiDforBigData:A Big Data Implementation of Difference-in-Differences Estimation with Staggered Treatment

Provides a big-data-friendly and memory-efficient difference-in-differences estimator for staggered (and non-staggered) treatment contexts.

Maintained by Bradley Setzler. Last updated 9 months ago.

2.8 match 5 stars 5.00 score 10 scripts

noreastermt

allelematch:Identifying Unique Multilocus Genotypes where Genotyping Error and Missing Data may be Present

Tools for the identification of unique multilocus genotypes when both genotyping error and missing data may be present; targeted for use with large datasets and databases containing multiple samples of each individual (a common situation in conservation genetics, particularly in non-invasive wildlife sampling applications). Functions explicitly incorporate missing data and can tolerate allele mismatches created by genotyping error. If you use this package, please cite the original publication in Molecular Ecology Resources (Galpern et al., 2012), the details for which can be generated using citation('allelematch'). For a complete vignette, please access via the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.

Maintained by Todd Cross. Last updated 12 months ago.

6.1 match 2.26 score 8 scripts 1 dependents

nk027

BVAR:Hierarchical Bayesian Vector Autoregression

Estimation of hierarchical Bayesian vector autoregressive models following Kuschnig & Vashold (2021) <doi:10.18637/jss.v100.i14>. Implements hierarchical prior selection for conjugate priors in the fashion of Giannone, Lenza & Primiceri (2015) <doi:10.1162/REST_a_00483>. Functions to compute and identify impulse responses, calculate forecasts, forecast error variance decompositions and scenarios are available. Several methods to print, plot and summarise results facilitate analysis.

Maintained by Nikolas Kuschnig. Last updated 4 months ago.

bayesian bvar forecasts impulse-responses vector-autoregressions

1.9 match 51 stars 7.30 score 68 scripts 1 dependents

bioc

Rdisop:Decomposition of Isotopic Patterns

In high resolution mass spectrometry (HR-MS), the measured masses can be decomposed into potential element combinations (chemical sum formulas). Where additional mass/intensity information of respective isotopic peaks is available, decomposition can take this information into account to better rank the potential candidate sum formulas. To compare measured mass/intensity information with the theoretical distribution of candidate sum formulas, the latter needs to be calculated. This package implements fast algorithms to address both tasks, the calculation of isotopic distributions for arbitrary sum formulas (assuming a HR-MS resolution of roughly 30,000), and the ranked list of sum formulas fitting an observed peak or isotopic peak set.

Maintained by Steffen Neumann. Last updated 1 month ago.

immunooncology massspectrometry metabolomics mass-spectrometry cpp

1.5 match 4 stars 9.12 score 111 scripts 2 dependents

matthewblackwell

Amelia:A Program for Missing Data

A tool that "multiply imputes" missing data in a single cross-section (such as a survey), from a time series (like variables collected for each year in a country), or from a time-series-cross-sectional data set (such as collected by years for each of several countries). Amelia II implements our bootstrapping-based algorithm that gives essentially the same answers as the standard IP or EMis approaches, is usually considerably faster than existing approaches and can handle many more variables. Unlike Amelia I and other statistically rigorous imputation software, it virtually never crashes (but please let us know if you find to the contrary!). The program also generalizes existing approaches by allowing for trends in time series across observations within a cross-sectional unit, as well as priors that allow experts to incorporate beliefs they have about the values of missing cells in their data. Amelia II also includes useful diagnostics of the fit of multiple imputation models. The program works from the R command line or via a graphical user interface that does not require users to know R.

Maintained by Matthew Blackwell. Last updated 4 months ago.

openblas cpp

1.5 match 1 stars 9.06 score 1.4k scripts 7 dependents

bioc

NetSAM:Network Seriation And Modularization

The NetSAM (Network Seriation and Modularization) package takes an edge-list representation of a weighted or unweighted network as input, performs network seriation and modularization analysis, and generates files that can be used as input for the one-dimensional network visualization tool NetGestalt (http://www.netgestalt.org) or for other network analyses. The NetSAM package can also generate a correlation network (e.g. a co-expression network) based on the input matrix data, perform seriation and modularization analysis for the correlation network, and calculate the associations between the sample features and modules or identify the associated GO terms for the modules.

Maintained by Zhiao Shi. Last updated 5 months ago.

3.7 match 3.60 score 1 scripts