Showing 200 of 621 results
bioc
genefu:Computation of Gene Expression-Based Signatures in Breast Cancer
This package contains functions implementing various tasks usually required by gene expression analysis, especially in breast cancer studies: gene mapping between different microarray platforms, identification of molecular subtypes, implementation of published gene signatures, gene selection, and survival analysis.
Maintained by Benjamin Haibe-Kains. Last updated 4 months ago.
differentialexpression geneexpression visualization clustering classification
29.7 match 7.42 score 193 scripts 3 dependents
jaredhuling
personalized:Estimation and Validation Methods for Subgroup Identification and Personalized Medicine
Provides functions for fitting and validation of models for subgroup identification and personalized medicine / precision medicine under the general subgroup identification framework of Chen et al. (2017) <doi:10.1111/biom.12676>. This package is intended for use for both randomized controlled trials and observational studies and is described in detail in Huling and Yu (2021) <doi:10.18637/jss.v098.i05>.
Maintained by Jared Huling. Last updated 3 years ago.
causal-inference heterogeneity-of-treatment-effect individualized-treatment-rules personalized-medicine precision-medicine subgroup-identification treatment-effects treatment-scoring
24.3 match 32 stars 7.38 score 125 scripts 1 dependent
alexanderlange53
svars:Data-Driven Identification of SVAR Models
Implements data-driven identification methods for structural vector autoregressive (SVAR) models as described in Lange et al. (2021) <doi:10.18637/jss.v097.i05>. Based on an existing VAR model object (provided by e.g. VAR() from the 'vars' package), the structural impact matrix is obtained via data-driven identification techniques (i.e. changes in volatility (Rigobon, R. (2003) <doi:10.1162/003465303772815727>), patterns of GARCH (Normandin, M., Phaneuf, L. (2004) <doi:10.1016/j.jmoneco.2003.11.002>), independent component analysis (Matteson, D. S, Tsay, R. S., (2013) <doi:10.1080/01621459.2016.1150851>), least dependent innovations (Herwartz, H., Ploedt, M., (2016) <doi:10.1016/j.jimonfin.2015.11.001>), smooth transition in variances (Luetkepohl, H., Netsunajev, A. (2017) <doi:10.1016/j.jedc.2017.09.001>) or non-Gaussian maximum likelihood (Lanne, M., Meitz, M., Saikkonen, P. (2017) <doi:10.1016/j.jeconom.2016.06.002>)).
Maintained by Alexander Lange. Last updated 2 years ago.
20.4 match 46 stars 7.22 score 130 scripts
idigbio
ridigbio:Interface to the iDigBio Data API
An interface to iDigBio's search API that allows downloading specimen records. Searches are returned as a data.frame. Other functions such as the metadata end points return lists of information. iDigBio is a US project focused on digitizing and serving museum specimen collections on the web. See <https://www.idigbio.org> for information on iDigBio.
Maintained by Jesse Bennett. Last updated 6 days ago.
10.0 match 16 stars 10.23 score 63 scripts 7 dependents
bioc
MSnID:Utilities for Exploration and Assessment of Confidence of LC-MSn Proteomics Identifications
Extracts MS/MS ID data from mzIdentML (leveraging mzID package) or text files. After collating the search results from multiple datasets it assesses their identification quality and optimizes filtering criteria to achieve the maximum number of identifications while not exceeding a specified false discovery rate. Also contains a number of utilities to explore the MS/MS results and assess missed and irregular enzymatic cleavages, mass measurement accuracy, etc.
Maintained by Vlad Petyuk. Last updated 5 months ago.
proteomics massspectrometry immunooncology
17.6 match 5.06 score 57 scripts
meyer-lab-cshl
plinkQC:Genotype Quality Control with 'PLINK'
Genotyping arrays enable the direct measurement of an individual's genotype at thousands of markers. 'plinkQC' facilitates genotype quality control for genetic association studies as described by Anderson and colleagues (2010) <doi:10.1038/nprot.2010.116>. It makes 'PLINK' basic statistics (e.g. missing genotyping rates per individual, allele frequencies per genetic marker) and relationship functions accessible from 'R' and generates a per-individual and per-marker quality control report. Individuals and markers that fail the quality control can subsequently be removed to generate a new, clean dataset. Removal of individuals based on relationship status is optimised to retain as many individuals as possible in the study.
Maintained by Hannah Meyer. Last updated 3 years ago.
12.6 match 58 stars 6.75 score 49 scripts
lucaweihs
SEMID:Identifiability of Linear Structural Equation Models
Provides routines to check identifiability or non-identifiability of linear structural equation models as described in Drton, Foygel, and Sullivant (2011) <doi:10.1214/10-AOS859>, Foygel, Draisma, and Drton (2012) <doi:10.1214/12-AOS1012>, and other works. The routines are based on the graphical representation of structural equation models.
Maintained by Nils Sturma. Last updated 2 years ago.
19.8 match 4 stars 4.06 score 29 scripts
bsvars
bsvars:Bayesian Estimation of Structural Vector Autoregressive Models
Provides fast and efficient procedures for Bayesian analysis of Structural Vector Autoregressions. This package estimates a wide range of models, including homo-, heteroskedastic, and non-normal specifications. Structural models can be identified by adjustable exclusion restrictions, time-varying volatility, or non-normality. They all include a flexible three-level equation-specific local-global hierarchical prior distribution for the estimated level of shrinkage for autoregressive and structural parameters. Additionally, the package facilitates predictive and structural analyses such as impulse responses, forecast error variance and historical decompositions, forecasting, verification of heteroskedasticity, non-normality, and hypotheses on autoregressive parameters, as well as analyses of structural shocks, volatilities, and fitted values. Beautiful plots, informative summary functions, and extensive documentation including the vignette by Woźniak (2024) <doi:10.48550/arXiv.2410.15090> complement all this. The implemented techniques align closely with those presented in Lütkepohl, Shang, Uzeda, & Woźniak (2024) <doi:10.48550/arXiv.2404.11057>, Lütkepohl & Woźniak (2020) <doi:10.1016/j.jedc.2020.103862>, and Song & Woźniak (2021) <doi:10.1093/acrefore/9780190625979.013.174>. The 'bsvars' package is aligned regarding objects, workflows, and code structure with the R package 'bsvarSIGNs' by Wang & Woźniak (2024) <doi:10.32614/CRAN.package.bsvarSIGNs>, and they constitute an integrated toolset.
Maintained by Tomasz Woźniak. Last updated 1 month ago.
bayesian-inference econometrics vector-autoregression openblas cpp openmp
10.3 match 46 stars 7.67 score 32 scripts 1 dependent
jniedballa
camtrapR:Camera Trap Data Management and Preparation of Occupancy and Spatial Capture-Recapture Analyses
Management of and data extraction from camera trap data in wildlife studies. The package provides a workflow for storing and sorting camera trap photos (and videos), tabulates records of species and individuals, and creates detection/non-detection matrices for occupancy and spatial capture-recapture analyses with great flexibility. In addition, it can visualise species activity data and provides simple mapping functions with GIS export.
Maintained by Juergen Niedballa. Last updated 3 months ago.
occupancy-modeling spatial-capture-recapture wildlife
8.6 match 35 stars 8.65 score 178 scripts
plangfelder
WGCNA:Weighted Correlation Network Analysis
Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>. Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.
Maintained by Peter Langfelder. Last updated 6 months ago.
7.7 match 54 stars 9.65 score 5.3k scripts 32 dependents
abjur
abjutils:Useful Tools for Jurimetrical Analysis Used by the Brazilian Jurimetrics Association
The Brazilian Jurimetrics Association (ABJ in Portuguese, see <https://abj.org.br/> for more information) is a non-profit organization which aims to investigate and promote the use of statistics and probability in the study of Law and its institutions. This package implements general purpose tools used by ABJ, such as functions for sampling and basic manipulation of Brazilian lawsuit identification numbers. It also implements functions for text cleaning, such as accentuation removal.
Maintained by Caio Lente. Last updated 1 year ago.
10.9 match 55 stars 6.76 score 78 scripts 1 dependent
yangcq-ivy
NicheBarcoding:Niche-model-Based Species Identification
Species Identification using DNA Barcodes Integrated with Environmental Niche Models.
Maintained by Cai-qing YANG. Last updated 7 months ago.
16.7 match 1 star 4.18 score 7 scripts
jinghuazhao
gap:Genetic Analysis Package
As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in-between many projects hence also in line with the name (gap).
Maintained by Jing Hua Zhao. Last updated 17 days ago.
5.7 match 12 stars 11.88 score 448 scripts 16 dependents
bioc
AMOUNTAIN:Active modules for multilayer weighted gene co-expression networks: a continuous optimization approach
A purely data-driven gene network, the weighted gene co-expression network (WGCN), can be constructed from expression profiles alone. Different layers in such networks may represent different time points, multiple conditions or various species. AMOUNTAIN aims to search for active modules in multi-layer WGCN using a continuous optimization approach.
Maintained by Dong Li. Last updated 5 months ago.
geneexpression microarray differentialexpression network gsl
16.5 match 3.78 score 1 script 1 dependent
valentint
rrcovHD:Robust Multivariate Methods for High Dimensional Data
Robust multivariate methods for high dimensional data including outlier detection (Filzmoser and Todorov (2013) <doi:10.1016/j.ins.2012.10.017>), robust sparse PCA (Croux et al. (2013) <doi:10.1080/00401706.2012.727746>, Todorov and Filzmoser (2013) <doi:10.1007/978-3-642-33042-1_31>), robust PLS (Todorov and Filzmoser (2014) <doi:10.17713/ajs.v43i4.44>), and robust sparse classification (Ortner et al. (2020) <doi:10.1007/s10618-019-00666-8>).
Maintained by Valentin Todorov. Last updated 7 months ago.
18.3 match 3.39 score 49 scripts
cpanse
protViz:Visualizing and Analyzing Mass Spectrometry Related Data in Proteomics
Helps with quality checks, visualizations and analysis of mass spectrometry data, coming from proteomics experiments. The package is developed, tested and used at the Functional Genomics Center Zurich <https://fgcz.ch>. We use this package mainly for prototyping, teaching, and having fun with proteomics data. But it can also be used to do data analysis for small scale data sets.
Maintained by Christian Panse. Last updated 1 year ago.
fun mass-spectrometry peptide-identification proteomics quantification visualization cpp
7.5 match 11 stars 7.88 score 72 scripts 2 dependents
r-forge
car:Companion to Applied Regression
Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, 2019.
Maintained by John Fox. Last updated 5 months ago.
3.8 match 15.29 score 43k scripts 901 dependents
santikka
causaleffect:Deriving Expressions of Joint Interventional Distributions and Transport Formulas in Causal Models
Functions for identification and transportation of causal effects. Provides a conditional causal effect identification algorithm (IDC) by Shpitser, I. and Pearl, J. (2006) <http://ftp.cs.ucla.edu/pub/stat_ser/r329-uai.pdf>, an algorithm for transportability from multiple domains with limited experiments by Bareinboim, E. and Pearl, J. (2014) <http://ftp.cs.ucla.edu/pub/stat_ser/r443.pdf>, and a selection bias recovery algorithm by Bareinboim, E. and Tian, J. (2015) <http://ftp.cs.ucla.edu/pub/stat_ser/r445.pdf>. All of the previously mentioned algorithms are based on a causal effect identification algorithm by Tian, J. (2002) <http://ftp.cs.ucla.edu/pub/stat_ser/r309.pdf>.
Maintained by Santtu Tikka. Last updated 2 years ago.
causal-inference causal-models causality-algorithms directed-acyclic-graph graphs identifiability identification igraph
10.9 match 29 stars 5.28 score 44 scripts 1 dependent
pharmaverse
pharmaversesdtm:SDTM Test Data for the 'Pharmaverse' Family of Packages
A set of Study Data Tabulation Model (SDTM) datasets from the Clinical Data Interchange Standards Consortium (CDISC) pilot project used for testing and developing Analysis Data Model (ADaM) datasets inside the pharmaverse family of packages. SDTM dataset specifications are described in the CDISC SDTM implementation guide, accessible by creating a free account on <https://www.cdisc.org/>.
Maintained by Edoardo Mancini. Last updated 16 hours ago.
7.5 match 15 stars 7.46 score 143 scripts
bioc
AlpsNMR:Automated spectraL Processing System for NMR
Reads Bruker NMR data directories both zipped and unzipped. It provides automated and efficient signal processing for untargeted NMR metabolomics. It is able to interpolate the samples, detect outliers, exclude regions, normalize, detect peaks, align the spectra, integrate peaks, manage metadata and visualize the spectra. After spectra processing, it can apply multivariate analysis on extracted data. Efficient plotting with 1-D data is also available. Basic reading of 1D ACD/Labs exported JDX samples is also available.
Maintained by Sergio Oller Moreno. Last updated 5 months ago.
software preprocessing visualization classification cheminformatics metabolomics dataimport
7.3 match 15 stars 7.59 score 12 scripts 1 dependent
bioc
mzR:parser for netCDF, mzXML and mzML and mzIdentML files (mass spectrometry data)
mzR provides a unified API to the common file formats and parsers available for mass spectrometry data. It comes with a subset of the proteowizard library for mzXML, mzML and mzIdentML. The netCDF reading code has previously been used in XCMS.
Maintained by Steffen Neumann. Last updated 1 month ago.
immunooncology infrastructure dataimport proteomics metabolomics massspectrometry zlib cpp
4.3 match 45 stars 12.77 score 204 scripts 44 dependents
cran
apc:Age-Period-Cohort Analysis
Functions for age-period-cohort analysis. Aggregate data can be organised in matrices indexed by age-cohort, age-period or cohort-period. The data can include dose and response or just doses. The statistical model is a generalized linear model (GLM) allowing for 3,2,1 or 0 of the age-period-cohort factors. Individual-level data should have a row for each individual and columns for each of age, period, and cohort. The statistical model for repeated cross-section is a generalized linear model. The statistical model for panel data is ordinary least squares. The canonical parametrisation of Kuang, Nielsen and Nielsen (2008) <DOI:10.1093/biomet/asn026> is used. Thus, the analysis does not rely on ad hoc identification.
Maintained by Bent Nielsen. Last updated 4 years ago.
12.1 match 4.49 score 49 scripts
veronicanava
RamanMP:Analysis and Identification of Raman Spectra of Microplastics
Pre-processing and polymer identification of Raman spectra of plastics. Pre-processing includes normalisation functions, peak identification based on local maxima, smoothing process and removal of spectral region of no interest. Polymer identification can be performed using Pearson correlation coefficient or Euclidean distance (Renner et al. (2019), <doi:10.1016/j.trac.2018.12.004>), and the comparison can be done with a user-defined database or with the database already implemented in the package, which currently includes 356 spectra, with several spectra of plastic colorants.
Maintained by Veronica Nava. Last updated 3 years ago.
15.4 match 6 stars 3.48 score 1 script
bioc
survtype:Subtype Identification with Survival Data
Subtypes are defined as groups of samples that have distinct molecular and clinical features. Genomic data can be analyzed for discovering patient subtypes, associated with clinical data, especially for survival information. This package is aimed to identify subtypes that are both clinically relevant and biologically meaningful.
Maintained by Dongmin Jung. Last updated 5 months ago.
software statisticalmethod geneexpression survival clustering sequencing coverage
13.0 match 4.00 score 3 scripts
bioc
betaHMM:A Hidden Markov Model Approach for Identifying Differentially Methylated Sites and Regions for Beta-Valued DNA Methylation Data
A novel approach that uses a homogeneous hidden Markov model to effectively model untransformed beta values and to identify DMCs while considering the spatial correlation of adjacent CpG sites.
Maintained by Koyel Majumdar. Last updated 3 months ago.
dnamethylation differentialmethylation immunooncology biomedicalinformatics methylationarray software multiplecomparison sequencing spatial coverage genetarget hiddenmarkovmodel microarray
12.3 match 4.18 score
magnusdv
dvir:Disaster Victim Identification
Joint DNA-based disaster victim identification (DVI), as described in Vigeland and Egeland (2021) <doi:10.21203/rs.3.rs-296414/v1>. Identification is performed by optimising the joint likelihood of all victim samples and reference individuals. Individual identification probabilities, conditional on all available information, are derived from the joint solution in the form of posterior pairing probabilities. 'dvir' is part of the 'pedsuite' collection of packages for pedigree analysis.
Maintained by Magnus Dehli Vigeland. Last updated 3 months ago.
10.1 match 3 stars 5.05 score 21 scripts 1 dependent
a91quaini
intrinsicFRP:An R Package for Factor Model Asset Pricing
Functions for evaluating and testing asset pricing models, including estimation and testing of factor risk premia, selection of "strong" risk factors (factors having nonzero population correlation with test asset returns), heteroskedasticity and autocorrelation robust covariance matrix estimation and testing for model misspecification and identification. The functions for estimating and testing factor risk premia implement the Fama-MacBeth (1973) <doi:10.1086/260061> two-pass approach, the misspecification-robust approaches of Kan-Robotti-Shanken (2013) <doi:10.1111/jofi.12035>, and the approaches based on tradable factor risk premia of Quaini-Trojani-Yuan (2023) <doi:10.2139/ssrn.4574683>. The functions for selecting the "strong" risk factors are based on the Oracle estimator of Quaini-Trojani-Yuan (2023) <doi:10.2139/ssrn.4574683> and the factor screening procedure of Gospodinov-Kan-Robotti (2014) <doi:10.2139/ssrn.2579821>. The functions for evaluating model misspecification implement the HJ model misspecification distance of Kan-Robotti (2008) <doi:10.1016/j.jempfin.2008.03.003>, which is a modification of the prominent Hansen-Jagannathan (1997) <doi:10.1111/j.1540-6261.1997.tb04813.x> distance. The functions for testing model identification specialize the Kleibergen-Paap (2006) <doi:10.1016/j.jeconom.2005.02.011> and the Chen-Fang (2019) <doi:10.1111/j.1540-6261.1997.tb04813.x> rank test to the regression coefficient matrix of test asset returns on risk factors. Finally, the function for heteroskedasticity and autocorrelation robust covariance estimation implements the Newey-West (1994) <doi:10.2307/2297912> covariance estimator.
Maintained by Alberto Quaini. Last updated 8 months ago.
factor-models factor-selection finance identification-tests misspecification rcpparmadillo risk-premium openblas cpp openmp
11.5 match 7 stars 4.45 score 1 script
igordot
clustermole:Unbiased Single-Cell Transcriptomic Data Cell Type Identification
Assignment of cell type labels to single-cell RNA sequencing (scRNA-seq) clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging when unexpected or poorly described populations are present. The clustermole R package provides methods to query thousands of human and mouse cell identity markers sourced from a variety of databases.
Maintained by Igor Dolgalev. Last updated 1 year ago.
cell-type cell-type-annotation cell-type-classification cell-type-identification cell-type-matching gene-expression-signatures scrna-seq single-cell
9.5 match 13 stars 5.37 score 36 scripts
bioc
CNEr:CNE Detection and Visualization
Large-scale identification and advanced visualization of sets of conserved noncoding elements.
Maintained by Ge Tan. Last updated 5 months ago.
generegulation visualization dataimport
5.4 match 3 stars 9.28 score 35 scripts 19 dependents
bioc
miRspongeR:Identification and analysis of miRNA sponge regulation
This package provides several functions to explore miRNA sponge (also called ceRNA or miRNA decoy) regulation from putative miRNA-target interactions or/and transcriptomics data (including bulk, single-cell and spatial gene expression data). It provides eight popular methods for identifying miRNA sponge interactions, and an integrative method to integrate miRNA sponge interactions from different methods, as well as the functions to validate miRNA sponge interactions, and infer miRNA sponge modules, conduct enrichment analysis of miRNA sponge modules, and conduct survival analysis of miRNA sponge modules. By using a sample control variable strategy, it provides a function to infer sample-specific miRNA sponge interactions. In terms of sample-specific miRNA sponge interactions, it implements three similarity methods to construct sample-sample correlation network.
Maintained by Junpeng Zhang. Last updated 5 months ago.
geneexpression biomedicalinformatics networkenrichment survival microarray software singlecell spatial rnaseq cerna mirna sponge
8.4 match 5 stars 5.88 score 8 scripts
bioc
MSnbase:Base Functions and Classes for Mass Spectrometry and Proteomics
MSnbase provides infrastructure for manipulation, processing and visualisation of mass spectrometry and proteomics data, ranging from raw to quantitative and annotated data.
Maintained by Laurent Gatto. Last updated 3 days ago.
immunooncology infrastructure proteomics massspectrometry qualitycontrol dataimport bioconductor bioinformatics mass-spectrometry proteomics-data visualisation cpp
3.8 match 130 stars 12.81 score 772 scripts 36 dependents
asalavaty
influential:Identification and Classification of the Most Influential Nodes
Contains functions for the classification and ranking of top candidate features, reconstruction of networks from adjacency matrices and data frames, analysis of the topology of the network and calculation of centrality measures, and identification of the most influential nodes. Also, a function is provided for running SIRIR model, which is the combination of leave-one-out cross validation technique and the conventional SIR model, on a network to unsupervisedly rank the true influence of vertices. Additionally, some functions have been provided for the assessment of dependence and correlation of two network centrality measures as well as the conditional probability of deviation from their corresponding means in opposite direction. Fred Viole and David Nawrocki (2013, ISBN:1490523995). Csardi G, Nepusz T (2006). "The igraph software package for complex network research." InterJournal, Complex Systems, 1695. Adopted algorithms and sources are referenced in function document.
Maintained by Adrian Salavaty. Last updated 5 months ago.
centrality-measures classification-model influence-ranking network-analysis priaritization-model
7.4 match 27 stars 6.54 score 43 scripts 1 dependent
thibautjombart
adegenet:Exploratory Analysis of Genetic and Genomic Data
Toolset for the exploration of genetic and genomic data. Adegenet provides formal (S4) classes for storing and handling various genetic data, including genetic markers with varying ploidy and hierarchical population structure ('genind' class), alleles counts by populations ('genpop'), and genome-wide SNP data ('genlight'). It also implements original multivariate methods (DAPC, sPCA), graphics, statistical tests, simulation tools, distance and similarity measures, and several spatial methods. A range of both empirical and simulated datasets is also provided to illustrate various methods.
Maintained by Zhian N. Kamvar. Last updated 1 month ago.
3.7 match 182 stars 12.60 score 1.9k scripts 29 dependents
bioc
ILoReg:ILoReg: a tool for high-resolution cell population identification from scRNA-Seq data
ILoReg is a tool for identification of cell populations from scRNA-seq data. In particular, ILoReg is useful for finding cell populations with subtle transcriptomic differences. The method utilizes a self-supervised learning method, called Iterative Clustering Projection (ICP), to find cluster probabilities, which are used in noise reduction prior to PCA and the subsequent hierarchical clustering and t-SNE steps. Additionally, functions for differential expression analysis to find gene markers for the populations and gene expression visualization are provided.
Maintained by Johannes Smolander. Last updated 5 months ago.
singlecell software clustering dimensionreduction rnaseq visualization transcriptomics datarepresentation differentialexpression transcription geneexpression
9.3 match 5 stars 4.88 score 2 scripts
cran
deident:Persistent Data Anonymization Pipeline
A framework for the replicable removal of personally identifiable data (PID) in data sets. The package implements a suite of methods to suit different data types based on the suggestions of Garfinkel (2015) <doi:10.6028/NIST.IR.8053> and the ICO "Guidelines on Anonymization" (2012) <https://ico.org.uk/media/1061/anonymisation-code.pdf>.
Maintained by Robert Cook. Last updated 4 months ago.
14.2 match 3.16 score 16 scripts
klausvigo
kknn:Weighted k-Nearest Neighbors
Weighted k-Nearest Neighbors for Classification, Regression and Clustering.
Maintained by Klaus Schliep. Last updated 4 years ago.
4.0 match 23 stars 11.08 score 4.6k scripts 41 dependents
bioc
synapter:Label-free data analysis pipeline for optimal identification and quantitation
The synapter package provides functionality to reanalyse label-free proteomics data acquired on a Synapt G2 mass spectrometer. One or several runs, possibly processed with additional ion mobility separation to increase identification accuracy, can be combined with other quantitation files to maximise identification and quantitation accuracy.
Maintained by Laurent Gatto. Last updated 6 days ago.
immunooncology massspectrometry proteomics qualitycontrol
9.1 match 4 stars 4.73 score 5 scripts
bioc
ASICS:Automatic Statistical Identification in Complex Spectra
With a set of pure metabolite reference spectra, ASICS quantifies concentration of metabolites in a complex spectrum. The identification of metabolites is performed by fitting a mixture model to the spectra of the library with a sparse penalty. The method and its statistical properties are described in Tardivel et al. (2017) <doi:10.1007/s11306-017-1244-5>.
Maintained by Gaëlle Lefort. Last updated 5 months ago.
software dataimport cheminformatics metabolomics
8.2 match 5.18 score 30 scripts
bioc
iPAC:Identification of Protein Amino acid Clustering
iPAC is a novel tool to identify somatic amino acid mutation clustering within proteins while taking into account protein structure.
Maintained by Gregory Ryslik. Last updated 3 days ago.
7.7 match 5.56 score 4 scripts 3 dependents
davidhofmeyr
PPCI:Projection Pursuit for Cluster Identification
Implements recently developed projection pursuit algorithms for finding optimal linear cluster separators. The clustering algorithms use optimal hyperplane separators based on minimum density, Pavlidis et al. (2016) <https://jmlr.csail.mit.edu/papers/volume17/15-307/15-307.pdf>; minimum normalised cut, Hofmeyr (2017) <doi:10.1109/TPAMI.2016.2609929>; and maximum variance ratio clusterability, Hofmeyr and Pavlidis (2015) <doi:10.1109/SSCI.2015.116>.
Maintained by David Hofmeyr. Last updated 5 years ago.
12.5 match 2 stars 3.26 score 18 scripts
bioc
doubletrouble:Identification and classification of duplicated genes
doubletrouble aims to identify duplicated genes from whole-genome protein sequences and classify them based on their modes of duplication. The duplication modes are i. segmental duplication (SD); ii. tandem duplication (TD); iii. proximal duplication (PD); iv. transposed duplication (TRD) and; v. dispersed duplication (DD). Transposon-derived duplicates (TRD) can be further subdivided into rTRD (retrotransposon-derived duplication) and dTRD (DNA transposon-derived duplication). If users want a simpler classification scheme, duplicates can also be classified into SD- and SSD-derived (small-scale duplication) gene pairs. Besides classifying gene pairs, users can also classify genes, so that each gene is assigned a unique mode of duplication. Users can also calculate substitution rates per substitution site (i.e., Ka and Ks) from duplicate pairs, find peaks in Ks distributions with Gaussian Mixture Models (GMMs), and classify gene pairs into age groups based on Ks peaks.
Maintained by Fabrício Almeida-Silva. Last updated 5 days ago.
software wholegenome comparativegenomics functionalgenomics phylogenetics network classification bioinformatics comparative-genomics gene-duplication molecular-evolution whole-genome-duplication
6.3 match 23 stars 6.44 score 17 scripts
tidymodels
modeldata:Data Sets Useful for Modeling Examples
Data sets used for demonstrating or testing model-related packages are contained in this package.
Maintained by Max Kuhn. Last updated 5 months ago.
3.8 match 22 stars 10.66 score 2.2k scripts 17 dependents
rstudio
reticulate:Interface to 'Python'
Interface to 'Python' modules, classes, and functions. When calling into 'Python', R data types are automatically converted to their equivalent 'Python' types. When values are returned from 'Python' to R they are converted back to R types. Compatible with all versions of 'Python' >= 2.7.
Maintained by Tomasz Kalinowski. Last updated 13 hours ago.
1.9 match 1.7k stars 21.07 score 18k scripts 429 dependents
paterijk
MCDA:Support for the Multicriteria Decision Aiding Process
Support for the analyst in a Multicriteria Decision Aiding (MCDA) process with algorithms, preference elicitation and data visualisation functions. Sébastien Bigaret, Richard Hodgett, Patrick Meyer, Tatyana Mironova, Alexandru Olteanu (2017) Supporting the multi-criteria decision aiding process : R and the MCDA package, Euro Journal On Decision Processes, Volume 5, Issue 1 - 4, pages 169 - 194 <doi:10.1007/s40070-017-0064-1>.
Maintained by Patrick Meyer. Last updated 2 years ago.
6.5 match 30 stars 6.04 score 182 scripts
hdarjus
sparvaride:Variance Identification in Sparse Factor Analysis
This is an implementation of the algorithm described in Section 3 of Hosszejni and Frühwirth-Schnatter (2022) <doi:10.48550/arXiv.2211.00671>. The algorithm is used to verify that the counting rule CR(r,1) holds for the sparsity pattern of the transpose of a factor loading matrix. As detailed in Section 2 of the same paper, if CR(r,1) holds, then the idiosyncratic variances are generically identified. If CR(r,1) does not hold, then we do not know whether the idiosyncratic variances are identified or not.
Maintained by Darjus Hosszejni. Last updated 2 years ago.
econometricsfactor-analysislatent-factorsparameter-identificationcpp
10.5 match 1 stars 3.70 score 4 scripts
bioc
IVAS:Identification of genetic Variants affecting Alternative Splicing
Identification of genetic variants affecting alternative splicing.
Maintained by Seonggyun Han. Last updated 5 months ago.
immunooncologyalternativesplicingdifferentialexpressiondifferentialsplicinggeneexpressiongeneregulationregressionrnaseqsequencingsnpsoftwaretranscription
8.1 match 4.78 score 1 scripts 1 dependents
bioc
CluMSID:Clustering of MS2 Spectra for Metabolite Identification
CluMSID is a tool that aids the identification of features in untargeted LC-MS/MS analysis by the use of MS2 spectra similarity and unsupervised statistical methods. It offers functions for a complete and customisable workflow from raw data to visualisations and is interfaceable with the xcms family of preprocessing packages.
Maintained by Tobias Depke. Last updated 5 months ago.
metabolomicspreprocessingclustering
6.4 match 10 stars 6.04 score 22 scripts
bioc
PIUMA:Phenotypes Identification Using Mapper from topological data Analysis
The PIUMA package offers a tidy pipeline of Topological Data Analysis frameworks to identify and characterize communities in high and heterogeneous dimensional data.
Maintained by Mattia Chiesa. Last updated 5 months ago.
clusteringgraphandnetworkdimensionreductionnetworkclassification
7.3 match 4 stars 5.08 score 2 scripts
thomasjemielita
StratifiedMedicine:Stratified Medicine
A toolkit for stratified medicine, subgroup identification, and precision medicine. Current tools include (1) filtering models (reduce covariate space), (2) patient-level estimate models (counterfactual patient-level quantities, such as the conditional average treatment effect), (3) subgroup identification models (find subsets of patients with similar treatment effects), and (4) treatment effect estimation and inference (for the overall population and discovered subgroups). These tools can be customized and are directly used in PRISM (patient response identifiers for stratified medicine; Jemielita and Mehrotra 2019 <arXiv:1912.03337>). This package is in beta and will be continually updated.
Maintained by Thomas Jemielita. Last updated 3 years ago.
7.8 match 2 stars 4.73 score 27 scripts
xdomingoal
erah:Automated Spectral Deconvolution, Alignment, and Metabolite Identification in GC/MS-Based Untargeted Metabolomics
Automated compound deconvolution, alignment across samples, and identification of metabolites by spectral library matching in Gas Chromatography - Mass spectrometry (GC-MS) untargeted metabolomics. Outputs a table with compound names, matching scores and the integrated area of the compound for each sample. Package implementation is described in Domingo-Almenara et al. (2016) <doi:10.1021/acs.analchem.6b02927>.
Maintained by Xavier Domingo-Almenara. Last updated 1 years ago.
7.7 match 5 stars 4.70 score 20 scripts
bioc
rTRM:Identification of Transcriptional Regulatory Modules from Protein-Protein Interaction Networks
rTRM identifies transcriptional regulatory modules (TRMs) from protein-protein interaction networks.
Maintained by Diego Diez. Last updated 5 months ago.
transcriptionnetworkgeneregulationgraphandnetworkbioconductorbioinformatics
7.4 match 3 stars 4.86 score 3 scripts 1 dependents
kurthornik
mlbench:Machine Learning Benchmark Problems
A collection of artificial and real-world machine learning benchmark problems, including, e.g., several data sets from the UCI repository.
Maintained by Kurt Hornik. Last updated 3 months ago.
4.0 match 2 stars 8.93 score 5.0k scripts 55 dependents
bioboot
bio3d:Biological Structure Analysis
Utilities to process, organize and explore protein structure, sequence and dynamics data. Features include the ability to read and write structure, sequence and dynamic trajectory data, perform sequence and structure database searches, data summaries, atom selection, alignment, superposition, rigid core identification, clustering, torsion analysis, distance matrix analysis, structure and sequence conservation analysis, normal mode analysis, principal component analysis of heterogeneous structure data, and correlation network analysis from normal mode and molecular dynamics data. In addition, various utility functions are provided to enable the statistical and graphical power of the R environment to work with biological sequence and structural data. Please refer to the URLs below for more information.
Maintained by Barry Grant. Last updated 5 months ago.
4.2 match 5 stars 8.49 score 1.4k scripts 10 dependents
mlampros
fastText:Efficient Learning of Word Representations and Sentence Classification
An interface to the 'fastText' <https://github.com/facebookresearch/fastText> library for efficient learning of word representations and sentence classification. The 'fastText' algorithm is explained in detail in (i) "Enriching Word Vectors with subword Information", Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov, 2017, <doi:10.1162/tacl_a_00051>; (ii) "Bag of Tricks for Efficient Text Classification", Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov, 2017, <doi:10.18653/v1/e17-2068>; (iii) "FastText.zip: Compressing text classification models", Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Herve Jegou, Tomas Mikolov, 2016, <arXiv:1612.03651>.
Maintained by Lampros Mouselimis. Last updated 1 years ago.
4.8 match 42 stars 7.37 score 56 scripts
bioc
TADCompare:TADCompare: Identification and characterization of differential TADs
TADCompare is an R package designed to identify and characterize differential Topologically Associated Domains (TADs) between multiple Hi-C contact matrices. It contains functions for finding differential TADs between two datasets, finding differential TADs over time and identifying consensus TADs across multiple matrices. It takes all of the main types of HiC input and returns simple, comprehensive, easy to analyze results.
Maintained by Mikhail Dozmorov. Last updated 5 months ago.
softwarehicsequencingfeatureextractionclustering
5.0 match 23 stars 7.04 score 10 scripts
ingorohlfing
MMRcaseselection:Case Classification and Selection Based on Regression Results
Researchers doing a mixed-methods analysis (nested analysis as developed by Lieberman (2005) <doi:10.1017/S0003055405051762>) can use the package for the classification of cases and case selection using results of a linear regression. One can designate cases as typical, deviant, extreme and pathway case and use different case selection strategies for the choice of a case belonging to one of these types.
Maintained by Ingo Rohlfing. Last updated 3 years ago.
8.0 match 1 stars 4.38 score 12 scripts
bioc
ppcseq:Probabilistic Outlier Identification for RNA Sequencing Generalized Linear Models
Relative transcript abundance has proven to be a valuable tool for understanding the function of genes in biological systems. For the differential analysis of transcript abundance using RNA sequencing data, the negative binomial model is by far the most frequently adopted. However, common methods that are based on a negative binomial model are not robust to extreme outliers, which we found to be abundant in public datasets. So far, no rigorous and probabilistic methods for detection of outliers have been developed for RNA sequencing data, leaving the identification mostly to visual inspection. Recent advances in Bayesian computation allow large-scale comparison of observed data against its theoretical distribution given in a statistical model. Here we propose ppcseq, a key quality-control tool for identifying transcripts that include outlier data points in differential expression analysis, which do not follow a negative binomial distribution. Applying ppcseq to analyse several publicly available datasets using popular tools, we show that from 3 to 10 percent of differentially abundant transcripts across algorithms and datasets had statistics inflated by the presence of outliers.
Maintained by Stefano Mangiola. Last updated 5 months ago.
rnaseqdifferentialexpressiongeneexpressionnormalizationclusteringqualitycontrolsequencingtranscriptiontranscriptomicsbayesian-inferencedeseq2edgernegative-binomialoutlierstancpp
6.1 match 8 stars 5.71 score 16 scripts
santikka
dosearch:Causal Effect Identification from Multiple Incomplete Data Sources
Identification of causal effects from arbitrary observational and experimental probability distributions via do-calculus and standard probability manipulations using a search-based algorithm by Tikka, Hyttinen and Karvanen (2021) <doi:10.18637/jss.v099.i05>. Allows for the presence of mechanisms related to selection bias (Bareinboim and Tian, 2015) <doi:10.1609/aaai.v29i1.9679>, transportability (Bareinboim and Pearl, 2014) <http://ftp.cs.ucla.edu/pub/stat_ser/r443.pdf>, missing data (Mohan, Pearl, and Tian, 2013) <http://ftp.cs.ucla.edu/pub/stat_ser/r410.pdf> and arbitrary combinations of these. Also supports identification in the presence of context-specific independence (CSI) relations through labeled directed acyclic graphs (LDAG). For details on CSIs see (Corander et al., 2019) <doi:10.1016/j.apal.2019.04.004>.
Maintained by Santtu Tikka. Last updated 8 months ago.
c-plus-pluscausal-inferencecausal-modelscausalitycausality-algorithmsdirected-acyclic-graphgraphslabeled-graphscpp
6.5 match 7 stars 5.32 score 8 scripts 1 dependents
topepo
caret:Classification and Regression Training
Misc functions for training and plotting classification and regression models.
Maintained by Max Kuhn. Last updated 3 months ago.
1.8 match 1.6k stars 19.24 score 61k scripts 303 dependents
bioc
cardelino:Clone Identification from Single Cell Data
Methods to infer clonal tree configuration for a population of cells using single-cell RNA-seq data (scRNA-seq), and possibly other data modalities. Methods are also provided to assign cells to inferred clones and explore differences in gene expression between clones. These methods can flexibly integrate information from imperfect clonal trees inferred based on bulk exome-seq data, and sparse variant alleles expressed in scRNA-seq data. A flexible beta-binomial error model that accounts for stochastic dropout events as well as systematic allelic imbalance is used.
Maintained by Davis McCarthy. Last updated 5 months ago.
singlecellrnaseqvisualizationtranscriptomicsgeneexpressionsequencingsoftwareexomeseqclonal-clusteringgibbs-samplingscrna-seqsingle-cellsomatic-mutations
4.9 match 61 stars 7.05 score 62 scripts
bioc
CAMERA:Collection of annotation related methods for mass spectrometry data
Annotation of peaklists generated by xcms, rule based annotation of isotopes and adducts, isotope validation, EIC correlation based tagging of unknown adducts and fragments
Maintained by Steffen Neumann. Last updated 5 months ago.
immunooncologymassspectrometrymetabolomics
3.3 match 11 stars 10.27 score 175 scripts 6 dependents
bioc
CEMiTool:Co-expression Modules identification Tool
The CEMiTool package unifies the discovery and the analysis of coexpression gene modules in a fully automatic manner, while providing a user-friendly html report with high quality graphs. Our tool evaluates if modules contain genes that are over-represented by specific pathways or that are altered in a specific sample group. Additionally, CEMiTool is able to integrate transcriptomic data with interactome information, identifying the potential hubs on each network.
Maintained by Helder Nakaya. Last updated 5 months ago.
geneexpressiontranscriptomicsgraphandnetworkmrnamicroarrayrnaseqnetworknetworkenrichmentpathwaysimmunooncology
5.9 match 5.76 score 38 scripts
boopsboops
spider:Species Identity and Evolution in R
Analysis of species limits and DNA barcoding data. Included are functions for generating important summary statistics from DNA barcode data, assessing specimen identification efficacy, testing and optimizing divergence threshold limits, assessment of diagnostic nucleotides, and calculation of the probability of reciprocal monophyly. Additionally, a sliding window function offers opportunities to analyse information across a gene, often used for marker design in degraded DNA studies. Further information on the package has been published in Brown et al (2012) <doi:10.1111/j.1755-0998.2011.03108.x>.
Maintained by Rupert A. Collins. Last updated 6 years ago.
dna-barcodeednaevolutionspecies-delimitationspecies-identity
6.5 match 2 stars 5.20 score 66 scripts 1 dependents
bioc
FLAMES:FLAMES: Full Length Analysis of Mutations and Splicing in long read RNA-seq data
Semi-supervised isoform detection and annotation from both bulk and single-cell long read RNA-seq data. Flames provides automated pipelines for analysing isoforms, as well as intermediate functions for manual execution.
Maintained by Changqing Wang. Last updated 7 days ago.
rnaseqsinglecelltranscriptomicsdataimportdifferentialsplicingalternativesplicinggeneexpressionlongreadzlibcurlbzip2xz-utilscpp
4.3 match 31 stars 7.95 score 12 scripts
prise6
aVirtualTwins:Adaptation of Virtual Twins Method from Jared Foster
Identification of subgroups in randomized clinical trials with a binary outcome and two treatment groups. This is an adaptation of the Jared Foster method (<https://www.ncbi.nlm.nih.gov/pubmed/21815180>).
Maintained by Francois Vieille. Last updated 7 years ago.
7.5 match 4 stars 4.51 score 16 scripts
bioc
CHETAH:Fast and accurate scRNA-seq cell type identification
CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is an accurate, selective and fast scRNA-seq classifier. Classification is guided by a reference dataset, preferably also a scRNA-seq dataset. By hierarchical clustering of the reference data, CHETAH creates a classification tree that enables a step-wise, top-to-bottom classification. Using a novel stopping rule, CHETAH classifies the input cells to the cell types of the references and to "intermediate types": more general classifications that ended in an intermediate node of the tree.
Maintained by Jurrian de Kanter. Last updated 5 months ago.
classificationrnaseqsinglecellclusteringgeneexpressionimmunooncology
4.6 match 44 stars 7.27 score 70 scripts
bioc
EventPointer:An effective identification of alternative splicing events using junction arrays and RNA-Seq data
EventPointer is an R package to identify alternative splicing events that involve either simple (case-control experiment) or complex experimental designs such as time course experiments and studies including paired-samples. The algorithm can be used to analyze data from either junction arrays (Affymetrix Arrays) or sequencing data (RNA-Seq). The software returns a data.frame with the detected alternative splicing events: gene name, type of event (cassette, alternative 3',...,etc), genomic position, statistical significance and increment of the percent spliced in (Delta PSI) for all the events. The algorithm can generate a series of files to visualize the detected alternative splicing events in IGV. This eases the interpretation of results and the design of primers for standard PCR validation.
Maintained by Juan A. Ferrer-Bonsoms. Last updated 5 months ago.
alternativesplicingdifferentialsplicingmrnamicroarrayrnaseqtranscriptionsequencingtimecourseimmunooncology
5.4 match 4 stars 6.00 score 6 scripts
emmanuelparadis
ape:Analyses of Phylogenetics and Evolution
Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.
Maintained by Emmanuel Paradis. Last updated 2 days ago.
1.9 match 64 stars 17.22 score 13k scripts 599 dependents
r-econometrics
lfe:Linear Group Fixed Effects
Transforms away factors with many levels prior to doing an OLS. Useful for estimating linear models with multiple group fixed effects, and for estimating linear models which use factors with many levels as pure control variables. See Gaure (2013) <doi:10.1016/j.csda.2013.03.024>. Includes support for instrumental variables, conditional F statistics for weak instruments, robust and multi-way clustered standard errors, as well as limited mobility bias correction (Gaure 2014 <doi:10.1002/sta4.68>). Since version 3.0, it provides dedicated functions to estimate Poisson models.
Maintained by Mauricio Vargas Sepulveda. Last updated 1 years ago.
3.1 match 10.30 score 1.8k scripts 5 dependents
bioc
geneAttribution:Identification of candidate genes associated with genetic variation
Identification of the most likely gene or genes through which variation at a given genomic locus in the human genome acts. The most basic functionality assumes that the closer a gene is to the input locus, the more likely the gene is to be causative. Additionally, any empirical data that links genomic regions to genes (e.g. eQTL or genome conformation data) can be used if it is supplied in the UCSC .BED file format.
Maintained by Arthur Wuster. Last updated 5 months ago.
snpgenepredictiongenomewideassociationvariantannotationgenomicvariation
7.9 match 4.00 score 3 scripts
ericarcher
banter:BioAcoustic eveNT classifiER
Create a hierarchical acoustic event species classifier out of multiple call type detectors as described in Rankin et al (2017) <doi:10.1111/mms.12381>.
Maintained by Eric Archer. Last updated 1 years ago.
acousticsbioacousticscetaceansclassificationdolphinsmachine-learningnoaarandom-forestspecies-identificationsupervised-learningsupervised-machine-learningwhalesjagscpp
7.5 match 9 stars 4.22 score 37 scripts
atbounds
ATbounds:Bounding Treatment Effects by Limited Information Pooling
Estimation and inference methods for bounding average treatment effects (on the treated) that are valid under an unconfoundedness assumption. The bounds are designed to be robust in challenging situations, for example, when the conditioning variables take on a large number of different values in the observed sample, or when the overlap condition is violated. This robustness is achieved by only using limited "pooling" of information across observations. For more details, see the paper by Lee and Weidner (2021), "Bounding Treatment Effects by Pooling Limited Information across Observations," <arXiv:2111.05243>.
Maintained by Sokbae Lee. Last updated 3 years ago.
causal-inferencelack-of-overlaplimited-overlappartial-identificationtreatment-effectsunconfoundedness-assumption
7.5 match 3 stars 4.18 score 6 scripts
bioc
BioNERO:Biological Network Reconstruction Omnibus
BioNERO aims to integrate all aspects of biological network inference in a single package, including data preprocessing, exploratory analyses, network inference, and analyses for biological interpretations. BioNERO can be used to infer gene coexpression networks (GCNs) and gene regulatory networks (GRNs) from gene expression data. Additionally, it can be used to explore topological properties of protein-protein interaction (PPI) networks. GCN inference relies on the popular WGCNA algorithm. GRN inference is based on the "wisdom of the crowds" principle, which consists in inferring GRNs with multiple algorithms (here, CLR, GENIE3 and ARACNE) and calculating the average rank for each interaction pair. As all steps of network analyses are included in this package, BioNERO spares users from having to learn the syntaxes of several packages and how to communicate between them. Finally, users can also identify consensus modules across independent expression sets and calculate intra and interspecies module preservation statistics between different networks.
Maintained by Fabricio Almeida-Silva. Last updated 5 months ago.
softwaregeneexpressiongeneregulationsystemsbiologygraphandnetworkpreprocessingnetworknetworkinference
4.0 match 27 stars 7.78 score 50 scripts 1 dependents
spluque
diveMove:Dive Analysis and Calibration
Utilities to represent, visualize, filter, analyse, and summarize time-depth recorder (TDR) data. Miscellaneous functions for handling location data are also provided.
Maintained by Sebastian P. Luque. Last updated 5 months ago.
animal-behaviorbehavioural-ecologybiologydivingscience
4.6 match 6 stars 6.75 score 55 scripts
bioc
IPO:Automated Optimization of XCMS Data Processing parameters
The outcome of XCMS data processing strongly depends on the parameter settings. IPO (`Isotopologue Parameter Optimization`) is a parameter optimization tool that is applicable for different kinds of samples and liquid chromatography coupled to high resolution mass spectrometry devices, fast and free of labeling steps. IPO uses natural, stable 13C isotopes to calculate a peak picking score. Retention time correction is optimized by minimizing the relative retention time differences within features and grouping parameters are optimized by maximizing the number of features showing exactly one peak from each injection of a pooled sample. The different parameter settings are achieved by design of experiment. The resulting scores are evaluated using response surface models.
Maintained by Thomas Lieb. Last updated 5 months ago.
immunooncologymetabolomicsmassspectrometry
3.8 match 34 stars 8.14 score 41 scripts
trevorhastie
mda:Mixture and Flexible Discriminant Analysis
Mixture and flexible discriminant analysis, multivariate adaptive regression splines (MARS), BRUTO, and vector-response smoothing splines. Hastie, Tibshirani and Friedman (2009) "Elements of Statistical Learning (second edition, chap 12)" Springer, New York.
Maintained by Trevor Hastie. Last updated 4 months ago.
4.0 match 3 stars 7.60 score 428 scripts 17 dependents
bioc
mzID:An mzIdentML parser for R
A parser for mzIdentML files implemented using the XML package. The parser tries to be general and able to handle all types of mzIdentML files with the drawback of having less 'pretty' output than a vendor specific parser. Please contact the maintainer with any problems and supply an mzIdentML file so the problems can be fixed quickly.
Maintained by Laurent Gatto. Last updated 5 months ago.
immunooncologydataimportmassspectrometryproteomics
3.9 match 7.83 score 32 scripts 38 dependents
sokbae
ciccr:Causal Inference in Case-Control and Case-Population Studies
Estimation and inference methods for causal relative and attributable risk in case-control and case-population studies under the monotone treatment response and monotone treatment selection assumptions. For more details, see the paper by Jun and Lee (2023), "Causal Inference under Outcome-Based Sampling with Monotonicity Assumptions," <arXiv:2004.08318 [econ.EM]>, accepted for publication in Journal of Business & Economic Statistics.
Maintained by Sokbae Lee. Last updated 1 years ago.
case-control-studiescausal-inferencepartial-identificationtreatment-effects
7.5 match 2 stars 4.00 score 4 scripts
eriqande
rubias:Bayesian Inference from the Conditional Genetic Stock Identification Model
Implements Bayesian inference for the conditional genetic stock identification model. It allows inference of mixed fisheries and also simulation of mixtures to predict accuracy. A full description of the underlying methods is available in a recently published article in the Canadian Journal of Fisheries and Aquatic Sciences: <doi:10.1139/cjfas-2018-0016>.
Maintained by Eric C. Anderson. Last updated 1 years ago.
5.1 match 3 stars 5.90 score 89 scripts
schaubert
catdata:Categorical Data
This R-package contains examples from the book "Regression for Categorical Data", Tutz 2012, Cambridge University Press. The names of the examples refer to the chapter and the data set that is used.
Maintained by Gunther Schauberger. Last updated 1 years ago.
4.5 match 6.61 score 158 scripts 2 dependents
bioc
DNAfusion:Identification of gene fusions using paired-end sequencing
DNAfusion can identify gene fusions such as EML4-ALK based on paired-end sequencing results. This package was developed using position deduplicated BAM files generated with the AVENIO Oncology Analysis Software. These files are made using the AVENIO ctDNA surveillance kit and Illumina Nextseq 500 sequencing. This is a targeted hybridization NGS approach and includes ALK-specific but not EML4-specific probes.
Maintained by Christoffer Trier Maansson. Last updated 5 months ago.
targetedresequencinggeneticsgenefusiondetectionsequencingbioconductor-packagecirculating-tumor-dnagene-fusionliquid-biopsynext-generation-sequencingtargeted-sequencingvariant-calling
6.6 match 3 stars 4.48 score 10 scripts
biogenies
CancerGram:Prediction of Anticancer Peptides
Predicts anticancer peptides using random forests trained on the n-gram encoded peptides. The implemented algorithm can be accessed from both the command line and shiny-based GUI. The CancerGram model is too large for CRAN and it has to be downloaded separately from the repository: <https://github.com/BioGenies/CancerGramModel>. For more information see: Burdukiewicz et al. (2020) <doi:10.3390/pharmaceutics12111045>.
Maintained by Michal Burdukiewicz. Last updated 4 years ago.
anticancer-peptidesbioinformaticsk-mern-grampeptide-identificationrandom-forests
7.5 match 4 stars 3.90 score 3 scripts
cvxgrp
CVXR:Disciplined Convex Optimization
An object-oriented modeling language for disciplined convex programming (DCP) as described in Fu, Narasimhan, and Boyd (2020, <doi:10.18637/jss.v094.i14>). It allows the user to formulate convex optimization problems in a natural way following mathematical convention and DCP rules. The system analyzes the problem, verifies its convexity, converts it into a canonical form, and hands it off to an appropriate solver to obtain the solution. Interfaces to solvers on CRAN and elsewhere are provided, both commercial and open source.
Maintained by Anqi Fu. Last updated 4 months ago.
2.3 match 207 stars 12.89 score 768 scripts 51 dependents
dlcarl
TSCI:Tools for Causal Inference with Possibly Invalid Instrumental Variables
Two stage curvature identification with machine learning for causal inference in settings when instrumental variable regression is not suitable because of potentially invalid instrumental variables. Based on Guo and Buehlmann (2022) "Two Stage Curvature Identification with Machine Learning: Causal Inference with Possibly Invalid Instrumental Variables" <arXiv:2203.12808>. The vignette is available in Carl, Emmenegger, Bühlmann and Guo (2023) "TSCI: two stage curvature identification for causal inference with invalid instruments" <arXiv:2304.00513>.
Maintained by David Carl. Last updated 1 years ago.
9.6 match 1 stars 3.00 score 3 scripts
bioc
MethPed:A DNA methylation classifier tool for the identification of pediatric brain tumor subtypes
Classification of pediatric tumors into biologically defined subtypes is challenging and multifaceted approaches are needed. For this aim, we developed a diagnostic classifier based on DNA methylation profiles. We offer MethPed as an easy-to-use toolbox that allows researchers and clinical diagnosticians to test single samples as well as large cohorts for subclass prediction of pediatric brain tumors. The current version of MethPed can classify the following tumor diagnoses/subgroups: Diffuse Intrinsic Pontine Glioma (DIPG), Ependymoma, Embryonal tumors with multilayered rosettes (ETMR), Glioblastoma (GBM), Medulloblastoma (MB) - Group 3 (MB_Gr3), Group 4 (MB_Gr4), Group WNT (MB_WNT), Group SHH (MB_SHH) and Pilocytic Astrocytoma (PiloAstro).
Maintained by Helena Carén. Last updated 5 months ago.
immunooncologydnamethylationclassificationepigenetics
7.2 match 4.00 score 1 scripts
ropengov
hetu:Structural Handling of Finnish Personal Identity Codes
Structural handling of Finnish identity codes (natural persons and organizations); extract information, check ID validity and diagnostics.
Maintained by Pyry Kantanen. Last updated 4 months ago.
5.9 match 2 stars 4.86 score 18 scripts
bioc
LOBSTAHS:Lipid and Oxylipin Biomarker Screening through Adduct Hierarchy Sequences
LOBSTAHS is a multifunction package for screening, annotation, and putative identification of mass spectral features in large, HPLC-MS lipid datasets. In silico data for a wide range of lipids, oxidized lipids, and oxylipins can be generated from user-supplied structural criteria with a database generation function. LOBSTAHS then applies these databases to assign putative compound identities to features in any high-mass accuracy dataset that has been processed using xcms and CAMERA. Users can then apply a series of orthogonal screening criteria based on adduct ion formation patterns, chromatographic retention time, and other properties, to evaluate and assign confidence scores to this list of preliminary assignments. During the screening routine, LOBSTAHS rejects assignments that do not meet the specified criteria, identifies potential isomers and isobars, and assigns a variety of annotation codes to assist the user in evaluating the accuracy of each assignment.
Maintained by Henry Holm. Last updated 5 months ago.
immunooncologymassspectrometrymetabolomicslipidomicsdataimportadductalgaebioconductorhplc-esi-mslipidmass-spectrometryoxidative-stress-biomarkersoxidized-lipidsoxylipinsplankton
4.3 match 8 stars 6.56 score 9 scripts
conradwasko
hydroEvents:Extract Event Statistics in Hydrologic Time Series
Events from individual hydrologic time series are extracted, and events from multiple time series can be matched to each other. Tang, W. & Carey, S. K. (2017) <doi:10.1002/hyp.11185>. Kaur, S., Horne, A., Stewardson, M.J., Nathan, R., Costa, A.M., Szemis, J.M., & Webb, J.A. (2017) <doi:10.1080/24705357.2016.1276418>. Ladson, A., Brown, R., Neal, B., & Nathan, R. J. (2013) <doi:10.7158/W12-028.2013.17.1>.
Maintained by Conrad Wasko. Last updated 1 months ago.
7.0 match 6 stars 4.03 score 36 scripts
bioc
nnSVG:Scalable identification of spatially variable genes in spatially-resolved transcriptomics data
Method for scalable identification of spatially variable genes (SVGs) in spatially-resolved transcriptomics data. The method is based on nearest-neighbor Gaussian processes and uses the BRISC algorithm for model fitting and parameter estimation. Allows identification and ranking of SVGs with flexible length scales across a tissue slide or within spatial domains defined by covariates. Scales linearly with the number of spatial locations and can be applied to datasets containing thousands or more spatial locations.
Maintained by Lukas M. Weber. Last updated 20 days ago.
spatialsinglecelltranscriptomicsgeneexpressionpreprocessing
3.6 match 17 stars 7.75 score 183 scripts 1 dependents
bioc
cogeqc:Systematic quality checks on comparative genomics analyses
cogeqc aims to facilitate systematic quality checks on standard comparative genomics analyses to help researchers detect issues and select the most suitable parameters for each data set. cogeqc can be used to assess: i. genome assembly and annotation quality with BUSCOs and comparisons of statistics with publicly available genomes on the NCBI; ii. orthogroup inference using a protein domain-based approach; and iii. synteny detection using synteny network properties. There are also data visualization functions to explore QC summary statistics.
Maintained by Fabrício Almeida-Silva. Last updated 5 months ago.
softwaregenomeassemblycomparativegenomicsfunctionalgenomicsphylogeneticsqualitycontrolnetworkcomparative-genomicsevolutionary-genomics
4.5 match 10 stars 6.08 score 20 scripts
clavellab
maldipickr:Dereplicate and Cherry-Pick Mass Spectrometry Spectra
Convenient wrapper functions for the analysis of matrix-assisted laser desorption/ionization-time-of-flight (MALDI-TOF) spectra data in order to select only representative spectra (also called cherry-pick). The package covers the preprocessing and dereplication steps (based on Strejcek, Smrhova, Junkova and Uhlik (2018) <doi:10.3389/fmicb.2018.01294>) needed to cluster MALDI-TOF spectra before the final cherry-picking step. It enables the easy exclusion of spectra and/or clusters to accommodate complex cherry-picking strategies. Alternatively, cherry-picking using taxonomic identification MALDI-TOF data is made easy with functions to import inconsistently formatted reports.
Maintained by Charlie Pauvert. Last updated 25 days ago.
cherry-pickdereplicationmaldi-tof-ms
5.1 match 2 stars 5.32 score 8 scripts
cran
mpwR:Standardized Comparison of Workflows in Mass Spectrometry-Based Bottom-Up Proteomics
Useful functions to analyze proteomic workflows including number of identifications, data completeness, missed cleavages, quantitative and retention time precision etc. Various software outputs are supported such as 'ProteomeDiscoverer', 'Spectronaut', 'DIA-NN' and 'MaxQuant'.
Maintained by Oliver Kardell. Last updated 1 year ago.
8.3 match 3.30 score
bioc
GARS:GARS: Genetic Algorithm for the identification of Robust Subsets of variables in high-dimensional and challenging datasets
Feature selection aims to identify and remove redundant, irrelevant and noisy variables from high-dimensional datasets. Selecting informative features affects the subsequent classification and regression analyses by improving their overall performance. Several methods have been proposed to perform feature selection: most of them rely on univariate statistics, correlation, entropy measurements or the usage of backward/forward regressions. Herein, we propose an efficient, robust and fast method that adopts stochastic optimization approaches for high-dimensional data. GARS is an innovative implementation of a genetic algorithm that selects robust features in high-dimensional and challenging datasets.
Maintained by Mattia Chiesa. Last updated 5 months ago.
classificationfeatureextractionclusteringopenjdk
5.5 match 5.00 score 2 scripts
ncordon
imbalance:Preprocessing Algorithms for Imbalanced Datasets
Class imbalance usually damages the performance of classifiers. Thus, it is important to treat data before applying a classifier algorithm. This package includes recent resampling algorithms in the literature: (Barua et al. 2014) <doi:10.1109/tkde.2012.232>; (Das et al. 2015) <doi:10.1109/tkde.2014.2324567>, (Zhang et al. 2014) <doi:10.1016/j.inffus.2013.12.003>; (Gao et al. 2014) <doi:10.1016/j.neucom.2014.02.006>; (Almogahed et al. 2014) <doi:10.1007/s00500-014-1484-5>. It also includes a useful interface to perform oversampling.
Maintained by Ignacio Cordón. Last updated 5 years ago.
binary-classificationimbalanced-dataoversamplingopenblascpp
3.8 match 36 stars 7.14 score 98 scripts
bioc
GraphPAC:Identification of Mutational Clusters in Proteins via a Graph Theoretical Approach.
Identifies mutational clusters of amino acids in a protein while utilizing the protein's tertiary structure via a graph theoretical model.
Maintained by Gregory Ryslik. Last updated 3 days ago.
5.7 match 4.65 score 1 scripts 1 dependents
kjhealy
gssrdoc:Document General Social Survey Variable
The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS see <http://gss.norc.org>.
Maintained by Kieran Healy. Last updated 11 months ago.
11.4 match 2.28 score 38 scripts
gabrielelubatti
MitoHEAR:Quantification of Mitochondrial DNA Heteroplasmy
R package that allows the estimation and downstream statistical analysis of the mitochondrial DNA Heteroplasmy calculated from single-cell datasets.
Maintained by Gabriele Lubatti. Last updated 3 years ago.
5.8 match 4.45 score 14 scripts
magnusdv
forrel:Forensic Pedigree Analysis and Relatedness Inference
Forensic applications of pedigree analysis, including likelihood ratios for relationship testing, general relatedness inference, marker simulation, and power analysis. 'forrel' is part of the 'pedsuite', a collection of packages for pedigree analysis, further described in the book 'Pedigree Analysis in R' (Vigeland, 2021, ISBN:9780128244302). Several functions deal specifically with power analysis in missing person cases, implementing methods described in Vigeland et al. (2020) <doi:10.1016/j.fsigen.2020.102376>. Data import from the 'Familias' software (Egeland et al. (2000) <doi:10.1016/S0379-0738(00)00147-X>) is supported through the 'pedFamilias' package.
Maintained by Magnus Dehli Vigeland. Last updated 6 days ago.
3.6 match 11 stars 6.98 score 63 scripts 7 dependents
rfael0cm
RTIGER:HMM-Based Model for Genotyping and Cross-Over Identification
Our method integrates information from all sequenced samples, thus avoiding loss of alleles due to low coverage. Moreover, it increases the statistical power to uncover sequencing or alignment errors <doi:10.1093/plphys/kiad191>.
Maintained by Rafael Campos-Martin. Last updated 1 year ago.
genomeannotationhiddenmarkovmodelsequencing
5.8 match 4 stars 4.30 score 5 scripts
petertuwien
mvoutlier:Multivariate Outlier Detection Based on Robust Methods
Various methods for multivariate outlier detection: arw, a Mahalanobis-type method with an adaptive outlier cutoff value; locout, a method incorporating local neighborhood; pcout, a method for high-dimensional data; mvoutlier.CoDa, a method for compositional data. References are provided in the corresponding help files.
Maintained by P. Filzmoser. Last updated 4 years ago.
5.1 match 1 stars 4.84 score 294 scripts 4 dependents
jackmwolf
tehtuner:Fit and Tune Models to Detect Treatment Effect Heterogeneity
Implements methods to fit Virtual Twins models (Foster et al. (2011) <doi:10.1002/sim.4322>) for identifying subgroups with differential effects in the context of clinical trials while controlling the probability of falsely detecting a differential effect when the conditional average treatment effect is uniform across the study population using parameter selection methods proposed in Wolf et al. (2022) <doi:10.1177/17407745221095855>.
Maintained by Jack Wolf. Last updated 2 years ago.
clinical-trialsheterogeneity-of-treatment-effectsubgroup-identification
7.5 match 4 stars 3.30 score 6 scripts
bioc
CARNIVAL:A CAusal Reasoning tool for Network Identification (from gene expression data) using Integer VALue programming
An upgraded causal reasoning tool from Melas et al. in R with updated assignments of TFs' weights from PROGENy scores. Optimization parameters can be freely adjusted and multiple solutions can be obtained and aggregated.
Maintained by Attila Gabor. Last updated 5 months ago.
transcriptomicsgeneexpressionnetworkcausal-modelsfootprintsinteger-linear-programmingpathway-enrichment-analysis
2.7 match 57 stars 9.03 score 90 scripts 1 dependents
bruigtp
REDCapDM:'REDCap' Data Management
REDCap Data Management - REDCapDM is an R package that allows users to manage data exported directly from REDCap or using an API connection. This package includes several functions designed for pre-processing data, generating reports of queries such as outliers or missing values, and following up on the identified queries. 'REDCap' (Research Electronic Data CAPture; <https://projectredcap.org>) is a web application developed at Vanderbilt University, designed for creating and managing online surveys and databases. The REDCap API is an interface that allows external applications to connect to REDCap remotely, and is used to programmatically retrieve or modify project data or settings within REDCap, such as importing or exporting data.
Maintained by João Carmezim. Last updated 3 days ago.
4.1 match 4 stars 5.89 score 9 scripts
bioc
mixOmics:Omics Data Integration Project
Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non-exhaustive list of methods includes variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.
Maintained by Eva Hamrud. Last updated 5 days ago.
immunooncologymicroarraysequencingmetabolomicsmetagenomicsproteomicsgenepredictionmultiplecomparisonclassificationregressionbioconductorgenomicsgenomics-datagenomics-visualizationmultivariate-analysismultivariate-statisticsomicsr-pkgr-project
1.8 match 182 stars 13.71 score 1.3k scripts 22 dependents
avrodrigues
naturaList:Classify Occurrences by Confidence Levels in the Species ID
Classify occurrence records based on confidence levels of species identification. In addition, implement tools to filter occurrences inside grid cells and to manually check for possible errors with an interactive shiny application.
Maintained by Arthur Vinicius Rodrigues. Last updated 1 year ago.
5.1 match 4.66 score 23 scripts
bioc
Spectra:Spectra Infrastructure for Mass Spectrometry Data
The Spectra package defines an efficient infrastructure for storing and handling mass spectrometry spectra and functionality to subset, process, visualize and compare spectra data. It provides different implementations (backends) to store mass spectrometry data. These comprise backends tuned for fast data access and processing and backends for very large data sets ensuring a small memory footprint.
Maintained by RforMassSpectrometry Package Maintainer. Last updated 11 days ago.
infrastructureproteomicsmassspectrometrymetabolomicsbioconductorhacktoberfestmass-spectrometry
1.8 match 41 stars 13.01 score 254 scripts 35 dependents
bioc
planttfhunter:Identification and classification of plant transcription factors
planttfhunter is used to identify plant transcription factors (TFs) from protein sequence data and classify them into families and subfamilies using the classification scheme implemented in PlantTFDB. TFs are identified using pre-built hidden Markov model profiles for DNA-binding domains. Then, auxiliary and forbidden domains are used with DNA-binding domains to classify TFs into families and subfamilies (when applicable). Currently, TFs can be classified in 58 different TF families/subfamilies.
Maintained by Fabrício Almeida-Silva. Last updated 5 months ago.
softwaretranscriptionfunctionalpredictiongenomeannotationfunctionalgenomicshiddenmarkovmodelsequencingclassificationfunctional-genomicsgene-familieshidden-markov-modelsplant-genomicsplantsprotein-domainstranscription-factors
5.8 match 4.00 score 5 scripts
hanjunwei-lab
SMDIC:Identification of Somatic Mutation-Driven Immune Cells
A computing tool developed to automatically identify somatic mutation-driven immune cells. The operation modes include: i) inferring the relative abundance matrix of tumor-infiltrating immune cells and integrating it with a particular gene mutation status, ii) detecting differential immune cells with respect to the gene mutation status and converting the abundance matrix of significant differential immune cells into two binary matrices (one for up-regulated and one for down-regulated), iii) identifying somatic mutation-driven immune cells by comparing the gene mutation status with each immune cell in the binary matrices across all samples, and iv) visualization of immune cell abundance of samples in different mutation status.
Maintained by Junwei Han. Last updated 5 months ago.
5.8 match 2 stars 4.00 score 5 scripts
bioc
BindingSiteFinder:Binding site definition based on iCLIP data
Precise knowledge of the binding sites of an RNA-binding protein (RBP) is key to understanding (post-) transcriptional regulatory processes. Here we present a workflow that describes how exact binding sites can be defined from iCLIP data. The package provides functions for binding site definition and result visualization. For details please see the vignette.
Maintained by Mirko Brüggemann. Last updated 1 day ago.
sequencinggeneexpressiongeneregulationfunctionalgenomicscoveragedataimportbinding-site-classificationbinding-sitesbioconductor-packageicliprna-binding-proteins
4.0 match 6 stars 5.73 score 3 scripts
hristostyr
scoringfunctions:A Collection of Loss Functions for Assessing Point Forecasts
Implements multiple consistent scoring functions (Gneiting T (2011) <doi:10.1198/jasa.2011.r10138>) for assessing point forecasts and point predictions. Detailed documentation of scoring functions' properties is included for facilitating interpretation of results.
Maintained by Hristos Tyralis. Last updated 14 days ago.
15.3 match 1 stars 1.48 score
rvlenth
emmeans:Estimated Marginal Means, aka Least-Squares Means
Obtain estimated marginal means (EMMs) for many linear, generalized linear, and mixed models. Compute contrasts or linear functions of EMMs, trends, and comparisons of slopes. Plots and other displays. Least-squares means are discussed, and the term "estimated marginal means" is suggested, in Searle, Speed, and Milliken (1980) Population marginal means in the linear model: An alternative to least squares means, The American Statistician 34(4), 216-221 <doi:10.1080/00031305.1980.10483031>.
Maintained by Russell V. Lenth. Last updated 4 days ago.
1.2 match 377 stars 19.19 score 13k scripts 187 dependents
bioc
adductomicsR:Processing of adductomic mass spectral datasets
Processes MS2 data to identify potentially adducted peptides from spectra that have been corrected for mass drift and retention time drift and quantifies MS1 level mass spectral peaks.
Maintained by Josie Hayes. Last updated 5 months ago.
massspectrometrymetabolomicssoftwarethirdpartyclientdataimportgui
5.6 match 1 stars 4.00 score 5 scripts
santikka
cfid:Identification of Counterfactual Queries in Causal Models
Facilitates the identification of counterfactual queries in structural causal models via the ID* and IDC* algorithms by Shpitser, I. and Pearl, J. (2007, 2008) <arXiv:1206.5294>, <https://jmlr.org/papers/v9/shpitser08a.html>. Provides a simple interface for defining causal diagrams and counterfactual conjunctions. Construction of parallel worlds graphs and counterfactual graphs is carried out automatically based on the counterfactual query and the causal diagram. See Tikka, S. (2023) <doi:10.32614/RJ-2023-053> for a tutorial of the package.
Maintained by Santtu Tikka. Last updated 8 months ago.
causal-inferencecausal-modelscausality-algorithmscounterfactualcounterfactualsdirected-acyclic-graphidentifiability
5.5 match 7 stars 4.02 score 2 scripts 1 dependents
config-i1
greybox:Toolbox for Model Building and Forecasting
Implements functions and instruments for regression model building and its application to forecasting. The main scope of the package is in variable selection and model specification for cases of time series data. This includes promotional modelling, selection between different dynamic regressions with non-standard distributions of errors, selection based on cross validation, solutions to the fat regression model problem and more. Models developed in the package are tailored specifically for forecasting purposes. As a result, there are several methods that allow producing forecasts from these models and visualising them.
Maintained by Ivan Svetunkov. Last updated 3 days ago.
forecastingmodel-selectionmodel-selection-and-evaluationregressionregression-modelsstatisticscpp
2.0 match 30 stars 11.03 score 97 scripts 34 dependents
markmfredrickson
optmatch:Functions for Optimal Matching
Distance based bipartite matching using minimum cost flow, oriented to matching of treatment and control groups in observational studies ('Hansen' and 'Klopfer' 2006 <doi:10.1198/106186006X137047>). Routines are provided to generate distances from generalised linear models (propensity score matching), formulas giving variables on which to limit matched distances, stratified or exact matching directives, or calipers, alone or in combination.
Maintained by Josh Errickson. Last updated 3 months ago.
1.8 match 47 stars 12.22 score 588 scripts 5 dependents
bioc
EpiDISH:Epigenetic Dissection of Intra-Sample-Heterogeneity
EpiDISH is an R package to infer the proportions of a priori known cell-types present in a sample representing a mixture of such cell-types. Right now, the package can be used on DNAm data of blood-tissue of any age, from birth to old-age, generic epithelial tissue and breast tissue. Besides, the package provides a function that allows the identification of differentially methylated cell-types and their directionality of change in Epigenome-Wide Association Studies.
Maintained by Shijie C. Zheng. Last updated 4 months ago.
dnamethylationmethylationarrayepigeneticsdifferentialmethylationimmunooncology
2.1 match 48 stars 10.28 score 166 scripts 4 dependents
hanjunwei-lab
ProgModule:Identification of Prognosis-Related Mutually Exclusive Modules
A novel tool to identify candidate driver modules for predicting the prognosis of patients by integrating exclusive coverage of mutations with clinical characteristics in cancer.
Maintained by Junwei Han. Last updated 3 months ago.
5.8 match 3.70 score 1 scripts
racdale
sindyr:Sparse Identification of Nonlinear Dynamics
This implements the Brunton et al (2016; PNAS <doi:10.1073/pnas.1517384113>) sparse identification algorithm for finding ordinary differential equations for a measured system from raw data (SINDy). The package includes a set of additional tools for working with raw data, with an emphasis on cognitive science applications (Dale and Bhat, 2018 <doi:10.1016/j.cogsys.2018.06.020>). See <https://github.com/racdale/sindyr> for examples and updates.
Maintained by Rick Dale. Last updated 11 months ago.
5.5 match 15 stars 3.92 score 11 scripts
tim-tu
weibulltools:Statistical Methods for Life Data Analysis
Provides statistical methods and visualizations that are often used in reliability engineering. Comprises a compact and easily accessible set of methods and visualization tools that make the examination and adjustment as well as the analysis and interpretation of field data (and bench tests) as simple as possible. Non-parametric estimators like Median Ranks, Kaplan-Meier (Abernethy, 2006, <ISBN:978-0-9653062-3-2>), Johnson (Johnson, 1964, <ISBN:978-0444403223>), and Nelson-Aalen for failure probability estimation within samples that contain failures as well as censored data are included. The package supports methods like Maximum Likelihood and Rank Regression, (Genschel and Meeker, 2010, <DOI:10.1080/08982112.2010.503447>) for the estimation of multiple parametric lifetime distributions, as well as the computation of confidence intervals of quantiles and probabilities using the delta method related to Fisher's confidence intervals (Meeker and Escobar, 1998, <ISBN:9780471673279>) and the beta-binomial confidence bounds. If desired, mixture model analysis can be done with segmented regression and the EM algorithm. Besides the well-known Weibull analysis, the package also contains Monte Carlo methods for the correction and completion of imprecisely recorded or unknown lifetime characteristics. (Verband der Automobilindustrie e.V. (VDA), 2016, <ISSN:0943-9412>). Plots are created statically ('ggplot2') or interactively ('plotly') and can be customized with functions of the respective visualization package. The graphical technique of probability plotting as well as the addition of regression lines and confidence bounds to existing plots are supported.
Maintained by Tim-Gunnar Hensel. Last updated 2 years ago.
field-data-analysisinteractive-visualizationsplotlyreliability-analysisweibull-analysisweibulltoolsopenblascpp
3.5 match 13 stars 6.15 score 54 scripts
svilsen
STRMPS:Analysis of Short Tandem Repeat (STR) Massively Parallel Sequencing (MPS) Data
Loading, identifying, aggregating, manipulating, and analysing short tandem repeat regions of massively parallel sequencing data in forensic genetics. The analyses and framework implemented in this package rely on the papers of Vilsen et al. (2017) <doi:10.1016/j.fsigen.2017.01.017> and Vilsen et al. (2018) <doi:10.1016/j.fsigen.2018.04.003>. Note that the parallelisation in the package relies on mclapply() and, thus, speed-ups will only be seen on UNIX-based systems.
Maintained by Søren B. Vilsen. Last updated 4 days ago.
biostringspwalignshortreadiranges
5.0 match 4.30 score
bioc
VariantAnnotation:Annotation of Genetic Variants
Annotate variants, compute amino acid coding changes, predict coding outcomes.
Maintained by Bioconductor Package Maintainer. Last updated 2 months ago.
dataimportsequencingsnpannotationgeneticsvariantannotationcurlbzip2xz-utilszlib
1.9 match 11.39 score 1.9k scripts 152 dependents
bdhitt
binGroup2:Identification and Estimation using Group Testing
Methods for the group testing identification problem: 1) Operating characteristics (e.g., expected number of tests) for commonly used hierarchical and array-based algorithms, and 2) Optimal testing configurations for these same algorithms. Methods for the group testing estimation problem: 1) Estimation and inference procedures for an overall prevalence, and 2) Regression modeling for commonly used hierarchical and array-based algorithms.
Maintained by Brianna Hitt. Last updated 1 year ago.
8.6 match 2.48 score 3 scripts 1 dependents
marsicofl
mispitools:Missing Person Identification Tools
An open source software package written in R statistical language. It consists of a set of decision-making tools to conduct missing person searches. Particularly, it allows computing optimal LR threshold for declaring potential matches in DNA-based database search. More recently 'mispitools' incorporates preliminary investigation data based LRs. Statistical weight of different traces of evidence such as biological sex, age and hair color are presented. For citing mispitools please use the following references: Marsico and Caridi, 2023 <doi:10.1016/j.fsigen.2023.102891> and Marsico, Vigeland et al. 2021 <doi:10.1016/j.fsigen.2021.102519>.
Maintained by Franco Marsico. Last updated 3 months ago.
3.1 match 35 stars 6.74 score 19 scripts 1 dependents
ekstroem
dataMaid:A Suite of Checks for Identification of Potential Errors in a Data Frame as Part of the Data Screening Process
Data screening is an important first step of any statistical analysis. dataMaid auto-generates a customizable data report with a thorough summary of the checks and the results that a human can use to identify possible errors. It provides an extendable suite of tests for common potential errors in a dataset.
Maintained by Claus Thorn Ekstrøm. Last updated 3 years ago.
data-cleaningdata-screeningreproducible-research
2.7 match 143 stars 7.53 score 236 scripts
dgrun
RaceID:Identification of Cell Types, Inference of Lineage Trees, and Prediction of Noise Dynamics from Single-Cell RNA-Seq Data
Application of 'RaceID' allows inference of cell types and prediction of lineage trees by the 'StemID2' algorithm (Herman, J.S., Sagar, Grun D. (2018) <DOI:10.1038/nmeth.4662>). 'VarID2' is part of this package and allows quantification of biological gene expression noise at single-cell resolution (Rosales-Alvarez, R.E., Rettkowski, J., Herman, J.S., Dumbovic, G., Cabezas-Wallscheid, N., Grun, D. (2023) <DOI:10.1186/s13059-023-02974-1>).
Maintained by Dominic Grün. Last updated 4 months ago.
4.3 match 4.74 score 110 scripts
okdll
flowTraceR:Tracing Information Flow for Inter-Software Comparisons in Mass Spectrometry-Based Bottom-Up Proteomics
Useful functions to standardize software outputs from ProteomeDiscoverer, Spectronaut, DIA-NN and MaxQuant on precursor, modified peptide and proteingroup level and to trace software differences for identifications such as varying proteingroup denotations for common precursor.
Maintained by Oliver Kardell. Last updated 3 years ago.
3.9 match 3 stars 5.17 score 11 scripts 1 dependents
mreginato
monographaR:Taxonomic Monographs Tools
Contains functions intended to facilitate the production of plant taxonomic monographs. The package includes functions to convert tables into taxonomic descriptions, lists of collectors, examined specimens, identification keys (dichotomous and interactive), and can generate a monograph skeleton. Additionally, wrapper functions to batch the production of phenology histograms and distributional and diversity maps are also available.
Maintained by Marcelo Reginato. Last updated 1 year ago.
4.3 match 3 stars 4.73 score 18 scripts
ropensci
phylotaR:Automated Phylogenetic Sequence Cluster Identification from 'GenBank'
A pipeline for the identification, within taxonomic groups, of orthologous sequence clusters from 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> as the first step in a phylogenetic analysis. The pipeline depends on a local alignment search tool and is, therefore, not dependent on differences in gene naming conventions and naming errors.
Maintained by Shixiang Wang. Last updated 8 months ago.
blastngenbankpeer-reviewedphylogeneticssequence-alignment
3.4 match 23 stars 5.86 score 156 scripts
gertraudmalsinerwalli
telescope:Bayesian Mixtures with an Unknown Number of Components
Fits Bayesian finite mixtures with an unknown number of components using the telescoping sampler and different component distributions. For more details see Frühwirth-Schnatter et al. (2021) <doi:10.1214/21-BA1294>.
Maintained by Gertraud Malsiner-Walli. Last updated 2 months ago.
6.7 match 3.00 score 4 scripts
niuniular
MDDC:Modified Detecting Deviating Cells Algorithm in Pharmacovigilance
Methods for detecting signals related to (adverse event, medical product e.g. drugs, vaccines) pairs, a data generation function for simulating pharmacovigilance datasets, and various utility functions. For more details please see Liu A., Mukhopadhyay R., and Markatou M. <doi:10.48550/arXiv.2410.01168>.
Maintained by Anran Liu. Last updated 5 months ago.
4.4 match 1 stars 4.54 score 4 scripts
bnosac
udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. Namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns) sentiment scoring and semantic similarity analysis.
Maintained by Jan Wijffels. Last updated 2 years ago.
conlldependency-parserlemmatizationnatural-language-processingnlppos-taggingr-pkgrcpptext-miningtokenizerudpipecpp
1.7 match 215 stars 11.83 score 1.2k scripts 9 dependents
sebkrantz
collapse:Advanced and Fast Data Transformation
A C/C++ based package for advanced data transformation and statistical computing in R that is extremely fast, class-agnostic, robust and programmer friendly. Core functionality includes a rich set of S3 generic grouped and weighted statistical functions for vectors, matrices and data frames, which provide efficient low-level vectorizations, OpenMP multithreading, and skip missing values by default. These are integrated with fast grouping and ordering algorithms (also callable from C), and efficient data manipulation functions. The package also provides a flexible and rigorous approach to time series and panel data in R. It further includes fast functions for common statistical procedures, detailed (grouped, weighted) summary statistics, powerful tools to work with nested data, fast data object conversions, functions for memory efficient R programming, and helpers to effectively deal with variable labels, attributes, and missing data. It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf', 'units', 'plm' (panel-series and data frames), and 'xts'/'zoo'.
Maintained by Sebastian Krantz. Last updated 7 days ago.
data-aggregationdata-analysisdata-manipulationdata-processingdata-sciencedata-transformationeconometricshigh-performancepanel-datascientific-computingstatisticstime-seriesweightedweightscppopenmp
1.2 match 672 stars 16.63 score 708 scripts 97 dependents
bioc
MetID:Network-based prioritization of putative metabolite IDs
This package uses an innovative network-based approach that will enhance our ability to determine the identities of significant ions detected by LC-MS.
Maintained by Zhenzhi Li. Last updated 5 months ago.
assaydomainbiologicalquestioninfrastructureresearchfieldstatisticalmethodtechnologyworkflowstepnetworkkegg
3.5 match 1 stars 5.74 score 110 scripts
bioc
MassSpecWavelet:Peak Detection for Mass Spectrometry data using wavelet-based algorithms
Peak Detection in Mass Spectrometry data is one of the important preprocessing steps. The performance of peak detection affects subsequent processes, including protein identification, profile alignment and biomarker identification. Using Continuous Wavelet Transform (CWT), this package provides a reliable algorithm for peak detection that does not require any type of smoothing or previous baseline correction method, providing more consistent results for different spectra. See <doi:10.1093/bioinformatics/btl355> for further details.
Maintained by Sergio Oller Moreno. Last updated 3 months ago.
immunooncologymassspectrometryproteomicspeakdetection
2.1 match 9 stars 9.38 score 37 scripts 17 dependents
bioc
Cepo:Cepo for the identification of differentially stable genes
Defining the identity of a cell is fundamental to understand the heterogeneity of cells to various environmental signals and perturbations. We present Cepo, a new method to explore cell identities from single-cell RNA-sequencing data using differential stability as a new metric to define cell identity genes. Cepo computes cell-type specific gene statistics pertaining to differential stable gene expression.
Maintained by Hani Jieun Kim. Last updated 5 months ago.
classificationgeneexpressionsinglecellsoftwaresequencingdifferentialexpression
4.3 match 4.62 score 14 scripts 1 dependentsbioc
GSVA:Gene Set Variation Analysis for Microarray and RNA-Seq Data
Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment through the samples of an expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNV-pathway analysis or cross-tissue pathway analysis, in a pathway-centric manner.
Maintained by Robert Castelo. Last updated 6 days ago.
functionalgenomicsmicroarrayrnaseqpathwaysgenesetenrichmentgene-set-enrichmentgenomicspathway-enrichment-analysis
1.3 match 210 stars 14.72 score 1.6k scripts 19 dependentsr-forge
mlogit:Multinomial Logit Models
Maximum Likelihood estimation of random utility discrete choice models, as described in Kenneth Train (2009) Discrete Choice Methods with Simulations <doi:10.1017/CBO9780511805271>.
Maintained by Yves Croissant. Last updated 5 years ago.
2.0 match 9.81 score 1.2k scripts 14 dependentstomaskrehlik
frequencyConnectedness:Spectral Decomposition of Connectedness Measures
Accompanies a paper (Barunik, Krehlik (2018) <doi:10.1093/jjfinec/nby001>) dedicated to spectral decomposition of connectedness measures and their interpretation. We implement all the developed estimators as well as the historical counterparts. For more information, see the help or the GitHub page (<https://github.com/tomaskrehlik/frequencyConnectedness>).
Maintained by Tomas Krehlik. Last updated 2 years ago.
3.3 match 100 stars 5.88 score 50 scripts 1 dependentstombeesley
eyetools:Analyse Eye Data
Enables the automation of actions across the pipeline, including initial steps of transforming binocular data and gap repair to event-based processing such as fixations, saccades, and entry/duration in Areas of Interest (AOIs). It also offers visualisation of eye movement and AOI entries. These tools take relatively raw (trial, time, x, and y form) data and can be used to return fixations, saccades, and AOI entries and time spent in AOIs. As the tools rely on this basic data format, the functions can work with data from any eye tracking device. Implements fixation and saccade detection using methods proposed by Salvucci and Goldberg (2000) <doi:10.1145/355017.355028>.
Maintained by Tom Beesley. Last updated 3 months ago.
areas-of-interestattention-visualizationcognitive-sciencedwell-time-algorithmeye-trackereye-trackingeyetrackingggplot2psychologypsychology-experimentssaccadestobiitobii-eye-trackervisualization
3.6 match 4 stars 5.45 score 8 scriptsbioc
maftools:Summarize, Analyze and Visualize MAF Files
Analyze and visualize Mutation Annotation Format (MAF) files from large scale sequencing studies. This package provides various functions to perform most commonly used analyses in cancer genomics and to create feature-rich customizable visualizations with minimal effort.
Maintained by Anand Mayakonda. Last updated 5 months ago.
datarepresentationdnaseqvisualizationdrivermutationvariantannotationfeatureextractionclassificationsomaticmutationsequencingfunctionalgenomicssurvivalbioinformaticscancer-genome-atlascancer-genomicsgenomicsmaf-filestcgacurlbzip2xz-utilszlib
1.3 match 459 stars 14.63 score 948 scripts 18 dependentshan-siyu
LncFinder:LncRNA Identification and Analysis Using Heterologous Features
Long non-coding RNAs identification and analysis. Default models are trained with human, mouse and wheat datasets by employing SVM. Features are based on intrinsic composition of sequence, EIIP value (electron-ion interaction pseudopotential), and secondary structure. This package can also extract other classic features and build new classifiers. Reference: Han S., et al. (2019) <doi:10.1093/bib/bby065>.
Maintained by Siyu Han. Last updated 6 months ago.
5.2 match 2 stars 3.68 score 53 scriptsolechnwin
DIME:Differential Identification using Mixture Ensemble
A robust method for identifying differential binding sites in ChIP-seq (Chromatin Immunoprecipitation Sequencing) data comparing two samples. It considers an ensemble of finite mixture models combined with a local false discovery rate (fdr), allowing for flexible modeling of data. The method, Differential Identification using Mixture Ensemble (DIME), is described in: Taslim et al. (2011) <doi:10.1093/bioinformatics/btr165>.
Maintained by Cenny Taslim. Last updated 3 years ago.
7.3 match 2.63 score 43 scriptsbioc
HEM:Heterogeneous error model for identification of differentially expressed genes under multiple conditions
This package fits heterogeneous error models for analysis of microarray data.
Maintained by HyungJun Cho. Last updated 5 months ago.
microarraydifferentialexpression
4.4 match 4.30 score 6 scriptsbioc
EnMCB:Predicting Disease Progression Based on Methylation Correlated Blocks using Ensemble Models
Creation of the correlated blocks using DNA methylation profiles. Machine learning models can be constructed to predict differentially methylated blocks and disease progression.
Maintained by Xin Yu. Last updated 5 months ago.
normalizationdnamethylationmethylationarraysupportvectormachine
3.6 match 9 stars 5.26 score 2 scriptspaulhendricks
detector:Detect Data Containing Personally Identifiable Information
Allows users to quickly and easily detect data containing Personally Identifiable Information (PII) through convenience functions.
Maintained by Paul Hendricks. Last updated 8 years ago.
3.5 match 15 stars 5.34 score 29 scriptswilsonfreitas
numbersBR:Validate, Compare and Format Identification Numbers from Brazil
Validate, format and compare identification numbers used in Brazil. These numbers are used to identify individuals (CPF), vehicles (RENAVAM), companies (CNPJ), etc. Functions to format, validate and compare these numbers have been implemented in a vectorized way in order to speed up validations and comparisons in big datasets.
Maintained by Wilson Freitas. Last updated 7 years ago.
5.2 match 9 stars 3.65 score 5 scriptsnoaa-nwfsc
zoid:Bayesian Zero-and-One Inflated Dirichlet Regression Modelling
Fits Dirichlet regression and zero-and-one inflated Dirichlet regression with Bayesian methods implemented in Stan. These models are sometimes referred to as trinomial mixture models; covariates and overdispersion can optionally be included.
Maintained by Eric J. Ward. Last updated 12 hours ago.
3.0 match 8 stars 6.19 score 12 scriptsweiliu123
PCLassoReg:Group Regression Models for Risk Protein Complex Identification
Two protein complex-based group regression models (PCLasso and PCLasso2) for risk protein complex identification. PCLasso is a prognostic model that identifies risk protein complexes associated with survival. PCLasso2 is a classification model that identifies risk protein complexes associated with classes. For more information, see Wang and Liu (2021) <doi:10.1093/bib/bbab212>.
Maintained by Wei Liu. Last updated 3 years ago.
5.1 match 1 stars 3.70 score 1 scriptscharvey23
AvInertia:Calculate the Inertial Properties of a Flying Bird
Tools to compute the center of gravity and moment of inertia tensor of any flying bird. The tools function by modeling a bird as a composite structure of simple geometric objects. This requires detailed morphological measurements of bird specimens although those obtained for the associated paper have been included in the package for use. Refer to the vignettes and supplementary material for detailed information on the package function.
Maintained by Christina Harvey. Last updated 3 years ago.
3.8 match 6 stars 5.00 score 33 scriptsbioc
DominoEffect:Identification and Annotation of Protein Hotspot Residues
The functions support identification and annotation of hotspot residues in proteins. These are individual amino acids that accumulate mutations at a much higher rate than their surrounding regions.
Maintained by Marija Buljan. Last updated 5 months ago.
softwaresomaticmutationproteomicssequencematchingalignment
5.3 match 3.48 score 1 scriptsschmidtpk
PointFore:Interpretation of Point Forecasts as State-Dependent Quantiles and Expectiles
Estimate specification models for the state-dependent level of an optimal quantile/expectile forecast. Wald Tests and the test of overidentifying restrictions are implemented. Plotting of the estimated specification model is possible. The package contains two data sets with forecasts and realizations: the daily accumulated precipitation at London, UK from the high-resolution model of the European Centre for Medium-Range Weather Forecasts (ECMWF, <https://www.ecmwf.int/>) and GDP growth Greenbook data by the US Federal Reserve. See Schmidt, Katzfuss and Gneiting (2015) <arXiv:1506.01917> for more details on the identification and estimation of a directive behind a point forecast.
Maintained by Patrick Schmidt. Last updated 4 years ago.
4.1 match 4.48 score 20 scriptsgregorkastner
factorstochvol:Bayesian Estimation of (Sparse) Latent Factor Stochastic Volatility Models
Markov chain Monte Carlo (MCMC) sampler for fully Bayesian estimation of latent factor stochastic volatility models with interweaving <doi:10.1080/10618600.2017.1322091>. Sparsity can be achieved through the usage of Normal-Gamma priors on the factor loading matrix <doi:10.1016/j.jeconom.2018.11.007>.
Maintained by Gregor Kastner. Last updated 1 year ago.
3.9 match 7 stars 4.73 score 17 scripts 1 dependentsmathewchamberlain
SignacX:Cell Type Identification and Discovery from Single Cell Gene Expression Data
An implementation of neural networks trained with flow-sorted gene expression data to classify cellular phenotypes in single cell RNA-sequencing data. See Chamberlain M et al. (2021) <doi:10.1101/2021.02.01.429207> for more details.
Maintained by Mathew Chamberlain. Last updated 2 years ago.
cellular-phenotypesseuratsingle-cell-rna-seq
2.8 match 24 stars 6.46 score 34 scriptsthiyangt
denguedatahub:A Tidy Format Datasets of Dengue by Country
Provides a weekly, monthly, yearly summary of dengue cases by state/ province/ country.
Maintained by Thiyanga S. Talagala. Last updated 1 month ago.
3.5 match 11 stars 5.12 score 34 scriptslaresbernardo
lares:Analytics & Machine Learning Sidekick
Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of family functions, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scrapper, it helps the analyst or data scientist to get quick and robust results, without the need of repetitive coding or advanced R programming skills.
Maintained by Bernardo Lares. Last updated 25 days ago.
analyticsapiautomationautomldata-sciencedescriptive-statisticsh2omachine-learningmarketingmmmpredictive-modelingpuzzlerlanguagerobynvisualization
1.8 match 233 stars 9.84 score 185 scripts 1 dependentsbioc
coMethDMR:Accurate identification of co-methylated and differentially methylated regions in epigenome-wide association studies
coMethDMR identifies genomic regions associated with continuous phenotypes by optimally leveraging covariations among CpGs within predefined genomic regions. Instead of testing all CpGs within a genomic region, coMethDMR carries out an additional step that selects co-methylated sub-regions first without using any outcome information. Next, coMethDMR tests association between methylation within the sub-region and continuous phenotype using a random coefficient mixed effects model, which models both variations between CpG sites within the region and differential methylation simultaneously.
Maintained by Fernanda Veitzman. Last updated 5 months ago.
dnamethylationepigeneticsmethylationarraydifferentialmethylationgenomewideassociation
2.7 match 7 stars 6.47 score 42 scriptsbart1
move:Visualizing and Analyzing Animal Track Data
Contains functions to access movement data stored in 'movebank.org' as well as tools to visualize and statistically analyze animal movement data, including functions to calculate dynamic Brownian Bridge Movement Models. Move helps address movement ecology questions.
Maintained by Bart Kranstauber. Last updated 4 months ago.
2.0 match 8.74 score 690 scripts 3 dependentsr-forge
cardidates:Identification of Cardinal Dates in Ecological Time Series
Identification of cardinal dates (begin, time of maximum, end of mass developments) in ecological time series using fitted Weibull functions.
Maintained by Thomas Petzoldt. Last updated 1 year ago.
5.2 match 3.34 score 22 scriptscran
crosstalkr:Analysis of Graph-Structured Data with a Focus on Protein-Protein Interaction Networks
Provides a general toolkit for drug target identification. We include functionality to reduce large graphs to subgraphs and prioritize nodes. In addition to being optimized for use with generic graphs, we also provide support for analyzing protein-protein interaction networks from online repositories. For more details on the core method, refer to Weaver et al. (2021) <https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008755>.
Maintained by Davis Weaver. Last updated 10 months ago.
6.4 match 2.70 scoreilapros
rnrfa:UK National River Flow Archive Data from R
Utility functions to retrieve data from the UK National River Flow Archive (<https://nrfa.ceh.ac.uk/>, terms and conditions: <https://nrfa.ceh.ac.uk/costs-terms-and-conditions>). The package contains R wrappers to the UK NRFA data temporary-API. There are functions to retrieve stations falling in a bounding box, to generate a map, and to extract time series and general information. The package is fully described in Vitolo et al (2016) "rnrfa: An R package to Retrieve, Filter and Visualize Data from the UK National River Flow Archive" <https://journal.r-project.org/archive/2016/RJ-2016-036/RJ-2016-036.pdf>.
Maintained by Ilaria Prosdocimi. Last updated 9 months ago.
3.0 match 2 stars 5.71 score 51 scriptsbioc
SurfR:Surface Protein Prediction and Identification
Identify Surface Protein coding genes from a list of candidates. Systematically download data from GEO and TCGA or use your own data. Perform DGE on bulk RNAseq data. Perform Meta-analysis. Descriptive enrichment analysis and plots.
Maintained by Aurora Maurizio. Last updated 3 days ago.
softwaresequencingrnaseqgeneexpressiontranscriptiondifferentialexpressionprincipalcomponentgenesetenrichmentpathwaysbatcheffectfunctionalgenomicsvisualizationdataimportfunctionalpredictiongenepredictiongodgeenrichment-analysismetaanalysisplotsproteinspublic-datasurfacesurfaceome
3.1 match 3 stars 5.43 score 3 scriptsbioc
PIPETS:Poisson Identification of PEaks from Term-Seq data
PIPETS provides statistically robust analysis for 3'-seq/term-seq data. It utilizes a sliding window approach to apply a Poisson Distribution test to identify genomic positions with termination read coverage that is significantly higher than the surrounding signal. PIPETS then condenses proximal signal and produces strand specific results that contain all significant termination peaks.
Maintained by Quinlan Furumo. Last updated 5 months ago.
sequencingtranscriptiongeneregulationpeakdetectiongeneticstranscriptomicscoverage
4.5 match 3.78 score 2 scriptsbioc
TrIdent:TrIdent - Transduction Identification
The `TrIdent` R package automates the analysis of transductomics data by detecting, classifying, and characterizing read coverage patterns associated with potential transduction events. Transductomics is a DNA sequencing-based method for the detection and characterization of transduction events in pure cultures and complex communities. Transductomics relies on mapping sequencing reads from a viral-like particle (VLP)-fraction of a sample to contigs assembled from the metagenome (whole-community) of the same sample. Reads from bacterial DNA carried by VLPs will map back to the bacterial contigs of origin creating read coverage patterns indicative of ongoing transduction.
Maintained by Jessie Maier. Last updated 14 days ago.
coveragemetagenomicspatternlogicclassificationsequencingbacteriophagehorizontal-gene-transferpattern-matchingphagesequencing-coveragetransductiontransductomicsvirus-like-particle
3.3 match 2 stars 5.04 score 7 scriptshanase
BMA:Bayesian Model Averaging
Package for Bayesian model averaging and variable selection for linear models, generalized linear models and survival models (Cox regression).
Maintained by Hana Sevcikova. Last updated 2 months ago.
1.8 match 38 stars 9.40 score 152 scripts 14 dependentszhangab2008
BarcodingR:Species Identification using DNA Barcodes
To perform species identification using DNA barcodes.
Maintained by Ai-bing ZHANG. Last updated 5 years ago.
11.5 match 1 stars 1.41 score 26 scriptscran
hsphase:Phasing, Pedigree Reconstruction, Sire Imputation and Recombination Events Identification of Half-sib Families Using SNP Data
Identification of recombination events, haplotype reconstruction, sire imputation and pedigree reconstruction using half-sib family SNP data.
Maintained by Mohammad Ferdosi. Last updated 1 year ago.
4.8 match 1 stars 3.32 score 1 dependentsikosmidis
brglm:Bias Reduction in Binomial-Response Generalized Linear Models
Fit generalized linear models with binomial responses using either an adjusted-score approach to bias reduction or maximum penalized likelihood where penalization is by Jeffreys invariant prior. These procedures return estimates with improved frequentist properties (bias, mean squared error) that are always finite even in cases where the maximum likelihood estimates are infinite (data separation). Fitting takes place by fitting generalized linear models on iteratively updated pseudo-data. The interface is essentially the same as 'glm'. More flexibility is provided by the fact that custom pseudo-data representations can be specified and used for model fitting. Functions are provided for the construction of confidence intervals for the reduced-bias estimates.
Maintained by Ioannis Kosmidis. Last updated 4 years ago.
2.3 match 6 stars 7.14 score 86 scripts 11 dependentshanjunwei-lab
ICDS:Identification of Cancer Dysfunctional Subpathway with Omics Data
Identify Cancer Dysfunctional Sub-pathway by integrating gene expression, DNA methylation and copy number variation, and pathway topological information. 1) We first calculate the gene risk scores by integrating three kinds of data: DNA methylation, copy number variation, and gene expression. 2) Second, we perform a greedy search algorithm to identify the key dysfunctional sub-pathways within the pathways for which the discriminative scores were locally maximal. 3) Finally, a permutation test is used to calculate the statistical significance level for these key dysfunctional sub-pathways.
Maintained by Junwei Han. Last updated 8 months ago.
4.5 match 3.54 score 3 scriptstrinker
wakefield:Generate Random Data Sets
Generates random data sets including: data.frames, lists, and vectors.
Maintained by Tyler Rinker. Last updated 5 years ago.
2.3 match 256 stars 7.13 score 209 scriptseddelbuettel
RcppUUID:Generating Universally Unique Identificators
Using the efficient implementation in the Boost C++ library, functions are provided to generate vectors of 'Universally Unique Identifiers (UUID)' from R supporting random (version 4), name (version 5) and time (version 7) 'UUIDs'. The initial repository was at <https://gitlab.com/artemklevtsov/rcppuuid>.
Maintained by Dirk Eddelbuettel. Last updated 1 month ago.
5.0 match 1 stars 3.18 score 1 scriptscran
ips:Interfaces to Phylogenetic Software in R
Functions that wrap popular phylogenetic software for sequence alignment, masking of sequence alignments, and estimation of phylogenies and ancestral character states.
Maintained by Christoph Heibl. Last updated 11 months ago.
3.7 match 4.28 score 128 scripts 1 dependentsbioc
EBarrays:Unified Approach for Simultaneous Gene Clustering and Differential Expression Identification
EBarrays provides tools for the analysis of replicated/unreplicated microarray data.
Maintained by Ming Yuan. Last updated 5 months ago.
clusteringdifferentialexpression
2.8 match 5.56 score 5 scripts 6 dependentsbioc
VegaMC:VegaMC: A Package Implementing a Variational Piecewise Smooth Model for Identification of Driver Chromosomal Imbalances in Cancer
This package enables the detection of driver chromosomal imbalances including loss of heterozygosity (LOH) from array comparative genomic hybridization (aCGH) data. VegaMC performs a joint segmentation of a dataset and uses a statistical framework to distinguish between driver and passenger mutations. VegaMC has been implemented so that it can be immediately integrated with the output produced by the PennCNV tool. In addition, VegaMC produces as output two web pages that allow rapid navigation between the detected regions and the altered genes. The web page that summarizes the altered genes reports the link to the respective Ensembl gene web page.
Maintained by Sandro Morganella. Last updated 5 months ago.
4.3 match 3.60 score 1 scriptsbioc
diffcoexp:Differential Co-expression Analysis
A tool for the identification of differentially coexpressed links (DCLs) and differentially coexpressed genes (DCGs). DCLs are gene pairs with significantly different correlation coefficients under two conditions. DCGs are genes with significantly more DCLs than by chance.
Maintained by Wenbin Wei. Last updated 5 months ago.
geneexpressiondifferentialexpressiontranscriptionmicroarrayonechanneltwochannelrnaseqsequencingcoverageimmunooncology
2.2 match 15 stars 6.89 score 37 scriptsbioc
GSgalgoR:An Evolutionary Framework for the Identification and Study of Prognostic Gene Expression Signatures in Cancer
A multi-objective optimization algorithm for disease sub-type discovery based on a non-dominated sorting genetic algorithm. The 'Galgo' framework combines the advantages of clustering algorithms for grouping heterogeneous 'omics' data and the searching properties of genetic algorithms for feature selection. The algorithm searches for the optimal number of clusters, considering the features that maximize the survival difference between sub-types while keeping cluster consistency high.
Maintained by Carlos Catania. Last updated 5 months ago.
geneexpressiontranscriptionclusteringclassificationsurvival
2.8 match 15 stars 5.48 score 6 scriptsbioc
mosbi:Molecular Signature identification using Biclustering
This package is an implementation of the biclustering ensemble method MoSBi (Molecular signature Identification from Biclustering). MoSBi provides standardized interfaces for biclustering results and can combine their results with a multi-algorithm ensemble approach to compute robust ensemble biclusters on molecular omics data. This is done by computing similarity networks of biclusters and filtering for overlaps using a custom error model. After that, the Louvain modularity is used to extract bicluster communities from the similarity network, which can then be converted to ensemble biclusters. Additionally, MoSBi includes several network visualization methods to give an intuitive and scalable overview of the results. MoSBi comes with several biclustering algorithms, but can be easily extended to new biclustering algorithms.
Maintained by Tim Daniel Rose. Last updated 5 months ago.
softwarestatisticalmethodclusteringnetworkcpp
3.5 match 4.30 score 8 scriptscran
BioPred:An R Package for Biomarkers Analysis in Precision Medicine
Provides functions for training extreme gradient boosting model using propensity score A-learning and weight-learning methods. For further details, see Liu et al. (2024) <doi:10.1093/bioinformatics/btae592>.
Maintained by Zihuan Liu. Last updated 4 months ago.
5.0 match 3.00 scorebsnatr
tswge:Time Series for Data Science
Accompanies the texts Time Series for Data Science with R by Woodward, Sadler and Robertson & Applied Time Series Analysis with R, 2nd edition by Woodward, Gray, and Elliott. It is helpful for data analysis and for time series instruction.
Maintained by Bivin Sadler. Last updated 2 years ago.
5.5 match 2.70 score 496 scriptsbioc
CiteFuse:CiteFuse: multi-modal analysis of CITE-seq data
The CiteFuse package implements a suite of methods and tools for CITE-seq data from pre-processing to integrative analytics, including doublet detection, network-based modality integration, cell type clustering, differential RNA and protein expression analysis, ADT evaluation, ligand-receptor interaction analysis, and interactive web-based visualisation of the analyses.
Maintained by Yingxin Lin. Last updated 5 months ago.
singlecellgeneexpressionbioinformaticssingle-cellcpp
2.3 match 27 stars 6.59 score 18 scriptsigorrigolon
datazoom.amazonia:Simplify Access to Data from the Amazon Region
Functions to download and treat data regarding the Brazilian Amazon region from a variety of official sources.
Maintained by Igor Rigolon Veiga. Last updated 1 year ago.
3.4 match 4.29 score 15 scriptssunnypig1988
BCSub:A Bayesian Semiparametric Factor Analysis Model for Subtype Identification (Clustering)
Gene expression profiles are commonly utilized to infer disease subtypes and many clustering methods can be adopted for this task. However, existing clustering methods may not perform well when genes are highly correlated and many uninformative genes are included for clustering. To deal with these challenges, we develop a novel clustering method in the Bayesian setting. This method, called BCSub, adopts an innovative semiparametric Bayesian factor analysis model to reduce the dimension of the data to a few factor scores for clustering. Specifically, the factor scores are assumed to follow the Dirichlet process mixture model in order to induce clustering.
Maintained by Jiehuan Sun. Last updated 8 years ago.
7.3 match 2.00 score 2 scriptstraminer
TraMineR:Trajectory Miner: a Sequence Analysis Toolkit
Set of sequence analysis tools for manipulating, describing and rendering categorical sequences, and more generally mining sequence data in the field of social sciences. Although this sequence analysis package is primarily intended for state or event sequences that describe time use or life courses such as family formation histories or professional careers, its features also apply to many other kinds of categorical sequence data. It accepts many different sequence representations as input and provides tools for converting sequences from one format to another. It offers several functions for describing and rendering sequences, for computing distances between sequences with different metrics (among which optimal matching), original dissimilarity-based analysis tools, and functions for extracting the most frequent event subsequences and identifying the most discriminating ones among them. A user's guide can be found on the TraMineR web page.
Maintained by Gilbert Ritschard. Last updated 3 months ago.
1.8 match 11 stars 8.24 score 534 scripts 13 dependentscastleli
scBSP:A Fast Tool for Single-Cell Spatially Variable Genes Identifications on Large-Scale Data
Identifying spatially variable genes is critical in linking molecular cell functions with tissue phenotypes. This package utilizes a granularity-based dimension-agnostic tool, single-cell big-small patch (scBSP), implementing sparse matrix operation and KD tree methods for distance calculation, for the identification of spatially variable genes on large-scale data. A detailed description of this method is available in Wang, J. and Li, J. et al. (2023) <doi:10.1038/s41467-023-43256-5>.
Maintained by Jinpu Li. Last updated 1 month ago.
3.2 match 18 stars 4.43 score 2 scriptsbioc
PhyloProfile:PhyloProfile
PhyloProfile is a tool for exploring complex phylogenetic profiles. Phylogenetic profiles, presence/absence patterns of genes over a set of species, are commonly used to trace the functional and evolutionary history of genes across species and time. With PhyloProfile we can enrich regular phylogenetic profiles with further data like sequence/structure similarity, to make phylogenetic profiling more meaningful. Besides the interactive visualisation powered by R-Shiny, the package offers a set of further analysis features to gain insights like the gene age estimation or core gene identification.
Maintained by Vinh Tran. Last updated 8 days ago.
softwarevisualizationdatarepresentationmultiplecomparisonfunctionalpredictiondimensionreductionbioinformaticsheatmapinteractive-visualizationsorthologsphylogenetic-profileshiny
1.8 match 33 stars 7.77 score 10 scriptsbioc
cTRAP:Identification of candidate causal perturbations from differential gene expression data
Compare differential gene expression results with those from known cellular perturbations (such as gene knock-down, overexpression or small molecules) derived from the Connectivity Map. Such analyses allow not only to infer the molecular causes of the observed difference in gene expression but also to identify small molecules that could drive or revert specific transcriptomic alterations.
Maintained by Nuno Saraiva-Agostinho. Last updated 5 months ago.
differentialexpressiongeneexpressionrnaseqtranscriptomicspathwaysimmunooncologygenesetenrichmentbioconductorbioinformaticscmapgene-expressionl1000
2.8 match 5 stars 5.08 score 16 scriptsbioc
Uniquorn:Identification of cancer cell lines based on their weighted mutational/ variational fingerprint
'Uniquorn' enables users to identify cancer cell lines. Cancer cell line misidentification and cross-contamination represent a significant challenge for cancer researchers. Identification is vital and, in this package, is based on the locations/loci of somatic and germline mutations/variations. The input format is vcf/vcf.gz and the files have to contain a single cancer cell line sample (i.e. a single member/genotype/gt column in the vcf file).
Maintained by Raik Otto. Last updated 5 months ago.
immunooncologystatisticalmethodwholegenomeexomeseq
3.3 match 4.30 scorebioc
logicFS:Identification of SNP Interactions
Identification of interactions between binary variables using Logic Regression. Can, e.g., be used to find interesting SNP interactions. Contains also a bagging version of logic regression for classification.
Maintained by Holger Schwender. Last updated 5 months ago.
3.9 match 3.60 score 8 scriptsropensci
datapack:A Flexible Container to Transport and Manipulate Data and Associated Resources
Provides a flexible container to transport and manipulate complex sets of data. These data may consist of multiple data files and associated meta data and ancillary files. Individual data objects have associated system level meta data, and data files are linked together using the OAI-ORE standard resource map which describes the relationships between the files. The OAI-ORE standard is described at <https://www.openarchives.org/ore/>. Data packages can be serialized and transported as structured files that have been created following the BagIt specification. The BagIt specification is described at <https://tools.ietf.org/html/draft-kunze-bagit-08>.
Maintained by Matthew B. Jones. Last updated 3 years ago.
1.6 match 44 stars 8.56 score 195 scripts 4 dependentssetzler
DiDforBigData:A Big Data Implementation of Difference-in-Differences Estimation with Staggered Treatment
Provides a big-data-friendly and memory-efficient difference-in-differences estimator for staggered (and non-staggered) treatment contexts.
Maintained by Bradley Setzler. Last updated 9 months ago.
2.8 match 5 stars 5.00 score 10 scriptsnoreastermt
allelematch:Identifying Unique Multilocus Genotypes where Genotyping Error and Missing Data may be Present
Tools for the identification of unique multilocus genotypes when both genotyping error and missing data may be present; targeted for use with large datasets and databases containing multiple samples of each individual (a common situation in conservation genetics, particularly in non-invasive wildlife sampling applications). Functions explicitly incorporate missing data and can tolerate allele mismatches created by genotyping error. If you use this package, please cite the original publication in Molecular Ecology Resources (Galpern et al., 2012), the details for which can be generated using citation('allelematch'). A complete vignette is available in the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.
Maintained by Todd Cross. Last updated 12 months ago.
6.1 match 2.26 score 8 scripts 1 dependents
nk027
BVAR:Hierarchical Bayesian Vector Autoregression
Estimation of hierarchical Bayesian vector autoregressive models following Kuschnig & Vashold (2021) <doi:10.18637/jss.v100.i14>. Implements hierarchical prior selection for conjugate priors in the fashion of Giannone, Lenza & Primiceri (2015) <doi:10.1162/REST_a_00483>. Functions to compute and identify impulse responses, calculate forecasts, forecast error variance decompositions and scenarios are available. Several methods to print, plot and summarise results facilitate analysis.
Maintained by Nikolas Kuschnig. Last updated 4 months ago.
bayesian bvar forecasts impulse-responses vector-autoregressions
1.9 match 51 stars 7.30 score 68 scripts 1 dependents
bioc
Rdisop:Decomposition of Isotopic Patterns
In high-resolution mass spectrometry (HR-MS), the measured masses can be decomposed into potential element combinations (chemical sum formulas). Where additional mass/intensity information of the respective isotopic peaks is available, decomposition can take this information into account to better rank the candidate sum formulas. To compare measured mass/intensity information with the theoretical distribution of candidate sum formulas, the latter needs to be calculated. This package implements fast algorithms to address both tasks: the calculation of isotopic distributions for arbitrary sum formulas (assuming an HR-MS resolution of roughly 30,000), and the ranked list of sum formulas fitting an observed peak or isotopic peak set.
Maintained by Steffen Neumann. Last updated 1 months ago.
immunooncology massspectrometry metabolomics mass-spectrometry cpp
1.5 match 4 stars 9.12 score 111 scripts 2 dependents
matthewblackwell
Amelia:A Program for Missing Data
A tool that "multiply imputes" missing data in a single cross-section (such as a survey), from a time series (like variables collected for each year in a country), or from a time-series-cross-sectional data set (such as collected by years for each of several countries). Amelia II implements our bootstrapping-based algorithm that gives essentially the same answers as the standard IP or EMis approaches, is usually considerably faster than existing approaches and can handle many more variables. Unlike Amelia I and other statistically rigorous imputation software, it virtually never crashes (but please let us know if you find to the contrary!). The program also generalizes existing approaches by allowing for trends in time series across observations within a cross-sectional unit, as well as priors that allow experts to incorporate beliefs they have about the values of missing cells in their data. Amelia II also includes useful diagnostics of the fit of multiple imputation models. The program works from the R command line or via a graphical user interface that does not require users to know R.
Maintained by Matthew Blackwell. Last updated 4 months ago.
1.5 match 1 stars 9.06 score 1.4k scripts 7 dependents
bioc
NetSAM:Network Seriation And Modularization
The NetSAM (Network Seriation and Modularization) package takes an edge-list representation of a weighted or unweighted network as input, performs network seriation and modularization analysis, and generates files that can be used as input for the one-dimensional network visualization tool NetGestalt (http://www.netgestalt.org) or for other network analysis. The NetSAM package can also generate a correlation network (e.g. a co-expression network) from input matrix data, perform seriation and modularization analysis on the correlation network, and calculate the associations between sample features and modules or identify the GO terms associated with the modules.
Maintained by Zhiao Shi. Last updated 5 months ago.
3.7 match 3.60 score 1 scripts