Showing 102 of 102 results
bioc
decontam:Identify Contaminants in Marker-gene and Metagenomics Sequencing Data
Simple statistical identification of contaminating sequence features in marker-gene or metagenomics data. Works on any kind of feature derived from environmental sequencing data (e.g. ASVs, OTUs, taxonomic groups, MAGs,...). Requires DNA quantitation data or sequenced negative control samples.
Maintained by Benjamin Callahan. Last updated 5 months ago.
immunooncology microbiome sequencing classification metagenomics amplicon bioinformatics contamination metabarcoding
21.1 match 153 stars 11.42 score 524 scripts 6 dependents
constantamateur
SoupX:Single Cell mRNA Soup eXterminator
Quantify, profile and remove ambient mRNA contamination (the "soup") from droplet based single cell RNA-seq experiments. Implements the method described in Young et al. (2018) <doi:10.1101/303727>.
Maintained by Matthew Daniel Young. Last updated 2 years ago.
13.2 match 264 stars 10.09 score 594 scripts 1 dependents
bioc
celda:CEllular Latent Dirichlet Allocation
Celda is a suite of Bayesian hierarchical models for clustering single-cell RNA-sequencing (scRNA-seq) data. It is able to perform "bi-clustering" and simultaneously cluster genes into gene modules and cells into cell subpopulations. It also contains DecontX, a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. A variety of scRNA-seq data visualization functions is also included.
Maintained by Joshua Campbell. Last updated 29 days ago.
singlecell geneexpression clustering sequencing bayesian immunooncology dataimport cpp openmp
9.0 match 147 stars 10.47 score 256 scripts 2 dependents
maarten14c
rice:Radiocarbon Equations
Provides functions for the calibration of radiocarbon dates, as well as options to calculate different radiocarbon realms (C14 age, F14C, pMC, D14C) and estimating the effects of contamination or local reservoir offsets (Reimer and Reimer 2001 <doi:10.1017/S0033822200038339>). The methods follow long-established recommendations such as Stuiver and Polach (1977) <doi:10.1017/S0033822200003672> and Reimer et al. (2004) <doi:10.1017/S0033822200033154>. This package complements the data package 'rintcal'.
Maintained by Maarten Blaauw. Last updated 2 months ago.
15.0 match 1 stars 6.13 score 13 scripts 4 dependents
bioc
CleanUpRNAseq:Detect and Correct Genomic DNA Contamination in RNA-seq Data
RNA-seq data generated by some library preparation methods, such as the rRNA-depletion-based method and the SMART-seq method, might be contaminated by genomic DNA (gDNA) if DNase I digestion is not performed properly during RNA preparation. CleanUpRNAseq was developed to check whether RNA-seq data suffers from gDNA contamination. If so, it can correct for gDNA contamination and reduce the false discovery rate of differentially expressed genes.
Maintained by Haibo Liu. Last updated 4 months ago.
qualitycontrol sequencing geneexpression
14.3 match 5 stars 5.44 score 4 scripts
cbielow
PTXQC:Quality Report Generation for MaxQuant and mzTab Results
Generates Proteomics (PTX) quality control (QC) reports for shotgun LC-MS data analyzed with the MaxQuant software suite (from .txt files) or mzTab files (ideally from OpenMS 'QualityControl' tool). Reports are customizable (target thresholds, subsetting) and available in HTML or PDF format. Published in J. Proteome Res., Proteomics Quality Control: Quality Control Software for MaxQuant Results (2015) <doi:10.1021/acs.jproteome.5b00780>.
Maintained by Chris Bielow. Last updated 1 year ago.
drag-and-drop hacktoberfest heatmap match-between-runs maxquant metric mztab openms proteomics quality-control quality-metrics report
7.0 match 42 stars 9.35 score 105 scripts 1 dependents
lbbe-software
fitdistrplus:Help to Fit of a Parametric Distribution to Non-Censored or Censored Data
Extends the fitdistr() function (of the MASS package) with several functions to help the fit of a parametric distribution to non-censored or censored data. Censored data may contain left censored, right censored and interval censored values, with several lower and upper bounds. In addition to maximum likelihood estimation (MLE), the package provides moment matching (MME), quantile matching (QME), maximum goodness-of-fit estimation (MGE) and maximum spacing estimation (MSE) methods (available only for non-censored data). Weighted versions of MLE, MME, QME and MSE are available. See e.g. Casella & Berger (2002), Statistical inference, Pacific Grove, for a general introduction to parametric estimation.
Maintained by Aurélie Siberchicot. Last updated 14 days ago.
3.5 match 54 stars 16.15 score 4.5k scripts 153 dependents
bioc
gDNAx:Diagnostics for assessing genomic DNA contamination in RNA-seq data
Provides diagnostics for assessing genomic DNA contamination in RNA-seq data, as well as plots representing these diagnostics. Moreover, the package can be used to get an insight into the strand library protocol used and, in case of strand-specific libraries, the strandedness of the data. Furthermore, it provides functionality to filter out reads of potential gDNA origin.
Maintained by Robert Castelo. Last updated 2 months ago.
transcription transcriptomics rnaseq sequencing preprocessing software geneexpression coverage differentialexpression functionalgenomics splicedalignment alignment
10.6 match 1 stars 5.08 score 3 scripts
bioc
decontX:Decontamination of single cell genomics data
This package contains implementations of DecontX (Yang et al. 2020), a decontamination algorithm for single-cell RNA-seq, and DecontPro (Yin et al. 2023), a decontamination algorithm for single-cell protein expression data. DecontX is a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. DecontPro is a Bayesian method that estimates the level of contamination from ambient and background sources in a CITE-seq ADT dataset and decontaminates the dataset.
Maintained by Joshua Campbell. Last updated 29 days ago.
9.2 match 4.94 score 29 scripts
hardin47
biwt:Functions to Compute the Biweight Mean Vector and Covariance and Correlation Matrices
The base functions compute multivariate location, scale, and correlation estimates based on Tukey's biweight M-estimator. Using the base function, the computations can be applied to a large number of observations to create either a matrix of biweight distances or biweight correlations.
Maintained by Johanna Hardin. Last updated 6 months ago.
8.0 match 5.58 score 16 scripts 2 dependents
protviz
prozor:Minimal Protein Set Explaining Peptide Spectrum Matches
Determine minimal protein set explaining peptide spectrum matches. Utility functions for creating fasta amino acid databases with decoys and contaminants. Peptide false discovery rate estimation for target decoy search results on psm, precursor, peptide and protein level. Computing dynamic swath window sizes based on MS1 or MS2 signal distributions.
Maintained by Witold Wolski. Last updated 4 months ago.
software massspectrometry proteomics experimenthubsoftware
9.6 match 6 stars 4.45 score 93 scripts
r-forge
coin:Conditional Inference Procedures in a Permutation Test Framework
Conditional inference procedures for the general independence problem including two-sample, K-sample (non-parametric ANOVA), correlation, censored, ordered and multivariate problems described in <doi:10.18637/jss.v028.i08>.
Maintained by Torsten Hothorn. Last updated 9 months ago.
3.5 match 11.68 score 1.6k scripts 74 dependents
hubverse-org
hubVis:Plotting methods for hub models output
Plotting methods for hub models output.
Maintained by Lucie Contamin. Last updated 4 months ago.
9.1 match 3 stars 4.44 score 22 scripts 1 dependents
bioc
ORFik:Open Reading Frames in Genomics
R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed, although, as the name hints, it was made with the investigation of ribosomal patterns over Open Reading Frames (ORFs) as its primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. The package allows reassigning transcript starts using CAGE-Seq data, automatic shifting of RiboSeq reads, finding Open Reading Frames for whole genomes, and much more.
Maintained by Haakon Tjeldnes. Last updated 29 days ago.
immunooncology software sequencing riboseq rnaseq functionalgenomics coverage alignment dataimport cpp
3.8 match 33 stars 10.63 score 115 scripts 2 dependents
bioc
singleCellTK:Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data
The Single Cell Toolkit (SCTK) in the singleCellTK package provides an interface to popular tools for importing, quality control, analysis, and visualization of single cell RNA-seq data. SCTK allows users to seamlessly integrate tools from various packages at different stages of the analysis workflow. A general "a la carte" workflow gives users access to multiple methods for data importing, calculation of general QC metrics, doublet detection, ambient RNA estimation and removal, filtering, normalization, batch correction or integration, dimensionality reduction, 2-D embedding, clustering, marker detection, differential expression, cell type labeling, pathway analysis, and data exporting. Curated workflows can be used to run Seurat and Celda. Streamlined quality control can be performed on the command line using the SCTK-QC pipeline. Users can analyze their data using commands in the R console or by using an interactive Shiny Graphical User Interface (GUI). Specific analyses or entire workflows can be summarized and shared with comprehensive HTML reports generated by Rmarkdown. Additional documentation and vignettes can be found at camplab.net/sctk.
Maintained by Joshua David Campbell. Last updated 25 days ago.
singlecell geneexpression differentialexpression alignment clustering immunooncology batcheffect normalization qualitycontrol dataimport gui
3.9 match 181 stars 10.16 score 252 scripts
waynegitshell
GWSDAT:GroundWater Spatiotemporal Data Analysis Tool (GWSDAT)
Shiny application for the analysis of groundwater monitoring data, designed to work with simple time-series data for solute concentration and ground water elevation, but can also plot non-aqueous phase liquid (NAPL) thickness if required. Also provides the import of a site basemap in GIS shapefile format.
Maintained by Wayne Jones. Last updated 1 year ago.
contamination-detection groundwater-flow plume shiny spatio-temporal-analysis
7.5 match 32 stars 5.02 score 11 scripts
alanarnholt
BSDA:Basic Statistics and Data Analysis
Data sets for book "Basic Statistics and Data Analysis" by Larry J. Kitchens.
Maintained by Alan T. Arnholt. Last updated 2 years ago.
3.4 match 7 stars 9.11 score 1.3k scripts 6 dependents
valentint
tclust:Robust Trimmed Clustering
Provides functions for robust trimmed clustering. The methods are described in Garcia-Escudero (2008) <doi:10.1214/07-AOS515>, Fritz et al. (2012) <doi:10.18637/jss.v047.i12>, Garcia-Escudero et al. (2011) <doi:10.1007/s11222-010-9194-z> and others.
Maintained by Valentin Todorov. Last updated 26 days ago.
3.5 match 3 stars 8.02 score 72 scripts 3 dependents
aalfons
simFrame:Simulation Framework
A general framework for statistical simulation, which allows researchers to make use of a wide range of simulation designs with minimal programming effort. The package provides functionality for drawing samples from a distribution or a finite population, for adding outliers and missing values, as well as for visualization of the simulation results. It follows a clear object-oriented design and supports parallel computing to increase computational performance.
Maintained by Andreas Alfons. Last updated 3 years ago.
7.1 match 2 stars 3.90 score 80 scripts
tbep-tech
tbeptools:Data and Indicators for the Tampa Bay Estuary Program
Several functions are provided for working with Tampa Bay Estuary Program data and indicators, including the water quality report card, tidal creek assessments, Tampa Bay Nekton Index, Tampa Bay Benthic Index, seagrass transect data, habitat report card, and fecal indicator bacteria. Additional functions are provided for miscellaneous tasks, such as reference library curation.
Maintained by Marcus Beck. Last updated 10 days ago.
data-analysis tampa-bay tbep water-quality
3.5 match 10 stars 7.86 score 133 scripts
lethargy608
sssc:Same Species Sample Contamination Detection
Imports Variant Calling Format file into R. It can detect whether a sample contains contaminant from the same species. In the first stage of the approach, a change-point detection method is used to identify copy number variations for filtering. Next, features are extracted from the data for a support vector machine model. For log-likelihood calculation, the deviation parameter is estimated by maximum likelihood method. Using a radial basis function kernel support vector machine, the contamination of a sample can be detected.
Maintained by Tao Jiang. Last updated 7 years ago.
14.6 match 1.70 score 4 scripts
yufree
pmd:Paired Mass Distance Analysis for GC/LC-MS Based Non-Targeted Analysis and Reactomics Analysis
Paired mass distance (PMD) analysis proposed in Yu, Olkowicz and Pawliszyn (2018) <doi:10.1016/j.aca.2018.10.062> and PMD based reactomics analysis proposed in Yu and Petrick (2020) <doi:10.1038/s42004-020-00403-z> for gas/liquid chromatography–mass spectrometry (GC/LC-MS) based non-targeted analysis. PMD analysis includes the GlobalStd algorithm and structure/reaction directed analysis. The GlobalStd algorithm can find independent peaks in m/z-retention time profiles based on retention time hierarchical cluster analysis and frequency analysis of paired mass distances within retention time groups. Structure directed analysis can be used to find potential relationships among those independent peaks in different retention time groups based on the frequency of paired mass distances. Reactomics analysis can also be performed to build a PMD network, assign sources and discover biomarker reactions. GUIs for PMD analysis are also included as 'shiny' applications.
Maintained by Miao YU. Last updated 2 months ago.
mass-spectrometry metabolomics non-target
3.5 match 10 stars 6.68 score 40 scripts
r-forge
distrEx:Extensions of Package 'distr'
Extends package 'distr' by functionals, distances, and conditional distributions.
Maintained by Matthias Kohl. Last updated 2 months ago.
3.5 match 6.68 score 107 scripts 17 dependents
bioc
SpotClean:SpotClean adjusts for spot swapping in spatial transcriptomics data
SpotClean is a computational method to adjust for spot swapping in spatial transcriptomics data. Recent spatial transcriptomics experiments utilize slides containing thousands of spots with spot-specific barcodes that bind mRNA. Ideally, unique molecular identifiers at a spot measure spot-specific expression, but this is often not the case due to bleed from nearby spots, an artifact we refer to as spot swapping. SpotClean is able to estimate the contamination rate in observed data and decontaminate the spot swapping effect, thus increasing the sensitivity and precision of downstream analyses.
Maintained by Zijian Ni. Last updated 5 months ago.
dataimport rnaseq sequencing geneexpression spatial singlecell transcriptomics preprocessing rna-seq spatial-transcriptomics
3.5 match 28 stars 6.48 score 36 scripts
opengeos
whitebox:'WhiteboxTools' R Frontend
An R frontend for the 'WhiteboxTools' library, which is an advanced geospatial data analysis platform developed by Prof. John Lindsay at the University of Guelph's Geomorphometry and Hydrogeomatics Research Group. 'WhiteboxTools' can be used to perform common geographical information systems (GIS) analysis operations, such as cost-distance analysis, distance buffering, and raster reclassification. Remote sensing and image processing tasks include image enhancement (e.g. panchromatic sharpening, contrast adjustments), image mosaicing, numerous filtering operations, simple classification (k-means), and common image transformations. 'WhiteboxTools' also contains advanced tooling for spatial hydrological analysis (e.g. flow-accumulation, watershed delineation, stream network analysis, sink removal), terrain analysis (e.g. common terrain indices such as slope, curvatures, wetness index, hillshading; hypsometric analysis; multi-scale topographic position analysis), and LiDAR data processing. Suggested citation: Lindsay (2016) <doi:10.1016/j.cageo.2016.07.003>.
Maintained by Andrew Brown. Last updated 5 months ago.
geomorphometry geoprocessing geospatial gis hydrology remote-sensing rstudio
2.3 match 173 stars 9.65 score 203 scripts 2 dependents
testpregnault
vsgoftest:Goodness-of-Fit Tests Based on Kullback-Leibler Divergence
An implementation of Vasicek and Song goodness-of-fit tests. Several functions are provided to estimate differential Shannon entropy, i.e., estimate Shannon entropy of real random variables with density, and test the goodness-of-fit of some family of distributions, including uniform, Gaussian, log-normal, exponential, gamma, Weibull, Pareto, Fisher, Laplace and beta distributions; see Lequesne and Regnault (2020) <doi:10.18637/jss.v096.c01>.
Maintained by Philippe Regnault. Last updated 4 years ago.
9.0 match 2.38 score 24 scripts
cran
ContaminatedMixt:Clustering and Classification with the Contaminated Normal
Fits mixtures of multivariate contaminated normal distributions (with eigen-decomposed scale matrices) via the expectation conditional-maximization algorithm under a clustering or classification paradigm. Methods are described in Antonio Punzo, Angelo Mazza, and Paul D McNicholas (2018) <doi:10.18637/jss.v085.i10>.
Maintained by Angelo Mazza. Last updated 2 years ago.
9.0 match 2.28 score 63 scripts 1 dependents
r-forge
RobAStBase:Robust Asymptotic Statistics
Base S4-classes and functions for robust asymptotic statistics.
Maintained by Matthias Kohl. Last updated 2 months ago.
4.1 match 4.96 score 64 scripts 4 dependents
m-fer1
msamp:Estimate Sample Size to Detect Bacterial Contamination in a Product Lot
Estimates the sample size needed to detect microbial contamination in a lot with a user-specified detection probability and user-specified analytical sensitivity. Various patterns of microbial contamination are accounted for: homogeneous (Poisson), heterogeneous (Poisson-Gamma) or localized (Zero-inflated Poisson). Ida Jongenburger et al. (2010) <doi:10.1016/j.foodcont.2012.02.004> "Impact of microbial distributions on food safety". Leroy Simon (1963) <doi:10.1017/S0515036100001975> "Casualty Actuarial Society - The Negative Binomial and Poisson Distributions Compared".
Maintained by Martine Ferguson. Last updated 2 years ago.
10.0 match 2.00 score 2 scripts
cran
NADA:Nondetects and Data Analysis for Environmental Data
Contains methods described by Dennis Helsel in his book "Nondetects And Data Analysis: Statistics for Censored Environmental Data".
Maintained by Lopaka Lee. Last updated 5 years ago.
3.6 match 2 stars 5.48 score 118 scripts 14 dependents
lethargy608
vanquish:Variant Quality Investigation Helper
Imports Variant Calling Format file into R. It can detect whether a sample contains contaminant from the same species. In the first stage of the approach, a change-point detection method is used to identify copy number variations for filtering. Next, features are extracted from the data for a support vector machine model. For log-likelihood calculation, the deviation parameter is estimated by maximum likelihood method. Using a radial basis function kernel support vector machine, the contamination of a sample can be detected.
Maintained by Tao Jiang. Last updated 7 years ago.
11.5 match 1.70 score 4 scripts
bioc
DAPAR:Tools for the Differential Analysis of Proteins Abundance with R
The package DAPAR is a Bioconductor-distributed R package which provides all the necessary functions to analyze quantitative data from label-free proteomics experiments. Contrary to most other similar R packages, it is endowed with rich and user-friendly graphical interfaces, so that no programming skill is required (see `Prostar` package).
Maintained by Samuel Wieczorek. Last updated 5 months ago.
proteomics normalization preprocessing massspectrometry qualitycontrol go dataimport prostar1
3.6 match 2 stars 5.42 score 22 scripts 1 dependents
bioc
artMS:Analytical R tools for Mass Spectrometry
artMS provides a set of tools for the analysis of proteomics label-free datasets. It takes as input the MaxQuant search result output (evidence.txt file) and performs quality control, relative quantification using MSstats, downstream analysis and integration. artMS also provides a set of functions to re-format the data and make it compatible with other analytical tools, including SAINTq, SAINTexpress, Phosfate, and PHOTON. Check http://artms.org for details.
Maintained by David Jimenez-Morales. Last updated 5 months ago.
proteomics differentialexpression biomedicalinformatics systemsbiology massspectrometry annotation qualitycontrol genesetenrichment clustering normalization immunooncology multiplecomparison analysis analytical ap-ms bioconductor bioinformatics mass-spectrometry phosphoproteomics post-translational-modification quantitative-analysis
2.9 match 14 stars 6.41 score 13 scripts
claudioagostinelli
GSE:Robust Estimation in the Presence of Cellwise and Casewise Contamination and Missing Data
Robust Estimation of Multivariate Location and Scatter in the Presence of Cellwise and Casewise Contamination and Missing Data.
Maintained by Claudio Agostinelli. Last updated 2 years ago.
5.0 match 3.52 score 22 scripts 5 dependents
wahani
saeSim:Simulation Tools for Small Area Estimation
Tools for the simulation of data in the context of small area estimation. Combine all steps of your simulation - from data generation over drawing samples to model fitting - in one object. This enables easy modification and combination of different scenarios. You can store your results in a folder or start the simulation in parallel.
Maintained by Sebastian Warnholz. Last updated 3 years ago.
3.5 match 3 stars 4.72 score 35 scripts
jpquast
protti:Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools
Useful functions and workflows for proteomics quality control and data analysis of both limited proteolysis-coupled mass spectrometry (LiP-MS) (Feng et al. (2014) <doi:10.1038/nbt.2999>) and regular bottom-up proteomics experiments. Data generated with search tools such as 'Spectronaut', 'MaxQuant' and 'Proteome Discoverer' can be easily used thanks to the flexibility of the functions.
Maintained by Jan-Philipp Quast. Last updated 5 months ago.
data-analysis lip-ms mass-spectrometry omics protein proteomics systems-biology
1.9 match 61 stars 8.58 score 83 scripts
valentint
robust:Port of the S+ "Robust Library"
Methods for robust statistics, state of the art in the early 2000s, notably for robust regression and robust multivariate analysis.
Maintained by Valentin Todorov. Last updated 7 months ago.
2.0 match 7.52 score 572 scripts 8 dependents
bioc
QFeatures:Quantitative features for mass spectrometry data
The QFeatures infrastructure enables the management and processing of quantitative features for high-throughput mass spectrometry assays. It provides a familiar Bioconductor user experience to manage quantitative data across different assay levels (such as peptide spectrum matches, peptides and proteins) in a coherent and tractable format.
Maintained by Laurent Gatto. Last updated 14 days ago.
infrastructure massspectrometry proteomics metabolomics bioconductor mass-spectrometry
1.3 match 27 stars 11.87 score 278 scripts 49 dependents
iembry
ie2miscdata:Irucka Embry's Miscellaneous USGS Data Collection
A collection of Irucka Embry's miscellaneous USGS data sets (USGS Parameter codes with fixed values, USGS global time zone codes, and US Air Force Global Engineering Weather Data). Irucka created these data sets while a Cherokee Nation Technology Solutions (CNTS) United States Geological Survey (USGS) Contractor and/or USGS employee.
Maintained by Irucka Embry. Last updated 2 years ago.
3.6 match 4.00 score
r-forge
ROptEst:Optimally Robust Estimation
R infrastructure for optimally robust estimation in general smoothly parameterized models using S4 classes and methods, as described in Kohl, M., Ruckdeschel, P., and Rieder, H. (2010), <doi:10.1007/s10260-010-0133-0>, and in Rieder, H., Kohl, M., and Ruckdeschel, P. (2008), <doi:10.1007/s10260-007-0047-7>.
Maintained by Matthias Kohl. Last updated 2 months ago.
3.4 match 4.26 score 50 scripts 1 dependents
dcousin3
superb:Summary Plots with Adjusted Error Bars
Computes the standard error and confidence interval of various descriptive statistics under various designs and sampling schemes. The main function, superb(), returns a plot. It can also be used to obtain a dataframe with the statistics and their precision intervals so that other plotting environments (e.g., Excel) can be used. See Cousineau and colleagues (2021) <doi:10.1177/25152459211035109> or Cousineau (2017) <doi:10.5709/acp-0214-z> for a review as well as Cousineau (2005) <doi:10.20982/tqmp.01.1.p042>, Morey (2008) <doi:10.20982/tqmp.04.2.p061>, Baguley (2012) <doi:10.3758/s13428-011-0123-7>, Cousineau & Laurencelle (2016) <doi:10.1037/met0000055>, Cousineau & O'Brien (2014) <doi:10.3758/s13428-013-0441-z>, Calderini & Harding <doi:10.20982/tqmp.15.1.p001> for specific references.
Maintained by Denis Cousineau. Last updated 2 months ago.
error-bars plotting statistics summary-plots summary-statistics visualization
1.5 match 19 stars 9.55 score 155 scripts 2 dependents
dpc10ster
RJafroc:Artificial Intelligence Systems and Observer Performance
Analyzing the performance of artificial intelligence (AI) systems/algorithms characterized by a 'search-and-report' strategy. Historically observer performance has dealt with measuring radiologists' performances in search tasks, e.g., searching for lesions in medical images and reporting them, but the implicit location information has been ignored. The implemented methods apply to analyzing the absolute and relative performances of AI systems, comparing AI performance to a group of human readers or optimizing the reporting threshold of an AI system. In addition to performing historical receiver operating characteristic (ROC) analysis (localization information ignored), the software also performs free-response receiver operating characteristic (FROC) analysis, where lesion localization information is used. A book using the software has been published: Chakraborty DP: Observer Performance Methods for Diagnostic Imaging - Foundations, Modeling, and Applications with R-Based Examples, Taylor-Francis LLC; 2017: <https://www.routledge.com/Observer-Performance-Methods-for-Diagnostic-Imaging-Foundations-Modeling/Chakraborty/p/book/9781482214840>. Online updates to this book, which use the software, are at <https://dpc10ster.github.io/RJafrocQuickStart/>, <https://dpc10ster.github.io/RJafrocRocBook/> and at <https://dpc10ster.github.io/RJafrocFrocBook/>. Supported data collection paradigms are the ROC, FROC and the location ROC (LROC). ROC data consists of a single rating per image, where a rating is the perceived confidence level that the image is that of a diseased patient. An ROC curve is a plot of true positive fraction vs. false positive fraction. FROC data consists of a variable number (zero or more) of mark-rating pairs per image, where a mark is the location of a reported suspicious region and the rating is the confidence level that it is a real lesion. LROC data consists of a rating and a location of the most suspicious region, for every image.
Four models of observer performance, and curve-fitting software, are implemented: the binormal model (BM), the contaminated binormal model (CBM), the correlated contaminated binormal model (CORCBM), and the radiological search model (RSM). Unlike the binormal model, CBM, CORCBM and RSM predict 'proper' ROC curves that do not inappropriately cross the chance diagonal. Additionally, RSM parameters are related to search performance (not measured in conventional ROC analysis) and classification performance. Search performance refers to finding lesions, i.e., true positives, while simultaneously not finding false positive locations. Classification performance measures the ability to distinguish between true and false positive locations. Knowing these separate performances allows principled optimization of reader or AI system performance. This package supersedes Windows JAFROC (jackknife alternative FROC) software V4.2.1, <https://github.com/dpc10ster/WindowsJafroc>. Package functions are organized as follows. Data file related function names are preceded by 'Df', curve fitting functions by 'Fit', included data sets by 'dataset', plotting functions by 'Plot', significance testing functions by 'St', sample size related functions by 'Ss', data simulation functions by 'Simulate' and utility functions by 'Util'. Implemented are figures of merit (FOMs) for quantifying performance and functions for visualizing empirical or fitted operating characteristics: e.g., ROC, FROC, alternative FROC (AFROC) and weighted AFROC (wAFROC) curves. For fully crossed study designs significance testing of reader-averaged FOM differences between modalities is implemented via either Dorfman-Berbaum-Metz or the Obuchowski-Rockette methods. Also implemented is single modality analysis, which allows comparison of performance of a group of radiologists to a specified value, or comparison of AI to a group of radiologists interpreting the same cases. 
Crossed-modality analysis is implemented wherein there are two crossed modality factors and the aim is to determine performance in each modality factor averaged over all levels of the second factor. Sample size estimation tools are provided for ROC and FROC studies; these use estimates of the relevant variances from a pilot study to predict required numbers of readers and cases in a pivotal study to achieve the desired power. Utility and data file manipulation functions allow data to be read in any of the currently used input formats, including Excel, and the results of the analysis can be viewed in text or Excel output files. The methods are illustrated with several included datasets from the author's collaborations. This update includes improvements to the code, some as a result of user-reported bugs and new feature requests, and others discovered during ongoing testing and code simplification.
Maintained by Dev Chakraborty. Last updated 5 months ago.
ai-optimization artificial-intelligence-algorithms computer-aided-diagnosis froc-analysis roc-analysis target-classification target-localization cpp
2.4 match 19 stars 5.69 score 65 scripts
sb452
MendelianRandomization:Mendelian Randomization Package
Encodes several methods for performing Mendelian randomization analyses with summarized data. Summarized data on genetic associations with the exposure and with the outcome can be obtained from large consortia. These data can be used for obtaining causal estimates using instrumental variable methods.
Maintained by Stephen Burgess. Last updated 2 years ago.
2.0 match 1 stars 6.83 score 940 scripts 1 dependents
bioc
autonomics:Unified Statistical Modeling of Omics Data
This package unifies access to Statistical Modeling of Omics Data. Across linear modeling engines (lm, lme, lmer, limma, and wilcoxon). Across coding systems (treatment, difference, deviation, etc). Across model formulae (with/without intercept, random effect, interaction or nesting). Across omics platforms (microarray, rnaseq, msproteomics, affinity proteomics, metabolomics). Across projection methods (pca, pls, sma, lda, spls, opls). Across clustering methods (hclust, pam, cmeans). It provides a fast enrichment analysis implementation. And an intuitive contrastogram visualisation to summarize contrast effects in complex designs.
Maintained by Aditya Bhagwat. Last updated 2 months ago.
softwaredataimportpreprocessingdimensionreductionprincipalcomponentregressiondifferentialexpressiongenesetenrichmenttranscriptomicstranscriptiongeneexpressionrnaseqmicroarrayproteomicsmetabolomicsmassspectrometry
2.3 match 5.95 score 5 scripts
johnjsl7
daewr:Design and Analysis of Experiments with R
Contains data frames and functions used in the book "Design and Analysis of Experiments with R", Lawson (2015) ISBN-13:978-1-4398-6813-3.
Maintained by John Lawson. Last updated 2 years ago.
3.4 match 3 stars 3.83 score 217 scripts 3 dependents
kloke
npsm:Nonparametric Statistical Methods
Accompanies the book "Nonparametric Statistical Methods Using R, 2nd Edition" by Kloke and McKean (2024, ISBN:9780367651350). Includes methods, datasets, and random number generation useful for the study of robust and/or nonparametric statistics. Emphasizes classical nonparametric methods for a variety of designs --- especially one-sample and two-sample problems. Includes methods for general scores, including estimation and testing for the two-sample location problem as well as Hogg's adaptive method.
Maintained by John Kloke. Last updated 9 months ago.
3.6 match 3.47 score 59 scripts
lvclark
polyRAD:Genotype Calling with Uncertainty from Sequencing Data in Polyploids and Diploids
Read depth data from genotyping-by-sequencing (GBS) or restriction site-associated DNA sequencing (RAD-seq) are imported and used to make Bayesian probability estimates of genotypes in polyploids or diploids. The genotype probabilities, posterior mean genotypes, or most probable genotypes can then be exported for downstream analysis. 'polyRAD' is described by Clark et al. (2019) <doi:10.1534/g3.118.200913>, and the Hind/He statistic for marker filtering is described by Clark et al. (2022) <doi:10.1186/s12859-022-04635-9>. A variant calling pipeline for highly duplicated genomes is also included and is described by Clark et al. (2020, Version 1) <doi:10.1101/2020.01.11.902890>.
Maintained by Lindsay V. Clark. Last updated 9 days ago.
bioinformaticsdna-sequencinggenotype-likelihoodsgenotyping-by-sequencinghacktoberfestrad-seqrad-sequencingsnp-genotypingcpp
1.8 match 28 stars 6.98 score 85 scripts
cran
PairedData:Paired Data Analysis
Many datasets and a set of graphics (based on ggplot2), statistics, effect sizes and hypothesis tests are provided for analysing paired data with S4 class.
Maintained by Stephane Champely. Last updated 7 years ago.
2.3 match 2 stars 5.18 score 326 scripts 4 dependents
bioc
affyContam:structured corruption of affymetrix cel file data
structured corruption of cel file data to demonstrate QA effectiveness
Maintained by V. Carey. Last updated 5 months ago.
3.3 match 3.30 score 1 scripts
cran
robreg3S:Three-Step Regression and Inference for Cellwise and Casewise Contamination
Three-step regression and inference for cellwise and casewise contamination.
Maintained by Andy Leung. Last updated 9 years ago.
6.8 match 1.48 score 3 scripts 1 dependents
cran
SpatialBSS:Blind Source Separation for Multivariate Spatial Data
Blind source separation for multivariate spatial data based on simultaneous/joint diagonalization of (robust) local covariance matrices. This package is an implementation of the methods described in Bachoc, Genton, Nordhausen, Ruiz-Gazen and Virta (2020) <doi:10.1093/biomet/asz079>.
Maintained by Klaus Nordhausen. Last updated 2 years ago.
4.0 match 2.48 score 1 dependents
bioc
Statial:A package to identify changes in cell state relative to spatial associations
Statial is a suite of functions for identifying changes in cell state. Statial provides robust quantification of cell type localisation that is invariant to changes in tissue structure. In addition, Statial uncovers changes in marker expression associated with varying levels of localisation. These features can be used to explore how the structure and function of different cell types may be altered by the agents they are surrounded with.
Maintained by Farhan Ameen. Last updated 5 months ago.
singlecellspatialclassificationsingle-cell
1.8 match 5 stars 5.49 score 23 scripts
oskarhansson
strvalidator:Process Control and Validation of Forensic STR Kits
An open source platform for validation and process control. Tools to analyze data from internal validation of forensic short tandem repeat (STR) kits are provided. The tools are developed to provide the necessary data to conform with guidelines for internal validation issued by the European Network of Forensic Science Institutes (ENFSI) DNA Working Group, and the Scientific Working Group on DNA Analysis Methods (SWGDAM). A front-end graphical user interface is provided. More information about each function can be found in the respective help documentation.
Maintained by Oskar Hansson. Last updated 2 months ago.
2.3 match 5 stars 4.29 score 13 scripts
jean-baptiste-camps
stemmatology:Stemmatological Analysis of Textual Traditions
Explore and analyse the genealogy of textual or musical traditions, from their variants, with various stemmatological methods, mainly the disagreement-based algorithms suggested by Camps and Cafiero (2015) <doi:10.1484/M.LECTIO-EB.5.102565>.
Maintained by Jean-Baptiste Camps. Last updated 6 years ago.
1.8 match 15 stars 5.29 score 26 scripts
bioc
TPP2D:Detection of ligand-protein interactions from 2D thermal profiles (DLPTP)
Detection of ligand-protein interactions from 2D thermal profiles (DLPTP). Performs an FDR-controlled analysis of 2D-TPP experiments by functional analysis of dose-response curves across temperatures.
Maintained by Nils Kurzawa. Last updated 5 months ago.
2.3 match 4.20 score 16 scripts
hubverse-org
hubExamples:Example Hub Data
This package provides example data for forecasting and scenario modeling hubs in the hubverse format.
Maintained by Evan L Ray. Last updated 2 months ago.
1.6 match 1 stars 5.46 score 20 scripts 1 dependents
bioc
msqrob2:Robust statistical inference for quantitative LC-MS proteomics
msqrob2 provides a robust linear mixed model framework for assessing differential abundance in MS-based quantitative proteomics experiments. Our workflows can start from raw peptide intensities or summarised protein expression values. The model parameter estimates can be stabilized by ridge regression, empirical Bayes variance estimation and robust M-estimation. msqrob2's hurdle workflow can handle missing data without having to rely on hard-to-verify imputation assumptions, and outcompetes state-of-the-art methods with and without imputation for both high and low missingness. It builds on QFeatures infrastructure for quantitative mass spectrometry data to store the model results together with the raw data and preprocessed data.
Maintained by Lieven Clement. Last updated 19 days ago.
proteomicsmassspectrometrydifferentialexpressionmultiplecomparisonregressionexperimentaldesignsoftwareimmunooncologynormalizationtimecoursepreprocessing
1.2 match 10 stars 6.94 score 83 scripts
yonghuidong
MSbox:Mass Spectrometry Tools
Common mass spectrometry tools described in John Roboz (2013) <doi:10.1201/b15436>. It allows checking element isotopes, calculating (isotope labelled) exact monoisotopic mass, m/z values and mass accuracy, inspecting possible contaminant mass peaks, and examining possible adducts in electrospray ionization (ESI) and matrix-assisted laser desorption ionization (MALDI) ion sources.
Maintained by Yonghui Dong. Last updated 2 years ago.
chemoinformaticsmass-spectrometrymetabolitesmetabolomics
2.8 match 1 stars 2.70 score 10 scripts
bioc
DEWSeq:Differential Expressed Windows Based on Negative Binomial Distribution
DEWSeq is a sliding window approach for the analysis of differentially enriched binding regions in eCLIP or iCLIP next-generation sequencing data.
Maintained by bioinformatics team Hentze. Last updated 5 months ago.
sequencinggeneregulationfunctionalgenomicsdifferentialexpressionbioinformaticseclipngs-analysis
1.3 match 5 stars 5.30 score 4 scripts
cran
STAND:Statistical Analysis of Non-Detects
Provides functions for the analysis of occupational and environmental data with non-detects. Maximum likelihood (ML) methods for censored log-normal data and non-parametric methods based on the product limit estimate (PLE) for left censored data are used to calculate all of the statistics recommended by the American Industrial Hygiene Association (AIHA) for the complete data case. Functions for the analysis of complete samples using exact methods are also provided for the lognormal model. Revised from 2007-11-05 'survfit~1'.
Maintained by E. P. Adams. Last updated 9 years ago.
3.4 match 2.00 score
alexisderumigny
MMDCopula:Robust Estimation of Copulas by Maximum Mean Discrepancy
Provides functions for the robust estimation of parametric families of copulas using minimization of the Maximum Mean Discrepancy, following the article Alquier, Chérief-Abdellatif, Derumigny and Fermanian (2022) <doi:10.1080/01621459.2021.2024836>.
Maintained by Alexis Derumigny. Last updated 3 years ago.
1.5 match 5 stars 4.40 score 3 scripts
funwithr
LongMemoryTS:Long Memory Time Series
Long Memory Time Series is a collection of functions for estimation, simulation and testing of long memory processes, spurious long memory processes and fractionally cointegrated systems.
Maintained by Christian Leschinski. Last updated 6 years ago.
1.7 match 2 stars 3.40 score 42 scripts 1 dependents
mayooran1987
grabsampling:Probability of detection for grab sample selection
The goal of the grabsampling package is to calculate the probability of detection for grab sample selection under two different methods, systematic or random sampling, based on a two-state Markov chain in a bulk production process.
Maintained by Mayooran Thevaraja. Last updated 2 years ago.
2.0 match 1 stars 2.70 score 1 scripts
yunyishen
robustcov:Collection of Robust Covariance and (Sparse) Precision Matrix Estimators
Collection of methods for robust covariance and (sparse) precision matrix estimation based on Loh and Tan (2018) <doi:10.1214/18-EJS1427>.
Maintained by Yunyi Shen. Last updated 4 years ago.
precision-matrixrobust-estimatesopenblascppopenmp
2.0 match 1 stars 2.70 score
cran
HeckmanEM:Fit Normal, Student-t or Contaminated Normal Heckman Selection Models
It performs maximum likelihood estimation for the Heckman selection model (Normal, Student-t or Contaminated normal) using an EM-algorithm <doi:10.1016/j.jmva.2021.104737>. It also performs influence diagnostics through global and local influence for four possible perturbation schemes.
Maintained by Marcos Prates. Last updated 11 months ago.
5.0 match 1.00 score
cran
ssMRCD:Spatially Smoothed MRCD Estimator
Estimation of the Spatially Smoothed Minimum Regularized Determinant (ssMRCD) estimator and its usage in an ssMRCD-based outlier detection method as described in Puchhammer and Filzmoser (2023) <doi:10.1080/10618600.2023.2277875> and for sparse robust PCA for multi-source data described in Puchhammer, Wilms and Filzmoser (2024) <doi:10.48550/arXiv.2407.16299>. Included are also complementary visualization and parameter tuning tools.
Maintained by Patricia Puchhammer. Last updated 7 months ago.
2.3 match 2.00 score 3 scripts
meintraumus
AFFECT:Accelerated Functional Failure Time Model with Error-Contaminated Survival Times
We aim to deal with data with measurement error in the response and misclassified censoring status under an AFT model. This package primarily contains three functions, which are used to generate artificial data, correct error-prone data, and estimate the functional covariates for an AFT model.
Maintained by Hsiao-Ting Huang. Last updated 2 years ago.
4.4 match 1.00 score
schoonees
cds:Constrained Dual Scaling for Detecting Response Styles
This is an implementation of constrained dual scaling for detecting response styles in categorical data, including utility functions. The procedure involves adding additional columns to the data matrix representing the boundaries between the rating categories. The resulting matrix is then doubled and analyzed by dual scaling. One-dimensional solutions are sought which provide optimal scores for the rating categories. These optimal scores are constrained to follow monotone quadratic splines. Clusters are introduced within which the response styles can vary. The type of response style present in a cluster can be diagnosed from the optimal scores for said cluster, and this can be used to construct an imputed version of the data set which adjusts for response styles.
Maintained by Pieter Schoonees. Last updated 9 years ago.
1.7 match 2.65 score 37 scripts 1 dependents
cran
varitas:Variant Calling in Targeted Analysis Sequencing Data
Multi-caller variant analysis pipeline for targeted analysis sequencing (TAS) data. Features a modular, automated workflow that can start with raw reads and produces a user-friendly PDF summary and a spreadsheet containing consensus variant information.
Maintained by Adam Mills. Last updated 4 years ago.
1.9 match 2.30 score
cran
MSclust:Multiple-Scaled Clustering
Model based clustering using the multivariate multiple Scaled t (MST) and multivariate multiple scaled contaminated normal (MSCN) distributions. The MST is an extension of the multivariate Student-t distribution to include flexible tail behaviors, Forbes, F. & Wraith, D. (2014) <doi:10.1007/s11222-013-9414-4>. The MSCN represents a heavy-tailed generalization of the multivariate normal (MN) distribution to model elliptical contoured scatters in the presence of mild outliers (also referred to as "bad" points) and automatically detect bad points, Punzo, A. & Tortora, C. (2021) <doi:10.1177/1471082X19890935>.
Maintained by Cristina Tortora. Last updated 11 months ago.
4.1 match 1.00 score
aipao757
InfiniumPurify:Estimate and Account for Tumor Purity in Cancer Methylation Data Analysis
The proportion of cancer cells in a solid tumor sample, known as the tumor purity, has an adverse impact on a variety of data analyses if not properly accounted for. We develop 'InfiniumPurify', which is a comprehensive R package for estimating and accounting for tumor purity based on DNA methylation Infinium 450k array data. 'InfiniumPurify' provides functionalities for tumor purity estimation. In addition, it can perform differential methylation detection and tumor sample clustering with the consideration of tumor purities.
Maintained by Yufang Qin. Last updated 8 years ago.
1.7 match 2.00 score 7 scripts
hungtong
MixtureMissing:Robust and Flexible Model-Based Clustering for Data Sets with Missing Values at Random
Implementations of various robust and flexible model-based clustering methods for data sets with missing values at random. Two main models are: Multivariate Contaminated Normal Mixture (MCNM, Tong and Tortora, 2022, <doi:10.1007/s11634-021-00476-1>) and Multivariate Generalized Hyperbolic Mixture (MGHM, Wei et al., 2019, <doi:10.1016/j.csda.2018.08.016>). Mixtures via some special or limiting cases of the multivariate generalized hyperbolic distribution are also included: Normal-Inverse Gaussian, Symmetric Normal-Inverse Gaussian, Skew-Cauchy, Cauchy, Skew-t, Student's t, Normal, Symmetric Generalized Hyperbolic, Hyperbolic Univariate Marginals, Hyperbolic, and Symmetric Hyperbolic.
Maintained by Hung Tong. Last updated 1 month ago.
2.3 match 1.48 score 5 scripts
markbaas
con2lki:Calculate the Dutch Air Quality Index (LKI)
Calculates the Dutch air quality index (LKI). This index was created on the basis of scientific studies of the health effects of air pollution. From these studies it can be deduced at what concentrations a certain percentage of the population can be affected. For more information see: <https://www.rivm.nl/bibliotheek/rapporten/2014-0050.pdf>.
Maintained by Mark Baas. Last updated 4 years ago.
2.0 match 1.70 score 3 scripts
kolesarm
multe:Multiple Treatment Effects Regression
Implements contamination bias diagnostics and alternative estimators for regressions with multiple treatments. The implementation is based on Goldsmith-Pinkham, Hull, and Kolesár (2024) <doi:10.48550/arXiv.2106.05024>.
Maintained by Michal Kolesár. Last updated 3 months ago.
0.5 match 16 stars 5.51 score 2 scripts
bioc
SeqSQC:A bioconductor package for sample quality check with next generation sequencing data
SeqSQC is designed to identify problematic samples in NGS data, including samples with gender mismatch, contamination, cryptic relatedness, and population outliers.
Maintained by Qian Liu. Last updated 5 months ago.
experiment datahomo_sapiens_datasequencing dataproject1000genomesgenome
0.5 match 5.38 score 2 scripts
bioc
strandCheckR:Calculate strandness information of a bam file
This package aims to quantify and remove putative double strand DNA from a strand-specific RNA sample. There are also options and methods to plot the positive/negative proportions of all sliding windows, which allow users to have an idea of how much the sample was contaminated and the appropriate threshold to be used for filtering.
Maintained by Thu-Hien To. Last updated 5 months ago.
rnaseqalignmentqualitycontrolcoverageimmunooncology
0.5 match 4.78 score 7 scripts
bioc
SMAD:Statistical Modelling of AP-MS Data (SMAD)
Assigning probability scores to protein interactions captured in affinity purification mass spectrometry (AP-MS) experiments to infer protein-protein interactions. The output would facilitate non-specific background removal as contaminants are commonly found in AP-MS data.
Maintained by Qingzhou Zhang. Last updated 5 months ago.
massspectrometryproteomicssoftwarecpp
0.5 match 4.60 score 3 scripts
bioc
Uniquorn:Identification of cancer cell lines based on their weighted mutational/ variational fingerprint
'Uniquorn' enables users to identify cancer cell lines. Cancer cell line misidentification and cross-contamination represent a significant challenge for cancer researchers. The identification is vital and, in this package, is based on the locations/loci of somatic and germline mutations/variations. The input format is vcf/vcf.gz and the files have to contain a single cancer cell line sample (i.e. a single member/genotype/gt column in the vcf file).
Maintained by Raik Otto. Last updated 5 months ago.
immunooncologystatisticalmethodwholegenomeexomeseq
0.5 match 4.30 score
cran
MatrixHMM:Parsimonious Families of Hidden Markov Models for Matrix-Variate Longitudinal Data
Implements three families of parsimonious hidden Markov models (HMMs) for matrix-variate longitudinal data using the Expectation-Conditional Maximization (ECM) algorithm. The package supports matrix-variate normal, t, and contaminated normal distributions as emission distributions. For each hidden state, parsimony is achieved through the eigen-decomposition of the covariance matrices associated with the emission distribution. This approach results in a comprehensive set of 98 parsimonious HMMs for each type of emission distribution. Atypical matrix detection is also supported, utilizing the fitted (heavy-tailed) models.
Maintained by Salvatore D. Tomarchio. Last updated 7 months ago.
2.1 match 1.00 score
cran
SeleMix:Selective Editing via Mixture Models
Detection of outliers and influential errors using a latent variable model.
Maintained by Teresa Buglielli. Last updated 2 months ago.
2.0 match 1.04 score 11 scripts
bioc
Rbec:Rbec: a tool for analysis of amplicon sequencing data from synthetic microbial communities
Rbec is an adapted version of DADA2 for analyzing amplicon sequencing data from synthetic communities (SynComs), where the reference sequences for each strain exist. Rbec can not only accurately profile the microbial compositions in SynComs, but also predict the contaminants in SynCom samples.
Maintained by Pengfan Zhang. Last updated 5 months ago.
sequencingmicrobialstrainmicrobiomecpp
0.5 match 4.00 score 1 scripts
sb452
MRZero:Diet Mendelian Randomization
Encodes several methods for performing Mendelian randomization analyses with summarized data. Similar to the 'MendelianRandomization' package, but with fewer bells and whistles, and less frequent updates. As described in Yavorska (2017) <doi:10.1093/ije/dyx034> and Broadbent (2020) <doi:10.12688/wellcomeopenres.16374.2>.
Maintained by Stephen Burgess. Last updated 11 months ago.
2.0 match 1.00 score
phamdn
mbRes:Exploration of Multiple Biomarker Responses using Effect Size
Summarize multiple biomarker responses of aquatic organisms to contaminants using Cliff’s delta, as described in Pham & Sokolova (2023) <doi:10.1002/ieam.4676>.
Maintained by Duy Nghia Pham. Last updated 7 months ago.
0.5 match 3.00 score 1 scripts
xilustat
Bayenet:Bayesian Quantile Elastic Net for Genetic Study
As heavy-tailed error distributions and outliers in the response variable widely exist, models which are robust to data contamination are highly demanded. Here, we develop a novel robust Bayesian variable selection method with elastic net penalty for quantile regression in genetic analysis. In particular, the spike-and-slab priors have been incorporated to impose sparsity. An efficient Gibbs sampler has been developed to facilitate computation. The core modules of the package have been developed in 'C++' and R.
Maintained by Xi Lu. Last updated 11 months ago.
0.5 match 2.70 score
xilustat
marble:Robust Marginal Bayesian Variable Selection for Gene-Environment Interactions
Recently, multiple marginal variable selection methods have been developed and shown to be effective in Gene-Environment interactions studies. We propose a novel marginal Bayesian variable selection method for Gene-Environment interactions studies. In particular, our marginal Bayesian method is robust to data contamination and outliers in the outcome variables. With the incorporation of spike-and-slab priors, we have implemented the Gibbs sampler based on Markov Chain Monte Carlo. The core algorithms of the package have been developed in 'C++'.
Maintained by Xi Lu. Last updated 11 months ago.
0.5 match 2.70 score 2 scripts
chedgala
lqr:Robust Linear Quantile Regression
It fits a robust linear quantile regression model using a new family of zero-quantile distributions for the error term. Missing values and censored observations can be handled as well. This family of distributions includes skewed versions of the Normal, Student's t, Laplace, Slash and Contaminated Normal distribution. It also performs logistic quantile regression for bounded responses as shown in Galarza et al. (2020) <doi:10.1007/s13571-020-00231-0>. It provides estimates and full inference. It also provides envelope plots for assessing the fit and confidence bands when several quantiles are provided simultaneously.
Maintained by Christian E. Galarza. Last updated 8 months ago.
0.5 match 1 stars 2.08 score 9 scripts 2 dependents
roger0268
octopucs:Statistical Support for Hierarchical Clusters
Generates n hierarchical clustering hypotheses on subsets of classifiers (usually species in community ecology studies). The n clustering hypotheses are combined to generate a generalized cluster, and three metrics of support are computed: 1) the average proportion of elements conforming the group in each of the n clusters (integrity); 2) the contamination, i.e., the average proportion of elements from other groups that enter a focal group; and 3) the probability of existence of the group given the integrity and contamination in a Bayesian approach.
Maintained by Roger Guevara. Last updated 7 months ago.
0.8 match 1.30 score
cran
ssym:Fitting Semi-Parametric log-Symmetric Regression Models
Set of tools to fit a semi-parametric regression model suitable for analysis of data sets in which the response variable is continuous, strictly positive, asymmetric and possibly, censored. Under this setup, both the median and the skewness of the response variable distribution are explicitly modeled by using semi-parametric functions, whose non-parametric components may be approximated by natural cubic splines or P-splines. Supported distributions for the model error include log-normal, log-Student-t, log-power-exponential, log-hyperbolic, log-contaminated-normal, log-slash, Birnbaum-Saunders and Birnbaum-Saunders-t distributions.
Maintained by Luis Hernando Vanegas. Last updated 2 years ago.
0.5 match 1.48 score 1 dependents
claudioagostinelli
robustvarComp:Robust Estimation of Variance Component Models
Robust Estimation of Variance Component Models by classic and composite robust procedures. The composite procedures are robust against outliers generated by the Independent Contamination Model.
Maintained by Claudio Agostinelli. Last updated 2 years ago.
0.5 match 1.38 score 24 scripts
cran
TSMN:Truncated Scale Mixtures of Normal Distributions
Return the first four moments of the SMN distributions (Normal, Student-t, Pearson VII, Slash or Contaminated Normal).
Maintained by Eraldo B. dos Anjos Filho. Last updated 8 years ago.
0.5 match 1.00 score
cran
TSMSN:Truncated Scale Mixtures of Skew-Normal Distributions
Return the first four moments, estimation of parameters and sample of the TSMSN distributions (Skew Normal, Skew t, Skew Slash or Skew Contaminated Normal).
Maintained by Eraldo B. dos Anjos Filho. Last updated 6 years ago.
0.5 match 1.00 score
cran
Inquilab:Dissipation Kinetics Analysis, Half Life Period, Rate Constant, Plots
For environmental chemists, ecologists, researchers and agricultural scientists to understand the dissipation kinetics, calculate the half-life periods and rate constants of compounds, pesticides, contaminants in different matrices.
Maintained by Jajati Mandal. Last updated 1 year ago.
0.5 match 1.00 score
henriqueest
TVMM:Multivariate Tests for the Vector of Means
This is an interactive statistical tool that provides multivariate statistical tests that are more powerful than the traditional Hotelling T2 test and LRT (likelihood ratio test) for the vector of normal mean populations with and without contamination and non-normal populations (Henrique J. P. Alves & Daniel F. Ferreira (2019) <doi:10.1080/03610918.2019.1693596>).
Maintained by Henrique Jose de Paula Alves. Last updated 4 years ago.
0.5 match 1.00 score 3 scripts
dk657
spedecon:Smoothness-Penalized Deconvolution for Density Estimation Under Measurement Error
Implements the Smoothness-Penalized Deconvolution method of Kent and Ruppert (2023) <doi:10.1080/01621459.2023.2259028> for estimating a probability density under measurement error. The estimator is formed by computing a histogram of the error-contaminated data, and then finding an estimate that minimizes a reconstruction error plus a smoothness-inducing penalty term. The primary function, sped(), takes the data and error distribution, and returns the estimator as a function.
Maintained by David Kent. Last updated 1 year ago.
0.5 match 1.00 score
lirong95
LqG:Robust Group Variable Screening Based on Maximum Lq-Likelihood Estimation
Produces a group screening procedure that is based on maximum Lq-likelihood estimation, to simultaneously account for the group structure and data contamination in variable screening. The methods are described in Li, Y., Li, R., Qin, Y., Lin, C., & Yang, Y. (2021) Robust Group Variable Screening Based on Maximum Lq-likelihood Estimation. Statistics in Medicine, 40:6818-6834. <doi:10.1002/sim.9212>.
Maintained by Rong Li. Last updated 3 years ago.
0.5 match 1.00 score
gallaump
MatrixMixtures:Model-Based Clustering via Matrix-Variate Mixture Models
Implements finite mixtures of matrix-variate contaminated normal distributions via expectation conditional-maximization algorithm for model-based clustering, as described in Tomarchio et al. (2020) <arXiv:2005.03861>. One key advantage of this model is the ability to automatically detect potential outlying matrices by computing their a posteriori probability of being typical or atypical points. Finite mixtures of matrix-variate t and matrix-variate normal distributions are also implemented by using expectation-maximization algorithms.
Maintained by Michael P.B. Gallaugher. Last updated 4 years ago.
0.5 match 1.00 score