Showing 200 of 610 results
jkrijthe
RSSL:Implementations of Semi-Supervised Learning Approaches for Classification
A collection of implementations of semi-supervised classifiers and methods to evaluate their performance. The package includes implementations of, among others, Implicitly Constrained Learning, Moment Constrained Learning, the Transductive SVM, Manifold regularization, Maximum Contrastive Pessimistic Likelihood estimation, S4VM and WellSVM.
Maintained by Jesse Krijthe. Last updated 1 year ago.
60.3 match 58 stars 6.05 score 128 scripts 1 dependent
bioc
genefu:Computation of Gene Expression-Based Signatures in Breast Cancer
This package contains functions implementing various tasks usually required by gene expression analysis, especially in breast cancer studies: gene mapping between different microarray platforms, identification of molecular subtypes, implementation of published gene signatures, gene selection, and survival analysis.
Maintained by Benjamin Haibe-Kains. Last updated 4 months ago.
differentialexpression, geneexpression, visualization, clustering, classification
26.3 match 7.42 score 193 scripts 3 dependents
computationalstylistics
stylo:Stylometric Multivariate Analyses
Supervised and unsupervised multivariate methods, supplemented by GUI and some visualizations, to perform various analyses in the field of computational stylistics, authorship attribution, etc. For further reference, see Eder et al. (2016), <https://journal.r-project.org/archive/2016/RJ-2016-007/index.html>. You are also encouraged to visit the Computational Stylistics Group's website <https://computationalstylistics.github.io/>, where a reasonable amount of information about the package and related projects is provided.
Maintained by Maciej Eder. Last updated 2 months ago.
22.5 match 186 stars 8.59 score 462 scripts
pokotylo
ddalpha:Depth-Based Classification and Calculation of Data Depth
Contains procedures for depth-based supervised learning, which are entirely non-parametric, in particular the DDalpha-procedure (Lange, Mosler and Mozharovskyi, 2014 <doi:10.1007/s00362-012-0488-4>). The training data sample is transformed by a statistical depth function to a compact low-dimensional space, where the final classification is done. It also offers an extension to functional data and routines for calculating certain notions of statistical depth functions. 50 multivariate and 5 functional classification problems are included. See Pokotylo, Mozharovskyi and Dyckerhoff (2019) <doi:10.18637/jss.v091.i05>.
Maintained by Oleksii Pokotylo. Last updated 6 months ago.
40.1 match 2 stars 4.40 score 211 scripts 7 dependents
bioc
geneClassifiers:Application of gene classifiers
This package aims to make classifiers that have been published in the literature easy to apply, using an ExpressionSet as input.
Maintained by R Kuiper. Last updated 5 months ago.
geneexpression, biomedicalinformatics, classification, survival, microarray
37.8 match 1 star 4.62 score 35 scripts
retowuest
autoMrP:Improving MrP with Ensemble Learning
A tool that improves the prediction performance of multilevel regression with post-stratification (MrP) by combining a number of machine learning methods. For information on the method, please refer to Broniecki, Wüest, Leemann (2020) ''Improving Multilevel Regression with Post-Stratification Through Machine Learning (autoMrP)'' in the 'Journal of Politics'. Final pre-print version: <https://lucasleemann.files.wordpress.com/2020/07/automrp-r2pa.pdf>.
Maintained by Philipp Broniecki. Last updated 5 months ago.
29.3 match 27 stars 5.61 score
majkamichal
naivebayes:High Performance Implementation of the Naive Bayes Algorithm
In this implementation of the Naive Bayes classifier, the following class-conditional distributions are available: 'Bernoulli', 'Categorical', 'Gaussian', 'Poisson', 'Multinomial' and a non-parametric representation of the class-conditional density estimated via Kernel Density Estimation. Implemented classifiers handle missing data and can take advantage of sparse data.
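As a language-neutral illustration of the 'Gaussian' case (a Python sketch, not the naivebayes R API), each class models every feature as an independent normal density. The predicted class is the one that maximises log prior plus the summed log densities. `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical names, and the sketch does not handle missing or sparse data.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate class log-priors and per-feature means/variances (hypothetical helper)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # Population variance, floored to avoid division by zero.
        variances = [max(sum((v - m) ** 2 for v in c) / len(c), 1e-9)
                     for c, m in zip(cols, means)]
        model[label] = (math.log(len(rows) / len(y)), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior (up to a shared constant)."""
    def log_post(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
    return max(model, key=lambda label: log_post(model[label]))

X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.1]))  # a
```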
Maintained by Michal Majka. Last updated 1 month ago.
classification-model, datascience, machine-learning, naive-bayes
13.9 match 37 stars 10.47 score 1.0k scripts 6 dependents
bupaverse
bupaR:Business Process Analysis in R
Comprehensive Business Process Analysis toolkit. Creates an S3 class for event log objects, and related handler functions. Imports related packages for filtering event data, computation of descriptive statistics, handling of 'Petri Net' objects and visualization of process maps. See also packages 'edeaR', 'processmapR', 'eventdataR' and 'processmonitR'.
Maintained by Gert Janssenswillen. Last updated 2 years ago.
15.3 match 55 stars 9.07 score 389 scripts 11 dependents
bnaras
pamr:Pam: Prediction Analysis for Microarrays
Some functions for sample classification in microarrays.
Maintained by Balasubramanian Narasimhan. Last updated 9 months ago.
15.1 match 7.90 score 256 scripts 14 dependents
bmihaljevic
bnclassify:Learning Discrete Bayesian Network Classifiers from Data
State-of-the-art algorithms for learning discrete Bayesian network classifiers from data, including a number of those described in Bielza & Larranaga (2014) <doi:10.1145/2576868>, with functions for prediction, model evaluation and inspection.
Maintained by Mihaljevic Bojan. Last updated 1 year ago.
17.1 match 18 stars 6.85 score 66 scripts
bioc
switchBox:Utilities to train and validate classifiers based on pair switching using the K-Top-Scoring-Pair (KTSP) algorithm
The package offers different classifiers based on comparisons of pairs of features (TSP), using various decision rules (e.g., the majority-wins principle).
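The pair-switching rule compares two features within a single sample. If feature i is lower than feature j, the pair votes for one class; otherwise it votes for the other. A k-TSP classifier takes a majority vote over k such pairs. The Python sketch below assumes the pairs have already been selected, so it omits switchBox's training step and R function names. `ktsp_predict` is a hypothetical name.

```python
def ktsp_predict(sample, pairs, labels=("class1", "class2")):
    """Majority vote over top-scoring pairs (hypothetical helper).

    sample: feature values for one observation.
    pairs:  (i, j) index pairs; each votes labels[0] when
            sample[i] < sample[j], otherwise labels[1].
    """
    votes = sum(1 for i, j in pairs if sample[i] < sample[j])
    # Ties (possible when k is even) go to labels[1] in this sketch.
    return labels[0] if votes > len(pairs) / 2 else labels[1]

pairs = [(0, 1), (2, 3), (4, 5)]
print(ktsp_predict([1, 5, 2, 9, 7, 3], pairs))  # class1 (two of three pairs vote for it)
```

An odd k avoids ties, which is why k-TSP implementations commonly restrict k to odd values.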
Maintained by Bahman Afsari. Last updated 5 months ago.
software, statisticalmethod, classification
26.0 match 4.30 score 11 scripts 1 dependent
e-sensing
sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes
An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, and NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<https://stacspec.org/>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organizing maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce imbalance in training samples proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <https://torch.mlverse.org/>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <https://jakubnowosad.com/supercells/>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>.
Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.
Maintained by Gilberto Camara. Last updated 1 month ago.
big-earth-data, cbers, earth-observation, eo-datacubes, geospatial, image-time-series, land-cover-classification, landsat, planetary-computer, r-spatial, remote-sensing, rspatial, satellite-image-time-series, satellite-imagery, sentinel-2, stac-api, stac-catalog, cpp
11.4 match 494 stars 9.50 score 384 scripts
bioc
twoddpcr:Classify 2-d Droplet Digital PCR (ddPCR) data and quantify the number of starting molecules
The twoddpcr package takes Droplet Digital PCR (ddPCR) droplet amplitude data from Bio-Rad's QuantaSoft and can classify the droplets. A summary of the positive/negative droplet counts can be generated, which can then be used to estimate the number of molecules using the Poisson distribution. This is the first open source package that facilitates the automatic classification of general two-channel ddPCR data. Previous work includes 'definetherain' (Jones et al., 2014) and 'ddpcRquant' (Trypsteen et al., 2015), both of which handle only one-channel ddPCR experiments. The 'ddpcr' package available on CRAN (Attali et al., 2016) supports automatic gating of only a specific class of two-channel ddPCR experiments.
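The Poisson step rests on one relation. If molecules fall randomly into droplets, the expected fraction of negative droplets is exp(-λ), where λ is the mean number of copies per droplet. It follows that λ = -ln(N_neg / N_total), and the total number of starting molecules is λ × N_total. The Python sketch below shows only that arithmetic, applied to one channel's counts; it is not the twoddpcr API, and `poisson_copies` is a hypothetical name.

```python
import math

def poisson_copies(n_positive, n_negative):
    """Estimate mean copies per droplet and total starting molecules.

    Assumes molecules are Poisson-distributed across droplets, so the
    expected fraction of negative droplets is exp(-lam).
    """
    n_total = n_positive + n_negative
    if n_negative == 0:
        raise ValueError("all droplets positive: the Poisson estimate is unbounded")
    lam = -math.log(n_negative / n_total)  # mean copies per droplet
    return lam, lam * n_total              # (copies per droplet, total copies)

lam, copies = poisson_copies(n_positive=10000, n_negative=10000)
print(round(lam, 4), round(copies))  # 0.6931 13863
```

With half the droplets positive, the estimate exceeds the positive-droplet count. The Poisson correction accounts for droplets that received more than one molecule.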
Maintained by Anthony Chiu. Last updated 5 months ago.
18.6 match 10 stars 5.78 score 4 scripts
bioc
TCGAbiolinks:TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data
The aims of TCGAbiolinks are: i) to facilitate GDC open-access data retrieval, ii) to prepare the data using the appropriate pre-processing strategies, iii) to provide the means to carry out different standard analyses, and iv) to make it easy to reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) so that complete analysis pipelines can be developed easily.
Maintained by Tiago Chedraoui Silva. Last updated 26 days ago.
dnamethylation, differentialmethylation, generegulation, geneexpression, methylationarray, differentialexpression, pathways, network, sequencing, survival, software, bioc, bioconductor, gdc, integrative-analysis, tcga, tcga-data, tcgabiolinks
7.3 match 305 stars 14.45 score 1.6k scripts 6 dependents
rfastofficial
Rfast2:A Collection of Efficient and Extremely Fast R Functions II
A collection of fast statistical and utility functions for data analysis. Functions for regression, maximum likelihood, column-wise statistics and many more have been included. C++ has been utilized to speed up the functions. References: Tsagris M., Papadakis M. (2018). Taking R to its limits: 70+ tips. PeerJ Preprints 6:e26605v1 <doi:10.7287/peerj.preprints.26605v1>.
Maintained by Manos Papadakis. Last updated 1 year ago.
12.7 match 38 stars 8.09 score 75 scripts 26 dependents
bioimaginggroup
nucim:Nucleome Imaging Toolbox
Tools for 4D nucleome imaging. Quantitative analysis of the 3D nuclear landscape recorded with super-resolved fluorescence microscopy. See Volker J. Schmid, Marion Cremer, Thomas Cremer (2017) <doi:10.1016/j.ymeth.2017.03.013>.
Maintained by Volker Schmid. Last updated 3 years ago.
22.5 match 2 stars 4.48 score 7 scripts
opengeos
whitebox:'WhiteboxTools' R Frontend
An R frontend for the 'WhiteboxTools' library, which is an advanced geospatial data analysis platform developed by Prof. John Lindsay at the University of Guelph's Geomorphometry and Hydrogeomatics Research Group. 'WhiteboxTools' can be used to perform common geographical information systems (GIS) analysis operations, such as cost-distance analysis, distance buffering, and raster reclassification. Remote sensing and image processing tasks include image enhancement (e.g. panchromatic sharpening, contrast adjustments), image mosaicing, numerous filtering operations, simple classification (k-means), and common image transformations. 'WhiteboxTools' also contains advanced tooling for spatial hydrological analysis (e.g. flow-accumulation, watershed delineation, stream network analysis, sink removal), terrain analysis (e.g. common terrain indices such as slope, curvatures, wetness index, hillshading; hypsometric analysis; multi-scale topographic position analysis), and LiDAR data processing. Suggested citation: Lindsay (2016) <doi:10.1016/j.cageo.2016.07.003>.
Maintained by Andrew Brown. Last updated 5 months ago.
geomorphometry, geoprocessing, geospatial, gis, hydrology, remote-sensing, rstudio
10.3 match 173 stars 9.65 score 203 scripts 2 dependents
bioc
doubletrouble:Identification and classification of duplicated genes
doubletrouble aims to identify duplicated genes from whole-genome protein sequences and classify them based on their modes of duplication. The duplication modes are i. segmental duplication (SD); ii. tandem duplication (TD); iii. proximal duplication (PD); iv. transposed duplication (TRD); and v. dispersed duplication (DD). Transposon-derived duplicates (TRD) can be further subdivided into rTRD (retrotransposon-derived duplication) and dTRD (DNA transposon-derived duplication). If users want a simpler classification scheme, duplicates can also be classified into SD- and SSD-derived (small-scale duplication) gene pairs. Besides classifying gene pairs, users can also classify genes, so that each gene is assigned a unique mode of duplication. Users can also calculate substitution rates per substitution site (i.e., Ka and Ks) from duplicate pairs, find peaks in Ks distributions with Gaussian Mixture Models (GMMs), and classify gene pairs into age groups based on Ks peaks.
Maintained by Fabrício Almeida-Silva. Last updated 3 days ago.
software, wholegenome, comparativegenomics, functionalgenomics, phylogenetics, network, classification, bioinformatics, comparative-genomics, gene-duplication, molecular-evolution, whole-genome-duplication
15.0 match 23 stars 6.44 score 17 scripts
bioc
CMA:Synthesis of microarray-based classification
This package provides a comprehensive collection of various microarray-based classification algorithms both from Machine Learning and Statistics. Variable Selection, Hyperparameter tuning, Evaluation and Comparison can be performed combined or stepwise in a user-friendly environment.
Maintained by Roman Hornung. Last updated 5 months ago.
18.6 match 5.09 score 61 scripts
bioc
scAnnotatR:Pretrained learning models for cell type prediction on single cell RNA-sequencing data
The package comprises a set of pretrained machine learning models to predict basic immune cell types. This enables all users to quickly get a first annotation of the cell types present in their dataset without requiring prior knowledge. scAnnotatR also allows users to train their own models to predict new cell types based on specific research needs.
Maintained by Johannes Griss. Last updated 5 months ago.
singlecell, transcriptomics, geneexpression, supportvectormachine, classification, software
13.1 match 15 stars 6.73 score 20 scripts
ipa-tys
ROCR:Visualizing the Performance of Scoring Classifiers
ROC graphs, sensitivity/specificity curves, lift charts, and precision/recall plots are popular examples of trade-off visualizations for specific pairs of performance measures. ROCR is a flexible tool for creating cutoff-parameterized 2D performance curves by freely combining two from over 25 performance measures (new performance measures can be added using a standard interface). Curves from different cross-validation or bootstrapping runs can be averaged by different methods, and standard deviations, standard errors or box plots can be used to visualize the variability across the runs. The parameterization can be visualized by printing cutoff values at the corresponding curve positions, or by coloring the curve according to cutoff. All components of a performance plot can be quickly adjusted using a flexible parameter dispatching mechanism. Despite its flexibility, ROCR is easy to use, with only three commands and reasonable default values for all optional parameters.
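The cutoff parameterisation behind a ROC curve works like this: each distinct score serves as a cutoff, every observation scoring at or above it is predicted positive, and the resulting false positive rate and true positive rate give one point on the curve. The Python sketch below only illustrates that idea and is not ROCR's R interface. `roc_points` is a hypothetical name.

```python
def roc_points(scores, labels):
    """Return (cutoff, FPR, TPR) triples, one per distinct score used as a cutoff.

    labels are 1 for positive and 0 for negative; a score >= cutoff is
    predicted positive. Assumes both classes are present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # A cutoff above every score predicts nothing positive: the (0, 0) corner.
    points = [(float("inf"), 0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 0)
        points.append((cutoff, fp / n_neg, tp / n_pos))
    return points

for cutoff, fpr, tpr in roc_points([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0]):
    print(cutoff, fpr, tpr)
```

Keeping the cutoff in each triple is what allows a plot to print cutoff values along the curve or to colour the curve by cutoff, as described above.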
Maintained by Felix G.M. Ernst. Last updated 12 months ago.
6.1 match 38 stars 14.29 score 9.2k scripts 217 dependents
rspatial
terra:Spatial Data Analysis
Methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data. Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data. Processing of very large files is supported. See the manual and tutorials on <https://rspatial.org/> to get started. 'terra' replaces the 'raster' package ('terra' can do more, and it is faster and easier to use).
Maintained by Robert J. Hijmans. Last updated 9 hours ago.
geospatial, raster, spatial, vector, onetbb, proj, gdal, geos, cpp
4.9 match 559 stars 17.64 score 17k scripts 851 dependents
kurthornik
RWeka:R/Weka Interface
An R interface to Weka (Version 3.9.3). Weka is a collection of machine learning algorithms for data mining tasks written in Java, containing tools for data pre-processing, classification, regression, clustering, association rules, and visualization. Package 'RWeka' contains the interface code, the Weka jar is in a separate package 'RWekajars'. For more information on Weka see <https://www.cs.waikato.ac.nz/ml/weka/>.
Maintained by Kurt Hornik. Last updated 2 years ago.
9.3 match 4 stars 8.24 score 1.8k scripts 14 dependents
rrwen
nbc4va:Bayes Classifier for Verbal Autopsy Data
An implementation of the Naive Bayes Classifier (NBC) algorithm used for Verbal Autopsy (VA) built on code from Miasnikof et al (2015) <DOI:10.1186/s12916-015-0521-2>.
Maintained by Richard Wen. Last updated 3 years ago.
autopsy, bayes, cause, classifier, coded, computer, death, estimate, imputation, learning, machine, mds, million, naive, nbc, probability, study, theory, va, verbal
16.6 match 4.60 score 79 scripts
bioc
PDATK:Pancreatic Ductal Adenocarcinoma Tool-Kit
Pancreatic ductal adenocarcinoma (PDA) has a relatively poor prognosis and is one of the most lethal cancers. Molecular classification of gene expression profiles holds the potential to identify meaningful subtypes which can inform therapeutic strategy in the clinical setting. The Pancreatic Ductal Adenocarcinoma Tool-Kit (PDATK) provides an S4 class-based interface for performing unsupervised subtype discovery, cross-cohort meta-clustering, gene-expression-based classification, and subsequent survival analysis to identify prognostically useful subtypes in pancreatic cancer and beyond. Two novel methods, Consensus Subtypes in Pancreatic Cancer (CSPC) and Pancreatic Cancer Overall Survival Predictor (PCOSP), are included for consensus-based meta-clustering and overall-survival prediction, respectively. Additionally, four published subtype classifiers and three published prognostic gene signatures are included to allow users to easily recreate published results, apply existing classifiers to new data, and benchmark the relative performance of new methods. The use of existing Bioconductor classes as input to all PDATK classes and methods enables integration with existing Bioconductor datasets, including the 21 pancreatic cancer patient cohorts available in the MetaGxPancreas data package. PDATK has been used to replicate results from Sandhu et al (2019) [https://doi.org/10.1200/cci.18.00102] and an additional paper is in the works using CSPC to validate subtypes from the included published classifiers, both of which use the data available in MetaGxPancreas. The inclusion of subtype centroids and prognostic gene signatures from these and other publications will enable researchers and clinicians to classify novel patient gene expression data, allowing the direct clinical application of the classifiers included in PDATK.
Overall, PDATK provides a rich set of tools to identify and validate useful prognostic and molecular subtypes based on gene-expression data, benchmark new classifiers against existing ones, and apply discovered classifiers on novel patient data to inform clinical decision making.
Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.
geneexpression, pharmacogenetics, pharmacogenomics, software, classification, survival, clustering, geneprediction
16.6 match 1 star 4.31 score 17 scripts
rivolli
utiml:Utilities for Multi-Label Learning
Multi-label learning strategies and other procedures to support multi-label classification in R. The package provides a set of multi-label procedures such as sampling methods, transformation strategies, threshold functions, pre-processing techniques and evaluation metrics. A complete overview of the matter can be seen in Zhang, M. and Zhou, Z. (2014) <doi:10.1109/TKDE.2013.39> and Gibaja, E. and Ventura, S. (2015) A Tutorial on Multi-label Learning.
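Hamming loss is one of the standard multi-label evaluation metrics. It is the fraction of individual label assignments that are wrong, averaged over all instances and labels. The Python sketch below takes 0/1 label matrices and is illustrative only; it does not reproduce utiml's R evaluation functions. `hamming_loss` is a hypothetical name.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of mismatched labels over all instances and labels.

    y_true, y_pred: lists of equal-length 0/1 rows, one row per instance.
    """
    n_instances = len(y_true)
    n_labels = len(y_true[0])
    mismatches = sum(
        t != p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    return mismatches / (n_instances * n_labels)

y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 1], [0, 1, 0]]
print(hamming_loss(y_true, y_pred))  # 1 mismatch out of 6 label assignments
```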
Maintained by Adriano Rivolli. Last updated 4 years ago.
10.9 match 28 stars 6.39 score 87 scripts
quanteda
quanteda.textmodels:Scaling Models and Classifiers for Textual Data
Scaling models and classifiers for sparse matrix objects representing textual data in the form of a document-feature matrix. Includes original implementations of the 'Wordscores' model of Laver, Benoit, and Garry (2003) <doi:10.1017/S0003055403000698>, the Perry and Benoit (2017) <doi:10.48550/arXiv.1710.08963> class affinity scaling model, and the Slapin and Proksch (2008) <doi:10.1111/j.1540-5907.2008.00338.x> 'wordfish' model, as well as methods for correspondence analysis, latent semantic analysis, and fast Naive Bayes and linear 'SVMs' specially designed for sparse textual data.
Maintained by Kenneth Benoit. Last updated 1 month ago.
7.3 match 42 stars 9.56 score 432 scripts
cran
Modeler:Classes and Methods for Training and Using Binary Prediction Models
Defines classes and methods to learn models and use them to predict binary outcomes. These are generic tools, but we also include specific examples for many common classifiers.
Maintained by Kevin R. Coombes. Last updated 2 months ago.
19.4 match 3.48 score 1 dependent
europeanifcbgroup
iRfcb:Tools for Managing Imaging FlowCytobot (IFCB) Data
A comprehensive suite of tools for managing, processing, and analyzing data from the IFCB. I R FlowCytobot ('iRfcb') supports quality control, geospatial analysis, and preparation of IFCB data for publication in databases like <https://www.gbif.org>, <https://www.obis.org>, <https://emodnet.ec.europa.eu/en>, <https://shark.smhi.se/>, and <https://www.ecotaxa.org>. The package integrates with the MATLAB 'ifcb-analysis' tool, which is described in Sosik and Olson (2007) <doi:10.4319/lom.2007.5.204>, and provides features for working with raw, manually classified, and machine learning–classified image datasets. Key functionalities include image extraction, particle size distribution analysis, taxonomic data handling, and biomass concentration calculations, essential for plankton research.
Maintained by Anders Torstensson. Last updated 2 days ago.
11.7 match 1 star 5.72 score
topepo
sparsediscrim:Sparse and Regularized Discriminant Analysis
A collection of sparse and regularized discriminant analysis methods intended for small-sample, high-dimensional data sets. The package features the High-Dimensional Regularized Discriminant Analysis classifier from Ramey et al. (2017) <arXiv:1602.01182>. Other classifiers include those from Dudoit et al. (2002) <doi:10.1198/016214502753479248>, Pang et al. (2009) <doi:10.1111/j.1541-0420.2009.01200.x>, and Tong et al. (2012) <doi:10.1093/bioinformatics/btr690>.
Maintained by Max Kuhn. Last updated 4 years ago.
16.1 match 3 stars 4.11 score 86 scripts
bioc
adverSCarial:adverSCarial, generate and analyze the vulnerability of scRNA-seq classifiers to adversarial attacks
adverSCarial is an R Package designed for generating and analyzing the vulnerability of scRNA-seq classifiers to adversarial attacks. The package is versatile and provides a format for integrating any type of classifier. It offers functions for studying and generating two types of attacks, single gene attack and max change attack. The single-gene attack involves making a small modification to the input to alter the classification. The max-change attack involves making a large modification to the input without changing its classification. The package provides a comprehensive solution for evaluating the robustness of scRNA-seq classifiers against adversarial attacks.
Maintained by Ghislain FIEVET. Last updated 5 months ago.
software, singlecell, transcriptomics, classification
12.2 match 5.42 score 19 scripts
bergsmat
yamlet:Versatile Curation of Table Metadata
A YAML-based mechanism for working with table metadata. Supports compact syntax for creating, modifying, viewing, exporting, importing, displaying, and plotting metadata coded as column attributes. The 'yamlet' dialect is valid 'YAML' with defaults and conventions chosen to improve readability. See ?yamlet, ?decorate, ?modify, ?io_csv, and ?ggplot.decorated.
Maintained by Tim Bergsma. Last updated 22 days ago.
11.0 match 2 stars 5.99 score 60 scripts 1 dependent
rafaeljm
LibOPF:Design of Optimum-Path Forest Classifiers
The 'LibOPF' is a framework to develop pattern recognition techniques based on optimum-path forests (OPF), João P. Papa and Alexandre X. Falcão (2008) <doi:10.1007/978-3-540-89639-5_89>, with methods for supervised learning and data clustering.
Maintained by Rafael Junqueira Martarelli. Last updated 4 years ago.
20.7 match 1 star 3.18 score
muvisu
biplotEZ:EZ-to-Use Biplots
Provides users with an EZ-to-use platform for representing data with biplots. Currently principal component analysis (PCA), canonical variate analysis (CVA) and simple correspondence analysis (CA) biplots are included. This is accompanied by various formatting options for the samples and axes. Alpha-bags and concentration ellipses are included for visual enhancements and interpretation. For an extensive discussion on the topic, see Gower, J.C., Lubbe, S. and le Roux, N.J. (2011, ISBN: 978-0-470-01255-0) Understanding Biplots. Wiley: Chichester.
Maintained by Sugnet Lubbe. Last updated 6 days ago.
7.8 match 7 stars 8.39 score 30 scripts 1 dependent
emeyers
NeuroDecodeR:Decode Information from Neural Activity
Neural decoding is a method of analyzing neural data that uses pattern classifiers to predict experimental conditions based on neural activity. 'NeuroDecodeR' is a system of objects that makes it easy to run neural decoding analyses. For more information on neural decoding see Meyers & Kreiman (2011) <doi:10.7551/mitpress/8404.003.0024>.
Maintained by Ethan Meyers. Last updated 1 year ago.
10.1 match 12 stars 6.49 score 17 scripts
bbuchsbaum
multivarious:Extensible Data Structures for Multivariate Analysis
Provides a set of basic and extensible data structures and functions for multivariate analysis, including dimensionality reduction techniques, projection methods, and preprocessing functions. The aim of this package is to offer a flexible and user-friendly framework for multivariate analysis that can be easily extended for custom requirements and specific data analysis tasks.
Maintained by Bradley Buchsbaum. Last updated 3 months ago.
18.4 match 3.53 score 17 scripts
leapigufpb
FuzzyClass:Fuzzy and Non-Fuzzy Classifiers
It provides classifiers that can be used for discrete and continuous variables, based on the Naive Bayes and Fuzzy Naive Bayes hypotheses. Those methods were developed by researchers belonging to the 'Laboratory of Technologies for Virtual Teaching and Statistics (LabTEVE)' and 'Laboratory of Applied Statistics to Image Processing and Geoprocessing (LEAPIG)' at 'Federal University of Paraiba, Brazil'. They considered some statistical distributions and their papers were published in the scientific literature, for instance the Gaussian classifier using fuzzy parameters, proposed by 'Moraes, Ferreira and Machado' (2021) <doi:10.1007/s40815-020-00936-4>.
Maintained by Jodavid Ferreira. Last updated 5 months ago.
16.2 match 1 star 4.00 score 10 scripts
wenjie2wang
abclass:Angle-Based Large-Margin Classifiers
Multi-category angle-based large-margin classifiers. See Zhang and Liu (2014) <doi:10.1093/biomet/asu017> for details.
Maintained by Wenjie Wang. Last updated 1 year ago.
21.1 match 2 stars 3.04 score 11 scripts
techtonique
learningmachine:Machine Learning with Explanations and Uncertainty Quantification
Regression-based Machine Learning with explanations and uncertainty quantification.
Maintained by T. Moudiki. Last updated 4 months ago.
conformal-prediction, machine-learning, machine-learning-algorithms, machinelearning, statistical-learning, uncertainty-quantification, cpp
11.5 match 5 stars 5.57 score 21 scripts
bioc
MLInterfaces:Uniform interfaces to R machine learning procedures for data in Bioconductor containers
This package provides uniform interfaces to machine learning code for data in R and Bioconductor containers.
Maintained by Vincent Carey. Last updated 5 months ago.
8.3 match 7.63 score 79 scripts 6 dependents
luca-scr
mclust:Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation
Gaussian finite mixture models fitted via EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.
Maintained by Luca Scrucca. Last updated 11 months ago.
5.2 match 21 stars 12.23 score 6.6k scripts 587 dependents
modeloriented
fairmodels:Flexible Tool for Bias Detection, Visualization, and Mitigation
Measure fairness metrics in one place for many models. Check how big a model's bias is towards different races, sexes, nationalities, etc. Use measures such as Statistical Parity and Equal odds to detect discrimination against unprivileged groups. Visualize the bias using a heatmap, radar plot, biplot, bar chart (and more!). Various pre-processing and post-processing bias mitigation algorithms are implemented. The package also supports calculating fairness metrics for regression models. Find more details in Wiśniewski and Biecek (2021) <arXiv:2104.00507>.
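Statistical parity compares the rate of positive predictions across groups. The Python sketch below computes each group's positive-prediction rate divided by that of a privileged group, then flags groups outside a 0.8 to 1.25 band. That band is the common "four-fifths" convention and is only an assumption here, not necessarily fairmodels' default. `statistical_parity_ratios` is a hypothetical name, and the sketch is not the fairmodels R API.

```python
def statistical_parity_ratios(y_pred, groups, privileged):
    """Positive-prediction rate of each group divided by that of the privileged group.

    y_pred: 0/1 predictions; groups: group label per observation.
    Assumes the privileged group has at least one positive prediction.
    """
    rates = {}
    for g in set(groups):
        preds = [p for p, grp in zip(y_pred, groups) if grp == g]
        rates[g] = sum(preds) / len(preds)
    base = rates[privileged]
    return {g: rate / base for g, rate in rates.items()}

y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["m", "m", "m", "m", "f", "f", "f", "f"]
ratios = statistical_parity_ratios(y_pred, groups, privileged="m")
# Flag groups outside the assumed four-fifths band (0.8 to 1.25).
print({g: r for g, r in ratios.items() if not 0.8 <= r <= 1.25})
```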
Maintained by Jakub Wiśniewski. Last updated 1 month ago.
explain-classifiers, explainable-ml, fairness, fairness-comparison, fairness-ml, model-evaluation
7.5 match 86 stars 7.72 score 51 scripts 1 dependent
bioc
DirichletMultinomial:Dirichlet-Multinomial Mixture Model Machine Learning for Microbiome Data
Dirichlet-multinomial mixture models can be used to describe variability in microbial metagenomic data. This package is an interface to code originally made available by Holmes, Harris, and Quince, 2012, PLoS ONE 7(2): 1-15, as discussed further in the man page for this package, ?DirichletMultinomial.
Maintained by Martin Morgan. Last updated 5 months ago.
immunooncology, microbiome, sequencing, clustering, classification, metagenomics, gsl
5.2 match 11 stars 10.97 score 125 scripts 26 dependents
cran
class:Functions for Classification
Various functions for classification, including k-nearest neighbour, Learning Vector Quantization and Self-Organizing Maps.
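k-nearest neighbour classification assigns a test point the majority class among its k closest training points. The Python sketch below uses Euclidean distance and is illustrative only. It is not a port of the R function `class::knn`, which has its own arguments and tie handling. `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], x),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    # Equal counts keep insertion order, so ties go to the label seen first,
    # i.e. the label of the nearer neighbour.
    return votes.most_common(1)[0][0]

train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, [0.5, 0.5], k=3))  # a
```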
Maintained by Brian Ripley. Last updated 2 months ago.
7.2 match 1 star 7.83 score 2.2k dependents
mothur
phylotypr:Classifying DNA Sequences to Taxonomic Groupings
Classification-based analysis of DNA sequences to taxonomic groupings. This package primarily implements the Naive Bayesian Classifier from the Ribosomal Database Project. This approach has traditionally been used to classify 16S rRNA gene sequences to bacterial taxonomic outlines; however, it can be used for any type of gene sequence. The method was originally described by Wang, Garrity, Tiedje, and Cole in Applied and Environmental Microbiology 73(16):5261-7 <doi:10.1128/AEM.00062-07>. The package also provides functions to read in 'FASTA'-formatted sequence data.
Maintained by Pat Schloss. Last updated 23 days ago.
9.2 match 8 stars 6.08 score 5 scripts
bioc
geNetClassifier:Classify diseases and build associated gene networks using gene expression profiles
Comprehensive package to automatically train and validate a multi-class SVM classifier based on gene expression data. Provides transparent selection of gene markers, their coexpression networks, and an interface to query the classifier.
Maintained by Sara Aibar. Last updated 5 months ago.
classification, differentialexpression, microarray
12.7 match 4.38 score 1 script 2 dependents
easystats
performance:Assessment of Regression Models Performance
Utilities for computing measures to assess model quality, which are not directly provided by R's 'base' or 'stats' packages. These include e.g. measures like r-squared, intraclass correlation coefficient (Nakagawa, Johnson & Schielzeth (2017) <doi:10.1098/rsif.2017.0213>), root mean squared error or functions to check models for overdispersion, singularity or zero-inflation and more. Functions apply to a large variety of regression models, including generalized linear models, mixed effects models and Bayesian models. References: Lüdecke et al. (2021) <doi:10.21105/joss.03139>.
Maintained by Daniel Lüdecke. Last updated 18 days ago.
aic, easystats, hacktoberfest, loo, machine-learning, mixed-models, models, performance, r2, statistics
3.4 match 1.1k stars 16.17 score 4.3k scripts 47 dependents
relund
gMOIP:Tools for 2D and 3D Plots of Single and Multi-Objective Linear/Integer Programming Models
Make 2D and 3D plots of linear programming (LP), integer linear programming (ILP), or mixed integer linear programming (MILP) models with up to three objectives. Plots of both the solution and criterion space are possible, for instance the non-dominated (Pareto) set for bi-objective LP/ILP/MILP models (see vignettes for an overview). The package also contains a function for checking if a point is inside the convex hull.
Maintained by Lars Relund Nielsen. Last updated 5 months ago.
2d-plot, 3d-plot, bi-objective, convex-hull, integer-programming, linear-programming, math, milp, mixed-integer-programming, multi-objective, polytope, tri-objective, visualization
7.1 match 5 stars 7.83 score 79 scripts 3 dependents
moviedo5
fda.usc:Functional Data Analysis and Utilities for Statistical Computing
Routines for exploratory and descriptive analysis of functional data such as depth measurements, atypical curves detection, regression models, supervised classification, unsupervised classification and functional analysis of variance.
Maintained by Manuel Oviedo de la Fuente. Last updated 4 months ago.
functional-data-analysis, fortran
5.7 match 12 stars 9.72 score 560 scripts 22 dependents
larssnip
microclass:Tools for taxonomic classification of prokaryotes
Functions for working with taxonomic classifications in R
Maintained by Lars Snipen. Last updated 1 year ago.
11.8 match 4 stars 4.68 score 20 scripts
ips-lmu
emuR:Main Package of the EMU Speech Database Management System
Provide the EMU Speech Database Management System (EMU-SDMS) with database management, data extraction, data preparation and data visualization facilities. See <https://ips-lmu.github.io/The-EMU-SDMS-Manual/> for more details.
Maintained by Markus Jochim. Last updated 1 year ago.
7.9 match 24 stars 6.89 score 135 scripts 1 dependent
david-cortes
costsensitive:Cost-Sensitive Multi-Class Classification
Reduction-based techniques for cost-sensitive multi-class classification, in which each observation has a different cost for classifying it into one class, and the goal is to predict the class with the minimum expected cost for each new observation. Implements Weighted All-Pairs (Beygelzimer, Langford, & Zadrozny (2008) <doi:10.1007/978-0-387-79361-0_1>), Weighted One-Vs-Rest (Beygelzimer, Dani, Hayes, Langford, & Zadrozny (2005) <https://dl.acm.org/citation.cfm?id=1102358>) and Regression One-Vs-Rest. Works with arbitrary classifiers taking observation weights, or with regressors. Also implements cost-proportionate rejection sampling for working with classifiers that don't accept observation weights.
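Cost-proportionate rejection sampling, mentioned above, turns per-observation weights into a plain (unweighted) training sample: each observation is kept with probability equal to its weight divided by the largest weight. The Python sketch below illustrates only that general idea; it is not code from the R package 'costsensitive', `rejection_sample` is a hypothetical name, and it assumes non-negative weights have already been derived from the costs.

```python
import random


def rejection_sample(examples, weights, seed=0):
    """Hypothetical helper: cost-proportionate rejection sampling (sketch).

    Keeps examples[i] with probability weights[i] / max(weights), so an
    unweighted learner trained on the result sees each example in
    proportion to its weight, on average. Weights must be non-negative.
    """
    if len(examples) != len(weights):
        raise ValueError("examples and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    z = max(weights, default=0)
    if z == 0:
        return []
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1): an example with weight z is always
    # kept, and an example with weight 0 is never kept.
    return [ex for ex, w in zip(examples, weights) if rng.random() < w / z]


data = [("x1", "a"), ("x2", "b"), ("x3", "a"), ("x4", "b")]
costs = [4.0, 1.0, 2.0, 0.0]
print(rejection_sample(data, costs, seed=42))
```

Because any single draw is random, repeating the sampling with different seeds (and combining the resulting classifiers) reduces the variance of one draw; how the package itself derives weights from its cost matrix is not shown here.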
Maintained by David Cortes. Last updated 2 months ago.
cost-sensitive-classification, multi-label-classification
10.2 match 47 stars 5.30 score 28 scripts
bearloga
MLPUGS:Multi-Label Prediction Using Gibbs Sampling (and Classifier Chains)
An implementation of classifier chains (CCs) for multi-label prediction. Users can employ an external package (e.g. 'randomForest', 'C50'), or supply their own. The package can train a single set of CCs or train an ensemble of CCs, in parallel if running in a multi-core environment. New observations are classified using a Gibbs sampler since each unobserved label is conditioned on the others. The package includes methods for evaluating the predictions for accuracy and aggregating across iterations and models to produce binary or probabilistic classifications.
Maintained by Mikhail Popov. Last updated 5 years ago.
classification, machine-learning, mcmc, multi-label-classification, supervised-learning
11.1 match 11 stars 4.74 score 6 scripts
ropensci
gigs:Assess Fetal, Newborn, and Child Growth with International Standards
Convert between anthropometric measures and z-scores/centiles in multiple growth standards, and classify fetal, newborn, and child growth accordingly. With a simple interface to growth standards from the World Health Organisation and International Fetal and Newborn Growth Consortium for the 21st Century, gigs makes growth assessment easy and reproducible for clinicians, researchers and policy-makers.
Maintained by Simon R Parker. Last updated 25 days ago.
anthropometry, growth-standards, intergrowth, who
11.9 match 4 stars 4.38 score 8 scripts
lucymcgowan
tidycode:Analyze Lines of R Code the Tidy Way
Analyze lines of R code using tidy principles. This allows you to input lines of R code and output a data frame with one row per function included. Additionally, it facilitates code classification via included lexicons.
Maintained by Lucy D'Agostino McGowan. Last updated 4 years ago.
8.0 match 32 stars 6.54 score 36 scripts
bioc
rRDP:Interface to the RDP Classifier
This package installs and interfaces the naive Bayesian classifier for 16S rRNA sequences developed by the Ribosomal Database Project (RDP). With this package the classifier trained with the standard training set can be used or a custom classifier can be trained.
Maintained by Michael Hahsler. Last updated 5 months ago.
genetics, sequencing, infrastructure, classification, microbiome, immunooncology, alignment, sequencematching, dataimport, bayesian, bioconductor, bioinformatics, openjdk
10.4 match 4 stars 5.00 score 6 scripts
bioc
CHETAH:Fast and accurate scRNA-seq cell type identification
CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is an accurate, selective and fast scRNA-seq classifier. Classification is guided by a reference dataset, preferably also a scRNA-seq dataset. By hierarchical clustering of the reference data, CHETAH creates a classification tree that enables a step-wise, top-to-bottom classification. Using a novel stopping rule, CHETAH classifies the input cells to the cell types of the references and to "intermediate types": more general classifications that ended in an intermediate node of the tree.
Maintained by Jurrian de Kanter. Last updated 5 months ago.
classification, rnaseq, singlecell, clustering, geneexpression, immunooncology
7.0 match 44 stars 7.27 score 70 scripts
bioc
SingleR:Reference-Based Single-Cell RNA-Seq Annotation
Performs unbiased cell type recognition from single-cell RNA sequencing data, by leveraging reference transcriptomic datasets of pure cell types to infer the cell of origin of each single cell independently.
Maintained by Aaron Lun. Last updated 28 days ago.
software, singlecell, geneexpression, transcriptomics, classification, clustering, annotation, bioconductor, singler, cpp
4.0 match 182 stars 12.60 score 2.1k scripts 1 dependent
abichat
evabic:Evaluation of Binary Classifiers
Evaluates the performance of binary classifiers. Computes confusion measures (TP, TN, FP, FN), derived measures (TPR, FDR, accuracy, F1, DOR, etc.), and area under the curve. Outputs are well suited for nested data frames.
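The derived measures named above all follow from the four confusion counts. The Python sketch below shows those standard formulas on a small example; it is illustrative only and is not the interface of the R package 'evabic'. The function name `confusion_measures` and its dictionary output are hypothetical.

```python
import math


def confusion_measures(truth, pred):
    """Hypothetical helper: confusion counts and derived measures.

    `truth` and `pred` are equal-length sequences of booleans
    (True = positive). Ratios with a zero denominator are returned as NaN.
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)

    def ratio(num, den):
        # Avoid ZeroDivisionError when a class or prediction set is empty.
        return num / den if den else math.nan

    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "tpr": ratio(tp, tp + fn),               # sensitivity / recall
        "fdr": ratio(fp, fp + tp),               # false discovery rate
        "accuracy": ratio(tp + tn, len(pairs)),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "dor": ratio(tp * tn, fp * fn),          # diagnostic odds ratio
    }


truth = [True, True, True, False, False, False, True, False]
pred = [True, True, False, False, True, False, True, False]
print(confusion_measures(truth, pred))
# tp=3, tn=3, fp=1, fn=1 -> tpr=0.75, fdr=0.25, accuracy=0.75, f1=0.75, dor=9.0
```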
Maintained by Antoine Bichat. Last updated 3 years ago.
classifier, measures, predictors, roc-curve, statistics
13.9 match 6 stars 3.62 score 14 scripts
rueda-lab
iC10:A Copy Number and Expression-Based Classifier for Breast Tumours
Implementation of the classifier described in the paper Ali HR et al (2014) <doi:10.1186/s13059-014-0431-1>. It uses copy number and/or expression data from breast cancer, trains Tibshirani's 'pamr' classifier with the features available and predicts the iC10 group.
Maintained by Oscar M Rueda. Last updated 8 months ago.
16.5 match 2.94 score 12 scripts 4 dependents
bioc
MLSeq:Machine Learning Interface for RNA-Seq Data
This package applies several machine learning methods, including SVM, bagSVM, Random Forest and CART to RNA-Seq data.
Maintained by Gokmen Zararsiz. Last updated 5 months ago.
immunooncology, sequencing, rnaseq, classification, clustering
10.0 match 4.81 score 27 scripts 1 dependent
heal-kgs
STICr:Process Stream Temperature, Intermittency, and Conductivity (STIC) Sensor Data
A collection of functions for processing raw data from Stream Temperature, Intermittency, and Conductivity (STIC) loggers. 'STICr' (pronounced "sticker") includes functions for tidying, calibrating, classifying, and doing quality checks on data from STIC sensors. Some package functionality is described in Wheeler/Zipper et al. (2023) <doi:10.31223/X5636K>.
Maintained by Sam Zipper. Last updated 3 months ago.
9.0 match 4 stars 5.26 score 9 scripts
zarquon42b
Morpho:Calculations and Visualisations Related to Geometric Morphometrics
A toolset for Geometric Morphometrics and mesh processing. This includes (among other stuff) mesh deformations based on reference points, permutation tests, detection of outliers, processing of sliding semi-landmarks and semi-automated surface landmark placement.
Maintained by Stefan Schlager. Last updated 5 months ago.
4.7 match 51 stars 10.00 score 218 scripts 13 dependents
bioc
MethPed:A DNA methylation classifier tool for the identification of pediatric brain tumor subtypes
Classification of pediatric tumors into biologically defined subtypes is challenging and multifaceted approaches are needed. To this end, we developed a diagnostic classifier based on DNA methylation profiles. We offer MethPed as an easy-to-use toolbox that allows researchers and clinical diagnosticians to test single samples as well as large cohorts for subclass prediction of pediatric brain tumors. The current version of MethPed can classify the following tumor diagnoses/subgroups: Diffuse Intrinsic Pontine Glioma (DIPG), Ependymoma, Embryonal tumors with multilayered rosettes (ETMR), Glioblastoma (GBM), Medulloblastoma (MB) - Group 3 (MB_Gr3), Group 4 (MB_Gr4), Group WNT (MB_WNT), Group SHH (MB_SHH) and Pilocytic Astrocytoma (PiloAstro).
Maintained by Helena Carén. Last updated 5 months ago.
immunooncology, dnamethylation, classification, epigenetics
11.8 match 4.00 score 1 script
bioc
DaMiRseq:Data Mining for RNA-seq data: normalization, feature selection and classification
The DaMiRseq package offers a tidy pipeline of data mining procedures to identify transcriptional biomarkers and exploit them for both binary and multi-class classification purposes. The package accepts any kind of data presented as a table of raw counts and allows including both continuous and factorial variables that occur with the experimental setting. A series of functions enable the user to clean up the data by filtering genomic features and samples, to adjust data by identifying and removing the unwanted source of variation (i.e. batches and confounding factors) and to select the best predictors for modeling. Finally, a "stacking" ensemble learning technique is applied to build a robust classification model. Every step includes a checkpoint that the user may exploit to assess the effects of data management by looking at diagnostic plots, such as clustering and heatmaps, RLE boxplots, MDS or correlation plot.
Maintained by Mattia Chiesa. Last updated 5 months ago.
sequencing, rnaseq, classification, immunooncology, openjdk
8.8 match 5.32 score 7 scripts 1 dependent
cran
evclass:Evidential Distance-Based Classification
Different evidential classifiers, which provide outputs in the form of Dempster-Shafer mass functions. The methods are: the evidential K-nearest neighbor rule, the evidential neural network, radial basis function neural networks, logistic regression, feed-forward neural networks.
Maintained by Thierry Denoeux. Last updated 1 year ago.
23.5 match 1 star 2.00 score
rmaia
pavo:Perceptual Analysis, Visualization and Organization of Spectral Colour Data
A cohesive framework for the spectral and spatial analysis of colour described in Maia, Eliason, Bitton, Doucet & Shawkey (2013) <doi:10.1111/2041-210X.12069> and Maia, Gruson, Endler & White (2019) <doi:10.1111/2041-210X.13174>.
Maintained by Thomas White. Last updated 1 month ago.
4.8 match 72 stars 9.72 score 151 scripts 1 dependent
bioc
cleanUpdTSeq:cleanUpdTSeq cleans up artifacts from polyadenylation sites from oligo(dT)-mediated 3' end RNA sequencing data
This package implements a Naive Bayes classifier for accurately differentiating true polyadenylation sites (pA sites) from oligo(dT)-mediated 3' end sequencing such as PAS-Seq, PolyA-Seq and RNA-Seq by filtering out false polyadenylation sites, mainly due to oligo(dT)-mediated internal priming during reverse transcription. The classifier is highly accurate and outperforms other heuristic methods.
Maintained by Jianhong Ou. Last updated 2 months ago.
sequencing, 3' end sequencing, polyadenylation site, internal priming
10.9 match 4.26 score 8 scripts 1 dependent
rstudio
tfestimators:Interface to 'TensorFlow' Estimators
Interface to 'TensorFlow' Estimators <https://www.tensorflow.org/guide/estimator>, a high-level API that provides implementations of many different model types including linear models and deep neural networks.
Maintained by Tomasz Kalinowski. Last updated 3 years ago.
5.5 match 57 stars 8.42 score 170 scripts
jinghuazhao
gap:Genetic Analysis Package
As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many other projects, which is also in keeping with its name (gap).
Maintained by Jing Hua Zhao. Last updated 16 days ago.
3.9 match 12 stars 11.88 score 448 scripts 16 dependents
civisanalytics
civis:R Client for the 'Civis Platform API'
A convenient interface for making requests directly to the 'Civis Platform API' <https://www.civisanalytics.com/platform/>. Full documentation is available at <https://civisanalytics.github.io/civis-r/>.
Maintained by Peter Cooman. Last updated 2 months ago.
5.6 match 16 stars 7.84 score 144 scripts
microsoft
wpa:Tools for Analysing and Visualising Viva Insights Data
Opinionated functions that enable easier and faster analysis of Viva Insights data. There are three main types of functions in 'wpa': (1) Standard functions create a 'ggplot' visual or a summary table based on a specific Viva Insights metric; (2) Report Generation functions generate HTML reports on a specific analysis area, e.g. Collaboration; (3) Other miscellaneous functions cover more specific applications (e.g. Subject Line text mining) of Viva Insights data. This package adheres to 'tidyverse' principles and works well with the pipe syntax. 'wpa' is built with beginner-to-intermediate R users in mind, and is optimised for simplicity.
Maintained by Martin Chan. Last updated 4 months ago.
6.6 match 30 stars 6.69 score 39 scripts 1 dependent
hkestler
TunePareto:Multi-Objective Parameter Tuning for Classifiers
Generic methods for parameter tuning of classification algorithms using multiple scoring functions (Muessel et al. (2012), <doi:10.18637/jss.v046.i05>).
Maintained by Hans Kestler. Last updated 1 year ago.
12.4 match 1 star 3.52 score 92 scripts 2 dependents
bioc
clustifyr:Classifier for Single-cell RNA-seq Using Cell Clusters
Package designed to aid in classifying cells from single-cell RNA sequencing data using external reference data (e.g., bulk RNA-seq, scRNA-seq, microarray, gene lists). A variety of correlation based methods and gene list enrichment methods are provided to assist cell type assignment.
Maintained by Rui Fu. Last updated 5 months ago.
singlecell, annotation, sequencing, microarray, geneexpression, assign-identities, clusters, marker-genes, rna-seq, single-cell-rna-seq
4.5 match 119 stars 9.63 score 296 scripts
maxmenssen
predint:Prediction Intervals
An implementation of prediction intervals for overdispersed count data, for overdispersed binomial data and for linear random effects models.
Maintained by Max Menssen. Last updated 4 months ago.
14.4 match 3.00 score 4 scripts
avrodrigues
naturaList:Classify Occurrences by Confidence Levels in the Species ID
Classify occurrence records based on confidence levels of species identification. In addition, implements tools to filter occurrences inside grid cells and to manually check for possible errors with an interactive shiny application.
Maintained by Arthur Vinicius Rodrigues. Last updated 1 year ago.
9.2 match 4.66 score 23 scripts
fangzhou-xie
rethnicity:Predicting Ethnic Group from Names
Implementation of the race/ethnicity prediction method, described in "rethnicity: An R package for predicting ethnicity from names" by Fangzhou Xie (2022) <doi:10.1016/j.softx.2021.100965> and "Rethnicity: Predicting Ethnicity from Names" by Fangzhou Xie (2021) <doi:10.48550/arXiv.2109.09228>.
Maintained by Fangzhou Xie. Last updated 3 days ago.
ethnicity-classifier, ethnicity-prediction, lstm, cpp
7.5 match 9 stars 5.66 score 17 scripts
ropensci
nuts:Convert European Regional Data
Motivated by changing administrative boundaries over time, the 'nuts' package can convert European regional data with NUTS codes between versions (2006, 2010, 2013, 2016 and 2021) and levels (NUTS 1, NUTS 2 and NUTS 3). The package uses spatial interpolation as in Lam (1983) <doi:10.1559/152304083783914958> based on granular (100m x 100m) area, population and land use data provided by the European Commission's Joint Research Center.
Maintained by Moritz Hennicke. Last updated 5 months ago.
europe, european-union, eurostat, nomenclature, nuts, nuts-codes, nuts-regions, regional-data
7.2 match 8 stars 5.86 score 3 scripts
bioc
GSgalgoR:An Evolutionary Framework for the Identification and Study of Prognostic Gene Expression Signatures in Cancer
A multi-objective optimization algorithm for disease sub-type discovery based on a non-dominated sorting genetic algorithm. The 'Galgo' framework combines the advantages of clustering algorithms for grouping heterogeneous 'omics' data and the searching properties of genetic algorithms for feature selection. The algorithm searches for the optimal number of clusters, considering the features that maximize the survival difference between sub-types while keeping cluster consistency high.
Maintained by Carlos Catania. Last updated 5 months ago.
geneexpression, transcription, clustering, classification, survival
7.7 match 15 stars 5.48 score 6 scripts
usccana
netdiffuseR:Analysis of Diffusion and Contagion Processes on Networks
Empirical statistical analysis, visualization and simulation of diffusion and contagion processes on networks. The package implements algorithms for calculating network diffusion statistics such as transmission rate, hazard rates, exposure models, network threshold levels, infectiousness (contagion), and susceptibility. The package is inspired by work published in Valente, et al., (2015) <DOI:10.1016/j.socscimed.2015.10.001>; Valente (1995) <ISBN: 9781881303213>, Myers (2000) <DOI:10.1086/303110>, Iyengar and others (2011) <DOI:10.1287/mksc.1100.0566>, Burt (1987) <DOI:10.1086/228667>; among others.
Maintained by George Vega Yon. Last updated 3 months ago.
contagion, diffusion-network, network-analysis, network-visualization, openblas, cpp, openmp
4.7 match 88 stars 8.88 score 217 scripts
cran
soundgen:Sound Synthesis and Acoustic Analysis
Performs parametric synthesis of sounds with harmonic and noise components such as animal vocalizations or human voice. Also offers tools for audio manipulation and acoustic analysis, including pitch tracking, spectral analysis, audio segmentation, pitch and formant shifting, etc. Includes four interactive web apps for synthesizing and annotating audio, manually correcting pitch contours, and measuring formant frequencies. Reference: Anikin (2019) <doi:10.3758/s13428-018-1095-7>.
Maintained by Andrey Anikin. Last updated 2 months ago.
8.6 match 1 star 4.86 score 110 scripts 2 dependents
sollano
forestmangr:Forest Mensuration and Management
Processing forest inventory data with methods such as simple random sampling, stratified random sampling and systematic sampling. There are also functions for yield and growth predictions and model fitting, linear and nonlinear grouped data fitting, and statistical tests. References: Kershaw Jr., Ducey, Beers and Husch (2016). <doi:10.1002/9781118902028>.
Maintained by Sollano Rabelo Braga. Last updated 3 months ago.
5.2 match 17 stars 7.97 score 378 scripts
bioc
clst:Classification by local similarity threshold
Package for modified nearest-neighbor classification based on calculation of a similarity threshold distinguishing within-group from between-group comparisons.
Maintained by Noah Hoffman. Last updated 5 months ago.
10.9 match 3.78 score 10 scripts 1 dependent
r-lidar
lasR:Fast and Pipeable Airborne LiDAR Data Tools
Fast and pipeable airborne lidar processing tools. Read/write 'las' and 'laz' files, computation of metrics in area based approach, point filtering, normalization, individual tree segmentation and other manipulations in a powerful and versatile processing chain.
Maintained by Jean-Romain Roussel. Last updated 21 days ago.
6.0 match 17 stars 6.76 score 26 scripts
egenn
rtemis:Machine Learning and Visualization
Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.
Maintained by E.D. Gennatas. Last updated 1 month ago.
data-science, data-visualization, machine-learning, machine-learning-library, visualization
5.6 match 145 stars 7.09 score 50 scripts 2 dependents
cran
MASS:Support Functions and Datasets for Venables and Ripley's MASS
Functions and datasets to support Venables and Ripley, "Modern Applied Statistics with S" (4th edition, 2002).
Maintained by Brian Ripley. Last updated 16 days ago.
3.7 match 19 stars 10.53 score 11k dependents
mhahsler
arulesCBA:Classification Based on Association Rules
Provides the infrastructure for association rule-based classification including the algorithms CBA, CMAR, CPAR, C4.5, FOIL, PART, PRM, RCAR, and RIPPER to build associative classifiers. Hahsler et al (2019) <doi:10.32614/RJ-2019-048>.
Maintained by Michael Hahsler. Last updated 7 months ago.
association-rules, classification
6.9 match 3 stars 5.49 score 47 scripts 1 dependent
ropensci
canaper:Categorical Analysis of Neo- And Paleo-Endemism
Provides functions to analyze the spatial distribution of biodiversity, in particular categorical analysis of neo- and paleo-endemism (CANAPE) as described in Mishler et al (2014) <doi:10.1038/ncomms5473>. 'canaper' conducts statistical tests to determine the types of endemism that occur in a study area while accounting for the evolutionary relationships of species.
Maintained by Joel H. Nitta. Last updated 2 years ago.
7.0 match 7 stars 5.38 score 23 scripts
bioc
CRImage:CRImage a package to classify cells and calculate tumour cellularity
CRImage provides functionality to process and analyze images, in particular to classify cells in biological images. Furthermore, in the context of tumor images, it provides functionality to calculate tumour cellularity.
Maintained by Henrik Failmezger. Last updated 5 months ago.
9.4 match 4.00 score 6 scripts
adamlilith
fasterRaster:Faster Raster and Spatial Vector Processing Using 'GRASS GIS'
Processing of large-in-memory/large-on-disk rasters and spatial vectors using 'GRASS GIS' <https://grass.osgeo.org/>. Most functions in the 'terra' package are recreated. Processing of medium-sized and smaller spatial objects will nearly always be faster using 'terra' or 'sf', but for large-in-memory/large-on-disk objects, 'fasterRaster' may be faster. To use most of the functions, you must have the stand-alone version (not the 'OSGeo4W' installer version) of 'GRASS GIS' 8.0 or higher.
Maintained by Adam B. Smith. Last updated 18 days ago.
aspect, distance, fragmentation, fragmentation-indices, gis, grass, grass-gis, raster, raster-projection, rasterize, slope, topography, vectorization
4.9 match 58 stars 7.69 score 8 scripts
bioc
PrInCE:Predicting Interactomes from Co-Elution
PrInCE (Predicting Interactomes from Co-Elution) uses a naive Bayes classifier trained on dataset-derived features to recover protein-protein interactions from co-elution chromatogram profiles. This package contains the R implementation of PrInCE.
Maintained by Michael Skinnider. Last updated 5 months ago.
proteomics, systemsbiology, networkinference
5.8 match 8 stars 6.38 score 25 scripts
felixthestudent
cellpypes:Cell Type Pipes for Single-Cell RNA Sequencing Data
Annotate single-cell RNA sequencing data manually based on marker gene thresholds. Find cell type rules (gene+threshold) through exploration, use the popular piping operator '%>%' to reconstruct complex cell type hierarchies. 'cellpypes' models technical noise to find positive and negative cells for a given expression threshold and returns cell type labels or pseudobulks. Cite this package as Frauhammer (2022) <doi:10.5281/zenodo.6555728> and visit <https://github.com/FelixTheStudent/cellpypes> for tutorials and newest features.
Maintained by Felix Frauhammer. Last updated 1 year ago.
celltype-annotation, classification-algorithm, scrnaseq, single-cell-rna-seq
8.4 match 51 stars 4.41 score 8 scripts
sym33
RecordLinkage:Record Linkage Functions for Linking and Deduplicating Data Sets
Provides functions for linking and deduplicating data sets. Methods based on a stochastic approach are implemented as well as classification algorithms from the machine learning domain. For details, see our paper "The RecordLinkage Package: Detecting Errors in Data" Sariyar M / Borg A (2010) <doi:10.32614/RJ-2010-017>.
Maintained by Murat Sariyar. Last updated 2 years ago.
4.0 match 6 stars 9.00 score 454 scripts 8 dependents
cran
e1071:Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien
Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, generalized k-nearest neighbour ...
Maintained by David Meyer. Last updated 6 months ago.
2.5 match 28 stars 14.46 score 19k scripts 2.0k dependents
ropensci
pixelclasser:Classifies Image Pixels by Colour
Contains functions to classify the pixels of an image file (jpeg or tiff) by their colour. It implements a simple form of the technique known as Support Vector Machine, adapted to this particular problem.
Maintained by Carlos Real. Last updated 4 years ago.
12.0 match 2 stars 3.00 score 8 scripts
leef-uzh
LEEF.analysis:Access Functions, Tests and Basic Analysis of the RRD Data from the LEEF Project
Provides simple access functions to read data out of the sqlite RRD database. SQL queries can be configured in a yaml config file and used.
Maintained by Rainer M. Krug. Last updated 1 month ago.
14.8 match 2.44 score 23 scripts
mhahsler
stream:Infrastructure for Data Stream Mining
A framework for data stream modeling and associated data mining tasks such as clustering and classification. The development of this package was supported in part by NSF IIS-0948893, NSF CMMI 1728612, and NIH R21HG005912. Hahsler et al (2017) <doi:10.18637/jss.v076.i14>.
Maintained by Michael Hahsler. Last updated 4 days ago.
data-stream-clustering, datastream, stream-mining, cpp
3.5 match 39 stars 10.05 score 132 scripts 3 dependents
andreasdominik
som.nn:Topological k-NN Classifier Based on Self-Organising Maps
A topological version of k-NN: an abstract model is built as a 2-dimensional self-organising map. Samples of unknown class are predicted by mapping them onto the SOM and analysing the class membership of neurons in the neighbourhood.
Maintained by Andreas Dominik. Last updated 12 months ago.
14.7 match 2.40 score 28 scripts
khliland
multiblock:Multiblock Data Fusion in Statistics and Machine Learning
Functions and datasets to support Smilde, Næs and Liland (2021, ISBN: 978-1-119-60096-1) "Multiblock Data Fusion in Statistics and Machine Learning - Applications in the Natural and Life Sciences". This implements and imports a large collection of methods for multiblock data analysis with common interfaces, result- and plotting functions, several real data sets and six vignettes covering a range of different applications.
Maintained by Kristian Hovde Liland. Last updated 2 months ago.
5.3 match 14 stars 6.68 score 19 scripts
nano-optics
planar:Multilayer Optics
Solves the electromagnetic problem of reflection and transmission at a planar multilayer interface. Also computed are the decay rates and emission profile for a dipolar emitter.
Maintained by Baptiste Auguié. Last updated 3 years ago.
6.0 match 7 stars 5.83 score 65 scripts
ejosymart
sizeMat:Estimate Size at Sexual Maturity
Estimate morphometric and gonadal size at sexual maturity for organisms, usually fish and invertebrates. It includes methods for classification based on relative growth (using principal components analysis, hierarchical clustering, discriminant analysis), logistic regression (Frequentist or Bayes), parameters estimation and some basic plots.
Maintained by Josymar Torrejon-Magallanes. Last updated 1 year ago.
allometric-variables, gonad-maturity, morphometric-maturity
7.3 match 4 stars 4.72 score 26 scripts
mrshoenel
mmb:Arbitrary Dependency Mixed Multivariate Bayesian Models
Supports Bayesian models with full and partial (hence arbitrary) dependencies between random variables. Discrete and continuous variables are supported, and conditional joint probabilities and probability densities are estimated using Kernel Density Estimation (KDE). The full general form, which implements an extension to Bayes' theorem, as well as the simple form, which is just a Bayesian network, both support regression through segmentation and KDE and estimation of probability or relative likelihood of discrete or continuous target random variables. This package also provides true statistical distance measures based on Bayesian models. Furthermore, these measures can be used in neighborhood searches and to estimate the similarity and distance between data points. Related work is by Bayes (1763) <doi:10.1098/rstl.1763.0053> and by Scutari (2010) <doi:10.18637/jss.v035.i03>.
Maintained by Sebastian Hönel. Last updated 4 years ago.
bayes-classifier, kernel-density-estimation, neighborhood-search, regression-models
9.2 match 3.70 score 5 scripts
cran
SSLR:Semi-Supervised Classification, Regression and Clustering Methods
Provides a collection of techniques for semi-supervised classification, regression and clustering. In a semi-supervised problem, both labeled and unlabeled data are used to train a classifier. The package includes a collection of semi-supervised learning techniques (self-training, co-training, democratic, decision tree, random forest, 'S3VM', etc.) with a fairly intuitive interface that is easy to use.
Maintained by Francisco Jesús Palomares Alabarce. Last updated 4 years ago.
9.2 match 1 star 3.64 score 73 scripts
bioc
ClassifyR:A framework for cross-validated classification problems, with applications to differential variability and differential distribution testing
The software formalises a framework for classification and survival model evaluation in R. There are four stages: data transformation, feature selection, model training, and prediction. The requirements of variable types and variable order are fixed, but specialised variables for functions can also be provided. The framework is wrapped in a driver loop that reproducibly carries out a number of cross-validation schemes. Functions for differential mean, differential variability, and differential distribution are included. Additional functions may be developed by the user by creating an interface to the framework.
Maintained by Dario Strbenac. Last updated 6 days ago.
4.0 match 5 stars 8.36 score 45 scripts 3 dependents
grunwaldlab
metacoder:Tools for Parsing, Manipulating, and Graphing Taxonomic Abundance Data
Reads, plots, and manipulates large taxonomic data sets, like those generated from modern high-throughput sequencing, such as metabarcoding (i.e. amplification metagenomics, 16S metagenomics, etc.). It provides a tree-based visualization called "heat trees" used to depict statistics for every taxon in a taxonomy using color and size. It also provides various functions to do common tasks in microbiome bioinformatics on data in the 'taxmap' format defined by the 'taxa' package. The 'metacoder' package is described in the publication by Foster et al. (2017) <doi:10.1371/journal.pcbi.1005404>.
Maintained by Zachary Foster. Last updated 1 month ago.
community-diversity, hierarchical, metabarcoding, pcr, taxonomy, trees, cpp
3.5 match 140 stars 9.64 score 328 scripts
roaldarbol
animovement:An R toolbox for analysing animal movement across space and time
An R toolbox for analysing animal movement across space and time.
Maintained by Mikkel Roald-Arbøl. Last updated 2 months ago.
animal-behaviour, animal-movement, neuroethology, neuroscience
6.8 match 10 stars 4.81 score 8 scripts
ericarcher
banter:BioAcoustic eveNT classifiER
Create a hierarchical acoustic event species classifier out of multiple call type detectors as described in Rankin et al. (2017) <doi:10.1111/mms.12381>.
Maintained by Eric Archer. Last updated 1 year ago.
acoustics, bioacoustics, cetaceans, classification, dolphins, machine-learning, noaa, random-forest, species-identification, supervised-learning, supervised-machine-learning, whales, jags, cpp
7.7 match 9 stars 4.22 score 37 scripts
r-lidar
lidR:Airborne LiDAR Data Manipulation and Visualization for Forestry Applications
Airborne LiDAR (Light Detection and Ranging) interface for data manipulation and visualization. Read/write 'las' and 'laz' files, computation of metrics in area based approach, point filtering, artificial point reduction, classification from geographic data, normalization, individual tree segmentation and other manipulations.
Maintained by Jean-Romain Roussel. Last updated 1 month ago.
als, forestry, las, laz, lidar, point-cloud, remote-sensing, openblas, cpp, openmp
2.3 match 623 stars 14.47 score 844 scripts 8 dependents
kliegr
arc:Association Rule Classification
Implements the Classification Based on Association Rules (CBA) algorithm for association rule classification. The package, also described in Hahsler et al. (2019) <doi:10.32614/RJ-2019-048>, contains several convenience methods that set the CBA parameters (minimum confidence, minimum support) automatically, and it natively handles numeric attributes by integrating a pre-discretization step. The rule generation phase is handled by the 'arules' package. To further reduce the size of the CBA models produced by the 'arc' package, postprocessing with the 'qCBA' package is suggested.
Maintained by Tomas Kliegr. Last updated 6 months ago.
6.4 match 7 stars 5.09 score 39 scripts 1 dependent
bioc
bioCancer:Interactive Multi-Omics Cancers Data Visualization and Analysis
This package is a Shiny app for interactively visualizing and analysing multi-assay cancer genomic data.
Maintained by Karim Mezhoud. Last updated 5 months ago.
gui, datarepresentation, network, multiplecomparison, pathways, reactome, visualization, geneexpression, genetarget, analysis, biocancer-interface, cancer, cancer-studies, rmarkdown
5.4 match 20 stars 5.95 score 7 scripts
bioc
tidytof:Analyze High-dimensional Cytometry Data Using Tidy Data Principles
This package implements an interactive, scientific analysis pipeline for high-dimensional cytometry data built using tidy data principles. It is specifically designed to play well with both the tidyverse and Bioconductor software ecosystems, with functionality for reading/writing data files, data cleaning, preprocessing, clustering, visualization, modeling, and other quality-of-life functions. tidytof implements a "grammar" of high-dimensional cytometry data analysis.
Maintained by Timothy Keyes. Last updated 5 months ago.
singlecell, flowcytometry, bioinformatics, cytometry, data-science, single-cell, tidyverse, cpp
4.4 match 19 stars 7.26 score 35 scripts
jaroslav-kuchar
rCBA:CBA Classifier for R
Provides an implementation of a classifier based on "Classification Based on Associations" (CBA). It can be used to build classification models from association rules. Rules are pruned in the order of precedence given by the sort criteria, and a default rule is added. The final classifier labels the provided instances. CBA was originally proposed by Liu, B., Hsu, W. and Ma, Y. Integrating Classification and Association Rule Mining. Proceedings KDD-98, New York, 27-31 August. AAAI. pp. 80-86 (1998, ISBN:1-57735-070-7).
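The rule-ordering and default-rule steps described above can be illustrated with a minimal Python sketch. It is not rCBA's API: the `Rule` type, `build_classifier` and `classify` are hypothetical names, precedence is assumed to be confidence, then support, then generation order, and the pruning is a simplified database-coverage pass rather than the full CBA-CB procedure.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]  # items that must all be present in an instance
    label: str                  # class predicted when the rule fires
    confidence: float
    support: float


def sort_rules(rules: List[Rule]) -> List[Rule]:
    # Precedence: higher confidence, then higher support; the stable sort keeps
    # the original (generation) order for any remaining ties.
    return sorted(rules, key=lambda r: (-r.confidence, -r.support))


def build_classifier(rules: List[Rule], data: List[Tuple[FrozenSet[str], str]]):
    remaining, kept = list(data), []
    for rule in sort_rules(rules):
        covered = [(x, y) for x, y in remaining if rule.antecedent <= x]
        # Keep a rule only if it correctly classifies at least one uncovered case.
        if any(y == rule.label for _, y in covered):
            kept.append(rule)
            remaining = [(x, y) for x, y in remaining if not rule.antecedent <= x]
    # Default rule: majority class of the cases no kept rule covers
    # (falls back to the whole training set if everything is covered).
    pool = remaining or data
    default = Counter(y for _, y in pool).most_common(1)[0][0]
    return kept, default


def classify(instance: FrozenSet[str], kept: List[Rule], default: str) -> str:
    # The first matching rule in precedence order decides the label.
    for rule in kept:
        if rule.antecedent <= instance:
            return rule.label
    return default


rules = [
    Rule(frozenset({"a"}), "yes", 0.8, 0.3),
    Rule(frozenset({"a", "b"}), "no", 0.9, 0.2),
    Rule(frozenset({"c"}), "no", 0.6, 0.1),
]
data = [
    (frozenset({"a", "b"}), "no"),
    (frozenset({"a"}), "yes"),
    (frozenset({"a", "c"}), "yes"),
    (frozenset({"d"}), "yes"),
]
kept, default = build_classifier(rules, data)
print(classify(frozenset({"a", "b", "c"}), kept, default))
```

In this toy run the rule `c -> no` is dropped because it covers no remaining training case, so an instance containing only `c` falls through to the default class.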
Maintained by Jaroslav Kuchar. Last updated 6 years ago.
7.7 match 7 stars 4.14 score 39 scripts
eworx-org
labourR:Classify Multilingual Labour Market Free-Text to Standardized Hierarchical Occupations
Allows the user to map multilingual free-text of occupations to a broad range of standardized classifications. The package facilitates automatic occupation coding (see, e.g., Gweon et al. (2017) <doi:10.1515/jos-2017-0006> and Turrell et al. (2019) <doi:10.3386/w25837>), where the ISCO to ESCO mapping is exploited to extend the occupations hierarchy, Le Vrang et al. (2014) <doi:10.1109/mc.2014.283>. Document vectorization is performed using the multilingual ESCO corpus. A method based on the nearest neighbor search is used to suggest the closest ISCO occupation.
Maintained by Alexandros Kouretsis. Last updated 3 years ago.
5.0 match 28 stars 6.29 score 23 scripts 1 dependent
ajwills72
grt:General Recognition Theory
Functions to generate and analyze data for psychology experiments based on the General Recognition Theory.
Maintained by Andy Wills. Last updated 8 years ago.
13.4 match 2.34 score 44 scripts
wanchanglin
mt:Metabolomics Data Analysis Toolbox
Functions for metabolomics data analysis: data preprocessing, orthogonal signal correction, PCA analysis, PCA-DA analysis, PLS-DA analysis, classification, feature selection, correlation analysis, data visualisation and re-sampling strategies.
Maintained by Wanchang Lin. Last updated 1 year ago.
6.9 match 3 stars 4.57 score 50 scripts
friendly
HistData:Data Sets from the History of Statistics and Data Visualization
The 'HistData' package provides a collection of small data sets that are interesting and important in the history of statistics and data visualization. The goal of the package is to make these available, both for instructional use and for historical research. Some of these present interesting challenges for graphics or analysis in R.
Maintained by Michael Friendly. Last updated 10 months ago.
3.4 match 63 stars 9.19 score 732 scripts 2 dependents
gzt
MixMatrix:Classification with Matrix Variate Normal and t Distributions
Provides sampling and density functions for matrix variate normal, t, and inverted t distributions; ML estimation for matrix variate normal and t distributions using the EM algorithm, including some restrictions on the parameters; and classification by linear and quadratic discriminant analysis for matrix variate normal and t distributions described in Thompson et al. (2019) <doi:10.1080/10618600.2019.1696208>. Performs clustering with matrix variate normal and t mixture models.
Maintained by Geoffrey Thompson. Last updated 6 months ago.
5.0 match 3 stars 6.19 score 29 scripts 3 dependents
mskogholt
fastNaiveBayes:Extremely Fast Implementation of a Naive Bayes Classifier
This is an extremely fast implementation of a Naive Bayes classifier. This package is currently the only package that supports a Bernoulli distribution, a Multinomial distribution, and a Gaussian distribution, making it suitable for binary features, frequency counts, and numerical features. Another feature is support for a mix of different event models. Only numerical variables are allowed; however, categorical variables can be transformed into dummies and used with the Bernoulli distribution. The implementation is largely based on the paper "A comparison of event models for Naive Bayes anti-spam e-mail filtering" written by K.M. Schneider (2003) <doi:10.3115/1067807.1067848>. Any issues can be submitted to: <https://github.com/mskogholt/fastNaiveBayes/issues>.
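To make the Bernoulli event model concrete, here is a minimal pure-Python sketch. It is not the package's R interface: `fit_bernoulli_nb`, `predict_bernoulli_nb` and the Laplace smoothing parameter `alpha` are illustrative assumptions. Unlike the multinomial model, the Bernoulli model also scores absent features, through the factor 1 - p.

```python
import math


def fit_bernoulli_nb(X, y, alpha=1.0):
    """X: list of 0/1 feature vectors; y: class labels; alpha: Laplace smoothing."""
    classes = sorted(set(y))
    n_features = len(X[0])
    priors, probs = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(X)
        # Smoothed estimate of P(feature j = 1 | class c).
        probs[c] = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                    for j in range(n_features)]
    return classes, priors, probs


def predict_bernoulli_nb(model, x):
    classes, priors, probs = model

    def log_posterior(c):
        # Work in log space to avoid underflow with many features.
        score = math.log(priors[c])
        for xj, pj in zip(x, probs[c]):
            # Present features contribute p, absent features contribute 1 - p.
            score += math.log(pj if xj else 1.0 - pj)
        return score

    return max(classes, key=log_posterior)


X = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
y = ["spam", "spam", "ham", "ham"]
model = fit_bernoulli_nb(X, y)
print(predict_bernoulli_nb(model, [1, 0, 0]))
```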
Maintained by Martin Skogholt. Last updated 5 years ago.
5.2 match 42 stars 5.96 score 43 scripts
mlr-org
mlr3torch:Deep Learning with 'mlr3'
Deep learning library that extends the mlr3 framework by building upon the 'torch' package. It allows users to conveniently build, train, and evaluate deep learning models without having to worry about low-level details. Custom architectures can be created using the graph language defined in 'mlr3pipelines'.
Maintained by Sebastian Fischer. Last updated 1 month ago.
data-science, deep-learning, machine-learning, mlr3, torch
4.0 match 42 stars 7.63 score 78 scripts
uligges
klaR:Classification and Visualization
Miscellaneous functions for classification and visualization, e.g. regularized discriminant analysis, sknn() kernel-density naive Bayes, an interface to 'svmlight' and stepclass() wrapper variable selection for supervised classification, partimat() visualization of classification rules and shardsplot() of cluster results as well as kmodes() clustering for categorical data, corclust() variable clustering, variable extraction from different variable clustering models and weight of evidence preprocessing.
Maintained by Uwe Ligges. Last updated 1 year ago.
4.0 match 5 stars 7.61 score 1.4k scripts 13 dependents
bioc
dada2:Accurate, high-resolution sample inference from amplicon sequencing data
The dada2 package infers exact amplicon sequence variants (ASVs) from high-throughput amplicon sequencing data, replacing the coarser and less accurate OTU clustering approach. The dada2 pipeline takes as input demultiplexed fastq files, and outputs the sequence variants and their sample-wise abundances after removing substitution and chimera errors. Taxonomic classification is available via a native implementation of the RDP naive Bayesian classifier, and species-level assignment to 16S rRNA gene fragments by exact matching.
Maintained by Benjamin Callahan. Last updated 5 months ago.
immunooncology, microbiome, sequencing, classification, metagenomics, amplicon, bioconductor, bioinformatics, metabarcoding, taxonomy, cpp
2.3 match 485 stars 13.17 score 3.0k scripts 4 dependents
alanarnholt
BSDA:Basic Statistics and Data Analysis
Data sets for book "Basic Statistics and Data Analysis" by Larry J. Kitchens.
Maintained by Alan T. Arnholt. Last updated 2 years ago.
3.3 match 7 stars 9.11 score 1.3k scripts 6 dependents
r-spatial
classInt:Choose Univariate Class Intervals
Selected commonly used methods for choosing univariate class intervals for mapping or other graphics purposes.
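As an illustration of what choosing class intervals involves, the sketch below computes two classic schemes, equal-interval and quantile breaks, in plain Python. The function names are hypothetical, the quantile rule uses linear interpolation (only one of several conventions), and none of this reproduces classInt's R interface.

```python
def equal_interval_breaks(values, n_classes):
    # Split the data range into n_classes intervals of identical width.
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + i * width for i in range(n_classes)] + [hi]


def quantile_breaks(values, n_classes):
    # Place breaks so each class holds roughly the same number of observations,
    # interpolating linearly between order statistics.
    s = sorted(values)
    breaks = []
    for i in range(n_classes + 1):
        pos = i * (len(s) - 1) / n_classes
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        breaks.append(s[lo] + (pos - lo) * (s[hi] - s[lo]))
    return breaks


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]
equal = equal_interval_breaks(values, 2)
quant = quantile_breaks(values, 2)
print(equal, quant)
```

With one outlier (100), the equal-interval scheme puts ten of the eleven values into the first class, whereas the quantile breaks split the sample roughly in half.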
Maintained by Roger Bivand. Last updated 3 months ago.
1.9 match 34 stars 16.02 score 3.2k scripts 1.2k dependents
hkestler
ORION:Ordinal Relations
Functions to handle ordinal relations reflected within the feature space. These functions allow searching for ordinal relations in multi-class datasets. One can check whether proposed relations are reflected in a specific feature representation. Furthermore, the package provides functions to filter, organize and further analyze those ordinal relations.
Maintained by HA Kestler. Last updated 3 years ago.
9.3 match 3.23 score 17 scripts
r-tensorflow
autokeras:R Interface to 'AutoKeras'
R Interface to 'AutoKeras' <https://autokeras.com/>. 'AutoKeras' is an open source software library for Automated Machine Learning (AutoML). The ultimate goal of AutoML is to provide easily accessible deep learning tools to domain experts with limited data science or machine learning background. 'AutoKeras' provides functions to automatically search for architecture and hyperparameters of deep learning models.
Maintained by Juan Cruz Rodriguez. Last updated 4 years ago.
autodl, automatic-machine-learning, automl, deep-learning, keras, machine-learning, tensorflow
5.5 match 73 stars 5.34 score
mlcollyer
RRPP:Linear Model Evaluation with Randomized Residuals in a Permutation Procedure
Linear model calculations are made for many random versions of data. Using residual randomization in a permutation procedure, sums of squares are calculated over many permutations to generate empirical probability distributions for evaluating model effects. Additionally, coefficients, statistics, fitted values, and residuals generated over many permutations can be used for various procedures including pairwise tests, prediction, classification, and model comparison. This package should provide most tools one could need for the analysis of high-dimensional data, especially in ecology and evolutionary biology, but certainly other fields, as well.
Maintained by Michael Collyer. Last updated 26 days ago.
3.0 match 4 stars 9.84 score 173 scripts 7 dependents
cran
SeqDetect:Sequence and Latent Process Detector
The sequence detector in this package contains a specific automaton model that can be used to learn and detect data and process sequences. The automaton model is capable of learning and tracing sequences, and is described in Krleža, Vrdoljak, Brčić (2019) <doi:10.1109/ACCESS.2019.2955245>. This research has been partly supported under the Competitiveness and Cohesion Operational Programme from the European Regional Development Fund, as part of the Integrated Anti-Fraud System project no. KK.01.2.1.01.0041. This research has also been partly supported by the European Regional Development Fund under the grant KK.01.1.1.01.0009.
Maintained by Dalibor Krleža. Last updated 5 years ago.
14.6 match 2.00 score 2 scripts
andregustavom
mlquantify:Algorithms for Class Distribution Estimation
Quantification is a prominent machine learning task that has received increasing attention in recent years. The objective is to predict the class distribution of a data sample. This package is a collection of machine learning algorithms for class distribution estimation, including algorithms from different paradigms of quantification. These methods are described in the paper: A. Maletzke, W. Hassan, D. dos Reis, and G. Batista. The importance of the test set size in quantification assessment. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI20, pages 2640–2646, 2020. <doi:10.24963/ijcai.2020/366>.
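One classic quantification approach, shown here only to illustrate the task and not necessarily among the package's algorithms, is adjusted classify and count. If a classifier's true and false positive rates are known, the observed positive-prediction rate satisfies cc = tpr * p + fpr * (1 - p), which can be solved for the true prevalence p. A minimal Python sketch with hypothetical function names:

```python
def classify_and_count(predictions):
    # Naive estimate: share of instances predicted positive (labels are 0/1).
    return sum(predictions) / len(predictions)


def adjusted_classify_and_count(predictions, tpr, fpr):
    # Solve cc = tpr * p + fpr * (1 - p) for p, the true positive prevalence.
    # tpr and fpr are assumed to be estimated on held-out labeled data.
    cc = classify_and_count(predictions)
    if tpr == fpr:
        return cc  # correction undefined; fall back to the naive estimate
    p = (cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))  # clip to a valid proportion


preds = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
naive = classify_and_count(preds)
adjusted = adjusted_classify_and_count(preds, tpr=0.8, fpr=0.1)
print(naive, adjusted)
```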
Maintained by Andre Maletzke. Last updated 3 years ago.
8.1 match 7 stars 3.54 score 1 script
trevorhastie
mda:Mixture and Flexible Discriminant Analysis
Mixture and flexible discriminant analysis, multivariate adaptive regression splines (MARS), BRUTO, and vector-response smoothing splines. Hastie, Tibshirani and Friedman (2009) "Elements of Statistical Learning (second edition, chap 12)" Springer, New York.
Maintained by Trevor Hastie. Last updated 4 months ago.
3.8 match 3 stars 7.60 score 428 scripts 17 dependents
brry
berryFunctions:Function Collection Related to Plotting and Hydrology
Draw horizontal histograms, color scattered points by 3rd dimension, enhance date- and log-axis plots, zoom in X11 graphics, trace errors and warnings, use the unit hydrograph in a linear storage cascade, convert lists to data.frames and arrays, fit multiple functions.
Maintained by Berry Boessenkool. Last updated 1 month ago.
3.0 match 13 stars 9.43 score 350 scripts 16 dependents
bioc
canceR:A Graphical User Interface for accessing and modeling the Cancer Genomics Data of MSKCC
The package is a user-friendly interface based on the 'cgdsr' package and other modeling packages to explore, compare, and analyse all available Cancer Data (Clinical data, Gene Mutation, Gene Methylation, Gene Expression, Protein Phosphorylation, Copy Number Alteration) hosted by the Computational Biology Center at Memorial-Sloan-Kettering Cancer Center (MSKCC).
Maintained by Karim Mezhoud. Last updated 5 months ago.
gui, geneexpression, clustering, go, genesetenrichment, kegg, multiplecomparison, cancer, cancer-data, gene, gene-expression, gene-methylation, gene-mutation, gene-sets, methylation, mskcc, mutations, tcltk
5.4 match 7 stars 5.25 score 17 scripts
declaredesign
fabricatr:Imagine Your Data Before You Collect It
Helps you imagine your data before you collect it. Hierarchical data structures and correlated data can be easily simulated, either from random number generators or by resampling from existing data sources. This package is faster with 'data.table' and 'mvnfast' installed.
Maintained by Graeme Blair. Last updated 1 month ago.
3.4 match 93 stars 8.29 score 234 scripts 5 dependents
bioc
TrIdent:TrIdent - Transduction Identification
The `TrIdent` R package automates the analysis of transductomics data by detecting, classifying, and characterizing read coverage patterns associated with potential transduction events. Transductomics is a DNA sequencing-based method for the detection and characterization of transduction events in pure cultures and complex communities. Transductomics relies on mapping sequencing reads from a viral-like particle (VLP)-fraction of a sample to contigs assembled from the metagenome (whole-community) of the same sample. Reads from bacterial DNA carried by VLPs will map back to the bacterial contigs of origin creating read coverage patterns indicative of ongoing transduction.
Maintained by Jessie Maier. Last updated 13 days ago.
coverage, metagenomics, patternlogic, classification, sequencing, bacteriophage, horizontal-gene-transfer, pattern-matching, phage, sequencing-coverage, transduction, transductomics, virus-like-particle
5.5 match 2 stars 5.04 score 7 scripts
julienmoeys
soiltexture:Functions for Soil Texture Plot, Classification and Transformation
"The Soil Texture Wizard" is a set of R functions designed to produce texture triangles (also called texture plots, texture diagrams, texture ternary plots) and to classify and transform soil texture data. These functions allow virtually any soil texture triangle (classification) to be plotted into any triangle geometry (isosceles, right-angled triangles, etc.). This set of functions is expected to be useful to people using soil texture data from different soil texture classifications or different particle size systems. Many (> 15) texture triangles from all around the world are predefined in the package. A simple text-based graphical user interface is provided: soiltexture_gui().
Maintained by Julien Moeys. Last updated 1 year ago.
3.9 match 28 stars 7.11 score 136 scripts 1 dependent
bioc
maftools:Summarize, Analyze and Visualize MAF Files
Analyze and visualize Mutation Annotation Format (MAF) files from large-scale sequencing studies. This package provides various functions to perform the most commonly used analyses in cancer genomics and to create feature-rich, customizable visualizations with minimal effort.
Maintained by Anand Mayakonda. Last updated 5 months ago.
datarepresentation, dnaseq, visualization, drivermutation, variantannotation, featureextraction, classification, somaticmutation, sequencing, functionalgenomics, survival, bioinformatics, cancer-genome-atlas, cancer-genomics, genomics, maf-files, tcga, curl, bzip2, xz-utils, zlib
1.9 match 459 stars 14.63 score 948 scripts 18 dependents
dmarchette
cccd:Class Cover Catch Digraphs
Class Cover Catch Digraphs, neighborhood graphs, and relatives.
Maintained by David J. Marchette. Last updated 3 years ago.
12.8 match 1 star 2.12 score 131 scripts
cran
PoiClaClu:Classification and Clustering of Sequencing Data Based on a Poisson Model
Implements the methods described in the paper, Witten (2011) Classification and Clustering of Sequencing Data using a Poisson Model, Annals of Applied Statistics 5(4) 2493-2518.
Maintained by Daniela Witten. Last updated 6 years ago.
7.0 match 3.81 score 107 scripts 2 dependents
easystats
parameters:Processing of Model Parameters
Utilities for processing the parameters of various statistical models. Beyond computing p values, CIs, and other indices for a wide variety of models (see list of supported models using the function 'insight::supported_models()'), this package implements features like bootstrapping or simulating of parameters and models, feature reduction (feature extraction and variable selection) as well as functions to describe data and variable characteristics (e.g. skewness, kurtosis, smoothness or distribution).
Maintained by Daniel Lüdecke. Last updated 2 days ago.
beta, bootstrap, ci, confidence-intervals, data-reduction, easystats, fa, feature-extraction, feature-reduction, hacktoberfest, parameters, pca, pvalues, regression-models, robust-statistics, standardize, standardized-estimates, statistical-models
1.7 match 453 stars 15.65 score 1.8k scripts 56 dependents
peter-t-ruehr
forceR:Force Measurement Analyses
For cleaning and analysis of graphs, such as animal closing force measurements. 'forceR' was initially written and optimized to deal with insect bite force measurements, but can be used for any time series. Includes a full workflow to load, plot and crop data, correct amplifier and baseline drifts, identify individual peak shapes (bites), rescale (normalize) peak curves, and find best polynomial fits to describe and analyze force curve shapes.
Maintained by Peter T. Rühr. Last updated 12 months ago.
7.3 match 3.70 score 10 scripts
lorenc5
RTextTools:Automatic Text Classification via Supervised Learning
A machine learning package for automatic text classification that makes it simple for novice users to get started with machine learning, while allowing experienced users to easily experiment with different settings and algorithm combinations. The package includes eight algorithms for ensemble classification (svm, slda, boosting, bagging, random forests, glmnet, decision trees, neural networks), comprehensive analytics, and thorough documentation.
Maintained by Loren Collingwood. Last updated 5 years ago.
7.0 match 1 star 3.84 score 772 scripts
pbourkey
polymapR:Linkage Analysis in Outcrossing Polyploids
Creation of linkage maps in polyploid species from marker dosage scores of an F1 cross from two heterozygous parents. Currently works for outcrossing diploid, autotriploid, autotetraploid and autohexaploid species, as well as segmental allotetraploids. Methods are described in a manuscript of Bourke et al. (2018) <doi:10.1093/bioinformatics/bty371>. Since version 1.1.0, both discrete and probabilistic genotypes are acceptable input; for more details on the latter see Liao et al. (2021) <doi:10.1007/s00122-021-03834-x>.
Maintained by Peter Bourke. Last updated 10 months ago.
6.6 match 1 star 4.03 score 54 scripts
bioc
a4Classif:Automated Affymetrix Array Analysis Classification Package
Functionalities for classification of Affymetrix microarray data, integrating within the Automated Affymetrix Array Analysis set of packages.
Maintained by Laure Cougnaud. Last updated 5 months ago.
microarray, geneexpression, classification
7.0 match 3.78 score 1 script 1 dependent
viroli
quantileDA:Quantile Classifier
Code for centroid, median and quantile classifiers.
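The description only names the three classifiers, so the following is a generic sketch rather than quantileDA's implementation. Each class is summarised by per-feature quantiles at a level `theta`, and a new point is assigned to the class with the smallest sum of asymmetric "check" distances to those quantiles. With `theta = 0.5` this reduces to a median (L1) classifier; using means and Euclidean distance instead gives the centroid classifier. All function names are hypothetical.

```python
def empirical_quantile(values, theta):
    # Linear interpolation between order statistics (one of several conventions).
    s = sorted(values)
    pos = theta * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def check_distance(x, q, theta):
    # Asymmetric loss: theta * (x - q) above the quantile, (1 - theta) * (q - x) below.
    d = x - q
    return theta * d if d > 0 else (theta - 1) * d


def fit_quantile_classifier(X, y, theta=0.5):
    n_features = len(X[0])
    quantiles = {
        c: [empirical_quantile([x[j] for x, lab in zip(X, y) if lab == c], theta)
            for j in range(n_features)]
        for c in sorted(set(y))
    }
    return quantiles, theta


def predict(model, x):
    quantiles, theta = model
    # Pick the class whose quantile profile is closest under the check distance.
    return min(quantiles, key=lambda c: sum(
        check_distance(xj, qj, theta) for xj, qj in zip(x, quantiles[c])))


X = [[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_quantile_classifier(X, y, theta=0.5)
print(model[0], predict(model, [2.5, 3.5]))
```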
Maintained by Cinzia Viroli. Last updated 12 months ago.
26.4 match 1.00 score 10 scripts
shaunpwilkinson
insect:Informatic Sequence Classification Trees
Provides tools for probabilistic taxon assignment with informatic sequence classification trees. See Wilkinson et al. (2018) <doi:10.7287/peerj.preprints.26812v1>.
Maintained by Shaun Wilkinson. Last updated 4 years ago.
4.5 match 14 stars 5.80 score 91 scripts
ballings
AUC:Threshold Independent Performance Measures for Probabilistic Classifiers
Various functions to compute the area under the curve of selected measures: The area under the sensitivity curve (AUSEC), the area under the specificity curve (AUSPC), the area under the accuracy curve (AUACC), and the area under the receiver operating characteristic curve (AUROC). Support for visualization and partial areas is included.
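For the AUROC in particular, a useful identity is that the area under the ROC curve equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. A minimal Python sketch of that pairwise (Mann-Whitney) computation follows; `auroc` is a hypothetical name, the loop is quadratic in the sample size, and it is not the package's R function.

```python
def auroc(scores, labels):
    # labels are 1 (positive) or 0 (negative); higher scores mean "more positive".
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


auc = auroc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0])
print(auc)
```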
Maintained by Michel Ballings. Last updated 3 years ago.
4.7 match 5.37 score 424 scripts 7 dependents
mabelc
ssc:Semi-Supervised Classification Methods
Provides a collection of self-labeled techniques for semi-supervised classification. In semi-supervised classification, both labeled and unlabeled data are used to train a classifier. This learning paradigm has obtained promising results, specifically in the presence of a reduced set of labeled examples. This package implements a collection of self-labeled techniques to construct a classification model. This family of techniques enlarges the original labeled set using the most confident predictions to classify unlabeled data. The techniques implemented can be applied to classification problems in several domains by the specification of a supervised base classifier. At low ratios of labeled data, it can be shown to perform better than classical supervised classifiers.
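The self-labeling loop described above can be sketched in a few lines of Python (3.8 or later, for `math.dist`). This is an illustrative assumption, not the package's code: the base classifier is a nearest-centroid model, "confidence" is taken to be the margin between the nearest and second-nearest centroid, and `per_round` unlabeled points are pseudo-labeled per iteration. All names are hypothetical.

```python
import math


def fit_centroids(X, y):
    # Base classifier: one mean vector (centroid) per class.
    centroids = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_with_confidence(centroids, x):
    # Confidence = gap between the nearest and the second-nearest centroid.
    dists = sorted((math.dist(x, m), c) for c, m in centroids.items())
    best_dist, best_class = dists[0]
    margin = dists[1][0] - best_dist if len(dists) > 1 else math.inf
    return best_class, margin


def self_train(X_lab, y_lab, X_unlab, per_round=1, max_rounds=100):
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        if not pool:
            break
        model = fit_centroids(X_lab, y_lab)
        scored = [(predict_with_confidence(model, x), x) for x in pool]
        # Most confident predictions first (the sort is stable for ties).
        scored.sort(key=lambda item: item[0][1], reverse=True)
        for (label, _), x in scored[:per_round]:
            X_lab.append(x)       # enlarge the labeled set with a pseudo-label
            y_lab.append(label)
        pool = [x for _, x in scored[per_round:]]
    return fit_centroids(X_lab, y_lab), X_lab, y_lab


model, X_all, y_all = self_train([[0.0], [10.0]], ["a", "b"],
                                 [[1.0], [9.0], [4.0], [6.0]])
print(list(zip(X_all, y_all)))
```

Points far from the decision boundary (1.0 and 9.0) are labeled first, so the centroids have already moved by the time the ambiguous points 4.0 and 6.0 are assigned.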
Maintained by Christoph Bergmeir. Last updated 5 years ago.
4.8 match 9 stars 5.22 score 62 scripts 1 dependent
bioc
GrafGen:Classification of Helicobacter Pylori Genomes
Classifies Helicobacter pylori genomes according to genetic distance from nine reference populations. The nine reference populations are hpgpAfrica, hpgpAfrica-distant, hpgpAfroamerica, hpgpEuroamerica, hpgpMediterranea, hpgpEurope, hpgpEurasia, hpgpAsia, and hpgpAklavik86-like. The vertex populations are Africa, Europe and Asia.
Maintained by William Wheeler. Last updated 2 months ago.
genetics, software, genomeannotation, classification, cpp
5.3 match 4.65 score 2 scripts
aebilgrau
GMCM:Fast Estimation of Gaussian Mixture Copula Models
Unsupervised Clustering and Meta-analysis using Gaussian Mixture Copula Models.
Maintained by Anders Ellern Bilgrau. Last updated 3 years ago.
clustering, gaussian-mixture-models, meta-analysis, rank, unsupervised-cluster-analysis, openblas, cpp
5.3 match 15 stars 4.62 score 56 scripts
andyliaw-mrk
randomForest:Breiman and Cutler's Random Forests for Classification and Regression
Classification and regression based on a forest of trees using random inputs, based on Breiman (2001) <DOI:10.1023/A:1010933404324>.
Maintained by Andy Liaw. Last updated 6 months ago.
2.0 match 47 stars 12.11 score 35k scripts 282 dependents
diogoferrari
hdpGLM:Hierarchical Dirichlet Process Generalized Linear Models
Implementation of MCMC algorithms to estimate the Hierarchical Dirichlet Process Generalized Linear Model (hdpGLM) presented in the paper Ferrari (2020) Modeling Context-Dependent Latent Heterogeneity, Political Analysis <DOI:10.1017/pan.2019.13> and <doi:10.18637/jss.v107.i10>.
Maintained by Diogo Ferrari. Last updated 1 year ago.
dirichlet-process-mixtures, hierarchical-clustering, nonparametric, nonparametricbayes, npb, semi-parametric, openblas, cpp
5.0 match 12 stars 4.78 score 5 scripts
olechnwin
DIME:Differential Identification using Mixture Ensemble
A robust method for identifying differential binding sites in ChIP-seq (Chromatin Immunoprecipitation Sequencing) data comparing two samples. It considers an ensemble of finite mixture models combined with a local false discovery rate (fdr), allowing flexible modeling of data. The Differential Identification using Mixture Ensemble (DIME) methods are described in: Taslim et al. (2011) <doi:10.1093/bioinformatics/btr165>.
Maintained by Cenny Taslim. Last updated 3 years ago.
9.0 match 2.63 score 43 scripts
renkun-ken
rlist:A Toolbox for Non-Tabular Data Manipulation
Provides a set of functions for data manipulation with list objects, including mapping, filtering, grouping, sorting, updating, searching, and other useful functions. Most functions are designed to be pipeline friendly so that data processing with lists can be chained.
Maintained by Kun Ren. Last updated 2 years ago.
1.7 match 206 stars 13.73 score 2.2k scripts 123 dependents
cran
compositions:Compositional Data Analysis
Provides functions for the consistent analysis of compositional data (e.g. portions of substances) and positive numbers (e.g. concentrations) in the way proposed by J. Aitchison and V. Pawlowsky-Glahn.
Maintained by K. Gerald van den Boogaart. Last updated 1 year ago.
3.7 match 1 star 6.35 score 36 dependents
ghtaranto
scapesClassification:User-Defined Classification of Raster Surfaces
Series of algorithms to translate users' mental models of seascapes, landscapes and, more generally, of geographic features into computer representations (classifications). Spaces and geographic objects are classified with user-defined rules taking into account spatial data as well as spatial relationships among different classes and objects.
Maintained by Gerald H. Taranto. Last updated 3 years ago.
classification-algorithm, object-detection, raster, spatial
5.5 match 1 star 4.22 score 33 scripts
easystats
see:Model Visualisation Toolbox for 'easystats' and 'ggplot2'
Provides plotting utilities supporting packages in the 'easystats' ecosystem (<https://github.com/easystats/easystats>) and some extra themes, geoms, and scales for 'ggplot2'. Color scales are based on <https://materialui.co/>. References: Lüdecke et al. (2021) <doi:10.21105/joss.03393>.
Maintained by Indrajeet Patil. Last updated 5 days ago.
data-visualization, easystats, ggplot2, hacktoberfest, plotting, see, statistics, visualisation, visualization
1.8 match 902 stars 13.22 score 2.0k scripts 3 dependents
bioc
immunoClust:immunoClust - Automated Pipeline for Population Detection in Flow Cytometry
immunoClust is a model-based clustering approach for Flow Cytometry samples. The cell-events of single Flow Cytometry samples are modelled by a mixture of multivariate normal- or t-distributions. The cell-event clusters of several samples are modelled by a mixture of multivariate normal distributions, aiming at stable co-clusters across these samples.
Maintained by Till Soerensen. Last updated 4 months ago.
clustering, flowcytometry, singlecell, cellbasedassays, immunooncology, gsl, cpp
5.3 match 4.38 score 4 scripts
bozenne
BuyseTest:Generalized Pairwise Comparisons
Implementation of the Generalized Pairwise Comparisons (GPC) as defined in Buyse (2010) <doi:10.1002/sim.3923> for complete observations, and extended in Peron (2018) <doi:10.1177/0962280216658320> to deal with right-censoring. GPC compare two groups of observations (intervention vs. control group) regarding several prioritized endpoints to estimate the probability that a random observation drawn from one group performs better than, worse than, or equivalently to a random observation drawn from the other group. Summary statistics such as the net treatment benefit, win ratio, or win odds are then deduced from these probabilities. Confidence intervals and p-values are obtained based on asymptotic results (Ozenne 2021 <doi:10.1177/09622802211037067>), non-parametric bootstrap, or permutations. The software enables the use of thresholds of minimal important difference, stratification, non-prioritized endpoints (O'Brien test), and can handle right-censoring and competing risks.
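For a single endpoint without censoring, the pairwise probabilities and the summary statistics mentioned above can be computed directly. The Python sketch below assumes larger values are better and uses a hypothetical `threshold` for the minimal important difference; it omits prioritisation across several endpoints, censoring, and inference, all of which BuyseTest handles, and `gpc_single_endpoint` is not part of the package.

```python
import math


def gpc_single_endpoint(treatment, control, threshold=0.0):
    # Compare every treatment observation with every control observation.
    fav = unfav = neutral = 0
    for x in treatment:
        for y in control:
            if x - y > threshold:
                fav += 1      # treatment better by more than the threshold
            elif y - x > threshold:
                unfav += 1    # control better by more than the threshold
            else:
                neutral += 1  # difference within the threshold
    n_pairs = len(treatment) * len(control)
    p_fav, p_unfav, p_neu = fav / n_pairs, unfav / n_pairs, neutral / n_pairs
    return {
        "net_benefit": p_fav - p_unfav,
        "win_ratio": p_fav / p_unfav if p_unfav else math.inf,
        # Win odds split the neutral pairs equally between wins and losses.
        "win_odds": ((p_fav + 0.5 * p_neu) / (p_unfav + 0.5 * p_neu)
                     if p_unfav + p_neu else math.inf),
    }


result = gpc_single_endpoint([5, 7, 9], [4, 8], threshold=1.0)
print(result)
```

Here 2 of the 6 pairs favour treatment, 1 favours control and 3 are within the threshold, giving a net benefit of 1/6, a win ratio of 2 and win odds of 1.4.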
Maintained by Brice Ozenne. Last updated 4 days ago.
generalized-pairwise-comparisons, non-parametric, statistics, cpp
3.9 match 5 stars 5.91 score 90 scripts
dusadrian
admisc:Adrian Dusa's Miscellaneous
Contains functions used across packages 'DDIwR', 'QCA' and 'venn'. Interprets and translates, factorizes and negates SOP (Sum of Products) expressions, for both binary and multi-value crisp sets, and extracts information (set names, set values) from those expressions. Other functions check whether values are possibly numeric (even if all numbers reside in a character vector) and coerce them to numeric, or check whether the numbers are whole. It also offers, among many others, a highly versatile recoding routine and some more flexible alternatives to the base functions 'with()' and 'within()'. SOP simplification functions in this package use related minimization from package 'QCA', which is recommended to be installed despite not being listed in the Imports field, due to circular dependency issues.
Maintained by Adrian Dusa. Last updated 3 days ago.
3.0 match 2 stars 7.61 score 20 scripts 92 dependents
hendersontrent
theftdlc:Analyse and Interpret Time Series Features
Provides a suite of functions for analysing, interpreting, and visualising time-series features calculated from different feature sets from the 'theft' package. Implements statistical learning methodologies described in Henderson, T., Bryant, A., and Fulcher, B. (2023) <arXiv:2303.17809>.
Maintained by Trent Henderson. Last updated 1 month ago.
data-science, data-visualization, machine-learning, statistics, time-series
4.6 match 4 stars 4.94 score 11 scripts
spatstat
spatstat.geom:Geometrical Functionality of the 'spatstat' Family
Defines spatial data types and supports geometrical operations on them. Data types include point patterns, windows (domains), pixel images, line segment patterns, tessellations and hyperframes. Capabilities include creation and manipulation of data (using command line or graphical interaction), plotting, geometrical operations (rotation, shift, rescale, affine transformation), convex hull, discretisation and pixellation, Dirichlet tessellation, Delaunay triangulation, pairwise distances, nearest-neighbour distances, distance transform, morphological operations (erosion, dilation, closing, opening), quadrat counting, geometrical measurement, geometrical covariance, colour maps, calculus on spatial domains, Gaussian blur, level sets of images, transects of images, intersections between objects, minimum distance matching. (Excludes spatial data on a network, which are supported by the package 'spatstat.linnet'.)
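Nearest-neighbour distances, one of the operations listed above, can be illustrated with a brute-force Python sketch. It assumes points in the plane given as (x, y) tuples and compares every pair (O(n²)); `nn_distances` is a hypothetical name, not a spatstat.geom function.

```python
import math


def nn_distances(points):
    """Hypothetical helper: distance from each point to its nearest other point.

    Brute force, O(n^2); assumes at least two points given as (x, y) tuples.
    """
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
```

A practical implementation for large point patterns would use sorting or a spatial index instead of comparing every pair; the sketch only shows what is being computed.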
Maintained by Adrian Baddeley. Last updated 1 day ago.
classes-and-objects, distance-calculation, geometry, geometry-processing, images, mensuration, plotting, point-patterns, spatial-data, spatial-data-analysis
1.9 match 7 stars 12.11 score 241 scripts 227 dependents
jwijffels
RMOA:Connect R with MOA for Massive Online Analysis
Connect R with MOA (Massive Online Analysis - <https://moa.cms.waikato.ac.nz/>) to build classification and regression models on streaming data or out-of-RAM data. Streaming recommendation models are also available.
Maintained by Jan Wijffels. Last updated 3 years ago.
9.0 match 1 star 2.53 score 34 scripts
bioc
GenomicAlignments:Representation and manipulation of short genomic alignments
Provides efficient containers for storing and manipulating short genomic alignments (typically obtained by aligning short reads to a reference genome). This includes read counting, computing the coverage, junction detection, and working with the nucleotide content of the alignments.
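Computing coverage amounts to counting, at each position, how many alignments overlap it. A minimal Python sketch using a difference array, assuming each alignment has been reduced to a single 1-based, closed interval [start, end] on one sequence (gaps such as spliced junctions are ignored); `coverage` here is a hypothetical helper, not the Bioconductor `coverage()` method.

```python
def coverage(alignments, seq_length):
    """Hypothetical helper: per-position read depth.

    alignments: iterable of (start, end) pairs, 1-based and inclusive,
    each treated as one contiguous block on a sequence of length seq_length.
    """
    # delta[pos] records how depth changes when entering position pos
    delta = [0] * (seq_length + 2)
    for start, end in alignments:
        delta[start] += 1      # an alignment begins covering here
        delta[end + 1] -= 1    # ...and stops covering after `end`
    depth, running = [], 0
    for pos in range(1, seq_length + 1):
        running += delta[pos]  # prefix sum turns changes into depth
        depth.append(running)
    return depth
```

The difference array makes the cost linear in the number of alignments plus the sequence length, instead of touching every covered base of every read.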
Maintained by Hervé Pagès. Last updated 5 months ago.
infrastructure, dataimport, genetics, sequencing, rnaseq, snp, coverage, alignment, immunooncology, bioconductor-package, core-package
1.7 match 10 stars 13.61 score 3.1k scripts 529 dependents
rstudio
tfhub:Interface to 'TensorFlow' Hub
'TensorFlow' Hub is a library for the publication, discovery, and consumption of reusable parts of machine learning models. A module is a self-contained piece of a 'TensorFlow' graph, along with its weights and assets, that can be reused across different tasks in a process known as transfer learning. Transfer learning makes it possible to train a model with a smaller dataset, improve generalization, and speed up training.
Maintained by Tomasz Kalinowski. Last updated 3 years ago.
3.0 match 29 stars 7.46 score 73 scripts 1 dependent
inbo
effectclass:Classification and Visualisation of Effects
Classify effects by comparing the confidence intervals with thresholds.
Maintained by Thierry Onkelinx. Last updated 10 months ago.
4.2 match 6 stars 5.30 score 37 scripts 1 dependent
sebkrantz
osmclass:Classify Open Street Map Features
Classify Open Street Map (OSM) features into meaningful functional or analytical categories. Designed for OSM PBF files, e.g. from <https://download.geofabrik.de/> imported as spatial data frames. A classification consists of a list of categories that are related to certain OSM tags and values. Given a layer from an OSM PBF file and a classification, the main osm_classify() function returns a classification data table giving, for each feature, the primary and alternative categories (if there is overlap) assigned, and the tag(s) and value(s) matched on. The package also contains a classification of OSM features by economic function/significance, following Krantz (2023) <https://www.ssrn.com/abstract=4537867>.
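The category/tag matching described above can be sketched in Python. The classification format (an ordered mapping from category to OSM keys and accepted values, with `"*"` meaning any value) and the rule that the first matching category becomes the primary one are assumptions for illustration; `classify_feature` is hypothetical and is not `osm_classify()`.

```python
def classify_feature(tags, classification):
    """Hypothetical helper: return (primary, alternatives, matched_tags).

    tags: dict of OSM key -> value for one feature, e.g. {"amenity": "school"}.
    classification: ordered dict of category -> {key: list of values, or "*"}.
    Assumption: categories are checked in order and the first match is primary.
    """
    matched_cats, matched_tags = [], []
    for category, rules in classification.items():
        for key, values in rules.items():
            if key in tags and (values == "*" or tags[key] in values):
                matched_cats.append(category)
                tag = f"{key}={tags[key]}"
                if tag not in matched_tags:  # record each matching tag once
                    matched_tags.append(tag)
                break  # one matching tag is enough for this category
    primary = matched_cats[0] if matched_cats else None
    return primary, matched_cats[1:], matched_tags
```

With a scheme in which a bakery matches both a "food" category (via `shop=bakery`) and a catch-all "commercial" category (any `shop` value), the feature gets "food" as its primary category and "commercial" as an alternative.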
Maintained by Sebastian Krantz. Last updated 7 months ago.
7.3 match 1 star 3.00 score 5 scripts
ai-jyc
GENEAclassify:Segmentation and Classification of Accelerometer Data
Segmentation and classification procedures for data from the 'Activinsights GENEActiv' <https://activinsights.com/technology/geneactiv/> accelerometer, providing the user with a model to infer behaviour from test data in which behaviour is missing. Includes a step-counting algorithm, a function to create segmented data with custom features, and a function that uses the recursive partitioning provided by rpart() from the 'rpart' package to create classification models.
Maintained by Jia Ying Chua. Last updated 1 year ago.
5.6 match 1 star 3.88 score 51 scripts
bioc
HIBAG:HLA Genotype Imputation with Attribute Bagging
Imputes classical HLA alleles from GWAS SNP data, relying on a training set of HLA and SNP genotypes. HIBAG can be used by researchers with published parameter estimates instead of requiring access to large training datasets. It combines the concepts of attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types. Attribute bagging is a technique that improves the accuracy and stability of classifier ensembles using bootstrap aggregating and random variable selection.
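Attribute bagging itself can be sketched in a few lines of Python: each ensemble member sees a bootstrap sample of individuals and a random subset of SNPs, and predictions are combined by majority vote. As an assumption for illustration, the base learner here is a 1-nearest-neighbour rule on Hamming distance, not HIBAG's haplotype-inference model, and all function names are hypothetical.

```python
import random
from collections import Counter


def hamming(a, b, features):
    """Number of selected SNPs at which two genotype vectors differ."""
    return sum(a[f] != b[f] for f in features)


def train_attribute_bagging(X, y, n_models=25, n_features=2, seed=0):
    """Hypothetical helper: build an attribute-bagging ensemble.

    Each member stores a bootstrap sample of training rows plus a random
    subset of feature (SNP) indices; the 1-NN base learner is an assumption.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    models = []
    for _ in range(n_models):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap: rows with replacement
        feats = rng.sample(range(p), n_features)     # random feature subset, no replacement
        models.append(([X[i] for i in rows], [y[i] for i in rows], feats))
    return models


def predict_attribute_bagging(models, x):
    """Majority vote over members, each a 1-NN rule on its own feature subset."""
    votes = Counter()
    for Xb, yb, feats in models:
        nearest = min(range(len(Xb)), key=lambda i: hamming(Xb[i], x, feats))
        votes[yb[nearest]] += 1
    return votes.most_common(1)[0][0]
```

Restricting each member to a few features makes the members disagree more, and the vote then averages out their individual errors, which is the stability gain the description refers to.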
Maintained by Xiuwen Zheng. Last updated 4 months ago.
genetics, statisticalmethod, bioinformatics, gpu, hla, imputation, mhc, snp, cpp
2.7 match 30 stars 8.24 score 48 scripts
jpfitzinger
tidyfit:Regularized Linear Modeling with Tidy Data
An extension to the 'R' tidy data environment for automated machine learning. The package allows fitting and cross-validation of linear regression and classification algorithms on grouped data.
Maintained by Johann Pfitzinger. Last updated 2 months ago.
auto-ml, classification, machine-learning, regression, tidyverse
3.0 match 16 stars 7.22 score 26 scripts
bioc
sampleClassifier:Sample Classifier
The package is designed to classify microarray and RNA-seq gene expression profiles.
Maintained by Khadija El Amrani. Last updated 5 months ago.
immunooncology, classification, microarray, rnaseq, geneexpression
6.6 match 3.30 score
andysouth
rworldmap:Mapping Global Data
Enables mapping of country level and gridded user datasets.
Maintained by Andy South. Last updated 2 years ago.
1.8 match 30 stars 11.83 score 3.2k scripts 14 dependents
c-monaghan
lwc2022:Langa-Weir Classification of Cognitive Function for 2022 HRS Data
Generates the Langa-Weir classification of cognitive function for the 2022 Health and Retirement Study (HRS) cognition data. It is particularly useful for researchers studying cognitive aging who wish to work with the most recent release of HRS data. The package provides user-friendly functions for data preprocessing, scoring, and classification, allowing users to easily apply the Langa-Weir classification system. For details on the HRS, see <https://hrsdata.isr.umich.edu/>; for the Langa-Weir classifications, see <https://hrsdata.isr.umich.edu/data-products/langa-weir-classification-cognitive-function-1995-2020>.
Maintained by Cormac Monaghan. Last updated 4 months ago.
4.8 match 4.48 score 4 scripts
drordas
D2MCS:Data Driving Multiple Classifier System
Provides a novel framework to automatically develop and deploy an accurate Multiple Classifier System based on the feature-clustering distribution obtained from an input dataset. 'D2MCS' was developed with a focus on four main aspects: (i) the ability to determine an effective method to evaluate the independence of features, (ii) the identification of the optimal number of feature clusters, (iii) the training and tuning of ML models, and (iv) the execution of voting schemes to combine the outputs of each classifier comprising the Multiple Classifier System.
Maintained by Miguel Ferreiro-Díaz. Last updated 3 years ago.
5.7 match 3.70 score
mpjashby
sfhotspot:Hot-Spot Analysis with Simple Features
Identify and understand clusters of points (typically representing the locations of places or events) stored in simple-features (SF) objects. This is useful for analysing, for example, hot-spots of crime events. The package emphasises producing results from point SF data in a single step using reasonable default values for all other arguments, to aid rapid data analysis by users who are starting out. Functions available include kernel density estimation (for details, see Yip (2020) <doi:10.22224/gistbok/2020.1.12>), analysis of spatial association (Getis and Ord (1992) <doi:10.1111/j.1538-4632.1992.tb00261.x>) and hot-spot classification (Chainey (2020) ISBN:158948584X).
Maintained by Matt Ashby. Last updated 23 days ago.
hotspot, hotspots, hotspots-analysis, mapping, mapping-tools
3.8 match 12 stars 5.56 score 30 scripts
kjhealy
gssrdoc:Document General Social Survey Variable
The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS, see <http://gss.norc.org>.
Maintained by Kieran Healy. Last updated 11 months ago.
9.2 match 2.28 score 38 scripts
bioc
SPONGE:Sparse Partial Correlations On Gene Expression
This package provides methods to efficiently detect competitive endogenous RNA (ceRNA) interactions between two genes. Such interactions are mediated by one or several miRNAs, so both gene and miRNA expression data for a large number of samples are needed as input. The SPONGE package now also includes spongEffects: ceRNA modules offer patient-specific insights into the miRNA regulatory landscape.
Maintained by Markus List. Last updated 5 months ago.
geneexpression, transcription, generegulation, networkinference, transcriptomics, systemsbiology, regression, randomforest, machinelearning
3.9 match 5.36 score 38 scripts 1 dependent
svilsen
RWNN:Random Weight Neural Networks
Creation, estimation, and prediction of random weight neural networks (RWNN), Schmidt et al. (1992) <doi:10.1109/ICPR.1992.201708>, including popular variants like extreme learning machines, Huang et al. (2006) <doi:10.1016/j.neucom.2005.12.126>, sparse RWNN, Zhang et al. (2019) <doi:10.1016/j.neunet.2019.01.007>, and deep RWNN, Henríquez et al. (2018) <doi:10.1109/IJCNN.2018.8489703>. It further allows for the creation of ensemble RWNNs like bagging RWNN, Sui et al. (2021) <doi:10.1109/ECCE47101.2021.9595113>, boosting RWNN, stacking RWNN, and ensemble deep RWNN, Shi et al. (2021) <doi:10.1016/j.patcog.2021.107978>.
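The defining feature of an extreme learning machine is that the hidden-layer weights are drawn at random and left untrained; only the output weights are estimated, by (regularised) least squares. A minimal NumPy sketch of that idea for regression, with hypothetical function names and an assumed tanh activation and small ridge penalty; it is not the RWNN API.

```python
import numpy as np


def elm_fit(X, y, n_hidden=50, ridge=1e-6, seed=0):
    """Hypothetical helper: fit a single-hidden-layer ELM for regression.

    The input weights W and biases b are drawn at random and never trained;
    only the output weights beta are estimated, by ridge-regularised least squares.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer activations, shape (n_samples, n_hidden)
    # Normal equations with a ridge term: (H^T H + lambda I) beta = H^T y
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
    return W, b, beta


def elm_predict(model, X):
    """Pass X through the fixed random layer, then apply the fitted output weights."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta
```

Because training reduces to one linear solve, fitting is fast; the variants listed above (sparse, deep, and ensemble RWNNs) change how the random layers are built or combined, not this basic least-squares step.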
Maintained by Søren B. Vilsen. Last updated 22 days ago.
6.0 match 3.40 score
dmurdoch
plotrix:Various Plotting Functions
Lots of plots, various labeling, axis and color scaling functions. The author/maintainer died in September 2023.
Maintained by Duncan Murdoch. Last updated 1 year ago.
1.8 match 5 stars 11.31 score 9.2k scripts 361 dependents
taikisan21
PamBinaries:Read and Process 'Pamguard' Binary Data
Functions for easily reading and processing binary data files created by 'Pamguard' (<https://www.pamguard.org/>). All functions for directly reading the binary data files are based on 'MATLAB' code written by Michael Oswald.
Maintained by Taiki Sakai. Last updated 2 months ago.
3.8 match 10 stars 5.39 score 18 scripts 3 dependents
kurthornik
mlbench:Machine Learning Benchmark Problems
A collection of artificial and real-world machine learning benchmark problems, including, e.g., several data sets from the UCI repository.
Maintained by Kurt Hornik. Last updated 3 months ago.
2.3 match 2 stars 8.93 score 5.0k scripts 55 dependents
bioc
Rmagpie:MicroArray Gene-expression-based Program In Error rate estimation
Microarray Classification is designed for both biologists and statisticians. It offers the ability to train a classifier on a labelled microarray dataset and to then use that classifier to predict the class of new observations. A range of modern classifiers are available, including support vector machines (SVMs), nearest shrunken centroids (NSCs)... Advanced methods are provided to estimate the predictive error rate and to report the subset of genes which appear essential in discriminating between classes.
Maintained by Camille Maumet. Last updated 5 months ago.
6.0 match 3.30 score 1 script
klausvigo
kknn:Weighted k-Nearest Neighbors
Weighted k-Nearest Neighbors for Classification, Regression and Clustering.
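Weighted k-NN lets closer neighbours count more in the vote. The Python sketch below scales each neighbour's distance by the distance to the (k+1)-th neighbour and applies a triangular kernel. This is one common weighting scheme, chosen as an assumption; it is not necessarily kknn's default kernel, and `weighted_knn` is a hypothetical name.

```python
import math
from collections import defaultdict


def weighted_knn(train_X, train_y, x, k=3):
    """Hypothetical helper: k-NN classification with triangular-kernel weights.

    Distances are scaled by the distance to the (k+1)-th neighbour, so the
    k nearest points get weights between 0 and 1 and closer points count more.
    """
    ranked = sorted(
        ((math.dist(p, x), label) for p, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    if len(ranked) <= k:
        raise ValueError("need more than k training points")
    scale = ranked[k][0] + 1e-12  # (k+1)-th neighbour distance; epsilon avoids 0/0
    votes = defaultdict(float)
    for d, label in ranked[:k]:
        votes[label] += max(0.0, 1.0 - d / scale)  # triangular kernel K(t) = 1 - t
    return max(votes, key=votes.get)
```

With neighbours at distances 0.1 (class a), 0.9 and 0.95 (class b), and a fourth point at 1.0, an unweighted 3-NN vote picks b, while the kernel weights (about 0.9 for a versus 0.15 for b) pick a.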
Maintained by Klaus Schliep. Last updated 4 years ago.
1.8 match 23 stars 11.08 score 4.6k scripts 41 dependents
cran
rbooster:AdaBoost Framework for Any Classifier
This is a simple package that provides a function to boost pre-ready or custom-made classifiers. The package uses Discrete AdaBoost (<doi:10.1006/jcss.1997.1504>) and Real AdaBoost (<doi:10.1214/aos/1016218223>) for two-class classification, and SAMME (<doi:10.4310/SII.2009.v2.n3.a8>) and SAMME.R (<doi:10.4310/SII.2009.v2.n3.a8>) for multiclass classification.
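A compact Python sketch of Discrete AdaBoost with decision stumps for labels in {-1, +1}: each round fits the stump with the lowest weighted error, gives it the weight alpha = 0.5 * log((1 - err) / err), and up-weights the points it misclassifies. Stumps as the base learner and all function names are assumptions for illustration; rbooster's own interface, Real AdaBoost and the SAMME variants are not shown.

```python
import math


def stump_predict(row, j, thr, sign):
    """Decision stump: predict `sign` when feature j <= thr, otherwise -sign."""
    return sign if row[j] <= thr else -sign


def fit_stump(X, y, w):
    """Search features, thresholds and polarities for the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for wi, row, t in zip(w, X, y)
                          if stump_predict(row, j, thr, sign) != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def adaboost(X, y, n_rounds=10):
    """Hypothetical helper: Discrete AdaBoost with stumps; labels must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        if err >= 0.5:  # no stump beats chance on the weighted data
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, sign))
        # Misclassified points (t * prediction = -1) are up-weighted, correct ones down-weighted
        w = [wi * math.exp(-alpha * t * stump_predict(row, j, thr, sign))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted sum of stump votes."""
    score = sum(alpha * stump_predict(row, j, thr, sign)
                for alpha, j, thr, sign in ensemble)
    return 1 if score >= 0 else -1
```

On the one-dimensional labels 1, 1, -1, -1, 1, 1 at x = 1..6, no single stump is error-free, but three boosting rounds (a split at 2, a split at 4, and a constant +1 vote) classify all six points correctly.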
Maintained by Fatih Saglam. Last updated 3 years ago.
7.3 match 2.70 score 6 scripts
alexanderrobitzsch
CDM:Cognitive Diagnosis Modeling
Functions for cognitive diagnosis modeling and multidimensional item response modeling for dichotomous and polytomous item responses. This package enables the estimation of the DINA and DINO models (Junker & Sijtsma, 2001, <doi:10.1177/01466210122032064>), the multiple group (polytomous) GDINA model (de la Torre, 2011, <doi:10.1007/s11336-011-9207-7>), the multiple choice DINA model (de la Torre, 2009, <doi:10.1177/0146621608320523>), the general diagnostic model (GDM; von Davier, 2008, <doi:10.1348/000711007X193957>), the structured latent class model (SLCA; Formann, 1992, <doi:10.1080/01621459.1992.10475229>) and regularized latent class analysis (Chen, Li, Liu, & Ying, 2017, <doi:10.1007/s11336-016-9545-6>). See George, Robitzsch, Kiefer, Gross, and Uenlue (2017) <doi:10.18637/jss.v074.i02> or Robitzsch and George (2019, <doi:10.1007/978-3-030-05584-4_26>) for further details on estimation and the package structure. For tutorials on how to use the CDM package, see George and Robitzsch (2015, <doi:10.20982/tqmp.11.3.p189>) as well as Ravand and Robitzsch (2015).
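In the DINA model, examinee i answers item j correctly with probability 1 - s_j (one minus the slip parameter) if they master every skill that row q_j of the Q-matrix requires, and with the guessing probability g_j otherwise; DINO replaces "every required skill" with "at least one required skill". A minimal Python sketch of these item response probabilities, with hypothetical function names and illustrative slip/guess values; parameter estimation, which is what CDM provides, is not shown.

```python
def dina_eta(alpha, q):
    """DINA (conjunctive): 1 if every skill required by the item is mastered.

    alpha: 0/1 skill-mastery vector of the examinee; q: 0/1 Q-matrix row of the item.
    """
    return int(all(a == 1 for a, qk in zip(alpha, q) if qk == 1))


def dino_omega(alpha, q):
    """DINO (disjunctive): 1 if at least one required skill is mastered."""
    return int(any(a == 1 for a, qk in zip(alpha, q) if qk == 1))


def p_correct(indicator, slip, guess):
    """P(X_ij = 1) = (1 - s_j)^indicator * g_j^(1 - indicator), indicator in {0, 1}."""
    return (1 - slip) ** indicator * guess ** (1 - indicator)
```

For an examinee who masters skills 1 and 3 but not 2, an item requiring skills 1 and 2 gives a DINA indicator of 0 (success only by guessing) but a DINO indicator of 1 (success unless they slip).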
Maintained by Alexander Robitzsch. Last updated 9 months ago.
cognitive-diagnostic-models, item-response-theory, cpp
2.3 match 22 stars 8.76 score 138 scripts 28 dependents
thiyangt
seer:Feature-Based Forecast Model Selection
A novel meta-learning framework for forecast model selection using time series features. Many applications require a large number of time series to be forecast. Providing better forecasts for these time series is important in decision and policy making. We propose a classification framework which selects forecast models based on features calculated from the time series. We call this framework FFORMS (Feature-based FORecast Model Selection). FFORMS builds a mapping that relates the features of time series to the best forecast model using a random forest. The 'seer' package implements the FFORMS algorithm. For more details, see our paper at <https://www.monash.edu/business/econometrics-and-business-statistics/research/publications/ebs/wp06-2018.pdf>.
Maintained by Thiyanga Talagala. Last updated 2 years ago.
3.7 match 78 stars 5.31 score 52 scripts
navdeep-g
h2o4gpu:Interface to 'H2O4GPU'
Interface to 'H2O4GPU' <https://github.com/h2oai/h2o4gpu>, a collection of 'GPU' solvers for machine learning algorithms.
Maintained by Navdeep Gill. Last updated 4 years ago.
6.0 match 1 star 3.24 score 35 scripts
bleutner
RStoolbox:Remote Sensing Data Analysis
Toolbox for remote sensing image processing and analysis such as calculating spectral indexes, principal component transformation, unsupervised and supervised classification or fractional cover analyses.
Maintained by Konstantin Mueller. Last updated 1 month ago.
ggplot2, land-cover-mapping, remote-sensing, spectral-unmixing, supervised-classification, unsupervised-classification, openblas, cpp
1.9 match 275 stars 10.10 score 1.1k scripts