R-universe search: classifier

jkrijthe

RSSL:Implementations of Semi-Supervised Learning Approaches for Classification

A collection of implementations of semi-supervised classifiers and methods to evaluate their performance. The package includes implementations of, among others, Implicitly Constrained Learning, Moment Constrained Learning, the Transductive SVM, Manifold regularization, Maximum Contrastive Pessimistic Likelihood estimation, S4VM and WellSVM.

Maintained by Jesse Krijthe. Last updated 1 year ago.

openblas cpp

60.3 match 58 stars 6.05 score 128 scripts 1 dependents

bioc

genefu:Computation of Gene Expression-Based Signatures in Breast Cancer

This package contains functions implementing various tasks usually required by gene expression analysis, especially in breast cancer studies: gene mapping between different microarray platforms, identification of molecular subtypes, implementation of published gene signatures, gene selection, and survival analysis.

Maintained by Benjamin Haibe-Kains. Last updated 4 months ago.

differentialexpression geneexpression visualization clustering classification

26.3 match 7.42 score 193 scripts 3 dependents

neurodata

lolR:Linear Optimal Low-Rank Projection

Supervised learning techniques designed for the situation when the dimensionality exceeds the sample size have a tendency to overfit as the dimensionality of the data increases. To remedy this high-dimensionality, low-sample-size (HDLSS) situation, we attempt to learn a lower-dimensional representation of the data before learning a classifier. That is, we project the data to a space where the dimensionality is more manageable, and are then better able to apply standard classification or clustering techniques, since there are fewer dimensions to overfit. A number of previous works have focused on how to strategically reduce dimensionality in the unsupervised case, yet in the supervised HDLSS regime, few works have attempted to devise dimensionality reduction techniques that leverage the labels associated with the data. In this package and the associated manuscript Vogelstein et al. (2017) <arXiv:1709.01233>, we provide several methods for feature extraction, some utilizing labels and some not, along with easily extensible utilities to simplify cross-validative efforts to identify the best feature extraction method. Additionally, we include a series of adaptable benchmark simulations to serve as a standard for future investigative efforts into supervised HDLSS. Finally, we produce a comprehensive comparison of the included algorithms across a range of benchmark simulations and real data applications.

Maintained by Eric Bridgeford. Last updated 4 years ago.

26.6 match 20 stars 7.28 score 80 scripts

computationalstylistics

stylo:Stylometric Multivariate Analyses

Supervised and unsupervised multivariate methods, supplemented by GUI and some visualizations, to perform various analyses in the field of computational stylistics, authorship attribution, etc. For further reference, see Eder et al. (2016), <https://journal.r-project.org/archive/2016/RJ-2016-007/index.html>. You are also encouraged to visit the Computational Stylistics Group's website <https://computationalstylistics.github.io/>, where a reasonable amount of information about the package and related projects is provided.

Maintained by Maciej Eder. Last updated 2 months ago.

22.5 match 186 stars 8.59 score 462 scripts

riazakhan94

ROCit:Performance Assessment of Binary Classifier with Visualization

Sensitivity (or recall or true positive rate), false positive rate, specificity, precision (or positive predictive value), negative predictive value, misclassification rate, accuracy, and F-score are popular metrics for assessing the performance of a binary classifier at a given threshold. The receiver operating characteristic (ROC) curve is a common tool for assessing the overall diagnostic ability of a binary classifier. Rather than depending on a single threshold, the area under the ROC curve (AUC) is a summary statistic of how well a binary classifier performs overall. The ROCit package provides flexibility to easily evaluate threshold-bound metrics. The ROC curve, along with the AUC, can be obtained using different methods, such as empirical, binormal and non-parametric. ROCit encompasses a wide variety of methods for constructing confidence intervals for the ROC curve and AUC. ROCit also features the option of constructing an empirical gains table, which is a handy tool for direct marketing. The package offers options for commonly used visualizations, such as the ROC curve, KS plot, and lift plot. Along with built-in default graphics settings, there is room for manual tweaks by providing the necessary values as function arguments. ROCit is a powerful tool offering a range of features, yet it is very easy to use.

Maintained by Md Riaz Ahmed Khan. Last updated 3 years ago.

23.8 match 7.66 score 332 scripts 6 dependents
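
The threshold-bound metrics and the empirical AUC named in the ROCit description above can be illustrated with a minimal sketch. This is plain Python written for illustration only, not ROCit's API: the function names `threshold_metrics` and `empirical_auc` are hypothetical, and the AUC here uses the rank-based (Mann-Whitney) formulation, which is one common way to compute the empirical estimate.

```python
def threshold_metrics(scores, labels, threshold):
    """Metrics for one cutoff: a score >= threshold is predicted positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "accuracy": (tp + tn) / len(labels),
    }


def empirical_auc(scores, labels):
    """Threshold-free summary: the fraction of (positive, negative) pairs
    in which the positive case scores higher; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(threshold_metrics(scores, labels, threshold=0.5))
print(empirical_auc(scores, labels))
```

The first function changes its answer whenever the cutoff moves, while the second depends only on how the scores rank positives against negatives, which is the distinction the description draws.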

pokotylo

ddalpha:Depth-Based Classification and Calculation of Data Depth

Contains procedures for depth-based supervised learning, which are entirely non-parametric, in particular the DDalpha-procedure (Lange, Mosler and Mozharovskyi, 2014 <doi:10.1007/s00362-012-0488-4>). The training data sample is transformed by a statistical depth function to a compact low-dimensional space, where the final classification is done. It also offers an extension to functional data and routines for calculating certain notions of statistical depth functions. 50 multivariate and 5 functional classification problems are included. (Pokotylo, Mozharovskyi and Dyckerhoff, 2019 <doi:10.18637/jss.v091.i05>).

Maintained by Oleksii Pokotylo. Last updated 6 months ago.

fortran cpp

40.1 match 2 stars 4.40 score 211 scripts 7 dependents

bioc

geneClassifiers:Application of gene classifiers

This package aims to make classifiers published in the literature easy to apply, using an ExpressionSet as input.

Maintained by R Kuiper. Last updated 5 months ago.

geneexpression biomedicalinformatics classification survival microarray

37.8 match 1 stars 4.62 score 35 scripts

retowuest

autoMrP:Improving MrP with Ensemble Learning

A tool that improves the prediction performance of multilevel regression with post-stratification (MrP) by combining a number of machine learning methods. For information on the method, please refer to Broniecki, Wüest, Leemann (2020) ''Improving Multilevel Regression with Post-Stratification Through Machine Learning (autoMrP)'' in the 'Journal of Politics'. Final pre-print version: <https://lucasleemann.files.wordpress.com/2020/07/automrp-r2pa.pdf>.

Maintained by Philipp Broniecki. Last updated 5 months ago.

29.3 match 27 stars 5.61 score

majkamichal

naivebayes:High Performance Implementation of the Naive Bayes Algorithm

In this implementation of the Naive Bayes classifier, the following class-conditional distributions are available: 'Bernoulli', 'Categorical', 'Gaussian', 'Poisson', 'Multinomial' and a non-parametric representation of the class-conditional density estimated via Kernel Density Estimation. The implemented classifiers handle missing data and can take advantage of sparse data.

Maintained by Michal Majka. Last updated 1 month ago.

classification-model datascience machine-learning naive-bayes

13.9 match 37 stars 10.47 score 1.0k scripts 6 dependents

bupaverse

bupaR:Business Process Analysis in R

Comprehensive Business Process Analysis toolkit. Creates an S3 class for event log objects, and related handler functions. Imports related packages for filtering event data, computation of descriptive statistics, handling of 'Petri Net' objects and visualization of process maps. See also packages 'edeaR', 'processmapR', 'eventdataR' and 'processmonitR'.

Maintained by Gert Janssenswillen. Last updated 2 years ago.

15.3 match 55 stars 9.07 score 389 scripts 11 dependents

bnaras

pamr:Pam: Prediction Analysis for Microarrays

Some functions for sample classification in microarrays.

Maintained by Balasubramanian Narasimhan. Last updated 9 months ago.

15.1 match 7.90 score 256 scripts 14 dependents

bmihaljevic

bnclassify:Learning Discrete Bayesian Network Classifiers from Data

State-of-the-art algorithms for learning discrete Bayesian network classifiers from data, including a number of those described in Bielza & Larranaga (2014) <doi:10.1145/2576868>, with functions for prediction, model evaluation and inspection.

Maintained by Mihaljevic Bojan. Last updated 1 year ago.

cpp

17.1 match 18 stars 6.85 score 66 scripts

bioc

switchBox:Utilities to train and validate classifiers based on pair switching using the K-Top-Scoring-Pair (KTSP) algorithm

The package offers different classifiers based on comparisons of pairs of features (TSP), using various decision rules (e.g., majority wins principle).

Maintained by Bahman Afsari. Last updated 5 months ago.

software statisticalmethod classification

26.0 match 4.30 score 11 scripts 1 dependents

e-sensing

sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes

An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<https://stacspec.org/>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organized maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce training samples imbalance proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <https://torch.mlverse.org/>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <https://jakubnowosad.com/supercells/>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>. 
Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.

Maintained by Gilberto Camara. Last updated 1 month ago.

big-earth-data cbers earth-observation eo-datacubes geospatial image-time-series land-cover-classification landsat planetary-computer r-spatial remote-sensing rspatial satellite-image-time-series satellite-imagery sentinel-2 stac-api stac-catalog cpp

11.4 match 494 stars 9.50 score 384 scripts

bioc

twoddpcr:Classify 2-d Droplet Digital PCR (ddPCR) data and quantify the number of starting molecules

The twoddpcr package takes Droplet Digital PCR (ddPCR) droplet amplitude data from Bio-Rad's QuantaSoft and can classify the droplets. A summary of the positive/negative droplet counts can be generated, which can then be used to estimate the number of molecules using the Poisson distribution. This is the first open source package that facilitates the automatic classification of general two channel ddPCR data. Previous work includes 'definetherain' (Jones et al., 2014) and 'ddpcRquant' (Trypsteen et al., 2015) which both handle one channel ddPCR experiments only. The 'ddpcr' package available on CRAN (Attali et al., 2016) supports automatic gating of a specific class of two channel ddPCR experiments only.

Maintained by Anthony Chiu. Last updated 5 months ago.

ddpcr software classification

18.6 match 10 stars 5.78 score 4 scripts
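
The Poisson step mentioned in the twoddpcr description above (turning positive/negative droplet counts into an estimate of starting molecules) can be sketched as follows. This is a generic illustration of the standard Poisson correction, not code from the package: the function names are hypothetical, and the sketch assumes molecules are distributed independently across droplets, so that the fraction of negative droplets estimates exp(-lambda), where lambda is the mean number of molecules per droplet.

```python
import math


def copies_per_droplet(n_positive, n_negative):
    """Estimate lambda, the mean number of target molecules per droplet.

    Under a Poisson model, P(droplet is negative) = exp(-lambda),
    so lambda = -ln(n_negative / n_total).
    """
    total = n_positive + n_negative
    if total == 0:
        raise ValueError("no droplets counted")
    if n_negative == 0:
        raise ValueError("all droplets positive: lambda cannot be estimated")
    return -math.log(n_negative / total)


def estimated_molecules(n_positive, n_negative):
    """Estimated number of molecules across all counted droplets."""
    total = n_positive + n_negative
    return copies_per_droplet(n_positive, n_negative) * total


# Half of the droplets positive gives lambda = ln 2, about 0.693.
print(copies_per_droplet(5000, 5000))
print(estimated_molecules(5000, 5000))
```

The estimate exceeds the raw positive count because a positive droplet may contain more than one molecule. Converting to a concentration would additionally require the droplet volume, which is not stated here.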

bioc

TCGAbiolinks:TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data

The aim of TCGAbiolinks is to: i) facilitate the GDC open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses, and iv) easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.

Maintained by Tiago Chedraoui Silva. Last updated 26 days ago.

dnamethylation differentialmethylation generegulation geneexpression methylationarray differentialexpression pathways network sequencing survival software bioc bioconductor gdc integrative-analysis tcga tcga-data tcgabiolinks

7.3 match 305 stars 14.45 score 1.6k scripts 6 dependents

rfastofficial

Rfast2:A Collection of Efficient and Extremely Fast R Functions II

A collection of fast statistical and utility functions for data analysis. Functions for regression, maximum likelihood, column-wise statistics and many more have been included. C++ has been utilized to speed up the functions. References: Tsagris M., Papadakis M. (2018). Taking R to its limits: 70+ tips. PeerJ Preprints 6:e26605v1 <doi:10.7287/peerj.preprints.26605v1>.

Maintained by Manos Papadakis. Last updated 1 year ago.

openblas cpp openmp

12.7 match 38 stars 8.09 score 75 scripts 26 dependents

bioimaginggroup

nucim:Nucleome Imaging Toolbox

Tools for 4D nucleome imaging. Quantitative analysis of the 3D nuclear landscape recorded with super-resolved fluorescence microscopy. See Volker J. Schmid, Marion Cremer, Thomas Cremer (2017) <doi:10.1016/j.ymeth.2017.03.013>.

Maintained by Volker Schmid. Last updated 3 years ago.

22.5 match 2 stars 4.48 score 7 scripts

opengeos

whitebox:'WhiteboxTools' R Frontend

An R frontend for the 'WhiteboxTools' library, which is an advanced geospatial data analysis platform developed by Prof. John Lindsay at the University of Guelph's Geomorphometry and Hydrogeomatics Research Group. 'WhiteboxTools' can be used to perform common geographical information systems (GIS) analysis operations, such as cost-distance analysis, distance buffering, and raster reclassification. Remote sensing and image processing tasks include image enhancement (e.g. panchromatic sharpening, contrast adjustments), image mosaicing, numerous filtering operations, simple classification (k-means), and common image transformations. 'WhiteboxTools' also contains advanced tooling for spatial hydrological analysis (e.g. flow-accumulation, watershed delineation, stream network analysis, sink removal), terrain analysis (e.g. common terrain indices such as slope, curvatures, wetness index, hillshading; hypsometric analysis; multi-scale topographic position analysis), and LiDAR data processing. Suggested citation: Lindsay (2016) <doi:10.1016/j.cageo.2016.07.003>.

Maintained by Andrew Brown. Last updated 5 months ago.

geomorphometry geoprocessing geospatial gis hydrology remote-sensing rstudio

10.3 match 173 stars 9.65 score 203 scripts 2 dependents

bioc

doubletrouble:Identification and classification of duplicated genes

doubletrouble aims to identify duplicated genes from whole-genome protein sequences and classify them based on their modes of duplication. The duplication modes are i. segmental duplication (SD); ii. tandem duplication (TD); iii. proximal duplication (PD); iv. transposed duplication (TRD) and; v. dispersed duplication (DD). Transposon-derived duplicates (TRD) can be further subdivided into rTRD (retrotransposon-derived duplication) and dTRD (DNA transposon-derived duplication). If users want a simpler classification scheme, duplicates can also be classified into SD- and SSD-derived (small-scale duplication) gene pairs. Besides classifying gene pairs, users can also classify genes, so that each gene is assigned a unique mode of duplication. Users can also calculate substitution rates per substitution site (i.e., Ka and Ks) from duplicate pairs, find peaks in Ks distributions with Gaussian Mixture Models (GMMs), and classify gene pairs into age groups based on Ks peaks.

Maintained by Fabrício Almeida-Silva. Last updated 3 days ago.

software wholegenome comparativegenomics functionalgenomics phylogenetics network classification bioinformatics comparative-genomics gene-duplication molecular-evolution whole-genome-duplication

15.0 match 23 stars 6.44 score 17 scripts

bioc

CMA:Synthesis of microarray-based classification

This package provides a comprehensive collection of various microarray-based classification algorithms both from Machine Learning and Statistics. Variable Selection, Hyperparameter tuning, Evaluation and Comparison can be performed combined or stepwise in a user-friendly environment.

Maintained by Roman Hornung. Last updated 5 months ago.

classification decisiontree

18.6 match 5.09 score 61 scripts

bioc

scAnnotatR:Pretrained learning models for cell type prediction on single cell RNA-sequencing data

The package comprises a set of pretrained machine learning models to predict basic immune cell types. This enables all users to quickly get a first annotation of the cell types present in their dataset without requiring prior knowledge. scAnnotatR also allows users to train their own models to predict new cell types based on specific research needs.

Maintained by Johannes Griss. Last updated 5 months ago.

singlecell transcriptomics geneexpression supportvectormachine classification software

13.1 match 15 stars 6.73 score 20 scripts

ipa-tys

ROCR:Visualizing the Performance of Scoring Classifiers

ROC graphs, sensitivity/specificity curves, lift charts, and precision/recall plots are popular examples of trade-off visualizations for specific pairs of performance measures. ROCR is a flexible tool for creating cutoff-parameterized 2D performance curves by freely combining two from over 25 performance measures (new performance measures can be added using a standard interface). Curves from different cross-validation or bootstrapping runs can be averaged by different methods, and standard deviations, standard errors or box plots can be used to visualize the variability across the runs. The parameterization can be visualized by printing cutoff values at the corresponding curve positions, or by coloring the curve according to cutoff. All components of a performance plot can be quickly adjusted using a flexible parameter dispatching mechanism. Despite its flexibility, ROCR is easy to use, with only three commands and reasonable default values for all optional parameters.

Maintained by Felix G.M. Ernst. Last updated 12 months ago.

6.1 match 38 stars 14.29 score 9.2k scripts 217 dependents

rspatial

terra:Spatial Data Analysis

Methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data. Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data. Processing of very large files is supported. See the manual and tutorials on <https://rspatial.org/> to get started. 'terra' replaces the 'raster' package ('terra' can do more, and it is faster and easier to use).

Maintained by Robert J. Hijmans. Last updated 9 hours ago.

geospatial raster spatial vector onetbb proj gdal geos cpp

4.9 match 559 stars 17.64 score 17k scripts 851 dependents

kurthornik

RWeka:R/Weka Interface

An R interface to Weka (Version 3.9.3). Weka is a collection of machine learning algorithms for data mining tasks written in Java, containing tools for data pre-processing, classification, regression, clustering, association rules, and visualization. Package 'RWeka' contains the interface code, the Weka jar is in a separate package 'RWekajars'. For more information on Weka see <https://www.cs.waikato.ac.nz/ml/weka/>.

Maintained by Kurt Hornik. Last updated 2 years ago.

openjdk

9.3 match 4 stars 8.24 score 1.8k scripts 14 dependents

rrwen

nbc4va:Bayes Classifier for Verbal Autopsy Data

An implementation of the Naive Bayes Classifier (NBC) algorithm used for Verbal Autopsy (VA) built on code from Miasnikof et al (2015) <DOI:10.1186/s12916-015-0521-2>.

Maintained by Richard Wen. Last updated 3 years ago.

autopsy bayes cause classifier coded computer death estimate imputation learning machine mds million naive nbc probability study theory va verbal

16.6 match 4.60 score 79 scripts

bioc

PDATK:Pancreatic Ductal Adenocarcinoma Tool-Kit

Pancreatic ductal adenocarcinoma (PDA) has a relatively poor prognosis and is one of the most lethal cancers. Molecular classification of gene expression profiles holds the potential to identify meaningful subtypes which can inform therapeutic strategy in the clinical setting. The Pancreatic Cancer Adenocarcinoma Tool-Kit (PDATK) provides an S4 class-based interface for performing unsupervised subtype discovery, cross-cohort meta-clustering, gene-expression-based classification, and subsequent survival analysis to identify prognostically useful subtypes in pancreatic cancer and beyond. Two novel methods, Consensus Subtypes in Pancreatic Cancer (CSPC) and Pancreatic Cancer Overall Survival Predictor (PCOSP) are included for consensus-based meta-clustering and overall-survival prediction, respectively. Additionally, four published subtype classifiers and three published prognostic gene signatures are included to allow users to easily recreate published results, apply existing classifiers to new data, and benchmark the relative performance of new methods. The use of existing Bioconductor classes as input to all PDATK classes and methods enables integration with existing Bioconductor datasets, including the 21 pancreatic cancer patient cohorts available in the MetaGxPancreas data package. PDATK has been used to replicate results from Sandhu et al (2019) [https://doi.org/10.1200/cci.18.00102] and an additional paper is in the works using CSPC to validate subtypes from the included published classifiers, both of which use the data available in MetaGxPancreas. The inclusion of subtype centroids and prognostic gene signatures from these and other publications will enable researchers and clinicians to classify novel patient gene expression data, allowing the direct clinical application of the classifiers included in PDATK. 
Overall, PDATK provides a rich set of tools to identify and validate useful prognostic and molecular subtypes based on gene-expression data, benchmark new classifiers against existing ones, and apply discovered classifiers on novel patient data to inform clinical decision making.

Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.

geneexpression pharmacogenetics pharmacogenomics software classification survival clustering geneprediction

16.6 match 1 stars 4.31 score 17 scripts

rivolli

utiml:Utilities for Multi-Label Learning

Multi-label learning strategies and other procedures to support multi-label classification in R. The package provides a set of multi-label procedures such as sampling methods, transformation strategies, threshold functions, pre-processing techniques and evaluation metrics. A complete overview of the matter can be seen in Zhang, M. and Zhou, Z. (2014) <doi:10.1109/TKDE.2013.39> and Gibaja, E. and Ventura, S. (2015) A Tutorial on Multi-label Learning.

Maintained by Adriano Rivolli. Last updated 4 years ago.

10.9 match 28 stars 6.39 score 87 scripts

quanteda

quanteda.textmodels:Scaling Models and Classifiers for Textual Data

Scaling models and classifiers for sparse matrix objects representing textual data in the form of a document-feature matrix. Includes original implementations of 'Laver', 'Benoit', and Garry's (2003) <doi:10.1017/S0003055403000698>, 'Wordscores' model, the Perry and 'Benoit' (2017) <doi:10.48550/arXiv.1710.08963> class affinity scaling model, and the 'Slapin' and 'Proksch' (2008) <doi:10.1111/j.1540-5907.2008.00338.x> 'wordfish' model, as well as methods for correspondence analysis, latent semantic analysis, and fast Naive Bayes and linear 'SVMs' specially designed for sparse textual data.

Maintained by Kenneth Benoit. Last updated 1 month ago.

openblas cpp

7.3 match 42 stars 9.56 score 432 scripts

cran

Modeler:Classes and Methods for Training and Using Binary Prediction Models

Defines classes and methods to learn models and use them to predict binary outcomes. These are generic tools, but we also include specific examples for many common classifiers.

Maintained by Kevin R. Coombes. Last updated 2 months ago.

microarray clustering

19.4 match 3.48 score 1 dependents

europeanifcbgroup

iRfcb:Tools for Managing Imaging FlowCytobot (IFCB) Data

A comprehensive suite of tools for managing, processing, and analyzing data from the IFCB. I R FlowCytobot ('iRfcb') supports quality control, geospatial analysis, and preparation of IFCB data for publication in databases like <https://www.gbif.org>, <https://www.obis.org>, <https://emodnet.ec.europa.eu/en>, <https://shark.smhi.se/>, and <https://www.ecotaxa.org>. The package integrates with the MATLAB 'ifcb-analysis' tool, which is described in Sosik and Olson (2007) <doi:10.4319/lom.2007.5.204>, and provides features for working with raw, manually classified, and machine learning–classified image datasets. Key functionalities include image extraction, particle size distribution analysis, taxonomic data handling, and biomass concentration calculations, essential for plankton research.

Maintained by Anders Torstensson. Last updated 2 days ago.

11.7 match 1 stars 5.72 score

topepo

sparsediscrim:Sparse and Regularized Discriminant Analysis

A collection of sparse and regularized discriminant analysis methods intended for small-sample, high-dimensional data sets. The package features the High-Dimensional Regularized Discriminant Analysis classifier from Ramey et al. (2017) <arXiv:1602.01182>. Other classifiers include those from Dudoit et al. (2002) <doi:10.1198/016214502753479248>, Pang et al. (2009) <doi:10.1111/j.1541-0420.2009.01200.x>, and Tong et al. (2012) <doi:10.1093/bioinformatics/btr690>.

Maintained by Max Kuhn. Last updated 4 years ago.

16.1 match 3 stars 4.11 score 86 scripts

bioc

adverSCarial:adverSCarial, generate and analyze the vulnerability of scRNA-seq classifier to adversarial attacks

adverSCarial is an R package designed for generating and analyzing the vulnerability of scRNA-seq classifiers to adversarial attacks. The package is versatile and provides a format for integrating any type of classifier. It offers functions for studying and generating two types of attacks: the single-gene attack and the max-change attack. The single-gene attack involves making a small modification to the input to alter the classification. The max-change attack involves making a large modification to the input without changing its classification. The package provides a comprehensive solution for evaluating the robustness of scRNA-seq classifiers against adversarial attacks.

Maintained by Ghislain FIEVET. Last updated 5 months ago.

software singlecell transcriptomics classification

12.2 match 5.42 score 19 scripts

bergsmat

yamlet:Versatile Curation of Table Metadata

A YAML-based mechanism for working with table metadata. Supports compact syntax for creating, modifying, viewing, exporting, importing, displaying, and plotting metadata coded as column attributes. The 'yamlet' dialect is valid 'YAML' with defaults and conventions chosen to improve readability. See ?yamlet, ?decorate, ?modify, ?io_csv, and ?ggplot.decorated.

Maintained by Tim Bergsma. Last updated 22 days ago.

11.0 match 2 stars 5.99 score 60 scripts 1 dependents

rafaeljm

LibOPF:Design of Optimum-Path Forest Classifiers

The 'LibOPF' is a framework to develop pattern recognition techniques based on optimum-path forests (OPF), João P. Papa and Alexandre X. Falcão (2008) <doi:10.1007/978-3-540-89639-5_89>, with methods for supervised learning and data clustering.

Maintained by Rafael Junqueira Martarelli. Last updated 4 years ago.

20.7 match 1 stars 3.18 score

muvisu

biplotEZ:EZ-to-Use Biplots

Provides users with an EZ-to-use platform for representing data with biplots. Currently principal component analysis (PCA), canonical variate analysis (CVA) and simple correspondence analysis (CA) biplots are included. This is accompanied by various formatting options for the samples and axes. Alpha-bags and concentration ellipses are included for visual enhancements and interpretation. For an extensive discussion on the topic, see Gower, J.C., Lubbe, S. and le Roux, N.J. (2011, ISBN: 978-0-470-01255-0) Understanding Biplots. Wiley: Chichester.

Maintained by Sugnet Lubbe. Last updated 6 days ago.

fortran

7.8 match 7 stars 8.39 score 30 scripts 1 dependents

emeyers

NeuroDecodeR:Decode Information from Neural Activity

Neural decoding is a method of analyzing neural data that uses pattern classifiers to predict experimental conditions based on neural activity. 'NeuroDecodeR' is a system of objects that makes it easy to run neural decoding analyses. For more information on neural decoding see Meyers & Kreiman (2011) <doi:10.7551/mitpress/8404.003.0024>.

Maintained by Ethan Meyers. Last updated 1 year ago.

10.1 match 12 stars 6.49 score 17 scripts

bbuchsbaum

multivarious:Extensible Data Structures for Multivariate Analysis

Provides a set of basic and extensible data structures and functions for multivariate analysis, including dimensionality reduction techniques, projection methods, and preprocessing functions. The aim of this package is to offer a flexible and user-friendly framework for multivariate analysis that can be easily extended for custom requirements and specific data analysis tasks.

Maintained by Bradley Buchsbaum. Last updated 3 months ago.

18.4 match 3.53 score 17 scripts

leapigufpb

FuzzyClass:Fuzzy and Non-Fuzzy Classifiers

It provides classifiers which can be used for discrete variables and for continuous variables based on the Naive Bayes and Fuzzy Naive Bayes hypothesis. Those methods were developed by researchers belonging to the 'Laboratory of Technologies for Virtual Teaching and Statistics (LabTEVE)' and 'Laboratory of Applied Statistics to Image Processing and Geoprocessing (LEAPIG)' at 'Federal University of Paraiba, Brazil'. They considered some statistical distributions and their papers were published in the scientific literature, as for instance, the Gaussian classifier using fuzzy parameters, proposed by 'Moraes, Ferreira and Machado' (2021) <doi:10.1007/s40815-020-00936-4>.

Maintained by Jodavid Ferreira. Last updated 5 months ago.

fuzzy machine-learning

16.2 match 1 stars 4.00 score 10 scripts

wenjie2wang

abclass:Angle-Based Large-Margin Classifiers

Multi-category angle-based large-margin classifiers. See Zhang and Liu (2014) <doi:10.1093/biomet/asu017> for details.

Maintained by Wenjie Wang. Last updated 1 year ago.

openblas cpp openmp

21.1 match 2 stars 3.04 score 11 scripts

techtonique

learningmachine:Machine Learning with Explanations and Uncertainty Quantification

Regression-based Machine Learning with explanations and uncertainty quantification.

Maintained by T. Moudiki. Last updated 4 months ago.

conformal-prediction machine-learning machine-learning-algorithms machinelearning statistical-learning uncertainty-quantification cpp

11.5 match 5 stars 5.57 score 21 scripts

nourmarzouka

multiclassPairs:Build MultiClass Pair-Based Classifiers using TSPs or RF

A toolbox to train a single-sample classifier that uses in-sample feature relationships. The relationships are represented as feature1 < feature2 (e.g. gene1 < gene2). Two options are provided. The first is based on the 'switchBox' package, which uses the top-score pairs (TSP) algorithm. The second is a novel implementation based on the random forest (RF) algorithm. For simple problems we recommend the one-vs-rest TSP option, because it is simple and easy to interpret. For complex problems RF performs better. Both approaches first filter the features, then combine the filtered features into a list of all possible rules (i.e. rule1: feature1 < feature2, rule2: feature1 < feature3, etc.). The list of rules is then filtered, and the most important and informative rules are kept. The informative rules are assembled into a one-vs-rest model or an RF model. Each function in this package comes with a detailed description of the filtration and training methodology of each approach. Reference: Marzouka & Eriksson (2021) <doi:10.1093/bioinformatics/btab088>.

Maintained by Nour-al-dain Marzouka. Last updated 2 years ago.

classification

13.2 match 12 stars 4.82 score 11 scripts
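
The pairwise rules in the multiclassPairs entry above (feature1 < feature2) can be illustrated with a generic top-scoring-pair calculation. This is a minimal Python sketch of the general idea only, not the package's or 'switchBox's implementation; the function names `pair_scores` and `top_pair` and the toy data are hypothetical.

```python
from itertools import combinations


def pair_scores(samples, labels):
    """Score each feature pair (i, j) by how differently the rule
    'feature i < feature j' holds in the two classes."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    n_features = len(samples[0])
    scores = {}
    for i, j in combinations(range(n_features), 2):
        freq = []
        for c in classes:
            rows = [s for s, lab in zip(samples, labels) if lab == c]
            # Fraction of samples in class c for which the rule holds.
            freq.append(sum(1 for r in rows if r[i] < r[j]) / len(rows))
        scores[(i, j)] = abs(freq[0] - freq[1])
    return scores


def top_pair(samples, labels):
    """Return the feature pair with the largest score."""
    scores = pair_scores(samples, labels)
    return max(scores, key=scores.get)


# Toy data (hypothetical): rows are samples, columns are three features.
samples = [[1, 5, 3], [2, 6, 9], [7, 3, 4], [8, 2, 9]]
labels = ["A", "A", "B", "B"]
scores = pair_scores(samples, labels)
best = top_pair(samples, labels)
```

In this toy data the rule "feature 0 < feature 1" holds for every class A sample and for no class B sample, so it separates the classes perfectly and is the pair selected.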

bioc

MLInterfaces:Uniform interfaces to R machine learning procedures for data in Bioconductor containers

This package provides uniform interfaces to machine learning code for data in R and Bioconductor containers.

Maintained by Vincent Carey. Last updated 5 months ago.

classification clustering

8.3 match 7.63 score 79 scripts 6 dependents

luca-scr

mclust:Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation

Gaussian finite mixture models fitted via EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.

Maintained by Luca Scrucca. Last updated 11 months ago.

fortran openblas

5.2 match 21 stars 12.23 score 6.6k scripts 587 dependents

modeloriented

fairmodels:Flexible Tool for Bias Detection, Visualization, and Mitigation

Measure fairness metrics in one place for many models. Check how large a model's bias is towards different races, sexes, nationalities, etc. Use measures such as Statistical Parity and Equal odds to detect discrimination against unprivileged groups. Visualize the bias using a heatmap, radar plot, biplot, bar chart (and more!). Various pre-processing and post-processing bias mitigation algorithms are implemented. The package also supports calculating fairness metrics for regression models. Find more details in Wiśniewski and Biecek (2021) <arXiv:2104.00507>.

Maintained by Jakub Wiśniewski. Last updated 1 months ago.

explain-classifiers explainable-ml fairness fairness-comparison fairness-ml model-evaluation

7.5 match 86 stars 7.72 score 51 scripts 1 dependents

bioc

DirichletMultinomial:Dirichlet-Multinomial Mixture Model Machine Learning for Microbiome Data

Dirichlet-multinomial mixture models can be used to describe variability in microbial metagenomic data. This package is an interface to code originally made available by Holmes, Harris, and Quince, 2012, PLoS ONE 7(2): 1-15, as discussed further in the man page for this package, ?DirichletMultinomial.

Maintained by Martin Morgan. Last updated 5 months ago.

immunooncology microbiome sequencing clustering classification metagenomics gsl

5.2 match 11 stars 10.97 score 125 scripts 26 dependents

cran

class:Functions for Classification

Various functions for classification, including k-nearest neighbour, Learning Vector Quantization and Self-Organizing Maps.

Maintained by Brian Ripley. Last updated 2 months ago.

7.2 match 1 stars 7.83 score 2.2k dependents

mothur

phylotypr:Classifying DNA Sequences to Taxonomic Groupings

Classification-based analysis of DNA sequences to taxonomic groupings. This package primarily implements the Naive Bayesian Classifier from the Ribosomal Database Project. This approach has traditionally been used to classify 16S rRNA gene sequences to bacterial taxonomic outlines; however, it can be used for any type of gene sequence. The method was originally described by Wang, Garrity, Tiedje, and Cole in Applied and Environmental Microbiology 73(16):5261-7 <doi:10.1128/AEM.00062-07>. The package also provides functions to read in 'FASTA'-formatted sequence data.

Maintained by Pat Schloss. Last updated 23 days ago.

cpp

9.2 match 8 stars 6.08 score 5 scripts

bioc

geNetClassifier:Classify diseases and build associated gene networks using gene expression profiles

Comprehensive package to automatically train and validate a multi-class SVM classifier based on gene expression data. Provides transparent selection of gene markers, their coexpression networks, and an interface to query the classifier.

Maintained by Sara Aibar. Last updated 5 months ago.

classification differentialexpression microarray

12.7 match 4.38 score 1 scripts 2 dependents

easystats

performance:Assessment of Regression Models Performance

Utilities for computing measures to assess model quality, which are not directly provided by R's 'base' or 'stats' packages. These include e.g. measures like r-squared, intraclass correlation coefficient (Nakagawa, Johnson & Schielzeth (2017) <doi:10.1098/rsif.2017.0213>), root mean squared error or functions to check models for overdispersion, singularity or zero-inflation and more. Functions apply to a large variety of regression models, including generalized linear models, mixed effects models and Bayesian models. References: Lüdecke et al. (2021) <doi:10.21105/joss.03139>.

Maintained by Daniel Lüdecke. Last updated 18 days ago.

aic easystats hacktoberfest loo machine-learning mixed-models models performance r2 statistics

3.4 match 1.1k stars 16.17 score 4.3k scripts 47 dependents

relund

gMOIP:Tools for 2D and 3D Plots of Single and Multi-Objective Linear/Integer Programming Models

Make 2D and 3D plots of linear programming (LP), integer linear programming (ILP), or mixed integer linear programming (MILP) models with up to three objectives. Plots of both the solution and criterion space are possible, for instance the non-dominated (Pareto) set for bi-objective LP/ILP/MILP models (see vignettes for an overview). The package also contains a function for checking if a point is inside the convex hull.

Maintained by Lars Relund Nielsen. Last updated 5 months ago.

2d-plot 3d-plot bi-objective convex-hull integer-programming linear-programming math milp mixed-integer-programming multi-objective polytope tri-objective visualization

7.1 match 5 stars 7.83 score 79 scripts 3 dependents

moviedo5

fda.usc:Functional Data Analysis and Utilities for Statistical Computing

Routines for exploratory and descriptive analysis of functional data such as depth measurements, atypical curves detection, regression models, supervised classification, unsupervised classification and functional analysis of variance.

Maintained by Manuel Oviedo de la Fuente. Last updated 4 months ago.

functional-data-analysis fortran

5.7 match 12 stars 9.72 score 560 scripts 22 dependents

larssnip

microclass:Tools for taxonomic classification of prokaryotes

Functions for working with taxonomic classifications in R

Maintained by Lars Snipen. Last updated 1 years ago.

cpp

11.8 match 4 stars 4.68 score 20 scripts

ropensci

coder:Deterministic Categorization of Items Based on External Code Data

Fast categorization of items based on external code data identified by regular expressions. A typical use case considers patients with medically coded data, such as codes from the International Classification of Diseases ('ICD') or the Anatomic Therapeutic Chemical ('ATC') classification system. Functions of the package rely on a triad of objects: (1) case data with unit IDs and possible dates of interest; (2) external code data for corresponding units in (1), with optional dates of interest; and (3) a classification scheme ('classcodes' object) with regular expressions to identify and categorize relevant codes from (2). It is easy to introduce new classification schemes ('classcodes' objects) or to use default schemes included in the package. Use cases include patient categorization based on 'comorbidity indices' such as 'Charlson', 'Elixhauser', 'RxRisk V', or the 'comorbidity-polypharmacy' score (CPS), as well as adverse events after hip and knee replacement surgery.

Maintained by Erik Bulow. Last updated 2 years ago.

classification icd-10

8.7 match 22 stars 6.31 score 23 scripts

ips-lmu

emuR:Main Package of the EMU Speech Database Management System

Provide the EMU Speech Database Management System (EMU-SDMS) with database management, data extraction, data preparation and data visualization facilities. See <https://ips-lmu.github.io/The-EMU-SDMS-Manual/> for more details.

Maintained by Markus Jochim. Last updated 1 years ago.

7.9 match 24 stars 6.89 score 135 scripts 1 dependents

david-cortes

costsensitive:Cost-Sensitive Multi-Class Classification

Reduction-based techniques for cost-sensitive multi-class classification, in which each observation has a different cost for classifying it into one class, and the goal is to predict the class with the minimum expected cost for each new observation. Implements Weighted All-Pairs (Beygelzimer, Langford, & Zadrozny (2008) <doi:10.1007/978-0-387-79361-0_1>), Weighted One-Vs-Rest (Beygelzimer, Dani, Hayes, Langford, & Zadrozny (2005) <https://dl.acm.org/citation.cfm?id=1102358>) and Regression One-Vs-Rest. Works with arbitrary classifiers taking observation weights, or with regressors. Also implements cost-proportionate rejection sampling for working with classifiers that don't accept observation weights.

Maintained by David Cortes. Last updated 2 months ago.

cost-sensitive-classification multi-label-classification

10.2 match 47 stars 5.30 score 28 scripts
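
The "predict the class with the minimum expected cost" goal in the costsensitive entry above can be illustrated with a regression one-vs-rest style reduction: fit one regressor per class to predict that class's cost, then pick the class whose predicted cost is lowest. This is a minimal Python sketch of the general idea, assuming a single numeric feature and ordinary least-squares lines; it is not the package's API, and the names `fit_line`, `fit_cost_regressors` and `predict_min_cost` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def fit_cost_regressors(xs, cost_rows):
    """Fit one cost regressor per class; cost_rows[i][k] is the cost of
    assigning observation i to class k."""
    n_classes = len(cost_rows[0])
    return [fit_line(xs, [row[k] for row in cost_rows]) for k in range(n_classes)]


def predict_min_cost(models, x):
    """Return the index of the class with the lowest predicted cost."""
    predicted = [slope * x + intercept for slope, intercept in models]
    return min(range(len(predicted)), key=predicted.__getitem__)


# Toy data (hypothetical): class 0 gets costlier as x grows, class 1 cheaper.
xs = [0.0, 1.0, 2.0, 3.0]
cost_rows = [[0.0, 3.0], [1.0, 2.0], [2.0, 1.0], [3.0, 0.0]]
models = fit_cost_regressors(xs, cost_rows)
low_x_class = predict_min_cost(models, 0.5)
high_x_class = predict_min_cost(models, 2.5)
```

With these toy costs the predicted cost curves cross at x = 1.5, so small x values are assigned to class 0 and large ones to class 1.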

bearloga

MLPUGS:Multi-Label Prediction Using Gibbs Sampling (and Classifier Chains)

An implementation of classifier chains (CC's) for multi-label prediction. Users can employ an external package (e.g. 'randomForest', 'C50'), or supply their own. The package can train a single set of CC's or train an ensemble of CC's -- in parallel if running in a multi-core environment. New observations are classified using a Gibbs sampler since each unobserved label is conditioned on the others. The package includes methods for evaluating the predictions for accuracy and aggregating across iterations and models to produce binary or probabilistic classifications.

Maintained by Mikhail Popov. Last updated 5 years ago.

classification machine-learning mcmc multi-label-classification supervised-learning

11.1 match 11 stars 4.74 score 6 scripts

ropensci

gigs:Assess Fetal, Newborn, and Child Growth with International Standards

Convert between anthropometric measures and z-scores/centiles in multiple growth standards, and classify fetal, newborn, and child growth accordingly. With a simple interface to growth standards from the World Health Organisation and International Fetal and Newborn Growth Consortium for the 21st Century, gigs makes growth assessment easy and reproducible for clinicians, researchers and policy-makers.

Maintained by Simon R Parker. Last updated 25 days ago.

anthropometry growth-standards intergrowth who

11.9 match 4 stars 4.38 score 8 scripts

lucymcgowan

tidycode:Analyze Lines of R Code the Tidy Way

Analyze lines of R code using tidy principles. This allows you to input lines of R code and output a data frame with one row per function included. Additionally, it facilitates code classification via included lexicons.

Maintained by Lucy D'Agostino McGowan. Last updated 4 years ago.

8.0 match 32 stars 6.54 score 36 scripts

bioc

rRDP:Interface to the RDP Classifier

This package installs and interfaces the naive Bayesian classifier for 16S rRNA sequences developed by the Ribosomal Database Project (RDP). With this package the classifier trained with the standard training set can be used or a custom classifier can be trained.

Maintained by Michael Hahsler. Last updated 5 months ago.

genetics sequencing infrastructure classification microbiome immunooncology alignment sequencematching dataimport bayesian bioconductor bioinformatics openjdk

10.4 match 4 stars 5.00 score 6 scripts

bioc

CHETAH:Fast and accurate scRNA-seq cell type identification

CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is an accurate, selective and fast scRNA-seq classifier. Classification is guided by a reference dataset, preferably also a scRNA-seq dataset. By hierarchical clustering of the reference data, CHETAH creates a classification tree that enables a step-wise, top-to-bottom classification. Using a novel stopping rule, CHETAH classifies the input cells to the cell types of the references and to "intermediate types": more general classifications that ended in an intermediate node of the tree.

Maintained by Jurrian de Kanter. Last updated 5 months ago.

classification rnaseq singlecell clustering geneexpression immunooncology

7.0 match 44 stars 7.27 score 70 scripts

bioc

SingleR:Reference-Based Single-Cell RNA-Seq Annotation

Performs unbiased cell type recognition from single-cell RNA sequencing data, by leveraging reference transcriptomic datasets of pure cell types to infer the cell of origin of each single cell independently.

Maintained by Aaron Lun. Last updated 28 days ago.

software singlecell geneexpression transcriptomics classification clustering annotation bioconductor singler cpp

4.0 match 182 stars 12.60 score 2.1k scripts 1 dependents

abichat

evabic:Evaluation of Binary Classifiers

Evaluates the performance of binary classifiers. Computes confusion measures (TP, TN, FP, FN), derived measures (TPR, FDR, accuracy, F1, DOR, ..), and area under the curve. Outputs are well suited for nested dataframes.

Maintained by Antoine Bichat. Last updated 3 years ago.

classifier measures predictors roc-curve statistics

13.9 match 6 stars 3.62 score 14 scripts
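
The confusion measures and derived measures named in the evabic entry above (TP, TN, FP, FN, TPR, FDR, accuracy, F1, DOR) follow standard definitions. This is a minimal Python sketch of those definitions only, not the package's interface; the function names `confusion_measures` and `derived_measures` and the example vectors are hypothetical.

```python
def confusion_measures(predicted, actual):
    """Count TP, TN, FP and FN for two equal-length sequences of booleans."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    return {
        "TP": sum(1 for p, a in pairs if p and a),
        "TN": sum(1 for p, a in pairs if not p and not a),
        "FP": sum(1 for p, a in pairs if p and not a),
        "FN": sum(1 for p, a in pairs if not p and a),
    }


def derived_measures(cm):
    """Compute common ratios from the four confusion counts."""
    tp, tn, fp, fn = cm["TP"], cm["TN"], cm["FP"], cm["FN"]

    def ratio(num, den):
        # Undefined ratios are reported as NaN rather than raising.
        return num / den if den else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),            # true positive rate (recall)
        "FDR": ratio(fp, tp + fp),            # false discovery rate
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "F1": ratio(2 * tp, 2 * tp + fp + fn),
        "DOR": ratio(tp * tn, fp * fn),       # diagnostic odds ratio
    }


# Hypothetical predictions and ground truth for six observations.
predicted = [True, True, False, True, False, False]
actual = [True, False, False, True, True, False]
cm = confusion_measures(predicted, actual)
measures = derived_measures(cm)
```

For these six observations the counts are TP = 2, TN = 2, FP = 1 and FN = 1, from which every derived measure follows.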

rfastofficial

Rfast:A Collection of Efficient and Extremely Fast R Functions

A collection of fast (utility) functions for data analysis. Column and row wise means, medians, variances, minimums, maximums, many t, F and G-square tests, many regressions (normal, logistic, Poisson), are some of the many fast functions. References: a) Tsagris M., Papadakis M. (2018). Taking R to its limits: 70+ tips. PeerJ Preprints 6:e26605v1 <doi:10.7287/peerj.preprints.26605v1>. b) Tsagris M. and Papadakis M. (2018). Forward regression in R: from the extreme slow to the extreme fast. Journal of Data Science, 16(4): 771--780. <doi:10.6339/JDS.201810_16(4).00006>. c) Chatzipantsiou C., Dimitriadis M., Papadakis M. and Tsagris M. (2020). Extremely Efficient Permutation and Bootstrap Hypothesis Tests Using R. Journal of Modern Applied Statistical Methods, 18(2), eP2898. <doi:10.48550/arXiv.1806.10947>. d) Tsagris M., Papadakis M., Alenazi A. and Alzeley O. (2024). Computationally Efficient Outlier Detection for High-Dimensional Data Using the MDP Algorithm. Computation, 12(9): 185. <doi:10.3390/computation12090185>. e) Tsagris M. and Papadakis M. (2025). Fast and light-weight energy statistics using the R package Rfast. <doi:10.48550/arXiv.2501.02849>.

Maintained by Manos Papadakis. Last updated 17 days ago.

openblas cpp openmp

3.9 match 147 stars 12.54 score 1.2k scripts 166 dependents

rueda-lab

iC10:A Copy Number and Expression-Based Classifier for Breast Tumours

Implementation of the classifier described in the paper Ali HR et al (2014) <doi:10.1186/s13059-014-0431-1>. It uses copy number and/or expression from breast cancer data, trains Tibshirani's 'pamr' classifier with the features available and predicts the iC10 group.

Maintained by Oscar M Rueda. Last updated 8 months ago.

16.5 match 2.94 score 12 scripts 4 dependents

bioc

MLSeq:Machine Learning Interface for RNA-Seq Data

This package applies several machine learning methods, including SVM, bagSVM, Random Forest and CART to RNA-Seq data.

Maintained by Gokmen Zararsiz. Last updated 5 months ago.

immunooncology sequencing rnaseq classification clustering

10.0 match 4.81 score 27 scripts 1 dependents

heal-kgs

STICr:Process Stream Temperature, Intermittency, and Conductivity (STIC) Sensor Data

A collection of functions for processing raw data from Stream Temperature, Intermittency, and Conductivity (STIC) loggers. 'STICr' (pronounced "sticker") includes functions for tidying, calibrating, classifying, and doing quality checks on data from STIC sensors. Some package functionality is described in Wheeler/Zipper et al. (2023) <doi:10.31223/X5636K>.

Maintained by Sam Zipper. Last updated 3 months ago.

9.0 match 4 stars 5.26 score 9 scripts

zarquon42b

Morpho:Calculations and Visualisations Related to Geometric Morphometrics

A toolset for Geometric Morphometrics and mesh processing. This includes (among other things) mesh deformations based on reference points, permutation tests, detection of outliers, processing of sliding semi-landmarks and semi-automated surface landmark placement.

Maintained by Stefan Schlager. Last updated 5 months ago.

openblas cpp openmp

4.7 match 51 stars 10.00 score 218 scripts 13 dependents

bioc

MethPed:A DNA methylation classifier tool for the identification of pediatric brain tumor subtypes

Classification of pediatric tumors into biologically defined subtypes is challenging, and multifaceted approaches are needed. For this aim, we developed a diagnostic classifier based on DNA methylation profiles. We offer MethPed as an easy-to-use toolbox that allows researchers and clinical diagnosticians to test single samples as well as large cohorts for subclass prediction of pediatric brain tumors. The current version of MethPed can classify the following tumor diagnoses/subgroups: Diffuse Intrinsic Pontine Glioma (DIPG), Ependymoma, Embryonal tumors with multilayered rosettes (ETMR), Glioblastoma (GBM), Medulloblastoma (MB) - Group 3 (MB_Gr3), Group 4 (MB_Gr4), Group WNT (MB_WNT), Group SHH (MB_SHH) and Pilocytic Astrocytoma (PiloAstro).

Maintained by Helena Carén. Last updated 5 months ago.

immunooncology dnamethylation classification epigenetics

11.8 match 4.00 score 1 scripts

bioc

DaMiRseq:Data Mining for RNA-seq data: normalization, feature selection and classification

The DaMiRseq package offers a tidy pipeline of data mining procedures to identify transcriptional biomarkers and exploit them for both binary and multi-class classification purposes. The package accepts any kind of data presented as a table of raw counts and allows including both continuous and factorial variables that occur with the experimental setting. A series of functions enables the user to clean up the data by filtering genomic features and samples, to adjust data by identifying and removing unwanted sources of variation (i.e. batches and confounding factors) and to select the best predictors for modeling. Finally, a "stacking" ensemble learning technique is applied to build a robust classification model. Every step includes a checkpoint that the user may exploit to assess the effects of data management by looking at diagnostic plots, such as clustering and heatmaps, RLE boxplots, MDS or correlation plots.

Maintained by Mattia Chiesa. Last updated 5 months ago.

sequencing rnaseq classification immunooncology openjdk

8.8 match 5.32 score 7 scripts 1 dependents

cran

evclass:Evidential Distance-Based Classification

Different evidential classifiers, which provide outputs in the form of Dempster-Shafer mass functions. The methods are: the evidential K-nearest neighbor rule, the evidential neural network, radial basis function neural networks, logistic regression, feed-forward neural networks.

Maintained by Thierry Denoeux. Last updated 1 years ago.

23.5 match 1 stars 2.00 score

rmaia

pavo:Perceptual Analysis, Visualization and Organization of Spectral Colour Data

A cohesive framework for the spectral and spatial analysis of colour described in Maia, Eliason, Bitton, Doucet & Shawkey (2013) <doi:10.1111/2041-210X.12069> and Maia, Gruson, Endler & White (2019) <doi:10.1111/2041-210X.13174>.

Maintained by Thomas White. Last updated 1 months ago.

4.8 match 72 stars 9.72 score 151 scripts 1 dependents

bioc

cleanUpdTSeq:cleanUpdTSeq cleans up artifacts from polyadenylation sites from oligo(dT)-mediated 3' end RNA sequencing data

This package implements a Naive Bayes classifier for accurately differentiating true polyadenylation sites (pA sites) from oligo(dT)-mediated 3' end sequencing such as PAS-Seq, PolyA-Seq and RNA-Seq by filtering out false polyadenylation sites, mainly due to oligo(dT)-mediated internal priming during reverse transcription. The classifier is highly accurate and outperforms other heuristic methods.

Maintained by Jianhong Ou. Last updated 2 months ago.

sequencing 3 end sequencing polyadenylation site internal priming

10.9 match 4.26 score 8 scripts 1 dependents

rstudio

tfestimators:Interface to 'TensorFlow' Estimators

Interface to 'TensorFlow' Estimators <https://www.tensorflow.org/guide/estimator>, a high-level API that provides implementations of many different model types including linear models and deep neural networks.

Maintained by Tomasz Kalinowski. Last updated 3 years ago.

5.5 match 57 stars 8.42 score 170 scripts

jinghuazhao

gap:Genetic Analysis Package

As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many projects, which is also in line with the name (gap).

Maintained by Jing Hua Zhao. Last updated 16 days ago.

genetics imputation lmm fortran

3.9 match 12 stars 11.88 score 448 scripts 16 dependents

civisanalytics

civis:R Client for the 'Civis Platform API'

A convenient interface for making requests directly to the 'Civis Platform API' <https://www.civisanalytics.com/platform/>. Full documentation available 'here' <https://civisanalytics.github.io/civis-r/>.

Maintained by Peter Cooman. Last updated 2 months ago.

5.6 match 16 stars 7.84 score 144 scripts

microsoft

wpa:Tools for Analysing and Visualising Viva Insights Data

Opinionated functions that enable easier and faster analysis of Viva Insights data. There are three main types of functions in 'wpa': (1) Standard functions create a 'ggplot' visual or a summary table based on a specific Viva Insights metric; (2) Report Generation functions generate HTML reports on a specific analysis area, e.g. Collaboration; (3) Other miscellaneous functions cover more specific applications (e.g. Subject Line text mining) of Viva Insights data. This package adheres to 'tidyverse' principles and works well with the pipe syntax. 'wpa' is built with beginner-to-intermediate R users in mind, and is optimised for simplicity.

Maintained by Martin Chan. Last updated 4 months ago.

workplace-analytics

6.6 match 30 stars 6.69 score 39 scripts 1 dependents

hkestler

TunePareto:Multi-Objective Parameter Tuning for Classifiers

Generic methods for parameter tuning of classification algorithms using multiple scoring functions (Muessel et al. (2012), <doi:10.18637/jss.v046.i05>).

Maintained by Hans Kestler. Last updated 1 years ago.

12.4 match 1 stars 3.52 score 92 scripts 2 dependents

bioc

clustifyr:Classifier for Single-cell RNA-seq Using Cell Clusters

Package designed to aid in classifying cells from single-cell RNA sequencing data using external reference data (e.g., bulk RNA-seq, scRNA-seq, microarray, gene lists). A variety of correlation based methods and gene list enrichment methods are provided to assist cell type assignment.

Maintained by Rui Fu. Last updated 5 months ago.

singlecell annotation sequencing microarray geneexpression assign-identities clusters marker-genes rna-seq single-cell-rna-seq

4.5 match 119 stars 9.63 score 296 scripts

maxmenssen

predint:Prediction Intervals

An implementation of prediction intervals for overdispersed count data, for overdispersed binomial data and for linear random effects models.

Maintained by Max Menssen. Last updated 4 months ago.

14.4 match 3.00 score 4 scripts

fberding

aifeducation:Artificial Intelligence for Education

In social and educational settings, the use of Artificial Intelligence (AI) is a challenging task. Relevant data is often only available in handwritten forms, or the use of data is restricted by privacy policies. This often leads to small data sets. Furthermore, in the educational and social sciences, data is often unbalanced in terms of frequencies. To support educators as well as educational and social researchers in using the potential of AI for their work, this package provides a unified interface for neural nets in 'PyTorch' to deal with natural language problems. In addition, the package ships with a shiny app, providing a graphical user interface. This makes AI usable for people without skills in writing Python/R scripts. The tools integrate existing mathematical and statistical methods for dealing with small data sets via pseudo-labeling (e.g. Cascante-Bonilla et al. (2020) <doi:10.48550/arXiv.2001.06001>) and imbalanced data via the creation of synthetic cases (e.g. Bunkhumpornpat et al. (2012) <doi:10.1007/s10489-011-0287-y>). Performance evaluation of AI is connected to measures from content analysis which educational and social researchers are generally more familiar with (e.g. Berding & Pargmann (2022) <doi:10.30819/5581>, Gwet (2014) <ISBN:978-0-9708062-8-4>, Krippendorff (2019) <doi:10.4135/9781071878781>). Estimation of energy consumption and CO2 emissions during model training is done with the 'python' library 'codecarbon'. Finally, all objects created with this package allow trained AI models to be shared with other people.

Maintained by Berding Florian. Last updated 1 months ago.

cpp

9.6 match 4.48 score 8 scripts

avrodrigues

naturaList:Classify Occurrences by Confidence Levels in the Species ID

Classify occurrence records based on confidence levels of species identification. In addition, implement tools to filter occurrences inside grid cells and to manually check for possible errors with an interactive shiny application.

Maintained by Arthur Vinicius Rodrigues. Last updated 1 years ago.

9.2 match 4.66 score 23 scripts

fangzhou-xie

rethnicity:Predicting Ethnic Group from Names

Implementation of the race/ethnicity prediction method, described in "rethnicity: An R package for predicting ethnicity from names" by Fangzhou Xie (2022) <doi:10.1016/j.softx.2021.100965> and "Rethnicity: Predicting Ethnicity from Names" by Fangzhou Xie (2021) <doi:10.48550/arXiv.2109.09228>.

Maintained by Fangzhou Xie. Last updated 3 days ago.

ethnicity-classifier ethnicity-prediction lstm cpp

7.5 match 9 stars 5.66 score 17 scripts

ropensci

nuts:Convert European Regional Data

Motivated by changing administrative boundaries over time, the 'nuts' package can convert European regional data with NUTS codes between versions (2006, 2010, 2013, 2016 and 2021) and levels (NUTS 1, NUTS 2 and NUTS 3). The package uses spatial interpolation as in Lam (1983) <doi:10.1559/152304083783914958> based on granular (100m x 100m) area, population and land use data provided by the European Commission's Joint Research Center.

Maintained by Moritz Hennicke. Last updated 5 months ago.

europe european-union eurostat nomenclature nuts nuts-codes nuts-regions regional-data

7.2 match 8 stars 5.86 score 3 scripts

bioc

GSgalgoR:An Evolutionary Framework for the Identification and Study of Prognostic Gene Expression Signatures in Cancer

A multi-objective optimization algorithm for disease sub-type discovery based on a non-dominated sorting genetic algorithm. The 'Galgo' framework combines the advantages of clustering algorithms for grouping heterogeneous 'omics' data and the searching properties of genetic algorithms for feature selection. The algorithm searches for the optimal number of clusters, considering the features that maximize the survival difference between sub-types while keeping cluster consistency high.

Maintained by Carlos Catania. Last updated 5 months ago.

geneexpression transcription clustering classification survival

7.7 match 15 stars 5.48 score 6 scripts

usccana

netdiffuseR:Analysis of Diffusion and Contagion Processes on Networks

Empirical statistical analysis, visualization and simulation of diffusion and contagion processes on networks. The package implements algorithms for calculating network diffusion statistics such as transmission rate, hazard rates, exposure models, network threshold levels, infectiousness (contagion), and susceptibility. The package is inspired by work published in Valente, et al., (2015) <DOI:10.1016/j.socscimed.2015.10.001>; Valente (1995) <ISBN: 9781881303213>, Myers (2000) <DOI:10.1086/303110>, Iyengar and others (2011) <DOI:10.1287/mksc.1100.0566>, Burt (1987) <DOI:10.1086/228667>; among others.

Maintained by George Vega Yon. Last updated 3 months ago.

contagion diffusion-network network-analysis network-visualization openblas cpp openmp

4.7 match 88 stars 8.88 score 217 scripts

cran

soundgen:Sound Synthesis and Acoustic Analysis

Performs parametric synthesis of sounds with harmonic and noise components such as animal vocalizations or human voice. Also offers tools for audio manipulation and acoustic analysis, including pitch tracking, spectral analysis, audio segmentation, pitch and formant shifting, etc. Includes four interactive web apps for synthesizing and annotating audio, manually correcting pitch contours, and measuring formant frequencies. Reference: Anikin (2019) <doi:10.3758/s13428-018-1095-7>.

Maintained by Andrey Anikin. Last updated 2 months ago.

8.6 match 1 stars 4.86 score 110 scripts 2 dependents

sollano

forestmangr:Forest Mensuration and Management

Processing forest inventory data with methods such as simple random sampling, stratified random sampling and systematic sampling. There are also functions for yield and growth predictions and model fitting, linear and nonlinear grouped data fitting, and statistical tests. References: Kershaw Jr., Ducey, Beers and Husch (2016). <doi:10.1002/9781118902028>.

Maintained by Sollano Rabelo Braga. Last updated 3 months ago.

5.2 match 17 stars 7.97 score 378 scripts

bioc

clst:Classification by local similarity threshold

Package for modified nearest-neighbor classification based on calculation of a similarity threshold distinguishing within-group from between-group comparisons.

Maintained by Noah Hoffman. Last updated 5 months ago.

classification

10.9 match 3.78 score 10 scripts 1 dependents

r-lidar

lasR:Fast and Pipeable Airborne LiDAR Data Tools

Fast and pipeable airborne lidar processing tools. Read/write 'las' and 'laz' files, computation of metrics in area based approach, point filtering, normalization, individual tree segmentation and other manipulations in a powerful and versatile processing chain.

Maintained by Jean-Romain Roussel. Last updated 21 days ago.

gdal cpp openmp

6.0 match 17 stars 6.76 score 26 scripts

egenn

rtemis:Machine Learning and Visualization

Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.

Maintained by E.D. Gennatas. Last updated 1 months ago.

data-science data-visualization machine-learning machine-learning-library visualization

5.6 match 145 stars 7.09 score 50 scripts 2 dependents

cran

MASS:Support Functions and Datasets for Venables and Ripley's MASS

Functions and datasets to support Venables and Ripley, "Modern Applied Statistics with S" (4th edition, 2002).

Maintained by Brian Ripley. Last updated 16 days ago.

3.7 match 19 stars 10.53 score 11k dependents

szelepcsenyi

macroBiome:A Tool for Mapping the Distribution of the Biomes and Bioclimate

Procedures for simulating biomes by equilibrium vegetation models, with a special focus on paleoenvironmental applications. Three widely used equilibrium biome models are currently implemented in the package: the Holdridge Life Zone (HLZ) system (Holdridge 1947, <doi:10.1126/science.105.2727.367>), the Köppen-Geiger classification (KGC) system (Köppen 1936, <https://koeppen-geiger.vu-wien.ac.at/pdf/Koppen_1936.pdf>) and the BIOME model (Prentice et al. 1992, <doi:10.2307/2845499>). Three climatic forest-steppe models are also implemented. An approach for estimating monthly time series of relative sunshine duration from temperature and precipitation data (Yin 1999, <doi:10.1007/s007040050111>) is also adapted, allowing process-based biome models to be combined with high-resolution paleoclimate simulation datasets (e.g., CHELSA-TraCE21k v1.0 dataset: <https://chelsa-climate.org/chelsa-trace21k/>).

Maintained by Zoltán Szelepcsényi. Last updated 2 years ago.

14.2 match 2.70 score 2 scripts

mhahsler

arulesCBA:Classification Based on Association Rules

Provides the infrastructure for association rule-based classification including the algorithms CBA, CMAR, CPAR, C4.5, FOIL, PART, PRM, RCAR, and RIPPER to build associative classifiers. Hahsler et al (2019) <doi:10.32614/RJ-2019-048>.

Maintained by Michael Hahsler. Last updated 7 months ago.

association-rules classification

6.9 match 3 stars 5.49 score 47 scripts 1 dependents

ropensci

canaper:Categorical Analysis of Neo- And Paleo-Endemism

Provides functions to analyze the spatial distribution of biodiversity, in particular categorical analysis of neo- and paleo-endemism (CANAPE) as described in Mishler et al (2014) <doi:10.1038/ncomms5473>. 'canaper' conducts statistical tests to determine the types of endemism that occur in a study area while accounting for the evolutionary relationships of species.

Maintained by Joel H. Nitta. Last updated 2 years ago.

biodiversity canape

7.0 match 7 stars 5.38 score 23 scripts

bioc

CRImage:CRImage a package to classify cells and calculate tumour cellularity

CRImage provides functionality to process and analyze images, in particular to classify cells in biological images. Furthermore, in the context of tumor images, it provides functionality to calculate tumour cellularity.

Maintained by Henrik Failmezger. Last updated 5 months ago.

cellbiology classification

9.4 match 4.00 score 6 scripts

adamlilith

fasterRaster:Faster Raster and Spatial Vector Processing Using 'GRASS GIS'

Processing of large-in-memory/large-on-disk rasters and spatial vectors using 'GRASS GIS' <https://grass.osgeo.org/>. Most functions in the 'terra' package are recreated. Processing of medium-sized and smaller spatial objects will nearly always be faster using 'terra' or 'sf', but for large-in-memory/large-on-disk objects, 'fasterRaster' may be faster. To use most of the functions, you must have the stand-alone version (not the 'OSGeo4W' installer version) of 'GRASS GIS' 8.0 or higher.

Maintained by Adam B. Smith. Last updated 18 days ago.

aspect distance fragmentation fragmentation-indices gis grass grass-gis raster raster-projection rasterize slope topography vectorization

4.9 match 58 stars 7.69 score 8 scripts

bioc

PrInCE:Predicting Interactomes from Co-Elution

PrInCE (Predicting Interactomes from Co-Elution) uses a naive Bayes classifier trained on dataset-derived features to recover protein-protein interactions from co-elution chromatogram profiles. This package contains the R implementation of PrInCE.

Maintained by Michael Skinnider. Last updated 5 months ago.

proteomics systemsbiology networkinference

5.8 match 8 stars 6.38 score 25 scripts

felixthestudent

cellpypes:Cell Type Pipes for Single-Cell RNA Sequencing Data

Annotate single-cell RNA sequencing data manually based on marker gene thresholds. Find cell type rules (gene+threshold) through exploration, and use the popular piping operator '%>%' to reconstruct complex cell type hierarchies. 'cellpypes' models technical noise to find positive and negative cells for a given expression threshold and returns cell type labels or pseudobulks. Cite this package as Frauhammer (2022) <doi:10.5281/zenodo.6555728> and visit <https://github.com/FelixTheStudent/cellpypes> for tutorials and newest features.

Maintained by Felix Frauhammer. Last updated 1 years ago.

celltype-annotation classification-algorithm scrnaseq single-cell-rna-seq

8.4 match 51 stars 4.41 score 8 scripts

sym33

RecordLinkage:Record Linkage Functions for Linking and Deduplicating Data Sets

Provides functions for linking and deduplicating data sets. Methods based on a stochastic approach are implemented as well as classification algorithms from the machine learning domain. For details, see our paper "The RecordLinkage Package: Detecting Errors in Data" Sariyar M / Borg A (2010) <doi:10.32614/RJ-2010-017>.

Maintained by Murat Sariyar. Last updated 2 years ago.

4.0 match 6 stars 9.00 score 454 scripts 8 dependents

cran

e1071:Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien

Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, generalized k-nearest neighbour ...

Maintained by David Meyer. Last updated 6 months ago.

cpp

2.5 match 28 stars 14.46 score 19k scripts 2.0k dependents

ropensci

pixelclasser:Classifies Image Pixels by Colour

Contains functions to classify the pixels of an image file (jpeg or tiff) by their colour. It implements a simple form of the technique known as Support Vector Machine, adapted to this particular problem.

Maintained by Carlos Real. Last updated 4 years ago.

12.0 match 2 stars 3.00 score 8 scripts

leef-uzh

LEEF.analysis:Access Functions, Tests and Basic Analysis of the RRD Data from the LEEF Project

Provides simple access functions to read data out of the sqlite RRD database. SQL queries can be configured in a yaml config file and used.

Maintained by Rainer M. Krug. Last updated 1 months ago.

14.8 match 2.44 score 23 scripts

mhahsler

stream:Infrastructure for Data Stream Mining

A framework for data stream modeling and associated data mining tasks such as clustering and classification. The development of this package was supported in part by NSF IIS-0948893, NSF CMMI 1728612, and NIH R21HG005912. Hahsler et al (2017) <doi:10.18637/jss.v076.i14>.

Maintained by Michael Hahsler. Last updated 4 days ago.

data-stream-clustering datastream stream-mining cpp

3.5 match 39 stars 10.05 score 132 scripts 3 dependents

andreasdominik

som.nn:Topological k-NN Classifier Based on Self-Organising Maps

A topological version of k-NN: an abstract model is built as a 2-dimensional self-organising map. Samples of unknown class are predicted by mapping them onto the SOM and analysing the class membership of neurons in the neighbourhood.

Maintained by Andreas Dominik. Last updated 12 months ago.

14.7 match 2.40 score 28 scripts

khliland

multiblock:Multiblock Data Fusion in Statistics and Machine Learning

Functions and datasets to support Smilde, Næs and Liland (2021, ISBN: 978-1-119-60096-1) "Multiblock Data Fusion in Statistics and Machine Learning - Applications in the Natural and Life Sciences". This implements and imports a large collection of methods for multiblock data analysis with common interfaces, result- and plotting functions, several real data sets and six vignettes covering a range of different applications.

Maintained by Kristian Hovde Liland. Last updated 2 months ago.

cpp

5.3 match 14 stars 6.68 score 19 scripts

nano-optics

planar:Multilayer Optics

Solves the electromagnetic problem of reflection and transmission at a planar multilayer interface. Also computed are the decay rates and emission profile for a dipolar emitter.

Maintained by Baptiste Auguié. Last updated 3 years ago.

openblas cpp

6.0 match 7 stars 5.83 score 65 scripts

cran

bnlearn:Bayesian Network Structure Learning, Parameter Learning and Inference

Bayesian network structure learning, parameter learning and inference. This package implements constraint-based (PC, GS, IAMB, Inter-IAMB, Fast-IAMB, MMPC, Hiton-PC, HPC), pairwise (ARACNE and Chow-Liu), score-based (Hill-Climbing and Tabu Search) and hybrid (MMHC, RSMAX2, H2PC) structure learning algorithms for discrete, Gaussian and conditional Gaussian networks, along with many score functions and conditional independence tests. The Naive Bayes and the Tree-Augmented Naive Bayes (TAN) classifiers are also implemented. Some utility functions (model comparison and manipulation, random data generation, arc orientation testing, simple and advanced plots) are included, as well as support for parameter estimation (maximum likelihood and Bayesian) and inference, conditional probability queries, cross-validation, bootstrap and model averaging. Development snapshots with the latest bugfixes are available from <https://www.bnlearn.com/>.

Maintained by Marco Scutari. Last updated 2 months ago.

openblas

4.5 match 57 stars 7.72 score 32 dependents

ejosymart

sizeMat:Estimate Size at Sexual Maturity

Estimate morphometric and gonadal size at sexual maturity for organisms, usually fish and invertebrates. It includes methods for classification based on relative growth (using principal components analysis, hierarchical clustering, discriminant analysis), logistic regression (Frequentist or Bayes), parameter estimation and some basic plots.

Maintained by Josymar Torrejon-Magallanes. Last updated 1 years ago.

allometric-variables gonad-maturity morphometric-maturity

7.3 match 4 stars 4.72 score 26 scripts

mrshoenel

mmb:Arbitrary Dependency Mixed Multivariate Bayesian Models

Supports Bayesian models with full and partial (hence arbitrary) dependencies between random variables. Discrete and continuous variables are supported, and conditional joint probabilities and probability densities are estimated using Kernel Density Estimation (KDE). The full general form, which implements an extension to Bayes' theorem, as well as the simple form, which is just a Bayesian network, both support regression through segmentation and KDE and estimation of probability or relative likelihood of discrete or continuous target random variables. This package also provides true statistical distance measures based on Bayesian models. Furthermore, these measures can be used in neighborhood searches and to estimate the similarity and distance between data points. Related work is by Bayes (1763) <doi:10.1098/rstl.1763.0053> and by Scutari (2010) <doi:10.18637/jss.v035.i03>.

Maintained by Sebastian Hönel. Last updated 4 years ago.

bayes-classifier kernel-density-estimation neighborhood-search regression-models

9.2 match 3.70 score 5 scripts

cran

Allspice:RNA-Seq Profile Classifier

We developed a lightweight machine learning tool for RNA profiling of acute lymphoblastic leukemia (ALL); however, it can be used for any problem where multiple classes need to be identified from multi-dimensional data. The methodology is described in Makinen V-P, Rehn J, Breen J, Yeung D, White DL (2022) Multi-cohort transcriptomic subtyping of B-cell acute lymphoblastic leukemia, International Journal of Molecular Sciences 23:4574, <doi:10.3390/ijms23094574>. The classifier contains optimized mean profiles of the classes (centroids) as observed in the training data, and new samples are matched to these centroids using the shortest Euclidean distance. Centroids derived from a dataset of 1,598 ALL patients are included, but users can train the models with their own data as well. The output includes both numerical and visual presentations of the classification results. Samples with mixed features from multiple classes or atypical values are also identified.

Maintained by Ville-Petteri Makinen. Last updated 2 years ago.

17.0 match 2.00 score
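
The centroid matching described in the Allspice entry above (assign a new sample to the class whose mean profile is nearest in Euclidean distance) can be illustrated with a minimal sketch. This is a language-neutral illustration in Python, not the package's R interface; the function name `nearest_centroid`, the subtype labels, and the two-feature profiles are all hypothetical.

```python
import math


def nearest_centroid(sample, centroids):
    """Return (label, distance) for the centroid closest to `sample`.

    `centroids` maps a class label to its mean profile; every profile
    must have the same length as `sample`.
    """
    best_label, best_dist = None, math.inf
    for label, centroid in centroids.items():
        dist = math.dist(sample, centroid)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Hypothetical mean expression profiles over two features.
centroids = {"subtype_A": (1.0, 5.0), "subtype_B": (4.0, 1.0)}
label, dist = nearest_centroid((3.5, 1.5), centroids)
print(label, round(dist, 3))
```

The returned distance can also serve as a rough indicator of atypical samples: a sample that is far from every centroid matches no class well.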

cran

SSLR:Semi-Supervised Classification, Regression and Clustering Methods

Provides a collection of techniques for semi-supervised classification, regression and clustering. In a semi-supervised problem, both labeled and unlabeled data are used to train a classifier. The package includes a collection of semi-supervised learning techniques (self-training, co-training, democratic, decision tree, random forest, 'S3VM', etc.) with a fairly intuitive interface that is easy to use.

Maintained by Francisco Jesús Palomares Alabarce. Last updated 4 years ago.

cpp

9.2 match 1 stars 3.64 score 73 scripts

bioc

ClassifyR:A framework for cross-validated classification problems, with applications to differential variability and differential distribution testing

The software formalises a framework for classification and survival model evaluation in R. There are four stages: data transformation, feature selection, model training, and prediction. The requirements of variable types and variable order are fixed, but specialised variables for functions can also be provided. The framework is wrapped in a driver loop that reproducibly carries out a number of cross-validation schemes. Functions for differential mean, differential variability, and differential distribution are included. Additional functions may be developed by the user, by creating an interface to the framework.

Maintained by Dario Strbenac. Last updated 6 days ago.

classification survival cpp

4.0 match 5 stars 8.36 score 45 scripts 3 dependents

grunwaldlab

metacoder:Tools for Parsing, Manipulating, and Graphing Taxonomic Abundance Data

Reads, plots, and manipulates large taxonomic data sets, like those generated from modern high-throughput sequencing, such as metabarcoding (i.e. amplification metagenomics, 16S metagenomics, etc). It provides a tree-based visualization called "heat trees" used to depict statistics for every taxon in a taxonomy using color and size. It also provides various functions to do common tasks in microbiome bioinformatics on data in the 'taxmap' format defined by the 'taxa' package. The 'metacoder' package is described in the publication by Foster et al. (2017) <doi:10.1371/journal.pcbi.1005404>.

Maintained by Zachary Foster. Last updated 1 months ago.

community-diversity hierarchical metabarcoding pcr taxonomy trees cpp

3.5 match 140 stars 9.64 score 328 scripts

roaldarbol

animovement:An R toolbox for analysing animal movement across space and time

An R toolbox for analysing animal movement across space and time.

Maintained by Mikkel Roald-Arbøl. Last updated 2 months ago.

animal-behaviour animal-movement neuroethology neuroscience

6.8 match 10 stars 4.81 score 8 scripts

ericarcher

banter:BioAcoustic eveNT classifiER

Create a hierarchical acoustic event species classifier out of multiple call type detectors as described in Rankin et al (2017) <doi:10.1111/mms.12381>.

Maintained by Eric Archer. Last updated 1 years ago.

acoustics bioacoustics cetaceans classification dolphins machine-learning noaa random-forest species-identification supervised-learning supervised-machine-learning whales jags cpp

7.7 match 9 stars 4.22 score 37 scripts

r-lidar

lidR:Airborne LiDAR Data Manipulation and Visualization for Forestry Applications

Airborne LiDAR (Light Detection and Ranging) interface for data manipulation and visualization. Read/write 'las' and 'laz' files, computation of metrics in area based approach, point filtering, artificial point reduction, classification from geographic data, normalization, individual tree segmentation and other manipulations.

Maintained by Jean-Romain Roussel. Last updated 1 months ago.

als forestry las laz lidar point-cloud remote-sensing openblas cpp openmp

2.3 match 623 stars 14.47 score 844 scripts 8 dependents

kliegr

arc:Association Rule Classification

Implements the Classification-based on Association Rules (CBA) algorithm for association rule classification. The package, also described in Hahsler et al. (2019) <doi:10.32614/RJ-2019-048>, contains several convenience methods that automatically set CBA parameters (minimum confidence, minimum support), and it also natively handles numeric attributes by integrating a pre-discretization step. The rule generation phase is handled by the 'arules' package. To further decrease the size of the CBA models produced by the 'arc' package, postprocessing by the 'qCBA' package is suggested.

Maintained by Tomas Kliegr. Last updated 6 months ago.

6.4 match 7 stars 5.09 score 39 scripts 1 dependents

bioc

bioCancer:Interactive Multi-Omics Cancers Data Visualization and Analysis

This package is a Shiny app for interactively visualizing and analysing multi-assay cancer genomic data.

Maintained by Karim Mezhoud. Last updated 5 months ago.

gui datarepresentation network multiplecomparison pathways reactome visualization geneexpression genetarget analysis biocancer-interface cancer cancer-studies rmarkdown

5.4 match 20 stars 5.95 score 7 scripts

bioc

tidytof:Analyze High-dimensional Cytometry Data Using Tidy Data Principles

This package implements an interactive, scientific analysis pipeline for high-dimensional cytometry data built using tidy data principles. It is specifically designed to play well with both the tidyverse and Bioconductor software ecosystems, with functionality for reading/writing data files, data cleaning, preprocessing, clustering, visualization, modeling, and other quality-of-life functions. tidytof implements a "grammar" of high-dimensional cytometry data analysis.

Maintained by Timothy Keyes. Last updated 5 months ago.

singlecell flowcytometry bioinformatics cytometry data-science single-cell tidyverse cpp

4.4 match 19 stars 7.26 score 35 scripts

jaroslav-kuchar

rCBA:CBA Classifier for R

Provides implementations of a classifier based on "Classification Based on Associations" (CBA). It can be used for building classification models from association rules. Rules are pruned in the order of precedence given by the sort criteria and a default rule is added. The final classifier labels provided instances. CBA was originally proposed by Liu, B., Hsu, W. and Ma, Y. Integrating Classification and Association Rule Mining. Proceedings KDD-98, New York, 27-31 August. AAAI. pp80-86 (1998, ISBN:1-57735-070-7).

Maintained by Jaroslav Kuchar. Last updated 6 years ago.

openjdk

7.7 match 7 stars 4.14 score 39 scripts
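
The labelling step described in the rCBA entry above (rules ordered by precedence, with a default rule as fallback) can be sketched as a first-match rule list. This is a simplified Python illustration, not the package's R API: it omits the pruning step, and the rule format, the `build_classifier` helper, and the precedence order (confidence first, then support) are assumptions made for the example.

```python
def build_classifier(rules, default_label):
    """Build a first-match rule-list classifier.

    Each rule is a dict with an 'antecedent' (attribute -> required value),
    a 'label', and 'confidence' and 'support' scores. The ordering used here
    (higher confidence first, ties broken by higher support) is an assumed
    precedence for illustration.
    """
    ordered = sorted(rules, key=lambda r: (-r["confidence"], -r["support"]))

    def classify(instance):
        for rule in ordered:
            # A rule fires when every condition in its antecedent holds.
            if all(instance.get(k) == v for k, v in rule["antecedent"].items()):
                return rule["label"]
        return default_label  # default rule: no other rule matched

    return classify


# Hypothetical rules over a toy weather data set.
rules = [
    {"antecedent": {"outlook": "sunny"}, "label": "play",
     "confidence": 0.8, "support": 0.3},
    {"antecedent": {"outlook": "sunny", "windy": True}, "label": "stay",
     "confidence": 0.9, "support": 0.1},
]
classify = build_classifier(rules, default_label="unknown")
print(classify({"outlook": "sunny", "windy": True}))
```

Because the first matching rule wins, the more specific high-confidence rule takes priority over the general one for windy, sunny instances.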

eworx-org

labourR:Classify Multilingual Labour Market Free-Text to Standardized Hierarchical Occupations

Allows the user to map multilingual free-text of occupations to a broad range of standardized classifications. The package facilitates automatic occupation coding (see, e.g., Gweon et al. (2017) <doi:10.1515/jos-2017-0006> and Turrell et al. (2019) <doi:10.3386/w25837>), where the ISCO to ESCO mapping is exploited to extend the occupations hierarchy, Le Vrang et al. (2014) <doi:10.1109/mc.2014.283>. Document vectorization is performed using the multilingual ESCO corpus. A method based on the nearest neighbor search is used to suggest the closest ISCO occupation.

Maintained by Alexandros Kouretsis. Last updated 3 years ago.

5.0 match 28 stars 6.29 score 23 scripts 1 dependents

ajwills72

grt:General Recognition Theory

Functions to generate and analyze data for psychology experiments based on the General Recognition Theory.

Maintained by Andy Wills. Last updated 8 years ago.

13.4 match 2.34 score 44 scripts

wanchanglin

mt:Metabolomics Data Analysis Toolbox

Functions for metabolomics data analysis: data preprocessing, orthogonal signal correction, PCA analysis, PCA-DA analysis, PLS-DA analysis, classification, feature selection, correlation analysis, data visualisation and re-sampling strategies.

Maintained by Wanchang Lin. Last updated 1 years ago.

6.9 match 3 stars 4.57 score 50 scripts

friendly

HistData:Data Sets from the History of Statistics and Data Visualization

The 'HistData' package provides a collection of small data sets that are interesting and important in the history of statistics and data visualization. The goal of the package is to make these available, both for instructional use and for historical research. Some of these present interesting challenges for graphics or analysis in R.

Maintained by Michael Friendly. Last updated 10 months ago.

graphics historical-data

3.4 match 63 stars 9.19 score 732 scripts 2 dependents

gzt

MixMatrix:Classification with Matrix Variate Normal and t Distributions

Provides sampling and density functions for matrix variate normal, t, and inverted t distributions; ML estimation for matrix variate normal and t distributions using the EM algorithm, including some restrictions on the parameters; and classification by linear and quadratic discriminant analysis for matrix variate normal and t distributions described in Thompson et al. (2019) <doi:10.1080/10618600.2019.1696208>. Performs clustering with matrix variate normal and t mixture models.

Maintained by Geoffrey Thompson. Last updated 6 months ago.

openblas cpp openmp

5.0 match 3 stars 6.19 score 29 scripts 3 dependents

mskogholt

fastNaiveBayes:Extremely Fast Implementation of a Naive Bayes Classifier

This is an extremely fast implementation of a Naive Bayes classifier. This package is currently the only package that supports a Bernoulli distribution, a Multinomial distribution, and a Gaussian distribution, making it suitable for binary features, frequency counts, and numerical features. Another feature is the support of a mix of different event models. Only numerical variables are allowed; however, categorical variables can be transformed into dummies and used with the Bernoulli distribution. The implementation is largely based on the paper "A comparison of event models for Naive Bayes anti-spam e-mail filtering" written by K.M. Schneider (2003) <doi:10.3115/1067807.1067848>. Any issues can be submitted to: <https://github.com/mskogholt/fastNaiveBayes/issues>.

Maintained by Martin Skogholt. Last updated 5 years ago.

5.2 match 42 stars 5.96 score 43 scripts

mlr-org

mlr3torch:Deep Learning with 'mlr3'

Deep Learning library that extends the mlr3 framework by building upon the 'torch' package. It allows users to conveniently build, train, and evaluate deep learning models without having to worry about low-level details. Custom architectures can be created using the graph language defined in 'mlr3pipelines'.

Maintained by Sebastian Fischer. Last updated 1 months ago.

data-science deep-learning machine-learning mlr3 torch

4.0 match 42 stars 7.63 score 78 scripts

uligges

klaR:Classification and Visualization

Miscellaneous functions for classification and visualization, e.g. regularized discriminant analysis, sknn() kernel-density naive Bayes, an interface to 'svmlight' and stepclass() wrapper variable selection for supervised classification, partimat() visualization of classification rules and shardsplot() of cluster results as well as kmodes() clustering for categorical data, corclust() variable clustering, variable extraction from different variable clustering models and weight of evidence preprocessing.

Maintained by Uwe Ligges. Last updated 1 years ago.

4.0 match 5 stars 7.61 score 1.4k scripts 13 dependents

bioc

dada2:Accurate, high-resolution sample inference from amplicon sequencing data

The dada2 package infers exact amplicon sequence variants (ASVs) from high-throughput amplicon sequencing data, replacing the coarser and less accurate OTU clustering approach. The dada2 pipeline takes as input demultiplexed fastq files, and outputs the sequence variants and their sample-wise abundances after removing substitution and chimera errors. Taxonomic classification is available via a native implementation of the RDP naive Bayesian classifier, and species-level assignment to 16S rRNA gene fragments by exact matching.

Maintained by Benjamin Callahan. Last updated 5 months ago.

immunooncology microbiome sequencing classification metagenomics amplicon bioconductor bioinformatics metabarcoding taxonomy cpp

2.3 match 485 stars 13.17 score 3.0k scripts 4 dependents

alanarnholt

BSDA:Basic Statistics and Data Analysis

Data sets for the book "Basic Statistics and Data Analysis" by Larry J. Kitchens.

Maintained by Alan T. Arnholt. Last updated 2 years ago.

3.3 match 7 stars 9.11 score 1.3k scripts 6 dependents

r-spatial

classInt:Choose Univariate Class Intervals

Selected commonly used methods for choosing univariate class intervals for mapping or other graphics purposes.

Maintained by Roger Bivand. Last updated 3 months ago.

fortran

1.9 match 34 stars 16.02 score 3.2k scripts 1.2k dependents

hkestler

ORION:Ordinal Relations

Functions to handle ordinal relations reflected within the feature space. These functions allow searching for ordinal relations in multi-class datasets. One can check whether proposed relations are reflected in a specific feature representation. Furthermore, it provides functions to filter, organize and further analyze those ordinal relations.

Maintained by HA Kestler. Last updated 3 years ago.

9.3 match 3.23 score 17 scripts

r-tensorflow

autokeras:R Interface to 'AutoKeras'

R Interface to 'AutoKeras' <https://autokeras.com/>. 'AutoKeras' is an open source software library for Automated Machine Learning (AutoML). The ultimate goal of AutoML is to provide easily accessible deep learning tools to domain experts with limited data science or machine learning background. 'AutoKeras' provides functions to automatically search for architecture and hyperparameters of deep learning models.

Maintained by Juan Cruz Rodriguez. Last updated 4 years ago.

autodl automatic-machine-learning automl deep-learning keras machine-learning tensorflow

5.5 match 73 stars 5.34 score

mlcollyer

RRPP:Linear Model Evaluation with Randomized Residuals in a Permutation Procedure

Linear model calculations are made for many random versions of data. Using residual randomization in a permutation procedure, sums of squares are calculated over many permutations to generate empirical probability distributions for evaluating model effects. Additionally, coefficients, statistics, fitted values, and residuals generated over many permutations can be used for various procedures including pairwise tests, prediction, classification, and model comparison. This package should provide most tools one could need for the analysis of high-dimensional data, especially in ecology and evolutionary biology, but certainly other fields, as well.

Maintained by Michael Collyer. Last updated 26 days ago.

3.0 match 4 stars 9.84 score 173 scripts 7 dependents

cran

SeqDetect:Sequence and Latent Process Detector

The sequence detector in this package contains a specific automaton model that can be used to learn, detect, and trace data and process sequences. The automaton model is described in Krleža, Vrdoljak, Brčić (2019) <doi:10.1109/ACCESS.2019.2955245>. This research has been partly supported under the Competitiveness and Cohesion Operational Programme from the European Regional Development Fund, as part of the Integrated Anti-Fraud System project no. KK.01.2.1.01.0041. This research has also been partly supported by the European Regional Development Fund under the grant KK.01.1.1.01.0009.

Maintained by Dalibor Krleža. Last updated 5 years ago.

cpp

14.6 match 2.00 score 2 scripts

andregustavom

mlquantify:Algorithms for Class Distribution Estimation

Quantification is a prominent machine learning task that has received an increasing amount of attention in recent years. The objective is to predict the class distribution of a data sample. This package is a collection of machine learning algorithms for class distribution estimation. It includes algorithms from different paradigms of quantification. These methods are described in the paper: A. Maletzke, W. Hassan, D. dos Reis, and G. Batista. The importance of the test set size in quantification assessment. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI20, pages 2640–2646, 2020. <doi:10.24963/ijcai.2020/366>.

Maintained by Andre Maletzke. Last updated 3 years ago.

8.1 match 7 stars 3.54 score 1 scripts

trevorhastie

mda:Mixture and Flexible Discriminant Analysis

Mixture and flexible discriminant analysis, multivariate adaptive regression splines (MARS), BRUTO, and vector-response smoothing splines. Hastie, Tibshirani and Friedman (2009) "Elements of Statistical Learning (second edition, chap 12)" Springer, New York.

Maintained by Trevor Hastie. Last updated 4 months ago.

fortran

3.8 match 3 stars 7.60 score 428 scripts 17 dependents

brry

berryFunctions:Function Collection Related to Plotting and Hydrology

Draw horizontal histograms, color scattered points by 3rd dimension, enhance date- and log-axis plots, zoom in X11 graphics, trace errors and warnings, use the unit hydrograph in a linear storage cascade, convert lists to data.frames and arrays, fit multiple functions.

Maintained by Berry Boessenkool. Last updated 1 months ago.

3.0 match 13 stars 9.43 score 350 scripts 16 dependents

bioc

canceR:A Graphical User Interface for accessing and modeling the Cancer Genomics Data of MSKCC

The package is a user-friendly interface based on the cgdsr and other modeling packages to explore, compare, and analyse all available Cancer Data (Clinical data, Gene Mutation, Gene Methylation, Gene Expression, Protein Phosphorylation, Copy Number Alteration) hosted by the Computational Biology Center at Memorial-Sloan-Kettering Cancer Center (MSKCC).

Maintained by Karim Mezhoud. Last updated 5 months ago.

gui geneexpression clustering go genesetenrichment kegg multiplecomparison cancer cancer-data gene gene-expression gene-methylation gene-mutation gene-sets methylation mskcc mutations tcltk

5.4 match 7 stars 5.25 score 17 scripts

declaredesign

fabricatr:Imagine Your Data Before You Collect It

Helps you imagine your data before you collect it. Hierarchical data structures and correlated data can be easily simulated, either from random number generators or by resampling from existing data sources. This package is faster with 'data.table' and 'mvnfast' installed.

Maintained by Graeme Blair. Last updated 1 months ago.

3.4 match 93 stars 8.29 score 234 scripts 5 dependents

bioc

TrIdent:TrIdent - Transduction Identification

The `TrIdent` R package automates the analysis of transductomics data by detecting, classifying, and characterizing read coverage patterns associated with potential transduction events. Transductomics is a DNA sequencing-based method for the detection and characterization of transduction events in pure cultures and complex communities. Transductomics relies on mapping sequencing reads from a viral-like particle (VLP)-fraction of a sample to contigs assembled from the metagenome (whole-community) of the same sample. Reads from bacterial DNA carried by VLPs will map back to the bacterial contigs of origin creating read coverage patterns indicative of ongoing transduction.

Maintained by Jessie Maier. Last updated 13 days ago.

coverage metagenomics patternlogic classification sequencing bacteriophage horizontal-gene-transfer pattern-matching phage sequencing-coverage transduction transductomics virus-like-particle

5.5 match 2 stars 5.04 score 7 scripts

julienmoeys

soiltexture:Functions for Soil Texture Plot, Classification and Transformation

"The Soil Texture Wizard" is a set of R functions designed to produce texture triangles (also called texture plots, texture diagrams, texture ternary plots), and to classify and transform soil texture data. These functions allow plotting virtually any soil texture triangle (classification) in any triangle geometry (isosceles, right-angled triangles, etc.). This set of functions is expected to be useful to people using soil texture data from different soil texture classifications or different particle size systems. Many (> 15) texture triangles from all around the world are predefined in the package. A simple text-based graphical user interface is provided: soiltexture_gui().

Maintained by Julien Moeys. Last updated 1 years ago.

3.9 match 28 stars 7.11 score 136 scripts 1 dependents

bioc

maftools:Summarize, Analyze and Visualize MAF Files

Analyze and visualize Mutation Annotation Format (MAF) files from large-scale sequencing studies. This package provides various functions to perform the most commonly used analyses in cancer genomics and to create feature-rich, customizable visualizations with minimal effort.

Maintained by Anand Mayakonda. Last updated 5 months ago.

datarepresentation dnaseq visualization drivermutation variantannotation featureextraction classification somaticmutation sequencing functionalgenomics survival bioinformatics cancer-genome-atlas cancer-genomics genomics maf-files tcga curl bzip2 xz-utils zlib

1.9 match 459 stars 14.63 score 948 scripts 18 dependents

dmarchette

cccd:Class Cover Catch Digraphs

Class Cover Catch Digraphs, neighborhood graphs, and relatives.

Maintained by David J. Marchette. Last updated 3 years ago.

12.8 match 1 stars 2.12 score 131 scripts

cran

PoiClaClu:Classification and Clustering of Sequencing Data Based on a Poisson Model

Implements the methods described in the paper, Witten (2011) Classification and Clustering of Sequencing Data using a Poisson Model, Annals of Applied Statistics 5(4) 2493-2518.

Maintained by Daniela Witten. Last updated 6 years ago.

7.0 match 3.81 score 107 scripts 2 dependents

easystats

parameters:Processing of Model Parameters

Utilities for processing the parameters of various statistical models. Beyond computing p values, CIs, and other indices for a wide variety of models (see list of supported models using the function 'insight::supported_models()'), this package implements features like bootstrapping or simulating of parameters and models, feature reduction (feature extraction and variable selection) as well as functions to describe data and variable characteristics (e.g. skewness, kurtosis, smoothness or distribution).

Maintained by Daniel Lüdecke. Last updated 2 days ago.

beta bootstrap ci confidence-intervals data-reduction easystats fa feature-extraction feature-reduction hacktoberfest parameters pca pvalues regression-models robust-statistics standardize standardized-estimates statistical-models

1.7 match 453 stars 15.65 score 1.8k scripts 56 dependents

peter-t-ruehr

forceR:Force Measurement Analyses

For cleaning and analysis of graphs, such as animal closing force measurements. 'forceR' was initially written and optimized to deal with insect bite force measurements, but can be used for any time series. Includes a full workflow to load, plot and crop data, correct amplifier and baseline drifts, identify individual peak shapes (bites), rescale (normalize) peak curves, and find best polynomial fits to describe and analyze force curve shapes.

Maintained by Peter T. Rühr. Last updated 12 months ago.

7.3 match 3.70 score 10 scripts

lorenc5

RTextTools:Automatic Text Classification via Supervised Learning

A machine learning package for automatic text classification that makes it simple for novice users to get started with machine learning, while allowing experienced users to easily experiment with different settings and algorithm combinations. The package includes eight algorithms for ensemble classification (svm, slda, boosting, bagging, random forests, glmnet, decision trees, neural networks), comprehensive analytics, and thorough documentation.

Maintained by Loren Collingwood. Last updated 5 years ago.

7.0 match 1 stars 3.84 score 772 scripts

pbourkey

polymapR:Linkage Analysis in Outcrossing Polyploids

Creation of linkage maps in polyploid species from marker dosage scores of an F1 cross from two heterozygous parents. Currently works for outcrossing diploid, autotriploid, autotetraploid and autohexaploid species, as well as segmental allotetraploids. Methods are described in a manuscript of Bourke et al. (2018) <doi:10.1093/bioinformatics/bty371>. Since version 1.1.0, both discrete and probabilistic genotypes are acceptable input; for more details on the latter see Liao et al. (2021) <doi:10.1007/s00122-021-03834-x>.

Maintained by Peter Bourke. Last updated 10 months ago.

6.6 match 1 stars 4.03 score 54 scripts

bioc

a4Classif:Automated Affymetrix Array Analysis Classification Package

Functionalities for classification of Affymetrix microarray data, integrating within the Automated Affymetrix Array Analysis set of packages.

Maintained by Laure Cougnaud. Last updated 5 months ago.

microarray geneexpression classification

7.0 match 3.78 score 1 scripts 1 dependents

viroli

quantileDA:Quantile Classifier

Code for centroid, median and quantile classifiers.

Maintained by Cinzia Viroli. Last updated 12 months ago.

26.4 match 1.00 score 10 scripts

shaunpwilkinson

insect:Informatic Sequence Classification Trees

Provides tools for probabilistic taxon assignment with informatic sequence classification trees. See Wilkinson et al (2018) <doi:10.7287/peerj.preprints.26812v1>.

Maintained by Shaun Wilkinson. Last updated 4 years ago.

4.5 match 14 stars 5.80 score 91 scripts

ballings

AUC:Threshold Independent Performance Measures for Probabilistic Classifiers

Various functions to compute the area under the curve of selected measures: The area under the sensitivity curve (AUSEC), the area under the specificity curve (AUSPC), the area under the accuracy curve (AUACC), and the area under the receiver operating characteristic curve (AUROC). Support for visualization and partial areas is included.

Maintained by Michel Ballings. Last updated 3 years ago.

4.7 match 5.37 score 424 scripts 7 dependents
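
As a rough illustration of the kind of measure the AUC entry above describes, the following sketch computes the area under the ROC curve by sweeping a threshold over the scores and integrating with the trapezoidal rule. It is a minimal Python sketch, not the package's implementation; the function names `roc_points` and `trapezoid_area` and the toy data are assumptions made for this example.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) obtained by lowering the threshold.

    labels use 1 for positive and 0 for negative; tied scores are
    processed together so they contribute a single ROC point.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every observation that shares this score.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def trapezoid_area(points):
    """Integrate a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical classifier scores and true labels (illustrative only).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
auroc = trapezoid_area(roc_points(scores, labels))
print(round(auroc, 4))
```

Because the area is taken over all thresholds, the result does not depend on any single cut-off, which is what "threshold independent" refers to in the package title.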

mabelc

ssc:Semi-Supervised Classification Methods

Provides a collection of self-labeled techniques for semi-supervised classification. In semi-supervised classification, both labeled and unlabeled data are used to train a classifier. This learning paradigm has obtained promising results, specifically in the presence of a reduced set of labeled examples. This package implements a collection of self-labeled techniques to construct a classification model. This family of techniques enlarges the original labeled set using the most confident predictions to classify unlabeled data. The techniques implemented can be applied to classification problems in several domains by the specification of a supervised base classifier. At low ratios of labeled data, it can be shown to perform better than classical supervised classifiers.

Maintained by Christoph Bergmeir. Last updated 5 years ago.

4.8 match 9 stars 5.22 score 62 scripts 1 dependents
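
The self-labeled idea in the ssc entry above (enlarging the labeled set with the most confident predictions on unlabeled data) can be sketched roughly as follows. This is a minimal Python sketch of a self-training loop, assuming a 1-D nearest-centroid base classifier and a margin-based confidence; both choices, all function names, and the toy data are assumptions for illustration and are not taken from the package.

```python
def centroids(xs, ys):
    """Fit the base classifier: the mean of the 1-D feature per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


def predict_with_confidence(model, x):
    """Return (label, confidence) for x.

    Confidence is the gap between the distances to the two nearest
    centroids, so the model must contain at least two classes.
    """
    dists = sorted((abs(x - m), c) for c, m in model.items())
    (d0, label), (d1, _) = dists[0], dists[1]
    return label, d1 - d0


def self_training(labeled_x, labeled_y, unlabeled_x, per_round=1):
    """Repeatedly move the most confidently predicted points into the labeled set."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    while pool:
        model = centroids(xs, ys)
        scored = [(predict_with_confidence(model, x), x) for x in pool]
        scored.sort(key=lambda item: -item[0][1])  # most confident first
        for (label, _), x in scored[:per_round]:
            xs.append(x)
            ys.append(label)
            pool.remove(x)
    return centroids(xs, ys), xs, ys


# Two labeled points and five unlabeled ones (illustrative data only).
model, xs, ys = self_training([0.0, 10.0], ["a", "b"], [1.0, 2.0, 8.0, 9.0, 4.0])
assigned = dict(zip(xs, ys))
print(model, assigned)
```

Any supervised learner that exposes a confidence score could take the place of the nearest-centroid model here, which mirrors the description's point that the base classifier is chosen by the user.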

cran

Compositional:Compositional Data Analysis

Regression, classification, contour plots, hypothesis testing and fitting of distributions for compositional data are some of the functions included. We further include functions for percentages (or proportions). The standard textbook for such data is John Aitchison's (1986) "The statistical analysis of compositional data". Relevant papers include: a) Tsagris M.T., Preston S. and Wood A.T.A. (2011). "A data-based power transformation for compositional data". Fourth International Workshop on Compositional Data Analysis. <doi:10.48550/arXiv.1106.1451> b) Tsagris M. (2014). "The k-NN algorithm for compositional data: a revised approach with and without zero values present". Journal of Data Science, 12(3): 519--534. <doi:10.6339/JDS.201407_12(3).0008>. c) Tsagris M. (2015). "A novel, divergence based, regression for compositional data". Proceedings of the 28th Panhellenic Statistics Conference, 15-18 April 2015, Athens, Greece, 430--444. <doi:10.48550/arXiv.1511.07600>. d) Tsagris M. (2015). "Regression analysis with compositional data containing zero values". Chilean Journal of Statistics, 6(2): 47--57. <https://soche.cl/chjs/volumes/06/02/Tsagris(2015).pdf>. e) Tsagris M., Preston S. and Wood A.T.A. (2016). "Improved supervised classification for compositional data using the alpha-transformation". Journal of Classification, 33(2): 243--261. <doi:10.1007/s00357-016-9207-5>. f) Tsagris M., Preston S. and Wood A.T.A. (2017). "Nonparametric hypothesis testing for equality of means on the simplex". Journal of Statistical Computation and Simulation, 87(2): 406--422. <doi:10.1080/00949655.2016.1216554>. g) Tsagris M. and Stewart C. (2018). "A Dirichlet regression model for compositional data with zeros". Lobachevskii Journal of Mathematics, 39(3): 398--412. <doi:10.1134/S1995080218030198>. h) Alenazi A. (2019). "Regression for compositional data with compositional data as predictor variables with or without zero values". Journal of Data Science, 17(1): 219--238. <doi:10.6339/JDS.201901_17(1).0010>. i) Tsagris M. and Stewart C. (2020). "A folded model for compositional data analysis". Australian and New Zealand Journal of Statistics, 62(2): 249--277. <doi:10.1111/anzs.12289>. j) Alenazi A.A. (2022). "f-divergence regression models for compositional data". Pakistan Journal of Statistics and Operation Research, 18(4): 867--882. <doi:10.18187/pjsor.v18i4.3969>. k) Tsagris M. and Stewart C. (2022). "A Review of Flexible Transformations for Modeling Compositional Data". In Advances and Innovations in Statistics and Data Science, pp. 225--234. <doi:10.1007/978-3-031-08329-7_10>. l) Alenazi A. (2023). "A review of compositional data analysis and recent advances". Communications in Statistics--Theory and Methods, 52(16): 5535--5567. <doi:10.1080/03610926.2021.2014890>. m) Tsagris M., Alenazi A. and Stewart C. (2023). "Flexible non-parametric regression models for compositional response data with zeros". Statistics and Computing, 33(106). <doi:10.1007/s11222-023-10277-5>. n) Tsagris M. (2025). "Constrained least squares simplicial-simplicial regression". Statistics and Computing, 35(27). <doi:10.1007/s11222-024-10560-z>. o) Sevinc V. and Tsagris M. (2024). "Energy Based Equality of Distributions Testing for Compositional Data". <doi:10.48550/arXiv.2412.05199>.

Maintained by Michail Tsagris. Last updated 2 months ago.

6.9 match 3 stars 3.64 score 4 dependents

bioc

GrafGen:Classification of Helicobacter Pylori Genomes

Classifies Helicobacter pylori genomes according to genetic distance from nine reference populations. The nine reference populations are hpgpAfrica, hpgpAfrica-distant, hpgpAfroamerica, hpgpEuroamerica, hpgpMediterranea, hpgpEurope, hpgpEurasia, hpgpAsia, and hpgpAklavik86-like. The vertex populations are Africa, Europe and Asia.

Maintained by William Wheeler. Last updated 2 months ago.

genetics software genomeannotation classification cpp

5.3 match 4.65 score 2 scripts

aebilgrau

GMCM:Fast Estimation of Gaussian Mixture Copula Models

Unsupervised Clustering and Meta-analysis using Gaussian Mixture Copula Models.

Maintained by Anders Ellern Bilgrau. Last updated 3 years ago.

clustering gaussian-mixture-models meta-analysis rank unsupervised-cluster-analysis openblas cpp

5.3 match 15 stars 4.62 score 56 scripts

andyliaw-mrk

randomForest:Breiman and Cutler's Random Forests for Classification and Regression

Classification and regression based on a forest of trees using random inputs, based on Breiman (2001) <DOI:10.1023/A:1010933404324>.

Maintained by Andy Liaw. Last updated 6 months ago.

fortran

2.0 match 47 stars 12.11 score 35k scripts 282 dependents

diogoferrari

hdpGLM:Hierarchical Dirichlet Process Generalized Linear Models

Implementation of MCMC algorithms to estimate the Hierarchical Dirichlet Process Generalized Linear Model (hdpGLM) presented in the paper Ferrari (2020) Modeling Context-Dependent Latent Heterogeneity, Political Analysis <DOI:10.1017/pan.2019.13> and <doi:10.18637/jss.v107.i10>.

Maintained by Diogo Ferrari. Last updated 1 years ago.

dirichlet-process-mixtures hierarchical-clustering nonparametric nonparametricbayes npb semi-parametric openblas cpp

5.0 match 12 stars 4.78 score 5 scripts

olechnwin

DIME:Differential Identification using Mixture Ensemble

A robust method for identifying differential binding sites in ChIP-seq (Chromatin Immunoprecipitation Sequencing) data from two samples. It considers an ensemble of finite mixture models combined with a local false discovery rate (fdr), allowing for flexible modeling of data. The Differential Identification using Mixture Ensemble (DIME) method is described in Taslim et al. (2011) <doi:10.1093/bioinformatics/btr165>.

Maintained by Cenny Taslim. Last updated 3 years ago.

9.0 match 2.63 score 43 scripts

renkun-ken

rlist:A Toolbox for Non-Tabular Data Manipulation

Provides a set of functions for data manipulation with list objects, including mapping, filtering, grouping, sorting, updating, searching, and other useful functions. Most functions are designed to be pipeline friendly so that data processing with lists can be chained.

Maintained by Kun Ren. Last updated 2 years ago.

1.7 match 206 stars 13.73 score 2.2k scripts 123 dependents

ying-ju

basemodels:Baseline Models for Classification and Regression

Provides equivalent functions for the dummy classifier and regressor from the 'Python' 'scikit-learn' library. Our goal is to allow R users to easily identify baseline performance for their classification and regression problems. Our baseline models use no predictors, and are useful in cases of class imbalance, multiclass classification, and when users want to quickly identify how much their statistical and machine learning models improve over several baseline models. We use a "better" default (proportional guessing) for the dummy classifier than the 'Python' implementation ("prior", which is the most frequent class in the training set). The functions in the package can be used on their own, or through the methods named 'dummy_regressor' and 'dummy_classifier', which can be used within the caret package pipeline.

Maintained by Ying-Ju Chen. Last updated 2 years ago.

6.3 match 3.70 score 7 scripts
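
To make the two baselines named in the basemodels entry above concrete, here is a minimal Python sketch of a predictor-free classifier that either always returns the most frequent training class or guesses in proportion to the training class frequencies. The function names and the toy data are assumptions for illustration; they are not the package's or scikit-learn's API.

```python
import random
from collections import Counter


def fit_class_frequencies(labels):
    """Return {class: relative frequency} estimated from the training labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def predict_most_frequent(freqs, n):
    """Baseline A: always predict the most frequent training class."""
    top = max(freqs, key=freqs.get)
    return [top] * n


def predict_proportional(freqs, n, rng):
    """Baseline B: guess each label at random, weighted by class frequency."""
    classes = list(freqs)
    weights = [freqs[c] for c in classes]
    return rng.choices(classes, weights=weights, k=n)


# Imbalanced toy training labels (illustrative only); no predictors are used.
train = ["neg"] * 90 + ["pos"] * 10
freqs = fit_class_frequencies(train)
rng = random.Random(0)  # seeded so the random guesses are repeatable
baseline_a = predict_most_frequent(freqs, 5)
baseline_b = predict_proportional(freqs, 1000, rng)
print(baseline_a, Counter(baseline_b))
```

With imbalanced classes, baseline A never predicts the minority class, whereas baseline B predicts it at roughly its training rate, which is one way to read the description's preference for proportional guessing as a default.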

cran

compositions:Compositional Data Analysis

Provides functions for the consistent analysis of compositional data (e.g. portions of substances) and positive numbers (e.g. concentrations) in the way proposed by J. Aitchison and V. Pawlowsky-Glahn.

Maintained by K. Gerald van den Boogaart. Last updated 1 years ago.

openblas

3.7 match 1 stars 6.35 score 36 dependents

ghtaranto

scapesClassification:User-Defined Classification of Raster Surfaces

Series of algorithms to translate users' mental models of seascapes, landscapes and, more generally, of geographic features into computer representations (classifications). Spaces and geographic objects are classified with user-defined rules taking into account spatial data as well as spatial relationships among different classes and objects.

Maintained by Gerald H. Taranto. Last updated 3 years ago.

classification-algorithm object-detection raster spatial

5.5 match 1 stars 4.22 score 33 scripts

easystats

see:Model Visualisation Toolbox for 'easystats' and 'ggplot2'

Provides plotting utilities supporting packages in the 'easystats' ecosystem (<https://github.com/easystats/easystats>) and some extra themes, geoms, and scales for 'ggplot2'. Color scales are based on <https://materialui.co/>. References: Lüdecke et al. (2021) <doi:10.21105/joss.03393>.

Maintained by Indrajeet Patil. Last updated 5 days ago.

data-visualization easystats ggplot2 hacktoberfest plotting see statistics visualisation visualization

1.8 match 902 stars 13.22 score 2.0k scripts 3 dependents

bioc

immunoClust:immunoClust - Automated Pipeline for Population Detection in Flow Cytometry

immunoClust is a model-based clustering approach for Flow Cytometry samples. The cell-events of single Flow Cytometry samples are modelled by a mixture of multivariate normal- or t-distributions. The cell-event clusters of several samples are modelled by a mixture of multivariate normal-distributions, aiming at stable co-clusters across these samples.

Maintained by Till Soerensen. Last updated 4 months ago.

clustering flowcytometry singlecell cellbasedassays immunooncology gsl cpp

5.3 match 4.38 score 4 scripts

bozenne

BuyseTest:Generalized Pairwise Comparisons

Implementation of the Generalized Pairwise Comparisons (GPC) as defined in Buyse (2010) <doi:10.1002/sim.3923> for complete observations, and extended in Peron (2018) <doi:10.1177/0962280216658320> to deal with right-censoring. GPC compare two groups of observations (intervention vs. control group) regarding several prioritized endpoints to estimate the probability that a random observation drawn from one group performs better/worse/equivalently than a random observation drawn from the other group. Summary statistics such as the net treatment benefit, win ratio, or win odds are then deduced from these probabilities. Confidence intervals and p-values are obtained based on asymptotic results (Ozenne 2021 <doi:10.1177/09622802211037067>), non-parametric bootstrap, or permutations. The software enables the use of thresholds of minimal importance difference, stratification, non-prioritized endpoints (O'Brien test), and can handle right-censoring and competing-risks.

Maintained by Brice Ozenne. Last updated 4 days ago.

generalized-pairwise-comparisons non-parametric statistics cpp

3.9 match 5 stars 5.91 score 90 scripts

dusadrian

admisc:Adrian Dusa's Miscellaneous

Contains functions used across packages 'DDIwR', 'QCA' and 'venn'. Interprets and translates, factorizes and negates SOP - Sum of Products expressions, for both binary and multi-value crisp sets, and extracts information (set names, set values) from those expressions. Other functions check whether a vector is possibly numeric (even if all numbers reside in a character vector) and coerce it to numeric, or check whether the numbers are whole. It also offers, among many others, a highly versatile recoding routine and some more flexible alternatives to the base functions 'with()' and 'within()'. SOP simplification functions in this package use related minimization from package 'QCA', which is recommended to be installed despite not being listed in the Imports field, due to circular dependency issues.

Maintained by Adrian Dusa. Last updated 3 days ago.

3.0 match 2 stars 7.61 score 20 scripts 92 dependents

hendersontrent

theftdlc:Analyse and Interpret Time Series Features

Provides a suite of functions for analysing, interpreting, and visualising time-series features calculated from different feature sets from the 'theft' package. Implements statistical learning methodologies described in Henderson, T., Bryant, A., and Fulcher, B. (2023) <arXiv:2303.17809>.

Maintained by Trent Henderson. Last updated 1 months ago.

data-science data-visualization machine-learning statistics time-series

4.6 match 4 stars 4.94 score 11 scripts

spatstat

spatstat.geom:Geometrical Functionality of the 'spatstat' Family

Defines spatial data types and supports geometrical operations on them. Data types include point patterns, windows (domains), pixel images, line segment patterns, tessellations and hyperframes. Capabilities include creation and manipulation of data (using command line or graphical interaction), plotting, geometrical operations (rotation, shift, rescale, affine transformation), convex hull, discretisation and pixellation, Dirichlet tessellation, Delaunay triangulation, pairwise distances, nearest-neighbour distances, distance transform, morphological operations (erosion, dilation, closing, opening), quadrat counting, geometrical measurement, geometrical covariance, colour maps, calculus on spatial domains, Gaussian blur, level sets of images, transects of images, intersections between objects, minimum distance matching. (Excludes spatial data on a network, which are supported by the package 'spatstat.linnet'.)

Maintained by Adrian Baddeley. Last updated 1 days ago.

classes-and-objects distance-calculation geometry geometry-processing images mensuration plotting point-patterns spatial-data spatial-data-analysis

1.9 match 7 stars 12.11 score 241 scripts 227 dependents

jwijffels

RMOA:Connect R with MOA for Massive Online Analysis

Connect R with MOA (Massive Online Analysis - <https://moa.cms.waikato.ac.nz/>) to build classification models and regression models on streaming data or out-of-RAM data. Streaming recommendation models are also available.

Maintained by Jan Wijffels. Last updated 3 years ago.

openjdk

9.0 match 1 stars 2.53 score 34 scripts

bioc

GenomicAlignments:Representation and manipulation of short genomic alignments

Provides efficient containers for storing and manipulating short genomic alignments (typically obtained by aligning short reads to a reference genome). This includes read counting, computing the coverage, junction detection, and working with the nucleotide content of the alignments.

Maintained by Hervé Pagès. Last updated 5 months ago.

infrastructure dataimport genetics sequencing rnaseq snp coverage alignment immunooncology bioconductor-package core-package

1.7 match 10 stars 13.61 score 3.1k scripts 529 dependents

rstudio

tfhub:Interface to 'TensorFlow' Hub

'TensorFlow' Hub is a library for the publication, discovery, and consumption of reusable parts of machine learning models. A module is a self-contained piece of a 'TensorFlow' graph, along with its weights and assets, that can be reused across different tasks in a process known as transfer learning. Transfer learning can train a model with a smaller dataset, improve generalization, and speed up training.

Maintained by Tomasz Kalinowski. Last updated 3 years ago.

3.0 match 29 stars 7.46 score 73 scripts 1 dependents

inbo

effectclass:Classification and Visualisation of Effects

Classify effects by comparing the confidence intervals with thresholds.

Maintained by Thierry Onkelinx. Last updated 10 months ago.

effect-size fan-chart

4.2 match 6 stars 5.30 score 37 scripts 1 dependents

sebkrantz

osmclass:Classify Open Street Map Features

Classify Open Street Map (OSM) features into meaningful functional or analytical categories. Designed for OSM PBF files, e.g. from <https://download.geofabrik.de/> imported as spatial data frames. A classification consists of a list of categories that are related to certain OSM tags and values. Given a layer from an OSM PBF file and a classification, the main osm_classify() function returns a classification data table giving, for each feature, the primary and alternative categories (if there is overlap) assigned, and the tag(s) and value(s) matched on. The package also contains a classification of OSM features by economic function/significance, following Krantz (2023) <https://www.ssrn.com/abstract=4537867>.

Maintained by Sebastian Krantz. Last updated 7 months ago.

7.3 match 1 stars 3.00 score 5 scripts

ai-jyc

GENEAclassify:Segmentation and Classification of Accelerometer Data

Segmentation and classification procedures for data from the 'Activinsights GENEActiv' <https://activinsights.com/technology/geneactiv/> accelerometer that provides the user with a model to guess behaviour from test data where behaviour is missing. Includes a step counting algorithm, a function to create segmented data with custom features and a function to use recursive partitioning provided in the function rpart() of the 'rpart' package to create classification models.

Maintained by Jia Ying Chua. Last updated 1 years ago.

5.6 match 1 stars 3.88 score 51 scripts

bioc

HIBAG:HLA Genotype Imputation with Attribute Bagging

Imputes HLA classical alleles using GWAS SNP data, and it relies on a training set of HLA and SNP genotypes. HIBAG can be used by researchers with published parameter estimates instead of requiring access to large training sample datasets. It combines the concepts of attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types. Attribute bagging is a technique which improves the accuracy and stability of classifier ensembles using bootstrap aggregating and random variable selection.

Maintained by Xiuwen Zheng. Last updated 4 months ago.

genetics statisticalmethod bioinformatics gpu hla imputation mhc snp cpp

2.7 match 30 stars 8.24 score 48 scripts

jpfitzinger

tidyfit:Regularized Linear Modeling with Tidy Data

An extension to the 'R' tidy data environment for automated machine learning. The package allows fitting and cross validation of linear regression and classification algorithms on grouped data.

Maintained by Johann Pfitzinger. Last updated 2 months ago.

auto-ml classification machine-learning regression tidyverse

3.0 match 16 stars 7.22 score 26 scripts

bioc

sampleClassifier:Sample Classifier

The package is designed to classify microarray and RNA-seq gene expression profiles.

Maintained by Khadija El Amrani. Last updated 5 months ago.

immunooncology classification microarray rnaseq geneexpression

6.6 match 3.30 score

mthrun

DataVisualizations:Visualizations of High-Dimensional Data

Gives access to data visualisation methods that are relevant from the data scientist's point of view. The flagship idea of 'DataVisualizations' is the mirrored density plot (MD-plot) for either classified or non-classified multivariate data published in Thrun, M.C. et al.: "Analyzing the Fine Structure of Distributions" (2020), PLoS ONE, <DOI:10.1371/journal.pone.0238835>. The MD-plot outperforms the box-and-whisker diagram (box plot), violin plot, bean plot and the geom_violin plot of ggplot2. Furthermore, a collection of various visualization methods for univariate data is provided. In the case of exploratory data analysis, 'DataVisualizations' makes it possible to inspect the distribution of each feature of a dataset visually through a combination of four methods. One of these methods is the Pareto density estimation (PDE) of the probability density function (pdf). Additionally, visualizations of the distribution of distances using PDE, the scatter-density plot using PDE for two variables as well as the Shepard density plot and the Bland-Altman plot are presented here. Pertaining to classified high-dimensional data, a number of visualizations are described, such as the heat map and silhouette plot. A political map of the world or Germany can be visualized with the additional information defined by a classification of countries or regions. By extending the political map further, an uncomplicated function for a Choropleth map can be used which is useful for measurements across a geographic area. For categorical features, pie charts, slope charts and fan plots, improved by the ABC analysis, become usable. More detailed explanations are found in the book by Thrun, M.C.: "Projection-Based Clustering through Self-Organization and Swarm Intelligence" (2018) <DOI:10.1007/978-3-658-20540-9>.

Maintained by Michael Thrun. Last updated 2 months ago.

cpp

2.8 match 7 stars 7.72 score 118 scripts 7 dependents

andysouth

rworldmap:Mapping Global Data

Enables mapping of country level and gridded user datasets.

Maintained by Andy South. Last updated 2 years ago.

1.8 match 30 stars 11.83 score 3.2k scripts 14 dependents

c-monaghan

lwc2022:Langa-Weir Classification of Cognitive Function for 2022 HRS Data

Generates the Langa-Weir classification of cognitive function for the 2022 Health and Retirement Study (HRS) cognition data. It is particularly useful for researchers studying cognitive aging who wish to work with the most recent release of HRS data. The package provides user-friendly functions for data preprocessing, scoring, and classification, allowing users to easily apply the Langa-Weir classification system. For details, see the HRS <https://hrsdata.isr.umich.edu/> and the Langa-Weir classifications <https://hrsdata.isr.umich.edu/data-products/langa-weir-classification-cognitive-function-1995-2020>.

Maintained by Cormac Monaghan. Last updated 4 months ago.

4.8 match 4.48 score 4 scripts

drordas

D2MCS:Data Driving Multiple Classifier System

Provides a novel framework able to automatically develop and deploy an accurate Multiple Classifier System based on the feature-clustering distribution achieved from an input dataset. 'D2MCS' was developed with a focus on four main aspects: (i) the ability to determine an effective method to evaluate the independence of features, (ii) the identification of the optimal number of feature clusters, (iii) the training and tuning of ML models and (iv) the execution of voting schemes to combine the outputs of each classifier comprising the Multiple Classifier System.

Maintained by Miguel Ferreiro-Díaz. Last updated 3 years ago.

openjdk

5.7 match 3.70 score

mpjashby

sfhotspot:Hot-Spot Analysis with Simple Features

Identify and understand clusters of points (typically representing the locations of places or events) stored in simple-features (SF) objects. This is useful for analysing, for example, hot-spots of crime events. The package emphasises producing results from point SF data in a single step using reasonable default values for all other arguments, to aid rapid data analysis by users who are starting out. Functions available include kernel density estimation (for details, see Yip (2020) <doi:10.22224/gistbok/2020.1.12>), analysis of spatial association (Getis and Ord (1992) <doi:10.1111/j.1538-4632.1992.tb00261.x>) and hot-spot classification (Chainey (2020) ISBN:158948584X).

Maintained by Matt Ashby. Last updated 23 days ago.

hotspot hotspots hotspots-analysis mapping mapping-tools

3.8 match 12 stars 5.56 score 30 scripts

kjhealy

gssrdoc:Document General Social Survey Variable

The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS see <http://gss.norc.org>.

Maintained by Kieran Healy. Last updated 11 months ago.

9.2 match 2.28 score 38 scripts

bioc

SPONGE:Sparse Partial Correlations On Gene Expression

This package provides methods to efficiently detect competitive endogenous RNA interactions between two genes. Such interactions are mediated by one or several miRNAs such that both gene and miRNA expression data for a larger number of samples is needed as input. The SPONGE package now also includes spongEffects: ceRNA modules offer patient-specific insights into the miRNA regulatory landscape.

Maintained by Markus List. Last updated 5 months ago.

geneexpression transcription generegulation networkinference transcriptomics systemsbiology regression randomforest machinelearning

3.9 match 5.36 score 38 scripts 1 dependents

svilsen

RWNN:Random Weight Neural Networks

Creation, estimation, and prediction of random weight neural networks (RWNN), Schmidt et al. (1992) <doi:10.1109/ICPR.1992.201708>, including popular variants like extreme learning machines, Huang et al. (2006) <doi:10.1016/j.neucom.2005.12.126>, sparse RWNN, Zhang et al. (2019) <doi:10.1016/j.neunet.2019.01.007>, and deep RWNN, Henríquez et al. (2018) <doi:10.1109/IJCNN.2018.8489703>. It further allows for the creation of ensemble RWNNs like bagging RWNN, Sui et al. (2021) <doi:10.1109/ECCE47101.2021.9595113>, boosting RWNN, stacking RWNN, and ensemble deep RWNN, Shi et al. (2021) <doi:10.1016/j.patcog.2021.107978>.

Maintained by Søren B. Vilsen. Last updated 22 days ago.

openblas cpp openmp

6.0 match 3.40 score

dmurdoch

plotrix:Various Plotting Functions

Lots of plots, various labeling, axis and color scaling functions. The author/maintainer died in September 2023.

Maintained by Duncan Murdoch. Last updated 1 year ago.

1.8 match 5 stars 11.31 score 9.2k scripts 361 dependents

taikisan21

PamBinaries:Read and Process 'Pamguard' Binary Data

Functions for easily reading and processing binary data files created by 'Pamguard' (<https://www.pamguard.org/>). All functions for directly reading the binary data files are based on 'MATLAB' code written by Michael Oswald.

Maintained by Taiki Sakai. Last updated 2 months ago.

3.8 match 10 stars 5.39 score 18 scripts 3 dependents

kurthornik

mlbench:Machine Learning Benchmark Problems

A collection of artificial and real-world machine learning benchmark problems, including, e.g., several data sets from the UCI repository.

Maintained by Kurt Hornik. Last updated 3 months ago.

2.3 match 2 stars 8.93 score 5.0k scripts 55 dependents

runzhiz

asmbPLS:Predicting and Classifying Patient Phenotypes with Multi-Omics Data

Adaptive Sparse Multi-block Partial Least Square, a supervised algorithm, is an extension of Sparse Multi-block Partial Least Square that allows different quantiles to be used in different blocks of different partial least square components to decide the proportion of features to be retained. The best combination of quantiles can be chosen from a set of user-defined quantile combinations by cross-validation. This enables feature selection for different blocks, and the selected features can then be used to predict the outcome. For example, in biomedical applications, clinical covariates plus different types of omics data such as microbiome, metabolome, mRNA data, methylation data, and copy number variation data might be predictive of patient outcomes such as survival time or response to therapy. Different types of data can be placed in different blocks and used along with survival time to fit the model. The fitted model can then be used to predict survival for new samples with the corresponding clinical covariates and omics data. In addition, Adaptive Sparse Multi-block Partial Least Square Discriminant Analysis is also included, which extends Adaptive Sparse Multi-block Partial Least Square to classify categorical outcomes.

Maintained by Runzhi Zhang. Last updated 12 months ago.

openblas cpp openmp

5.1 match 1 star 3.95 score 18 scripts

bioc

Rmagpie:MicroArray Gene-expression-based Program In Error rate estimation

Microarray Classification is designed for both biologists and statisticians. It offers the ability to train a classifier on a labelled microarray dataset and to then use that classifier to predict the class of new observations. A range of modern classifiers are available, including support vector machines (SVMs), nearest shrunken centroids (NSCs)... Advanced methods are provided to estimate the predictive error rate and to report the subset of genes which appear essential in discriminating between classes.

Maintained by Camille Maumet. Last updated 5 months ago.

microarray classification

6.0 match 3.30 score 1 script

klausvigo

kknn:Weighted k-Nearest Neighbors

Weighted k-Nearest Neighbors for Classification, Regression and Clustering.

Maintained by Klaus Schliep. Last updated 4 years ago.

nearest-neighbor

1.8 match 23 stars 11.08 score 4.6k scripts 41 dependents

r-forge

fuzzySim:Fuzzy Similarity in Species Distributions

Functions to compute fuzzy versions of species occurrence patterns based on presence-absence data (including inverse distance interpolation, trend surface analysis, and prevalence-independent favourability obtained from probability of presence), as well as pair-wise fuzzy similarity (based on fuzzy logic versions of commonly used similarity indices) among those occurrence patterns. Also includes functions for model consensus and comparison (overlap and fuzzy similarity, fuzzy loss, fuzzy gain), and for data preparation, such as obtaining unique abbreviations of species names, defining the background region, cleaning and gridding (thinning) point occurrence data onto raster maps, selecting among (pseudo)absences to address survey bias, converting species lists (long format) to presence-absence tables (wide format), transposing part of a data frame, selecting relevant variables for models, assessing the false discovery rate, or analysing and dealing with multicollinearity. Initially described in Barbosa (2015) <doi:10.1111/2041-210X.12372>.

Maintained by A. Marcia Barbosa. Last updated 20 days ago.

3.7 match 2 stars 5.35 score 156 scripts

cran

rbooster:AdaBoost Framework for Any Classifier

A simple package that provides a function for boosting ready-made or custom-made classifiers. The package uses Discrete AdaBoost (<doi:10.1006/jcss.1997.1504>) and Real AdaBoost (<doi:10.1214/aos/1016218223>) for two-class classification, and SAMME (<doi:10.4310/SII.2009.v2.n3.a8>) and SAMME.R (<doi:10.4310/SII.2009.v2.n3.a8>) for multiclass classification.

Maintained by Fatih Saglam. Last updated 3 years ago.

7.3 match 2.70 score 6 scripts

alexanderrobitzsch

CDM:Cognitive Diagnosis Modeling

Functions for cognitive diagnosis modeling and multidimensional item response modeling for dichotomous and polytomous item responses. This package enables the estimation of the DINA and DINO models (Junker & Sijtsma, 2001, <doi:10.1177/01466210122032064>), the multiple group (polytomous) GDINA model (de la Torre, 2011, <doi:10.1007/s11336-011-9207-7>), the multiple choice DINA model (de la Torre, 2009, <doi:10.1177/0146621608320523>), the general diagnostic model (GDM; von Davier, 2008, <doi:10.1348/000711007X193957>), the structured latent class model (SLCA; Formann, 1992, <doi:10.1080/01621459.1992.10475229>) and regularized latent class analysis (Chen, Li, Liu, & Ying, 2017, <doi:10.1007/s11336-016-9545-6>). See George, Robitzsch, Kiefer, Gross, and Uenlue (2017) <doi:10.18637/jss.v074.i02> or Robitzsch and George (2019, <doi:10.1007/978-3-030-05584-4_26>) for further details on estimation and the package structure. For tutorials on how to use the CDM package, see George and Robitzsch (2015, <doi:10.20982/tqmp.11.3.p189>) as well as Ravand and Robitzsch (2015).

Maintained by Alexander Robitzsch. Last updated 9 months ago.

cognitive-diagnostic-models item-response-theory cpp

2.3 match 22 stars 8.76 score 138 scripts 28 dependents

thiyangt

seer:Feature-Based Forecast Model Selection

A novel meta-learning framework for forecast model selection using time series features. Many applications require a large number of time series to be forecast. Providing better forecasts for these time series is important in decision and policy making. We propose a classification framework which selects forecast models based on features calculated from the time series. We call this framework FFORMS (Feature-based FORecast Model Selection). FFORMS uses a random forest to build a mapping from the features of a time series to the best forecast model. The 'seer' package is the implementation of the FFORMS algorithm. For more details see our paper at <https://www.monash.edu/business/econometrics-and-business-statistics/research/publications/ebs/wp06-2018.pdf>.

Maintained by Thiyanga Talagala. Last updated 2 years ago.

3.7 match 78 stars 5.31 score 52 scripts

navdeep-g

h2o4gpu:Interface to 'H2O4GPU'

Interface to 'H2O4GPU' <https://github.com/h2oai/h2o4gpu>, a collection of 'GPU' solvers for machine learning algorithms.

Maintained by Navdeep Gill. Last updated 4 years ago.

6.0 match 1 star 3.24 score 35 scripts

bleutner

RStoolbox:Remote Sensing Data Analysis

Toolbox for remote sensing image processing and analysis such as calculating spectral indexes, principal component transformation, unsupervised and supervised classification or fractional cover analyses.

Maintained by Konstantin Mueller. Last updated 1 month ago.

ggplot2 land-cover-mapping remote-sensing spectral-unmixing supervised-classification unsupervised-classification openblas cpp

1.9 match 275 stars 10.10 score 1.1k scripts