Showing 200 of 845 results

topepo

caret:Classification and Regression Training

Misc functions for training and plotting classification and regression models.
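A minimal sketch of caret's core workflow, resampling-based tuning via `train()` and `trainControl()`, using the built-in iris data (assumes the caret package is installed):

```r
# 5-fold cross-validated k-NN on iris; tuneLength tries several values of k.
library(caret)

set.seed(42)
ctrl <- trainControl(method = "cv", number = 5)
fit  <- train(Species ~ ., data = iris,
              method     = "knn",
              trControl  = ctrl,
              tuneLength = 5)

fit$bestTune                        # the k selected by resampling
predict(fit, newdata = head(iris))  # class predictions for new data
```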

Maintained by Max Kuhn. Last updated 3 months ago.

29.3 match 1.6k stars 19.24 score 61k scripts 303 dependents

bioc

PDATK:Pancreatic Ductal Adenocarcinoma Tool-Kit

Pancreatic ductal adenocarcinoma (PDA) has a relatively poor prognosis and is one of the most lethal cancers. Molecular classification of gene expression profiles holds the potential to identify meaningful subtypes which can inform therapeutic strategy in the clinical setting. The Pancreatic Ductal Adenocarcinoma Tool-Kit (PDATK) provides an S4 class-based interface for performing unsupervised subtype discovery, cross-cohort meta-clustering, gene-expression-based classification, and subsequent survival analysis to identify prognostically useful subtypes in pancreatic cancer and beyond. Two novel methods, Consensus Subtypes in Pancreatic Cancer (CSPC) and Pancreatic Cancer Overall Survival Predictor (PCOSP), are included for consensus-based meta-clustering and overall-survival prediction, respectively. Additionally, four published subtype classifiers and three published prognostic gene signatures are included to allow users to easily recreate published results, apply existing classifiers to new data, and benchmark the relative performance of new methods. The use of existing Bioconductor classes as input to all PDATK classes and methods enables integration with existing Bioconductor datasets, including the 21 pancreatic cancer patient cohorts available in the MetaGxPancreas data package. PDATK has been used to replicate results from Sandhu et al (2019) [https://doi.org/10.1200/cci.18.00102], and an additional paper is in the works using CSPC to validate subtypes from the included published classifiers; both use the data available in MetaGxPancreas. The inclusion of subtype centroids and prognostic gene signatures from these and other publications will enable researchers and clinicians to classify novel patient gene expression data, allowing the direct clinical application of the classifiers included in PDATK.
Overall, PDATK provides a rich set of tools to identify and validate useful prognostic and molecular subtypes based on gene-expression data, benchmark new classifiers against existing ones, and apply discovered classifiers to novel patient data to inform clinical decision making.

Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.

gene-expression pharmacogenetics pharmacogenomics software classification survival clustering gene-prediction

53.0 match 1 star 4.31 score 17 scripts

e-sensing

sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes

An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<https://stacspec.org/>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organized maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce training samples imbalance proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <https://torch.mlverse.org/>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <https://jakubnowosad.com/supercells/>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>. 
Minimum recommended requirements: 16 GB RAM and 4 CPU cores.

Maintained by Gilberto Camara. Last updated 1 month ago.

big-earth-data cbers earth-observation eo-datacubes geospatial image-time-series land-cover-classification landsat planetary-computer r-spatial remote-sensing rspatial satellite-image-time-series satellite-imagery sentinel-2 stac-api stac-catalog cpp

18.6 match 494 stars 9.50 score 384 scripts

tidymodels

rsample:General Resampling Infrastructure

Classes and functions to create and summarize different types of resampling objects (e.g. bootstrap, cross-validation).
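A short sketch of the resampling objects rsample creates, and how each split separates fitting rows from held-out rows (assumes rsample is installed):

```r
# Cross-validation and bootstrap resamples of mtcars.
library(rsample)

set.seed(123)
folds <- vfold_cv(mtcars, v = 5)        # 5-fold cross-validation
boots <- bootstraps(mtcars, times = 25) # 25 bootstrap resamples

# Each split exposes analysis (fitting) and assessment (hold-out) partitions:
first <- folds$splits[[1]]
nrow(analysis(first))    # rows used to fit the model
nrow(assessment(first))  # rows held out for evaluation
```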

Maintained by Hannah Frick. Last updated 6 days ago.

6.5 match 341 stars 16.72 score 5.2k scripts 79 dependents

runxiao

deepnet:Deep Learning Toolkit in R

Implements several deep learning architectures and neural network algorithms, including back-propagation (BP), restricted Boltzmann machines (RBM), deep belief networks (DBN), and deep autoencoders.
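A hedged sketch of deepnet's `nn.train()`/`nn.predict()` interface on a toy binary problem, per the package documentation (assumes deepnet is installed):

```r
# Train a small back-propagation network on a synthetic linear boundary.
library(deepnet)

set.seed(1)
x <- matrix(runif(200), ncol = 2)
y <- as.numeric(x[, 1] + x[, 2] > 1)  # label: 1 above the line x1 + x2 = 1

nn   <- nn.train(x, y, hidden = c(5), numepochs = 50)
pred <- nn.predict(nn, x)             # scores in [0, 1]
mean((pred > 0.5) == y)               # training accuracy
```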

Maintained by Xiao Rong. Last updated 3 years ago.

15.7 match 24 stars 4.79 score 131 scripts 1 dependent

r-lib

scales:Scale Functions for Visualization

Graphical scales map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends.
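A few of the scales helpers in action, rescaling data and formatting labels (assumes the scales package is installed):

```r
library(scales)

rescale(c(2, 5, 8))             # map to [0, 1]: 0.0 0.5 1.0
label_comma()(1234567)          # "1,234,567"
label_percent()(0.25)           # "25%"
breaks_pretty(n = 5)(c(0, 97))  # visually pleasing axis breaks
```

These label functions plug directly into ggplot2, e.g. `scale_y_continuous(labels = label_comma())`.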

Maintained by Thomas Lin Pedersen. Last updated 5 months ago.

ggplot2

3.8 match 419 stars 19.88 score 88k scripts 7.9k dependents

fberding

aifeducation:Artificial Intelligence for Education

In social and educational settings, the use of Artificial Intelligence (AI) is a challenging task. Relevant data is often only available in handwritten form, or its use is restricted by privacy policies, which often leads to small data sets. Furthermore, in the educational and social sciences, data is often unbalanced in terms of frequencies. To support educators as well as educational and social researchers in using the potential of AI for their work, this package provides a unified interface for neural nets in 'PyTorch' to deal with natural language problems. In addition, the package ships with a shiny app providing a graphical user interface, which allows people without skills in writing 'Python'/R scripts to use AI. The tools integrate existing mathematical and statistical methods for dealing with small data sets via pseudo-labeling (e.g. Cascante-Bonilla et al. (2020) <doi:10.48550/arXiv.2001.06001>) and with imbalanced data via the creation of synthetic cases (e.g. Bunkhumpornpat et al. (2012) <doi:10.1007/s10489-011-0287-y>). Performance evaluation of AI is connected to measures from content analysis with which educational and social researchers are generally more familiar (e.g. Berding & Pargmann (2022) <doi:10.30819/5581>, Gwet (2014) <ISBN:978-0-9708062-8-4>, Krippendorff (2019) <doi:10.4135/9781071878781>). Estimation of energy consumption and CO2 emissions during model training is done with the 'Python' library 'codecarbon'. Finally, all objects created with this package allow trained AI models to be shared with other people.

Maintained by Berding Florian. Last updated 1 month ago.

cpp

14.5 match 4.48 score 8 scripts

sbgraves237

Ecdat:Data Sets for Econometrics

Data sets for econometrics, including political science.

Maintained by Spencer Graves. Last updated 4 months ago.

7.3 match 2 stars 7.25 score 740 scripts 3 dependents

klausvigo

kknn:Weighted k-Nearest Neighbors

Weighted k-Nearest Neighbors for Classification, Regression and Clustering.
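A brief sketch of weighted k-NN classification with `kknn()`, using a kernel to down-weight distant neighbors (assumes kknn is installed):

```r
# Train/test split of iris, then 7-NN with a triangular distance kernel.
library(kknn)

set.seed(7)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

fit <- kknn(Species ~ ., train, test, k = 7, kernel = "triangular")
table(predicted = fitted(fit), actual = test$Species)  # confusion matrix
```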

Maintained by Klaus Schliep. Last updated 4 years ago.

nearest-neighbor

4.5 match 23 stars 11.08 score 4.6k scripts 41 dependents

maxwell-geospatial

geodl:Geospatial Semantic Segmentation with Torch and Terra

Provides tools for semantic segmentation of geospatial data using convolutional neural network-based deep learning. Utility functions allow for creating masks, image chips, data frames listing image chips in a directory, and DataSets for use within DataLoaders. Additional functions are provided to serve as checks during the data preparation and training process. A UNet architecture can be defined with 4 blocks in the encoder, a bottleneck block, and 4 blocks in the decoder. The UNet can accept a variable number of input channels, and the user can define the number of feature maps produced in each encoder and decoder block and the bottleneck. Users can also choose to (1) replace all rectified linear unit (ReLU) activation functions with leaky ReLU or swish, (2) implement attention gates along the skip connections, (3) implement squeeze and excitation modules within the encoder blocks, (4) add residual connections within all blocks, (5) replace the bottleneck with a modified atrous spatial pyramid pooling (ASPP) module, and/or (6) implement deep supervision using predictions generated at each stage in the decoder. A unified focal loss framework is implemented after Yeung et al. (2022) <https://doi.org/10.1016/j.compmedimag.2021.102026>. We have also implemented assessment metrics using the 'luz' package including F1-score, recall, and precision. Trained models can be used to predict to spatial data without the need to generate chips from larger spatial extents. Functions are available for performing accuracy assessment. The package relies on 'torch' for implementing deep learning, which does not require the installation of a 'Python' environment. Raster geospatial data are handled with 'terra'. Models can be trained using a Compute Unified Device Architecture (CUDA)-enabled graphics processing unit (GPU); however, multi-GPU training is not supported by 'torch' in 'R'.

Maintained by Aaron Maxwell. Last updated 8 months ago.

6.5 match 12 stars 6.98 score 20 scripts

bnaras

pamr:Pam: Prediction Analysis for Microarrays

Some functions for sample classification in microarrays.
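A hedged sketch of pamr's nearest-shrunken-centroids workflow on a small synthetic "microarray" (genes in rows, samples in columns, as the package documentation expects); assumes pamr is installed:

```r
library(pamr)

set.seed(1)
x <- matrix(rnorm(1000 * 20), nrow = 1000)   # 1000 genes, 20 samples
y <- factor(rep(c("A", "B"), each = 10))
x[1:50, y == "B"] <- x[1:50, y == "B"] + 2   # make 50 genes differential

d   <- list(x = x, y = y)
fit <- pamr.train(d)                         # fit over a path of thresholds
pamr.predict(fit, x, threshold = 1)          # class calls at one threshold
```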

Maintained by Balasubramanian Narasimhan. Last updated 9 months ago.

5.7 match 7.90 score 256 scripts 14 dependents

tidymodels

butcher:Model Butcher

Provides a set of S3 generics to axe components of fitted model objects and help reduce the size of model objects saved to disk.
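A small sketch of butcher on a base-R `lm` fit (assumes butcher is installed): `weigh()` reports which components dominate the object's size, and `butcher()` axes the ones prediction does not need.

```r
library(butcher)

fit <- lm(mpg ~ ., data = mtcars)
weigh(fit)              # component sizes, largest first
small <- butcher(fit)   # drop environments, residuals, etc.

# The butchered model still predicts:
predict(small, head(mtcars))
```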

Maintained by Julia Silge. Last updated 14 days ago.

3.9 match 132 stars 11.54 score 146 scripts 13 dependents

neurodata

lolR:Linear Optimal Low-Rank Projection

Supervised learning techniques designed for the situation when the dimensionality exceeds the sample size have a tendency to overfit as the dimensionality of the data increases. To remedy this high-dimensionality, low-sample-size (HDLSS) situation, we attempt to learn a lower-dimensional representation of the data before learning a classifier. That is, we project the data to a setting where the dimensionality is more manageable, and are then able to better apply standard classification or clustering techniques, since we will have fewer dimensions to overfit. A number of previous works have focused on how to strategically reduce dimensionality in the unsupervised case, yet in the supervised HDLSS regime, few works have attempted to devise dimensionality reduction techniques that leverage the labels associated with the data. In this package and the associated manuscript Vogelstein et al. (2017) <arXiv:1709.01233>, we provide several methods for feature extraction, some utilizing labels and some not, along with easily extensible utilities to simplify cross-validation efforts to identify the best feature extraction method. Additionally, we include a series of adaptable benchmark simulations to serve as a standard for future investigative efforts into supervised HDLSS. Finally, we produce a comprehensive comparison of the included algorithms across a range of benchmark simulations and real data applications.

Maintained by Eric Bridgeford. Last updated 4 years ago.

5.6 match 20 stars 7.28 score 80 scripts

andyliaw-mrk

locfit:Local Regression, Likelihood and Density Estimation

Local regression, likelihood and density estimation methods as described in the 1999 book by Loader.
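A short local-regression sketch with `locfit()` and its `lp()` local-polynomial term, using the ethanol data that ships with the package (assumes locfit is installed):

```r
library(locfit)

data(ethanol)
# nn = 0.5: each local fit uses the nearest 50% of the data.
fit <- locfit(NOx ~ lp(E, nn = 0.5), data = ethanol)

plot(fit, get.data = TRUE)                             # curve plus raw points
predict(fit, newdata = data.frame(E = c(0.7, 0.9, 1.1)))
```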

Maintained by Andy Liaw. Last updated 12 days ago.

4.0 match 1 star 9.40 score 428 scripts 606 dependents

alanarnholt

BSDA:Basic Statistics and Data Analysis

Data sets for book "Basic Statistics and Data Analysis" by Larry J. Kitchens.
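Beyond the data sets, BSDA also ships inference helpers such as `tsum.test()`, which runs a t test from summary statistics alone (a sketch, assuming BSDA is installed; the numbers are made up for illustration):

```r
library(BSDA)

# Two-sample t test from summaries: means, SDs, and sample sizes only.
tsum.test(mean.x = 5.2, s.x = 1.1, n.x = 30,
          mean.y = 4.6, s.y = 1.3, n.y = 28)
```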

Maintained by Alan T. Arnholt. Last updated 2 years ago.

3.3 match 7 stars 9.11 score 1.3k scripts 6 dependents

tidyverse

modelr:Modelling Functions that Work with the Pipe

Functions for modelling that help you seamlessly integrate modelling into a pipeline of data manipulation and visualisation.
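A sketch of modelr's pipe-friendly helpers, which append model output as columns on the data (assumes modelr is installed):

```r
library(modelr)

fit <- lm(mpg ~ wt, data = mtcars)

mtcars |>
  add_predictions(fit) |>   # adds a `pred` column
  add_residuals(fit) |>     # adds a `resid` column
  head()
```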

Maintained by Hadley Wickham. Last updated 1 year ago.

modelling

1.8 match 401 stars 16.44 score 6.9k scripts 1.0k dependents