Showing 200 of total 402 results
modeloriented
DALEX:moDel Agnostic Language for Exploration and eXplanation
Any unverified black box model is the path to failure. Opaqueness leads to distrust. Distrust leads to ignoration. Ignoration leads to rejection. The DALEX package x-rays any model and helps to explore and explain its behaviour. Machine Learning (ML) models are widely used and have various applications in classification or regression. Models created with boosting, bagging, stacking or similar techniques are often used due to their high performance. But such black-box models usually lack direct interpretability. The DALEX package contains various methods that help to understand the link between input variables and model output. Implemented methods help to explore the model at the level of a single instance as well as at the level of the whole dataset. All model explainers are model agnostic and can be compared across different models. The DALEX package is the cornerstone of the 'DrWhy.AI' universe of packages for visual model exploration. Find more details in (Biecek 2018) <https://jmlr.org/papers/v19/18-416.html>.
Maintained by Przemyslaw Biecek. Last updated 1 month ago.
black-box dalex data-science explainable-ai explainable-artificial-intelligence explainable-ml explanations explanatory-model-analysis fairness iml interpretability interpretable-machine-learning machine-learning model-visualization predictive-modeling responsible-ai responsible-ml xai
38.0 match 1.4k stars 13.40 score 876 scripts 21 dependents
persimune
explainer:Machine Learning Model Explainer
It enables detailed interpretation of complex classification and regression models through Shapley analysis including data-driven characterization of subgroups of individuals. Furthermore, it facilitates multi-measure model evaluation, model fairness, and decision curve analysis. Additionally, it offers enhanced visualizations with interactive elements.
Maintained by Ramtin Zargari Marandi. Last updated 6 months ago.
ai classification clinical-research explainability explainable-ai interpretability machine-learning regression shap statistics
75.7 match 13 stars 5.37 score 12 scripts
carloshellin
LearningRlab:Statistical Learning Functions
Aids in learning statistical functions by showing the result of the calculation done with each function and how it is obtained, that is, which equation and variables are used. Detailed explanations and interactive exercises are also included for all these equations and their related variables. Together, these features help the package user learn the basics of statistics.
Maintained by Carlos Javier Hellin Asensio. Last updated 2 years ago.
99.5 match 3.64 score 44 scripts
norskregnesentral
shapr:Prediction Explanation with Dependence-Aware Shapley Values
Complex machine learning models are often hard to interpret. However, in many situations it is crucial to understand and explain why a model made a specific prediction. Shapley values are the only prediction explanation framework with a solid theoretical foundation. Previously known methods for estimating the Shapley values do, however, assume feature independence. This package implements methods which account for any feature dependence, and thereby produces more accurate estimates of the true Shapley values. An accompanying 'Python' wrapper ('shaprpy') is available through the GitHub repository.
Maintained by Martin Jullum. Last updated 1 month ago.
explainable-ai explainable-ml rcpp rcpparmadillo shapley openblas cpp openmp
30.1 match 153 stars 10.62 score 175 scripts 1 dependent
modeloriented
survex:Explainable Machine Learning in Survival Analysis
Survival analysis models are commonly used in medicine and other areas. Many of them are too complex to be interpreted by humans. Exploration and explanation are needed, but standard methods do not give a broad enough picture. 'survex' provides easy-to-apply methods for explaining survival models, both complex black-boxes and simpler statistical models. They include methods specific to survival analysis such as SurvSHAP(t) introduced in Krzyzinski et al., (2023) <doi:10.1016/j.knosys.2022.110234>, SurvLIME described in Kovalev et al., (2020) <doi:10.1016/j.knosys.2020.106164> as well as extensions of existing ones described in Biecek et al., (2021) <doi:10.1201/9780429027192>.
Maintained by Mikołaj Spytek. Last updated 9 months ago.
biostatistics brier-scores censored-data cox-model cox-regression explainable-ai explainable-machine-learning explainable-ml explanatory-model-analysis interpretable-machine-learning interpretable-ml machine-learning probabilistic-machine-learning shap survival-analysis time-to-event variable-importance xai
37.6 match 110 stars 8.40 score 114 scripts
svkucheryavski
mdatools:Multivariate Data Analysis for Chemometrics
Projection based methods for preprocessing, exploring and analysis of multivariate data used in chemometrics. S. Kucheryavskiy (2020) <doi:10.1016/j.chemolab.2020.103937>.
Maintained by Sergey Kucheryavskiy. Last updated 8 months ago.
30.7 match 35 stars 7.37 score 220 scripts 1 dependent
modeloriented
modelStudio:Interactive Studio for Explanatory Model Analysis
Automate the explanatory analysis of machine learning predictive models. Generate advanced interactive model explanations in the form of a serverless HTML site with only one line of code. This tool is model-agnostic, therefore compatible with most of the black-box predictive models and frameworks. The main function computes various (instance and model-level) explanations and produces a customisable dashboard, which consists of multiple panels for plots with their short descriptions. It is possible to easily save the dashboard and share it with others. 'modelStudio' facilitates the process of Interactive Explanatory Model Analysis introduced in Baniecki et al. (2023) <doi:10.1007/s10618-023-00924-w>.
Maintained by Hubert Baniecki. Last updated 2 years ago.
ai explainable explainable-ai explainable-machine-learning explanatory-model-analysis human iml interactive interactivity interpretability interpretable interpretable-machine-learning learning machine model model-visualization visualization xai
24.2 match 330 stars 7.92 score 56 scripts
modeloriented
treeshap:Compute SHAP Values for Your Tree-Based Models Using the 'TreeSHAP' Algorithm
An efficient implementation of the 'TreeSHAP' algorithm introduced by Lundberg et al., (2020) <doi:10.1038/s42256-019-0138-9>. It is capable of calculating SHAP (SHapley Additive exPlanations) values for tree-based models in polynomial time. Currently supported models include 'gbm', 'randomForest', 'ranger', 'xgboost', 'lightgbm'.
Maintained by Mateusz Krzyzinski. Last updated 1 year ago.
explainability explainable-ai explainable-artificial-intelligence explanatory-model-analysis iml interpretability interpretable-machine-learning machine-learning responsible-ml shap shapley-value xai cpp
24.2 match 82 stars 6.69 score 170 scripts
bgreenwell
fastshap:Fast Approximate Shapley Values
Computes fast (relative to other implementations) approximate Shapley values for any supervised learning model. Shapley values help to explain the predictions from any black box model using ideas from game theory; see Strumbel and Kononenko (2014) <doi:10.1007/s10115-013-0679-x> for details.
Maintained by Brandon Greenwell. Last updated 1 year ago.
explainable-ai explainable-ml interpretable-machine-learning shapley shapley-values variable-importance xai cpp
18.8 match 118 stars 8.56 score 155 scripts 2 dependents
bgreenwell
ebm:Explainable Boosting Machines
An interface to the 'Python' 'InterpretML' framework for fitting explainable boosting machines (EBMs); see Nori et al. (2019) <doi:10.48550/arXiv.1909.09223> for details. EBMs are a modern type of generalized additive model that use tree-based, cyclic gradient boosting with automatic interaction detection. They are often as accurate as state-of-the-art blackbox models while remaining completely interpretable.
Maintained by Brandon M. Greenwell. Last updated 11 days ago.
ai blackbox explainable-ai explainable-machine-learning explainable-ml glassbox iml interpretability interpretability-and-explainability interpretable interpretable-ai interpretable-machine-learning interpretable-ml interpretable-models machine-learning xai
34.9 match 1 star 4.60 score
rolkra
explore:Simplifies Exploratory Data Analysis
Interactive data exploration with one line of code, automated reporting, or an easy-to-remember set of tidy functions for low-code exploratory data analysis.
Maintained by Roland Krasser. Last updated 3 months ago.
data-exploration data-visualisation decision-trees eda rmarkdown shiny tidy
13.9 match 228 stars 11.43 score 221 scripts 1 dependent
tidyverse
dplyr:A Grammar of Data Manipulation
A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
Maintained by Hadley Wickham. Last updated 13 days ago.
5.3 match 4.8k stars 24.68 score 659k scripts 7.8k dependents
modeloriented
arenar:Arena for the Exploration and Comparison of any ML Models
Generates data for challenging machine learning models in 'Arena' <https://arena.drwhy.ai> - an interactive web application. You can start the server with XAI (Explainable Artificial Intelligence) plots to be generated on demand, or precalculate and auto-upload a data file beside a shareable 'Arena' URL.
Maintained by Piotr Piątyszek. Last updated 4 years ago.
axplainable-artificial-intelligence ema explainability explanatory-model-analysis iml interactive-xai interpretability xai
20.9 match 31 stars 5.94 score 14 scripts
bioc
MOFA2:Multi-Omics Factor Analysis v2
The MOFA2 package contains a collection of tools for training and analysing multi-omic factor analysis (MOFA). MOFA is a probabilistic factor model that aims to identify principal axes of variation from data sets that can comprise multiple omic layers and/or groups of samples. Additional time or space information on the samples can be incorporated using the MEFISTO framework, which is part of MOFA2. Downstream analysis functions to inspect molecular features underlying each factor, visualisation, imputation, etc. are available.
Maintained by Ricard Argelaguet. Last updated 5 months ago.
dimensionreduction bayesian visualization factor-analysis mofa multi-omics
12.0 match 319 stars 10.02 score 502 scripts
modeloriented
vivo:Variable Importance via Oscillations
Provides an easy to calculate local variable importance measure based on Ceteris Paribus profile and global variable importance measure based on Partial Dependence Profiles.
Maintained by Anna Kozak. Last updated 4 years ago.
explainable-ai explainable-artificial-intelligence explainable-ml iml interpretable-machine-learning variable-importance xai
21.7 match 14 stars 5.45 score 7 scripts
modeloriented
fairmodels:Flexible Tool for Bias Detection, Visualization, and Mitigation
Measure fairness metrics in one place for many models. Check how big a model's bias is towards different races, sexes, nationalities, etc. Use measures such as Statistical Parity and Equal odds to detect discrimination against unprivileged groups. Visualize the bias using a heatmap, radar plot, biplot, bar chart (and more!). There are various pre-processing and post-processing bias mitigation algorithms implemented. The package also supports calculating fairness metrics for regression models. Find more details in (Wiśniewski, Biecek (2021)) <arXiv:2104.00507>.
Maintained by Jakub Wiśniewski. Last updated 1 month ago.
explain-classifiers explainable-ml fairness fairness-comparison fairness-ml model-evaluation
15.0 match 86 stars 7.72 score 51 scripts 1 dependent
bioc
BatchQC:Batch Effects Quality Control Software
Sequencing and microarray samples often are collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in the downstream analysis. BatchQC is a software tool that streamlines batch preprocessing and evaluation by providing interactive diagnostics, visualizations, and statistical analyses to explore the extent to which batch variation impacts the data. BatchQC diagnostics help determine whether batch adjustment needs to be done, and how correction should be applied before proceeding with a downstream analysis. Moreover, BatchQC interactively applies multiple common batch effect approaches to the data and the user can quickly see the benefits of each method. BatchQC is developed as a Shiny App. The output is organized into multiple tabs and each tab features an important part of the batch effect analysis and visualization of the data. The BatchQC interface has the following analysis groups: Summary, Differential Expression, Median Correlations, Heatmaps, Circular Dendrogram, PCA Analysis, Shape, ComBat and SVA.
Maintained by Jessica McClintock. Last updated 5 months ago.
batcheffect graphandnetwork microarray normalization principalcomponent sequencing software visualization qualitycontrol rnaseq preprocessing differentialexpression immunooncology
11.9 match 7 stars 8.99 score 54 scripts
modeloriented
DALEXtra:Extension for 'DALEX' Package
Provides wrappers for various machine learning models. In applied machine learning, there is a strong belief that we need to strike a balance between interpretability and accuracy. However, in the field of interpretable machine learning, there are more and more new ideas for explaining black-box models that are implemented in 'R'. 'DALEXtra' creates a 'DALEX' Biecek (2018) <arXiv:1806.08915> explainer for many types of models, including those created using the 'python' 'scikit-learn' and 'keras' libraries and the 'java' 'h2o' library. An important part of the package is Champion-Challenger analysis and an innovative approach to model performance across subsets of test data presented in a Funnel Plot.
Maintained by Szymon Maksymiuk. Last updated 2 years ago.
12.1 match 67 stars 7.71 score 400 scripts 1 dependent
modeloriented
ingredients:Effects and Importances of Model Ingredients
Collection of tools for assessment of feature importance and feature effects. Key functions are: feature_importance() for assessment of global level feature importance, ceteris_paribus() for calculation of the what-if plots, partial_dependence() for partial dependence plots, conditional_dependence() for conditional dependence plots, accumulated_dependence() for accumulated local effects plots, aggregate_profiles() and cluster_profiles() for aggregation of ceteris paribus profiles, generic print() and plot() for better usability of selected explainers, generic plotD3() for interactive, D3 based explanations, and generic describe() for explanations in natural language. The package 'ingredients' is a part of the 'DrWhy.AI' universe (Biecek 2018) <arXiv:1806.08915>.
Maintained by Przemyslaw Biecek. Last updated 2 years ago.
8.7 match 37 stars 10.38 score 83 scripts 22 dependents
thomasp85
lime:Local Interpretable Model-Agnostic Explanations
When building complex models, it is often difficult to explain why the model should be trusted. While global measures such as accuracy are useful, they cannot be used for explaining why a model made a specific prediction. 'lime' (a port of the 'lime' 'Python' package) is a method for explaining the outcome of black box models by fitting a local model around the point in question and perturbations of this point. The approach is described in more detail in the article by Ribeiro et al. (2016) <arXiv:1602.04938>.
Maintained by Emil Hvitfeldt. Last updated 3 years ago.
caret model-checking model-evaluation modeling cpp
7.6 match 485 stars 11.07 score 732 scripts 1 dependent
modeloriented
shapviz:SHAP Visualizations
Visualizations for SHAP (SHapley Additive exPlanations), such as waterfall plots, force plots, various types of importance plots, dependence plots, and interaction plots. These plots act on a 'shapviz' object created from a matrix of SHAP values and a corresponding feature dataset. Wrappers for the R packages 'xgboost', 'lightgbm', 'fastshap', 'shapr', 'h2o', 'treeshap', 'DALEX', and 'kernelshap' are added for convenience. By separating visualization and computation, it is possible to display factor variables in graphs, even if the SHAP values are calculated by a model that requires numerical features. The plots are inspired by those provided by the 'shap' package in Python, but there is no dependency on it.
Maintained by Michael Mayer. Last updated 2 months ago.
explainable-ai machine-learning shap shapley-value visualization xai
7.5 match 89 stars 9.95 score 250 scripts
r-lib
generics:Common S3 Generics not Provided by Base R Methods Related to Model Fitting
In order to reduce potential package dependencies and conflicts, generics provides a number of commonly used S3 generics.
Maintained by Hadley Wickham. Last updated 1 year ago.
5.3 match 61 stars 14.00 score 131 scripts 9.8k dependents
pbiecek
ceterisParibus:Ceteris Paribus Profiles
Ceteris Paribus Profiles (What-If Plots) are designed to present model responses around selected points in a feature space, for example around a single prediction for an interesting observation. Plots are designed to work in a model-agnostic fashion: they work for any predictive Machine Learning model and allow for model comparisons. Ceteris Paribus Plots supplement the Break Down Plots from the 'breakDown' package.
Maintained by Przemyslaw Biecek. Last updated 5 years ago.
13.1 match 42 stars 5.48 score 36 scripts
modeloriented
randomForestExplainer:Explaining and Visualizing Random Forests in Terms of Variable Importance
A set of tools to help explain which variables are most important in a random forest. Various variable importance measures are calculated and visualized in different settings in order to get an idea of how their importance changes depending on our criteria (Hemant Ishwaran and Udaya B. Kogalur and Eiran Z. Gorodeski and Andy J. Minn and Michael S. Lauer (2010) <doi:10.1198/jasa.2009.tm08622>, Leo Breiman (2001) <doi:10.1023/A:1010933404324>).
Maintained by Yue Jiang. Last updated 12 months ago.
6.9 match 231 stars 9.82 score 236 scripts
juanv66x
viralx:Explainers for Regression Models in HIV Research
A dedicated viral-explainer model tool designed to empower researchers in the field of HIV research, particularly in viral load and CD4 (Cluster of Differentiation 4) lymphocytes regression modeling. Drawing inspiration from the 'tidymodels' framework for rigorous model building of Max Kuhn and Hadley Wickham (2020) <https://www.tidymodels.org>, and the 'DALEXtra' tool for explainability by Przemyslaw Biecek (2020) <doi:10.48550/arXiv.2009.13248>, it aims to facilitate interpretable and reproducible research in biostatistics and computational biology for the benefit of understanding HIV dynamics.
Maintained by Juan Pablo Acuña González. Last updated 4 months ago.
21.5 match 3.00 score 1 script
modeloriented
kernelshap:Kernel SHAP
Efficient implementation of Kernel SHAP, see Lundberg and Lee (2017), and Covert and Lee (2021) <http://proceedings.mlr.press/v130/covert21a>. Furthermore, for up to 14 features, exact permutation SHAP values can be calculated. The package plays well together with meta-learning packages like 'tidymodels', 'caret' or 'mlr3'. Visualizations can be done using the R package 'shapviz'.
Maintained by Michael Mayer. Last updated 6 months ago.
explainable-ai interpretability interpretable-machine-learning machine-learning shap xai
7.5 match 45 stars 8.38 score 117 scripts 17 dependents
bioc
scater:Single-Cell Analysis Toolkit for Gene Expression Data in R
A collection of tools for doing various analyses of single-cell RNA-seq gene expression data, with a focus on quality control and visualization.
Maintained by Alan OCallaghan. Last updated 10 days ago.
immunooncology singlecell rnaseq qualitycontrol preprocessing normalization visualization dimensionreduction transcriptomics geneexpression sequencing software dataimport datarepresentation infrastructure coverage
5.3 match 11.07 score 12k scripts 43 dependents
modeloriented
auditor:Model Audit - Verification, Validation, and Error Analysis
Provides an easy-to-use unified interface for creating validation plots for any model. The 'auditor' helps to avoid repetitive work consisting of writing code needed to create residual plots. These visualizations allow users to assess and compare the goodness of fit, performance, and similarity of models.
Maintained by Alicja Gosiewska. Last updated 1 year ago.
classification error-analysis explainable-artificial-intelligence machine-learning model-validation regression-models residuals xai
6.7 match 58 stars 8.76 score 94 scripts 2 dependents
bioc
MBECS:Evaluation and correction of batch effects in microbiome data-sets
The Microbiome Batch Effect Correction Suite (MBECS) provides a set of functions to evaluate and mitigate unwanted noise due to processing in batches. To that end it incorporates a host of batch correcting algorithms (BECA) from various packages. In addition, it offers a correction and reporting pipeline that provides a preliminary look at the characteristics of a data-set before and after correcting for batch effects.
Maintained by Michael Olbrich. Last updated 5 months ago.
batcheffect microbiome reportwriting visualization normalization qualitycontrol
12.7 match 4 stars 4.60 score 4 scripts
forestry-labs
distillML:Model Distillation and Interpretability Methods for Machine Learning Models
Provides several methods for model distillation and interpretability for general black box machine learning models and treatment effect estimation methods. For details on the algorithms implemented, see <https://forestry-labs.github.io/distillML/index.html> Brian Cho, Theo F. Saarinen, Jasjeet S. Sekhon, Simon Walter.
Maintained by Theo Saarinen. Last updated 2 years ago.
bart distillation-model explainable-machine-learning explainable-ml interpretability interpretable-machine-learning machine-learning model random-forest xgboost
14.2 match 7 stars 3.92 score 12 scripts
bioc
mixOmics:Omics Data Integration Project
Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc.) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non-exhaustive list of methods includes variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.
Maintained by Eva Hamrud. Last updated 4 days ago.
immunooncology microarray sequencing metabolomics metagenomics proteomics geneprediction multiplecomparison classification regression bioconductor genomics genomics-data genomics-visualization multivariate-analysis multivariate-statistics omics r-pkg r-project
4.0 match 182 stars 13.71 score 1.3k scripts 22 dependents
bioc
PCAtools:PCAtools: Everything Principal Components Analysis
Principal Component Analysis (PCA) is a very powerful technique that has wide applicability in data science, bioinformatics, and further afield. It was initially developed to analyse large volumes of data in order to tease out the differences/relationships between the logical entities being analysed. It extracts the fundamental structure of the data without the need to build any model to represent it. This 'summary' of the data is arrived at through a process of reduction that can transform the large number of variables into a lesser number that are uncorrelated (i.e. the 'principal components'), while at the same time being capable of easy interpretation on the original data. PCAtools provides functions for data exploration via PCA, and allows the user to generate publication-ready figures. PCA is performed via BiocSingular - users can also identify optimal number of principal components via different metrics, such as elbow method and Horn's parallel analysis, which has relevance for data reduction in single-cell RNA-seq (scRNA-seq) and high dimensional mass cytometry data.
Maintained by Kevin Blighe. Last updated 5 months ago.
rnaseq atacseq geneexpression transcription singlecell principalcomponent cpp
4.9 match 343 stars 11.12 score 832 scripts 2 dependents
robertoalcantara9
LearnClust:Learning Hierarchical Clustering Algorithms
Classical hierarchical clustering algorithms, agglomerative and divisive clustering. Algorithms are implemented in a theoretical way, step by step. It includes some detailed functions that explain each step. Every function allows options to get different results using different techniques. The package explains to non-expert users how hierarchical clustering algorithms work.
Maintained by Roberto Alcantara. Last updated 4 years ago.
26.3 match 2.04 score 11 scripts
agoutsmedt
biblionetwork:Create Different Types of Bibliometric Networks
Functions to find edges for bibliometric networks like bibliographic coupling network, co-citation network and co-authorship network. The weights of network edges can be calculated according to different methods, depending on the type of networks, the type of nodes, and what you want to analyse. These functions are optimized to be used on large datasets. The package contains functions inspired by: Leydesdorff, Loet and Park, Han Woo (2017) <doi:10.1016/j.joi.2016.11.007>; Perianes-Rodriguez, Antonio, Ludo Waltman, and Nees Jan Van Eck (2016) <doi:10.1016/j.joi.2016.10.006>; Sen, Subir K. and Shymal K. Gan (1983) <http://nopr.niscair.res.in/handle/123456789/28008>; Shen, Si, Zhu, Danhao, Rousseau, Ronald, Su, Xinning and Wang, Dongbo (2019) <doi:10.1016/j.joi.2019.01.012>; Zhao, Dangzhi and Strotmann, Andreas (2008) <doi:10.1002/meet.2008.1450450292>.
Maintained by Aurélien Goutsmedt. Last updated 2 years ago.
authorship-network bibliometric-networks bibliometrics coupling-angle
10.4 match 7 stars 5.18 score 43 scripts
david-cortes
outliertree:Explainable Outlier Detection Through Decision Tree Conditioning
Outlier detection method that flags suspicious values within observations, contrasting them against the normal values in a user-readable format, potentially describing conditions within the data that make a given outlier more rare. Full procedure is described in Cortes (2020) <doi:10.48550/arXiv.2001.00636>. Loosely based on the 'GritBot' <https://www.rulequest.com/gritbot-info.html> software.
Maintained by David Cortes. Last updated 2 months ago.
anomaly-detection outlier-detection cpp openmp
7.1 match 58 stars 7.34 score 21 scripts 2 dependents
trinker
qdapRegex:Regular Expression Removal, Extraction, and Replacement Tools
A collection of regular expression tools associated with the 'qdap' package that may be useful outside of the context of discourse analysis. Tools include removal/extraction/replacement of abbreviations, dates, dollar amounts, email addresses, hash tags, numbers, percentages, citations, person tags, phone numbers, times, and zip codes.
Maintained by Tyler Rinker. Last updated 1 year ago.
5.3 match 50 stars 9.48 score 502 scripts 41 dependents
apxr
analyzer:Data Analysis and Automated R Notebook Generation
Easy data analysis and quality checks which are commonly used in data science. It combines the tabular and graphical visualization for easier usability. This package also creates an R Notebook with detailed data exploration with one function call. The notebook can be made interactive.
Maintained by Apurv Priyam. Last updated 5 years ago.
12.0 match 4.13 score 27 scripts
r-lib
pak:Another Approach to Package Installation
The goal of 'pak' is to make package installation faster and more reliable. In particular, it performs all HTTP operations in parallel, so metadata resolution and package downloads are fast. Metadata and package files are cached on the local disk as well. 'pak' has a dependency solver, so it finds version conflicts before performing the installation. This version of 'pak' supports CRAN, 'Bioconductor' and 'GitHub' packages as well.
Maintained by Gábor Csárdi. Last updated 14 hours ago.
3.8 match 717 stars 13.05 score 277 scripts 17 dependents
modeloriented
EIX:Explain Interactions in 'XGBoost'
Structure mining from 'XGBoost' and 'LightGBM' models. Key functionalities of this package cover: visualisation of tree-based ensemble models, identification of interactions, measuring of variable importance, measuring of interaction importance, explanation of single prediction with break down plots (based on 'xgboostExplainer' and 'iBreakDown' packages). To download 'LightGBM' use the following link: <https://github.com/Microsoft/LightGBM>. 'EIX' is a part of the 'DrWhy.AI' universe.
Maintained by Ewelina Karbowiak. Last updated 4 years ago.
8.3 match 26 stars 5.72 score 6 scripts
ropensci
elastic:General Purpose Interface to 'Elasticsearch'
Connect to 'Elasticsearch', a 'NoSQL' database built on the 'Java' Virtual Machine. Interacts with the 'Elasticsearch' 'HTTP' API (<https://www.elastic.co/elasticsearch/>), including functions for setting connection details to 'Elasticsearch' instances, loading bulk data, searching for documents with both 'HTTP' query variables and 'JSON' based body requests. In addition, 'elastic' provides functions for interacting with APIs for 'indices', documents, nodes, clusters, an interface to the cat API, and more.
Maintained by Scott Chamberlain. Last updated 2 years ago.
database elasticsearch http api search nosql java json documents data-science database-wrapper etl
5.3 match 247 stars 8.98 score 151 scripts 1 dependent
greenwoodlab
pcev:Principal Component of Explained Variance
Principal component of explained variance (PCEV) is a statistical tool for the analysis of a multivariate response vector. It is a dimension-reduction technique, similar to Principal component analysis (PCA), that seeks to maximize the proportion of variance (in the response vector) being explained by a set of covariates.
Maintained by Maxime Turgeon. Last updated 6 years ago.
10.6 match 4 stars 4.30 score 7 scripts
marjoleinf
pre:Prediction Rule Ensembles
Derives prediction rule ensembles (PREs). Largely follows the procedure for deriving PREs as described in Friedman & Popescu (2008; <DOI:10.1214/07-AOAS148>), with adjustments and improvements. The main function pre() derives prediction rule ensembles consisting of rules and/or linear terms for continuous, binary, count, multinomial, and multivariate continuous responses. Function gpe() derives generalized prediction ensembles, consisting of rules, hinge and linear functions of the predictor variables.
Maintained by Marjolein Fokkema. Last updated 9 months ago.
5.1 match 58 stars 8.49 score 98 scripts 1 dependent
srmatth
mshap:Multiplicative SHAP Values for Two-Part Models
Allows for the computation of mSHAP values on two-part models as proposed by Matthews, S. and Hartman, B. (2021) <arXiv:2106.08990>. Also contains functions for simple plotting of the results (or any SHAP values). For information about the TreeSHAP algorithm that mSHAP builds on, see Lundberg, S.M., Erion, G., Chen, H., DeGrave, A., Prutkin, J.M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., Lee, S.I. (2020) <doi:10.1038/s42256-019-0138-9>.
Maintained by Spencer Matthews. Last updated 3 years ago.
8.8 match 4 stars 4.78 score 15 scripts
laylaparast
Rsurrogate:Robust Estimation of the Proportion of Treatment Effect Explained by Surrogate Marker Information
Provides functions to estimate the proportion of treatment effect on the primary outcome that is explained by the treatment effect on the surrogate marker.
Maintained by Layla Parast. Last updated 2 years ago.
13.2 match 3.16 score 12 scripts 4 dependents
bioc
decompTumor2Sig:Decomposition of individual tumors into mutational signatures by signature refitting
Uses quadratic programming for signature refitting, i.e., to decompose the mutation catalog from an individual tumor sample into a set of given mutational signatures (either Alexandrov-model signatures or Shiraishi-model signatures), computing weights that reflect the contributions of the signatures to the mutation load of the tumor.
Maintained by Rosario M. Piro. Last updated 5 months ago.
softwaresnpsequencingdnaseqgenomicvariationsomaticmutationbiomedicalinformaticsgeneticsbiologicalquestionstatisticalmethod
8.7 match 1 stars 4.78 score 10 scripts 1 dependents
inesortega
neuralGAM:Interpretable Neural Network Based on Generalized Additive Models
Neural network framework based on Generalized Additive Models from Hastie & Tibshirani (1990, ISBN:9780412343902), which trains a different neural network to estimate the contribution of each feature to the response variable. The networks are trained independently leveraging the local scoring and backfitting algorithms to ensure that the Generalized Additive Model converges and is additive. The resultant Neural Network is a highly accurate and interpretable deep learning model, which can be used for high-risk AI practices where decision-making should be based on accountable and interpretable algorithms.
Maintained by Ines Ortega-Fernandez. Last updated 6 months ago.
deep-neural-networksexplainable-aigamganngeneralized-additive-modelsgeneralized-additive-neural-networkself-explanatory-mlxai
7.5 match 2 stars 5.44 score 40 scripts
nashjc
optimx:Expanded Replacement and Extension of the 'optim' Function
Provides a replacement and extension of the optim() function to call several function minimization codes in R in a single statement. These methods handle smooth, possibly box constrained functions of several or many parameters. Note that function 'optimr()' was prepared to simplify the incorporation of minimization codes going forward. Also implements some utility codes and some extra solvers, including safeguarded Newton methods. Many methods previously separate are now included here. This is the version for CRAN.
Maintained by John C Nash. Last updated 2 months ago.
3.1 match 2 stars 12.87 score 1.8k scripts 89 dependents
modeloriented
localModel:LIME-Based Explanations with Interpretable Inputs Based on Ceteris Paribus Profiles
Local explanations of machine learning models describe how features contributed to a single prediction. This package implements an explanation method based on LIME (Local Interpretable Model-agnostic Explanations, see Tulio Ribeiro, Singh, Guestrin (2016) <doi:10.1145/2939672.2939778>) in which interpretable inputs are created based on local rather than global behaviour of each original feature.
Maintained by Przemyslaw Biecek. Last updated 3 years ago.
6.5 match 14 stars 6.16 score 23 scripts
mrcieu
TwoSampleMR:Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database
A package for performing Mendelian randomization using GWAS summary data. It uses the IEU OpenGWAS database <https://gwas.mrcieu.ac.uk/> to automatically obtain data, and a wide range of methods to run the analysis.
Maintained by Gibran Hemani. Last updated 11 days ago.
3.4 match 467 stars 11.23 score 1.7k scripts 1 dependents
alexzwanenburg
familiar:End-to-End Automated Machine Learning and Model Evaluation
Single unified interface for end-to-end modelling of regression, categorical and time-to-event (survival) outcomes. Models created using familiar are self-containing, and their use does not require additional information such as baseline survival, feature clustering, or feature transformation and normalisation parameters. Model performance, calibration, risk group stratification, (permutation) variable importance, individual conditional expectation, partial dependence, and more, are assessed automatically as part of the evaluation process and exported in tabular format and plotted, and may also be computed manually using export and plot functions. Where possible, metrics and values obtained during the evaluation process come with confidence intervals.
Maintained by Alex Zwanenburg. Last updated 6 months ago.
aiexplainable-aimachine-learningsurvival-analysistabular-data
7.5 match 31 stars 5.05 score 18 scripts
hadley
ggvis:Interactive Grammar of Graphics
An implementation of an interactive grammar of graphics, taking the best parts of 'ggplot2', combining them with the reactive framework of 'shiny' and drawing web graphics using 'vega'.
Maintained by Hadley Wickham. Last updated 1 years ago.
5.3 match 1 stars 7.02 score 2.3k scripts 11 dependents
bioc
ChemmineR:Cheminformatics Toolkit for R
ChemmineR is a cheminformatics package for analyzing drug-like small molecule data in R. Its latest version contains functions for efficient processing of large numbers of molecules, physicochemical/structural property predictions, structural similarity searching, classification and clustering of compound libraries with a wide spectrum of algorithms. In addition, it offers visualization functions for compound clustering results and chemical structures.
Maintained by Thomas Girke. Last updated 5 months ago.
cheminformaticsbiomedicalinformaticspharmacogeneticspharmacogenomicsmicrotitreplateassaycellbasedassaysvisualizationinfrastructuredataimportclusteringproteomicsmetabolomicscpp
3.9 match 14 stars 9.42 score 253 scripts 12 dependents
dwarton
ecostats:Code and Data Accompanying the Eco-Stats Text (Warton 2022)
Functions and data supporting the Eco-Stats text (Warton, 2022, Springer), and solutions to exercises. Functions include tools for using simulation envelopes in diagnostic plots, and a function for diagnostic plots of multivariate linear models. Datasets mentioned in the package are included here (where not available elsewhere) and there is a vignette for each chapter of the text with solutions to exercises.
Maintained by David Warton. Last updated 1 years ago.
5.5 match 8 stars 6.58 score 53 scripts
egenn
rtemis:Machine Learning and Visualization
Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.
Maintained by E.D. Gennatas. Last updated 1 months ago.
data-sciencedata-visualizationmachine-learningmachine-learning-libraryvisualization
5.1 match 145 stars 7.09 score 50 scripts 2 dependents
myles-lewis
nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'
Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.
Maintained by Myles Lewis. Last updated 6 days ago.
4.3 match 12 stars 7.92 score 46 scripts
grvanderploeg
parafac4microbiome:Parallel Factor Analysis Modelling of Longitudinal Microbiome Data
Creation and selection of PARAllel FACtor Analysis (PARAFAC) models of longitudinal microbiome data. You can import your own data with our import functions or use one of the example datasets to create your own PARAFAC models. Selection of the optimal number of components can be done using assessModelQuality() and assessModelStability(). The selected model can then be plotted using plotPARAFACmodel(). The Parallel Factor Analysis method was originally described by Carroll and Chang (1970) <doi:10.1007/BF02310791> and Harshman (1970) <https://www.psychology.uwo.ca/faculty/harshman/wpppfac0.pdf>.
Maintained by Geert Roelof van der Ploeg. Last updated 20 days ago.
dimensionality-reductionmicrobiomemicrobiome-datamultiwaymultiway-algorithmsparallel-factor-analysis
5.2 match 6 stars 6.31 score 13 scripts
mrc-ide
orderly2:Orderly Next Generation
Distributed reproducible computing framework, adopting ideas from git, docker and other software. By defining a lightweight interface around the inputs and outputs of an analysis, a lot of the repetitive work for reproducible research can be automated. We define a simple format for organising and describing work that facilitates collaborative reproducible research and acknowledges that all analyses are run multiple times over their lifespans.
Maintained by Rich FitzJohn. Last updated 2 months ago.
3.8 match 8 stars 8.30 score 49 scripts 2 dependents
giuseppec
iml:Interpretable Machine Learning
Interpretability methods to analyze the behavior and predictions of any machine learning model. Implemented methods are: Feature importance described by Fisher et al. (2018) <doi:10.48550/arxiv.1801.01489>, accumulated local effects plots described by Apley (2018) <doi:10.48550/arxiv.1612.08468>, partial dependence plots described by Friedman (2001) <www.jstor.org/stable/2699986>, individual conditional expectation ('ice') plots described by Goldstein et al. (2013) <doi:10.1080/10618600.2014.907095>, local models (variant of 'lime') described by Ribeiro et al. (2016) <doi:10.48550/arXiv.1602.04938>, the Shapley Value described by Strumbelj et al. (2014) <doi:10.1007/s10115-013-0679-x>, feature interactions described by Friedman et al. <doi:10.1214/07-AOAS148> and tree surrogate models.
Maintained by Giuseppe Casalicchio. Last updated 20 days ago.
2.4 match 494 stars 12.86 score 642 scripts 4 dependents
viadee
localICE:Local Individual Conditional Expectation
Local Individual Conditional Expectation ('localICE') is a local explanation approach from the field of eXplainable Artificial Intelligence (XAI). localICE is a model-agnostic XAI approach which provides three-dimensional local explanations for particular data instances. The approach is proposed in the master's thesis of Martin Walter as an extension to ICE (see Reference). The three dimensions are the two features at the horizontal and vertical axes as well as the target represented by different colors. The approach is applicable for classification and regression problems to explain interactions of two features towards the target. For classification models, the number of classes can be more than two and each class is added as a different color to the plot. The given instance is added to the plot as two dotted lines according to the feature values. The localICE-package can explain features of type factor and numeric of any machine learning model. Automatically supported machine learning packages are 'mlr', 'randomForest', 'caret' or any other package with an S3 predict function. For further model types from other libraries, a predict function has to be provided as an argument in order to get access to the model. Reference to the ICE approach: Alex Goldstein, Adam Kapelner, Justin Bleich, Emil Pitkin (2013) <arXiv:1309.6392>.
Maintained by Martin Walter. Last updated 5 years ago.
aiexplainable-aiggplotmachine-learningvisualization
8.4 match 7 stars 3.54 score 3 scripts
mfrasco
Metrics:Evaluation Metrics for Machine Learning
An implementation of evaluation metrics in R that are commonly used in supervised machine learning. It implements metrics for regression, time series, binary classification, classification, and information retrieval problems. It has zero dependencies and a consistent, simple interface for all functions.
Maintained by Michael Frasco. Last updated 6 years ago.
2.3 match 99 stars 13.02 score 6.1k scripts 51 dependents
modeloriented
live:Local Interpretable (Model-Agnostic) Visual Explanations
Interpretability of complex machine learning models is a growing concern. This package helps to understand key factors that drive the decisions made by a complicated predictive model (a so-called black box model). This is achieved through local approximations that are based either on an additive-regression-like model or on a CART-like model that allows for higher interactions. The methodology is based on Tulio Ribeiro, Singh, Guestrin (2016) <doi:10.1145/2939672.2939778>. More details can be found in Staniak, Biecek (2018) <doi:10.32614/RJ-2018-072>.
Maintained by Mateusz Staniak. Last updated 6 years ago.
imlinterpretabilitylimemachine-learningmodel-visualizationvisual-explanationsxai
5.2 match 35 stars 5.59 score 55 scripts
pbiecek
breakDown:Model Agnostic Explainers for Individual Predictions
Model agnostic tool for decomposition of predictions from black boxes. Break Down Table shows contributions of every variable to a final prediction. Break Down Plot presents variable contributions in a concise graphical way. This package works for binary classifiers and general regression models.
Maintained by Przemyslaw Biecek. Last updated 1 years ago.
data-scienceimlinterpretabilitymachine-learningvisual-explanationsxai
3.0 match 103 stars 8.90 score 91 scripts 2 dependents
rdatatable
data.table:Extension of `data.frame`
Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
Maintained by Tyson Barrett. Last updated 4 hours ago.
1.1 match 3.7k stars 23.52 score 230k scripts 4.6k dependents
bioc
scGPS:A complete analysis of single cell subpopulations, from identifying subpopulations to analysing their relationship (scGPS = single cell Global Predictions of Subpopulation)
The package implements two main algorithms to answer two key questions: a SCORE (Stable Clustering at Optimal REsolution) to find subpopulations, followed by scGPS to investigate the relationships between subpopulations.
Maintained by Quan Nguyen. Last updated 5 months ago.
singlecellclusteringdataimportsequencingcoverageopenblascpp
5.1 match 4 stars 5.20 score 7 scripts
rconsortium
S7:An Object Oriented System Meant to Become a Successor to S3 and S4
A new object oriented programming system designed to be a successor to S3 and S4. It includes formal class, generic, and method specification, and a limited form of multiple dispatch. It has been designed and implemented collaboratively by the R Consortium Object-Oriented Programming Working Group, which includes representatives from R-Core, 'Bioconductor', 'Posit'/'tidyverse', and the wider R community.
Maintained by Hadley Wickham. Last updated 4 months ago.
2.0 match 432 stars 13.15 score 86 scripts 22 dependents
spatstat
spatstat.utils:Utility Functions for 'spatstat'
Contains utility functions for the 'spatstat' family of packages which may also be useful for other purposes.
Maintained by Adrian Baddeley. Last updated 2 days ago.
spatial-analysisspatial-dataspatstat
2.3 match 5 stars 11.66 score 134 scripts 248 dependents
duckdb
duckdb:DBI Package for the DuckDB Database Management System
The DuckDB project is an embedded analytical data management system with support for the Structured Query Language (SQL). This package includes all of DuckDB and an R Database Interface (DBI) connector.
Maintained by Kirill Müller. Last updated 3 days ago.
1.9 match 158 stars 13.79 score 1.7k scripts 46 dependents
khliland
multiblock:Multiblock Data Fusion in Statistics and Machine Learning
Functions and datasets to support Smilde, Næs and Liland (2021, ISBN: 978-1-119-60096-1) "Multiblock Data Fusion in Statistics and Machine Learning - Applications in the Natural and Life Sciences". This implements and imports a large collection of methods for multiblock data analysis with common interfaces, result- and plotting functions, several real data sets and six vignettes covering a range of different applications.
Maintained by Kristian Hovde Liland. Last updated 2 months ago.
3.8 match 14 stars 6.68 score 19 scripts
adeverse
ade4:Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences
Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.
Maintained by Aurélie Siberchicot. Last updated 13 days ago.
1.7 match 39 stars 14.96 score 2.2k scripts 256 dependents
bioc
MatrixQCvis:Shiny-based interactive data-quality exploration for omics data
Data quality assessment is an integral part of preparatory data analysis to ensure sound biological information retrieval. We present here the MatrixQCvis package, which provides shiny-based interactive visualization of data quality metrics at the per-sample and per-feature level. It is broadly applicable to quantitative omics data types that come in matrix-like format (features x samples). It enables the detection of low-quality samples, drifts, outliers and batch effects in data sets. Visualizations include amongst others bar- and violin plots of the (count/intensity) values, mean vs standard deviation plots, MA plots, empirical cumulative distribution function (ECDF) plots, visualizations of the distances between samples, and multiple types of dimension reduction plots. Furthermore, MatrixQCvis allows for differential expression analysis based on the limma (moderated t-tests) and proDA (Wald tests) packages. MatrixQCvis builds upon the popular Bioconductor SummarizedExperiment S4 class and thus enables facile integration into existing workflows. The package is especially tailored towards metabolomics and proteomics mass spectrometry data, but also allows assessing the data quality of other data types that can be represented in a SummarizedExperiment object.
Maintained by Thomas Naake. Last updated 5 months ago.
visualizationshinyappsguiqualitycontroldimensionreductionmetabolomicsproteomicstranscriptomics
5.2 match 4.74 score 4 scripts
laijiangshan
gam.hp:Hierarchical Partitioning of Adjusted R2 and Explained Deviance for Generalized Additive Models
Conducts hierarchical partitioning to calculate individual contributions of each predictor towards adjusted R2 and explained deviance for generalized additive models based on output of gam() in the 'mgcv' package, applying the algorithm in this paper: Lai (2024) <doi:10.1016/j.pld.2024.06.002>.
Maintained by Jiangshan Lai. Last updated 3 months ago.
5.0 match 6 stars 4.95 score 6 scripts
vegandevs
vegan:Community Ecology Package
Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
Maintained by Jari Oksanen. Last updated 16 days ago.
ecological-modellingecologyordinationfortranopenblas
1.3 match 472 stars 19.41 score 15k scripts 440 dependents
dcousin3
ANOPA:Analyses of Proportions using Anscombe Transform
Analyses of Proportions can be performed on the Anscombe (arcsine-related) transformed data. The 'ANOPA' package can analyze proportions obtained from up to four factors. The factors can be within-subject or between-subject or a mix of within- and between-subject. The main, omnibus analysis can be followed by additive decompositions into interaction effects, main effects, simple effects, contrast effects, etc., mimicking precisely the logic of ANOVA. For that reason, we call this set of tools 'ANOPA' (Analysis of Proportion using Anscombe transform) to highlight its similarities with ANOVA. The 'ANOPA' framework also makes plots of proportions, along with confidence intervals, easy to obtain. Finally, computing effect sizes and planning statistical power are easily done under this framework. One particularity: 'ANOPA' computes F statistics that have infinite degrees of freedom in the denominator. See Laurencelle and Cousineau (2023) <doi:10.3389/fpsyg.2022.1045436>.
Maintained by Denis Cousineau. Last updated 2 months ago.
error-barsproportionsstatistical-testingstatisticssummary-statistics
6.6 match 1 stars 3.65 score 18 scripts
jpfitzinger
tidyfit:Regularized Linear Modeling with Tidy Data
An extension to the 'R' tidy data environment for automated machine learning. The package allows fitting and cross validation of linear regression and classification algorithms on grouped data.
Maintained by Johann Pfitzinger. Last updated 2 months ago.
auto-mlclassificationmachine-learningregressiontidyverse
3.3 match 16 stars 7.22 score 26 scripts
protviz
prozor:Minimal Protein Set Explaining Peptide Spectrum Matches
Determine minimal protein set explaining peptide spectrum matches. Utility functions for creating fasta amino acid databases with decoys and contaminants. Peptide false discovery rate estimation for target decoy search results on psm, precursor, peptide and protein level. Computing dynamic swath window sizes based on MS1 or MS2 signal distributions.
Maintained by Witold Wolski. Last updated 4 months ago.
softwaremassspectrometryproteomicsexperimenthubsoftware
5.2 match 6 stars 4.45 score 93 scripts
vast-lib
tinyVAST:Multivariate Spatio-Temporal Models using Structural Equations
Fits a wide variety of multivariate spatio-temporal models with simultaneous and lagged interactions among variables (including vector autoregressive spatio-temporal ('VAST') dynamics) for areal, continuous, or network spatial domains. It includes time-variable, space-variable, and space-time-variable interactions using dynamic structural equation models ('DSEM') as an expressive interface, and the 'mgcv' package to specify splines via the formula interface. See Thorson et al. (2024) <doi:10.48550/arXiv.2401.10193> for more details.
Maintained by James T. Thorson. Last updated 8 hours ago.
vector-autoregressive-spatio-temporal-modelcpp
3.3 match 13 stars 6.80 score
tidyverse
duckplyr:A 'DuckDB'-Backed Version of 'dplyr'
A drop-in replacement for 'dplyr', powered by 'DuckDB' for performance. Offers convenient utilities for working with in-memory and larger-than-memory data while retaining full 'dplyr' compatibility.
Maintained by Kirill Müller. Last updated 5 days ago.
analyticsdataframedplyrduckdbperformance
2.0 match 309 stars 11.33 score 220 scripts
bioc
pcaMethods:A collection of PCA methods
Provides Bayesian PCA, Probabilistic PCA, Nipals PCA, Inverse Non-Linear PCA and the conventional SVD PCA. A cluster based method for missing value estimation is included for comparison. BPCA, PPCA and NipalsPCA may be used to perform PCA on incomplete data as well as for accurate missing value estimation. A set of methods for printing and plotting the results is also provided. All PCA methods make use of the same data structure (pcaRes) to provide a common interface to the PCA results. Initiated at the Max-Planck Institute for Molecular Plant Physiology, Golm, Germany.
Maintained by Henning Redestig. Last updated 5 months ago.
1.7 match 49 stars 13.10 score 538 scripts 73 dependents
tagteam
riskRegression:Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks
Implementation of the following methods for event history analysis. Risk regression models for survival endpoints also in the presence of competing risks are fitted using binomial regression based on a time sequence of binary event status variables. A formula interface for the Fine-Gray regression model and an interface for the combination of cause-specific Cox regression models. A toolbox for assessing and comparing performance of risk predictions (risk markers and risk prediction models). Prediction performance is measured by the Brier score and the area under the ROC curve for binary possibly time-dependent outcome. Inverse probability of censoring weighting and pseudo values are used to deal with right censored data. Lists of risk markers and lists of risk models are assessed simultaneously. Cross-validation repeatedly splits the data, trains the risk prediction models on one part of each split and then summarizes and compares the performance across splits.
Maintained by Thomas Alexander Gerds. Last updated 18 days ago.
1.7 match 46 stars 13.00 score 736 scripts 35 dependents
aljacq
LorenzRegression:Lorenz and Penalized Lorenz Regressions
Inference for the Lorenz and penalized Lorenz regressions. More broadly, the package proposes functions to assess inequality and graphically represent it. The Lorenz Regression procedure is introduced in Heuchenne and Jacquemain (2022) <doi:10.1016/j.csda.2021.107347> and in Jacquemain, A., C. Heuchenne, and E. Pircalabelu (2024) <doi:10.1214/23-EJS2200>.
Maintained by Alexandre Jacquemain. Last updated 11 days ago.
5.3 match 1 stars 3.95 score
eagerai
fastai:Interface to 'fastai'
The 'fastai' <https://docs.fast.ai/index.html> library simplifies training fast and accurate neural networks using modern best practices. It is based on research into deep learning best practices undertaken at 'fast.ai', including 'out of the box' support for vision, text, tabular, audio, time series, and collaborative filtering models.
Maintained by Turgut Abdullayev. Last updated 11 months ago.
audiocollaborative-filteringdarknetdarknet-image-classificationfastaimedicalobject-detectiontabulartextvision
2.3 match 118 stars 9.40 score 76 scripts
easystats
effectsize:Indices of Effect Size
Provides utilities to work with indices of effect size for a wide variety of models and hypothesis tests (see list of supported models using the function 'insight::supported_models()'), allowing computation of and conversion between indices such as Cohen's d, r, odds, etc. References: Ben-Shachar et al. (2020) <doi:10.21105/joss.02815>.
Maintained by Mattan S. Ben-Shachar. Last updated 1 months ago.
anovacohens-dcomputeconversioncorrelationeffect-sizeeffectsizehacktoberfesthedges-ginterpretationstandardizationstandardizedstatistics
1.3 match 344 stars 16.38 score 1.8k scripts 29 dependents
modeloriented
rSAFE:Surrogate-Assisted Feature Extraction
Provides a model-agnostic tool for training a white-box model on features extracted from a black-box model. For more information see: Gosiewska et al. (2020) <doi:10.1016/j.dss.2021.113556>.
Maintained by Alicja Gosiewska. Last updated 3 years ago.
feature-engineeringfeature-extractionimlinterpretabilitymachine-learningxai
3.0 match 28 stars 6.79 score 44 scripts
chrsigg
nsprcomp:Non-Negative and Sparse PCA
Two methods for performing a constrained principal component analysis (PCA), where non-negativity and/or sparsity constraints are enforced on the principal axes (PAs). The function 'nsprcomp' computes one principal component (PC) after the other. Each PA is optimized such that the corresponding PC has maximum additional variance not explained by the previous components. In contrast, the function 'nscumcomp' jointly computes all PCs such that the cumulative variance is maximal. Both functions have the same interface as the 'prcomp' function from the 'stats' package (plus some extra parameters), and both return the result of the analysis as an object of class 'nsprcomp', which inherits from 'prcomp'. See <https://sigg-iten.ch/learningbits/2013/05/27/nsprcomp-is-on-cran/> and Sigg et al. (2008) <doi:10.1145/1390156.1390277> for more details.
Maintained by Christian Sigg. Last updated 7 years ago.
4.3 match 9 stars 4.77 score 22 scripts 1 dependents
dcousin3
CohensdpLibrary:Cohen's d_p Computation with Confidence Intervals
Computing Cohen's d_p in any experimental design (between-subject, within-subject, and single-group design). Cousineau (2022) <https://github.com/dcousin3/CohensdpLibrary/>; Cohen (1969, ISBN: 0-8058-0283-5).
Maintained by Denis Cousineau. Last updated 4 days ago.
6.6 match 1 stars 3.00 score 3 scripts
bioc
CausalR:Causal network analysis methods
Causal network analysis methods for regulator prediction and network reconstruction from genome scale data.
Maintained by Glyn Bradley. Last updated 5 months ago.
immunooncologysystemsbiologynetworkgraphandnetworknetwork inferencetranscriptomicsproteomicsdifferentialexpressionrnaseqmicroarray
5.5 match 3.60 score 7 scripts
laresbernardo
lares:Analytics & Machine Learning Sidekick
Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of family functions, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scrapper, it helps the analyst or data scientist to get quick and robust results, without the need of repetitive coding or advanced R programming skills.
Maintained by Bernardo Lares. Last updated 24 days ago.
analyticsapiautomationautomldata-sciencedescriptive-statisticsh2omachine-learningmarketingmmmpredictive-modelingpuzzlerlanguagerobynvisualization
2.0 match 233 stars 9.84 score 185 scripts 1 dependents
squidlobster
castor:Efficient Phylogenetics on Large Trees
Efficient phylogenetic analyses on massive phylogenies comprising up to millions of tips. Functions include pruning, rerooting, calculation of most-recent common ancestors, calculating distances from the tree root and calculating pairwise distances. Calculation of phylogenetic signal and mean trait depth (trait conservatism), ancestral state reconstruction and hidden character prediction of discrete characters, simulating and fitting models of trait evolution, fitting and simulating diversification models, dating trees, comparing trees, and reading/writing trees in Newick format. Citation: Louca, Stilianos and Doebeli, Michael (2017) <doi:10.1093/bioinformatics/btx701>.
Maintained by Stilianos Louca. Last updated 4 months ago.
3.4 match 2 stars 5.75 score 450 scripts 9 dependents
ericaponzi
RaJIVE:Robust Angle Based Joint and Individual Variation Explained
A robust alternative to the aJIVE (angle based Joint and Individual Variation Explained) method (Feng et al 2018: <doi:10.1016/j.jmva.2018.03.008>) for the estimation of joint and individual components in the presence of outliers in multi-source data. It decomposes the multi-source data into joint, individual and residual (noise) contributions. The decomposition is robust to outliers and noise in the data. The method is illustrated in Ponzi et al (2021) <arXiv:2101.09110>.
Maintained by Erica Ponzi. Last updated 4 years ago.
7.1 match 2.70 score 1 scripts
computationalstylistics
litRiddle:Dataset and Tools to Research the Riddle of Literary Quality
Dataset and functions to explore quality of literary novels. The package is a part of the Riddle of Literary Quality project, and it contains the data of a reader survey about fiction in Dutch, a description of the novels the readers rated, and the results of stylistic measurements of the novels. The package also contains functions to combine, analyze, and visualize these data. For more details, see: Eder M, van Zundert J, Lensink S, van Dalen-Oskam K (2022). Replicating The Riddle of Literary Quality: The litRiddle package for R. In _Digital Humanities 2022: Conference Abstracts_, 636-637.
Maintained by Maciej Eder. Last updated 2 years ago.
7.1 match 2.70 score 2 scripts
cecileproust-lima
lcmm:Extended Mixed Models Using Latent Classes and Latent Processes
Estimation of various extensions of the mixed models including latent class mixed models, joint latent class mixed models, mixed models for curvilinear outcomes, mixed models for multivariate longitudinal outcomes using a maximum likelihood estimation method (Proust-Lima, Philipps, Liquet (2017) <doi:10.18637/jss.v078.i02>).
Maintained by Cecile Proust-Lima. Last updated 1 months ago.
1.7 match 62 stars 11.41 score 249 scripts 7 dependents
connormayer
maxent.ot:Perform Phonological Analyses using Maximum Entropy Optimality Theory
Fit Maximum Entropy Optimality Theory models to data sets, generate the predictions made by such models for novel data, and compare the fit of different models using a variety of metrics. The package is described in Mayer, C., Tan, A., Zuraw, K. (in press) <https://sites.socsci.uci.edu/~cjmayer/papers/cmayer_et_al_maxent_ot_accepted.pdf>.
Maintained by Connor Mayer. Last updated 4 months ago.
3.4 match 8 stars 5.51 score 6 scripts
tomasfryda
h2o:R Interface for the 'H2O' Scalable Machine Learning Platform
R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
Maintained by Tomas Fryda. Last updated 1 years ago.
2.3 match 3 stars 8.20 score 7.8k scripts 11 dependents
plangfelder
WGCNA:Weighted Correlation Network Analysis
Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>. Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.
Maintained by Peter Langfelder. Last updated 6 months ago.
1.9 match 54 stars 9.65 score 5.3k scripts 32 dependents
anna-neufeld
splinetree:Longitudinal Regression Trees and Forests
Builds regression trees and random forests for longitudinal or functional data using a spline projection method. Implements and extends the work of Yu and Lambert (1999) <doi:10.1080/10618600.1999.10474847>. This method allows trees and forests to be built while considering either level and shape or only shape of response trajectories.
Maintained by Anna Neufeld. Last updated 6 years ago.
3.4 match 4 stars 5.24 score 29 scripts
cran
IMEC:Ising Model of Explanatory Coherence
Theories are one of the most important tools of science. Although psychologists have discussed problems of theory in their discipline for a long time, weak theories are still widespread in most subfields. One possible reason for this is that psychologists lack the tools to systematically assess the quality of their theories. Previously, a computational model for formal theory evaluation based on the concept of explanatory coherence was developed (Thagard, 1989, <doi:10.1017/S0140525X00057046>). However, there are possible improvements to this model and it is not available in software that psychologists typically use. Therefore, a new implementation of explanatory coherence based on the Ising model is available in this R-package.
Maintained by Maximilian Maier. Last updated 4 years ago.
6.6 match 2.70 score 1 scripts
danchaltiel
crosstable:Crosstables for Descriptive Analyses
Create descriptive tables for continuous and categorical variables. Apply summary statistics and counting functions, with or without a grouping variable, and create beautiful reports using 'rmarkdown' or 'officer'. You can also compute effect sizes and statistical tests if needed.
Maintained by Dan Chaltiel. Last updated 2 months ago.
descriptive-statisticsflextablefrequency-tablehtml-reportmswordofficer
1.7 match 116 stars 10.37 score 340 scripts
bioc
singleCellTK:Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data
The Single Cell Toolkit (SCTK) in the singleCellTK package provides an interface to popular tools for importing, quality control, analysis, and visualization of single cell RNA-seq data. SCTK allows users to seamlessly integrate tools from various packages at different stages of the analysis workflow. A general "a la carte" workflow gives users access to multiple methods for data importing, calculation of general QC metrics, doublet detection, ambient RNA estimation and removal, filtering, normalization, batch correction or integration, dimensionality reduction, 2-D embedding, clustering, marker detection, differential expression, cell type labeling, pathway analysis, and data exporting. Curated workflows can be used to run Seurat and Celda. Streamlined quality control can be performed on the command line using the SCTK-QC pipeline. Users can analyze their data using commands in the R console or by using an interactive Shiny Graphical User Interface (GUI). Specific analyses or entire workflows can be summarized and shared with comprehensive HTML reports generated by Rmarkdown. Additional documentation and vignettes can be found at camplab.net/sctk.
Maintained by Joshua David Campbell. Last updated 24 days ago.
singlecellgeneexpressiondifferentialexpressionalignmentclusteringimmunooncologybatcheffectnormalizationqualitycontroldataimportgui
1.7 match 181 stars 10.16 score 252 scripts
fchen365
epca:Exploratory Principal Component Analysis
Exploratory principal component analysis for large-scale datasets, including sparse principal component analysis and sparse matrix approximation.
Maintained by Fan Chen. Last updated 11 months ago.
community-detectionexploratory-data-analysismatrix-decompositionspcaprincipal-component-analysissparse-matrix
3.7 match 11 stars 4.74 score 8 scripts
mattblackwell
DirectEffects:Estimating Controlled Direct Effects for Explaining Causal Findings
A set of functions to estimate the controlled direct effect of treatment fixing a potential mediator to a specific value. Implements the sequential g-estimation estimator described in Vansteelandt (2009) <doi:10.1097/EDE.0b013e3181b6f4c9> and Acharya, Blackwell, and Sen (2016) <doi:10.1017/S0003055416000216> and the telescope matching estimator described in Blackwell and Strezhnev (2020) <doi:10.1111/rssa.12759>.
Maintained by Matthew Blackwell. Last updated 20 days ago.
2.9 match 18 stars 6.09 score 17 scripts
donaldrwilliams
BGGM:Bayesian Gaussian Graphical Models
Fit Bayesian Gaussian graphical models. The methods are separated into two Bayesian approaches for inference: hypothesis testing and estimation. There are extensions for confirmatory hypothesis testing, comparing Gaussian graphical models, and node wise predictability. These methods were recently introduced in the Gaussian graphical model literature, including Williams (2019) <doi:10.31234/osf.io/x8dpr>, Williams and Mulder (2019) <doi:10.31234/osf.io/ypxd8>, Williams, Rast, Pericchi, and Mulder (2019) <doi:10.31234/osf.io/yt386>.
Maintained by Philippe Rast. Last updated 3 months ago.
bayes-factorsbayesian-hypothesis-testinggaussian-graphical-modelsopenblascppopenmp
1.8 match 55 stars 9.64 score 102 scripts 1 dependents
growthcharts
brokenstick:Broken Stick Model for Irregular Longitudinal Data
Data on multiple individuals through time are often sampled at times that differ between persons. Irregular observation times can severely complicate the statistical analysis of the data. The broken stick model approximates each subject’s trajectory by one or more connected line segments. The times at which segments connect (breakpoints) are identical for all subjects and under control of the user. A well-fitting broken stick model effectively transforms individual measurements made at irregular times into regular trajectories with common observation times. Specification of the model requires three variables: time, measurement and subject. The model is a special case of the linear mixed model, with time as a linear B-spline and subject as the grouping factor. The main assumptions are: subjects are exchangeable, trajectories between consecutive breakpoints are straight, random effects follow a multivariate normal distribution, and unobserved data are missing at random. The package contains functions for fitting the broken stick model to data, for predicting curves in new data and for plotting broken stick estimates. The package supports two optimization methods, and includes options to structure the variance-covariance matrix of the random effects. The analyst may use the software to smooth growth curves by a series of connected straight lines, to align irregularly observed curves to a common time grid, to create synthetic curves at a user-specified set of breakpoints, to estimate the time-to-time correlation matrix and to predict future observations. See <doi:10.18637/jss.v106.i07> for additional documentation on background, methodology and applications.
Maintained by Stef van Buuren. Last updated 2 years ago.
b-splinegrowth-curveslinear-mixed-modelslongitudinal-data
3.2 match 9 stars 5.33 score 12 scripts
ropensci
sofa:Connector to 'CouchDB'
Provides an interface to the 'NoSQL' database 'CouchDB' (<http://couchdb.apache.org>). Methods are provided for managing databases within 'CouchDB', including creating/deleting/updating/transferring, and managing documents within databases. One can connect with a local 'CouchDB' instance, or remote 'CouchDB' databases such as 'Cloudant'. Documents can be inserted directly from vectors, lists, data.frames, and 'JSON'. Targeted at 'CouchDB' v2 or greater.
Maintained by Yaoxiang Li. Last updated 1 months ago.
couchdbdatabasenosqldocumentscloudantcouchdb-client
2.3 match 33 stars 7.51 score 54 scripts
dwarton
mvabund:Statistical Methods for Analysing Multivariate Abundance Data
A set of tools for displaying, modeling and analysing multivariate abundance data in community ecology. See 'mvabund-package.Rd' for details of overall package organization. The package is implemented with the GNU Scientific Library (<http://www.gnu.org/software/gsl/>) and 'Rcpp' (<http://dirk.eddelbuettel.com/code/rcpp.html>) 'R' / 'C++' classes.
Maintained by David Warton. Last updated 1 years ago.
1.7 match 10 stars 10.13 score 680 scripts 5 dependents
gavinsimpson
analogue:Analogue and Weighted Averaging Methods for Palaeoecology
Fits Modern Analogue Technique and Weighted Averaging transfer function models for prediction of environmental data from species data, and related methods used in palaeoecology.
Maintained by Gavin L. Simpson. Last updated 6 months ago.
1.9 match 14 stars 8.96 score 185 scripts 4 dependents
richardjtelford
palaeoSig:Significance Tests for Palaeoenvironmental Reconstructions
Several tests of quantitative palaeoenvironmental reconstructions from microfossil assemblages, including the null model tests of the statistical significance of reconstructions developed by Telford and Birks (2011) <doi:10.1016/j.quascirev.2011.03.002>, and tests of the effect of spatial autocorrelation on transfer function model performance using methods from Telford and Birks (2009) <doi:10.1016/j.quascirev.2008.12.020> and Trachsel and Telford (2016) <doi:10.5194/cp-12-1215-2016>. Age-depth models with generalized mixed-effect regression from Heegaard et al (2005) <doi:10.1191/0959683605hl836rr> are also included.
Maintained by Richard Telford. Last updated 2 years ago.
3.0 match 3 stars 5.45 score 31 scripts
evolecolgroup
tidysdm:Species Distribution Models with Tidymodels
Fit species distribution models (SDMs) using the 'tidymodels' framework, which provides a standardised interface to define models and process their outputs. 'tidysdm' expands 'tidymodels' by providing methods for spatial objects, models and metrics specific to SDMs, as well as a number of specialised functions to process occurrences for contemporary and palaeo datasets. The full functionalities of the package are described in Leonardi et al. (2023) <doi:10.1101/2023.07.24.550358>.
Maintained by Andrea Manica. Last updated 10 days ago.
species-distribution-modellingtidymodels
1.9 match 31 stars 8.82 score 51 scripts
chrsigg
nscancor:Non-Negative and Sparse CCA
Two implementations of canonical correlation analysis (CCA) that are based on iterated regression. By choosing the appropriate regression algorithm for each data domain, it is possible to enforce sparsity, non-negativity or other kinds of constraints on the projection vectors. Multiple canonical variables are computed sequentially using a generalized deflation scheme, where the additional correlation not explained by previous variables is maximized. nscancor() is used to analyze paired data from two domains, and has the same interface as cancor() from the 'stats' package (plus some extra parameters). mcancor() is appropriate for analyzing data from three or more domains. See <https://sigg-iten.ch/learningbits/2014/01/20/canonical-correlation-analysis-under-constraints/> and Sigg et al. (2007) <doi:10.1109/MLSP.2007.4414315> for more details.
Maintained by Christian Sigg. Last updated 2 years ago.
4.3 match 13 stars 3.81 score 7 scripts
zabore
riskclustr:Functions to Study Etiologic Heterogeneity
A collection of functions related to the study of etiologic heterogeneity both across disease subtypes and across individual disease markers. The included functions allow one to quantify the extent of etiologic heterogeneity in the context of a case-control study, and provide p-values to test for etiologic heterogeneity across individual risk factors. Begg CB, Zabor EC, Bernstein JL, Bernstein L, Press MF, Seshan VE (2013) <doi:10.1002/sim.5902>.
Maintained by Emily C. Zabor. Last updated 1 years ago.
3.4 match 1 stars 4.81 score 26 scripts
bioc
consICA:consensus Independent Component Analysis
consICA implements a data-driven deconvolution method, consensus independent component analysis (ICA), to decompose heterogeneous omics data and extract features suitable for patient diagnostics and prognostics. The method separates biologically relevant transcriptional signals from technical effects and provides information about the cellular composition and biological processes. The implementation of parallel computing in the package ensures efficient analysis on modern multicore systems.
Maintained by Petr V. Nazarov. Last updated 5 months ago.
technologystatisticalmethodsequencingrnaseqtranscriptomicsclassificationfeatureextraction
3.8 match 4.30 score 2 scripts
bioc
ALDEx2:Analysis Of Differential Abundance Taking Sample and Scale Variation Into Account
A differential abundance analysis for the comparison of two or more conditions. Useful for analyzing data from standard RNA-seq or meta-RNA-seq assays as well as selected and unselected values from in-vitro sequence selections. Uses a Dirichlet-multinomial model to infer abundance from counts, optimized for three or more experimental replicates. The method infers biological and sampling variation to calculate the expected false discovery rate, given the variation, based on a Wilcoxon Rank Sum test and Welch's t-test (via aldex.ttest), a Kruskal-Wallis test (via aldex.kw), a generalized linear model (via aldex.glm), or a correlation test (via aldex.corr). All tests report predicted p-values and posterior Benjamini-Hochberg corrected p-values. ALDEx2 also calculates expected standardized effect sizes for paired or unpaired study designs. ALDEx2 can now be used to estimate the effect of scale on the results and report on the scale-dependent robustness of results.
Maintained by Greg Gloor. Last updated 5 months ago.
differentialexpressionrnaseqtranscriptomicsgeneexpressiondnaseqchipseqbayesiansequencingsoftwaremicrobiomemetagenomicsimmunooncologyscale simulationposterior p-value
1.5 match 28 stars 10.70 score 424 scripts 3 dependents
bxc147
Epi:Statistical Analysis in Epidemiology
Functions for demographic and epidemiological analysis in the Lexis diagram, i.e. register and cohort follow-up data. In particular, representation, manipulation, rate estimation and simulation for multistate data (the Lexis suite of functions), which includes interfaces to the 'mstate', 'etm' and 'cmprsk' packages. Contains functions for Age-Period-Cohort and Lee-Carter modeling, a function for interval censored data, some useful functions for tabulation and plotting, and a number of epidemiological data sets.
Maintained by Bendix Carstensen. Last updated 2 months ago.
1.7 match 4 stars 9.65 score 708 scripts 11 dependents
citoverse
cito:Building and Training Neural Networks
The 'cito' package provides a user-friendly interface for training and interpreting deep neural networks (DNN). 'cito' simplifies the fitting of DNNs by supporting the familiar formula syntax and hyperparameter tuning under cross-validation, and helps to detect and handle convergence problems. DNNs can be trained on CPU, GPU and macOS GPUs. In addition, 'cito' has many downstream functionalities such as various explainable AI (xAI) metrics (e.g. variable importance, partial dependence plots, accumulated local effect plots, and effect estimates) to interpret trained DNNs. 'cito' optionally provides confidence intervals (and p-values) for all xAI metrics and predictions. At the same time, 'cito' is computationally efficient because it is based on the deep learning framework 'torch'. The 'torch' package is native to R, so no Python installation or other API is required for this package.
Maintained by Maximilian Pichler. Last updated 2 months ago.
machine-learningneural-network
1.8 match 42 stars 9.07 score 129 scripts 1 dependents
stscl
gdverse:Analysis of Spatial Stratified Heterogeneity
Analyzing spatial factors and exploring spatial associations based on the concept of spatial stratified heterogeneity, while also taking into account local spatial dependencies, spatial interpretability, complex spatial interactions, and robust spatial stratification. Additionally, it supports the spatial stratified heterogeneity family established in academic literature.
Maintained by Wenbo Lv. Last updated 2 days ago.
geographical-detectorgeoinformaticsgeospatial-analysisspatial-statisticsspatial-stratified-heterogeneitycpp
1.8 match 32 stars 9.07 score 41 scripts 2 dependents
jinghuazhao
gap:Genetic Analysis Package
As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many projects, hence also in line with the name (gap).
Maintained by Jing Hua Zhao. Last updated 16 days ago.
1.3 match 12 stars 11.88 score 448 scripts 16 dependents
leondap
recluster:Ordination Methods for the Analysis of Beta-Diversity Indices
The analysis of different aspects of biodiversity requires specific algorithms. For example, in regionalisation analyses, the high frequency of ties and zero values in dissimilarity matrices produced by Beta-diversity turnover produces hierarchical cluster dendrograms whose topology and bootstrap supports are affected by the order of rows in the original matrix. Moreover, visualisation of biogeographical regionalisation can be facilitated by a combination of hierarchical clustering and multi-dimensional scaling. The recluster package provides robust techniques to visualise and analyse patterns of biodiversity and to improve occurrence data for cryptic taxa.
Maintained by Leonardo Dapporto. Last updated 4 months ago.
3.3 match 4 stars 4.69 score 41 scripts
r-forge
modEvA:Model Evaluation and Analysis
Analyses species distribution models and evaluates their performance. It includes functions for variation partitioning, extracting variable importance, computing several metrics of model discrimination and calibration performance, optimizing prediction thresholds based on a number of criteria, performing multivariate environmental similarity surface (MESS) analysis, and displaying various analytical plots. Initially described in Barbosa et al. (2013) <doi:10.1111/ddi.12100>.
Maintained by A. Marcia Barbosa. Last updated 11 days ago.
2.3 match 6.82 score 269 scripts 3 dependents
mrc-ide
monty:Monte Carlo Models
Experimental sources for the next generation of mcstate, now called 'monty', which will support much of the old mcstate functionality but also new things like better parameter interfaces, Hamiltonian Monte Carlo, and other features.
Maintained by Rich FitzJohn. Last updated 1 months ago.
2.0 match 3 stars 7.52 score 29 scripts 3 dependents
reumandc
wsyn:Wavelet Approaches to Studies of Synchrony in Ecology and Other Fields
Tools for a wavelet-based approach to analyzing spatial synchrony, principally in ecological data. Some tools will be useful for studying community synchrony. See, for instance, Sheppard et al (2016) <doi:10.1038/NCLIMATE2991>, Sheppard et al (2017) <doi:10.1051/epjnbp/2017000>, Sheppard et al (2019) <doi:10.1371/journal.pcbi.1006744>.
Maintained by Daniel C. Reuman. Last updated 3 years ago.
3.1 match 1 stars 4.80 score 125 scripts
cran
BioM2:Biologically Explainable Machine Learning Framework
Biologically Explainable Machine Learning Framework for Phenotype Prediction using omics data described in Chen and Schwarz (2017) <doi:10.48550/arXiv.1712.00336>. Identifying reproducible and interpretable biological patterns from high-dimensional omics data is a critical factor in understanding the risk mechanism of complex disease. As such, explainable machine learning can offer biological insight in addition to personalized risk scoring. In this process, a feature space of biological pathways will be generated, and the feature space can also be subsequently analyzed using WGCNA (described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>) methods.
Maintained by Shunjie Zhang. Last updated 26 days ago.
5.6 match 2.65 score 9 scripts
veseshan
clinfun:Clinical Trial Design and Data Analysis Functions
Utilities to make your clinical collaborations easier if not fun. It contains functions for designing studies such as Simon 2-stage and group sequential designs and for data analysis such as Jonckheere-Terpstra test and estimating survival quantiles.
Maintained by Venkatraman E. Seshan. Last updated 1 years ago.
1.9 match 5 stars 7.86 score 124 scripts 8 dependents
modeloriented
shapper:Wrapper of Python Library 'shap'
Provides SHAP explanations of machine learning models. In applied machine learning, there is a strong belief that we need to strike a balance between interpretability and accuracy. However, in the field of Interpretable Machine Learning, there are more and more new ideas for explaining black-box models. One of the best known methods for local explanations is SHapley Additive exPlanations (SHAP) introduced by Lundberg, S., et al., (2016) <arXiv:1705.07874>. The SHAP method is used to calculate influences of variables on the particular observation. This method is based on Shapley values, a technique used in game theory. The R package 'shapper' is a port of the Python library 'shap'.
Maintained by Szymon Maksymiuk. Last updated 2 years ago.
2.0 match 58 stars 7.31 score 59 scripts
bioc
variancePartition:Quantify and interpret drivers of variation in multilevel gene expression experiments
Quantify and interpret multiple sources of biological and technical variation in gene expression experiments. Uses a linear mixed model to quantify variation in gene expression attributable to individual, tissue, time point, or technical variables. Includes dream differential expression analysis for repeated measures.
Maintained by Gabriel E. Hoffman. Last updated 2 months ago.
rnaseqgeneexpressiongenesetenrichmentdifferentialexpressionbatcheffectqualitycontrolregressionepigeneticsfunctionalgenomicstranscriptomicsnormalizationpreprocessingmicroarrayimmunooncologysoftware
1.3 match 7 stars 11.69 score 1.1k scripts 3 dependents
gsmolinski
dedupewider:Deduplication Across Multiple Columns
Duplicated data can exist in different rows and columns, and the user may need to treat observations (rows) connected by duplicated data as one observation, e.g. companies can belong to one family (and thus be one company) by sharing some telephone numbers. This package finds connected rows based on data in chosen columns and collapses them into one row.
Maintained by Grzegorz Smoliński. Last updated 3 years ago.
3.3 match 4 stars 4.30 score 7 scripts
bioc
monocle:Clustering, differential expression, and trajectory analysis for single-cell RNA-Seq
Monocle performs differential expression and time-series analysis for single-cell expression experiments. It orders individual cells according to progress through a biological process, without knowing ahead of time which genes define progress through that process. Monocle also performs differential expression analysis, clustering, visualization, and other useful tasks on single cell expression data. It is designed to work with RNA-Seq and qPCR data, but could be used with other types as well.
Maintained by Cole Trapnell. Last updated 5 months ago.
immunooncologysequencingrnaseqgeneexpressiondifferentialexpressioninfrastructuredataimportdatarepresentationvisualizationclusteringmultiplecomparisonqualitycontrolcpp
1.6 match 8.89 score 1.6k scripts 2 dependents
tagteam
pec:Prediction Error Curves for Risk Prediction Models in Survival Analysis
Validation of risk predictions obtained from survival models and competing risk models based on censored data using inverse weighting and cross-validation. Most of the 'pec' functionality has been moved to 'riskRegression'.
Maintained by Thomas A. Gerds. Last updated 2 years ago.
1.9 match 7.42 score 512 scripts 26 dependents
robson-fernandes
bnviewer:Bayesian Networks Interactive Visualization and Explainable Artificial Intelligence
Bayesian networks provide an intuitive framework for probabilistic reasoning and their graphical nature can be interpreted quite clearly. Graph based methods of machine learning are becoming more popular because they offer a richer model of knowledge that can be understood by a human in a graphical format. The 'bnviewer' is an R package that allows the interactive visualization of Bayesian Networks. The aim of this package is to improve the Bayesian Networks visualization over the basic and static views offered by existing packages.
Maintained by Robson Fernandes. Last updated 5 years ago.
bayesian-inferencebayesian-networkbayesian-networksprobabilistic-graphical-models
2.9 match 7 stars 4.86 score 69 scripts 1 dependents
gforge
Gmisc:Descriptive Statistics, Transition Plots, and More
Tools for making the descriptive "Table 1" used in medical articles, a transition plot for showing changes between categories (also known as a Sankey diagram), flow charts by extending the grid package, a method for variable selection based on the SVD, Bézier lines with arrows complementing the ones in the 'grid' package, and more.
Maintained by Max Gordon. Last updated 2 years ago.
1.3 match 50 stars 10.40 score 233 scripts 2 dependents
jcrodriguez1989
chatgpt:Interface to 'ChatGPT' from R
'OpenAI's 'ChatGPT' <https://chat.openai.com/> coding assistant for 'RStudio'. A set of functions and 'RStudio' addins that aim to help the R developer in tedious coding tasks.
Maintained by Juan Cruz Rodriguez. Last updated 3 months ago.
assistantchatgptgpt-3gpt-4hacktoberfestllmnlpopenairstatsesrstudiorstudio-addin
2.0 match 321 stars 6.81 score 50 scripts
modeloriented
iBreakDown:Model Agnostic Instance Level Variable Attributions
Model agnostic tool for decomposition of predictions from black boxes. Supports additive attributions and attributions with interactions. The Break Down Table shows contributions of every variable to a final prediction. The Break Down Plot presents variable contributions in a concise graphical way. This package works for classification and regression models. It is an extension of the 'breakDown' package (Staniak and Biecek 2018) <doi:10.32614/RJ-2018-072>, with new and faster strategies for orderings. It supports interactions in explanations and has interactive visuals (implemented with 'D3.js' library). The methodology behind it is described in the 'iBreakDown' article (Gosiewska and Biecek 2019) <arXiv:1903.11420>. This package is a part of the 'DrWhy.AI' universe (Biecek 2018) <arXiv:1806.08915>.
Maintained by Przemyslaw Biecek. Last updated 1 years ago.
breakdownimlinterpretabilityshapleyxai
1.3 match 84 stars 10.07 score 56 scripts 22 dependents
uscbiostats
partition:Agglomerative Partitioning Framework for Dimension Reduction
A fast and flexible framework for agglomerative partitioning. 'partition' uses an approach called Direct-Measure-Reduce to create new variables that maintain the user-specified minimum level of information. Each reduced variable is also interpretable: the original variables map to one and only one variable in the reduced data set. 'partition' is flexible, as well: how variables are selected to reduce, how information loss is measured, and the way data is reduced can all be customized. 'partition' is based on the Partition framework discussed in Millstein et al. (2020) <doi:10.1093/bioinformatics/btz661>.
Maintained by Malcolm Barrett. Last updated 4 months ago.
data-reductiondimensionality-reductionpartitional-clusteringopenblascpp
1.7 match 36 stars 7.72 score 27 scripts 1 dependents
cwolock
survML:Tools for Flexible Survival Analysis Using Machine Learning
Statistical tools for analyzing time-to-event data using machine learning. Implements survival stacking for conditional survival estimation, standardized survival function estimation for current status data, and methods for algorithm-agnostic variable importance. See Wolock CJ, Gilbert PB, Simon N, and Carone M (2024) <doi:10.1080/10618600.2024.2304070>.
Maintained by Charles Wolock. Last updated 2 months ago.
1.6 match 16 stars 8.06 score 73 scripts 1 dependents
angelospsy
multifear:Multiverse Analyses for Conditioning Data
A suite of functions for performing analyses, based on a multiverse approach, for conditioning data. Specifically, given the appropriate data, the functions are able to perform t-tests, analyses of variance, and mixed models for the provided data and return summary statistics and plots. The functions can also return p-values, confidence intervals, and Bayes factors for all those tests. The methods are described in Lonsdorf, Gerlicher, Klingelhofer-Jens, & Krypotos (2022) <doi:10.1016/j.brat.2022.104072>.
Maintained by Angelos-Miltiadis Krypotos. Last updated 1 years ago.
3.1 match 3 stars 4.18 score 7 scripts
roelandkindt
BiodiversityR:Package for Community Ecology and Suitability Analysis
Graphical User Interface (via the R-Commander) and utility functions (often based on the vegan package) for statistical analysis of biodiversity and ecological communities, including species accumulation curves, diversity indices, Renyi profiles, GLMs for analysis of species abundance and presence-absence, distance matrices, Mantel tests, and cluster, constrained and unconstrained ordination analysis. A book on biodiversity and community ecology analysis is available for free download from the website. In 2012, methods for (ensemble) suitability modelling and mapping were expanded in the package.
Maintained by Roeland Kindt. Last updated 2 months ago.
1.8 match 16 stars 7.42 score 390 scripts 2 dependents
michaelchirico
potools:Tools for Internationalization and Portability in R Packages
Translating messages in R packages is managed using the po top-level directory and the 'gettext' program. This package provides some helper functions for building this support in R packages, e.g. common validation & I/O tasks.
Maintained by Michael Chirico. Last updated 9 months ago.
1.8 match 59 stars 7.20 score 15 scripts
bioc
glmSparseNet:Network Centrality Metrics for Elastic-Net Regularized Models
glmSparseNet is an R-package that generalizes sparse regression models when the features (e.g. genes) have a graph structure (e.g. protein-protein interactions), by including network-based regularizers. glmSparseNet uses the glmnet R-package, by including centrality measures of the network as penalty weights in the regularization. The current version implements regularization based on node degree, i.e. the strength and/or number of its associated edges, either by promoting hubs in the solution or orphan genes in the solution. All the glmnet distribution families are supported, namely "gaussian", "poisson", "binomial", "multinomial", "cox", and "mgaussian".
Maintained by André Veríssimo. Last updated 5 months ago.
softwarestatisticalmethoddimensionreductionregressionclassificationsurvivalnetworkgraphandnetwork
1.7 match 6 stars 7.42 score 41 scripts 1 dependents
mrc-ide
odin2:Next generation odin
Temporary package for rewriting odin.
Maintained by Rich FitzJohn. Last updated 2 months ago.
2.0 match 5 stars 6.32 score 22 scripts
bioc
psichomics:Graphical Interface for Alternative Splicing Quantification, Analysis and Visualisation
Interactive R package with an intuitive Shiny-based graphical interface for alternative splicing quantification and integrative analyses of alternative splicing and gene expression based on The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression project (GTEx), Sequence Read Archive (SRA) and user-provided data. The tool interactively performs survival, dimensionality reduction and median- and variance-based differential splicing and gene expression analyses that benefit from the incorporation of clinical and molecular sample-associated features (such as tumour stage or survival). Interactive visual access to genomic mapping and functional annotation of selected alternative splicing events is also included.
Maintained by Nuno Saraiva-Agostinho. Last updated 5 months ago.
sequencingrnaseqalternativesplicingdifferentialsplicingtranscriptionguiprincipalcomponentsurvivalbiomedicalinformaticstranscriptomicsimmunooncologyvisualizationmultiplecomparisongeneexpressiondifferentialexpressionalternative-splicingbioconductordata-analysesdifferential-gene-expressiondifferential-splicing-analysisgene-expressiongtexrecount2rna-seq-datasplicing-quantificationsratcgavast-toolscpp
1.8 match 36 stars 6.95 score 31 scripts
kjhealy
gssrdoc:Document General Social Survey Variable
The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS see <http://gss.norc.org>.
Maintained by Kieran Healy. Last updated 11 months ago.
5.4 match 2.28 score 38 scripts
teebusch
mifa:Multiple Imputation for Exploratory Factor Analysis
Impute the covariance matrix of incomplete data so that factor analysis can be performed. Imputations are made using multiple imputation by Multivariate Imputation with Chained Equations (MICE) and combined with Rubin's rules. Parametric Fieller confidence intervals and nonparametric bootstrap confidence intervals can be obtained for the variance explained by different numbers of principal components. The method is described in Nassiri et al. (2018) <doi:10.3758/s13428-017-1013-4>.
Maintained by Tobias Busch. Last updated 4 years ago.
4.1 match 2 stars 3.00 score 5 scripts
mdsteiner
EFAtools:Fast and Flexible Implementations of Exploratory Factor Analysis Tools
Provides functions to perform exploratory factor analysis (EFA) procedures and compare their solutions. The goal is to provide state-of-the-art factor retention methods and a high degree of flexibility in the EFA procedures. This way, for example, implementations from R 'psych' and 'SPSS' can be compared. Moreover, functions for Schmid-Leiman transformation and the computation of omegas are provided. To speed up the analyses, some of the iterative procedures, like principal axis factoring (PAF), are implemented in C++.
Maintained by Markus Steiner. Last updated 3 months ago.
1.9 match 10 stars 6.57 score 83 scripts 1 dependents
daniel-jg
BeviMed:Bayesian Evaluation of Variant Involvement in Mendelian Disease
A fast integrative genetic association test for rare diseases based on a model for disease status given allele counts at rare variant sites. Probability of association, mode of inheritance and probability of pathogenicity for individual variants are all inferred in a Bayesian framework - 'A Fast Association Test for Identifying Pathogenic Variants Involved in Rare Diseases', Greene et al 2017 <doi:10.1016/j.ajhg.2017.05.015>.
Maintained by Daniel Greene. Last updated 10 months ago.
3.6 match 1 stars 3.41 score 17 scripts
lygitdata
PaLMr:Interface for 'Google Pathways Language Model 2 (PaLM 2)'
Provides 'Google' 'PaLM 2' <https://developers.generativeai.google/> as a coding and writing assistant designed for 'R' and 'RStudio'. It offers a range of functions, including natural language processing and coding optimization, to assist R developers in simplifying tedious coding tasks and content searching.
Maintained by Li Yuan. Last updated 1 year ago.
aigooglegptmachine-learningnlppalm-apipalm2rstudio
3.5 match 6 stars 3.48 score
jeremygelb
geocmeans:Implementing Methods for Spatial Fuzzy Unsupervised Classification
Provides functions to apply spatial fuzzy unsupervised classification, visualize and interpret results. This method is well suited when the user wants to analyze data with a fuzzy clustering algorithm and to account for the spatial dimension of the dataset. In addition, indexes for estimating the spatial consistency and classification quality are proposed. The methods were originally proposed in the field of brain imagery (see Cai et al. 2007 <doi:10.1016/j.patcog.2006.07.011> and Zhao et al. 2013 <doi:10.1016/j.dsp.2012.09.016>) and recently applied in geography (see Gelb and Apparicio <doi:10.4000/cybergeo.36414>).
Maintained by Jeremy Gelb. Last updated 4 months ago.
clusteringcmeansfuzzy-classification-algorithmsspatial-analysisspatial-fuzzy-cmeansunsupervised-learningcppopenmp
2.0 match 27 stars 6.08 score 90 scripts
niaid
HDStIM:High Dimensional Stimulation Immune Mapping ('HDStIM')
A method for identifying responses to experimental stimulation in mass or flow cytometry that uses high dimensional analysis of measured parameters and can be performed with an end-to-end unsupervised approach. In the context of in vitro stimulation assays where high-parameter cytometry was used to monitor intracellular response markers, using cell populations annotated either through automated clustering or manual gating for a combined set of stimulated and unstimulated samples, 'HDStIM' labels cells as responding or non-responding. The package also provides auxiliary functions to rank intracellular markers based on their contribution to identifying responses and generating diagnostic plots.
Maintained by Rohit Farmer. Last updated 1 year ago.
complexheatmapassaycytofcytometrycytometry-analysis-pipelineflowcytometrystimulation
2.8 match 3 stars 4.41 score 17 scripts
lleisong
itsdm:Isolation Forest-Based Presence-Only Species Distribution Modeling
Collection of R functions to do purely presence-only species distribution modeling with isolation forest (iForest) and its variations such as Extended isolation forest and SCiForest. See the details of these methods in references: Liu, F.T., Ting, K.M. and Zhou, Z.H. (2008) <doi:10.1109/ICDM.2008.17>, Hariri, S., Kind, M.C. and Brunner, R.J. (2019) <doi:10.1109/TKDE.2019.2947676>, Liu, F.T., Ting, K.M. and Zhou, Z.H. (2010) <doi:10.1007/978-3-642-15883-4_18>, Guha, S., Mishra, N., Roy, G. and Schrijvers, O. (2016) <https://proceedings.mlr.press/v48/guha16.html>, Cortes, D. (2021) <arXiv:2110.13402>. Additionally, Shapley values are used to explain model inputs and outputs. See details in references: Shapley, L.S. (1953) <doi:10.1515/9781400881970-018>, Lundberg, S.M. and Lee, S.I. (2017) <https://dl.acm.org/doi/abs/10.5555/3295222.3295230>, Molnar, C. (2020) <ISBN:978-0-244-76852-2>, Štrumbelj, E. and Kononenko, I. (2014) <doi:10.1007/s10115-013-0679-x>. itsdm also provides functions to diagnose variable response, analyze variable importance, draw spatial dependence of variables and examine variable contribution. As utilities, the package includes a few functions to download bioclimatic variables including 'WorldClim' version 2.0 (see Fick, S.E. and Hijmans, R.J. (2017) <doi:10.1002/joc.5086>) and 'CMCC-BioClimInd' (see Noce, S., Caporaso, L. and Santini, M. (2020) <doi:10.1038/s41597-020-00726-5>).
Maintained by Lei Song. Last updated 2 years ago.
isolation-forestoutlier-detectionpresence-onlymodelshapley-valuespecies-distribution-modelling
2.2 match 4 stars 5.59 score 65 scripts
kharchenkolab
conos:Clustering on Network of Samples
Wires together large collections of single-cell RNA-seq datasets, which allows for both the identification of recurrent cell clusters and the propagation of information between datasets in multi-sample or atlas-scale collections. 'Conos' focuses on the uniform mapping of homologous cell types across heterogeneous sample collections. For instance, users could investigate a collection of dozens of peripheral blood samples from cancer patients combined with dozens of controls, which perhaps includes samples of a related tissue such as lymph nodes. This package interacts with data available through the 'conosPanel' package, which is available in a 'drat' repository. To access this data package, see the instructions at <https://github.com/kharchenkolab/conos>. The size of the 'conosPanel' package is approximately 12 MB.
Maintained by Evan Biederstedt. Last updated 1 year ago.
batch-correctionscrna-seqsingle-cell-rna-seqopenblascppopenmp
1.7 match 204 stars 7.32 score 258 scripts
rafajpsantos
bagged.outliertrees:Robust Explainable Outlier Detection Based on OutlierTree
Bagged OutlierTrees is an explainable unsupervised outlier detection method based on an ensemble implementation of the existing OutlierTree procedure (Cortes, 2020). This implementation takes advantage of bootstrap aggregating (bagging) to improve robustness by reducing the possible masking effect and subsequent high variance (similarly to Isolation Forest), hence the name "Bagged OutlierTrees". To learn more about the base procedure OutlierTree (Cortes, 2020), please refer to <arXiv:2001.00636>.
Maintained by Rafael Santos. Last updated 4 years ago.
3.4 match 6 stars 3.48 score 8 scripts
gadenbuie
regexplain:Rstudio Addin to Explain, Test and Build Regular Expressions
A set of RStudio Addins to help interactively test and build regular expressions. Provides a Shiny gadget interface for interactively constructing the regular expression and viewing the results from common string-searching functions. The gadget interface includes a helpful regex syntax reference sheet and a library of common patterns.
Maintained by Garrick Aden-Buie. Last updated 4 years ago.
gadgetregexregex-expressionregular-expressionrstudio-addinshinystringr
2.9 match 486 stars 4.07 score 12 scripts
jbgruber
askgpt:Asking GPT About R Stuff
A chat package connecting to API endpoints by 'OpenAI' (<https://platform.openai.com/>) to answer questions (about R).
Maintained by Johannes Gruber. Last updated 10 months ago.
2.0 match 56 stars 5.68 score 17 scripts
ccy-dev
LongDat:A Tool for 'Covariate'-Sensitive Longitudinal Analysis on 'omics' Data
This tool takes a longitudinal dataset as input and analyzes whether the features change significantly over time (a proxy for treatments), while simultaneously detecting and controlling for 'covariates'. 'LongDat' is able to take in several data types as input, including count, proportion, binary, ordinal and continuous data. The output table contains p values, effect sizes and 'covariates' of each feature, making the downstream analysis easy.
Maintained by Chia-Yu Chen. Last updated 4 months ago.
2.4 match 4 stars 4.60 score 4 scripts
julia-wrobel
registr:Curve Registration for Exponential Family Functional Data
A method for performing joint registration and functional principal component analysis for curves (functional data) that are generated from exponential family distributions. This mainly implements the algorithms described in 'Wrobel et al. (2019)' <doi:10.1111/biom.12963> and further adapts them to potentially incomplete curves where (some) curves are not observed from the beginning and/or until the end of the common domain. Curve registration can be used to better understand patterns in functional data by separating curves into phase and amplitude variability. This software handles both binary and continuous functional data, and is especially applicable in accelerometry and wearable technology.
Maintained by Julia Wrobel. Last updated 3 years ago.
1.7 match 16 stars 6.27 score 29 scripts
joshwlambert
DAISIEprep:Extracts Phylogenetic Island Community Data from Phylogenetic Trees
Extracts colonisation and branching times of island species to be used for analysis in the R package 'DAISIE'. It uses phylogenetic and endemicity data to extract the separate island colonists and store them.
Maintained by Joshua W. Lambert. Last updated 1 month ago.
data-scienceisland-biogeographyphylogenetics
1.6 match 6 stars 6.78 score 24 scripts
modeloriented
triplot:Explaining Correlated Features in Machine Learning Models
Tools for exploring effects of correlated features in predictive models. The predict_triplot() function delivers instance-level explanations that calculate the importance of the groups of explanatory variables. The model_triplot() function delivers data-level explanations. The generic plot function concisely visualises the importance of hierarchical groups of predictors. All of the tools are model agnostic and therefore work for any predictive machine learning model. Find more details in Biecek (2018) <arXiv:1806.08915>.
Maintained by Katarzyna Pekala. Last updated 4 years ago.
explanationsexplanatory-model-analysismachine-learningmodel-visualizationxai
2.9 match 9 stars 3.65 score 7 scripts
laylaparast
SBdecomp:Estimation of the Proportion of SB Explained by Confounders
Uses parametric and nonparametric methods to quantify the proportion of the estimated selection bias (SB) explained by each observed confounder when estimating propensity score weighted treatment effects. Parast, L and Griffin, BA (2020). "Quantifying the Bias due to Observed Individual Confounders in Causal Treatment Effect Estimates". Statistics in Medicine, 39(18): 2447-2476 <doi:10.1002/sim.8549>.
Maintained by Layla Parast. Last updated 3 years ago.
5.3 match 1 stars 2.00 score
mgondan
mathml:Translate R Expressions to 'MathML' and 'LaTeX'/'MathJax'
Translate R expressions to 'MathML' or 'MathJax'/'LaTeX' so that they can be rendered in R markdown documents and shiny apps. This package depends on R package 'rolog', which requires an installation of the 'SWI'-'Prolog' runtime either from 'swi-prolog.org' or from R package 'rswipl'.
Maintained by Matthias Gondan. Last updated 5 hours ago.
1.6 match 4 stars 6.46 score 32 scripts
maarten14c
rice:Radiocarbon Equations
Provides functions for the calibration of radiocarbon dates, as well as options to calculate different radiocarbon realms (C14 age, F14C, pMC, D14C) and estimating the effects of contamination or local reservoir offsets (Reimer and Reimer 2001 <doi:10.1017/S0033822200038339>). The methods follow long-established recommendations such as Stuiver and Polach (1977) <doi:10.1017/S0033822200003672> and Reimer et al. (2004) <doi:10.1017/S0033822200033154>. This package complements the data package 'rintcal'.
Maintained by Maarten Blaauw. Last updated 2 months ago.
1.7 match 1 stars 6.13 score 13 scripts 4 dependents
bioc
StructuralVariantAnnotation:Variant annotations for structural variants
StructuralVariantAnnotation provides a framework for analysis of structural variants within the Bioconductor ecosystem. This package contains useful helper functions for dealing with structural variants in VCF format. The package contains functions for parsing VCFs from a number of popular callers as well as functions for dealing with breakpoints involving two separate genomic loci encoded as GRanges objects.
Maintained by Daniel Cameron. Last updated 5 months ago.
dataimportsequencingannotationgeneticsvariantannotation
1.7 match 6.26 score 102 scripts 2 dependents
g-rho
xgrove:Explanation Groves
Compute surrogate explanation groves for predictive machine learning models and analyze complexity vs. explanatory power of an explanation according to Szepannek, G. and von Holt, B. (2023) <doi:10.1007/s41237-023-00205-2>.
Maintained by Gero Szepannek. Last updated 2 months ago.
3.0 match 3.40 score 1 scripts
philipppro
measures:Performance Measures for Statistical Learning
Provides the largest number of statistical measures in the whole R world. Includes measures of regression, (multiclass) classification and multilabel classification. The measures come mainly from the 'mlr' package and were programmed by several 'mlr' developers.
Maintained by Philipp Probst. Last updated 4 years ago.
2.3 match 1 stars 4.47 score 88 scripts 2 dependents
bioc
peco:A Supervised Approach for **P**r**e**dicting **c**ell Cycle Pr**o**gression using scRNA-seq data
Our approach provides a way to assign continuous cell cycle phase using scRNA-seq data and, consequently, allows identification of cyclic trends in gene expression levels along the cell cycle. This package provides the method and training data, which includes scRNA-seq data collected from 6 individual cell lines of induced pluripotent stem cells (iPSCs), and also continuous cell cycle phase derived from FUCCI fluorescence imaging data.
Maintained by Chiaowen Joyce Hsiao. Last updated 5 months ago.
sequencingrnaseqgeneexpressiontranscriptomicssinglecellsoftwarestatisticalmethodclassificationvisualizationcell-cyclesingle-cell-rna-seq
1.7 match 12 stars 6.09 score 34 scripts
bioc
ramwas:Fast Methylome-Wide Association Study Pipeline for Enrichment Platforms
A complete toolset for methylome-wide association studies (MWAS). It is specifically designed for data from enrichment based methylation assays, but can be applied to other data as well. The analysis pipeline includes seven steps: (1) scanning aligned reads from BAM files, (2) calculation of quality control measures, (3) creation of methylation score (coverage) matrix, (4) principal component analysis for capturing batch effects and detection of outliers, (5) association analysis with respect to phenotypes of interest while correcting for top PCs and known covariates, (6) annotation of significant findings, and (7) multi-marker analysis (methylation risk score) using elastic net. Additionally, RaMWAS includes tools for joint analysis of methylation and genotype data. This work is published in Bioinformatics, Shabalin et al. (2018) <doi:10.1093/bioinformatics/bty069>.
Maintained by Andrey A Shabalin. Last updated 5 months ago.
dnamethylationsequencingqualitycontrolcoveragepreprocessingnormalizationbatcheffectprincipalcomponentdifferentialmethylationvisualization
1.7 match 10 stars 6.08 score 85 scripts
rgcca-factory
RGCCA:Regularized and Sparse Generalized Canonical Correlation Analysis for Multiblock Data
Multi-block data analysis concerns the analysis of several sets of variables (blocks) observed on the same group of individuals. The main aims of the RGCCA package are to study the relationships between blocks and to identify subsets of variables of each block which are active in their relationships with the other blocks. This package allows the user to (i) run R/SGCCA and related methods, (ii) find the optimal parameters for R/SGCCA such as regularization parameters (tau or sparsity), (iii) evaluate the stability of the RGCCA results and their significance, and (iv) build predictive models from the R/SGCCA. (v) Generic print() and plot() functions apply to all these functionalities.
Maintained by Arthur Tenenhaus. Last updated 8 months ago.
1.3 match 12 stars 7.43 score 74 scripts
ncss-tech
SoilTaxonomy:A System of Soil Classification for Making and Interpreting Soil Surveys
Taxonomic dictionaries, formative element lists, and functions related to the maintenance, development and application of U.S. Soil Taxonomy. Data and functionality are based on official U.S. Department of Agriculture sources including the latest edition of the Keys to Soil Taxonomy. Descriptions and metadata are obtained from the National Soil Information System or Soil Survey Geographic databases. Other sources are referenced in the data documentation. Provides tools for understanding and interacting with concepts in the U.S. Soil Taxonomic System. Most of the current utilities are for working with taxonomic concepts at the "higher" taxonomic levels: Order, Suborder, Great Group, and Subgroup.
Maintained by Andrew Brown. Last updated 6 months ago.
great-groupncss-techsoilsoil-surveysoil-taxonomysubgroupsuborderusda
1.8 match 15 stars 5.65 score
jiscah
sequoia:Pedigree Inference from SNPs
Multi-generational pedigree inference from incomplete data on hundreds of SNPs, including parentage assignment and sibship clustering. See Huisman (2017) (<DOI:10.1111/1755-0998.12665>) for more information.
Maintained by Jisca Huisman. Last updated 9 months ago.
pedigreepedigree-reconstructionpedigreessequoiasnpsnp-datafortran
1.3 match 26 stars 7.40 score 79 scripts
plantedml
glex:Global Explanations for Tree-Based Models
Global explanations for tree-based models by decomposing regression or classification functions into the sum of main components and interaction components of arbitrary order. Calculates SHAP values and q-interaction SHAP for all values of q for tree-based models such as xgboost.
Maintained by Marvin N. Wright. Last updated 3 days ago.
2.0 match 5 stars 4.75 score 15 scripts
mathurlabstanford
multibiasmeta:Sensitivity Analysis for Multiple Biases in Meta-Analyses
Meta-analyses can be compromised by studies' internal biases (e.g., confounding in nonrandomized studies) as well as by publication bias. This package conducts sensitivity analyses for the joint effects of these biases (per Mathur (2022) <doi:10.31219/osf.io/u7vcb>). These sensitivity analyses address two questions: (1) For a given severity of internal bias across studies and of publication bias, how much could the results change?; and (2) For a given severity of publication bias, how severe would internal bias have to be, hypothetically, to attenuate the results to the null or by a given amount?
Maintained by Peter Solymos. Last updated 2 years ago.
2.4 match 4.00 score 6 scripts
project-gen3sis
gen3sis:General Engine for Eco-Evolutionary Simulations
Contains an engine for spatially-explicit eco-evolutionary mechanistic models with a modular implementation and several support functions. It allows exploring the consequences of ecological and macroevolutionary processes across realistic or theoretical spatio-temporal landscapes on biodiversity patterns as a general term. Reference: Oskar Hagen, Benjamin Flueck, Fabian Fopp, Juliano S. Cabral, Florian Hartig, Mikael Pontarp, Thiago F. Rangel, Loic Pellissier (2021) "gen3sis: A general engine for eco-evolutionary simulations of the processes that shape Earth's biodiversity" <doi:10.1371/journal.pbio.3001340>.
Maintained by Oskar Hagen. Last updated 1 year ago.
biodiversityecologyevolutionmechanisticmodelmodelingsimulationcpp
1.3 match 29 stars 7.56 score 69 scripts
jinli22
spm:Spatial Predictive Modeling
Introduction to some novel, accurate hybrid methods that combine geostatistical and machine learning methods for spatial predictive modelling. It contains two commonly used geostatistical methods, two machine learning methods, four hybrid methods and two averaging methods. For each method, two functions are provided. One function is for assessing the predictive errors and accuracy of the method based on cross-validation. The other one is for generating spatial predictions using the method. For details please see: Li, J., Potter, A., Huang, Z., Daniell, J. J. and Heap, A. (2010) <https://www.ga.gov.au/metadata-gateway/metadata/record/gcat_71407>; Li, J., Heap, A. D., Potter, A., Huang, Z. and Daniell, J. (2011) <doi:10.1016/j.csr.2011.05.015>; Li, J., Heap, A. D., Potter, A. and Daniell, J. (2011) <doi:10.1016/j.envsoft.2011.07.004>; Li, J., Potter, A., Huang, Z. and Heap, A. (2012) <https://www.ga.gov.au/metadata-gateway/metadata/record/74030>.
Maintained by Jin Li. Last updated 3 years ago.
1.7 match 3 stars 5.46 score 107 scripts 3 dependents
dgrun
RaceID:Identification of Cell Types, Inference of Lineage Trees, and Prediction of Noise Dynamics from Single-Cell RNA-Seq Data
Application of 'RaceID' allows inference of cell types and prediction of lineage trees by the 'StemID2' algorithm (Herman, J.S., Sagar, Grun D. (2018) <DOI:10.1038/nmeth.4662>). 'VarID2' is part of this package and allows quantification of biological gene expression noise at single-cell resolution (Rosales-Alvarez, R.E., Rettkowski, J., Herman, J.S., Dumbovic, G., Cabezas-Wallscheid, N., Grun, D. (2023) <DOI:10.1186/s13059-023-02974-1>).
Maintained by Dominic Grün. Last updated 4 months ago.
1.9 match 4.74 score 110 scripts
cmlmagneville
mFD:Compute and Illustrate the Multiple Facets of Functional Diversity
Computes functional traits-based distances between pairs of species for species gathered in assemblages, allowing several functional spaces to be built. The package computes functional diversity indices assessing the distribution of species (and of their dominance) in a given functional space for each assemblage and the overlap between assemblages in a given functional space, see: Chao et al. (2018) <doi:10.1002/ecm.1343>, Maire et al. (2015) <doi:10.1111/geb.12299>, Mouillot et al. (2013) <doi:10.1016/j.tree.2012.10.004>, Mouillot et al. (2014) <doi:10.1073/pnas.1317625111>, Ricotta and Szeidl (2009) <doi:10.1016/j.tpb.2009.10.001>. Graphical outputs are included. Visit the 'mFD' website for more information, documentation and examples.
Maintained by Camille Magneville. Last updated 3 months ago.
1.2 match 26 stars 7.35 score 61 scripts
bioc
pipeComp:pipeComp pipeline benchmarking framework
A simple framework to facilitate the comparison of pipelines involving various steps and parameters. The `pipelineDefinition` class represents pipelines as, minimally, a set of functions consecutively executed on the output of the previous one, and optionally accompanied by step-wise evaluation and aggregation functions. Given such an object, a set of alternative parameters/methods, and benchmark datasets, the `runPipeline` function then proceeds through all combinations of arguments, avoiding recomputing the same step twice and compiling evaluations on the fly to avoid storing potentially large intermediate data.
Maintained by Pierre-Luc Germain. Last updated 5 months ago.
geneexpressiontranscriptomicsclusteringdatarepresentationbenchmarkbioconductorpipeline-benchmarkingpipelinessingle-cell-rna-seq
1.3 match 41 stars 7.02 score 43 scripts
bioc
corral:Correspondence Analysis for Single Cell Data
Correspondence analysis (CA) is a matrix factorization method, and is similar to principal components analysis (PCA). Whereas PCA is designed for application to continuous, approximately normally distributed data, CA is appropriate for non-negative, count-based data that are in the same additive scale. The corral package implements CA for dimensionality reduction of a single matrix of single-cell data, as well as a multi-table adaptation of CA that leverages data-optimized scaling to align data generated from different sequencing platforms by projecting into a shared latent space. corral utilizes sparse matrices and a fast implementation of SVD, and can be called directly on Bioconductor objects (e.g., SingleCellExperiment) for easy pipeline integration. The package also includes additional options, including variations of CA to address overdispersion in count data (e.g., Freeman-Tukey chi-squared residual), as well as the option to apply CA-style processing to continuous data (e.g., proteomic TOF intensities) with the Hellinger distance adaptation of CA.
Maintained by Lauren Hsu. Last updated 5 months ago.
batcheffectdimensionreductiongeneexpressionpreprocessingprincipalcomponentsequencingsinglecellsoftwarevisualization
1.9 match 4.64 score 22 scripts
bioc
SpliceWiz:interactive analysis and visualization of alternative splicing in R
The analysis and visualization of alternative splicing (AS) events from RNA sequencing data remains challenging. SpliceWiz is a user-friendly and performance-optimized R package for AS analysis: it processes alignment BAM files to quantify read counts across splice junctions, performs IRFinder-based intron retention quantitation, and supports novel splicing event identification. We introduce a novel visualization for AS using normalized coverage, thereby allowing visualization of differential AS across conditions. SpliceWiz features a shiny-based GUI facilitating interactive data exploration of results including gene ontology enrichment. It is performance optimized with multi-threaded processing of BAM files and a new COV file format for fast recall of sequencing coverage. Overall, SpliceWiz streamlines AS analysis, enabling reliable identification of functionally relevant AS events for further characterization.
Maintained by Alex Chit Hei Wong. Last updated 4 days ago.
softwaretranscriptomicsrnaseqalternativesplicingcoveragedifferentialsplicingdifferentialexpressionguisequencingcppopenmp
1.3 match 16 stars 6.41 score 8 scripts
soumyaray
air:AI Assistant to Write and Understand R Code
An R console utility that lets you ask R related questions to the 'OpenAI' large language model. It can answer 'how-to()' questions by providing code, and 'whatis()' questions by explaining what given code does. You must provision your own key for the 'OpenAI' API <https://platform.openai.com/docs/api-reference>.
Maintained by Soumya Ray. Last updated 1 year ago.
2.1 match 14 stars 3.92 score 12 scripts
bioc
weitrix:Tools for matrices with precision weights, test and explore weighted or sparse data
Data type and tools for working with matrices having precision weights and missing data. This package provides a common representation and tools that can be used with many types of high-throughput data. The meaning of the weights is compatible with usage in the base R function "lm" and the package "limma". Calibrate weights to account for known predictors of precision. Find rows with excess variability. Perform differential testing and find rows with the largest confident differences. Find PCA-like components of variation even with many missing values, rotated so that individual components may be meaningfully interpreted. DelayedArray matrices and BiocParallel are supported.
Maintained by Paul Harrison. Last updated 5 months ago.
softwaredatarepresentationdimensionreductiongeneexpressiontranscriptomicsrnaseqsinglecellregression
1.7 match 4.70 score 8 scripts
oconnellmj
r.jive:Perform JIVE Decomposition for Multi-Source Data
Performs the Joint and Individual Variation Explained (JIVE) decomposition on a list of data sets when the data share a dimension, returning low-rank matrices that capture the joint and individual structure of the data [O'Connell, MJ and Lock, EF (2016) <doi:10.1093/bioinformatics/btw324>]. It provides two methods of rank selection when the rank is unknown, a permutation test and a Bayesian Information Criterion (BIC) selection algorithm. Also included in the package are three plotting functions for visualizing the variance attributed to each data source: a bar plot that shows the percentages of the variability attributable to joint and individual structure, a heatmap that shows the structure of the variability, and principal component plots.
Maintained by Michael J. O'Connell. Last updated 4 years ago.
2.5 match 2 stars 3.18 score 75 scripts
lrocconi
mlmhelpr:Multilevel/Mixed Model Helper Functions
A collection of miscellaneous helper functions for running multilevel/mixed models in 'lme4'. This package aims to provide functions for common tasks when estimating multilevel models, such as computing the intraclass correlation and design effect, centering variables, estimating the proportion of variance explained at each level, pseudo-R squared, random intercept and slope reliabilities, tests for homogeneity of variance at level-1, and cluster robust and bootstrap standard errors. The tests and statistics reported in the package are from Raudenbush & Bryk (2002, ISBN:9780761919049), Hox et al. (2018, ISBN:9781138121362), and Snijders & Bosker (2012, ISBN:9781849202015).
Maintained by Louis Rocconi. Last updated 3 months ago.
2.5 match 1 stars 3.00 score 10 scripts
bioc
trigger:Transcriptional Regulatory Inference from Genetics of Gene ExpRession
This R package provides tools for the statistical analysis of integrative genomic data that involve some combination of: genotypes, high-dimensional intermediate traits (e.g., gene expression, protein abundance), and higher-order traits (phenotypes). The package includes functions to: (1) construct global linkage maps between genetic markers and gene expression; (2) analyze multiple-locus linkage (epistasis) for gene expression; (3) quantify the proportion of genome-wide variation explained by each locus and identify eQTL hotspots; (4) estimate pair-wise causal gene regulatory probabilities and construct gene regulatory networks; and (5) identify causal genes for a quantitative trait of interest.
Maintained by John D. Storey. Last updated 5 months ago.
geneexpression snp geneticvariability microarray genetics
2.2 match 3.30 score 3 scripts
bioc
STATegRa:Classes and methods for multi-omics data integration
Classes and tools for multi-omics data integration.
Maintained by David Gomez-Cabrero. Last updated 5 months ago.
software statisticalmethod clustering dimensionreduction principalcomponent
1.8 match 4.15 score 3 scripts
ranjitstat
EEML:Ensemble Explainable Machine Learning Models
Introduces a novel ensemble-based explainable machine learning model built on the Model Confidence Set (MCS) and a two-stage Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) algorithm. The model combines the predictive capabilities of different machine-learning models and integrates the interpretability of explainability methods. The package has been developed using the algorithm of Paul et al. (2023) <doi:10.1007/s40009-023-01218-x> and Yeasin and Paul (2024) <doi:10.1007/s11227-023-05542-3>.
Maintained by Dr. Ranjit Kumar Paul. Last updated 8 months ago.
5.6 match 1.30 score
coffeemuggler
EMMAgeo:End-Member Modelling of Grain-Size Data
End-member modelling analysis of grain-size data is an approach to unmix a data set's underlying distributions and their contribution to the data set. EMMAgeo provides deterministic and robust protocols for that purpose.
Maintained by Michael Dietze. Last updated 5 years ago.
1.8 match 10 stars 4.13 score 27 scripts
sunsmiling
PPtreeregViz:Projection Pursuit Regression Tree Visualization
A tool for exploring 'PPTreereg' (Projection Pursuit TREE of REGression). It uses various projection pursuit indexes and 'XAI' (eXplainable Artificial Intelligence) methods to help understand the model by finding connections between the input variables and the predicted values of the model. The 'KernelSHAP' (Aas, Jullum and Løland (2019) <arXiv:1903.10464>) algorithm was modified to fit 'PPTreereg', and some code was adapted from the 'shapr' package (Sellereite, Nikolai, and Martin Jullum (2020) <doi:10.21105/joss.02027>). The implemented methods help to explore the model at the single instance level as well as at the whole dataset level. Users can compare it with other machine learning models through the 'DALEX' package for 'R'.
Maintained by HyunSun Cho. Last updated 1 year ago.
2.3 match 2 stars 3.00 score 3 scripts
benjaminschlegel
glm.predict:Predicted Values and Discrete Changes for Regression Models
Functions to calculate predicted values and the difference between two cases, with confidence intervals, for lm() [linear model], glm() [generalized linear model], glm.nb() [negative binomial model], polr() [ordinal logistic model], vglm() [generalized ordinal logistic model], multinom() [multinomial model], tobit() [tobit model], svyglm() [survey-weighted generalized linear models] and lmer() [linear multilevel models] using Monte Carlo simulations or bootstrap. Reference: Bennet A. Zelner (2009) <doi:10.1002/smj.783>.
Maintained by Benjamin E. Schlegel. Last updated 7 months ago.
1.3 match 1 stars 5.10 score 55 scripts
nzilbb
nzilbb.vowels:Vowel Covariation Tools
Tools to support research on vowel covariation. Methods are provided to support Principal Component Analysis workflows (as in Brand et al. (2021) <doi:10.1016/j.wocn.2021.101096> and Wilson Black et al. (2023) <doi:10.1515/lingvan-2022-0086>).
Maintained by Joshua Wilson Black. Last updated 3 months ago.
1.8 match 3.88 score 15 scripts
cran
ciu:Contextual Importance and Utility
Implementation of the Contextual Importance and Utility (CIU) concepts for Explainable AI (XAI). A recent description of CIU can be found in e.g. Främling (2020) <arXiv:2009.13996>.
Maintained by Kary Främling. Last updated 2 years ago.
6.8 match 1.00 score
l-ramirez-lopez
resemble:Memory-Based Learning in Spectral Chemometrics
Functions for dissimilarity analysis and memory-based learning (MBL, a.k.a. local modeling) in complex spectral data sets. Most of these functions are based on the methods presented in Ramirez-Lopez et al. (2013) <doi:10.1016/j.geoderma.2012.12.014>.
Maintained by Leonardo Ramirez-Lopez. Last updated 2 years ago.
chemoinformatics chemometrics infrared-spectroscopy lazy-learning local-regression machine-learning memory-based-learning nir pedometrics soil-spectroscopy spectral-data spectral-library spectroscopy openblas cpp openmp
1.1 match 20 stars 5.91 score 27 scripts