R-universe search: explainability

modeloriented

DALEX:moDel Agnostic Language for Exploration and eXplanation

Any unverified black-box model is the path to failure. Opaqueness leads to distrust. Distrust leads to ignoration. Ignoration leads to rejection. The DALEX package X-rays any model and helps to explore and explain its behaviour. Machine Learning (ML) models are widely used and have various applications in classification or regression. Models created with boosting, bagging, stacking or similar techniques are often used due to their high performance. But such black-box models usually lack direct interpretability. The DALEX package contains various methods that help to understand the link between input variables and model output. The implemented methods help to explore the model at the level of a single instance as well as at the level of the whole dataset. All model explainers are model agnostic and can be compared across different models. The DALEX package is the cornerstone of the 'DrWhy.AI' universe of packages for visual model exploration. Find more details in (Biecek 2018) <https://jmlr.org/papers/v19/18-416.html>.

Maintained by Przemyslaw Biecek. Last updated 1 month ago.

black-box dalex data-science explainable-ai explainable-artificial-intelligence explainable-ml explanations explanatory-model-analysis fairness iml interpretability interpretable-machine-learning machine-learning model-visualization predictive-modeling responsible-ai responsible-ml xai

38.0 match 1.4k stars 13.40 score 876 scripts 21 dependents

persimune

explainer:Machine Learning Model Explainer

It enables detailed interpretation of complex classification and regression models through Shapley analysis including data-driven characterization of subgroups of individuals. Furthermore, it facilitates multi-measure model evaluation, model fairness, and decision curve analysis. Additionally, it offers enhanced visualizations with interactive elements.

Maintained by Ramtin Zargari Marandi. Last updated 6 months ago.

ai classification clinical-research explainability explainable-ai interpretability machine-learning regression shap statistics

75.7 match 13 stars 5.37 score 12 scripts

carloshellin

LearningRlab:Statistical Learning Functions

Aids in learning statistical functions by showing the result of the calculation done with each function and how it is obtained, that is, which equation and variables are used. Detailed explanations and interactive exercises are also included for all these equations and their related variables. All these characteristics allow the package user to improve their learning of statistics basics by using them.

Maintained by Carlos Javier Hellin Asensio. Last updated 2 years ago.

99.5 match 3.64 score 44 scripts

norskregnesentral

shapr:Prediction Explanation with Dependence-Aware Shapley Values

Complex machine learning models are often hard to interpret. However, in many situations it is crucial to understand and explain why a model made a specific prediction. Among prediction explanation frameworks, Shapley values are the only method with a solid theoretical foundation. Previously known methods for estimating the Shapley values do, however, assume feature independence. This package implements methods which account for any feature dependence, and thereby produce more accurate estimates of the true Shapley values. An accompanying 'Python' wrapper ('shaprpy') is available through the GitHub repository.

Maintained by Martin Jullum. Last updated 1 month ago.

explainable-ai explainable-ml rcpp rcpparmadillo shapley openblas cpp openmp

30.1 match 153 stars 10.62 score 175 scripts 1 dependent
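To make the idea concrete, the following Python sketch computes exact Shapley values by enumerating every feature coalition. It uses the marginal (interventional) value function, which averages predictions over background rows and therefore assumes feature independence, i.e. exactly the simplification that shapr is designed to avoid. The function names `value` and `exact_shapley` are hypothetical and are not part of the shapr API.

```python
from itertools import combinations
from math import factorial


def value(model, x, background, coalition):
    """v(S): mean prediction when features in S are fixed to x and the
    remaining features come from background rows (assumes independence)."""
    total = 0.0
    for row in background:
        z = [x[j] if j in coalition else row[j] for j in range(len(x))]
        total += model(z)
    return total / len(background)


def exact_shapley(model, x, background):
    """Exact Shapley values by enumerating all coalitions (exponential cost)."""
    p = len(x)
    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p!
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(model, x, background, s | {j})
                                    - value(model, x, background, s))
    return phi
```

For a linear model, each value reduces to the coefficient times the feature's deviation from its background mean, and the values always sum to the prediction minus the mean background prediction.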

modeloriented

survex:Explainable Machine Learning in Survival Analysis

Survival analysis models are commonly used in medicine and other areas. Many of them are too complex to be interpreted by humans. Exploration and explanation are needed, but standard methods do not give a broad enough picture. 'survex' provides easy-to-apply methods for explaining survival models, both complex black-boxes and simpler statistical models. They include methods specific to survival analysis such as SurvSHAP(t) introduced in Krzyzinski et al., (2023) <doi:10.1016/j.knosys.2022.110234>, SurvLIME described in Kovalev et al., (2020) <doi:10.1016/j.knosys.2020.106164> as well as extensions of existing ones described in Biecek et al., (2021) <doi:10.1201/9780429027192>.

Maintained by Mikołaj Spytek. Last updated 9 months ago.

biostatistics brier-scores censored-data cox-model cox-regression explainable-ai explainable-machine-learning explainable-ml explanatory-model-analysis interpretable-machine-learning interpretable-ml machine-learning probabilistic-machine-learning shap survival-analysis time-to-event variable-importance xai

37.6 match 110 stars 8.40 score 114 scripts

svkucheryavski

mdatools:Multivariate Data Analysis for Chemometrics

Projection based methods for preprocessing, exploring and analysis of multivariate data used in chemometrics. S. Kucheryavskiy (2020) <doi:10.1016/j.chemolab.2020.103937>.

Maintained by Sergey Kucheryavskiy. Last updated 8 months ago.

30.7 match 35 stars 7.37 score 220 scripts 1 dependent

modeloriented

modelStudio:Interactive Studio for Explanatory Model Analysis

Automate the explanatory analysis of machine learning predictive models. Generate advanced interactive model explanations in the form of a serverless HTML site with only one line of code. This tool is model-agnostic, therefore compatible with most of the black-box predictive models and frameworks. The main function computes various (instance and model-level) explanations and produces a customisable dashboard, which consists of multiple panels for plots with their short descriptions. It is possible to easily save the dashboard and share it with others. 'modelStudio' facilitates the process of Interactive Explanatory Model Analysis introduced in Baniecki et al. (2023) <doi:10.1007/s10618-023-00924-w>.

Maintained by Hubert Baniecki. Last updated 2 years ago.

ai explainable explainable-ai explainable-machine-learning explanatory-model-analysis human iml interactive interactivity interpretability interpretable interpretable-machine-learning learning machine model model-visualization visualization xai

24.2 match 330 stars 7.92 score 56 scripts

modeloriented

treeshap:Compute SHAP Values for Your Tree-Based Models Using the 'TreeSHAP' Algorithm

An efficient implementation of the 'TreeSHAP' algorithm introduced by Lundberg et al., (2020) <doi:10.1038/s42256-019-0138-9>. It is capable of calculating SHAP (SHapley Additive exPlanations) values for tree-based models in polynomial time. Currently supported models include 'gbm', 'randomForest', 'ranger', 'xgboost', 'lightgbm'.

Maintained by Mateusz Krzyzinski. Last updated 1 year ago.

explainability explainable-ai explainable-artificial-intelligence explanatory-model-analysis iml interpretability interpretable-machine-learning machine-learning responsible-ml shap shapley-value xai cpp

24.2 match 82 stars 6.69 score 170 scripts

bgreenwell

fastshap:Fast Approximate Shapley Values

Computes fast (relative to other implementations) approximate Shapley values for any supervised learning model. Shapley values help to explain the predictions from any black box model using ideas from game theory; see Štrumbelj and Kononenko (2014) <doi:10.1007/s10115-013-0679-x> for details.

Maintained by Brandon Greenwell. Last updated 1 year ago.

explainable-ai explainable-ml interpretable-machine-learning shapley shapley-values variable-importance xai cpp

18.8 match 118 stars 8.56 score 155 scripts 2 dependents
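The Monte Carlo idea from the Štrumbelj and Kononenko paper cited above can be sketched in a few lines of Python: features that precede feature j in a random ordering take their values from x, the rest come from a randomly drawn background row, and the prediction difference with and without x's value of j is averaged. This is an illustrative sketch only; fastshap's own implementation is vectorised and its interface differs, and the function name `approx_shapley` is hypothetical.

```python
import random


def approx_shapley(model, x, background, j, n_samples=1000, seed=0):
    """Monte Carlo permutation estimate of the Shapley value of feature j."""
    rng = random.Random(seed)
    p = len(x)
    total = 0.0
    for _ in range(n_samples):
        w = rng.choice(background)      # random background (reference) row
        order = list(range(p))
        rng.shuffle(order)              # random feature ordering
        from_x = set(order[:order.index(j)])  # features preceding j
        # x_plus uses x's value for j; x_minus uses the background value
        x_plus = [x[k] if (k in from_x or k == j) else w[k] for k in range(p)]
        x_minus = [x[k] if k in from_x else w[k] for k in range(p)]
        total += model(x_plus) - model(x_minus)
    return total / n_samples
```

The estimate converges to the exact (independence-based) Shapley value as `n_samples` grows, trading accuracy for speed.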

bgreenwell

ebm:Explainable Boosting Machines

An interface to the 'Python' 'InterpretML' framework for fitting explainable boosting machines (EBMs); see Nori et al. (2019) <doi:10.48550/arXiv.1909.09223> for details. EBMs are a modern type of generalized additive model that use tree-based, cyclic gradient boosting with automatic interaction detection. They are often as accurate as state-of-the-art black-box models while remaining completely interpretable.

Maintained by Brandon M. Greenwell. Last updated 11 days ago.

ai blackbox explainable-ai explainable-machine-learning explainable-ml glassbox iml interpretability interpretability-and-explainability interpretable interpretable-ai interpretable-machine-learning interpretable-ml interpretable-models machine-learning xai

34.9 match 1 star 4.60 score

rolkra

explore:Simplifies Exploratory Data Analysis

Interactive data exploration with one line of code, automated reporting or use an easy to remember set of tidy functions for low code exploratory data analysis.

Maintained by Roland Krasser. Last updated 3 months ago.

data-exploration data-visualisation decision-trees eda rmarkdown shiny tidy

13.9 match 228 stars 11.43 score 221 scripts 1 dependent

tidyverse

dplyr:A Grammar of Data Manipulation

A fast, consistent tool for working with data-frame-like objects, both in memory and out of memory.

Maintained by Hadley Wickham. Last updated 13 days ago.

data-manipulation grammar cpp

5.3 match 4.8k stars 24.68 score 659k scripts 7.8k dependents

modeloriented

arenar:Arena for the Exploration and Comparison of any ML Models

Generates data for challenging machine learning models in 'Arena' <https://arena.drwhy.ai>, an interactive web application. You can start a server that generates XAI (Explainable Artificial Intelligence) plots on demand, or precalculate the data and auto-upload the data file alongside a shareable 'Arena' URL.

Maintained by Piotr Piątyszek. Last updated 4 years ago.

axplainable-artificial-intelligence ema explainability explanatory-model-analysis iml interactive-xai interpretability xai

20.9 match 31 stars 5.94 score 14 scripts

bioc

MOFA2:Multi-Omics Factor Analysis v2

The MOFA2 package contains a collection of tools for training and analysing multi-omic factor analysis (MOFA) models. MOFA is a probabilistic factor model that aims to identify principal axes of variation from data sets that can comprise multiple omic layers and/or groups of samples. Additional time or space information on the samples can be incorporated using the MEFISTO framework, which is part of MOFA2. Downstream analysis functions for inspecting the molecular features underlying each factor, visualisation, imputation, etc. are available.

Maintained by Ricard Argelaguet. Last updated 5 months ago.

dimensionreduction bayesian visualization factor-analysis mofa multi-omics

12.0 match 319 stars 10.02 score 502 scripts

modeloriented

vivo:Variable Importance via Oscillations

Provides an easy-to-calculate local variable importance measure based on Ceteris Paribus profiles and a global variable importance measure based on Partial Dependence Profiles.

Maintained by Anna Kozak. Last updated 4 years ago.

explainable-ai explainable-artificial-intelligence explainable-ml iml interpretable-machine-learning variable-importance xai

21.7 match 14 stars 5.45 score 7 scripts
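As a rough illustration of the oscillation idea, the Python sketch below computes a ceteris paribus profile (vary one feature of a single observation over a grid, keep the others fixed) and summarises it as the mean absolute deviation from the observation's own prediction. This is a simplified, unweighted version written for illustration; vivo's actual measure may weight grid points differently, and both function names are hypothetical.

```python
def ceteris_paribus(model, x, j, grid):
    """Predictions when only feature j of observation x is varied over grid."""
    profile = []
    for v in grid:
        z = list(x)
        z[j] = v
        profile.append(model(z))
    return profile


def oscillation(model, x, j, grid):
    """Mean absolute deviation of the CP profile from the prediction at x."""
    base = model(x)
    profile = ceteris_paribus(model, x, j, grid)
    return sum(abs(p - base) for p in profile) / len(profile)
```

A feature the model ignores yields a flat profile and therefore an oscillation of zero, while features that move the prediction a lot around x get large values.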

modeloriented

fairmodels:Flexible Tool for Bias Detection, Visualization, and Mitigation

Measure fairness metrics in one place for many models. Check how big a model's bias is towards different races, sexes, nationalities, etc. Use measures such as Statistical Parity and Equal Odds to detect discrimination against unprivileged groups. Visualize the bias using a heatmap, radar plot, biplot, bar chart (and more!). Various pre-processing and post-processing bias mitigation algorithms are implemented. The package also supports calculating fairness metrics for regression models. Find more details in (Wiśniewski, Biecek (2021)) <arXiv:2104.00507>.

Maintained by Jakub Wiśniewski. Last updated 1 month ago.

explain-classifiers explainable-ml fairness fairness-comparison fairness-ml model-evaluation

15.0 match 86 stars 7.72 score 51 scripts 1 dependent
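Statistical parity compares positive-prediction rates between groups. The Python sketch below computes the ratio form (unprivileged rate divided by privileged rate) and checks it against an acceptance range from epsilon to 1/epsilon with epsilon = 0.8, the "four-fifths" convention that, as far as we understand, fairmodels' fairness check uses by default. The function names are hypothetical and not part of the fairmodels API.

```python
def positive_rate(preds, groups, group):
    """Share of positive (1) predictions within one group."""
    selected = [p for p, g in zip(preds, groups) if g == group]
    return sum(selected) / len(selected)


def statistical_parity_ratio(preds, groups, privileged, unprivileged):
    """Ratio of positive rates: unprivileged / privileged."""
    return (positive_rate(preds, groups, unprivileged)
            / positive_rate(preds, groups, privileged))


def passes_four_fifths(ratio, epsilon=0.8):
    """Accept the ratio if it lies within [epsilon, 1 / epsilon]."""
    return epsilon <= ratio <= 1 / epsilon
```

For example, a positive rate of 0.75 for the privileged group and 0.25 for the unprivileged group gives a ratio of 1/3, well outside the accepted range.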

bioc

BatchQC:Batch Effects Quality Control Software

Sequencing and microarray samples often are collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in the downstream analysis. BatchQC is a software tool that streamlines batch preprocessing and evaluation by providing interactive diagnostics, visualizations, and statistical analyses to explore the extent to which batch variation impacts the data. BatchQC diagnostics help determine whether batch adjustment needs to be done, and how correction should be applied before proceeding with a downstream analysis. Moreover, BatchQC interactively applies multiple common batch effect approaches to the data and the user can quickly see the benefits of each method. BatchQC is developed as a Shiny App. The output is organized into multiple tabs and each tab features an important part of the batch effect analysis and visualization of the data. The BatchQC interface has the following analysis groups: Summary, Differential Expression, Median Correlations, Heatmaps, Circular Dendrogram, PCA Analysis, Shape, ComBat and SVA.

Maintained by Jessica McClintock. Last updated 5 months ago.

batcheffect graphandnetwork microarray normalization principalcomponent sequencing software visualization qualitycontrol rnaseq preprocessing differentialexpression immunooncology

11.9 match 7 stars 8.99 score 54 scripts

modeloriented

DALEXtra:Extension for 'DALEX' Package

Provides wrappers for various machine learning models. In applied machine learning, there is a strong belief that we need to strike a balance between interpretability and accuracy. However, in the field of interpretable machine learning, there are more and more new ideas for explaining black-box models that are implemented in 'R'. 'DALEXtra' creates 'DALEX' Biecek (2018) <arXiv:1806.08915> explainers for many types of models, including those created using the 'python' 'scikit-learn' and 'keras' libraries and the 'java' 'h2o' library. An important part of the package is Champion-Challenger analysis and an innovative approach to model performance across subsets of test data, presented in the Funnel Plot.

Maintained by Szymon Maksymiuk. Last updated 2 years ago.

extension-for-dalex-package

12.1 match 67 stars 7.71 score 400 scripts 1 dependent

modeloriented

ingredients:Effects and Importances of Model Ingredients

Collection of tools for assessment of feature importance and feature effects. Key functions are: feature_importance() for assessment of global level feature importance, ceteris_paribus() for calculation of the what-if plots, partial_dependence() for partial dependence plots, conditional_dependence() for conditional dependence plots, accumulated_dependence() for accumulated local effects plots, aggregate_profiles() and cluster_profiles() for aggregation of ceteris paribus profiles, generic print() and plot() for better usability of selected explainers, generic plotD3() for interactive, D3 based explanations, and generic describe() for explanations in natural language. The package 'ingredients' is a part of the 'DrWhy.AI' universe (Biecek 2018) <arXiv:1806.08915>.

Maintained by Przemyslaw Biecek. Last updated 2 years ago.

8.7 match 37 stars 10.38 score 83 scripts 22 dependents
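The idea behind permutation-based feature importance, as listed for feature_importance() above, can be sketched in Python: shuffle one feature column, re-predict, and record how much the loss (here RMSE) increases relative to the unshuffled baseline. This sketch shows only the difference form of the measure and is not the ingredients implementation; the function names are hypothetical.

```python
import random


def rmse(y, yhat):
    return (sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)) ** 0.5


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average increase in RMSE after permuting each feature column.
    X is a list of rows, each row a list of feature values."""
    rng = random.Random(seed)
    baseline = rmse(y, [model(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increase += rmse(y, [model(row) for row in X_perm]) - baseline
        importances.append(increase / n_repeats)
    return importances
```

A feature whose permutation leaves the loss unchanged gets an importance of zero.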

thomasp85

lime:Local Interpretable Model-Agnostic Explanations

When building complex models, it is often difficult to explain why the model should be trusted. While global measures such as accuracy are useful, they cannot be used for explaining why a model made a specific prediction. 'lime' (a port of the 'lime' 'Python' package) is a method for explaining the outcome of black box models by fitting a local model around the point in question and perturbations of this point. The approach is described in more detail in the article by Ribeiro et al. (2016) <arXiv:1602.04938>.

Maintained by Emil Hvitfeldt. Last updated 3 years ago.

caret model-checking model-evaluation modeling cpp

7.6 match 485 stars 11.07 score 732 scripts 1 dependent
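The core of the local-surrogate idea can be sketched in Python: sample Gaussian perturbations around the point in question, weight each sample by an exponential kernel on its distance to that point, and fit a weighted linear regression whose slopes serve as the local explanation. This is a simplified sketch; the actual lime package adds steps such as feature binning and feature selection that are omitted here, and the names `solve` and `lime_explain` and the `scale`/`width` defaults are assumptions for illustration.

```python
import math
import random


def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def lime_explain(model, x, n_samples=500, scale=0.5, width=1.0, seed=0):
    """Local slopes of a kernel-weighted linear surrogate around x."""
    rng = random.Random(seed)
    p = len(x)
    XtWX = [[0.0] * (p + 1) for _ in range(p + 1)]
    XtWy = [0.0] * (p + 1)
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x]   # perturbed sample
        d2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        w = math.exp(-d2 / width ** 2)                 # proximity weight
        features = [1.0] + z                           # intercept + features
        y = model(z)
        for a in range(p + 1):                         # accumulate normal equations
            XtWy[a] += w * features[a] * y
            for b in range(p + 1):
                XtWX[a][b] += w * features[a] * features[b]
    coef = solve(XtWX, XtWy)
    return coef[1:]  # drop the intercept, keep one slope per feature
```

For a model that is already linear the surrogate recovers its coefficients; for a nonlinear model such as z², the slope around 1.0 comes out close to the local derivative 2.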

modeloriented

shapviz:SHAP Visualizations

Visualizations for SHAP (SHapley Additive exPlanations), such as waterfall plots, force plots, various types of importance plots, dependence plots, and interaction plots. These plots act on a 'shapviz' object created from a matrix of SHAP values and a corresponding feature dataset. Wrappers for the R packages 'xgboost', 'lightgbm', 'fastshap', 'shapr', 'h2o', 'treeshap', 'DALEX', and 'kernelshap' are added for convenience. By separating visualization and computation, it is possible to display factor variables in graphs, even if the SHAP values are calculated by a model that requires numerical features. The plots are inspired by those provided by the 'shap' package in Python, but there is no dependency on it.

Maintained by Michael Mayer. Last updated 2 months ago.

explainable-ai machine-learning shap shapley-value visualization xai

7.5 match 89 stars 9.95 score 250 scripts

r-lib

generics:Common S3 Generics not Provided by Base R Methods Related to Model Fitting

In order to reduce potential package dependencies and conflicts, generics provides a number of commonly used S3 generics.

Maintained by Hadley Wickham. Last updated 1 year ago.

5.3 match 61 stars 14.00 score 131 scripts 9.8k dependents

pbiecek

ceterisParibus:Ceteris Paribus Profiles

Ceteris Paribus Profiles (What-If Plots) are designed to present model responses around selected points in a feature space, for example around a single prediction for an interesting observation. The plots are designed to work in a model-agnostic fashion: they work for any predictive Machine Learning model and allow for model comparisons. Ceteris Paribus Plots supplement the Break Down Plots from the 'breakDown' package.

Maintained by Przemyslaw Biecek. Last updated 5 years ago.

13.1 match 42 stars 5.48 score 36 scripts

modeloriented

randomForestExplainer:Explaining and Visualizing Random Forests in Terms of Variable Importance

A set of tools to help explain which variables are most important in a random forest. Various variable importance measures are calculated and visualized in different settings in order to get an idea of how their importance changes depending on our criteria (Hemant Ishwaran and Udaya B. Kogalur and Eiran Z. Gorodeski and Andy J. Minn and Michael S. Lauer (2010) <doi:10.1198/jasa.2009.tm08622>, Leo Breiman (2001) <doi:10.1023/A:1010933404324>).

Maintained by Yue Jiang. Last updated 12 months ago.

random-forest

6.9 match 231 stars 9.82 score 236 scripts

juanv66x

viralx:Explainers for Regression Models in HIV Research

A dedicated viral-explainer model tool designed to empower researchers in the field of HIV research, particularly in viral load and CD4 (Cluster of Differentiation 4) lymphocyte regression modeling. It draws inspiration from the 'tidymodels' framework for rigorous model building by Max Kuhn and Hadley Wickham (2020) <https://www.tidymodels.org> and from the 'DALEXtra' tool for explainability by Przemyslaw Biecek (2020) <doi:10.48550/arXiv.2009.13248>. It aims to facilitate interpretable and reproducible research in biostatistics and computational biology for the benefit of understanding HIV dynamics.

Maintained by Juan Pablo Acuña González. Last updated 4 months ago.

21.5 match 3.00 score 1 script

modeloriented

kernelshap:Kernel SHAP

Efficient implementation of Kernel SHAP, see Lundberg and Lee (2017), and Covert and Lee (2021) <http://proceedings.mlr.press/v130/covert21a>. Furthermore, for up to 14 features, exact permutation SHAP values can be calculated. The package plays well together with meta-learning packages like 'tidymodels', 'caret' or 'mlr3'. Visualizations can be done using the R package 'shapviz'.

Maintained by Michael Mayer. Last updated 6 months ago.

explainable-ai interpretability interpretable-machine-learning machine-learning shap xai

7.5 match 45 stars 8.38 score 117 scripts 17 dependents
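Kernel SHAP, as introduced by Lundberg and Lee (2017), estimates Shapley values with a weighted linear regression over sampled feature coalitions, where a coalition of size s out of M features receives the "Shapley kernel" weight (M - 1) / (C(M, s) · s · (M - s)). The Python function below (hypothetical name, not part of the kernelshap package) just evaluates that weight to show how it favours very small and very large coalitions.

```python
from math import comb


def shapley_kernel_weight(M, s):
    """Kernel SHAP weight for a coalition of size s out of M features.
    Empty and full coalitions get infinite weight; in practice they are
    enforced as constraints rather than weighted regression rows."""
    if s == 0 or s == M:
        return float("inf")
    return (M - 1) / (comb(M, s) * s * (M - s))
```

With M = 4, coalitions of size 1 or 3 get weight 0.25 while size-2 coalitions get 0.125, so the most informative extremes dominate the fit.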

bioc

scater:Single-Cell Analysis Toolkit for Gene Expression Data in R

A collection of tools for doing various analyses of single-cell RNA-seq gene expression data, with a focus on quality control and visualization.

Maintained by Alan O'Callaghan. Last updated 10 days ago.

immunooncology singlecell rnaseq qualitycontrol preprocessing normalization visualization dimensionreduction transcriptomics geneexpression sequencing software dataimport datarepresentation infrastructure coverage

5.3 match 11.07 score 12k scripts 43 dependents

modeloriented

auditor:Model Audit - Verification, Validation, and Error Analysis

Provides an easy-to-use unified interface for creating validation plots for any model. The 'auditor' package helps to avoid the repetitive work of writing the code needed to create residual plots. These visualizations allow users to assess and compare the goodness of fit, performance, and similarity of models.

Maintained by Alicja Gosiewska. Last updated 1 year ago.

classification error-analysis explainable-artificial-intelligence machine-learning model-validation regression-models residuals xai

6.7 match 58 stars 8.76 score 94 scripts 2 dependents

bioc

MBECS:Evaluation and correction of batch effects in microbiome data-sets

The Microbiome Batch Effect Correction Suite (MBECS) provides a set of functions to evaluate and mitigate unwanted noise due to processing in batches. To that end it incorporates a host of batch correcting algorithms (BECA) from various packages. In addition it offers a correction and reporting pipeline that provides a preliminary look at the characteristics of a data-set before and after correcting for batch effects.

Maintained by Michael Olbrich. Last updated 5 months ago.

batcheffect microbiome reportwriting visualization normalization qualitycontrol

12.7 match 4 stars 4.60 score 4 scripts

forestry-labs

distillML:Model Distillation and Interpretability Methods for Machine Learning Models

Provides several methods for model distillation and interpretability for general black box machine learning models and treatment effect estimation methods. For details on the algorithms implemented, see <https://forestry-labs.github.io/distillML/index.html> (Brian Cho, Theo F. Saarinen, Jasjeet S. Sekhon, Simon Walter).

Maintained by Theo Saarinen. Last updated 2 years ago.

bart distillation-model explainable-machine-learning explainable-ml interpretability interpretable-machine-learning machine-learning model random-forest xgboost

14.2 match 7 stars 3.92 score 12 scripts

bioc

mixOmics:Omics Data Integration Project

Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non-exhaustive list of methods includes variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.

Maintained by Eva Hamrud. Last updated 4 days ago.

immunooncology microarray sequencing metabolomics metagenomics proteomics geneprediction multiplecomparison classification regression bioconductor genomics genomics-data genomics-visualization multivariate-analysis multivariate-statistics omics r-pkg r-project

4.0 match 182 stars 13.71 score 1.3k scripts 22 dependents

bioc

PCAtools:PCAtools: Everything Principal Components Analysis

Principal Component Analysis (PCA) is a very powerful technique that has wide applicability in data science, bioinformatics, and further afield. It was initially developed to analyse large volumes of data in order to tease out the differences/relationships between the logical entities being analysed. It extracts the fundamental structure of the data without the need to build any model to represent it. This 'summary' of the data is arrived at through a process of reduction that can transform the large number of variables into a lesser number that are uncorrelated (i.e. the 'principal components'), while at the same time being capable of easy interpretation on the original data. PCAtools provides functions for data exploration via PCA, and allows the user to generate publication-ready figures. PCA is performed via BiocSingular - users can also identify optimal number of principal components via different metrics, such as elbow method and Horn's parallel analysis, which has relevance for data reduction in single-cell RNA-seq (scRNA-seq) and high dimensional mass cytometry data.

Maintained by Kevin Blighe. Last updated 5 months ago.

rnaseq atacseq geneexpression transcription singlecell principalcomponent cpp

4.9 match 343 stars 11.12 score 832 scripts 2 dependents
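As a small illustration of what the first principal component is, the Python sketch below centres the data, forms the sample covariance matrix, and finds its leading eigenvector by power iteration. PCAtools itself performs PCA via BiocSingular, so this is only a conceptual sketch; the function name and the fixed starting vector are assumptions, and the sign of the returned component is arbitrary.

```python
def first_principal_component(data, n_iter=200):
    """Leading eigenvector and eigenvalue of the sample covariance matrix."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-uniform start
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm == 0.0:
            break  # v lies in the null space of cov
        v = [wi / norm for wi in w]
    # Rayleigh quotient: variance explained along v
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
    return v, eigenvalue
```

The returned eigenvalue is the variance of the data along the component; further components could be obtained by deflating the covariance matrix and repeating.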

robertoalcantara9

LearnClust:Learning Hierarchical Clustering Algorithms

Classical hierarchical clustering algorithms: agglomerative and divisive clustering. The algorithms are implemented in a theoretical way, step by step. The package includes some detailed functions that explain each step. Every function offers options to obtain different results using different techniques. The package explains to non-expert users how hierarchical clustering algorithms work.

Maintained by Roberto Alcantara. Last updated 4 years ago.

26.3 match 2.04 score 11 scripts
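In the same step-by-step spirit, the Python sketch below runs agglomerative clustering with single linkage: start with one cluster per point and repeatedly merge the two clusters whose closest members are nearest, recording each merge. Single linkage is just one of several possible linkage choices, and the function name `single_linkage` is hypothetical, not part of LearnClust.

```python
def single_linkage(points):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    clusters = [[i] for i in range(len(points))]  # one cluster per point
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the closest members
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters = ([c for k, c in enumerate(clusters) if k not in (a, b)]
                    + [clusters[a] + clusters[b]])
    return merges
```

For the points 0, 1, 5 and 6.5 on a line, the merges happen at distances 1.0, 1.5 and 4.0, which is exactly the sequence a dendrogram would display.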

agoutsmedt

biblionetwork:Create Different Types of Bibliometric Networks

Functions to find edges for bibliometric networks like bibliographic coupling network, co-citation network and co-authorship network. The weights of network edges can be calculated according to different methods, depending on the type of networks, the type of nodes, and what you want to analyse. These functions are optimized to be used on large datasets. The package contains functions inspired by: Leydesdorff, Loet and Park, Han Woo (2017) <doi:10.1016/j.joi.2016.11.007>; Perianes-Rodriguez, Antonio, Ludo Waltman, and Nees Jan Van Eck (2016) <doi:10.1016/j.joi.2016.10.006>; Sen, Subir K. and Shymal K. Gan (1983) <http://nopr.niscair.res.in/handle/123456789/28008>; Shen, Si, Zhu, Danhao, Rousseau, Ronald, Su, Xinning and Wang, Dongbo (2019) <doi:10.1016/j.joi.2019.01.012>; Zhao, Dangzhi and Strotmann, Andreas (2008) <doi:10.1002/meet.2008.1450450292>.

Maintained by Aurélien Goutsmedt. Last updated 2 years ago.

authorship-network bibliometric-networks bibliometrics coupling-angle

10.4 match 7 stars 5.18 score 43 scripts
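Bibliographic coupling links two articles when they cite the same references. The Python sketch below builds such edges with two weights: the raw count of shared references and a cosine-style "coupling angle" (shared count divided by the square root of the product of the two reference-list lengths), which is our reading of the Sen and Gan measure. The function name and input format are assumptions, not the biblionetwork API.

```python
from itertools import combinations
from math import sqrt


def coupling_edges(references):
    """references: dict mapping article id -> set of cited reference ids.
    Returns (article_a, article_b, shared_count, coupling_angle) tuples."""
    edges = []
    for a, b in combinations(sorted(references), 2):
        shared = len(references[a] & references[b])
        if shared:  # only coupled pairs become edges
            angle = shared / sqrt(len(references[a]) * len(references[b]))
            edges.append((a, b, shared, angle))
    return edges
```

Normalising by reference-list length keeps articles with very long bibliographies from dominating the network simply because they cite more.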

david-cortes

outliertree:Explainable Outlier Detection Through Decision Tree Conditioning

Outlier detection method that flags suspicious values within observations, contrasting them against the normal values in a user-readable format, potentially describing conditions within the data that make a given outlier more rare. Full procedure is described in Cortes (2020) <doi:10.48550/arXiv.2001.00636>. Loosely based on the 'GritBot' <https://www.rulequest.com/gritbot-info.html> software.

Maintained by David Cortes. Last updated 2 months ago.

anomaly-detection outlier-detection cpp openmp

7.1 match 58 stars 7.34 score 21 scripts 2 dependents

trinker

qdapRegex:Regular Expression Removal, Extraction, and Replacement Tools

A collection of regular expression tools associated with the 'qdap' package that may be useful outside of the context of discourse analysis. Tools include removal/extraction/replacement of abbreviations, dates, dollar amounts, email addresses, hash tags, numbers, percentages, citations, person tags, phone numbers, times, and zip codes.

Maintained by Tyler Rinker. Last updated 1 year ago.

qdapregex regular-expression

5.3 match 50 stars 9.48 score 502 scripts 41 dependents

apxr

analyzer:Data Analysis and Automated R Notebook Generation

Easy data analysis and quality checks which are commonly used in data science. It combines the tabular and graphical visualization for easier usability. This package also creates an R Notebook with detailed data exploration with one function call. The notebook can be made interactive.

Maintained by Apurv Priyam. Last updated 5 years ago.

12.0 match 4.13 score 27 scripts

r-lib

pak:Another Approach to Package Installation

The goal of 'pak' is to make package installation faster and more reliable. In particular, it performs all HTTP operations in parallel, so metadata resolution and package downloads are fast. Metadata and package files are cached on the local disk as well. 'pak' has a dependency solver, so it finds version conflicts before performing the installation. This version of 'pak' supports CRAN, 'Bioconductor' and 'GitHub' packages as well.

Maintained by Gábor Csárdi. Last updated 14 hours ago.

3.8 match 717 stars 13.05 score 277 scripts 17 dependents

modeloriented

EIX:Explain Interactions in 'XGBoost'

Structure mining from 'XGBoost' and 'LightGBM' models. Key functionalities of this package cover: visualisation of tree-based ensemble models, identification of interactions, measurement of variable importance, measurement of interaction importance, and explanation of single predictions with break down plots (based on the 'xgboostExplainer' and 'iBreakDown' packages). To download 'LightGBM', use the following link: <https://github.com/Microsoft/LightGBM>. 'EIX' is a part of the 'DrWhy.AI' universe.

Maintained by Ewelina Karbowiak. Last updated 4 years ago.

8.3 match 26 stars 5.72 score 6 scripts

ropensci

elastic:General Purpose Interface to 'Elasticsearch'

Connect to 'Elasticsearch', a 'NoSQL' database built on the 'Java' Virtual Machine. Interacts with the 'Elasticsearch' 'HTTP' API (<https://www.elastic.co/elasticsearch/>), including functions for setting connection details to 'Elasticsearch' instances, loading bulk data, searching for documents with both 'HTTP' query variables and 'JSON' based body requests. In addition, 'elastic' provides functions for interacting with APIs for 'indices', documents, nodes, clusters, an interface to the cat API, and more.

Maintained by Scott Chamberlain. Last updated 2 years ago.

database elasticsearch http api search nosql java json documents data-science database-wrapper etl

5.3 match 247 stars 8.98 score 151 scripts 1 dependent

greenwoodlab

pcev:Principal Component of Explained Variance

Principal component of explained variance (PCEV) is a statistical tool for the analysis of a multivariate response vector. It is a dimension-reduction technique, similar to principal component analysis (PCA), that seeks to maximize the proportion of variance (in the response vector) explained by a set of covariates.

Maintained by Maxime Turgeon. Last updated 6 years ago.

10.6 match 4 stars 4.30 score 7 scripts

marjoleinf

pre:Prediction Rule Ensembles

Derives prediction rule ensembles (PREs). Largely follows the procedure for deriving PREs as described in Friedman & Popescu (2008; <DOI:10.1214/07-AOAS148>), with adjustments and improvements. The main function pre() derives prediction rule ensembles consisting of rules and/or linear terms for continuous, binary, count, multinomial, and multivariate continuous responses. Function gpe() derives generalized prediction ensembles, consisting of rules, hinge and linear functions of the predictor variables.

Maintained by Marjolein Fokkema. Last updated 9 months ago.

5.1 match 58 stars 8.49 score 98 scripts 1 dependent

srmatth

mshap:Multiplicative SHAP Values for Two-Part Models

Allows for the computation of mSHAP values on two-part models as proposed by Matthews, S. and Hartman, B. (2021) <arXiv:2106.08990>. Also contains functions for simple plotting of the results (or any SHAP values). For information about the TreeSHAP algorithm that mSHAP builds on, see Lundberg, S.M., Erion, G., Chen, H., DeGrave, A., Prutkin, J.M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., Lee, S.I. (2020) <doi:10.1038/s42256-019-0138-9>.

Maintained by Spencer Matthews. Last updated 3 years ago.

explainable-ai shap

8.8 match 4 stars 4.78 score 15 scripts

laylaparast

Rsurrogate:Robust Estimation of the Proportion of Treatment Effect Explained by Surrogate Marker Information

Provides functions to estimate the proportion of treatment effect on the primary outcome that is explained by the treatment effect on the surrogate marker.

Maintained by Layla Parast. Last updated 2 years ago.

13.2 match 3.16 score 12 scripts 4 dependents

bioc

decompTumor2Sig:Decomposition of individual tumors into mutational signatures by signature refitting

Uses quadratic programming for signature refitting, i.e., to decompose the mutation catalog from an individual tumor sample into a set of given mutational signatures (either Alexandrov-model signatures or Shiraishi-model signatures), computing weights that reflect the contributions of the signatures to the mutation load of the tumor.

Maintained by Rosario M. Piro. Last updated 5 months ago.

software snp sequencing dnaseq genomicvariation somaticmutation biomedicalinformatics genetics biologicalquestion statisticalmethod

8.7 match 1 star 4.78 score 10 scripts 1 dependent

inesortega

neuralGAM:Interpretable Neural Network Based on Generalized Additive Models

Neural network framework based on Generalized Additive Models from Hastie & Tibshirani (1990, ISBN:9780412343902), which trains a separate neural network to estimate the contribution of each feature to the response variable. The networks are trained independently, leveraging the local scoring and backfitting algorithms to ensure that the Generalized Additive Model converges and remains additive. The resulting neural network is a highly accurate and interpretable deep learning model, which can be used for high-risk AI practices where decision-making should be based on accountable and interpretable algorithms.

Maintained by Ines Ortega-Fernandez. Last updated 6 months ago.

deep-neural-networks explainable-ai gam gann generalized-additive-models generalized-additive-neural-network self-explanatory-ml xai

7.5 match 2 stars 5.44 score 40 scripts

nashjc

optimx:Expanded Replacement and Extension of the 'optim' Function

Provides a replacement and extension of the optim() function to call several function minimization codes in R in a single statement. These methods handle smooth, possibly box constrained functions of several or many parameters. Note that function 'optimr()' was prepared to simplify the incorporation of minimization codes going forward. Also implements some utility codes and some extra solvers, including safeguarded Newton methods. Many methods previously separate are now included here. This is the version for CRAN.

Maintained by John C Nash. Last updated 2 months ago.

3.1 match 2 stars 12.87 score 1.8k scripts 89 dependents

modeloriented

localModel:LIME-Based Explanations with Interpretable Inputs Based on Ceteris Paribus Profiles

Local explanations of machine learning models describe how features contributed to a single prediction. This package implements an explanation method based on LIME (Local Interpretable Model-agnostic Explanations, see Tulio Ribeiro, Singh, Guestrin (2016) <doi:10.1145/2939672.2939778>) in which interpretable inputs are created based on local rather than global behaviour of each original feature.

Maintained by Przemyslaw Biecek. Last updated 3 years ago.

6.5 match 14 stars 6.16 score 23 scripts

mrcieu

TwoSampleMR:Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database

A package for performing Mendelian randomization using GWAS summary data. It uses the IEU OpenGWAS database <https://gwas.mrcieu.ac.uk/> to automatically obtain data, and a wide range of methods to run the analysis.

Maintained by Gibran Hemani. Last updated 11 days ago.

3.4 match 467 stars 11.23 score 1.7k scripts 1 dependent

alexzwanenburg

familiar:End-to-End Automated Machine Learning and Model Evaluation

Single unified interface for end-to-end modelling of regression, categorical and time-to-event (survival) outcomes. Models created using familiar are self-contained, and their use does not require additional information such as baseline survival, feature clustering, or feature transformation and normalisation parameters. Model performance, calibration, risk group stratification, (permutation) variable importance, individual conditional expectation, partial dependence, and more, are assessed automatically as part of the evaluation process and exported in tabular format and plotted, and may also be computed manually using export and plot functions. Where possible, metrics and values obtained during the evaluation process come with confidence intervals.

Maintained by Alex Zwanenburg. Last updated 6 months ago.

ai explainable-ai machine-learning survival-analysis tabular-data

7.5 match 31 stars 5.05 score 18 scripts

hadley

ggvis:Interactive Grammar of Graphics

An implementation of an interactive grammar of graphics, taking the best parts of 'ggplot2', combining them with the reactive framework of 'shiny' and drawing web graphics using 'vega'.

Maintained by Hadley Wickham. Last updated 1 year ago.

5.3 match 1 star 7.02 score 2.3k scripts 11 dependents

bioc

ChemmineR:Cheminformatics Toolkit for R

ChemmineR is a cheminformatics package for analyzing drug-like small molecule data in R. Its latest version contains functions for efficient processing of large numbers of molecules, physicochemical/structural property predictions, structural similarity searching, classification and clustering of compound libraries with a wide spectrum of algorithms. In addition, it offers visualization functions for compound clustering results and chemical structures.

Maintained by Thomas Girke. Last updated 5 months ago.

cheminformatics biomedicalinformatics pharmacogenetics pharmacogenomics microtitreplateassay cellbasedassays visualization infrastructure dataimport clustering proteomics metabolomics cpp

3.9 match 14 stars 9.42 score 253 scripts 12 dependents

dwarton

ecostats:Code and Data Accompanying the Eco-Stats Text (Warton 2022)

Functions and data supporting the Eco-Stats text (Warton, 2022, Springer), and solutions to exercises. Functions include tools for using simulation envelopes in diagnostic plots, and a function for diagnostic plots of multivariate linear models. Datasets mentioned in the package are included here (where not available elsewhere) and there is a vignette for each chapter of the text with solutions to exercises.

Maintained by David Warton. Last updated 1 year ago.

5.5 match 8 stars 6.58 score 53 scripts

egenn

rtemis:Machine Learning and Visualization

Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.

Maintained by E.D. Gennatas. Last updated 1 month ago.

data-science data-visualization machine-learning machine-learning-library visualization

5.1 match 145 stars 7.09 score 50 scripts 2 dependents

myles-lewis

nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'

Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.

Maintained by Myles Lewis. Last updated 6 days ago.

4.3 match 12 stars 7.92 score 46 scripts

grvanderploeg

parafac4microbiome:Parallel Factor Analysis Modelling of Longitudinal Microbiome Data

Creation and selection of PARAllel FACtor Analysis (PARAFAC) models of longitudinal microbiome data. You can import your own data with our import functions or use one of the example datasets to create your own PARAFAC models. Selection of the optimal number of components can be done using assessModelQuality() and assessModelStability(). The selected model can then be plotted using plotPARAFACmodel(). The Parallel Factor Analysis method was originally described by Carroll and Chang (1970) <doi:10.1007/BF02310791> and Harshman (1970) <https://www.psychology.uwo.ca/faculty/harshman/wpppfac0.pdf>.

Maintained by Geert Roelof van der Ploeg. Last updated 20 days ago.

dimensionality-reduction microbiome microbiome-data multiway multiway-algorithms parallel-factor-analysis

5.2 match 6 stars 6.31 score 13 scripts

mrc-ide

orderly2:Orderly Next Generation

Distributed reproducible computing framework, adopting ideas from git, docker and other software. By defining a lightweight interface around the inputs and outputs of an analysis, a lot of the repetitive work for reproducible research can be automated. We define a simple format for organising and describing work that facilitates collaborative reproducible research and acknowledges that all analyses are run multiple times over their lifespans.

Maintained by Rich FitzJohn. Last updated 2 months ago.

3.8 match 8 stars 8.30 score 49 scripts 2 dependents

giuseppec

iml:Interpretable Machine Learning

Interpretability methods to analyze the behavior and predictions of any machine learning model. Implemented methods are: Feature importance described by Fisher et al. (2018) <doi:10.48550/arxiv.1801.01489>, accumulated local effects plots described by Apley (2018) <doi:10.48550/arxiv.1612.08468>, partial dependence plots described by Friedman (2001) <www.jstor.org/stable/2699986>, individual conditional expectation ('ice') plots described by Goldstein et al. (2013) <doi:10.1080/10618600.2014.907095>, local models (variant of 'lime') described by Ribeiro et al. (2016) <doi:10.48550/arXiv.1602.04938>, the Shapley Value described by Strumbelj et al. (2014) <doi:10.1007/s10115-013-0679-x>, feature interactions described by Friedman et al. (2008) <doi:10.1214/07-AOAS148> and tree surrogate models.

Maintained by Giuseppe Casalicchio. Last updated 20 days ago.

2.4 match 494 stars 12.86 score 642 scripts 4 dependents

viadee

localICE:Local Individual Conditional Expectation

Local Individual Conditional Expectation ('localICE') is a local explanation approach from the field of eXplainable Artificial Intelligence (XAI). localICE is a model-agnostic XAI approach that provides three-dimensional local explanations for particular data instances. The approach was proposed in the master's thesis of Martin Walter as an extension to ICE (see Reference). The three dimensions are the two features on the horizontal and vertical axes and the target, represented by different colors. The approach is applicable to classification and regression problems, to explain interactions of two features towards the target. For classification models, the number of classes can be more than two, and each class is added to the plot as a different color. The given instance is added to the plot as two dotted lines according to its feature values. The localICE package can explain features of type factor and numeric for any machine learning model. Automatically supported machine learning packages are 'mlr', 'randomForest', 'caret', and any other package with an S3 predict function. For model types from other libraries, a predict function has to be provided as an argument in order to access the model. Reference to the ICE approach: Alex Goldstein, Adam Kapelner, Justin Bleich, Emil Pitkin (2013) <arXiv:1309.6392>.

Maintained by Martin Walter. Last updated 5 years ago.

ai explainable-ai ggplot machine-learning visualization

8.4 match 7 stars 3.54 score 3 scripts

mfrasco

Metrics:Evaluation Metrics for Machine Learning

An implementation of evaluation metrics in R that are commonly used in supervised machine learning. It implements metrics for regression, time series, binary classification, classification, and information retrieval problems. It has zero dependencies and a consistent, simple interface for all functions.

Maintained by Michael Frasco. Last updated 6 years ago.

2.3 match 99 stars 13.02 score 6.1k scripts 51 dependents

modeloriented

live:Local Interpretable (Model-Agnostic) Visual Explanations

Interpretability of complex machine learning models is a growing concern. This package helps to understand the key factors that drive the decision made by a complicated predictive model (a so-called black-box model). This is achieved through local approximations based either on an additive regression-like model or on a CART-like model that allows for higher-order interactions. The methodology is based on Tulio Ribeiro, Singh, Guestrin (2016) <doi:10.1145/2939672.2939778>. More details can be found in Staniak, Biecek (2018) <doi:10.32614/RJ-2018-072>.

Maintained by Mateusz Staniak. Last updated 6 years ago.

iml interpretability lime machine-learning model-visualization visual-explanations xai

5.2 match 35 stars 5.59 score 55 scripts

dcousin3

ANOFA:Analyses of Frequency Data

Analyses of frequencies can be performed using an alternative test based on the G statistic. The test has type-I error rates and power similar to those of the chi-square test. However, it is based on a total statistic that can be decomposed additively into interaction effects, main effects, simple effects, contrast effects, etc., mimicking precisely the logic of ANOVA. We call this set of tools 'ANOFA' (Analysis of Frequency data) to highlight its similarities with ANOVA. This framework also renders plots of frequencies along with confidence intervals. Finally, computing effect sizes and planning statistical power are straightforward under this framework. ANOFA assesses the significance of effects instead of the significance of parameters; as such, it is more intuitive to most researchers than alternative approaches based on generalized linear models. See Laurencelle and Cousineau (2023) <doi:10.20982/tqmp.19.2.p173>.

Maintained by Denis Cousineau. Last updated 2 months ago.

frequencies statistics

6.6 match 1 star 4.30 score 1 script

pbiecek

breakDown:Model Agnostic Explainers for Individual Predictions

Model agnostic tool for decomposition of predictions from black boxes. Break Down Table shows contributions of every variable to a final prediction. Break Down Plot presents variable contributions in a concise graphical way. This package works for binary classifiers and general regression models.

Maintained by Przemyslaw Biecek. Last updated 1 year ago.

data-science iml interpretability machine-learning visual-explanations xai

3.0 match 103 stars 8.90 score 91 scripts 2 dependents

rdatatable

data.table:Extension of `data.frame`

Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.

Maintained by Tyson Barrett. Last updated 4 hours ago.

1.1 match 3.7k stars 23.52 score 230k scripts 4.6k dependents

bioc

scGPS:A complete analysis of single cell subpopulations, from identifying subpopulations to analysing their relationship (scGPS = single cell Global Predictions of Subpopulation)

The package implements two main algorithms to answer two key questions: a SCORE (Stable Clustering at Optimal REsolution) to find subpopulations, followed by scGPS to investigate the relationships between subpopulations.

Maintained by Quan Nguyen. Last updated 5 months ago.

singlecell clustering dataimport sequencing coverage openblas cpp

5.1 match 4 stars 5.20 score 7 scripts

rconsortium

S7:An Object Oriented System Meant to Become a Successor to S3 and S4

A new object oriented programming system designed to be a successor to S3 and S4. It includes formal class, generic, and method specification, and a limited form of multiple dispatch. It has been designed and implemented collaboratively by the R Consortium Object-Oriented Programming Working Group, which includes representatives from R-Core, 'Bioconductor', 'Posit'/'tidyverse', and the wider R community.

Maintained by Hadley Wickham. Last updated 4 months ago.

2.0 match 432 stars 13.15 score 86 scripts 22 dependents

spatstat

spatstat.utils:Utility Functions for 'spatstat'

Contains utility functions for the 'spatstat' family of packages which may also be useful for other purposes.

Maintained by Adrian Baddeley. Last updated 2 days ago.

spatial-analysis spatial-data spatstat

2.3 match 5 stars 11.66 score 134 scripts 248 dependents

duckdb

duckdb:DBI Package for the DuckDB Database Management System

The DuckDB project is an embedded analytical data management system with support for the Structured Query Language (SQL). This package includes all of DuckDB and an R Database Interface (DBI) connector.

Maintained by Kirill Müller. Last updated 3 days ago.

database duckdb olap cpp

1.9 match 158 stars 13.79 score 1.7k scripts 46 dependents

khliland

multiblock:Multiblock Data Fusion in Statistics and Machine Learning

Functions and datasets to support Smilde, Næs and Liland (2021, ISBN: 978-1-119-60096-1) "Multiblock Data Fusion in Statistics and Machine Learning - Applications in the Natural and Life Sciences". This implements and imports a large collection of methods for multiblock data analysis with common interfaces, result- and plotting functions, several real data sets and six vignettes covering a range of different applications.

Maintained by Kristian Hovde Liland. Last updated 2 months ago.

cpp

3.8 match 14 stars 6.68 score 19 scripts

adeverse

ade4:Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences

Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.

Maintained by Aurélie Siberchicot. Last updated 13 days ago.

openblas cpp

1.7 match 39 stars 14.96 score 2.2k scripts 256 dependents

bioc

MatrixQCvis:Shiny-based interactive data-quality exploration for omics data

Data quality assessment is an integral part of preparatory data analysis to ensure sound biological information retrieval. We present here the MatrixQCvis package, which provides shiny-based interactive visualization of data quality metrics at the per-sample and per-feature level. It is broadly applicable to quantitative omics data types that come in matrix-like format (features x samples). It enables the detection of low-quality samples, drifts, outliers and batch effects in data sets. Visualizations include, among others, bar and violin plots of the (count/intensity) values, mean vs standard deviation plots, MA plots, empirical cumulative distribution function (ECDF) plots, visualizations of the distances between samples, and multiple types of dimension reduction plots. Furthermore, MatrixQCvis allows for differential expression analysis based on the limma (moderated t-tests) and proDA (Wald tests) packages. MatrixQCvis builds upon the popular Bioconductor SummarizedExperiment S4 class and thus enables easy integration into existing workflows. The package is especially tailored towards metabolomics and proteomics mass spectrometry data, but also allows assessing the data quality of other data types that can be represented in a SummarizedExperiment object.

Maintained by Thomas Naake. Last updated 5 months ago.

visualization shinyapps gui qualitycontrol dimensionreduction metabolomics proteomics transcriptomics

5.2 match 4.74 score 4 scripts

laijiangshan

gam.hp:Hierarchical Partitioning of Adjusted R2 and Explained Deviance for Generalized Additive Models

Conducts hierarchical partitioning to calculate individual contributions of each predictor towards adjusted R2 and explained deviance for generalized additive models based on the output of gam() in the 'mgcv' package, applying the algorithm in this paper: Lai (2024) <doi:10.1016/j.pld.2024.06.002>.

Maintained by Jiangshan Lai. Last updated 3 months ago.

5.0 match 6 stars 4.95 score 6 scripts

vegandevs

vegan:Community Ecology Package

Ordination methods, diversity analysis and other functions for community and vegetation ecologists.

Maintained by Jari Oksanen. Last updated 16 days ago.

ecological-modelling ecology ordination fortran openblas

1.3 match 472 stars 19.41 score 15k scripts 440 dependents

dcousin3

ANOPA:Analyses of Proportions using Anscombe Transform

Analyses of Proportions can be performed on the Anscombe (arcsine-related) transformed data. The 'ANOPA' package can analyze proportions obtained from up to four factors. The factors can be within-subject, between-subject, or a mix of within- and between-subject. The main, omnibus analysis can be followed by additive decompositions into interaction effects, main effects, simple effects, contrast effects, etc., mimicking precisely the logic of ANOVA. For that reason, we call this set of tools 'ANOPA' (Analysis of Proportion using Anscombe transform) to highlight its similarities with ANOVA. The 'ANOPA' framework also makes plots of proportions, along with confidence intervals, easy to obtain. Finally, computing effect sizes and planning statistical power are straightforward under this framework. One particularity: 'ANOPA' computes F statistics that have infinite degrees of freedom in the denominator. See Laurencelle and Cousineau (2023) <doi:10.3389/fpsyg.2022.1045436>.

Maintained by Denis Cousineau. Last updated 2 months ago.

error-bars proportions statistical-testing statistics summary-statistics

6.6 match 1 star 3.65 score 18 scripts

jpfitzinger

tidyfit:Regularized Linear Modeling with Tidy Data

An extension to the 'R' tidy data environment for automated machine learning. The package allows fitting and cross-validation of linear regression and classification algorithms on grouped data.

Maintained by Johann Pfitzinger. Last updated 2 months ago.

auto-ml classification machine-learning regression tidyverse

3.3 match 16 stars 7.22 score 26 scripts

protviz

prozor:Minimal Protein Set Explaining Peptide Spectrum Matches

Determine minimal protein set explaining peptide spectrum matches. Utility functions for creating fasta amino acid databases with decoys and contaminants. Peptide false discovery rate estimation for target decoy search results on psm, precursor, peptide and protein level. Computing dynamic swath window sizes based on MS1 or MS2 signal distributions.

Maintained by Witold Wolski. Last updated 4 months ago.

software massspectrometry proteomics experimenthubsoftware

5.2 match 6 stars 4.45 score 93 scripts

vast-lib

tinyVAST:Multivariate Spatio-Temporal Models using Structural Equations

Fits a wide variety of multivariate spatio-temporal models with simultaneous and lagged interactions among variables (including vector autoregressive spatio-temporal ('VAST') dynamics) for areal, continuous, or network spatial domains. It includes time-variable, space-variable, and space-time-variable interactions using dynamic structural equation models ('DSEM') as expressive interface, and the 'mgcv' package to specify splines via the formula interface. See Thorson et al. (2024) <doi:10.48550/arXiv.2401.10193> for more details.

Maintained by James T. Thorson. Last updated 8 hours ago.

vector-autoregressive-spatio-temporal-model cpp

3.3 match 13 stars 6.80 score

tidyverse

duckplyr:A 'DuckDB'-Backed Version of 'dplyr'

A drop-in replacement for 'dplyr', powered by 'DuckDB' for performance. Offers convenient utilities for working with in-memory and larger-than-memory data while retaining full 'dplyr' compatibility.

Maintained by Kirill Müller. Last updated 5 days ago.

analytics dataframe dplyr duckdb performance

2.0 match 309 stars 11.33 score 220 scripts

bioc

pcaMethods:A collection of PCA methods

Provides Bayesian PCA, Probabilistic PCA, Nipals PCA, Inverse Non-Linear PCA and the conventional SVD PCA. A cluster based method for missing value estimation is included for comparison. BPCA, PPCA and NipalsPCA may be used to perform PCA on incomplete data as well as for accurate missing value estimation. A set of methods for printing and plotting the results is also provided. All PCA methods make use of the same data structure (pcaRes) to provide a common interface to the PCA results. Initiated at the Max-Planck Institute for Molecular Plant Physiology, Golm, Germany.

Maintained by Henning Redestig. Last updated 5 months ago.

bayesian cpp

1.7 match 49 stars 13.10 score 538 scripts 73 dependents

tagteam

riskRegression:Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks

Implementation of the following methods for event history analysis. Risk regression models for survival endpoints, also in the presence of competing risks, are fitted using binomial regression based on a time sequence of binary event status variables. A formula interface for the Fine-Gray regression model and an interface for the combination of cause-specific Cox regression models. A toolbox for assessing and comparing performance of risk predictions (risk markers and risk prediction models). Prediction performance is measured by the Brier score and the area under the ROC curve for binary, possibly time-dependent outcomes. Inverse probability of censoring weighting and pseudo values are used to deal with right censored data. Lists of risk markers and lists of risk models are assessed simultaneously. Cross-validation repeatedly splits the data, trains the risk prediction models on one part of each split and then summarizes and compares the performance across splits.

Maintained by Thomas Alexander Gerds. Last updated 18 days ago.

openblas cpp

1.7 match 46 stars 13.00 score 736 scripts 35 dependents

aljacq

LorenzRegression:Lorenz and Penalized Lorenz Regressions

Inference for the Lorenz and penalized Lorenz regressions. More broadly, the package proposes functions to assess inequality and graphically represent it. The Lorenz Regression procedure is introduced in Heuchenne and Jacquemain (2022) <doi:10.1016/j.csda.2021.107347> and in Jacquemain, A., C. Heuchenne, and E. Pircalabelu (2024) <doi:10.1214/23-EJS2200>.

Maintained by Alexandre Jacquemain. Last updated 11 days ago.

cpp

5.3 match 1 star 3.95 score

eagerai

fastai:Interface to 'fastai'

The 'fastai' <https://docs.fast.ai/index.html> library simplifies training fast and accurate neural networks using modern best practices. It is based on research into deep learning best practices undertaken at 'fast.ai', including 'out of the box' support for vision, text, tabular, audio, time series, and collaborative filtering models.

Maintained by Turgut Abdullayev. Last updated 11 months ago.

audio collaborative-filtering darknet darknet-image-classification fastai medical object-detection tabular text vision

2.3 match 118 stars 9.40 score 76 scripts

easystats

effectsize:Indices of Effect Size

Provides utilities to work with indices of effect size for a wide variety of models and hypothesis tests (see list of supported models using the function 'insight::supported_models()'), allowing computation of and conversion between indices such as Cohen's d, r, odds, etc. References: Ben-Shachar et al. (2020) <doi:10.21105/joss.02815>.

Maintained by Mattan S. Ben-Shachar. Last updated 1 month ago.

anova cohens-d compute conversion correlation effect-size effectsize hacktoberfest hedges-g interpretation standardization standardized statistics

1.3 match 344 stars 16.38 score 1.8k scripts 29 dependents

modeloriented

rSAFE:Surrogate-Assisted Feature Extraction

Provides a model-agnostic tool for training a white-box model on features extracted from a black-box model. For more information see: Gosiewska et al. (2020) <doi:10.1016/j.dss.2021.113556>.

Maintained by Alicja Gosiewska. Last updated 3 years ago.

feature-engineering feature-extraction iml interpretability machine-learning xai

3.0 match 28 stars 6.79 score 44 scripts

chrsigg

nsprcomp:Non-Negative and Sparse PCA

Two methods for performing a constrained principal component analysis (PCA), where non-negativity and/or sparsity constraints are enforced on the principal axes (PAs). The function 'nsprcomp' computes one principal component (PC) after the other. Each PA is optimized such that the corresponding PC has maximum additional variance not explained by the previous components. In contrast, the function 'nscumcomp' jointly computes all PCs such that the cumulative variance is maximal. Both functions have the same interface as the 'prcomp' function from the 'stats' package (plus some extra parameters), and both return the result of the analysis as an object of class 'nsprcomp', which inherits from 'prcomp'. See <https://sigg-iten.ch/learningbits/2013/05/27/nsprcomp-is-on-cran/> and Sigg et al. (2008) <doi:10.1145/1390156.1390277> for more details.

Maintained by Christian Sigg. Last updated 7 years ago.

4.3 match 9 stars 4.77 score 22 scripts 1 dependent

dcousin3

CohensdpLibrary:Cohen's d_p Computation with Confidence Intervals

Computing Cohen's d_p in any experimental design (between-subject, within-subject, and single-group design). Cousineau (2022) <https://github.com/dcousin3/CohensdpLibrary/>; Cohen (1969, ISBN: 0-8058-0283-5).

Maintained by Denis Cousineau. Last updated 4 days ago.

statistics fortran

6.6 match 1 star 3.00 score 3 scripts

bioc

CausalR:Causal network analysis methods

Causal network analysis methods for regulator prediction and network reconstruction from genome scale data.

Maintained by Glyn Bradley. Last updated 5 months ago.

immunooncology systemsbiology network graphandnetwork network inference transcriptomics proteomics differentialexpression rnaseq microarray

5.5 match 3.60 score 7 scripts

laresbernardo

lares:Analytics & Machine Learning Sidekick

Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of family functions, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scraper, it helps the analyst or data scientist to get quick and robust results, without the need for repetitive coding or advanced R programming skills.

Maintained by Bernardo Lares. Last updated 24 days ago.

analytics api automation automl data-science descriptive-statistics h2o machine-learning marketing mmm predictive-modeling puzzle rlanguage robyn visualization

2.0 match 233 stars 9.84 score 185 scripts 1 dependent

squidlobster

castor:Efficient Phylogenetics on Large Trees

Efficient phylogenetic analyses on massive phylogenies comprising up to millions of tips. Functions include pruning, rerooting, calculation of most-recent common ancestors, calculating distances from the tree root and calculating pairwise distances. Calculation of phylogenetic signal and mean trait depth (trait conservatism), ancestral state reconstruction and hidden character prediction of discrete characters, simulating and fitting models of trait evolution, fitting and simulating diversification models, dating trees, comparing trees, and reading/writing trees in Newick format. Citation: Louca, Stilianos and Doebeli, Michael (2017) <doi:10.1093/bioinformatics/btx701>.

Maintained by Stilianos Louca. Last updated 4 months ago.

cpp

3.4 match 2 stars 5.75 score 450 scripts 9 dependents

ericaponzi

RaJIVE:Robust Angle Based Joint and Individual Variation Explained

A robust alternative to the aJIVE (angle based Joint and Individual Variation Explained) method (Feng et al 2018: <doi:10.1016/j.jmva.2018.03.008>) for the estimation of joint and individual components in the presence of outliers in multi-source data. It decomposes the multi-source data into joint, individual and residual (noise) contributions. The decomposition is robust to outliers and noise in the data. The method is illustrated in Ponzi et al (2021) <arXiv:2101.09110>.

Maintained by Erica Ponzi. Last updated 4 years ago.

7.1 match 2.70 score 1 script

computationalstylistics

litRiddle:Dataset and Tools to Research the Riddle of Literary Quality

Dataset and functions to explore the quality of literary novels. The package is part of the Riddle of Literary Quality project, and it contains the data of a reader survey about fiction in Dutch, a description of the novels the readers rated, and the results of stylistic measurements of the novels. The package also contains functions to combine, analyze, and visualize these data. For more details, see: Eder M, van Zundert J, Lensink S, van Dalen-Oskam K (2022). Replicating The Riddle of Literary Quality: The litRiddle package for R. In _Digital Humanities 2022: Conference Abstracts_, 636-637.

Maintained by Maciej Eder. Last updated 2 years ago.

7.1 match 2.70 score 2 scripts

cecileproust-lima

lcmm:Extended Mixed Models Using Latent Classes and Latent Processes

Estimation of various extensions of mixed models, including latent class mixed models, joint latent class mixed models, mixed models for curvilinear outcomes, and mixed models for multivariate longitudinal outcomes, using a maximum likelihood estimation method (Proust-Lima, Philipps, Liquet (2017) <doi:10.18637/jss.v078.i02>).

Maintained by Cecile Proust-Lima. Last updated 1 month ago.

fortran

1.7 match 62 stars 11.41 score 249 scripts 7 dependents

connormayer

maxent.ot:Perform Phonological Analyses using Maximum Entropy Optimality Theory

Fit Maximum Entropy Optimality Theory models to data sets, generate the predictions made by such models for novel data, and compare the fit of different models using a variety of metrics. The package is described in Mayer, C., Tan, A., Zuraw, K. (in press) <https://sites.socsci.uci.edu/~cjmayer/papers/cmayer_et_al_maxent_ot_accepted.pdf>.

Maintained by Connor Mayer. Last updated 4 months ago.

3.4 match 8 stars 5.51 score 6 scripts

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 year ago.

2.3 match 3 stars 8.20 score 7.8k scripts 11 dependents

plangfelder

WGCNA:Weighted Correlation Network Analysis

Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>. Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.

Maintained by Peter Langfelder. Last updated 6 months ago.

cpp

1.9 match 54 stars 9.65 score 5.3k scripts 32 dependents

anna-neufeld

splinetree:Longitudinal Regression Trees and Forests

Builds regression trees and random forests for longitudinal or functional data using a spline projection method. Implements and extends the work of Yu and Lambert (1999) <doi:10.1080/10618600.1999.10474847>. This method allows trees and forests to be built while considering either level and shape or only shape of response trajectories.

Maintained by Anna Neufeld. Last updated 6 years ago.

3.4 match 4 stars 5.24 score 29 scripts

cran

IMEC:Ising Model of Explanatory Coherence

Theories are one of the most important tools of science. Although psychologists have discussed problems of theory in their discipline for a long time, weak theories are still widespread in most subfields. One possible reason for this is that psychologists lack the tools to systematically assess the quality of their theories. A computational model for formal theory evaluation based on the concept of explanatory coherence was previously developed (Thagard, 1989, <doi:10.1017/S0140525X00057046>). However, this model leaves room for improvement, and it is not available in software that psychologists typically use. This R package therefore provides a new implementation of explanatory coherence based on the Ising model.

Maintained by Maximilian Maier. Last updated 4 years ago.

6.6 match 2.70 score 1 script

danchaltiel

crosstable:Crosstables for Descriptive Analyses

Create descriptive tables for continuous and categorical variables. Apply summary statistics and counting functions, with or without a grouping variable, and create beautiful reports using 'rmarkdown' or 'officer'. You can also compute effect sizes and statistical tests if needed.

Maintained by Dan Chaltiel. Last updated 2 months ago.

descriptive-statistics flextable frequency-table html-report msword officer

1.7 match 116 stars 10.37 score 340 scripts

bioc

singleCellTK:Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data

The Single Cell Toolkit (SCTK) in the singleCellTK package provides an interface to popular tools for importing, quality control, analysis, and visualization of single cell RNA-seq data. SCTK allows users to seamlessly integrate tools from various packages at different stages of the analysis workflow. A general "a la carte" workflow gives users the ability to access multiple methods for data importing, calculation of general QC metrics, doublet detection, ambient RNA estimation and removal, filtering, normalization, batch correction or integration, dimensionality reduction, 2-D embedding, clustering, marker detection, differential expression, cell type labeling, pathway analysis, and data exporting. Curated workflows can be used to run Seurat and Celda. Streamlined quality control can be performed on the command line using the SCTK-QC pipeline. Users can analyze their data using commands in the R console or by using an interactive Shiny Graphical User Interface (GUI). Specific analyses or entire workflows can be summarized and shared with comprehensive HTML reports generated by Rmarkdown. Additional documentation and vignettes can be found at camplab.net/sctk.

Maintained by Joshua David Campbell. Last updated 24 days ago.

singlecell geneexpression differentialexpression alignment clustering immunooncology batcheffect normalization qualitycontrol dataimport gui

1.7 match 181 stars 10.16 score 252 scripts

fchen365

epca:Exploratory Principal Component Analysis

Exploratory principal component analysis for large-scale datasets, including sparse principal component analysis and sparse matrix approximation.

Maintained by Fan Chen. Last updated 11 months ago.

community-detection exploratory-data-analysis matrix-decompositions pca principal-component-analysis sparse-matrix

3.7 match 11 stars 4.74 score 8 scripts

mattblackwell

DirectEffects:Estimating Controlled Direct Effects for Explaining Causal Findings

A set of functions to estimate the controlled direct effect of treatment fixing a potential mediator to a specific value. Implements the sequential g-estimation estimator described in Vansteelandt (2009) <doi:10.1097/EDE.0b013e3181b6f4c9> and Acharya, Blackwell, and Sen (2016) <doi:10.1017/S0003055416000216> and the telescope matching estimator described in Blackwell and Strezhnev (2020) <doi:10.1111/rssa.12759>.

Maintained by Matthew Blackwell. Last updated 20 days ago.

2.9 match 18 stars 6.09 score 17 scripts

donaldrwilliams

BGGM:Bayesian Gaussian Graphical Models

Fit Bayesian Gaussian graphical models. The methods are separated into two Bayesian approaches for inference: hypothesis testing and estimation. There are extensions for confirmatory hypothesis testing, comparing Gaussian graphical models, and node-wise predictability. These methods were recently introduced in the Gaussian graphical model literature, including Williams (2019) <doi:10.31234/osf.io/x8dpr>, Williams and Mulder (2019) <doi:10.31234/osf.io/ypxd8>, and Williams, Rast, Pericchi, and Mulder (2019) <doi:10.31234/osf.io/yt386>.

Maintained by Philippe Rast. Last updated 3 months ago.

bayes-factors bayesian-hypothesis-testing gaussian-graphical-models openblas cpp openmp

1.8 match 55 stars 9.64 score 102 scripts 1 dependent

growthcharts

brokenstick:Broken Stick Model for Irregular Longitudinal Data

Data on multiple individuals through time are often sampled at times that differ between persons. Irregular observation times can severely complicate the statistical analysis of the data. The broken stick model approximates each subject’s trajectory by one or more connected line segments. The times at which segments connect (breakpoints) are identical for all subjects and under control of the user. A well-fitting broken stick model effectively transforms individual measurements made at irregular times into regular trajectories with common observation times. Specification of the model requires three variables: time, measurement and subject. The model is a special case of the linear mixed model, with time as a linear B-spline and subject as the grouping factor. The main assumptions are: subjects are exchangeable, trajectories between consecutive breakpoints are straight, random effects follow a multivariate normal distribution, and unobserved data are missing at random. The package contains functions for fitting the broken stick model to data, for predicting curves in new data and for plotting broken stick estimates. The package supports two optimization methods, and includes options to structure the variance-covariance matrix of the random effects. The analyst may use the software to smooth growth curves by a series of connected straight lines, to align irregularly observed curves to a common time grid, to create synthetic curves at a user-specified set of breakpoints, to estimate the time-to-time correlation matrix and to predict future observations. See <doi:10.18637/jss.v106.i07> for additional documentation on background, methodology and applications.

Maintained by Stef van Buuren. Last updated 2 years ago.

b-spline growth-curves linear-mixed-models longitudinal-data

3.2 match 9 stars 5.33 score 12 scripts

ropensci

sofa:Connector to 'CouchDB'

Provides an interface to the 'NoSQL' database 'CouchDB' (<http://couchdb.apache.org>). Methods are provided for managing databases within 'CouchDB', including creating/deleting/updating/transferring, and managing documents within databases. One can connect with a local 'CouchDB' instance, or a remote 'CouchDB' database such as 'Cloudant'. Documents can be inserted directly from vectors, lists, data.frames, and 'JSON'. Targeted at 'CouchDB' v2 or greater.

Maintained by Yaoxiang Li. Last updated 1 month ago.

couchdb database nosql documents cloudant couchdb-client

2.3 match 33 stars 7.51 score 54 scripts

dwarton

mvabund:Statistical Methods for Analysing Multivariate Abundance Data

A set of tools for displaying, modeling and analysing multivariate abundance data in community ecology. See 'mvabund-package.Rd' for details of overall package organization. The package is implemented with the Gnu Scientific Library (<http://www.gnu.org/software/gsl/>) and 'Rcpp' (<http://dirk.eddelbuettel.com/code/rcpp.html>) 'R' / 'C++' classes.

Maintained by David Warton. Last updated 1 year ago.

gsl cpp

1.7 match 10 stars 10.13 score 680 scripts 5 dependents

gavinsimpson

analogue:Analogue and Weighted Averaging Methods for Palaeoecology

Fits Modern Analogue Technique and Weighted Averaging transfer function models for prediction of environmental data from species data, and related methods used in palaeoecology.

Maintained by Gavin L. Simpson. Last updated 6 months ago.

1.9 match 14 stars 8.96 score 185 scripts 4 dependents

richardjtelford

palaeoSig:Significance Tests for Palaeoenvironmental Reconstructions

Several tests of quantitative palaeoenvironmental reconstructions from microfossil assemblages, including the null model tests of the statistical significance of reconstructions developed by Telford and Birks (2011) <doi:10.1016/j.quascirev.2011.03.002>, and tests of the effect of spatial autocorrelation on transfer function model performance using methods from Telford and Birks (2009) <doi:10.1016/j.quascirev.2008.12.020> and Trachsel and Telford (2016) <doi:10.5194/cp-12-1215-2016>. Age-depth models with generalized mixed-effect regression from Heegaard et al (2005) <doi:10.1191/0959683605hl836rr> are also included.

Maintained by Richard Telford. Last updated 2 years ago.

3.0 match 3 stars 5.45 score 31 scripts

evolecolgroup

tidysdm:Species Distribution Models with Tidymodels

Fit species distribution models (SDMs) using the 'tidymodels' framework, which provides a standardised interface to define models and process their outputs. 'tidysdm' expands 'tidymodels' by providing methods for spatial objects, models and metrics specific to SDMs, as well as a number of specialised functions to process occurrences for contemporary and palaeo datasets. The full functionalities of the package are described in Leonardi et al. (2023) <doi:10.1101/2023.07.24.550358>.

Maintained by Andrea Manica. Last updated 10 days ago.

species-distribution-modelling tidymodels

1.9 match 31 stars 8.82 score 51 scripts

chrsigg

nscancor:Non-Negative and Sparse CCA

Two implementations of canonical correlation analysis (CCA) that are based on iterated regression. By choosing the appropriate regression algorithm for each data domain, it is possible to enforce sparsity, non-negativity or other kinds of constraints on the projection vectors. Multiple canonical variables are computed sequentially using a generalized deflation scheme, where the additional correlation not explained by previous variables is maximized. nscancor() is used to analyze paired data from two domains, and has the same interface as cancor() from the 'stats' package (plus some extra parameters). mcancor() is appropriate for analyzing data from three or more domains. See <https://sigg-iten.ch/learningbits/2014/01/20/canonical-correlation-analysis-under-constraints/> and Sigg et al. (2007) <doi:10.1109/MLSP.2007.4414315> for more details.

Maintained by Christian Sigg. Last updated 2 years ago.

4.3 match 13 stars 3.81 score 7 scripts

zabore

riskclustr:Functions to Study Etiologic Heterogeneity

A collection of functions related to the study of etiologic heterogeneity both across disease subtypes and across individual disease markers. The included functions allow one to quantify the extent of etiologic heterogeneity in the context of a case-control study, and provide p-values to test for etiologic heterogeneity across individual risk factors. Begg CB, Zabor EC, Bernstein JL, Bernstein L, Press MF, Seshan VE (2013) <doi:10.1002/sim.5902>.

Maintained by Emily C. Zabor. Last updated 1 year ago.

3.4 match 1 star 4.81 score 26 scripts

bioc

consICA:consensus Independent Component Analysis

consICA implements a data-driven deconvolution method, consensus independent component analysis (ICA), to decompose heterogeneous omics data and extract features suitable for patient diagnostics and prognostics. The method separates biologically relevant transcriptional signals from technical effects and provides information about the cellular composition and biological processes. The implementation of parallel computing in the package ensures efficient analysis on modern multicore systems.

Maintained by Petr V. Nazarov. Last updated 5 months ago.

technology statisticalmethod sequencing rnaseq transcriptomics classification featureextraction

3.8 match 4.30 score 2 scripts

bioc

ALDEx2:Analysis Of Differential Abundance Taking Sample and Scale Variation Into Account

A differential abundance analysis for the comparison of two or more conditions. Useful for analyzing data from standard RNA-seq or meta-RNA-seq assays as well as selected and unselected values from in-vitro sequence selections. Uses a Dirichlet-multinomial model to infer abundance from counts, optimized for three or more experimental replicates. The method infers biological and sampling variation to calculate the expected false discovery rate, given the variation, based on a Wilcoxon Rank Sum test and Welch's t-test (via aldex.ttest), a Kruskal-Wallis test (via aldex.kw), a generalized linear model (via aldex.glm), or a correlation test (via aldex.corr). All tests report predicted p-values and posterior Benjamini-Hochberg corrected p-values. ALDEx2 also calculates expected standardized effect sizes for paired or unpaired study designs. ALDEx2 can now be used to estimate the effect of scale on the results and report on the scale-dependent robustness of results.

Maintained by Greg Gloor. Last updated 5 months ago.

differentialexpression rnaseq transcriptomics geneexpression dnaseq chipseq bayesian sequencing software microbiome metagenomics immunooncology scale simulation posterior p-value

1.5 match 28 stars 10.70 score 424 scripts 3 dependents

bxc147

Epi:Statistical Analysis in Epidemiology

Functions for demographic and epidemiological analysis in the Lexis diagram, i.e. register and cohort follow-up data. In particular, the Lexis suite of functions supports representation, manipulation, rate estimation and simulation for multistate data, and includes interfaces to the 'mstate', 'etm' and 'cmprsk' packages. The package also contains functions for Age-Period-Cohort and Lee-Carter modeling, a function for interval censored data, some useful functions for tabulation and plotting, and a number of epidemiological data sets.

Maintained by Bendix Carstensen. Last updated 2 months ago.

1.7 match 4 stars 9.65 score 708 scripts 11 dependents

citoverse

cito:Building and Training Neural Networks

The 'cito' package provides a user-friendly interface for training and interpreting deep neural networks (DNN). 'cito' simplifies the fitting of DNNs by supporting the familiar formula syntax, hyperparameter tuning under cross-validation, and helps to detect and handle convergence problems. DNNs can be trained on CPU, GPU and MacOS GPUs. In addition, 'cito' has many downstream functionalities such as various explainable AI (xAI) metrics (e.g. variable importance, partial dependence plots, accumulated local effect plots, and effect estimates) to interpret trained DNNs. 'cito' optionally provides confidence intervals (and p-values) for all xAI metrics and predictions. At the same time, 'cito' is computationally efficient because it is based on the deep learning framework 'torch'. The 'torch' package is native to R, so no Python installation or other API is required for this package.

Maintained by Maximilian Pichler. Last updated 2 months ago.

machine-learning neural-network

1.8 match 42 stars 9.07 score 129 scripts 1 dependent

stscl

gdverse:Analysis of Spatial Stratified Heterogeneity

Analyzing spatial factors and exploring spatial associations based on the concept of spatial stratified heterogeneity, while also taking into account local spatial dependencies, spatial interpretability, complex spatial interactions, and robust spatial stratification. Additionally, it supports the spatial stratified heterogeneity family established in academic literature.

Maintained by Wenbo Lv. Last updated 2 days ago.

geographical-detector geoinformatics geospatial-analysis spatial-statistics spatial-stratified-heterogeneity cpp

1.8 match 32 stars 9.07 score 41 scripts 2 dependents

jinghuazhao

gap:Genetic Analysis Package

As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers, including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many projects, which is also in line with its name (gap).

Maintained by Jing Hua Zhao. Last updated 16 days ago.

genetics imputation lmm fortran

1.3 match 12 stars 11.88 score 448 scripts 16 dependents

emkayoh

Dark:The Analysis of Dark Adaptation Data

The recovery of visual sensitivity in a dark environment is known as dark adaptation. In a clinical or research setting the recovery is typically measured after a dazzling flash of light and can be described by the Mahroo, Lamb and Pugh (MLP) model of dark adaptation. The functions in this package take dark adaptation data and use nonlinear regression to find the parameters of the model that 'best' describe the data. They do this by first generating rapid initial objective estimates of the dark adaptation parameters and then using a multi-start algorithm to reduce the possibility of converging to a local minimum. There is also a bootstrap method to calculate parameter confidence intervals. The functions rely upon a 'dark' list or object. This object is created as the first step in the workflow, and parts of the object are updated as it is processed.

Maintained by Jeremiah MF Kelly. Last updated 2 months ago.

3.0 match 5.18 score 30 scripts

leondap

recluster:Ordination Methods for the Analysis of Beta-Diversity Indices

The analysis of different aspects of biodiversity requires specific algorithms. For example, in regionalisation analyses, the high frequency of ties and zero values in dissimilarity matrices produced by Beta-diversity turnover produces hierarchical cluster dendrograms whose topology and bootstrap supports are affected by the order of rows in the original matrix. Moreover, visualisation of biogeographical regionalisation can be facilitated by a combination of hierarchical clustering and multi-dimensional scaling. The recluster package provides robust techniques to visualise and analyse patterns of biodiversity and to improve occurrence data for cryptic taxa.

Maintained by Leonardo Dapporto. Last updated 4 months ago.

3.3 match 4 stars 4.69 score 41 scripts

r-forge

modEvA:Model Evaluation and Analysis

Analyses species distribution models and evaluates their performance. It includes functions for variation partitioning, extracting variable importance, computing several metrics of model discrimination and calibration performance, optimizing prediction thresholds based on a number of criteria, performing multivariate environmental similarity surface (MESS) analysis, and displaying various analytical plots. Initially described in Barbosa et al. (2013) <doi:10.1111/ddi.12100>.

Maintained by A. Marcia Barbosa. Last updated 11 days ago.

2.3 match 6.82 score 269 scripts 3 dependents

end-to-end-provenance

provExplainR:Compare Provenance Collections to Explain Changed Script Outputs

Inspects provenance collected by the 'rdt' or 'rdtLite' packages, or other tools providing compatible PROV JSON output created by the execution of a script, and finds differences between two provenance collections. Factors under examination include the hardware and software used to execute the script, versions of attached libraries, use of global variables, modified inputs and outputs, and changes in main and sourced scripts. Based on detected changes, 'provExplainR' can be used to study how these factors affect the behavior of the script and to generate a promising diagnosis of the causes of different script results. More information about 'rdtLite' and associated tools is available at <https://github.com/End-to-end-provenance/> and Barbara Lerner, Emery Boose, and Luis Perez (2018), Using Introspection to Collect Provenance in R, Informatics, <doi:10.3390/informatics5010012>.

Maintained by Barbara Lerner. Last updated 3 years ago.

5.1 match 3.00 score 8 scripts

mrc-ide

monty:Monte Carlo Models

Experimental sources for the next generation of mcstate, now called 'monty', which will support much of the old mcstate functionality as well as new features such as better parameter interfaces and Hamiltonian Monte Carlo.

Maintained by Rich FitzJohn. Last updated 1 month ago.

cpp

2.0 match 3 stars 7.52 score 29 scripts 3 dependents

reumandc

wsyn:Wavelet Approaches to Studies of Synchrony in Ecology and Other Fields

Tools for a wavelet-based approach to analyzing spatial synchrony, principally in ecological data. Some tools will be useful for studying community synchrony. See, for instance, Sheppard et al (2016) <doi:10.1038/NCLIMATE2991>, Sheppard et al (2017) <doi:10.1051/epjnbp/2017000>, Sheppard et al (2019) <doi:10.1371/journal.pcbi.1006744>.

Maintained by Daniel C. Reuman. Last updated 3 years ago.

3.1 match 1 star 4.80 score 125 scripts

cran

BioM2:Biologically Explainable Machine Learning Framework

Biologically Explainable Machine Learning Framework for Phenotype Prediction using omics data described in Chen and Schwarz (2017) <doi:10.48550/arXiv.1712.00336>. Identifying reproducible and interpretable biological patterns from high-dimensional omics data is critical to understanding the risk mechanisms of complex diseases. As such, explainable machine learning can offer biological insight in addition to personalized risk scoring. In this process, a feature space of biological pathways is generated, and this feature space can subsequently be analyzed using WGCNA methods (described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>).

Maintained by Shunjie Zhang. Last updated 26 days ago.

5.6 match 2.65 score 9 scripts

veseshan

clinfun:Clinical Trial Design and Data Analysis Functions

Utilities to make your clinical collaborations easier if not fun. It contains functions for designing studies such as Simon 2-stage and group sequential designs and for data analysis such as Jonckheere-Terpstra test and estimating survival quantiles.

Maintained by Venkatraman E. Seshan. Last updated 1 year ago.

fortran

1.9 match 5 stars 7.86 score 124 scripts 8 dependents

modeloriented

shapper:Wrapper of Python Library 'shap'

Provides SHAP explanations of machine learning models. In applied machine learning, there is a strong belief that we need to strike a balance between interpretability and accuracy. However, in the field of interpretable machine learning, there are more and more new ideas for explaining black-box models. One of the best-known methods for local explanations is SHapley Additive exPlanations (SHAP), introduced by Lundberg, S., et al. (2017) <arXiv:1705.07874>. The SHAP method is used to calculate the influence of variables on a particular observation. This method is based on Shapley values, a technique used in game theory. The R package 'shapper' is a port of the Python library 'shap'.

Maintained by Szymon Maksymiuk. Last updated 2 years ago.

2.0 match 58 stars 7.31 score 59 scripts

bioc

variancePartition:Quantify and interpret drivers of variation in multilevel gene expression experiments

Quantify and interpret multiple sources of biological and technical variation in gene expression experiments. Uses a linear mixed model to quantify variation in gene expression attributable to individual, tissue, time point, or technical variables. Includes dream differential expression analysis for repeated measures.

Maintained by Gabriel E. Hoffman. Last updated 2 months ago.

rnaseq geneexpression genesetenrichment differentialexpression batcheffect qualitycontrol regression epigenetics functionalgenomics transcriptomics normalization preprocessing microarray immunooncology software

1.3 match 7 stars 11.69 score 1.1k scripts 3 dependents

gsmolinski

dedupewider:Deduplication Across Multiple Columns

Duplicated data can exist in different rows and columns, and the user may need to treat observations (rows) connected by duplicated data as one observation; e.g. companies can belong to one family (and thus be one company) by sharing some telephone numbers. This package finds rows connected by data in chosen columns and collapses them into one row.

Maintained by Grzegorz Smoliński. Last updated 3 years ago.

3.3 match 4 stars 4.30 score 7 scripts

bioc

monocle:Clustering, differential expression, and trajectory analysis for single-cell RNA-Seq

Monocle performs differential expression and time-series analysis for single-cell expression experiments. It orders individual cells according to progress through a biological process, without knowing ahead of time which genes define progress through that process. Monocle also performs differential expression analysis, clustering, visualization, and other useful tasks on single cell expression data. It is designed to work with RNA-Seq and qPCR data, but could be used with other types as well.

Maintained by Cole Trapnell. Last updated 5 months ago.

immunooncology sequencing rnaseq geneexpression differentialexpression infrastructure dataimport datarepresentation visualization clustering multiplecomparison qualitycontrol cpp

1.6 match 8.89 score 1.6k scripts 2 dependents

tagteam

pec:Prediction Error Curves for Risk Prediction Models in Survival Analysis

Validation of risk predictions obtained from survival models and competing risk models based on censored data using inverse weighting and cross-validation. Most of the 'pec' functionality has been moved to 'riskRegression'.

Maintained by Thomas A. Gerds. Last updated 2 years ago.

1.9 match 7.42 score 512 scripts 26 dependents

robson-fernandes

bnviewer:Bayesian Networks Interactive Visualization and Explainable Artificial Intelligence

Bayesian networks provide an intuitive framework for probabilistic reasoning and its graphical nature can be interpreted quite clearly. Graph based methods of machine learning are becoming more popular because they offer a richer model of knowledge that can be understood by a human in a graphical format. The 'bnviewer' is an R Package that allows the interactive visualization of Bayesian Networks. The aim of this package is to improve the Bayesian Networks visualization over the basic and static views offered by existing packages.

Maintained by Robson Fernandes. Last updated 5 years ago.

bayesian-inference bayesian-network bayesian-networks probabilistic-graphical-models

2.9 match 7 stars 4.86 score 69 scripts 1 dependent

gforge

Gmisc:Descriptive Statistics, Transition Plots, and More

Tools for making the descriptive "Table 1" used in medical articles, a transition plot for showing changes between categories (also known as a Sankey diagram), flow charts by extending the grid package, a method for variable selection based on the SVD, Bézier lines with arrows complementing the ones in the 'grid' package, and more.

Maintained by Max Gordon. Last updated 2 years ago.

cpp

1.3 match 50 stars 10.40 score 233 scripts 2 dependents

jcrodriguez1989

chatgpt:Interface to 'ChatGPT' from R

An 'OpenAI' 'ChatGPT' <https://chat.openai.com/> coding assistant for 'RStudio'. A set of functions and 'RStudio' addins that aim to help the R developer in tedious coding tasks.

Maintained by Juan Cruz Rodriguez. Last updated 3 months ago.

assistant chatgpt gpt-3 gpt-4 hacktoberfest llm nlp openai rstatses rstudio rstudio-addin

2.0 match 321 stars 6.81 score 50 scripts

r-forge

stops:Structure Optimized Proximity Scaling

Methods that use flexible variants of multidimensional scaling (MDS) which incorporate parametric nonlinear distance transformations and trade off goodness of fit against structure considerations to find optimal hyperparameters, also known as structure optimized proximity scaling (STOPS) (Rusch, Mair & Hornik, 2023, <doi:10.1007/s11222-022-10197-w>). The package contains various functions, wrappers, methods and classes for fitting, plotting and displaying different one-way MDS models with ratio, interval and ordinal optimal scaling in a STOPS framework. These essentially cover the functionality of the package smacofx, including Torgerson (classical) scaling with power transformations of dissimilarities, SMACOF MDS with powers of dissimilarities, Sammon mapping with powers of dissimilarities, elastic scaling with powers of dissimilarities, spherical SMACOF with powers of dissimilarities, (ALSCAL) s-stress MDS with powers of dissimilarities, r-stress MDS, MDS with powers of dissimilarities and configuration distances, elastic scaling with powers of dissimilarities and configuration distances, Sammon mapping with powers of dissimilarities and configuration distances, power stress MDS (POST-MDS), approximate power stress, Box-Cox MDS, local MDS, Isomap, curvilinear component analysis (CLCA), curvilinear distance analysis (CLDA) and sparsified (power) multidimensional scaling and (power) multidimensional distance analysis (experimental models from smacofx influenced by CLCA). All of these models can also be fit by optimizing over hyperparameters based on goodness of fit only (i.e., without structure considerations). The package further contains functions for optimization, specifically the adaptive Luus-Jaakola algorithm and a wrapper for Bayesian optimization with treed Gaussian processes with jumps to linear models, and functions for various c-structuredness indices.

Maintained by Thomas Rusch. Last updated 2 months ago.

openjdk

3.0 match 1 stars 4.48 score 23 scripts

modeloriented

iBreakDown:Model Agnostic Instance Level Variable Attributions

Model agnostic tool for decomposition of predictions from black boxes. Supports additive attributions and attributions with interactions. The Break Down Table shows contributions of every variable to a final prediction. The Break Down Plot presents variable contributions in a concise graphical way. This package works for classification and regression models. It is an extension of the 'breakDown' package (Staniak and Biecek 2018) <doi:10.32614/RJ-2018-072>, with new and faster strategies for orderings. It supports interactions in explanations and has interactive visuals (implemented with the 'D3.js' library). The methodology is described in the 'iBreakDown' article (Gosiewska and Biecek 2019) <arXiv:1903.11420>. This package is part of the 'DrWhy.AI' universe (Biecek 2018) <arXiv:1806.08915>.

Maintained by Przemyslaw Biecek. Last updated 1 year ago.

breakdown iml interpretability shapley xai

1.3 match 84 stars 10.07 score 56 scripts 22 dependents

uscbiostats

partition:Agglomerative Partitioning Framework for Dimension Reduction

A fast and flexible framework for agglomerative partitioning. 'partition' uses an approach called Direct-Measure-Reduce to create new variables that maintain the user-specified minimum level of information. Each reduced variable is also interpretable: the original variables map to one and only one variable in the reduced data set. 'partition' is flexible, as well: how variables are selected to reduce, how information loss is measured, and the way data is reduced can all be customized. 'partition' is based on the Partition framework discussed in Millstein et al. (2020) <doi:10.1093/bioinformatics/btz661>.

Maintained by Malcolm Barrett. Last updated 4 months ago.

data-reduction dimensionality-reduction partitional-clustering openblas cpp

1.7 match 36 stars 7.72 score 27 scripts 1 dependents

cwolock

survML:Tools for Flexible Survival Analysis Using Machine Learning

Statistical tools for analyzing time-to-event data using machine learning. Implements survival stacking for conditional survival estimation, standardized survival function estimation for current status data, and methods for algorithm-agnostic variable importance. See Wolock CJ, Gilbert PB, Simon N, and Carone M (2024) <doi:10.1080/10618600.2024.2304070>.

Maintained by Charles Wolock. Last updated 2 months ago.

1.6 match 16 stars 8.06 score 73 scripts 1 dependents

angelospsy

multifear:Multiverse Analyses for Conditioning Data

A suite of functions for performing analyses of conditioning data based on a multiverse approach. Specifically, given the appropriate data, the functions can perform t-tests, analyses of variance, and mixed models, and return summary statistics and plots. For all of these tests, the functions can also return p-values, confidence intervals, and Bayes factors. The methods are described in Lonsdorf, Gerlicher, Klingelhofer-Jens, & Krypotos (2022) <doi:10.1016/j.brat.2022.104072>.

Maintained by Angelos-Miltiadis Krypotos. Last updated 1 year ago.

conditioning multiverse

3.1 match 3 stars 4.18 score 7 scripts

roelandkindt

BiodiversityR:Package for Community Ecology and Suitability Analysis

Graphical User Interface (via the R-Commander) and utility functions (often based on the vegan package) for statistical analysis of biodiversity and ecological communities, including species accumulation curves, diversity indices, Renyi profiles, GLMs for analysis of species abundance and presence-absence, distance matrices, Mantel tests, and cluster, constrained and unconstrained ordination analysis. A book on biodiversity and community ecology analysis is available for free download from the website. In 2012, methods for (ensemble) suitability modelling and mapping were expanded in the package.

Maintained by Roeland Kindt. Last updated 2 months ago.

1.8 match 16 stars 7.42 score 390 scripts 2 dependents

michaelchirico

potools:Tools for Internationalization and Portability in R Packages

Translating messages in R packages is managed using the 'po' top-level directory and the 'gettext' program. This package provides helper functions for building this support into R packages, e.g. common validation and I/O tasks.

Maintained by Michael Chirico. Last updated 9 months ago.

i18n translation

1.8 match 59 stars 7.20 score 15 scripts

bioc

glmSparseNet:Network Centrality Metrics for Elastic-Net Regularized Models

glmSparseNet is an R package that generalizes sparse regression models when the features (e.g. genes) have a graph structure (e.g. protein-protein interactions) by including network-based regularizers. glmSparseNet uses the glmnet R package, including centrality measures of the network as penalty weights in the regularization. The current version implements regularization based on node degree, i.e. the strength and/or number of its associated edges, either by promoting hubs or by promoting orphan genes in the solution. All the glmnet distribution families are supported, namely "gaussian", "poisson", "binomial", "multinomial", "cox", and "mgaussian".

Maintained by André Veríssimo. Last updated 5 months ago.

software statisticalmethod dimensionreduction regression classification survival network graphandnetwork

1.7 match 6 stars 7.42 score 41 scripts 1 dependents

mrc-ide

odin2:Next generation odin

Temporary package for rewriting odin.

Maintained by Rich FitzJohn. Last updated 2 months ago.

2.0 match 5 stars 6.32 score 22 scripts

bioc

psichomics:Graphical Interface for Alternative Splicing Quantification, Analysis and Visualisation

Interactive R package with an intuitive Shiny-based graphical interface for alternative splicing quantification and integrative analyses of alternative splicing and gene expression based on The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression project (GTEx), Sequence Read Archive (SRA) and user-provided data. The tool interactively performs survival, dimensionality reduction and median- and variance-based differential splicing and gene expression analyses that benefit from the incorporation of clinical and molecular sample-associated features (such as tumour stage or survival). Interactive visual access to genomic mapping and functional annotation of selected alternative splicing events is also included.

Maintained by Nuno Saraiva-Agostinho. Last updated 5 months ago.

sequencing rnaseq alternativesplicing differentialsplicing transcription gui principalcomponent survival biomedicalinformatics transcriptomics immunooncology visualization multiplecomparison geneexpression differentialexpression alternative-splicing bioconductor data-analyses differential-gene-expression differential-splicing-analysis gene-expression gtex recount2 rna-seq-data splicing-quantification sra tcga vast-tools cpp

1.8 match 36 stars 6.95 score 31 scripts

kjhealy

gssrdoc:Document General Social Survey Variable

The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS, see <http://gss.norc.org>.

Maintained by Kieran Healy. Last updated 11 months ago.

5.4 match 2.28 score 38 scripts

teebusch

mifa:Multiple Imputation for Exploratory Factor Analysis

Impute the covariance matrix of incomplete data so that factor analysis can be performed. Imputations are made using Multivariate Imputation by Chained Equations (MICE) and combined with Rubin's rules. Parametric Fieller confidence intervals and nonparametric bootstrap confidence intervals can be obtained for the variance explained by different numbers of principal components. The method is described in Nassiri et al. (2018) <doi:10.3758/s13428-017-1013-4>.

Maintained by Tobias Busch. Last updated 4 years ago.

factor-analysis imputation

4.1 match 2 stars 3.00 score 5 scripts

mdsteiner

EFAtools:Fast and Flexible Implementations of Exploratory Factor Analysis Tools

Provides functions to perform exploratory factor analysis (EFA) procedures and compare their solutions. The goal is to provide state-of-the-art factor retention methods and a high degree of flexibility in the EFA procedures. This way, for example, implementations from R 'psych' and 'SPSS' can be compared. Moreover, functions for Schmid-Leiman transformation and the computation of omegas are provided. To speed up the analyses, some of the iterative procedures, like principal axis factoring (PAF), are implemented in C++.

Maintained by Markus Steiner. Last updated 3 months ago.

openblas cpp openmp

1.9 match 10 stars 6.57 score 83 scripts 1 dependents

daniel-jg

BeviMed:Bayesian Evaluation of Variant Involvement in Mendelian Disease

A fast integrative genetic association test for rare diseases based on a model for disease status given allele counts at rare variant sites. Probability of association, mode of inheritance and probability of pathogenicity for individual variants are all inferred in a Bayesian framework: 'A Fast Association Test for Identifying Pathogenic Variants Involved in Rare Diseases', Greene et al. 2017 <doi:10.1016/j.ajhg.2017.05.015>.

Maintained by Daniel Greene. Last updated 10 months ago.

cpp

3.6 match 1 stars 3.41 score 17 scripts

lygitdata

PaLMr:Interface for 'Google Pathways Language Model 2 (PaLM 2)'

'Google's 'PaLM 2' <https://developers.generativeai.google/> as a coding and writing assistant designed for 'R' and 'RStudio'. It offers a range of functions, including natural language processing and coding optimization, to help R developers simplify tedious coding tasks and content searching.

Maintained by Li Yuan. Last updated 1 year ago.

ai google gpt machine-learning nlp palm-api palm2 rstudio

3.5 match 6 stars 3.48 score

jeremygelb

geocmeans:Implementing Methods for Spatial Fuzzy Unsupervised Classification

Provides functions to apply spatial fuzzy unsupervised classification and to visualize and interpret the results. This method is well suited when the user wants to analyze data with a fuzzy clustering algorithm and to account for the spatial dimension of the dataset. In addition, indexes for estimating the spatial consistency and classification quality are proposed. The methods were originally proposed in the field of brain imagery (see Cai et al. 2007 <doi:10.1016/j.patcog.2006.07.011> and Zhao et al. 2013 <doi:10.1016/j.dsp.2012.09.016>) and recently applied in geography (see Gelb and Apparicio <doi:10.4000/cybergeo.36414>).

Maintained by Jeremy Gelb. Last updated 4 months ago.

clustering cmeans fuzzy-classification-algorithms spatial-analysis spatial-fuzzy-cmeans unsupervised-learning cpp openmp

2.0 match 27 stars 6.08 score 90 scripts

niaid

HDStIM:High Dimensional Stimulation Immune Mapping ('HDStIM')

A method for identifying responses to experimental stimulation in mass or flow cytometry that uses high dimensional analysis of measured parameters and can be performed with an end-to-end unsupervised approach. In the context of in vitro stimulation assays where high-parameter cytometry was used to monitor intracellular response markers, using cell populations annotated either through automated clustering or manual gating for a combined set of stimulated and unstimulated samples, 'HDStIM' labels cells as responding or non-responding. The package also provides auxiliary functions to rank intracellular markers based on their contribution to identifying responses and generating diagnostic plots.

Maintained by Rohit Farmer. Last updated 1 year ago.

complexheatmap assay cytof cytometry cytometry-analysis-pipeline flowcytometry stimulation

2.8 match 3 stars 4.41 score 17 scripts

lleisong

itsdm:Isolation Forest-Based Presence-Only Species Distribution Modeling

Collection of R functions for purely presence-only species distribution modeling with isolation forest (iForest) and its variations, such as Extended isolation forest and SCiForest. See the details of these methods in references: Liu, F.T., Ting, K.M. and Zhou, Z.H. (2008) <doi:10.1109/ICDM.2008.17>, Hariri, S., Kind, M.C. and Brunner, R.J. (2019) <doi:10.1109/TKDE.2019.2947676>, Liu, F.T., Ting, K.M. and Zhou, Z.H. (2010) <doi:10.1007/978-3-642-15883-4_18>, Guha, S., Mishra, N., Roy, G. and Schrijvers, O. (2016) <https://proceedings.mlr.press/v48/guha16.html>, Cortes, D. (2021) <arXiv:2110.13402>. Additionally, Shapley values are used to explain model inputs and outputs. See details in references: Shapley, L.S. (1953) <doi:10.1515/9781400881970-018>, Lundberg, S.M. and Lee, S.I. (2017) <https://dl.acm.org/doi/abs/10.5555/3295222.3295230>, Molnar, C. (2020) <ISBN:978-0-244-76852-2>, Štrumbelj, E. and Kononenko, I. (2014) <doi:10.1007/s10115-013-0679-x>. itsdm also provides functions to diagnose variable response, analyze variable importance, draw spatial dependence of variables and examine variable contribution. As utilities, the package includes a few functions to download bioclimatic variables, including 'WorldClim' version 2.0 (see Fick, S.E. and Hijmans, R.J. (2017) <doi:10.1002/joc.5086>) and 'CMCC-BioClimInd' (see Noce, S., Caporaso, L. and Santini, M. (2020) <doi:10.1038/s41597-020-00726-5>).

Maintained by Lei Song. Last updated 2 years ago.

isolation-forest outlier-detection presence-onlymodel shapley-value species-distribution-modelling

2.2 match 4 stars 5.59 score 65 scripts

kharchenkolab

conos:Clustering on Network of Samples

Wires together large collections of single-cell RNA-seq datasets, which allows for both the identification of recurrent cell clusters and the propagation of information between datasets in multi-sample or atlas-scale collections. 'Conos' focuses on the uniform mapping of homologous cell types across heterogeneous sample collections. For instance, users could investigate a collection of dozens of peripheral blood samples from cancer patients combined with dozens of controls, which perhaps includes samples of a related tissue such as lymph nodes. This package interacts with data available through the 'conosPanel' package, which is available in a 'drat' repository. To access this data package, see the instructions at <https://github.com/kharchenkolab/conos>. The size of the 'conosPanel' package is approximately 12 MB.

Maintained by Evan Biederstedt. Last updated 1 year ago.

batch-correction scrna-seq single-cell-rna-seq openblas cpp openmp

1.7 match 204 stars 7.32 score 258 scripts

rafajpsantos

bagged.outliertrees:Robust Explainable Outlier Detection Based on OutlierTree

Bagged OutlierTrees is an explainable unsupervised outlier detection method based on an ensemble implementation of the existing OutlierTree procedure (Cortes, 2020). This implementation takes advantage of bootstrap aggregating (bagging) to improve robustness by reducing the possible masking effect and subsequent high variance (similarly to Isolation Forest), hence the name "Bagged OutlierTrees". To learn more about the base procedure OutlierTree (Cortes, 2020), please refer to <arXiv:2001.00636>.

Maintained by Rafael Santos. Last updated 4 years ago.

3.4 match 6 stars 3.48 score 8 scripts

gadenbuie

regexplain:RStudio Addin to Explain, Test and Build Regular Expressions

A set of RStudio Addins to help interactively test and build regular expressions. Provides a Shiny gadget interface for interactively constructing the regular expression and viewing the results from common string-searching functions. The gadget interface includes a helpful regex syntax reference sheet and a library of common patterns.

Maintained by Garrick Aden-Buie. Last updated 4 years ago.

gadget regex regex-expression regular-expression rstudio-addin shiny stringr

2.9 match 486 stars 4.07 score 12 scripts

jbgruber

askgpt:Asking GPT About R Stuff

A chat package connecting to API endpoints by 'OpenAI' (<https://platform.openai.com/>) to answer questions (about R).

Maintained by Johannes Gruber. Last updated 10 months ago.

2.0 match 56 stars 5.68 score 17 scripts

ccy-dev

LongDat:A Tool for 'Covariate'-Sensitive Longitudinal Analysis on 'omics' Data

This tool takes a longitudinal dataset as input and analyzes whether the features change significantly over time (a proxy for treatments), while simultaneously detecting and controlling for 'covariates'. 'LongDat' can take several data types as input, including count, proportion, binary, ordinal and continuous data. The output table contains the p-values, effect sizes and 'covariates' of each feature, making downstream analysis easy.

Maintained by Chia-Yu Chen. Last updated 4 months ago.

2.4 match 4 stars 4.60 score 4 scripts

julia-wrobel

registr:Curve Registration for Exponential Family Functional Data

A method for performing joint registration and functional principal component analysis for curves (functional data) that are generated from exponential family distributions. This mainly implements the algorithms described in 'Wrobel et al. (2019)' <doi:10.1111/biom.12963> and further adapts them to potentially incomplete curves where (some) curves are not observed from the beginning and/or until the end of the common domain. Curve registration can be used to better understand patterns in functional data by separating curves into phase and amplitude variability. This software handles both binary and continuous functional data, and is especially applicable in accelerometry and wearable technology.

Maintained by Julia Wrobel. Last updated 3 years ago.

openblas cpp

1.7 match 16 stars 6.27 score 29 scripts

joshwlambert

DAISIEprep:Extracts Phylogenetic Island Community Data from Phylogenetic Trees

Extracts colonisation and branching times of island species to be used for analysis in the R package 'DAISIE'. It uses phylogenetic and endemicity data to extract the separate island colonists and store them.

Maintained by Joshua W. Lambert. Last updated 1 month ago.

data-science island-biogeography phylogenetics

1.6 match 6 stars 6.78 score 24 scripts

modeloriented

triplot:Explaining Correlated Features in Machine Learning Models

Tools for exploring the effects of correlated features in predictive models. The predict_triplot() function delivers instance-level explanations that calculate the importance of groups of explanatory variables. The model_triplot() function delivers data-level explanations. The generic plot function concisely visualises the importance of hierarchical groups of predictors. All of the tools are model agnostic and therefore work for any predictive machine learning model. Find more details in Biecek (2018) <arXiv:1806.08915>.

Maintained by Katarzyna Pekala. Last updated 4 years ago.

explanations explanatory-model-analysis machine-learning model-visualization xai

2.9 match 9 stars 3.65 score 7 scripts

laylaparast

SBdecomp:Estimation of the Proportion of SB Explained by Confounders

Uses parametric and nonparametric methods to quantify the proportion of the estimated selection bias (SB) explained by each observed confounder when estimating propensity score weighted treatment effects. Parast, L and Griffin, BA (2020). "Quantifying the Bias due to Observed Individual Confounders in Causal Treatment Effect Estimates". Statistics in Medicine, 39(18): 2447-2476 <doi:10.1002/sim.8549>.

Maintained by Layla Parast. Last updated 3 years ago.

5.3 match 1 stars 2.00 score

mgondan

mathml:Translate R Expressions to 'MathML' and 'LaTeX'/'MathJax'

Translate R expressions to 'MathML' or 'MathJax'/'LaTeX' so that they can be rendered in R markdown documents and shiny apps. This package depends on R package 'rolog', which requires an installation of the 'SWI'-'Prolog' runtime either from 'swi-prolog.org' or from R package 'rswipl'.

Maintained by Matthias Gondan. Last updated 5 hours ago.

1.6 match 4 stars 6.46 score 32 scripts

maarten14c

rice:Radiocarbon Equations

Provides functions for the calibration of radiocarbon dates, as well as options to calculate different radiocarbon realms (C14 age, F14C, pMC, D14C) and estimating the effects of contamination or local reservoir offsets (Reimer and Reimer 2001 <doi:10.1017/S0033822200038339>). The methods follow long-established recommendations such as Stuiver and Polach (1977) <doi:10.1017/S0033822200003672> and Reimer et al. (2004) <doi:10.1017/S0033822200033154>. This package complements the data package 'rintcal'.

Maintained by Maarten Blaauw. Last updated 2 months ago.

1.7 match 1 stars 6.13 score 13 scripts 4 dependents

bioc

StructuralVariantAnnotation:Variant annotations for structural variants

StructuralVariantAnnotation provides a framework for analysis of structural variants within the Bioconductor ecosystem. This package contains useful helper functions for dealing with structural variants in VCF format. The package contains functions for parsing VCFs from a number of popular callers as well as functions for dealing with breakpoints involving two separate genomic loci encoded as GRanges objects.

Maintained by Daniel Cameron. Last updated 5 months ago.

dataimport sequencing annotation genetics variantannotation

1.7 match 6.26 score 102 scripts 2 dependents

g-rho

xgrove:Explanation Groves

Compute surrogate explanation groves for predictive machine learning models and analyze complexity vs. explanatory power of an explanation according to Szepannek, G. and von Holt, B. (2023) <doi:10.1007/s41237-023-00205-2>.

Maintained by Gero Szepannek. Last updated 2 months ago.

3.0 match 3.40 score 1 scripts

bcjaeger

r2glmm:Computes R Squared for Mixed (Multilevel) Models

The model R squared and semi-partial R squared for the linear and generalized linear mixed model (LMM and GLMM) are computed with confidence limits. The R squared measure from Edwards et al. (2008) <DOI:10.1002/sim.3429> is extended to the GLMM using penalized quasi-likelihood (PQL) estimation (see Jaeger et al. 2016 <DOI:10.1080/02664763.2016.1193725>). Three methods of computation are provided and described as follows. First, the Kenward-Roger approach. Due to some inconsistency between the 'pbkrtest' package and the 'glmmPQL' function, the Kenward-Roger approach in the 'r2glmm' package is limited to the LMM. Second, the method introduced by Nakagawa and Schielzeth (2013) <DOI:10.1111/j.2041-210x.2012.00261.x> and later extended by Johnson (2014) <DOI:10.1111/2041-210X.12225>. The 'r2glmm' package only computes marginal R squared for the LMM and does not generalize the statistic to the GLMM; however, confidence limits and semi-partial R squared for fixed effects are useful additions. Lastly, an approach using standardized generalized variance (SGV) can be used for covariance model selection. Package installation instructions can be found in the readme file.

Maintained by Byron Jaeger. Last updated 10 months ago.

1.6 match 16 stars 6.29 score 243 scripts

philipppro

measures:Performance Measures for Statistical Learning

Provides the largest collection of statistical performance measures in the R world. Includes measures for regression, (multiclass) classification and multilabel classification. The measures come mainly from the 'mlr' package and were programmed by several 'mlr' developers.

Maintained by Philipp Probst. Last updated 4 years ago.

2.3 match 1 stars 4.47 score 88 scripts 2 dependents

bioc

peco:A Supervised Approach for Predicting Cell Cycle Progression using scRNA-seq data

Our approach provides a way to assign continuous cell cycle phase using scRNA-seq data and, consequently, to identify cyclic trends in gene expression levels along the cell cycle. This package provides the method and training data, which include scRNA-seq data collected from 6 individual cell lines of induced pluripotent stem cells (iPSCs), as well as continuous cell cycle phase derived from FUCCI fluorescence imaging data.

Maintained by Chiaowen Joyce Hsiao. Last updated 5 months ago.

sequencing rnaseq geneexpression transcriptomics singlecell software statisticalmethod classification visualization cell-cycle single-cell-rna-seq

1.7 match 12 stars 6.09 score 34 scripts

bioc

ramwas:Fast Methylome-Wide Association Study Pipeline for Enrichment Platforms

A complete toolset for methylome-wide association studies (MWAS). It is specifically designed for data from enrichment-based methylation assays, but can be applied to other data as well. The analysis pipeline includes seven steps: (1) scanning aligned reads from BAM files, (2) calculation of quality control measures, (3) creation of methylation score (coverage) matrix, (4) principal component analysis for capturing batch effects and detection of outliers, (5) association analysis with respect to phenotypes of interest while correcting for top PCs and known covariates, (6) annotation of significant findings, and (7) multi-marker analysis (methylation risk score) using elastic net. Additionally, RaMWAS includes tools for joint analysis of methylation and genotype data. This work is published in Bioinformatics, Shabalin et al. (2018) <doi:10.1093/bioinformatics/bty069>.

Maintained by Andrey A Shabalin. Last updated 5 months ago.

dnamethylation sequencing qualitycontrol coverage preprocessing normalization batcheffect principalcomponent differentialmethylation visualization

1.7 match 10 stars 6.08 score 85 scripts

cran

rmlnomogram:Construct Explainable Nomogram for a Machine Learning Model

Construct an explainable nomogram for a machine learning (ML) model, making an ML prediction model usable without a computer application, particularly in situations where a computer, a mobile phone, an internet connection, or access to the application is unreliable. This package enables nomogram creation for any ML prediction model, whereas nomograms are conventionally limited to linear/logistic regression models. The nomogram may indicate the explainability value per feature, e.g., the Shapley additive explanation value, for each individual. However, this package only supports models with categorical predictors, optionally with a single numerical predictor. Detailed methodologies and examples are documented in our vignette, available at <https://htmlpreview.github.io/?https://github.com/herdiantrisufriyana/rmlnomogram/blob/master/doc/ml_nomogram_exemplar.html>.

Maintained by Herdiantri Sufriyana. Last updated 2 months ago.

3.7 match 2.70 score

rgcca-factory

RGCCA:Regularized and Sparse Generalized Canonical Correlation Analysis for Multiblock Data

Multi-block data analysis concerns the analysis of several sets of variables (blocks) observed on the same group of individuals. The main aims of the RGCCA package are to study the relationships between blocks and to identify subsets of variables of each block which are active in their relationships with the other blocks. This package allows users to (i) run R/SGCCA and related methods, (ii) find the optimal parameters for R/SGCCA, such as regularization parameters (tau or sparsity), (iii) evaluate the stability of the RGCCA results and their significance, and (iv) build predictive models from R/SGCCA. (v) Generic print() and plot() functions apply to all these functionalities.

Maintained by Arthur Tenenhaus. Last updated 8 months ago.

1.3 match 12 stars 7.43 score 74 scripts

ncss-tech

SoilTaxonomy:A System of Soil Classification for Making and Interpreting Soil Surveys

Taxonomic dictionaries, formative element lists, and functions related to the maintenance, development and application of U.S. Soil Taxonomy. Data and functionality are based on official U.S. Department of Agriculture sources including the latest edition of the Keys to Soil Taxonomy. Descriptions and metadata are obtained from the National Soil Information System or Soil Survey Geographic databases. Other sources are referenced in the data documentation. Provides tools for understanding and interacting with concepts in the U.S. Soil Taxonomic System. Most of the current utilities are for working with taxonomic concepts at the "higher" taxonomic levels: Order, Suborder, Great Group, and Subgroup.

Maintained by Andrew Brown. Last updated 6 months ago.

great-group ncss-tech soil soil-survey soil-taxonomy subgroup suborder usda

1.8 match 15 stars 5.65 score

jiscah

sequoia:Pedigree Inference from SNPs

Multi-generational pedigree inference from incomplete data on hundreds of SNPs, including parentage assignment and sibship clustering. See Huisman (2017) (<DOI:10.1111/1755-0998.12665>) for more information.

Maintained by Jisca Huisman. Last updated 9 months ago.

pedigree pedigree-reconstruction pedigrees sequoia snp snp-data fortran

1.3 match 26 stars 7.40 score 79 scripts

plantedml

glex:Global Explanations for Tree-Based Models

Global explanations for tree-based models by decomposing regression or classification functions into the sum of main components and interaction components of arbitrary order. Calculates SHAP values and q-interaction SHAP for all values of q for tree-based models such as xgboost.

Maintained by Marvin N. Wright. Last updated 3 days ago.

cpp

2.0 match 5 stars 4.75 score 15 scripts

mathurlabstanford

multibiasmeta:Sensitivity Analysis for Multiple Biases in Meta-Analyses

Meta-analyses can be compromised by studies' internal biases (e.g., confounding in nonrandomized studies) as well as by publication bias. This package conducts sensitivity analyses for the joint effects of these biases (per Mathur (2022) <doi:10.31219/osf.io/u7vcb>). These sensitivity analyses address two questions: (1) For a given severity of internal bias across studies and of publication bias, how much could the results change?; and (2) For a given severity of publication bias, how severe would internal bias have to be, hypothetically, to attenuate the results to the null or by a given amount?

Maintained by Peter Solymos. Last updated 2 years ago.

2.4 match 4.00 score 6 scripts

project-gen3sis

gen3sis:General Engine for Eco-Evolutionary Simulations

Contains an engine for spatially-explicit eco-evolutionary mechanistic models with a modular implementation and several support functions. It allows exploring the consequences of ecological and macroevolutionary processes for biodiversity patterns in general, across realistic or theoretical spatio-temporal landscapes. Reference: Oskar Hagen, Benjamin Flueck, Fabian Fopp, Juliano S. Cabral, Florian Hartig, Mikael Pontarp, Thiago F. Rangel, Loic Pellissier (2021) "gen3sis: A general engine for eco-evolutionary simulations of the processes that shape Earth's biodiversity" <doi:10.1371/journal.pbio.3001340>.

Maintained by Oskar Hagen. Last updated 1 year ago.

biodiversity ecology evolution mechanistic model modeling simulation cpp

1.3 match 29 stars 7.56 score 69 scripts

rmurden

CJIVE:Canonical Joint and Individual Variation Explained (CJIVE)

Joint and Individual Variation Explained (JIVE) is a method for decomposing multiple datasets obtained on the same subjects into shared structure, structure unique to each dataset, and noise. The two most common implementations are R.JIVE, an iterative approach, and AJIVE, which uses principal angle analysis. JIVE estimates subspaces but interpreting these subspaces can be challenging with AJIVE or R.JIVE. We expand upon insights into AJIVE as a canonical correlation analysis (CCA) of principal component scores. This reformulation, which we call CJIVE, 1) provides an ordering of joint components by the degree of correlation between corresponding canonical variables; 2) uses a computationally efficient permutation test for the number of joint components, which provides a p-value for each component; and 3) can be used to predict subject scores for out-of-sample observations. Please cite the following article when utilizing this package: Murden, R., Zhang, Z., Guo, Y., & Risk, B. (2022) <doi:10.3389/fnins.2022.969510>.

Maintained by Raphiel Murden. Last updated 2 years ago.

5.3 match 1.78 score 12 scripts

jinli22

spm:Spatial Predictive Modeling

Introduces some novel and accurate hybrid methods that combine geostatistical and machine learning methods for spatial predictive modelling. It contains two commonly used geostatistical methods, two machine learning methods, four hybrid methods and two averaging methods. For each method, two functions are provided: one assesses the predictive errors and accuracy of the method based on cross-validation, and the other generates spatial predictions using the method. For details please see: Li, J., Potter, A., Huang, Z., Daniell, J. J. and Heap, A. (2010) <https://www.ga.gov.au/metadata-gateway/metadata/record/gcat_71407>; Li, J., Heap, A. D., Potter, A., Huang, Z. and Daniell, J. (2011) <doi:10.1016/j.csr.2011.05.015>; Li, J., Heap, A. D., Potter, A. and Daniell, J. (2011) <doi:10.1016/j.envsoft.2011.07.004>; Li, J., Potter, A., Huang, Z. and Heap, A. (2012) <https://www.ga.gov.au/metadata-gateway/metadata/record/74030>.

Maintained by Jin Li. Last updated 3 years ago.

1.7 match 3 stars 5.46 score 107 scripts 3 dependents

dgrun

RaceID:Identification of Cell Types, Inference of Lineage Trees, and Prediction of Noise Dynamics from Single-Cell RNA-Seq Data

Application of 'RaceID' allows inference of cell types and prediction of lineage trees by the 'StemID2' algorithm (Herman, J.S., Sagar, Grün, D. (2018) <DOI:10.1038/nmeth.4662>). 'VarID2' is part of this package and allows quantification of biological gene expression noise at single-cell resolution (Rosales-Alvarez, R.E., Rettkowski, J., Herman, J.S., Dumbovic, G., Cabezas-Wallscheid, N., Grün, D. (2023) <DOI:10.1186/s13059-023-02974-1>).

Maintained by Dominic Grün. Last updated 4 months ago.

cpp

1.9 match 4.74 score 110 scripts

cmlmagneville

mFD:Compute and Illustrate the Multiple Facets of Functional Diversity

Computes functional trait-based distances between pairs of species gathered in assemblages, which allows building several functional spaces. The package also computes functional diversity indices assessing the distribution of species (and of their dominance) in a given functional space for each assemblage, and the overlap between assemblages in a given functional space; see: Chao et al. (2018) <doi:10.1002/ecm.1343>, Maire et al. (2015) <doi:10.1111/geb.12299>, Mouillot et al. (2013) <doi:10.1016/j.tree.2012.10.004>, Mouillot et al. (2014) <doi:10.1073/pnas.1317625111>, Ricotta and Szeidl (2009) <doi:10.1016/j.tpb.2009.10.001>. Graphical outputs are included. Visit the 'mFD' website for more information, documentation and examples.

Maintained by Camille Magneville. Last updated 3 months ago.

1.2 match 26 stars 7.35 score 61 scripts

bioc

pipeComp:pipeComp pipeline benchmarking framework

A simple framework to facilitate the comparison of pipelines involving various steps and parameters. The `pipelineDefinition` class represents pipelines as, minimally, a set of functions consecutively executed on the output of the previous one, and optionally accompanied by step-wise evaluation and aggregation functions. Given such an object, a set of alternative parameters/methods, and benchmark datasets, the `runPipeline` function then proceeds through all combinations of arguments, avoiding recomputing the same step twice and compiling evaluations on the fly to avoid storing potentially large intermediate data.

Maintained by Pierre-Luc Germain. Last updated 5 months ago.

geneexpression transcriptomics clustering datarepresentation benchmark bioconductor pipeline-benchmarking pipelines single-cell-rna-seq

1.3 match 41 stars 7.02 score 43 scripts

bioc

corral:Correspondence Analysis for Single Cell Data

Correspondence analysis (CA) is a matrix factorization method, and is similar to principal components analysis (PCA). Whereas PCA is designed for application to continuous, approximately normally distributed data, CA is appropriate for non-negative, count-based data that are on the same additive scale. The corral package implements CA for dimensionality reduction of a single matrix of single-cell data, as well as a multi-table adaptation of CA that leverages data-optimized scaling to align data generated from different sequencing platforms by projecting into a shared latent space. corral utilizes sparse matrices and a fast implementation of SVD, and can be called directly on Bioconductor objects (e.g., SingleCellExperiment) for easy pipeline integration. The package also includes additional options: variations of CA that address overdispersion in count data (e.g., Freeman-Tukey chi-squared residual), and the option to apply CA-style processing to continuous data (e.g., proteomic TOF intensities) with the Hellinger distance adaptation of CA.

Maintained by Lauren Hsu. Last updated 5 months ago.

batcheffect dimensionreduction geneexpression preprocessing principalcomponent sequencing singlecell software visualization

1.9 match 4.64 score 22 scripts

laylaparast

SurrogateOutcome:Estimation of the Proportion of Treatment Effect Explained by Surrogate Outcome Information

Provides functions to estimate the proportion of treatment effect on a censored primary outcome that is explained by the treatment effect on a censored surrogate outcome/event. All methods are described in detail in Parast, Tian, Cai (2020) "Assessing the Value of a Censored Surrogate Outcome" <doi:10.1007/s10985-019-09473-1>. The main functions are (1) R.q.event() which calculates the proportion of the treatment effect (the difference in restricted mean survival time at time t) explained by surrogate outcome information observed up to a selected landmark time, (2) R.t.estimate() which calculates the proportion of the treatment effect explained by primary outcome information only observed up to a selected landmark time, and (3) IV.event() which calculates the incremental value of the surrogate outcome information.

Maintained by Layla Parast. Last updated 3 years ago.

8.6 match 1.00 score

bioc

SpliceWiz:interactive analysis and visualization of alternative splicing in R

The analysis and visualization of alternative splicing (AS) events from RNA sequencing data remains challenging. SpliceWiz is a user-friendly and performance-optimized R package for AS analysis: it processes alignment BAM files to quantify read counts across splice junctions, performs IRFinder-based intron retention quantitation, and supports the identification of novel splicing events. We introduce a novel visualization for AS using normalized coverage, thereby allowing visualization of differential AS across conditions. SpliceWiz features a shiny-based GUI facilitating interactive data exploration of results including gene ontology enrichment. It is performance-optimized with multi-threaded processing of BAM files and a new COV file format for fast recall of sequencing coverage. Overall, SpliceWiz streamlines AS analysis, enabling reliable identification of functionally relevant AS events for further characterization.

Maintained by Alex Chit Hei Wong. Last updated 4 days ago.

software transcriptomics rnaseq alternativesplicing coverage differentialsplicing differentialexpression gui sequencing cpp openmp

1.3 match 16 stars 6.41 score 8 scripts

soumyaray

air:AI Assistant to Write and Understand R Code

An R console utility that lets you ask R-related questions to the 'OpenAI' large language model. It can answer 'how-to()' questions by providing code, and 'whatis()' questions by explaining what given code does. You must provision your own key for the 'OpenAI' API <https://platform.openai.com/docs/api-reference>.

Maintained by Soumya Ray. Last updated 1 year ago.

2.1 match 14 stars 3.92 score 12 scripts

bioc

weitrix:Tools for matrices with precision weights, test and explore weighted or sparse data

Data type and tools for working with matrices having precision weights and missing data. This package provides a common representation and tools that can be used with many types of high-throughput data. The meaning of the weights is compatible with usage in the base R function "lm" and the package "limma". Calibrate weights to account for known predictors of precision. Find rows with excess variability. Perform differential testing and find rows with the largest confident differences. Find PCA-like components of variation even with many missing values, rotated so that individual components may be meaningfully interpreted. DelayedArray matrices and BiocParallel are supported.

Maintained by Paul Harrison. Last updated 5 months ago.

software datarepresentation dimensionreduction geneexpression transcriptomics rnaseq singlecell regression

1.7 match 4.70 score 8 scripts

oconnellmj

r.jive:Perform JIVE Decomposition for Multi-Source Data

Performs the Joint and Individual Variation Explained (JIVE) decomposition on a list of data sets when the data share a dimension, returning low-rank matrices that capture the joint and individual structure of the data [O'Connell, MJ and Lock, EF (2016) <doi:10.1093/bioinformatics/btw324>]. It provides two methods of rank selection when the rank is unknown, a permutation test and a Bayesian Information Criterion (BIC) selection algorithm. Also included in the package are three plotting functions for visualizing the variance attributed to each data source: a bar plot that shows the percentages of the variability attributable to joint and individual structure, a heatmap that shows the structure of the variability, and principal component plots.

Maintained by Michael J. OConnell. Last updated 4 years ago.

2.5 match 2 stars 3.18 score 75 scripts

lrocconi

mlmhelpr:Multilevel/Mixed Model Helper Functions

A collection of miscellaneous helper functions for running multilevel/mixed models in 'lme4'. This package aims to provide functions to compute common tasks when estimating multilevel models such as computing the intraclass correlation and design effect, centering variables, estimating the proportion of variance explained at each level, pseudo-R squared, random intercept and slope reliabilities, tests for homogeneity of variance at level-1, and cluster robust and bootstrap standard errors. The tests and statistics reported in the package are from Raudenbush & Bryk (2002, ISBN:9780761919049), Hox et al. (2018, ISBN:9781138121362), and Snijders & Bosker (2012, ISBN:9781849202015).

Maintained by Louis Rocconi. Last updated 3 months ago.

2.5 match 1 star 3.00 score 10 scripts

bioc

trigger:Transcriptional Regulatory Inference from Genetics of Gene ExpRession

This R package provides tools for the statistical analysis of integrative genomic data that involve some combination of: genotypes, high-dimensional intermediate traits (e.g., gene expression, protein abundance), and higher-order traits (phenotypes). The package includes functions to: (1) construct global linkage maps between genetic markers and gene expression; (2) analyze multiple-locus linkage (epistasis) for gene expression; (3) quantify the proportion of genome-wide variation explained by each locus and identify eQTL hotspots; (4) estimate pair-wise causal gene regulatory probabilities and construct gene regulatory networks; and (5) identify causal genes for a quantitative trait of interest.

Maintained by John D. Storey. Last updated 5 months ago.

geneexpression snp geneticvariability microarray genetics

2.2 match 3.30 score 3 scripts

bioc

STATegRa:Classes and methods for multi-omics data integration

Classes and tools for multi-omics data integration.

Maintained by David Gomez-Cabrero. Last updated 5 months ago.

software statisticalmethod clustering dimensionreduction principalcomponent

1.8 match 4.15 score 3 scripts

ranjitstat

EEML:Ensemble Explainable Machine Learning Models

We introduce a novel ensemble-based explainable machine learning model using the Model Confidence Set (MCS) and a two-stage Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) algorithm. The model combines the predictive capabilities of different machine learning models with the interpretability of explainability methods. The package has been developed using the algorithm of Paul et al. (2023) <doi:10.1007/s40009-023-01218-x> and Yeasin and Paul (2024) <doi:10.1007/s11227-023-05542-3>.

Maintained by Dr. Ranjit Kumar Paul. Last updated 8 months ago.

5.6 match 1.30 score
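TOPSIS, which EEML uses as part of its model ranking, scores each alternative by how close it lies to an ideal point and how far it lies from an anti-ideal point. The Python sketch below shows only the standard single-stage TOPSIS calculation under stated assumptions; it does not reproduce EEML's two-stage procedure or its Model Confidence Set step, and the function name `topsis`, the models, the criteria and the numbers are all hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Classic single-stage TOPSIS (a sketch, not EEML's two-stage procedure).

    matrix:  one row per alternative (e.g. a candidate model), one column per criterion
    weights: criterion weights (assumed to sum to 1)
    benefit: True if larger is better for that criterion, False if smaller is better
    Returns the relative closeness of each alternative to the ideal (higher is better).
    Assumes no criterion column is all zeros and the alternatives are not all identical.
    """
    n_crit = len(weights)
    # 1. Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # 2. Ideal (best) and anti-ideal (worst) value for each criterion.
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # 3. Euclidean distance of each alternative to both reference points,
    # 4. then relative closeness d_worst / (d_best + d_worst).
    scores = []
    for row in v:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical example: three models scored on RMSE and MAE (both "smaller is better").
errors = [[1.0, 0.8],   # model A
          [2.0, 1.5],   # model B
          [1.5, 1.0]]   # model C
print(topsis(errors, [0.5, 0.5], [False, False]))
```

Model A is best on both error criteria, so it coincides with the ideal point and gets a closeness of 1.0; model B is worst on both and gets 0.0. Vector normalization is used so that criteria on different scales contribute comparably before weighting.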

coffeemuggler

EMMAgeo:End-Member Modelling of Grain-Size Data

End-member modelling analysis of grain-size data is an approach to unmix a data set's underlying distributions and their contribution to the data set. EMMAgeo provides deterministic and robust protocols for that purpose.

Maintained by Michael Dietze. Last updated 5 years ago.

1.8 match 10 stars 4.13 score 27 scripts

kapelner

GreedyExperimentalDesign:Greedy Experimental Design Construction

Computes experimental designs for a two-arm experiment with covariates via a number of methods: (0) complete randomization and randomization with forced balance, (1) greedy optimization of a balance objective function via pairwise switching, which provides lower variance for the treatment effect estimator (and higher power) while preserving a design that is close to complete randomization; all iterations of the designs are returned for use in a permutation test, (2) numerical optimization via 'gurobi' (which must be installed; see <https://www.gurobi.com/documentation/9.1/quickstart_windows/r_ins_the_r_package.html>) à la Bertsimas and Kallus, (3) rerandomization, (4) Karp's method for one covariate, (5) exhaustive enumeration to find the optimal solution (only for small sample sizes), (6) binary pair matching using the 'nbpMatching' library, (7) binary pair matching plus design number (1) to further optimize balance, (8) binary pair matching plus design number (3) to further optimize balance, (9) Hadamard designs, (10) Simultaneous Multiple Kernels. In (1-9) we allow for three objective functions: Mahalanobis distance, sum of absolute differences (standardized), and kernel distances via the 'kernlab' library. This package is the result of a stream of research that can be found in Krieger, A, Azriel, D and Kapelner, A "Nearly Random Designs with Greatly Improved Balance" (2016) <arXiv:1612.02315>, Krieger, A, Azriel, D and Kapelner, A "Better Experimental Design by Hybridizing Binary Matching with Imbalance Optimization" (2021) <arXiv:2012.03330>.

Maintained by Adam Kapelner. Last updated 12 days ago.

cpp openjdk

1.7 match 4.16 score 16 scripts 1 dependent
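The pairwise-switching idea behind GreedyExperimentalDesign's method (1) can be illustrated with a small Python sketch. It is not the package's implementation. Assumptions: the objective is the sum over covariates of the absolute difference in arm means divided by the overall sample standard deviation; each pass applies the single treated/control swap with the largest improvement and stops when no swap helps; `balance` and `greedy_switch` are hypothetical names.

```python
import random
import statistics


def balance(X, w):
    """Imbalance of design w (1 = treated, 0 = control) over covariate rows X.

    Assumption: sum over covariates of |mean(treated) - mean(control)| / sd,
    where sd is the overall sample standard deviation of that covariate
    (each covariate must vary, otherwise sd is zero).
    """
    total = 0.0
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        sd = statistics.stdev(col)
        treated = [x for x, wi in zip(col, w) if wi == 1]
        control = [x for x, wi in zip(col, w) if wi == 0]
        total += abs(statistics.fmean(treated) - statistics.fmean(control)) / sd
    return total


def greedy_switch(X, seed=0):
    """Greedy pairwise switching from a random forced-balance start (a sketch).

    Assumption: best-improvement search; each pass applies the one treated/control
    swap that lowers the imbalance most, and stops when no swap lowers it.
    """
    rng = random.Random(seed)
    n = len(X)
    w = [1] * (n // 2) + [0] * (n - n // 2)   # forced balance: equal arm sizes
    rng.shuffle(w)
    current = balance(X, w)
    while True:
        best_gain, best_pair = 0.0, None
        for i in range(n):
            for k in range(n):
                if w[i] == 1 and w[k] == 0:
                    w[i], w[k] = 0, 1          # tentatively swap units i and k
                    gain = current - balance(X, w)
                    w[i], w[k] = 1, 0          # undo the tentative swap
                    if gain > best_gain:
                        best_gain, best_pair = gain, (i, k)
        if best_pair is None:                  # local optimum: no swap helps
            return w, current
        i, k = best_pair
        w[i], w[k] = 0, 1                      # apply the best swap
        current = balance(X, w)                # recompute to avoid drift


# Hypothetical example: ten units with a single covariate 0..9.
X = [[float(i)] for i in range(10)]
design, imbalance = greedy_switch(X)
print(design, imbalance)
```

Each applied swap strictly lowers the imbalance and there are finitely many designs, so the loop always terminates, though only at a local optimum; keeping the intermediate designs, as the package description says it does, is what makes them usable in a permutation test.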

sunsmiling

PPtreeregViz:Projection Pursuit Regression Tree Visualization

It was developed as a tool for exploring 'PPTreereg' (Projection Pursuit TREE of REGression). It uses various projection pursuit indexes and 'XAI' (eXplainable Artificial Intelligence) methods to help understand the model by finding connections between the input variables and prediction values of the model. The 'KernelSHAP' (Aas, Jullum and Løland (2019) <arXiv:1903.10464>) algorithm was modified to fit 'PPTreereg', and some code was adapted from the 'shapr' package (Sellereite, Nikolai, and Martin Jullum (2020) <doi:10.21105/joss.02027>). The implemented methods help to explore the model at the single instance level as well as at the whole dataset level. Users can compare it with other machine learning models by using it with the 'DALEX' package in 'R'.

Maintained by HyunSun Cho. Last updated 1 year ago.

openblas cpp

2.3 match 2 stars 3.00 score 3 scripts

ataher76

aLBI:Estimating Length-Based Indicators for Fish Stock

Provides tools for estimating length-based indicators from length frequency data to assess fish stock status and manage fisheries sustainably. Implements methods from Cope and Punt (2009) <doi:10.1577/C08-025.1> for data-limited stock assessment and Froese (2004) <doi:10.1111/j.1467-2979.2004.00144.x> for detecting overfishing using simple indicators. Key functions include: FrequencyTable(): Calculates a frequency table from the collected length data and extracts the length frequency data from that table using the upper bound of each length_range. The bin width for class intervals can be given as a numeric value; if it is not provided, it is calculated automatically using the Sturges (1926) <doi:10.1080/01621459.1926.10502161> formula. CalPar(): Calculates various lengths used in fish stock assessment as biological length indicators, such as asymptotic length (Linf), maximum length (Lmax), length at sexual maturity (Lm), and optimal length (Lopt). FishPar(): Calculates length-based indicators (LBIs) proposed by Froese (2004) <doi:10.1111/j.1467-2979.2004.00144.x>, such as the percentage of mature fish (Pmat), percentage of optimal length fish (Popt), percentage of mega spawners (Pmega), and the sum of these as Pobj. This function also estimates confidence intervals for different lengths, visualizes length frequency distributions, and provides data frames containing calculated values. FishSS(): Makes decisions based on input from Cope and Punt (2009) <doi:10.1577/C08-025.1> and parameters calculated by FishPar() (e.g., Pobj, Pmat, Popt, LM_ratio) to determine stock status as target spawning biomass (TSB40) and limit spawning biomass (LSB25). These tools support fisheries management decisions by providing robust, data-driven insights.

Maintained by Ataher Ali. Last updated 4 months ago.

1.5 match 1 star 4.60 score 7 scripts
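The Sturges rule mentioned for aLBI's FrequencyTable() sets the number of classes to roughly 1 + log2(n) for n observations. The Python sketch below is an illustration only, not aLBI's code. It assumes the class count is rounded up (as R's `nclass.Sturges()` does) and that the bin width is the data range divided by that count; `sturges_bins` and `frequency_table` are hypothetical names.

```python
import math


def sturges_bins(n):
    """Number of class intervals by Sturges' rule, k = ceil(1 + log2(n)).

    Assumption: rounding up, as in R's nclass.Sturges().
    """
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(1 + math.log2(n))


def frequency_table(lengths, k=None):
    """Equal-width length-frequency table: list of (lower, upper, count).

    Assumption: the bin width is the data range divided by the class count;
    classes are [lower, upper) except the last, which also holds the maximum.
    """
    if k is None:
        k = sturges_bins(len(lengths))
    lo, hi = min(lengths), max(lengths)
    width = (hi - lo) / k
    counts = [0] * k
    for x in lengths:
        # Class index of x; if all lengths are equal, everything goes in the last class.
        idx = k - 1 if width == 0 else min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


# Hypothetical example: 100 fish lengths (cm) from 1 to 100.
for lower, upper, count in frequency_table(list(range(1, 101))):
    print(f"{lower:7.3f} - {upper:7.3f}: {count}")
```

For 100 observations this gives 8 classes (1 + log2(100) is about 7.64, rounded up), each 99 / 8 = 12.375 cm wide.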

benjaminschlegel

glm.predict:Predicted Values and Discrete Changes for Regression Models

Functions to calculate predicted values and the difference between the two cases with confidence interval for lm() [linear model], glm() [generalized linear model], glm.nb() [negative binomial model], polr() [ordinal logistic model], vglm() [generalized ordinal logistic model], multinom() [multinomial model], tobit() [tobit model], svyglm() [survey-weighted generalised linear models] and lmer() [linear multilevel models] using Monte Carlo simulations or bootstrap. Reference: Bennet A. Zelner (2009) <doi:10.1002/smj.783>.

Maintained by Benjamin E. Schlegel. Last updated 7 months ago.

1.3 match 1 star 5.10 score 55 scripts

nzilbb

nzilbb.vowels:Vowel Covariation Tools

Tools to support research on vowel covariation. Methods are provided to support Principal Component Analysis workflows (as in Brand et al. (2021) <doi:10.1016/j.wocn.2021.101096> and Wilson Black et al. (2023) <doi:10.1515/lingvan-2022-0086>).

Maintained by Joshua Wilson Black. Last updated 3 months ago.

1.8 match 3.88 score 15 scripts

cran

ciu:Contextual Importance and Utility

Implementation of the Contextual Importance and Utility (CIU) concepts for Explainable AI (XAI). A recent description of CIU can be found in e.g. Främling (2020) <arXiv:2009.13996>.

Maintained by Kary Främling. Last updated 2 years ago.

6.8 match 1.00 score

qiuanzhu

remss:Refining Evaluation Methodology on Stage System

T (extent of the primary tumor), N (absence or presence and extent of regional lymph node metastasis) and M (absence or presence of distant metastasis) are three components to describe the anatomical tumor extent. TNM stage is important in treatment decision-making and outcome prediction. The existing oropharyngeal cancer (OPC) TNM stages do not distinguish between the two sub-sites of Human papillomavirus positive (HPV+) and Human papillomavirus negative (HPV-) disease. We developed novel criteria to assess the performance of TNM stage grouping schemes based on parametric modeling adjusting for important clinical factors. These criteria evaluate a TNM stage grouping scheme with five different measures: hazard consistency, hazard discrimination, explained variation, likelihood difference, and balance. The methods are described in Xu, W., et al. (2015) <https://www.austinpublishinggroup.com/biometrics/fulltext/biometrics-v2-id1014.php>.

Maintained by Yi Zhu. Last updated 4 years ago.

2.5 match 2.70 score 2 scripts

cran

GUniFrac:Generalized UniFrac Distances, Distance-Based Multivariate Methods and Feature-Based Univariate Methods for Microbiome Data Analysis

A suite of methods for powerful and robust microbiome data analysis including data normalization, data simulation, community-level association testing and differential abundance analysis. It implements generalized UniFrac distances, Geometric Mean of Pairwise Ratios (GMPR) normalization, semiparametric data simulator, distance-based statistical methods, and feature-based statistical methods. The distance-based statistical methods include three extensions of PERMANOVA: (1) PERMANOVA using the Freedman-Lane permutation scheme, (2) PERMANOVA omnibus test using multiple matrices, and (3) analytical approach to approximating PERMANOVA p-value. Feature-based statistical methods include linear model-based methods for differential abundance analysis of zero-inflated high-dimensional compositional data.

Maintained by Jun Chen. Last updated 2 years ago.

cpp

1.1 match 5.96 score 277 scripts 7 dependents

j-mitchel

scITD:Single-Cell Interpretable Tensor Decomposition

Single-cell Interpretable Tensor Decomposition (scITD) employs the Tucker tensor decomposition to extract multicell-type gene expression patterns that vary across donors/individuals. This tool is geared for use with single-cell RNA-sequencing datasets consisting of many source donors. The method has a wide range of potential applications, including the study of inter-individual variation at the population-level, patient sub-grouping/stratification, and the analysis of sample-level batch effects. Each "multicellular process" that is extracted consists of (A) a multi cell type gene loadings matrix and (B) a corresponding donor scores vector indicating the level at which the corresponding loadings matrix is expressed in each donor. Additional methods are implemented to aid in selecting an appropriate number of factors and to evaluate stability of the decomposition. Additional tools are provided for downstream analysis, including integration of gene set enrichment analysis and ligand-receptor analysis. Tucker, L.R. (1966) <doi:10.1007/BF02289464>. Unkel, S., Hannachi, A., Trendafilov, N. T., & Jolliffe, I. T. (2011) <doi:10.1007/s13253-011-0055-9>. Zhou, G., & Cichocki, A. (2012) <doi:10.2478/v10175-012-0051-4>.

Maintained by Jonathan Mitchel. Last updated 2 years ago.

cpp

3.3 match 1.98 score 19 scripts

l-ramirez-lopez

resemble:Memory-Based Learning in Spectral Chemometrics

Functions for dissimilarity analysis and memory-based learning (MBL, a.k.a local modeling) in complex spectral data sets. Most of these functions are based on the methods presented in Ramirez-Lopez et al. (2013) <doi:10.1016/j.geoderma.2012.12.014>.

Maintained by Leonardo Ramirez-Lopez. Last updated 2 years ago.

chemoinformatics chemometrics infrared-spectroscopy lazy-learning local-regression machine-learning memory-based-learning nir pedometrics soil-spectroscopy spectral-data spectral-library spectroscopy openblas cpp openmp

1.1 match 20 stars 5.91 score 27 scripts