Showing 200 of 407 total results
topepo
caret:Classification and Regression Training
Misc functions for training and plotting classification and regression models.
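A minimal illustrative sketch of the train() interface; assumes the 'randomForest' package backing method = "rf" is installed.
library(caret)
set.seed(123)
# 5-fold cross-validated random forest on the built-in iris data
ctrl <- trainControl(method = "cv", number = 5)
fit  <- train(Species ~ ., data = iris, method = "rf", trControl = ctrl)
fit$results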
Maintained by Max Kuhn. Last updated 4 months ago.
1.6k stars 19.24 score 61k scripts 303 dependents
tidymodels
recipes:Preprocessing and Feature Engineering Steps for Modeling
A recipe prepares your data for modeling. An extensible framework for pipeable sequences of feature engineering steps provides preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.
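A minimal sketch of the estimate-then-apply cycle on built-in data (illustrative only):
library(recipes)
# Declare steps, estimate their parameters on a training set, then apply them
rec <- recipe(mpg ~ ., data = mtcars) |>
  step_center(all_numeric_predictors()) |>
  step_scale(all_numeric_predictors())
prepped <- prep(rec, training = mtcars)
bake(prepped, new_data = mtcars)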
Maintained by Max Kuhn. Last updated 3 days ago.
586 stars 18.80 score 7.2k scripts 383 dependents
tidymodels
tidymodels:Easily Install and Load the 'Tidymodels' Packages
The tidy modeling "verse" is a collection of packages for modeling and statistical analysis that share the underlying design philosophy, grammar, and data structures of the tidyverse.
Maintained by Max Kuhn. Last updated 1 months ago.
783 stars 16.52 score 66k scripts 15 dependents
tidymodels
tune:Tidy Tuning Tools
The ability to tune models is important. 'tune' contains functions and classes to be used in conjunction with other 'tidymodels' packages for finding reasonable values of hyper-parameters in models, pre-processing methods, and post-processing steps.
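An illustrative grid-search sketch; assumes the 'rpart' engine is available.
library(tidymodels)
# Tune a decision-tree cost-complexity parameter with 5-fold cross-validation
spec  <- decision_tree(cost_complexity = tune()) |>
  set_engine("rpart") |>
  set_mode("classification")
wf    <- workflow() |> add_model(spec) |> add_formula(Species ~ .)
folds <- vfold_cv(iris, v = 5)
res   <- tune_grid(wf, resamples = folds, grid = 10)
show_best(res, metric = "accuracy")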
Maintained by Max Kuhn. Last updated 27 days ago.
293 stars 14.27 score 756 scripts 39 dependents
business-science
timetk:A Tool Kit for Working with Time Series
Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.
Maintained by Matt Dancho. Last updated 1 years ago.
coercion, coercion-functions, data-mining, dplyr, forecast, forecasting, forecasting-models, machine-learning, series-decomposition, series-signature, tibble, tidy, tidyquant, tidyverse, time, time-series, timeseries
626 stars 14.20 score 4.0k scripts 16 dependents
tidymodels
workflows:Modeling Workflows
Managing both a 'parsnip' model and a preprocessor, such as a model formula or recipe from 'recipes', can often be challenging. The goal of 'workflows' is to streamline this process by bundling the model alongside the preprocessor, all within the same object.
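A minimal sketch of bundling a recipe and a model specification (illustrative only):
library(parsnip)
library(recipes)
library(workflows)
# Bundle preprocessor and model, then fit both in one step
rec  <- recipe(mpg ~ ., data = mtcars) |> step_normalize(all_numeric_predictors())
spec <- linear_reg() |> set_engine("lm")
wf   <- workflow() |> add_recipe(rec) |> add_model(spec)
fit(wf, data = mtcars)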
Maintained by Simon Couch. Last updated 1 months ago.
207 stars 13.97 score 876 scripts 43 dependents
business-science
tidyquant:Tidy Quantitative Financial Analysis
Bringing business and financial analysis to the 'tidyverse'. The 'tidyquant' package provides a convenient wrapper to various 'xts', 'zoo', 'quantmod', 'TTR' and 'PerformanceAnalytics' package functions and returns the objects in the tidy 'tibble' format. The main advantage is being able to use quantitative functions with the 'tidyverse' functions including 'purrr', 'dplyr', 'tidyr', 'ggplot2', 'lubridate', etc. See the 'tidyquant' website for more information, documentation and examples.
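An illustrative sketch; assumes internet access, and the symbol and dates are placeholders.
library(tidyquant)
# Download daily prices as a tibble, then compute monthly returns
aapl <- tq_get("AAPL", get = "stock.prices", from = "2023-01-01")
aapl |>
  tq_transmute(select = adjusted, mutate_fun = periodReturn, period = "monthly")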
Maintained by Matt Dancho. Last updated 2 months ago.
dplyr, financial-analysis, financial-data, financial-statements, multiple-stocks, performance-analysis, performanceanalytics, quantmod, stock, stock-exchanges, stock-indexes, stock-lists, stock-performance, stock-prices, stock-symbol, tidyverse, time-series, timeseries, xts
872 stars 13.34 score 5.2k scripts
oscarkjell
text:Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning
Link R with Transformers from Hugging Face to transform text variables into word embeddings; the word embeddings can then be used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualise statistically significant words according to various dimensions. For more information see <https://www.r-text.org>.
Maintained by Oscar Kjell. Last updated 9 days ago.
deep-learning, machine-learning, nlp, transformers, openjdk
145 stars 13.21 score 436 scripts 1 dependents
r-dbi
bigrquery:An Interface to Google's 'BigQuery' 'API'
Easily talk to Google's 'BigQuery' database from R.
Maintained by Hadley Wickham. Last updated 1 months ago.
520 stars 12.47 score 1.8k scripts 4 dependents
tidymodels
probably:Tools for Post-Processing Predicted Values
Models can be improved by post-processing class probabilities via recalibration, conversion of probabilities to hard class predictions, assessment of equivocal zones, and other activities. 'probably' contains tools for conducting these operations as well as calibration tools and conformal inference techniques for regression models.
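A sketch using example predictions assumed to ship with the package (segment_logistic):
library(probably)
data(segment_logistic)  # predicted probabilities (.pred_good, .pred_poor) plus truth (Class)
# Flag predictions near 0.5 as equivocal, then profile performance across thresholds
segment_logistic$cls <- make_two_class_pred(
  segment_logistic$.pred_good, levels(segment_logistic$Class),
  threshold = 0.5, buffer = 0.05
)
threshold_perf(segment_logistic, Class, .pred_good, thresholds = seq(0.2, 0.8, by = 0.1))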
Maintained by Max Kuhn. Last updated 6 months ago.
115 stars 12.09 score 21k scripts 1 dependents
tidymodels
workflowsets:Create a Collection of 'tidymodels' Workflows
A workflow is a combination of a model and preprocessors (e.g., a formula, recipe, etc.) (Kuhn and Silge (2021) <https://www.tmwr.org/>). In order to try different combinations of these, an object can be created that contains many workflows. There are functions to create workflows en masse, as well as to train them and visualize the results.
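An illustrative sketch crossing two preprocessors with two model specifications:
library(tidymodels)
library(workflowsets)
# Build every preprocessor x model combination, then resample each workflow
rec_plain <- recipe(mpg ~ ., data = mtcars)
rec_norm  <- rec_plain |> step_normalize(all_numeric_predictors())
wfs <- workflow_set(
  preproc = list(plain = rec_plain, normalized = rec_norm),
  models  = list(lm = linear_reg(), tree = decision_tree(mode = "regression"))
)
wfs |> workflow_map("fit_resamples", resamples = vfold_cv(mtcars, v = 5))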
Maintained by Simon Couch. Last updated 5 months ago.
94 stars 12.04 score 294 scripts 19 dependents
zachmayer
caretEnsemble:Ensembles of Caret Models
Functions for creating ensembles of caret models: caretList() and caretStack(). caretList() is a convenience function for fitting multiple caret::train() models to the same dataset. caretStack() will make linear or non-linear combinations of these models, using a caret::train() model as a meta-model.
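A minimal sketch (illustrative; assumes the 'rpart' engine is installed):
library(caret)
library(caretEnsemble)
set.seed(42)
dat  <- twoClassSim(200)  # simulated two-class data from caret
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE,
                     savePredictions = "final")
models <- caretList(Class ~ ., data = dat, trControl = ctrl,
                    methodList = c("glm", "rpart"))
stack  <- caretStack(models, method = "glm")
summary(stack)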
Maintained by Zachary A. Deane-Mayer. Last updated 3 months ago.
226 stars 11.98 score 780 scripts 1 dependents
hannameyer
CAST:'caret' Applications for Spatial-Temporal Models
Supporting functionality to run 'caret' with spatial or spatial-temporal data. 'caret' is a frequently used package for model training and prediction using machine learning. CAST includes functions to improve spatial or spatial-temporal modelling tasks using 'caret'. It includes the newly suggested 'Nearest neighbor distance matching' cross-validation to estimate the performance of spatial prediction models and allows for spatial variable selection to select suitable predictor variables in view of their contribution to spatial model performance. CAST further includes functionality to estimate the (spatial) area of applicability of prediction models. Methods are described in Meyer et al. (2018) <doi:10.1016/j.envsoft.2017.12.001>; Meyer et al. (2019) <doi:10.1016/j.ecolmodel.2019.108815>; Meyer and Pebesma (2021) <doi:10.1111/2041-210X.13650>; Milà et al. (2022) <doi:10.1111/2041-210X.13851>; Meyer and Pebesma (2022) <doi:10.1038/s41467-022-29838-9>; Linnenbrink et al. (2023) <doi:10.5194/egusphere-2023-1308>; Schumacher et al. (2024) <doi:10.5194/egusphere-2024-2730>. The package is described in detail in Meyer et al. (2024) <doi:10.48550/arXiv.2404.06978>.
Maintained by Hanna Meyer. Last updated 2 months ago.
autocorrelation, caret, feature-selection, machine-learning, overfitting, predictive-modeling, spatial, spatio-temporal, variable-selection
114 stars 11.85 score 298 scripts 1 dependents
tidymodels
stacks:Tidy Model Stacking
Model stacking is an ensemble technique that involves training a model to combine the outputs of many diverse statistical models, and has been shown to improve predictive performance in a variety of settings. 'stacks' implements a grammar for 'tidymodels'-aligned model stacking.
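A compact sketch of the stacking grammar (illustrative; assumes 'glmnet' is installed for blending):
library(tidymodels)
library(stacks)
set.seed(1)
folds <- vfold_cv(mtcars, v = 5)
ctrl  <- control_stack_resamples()   # keep predictions and workflows for stacking
lin   <- workflow(mpg ~ ., linear_reg()) |> fit_resamples(folds, control = ctrl)
tree  <- workflow(mpg ~ ., decision_tree(mode = "regression")) |>
  fit_resamples(folds, control = ctrl)
ens <- stacks() |> add_candidates(lin) |> add_candidates(tree) |>
  blend_predictions() |> fit_members()
predict(ens, mtcars)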
Maintained by Simon Couch. Last updated 5 months ago.
298 stars 11.46 score 840 scripts
tidymodels
textrecipes:Extra 'Recipes' for Text Processing
Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf) and feature hashing.
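A small sketch of the tokenize/filter/tf-idf steps (illustrative; assumes the 'tokenizers' backend is installed):
library(recipes)
library(textrecipes)
# Tokenize a text column, keep the most frequent tokens, then compute tf-idf
df  <- data.frame(text = c("feature engineering for text", "text steps in recipes"))
rec <- recipe(~ text, data = df) |>
  step_tokenize(text) |>
  step_tokenfilter(text, max_tokens = 10) |>
  step_tfidf(text)
rec |> prep() |> bake(new_data = NULL)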
Maintained by Emil Hvitfeldt. Last updated 12 days ago.
160 stars 10.86 score 964 scripts 1 dependents
business-science
modeltime:The Tidymodels Extension for Time Series Modeling
The time series forecasting framework for use with the 'tidymodels' ecosystem. Models include ARIMA, Exponential Smoothing, and additional time series models from the 'forecast' and 'prophet' packages. Refer to "Forecasting Principles & Practice, Second edition" (<https://otexts.com/fpp2/>). Refer to "Prophet: forecasting at scale" (<https://research.facebook.com/blog/2017/02/prophet-forecasting-at-scale/>).
Maintained by Matt Dancho. Last updated 5 months ago.
arima, data-science, deep-learning, ets, forecasting, machine-learning, machine-learning-algorithms, modeltime, prophet, tbats, tidymodeling, tidymodels, time, time-series, time-series-analysis, timeseries, timeseries-forecasting
551 stars 10.61 score 1.1k scripts 7 dependents
tidymodels
themis:Extra Recipes Steps for Dealing with Unbalanced Data
A dataset with an uneven number of cases in each class is said to be unbalanced. Many models produce subpar performance on unbalanced datasets. A dataset can be balanced by increasing the number of minority cases using SMOTE 2011 <doi:10.48550/arXiv.1106.1813>, BorderlineSMOTE 2005 <doi:10.1007/11538059_91> and ADASYN 2008 <https://ieeexplore.ieee.org/document/4633969>, or by decreasing the number of majority cases using NearMiss 2003 <https://www.site.uottawa.ca/~nat/Workshop2003/jzhang.pdf> or Tomek link removal 1976 <https://ieeexplore.ieee.org/document/4309452>.
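An illustrative SMOTE sketch on an artificially unbalanced subset of iris:
library(recipes)
library(themis)
# 50 setosa vs. 10 versicolor; SMOTE oversamples the minority class
imbal <- iris[c(1:50, 51:60), ]
imbal$Species <- factor(imbal$Species)   # drop the unused third level
rec <- recipe(Species ~ ., data = imbal) |> step_smote(Species)
table(bake(prep(rec), new_data = NULL)$Species)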
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
143 stars 10.37 score 1.3k scripts 2 dependents
ludvigolsen
cvms:Cross-Validation for Model Selection
Cross-validate one or multiple regression and classification models and get relevant evaluation metrics in a tidy format. Validate the best model on a test set and compare it to a baseline evaluation. Alternatively, evaluate predictions from an external model. Currently supports regression and classification (binary and multiclass). Described in chp. 5 of Jeyaraman, B. P., Olsen, L. R., & Wambugu M. (2019, ISBN: 9781838550134).
Maintained by Ludvig Renbo Olsen. Last updated 25 days ago.
39 stars 10.31 score 492 scripts 5 dependents
bioc
pRoloc:A unifying bioinformatics framework for spatial proteomics
The pRoloc package implements machine learning and visualisation methods for the analysis and interrogation of quantitative mass spectrometry data to reliably infer protein sub-cellular localisation.
Maintained by Lisa Breckels. Last updated 4 days ago.
immunooncology, proteomics, massspectrometry, classification, clustering, qualitycontrol, bioconductor, proteomics-data, spatial-proteomics, visualisation, openblas, cpp
15 stars 10.31 score 101 scripts 2 dependents
bleutner
RStoolbox:Remote Sensing Data Analysis
Toolbox for remote sensing image processing and analysis such as calculating spectral indexes, principal component transformation, unsupervised and supervised classification or fractional cover analyses.
Maintained by Konstantin Mueller. Last updated 2 months ago.
ggplot2, land-cover-mapping, remote-sensing, spectral-unmixing, supervised-classification, unsupervised-classification, openblas, cpp
275 stars 10.10 score 1.1k scripts
tlverse
sl3:Pipelines for Machine Learning and Super Learning
A modern implementation of the Super Learner prediction algorithm, coupled with a general purpose framework for composing arbitrary pipelines for machine learning tasks.
Maintained by Jeremy Coyle. Last updated 5 months ago.
data-science, ensemble-learning, ensemble-model, machine-learning, model-selection, regression, stacking, statistics
100 stars 9.94 score 748 scripts 7 dependents
ohdsi
CohortConstructor:Build and Manipulate Study Cohorts Using a Common Data Model
Create and manipulate study cohorts in data mapped to the Observational Medical Outcomes Partnership Common Data Model.
Maintained by Edward Burn. Last updated 3 days ago.
2 stars 9.73 score 207 scripts 2 dependents
bblonder
hypervolume:High Dimensional Geometry, Set Operations, Projection, and Inference Using Kernel Density Estimation, Support Vector Machines, and Convex Hulls
Estimates the shape and volume of high-dimensional datasets and performs set operations: intersection / overlap, union, unique components, inclusion test, and hole detection. Uses stochastic geometry approach to high-dimensional kernel density estimation, support vector machine delineation, and convex hull generation. Applications include modeling trait and niche hypervolumes and species distribution modeling.
Maintained by Benjamin Blonder. Last updated 2 months ago.
23 stars 9.69 score 211 scripts 7 dependents
business-science
anomalize:Tidy Anomaly Detection
The 'anomalize' package enables a "tidy" workflow for detecting anomalies in data. The main functions are time_decompose(), anomalize(), and time_recompose(). When combined, it's quite simple to decompose time series, detect anomalies, and create bands separating the "normal" data from the anomalous data at scale (i.e. for multiple time series). Time series decomposition is used to remove trend and seasonal components via the time_decompose() function and methods include seasonal decomposition of time series by Loess ("stl") and seasonal decomposition by piecewise medians ("twitter"). The anomalize() function implements two methods for anomaly detection of residuals including using an inner quartile range ("iqr") and generalized extreme studentized deviation ("gesd"). These methods are based on those used in the 'forecast' package and the Twitter 'AnomalyDetection' package. Refer to the associated functions for specific references for these methods.
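A sketch of the three-step workflow on the bundled CRAN downloads data (illustrative):
library(dplyr)
library(anomalize)
# Decompose, flag anomalous remainders, then recompose bounds for plotting
tidyverse_cran_downloads %>%
  filter(package == "tidyr") %>%
  time_decompose(count, method = "stl") %>%
  anomalize(remainder, method = "iqr") %>%
  time_recompose() %>%
  plot_anomalies(time_recomposed = TRUE)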
Maintained by Matt Dancho. Last updated 1 years ago.
anomaly, anomaly-detection, decomposition, detect-anomalies, iqr, time-series
339 stars 9.56 score 332 scripts
ndphillips
FFTrees:Generate, Visualise, and Evaluate Fast-and-Frugal Decision Trees
Create, visualize, and test fast-and-frugal decision trees (FFTs) using the algorithms and methods described by Phillips, Neth, Woike & Gaissmaier (2017), <doi:10.1017/S1930297500006239>. FFTs are simple and transparent decision trees for solving binary classification problems. FFTs can be preferable to more complex algorithms because they require very little information, are easy to understand and communicate, and are robust against overfitting.
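An illustrative sketch using the heart disease data bundled with the package:
library(FFTrees)
# Fit a fast-and-frugal tree, then evaluate and plot it on the test set
heart_fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train, data.test = heart.test,
                     decision.labels = c("Low-Risk", "High-Risk"))
plot(heart_fft, data = "test")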
Maintained by Hansjoerg Neth. Last updated 5 months ago.
136 stars 9.53 score 144 scripts
microsoft
finnts:Microsoft Finance Time Series Forecasting Framework
Automated time series forecasting developed by Microsoft Finance. The Microsoft Finance Time Series Forecasting Framework, aka Finn, can be used to forecast any component of the income statement, balance sheet, or any other area of interest by finance. Finn can be used to forecast any numerical quantity over time. While it can be applied outside of the finance domain, Finn was built to meet the needs of financial analysts to better forecast their businesses within a company, and has many built-in features that are specific to the needs of financial forecasters. Happy forecasting!
Maintained by Mike Tokic. Last updated 1 months ago.
business, data-science, feature-selection, finance, finnts, forecasting, machine-learning, microsoft, time-series
194 stars 9.30 score 39 scripts
business-science
sweep:Tidy Tools for Forecasting
Tidies up the forecasting modeling and prediction work flow, extends the 'broom' package with 'sw_tidy', 'sw_glance', 'sw_augment', and 'sw_tidy_decomp' functions for various forecasting models, and enables converting 'forecast' objects to "tidy" data frames with 'sw_sweep'.
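A minimal sketch with a 'forecast' model on a built-in series (illustrative only):
library(forecast)
library(sweep)
# Tidy the fitted model and sweep its forecasts into a tidy data frame
fit <- auto.arima(USAccDeaths)
sw_tidy(fit)
sw_glance(fit)
forecast(fit, h = 12) |> sw_sweep()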
Maintained by Matt Dancho. Last updated 1 years ago.
broom, forecast, forecasting-models, prediction, tidy, tidyverse, time, time-series, timeseries
155 stars 9.23 score 399 scripts 1 dependents
tidymodels
embed:Extra Recipes for Encoding Predictors
Predictors can be converted to one or more numeric representations using a variety of methods. Effect encodings using simple generalized linear models <doi:10.48550/arXiv.1611.09477> or nonlinear models <doi:10.48550/arXiv.1604.06737> can be used. There are also functions for dimension reduction and other approaches.
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
142 stars 9.18 score 1.1k scripts
mlverse
tabnet:Fit 'TabNet' Models for Classification and Regression
Implements the 'TabNet' model by Sercan O. Arik et al. (2019) <doi:10.48550/arXiv.1908.07442> with 'Coherent Hierarchical Multi-label Classification Networks' by Giunchiglia et al. <doi:10.48550/arXiv.2010.10151> and provides a consistent interface for fitting and creating predictions. It's also fully compatible with the 'tidymodels' ecosystem.
Maintained by Christophe Regouby. Last updated 14 hours ago.
109 stars 9.05 score 65 scripts
pedrohcgs
DRDID:Doubly Robust Difference-in-Differences Estimators
Implements the locally efficient doubly robust difference-in-differences (DiD) estimators for the average treatment effect proposed by Sant'Anna and Zhao (2020) <doi:10.1016/j.jeconom.2020.06.003>. The estimator combines inverse probability weighting and outcome regression estimators (also implemented in the package) to form estimators with more attractive statistical properties. Two different estimation methods can be used to estimate the nuisance functions.
Maintained by Pedro H. C. SantAnna. Last updated 6 months ago.
92 stars 8.88 score 133 scripts 5 dependents
evolecolgroup
tidysdm:Species Distribution Models with Tidymodels
Fit species distribution models (SDMs) using the 'tidymodels' framework, which provides a standardised interface to define models and process their outputs. 'tidysdm' expands 'tidymodels' by providing methods for spatial objects, models and metrics specific to SDMs, as well as a number of specialised functions to process occurrences for contemporary and palaeo datasets. The full functionalities of the package are described in Leonardi et al. (2023) <doi:10.1101/2023.07.24.550358>.
Maintained by Andrea Manica. Last updated 24 days ago.
species-distribution-modelling, tidymodels
31 stars 8.82 score 51 scripts
ropensci
weatherOz:An API Client for Australian Weather and Climate Data Resources
Provides automated downloading, parsing and formatting of weather data for Australia through API endpoints provided by the Department of Primary Industries and Regional Development ('DPIRD') of Western Australia and by the Science and Technology Division of the Queensland Government's Department of Environment and Science ('DES'). As well as the Bureau of Meteorology ('BOM') of the Australian government precis and coastal forecasts, and downloading and importing radar and satellite imagery files. 'DPIRD' weather data are accessed through public 'APIs' provided by 'DPIRD', <https://www.agric.wa.gov.au/weather-api-20>, providing access to weather station data from the 'DPIRD' weather station network. Australia-wide weather data are based on data from the Australian Bureau of Meteorology ('BOM') data and accessed through 'SILO' (Scientific Information for Land Owners) Jeffrey et al. (2001) <doi:10.1016/S1364-8152(01)00008-1>. 'DPIRD' data are made available under a Creative Commons Attribution 3.0 Licence (CC BY 3.0 AU) license <https://creativecommons.org/licenses/by/3.0/au/deed.en>. SILO data are released under a Creative Commons Attribution 4.0 International licence (CC BY 4.0) <https://creativecommons.org/licenses/by/4.0/>. 'BOM' data are (c) Australian Government Bureau of Meteorology and released under a Creative Commons (CC) Attribution 3.0 licence or Public Access Licence ('PAL') as appropriate, see <http://www.bom.gov.au/other/copyright.shtml> for further details.
Maintained by Rodrigo Pires. Last updated 1 months ago.
dpird, bom, meteorological-data, weather-forecast, australia, weather, weather-data, meteorology, western-australia, australia-bureau-of-meteorology, western-australia-agriculture, australia-agriculture, australia-climate, australia-weather, api-client, climate, data, rainfall, weather-api
31 stars 8.47 score 40 scripts
tidymodels
tidyposterior:Bayesian Analysis to Compare Models using Resampling Statistics
Bayesian analysis used here to answer the question: "when looking at resampling results, are the differences between models 'real'?" To answer this, a model can be created where the performance statistic is the resampling statistic (e.g. accuracy or RMSE). These values are explained by the model types. In doing this, we can get parameter estimates for each model's effect on performance and make statistical (and practical) comparisons between models. The methods included here are similar to Benavoli et al (2017) <https://jmlr.org/papers/v18/16-305.html>.
Maintained by Max Kuhn. Last updated 6 months ago.
102 stars 8.44 score 273 scripts
tidymodels
finetune:Additional Functions for Model Tuning
The ability to tune models is important. 'finetune' enhances the 'tune' package by providing more specialized methods for finding reasonable values of model tuning parameters. Two racing methods described by Kuhn (2014) <arXiv:1405.6974> are included. An iterative search method using generalized simulated annealing (Bohachevsky, Johnson and Stein, 1986) <doi:10.1080/00401706.1986.10488128> is also included.
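An illustrative racing sketch; assumes the 'rpart' engine is available.
library(tidymodels)
library(finetune)
# ANOVA-based racing discards clearly inferior candidates early
spec  <- decision_tree(cost_complexity = tune()) |>
  set_engine("rpart") |> set_mode("classification")
wf    <- workflow() |> add_model(spec) |> add_formula(Species ~ .)
res   <- tune_race_anova(wf, resamples = vfold_cv(iris, v = 5), grid = 20)
show_best(res, metric = "accuracy")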
Maintained by Max Kuhn. Last updated 8 months ago.
62 stars 8.36 score 704 scripts 1 dependents
cefet-rj-dal
harbinger:A Unified Time Series Event Detection Framework
By analyzing time series, it is possible to observe significant changes in the behavior of observations that frequently characterize events. Events present themselves as anomalies, change points, or motifs. In the literature, there are several methods for detecting events. However, searching for a suitable time series method is a complex task, especially considering that the nature of events is often unknown. This work presents Harbinger, a framework for integrating and analyzing event detection methods. Harbinger contains several state-of-the-art methods described in Salles et al. (2020) <doi:10.5753/sbbd.2020.13626>.
Maintained by Eduardo Ogasawara. Last updated 4 months ago.
18 stars 8.32 score 216 scripts
business-science
modeltime.ensemble:Ensemble Algorithms for Time Series Forecasting with Modeltime
A 'modeltime' extension that implements time series ensemble forecasting methods including model averaging, weighted averaging, and stacking. These techniques are popular methods to improve forecast accuracy and stability.
Maintained by Matt Dancho. Last updated 9 months ago.
ensemble, ensemble-learning, forecast, forecasting, modeltime, stacking, stacking-ensemble, tidymodels, time, time-series, timeseries
77 stars 8.30 score 143 scripts
darwin-eu
DrugUtilisation:Summarise Patient-Level Drug Utilisation in Data Mapped to the OMOP Common Data Model
Summarise patient-level drug utilisation cohorts using data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model. New users and prevalent users cohorts can be generated and their characteristics, indication and drug use summarised.
Maintained by Martí Català. Last updated 2 months ago.
8.20 score 156 scripts 2 dependents
bioc
POMA:Tools for Omics Data Analysis
The POMA package offers a comprehensive toolkit designed for omics data analysis, streamlining the process from initial visualization to final statistical analysis. Its primary goal is to simplify and unify the various steps involved in omics data processing, making it more accessible and manageable within a single, intuitive R package. Emphasizing reproducibility and user-friendliness, POMA leverages the standardized SummarizedExperiment class from Bioconductor, ensuring seamless integration and compatibility with a wide array of Bioconductor tools. This approach guarantees maximum flexibility and replicability, making POMA an essential asset for researchers handling omics datasets. See <https://github.com/pcastellanoescuder/POMAShiny> and Castellano-Escuder et al. (2021) <doi:10.1371/journal.pcbi.1009148> for more details.
Maintained by Pol Castellano-Escuder. Last updated 4 months ago.
batcheffect, classification, clustering, decisiontree, dimensionreduction, multidimensionalscaling, normalization, preprocessing, principalcomponent, regression, rnaseq, software, statisticalmethod, visualization, bioconductor, bioinformatics, data-visualization, dimension-reduction, exploratory-data-analysis, machine-learning, omics-data-integration, pipeline, pre-processing, statistical-analysis, user-friendly, workflow
11 stars 8.16 score 20 scripts 1 dependents
darwin-eu
IncidencePrevalence:Estimate Incidence and Prevalence using the OMOP Common Data Model
Calculate incidence and prevalence using data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model. Incidence and prevalence can be estimated for the total population in a database or for a stratification cohort.
Maintained by Edward Burn. Last updated 21 days ago.
9 stars 7.96 score 102 scripts 1 dependents
brian-j-smith
MachineShop:Machine Learning Models and Tools
Meta-package for statistical and machine learning with a unified interface for model fitting, prediction, performance assessment, and presentation of results. Approaches for model fitting and prediction of numerical, categorical, or censored time-to-event outcomes include traditional regression models, regularization methods, tree-based methods, support vector machines, neural networks, ensembles, data preprocessing, filtering, and model tuning and selection. Performance metrics are provided for model assessment and can be estimated with independent test sets, split sampling, cross-validation, or bootstrap resampling. Resample estimation can be executed in parallel for faster processing and nested in cases of model tuning and selection. Modeling results can be summarized with descriptive statistics; calibration curves; variable importance; partial dependence plots; confusion matrices; and ROC, lift, and other performance curves.
Maintained by Brian J Smith. Last updated 7 months ago.
classification-models, machine-learning, predictive-modeling, regression-models, survival-models
62 stars 7.95 score 121 scripts
bcallaway11
BMisc:Miscellaneous Functions for Panel Data, Quantiles, and Printing Results
These are miscellaneous functions for working with panel data, quantiles, and printing results. For panel data, the package includes functions for making a panel data balanced (that is, dropping missing individuals that have missing observations in any time period), converting id numbers to row numbers, and to treat repeated cross sections as panel data under the assumption of rank invariance. For quantiles, there are functions to make distribution functions from a set of data points (this is particularly useful when a distribution function is created in several steps), to combine distribution functions based on some external weights, and to invert distribution functions. Finally, there are several other miscellaneous functions for obtaining weighted means, weighted distribution functions, and weighted quantiles; to generate summary statistics and their differences for two groups; and to add or drop covariates from formulas.
Maintained by Brantly Callaway. Last updated 2 months ago.
7 stars 7.92 score 110 scripts 8 dependents
tlverse
tmle3:The Extensible TMLE Framework
A general framework supporting the implementation of targeted maximum likelihood estimators (TMLEs) of a diverse range of statistical target parameters through a unified interface. The goal is that the exposed framework be as general as the mathematical framework upon which it draws.
Maintained by Jeremy Coyle. Last updated 5 months ago.
causal-inference, machine-learning, targeted-learning, variable-importance
38 stars 7.91 score 286 scripts 5 dependents
myles-lewis
nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'
Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.
Maintained by Myles Lewis. Last updated 11 days ago.
12 stars 7.90 score 46 scripts
kylebutts
did2s:Two-Stage Difference-in-Differences Following Gardner (2021)
Estimates Two-way Fixed Effects difference-in-differences/event-study models using the approach proposed by Gardner (2021) <doi:10.48550/arXiv.2207.05943>. To avoid the problems caused by OLS estimation of the Two-way Fixed Effects model, this function first estimates the fixed effects and covariates using untreated observations and then in a second stage, estimates the treatment effects.
Maintained by Kyle Butts. Last updated 15 days ago.
97 stars 7.89 score 134 scripts
bioc
mistyR:Multiview Intercellular SpaTial modeling framework
mistyR is an implementation of the Multiview Intercellular SpaTialmodeling framework (MISTy). MISTy is an explainable machine learning framework for knowledge extraction and analysis of single-cell, highly multiplexed, spatially resolved data. MISTy facilitates an in-depth understanding of marker interactions by profiling the intra- and intercellular relationships. MISTy is a flexible framework able to process a custom number of views. Each of these views can describe a different spatial context, i.e., define a relationship among the observed expressions of the markers, such as intracellular regulation or paracrine regulation, but also, the views can also capture cell-type specific relationships, capture relations between functional footprints or focus on relations between different anatomical regions. Each MISTy view is considered as a potential source of variability in the measured marker expressions. Each MISTy view is then analyzed for its contribution to the total expression of each marker and is explained in terms of the interactions with other measurements that led to the observed contribution.
Maintained by Jovan Tanevski. Last updated 5 months ago.
software, biomedicalinformatics, cellbiology, systemsbiology, regression, decisiontree, singlecell, spatial, bioconductor, biology, intercellular, machine-learning, modular, molecular-biology, multiview, spatial-transcriptomics
52 stars 7.87 score 160 scripts
schlosslab
mikropml:User-Friendly R Package for Supervised Machine Learning Pipelines
An interface to build machine learning models for classification and regression problems. 'mikropml' implements the ML pipeline described by Topçuoğlu et al. (2020) <doi:10.1128/mBio.00434-20> with reasonable default options for data preprocessing, hyperparameter tuning, cross-validation, testing, model evaluation, and interpretation steps. See the website <https://www.schlosslab.org/mikropml/> for more information, documentation, and examples.
Maintained by Kelly Sovacool. Last updated 2 years ago.
56 stars 7.83 score 86 scripts
matloff
dsld:Data Science Looks at Discrimination
Statistical and graphical tools for detecting and measuring discrimination and bias, be it racial, gender, age or other. Detection and remediation of bias in machine learning algorithms. 'Python' interfaces available.
Maintained by Norm Matloff. Last updated 2 months ago.
12 stars 7.81 score 35 scripts
ohdsi
PhenotypeR:Assess Study Cohorts Using a Common Data Model
Phenotype study cohorts in data mapped to the Observational Medical Outcomes Partnership Common Data Model. Diagnostics are run at the database, code list, cohort, and population level to assess whether study cohorts are ready for research.
Maintained by Edward Burn. Last updated 2 hours ago.
3 stars 7.76 score 57 scripts
nsaph-software
CausalGPS:Matching on Generalized Propensity Scores with Continuous Exposures
Provides a framework for estimating causal effects of a continuous exposure using observational data, and implementing matching and weighting on the generalized propensity score. Wu, X., Mealli, F., Kioumourtzoglou, M.A., Dominici, F. and Braun, D., 2022. Matching on generalized propensity scores with continuous exposures. Journal of the American Statistical Association, pp.1-29.
Maintained by Naeem Khoshnevis. Last updated 10 months ago.
24 stars 7.67 score 39 scripts
spsanderson
healthyR.ts:The Time Series Modeling Companion to 'healthyR'
Hospital time series data analysis workflow tools, modeling, and automations. This library provides many useful tools to review common administrative time series hospital data. Some of these include average length of stay, and readmission rates. The aim is to provide a simple and consistent verb framework that takes the guesswork out of everything.
Maintained by Steven Sanderson. Last updated 6 months ago.
ai, arima-forecasting, arima-model, ets, forecasting, ggplot2, machine-learning, modeling, prophet, time-series, time-series-analysis, workflows
19 stars 7.58 score 56 scripts 1 dependents
ohdsi
CohortSymmetry:Sequence Symmetry Analysis Using the Observational Medical Outcomes Partnership Common Data Model
Calculating crude sequence ratio, adjusted sequence ratio and confidence intervals using data mapped to the Observational Medical Outcomes Partnership Common Data Model.
Maintained by Xihang Chen. Last updated 7 days ago.
1 stars 7.52 score 73 scripts
risktoollib
RTL:Risk Tool Library - Trading, Risk, Analytics for Commodities
A toolkit for Commodities 'analytics', risk management and trading professionals. Includes functions for API calls to <https://commodities.morningstar.com/#/>, <https://developer.genscape.com/>, and <https://www.bankofcanada.ca/valet/docs>.
Maintained by Philippe Cote. Last updated 1 months ago.
analytics, api, commodities, commodities-api, finance, genscape, morningstar, python, risk-management, cpp
30 stars 7.51 score 198 scripts
ohdsi
OmopSketch:Characterise Tables of an OMOP Common Data Model Instance
Summarises key information in data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model. Assess suitability to perform specific epidemiological studies and explore the different domains to obtain feasibility counts and trends.
Maintained by Cecilia Campanile. Last updated 15 days ago.
2 stars 7.47 score 16 scripts 1 dependents
rgcca-factory
RGCCA:Regularized and Sparse Generalized Canonical Correlation Analysis for Multiblock Data
Multi-block data analysis concerns the analysis of several sets of variables (blocks) observed on the same group of individuals. The main aims of the RGCCA package are: to study the relationships between blocks and to identify subsets of variables of each block which are active in their relationships with the other blocks. This package allows the user to (i) run R/SGCCA and related methods, (ii) find the optimal parameters for R/SGCCA such as regularization parameters (tau or sparsity), (iii) evaluate the stability of the RGCCA results and their significance, (iv) build predictive models from R/SGCCA, and (v) apply generic print() and plot() functions to all these functionalities.
Maintained by Arthur Tenenhaus. Last updated 9 months ago.
12 stars 7.43 score 74 scripts
spsanderson
healthyR.ai:The Machine Learning and AI Modeling Companion to 'healthyR'
Hospital machine learning and ai data analysis workflow tools, modeling, and automations. This library provides many useful tools to review common administrative hospital data. Some of these include predicting length of stay, and readmits. The aim is to provide a simple and consistent verb framework that takes the guesswork out of everything.
Maintained by Steven Sanderson. Last updated 2 months ago.
ai, artificial-intelligence, healthcare, analytics, healthyr, healthyverse, machine-learning
16 stars 7.37 score 36 scripts 1 dependents
spsanderson
healthyR:Hospital Data Analysis Workflow Tools
Hospital data analysis workflow tools, modeling, and automations. This library provides many useful tools to review common administrative hospital data. Some of these include average length of stay, readmission rates, average net pay amounts by service lines just to name a few. The aim is to provide a simple and consistent verb framework that takes the guesswork out of everything.
Maintained by Steven Sanderson. Last updated 9 months ago.
analysis, analytics, healthcare, healthyr
30 stars 7.27 score 103 scripts 1 dependents
bioc
tidytof:Analyze High-dimensional Cytometry Data Using Tidy Data Principles
This package implements an interactive, scientific analysis pipeline for high-dimensional cytometry data built using tidy data principles. It is specifically designed to play well with both the tidyverse and Bioconductor software ecosystems, with functionality for reading/writing data files, data cleaning, preprocessing, clustering, visualization, modeling, and other quality-of-life functions. tidytof implements a "grammar" of high-dimensional cytometry data analysis.
Maintained by Timothy Keyes. Last updated 5 months ago.
singlecell, flowcytometry, bioinformatics, cytometry, data-science, single-cell, tidyverse, cpp
18 stars 7.24 score 35 scripts
tidymodels
tidyclust:A Common API to Clustering
A common interface to specifying clustering models, in the same style as 'parsnip'. Creates unified interface across different functions and computational engines.
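A minimal k-means sketch in the parsnip style (illustrative only):
library(tidyclust)
# Specify, fit, and extract cluster assignments
km_spec <- k_means(num_clusters = 3) |> set_engine("stats")
km_fit  <- fit(km_spec, ~ ., data = mtcars)
extract_cluster_assignment(km_fit)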
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
112 stars 7.21 score 139 scripts
business-science
correlationfunnel:Speed Up Exploratory Data Analysis (EDA) with the Correlation Funnel
Speeds up exploratory data analysis (EDA) by providing a succinct workflow and interactive visualization tools for understanding which features have relationships to target (response). Uses binary correlation analysis to determine relationship. Default correlation method is the Pearson method. Lian Duan, W Nick Street, Yanchi Liu, Songhua Xu, and Brook Wu (2014) <doi:10.1145/2637484>.
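A sketch of the binarize/correlate/plot workflow on the package's bundled marketing data (illustrative):
library(dplyr)
library(correlationfunnel)
# Binarize features, correlate them with the target level, then plot the funnel
marketing_campaign_tbl %>%
  select(-ID) %>%
  binarize(n_bins = 4, thresh_infreq = 0.01) %>%
  correlate(target = TERM_DEPOSIT__yes) %>%
  plot_correlation_funnel(interactive = FALSE)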
Maintained by Matt Dancho. Last updated 1 years ago.
correlation, exploratory-analysis, exploratory-data-analysis, exploratory-data-visualizations, tidyverse
137 stars 7.20 score 115 scripts
roux-ohdsi
allofus:Interface for 'All of Us' Researcher Workbench
Streamline use of the 'All of Us' Researcher Workbench (<https://www.researchallofus.org/data-tools/workbench/>) with tools to extract and manipulate data from the 'All of Us' database. Increase interoperability with the Observational Health Data Science and Informatics ('OHDSI') tool stack by decreasing reliance on 'All of Us' tools and allowing for cohort creation via 'Atlas'. Improve reproducible and transparent research using 'All of Us'.
Maintained by Rob Cavanaugh. Last updated 5 months ago.
16 stars 7.19 score 30 scripts
darwin-eu
DrugExposureDiagnostics:Diagnostics for OMOP Common Data Model Drug Records
Ingredient specific diagnostics for drug exposure records in the Observational Medical Outcomes Partnership (OMOP) common data model.
Maintained by Ger Inberg. Last updated 18 days ago.
4 stars 7.11 score 41 scripts
bioc
animalcules:Interactive microbiome analysis toolkit
animalcules is an R package for utilizing up-to-date data analytics, visualization methods, and machine learning models to provide users an easy-to-use interactive microbiome analysis framework. It can be used as a standalone software package or users can explore their data with the accompanying interactive R Shiny application. Traditional microbiome analysis such as alpha/beta diversity and differential abundance analysis are enhanced, while new methods like biomarker identification are introduced by animalcules. Powerful interactive and dynamic figures generated by animalcules enable users to understand their data better and discover new insights.
Maintained by Jessica McClintock. Last updated 5 months ago.
microbiome, metagenomics, coverage, visualization
55 stars 6.95 score 23 scripts
bioc
pRolocGUI:Interactive visualisation of spatial proteomics data
The package pRolocGUI comprises functions to interactively visualise spatial proteomics data on the basis of pRoloc, pRolocdata and shiny.
Maintained by Lisa Breckels. Last updated 5 months ago.
8 stars 6.90 score 3 scripts
cytomining
cytominer:Methods for Image-Based Cell Profiling
`cytominer` is a suite of common functions used to process high-dimensional readouts from image-based cell profiling experiments.
Maintained by Shantanu Singh. Last updated 2 years ago.
50 stars 6.89 score 44 scripts
tidymodels
agua:'tidymodels' Integration with 'h2o'
Create and evaluate models using 'tidymodels' and 'h2o' <https://h2o.ai/>. The package enables users to specify 'h2o' as an engine for several modeling methods.
Maintained by Qiushi Yan. Last updated 10 months ago.
22 stars 6.88 score 80 scripts
tidymodels
usemodels:Boilerplate Code for 'Tidymodels' Analyses
Code snippets to fit models using the tidymodels framework can be easily created for a given data set.
Maintained by Max Kuhn. Last updated 6 months ago.
84 stars 6.88 score 128 scripts
raymondbalise
rUM:R Templates from the University of Miami
This package holds R Markdown and Quarto templates, plus a template to create a research project in 'RStudio'.
Maintained by Raymond Balise. Last updated 10 days ago.
9 stars 6.84 score 16 scripts
kozodoi
fairness:Algorithmic Fairness Metrics
Offers calculation, visualization and comparison of algorithmic fairness metrics. Fair machine learning is an emerging topic with the overarching aim to critically assess whether ML algorithms reinforce existing social biases. Unfair algorithms can propagate such biases and produce predictions with a disparate impact on various sensitive groups of individuals (defined by sex, gender, ethnicity, religion, income, socioeconomic status, physical or mental disabilities). Fair algorithms possess the underlying foundation that these groups should be treated similarly or have similar prediction outcomes. The fairness R package offers the calculation and comparisons of commonly and less commonly used fairness metrics in population subgroups. These methods are described by Calders and Verwer (2010) <doi:10.1007/s10618-010-0190-x>, Chouldechova (2017) <doi:10.1089/big.2016.0047>, Feldman et al. (2015) <doi:10.1145/2783258.2783311> , Friedler et al. (2018) <doi:10.1145/3287560.3287589> and Zafar et al. (2017) <doi:10.1145/3038912.3052660>. The package also offers convenient visualizations to help understand fairness metrics.
Maintained by Nikita Kozodoi. Last updated 2 years ago.
algorithmic-discrimination, algorithmic-fairness, discrimination, disparate-impact, fairness, fairness-ai, fairness-ml, machine-learning
32 stars 6.82 score 69 scripts 1 dependents
michaellli
evalITR:Evaluating Individualized Treatment Rules
Provides various statistical methods for evaluating Individualized Treatment Rules under randomized data. The provided metrics include Population Average Value (PAV), Population Average Prescription Effect (PAPE), Area Under Prescription Effect Curve (AUPEC). It also provides the tools to analyze Individualized Treatment Rules under budget constraints. Detailed reference in Imai and Li (2019) <arXiv:1905.05389>.
Maintained by Michael Lingzhi Li. Last updated 2 years ago.
14 stars 6.78 score 36 scripts
harrison4192
autostats:Auto Stats
Automatically do statistical exploration. Create formulas using 'tidyselect' syntax, and then determine cross-validated model accuracy and variable contributions using 'glm' and 'xgboost'. Contains additional helper functions to create and modify formulas. Has a flagship function to quickly determine relationships between categorical and continuous variables in the data set.
Maintained by Harrison Tietze. Last updated 26 days ago.
6 stars 6.76 score 5 scripts 2 dependents
bioc
SPONGE:Sparse Partial Correlations On Gene Expression
This package provides methods to efficiently detect competitive endogeneous RNA interactions between two genes. Such interactions are mediated by one or several miRNAs such that both gene and miRNA expression data for a larger number of samples is needed as input. The SPONGE package now also includes spongEffects: ceRNA modules offer patient-specific insights into the miRNA regulatory landscape.
Maintained by Markus List. Last updated 5 months ago.
geneexpression, transcription, generegulation, networkinference, transcriptomics, systemsbiology, regression, randomforest, machinelearning
6.66 score 38 scripts 1 dependents
business-science
modeltime.resample:Resampling Tools for Time Series Forecasting
A 'modeltime' extension that implements forecast resampling tools that assess time-based model performance and stability for a single time series, panel data, and cross-sectional time series analysis.
Maintained by Matt Dancho. Last updated 1 years ago.
accuracy-metrics, backtesting, bootstrap, bootstrapping, cross-validation, forecasting, modeltime, modeltime-resample, resampling, statistics, tidymodels, time-series
19 stars 6.64 score 38 scripts 1 dependents
bioc
scAnnotatR:Pretrained learning models for cell type prediction on single cell RNA-sequencing data
The package comprises a set of pretrained machine learning models to predict basic immune cell types. This enables all users to quickly get a first annotation of the cell types present in their dataset without requiring prior knowledge. scAnnotatR also allows users to train their own models to predict new cell types based on specific research needs.
Maintained by Johannes Griss. Last updated 5 months ago.
singlecell, transcriptomics, geneexpression, supportvectormachine, classification, software
15 stars 6.61 score 20 scripts
spsanderson
tidyAML:Automatic Machine Learning with 'tidymodels'
The goal of this package is to provide a simple interface for automatic machine learning that fits the 'tidymodels' framework. The intention is to work for regression and classification problems with a simple verb framework.
Maintained by Steven Sanderson. Last updated 11 months ago.
automatic-machine-learning, automl, classification, machine-learning, parsnip, r-language, r-programming, regression, tidy, tidymodels, tidyverse
68 stars 6.56 score 36 scripts 1 dependents
tsailintung
fastdid:Fast Staggered Difference-in-Difference Estimators
A fast and flexible implementation of Callaway and Sant'Anna's (2021)<doi:10.1016/j.jeconom.2020.12.001> staggered Difference-in-Differences (DiD) estimators, 'fastdid' reduces the computation time from hours to seconds, and incorporates extensions such as time-varying covariates and multiple events.
Maintained by Lin-Tung Tsai. Last updated 4 months ago.
difference-in-differences, event-study, staggered-did
28 stars 6.56 score 4 scripts
bioc
condiments:Differential Topology, Progression and Differentiation
This package encapsulates many functions to conduct a differential topology analysis. It focuses on analyzing an 'omic dataset with multiple conditions. While the package is mostly geared toward scRNASeq, it does not place any restriction on the actual input format.
Maintained by Hector Roux de Bezieux. Last updated 4 months ago.
rnaseq, sequencing, software, singlecell, transcriptomics, multiplecomparison, visualization
26 stars 6.52 score 17 scripts
statsgary
OddsPlotty:Odds Plot to Visualise a Logistic Regression Model
Uses the outputs of a logistic regression model, from caret <https://CRAN.R-project.org/package=caret>, to build an odds plot. This allows for the rapid visualisation of odds plot ratios and works best with the outputs of CARET's GLM model class, by returning the final trained model.
Maintained by Gary Hutson. Last updated 1 months ago.
17 stars 6.39 score 48 scripts 1 dependents
mattheaphy
actxps:Create Actuarial Experience Studies: Prepare Data, Summarize Results, and Create Reports
Experience studies are used by actuaries to explore historical experience across blocks of business and to inform assumption setting activities. This package provides functions for preparing data, creating studies, visualizing results, and beginning assumption development. Experience study methods, including exposure calculations, are described in: Atkinson & McGarry (2016) "Experience Study Calculations" <https://www.soa.org/49378a/globalassets/assets/files/research/experience-study-calculations.pdf>. The limited fluctuation credibility method used by the 'exp_stats()' function is described in: Herzog (1999, ISBN:1-56698-374-6) "Introduction to Credibility Theory".
Maintained by Matt Heaphy. Last updated 3 months ago.
14 stars 6.38 score 23 scripts
thewileylab
ReviewR:A Light-Weight, Portable Tool for Reviewing Individual Patient Records
A portable Shiny tool to explore patient-level electronic health record data and perform chart review in a single integrated framework. This tool supports browsing clinical data in many different formats including multiple versions of the 'OMOP' common data model as well as the 'MIMIC-III' data model. In addition, chart review information is captured and stored securely via the Shiny interface in a 'REDCap' (Research Electronic Data Capture) project using the 'REDCap' API. See the 'ReviewR' website for additional information, documentation, and examples.
Maintained by David Mayer. Last updated 2 years ago.
24 stars 6.33 score 6 scripts
egeulgen
driveR:Prioritizing Cancer Driver Genes Using Genomics Data
Cancer genomes contain large numbers of somatic alterations but few genes drive tumor development. Identifying cancer driver genes is critical for precision oncology. Most current approaches either identify driver genes based on mutational recurrence or using estimated scores predicting the functional consequences of mutations. 'driveR' is a tool for personalized or batch analysis of genomic data for driver gene prioritization by combining genomic information and prior biological knowledge. As features, 'driveR' uses coding impact metaprediction scores, non-coding impact scores, somatic copy number alteration scores, hotspot gene/double-hit gene condition, 'phenolyzer' gene scores and memberships to cancer-related KEGG pathways. It uses these features to estimate cancer-type-specific probability for each gene of being a cancer driver using the related task of a multi-task learning classification model. The method is described in detail in Ulgen E, Sezerman OU. 2021. driveR: a novel method for prioritizing cancer driver genes using somatic genomics data. BMC Bioinformatics <doi:10.1186/s12859-021-04203-7>.
Maintained by Ege Ulgen. Last updated 2 years ago.
cancer-driverness, driver, driver-gene-prioritization, identify-driver-genes, ranking-genes, scoring
15 stars 6.29 score 260 scripts
bioc
iNETgrate:Integrates DNA methylation data with gene expression in a single gene network
The iNETgrate package provides functions to build a correlation network in which nodes are genes. DNA methylation and gene expression data are integrated to define the connections between genes. This network is used to identify modules (clusters) of genes. The biological information in each of the resulting modules is represented by an eigengene. These biological signatures can be used as features e.g., for classification of patients into risk categories. The resulting biological signatures are very robust and give a holistic view of the underlying molecular changes.
Maintained by Habil Zare. Last updated 5 months ago.
geneexpression, rnaseq, dnamethylation, networkinference, network, graphandnetwork, biomedicalinformatics, systemsbiology, transcriptomics, classification, clustering, dimensionreduction, principalcomponent, mrnamicroarray, normalization, geneprediction, kegg, survival, core-services
74 stars 6.21 score 1 scripts
tidymodels
shinymodels:Interactive Assessments of Models
Launch a 'shiny' application for 'tidymodels' results. For classification or regression models, the app can be used to determine if there is lack of fit or poorly predicted points.
Maintained by Simon Couch. Last updated 5 months ago.
48 stars 6.21 score 48 scripts
jbryer
login:'shiny' Login Module
Framework for adding authentication to 'shiny' applications. Provides flexibility compared to other options in terms of where user credentials are saved, allows users to create their own accounts, and includes password reset functionality. Bryer (2024) <doi:10.5281/zenodo.10987876>.
Maintained by Jason Bryer. Last updated 12 months ago.
21 stars 6.15 score 45 scripts
ipd-tools
ipd:Inference on Predicted Data
Performs valid statistical inference on predicted data (IPD) using recent methods, where for a subset of the data, the outcomes have been predicted by an algorithm. Provides a wrapper function with specified defaults for the type of model and method to be used for estimation and inference. Further provides methods for tidying and summarizing results. Salerno et al., (2024) <doi:10.48550/arXiv.2410.09665>.
Maintained by Stephen Salerno. Last updated 3 months ago.
8 stars 6.13 score 5 scripts
andreanini
idiolect:Forensic Authorship Analysis
Carry out comparative authorship analysis of disputed and undisputed texts within the Likelihood Ratio Framework for expressing evidence in forensic science. This package contains implementations of well-known algorithms for comparative authorship analysis, such as Smith and Aldridge's (2011) Cosine Delta <doi:10.1080/09296174.2011.533591> or Koppel and Winter's (2014) Impostors Method <doi:10.1002/asi.22954>, as well as functions to measure their performance and to calibrate their outputs into Log-Likelihood Ratios.
Maintained by Andrea Nini. Last updated 24 days ago.
14 stars 6.12 score 3 scripts
sentometricsresearch
sentometrics:An Integrated Framework for Textual Sentiment Time Series Aggregation and Prediction
Optimized prediction based on textual sentiment, accounting for the intrinsic challenge that sentiment can be computed and pooled across texts and time in various ways. See Ardia et al. (2021) <doi:10.18637/jss.v099.i02>.
Maintained by Samuel Borms. Last updated 4 years ago.
nlp, prediction, sentiment-analysis, text-mining, time-series, openblas, cpp, openmp
83 stars 6.09 score 49 scripts
nicholasjclark
MRFcov:Markov Random Fields with Additional Covariates
Approximate node interaction parameters of Markov Random Fields graphical networks. Models can incorporate additional covariates, allowing users to estimate how interactions between nodes in the graph are predicted to change across covariate gradients. The general methods implemented in this package are described in Clark et al. (2018) <doi:10.1002/ecy.2221>.
Maintained by Nicholas J Clark. Last updated 1 years ago.
conditional-random-fields, graphical-models, machine-learning, markov-random-field, multivariate-analysis, multivariate-statistics, network-analysis, networks
24 stars 6.03 score 30 scripts
jacekbialek
PriceIndices:Calculating Bilateral and Multilateral Price Indexes
Preparing a scanner data set for price dynamics calculations (data selecting, data classification, data matching, data filtering). Computing bilateral and multilateral indexes. For details on these methods see: Diewert and Fox (2020) <doi:10.1080/07350015.2020.1816176>, Białek (2019) <doi:10.2478/jos-2019-0014> or Białek (2020) <doi:10.2478/jos-2020-0037>.
Maintained by Jacek Białek. Last updated 2 months ago.
11 stars 6.02 score 16 scripts
business-science
alphavantager:Lightweight Interface to the Alpha Vantage API
Alpha Vantage has free historical financial information. All you need to do is get a free API key at <https://www.alphavantage.co>. Then you can use the R interface to retrieve free equity information. Refer to the Alpha Vantage website for more information.
Maintained by Matt Dancho. Last updated 2 years ago.
alpha-vantage, financial-data, historical-financial-data
70 stars 5.98 score 64 scripts
rajohansen
waterquality:Satellite Derived Water Quality Detection Algorithms
The main purpose of waterquality is to quickly and easily convert satellite-based reflectance imagery into one or many well-known water quality algorithms designed for the detection of harmful algal blooms or the following pigment proxies: chlorophyll-a, blue-green algae (phycocyanin), and turbidity. Johansen et al. (2019) <doi:10.21079/11681/35053>.
Maintained by Richard Johansen. Last updated 1 years ago.
algal-bloom, algorithms, landsat-8, meris, modis, olci, remote-sensing, sentinel-2, water-quality
44 stars 5.97 score 21 scripts
bioc
REMP:Repetitive Element Methylation Prediction
Machine learning-based tools to predict DNA methylation of locus-specific repetitive elements (RE) by learning surrounding genetic and epigenetic information. These tools provide genomewide and single-base resolution of DNA methylation prediction on RE that are difficult to measure using array-based or sequencing-based platforms, which enables epigenome-wide association study (EWAS) and differentially methylated region (DMR) analysis on RE.
Maintained by Yinan Zheng. Last updated 5 months ago.
dnamethylation, microarray, methylationarray, sequencing, genomewideassociation, epigenetics, preprocessing, multichannel, twochannel, differentialmethylation, qualitycontrol, dataimport
2 stars 5.94 score 18 scripts
bioc
miRspongeR:Identification and analysis of miRNA sponge regulation
This package provides several functions to explore miRNA sponge (also called ceRNA or miRNA decoy) regulation from putative miRNA-target interactions and/or transcriptomics data (including bulk, single-cell and spatial gene expression data). It provides eight popular methods for identifying miRNA sponge interactions and an integrative method to combine miRNA sponge interactions from different methods, as well as functions to validate miRNA sponge interactions, infer miRNA sponge modules, and conduct enrichment and survival analysis of miRNA sponge modules. By using a sample control variable strategy, it provides a function to infer sample-specific miRNA sponge interactions. For sample-specific miRNA sponge interactions, it implements three similarity methods to construct sample-sample correlation networks.
Maintained by Junpeng Zhang. Last updated 5 months ago.
geneexpressionbiomedicalinformaticsnetworkenrichmentsurvivalmicroarraysoftwaresinglecellspatialrnaseqcernamirnasponge
5 stars 5.88 score 8 scriptsphytoclass
phytoclass:Estimate Chla Concentrations of Phytoplankton Groups
Determine the chlorophyll a (Chl a) concentrations of different phytoplankton groups based on their pigment biomarkers. The method uses non-negative matrix factorisation and simulated annealing to minimise error between the observed and estimated values of pigment concentrations (Hayward et al. (2023) <doi:10.1002/lom3.10541>). The approach is similar to the widely used 'CHEMTAX' program (Mackey et al. 1996) <doi:10.3354/meps144265>, but is more straightforward, accurate, and not reliant on initial guesses for the pigment to Chl a ratios for phytoplankton groups.
Maintained by Alexander Hayward. Last updated 25 days ago.
2 stars 5.88 score 9 scriptsbcallaway11
qte:Quantile Treatment Effects
Provides several methods for computing the Quantile Treatment Effect (QTE) and Quantile Treatment Effect on the Treated (QTT). The main cases covered are (i) Treatment is randomly assigned, (ii) Treatment is as good as randomly assigned after conditioning on some covariates (also called conditional independence or selection on observables) using the methods developed in Firpo (2007) <doi:10.1111/j.1468-0262.2007.00738.x>, (iii) Identification is based on a Difference in Differences assumption (several varieties are available in the package e.g. Athey and Imbens (2006) <doi:10.1111/j.1468-0262.2006.00668.x> Callaway and Li (2019) <doi:10.3982/QE935>, Callaway, Li, and Oka (2018) <doi:10.1016/j.jeconom.2018.06.008>).
Maintained by Brantly Callaway. Last updated 11 months ago.
9 stars 5.87 score 55 scriptsvdblab
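Under random assignment (case (i) above), the QTE at a quantile tau reduces to the difference between the treated and control outcome quantiles. A base-R illustration with simulated outcomes; the package's estimators for the other identification strategies are not shown.
  # QTE under random assignment: difference of marginal quantiles.
  set.seed(1)
  y0 <- rnorm(500, mean = 0)                    # control outcomes
  y1 <- rnorm(500, mean = 0.5, sd = 1.3)        # treated outcomes
  taus <- c(0.25, 0.5, 0.75)
  round(quantile(y1, taus) - quantile(y0, taus), 2)   # effect varies across quantiles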
FLORAL:Fit Log-Ratio Lasso Regression for Compositional Data
Log-ratio Lasso regression for continuous, binary, and survival outcomes with (longitudinal) compositional features. See Fei and others (2024) <doi:10.1016/j.crmeth.2024.100899>.
Maintained by Teng Fei. Last updated 1 months ago.
12 stars 5.85 score 13 scriptsnelson-gon
manymodelr:Build and Tune Several Models
Frequently one needs a convenient way to build and tune several models in one go. The goal is to provide a number of machine learning convenience functions, including the ability to build, tune and obtain predictions from several models in one function call. The models are built using functions from 'caret' with easier-to-read syntax. Kuhn (2014) <doi:10.48550/arXiv.1405.6974>.
Maintained by Nelson Gonzabato. Last updated 11 days ago.
analysis-of-varianceanovacorrelationcorrelation-coefficientgeneralized-linear-modelsgradient-boosting-decision-treesknn-classificationlinear-modelslinear-regressionmachine-learningmissing-valuesmodelsr-programmingrandom-forest-algorithmregression-models
2 stars 5.78 score 50 scriptsshinyworks
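The build-several-models-in-one-go idea can be sketched with plain 'caret' calls by looping train() over a vector of methods on shared resampling folds; none of 'manymodelr''s own functions appear below.
  # Fit several caret models on the same folds and compare their CV error.
  library(caret)
  set.seed(1)
  folds <- createFolds(iris$Sepal.Length, k = 5, returnTrain = TRUE)
  ctrl  <- trainControl(method = "cv", index = folds)
  fits  <- lapply(c(lm = "lm", cart = "rpart", knn = "knn"), function(m)
    train(Sepal.Length ~ ., data = iris, method = m,
          trControl = ctrl, tuneLength = 3))
  sapply(fits, function(f) min(f$results$RMSE))   # quick comparison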
cookies:Use Browser Cookies with 'shiny'
Browser cookies are name-value pairs that are saved in a user's browser by a website. Cookies allow websites to persist information about the user and their use of the website. Here we provide tools for working with cookies in 'shiny' apps, in part by wrapping the 'js-cookie' JavaScript library <https://github.com/js-cookie/js-cookie>.
Maintained by Jon Harmon. Last updated 5 months ago.
33 stars 5.71 score 26 scripts 2 dependentsstatsgary
MLDataR:Collection of Machine Learning Datasets for Supervised Machine Learning
Contains a collection of datasets for working with machine learning tasks. It contains datasets for supervised machine learning (Jiang (2020) <doi:10.1016/j.beth.2020.05.002>), including datasets for classification and regression. The aim of this package is to make available data generated around health and other domains.
Maintained by Gary Hutson. Last updated 1 years ago.
53 stars 5.70 score 19 scriptsbioc
CytoGLMM:Conditional Differential Analysis for Flow and Mass Cytometry Experiments
The CytoGLMM R package implements two multiple regression strategies: A bootstrapped generalized linear model (GLM) and a generalized linear mixed model (GLMM). Most current data analysis tools compare expressions across many computationally discovered cell types. CytoGLMM focuses on just one cell type. Our narrower field of application allows us to define a more specific statistical model with easier to control statistical guarantees. As a result, CytoGLMM finds differential proteins in flow and mass cytometry data while reducing biases arising from marker correlations and safeguarding against false discoveries induced by patient heterogeneity.
Maintained by Christof Seiler. Last updated 5 months ago.
flowcytometryproteomicssinglecellcellbasedassayscellbiologyimmunooncologyregressionstatisticalmethodsoftware
2 stars 5.68 score 1 scripts 1 dependentsbioc
metabCombiner:Method for Combining LC-MS Metabolomics Feature Measurements
This package aligns LC-HRMS metabolomics datasets acquired from biologically similar specimens analyzed under similar, but not necessarily identical, conditions. Peak-picked and simply aligned metabolomics feature tables (consisting of m/z, rt, and per-sample abundance measurements, plus optional identifiers & adduct annotations) are accepted as input. The package outputs a combined table of feature pair alignments, organized into groups of similar m/z, and ranked by a similarity score. Input tables are assumed to be acquired using similar (but not necessarily identical) analytical methods.
Maintained by Hani Habra. Last updated 5 months ago.
softwaremassspectrometrymetabolomicsmass-spectrometry
10 stars 5.65 score 5 scriptsakai01
caretForecast:Conformal Time Series Forecasting Using State of Art Machine Learning Algorithms
Conformal time series forecasting using the caret infrastructure. It provides access to state-of-the-art machine learning models for forecasting applications. The hyperparameters of each model are selected by time series cross-validation, and forecasting is done recursively.
Maintained by Resul Akay. Last updated 2 years ago.
caretconformal-predictiondata-scienceeconometricsforecastforecastingforecasting-modelsmachine-learningmacroeconometricsmicroeconometricstime-seriestime-series-forcastingtime-series-prediction
25 stars 5.62 score 28 scripts 4 dependentstrevorld
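A rough base-R-plus-'caret' sketch of the recursive scheme described above: build lagged predictors, tune with time-slice cross-validation, then feed each one-step-ahead prediction back in. The package's own interface is not shown and will differ.
  # Recursive ML forecasting with lagged predictors and time-slice CV.
  library(caret)
  set.seed(1)
  y <- as.numeric(AirPassengers)
  p <- 12                                        # number of lags
  lagged <- embed(y, p + 1)                      # col 1 = y_t, cols 2.. = lags 1..p
  dat <- data.frame(y = lagged[, 1], lagged[, -1])
  ctrl <- trainControl(method = "timeslice", initialWindow = 100,
                       horizon = 12, fixedWindow = TRUE)
  fit <- train(y ~ ., data = dat, method = "rpart", trControl = ctrl, tuneLength = 5)
  history <- tail(y, p)
  fc <- numeric(12)
  for (h in 1:12) {                              # recursive one-step-ahead forecasts
    newx <- as.data.frame(t(rev(tail(history, p))))   # lag 1 = most recent value
    names(newx) <- names(dat)[-1]
    fc[h] <- predict(fit, newx)
    history <- c(history, fc[h])
  }
  round(fc)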
datetimeoffset:Datetimes with Optional UTC Offsets and/or Heterogeneous Time Zones
Supports import/export for a number of datetime string standards and R datetime classes often including lossless re-export of any original reduced precision including 'ISO 8601' <https://en.wikipedia.org/wiki/ISO_8601> and 'pdfmark' <https://opensource.adobe.com/dc-acrobat-sdk-docs/library/pdfmark/> datetime strings. Supports local/global datetimes with optional UTC offsets and/or (possibly heterogeneous) time zones with up to nanosecond precision.
Maintained by Trevor L. Davis. Last updated 7 days ago.
6 stars 5.56 score 1 scripts 2 dependentsbioc
bandle:An R package for the Bayesian analysis of differential subcellular localisation experiments
The Bandle package enables the analysis and visualisation of differential localisation experiments using mass-spectrometry data. Experimental methods supported include dynamic LOPIT-DC, hyperLOPIT, Dynamic Organellar Maps, Dynamic PCP. It provides Bioconductor infrastructure to analyse these data.
Maintained by Oliver M. Crook. Last updated 2 months ago.
bayesianclassificationclusteringimmunooncologyqualitycontroldataimportproteomicsmassspectrometryopenblascppopenmp
4 stars 5.56 score 3 scriptstechtonique
learningmachine:Machine Learning with Explanations and Uncertainty Quantification
Regression-based Machine Learning with explanations and uncertainty quantification.
Maintained by T. Moudiki. Last updated 4 months ago.
conformal-predictionmachine-learningmachine-learning-algorithmsmachinelearningstatistical-learninguncertainty-quantificationcpp
5 stars 5.53 score 21 scriptsjeffreyhanson
raptr:Representative and Adequate Prioritization Toolkit in R
Biodiversity is in crisis. The overarching aim of conservation is to preserve biodiversity patterns and processes. To this end, protected areas are established to buffer species and preserve biodiversity processes. But resources are limited and so protected areas must be cost-effective. This package contains tools to generate plans for protected areas (prioritizations), using spatially explicit targets for biodiversity patterns and processes. To obtain solutions in a feasible amount of time, this package uses the commercial 'Gurobi' software (obtained from <https://www.gurobi.com/>). For more information on using this package, see Hanson et al. (2018) <doi:10.1111/2041-210X.12862>.
Maintained by Jeffrey O Hanson. Last updated 1 years ago.
8 stars 5.52 score 83 scriptsjonathan-g
datafsm:Estimating Finite State Machine Models from Data
Automatic generation of finite state machine models of dynamic decision-making that both have strong predictive power and are interpretable in human terms. We use an efficient model representation and a genetic algorithm-based estimation process to generate simple deterministic approximations that explain most of the structure of complex stochastic processes. We have applied the software to empirical data, and demonstrated its ability to recover known data-generating processes by simulating data with agent-based models and correctly deriving the underlying decision models for multiple agent models and degrees of stochasticity.
Maintained by Jonathan M. Gilligan. Last updated 4 years ago.
11 stars 5.52 score 30 scriptsriccardo-df
causalQual:Causal Inference for Qualitative Outcomes
Implements the framework introduced in Di Francesco and Mellace (2025) <doi:10.48550/arXiv.2502.11691>, shifting the focus to well-defined and interpretable estimands that quantify how treatment affects the probability distribution over outcome categories. It supports selection-on-observables, instrumental variables, regression discontinuity, and difference-in-differences designs.
Maintained by Riccardo Di Francesco. Last updated 11 days ago.
11 stars 5.44 scorehehta
RESIDE:Rapid Easy Synthesis to Inform Data Extraction
Developed to assist researchers with planning analyses prior to obtaining data from Trusted Research Environments (TREs), also known as safe havens. It provides functionality to export and import marginal distributions and to synthesise data from them, with and without correlations, using a multivariate cumulative distribution (copula). Additionally, the International Stroke Trial (IST) is included as an example dataset under the ODC-By licence: Sandercock et al. (2011) <doi:10.7488/ds/104>, Sandercock et al. (2011) <doi:10.1186/1745-6215-12-101>.
Maintained by Ryan Field. Last updated 25 days ago.
5.44 score 5 scriptspersimune
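The copula-based synthesis described above can be illustrated with base R plus MASS: draw correlated normals, turn them into uniform scores, and push the scores through the inverse CDFs of the chosen marginals. A generic sketch, not the package's import/export workflow; the variable names are made up.
  # Gaussian-copula synthesis from marginal distributions plus a correlation.
  library(MASS)
  set.seed(1)
  rho  <- matrix(c(1, 0.6, 0.6, 1), 2, 2)        # target dependence
  z    <- mvrnorm(n = 1000, mu = c(0, 0), Sigma = rho)
  u    <- pnorm(z)                               # copula scores in (0, 1)
  age  <- qnorm(u[, 1], mean = 60, sd = 10)      # continuous marginal
  stay <- qpois(u[, 2], lambda = 5)              # count marginal
  cor(age, stay)                                 # dependence is roughly carried over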
explainer:Machine Learning Model Explainer
It enables detailed interpretation of complex classification and regression models through Shapley analysis including data-driven characterization of subgroups of individuals. Furthermore, it facilitates multi-measure model evaluation, model fairness, and decision curve analysis. Additionally, it offers enhanced visualizations with interactive elements.
Maintained by Ramtin Zargari Marandi. Last updated 6 months ago.
aiclassificationclinical-researchexplainabilityexplainable-aiinterpretabilitymachine-learningregressionshapstatistics
15 stars 5.43 score 12 scriptsjaipizgon
NeuralSens:Sensitivity Analysis of Neural Networks
Analysis functions to quantify inputs importance in neural network models. Functions are available for calculating and plotting the inputs importance and obtaining the activation function of each neuron layer and its derivatives. The importance of a given input is defined as the distribution of the derivatives of the output with respect to that input in each training data point <doi:10.18637/jss.v102.i07>.
Maintained by Jaime Pizarroso Gonzalo. Last updated 6 months ago.
15 stars 5.43 score 24 scriptsshinyworks
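The derivative-based importance measure defined above can be imitated crudely with finite differences around each training point; the package computes these derivatives analytically, so the 'nnet' example below is only a stand-in.
  # Approximate d(output)/d(input) per training point and summarise per input.
  library(nnet)
  set.seed(1)
  x <- scale(as.matrix(iris[, 2:4]))
  y <- iris$Sepal.Length
  fit <- nnet(x, y, size = 4, linout = TRUE, trace = FALSE)
  eps <- 1e-4
  sens <- sapply(seq_len(ncol(x)), function(j) {
    xp <- x; xp[, j] <- xp[, j] + eps
    (predict(fit, xp) - predict(fit, x)) / eps   # finite-difference derivative
  })
  colnames(sens) <- colnames(x)
  apply(sens, 2, function(d) c(mean = mean(d), sd = sd(d)))   # importance summary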
scenes:Switch Between Alternative 'shiny' UIs
Sometimes it is useful to serve up alternative 'shiny' UIs depending on information passed in the request object, such as the value of a cookie or a query parameter. This package facilitates such switches.
Maintained by Jon Harmon. Last updated 5 months ago.
16 stars 5.41 score 16 scriptsflaviomoc
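The request-based switching described above can be sketched with plain 'shiny' by making the UI a function of the request and inspecting a query parameter; 'scenes' wraps this pattern, and none of its own functions appear below.
  # Serve one of two UIs depending on a query parameter, e.g. ?view=admin.
  library(shiny)
  ui <- function(request) {
    view <- parseQueryString(request$QUERY_STRING)$view
    if (identical(view, "admin")) {
      fluidPage(titlePanel("Admin view"), verbatimTextOutput("who"))
    } else {
      fluidPage(titlePanel("Default view"), textOutput("who"))
    }
  }
  server <- function(input, output, session) {
    output$who <- renderText("hello")
  }
  shinyApp(ui, server)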
divraster:Diversity Metrics Calculations for Rasterized Data
Alpha and beta diversity for taxonomic (TD), functional (FD), and phylogenetic (PD) dimensions based on rasters. Spatial and temporal beta diversity can be partitioned into replacement and richness difference components. It also calculates standardized effect size for FD and PD alpha diversity and the average individual traits across multilayer rasters. The layers of the raster represent species, while the cells represent communities. Methods details can be found at Cardoso et al. 2022 <https://CRAN.R-project.org/package=BAT> and Heming et al. 2023 <https://CRAN.R-project.org/package=SESraster>.
Maintained by Flávio M. M. Mota. Last updated 14 days ago.
10 stars 5.40 score 7 scriptsfawda123
WRTDStidal:Weighted Regression for Water Quality Evaluation in Tidal Waters
An adaptation for estuaries (tidal waters) of weighted regression on time, discharge, and season to evaluate trends in water quality time series. Please see Beck and Hagy (2015) <doi:10.1007/s10666-015-9452-8> for details.
Maintained by Marcus W. Beck. Last updated 1 years ago.
4 stars 5.38 score 119 scriptssmaakage85
modelgrid:A Framework for Creating, Managing and Training Multiple Caret Models
A minimalistic but flexible framework that facilitates the creation, management and training of multiple 'caret' models. A model grid consists of two components: (1) a set of settings that is shared by all models by default, and (2) specifications that apply only to the individual models. When the model grid is trained, model and training specifications are first consolidated from the shared and the model specific settings into complete 'caret' model configurations. These models are then trained with the 'train' function from the 'caret' package.
Maintained by Lars Kjeldgaard. Last updated 6 years ago.
caretmachine-learningpredictive-analyticspredictive-modeling
23 stars 5.34 score 19 scriptsadrientaudiere
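The shared-plus-model-specific consolidation described above can be imitated in a few lines with modifyList() and do.call() around caret::train(); a minimal sketch, not the package's API.
  # Merge shared settings with per-model overrides, then train each model.
  library(caret)
  set.seed(1)
  shared <- list(x = iris[, 1:4], y = iris$Species,
                 trControl = trainControl(method = "cv", number = 5))
  models <- list(cart = list(method = "rpart", tuneLength = 5),
                 knn  = list(method = "knn",   tuneLength = 10))
  fits <- lapply(models, function(spec) do.call(train, modifyList(shared, spec)))
  sapply(fits, function(f) max(f$results$Accuracy))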
cati:Community Assembly by Traits: Individuals and Beyond
Detect and quantify community assembly processes using trait values of individuals or populations, the T-statistics and other metrics, and dedicated null models.
Maintained by Adrien Taudiere. Last updated 5 months ago.
12 stars 5.33 score 15 scriptstlverse
tmle3shift:Targeted Learning of the Causal Effects of Stochastic Interventions
Targeted maximum likelihood estimation (TMLE) of population-level causal effects under stochastic treatment regimes and related nonparametric variable importance analyses. Tools are provided for TML estimation of the counterfactual mean under a stochastic intervention characterized as a modified treatment policy, such as treatment policies that shift the natural value of the exposure. The causal parameter and estimation were described in Díaz and van der Laan (2013) <doi:10.1111/j.1541-0420.2011.01685.x> and an improved estimation approach was given by Díaz and van der Laan (2018) <doi:10.1007/978-3-319-65304-4_14>.
Maintained by Nima Hejazi. Last updated 6 months ago.
causal-inferencemachine-learningmarginal-structural-modelsstochastic-interventionstargeted-learningtreatment-effectsvariable-importance
17 stars 5.33 score 42 scripts 1 dependentsalexkychen
assignPOP:Population Assignment using Genetic, Non-Genetic or Integrated Data in a Machine Learning Framework
Use Monte-Carlo and K-fold cross-validation coupled with machine-learning classification algorithms to perform population assignment, with functionalities of evaluating discriminatory power of independent training samples, identifying informative loci, reducing data dimensionality for genomic data, integrating genetic and non-genetic data, and visualizing results.
Maintained by Kuan-Yu (Alex) Chen. Last updated 1 years ago.
cross-validationdata-integrationgbsmachine-learningpopulation-assignmentpopulation-genomicsradseq
17 stars 5.33 score 25 scriptsbioc
DaMiRseq:Data Mining for RNA-seq data: normalization, feature selection and classification
The DaMiRseq package offers a tidy pipeline of data mining procedures to identify transcriptional biomarkers and exploit them for both binary and multi-class classification purposes. The package accepts any kind of data presented as a table of raw counts and allows including both continuous and factorial variables that occur within the experimental setting. A series of functions enable the user to clean up the data by filtering genomic features and samples, to adjust data by identifying and removing the unwanted source of variation (i.e. batches and confounding factors) and to select the best predictors for modeling. Finally, a "stacking" ensemble learning technique is applied to build a robust classification model. Every step includes a checkpoint that the user may exploit to assess the effects of data management by looking at diagnostic plots, such as clustering and heatmaps, RLE boxplots, MDS or correlation plots.
Maintained by Mattia Chiesa. Last updated 5 months ago.
sequencingrnaseqclassificationimmunooncologyopenjdk
5.32 score 7 scripts 1 dependentsbioc
preciseTAD:preciseTAD: A machine learning framework for precise TAD boundary prediction
preciseTAD provides functions to predict the location of boundaries of topologically associated domains (TADs) and chromatin loops at base-level resolution. As an input, it takes BED-formatted genomic coordinates of domain boundaries detected from low-resolution Hi-C data, and coordinates of high-resolution genomic annotations from ENCODE or other consortia. preciseTAD employs several feature engineering strategies and resampling techniques to address class imbalance, and trains an optimized random forest model for predicting low-resolution domain boundaries. Translated on a base-level, preciseTAD predicts the probability for each base to be a boundary. Density-based clustering and scalable partitioning techniques are used to detect precise boundary regions and summit points. Compared with low-resolution boundaries, preciseTAD boundaries are highly enriched for CTCF, RAD21, SMC3, and ZNF143 signal and more conserved across cell lines. The pre-trained model can accurately predict boundaries in another cell line using CTCF, RAD21, SMC3, and ZNF143 annotation data for this cell line.
Maintained by Mikhail Dozmorov. Last updated 5 months ago.
softwarehicsequencingclusteringclassificationfunctionalgenomicsfeatureextraction
7 stars 5.29 score 14 scriptsrandcorporation
optic:Simulation Tool for Causal Inference Using Longitudinal Data
Implements a simulation study to assess the strengths and weaknesses of causal inference methods for estimating policy effects using panel data. See Griffin et al. (2021) <doi:10.1007/s10742-022-00284-w> and Griffin et al. (2022) <doi:10.1186/s12874-021-01471-y> for a description of our methods.
Maintained by Pedro Nascimento de Lima. Last updated 2 months ago.
causal-inferencediff-in-difflongitudinal-datasimulation
9 stars 5.26 score 6 scriptsbioc
scGPS:A complete analysis of single cell subpopulations, from identifying subpopulations to analysing their relationship (scGPS = single cell Global Predictions of Subpopulation)
The package implements two main algorithms to answer two key questions: a SCORE (Stable Clustering at Optimal REsolution) to find subpopulations, followed by scGPS to investigate the relationships between subpopulations.
Maintained by Quan Nguyen. Last updated 5 months ago.
singlecellclusteringdataimportsequencingcoverageopenblascpp
4 stars 5.20 score 7 scriptsbioc
squallms:Speedy quality assurance via lasso labeling for LC-MS data
squallms is a Bioconductor R package that implements a "semi-labeled" approach to untargeted mass spectrometry data. It pulls in raw data from mass-spec files to calculate several metrics that are then used to label MS features in bulk as high or low quality. These metrics of peak quality are then passed to a simple logistic model that produces a fully-labeled dataset suitable for downstream analysis.
Maintained by William Kumler. Last updated 5 months ago.
massspectrometrymetabolomicsproteomicslipidomicsshinyappsclassificationclusteringfeatureextractionprincipalcomponentregressionpreprocessingqualitycontrolvisualization
3 stars 5.13 score 5 scriptsbioc
SGCP:SGCP: A semi-supervised pipeline for gene clustering using self-training approach in gene co-expression networks
SGC is a semi-supervised pipeline for gene clustering in gene co-expression networks. SGC consists of multiple novel steps that enable the computation of highly enriched modules in an unsupervised manner. But unlike all existing frameworks, it further incorporates a novel step that leverages Gene Ontology information in a semi-supervised clustering method that further improves the quality of the computed modules.
Maintained by Niloofar AghaieAbiane. Last updated 5 months ago.
geneexpressiongenesetenrichmentnetworkenrichmentsystemsbiologyclassificationclusteringdimensionreductiongraphandnetworkneuralnetworknetworkmrnamicroarrayrnaseqvisualizationbioinformaticsgenecoexpressionnetworkgraphsnetworkclusteringnetworksself-trainingsemi-supervised-learningunsupervised-learning
2 stars 5.12 score 44 scriptsspsanderson
healthyverse:Easily Install and Load the 'healthyverse'
The 'healthyverse' is a set of packages that work in harmony because they share common data representations and 'API' design. This package is designed to make it easy to install and load multiple 'healthyverse' packages in a single step.
Maintained by Steven Sanderson. Last updated 6 months ago.
analyticshealthcarehealthcare-applicationinstallationinstallermetapackages
11 stars 5.12 score 24 scriptspromidat
loadeR:Load Data for Analysis System
Provides a framework to load text and excel files through a 'shiny' graphical interface. It allows renaming, transforming, ordering and removing variables. It includes basic exploratory methods such as the mean, median, mode, normality test, histogram and correlation.
Maintained by Oldemar Rodriguez. Last updated 2 years ago.
5.09 score 275 scripts 3 dependentsadefazio
classifierplots:Generates a Visualization of Classifier Performance as a Grid of Diagnostic Plots
Generates a visualization of binary classifier performance as a grid of diagnostic plots with just one function call. Includes ROC curves, prediction density, accuracy, precision, recall and calibration plots, all using ggplot2 for easy modification. Debug your binary classifiers faster and easier!
Maintained by Aaron Defazio. Last updated 4 years ago.
50 stars 5.08 score 16 scriptsfrankiethull
maize:Specialty Kernels for SVMs
Bindings for SVM kernels via 'kernlab' for use with the 'parsnip' package, specifically specialty kernels for support vector machines that are not available in 'parsnip'. The package includes an interface for various 'kernlab' kernels as well as custom kernels.
Maintained by Frankie T. Hull. Last updated 3 days ago.
10 stars 5.08 score 3 scriptslanl
NEONiso:Tools to Calibrate and Work with NEON Atmospheric Isotope Data
Functions for downloading, calibrating, and analyzing atmospheric isotope data bundled into the eddy covariance data products of the National Ecological Observatory Network (NEON) <https://www.neonscience.org>. Calibration tools are provided for carbon and water isotope products. Carbon isotope calibration details are found in Fiorella et al. (2021) <doi:10.1029/2020JG005862>, and the readme file at <https://github.com/lanl/NEONiso>. Tools for calibrating water isotope products have been added as of 0.6.0, but have known deficiencies and should be considered experimental and unsupported.
Maintained by Rich Fiorella. Last updated 1 months ago.
2 stars 5.08 score 6 scriptstrevorld
xmpdf:Edit 'XMP' Metadata and 'PDF' Bookmarks and Documentation Info
Edit 'XMP' metadata <https://en.wikipedia.org/wiki/Extensible_Metadata_Platform> in a variety of media file formats as well as edit bookmarks (aka outline aka table of contents) and documentation info entries in 'pdf' files. Can detect and use a variety of command-line tools to perform these operations such as 'exiftool' <https://exiftool.org/>, 'ghostscript' <https://www.ghostscript.com/>, and/or 'pdftk' <https://gitlab.com/pdftk-java/pdftk>.
Maintained by Trevor L Davis. Last updated 1 years ago.
4 stars 5.08 score 1 scripts 1 dependentsff1201
sgs:Sparse-Group SLOPE: Adaptive Bi-Level Selection with FDR Control
Implementation of Sparse-group SLOPE (SGS) (Feser and Evangelou (2023) <doi:10.48550/arXiv.2305.09467>) models. Linear and logistic regression models are supported, both of which can be fit using k-fold cross-validation. Dense and sparse input matrices are supported. In addition, a general Adaptive Three Operator Splitting (ATOS) (Pedregosa and Gidel (2018) <doi:10.48550/arXiv.1804.02339>) implementation is provided. Group SLOPE (gSLOPE) (Brzyski et al. (2019) <doi:10.1080/01621459.2017.1411269>) and group-based OSCAR models (Feser and Evangelou (2024) <doi:10.48550/arXiv.2405.15357>) are also implemented. All models are available with strong screening rules (Feser and Evangelou (2024) <doi:10.48550/arXiv.2405.15357>) for computational speed-up.
Maintained by Fabio Feser. Last updated 6 days ago.
1 stars 5.07 score 13 scripts 1 dependentsjameshwade
measure:A Recipes-style Interface to Tidymodels for Analytical Measurements
Analytical measurements...
Maintained by James Wade. Last updated 1 months ago.
5 stars 5.06 score 58 scriptsaleksandarsekulic
meteo:RFSI & STRK Interpolation for Meteo and Environmental Variables
Random Forest Spatial Interpolation (RFSI, Sekulić et al. (2020) <doi:10.3390/rs12101687>) and spatio-temporal geostatistical (spatio-temporal regression Kriging (STRK)) interpolation for meteorological (Kilibarda et al. (2014) <doi:10.1002/2013JD020803>, Sekulić et al. (2020) <doi:10.1007/s00704-019-03077-3>) and other environmental variables. Contains global spatio-temporal models calculated using publicly available data.
Maintained by Aleksandar Sekulić. Last updated 6 months ago.
18 stars 5.06 score 64 scriptspyanglab
AdaSampling:Adaptive Sampling for Positive Unlabeled and Label Noise Learning
Implements the adaptive sampling procedure, a framework for both positive unlabeled learning and learning with class label noise. Yang, P., Ormerod, J., Liu, W., Ma, C., Zomaya, A., Yang, J. (2018) <doi:10.1109/TCYB.2018.2816984>.
Maintained by Pengyi Yang. Last updated 6 years ago.
11 stars 5.04 score 10 scriptsacabassi
coca:Cluster-of-Clusters Analysis
Contains the R functions needed to perform Cluster-Of-Clusters Analysis (COCA) and Consensus Clustering (CC). For further details please see Cabassi and Kirk (2020) <doi:10.1093/bioinformatics/btaa593>.
Maintained by Alessandra Cabassi. Last updated 5 years ago.
cluster-analysiscluster-of-clustersclusteringcocagenomicsintegrative-clusteringmulti-omics
6 stars 5.03 score 12 scripts 1 dependentssstoeckl
FFdownload:Download Data from Kenneth French's Website
Downloads all the datasets (you can exclude the daily ones or specify a list of those you are targeting specifically) from Kenneth French's Website at <https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html>, process them and convert them to list of 'xts' (time series).
Maintained by Sebastian Stoeckl. Last updated 10 months ago.
9 stars 5.03 score 12 scriptsjobnmadu
Dyn4cast:Dynamic Modeling and Machine Learning Environment
Estimates, predicts and forecasts dynamic models, as well as machine learning metrics that assist in model selection for further analysis. The package also provides tools and metrics that are useful in machine learning and modeling, for example quick summaries, percent formatting, Mallow's Cp and others. The ecosystem of this package is the analysis of economic data for national development. The package is stable so far and aims at reliability, efficiency and time-saving.
Maintained by Job Nmadu. Last updated 15 days ago.
data-scienceequal-lenght-forecastforecastingknotsmachine-learningnigeriapredictionregression-modelsspline-modelsstatisticstime-series
4 stars 5.03 score 38 scriptscaranathunge
promor:Proteomics Data Analysis and Modeling Tools
A comprehensive, user-friendly package for label-free proteomics data analysis and machine learning-based modeling. Data generated from 'MaxQuant' can be easily used to conduct differential expression analysis, build predictive models with top protein candidates, and assess model performance. promor includes a suite of tools for quality control, visualization, missing data imputation (Lazar et. al. (2016) <doi:10.1021/acs.jproteome.5b00981>), differential expression analysis (Ritchie et. al. (2015) <doi:10.1093/nar/gkv007>), and machine learning-based modeling (Kuhn (2008) <doi:10.18637/jss.v028.i05>).
Maintained by Chathurani Ranathunge. Last updated 2 years ago.
biomarkersdifferential-expressionlfqmachine-learningmass-spectrometrymodelingproteomics
15 stars 5.02 score 14 scriptslanedrew
ldmppr:Estimate and Simulate from Location Dependent Marked Point Processes
A suite of tools for estimating, assessing model fit, simulating from, and visualizing location dependent marked point processes characterized by regularity in the pattern. You provide a reference marked point process, a set of raster images containing location specific covariates, and select the estimation algorithm and type of mark model. 'ldmppr' estimates the process and mark models and allows you to check the appropriateness of the model using a variety of diagnostic tools. Once a satisfactory model fit is obtained, you can simulate from the model and visualize the results. Documentation for the package 'ldmppr' is available in the form of a vignette.
Maintained by Lane Drew. Last updated 1 months ago.
1 stars 5.00 score 2 scriptsbioc
jazzPanda:Finding spatially relevant marker genes in image based spatial transcriptomics data
This package contains functions to find marker genes for image-based spatial transcriptomics data. There are functions to create spatial vectors from the cell and transcript coordinates, which are passed as inputs to find marker genes. Marker genes are detected for every cluster by two approaches. The first approach is permutation testing, which is implemented in parallel for finding marker genes in one-sample studies. The other approach is to build a linear model for every gene; this approach can account for multiple samples and background noise.
Maintained by Melody Jin. Last updated 29 days ago.
spatialgeneexpressiondifferentialexpressionstatisticalmethodtranscriptomicscorrelationlinear-modelsmarker-genesspatial-transcriptomics
2 stars 5.00 scorebioc
MAI:Mechanism-Aware Imputation
A two-step approach to imputing missing data in metabolomics. Step 1 uses a random forest classifier to classify missing values as either Missing Completely at Random/Missing At Random (MCAR/MAR) or Missing Not At Random (MNAR). MCAR/MAR are combined because it is often difficult to distinguish these two missing types in metabolomics data. Step 2 imputes the missing values based on the classified missing mechanisms, using the appropriate imputation algorithms. Imputation algorithms tested and available for MCAR/MAR include Bayesian Principal Component Analysis (BPCA), Multiple Imputation No-Skip K-Nearest Neighbors (Multi_nsKNN), and Random Forest. Imputation algorithms tested and available for MNAR include nsKNN and a single imputation approach for imputation of metabolites where left-censoring is present.
Maintained by Jonathan Dekermanjian. Last updated 5 months ago.
softwaremetabolomicsstatisticalmethodclassificationimputation-methodsmachine-learningmissing-data
2 stars 5.00 score 6 scriptsbioc
GARS:GARS: Genetic Algorithm for the identification of Robust Subsets of variables in high-dimensional and challenging datasets
Feature selection aims to identify and remove redundant, irrelevant and noisy variables from high-dimensional datasets. Selecting informative features affects the subsequent classification and regression analyses by improving their overall performances. Several methods have been proposed to perform feature selection: most of them rely on univariate statistics, correlation, entropy measurements or the usage of backward/forward regressions. Herein, we propose an efficient, robust and fast method that adopts stochastic optimization approaches for high-dimensional data. GARS is an innovative implementation of a genetic algorithm that selects robust features in high-dimensional and challenging datasets.
Maintained by Mattia Chiesa. Last updated 5 months ago.
classificationfeatureextractionclusteringopenjdk
5.00 score 2 scriptskevinhzq
healthdb:Working with Healthcare Databases
A system for identifying diseases or events from healthcare databases and preparing data for epidemiological studies. It includes capabilities not supported by 'SQL', such as matching strings by 'stringr' style regular expressions, and can compute comorbidity scores (Quan et al. (2005) <doi:10.1097/01.mlr.0000182534.19832.83>) directly on a database server. The implementation is based on 'dbplyr' with full 'tidyverse' compatibility.
Maintained by Kevin Hu. Last updated 1 months ago.
2 stars 4.95 scoretylerjpike
OOS:Out-of-Sample Time Series Forecasting
A comprehensive and cohesive API for the out-of-sample forecasting workflow: data preparation, forecasting - including both traditional econometric time series models and modern machine learning techniques - forecast combination, model and error analysis, and forecast visualization.
Maintained by Tyler J. Pike. Last updated 4 years ago.
econometricsforecast-combinationforecastingmachine-learning
9 stars 4.95 score 5 scriptsshanpengli
FastJM:Semi-Parametric Joint Modeling of Longitudinal and Survival Data
Maximum likelihood estimation for the semi-parametric joint modeling of competing risks and longitudinal data applying customized linear scan algorithms, proposed by Li and colleagues (2022) <doi:10.1155/2022/1362913>. The time-to-event data is modelled using a (cause-specific) Cox proportional hazards regression model with time-fixed covariates. The longitudinal outcome is modelled using a linear mixed effects model. The association is captured by shared random effects. The model is estimated using an Expectation Maximization algorithm.
Maintained by Shanpeng Li. Last updated 9 days ago.
5 stars 4.95 score 2 scripts 2 dependentsbioc
HPiP:Host-Pathogen Interaction Prediction
HPiP (Host-Pathogen Interaction Prediction) uses an ensemble learning algorithm for prediction of host-pathogen protein-protein interactions (HP-PPIs) using structural and physicochemical descriptors computed from the amino acid composition of host and pathogen proteins. The proposed package can effectively address data shortages and data unavailability for HP-PPI network reconstructions. Moreover, establishing computational frameworks in that regard will reveal mechanistic insights into infectious diseases and suggest potential HP-PPI targets, thus narrowing down the range of possible candidates for subsequent wet-lab experimental validations.
Maintained by Matineh Rahmatbakhsh. Last updated 5 months ago.
proteomicssystemsbiologynetworkinferencestructuralpredictiongenepredictionnetwork
3 stars 4.95 score 6 scriptstjetka
SLEMI:Statistical Learning Based Estimation of Mutual Information
Implementation of an algorithm for the estimation of mutual information and channel capacity from experimental data by classification procedures (logistic regression). Technically, it allows one to estimate information-theoretic measures between a finite-state input and a multivariate, continuous output. The method is described in Jetka et al. (2019) <doi:10.1371/journal.pcbi.1007132>.
Maintained by Tomasz Jetka. Last updated 1 years ago.
channel-capacityinformation-theorylogistic-regressionmutual-information-estimation
4 stars 4.92 score 21 scriptsconnor-reid-tiffany
omu:A Metabolomics Analysis Tool for Intuitive Figures and Convenient Metadata Collection
Facilitates the creation of intuitive figures to describe metabolomics data by utilizing Kyoto Encyclopedia of Genes and Genomes (KEGG) hierarchy data, and gathers functional orthology and gene data from the KEGG-REST API.
Maintained by Connor Tiffany. Last updated 1 years ago.
3 stars 4.89 score 52 scriptsfmgarciadiaz
PortalHacienda:Access the Portal de Hacienda Data with R
Obtain the list of datasets, and access and extend series from the Portal de Datos de Hacienda: search, download and forecast time series from the Ministry of Economy of Argentina. Forecasts are built with the 'forecast' package, Hyndman RJ, Khandakar Y (2008) <doi:10.18637/jss.v027.i03>.
Maintained by Fernando Garcia Diaz. Last updated 2 years ago.
apiargentinaeconomiaministerio-de-economiaseries-de-tiempo
15 stars 4.88 score 7 scriptsbdwilliamson
flevr:Flexible, Ensemble-Based Variable Selection with Potentially Missing Data
Perform variable selection in settings with possibly missing data based on extrinsic (algorithm-specific) and intrinsic (population-level) variable importance. Uses a Super Learner ensemble to estimate the underlying prediction functions that give rise to estimates of variable importance. For more information about the methods, please see Williamson and Huang (2023+) <arXiv:2202.12989>.
Maintained by Brian D. Williamson. Last updated 1 years ago.
5 stars 4.88 score 2 scriptsrobson-fernandes
bnviewer:Bayesian Networks Interactive Visualization and Explainable Artificial Intelligence
Bayesian networks provide an intuitive framework for probabilistic reasoning and their graphical nature can be interpreted quite clearly. Graph-based methods of machine learning are becoming more popular because they offer a richer model of knowledge that can be understood by a human in a graphical format. 'bnviewer' is an R package that allows the interactive visualization of Bayesian networks. The aim of this package is to improve Bayesian network visualization over the basic and static views offered by existing packages.
Maintained by Robson Fernandes. Last updated 5 years ago.
bayesian-inferencebayesian-networkbayesian-networksprobabilistic-graphical-models
7 stars 4.86 score 69 scripts 1 dependentshectorrdb
Ecume:Equality of 2 (or k) Continuous Univariate and Multivariate Distributions
We implement (or re-implement in R) a variety of statistical tools. They are focused on non-parametric two-sample (or k-sample) distribution comparisons in the univariate or multivariate case. See the vignette for more info.
Maintained by Hector Roux de Bezieux. Last updated 10 months ago.
1 stars 4.86 score 16 scripts 3 dependentsgokmenzararsiz
dtComb:Statistical Combination of Diagnostic Tests
A system for combining two diagnostic tests using various approaches that include statistical and machine-learning-based methodologies. These approaches are divided into four groups: linear combination methods, non-linear combination methods, mathematical operators, and machine learning algorithms. See the <https://biotools.erciyes.edu.tr/dtComb/> website for more information, documentation, and examples.
Maintained by Gokmen Zararsiz. Last updated 4 days ago.
4.85 score 7 scriptsepivec
TDLM:Systematic Comparison of Trip Distribution Laws and Models
The main purpose of this package is to propose a rigorous framework to fairly compare trip distribution laws and models as described in Lenormand et al. (2016) <doi:10.1016/j.jtrangeo.2015.12.008>.
Maintained by Maxime Lenormand. Last updated 25 days ago.
2 stars 4.85 score 3 scriptsbioc
MLSeq:Machine Learning Interface for RNA-Seq Data
This package applies several machine learning methods, including SVM, bagSVM, Random Forest and CART to RNA-Seq data.
Maintained by Gokmen Zararsiz. Last updated 5 months ago.
immunooncologysequencingrnaseqclassificationclustering
4.81 score 27 scripts 1 dependentsjoelcuerrier
cdid:The Chained Difference-in-Differences
Extends the 'did' package to improve efficiency and handling of unbalanced panel data. Bellego, Benatia, and Dortet-Bernadet (2024), "The Chained Difference-in-Differences", Journal of Econometrics, <doi:10.1016/j.jeconom.2024.105783>.
Maintained by David Benatia. Last updated 2 months ago.
2 stars 4.78 score 3 scriptsbioc
supersigs:Supervised mutational signatures
Generate SuperSigs (supervised mutational signatures) from single nucleotide variants in the cancer genome. Functions included in the package allow the user to learn supervised mutational signatures from their data and apply them to new data. The methodology is based on the one described in Afsari (2021, ELife).
Maintained by Albert Kuo. Last updated 5 months ago.
featureextractionclassificationregressionsequencingwholegenomesomaticmutation
3 stars 4.78 score 3 scriptsadimajo
glmtree:Logistic Regression Trees
A logistic regression tree is a decision tree with logistic regressions at its leaves. A particular stochastic expectation maximization algorithm is used to draw a few good trees, which are then assessed via the user's criterion of choice among BIC / AIC / test-set Gini. The formal development is given in a PhD chapter; see Ehrhardt (2019) <https://github.com/adimajo/manuscrit_these/releases/>.
Maintained by Adrien Ehrhardt. Last updated 1 years ago.
6 stars 4.78 score 3 scriptsharrison4192
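A two-stage caricature of a logistic regression tree, with an 'rpart' tree defining the leaves and one glm() fitted inside each leaf; the package instead draws and assesses trees with a stochastic EM algorithm, so this is only an illustration of the model class.
  # Shallow tree to define leaves, then a logistic regression per leaf.
  library(rpart)
  set.seed(1)
  d <- data.frame(x1 = rnorm(600), x2 = rnorm(600))
  d$y <- rbinom(600, 1, plogis(ifelse(d$x1 > 0, -1 + 2 * d$x2, 1 - 2 * d$x2)))
  tree <- rpart(factor(y) ~ x1, data = d,
                control = rpart.control(maxdepth = 1, cp = 0.001, minbucket = 50))
  d$leaf <- factor(tree$where)                   # leaf membership of each row
  leaf_fits <- lapply(split(d, d$leaf), function(dd)
    glm(y ~ x2, data = dd, family = binomial()))
  lapply(leaf_fits, coef)                        # the x2 slope differs between leaves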
tidybins:Make Tidy Bins
Multiple ways to bin numeric columns with a tidy output. Wraps a variety of existing binning methods into one function, and includes a new method for binning by equal value, which is useful for sales data. Provides a function to automatically summarize the properties of the binned columns.
Maintained by Harrison Tietze. Last updated 10 months ago.
4 stars 4.78 score 2 scripts 1 dependentscefet-rj-dal
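For comparison, equal-width, equal-frequency and equal-value bins can be built directly in base R; this is a generic illustration, not the package's interface, and "equal value" is taken here on one reading of the term: each bin carries roughly the same share of the total.
  # Three ways to cut a numeric sales column into 4 bins.
  set.seed(1)
  sales <- rgamma(100, shape = 2, scale = 50)
  equal_width <- cut(sales, breaks = 4)                      # same interval width
  equal_freq  <- cut(sales, breaks = quantile(sales, 0:4 / 4),
                     include.lowest = TRUE)                  # same count per bin
  ord <- order(sales)                                        # "equal value" bins
  equal_value <- integer(length(sales))
  equal_value[ord] <- ceiling(cumsum(sales[ord]) / sum(sales) * 4)
  table(equal_freq)                                          # balanced counts
  tapply(sales, equal_value, sum)                            # roughly balanced totals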
heimdall:Drift Adaptable Models
By analyzing streaming datasets, it is possible to observe significant changes in the data distribution or in model accuracy during prediction (concept drift). The goal of 'heimdall' is to measure when concept drift occurs. The package makes available several state-of-the-art methods. It also tackles how to adapt models in a nonstationary context. Some concept drift methods are described in Tavares (2022) <doi:10.1007/s12530-021-09415-z>.
Maintained by Eduardo Ogasawara. Last updated 2 months ago.
2 stars 4.77 score 45 scriptshknd23
DeepLearningCausal:Causal Inference with Super Learner and Deep Neural Networks
Functions to estimate Conditional Average Treatment Effects (CATE) and Population Average Treatment Effects on the Treated (PATT) from experimental or observational data using the Super Learner (SL) ensemble method and Deep neural networks. The package first provides functions to implement meta-learners such as the Single-learner (S-learner) and Two-learner (T-learner) described in Künzel et al. (2019) <doi:10.1073/pnas.1804597116> for estimating the CATE. The S- and T-learner are each estimated using the SL ensemble method and deep neural networks. It then provides functions to implement the Ottoboni and Poulos (2020) <doi:10.1515/jci-2018-0035> PATT-C estimator to obtain the PATT from experimental data with noncompliance by using the SL ensemble method and deep neural networks.
Maintained by Nguyen K. Huynh. Last updated 2 months ago.
causal-inferencedeep-neural-networksmachine-learning
2 stars 4.73 score 5 scriptsbioc
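The T-learner mentioned above is easy to sketch with plain lm() fits standing in for the Super Learner and deep-network learners the package actually uses: fit one outcome model per arm and take the difference of predictions as the CATE estimate.
  # T-learner sketch: separate outcome models by arm, CATE = difference in predictions.
  set.seed(1)
  n <- 2000
  x <- rnorm(n)
  w <- rbinom(n, 1, 0.5)                         # randomised treatment
  y <- 1 + 2 * x + w * (1 + x) + rnorm(n)        # true CATE(x) = 1 + x
  d <- data.frame(x, w, y)
  m1 <- lm(y ~ x, data = subset(d, w == 1))      # treated-arm model
  m0 <- lm(y ~ x, data = subset(d, w == 0))      # control-arm model
  cate_hat <- predict(m1, d) - predict(m0, d)
  head(cbind(true = 1 + d$x, estimated = cate_hat))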
TOP:TOP Constructs Transferable Model Across Gene Expression Platforms
TOP constructs a transferable model across gene expression platforms for prospective experiments. Such a transferable model can be trained to make predictions on independent validation data with an accuracy that is similar to a re-substituted model. The TOP procedure also has the flexibility to be adapted to suit the most common clinical response variables, including linear response, binomial and Cox PH models.
Maintained by Harry Robertson. Last updated 5 months ago.
softwaresurvivalgeneexpression
4.70 score 50 scriptspierreroudier
dissever:Spatial Downscaling using the Dissever Algorithm
Spatial downscaling of coarse grid mapping to fine grid mapping using predictive covariates and a model fitted using the 'caret' package. The original dissever algorithm was published by Malone et al. (2012) <doi:10.1016/j.cageo.2011.08.021>, and extended by Roudier et al. (2017) <doi:10.1016/j.compag.2017.08.021>.
Maintained by Pierre Roudier. Last updated 5 years ago.
10 stars 4.70 score 6 scriptsduolajiang
RCTrep:Validation of Estimates of Treatment Effects in Observational Data
Validates estimates of (conditional) average treatment effects obtained using observational data by a) making it easy to obtain and visualize estimates derived using a large variety of methods (G-computation, inverse propensity score weighting, etc.), and b) ensuring that estimates are easily compared to a gold standard (i.e., estimates derived from randomized controlled trials). 'RCTrep' offers a generic protocol for treatment effect validation based on four simple steps, namely, set-selection, estimation, diagnosis, and validation. 'RCTrep' provides a simple dashboard to review the obtained results. The validation approach is introduced by Shen, L., Geleijnse, G. and Kaptein, M. (2023) <doi:10.21203/rs.3.rs-2559287/v1>.
Maintained by Lingjie Shen. Last updated 2 years ago.
8 stars 4.68 score 12 scriptssahirbhatnagar
eclust:Environment Based Clustering for Interpretable Predictive Models in High Dimensional Data
Companion package to the paper: An analytic approach for interpretable predictive models in high dimensional data, in the presence of interactions with exposures. Bhatnagar, Yang, Khundrakpam, Evans, Blanchette, Bouchard, Greenwood (2017) <DOI:10.1101/102475>. This package includes an algorithm for clustering high dimensional data that can be affected by an environmental factor.
Maintained by Sahir Rai Bhatnagar. Last updated 8 years ago.
2 stars 4.62 score 14 scriptsbioc
branchpointer:Prediction of intronic splicing branchpoints
Predicts branchpoint probability for sites in intronic branchpoint windows. Queries can be supplied as intronic regions; or to evaluate the effects of mutations, SNPs.
Maintained by Beth Signal. Last updated 5 months ago.
softwaregenomeannotationgenomicvariationmotifannotation
4.62 score 21 scriptsabichat
scimo:Extra Recipes Steps for Dealing with Omics Data
Omics data (e.g. transcriptomics, proteomics, metagenomics...) offer a detailed and multi-dimensional perspective on the molecular components and interactions within complex biological (eco)systems. Analyzing these data requires adapted procedures, which are implemented as steps according to the 'recipes' package.
Maintained by Antoine BICHAT. Last updated 10 months ago.
4 stars 4.60 score 4 scriptsccy-dev
LongDat:A Tool for 'Covariate'-Sensitive Longitudinal Analysis on 'omics' Data
This tool takes a longitudinal dataset as input and analyzes whether there is significant change in the features over time (a proxy for treatments), while detecting and controlling for 'covariates' simultaneously. 'LongDat' is able to take in several data types as input, including count, proportion, binary, ordinal and continuous data. The output table contains p values, effect sizes and 'covariates' of each feature, making the downstream analysis easy.
Maintained by Chia-Yu Chen. Last updated 4 months ago.
4 stars 4.60 score 4 scriptsbioc
MAIT:Statistical Analysis of Metabolomic Data
The MAIT package contains functions to perform end-to-end statistical analysis of LC/MS metabolomic data. Special emphasis is put on peak annotation and on the modular design of its functions.
Maintained by Pol Sola-Santos. Last updated 5 months ago.
immunooncologymassspectrometrymetabolomicssoftware
4.60 score 20 scriptsbioc
SVMDO:Identification of Tumor-Discriminating mRNA Signatures via Support Vector Machines Supported by Disease Ontology
It is an easy-to-use GUI using disease information for detecting tumor/normal sample discriminating gene sets from differentially expressed genes. Our approach is based on an iterative algorithm that filters genes using disease ontology enrichment analysis and the Wilks' lambda criterion, connected to SVM classification model construction. Along with gene set extraction, SVMDO also provides individual prognostic marker detection. The algorithm is designed for FPKM- and RPKM-normalized RNA-Seq transcriptome datasets.
Maintained by Mustafa Erhan Ozer. Last updated 5 months ago.
genesetenrichmentdifferentialexpressionguiclassificationrnaseqtranscriptomicssurvivalmachine-learningrna-seqshiny
4.60 score 2 scriptsriccardo-df
aggTrees:Aggregation Trees
Nonparametric data-driven approach to discovering heterogeneous subgroups in a selection-on-observables framework. 'aggTrees' allows researchers to assess whether there exists relevant heterogeneity in treatment effects by generating a sequence of optimal groupings, one for each level of granularity. For each grouping, we obtain point estimation and inference about the group average treatment effects. Please reference the use as Di Francesco (2024) <doi:10.48550/arXiv.2410.11408>.
Maintained by Riccardo Di Francesco. Last updated 1 months ago.
4.60 score 4 scriptsssa-statistical-team-projects
povmap:Extension to the 'emdi' Package
The R package 'povmap' supports small area estimation of means and poverty headcount rates. It adds several new features to the 'emdi' package (see "The R Package emdi for Estimating and Mapping Regionally Disaggregated Indicators" by Kreutzmann et al. (2019) <doi:10.18637/jss.v091.i07>). These include new options for incorporating survey weights, ex-post benchmarking of estimates, two additional transformations, several new convenient functions to assist with reporting results, and a wrapper function to facilitate access from 'Stata'.
Maintained by Ifeanyi Edochie. Last updated 5 months ago.
1 stars 4.60 score 10 scriptscolemanrharris
mxnorm:Apply Normalization Methods to Multiplexed Images
Implements methods to normalize multiplexed imaging data, including statistical metrics and visualizations to quantify technical variation in this data type. Reference for methods listed here: Harris, C., Wrobel, J., & Vandekar, S. (2022). mxnorm: An R Package to Normalize Multiplexed Imaging Data. Journal of Open Source Software, 7(71), 4180, <doi:10.21105/joss.04180>.
Maintained by Coleman Harris. Last updated 2 years ago.
7 stars 4.54 score 7 scriptsvascobranco
red:IUCN Redlisting Tools
Includes algorithms to facilitate the assessment of extinction risk of species according to the IUCN (International Union for Conservation of Nature, see <https://www.iucn.org/> for more information) red list criteria.
Maintained by Vasco V. Branco. Last updated 3 months ago.
1 stars 4.54 score 29 scripts 1 dependentserblast
parcats:Interactive Parallel Categories Diagrams for 'easyalluvial'
Complex graphical representations of data are best explored using interactive elements. 'parcats' adds interactive graphing capabilities to the 'easyalluvial' package. The 'plotly.js' parallel categories diagrams offer a good framework for creating interactive flow graphs that allow manual drag and drop sorting of dimensions and categories, highlighting single flows and displaying mouse over information. The 'plotly.js' dependency is quite heavy and therefore is outsourced into a separate package.
Maintained by Bjoern Koneswarakantha. Last updated 1 years ago.
25 stars 4.53 score 27 scriptsbioc
tLOH:Assessment of evidence for LOH in spatial transcriptomics pre-processed data using Bayes factor calculations
tLOH, or transcriptomicsLOH, assesses evidence for loss of heterozygosity (LOH) in pre-processed spatial transcriptomics data. This tool requires spatial transcriptomics cluster and allele count information at likely heterozygous single-nucleotide polymorphism (SNP) positions in VCF format. Bayes factors are calculated at each SNP to determine likelihood of potential loss of heterozygosity event. Two plotting functions are included to visualize allele fraction and aggregated Bayes factor per chromosome. Data generated with the 10X Genomics Visium Spatial Gene Expression platform must be pre-processed to obtain an individual sample VCF with columns for each cluster. Required fields are allele depth (AD) with counts for reference/alternative alleles and read depth (DP).
Maintained by Michelle Webb. Last updated 5 months ago.
copynumbervariationtranscriptionsnpgeneexpressiontranscriptomics
3 stars 4.48 score 4 scriptsenriquegit
ssr:Semi-Supervised Regression Methods
An implementation of semi-supervised regression methods including self-learning and co-training by committee based on Hady, M. F. A., Schwenker, F., & Palm, G. (2009) <doi:10.1007/978-3-642-04274-4_13>. Users can define which set of regressors to use as base models from the 'caret' package, other packages, or custom functions.
Maintained by Enrique Garcia-Ceja. Last updated 6 years ago.
data-sciencemachine-learningregressionsemi-supervised-learning
2 stars 4.46 score 29 scriptsnhejazi
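The self-learning strategy described above can be sketched with a single lm() base model: fit on the labelled rows, pseudo-label a batch of unlabelled rows with the model's own predictions, refit, and repeat. The package's co-training by committee and its 'caret' integration are not shown.
  # Bare-bones self-training for regression.
  set.seed(1)
  d <- data.frame(x = runif(300, -2, 2))
  d$y <- sin(d$x) + rnorm(300, sd = 0.2)
  labelled   <- d[1:60, ]
  unlabelled <- d[-(1:60), "x", drop = FALSE]
  for (iter in 1:5) {
    fit <- lm(y ~ poly(x, 3), data = labelled)
    if (nrow(unlabelled) == 0) break
    take   <- seq_len(min(30, nrow(unlabelled)))  # a batch; real self-learning would
    pseudo <- data.frame(x = unlabelled$x[take],  # keep only the most reliable predictions
                         y = predict(fit, unlabelled[take, , drop = FALSE]))
    labelled   <- rbind(labelled, pseudo)
    unlabelled <- unlabelled[-take, , drop = FALSE]
  }
  summary(fit)$r.squared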
medoutcon:Efficient Natural and Interventional Causal Mediation Analysis
Efficient estimators of interventional (in)direct effects in the presence of mediator-outcome confounding affected by exposure. The effects estimated allow for the impact of the exposure on the outcome through a direct path to be disentangled from that through mediators, even in the presence of intermediate confounders that complicate such a relationship. Currently supported are non-parametric efficient one-step and targeted minimum loss estimators based on the formulation of Díaz, Hejazi, Rudolph, and van der Laan (2020) <doi:10.1093/biomet/asaa085>. Support for efficient estimation of the natural (in)direct effects is also provided, appropriate for settings in which intermediate confounders are absent. The package also supports estimation of these effects when the mediators are measured using outcome-dependent two-phase sampling designs (e.g., case-cohort).
Maintained by Nima Hejazi. Last updated 1 years ago.
causal-inferencecausal-machine-learninginverse-probability-weightsmachine-learningmediation-analysisstochastic-interventionstargeted-learningtreatment-effects
13 stars 4.46 score 22 scriptssarahleavitt
nbTransmission:Naive Bayes Transmission Analysis
Estimates the relative transmission probabilities between cases in an infectious disease outbreak or cluster using naive Bayes. Included are various functions to use these probabilities to estimate transmission parameters such as the generation/serial interval and reproductive number as well as finding the contribution of covariates to the probabilities and visualizing results. The ideal use is for an infectious disease dataset with metadata on the majority of cases but more informative data such as contact tracing or pathogen whole genome sequencing on only a subset of cases. For a detailed description of the methods see Leavitt et al. (2020) <doi:10.1093/ije/dyaa031>.
Maintained by Sarah V Leavitt. Last updated 19 days ago.
4 stars 4.45 score 14 scriptsbioc
PRONE:The PROteomics Normalization Evaluator
High-throughput omics data are often affected by systematic biases introduced throughout all the steps of a clinical study, from sample collection to quantification. Normalization methods aim to adjust for these biases to make the actual biological signal more prominent. However, selecting an appropriate normalization method is challenging due to the wide range of available approaches. Therefore, a comparative evaluation of unnormalized and normalized data is essential in identifying an appropriate normalization strategy for a specific data set. This R package provides different functions for preprocessing, normalizing, and evaluating different normalization approaches. Furthermore, normalization methods can be evaluated on downstream steps, such as differential expression analysis and statistical enrichment analysis. Spike-in data sets with known ground truth and real-world data sets of biological experiments acquired by either tandem mass tag (TMT) or label-free quantification (LFQ) can be analyzed.
Maintained by Lis Arend. Last updated 10 days ago.
proteomics preprocessing normalization differentialexpression visualization data-analysis evaluation
2 stars 4.41 score 9 scripts
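
The kind of comparison PRONE automates can be approximated by hand. The code below is generic base R, not PRONE's API: apply two candidate normalizations to a log-intensity matrix and check how much within-replicate-group spread each leaves behind.

    ## Generic comparison of two normalizations on a proteins x samples matrix;
    ## pooled within-group variance is used as a crude evaluation metric.
    set.seed(4)
    mat <- matrix(rnorm(1000 * 12, mean = 20, sd = 2), nrow = 1000)
    mat <- sweep(mat, 2, runif(12, -1.5, 1.5), "+")   # inject per-sample bias
    group <- rep(c("A", "B"), each = 6)

    ## 1) median centring
    med_norm <- sweep(mat, 2, apply(mat, 2, median), "-") + median(mat)
    ## 2) quantile normalization
    ref    <- rowMeans(apply(mat, 2, sort))
    q_norm <- apply(mat, 2, function(col) ref[rank(col, ties.method = "first")])

    ## evaluation: mean within-group variance per protein (lower = less technical noise)
    within_var <- function(x) mean(sapply(split(1:12, group),
                                          function(i) mean(apply(x[, i], 1, var))))
    c(raw = within_var(mat), median = within_var(med_norm), quantile = within_var(q_norm))
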
acabassi
klic:Kernel Learning Integrative Clustering
Kernel Learning Integrative Clustering (KLIC) is an algorithm that combines multiple kernels, each representing a different measure of similarity between a set of observations. The contribution of each kernel to the final clustering is weighted according to the amount of information it carries. As well as providing the functions required to perform the kernel-based clustering, this package also allows the user to simply supply the data as input: the kernels are then built using consensus clustering. Different strategies for choosing the best number of clusters are also available. For further details please see Cabassi and Kirk (2020) <doi:10.1093/bioinformatics/btaa593>.
Maintained by Alessandra Cabassi. Last updated 5 years ago.
cluster-analysis clustering coca genomics integrative-clustering kernel-methods multi-omics
5 stars 4.40 score 10 scripts
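
The multiple-kernel idea can be sketched in base R. This is not the klic API: klic learns the kernel weights and builds its kernels via consensus clustering, whereas the weights below are fixed by hand. The sketch computes one similarity kernel per data view, combines them with weights, and clusters the combined kernel with spectral clustering.

    ## Weighted multiple-kernel combination + spectral clustering, base R only.
    set.seed(5)
    n  <- 90
    cl <- rep(1:3, each = n / 3)                             # true groups
    view1 <- matrix(rnorm(n * 5, mean = cl), ncol = 5)       # informative data view
    view2 <- matrix(rnorm(n * 5, mean = 0.2 * cl), ncol = 5) # weaker data view

    rbf <- function(x, sigma = 2) exp(-as.matrix(dist(x))^2 / (2 * sigma^2))
    K <- 0.7 * rbf(view1) + 0.3 * rbf(view2)                 # weighted kernel sum

    ## spectral clustering of the combined kernel
    d    <- rowSums(K)
    Lsym <- diag(n) - diag(1 / sqrt(d)) %*% K %*% diag(1 / sqrt(d))
    U    <- eigen(Lsym, symmetric = TRUE)$vectors[, (n - 2):n]  # 3 smallest eigenvalues
    U    <- U / sqrt(rowSums(U^2))                           # row-normalise
    table(kmeans(U, centers = 3, nstart = 20)$cluster, cl)   # compare with truth
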
robson-fernandes
dbnlearn:Dynamic Bayesian Network Structure Learning, Parameter Learning and Forecasting
It allows the user to learn the structure of a dynamic Bayesian network from univariate time series, estimate its parameters, and produce forecasts. Implements dynamic Bayesian networks with temporal windows, with collections of linear regressors for Gaussian nodes, based on the introductory texts of Korb and Nicholson (2010) <doi:10.1201/b10391> and Nagarajan, Scutari and Lèbre (2013) <doi:10.1007/978-1-4614-6446-4>.
Maintained by Robson Fernandes. Last updated 5 years ago.
bayesian-inference bayesian-networks dynamic-bayesian-networks probabilistic-graphical-models time-series
16 stars 4.32 score 26 scripts
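
The temporal-window construction can be sketched without dbnlearn: embed the univariate series into a matrix of lagged copies, fit a linear Gaussian model of the current value on its lags, and forecast recursively. The code below uses only base R and is not dbnlearn's interface.

    ## Lagged linear-Gaussian model for a univariate series: a temporal-window sketch.
    set.seed(6)
    y <- as.numeric(arima.sim(list(ar = c(0.6, 0.2)), n = 300))
    p <- 4                                           # window size (number of lags)

    E   <- embed(y, p + 1)                           # col 1 = y_t, cols 2..p+1 = its lags
    dat <- as.data.frame(E)
    names(dat) <- c("y", paste0("lag", 1:p))
    fit <- lm(y ~ ., data = dat)                     # linear regressors, Gaussian node

    ## recursive multi-step forecast
    h      <- 10
    window <- tail(y, p)                             # most recent p observations
    fc     <- numeric(h)
    for (i in 1:h) {
      newd        <- as.data.frame(t(rev(window)))   # newest value becomes lag1
      names(newd) <- paste0("lag", 1:p)
      fc[i]       <- predict(fit, newdata = newd)
      window      <- c(window[-1], fc[i])            # slide the window forward
    }
    fc
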
bioc
PDATK:Pancreatic Ductal Adenocarcinoma Tool-Kit
Pancreatic ductal adenocarcinoma (PDA) has a relatively poor prognosis and is one of the most lethal cancers. Molecular classification of gene expression profiles holds the potential to identify meaningful subtypes that can inform therapeutic strategy in the clinical setting. The Pancreatic Cancer Adenocarcinoma Tool-Kit (PDATK) provides an S4 class-based interface for performing unsupervised subtype discovery, cross-cohort meta-clustering, gene-expression-based classification, and subsequent survival analysis to identify prognostically useful subtypes in pancreatic cancer and beyond. Two novel methods, Consensus Subtypes in Pancreatic Cancer (CSPC) and Pancreatic Cancer Overall Survival Predictor (PCOSP), are included for consensus-based meta-clustering and overall-survival prediction, respectively. Additionally, four published subtype classifiers and three published prognostic gene signatures are included to allow users to easily recreate published results, apply existing classifiers to new data, and benchmark the relative performance of new methods. The use of existing Bioconductor classes as input to all PDATK classes and methods enables integration with existing Bioconductor datasets, including the 21 pancreatic cancer patient cohorts available in the MetaGxPancreas data package. PDATK has been used to replicate results from Sandhu et al. (2019) <doi:10.1200/cci.18.00102>, and an additional paper is in the works using CSPC to validate subtypes from the included published classifiers; both use the data available in MetaGxPancreas. The inclusion of subtype centroids and prognostic gene signatures from these and other publications will enable researchers and clinicians to classify novel patient gene expression data, allowing the direct clinical application of the classifiers included in PDATK. Overall, PDATK provides a rich set of tools to identify and validate useful prognostic and molecular subtypes based on gene-expression data, benchmark new classifiers against existing ones, and apply discovered classifiers to novel patient data to inform clinical decision making.
Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.
geneexpression pharmacogenetics pharmacogenomics software classification survival clustering geneprediction
1 stars 4.31 score 17 scripts
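
A deliberately simplified version of the workflow PDATK formalizes, using k-means and the survival package on simulated data rather than PDATK's S4 classes and its CSPC/PCOSP methods: discover subtypes from expression, then test whether they separate survival.

    ## Toy subtype discovery + survival comparison; not PDATK's S4 interface.
    library(survival)
    set.seed(7)
    n_pat <- 120
    expr  <- matrix(rnorm(n_pat * 50), nrow = n_pat)      # patients x genes
    expr[1:60, 1:10] <- expr[1:60, 1:10] + 2              # built-in subtype signal

    subtype <- kmeans(scale(expr), centers = 2, nstart = 25)$cluster
    ## simulated follow-up with worse outcomes in one subtype
    time    <- rexp(n_pat, rate = ifelse(subtype == 1, 0.08, 0.04))
    status  <- rbinom(n_pat, 1, 0.8)                      # 1 = event observed

    ## do the discovered subtypes separate survival?
    survdiff(Surv(time, status) ~ subtype)                # log-rank test
    coxph(Surv(time, status) ~ factor(subtype))           # hazard ratio estimate
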
bioc
m6Aboost:m6Aboost
This package helps users run the m6Aboost model on their own miCLIP2 data. It includes functions to assign read counts and extract the features required to run the m6Aboost model. The miCLIP2 data should be stored in a GRanges object. More details can be found in the vignette.
Maintained by You Zhou. Last updated 5 months ago.
sequencing epigenetics genetics experimenthub software
2 stars 4.30 score 5 scripts
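
A minimal illustration of the expected input format and of tallying read support at candidate sites with GenomicRanges; this uses the generic countOverlaps() rather than m6Aboost's own read-assignment and feature-extraction functions.

    ## Candidate m6A sites and aligned reads as GRanges; counting read support.
    ## Generic GenomicRanges code, not m6Aboost's feature-extraction functions.
    library(GenomicRanges)
    set.seed(8)
    sites <- GRanges("chr1",
                     IRanges(start = c(100, 250, 400), width = 1),
                     strand = "+")
    reads <- GRanges("chr1",
                     IRanges(start = sample(80:420, 50, replace = TRUE), width = 30),
                     strand = "+")
    sites$read_count <- countOverlaps(sites, reads)   # per-site read support
    sites
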
bioc
AnVILBilling:Provide functions to retrieve and report on usage expenses in NHGRI AnVIL (anvilproject.org).
AnVILBilling helps monitor AnVIL-related costs in R, using queries to a BigQuery table to which costs are exported daily. Functions are defined to help categorize tasks and associated expenditures, and to visualize and explore expense profiles over time. This package will be expanded to help users estimate costs for specific task sets.
Maintained by Vince Carey. Last updated 5 months ago.
4.30 score 5 scripts
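
The same kind of query AnVILBilling issues can be sketched directly with bigrquery; the project, dataset, and table names below are placeholders for a user's own billing export and are not values supplied by the package.

    ## Querying a GCP billing export table directly with bigrquery.
    ## Placeholder project/dataset/table names; not AnVILBilling's own interface.
    library(bigrquery)

    sql <- "
      SELECT service.description AS service, SUM(cost) AS total_cost
      FROM `my-billing-project.my_dataset.gcp_billing_export_v1`  -- placeholder table
      WHERE usage_start_time >= TIMESTAMP('2024-01-01')
      GROUP BY service
      ORDER BY total_cost DESC"

    tb    <- bq_project_query("my-billing-project", sql)   # runs the query
    costs <- bq_table_download(tb)                          # results as a tibble
    costs
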