R-universe search: forests

merck

forestly:Interactive Forest Plot

Interactive forest plot for clinical trial safety analysis using 'metalite', 'reactable', 'plotly', and Analysis Data Model (ADaM) datasets. Includes functionality for adverse event filtering, incidence-based group filtering, hover-over reveals, and search and sort operations. The workflow allows for metadata construction, data preparation, output formatting, and interactive plot generation.

Maintained by Benjamin Wang. Last updated 2 months ago.

69.5 match 14 stars 7.59 score 12 scripts 1 dependent

david-cortes

isotree:Isolation-Based Outlier Detection

Fast and multi-threaded implementation of isolation forest (Liu, Ting, Zhou (2008) <doi:10.1109/ICDM.2008.17>), extended isolation forest (Hariri, Kind, Brunner (2018) <doi:10.48550/arXiv.1811.02141>), SCiForest (Liu, Ting, Zhou (2010) <doi:10.1007/978-3-642-15883-4_18>), fair-cut forest (Cortes (2021) <doi:10.48550/arXiv.2110.13402>), robust random-cut forest (Guha, Mishra, Roy, Schrijvers (2016) <http://proceedings.mlr.press/v48/guha16.html>), and customizable variations of them, for isolation-based outlier detection, clustered outlier detection, distance or similarity approximation (Cortes (2019) <doi:10.48550/arXiv.1910.12362>), isolation kernel calculation (Ting, Zhu, Zhou (2018) <doi:10.1145/3219819.3219990>), and imputation of missing values (Cortes (2019) <doi:10.48550/arXiv.1911.06646>), based on random or guided decision tree splitting, and providing different metrics for scoring anomalies based on isolation depth or density (Cortes (2021) <doi:10.48550/arXiv.2111.11639>). Provides simple heuristics for fitting the model to categorical columns and handling missing data, and offers options for varying between random and guided splits, and for using different splitting criteria.

Maintained by David Cortes. Last updated 13 days ago.

anomaly-detection imputation isolation-forest outlier-detection cpp openmp

48.0 match 203 stars 10.41 score 115 scripts 6 dependents

cran

grf:Generalized Random Forests

Forest-based statistical estimation and inference. GRF provides non-parametric methods for heterogeneous treatment effects estimation (optionally using right-censored outcomes, multiple treatment arms or outcomes, or instrumental variables), as well as least-squares regression, quantile regression, and survival regression, all with support for missing covariates.

Maintained by Erik Sverdrup. Last updated 4 months ago.

cpp

75.9 match 5.83 score 1.2k scripts 14 dependents

wviechtb

metafor:Meta-Analysis Package for R

A comprehensive collection of functions for conducting meta-analyses in R. The package includes functions to calculate various effect sizes or outcome measures, fit equal-, fixed-, random-, and mixed-effects models to such data, carry out moderator and meta-regression analyses, and create various types of meta-analytical plots (e.g., forest, funnel, radial, L'Abbe, Baujat, bubble, and GOSH plots). For meta-analyses of binomial and person-time data, the package also provides functions that implement specialized methods, including the Mantel-Haenszel method, Peto's method, and a variety of suitable generalized linear (mixed-effects) models (i.e., mixed-effects logistic and Poisson regression models). Finally, the package provides functionality for fitting meta-analytic multivariate/multilevel models that account for non-independent sampling errors and/or true effects (e.g., due to the inclusion of multiple treatment studies, multiple endpoints, or other forms of clustering). Network meta-analyses and meta-analyses accounting for known correlation structures (e.g., due to phylogenetic relatedness) can also be conducted. An introduction to the package can be found in Viechtbauer (2010) <doi:10.18637/jss.v036.i03>.

Maintained by Wolfgang Viechtbauer. Last updated 21 hours ago.

meta-analysis mixed-effects multilevel-models multivariate

25.0 match 246 stars 16.30 score 4.9k scripts 92 dependents

sollano

forestmangr:Forest Mensuration and Management

Processing forest inventory data with methods such as simple random sampling, stratified random sampling and systematic sampling. There are also functions for yield and growth predictions and model fitting, linear and nonlinear grouped data fitting, and statistical tests. References: Kershaw Jr., Ducey, Beers and Husch (2016). <doi:10.1002/9781118902028>.

Maintained by Sollano Rabelo Braga. Last updated 3 months ago.

50.2 match 17 stars 7.97 score 378 scripts

stochastictree

stochtree:Stochastic Tree Ensembles (XBART and BART) for Supervised Learning and Causal Inference

Flexible stochastic tree ensemble software. Robust implementations of Bayesian Additive Regression Trees (BART) Chipman, George, McCulloch (2010) <doi:10.1214/09-AOAS285> for supervised learning and Bayesian Causal Forests (BCF) Hahn, Murray, Carvalho (2020) <doi:10.1214/19-BA1195> for causal inference. Enables model serialization and parallel sampling and provides a low-level interface for custom stochastic forest samplers.

Maintained by Drew Herren. Last updated 16 days ago.

bart bayesian-machine-learning bayesian-methods decision-trees gradient-boosted-trees machine-learning probabilistic-models tree-ensembles cpp

42.0 match 20 stars 8.52 score 40 scripts

brandmaier

semtree:Recursive Partitioning for Structural Equation Models

SEM Trees and SEM Forests -- an extension of model-based decision trees and forests to Structural Equation Models (SEM). SEM trees hierarchically split empirical data into homogeneous groups each sharing similar data patterns with respect to a SEM by recursively selecting optimal predictors of these differences. SEM forests are an extension of SEM trees. They are ensembles of SEM trees each built on a random sample of the original data. By aggregating over a forest, we obtain measures of variable importance that are more robust than measures from single trees. A description of the method was published by Brandmaier, von Oertzen, McArdle, & Lindenberger (2013) <doi:10.1037/a0030001> and Arnold, Voelkle, & Brandmaier (2020) <doi:10.3389/fpsyg.2020.564403>.

Maintained by Andreas M. Brandmaier. Last updated 3 months ago.

bigdata decision-tree forest multivariate randomforest recursive-partitioning sem statistical-modeling structural-equation-modeling structural-equation-models

37.8 match 15 stars 8.63 score 68 scripts

emf-creaf

medfate:Mediterranean Forest Simulation

Simulate Mediterranean forest functioning and dynamics using cohort-based description of vegetation [De Caceres et al. (2015) <doi:10.1016/j.agrformet.2015.06.012>; De Caceres et al. (2021) <doi:10.1016/j.agrformet.2020.108233>].

Maintained by Miquel De Cáceres. Last updated 8 days ago.

cpp

40.6 match 11 stars 7.49 score 183 scripts 1 dependent

modeloriented

randomForestExplainer:Explaining and Visualizing Random Forests in Terms of Variable Importance

A set of tools to help explain which variables are most important in a random forest. Various variable importance measures are calculated and visualized in different settings in order to get an idea of how their importance changes depending on our criteria (Hemant Ishwaran and Udaya B. Kogalur and Eiran Z. Gorodeski and Andy J. Minn and Michael S. Lauer (2010) <doi:10.1198/jasa.2009.tm08622>, Leo Breiman (2001) <doi:10.1023/A:1010933404324>).

Maintained by Yue Jiang. Last updated 12 months ago.

random-forest

29.1 match 231 stars 9.82 score 236 scripts

simonpcouch

forested:Forest Attributes in Washington State

A small subset of plots in Washington State are sampled and assessed "on-the-ground" as forested or non-forested by the U.S. Department of Agriculture, Forest Service, Forest Inventory and Analysis (FIA) Program, but the FIA also has access to remotely sensed data for all land in the state. The 'forested' package contains a data frame by the same name intended for use in predictive modeling applications where the more easily-accessible remotely sensed data can be used to predict whether a plot is forested or non-forested.

Maintained by Simon Couch. Last updated 7 months ago.

58.0 match 7 stars 4.66 score 33 scripts

mlr-org

mlr3extralearners:Extra Learners For mlr3

Extra learners for use in mlr3.

Maintained by Sebastian Fischer. Last updated 4 months ago.

machine-learning mlr3

25.3 match 94 stars 9.16 score 474 scripts

andrew-plowright

ForestTools:Tools for Analyzing Remote Sensing Forest Data

Tools for analyzing remote sensing forest data, including functions for detecting treetops from canopy models, outlining tree crowns, and calculating textural metrics.

Maintained by Andrew Plowright. Last updated 1 month ago.

32.9 match 73 stars 7.01 score 103 scripts 1 dependent

spatstat

spatstat.data:Datasets for 'spatstat' Family

Contains all the datasets for the 'spatstat' family of packages.

Maintained by Adrian Baddeley. Last updated 3 hours ago.

kernel-density point-process spatial-analysis spatial-data spatial-data-analysis spatstat statistical-analysis statistical-methods statistical-tests statistics

19.8 match 6 stars 11.02 score 186 scripts 228 dependents

adayim

forestploter:Create a Flexible Forest Plot

Create a forest plot based on the layout of the data. Confidence intervals in multiple columns by groups can be done easily. Editing the plot, inserting/adding text, applying a theme to the plot, and much more.

Maintained by Alimu Dayimu. Last updated 6 months ago.

forestplot

22.2 match 93 stars 9.31 score 207 scripts 4 dependents

mrcieu

TwoSampleMR:Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database

A package for performing Mendelian randomization using GWAS summary data. It uses the IEU OpenGWAS database <https://gwas.mrcieu.ac.uk/> to automatically obtain data, and a wide range of methods to run the analysis.

Maintained by Gibran Hemani. Last updated 9 days ago.

17.0 match 467 stars 11.23 score 1.7k scripts 1 dependent

kogalur

randomForestSRC:Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)

Fast OpenMP parallel computing of Breiman's random forests for univariate, multivariate, unsupervised, survival, competing risks, class imbalanced classification and quantile regression. New Mahalanobis splitting for correlated outcomes. Extreme random forests and randomized splitting. Suite of imputation methods for missing data. Fast random forests using subsampling. Confidence regions and standard errors for variable importance. New improved holdout importance. Case-specific importance. Minimal depth variable importance. Visualize trees on your Safari or Google Chrome browser. Anonymous random forests for data privacy.

Maintained by Udaya B. Kogalur. Last updated 2 months ago.

openmp

23.9 match 10 stars 7.90 score 1.2k scripts 12 dependents

imbs-hl

ranger:A Fast Implementation of Random Forests

A fast implementation of Random Forests, particularly suited for high dimensional data. Ensembles of classification, regression, survival and probability prediction trees are supported. Data from genome-wide association studies can be analyzed efficiently. In addition to data frames, datasets of class 'gwaa.data' (R package 'GenABEL') and 'dgCMatrix' (R package 'Matrix') can be directly analyzed.

Maintained by Marvin N. Wright. Last updated 4 months ago.

cpp

11.5 match 783 stars 16.22 score 9.2k scripts 189 dependents

ropensci

aorsf:Accelerated Oblique Random Forests

Fit, interpret, and compute predictions with oblique random forests. Includes support for partial dependence, variable importance, passing customized functions for variable importance and identification of linear combinations of features. Methods for the oblique random survival forest are described in Jaeger et al., (2023) <DOI:10.1080/10618600.2023.2231048>.

Maintained by Byron Jaeger. Last updated 2 days ago.

data-science oblique random-forest survival openblas cpp openmp

19.3 match 58 stars 9.21 score 60 scripts 1 dependent

tmlange

optRF:Optimising Random Forest Stability by Determining the Optimal Number of Trees

Calculating the stability of random forest with certain numbers of trees. The non-linear relationship between stability and numbers of trees is described using a logistic regression model and used to estimate the optimal number of trees.

Maintained by Thomas Martin Lange. Last updated 1 month ago.

35.1 match 4.78 score

jinli22

spm:Spatial Predictive Modeling

Introduction to some novel accurate hybrid methods of geostatistical and machine learning methods for spatial predictive modelling. It contains two commonly used geostatistical methods, two machine learning methods, four hybrid methods and two averaging methods. For each method, two functions are provided. One function is for assessing the predictive errors and accuracy of the method based on cross-validation. The other one is for generating spatial predictions using the method. For details please see: Li, J., Potter, A., Huang, Z., Daniell, J. J. and Heap, A. (2010) <https:www.ga.gov.au/metadata-gateway/metadata/record/gcat_71407> Li, J., Heap, A. D., Potter, A., Huang, Z. and Daniell, J. (2011) <doi:10.1016/j.csr.2011.05.015> Li, J., Heap, A. D., Potter, A. and Daniell, J. (2011) <doi:10.1016/j.envsoft.2011.07.004> Li, J., Potter, A., Huang, Z. and Heap, A. (2012) <https:www.ga.gov.au/metadata-gateway/metadata/record/74030>.

Maintained by Jin Li. Last updated 3 years ago.

29.7 match 3 stars 5.46 score 107 scripts 3 dependents

igraph

igraph:Network Analysis and Visualization

Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.

Maintained by Kirill Müller. Last updated 1 day ago.

complex-networks graph-algorithms graph-theory mathematics network-analysis network-graph fortran libxml2 glpk openblas cpp

7.6 match 581 stars 21.10 score 31k scripts 1.9k dependents

doserjef

rFIA:Estimation of Forest Variables using the FIA Database

The goal of 'rFIA' is to increase the accessibility and use of the United States Forest Services (USFS) Forest Inventory and Analysis (FIA) Database by providing a user-friendly, open source toolkit to easily query and analyze FIA Data. Designed to accommodate a wide range of potential user objectives, 'rFIA' simplifies the estimation of forest variables from the FIA Database and allows all R users (experts and newcomers alike) to unlock the flexibility inherent to the Enhanced FIA design. Specifically, 'rFIA' improves accessibility to the spatial-temporal estimation capacity of the FIA Database by producing space-time indexed summaries of forest variables within user-defined population boundaries. Direct integration with other popular R packages (e.g., 'dplyr', 'tidyr', and 'sf') facilitates efficient space-time query and data summary, and supports common data representations and API design. The package implements design-based estimation procedures outlined by Bechtold & Patterson (2005) <doi:10.2737/SRS-GTR-80>, and has been validated against estimates and sampling errors produced by FIA 'EVALIDator'. Current development is focused on the implementation of spatially-enabled model-assisted and model-based estimators to improve population, change, and ratio estimates.

Maintained by Jeffrey Doser. Last updated 7 days ago.

compute-estimates fia fia-database fia-datamart forest-inventory forest-variables inventories space-time spatial

25.9 match 49 stars 5.93 score

andyliaw-mrk

randomForest:Breiman and Cutlers Random Forests for Classification and Regression

Classification and regression based on a forest of trees using random inputs, based on Breiman (2001) <DOI:10.1023/A:1010933404324>.

Maintained by Andy Liaw. Last updated 6 months ago.

fortran

12.6 match 47 stars 12.11 score 35k scripts 282 dependents

guido-s

netmeta:Network Meta-Analysis using Frequentist Methods

A comprehensive set of functions providing frequentist methods for network meta-analysis (Balduzzi et al., 2023) <doi:10.18637/jss.v106.i02> and supporting Schwarzer et al. (2015) <doi:10.1007/978-3-319-21416-0>, Chapter 8 "Network Meta-Analysis": - frequentist network meta-analysis following Rücker (2012) <doi:10.1002/jrsm.1058>; - additive network meta-analysis for combinations of treatments (Rücker et al., 2020) <doi:10.1002/bimj.201800167>; - network meta-analysis of binary data using the Mantel-Haenszel or non-central hypergeometric distribution method (Efthimiou et al., 2019) <doi:10.1002/sim.8158>, or penalised logistic regression (Evrenoglou et al., 2022) <doi:10.1002/sim.9562>; - rankograms and ranking of treatments by the Surface under the cumulative ranking curve (SUCRA) (Salanti et al., 2013) <doi:10.1016/j.jclinepi.2010.03.016>; - ranking of treatments using P-scores (frequentist analogue of SUCRAs without resampling) according to Rücker & Schwarzer (2015) <doi:10.1186/s12874-015-0060-8>; - split direct and indirect evidence to check consistency (Dias et al., 2010) <doi:10.1002/sim.3767>, (Efthimiou et al., 2019) <doi:10.1002/sim.8158>; - league table with network meta-analysis results; - 'comparison-adjusted' funnel plot (Chaimani & Salanti, 2012) <doi:10.1002/jrsm.57>; - net heat plot and design-based decomposition of Cochran's Q according to Krahn et al. (2013) <doi:10.1186/1471-2288-13-35>; - measures characterizing the flow of evidence between two treatments by König et al. (2013) <doi:10.1002/sim.6001>; - automated drawing of network graphs described in Rücker & Schwarzer (2016) <doi:10.1002/jrsm.1143>; - partial order of treatment rankings ('poset') and Hasse diagram for 'poset' (Carlsen & Bruggemann, 2014) <doi:10.1002/cem.2569>; (Rücker & Schwarzer, 2017) <doi:10.1002/jrsm.1270>; - contribution matrix as described in Papakonstantinou et al. (2018) <doi:10.12688/f1000research.14770.3> and Davies et al. (2022) <doi:10.1002/sim.9346>; - subgroup network meta-analysis.

Maintained by Guido Schwarzer. Last updated 13 hours ago.

meta-analysis network-meta-analysis rstudio

12.1 match 33 stars 11.82 score 199 scripts 10 dependents

guido-s

meta:General Package for Meta-Analysis

User-friendly general package providing standard methods for meta-analysis and supporting Schwarzer, Carpenter, and Rücker <DOI:10.1007/978-3-319-21416-0>, "Meta-Analysis with R" (2015): - common effect and random effects meta-analysis; - several plots (forest, funnel, Galbraith / radial, L'Abbe, Baujat, bubble); - three-level meta-analysis model; - generalised linear mixed model; - logistic regression with penalised likelihood for rare events; - Hartung-Knapp method for random effects model; - Kenward-Roger method for random effects model; - prediction interval; - statistical tests for funnel plot asymmetry; - trim-and-fill method to evaluate bias in meta-analysis; - meta-regression; - cumulative meta-analysis and leave-one-out meta-analysis; - import data from 'RevMan 5'; - produce forest plot summarising several (subgroup) meta-analyses.

Maintained by Guido Schwarzer. Last updated 24 days ago.

meta-analysis rstudio

9.4 match 84 stars 14.84 score 2.3k scripts 29 dependents

jlmelville

rnndescent:Nearest Neighbor Descent Method for Approximate Nearest Neighbors

The Nearest Neighbor Descent method for finding approximate nearest neighbors by Dong and co-workers (2010) <doi:10.1145/1963405.1963487>. Based on the 'Python' package 'PyNNDescent' <https://github.com/lmcinnes/pynndescent>.

Maintained by James Melville. Last updated 8 months ago.

approximate-nearest-neighbor-search cpp

18.9 match 11 stars 7.31 score 75 scripts

blasbenito

spatialRF:Easy Spatial Modeling with Random Forest

Automatic generation and selection of spatial predictors for spatial regression with Random Forest. Spatial predictors are surrogates of variables driving the spatial structure of a response variable. The package offers two methods to generate spatial predictors from a distance matrix among training cases: 1) Moran's Eigenvector Maps (MEMs; Dray, Legendre, and Peres-Neto 2006 <DOI:10.1016/j.ecolmodel.2006.02.015>): computed as the eigenvectors of a weighted matrix of distances; 2) RFsp (Hengl et al. <DOI:10.7717/peerj.5518>): columns of the distance matrix used as spatial predictors. Spatial predictors help minimize the spatial autocorrelation of the model residuals and facilitate an honest assessment of the importance scores of the non-spatial predictors. Additionally, functions to reduce multicollinearity, identify relevant variable interactions, tune random forest hyperparameters, assess model transferability via spatial cross-validation, and explore model results via partial dependence curves and interaction surfaces are included in the package. The modelling functions are built around the highly efficient 'ranger' package (Wright and Ziegler 2017 <DOI:10.18637/jss.v077.i01>).

Maintained by Blas M. Benito. Last updated 3 years ago.

random-forest spatial-analysis spatial-regression

25.2 match 114 stars 5.45 score 49 scripts

smartdata-analysis-and-statistics

metamisc:Meta-Analysis of Diagnosis and Prognosis Research Studies

Facilitate frequentist and Bayesian meta-analysis of diagnosis and prognosis research studies. It includes functions to summarize multiple estimates of prediction model discrimination and calibration performance (Debray et al., 2019) <doi:10.1177/0962280218785504>. It also includes functions to evaluate funnel plot asymmetry (Debray et al., 2018) <doi:10.1002/jrsm.1266>. Finally, the package provides functions for developing multivariable prediction models from datasets with clustering (de Jong et al., 2021) <doi:10.1002/sim.8981>.

Maintained by Thomas Debray. Last updated 29 days ago.

meta-analysis prognosis prognostic-models

18.3 match 7 stars 7.48 score 102 scripts

okasag

orf:Ordered Random Forests

An implementation of the Ordered Forest estimator as developed in Lechner & Okasa (2019) <arXiv:1907.02436>. The Ordered Forest flexibly estimates the conditional probabilities of models with ordered categorical outcomes (so-called ordered choice models). In addition to common machine learning algorithms, the 'orf' package provides functions for estimating marginal effects as well as statistical inference thereof, and thus provides output similar to that of standard econometric models for ordered choice. The core forest algorithm relies on the fast C++ forest implementation from the 'ranger' package (Wright & Ziegler, 2017) <arXiv:1508.04409>.

Maintained by Gabriel Okasa. Last updated 3 years ago.

cpp

25.4 match 12 stars 5.38 score 22 scripts 2 dependents

cran

randomUniformForest:Random Uniform Forests for Classification, Regression and Unsupervised Learning

Ensemble model, for classification, regression and unsupervised learning, based on a forest of unpruned and randomized binary decision trees. Each tree is grown by sampling, with replacement, a set of variables at each node. Each cut-point is generated randomly, according to the continuous Uniform distribution. For each tree, data are either bootstrapped or subsampled. The unsupervised mode introduces clustering, dimension reduction and variable importance, using a three-layer engine. Random Uniform Forests are mainly aimed to lower correlation between trees (or trees residuals), to provide a deep analysis of variable importance and to allow native distributed and incremental learning.

Maintained by Saip Ciss. Last updated 3 years ago.

cpp

33.3 match 3 stars 3.77 score 99 scripts

mayer79

missRanger:Fast Imputation of Missing Values

Alternative implementation of the beautiful 'MissForest' algorithm used to impute mixed-type data sets by chaining random forests, introduced by Stekhoven, D.J. and Buehlmann, P. (2012) <doi:10.1093/bioinformatics/btr597>. Under the hood, it uses the lightning fast random forest package 'ranger'. Between the iterative model fitting, we offer the option of using predictive mean matching. This firstly avoids imputation with values not already present in the original data (like a value 0.3334 in 0-1 coded variable). Secondly, predictive mean matching tries to raise the variance in the resulting conditional distributions to a realistic level. This would allow, e.g., to do multiple imputation when repeating the call to missRanger(). Out-of-sample application is supported as well.

Maintained by Michael Mayer. Last updated 3 months ago.

imputation machine-learning missing-values random-forest

11.3 match 69 stars 11.07 score 208 scripts 6 dependents

insightsengineering

chevron:Standard TLGs for Clinical Trials Reporting

Provide standard tables, listings, and graphs (TLGs) libraries used in clinical trials. This package implements a structure to reformat the data with 'dunlin', create reporting tables using 'rtables' and 'tern' with standardized input arguments to enable quick generation of standard outputs. In addition, it also provides comprehensive data checks and script generation functionality.

Maintained by Joe Zhu. Last updated 23 days ago.

clinical-trials graphs listings nest reporting tables

14.7 match 12 stars 8.24 score 12 scripts

jernejjevsenak

MLFS:Machine Learning Forest Simulator

Climate-sensitive forest simulator based on the principles of machine learning. It simulates all key processes in the forest: radial growth, height growth, mortality, crown recession, regeneration and harvesting. The method for predicting tree heights was described by Skudnik and Jevšenak (2022) <doi:10.1016/j.foreco.2022.120017>, while the method for predicting basal area increments (BAI) was described by Jevšenak and Skudnik (2021) <doi:10.1016/j.foreco.2020.118601>.

Maintained by Jernej Jevsenak. Last updated 3 years ago.

33.7 match 2 stars 3.40 score 25 scripts

shangzhi-hong

RfEmpImp:Multiple Imputation using Chained Random Forests

An R package for multiple imputation using chained random forests. Implemented methods can handle missing data in mixed types of variables by using prediction-based or node-based conditional distributions constructed using random forests. For prediction-based imputation, the method based on the empirical distribution of out-of-bag prediction errors of random forests and the method based on normality assumption for prediction errors of random forests are provided for imputing continuous variables. And the method based on predicted probabilities is provided for imputing categorical variables. For node-based imputation, the method based on the conditional distribution formed by the predicting nodes of random forests, and the method based on proximity measures of random forests are provided. More details of the statistical methods can be found in Hong et al. (2020) <arXiv:2004.14823>.

Maintained by Shangzhi Hong. Last updated 2 years ago.

imputation missing-data random-forest

25.6 match 5 stars 4.40 score 8 scripts

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 year ago.

13.6 match 3 stars 8.20 score 7.8k scripts 11 dependents

skgrange

rmweather:Tools to Conduct Meteorological Normalisation and Counterfactual Modelling for Air Quality Data

An integrated set of tools to allow data users to conduct meteorological normalisation and counterfactual modelling for air quality data. The meteorological normalisation technique uses predictive random forest models to remove variation of pollutant concentrations so trends and interventions can be explored in a robust way. For examples, see Grange et al. (2018) <doi:10.5194/acp-18-6223-2018> and Grange and Carslaw (2019) <doi:10.1016/j.scitotenv.2018.10.344>. The random forest models can also be used for counterfactual or business as usual (BAU) modelling by using the models to predict, from the model's perspective, the future. For an example, see Grange et al. (2021) <doi:10.5194/acp-2020-1171>.

Maintained by Stuart K. Grange. Last updated 22 days ago.

17.5 match 49 stars 6.24 score 239 scripts

emf-creaf

medfateland:Mediterranean Landscape Simulation

Simulate forest hydrology, forest function and dynamics over landscapes [De Caceres et al. (2015) <doi:10.1016/j.agrformet.2015.06.012>]. Parallelization is allowed in several simulation functions and simulations may be conducted including spatial processes such as lateral water transfer and seed dispersal.

Maintained by Miquel De Cáceres. Last updated 24 days ago.

cpp

20.0 match 5 stars 5.41 score 41 scripts

ehrlinger

ggRandomForests:Visually Exploring Random Forests

Graphic elements for exploring Random Forests using the 'randomForest' or 'randomForestSRC' package for survival, regression and classification forests and 'ggplot2' package plotting.

Maintained by John Ehrlinger. Last updated 4 days ago.

11.8 match 148 stars 8.94 score 197 scripts

gforge

forestplot:Advanced Forest Plot Using 'grid' Graphics

Allows the creation of forest plots with advanced features, such as multiple confidence intervals per row, customizable fonts for individual text elements, and flexible confidence interval drawing. It also supports mixing text with mathematical expressions. The package extends the application of forest plots beyond traditional meta-analyses, offering a more general version of the original 'rmeta' package’s forestplot() function. It relies heavily on the 'grid' package for rendering the plots.

Maintained by Max Gordon. Last updated 4 months ago.

forestplot

9.0 match 43 stars 11.47 score 716 scripts 21 dependents

mrc-ide

epireview:Tools to update and summarise the latest pathogen data from the Pathogen Epidemiology Review Group (PERG)

Contains the latest open access pathogen data from the Pathogen Epidemiology Review Group (PERG). Tools are available to update pathogen databases with new peer-reviewed data as it becomes available, and to summarise the latest data using tables and figures.

Maintained by Sangeeta Bhatia. Last updated 23 hours ago.

15.2 match 30 stars 6.76 score 6 scripts

umr-amap

BIOMASS:Estimating Aboveground Biomass and Its Uncertainty in Tropical Forests

Contains functions to estimate aboveground biomass/carbon and its uncertainty in tropical forests. These functions allow users to (1) retrieve and correct taxonomy, (2) estimate wood density and its uncertainty, (3) construct height-diameter models, (4) manage tree and plot coordinates, (5) estimate the aboveground biomass/carbon at the stand level with associated uncertainty. To cite 'BIOMASS', please use citation("BIOMASS"). See more in the article of Réjou-Méchain et al. (2017) <doi:10.1111/2041-210X.12753>.

Maintained by Dominique Lamonica. Last updated 14 hours ago.

10.3 match 26 stars 9.90 score 68 scripts 1 dependent

bips-hb

arf:Adversarial Random Forests

Adversarial random forests (ARFs) recursively partition data into fully factorized leaves, where features are jointly independent. The procedure is iterative, with alternating rounds of generation and discrimination. Data becomes increasingly realistic at each round, until original and synthetic samples can no longer be reliably distinguished. This is useful for several unsupervised learning tasks, such as density estimation and data synthesis. Methods for both are implemented in this package. ARFs naturally handle unstructured data with mixed continuous and categorical covariates. They inherit many of the benefits of random forests, including speed, flexibility, and solid performance with default parameters. For details, see Watson et al. (2023) <https://proceedings.mlr.press/v206/watson23a.html>.

Maintained by Marvin N. Wright. Last updated 18 days ago.

15.3 match 14 stars 6.65 score 16 scripts

egenn

rtemis:Machine Learning and Visualization

Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.

Maintained by E.D. Gennatas. Last updated 1 month ago.

data-science data-visualization machine-learning machine-learning-library visualization

14.2 match 145 stars 7.09 score 50 scripts 2 dependents

prise6

aVirtualTwins:Adaptation of Virtual Twins Method from Jared Foster

Search for subgroups in randomized clinical trials with a binary outcome and two treatment groups. This is an adaptation of the Jared Foster method (<https://www.ncbi.nlm.nih.gov/pubmed/21815180>).

Maintained by Francois Vieille. Last updated 7 years ago.

subgroup-identification trials

21.6 match 4 stars 4.51 score 16 scripts

molina-valero

FORTLS:Automatic Processing of Terrestrial-Based Technologies Point Cloud Data for Forestry Purposes

Process automation of point cloud data derived from terrestrial-based technologies such as Terrestrial Laser Scanner (TLS) or Mobile Laser Scanner. 'FORTLS' enables (i) detection of trees and estimation of tree-level attributes (e.g. diameters and heights), (ii) estimation of stand-level variables (e.g. density, basal area, mean and dominant height), (iii) computation of metrics related to important forest attributes estimated in Forest Inventories at stand-level, and (iv) optimization of plot design for combining TLS data and field measured data. 'FORTLS' is documented in Molina-Valero et al. (2022, <doi:10.1016/j.envsoft.2022.105337>).

Maintained by Juan Alberto Molina-Valero. Last updated 3 months ago.

forest-inventory forest-monitoring lidar-point-cloud cpp

15.8 match 22 stars 6.16 score 11 scripts

crj32

MLeval:Machine Learning Model Evaluation

Straightforward and detailed evaluation of machine learning models. 'MLeval' can produce receiver operating characteristic (ROC) curves, precision-recall (PR) curves, calibration curves, and PR gain curves. 'MLeval' accepts a data frame of class probabilities and ground truth labels, or it can automatically interpret the results of the 'caret' train function from repeated cross-validation, then select the best model and analyse the results. 'MLeval' produces a range of evaluation metrics with confidence intervals.

Maintained by Christopher R John. Last updated 5 years ago.

16.9 match 6 stars 5.71 score 144 scripts

mkossmeier

metaviz:Forest Plots, Funnel Plots, and Visual Funnel Plot Inference for Meta-Analysis

A compilation of functions to create visually appealing and information-rich plots of meta-analytic data using 'ggplot2'. Currently allows users to create forest plots, funnel plots, and many of their variants, such as rainforest plots, thick forest plots, additional evidence contour funnel plots, and sunset funnel plots. In addition, functionalities for visual inference with the funnel plot in the context of meta-analysis are provided.

Maintained by Michael Kossmeier. Last updated 5 years ago.

funnel-plots rainforest-plots

13.0 match 17 stars 7.32 score 135 scripts

biodiverse

spOccupancy:Single-Species, Multi-Species, and Integrated Spatial Occupancy Models

Fits single-species, multi-species, and integrated non-spatial and spatial occupancy models using Markov Chain Monte Carlo (MCMC). Models are fit using Polya-Gamma data augmentation detailed in Polson, Scott, and Windle (2013) <doi:10.1080/01621459.2013.829001>. Spatial models are fit using either Gaussian processes or Nearest Neighbor Gaussian Processes (NNGP) for large spatial datasets. Details on NNGP models are given in Datta, Banerjee, Finley, and Gelfand (2016) <doi:10.1080/01621459.2015.1044091> and Finley, Datta, and Banerjee (2022) <doi:10.18637/jss.v103.i05>. Provides functionality for data integration of multiple single-species occupancy data sets using a joint likelihood framework. Details on data integration are given in Miller, Pacifici, Sanderlin, and Reich (2019) <doi:10.1111/2041-210X.13110>. Details on single-species and multi-species models are found in MacKenzie, Nichols, Lachman, Droege, Royle, and Langtimm (2002) <doi:10.1890/0012-9658(2002)083[2248:ESORWD]2.0.CO;2> and Dorazio and Royle <doi:10.1198/016214505000000015>, respectively.

Maintained by Jeffrey Doser. Last updated 20 days ago.

openblas cpp openmp

12.9 match 59 stars 7.31 score 204 scripts

anna-neufeld

splinetree:Longitudinal Regression Trees and Forests

Builds regression trees and random forests for longitudinal or functional data using a spline projection method. Implements and extends the work of Yu and Lambert (1999) <doi:10.1080/10618600.1999.10474847>. This method allows trees and forests to be built while considering either level and shape or only shape of response trajectories.

Maintained by Anna Neufeld. Last updated 6 years ago.

17.8 match 4 stars 5.24 score 29 scripts

jinghuazhao

gap:Genetic Analysis Package

As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many projects, which is also in line with the name (gap).

Maintained by Jing Hua Zhao. Last updated 15 days ago.

genetics imputation lmm fortran

7.8 match 12 stars 11.88 score 448 scripts 16 dependents

miriamesteve

eat:Efficiency Analysis Trees

Functions are provided to determine production frontiers and technical efficiency measures through non-parametric techniques based upon regression trees. The package includes code for estimating radial input, output, directional and additive measures, plotting graphical representations of the scores and the production frontiers by means of trees, and determining rankings of importance of input variables in the analysis. Additionally, an adaptation of Random Forest by a set of individual Efficiency Analysis Trees for estimating technical efficiency is also included. More details in: <doi:10.1016/j.eswa.2020.113783>.

Maintained by Miriam Esteve. Last updated 3 years ago.

19.5 match 5 stars 4.68 score 19 scripts

lleisong

itsdm:Isolation Forest-Based Presence-Only Species Distribution Modeling

Collection of R functions to do purely presence-only species distribution modeling with isolation forest (iForest) and its variations such as Extended isolation forest and SCiForest. See the details of these methods in references: Liu, F.T., Ting, K.M. and Zhou, Z.H. (2008) <doi:10.1109/ICDM.2008.17>, Hariri, S., Kind, M.C. and Brunner, R.J. (2019) <doi:10.1109/TKDE.2019.2947676>, Liu, F.T., Ting, K.M. and Zhou, Z.H. (2010) <doi:10.1007/978-3-642-15883-4_18>, Guha, S., Mishra, N., Roy, G. and Schrijvers, O. (2016) <https://proceedings.mlr.press/v48/guha16.html>, Cortes, D. (2021) <arXiv:2110.13402>. Additionally, Shapley values are used to explain model inputs and outputs. See details in references: Shapley, L.S. (1953) <doi:10.1515/9781400881970-018>, Lundberg, S.M. and Lee, S.I. (2017) <https://dl.acm.org/doi/abs/10.5555/3295222.3295230>, Molnar, C. (2020) <ISBN:978-0-244-76852-2>, Štrumbelj, E. and Kononenko, I. (2014) <doi:10.1007/s10115-013-0679-x>. itsdm also provides functions to diagnose variable response, analyze variable importance, draw spatial dependence of variables and examine variable contribution. As utilities, the package includes a few functions to download bioclimatic variables including 'WorldClim' version 2.0 (see Fick, S.E. and Hijmans, R.J. (2017) <doi:10.1002/joc.5086>) and 'CMCC-BioClimInd' (see Noce, S., Caporaso, L. and Santini, M. (2020) <doi:10.1038/s41597-020-00726-5>).

Maintained by Lei Song. Last updated 2 years ago.

isolation-forest outlier-detection presence-onlymodel shapley-value species-distribution-modelling

16.2 match 4 stars 5.59 score 65 scripts

farrellday

miceRanger:Multiple Imputation by Chained Equations with Random Forests

Multiple Imputation has been shown to be a flexible method to impute missing values by Van Buuren (2007) <doi:10.1177/0962280206074463>. Expanding on this, random forests have been shown to be an accurate model by Stekhoven and Buhlmann <arXiv:1105.0828> to impute missing values in datasets. They have the added benefits of returning out of bag error and variable importance estimates, as well as being simple to run in parallel.

Maintained by Sam Wilson. Last updated 3 years ago.

imputation-methods machine-learning mice missing-data missing-values random-forests

12.7 match 67 stars 7.09 score 41 scripts 1 dependents

nalzok

tree.interpreter:Random Forest Prediction Decomposition and Feature Importance Measure

An R re-implementation of the 'treeinterpreter' package on PyPI <https://pypi.org/project/treeinterpreter/>. Each prediction can be decomposed as 'prediction = bias + feature_1_contribution + ... + feature_n_contribution'. This decomposition is then used to calculate the Mean Decrease Impurity (MDI) and Mean Decrease Impurity using out-of-bag samples (MDI-oob) feature importance measures based on the work of Li et al. (2019) <arXiv:1906.10845>.

Maintained by Qingyao Sun. Last updated 5 years ago.

data-science datascience interpretability machine-learning random-forest cpp

15.3 match 12 stars 5.79 score 6 scripts
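
The decomposition 'prediction = bias + feature_1_contribution + ... + feature_n_contribution' quoted in the tree.interpreter entry above can be illustrated on a single toy tree. The sketch below is not the tree.interpreter or treeinterpreter API; the Node class, the decompose function, and the example tree are all hypothetical, and it assumes each node stores the mean training response of the samples reaching it. For a forest, the same quantities would typically be averaged over trees.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical toy tree node (an assumption, not a library type)."""
    value: float                    # mean training response at this node
    feature: Optional[int] = None   # index of the split feature; None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch taken when x[feature] <= threshold
    right: Optional["Node"] = None  # branch taken otherwise


def decompose(root, x, n_features):
    """Return (bias, contributions) so that prediction = bias + sum(contributions)."""
    contributions = [0.0] * n_features
    node = root
    while node.feature is not None:
        child = node.left if x[node.feature] <= node.threshold else node.right
        # Credit the change in node mean to the feature used for this split.
        contributions[node.feature] += child.value - node.value
        node = child
    # The bias is the root mean, i.e. the prediction before any split is applied.
    return root.value, contributions


# Made-up tree: split on feature 0 at 5.0, then on feature 1 at 2.0.
tree = Node(
    value=10.0, feature=0, threshold=5.0,
    left=Node(value=6.0),
    right=Node(
        value=14.0, feature=1, threshold=2.0,
        left=Node(value=12.0),
        right=Node(value=20.0),
    ),
)

bias, contribs = decompose(tree, [7.0, 3.0], n_features=2)
prediction = bias + sum(contribs)
print(bias, contribs, prediction)
```

Because every step along the path adds the difference between a child mean and its parent mean, the contributions telescope, and the sum always equals the value of the leaf that the sample reaches.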

uchidamizuki

timbr:Forest/Tree Data Frames

Provides data frames for forest or tree data structures. You can create forest data structures from data frames and process them based on their hierarchies.

Maintained by Mizuki Uchida. Last updated 4 months ago.

18.0 match 11 stars 4.93 score 31 scripts

csafe-isu

handwriterRF:Handwriting Analysis with Random Forests

Perform forensic handwriting analysis of two scanned handwritten documents. This package implements the statistical method described by Madeline Johnson and Danica Ommen (2021) <doi:10.1002/sam.11566>. Similarity measures and a random forest produce a score-based likelihood ratio that quantifies the strength of the evidence in favor of the documents being written by the same writer or different writers.

Maintained by Stephanie Reinders. Last updated 7 days ago.

jags cpp

14.1 match 2 stars 6.18 score 15 scripts 1 dependents

syoung9836

knfi:Analysis of Korean National Forest Inventory Database

Understanding the current status of forest resources is essential for monitoring changes in forest ecosystems and generating related statistics. In South Korea, the National Forest Inventory (NFI) surveys over 4,500 sample plots nationwide every five years and records 70 items, including forest stand, forest resource, and forest vegetation surveys. Many researchers use NFI as the primary data for research, such as biomass estimation or analyzing the importance value of each species over time and space, depending on the research purpose. However, the large volume of accumulated forest survey data from across the country can make it challenging to manage and utilize such a vast dataset. To address this issue, we developed an R package that efficiently handles large-scale NFI data across time and space. The package offers a comprehensive workflow for NFI data analysis. It starts with data processing, where the read_nfi() function reconstructs NFI data according to the researcher's needs while performing basic integrity checks for data quality. Following this, the package provides analytical tools that operate on the verified data. These include functions like summary_nfi() for summary statistics, diversity_nfi() for biodiversity analysis, iv_nfi() for calculating species importance value, and biomass_nfi() and cwd_biomass_nfi() for biomass estimation. Finally, for visualization, the tsvis_nfi() function generates graphs and maps, allowing users to visualize forest ecosystem changes across various spatial and temporal scales. This integrated approach and its specialized functions can enhance the efficiency of processing and analyzing NFI data, providing researchers with insights into forest ecosystems. The NFI Excel files (.xlsx) are not included in the R package and must be downloaded separately.
Users can access these NFI Excel files by visiting the Korea Forest Service Forestry Statistics Platform <https://kfss.forest.go.kr/stat/ptl/article/articleList.do?curMenu=11694&bbsId=microdataboard> to download the annual NFI Excel files, which are bundled in .zip archives. Please note that this website is only available in Korean, and direct download links can be found in the notes section of the read_nfi() function.

Maintained by Sinyoung Park. Last updated 3 months ago.

data-analysis-r forestry

19.4 match 1 stars 4.48 score 2 scripts

kearutherford

BerkeleyForestsAnalytics:Compute and Summarize Core Forest Metrics from Field Data

A suite of open-source R functions designed to produce standard metrics for forest management and ecology from forest inventory data. The overarching goal is to minimize potential inconsistencies introduced by the algorithms used to compute and summarize core forest metrics. Learn more about the purpose of the package and the specific algorithms used in the package at <https://github.com/kearutherford/BerkeleyForestsAnalytics>.

Maintained by Kea Rutherford. Last updated 2 months ago.

15.7 match 7 stars 5.50 score 4 scripts

gcicc

figuRes2:Support for a Variety of Figure Production Tasks

We view a figure as a collection of graphs/tables assembled on a page and optionally annotated with metadata (titles, headers and footers). Functions and supporting documentation are offered to streamline a variety of figure production tasks.

Maintained by Greg Cicconetti. Last updated 3 years ago.

17.9 match 3 stars 4.78 score

carlos-alberto-silva

ForestGapR:Tropical Forest Canopy Gaps Analysis

Set of tools for detecting and analyzing Airborne Laser Scanning-derived Tropical Forest Canopy Gaps. Details were published in Silva and others (2019) <doi:10.1111/2041-210X.13211>.

Maintained by Carlos Alberto Silva. Last updated 1 year ago.

16.1 match 29 stars 5.24 score 24 scripts

plantedml

randomPlantedForest:Random Planted Forest: A Directly Interpretable Tree Ensemble

An implementation of the Random Planted Forest algorithm for directly interpretable tree ensembles based on a functional ANOVA decomposition.

Maintained by Lukas Burk. Last updated 4 months ago.

intelligibility interpretable-machine-learning interpretable-ml machine-learning ml random-forest cpp

20.3 match 5 stars 4.15 score 38 scripts

caf-ifrit

forestat:Forest Carbon Sequestration and Potential Productivity Calculation

Includes assessing site classes based on the stand height growth and establishing a nonlinear mixed-effect biomass model under different site classes based on the whole stand model to achieve more accurate estimation of carbon sequestration. In particular, a carbon sequestration potential productivity calculation method based on the potential mean annual increment is proposed. This package is applicable to both natural forests and plantations. It can quantitatively assess a stand's potential productivity, realized productivity, and possible improvement under a certain site, and can be used in many aspects such as site quality assessment, tree species suitability evaluation, and forest degradation evaluation. Reference: Lei X, Fu L, Li H, et al (2018) <doi:10.11707/j.1001-7488.20181213>. Fu L, Sharma R P, Zhu G, et al (2017) <doi:10.3390/f8040119>.

Maintained by Yuanyuan Han. Last updated 1 year ago.

15.2 match 23 stars 5.52 score 29 scripts

mlr-org

mlr3mbo:Flexible Bayesian Optimization

A modern and flexible approach to Bayesian Optimization / Model Based Optimization building on the 'bbotk' package. 'mlr3mbo' is a toolbox providing both ready-to-use optimization algorithms as well as their fundamental building blocks allowing for straightforward implementation of custom algorithms. Single- and multi-objective optimization is supported as well as mixed continuous, categorical and conditional search spaces. Moreover, using 'mlr3mbo' for hyperparameter optimization of machine learning models within the 'mlr3' ecosystem is straightforward via 'mlr3tuning'. Examples of ready-to-use optimization algorithms include Efficient Global Optimization by Jones et al. (1998) <doi:10.1023/A:1008306431147>, ParEGO by Knowles (2006) <doi:10.1109/TEVC.2005.851274> and SMS-EGO by Ponweiser et al. (2008) <doi:10.1007/978-3-540-87700-4_78>.

Maintained by Lennart Schneider. Last updated 11 days ago.

automl bayesian-optimization bbotk black-box-optimization gaussian-process hpo hyperparameter hyperparameter-optimization hyperparameter-tuning machine-learning mlr3 model-based-optimization optimization optimizer random-forest tuning

9.5 match 25 stars 8.57 score 120 scripts 3 dependents

nicolas-robette

moreparty:A Toolbox for Conditional Inference Trees and Random Forests

Additions to the 'party' and 'partykit' packages: tools for the interpretation of forests (surrogate trees, prototypes, etc.), feature selection (see Gregorutti et al (2017) <arXiv:1310.5726>, Hapfelmeier and Ulm (2013) <doi:10.1016/j.csda.2012.09.020>, Altmann et al (2010) <doi:10.1093/bioinformatics/btq134>) and parallelized versions of conditional forest and variable importance functions. Also includes modules and a shiny app for conditional inference trees.

Maintained by Nicolas Robette. Last updated 11 months ago.

19.3 match 3 stars 4.18 score 8 scripts

talegari

solitude:An Implementation of Isolation Forest

Isolation forest is an anomaly detection method introduced in the paper 'Isolation-Based Anomaly Detection' (Liu, Ting and Zhou <doi:10.1145/2133360.2133363>).

Maintained by Komala Sheshachala Srikanth. Last updated 4 years ago.

isolation-forest outliers rpackages

15.4 match 24 stars 5.24 score 70 scripts 1 dependents

cran

metaumbrella:Umbrella Review Package for R

A comprehensive range of facilities to perform umbrella reviews with stratification of the evidence in R. The package accomplishes this aim by building on three core functions that: (i) automatically perform all required calculations in an umbrella review (including but not limited to meta-analyses), (ii) stratify evidence according to various classification criteria, and (iii) generate a visual representation of the results. Note that if you are not familiar with R, the core features of this package are available from a web browser (<https://www.metaumbrella.org/>).

Maintained by Corentin J Gosling. Last updated 15 days ago.

17.4 match 9 stars 4.56 score

stekhoven

missForest:Nonparametric Missing Value Imputation using Random Forest

The function 'missForest' in this package is used to impute missing values particularly in the case of mixed-type data. It uses a random forest trained on the observed values of a data matrix to predict the missing values. It can be used to impute continuous and/or categorical data including complex interactions and non-linear relations. It yields an out-of-bag (OOB) imputation error estimate without the need of a test set or elaborate cross-validation. It can be run in parallel to save computation time.

Maintained by Daniel J. Stekhoven. Last updated 1 year ago.

6.8 match 92 stars 11.53 score 1.1k scripts 32 dependents

tlverse

sl3:Pipelines for Machine Learning and Super Learning

A modern implementation of the Super Learner prediction algorithm, coupled with a general purpose framework for composing arbitrary pipelines for machine learning tasks.

Maintained by Jeremy Coyle. Last updated 4 months ago.

data-science ensemble-learning ensemble-model machine-learning model-selection regression stacking statistics

7.6 match 100 stars 9.94 score 748 scripts 7 dependents

traitecoevo

plant:A Package for Modelling Forest Trait Ecology and Evolution

Solves the trait-, size- and patch-structured model from Falster et al. (2016) using either the method of characteristics or as a stochastic, finite-sized population.

Maintained by Daniel Falster. Last updated 6 days ago.

c-plus-plus demography dynamic ecology evolution forests plant-physiology science-research simulation trait cpp

12.9 match 53 stars 5.87 score

tidymodels

tidypredict:Run Predictions Inside the Database

It parses a fitted 'R' model object, and returns a formula in 'Tidy Eval' code that calculates the predictions. It works with several database back-ends because it leverages 'dplyr' and 'dbplyr' for the final 'SQL' translation of the algorithm. It currently supports lm(), glm(), randomForest(), ranger(), earth(), xgb.Booster.complete(), cubist(), and ctree() models.

Maintained by Emil Hvitfeldt. Last updated 3 months ago.

dbplyr dplyr purrr rlang

6.9 match 261 stars 11.03 score 241 scripts 2 dependents

e-sensing

sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes

An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<https://stacspec.org/>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organized maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce training samples imbalance proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <https://torch.mlverse.org/>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <https://jakubnowosad.com/supercells/>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>. 
Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.

Maintained by Gilberto Camara. Last updated 30 days ago.

big-earth-data cbers earth-observation eo-datacubes geospatial image-time-series land-cover-classification landsat planetary-computer r-spatial remote-sensing rspatial satellite-image-time-series satellite-imagery sentinel-2 stac-api stac-catalog cpp

7.7 match 494 stars 9.50 score 384 scripts

anthonydevaux

DynForest:Random Forest with Multivariate Longitudinal Predictors

Based on the random forest principle, 'DynForest' is able to include multiple longitudinal predictors to provide individual predictions. Longitudinal predictors are modeled through the random forest. The methodology is fully described for a survival outcome in: Devaux, Helmer, Genuer & Proust-Lima (2023) <doi:10.1177/09622802231206477>.

Maintained by Anthony Devaux. Last updated 5 months ago.

11.3 match 16 stars 6.38 score 8 scripts

forestgeo

fgeo:Analyze Forest Diversity and Dynamics

To help you access, transform, analyze, and visualize ForestGEO data, we developed a collection of R packages (<https://forestgeo.github.io/fgeo/>). This package, in particular, helps you to install and load the entire package-collection with a single R command, and provides convenient ways to find relevant documentation. Most commonly, you should not worry about the individual packages that make up the package-collection as you can access all features via this package. To learn more about ForestGEO visit <http://www.forestgeo.si.edu/>.

Maintained by Mauro Lepore. Last updated 5 years ago.

abundance demography dynamic dynamics ecology fgeo forestgeo forests habitat metapackage tree

13.1 match 31 stars 5.50 score 12 scripts

randel

MixRF:A Random-Forest-Based Approach for Imputing Clustered Incomplete Data

It offers random-forest-based functions to impute clustered incomplete data. The package is tailored for but not limited to imputing multitissue expression data, in which a gene's expression is measured on the collected tissues of an individual but missing on the uncollected tissues.

Maintained by Jiebiao Wang. Last updated 8 years ago.

gene-expression imputation mixed-models random-forest

16.3 match 35 stars 4.39 score 14 scripts

daniel-conn17

fuzzyforest:Fuzzy Forests

Fuzzy forests, a new algorithm based on random forests, is designed to reduce the bias seen in random forest feature selection caused by the presence of correlated features. Fuzzy forests uses recursive feature elimination random forests to select features from separate blocks of correlated features where the correlation within each block of features is high and the correlation between blocks of features is low. One final random forest is fit using the surviving features. This package fits random forests using the 'randomForest' package and allows for easy use of 'WGCNA' to split features into distinct blocks. See D. Conn, Ngun, T., C. Ramirez, and G. Li (2019) <doi:10.18637/jss.v091.i09> for further details.

Maintained by Daniel Conn. Last updated 5 years ago.

30.6 match 2.31 score 41 scripts

ericarcher

rfPermute:Estimate Permutation p-Values for Random Forest Importance Metrics

Estimate significance of importance metrics for a Random Forest model by permuting the response variable. Produces a null distribution of importance metrics for each predictor variable and the p-value of the observed metric. Provides summary and visualization functions for 'randomForest' results.

Maintained by Eric Archer. Last updated 2 years ago.

jags cpp

10.4 match 27 stars 6.77 score 96 scripts 1 dependents
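
The idea described in the rfPermute entry above, permuting the response to build a null distribution for each predictor's importance, can be sketched in a few lines. This is not the rfPermute API. To stay self-contained, the sketch swaps the Random Forest importance metric for the absolute Pearson correlation as a stand-in; the function names, the simulated data, and the (count + 1) / (n_perm + 1) p-value convention are all assumptions made for illustration.

```python
import random


def abs_corr(xs, ys):
    """Stand-in importance metric (assumption): absolute Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_p_values(columns, y, importance, n_perm=200, seed=0):
    """Permute the response n_perm times and compare null and observed importance."""
    rng = random.Random(seed)
    observed = [importance(col, y) for col in columns]
    exceed = [0] * len(columns)
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # breaks any real link between predictors and response
        for j, col in enumerate(columns):
            if importance(col, y_perm) >= observed[j]:
                exceed[j] += 1
    # Add-one convention so that a p-value is never exactly zero.
    return [(count + 1) / (n_perm + 1) for count in exceed]


# Simulated data: the first predictor drives the response, the second is pure noise.
data_rng = random.Random(1)
x_signal = [data_rng.gauss(0, 1) for _ in range(60)]
x_noise = [data_rng.gauss(0, 1) for _ in range(60)]
y = [2 * a + data_rng.gauss(0, 0.5) for a in x_signal]

p_values = permutation_p_values([x_signal, x_noise], y, abs_corr)
print(p_values)
```

With a forest-based metric, the model would be refit on each permuted response before the importance values are extracted, which is far more expensive than the correlation stand-in used here.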

alaninglis

vivid:Variable Importance and Variable Interaction Displays

A suite of plots for displaying variable importance and two-way variable interaction jointly. Can also display partial dependence plots laid out in a pairs plot or 'zenplots' style.

Maintained by Alan Inglis. Last updated 8 months ago.

9.3 match 21 stars 7.39 score 39 scripts

cseljatib

datana:Datasets and Functions to Accompany Analisis De Datos Con R

Datasets and functions to accompany the book 'Analisis de datos con el programa estadistico R: una introduccion aplicada' by Salas-Eljatib (2021, ISBN: 9789566086109). The package helps carry out data management, exploratory analyses, and model fitting.

Maintained by Christian Salas-Eljatib. Last updated 6 months ago.

52.7 match 1.30 score 1 scripts

jmsigner

amt:Animal Movement Tools

Manage and analyze animal movement data. The functionality of 'amt' includes methods to calculate home ranges, track statistics (e.g. step lengths, speed, or turning angles), prepare data for fitting habitat selection analyses, and simulation of space-use from fitted step-selection functions.

Maintained by Johannes Signer. Last updated 4 months ago.

6.3 match 41 stars 10.54 score 418 scripts

rmarko

CORElearn:Classification, Regression and Feature Evaluation

A suite of machine learning algorithms written in C++ with an R interface, containing several learning techniques for classification and regression. Predictive models include e.g., classification and regression trees with optional constructive induction and models in the leaves, random forests, kNN, naive Bayes, and locally weighted regression. All predictions obtained with these models can be explained and visualized with the 'ExplainPrediction' package. This package is especially strong in feature evaluation where it contains several variants of the Relief algorithm and many impurity-based attribute evaluation functions, e.g., Gini, information gain, MDL, and DKM. These methods can be used for feature selection or discretization of numeric attributes. The OrdEval algorithm and its visualization are used for evaluation of data sets with ordinal features and class, enabling analysis according to the Kano model of customer satisfaction. Several algorithms support parallel multithreaded execution via OpenMP. The top-level documentation is reachable through ?CORElearn.

Maintained by Marko Robnik-Sikonja. Last updated 4 months ago.

cpp openmp

11.4 match 3 stars 5.81 score 228 scripts 8 dependents

shixiangwang

ezcox:Easily Process a Batch of Cox Models

A tool to operate a batch of univariate or multivariate Cox models and return tidy results.

Maintained by Shixiang Wang. Last updated 1 year ago.

batch-processing cox-model

8.8 match 21 stars 7.22 score 44 scripts 1 dependents

mlampros

RGF:Regularized Greedy Forest

Regularized Greedy Forest wrapper of the 'Regularized Greedy Forest' <https://github.com/RGF-team/rgf/tree/master/python-package> 'python' package, which also includes a Multi-core implementation (FastRGF) <https://github.com/RGF-team/rgf/tree/master/FastRGF>.

Maintained by Lampros Mouselimis. Last updated 3 years ago.

17.7 match 3.57 score 74 scripts

hrlai

novelforestSG:Dataset from the Novel Forests of Singapore

The raw dataset and model used in Lai et al. (2021) Decoupled responses of native and exotic tree diversities to distance from old-growth forest and soil phosphorous in novel secondary forests. Applied Vegetation Science, 24, e12548.

Maintained by Hao Ran Lai. Last updated 1 year ago.

data diversity ecology forest singapore

18.7 match 2 stars 3.30 score 4 scripts

liuyu-star

ODRF:Oblique Decision Random Forest for Classification and Regression

The oblique decision tree (ODT) uses linear combinations of predictors as partitioning variables in a decision tree. Oblique Decision Random Forest (ODRF) is an ensemble of multiple ODTs generated by feature bagging. Oblique Decision Boosting Tree (ODBT) applies feature bagging during the training process of ODT-based boosting trees to ensemble multiple boosting trees. All three methods can be used for classification and regression, and ODT and ODRF serve as supplements to the classical CART of Breiman (1984) <DOI:10.1201/9781315139470> and Random Forest of Breiman (2001) <DOI:10.1023/A:1010933404324> respectively.

Maintained by Yu Liu. Last updated 5 months ago.

cpp

11.9 match 7 stars 5.10 score 18 scripts

brian-j-smith

MachineShop:Machine Learning Models and Tools

Meta-package for statistical and machine learning with a unified interface for model fitting, prediction, performance assessment, and presentation of results. Approaches for model fitting and prediction of numerical, categorical, or censored time-to-event outcomes include traditional regression models, regularization methods, tree-based methods, support vector machines, neural networks, ensembles, data preprocessing, filtering, and model tuning and selection. Performance metrics are provided for model assessment and can be estimated with independent test sets, split sampling, cross-validation, or bootstrap resampling. Resample estimation can be executed in parallel for faster processing and nested in cases of model tuning and selection. Modeling results can be summarized with descriptive statistics; calibration curves; variable importance; partial dependence plots; confusion matrices; and ROC, lift, and other performance curves.

Maintained by Brian J Smith. Last updated 7 months ago.

classification-models machine-learning predictive-modeling regression-models survival-models

7.5 match 61 stars 7.95 score 121 scripts

benjilu

forestError:A Unified Framework for Random Forest Prediction Error Estimation

Estimates the conditional error distributions of random forest predictions and common parameters of those distributions, including conditional misclassification rates, conditional mean squared prediction errors, conditional biases, and conditional quantiles, by out-of-bag weighting of out-of-bag prediction errors as proposed by Lu and Hardin (2021). This package is compatible with several existing packages that implement random forests in R.

Maintained by Benjamin Lu. Last updated 4 years ago.

inference intervals machine-learning machinelearning prediction random-forest randomforest statistics

12.9 match 26 stars 4.62 score 16 scripts

mapme-initiative

mapme.biodiversity:Efficient Monitoring of Global Biodiversity Portfolios

Biodiversity areas, especially primary forests, serve a multitude of functions for the local economy, the regional functioning of ecosystems, and the global health of our planet. Recently, adverse changes in human land-use practices and climatic responses to increased greenhouse gas emissions have put these biodiversity areas under a variety of threats. The present package helps to analyse a number of biodiversity indicators based on freely available geographical datasets. It supports computationally efficient routines that allow the analysis of potentially global biodiversity portfolios. The primary use case of the package is to support evidence-based reporting of an organization's effort to protect biodiversity areas under threat and to identify regions where intervention is most needed.

Maintained by Darius A. Görgen. Last updated 3 months ago.

environment eo gis mapme spatial sustainability

6.1 match 35 stars 9.24 score 287 scripts

natydasilva

PPforest:Projection Pursuit Classification Forest

Implements projection pursuit forest algorithm for supervised classification.

Maintained by Natalia da Silva. Last updated 8 months ago.

openblas cpp

10.2 match 18 stars 5.53 score 19 scripts

forest-economics-goettingen

optimLanduse:Robust Land-Use Optimization

Robust multi-criteria land-allocation optimization that explicitly accounts for the uncertainty of the indicators in the objective function. Solves the problem of allocating scarce land to various land-use options with regard to multiple, coequal indicators. The method aims to find the land allocation that represents the indicator composition with the best possible trade-off under uncertainty. optimLanduse includes the actual optimization procedure as described by Knoke et al. (2016) <doi:10.1038/ncomms11877> and the post-hoc calculation of the portfolio performance as presented by Gosling et al. (2020) <doi:10.1016/j.jenvman.2020.110248>.

Maintained by Kai Husmann. Last updated 1 year ago.

15.6 match 2 stars 3.60 score 2 scripts

rdiaz02

varSelRF:Variable Selection using Random Forests

Variable selection from random forests using both backwards variable elimination (for the selection of small sets of non-redundant variables) and selection based on the importance spectrum (somewhat similar to scree plots; for the selection of large sets of potentially highly correlated variables). Main applications are in high-dimensional data (e.g., microarray data, and other genomics and proteomics applications).

Maintained by Ramon Diaz-Uriarte. Last updated 8 years ago.

8.7 match 12 stars 6.48 score 83 scripts 2 dependents

smouksassi

coveffectsplot:Produce Forest Plots to Visualize Covariate Effects

Produce forest plots to visualize covariate effects using either the command line or an interactive 'Shiny' application.

Maintained by Samer Mouksassi. Last updated 1 month ago.

7.0 match 32 stars 7.86 score 40 scripts

forest-economics-goettingen

woodValuationDE:Wood Valuation Germany

Monetary valuation of wood in German forests (stumpage values), including estimations of harvest quantities, wood revenues, and harvest costs. The functions are sensitive to tree species, mean diameter of the harvested trees, stand quality, and logging method. The functions include estimations for the consequences of disturbances on revenues and costs. The underlying assortment tables are taken from Offer and Staupendahl (2018) with corresponding functions for salable and skidded volume derived in Fuchs et al. (2023). Wood revenue and harvest cost functions were taken from v. Bodelschwingh (2018). The consequences of disturbances refer to Dieter (2001), Moellmann and Moehring (2017), and Fuchs et al. (2022a, 2022b). For the full references see documentation of the functions, package README, and Fuchs et al. (2023). Apart from Dieter (2001) and Moellmann and Moehring (2017), all functions and factors are based on data from HessenForst, the forest administration of the Federal State of Hesse in Germany.

Maintained by Jasper M. Fuchs. Last updated 8 months ago.

16.4 match 2 stars 3.30 score 2 scripts

distancedevelopment

Distance:Distance Sampling Detection Function and Abundance Estimation

A simple way of fitting detection functions to distance sampling data for both line and point transects. Adjustment term selection, left and right truncation as well as monotonicity constraints and binning are supported. Abundance and density estimates can also be calculated (via a Horvitz-Thompson-like estimator) if survey area information is provided. See Miller et al. (2019) <doi:10.18637/jss.v089.i01> for more information on methods and <https://examples.distancesampling.org/> for example analyses.

Maintained by Laura Marshall. Last updated 10 days ago.

6.0 match 11 stars 8.89 score 358 scripts 3 dependents

r-forge

party:A Laboratory for Recursive Partytioning

A computational toolbox for recursive partitioning. The core of the package is ctree(), an implementation of conditional inference trees which embed tree-structured regression models into a well defined theory of conditional inference procedures. This non-parametric class of regression trees is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Based on conditional inference trees, cforest() provides an implementation of Breiman's random forests. The function mob() implements an algorithm for recursive partitioning based on parametric models (e.g. linear models, GLMs or survival regression) employing parameter instability tests for split selection. Extensible functionality for visualizing tree-structured regression models is available. The methods are described in Hothorn et al. (2006) <doi:10.1198/106186006X133933>, Zeileis et al. (2008) <doi:10.1198/106186008X319331> and Strobl et al. (2007) <doi:10.1186/1471-2105-8-25>.

Maintained by Torsten Hothorn. Last updated 2 months ago.

openblas

4.6 match 11.52 score 3.2k scripts 29 dependents

sylvainschmitt

rcontroll:Individual-Based Forest Growth Simulator 'TROLL'

'TROLL' is coded in C++ and it typically simulates hundreds of thousands of individuals over hundreds of years. The 'rcontroll' R package is a wrapper of 'TROLL'. 'rcontroll' includes functions that generate inputs for simulations and run simulations. Finally, it is possible to analyse the 'TROLL' outputs through tables, figures, and maps taking advantage of other R visualisation packages. 'rcontroll' also offers the possibility to generate a virtual LiDAR point cloud that corresponds to a snapshot of the simulated forest.

Maintained by Sylvain Schmitt. Last updated 6 months ago.

gsl cpp

9.1 match 5 stars 5.76 score 19 scripts

forestry-labs

Rforestry:Random Forests, Linear Trees, and Gradient Boosting for Inference and Interpretability

Provides fast implementations of Honest Random Forests, Gradient Boosting, and Linear Random Forests, with an emphasis on inference and interpretability. Additionally contains methods for variable importance, out-of-bag prediction, regression monotonicity, and several methods for missing data imputation.

Maintained by Theo Saarinen. Last updated 3 days ago.

openblas cpp

9.4 match 5.57 score 82 scripts 1 dependents

azvoleff

gfcanalysis:Tools for Working with Hansen et al. Global Forest Change Dataset

Supports analyses using the Global Forest Change dataset released by Hansen et al. gfcanalysis was originally written for the Tropical Ecology Assessment and Monitoring (TEAM) Network. For additional details on the Global Forest Change dataset, see: Hansen, M. et al. 2013. "High-Resolution Global Maps of 21st-Century Forest Cover Change." Science 342 (15 November): 850-53. The forest change data and more information on the product are available at <http://earthenginepartners.appspot.com>.

Maintained by Matthew Cooper. Last updated 1 year ago.

10.6 match 17 stars 4.93 score 33 scripts

higuchip

forestdynR:Calculate Forest Dynamics

Determines the dynamics of tree species communities (mortality rates, recruitment, loss and gain in basal area, net changes and turnover). Important notes: a) the 'forest_df' argument (data) must contain the columns 'plot' (plot identification), 'spp' (species identification), 'DBH_1' (diameter at breast height in the first year of measurement) and 'DBH_2' (diameter at breast height in the second year of measurement), and DBH_1 and DBH_2 must be numeric; b) an example input file is available in 'data(forest_df_example)'; c) the 'inv_time' argument represents the time between inventories, in years; d) the 'coord' argument must be of the type 'c(longitude, latitude)', with decimal degree values; e) the 'add_wd' argument represents a data frame of wood density values (g cm-3) with three columns ('genus', 'species', 'wd'). This argument is NULL by default, and if it isn't provided, the wood density will be estimated with the getWoodDensity() function from the 'BIOMASS' package.

Maintained by Pedro Higuchi. Last updated 4 months ago.

10.8 match 4 stars 4.78 score 4 scripts

finleya

spBayes:Univariate and Multivariate Spatial-Temporal Modeling

Fits univariate and multivariate spatio-temporal random effects models for point-referenced data using Markov chain Monte Carlo (MCMC). Details are given in Finley, Banerjee, and Gelfand (2015) <doi:10.18637/jss.v063.i13> and Finley and Banerjee <doi:10.1016/j.envsoft.2019.104608>.

Maintained by Andrew Finley. Last updated 6 months ago.

openblas cpp openmp

10.9 match 1 stars 4.69 score 231 scripts 7 dependents

ropensci

MtreeRing:A Shiny Application for Automatic Measurements of Tree-Ring Widths on Digital Images

Use morphological image processing and edge detection algorithms to automatically measure tree ring widths on digital images. Users can also manually mark tree rings on species with complex anatomical structures. The arcs of inner-rings and angles of successive inclined ring boundaries are used to correct ring-width series. The package provides a Shiny-based application, allowing R beginners to easily analyze tree ring images and export ring-width series in standard file formats.

Maintained by Jingning Shi. Last updated 8 months ago.

dendrochronology forest forestry shiny-apps shinyapp tree-ring-width tree-rings

11.0 match 33 stars 4.66 score 14 scripts

softwaredeng

RRF:Regularized Random Forest

Feature Selection with Regularized Random Forest. This package is based on the 'randomForest' package by Andy Liaw. The key difference is the RRF() function that builds a regularized random forest. Fortran original by Leo Breiman and Adele Cutler, R port by Andy Liaw and Matthew Wiener, Regularized random forest for classification by Houtao Deng, Regularized random forest for regression by Xin Guan. Reference: Houtao Deng (2013) <doi:10.48550/arXiv.1306.0237>.

Maintained by Houtao Deng. Last updated 4 months ago.

fortran

13.4 match 3.81 score 118 scripts 3 dependents

cran

survcompare:Nested Cross-Validation to Compare Cox-PH, Cox-Lasso, Survival Random Forests

Performs repeated nested cross-validation for Cox Proportional Hazards, Cox Lasso, Survival Random Forest, and their ensemble. Returns internally validated concordance index, time-dependent area under the curve, Brier score, calibration slope, and statistical testing of whether the non-linear ensemble outperforms the baseline Cox model. In this way, it helps researchers quantify the gain of using a more complex survival model, or justify its redundancy. Equally, it shows the performance value of the non-linear and interaction terms, and may highlight the need for further feature transformation. Further details can be found in Shamsutdinova, Stamate, Roberts, & Stahl (2022) "Combining Cox Model and Tree-Based Algorithms to Boost Performance and Preserve Interpretability for Health Outcomes" <doi:10.1007/978-3-031-08337-2_15>, where the method is described as Ensemble 1.

Maintained by Diana Shamsutdinova. Last updated 5 months ago.

18.2 match 1 stars 2.70 score

zongzheng

forestSAS:Forest Spatial Structure Analysis Systems

Recent years have seen significant interest in neighborhood-based structural parameters that effectively represent the spatial characteristics of tree populations and forest communities, and possess strong applicability for guiding forestry practices. This package provides valuable information that enhances our understanding and analysis of the fine-scale spatial structure of tree populations and forest stands. Reference: Yan L, Tan W, Chai Z, et al (2019) <doi:10.13323/j.cnki.j.fafu(nat.sci.).2019.03.007>.

Maintained by Zongzheng Chai. Last updated 4 months ago.

35.5 match 1.38 score 24 scripts

ips-lmu

wrassp:Interface to the 'ASSP' Library

A wrapper around Michel Scheffers's 'libassp' (<https://libassp.sourceforge.net/>). The 'libassp' (Advanced Speech Signal Processor) library aims at providing functionality for handling speech signal files in most common audio formats and for performing analyses common in phonetic science/speech science. This includes the calculation of formants, fundamental frequency, root mean square, auto correlation, a variety of spectral analyses, zero crossing rate, filtering etc. This wrapper provides R with a large subset of 'libassp's signal processing functions and provides them to the user in a (hopefully) user-friendly manner.

Maintained by Markus Jochim. Last updated 1 year ago.

6.6 match 24 stars 7.43 score 62 scripts 3 dependents

chguiterman

dfoliatR:Detection and Analysis of Insect Defoliation Signals in Tree Rings

Tools to identify, quantify, analyze, and visualize growth suppression events in tree rings that are often produced by insect defoliation. Described in Guiterman et al. (2020) <doi:10.1016/j.dendro.2020.125750>.

Maintained by Chris Guiterman. Last updated 2 years ago.

budworm defoliators dendrochronology dendroecology disturbance forests insects outbreak tree-rings

10.0 match 7 stars 4.89 score 22 scripts

gisma

uavRmp:UAV Mission Planner

The Unmanned Aerial Vehicle Mission Planner provides an easy-to-use workflow for planning autonomous, obstacle-avoiding surveys with ready-to-fly unmanned aerial vehicles to retrieve aerial or spot-related data. It creates either intermediate flight control files for the DJI-Litchi supported series or ready-to-upload control files for the Pixhawk-based flight controllers used in the 3DR-Solo or Yuneec series. Additionally, it contains some useful tools for digitizing and data manipulation.

Maintained by Chris Reudenbach. Last updated 9 months ago.

cultural-heritage dji drone flight-planning forest-mapping litchi low-budget-uav mission-planning photogrammetry pixhawk pixhawk-controller qgroundcontrol2litchi solo survey terrain-following terrain-mapping uavs yuneec

7.5 match 25 stars 6.48 score 6 scripts

schaffman5

rtf:Rich Text Format (RTF) Output

A set of R functions to output Rich Text Format (RTF) files with high resolution tables and graphics that may be edited with a standard word processor such as Microsoft Word.

Maintained by Michael E. Schaffer. Last updated 6 years ago.

5.7 match 5 stars 8.55 score 169 scripts 10 dependents

bioc

maftools:Summarize, Analyze and Visualize MAF Files

Analyze and visualize Mutation Annotation Format (MAF) files from large-scale sequencing studies. This package provides various functions to perform the most commonly used analyses in cancer genomics and to create feature-rich, customizable visualizations with minimal effort.

Maintained by Anand Mayakonda. Last updated 5 months ago.

datarepresentation dnaseq visualization drivermutation variantannotation featureextraction classification somaticmutation sequencing functionalgenomics survival bioinformatics cancer-genome-atlas cancer-genomics genomics maf-files tcga curl bzip2 xz-utils zlib

3.3 match 459 stars 14.63 score 948 scripts 18 dependents

ericarcher

banter:BioAcoustic eveNT classifiER

Create a hierarchical acoustic event species classifier out of multiple call type detectors as described in Rankin et al. (2017) <doi:10.1111/mms.12381>.

Maintained by Eric Archer. Last updated 1 year ago.

acoustics bioacoustics cetaceans classification dolphins machine-learning noaa random-forest species-identification supervised-learning supervised-machine-learning whales jags cpp

11.2 match 9 stars 4.22 score 37 scripts

insightsengineering

tern:Create Common TLGs Used in Clinical Trials

Tables, Listings, and Graphs (TLG) library for common outputs used in clinical trials.

Maintained by Joe Zhu. Last updated 2 months ago.

clinical-trials graphs listings nest outputs tables

3.7 match 79 stars 12.62 score 186 scripts 9 dependents

annechao

MF.beta4:Measuring Ecosystem Multi-Functionality and Its Decomposition

Provide simple functions to (i) compute a class of multi-functionality measures for a single ecosystem for given function weights, (ii) decompose gamma multi-functionality for pairs of ecosystems and K ecosystems (K can be greater than 2) into a within-ecosystem component (alpha multi-functionality) and an among-ecosystem component (beta multi-functionality). In each case, the correlation between functions can be corrected for. Based on biodiversity and ecosystem function data, this software also facilitates graphics for assessing biodiversity-ecosystem functioning relationships across scales.

Maintained by Anne Chao. Last updated 3 months ago.

10.5 match 4.40 score 3 scripts

cran

forestmodel:Forest Plots from Regression Models

Produces forest plots using 'ggplot2' from models produced by functions such as stats::lm(), stats::glm() and survival::coxph().

Maintained by Nick Kennedy. Last updated 5 years ago.

12.9 match 1 stars 3.58 score 5 dependents

mayer79

outForest:Multivariate Outlier Detection and Replacement

Provides a random forest based implementation of the method described in Chapter 7.1.2 (Regression model based anomaly detection) of Chandola et al. (2009) <doi:10.1145/1541880.1541882>. It works as follows: Each numeric variable is regressed onto all other variables by a random forest. If the scaled absolute difference between the observed value and the out-of-bag prediction of the corresponding random forest is suspiciously large, then the value is considered an outlier. The package offers different options to replace such outliers, e.g. by realistic values found via predictive mean matching. Once the method is trained on reference data, it can be applied to new data.

Maintained by Michael Mayer. Last updated 8 months ago.

machine-learning outlier outlier-analysis outlier-detection random-forest

8.4 match 13 stars 5.39 score 19 scripts
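
The outForest entry above flags a value when the scaled absolute difference between the observed value and the out-of-bag prediction is suspiciously large. The scoring step alone can be sketched in plain Python as below. This is an illustration under stated assumptions, not the package's code: the predictions are hard-coded stand-ins for out-of-bag random forest predictions, scaling by the root mean squared residual is an assumed choice, and the function names and threshold are hypothetical.

```python
import math
from typing import List, Sequence


def outlier_scores(observed: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Residuals (observed minus predicted) divided by their root mean square.

    Assumption: RMSE scaling is used here for illustration only.
    """
    residuals = [o - p for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    if rmse == 0.0:
        return [0.0] * len(residuals)
    return [r / rmse for r in residuals]


def flag_outliers(observed: Sequence[float], predicted: Sequence[float],
                  threshold: float) -> List[bool]:
    """Flag values whose absolute scaled residual exceeds the threshold."""
    return [abs(s) > threshold for s in outlier_scores(observed, predicted)]


# Hypothetical data: the last observation is far from its prediction.
observed = [10.0, 11.0, 9.5, 10.5, 40.0]
# Stand-ins for out-of-bag predictions from a model fitted on the other variables.
predicted = [10.2, 10.8, 9.9, 10.4, 10.3]

# Assumed threshold; with only five values a scaled score cannot exceed sqrt(5).
flags = flag_outliers(observed, predicted, threshold=2.0)
```

In practice the predictions would come from a random forest fitted per variable, and the threshold would be chosen for the data at hand.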

luyouepiusf

SurvivalClusteringTree:Clustering Analysis Using Survival Tree and Forest Algorithms

An outcome-guided algorithm is developed to identify clusters of samples with similar characteristics and survival rate. The algorithm first builds a random forest and then defines distances between samples based on the fitted random forest. Given the distances, we can apply hierarchical clustering algorithms to define clusters.

Maintained by Lu You. Last updated 2 years ago.

cpp

12.2 match 3.70 score 2 scripts

missvalteam

Iscores:Proper Scoring Rules for Missing Value Imputation

Implementation of a KL-based scoring rule to assess the quality of different missing value imputations in the broad sense as introduced in Michel et al. (2021) <arXiv:2106.03742>.

Maintained by Loris Michel. Last updated 2 years ago.

imputation-methods machine-learning missing-values random-forest

11.5 match 7 stars 3.91 score 23 scripts

scnext

SCGLR:Supervised Component Generalized Linear Regression

An extension of the Fisher Scoring Algorithm to combine PLS regression with GLM estimation in the multivariate context. Covariates can also be grouped in themes.

Maintained by Guillaume Cornu. Last updated 19 days ago.

partial-least-squares-regression

10.3 match 2 stars 4.30 score 67 scripts

predictiveecology

LandR:Landscape Ecosystem Modelling in R

Utilities for 'LandR' suite of landscape simulation models. These models simulate forest vegetation dynamics based on LANDIS-II, and incorporate fire and insect disturbance, as well as other important ecological processes. Models are implemented as 'SpaDES' modules.

Maintained by Eliot J B McIntire. Last updated 3 days ago.

ecological-modelling landscape-ecosystem-modelling spades

7.3 match 17 stars 6.07 score 12 scripts 4 dependents

parsifal9

RFlocalfdr:Significance Level for Random Forest Impurity Importance Scores

Sets a significance level for Random Forest MDI (Mean Decrease in Impurity, Gini or sum of squares) variable importance scores, using an empirical Bayes approach. See Dunne et al. (2022) <doi:10.1101/2022.04.06.487300>.

Maintained by Robert Dunne. Last updated 1 month ago.

9.3 match 1 stars 4.72 score 13 scripts

cran

forestinventory:Design-Based Global and Small-Area Estimations for Multiphase Forest Inventories

Extensive global and small-area estimation procedures for multiphase forest inventories under the design-based Monte-Carlo approach are provided. The implementation has been published in the Journal of Statistical Software (<doi:10.18637/jss.v097.i04>) and includes estimators for simple and cluster sampling published by Daniel Mandallaz in 2007 (<doi:10.1201/9781584889779>), 2013 (<doi:10.1139/cjfr-2012-0381>, <doi:10.1139/cjfr-2013-0181>, <doi:10.1139/cjfr-2013-0449>, <doi:10.3929/ethz-a-009990020>) and 2016 (<doi:10.3929/ethz-a-010579388>). It provides point estimates, their external- and design-based variances and confidence intervals, as well as a set of functions to analyze and visualize the produced estimates. The procedures have also been optimized for the use of remote sensing data as auxiliary information, as demonstrated in 2018 by Hill et al. (<doi:10.3390/rs10071052>).

Maintained by Andreas Hill. Last updated 4 years ago.

14.5 match 3.00 score 20 scripts

gjwgit

rattle:Graphical User Interface for Data Science in R

The R Analytic Tool To Learn Easily (Rattle) provides a collection of utility functions for the data scientist. A Gnome (RGtk2) based graphical interface is included with the aim to provide a simple and intuitive introduction to R for data science, allowing a user to quickly load data from a CSV file (or via ODBC), transform and explore the data, build and evaluate models, and export models as PMML (predictive modelling markup language) or as scores. A key aspect of the GUI is that all R commands are logged and commented through the log tab. This can be saved as a standalone R script file and as an aid for the user to learn R or to copy-and-paste directly into R itself. Note that RGtk2 and cairoDevice have been archived on CRAN. See <https://rattle.togaware.com> for installation instructions.

Maintained by Graham Williams. Last updated 3 years ago.

5.1 match 16 stars 8.48 score 3.0k scripts 3 dependents

r-forge

coin:Conditional Inference Procedures in a Permutation Test Framework

Conditional inference procedures for the general independence problem including two-sample, K-sample (non-parametric ANOVA), correlation, censored, ordered and multivariate problems described in <doi:10.18637/jss.v028.i08>.

Maintained by Torsten Hothorn. Last updated 9 months ago.

3.6 match 11.68 score 1.6k scripts 74 dependents

jaredsmurray

bcf:Causal Inference for a Binary Treatment and Continuous Outcome using Bayesian Causal Forests

Causal inference for a binary treatment and continuous outcome using Bayesian Causal Forests. See Hahn, Murray and Carvalho (2020) <https://projecteuclid.org/journals/bayesian-analysis/volume-15/issue-3/Bayesian-Regression-Tree-Models-for-Causal-Inference--Regularization-Confounding/10.1214/19-BA1195.full> for additional information. This implementation relies on code originally accompanying Pratola et al. (2013) <arXiv:1309.1906>.

Maintained by Jared S. Murray. Last updated 1 year ago.

openblas cpp

5.1 match 41 stars 8.12 score 46 scripts

usdaforestservice

FIESTA:Forest Inventory Estimation and Analysis

A research estimation tool for analysts that work with sample-based inventory data from the U.S. Department of Agriculture, Forest Service, Forest Inventory and Analysis (FIA) Program.

Maintained by Grayson White. Last updated 2 days ago.

5.8 match 30 stars 7.24 score 62 scripts

cefet-rj-dal

daltoolbox:Leveraging Experiment Lines to Data Analytics

The natural increase in the complexity of current research experiments and data demands better tools to enhance productivity in Data Analytics. The package is a framework designed to address the modern challenges in data analytics workflows. The package is inspired by Experiment Line concepts. It aims to provide seamless support for users in developing their data mining workflows by offering a uniform data model and method API. It enables the integration of various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction. It also offers options for hyper-parameter tuning and supports integration with existing libraries and languages. Overall, the package provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>.

Maintained by Eduardo Ogasawara. Last updated 1 month ago.

6.3 match 1 stars 6.65 score 536 scripts 4 dependents

cran

CornerstoneR:Collection of Scripts for Interface Between 'Cornerstone' and 'R'

Collection of generic 'R' scripts which enable you to use existing 'R' routines in 'Cornerstone'. The desktop application 'Cornerstone' (<https://www.camline.com/en/products/cornerstone/cornerstone-core.html>) is data analysis software provided by 'camLine' that empowers engineering teams to find solutions even faster. The engineers incorporate intensified hands-on statistics into their projects. They benefit from an intuitive and uniquely designed graphical Workmap concept: you design experiments (DoE) and explore data, analyze dependencies, and find answers you can act upon, immediately, interactively, and without any programming. While 'Cornerstone's' interface to the statistical programming language 'R' has been available since version 6.0, the latest interface with 'R' is much more efficient. 'Cornerstone' release 7.1.1 allows you to integrate user defined 'R' packages directly into the standard 'Cornerstone' GUI. Your engineering team stays in 'Cornerstone's' graphical working environment and can apply 'R' routines, immediately and without the need to deal with programming code. Additionally, your 'R' programming team develops corresponding 'R' packages detached from 'Cornerstone' in their favorite 'R' environment. Learn how to use 'R' packages in 'Cornerstone' 7.1.1 on the 'camLineTV' YouTube channel (<https://www.youtube.com/watch?v=HEQHwq_laXU>) (available in German).

Maintained by Gerrith Djaja. Last updated 5 years ago.

11.7 match 3.54 score

cran

ForestElementsR:Data Structures and Functions for Working with Forest Data

Provides generic data structures and algorithms for use with forest mensuration data in a consistent framework. The functions and objects included are a collection of broadly applicable tools. More specialized applications should be implemented in separate packages that build on this foundation. Documentation about 'ForestElementsR' is provided by three vignettes included in this package. For an introduction to the field of forest mensuration, refer to the textbooks by Kershaw et al. (2017) <doi:10.1002/9781118902028>, and van Laar and Akca (2007) <doi:10.1007/978-1-4020-5991-9>.

Maintained by Peter Biber. Last updated 1 month ago.

11.8 match 3.48 score

nourmarzouka

multiclassPairs:Build MultiClass Pair-Based Classifiers using TSPs or RF

A toolbox to train a single-sample classifier that uses in-sample feature relationships. The relationships are represented as feature1 < feature2 (e.g. gene1 < gene2). Two options are provided. The first is based on the 'switchBox' package, which uses the Top-score pairs (TSP) algorithm. The second is a novel implementation based on the random forest (RF) algorithm. For simple problems, we recommend the one-vs-rest TSP option because it is simple and easy to interpret. For complex problems, RF performs better. Both approaches first filter the features, then combine the filtered features into the list of all possible rules (i.e. rule1: feature1 < feature2, rule2: feature1 < feature3, etc.). The list of rules is then filtered, and the most important and informative rules are kept. The informative rules are assembled in a one-vs-rest model or in an RF model. Each function in this package comes with a detailed description that explains the filtration and training methodology of each approach. Reference: Marzouka & Eriksson (2021) <doi:10.1093/bioinformatics/btab088>.

Maintained by Nour-al-dain Marzouka. Last updated 2 years ago.

classification

8.5 match 12 stars 4.82 score 11 scripts
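
The multiclassPairs entry above describes rules of the form feature1 < feature2 built from all pairs of filtered features. A minimal sketch of that rule-building and rule-evaluation step, in plain Python, could look like this. The function names, gene names, and expression values are hypothetical, and the feature filtering, rule filtering, and model training that the package performs are not shown.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def build_rules(features: Sequence[str]) -> List[Tuple[str, str]]:
    """List every unordered feature pair (a, b), read as the rule 'a < b'."""
    return list(combinations(features, 2))


def evaluate_rules(sample: Dict[str, float],
                   rules: Sequence[Tuple[str, str]]) -> Dict[str, bool]:
    """Evaluate each rule within a single sample, using no other samples."""
    return {f"{a} < {b}": sample[a] < sample[b] for a, b in rules}


# Hypothetical filtered features and one sample's measurements.
features = ["gene1", "gene2", "gene3"]
sample = {"gene1": 2.5, "gene2": 7.1, "gene3": 1.0}

rules = build_rules(features)
rule_values = evaluate_rules(sample, rules)
```

Because each rule compares two measurements from the same sample, the resulting binary values depend only on within-sample ordering, which is what makes a single-sample classifier possible.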

bips-hb

blockForest:Block Forests: Random Forests for Blocks of Clinical and Omics Covariate Data

A random forest variant 'block forest' ('BlockForest') tailored to the prediction of binary, survival and continuous outcomes using block-structured covariate data, for example, clinical covariates plus measurements of a certain omics data type or multi-omics data, that is, data for which measurements of different types of omics data and/or clinical data for each patient exist. Examples of different omics data types include gene expression measurements, mutation data and copy number variation measurements. Block forests are presented in Hornung & Wright (2019). The package includes four other random forest variants for multi-omics data: 'RandomBlock', 'BlockVarSel', 'VarProb', and 'SplitWeights'. These were also considered in Hornung & Wright (2019), but performed worse than block forest in their comparison study based on 20 real multi-omics data sets. Therefore, we recommend using block forest ('BlockForest') in applications. The other random forest variants can, however, be consulted for academic purposes, for example, in the context of further methodological developments. Reference: Hornung, R. & Wright, M. N. (2019) Block Forests: random forests for blocks of clinical and omics covariate data. BMC Bioinformatics 20:358. <doi:10.1186/s12859-019-2942-y>.

Maintained by Marvin N. Wright. Last updated 2 years ago.

cpp

9.0 match 7 stars 4.52 score 47 scripts

hugaped

MBNMAdose:Dose-Response MBNMA Models

Fits Bayesian dose-response model-based network meta-analysis (MBNMA) models that incorporate multiple doses within an agent by modelling different dose-response functions, as described by Mawdsley et al. (2016) <doi:10.1002/psp4.12091>. By modelling dose-response relationships, this can connect networks of evidence that might otherwise be disconnected, and can improve precision on treatment estimates. Several common dose-response functions are provided; others may be added by the user. Various characteristics and assumptions can be flexibly added to the models, such as shared class effects. The consistency of direct and indirect evidence in the network can be assessed using unrelated mean effects models and/or by node-splitting at the treatment level.

Maintained by Hugo Pedder. Last updated 1 month ago.

jags cpp

6.1 match 10 stars 6.60 score

robingenuer

VSURF:Variable Selection Using Random Forests

Three-step variable selection procedure based on random forests. Initially developed to handle high-dimensional data (for which the number of variables greatly exceeds the number of observations), the package is very versatile and can treat most dimensions of data, for regression and supervised classification problems. The first step eliminates irrelevant variables from the dataset. The second step aims to select all variables related to the response, for interpretation purposes. The third step refines the selection by eliminating redundancy in the set of variables selected by the second step, for prediction purposes. Genuer, R., Poggi, J.-M. and Tuleau-Malot, C. (2015) <https://journal.r-project.org/archive/2015-2/genuer-poggi-tuleaumalot.pdf>.

Maintained by Robin Genuer. Last updated 8 months ago.

5.3 match 36 stars 7.49 score 192 scripts 1 dependents

jmm34

abcrf:Approximate Bayesian Computation via Random Forests

Performs Approximate Bayesian Computation (ABC) model choice and parameter inference via random forests. Pudlo P., Marin J.-M., Estoup A., Cornuet J.-M., Gautier M. and Robert C. P. (2016) <doi:10.1093/bioinformatics/btv684>. Estoup A., Raynal L., Verdu P. and Marin J.-M. <http://journal-sfds.fr/article/view/709>. Raynal L., Marin J.-M., Pudlo P., Ribatet M., Robert C. P. and Estoup A. (2019) <doi:10.1093/bioinformatics/bty867>.

Maintained by Jean-Michel Marin. Last updated 2 years ago.

cpp

8.4 match 2 stars 4.69 score 74 scripts

kjakobse

EpiForsk:Code Sharing at the Department of Epidemiological Research at Statens Serum Institut

This is a collection of assorted functions and examples collected from various projects. Currently we have functionalities for simplifying overlapping time intervals, Charlson comorbidity score constructors for Danish data, getting frequency for multiple variables, getting standardized output from logistic and log-linear regressions, sibling design linear regression functionalities, a method for calculating the confidence intervals for functions of parameters from a GLM, Bayes equivalent for hypothesis testing with asymptotic Bayes factor, and several help functions for generalized random forest analysis using 'grf'.

Maintained by Kim Daniel Jakobsen. Last updated 1 year ago.

8.8 match 4.48 score 8 scripts

schlosslab

mikropml:User-Friendly R Package for Supervised Machine Learning Pipelines

An interface to build machine learning models for classification and regression problems. 'mikropml' implements the ML pipeline described by Topçuoğlu et al. (2020) <doi:10.1128/mBio.00434-20> with reasonable default options for data preprocessing, hyperparameter tuning, cross-validation, testing, model evaluation, and interpretation steps. See the website <https://www.schlosslab.org/mikropml/> for more information, documentation, and examples.

Maintained by Kelly Sovacool. Last updated 2 years ago.

machine-learning

5.0 match 56 stars 7.83 score 86 scripts

kapsner

mlsurvlrnrs:R6-Based ML Survival Learners for 'mlexperiments'

Enhances 'mlexperiments' <https://CRAN.R-project.org/package=mlexperiments> with additional machine learning ('ML') learners for survival analysis. The package provides R6-based survival learners for the following algorithms: 'glmnet' <https://CRAN.R-project.org/package=glmnet>, 'ranger' <https://CRAN.R-project.org/package=ranger>, 'xgboost' <https://CRAN.R-project.org/package=xgboost>, and 'rpart' <https://CRAN.R-project.org/package=rpart>. These can be used directly with the 'mlexperiments' R package.

Maintained by Lorenz A. Kapsner. Last updated 10 days ago.

algorithms cox-regression experiments glmnet learners machine-learning random-survival-forests survival survival-support-vector-machine xgboost

6.7 match 4 stars 5.86 score 12 scripts

openintrostat

openintro:Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs

Supplemental functions and data for 'OpenIntro' resources, which includes open-source textbooks and resources for introductory statistics (<https://www.openintro.org/>). The package contains datasets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of Windows operating system.

Maintained by Mine Çetinkaya-Rundel. Last updated 2 months ago.

data openintro

3.4 match 240 stars 11.39 score 6.0k scripts

simonyansenzhao

wsrf:Weighted Subspace Random Forest for Classification

A parallel implementation of Weighted Subspace Random Forest. The Weighted Subspace Random Forest algorithm was proposed in the International Journal of Data Warehousing and Mining by Baoxun Xu, Joshua Zhexue Huang, Graham Williams, Qiang Wang, and Yunming Ye (2012) <DOI:10.4018/jdwm.2012040103>. The algorithm can classify very high-dimensional data with random forests built using small subspaces. A novel variable weighting method is used for variable subspace selection in place of the traditional random variable sampling. This new approach is particularly useful in building models from high-dimensional data.

Maintained by He Zhao. Last updated 2 years ago.

cpp

7.9 match 14 stars 4.89 score 11 scripts

opengeos

whitebox:'WhiteboxTools' R Frontend

An R frontend for the 'WhiteboxTools' library, which is an advanced geospatial data analysis platform developed by Prof. John Lindsay at the University of Guelph's Geomorphometry and Hydrogeomatics Research Group. 'WhiteboxTools' can be used to perform common geographical information systems (GIS) analysis operations, such as cost-distance analysis, distance buffering, and raster reclassification. Remote sensing and image processing tasks include image enhancement (e.g. panchromatic sharpening, contrast adjustments), image mosaicing, numerous filtering operations, simple classification (k-means), and common image transformations. 'WhiteboxTools' also contains advanced tooling for spatial hydrological analysis (e.g. flow-accumulation, watershed delineation, stream network analysis, sink removal), terrain analysis (e.g. common terrain indices such as slope, curvatures, wetness index, hillshading; hypsometric analysis; multi-scale topographic position analysis), and LiDAR data processing. Suggested citation: Lindsay (2016) <doi:10.1016/j.cageo.2016.07.003>.

Maintained by Andrew Brown. Last updated 5 months ago.

geomorphometry geoprocessing geospatial gis hydrology remote-sensing rstudio

4.0 match 173 stars 9.65 score 203 scripts 2 dependents

ndphillips

FFTrees:Generate, Visualise, and Evaluate Fast-and-Frugal Decision Trees

Create, visualize, and test fast-and-frugal decision trees (FFTs) using the algorithms and methods described by Phillips, Neth, Woike & Gaissmaier (2017), <doi:10.1017/S1930297500006239>. FFTs are simple and transparent decision trees for solving binary classification problems. FFTs can be preferable to more complex algorithms because they require very little information, are easy to understand and communicate, and are robust against overfitting.

Maintained by Hansjoerg Neth. Last updated 5 months ago.

4.0 match 135 stars 9.58 score 144 scripts

gertvv

gemtc:Network Meta-Analysis Using Bayesian Methods

Network meta-analyses (mixed treatment comparisons) in the Bayesian framework using JAGS. Includes methods to assess heterogeneity and inconsistency, and a number of standard visualizations.

Maintained by Gert van Valkenhoef. Last updated 5 years ago.

jags cpp

5.1 match 44 stars 7.48 score 71 scripts 1 dependents

grunwaldlab

poppr:Genetic Analysis of Populations with Mixed Reproduction

Population genetic analyses for hierarchical analysis of partially clonal populations built upon the architecture of the 'adegenet' package. Originally described in Kamvar, Tabima, and Grünwald (2014) <doi:10.7717/peerj.281> with version 2.0 described in Kamvar, Brooks, and Grünwald (2015) <doi:10.3389/fgene.2015.00208>.

Maintained by Zhian N. Kamvar. Last updated 10 months ago.

clonality genetic-analysis genetic-distances minimum-spanning-networks multilocus-genotypes multilocus-lineages population-genetics populations openmp

3.5 match 69 stars 10.84 score 672 scripts

rolkra

explore:Simplifies Exploratory Data Analysis

Interactive data exploration with one line of code, automated reporting, or an easy-to-remember set of tidy functions for low-code exploratory data analysis.

Maintained by Roland Krasser. Last updated 3 months ago.

data-exploration data-visualisation decision-trees eda rmarkdown shiny tidy

3.3 match 228 stars 11.43 score 221 scripts 1 dependents

markusul

SDModels:Spectrally Deconfounded Models

Screen for and analyze non-linear sparse direct effects in the presence of unobserved confounding using the spectral deconfounding techniques (Ćevid, Bühlmann, and Meinshausen (2020) <jmlr.org/papers/v21/19-545.html>, Guo, Ćevid, and Bühlmann (2022) <doi:10.1214/21-AOS2152>). These methods have been shown to be a good estimate for the true direct effect if we observe many covariates, e.g., high-dimensional settings, and we have fairly dense confounding. Even if the assumptions are violated, it seems like there is not much to lose, and the deconfounded models will, in general, estimate a function closer to the true one than classical least squares optimization. 'SDModels' provides functions SDAM() for Spectrally Deconfounded Additive Models (Scheidegger, Guo, and Bühlmann (2025) <doi:10.1145/3711116>) and SDForest() for Spectrally Deconfounded Random Forests (Ulmer, Scheidegger, and Bühlmann (2025) <doi:10.48550/arXiv.2502.03969>).

Maintained by Markus Ulmer. Last updated 4 days ago.

6.6 match 2 stars 5.67 score 15 scripts

biometry

bipartite:Visualising Bipartite Networks and Calculating Some (Ecological) Indices

Functions to visualise webs and calculate a series of indices commonly used to describe pattern in (ecological) webs. It focuses on webs consisting of only two levels (bipartite), e.g. pollination webs or predator-prey-webs. Visualisation is important to get an idea of what we are actually looking at, while the indices summarise different aspects of the web's topology.

Maintained by Carsten F. Dormann. Last updated 5 days ago.

cpp

3.4 match 37 stars 10.93 score 592 scripts 15 dependents

hugaped

MBNMAtime:Run Time-Course Model-Based Network Meta-Analysis (MBNMA) Models

Fits Bayesian time-course models for model-based network meta-analysis (MBNMA) that allow inclusion of multiple time-points from studies. Repeated measures over time are accounted for within studies by applying different time-course functions, following the method of Pedder et al. (2019) <doi:10.1002/jrsm.1351>. The method allows synthesis of studies with multiple follow-up measurements that can account for time-course for a single or multiple treatment comparisons. Several general time-course functions are provided; others may be added by the user. Various characteristics can be flexibly added to the models, such as correlation between time points and shared class effects. The consistency of direct and indirect evidence in the network can be assessed using unrelated mean effects models and/or by node-splitting.

Maintained by Hugo Pedder. Last updated 1 month ago.

jags cpp

6.1 match 7 stars 6.10 score

aleksandarsekulic

meteo:RFSI & STRK Interpolation for Meteo and Environmental Variables

Random Forest Spatial Interpolation (RFSI, Sekulić et al. (2020) <doi:10.3390/rs12101687>) and spatio-temporal geostatistical (spatio-temporal regression Kriging (STRK)) interpolation for meteorological (Kilibarda et al. (2014) <doi:10.1002/2013JD020803>, Sekulić et al. (2020) <doi:10.1007/s00704-019-03077-3>) and other environmental variables. Contains global spatio-temporal models calculated using publicly available data.

Maintained by Aleksandar Sekulić. Last updated 5 months ago.

7.4 match 18 stars 5.06 score 64 scripts

loukiaspin

rnmamod:Bayesian Network Meta-Analysis with Missing Participants

A comprehensive suite of functions to perform and visualise pairwise and network meta-analysis with aggregate binary or continuous missing participant outcome data. The package covers core Bayesian one-stage models implemented in a systematic review with multiple interventions, including fixed-effect and random-effects network meta-analysis, meta-regression, evaluation of the consistency assumption via the node-splitting approach and the unrelated mean effects model (original and revised model proposed by Spineli, (2022) <doi:10.1177/0272989X211068005>), and sensitivity analysis (see Spineli et al., (2021) <doi:10.1186/s12916-021-02195-y>). Missing participant outcome data are addressed in all models of the package (see Spineli, (2019) <doi:10.1186/s12874-019-0731-y>, Spineli et al., (2019) <doi:10.1002/sim.8207>, Spineli, (2019) <doi:10.1016/j.jclinepi.2018.09.002>, and Spineli et al., (2021) <doi:10.1002/jrsm.1478>). The robustness to primary analysis results can also be investigated using a novel intuitive index (see Spineli et al., (2021) <doi:10.1177/0962280220983544>). Methods to evaluate the transitivity assumption quantitatively are provided (see Spineli, (2024) <doi:10.1186/s12874-024-02436-7>). A novel index to facilitate interpretation of local inconsistency is also available (see Spineli, (2024) <doi:10.1186/s13643-024-02680-4>). The package also offers a rich, user-friendly visualisation toolkit that aids in appraising and interpreting the results thoroughly and preparing the manuscript for journal submission. The visualisation tools comprise the network plot, forest plots, panel of diagnostic plots, heatmaps on the extent of missing participant outcome data in the network, league heatmaps on estimation and prediction, rankograms, Bland-Altman plot, leverage plot, deviance scatterplot, heatmap of robustness, barplot of Kullback-Leibler divergence, heatmap of comparison dissimilarities and dendrogram of comparison clustering. The package also allows the user to export the results to an Excel file in the working directory.

Maintained by Loukia Spineli. Last updated 8 days ago.

jags cpp

5.6 match 5 stars 6.64 score 12 scripts

cliffordlai

bestglm:Best Subset GLM and Regression Utilities

Best subset glm using information criteria or cross-validation, carried out using the 'leaps' algorithm (Furnival and Wilson, 1974) <doi:10.2307/1267601> or complete enumeration (Morgan and Tatar, 1972) <doi:10.1080/00401706.1972.10488918>. Implements PCR and PLS using AIC/BIC. Implements the one-standard-deviation rule for use with the 'caret' package.

Maintained by Yuanhao Lai. Last updated 5 years ago.

7.0 match 5.29 score 418 scripts 5 dependents

insightsengineering

teal.modules.clinical:'teal' Modules for Standard Clinical Outputs

Provides user-friendly tools for creating and customizing clinical trial reports. By leveraging the 'teal' framework, this package provides 'teal' modules to easily create an interactive panel that allows for seamless adjustments to data presentation, thereby streamlining the creation of detailed and accurate reports.

Maintained by Dawid Kaledkowski. Last updated 15 days ago.

clinical-trials modules nest outputs shiny

3.6 match 34 stars 10.25 score 149 scripts

bioc

CMA:Synthesis of microarray-based classification

This package provides a comprehensive collection of various microarray-based classification algorithms both from Machine Learning and Statistics. Variable Selection, Hyperparameter tuning, Evaluation and Comparison can be performed combined or stepwise in a user-friendly environment.

Maintained by Roman Hornung. Last updated 5 months ago.

classification decisiontree

7.3 match 5.09 score 61 scripts

tidymodels

parsnip:A Common API to Modeling and Analysis Functions

A common interface is provided to allow users to specify a model without having to remember the different argument names across different functions or computational engines (e.g. 'R', 'Spark', 'Stan', 'H2O', etc).

Maintained by Max Kuhn. Last updated 3 days ago.

2.3 match 612 stars 16.37 score 3.4k scripts 69 dependents

valentint

rrcov:Scalable Robust Estimators with High Breakdown Point

Robust Location and Scatter Estimation and Robust Multivariate Analysis with High Breakdown Point: principal component analysis (Filzmoser and Todorov (2013), <doi:10.1016/j.ins.2012.10.017>), linear and quadratic discriminant analysis (Todorov and Pires (2007)), multivariate tests (Todorov and Filzmoser (2010) <doi:10.1016/j.csda.2009.08.015>), outlier detection (Todorov et al. (2010) <doi:10.1007/s11634-010-0075-2>). See also Todorov and Filzmoser (2009) <urn:isbn:978-3838108148>, Todorov and Filzmoser (2010) <doi:10.18637/jss.v032.i03> and Boudt et al. (2019) <doi:10.1007/s11222-019-09869-x>.

Maintained by Valentin Todorov. Last updated 7 months ago.

fortran openblas

3.5 match 2 stars 10.51 score 484 scripts 96 dependents

inbo

forrescalc:Calculation of Aggregated Values on Dendrometry, Regeneration and Vegetation of Forests, Starting from Individual Tree Measures from Fieldmap

A collection of functions to load and aggregate measurements related to dendrometry, rejuvenation and vegetation, and to access plot level results from Flemish forest reserves in data package forresdat.

Maintained by Els Lommelen. Last updated 6 months ago.

9.7 match 3.79 score 123 scripts

fbartos

RoBMA:Robust Bayesian Meta-Analyses

A framework for estimating ensembles of meta-analytic and meta-regression models (assuming either presence or absence of the effect, heterogeneity, publication bias, and moderators). The RoBMA framework uses Bayesian model-averaging to combine the competing meta-analytic models into a model ensemble, weights the posterior parameter distributions based on posterior model probabilities and uses Bayes factors to test for the presence or absence of the individual components (e.g., effect vs. no effect; Bartoš et al., 2022, <doi:10.1002/jrsm.1594>; Maier, Bartoš & Wagenmakers, 2022, <doi:10.1037/met0000405>). Users can define a wide range of prior distributions for the effect size, heterogeneity, publication bias (including selection models and PET-PEESE), and moderator components. The package provides convenient functions for summary, visualizations, and fit diagnostics.

Maintained by František Bartoš. Last updated 1 months ago.

meta-analysis model-averaging publication-bias jags openblas cpp

5.2 match 9 stars 6.97 score 53 scripts

romanhornung

diversityForest:Innovative Complex Split Procedures in Random Forests Through Candidate Split Sampling

Implements interaction forests [1], which are specific diversity forests and the basic form of diversity forests that uses univariable, binary splitting [2]. Interaction forests (IFs) are ensembles of decision trees that model quantitative and qualitative interaction effects using bivariable splitting. IFs come with the Effect Importance Measure (EIM), which can be used to identify variable pairs that feature quantitative and qualitative interaction effects with high predictive relevance. IFs and EIM focus on well interpretable forms of interactions. The package also offers plot functions for visualising the estimated forms of interaction effects. Categorical, metric, and survival outcomes are supported. This is a fork of the R package 'ranger' (main author: Marvin N. Wright) that implements random forests using an efficient C++ implementation. References: [1] Hornung, R. & Boulesteix, A.-L. (2022) Interaction Forests: Identifying and exploiting interpretable quantitative and qualitative interaction effects. Computational Statistics & Data Analysis 171:107460, <doi:10.1016/j.csda.2022.107460>. [2] Hornung, R. (2022) Diversity forests: Using split sampling to enable innovative complex split procedures in random forests. SN Computer Science 3(2):1, <doi:10.1007/s42979-021-00920-1>.

Maintained by Roman Hornung. Last updated 2 years ago.

cpp

18.0 match 2.00 score 5 scripts

andykrause

hpiR:House Price Indexes

Compute house price indexes and series using a variety of different methods and models common throughout the real estate literature. Evaluate index 'goodness' based on accuracy, volatility and revision statistics. Background on basic model construction for repeat sales models can be found at: Case and Quigley (1991) <https://ideas.repec.org/a/tpr/restat/v73y1991i1p50-58.html> and for hedonic pricing models at: Bourassa et al (2006) <doi:10.1016/j.jhe.2006.03.001>. The package author's working paper on the random forest approach to house price indexes can be found at: <https://www.github.com/andykrause/hpi_research>.

Maintained by Andy Krause. Last updated 1 year ago.

7.5 match 15 stars 4.82 score 88 scripts

christianroever

bayesmeta:Bayesian Random-Effects Meta-Analysis and Meta-Regression

A collection of functions for deriving the posterior distribution of the model parameters in random-effects meta-analysis or meta-regression, and providing functionality to evaluate joint and marginal posterior probability distributions, predictive distributions, shrinkage effects, posterior predictive p-values, etc. For more details, see also Roever C (2020) <doi:10.18637/jss.v093.i06>, or Roever C and Friede T (2022) <doi:10.1016/j.cmpb.2022.107303>.

Maintained by Christian Roever. Last updated 1 year ago.

6.6 match 3 stars 5.40 score 73 scripts 1 dependents

cran

reportRmd:Tidy Presentation of Clinical Reporting

Streamlined statistical reporting in 'Rmarkdown' environments. Facilitates the automated reporting of descriptive statistics, multiple univariate models, multivariable models and tables combining these outputs. Plotting functions include customisable survival curves, forest plots from logistic and ordinal regression and bivariate comparison plots.

Maintained by Lisa Avery. Last updated 2 months ago.

10.3 match 3.45 score 19 scripts 1 dependents

yihui

knitr:A General-Purpose Package for Dynamic Report Generation in R

Provides a general-purpose tool for dynamic report generation in R using Literate Programming techniques.

Maintained by Yihui Xie. Last updated 12 hours ago.

dynamic-documents knitr literate-programming rmarkdown sweave

1.5 match 2.4k stars 23.62 score 116k scripts 4.2k dependents

nelson-gon

manymodelr:Build and Tune Several Models

Frequently one needs a convenient way to build and tune several models in one go. The goal is to provide a number of machine learning convenience functions. It provides the ability to build, tune and obtain predictions of several models in one function. The models are built using functions from 'caret' with easier to read syntax. Kuhn (2014) <arXiv:1405.6974>.

Maintained by Nelson Gonzabato. Last updated 3 years ago.

analysis-of-variance anova correlation correlation-coefficient generalized-linear-models gradient-boosting-decision-trees knn-classification linear-models linear-regression machine-learning missing-values models r-programming random-forest-algorithm regression-models

6.7 match 2 stars 5.30 score 50 scripts

lcougnaud

nlcv:Nested Loop Cross Validation

Nested loop cross validation for classification purposes for misclassification error rate estimation. The package supports several methodologies for feature selection: random forest, Student t-test, limma, and provides an interface to the following classification methods in the 'MLInterfaces' package: linear, quadratic discriminant analyses, random forest, bagging, prediction analysis for microarray, generalized linear model, support vector machine (svm and ksvm). Visualizations to assess the quality of the classifier are included: plot of the ranks of the features, scores plot for a specific classification algorithm and number of features, misclassification rate for the different number of features and classification algorithms tested and ROC plot. For further details about the methodology, please check: Markus Ruschhaupt, Wolfgang Huber, Annemarie Poustka, and Ulrich Mansmann (2004) <doi:10.2202/1544-6115.1078>.

Maintained by Laure Cougnaud. Last updated 7 years ago.

17.4 match 2.00 score 8 scripts

jenniniku

gllvm:Generalized Linear Latent Variable Models

Analysis of multivariate data using generalized linear latent variable models (gllvm). Estimation is performed using either the Laplace method, variational approximations, or extended variational approximations, implemented via TMB (Kristensen et al. (2016), <doi:10.18637/jss.v070.i05>).

Maintained by Jenni Niku. Last updated 17 hours ago.

cpp openmp

3.3 match 51 stars 10.52 score 176 scripts 1 dependents

paulesantos

perutimber:Catalogue of the Timber Forest Species of the Peruvian Amazon

Access the data of the 'Catalogue of the Timber Forest Species of the Peruvian Amazon' Vásquez Martínez, R., & Rojas Gonzáles, R.D.P. (2022) <doi:10.21704/rfp.v37i3.1956>.

Maintained by Paul E. Santos Andrade. Last updated 3 months ago.

11.6 match 3.00 score 5 scripts

zackfisher

robumeta:Robust Variance Meta-Regression

Functions for conducting robust variance estimation (RVE) meta-regression using both large and small sample RVE estimators under various weighting schemes. These methods are distribution free and provide valid point estimates, standard errors and hypothesis tests even when the degree and structure of dependence between effect sizes is unknown. Also included are functions for conducting sensitivity analyses under correlated effects weighting and producing RVE-based forest plots.

Maintained by Zachary Fisher. Last updated 4 years ago.

4.5 match 8 stars 7.75 score 178 scripts 4 dependents

cran

CALIBERrfimpute:Multiple Imputation Using MICE and Random Forest

Functions to impute using random forest under full conditional specifications (multivariate imputation by chained equations). The methods are described in Shah and others (2014) <doi:10.1093/aje/kwt312>.

Maintained by Anoop Shah. Last updated 2 years ago.

13.1 match 2 stars 2.60 score

bcjaeger

obliqueRSF:Oblique Random Forests for Right-Censored Time-to-Event Data

Oblique random survival forests incorporate linear combinations of input variables into random survival forests (Ishwaran, 2008 <DOI:10.1214/08-AOAS169>). Regularized Cox proportional hazard models (Simon, 2016 <DOI:10.18637/jss.v039.i05>) are used to identify optimal linear combinations of input variables.

Maintained by Byron Jaeger. Last updated 3 years ago.

cpp

17.4 match 1.93 score 17 scripts

gavinsimpson

analogue:Analogue and Weighted Averaging Methods for Palaeoecology

Fits Modern Analogue Technique and Weighted Averaging transfer function models for prediction of environmental data from species data, and related methods used in palaeoecology.

Maintained by Gavin L. Simpson. Last updated 6 months ago.

3.8 match 14 stars 8.96 score 185 scripts 4 dependents

neilstats

ckbplotr:Create CKB Plots

ckbplotr provides functions to help create and style plots in R. It is being developed by, and primarily for, China Kadoorie Biobank researchers.

Maintained by Neil Wright. Last updated 2 months ago.

5.6 match 10 stars 5.87 score 37 scripts

amices

mice:Multivariate Imputation by Chained Equations

Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.

Maintained by Stef van Buuren. Last updated 5 days ago.

chained-equations fcs imputation mice missing-data missing-values multiple-imputation multivariate-data cpp

2.0 match 462 stars 16.50 score 10k scripts 154 dependents

ddebeer

permimp:Conditional Permutation Importance

An add-on to the 'party' package, with a faster implementation of the partial-conditional permutation importance for random forests. The standard permutation importance is implemented exactly the same as in the 'party' package. The conditional permutation importance can be computed faster, with an option to be backward compatible to the 'party' implementation. The package is compatible with random forests fit using the 'party' and the 'randomForest' package. The methods are described in Strobl et al. (2007) <doi:10.1186/1471-2105-8-25> and Debeer and Strobl (2020) <doi:10.1186/s12859-020-03622-2>.

Maintained by Dries Debeer. Last updated 2 years ago.

5.6 match 4 stars 5.85 score 39 scripts 1 dependents

r-lidar

lidR:Airborne LiDAR Data Manipulation and Visualization for Forestry Applications

Airborne LiDAR (Light Detection and Ranging) interface for data manipulation and visualization. Read/write 'las' and 'laz' files, computation of metrics in area based approach, point filtering, artificial point reduction, classification from geographic data, normalization, individual tree segmentation and other manipulations.

Maintained by Jean-Romain Roussel. Last updated 1 months ago.

als forestry las laz lidar point-cloud remote-sensing openblas cpp openmp

2.3 match 623 stars 14.47 score 844 scripts 8 dependents

ricgbl

etree:Classification and Regression with Structured and Mixed-Type Data

Implementation of Energy Trees, a statistical model to perform classification and regression with structured and mixed-type data. The model has a similar structure to Conditional Trees, but brings in Energy Statistics to test independence between variables that are possibly structured and of different nature. Currently, the package covers functions and graphs as structured covariates. It builds upon 'partykit' to provide functionalities for fitting, printing, plotting, and predicting with Energy Trees. Energy Trees are described in Giubilei et al. (2022) <arXiv:2207.04430>.

Maintained by Riccardo Giubilei. Last updated 3 years ago.

7.1 match 3 stars 4.52 score 11 scripts

ropensci

git2rdata:Store and Retrieve Data.frames in a Git Repository

The git2rdata package is an R package for writing and reading dataframes as plain text files. A metadata file stores important information. 1) Storing metadata makes it possible to maintain the classes of variables. By default, git2rdata optimizes the data for file storage. The optimization is most effective on data containing factors. The optimization makes the data less human readable. The user can turn this off when they prefer a human readable format over smaller files. Details on the implementation are available in vignette("plain_text", package = "git2rdata"). 2) Storing metadata also allows smaller row based diffs between two consecutive commits. This is a useful feature when storing data as plain text files under version control. Details on this part of the implementation are available in vignette("version_control", package = "git2rdata"). Although we envisioned git2rdata with a git workflow in mind, you can use it in combination with other version control systems like subversion or mercurial. 3) git2rdata is a useful tool in a reproducible and traceable workflow. vignette("workflow", package = "git2rdata") gives a toy example. 4) vignette("efficiency", package = "git2rdata") provides some insight into the efficiency of file storage, git repository size and speed for writing and reading.

Maintained by Thierry Onkelinx. Last updated 2 months ago.

reproducible-research version-control

3.1 match 99 stars 10.03 score 216 scripts 4 dependents

riccardo-df

ocf:Ordered Correlation Forest

Machine learning estimator specifically optimized for predictive modeling of ordered non-numeric outcomes. 'ocf' provides forest-based estimation of the conditional choice probabilities and the covariates’ marginal effects. Under an "honesty" condition, the estimates are consistent and asymptotically normal and standard errors can be obtained by leveraging the weight-based representation of the random forest predictions. Please reference the use as Di Francesco (2025) <doi:10.1080/07474938.2024.2429596>.

Maintained by Riccardo Di Francesco. Last updated 15 days ago.

cpp

7.9 match 3.95 score 5 scripts 1 dependents

biogenies

CancerGram:Prediction of Anticancer Peptides

Predicts anticancer peptides using random forests trained on the n-gram encoded peptides. The implemented algorithm can be accessed from both the command line and shiny-based GUI. The CancerGram model is too large for CRAN and it has to be downloaded separately from the repository: <https://github.com/BioGenies/CancerGramModel>. For more information see: Burdukiewicz et al. (2020) <doi:10.3390/pharmaceutics12111045>.

Maintained by Michal Burdukiewicz. Last updated 4 years ago.

anticancer-peptides bioinformatics k-mer n-gram peptide-identification random-forests

8.0 match 4 stars 3.90 score 3 scripts

zongzheng

forestHES:Forest Health Evaluation System at the Forest Stand Level

Assessing forest ecosystem health is an effective tool for forest resource management. The national forest health evaluation system at the forest stand level, based on the analytic hierarchy process, has high application value and practical significance. The package implements the full assessment process and helps foresters further assess and manage forest resources.

Maintained by Zongzheng Chai. Last updated 5 months ago.

27.9 match 1 stars 1.11 score 13 scripts

trotsiuk

r3PG:Simulating Forest Growth using the 3-PG Model

Provides a flexible and easy-to-use interface for the Physiological Processes Predicting Growth (3-PG) model written in Fortran. r3PG covers both 3-PGpjs (monospecific, even-aged and evergreen forests) described in Landsberg & Waring (1997) <doi:10.1016/S0378-1127(97)00026-1> and 3-PGmix (deciduous, uneven-aged or mixed-species forests) described in Forrester & Tang (2016) <doi:10.1016/j.ecolmodel.2015.07.010>.

Maintained by Volodymyr Trotsiuk. Last updated 10 months ago.

fortran glibc

5.3 match 27 stars 5.83 score 25 scripts

myles-lewis

nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'

Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.

Maintained by Myles Lewis. Last updated 4 days ago.

3.9 match 12 stars 7.92 score 46 scripts

calakus

RFCCA:Random Forest with Canonical Correlation Analysis

Random Forest with Canonical Correlation Analysis (RFCCA) is a random forest method for estimating the canonical correlations between two sets of variables depending on the subject-related covariates. The trees are built with a splitting rule specifically designed to partition the data to maximize the canonical correlation heterogeneity between child nodes. The method is described in Alakus et al. (2021) <doi:10.1093/bioinformatics/btab158>. 'RFCCA' uses 'randomForestSRC' package (Ishwaran and Kogalur, 2020) by freezing at the version 2.9.3. The custom splitting rule feature is utilised to apply the proposed splitting rule. The 'randomForestSRC' package implements 'OpenMP' by default, contingent upon the support provided by the target architecture and operating system. In this package, 'LAPACK' and 'BLAS' libraries are used for matrix decompositions.

Maintained by Cansu Alakus. Last updated 1 year ago.

openblas openmp

11.3 match 1 stars 2.70 score 3 scripts

jedalong

stampr:Spatial Temporal Analysis of Moving Polygons

Perform spatial-temporal analysis of moving polygons, a longstanding analysis problem in Geographic Information Systems. Facilitates directional analysis, distance analysis, and some other simple functionality for examining spatial-temporal patterns of moving polygons.

Maintained by Jed Long. Last updated 12 months ago.

8.0 match 4 stars 3.81 score 16 scripts

rupppy

PiC:Pointcloud Interactive Computation for Forest Structure Analysis

Provides advanced algorithms for analyzing pointcloud data in forestry applications. Key features include fast voxelization of large datasets; segmentation of point clouds into forest floor, understorey, canopy, and wood components. The package enables efficient processing of large-scale forest pointcloud data, offering insights into forest structure, connectivity, and fire risk assessment. Algorithms to analyze pointcloud data (.xyz input file). For more details, see Ferrara & Arrizza (2025) <https://hdl.handle.net/20.500.14243/533471>. For single tree segmentation details, see Ferrara et al. (2018) <doi:10.1016/j.agrformet.2018.04.008>.

Maintained by Roberto Ferrara. Last updated 26 days ago.

7.8 match 3.88 score 19 scripts

bioc

dreamlet:Scalable differential expression analysis of single cell transcriptomics datasets with complex study designs

Recent advances in single cell/nucleus transcriptomic technology have enabled collection of cohort-scale datasets to study cell type specific gene expression differences associated with disease state, stimulus, and genetic regulation. The scale of these data, complex study designs, and low read count per cell mean that characterizing cell type specific molecular mechanisms requires a user-friendly, purpose-built analytical framework. We have developed the dreamlet package that applies a pseudobulk approach and fits a regression model for each gene and cell cluster to test differential expression across individuals associated with a trait of interest. Use of precision-weighted linear mixed models enables accounting for repeated measures study designs, high dimensional batch effects, and varying sequencing depth or observed cells per biosample.

Maintained by Gabriel Hoffman. Last updated 5 months ago.

rnaseq geneexpression differentialexpression batcheffect qualitycontrol regression genesetenrichment generegulation epigenetics functionalgenomics transcriptomics normalization singlecell preprocessing sequencing immunooncology software cpp

3.8 match 12 stars 8.09 score 128 scripts

stephematician

literanger:Random Forests for Multiple Imputation Based on 'ranger'

An updated implementation of R package 'ranger' by Wright et al. (2017) <doi:10.18637/jss.v077.i01> for training and predicting from random forests, particularly suited to high-dimensional data, and for embedding in 'Multiple Imputation by Chained Equations' (MICE) by van Buuren (2007) <doi:10.1177/0962280206074463>. Ensembles of classification and regression trees are currently supported. Sparse data of class 'dgCMatrix' (R package 'Matrix') can be directly analyzed. Conventional bagged predictions are available alongside an efficient prediction for MICE via the algorithm proposed by Doove et al. (2014) <doi:10.1016/j.csda.2013.10.025>. Survival and probability forests are not supported in the update, nor is data of class 'gwaa.data' (R package 'GenABEL'); use the original 'ranger' package for these analyses.

Maintained by Stephen Wade. Last updated 6 months ago.

cpp

9.3 match 3.26 score 2 scripts

krisrs1128

multimedia:Multimodal Mediation Analysis

Multimodal mediation analysis is an emerging problem in microbiome data analysis. Multimedia makes advanced mediation analysis techniques easy to use, ensuring that all statistical components are transparent and adaptable to specific problem contexts. The package provides a uniform interface to direct and indirect effect estimation, synthetic null hypothesis testing, bootstrap confidence interval construction, and sensitivity analysis. More details are available in Jiang et al. (2024) "multimedia: Multimodal Mediation Analysis of Microbiome Data" <doi:10.1101/2024.03.27.587024>.

Maintained by Kris Sankaran. Last updated 29 days ago.

coverage microbiome regression sequencing software statisticalmethod structuralequationmodels causal-inference data-integration mediation-analysis

5.4 match 1 stars 5.56 score 13 scripts

ldbk

m2b:Movement to Behaviour Inference using Random Forest

Prediction of behaviour from movement characteristics using observation and random forest for the analyses of movement data in ecology. From movement information (speed, bearing...) the model predicts the observed behaviour (movement, foraging...) using random forest. The model can then extrapolate behavioural information to movement data without direct observation of behaviours. The specificity of this method relies on the derivation of multiple predictor variables from the movement data over a range of temporal windows. This procedure captures as much information as possible on the changes and variations of movement and lets the random forest algorithm be used to its best capacity. The method is very generic and applicable to any data set that provides movement data together with observations of behaviour.

Maintained by Laurent Dubroca. Last updated 8 years ago.

7.3 match 2 stars 4.08 score 12 scripts

bnowok

synthpop:Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control

A tool for producing synthetic versions of microdata containing confidential information so that they are safe to be released to users for exploratory analysis. The key objective of generating synthetic data is to replace sensitive original values with synthetic ones causing minimal distortion of the statistical information contained in the data set. Variables, which can be categorical or continuous, are synthesised one-by-one using sequential modelling. Replacements are generated by drawing from conditional distributions fitted to the original data using parametric or classification and regression trees models. Data are synthesised via the function syn() which can be largely automated, if default settings are used, or with methods defined by the user. Optional parameters can be used to influence the disclosure risk and the analytical quality of the synthesised data. For a description of the implemented method see Nowok, Raab and Dibben (2016) <doi:10.18637/jss.v074.i11>.

Maintained by Beata Nowok. Last updated 3 years ago.

3.8 match 44 stars 7.85 score 536 scripts

paulhendricks

titanic:Titanic Passenger Survival Data Set

This data set provides information on the fate of passengers on the fatal maiden voyage of the ocean liner "Titanic", with variables such as economic status (class), sex, age, and survival. Whereas the base R Titanic data found by calling data("Titanic") is an array resulting from cross-tabulating 2201 observations, these data sets are individual non-aggregated observations and formatted in a machine learning context with a training sample, a testing sample, and two additional data sets that can be used for deeper machine learning analysis. These data sets are used in a very well known Kaggle competition; formatting the raw data sets in a package hopefully lowers the barrier to entry for users new to R and machine learning.

Maintained by Paul Hendricks. Last updated 8 years ago.

3.3 match 10 stars 8.95 score 804 scripts 2 dependents

ropensci

frictionless:Read and Write Frictionless Data Packages

Read and write Frictionless Data Packages. A 'Data Package' (<https://specs.frictionlessdata.io/data-package/>) is a simple container format and standard to describe and package a collection of (tabular) data. It is typically used to publish FAIR (<https://www.go-fair.org/fair-principles/>) and open datasets.

Maintained by Peter Desmet. Last updated 6 months ago.

frictionlessdata oscibio

3.0 match 30 stars 9.79 score 55 scripts 6 dependents

bioc

survcomp:Performance Assessment and Comparison for Survival Analysis

Assessment and Comparison for Performance of Risk Prediction (Survival) Models.

Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.

geneexpression differentialexpression visualization cpp

3.5 match 8.46 score 448 scripts 12 dependents

forestry-labs

distillML:Model Distillation and Interpretability Methods for Machine Learning Models

Provides several methods for model distillation and interpretability for general black box machine learning models and treatment effect estimation methods. For details on the algorithms implemented, see <https://forestry-labs.github.io/distillML/index.html> Brian Cho, Theo F. Saarinen, Jasjeet S. Sekhon, Simon Walter.

Maintained by Theo Saarinen. Last updated 2 years ago.

bart distillation-model explainable-machine-learning explainable-ml interpretability interpretable-machine-learning machine-learning model random-forest xgboost

7.5 match 7 stars 3.92 score 12 scripts

civisanalytics

civis:R Client for the 'Civis Platform API'

A convenient interface for making requests directly to the 'Civis Platform API' <https://www.civisanalytics.com/platform/>. Full documentation available 'here' <https://civisanalytics.github.io/civis-r/>.

Maintained by Peter Cooman. Last updated 2 months ago.

3.8 match 16 stars 7.84 score 144 scripts

psychmeta

psychmeta:Psychometric Meta-Analysis Toolkit

Tools for computing bare-bones and psychometric meta-analyses and for generating psychometric data for use in meta-analysis simulations. Supports bare-bones, individual-correction, and artifact-distribution methods for meta-analyzing correlations and d values. Includes tools for converting effect sizes, computing sporadic artifact corrections, reshaping meta-analytic databases, computing multivariate corrections for range variation, and more. Bugs can be reported to <https://github.com/psychmeta/psychmeta/issues> or <issues@psychmeta.com>.

Maintained by Jeffrey A. Dahlke. Last updated 9 months ago.

hacktoberfest meta-analysis psychology psychometric psychometrics

3.5 match 57 stars 8.25 score 151 scripts

jeffreyevans

yaImpute:Nearest Neighbor Observation Imputation and Evaluation Tools

Performs nearest neighbor-based imputation using one or more alternative approaches to processing multivariate data. These include methods based on canonical correlation analysis, canonical correspondence analysis, and a multivariate adaptation of the random forest classification and regression techniques of Leo Breiman and Adele Cutler. Additional methods are also offered. The package includes functions for comparing the results from running alternative techniques, detecting imputation targets that are notably distant from reference observations, detecting and correcting for bias, bootstrapping and building ensemble imputations, and mapping results.

Maintained by Jeffrey S. Evans. Last updated 6 months ago.

imputation cpp

3.9 match 3 stars 7.40 score 94 scripts 12 dependents

lorismichel

drf:Distributional Random Forests

An implementation of distributional random forests as introduced in Cevid & Michel & Meinshausen & Buhlmann (2020) <arXiv:2005.14458>.

Maintained by Loris Michel. Last updated 4 years ago.

cpp

18.3 match 1.59 score 39 scripts

statistikat

VIM:Visualization and Imputation of Missing Values

New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on this structure of the missing values, the corresponding methods may help to identify the mechanism generating the missing values and allow the user to explore the data including missing values. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface available in the separate package VIMGUI allows easy handling of the implemented plot methods.

Maintained by Matthias Templ. Last updated 7 months ago.

hotdeck imputation-methods model-predictions visualization cpp

2.0 match 85 stars 14.44 score 2.6k scripts 19 dependents

smin95

smplot2:Create Standalone and Composite Plots in 'ggplot2' for Publications

Provides functions for creating and annotating a composite plot in 'ggplot2'. Offers background themes and shortcut plotting functions that produce figures that are appropriate for the format of scientific journals. Some methods are described in Min and Zhou (2021) <doi:10.3389/fgene.2021.802894>.

Maintained by Seung Hyun Min. Last updated 1 month ago.

easy-to-use ggplot2 scientific-visualization visualization

4.0 match 24 stars 7.08 score 288 scripts 1 dependents

oxfordihtm

oxthema:Oxford Colours, Palettes, Fonts, and Themes

Colours, palettes, fonts, and themes based on University of Oxford's visual identity guidelines <https://communications.web.ox.ac.uk/communications-resources/visual-identity/identity-guidelines>.

Maintained by Ernest Guevarra. Last updated 5 months ago.

ggplot-themes oxford

5.8 match 3 stars 4.91 score 10 scripts

sparklyr

sparklyr:R Interface to Apache Spark

R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.

Maintained by Edgar Ruiz. Last updated 8 days ago.

apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr

1.9 match 959 stars 15.16 score 4.0k scripts 21 dependents

calakus

CovRegRF:Covariance Regression with Random Forests

Covariance Regression with Random Forests (CovRegRF) is a random forest method for estimating the covariance matrix of a multivariate response given a set of covariates. Random forest trees are built with a new splitting rule which is designed to maximize the distance between the sample covariance matrix estimates of the child nodes. The method is described in Alakus et al. (2023) <doi:10.1186/s12859-023-05377-y>. 'CovRegRF' uses 'randomForestSRC' package (Ishwaran and Kogalur, 2022) <https://cran.r-project.org/package=randomForestSRC> by freezing at the version 3.1.0. The custom splitting rule feature is utilised to apply the proposed splitting rule. The 'randomForestSRC' package implements 'OpenMP' by default, contingent upon the support provided by the target architecture and operating system. In this package, 'LAPACK' and 'BLAS' libraries are used for matrix decompositions.

Maintained by Cansu Alakus. Last updated 8 months ago.

openblas openmp

10.5 match 2.70 score 3 scripts

cidree

forestdata:Download Forestry Data

Functions for downloading forestry and land use data for use in spatial analysis. This package offers a user-friendly solution to quickly obtain datasets such as forest height, forest types, tree species under various climate change scenarios, or land use data, among others.

Maintained by Adrián Cidre González. Last updated 3 months ago.

6.8 match 13 stars 4.14 score 7 scripts

aberhrml

forestControl:Approximate False Positive Rate Control in Selection Frequency for Random Forest

Approximate false positive rate control in selection frequency for random forest using the methods described by Ender Konukoglu and Melanie Ganz (2014) <arXiv:1410.2838>. Methods for calculating the selection frequency threshold at false positive rates and selection frequency false positive rate feature selection.

Maintained by Tom Wilson. Last updated 3 years ago.

randomforest cpp

7.0 match 2 stars 4.00 score 7 scripts