Showing 200 of 680 total results
merck
forestly:Interactive Forest Plot
Interactive forest plot for clinical trial safety analysis using 'metalite', 'reactable', 'plotly', and Analysis Data Model (ADaM) datasets. Includes functionality for adverse event filtering, incidence-based group filtering, hover-over reveals, and search and sort operations. The workflow allows for metadata construction, data preparation, output formatting, and interactive plot generation.
Maintained by Benjamin Wang. Last updated 2 months ago.
69.5 match 14 stars 7.59 score 12 scripts 1 dependents
david-cortes
isotree:Isolation-Based Outlier Detection
Fast and multi-threaded implementation of isolation forest (Liu, Ting, Zhou (2008) <doi:10.1109/ICDM.2008.17>), extended isolation forest (Hariri, Kind, Brunner (2018) <doi:10.48550/arXiv.1811.02141>), SCiForest (Liu, Ting, Zhou (2010) <doi:10.1007/978-3-642-15883-4_18>), fair-cut forest (Cortes (2021) <doi:10.48550/arXiv.2110.13402>), robust random-cut forest (Guha, Mishra, Roy, Schrijvers (2016) <http://proceedings.mlr.press/v48/guha16.html>), and customizable variations of them, for isolation-based outlier detection, clustered outlier detection, distance or similarity approximation (Cortes (2019) <doi:10.48550/arXiv.1910.12362>), isolation kernel calculation (Ting, Zhu, Zhou (2018) <doi:10.1145/3219819.3219990>), and imputation of missing values (Cortes (2019) <doi:10.48550/arXiv.1911.06646>), based on random or guided decision tree splitting, and providing different metrics for scoring anomalies based on isolation depth or density (Cortes (2021) <doi:10.48550/arXiv.2111.11639>). Provides simple heuristics for fitting the model to categorical columns and handling missing data, and offers options for varying between random and guided splits, and for using different splitting criteria.
Maintained by David Cortes. Last updated 13 days ago.
anomaly-detection imputation isolation-forest outlier-detection cpp openmp
48.0 match 203 stars 10.41 score 115 scripts 6 dependents
cran
grf:Generalized Random Forests
Forest-based statistical estimation and inference. GRF provides non-parametric methods for heterogeneous treatment effects estimation (optionally using right-censored outcomes, multiple treatment arms or outcomes, or instrumental variables), as well as least-squares regression, quantile regression, and survival regression, all with support for missing covariates.
Maintained by Erik Sverdrup. Last updated 4 months ago.
75.9 match 5.83 score 1.2k scripts 14 dependents
wviechtb
metafor:Meta-Analysis Package for R
A comprehensive collection of functions for conducting meta-analyses in R. The package includes functions to calculate various effect sizes or outcome measures, fit equal-, fixed-, random-, and mixed-effects models to such data, carry out moderator and meta-regression analyses, and create various types of meta-analytical plots (e.g., forest, funnel, radial, L'Abbe, Baujat, bubble, and GOSH plots). For meta-analyses of binomial and person-time data, the package also provides functions that implement specialized methods, including the Mantel-Haenszel method, Peto's method, and a variety of suitable generalized linear (mixed-effects) models (i.e., mixed-effects logistic and Poisson regression models). Finally, the package provides functionality for fitting meta-analytic multivariate/multilevel models that account for non-independent sampling errors and/or true effects (e.g., due to the inclusion of multiple treatment studies, multiple endpoints, or other forms of clustering). Network meta-analyses and meta-analyses accounting for known correlation structures (e.g., due to phylogenetic relatedness) can also be conducted. An introduction to the package can be found in Viechtbauer (2010) <doi:10.18637/jss.v036.i03>.
Maintained by Wolfgang Viechtbauer. Last updated 21 hours ago.
meta-analysis mixed-effects multilevel-models multivariate
25.0 match 246 stars 16.30 score 4.9k scripts 92 dependents
sollano
forestmangr:Forest Mensuration and Management
Processing forest inventory data with methods such as simple random sampling, stratified random sampling and systematic sampling. There are also functions for yield and growth predictions and model fitting, linear and nonlinear grouped data fitting, and statistical tests. References: Kershaw Jr., Ducey, Beers and Husch (2016). <doi:10.1002/9781118902028>.
Maintained by Sollano Rabelo Braga. Last updated 3 months ago.
50.2 match 17 stars 7.97 score 378 scripts
stochastictree
stochtree:Stochastic Tree Ensembles (XBART and BART) for Supervised Learning and Causal Inference
Flexible stochastic tree ensemble software. Robust implementations of Bayesian Additive Regression Trees (BART) Chipman, George, McCulloch (2010) <doi:10.1214/09-AOAS285> for supervised learning and Bayesian Causal Forests (BCF) Hahn, Murray, Carvalho (2020) <doi:10.1214/19-BA1195> for causal inference. Enables model serialization and parallel sampling and provides a low-level interface for custom stochastic forest samplers.
Maintained by Drew Herren. Last updated 16 days ago.
bart bayesian-machine-learning bayesian-methods decision-trees gradient-boosted-trees machine-learning probabilistic-models tree-ensembles cpp
42.0 match 20 stars 8.52 score 40 scripts
brandmaier
semtree:Recursive Partitioning for Structural Equation Models
SEM Trees and SEM Forests -- an extension of model-based decision trees and forests to Structural Equation Models (SEM). SEM trees hierarchically split empirical data into homogeneous groups each sharing similar data patterns with respect to a SEM by recursively selecting optimal predictors of these differences. SEM forests are an extension of SEM trees. They are ensembles of SEM trees each built on a random sample of the original data. By aggregating over a forest, we obtain measures of variable importance that are more robust than measures from single trees. A description of the method was published by Brandmaier, von Oertzen, McArdle, & Lindenberger (2013) <doi:10.1037/a0030001> and Arnold, Voelkle, & Brandmaier (2020) <doi:10.3389/fpsyg.2020.564403>.
Maintained by Andreas M. Brandmaier. Last updated 3 months ago.
bigdata decision-tree forest multivariate randomforest recursive-partitioning sem statistical-modeling structural-equation-modeling structural-equation-models
37.8 match 15 stars 8.63 score 68 scripts
emf-creaf
medfate:Mediterranean Forest Simulation
Simulate Mediterranean forest functioning and dynamics using cohort-based description of vegetation [De Caceres et al. (2015) <doi:10.1016/j.agrformet.2015.06.012>; De Caceres et al. (2021) <doi:10.1016/j.agrformet.2020.108233>].
Maintained by Miquel De Cáceres. Last updated 8 days ago.
40.6 match 11 stars 7.49 score 183 scripts 1 dependents
modeloriented
randomForestExplainer:Explaining and Visualizing Random Forests in Terms of Variable Importance
A set of tools to help explain which variables are most important in a random forest. Various variable importance measures are calculated and visualized in different settings in order to get an idea of how their importance changes depending on our criteria (Hemant Ishwaran and Udaya B. Kogalur and Eiran Z. Gorodeski and Andy J. Minn and Michael S. Lauer (2010) <doi:10.1198/jasa.2009.tm08622>, Leo Breiman (2001) <doi:10.1023/A:1010933404324>).
Maintained by Yue Jiang. Last updated 12 months ago.
29.1 match 231 stars 9.82 score 236 scripts
simonpcouch
forested:Forest Attributes in Washington State
A small subset of plots in Washington State are sampled and assessed "on-the-ground" as forested or non-forested by the U.S. Department of Agriculture, Forest Service, Forest Inventory and Analysis (FIA) Program, but the FIA also has access to remotely sensed data for all land in the state. The 'forested' package contains a data frame by the same name intended for use in predictive modeling applications where the more easily-accessible remotely sensed data can be used to predict whether a plot is forested or non-forested.
Maintained by Simon Couch. Last updated 7 months ago.
58.0 match 7 stars 4.66 score 33 scripts
mlr-org
mlr3extralearners:Extra Learners For mlr3
Extra learners for use in mlr3.
Maintained by Sebastian Fischer. Last updated 4 months ago.
25.3 match 94 stars 9.16 score 474 scripts
andrew-plowright
ForestTools:Tools for Analyzing Remote Sensing Forest Data
Tools for analyzing remote sensing forest data, including functions for detecting treetops from canopy models, outlining tree crowns, and calculating textural metrics.
Maintained by Andrew Plowright. Last updated 1 month ago.
32.9 match 73 stars 7.01 score 103 scripts 1 dependents
spatstat
spatstat.data:Datasets for 'spatstat' Family
Contains all the datasets for the 'spatstat' family of packages.
Maintained by Adrian Baddeley. Last updated 3 hours ago.
kernel-density point-process spatial-analysis spatial-data spatial-data-analysis spatstat statistical-analysis statistical-methods statistical-tests statistics
19.8 match 6 stars 11.02 score 186 scripts 228 dependents
adayim
forestploter:Create a Flexible Forest Plot
Create a forest plot based on the layout of the data. Confidence intervals in multiple columns by groups can be done easily. Editing the plot, inserting/adding text, applying a theme to the plot, and much more.
Maintained by Alimu Dayimu. Last updated 6 months ago.
22.2 match 93 stars 9.31 score 207 scripts 4 dependents
mrcieu
TwoSampleMR:Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database
A package for performing Mendelian randomization using GWAS summary data. It uses the IEU OpenGWAS database <https://gwas.mrcieu.ac.uk/> to automatically obtain data, and a wide range of methods to run the analysis.
Maintained by Gibran Hemani. Last updated 9 days ago.
17.0 match 467 stars 11.23 score 1.7k scripts 1 dependents
kogalur
randomForestSRC:Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)
Fast OpenMP parallel computing of Breiman's random forests for univariate, multivariate, unsupervised, survival, competing risks, class imbalanced classification and quantile regression. New Mahalanobis splitting for correlated outcomes. Extreme random forests and randomized splitting. Suite of imputation methods for missing data. Fast random forests using subsampling. Confidence regions and standard errors for variable importance. New improved holdout importance. Case-specific importance. Minimal depth variable importance. Visualize trees on your Safari or Google Chrome browser. Anonymous random forests for data privacy.
Maintained by Udaya B. Kogalur. Last updated 2 months ago.
23.9 match 10 stars 7.90 score 1.2k scripts 12 dependents
imbs-hl
ranger:A Fast Implementation of Random Forests
A fast implementation of Random Forests, particularly suited for high dimensional data. Ensembles of classification, regression, survival and probability prediction trees are supported. Data from genome-wide association studies can be analyzed efficiently. In addition to data frames, datasets of class 'gwaa.data' (R package 'GenABEL') and 'dgCMatrix' (R package 'Matrix') can be directly analyzed.
Maintained by Marvin N. Wright. Last updated 4 months ago.
11.5 match 783 stars 16.22 score 9.2k scripts 189 dependents
ropensci
aorsf:Accelerated Oblique Random Forests
Fit, interpret, and compute predictions with oblique random forests. Includes support for partial dependence, variable importance, passing customized functions for variable importance and identification of linear combinations of features. Methods for the oblique random survival forest are described in Jaeger et al., (2023) <DOI:10.1080/10618600.2023.2231048>.
Maintained by Byron Jaeger. Last updated 2 days ago.
data-science oblique random-forest survival openblas cpp openmp
19.3 match 58 stars 9.21 score 60 scripts 1 dependents
tmlange
optRF:Optimising Random Forest Stability by Determining the Optimal Number of Trees
Calculating the stability of random forest with certain numbers of trees. The non-linear relationship between stability and numbers of trees is described using a logistic regression model and used to estimate the optimal number of trees.
Maintained by Thomas Martin Lange. Last updated 1 month ago.
35.1 match 4.78 score
jinli22
spm:Spatial Predictive Modeling
Introduction to some novel, accurate hybrid methods combining geostatistical and machine learning methods for spatial predictive modelling. It contains two commonly used geostatistical methods, two machine learning methods, four hybrid methods and two averaging methods. For each method, two functions are provided. One function is for assessing the predictive errors and accuracy of the method based on cross-validation. The other one is for generating spatial predictions using the method. For details please see: Li, J., Potter, A., Huang, Z., Daniell, J. J. and Heap, A. (2010) <https://www.ga.gov.au/metadata-gateway/metadata/record/gcat_71407>; Li, J., Heap, A. D., Potter, A., Huang, Z. and Daniell, J. (2011) <doi:10.1016/j.csr.2011.05.015>; Li, J., Heap, A. D., Potter, A. and Daniell, J. (2011) <doi:10.1016/j.envsoft.2011.07.004>; Li, J., Potter, A., Huang, Z. and Heap, A. (2012) <https://www.ga.gov.au/metadata-gateway/metadata/record/74030>.
Maintained by Jin Li. Last updated 3 years ago.
29.7 match 3 stars 5.46 score 107 scripts 3 dependents
igraph
igraph:Network Analysis and Visualization
Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.
Maintained by Kirill Müller. Last updated 1 day ago.
complex-networks graph-algorithms graph-theory mathematics network-analysis network-graph fortran libxml2 glpk openblas cpp
7.6 match 581 stars 21.10 score 31k scripts 1.9k dependents
doserjef
rFIA:Estimation of Forest Variables using the FIA Database
The goal of 'rFIA' is to increase the accessibility and use of the United States Forest Services (USFS) Forest Inventory and Analysis (FIA) Database by providing a user-friendly, open source toolkit to easily query and analyze FIA Data. Designed to accommodate a wide range of potential user objectives, 'rFIA' simplifies the estimation of forest variables from the FIA Database and allows all R users (experts and newcomers alike) to unlock the flexibility inherent to the Enhanced FIA design. Specifically, 'rFIA' improves accessibility to the spatial-temporal estimation capacity of the FIA Database by producing space-time indexed summaries of forest variables within user-defined population boundaries. Direct integration with other popular R packages (e.g., 'dplyr', 'tidyr', and 'sf') facilitates efficient space-time query and data summary, and supports common data representations and API design. The package implements design-based estimation procedures outlined by Bechtold & Patterson (2005) <doi:10.2737/SRS-GTR-80>, and has been validated against estimates and sampling errors produced by FIA 'EVALIDator'. Current development is focused on the implementation of spatially-enabled model-assisted and model-based estimators to improve population, change, and ratio estimates.
Maintained by Jeffrey Doser. Last updated 7 days ago.
compute-estimates fia fia-database fia-datamart forest-inventory forest-variables inventories space-time spatial
25.9 match 49 stars 5.93 score
andyliaw-mrk
randomForest:Breiman and Cutlers Random Forests for Classification and Regression
Classification and regression based on a forest of trees using random inputs, based on Breiman (2001) <DOI:10.1023/A:1010933404324>.
Maintained by Andy Liaw. Last updated 6 months ago.
12.6 match 47 stars 12.11 score 35k scripts 282 dependents
guido-s
netmeta:Network Meta-Analysis using Frequentist Methods
A comprehensive set of functions providing frequentist methods for network meta-analysis (Balduzzi et al., 2023) <doi:10.18637/jss.v106.i02> and supporting Schwarzer et al. (2015) <doi:10.1007/978-3-319-21416-0>, Chapter 8 "Network Meta-Analysis": - frequentist network meta-analysis following Rücker (2012) <doi:10.1002/jrsm.1058>; - additive network meta-analysis for combinations of treatments (Rücker et al., 2020) <doi:10.1002/bimj.201800167>; - network meta-analysis of binary data using the Mantel-Haenszel or non-central hypergeometric distribution method (Efthimiou et al., 2019) <doi:10.1002/sim.8158>, or penalised logistic regression (Evrenoglou et al., 2022) <doi:10.1002/sim.9562>; - rankograms and ranking of treatments by the Surface under the cumulative ranking curve (SUCRA) (Salanti et al., 2013) <doi:10.1016/j.jclinepi.2010.03.016>; - ranking of treatments using P-scores (frequentist analogue of SUCRAs without resampling) according to Rücker & Schwarzer (2015) <doi:10.1186/s12874-015-0060-8>; - split direct and indirect evidence to check consistency (Dias et al., 2010) <doi:10.1002/sim.3767>, (Efthimiou et al., 2019) <doi:10.1002/sim.8158>; - league table with network meta-analysis results; - 'comparison-adjusted' funnel plot (Chaimani & Salanti, 2012) <doi:10.1002/jrsm.57>; - net heat plot and design-based decomposition of Cochran's Q according to Krahn et al. (2013) <doi:10.1186/1471-2288-13-35>; - measures characterizing the flow of evidence between two treatments by König et al. (2013) <doi:10.1002/sim.6001>; - automated drawing of network graphs described in Rücker & Schwarzer (2016) <doi:10.1002/jrsm.1143>; - partial order of treatment rankings ('poset') and Hasse diagram for 'poset' (Carlsen & Bruggemann, 2014) <doi:10.1002/cem.2569>; (Rücker & Schwarzer, 2017) <doi:10.1002/jrsm.1270>; - contribution matrix as described in Papakonstantinou et al. (2018) <doi:10.12688/f1000research.14770.3> and Davies et al. (2022) <doi:10.1002/sim.9346>; - subgroup network meta-analysis.
Maintained by Guido Schwarzer. Last updated 13 hours ago.
meta-analysis network-meta-analysis rstudio
12.1 match 33 stars 11.82 score 199 scripts 10 dependents
guido-s
meta:General Package for Meta-Analysis
User-friendly general package providing standard methods for meta-analysis and supporting Schwarzer, Carpenter, and Rücker <DOI:10.1007/978-3-319-21416-0>, "Meta-Analysis with R" (2015): - common effect and random effects meta-analysis; - several plots (forest, funnel, Galbraith / radial, L'Abbe, Baujat, bubble); - three-level meta-analysis model; - generalised linear mixed model; - logistic regression with penalised likelihood for rare events; - Hartung-Knapp method for random effects model; - Kenward-Roger method for random effects model; - prediction interval; - statistical tests for funnel plot asymmetry; - trim-and-fill method to evaluate bias in meta-analysis; - meta-regression; - cumulative meta-analysis and leave-one-out meta-analysis; - import data from 'RevMan 5'; - produce forest plot summarising several (subgroup) meta-analyses.
Maintained by Guido Schwarzer. Last updated 24 days ago.
9.4 match 84 stars 14.84 score 2.3k scripts 29 dependents
jlmelville
rnndescent:Nearest Neighbor Descent Method for Approximate Nearest Neighbors
The Nearest Neighbor Descent method for finding approximate nearest neighbors by Dong and co-workers (2010) <doi:10.1145/1963405.1963487>. Based on the 'Python' package 'PyNNDescent' <https://github.com/lmcinnes/pynndescent>.
Maintained by James Melville. Last updated 8 months ago.
approximate-nearest-neighbor-search cpp
18.9 match 11 stars 7.31 score 75 scripts
blasbenito
spatialRF:Easy Spatial Modeling with Random Forest
Automatic generation and selection of spatial predictors for spatial regression with Random Forest. Spatial predictors are surrogates of variables driving the spatial structure of a response variable. The package offers two methods to generate spatial predictors from a distance matrix among training cases: 1) Moran's Eigenvector Maps (MEMs; Dray, Legendre, and Peres-Neto 2006 <DOI:10.1016/j.ecolmodel.2006.02.015>): computed as the eigenvectors of a weighted matrix of distances; 2) RFsp (Hengl et al. <DOI:10.7717/peerj.5518>): columns of the distance matrix used as spatial predictors. Spatial predictors help minimize the spatial autocorrelation of the model residuals and facilitate an honest assessment of the importance scores of the non-spatial predictors. Additionally, functions to reduce multicollinearity, identify relevant variable interactions, tune random forest hyperparameters, assess model transferability via spatial cross-validation, and explore model results via partial dependence curves and interaction surfaces are included in the package. The modelling functions are built around the highly efficient 'ranger' package (Wright and Ziegler 2017 <DOI:10.18637/jss.v077.i01>).
Maintained by Blas M. Benito. Last updated 3 years ago.
random-forest spatial-analysis spatial-regression
25.2 match 114 stars 5.45 score 49 scripts
smartdata-analysis-and-statistics
metamisc:Meta-Analysis of Diagnosis and Prognosis Research Studies
Facilitate frequentist and Bayesian meta-analysis of diagnosis and prognosis research studies. It includes functions to summarize multiple estimates of prediction model discrimination and calibration performance (Debray et al., 2019) <doi:10.1177/0962280218785504>. It also includes functions to evaluate funnel plot asymmetry (Debray et al., 2018) <doi:10.1002/jrsm.1266>. Finally, the package provides functions for developing multivariable prediction models from datasets with clustering (de Jong et al., 2021) <doi:10.1002/sim.8981>.
Maintained by Thomas Debray. Last updated 29 days ago.
meta-analysis prognosis prognostic-models
18.3 match 7 stars 7.48 score 102 scripts
okasag
orf:Ordered Random Forests
An implementation of the Ordered Forest estimator as developed in Lechner & Okasa (2019) <arXiv:1907.02436>. The Ordered Forest flexibly estimates the conditional probabilities of models with ordered categorical outcomes (so-called ordered choice models). In addition to common machine learning algorithms, the 'orf' package provides functions for estimating marginal effects as well as statistical inference thereof, and thus provides output similar to that of standard econometric models for ordered choice. The core forest algorithm relies on the fast C++ forest implementation from the 'ranger' package (Wright & Ziegler, 2017) <arXiv:1508.04409>.
Maintained by Gabriel Okasa. Last updated 3 years ago.
25.4 match 12 stars 5.38 score 22 scripts 2 dependents
cran
randomUniformForest:Random Uniform Forests for Classification, Regression and Unsupervised Learning
Ensemble model, for classification, regression and unsupervised learning, based on a forest of unpruned and randomized binary decision trees. Each tree is grown by sampling, with replacement, a set of variables at each node. Each cut-point is generated randomly, according to the continuous Uniform distribution. For each tree, data are either bootstrapped or subsampled. The unsupervised mode introduces clustering, dimension reduction and variable importance, using a three-layer engine. Random Uniform Forests are mainly aimed to lower correlation between trees (or trees residuals), to provide a deep analysis of variable importance and to allow native distributed and incremental learning.
Maintained by Saip Ciss. Last updated 3 years ago.
33.3 match 3 stars 3.77 score 99 scripts
mayer79
missRanger:Fast Imputation of Missing Values
Alternative implementation of the beautiful 'MissForest' algorithm used to impute mixed-type data sets by chaining random forests, introduced by Stekhoven, D.J. and Buehlmann, P. (2012) <doi:10.1093/bioinformatics/btr597>. Under the hood, it uses the lightning fast random forest package 'ranger'. Between the iterative model fitting, we offer the option of using predictive mean matching. This firstly avoids imputation with values not already present in the original data (like a value 0.3334 in 0-1 coded variable). Secondly, predictive mean matching tries to raise the variance in the resulting conditional distributions to a realistic level. This would allow, e.g., to do multiple imputation when repeating the call to missRanger(). Out-of-sample application is supported as well.
Maintained by Michael Mayer. Last updated 3 months ago.
imputation machine-learning missing-values random-forest
11.3 match 69 stars 11.07 score 208 scripts 6 dependents
insightsengineering
chevron:Standard TLGs for Clinical Trials Reporting
Provide standard tables, listings, and graphs (TLGs) libraries used in clinical trials. This package implements a structure to reformat the data with 'dunlin', create reporting tables using 'rtables' and 'tern' with standardized input arguments to enable quick generation of standard outputs. In addition, it also provides comprehensive data checks and script generation functionality.
Maintained by Joe Zhu. Last updated 23 days ago.
clinical-trials graphs listings nest reporting tables
14.7 match 12 stars 8.24 score 12 scripts
jernejjevsenak
MLFS:Machine Learning Forest Simulator
Climate-sensitive forest simulator based on the principles of machine learning. It simulates all key processes in the forest: radial growth, height growth, mortality, crown recession, regeneration and harvesting. The method for predicting tree heights was described by Skudnik and Jevšenak (2022) <doi:10.1016/j.foreco.2022.120017>, while the method for predicting basal area increments (BAI) was described by Jevšenak and Skudnik (2021) <doi:10.1016/j.foreco.2020.118601>.
Maintained by Jernej Jevsenak. Last updated 3 years ago.
33.7 match 2 stars 3.40 score 25 scripts
shangzhi-hong
RfEmpImp:Multiple Imputation using Chained Random Forests
An R package for multiple imputation using chained random forests. Implemented methods can handle missing data in mixed types of variables by using prediction-based or node-based conditional distributions constructed using random forests. For prediction-based imputation, the method based on the empirical distribution of out-of-bag prediction errors of random forests and the method based on normality assumption for prediction errors of random forests are provided for imputing continuous variables. And the method based on predicted probabilities is provided for imputing categorical variables. For node-based imputation, the method based on the conditional distribution formed by the predicting nodes of random forests, and the method based on proximity measures of random forests are provided. More details of the statistical methods can be found in Hong et al. (2020) <arXiv:2004.14823>.
Maintained by Shangzhi Hong. Last updated 2 years ago.
imputation missing-data random-forest
25.6 match 5 stars 4.40 score 8 scripts
tomasfryda
h2o:R Interface for the 'H2O' Scalable Machine Learning Platform
R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
Maintained by Tomas Fryda. Last updated 1 year ago.
13.6 match 3 stars 8.20 score 7.8k scripts 11 dependents
skgrange
rmweather:Tools to Conduct Meteorological Normalisation and Counterfactual Modelling for Air Quality Data
An integrated set of tools to allow data users to conduct meteorological normalisation and counterfactual modelling for air quality data. The meteorological normalisation technique uses predictive random forest models to remove variation of pollutant concentrations so trends and interventions can be explored in a robust way. For examples, see Grange et al. (2018) <doi:10.5194/acp-18-6223-2018> and Grange and Carslaw (2019) <doi:10.1016/j.scitotenv.2018.10.344>. The random forest models can also be used for counterfactual or business as usual (BAU) modelling by using the models to predict, from the model's perspective, the future. For an example, see Grange et al. (2021) <doi:10.5194/acp-2020-1171>.
Maintained by Stuart K. Grange. Last updated 22 days ago.
17.5 match 49 stars 6.24 score 239 scripts
emf-creaf
medfateland:Mediterranean Landscape Simulation
Simulate forest hydrology, forest function and dynamics over landscapes [De Caceres et al. (2015) <doi:10.1016/j.agrformet.2015.06.012>]. Parallelization is allowed in several simulation functions and simulations may be conducted including spatial processes such as lateral water transfer and seed dispersal.
Maintained by Miquel De Cáceres. Last updated 24 days ago.
20.0 match 5 stars 5.41 score 41 scripts
ehrlinger
ggRandomForests:Visually Exploring Random Forests
Graphic elements for exploring Random Forests using the 'randomForest' or 'randomForestSRC' package for survival, regression and classification forests and 'ggplot2' package plotting.
Maintained by John Ehrlinger. Last updated 4 days ago.
11.8 match 148 stars 8.94 score 197 scripts
gforge
forestplot:Advanced Forest Plot Using 'grid' Graphics
Allows the creation of forest plots with advanced features, such as multiple confidence intervals per row, customizable fonts for individual text elements, and flexible confidence interval drawing. It also supports mixing text with mathematical expressions. The package extends the application of forest plots beyond traditional meta-analyses, offering a more general version of the original 'rmeta' package’s forestplot() function. It relies heavily on the 'grid' package for rendering the plots.
Maintained by Max Gordon. Last updated 4 months ago.
9.0 match 43 stars 11.47 score 716 scripts 21 dependents
mrc-ide
epireview:Tools to update and summarise the latest pathogen data from the Pathogen Epidemiology Review Group (PERG)
Contains the latest open access pathogen data from the Pathogen Epidemiology Review Group (PERG). Tools are available to update pathogen databases with new peer-reviewed data as it becomes available, and to summarise the latest data using tables and figures.
Maintained by Sangeeta Bhatia. Last updated 23 hours ago.
15.2 match 30 stars 6.76 score 6 scripts
umr-amap
BIOMASS:Estimating Aboveground Biomass and Its Uncertainty in Tropical Forests
Contains functions to estimate aboveground biomass/carbon and its uncertainty in tropical forests. These functions allow users to (1) retrieve and correct taxonomy, (2) estimate wood density and its uncertainty, (3) construct height-diameter models, (4) manage tree and plot coordinates, and (5) estimate the aboveground biomass/carbon at the stand level with associated uncertainty. To cite 'BIOMASS', please use citation("BIOMASS"). See more in the article of Réjou-Méchain et al. (2017) <doi:10.1111/2041-210X.12753>.
Maintained by Dominique Lamonica. Last updated 14 hours ago.
10.3 match 26 stars 9.90 score 68 scripts 1 dependents
bips-hb
arf:Adversarial Random Forests
Adversarial random forests (ARFs) recursively partition data into fully factorized leaves, where features are jointly independent. The procedure is iterative, with alternating rounds of generation and discrimination. Data becomes increasingly realistic at each round, until original and synthetic samples can no longer be reliably distinguished. This is useful for several unsupervised learning tasks, such as density estimation and data synthesis. Methods for both are implemented in this package. ARFs naturally handle unstructured data with mixed continuous and categorical covariates. They inherit many of the benefits of random forests, including speed, flexibility, and solid performance with default parameters. For details, see Watson et al. (2023) <https://proceedings.mlr.press/v206/watson23a.html>.
Maintained by Marvin N. Wright. Last updated 18 days ago.
15.3 match 14 stars 6.65 score 16 scripts
egenn
rtemis:Machine Learning and Visualization
Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.
Maintained by E.D. Gennatas. Last updated 1 month ago.
data-sciencedata-visualizationmachine-learningmachine-learning-libraryvisualization
14.2 match 145 stars 7.09 score 50 scripts 2 dependents
prise6
aVirtualTwins:Adaptation of Virtual Twins Method from Jared Foster
Searches for subgroups in randomized clinical trials with a binary outcome and two treatment groups. This is an adaptation of the Jared Foster method (<https://www.ncbi.nlm.nih.gov/pubmed/21815180>).
Maintained by Francois Vieille. Last updated 7 years ago.
21.6 match 4 stars 4.51 score 16 scripts
molina-valero
FORTLS:Automatic Processing of Terrestrial-Based Technologies Point Cloud Data for Forestry Purposes
Process automation of point cloud data derived from terrestrial-based technologies such as Terrestrial Laser Scanner (TLS) or Mobile Laser Scanner. 'FORTLS' enables (i) detection of trees and estimation of tree-level attributes (e.g. diameters and heights), (ii) estimation of stand-level variables (e.g. density, basal area, mean and dominant height), (iii) computation of metrics related to important forest attributes estimated in Forest Inventories at stand-level, and (iv) optimization of plot design for combining TLS data and field measured data. Documentation about 'FORTLS' is described in Molina-Valero et al. (2022, <doi:10.1016/j.envsoft.2022.105337>).
Maintained by Juan Alberto Molina-Valero. Last updated 3 months ago.
forest-inventoryforest-monitoringlidar-point-cloudcpp
15.8 match 22 stars 6.16 score 11 scripts
crj32
MLeval:Machine Learning Model Evaluation
Straightforward and detailed evaluation of machine learning models. 'MLeval' can produce receiver operating characteristic (ROC) curves, precision-recall (PR) curves, calibration curves, and PR gain curves. 'MLeval' accepts a data frame of class probabilities and ground truth labels, or, it can automatically interpret the Caret train function results from repeated cross validation, then select the best model and analyse the results. 'MLeval' produces a range of evaluation metrics with confidence intervals.
Maintained by Christopher R John. Last updated 5 years ago.
16.9 match 6 stars 5.71 score 144 scripts
mkossmeier
metaviz:Forest Plots, Funnel Plots, and Visual Funnel Plot Inference for Meta-Analysis
A compilation of functions to create visually appealing and information-rich plots of meta-analytic data using 'ggplot2'. Currently allows creating forest plots, funnel plots, and many of their variants, such as rainforest plots, thick forest plots, additional evidence contour funnel plots, and sunset funnel plots. In addition, functionalities for visual inference with the funnel plot in the context of meta-analysis are provided.
Maintained by Michael Kossmeier. Last updated 5 years ago.
13.0 match 17 stars 7.32 score 135 scripts
anna-neufeld
splinetree:Longitudinal Regression Trees and Forests
Builds regression trees and random forests for longitudinal or functional data using a spline projection method. Implements and extends the work of Yu and Lambert (1999) <doi:10.1080/10618600.1999.10474847>. This method allows trees and forests to be built while considering either level and shape or only shape of response trajectories.
Maintained by Anna Neufeld. Last updated 6 years ago.
17.8 match 4 stars 5.24 score 29 scripts
jinghuazhao
gap:Genetic Analysis Package
As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many projects, which is also in line with its name (gap).
Maintained by Jing Hua Zhao. Last updated 15 days ago.
7.8 match 12 stars 11.88 score 448 scripts 16 dependents
miriamesteve
eat:Efficiency Analysis Trees
Functions are provided to determine production frontiers and technical efficiency measures through non-parametric techniques based upon regression trees. The package includes code for estimating radial input, output, directional and additive measures, plotting graphical representations of the scores and the production frontiers by means of trees, and determining rankings of importance of input variables in the analysis. Additionally, an adaptation of Random Forest by a set of individual Efficiency Analysis Trees for estimating technical efficiency is also included. More details in: <doi:10.1016/j.eswa.2020.113783>.
Maintained by Miriam Esteve. Last updated 3 years ago.
19.5 match 5 stars 4.68 score 19 scripts
lleisong
itsdm:Isolation Forest-Based Presence-Only Species Distribution Modeling
Collection of R functions to do purely presence-only species distribution modeling with isolation forest (iForest) and its variations such as Extended isolation forest and SCiForest. See the details of these methods in references: Liu, F.T., Ting, K.M. and Zhou, Z.H. (2008) <doi:10.1109/ICDM.2008.17>, Hariri, S., Kind, M.C. and Brunner, R.J. (2019) <doi:10.1109/TKDE.2019.2947676>, Liu, F.T., Ting, K.M. and Zhou, Z.H. (2010) <doi:10.1007/978-3-642-15883-4_18>, Guha, S., Mishra, N., Roy, G. and Schrijvers, O. (2016) <https://proceedings.mlr.press/v48/guha16.html>, Cortes, D. (2021) <arXiv:2110.13402>. Additionally, Shapley values are used to explain model inputs and outputs. See details in references: Shapley, L.S. (1953) <doi:10.1515/9781400881970-018>, Lundberg, S.M. and Lee, S.I. (2017) <https://dl.acm.org/doi/abs/10.5555/3295222.3295230>, Molnar, C. (2020) <ISBN:978-0-244-76852-2>, Štrumbelj, E. and Kononenko, I. (2014) <doi:10.1007/s10115-013-0679-x>. itsdm also provides functions to diagnose variable response, analyze variable importance, draw spatial dependence of variables and examine variable contribution. As utilities, the package includes a few functions to download bioclimatic variables including 'WorldClim' version 2.0 (see Fick, S.E. and Hijmans, R.J. (2017) <doi:10.1002/joc.5086>) and 'CMCC-BioClimInd' (see Noce, S., Caporaso, L. and Santini, M. (2020) <doi:10.1038/s41597-020-00726-5>).
Maintained by Lei Song. Last updated 2 years ago.
isolation-forestoutlier-detectionpresence-onlymodelshapley-valuespecies-distribution-modelling
16.2 match 4 stars 5.59 score 65 scripts
farrellday
miceRanger:Multiple Imputation by Chained Equations with Random Forests
Multiple Imputation has been shown to be a flexible method to impute missing values by Van Buuren (2007) <doi:10.1177/0962280206074463>. Expanding on this, random forests have been shown to be an accurate model by Stekhoven and Buhlmann <arXiv:1105.0828> to impute missing values in datasets. They have the added benefits of returning out of bag error and variable importance estimates, as well as being simple to run in parallel.
Maintained by Sam Wilson. Last updated 3 years ago.
imputation-methodsmachine-learningmicemissing-datamissing-valuesrandom-forests
12.7 match 67 stars 7.09 score 41 scripts 1 dependents
nalzok
tree.interpreter:Random Forest Prediction Decomposition and Feature Importance Measure
An R re-implementation of the 'treeinterpreter' package on PyPI <https://pypi.org/project/treeinterpreter/>. Each prediction can be decomposed as 'prediction = bias + feature_1_contribution + ... + feature_n_contribution'. This decomposition is then used to calculate the Mean Decrease Impurity (MDI) and Mean Decrease Impurity using out-of-bag samples (MDI-oob) feature importance measures based on the work of Li et al. (2019) <arXiv:1906.10845>.
Maintained by Qingyao Sun. Last updated 5 years ago.
data-sciencedatascienceinterpretabilitymachine-learningrandom-forestcpp
15.3 match 12 stars 5.79 score 6 scripts
uchidamizuki
timbr:Forest/Tree Data Frames
Provides data frames for forest or tree data structures. You can create forest data structures from data frames and process them based on their hierarchies.
Maintained by Mizuki Uchida. Last updated 4 months ago.
18.0 match 11 stars 4.93 score 31 scripts
csafe-isu
handwriterRF:Handwriting Analysis with Random Forests
Perform forensic handwriting analysis of two scanned handwritten documents. This package implements the statistical method described by Madeline Johnson and Danica Ommen (2021) <doi:10.1002/sam.11566>. Similarity measures and a random forest produce a score-based likelihood ratio that quantifies the strength of the evidence in favor of the documents being written by the same writer or different writers.
Maintained by Stephanie Reinders. Last updated 7 days ago.
14.1 match 2 stars 6.18 score 15 scripts 1 dependents
kearutherford
BerkeleyForestsAnalytics:Compute and Summarize Core Forest Metrics from Field Data
A suite of open-source R functions designed to produce standard metrics for forest management and ecology from forest inventory data. The overarching goal is to minimize potential inconsistencies introduced by the algorithms used to compute and summarize core forest metrics. Learn more about the purpose of the package and the specific algorithms used in the package at <https://github.com/kearutherford/BerkeleyForestsAnalytics>.
Maintained by Kea Rutherford. Last updated 2 months ago.
15.7 match 7 stars 5.50 score 4 scripts
gcicc
figuRes2:Support for a Variety of Figure Production Tasks
We view a figure as a collection of graphs/tables assembled on a page and optionally annotated with metadata (titles, headers and footers). Functions and supporting documentation are offered to streamline a variety of figure production tasks.
Maintained by Greg Cicconetti. Last updated 3 years ago.
17.9 match 3 stars 4.78 score
carlos-alberto-silva
ForestGapR:Tropical Forest Canopy Gaps Analysis
Set of tools for detecting and analyzing Airborne Laser Scanning-derived Tropical Forest Canopy Gaps. Details were published in Silva and others (2019) <doi:10.1111/2041-210X.13211>.
Maintained by Carlos Alberto Silva. Last updated 1 year ago.
16.1 match 29 stars 5.24 score 24 scripts
plantedml
randomPlantedForest:Random Planted Forest: A Directly Interpretable Tree Ensemble
An implementation of the Random Planted Forest algorithm for directly interpretable tree ensembles based on a functional ANOVA decomposition.
Maintained by Lukas Burk. Last updated 4 months ago.
intelligibilityinterpretable-machine-learninginterpretable-mlmachine-learningmlrandom-forestcpp
20.3 match 5 stars 4.15 score 38 scripts
caf-ifrit
forestat:Forest Carbon Sequestration and Potential Productivity Calculation
Includes assessing site classes based on stand height growth and establishing a nonlinear mixed-effect biomass model under different site classes based on the whole stand model to achieve more accurate estimation of carbon sequestration. In particular, a carbon sequestration potential productivity calculation method based on the potential mean annual increment is proposed. This package is applicable to both natural forests and plantations. It can quantitatively assess a stand's potential productivity, realized productivity, and possible improvement at a given site, and can be used in many aspects such as site quality assessment, tree species suitability evaluation, and forest degradation evaluation. References: Lei X, Fu L, Li H, et al (2018) <doi:10.11707/j.1001-7488.20181213>; Fu L, Sharma R P, Zhu G, et al (2017) <doi:10.3390/f8040119>.
Maintained by Yuanyuan Han. Last updated 1 year ago.
15.2 match 23 stars 5.52 score 29 scripts
mlr-org
mlr3mbo:Flexible Bayesian Optimization
A modern and flexible approach to Bayesian Optimization / Model Based Optimization building on the 'bbotk' package. 'mlr3mbo' is a toolbox providing both ready-to-use optimization algorithms as well as their fundamental building blocks allowing for straightforward implementation of custom algorithms. Single- and multi-objective optimization is supported as well as mixed continuous, categorical and conditional search spaces. Moreover, using 'mlr3mbo' for hyperparameter optimization of machine learning models within the 'mlr3' ecosystem is straightforward via 'mlr3tuning'. Examples of ready-to-use optimization algorithms include Efficient Global Optimization by Jones et al. (1998) <doi:10.1023/A:1008306431147>, ParEGO by Knowles (2006) <doi:10.1109/TEVC.2005.851274> and SMS-EGO by Ponweiser et al. (2008) <doi:10.1007/978-3-540-87700-4_78>.
Maintained by Lennart Schneider. Last updated 11 days ago.
automlbayesian-optimizationbbotkblack-box-optimizationgaussian-processhpohyperparameterhyperparameter-optimizationhyperparameter-tuningmachine-learningmlr3model-based-optimizationoptimizationoptimizerrandom-foresttuning
9.5 match 25 stars 8.57 score 120 scripts 3 dependents
nicolas-robette
moreparty:A Toolbox for Conditional Inference Trees and Random Forests
Additions to 'party' and 'partykit' packages : tools for the interpretation of forests (surrogate trees, prototypes, etc.), feature selection (see Gregorutti et al (2017) <arXiv:1310.5726>, Hapfelmeier and Ulm (2013) <doi:10.1016/j.csda.2012.09.020>, Altmann et al (2010) <doi:10.1093/bioinformatics/btq134>) and parallelized versions of conditional forest and variable importance functions. Also modules and a shiny app for conditional inference trees.
Maintained by Nicolas Robette. Last updated 11 months ago.
19.3 match 3 stars 4.18 score 8 scripts
talegari
solitude:An Implementation of Isolation Forest
Isolation forest is an anomaly detection method introduced in the paper Isolation-Based Anomaly Detection (Liu, Ting and Zhou <doi:10.1145/2133360.2133363>).
Maintained by Komala Sheshachala Srikanth. Last updated 4 years ago.
isolation-forestoutliersrpackages
15.4 match 24 stars 5.24 score 70 scripts 1 dependents
cran
metaumbrella:Umbrella Review Package for R
A comprehensive range of facilities to perform umbrella reviews with stratification of the evidence in R. The package accomplishes this aim by building on three core functions that: (i) automatically perform all required calculations in an umbrella review (including but not limited to meta-analyses), (ii) stratify evidence according to various classification criteria, and (iii) generate a visual representation of the results. Note that if you are not familiar with R, the core features of this package are available from a web browser (<https://www.metaumbrella.org/>).
Maintained by Corentin J Gosling. Last updated 15 days ago.
17.4 match 9 stars 4.56 score
stekhoven
missForest:Nonparametric Missing Value Imputation using Random Forest
The function 'missForest' in this package is used to impute missing values particularly in the case of mixed-type data. It uses a random forest trained on the observed values of a data matrix to predict the missing values. It can be used to impute continuous and/or categorical data including complex interactions and non-linear relations. It yields an out-of-bag (OOB) imputation error estimate without the need of a test set or elaborate cross-validation. It can be run in parallel to save computation time.
Maintained by Daniel J. Stekhoven. Last updated 1 year ago.
6.8 match 92 stars 11.53 score 1.1k scripts 32 dependents
tlverse
sl3:Pipelines for Machine Learning and Super Learning
A modern implementation of the Super Learner prediction algorithm, coupled with a general purpose framework for composing arbitrary pipelines for machine learning tasks.
Maintained by Jeremy Coyle. Last updated 4 months ago.
data-scienceensemble-learningensemble-modelmachine-learningmodel-selectionregressionstackingstatistics
7.6 match 100 stars 9.94 score 748 scripts 7 dependents
traitecoevo
plant:A Package for Modelling Forest Trait Ecology and Evolution
Solves the trait-, size- and patch-structured model from Falster et al. (2016) using either the method of characteristics or a stochastic, finite-sized population.
Maintained by Daniel Falster. Last updated 6 days ago.
c-plus-plusdemographydynamicecologyevolutionforestsplant-physiologyscience-researchsimulationtraitcpp
12.9 match 53 stars 5.87 score
tidymodels
tidypredict:Run Predictions Inside the Database
It parses a fitted 'R' model object, and returns a formula in 'Tidy Eval' code that calculates the predictions. It works with several databases back-ends because it leverages 'dplyr' and 'dbplyr' for the final 'SQL' translation of the algorithm. It currently supports lm(), glm(), randomForest(), ranger(), earth(), xgb.Booster.complete(), cubist(), and ctree() models.
Maintained by Emil Hvitfeldt. Last updated 3 months ago.
6.9 match 261 stars 11.03 score 241 scripts 2 dependents
e-sensing
sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes
An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<https://stacspec.org/>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organized maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce training samples imbalance proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <https://torch.mlverse.org/>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <https://jakubnowosad.com/supercells/>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>. 
Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.
Maintained by Gilberto Camara. Last updated 30 days ago.
big-earth-datacbersearth-observationeo-datacubesgeospatialimage-time-seriesland-cover-classificationlandsatplanetary-computerr-spatialremote-sensingrspatialsatellite-image-time-seriessatellite-imagerysentinel-2stac-apistac-catalogcpp
7.7 match 494 stars 9.50 score 384 scripts
anthonydevaux
DynForest:Random Forest with Multivariate Longitudinal Predictors
Based on the random forest principle, 'DynForest' is able to include multiple longitudinal predictors to provide individual predictions. Longitudinal predictors are modeled through the random forest. The methodology is fully described for a survival outcome in: Devaux, Helmer, Genuer & Proust-Lima (2023) <doi:10.1177/09622802231206477>.
Maintained by Anthony Devaux. Last updated 5 months ago.
11.3 match 16 stars 6.38 score 8 scripts
forestgeo
fgeo:Analyze Forest Diversity and Dynamics
To help you access, transform, analyze, and visualize ForestGEO data, we developed a collection of R packages (<https://forestgeo.github.io/fgeo/>). This package, in particular, helps you to install and load the entire package-collection with a single R command, and provides convenient ways to find relevant documentation. Most commonly, you should not worry about the individual packages that make up the package-collection as you can access all features via this package. To learn more about ForestGEO visit <http://www.forestgeo.si.edu/>.
Maintained by Mauro Lepore. Last updated 5 years ago.
abundancedemographydynamicdynamicsecologyfgeoforestgeoforestshabitatmetapackagetree
13.1 match 31 stars 5.50 score 12 scripts
randel
MixRF:A Random-Forest-Based Approach for Imputing Clustered Incomplete Data
It offers random-forest-based functions to impute clustered incomplete data. The package is tailored for but not limited to imputing multitissue expression data, in which a gene's expression is measured on the collected tissues of an individual but missing on the uncollected tissues.
Maintained by Jiebiao Wang. Last updated 8 years ago.
gene-expressionimputationmixed-modelsrandom-forest
16.3 match 35 stars 4.39 score 14 scripts
daniel-conn17
fuzzyforest:Fuzzy Forests
Fuzzy forests, a new algorithm based on random forests, is designed to reduce the bias seen in random forest feature selection caused by the presence of correlated features. Fuzzy forests uses recursive feature elimination random forests to select features from separate blocks of correlated features where the correlation within each block of features is high and the correlation between blocks of features is low. One final random forest is fit using the surviving features. This package fits random forests using the 'randomForest' package and allows for easy use of 'WGCNA' to split features into distinct blocks. See D. Conn, Ngun, T., C. Ramirez, and G. Li (2019) <doi:10.18637/jss.v091.i09> for further details.
Maintained by Daniel Conn. Last updated 5 years ago.
30.6 match 2.31 score 41 scripts
ericarcher
rfPermute:Estimate Permutation p-Values for Random Forest Importance Metrics
Estimate significance of importance metrics for a Random Forest model by permuting the response variable. Produces null distribution of importance metrics for each predictor variable and p-value of observed. Provides summary and visualization functions for 'randomForest' results.
Maintained by Eric Archer. Last updated 2 years ago.
10.4 match 27 stars 6.77 score 96 scripts 1 dependents
alaninglis
vivid:Variable Importance and Variable Interaction Displays
A suite of plots for displaying variable importance and two-way variable interaction jointly. Can also display partial dependence plots laid out in a pairs plot or 'zenplots' style.
Maintained by Alan Inglis. Last updated 8 months ago.
9.3 match 21 stars 7.39 score 39 scripts
cseljatib
datana:Datasets and Functions to Accompany Analisis De Datos Con R
Datasets and functions to accompany the book 'Analisis de datos con el programa estadistico R: una introduccion aplicada' by Salas-Eljatib (2021, ISBN: 9789566086109). The package helps carry out data management, exploratory analyses, and model fitting.
Maintained by Christian Salas-Eljatib. Last updated 6 months ago.
52.7 match 1.30 score 1 scripts
jmsigner
amt:Animal Movement Tools
Manage and analyze animal movement data. The functionality of 'amt' includes methods to calculate home ranges, track statistics (e.g. step lengths, speed, or turning angles), prepare data for fitting habitat selection analyses, and simulation of space-use from fitted step-selection functions.
Maintained by Johannes Signer. Last updated 4 months ago.
6.3 match 41 stars 10.54 score 418 scripts
shixiangwang
ezcox:Easily Process a Batch of Cox Models
A tool to operate a batch of univariate or multivariate Cox models and return tidy results.
Maintained by Shixiang Wang. Last updated 1 year ago.
8.8 match 21 stars 7.22 score 44 scripts 1 dependents
mlampros
RGF:Regularized Greedy Forest
Wrapper of the 'Regularized Greedy Forest' <https://github.com/RGF-team/rgf/tree/master/python-package> 'python' package, which also includes a multi-core implementation (FastRGF) <https://github.com/RGF-team/rgf/tree/master/FastRGF>.
Maintained by Lampros Mouselimis. Last updated 3 years ago.
17.7 match 3.57 score 74 scripts
hrlai
novelforestSG:Dataset from the Novel Forests of Singapore
The raw dataset and model used in Lai et al. (2021) Decoupled responses of native and exotic tree diversities to distance from old-growth forest and soil phosphorous in novel secondary forests. Applied Vegetation Science, 24, e12548.
Maintained by Hao Ran Lai. Last updated 1 year ago.
datadiversityecologyforestsingapore
18.7 match 2 stars 3.30 score 4 scripts
liuyu-star
ODRF:Oblique Decision Random Forest for Classification and Regression
The oblique decision tree (ODT) uses linear combinations of predictors as partitioning variables in a decision tree. Oblique Decision Random Forest (ODRF) is an ensemble of multiple ODTs generated by feature bagging. Oblique Decision Boosting Tree (ODBT) applies feature bagging during the training process of ODT-based boosting trees to ensemble multiple boosting trees. All three methods can be used for classification and regression, and ODT and ODRF serve as supplements to the classical CART of Breiman (1984) <DOI:10.1201/9781315139470> and Random Forest of Breiman (2001) <DOI:10.1023/A:1010933404324> respectively.
Maintained by Yu Liu. Last updated 5 months ago.
11.9 match 7 stars 5.10 score 18 scripts
brian-j-smith
MachineShop:Machine Learning Models and Tools
Meta-package for statistical and machine learning with a unified interface for model fitting, prediction, performance assessment, and presentation of results. Approaches for model fitting and prediction of numerical, categorical, or censored time-to-event outcomes include traditional regression models, regularization methods, tree-based methods, support vector machines, neural networks, ensembles, data preprocessing, filtering, and model tuning and selection. Performance metrics are provided for model assessment and can be estimated with independent test sets, split sampling, cross-validation, or bootstrap resampling. Resample estimation can be executed in parallel for faster processing and nested in cases of model tuning and selection. Modeling results can be summarized with descriptive statistics; calibration curves; variable importance; partial dependence plots; confusion matrices; and ROC, lift, and other performance curves.
Maintained by Brian J Smith. Last updated 7 months ago.
classification-modelsmachine-learningpredictive-modelingregression-modelssurvival-models
7.5 match 61 stars 7.95 score 121 scripts
benjilu
forestError:A Unified Framework for Random Forest Prediction Error Estimation
Estimates the conditional error distributions of random forest predictions and common parameters of those distributions, including conditional misclassification rates, conditional mean squared prediction errors, conditional biases, and conditional quantiles, by out-of-bag weighting of out-of-bag prediction errors as proposed by Lu and Hardin (2021). This package is compatible with several existing packages that implement random forests in R.
Maintained by Benjamin Lu. Last updated 4 years ago.
inferenceintervalsmachine-learningmachinelearningpredictionrandom-forestrandomforeststatistics
12.9 match 26 stars 4.62 score 16 scripts
mapme-initiative
mapme.biodiversity:Efficient Monitoring of Global Biodiversity Portfolios
Biodiversity areas, especially primary forest, serve a multitude of functions for the local economy, the regional functionality of ecosystems, and the global health of our planet. Recently, adverse changes in human land use practices and climatic responses to increased greenhouse gas emissions have put these biodiversity areas under a variety of threats. The present package helps to analyse a number of biodiversity indicators based on freely available geographical datasets. It supports computationally efficient routines that allow the analysis of potentially global biodiversity portfolios. The primary use case of the package is to support evidence-based reporting of an organization's effort to protect biodiversity areas under threat and to identify regions where intervention is most urgently needed.
Maintained by Darius A. Görgen. Last updated 3 months ago.
environmenteogismapmespatialsustainability
6.1 match 35 stars 9.24 score 287 scripts
natydasilva
PPforest:Projection Pursuit Classification Forest
Implements projection pursuit forest algorithm for supervised classification.
Maintained by Natalia da Silva. Last updated 8 months ago.
10.2 match 18 stars 5.53 score 19 scripts
forest-economics-goettingen
optimLanduse:Robust Land-Use Optimization
Robust multi-criteria land-allocation optimization that explicitly accounts for the uncertainty of the indicators in the objective function. Solves the problem of allocating scarce land to various land-use options with regard to multiple, coequal indicators. The method aims to find the land allocation that represents the indicator composition with the best possible trade-off under uncertainty. optimLanduse includes the actual optimization procedure as described by Knoke et al. (2016) <doi:10.1038/ncomms11877> and the post-hoc calculation of the portfolio performance as presented by Gosling et al. (2020) <doi:10.1016/j.jenvman.2020.110248>.
Maintained by Kai Husmann. Last updated 1 year ago.
15.6 match 2 stars 3.60 score 2 scripts
rdiaz02
varSelRF:Variable Selection using Random Forests
Variable selection from random forests using both backwards variable elimination (for the selection of small sets of non-redundant variables) and selection based on the importance spectrum (somewhat similar to scree plots; for the selection of large, potentially highly-correlated variables). Main applications in high-dimensional data (e.g., microarray data, and other genomics and proteomics applications).
Maintained by Ramon Diaz-Uriarte. Last updated 8 years ago.
8.7 match 12 stars 6.48 score 83 scripts 2 dependents
smouksassi
coveffectsplot:Produce Forest Plots to Visualize Covariate Effects
Produce forest plots to visualize covariate effects using either the command line or an interactive 'Shiny' application.
Maintained by Samer Mouksassi. Last updated 1 month ago.
7.0 match 32 stars 7.86 score 40 scripts
forest-economics-goettingen
woodValuationDE:Wood Valuation Germany
Monetary valuation of wood in German forests (stumpage values), including estimations of harvest quantities, wood revenues, and harvest costs. The functions are sensitive to tree species, mean diameter of the harvested trees, stand quality, and logging method. The functions include estimations for the consequences of disturbances on revenues and costs. The underlying assortment tables are taken from Offer and Staupendahl (2018) with corresponding functions for salable and skidded volume derived in Fuchs et al. (2023). Wood revenue and harvest cost functions were taken from v. Bodelschwingh (2018). The consequences of disturbances refer to Dieter (2001), Moellmann and Moehring (2017), and Fuchs et al. (2022a, 2022b). For the full references see documentation of the functions, package README, and Fuchs et al. (2023). Apart from Dieter (2001) and Moellmann and Moehring (2017), all functions and factors are based on data from HessenForst, the forest administration of the Federal State of Hesse in Germany.
Maintained by Jasper M. Fuchs. Last updated 8 months ago.
16.4 match 2 stars 3.30 score 2 scripts
distancedevelopment
Distance:Distance Sampling Detection Function and Abundance Estimation
A simple way of fitting detection functions to distance sampling data for both line and point transects. Adjustment term selection, left and right truncation as well as monotonicity constraints and binning are supported. Abundance and density estimates can also be calculated (via a Horvitz-Thompson-like estimator) if survey area information is provided. See Miller et al. (2019) <doi:10.18637/jss.v089.i01> for more information on methods and <https://examples.distancesampling.org/> for example analyses.
Maintained by Laura Marshall. Last updated 10 days ago.
6.0 match 11 stars 8.89 score 358 scripts 3 dependents
sylvainschmitt
rcontroll:Individual-Based Forest Growth Simulator 'TROLL'
'TROLL' is coded in C++ and it typically simulates hundreds of thousands of individuals over hundreds of years. The 'rcontroll' R package is a wrapper of 'TROLL'. 'rcontroll' includes functions that generate inputs for simulations and run simulations. Finally, it is possible to analyse the 'TROLL' outputs through tables, figures, and maps taking advantage of other R visualisation packages. 'rcontroll' also offers the possibility to generate a virtual LiDAR point cloud that corresponds to a snapshot of the simulated forest.
Maintained by Sylvain Schmitt. Last updated 6 months ago.
9.1 match 5 stars 5.76 score 19 scripts
forestry-labs
Rforestry:Random Forests, Linear Trees, and Gradient Boosting for Inference and Interpretability
Provides fast implementations of Honest Random Forests, Gradient Boosting, and Linear Random Forests, with an emphasis on inference and interpretability. Additionally contains methods for variable importance, out-of-bag prediction, regression monotonicity, and several methods for missing data imputation.
Maintained by Theo Saarinen. Last updated 3 days ago.
9.4 match 5.57 score 82 scripts 1 dependents
azvoleff
gfcanalysis:Tools for Working with Hansen et al. Global Forest Change Dataset
Supports analyses using the Global Forest Change dataset released by Hansen et al. gfcanalysis was originally written for the Tropical Ecology Assessment and Monitoring (TEAM) Network. For additional details on the Global Forest Change dataset, see: Hansen, M. et al. 2013. "High-Resolution Global Maps of 21st-Century Forest Cover Change." Science 342 (15 November): 850-53. The forest change data and more information on the product are available at <http://earthenginepartners.appspot.com>.
Maintained by Matthew Cooper. Last updated 1 year ago.
10.6 match 17 stars 4.93 score 33 scripts
finleya
spBayes:Univariate and Multivariate Spatial-Temporal Modeling
Fits univariate and multivariate spatio-temporal random effects models for point-referenced data using Markov chain Monte Carlo (MCMC). Details are given in Finley, Banerjee, and Gelfand (2015) <doi:10.18637/jss.v063.i13> and Finley and Banerjee <doi:10.1016/j.envsoft.2019.104608>.
Maintained by Andrew Finley. Last updated 6 months ago.
10.9 match 1 stars 4.69 score 231 scripts 7 dependents
ropensci
MtreeRing:A Shiny Application for Automatic Measurements of Tree-Ring Widths on Digital Images
Use morphological image processing and edge detection algorithms to automatically measure tree ring widths on digital images. Users can also manually mark tree rings on species with complex anatomical structures. The arcs of inner-rings and angles of successive inclined ring boundaries are used to correct ring-width series. The package provides a Shiny-based application, allowing R beginners to easily analyze tree ring images and export ring-width series in standard file formats.
Maintained by Jingning Shi. Last updated 8 months ago.
dendrochronologyforestforestryshiny-appsshinyapptree-ring-widthtree-rings
11.0 match 33 stars 4.66 score 14 scripts
softwaredeng
RRF:Regularized Random Forest
Feature Selection with Regularized Random Forest. This package is based on the 'randomForest' package by Andy Liaw. The key difference is the RRF() function that builds a regularized random forest. Fortran original by Leo Breiman and Adele Cutler, R port by Andy Liaw and Matthew Wiener, Regularized random forest for classification by Houtao Deng, Regularized random forest for regression by Xin Guan. Reference: Houtao Deng (2013) <doi:10.48550/arXiv.1306.0237>.
Maintained by Houtao Deng. Last updated 4 months ago.
13.4 match 3.81 score 118 scripts 3 dependents
zongzheng
forestSAS:Forest Spatial Structure Analysis Systems
Recent years have seen significant interest in neighborhood-based structural parameters that effectively represent the spatial characteristics of tree populations and forest communities, and possess strong applicability for guiding forestry practices. This package provides valuable information that enhances our understanding and analysis of the fine-scale spatial structure of tree populations and forest stands. Reference: Yan L, Tan W, Chai Z, et al (2019) <doi:10.13323/j.cnki.j.fafu(nat.sci.).2019.03.007>.
Maintained by Zongzheng Chai. Last updated 4 months ago.
35.5 match 1.38 score 24 scripts
ips-lmu
wrassp:Interface to the 'ASSP' Library
A wrapper around Michel Scheffers's 'libassp' (<https://libassp.sourceforge.net/>). The 'libassp' (Advanced Speech Signal Processor) library aims at providing functionality for handling speech signal files in most common audio formats and for performing analyses common in phonetic science/speech science. This includes the calculation of formants, fundamental frequency, root mean square, auto correlation, a variety of spectral analyses, zero crossing rate, filtering etc. This wrapper provides R with a large subset of 'libassp's signal processing functions and provides them to the user in a (hopefully) user-friendly manner.
Maintained by Markus Jochim. Last updated 1 year ago.
6.6 match 24 stars 7.43 score 62 scripts 3 dependents
chguiterman
dfoliatR:Detection and Analysis of Insect Defoliation Signals in Tree Rings
Tools to identify, quantify, analyze, and visualize growth suppression events in tree rings that are often produced by insect defoliation. Described in Guiterman et al. (2020) <doi:10.1016/j.dendro.2020.125750>.
Maintained by Chris Guiterman. Last updated 2 years ago.
budwormdefoliatorsdendrochronologydendroecologydisturbanceforestsinsectsoutbreaktree-rings
10.0 match 7 stars 4.89 score 22 scripts
gisma
uavRmp:UAV Mission Planner
The Unmanned Aerial Vehicle Mission Planner provides an easy to use work flow for planning autonomous obstacle avoiding surveys of ready to fly unmanned aerial vehicles to retrieve aerial or spot related data. It creates either intermediate flight control files for the DJI-Litchi supported series or ready to upload control files for the pixhawk-based flight controller as used in the 3DR-Solo or Yuneec series. Additionally it contains some useful tools for digitizing and data manipulation.
Maintained by Chris Reudenbach. Last updated 9 months ago.
cultural-heritagedjidroneflight-planningforest-mappinglitchilow-budget-uavmission-planningphotogrammetrypixhawkpixhawk-controllerqgroundcontrol2litchisolosurveyterrain-followingterrain-mappinguavsyuneec
7.5 match 25 stars 6.48 score 6 scripts
schaffman5
rtf:Rich Text Format (RTF) Output
A set of R functions to output Rich Text Format (RTF) files with high resolution tables and graphics that may be edited with a standard word processor such as Microsoft Word.
Maintained by Michael E. Schaffer. Last updated 6 years ago.
5.7 match 5 stars 8.55 score 169 scripts 10 dependents
bioc
maftools:Summarize, Analyze and Visualize MAF Files
Analyze and visualize Mutation Annotation Format (MAF) files from large scale sequencing studies. This package provides various functions to perform most commonly used analyses in cancer genomics and to create feature rich customizable visualizations with minimal effort.
Maintained by Anand Mayakonda. Last updated 5 months ago.
datarepresentationdnaseqvisualizationdrivermutationvariantannotationfeatureextractionclassificationsomaticmutationsequencingfunctionalgenomicssurvivalbioinformaticscancer-genome-atlascancer-genomicsgenomicsmaf-filestcgacurlbzip2xz-utilszlib
3.3 match 459 stars 14.63 score 948 scripts 18 dependents
ericarcher
banter:BioAcoustic eveNT classifiER
Create a hierarchical acoustic event species classifier out of multiple call type detectors as described in Rankin et al. (2017) <doi:10.1111/mms.12381>.
Maintained by Eric Archer. Last updated 1 year ago.
acousticsbioacousticscetaceansclassificationdolphinsmachine-learningnoaarandom-forestspecies-identificationsupervised-learningsupervised-machine-learningwhalesjagscpp
11.2 match 9 stars 4.22 score 37 scripts
insightsengineering
tern:Create Common TLGs Used in Clinical Trials
Table, Listings, and Graphs (TLG) library for common outputs used in clinical trials.
Maintained by Joe Zhu. Last updated 2 months ago.
clinical-trialsgraphslistingsnestoutputstables
3.7 match 79 stars 12.62 score 186 scripts 9 dependents
annechao
MF.beta4:Measuring Ecosystem Multi-Functionality and Its Decomposition
Provide simple functions to (i) compute a class of multi-functionality measures for a single ecosystem for given function weights, (ii) decompose gamma multi-functionality for pairs of ecosystems and K ecosystems (K can be greater than 2) into a within-ecosystem component (alpha multi-functionality) and an among-ecosystem component (beta multi-functionality). In each case, the correlation between functions can be corrected for. Based on biodiversity and ecosystem function data, this software also facilitates graphics for assessing biodiversity-ecosystem functioning relationships across scales.
Maintained by Anne Chao. Last updated 3 months ago.
10.5 match 4.40 score 3 scripts
cran
forestmodel:Forest Plots from Regression Models
Produces forest plots using 'ggplot2' from models produced by functions such as stats::lm(), stats::glm() and survival::coxph().
Maintained by Nick Kennedy. Last updated 5 years ago.
12.9 match 1 stars 3.58 score 5 dependents
mayer79
outForest:Multivariate Outlier Detection and Replacement
Provides a random forest based implementation of the method described in Chapter 7.1.2 (Regression model based anomaly detection) of Chandola et al. (2009) <doi:10.1145/1541880.1541882>. It works as follows: Each numeric variable is regressed onto all other variables by a random forest. If the scaled absolute difference between the observed value and the out-of-bag prediction of the corresponding random forest is suspiciously large, then the value is considered an outlier. The package offers different options to replace such outliers, e.g. by realistic values found via predictive mean matching. Once the method is trained on reference data, it can be applied to new data.
Maintained by Michael Mayer. Last updated 8 months ago.
machine-learningoutlieroutlier-analysisoutlier-detectionrandom-forest
8.4 match 13 stars 5.39 score 19 scripts
luyouepiusf
SurvivalClusteringTree:Clustering Analysis Using Survival Tree and Forest Algorithms
An outcome-guided algorithm is developed to identify clusters of samples with similar characteristics and survival rate. The algorithm first builds a random forest and then defines distances between samples based on the fitted random forest. Given the distances, we can apply hierarchical clustering algorithms to define clusters.
Maintained by Lu You. Last updated 2 years ago.
12.2 match 3.70 score 2 scripts
missvalteam
Iscores:Proper Scoring Rules for Missing Value Imputation
Implementation of a KL-based scoring rule to assess the quality of different missing value imputations in the broad sense as introduced in Michel et al. (2021) <arXiv:2106.03742>.
Maintained by Loris Michel. Last updated 2 years ago.
imputation-methodsmachine-learningmissing-valuesrandom-forest
11.5 match 7 stars 3.91 score 23 scripts
scnext
SCGLR:Supervised Component Generalized Linear Regression
An extension of the Fisher Scoring Algorithm to combine PLS regression with GLM estimation in the multivariate context. Covariates can also be grouped in themes.
Maintained by Guillaume Cornu. Last updated 19 days ago.
partial-least-squares-regression
10.3 match 2 stars 4.30 score 67 scripts
predictiveecology
LandR:Landscape Ecosystem Modelling in R
Utilities for 'LandR' suite of landscape simulation models. These models simulate forest vegetation dynamics based on LANDIS-II, and incorporate fire and insect disturbance, as well as other important ecological processes. Models are implemented as 'SpaDES' modules.
Maintained by Eliot J B McIntire. Last updated 3 days ago.
ecological-modellinglandscape-ecosystem-modellingspades
7.3 match 17 stars 6.07 score 12 scripts 4 dependents
parsifal9
RFlocalfdr:Significance Level for Random Forest Impurity Importance Scores
Sets a significance level for Random Forest MDI (Mean Decrease in Impurity, Gini or sum of squares) variable importance scores, using an empirical Bayes approach. See Dunne et al. (2022) <doi:10.1101/2022.04.06.487300>.
Maintained by Robert Dunne. Last updated 1 month ago.
9.3 match 1 stars 4.72 score 13 scripts
gjwgit
rattle:Graphical User Interface for Data Science in R
The R Analytic Tool To Learn Easily (Rattle) provides a collection of utility functions for the data scientist. A Gnome (RGtk2) based graphical interface is included with the aim to provide a simple and intuitive introduction to R for data science, allowing a user to quickly load data from a CSV file (or via ODBC), transform and explore the data, build and evaluate models, and export models as PMML (predictive modelling markup language) or as scores. A key aspect of the GUI is that all R commands are logged and commented through the log tab. This can be saved as a standalone R script file and as an aid for the user to learn R or to copy-and-paste directly into R itself. Note that RGtk2 and cairoDevice have been archived on CRAN. See <https://rattle.togaware.com> for installation instructions.
Maintained by Graham Williams. Last updated 3 years ago.
5.1 match 16 stars 8.48 score 3.0k scripts 3 dependents
r-forge
coin:Conditional Inference Procedures in a Permutation Test Framework
Conditional inference procedures for the general independence problem including two-sample, K-sample (non-parametric ANOVA), correlation, censored, ordered and multivariate problems described in <doi:10.18637/jss.v028.i08>.
Maintained by Torsten Hothorn. Last updated 9 months ago.
3.6 match 11.68 score 1.6k scripts 74 dependents
jaredsmurray
bcf:Causal Inference for a Binary Treatment and Continuous Outcome using Bayesian Causal Forests
Causal inference for a binary treatment and continuous outcome using Bayesian Causal Forests. See Hahn, Murray and Carvalho (2020) <https://projecteuclid.org/journals/bayesian-analysis/volume-15/issue-3/Bayesian-Regression-Tree-Models-for-Causal-Inference--Regularization-Confounding/10.1214/19-BA1195.full> for additional information. This implementation relies on code originally accompanying Pratola et al. (2013) <arXiv:1309.1906>.
Maintained by Jared S. Murray. Last updated 1 year ago.
5.1 match 41 stars 8.12 score 46 scripts
usdaforestservice
FIESTA:Forest Inventory Estimation and Analysis
A research estimation tool for analysts that work with sample-based inventory data from the U.S. Department of Agriculture, Forest Service, Forest Inventory and Analysis (FIA) Program.
Maintained by Grayson White. Last updated 2 days ago.
5.8 match 30 stars 7.24 score 62 scripts
cefet-rj-dal
daltoolbox:Leveraging Experiment Lines to Data Analytics
The natural increase in the complexity of current research experiments and data demands better tools to enhance productivity in Data Analytics. The package is a framework designed to address the modern challenges in data analytics workflows. The package is inspired by Experiment Line concepts. It aims to provide seamless support for users in developing their data mining workflows by offering a uniform data model and method API. It enables the integration of various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction. It also offers options for hyper-parameter tuning and supports integration with existing libraries and languages. Overall, the package provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>.
Maintained by Eduardo Ogasawara. Last updated 1 month ago.
6.3 match 1 stars 6.65 score 536 scripts 4 dependents
cran
ForestElementsR:Data Structures and Functions for Working with Forest Data
Provides generic data structures and algorithms for use with forest mensuration data in a consistent framework. The functions and objects included are a collection of broadly applicable tools. More specialized applications should be implemented in separate packages that build on this foundation. Documentation about 'ForestElementsR' is provided by three vignettes included in this package. For an introduction to the field of forest mensuration, refer to the textbooks by Kershaw et al. (2017) <doi:10.1002/9781118902028>, and van Laar and Akca (2007) <doi:10.1007/978-1-4020-5991-9>.
Maintained by Peter Biber. Last updated 1 month ago.
11.8 match 3.48 score
hugaped
MBNMAdose:Dose-Response MBNMA Models
Fits Bayesian dose-response model-based network meta-analysis (MBNMA) that incorporate multiple doses within an agent by modelling different dose-response functions, as described by Mawdsley et al. (2016) <doi:10.1002/psp4.12091>. By modelling dose-response relationships this can connect networks of evidence that might otherwise be disconnected, and can improve precision on treatment estimates. Several common dose-response functions are provided; others may be added by the user. Various characteristics and assumptions can be flexibly added to the models, such as shared class effects. The consistency of direct and indirect evidence in the network can be assessed using unrelated mean effects models and/or by node-splitting at the treatment level.
Maintained by Hugo Pedder. Last updated 1 month ago.
6.1 match 10 stars 6.60 score
robingenuer
VSURF:Variable Selection Using Random Forests
Three steps variable selection procedure based on random forests. Initially developed to handle high dimensional data (for which number of variables largely exceeds number of observations), the package is very versatile and can treat most dimensions of data, for regression and supervised classification problems. First step is dedicated to eliminate irrelevant variables from the dataset. Second step aims to select all variables related to the response for interpretation purpose. Third step refines the selection by eliminating redundancy in the set of variables selected by the second step, for prediction purpose. Genuer, R. Poggi, J.-M. and Tuleau-Malot, C. (2015) <https://journal.r-project.org/archive/2015-2/genuer-poggi-tuleaumalot.pdf>.
Maintained by Robin Genuer. Last updated 8 months ago.
5.3 match 36 stars 7.49 score 192 scripts 1 dependents
jmm34
abcrf:Approximate Bayesian Computation via Random Forests
Performs Approximate Bayesian Computation (ABC) model choice and parameter inference via random forests. Pudlo P., Marin J.-M., Estoup A., Cornuet J.-M., Gautier M. and Robert C. P. (2016) <doi:10.1093/bioinformatics/btv684>. Estoup A., Raynal L., Verdu P. and Marin J.-M. <http://journal-sfds.fr/article/view/709>. Raynal L., Marin J.-M., Pudlo P., Ribatet M., Robert C. P. and Estoup A. (2019) <doi:10.1093/bioinformatics/bty867>.
Maintained by Jean-Michel Marin. Last updated 2 years ago.
8.4 match 2 stars 4.69 score 74 scripts
kjakobse
EpiForsk:Code Sharing at the Department of Epidemiological Research at Statens Serum Institut
This is a collection of assorted functions and examples collected from various projects. Currently we have functionalities for simplifying overlapping time intervals, Charlson comorbidity score constructors for Danish data, getting frequency for multiple variables, getting standardized output from logistic and log-linear regressions, sibling design linear regression functionalities, a method for calculating the confidence intervals for functions of parameters from a GLM, Bayes equivalent for hypothesis testing with asymptotic Bayes factor, and several help functions for generalized random forest analysis using 'grf'.
Maintained by Kim Daniel Jakobsen. Last updated 1 year ago.
8.8 match 4.48 score 8 scripts
schlosslab
mikropml:User-Friendly R Package for Supervised Machine Learning Pipelines
An interface to build machine learning models for classification and regression problems. 'mikropml' implements the ML pipeline described by Topçuoğlu et al. (2020) <doi:10.1128/mBio.00434-20> with reasonable default options for data preprocessing, hyperparameter tuning, cross-validation, testing, model evaluation, and interpretation steps. See the website <https://www.schlosslab.org/mikropml/> for more information, documentation, and examples.
Maintained by Kelly Sovacool. Last updated 2 years ago.
5.0 match 56 stars 7.83 score 86 scripts
kapsner
mlsurvlrnrs:R6-Based ML Survival Learners for 'mlexperiments'
Enhances 'mlexperiments' <https://CRAN.R-project.org/package=mlexperiments> with additional machine learning ('ML') learners for survival analysis. The package provides R6-based survival learners for the following algorithms: 'glmnet' <https://CRAN.R-project.org/package=glmnet>, 'ranger' <https://CRAN.R-project.org/package=ranger>, 'xgboost' <https://CRAN.R-project.org/package=xgboost>, and 'rpart' <https://CRAN.R-project.org/package=rpart>. These can be used directly with the 'mlexperiments' R package.
Maintained by Lorenz A. Kapsner. Last updated 10 days ago.
algorithmscox-regressionexperimentsglmnetlearnersmachine-learningrandom-survival-forestssurvivalsurvival-support-vector-machinexgboost
6.7 match 4 stars 5.86 score 12 scripts
openintrostat
openintro:Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs
Supplemental functions and data for 'OpenIntro' resources, which includes open-source textbooks and resources for introductory statistics (<https://www.openintro.org/>). The package contains datasets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of Windows operating system.
Maintained by Mine Çetinkaya-Rundel. Last updated 2 months ago.
3.4 match 240 stars 11.39 score 6.0k scripts
simonyansenzhao
wsrf:Weighted Subspace Random Forest for Classification
A parallel implementation of Weighted Subspace Random Forest. The Weighted Subspace Random Forest algorithm was proposed in the International Journal of Data Warehousing and Mining by Baoxun Xu, Joshua Zhexue Huang, Graham Williams, Qiang Wang, and Yunming Ye (2012) <DOI:10.4018/jdwm.2012040103>. The algorithm can classify very high-dimensional data with random forests built using small subspaces. A novel variable weighting method is used for variable subspace selection in place of the traditional random variable sampling. This new approach is particularly useful in building models from high-dimensional data.
Maintained by He Zhao. Last updated 2 years ago.
7.9 match 14 stars 4.89 score 11 scripts
opengeos
whitebox:'WhiteboxTools' R Frontend
An R frontend for the 'WhiteboxTools' library, which is an advanced geospatial data analysis platform developed by Prof. John Lindsay at the University of Guelph's Geomorphometry and Hydrogeomatics Research Group. 'WhiteboxTools' can be used to perform common geographical information systems (GIS) analysis operations, such as cost-distance analysis, distance buffering, and raster reclassification. Remote sensing and image processing tasks include image enhancement (e.g. panchromatic sharpening, contrast adjustments), image mosaicing, numerous filtering operations, simple classification (k-means), and common image transformations. 'WhiteboxTools' also contains advanced tooling for spatial hydrological analysis (e.g. flow-accumulation, watershed delineation, stream network analysis, sink removal), terrain analysis (e.g. common terrain indices such as slope, curvatures, wetness index, hillshading; hypsometric analysis; multi-scale topographic position analysis), and LiDAR data processing. Suggested citation: Lindsay (2016) <doi:10.1016/j.cageo.2016.07.003>.
Maintained by Andrew Brown. Last updated 5 months ago.
geomorphometrygeoprocessinggeospatialgishydrologyremote-sensingrstudio
4.0 match 173 stars 9.65 score 203 scripts 2 dependents
ndphillips
FFTrees:Generate, Visualise, and Evaluate Fast-and-Frugal Decision Trees
Create, visualize, and test fast-and-frugal decision trees (FFTs) using the algorithms and methods described by Phillips, Neth, Woike & Gaissmaier (2017), <doi:10.1017/S1930297500006239>. FFTs are simple and transparent decision trees for solving binary classification problems. FFTs can be preferable to more complex algorithms because they require very little information, are easy to understand and communicate, and are robust against overfitting.
Maintained by Hansjoerg Neth. Last updated 5 months ago.
4.0 match 135 stars 9.58 score 144 scripts
gertvv
gemtc:Network Meta-Analysis Using Bayesian Methods
Network meta-analyses (mixed treatment comparisons) in the Bayesian framework using JAGS. Includes methods to assess heterogeneity and inconsistency, and a number of standard visualizations.
Maintained by Gert van Valkenhoef. Last updated 5 years ago.
5.1 match 44 stars 7.48 score 71 scripts 1 dependents
grunwaldlab
poppr:Genetic Analysis of Populations with Mixed Reproduction
Population genetic analyses for hierarchical analysis of partially clonal populations built upon the architecture of the 'adegenet' package. Originally described in Kamvar, Tabima, and Grünwald (2014) <doi:10.7717/peerj.281> with version 2.0 described in Kamvar, Brooks, and Grünwald (2015) <doi:10.3389/fgene.2015.00208>.
Maintained by Zhian N. Kamvar. Last updated 10 months ago.
clonalitygenetic-analysisgenetic-distancesminimum-spanning-networksmultilocus-genotypesmultilocus-lineagespopulation-geneticspopulationsopenmp
3.5 match 69 stars 10.84 score 672 scripts
rolkra
explore:Simplifies Exploratory Data Analysis
Interactive data exploration with one line of code, automated reporting or use an easy to remember set of tidy functions for low code exploratory data analysis.
Maintained by Roland Krasser. Last updated 3 months ago.
data-explorationdata-visualisationdecision-treesedarmarkdownshinytidy
3.3 match 228 stars 11.43 score 221 scripts 1 dependents
biometry
bipartite:Visualising Bipartite Networks and Calculating Some (Ecological) Indices
Functions to visualise webs and calculate a series of indices commonly used to describe pattern in (ecological) webs. It focuses on webs consisting of only two levels (bipartite), e.g. pollination webs or predator-prey-webs. Visualisation is important to get an idea of what we are actually looking at, while the indices summarise different aspects of the web's topology.
Maintained by Carsten F. Dormann. Last updated 5 days ago.
3.4 match 37 stars 10.93 score 592 scripts 15 dependents
aleksandarsekulic
meteo:RFSI & STRK Interpolation for Meteo and Environmental Variables
Random Forest Spatial Interpolation (RFSI, Sekulić et al. (2020) <doi:10.3390/rs12101687>) and spatio-temporal geostatistical (spatio-temporal regression Kriging (STRK)) interpolation for meteorological (Kilibarda et al. (2014) <doi:10.1002/2013JD020803>, Sekulić et al. (2020) <doi:10.1007/s00704-019-03077-3>) and other environmental variables. Contains global spatio-temporal models calculated using publicly available data.
Maintained by Aleksandar Sekulić. Last updated 5 months ago.
7.4 match 18 stars 5.06 score 64 scripts
cliffordlai
bestglm:Best Subset GLM and Regression Utilities
Best subset glm using information criteria or cross-validation, carried out using the 'leaps' algorithm (Furnival and Wilson, 1974) <doi:10.2307/1267601> or complete enumeration (Morgan and Tatar, 1972) <doi:10.1080/00401706.1972.10488918>. Implements PCR and PLS using AIC/BIC. Implements one-standard deviation rule for use with the 'caret' package.
Maintained by Yuanhao Lai. Last updated 5 years ago.
7.0 match 5.29 score 418 scripts 5 dependents
insightsengineering
teal.modules.clinical:'teal' Modules for Standard Clinical Outputs
Provides user-friendly tools for creating and customizing clinical trial reports. By leveraging the 'teal' framework, this package provides 'teal' modules to easily create an interactive panel that allows for seamless adjustments to data presentation, thereby streamlining the creation of detailed and accurate reports.
Maintained by Dawid Kaledkowski. Last updated 15 days ago.
clinical-trialsmodulesnestoutputsshiny
3.6 match 34 stars 10.25 score 149 scripts
bioc
CMA:Synthesis of microarray-based classification
This package provides a comprehensive collection of various microarray-based classification algorithms both from Machine Learning and Statistics. Variable Selection, Hyperparameter tuning, Evaluation and Comparison can be performed combined or stepwise in a user-friendly environment.
Maintained by Roman Hornung. Last updated 5 months ago.
7.3 match 5.09 score 61 scripts
tidymodels
parsnip:A Common API to Modeling and Analysis Functions
A common interface is provided to allow users to specify a model without having to remember the different argument names across different functions or computational engines (e.g. 'R', 'Spark', 'Stan', 'H2O', etc).
Maintained by Max Kuhn. Last updated 3 days ago.
2.3 match 612 stars 16.37 score 3.4k scripts 69 dependents
valentint
rrcov:Scalable Robust Estimators with High Breakdown Point
Robust Location and Scatter Estimation and Robust Multivariate Analysis with High Breakdown Point: principal component analysis (Filzmoser and Todorov (2013), <doi:10.1016/j.ins.2012.10.017>), linear and quadratic discriminant analysis (Todorov and Pires (2007)), multivariate tests (Todorov and Filzmoser (2010) <doi:10.1016/j.csda.2009.08.015>), outlier detection (Todorov et al. (2010) <doi:10.1007/s11634-010-0075-2>). See also Todorov and Filzmoser (2009) <urn:isbn:978-3838108148>, Todorov and Filzmoser (2010) <doi:10.18637/jss.v032.i03> and Boudt et al. (2019) <doi:10.1007/s11222-019-09869-x>.
Maintained by Valentin Todorov. Last updated 7 months ago.
3.5 match 2 stars 10.51 score 484 scripts 96 dependents
inbo
forrescalc:Calculation of Aggregated Values on Dendrometry, Regeneration and Vegetation of Forests, Starting from Individual Tree Measures from Fieldmap
A collection of functions to load and aggregate measurements related to dendrometry, rejuvenation and vegetation, and to access plot level results from Flemish forest reserves in data package forresdat.
Maintained by Els Lommelen. Last updated 6 months ago.
9.7 match 3.79 score 123 scripts
fbartos
RoBMA:Robust Bayesian Meta-Analyses
A framework for estimating ensembles of meta-analytic and meta-regression models (assuming either presence or absence of the effect, heterogeneity, publication bias, and moderators). The RoBMA framework uses Bayesian model-averaging to combine the competing meta-analytic models into a model ensemble, weights the posterior parameter distributions based on posterior model probabilities and uses Bayes factors to test for the presence or absence of the individual components (e.g., effect vs. no effect; Bartoš et al., 2022, <doi:10.1002/jrsm.1594>; Maier, Bartoš & Wagenmakers, 2022, <doi:10.1037/met0000405>). Users can define a wide range of prior distributions for the effect size, heterogeneity, publication bias (including selection models and PET-PEESE), and moderator components. The package provides convenient functions for summary, visualizations, and fit diagnostics.
Maintained by František Bartoš. Last updated 1 month ago.
meta-analysismodel-averagingpublication-biasjagsopenblascpp
5.2 match 9 stars 6.97 score 53 scripts
andykrause
hpiR:House Price Indexes
Compute house price indexes and series using a variety of different methods and models common through the real estate literature. Evaluate index 'goodness' based on accuracy, volatility and revision statistics. Background on basic model construction for repeat sales models can be found at: Case and Quigley (1991) <https://ideas.repec.org/a/tpr/restat/v73y1991i1p50-58.html> and for hedonic pricing models at: Bourassa et al (2006) <doi:10.1016/j.jhe.2006.03.001>. The package author's working paper on the random forest approach to house price indexes can be found at: <https://www.github.com/andykrause/hpi_research>.
Maintained by Andy Krause. Last updated 1 year ago.
7.5 match 15 stars 4.82 score 88 scripts
christianroever
bayesmeta:Bayesian Random-Effects Meta-Analysis and Meta-Regression
A collection of functions for deriving the posterior distribution of the model parameters in random-effects meta-analysis or meta-regression, and providing functionality to evaluate joint and marginal posterior probability distributions, predictive distributions, shrinkage effects, posterior predictive p-values, etc. For more details, see also Roever C (2020) <doi:10.18637/jss.v093.i06>, or Roever C and Friede T (2022) <doi:10.1016/j.cmpb.2022.107303>.
Maintained by Christian Roever. Last updated 1 year ago.
6.6 match 3 stars 5.40 score 73 scripts 1 dependents
cran
reportRmd:Tidy Presentation of Clinical Reporting
Streamlined statistical reporting in 'Rmarkdown' environments. Facilitates the automated reporting of descriptive statistics, multiple univariate models, multivariable models and tables combining these outputs. Plotting functions include customisable survival curves, forest plots from logistic and ordinal regression and bivariate comparison plots.
Maintained by Lisa Avery. Last updated 2 months ago.
10.3 match 3.45 score 19 scripts 1 dependents
yihui
knitr:A General-Purpose Package for Dynamic Report Generation in R
Provides a general-purpose tool for dynamic report generation in R using Literate Programming techniques.
Maintained by Yihui Xie. Last updated 12 hours ago.
dynamic-documents knitr literate-programming rmarkdown sweave
1.5 match 2.4k stars 23.62 score 116k scripts 4.2k dependents
nelson-gon
manymodelr:Build and Tune Several Models
Frequently one needs a convenient way to build and tune several models in one go. The goal is to provide a number of machine learning convenience functions. It provides the ability to build, tune and obtain predictions of several models in one function. The models are built using functions from 'caret' with easier to read syntax. Kuhn (2014) <arXiv:1405.6974>.
Maintained by Nelson Gonzabato. Last updated 3 years ago.
analysis-of-variance anova correlation correlation-coefficient generalized-linear-models gradient-boosting-decision-trees knn-classification linear-models linear-regression machine-learning missing-values models r-programming random-forest-algorithm regression-models
6.7 match 2 stars 5.30 score 50 scripts
jenniniku
gllvm:Generalized Linear Latent Variable Models
Analysis of multivariate data using generalized linear latent variable models (gllvm). Estimation is performed using either the Laplace method, variational approximations, or extended variational approximations, implemented via TMB (Kristensen et al. (2016), <doi:10.18637/jss.v070.i05>).
Maintained by Jenni Niku. Last updated 17 hours ago.
3.3 match 51 stars 10.52 score 176 scripts 1 dependents
paulesantos
perutimber:Catalogue of the Timber Forest Species of the Peruvian Amazon
Access the data of the 'Catalogue of the Timber Forest Species of the Peruvian Amazon' Vásquez Martínez, R., & Rojas Gonzáles, R.D.P. (2022) <doi:10.21704/rfp.v37i3.1956>.
Maintained by Paul E. Santos Andrade. Last updated 3 months ago.
11.6 match 3.00 score 5 scripts
zackfisher
robumeta:Robust Variance Meta-Regression
Functions for conducting robust variance estimation (RVE) meta-regression using both large and small sample RVE estimators under various weighting schemes. These methods are distribution free and provide valid point estimates, standard errors and hypothesis tests even when the degree and structure of dependence between effect sizes is unknown. Also included are functions for conducting sensitivity analyses under correlated effects weighting and producing RVE-based forest plots.
Maintained by Zachary Fisher. Last updated 4 years ago.
4.5 match 8 stars 7.75 score 178 scripts 4 dependents
cran
CALIBERrfimpute:Multiple Imputation Using MICE and Random Forest
Functions to impute using random forest under full conditional specifications (multivariate imputation by chained equations). The methods are described in Shah and others (2014) <doi:10.1093/aje/kwt312>.
Maintained by Anoop Shah. Last updated 2 years ago.
13.1 match 2 stars 2.60 score
bcjaeger
obliqueRSF:Oblique Random Forests for Right-Censored Time-to-Event Data
Oblique random survival forests incorporate linear combinations of input variables into random survival forests (Ishwaran, 2008 <DOI:10.1214/08-AOAS169>). Regularized Cox proportional hazard models (Simon, 2016 <DOI:10.18637/jss.v039.i05>) are used to identify optimal linear combinations of input variables.
Maintained by Byron Jaeger. Last updated 3 years ago.
17.4 match 1.93 score 17 scripts
gavinsimpson
analogue:Analogue and Weighted Averaging Methods for Palaeoecology
Fits Modern Analogue Technique and Weighted Averaging transfer function models for prediction of environmental data from species data, and related methods used in palaeoecology.
Maintained by Gavin L. Simpson. Last updated 6 months ago.
3.8 match 14 stars 8.96 score 185 scripts 4 dependents
neilstats
ckbplotr:Create CKB Plots
ckbplotr provides functions to help create and style plots in R. It is being developed by, and primarily for, China Kadoorie Biobank researchers.
Maintained by Neil Wright. Last updated 2 months ago.
5.6 match 10 stars 5.87 score 37 scripts
amices
mice:Multivariate Imputation by Chained Equations
Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
Maintained by Stef van Buuren. Last updated 5 days ago.
chained-equations fcs imputation mice missing-data missing-values multiple-imputation multivariate-data cpp
2.0 match 462 stars 16.50 score 10k scripts 154 dependents
ddebeer
permimp:Conditional Permutation Importance
An add-on to the 'party' package, with a faster implementation of the partial-conditional permutation importance for random forests. The standard permutation importance is implemented exactly the same as in the 'party' package. The conditional permutation importance can be computed faster, with an option to be backward compatible to the 'party' implementation. The package is compatible with random forests fit using the 'party' and the 'randomForest' package. The methods are described in Strobl et al. (2007) <doi:10.1186/1471-2105-8-25> and Debeer and Strobl (2020) <doi:10.1186/s12859-020-03622-2>.
Maintained by Dries Debeer. Last updated 2 years ago.
5.6 match 4 stars 5.85 score 39 scripts 1 dependents
r-lidar
lidR:Airborne LiDAR Data Manipulation and Visualization for Forestry Applications
Airborne LiDAR (Light Detection and Ranging) interface for data manipulation and visualization. Read/write 'las' and 'laz' files, computation of metrics in area based approach, point filtering, artificial point reduction, classification from geographic data, normalization, individual tree segmentation and other manipulations.
Maintained by Jean-Romain Roussel. Last updated 1 month ago.
als forestry las laz lidar point-cloud remote-sensing openblas cpp openmp
2.3 match 623 stars 14.47 score 844 scripts 8 dependents
ricgbl
etree:Classification and Regression with Structured and Mixed-Type Data
Implementation of Energy Trees, a statistical model to perform classification and regression with structured and mixed-type data. The model has a similar structure to Conditional Trees, but brings in Energy Statistics to test independence between variables that are possibly structured and of different nature. Currently, the package covers functions and graphs as structured covariates. It builds upon 'partykit' to provide functionalities for fitting, printing, plotting, and predicting with Energy Trees. Energy Trees are described in Giubilei et al. (2022) <arXiv:2207.04430>.
Maintained by Riccardo Giubilei. Last updated 3 years ago.
7.1 match 3 stars 4.52 score 11 scripts
ropensci
git2rdata:Store and Retrieve Data.frames in a Git Repository
The git2rdata package is an R package for writing and reading dataframes as plain text files. A metadata file stores important information. 1) Storing metadata makes it possible to maintain the classes of variables. By default, git2rdata optimizes the data for file storage. The optimization is most effective on data containing factors. The optimization makes the data less human readable. The user can turn this off when they prefer a human readable format over smaller files. Details on the implementation are available in vignette("plain_text", package = "git2rdata"). 2) Storing metadata also allows smaller row-based diffs between two consecutive commits. This is a useful feature when storing data as plain text files under version control. Details on this part of the implementation are available in vignette("version_control", package = "git2rdata"). Although we envisioned git2rdata with a git workflow in mind, you can use it in combination with other version control systems like subversion or mercurial. 3) git2rdata is a useful tool in a reproducible and traceable workflow. vignette("workflow", package = "git2rdata") gives a toy example. 4) vignette("efficiency", package = "git2rdata") provides some insight into the efficiency of file storage, git repository size and speed for writing and reading.
Maintained by Thierry Onkelinx. Last updated 2 months ago.
reproducible-research version-control
3.1 match 99 stars 10.03 score 216 scripts 4 dependents
riccardo-df
ocf:Ordered Correlation Forest
Machine learning estimator specifically optimized for predictive modeling of ordered non-numeric outcomes. 'ocf' provides forest-based estimation of the conditional choice probabilities and the covariates’ marginal effects. Under an "honesty" condition, the estimates are consistent and asymptotically normal and standard errors can be obtained by leveraging the weight-based representation of the random forest predictions. Please reference the use as Di Francesco (2025) <doi:10.1080/07474938.2024.2429596>.
Maintained by Riccardo Di Francesco. Last updated 15 days ago.
7.9 match 3.95 score 5 scripts 1 dependents
biogenies
CancerGram:Prediction of Anticancer Peptides
Predicts anticancer peptides using random forests trained on the n-gram encoded peptides. The implemented algorithm can be accessed from both the command line and shiny-based GUI. The CancerGram model is too large for CRAN and it has to be downloaded separately from the repository: <https://github.com/BioGenies/CancerGramModel>. For more information see: Burdukiewicz et al. (2020) <doi:10.3390/pharmaceutics12111045>.
Maintained by Michal Burdukiewicz. Last updated 4 years ago.
anticancer-peptides bioinformatics k-mer n-gram peptide-identification random-forests
8.0 match 4 stars 3.90 score 3 scripts
zongzheng
forestHES:Forest Health Evaluation System at the Forest Stand Level
Assessing forest ecosystem health is an effective way to support forest resource management. The national forest health evaluation system at the forest stand level, which uses the analytic hierarchy process, has a high application value and practical significance. The package can effectively and easily carry out the full assessment process, and help foresters to further assess and manage forest resources.
Maintained by Zongzheng Chai. Last updated 5 months ago.
27.9 match 1 stars 1.11 score 13 scripts
trotsiuk
r3PG:Simulating Forest Growth using the 3-PG Model
Provides a flexible and easy-to-use interface for the Physiological Processes Predicting Growth (3-PG) model written in Fortran. The r3PG serves as a flexible and easy-to-use interface for the 3-PGpjs (monospecific, even-aged and evergreen forests) described in Landsberg & Waring (1997) <doi:10.1016/S0378-1127(97)00026-1> and the 3-PGmix (deciduous, uneven-aged or mixed-species forests) described in Forrester & Tang (2016) <doi:10.1016/j.ecolmodel.2015.07.010>.
Maintained by Volodymyr Trotsiuk. Last updated 10 months ago.
5.3 match 27 stars 5.83 score 25 scripts
myles-lewis
nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'
Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.
Maintained by Myles Lewis. Last updated 4 days ago.
3.9 match 12 stars 7.92 score 46 scripts
jedalong
stampr:Spatial Temporal Analysis of Moving Polygons
Perform spatial temporal analysis of moving polygons; a longstanding analysis problem in Geographic Information Systems. Facilitates directional analysis, distance analysis, and some other simple functionality for examining spatial-temporal patterns of moving polygons.
Maintained by Jed Long. Last updated 12 months ago.
8.0 match 4 stars 3.81 score 16 scripts
rupppy
PiC:Pointcloud Interactive Computation for Forest Structure Analysis
Provides advanced algorithms for analyzing pointcloud data in forestry applications. Key features include fast voxelization of large datasets; segmentation of point clouds into forest floor, understorey, canopy, and wood components. The package enables efficient processing of large-scale forest pointcloud data, offering insights into forest structure, connectivity, and fire risk assessment. Algorithms to analyze pointcloud data (.xyz input file). For more details, see Ferrara & Arrizza (2025) <https://hdl.handle.net/20.500.14243/533471>. For single tree segmentation details, see Ferrara et al. (2018) <doi:10.1016/j.agrformet.2018.04.008>.
Maintained by Roberto Ferrara. Last updated 26 days ago.
7.8 match 3.88 score 19 scripts
bioc
dreamlet:Scalable differential expression analysis of single cell transcriptomics datasets with complex study designs
Recent advances in single cell/nucleus transcriptomic technology have enabled collection of cohort-scale datasets to study cell type specific gene expression differences associated with disease state, stimulus, and genetic regulation. The scale of these data, complex study designs, and low read count per cell mean that characterizing cell type specific molecular mechanisms requires a user-friendly, purpose-built analytical framework. We have developed the dreamlet package that applies a pseudobulk approach and fits a regression model for each gene and cell cluster to test differential expression across individuals associated with a trait of interest. Use of precision-weighted linear mixed models enables accounting for repeated measures study designs, high dimensional batch effects, and varying sequencing depth or observed cells per biosample.
Maintained by Gabriel Hoffman. Last updated 5 months ago.
rnaseq geneexpression differentialexpression batcheffect qualitycontrol regression genesetenrichment generegulation epigenetics functionalgenomics transcriptomics normalization singlecell preprocessing sequencing immunooncology software cpp
3.8 match 12 stars 8.09 score 128 scripts
stephematician
literanger:Random Forests for Multiple Imputation Based on 'ranger'
An updated implementation of R package 'ranger' by Wright et al. (2017) <doi:10.18637/jss.v077.i01> for training and predicting from random forests, particularly suited to high-dimensional data, and for embedding in 'Multiple Imputation by Chained Equations' (MICE) by van Buuren (2007) <doi:10.1177/0962280206074463>. Ensembles of classification and regression trees are currently supported. Sparse data of class 'dgCMatrix' (R package 'Matrix') can be directly analyzed. Conventional bagged predictions are available alongside an efficient prediction for MICE via the algorithm proposed by Doove et al (2014) <doi:10.1016/j.csda.2013.10.025>. Survival and probability forests are not supported in the update, nor is data of class 'gwaa.data' (R package 'GenABEL'); use the original 'ranger' package for these analyses.
Maintained by Stephen Wade. Last updated 6 months ago.
9.3 match 3.26 score 2 scripts
krisrs1128
multimedia:Multimodal Mediation Analysis
Multimodal mediation analysis is an emerging problem in microbiome data analysis. Multimedia makes advanced mediation analysis techniques easy to use, ensuring that all statistical components are transparent and adaptable to specific problem contexts. The package provides a uniform interface to direct and indirect effect estimation, synthetic null hypothesis testing, bootstrap confidence interval construction, and sensitivity analysis. More details are available in Jiang et al. (2024) "multimedia: Multimodal Mediation Analysis of Microbiome Data" <doi:10.1101/2024.03.27.587024>.
Maintained by Kris Sankaran. Last updated 29 days ago.
coverage microbiome regression sequencing software statisticalmethod structuralequationmodels causal-inference data-integration mediation-analysis
5.4 match 1 stars 5.56 score 13 scripts
ldbk
m2b:Movement to Behaviour Inference using Random Forest
Prediction of behaviour from movement characteristics using observation and random forest for the analyses of movement data in ecology. From movement information (speed, bearing...) the model predicts the observed behaviour (movement, foraging...) using random forest. The model can then extrapolate behavioural information to movement data without direct observation of behaviours. The specificity of this method relies on the derivation of multiple predictor variables from the movement data over a range of temporal windows. This procedure captures as much information as possible on the changes and variations of movement and ensures the use of the random forest algorithm to its best capacity. The method is very generic, applicable to any set of data providing movement data together with observation of behaviour.
Maintained by Laurent Dubroca. Last updated 8 years ago.
7.3 match 2 stars 4.08 score 12 scripts
paulhendricks
titanic:Titanic Passenger Survival Data Set
This data set provides information on the fate of passengers on the fatal maiden voyage of the ocean liner "Titanic", with variables such as economic status (class), sex, age, and survival. Whereas the base R Titanic data found by calling data("Titanic") is an array resulting from cross-tabulating 2201 observations, these data sets are individual non-aggregated observations and formatted in a machine learning context with a training sample, a testing sample, and two additional data sets that can be used for deeper machine learning analysis. These data sets are used in a very well known Kaggle competition; formatting the raw data sets in a package hopefully lowers the barrier to entry for users new to R and machine learning.
Maintained by Paul Hendricks. Last updated 8 years ago.
3.3 match 10 stars 8.95 score 804 scripts 2 dependents
ropensci
frictionless:Read and Write Frictionless Data Packages
Read and write Frictionless Data Packages. A 'Data Package' (<https://specs.frictionlessdata.io/data-package/>) is a simple container format and standard to describe and package a collection of (tabular) data. It is typically used to publish FAIR (<https://www.go-fair.org/fair-principles/>) and open datasets.
Maintained by Peter Desmet. Last updated 6 months ago.
3.0 match 30 stars 9.79 score 55 scripts 6 dependents
bioc
survcomp:Performance Assessment and Comparison for Survival Analysis
Assessment and Comparison for Performance of Risk Prediction (Survival) Models.
Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.
geneexpression differentialexpression visualization cpp
3.5 match 8.46 score 448 scripts 12 dependents
forestry-labs
distillML:Model Distillation and Interpretability Methods for Machine Learning Models
Provides several methods for model distillation and interpretability for general black box machine learning models and treatment effect estimation methods. For details on the algorithms implemented, see <https://forestry-labs.github.io/distillML/index.html> Brian Cho, Theo F. Saarinen, Jasjeet S. Sekhon, Simon Walter.
Maintained by Theo Saarinen. Last updated 2 years ago.
bart distillation-model explainable-machine-learning explainable-ml interpretability interpretable-machine-learning machine-learning model random-forest xgboost
7.5 match 7 stars 3.92 score 12 scripts
civisanalytics
civis:R Client for the 'Civis Platform API'
A convenient interface for making requests directly to the 'Civis Platform API' <https://www.civisanalytics.com/platform/>. Full documentation available 'here' <https://civisanalytics.github.io/civis-r/>.
Maintained by Peter Cooman. Last updated 2 months ago.
3.8 match 16 stars 7.84 score 144 scripts
psychmeta
psychmeta:Psychometric Meta-Analysis Toolkit
Tools for computing bare-bones and psychometric meta-analyses and for generating psychometric data for use in meta-analysis simulations. Supports bare-bones, individual-correction, and artifact-distribution methods for meta-analyzing correlations and d values. Includes tools for converting effect sizes, computing sporadic artifact corrections, reshaping meta-analytic databases, computing multivariate corrections for range variation, and more. Bugs can be reported to <https://github.com/psychmeta/psychmeta/issues> or <issues@psychmeta.com>.
Maintained by Jeffrey A. Dahlke. Last updated 9 months ago.
hacktoberfest meta-analysis psychology psychometric psychometrics
3.5 match 57 stars 8.25 score 151 scripts
jeffreyevans
yaImpute:Nearest Neighbor Observation Imputation and Evaluation Tools
Performs nearest neighbor-based imputation using one or more alternative approaches to processing multivariate data. These include methods based on canonical correlation analysis, canonical correspondence analysis, and a multivariate adaptation of the random forest classification and regression techniques of Leo Breiman and Adele Cutler. Additional methods are also offered. The package includes functions for comparing the results from running alternative techniques, detecting imputation targets that are notably distant from reference observations, detecting and correcting for bias, bootstrapping and building ensemble imputations, and mapping results.
Maintained by Jeffrey S. Evans. Last updated 6 months ago.
3.9 match 3 stars 7.40 score 94 scripts 12 dependents
lorismichel
drf:Distributional Random Forests
An implementation of distributional random forests as introduced in Cevid & Michel & Meinshausen & Buhlmann (2020) <arXiv:2005.14458>.
Maintained by Loris Michel. Last updated 4 years ago.
18.3 match 1.59 score 39 scripts
statistikat
VIM:Visualization and Imputation of Missing Values
New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on this structure of the missing values, the corresponding methods may help to identify the mechanism generating the missing values and allow exploring the data including missing values. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface available in the separate package VIMGUI allows easy handling of the implemented plot methods.
Maintained by Matthias Templ. Last updated 7 months ago.
hotdeck imputation-methods model-predictions visualization cpp
2.0 match 85 stars 14.44 score 2.6k scripts 19 dependents
smin95
smplot2:Create Standalone and Composite Plots in 'ggplot2' for Publications
Provides functions for creating and annotating a composite plot in 'ggplot2'. Offers background themes and shortcut plotting functions that produce figures that are appropriate for the format of scientific journals. Some methods are described in Min and Zhou (2021) <doi:10.3389/fgene.2021.802894>.
Maintained by Seung Hyun Min. Last updated 1 month ago.
easy-to-use ggplot2 scientific-visualization visualization
4.0 match 24 stars 7.08 score 288 scripts 1 dependents
oxfordihtm
oxthema:Oxford Colours, Palettes, Fonts, and Themes
Colours, palettes, fonts, and themes based on University of Oxford's visual identity guidelines <https://communications.web.ox.ac.uk/communications-resources/visual-identity/identity-guidelines>.
Maintained by Ernest Guevarra. Last updated 5 months ago.
5.8 match 3 stars 4.91 score 10 scripts
sparklyr
sparklyr:R Interface to Apache Spark
R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.
Maintained by Edgar Ruiz. Last updated 8 days ago.
apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr
1.9 match 959 stars 15.16 score 4.0k scripts 21 dependents
cidree
forestdata:Download Forestry Data
Functions for downloading forestry and land use data for use in spatial analysis. This package offers a user-friendly solution to quickly obtain datasets such as forest height, forest types, tree species under various climate change scenarios, or land use data among others.
Maintained by Adrián Cidre González. Last updated 3 months ago.
6.8 match 13 stars 4.14 score 7 scripts
aberhrml
forestControl:Approximate False Positive Rate Control in Selection Frequency for Random Forest
Approximate false positive rate control in selection frequency for random forest using the methods described by Ender Konukoglu and Melanie Ganz (2014) <arXiv:1410.2838>. Methods for calculating the selection frequency threshold at false positive rates and selection frequency false positive rate feature selection.
Maintained by Tom Wilson. Last updated 3 years ago.
7.0 match 2 stars 4.00 score 7 scripts