R-universe search: rarity

Showing 16 of total 16 results (show query)

farewe

Rarity:Calculation of Rarity Indices for Species and Assemblages of Species

Allows calculation of rarity weights for species and indices of rarity for assemblages of species according to different methods (Leroy et al. 2012, Insect. Conserv. Divers. 5:159-168 <doi:10.1111/j.1752-4598.2011.00148.x>; Leroy et al. 2013, Divers. Distrib. 19:794-803 <doi:10.1111/ddi.12040>).

Maintained by Boris Leroy. Last updated 2 years ago.

61.1 match 1 stars 3.61 score 27 scripts 1 dependents

rekyt

funrar:Functional Rarity Indices Computation

Computes functional rarity indices as proposed by Violle et al. (2017) <doi:10.1016/j.tree.2017.02.002>. Various indices can be computed using both regional and local information. Functional Rarity combines both the functional aspect of rarity as well as the extent aspect of rarity. 'funrar' is presented in Grenié et al. (2017) <doi:10.1111/ddi.12629>.

Maintained by Matthias Grenié. Last updated 11 months ago.

ecological-models ecology rarity traits

20.1 match 17 stars 7.85 score 233 scripts 1 dependents

bioc

microbiome:Microbiome Analytics

Utilities for microbiome analysis.

Maintained by Leo Lahti. Last updated 5 months ago.

metagenomics microbiome sequencing systemsbiology hitchip hitchip-atlas human-microbiome microbiology microbiome-analysis phyloseq population-study

7.0 match 290 stars 12.50 score 2.0k scripts 5 dependents

b-cubed-eu

b3gbi:General Biodiversity Indicators for Biodiversity Data Cubes

Calculate general biodiversity indicators from GBIF data cubes. Includes many common indicators such as species richness and evenness, which can be calculated over time (trends) or space (maps).

Maintained by Shawn Dove. Last updated 13 days ago.

biodiversity-indicators data-cubes

7.2 match 3 stars 6.26 score 34 scripts 1 dependents

r-forge

fuzzySim:Fuzzy Similarity in Species Distributions

Functions to compute fuzzy versions of species occurrence patterns based on presence-absence data (including inverse distance interpolation, trend surface analysis, and prevalence-independent favourability obtained from probability of presence), as well as pair-wise fuzzy similarity (based on fuzzy logic versions of commonly used similarity indices) among those occurrence patterns. Includes also functions for model consensus and comparison (overlap and fuzzy similarity, fuzzy loss, fuzzy gain), and for data preparation, such as obtaining unique abbreviations of species names, defining the background region, cleaning and gridding (thinning) point occurrence data onto raster maps, selecting among (pseudo)absences to address survey bias, converting species lists (long format) to presence-absence tables (wide format), transposing part of a data frame, selecting relevant variables for models, assessing the false discovery rate, or analysing and dealing with multicollinearity. Initially described in Barbosa (2015) <doi:10.1111/2041-210X.12372>.

Maintained by A. Marcia Barbosa. Last updated 20 days ago.

5.3 match 2 stars 5.35 score 156 scripts

prioritizr

prioritizr:Systematic Conservation Prioritization in R

Systematic conservation prioritization using mixed integer linear programming (MILP). It provides a flexible interface for building and solving conservation planning problems. Once built, conservation planning problems can be solved using a variety of commercial and open-source exact algorithm solvers. By using exact algorithm solvers, solutions can be generated that are guaranteed to be optimal (or within a pre-specified optimality gap). Furthermore, conservation problems can be constructed to optimize the spatial allocation of different management actions or zones, meaning that conservation practitioners can identify solutions that benefit multiple stakeholders. To solve large-scale or complex conservation planning problems, users should install the Gurobi optimization software (available from <https://www.gurobi.com/>) and the 'gurobi' R package (see Gurobi Installation Guide vignette for details). Users can also install the IBM CPLEX software (<https://www.ibm.com/products/ilog-cplex-optimization-studio/cplex-optimizer>) and the 'cplexAPI' R package (available at <https://github.com/cran/cplexAPI>). Additionally, the 'rcbc' R package (available at <https://github.com/dirkschumacher/rcbc>) can be used to generate solutions using the CBC optimization software (<https://github.com/coin-or/Cbc>). For further details, see Hanson et al. (2025) <doi:10.1111/cobi.14376>.

Maintained by Richard Schuster. Last updated 11 days ago.

biodiversity conservation conservation-planner optimization prioritization solver spatial cpp

1.7 match 124 stars 11.82 score 584 scripts 2 dependents

curso-r

scryr:An Interface to the 'Scryfall' API

A simple, light, and robust interface between R and the 'Scryfall' card data API <https://scryfall.com/docs/api>.

Maintained by Caio Lente. Last updated 3 years ago.

api mtg

2.0 match 17 stars 6.09 score 18 scripts

bioc

dar:Differential Abundance Analysis by Consensus

Differential abundance testing in microbiome data challenges both parametric and non-parametric statistical methods, due to its sparsity, high variability and compositional nature. Microbiome-specific statistical methods often assume classical distribution models or take into account compositional specifics. These produce results that range within the specificity vs sensitivity space in such a way that type I and type II error that are difficult to ascertain in real microbiome data when a single method is used. Recently, a consensus approach based on multiple differential abundance (DA) methods was recently suggested in order to increase robustness. With dar, you can use dplyr-like pipeable sequences of DA methods and then apply different consensus strategies. In this way we can obtain more reliable results in a fast, consistent and reproducible way.

Maintained by Francesc Catala-Moll. Last updated 1 days ago.

software sequencing microbiome metagenomics multiplecomparison normalization bioconductor biomarker-discovery differential-abundance-analysis feature-selection microbiology phyloseq

2.0 match 2 stars 5.98 score 8 scripts

lyzander

RDota2:An R Steam API Client for Valve's Dota2

An R API Client for Valve's Dota2. RDota2 can be easily used to connect to the Steam API and retrieve data for Valve's popular video game Dota2. You can find out more about Dota2 at <http://store.steampowered.com/app/570/>.

Maintained by Theo Boutaris. Last updated 8 years ago.

1.9 match 12 stars 5.24 score 29 scripts

johnihrie

MPN:Most Probable Number and Other Microbial Enumeration Techniques

Calculates the Most Probable Number (MPN) to quantify the concentration (density) of microbes in serial dilutions of a laboratory sample (described in Jarvis, 2010 <doi:10.1111/j.1365-2672.2010.04792.x>). Also calculates the Aerobic Plate Count (APC) for similar microbial enumeration experiments.

Maintained by John Ihrie. Last updated 5 months ago.

1.5 match 3.30 score 10 scripts

sandrinepavoine

adiv:Analysis of Diversity

Functions, data sets and examples for the calculation of various indices of biodiversity including species, functional and phylogenetic diversity. Part of the indices are expressed in terms of equivalent numbers of species. The package also provides ways to partition biodiversity across spatial or temporal scales (alpha, beta, gamma diversities). In addition to the quantification of biodiversity, ordination approaches are available which rely on diversity indices and allow the detailed identification of species, functional or phylogenetic differences between communities.

Maintained by Sandrine Pavoine. Last updated 1 years ago.

1.6 match 1 stars 2.28 score 63 scripts

pascoalf

ulrb:Unsupervised Learning Based Definition of Microbial Rare Biosphere

A tool to define rare biosphere. 'ulrb' solves the problem of the definition of rarity by replacing arbitrary thresholds with an unsupervised machine learning algorithm (partitioning around medoids, or k-medoids). This algorithm works for any type of microbiome data, provided there is a species abundance table. For validation of this method to different species abundance tables see Pascoal et al, 2024 (in peer-review). This method also works for non-microbiome data.

Maintained by Francisco Pascoal. Last updated 20 days ago.

0.5 match 3 stars 5.68 score 9 scripts

haghish

shapley:Weighted Mean SHAP and CI for Robust Feature Selection in ML Grid

This R package introduces Weighted Mean SHapley Additive exPlanations (WMSHAP), an innovative method for calculating SHAP values for a grid of fine-tuned base-learner machine learning models as well as stacked ensembles, a method not previously available due to the common reliance on single best-performing models. By integrating the weighted mean SHAP values from individual base-learners comprising the ensemble or individual base-learners in a tuning grid search, the package weights SHAP contributions according to each model's performance, assessed by multiple either R squared (for both regression and classification models). alternatively, this software also offers weighting SHAP values based on the area under the precision-recall curve (AUCPR), the area under the curve (AUC), and F2 measures for binary classifiers. It further extends this framework to implement weighted confidence intervals for weighted mean SHAP values, offering a more comprehensive and robust feature importance evaluation over a grid of machine learning models, instead of solely computing SHAP values for the best model. This methodology is particularly beneficial for addressing the severe class imbalance (class rarity) problem by providing a transparent, generalized measure of feature importance that mitigates the risk of reporting SHAP values for an overfitted or biased model and maintains robustness under severe class imbalance, where there is no universal criteria of identifying the absolute best model. Furthermore, the package implements hypothesis testing to ascertain the statistical significance of SHAP values for individual features, as well as comparative significance testing of SHAP contributions between features. Additionally, it tackles a critical gap in feature selection literature by presenting criteria for the automatic feature selection of the most important features across a grid of models or stacked ensembles, eliminating the need for arbitrary determination of the number of top features to be extracted. This utility is invaluable for researchers analyzing feature significance, particularly within severely imbalanced outcomes where conventional methods fall short. Moreover, it is also expected to report democratic feature importance across a grid of models, resulting in a more comprehensive and generalizable feature selection. The package further implements a novel method for visualizing SHAP values both at subject level and feature level as well as a plot for feature selection based on the weighted mean SHAP ratios.

Maintained by E. F. Haghish. Last updated 3 days ago.

class-imbalance class-imbalance-problem feature-extraction feature-importance feature-selection machine-learning machine-learning-algorithms shap shap-analysis shap-values shapely shapley-additive-explanations shapley-decomposition shapley-value shapley-values shapleyvalue weighted-shap weighted-shap-confidence-interval weighted-shapley weighted-shapley-ci

0.5 match 14 stars 5.19 score 17 scripts

haghish

mlim:Single and Multiple Imputation with Automated Machine Learning

Machine learning algorithms have been used for performing single missing data imputation and most recently, multiple imputations. However, this is the first attempt for using automated machine learning algorithms for performing both single and multiple imputation. Automated machine learning is a procedure for fine-tuning the model automatic, performing a random search for a model that results in less error, without overfitting the data. The main idea is to allow the model to set its own parameters for imputing each variable separately instead of setting fixed predefined parameters to impute all variables of the dataset. Using automated machine learning, the package fine-tunes an Elastic Net (default) or Gradient Boosting, Random Forest, Deep Learning, Extreme Gradient Boosting, or Stacked Ensemble machine learning model (from one or a combination of other supported algorithms) for imputing the missing observations. This procedure has been implemented for the first time by this package and is expected to outperform other packages for imputing missing data that do not fine-tune their models. The multiple imputation is implemented via bootstrapping without letting the duplicated observations to harm the cross-validation procedure, which is the way imputed variables are evaluated. Most notably, the package implements automated procedure for handling imputing imbalanced data (class rarity problem), which happens when a factor variable has a level that is far more prevalent than the other(s). This is known to result in biased predictions, hence, biased imputation of missing data. However, the autobalancing procedure ensures that instead of focusing on maximizing accuracy (classification error) in imputing factor variables, a fairer procedure and imputation method is practiced.

Maintained by E. F. Haghish. Last updated 8 months ago.

automatic-machine-learning automl classimbalance data-science elastic-net extreme-gradient-boosting gbm glm gradient-boosting gradient-boosting-machine imputation imputation-algorithm imputation-methods machine-learning missing-data multipleimputation stack-ensemble

0.5 match 31 stars 4.49 score 7 scripts

mightymetrika

npboottprm:Nonparametric Bootstrap Test with Pooled Resampling

Addressing crucial research questions often necessitates a small sample size due to factors such as distinctive target populations, rarity of the event under study, time and cost constraints, ethical concerns, or group-level unit of analysis. Many readily available analytic methods, however, do not accommodate small sample sizes, and the choice of the best method can be unclear. The 'npboottprm' package enables the execution of nonparametric bootstrap tests with pooled resampling to help fill this gap. Grounded in the statistical methods for small sample size studies detailed in Dwivedi, Mallawaarachchi, and Alvarado (2017) <doi:10.1002/sim.7263>, the package facilitates a range of statistical tests, encompassing independent t-tests, paired t-tests, and one-way Analysis of Variance (ANOVA) F-tests. The nonparboot() function undertakes essential computations, yielding detailed outputs which include test statistics, effect sizes, confidence intervals, and bootstrap distributions. Further, 'npboottprm' incorporates an interactive 'shiny' web application, nonparboot_app(), offering intuitive, user-friendly data exploration.

Maintained by Mackson Ncube. Last updated 6 months ago.

datascience nonparametric statistics

0.5 match 1 stars 4.32 score 5 scripts 2 dependents

cran

EcoIndR:Ecological Indicators

Calculates several indices, such as of diversity, fluctuation, etc., and they are used to estimate ecological indicators.

Maintained by Castor Guisande Gonzalez. Last updated 1 years ago.

1.6 match 1 stars 1.30 score