R-universe search: fair

kozodoi

fairness:Algorithmic Fairness Metrics

Offers calculation, visualization and comparison of algorithmic fairness metrics. Fair machine learning is an emerging topic with the overarching aim to critically assess whether ML algorithms reinforce existing social biases. Unfair algorithms can propagate such biases and produce predictions with a disparate impact on various sensitive groups of individuals (defined by sex, gender, ethnicity, religion, income, socioeconomic status, physical or mental disabilities). Fair algorithms rest on the premise that these groups should be treated similarly or have similar prediction outcomes. The fairness R package calculates and compares both commonly and less commonly used fairness metrics across population subgroups. These methods are described by Calders and Verwer (2010) <doi:10.1007/s10618-010-0190-x>, Chouldechova (2017) <doi:10.1089/big.2016.0047>, Feldman et al. (2015) <doi:10.1145/2783258.2783311>, Friedler et al. (2018) <doi:10.1145/3287560.3287589> and Zafar et al. (2017) <doi:10.1145/3038912.3052660>. The package also offers convenient visualizations to help understand fairness metrics.

Maintained by Nikita Kozodoi. Last updated 2 years ago.

algorithmic-discrimination algorithmic-fairness discrimination disparate-impact fairness fairness-ai fairness-ml machine-learning

94.7 match 32 stars 6.82 score 69 scripts 1 dependents
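
The group comparison described in this entry (similar prediction outcomes across sensitive groups) can be illustrated with a minimal sketch. It is written in plain Python and does not use the fairness package's API; the function names `positive_rates` and `parity_ratio` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def positive_rates(predictions, groups):
    """Share of positive (1) predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def parity_ratio(predictions, groups, reference):
    """Each group's positive rate divided by the reference group's rate.

    A ratio close to 1 means the group receives positive predictions
    about as often as the reference group.
    """
    rates = positive_rates(predictions, groups)
    return {g: rate / rates[reference] for g, rate in rates.items()}


# Toy data (hypothetical): binary predictions for two groups.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = positive_rates(preds, groups)
ratios = parity_ratio(preds, groups, reference="a")
print(rates)
print(ratios)
```

Here group "a" receives a positive prediction three times out of four and group "b" once out of four, so the ratio for "b" relative to "a" is one third. Real fairness audits compare several such metrics rather than relying on one.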

modeloriented

fairmodels:Flexible Tool for Bias Detection, Visualization, and Mitigation

Measure fairness metrics in one place for many models. Check how large a model's bias is with respect to race, sex, nationality, etc. Use measures such as Statistical Parity and Equal odds to detect discrimination against unprivileged groups. Visualize the bias using a heatmap, radar plot, biplot, bar chart (and more!). Various pre-processing and post-processing bias mitigation algorithms are implemented. The package also supports calculating fairness metrics for regression models. Find more details in (Wiśniewski, Biecek (2021)) <arXiv:2104.00507>.

Maintained by Jakub Wiśniewski. Last updated 2 months ago.

explain-classifiers explainable-ml fairness fairness-comparison fairness-ml model-evaluation

73.4 match 86 stars 7.72 score 51 scripts 1 dependents

modeloriented

DALEX:moDel Agnostic Language for Exploration and eXplanation

Any unverified black box model is the path to failure. Opaqueness leads to distrust. Distrust leads to ignoration. Ignoration leads to rejection. The DALEX package xrays any model and helps to explore and explain its behaviour. Machine Learning (ML) models are widely used and have various applications in classification or regression. Models created with boosting, bagging, stacking or similar techniques are often used due to their high performance. But such black-box models usually lack direct interpretability. The DALEX package contains various methods that help to understand the link between input variables and model output. Implemented methods help to explore the model at the level of a single instance as well as at the level of the whole dataset. All model explainers are model agnostic and can be compared across different models. The DALEX package is the cornerstone for the 'DrWhy.AI' universe of packages for visual model exploration. Find more details in (Biecek 2018) <https://jmlr.org/papers/v19/18-416.html>.

Maintained by Przemyslaw Biecek. Last updated 1 month ago.

black-box dalex data-science explainable-ai explainable-artificial-intelligence explainable-ml explanations explanatory-model-analysis fairness iml interpretability interpretable-machine-learning machine-learning model-visualization predictive-modeling responsible-ai responsible-ml xai

10.0 match 1.4k stars 13.40 score 876 scripts 21 dependents

dplecko

fairadapt:Fair Data Adaptation with Quantile Preservation

An implementation of the fair data adaptation with quantile preservation described in Plecko & Meinshausen (2019) <arXiv:1911.06685>. The adaptation procedure uses the specified causal graph to pre-process the given training and testing data in such a way as to remove the bias caused by the protected attribute. The procedure uses tree ensembles for quantile regression.

Maintained by Drago Plecko. Last updated 2 years ago.

causal-inference fairness machine-learning

22.5 match 2 stars 4.63 score 43 scripts

eblondel

zen4R:Interface to 'Zenodo' REST API

Provides an Interface to 'Zenodo' (<https://zenodo.org>) REST API, including management of depositions, attribution of DOIs by 'Zenodo' and upload and download of files.

Maintained by Emmanuel Blondel. Last updated 17 days ago.

api datacite depositions deposits doi fair zenodo

11.0 match 46 stars 8.41 score 76 scripts 1 dependents

kjhealy

gssrdoc:Document General Social Survey Variable

The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS see <http://gss.norc.org>.

Maintained by Kieran Healy. Last updated 11 months ago.

36.9 match 2.28 score 38 scripts

fairdatapipeline

rDataPipeline:Functions to Interact with the 'FAIR Data Pipeline'

R implementation of the 'FAIR Data Pipeline API'. The 'FAIR Data Pipeline' is intended to enable tracking of provenance of FAIR (findable, accessible, interoperable and reusable) data used in epidemiological modelling.

Maintained by Ryan Field. Last updated 4 months ago.

rhdf5 data-pipeline fair

15.0 match 4 stars 4.52 score 11 scripts

justinmshea

wooldridge:115 Data Sets from "Introductory Econometrics: A Modern Approach, 7e" by Jeffrey M. Wooldridge

Students learning both econometrics and R may find the introduction to both challenging. The wooldridge data package aims to lighten the task by efficiently loading any data set found in the text with a single command. Data sets have been compressed to a fraction of their original size. Documentation files contain page numbers, the original source, time of publication, and notes from the author suggesting avenues for further analysis and research. If one needs an introduction to R model syntax, a vignette contains solutions to examples from chapters of the text. Data sets are from the 7th edition (Wooldridge 2020, ISBN-13 978-1-337-55886-0), and are backwards compatible with all previous versions of the text.

Maintained by Justin M. Shea. Last updated 3 months ago.

econometrics

6.6 match 203 stars 9.38 score 1.4k scripts

spatstat

spatstat.utils:Utility Functions for 'spatstat'

Contains utility functions for the 'spatstat' family of packages which may also be useful for other purposes.

Maintained by Adrian Baddeley. Last updated 2 days ago.

spatial-analysis spatial-data spatstat

4.5 match 5 stars 11.66 score 134 scripts 248 dependents

ethanbass

chromConverter:Chromatographic File Converter

Reads chromatograms from binary formats into R objects. Currently supports conversion of 'Agilent ChemStation', 'Agilent MassHunter', 'Shimadzu LabSolutions', 'ThermoRaw', and 'Varian Workstation' files as well as various text-based formats. In addition to its internal parsers, chromConverter contains bindings to parsers in external libraries, such as 'Aston' <https://github.com/bovee/aston>, 'Entab' <https://github.com/bovee/entab>, 'rainbow' <https://rainbow-api.readthedocs.io/>, and 'ThermoRawFileParser' <https://github.com/compomics/ThermoRawFileParser>.

Maintained by Ethan Bass. Last updated 11 hours ago.

cheminformatics chromatography fair-data gc-fid hplc hplc-dad hplc-uv metabolomics metabolomics-data open-data open-science

7.5 match 33 stars 6.38 score 16 scripts 2 dependents

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 year ago.

5.3 match 3 stars 8.20 score 7.8k scripts 11 dependents

rudeboybert

fivethirtyeight:Data and Code Behind the Stories and Interactives at 'FiveThirtyEight'

Datasets and code published by the data journalism website 'FiveThirtyEight' available at <https://github.com/fivethirtyeight/data>. Note that while we received guidance from editors at 'FiveThirtyEight', this package is not officially published by 'FiveThirtyEight'.

Maintained by Albert Y. Kim. Last updated 2 years ago.

data-science datajournalism fivethirtyeight statistics

3.3 match 453 stars 10.98 score 1.7k scripts

alanarnholt

BSDA:Basic Statistics and Data Analysis

Data sets for book "Basic Statistics and Data Analysis" by Larry J. Kitchens.

Maintained by Alan T. Arnholt. Last updated 2 years ago.

3.5 match 7 stars 9.11 score 1.3k scripts 6 dependents

cran

fairml:Fair Models in Machine Learning

Fair machine learning regression models which take sensitive attributes into account in model estimation. Currently implementing Komiyama et al. (2018) <http://proceedings.mlr.press/v80/komiyama18a/komiyama18a.pdf>, Zafar et al. (2019) <https://www.jmlr.org/papers/volume20/18-262/18-262.pdf> and my own approach from Scutari, Panero and Proissl (2022) <https://link.springer.com/content/pdf/10.1007/s11222-022-10143-w.pdf> that uses ridge regression to enforce fairness.

Maintained by Marco Scutari. Last updated 2 years ago.

20.4 match 1 stars 1.52 score 1 dependents

koenderks

jfa:Statistical Methods for Auditing

Provides statistical methods for auditing as implemented in JASP for Audit (Derks et al., 2021 <doi:10.21105/joss.02733>). First, the package makes it easy for an auditor to plan a statistical sample, select the sample from the population, and evaluate the misstatement in the sample compliant with international auditing standards. Second, the package provides statistical methods for auditing data, including tests of digit distributions and repeated values. Finally, the package includes methods for auditing algorithms on the aspect of fairness and bias. Next to classical statistical methodology, the package implements Bayesian equivalents of these methods whose statistical underpinnings are described in Derks et al. (2021) <doi:10.1111/ijau.12240>, Derks et al. (2024) <doi:10.2308/AJPT-2021-086>, Derks et al. (2022) <doi:10.31234/osf.io/8nf3e>, Derks et al. (2024) <doi:10.31234/osf.io/tgq5z>, and Derks et al. (2025) <doi:10.31234/osf.io/b8tu2>.

Maintained by Koen Derks. Last updated 20 hours ago.

algorithm-auditing audit audit-sampling bayesian data-auditing jasp jasp-for-audit statistical-audit statistics cpp

4.1 match 8 stars 6.69 score 17 scripts

sooahnshin

aihuman:Experimental Evaluation of Algorithm-Assisted Human Decision-Making

Provides statistical methods for analyzing experimental evaluation of the causal impacts of algorithmic recommendations on human decisions developed by Imai, Jiang, Greiner, Halen, and Shin (2023) <doi:10.1093/jrsssa/qnad010> and Ben-Michael, Greiner, Huang, Imai, Jiang, and Shin (2024) <doi:10.48550/arXiv.2403.12108>. The data used for this paper, and made available here, are interim, based on only half of the observations in the study and (for those observations) only half of the study follow-up period. We use them only to illustrate methods, not to draw substantive conclusions.

Maintained by Sooahn Shin. Last updated 3 months ago.

openblas cpp openmp

5.3 match 2 stars 4.60 score 8 scripts

hneth

ds4psy:Data Science for Psychologists

All datasets and functions required for the examples and exercises of the book "Data Science for Psychologists" (by Hansjoerg Neth, Konstanz University, 2023), freely available at <https://bookdown.org/hneth/ds4psy/>. The book and course introduce principles and methods of data science to students of psychology and other biological or social sciences. The 'ds4psy' package primarily provides datasets, but also functions for data generation and manipulation (e.g., of text and time data) and graphics that are used in the book and its exercises. All functions included in 'ds4psy' are designed to be explicit and instructive, rather than efficient or elegant.

Maintained by Hansjoerg Neth. Last updated 1 month ago.

data-literacy data-science education exploratory-data-analysis psychology social-sciences visualisation

3.4 match 22 stars 6.79 score 70 scripts

schochastics

networkdata:Repository of Network Datasets

The package contains a large collection of network datasets from different contexts. This includes social networks, animal networks and movie networks. All datasets are in 'igraph' format.

Maintained by David Schoch. Last updated 12 months ago.

dataset network-analysis

4.5 match 143 stars 5.01 score 143 scripts

jonasbhend

easyVerification:Ensemble Forecast Verification for Large Data Sets

Set of tools to simplify application of atomic forecast verification metrics for (comparative) verification of ensemble forecasts to large data sets. The forecast metrics are imported from the 'SpecsVerification' package, and additional forecast metrics are provided with this package. Alternatively, new user-defined forecast scores can be implemented using the example scores provided and applied using the functionality of this package.

Maintained by Jonas Bhend. Last updated 2 years ago.

cpp

3.5 match 1 stars 6.04 score 61 scripts 4 dependents

andrija-djurovic

PDtoolkit:Collection of Tools for PD Rating Model Development and Validation

The goal of this package is to cover the most common steps in probability of default (PD) rating model development and validation. The main procedures available are those that refer to univariate, bivariate, multivariate analysis, calibration and validation. Along with the accompanying 'monobin' and 'monobinShiny' packages, 'PDtoolkit' provides functions which are suitable for different data transformation and modeling tasks such as: imputations, monotonic binning of numeric risk factors, binning of categorical risk factors, weights of evidence (WoE) and information value (IV) calculations, WoE coding (replacement of risk factors modalities with WoE values), risk factor clustering, area under curve (AUC) calculation and others. Additionally, the package provides a set of validation functions for testing homogeneity, heterogeneity, discriminatory and predictive power of the model.

Maintained by Andrija Djurovic. Last updated 1 year ago.

4.3 match 14 stars 4.78 score 86 scripts

svmiller

stevedata:Steve's Toy Data for Teaching About a Variety of Methodological, Social, and Political Topics

This is a collection of various kinds of data with broad uses for teaching. My students, and academics like me who teach the same topics I teach, should find this useful if their teaching workflow is also built around the R programming language. The applications are multiple but mostly cluster on topics of statistical methodology, international relations, and political economy.

Maintained by Steve Miller. Last updated 4 days ago.

3.3 match 8 stars 5.97 score 178 scripts

squ4reanalytics

D4TAlink.light:FAIR Data - Workflow Management

Tools, methods and processes for the management of analysis workflows. These lightweight solutions facilitate structuring R&D activities. These solutions were developed to comply with Good Documentation Practice (GDP), with FAIR principles as discussed by Jacobsen et al. (2017) <doi:10.1162/dint_r_00024>, and with ALCOA+ principles as proposed by the U.S. FDA.

Maintained by Gregoire Thomas. Last updated 4 months ago.

3.6 match 4.30 score

jamesliley

SPARRAfairness:Analysis of Differential Behaviour of SPARRA Score Across Demographic Groups

The SPARRA risk score (Scottish Patients At Risk of admission and Re-Admission) estimates yearly risk of emergency hospital admission using electronic health records on a monthly basis for most of the Scottish population. This package implements a suite of functions used to analyse the behaviour and performance of the score, focusing particularly on differential performance over demographically-defined groups. It includes useful utility functions to plot receiver-operator-characteristic, precision-recall and calibration curves, draw stock human figures, estimate counterfactual quantities without the need to re-compute risk scores, and simulate a semi-realistic dataset.

Maintained by James Liley. Last updated 4 months ago.

5.7 match 2.70 score 4 scripts

ycroissant

pglm:Panel Generalized Linear Models

Estimation of panel models for glm-like models: this includes binomial models (logit and probit), count models (poisson and negbin) and ordered models (logit and probit), as described in: Baltagi (2013) Econometric Analysis of Panel Data <doi:10.1007/978-3-030-53953-5>, Hsiao (2014) Analysis of Panel Data <doi:10.1017/CBO9781139839327> and Croissant and Millo (2018), Panel Data Econometrics with R <doi:10.1002/9781119504641>.

Maintained by Yves Croissant. Last updated 1 year ago.

3.4 match 4.34 score 158 scripts 1 dependents

persimune

explainer:Machine Learning Model Explainer

It enables detailed interpretation of complex classification and regression models through Shapley analysis including data-driven characterization of subgroups of individuals. Furthermore, it facilitates multi-measure model evaluation, model fairness, and decision curve analysis. Additionally, it offers enhanced visualizations with interactive elements.

Maintained by Ramtin Zargari Marandi. Last updated 6 months ago.

ai classification clinical-research explainability explainable-ai interpretability machine-learning regression shap statistics

2.5 match 13 stars 5.37 score 12 scripts

blue-matter

MSEtool:Management Strategy Evaluation Toolkit

Development, simulation testing, and implementation of management procedures for fisheries (see Carruthers & Hordyk (2018) <doi:10.1111/2041-210X.13081>).

Maintained by Adrian Hordyk. Last updated 26 days ago.

cpp

1.8 match 8 stars 7.69 score 163 scripts 3 dependents

juhakarvanen

R6causal:R6 Class for Structural Causal Models

The implemented R6 class 'SCM' aims to simplify working with structural causal models. The missing data mechanism can be defined as a part of the structural model. The class contains methods for 1) defining a structural causal model via functions, text or conditional probability tables, 2) printing basic information on the model, 3) plotting the graph for the model using packages 'igraph' or 'qgraph', 4) simulating data from the model, 5) applying an intervention, 6) checking the identifiability of a query using the R packages 'causaleffect' and 'dosearch', 7) defining the missing data mechanism, 8) simulating incomplete data from the model according to the specified missing data mechanism and 9) checking the identifiability in a missing data problem using the R package 'dosearch'. In addition, there are functions for running experiments and doing counterfactual inference using simulation.

Maintained by Juha Karvanen. Last updated 1 year ago.

4.8 match 1 stars 2.70 score 3 scripts

matloff

dsld:Data Science Looks at Discrimination

Statistical and graphical tools for detecting and measuring discrimination and bias, be it racial, gender, age or other. Detection and remediation of bias in machine learning algorithms. 'Python' interfaces available.

Maintained by Norm Matloff. Last updated 1 month ago.

1.5 match 12 stars 7.81 score 35 scripts

fvafrcu

fritools:Utilities for the Forest Research Institute of the State Baden-Wuerttemberg

Miscellaneous utilities, tools and helper functions for finding and searching files on disk, searching for and removing R objects from the workspace. Does not import or depend on any third party package, but on core R only (i.e. it may depend on packages with priority 'base').

Maintained by Andreas Dominik Cullmann. Last updated 27 days ago.

1.9 match 5.98 score 4 scripts 6 dependents

modeloriented

arenar:Arena for the Exploration and Comparison of any ML Models

Generates data for challenging machine learning models in 'Arena' <https://arena.drwhy.ai> - an interactive web application. You can start the server with XAI (Explainable Artificial Intelligence) plots to be generated on-demand or precalculate and auto-upload data file beside shareable 'Arena' URL.

Maintained by Piotr Piątyszek. Last updated 4 years ago.

axplainable-artificial-intelligence ema explainability explanatory-model-analysis iml interactive-xai interpretability xai

1.9 match 31 stars 5.94 score 14 scripts

puttickmacroevolution

motmot:Models of Trait Macroevolution on Trees

Functions for fitting models of trait evolution on phylogenies for continuous traits. The majority of functions are described in Thomas and Freckleton (2012) <doi:10.1111/j.2041-210X.2011.00132.x> and include functions that allow for tests of variation in the rates of trait evolution.

Maintained by Mark Puttick. Last updated 5 years ago.

cpp

1.8 match 4 stars 6.05 score 35 scripts

ropensci

frictionless:Read and Write Frictionless Data Packages

Read and write Frictionless Data Packages. A 'Data Package' (<https://specs.frictionlessdata.io/data-package/>) is a simple container format and standard to describe and package a collection of (tabular) data. It is typically used to publish FAIR (<https://www.go-fair.org/fair-principles/>) and open datasets.

Maintained by Peter Desmet. Last updated 6 months ago.

frictionlessdata oscibio

0.9 match 30 stars 9.79 score 55 scripts 6 dependents

colintredoux

r4lineups:Statistical Inference on Lineup Fairness

Since the early 1970s eyewitness testimony researchers have recognised the importance of estimating properties such as lineup bias (is the lineup biased against the suspect, leading to a rate of choosing higher than one would expect by chance?), and lineup size (how many reasonable choices are in fact available to the witness? A lineup is supposed to consist of a suspect and a number of additional members, or foils, whom a poor-quality witness might mistake for the perpetrator). Lineup measures are descriptive, in the first instance, but since the earliest articles in the literature researchers have recognised the importance of reasoning inferentially about them. This package contains functions to compute various properties of laboratory or police lineups, and is intended for use by researchers in forensic psychology and/or eyewitness testimony research. Among others, the r4lineups package includes functions for calculating lineup proportion, functional size, various estimates of effective size, diagnosticity ratio, homogeneity of the diagnosticity ratio, ROC curves for confidence x accuracy data and the degree of similarity of faces in a lineup.

Maintained by Colin Tredoux. Last updated 7 years ago.

3.1 match 2.58 score 38 scripts

mrkaye97

fitbitr:Interface with the 'Fitbit' API

Many 'Fitbit' users, and R-friendly 'Fitbit' users especially, have found themselves curious about their 'Fitbit' data. 'Fitbit' aggregates a large amount of personal data, much of which is interesting for personal research and to satisfy curiosity, and is even potentially useful in medical settings. The goal of 'fitbitr' is to make interfacing with the 'Fitbit' API as streamlined as possible, to make it simple for R users of all backgrounds and comfort levels to analyze their 'Fitbit' data and do whatever they want with it! Currently, 'fitbitr' includes methods for pulling data on activity, sleep, and heart rate, but this list is likely to grow in the future as the package gains more traction and more requests for new methods to be implemented come in. You can find details on the 'Fitbit' API at <https://dev.fitbit.com/build/reference/web-api/>.

Maintained by Matt Kaye. Last updated 5 months ago.

fitbit

1.8 match 16 stars 4.41 score 32 scripts

stibu81

ibawds:Functions and Datasets for the Data Science Course at IBAW

A collection of useful functions and datasets for the Data Science Course at IBAW.

Maintained by Stefan Lanz. Last updated 10 days ago.

data-science-learning educational-resources

1.8 match 2 stars 4.26 score 8 scripts

marcoblume

odds.converter:Betting Odds Conversion

Conversion between the most common odds types for sports betting. Hong Kong odds, US odds, Decimal odds, Indonesian odds, Malaysian odds, and raw Probability are covered in this package.

Maintained by Marco Blume. Last updated 7 years ago.

1.7 match 13 stars 4.48 score 46 scripts
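
The conversions named in this entry follow standard formulas, and a few of them can be sketched as follows. This is plain Python and not the odds.converter API; the function names are hypothetical, and only the decimal-to-probability, decimal-to-Hong-Kong and decimal-to-US directions are shown.

```python
def decimal_to_probability(decimal_odds):
    """Implied probability: the reciprocal of the decimal odds."""
    return 1.0 / decimal_odds


def decimal_to_hong_kong(decimal_odds):
    """Hong Kong odds quote the net profit per unit staked."""
    return decimal_odds - 1.0


def decimal_to_us(decimal_odds):
    """US (moneyline) odds.

    At decimal odds of 2.0 or more the result is positive: profit on a
    100-unit stake. Below 2.0 it is negative: the stake needed to win
    100 units.
    """
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    if decimal_odds >= 2.0:
        return (decimal_odds - 1.0) * 100.0
    return -100.0 / (decimal_odds - 1.0)


print(decimal_to_probability(4.0))
print(decimal_to_hong_kong(2.5))
print(decimal_to_us(2.5), decimal_to_us(1.5))
```

Using decimal odds as the common starting point keeps each conversion to a single expression; the other odds types listed in the entry can be derived in the same way.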

prabhanjan-tattar

gpk:100 Data Sets for Statistics Education

Collection of datasets as prepared by Profs. A.P. Gore, S.A. Paranjape, and M.B. Kulkarni of Department of Statistics, Poona University, India. With their permission, the first letters of their names form the name of this package. The package has been built by me and made available for the benefit of R users. This collection requires a rich class of models and can be a very useful building block for a beginner.

Maintained by Prabhanjan Tattar. Last updated 12 years ago.

3.4 match 1.69 score 49 scripts

david-cortes

isotree:Isolation-Based Outlier Detection

Fast and multi-threaded implementation of isolation forest (Liu, Ting, Zhou (2008) <doi:10.1109/ICDM.2008.17>), extended isolation forest (Hariri, Kind, Brunner (2018) <doi:10.48550/arXiv.1811.02141>), SCiForest (Liu, Ting, Zhou (2010) <doi:10.1007/978-3-642-15883-4_18>), fair-cut forest (Cortes (2021) <doi:10.48550/arXiv.2110.13402>), robust random-cut forest (Guha, Mishra, Roy, Schrijvers (2016) <http://proceedings.mlr.press/v48/guha16.html>), and customizable variations of them, for isolation-based outlier detection, clustered outlier detection, distance or similarity approximation (Cortes (2019) <doi:10.48550/arXiv.1910.12362>), isolation kernel calculation (Ting, Zhu, Zhou (2018) <doi:10.1145/3219819.3219990>), and imputation of missing values (Cortes (2019) <doi:10.48550/arXiv.1911.06646>), based on random or guided decision tree splitting, and providing different metrics for scoring anomalies based on isolation depth or density (Cortes (2021) <doi:10.48550/arXiv.2111.11639>). Provides simple heuristics for fitting the model to categorical columns and handling missing data, and offers options for varying between random and guided splits, and for using different splitting criteria.

Maintained by David Cortes. Last updated 15 days ago.

anomaly-detection imputation isolation-forest outlier-detection cpp openmp

0.5 match 203 stars 10.41 score 115 scripts 6 dependents

cols4all

cols4all:Colors for all

Color palettes for all people, including those with color vision deficiency. Popular color palette series have been organized by type and have been scored on several properties such as color-blind-friendliness and fairness (i.e. do colors stand out equally?). Own palettes can also be loaded and analysed. Besides the common palette types (categorical, sequential, and diverging) it also includes cyclic and bivariate color palettes. Furthermore, a color for missing values is assigned to each palette.

Maintained by Martijn Tennekes. Last updated 2 months ago.

0.5 match 343 stars 9.98 score 26 dependents

cwru-sdle

FAIRmaterials:Ontology Tools with Data FAIRification in Development

Translates several CSV files with ontological terms and corresponding data into RDF triples. These RDF triples are stored in OWL and JSON-LD files, facilitating data accessibility, interoperability, and knowledge unification. The triples are also visualized in a graph saved as an SVG. The input CSVs must be formatted with a template from a public Google Sheet; see README or vignette for more information. This tool is used by the SDLE Research Center at Case Western Reserve University to create and visualize material science ontologies, and it includes example ontologies to demonstrate its capabilities. This work was supported by the U.S. Department of Energy’s Office of Energy Efficiency and Renewable Energy (EERE) under Solar Energy Technologies Office (SETO) Agreement Numbers E-EE0009353 and DE-EE0009347, Department of Energy (National Nuclear Security Administration) under Award Number DE-NA0004104 and Contract number B647887, and U.S. National Science Foundation Award under Award Number 2133576.

Maintained by Roger H French. Last updated 9 months ago.

2.4 match 2.00 score 8 scripts

bioc

RBGL:An interface to the BOOST graph library

A fairly extensive and comprehensive interface to the graph algorithms contained in the BOOST library.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

graphandnetwork network cpp

0.6 match 8.59 score 320 scripts 132 dependents

alarm-redist

redistmetrics:Redistricting Metrics

Reliable and flexible tools for scoring redistricting plans using common measures and metrics. These functions provide key direct access to tools useful for non-simulation analyses of redistricting plans, such as for measuring compactness or partisan fairness. Tools are designed to work with the 'redist' package seamlessly.

Maintained by Christopher T. Kenny. Last updated 9 months ago.

openblas cpp

0.5 match 10 stars 7.57 score 23 scripts 2 dependents

cran

hudr:Providing Data from the US Department of Housing and Urban Development

Provides functions to access data from the US Department of Housing and Urban Development <https://www.huduser.gov/portal/dataset/fmr-api.html>.

Maintained by Paul Richardson. Last updated 2 years ago.

3.3 match 1.15 score 14 scripts

pauljohn32

rockchalk:Regression Estimation and Presentation

A collection of functions for interpretation and presentation of regression analysis. These functions are used to produce the statistics lectures in <https://pj.freefaculty.org/guides/>. Includes regression diagnostics, regression tables, and plots of interactions and "moderator" variables. The emphasis is on "mean-centered" and "residual-centered" predictors. The vignette 'rockchalk' offers a fairly comprehensive overview. The vignette 'Rstyle' has advice about coding in R. The package title 'rockchalk' refers to our school motto, 'Rock Chalk Jayhawk, Go K.U.'.

Maintained by Paul E. Johnson. Last updated 3 years ago.

0.5 match 7.13 score 584 scripts 18 dependents

bioc

ATACseqQC:ATAC-seq Quality Control

ATAC-seq, an assay for Transposase-Accessible Chromatin using sequencing, is a rapid and sensitive method for chromatin accessibility analysis. It was developed as an alternative method to MNase-seq, FAIRE-seq and DNAse-seq. Compared to the other methods, ATAC-seq requires less biological sample material and less time to process. In the process of analyzing several ATAC-seq datasets produced in our labs, we learned some of the unique aspects of the quality assessment for ATAC-seq data. To help users quickly assess whether their ATAC-seq experiment is successful, we developed the ATACseqQC package, partially following the guideline published in Nature Methods 2013 (Greenleaf et al.), including diagnostic plots of fragment size distribution, proportion of mitochondrial reads, nucleosome positioning pattern, and CTCF or other transcription factor footprints.

Maintained by Jianhong Ou. Last updated 2 months ago.

sequencing dnaseq atacseq generegulation qualitycontrol coverage nucleosomepositioning immunooncology

0.5 match 7.12 score 146 scripts 1 dependents

f0nzie

rTorch:R Bindings to 'PyTorch'

'R' implementation and interface of the Machine Learning platform 'PyTorch' <https://pytorch.org/> developed in 'Python'. It requires a 'conda' environment with 'torch' and 'torchvision' Python packages to provide 'PyTorch' functions, methods and classes. The key object in 'PyTorch' is the tensor which is in essence a multidimensional array. These tensors are fairly flexible in performing calculations in CPUs as well as 'GPUs' to accelerate tensor operations.

Maintained by Alfonso R. Reyes. Last updated 3 years ago.

0.5 match 6 stars 5.97 score 157 scripts

markusul

SDModels:Spectrally Deconfounded Models

Screen for and analyze non-linear sparse direct effects in the presence of unobserved confounding using the spectral deconfounding techniques (Ćevid, Bühlmann, and Meinshausen (2020) <jmlr.org/papers/v21/19-545.html>, Guo, Ćevid, and Bühlmann (2022) <doi:10.1214/21-AOS2152>). These methods have been shown to provide a good estimate of the true direct effect if we observe many covariates, e.g., high-dimensional settings, and we have fairly dense confounding. Even if the assumptions are violated, it seems like there is not much to lose, and the deconfounded models will, in general, estimate a function closer to the true one than classical least squares optimization. 'SDModels' provides functions SDAM() for Spectrally Deconfounded Additive Models (Scheidegger, Guo, and Bühlmann (2025) <doi:10.1145/3711116>) and SDForest() for Spectrally Deconfounded Random Forests (Ulmer, Scheidegger, and Bühlmann (2025) <doi:10.48550/arXiv.2502.03969>).

Maintained by Markus Ulmer. Last updated 6 days ago.

0.5 match 2 stars 5.67 score 15 scripts

mstrimas

colorist:Coloring Wildlife Distributions in Space-Time

Color and visualize wildlife distributions in space-time using raster data. In addition to enabling display of sequential change in distributions through the use of small multiples, 'colorist' provides functions for extracting several features of interest from a sequence of distributions and for visualizing those features using HCL (hue-chroma-luminance) color palettes. Resulting maps allow for "fair" visual comparison of intensity values (e.g., occurrence, abundance, or density) across space and time and can be used to address questions about where, when, and how consistently a species, group, or individual is likely to be found.

Maintained by Matthew Strimas-Mackey. Last updated 11 months ago.

0.5 match 14 stars 5.60 score 19 scripts

frenchrh

klovan:Geostatistics Methods and Klovan Data

A comprehensive set of geostatistical, visual, and analytical methods, in conjunction with the expanded version of the acclaimed J.E. Klovan's mining dataset, are included in 'klovan'. This makes the package an excellent learning resource for Principal Component Analysis (PCA), Factor Analysis (FA), kriging, and other geostatistical techniques. Originally published in the 1976 book 'Geological Factor Analysis', the included mining dataset was assembled by Professor J. E. Klovan of the University of Calgary. Being one of the first applications of FA in the geosciences, this dataset has significant historical importance. As a well-regarded and published dataset, it is an excellent resource for demonstrating the capabilities of PCA, FA, kriging, and other geostatistical techniques in geosciences. For those interested in these methods, the 'klovan' datasets provide a valuable and illustrative resource. Note that some methods require the 'RGeostats' package. Please refer to the README or Additional_repositories for installation instructions. This material is based upon research in the Materials Data Science for Stockpile Stewardship Center of Excellence (MDS3-COE), and supported by the Department of Energy's National Nuclear Security Administration under Award Number DE-NA0004104.

Maintained by Roger H French. Last updated 1 year ago.

1.1 match 2.48 score

epivec

TDLM:Systematic Comparison of Trip Distribution Laws and Models

The main purpose of this package is to propose a rigorous framework to fairly compare trip distribution laws and models as described in Lenormand et al. (2016) <doi:10.1016/j.jtrangeo.2015.12.008>.

Maintained by Maxime Lenormand. Last updated 11 days ago.

0.5 match 2 stars 4.85 score 3 scripts

cfhammill

lenses:Elegant Data Manipulation with Lenses

Provides tools for creating and using lenses to simplify data manipulation. Lenses are composable getter/setter pairs for working with data in a purely functional way. Inspired by the 'Haskell' library 'lens' (Kmett, 2012) <https://hackage.haskell.org/package/lens>. For a fairly comprehensive (and highly technical) history of lenses please see the 'lens' wiki <https://github.com/ekmett/lens/wiki/History-of-Lenses>.

Maintained by Chris Hammill. Last updated 6 years ago.

functional-programming

0.5 match 27 stars 4.75 score 42 scripts

jatanrt

eprscope:Processing and Analysis of Electron Paramagnetic Resonance Data and Spectra in Chemistry

Processing, analysis and plotting of Electron Paramagnetic Resonance (EPR) spectra in chemistry. Even though the package is mainly focused on continuous wave (CW) EPR/ENDOR, many functions may also be used for the integrated forms of 1D PULSED EPR spectra. It is able to find the most important spectral characteristics like g-factor, linewidth, maximum of derivative or integral intensities and single/double integrals. This is especially important in spectral (time) series consisting of many EPR spectra, such as those recorded during variable temperature experiments, electrochemical or photochemical radical generation and/or decay. The package also enables processing of data/spectra for analytical (quantitative) purposes, namely determining how many radicals or paramagnetic centers can be found in the analyte/sample. The goal is to evaluate rate constants, considering different kinetic models, to describe the radical reactions. The key feature of the package resides in processing of the universal ASCII text formats (such as '.txt', '.csv' or '.asc') from scratch. No proprietary formats are used (except the MATLAB EasySpin outputs) and in this respect the package is in accordance with the FAIR data principles. Upon 'reading' (also providing automatic procedures for the most common EPR spectrometers) the spectral data are transformed into the universal R 'data frame' format. Subsequently, the EPR spectra can be visualized and are fully consistent either with the 'ggplot2' package or with the interactive formats based on 'plotly'. Additionally, simulations and fitting of the isotropic EPR spectra are also included in the package. Advanced simulation parameters provided by the MATLAB-EasySpin toolbox and results from the quantum chemical calculations like g-factor and hyperfine splitting/coupling constants (a/A) can be compared and summarized in table format in order to analyze the EPR spectra in the most effective way.

Maintained by Ján Tarábek. Last updated 9 hours ago.

openjdk

0.5 match 4.74 score 7 scripts

regisoc

kibior:A Simple Data Management and Sharing Tool

An interface to store, retrieve, search, join and share datasets, based on Elasticsearch (ES) API. As a decentralized, FAIR and collaborative search engine and database effort, it proposes a simple push/pull/search mechanism based only on ES, a tool which can be deployed on nearly any hardware. It is a high-level R-ES binding to ease data usage using 'elastic' package (S. Chamberlain (2020)) <https://docs.ropensci.org/elastic/>, extends joins from 'dplyr' package (H. Wickham et al. (2020)) <https://dplyr.tidyverse.org/> and integrates specific biological format importation with Bioconductor packages such as 'rtracklayer' (M. Lawrence et al. (2009) <doi:10.1093/bioinformatics/btp328>) <http://bioconductor.org/packages/rtracklayer>, 'Biostrings' (H. Pagès et al. (2020) <doi:10.18129/B9.bioc.Biostrings>) <http://bioconductor.org/packages/Biostrings>, and 'Rsamtools' (M. Morgan et al. (2020) <doi:10.18129/B9.bioc.Rsamtools>) <http://bioconductor.org/packages/Rsamtools>, but also a long list of more common ones with 'rio' (C-h. Chan et al. (2018)) <https://cran.r-project.org/package=rio>.

Maintained by Régis Ongaro-Carcy. Last updated 4 years ago.

dataimport datarepresentation thirdpartyclient data-science database datasets elasticsearch elasticsearch-client push-pull search search-engine

0.5 match 3 stars 4.48 score 8 scripts

rcst

rim:Interface to 'Maxima', Enabling Symbolic Computation

An interface to the powerful and fairly complete computer algebra system 'Maxima'. It can be used to start and control 'Maxima' from within R by entering 'Maxima' commands. Results from 'Maxima' can be parsed and evaluated in R. It facilitates outputting results from 'Maxima' in 'LaTeX' and 'MathML'. 2D and 3D plots can be displayed directly. This package also registers a 'knitr'-engine enabling 'Maxima' code chunks to be written in 'RMarkdown' documents.

Maintained by Eric Stemmler. Last updated 5 months ago.

computer-algebra-system cpp

0.5 match 11 stars 4.34 score 10 scripts

hrbrmstr

docxtractr:Extract Data Tables and Comments from 'Microsoft' 'Word' Documents

'Microsoft Word' 'docx' files provide an 'XML' structure that is fairly straightforward to navigate, especially when it applies to 'Word' tables and comments. Tools are provided to determine table count/structure, comment count and also to extract/clean tables and comments from 'Microsoft Word' 'docx' documents. There is also nascent support for '.doc' files.

Maintained by Bob Rudis. Last updated 5 years ago.

0.5 match 4.05 score 193 scripts

gawainantell

divvy:Spatial Subsampling of Biodiversity Occurrence Data

Divide taxonomic occurrence data into geographic regions of fair comparison, with three customisable methods to standardise area and extent. Calculate common biodiversity and range-size metrics on subsampled data. Background theory and practical considerations for the methods are described in Antell and others (2023) <doi:10.31223/X5997Z>.

Maintained by Gawain Antell. Last updated 1 year ago.

0.5 match 4.00 score 10 scripts

ellakaye

BradleyTerryScalable:Fits the Bradley-Terry Model to Potentially Large and Sparse Networks of Comparison Data

Facilities are provided for fitting the simple, unstructured Bradley-Terry model to networks of binary comparisons. The implemented methods are designed to scale well to large, potentially sparse, networks. A fairly high degree of scalability is achieved through the use of EM and MM algorithms, which are relatively undemanding in terms of memory usage (relative to some other commonly used methods such as iterative weighted least squares, for example). Both maximum likelihood and Bayesian MAP estimation methods are implemented. The package provides various standard methods for a newly defined 'btfit' model class, such as the extraction and summarisation of model parameters and the simulation of new datasets from a fitted model. Tools are also provided for reshaping data into the newly defined "btdata" class, and for analysing the comparison network, prior to fitting the Bradley-Terry model. This package complements, rather than replaces, the existing 'BradleyTerry2' package. (BradleyTerry2 has rather different aims, which are mainly the specification and fitting of "structured" Bradley-Terry models in which the strength parameters depend on covariates.)

Maintained by Ella Kaye. Last updated 3 years ago.

openblas cpp openmp

0.5 match 25 stars 3.80 score 25 scripts

djbetancourt-gh

funGp:Gaussian Process Models for Scalar and Functional Inputs

Construction and smart selection of Gaussian process models for analysis of computer experiments with emphasis on treatment of functional inputs that are regularly sampled. This package offers: (i) flexible modeling of functional-input regression problems through the fairly general Gaussian process model; (ii) built-in dimension reduction for functional inputs; (iii) heuristic optimization of the structural parameters of the model (e.g., active inputs, kernel function, type of distance). An in-depth tutorial on the use of funGp is provided in Betancourt et al. (2024) <doi:10.18637/jss.v109.i05> and metamodeling background is provided in Betancourt et al. (2020) <doi:10.1016/j.ress.2020.106870>. The algorithm for structural parameter optimization is described in <https://hal.science/hal-02532713>.

Maintained by Jose Betancourt. Last updated 10 months ago.

0.5 match 4 stars 3.78 score 2 scripts

cran

SSLR:Semi-Supervised Classification, Regression and Clustering Methods

Provides a collection of techniques for semi-supervised classification, regression and clustering. In a semi-supervised problem, both labeled and unlabeled data are used to train a classifier. The package includes a collection of semi-supervised learning techniques: self-training, co-training, democratic, decision tree, random forest, 'S3VM', etc., with a fairly intuitive interface that is easy to use.

Maintained by Francisco Jesús Palomares Alabarce. Last updated 4 years ago.

cpp

0.5 match 1 stars 3.64 score 73 scripts

amarnathbose

AHPtools:Consistency in the Analytic Hierarchy Process

A Swiss Army knife of utility functions for users of the Analytic Hierarchy Process (AHP), which help you to assess the consistency of a PCM and improve its consistency ratio, to compute the sensitivity of a PCM, to create a logical, not a random, PCM from the preferences you provide for the alternatives, and to evaluate the actual consistency of a PCM based on objective, fair benchmarking. The various functions in the toolkit additionally give users the flexibility to specify only the upper triangular comparison ratios of the PCM in order to perform their assigned tasks.

Maintained by Amarnath Bose. Last updated 2 years ago.

0.5 match 3.00 score 3 scripts

cran

valection:Sampler for Verification Studies

A binding for the 'valection' program which offers various ways to sample the outputs of competing algorithms or parameterizations, and fairly assess their performance against each other. The 'valection' C library is required to use this package and can be downloaded from: <http://labs.oicr.on.ca/boutros-lab/software/valection>. Cooper CI, et al; Valection: Design Optimization for Validation and Verification Studies; bioRxiv 2018; <doi:10.1101/254839>.

Maintained by Paul C. Boutros. Last updated 7 years ago.

0.5 match 2.00 score

cran

mlr3summary:Model and Learner Summaries for 'mlr3'

Concise and interpretable summaries for machine learning models and learners of the 'mlr3' ecosystem. The package takes inspiration from the summary function for (generalized) linear models but extends it to non-parametric machine learning models, based on generalization performance, model complexity, feature importances and effects, and fairness metrics.

Maintained by Susanne Dandl. Last updated 11 months ago.

0.5 match 1.70 score 6 scripts

opkoutchade

winputall:Variable Input Allocation Among Crops

Using a time-varying random parameters model developed in Koutchade et al. (2024) <https://hal.science/hal-04318163>, this package allows allocating variable input costs among crops produced by farmers based on panel data including information on input expenditure aggregated at the farm level and acreage shares. It also handles the weighting data fairly and allows integrating time-varying and time-constant control variables.

Maintained by Obafèmi Philippe Koutchade. Last updated 9 months ago.

cpp

0.5 match 1.70 score

cran

ShinyTester:Functions to Minimize Bonehead Moves While Working with 'shiny'

It's my experience that working with 'shiny' is intuitive once you're into it, but can be quite daunting at first. Several common mistakes are fairly predictable, and therefore we can control for these. The functions in this package help match up the assets listed in the UI and the SERVER files, and visualize the ad hoc structure of the 'shiny' App.

Maintained by Amit Kohli. Last updated 8 years ago.

0.5 match 1.00 score

cran

predfairness:Discrimination Mitigation for Machine Learning Models

Based on different statistical definitions of discrimination, several methods have been proposed to detect and mitigate social inequality in machine learning models. This package aims to provide an alternative to fairness treatment in predictive models. The ROC method implemented in this package is described by Kamiran, Karim and Zhang (2012) <https://ieeexplore.ieee.org/document/6413831/>.

Maintained by Thaís de Bessa Gontijo de Oliveira. Last updated 4 years ago.

0.5 match 1.00 score

cran

essHist:The Essential Histogram

Provide an optimal histogram, in the sense of probability density estimation and feature detection, by means of multiscale variational inference. In other words, the resulting histogram serves as an optimal density estimator, and meanwhile recovers the features, such as increases or modes, with both false positive and false negative controls. Moreover, it provides a parsimonious representation in terms of the number of blocks, which simplifies data interpretation. The only assumption for the method is that data points are independent and identically distributed, so it applies to fairly general situations, including continuous distributions, discrete distributions, and mixtures of both. For details see Li, Munk, Sieling and Walther (2016) <arXiv:1612.07216>.

Maintained by Housen Li. Last updated 6 years ago.

cpp

0.5 match 1 stars 1.00 score 6 scripts

juan-goncalves-dosantos

ProjectManagement:Management of Deterministic and Stochastic Projects

Management problems of deterministic and stochastic projects. It obtains the duration of a project and the appropriate slack for each activity in a deterministic context. In addition, it obtains a schedule of activities' time (Castro, Gómez & Tejada (2007) <doi:10.1016/j.orl.2007.01.003>). It also allows the management of resources. When the project is done and the actual duration of each activity is known, it can determine how long the project was delayed and make a fair allocation of the delay among the activities (Bergantiños, Valencia-Toledo & Vidal-Puga (2018) <doi:10.1016/j.dam.2017.08.012>). In a stochastic context it can estimate the average duration of the project and plot the density of this duration, as well as the density of the early and last times of the chosen activities. As in the deterministic case, it can make a distribution of the delay generated by observing the project already carried out.

Maintained by Juan Carlos Gonçalves Dosantos. Last updated 5 months ago.

0.5 match 1.00 score 9 scripts