Showing 67 of 67 results
kozodoi
fairness:Algorithmic Fairness Metrics
Offers calculation, visualization and comparison of algorithmic fairness metrics. Fair machine learning is an emerging topic with the overarching aim of critically assessing whether ML algorithms reinforce existing social biases. Unfair algorithms can propagate such biases and produce predictions with a disparate impact on various sensitive groups of individuals (defined by sex, gender, ethnicity, religion, income, socioeconomic status, or physical or mental disability). Fair algorithms rest on the principle that these groups should be treated similarly or receive similar prediction outcomes. The fairness R package calculates and compares both commonly and less commonly used fairness metrics across population subgroups. These methods are described by Calders and Verwer (2010) <doi:10.1007/s10618-010-0190-x>, Chouldechova (2017) <doi:10.1089/big.2016.0047>, Feldman et al. (2015) <doi:10.1145/2783258.2783311>, Friedler et al. (2018) <doi:10.1145/3287560.3287589> and Zafar et al. (2017) <doi:10.1145/3038912.3052660>. The package also offers convenient visualizations to help understand fairness metrics.
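As a rough illustration of a subgroup comparison, the sketch below computes demographic (statistical) parity: the share of positive predictions in each group divided by the share in a chosen base group. It is plain Python written for this listing, not the fairness package's R API; the function name, the 0.5 threshold and the toy data are assumptions.

```python
from collections import defaultdict

# Illustrative sketch; demographic_parity is a hypothetical name, not the package API.
def demographic_parity(groups, probs, base, threshold=0.5):
    """Ratio of each group's positive-prediction rate to the base group's rate.

    groups    -- sensitive-group label for each individual
    probs     -- predicted probability of the positive class for each individual
    base      -- reference group (ratio 1.0 by construction; must have rate > 0)
    threshold -- cutoff that turns probabilities into positive/negative predictions
    """
    positives = defaultdict(int)
    totals = defaultdict(int)
    for g, p in zip(groups, probs):
        totals[g] += 1
        positives[g] += p >= threshold  # bool counts as 0 or 1
    rates = {g: positives[g] / totals[g] for g in totals}
    return {g: rates[g] / rates[base] for g in rates}

groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
probs = [0.9, 0.8, 0.7, 0.2, 0.6, 0.3, 0.2, 0.1]
print(demographic_parity(groups, probs, base="a"))
```

A ratio well below 1 signals disparity on this one metric only; error-rate based metrics can disagree with it.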
Maintained by Nikita Kozodoi. Last updated 2 years ago.
Topics: algorithmic-discrimination, algorithmic-fairness, discrimination, disparate-impact, fairness, fairness-ai, fairness-ml, machine-learning
94.7 match · 32 stars · 6.82 score · 69 scripts · 1 dependent
modeloriented
fairmodels:Flexible Tool for Bias Detection, Visualization, and Mitigation
Measure fairness metrics in one place for many models. Check how large a model's bias is toward different races, sexes, nationalities, etc. Use measures such as Statistical Parity and Equal Odds to detect discrimination against unprivileged groups. Visualize the bias using a heatmap, radar plot, biplot, bar chart (and more!). Various pre-processing and post-processing bias mitigation algorithms are implemented. The package also supports calculating fairness metrics for regression models. More details are given in Wiśniewski and Biecek (2021) <arXiv:2104.00507>.
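Equal Odds asks that true-positive and false-positive rates match across groups. A minimal sketch in plain Python, not fairmodels' API; the function names and toy labels are hypothetical, and each group is assumed to contain both classes.

```python
# Illustrative sketch; rates and equal_odds_gaps are hypothetical names.
def rates(y_true, y_pred):
    """Return (true-positive rate, false-positive rate) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

def equal_odds_gaps(y_true, y_pred, groups, privileged):
    """Per-group (TPR, FPR) differences relative to the privileged group."""
    by_group = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        by_group[g] = rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
    tpr0, fpr0 = by_group[privileged]
    return {g: (tpr - tpr0, fpr - fpr0) for g, (tpr, fpr) in by_group.items()}

y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["m"] * 4 + ["f"] * 4
print(equal_odds_gaps(y_true, y_pred, groups, privileged="m"))
```

The sketch reports differences; ratio-based comparisons against the privileged group (for example, the four-fifths rule) are a common alternative.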
Maintained by Jakub Wiśniewski. Last updated 2 months ago.
Topics: explain-classifiers, explainable-ml, fairness, fairness-comparison, fairness-ml, model-evaluation
73.4 match · 86 stars · 7.72 score · 51 scripts · 1 dependent
modeloriented
DALEX:moDel Agnostic Language for Exploration and eXplanation
Any unverified black box model is the path to failure. Opaqueness leads to distrust. Distrust leads to ignoration. Ignoration leads to rejection. The DALEX package X-rays any model and helps to explore and explain its behaviour. Machine Learning (ML) models are widely used and have various applications in classification or regression. Models created with boosting, bagging, stacking or similar techniques are often used due to their high performance. But such black-box models usually lack direct interpretability. The DALEX package contains various methods that help to understand the link between input variables and model output. The implemented methods help to explore the model at the level of a single instance as well as at the level of the whole dataset. All model explainers are model agnostic and can be compared across different models. The DALEX package is the cornerstone of the 'DrWhy.AI' universe of packages for visual model exploration. More details are given in Biecek (2018) <https://jmlr.org/papers/v19/18-416.html>.
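"Model agnostic" means an explainer only needs a way to call the model's predict function. Permutation variable importance, which measures how much a loss grows when one input column is shuffled, is a classic method of this kind. The sketch below is plain Python for illustration, not DALEX's API; the toy model, the MSE loss and the function names are assumptions.

```python
import random

# Illustrative sketch; permutation_importance and black_box are hypothetical names.
def mse(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def permutation_importance(predict, X, y, n_features, seed=0):
    """Increase in MSE after shuffling each feature column; only calls predict()."""
    rng = random.Random(seed)
    baseline = mse(y, predict(X))
    importances = []
    for j in range(n_features):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(mse(y, predict(X_perm)) - baseline)
    return importances

# Toy "black box": uses feature 0 only; feature 1 is ignored.
def black_box(X):
    return [2.0 * row[0] for row in X]

data_rng = random.Random(42)
X = [[data_rng.uniform(0, 1), data_rng.uniform(0, 1)] for _ in range(50)]
y = black_box(X)
print(permutation_importance(black_box, X, y, n_features=2))
```

The ignored feature gets zero importance because shuffling it never changes a prediction.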
Maintained by Przemyslaw Biecek. Last updated 1 month ago.
Topics: black-box, dalex, data-science, explainable-ai, explainable-artificial-intelligence, explainable-ml, explanations, explanatory-model-analysis, fairness, iml, interpretability, interpretable-machine-learning, machine-learning, model-visualization, predictive-modeling, responsible-ai, responsible-ml, xai
10.0 match · 1.4k stars · 13.40 score · 876 scripts · 21 dependents
dplecko
fairadapt:Fair Data Adaptation with Quantile Preservation
An implementation of the fair data adaptation with quantile preservation described in Plecko & Meinshausen (2019) <arXiv:1911.06685>. The adaptation procedure uses the specified causal graph to pre-process the given training and testing data in such a way as to remove the bias caused by the protected attribute. The procedure uses tree ensembles for quantile regression.
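The core idea of quantile preservation can be shown in one dimension: a value is replaced by the value at the same empirical quantile of the reference group's distribution. The sketch below uses plain empirical quantiles with no causal graph and no tree-ensemble quantile regression, so it is a deliberate simplification, not fairadapt's implementation; function names and data are hypothetical.

```python
# Illustrative 1-D sketch; adapt_value is a hypothetical name, not fairadapt's API.
def count_at_or_below(sample, value):
    """Number of sample points less than or equal to value."""
    return sum(1 for s in sample if s <= value)

def adapt_value(value, source_group, target_group):
    """Map value to the same empirical quantile of target_group."""
    n, m = len(source_group), len(target_group)
    count = count_at_or_below(source_group, value)   # quantile u = count / n
    # Smallest target order statistic whose quantile is >= u (integer ceiling).
    index = max(0, -(-count * m // n) - 1)
    return sorted(target_group)[index]

disadvantaged = [1.0, 2.0, 3.0, 4.0, 5.0]
reference = [10.0, 20.0, 30.0, 40.0, 50.0]
print([adapt_value(x, disadvantaged, reference) for x in disadvantaged])
```

Each individual keeps their rank within their own group, which is what the quantile-preservation property refers to.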
Maintained by Drago Plecko. Last updated 2 years ago.
Topics: causal-inference, fairness, machine-learning
22.5 match · 2 stars · 4.63 score · 43 scripts
eblondel
zen4R:Interface to 'Zenodo' REST API
Provides an interface to the 'Zenodo' (<https://zenodo.org>) REST API, including management of depositions, attribution of DOIs by 'Zenodo', and upload and download of files.
Maintained by Emmanuel Blondel. Last updated 17 days ago.
Topics: api, datacite, depositions, deposits, doi, fair, zenodo
11.0 match · 46 stars · 8.41 score · 76 scripts · 1 dependent
kjhealy
gssrdoc:Document General Social Survey Variable
The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS see <http://gss.norc.org>.
Maintained by Kieran Healy. Last updated 11 months ago.
36.9 match · 2.28 score · 38 scripts
fairdatapipeline
rDataPipeline:Functions to Interact with the 'FAIR Data Pipeline'
R implementation of the 'FAIR Data Pipeline API'. The 'FAIR Data Pipeline' is intended to enable tracking of provenance of FAIR (findable, accessible, interoperable and reusable) data used in epidemiological modelling.
Maintained by Ryan Field. Last updated 4 months ago.
15.0 match · 4 stars · 4.52 score · 11 scripts
justinmshea
wooldridge:115 Data Sets from "Introductory Econometrics: A Modern Approach, 7e" by Jeffrey M. Wooldridge
Students learning both econometrics and R may find the introduction to both challenging. The wooldridge data package aims to lighten the task by efficiently loading any data set found in the text with a single command. Data sets have been compressed to a fraction of their original size. Documentation files contain page numbers, the original source, time of publication, and notes from the author suggesting avenues for further analysis and research. If one needs an introduction to R model syntax, a vignette contains solutions to examples from chapters of the text. Data sets are from the 7th edition (Wooldridge 2020, ISBN-13 978-1-337-55886-0), and are backwards compatible with all previous versions of the text.
Maintained by Justin M. Shea. Last updated 3 months ago.
6.6 match · 203 stars · 9.38 score · 1.4k scripts
spatstat
spatstat.utils:Utility Functions for 'spatstat'
Contains utility functions for the 'spatstat' family of packages which may also be useful for other purposes.
Maintained by Adrian Baddeley. Last updated 2 days ago.
Topics: spatial-analysis, spatial-data, spatstat
4.5 match · 5 stars · 11.66 score · 134 scripts · 248 dependents
ethanbass
chromConverter:Chromatographic File Converter
Reads chromatograms from binary formats into R objects. Currently supports conversion of 'Agilent ChemStation', 'Agilent MassHunter', 'Shimadzu LabSolutions', 'ThermoRaw', and 'Varian Workstation' files as well as various text-based formats. In addition to its internal parsers, chromConverter contains bindings to parsers in external libraries, such as 'Aston' <https://github.com/bovee/aston>, 'Entab' <https://github.com/bovee/entab>, 'rainbow' <https://rainbow-api.readthedocs.io/>, and 'ThermoRawFileParser' <https://github.com/compomics/ThermoRawFileParser>.
Maintained by Ethan Bass. Last updated 11 hours ago.
Topics: cheminformatics, chromatography, fair-data, gc-fid, hplc, hplc-dad, hplc-uv, metabolomics, metabolomics-data, open-data, open-science
7.5 match · 33 stars · 6.38 score · 16 scripts · 2 dependents
tomasfryda
h2o:R Interface for the 'H2O' Scalable Machine Learning Platform
R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
Maintained by Tomas Fryda. Last updated 1 year ago.
5.3 match · 3 stars · 8.20 score · 7.8k scripts · 11 dependents
rudeboybert
fivethirtyeight:Data and Code Behind the Stories and Interactives at 'FiveThirtyEight'
Datasets and code published by the data journalism website 'FiveThirtyEight' available at <https://github.com/fivethirtyeight/data>. Note that while we received guidance from editors at 'FiveThirtyEight', this package is not officially published by 'FiveThirtyEight'.
Maintained by Albert Y. Kim. Last updated 2 years ago.
Topics: data-science, datajournalism, fivethirtyeight, statistics
3.3 match · 453 stars · 10.98 score · 1.7k scripts
alanarnholt
BSDA:Basic Statistics and Data Analysis
Data sets for the book "Basic Statistics and Data Analysis" by Larry J. Kitchens.
Maintained by Alan T. Arnholt. Last updated 2 years ago.
3.5 match · 7 stars · 9.11 score · 1.3k scripts · 6 dependents
cran
fairml:Fair Models in Machine Learning
Fair machine learning regression models that take sensitive attributes into account in model estimation. Currently implements Komiyama et al. (2018) <http://proceedings.mlr.press/v80/komiyama18a/komiyama18a.pdf>, Zafar et al. (2019) <https://www.jmlr.org/papers/volume20/18-262/18-262.pdf> and my own approach from Scutari, Panero and Proissl (2022) <https://link.springer.com/content/pdf/10.1007/s11222-022-10143-w.pdf>, which uses ridge regression to enforce fairness.
Maintained by Marco Scutari. Last updated 2 years ago.
20.4 match · 1 star · 1.52 score · 1 dependent
koenderks
jfa:Statistical Methods for Auditing
Provides statistical methods for auditing as implemented in JASP for Audit (Derks et al., 2021 <doi:10.21105/joss.02733>). First, the package makes it easy for an auditor to plan a statistical sample, select the sample from the population, and evaluate the misstatement in the sample in compliance with international auditing standards. Second, the package provides statistical methods for auditing data, including tests of digit distributions and repeated values. Finally, the package includes methods for auditing algorithms for fairness and bias. Alongside classical statistical methodology, the package implements Bayesian equivalents of these methods whose statistical underpinnings are described in Derks et al. (2021) <doi:10.1111/ijau.12240>, Derks et al. (2024) <doi:10.2308/AJPT-2021-086>, Derks et al. (2022) <doi:10.31234/osf.io/8nf3e>, Derks et al. (2024) <doi:10.31234/osf.io/tgq5z>, and Derks et al. (2025) <doi:10.31234/osf.io/b8tu2>.
Maintained by Koen Derks. Last updated 20 hours ago.
Topics: algorithm-auditing, audit, audit-sampling, bayesian, data-auditing, jasp, jasp-for-audit, statistical-audit, statistics, cpp
4.1 match · 8 stars · 6.69 score · 17 scripts
sooahnshin
aihuman:Experimental Evaluation of Algorithm-Assisted Human Decision-Making
Provides statistical methods for analyzing experimental evaluation of the causal impacts of algorithmic recommendations on human decisions developed by Imai, Jiang, Greiner, Halen, and Shin (2023) <doi:10.1093/jrsssa/qnad010> and Ben-Michael, Greiner, Huang, Imai, Jiang, and Shin (2024) <doi:10.48550/arXiv.2403.12108>. The data used for this paper, and made available here, are interim, based on only half of the observations in the study and (for those observations) only half of the study follow-up period. We use them only to illustrate methods, not to draw substantive conclusions.
Maintained by Sooahn Shin. Last updated 3 months ago.
5.3 match · 2 stars · 4.60 score · 8 scripts
hneth
ds4psy:Data Science for Psychologists
All datasets and functions required for the examples and exercises of the book "Data Science for Psychologists" (by Hansjoerg Neth, Konstanz University, 2023), freely available at <https://bookdown.org/hneth/ds4psy/>. The book and course introduce principles and methods of data science to students of psychology and other biological or social sciences. The 'ds4psy' package primarily provides datasets, but also functions for data generation and manipulation (e.g., of text and time data) and graphics that are used in the book and its exercises. All functions included in 'ds4psy' are designed to be explicit and instructive, rather than efficient or elegant.
Maintained by Hansjoerg Neth. Last updated 1 month ago.
Topics: data-literacy, data-science, education, exploratory-data-analysis, psychology, social-sciences, visualisation
3.4 match · 22 stars · 6.79 score · 70 scripts
schochastics
networkdata:Repository of Network Datasets
The package contains a large collection of network datasets from different contexts, including social networks, animal networks and movie networks. All datasets are in 'igraph' format.
Maintained by David Schoch. Last updated 12 months ago.
4.5 match · 143 stars · 5.01 score · 143 scripts
jonasbhend
easyVerification:Ensemble Forecast Verification for Large Data Sets
A set of tools to simplify the application of atomic forecast verification metrics for (comparative) verification of ensemble forecasts on large data sets. The forecast metrics are imported from the 'SpecsVerification' package, and additional forecast metrics are provided with this package. Alternatively, new user-defined forecast scores can be implemented using the example scores provided and applied using the functionality of this package.
Maintained by Jonas Bhend. Last updated 2 years ago.
3.5 match · 1 star · 6.04 score · 61 scripts · 4 dependents
andrija-djurovic
PDtoolkit:Collection of Tools for PD Rating Model Development and Validation
The goal of this package is to cover the most common steps in probability of default (PD) rating model development and validation. The main procedures available are those that refer to univariate, bivariate and multivariate analysis, calibration and validation. Along with the accompanying 'monobin' and 'monobinShiny' packages, 'PDtoolkit' provides functions suitable for various data transformation and modeling tasks such as imputation, monotonic binning of numeric risk factors, binning of categorical risk factors, weights of evidence (WoE) and information value (IV) calculations, WoE coding (replacement of risk factor modalities with WoE values), risk factor clustering, area under curve (AUC) calculation and others. Additionally, the package provides a set of validation functions for testing the homogeneity, heterogeneity, discriminatory and predictive power of the model.
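For a binned risk factor, the usual formulas are WoE = ln(share of goods / share of bads) per bin and IV = sum over bins of (share of goods - share of bads) x WoE. Some sources flip the WoE ratio (bads over goods); IV is the same either way. The sketch below is plain Python for illustration, not PDtoolkit's API; the bin counts are toy values, and bins with zero goods or bads would need smoothing.

```python
import math

# Illustrative sketch; woe_iv is a hypothetical name, not PDtoolkit's API.
def woe_iv(goods, bads):
    """Weight of evidence per bin and total information value.

    goods, bads -- counts of non-defaulted and defaulted accounts in each bin.
    Uses WoE = ln(%goods / %bads); assumes no bin has a zero count.
    """
    total_good, total_bad = sum(goods), sum(bads)
    woe, iv = [], 0.0
    for g, b in zip(goods, bads):
        dist_good, dist_bad = g / total_good, b / total_bad
        w = math.log(dist_good / dist_bad)
        woe.append(w)
        iv += (dist_good - dist_bad) * w
    return woe, iv

woe, iv = woe_iv(goods=[400, 300, 300], bads=[20, 30, 50])
print([round(w, 3) for w in woe], round(iv, 3))
```

Every bin contributes a non-negative amount to IV, since the difference and the log ratio always share a sign.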
Maintained by Andrija Djurovic. Last updated 1 year ago.
4.3 match · 14 stars · 4.78 score · 86 scripts
svmiller
stevedata:Steve's Toy Data for Teaching About a Variety of Methodological, Social, and Political Topics
This is a collection of various kinds of data with broad uses for teaching. My students, and academics like me who teach the same topics I teach, should find this useful if their teaching workflow is also built around the R programming language. The applications are multiple but mostly cluster on topics of statistical methodology, international relations, and political economy.
Maintained by Steve Miller. Last updated 4 days ago.
3.3 match · 8 stars · 5.97 score · 178 scripts
squ4reanalytics
D4TAlink.light:FAIR Data - Workflow Management
Tools, methods and processes for the management of analysis workflows. These lightweight solutions facilitate structuring R&D activities. These solutions were developed to comply with Good Documentation Practice (GDP), with FAIR principles as discussed by Jacobsen et al. (2017) <doi:10.1162/dint_r_00024>, and with ALCOA+ principles as proposed by the U.S. FDA.
Maintained by Gregoire Thomas. Last updated 4 months ago.
3.6 match · 4.30 score
jamesliley
SPARRAfairness:Analysis of Differential Behaviour of SPARRA Score Across Demographic Groups
The SPARRA risk score (Scottish Patients At Risk of admission and Re-Admission) estimates yearly risk of emergency hospital admission using electronic health records on a monthly basis for most of the Scottish population. This package implements a suite of functions used to analyse the behaviour and performance of the score, focusing particularly on differential performance over demographically-defined groups. It includes utility functions to plot receiver operating characteristic, precision-recall and calibration curves, draw stock human figures, estimate counterfactual quantities without the need to re-compute risk scores, and simulate a semi-realistic dataset.
Maintained by James Liley. Last updated 4 months ago.
5.7 match · 2.70 score · 4 scripts
ycroissant
pglm:Panel Generalized Linear Models
Estimation of panel models for glm-like models: this includes binomial models (logit and probit), count models (poisson and negbin) and ordered models (logit and probit), as described in Baltagi (2013), Econometric Analysis of Panel Data <doi:10.1007/978-3-030-53953-5>; Hsiao (2014), Analysis of Panel Data <doi:10.1017/CBO9781139839327>; and Croissant and Millo (2018), Panel Data Econometrics with R <doi:10.1002/9781119504641>.
Maintained by Yves Croissant. Last updated 1 year ago.
3.4 match · 4.34 score · 158 scripts · 1 dependent
persimune
explainer:Machine Learning Model Explainer
It enables detailed interpretation of complex classification and regression models through Shapley analysis including data-driven characterization of subgroups of individuals. Furthermore, it facilitates multi-measure model evaluation, model fairness, and decision curve analysis. Additionally, it offers enhanced visualizations with interactive elements.
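Shapley analysis attributes a prediction to features by averaging each feature's marginal contribution over all feature orderings. For a handful of features it can be computed exactly. The sketch below does so for a toy model, filling "absent" features with baseline values (one common choice among several). It is plain Python for illustration, not the explainer package's API; the names and the model are hypothetical.

```python
from itertools import permutations
from math import factorial

# Illustrative sketch; shapley_values and model are hypothetical names.
def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x; features outside a coalition take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = predict(current)
        for j in order:
            current[j] = x[j]              # add feature j to the coalition
            value = predict(current)
            phi[j] += value - prev         # marginal contribution of j
            prev = value
    return [p / factorial(n) for p in phi]

def model(z):
    # Toy model with an interaction between features 1 and 2.
    return 3.0 * z[0] + 2.0 * z[1] * z[2]

x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi, sum(phi), model(x) - model(baseline))
```

The values sum to the prediction minus the baseline prediction. Exact enumeration grows factorially with the number of features, which is why practical tools rely on approximations.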
Maintained by Ramtin Zargari Marandi. Last updated 6 months ago.
Topics: ai, classification, clinical-research, explainability, explainable-ai, interpretability, machine-learning, regression, shap, statistics
2.5 match · 13 stars · 5.37 score · 12 scripts
blue-matter
MSEtool:Management Strategy Evaluation Toolkit
Development, simulation testing, and implementation of management procedures for fisheries (see Carruthers & Hordyk (2018) <doi:10.1111/2041-210X.13081>).
Maintained by Adrian Hordyk. Last updated 26 days ago.
1.8 match · 8 stars · 7.69 score · 163 scripts · 3 dependents
matloff
dsld:Data Science Looks at Discrimination
Statistical and graphical tools for detecting and measuring discrimination and bias, be it racial, gender, age or other. Detection and remediation of bias in machine learning algorithms. 'Python' interfaces available.
Maintained by Norm Matloff. Last updated 1 month ago.
1.5 match · 12 stars · 7.81 score · 35 scripts
fvafrcu
fritools:Utilities for the Forest Research Institute of the State Baden-Wuerttemberg
Miscellaneous utilities, tools and helper functions for finding and searching files on disk, searching for and removing R objects from the workspace. Does not import or depend on any third party package, but on core R only (i.e. it may depend on packages with priority 'base').
Maintained by Andreas Dominik Cullmann. Last updated 27 days ago.
1.9 match · 5.98 score · 4 scripts · 6 dependents
modeloriented
arenar:Arena for the Exploration and Comparison of any ML Models
Generates data for challenging machine learning models in 'Arena' <https://arena.drwhy.ai>, an interactive web application. You can start the server with XAI (Explainable Artificial Intelligence) plots generated on demand, or precalculate the data and auto-upload the data file alongside a shareable 'Arena' URL.
Maintained by Piotr Piątyszek. Last updated 4 years ago.
Topics: axplainable-artificial-intelligence, ema, explainability, explanatory-model-analysis, iml, interactive-xai, interpretability, xai
1.9 match · 31 stars · 5.94 score · 14 scripts
puttickmacroevolution
motmot:Models of Trait Macroevolution on Trees
Functions for fitting models of trait evolution on phylogenies for continuous traits. Includes the majority of functions described in Thomas and Freckleton (2012) <doi:10.1111/j.2041-210X.2011.00132.x>, including functions that allow for tests of variation in the rates of trait evolution.
Maintained by Mark Puttick. Last updated 5 years ago.
1.8 match · 4 stars · 6.05 score · 35 scripts
ropensci
frictionless:Read and Write Frictionless Data Packages
Read and write Frictionless Data Packages. A 'Data Package' (<https://specs.frictionlessdata.io/data-package/>) is a simple container format and standard to describe and package a collection of (tabular) data. It is typically used to publish FAIR (<https://www.go-fair.org/fair-principles/>) and open datasets.
Maintained by Peter Desmet. Last updated 6 months ago.
0.9 match · 30 stars · 9.79 score · 55 scripts · 6 dependents
mrkaye97
fitbitr:Interface with the 'Fitbit' API
Many 'Fitbit' users, and R-friendly 'Fitbit' users especially, have found themselves curious about their 'Fitbit' data. 'Fitbit' aggregates a large amount of personal data, much of which is interesting for personal research and to satisfy curiosity, and is even potentially useful in medical settings. The goal of 'fitbitr' is to make interfacing with the 'Fitbit' API as streamlined as possible, to make it simple for R users of all backgrounds and comfort levels to analyze their 'Fitbit' data and do whatever they want with it! Currently, 'fitbitr' includes methods for pulling data on activity, sleep, and heart rate, but this list is likely to grow in the future as the package gains more traction and more requests for new methods to be implemented come in. You can find details on the 'Fitbit' API at <https://dev.fitbit.com/build/reference/web-api/>.
Maintained by Matt Kaye. Last updated 5 months ago.
1.8 match · 16 stars · 4.41 score · 32 scripts
stibu81
ibawds:Functions and Datasets for the Data Science Course at IBAW
A collection of useful functions and datasets for the Data Science Course at IBAW.
Maintained by Stefan Lanz. Last updated 10 days ago.
Topics: data-science-learning, educational-resources
1.8 match · 2 stars · 4.26 score · 8 scripts
marcoblume
odds.converter:Betting Odds Conversion
Conversion between the most common odds types for sports betting. Hong Kong odds, US odds, Decimal odds, Indonesian odds, Malaysian odds, and raw Probability are covered in this package.
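The conversions are simple arithmetic once everything is routed through decimal odds. The sketch below is plain Python written for this listing, not odds.converter's R functions. It ignores bookmaker margin and assumes the usual conventions: US odds of +x pay x per 100 staked and -x require staking x to win 100; Hong Kong odds are decimal minus 1; Indonesian and Malaysian odds are the Hong Kong value or its negative reciprocal, depending on which side of even money it falls.

```python
# Illustrative sketch; function names are hypothetical, not the package API.
def decimal_from_us(us):
    """US (moneyline) odds, never 0, to decimal odds."""
    return 1 + us / 100 if us > 0 else 1 + 100 / abs(us)

def convert_decimal(dec):
    """Express decimal odds (> 1) in the other common formats."""
    hk = dec - 1                                    # Hong Kong: net win per unit stake
    return {
        "decimal": dec,
        "probability": 1 / dec,                     # implied probability, margin ignored
        "hongkong": hk,
        "us": 100 * hk if hk >= 1 else -100 / hk,
        "indonesian": hk if hk >= 1 else -1 / hk,
        "malaysian": hk if hk <= 1 else -1 / hk,
    }

print(convert_decimal(3.0))
print(convert_decimal(decimal_from_us(-200)))
```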
Maintained by Marco Blume. Last updated 7 years ago.
1.7 match · 13 stars · 4.48 score · 46 scripts
prabhanjan-tattar
gpk:100 Data Sets for Statistics Education
Collection of datasets as prepared by Profs. A.P. Gore, S.A. Paranjape, and M.B. Kulkarni of the Department of Statistics, Poona University, India. With their permission, the first letters of their names form the name of this package, which I have built and made available for the benefit of R users. This collection requires a rich class of models and can be a very useful building block for a beginner.
Maintained by Prabhanjan Tattar. Last updated 12 years ago.
3.4 match · 1.69 score · 49 scripts
david-cortes
isotree:Isolation-Based Outlier Detection
Fast and multi-threaded implementation of isolation forest (Liu, Ting, Zhou (2008) <doi:10.1109/ICDM.2008.17>), extended isolation forest (Hariri, Kind, Brunner (2018) <doi:10.48550/arXiv.1811.02141>), SCiForest (Liu, Ting, Zhou (2010) <doi:10.1007/978-3-642-15883-4_18>), fair-cut forest (Cortes (2021) <doi:10.48550/arXiv.2110.13402>), robust random-cut forest (Guha, Mishra, Roy, Schrijvers (2016) <http://proceedings.mlr.press/v48/guha16.html>), and customizable variations of them, for isolation-based outlier detection, clustered outlier detection, distance or similarity approximation (Cortes (2019) <doi:10.48550/arXiv.1910.12362>), isolation kernel calculation (Ting, Zhu, Zhou (2018) <doi:10.1145/3219819.3219990>), and imputation of missing values (Cortes (2019) <doi:10.48550/arXiv.1911.06646>), based on random or guided decision tree splitting, and providing different metrics for scoring anomalies based on isolation depth or density (Cortes (2021) <doi:10.48550/arXiv.2111.11639>). Provides simple heuristics for fitting the model to categorical columns and handling missing data, and offers options for varying between random and guided splits, and for using different splitting criteria.
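In the original isolation forest (Liu, Ting, Zhou 2008), a point's anomaly score comes from its average isolation depth E[h(x)], normalised by c(n), the average path length of an unsuccessful binary-search-tree lookup among n points: s(x, n) = 2^(-E[h(x)] / c(n)). Scores near 1 suggest anomalies, and a depth equal to c(n) gives exactly 0.5. The sketch below implements only this scoring formula in plain Python; it builds no trees, uses the exact harmonic sum instead of the ln(i) + 0.5772 approximation, and is not isotree's API (which also offers other scoring metrics).

```python
# Illustrative sketch of the isolation-forest score; names are hypothetical.
def harmonic(i):
    """Harmonic number H(i) = 1 + 1/2 + ... + 1/i, with H(0) = 0."""
    return sum(1.0 / k for k in range(1, i + 1))

def average_path_length(n):
    """c(n): average path length of an unsuccessful search in a BST of n points."""
    if n < 2:
        raise ValueError("need at least two points")
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n

def anomaly_score(mean_depth, n):
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); near 1 = anomalous, well below 0.5 = normal."""
    return 2.0 ** (-mean_depth / average_path_length(n))

c = average_path_length(256)
print(round(c, 3))
print(anomaly_score(2.0, 256), anomaly_score(c, 256), anomaly_score(3 * c, 256))
```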
Maintained by David Cortes. Last updated 15 days ago.
Topics: anomaly-detection, imputation, isolation-forest, outlier-detection, cpp, openmp
0.5 match · 203 stars · 10.41 score · 115 scripts · 6 dependents
cols4all
cols4all:Colors for all
Color palettes for all people, including those with color vision deficiency. Popular color palette series have been organized by type and have been scored on several properties such as color-blind-friendliness and fairness (i.e. do colors stand out equally?). Users' own palettes can also be loaded and analysed. Besides the common palette types (categorical, sequential, and diverging) it also includes cyclic and bivariate color palettes. Furthermore, a color for missing values is assigned to each palette.
Maintained by Martijn Tennekes. Last updated 2 months ago.
0.5 match · 343 stars · 9.98 score · 26 dependents
bioc
RBGL:An interface to the BOOST graph library
A fairly extensive and comprehensive interface to the graph algorithms contained in the BOOST library.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
0.6 match · 8.59 score · 320 scripts · 132 dependents
alarm-redist
redistmetrics:Redistricting Metrics
Reliable and flexible tools for scoring redistricting plans using common measures and metrics. These functions provide direct access to key tools for non-simulation analyses of redistricting plans, such as measuring compactness or partisan fairness. Tools are designed to work seamlessly with the 'redist' package.
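One widely cited partisan-fairness measure is the efficiency gap. It compares the votes each party "wastes": every vote cast for a district's loser, plus the winner's votes beyond half the district total. The sketch below computes it in plain Python for a two-party plan. It is not redistmetrics' API; the simple "half of the district total" threshold is an assumption (definitions differ in such details and in sign convention), and the vote counts are toy values.

```python
# Illustrative sketch; efficiency_gap is a hypothetical name, not redistmetrics' API.
def efficiency_gap(districts):
    """Efficiency gap for two parties A and B.

    districts -- list of (votes_a, votes_b) pairs, one per district.
    Returns (wasted_a - wasted_b) / total_votes; positive values mean
    party A wastes more votes, i.e. the plan favours party B.
    Ties are counted as wins for B in this sketch.
    """
    wasted_a = wasted_b = total = 0.0
    for a, b in districts:
        district_total = a + b
        needed = district_total / 2
        if a > b:
            wasted_a += a - needed   # winner's surplus
            wasted_b += b            # all of the loser's votes
        else:
            wasted_a += a
            wasted_b += b - needed
        total += district_total
    return (wasted_a - wasted_b) / total

plan = [(60, 40), (60, 40), (20, 80)]
print(efficiency_gap(plan))
```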
Maintained by Christopher T. Kenny. Last updated 9 months ago.
0.5 match · 10 stars · 7.57 score · 23 scripts · 2 dependents
cran
hudr:Providing Data from the US Department of Housing and Urban Development
Provides functions to access data from the US Department of Housing and Urban Development <https://www.huduser.gov/portal/dataset/fmr-api.html>.
Maintained by Paul Richardson. Last updated 2 years ago.
3.3 match · 1.15 score · 14 scripts
pauljohn32
rockchalk:Regression Estimation and Presentation
A collection of functions for interpretation and presentation of regression analysis. These functions are used to produce the statistics lectures in <https://pj.freefaculty.org/guides/>. Includes regression diagnostics, regression tables, and plots of interactions and "moderator" variables. The emphasis is on "mean-centered" and "residual-centered" predictors. The vignette 'rockchalk' offers a fairly comprehensive overview. The vignette 'Rstyle' has advice about coding in R. The package title 'rockchalk' refers to our school motto, 'Rock Chalk Jayhawk, Go K.U.'.
Maintained by Paul E. Johnson. Last updated 3 years ago.
0.5 match · 7.13 score · 584 scripts · 18 dependents
bioc
ATACseqQC:ATAC-seq Quality Control
ATAC-seq, an assay for Transposase-Accessible Chromatin using sequencing, is a rapid and sensitive method for chromatin accessibility analysis. It was developed as an alternative to MNase-seq, FAIRE-seq and DNase-seq. Compared to the other methods, ATAC-seq requires less biological sample material and less processing time. In the process of analyzing several ATAC-seq datasets produced in our labs, we learned some of the unique aspects of quality assessment for ATAC-seq data. To help users quickly assess whether their ATAC-seq experiment is successful, we developed the ATACseqQC package, partially following the guideline published in Nature Methods 2013 (Greenleaf et al.), including diagnostic plots of fragment size distribution, proportion of mitochondrial reads, nucleosome positioning pattern, and CTCF or other transcription factor footprints.
Maintained by Jianhong Ou. Last updated 2 months ago.
Topics: sequencing, dnaseq, atacseq, generegulation, qualitycontrol, coverage, nucleosomepositioning, immunooncology
0.5 match · 7.12 score · 146 scripts · 1 dependent
f0nzie
rTorch:R Bindings to 'PyTorch'
'R' implementation and interface of the Machine Learning platform 'PyTorch' <https://pytorch.org/> developed in 'Python'. It requires a 'conda' environment with the 'torch' and 'torchvision' Python packages to provide 'PyTorch' functions, methods and classes. The key object in 'PyTorch' is the tensor, which is in essence a multidimensional array. These tensors are fairly flexible in performing calculations on CPUs as well as 'GPUs' to accelerate tensor operations.
Maintained by Alfonso R. Reyes. Last updated 3 years ago.
0.5 match · 6 stars · 5.97 score · 157 scripts
mstrimas
colorist:Coloring Wildlife Distributions in Space-Time
Color and visualize wildlife distributions in space-time using raster data. In addition to enabling display of sequential change in distributions through the use of small multiples, 'colorist' provides functions for extracting several features of interest from a sequence of distributions and for visualizing those features using HCL (hue-chroma-luminance) color palettes. Resulting maps allow for "fair" visual comparison of intensity values (e.g., occurrence, abundance, or density) across space and time and can be used to address questions about where, when, and how consistently a species, group, or individual is likely to be found.
Maintained by Matthew Strimas-Mackey. Last updated 11 months ago.
0.5 match · 14 stars · 5.60 score · 19 scripts
epivec
TDLM:Systematic Comparison of Trip Distribution Laws and Models
The main purpose of this package is to propose a rigorous framework to fairly compare trip distribution laws and models as described in Lenormand et al. (2016) <doi:10.1016/j.jtrangeo.2015.12.008>.
Maintained by Maxime Lenormand. Last updated 11 days ago.
0.5 match · 2 stars · 4.85 score · 3 scripts
cfhammill
lenses:Elegant Data Manipulation with Lenses
Provides tools for creating and using lenses to simplify data manipulation. Lenses are composable getter/setter pairs for working with data in a purely functional way. Inspired by the 'Haskell' library 'lens' (Kmett, 2012) <https://hackage.haskell.org/package/lens>. For a fairly comprehensive (and highly technical) history of lenses please see the 'lens' wiki <https://github.com/ekmett/lens/wiki/History-of-Lenses>.
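A lens bundles a getter with a setter that returns an updated copy instead of mutating, and two lenses compose into a lens on a nested field. The sketch below is a minimal plain-Python illustration of that idea, not the R package's API; the Lens class and key_lens helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Illustrative sketch; Lens and key_lens are hypothetical names.
@dataclass(frozen=True)
class Lens:
    get: Callable[[Any], Any]
    set: Callable[[Any, Any], Any]          # (whole, new_part) -> new whole

    def compose(self, inner: "Lens") -> "Lens":
        """Focus through self, then through inner."""
        return Lens(
            get=lambda whole: inner.get(self.get(whole)),
            set=lambda whole, value: self.set(whole, inner.set(self.get(whole), value)),
        )

    def over(self, whole, fn):
        """Apply fn to the focused part and return an updated copy."""
        return self.set(whole, fn(self.get(whole)))

def key_lens(key):
    """Lens onto a dict key; setting copies the dict rather than mutating it."""
    return Lens(get=lambda d: d[key], set=lambda d, v: {**d, key: v})

config = {"model": {"params": {"alpha": 0.1}}}
alpha = key_lens("model").compose(key_lens("params")).compose(key_lens("alpha"))
updated = alpha.set(config, 0.5)
print(alpha.get(updated), alpha.get(config))
```

The original `config` is untouched after the update, which is the "purely functional" property the description refers to.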
Maintained by Chris Hammill. Last updated 6 years ago.
0.5 match · 27 stars · 4.75 score · 42 scripts
regisoc
kibior:A Simple Data Management and Sharing Tool
An interface to store, retrieve, search, join and share datasets, based on the Elasticsearch (ES) API. As a decentralized, FAIR and collaborative search engine and database effort, it proposes a simple push/pull/search mechanism based only on ES, a tool which can be deployed on nearly any hardware. It is a high-level R-ES binding that eases data usage via the 'elastic' package (S. Chamberlain (2020)) <https://docs.ropensci.org/elastic/>, extends joins from the 'dplyr' package (H. Wickham et al. (2020)) <https://dplyr.tidyverse.org/> and integrates specific biological format importation with Bioconductor packages such as 'rtracklayer' (M. Lawrence et al. (2009) <doi:10.1093/bioinformatics/btp328>) <http://bioconductor.org/packages/rtracklayer>, 'Biostrings' (H. Pagès et al. (2020) <doi:10.18129/B9.bioc.Biostrings>) <http://bioconductor.org/packages/Biostrings>, and 'Rsamtools' (M. Morgan et al. (2020) <doi:10.18129/B9.bioc.Rsamtools>) <http://bioconductor.org/packages/Rsamtools>, but also a long list of more common formats via 'rio' (C-h. Chan et al. (2018)) <https://cran.r-project.org/package=rio>.
Maintained by Régis Ongaro-Carcy. Last updated 4 years ago.
Topics: dataimport, datarepresentation, thirdpartyclient, data-science, database, datasets, elasticsearch, elasticsearch-client, push-pull, search, search-engine
0.5 match · 3 stars · 4.48 score · 8 scripts
rcst
rim:Interface to 'Maxima', Enabling Symbolic Computation
An interface to the powerful and fairly complete computer algebra system 'Maxima'. It can be used to start and control 'Maxima' from within R by entering 'Maxima' commands. Results from 'Maxima' can be parsed and evaluated in R. It facilitates outputting results from 'Maxima' in 'LaTeX' and 'MathML'. 2D and 3D plots can be displayed directly. This package also registers a 'knitr'-engine enabling 'Maxima' code chunks to be written in 'RMarkdown' documents.
Maintained by Eric Stemmler. Last updated 5 months ago.
0.5 match 11 stars 4.34 score 10 scripts
hrbrmstr
docxtractr:Extract Data Tables and Comments from 'Microsoft' 'Word' Documents
'Microsoft Word' 'docx' files provide an 'XML' structure that is fairly straightforward to navigate, especially when it applies to 'Word' tables and comments. Tools are provided to determine table count/structure, comment count and also to extract/clean tables and comments from 'Microsoft Word' 'docx' documents. There is also nascent support for '.doc' files.
Maintained by Bob Rudis. Last updated 5 years ago.
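docxtractr itself is an R package, and its functions are not reproduced here. As a hedged illustration of the 'XML' structure the description refers to, the Python sketch below (standard library only; `docx_tables` is a hypothetical name) opens a 'docx' file as a zip archive, reads `word/document.xml`, and walks the WordprocessingML `w:tbl`, `w:tr`, `w:tc` and `w:t` elements. It assumes simple tables and ignores merged cells and comments.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_tables(source):
    """Hypothetical helper: return each table as a list of rows of cell text.

    `source` is a path or binary file-like object holding a .docx archive.
    Merged cells, comments and tracked changes are not handled.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    tables = []
    for tbl in root.iter(W + "tbl"):          # every table in the document body
        rows = []
        for tr in tbl.findall(W + "tr"):      # direct child rows only
            # Concatenate all text runs inside each cell.
            rows.append(["".join(t.text or "" for t in tc.iter(W + "t"))
                         for tc in tr.findall(W + "tc")])
        tables.append(rows)
    return tables


# Usage: a minimal in-memory archive holding only word/document.xml
# (a real .docx contains further parts, omitted here).
DOC_XML = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:tbl>"
    "<w:tr><w:tc><w:p><w:r><w:t>name</w:t></w:r></w:p></w:tc>"
    "<w:tc><w:p><w:r><w:t>score</w:t></w:r></w:p></w:tc></w:tr>"
    "<w:tr><w:tc><w:p><w:r><w:t>a</w:t></w:r></w:p></w:tc>"
    "<w:tc><w:p><w:r><w:t>1</w:t></w:r></w:p></w:tc></w:tr>"
    "</w:tbl></w:body></w:document>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", DOC_XML)

print(docx_tables(buf))  # [[['name', 'score'], ['a', '1']]]
```

Table rows are read with `findall` on direct children so that rows of a nested table are not also counted as rows of its parent.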
0.5 match 4.05 score 193 scripts
gawainantell
divvy:Spatial Subsampling of Biodiversity Occurrence Data
Divide taxonomic occurrence data into geographic regions of fair comparison, with three customisable methods to standardise area and extent. Calculate common biodiversity and range-size metrics on subsampled data. Background theory and practical considerations for the methods are described in Antell and others (2023) <doi:10.31223/X5997Z>.
Maintained by Gawain Antell. Last updated 1 year ago.
0.5 match 4.00 score 10 scripts
djbetancourt-gh
funGp:Gaussian Process Models for Scalar and Functional Inputs
Construction and smart selection of Gaussian process models for analysis of computer experiments with emphasis on treatment of functional inputs that are regularly sampled. This package offers: (i) flexible modeling of functional-input regression problems through the fairly general Gaussian process model; (ii) built-in dimension reduction for functional inputs; (iii) heuristic optimization of the structural parameters of the model (e.g., active inputs, kernel function, type of distance). An in-depth tutorial on the use of funGp is provided in Betancourt et al. (2024) <doi:10.18637/jss.v109.i05> and metamodeling background is provided in Betancourt et al. (2020) <doi:10.1016/j.ress.2020.106870>. The algorithm for structural parameter optimization is described in <https://hal.science/hal-02532713>.
Maintained by Jose Betancourt. Last updated 10 months ago.
0.5 match 4 stars 3.78 score 2 scripts
cran
SSLR:Semi-Supervised Classification, Regression and Clustering Methods
Provides a collection of techniques for semi-supervised classification, regression and clustering. In semi-supervised problems, both labeled and unlabeled data are used to train a classifier. The package includes semi-supervised learning techniques such as self-training, co-training, democratic co-learning, decision trees, random forests and 'S3VM', among others, with a fairly intuitive interface that is easy to use.
Maintained by Francisco Jesús Palomares Alabarce. Last updated 4 years ago.
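SSLR's own R interface is not shown here. As a hedged sketch of the generic self-training idea named in the description, assuming a toy nearest-centroid base classifier on one-dimensional features (all function names are hypothetical), each round fits the classifier on the labeled data, pseudo-labels the unlabeled point it is most confident about (largest gap between the nearest and second-nearest class centroid), and moves it into the labeled set.

```python
def nearest_centroid_fit(xs, ys):
    """Hypothetical base learner: mean feature value of each class."""
    centroids = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = sum(members) / len(members)
    return centroids


def predict_with_margin(centroids, x):
    """Return (label, margin): nearest class and its distance gap to the runner-up."""
    dists = sorted((abs(x - c), label) for label, c in centroids.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(labeled_x, labeled_y, unlabeled_x, per_round=1):
    """Self-training loop: repeatedly pseudo-label the most confident points."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    while pool:
        centroids = nearest_centroid_fit(xs, ys)
        scored = [(predict_with_margin(centroids, x), x) for x in pool]
        scored.sort(key=lambda item: item[0][1], reverse=True)  # most confident first
        for (label, _), x in scored[:per_round]:
            xs.append(x)          # the pseudo-labeled point joins the training set
            ys.append(label)
            pool.remove(x)
    return nearest_centroid_fit(xs, ys), list(zip(xs, ys))


centroids, labeled = self_train([0.0, 10.0], ["a", "b"], [1.0, 2.0, 9.0, 6.0])
print(dict(labeled))
```

Adding only the most confident point per round lets earlier pseudo-labels shift the centroids before harder, boundary points such as 6.0 are labeled; a real implementation would use a stronger base classifier and a stopping rule.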
0.5 match 1 stars 3.64 score 73 scripts
amarnathbose
AHPtools:Consistency in the Analytic Hierarchy Process
A Swiss Army knife of utility functions for users of the Analytic Hierarchy Process (AHP). They help you assess the consistency of a pairwise comparison matrix (PCM) and improve its consistency ratio, compute the sensitivity of a PCM, create a logical (not random) PCM from the preferences you provide for the alternatives, and evaluate the actual consistency of a PCM based on objective, fair benchmarking. The functions also let users specify only the upper-triangular comparison ratios of a PCM in order to perform their assigned tasks.
Maintained by Amarnath Bose. Last updated 2 years ago.
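AHPtools' own R functions are not reproduced here. As a hedged sketch of the standard Saaty consistency ratio that such tools assess, assuming the usual definitions CI = (λmax - n)/(n - 1) and CR = CI/RI with Saaty's tabulated random indices, the Python code below estimates λmax of a PCM by power iteration. The function names are hypothetical.

```python
# Saaty's random consistency index (RI) by matrix order n, as commonly tabulated.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def principal_eigenvalue(a, iterations=1000, tol=1e-12):
    """Estimate lambda_max of a positive square matrix by power iteration."""
    n = len(a)
    v = [1.0 / n] * n                  # positive start vector summing to 1
    lam = 0.0
    for _ in range(iterations):
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_lam = sum(av)              # because sum(v) == 1, sum(A v) -> lambda_max
        v = [x / new_lam for x in av]  # renormalise so v sums to 1 again
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


def consistency_ratio(pcm):
    """Saaty consistency ratio CR = CI / RI of a reciprocal pairwise comparison matrix."""
    n = len(pcm)
    if n <= 2:
        return 0.0                     # 1x1 and 2x2 reciprocal matrices are always consistent
    lam_max = principal_eigenvalue(pcm)
    ci = (lam_max - n) / (n - 1)       # consistency index
    return ci / RANDOM_INDEX[n]


consistent = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]    # a_ij = w_i / w_j
inconsistent = [[1, 3, 1 / 3], [1 / 3, 1, 3], [3, 1 / 3, 1]]  # circular preferences
print(consistency_ratio(consistent))               # close to 0
print(round(consistency_ratio(inconsistent), 3))   # 1.149
```

The second matrix prefers A over B, B over C and C over A, so its ratio lands far above the conventional acceptance threshold of 0.10.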
0.5 match 3.00 score 3 scripts
cran
valection:Sampler for Verification Studies
A binding for the 'valection' program, which offers various ways to sample the outputs of competing algorithms or parameterizations and to assess their performance against each other fairly. The 'valection' C library is required to use this package and can be downloaded from <http://labs.oicr.on.ca/boutros-lab/software/valection>. Cooper CI et al. Valection: Design Optimization for Validation and Verification Studies. bioRxiv 2018. <doi:10.1101/254839>.
Maintained by Paul C. Boutros. Last updated 7 years ago.
0.5 match 2.00 score
cran
mlr3summary:Model and Learner Summaries for 'mlr3'
Concise and interpretable summaries for machine learning models and learners of the 'mlr3' ecosystem. The package takes inspiration from the summary function for (generalized) linear models but extends it to non-parametric machine learning models, based on generalization performance, model complexity, feature importances and effects, and fairness metrics.
Maintained by Susanne Dandl. Last updated 11 months ago.
0.5 match 1.70 score 6 scripts
opkoutchade
winputall:Variable Input Allocation Among Crops
Using a time-varying random parameters model developed in Koutchade et al. (2024) <https://hal.science/hal-04318163>, this package allocates variable input costs among the crops produced by farmers, based on panel data that include input expenditure aggregated at the farm level and acreage shares. It also handles the weighting data in a fair way and can integrate time-varying and time-constant control variables.
Maintained by Obafèmi Philippe Koutchade. Last updated 9 months ago.
0.5 match 1.70 score
cran
ShinyTester:Functions to Minimize Bonehead Moves While Working with 'shiny'
It's my experience that working with 'shiny' is intuitive once you're into it, but can be quite daunting at first. Several common mistakes are fairly predictable, and therefore we can control for these. The functions in this package help match up the assets listed in the UI and SERVER files, and visualize the ad hoc structure of the 'shiny' app.
Maintained by Amit Kohli. Last updated 8 years ago.
0.5 match 1.00 score
cran
predfairness:Discrimination Mitigation for Machine Learning Models
Based on different statistical definitions of discrimination, several methods have been proposed to detect and mitigate social inequality in machine learning models. This package aims to provide an alternative approach to fairness treatment in predictive models. The ROC (Reject Option based Classification) method implemented in this package is described by Kamiran, Karim and Zhang (2012) <https://ieeexplore.ieee.org/document/6413831/>.
Maintained by Thaís de Bessa Gontijo de Oliveira. Last updated 4 years ago.
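predfairness's own R functions and argument names are not reproduced here. As a hedged sketch of the reject-option idea behind the ROC method of Kamiran, Karim and Zhang (2012), assuming a binary classifier whose positive label (1) is the favorable outcome, a known flag marking membership in the deprived group, and a confidence threshold `theta` in [0.5, 1] (a hypothetical parameter name and default): predictions whose confidence max(p, 1 - p) is at or below `theta` are relabelled in favor of the deprived group, and all others keep the usual 0.5 decision rule.

```python
def reject_option_classify(probs, is_deprived, theta=0.6):
    """Reject-option relabelling sketch (hypothetical interface).

    probs:       predicted probabilities of the favorable label (1)
    is_deprived: True where the instance belongs to the deprived group
    theta:       confidence threshold in [0.5, 1]; the default is illustrative
    """
    if not 0.5 <= theta <= 1.0:
        raise ValueError("theta must lie in [0.5, 1]")
    labels = []
    for p, deprived in zip(probs, is_deprived):
        if max(p, 1 - p) <= theta:
            # Critical region: low-confidence prediction near the boundary,
            # so deprived-group members get 1 and favored-group members get 0.
            labels.append(1 if deprived else 0)
        else:
            # Outside the critical region: standard 0.5 threshold.
            labels.append(1 if p >= 0.5 else 0)
    return labels


print(reject_option_classify([0.9, 0.55, 0.45, 0.1], [False, False, True, True]))
# [1, 0, 1, 0]
```

Raising `theta` widens the critical region, trading some accuracy for smaller differences in positive rates between the groups; `theta=0.5` leaves every prediction unchanged.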
0.5 match 1.00 score
cran
essHist:The Essential Histogram
Provides an optimal histogram, in the sense of probability density estimation and feature detection, by means of multiscale variational inference. In other words, the resulting histogram serves as an optimal density estimator and, at the same time, recovers features such as increases or modes, with control of both false positives and false negatives. Moreover, it provides a parsimonious representation in terms of the number of blocks, which simplifies data interpretation. The only assumption of the method is that the data points are independent and identically distributed, so it applies to fairly general situations, including continuous distributions, discrete distributions, and mixtures of both. For details see Li, Munk, Sieling and Walther (2016) <arXiv:1612.07216>.
Maintained by Housen Li. Last updated 6 years ago.
0.5 match 1 stars 1.00 score 6 scripts
juan-goncalves-dosantos
ProjectManagement:Management of Deterministic and Stochastic Projects
Solves management problems of deterministic and stochastic projects. In a deterministic context, it obtains the duration of a project and the appropriate slack for each activity. In addition, it obtains a schedule of activity times (Castro, Gómez & Tejada (2007) <doi:10.1016/j.orl.2007.01.003>). It also allows the management of resources. Once the project is completed and the actual duration of each activity is known, it can determine how long the project was delayed and fairly distribute the delay among the activities (Bergantiños, Valencia-Toledo & Vidal-Puga (2018) <doi:10.1016/j.dam.2017.08.012>). In a stochastic context, it can estimate the average duration of the project and plot the density of this duration, as well as the densities of the early and last times of the chosen activities. As in the deterministic case, it can distribute the delay observed once the project has been carried out.
Maintained by Juan Carlos Gonçalves Dosantos. Last updated 5 months ago.
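ProjectManagement's own R functions are not reproduced here. The deterministic duration-and-slack computation the description mentions is classically done with the critical path method (CPM). As a hedged sketch, assuming activities are supplied in a topological order (the dictionaries and the `critical_path` name are hypothetical), a forward pass gives earliest start and finish times and the project duration, and a backward pass gives latest times; their difference is each activity's slack. The delay-sharing rule of Bergantiños, Valencia-Toledo & Vidal-Puga (2018) is not attempted here.

```python
def critical_path(durations, predecessors):
    """Forward/backward pass of the critical path method (CPM).

    durations:    dict activity -> duration, listed in a topological order
    predecessors: dict activity -> list of immediate predecessors
    Returns the project duration and the total slack of each activity.
    """
    order = list(durations)
    successors = {a: [] for a in order}
    for a in order:
        for p in predecessors.get(a, []):
            successors[p].append(a)

    # Forward pass: earliest start (es) and earliest finish (ef) times.
    es, ef = {}, {}
    for a in order:
        es[a] = max((ef[p] for p in predecessors.get(a, [])), default=0)
        ef[a] = es[a] + durations[a]
    duration = max(ef.values())

    # Backward pass: latest start (ls) and finish (lf) that do not delay the project.
    ls, lf = {}, {}
    for a in reversed(order):
        lf[a] = min((ls[s] for s in successors[a]), default=duration)
        ls[a] = lf[a] - durations[a]

    slack = {a: ls[a] - es[a] for a in order}
    return duration, slack


durations = {"A": 3, "B": 2, "C": 4, "D": 1}
predecessors = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(critical_path(durations, predecessors))
# (8, {'A': 0, 'B': 2, 'C': 0, 'D': 0})
```

Activities with zero slack (A, C, D here) form the critical path: delaying any of them delays the whole project, while B can slip by up to 2 time units.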
0.5 match 1.00 score 9 scripts