R-universe search: "kolmogorov-smirnov"

tirgit

missCompare:Intuitive Missing Data Imputation Framework

Offers a convenient pipeline to test and compare various missing data imputation algorithms on simulated and real data. These include simpler methods, such as mean and median imputation and random replacement, but also include more sophisticated algorithms already implemented in popular R packages, such as 'mi', described by Su et al. (2011) <doi:10.18637/jss.v045.i02>; 'mice', described by van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>; 'missForest', described by Stekhoven and Buhlmann (2012) <doi:10.1093/bioinformatics/btr597>; 'missMDA', described by Josse and Husson (2016) <doi:10.18637/jss.v070.i01>; and 'pcaMethods', described by Stacklies et al. (2007) <doi:10.1093/bioinformatics/btm069>. The central assumption behind 'missCompare' is that structurally different datasets (e.g. larger datasets with a large number of correlated variables vs. smaller datasets with non correlated variables) will benefit differently from different missing data imputation algorithms. 'missCompare' takes measurements of your dataset and sets up a sandbox to try a curated list of standard and sophisticated missing data imputation algorithms and compares them assuming custom missingness patterns. 'missCompare' will also impute your real-life dataset for you after the selection of the best performing algorithm in the simulations. The package also provides various post-imputation diagnostics and visualizations to help you assess imputation performance.

Maintained by Tibor V. Varga. Last updated 4 years ago.

comparison comparison-benchmarks imputation imputation-algorithm imputation-methods imputations kolmogorov-smirnov missing missing-data missing-data-imputation missing-status-check missing-values missingness post-imputation-diagnostics rmse

15.0 match 39 stars 5.89 score 40 scripts

alexkowa

EnvStats:Package for Environmental Statistics, Including US EPA Guidance

Graphical and statistical analyses of environmental data, with focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book "EnvStats: An R Package for Environmental Statistics" (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, <doi:10.1007/978-1-4614-8456-1>).

Maintained by Alexander Kowarik. Last updated 16 days ago.

6.6 match 26 stars 12.80 score 2.4k scripts 46 dependents

drg-123

NSM3:Functions and Datasets to Accompany Hollander, Wolfe, and Chicken - Nonparametric Statistical Methods, Third Edition

Designed to replace the tables which were in the back of the first two editions of Hollander and Wolfe - Nonparametric Statistical Methods. Exact procedures are performed when computationally possible. Monte Carlo and Asymptotic procedures are performed otherwise. For those procedures included in the base packages, our code simply provides a wrapper to standardize the output with the other procedures in the package.

Maintained by Grant Schneider. Last updated 4 months ago.

18.7 match 1 stars 3.77 score 115 scripts 1 dependents

andrisignorell

DescTools:Tools for Descriptive Statistics

A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox, which facilitates the (notoriously time consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. The package contains furthermore functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here, was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and arguments naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in absence of convincing alternatives). The 'BigCamelCase' style was consequently applied to functions borrowed from contributed R packages as well.

Maintained by Andri Signorell. Last updated 10 days ago.

fortran cpp

3.6 match 87 stars 16.68 score 7.7k scripts 99 dependents

pnovack-gottshall

KScorrect:Lilliefors-Corrected Kolmogorov-Smirnov Goodness-of-Fit Tests

Implements the Lilliefors-corrected Kolmogorov-Smirnov test for use in goodness-of-fit tests, suitable when population parameters are unknown and must be estimated by sample statistics. P-values are estimated by simulation. Can be used with a variety of continuous distributions, including normal, lognormal, univariate mixtures of normals, uniform, loguniform, exponential, gamma, and Weibull distributions. Functions to generate random numbers and calculate density, distribution, and quantile functions are provided for use with the log uniform and mixture distributions.

Maintained by Phil Novack-Gottshall. Last updated 6 years ago.

13.5 match 3 stars 4.35 score 50 scripts 1 dependents

laplacesdemonr

LaplacesDemon:Complete Environment for Bayesian Inference

Provides a complete environment for Bayesian inference using a variety of different samplers (see ?LaplacesDemon for an overview).

Maintained by Henrik Singmann. Last updated 12 months ago.

3.8 match 93 stars 13.45 score 1.8k scripts 60 dependents

iscience-kn

dropR:Dropout Analysis by Condition

Analysis and visualization of dropout between conditions in surveys and (online) experiments. Features include computation of dropout statistics, comparing dropout between conditions (e.g. Chi square), analyzing survival (e.g. Kaplan-Meier estimation), comparing conditions with the most different rates of dropout (Kolmogorov-Smirnov) and visualizing the result of each in designated plotting functions. Sources: Andrea Frick, Marie-Terese Baechtiger & Ulf-Dietrich Reips (2001) <https://www.researchgate.net/publication/223956222_Financial_incentives_personal_information_and_drop-out_in_online_studies>; Ulf-Dietrich Reips (2002) "Standards for Internet-Based Experimenting" <doi:10.1027//1618-3169.49.4.243>.

Maintained by Annika Tave Overlander. Last updated 4 months ago.

dropout experiments psychology social-science

7.9 match 6 stars 6.06 score 16 scripts

yanyachen

MLmetrics:Machine Learning Evaluation Metrics

A collection of evaluation metrics, including loss, score and utility functions, that measure regression, classification and ranking performance.

Maintained by Yachen Yan. Last updated 11 months ago.

4.0 match 69 stars 11.09 score 2.2k scripts 20 dependents

fishr-core-team

FSA:Simple Fisheries Stock Assessment Methods

A variety of simple fish stock assessment methods.

Maintained by Derek H. Ogle. Last updated 2 months ago.

fish fisheries fisheries-management fisheries-stock-assessment population-dynamics stock-assessment

4.0 match 68 stars 11.08 score 1.7k scripts 6 dependents

allegropiano

GLDEX:Fitting Single and Mixture of Generalised Lambda Distributions

The fitting algorithms considered in this package have two major objectives. One is to provide a smoothing device to fit distributions to data using the weight and unweighted discretised approach based on the bin width of the histogram. The other is to provide a definitive fit to the data set using the maximum likelihood and quantile matching estimation. Other methods such as moment matching, starship method, L moment matching are also provided. Diagnostics on goodness of fit can be done via qqplots, KS-resample tests and comparing mean, variance, skewness and kurtosis of the data with the fitted distribution. References include the following: Karvanen and Nuutinen (2008) "Characterizing the generalized lambda distribution by L-moments" <doi:10.1016/j.csda.2007.06.021>, King and MacGillivray (1999) "A starship method for fitting the generalised lambda distributions" <doi:10.1111/1467-842X.00089>, Su (2005) "A Discretized Approach to Flexibly Fit Generalized Lambda Distributions to Data" <doi:10.22237/jmasm/1130803560>, Su (2007) "Nmerical Maximum Log Likelihood Estimation for Generalized Lambda Distributions" <doi:10.1016/j.csda.2006.06.008>, Su (2007) "Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R" <doi:10.18637/jss.v021.i09>, Su (2009) "Confidence Intervals for Quantiles Using Generalized Lambda Distributions" <doi:10.1016/j.csda.2009.02.014>, Su (2010) "Chapter 14: Fitting GLDs and Mixture of GLDs to Data using Quantile Matching Method" <doi:10.1201/b10159>, Su (2010) "Chapter 15: Fitting GLD to data using GLDEX 1.0.4 in R" <doi:10.1201/b10159>, Su (2015) "Flexible Parametric Quantile Regression Model" <doi:10.1007/s11222-014-9457-1>, Su (2021) "Flexible parametric accelerated failure time model"<doi:10.1080/10543406.2021.1934854>.

Maintained by Steve Su. Last updated 2 years ago.

14.2 match 3.05 score 93 scripts 2 dependents

jasjeetsekhon

Matching:Multivariate and Propensity Score Matching with Balance Optimization

Provides functions for multivariate and propensity score matching and for finding optimal balance based on a genetic search algorithm. A variety of univariate and multivariate metrics to determine if balance has been obtained are also provided. For details, see the paper by Jasjeet Sekhon (2007, <doi:10.18637/jss.v042.i07>).

Maintained by Jasjeet Singh Sekhon. Last updated 5 months ago.

cpp

4.0 match 24 stars 10.36 score 852 scripts 10 dependents

mmaechler

sfsmisc:Utilities from 'Seminar fuer Statistik' ETH Zurich

Useful utilities ['goodies'] from Seminar fuer Statistik ETH Zurich, some of which were ported from S-plus in the 1990s. For graphics, have pretty (Log-scale) axes eaxis(), an enhanced Tukey-Anscombe plot, combining histogram and boxplot, 2d-residual plots, a 'tachoPlot()', pretty arrows, etc. For robustness, have a robust F test and robust range(). For system support, notably on Linux, provides 'Sys.*()' functions with more access to system and CPU information. Finally, miscellaneous utilities such as simple efficient prime numbers, integer codes, Duplicated(), toLatex.numeric() and is.whole().

Maintained by Martin Maechler. Last updated 5 months ago.

3.5 match 11 stars 10.87 score 566 scripts 119 dependents

r-forge

surveillance:Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena

Statistical methods for the modeling and monitoring of time series of counts, proportions and categorical data, as well as for the modeling of continuous-time point processes of epidemic phenomena. The monitoring methods focus on aberration detection in count data time series from public health surveillance of communicable diseases, but applications could just as well originate from environmetrics, reliability engineering, econometrics, or social sciences. The package implements many typical outbreak detection procedures such as the (improved) Farrington algorithm, or the negative binomial GLR-CUSUM method of Hoehle and Paul (2008) <doi:10.1016/j.csda.2008.02.015>. A novel CUSUM approach combining logistic and multinomial logistic modeling is also included. The package contains several real-world data sets, the ability to simulate outbreak data, and to visualize the results of the monitoring in a temporal, spatial or spatio-temporal fashion. A recent overview of the available monitoring procedures is given by Salmon et al. (2016) <doi:10.18637/jss.v070.i10>. For the retrospective analysis of epidemic spread, the package provides three endemic-epidemic modeling frameworks with tools for visualization, likelihood inference, and simulation. hhh4() estimates models for (multivariate) count time series following Paul and Held (2011) <doi:10.1002/sim.4177> and Meyer and Held (2014) <doi:10.1214/14-AOAS743>. twinSIR() models the susceptible-infectious-recovered (SIR) event history of a fixed population, e.g, epidemics across farms or networks, as a multivariate point process as proposed by Hoehle (2009) <doi:10.1002/bimj.200900050>. twinstim() estimates self-exciting point process models for a spatio-temporal point pattern of infective events, e.g., time-stamped geo-referenced surveillance data, as proposed by Meyer et al. (2012) <doi:10.1111/j.1541-0420.2011.01684.x>. A recent overview of the implemented space-time modeling frameworks for epidemic phenomena is given by Meyer et al. (2017) <doi:10.18637/jss.v077.i11>.

Maintained by Sebastian Meyer. Last updated 2 days ago.

cpp

3.4 match 2 stars 10.68 score 446 scripts 3 dependents

jeromeecoac

seewave:Sound Analysis and Synthesis

Functions for analysing, manipulating, displaying, editing and synthesizing time waves (particularly sound). This package processes time analysis (oscillograms and envelopes), spectral content, resonance quality factor, entropy, cross correlation and autocorrelation, zero-crossing, dominant frequency, analytic signal, frequency coherence, 2D and 3D spectrograms and many other analyses. See Sueur et al. (2008) <doi:10.1080/09524622.2008.9753600> and Sueur (2018) <doi:10.1007/978-3-319-77647-7>.

Maintained by Jerome Sueur. Last updated 1 years ago.

4.0 match 18 stars 8.84 score 880 scripts 23 dependents

obouaziz

robusTest:Calibrated Correlation and Two-Sample Tests

Implementation of corrected two-sample tests. A corrected version of the Pearson and Kendall correlation tests, the Mann-Whitney (Wilcoxon) rank sum test, the Wilcoxon signed rank test and a variance test are implemented. The package also proposes a test for the median and an independence test between two continuous variables of Kolmogorov-Smirnov's type. All these corrected tests are asymptotically calibrated in the sense that the probability of rejection under the null hypothesis is asymptotically equal to the level of the test. See <doi:10.48550/arXiv.2211.08784> for more details on the statistical tests.

Maintained by Olivier Bouaziz. Last updated 9 months ago.

cpp

11.0 match 3.18 score 4 scripts

cdowd

twosamples:Fast Permutation Based Two Sample Tests

Fast randomization based two sample tests. Testing the hypothesis that two samples come from the same distribution using randomization to create p-values. Included tests are: Kolmogorov-Smirnov, Kuiper, Cramer-von Mises, Anderson-Darling, Wasserstein, and DTS. The default test (two_sample) is based on the DTS test statistic, as it is the most powerful, and thus most useful to most users. The DTS test statistic builds on the Wasserstein distance by using a weighting scheme like that of Anderson-Darling. See the companion paper at <arXiv:2007.01360> or <https://codowd.com/public/DTS.pdf> for details of that test statistic, and non-standard uses of the package (parallel for big N, weighted observations, one sample tests, etc). We also include the permutation scheme to make test building simple for others.

Maintained by Connor Dowd. Last updated 2 years ago.

distance-metric ecdf cpp

5.0 match 17 stars 6.88 score 62 scripts 8 dependents

zwenyu

ecp:Non-Parametric Multiple Change-Point Analysis of Multivariate Data

Implements various procedures for finding multiple change-points from Matteson D. et al (2013) <doi:10.1080/01621459.2013.849605>, Zhang W. et al (2017) <doi:10.1109/ICDMW.2017.44>, Arlot S. et al (2019). Two methods make use of dynamic programming and pruning, with no distributional assumptions other than the existence of certain absolute moments in one method. Hierarchical and exact search methods are included. All methods return the set of estimated change- points as well as other summary information.

Maintained by Wenyu Zhang. Last updated 7 months ago.

cpp

6.7 match 1 stars 5.07 score 103 scripts 18 dependents

uligges

nortest:Tests for Normality

Five omnibus tests for testing the composite hypothesis of normality.

Maintained by Uwe Ligges. Last updated 10 years ago.

3.6 match 9.13 score 3.5k scripts 155 dependents

bioc

slingshot:Tools for ordering single-cell sequencing

Provides functions for inferring continuous, branching lineage structures in low-dimensional data. Slingshot was designed to model developmental trajectories in single-cell RNA sequencing data and serve as a component in an analysis pipeline after dimensionality reduction and clustering. It is flexible enough to handle arbitrarily many branching events and allows for the incorporation of prior knowledge through supervised graph construction.

Maintained by Kelly Street. Last updated 5 months ago.

clustering differentialexpression geneexpression rnaseq sequencing software singlecell transcriptomics visualization

2.7 match 283 stars 12.01 score 1.0k scripts 4 dependents

bioc

GSAR:Gene Set Analysis in R

Gene set analysis using specific alternative hypotheses. Tests for differential expression, scale and net correlation structure.

Maintained by Yasir Rahmatallah. Last updated 5 months ago.

software statisticalmethod differentialexpression

7.1 match 4.38 score 7 scripts

mthrun

AdaptGauss:Gaussian Mixture Models (GMM)

Multimodal distributions can be modelled as a mixture of components. The model is derived using the Pareto Density Estimation (PDE) for an estimation of the pdf. PDE has been designed in particular to identify groups/classes in a dataset. Precise limits for the classes can be calculated using the theorem of Bayes. Verification of the model is possible by QQ plot, Chi-squared test and Kolmogorov-Smirnov test. The package is based on the publication of Ultsch, A., Thrun, M.C., Hansen-Goos, O., Lotsch, J. (2015) <DOI:10.3390/ijms161025897>.

Maintained by Michael Thrun. Last updated 2 years ago.

cpp

5.0 match 1 stars 6.12 score 25 scripts 5 dependents

karoliskoncevicius

matrixTests:Fast Statistical Hypothesis Tests on Rows and Columns of Matrices

Functions to perform fast statistical hypothesis tests on rows/columns of matrices. The main goals are: 1) speed via vectorization, 2) output that is detailed and easy to use, 3) compatibility with tests implemented in R (like those available in the 'stats' package).

Maintained by Karolis Koncevičius. Last updated 1 years ago.

anova fast hypothesis-testing matrix rows t-test wilcoxon-test

4.0 match 36 stars 7.60 score 272 scripts 8 dependents

uclahs-cds

BoutrosLab.plotting.general:Functions to Create Publication-Quality Plots

Contains several plotting functions such as barplots, scatterplots, heatmaps, as well as functions to combine plots and assist in the creation of these plots. These functions will give users great ease of use and customization options in broad use for biomedical applications, as well as general purpose plotting. Each of the functions also provides valid default settings to make plotting data more efficient and producing high quality plots with standard colour schemes simpler. All functions within this package are capable of producing plots that are of the quality to be presented in scientific publications and journals. P'ng et al.; BPG: Seamless, automated and interactive visualization of scientific data; BMC Bioinformatics 2019 <doi:10.1186/s12859-019-2610-2>.

Maintained by Paul Boutros. Last updated 5 months ago.

3.6 match 12 stars 8.36 score 414 scripts 6 dependents

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 years ago.

3.6 match 3 stars 8.20 score 7.8k scripts 11 dependents

braunlab-nu

fasano.franceschini.test:Fasano-Franceschini Test: A Multivariate Kolmogorov-Smirnov Two-Sample Test

An implementation of the two-sample multivariate Kolmogorov-Smirnov test described by Fasano and Franceschini (1987) <doi:10.1093/mnras/225.1.155>. This test evaluates the null hypothesis that two i.i.d. random samples were drawn from the same underlying probability distribution. The data can be of any dimension, and can be of any type (continuous, discrete, or mixed).

Maintained by Connor Puritz. Last updated 1 years ago.

cpp

6.6 match 6 stars 4.36 score 38 scripts

pvermees

provenance:Statistical Toolbox for Sedimentary Provenance Analysis

Bundles a number of established statistical methods to facilitate the visual interpretation of large datasets in sedimentary geology. Includes functionality for adaptive kernel density estimation, principal component analysis, correspondence analysis, multidimensional scaling, generalised procrustes analysis and individual differences scaling using a variety of dissimilarity measures. Univariate provenance proxies, such as single-grain ages or (isotopic) compositions are compared with the Kolmogorov-Smirnov, Kuiper, Wasserstein-2 or Sircombe-Hazelton L2 distances. Categorical provenance proxies such as chemical compositions are compared with the Aitchison and Bray-Curtis distances,and count data with the chi-square distance. Varietal data can either be converted to one or more distributional datasets, or directly compared using the multivariate Wasserstein distance. Also included are tools to plot compositional and count data on ternary diagrams and point-counting data on radial plots, to calculate the sample size required for specified levels of statistical precision, and to assess the effects of hydraulic sorting on detrital compositions. Includes an intuitive query-based user interface for users who are not proficient in R.

Maintained by Pieter Vermeesch. Last updated 2 months ago.

5.0 match 14 stars 5.52 score 79 scripts 1 dependents

gmgeorg

LambertW:Probabilistic Models to Analyze and Gaussianize Heavy-Tailed, Skewed Data

Lambert W x F distributions are a generalized framework to analyze skewed, heavy-tailed data. It is based on an input/output system, where the output random variable (RV) Y is a non-linearly transformed version of an input RV X ~ F with similar properties as X, but slightly skewed (heavy-tailed). The transformed RV Y has a Lambert W x F distribution. This package contains functions to model and analyze skewed, heavy-tailed data the Lambert Way: simulate random samples, estimate parameters, compute quantiles, and plot/ print results nicely. The most useful function is 'Gaussianize', which works similarly to 'scale', but actually makes the data Gaussian. A do-it-yourself toolkit allows users to define their own Lambert W x 'MyFavoriteDistribution' and use it in their analysis right away.

Maintained by Georg M. Goerg. Last updated 1 years ago.

gaussianize gaussianize-data heavy-tailed heavy-tailed-distributions leptokurtosis normal-distribution normalization skewed-data statistics cpp

3.4 match 10 stars 8.17 score 78 scripts 13 dependents

ugurdar

datadriftR:Concept Drift Detection Methods for Stream Data

A system designed for detecting concept drift in streaming datasets. It offers a comprehensive suite of statistical methods to detect concept drift, including methods for monitoring changes in data distributions over time. The package supports several tests, such as Drift Detection Method (DDM), Early Drift Detection Method (EDDM), Hoeffding Drift Detection Methods (HDDM_A, HDDM_W), Kolmogorov-Smirnov test-based Windowing (KSWIN) and Page Hinkley (PH) tests. The methods implemented in this package are based on established research and have been demonstrated to be effective in real-time data analysis. For more details on the methods, please check to the following sources. Kobylińska et al. (2023) <doi:10.48550/arXiv.2308.11446>, S. Kullback & R.A. Leibler (1951) <doi:10.1214/aoms/1177729694>, Gama et al. (2004) <doi:10.1007/978-3-540-28645-5_29>, Baena-Garcia et al. (2006) <https://www.researchgate.net/publication/245999704_Early_Drift_Detection_Method>, Frías-Blanco et al. (2014) <https://ieeexplore.ieee.org/document/6871418>, Raab et al. (2020) <doi:10.1016/j.neucom.2019.11.111>, Page (1954) <doi:10.1093/biomet/41.1-2.100>, Montiel et al. (2018) <https://jmlr.org/papers/volume19/18-251/18-251.pdf>.

Maintained by Ugur Dar. Last updated 7 months ago.

8.0 match 1 stars 3.30 score

ekstroem

MESS:Miscellaneous Esoteric Statistical Scripts

A mixed collection of useful and semi-useful diverse statistical functions, some of which may even be referenced in The R Primer book. See Ekstrøm, C. T. (2016). The R Primer. 2nd edition. Chapman & Hall.

Maintained by Claus Thorn Ekstrøm. Last updated 29 days ago.

biostatistics power-analysis statistical-analysis statistical-methods statistical-models openblas cpp

3.4 match 4 stars 7.76 score 328 scripts 13 dependents

cran

fBasics:Rmetrics - Markets and Basic Statistics

Provides a collection of functions to explore and to investigate basic properties of financial returns and related quantities. The covered fields include techniques of explorative data analysis and the investigation of distributional properties, including parameter estimation and hypothesis testing. Even more there are several utility functions for data handling and management.

Maintained by Georgi N. Boshnakov. Last updated 7 months ago.

3.6 match 2 stars 7.11 score 129 dependents

spatstat

spatstat:Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests

Comprehensive open-source toolbox for analysing Spatial Point Patterns. Focused mainly on two-dimensional point patterns, including multitype/marked points, in any spatial region. Also supports three-dimensional point patterns, space-time point patterns in any number of dimensions, point patterns on a linear network, and patterns of other geometrical objects. Supports spatial covariate data such as pixel images. Contains over 3000 functions for plotting spatial data, exploratory data analysis, model-fitting, simulation, spatial sampling, model diagnostics, and formal inference. Data types include point patterns, line segment patterns, spatial windows, pixel images, tessellations, and linear networks. Exploratory methods include quadrat counts, K-functions and their simulation envelopes, nearest neighbour distance and empty space statistics, Fry plots, pair correlation function, kernel smoothed intensity, relative risk estimation with cross-validated bandwidth selection, mark correlation functions, segregation indices, mark dependence diagnostics, and kernel estimates of covariate effects. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported. Parametric models can be fitted to point pattern data using the functions ppm(), kppm(), slrm(), dppm() similar to glm(). Types of models include Poisson, Gibbs and Cox point processes, Neyman-Scott cluster processes, and determinantal point processes. Models may involve dependence on covariates, inter-point interaction, cluster formation and dependence on marks. Models are fitted by maximum likelihood, logistic regression, minimum contrast, and composite likelihood methods. A model can be fitted to a list of point patterns (replicated point pattern data) using the function mppm(). The model can include random effects and fixed effects depending on the experimental design, in addition to all the features listed above. Fitted point process models can be simulated, automatically. Formal hypothesis tests of a fitted model are supported (likelihood ratio test, analysis of deviance, Monte Carlo tests) along with basic tools for model selection (stepwise(), AIC()) and variable selection (sdr). Tools for validating the fitted model include simulation envelopes, residuals, residual plots and Q-Q plots, leverage and influence diagnostics, partial residuals, and added variable plots.

Maintained by Adrian Baddeley. Last updated 2 months ago.

cluster-process cox-point-process gibbs-process kernel-density network-analysis point-process poisson-process spatial-analysis spatial-data spatial-data-analysis spatial-statistics spatstat statistical-methods statistical-models statistical-tests statistics

1.5 match 200 stars 16.32 score 5.5k scripts 41 dependents

tripartio

ale:Interpretable Machine Learning and Statistical Inference with Accumulated Local Effects (ALE)

Accumulated Local Effects (ALE) were initially developed as a model-agnostic approach for global explanations of the results of black-box machine learning algorithms. ALE has a key advantage over other approaches like partial dependency plots (PDP) and SHapley Additive exPlanations (SHAP): its values represent a clean functional decomposition of the model. As such, ALE values are not affected by the presence or absence of interactions among variables in a mode. Moreover, its computation is relatively rapid. This package reimplements the algorithms for calculating ALE data and develops highly interpretable visualizations for plotting these ALE values. It also extends the original ALE concept to add bootstrap-based confidence intervals and ALE-based statistics that can be used for statistical inference. For more details, see Okoli, Chitu. 2023. “Statistical Inference Using Machine Learning and Classical Techniques Based on Accumulated Local Effects (ALE).” arXiv. <arXiv:2310.09877>. <doi:10.48550/arXiv.2310.09877>.

Maintained by Chitu Okoli. Last updated 25 days ago.

3.3 match 4 stars 6.41 score 27 scripts

hanjunwei-lab

LncPath:Identifying the Pathways Regulated by LncRNA Sets of Interest

Identifies pathways synergisticly regulated by the interested lncRNA(long non-coding RNA) sets based on a lncRNA-mRNA(messenger RNA) interaction network. 1) The lncRNA-mRNA interaction network was built from the protein-protein interactions and the lncRNA-mRNA co-expression relationships in 28 RNA-Seq data sets. 2) The interested lncRNAs can be mapped into networks as seed nodes and a random walk strategy will be performed to evaluate the rate of each coding genes influenced by the seed lncRNAs. 3) Pathways regulated by the lncRNA set will be evaluated by a weighted Kolmogorov-Smirnov statistic as an ES Score. 4) The p value and false discovery rate value will also be calculated through a permutation analysis. 5) The running score of each pathway can be plotted and the heat map of each pathway can also be plotted if an expression profile is provided. 6) The rank and scores of the gene list of each pathway can be printed.

Maintained by Junwei Han. Last updated 5 years ago.

4.5 match 1 stars 4.52 score 11 scripts

carloscinelli

benford.analysis:Benford Analysis for Data Validation and Forensic Analytics

Provides tools that make it easier to validate data using Benford's Law.

Maintained by Carlos Cinelli. Last updated 6 years ago.

3.5 match 62 stars 5.66 score 74 scripts

josh-mc

discretefit:Simulated Goodness-of-Fit Tests for Discrete Distributions

Implements fast Monte Carlo simulations for goodness-of-fit (GOF) tests for discrete distributions. This includes tests based on the Chi-squared statistic, the log-likelihood-ratio (G^2) statistic, the Freeman-Tukey (Hellinger-distance) statistic, the Kolmogorov-Smirnov statistic, the Cramer-von Mises statistic as described in Choulakian, Lockhart and Stephens (1994) <doi:10.2307/3315828>, and the root-mean-square statistic, see Perkins, Tygert, and Ward (2011) <doi:10.1016/j.amc.2011.03.124>.

Maintained by Josh McCormick. Last updated 3 years ago.

cpp

4.5 match 1 stars 4.18 score 7 scripts 1 dependents

bioc

dks:The double Kolmogorov-Smirnov package for evaluating multiple testing procedures.

The dks package consists of a set of diagnostic functions for multiple testing methods. The functions can be used to determine if the p-values produced by a multiple testing procedure are correct. These functions are designed to be applied to simulated data. The functions require the entire set of p-values from multiple simulated studies, so that the joint distribution can be evaluated.

Maintained by Jeffrey T. Leek. Last updated 5 months ago.

multiplecomparison qualitycontrol

5.6 match 3.30 score 1 scripts

mmm-uca

RcmdrPlugin.UCA:UCA Rcmdr Plug-in

Some extensions to Rcmdr (R Commander), randomness test, variance test for one normal sample and predictions using active model, made by R-UCA project and used in teaching statistics at University of Cadiz (UCA).

Maintained by Manuel Munoz-Marquez. Last updated 6 months ago.

7.8 match 2.34 score 11 scripts

qddyy

LearnNonparam:'R6'-Based Flexible Framework for Permutation Tests

Implements non-parametric tests from Higgins (2004, ISBN:0534387756), including tests for one sample, two samples, k samples, paired comparisons, blocked designs, trends and association. Built with 'Rcpp' for efficiency and 'R6' for flexible, object-oriented design, the package provides a unified framework for performing or creating custom permutation tests.

Maintained by Yan Du. Last updated 2 months ago.

hypothesis-test nonparametric-statistics permutation-test cpp

3.6 match 6 stars 5.01 score 2 scripts

bioc

CaDrA:Candidate Driver Analysis

Performs both stepwise and backward heuristic search for candidate (epi)genetic drivers based on a binary multi-omics dataset. CaDrA's main objective is to identify features which, together, are significantly skewed or enriched pertaining to a given vector of continuous scores (e.g. sample-specific scores representing a phenotypic readout of interest, such as protein expression, pathway activity, etc.), based on the union occurence (i.e. logical OR) of the events.

Maintained by Reina Chau. Last updated 5 months ago.

microarray rnaseq geneexpression software featureextraction

2.4 match 24 stars 7.19 score 12 scripts

christinaheinze

CondIndTests:Nonlinear Conditional Independence Tests

Code for a variety of nonlinear conditional independence tests: Kernel conditional independence test (Zhang et al., UAI 2011, <arXiv:1202.3775>), Residual Prediction test (based on Shah and Buehlmann, <arXiv:1511.03334>), Invariant environment prediction, Invariant target prediction, Invariant residual distribution test, Invariant conditional quantile prediction (all from Heinze-Deml et al., <arXiv:1706.08576>).

Maintained by Christina Heinze-Deml. Last updated 5 years ago.

3.5 match 17 stars 4.91 score 32 scripts 1 dependents

abusjahn

wrappedtools:Useful Wrappers Around Commonly Used Functions

The main functionalities of 'wrappedtools' are: adding backticks to variable names; rounding to desired precision with special case for p-values; selecting columns based on pattern and storing their position, name, and backticked name; computing and formatting of descriptive statistics (e.g. mean±SD), comparing groups and creating publication-ready tables with descriptive statistics and p-values; creating specialized plots for correlation matrices. Functions were mainly written for my own daily work or teaching, but may be of use to others as well.

Maintained by Andreas Busjahn. Last updated 5 months ago.

descriptive-statistics test-statistic

3.6 match 2 stars 4.70 score 8 scripts

yml2017xiao

Peacock.test:Two and Three Dimensional Kolmogorov-Smirnov Two-Sample Tests

The original definition of the two and three dimensional Kolmogorov-Smirnov two-sample test statistics given by Peacock (1983) is implemented. Two R-functions: peacock2 and peacock3, are provided to compute the test statistics in two and three dimensional spaces, respectively. Note the Peacock test is different from the Fasano and Franceschini test (1987). The latter is a variant of the Peacock test.

Maintained by Yuanhui Xiao. Last updated 9 years ago.

16.9 match 1.00 score 10 scripts

iohprofiler

IOHanalyzer:Data Analysis Part of 'IOHprofiler'

The data analysis module for the Iterative Optimization Heuristics Profiler ('IOHprofiler'). This module provides statistical analysis methods for the benchmark data generated by optimization heuristics, which can be visualized through a web-based interface. The benchmark data is usually generated by the experimentation module, called 'IOHexperimenter'. 'IOHanalyzer' also supports the widely used 'COCO' (Comparing Continuous Optimisers) data format for benchmarking.

Maintained by Diederick Vermetten. Last updated 10 months ago.

cpp

3.3 match 24 stars 5.10 score 13 scripts

alanarnholt

PASWR:Probability and Statistics with R

Functions and data sets for the text Probability and Statistics with R.

Maintained by Alan T. Arnholt. Last updated 3 years ago.

3.5 match 2 stars 4.70 score 241 scripts

spatstat

spatstat.explore:Exploratory Data Analysis for the 'spatstat' Family

Functionality for exploratory data analysis and nonparametric analysis of spatial data, mainly spatial point patterns, in the 'spatstat' family of packages. (Excludes analysis of spatial data on a linear network, which is covered by the separate package 'spatstat.linnet'.) Methods include quadrat counts, K-functions and their simulation envelopes, nearest neighbour distance and empty space statistics, Fry plots, pair correlation function, kernel smoothed intensity, relative risk estimation with cross-validated bandwidth selection, mark correlation functions, segregation indices, mark dependence diagnostics, and kernel estimates of covariate effects. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported.

Maintained by Adrian Baddeley. Last updated 1 months ago.

cluster-detection confidence-intervals hypothesis-testing k-function roc-curves scan-statistics significance-testing simulation-envelopes spatial-analysis spatial-data-analysis spatial-sharpening spatial-smoothing spatial-statistics

1.5 match 1 stars 10.17 score 67 scripts 148 dependents

bioc

benchdamic:Benchmark of differential abundance methods on microbiome data

Starting from a microbiome dataset (16S or WMS with absolute count values) it is possible to perform several analysis to assess the performances of many differential abundance detection methods. A basic and standardized version of the main differential abundance analysis methods is supplied but the user can also add his method to the benchmark. The analyses focus on 4 main aspects: i) the goodness of fit of each method's distributional assumptions on the observed count data, ii) the ability to control the false discovery rate, iii) the within and between method concordances, iv) the truthfulness of the findings if any apriori knowledge is given. Several graphical functions are available for result visualization.

Maintained by Matteo Calgaro. Last updated 4 months ago.

metagenomics microbiome differentialexpression multiplecomparison normalization preprocessing software benchmark differential-abundance-methods

2.7 match 6 stars 5.73 score 8 scripts

munterfi

eRTG3D:Empirically Informed Random Trajectory Generation in 3-D

Creates realistic random trajectories in a 3-D space between two given fix points, so-called conditional empirical random walks (CERWs). The trajectory generation is based on empirical distribution functions extracted from observed trajectories (training data) and thus reflects the geometrical movement characteristics of the mover. A digital elevation model (DEM), representing the Earth's surface, and a background layer of probabilities (e.g. food sources, uplift potential, waterbodies, etc.) can be used to influence the trajectories. Unterfinger M (2018). "3-D Trajectory Simulation in Movement Ecology: Conditional Empirical Random Walk". Master's thesis, University of Zurich. <https://www.geo.uzh.ch/dam/jcr:6194e41e-055c-4635-9807-53c5a54a3be7/MasterThesis_Unterfinger_2018.pdf>. Technitis G, Weibel R, Kranstauber B, Safi K (2016). "An algorithm for empirically informed random trajectory generation between two endpoints". GIScience 2016: Ninth International Conference on Geographic Information Science, 9, online. <doi:10.5167/uzh-130652>.

Maintained by Merlin Unterfinger. Last updated 3 years ago.

3d birds conditional-empirical-random-walk gliding-and-soaring machine-learning movement-ecology random-trajectory-generator random-walk simulation trajectory-generation

2.7 match 6 stars 5.71 score 19 scripts

benrenard

HydroPortailStats:'HydroPortail' Statistical Functions

Statistical functions used in the French 'HydroPortail' <https://hydro.eaufrance.fr/>. This includes functions to estimate distributions, quantile curves and uncertainties, along with various other utilities. Technical details are available (in French) in Renard (2016) <https://hal.inrae.fr/hal-02605318>.

Maintained by Benjamin Renard. Last updated 4 months ago.

hydrology statistical-distributions statistics

4.0 match 3 stars 3.78 score 1 scripts

sergiofinances

actfts:Autocorrelation Tools Featured for Time Series

The 'actfts' package provides tools for performing autocorrelation analysis of time series data. It includes functions to compute and visualize the autocorrelation function (ACF) and the partial autocorrelation function (PACF). Additionally, it performs the Dickey-Fuller, KPSS, and Phillips-Perron unit root tests to assess the stationarity of time series. Theoretical foundations are based on Box and Cox (1964) <doi:10.1111/j.2517-6161.1964.tb00553.x>, Box and Jenkins (1976) <isbn:978-0-8162-1234-2>, and Box and Pierce (1970) <doi:10.1080/01621459.1970.10481180>. Statistical methods are also drawn from Kolmogorov (1933) <doi:10.1007/BF00993594>, Kwiatkowski et al. (1992) <doi:10.1016/0304-4076(92)90104-Y>, and Ljung and Box (1978) <doi:10.1093/biomet/65.2.297>. The package integrates functions from 'forecast' (Hyndman & Khandakar, 2008) <https://CRAN.R-project.org/package=forecast>, 'tseries' (Trapletti & Hornik, 2020) <https://CRAN.R-project.org/package=tseries>, 'xts' (Ryan & Ulrich, 2020) <https://CRAN.R-project.org/package=xts>, and 'stats' (R Core Team, 2023) <https://stat.ethz.ch/R-manual/R-devel/library/stats/html/00Index.html>. Additionally, it provides visualization tools via 'plotly' (Sievert, 2020) <https://CRAN.R-project.org/package=plotly> and 'reactable' (Glaz, 2023) <https://CRAN.R-project.org/package=reactable>. The package also incorporates macroeconomic datasets from the U.S. Bureau of Economic Analysis: Disposable Personal Income (DPI) <https://fred.stlouisfed.org/series/DPI>, Gross Domestic Product (GDP) <https://fred.stlouisfed.org/series/GDP>, and Personal Consumption Expenditures (PCEC) <https://fred.stlouisfed.org/series/PCEC>.

Maintained by Sergio Sierra. Last updated 12 days ago.

3.2 match 1 stars 4.74 score

alanarnholt

PASWR2:Probability and Statistics with R, Second Edition

Functions and data sets for the text Probability and Statistics with R, Second Edition.

Maintained by Alan T. Arnholt. Last updated 3 years ago.

3.5 match 1 stars 4.24 score 260 scripts

spatstat

spatstat.linnet:Linear Networks Functionality of the 'spatstat' Family

Defines types of spatial data on a linear network and provides functionality for geometrical operations, data analysis and modelling of data on a linear network, in the 'spatstat' family of packages. Contains definitions and support for linear networks, including creation of networks, geometrical measurements, topological connectivity, geometrical operations such as inserting and deleting vertices, intersecting a network with another object, and interactive editing of networks. Data types defined on a network include point patterns, pixel images, functions, and tessellations. Exploratory methods include kernel estimation of intensity on a network, K-functions and pair correlation functions on a network, simulation envelopes, nearest neighbour distance and empty space distance, relative risk estimation with cross-validated bandwidth selection. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported. Parametric models can be fitted to point pattern data using the function lppm() similar to glm(). Only Poisson models are implemented so far. Models may involve dependence on covariates and dependence on marks. Models are fitted by maximum likelihood. Fitted point process models can be simulated, automatically. Formal hypothesis tests of a fitted model are supported (likelihood ratio test, analysis of deviance, Monte Carlo tests) along with basic tools for model selection (stepwise(), AIC()) and variable selection (sdr). Tools for validating the fitted model include simulation envelopes, residuals, residual plots and Q-Q plots, leverage and influence diagnostics, partial residuals, and added variable plots. Random point patterns on a network can be generated using a variety of models.

Maintained by Adrian Baddeley. Last updated 2 months ago.

density-estimation heat-equation kernel-density-estimation network-analysis point-processes spatial-data-analysis statistical-analysis statistical-inference statistical-models

1.5 match 6 stars 9.64 score 35 scripts 43 dependents

cran

dgof:Discrete Goodness-of-Fit Tests

A revision to the stats::ks.test() function and the associated ks.test.Rd help page. With one minor exception, it does not change the existing behavior of ks.test(), and it adds features necessary for doing one-sample tests with hypothesized discrete distributions. The package also contains cvm.test(), for doing one-sample Cramer-von Mises goodness-of-fit tests.

Maintained by Taylor B. Arnold. Last updated 5 months ago.

4.0 match 1 stars 3.36 score 4 dependents

mjuraska

sievePH:Sieve Analysis Methods for Proportional Hazards Models

Implements a suite of semiparametric and nonparametric kernel-smoothed estimation and testing procedures for continuous mark-specific stratified hazard ratio (treatment/placebo) models in a randomized treatment efficacy trial with a time-to-event endpoint. Semiparametric methods, allowing multivariate marks, are described in Juraska M and Gilbert PB (2013), Mark-specific hazard ratio model with multivariate continuous marks: an application to vaccine efficacy. Biometrics 69(2):328-337 <doi:10.1111/biom.12016>, and in Juraska M and Gilbert PB (2016), Mark-specific hazard ratio model with missing multivariate marks. Lifetime Data Analysis 22(4):606-25 <doi:10.1007/s10985-015-9353-9>. Nonparametric kernel-smoothed methods, allowing univariate marks only, are described in Sun Y and Gilbert PB (2012), Estimation of stratified mark‐specific proportional hazards models with missing marks. Scandinavian Journal of Statistics}, 39(1):34-52 <doi:10.1111/j.1467-9469.2011.00746.x>, and in Gilbert PB and Sun Y (2015), Inferences on relative failure rates in stratified mark-specific proportional hazards models with missing marks, with application to human immunodeficiency virus vaccine efficacy trials. Journal of the Royal Statistical Society Series C: Applied Statistics, 64(1):49-73 <doi:10.1111/rssc.12067>. Both semiparametric and nonparametric approaches consider two scenarios: (1) the mark is fully observed in all subjects who experience the event of interest, and (2) the mark is subject to missingness-at-random in subjects who experience the event of interest. For models with missing marks, estimators are implemented based on (i) inverse probability weighting (IPW) of complete cases (for the semiparametric framework), and (ii) augmentation of the IPW estimating functions by leveraging correlations between the mark and auxiliary data to 'impute' the augmentation term for subjects with missing marks (for both the semiparametric and nonparametric framework). The augmented IPW estimators are doubly robust and recommended for use with incomplete mark data. The semiparametric methods make two key assumptions: (i) the time-to-event is assumed to be conditionally independent of the mark given treatment, and (ii) the weight function in the semiparametric density ratio/biased sampling model is assumed to be exponential. Diagnostic testing procedures for evaluating validity of both assumptions are implemented. Summary and plotting functions are provided for estimation and inferential results.

Maintained by Michal Juraska. Last updated 9 months ago.

openblas cpp openmp

3.3 match 4.04 score 11 scripts

hectorrdb

Ecume:Equality of 2 (or k) Continuous Univariate and Multivariate Distributions

We implement (or re-implements in R) a variety of statistical tools. They are focused on non-parametric two-sample (or k-sample) distribution comparisons in the univariate or multivariate case. See the vignette for more info.

Maintained by Hector Roux de Bezieux. Last updated 10 months ago.

software infrastructure

2.7 match 1 stars 4.86 score 16 scripts 3 dependents

klebermsousa

jackstrap:Correcting Nonparametric Frontier Measurements for Outliers

Provides method used to check whether data have outlier in efficiency measurement of big samples with data envelopment analysis (DEA). In this jackstrap method, the package provides two criteria to define outliers: heaviside and k-s test. The technique was developed by Sousa and Stosic (2005) "Technical Efficiency of the Brazilian Municipalities: Correcting Nonparametric Frontier Measurements for Outliers." <doi:10.1007/s11123-005-4702-4>.

Maintained by Kleber Morais de Sousa. Last updated 5 years ago.

dea jackstrap nonparametric outlier-detection

3.2 match 1 stars 3.85 score 14 scripts

aibrt

FreqProf:Frequency Profiles Computing and Plotting

Tools for generating an informative type of line graph, the frequency profile, which allows single behaviors, multiple behaviors, or the specific behavioral patterns of individual subjects to be graphed from occurrence/nonoccurrence behavioral data.

Maintained by Ronald E. Robertson. Last updated 9 years ago.

3.3 match 2 stars 3.48 score 7 scripts

simontrimborn

gofCopula:Goodness-of-Fit Tests for Copulae

Several Goodness-of-Fit (GoF) tests for Copulae are provided. A new hybrid test, Zhang et al. (2016) <doi:10.1016/j.jeconom.2016.02.017> is implemented which supports all of the individual tests in the package, e.g. Genest et al. (2009) <doi:10.1016/j.insmatheco.2007.10.005>. Estimation methods for the margins are provided and all the tests support parameter estimation and predefined values. The parameters are estimated by pseudo maximum likelihood but if it fails the estimation switches automatically to inversion of Kendall's tau. For reproducibility of results, the functions support the definition of seeds. Also all the tests support automatized parallelization of the bootstrapping tasks. The package provides an interface to perform new GoF tests by submitting the test statistic.

Maintained by Simon Trimborn. Last updated 3 years ago.

3.4 match 3.16 score 29 scripts

martenthompson

agfh:Agnostic Fay-Herriot Model for Small Area Statistics

Implements the Agnostic Fay-Herriot model, an extension of the traditional small area model. In place of normal sampling errors, the sampling error distribution is estimated with a Gaussian process to accommodate a broader class of distributions. This flexibility is most useful in the presence of bounded, multi-modal, or heavily skewed sampling errors.

Maintained by Marten Thompson. Last updated 2 years ago.

3.8 match 2.70 score 2 scripts

pvermees

IsoplotR:Statistical Toolbox for Radiometric Geochronology

Plots U-Pb data on Wetherill and Tera-Wasserburg concordia diagrams. Calculates concordia and discordia ages. Performs linear regression of measurements with correlated errors using 'York', 'Titterington', 'Ludwig' and Omnivariant Generalised Least-Squares ('OGLS') approaches. Generates Kernel Density Estimates (KDEs) and Cumulative Age Distributions (CADs). Produces Multidimensional Scaling (MDS) configurations and Shepard plots of multi-sample detrital datasets using the Kolmogorov-Smirnov distance as a dissimilarity measure. Calculates 40Ar/39Ar ages, isochrons, and age spectra. Computes weighted means accounting for overdispersion. Calculates U-Th-He (single grain and central) ages, logratio plots and ternary diagrams. Processes fission track data using the external detector method and LA-ICP-MS, calculates central ages and plots fission track and other data on radial (a.k.a. 'Galbraith') plots. Constructs total Pb-U, Pb-Pb, Th-Pb, K-Ca, Re-Os, Sm-Nd, Lu-Hf, Rb-Sr and 230Th-U isochrons as well as 230Th-U evolution plots.

Maintained by Pieter Vermeesch. Last updated 1 months ago.

1.0 match 69 stars 9.08 score 95 scripts 6 dependents

cran

AFR:Toolkit for Regression Analysis of Kazakhstan Banking Sector Data

Tool is created for regression, prediction and forecast analysis of macroeconomic and credit data. The package includes functions from existing R packages adapted for banking sector of Kazakhstan. The purpose of the package is to optimize statistical functions for easier interpretation for bank analysts and non-statisticians.

Maintained by Sultan Zhaparov. Last updated 6 months ago.

2.7 match 3.18 score

ebner-kit

gofgamma:Goodness-of-Fit Tests for the Gamma Distribution

We implement various classical tests for the composite hypothesis of testing the fit to the family of gamma distributions as the Kolmogorov-Smirnov test, the Cramer-von Mises test, the Anderson Darling test and the Watson test. For each test a parametric bootstrap procedure is implemented, as considered in Henze, Meintanis & Ebner (2012) <doi:10.1080/03610926.2010.542851>. The recent procedures presented in Henze, Meintanis & Ebner (2012) <doi:10.1080/03610926.2010.542851> and Betsch & Ebner (2019) <doi:10.1007/s00184-019-00708-7> are implemented. Estimation of parameters of the gamma law are implemented using the method of Bhattacharya (2001) <doi:10.1080/00949650108812100>.

Maintained by Bruno Ebner. Last updated 5 years ago.

7.8 match 1.00 score 2 scripts

danielamattei

Rita:Automated Transformations, Normality Testing, and Reporting

Automated performance of common transformations used to fulfill parametric assumptions of normality and identification of the best performing method for the user. Output for various normality tests (Thode, 2002) corresponding to the best performing method and a descriptive statistical report of the input data in its original units (5-number summary and mathematical moments) are also presented. Lastly, the Rankit, an empirical normal quantile transformation (ENQT) (Soloman & Sawilowsky, 2009), is provided to accommodate non-standard use cases and facilitate adoption. <DOI: 10.1201/9780203910894>. <DOI: 10.22237/jmasm/1257034080>.

Maintained by Daniel Mattei. Last updated 3 years ago.

3.8 match 2.00 score 1 scripts

ebner-kit

gofIG:Goodness-of-Fit Tests for the Inverse Gaussian Distribution

We implement various tests for the composite hypothesis of testing the fit to the family of inverse Gaussian distributions. Included are methods presented by Allison, J.S., Betsch, S., Ebner, B., and Visagie, I.J.H. (2022) <doi:10.48550/arXiv.1910.14119>, as well as two tests from Henze and Klar (2002) <doi:10.1023/A:1022442506681>. Additionally, the package implements a test proposed by Baringhaus and Gaigall (2015) <doi:10.1016/j.jmva.2015.05.013>. For each test a parametric bootstrap procedure is implemented.

Maintained by Bruno Ebner. Last updated 5 months ago.

7.1 match 1.00 score

sth1402

GGMridge:Gaussian Graphical Models Using Ridge Penalty Followed by Thresholding and Reestimation

Estimation of partial correlation matrix using ridge penalty followed by thresholding and reestimation. Under multivariate Gaussian assumption, the matrix constitutes an Gaussian graphical model (GGM).

Maintained by Shannon T. Holloway. Last updated 1 years ago.

3.6 match 1.89 score 13 scripts 2 dependents

kylebgorman

ldamatch:Selection of Statistically Similar Research Groups

Select statistically similar research groups by backward selection using various robust algorithms, including a heuristic based on linear discriminant analysis, multiple heuristics based on the test statistic, and parallelized exhaustive search.

Maintained by Kyle Gorman. Last updated 11 months ago.

3.3 match 2.00 score 9 scripts

blunde1

agtboost:Adaptive and Automatic Gradient Boosting Computations

Fast and automatic gradient tree boosting designed to avoid manual tuning and cross-validation by utilizing an information theoretic approach. This makes the algorithm adaptive to the dataset at hand; it is completely automatic, and with minimal worries of overfitting. Consequently, the speed-ups relative to state-of-the-art implementations can be in the thousands while mathematical and technical knowledge required on the user are minimized.

Maintained by Berent Ånund Strømnes Lunde. Last updated 3 years ago.

cpp

3.8 match 1.72 score 52 scripts

pvermees

geostats:An Introduction to Statistics for Geoscientists

A collection of datasets and simplified functions for an introductory (geo)statistics module at University College London. Provides functionality for compositional, directional and spatial data, including ternary diagrams, Wulff and Schmidt stereonets, and ordinary kriging interpolation. Implements logistic and (additive and centred) logratio transformations. Computes vector averages and concentration parameters for the von-Mises distribution. Includes a collection of natural and synthetic fractals, and a simulator for deterministic chaos using a magnetic pendulum example. The main purpose of these functions is pedagogical. Researchers can find more complete alternatives for these tools in other packages such as 'compositions', 'robCompositions', 'sp', 'gstat' and 'RFOC'. All the functions are written in plain R, with no compiled code and a minimal number of dependencies. Theoretical background and worked examples are available at <https://tinyurl.com/UCLgeostats/>.

Maintained by Pieter Vermeesch. Last updated 2 years ago.

3.8 match 1.58 score 38 scripts

cran

CDFt:Downscaling and Bias Correction via Non-Parametric CDF-Transform

Statistical downscaling and bias correction (model output statistics) method based on cumulative distribution functions (CDF) transformation. See Michelangeli, Vrac, Loukos (2009) Probabilistic downscaling approaches: Application to wind cumulative distribution functions. Geophysical Research Letters, 36, L11708, <doi:10.1029/2009GL038401>. ; and Vrac, Drobinski, Merlo, Herrmann, Lavaysse, Li, Somot (2012) Dynamical and statistical downscaling of the French Mediterranean climate: uncertainty assessment. Nat. Hazards Earth Syst. Sci., 12, 2769-2784, www.nat-hazards-earth-syst-sci.net/12/2769/2012/, <doi:10.5194/nhess-12-2769-2012>.

Maintained by Mathieu Vrac. Last updated 4 years ago.

3.8 match 1.52 score 11 scripts 1 dependents

cran

MN:Matrix Normal Distribution

Density computation, random matrix generation and maximum likelihood estimation of the matrix normal distribution. References: Pocuca N., Gallaugher M. P., Clark K. M. & McNicholas P. D. (2019). Assessing and Visualizing Matrix Variate Normality. <doi:10.48550/arXiv.1910.02859> and the relevant wikipedia page.

Maintained by Michail Tsagris. Last updated 10 months ago.

3.6 match 1.48 score 1 dependents

bioc

GeneExpressionSignature:Gene Expression Signature based Similarity Metric

This package gives the implementations of the gene expression signature and its distance to each. Gene expression signature is represented as a list of genes whose expression is correlated with a biological state of interest. And its distance is defined using a nonparametric, rank-based pattern-matching strategy based on the Kolmogorov-Smirnov statistic. Gene expression signature and its distance can be used to detect similarities among the signatures of drugs, diseases, and biological states of interest.

Maintained by Yang Cao. Last updated 5 months ago.

geneexpression

1.0 match 1 stars 5.00 score 5 scripts

bioc

scShapes:A Statistical Framework for Modeling and Identifying Differential Distributions in Single-cell RNA-sequencing Data

We present a novel statistical framework for identifying differential distributions in single-cell RNA-sequencing (scRNA-seq) data between treatment conditions by modeling gene expression read counts using generalized linear models (GLMs). We model each gene independently under each treatment condition using error distributions Poisson (P), Negative Binomial (NB), Zero-inflated Poisson (ZIP) and Zero-inflated Negative Binomial (ZINB) with log link function and model based normalization for differences in sequencing depth. Since all four distributions considered in our framework belong to the same family of distributions, we first perform a Kolmogorov-Smirnov (KS) test to select genes belonging to the family of ZINB distributions. Genes passing the KS test will be then modeled using GLMs. Model selection is done by calculating the Bayesian Information Criterion (BIC) and likelihood ratio test (LRT) statistic.

Maintained by Malindrie Dharmaratne. Last updated 5 months ago.

rnaseq singlecell multiplecomparison geneexpression

1.0 match 8 stars 4.90 score 6 scripts

cuining1

DistributionTest:Powerful Goodness-of-Fit Tests Based on the Likelihood Ratio

Provides new types of omnibus tests which are generally much more powerful than traditional tests (including the Kolmogorov-Smirnov, Cramer-von Mises and Anderson-Darling tests),see Zhang (2002) <doi:10.1111/1467-9868.00337>.

Maintained by Ning Cui. Last updated 5 years ago.

4.6 match 1.00 score

hanjunwei-lab

MiRSEA:'MicroRNA' Set Enrichment Analysis

The tools for 'MicroRNA Set Enrichment Analysis' can identify risk pathways(or prior gene sets) regulated by microRNA set in the context of microRNA expression data. (1) This package constructs a correlation profile of microRNA and pathways by the hypergeometric statistic test. The gene sets of pathways derived from the three public databases (Kyoto Encyclopedia of Genes and Genomes ('KEGG'); 'Reactome'; 'Biocarta') and the target gene sets of microRNA are provided by four databases('TarBaseV6.0'; 'mir2Disease'; 'miRecords'; 'miRTarBase';). (2) This package can quantify the change of correlation between microRNA for each pathway(or prior gene set) based on a microRNA expression data with cases and controls. (3) This package uses the weighted Kolmogorov-Smirnov statistic to calculate an enrichment score (ES) of a microRNA set that co-regulate to a pathway , which reflects the degree to which a given pathway is associated with the specific phenotype. (4) This package can provide the visualization of the results.

Maintained by Junwei Han. Last updated 5 years ago.

statistics pathways microrna enrichment analysis

1.0 match 4.51 score 16 scripts

cran

acid:Analysing Conditional Income Distributions

Functions for the analysis of income distributions for subgroups of the population as defined by a set of variables like age, gender, region, etc. This entails a Kolmogorov-Smirnov test for a mixture distribution as well as functions for moments, inequality measures, entropy measures and polarisation measures of income distributions. This package thus aides the analysis of income inequality by offering tools for the exploratory analysis of income distributions at the disaggregated level.

Maintained by Alexander Sohn. Last updated 9 years ago.

4.4 match 1.00 score

jiaxiangbu

rawKS:Easily Get True-Positive Rate and False-Positive Rate and KS Statistic

The Kolmogorov-Smirnov (K-S) statistic is a standard method to measure the model strength for credit risk scoring models. This package calculates the K–S statistic and plots the true-positive rate and false-positive rate to measure the model strength. This package was written with the credit marketer, who uses risk models in conjunction with his campaigns. The users could read more details from Thrasher (1992) <doi:10.1002/dir.4000060408> and 'pyks' <https://pypi.org/project/pyks/>.

Maintained by Jiaxiang Li. Last updated 5 years ago.

ks model-evaluation

1.0 match 3 stars 4.18 score 5 scripts

cran

GSSE:Genotype-Specific Survival Estimation

We propose a fully efficient sieve maximum likelihood method to estimate genotype-specific distribution of time-to-event outcomes under a nonparametric model. We can handle missing genotypes in pedigrees. We estimate the time-dependent hazard ratio between two genetic mutation groups using B-splines, while applying nonparametric maximum likelihood estimation to the reference baseline hazard function. The estimators are calculated via an expectation-maximization algorithm.

Maintained by Baosheng Liang. Last updated 9 years ago.

3.8 match 1.00 score

cran

BenfordTests:Statistical Tests for Evaluating Conformity to Benford's Law

Several specialized statistical tests and support functions for determining if numerical data could conform to Benford's law.

Maintained by Dieter William Joenssen. Last updated 10 years ago.

3.6 match 1.00 score

ashipunov

kldtools:Kullback-Leibler Divergence and Other Tools to Analyze Frequencies

Most importantly, calculates Kullback-Leibler Divergence (KLD), Turing's perspective estimator and their confidence intervals.

Maintained by Alexey Shipunov. Last updated 3 years ago.

3.6 match 1.00 score 3 scripts

umich-biostatistics

AEenrich:Adverse Event Enrichment Tests

We extend existing gene enrichment tests to perform adverse event enrichment analysis. Unlike the continuous gene expression data, adverse event data are counts. Therefore, adverse event data has many zeros and ties. We propose two enrichment tests. One is a modified Fisher's exact test based on pre-selected significant adverse events, while the other is based on a modified Kolmogorov-Smirnov statistic. We add Covariate adjustment to improve the analysis."Adverse event enrichment tests using VAERS" Shuoran Li, Lili Zhao (2020) <arXiv:2007.02266>.

Maintained by Michael Kleinsasser. Last updated 2 years ago.

1.0 match 3 stars 3.48 score 1 scripts

statcompute

vasicek:Miscellaneous Functions for Vasicek Distribution

Provide a collection of miscellaneous R functions related to the Vasicek distribution with the intent to make the lives of risk modelers easier.

Maintained by WenSui Liu. Last updated 4 years ago.

3.4 match 1.00 score

guangbaog

Dogoftest:Distributed Online Goodness-of-Fit Tests for Distributed Datasets

Distributed Online Goodness-of-Fit Test can process the distributed datasets. The philosophy of the package is described in Guo G.(2024) <doi:10.1016/j.apm.2024.115709>.

Maintained by Guangbao Guo. Last updated 26 days ago.

3.4 match 1.00 score

rlobenchain

NU.Learning:Nonparametric and Unsupervised Learning from Cross-Sectional Observational Data

Especially when cross-sectional data are observational, effects of treatment selection bias and confounding are best revealed by using Nonparametric and Unsupervised methods to "Design" the analysis of the given data ...rather than the collection of "designed data". Specifically, the "effect-size distribution" that best quantifies a potentially causal relationship between a numeric y-Outcome variable and either a binary t-Treatment or continuous e-Exposure variable needs to consist of BLOCKS of relatively well-matched experimental units (e.g. patients) that have the most similar X-confounder characteristics. Since our NU Learning approach will form BLOCKS by "clustering" experimental units in confounder X-space, the implicit statistical model for learning is One-Way ANOVA. Within Block measures of effect-size are then either [a] LOCAL Treatment Differences (LTDs) between Within-Cluster y-Outcome Means ("new" minus "control") when treatment choice is Binary or else [b] LOCAL Rank Correlations (LRCs) when the e-Exposure variable is numeric with (hopefully many) more than two levels. An Instrumental Variable (IV) method is also provided so that Local Average y-Outcomes (LAOs) within BLOCKS may also contribute information for effect-size inferences when X-Covariates are assumed to influence Treatment choice or Exposure level but otherwise have no direct effects on y-Outcomes. Finally, a "Most-Like-Me" function provides histograms of effect-size distributions to aid Doctor-Patient (or Researcher-Society) communications about Heterogeneous Outcomes. Obenchain and Young (2013) <doi:10.1080/15598608.2013.772821>; Obenchain, Young and Krstic (2019) <doi:10.1016/j.yrtph.2019.104418>.

Maintained by Bob Obenchain. Last updated 1 years ago.

3.3 match 1.00 score 2 scripts

cran

SCBiclust:Identifies Mean, Variance, and Hierarchically Clustered Biclusters

Identifies a bicluster, a submatrix of the data such that the features and observations within the submatrix differ from those not contained in submatrix, using a two-step method. In the first step, observations in the bicluster are identified to maximize the sum of weighted between cluster feature differences. The method is described in Helgeson et al. (2020) <doi:10.1111/biom.13136>. 'SCBiclust' can be used to identify biclusters which differ based on feature means, feature variances, or more general differences.

Maintained by Erika S. Helgeson. Last updated 3 years ago.

3.3 match 1.00 score

haydarde

CryptRndTest:Statistical Tests for Cryptographic Randomness

Performs cryptographic randomness tests on a sequence of random integers or bits. Included tests are greatest common divisor, birthday spacings, book stack, adaptive chi-square, topological binary, and three random walk tests (Ryabko and Monarev, 2005) <doi:10.1016/j.jspi.2004.02.010>. Tests except greatest common divisor and birthday spacings are not covered by standard test suites. In addition to the chi-square goodness-of-fit test, results of Anderson-Darling, Kolmogorov-Smirnov, and Jarque-Bera tests are also generated by some of the cryptographic randomness tests.

Maintained by Haydar Demirhan. Last updated 3 years ago.

1.0 match 2.20 score 16 scripts

guven-code

fitmix:Finite Mixture Model Fitting of Lifespan Datasets

Fits the lifespan datasets of biological systems such as yeast, fruit flies, and other similar biological units with well-known finite mixture models introduced by Farewell et al. (1982) <doi:10.2307/2529885> and Al-Hussaini et al. (2000) <doi:10.1080/00949650008812033>. Estimates parameter space fitting of a lifespan dataset with finite mixtures of parametric distributions. Computes the following tasks; 1) Estimates parameter space of the finite mixture model by implementing the expectation maximization (EM) algorithm. 2) Finds a sequence of four goodness-of-fit measures consist of Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Kolmogorov-Smirnov (KS), and log-likelihood (log-likelihood) statistics. 3)The initial values is determined by k-means clustering.

Maintained by Emine Guven. Last updated 4 years ago.

1.0 match 2.00 score

xiaomangmang

iZID:Identify Zero-Inflated Distributions

Computes bootstrapped Monte Carlo estimate of p value of Kolmogorov-Smirnov (KS) test and likelihood ratio test for zero-inflated count data, based on the work of Aldirawi et al. (2019) <doi:10.1109/BHI.2019.8834661>. With the package, user can also find tools to simulate random deviates from zero inflated or hurdle models and obtain maximum likelihood estimate of unknown parameters in these models.

Maintained by Lei Wang. Last updated 5 years ago.

1.0 match 1.70 score 6 scripts

wpihongzhang

SetTest:Group Testing Procedures for Signal Detection and Goodness-of-Fit

It provides cumulative distribution function (CDF), quantile, p-value, statistical power calculator and random number generator for a collection of group-testing procedures, including the Higher Criticism tests, the one-sided Kolmogorov-Smirnov tests, the one-sided Berk-Jones tests, the one-sided phi-divergence tests, etc. The input are a group of p-values. The null hypothesis is that they are i.i.d. Uniform(0,1). In the context of signal detection, the null hypothesis means no signals. In the context of the goodness-of-fit testing, which contrasts a group of i.i.d. random variables to a given continuous distribution, the input p-values can be obtained by the CDF transformation. The null hypothesis means that these random variables follow the given distribution. For reference, see [1]Hong Zhang, Jiashun Jin and Zheyang Wu. "Distributions and power of optimal signal-detection statistics in finite case", IEEE Transactions on Signal Processing (2020) 68, 1021-1033; [2] Hong Zhang and Zheyang Wu. "The general goodness-of-fit tests for correlated data", Computational Statistics & Data Analysis (2022) 167, 107379.

Maintained by Hong Zhang. Last updated 8 months ago.

1.0 match 1.32 score 21 scripts

cran

SOPIE:Non-Parametric Estimation of the Off-Pulse Interval of a Pulsar

Provides functions to non-parametrically estimate the off-pulse interval of a source function originating from a pulsar. The technique is based on a sequential application of P-values obtained from goodness-of-fit tests for the uniform distribution, such as the Kolmogorov-Smirnov, Cramer-von Mises, Anderson-Darling and Rayleigh goodness-of-fit tests.

Maintained by Willem Daniel Schutte. Last updated 3 years ago.

1.0 match 1.00 score

cran

stodom:Estimating Consistent Tests for Stochastic Dominance

Stochastic dominance tests help ranking different distributions. The package implements the consistent test for stochastic dominance by Barrett and Donald (2003) <doi:10.1111/1468-0262.00390>. Specifically, it implements Barrett and Donald's Kolmogorov-Smirnov type tests for first- and second-order stochastic dominance based on bootstrapping 2 and 1.

Maintained by Sergei Schaub. Last updated 1 years ago.

1.0 match 1.00 score

markusboenn

fitteR:Fit Hundreds of Theoretical Distributions to Empirical Data

Systematic fit of hundreds of theoretical univariate distributions to empirical data via maximum likelihood estimation. Fits are reported and summarized by a data.frame, a csv file or a 'shiny' app (here with additional features like visual representation of fits). All output formats provide assessment of goodness-of-fit by the following methods: Kolmogorov-Smirnov test, Shapiro-Wilks test, Anderson-Darling test.

Maintained by Markus Boenn. Last updated 3 years ago.

1.0 match 1.00 score 6 scripts

leilamarvian

Hassani.Silva:A Test for Comparing the Predictive Accuracy of Two Sets of Forecasts

A non-parametric test founded upon the principles of the Kolmogorov-Smirnov (KS) test, referred to as the KS Predictive Accuracy (KSPA) test. The KSPA test is able to serve two distinct purposes. Initially, the test seeks to determine whether there exists a statistically significant difference between the distribution of forecast errors, and secondly it exploits the principles of stochastic dominance to determine whether the forecasts with the lower error also reports a stochastically smaller error than forecasts from a competing model, and thereby enables distinguishing between the predictive accuracy of forecasts. KSPA test has been described in : Hassani and Silva (2015) <doi:10.3390/econometrics3030590>.

Maintained by Leila Marvian Mashhad. Last updated 2 years ago.

1.0 match 1 stars 1.00 score

niloufardousti

AZIAD:Analyzing Zero-Inflated and Zero-Altered Data

Description: Computes maximum likelihood estimates of general, zero-inflated, and zero-altered models for discrete and continuous distributions. It also performs Kolmogorov-Smirnov (KS) tests and likelihood ratio tests for general, zero-inflated, and zero-altered data. Additionally, it obtains the inverse of the Fisher information matrix and confidence intervals for the parameters of general, zero-inflated, and zero-altered models. The package simulates random deviates from zero-inflated or hurdle models to obtain maximum likelihood estimates. Based on the work of Aldirawi et al. (2022) <doi:10.1007/s42519-021-00230-y> and Dousti Mousavi et al. (2023) <doi:10.1080/00949655.2023.2207020>.

Maintained by Niloufar Dousti Mousavi. Last updated 11 months ago.

1.0 match 1.00 score

imranshakoor

ComRiskModel:Fitting of Complementary Risk Models

Evaluates the probability density function (PDF), cumulative distribution function (CDF), quantile function (QF), random numbers and maximum likelihood estimates (MLEs) of well-known complementary binomial-G, complementary negative binomial-G and complementary geometric-G families of distributions taking baseline models such as exponential, extended exponential, Weibull, extended Weibull, Fisk, Lomax, Burr-XII and Burr-X. The functions also allow computing the goodness-of-fit measures namely the Akaike-information-criterion (AIC), the Bayesian-information-criterion (BIC), the minimum value of the negative log-likelihood (-2L) function, Anderson-Darling (A) test, Cramer-Von-Mises (W) test, Kolmogorov-Smirnov test, P-value and convergence status. Moreover, some commonly used data sets from the fields of actuarial, reliability, and medical science are also provided. Related works include: a) Tahir, M. H., & Cordeiro, G. M. (2016). Compounding of distributions: a survey and new generalized classes. Journal of Statistical Distributions and Applications, 3, 1-35. <doi:10.1186/s40488-016-0052-1>.

Maintained by Muhammad Imran. Last updated 2 years ago.

1.0 match 1.00 score

allegropiano

GLDreg:Fit GLD Regression/Quantile/AFT Model to Data

Owing to the rich shapes of Generalised Lambda Distributions (GLDs), GLD standard/quantile/Accelerated Failure Time (AFT) regression is a competitive flexible model compared to standard/quantile/AFT regression. The proposed method has some major advantages: 1) it provides a reference line which is very robust to outliers with the attractive property of zero mean residuals and 2) it gives a unified, elegant quantile regression model from the reference line with smooth regression coefficients across different quantiles. For AFT model, it also eliminates the needs to try several different AFT models, owing to the flexible shapes of GLD. The goodness of fit of the proposed model can be assessed via QQ plots and Kolmogorov-Smirnov tests and data driven smooth test, to ensure the appropriateness of the statistical inference under consideration. Statistical distributions of coefficients of the GLD regression line are obtained using simulation, and interval estimates are obtained directly from simulated data. References include the following: Su (2015) "Flexible Parametric Quantile Regression Model" <doi:10.1007/s11222-014-9457-1>, Su (2021) "Flexible parametric accelerated failure time model"<doi:10.1080/10543406.2021.1934854>.

Maintained by Steve Su. Last updated 1 years ago.

1.0 match 1 stars 1.00 score 8 scripts