R-universe search: anscombe

paulnorthrop

anscombiser:Create Datasets with Identical Summary Statistics

Anscombe's quartet is a set of four two-variable datasets that share several summary statistics but have very different joint distributions. This becomes apparent when the data are plotted, which illustrates the importance of using graphical displays in statistics. This package enables the creation of datasets that have identical marginal sample means and sample variances, sample correlation, least squares regression coefficients and coefficient of determination. The user supplies an initial dataset, which is shifted, scaled and rotated in order to achieve target summary statistics. The general shape of the initial dataset is retained. The target statistics can be supplied directly or calculated based on a user-supplied dataset. The 'datasauRus' package <https://cran.r-project.org/package=datasauRus> provides further examples of datasets that have markedly different scatter plots but share many sample summary statistics.

Maintained by Paul J. Northrop. Last updated 2 years ago.

anscombe anscombes-quartet anscombesquartet

36.8 match 11 stars 4.74 score 9 scripts
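
The 'anscombiser' description says an initial dataset is shifted, scaled and rotated until it reaches target summary statistics. The sketch below shows one way such a transformation could work in Python: whiten the data with the Cholesky factor of its own sample covariance, then re-colour it with the factor of the target covariance. This is an illustrative assumption, not the package's actual algorithm or API; `match_stats` and its helpers are hypothetical names.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov2(x, y):
    """Sample variances and covariance (n - 1 denominator) as (sxx, syy, sxy)."""
    n, mx, my = len(x), mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return sxx, syy, sxy

def chol2(sxx, syy, sxy):
    """Lower-triangular Cholesky factor of a 2x2 covariance matrix."""
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(syy - l21 ** 2)
    return l11, l21, l22

def match_stats(x, y, target_mean, target_cov):
    """Linearly map (x, y) so its sample mean and covariance equal the targets."""
    mx, my = mean(x), mean(y)
    a11, a21, a22 = chol2(*cov2(x, y))   # factor of the current covariance
    t11, t21, t22 = chol2(*target_cov)   # factor of the target covariance
    out_x, out_y = [], []
    for xi, yi in zip(x, y):
        # Whiten: forward-substitute to undo the current covariance structure.
        z1 = (xi - mx) / a11
        z2 = ((yi - my) - a21 * z1) / a22
        # Re-colour with the target factor and shift to the target mean.
        out_x.append(target_mean[0] + t11 * z1)
        out_y.append(target_mean[1] + t21 * z1 + t22 * z2)
    return out_x, out_y

# Initial dataset: a parabola. Targets (mean, then sxx, syy, sxy) are chosen
# to be close to the summary statistics of Anscombe's quartet.
x0 = [float(i) for i in range(-5, 6)]
y0 = [v ** 2 for v in x0]
new_x, new_y = match_stats(x0, y0, (9.0, 7.5), (11.0, 4.127, 5.5))
print(cov2(new_x, new_y))
```

Because the map is linear, the transformed points keep the general shape of the parabola while taking on the requested means, variances and covariance; the regression slope and correlation then follow from those values.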

r-causal

quartets:Datasets to Help Teach Statistics

In the spirit of Anscombe's quartet, this package includes datasets that demonstrate the importance of visualizing your data, the importance of not relying on statistical summary measures alone, and why additional assumptions about the data generating mechanism are needed when estimating causal effects. The package includes "Anscombe's Quartet" (Anscombe 1973) <doi:10.1080/00031305.1973.10478966>, D'Agostino McGowan & Barrett (2023) "Causal Quartet" <doi:10.1080/26939169.2023.2276446>, "Datasaurus Dozen" (Matejka & Fitzmaurice 2017), "Interaction Triptych" (Rohrer & Arslan 2021) <doi:10.1177/25152459211007368>, "Rashomon Quartet" (Biecek et al. 2023) <doi:10.48550/arXiv.2302.13356>, and Gelman "Variation and Heterogeneity Causal Quartets" (Gelman et al. 2023) <doi:10.48550/arXiv.2302.12878>.

Maintained by Lucy DAgostino McGowan. Last updated 1 year ago.

19.8 match 42 stars 5.75 score 27 scripts
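
To make the point about summary measures concrete, the following Python sketch computes the means, variances, correlation and least squares line for the four datasets of Anscombe's quartet (Anscombe 1973). The values are typed in directly rather than loaded from the 'quartets' package, and `summarize` is a hypothetical helper written for this illustration.

```python
import math

# Anscombe's quartet; datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(x, y):
    """Return the summary statistics that the quartet is known to share."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": sxx / (n - 1),
        "var_y": syy / (n - 1),
        "r": sxy / math.sqrt(sxx * syy),
        "slope": slope,
        "intercept": my - slope * mx,
    }

results = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}
for name, stats in results.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals the four rows are nearly indistinguishable, even though scatter plots of the same data look nothing alike.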

r-forge

surveillance:Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena

Statistical methods for the modeling and monitoring of time series of counts, proportions and categorical data, as well as for the modeling of continuous-time point processes of epidemic phenomena. The monitoring methods focus on aberration detection in count data time series from public health surveillance of communicable diseases, but applications could just as well originate from environmetrics, reliability engineering, econometrics, or social sciences. The package implements many typical outbreak detection procedures such as the (improved) Farrington algorithm, or the negative binomial GLR-CUSUM method of Hoehle and Paul (2008) <doi:10.1016/j.csda.2008.02.015>. A novel CUSUM approach combining logistic and multinomial logistic modeling is also included. The package contains several real-world data sets, the ability to simulate outbreak data, and to visualize the results of the monitoring in a temporal, spatial or spatio-temporal fashion. A recent overview of the available monitoring procedures is given by Salmon et al. (2016) <doi:10.18637/jss.v070.i10>. For the retrospective analysis of epidemic spread, the package provides three endemic-epidemic modeling frameworks with tools for visualization, likelihood inference, and simulation. hhh4() estimates models for (multivariate) count time series following Paul and Held (2011) <doi:10.1002/sim.4177> and Meyer and Held (2014) <doi:10.1214/14-AOAS743>. twinSIR() models the susceptible-infectious-recovered (SIR) event history of a fixed population, e.g., epidemics across farms or networks, as a multivariate point process as proposed by Hoehle (2009) <doi:10.1002/bimj.200900050>. twinstim() estimates self-exciting point process models for a spatio-temporal point pattern of infective events, e.g., time-stamped geo-referenced surveillance data, as proposed by Meyer et al. (2012) <doi:10.1111/j.1541-0420.2011.01684.x>. A recent overview of the implemented space-time modeling frameworks for epidemic phenomena is given by Meyer et al. (2017) <doi:10.18637/jss.v077.i11>.

Maintained by Sebastian Meyer. Last updated 3 days ago.

cpp

4.3 match 2 stars 10.68 score 446 scripts 3 dependents

luqqe

moments:Moments, Cumulants, Skewness, Kurtosis and Related Tests

Functions to calculate: moments, Pearson's kurtosis, Geary's kurtosis and skewness; tests related to them (Anscombe-Glynn, D'Agostino, Bonett-Seier).

Maintained by Lukasz Komsta. Last updated 3 years ago.

4.7 match 2 stars 9.34 score 4.8k scripts 123 dependents
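
The quantities named in the 'moments' description can be written down in a few lines. The Python sketch below uses the usual moment-ratio definitions with a divisor of n for the central moments; the function names are hypothetical, and the conventions of the package itself should be checked in its documentation.

```python
import math

def central_moment(x, k):
    """k-th central sample moment with divisor n."""
    m = sum(x) / len(x)
    return sum((v - m) ** k for v in x) / len(x)

def skewness(x):
    """Third central moment scaled by the second moment to the power 3/2."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5

def pearson_kurtosis(x):
    """Fourth central moment over the squared second moment (normal: about 3)."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2

def geary_kurtosis(x):
    """Mean absolute deviation over the (divisor-n) standard deviation."""
    m = sum(x) / len(x)
    mean_abs_dev = sum(abs(v - m) for v in x) / len(x)
    return mean_abs_dev / math.sqrt(central_moment(x, 2))

sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(skewness(sample), pearson_kurtosis(sample), geary_kurtosis(sample))
```

For the symmetric sample above the skewness is zero, and the Pearson kurtosis of 1.7 is well below 3, as expected for evenly spaced values with no tails.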

kenaho1

asbio:A Collection of Statistical Tools for Biologists

Contains functions from: Aho, K. (2014) Foundational and Applied Statistics for Biologists using R. CRC/Taylor and Francis, Boca Raton, FL, ISBN: 978-1-4398-7338-0.

Maintained by Ken Aho. Last updated 2 months ago.

4.5 match 5 stars 7.32 score 310 scripts 3 dependents

cran

binhf:Haar-Fisz Functions for Binomial Data

Binomial Haar-Fisz transforms for Gaussianization as in Nunes and Nason (2009).

Maintained by Matt Nunes. Last updated 7 years ago.

8.0 match 3.85 score 3 dependents

dcousin3

ANOPA:Analyses of Proportions using Anscombe Transform

Analyses of proportions can be performed on Anscombe (arcsine-related) transformed data. The 'ANOPA' package can analyze proportions obtained from up to four factors. The factors can be within-subject, between-subject, or a mix of both. The main, omnibus analysis can be followed by additive decompositions into interaction effects, main effects, simple effects, contrast effects, etc., mimicking precisely the logic of ANOVA. For that reason, we call this set of tools 'ANOPA' (Analysis of Proportion using Anscombe transform) to highlight its similarities with ANOVA. The 'ANOPA' framework also makes it easy to plot proportions along with confidence intervals. Finally, effect sizes and statistical power planning are easily computed under this framework. One particularity is that 'ANOPA' computes F statistics whose denominator has infinite degrees of freedom. See Laurencelle and Cousineau (2023) <doi:10.3389/fpsyg.2022.1045436>.

Maintained by Denis Cousineau. Last updated 2 months ago.

error-bars proportions statistical-testing statistics summary-statistics

7.2 match 1 stars 3.65 score 18 scripts
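
As a rough illustration of the transform behind 'ANOPA', the Python sketch below applies Anscombe's arcsine transform to a count of successes out of n trials and uses its approximate variance, 1 / (4(n + 1/2)), to compare two groups with a z statistic. The two-group comparison is a simplification chosen for this example, not the package's F-based analysis, and all function names are hypothetical.

```python
import math

def anscombe_transform(successes, n):
    """Anscombe's arcsine transform of a binomial count."""
    return math.asin(math.sqrt((successes + 3 / 8) / (n + 3 / 4)))

def anscombe_variance(n):
    """Approximate variance of the transformed score; it depends only on n."""
    return 1 / (4 * (n + 1 / 2))

def two_group_z(s1, n1, s2, n2):
    """z statistic for the difference between two transformed proportions."""
    diff = anscombe_transform(s1, n1) - anscombe_transform(s2, n2)
    return diff / math.sqrt(anscombe_variance(n1) + anscombe_variance(n2))

# Hypothetical counts: 18 successes out of 40 versus 30 out of 40.
z = two_group_z(18, 40, 30, 40)
print(round(z, 3))
```

The appeal of the transform is that the variance no longer depends on the unknown proportion, which is what lets the analysis proceed along ANOVA-like lines.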

tjfarrar

skedastic:Handling Heteroskedasticity in the Linear Regression Model

Implements numerous methods for testing for, modelling, and correcting for heteroskedasticity in the classical linear regression model. The most novel contribution of the package is found in the functions that implement the as-yet-unpublished auxiliary linear variance models and auxiliary nonlinear variance models that are designed to estimate error variances in a heteroskedastic linear regression model. These models follow principles of statistical learning described in Hastie (2009) <doi:10.1007/978-0-387-21606-5>. The nonlinear version of the model is estimated using quasi-likelihood methods as described in Seber and Wild (2003, ISBN: 0-471-47135-6). Bootstrap methods for approximate confidence intervals for error variances are implemented as described in Efron and Tibshirani (1993, ISBN: 978-1-4899-4541-9), including also the expansion technique described in Hesterberg (2014) <doi:10.1080/00031305.2015.1089789>. The wild bootstrap employed here follows the description in Davidson and Flachaire (2008) <doi:10.1016/j.jeconom.2008.08.003>. Tuning of hyper-parameters makes use of a golden section search function that is modelled after the MATLAB function of Zarnowiec (2022) <https://www.mathworks.com/matlabcentral/fileexchange/25919-golden-section-method-algorithm>. A methodological description of the algorithm can be found in Fox (2021, ISBN: 978-1-003-00957-3). There are 25 different functions that implement hypothesis tests for heteroskedasticity. 
These include a test based on Anscombe (1961) <https://projecteuclid.org/euclid.bsmsp/1200512155>, Ramsey's (1969) BAMSET Test <doi:10.1111/j.2517-6161.1969.tb00796.x>, the tests of Bickel (1978) <doi:10.1214/aos/1176344124>, Breusch and Pagan (1979) <doi:10.2307/1911963> with and without the modification proposed by Koenker (1981) <doi:10.1016/0304-4076(81)90062-2>, Carapeto and Holt (2003) <doi:10.1080/0266476022000018475>, Cook and Weisberg (1983) <doi:10.1093/biomet/70.1.1> (including their graphical methods), Diblasi and Bowman (1997) <doi:10.1016/S0167-7152(96)00115-0>, Dufour, Khalaf, Bernard, and Genest (2004) <doi:10.1016/j.jeconom.2003.10.024>, Evans and King (1985) <doi:10.1016/0304-4076(85)90085-5> and Evans and King (1988) <doi:10.1016/0304-4076(88)90006-1>, Glejser (1969) <doi:10.1080/01621459.1969.10500976> as formulated by Mittelhammer, Judge and Miller (2000, ISBN: 0-521-62394-4), Godfrey and Orme (1999) <doi:10.1080/07474939908800438>, Goldfeld and Quandt (1965) <doi:10.1080/01621459.1965.10480811>, Harrison and McCabe (1979) <doi:10.1080/01621459.1979.10482544>, Harvey (1976) <doi:10.2307/1913974>, Honda (1989) <doi:10.1111/j.2517-6161.1989.tb01749.x>, Horn (1981) <doi:10.1080/03610928108828074>, Li and Yao (2019) <doi:10.1016/j.ecosta.2018.01.001> with and without the modification of Bai, Pan, and Yin (2016) <doi:10.1007/s11749-017-0575-x>, Rackauskas and Zuokas (2007) <doi:10.1007/s10986-007-0018-6>, Simonoff and Tsai (1994) <doi:10.2307/2986026> with and without the modification of Ferrari, Cysneiros, and Cribari-Neto (2004) <doi:10.1016/S0378-3758(03)00210-6>, Szroeter (1978) <doi:10.2307/1913831>, Verbyla (1993) <doi:10.1111/j.2517-6161.1993.tb01918.x>, White (1980) <doi:10.2307/1912934>, Wilcox and Keselman (2006) <doi:10.1080/10629360500107923>, Yuce (2008) <https://dergipark.org.tr/en/pub/iuekois/issue/8989/112070>, and Zhou, Song, and Thompson (2015) <doi:10.1002/cjs.11252>. 
Besides these heteroskedasticity tests, there are supporting functions that compute the BLUS residuals of Theil (1965) <doi:10.1080/01621459.1965.10480851>, the conditional two-sided p-values of Kulinskaya (2008) <arXiv:0810.2124v1>, and probabilities for the nonparametric trend statistic of Lehmann (1975, ISBN: 0-816-24996-1). For handling heteroskedasticity, in addition to the new auxiliary variance model methods, there is a function to implement various existing Heteroskedasticity-Consistent Covariance Matrix Estimators from the literature, such as those of White (1980) <doi:10.2307/1912934>, MacKinnon and White (1985) <doi:10.1016/0304-4076(85)90158-7>, Cribari-Neto (2004) <doi:10.1016/S0167-9473(02)00366-3>, Cribari-Neto et al. (2007) <doi:10.1080/03610920601126589>, Cribari-Neto and da Silva (2011) <doi:10.1007/s10182-010-0141-2>, Aftab and Chang (2016) <doi:10.18187/pjsor.v12i2.983>, and Li et al. (2017) <doi:10.1080/00949655.2016.1198906>.

Maintained by Thomas Farrar. Last updated 1 year ago.

5.3 match 7 stars 4.60 score 73 scripts
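
Among the tests listed for 'skedastic' is Breusch and Pagan (1979) with the modification of Koenker (1981). A minimal Python sketch of that studentized version for a single regressor follows: fit the regression, regress the squared residuals on the same regressor, and refer n times the auxiliary R-squared to a chi-square distribution with one degree of freedom. This is a textbook-style illustration with hypothetical function names and made-up data, not the package's implementation.

```python
import math

def ols(x, y):
    """Intercept and slope of a simple least squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

def r_squared(x, y):
    """Coefficient of determination of the simple regression of y on x."""
    intercept, slope = ols(x, y)
    my = sum(y) / len(y)
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

def koenker_bp(x, y):
    """Studentized Breusch-Pagan statistic and p-value for one regressor."""
    intercept, slope = ols(x, y)
    sq_resid = [(b - intercept - slope * a) ** 2 for a, b in zip(x, y)]
    # Guard against a tiny negative value caused by floating-point rounding.
    lm = max(0.0, len(x) * r_squared(x, sq_resid))
    # Upper tail of chi-square(1): P(Z^2 > lm) = erfc(sqrt(lm / 2)).
    return lm, math.erfc(math.sqrt(lm / 2))

# Made-up data: error size grows with x in the first series only.
xs = list(range(1, 21))
signs = [1 if v % 2 == 0 else -1 for v in xs]
y_het = [2 + 3 * v + 0.1 * v * s for v, s in zip(xs, signs)]
y_hom = [2 + 3 * v + 1.0 * s for v, s in zip(xs, signs)]

lm_het, p_het = koenker_bp(xs, y_het)
lm_hom, p_hom = koenker_bp(xs, y_hom)
print(lm_het, p_het)
print(lm_hom, p_hom)
```

A large statistic (small p-value) indicates that the squared residuals vary systematically with the regressor, which is the signature of heteroskedasticity.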

mmaechler

sfsmisc:Utilities from 'Seminar fuer Statistik' ETH Zurich

Useful utilities ['goodies'] from Seminar fuer Statistik ETH Zurich, some of which were ported from S-plus in the 1990s. For graphics, have pretty (Log-scale) axes eaxis(), an enhanced Tukey-Anscombe plot, combining histogram and boxplot, 2d-residual plots, a 'tachoPlot()', pretty arrows, etc. For robustness, have a robust F test and robust range(). For system support, notably on Linux, provides 'Sys.*()' functions with more access to system and CPU information. Finally, miscellaneous utilities such as simple efficient prime numbers, integer codes, Duplicated(), toLatex.numeric() and is.whole().

Maintained by Martin Maechler. Last updated 5 months ago.

2.2 match 11 stars 10.87 score 566 scripts 119 dependents

svmiller

stevedata:Steve's Toy Data for Teaching About a Variety of Methodological, Social, and Political Topics

This is a collection of various kinds of data with broad uses for teaching. My students, and academics like me who teach the same topics I teach, should find this useful if their teaching workflow is also built around the R programming language. The applications are multiple but mostly cluster on topics of statistical methodology, international relations, and political economy.

Maintained by Steve Miller. Last updated 4 days ago.

4.0 match 8 stars 5.97 score 178 scripts

numbats

cassowaryr:Compute Scagnostics on Pairs of Numeric Variables in a Data Set

Computes a range of scatterplot diagnostics (scagnostics) on pairs of numerical variables in a data set. A range of scagnostics, including graph and association-based scagnostics described by Leland Wilkinson and Graham Wills (2008) <doi:10.1198/106186008X320465> and association-based scagnostics described by Katrin Grimm (2016, ISBN: 978-3-8439-3092-5) can be computed. Summary and plotting functions are provided.

Maintained by Harriet Mason. Last updated 12 days ago.

data-science data-visualization eda high-dimensional-data multivariate

3.5 match 3 stars 6.02 score 26 scripts 1 dependents

stephenturner

Tmisc:Turner Miscellaneous

Miscellaneous utility functions for data manipulation, data tidying, and working with gene expression data and biological sequence data.

Maintained by Stephen Turner. Last updated 11 months ago.

3.8 match 2 stars 5.44 score 174 scripts 1 dependents

ovgu-sh

desk:Didactic Econometrics Starter Kit

Written to help undergraduate as well as graduate students get started with R for basic econometrics without the need to import specific functions and datasets from many different sources. Primarily, the package is meant to accompany the German textbook Auer, L.v., Hoffmann, S., Kranz, T. (2024, ISBN: 978-3-662-68263-0), whose exercises cover all the topics of the textbook Auer, L.v. (2023, ISBN: 978-3-658-42699-6).

Maintained by Soenke Hoffmann. Last updated 11 months ago.

4.5 match 4.30 score 10 scripts

bioc

CSSQ:Chip-seq Signal Quantifier Pipeline

This package is designed to perform statistical analysis to identify statistically significant differentially bound regions between multiple groups of ChIP-seq datasets.

Maintained by Fan Lab at Georgia Institute of Technology. Last updated 5 months ago.

chipseq differentialpeakcalling sequencing normalization

3.5 match 4.00 score 1 scripts

hneth

ds4psy:Data Science for Psychologists

All datasets and functions required for the examples and exercises of the book "Data Science for Psychologists" (by Hansjoerg Neth, Konstanz University, 2023), freely available at <https://bookdown.org/hneth/ds4psy/>. The book and course introduce principles and methods of data science to students of psychology and other biological or social sciences. The 'ds4psy' package primarily provides datasets, but also functions for data generation and manipulation (e.g., of text and time data) and graphics that are used in the book and its exercises. All functions included in 'ds4psy' are designed to be explicit and instructive, rather than efficient or elegant.

Maintained by Hansjoerg Neth. Last updated 1 month ago.

data-literacy data-science education exploratory-data-analysis psychology social-sciences visualisation

1.7 match 22 stars 6.79 score 70 scripts

cseljatib

datana:Datasets and Functions to Accompany Analisis De Datos Con R

Datasets and functions to accompany the book 'Analisis de datos con el programa estadistico R: una introduccion aplicada' by Salas-Eljatib (2021, ISBN: 9789566086109). The package helps carry out data management, exploratory analyses, and model fitting.

Maintained by Christian Salas-Eljatib. Last updated 6 months ago.

6.8 match 1.30 score 1 scripts

r-forge

ROptEst:Optimally Robust Estimation

R infrastructure for optimally robust estimation in general smoothly parameterized models using S4 classes and methods as described in Kohl, M., Ruckdeschel, P., and Rieder, H. (2010), <doi:10.1007/s10260-010-0133-0>, and in Rieder, H., Kohl, M., and Ruckdeschel, P. (2008), <doi:10.1007/s10260-007-0047-7>.

Maintained by Matthias Kohl. Last updated 2 months ago.

2.0 match 4.26 score 50 scripts 1 dependents

sahirbhatnagar

ggmix:Variable Selection in Linear Mixed Models for SNP Data

Fit penalized multivariable linear mixed models with a single random effect to control for population structure in genetic association studies. The goal is to fit many genetic variants simultaneously in order to select markers that are independently associated with the response. Can also handle prior annotation information, for example, rare variants, in the form of variable weights. For more information, see the website below and the accompanying paper: Bhatnagar et al., "Simultaneous SNP selection and adjustment for population structure in high dimensional prediction models", 2020, <DOI:10.1371/journal.pgen.1008766>.

Maintained by Sahir Bhatnagar. Last updated 4 years ago.

1.3 match 10 stars 5.48 score 20 scripts

ericbarba

ExpDes.pt:Experimental Designs Package (Portuguese)

Package for the analysis of experimental designs (CRD, RBD and LSD), experiments in double factorial schemes (in CRD and RBD), split-plot experiments (in CRD and RBD), experiments in double factorial schemes with an additional treatment (in CRD and RBD), experiments in triple factorial schemes (in CRD and RBD) and experiments in triple factorial schemes with an additional treatment (in CRD and RBD), performing analysis of variance and multiple comparison of means (for qualitative treatments), or fitting regression models up to the third power (for quantitative treatments); residual analysis (Ferreira, Cavalcanti and Nogueira, 2014) <doi:10.4236/am.2014.519280>.

Maintained by Eric Batista Ferreira. Last updated 3 years ago.

1.7 match 3.52 score 232 scripts

bb-diesunddas

multivariance:Measuring Multivariate Dependence Using Distance Multivariance

Distance multivariance is a measure of dependence which can be used to detect and quantify dependence of arbitrarily many random vectors. The necessary functions are implemented in this package and examples are given. It includes: distance multivariance, distance multicorrelation, dependence structure detection, tests of independence and copula versions of distance multivariance based on the Monte Carlo empirical transform. Detailed references are given in the package description; as a starting point for the theoretical background we refer to: B. Böttcher, Dependence and Dependence Structures: Estimation and Visualization Using the Unifying Concept of Distance Multivariance. Open Statistics, Vol. 1, No. 1 (2020), <doi:10.1515/stat-2020-0001>.

Maintained by Björn Böttcher. Last updated 3 years ago.

cpp

4.0 match 1 stars 1.36 score 23 scripts

ericbarba

ExpDes:Experimental Designs Package

Package for analysis of simple experimental designs (CRD, RBD and LSD), experiments in double factorial schemes (in CRD and RBD), experiments in a split plot in time schemes (in CRD and RBD), experiments in double factorial schemes with an additional treatment (in CRD and RBD), experiments in triple factorial schemes (in CRD and RBD) and experiments in triple factorial schemes with an additional treatment (in CRD and RBD), performing the analysis of variance and means comparison by fitting regression models up to the third power (quantitative treatments) or by a multiple comparison test, Tukey test, test of Student-Newman-Keuls (SNK), Scott-Knott, Duncan test, t test (LSD) and Bonferroni t test (protected LSD) - for qualitative treatments; residual analysis (Ferreira, Cavalcanti and Nogueira, 2014) <doi:10.4236/am.2014.519280>.

Maintained by Eric Batista Ferreira. Last updated 3 years ago.

1.8 match 1 stars 2.86 score 73 scripts

daphnaharel

sur:Companion to "Statistics Using R: An Integrative Approach"

Access to the datasets and many of the functions used in "Statistics Using R: An Integrative Approach". These datasets include a subset of the National Education Longitudinal Study, the Framingham Heart Study, as well as several simulated datasets used in the examples throughout the textbook. The functions included in the package reproduce some of the functionality of 'Stata' that is not directly available in 'R'. The package also contains a tutorial on basic data frame management, including how to handle missing data.

Maintained by Daphna Harel. Last updated 5 years ago.

4.0 match 1.26 score 18 scripts

jmcurran

dafs:Data Analysis for Forensic Scientists

Data and miscellanea to support the book "Introduction to Data analysis with R for Forensic Scientists." This book was written by James Curran and published by CRC Press in 2010 (ISBN: 978-1-4200-8826-7).

Maintained by James Curran. Last updated 3 years ago.

4.5 match 1 stars 1.08 score 12 scripts

chandlerxiandeyang

CleaningValidation:Cleaning Validation Functions for Pharmaceutical Cleaning Process

Provides essential Cleaning Validation functions for complying with pharmaceutical cleaning process regulatory standards. The package includes non-parametric methods to analyze drug active-ingredient residue (DAR), cleaning agent residue (CAR), and microbial colonies (Mic) for non-Poisson distributions. Additionally, Poisson methods are provided for Mic analysis when Mic data follow a Poisson distribution.

Maintained by Xiande Yang. Last updated 10 months ago.

1.8 match 2.70 score

gabriellajg

boot.heterogeneity:A Bootstrap-Based Heterogeneity Test for Meta-Analysis

Implements a bootstrap-based heterogeneity test for standardized mean differences (d), Fisher-transformed Pearson's correlations (r), and natural-logarithm-transformed odds ratios (or) in meta-analysis studies. Depending on the presence of moderators, this Monte Carlo based test can be implemented in the random- or mixed-effects model. This package uses the rma() function from the R package 'metafor' to obtain parameter estimates and likelihoods, so installation of R package 'metafor' is required. This approach refers to the studies of Anscombe (1956) <doi:10.2307/2332926>, Haldane (1940) <doi:10.2307/2332614>, Hedges (1981) <doi:10.3102/10769986006002107>, Hedges & Olkin (1985, ISBN:978-0123363800), Silagy, Lancaster, Stead, Mant, & Fowler (2004) <doi:10.1002/14651858.CD000146.pub2>, Viechtbauer (2010) <doi:10.18637/jss.v036.i03>, and Zuckerman (1994, ISBN:978-0521432009).

Maintained by Ge Jiang. Last updated 3 years ago.

0.5 match 5.18 score 4 scripts 1 dependents

cran

lmerPerm:Perform Permutation Test on General Linear and Mixed Linear Regression

We provide a solution for performing permutation tests on linear and mixed linear regression models. It allows users to obtain accurate p-values without making distributional assumptions about the data. By generating a null distribution of the test statistics through repeated permutations of the response variable, permutation tests provide a powerful alternative to traditional parameter tests (Holt et al. (2023) <doi:10.1007/s10683-023-09799-6>). In this early version, we focus on permutation tests over the observed t values of beta coefficients, i.e., the original t values generated by parameter tests. After generating a null distribution of the test statistic through repeated permutations of the response variable, each observed t value is compared to the null distribution to generate a p-value. To improve efficiency, a stop criterion (Anscombe (1953) <doi:10.1111/j.2517-6161.1953.tb00121.x>) is adopted to force permutation to stop if the estimated standard deviation of the value falls below a fraction of the estimated p-value. By doing so, we avoid the need for massive calculations in exact permutation methods while still generating stable and accurate p-values.

Maintained by Wentao Zeng. Last updated 2 years ago.

0.5 match 1.00 score
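
The stopping idea in the 'lmerPerm' description can be sketched for a simple regression slope. In the Python example below, the response is permuted repeatedly, the running p-value estimate is updated, and permutation stops once the estimated standard error of that p-value drops below a chosen fraction of the p-value itself. The function names, the `fraction`, `min_perms` and `max_perms` parameters, and the data are all hypothetical; this is not the package's code, and it covers only an ordinary linear model.

```python
import math
import random

def slope_t(x, y):
    """t statistic of the slope in a simple least squares regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope / math.sqrt(ss_res / (n - 2) / sxx)

def permutation_p(x, y, fraction=0.1, min_perms=100, max_perms=2000, seed=1):
    """Two-sided permutation p-value with a sequential stopping rule."""
    rng = random.Random(seed)
    observed = abs(slope_t(x, y))
    shuffled = list(y)
    exceed = 0
    for done in range(1, max_perms + 1):
        rng.shuffle(shuffled)  # permute the response, keep x fixed
        if abs(slope_t(x, shuffled)) >= observed:
            exceed += 1
        p_hat = (exceed + 1) / (done + 1)  # add-one estimate, never zero
        se_p = math.sqrt(p_hat * (1 - p_hat) / done)
        # Stop early once the p-value estimate is precise enough.
        if done >= min_perms and se_p < fraction * p_hat:
            break
    return p_hat, done

xs = list(range(1, 13))
y_trend = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9, 22.2, 23.8]
y_flat = [1, 4, 2, 6, 3, 5, 5, 3, 6, 2, 4, 1]  # symmetric, so the slope is zero

p_trend, n_trend = permutation_p(xs, y_trend)
p_flat, n_flat = permutation_p(xs, y_flat)
print(p_trend, n_trend)
print(p_flat, n_flat)
```

Large p-values are pinned down after few permutations, so the rule saves the most work when there is clearly no effect; very small p-values need many more permutations before their relative precision is acceptable, which is why a cap such as `max_perms` is still needed.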