Showing 162 of 162 results

sbgraves237

Ecdat: Data Sets for Econometrics

Data sets for econometrics, including political science.
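Because this is a data-only package, usage is just loading a data set. A minimal sketch, assuming the well-known Crime panel (Cornwell & Trumbull) is still shipped under that name:

```r
# Load one bundled data set without attaching the whole package.
# The data set name "Crime" is an assumption based on the package's documented contents.
data("Crime", package = "Ecdat")
str(Crime)   # inspect the county-level panel structure
```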

Maintained by Spencer Graves. Last updated 4 months ago.

7.2 match 2 stars 7.25 score 740 scripts 3 dependents

yuimaproject

yuima: The YUIMA Project Package for SDEs

Simulation and Inference for SDEs and Other Stochastic Processes.
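A minimal simulate-then-estimate sketch, assuming the setModel()/setSampling()/qmle() workflow from the package documentation:

```r
library(yuima)

# One-dimensional Ornstein-Uhlenbeck-type SDE: dX_t = -theta * X_t dt + sigma dW_t
mod  <- setModel(drift = "-theta*x", diffusion = "sigma", solve.variable = "x")
samp <- setSampling(Terminal = 1, n = 1000)
yui  <- setYuima(model = mod, sampling = samp)

# Simulate one path under known parameters, then recover them by quasi-maximum likelihood.
sim <- simulate(yui, true.parameter = list(theta = 1, sigma = 0.5), xinit = 1)
fit <- qmle(sim, start = list(theta = 0.5, sigma = 1),
            lower = list(theta = 0.01, sigma = 0.01), method = "L-BFGS-B")
summary(fit)
```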

Maintained by Stefano M. Iacus. Last updated 3 days ago.

openblas, cpp

5.3 match 9 stars 7.26 score 92 scripts 2 dependents

r-forge

surveillance: Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena

Statistical methods for the modeling and monitoring of time series of counts, proportions and categorical data, as well as for the modeling of continuous-time point processes of epidemic phenomena. The monitoring methods focus on aberration detection in count data time series from public health surveillance of communicable diseases, but applications could just as well originate from environmetrics, reliability engineering, econometrics, or social sciences. The package implements many typical outbreak detection procedures such as the (improved) Farrington algorithm, or the negative binomial GLR-CUSUM method of Hoehle and Paul (2008) <doi:10.1016/j.csda.2008.02.015>. A novel CUSUM approach combining logistic and multinomial logistic modeling is also included. The package contains several real-world data sets and provides the ability to simulate outbreak data and to visualize monitoring results in a temporal, spatial or spatio-temporal fashion. A recent overview of the available monitoring procedures is given by Salmon et al. (2016) <doi:10.18637/jss.v070.i10>. For the retrospective analysis of epidemic spread, the package provides three endemic-epidemic modeling frameworks with tools for visualization, likelihood inference, and simulation. hhh4() estimates models for (multivariate) count time series following Paul and Held (2011) <doi:10.1002/sim.4177> and Meyer and Held (2014) <doi:10.1214/14-AOAS743>. twinSIR() models the susceptible-infectious-recovered (SIR) event history of a fixed population, e.g., epidemics across farms or networks, as a multivariate point process as proposed by Hoehle (2009) <doi:10.1002/bimj.200900050>. twinstim() estimates self-exciting point process models for a spatio-temporal point pattern of infective events, e.g., time-stamped geo-referenced surveillance data, as proposed by Meyer et al. (2012) <doi:10.1111/j.1541-0420.2011.01684.x>. A recent overview of the implemented space-time modeling frameworks for epidemic phenomena is given by Meyer et al. (2017) <doi:10.18637/jss.v077.i11>.
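A minimal monitoring sketch, assuming the salmonella.agona example data and the farringtonFlexible() interface shown in the package documentation:

```r
library(surveillance)

# Classic Salmonella Agona weekly counts ship as a legacy "disProg" object;
# convert to the modern "sts" class and run the improved Farrington algorithm.
data("salmonella.agona")
salm <- disProg2sts(salmonella.agona)

mon <- farringtonFlexible(salm,
                          control = list(range = 250:nrow(observed(salm)),  # weeks to monitor
                                         b = 4,   # years of historical baseline
                                         w = 3))  # half-width of the reference window
plot(mon)  # observed counts with alarm thresholds
```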

Maintained by Sebastian Meyer. Last updated 2 days ago.

cpp

3.5 match 2 stars 10.68 score 446 scripts 3 dependents

carloscinelli

benford.analysis: Benford Analysis for Data Validation and Forensic Analytics

Provides tools that make it easier to validate data using Benford's Law.
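A minimal sketch, assuming the benford()/getSuspects() interface and the corporate.payment example data described in the package documentation:

```r
library(benford.analysis)

# Fit the first-two-digits Benford distribution to a payments column and flag deviations.
# The corporate.payment data set and its Amount column are assumptions from the package examples.
data("corporate.payment")
bfd <- benford(corporate.payment$Amount, number.of.digits = 2)

plot(bfd)                                   # observed digit distribution vs. Benford's Law
head(getSuspects(bfd, corporate.payment))   # records falling in the most deviant digit groups
```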

Maintained by Carlos Cinelli. Last updated 6 years ago.

5.7 match 62 stars 5.66 score 74 scripts

alanarnholt

BSDA: Basic Statistics and Data Analysis

Data sets for the book "Basic Statistics and Data Analysis" by Larry J. Kitchens.

Maintained by Alan T. Arnholt. Last updated 2 years ago.

3.4 match 7 stars 9.11 score 1.3k scripts 6 dependents

skeydan

torchaudio: R Interface to 'pytorch's 'torchaudio'

Provides access to datasets, models and processing facilities for deep learning in audio.

Maintained by Sigrid Keydana. Last updated 2 years ago.

7.8 match 3.46 score 58 scripts

rspatial

geosphere: Spherical Trigonometry

Spherical trigonometry for geographic applications. That is, compute distances and related measures for angular (longitude/latitude) locations.
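A minimal sketch of the distance and bearing helpers; points are c(longitude, latitude) in degrees:

```r
library(geosphere)

paris <- c(2.3522, 48.8566)    # lon, lat
nyc   <- c(-74.0060, 40.7128)

distHaversine(paris, nyc)  # great-circle distance on a sphere, in metres
distGeo(paris, nyc)        # distance on the WGS84 ellipsoid, in metres
bearing(paris, nyc)        # initial bearing from Paris towards New York, in degrees
```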

Maintained by Robert J. Hijmans. Last updated 6 months ago.

cpp

1.8 match 36 stars 13.79 score 5.7k scripts 116 dependents

nepem-ufsc

metan: Multi Environment Trials Analysis

Performs stability analysis of multi-environment trial data using parametric and non-parametric methods. Parametric methods include Additive Main Effects and Multiplicative Interaction (AMMI) analysis by Gauch (2013) <doi:10.2135/cropsci2013.04.0241>, Ecovalence by Wricke (1965), Genotype plus Genotype-Environment (GGE) biplot analysis by Yan & Kang (2003) <doi:10.1201/9781420040371>, geometric adaptability index by Mohammadi & Amri (2008) <doi:10.1007/s10681-007-9600-6>, joint regression analysis by Eberhart & Russell (1966) <doi:10.2135/cropsci1966.0011183X000600010011x>, genotypic confidence index by Annicchiarico (1992), Murakami & Cruz's (2004) method, power law residuals (POLAR) statistics by Doring et al. (2015) <doi:10.1016/j.fcr.2015.08.005>, scale-adjusted coefficient of variation by Doring & Reckling (2018) <doi:10.1016/j.eja.2018.06.007>, stability variance by Shukla (1972) <doi:10.1038/hdy.1972.87>, weighted average of absolute scores by Olivoto et al. (2019a) <doi:10.2134/agronj2019.03.0220>, and multi-trait stability index by Olivoto et al. (2019b) <doi:10.2134/agronj2019.03.0221>. Non-parametric methods include the superiority index by Lin & Binns (1988) <doi:10.4141/cjps88-018>, nonparametric measures of phenotypic stability by Huehn (1990) <doi:10.1007/BF00024241>, and the TOP third statistic by Fox et al. (1990) <doi:10.1007/BF00040364>. Functions for computing biometrical analyses such as path analysis, canonical correlation, partial correlation, and clustering analysis, as well as tools for inspecting, manipulating, summarizing and plotting typical multi-environment trial data, are also provided.
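A minimal sketch of one of the stability methods, assuming the bundled data_ge example data (columns ENV, GEN, REP, GY) and the waasb() interface:

```r
library(metan)

# Weighted Average of Absolute Scores (WAASB) stability analysis on the example
# multi-environment trial data; column names are assumptions from the package docs.
model <- waasb(data_ge, env = ENV, gen = GEN, rep = REP, resp = GY)

# Joint view of mean performance (GY) versus stability (WAASB)
plot_scores(model, type = 3)
```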

Maintained by Tiago Olivoto. Last updated 9 days ago.

2.3 match 2 stars 9.48 score 1.3k scripts 2 dependents

fdzul

denhotspots: A Package to Calculate Gi and Hi Local Spatial Statistics

The denhotspots package calculates the Gi and Hi local spatial statistics for areal data.

Maintained by The package maintainer. Last updated 6 days ago.

4.0 match 1 star 4.38 score 6 scripts

larssnip

micropan: Microbial Pan-Genome Analysis

A collection of functions for computations and visualizations of microbial pan-genomes.
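A minimal sketch of two pan-genome summaries, using a toy presence/absence pan-matrix; heaps() and fluidity() are assumed to accept a plain genomes-by-gene-clusters matrix as in the package documentation:

```r
library(micropan)

# Toy pan-matrix: 4 genomes (rows) x 6 gene clusters (columns), 1 = cluster present.
pm <- rbind(GID1 = c(1, 1, 1, 0, 1, 0),
            GID2 = c(1, 1, 0, 1, 0, 0),
            GID3 = c(1, 0, 1, 1, 0, 1),
            GID4 = c(1, 1, 1, 0, 0, 0))

heaps(pm, n.perm = 100)   # Heaps' law fit: alpha < 1 suggests an open pan-genome
fluidity(pm, n.sim = 10)  # genomic fluidity (average gene-content dissimilarity)
```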

Maintained by Lars Snipen. Last updated 3 years ago.

2.0 match 21 stars 6.15 score 67 scripts

cran

HTMLUtils: Facilitates Automated HTML Report Creation

Facilitates automated HTML report creation, in particular framed HTML pages and dynamically sortable tables.

Maintained by "Markus Loecher, Berlin School of Economics and Law (BSEL)". Last updated 1 years ago.

7.4 match 1.48 score 9 scripts 1 dependent

insightsengineering

osprey: R Package to Create TLGs

Community effort to collect TLG (tables, listings, graphs) code and create a catalogue.

Maintained by Nina Qi. Last updated 20 days ago.

catalog, graphs, listings, nest, tables

2.0 match 4 stars 5.41 score 1 dependent

flr

FLSAM: An Implementation of the State-Space Assessment Model for FLR

This package provides an FLR wrapper to the SAM state-space assessment model.

Maintained by N.T. Hintzen. Last updated 3 months ago.

2.0 match 4 stars 4.51 score 406 scripts

rwrandles

washex: Washington State Legislative Explorer

Gets data from the Washington State Legislature.

Maintained by Rohnin Randles. Last updated 3 years ago.

1.9 match 2.70 score 2 scripts

ropenspain

senadoRES: Information About the Senate of Spain

Retrieve and parse information about the Senate of Spain.

Maintained by Lluís Revilla Sancho. Last updated 3 months ago.

government-data, spain, transparency

2.3 match 1 star 2.18 score 3 scripts

king-county-restaurant-grading

QuantileGradeR: Quantile-Adjusted Restaurant Grading

Implementation of the food safety restaurant grading system adopted by Public Health - Seattle & King County (see Ashwood, Z.C., Elias, B., and Ho, D.E. "Improving the Reliability of Food Safety Disclosure: A Quantile Adjusted Restaurant Grading System for Seattle-King County" (working paper)). As reported in the accompanying paper, this package allows jurisdictions to easily implement refinements that address common challenges with unadjusted grading systems. First, in contrast to unadjusted grading, where the most recent single routine inspection is the primary determinant of a grade, grading inputs are allowed to be flexible. For instance, it is straightforward to base the grade on average inspection scores across multiple inspection cycles. Second, the package can identify quantile cutoffs by inputting substantively meaningful regulatory thresholds (e.g., the proportion of establishments receiving sufficient violation points to warrant a return visit). Third, the quantile adjustment equalizes the proportion of establishments in a flexible number of grading categories (e.g., A/B/C) across areas (e.g., ZIP codes, inspector areas) to account for inspector differences. Fourth, the package implements a refined quantile adjustment that addresses two limitations of the stats::quantile() function when applied to inspection score datasets with large numbers of score ties. The quantile adjustment algorithm iterates over quantiles until, over all restaurants in all areas, grading proportions are within a tolerance of the desired global proportions. In addition, the package allows a definition of "quantile" modified from the "Nearest Rank" definition: instead of requiring that at least p[1]% of restaurants receive the top grade and at least (p[1]+p[2])% of restaurants receive the top or second-best grade for quantiles p, the algorithm searches for cutoffs so that as close as possible to p[1]% of restaurants receive the top grade, and as close as possible to p[2]% of restaurants receive the second-highest grade.
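The basic area-level quantile idea (before the refinements for score ties described above) can be sketched in base R; this is an illustration of the approach with made-up data, not the package's own interface:

```r
# Illustrative only: equalize A/B/C proportions across areas by setting grade
# cutoffs at area-specific quantiles of the inspection score (lower score = better).
set.seed(1)
insp <- data.frame(area  = rep(c("zip1", "zip2", "zip3"), each = 200),
                   score = rpois(600, lambda = rep(c(5, 10, 20), each = 200)))

target <- c(A = 0.50, B = 0.35, C = 0.15)   # desired global grade proportions

grade_area <- function(s, p = target) {
  cuts <- quantile(s, probs = cumsum(p)[-length(p)], type = 1)  # "nearest rank" style
  cut(s, breaks = c(-Inf, cuts, Inf), labels = names(p))
}

insp$grade <- NA_character_
for (a in unique(insp$area)) {
  idx <- insp$area == a
  insp$grade[idx] <- as.character(grade_area(insp$score[idx]))
}
prop.table(table(insp$area, insp$grade), margin = 1)  # roughly equal shares per area
```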

Maintained by Zoe Ashwood. Last updated 8 years ago.

2.5 match 1.00 score 2 scripts

vivid225

planningML: A Sample Size Calculator for Machine Learning Applications in Healthcare

Advances in automated document classification have led to identifying massive numbers of clinical concepts from handwritten clinical notes. These high-dimensional clinical concepts can serve as highly informative predictors in building classification algorithms for identifying patients with different clinical conditions, commonly referred to as patient phenotyping. However, from a planning perspective, it is critical to ensure that enough data is available for the phenotyping algorithm to obtain a desired classification performance. This challenge in sample size planning is further exacerbated by the high dimension of the feature space and the inherent imbalance of the response class. Currently available sample size planning methods can be categorized into: (i) model-based approaches that predict the sample size required for achieving a desired accuracy using a linear machine learning classifier and (ii) learning curve-based approaches (Figueroa et al. (2012) <doi:10.1186/1472-6947-12-8>) that fit an inverse power law curve to pilot data to extrapolate performance. We develop model-based approaches for imbalanced data with correlated features, deriving sample size formulas for performance metrics that are sensitive to class imbalance such as the Area Under the receiver operating characteristic Curve (AUC) and the Matthews Correlation Coefficient (MCC). This is done using a two-step approach where we first perform feature selection using the innovated Higher Criticism thresholding method (Hall and Jin (2010) <doi:10.1214/09-AOS764>), then determine the sample size by optimizing the two performance metrics. Further, we develop software in the form of an R package named 'planningML' and an 'R' 'Shiny' app to facilitate the convenient implementation of the developed model-based approaches and learning curve approaches for imbalanced data. We apply our methods to the problem of phenotyping rare outcomes using the MIMIC-III electronic health record database. We show that our developed methods, which relate training data size and performance on AUC and MCC, can predict the true or observed performance from linear ML classifiers such as LASSO and SVM at different training data sizes. Therefore, in high-dimensional classification analysis with imbalanced data and correlated features, our approach can efficiently and accurately determine the sample size needed for machine-learning-based classification.
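The learning-curve idea referenced above (fit an inverse power law to pilot performance and extrapolate) can be sketched with base R; this is a generic illustration with made-up numbers, not the planningML interface:

```r
# Pilot results: classifier AUC measured at a few training-set sizes (made-up data).
pilot <- data.frame(n   = c(100, 200, 400, 800),
                    auc = c(0.66, 0.71, 0.74, 0.76))

# Inverse power law learning curve, AUC(n) ~ a - b * n^(-c), as in Figueroa et al. (2012).
fit <- nls(auc ~ a - b * n^(-c), data = pilot,
           start = list(a = 0.85, b = 2, c = 0.5))

# Extrapolate: smallest training size predicted to reach a target AUC of 0.78.
n_grid <- seq(500, 20000, by = 100)
pred   <- predict(fit, newdata = data.frame(n = n_grid))
min(n_grid[pred >= 0.78])
```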

Maintained by Xinying Fang. Last updated 2 years ago.

0.5 match 1 star 2.00 score 2 scripts