R-universe search: glm

hanase

BMA:Bayesian Model Averaging

Package for Bayesian model averaging and variable selection for linear models, generalized linear models and survival models (Cox regression).

Maintained by Hana Sevcikova. Last updated 2 months ago.

fortran

35.3 match 37 stars 9.38 score 152 scripts 14 dependents
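Bayesian model averaging weights each candidate model by its posterior probability and averages quantities of interest over those weights. The minimal Python sketch below assumes the BIC approximation with equal prior model probabilities, p(M_k | data) ∝ exp(-BIC_k / 2). The function names and BIC values are hypothetical, and this is not code from the BMA package.

```python
import math


def bic_model_weights(bics):
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior probabilities for all candidate models, so that
    p(M_k | data) is proportional to exp(-BIC_k / 2).
    """
    best = min(bics)  # shift by the smallest BIC to avoid overflow/underflow
    raw = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def inclusion_probability(includes, bics):
    """Posterior probability that a variable is in the model: the summed
    weight of every candidate model that contains it."""
    weights = bic_model_weights(bics)
    return sum(w for w, inc in zip(weights, includes) if inc)


def averaged_coefficient(estimates, bics):
    """Model-averaged coefficient; models that exclude the variable
    contribute an estimate of 0."""
    weights = bic_model_weights(bics)
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical BIC values for three candidate models
bics = [100.0, 102.0, 110.0]
print(bic_model_weights(bics))
print(inclusion_probability([True, False, True], bics))
```

Subtracting the smallest BIC before exponentiating leaves the normalized weights unchanged but keeps `math.exp` in a safe range.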

ethan-alt

hdbayes:Bayesian Analysis of Generalized Linear Models with Historical Data

User-friendly functions for leveraging (multiple) historical data set(s) for generalized linear models. The package contains functions for sampling from the posterior distribution of a generalized linear model using the prior induced by the Bayesian hierarchical model, power prior by Ibrahim and Chen (2000) <doi:10.1214/ss/1009212673>, normalized power prior by Duan et al. (2006) <doi:10.1002/env.752>, normalized asymptotic power prior by Ibrahim et al. (2015) <doi:10.1002/sim.6728>, commensurate prior by Hobbs et al. (2011) <doi:10.1111/j.1541-0420.2011.01564.x>, robust meta-analytic-predictive prior by Schmidli et al. (2014) <doi:10.1111/biom.12242>, the latent exchangeability prior by Alt et al. (2023) <doi:10.48550/arXiv.2303.05223>, and a normal (or half-normal) prior. Functions for computing the marginal log-likelihood under each of the implemented priors are also included. The package compiles all the 'CmdStan' models once during installation using the 'instantiate' package.

Maintained by Ethan M. Alt. Last updated 5 days ago.

45.8 match 4 stars 6.23 score 7 scripts

benjaminschlegel

glm.predict:Predicted Values and Discrete Changes for Regression Models

Functions to calculate predicted values and the difference between two cases, with confidence intervals, for lm() [linear model], glm() [generalized linear model], glm.nb() [negative binomial model], polr() [ordinal logistic model], vglm() [generalized ordinal logistic model], multinom() [multinomial model], tobit() [tobit model], svyglm() [survey-weighted generalised linear models] and lmer() [linear multilevel models] using Monte Carlo simulations or bootstrap. Reference: Bennet A. Zelner (2009) <doi:10.1002/smj.783>.

Maintained by Benjamin E. Schlegel. Last updated 7 months ago.

47.9 match 1 star 5.10 score 55 scripts

sinnweja

haplo.stats:Statistical Analysis of Haplotypes with Traits and Covariates when Linkage Phase is Ambiguous

Routines for the analysis of indirectly measured haplotypes. The statistical methods assume that all subjects are unrelated and that haplotypes are ambiguous (due to unknown linkage phase of the genetic markers). The main functions are: haplo.em(), haplo.glm(), haplo.score(), and haplo.power(); all of which have detailed examples in the vignette.

Maintained by Jason P. Sinnwell. Last updated 6 months ago.

36.4 match 2 stars 5.98 score 96 scripts 12 dependents

friendly

vcdExtra:'vcd' Extensions and Additions

Provides additional data sets, methods and documentation to complement the 'vcd' package for Visualizing Categorical Data and the 'gnm' package for Generalized Nonlinear Models. In particular, 'vcdExtra' extends mosaic, assoc and sieve plots from 'vcd' to handle 'glm()' and 'gnm()' models and adds a 3D version in 'mosaic3d'. Additionally, methods are provided for comparing and visualizing lists of 'glm' and 'loglm' objects. This package is now a support package for the book, "Discrete Data Analysis with R" by Michael Friendly and David Meyer.

Maintained by Michael Friendly. Last updated 5 months ago.

categorical-data-visualization generalized-linear-models mosaic-plots

18.1 match 24 stars 10.34 score 472 scripts 3 dependents

jamesyang007

adelie:Group Lasso and Elastic Net Solver for Generalized Linear Models

Extremely efficient procedures for fitting the entire group lasso and group elastic net regularization path for GLMs, multinomial, the Cox model and multi-task Gaussian models. Similar to the R package 'glmnet' in scope of models, and in computational speed. This package provides R bindings to the C++ code underlying the corresponding Python package 'adelie'. These bindings offer a general purpose group elastic net solver, a wide range of matrix classes that can exploit special structure to allow large-scale inputs, and an assortment of generalized linear model classes for fitting various types of data. The package is an implementation of Yang, J. and Hastie, T. (2024) <doi:10.48550/arXiv.2405.08631>.

Maintained by Trevor Hastie. Last updated 15 days ago.

cpp openmp

30.1 match 6 stars 5.86 score 3 scripts

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 year ago.

21.5 match 3 stars 8.13 score 7.8k scripts 11 dependents

ikosmidis

brglm2:Bias Reduction in Generalized Linear Models

Estimation and inference from generalized linear models based on various methods for bias reduction and maximum penalized likelihood with powers of the Jeffreys prior as penalty. The 'brglmFit' fitting method can achieve reduction of estimation bias by solving either the mean bias-reducing adjusted score equations in Firth (1993) <doi:10.1093/biomet/80.1.27> and Kosmidis and Firth (2009) <doi:10.1093/biomet/asp055>, or the median bias-reducing adjusted score equations in Kenne et al. (2017) <doi:10.1093/biomet/asx046>, or through the direct subtraction of an estimate of the bias of the maximum likelihood estimator from the maximum likelihood estimates as in Cordeiro and McCullagh (1991) <https://www.jstor.org/stable/2345592>. See Kosmidis et al. (2020) <doi:10.1007/s11222-019-09860-6> for more details. Estimation in all cases takes place via a quasi Fisher scoring algorithm, and S3 methods for the construction of confidence intervals for the reduced-bias estimates are provided. In the special case of generalized linear models for binomial and multinomial responses (both ordinal and nominal), the adjusted score approaches to mean and median bias reduction have been found to return estimates with improved frequentist properties that are also always finite, even in cases where the maximum likelihood estimates are infinite (e.g. complete and quasi-complete separation; see Kosmidis and Firth, 2020 <doi:10.1093/biomet/asaa052>, for a proof for mean bias reduction in logistic regression).

Maintained by Ioannis Kosmidis. Last updated 6 months ago.

adjusted-score-equations algorithms bias-reducing-adjustments bias-reduction estimation glm logistic-regression nominal-responses ordinal-responses regression regression-algorithms statistics

16.0 match 32 stars 10.41 score 106 scripts 10 dependents
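For logistic regression, Firth's mean bias-reducing adjusted score adds h_i * (1/2 - p_i) to each residual y_i - p_i, where h_i are the leverages (diagonal of the hat matrix). The minimal Python sketch below handles an intercept plus one covariate and assumes quasi Fisher scoring with simple step halving on the penalized log-likelihood (log-likelihood plus 0.5 * log det of the Fisher information). It illustrates the idea only and is not brglm2's 'brglmFit'; all function names are hypothetical. The data are made up and completely separated, so ordinary maximum likelihood estimates would be infinite.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _log_sigmoid(z):
    # Numerically stable log(1 / (1 + exp(-z)))
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def _information(b0, b1, x):
    """Fitted probabilities, working weights and the 2x2 matrix X^T W X."""
    p = [_sigmoid(b0 + b1 * xi) for xi in x]
    w = [pi * (1.0 - pi) for pi in p]
    i00 = sum(w)
    i01 = sum(wi * xi for wi, xi in zip(w, x))
    i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    return p, w, (i00, i01, i11)


def penalized_loglik(b0, b1, x, y):
    """Log-likelihood plus Firth's penalty 0.5 * log det(X^T W X)."""
    _, _, (i00, i01, i11) = _information(b0, b1, x)
    ll = sum(
        yi * _log_sigmoid(b0 + b1 * xi) + (1 - yi) * _log_sigmoid(-(b0 + b1 * xi))
        for xi, yi in zip(x, y)
    )
    return ll + 0.5 * math.log(i00 * i11 - i01 * i01)


def adjusted_score(b0, b1, x, y):
    """Firth's adjusted score and the inverse of X^T W X as (inv00, inv01, inv11)."""
    p, w, (i00, i01, i11) = _information(b0, b1, x)
    det = i00 * i11 - i01 * i01
    inv = (i11 / det, -i01 / det, i00 / det)
    # Leverages h_i = w_i * x_i^T (X^T W X)^{-1} x_i for design rows (1, x_i)
    h = [wi * (inv[0] + 2.0 * inv[1] * xi + inv[2] * xi * xi) for wi, xi in zip(w, x)]
    r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
    return (sum(r), sum(ri * xi for ri, xi in zip(r, x))), inv


def firth_logistic(x, y, max_iter=200, tol=1e-10):
    """Solve the adjusted score equations by quasi Fisher scoring."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (u0, u1), (inv00, inv01, inv11) = adjusted_score(b0, b1, x, y)
        # Scoring direction: (X^T W X)^{-1} times the adjusted score
        d0 = inv00 * u0 + inv01 * u1
        d1 = inv01 * u0 + inv11 * u1
        # Halve the step while the penalized log-likelihood would decrease
        current = penalized_loglik(b0, b1, x, y)
        step = 1.0
        while step > 1e-8 and penalized_loglik(b0 + step * d0, b1 + step * d1, x, y) < current:
            step /= 2.0
        b0 += step * d0
        b1 += step * d1
        if abs(step * d0) + abs(step * d1) < tol:
            break
    return b0, b1


# Hypothetical, completely separated data
b0, b1 = firth_logistic([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(b0, b1)
```

Step halving is safe here because, for the logit link, the adjusted score is exactly the gradient of the penalized log-likelihood, so the scoring direction is an ascent direction.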

trevorhastie

glmnet:Lasso and Elastic-Net Regularized Generalized Linear Models

Extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression, Cox model, multiple-response Gaussian, and the grouped multinomial regression; see <doi:10.18637/jss.v033.i01> and <doi:10.18637/jss.v039.i05>. There are two new and important additions. The family argument can be a GLM family object, which opens the door to any programmed family (<doi:10.18637/jss.v106.i01>). This comes with a modest computational cost, so when the built-in families suffice, they should be used instead. The other novelty is the relax option, which refits each of the active sets in the path unpenalized. The algorithm uses cyclical coordinate descent in a path-wise fashion, as described in the papers cited.

Maintained by Trevor Hastie. Last updated 2 years ago.

fortran cpp

10.3 match 82 stars 15.15 score 22k scripts 736 dependents
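The cyclical coordinate descent described above can be illustrated for the Gaussian lasso (alpha = 1), which minimizes (1/(2n)) * ||y - X b||^2 + lambda * ||b||_1. Each coefficient update is a soft-thresholding step, and the path runs from the largest lambda downward, warm-starting each fit from the previous solution. The minimal Python sketch below assumes centered X and y (so no intercept) and omits glmnet's standardization, screening rules and other families; the function names are hypothetical.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, beta=None, max_iter=1000, tol=1e-8):
    """Cyclical coordinate descent for the lasso on centered data."""
    n, p = len(X), len(X[0])
    beta = list(beta) if beta is not None else [0.0] * p
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # (1/n) * sum_i x_ij * (partial residual excluding covariate j)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                for i in range(n):
                    resid[i] -= X[i][j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def lasso_path(X, y, lambdas):
    """Fit a decreasing lambda sequence with warm starts."""
    path, beta = [], None
    for lam in sorted(lambdas, reverse=True):
        beta = lasso_coordinate_descent(X, y, lam, beta)
        path.append((lam, list(beta)))
    return path
```

With orthogonal columns each coefficient is simply the soft-thresholded least-squares estimate; correlated columns need several sweeps, which is where warm starts along the path pay off.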

bioc

ALDEx2:Analysis Of Differential Abundance Taking Sample and Scale Variation Into Account

A differential abundance analysis for the comparison of two or more conditions. Useful for analyzing data from standard RNA-seq or meta-RNA-seq assays as well as selected and unselected values from in-vitro sequence selections. Uses a Dirichlet-multinomial model to infer abundance from counts, optimized for three or more experimental replicates. The method infers biological and sampling variation to calculate the expected false discovery rate, given the variation, based on a Wilcoxon Rank Sum test and Welch's t-test (via aldex.ttest), a Kruskal-Wallis test (via aldex.kw), a generalized linear model (via aldex.glm), or a correlation test (via aldex.corr). All tests report predicted p-values and posterior Benjamini-Hochberg corrected p-values. ALDEx2 also calculates expected standardized effect sizes for paired or unpaired study designs. ALDEx2 can now be used to estimate the effect of scale on the results and report on the scale-dependent robustness of results.

Maintained by Greg Gloor. Last updated 5 months ago.

differentialexpression rnaseq transcriptomics geneexpression dnaseq chipseq bayesian sequencing software microbiome metagenomics immunooncology scale simulation posterior p-value

14.4 match 28 stars 10.70 score 424 scripts 3 dependents
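The Dirichlet-multinomial step mentioned above turns one sample's counts into Monte Carlo draws of relative abundance, which are then put on a centred log-ratio (CLR) scale. The minimal Python sketch below assumes a Dirichlet posterior with a pseudo-count of 0.5 added to every count; the function names and counts are hypothetical, and this is not ALDEx2's implementation.

```python
import math
import random


def dirichlet_draw(alphas, rng):
    """One draw from a Dirichlet distribution via normalized Gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def clr(proportions):
    """Centred log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(v) for v in proportions]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def monte_carlo_clr(counts, n_instances=128, pseudo_count=0.5, seed=1):
    """CLR-transformed Dirichlet instances for one sample's feature counts."""
    rng = random.Random(seed)
    alphas = [c + pseudo_count for c in counts]
    return [clr(dirichlet_draw(alphas, rng)) for _ in range(n_instances)]


# Hypothetical counts for four features in one sample
instances = monte_carlo_clr([10, 0, 5, 100], n_instances=16)
```

The pseudo-count keeps zero counts (the second feature here) from producing log(0). Per-feature tests would then be run on each Monte Carlo instance and summarized across instances; that step is omitted.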

ikosmidis

enrichwith:Methods to Enrich R Objects with Extra Components

Provides the "enrich" method to enrich list-like R objects with new, relevant components. The current version has methods for enriching objects of class 'family', 'link-glm', 'lm', 'glm' and 'betareg'. The resulting objects preserve their class, so all methods associated with them still apply. The package also provides the 'enriched_glm' function that has the same interface as 'glm' but results in objects of class 'enriched_glm'. In addition to the usual components in a 'glm' object, 'enriched_glm' objects carry an object-specific simulate method and functions to compute the scores, the observed and expected information matrix, the first-order bias, as well as model densities, probabilities, and quantiles at arbitrary parameter values. The package can also be used to produce customizable source code templates for the structured implementation of methods to compute new components and enrich arbitrary objects.

Maintained by Ioannis Kosmidis. Last updated 5 years ago.

infrastructure

20.8 match 6 stars 7.35 score 16 scripts 12 dependents

castroloj

glm.deploy:'C' and 'Java' Source Code Generator for Fitted Glm Objects

Provides two functions that generate source code implementing the predict function of fitted glm objects. In this version, code can be generated for either 'C' or 'Java'. The idea is to provide a tool for the easy and fast deployment of glm predictive models into production. The generated source code implements two functions/methods: one is the equivalent of predict(type="response"), and the other the equivalent of predict(type="link"). Source code is written to disk as a .c or .java file in the specified path. In the case of C, a .h file is also generated.

Maintained by Oscar Castro-Lopez. Last updated 6 years ago.

cpp

46.1 match 2 stars 3.04 score 11 scripts
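The two generated functions correspond to the two prediction scales of a fitted GLM: the link scale returns the linear predictor eta = b0 + sum_j b_j * x_j, and the response scale applies the inverse link to eta. The minimal Python sketch below shows that computation for a logit link with hypothetical coefficients; it is not the C or Java code that glm.deploy emits, and the function names are hypothetical.

```python
import math


def predict_link(intercept, coefficients, x):
    """Linear predictor eta = intercept + sum_j coefficients[j] * x[j],
    the quantity returned by predict(type = "link")."""
    return intercept + sum(b * xj for b, xj in zip(coefficients, x))


def inverse_logit(eta):
    """Inverse of the logit link used by a binomial GLM."""
    return 1.0 / (1.0 + math.exp(-eta))


def predict_response(intercept, coefficients, x, inverse_link=inverse_logit):
    """Mean response, the quantity returned by predict(type = "response")."""
    return inverse_link(predict_link(intercept, coefficients, x))


# Hypothetical fitted logistic model: intercept -1.0, slopes 0.5 and 2.0
eta = predict_link(-1.0, [0.5, 2.0], [2.0, 0.5])
prob = predict_response(-1.0, [0.5, 2.0], [2.0, 0.5])
```

For a Poisson GLM with a log link, passing `inverse_link=math.exp` gives the response-scale prediction instead.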

boost-r

mboost:Model-Based Boosting

Functional gradient descent algorithm (boosting) for optimizing general risk functions utilizing component-wise (penalised) least squares estimates or regression trees as base-learners for fitting generalized linear, additive and interaction models to potentially high-dimensional data. Models and algorithms are described in <doi:10.1214/07-STS242>, a hands-on tutorial is available from <doi:10.1007/s00180-012-0382-5>. The package allows user-specified loss functions and base-learners.

Maintained by Torsten Hothorn. Last updated 4 months ago.

boosting-algorithms gam glm machine-learning mboost modelling r-language tutorials variable-selection openblas

11.0 match 72 stars 12.70 score 540 scripts 27 dependents
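Component-wise boosting with squared-error loss repeatedly fits every base-learner to the current residuals (the negative gradient), keeps only the best-fitting one, and adds a shrunken copy of it to the model, which is what makes it a variable-selection method. The minimal Python sketch below assumes simple least-squares base-learners through the origin on centered covariates and a step length nu = 0.1. It mirrors the idea behind mboost's linear base-learners but is not the package's code, and the function name is hypothetical.

```python
def componentwise_l2_boost(X, y, n_steps=100, nu=0.1):
    """Component-wise L2 boosting with one linear base-learner per covariate.

    Assumes the columns of X are centered; returns the offset (mean of y)
    and the accumulated coefficient of each covariate.
    """
    n, p = len(X), len(X[0])
    offset = sum(y) / n
    coef = [0.0] * p
    fitted = [offset] * n
    columns = [[row[j] for row in X] for j in range(p)]
    for _ in range(n_steps):
        # The negative gradient of the loss (y - f)^2 / 2 is the residual
        u = [yi - fi for yi, fi in zip(y, fitted)]
        best_j, best_b, best_rss = 0, 0.0, float("inf")
        for j, xj in enumerate(columns):
            b = sum(a * c for a, c in zip(xj, u)) / sum(a * a for a in xj)
            rss = sum((ui - b * a) ** 2 for ui, a in zip(u, xj))
            if rss < best_rss:
                best_j, best_b, best_rss = j, b, rss
        # Update only the selected component, shrunk by the step length nu
        coef[best_j] += nu * best_b
        fitted = [fi + nu * best_b * a for fi, a in zip(fitted, columns[best_j])]
    return offset, coef
```

Covariates that are never selected keep a coefficient of exactly zero, so stopping early (a small `n_steps`) acts as both shrinkage and variable selection.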

gavinsimpson

gratia:Graceful 'ggplot'-Based Graphics and Other Functions for GAMs Fitted Using 'mgcv'

Graceful 'ggplot'-based graphics and utility functions for working with generalized additive models (GAMs) fitted using the 'mgcv' package. Provides a reimplementation of the plot() method for GAMs that 'mgcv' provides, as well as 'tidyverse' compatible representations of estimated smooths.

Maintained by Gavin L. Simpson. Last updated 4 days ago.

distributional-regression gam gamm generalized-additive-mixed-models generalized-additive-models ggplot2 glm lm mgcv penalized-spline random-effects smoothing splines

11.0 match 216 stars 12.68 score 1.6k scripts 1 dependent

ecpolley

SuperLearner:Super Learner Prediction

Implements the super learner prediction method and contains a library of prediction algorithms to be used in the super learner.

Maintained by Eric Polley. Last updated 1 year ago.

10.5 match 273 stars 13.07 score 2.1k scripts 36 dependents

bioc

glmGamPoi:Fit a Gamma-Poisson Generalized Linear Model

Fit linear models to overdispersed count data. The package can estimate the overdispersion and fit repeated models for matrix input. It is designed to handle large input datasets as they typically occur in single cell RNA-seq experiments.

Maintained by Constantin Ahlmann-Eltze. Last updated 1 month ago.

regression rnaseq software singlecell gamma-poisson glm negative-binomial-regression on-disk openblas cpp

11.0 match 110 stars 12.11 score 1.0k scripts 4 dependents

kkholst

mets:Analysis of Multivariate Event Times

Implementation of various statistical models for multivariate event history data <doi:10.1007/s10985-013-9244-x>, including multivariate cumulative incidence models <doi:10.1002/sim.6016> and bivariate random effects probit models (liability models) <doi:10.1016/j.csda.2015.01.014>. Also provides modern methods for survival analysis, including regression modelling (Cox, Fine-Gray, Ghosh-Lin, binomial regression) with fast computation of influence functions.

Maintained by Klaus K. Holst. Last updated 22 hours ago.

multivariate-time-to-event survival-analysis time-to-event fortran openblas cpp

9.8 match 14 stars 13.47 score 236 scripts 42 dependents

vitomuggeo

segmented:Regression Models with Break-Points / Change-Points Estimation (with Possibly Random Effects)

Fitting regression models where, in addition to possible linear terms, one or more covariates have segmented (i.e., broken-line or piece-wise linear) or stepmented (i.e. piece-wise constant) effects. Multiple breakpoints for the same variable are allowed. The estimation method is discussed in Muggeo (2003, <doi:10.1002/sim.1545>) and illustrated in Muggeo (2008, <https://www.r-project.org/doc/Rnews/Rnews_2008-1.pdf>). An approach for hypothesis testing is presented in Muggeo (2016, <doi:10.1080/00949655.2016.1149855>), and interval estimation for the breakpoint is discussed in Muggeo (2017, <doi:10.1111/anzs.12200>). Segmented mixed models, i.e. random effects in the change point, are discussed in Muggeo (2014, <doi:10.1177/1471082X13504721>). Estimation of piecewise-constant relationships and changepoints (mean-shift models) is discussed in Fasola et al. (2018, <doi:10.1007/s00180-017-0740-4>).

Maintained by Vito M. R. Muggeo. Last updated 15 days ago.

12.3 match 9 stars 10.03 score 1.2k scripts 203 dependents

cran

MASS:Support Functions and Datasets for Venables and Ripley's MASS

Functions and datasets to support Venables and Ripley, "Modern Applied Statistics with S" (4th edition, 2002).

Maintained by Brian Ripley. Last updated 15 days ago.

11.6 match 19 stars 10.53 score 11k dependents

tidymodels

broom:Convert Statistical Objects into Tidy Tibbles

Summarizes key information about statistical objects in tidy tibbles. This makes it easy to report results, create plots and consistently work with large numbers of models at once. Broom provides three verbs that each provide different types of information about a model. tidy() summarizes information about model components such as coefficients of a regression. glance() reports information about an entire model, such as goodness of fit measures like AIC and BIC. augment() adds information about individual observations to a dataset, such as fitted values or influence measures.

Maintained by Simon Couch. Last updated 4 months ago.

modeling tidy-data

5.5 match 1.5k stars 21.56 score 37k scripts 1.4k dependents

ecospat

ecospat:Spatial Ecology Miscellaneous Methods

Collection of R functions and data sets for the support of spatial ecology analyses with a focus on pre-, core and post-modelling analyses of species distribution, niche quantification and community assembly. Written by current and former members and collaborators of the ecospat group of Antoine Guisan, Department of Ecology and Evolution (DEE) and Institute of Earth Surface Dynamics (IDYST), University of Lausanne, Switzerland. Read Di Cola et al. (2016) <doi:10.1111/ecog.02671> for details.

Maintained by Olivier Broennimann. Last updated 1 month ago.

11.8 match 32 stars 9.35 score 418 scripts 1 dependent

willtownes

glmpca:Dimension Reduction of Non-Normally Distributed Data

Implements a generalized version of principal components analysis (GLM-PCA) for dimension reduction of non-normally distributed data such as counts or binary matrices. Townes FW, Hicks SC, Aryee MJ, Irizarry RA (2019) <doi:10.1186/s13059-019-1861-6>. Townes FW (2019) <arXiv:1907.02647>.

Maintained by F. William Townes. Last updated 11 months ago.

10.1 match 94 stars 9.24 score 258 scripts 4 dependents

amrei-stammann

alpaca:Fit GLM's with High-Dimensional k-Way Fixed Effects

Provides a routine to partial out factors with many levels during the optimization of the log-likelihood function of the corresponding generalized linear model (GLM). The package is based on the algorithm described in Stammann (2018) <arXiv:1707.01815> and is restricted to nonlinear GLMs that are estimated by maximum likelihood. It also offers an efficient algorithm to recover estimates of the fixed effects in a post-estimation routine and includes robust and multi-way clustered standard errors. Furthermore, the package provides analytical bias corrections for binary choice models derived by Fernandez-Val and Weidner (2016) <doi:10.1016/j.jeconom.2015.12.014> and Hinz, Stammann, and Wanner (2020) <arXiv:2004.12655>.

Maintained by Amrei Stammann. Last updated 6 months ago.

openblas cpp

11.9 match 45 stars 7.01 score 105 scripts

jinseob2kim

jstable:Create Tables from Different Types of Regression

Create regression tables from generalized linear model (GLM), generalized estimating equation (GEE), generalized linear mixed-effects model (GLMM), Cox proportional hazards model, survey-weighted generalized linear model (svyglm) and survey-weighted Cox model results for publication.

Maintained by Jinseob Kim. Last updated 10 days ago.

label regression table

8.3 match 26 stars 9.98 score 199 scripts 1 dependent

alexpkeil1

qgcomp:Quantile G-Computation

G-computation for a set of time-fixed exposures with quantile-based basis functions, possibly under linearity and homogeneity assumptions. This approach estimates a regression line corresponding to the expected change in the outcome (on the link scale) given a simultaneous increase in the quantile-based category for all exposures. Works with continuous, binary, and right-censored time-to-event outcomes. Reference: Alexander P. Keil, Jessie P. Buckley, Katie M. O'Brien, Kelly K. Ferguson, Shanshan Zhao, and Alexandra J. White (2019) A quantile-based g-computation approach to addressing the effects of exposure mixtures; <doi:10.1289/EHP5838>.

Maintained by Alexander Keil. Last updated 3 days ago.

exposure exposure-mixture exposure-mixtures quantile-gcomputation survival

9.4 match 37 stars 8.73 score 70 scripts 2 dependents

diystat

NBPSeq:Negative Binomial Models for RNA-Sequencing Data

Negative Binomial (NB) models for two-group comparisons and regression inferences from RNA-Sequencing Data.

Maintained by Yanming Di. Last updated 11 years ago.

16.8 match 1 star 4.88 score 17 scripts 3 dependents

hojsgaard

doBy:Groupwise Statistics, LSmeans, Linear Estimates, Utilities

Utility package containing: 1) Facilities for working with grouped data: 'do' something to data stratified 'by' some variables. 2) LSmeans (least-squares means), general linear estimates. 3) Restrict functions to a smaller domain. 4) Miscellaneous other utilities.

Maintained by Søren Højsgaard. Last updated 3 days ago.

5.4 match 1 star 14.94 score 3.2k scripts 939 dependents

mlr-org

mlr3learners:Recommended Learners for 'mlr3'

Recommended Learners for 'mlr3'. Extends 'mlr3' with interfaces to essential machine learning packages on CRAN. This includes, but is not limited to: (penalized) linear and logistic regression, linear and quadratic discriminant analysis, k-nearest neighbors, naive Bayes, support vector machines, and gradient boosting.

Maintained by Marc Becker. Last updated 4 months ago.

classification learners machine-learning mlr3 regression

7.0 match 91 stars 11.51 score 1.5k scripts 10 dependents

nerler

JointAI:Joint Analysis and Imputation of Incomplete Data

Joint analysis and imputation of incomplete data in the Bayesian framework, using (generalized) linear (mixed) models and extensions thereof, survival models, or joint models for longitudinal and survival data, as described in Erler, Rizopoulos and Lesaffre (2021) <doi:10.18637/jss.v100.i20>. Incomplete covariates, if present, are automatically imputed. The package performs some preprocessing of the data and creates a 'JAGS' model, which will then automatically be passed to 'JAGS' <https://mcmc-jags.sourceforge.io/> with the help of the package 'rjags'.

Maintained by Nicole S. Erler. Last updated 12 months ago.

bayesian generalized-linear-models glm glmm imputation imputations jags joint-analysis linear-mixed-models linear-regression-models mcmc-sample mcmc-sampling missing-data missing-values survival cpp

11.0 match 28 stars 7.30 score 59 scripts 1 dependent

pachadotdev

gravity:Estimation Methods for Gravity Models

A wrapper of different standard estimation methods for gravity models. This package provides estimation methods for log-log models and multiplicative models.

Maintained by Mauricio Vargas. Last updated 4 months ago.

bvu bvw ddm econometrics glm gpml gravity international-trade lm maximum-likelihood nbpml nls ols ppml sils tobit trade

11.0 match 35 stars 6.98 score 55 scripts

vadimtyuryaev

RegrCoeffsExplorer:Efficient Visualization of Regression Coefficients for lm(), glm(), and glmnet() Objects

The visualization tool offers a nuanced understanding of regression dynamics, going beyond traditional per-unit interpretation of continuous variables versus categorical ones. It highlights the impact of unit changes as well as larger shifts like interquartile changes, acknowledging the distribution of empirical data. Furthermore, it generates visualizations depicting alterations in Odds Ratios for predictors across minimum, first quartile, median, third quartile, and maximum values, aiding in comprehending predictor-outcome interplay within empirical data distributions, particularly in logistic regression frameworks.

Maintained by Vadim Tyuryaev. Last updated 2 months ago.

coefficients-of-linear-regression confidence-intervals empirical-data glm glmnet lasso-regression lm postselectioninference regression-analysis regularized-linear-regression regularized-logistic-regression selectiveinference statistics-for-data-science visualization

15.1 match 1 star 4.90 score 4 scripts

matteo21q

jomo:Multilevel Joint Modelling Multiple Imputation

Similarly to Schafer's package 'pan', 'jomo' is a package for multilevel joint modelling multiple imputation (Carpenter and Kenward, 2013) <doi:10.1002/9781119942283>. Novel aspects of 'jomo' are the possibility of handling binary and categorical data through latent normal variables, the option to use cluster-specific covariance matrices and to impute compatibly with the substantive model.

Maintained by Matteo Quartagno. Last updated 2 years ago.

7.7 match 3 stars 9.58 score 126 scripts 154 dependents

youyifong

kyotil:Utility Functions for Statistical Analysis Report Generation and Monte Carlo Studies

Helper functions for creating formatted summaries of regression models, writing publication-ready tables to LaTeX files, and running Monte Carlo experiments.

Maintained by Youyi Fong. Last updated 6 days ago.

openblas

9.0 match 7.87 score 236 scripts 7 dependents

lme4

lme4:Linear Mixed-Effects Models using 'Eigen' and S4

Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the 'Eigen' C++ library for numerical linear algebra and 'RcppEigen' "glue".

Maintained by Ben Bolker. Last updated 1 day ago.

cpp

3.4 match 647 stars 20.69 score 35k scripts 1.5k dependents

ewenharrison

finalfit:Quickly Create Elegant Regression Results Tables and Plots when Modelling

Generate regression results tables and plots in final format for publication. Explore models and export directly to PDF and 'Word' using 'RMarkdown'.

Maintained by Ewen Harrison. Last updated 6 months ago.

6.0 match 270 stars 11.43 score 1.0k scripts

mpascariu

ungroup:Penalized Composite Link Model for Efficient Estimation of Smooth Distributions from Coarsely Binned Data

Versatile method for ungrouping histograms (binned count data) assuming that counts are Poisson distributed and that the underlying sequence on a fine grid to be estimated is smooth. The method is based on the composite link model and estimation is achieved by maximizing a penalized likelihood. Smooth, detailed sequences of counts and rates are thus estimated from the binned counts. Ungrouping binned data can be desirable for many reasons: Bins can be too coarse to allow for accurate analysis; comparisons can be hindered when different grouping approaches are used in different histograms; and the last interval is often wide and open-ended and, thus, covers a lot of information in the tail area. Age-at-death distributions grouped in age classes and abridged life tables are examples of binned data. Because of modest assumptions, the approach is suitable for many demographic and epidemiological applications. For a detailed description of the method and applications see Rizzi et al. (2015) <doi:10.1093/aje/kwv020>.

Maintained by Marius D. Pascariu. Last updated 1 year ago.

distributions glm smoothing ungrouping cpp

11.0 match 14 stars 5.96 score 65 scripts

moviedo5

fda.usc:Functional Data Analysis and Utilities for Statistical Computing

Routines for exploratory and descriptive analysis of functional data such as depth measurements, atypical curves detection, regression models, supervised classification, unsupervised classification and functional analysis of variance.

Maintained by Manuel Oviedo de la Fuente. Last updated 4 months ago.

functional-data-analysis fortran

6.5 match 12 stars 9.72 score 560 scripts 22 dependents

sbgraves237

Ecfun:Functions for 'Ecdat'

Functions and vignettes to update data sets in 'Ecdat' and to create, manipulate, plot, and analyze those and similar data sets.

Maintained by Spencer Graves. Last updated 3 months ago.

7.9 match 7.94 score 85 scripts 4 dependents

mqbssppe

poisson.glm.mix:Fit High Dimensional Mixtures of Poisson GLMs

Mixtures of Poisson Generalized Linear Models for high-dimensional count data clustering. The (multivariate) responses can be partitioned into a set of blocks. Three different parameterizations of the linear predictor are considered. The models are estimated according to the EM algorithm with an efficient initialization scheme <doi:10.1016/j.csda.2014.07.005>.

Maintained by Panagiotis Papastamoulis. Last updated 2 years ago.

40.3 match 1.52 score 11 scripts 1 dependent

guyabel

tidycat:Expand Tidy Output for Categorical Parameter Estimates

Create additional rows and columns on broom::tidy() output to allow for easier control on categorical parameter estimates.

Maintained by Guy J. Abel. Last updated 1 year ago.

data-visualization data-viz glm model-comparison regression-analysis regression-models statistical-analysis statistical-modeling

11.0 match 4 stars 5.53 score 56 scripts 1 dependent

xsswang

remiod:Reference-Based Multiple Imputation for Ordinal/Binary Response

Reference-based multiple imputation of ordinal and binary responses under Bayesian framework, as described in Wang and Liu (2022) <arXiv:2203.02771>. Methods for missing-not-at-random include Jump-to-Reference (J2R), Copy Reference (CR), and Delta Adjustment which can generate tipping point analysis.

Maintained by Tony Wang. Last updated 2 years ago.

bayesian control-based copy-reference delta-adjustment generalized-linear-models glm jags jump-to-reference mcmc missing-at-random missing-data missing-not-at-random multiple-imputation non-ignorable ordinal-regression pattern-mixture-model reference-based statistics cpp

14.0 match 4.30 score 3 scripts

rapporter

pander:An R 'Pandoc' Writer

Contains some functions catching all messages, 'stdout' and other useful information while evaluating R code and other helpers to return user-specified text elements (like: header, paragraph, table, image, lists etc.) in 'pandoc' markdown or several types of R objects similarly automatically transformed to markdown format. Also capable of exporting/converting (the resulting) complex 'pandoc' documents to e.g. HTML, 'PDF', 'docx' or 'odt'. This latter reporting feature is supported in brew syntax or with a custom reference class with a smarty caching 'backend'.

Maintained by Gergely Daróczi. Last updated 14 days ago.

literate-programming markdown pandoc pandoc-markdown reproducible-research rmarkdown cpp

3.6 match 297 stars 16.60 score 7.6k scripts 108 dependents

jared-fowler

prettyglm:Pretty Summaries of Generalized Linear Model Coefficients

One of the main advantages of using Generalised Linear Models is their interpretability. The goal of 'prettyglm' is to provide a set of functions which easily create beautiful coefficient summaries which can readily be shared and explained. 'prettyglm' helps users create coefficient summaries which include categorical base levels, variable importance and type III p-values. 'prettyglm' also creates beautiful relativity plots for categorical, continuous and splined coefficients.

Maintained by Jared Fowler. Last updated 1 year ago.

classification classification-model data-science data-visualization glm linear-models regression regression-analysis regression-model regression-models statistical-models

12.5 match 3 stars 4.73 score 36 scripts

zhuwang46

mpath:Regularized Linear Models

Algorithms compute robust estimators for loss functions in the concave convex (CC) family by the iteratively reweighted convex optimization (IRCO), an extension of the iteratively reweighted least squares (IRLS). The IRCO reduces the weight of the observation that leads to a large loss; it also provides weights to help identify outliers. Applications include robust (penalized) generalized linear models and robust support vector machines. The package also contains penalized Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial regression models and robust models with non-convex loss functions. Wang et al. (2014) <doi:10.1002/sim.6314>, Wang et al. (2015) <doi:10.1002/bimj.201400143>, Wang et al. (2016) <doi:10.1177/0962280214530608>, Wang (2021) <doi:10.1007/s11749-021-00770-2>, Wang (2020) <arXiv:2010.02848>.

Maintained by Zhu Wang. Last updated 3 years ago.

fortran openblas

8.8 match 1 star 6.67 score 131 scripts 4 dependents

martin3141

spant:MR Spectroscopy Analysis Tools

Tools for reading, visualising and processing Magnetic Resonance Spectroscopy data. The package includes methods for spectral fitting: Wilson (2021) <DOI:10.1002/mrm.28385> and spectral alignment: Wilson (2018) <DOI:10.1002/mrm.27605>.

Maintained by Martin Wilson. Last updated 29 days ago.

brain mri mrs mrshub spectroscopy fortran

6.8 match 24 stars 8.55 score 81 scripts

egenn

rtemis:Machine Learning and Visualization

Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.

Maintained by E.D. Gennatas. Last updated 1 month ago.

data-science data-visualization machine-learning machine-learning-library visualization

7.7 match 145 stars 7.09 score 50 scripts 2 dependents

daniel-gerhard

mcprofile:Testing Generalized Linear Hypotheses for Generalized Linear Model Parameters by Profile Deviance

Calculation of signed root deviance profiles for linear combinations of parameters in a generalized linear model. Multiple tests and simultaneous confidence intervals are provided.

Maintained by Daniel Gerhard. Last updated 4 years ago.

confidence-intervals glm

11.0 match 1 star 4.88 score 51 scripts 1 dependent

valentint

robust:Port of the S+ "Robust Library"

Methods for robust statistics, a state of the art in the early 2000s, notably for robust regression and robust multivariate analysis.

Maintained by Valentin Todorov. Last updated 7 months ago.

fortran openblas

7.1 match 7.51 score 572 scripts 8 dependents

cran

boot:Bootstrap Functions (Originally by Angelo Canty for S)

Functions and datasets for bootstrapping from the book "Bootstrap Methods and Their Application" by A. C. Davison and D. V. Hinkley (1997, CUP), originally written by Angelo Canty for S.

Maintained by Alessandra R. Brazzale. Last updated 7 months ago.

6.5 match 2 stars 8.21 score 2.3k dependents

christophergandrud

coreSim:Core Functionality for Simulating Quantities of Interest from Generalised Linear Models

Core functions for simulating quantities of interest from generalised linear models (GLM). This package will form the backbone of a series of other packages that improve the interpretation of GLM estimates.

Maintained by Christopher Gandrud. Last updated 8 years ago.

generalised-linear-models glm simulating-quantities simulation

13.5 match 5 stars 3.88 score 9 scripts 1 dependent

kwstat

agridat:Agricultural Datasets

Datasets from books, papers, and websites related to agriculture. Example graphics and analyses are included. Data come from small-plot trials, multi-environment trials, uniformity trials, yield monitors, and more.

Maintained by Kevin Wright. Last updated 26 days ago.

data

4.7 match 125 stars 11.02 score 1.7k scripts 2 dependents

leifeld

texreg:Conversion of R Regression Output to LaTeX or HTML Tables

Converts coefficients, standard errors, significance stars, and goodness-of-fit statistics of statistical models into LaTeX tables or HTML tables/MS Word documents or to nicely formatted screen output for the R console for easy model comparison. A list of several models can be combined in a single table. The output is highly customizable. New model types can be easily implemented. Details can be found in Leifeld (2013), JStatSoft <doi:10.18637/jss.v055.i08>.

Maintained by Philip Leifeld. Last updated 2 months ago.

html-tables latex latex-tables regression reporting table texreg

3.7 match 113 stars 14.09 score 1.8k scripts 67 dependents

jaredlander

coefplot:Plots Coefficients from Fitted Models

Plots the coefficients from model objects. This very quickly shows the user the point estimates and confidence intervals for fitted models.

Maintained by Jared P. Lander. Last updated 3 years ago.

6.3 match 27 stars 8.28 score 744 scripts 1 dependent

stephenslab

fastglmpca:Fast Algorithms for Generalized Principal Component Analysis

Implements fast, scalable optimization algorithms for fitting generalized principal components analysis (GLM-PCA) models, as described in "A Generalization of Principal Components Analysis to the Exponential Family" Collins M, Dasgupta S, Schapire RE (2002, ISBN:9780262271738), and subsequently "Feature Selection and Dimension Reduction for Single-Cell RNA-Seq Based on a Multinomial Model" Townes FW, Hicks SC, Aryee MJ, Irizarry RA (2019) <doi:10.1186/s13059-019-1861-6>.

Maintained by Eric Weine. Last updated 3 days ago.

openblas cpp

8.7 match 11 stars 5.72 score 16 scripts

bioc

MAST:Model-based Analysis of Single Cell Transcriptomics

Methods and models for handling zero-inflated single cell assay data.

Maintained by Andrew McDavid. Last updated 5 months ago.

geneexpression differentialexpression genesetenrichment rnaseq transcriptomics singlecell

3.9 match 230 stars 12.75 score 1.8k scripts 5 dependents

haghish

mlim:Single and Multiple Imputation with Automated Machine Learning

Machine learning algorithms have been used for performing single missing data imputation and, most recently, multiple imputation. However, this is the first attempt at using automated machine learning algorithms for performing both single and multiple imputation. Automated machine learning is a procedure for fine-tuning the model automatically, performing a random search for a model that results in less error without overfitting the data. The main idea is to allow the model to set its own parameters for imputing each variable separately instead of setting fixed predefined parameters to impute all variables of the dataset. Using automated machine learning, the package fine-tunes an Elastic Net (default) or Gradient Boosting, Random Forest, Deep Learning, Extreme Gradient Boosting, or Stacked Ensemble machine learning model (from one or a combination of other supported algorithms) for imputing the missing observations. This procedure has been implemented for the first time by this package and is expected to outperform other packages for imputing missing data that do not fine-tune their models. The multiple imputation is implemented via bootstrapping without letting the duplicated observations harm the cross-validation procedure, which is how imputed variables are evaluated. Most notably, the package implements an automated procedure for imputing imbalanced data (the class rarity problem), which occurs when a factor variable has a level that is far more prevalent than the other(s). This is known to result in biased predictions and, hence, biased imputation of missing data. The autobalancing procedure ensures that, instead of focusing on maximizing accuracy (i.e., minimizing classification error) when imputing factor variables, a fairer imputation procedure is used.

Maintained by E. F. Haghish. Last updated 8 months ago.

automatic-machine-learning automl classimbalance data-science elastic-net extreme-gradient-boosting gbm glm gradient-boosting gradient-boosting-machine imputation imputation-algorithm imputation-methods machine-learning missing-data multipleimputation stack-ensemble

11.0 match 31 stars 4.49 score 7 scripts

julierennes

misaem:Linear Regression and Logistic Regression with Missing Covariates

Estimate parameters of linear regression and logistic regression with missing covariates, and perform model selection and prediction using EM-type algorithms. Jiang W., Josse J., Lavielle M., TraumaBase Group (2020) <doi:10.1016/j.csda.2019.106907>.

Maintained by Julie Josse. Last updated 4 years ago.

11.8 match 1 star 4.20 score 32 scripts

aariq

bumbl:Tools for Modeling Bumblebee Colony Growth and Decline

Bumblebee colonies grow during worker production, then decline after switching to production of reproductive individuals (drones and gynes). This package provides tools for modeling and visualizing this pattern by identifying a switchpoint with a growth rate before and a decline rate after the switchpoint. The mathematical models fit by bumbl are described in Crone and Williams (2016) <doi:10.1111/ele.12581>.

Maintained by Eric R. Scott. Last updated 2 years ago.

bumblebee demography glm switchpoint

11.0 match 3 stars 4.48 score 8 scripts

xiaolei-lab

rMVP:Memory-Efficient, Visualize-Enhanced, Parallel-Accelerated GWAS Tool

A memory-efficient, visualize-enhanced, parallel-accelerated Genome-Wide Association Study (GWAS) tool. It can (1) effectively process large data, (2) rapidly evaluate population structure, (3) efficiently estimate variance components using several algorithms, (4) implement parallel-accelerated association tests of markers using three methods, (5) provide a globally efficient design of the GWAS computing process, and (6) enhance visualization of related information. 'rMVP' contains three models: GLM (Alkes Price (2006) <DOI:10.1038/ng1847>), MLM (Jianming Yu (2006) <DOI:10.1038/ng1702>) and FarmCPU (Xiaolei Liu (2016) <doi:10.1371/journal.pgen.1005767>); the variance components estimation methods EMMAX (Hyunmin Kang (2008) <DOI:10.1534/genetics.107.080101>), FaSTLMM (method: Christoph Lippert (2011) <DOI:10.1038/nmeth.1681>, R implementation from 'GAPIT2': You Tang and Xiaolei Liu (2016) <DOI:10.1371/journal.pone.0107684> and 'SUPER': Qishan Wang and Feng Tian (2014) <DOI:10.1371/journal.pone.0107684>), and HE regression (Xiang Zhou (2017) <DOI:10.1214/17-AOAS1052>).

Maintained by Xiaolei Liu. Last updated 2 months ago.

openblas cpp openmp

6.1 match 287 stars 8.06 score 38 scripts

gkremling

gofreg:Bootstrap-Based Goodness-of-Fit Tests for Parametric Regression

Provides statistical methods to check whether a parametric family of conditional density functions fits a given dataset of covariates and response variables. Different test statistics can be used to determine the goodness-of-fit of the assumed model, see Andrews (1997) <doi:10.2307/2171880>, Bierens & Wang (2012) <doi:10.1017/S0266466611000168>, Dikta & Scheer (2021) <doi:10.1007/978-3-030-73480-0> and Kremling & Dikta (2024) <doi:10.48550/arXiv.2409.20262>. As proposed in these papers, the corresponding p-values are approximated using a parametric bootstrap method.

Maintained by Gitte Kremling. Last updated 5 months ago.

9.0 match 5.30 score 9 scripts

r-forge

robustbase:Basic Robust Statistics

"Essential" Robust Statistics. Tools for analyzing data with robust methods. This includes regression methodology, including model selection, and multivariate statistics, where we strive to cover the book "Robust Statistics, Theory and Methods" by 'Maronna, Martin and Yohai'; Wiley 2006.

Maintained by Martin Maechler. Last updated 4 months ago.

fortran openblas

3.5 match 13.33 score 1.7k scripts 480 dependents

bioc

apeglm:Approximate posterior estimation for GLM coefficients

apeglm provides Bayesian shrinkage estimators for effect sizes for a variety of GLM models, using approximation of the posterior for individual coefficients.

Maintained by Anqi Zhu. Last updated 5 months ago.

immunooncology sequencing rnaseq differentialexpression geneexpression bayesian cpp

5.3 match 8.64 score 700 scripts 9 dependents

bioc

CytoGLMM:Conditional Differential Analysis for Flow and Mass Cytometry Experiments

The CytoGLMM R package implements two multiple regression strategies: A bootstrapped generalized linear model (GLM) and a generalized linear mixed model (GLMM). Most current data analysis tools compare expressions across many computationally discovered cell types. CytoGLMM focuses on just one cell type. Our narrower field of application allows us to define a more specific statistical model with easier-to-control statistical guarantees. As a result, CytoGLMM finds differential proteins in flow and mass cytometry data while reducing biases arising from marker correlations and safeguarding against false discoveries induced by patient heterogeneity.

Maintained by Christof Seiler. Last updated 5 months ago.

flowcytometry proteomics singlecell cellbasedassays cellbiology immunooncology regression statisticalmethod software

8.1 match 2 stars 5.68 score 1 script 1 dependent

myaseen208

StroupGLMM:R Codes and Datasets for Generalized Linear Mixed Models: Modern Concepts, Methods and Applications by Walter W. Stroup

R Codes and Datasets for Stroup, W. W. (2012). Generalized Linear Mixed Models: Modern Concepts, Methods and Applications, CRC Press.

Maintained by Muhammad Yaseen. Last updated 5 months ago.

glm glmm lm lmm

11.0 match 14 stars 4.15 score 2 scripts

cardiomoon

moonBook:Functions and Datasets for the Book by Keon-Woong Moon

Several analysis-related functions, plot functions and Korean demographic data for the book entitled "R statistics and graph for medical articles" (written in Korean), version 1, by Keon-Woong Moon.

Maintained by Keon-Woong Moon. Last updated 1 year ago.

4.7 match 37 stars 9.66 score 278 scripts 5 dependents

sciviews

modelit:Statistical Models for 'SciViews::R'

Create and use statistical models (linear, general, nonlinear...) with extensions to support rich-formatted tables, equations and plots for the 'SciViews::R' dialect.

Maintained by Philippe Grosjean. Last updated 4 months ago.

sciviews statsmodels

13.7 match 1 star 3.30 score 8 scripts

cran

catalytic:Tools for Applying Catalytic Priors in Statistical Modeling

To improve estimation accuracy and stability in statistical modeling, catalytic prior distributions are employed, integrating observed data with synthetic data generated from a simpler model's predictive distribution. This approach enhances model robustness, stability, and flexibility in complex data scenarios. Catalytic prior distributions are described in Huang et al. (2020, <doi:10.1073/pnas.1920913117>) and Li and Huang (2023, <doi:10.48550/arXiv.2312.01411>).

Maintained by Dongming Huang. Last updated 3 months ago.

14.0 match 3.18 score

ddalthorp

dwp:Density-Weighted Proportion

Fit a Poisson regression to carcass distance data and integrate over the searched area at a wind farm to estimate the fraction of carcasses falling in the searched area and format the output for use as the dwp parameter in the 'GenEst' or 'eoa' package for estimating bird and bat mortality, following Dalthorp, et al. (2022) <arXiv:2201.10064>.

Maintained by Daniel Dalthorp. Last updated 2 years ago.

16.2 match 1 star 2.70 score

crsh

papaja:Prepare American Psychological Association Journal Articles with R Markdown

Tools to create dynamic, submission-ready manuscripts, which conform to American Psychological Association manuscript guidelines. We provide R Markdown document formats for manuscripts (PDF and Word) and revision letters (PDF). Helper functions facilitate reporting statistical analyses or create publication-ready tables and plots.

Maintained by Frederik Aust. Last updated 16 days ago.

apa apa-guidelines journal manuscript psychology reproducible-paper reproducible-research rmarkdown

3.7 match 662 stars 11.74 score 1.7k scripts 1 dependent

wwbrannon

sqlscore:Utilities for Generating SQL Queries from Model Objects

Provides utilities for generating SQL queries (particularly CREATE TABLE statements) from R model objects. The most important use case is generating SQL to score a generalized linear model or related model represented as an R object, in which case the package handles parsing formula operators and including the model's response function.

Maintained by William Brannon. Last updated 6 years ago.

glm sql

11.0 match 13 stars 3.81 score 8 scripts

cwatson

brainGraph:Graph Theory Analysis of Brain MRI Data

A set of tools for performing graph theory analysis of brain MRI data. It works with data from a Freesurfer analysis (cortical thickness, volumes, local gyrification index, surface area), diffusion tensor tractography data (e.g., from FSL) and resting-state fMRI data (e.g., from DPABI). It contains a graphical user interface for graph visualization and data exploration, along with several functions for generating useful figures.

Maintained by Christopher G. Watson. Last updated 1 year ago.

brain-connectivity brain-imaging complex-networks connectome connectomics fmri graph-theory mri network-analysis neuroimaging neuroscience statistics tractography

5.3 match 188 stars 7.86 score 107 scripts 3 dependents

bioc

msmsTests:LC-MS/MS Differential Expression Tests

Statistical tests for label-free LC-MS/MS data by spectral counts, to discover differentially expressed proteins between two biological conditions. Three tests are available: Poisson GLM regression, quasi-likelihood GLM regression, and the negative binomial of the edgeR package. The three models admit blocking factors to control for nuisance variables. To assure a good level of reproducibility, a post-test filter is available, where we may set the minimum effect size considered biologically relevant and the minimum expression of the most abundant condition.

Maintained by Josep Gregori i Font. Last updated 5 months ago.

immunooncology software massspectrometry proteomics

8.2 match 5.03 score 15 scripts 1 dependent

lotze

COMPoissonReg:Conway-Maxwell Poisson (COM-Poisson) Regression

Fit Conway-Maxwell Poisson (COM-Poisson or CMP) regression models to count data (Sellers & Shmueli, 2010) <doi:10.1214/09-AOAS306>. The package provides functions for model estimation, dispersion testing, and diagnostics. Zero-inflated CMP regression (Sellers & Raim, 2016) <doi:10.1016/j.csda.2016.01.007> is also supported.

Maintained by Andrew Raim. Last updated 1 year ago.

cpp

6.3 match 9 stars 6.63 score 53 scripts 3 dependents

wobbrock

multpois:Analyze Nominal Response Data with the Multinomial-Poisson Trick

Dichotomous responses having two categories can be analyzed with stats::glm() or lme4::glmer() using the family=binomial option. Unfortunately, polytomous responses with three or more unordered categories cannot be analyzed similarly because there is no analogous family=multinomial option. For between-subjects data, nnet::multinom() can address this need, but it cannot handle random factors and therefore cannot handle repeated measures. To address this gap, we transform nominal response data into counts for each categorical alternative. These counts are then analyzed using (mixed) Poisson regression as per Baker (1994) <doi:10.2307/2348134>. Omnibus analyses of variance can be run along with post hoc pairwise comparisons. For users wishing to analyze nominal responses from surveys or experiments, the functions in this package essentially act as though stats::glm() or lme4::glmer() provide a family=multinomial option.

Maintained by Jacob O. Wobbrock. Last updated 1 month ago.

8.6 match 1 star 4.78 score 20 scripts

bioc

mirTarRnaSeq:mirTarRnaSeq

The mirTarRnaSeq R package can be used for interactive statistical analysis of mRNA and miRNA sequencing data. The package uses expression or differential expression results from mRNA and miRNA sequencing and performs interactive correlation and various GLM analyses (regular GLM, multivariate GLM, and interaction GLMs) between mRNA and miRNA experiments. These experiments can be time-point experiments and/or condition experiments.

Maintained by Mercedeh Movassagh. Last updated 5 months ago.

mirna regression software sequencing smallrna timecourse differentialexpression

10.1 match 4.00 score 9 scripts

gksmyth

statmod:Statistical Modeling

A collection of algorithms and functions to aid statistical modeling. Includes limiting dilution analysis (aka ELDA), growth curve comparisons, mixed linear models, heteroscedastic regression, inverse-Gaussian probability calculations, Gauss quadrature and a secure convergence algorithm for nonlinear models. Also includes advanced generalized linear model functions including Tweedie and Digamma distributional families, secure convergence and exact distributional calculations for unit deviances.

Maintained by Gordon Smyth. Last updated 2 years ago.

fortran

4.0 match 1 star 9.62 score 2.2k scripts 849 dependents

luisagi

enmpa:Ecological Niche Modeling using Presence-Absence Data

A set of tools to perform Ecological Niche Modeling with presence-absence data. It includes algorithms for data partitioning, model fitting, calibration, evaluation, selection, and prediction. Other functions help to explore signals of ecological niche using univariate and multivariate analyses, and model features such as variable response curves and variable importance. Unique characteristics of this package are the ability to exclude models with concave quadratic responses, and the option to clamp model predictions to specific variables. These tools are implemented following principles proposed in Cobos et al. (2022) <doi:10.17161/bi.v17i.15985>, Cobos et al. (2019) <doi:10.7717/peerj.6281>, and Peterson et al. (2008) <doi:10.1016/j.ecolmodel.2007.11.008>.

Maintained by Luis F. Arias-Giraldo. Last updated 3 months ago.

cpp

8.7 match 5 stars 4.35 score 5 scripts

danlwarren

ENMTools:Analysis of Niche Evolution using Niche and Distribution Models

Constructing niche models and analyzing patterns of niche evolution. Acts as an interface for many popular modeling algorithms, and allows users to conduct Monte Carlo tests to address basic questions in evolutionary ecology and biogeography. Warren, D.L., R.E. Glor, and M. Turelli (2008) <doi:10.1111/j.1558-5646.2008.00482.x>; Glor, R.E., and D.L. Warren (2011) <doi:10.1111/j.1558-5646.2010.01177.x>; Warren, D.L., R.E. Glor, and M. Turelli (2010) <doi:10.1111/j.1600-0587.2009.06142.x>; Cardillo, M., and D.L. Warren (2016) <doi:10.1111/geb.12455>; D.L. Warren, L.J. Beaumont, R. Dinnage, and J.B. Baumgartner (2019) <doi:10.1111/ecog.03900>.

Maintained by Dan Warren. Last updated 2 months ago.

5.4 match 105 stars 6.91 score 126 scripts

r-forge

surveillance:Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena

Statistical methods for the modeling and monitoring of time series of counts, proportions and categorical data, as well as for the modeling of continuous-time point processes of epidemic phenomena. The monitoring methods focus on aberration detection in count data time series from public health surveillance of communicable diseases, but applications could just as well originate from environmetrics, reliability engineering, econometrics, or social sciences. The package implements many typical outbreak detection procedures such as the (improved) Farrington algorithm, or the negative binomial GLR-CUSUM method of Hoehle and Paul (2008) <doi:10.1016/j.csda.2008.02.015>. A novel CUSUM approach combining logistic and multinomial logistic modeling is also included. The package contains several real-world data sets and provides the ability to simulate outbreak data and to visualize the results of the monitoring in a temporal, spatial or spatio-temporal fashion. A recent overview of the available monitoring procedures is given by Salmon et al. (2016) <doi:10.18637/jss.v070.i10>. For the retrospective analysis of epidemic spread, the package provides three endemic-epidemic modeling frameworks with tools for visualization, likelihood inference, and simulation. hhh4() estimates models for (multivariate) count time series following Paul and Held (2011) <doi:10.1002/sim.4177> and Meyer and Held (2014) <doi:10.1214/14-AOAS743>. twinSIR() models the susceptible-infectious-recovered (SIR) event history of a fixed population, e.g., epidemics across farms or networks, as a multivariate point process as proposed by Hoehle (2009) <doi:10.1002/bimj.200900050>. twinstim() estimates self-exciting point process models for a spatio-temporal point pattern of infective events, e.g., time-stamped geo-referenced surveillance data, as proposed by Meyer et al. (2012) <doi:10.1111/j.1541-0420.2011.01684.x>.
A recent overview of the implemented space-time modeling frameworks for epidemic phenomena is given by Meyer et al. (2017) <doi:10.18637/jss.v077.i11>.

Maintained by Sebastian Meyer. Last updated 15 days ago.

cpp

3.5 match 2 stars 10.74 score 446 scripts 3 dependents

cliffordlai

bestglm:Best Subset GLM and Regression Utilities

Best subset GLM selection using information criteria or cross-validation, carried out using the 'leaps' algorithm (Furnival and Wilson, 1974) <doi:10.2307/1267601> or complete enumeration (Morgan and Tatar, 1972) <doi:10.1080/00401706.1972.10488918>. Implements PCR and PLS using AIC/BIC. Implements the one-standard deviation rule for use with the 'caret' package.

Maintained by Yuanhao Lai. Last updated 5 years ago.

7.1 match 5.29 score 418 scripts 5 dependents

amices

mice:Multivariate Imputation by Chained Equations

Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.

Maintained by Stef van Buuren. Last updated 5 days ago.

chained-equations fcs imputation mice missing-data missing-values multiple-imputation multivariate-data cpp

2.3 match 462 stars 16.50 score 10k scripts 154 dependents

jpmml

r2pmml:Convert R Models to PMML

R wrapper for the JPMML-R library <https://github.com/jpmml/jpmml-r>, which converts R models to Predictive Model Markup Language (PMML).

Maintained by Villu Ruusmann. Last updated 11 days ago.

5.8 match 74 stars 6.29 score 35 scripts

myllym

GET:Global Envelopes

Implementation of global envelopes for a set of general d-dimensional vectors T in various applications. A 100(1-alpha)% global envelope is a band bounded by two vectors such that the probability that T falls outside this envelope in any of the d points is equal to alpha. Global means that the probability is controlled simultaneously for all the d elements of the vectors. The global envelopes can be used for graphical Monte Carlo and permutation tests where the test statistic is a multivariate vector or function (e.g. goodness-of-fit testing for point patterns and random sets, functional analysis of variance, functional general linear model, n-sample test of correspondence of distribution functions), for central regions of functional or multivariate data (e.g. outlier detection, functional boxplot) and for global confidence and prediction bands (e.g. confidence band in polynomial regression, Bayesian posterior prediction). See Myllymäki and Mrkvička (2024) <doi:10.18637/jss.v111.i03>, Myllymäki et al. (2017) <doi:10.1111/rssb.12172>, Mrkvička and Myllymäki (2023) <doi:10.1007/s11222-023-10275-7>, Mrkvička et al. (2016) <doi:10.1016/j.spasta.2016.04.005>, Mrkvička et al. (2017) <doi:10.1007/s11222-016-9683-9>, Mrkvička et al. (2020) <doi:10.14736/kyb-2020-3-0432>, Mrkvička et al. (2021) <doi:10.1007/s11009-019-09756-y>, Myllymäki et al. (2021) <doi:10.1016/j.spasta.2020.100436>, Mrkvička et al. (2022) <doi:10.1002/sim.9236>, Dai et al. (2022) <doi:10.5772/intechopen.100124>, Dvořák and Mrkvička (2022) <doi:10.1007/s00180-021-01134-y>, Mrkvička et al. (2023) <doi:10.48550/arXiv.2309.04746>, and Konstantinou et al. (2024) <doi:10.1007/s00180-024-01569-z>.

Maintained by Mari Myllymäki. Last updated 3 months ago.

3.9 match 11 stars 9.33 score 46 scripts 5 dependents

cran

lctools:Local Correlation, Spatial Inequalities, Geographically Weighted Regression and Other Tools

Provides researchers and educators with easy-to-learn, user-friendly tools for calculating key spatial statistics and for applying simple as well as advanced methods of spatial analysis to real data. These include: Local Pearson and Geographically Weighted Pearson Correlation Coefficients, Spatial Inequality Measures (Gini, Spatial Gini, LQ, Focal LQ), Spatial Autocorrelation (Global and Local Moran's I), several Geographically Weighted Regression techniques and other Spatial Analysis tools (other geographically weighted statistics). This package also contains functions for measuring the significance of each statistic calculated, mainly based on Monte Carlo simulations.

Maintained by Stamatis Kalogirou. Last updated 12 months ago.

11.9 match 1 star 3.03 score 53 scripts

pat-s

oddsratio:Odds Ratio Calculation for GAM(M)s & GLM(M)s

Simplified odds ratio calculation for GAM(M)s & GLM(M)s. Provides structured output (a data frame) of all predictors with their corresponding odds ratios and confidence intervals for further analyses. It helps to avoid false references to predictors and increments by specifying these parameters in a list instead of using 'exp(coef(model))' (the standard approach to odds ratio calculation for GLMs), which returns only a plain numeric output. For GAM(M)s, odds ratio calculation is greatly simplified because the package takes care of the multiple 'predict()' calls for the chosen predictor while holding the other predictors constant. The package also allows odds ratio calculation for percentage steps across the whole predictor distribution range for GAM(M)s. In both cases, confidence intervals are also returned. Calculated odds ratios of GAM(M)s can be inserted into the smooth function plot.

Maintained by Patrick Schratz. Last updated 11 months ago.

odds-ratio probability statistics

4.8 match 31 stars 7.48 score 81 scripts 1 dependent

bioc

snpStats:SnpMatrix and XSnpMatrix classes and methods

Classes and statistical methods for large SNP association studies. This extends the earlier snpMatrix package, allowing for uncertainty in genotypes.

Maintained by David Clayton. Last updated 5 months ago.

microarray snp geneticvariability zlib

3.8 match 9.41 score 674 scripts 17 dependents

pauleilers

JOPS:Practical Smoothing with P-Splines

Functions and data to reproduce all plots in the book "Practical Smoothing. The Joys of P-splines" by Paul H.C. Eilers and Brian D. Marx (2021, ISBN:978-1108482950).

Maintained by Paul Eilers. Last updated 2 years ago.

10.4 match 1 star 3.43 score 296 scripts 3 dependents

boennecd

parglm:Parallel GLM

Provides a parallel estimation method for generalized linear models without compiling with a multithreaded LAPACK or BLAS.

Maintained by Benjamin Christoffersen. Last updated 3 years ago.

generalized-linear-models parallel-computing openblas cpp

5.5 match 11 stars 6.41 score 39 scripts 4 dependents

cran

Compositional:Compositional Data Analysis

Regression, classification, contour plots, hypothesis testing and fitting of distributions for compositional data are some of the functions included. We further include functions for percentages (or proportions). The standard textbook for such data is John Aitchison's (1986) "The statistical analysis of compositional data". Relevant papers include: a) Tsagris M.T., Preston S. and Wood A.T.A. (2011). "A data-based power transformation for compositional data". Fourth International Workshop on Compositional Data Analysis. <doi:10.48550/arXiv.1106.1451>. b) Tsagris M. (2014). "The k-NN algorithm for compositional data: a revised approach with and without zero values present". Journal of Data Science, 12(3): 519--534. <doi:10.6339/JDS.201407_12(3).0008>. c) Tsagris M. (2015). "A novel, divergence based, regression for compositional data". Proceedings of the 28th Panhellenic Statistics Conference, 15-18 April 2015, Athens, Greece, 430--444. <doi:10.48550/arXiv.1511.07600>. d) Tsagris M. (2015). "Regression analysis with compositional data containing zero values". Chilean Journal of Statistics, 6(2): 47--57. <https://soche.cl/chjs/volumes/06/02/Tsagris(2015).pdf>. e) Tsagris M., Preston S. and Wood A.T.A. (2016). "Improved supervised classification for compositional data using the alpha-transformation". Journal of Classification, 33(2): 243--261. <doi:10.1007/s00357-016-9207-5>. f) Tsagris M., Preston S. and Wood A.T.A. (2017). "Nonparametric hypothesis testing for equality of means on the simplex". Journal of Statistical Computation and Simulation, 87(2): 406--422. <doi:10.1080/00949655.2016.1216554>. g) Tsagris M. and Stewart C. (2018). "A Dirichlet regression model for compositional data with zeros". Lobachevskii Journal of Mathematics, 39(3): 398--412. <doi:10.1134/S1995080218030198>. h) Alenazi A. (2019). "Regression for compositional data with compositional data as predictor variables with or without zero values". Journal of Data Science, 17(1): 219--238. <doi:10.6339/JDS.201901_17(1).0010>. i) Tsagris M. and Stewart C. (2020). "A folded model for compositional data analysis". Australian and New Zealand Journal of Statistics, 62(2): 249--277. <doi:10.1111/anzs.12289>. j) Alenazi A.A. (2022). "f-divergence regression models for compositional data". Pakistan Journal of Statistics and Operation Research, 18(4): 867--882. <doi:10.18187/pjsor.v18i4.3969>. k) Tsagris M. and Stewart C. (2022). "A Review of Flexible Transformations for Modeling Compositional Data". In Advances and Innovations in Statistics and Data Science, pp. 225--234. <doi:10.1007/978-3-031-08329-7_10>. l) Alenazi A. (2023). "A review of compositional data analysis and recent advances". Communications in Statistics--Theory and Methods, 52(16): 5535--5567. <doi:10.1080/03610926.2021.2014890>. m) Tsagris M., Alenazi A. and Stewart C. (2023). "Flexible non-parametric regression models for compositional response data with zeros". Statistics and Computing, 33(106). <doi:10.1007/s11222-023-10277-5>. n) Tsagris M. (2025). "Constrained least squares simplicial-simplicial regression". Statistics and Computing, 35(27). <doi:10.1007/s11222-024-10560-z>. o) Sevinc V. and Tsagris M. (2024). "Energy Based Equality of Distributions Testing for Compositional Data". <doi:10.48550/arXiv.2412.05199>.

Maintained by Michail Tsagris. Last updated 2 months ago.

9.6 match 3 stars 3.64 score 4 dependents

lrberge

fixest:Fast Fixed-Effects Estimations

Fast and user-friendly estimation of econometric models with multiple fixed-effects. Includes ordinary least squares (OLS), generalized linear models (GLM) and the negative binomial. The core of the package is based on optimized parallel C++ code, scaling especially well for large data sets. The method to obtain the fixed-effects coefficients is based on Berge (2018) <https://github.com/lrberge/fixest/blob/master/_DOCS/FENmlm_paper.pdf>. It also provides tools to export and view the results of several estimations, with an intuitive design for clustering the standard errors.

Maintained by Laurent Berge. Last updated 7 months ago.

cpp openmp

2.4 match 387 stars 14.69 score 3.8k scripts 25 dependents

evilgraham

flatr:Transforms Contingency Tables to Data Frames, and Analyses Them

Contingency tables are a pain to work with when you want to run regressions. This package takes them and flattens them into a long data frame, so you can more easily analyse them! You can also calculate other related statistics. All of this is done in a 'tidy' manner, so it should tie in nicely with the 'tidyverse' series of packages.

Maintained by Scott D. Graham. Last updated 7 years ago.

contingency-table glm regression tidy tidy-data

11.0 match 3 stars 3.18 score 6 scripts

netsimanalytics

NetSimR:Actuarial Functions for Non-Life Insurance Modelling

Assists actuaries and other insurance modellers in pricing, reserving and capital modelling for non-life insurance and reinsurance. Provides functions that help model excess levels, capping and pure incurred but not reported (pure IBNR) claims. Includes capped mean, exposure curves and increased limit factor curves (ILFs) for LogNormal, Gamma, Pareto, Sliced LogNormal-Pareto and Sliced Gamma-Pareto distributions. Includes mean, probability density function (pdf), cumulative distribution function (cdf) and inverse cumulative distribution function for Sliced LogNormal-Pareto and Sliced Gamma-Pareto distributions. Includes calculating pure IBNR exposure with LogNormal and Gamma distributions for the reporting delay. Includes three shiny tools: one to simulate insurance claims and apply reinsurance structures, one to fit generalised linear models, and one to fit claims frequency or severity distributions. Methods used in the package are described in Free for All by Yiannis Parizas (2023) <https://www.theactuary.com/2023/03/02/free-all>; Escaping the triangle by Yiannis Parizas (2019) <https://www.theactuary.com/features/2019/06/2019/06/05/escaping-triangle>; Take to excess by Yiannis Parizas (2019) <https://www.theactuary.com/features/2019/03/2019/03/06/taken-excess>.

Maintained by Yiannis Parizas. Last updated 1 year ago.

10.5 match 1 star 3.33 score
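The capped mean and increased limit factor mentioned for NetSimR have a closed form in the LogNormal case: E[min(X, c)] = exp(mu + sigma^2/2) * Phi((ln c - mu - sigma^2)/sigma) + c * (1 - Phi((ln c - mu)/sigma)), and ILF(L) = E[min(X, L)] / E[min(X, B)] for a basic limit B. The Python sketch below illustrates that standard formula only; it is not NetSimR's R interface, and the function names are hypothetical.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_capped_mean(mu: float, sigma: float, cap: float) -> float:
    """E[min(X, cap)] for X ~ LogNormal(mu, sigma) and cap > 0."""
    log_cap = math.log(cap)
    # Expected loss coming from claims below the cap
    below = math.exp(mu + sigma**2 / 2) * norm_cdf((log_cap - mu - sigma**2) / sigma)
    # Claims above the cap contribute exactly the cap
    above = cap * (1.0 - norm_cdf((log_cap - mu) / sigma))
    return below + above


def increased_limit_factor(mu: float, sigma: float, limit: float, basic_limit: float) -> float:
    """Capped mean at a higher limit relative to the capped mean at a basic limit."""
    return lognormal_capped_mean(mu, sigma, limit) / lognormal_capped_mean(mu, sigma, basic_limit)
```

As the cap grows, the capped mean approaches the uncapped mean exp(mu + sigma^2/2); as it shrinks towards zero, it approaches the cap itself.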

alexisderumigny

CondCopulas:Estimation and Inference for Conditional Copula Models

Provides functions for the estimation of conditional copula models, various estimators of conditional Kendall's tau (proposed in Derumigny and Fermanian (2019a, 2019b, 2020) <doi:10.1515/demo-2019-0016>, <doi:10.1016/j.csda.2019.01.013>, <doi:10.1016/j.jmva.2020.104610>), and test procedures for the simplifying assumption (proposed in Derumigny and Fermanian (2017) <doi:10.1515/demo-2017-0011> and Derumigny, Fermanian and Min (2022) <doi:10.1002/cjs.11742>).

Maintained by Alexis Derumigny. Last updated 6 months ago.

conditional-copulas conditional-kendalls-tau copulas r-pkg simplifying-assumption

7.4 match 2 stars 4.70 score 7 scripts

pauljohn32

rockchalk:Regression Estimation and Presentation

A collection of functions for interpretation and presentation of regression analysis. These functions are used to produce the statistics lectures in <https://pj.freefaculty.org/guides/>. Includes regression diagnostics, regression tables, and plots of interactions and "moderator" variables. The emphasis is on "mean-centered" and "residual-centered" predictors. The vignette 'rockchalk' offers a fairly comprehensive overview. The vignette 'Rstyle' has advice about coding in R. The package title 'rockchalk' refers to our school motto, 'Rock Chalk Jayhawk, Go K.U.'.

Maintained by Paul E. Johnson. Last updated 3 years ago.

4.8 match 7.13 score 584 scripts 18 dependents

cran

gplm:Generalized Partial Linear Models (GPLM)

Provides functions for estimating a generalized partial linear model, a semiparametric variant of the generalized linear model (GLM) which replaces the linear predictor by the sum of a linear and a nonparametric function.

Maintained by Marlene Mueller. Last updated 9 years ago.

17.0 match 2.00 score

sinhrks

ggfortify:Data Visualization Tools for Statistical Analysis Results

Unified plotting tools for commonly used statistics, such as GLM, time series, PCA families, clustering and survival analysis. The package offers a single plotting interface for these analysis results and plots them in a unified style using 'ggplot2'.

Maintained by Yuan Tang. Last updated 9 months ago.

2.3 match 529 stars 14.49 score 9.1k scripts 22 dependents

asmahani

RegressionFactory:Expander Functions for Generating Full Gradient and Hessian from Single-Slot and Multi-Slot Base Distributions

The expander functions rely on the mathematics developed for the Hessian-definiteness invariance theorem for linear projection transformations of variables, described in the authors' paper, to generate the full, high-dimensional gradient and Hessian from the lower-dimensional derivative objects. This greatly relieves the computational burden of generating the regression-function derivatives, which in turn can be fed into any optimization routine that utilizes such derivatives. The theorem guarantees that Hessian definiteness is preserved, meaning that reasoning about this property can be performed in the low-dimensional space of the base distribution. This is often a much easier task than its equivalent in the full, high-dimensional space. Definiteness of the Hessian can be useful in selecting optimization/sampling algorithms such as Newton-Raphson optimization or its sampling equivalent, the Stochastic Newton Sampler. Finally, in addition to being a computational tool, the regression expansion framework is of conceptual value by offering new opportunities to generate novel regression problems.

Maintained by Alireza S. Mahani. Last updated 4 years ago.

14.4 match 2.30 score 20 scripts
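For a single-slot base distribution, the log-likelihood is a sum of f_i(x_i . beta), so the full gradient is X^T f'(eta) and the full Hessian is X^T diag(f''(eta)) X. If every f_i'' is non-positive, the full Hessian is negative semidefinite, which is the kind of low-dimensional reasoning the description refers to. The Python sketch below shows this expansion for a Bernoulli/logistic base; it is an illustration of the chain rule under that assumption, not RegressionFactory's R API, and all names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def expand(
    X: Sequence[Sequence[float]],
    beta: Sequence[float],
    base_derivs: Callable[[int, float], Tuple[float, float]],
) -> Tuple[List[float], Matrix]:
    """Full gradient and Hessian of sum_i f_i(x_i . beta) from per-observation
    first and second derivatives of the base distribution."""
    p = len(beta)
    grad = [0.0] * p
    hess = [[0.0] * p for _ in range(p)]
    for i, row in enumerate(X):
        eta = sum(xj * bj for xj, bj in zip(row, beta))  # linear projection
        d1, d2 = base_derivs(i, eta)                     # low-dimensional derivatives
        for j in range(p):
            grad[j] += row[j] * d1                       # X^T f'(eta)
            for k in range(p):
                hess[j][k] += row[j] * d2 * row[k]       # X^T diag(f''(eta)) X
    return grad, hess


def logistic_base(y: Sequence[int]) -> Callable[[int, float], Tuple[float, float]]:
    """Derivatives of f_i(eta) = y_i * eta - log(1 + exp(eta)), the Bernoulli log-likelihood."""
    def derivs(i: int, eta: float) -> Tuple[float, float]:
        mu = 1.0 / (1.0 + math.exp(-eta))
        return y[i] - mu, -mu * (1.0 - mu)  # f'' <= 0, so the full Hessian is NSD
    return derivs
```

The resulting gradient and Hessian could be passed to any Newton-type optimizer; only the scalar derivatives change when the base distribution changes.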

moskante

MixedPsy:Statistical Tools for the Analysis of Psychophysical Data

Tools for the analysis of psychophysical data in R. This package allows users to estimate the Point of Subjective Equivalence (PSE) and the Just Noticeable Difference (JND), either from a psychometric function or from a Generalized Linear Mixed Model (GLMM). Additionally, the package allows plotting the fitted models and the response data, simulating psychometric functions of different shapes, and simulating data sets. For a description of the use of GLMMs applied to psychophysical data, refer to Moscatelli et al. (2012).

Maintained by Alessandro Moscatelli. Last updated 25 days ago.

8.8 match 5 stars 3.70 score 9 scripts
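With a probit psychometric function P(x) = Phi(beta0 + beta1 * x), the PSE is the stimulus at which P = 0.5, i.e. -beta0/beta1, and a common definition of the JND is half the distance between the 25% and 75% points, i.e. Phi^{-1}(0.75)/|beta1|. The Python sketch below assumes that parameterization (for example, using fixed-effect estimates from a probit GLMM); it is not MixedPsy's R API, and the function name is hypothetical.

```python
from statistics import NormalDist
from typing import Tuple


def pse_jnd_probit(beta0: float, beta1: float) -> Tuple[float, float]:
    """PSE and JND of the probit psychometric function P(x) = Phi(beta0 + beta1 * x)."""
    if beta1 == 0:
        raise ValueError("a flat psychometric function has no PSE or JND")
    pse = -beta0 / beta1                           # stimulus where P(x) = 0.5
    jnd = NormalDist().inv_cdf(0.75) / abs(beta1)  # half the 25%-75% distance
    return pse, jnd
```

For a logit link the same idea applies with the logistic quantile function in place of Phi^{-1}.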

ikosmidis

brglm:Bias Reduction in Binomial-Response Generalized Linear Models

Fit generalized linear models with binomial responses using either an adjusted-score approach to bias reduction or maximum penalized likelihood where penalization is by Jeffreys invariant prior. These procedures return estimates with improved frequentist properties (bias, mean squared error) that are always finite even in cases where the maximum likelihood estimates are infinite (data separation). Fitting takes place by fitting generalized linear models on iteratively updated pseudo-data. The interface is essentially the same as 'glm'. More flexibility is provided by the fact that custom pseudo-data representations can be specified and used for model fitting. Functions are provided for the construction of confidence intervals for the reduced-bias estimates.

Maintained by Ioannis Kosmidis. Last updated 4 years ago.

4.6 match 6 stars 7.14 score 86 scripts 11 dependents
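Maximum penalized likelihood with a Jeffreys prior (Firth's adjustment) can be sketched for logistic regression by solving the adjusted score equation X^T (y - pi + h * (1/2 - pi)) = 0, where h is the diagonal of the weighted hat matrix. The Python/NumPy sketch below uses plain Fisher-scoring steps on that adjusted score; brglm itself works through iteratively updated pseudo-data, so this is a swapped-in illustration of the same estimator under stated assumptions (no step-halving, full-rank X), and the function name is hypothetical.

```python
import numpy as np


def firth_logistic(X: np.ndarray, y: np.ndarray, max_iter: int = 100, tol: float = 1e-10) -> np.ndarray:
    """Logistic regression with Firth's (Jeffreys-prior) bias-reducing adjustment.

    Iterates beta <- beta + (X'WX)^{-1} U*(beta), where
    U*(beta) = X'(y - pi + h * (1/2 - pi)) is the adjusted score and h is the
    diagonal of the weighted hat matrix W^{1/2} X (X'WX)^{-1} X' W^{1/2}.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = pi * (1.0 - pi)
        info_inv = np.linalg.inv(X.T @ (X * w[:, None]))  # (X'WX)^{-1}
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)  # hat-matrix diagonal
        score = X.T @ (y - pi + h * (0.5 - pi))           # adjusted score
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Completely separated data: the ML slope diverges, the adjusted estimate stays finite
X = np.column_stack([np.ones(4), [-2.0, -1.0, 1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(firth_logistic(X, y))
```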

mdonoghoe

glm2:Fitting Generalized Linear Models

Fits generalized linear models using the same model specification as glm in the stats package, but with a modified default fitting method that provides greater stability for models that may fail to converge using glm.

Maintained by Mark W. Donoghoe. Last updated 7 years ago.

5.6 match 1 star 5.78 score 270 scripts 24 dependents

nrs02004

SGL:Fit a GLM (or Cox Model) with a Combination of Lasso and Group Lasso Regularization

Fit a regularized generalized linear model via penalized maximum likelihood. The model is fit for a path of values of the penalty parameter. Fits linear, logistic and Cox models.

Maintained by Noah Simon. Last updated 5 years ago.

cpp

7.8 match 6 stars 4.11 score 71 scripts 1 dependent
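The combination of lasso and group lasso penalties can be illustrated by its proximal map for one group: soft-threshold each coefficient (lasso part), then shrink the whole group towards zero, possibly setting it exactly to zero (group-lasso part). The Python sketch below assumes the common sqrt(p_g) group weighting and a mixing parameter alpha; a proximal-gradient solver would apply this map after each gradient step. It is a generic sketch, not SGL's internal code, and the names are hypothetical.

```python
import math
from typing import List, Sequence


def soft_threshold(z: float, thresh: float) -> float:
    """Coordinate-wise lasso shrinkage."""
    return math.copysign(max(abs(z) - thresh, 0.0), z)


def sgl_prox_group(z: Sequence[float], lam: float, alpha: float, step: float) -> List[float]:
    """Proximal map of step * lam * ((1 - alpha) * sqrt(p_g) * ||b||_2 + alpha * ||b||_1)
    for one group of coefficients z of length p_g."""
    # 1) lasso part: soft-threshold every coordinate
    s = [soft_threshold(v, step * alpha * lam) for v in z]
    norm = math.sqrt(sum(v * v for v in s))
    group_thresh = step * (1.0 - alpha) * lam * math.sqrt(len(z))
    # 2) group-lasso part: shrink the whole group, dropping it if its norm is small
    if norm <= group_thresh:
        return [0.0] * len(z)
    scale = 1.0 - group_thresh / norm
    return [scale * v for v in s]
```

With alpha = 1 this reduces to the lasso, and with alpha = 0 to the group lasso, which is how the mixing parameter trades within-group sparsity against whole-group selection.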

davidgohel

flextable:Functions for Tabular Reporting

Use a grammar for creating and customizing pretty tables. The following formats are supported: 'HTML', 'PDF', 'RTF', 'Microsoft Word', 'Microsoft PowerPoint' and R 'Grid Graphics'. 'R Markdown', 'Quarto' and the package 'officer' can be used to produce the result files. The syntax is the same for the user regardless of the type of output to be produced. A set of functions allows the creation, definition of cell arrangement, addition of headers or footers, formatting and definition of cell content with text and/or images. The package also offers a set of high-level functions that allow tabular reporting of statistical models and the creation of complex cross tabulations.

Maintained by David Gohel. Last updated 1 month ago.

docx html5 ms-office-documents rmarkdown table

1.9 match 583 stars 17.04 score 7.3k scripts 119 dependents

cumulocity-iot

pmml:Generate PMML for Various Models

The Predictive Model Markup Language (PMML) is an XML-based language which provides a way for applications to define machine learning, statistical and data mining models and to share models between PMML compliant applications. More information about the PMML industry standard and the Data Mining Group can be found at <http://dmg.org/>. The generated PMML can be imported into any PMML consuming application, such as Zementis Predictive Analytics products. The package isofor (used for anomaly detection) can be installed with devtools::install_github("gravesee/isofor").

Maintained by Dmitriy Bolotov. Last updated 3 years ago.

machine-learning pmml zementis

4.0 match 20 stars 7.98 score 560 scripts 1 dependent

clbustos

dominanceanalysis:Dominance Analysis

Dominance analysis is a method that allows one to compare the relative importance of predictors in multiple regression models: ordinary least squares, generalized linear models, hierarchical linear models, beta regression and dynamic linear models. The main principles and methods of dominance analysis are described in Budescu, D. V. (1993) <doi:10.1037/0033-2909.114.3.542> and Azen, R., & Budescu, D. V. (2003) <doi:10.1037/1082-989X.8.2.129> for ordinary least squares regression. Subsequently, the extensions for multivariate regression, logistic regression and hierarchical linear models were described in Azen, R., & Budescu, D. V. (2006) <doi:10.3102/10769986031002157>, Azen, R., & Traxel, N. (2009) <doi:10.3102/1076998609332754> and Luo, W., & Azen, R. (2013) <doi:10.3102/1076998612458319>, respectively.

Maintained by Claudio Bustos Navarrete. Last updated 1 year ago.

5.5 match 25 stars 5.75 score 45 scripts

mlr-org

mlr3extralearners:Extra Learners For mlr3

Extra learners for use in mlr3.

Maintained by Sebastian Fischer. Last updated 4 months ago.

machine-learning mlr3

3.4 match 94 stars 9.16 score 474 scripts

promidat

traineR:Predictive (Classification and Regression) Models Homologator

Methods to unify the different ways of creating predictive models and their different predictive formats for classification and regression. It includes methods such as K-Nearest Neighbors Schliep, K. P. (2004) <doi:10.5282/ubm/epub.1769>, Decision Trees Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone (2017) <doi:10.1201/9781315139470>, ADA Boosting Esteban Alfaro, Matias Gamez, Noelia García (2013) <doi:10.18637/jss.v054.i02>, Extreme Gradient Boosting Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>, Random Forest Breiman (2001) <doi:10.1023/A:1010933404324>, Neural Networks Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Support Vector Machines Bennett, K. P. & Campbell, C. (2000) <doi:10.1145/380995.380999>, Bayesian Methods Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (1995) <doi:10.1201/9780429258411>, Linear Discriminant Analysis Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Quadratic Discriminant Analysis Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Logistic Regression Dobson, A. J., & Barnett, A. G. (2018) <doi:10.1201/9781315182780> and Penalized Logistic Regression Friedman, J. H., Hastie, T., & Tibshirani, R. (2010) <doi:10.18637/jss.v033.i01>.

Maintained by Oldemar Rodriguez R. Last updated 1 year ago.

8.5 match 3.64 score 36 scripts 2 dependents

bioc

DESeq2:Differential gene expression analysis based on the negative binomial distribution

Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.

Maintained by Michael Love. Last updated 10 days ago.

sequencing rnaseq chipseq geneexpression transcription normalization differentialexpression bayesian regression principalcomponent clustering immunooncology openblas cpp

1.9 match 375 stars 16.11 score 17k scripts 115 dependents

msesia

knockoff:The Knockoff Filter for Controlled Variable Selection

The knockoff filter is a general procedure for controlling the false discovery rate (FDR) when performing variable selection. For more information, see the website below and the accompanying paper: Candes et al., "Panning for gold: model-X knockoffs for high-dimensional controlled variable selection", J. R. Statist. Soc. B (2018) 80, 3, pp. 551-577.

Maintained by Matteo Sesia. Last updated 3 years ago.

5.6 match 2 stars 5.35 score 248 scripts 5 dependents
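Once a feature statistic W_j has been computed for each variable (large positive values favour the original variable over its knockoff), the knockoff filter selects {j : W_j >= T}, where T is the smallest t among the nonzero |W_j| such that (offset + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q. Offset 1 gives the knockoff+ rule. The Python sketch below assumes W is already available (building the knockoffs is out of scope); it is not the knockoff package's R API, and the names are hypothetical.

```python
from typing import List, Sequence


def knockoff_threshold(W: Sequence[float], q: float, offset: int = 1) -> float:
    """Data-dependent threshold of the knockoff filter (offset=1: knockoff+, offset=0: knockoff)."""
    for t in sorted({abs(w) for w in W if w != 0}):
        false_est = offset + sum(1 for w in W if w <= -t)  # estimated false discoveries
        selected = max(1, sum(1 for w in W if w >= t))
        if false_est / selected <= q:
            return t
    return float("inf")  # no threshold reaches the target FDR: select nothing


def knockoff_select(W: Sequence[float], q: float, offset: int = 1) -> List[int]:
    """Indices of the variables selected at target FDR level q."""
    t = knockoff_threshold(W, q, offset)
    return [j for j, w in enumerate(W) if w >= t]
```

Negative W_j act as a built-in estimate of how many null variables would pass a given threshold, which is what lets the filter control the false discovery rate without p-values.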

cran

RCAL:Regularized Calibrated Estimation

Regularized calibrated estimation for causal inference and missing-data problems with high-dimensional data, based on Tan (2020a) <doi:10.1093/biomet/asz059>, Tan (2020b) <doi:10.1214/19-AOS1824> and Sun and Tan (2020) <arXiv:2009.09286>.

Maintained by Zhiqiang Tan. Last updated 4 years ago.

8.5 match 3.49 score 17 scripts 1 dependent

florianhartig

DHARMa:Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models

The 'DHARMa' package uses a simulation-based approach to create readily interpretable scaled (quantile) residuals for fitted (generalized) linear mixed models. Currently supported are linear and generalized linear (mixed) models from 'lme4' (classes 'lmerMod', 'glmerMod'), 'glmmTMB', 'GLMMadaptive', and 'spaMM'; phylogenetic linear models from 'phylolm' (classes 'phylolm' and 'phyloglm'); generalized additive models ('gam' from 'mgcv'); 'glm' (including 'negbin' from 'MASS', but excluding quasi-distributions) and 'lm' model classes. Moreover, externally created simulations, e.g. posterior predictive simulations from Bayesian software such as 'JAGS', 'STAN', or 'BUGS' can be processed as well. The resulting residuals are standardized to values between 0 and 1 and can be interpreted as intuitively as residuals from a linear regression. The package also provides a number of plot and test functions for typical model misspecification problems, such as over/underdispersion, zero-inflation, and residual spatial, phylogenetic and temporal autocorrelation.

Maintained by Florian Hartig. Last updated 11 days ago.

glmm regression regression-diagnostics residual

2.0 match 226 stars 14.74 score 2.8k scripts 10 dependents
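The simulation-based residual idea can be sketched as a randomized probability integral transform: for each observation, count how many values simulated from the fitted model fall below the observed value, and spread ties (common with count data) uniformly. Under a correctly specified model these residuals are roughly uniform on [0, 1]. The Python sketch below shows only this core idea; DHARMa adds further refinements, and the function name is hypothetical rather than part of its R API.

```python
import random
from typing import List, Sequence


def scaled_residuals(
    observed: Sequence[float],
    simulated: Sequence[Sequence[float]],
    rng: random.Random,
) -> List[float]:
    """Quantile residuals: position of each observed value within the values
    simulated for that observation from the fitted model."""
    residuals = []
    for y, sims in zip(observed, simulated):
        below = sum(1 for s in sims if s < y)
        ties = sum(1 for s in sims if s == y)
        # Ties are spread uniformly over their share of the simulations
        residuals.append((below + rng.random() * ties) / len(sims))
    return residuals


# Well-specified toy example: data and simulations come from the same distribution
rng = random.Random(1)
obs = [rng.gauss(0, 1) for _ in range(200)]
sims = [[rng.gauss(0, 1) for _ in range(250)] for _ in obs]
res = scaled_residuals(obs, sims, rng)
```

Systematic departures from uniformity (for example, residuals piling up near 0 and 1) then point to problems such as overdispersion.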

bertcarnell

tornado:Plots for Model Sensitivity and Variable Importance

Draws tornado plots for model sensitivity to univariate changes. Implements methods for many modeling methods including linear models, generalized linear models, survival regression models, and arbitrary machine learning models in the caret package. Also draws variable importance plots.

Maintained by Rob Carnell. Last updated 7 months ago.

explanability regression sensitivity-analysis

6.1 match 7 stars 4.85 score 4 scripts

bioc

compcodeR:RNAseq data simulation, differential expression analysis and performance comparison of differential expression methods

This package provides extensive functionality for comparing results obtained by different methods for differential expression analysis of RNAseq data. It also contains functions for simulating count data. Finally, it provides convenient interfaces to several packages for performing the differential expression analysis. These can also be used as templates for setting up and running a user-defined differential analysis workflow within the framework of the package.

Maintained by Charlotte Soneson. Last updated 3 months ago.

immunooncology rnaseq differentialexpression

3.6 match 11 stars 8.06 score 26 scripts

flxzimmer

mlpwr:A Power Analysis Toolbox to Find Cost-Efficient Study Designs

We implement a surrogate modeling algorithm to guide simulation-based sample size planning. The method is described in detail in our paper (Zimmer & Debelak (2023) <doi:10.1037/met0000611>). It supports multiple study design parameters and optimization with respect to a cost function. It can find optimal designs that correspond to a desired statistical power or that fulfill a cost constraint. We also provide a tutorial paper (Zimmer et al. (2023) <doi:10.3758/s13428-023-02269-0>).

Maintained by Felix Zimmer. Last updated 5 months ago.

5.0 match 4 stars 5.83 score 16 scripts

dqksnow

subsampling:Optimal Subsampling Methods for Statistical Models

Balancing computational and statistical efficiency, subsampling techniques offer a practical solution for handling large-scale data analysis. Subsampling methods enhance statistical modeling for massive datasets by efficiently drawing representative subsamples from the full dataset based on tailored sampling probabilities. These probabilities are optimized for specific goals, such as minimizing the variance of coefficient estimates or reducing prediction error.

Maintained by Qingkai Dong. Last updated 4 months ago.

openblas cpp

5.1 match 1 star 5.60 score 6 scripts

dwarton

ecostats:Code and Data Accompanying the Eco-Stats Text (Warton 2022)

Functions and data supporting the Eco-Stats text (Warton, 2022, Springer), and solutions to exercises. Functions include tools for using simulation envelopes in diagnostic plots, and a function for diagnostic plots of multivariate linear models. Datasets mentioned in the package are included here (where not available elsewhere) and there is a vignette for each chapter of the text with solutions to exercises.

Maintained by David Warton. Last updated 1 year ago.

4.3 match 8 stars 6.58 score 53 scripts

openpharma

beeca:Binary Endpoint Estimation with Covariate Adjustment

Performs estimation of marginal treatment effects for binary outcomes when using logistic regression working models with covariate adjustment (see discussions in Magirr et al (2024) <https://osf.io/9mp58/>). Implements the variance estimators of Ge et al (2011) <doi:10.1177/009286151104500409> and Ye et al (2023) <doi:10.1080/24754269.2023.2205802>.

Maintained by Alex Przybylski. Last updated 4 months ago.

5.2 match 6 stars 5.48 score 8 scripts

melff

memisc:Management of Survey Data and Presentation of Analysis Results

An infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) 'SPSS' and 'Stata' files is provided. Further, the package allows users to produce tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates, which can be exported to 'LaTeX' and HTML.

Maintained by Martin Elff. Last updated 10 days ago.

survey-data

2.3 match 46 stars 12.34 score 1.2k scripts 13 dependents

alexpkeil1

qgcompint:Quantile G-Computation Extensions for Effect Measure Modification

G-computation for a set of time-fixed exposures with quantile-based basis functions, possibly under linearity and homogeneity assumptions. Effect measure modification in this method is a way to assess how the effect of the mixture varies by a binary, categorical or continuous variable. Reference: Alexander P. Keil, Jessie P. Buckley, Katie M. O'Brien, Kelly K. Ferguson, Shanshan Zhao, and Alexandra J. White (2019) A quantile-based g-computation approach to addressing the effects of exposure mixtures; <doi:10.1289/EHP5838>.

Maintained by Alexander Keil. Last updated 3 days ago.

5.6 match 4 stars 4.89 score 13 scripts

bioc

multiHiCcompare:Normalize and detect differences between Hi-C datasets when replicates of each experimental condition are available

multiHiCcompare provides functions for joint normalization and difference detection in multiple Hi-C datasets. This extension of the original HiCcompare package now allows for Hi-C experiments with more than 2 groups and multiple samples per group. multiHiCcompare operates on processed Hi-C data in the form of sparse upper triangular matrices. It accepts four-column (chromosome, region1, region2, IF) tab-separated text files storing chromatin interaction matrices. multiHiCcompare provides cyclic loess and fast loess (fastlo) methods adapted to jointly normalizing Hi-C data. Additionally, it provides a generalized linear model (GLM) framework adapting the edgeR package to detect differences in Hi-C data in a distance-dependent manner.

Maintained by Mikhail Dozmorov. Last updated 5 months ago.

software hic sequencing normalization

3.7 match 9 stars 7.30 score 37 scripts 2 dependents

jinli22

spm2:Spatial Predictive Modeling

An updated and extended version of 'spm' package, by introducing some further novel functions for modern statistical methods (i.e., generalised linear models, glmnet, generalised least squares), thin plate splines, support vector machine, kriging methods (i.e., simple kriging, universal kriging, block kriging, kriging with an external drift), and novel hybrid methods (228 hybrids plus numerous variants) of modern statistical methods or machine learning methods with mathematical and/or univariate geostatistical methods for spatial predictive modelling. For each method, two functions are provided, with one function for assessing the predictive errors and accuracy of the method based on cross-validation, and the other for generating spatial predictions. It also contains a couple of functions for data preparation and predictive accuracy assessment.

Maintained by Jin Li. Last updated 2 years ago.

13.0 match 2.08 score 2 scripts 2 dependents

gabrielshimizu

AgroR:Experimental Statistics and Graphics for Agricultural Sciences

Performs the analysis of completely randomized experimental designs (CRD), randomized blocks (RBD) and Latin square (LSD), experiments in double and triple factorial scheme (in CRD and RBD), experiments in subdivided plot scheme (in CRD and RBD), subdivided and joint analysis of experiments in CRD and RBD, linear regression analysis, test for two samples. The package performs analysis of variance, ANOVA assumptions and multiple comparison test of means or regression, according to Pimentel-Gomes (2009, ISBN: 978-85-7133-055-9), nonparametric test (Conover, 1999, ISBN: 0471160687), test for two samples, joint analysis of experiments according to Ferreira (2018, ISBN: 978-85-7269-566-4) and generalized linear model (glm) for binomial and Poisson family in CRD and RBD (Carvalho, FJ (2019), <doi:10.14393/ufu.te.2019.1244>). It can also be used to obtain descriptive measures and graphics, in addition to correlations and creative graphics used in agricultural sciences (Agronomy, Zootechnics, Food Science and related areas).

Maintained by Gabriel Danilo Shimizu. Last updated 11 months ago.

8.6 match 1 star 3.11 score 173 scripts

f-rousset

spaMM:Mixed-Effect Models, with or without Spatial Random Effects

Inference based on models with or without spatially-correlated random effects, multivariate responses, or non-Gaussian random effects (e.g., Beta). Variation in residual variance (heteroscedasticity) can itself be represented by a mixed-effect model. Both classical geostatistical models (Rousset and Ferdy 2014 <doi:10.1111/ecog.00566>), and Markov random field models on irregular grids (as considered in the 'INLA' package, <https://www.r-inla.org>), can be fitted, with distinct computational procedures exploiting the sparse matrix representations for the latter case and other autoregressive models. Laplace approximations are used for likelihood or restricted likelihood. Penalized quasi-likelihood and other variants discussed in the h-likelihood literature (Lee and Nelder 2001 <doi:10.1093/biomet/88.4.987>) are also implemented.

Maintained by François Rousset. Last updated 9 months ago.

gsl cpp openmp

5.4 match 4.94 score 208 scripts 5 dependents

tidymodels

butcher:Model Butcher

Provides a set of S3 generics to axe components of fitted model objects and help reduce the size of model objects saved to disk.

Maintained by Julia Silge. Last updated 12 days ago.

2.3 match 132 stars 11.54 score 146 scripts 13 dependents

bioc

glmSparseNet:Network Centrality Metrics for Elastic-Net Regularized Models

glmSparseNet is an R-package that generalizes sparse regression models when the features (e.g. genes) have a graph structure (e.g. protein-protein interactions), by including network-based regularizers. glmSparseNet uses the glmnet R-package, by including centrality measures of the network as penalty weights in the regularization. The current version implements regularization based on node degree, i.e. the strength and/or number of its associated edges, either by promoting hubs or orphan genes in the solution. All the glmnet distribution families are supported, namely "gaussian", "poisson", "binomial", "multinomial", "cox", and "mgaussian".

Maintained by André Veríssimo. Last updated 5 months ago.

software statisticalmethod dimensionreduction regression classification survival network graphandnetwork

3.4 match 6 stars 7.42 score 41 scripts 1 dependent

cardiomoon

ztable:Zebra-Striped Tables in LaTeX and HTML Formats

Makes zebra-striped tables (tables with alternating row colors) in LaTeX and HTML formats easily from a data.frame, matrix, lm, aov, anova, glm, coxph, nls, fitdistr, mytable and cbind.mytable objects.

Maintained by Keon-Woong Moon. Last updated 2 years ago.

3.2 match 21 stars 7.90 score 212 scripts 2 dependents

fbertran

plsRglm:Partial Least Squares Regression for Generalized Linear Models

Provides (weighted) Partial least squares Regression for generalized linear models and repeated k-fold cross-validation of such models using various criteria <arXiv:1810.01005>. It allows for missing data in the explanatory variables. Bootstrap confidence interval constructions are also available.

Maintained by Frederic Bertrand. Last updated 2 years ago.

3.3 match 16 stars 7.75 score 103 scripts 5 dependents

jepusto

clubSandwich:Cluster-Robust (Sandwich) Variance Estimators with Small-Sample Corrections

Provides several cluster-robust variance estimators (i.e., sandwich estimators) for ordinary and weighted least squares linear regression models, including the bias-reduced linearization estimator introduced by Bell and McCaffrey (2002) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2002002/article/9058-eng.pdf> and developed further by Pustejovsky and Tipton (2017) <DOI:10.1080/07350015.2016.1247004>. The package includes functions for estimating the variance-covariance matrix and for testing single- and multiple-contrast hypotheses based on Wald test statistics. Tests of single regression coefficients use Satterthwaite or saddle-point corrections. Tests of multiple-contrast hypotheses use an approximation to Hotelling's T-squared distribution. Methods are provided for a variety of fitted models, including lm() and mlm objects, glm(), geeglm() (from package 'geepack'), ivreg() (from package 'AER'), ivreg() (from package 'ivreg' when estimated by ordinary least squares), plm() (from package 'plm'), gls() and lme() (from 'nlme'), lmer() (from 'lme4'), robu() (from 'robumeta'), and rma.uni() and rma.mv() (from 'metafor').

Maintained by James Pustejovsky. Last updated 14 days ago.

2.2 match 48 stars 11.25 score 656 scripts 4 dependents

mathijsdeen

ClusterBootstrap:Analyze Clustered Data with Generalized Linear Models using the Cluster Bootstrap

Provides functionality for the analysis of clustered data using the cluster bootstrap.

Maintained by Mathijs Deen. Last updated 4 years ago.

6.9 match 2 stars 3.60 score 8 scripts
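The cluster bootstrap resamples whole clusters (not individual rows) with replacement, refits the model on each resample, and uses the spread of the estimates for inference, so within-cluster dependence is preserved. The Python sketch below uses a pooled OLS slope as a stand-in for a GLM coefficient and a simple percentile interval; it illustrates the resampling scheme only, is not ClusterBootstrap's R API, and all names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Row = Tuple[float, float]  # (x, y), for illustration


def cluster_bootstrap(
    rows: Sequence[Row],
    clusters: Sequence[Hashable],
    statistic: Callable[[List[Row]], float],
    n_boot: int,
    rng: random.Random,
) -> List[float]:
    """Recompute the statistic on n_boot resamples built from whole clusters."""
    by_cluster: Dict[Hashable, List[Row]] = defaultdict(list)
    for row, c in zip(rows, clusters):
        by_cluster[c].append(row)
    ids = list(by_cluster)
    estimates = []
    for _ in range(n_boot):
        drawn = rng.choices(ids, k=len(ids))  # clusters are the sampling units
        sample = [row for c in drawn for row in by_cluster[c]]
        estimates.append(statistic(sample))
    return estimates


def percentile_interval(estimates: Sequence[float], level: float = 0.95) -> Tuple[float, float]:
    """Crude percentile interval from the bootstrap distribution."""
    s = sorted(estimates)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi


def ols_slope(sample: List[Row]) -> float:
    """Pooled least-squares slope of y on x (stand-in for a model coefficient)."""
    n = len(sample)
    mx = sum(x for x, _ in sample) / n
    my = sum(y for _, y in sample) / n
    sxx = sum((x - mx) ** 2 for x, _ in sample)
    sxy = sum((x - mx) * (y - my) for x, y in sample)
    return sxy / sxx
```

Because resampling happens at the cluster level, the number of clusters (not the number of rows) governs how well the bootstrap distribution approximates the sampling distribution.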

atahk

pscl:Political Science Computational Laboratory

Bayesian analysis of item-response theory (IRT) models, roll call analysis; computing highest density regions; maximum likelihood estimation of zero-inflated and hurdle models for count data; goodness-of-fit measures for GLMs; data sets used in writing and teaching; seats-votes curves.

Maintained by Simon Jackman. Last updated 1 year ago.

1.9 match 67 stars 13.28 score 2.7k scripts 54 dependents

bioc

RCM:Fit row-column association models with the negative binomial distribution for the microbiome

Combine ideas of log-linear analysis of contingency tables, flexible response function estimation and empirical Bayes dispersion estimation for explorative visualization of microbiome datasets. The package includes unconstrained as well as constrained analysis. In addition, diagnostic plots to detect lack of fit are available.

Maintained by Stijn Hawinkel. Last updated 5 months ago.

metagenomics dimensionreduction microbiome visualization ordination phyloseq rcm

3.5 match 16 stars 6.90 score 25 scripts

merliseclyde

BAS:Bayesian Variable Selection and Model Averaging using Bayesian Adaptive Sampling

Package for Bayesian Variable Selection and Model Averaging in linear models and generalized linear models using stochastic or deterministic sampling without replacement from posterior distributions. Prior distributions on coefficients are from Zellner's g-prior or mixtures of g-priors corresponding to the Zellner-Siow Cauchy Priors or the mixture of g-priors from Liang et al (2008) <DOI:10.1198/016214507000001337> for linear models or mixtures of g-priors from Li and Clyde (2019) <DOI:10.1080/01621459.2018.1469992> in generalized linear models. Other model selection criteria include AIC, BIC and Empirical Bayes estimates of g. Sampling probabilities may be updated based on the sampled models using sampling w/out replacement or an efficient MCMC algorithm which samples models using a tree structure of the model space as an efficient hash table. See Clyde, Ghosh and Littman (2010) <DOI:10.1198/jcgs.2010.09049> for details on the sampling algorithms. Uniform priors over all models or beta-binomial prior distributions on model size are allowed, and for large p truncated priors on the model space may be used to enforce sampling models that are full rank. The user may force variables to always be included in addition to imposing constraints that higher order interactions are included only if their parents are included in the model. This material is based upon work supported by the National Science Foundation under Division of Mathematical Sciences grant 1106891. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Maintained by Merlise Clyde. Last updated 4 months ago.

bayesian bayesian-inference generalized-linear-models linear-regression logistic-regression mcmc model-selection poisson-regression predictive-modeling regression variable-selection fortran openblas

2.3 match 44 stars 10.81 score 420 scripts 3 dependents

r-spatial

spatialreg:Spatial Regression Analysis

A collection of all the estimation functions for spatial cross-sectional models (on lattice/areal data using spatial weights matrices) contained up to now in 'spdep'. These model fitting functions include maximum likelihood methods for cross-sectional models proposed by 'Cliff' and 'Ord' (1973, ISBN:0850860369) and (1981, ISBN:0850860814), and fitting methods initially described by 'Ord' (1975) <doi:10.1080/01621459.1975.10480272>. The models are further described by 'Anselin' (1988) <doi:10.1007/978-94-015-7799-1>. Spatial two-stage least squares and spatial generalized method of moments models initially proposed by 'Kelejian' and 'Prucha' (1998) <doi:10.1023/A:1007707430416> and (1999) <doi:10.1111/1468-2354.00027> are provided. Impact methods and MCMC fitting methods proposed by 'LeSage' and 'Pace' (2009) <doi:10.1201/9781420064254> are implemented for the family of cross-sectional spatial regression models. Methods for fitting the log determinant term in maximum likelihood and MCMC fitting are compared by 'Bivand et al.' (2013) <doi:10.1111/gean.12008>, and model fitting methods by 'Bivand' and 'Piras' (2015) <doi:10.18637/jss.v063.i18>; both of these articles include extensive lists of references. A recent review is provided by 'Bivand', 'Millo' and 'Piras' (2021) <doi:10.3390/math9111276>. 'spatialreg' >= 1.1-* corresponded to 'spdep' >= 1.1-1, in which the model fitting functions were deprecated and passed through to 'spatialreg', but masked those in 'spatialreg'. From versions 1.2-*, the functions have been made defunct in 'spdep'. From version 1.3-6, the Anselin-Kelejian (1997) test for residual spatial autocorrelation <doi:10.1177/016001769702000109> is added to `stsls`.

Maintained by Roger Bivand. Last updated 2 days ago.

bayesian impacts maximum-likelihood spatial-dependence spatial-econometrics spatial-regression openblas

1.9 match 46 stars 12.92 score 916 scripts 24 dependents

zuoyi93

ProSGPV:Penalized Regression with Second-Generation P-Values

Implementation of penalized regression with second-generation p-values for variable selection. The algorithm can handle linear regression, GLM, and Cox regression. S3 methods print(), summary(), coef(), predict(), and plot() are available for the algorithm. Technical details can be found at Zuo et al. (2021) <doi:10.1080/00031305.2021.1946150>.

Maintained by Yi Zuo. Last updated 4 years ago.

penalized-regression

5.1 match 5 stars 4.70 score 9 scripts

khliland

mixlm:Mixed Model ANOVA and Statistics for Education

The main functions perform mixed model analysis by least squares or REML by adding the function r() to formulas of lm() and glm(). A collection of textbook statistics for higher education is also included, e.g. modifications of the functions lm(), glm() and associated summaries from the package 'stats'.

Maintained by Kristian Hovde Liland. Last updated 30 days ago.

4.1 match 5.87 score 56 scripts 3 dependents

richjjackson

psc:Personalised Synthetic Controls

Allows the comparison of data cohorts (DC) against a Counter Factual Model (CFM) and measures the difference in terms of an efficacy parameter. Allows the application of Personalised Synthetic Controls.

Maintained by Richard Jackson. Last updated 4 months ago.

5.7 match 1 star 4.23 score 24 scripts

mandymejia

BayesfMRI:Spatial Bayesian Methods for Task Functional MRI Studies

Performs a spatial Bayesian general linear model (GLM) for task functional magnetic resonance imaging (fMRI) data on the cortical surface. Additional models include group analysis and inference to detect thresholded areas of activation. Includes direct support for the 'CIFTI' neuroimaging file format. For more information see A. F. Mejia, Y. R. Yue, D. Bolin, F. Lindgren, M. A. Lindquist (2020) <doi:10.1080/01621459.2019.1611582> and D. Spencer, Y. R. Yue, D. Bolin, S. Ryan, A. F. Mejia (2022) <doi:10.1016/j.neuroimage.2022.118908>.

Maintained by Amanda Mejia. Last updated 7 days ago.

cpp

4.1 match 26 stars 5.77 score 19 scripts

cran

clusterSEs:Calculate Cluster-Robust p-Values and Confidence Intervals

Calculate p-values and confidence intervals using cluster-adjusted t-statistics (based on Ibragimov and Muller (2010) <DOI:10.1198/jbes.2009.08046>), pairs cluster bootstrapped t-statistics, and wild cluster bootstrapped t-statistics (the latter two techniques based on Cameron, Gelbach, and Miller (2008) <DOI:10.1162/rest.90.3.414>). Procedures are included for use with GLM, ivreg, plm (pooling or fixed effects), and mlogit models.

Maintained by Justin Esarey. Last updated 4 years ago.

13.4 match 2 stars 1.78 score 1 dependent

psychbruce

bruceR:Broadly Useful Convenient and Efficient R Functions

Broadly useful convenient and efficient R functions that bring users concise and elegant R data analyses. This package includes easy-to-use functions for (1) basic R programming (e.g., set working directory to the path of currently opened file; import/export data from/to files in any format; print tables to Microsoft Word); (2) multivariate computation (e.g., compute scale sums/means/... with reverse scoring); (3) reliability analyses and factor analyses; (4) descriptive statistics and correlation analyses; (5) t-test, multi-factor analysis of variance (ANOVA), simple-effect analysis, and post-hoc multiple comparison; (6) tidy report of statistical models (to R Console and Microsoft Word); (7) mediation and moderation analyses (PROCESS); and (8) additional toolbox for statistics and graphics.

Maintained by Han-Wu-Shuang Bao. Last updated 9 months ago.

anova data-analysis data-science linear-models linear-regression multilevel-models statistics toolbox

3.0 match 176 stars 7.87 score 316 scripts 3 dependents

bioc

edgeR:Empirical Analysis of Digital Gene Expression Data in R

Differential expression analysis of sequence count data. Implements a range of statistical methodology based on the negative binomial distribution, including empirical Bayes estimation, exact tests, generalized linear models, quasi-likelihood, and gene set enrichment. Can perform differential analyses of any type of omics data that produces read counts, including RNA-seq, ChIP-seq, ATAC-seq, Bisulfite-seq, SAGE, CAGE, metabolomics, or proteomics spectral counts. RNA-seq analyses can be conducted at the gene or isoform level, and tests can be conducted for differential exon or transcript usage.

Maintained by Yunshun Chen. Last updated 4 days ago.

alternativesplicing batcheffect bayesian biomedicalinformatics cellbiology chipseq clustering coverage differentialexpression differentialmethylation differentialsplicing dnamethylation epigenetics functionalgenomics geneexpression genesetenrichment genetics immunooncology multiplecomparison normalization pathways proteomics qualitycontrol regression rnaseq sage sequencing singlecell systemsbiology timecourse transcription transcriptomics openblas

1.8 match 13.40 score 17k scripts 255 dependents

bioc

msqrob2:Robust statistical inference for quantitative LC-MS proteomics

msqrob2 provides a robust linear mixed model framework for assessing differential abundance in MS-based quantitative proteomics experiments. Our workflows can start from raw peptide intensities or summarised protein expression values. The model parameter estimates can be stabilized by ridge regression, empirical Bayes variance estimation and robust M-estimation. msqrob2's hurdle workflow can handle missing data without having to rely on hard-to-verify imputation assumptions and outcompetes state-of-the-art methods with and without imputation for both high and low missingness. It builds on the QFeatures infrastructure for quantitative mass spectrometry data to store the model results together with the raw data and preprocessed data.

Maintained by Lieven Clement. Last updated 17 days ago.

proteomics massspectrometry differentialexpression multiplecomparison regression experimentaldesign software immunooncology normalization timecourse preprocessing

3.4 match 10 stars 6.94 score 83 scripts

nilotpalsanyal

BHMSMAfMRI:Bayesian Hierarchical Multi-Subject Multiscale Analysis of Functional MRI (fMRI) Data

Package BHMSMAfMRI performs Bayesian hierarchical multi-subject multiscale analysis of fMRI data as described in Sanyal & Ferreira (2012) <DOI:10.1016/j.neuroimage.2012.08.041>, or other multiscale data, using a wavelet-based prior that borrows strength across subjects and provides posterior smoothed images of the effect sizes and samples from the posterior distribution.

Maintained by Nilotpal Sanyal. Last updated 2 years ago.

bayesian-hierarchical-models fmri-data-analysis multiscale-data wavelet-transform openblas cpp openmp

8.3 match 2.81 score 13 scripts

cran

CDsampling:'CDsampling': Constraint Sampling in Paid Research Studies

In the context of paid research studies and clinical trials, budget considerations and patient sampling from available populations are subject to inherent constraints. We introduce the 'CDsampling' package, which integrates optimal design theories within the framework of constrained sampling. This package offers the possibility to find both D-optimal approximate and exact allocations for samplings with or without constraints. Additionally, it provides functions to find constrained uniform sampling as a robust sampling strategy with limited model information. Our package offers functions for the computation of the Fisher information matrix under generalized linear models (including regular linear regression model) and multinomial logistic models. To demonstrate the applications, we also provide a simulated dataset and a real dataset embedded in the package. Yifei Huang, Liping Tong, and Jie Yang (2025) <doi:10.5705/ss.202022.0414>.

Maintained by Yifei Huang. Last updated 2 months ago.

9.2 match 2.48 score

agbarnett

season:Seasonal Analysis of Health Data

Routines for the seasonal analysis of health data, including regression models, time-stratified case-crossover, plotting functions and residual checks, see Barnett and Dobson (2010) ISBN 978-3-642-10748-1. Thanks to Yuming Guo for checking the case-crossover code.

Maintained by Adrian Barnett. Last updated 3 years ago.

non-linear seasons time-series

3.9 match 2 stars 5.85 score 70 scripts

tidymodels

tidypredict:Run Predictions Inside the Database

It parses a fitted 'R' model object and returns a formula in 'Tidy Eval' code that calculates the predictions. It works with several database back-ends because it leverages 'dplyr' and 'dbplyr' for the final 'SQL' translation of the algorithm. It currently supports lm(), glm(), randomForest(), ranger(), earth(), xgb.Booster.complete(), cubist(), and ctree() models.

Maintained by Emil Hvitfeldt. Last updated 3 months ago.

dbplyr dplyr purrr rlang

2.0 match 261 stars 11.03 score 241 scripts 2 dependents
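The idea behind tidypredict, turning a fitted model into an expression that a database can evaluate, can be sketched for the simplest GLM case. The Python below is a hypothetical illustration, not tidypredict's API (the package itself is written in R and emits 'Tidy Eval' code that 'dbplyr' translates to SQL). `glm_to_sql` is an invented helper that assumes the coefficients have already been extracted into a dict keyed by column name, that column names are safe unquoted SQL identifiers, and that the target SQL dialect provides an `EXP` function.

```python
def glm_to_sql(intercept, coefs, link="identity"):
    """Render a GLM's prediction formula as a SQL expression string.

    Hypothetical helper for illustration: `coefs` maps column names (assumed
    to be safe, unquoted SQL identifiers) to fitted coefficients.
    """
    # Linear predictor: intercept + sum(coefficient * column)
    terms = [repr(float(intercept))]
    for column, value in coefs.items():
        terms.append(f"({float(value)!r} * {column})")
    eta = " + ".join(terms)
    if link == "identity":
        return eta
    if link == "logit":
        # Inverse logit maps the linear predictor to a probability.
        return f"1.0 / (1.0 + EXP(-({eta})))"
    if link == "log":
        # Inverse log link, e.g. for Poisson regression.
        return f"EXP({eta})"
    raise ValueError(f"unsupported link: {link}")


if __name__ == "__main__":
    print(glm_to_sql(-1.5, {"age": 0.03, "income": 0.0001}, link="logit"))
```

For a logistic model, `glm_to_sql(-1.5, {"age": 0.03}, link="logit")` yields `1.0 / (1.0 + EXP(-(-1.5 + (0.03 * age))))`, which the database can evaluate row by row without pulling data into the client. A real translator would also have to handle factor dummy coding, identifier quoting, and dialect differences.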

insightsengineering

tern:Create Common TLGs Used in Clinical Trials

Tables, Listings, and Graphs (TLG) library for common outputs used in clinical trials.

Maintained by Joe Zhu. Last updated 2 months ago.

clinical-trials graphs listings nest outputs tables

1.8 match 79 stars 12.62 score 186 scripts 9 dependents

wleoncio

EMJMCMC:Evolutionary Mode Jumping Markov Chain Monte Carlo Expert Toolbox

Implementation of the Mode Jumping Markov Chain Monte Carlo algorithm from Hubin, A., Storvik, G. (2018) <doi:10.1016/j.csda.2018.05.020>, Genetically Modified Mode Jumping Markov Chain Monte Carlo from Hubin, A., Storvik, G., & Frommlet, F. (2020) <doi:10.1214/18-BA1141>, Hubin, A., Storvik, G., & Frommlet, F. (2021) <doi:10.1613/jair.1.13047>, and Hubin, A., Heinze, G., & De Bin, R. (2023) <doi:10.3390/fractalfract7090641>, and Reversible Genetically Modified Mode Jumping Markov Chain Monte Carlo from Hubin, A., Frommlet, F., & Storvik, G. (2021) <doi:10.48550/arXiv.2110.05316>, which allow for estimating posterior model probabilities and Bayesian model averaging across a wide set of Bayesian models including linear, generalized linear, generalized linear mixed, generalized nonlinear, generalized nonlinear mixed, and logic regression models.

Maintained by Waldir Leoncio. Last updated 11 months ago.

14.9 match 1.46 score 29 scripts

bxc147

Epi:Statistical Analysis in Epidemiology

Functions for demographic and epidemiological analysis in the Lexis diagram, i.e. register and cohort follow-up data. In particular, the Lexis suite of functions provides representation, manipulation, rate estimation and simulation for multistate data, and includes interfaces to the 'mstate', 'etm' and 'cmprsk' packages. Also contains functions for Age-Period-Cohort and Lee-Carter modeling, a function for interval-censored data, some useful functions for tabulation and plotting, and a number of epidemiological data sets.

Maintained by Bendix Carstensen. Last updated 2 months ago.

2.3 match 4 stars 9.65 score 708 scripts 11 dependents

cran

smurf:Sparse Multi-Type Regularized Feature Modeling

Implementation of the SMuRF algorithm of Devriendt et al. (2021) <doi:10.1016/j.insmatheco.2020.11.010> to fit generalized linear models (GLMs) with multiple types of predictors via regularized maximum likelihood.

Maintained by Tom Reynkens. Last updated 20 days ago.

openblas cpp

6.7 match 3.21 score 27 scripts 1 dependent

syedhaider5

chicane:Capture Hi-C Analysis Engine

Toolkit for processing and calling interactions in capture Hi-C data. Converts BAM files into counts of reads linking restriction fragments, and identifies pairs of fragments that interact more than expected by chance. Significant interactions are identified by comparing the observed read count to the expected background rate from a count regression model.

Maintained by Syed Haider. Last updated 3 years ago.

7.8 match 2.75 score 28 scripts

gmcmacran

GlmSimulatoR:Creates Ideal Data for Generalized Linear Models

Creates ideal data for all distributions in the generalized linear model framework.

Maintained by Greg McMahan. Last updated 8 months ago.

4.2 match 1 star 5.12 score 53 scripts

steve-the-bayesian

BoomSpikeSlab:MCMC for Spike and Slab Regression

Spike and slab regression with a variety of residual error distributions corresponding to Gaussian, Student t, probit, logit, SVM, and a few others. Spike and slab regression is Bayesian regression with prior distributions containing a point mass at zero. The posterior updates the amount of mass on this point, leading to a posterior distribution that is actually sparse, in the sense that if you sample from it many coefficients are exactly zero. Sampling from this posterior distribution is an elegant way to handle Bayesian variable selection and model averaging. See <DOI:10.1504/IJMMNO.2014.059942> for an explanation of the Gaussian case.

Maintained by Steven L. Scott. Last updated 1 year ago.

cpp

3.9 match 6 stars 5.46 score 95 scripts 5 dependents
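To make the "point mass at zero" concrete, the Python sketch below draws coefficient vectors from a simple spike-and-slab prior. It illustrates the prior only, not BoomSpikeSlab's MCMC posterior sampler; the independent per-coefficient inclusion probability and the Normal(0, slab_sd^2) slab are simplifying assumptions, and `draw_spike_and_slab` is a hypothetical name.

```python
import random


def draw_spike_and_slab(p, inclusion_prob, slab_sd, rng):
    """Draw p coefficients from a spike-and-slab prior (illustrative sketch).

    Each coefficient is exactly 0.0 (the "spike", a point mass at zero) with
    probability 1 - inclusion_prob; otherwise it is drawn from a
    Normal(0, slab_sd**2) "slab".
    """
    beta = []
    for _ in range(p):
        if rng.random() < inclusion_prob:
            beta.append(rng.gauss(0.0, slab_sd))  # slab draw
        else:
            beta.append(0.0)  # spike: an exact zero, not just a small value
    return beta


if __name__ == "__main__":
    rng = random.Random(42)
    beta = draw_spike_and_slab(p=20, inclusion_prob=0.2, slab_sd=1.0, rng=rng)
    n_zero = sum(1 for b in beta if b == 0.0)
    print(f"{n_zero} of {len(beta)} coefficients are exactly zero")
```

With `inclusion_prob=0.2`, roughly four in five draws are exactly `0.0` rather than merely small. In the posterior, the data update these inclusion probabilities per coefficient, which is what makes the sampled models sparse.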

suyusung

arm:Data Analysis Using Regression and Multilevel/Hierarchical Models

Functions to accompany A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, 2007.

Maintained by Yu-Sung Su. Last updated 4 months ago.

1.7 match 25 stars 12.38 score 3.3k scripts 89 dependents

scheike

timereg:Flexible Regression Models for Survival Data

Programs for Martinussen and Scheike (2006), 'Dynamic Regression Models for Survival Data', Springer Verlag. Plus more recent developments. Additive survival model, semiparametric proportional odds model, fast cumulative residuals, excess risk models and more. Flexible competing risks regression including GOF-tests. Two-stage frailty modelling. PLS for the additive risk model. Lasso in the 'ahaz' package.

Maintained by Thomas Scheike. Last updated 6 months ago.

openblas

2.0 match 31 stars 10.42 score 289 scripts 44 dependents

alexanderrobitzsch

miceadds:Some Additional Multiple Imputation Functions, Especially for 'mice'

Contains functions for multiple imputation which complement existing functionality in R. In particular, several imputation methods for the mice package (van Buuren & Groothuis-Oudshoorn, 2011, <doi:10.18637/jss.v045.i03>) are implemented. Main features of the miceadds package include plausible value imputation (Mislevy, 1991, <doi:10.1007/BF02294457>), multilevel imputation for variables at any level or with any number of hierarchical and non-hierarchical levels (Grund, Luedtke & Robitzsch, 2018, <doi:10.1177/1094428117703686>; van Buuren, 2018, Ch.7, <doi:10.1201/9780429492259>), imputation using partial least squares (PLS) for high dimensional predictors (Robitzsch, Pham & Yanagida, 2016), nested multiple imputation (Rubin, 2003, <doi:10.1111/1467-9574.00217>), substantive model compatible imputation (Bartlett et al., 2015, <doi:10.1177/0962280214521348>), and features for the generation of synthetic datasets (Reiter, 2005, <doi:10.1111/j.1467-985X.2004.00343.x>; Nowok, Raab, & Dibben, 2016, <doi:10.18637/jss.v074.i11>).

Maintained by Alexander Robitzsch. Last updated 14 days ago.

missing-data multiple-imputation openblas cpp

2.3 match 16 stars 9.16 score 542 scripts 9 dependents

spatstat

spatstat.model:Parametric Statistical Modelling and Inference for the 'spatstat' Family

Functionality for parametric statistical modelling and inference for spatial data, mainly spatial point patterns, in the 'spatstat' family of packages. (Excludes analysis of spatial data on a linear network, which is covered by the separate package 'spatstat.linnet'.) Supports parametric modelling, formal statistical inference, and model validation. Parametric models include Poisson point processes, Cox point processes, Neyman-Scott cluster processes, Gibbs point processes and determinantal point processes. Models can be fitted to data using maximum likelihood, maximum pseudolikelihood, maximum composite likelihood and the method of minimum contrast. Fitted models can be simulated and predicted. Formal inference includes hypothesis tests (quadrat counting tests, Cressie-Read tests, Clark-Evans test, Berman test, Diggle-Cressie-Loosmore-Ford test, scan test, studentised permutation test, segregation test, ANOVA tests of fitted models, adjusted composite likelihood ratio test, envelope tests, Dao-Genton test, balanced independent two-stage test), confidence intervals for parameters, and prediction intervals for point counts. Model validation techniques include leverage, influence, partial residuals, added variable plots, diagnostic plots, pseudoscore residual plots, model compensators and Q-Q plots.

Maintained by Adrian Baddeley. Last updated 6 days ago.

analysis-of-variance cluster-process confidence-intervals cox-process determinantal-point-processes gibbs-process influence leverage model-diagnostics neyman-scott parameter-estimation poisson-process spatial-analysis spatial-modelling spatial-point-processes statistical-inference

2.3 match 5 stars 9.09 score 6 scripts 46 dependents

dmurdoch

ellipse:Functions for Drawing Ellipses and Ellipse-Like Confidence Regions

Contains various routines for drawing ellipses and ellipse-like confidence regions, implementing the plots described in Murdoch and Chow (1996, <doi:10.2307/2684435>). There are also routines implementing the profile plots described in Bates and Watts (1988, <doi:10.1002/9780470316757>).

Maintained by Duncan Murdoch. Last updated 2 years ago.

1.8 match 4 stars 11.13 score 1.2k scripts 256 dependents

cran

MuMIn:Multi-Model Inference

Tools for model selection and model averaging with support for a wide range of statistical models. Automated model selection through subsets of the maximum model, with optional constraints for model inclusion. Averaging of model parameters and predictions based on model weights derived from information criteria (AICc and similar criteria) or custom model weighting schemes.

Maintained by Kamil Bartoń. Last updated 9 months ago.

2.3 match 8 stars 8.84 score 5.6k scripts 27 dependents
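Information-criterion model weights of the kind MuMIn describes are commonly computed as Akaike weights, w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i is model i's AICc minus the smallest AICc among the candidate models. The Python sketch below applies that standard formula; it is not MuMIn's R code, and the helper names and example values are made up for illustration.

```python
import math


def akaike_weights(ic_values):
    """Convert information-criterion values (e.g. AICc) into model weights."""
    best = min(ic_values)
    # Delta_i = IC_i - min(IC); the relative likelihood of model i is exp(-Delta_i / 2).
    rel = [math.exp(-(ic - best) / 2.0) for ic in ic_values]
    total = sum(rel)
    # Normalize so the weights sum to 1 across the candidate set.
    return [r / total for r in rel]


def model_average(estimates, weights):
    """Weighted average of one parameter estimate across candidate models."""
    return sum(e * w for e, w in zip(estimates, weights))


if __name__ == "__main__":
    aicc = [100.0, 102.0, 110.0]  # hypothetical AICc values for three models
    w = akaike_weights(aicc)
    print([round(x, 3) for x in w])
    # The third model omits the term, so its estimate is entered as 0.0.
    print(round(model_average([1.2, 0.9, 0.0], w), 3))
```

Entering 0.0 for models that omit a term gives the so-called full (zero-substituted) average; averaging only over the models that include the term, with renormalized weights, is the conditional alternative.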

hannahlowens

voluModel:Modeling Species Distributions in Three Dimensions

Facilitates modeling species' ecological niches and geographic distributions based on occurrences and environments that have a vertical as well as horizontal component, and projecting models into three-dimensional geographic space. Working in three dimensions is useful in an aquatic context when the organisms one wishes to model can be found across a wide range of depths in the water column. The package also contains functions to automatically generate marine model training regions using machine learning, and interpolate and smooth patchily sampled environmental rasters using thin plate splines. Davis Rabosky AR, Cox CL, Rabosky DL, Title PO, Holmes IA, Feldman A, McGuire JA (2016) <doi:10.1038/ncomms11484>. Nychka D, Furrer R, Paige J, Sain S (2021) <doi:10.5065/D6W957CT>. Pateiro-Lopez B, Rodriguez-Casal A (2022) <https://CRAN.R-project.org/package=alphahull>.

Maintained by Hannah L. Owens. Last updated 18 hours ago.

3.0 match 9 stars 6.60 score 35 scripts

rstudio

tfprobability:Interface to 'TensorFlow Probability'

Interface to 'TensorFlow Probability', a 'Python' library built on 'TensorFlow' that makes it easy to combine probabilistic models and deep learning on modern hardware ('TPU', 'GPU'). 'TensorFlow Probability' includes a wide selection of probability distributions and bijectors, probabilistic layers, variational inference, Markov chain Monte Carlo, and optimizers such as Nelder-Mead, BFGS, and SGLD.

Maintained by Tomasz Kalinowski. Last updated 3 years ago.

2.3 match 54 stars 8.63 score 221 scripts 3 dependents

sachaepskamp

qgraph:Graph Plotting Methods, Psychometric Data Visualization and Graphical Model Estimation

Fork of qgraph - Weighted network visualization and analysis, as well as Gaussian graphical model computation. See Epskamp et al. (2012) <doi:10.18637/jss.v048.i04>.

Maintained by Sacha Epskamp. Last updated 1 year ago.

cpp

1.7 match 69 stars 11.43 score 1.2k scripts 63 dependents

angieshen6

BayesPPD:Bayesian Power Prior Design

Bayesian power/type I error calculation and model fitting using the power prior and the normalized power prior for generalized linear models. Detailed examples of applying the package are available at <doi:10.32614/RJ-2023-016>. Models for time-to-event outcomes are implemented in the R package 'BayesPPDSurv'. The Bayesian clinical trial design methodology is described in Chen et al. (2011) <doi:10.1111/j.1541-0420.2011.01561.x>, and Psioda and Ibrahim (2019) <doi:10.1093/biostatistics/kxy009>. The normalized power prior is described in Duan et al. (2006) <doi:10.1002/env.752> and Ibrahim et al. (2015) <doi:10.1002/sim.6728>.

Maintained by Yueqi Shen. Last updated 2 months ago.

openblas cpp openmp

7.8 match 2.48 score 7 scripts 1 dependent

cvoeten

permutes:Permutation Tests for Time Series Data

Helps you determine the analysis window to use when analyzing densely-sampled time-series data, such as EEG data, using permutation testing (Maris & Oostenveld, 2007) <doi:10.1016/j.jneumeth.2007.03.024>. These permutation tests can help identify the timepoints where significance of an effect begins and ends, and the results can be plotted in various types of heatmaps for reporting. Mixed-effects models are supported using an implementation of the approach by Lee & Braun (2012) <doi:10.1111/j.1541-0420.2011.01675.x>.

Maintained by Cesko C. Voeten. Last updated 2 years ago.

4.5 match 4.23 score 16 scripts

batss-dev

BATSS:Bayesian Adaptive Trial Simulator Software (BATSS) for Generalised Linear Models

Defines operating characteristics of Bayesian Adaptive Trials with a generalised linear model response via Monte Carlo simulations of Bayesian GLMs fitted with integrated nested Laplace approximations (INLA).

Maintained by Dominique-Laurent Couturier. Last updated 5 months ago.

4.6 match 2 stars 4.15 score

statsgary

OddsPlotty:Odds Plot to Visualise a Logistic Regression Model

Uses the outputs of a logistic regression model, from caret <https://CRAN.R-project.org/package=caret>, to build an odds plot. This allows for the rapid visualisation of odds ratios and works best with the outputs of caret's GLM model class, by returning the final trained model.

Maintained by Gary Hutson. Last updated 27 days ago.

3.0 match 17 stars 6.39 score 48 scripts 1 dependent

mages

ChainLadder:Statistical Methods and Models for Claims Reserving in General Insurance

Various statistical methods and models which are typically used for the estimation of outstanding claims reserves in general insurance, including those to estimate the claims development result as required under Solvency II.

Maintained by Markus Gesmann. Last updated 1 month ago.

1.9 match 82 stars 10.04 score 196 scripts 2 dependents

winvector

wrapr:Wrap R Tools for Debugging and Parametric Programming

Tools for writing and debugging R code. Provides: '%.>%' dot-pipe (an 'S3' configurable pipe), unpack/to (R style multiple assignment/return), 'build_frame()'/'draw_frame()' ('data.frame' example tools), 'qc()' (quoting concatenate), ':=' (named map builder), 'let()' (converts non-standard evaluation interfaces to parametric standard evaluation interfaces, inspired by 'gtools::strmacro()' and 'base::bquote()'), and more.

Maintained by John Mount. Last updated 2 years ago.

1.7 match 137 stars 11.11 score 390 scripts 12 dependents

harrison4192

autostats:Auto Stats

Automatically do statistical exploration. Create formulas using 'tidyselect' syntax, and then determine cross-validated model accuracy and variable contributions using 'glm' and 'xgboost'. Contains additional helper functions to create and modify formulas. Has a flagship function to quickly determine relationships between categorical and continuous variables in the data set.

Maintained by Harrison Tietze. Last updated 10 days ago.

2.8 match 6 stars 6.76 score 5 scripts 2 dependents

marc-girondot

HelpersMG:Tools for Environmental Analyses, Ecotoxicology and Various R Functions

Contains miscellaneous functions useful for managing 'NetCDF' files (see <https://en.wikipedia.org/wiki/NetCDF>), getting moon phase and sunrise and sunset times, getting tide levels, analysing and reconstructing periodic time series of temperature with an irregular sinusoidal pattern, showing scales and wind roses in plots with changing text color, running the Metropolis-Hastings algorithm for Bayesian MCMC analysis, plotting graphs or boxplots with error bars, searching files on disk by their names or their content, and reading the contents of all files from a folder at one time.

Maintained by Marc Girondot. Last updated 2 months ago.

4.0 match 4 stars 4.59 score 160 scripts 4 dependents

rolkra

explore:Simplifies Exploratory Data Analysis

Interactive data exploration with one line of code, automated reporting or use an easy to remember set of tidy functions for low code exploratory data analysis.

Maintained by Roland Krasser. Last updated 3 months ago.

data-exploration data-visualisation decision-trees eda rmarkdown shiny tidy

1.6 match 228 stars 11.43 score 221 scripts 1 dependent

sinnweja

pleio:Pleiotropy Test for Multiple Traits on a Genetic Marker

Perform tests for pleiotropy of multiple traits of various variable types on genotypes for a genetic marker.

Maintained by Jason Sinnwell. Last updated 1 year ago.

6.0 match 3.00 score 7 scripts

sachsmc

stdReg2:Regression Standardization for Causal Inference

Contains more modern tools for causal inference using regression standardization. Four general classes of models are implemented: generalized linear models, conditional generalized estimating equation models, Cox proportional hazards models, and shared frailty gamma-Weibull models. Methodological details are described in Sjölander, A. (2016) <doi:10.1007/s10654-016-0157-3>. Also includes functionality for doubly robust estimation for generalized linear models in some special cases, and the ability to implement custom models.

Maintained by Michael C Sachs. Last updated 15 days ago.

3.5 match 2 stars 5.08 score 9 scripts

bips-hb

neuralnet:Training of Neural Networks

Training of neural networks using backpropagation, resilient backpropagation with (Riedmiller, 1994) or without weight backtracking (Riedmiller and Braun, 1993) or the modified globally convergent version by Anastasiadis et al. (2005). The package allows flexible settings through custom-choice of error and activation function. Furthermore, the calculation of generalized weights (Intrator O & Intrator N, 1993) is implemented.

Maintained by Marvin N. Wright. Last updated 4 years ago.

1.7 match 32 stars 10.73 score 2.9k scripts 38 dependents

american-institutes-for-research

EdSurvey:Analysis of NCES Education Survey and Assessment Data

Functions to read in and analyze education survey and assessment data from the National Center for Education Statistics (NCES) <https://nces.ed.gov/>, including National Assessment of Educational Progress (NAEP) data <https://nces.ed.gov/nationsreportcard/> and data from the International Assessment Database: Organisation for Economic Co-operation and Development (OECD) <https://www.oecd.org/en/about/directorates/directorate-for-education-and-skills.html>, including Programme for International Student Assessment (PISA), Teaching and Learning International Survey (TALIS), Programme for the International Assessment of Adult Competencies (PIAAC), and International Association for the Evaluation of Educational Achievement (IEA) <https://www.iea.nl/>, including Trends in International Mathematics and Science Study (TIMSS), TIMSS Advanced, Progress in International Reading Literacy Study (PIRLS), International Civic and Citizenship Study (ICCS), International Computer and Information Literacy Study (ICILS), and Civic Education Study (CivEd).

Maintained by Paul Bailey. Last updated 14 days ago.

2.3 match 10 stars 7.86 score 139 scripts 1 dependent

bnowok

synthpop:Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control

A tool for producing synthetic versions of microdata containing confidential information so that they are safe to be released to users for exploratory analysis. The key objective of generating synthetic data is to replace sensitive original values with synthetic ones causing minimal distortion of the statistical information contained in the data set. Variables, which can be categorical or continuous, are synthesised one-by-one using sequential modelling. Replacements are generated by drawing from conditional distributions fitted to the original data using parametric or classification and regression tree models. Data are synthesised via the function syn() which can be largely automated, if default settings are used, or with methods defined by the user. Optional parameters can be used to influence the disclosure risk and the analytical quality of the synthesised data. For a description of the implemented method see Nowok, Raab and Dibben (2016) <doi:10.18637/jss.v074.i11>.

Maintained by Beata Nowok. Last updated 3 years ago.

2.3 match 44 stars 7.85 score 536 scripts

smtorres

GLMpack:Data and Code to Accompany Generalized Linear Models, 2nd Edition

Contains all the data and functions used in Generalized Linear Models, 2nd edition, by Jeff Gill and Michelle Torres. Examples to create all models, tables, and plots are included for each data set.

Maintained by Michelle Torres. Last updated 6 years ago.

8.8 match 1 star 2.00 score

easystats

report:Automated Reporting of Results and Statistical Models

The aim of the 'report' package is to bridge the gap between R’s output and the formatted results contained in your manuscript. This package converts statistical models and data frames into textual reports suited for publication, ensuring standardization and quality in results reporting.

Maintained by Rémi Thériault. Last updated 1 month ago.

anovas apa automated-report-generation automatic bayesian describe easystats hacktoberfest manuscript models report reporting reports scientific statsmodels

1.2 match 698 stars 14.48 score 1.1k scripts 3 dependents

sujit-sahu

bmstdr:Bayesian Modeling of Spatio-Temporal Data with R

Fits, validates and compares a number of Bayesian models for spatial and spatio-temporal point-referenced and areal unit data. Model fitting is done using several packages: 'rstan', 'INLA', 'spBayes', 'spTimer', 'spTDyn', 'CARBayes' and 'CARBayesST'. Models are compared using DIC, WAIC and K-fold cross-validation, in which the user is free to select their own subset of data rows for validation. Sahu (2022) <doi:10.1201/9780429318443> describes the methods in detail.

Maintained by Sujit K. Sahu. Last updated 1 year ago.

bayesian modelling spatio-temporal-data cpp

3.5 match 15 stars 4.95 score 12 scripts

genentech

psborrow2:Bayesian Dynamic Borrowing Analysis and Simulation

Bayesian dynamic borrowing is an approach to incorporating external data to supplement the analysis of a randomized controlled trial, in which the external data are incorporated dynamically (e.g., based on the similarity of outcomes); see Viele et al. (2013) <doi:10.1002/pst.1589> for an overview. This package implements the hierarchical commensurate prior approach to dynamic borrowing described in Hobbs et al. (2011) <doi:10.1111/j.1541-0420.2011.01564.x>. There are three main functionalities. First, 'psborrow2' provides a user-friendly interface for applying dynamic borrowing to study results and handles the Markov chain Monte Carlo sampling on behalf of the user. Second, 'psborrow2' provides a simulation framework to compare different borrowing parameters (e.g., full borrowing, no borrowing, dynamic borrowing) and other trial and borrowing characteristics (e.g., sample size, covariates) in a unified way. Third, 'psborrow2' provides a set of functions to generate data for simulation studies, and also allows the user to specify their own data generation process. This package is designed to use the sampling functions from 'cmdstanr', which can be installed from <https://stan-dev.r-universe.dev>.

Maintained by Matt Secrest. Last updated 1 month ago.

bayesian-dynamic-borrowing psborrow2 simulation-study

2.2 match 18 stars 7.87 score 16 scripts

demsarjure

autohrf:Automated Generation of Data-Informed GLM Models in Task-Based fMRI Data Analysis

Analysis of task-related functional magnetic resonance imaging (fMRI) activity at the level of individual participants is commonly based on general linear modelling (GLM), which allows us to estimate to what extent the blood oxygenation level dependent (BOLD) signal can be explained by task response predictors specified in the GLM. The predictors are constructed by convolving the hypothesised timecourse of neural activity with an assumed hemodynamic response function (HRF). To get valid and precise estimates of task response, it is important to construct a model of neural activity that best matches actual neuronal activity. The construction of models is most often driven by predefined assumptions about the components of brain activity and their duration, based on the task design and the specific aims of the study. However, our assumptions about the onset and duration of component processes might be wrong and can also differ across brain regions. This can result in inappropriate or suboptimal models, poor fit of the model to the actual data and invalid estimates of brain activity. Here we present an approach in which theoretically driven models of task response are used to define constraints, from which the final model is derived computationally using the actual data. Specifically, we developed 'autohrf', a package for the 'R' programming language that allows for data-driven estimation of HRF models. The package uses genetic algorithms to efficiently search for models that fit the underlying data well: an automated parameter search finds the onsets and durations of task predictors that yield the highest fitness of the resulting GLM, based on the fMRI signal and under predefined restrictions. We evaluate the usefulness of the 'autohrf' package on publicly available datasets of task-related fMRI activity. Our results suggest that by using 'autohrf', users can find better models of task-related brain activity quickly and efficiently.

Maintained by Jure Demšar. Last updated 1 year ago.

3.6 match 2 stars 4.72 score 13 scripts
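
The predictor construction described in the autohrf entry, convolving a hypothesised timecourse of neural activity with an assumed HRF, can be sketched in Python as follows. This is not autohrf's code: the SPM-style double-gamma HRF (gamma shapes 6 and 16, undershoot scaled by 1/6), the 1 s repetition time and the onset and duration values are illustrative assumptions, and the function names are hypothetical. The genetic-algorithm search over onsets and durations, which is the package's actual contribution, is not shown.

```python
import math


def gamma_pdf(t, shape):
    """Gamma density with unit scale, evaluated at t >= 0."""
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def double_gamma_hrf(t):
    """Assumed SPM-style double-gamma HRF: a peak term minus a scaled undershoot term."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0


def task_predictor(n_scans, onset, duration, tr=1.0, hrf_length=32.0):
    """Convolve a boxcar of hypothesised neural activity with the HRF."""
    # Boxcar: 1 while the hypothesised component process is active, 0 otherwise.
    boxcar = [1.0 if onset <= i * tr < onset + duration else 0.0 for i in range(n_scans)]
    kernel = [double_gamma_hrf(k * tr) for k in range(int(hrf_length / tr) + 1)]
    # Discrete causal convolution, truncated to the length of the scan series.
    predictor = []
    for i in range(n_scans):
        total = 0.0
        for k, h in enumerate(kernel):
            if i - k < 0:
                break
            total += boxcar[i - k] * h
        predictor.append(total)
    return predictor


pred = task_predictor(n_scans=60, onset=10.0, duration=5.0)
peak = max(range(len(pred)), key=pred.__getitem__)
print(peak)  # the modelled BOLD response peaks several seconds after the onset
```

Shifting `onset` or changing `duration` moves and reshapes this predictor, which is exactly the kind of parameter the entry says autohrf tunes against the measured signal.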

tlverse

sl3:Pipelines for Machine Learning and Super Learning

A modern implementation of the Super Learner prediction algorithm, coupled with a general purpose framework for composing arbitrary pipelines for machine learning tasks.

Maintained by Jeremy Coyle. Last updated 4 months ago.

data-science ensemble-learning ensemble-model machine-learning model-selection regression stacking statistics

1.7 match 100 stars 9.94 score 748 scripts 7 dependents

singmann

afex:Analysis of Factorial Experiments

Convenience functions for analyzing factorial experiments using ANOVA or mixed models. aov_ez(), aov_car(), and aov_4() allow specification of between, within (i.e., repeated-measures), or mixed (i.e., split-plot) ANOVAs for data in long format (i.e., one observation per row), automatically aggregating multiple observations per individual and cell of the design. mixed() fits mixed models using lme4::lmer() and computes p-values for all fixed effects using either Kenward-Roger or Satterthwaite approximation for degrees of freedom (LMM only), parametric bootstrap (LMMs and GLMMs), or likelihood ratio tests (LMMs and GLMMs). afex_plot() provides a high-level interface for interaction or one-way plots using ggplot2, combining raw data and model estimates. afex uses type 3 sums of squares as default (imitating commercial statistical software).

Maintained by Henrik Singmann. Last updated 7 months ago.

1.2 match 123 stars 14.50 score 1.4k scripts 15 dependents

juhkim111

MGLM:Multivariate Response Generalized Linear Models

Provides functions that (1) fit multivariate discrete distributions, (2) generate random numbers from multivariate discrete distributions, and (3) run regression and penalized regression on multivariate categorical response data. Implemented models include the multinomial logit model, Dirichlet multinomial model, generalized Dirichlet multinomial model, and negative multinomial model. Combining the minorization-maximization (MM) algorithm with the Newton-Raphson method, we derive and implement stable and efficient algorithms for finding the maximum likelihood estimates. Multi-threading is supported on multi-core machines.

Maintained by Juhyun Kim. Last updated 3 years ago.

3.6 match 4 stars 4.65 score 53 scripts 1 dependent

maebruck

chantrics:Loglikelihood Adjustments for Econometric Models

Adjusts the loglikelihood of common econometric models for clustered data based on the estimation process suggested in Chandler and Bate (2007) <doi:10.1093/biomet/asm015>, using the 'chandwich' package <https://cran.r-project.org/package=chandwich>, and provides convenience functions for inference on the adjusted models.

Maintained by Theo Bruckbauer. Last updated 3 years ago.

clustering econometrics likelihood likelihood-ratio-test loglikelihood-adjustment maximum-likelihood

4.5 match 3.70 score 4 scripts

vaudigier

micemd:Multiple Imputation by Chained Equations with Multilevel Data

Add-ons for the 'mice' package to perform multiple imputation using chained equations with two-level data. Includes imputation methods dedicated to sporadically and systematically missing values. Imputation of continuous, binary or count variables is available. Following the recommendations of Audigier et al. (2018) <doi:10.1214/18-STS646>, the choice of the imputation method for each variable can be facilitated by a default choice tuned according to the structure of the incomplete dataset. Allows parallel calculation and overimputation for 'mice'.

Maintained by Vincent Audigier. Last updated 1 year ago.

5.4 match 1 star 3.08 score 80 scripts 1 dependent

evolecolgroup

tidysdm:Species Distribution Models with Tidymodels

Fit species distribution models (SDMs) using the 'tidymodels' framework, which provides a standardised interface to define models and process their outputs. 'tidysdm' expands 'tidymodels' by providing methods for spatial objects, models and metrics specific to SDMs, as well as a number of specialised functions to process occurrences for contemporary and palaeo datasets. The full functionalities of the package are described in Leonardi et al. (2023) <doi:10.1101/2023.07.24.550358>.

Maintained by Andrea Manica. Last updated 8 days ago.

species-distribution-modelling tidymodels

1.9 match 31 stars 8.82 score 51 scripts

wjbraun

DAAG:Data Analysis and Graphics Data and Functions

Functions and data sets used in examples and exercises in the text Maindonald, J.H. and Braun, W.J. (2003, 2007, 2010) "Data Analysis and Graphics Using R", and in an upcoming Maindonald, Braun, and Andrews text that builds on this earlier text.

Maintained by W. John Braun. Last updated 11 months ago.

2.0 match 8.25 score 1.2k scripts 1 dependent

cran

qountstat:Statistical Analysis of Count Data and Quantal Data

Methods for statistical analysis of count data and quantal data. For the analysis of count data an implementation of the Closure Principle Computational Approach Test ("CPCAT") is provided (Lehmann, R et al. (2016) <doi:10.1007/s00477-015-1079-4>), as well as an implementation of a "Dunnett GLM" approach using a Quasi-Poisson regression (Hothorn, L, Kluxen, F (2020) <doi:10.1101/2020.01.15.907881>). For the analysis of quantal data an implementation of the Closure Principle Fisher–Freeman–Halton test ("CPFISH") is provided (Lehmann, R et al. (2018) <doi:10.1007/s00477-017-1392-1>). P-values and no/lowest observed (adverse) effect concentration values are calculated. All implemented methods include further functions to evaluate the power and the minimum detectable difference using a bootstrapping approach.

Maintained by Benjamin Daniels. Last updated 22 days ago.

16.2 match 1.00 score

opisthokonta

chainbinomial:Chain Binomial Models for Analysis of Infectious Disease Data

Implements the chain binomial model for analysis of infectious disease data. Contains functions for calculating probabilities of the final size of infectious disease outbreaks using the method from D. Ludwig (1975) <doi:10.1016/0025-5564(75)90119-4> and for outbreaks that are not concluded, from Lindstrøm et al. (2024) <doi:10.48550/arXiv.2403.03948>. The package also contains methods for estimation and regression analysis of secondary attack rates.

Maintained by Jonas Christoffer Lindstrøm. Last updated 2 months ago.

3.1 match 5.15 score 5 scripts
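
To make the chain binomial idea concrete: in such a model, each generation's new cases are binomially distributed among the susceptibles who remain. The Python sketch below assumes the Reed-Frost form, in which a susceptible escapes infection from each of the i current infectives independently with probability 1 - p, and computes the final-size distribution by brute-force recursion over generations. This enumeration is a swapped-in illustration for small outbreaks, not the Ludwig (1975) method the chainbinomial package uses, and the function names are hypothetical.

```python
import math
from functools import lru_cache


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def final_size_distribution(s0, i0, p):
    """P(k of the s0 susceptibles are eventually infected), k = 0..s0, under an
    assumed Reed-Frost chain binomial model with per-contact infection probability p."""
    q = 1.0 - p

    @lru_cache(maxsize=None)
    def dist(s, i):
        # Entry k = P(k further infections | s susceptibles, i current infectives).
        if i == 0 or s == 0:
            return (1.0,) + (0.0,) * s
        out = [0.0] * (s + 1)
        # Each susceptible escapes all i infectives with probability q**i.
        p_inf = 1.0 - q ** i
        for j in range(s + 1):
            pj = binom_pmf(j, s, p_inf)
            # The j newly infected become the infectives of the next generation.
            for k, pk in enumerate(dist(s - j, j)):
                out[j + k] += pj * pk
        return tuple(out)

    return list(dist(s0, i0))


print(final_size_distribution(2, 1, 0.5))  # [0.25, 0.25, 0.5]
```

For a household with one primary case and two susceptibles, the value for two infections (0.5) combines the chains 1 → 2 and 1 → 1 → 1.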

pbiecek

breakDown:Model Agnostic Explainers for Individual Predictions

Model-agnostic tool for decomposing predictions from black-box models. The Break Down Table shows the contribution of every variable to a final prediction. The Break Down Plot presents variable contributions in a concise graphical way. This package works for binary classifiers and general regression models.

Maintained by Przemyslaw Biecek. Last updated 1 year ago.

data-science iml interpretability machine-learning visual-explanations xai

1.8 match 103 stars 8.90 score 91 scripts 2 dependents

eheinzen

elo:Ranking Teams by Elo Rating and Comparable Methods

A flexible framework for calculating Elo ratings and resulting rankings of any two-team-per-matchup system (chess, sports leagues, 'Go', etc.). This implementation is capable of evaluating a variety of matchups, Elo rating updates, and win probabilities, all based on the basic Elo rating system. It also includes methods to benchmark performance, including logistic regression and Markov chain models.

Maintained by Ethan Heinzen. Last updated 1 year ago.

elo elo-rating logistic-regression markov-chain markov-model ranking sports-analytics cpp

2.3 match 37 stars 7.05 score 153 scripts
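
For readers unfamiliar with the system the elo entry builds on, the standard Elo win-probability and rating-update formulas can be sketched in Python as below. The 400-point logistic scale and K-factor of 20 are conventional choices assumed here, and the function names are illustrative; they are not the elo package's R API.

```python
def expected_score(r_a, r_b):
    """Win probability of A over B under the basic Elo model (400-point logistic scale)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, score_a, k=20.0):
    """Return updated (r_a, r_b) after one match; score_a is 1 (win), 0.5 (draw) or 0 (loss)."""
    e_a = expected_score(r_a, r_b)
    # A gains exactly what B loses, so the rating pool is conserved.
    delta = k * (score_a - e_a)
    return r_a + delta, r_b - delta


print(expected_score(1500, 1500))   # 0.5
print(elo_update(1500, 1500, 1.0))  # (1510.0, 1490.0)
```

An upset (a low-rated team beating a high-rated one) produces a larger `delta` than an expected win, which is what lets ratings converge towards relative strength over many matchups.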

edoardocostantini

gspcr:Generalized Supervised Principal Component Regression

Generalization of supervised principal component regression (SPCR; Bair et al., 2006, <doi:10.1198/016214505000000628>) to support continuous, binary, and discrete variables as outcomes and predictors (inspired by the 'superpc' R package <https://cran.r-project.org/package=superpc>).

Maintained by Edoardo Costantini. Last updated 12 months ago.

3.8 match 1 star 4.18 score 10 scripts

cran

HiddenMarkov:Hidden Markov Models

Contains functions for the analysis of Discrete Time Hidden Markov Models, Markov Modulated GLMs and the Markov Modulated Poisson Process. It includes functions for simulation, parameter estimation, and the Viterbi algorithm. See the topic "HiddenMarkov" for an introduction to the package, and "Change Log" for a list of recent changes. The algorithms are based on those of Walter Zucchini.

Maintained by David Harte. Last updated 2 months ago.

fortran

4.1 match 3.79 score 59 scripts 3 dependents

koalaverse

sure:Surrogate Residuals for Ordinal and General Regression Models

An implementation of the surrogate approach to residuals and diagnostics for ordinal and general regression models; for details, see Liu and Zhang (2017, <doi:10.1080/01621459.2017.1292915>) and Greenwell et al. (2017, <https://journal.r-project.org/archive/2018/RJ-2018-004/index.html>). These residuals can be used to construct standard residual plots for model diagnostics (e.g., residual-vs-fitted value plots, residual-vs-covariate plots, Q-Q plots, etc.). The package also provides an 'autoplot' function for producing standard diagnostic plots using 'ggplot2' graphics. The package currently supports cumulative link models from packages 'MASS', 'ordinal', 'rms', and 'VGAM'. Support for binary regression models using the standard 'glm' function is also available.

Maintained by Brandon Greenwell. Last updated 12 days ago.

categorical-data diagnostics ordinal-regression residuals

2.8 match 9 stars 5.58 score 47 scripts 1 dependent

isglobal-brge

SNPassoc:SNPs-Based Whole Genome Association Studies

Implements functions to perform most of the common analyses in genome association studies. These analyses include descriptive statistics and exploratory analysis of missing values, calculation of Hardy-Weinberg equilibrium, analysis of association based on generalized linear models (for either quantitative or binary traits), and analysis of multiple SNPs (haplotype and epistasis analysis). A permutation test and related tests (sum statistic and truncated product) are also implemented. Exact distributions of the max-statistic and of the genetic risk-allele score can also be estimated. The methods are described in Gonzalez JR et al., 2007 <doi:10.1093/bioinformatics/btm025>.

Maintained by Dolors Pelegri. Last updated 5 months ago.

1.7 match 16 stars 9.14 score 89 scripts 6 dependents

richarddmorey

BayesFactor:Computation of Bayes Factors for Common Designs

A suite of functions for computing various Bayes factors for simple designs, including contingency tables, one- and two-sample designs, one-way designs, general ANOVA designs, and linear regression.

Maintained by Richard D. Morey. Last updated 1 year ago.

cpp

1.1 match 133 stars 13.70 score 1.7k scripts 21 dependents

arvsjo

stdReg:Regression Standardization

Contains functionality for regression standardization. Four general classes of models are allowed: generalized linear models, conditional generalized estimating equation models, Cox proportional hazards models and shared frailty gamma-Weibull models. See Sjolander, A. (2016) <doi:10.1007/s10654-016-0157-3>.

Maintained by Arvid Sjolander. Last updated 4 years ago.

5.3 match 2.80 score 53 scripts 1 dependent

brian-j-smith

MachineShop:Machine Learning Models and Tools

Meta-package for statistical and machine learning with a unified interface for model fitting, prediction, performance assessment, and presentation of results. Approaches for model fitting and prediction of numerical, categorical, or censored time-to-event outcomes include traditional regression models, regularization methods, tree-based methods, support vector machines, neural networks, ensembles, data preprocessing, filtering, and model tuning and selection. Performance metrics are provided for model assessment and can be estimated with independent test sets, split sampling, cross-validation, or bootstrap resampling. Resample estimation can be executed in parallel for faster processing and nested in cases of model tuning and selection. Modeling results can be summarized with descriptive statistics; calibration curves; variable importance; partial dependence plots; confusion matrices; and ROC, lift, and other performance curves.

Maintained by Brian J Smith. Last updated 7 months ago.

classification-models machine-learning predictive-modeling regression-models survival-models

1.9 match 61 stars 7.95 score 121 scripts

luca-scr

dispmod:Modelling Dispersion in GLM

Functions for estimating Gaussian dispersion regression models (Aitkin, 1987 <doi:10.2307/2347792>), overdispersed binomial logit models (Williams, 1987 <doi:10.2307/2347977>), and overdispersed Poisson log-linear models (Breslow, 1984 <doi:10.2307/2347661>), using a quasi-likelihood approach.

Maintained by Luca Scrucca. Last updated 7 years ago.

7.3 match 2.02 score 21 scripts

ipums

ipumsr:An R Interface for Downloading, Reading, and Handling IPUMS Data

An easy way to work with census, survey, and geographic data provided by IPUMS in R. Generate and download data through the IPUMS API and load IPUMS files into R with their associated metadata to make analysis easier. IPUMS data describing 1.4 billion individuals drawn from over 750 censuses and surveys is available free of charge from the IPUMS website <https://www.ipums.org>.

Maintained by Derek Burk. Last updated 17 days ago.

1.3 match 28 stars 11.07 score 720 scripts 2 dependents

ikosmidis

detectseparation:Detect and Check for Separation and Infinite Maximum Likelihood Estimates

Provides pre-fit and post-fit methods for detecting separation and infinite maximum likelihood estimates in generalized linear models with categorical responses. The pre-fit methods apply to binomial-response generalized linear models such as logit, probit and cloglog regression, and can be directly supplied as fitting methods to the glm() function. They solve the linear programming problems for the detection of separation developed in Konis (2007, <https://ora.ox.ac.uk/objects/uuid:8f9ee0d0-d78e-4101-9ab4-f9cbceed2a2a>) using 'ROI' <https://cran.r-project.org/package=ROI> or 'lpSolveAPI' <https://cran.r-project.org/package=lpSolveAPI>. The post-fit methods apply to models with categorical responses, including binomial-response generalized linear models and multinomial-response models, such as baseline category logits and adjacent category logits models; for example, the models implemented in the 'brglm2' <https://cran.r-project.org/package=brglm2> package. The post-fit methods successively refit the model with an increasing number of iteratively reweighted least squares iterations, and monitor the ratio of the estimated standard error for each parameter to its value in the first iteration. According to the results in Lesaffre & Albert (1989, <https://www.jstor.org/stable/2345845>), divergence of those ratios indicates data separation.

Maintained by Ioannis Kosmidis. Last updated 3 years ago.

2.3 match 6 stars 6.52 score 23 scripts 3 dependents
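
The post-fit idea in the detectseparation entry can be illustrated with a small, self-contained Python sketch: fit a one-predictor logistic regression with 1, 2, ..., 10 IRLS (Newton) iterations and track each standard error relative to its value after the first iteration. This is a hand-written illustration under simplifying assumptions (a single covariate, zero starting values, standard errors taken from the Fisher information at the current iterate), not detectseparation's implementation, and all function names are hypothetical.

```python
import math


def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def fisher_information(x, b0, b1):
    """Fitted probabilities and entries (i00, i01, i11) of the 2x2 Fisher information."""
    p = [sigmoid(b0 + b1 * xi) for xi in x]
    w = [pi * (1.0 - pi) for pi in p]
    i00 = sum(w)
    i01 = sum(wi * xi for wi, xi in zip(w, x))
    i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    return p, i00, i01, i11


def irls_standard_errors(x, y, n_iter):
    """Run n_iter Newton/IRLS steps for logit P(y=1) = b0 + b1*x from b = 0 and
    return (se_b0, se_b1) at the final iterate."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        p, i00, i01, i11 = fisher_information(x, b0, b1)
        # Score vector of the log-likelihood.
        u0 = sum(yi - pi for yi, pi in zip(y, p))
        u1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        det = i00 * i11 - i01 * i01
        # Newton step: b <- b + I^{-1} u, using the closed-form 2x2 inverse.
        b0, b1 = b0 + (i11 * u0 - i01 * u1) / det, b1 + (i00 * u1 - i01 * u0) / det
    _, i00, i01, i11 = fisher_information(x, b0, b1)
    det = i00 * i11 - i01 * i01
    return math.sqrt(i11 / det), math.sqrt(i00 / det)


def se_ratios(x, y, max_iter=10):
    """Ratio of each standard error after k iterations to its value after 1 iteration."""
    se1 = irls_standard_errors(x, y, 1)
    return [tuple(se_k / se_first for se_k, se_first in zip(irls_standard_errors(x, y, k), se1))
            for k in range(1, max_iter + 1)]


x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
separated = [0, 0, 0, 1, 1, 1]    # y = 1 exactly when x > 0
overlapping = [0, 1, 0, 1, 0, 1]  # classes overlap, so the MLE is finite

print(se_ratios(x, separated)[-1])    # ratios keep growing with the iteration count
print(se_ratios(x, overlapping)[-1])  # ratios settle near 1
```

With separated data the slope estimate drifts towards infinity, the fitted probabilities approach 0 and 1, the weights vanish and the standard errors blow up; with overlapping data the iterations converge and the ratios stabilise, which is the contrast the Lesaffre & Albert criterion relies on.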