R-universe search: factorization

revelle

psych:Procedures for Psychological, Psychometric, and Personality Research

A general purpose toolbox developed originally for personality, psychometric theory and experimental psychology. Functions are primarily for multivariate analysis and scale construction using factor analysis, principal component analysis, cluster analysis and reliability analysis, although others provide basic descriptive statistics. Item Response Theory is done using factor analysis of tetrachoric and polychoric correlations. Functions for analyzing data at multiple levels include within and between group statistics, including correlations and factor analysis. Validation and cross validation of scales developed using basic machine learning algorithms are provided, as are functions for simulating and testing particular item and test structures. Several functions serve as a useful front end for structural equation modeling. Graphical displays of path diagrams, including mediation models, factor analysis and structural equation models are created using basic graphics. Some of the functions are written to support a book on psychometric theory as well as publications in personality research. For more information, see the <https://personality-project.org/r/> web page.

Maintained by William Revelle. Last updated 3 months ago.

96.5 match 52 stars 13.94 score 29k scripts 317 dependents

tidyverse

forcats:Tools for Working with Categorical Variables (Factors)

Helpers for reordering factor levels (including moving specified levels to front, ordering by first appearance, reversing, and randomly shuffling), and tools for modifying factor levels (including collapsing rare levels into other, 'anonymising', and manually 'recoding').

Maintained by Hadley Wickham. Last updated 1 year ago.

factor tidyverse

56.6 match 555 stars 18.77 score 21k scripts 1.2k dependents
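
The level operations named in the forcats description (ordering by first appearance, moving levels to the front, collapsing rare levels into "other") can be illustrated with a minimal sketch. It is written in plain Python rather than R and does not use or reproduce the forcats API; the function names `levels_by_first_appearance`, `move_to_front` and `lump_rare` are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def levels_by_first_appearance(values):
    """Return the distinct values in the order they first occur."""
    return list(dict.fromkeys(values))


def move_to_front(levels, *front):
    """Move the named levels to the front, keeping the rest in order."""
    front = [lv for lv in front if lv in levels]
    return front + [lv for lv in levels if lv not in front]


def lump_rare(values, min_count, other="Other"):
    """Replace values seen fewer than min_count times with `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Hypothetical example data.
colours = ["red", "blue", "red", "green", "blue", "red", "teal"]
levels = levels_by_first_appearance(colours)
reordered = move_to_front(levels, "green")
lumped = lump_rare(colours, min_count=2)
```

Here a "factor" is modelled as a list of values plus an ordered list of levels, which is enough to show the idea but ignores the integer coding that R factors use internally.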

r-forge

Matrix:Sparse and Dense Matrix Classes and Methods

A rich hierarchy of sparse and dense matrix classes, including general, symmetric, triangular, and diagonal matrices with numeric, logical, or pattern entries. Efficient methods for operating on such matrices, often wrapping the 'BLAS', 'LAPACK', and 'SuiteSparse' libraries.

Maintained by Martin Maechler. Last updated 5 days ago.

openblas

42.0 match 1 star 17.23 score 33k scripts 12k dependents

bioc

MOFA2:Multi-Omics Factor Analysis v2

The MOFA2 package contains a collection of tools for training and analysing multi-omic factor analysis (MOFA). MOFA is a probabilistic factor model that aims to identify principal axes of variation from data sets that can comprise multiple omic layers and/or groups of samples. Additional time or space information on the samples can be incorporated using the MEFISTO framework, which is part of MOFA2. Downstream analysis functions to inspect molecular features underlying each factor, visualisation, imputation, etc. are available.

Maintained by Ricard Argelaguet. Last updated 5 months ago.

dimensionreduction bayesian visualization factor-analysis mofa multi-omics

59.9 match 319 stars 10.02 score 502 scripts

richarddmorey

BayesFactor:Computation of Bayes Factors for Common Designs

A suite of functions for computing various Bayes factors for simple designs, including contingency tables, one- and two-sample designs, one-way designs, general ANOVA designs, and linear regression.

Maintained by Richard D. Morey. Last updated 1 year ago.

cpp

36.6 match 133 stars 13.70 score 1.7k scripts 21 dependents

atmoschem

vein:Vehicular Emissions Inventories

Elaboration of vehicular emissions inventories, consisting of four stages: pre-processing activity data, preparing emissions factors, estimating the emissions, and post-processing of emissions in maps and databases. More details in Ibarra-Espinosa et al (2018) <doi:10.5194/gmd-11-2209-2018>. Before using VEIN you need to know the vehicular composition of your study area, in other words, the combination of vehicle types, sizes and fuels in the fleet. Then, it is recommended to start by downloading a project template that creates a structure of directories and scripts.

Maintained by Sergio Ibarra-Espinosa. Last updated 1 month ago.

atmoschem atmospheric-chemistry atmospheric-science atmospheric-sciences emissions emissions-model vehicular-emissions-inventories vein fortran openmp

57.4 match 46 stars 8.65 score 137 scripts

guokai8

fctutils:Advanced Factor Manipulation Utilities

Provides a collection of utility functions for manipulating and analyzing factor vectors in R. It offers tools for filtering, splitting, combining, and reordering factor levels based on various criteria. The package is designed to simplify common tasks in categorical data analysis, making it easier to work with factors in a flexible and efficient manner.

Maintained by Kai Guo. Last updated 5 months ago.

103.8 match 2 stars 4.60 score 4 scripts

andrisignorell

DescTools:Tools for Descriptive Statistics

A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox that facilitates the (notoriously time-consuming) first descriptive tasks in data analysis: calculating descriptive statistics, drawing graphical summaries and reporting the results. The package furthermore contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and argument naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in the absence of convincing alternatives). The 'BigCamelCase' style was consequently applied to functions borrowed from contributed R packages as well.

Maintained by Andri Signorell. Last updated 9 days ago.

fortran cpp

26.7 match 87 stars 16.68 score 7.7k scripts 99 dependents

briencj

dae:Functions Useful in the Design and ANOVA of Experiments

The content falls into the following groupings: (i) Data, (ii) Factor manipulation functions, (iii) Design functions, (iv) ANOVA functions, (v) Matrix functions, (vi) Projector and canonical efficiency functions, and (vii) Miscellaneous functions. A vignette called 'DesignNotes' describes how to use the design functions for randomizing and assessing designs. The ANOVA functions facilitate the extraction of information when the 'Error' function has been used in the call to 'aov'. The package 'dae' can also be installed from <http://chris.brien.name/rpackages/>.

Maintained by Chris Brien. Last updated 3 months ago.

49.5 match 1 star 8.62 score 356 scripts 7 dependents

tidymodels

recipes:Preprocessing and Feature Engineering Steps for Modeling

A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps that provide preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.

Maintained by Max Kuhn. Last updated 4 days ago.

21.8 match 584 stars 18.71 score 7.2k scripts 380 dependents

bioc

consensusSeekeR:Detection of consensus regions inside a group of experiences using genomic positions and genomic ranges

This package compares genomic positions and genomic ranges from multiple experiments to extract common regions. The size of the analyzed region is adjustable, as is the number of experiments in which a feature must be present in a potential region to tag this region as a consensus region. In genomic analysis where feature identification generates a position value surrounded by a genomic range, such as ChIP-Seq peaks and nucleosome positions, the replication of an experiment may result in slight differences between predicted values. This package enables the reconciliation of the results into consensus regions.

Maintained by Astrid Deschênes. Last updated 5 months ago.

biologicalquestion chipseq genetics multiplecomparison transcription peakdetection sequencing coverage chip-seq-analysis genomic-data-analysis nucleosome-positioning

73.0 match 1 star 5.26 score 5 scripts 1 dependent

willwerscheid

flashier:Empirical Bayes Matrix Factorization

Methods for matrix factorization based on Wang and Stephens (2021) <https://jmlr.org/papers/v22/20-589.html>.

Maintained by Jason Willwerscheid. Last updated 2 months ago.

44.7 match 11 stars 8.32 score 266 scripts

husson

FactoMineR:Multivariate Exploratory Data Analysis and Data Mining

Exploratory data analysis methods to summarize, visualize and describe datasets. The main principal component methods are available, those with the largest potential in terms of applications: principal component analysis (PCA) when variables are quantitative, correspondence analysis (CA) and multiple correspondence analysis (MCA) when variables are categorical, Multiple Factor Analysis when variables are structured in groups, etc. and hierarchical cluster analysis. F. Husson, S. Le and J. Pages (2017).

Maintained by Francois Husson. Last updated 3 months ago.

25.2 match 47 stars 14.71 score 5.6k scripts 112 dependents

easystats

bayestestR:Understand and Describe Bayesian Models and Posterior Distributions

Provides utilities to describe posterior distributions and Bayesian models. It includes point-estimates such as Maximum A Posteriori (MAP), measures of dispersion (Highest Density Interval - HDI; Kruschke, 2015 <doi:10.1016/C2012-0-00477-2>) and indices used for null-hypothesis testing (such as ROPE percentage, pd and Bayes factors). References: Makowski et al. (2021) <doi:10.21105/joss.01541>.

Maintained by Dominique Makowski. Last updated 11 days ago.

bayes-factors bayesfactor bayesian bayesian-framework credible-interval easystats hacktoberfest hdi map posterior-distributions rope

20.5 match 579 stars 16.82 score 2.2k scripts 82 dependents

welch-lab

rliger:Linked Inference of Genomic Experimental Relationships

Uses an extension of nonnegative matrix factorization to identify shared and dataset-specific factors. See Welch J, Kozareva V, et al (2019) <doi:10.1016/j.cell.2019.05.006>, and Liu J, Gao C, Sodicoff J, et al (2020) <doi:10.1038/s41596-020-0391-8> for more details.

Maintained by Yichen Wang. Last updated 2 months ago.

nonnegative-matrix-factorization single-cell openblas cpp

30.6 match 402 stars 10.80 score 334 scripts 1 dependent

adeverse

ade4:Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences

Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.

Maintained by Aurélie Siberchicot. Last updated 11 days ago.

openblas cpp

21.8 match 39 stars 14.96 score 2.2k scripts 256 dependents

rvlenth

emmeans:Estimated Marginal Means, aka Least-Squares Means

Obtain estimated marginal means (EMMs) for many linear, generalized linear, and mixed models. Compute contrasts or linear functions of EMMs, trends, and comparisons of slopes. Plots and other displays. Least-squares means are discussed, and the term "estimated marginal means" is suggested, in Searle, Speed, and Milliken (1980) Population marginal means in the linear model: An alternative to least squares means, The American Statistician 34(4), 216-221 <doi:10.1080/00031305.1980.10483031>.

Maintained by Russell V. Lenth. Last updated 2 days ago.

16.6 match 377 stars 19.19 score 13k scripts 187 dependents

egenn

rtemis:Machine Learning and Visualization

Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.

Maintained by E.D. Gennatas. Last updated 1 month ago.

data-science data-visualization machine-learning machine-learning-library visualization

39.5 match 145 stars 7.09 score 50 scripts 2 dependents

ipeagit

gtfs2emis:Estimating Public Transport Emissions from General Transit Feed Specification (GTFS) Data

A bottom up model to estimate the emission levels of public transport systems based on General Transit Feed Specification (GTFS) data. The package requires two main inputs: i) Public transport data in the GTFS standard format; and ii) Some basic information on fleet characteristics such as fleet age, technology, fuel and Euro stage. As it stands, the package estimates several pollutants at high spatial and temporal resolutions. Pollution levels can be calculated for specific transport routes, trips, time of the day or for the transport system as a whole. The output with emission estimates can be extracted in different formats, supporting analysis on how emission levels vary across space, time and by fleet characteristics. A full description of the methods used in the 'gtfs2emis' model is presented in Vieira, J. P. B.; Pereira, R. H. M.; Andrade, P. R. (2022) <doi:10.31219/osf.io/8m2cy>.

Maintained by Joao Bazzo. Last updated 2 months ago.

emissions environmental-modelling gtfs public-transport rspatial transport

35.3 match 28 stars 7.47 score 29 scripts

jwood000

RcppAlgos:High Performance Tools for Combinatorics and Computational Mathematics

Provides optimized functions and flexible iterators implemented in C++ for solving problems in combinatorics and computational mathematics. Handles various combinatorial objects including combinations, permutations, integer partitions and compositions, Cartesian products, unordered Cartesian products, and partition of groups. Utilizes the RMatrix class from 'RcppParallel' for thread safety. The combination and permutation functions contain constraint parameters that allow for generation of all results of a vector meeting specific criteria (e.g. finding all combinations such that the sum is between two bounds). Capable of ranking/unranking combinatorial objects efficiently (e.g. retrieve only the nth lexicographical result) which sets up nicely for parallelization as well as random sampling. Gmp support permits exploration where the total number of results is large (e.g. comboSample(10000, 500, n = 4)). Additionally, there are several high performance number theoretic functions that are useful for problems common in computational mathematics. Some of these functions make use of the fast integer division library 'libdivide'. The primeSieve function is based on the segmented sieve of Eratosthenes implementation by Kim Walisch. It is also efficient for large numbers by using the cache friendly improvements originally developed by Tomás Oliveira. Finally, there is a prime counting function that implements Legendre's formula based on the work of Kim Walisch.

Maintained by Joseph Wood. Last updated 1 month ago.

combinations combinatorics factorization number-theory parallel permutation prime-factorizations primesieve gmp cpp

25.1 match 45 stars 10.04 score 153 scripts 12 dependents
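
The RcppAlgos description mentions a segmented sieve of Eratosthenes. The following is a minimal sketch of that general technique in plain Python, assuming nothing about the package's C++ implementation or about Kim Walisch's code; the function names `simple_sieve` and `segmented_sieve` and the default segment size are hypothetical choices for illustration.

```python
from math import isqrt


def simple_sieve(limit):
    """Return all primes <= limit using the classic sieve."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if is_prime[p]:
            # Cross out p*p, p*p + p, ... in one slice assignment.
            count = len(range(p * p, limit + 1, p))
            is_prime[p * p::p] = bytearray(count)
    return [i for i, flag in enumerate(is_prime) if flag]


def segmented_sieve(lo, hi, segment_size=32768):
    """Return the primes in [lo, hi], sieving one segment at a time."""
    lo = max(lo, 2)
    if hi < lo:
        return []
    # Only primes up to sqrt(hi) are needed to cross out composites.
    base_primes = simple_sieve(isqrt(hi))
    primes = []
    for start in range(lo, hi + 1, segment_size):
        end = min(start + segment_size - 1, hi)
        flags = bytearray([1]) * (end - start + 1)
        for p in base_primes:
            if p * p > end:
                break
            # First multiple of p inside the segment, but never p itself.
            first = max(p * p, ((start + p - 1) // p) * p)
            for multiple in range(first, end + 1, p):
                flags[multiple - start] = 0
        primes.extend(start + i for i, flag in enumerate(flags) if flag)
    return primes
```

Working on fixed-size segments keeps the flag array small enough to stay in cache, which is the motivation for the cache-friendly variants the description refers to; this sketch shows only the segmentation, not those optimizations.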

zdebruine

RcppML:Rcpp Machine Learning Library

Fast machine learning algorithms including matrix factorization and divisive clustering for large sparse and dense matrices.

Maintained by Zach DeBruine. Last updated 2 years ago.

clustering matrix-factorization nmf rcpp rcppeigen sparse-matrix cpp openmp

23.3 match 104 stars 10.53 score 125 scripts 46 dependents

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 year ago.

28.2 match 3 stars 8.13 score 7.8k scripts 11 dependents

nwaller

fungible:Psychometric Functions from the Waller Lab

Computes fungible coefficients and Monte Carlo data. Underlying theory for these functions is described in the following publications: Waller, N. (2008). Fungible Weights in Multiple Regression. Psychometrika, 73(4), 691-703, <DOI:10.1007/s11336-008-9066-z>. Waller, N. & Jones, J. (2009). Locating the Extrema of Fungible Regression Weights. Psychometrika, 74(4), 589-602, <DOI:10.1007/s11336-008-9087-7>. Waller, N. G. (2016). Fungible Correlation Matrices: A Method for Generating Nonsingular, Singular, and Improper Correlation Matrices for Monte Carlo Research. Multivariate Behavioral Research, 51(4), 554-568. Jones, J. A. & Waller, N. G. (2015). The normal-theory and asymptotic distribution-free (ADF) covariance matrix of standardized regression coefficients: theoretical extensions and finite sample behavior. Psychometrika, 80, 365-378, <DOI:10.1007/s11336-013-9380-y>. Waller, N. G. (2018). Direct Schmid-Leiman transformations and rank-deficient loadings matrices. Psychometrika, 83, 858-870. <DOI:10.1007/s11336-017-9599-0>.

Maintained by Niels Waller. Last updated 1 year ago.

43.4 match 5.01 score 136 scripts 8 dependents

haeran-cho

fnets:Factor-Adjusted Network Estimation and Forecasting for High-Dimensional Time Series

Implements methods for network estimation and forecasting of high-dimensional time series exhibiting strong serial and cross-sectional correlations under a factor-adjusted vector autoregressive model. See Barigozzi, Cho and Owens (2024) <doi:10.1080/07350015.2023.2257270> for further descriptions of FNETS methodology and Owens, Cho and Barigozzi (2024) <arXiv:2301.11675> accompanying the R package.

Maintained by Haeran Cho. Last updated 4 months ago.

factor-models forecasting high-dimensional network-estimation time-series vector-autoregression cpp

40.8 match 7 stars 5.33 score 28 scripts

debruine

faux:Simulation for Factorial Designs

Create datasets with factorial structure through simulation by specifying variable parameters. Extended documentation at <https://debruine.github.io/faux/>. Described in DeBruine (2020) <doi:10.5281/zenodo.2669586>.

Maintained by Lisa DeBruine. Last updated 2 months ago.

data simulation

22.9 match 98 stars 9.35 score 716 scripts 1 dependent

david-cortes

cmfrec:Collective Matrix Factorization for Recommender Systems

Collective matrix factorization (a.k.a. multi-view or multi-way factorization, Singh, Gordon, (2008) <doi:10.1145/1401890.1401969>) tries to approximate a (potentially very sparse or having many missing values) matrix 'X' as the product of two low-dimensional matrices, optionally aided with secondary information matrices about rows and/or columns of 'X', which are also factorized using the same latent components. The intended usage is for recommender systems, dimensionality reduction, and missing value imputation. Implements extensions of the original model (Cortes, (2018) <arXiv:1809.00366>) and can produce different factorizations such as the weighted 'implicit-feedback' model (Hu, Koren, Volinsky, (2008) <doi:10.1109/ICDM.2008.22>), the 'weighted-lambda-regularization' model, (Zhou, Wilkinson, Schreiber, Pan, (2008) <doi:10.1007/978-3-540-68880-8_32>), or the enhanced model with 'implicit features' (Rendle, Zhang, Koren, (2019) <arXiv:1905.01395>), with or without side information. Can use gradient-based procedures or alternating-least squares procedures (Koren, Bell, Volinsky, (2009) <doi:10.1109/MC.2009.263>), with either a Cholesky solver, a faster conjugate gradient solver (Takacs, Pilaszy, Tikk, (2011) <doi:10.1145/2043932.2043987>), or a non-negative coordinate descent solver (Franc, Hlavac, Navara, (2005) <doi:10.1007/11556121_50>), providing efficient methods for sparse and dense data, and mixtures thereof. Supports L1 and L2 regularization in the main models, offers alternative most-popular and content-based models, and implements functionality for cold-start recommendations and imputation of 2D data.

Maintained by David Cortes. Last updated 2 months ago.

cold-start collaborative-filtering collective-matrix-factorization openblas openmp

31.2 match 120 stars 6.84 score 23 scripts
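
The core idea in the cmfrec description, approximating a matrix 'X' with missing entries as the product of two low-dimensional matrices, can be sketched as follows. This is plain Python using stochastic gradient descent with L2 regularization on the observed entries only; it is not the cmfrec API and omits side information entirely. The names `factorize`, `predict` and `rmse`, the hyperparameter values, and the toy data are all assumptions made for illustration.

```python
import math
import random


def predict(A, B, i, j):
    """Predicted value for cell (i, j): dot product of row and column factors."""
    return sum(a * b for a, b in zip(A[i], B[j]))


def factorize(entries, n_rows, n_cols, k=2, lr=0.02, reg=0.01,
              epochs=2000, seed=0):
    """Fit X ~ A @ B.T from observed (row, col, value) triples by SGD."""
    rng = random.Random(seed)
    A = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_rows)]
    B = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_cols)]
    entries = list(entries)
    for _ in range(epochs):
        rng.shuffle(entries)
        for i, j, x in entries:
            err = x - predict(A, B, i, j)
            for f in range(k):
                a, b = A[i][f], B[j][f]
                # Gradient step on squared error plus an L2 penalty.
                A[i][f] += lr * (err * b - reg * a)
                B[j][f] += lr * (err * a - reg * b)
    return A, B


def rmse(entries, A, B):
    """Root mean squared error over the given (row, col, value) triples."""
    sq = [(x - predict(A, B, i, j)) ** 2 for i, j, x in entries]
    return math.sqrt(sum(sq) / len(sq))


# Toy rank-1 matrix with two cells treated as missing.
u, v = [1.0, 2.0, 3.0, 4.0], [1.0, 0.5, 2.0]
missing = {(0, 2), (3, 1)}
observed = [(i, j, u[i] * v[j])
            for i in range(len(u)) for j in range(len(v))
            if (i, j) not in missing]

A, B = factorize(observed, n_rows=len(u), n_cols=len(v))
train_error = rmse(observed, A, B)
```

After fitting, `predict(A, B, 0, 2)` gives an imputed value for one of the held-out cells; how close it lands to the true value depends on the rank, regularization and number of observed entries.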

yrosseel

lavaan:Latent Variable Analysis

Fit a variety of latent variable models, including confirmatory factor analysis, structural equation modeling and latent growth curve models.

Maintained by Yves Rosseel. Last updated 3 days ago.

factor-analysis growth-curve-models latent-variables missing-data multilevel-models multivariate-analysis path-analysis psychometrics statistical-modeling structural-equation-modeling

11.8 match 453 stars 16.83 score 8.4k scripts 217 dependents

ewenharrison

finalfit:Quickly Create Elegant Regression Results Tables and Plots when Modelling

Generate regression results tables and plots in final format for publication. Explore models and export directly to PDF and 'Word' using 'RMarkdown'.

Maintained by Ewen Harrison. Last updated 6 months ago.

16.6 match 270 stars 11.43 score 1.0k scripts

bioc

S4Vectors:Foundation of vector-like and list-like containers in Bioconductor

The S4Vectors package defines the Vector and List virtual classes and a set of generic functions that extend the semantics of ordinary vectors and lists in R. Package developers can easily implement vector-like or list-like objects as concrete subclasses of Vector or List. In addition, a few low-level concrete subclasses of general interest (e.g. DataFrame, Rle, Factor, and Hits) are implemented in the S4Vectors package itself (many more are implemented in the IRanges package and in other Bioconductor infrastructure packages).

Maintained by Hervé Pagès. Last updated 1 month ago.

infrastructure datarepresentation bioconductor-package core-package

11.3 match 18 stars 16.05 score 1.0k scripts 1.9k dependents

mdsteiner

EFAtools:Fast and Flexible Implementations of Exploratory Factor Analysis Tools

Provides functions to perform exploratory factor analysis (EFA) procedures and compare their solutions. The goal is to provide state-of-the-art factor retention methods and a high degree of flexibility in the EFA procedures. This way, for example, implementations from R 'psych' and 'SPSS' can be compared. Moreover, functions for Schmid-Leiman transformation and the computation of omegas are provided. To speed up the analyses, some of the iterative procedures, like principal axis factoring (PAF), are implemented in C++.

Maintained by Markus Steiner. Last updated 3 months ago.

openblas cpp openmp

26.9 match 10 stars 6.57 score 83 scripts 1 dependent

rikenbit

dcTensor:Discrete Matrix/Tensor Decomposition

Semi-Binary and Semi-Ternary Matrix Decomposition are performed based on Non-negative Matrix Factorization (NMF) and Singular Value Decomposition (SVD). For the details of the methods, see the reference section of GitHub README.md <https://github.com/rikenbit/dcTensor>.

Maintained by Koki Tsuyuzaki. Last updated 10 months ago.

34.0 match 3 stars 5.08 score

feiyoung

GFM:Generalized Factor Model

Generalized factor model is implemented for ultra-high dimensional data with mixed-type variables. Two algorithms, variational EM and alternate maximization, are designed to implement the generalized factor model. The factor matrix and loading matrix together with the number of factors can be well estimated. This model can be employed in social and behavioral sciences, economy and finance, and genomics, to extract interpretable nonlinear factors. For more details, see Wei Liu, Huazhen Lin, Shurong Zheng and Jin Liu (2021) <doi:10.1080/01621459.2021.1999818>.

Maintained by Wei Liu. Last updated 6 months ago.

approximate-factor-model feature-extraction nonlinear-dimension-reduction number-of-factors openblas cpp

30.3 match 2 stars 5.68 score 8 scripts 2 dependents

spatstat

spatstat.geom:Geometrical Functionality of the 'spatstat' Family

Defines spatial data types and supports geometrical operations on them. Data types include point patterns, windows (domains), pixel images, line segment patterns, tessellations and hyperframes. Capabilities include creation and manipulation of data (using command line or graphical interaction), plotting, geometrical operations (rotation, shift, rescale, affine transformation), convex hull, discretisation and pixellation, Dirichlet tessellation, Delaunay triangulation, pairwise distances, nearest-neighbour distances, distance transform, morphological operations (erosion, dilation, closing, opening), quadrat counting, geometrical measurement, geometrical covariance, colour maps, calculus on spatial domains, Gaussian blur, level sets of images, transects of images, intersections between objects, minimum distance matching. (Excludes spatial data on a network, which are supported by the package 'spatstat.linnet'.)

Maintained by Adrian Baddeley. Last updated 24 days ago.

classes-and-objects distance-calculation geometry geometry-processing images mensuration plotting point-patterns spatial-data spatial-data-analysis

13.8 match 7 stars 12.16 score 241 scripts 223 dependents

bioc

DESeq2:Differential gene expression analysis based on the negative binomial distribution

Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.

Maintained by Michael Love. Last updated 10 days ago.

sequencing rnaseq chipseq geneexpression transcription normalization differentialexpression bayesian regression principalcomponent clustering immunooncology openblas cpp

10.4 match 375 stars 16.11 score 17k scripts 115 dependents

bioc

fabia:FABIA: Factor Analysis for Bicluster Acquisition

Biclustering by "Factor Analysis for Bicluster Acquisition" (FABIA). FABIA is a model-based technique for biclustering, that is, clustering rows and columns simultaneously. Biclusters are found by factor analysis where both the factors and the loading matrix are sparse. FABIA is a multiplicative model that extracts linear dependencies between samples and feature patterns. It captures realistic non-Gaussian data distributions with heavy tails as observed in gene expression measurements. FABIA utilizes well-understood model selection techniques like the EM algorithm and variational approaches and is embedded into a Bayesian framework. FABIA ranks biclusters according to their information content and separates spurious biclusters from true biclusters. The code is written in C.

Maintained by Andreas Mitterecker. Last updated 5 months ago.

statisticalmethod microarray differentialexpression multiplecomparison clustering visualization

28.6 match 5.84 score 32 scripts 6 dependents

cran

nlme:Linear and Nonlinear Mixed Effects Models

Fit and compare Gaussian linear and nonlinear mixed-effects models.

Maintained by R Core Team. Last updated 2 months ago.

fortran

12.6 match 6 stars 13.00 score 13k scripts 8.7k dependents

braverock

PerformanceAnalytics:Econometric Tools for Performance and Risk Analysis

Collection of econometric functions for performance and risk analysis. In addition to standard risk and performance metrics, this package aims to aid practitioners and researchers in utilizing the latest research in analysis of non-normal return streams. In general, it is most tested on return (rather than price) data on a regular scale, but most functions will work with irregular return data as well, and increasing numbers of functions will work with P&L or price data where possible.

Maintained by Brian G. Peterson. Last updated 3 months ago.

10.3 match 222 stars 15.93 score 4.8k scripts 20 dependents

gavinsimpson

gratia:Graceful 'ggplot'-Based Graphics and Other Functions for GAMs Fitted Using 'mgcv'

Graceful 'ggplot'-based graphics and utility functions for working with generalized additive models (GAMs) fitted using the 'mgcv' package. Provides a reimplementation of the plot() method for GAMs that 'mgcv' provides, as well as 'tidyverse' compatible representations of estimated smooths.

Maintained by Gavin L. Simpson. Last updated 4 days ago.

distributional-regression gam gamm generalized-additive-mixed-models generalized-additive-models ggplot2 glm lm mgcv penalized-spline random-effects smoothing splines

12.9 match 216 stars 12.68 score 1.6k scripts 1 dependent

r-gregmisc

gdata:Various R Programming Tools for Data Manipulation

Various R programming tools for data manipulation, including medical unit conversions, combining objects, character vector operations, factor manipulation, obtaining information about R objects, generating fixed-width format files, extracting components of date & time objects, operations on columns of data frames, matrix operations, operations on vectors, operations on data frames, value of last evaluated expression, and a resample() wrapper for sample() that ensures consistent behavior for both scalar and vector arguments.

Maintained by Arni Magnusson. Last updated 2 months ago.

11.8 match 9 stars 13.62 score 4.5k scripts 124 dependents

tlverse

tmle3:The Extensible TMLE Framework

A general framework supporting the implementation of targeted maximum likelihood estimators (TMLEs) of a diverse range of statistical target parameters through a unified interface. The goal is that the exposed framework be as general as the mathematical framework upon which it draws.

Maintained by Jeremy Coyle. Last updated 4 months ago.

causal-inference machine-learning targeted-learning variable-importance

20.1 match 38 stars 7.91 score 286 scripts 5 dependents

collinerickson

GauPro:Gaussian Process Fitting

Fits a Gaussian process model to data. Gaussian processes are commonly used in computer experiments to fit an interpolating model. The model is stored as an 'R6' object and can be easily updated with new data. There are options to run in parallel, and 'Rcpp' has been used to speed up calculations. For more info about Gaussian process software, see Erickson et al. (2018) <doi:10.1016/j.ejor.2017.10.002>.

Maintained by Collin Erickson. Last updated 6 days ago.

openblas cpp openmp

18.8 match 16 stars 8.40 score 104 scripts 1 dependent

rspatial

raster:Geographic Data Analysis and Modeling

Reading, writing, manipulating, analyzing and modeling of spatial data. This package has been superseded by the "terra" package <https://CRAN.R-project.org/package=terra>.

Maintained by Robert J. Hijmans. Last updated 2 months ago.

cpp

9.0 match 164 stars 17.05 score 58k scripts 555 dependents

stscl

gdverse:Analysis of Spatial Stratified Heterogeneity

Analyzing spatial factors and exploring spatial associations based on the concept of spatial stratified heterogeneity, while also taking into account local spatial dependencies, spatial interpretability, complex spatial interactions, and robust spatial stratification. Additionally, it supports the spatial stratified heterogeneity family established in academic literature.

Maintained by Wenbo Lv. Last updated 13 days ago.

geographical-detector geoinformatics geospatial-analysis spatial-statistics spatial-stratified-heterogeneity cpp

16.9 match 32 stars 9.05 score 41 scripts 2 dependents

drizopoulos

ltm:Latent Trait Models under IRT

Analysis of multivariate dichotomous and polytomous data using latent trait models under the Item Response Theory approach. It includes the Rasch, the Two-Parameter Logistic, Birnbaum's Three-Parameter, the Graded Response, and the Generalized Partial Credit Models.

Maintained by Dimitris Rizopoulos. Last updated 3 years ago.

15.9 match 30 stars 9.59 score 1.0k scripts 27 dependents

bioc

metagenomeSeq:Statistical analysis for sparse high-throughput sequencing

metagenomeSeq is designed to determine features (be it Operational Taxonomic Unit (OTU), species, etc.) that are differentially abundant between two or more groups of multiple samples. metagenomeSeq is designed to address the effects of both normalization and under-sampling of microbial communities on disease association detection and the testing of feature correlations.

Maintained by Joseph N. Paulson. Last updated 3 months ago.

immunooncology classification clustering geneticvariability differentialexpression microbiome metagenomics normalization visualization multiplecomparison sequencing software

12.4 match 69 stars 12.02 score 494 scripts 7 dependents

a91quaini

intrinsicFRP:An R Package for Factor Model Asset Pricing

Functions for evaluating and testing asset pricing models, including estimation and testing of factor risk premia, selection of "strong" risk factors (factors having nonzero population correlation with test asset returns), heteroskedasticity and autocorrelation robust covariance matrix estimation and testing for model misspecification and identification. The functions for estimating and testing factor risk premia implement the Fama-MacBeth (1973) <doi:10.1086/260061> two-pass approach, the misspecification-robust approaches of Kan-Robotti-Shanken (2013) <doi:10.1111/jofi.12035>, and the approaches based on tradable factor risk premia of Quaini-Trojani-Yuan (2023) <doi:10.2139/ssrn.4574683>. The functions for selecting the "strong" risk factors are based on the Oracle estimator of Quaini-Trojani-Yuan (2023) <doi:10.2139/ssrn.4574683> and the factor screening procedure of Gospodinov-Kan-Robotti (2014) <doi:10.2139/ssrn.2579821>. The functions for evaluating model misspecification implement the HJ model misspecification distance of Kan-Robotti (2008) <doi:10.1016/j.jempfin.2008.03.003>, which is a modification of the prominent Hansen-Jagannathan (1997) <doi:10.1111/j.1540-6261.1997.tb04813.x> distance. The functions for testing model identification specialize the Kleibergen-Paap (2006) <doi:10.1016/j.jeconom.2005.02.011> and the Chen-Fang (2019) <doi:10.1111/j.1540-6261.1997.tb04813.x> rank test to the regression coefficient matrix of test asset returns on risk factors. Finally, the function for heteroskedasticity and autocorrelation robust covariance estimation implements the Newey-West (1994) <doi:10.2307/2297912> covariance estimator.

Maintained by Alberto Quaini. Last updated 8 months ago.

factor-models factor-selection finance identification-tests misspecification rcpparmadillo risk-premium openblas cpp openmp

33.2 match 7 stars 4.45 score 1 scripts

mmrabe

designr:Balanced Factorial Designs

Generate balanced factorial designs with crossed and nested random and fixed effects <https://github.com/mmrabe/designr>.

Maintained by Maximilian M. Rabe. Last updated 2 years ago.

28.4 match 10 stars 5.18 score 15 scripts

bioc

nipalsMCIA:Multiple Co-Inertia Analysis via the NIPALS Method

Computes Multiple Co-Inertia Analysis (MCIA), a joint dimensionality reduction (jDR) algorithm, for a multi-block dataset using a modification to the Nonlinear Iterative Partial Least Squares method (NIPALS) proposed in (Hanafi et al., 2010). Allows multiple options for row- and table-level preprocessing, and speeds up computation of variance explained. Vignettes detail application to bulk and single-cell multi-omics studies.

Maintained by Maximilian Mattessich. Last updated 25 days ago.

software clustering classification multiplecomparison normalization preprocessing singlecell

22.1 match 6 stars 6.60 score 10 scripts

melff

memisc:Management of Survey Data and Presentation of Analysis Results

An infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) 'SPSS' and 'Stata' files is provided. Further, the package can produce tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates, which can be exported to 'LaTeX' and HTML.

Maintained by Martin Elff. Last updated 10 days ago.

survey-data

11.8 match 46 stars 12.34 score 1.2k scripts 13 dependents

nicholasjclark

mvgam:Multivariate (Dynamic) Generalized Additive Models

Fit Bayesian Dynamic Generalized Additive Models to multivariate observations. Users can build nonlinear State-Space models that can incorporate semiparametric effects in observation and process components, using a wide range of observation families. Estimation is performed using Markov Chain Monte Carlo with Hamiltonian Monte Carlo in the software 'Stan'. References: Clark & Wells (2023) <doi:10.1111/2041-210X.13974>.

Maintained by Nicholas J Clark. Last updated 1 days ago.

bayesian-statistics dynamic-factor-models ecological-modelling forecasting gaussian-process generalised-additive-models generalized-additive-models joint-species-distribution-modelling multilevel-models multivariate-timeseries stan time-series-analysis timeseries vector-autoregression vectorautoregression cpp

14.8 match 139 stars 9.85 score 117 scripts

rspatial

terra:Spatial Data Analysis

Methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data. Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data. Processing of very large files is supported. See the manual and tutorials on <https://rspatial.org/> to get started. 'terra' replaces the 'raster' package ('terra' can do more, and it is faster and easier to use).

Maintained by Robert J. Hijmans. Last updated 10 hours ago.

geospatial raster spatial vector onetbb proj gdal geos cpp

8.3 match 559 stars 17.65 score 17k scripts 849 dependents

r-forge

GPArotation:GPA Factor Rotation

Gradient Projection Algorithm Rotation for Factor Analysis. See '?GPArotation.Intro' for more details.

Maintained by Paul Gilbert. Last updated 2 months ago.

11.4 match 1 stars 12.66 score 1.1k scripts 362 dependents

sebkrantz

collapse:Advanced and Fast Data Transformation

A C/C++ based package for advanced data transformation and statistical computing in R that is extremely fast, class-agnostic, robust and programmer friendly. Core functionality includes a rich set of S3 generic grouped and weighted statistical functions for vectors, matrices and data frames, which provide efficient low-level vectorizations, OpenMP multithreading, and skip missing values by default. These are integrated with fast grouping and ordering algorithms (also callable from C), and efficient data manipulation functions. The package also provides a flexible and rigorous approach to time series and panel data in R. It further includes fast functions for common statistical procedures, detailed (grouped, weighted) summary statistics, powerful tools to work with nested data, fast data object conversions, functions for memory efficient R programming, and helpers to effectively deal with variable labels, attributes, and missing data. It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf', 'units', 'plm' (panel-series and data frames), and 'xts'/'zoo'.

Maintained by Sebastian Krantz. Last updated 4 days ago.

data-aggregation data-analysis data-manipulation data-processing data-science data-transformation econometrics high-performance panel-data scientific-computing statistics time-series weighted weights cpp openmp

8.6 match 672 stars 16.63 score 708 scripts 97 dependents

faosorios

fastmatrix:Fast Computation of some Matrices Useful in Statistics

A small set of functions for fast computation of some matrices and operations useful in statistics and econometrics. Currently, there are functions for efficient computation of duplication, commutation and symmetrizer matrices with minimal storage requirements. Some commonly used matrix decompositions (LU and LDL), basic matrix operations (for instance, Hadamard, Kronecker products and the Sherman-Morrison formula) and iterative solvers for linear systems are also available. In addition, the package includes a number of common statistical procedures such as the sweep operator, weighted mean and covariance matrix using an online algorithm, linear regression (using Cholesky, QR, SVD, sweep operator and conjugate gradients methods), ridge regression (with optimal selection of the ridge parameter considering several procedures), omnibus tests for univariate normality, functions to compute the multivariate skewness, kurtosis, the Mahalanobis distance (checking the positive definiteness), and the Wilson-Hilferty transformation of gamma variables. Furthermore, the package provides interfaces to C code callable by another C code from other R packages.

Maintained by Felipe Osorio. Last updated 1 years ago.

commutation-matrix jarque-bera-test ldl-factorization lu-factorization matrix-api-for-r-packages matrix-norms modified-cholesky ols-regression power-method ridge-regression sherman-morrison statistics sweep-operator symmetrizer-matrix fortran openblas

22.7 match 19 stars 6.27 score 37 scripts 10 dependents
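
To make the Sherman-Morrison formula named in the fastmatrix entry concrete, the following is a minimal pure-Python sketch. It is not the package's R or C interface: the function names `mat_vec` and `sherman_morrison` are hypothetical and chosen only for illustration. The sketch updates a known inverse of A to the inverse of A + u v^T without refactorizing the matrix.

```python
def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sherman_morrison(a_inv, u, v):
    """Return the inverse of (A + u v^T), given a_inv = A^{-1}.

    Uses (A + u v^T)^{-1}
         = A^{-1} - (A^{-1} u)(v^T A^{-1}) / (1 + v^T A^{-1} u).
    """
    n = len(a_inv)
    a_inv_u = mat_vec(a_inv, u)  # column vector A^{-1} u
    # Row vector v^T A^{-1}
    vt_a_inv = [sum(v[i] * a_inv[i][j] for i in range(n)) for j in range(n)]
    denom = 1.0 + sum(v[i] * a_inv_u[i] for i in range(n))
    if abs(denom) < 1e-12:
        raise ValueError("the rank-one update makes the matrix singular")
    return [[a_inv[i][j] - a_inv_u[i] * vt_a_inv[j] / denom for j in range(n)]
            for i in range(n)]


# Example: A = diag(2, 4), so A^{-1} = diag(0.5, 0.25).
a_inv = [[0.5, 0.0], [0.0, 0.25]]
u = [1.0, 0.0]
v = [0.0, 1.0]
# A + u v^T = [[2, 1], [0, 4]]
updated = sherman_morrison(a_inv, u, v)
```

The update costs O(n^2) operations, compared with O(n^3) for inverting the modified matrix from scratch, which is why the formula is useful when a matrix changes by one rank-one term at a time.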

gregorkastner

factorstochvol:Bayesian Estimation of (Sparse) Latent Factor Stochastic Volatility Models

Markov chain Monte Carlo (MCMC) sampler for fully Bayesian estimation of latent factor stochastic volatility models with interweaving <doi:10.1080/10618600.2017.1322091>. Sparsity can be achieved through the usage of Normal-Gamma priors on the factor loading matrix <doi:10.1016/j.jeconom.2018.11.007>.

Maintained by Gregor Kastner. Last updated 1 years ago.

openblas cpp

30.1 match 7 stars 4.73 score 17 scripts 1 dependents

statnet

ergm:Fit, Simulate and Diagnose Exponential-Family Models for Networks

An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) <doi:10.18637/jss.v024.i03> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.

Maintained by Pavel N. Krivitsky. Last updated 5 days ago.

9.2 match 100 stars 15.36 score 1.4k scripts 36 dependents

quantmeth

Rnest:Next Eigenvalue Sufficiency Test

Determine the number of dimensions to retain in exploratory factor analysis. The main function, nest(), returns the solution, and plot(nest()) returns a plot.

Maintained by P.-O. Caron. Last updated 2 months ago.

exploratory-data-analysis factor-analysis

34.6 match 2 stars 4.02 score 13 scripts

tbates

umx:Structural Equation Modeling and Twin Modeling in R

Quickly create, run, and report structural equation models, and twin models. See '?umx' for help, and umx_open_CRAN_page("umx") for NEWS. Timothy C. Bates, Michael C. Neale, Hermine H. Maes, (2019). umx: A library for Structural Equation and Twin Modelling in R. Twin Research and Human Genetics, 22, 27-41. <doi:10.1017/thg.2019.2>.

Maintained by Timothy C. Bates. Last updated 6 days ago.

behavior-genetics genetics openmx psychology sem statistics structural-equation-modeling tutorials twin-models umx

14.5 match 44 stars 9.46 score 472 scripts

lme4

lme4:Linear Mixed-Effects Models using 'Eigen' and S4

Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the 'Eigen' C++ library for numerical linear algebra and 'RcppEigen' "glue".

Maintained by Ben Bolker. Last updated 1 days ago.

cpp

6.6 match 647 stars 20.69 score 35k scripts 1.5k dependents

corybrunson

ordr:A Tidyverse Extension for Ordinations and Biplots

Ordination comprises several multivariate exploratory and explanatory techniques with theoretical foundations in geometric data analysis; see Podani (2000, ISBN:90-5782-067-6) for techniques and applications and Le Roux & Rouanet (2005) <doi:10.1007/1-4020-2236-0> for foundations. Greenacre (2010, ISBN:978-84-923846) shows how the most established of these, including principal components analysis, correspondence analysis, multidimensional scaling, factor analysis, and discriminant analysis, rely on eigen-decompositions or singular value decompositions of pre-processed numeric matrix data. These decompositions give rise to a set of shared coordinates along which the row and column elements can be measured. The overlay of their scatterplots on these axes, introduced by Gabriel (1971) <doi:10.1093/biomet/58.3.453>, is called a biplot. 'ordr' provides inspection, extraction, manipulation, and visualization tools for several popular ordination classes supported by a set of recovery methods. It is inspired by and designed to integrate into 'tidyverse' workflows provided by Wickham et al (2019) <doi:10.21105/joss.01686>.

Maintained by Jason Cory Brunson. Last updated 11 days ago.

biplot data-visualization dimension-reduction geometric-data-analysis grammar-of-graphics log-ratio-analysis multivariate-analysis multivariate-statistics ordination tidymodels tidyverse

18.5 match 24 stars 7.26 score 28 scripts

mayoverse

arsenal:An Arsenal of 'R' Functions for Large-Scale Statistical Summaries

An Arsenal of 'R' functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in 'R' and 'RStudio' and which use formulas and versatile summary statistics for summary tables and models. The primary functions include tableby(), a Table-1-like summary of multiple variable types 'by' the levels of one or more categorical variables; paired(), a Table-1-like summary of multiple variable types paired across two time points; modelsum(), which performs simple model fits on one or more endpoints for many variables (univariate or adjusted for covariates); freqlist(), a powerful frequency table across many categorical variables; comparedf(), a function for comparing data.frames; and write2(), a function to output tables to a document.

Maintained by Ethan Heinzen. Last updated 7 months ago.

baseline-characteristics descriptive-statistics modeling paired-comparisons reporting statistics tableone

9.9 match 225 stars 13.45 score 1.2k scripts 16 dependents

j-mitchel

scITD:Single-Cell Interpretable Tensor Decomposition

Single-cell Interpretable Tensor Decomposition (scITD) employs the Tucker tensor decomposition to extract multicell-type gene expression patterns that vary across donors/individuals. This tool is geared for use with single-cell RNA-sequencing datasets consisting of many source donors. The method has a wide range of potential applications, including the study of inter-individual variation at the population-level, patient sub-grouping/stratification, and the analysis of sample-level batch effects. Each "multicellular process" that is extracted consists of (A) a multi cell type gene loadings matrix and (B) a corresponding donor scores vector indicating the level at which the corresponding loadings matrix is expressed in each donor. Additional methods are implemented to aid in selecting an appropriate number of factors and to evaluate stability of the decomposition. Additional tools are provided for downstream analysis, including integration of gene set enrichment analysis and ligand-receptor analysis. Tucker, L.R. (1966) <doi:10.1007/BF02289464>. Unkel, S., Hannachi, A., Trendafilov, N. T., & Jolliffe, I. T. (2011) <doi:10.1007/s13253-011-0055-9>. Zhou, G., & Cichocki, A. (2012) <doi:10.2478/v10175-012-0051-4>.

Maintained by Jonathan Mitchel. Last updated 2 years ago.

cpp

65.9 match 1.98 score 19 scripts

keefe-murphy

IMIFA:Infinite Mixtures of Infinite Factor Analysers and Related Models

Provides flexible Bayesian estimation of Infinite Mixtures of Infinite Factor Analysers and related models, for nonparametrically clustering high-dimensional data, introduced by Murphy et al. (2020) <doi:10.1214/19-BA1179>. The IMIFA model conducts Bayesian nonparametric model-based clustering with factor analytic covariance structures without recourse to model selection criteria to choose the number of clusters or cluster-specific latent factors, mostly via efficient Gibbs updates. Model-specific diagnostic tools are also provided, as well as many options for plotting results, conducting posterior inference on parameters of interest, posterior predictive checking, and quantifying uncertainty.

Maintained by Keefe Murphy. Last updated 1 years ago.

bayesian-nonparametrics dimension-reduction factor-analysis gaussian-mixture-model model-based-clustering

24.4 match 7 stars 5.25 score 51 scripts

david-cortes

poismf:Factorization of Sparse Counts Matrices Through Poisson Likelihood

Creates a non-negative low-rank approximate factorization of a sparse counts matrix by maximizing Poisson likelihood with L1/L2 regularization (e.g. for implicit-feedback recommender systems or bag-of-words-based topic modeling) (Cortes, (2018) <arXiv:1811.01908>), which usually leads to very sparse user and item factors (over 90% zero-valued). Similar to hierarchical Poisson factorization (HPF), but follows an optimization-based approach with regularization instead of a hierarchical prior, and is fit through gradient-based methods instead of variational inference.

Maintained by David Cortes. Last updated 9 months ago.

implicit-feedback poisson-factorization openblas openmp

27.5 match 46 stars 4.66 score 9 scripts

mlr-org

mlr3pipelines:Preprocessing Operators and Pipelines for 'mlr3'

Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned.

Maintained by Martin Binder. Last updated 7 days ago.

bagging data-science dataflow-programming ensemble-learning machine-learning mlr3 pipelines preprocessing stacking

10.4 match 141 stars 12.36 score 448 scripts 7 dependents

rikenbit

nnTensor:Non-Negative Tensor Decomposition

Some functions for performing non-negative matrix factorization, non-negative CANDECOMP/PARAFAC (CP) decomposition, non-negative Tucker decomposition, and generating toy model data. See Andrzej Cichocki et al. (2009) and the reference section of GitHub README.md <https://github.com/rikenbit/nnTensor>, for details of the methods.

Maintained by Koki Tsuyuzaki. Last updated 10 months ago.

19.5 match 16 stars 6.58 score 9 scripts 4 dependents
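
As an illustration of the non-negative matrix factorization mentioned in the nnTensor entry, here is a minimal pure-Python sketch of the classic multiplicative-update scheme for the squared Frobenius loss. It is an assumption-laden toy, not the package's implementation or API: the names `nmf`, `matmul`, `transpose`, and `frob_error`, the random initialization, and the iteration count are all hypothetical choices.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frob_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive start so the multiplicative updates can move every entry.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# A 4 x 3 non-negative matrix with an exact rank-2 non-negative factorization.
v = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0], [3.0, 5.0, 7.0]]
w, h = nmf(v, rank=2)
```

Because every update multiplies the current entry by a ratio of non-negative quantities, the factors stay non-negative without any explicit projection step.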

melissagwolf

dynamic:DFI Cutoffs for Latent Variable Models

Returns dynamic fit index (DFI) cutoffs for latent variable models that are tailored to the user's model statement, model type, and sample size. This is the counterpart of the Shiny Application, <https://dynamicfit.app>.

Maintained by Melissa G. Wolf. Last updated 2 months ago.

17.9 match 16 stars 7.13 score 139 scripts

yixuan

recosystem:Recommender System using Matrix Factorization

R wrapper of the 'libmf' library <https://www.csie.ntu.edu.tw/~cjlin/libmf/> for recommender system using matrix factorization. It is typically used to approximate an incomplete matrix using the product of two matrices in a latent space. Other common names for this task include "collaborative filtering", "matrix completion", "matrix recovery", etc. High performance multi-core parallel computing is supported in this package.

Maintained by Yixuan Qiu. Last updated 2 years ago.

matrix-factorization recommender-system cpp openmp

15.9 match 84 stars 7.97 score 101 scripts 6 dependents
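
The recosystem entry describes approximating an incomplete matrix by the product of two matrices in a latent space. The sketch below shows that idea with plain single-threaded stochastic gradient descent in pure Python. It is a simplified illustration under stated assumptions, not the 'libmf' algorithm or the package's interface: the function names, learning rate, regularization strength, and toy ratings are all hypothetical.

```python
import random


def predict(p, q, u, i):
    """Predicted value for row u and column i: dot product of latent vectors."""
    return sum(a * b for a, b in zip(p[u], q[i]))


def mse(ratings, p, q):
    """Mean squared error over the observed (u, i, r) triples only."""
    return sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


def factorize(ratings, n_users, n_items, dim=2, lr=0.05, reg=0.01,
              n_epochs=500, seed=0):
    """Fit P (users x dim) and Q (items x dim) so that P[u] . Q[i] ~ r
    for each observed (u, i, r) triple; unobserved cells are ignored."""
    rng = random.Random(seed)
    p = [[rng.uniform(0.1, 0.5) for _ in range(dim)] for _ in range(n_users)]
    q = [[rng.uniform(0.1, 0.5) for _ in range(dim)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(n_epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(p, q, u, i)
            for k in range(dim):
                pu, qi = p[u][k], q[i][k]
                # Gradient step on squared error with L2 regularization.
                p[u][k] += lr * (err * qi - reg * pu)
                q[i][k] += lr * (err * pu - reg * qi)
    return p, q


# Toy incomplete 3 x 3 matrix: only six of the nine cells are observed.
observed = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
            (2, 1, 1.0), (2, 2, 5.0)]
p, q = factorize(observed, n_users=3, n_items=3)
# A missing cell, e.g. (0, 2), can now be filled in with predict(p, q, 0, 2).
```

Only the observed triples enter the loss, which is what distinguishes this "matrix completion" setting from factorizing a fully known matrix.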

njtierney

naniar:Data Structures, Summaries, and Visualisations for Missing Data

Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. 'naniar' provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of 'ggplot2' and tidy data. The work is fully discussed at Tierney & Cook (2023) <doi:10.18637/jss.v105.i07>.

Maintained by Nicholas Tierney. Last updated 2 days ago.

data-visualisation ggplot2 missing-data missingness tidy-data

8.1 match 657 stars 15.63 score 5.1k scripts 9 dependents

biodiverse

spAbundance:Univariate and Multivariate Spatial Modeling of Species Abundance

Fits single-species (univariate) and multi-species (multivariate) non-spatial and spatial abundance models in a Bayesian framework using Markov Chain Monte Carlo (MCMC). Spatial models are fit using Nearest Neighbor Gaussian Processes (NNGPs). Details on NNGP models are given in Datta, Banerjee, Finley, and Gelfand (2016) <doi:10.1080/01621459.2015.1044091> and Finley, Datta, and Banerjee (2022) <doi:10.18637/jss.v103.i05>. Fits single-species and multi-species spatial and non-spatial versions of generalized linear mixed models (Gaussian, Poisson, Negative Binomial), N-mixture models (Royle 2004 <doi:10.1111/j.0006-341X.2004.00142.x>) and hierarchical distance sampling models (Royle, Dawson, Bates (2004) <doi:10.1890/03-3127>). Multi-species spatial models are fit using a spatial factor modeling approach with NNGPs for computational efficiency.

Maintained by Jeffrey Doser. Last updated 16 days ago.

openblas cpp openmp

20.5 match 17 stars 6.15 score 43 scripts 1 dependents

rohelab

vsp:Vintage Sparse PCA for Semi-Parametric Factor Analysis

Provides fast spectral estimation of latent factors in random dot product graphs using the vsp estimator. Under mild assumptions, the vsp estimator is consistent for (degree-corrected) stochastic blockmodels, (degree-corrected) mixed-membership stochastic blockmodels, and degree-corrected overlapping stochastic blockmodels.

Maintained by Alex Hayes. Last updated 4 months ago.

20.3 match 26 stars 6.17 score 19 scripts

bioc

TFARM:Transcription Factors Association Rules Miner

It searches for relevant associations of transcription factors with a transcription factor target, in specific genomic regions. It also allows evaluating the Importance Index distribution of transcription factors (and combinations of transcription factors) in association rules.

Maintained by Liuba Nausicaa Martino. Last updated 5 months ago.

biologicalquestion infrastructure statisticalmethod transcription

31.2 match 4.00 score 2 scripts

fbartos

BayesTools:Tools for Bayesian Analyses

Provides tools for conducting Bayesian analyses and Bayesian model averaging (Kass and Raftery, 1995, <doi:10.1080/01621459.1995.10476572>, Hoeting et al., 1999, <doi:10.1214/ss/1009212519>). The package contains functions for creating a wide range of prior distribution objects, mixing posterior samples from 'JAGS' and 'Stan' models, plotting posterior distributions, etc. The tools for working with prior distributions range from visualization and generating 'JAGS' and 'bridgesampling' syntax to basic functions such as rng, quantile, and distribution functions.

Maintained by František Bartoš. Last updated 2 months ago.

bayesian model-averaging

19.3 match 7 stars 6.42 score 17 scripts 3 dependents

chrisaberson

pwr2ppl:Power Analyses for Common Designs (Power to the People)

Statistical power analysis for designs including t-tests, correlations, multiple regression, ANOVA, mediation, and logistic regression. Functions accompany Aberson (2019) <doi:10.4324/9781315171500>.

Maintained by Chris Aberson. Last updated 3 years ago.

29.6 match 17 stars 4.16 score 17 scripts

r-lib

vctrs:Vector Helpers

Defines new notions of prototype and size that are used to provide tools for consistent and well-founded type-coercion and size-recycling, and are in turn connected to ideas of type- and size-stability useful for analysing function interfaces.

Maintained by Davis Vaughan. Last updated 5 months ago.

s3-vectors

6.5 match 290 stars 18.97 score 1.1k scripts 13k dependents

bxc147

Epi:Statistical Analysis in Epidemiology

Functions for demographic and epidemiological analysis in the Lexis diagram, i.e. register and cohort follow-up data. In particular representation, manipulation, rate estimation and simulation for multistate data - the Lexis suite of functions, which includes interfaces to 'mstate', 'etm' and 'cmprsk' packages. Contains functions for Age-Period-Cohort and Lee-Carter modeling and a function for interval censored data and some useful functions for tabulation and plotting, as well as a number of epidemiological data sets.

Maintained by Bendix Carstensen. Last updated 2 months ago.

12.7 match 4 stars 9.65 score 708 scripts 11 dependents

robustport

facmodCS:Cross-Section Factor Models

Linear cross-section factor model fitting with least-squares and robust fitting using the 'lmrobdetMM()' function from 'RobStatTM'; related volatility, Value at Risk and Expected Shortfall risk and performance attribution (factor-contributed vs idiosyncratic returns); tabular displays of risk and performance reports; factor model Monte Carlo. The package authors would like to thank Chicago Research on Security Prices, LLC for the cross-section of about 300 CRSP stocks data (in the data.table object 'stocksCRSP'), and S&P GLOBAL MARKET INTELLIGENCE for contributing 14 factor scores (a.k.a. "alpha factors" and "factor exposures") fundamental data on the 300 companies in the data.table object 'factorsSPGMI'. The 'stocksCRSP' and 'factorsSPGMI' data are not covered by the GPL-2 license, are not provided as open source of any kind, and they are not to be redistributed in any form.

Maintained by Mido Shammaa. Last updated 1 years ago.

38.6 match 3.18 score 2 scripts

egeminiani

penfa:Single- And Multiple-Group Penalized Factor Analysis

Fits single- and multiple-group penalized factor analysis models via a trust-region algorithm with integrated automatic multiple tuning parameter selection (Geminiani et al., 2021 <doi:10.1007/s11336-021-09751-8>). Available penalties include lasso, adaptive lasso, scad, mcp, and ridge.

Maintained by Elena Geminiani. Last updated 4 years ago.

factor-analysis lasso latent-variables multiple-group optimization penalization psychometrics

27.2 match 3 stars 4.48 score 5 scripts

lcbc-uio

galamm:Generalized Additive Latent and Mixed Models

Estimates generalized additive latent and mixed models using maximum marginal likelihood, as defined in Sorensen et al. (2023) <doi:10.1007/s11336-023-09910-z>, which is an extension of Rabe-Hesketh and Skrondal (2004)'s unifying framework for multilevel latent variable modeling <doi:10.1007/BF02295939>. Efficient computation is done using sparse matrix methods, Laplace approximation, and automatic differentiation. The framework includes generalized multilevel models with heteroscedastic residuals, mixed response types, factor loadings, smoothing splines, crossed random effects, and combinations thereof. Syntax for model formulation is close to 'lme4' (Bates et al. (2015) <doi:10.18637/jss.v067.i01>) and 'PLmixed' (Rockwood and Jeon (2019) <doi:10.1080/00273171.2018.1516541>).

Maintained by Øystein Sørensen. Last updated 6 months ago.

generalized-additive-models hierarchical-models item-response-theory latent-variable-models structural-equation-models cpp

16.6 match 29 stars 7.33 score 41 scripts

alexanderrobitzsch

sirt:Supplementary Item Response Theory Models

Supplementary functions for item response models aiming to complement existing R packages. The functionality includes among others multidimensional compensatory and noncompensatory IRT models (Reckase, 2009, <doi:10.1007/978-0-387-89976-3>), MCMC for hierarchical IRT models and testlet models (Fox, 2010, <doi:10.1007/978-1-4419-0742-4>), NOHARM (McDonald, 1982, <doi:10.1177/014662168200600402>), Rasch copula model (Braeken, 2011, <doi:10.1007/s11336-010-9190-4>; Schroeders, Robitzsch & Schipolowski, 2014, <doi:10.1111/jedm.12054>), faceted and hierarchical rater models (DeCarlo, Kim & Johnson, 2011, <doi:10.1111/j.1745-3984.2011.00143.x>), ordinal IRT model (ISOP; Scheiblechner, 1995, <doi:10.1007/BF02301417>), DETECT statistic (Stout, Habing, Douglas & Kim, 1996, <doi:10.1177/014662169602000403>), local structural equation modeling (LSEM; Hildebrandt, Luedtke, Robitzsch, Sommer & Wilhelm, 2016, <doi:10.1080/00273171.2016.1142856>).

Maintained by Alexander Robitzsch. Last updated 2 months ago.

item-response-theory openblas cpp

12.0 match 23 stars 10.01 score 280 scripts 22 dependents

sem-in-r

seminr:Building and Estimating Structural Equation Models

A powerful, easy-to-use syntax for specifying and estimating complex Structural Equation Models. Models can be estimated using Partial Least Squares Path Modeling or Covariance-Based Structural Equation Modeling or covariance-based Confirmatory Factor Analysis. Methods described in Ray, Danks, and Valdez (2021).

Maintained by Nicholas Patrick Danks. Last updated 3 years ago.

common-factors composites construct pls-models

16.0 match 62 stars 7.46 score 284 scripts

truecluster

ff:Memory-Efficient Storage of Large Data on Disk and Fast Access Functions

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) in main memory - the effective virtual memory consumption per ff object. ff supports R's standard atomic data types 'double', 'logical', 'raw' and 'integer' and non-standard atomic types boolean (1 bit), quad (2 bit unsigned), nibble (4 bit unsigned), byte (1 byte signed with NAs), ubyte (1 byte unsigned), short (2 byte signed with NAs), ushort (2 byte unsigned), single (4 byte float with NAs). For example 'quad' allows efficient storage of genomic data as an 'A','T','G','C' factor. The unsigned types support 'circular' arithmetic. There is also support for close-to-atomic types 'factor', 'ordered', 'POSIXct', 'Date' and custom close-to-atomic types. ff not only has native C-support for vectors, matrices and arrays with flexible dimorder (major column-order, major row-order and generalizations for arrays). There is also a ffdf class not unlike data.frames and import/export filters for csv files. ff objects store raw data in binary flat files in native encoding, and complement this with metadata stored in R as physical and virtual attributes. ff objects have well-defined hybrid copying semantics, which gives rise to certain performance improvements through virtualization. ff objects can be stored and reopened across R sessions. ff files can be shared by multiple ff R objects (using different data en/de-coding schemes) in the same process or from multiple R processes to exploit parallelism. A wide choice of finalizer options allows to work with 'permanent' files as well as creating/removing 'temporary' ff files completely transparent to the user. On certain OS/Filesystem combinations, creating the ff files works without notable delay thanks to using sparse file allocation. 
Several access optimization techniques such as Hybrid Index Preprocessing and Virtualization are implemented to achieve good performance even with large datasets, for example virtual matrix transpose without touching a single byte on disk. Further, to reduce disk I/O, 'logicals' and non-standard data types get stored natively and compactly in binary flat files, i.e. logicals take up exactly 2 bits to represent TRUE, FALSE and NA. Beyond basic access functions, the ff package also provides compatibility functions that facilitate writing code for ff and ram objects and support for batch processing on ff objects (e.g. as.ram, as.ff, ffapply). ff interfaces closely with functionality from package 'bit': chunked looping, fast bit operations and coercions between different objects that can store subscript information ('bit', 'bitwhich', ff 'boolean', ri range index, hi hybrid index). This makes it possible to work interactively with selections of large datasets and quickly modify selection criteria. Further high-performance enhancements can be made available upon request.

Maintained by Jens Oehlschlägel. Last updated 2 months ago.

cpp

9.9 match 27 stars 12.01 score 764 scripts 71 dependents

tomaspinall

NFCP:N-Factor Commodity Pricing Through Term Structure Estimation

Commodity pricing models are (systems of) stochastic differential equations that are utilized for the valuation and hedging of commodity contingent claims (i.e. derivative products on the commodity) and other commodity related investments. Commodity pricing models that capture market dynamics are of great importance to commodity market participants in order to exercise sound investment and risk-management strategies. Parameters of commodity pricing models are estimated through maximum likelihood estimation, using available term structure futures data of a commodity. 'NFCP' (n-factor commodity pricing) provides a framework for the modeling, parameter estimation, probabilistic forecasting, option valuation and simulation of commodity prices through state space and Monte Carlo methods, risk-neutral valuation and Kalman filtering. 'NFCP' allows the commodity pricing model to consist of n correlated factors, with both random walk and mean-reverting elements. The n-factor commodity pricing model framework was first presented in the work of Cortazar and Naranjo (2006) <doi:10.1002/fut.20198>. Examples presented in 'NFCP' replicate the two-factor crude oil commodity pricing model presented in the prolific work of Schwartz and Smith (2000) <doi:10.1287/mnsc.46.7.893.12034> with the approximate term structure futures data applied within this study provided in the 'NFCP' package.

Maintained by Thomas Aspinall. Last updated 3 years ago.

26.7 match 5 stars 4.40 score 4 scripts

bioc

target:Predict Combined Function of Transcription Factors

Implement the BETA algorithm for inferring direct target genes from DNA-binding and perturbation expression data (Wang et al. (2013) <doi:10.1038/nprot.2013.150>). Extend the algorithm to predict the combined function of two DNA-binding elements from comparable binding and expression data.

Maintained by Mahmoud Ahmed. Last updated 5 months ago.

software statisticalmethod transcription algorithm chip-seq dna-binding gene-regulation transcription-factors

14.9 match 4 stars 7.79 score 1.3k scripts

kassambara

factoextra:Extract and Visualize the Results of Multivariate Data Analyses

Provides some easy-to-use functions to extract and visualize the output of multivariate data analyses, including 'PCA' (Principal Component Analysis), 'CA' (Correspondence Analysis), 'MCA' (Multiple Correspondence Analysis), 'FAMD' (Factor Analysis of Mixed Data), 'MFA' (Multiple Factor Analysis) and 'HMFA' (Hierarchical Multiple Factor Analysis) functions from different R packages. It also contains functions for simplifying some clustering analysis steps and provides elegant 'ggplot2'-based data visualization.

Maintained by Alboukadel Kassambara. Last updated 5 years ago.

8.2 match 363 stars 14.13 score 15k scripts 52 dependents

dgerbing

lessR:Less Code, More Results

Each function replaces multiple standard R functions. For example, two function calls, Read() and CountAll(), generate summary statistics for all variables in the data frame, plus histograms and bar charts as appropriate. Other functions provide for summary statistics via pivot tables, a comprehensive regression analysis, ANOVA and t-test, visualizations including the Violin/Box/Scatter plot for a numerical variable, bar chart, histogram, box plot, density curves, calibrated power curve, reading multiple data formats with the same function call, variable labels, time series with aggregation and forecasting, color themes, and Trellis (facet) graphics. Also includes a confirmatory factor analysis of multiple indicator measurement models, pedagogical routines for data simulation such as for the Central Limit Theorem, generation and rendering of regression instructions for interpretative output, and interactive visualizations.

Maintained by David W. Gerbing. Last updated 1 months ago.

15.5 match 6 stars 7.47 score 394 scripts 3 dependents

eeethb

edgedata:Datasets that Support the EDGE Server DIY Logic

Datasets from the most recent Center for Consumer Information and Insurance Oversight (CCIIO) DIY entry in a tidy format. These support the Centers for Medicare and Medicaid Services' (CMS) risk adjustment Do-It-Yourself (DIY) process, which allows health insurance issuers to calculate member risk profiles under the Health and Human Services-Hierarchical Condition Categories (HHS-HCC) regression model. This regression model is used to calculate risk adjustment transfers. Risk adjustment is a selection mitigation program implemented under the Patient Protection and Affordable Care Act (ACA or Obamacare) in the USA. Under the ACA, health insurance issuers submit claims data to CMS in order for CMS to calculate a risk score under the HHS-HCC regression model. However, CMS does not inform issuers of their average risk score until after the data submission deadline. These data sets can be used by issuers to calculate their average risk score mid-year. More information about risk adjustment and the HHS-HCC model can be found here: <https://www.cms.gov/mmrr/Articles/A2014/MMRR2014_004_03_a03.html>.

Maintained by Ethan Brockmann. Last updated 3 years ago.

42.5 match 1 stars 2.70 score 1 scripts

danheck

metaBMA:Bayesian Model Averaging for Random and Fixed Effects Meta-Analysis

Computes the posterior model probabilities for standard meta-analysis models (null model vs. alternative model assuming either fixed- or random-effects, respectively). These posterior probabilities are used to estimate the overall mean effect size as the weighted average of the mean effect size estimates of the random- and fixed-effect model as proposed by Gronau, Van Erp, Heck, Cesario, Jonas, & Wagenmakers (2017, <doi:10.1080/23743603.2017.1326760>). The user can define a wide range of non-informative or informative priors for the mean effect size and the heterogeneity coefficient. Moreover, using pre-compiled Stan models, meta-analysis with continuous and discrete moderators with Jeffreys-Zellner-Siow (JZS) priors can be fitted and tested. This allows computing Bayes factors and performing Bayesian model averaging across random- and fixed-effects meta-analysis with and without moderators. For a primer on Bayesian model-averaged meta-analysis, see Gronau, Heck, Berkhout, Haaf, & Wagenmakers (2021, <doi:10.1177/25152459211031256>).

Maintained by Daniel W. Heck. Last updated 1 years ago.

bayes bayes-factor bayesian-inference evidence-synthesis meta-analysis model-averaging stan cpp

14.9 match 28 stars 7.70 score 54 scripts 4 dependents

friendly

vcdExtra:'vcd' Extensions and Additions

Provides additional data sets, methods and documentation to complement the 'vcd' package for Visualizing Categorical Data and the 'gnm' package for Generalized Nonlinear Models. In particular, 'vcdExtra' extends mosaic, assoc and sieve plots from 'vcd' to handle 'glm()' and 'gnm()' models and adds a 3D version in 'mosaic3d'. Additionally, methods are provided for comparing and visualizing lists of 'glm' and 'loglm' objects. This package is now a support package for the book, "Discrete Data Analysis with R" by Michael Friendly and David Meyer.

Maintained by Michael Friendly. Last updated 5 months ago.

categorical-data-visualization generalized-linear-models mosaic-plots

11.0 match 24 stars 10.34 score 472 scripts 3 dependents

spatstat

spatstat.utils:Utility Functions for 'spatstat'

Contains utility functions for the 'spatstat' family of packages which may also be useful for other purposes.

Maintained by Adrian Baddeley. Last updated 2 months ago.

spatial-analysis spatial-data spatstat

9.8 match 5 stars 11.57 score 134 scripts 244 dependents

kharchenkolab

pagoda2:Single Cell Analysis and Differential Expression

Analyzing and interactively exploring large-scale single-cell RNA-seq datasets. 'pagoda2' primarily performs normalization and differential gene expression analysis, with an interactive application for exploring single-cell RNA-seq datasets. It performs basic tasks such as cell size normalization and gene variance normalization, and can be used to identify subpopulations and run differential expression within individual samples. 'pagoda2' was written to rapidly process modern large-scale scRNAseq datasets of approximately 1e6 cells. The companion web application allows users to explore which gene expression patterns form the different subpopulations within their data. The package also serves as the primary method for preprocessing data for conos, <https://github.com/kharchenkolab/conos>. This package interacts with data available through the 'p2data' package, which is available in a 'drat' repository. To access this data package, see the instructions at <https://github.com/kharchenkolab/pagoda2>. The size of the 'p2data' package is approximately 6 MB.

Maintained by Evan Biederstedt. Last updated 1 years ago.

scrna-seq single-cell single-cell-rna-seq transcriptomics openblas cpp openmp

14.2 match 222 stars 8.00 score 282 scripts

adrientaudiere

MiscMetabar:Miscellaneous Functions for Metabarcoding Analysis

Facilitate the description, transformation, exploration, and reproducibility of metabarcoding analyses. 'MiscMetabar' is mainly built on top of the 'phyloseq', 'dada2' and 'targets' R packages. It helps to build reproducible and robust bioinformatics pipelines in R. 'MiscMetabar' makes ecological analysis of alpha and beta-diversity easier, more reproducible and more powerful by integrating a large number of tools. Important features are described in Taudière A. (2023) <doi:10.21105/joss.06038>.

Maintained by Adrien Taudière. Last updated 24 days ago.

sequencing microbiome metagenomics clustering classification visualization amplicon amplicon-sequencing biodiversity-informatics ecology illumina metabarcoding ngs-analysis

17.4 match 17 stars 6.44 score 23 scripts

philchalmers

mirt:Multidimensional Item Response Theory

Analysis of discrete response data using unidimensional and multidimensional item analysis models under the Item Response Theory paradigm (Chalmers (2012) <doi:10.18637/jss.v048.i06>). Exploratory and confirmatory item factor analysis models are estimated with quadrature (EM) or stochastic (MHRM) methods. Confirmatory bi-factor and two-tier models are available for modeling item testlets using dimension reduction EM algorithms, while multiple group analyses and mixed effects designs are included for detecting differential item, bundle, and test functioning, and for modeling item and person covariates. Finally, latent class models such as the DINA, DINO, multidimensional latent class, mixture IRT models, and zero-inflated response models are supported, as well as a wide family of probabilistic unfolding models.

Maintained by Phil Chalmers. Last updated 10 days ago.

irt mirt openblas cpp openmp

7.4 match 210 stars 14.98 score 2.5k scripts 40 dependents

floschuberth

cSEM:Composite-Based Structural Equation Modeling

Estimate, assess, test, and study linear, nonlinear, hierarchical and multigroup structural equation models using composite-based approaches and procedures, including estimation techniques such as partial least squares path modeling (PLS-PM) and its derivatives (PLSc, ordPLSc, robustPLSc), generalized structured component analysis (GSCA), generalized structured component analysis with uniqueness terms (GSCAm), generalized canonical correlation analysis (GCCA), principal component analysis (PCA), factor score regression (FSR) using sum score, regression or Bartlett scores (including bias correction using Croon’s approach), as well as several tests and typical postestimation procedures (e.g., verify admissibility of the estimates, assess the model fit, test the model fit etc.).

Maintained by Florian Schuberth. Last updated 15 days ago.

11.9 match 28 stars 9.11 score 56 scripts 2 dependents

indrajeetpatil

ggstatsplot:'ggplot2' Based Plots with Statistical Details

Extension of 'ggplot2', 'ggstatsplot' creates graphics with details from statistical tests included in the plots themselves. It provides an easier syntax to generate information-rich plots for statistical analysis of continuous (violin plots, scatterplots, histograms, dot plots, dot-and-whisker plots) or categorical (pie and bar charts) data. Currently, it supports the most common types of statistical approaches and tests: parametric, nonparametric, robust, and Bayesian versions of t-test/ANOVA, correlation analyses, contingency table analysis, meta-analysis, and regression analyses. References: Patil (2021) <doi:10.21105/joss.03236>.

Maintained by Indrajeet Patil. Last updated 18 days ago.

bayes-factors datascience dataviz effect-size ggplot-extension hypothesis-testing non-parametric-statistics regression-models statistical-analysis

7.5 match 2.1k stars 14.49 score 3.0k scripts 1 dependents

tidymodels

hardhat:Construct Modeling Packages

Building modeling packages is hard. A large amount of effort generally goes into providing an implementation for a new method that is efficient, fast, and correct, but often less emphasis is put on the user interface. A good interface requires specialized knowledge about S3 methods and formulas, which the average package developer might not have. The goal of 'hardhat' is to reduce the burden around building new modeling packages by providing functionality for preprocessing, predicting, and validating input.

Maintained by Hannah Frick. Last updated 1 months ago.

7.3 match 103 stars 14.88 score 175 scripts 436 dependents

pbreheny

ncvreg:Regularization Paths for SCAD and MCP Penalized Regression Models

Fits regularization paths for linear regression, GLM, and Cox regression models using lasso or nonconvex penalties, in particular the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) penalty, with options for additional L2 penalties (the "elastic net" idea). Utilities for carrying out cross-validation as well as post-fitting visualization, summarization, inference, and prediction are also provided. For more information, see Breheny and Huang (2011) <doi:10.1214/10-AOAS388> or visit the ncvreg homepage <https://pbreheny.github.io/ncvreg/>.

Maintained by Patrick Breheny. Last updated 2 days ago.

9.0 match 43 stars 12.04 score 458 scripts 38 dependents

rhartmano

labelr:Label Data Frames, Variables, and Values

Create and use data frame labels for data frame objects (frame labels), their columns (name labels), and individual values of a column (value labels). Value labels include one-to-one and many-to-one labels for nominal and ordinal variables, as well as numerical range-based value labels for continuous variables. Convert value-labeled variables so each value is replaced by its corresponding value label. Add values-converted-to-labels columns to a value-labeled data frame while preserving parent columns. Filter and subset a value-labeled data frame using labels, while returning results in terms of values. Overlay labels in place of values in common R commands to increase interpretability. Generate tables of value frequencies, with categories expressed as raw values or as labels. Access data frames that show value-to-label mappings for easy reference.

Maintained by Robert Hartman. Last updated 7 months ago.

19.1 match 3 stars 5.65 score 10 scripts

rubensmoura87

MultiATSM:Multicountry Term Structure of Interest Rates Models

Estimation routines for several classes of affine term structure of interest rates models. All the models are based on the single-country unspanned macroeconomic risk framework from Joslin, Priebsch, and Singleton (2014, JF) <doi:10.1111/jofi.12131>. Multicountry extensions such as the ones of Jotikasthira, Le, and Lundblad (2015, JFE) <doi:10.1016/j.jfineco.2014.09.004>, Candelon and Moura (2023, EM) <doi:10.1016/j.econmod.2023.106453>, and Candelon and Moura (Forthcoming, JFEC) <doi:10.1093/jjfinec/nbae008> are also available.

Maintained by Rubens Moura. Last updated 4 days ago.

27.6 match 3.90 score 8 scripts

ecmerkle

blavaan:Bayesian Latent Variable Analysis

Fit a variety of Bayesian latent variable models, including confirmatory factor analysis, structural equation models, and latent growth curve models. References: Merkle & Rosseel (2018) <doi:10.18637/jss.v085.i04>; Merkle et al. (2021) <doi:10.18637/jss.v100.i06>.

Maintained by Edgar Merkle. Last updated 3 days ago.

bayesian-statistics factor-analysis growth-curve-models latent-variables missing-data multilevel-models multivariate-analysis path-analysis psychometrics statistical-modeling structural-equation-modeling cpp

9.8 match 92 stars 10.84 score 183 scripts 3 dependents

donaldrwilliams

BGGM:Bayesian Gaussian Graphical Models

Fit Bayesian Gaussian graphical models. The methods are separated into two Bayesian approaches for inference: hypothesis testing and estimation. There are extensions for confirmatory hypothesis testing, comparing Gaussian graphical models, and node wise predictability. These methods were recently introduced in the Gaussian graphical model literature, including Williams (2019) <doi:10.31234/osf.io/x8dpr>, Williams and Mulder (2019) <doi:10.31234/osf.io/ypxd8>, Williams, Rast, Pericchi, and Mulder (2019) <doi:10.31234/osf.io/yt386>.

Maintained by Philippe Rast. Last updated 3 months ago.

bayes-factors bayesian-hypothesis-testing gaussian-graphical-models openblas cpp openmp

10.9 match 55 stars 9.64 score 102 scripts 1 dependents

pik-piam

quitte:Bits and pieces of code to use with quitte-style data frames

A collection of functions for easily dealing with quitte-style data frames, doing multi-model comparisons and plots.

Maintained by Michaja Pehl. Last updated 22 hours ago.

12.7 match 8.22 score 184 scripts 35 dependents

mjskay

tidybayes:Tidy Data and 'Geoms' for Bayesian Models

Compose data for and extract, manipulate, and visualize posterior draws from Bayesian models ('JAGS', 'Stan', 'rstanarm', 'brms', 'MCMCglmm', 'coda', ...) in a tidy data format. Functions are provided to help extract tidy data frames of draws from Bayesian models and that generate point summaries and intervals in a tidy format. In addition, 'ggplot2' 'geoms' and 'stats' are provided for common visualization primitives like points with multiple uncertainty intervals, eye plots (intervals plus densities), and fit curves with multiple, arbitrary uncertainty bands.

Maintained by Matthew Kay. Last updated 6 months ago.

bayesian-data-analysis brms ggplot2 jags stan tidy-data visualization

7.0 match 732 stars 14.88 score 7.3k scripts 19 dependents

ludvigolsen

groupdata2:Creating Groups from Data

Methods for dividing data into groups. Create balanced partitions and cross-validation folds. Perform time series windowing and general grouping and splitting of data. Balance existing groups with up- and downsampling or collapse them to fewer groups.

Maintained by Ludvig Renbo Olsen. Last updated 3 months ago.

balance cross-validation data data-frame fold group-factor groups participants partition split staircase

11.1 match 27 stars 9.36 score 338 scripts 7 dependents

oliver-wyman-actuarial

easyr:Helpful Functions from Oliver Wyman Actuarial Consulting

Makes difficult operations easy. Includes these types of functions: shorthand, type conversion, data wrangling, and work flow. Also includes some helpful data objects: NA strings, U.S. state list, color blind charting colors. Built and shared by Oliver Wyman Actuarial Consulting. Accepting proposed contributions through GitHub.

Maintained by Bryce Chamberlain. Last updated 1 years ago.

21.3 match 20 stars 4.86 score 18 scripts

larmarange

labelled:Manipulating Labelled Data

Work with labelled data imported from 'SPSS' or 'Stata' with 'haven' or 'foreign'. This package provides useful functions to deal with "haven_labelled" and "haven_labelled_spss" classes introduced by 'haven' package.

Maintained by Joseph Larmarange. Last updated 25 days ago.

haven labels metadata sas spss stata

6.8 match 76 stars 15.02 score 2.4k scripts 96 dependents

r-lib

clock:Date-Time Types and Tools

Provides a comprehensive library for date-time manipulations using a new family of orthogonal date-time classes (durations, time points, zoned-times, and calendars) that partition responsibilities so that the complexities of time zones are only considered when they are really needed. Capabilities include: date-time parsing, formatting, arithmetic, extraction and updating of components, and rounding.

Maintained by Davis Vaughan. Last updated 13 hours ago.

cpp

7.1 match 106 stars 14.48 score 296 scripts 407 dependents

sfcheung

semlbci:Likelihood-Based Confidence Interval in Structural Equation Models

Forms likelihood-based confidence intervals (LBCIs) for parameters in structural equation modeling, introduced in Cheung and Pesigan (2023) <doi:10.1080/10705511.2023.2183860>. Currently implements the algorithm illustrated by Pek and Wu (2018) <doi:10.1037/met0000163>, and supports the robust LBCI proposed by Falk (2018) <doi:10.1080/10705511.2017.1367254>.

Maintained by Shu Fai Cheung. Last updated 2 months ago.

confidence-intervals lavaan likelihood-based profile-likelihood structural-equation-modeling

17.0 match 1 stars 5.93 score 188 scripts

mihai-sysbio

glpkAPI:R Interface to C API of GLPK

R Interface to C API of GLPK, depends on GLPK Version >= 4.42.

Maintained by Mihail Anton. Last updated 2 years ago.

glpk

16.9 match 5.96 score 51 scripts 12 dependents

bioc

CeTF:Coexpression for Transcription Factors using Regulatory Impact Factors and Partial Correlation and Information Theory analysis

This package provides the necessary functions for performing the Partial Correlation coefficient with Information Theory (PCIT) (Reverter and Chan 2008) and Regulatory Impact Factors (RIF) (Reverter et al. 2010) algorithms. The PCIT algorithm identifies meaningful correlations to define edges in a weighted network and can be applied to any correlation-based network including but not limited to gene co-expression networks, while the RIF algorithm identifies critical Transcription Factors (TF) from gene expression data. These two algorithms, when combined, provide a very relevant layer of information for gene expression studies (Microarray, RNA-seq and single-cell RNA-seq data).

Maintained by Carlos Alberto Oliveira de Biagi Junior. Last updated 5 months ago.

sequencing rnaseq microarray geneexpression transcription normalization differentialexpression singlecell network regression chipseq immunooncology coverage cpp

23.1 match 4.30 score 9 scripts

knickodem

kfa:K-Fold Cross Validation for Factor Analysis

Provides functions to identify plausible and replicable factor structures for a set of variables via k-fold cross validation. The process combines the exploratory and confirmatory factor analytic approach to scale development (Flora & Flake, 2017) <doi:10.1037/cbs0000069> with a cross validation technique that maximizes the available data (Hastie, Tibshirani, & Friedman, 2009) <isbn:978-0-387-21606-5>. Also available are functions to determine k by drawing on power analytic techniques for covariance structures (MacCallum, Browne, & Sugawara, 1996) <doi:10.1037/1082-989X.1.2.130>, generate model syntax, and summarize results in a report.

Maintained by Kyle Nickodem. Last updated 1 years ago.

cross-validation factor-analysis psychometrics scale-development

28.0 match 7 stars 3.54 score 7 scripts

bioc

netZooR:Unified methods for the inference and analysis of gene regulatory networks

netZooR unifies the implementations of several Network Zoo methods (netzoo, netzoo.github.io) into a single package by creating interfaces between network inference and network analysis methods. Currently, the package has 3 methods for network inference including PANDA and its optimized implementation OTTER (network reconstruction using multiple lines of biological evidence), LIONESS (single-sample network inference), and EGRET (genotype-specific networks). Network analysis methods include CONDOR (community detection), ALPACA (differential community detection), CRANE (significance estimation of differential modules), MONSTER (estimation of network transition states). In addition, YARN allows processing gene expression data for tissue-specific analyses and SAMBAR infers missing mutation data based on pathway information.

Maintained by Tara Eicher. Last updated 8 days ago.

networkinference network generegulation geneexpression transcription microarray graphandnetwork gene-regulatory-network transcription-factors

12.4 match 105 stars 7.98 score

bioc

ReducedExperiment:Containers and tools for dimensionally-reduced -omics representations

Provides SummarizedExperiment-like containers for storing and manipulating dimensionally-reduced assay data. The ReducedExperiment classes allow users to simultaneously manipulate their original dataset and their decomposed data, in addition to other method-specific outputs like feature loadings. Implements utilities and specialised classes for the application of stabilised independent component analysis (sICA) and weighted gene correlation network analysis (WGCNA).

Maintained by Jack Gisby. Last updated 2 months ago.

geneexpression infrastructure datarepresentation software dimensionreduction network bioconductor-package bioinformatics dimensionality-reduction

19.1 match 3 stars 5.18 score 8 scripts

tidymodels

embed:Extra Recipes for Encoding Predictors

Predictors can be converted to one or more numeric representations using a variety of methods. Effect encodings using simple generalized linear models <doi:10.48550/arXiv.1611.09477> or nonlinear models <doi:10.48550/arXiv.1604.06737> can be used. There are also functions for dimension reduction and other approaches.

Maintained by Emil Hvitfeldt. Last updated 1 months ago.

10.6 match 142 stars 9.35 score 1.1k scripts

lrberge

fixest:Fast Fixed-Effects Estimations

Fast and user-friendly estimation of econometric models with multiple fixed-effects. Includes ordinary least squares (OLS), generalized linear models (GLM) and the negative binomial. The core of the package is based on optimized parallel C++ code, scaling especially well for large data sets. The method to obtain the fixed-effects coefficients is based on Berge (2018) <https://github.com/lrberge/fixest/blob/master/_DOCS/FENmlm_paper.pdf>. Further provides tools to export and view the results of several estimations with intuitive design to cluster the standard-errors.

Maintained by Laurent Berge. Last updated 7 months ago.

cpp openmp

6.7 match 387 stars 14.69 score 3.8k scripts 25 dependents

biodiverse

spOccupancy:Single-Species, Multi-Species, and Integrated Spatial Occupancy Models

Fits single-species, multi-species, and integrated non-spatial and spatial occupancy models using Markov Chain Monte Carlo (MCMC). Models are fit using Polya-Gamma data augmentation detailed in Polson, Scott, and Windle (2013) <doi:10.1080/01621459.2013.829001>. Spatial models are fit using either Gaussian processes or Nearest Neighbor Gaussian Processes (NNGP) for large spatial datasets. Details on NNGP models are given in Datta, Banerjee, Finley, and Gelfand (2016) <doi:10.1080/01621459.2015.1044091> and Finley, Datta, and Banerjee (2022) <doi:10.18637/jss.v103.i05>. Provides functionality for data integration of multiple single-species occupancy data sets using a joint likelihood framework. Details on data integration are given in Miller, Pacifici, Sanderlin, and Reich (2019) <doi:10.1111/2041-210X.13110>. Details on single-species and multi-species models are found in MacKenzie, Nichols, Lachman, Droege, Royle, and Langtimm (2002) <doi:10.1890/0012-9658(2002)083[2248:ESORWD]2.0.CO;2> and Dorazio and Royle <doi:10.1198/016214505000000015>, respectively.

Maintained by Jeffrey Doser. Last updated 20 days ago.

openblas cpp openmp

13.4 match 59 stars 7.31 score 204 scripts

bioc

motifbreakR:A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites

We introduce motifbreakR, which allows the biologist to judge in the first place whether the sequence surrounding the polymorphism is a good match, and in the second place how much information is gained or lost in one allele of the polymorphism relative to another. MotifbreakR is more flexible and extensible than previous offerings, giving users a choice of algorithms for interrogating genomes with motifs from public sources: 1) a weighted-sum probability matrix, 2) log-probabilities, and 3) weighting by relative entropy. MotifbreakR can predict effects for novel or previously described variants in public databases, making it suitable for tasks beyond the scope of its original design. Lastly, it can be used to interrogate any genome curated within Bioconductor (currently there are 32 species, a total of 109 versions).

Maintained by Simon Gert Coetzee. Last updated 5 months ago.

chipseq visualization motifannotation transcription

10.8 match 28 stars 8.96 score 103 scripts

davidbajnai

isogeochem:Tools for Stable Isotope Geochemistry

This toolbox makes working with oxygen, carbon, and clumped isotope data reproducible and straightforward. Use it to quickly calculate isotope fractionation factors, and apply paleothermometry equations.

Maintained by David Bajnai. Last updated 2 years ago.

carbonate clumped geochemistry geology isotope

19.9 match 7 stars 4.85 score 1 scripts

bayesball

LearnBayes:Learning Bayesian Inference

Contains functions for summarizing basic one and two parameter posterior distributions and predictive distributions. It contains MCMC algorithms for summarizing posterior distributions defined by the user. It also contains functions for regression models, hierarchical models, Bayesian tests, and illustrations of Gibbs sampling.

Maintained by Jim Albert. Last updated 7 years ago.

8.5 match 38 stars 11.34 score 690 scripts 31 dependents

raicheg

nFactors:Parallel Analysis and Other Non Graphical Solutions to the Cattell Scree Test

Indices, heuristics and strategies to help determine the number of factors/components to retain: 1. Acceleration factor (af with or without Parallel Analysis); 2. Optimal Coordinates (noc with or without Parallel Analysis); 3. Parallel analysis (components, factors and bootstrap); 4. lambda > mean(lambda) (Kaiser, CFA and related); 5. Cattell-Nelson-Gorsuch (CNG); 6. Zoski and Jurs multiple regression (b, t and p); 7. Zoski and Jurs standard error of the regression coefficient (sescree); 8. Nelson R2; 9. Bartlett khi-2; 10. Anderson khi-2; 11. Lawley khi-2 and 12. Bentler-Yuan khi-2.

Maintained by Gilles Raiche. Last updated 2 years ago.

17.5 match 5.46 score 498 scripts 4 dependents

rmheiberger

HH:Statistical Analysis and Data Display: Heiberger and Holland

Support software for Statistical Analysis and Data Display (Second Edition, Springer, ISBN 978-1-4939-2121-8, 2015) and (First Edition, Springer, ISBN 0-387-40270-5, 2004) by Richard M. Heiberger and Burt Holland. This contemporary presentation of statistical methods features extensive use of graphical displays for exploring data and for displaying the analysis. The second edition includes redesigned graphics and additional chapters. The authors emphasize how to construct and interpret graphs, discuss principles of graphical design, and show how accompanying traditional tabular results are used to confirm the visual impressions derived directly from the graphs. Many of the graphical formats are novel and appear here for the first time in print. All chapters have exercises. All functions introduced in the book are in the package. R code for all examples, both graphs and tables, in the book is included in the scripts directory of the package.

Maintained by Richard M. Heiberger. Last updated 1 months ago.

14.8 match 3 stars 6.42 score 752 scripts 5 dependents

insightsengineering

tern:Create Common TLGs Used in Clinical Trials

Table, Listings, and Graphs (TLG) library for common outputs used in clinical trials.

Maintained by Joe Zhu. Last updated 2 months ago.

clinical-trials graphs listings nest outputs tables

7.4 match 79 stars 12.62 score 186 scripts 9 dependents

jwood000

RcppBigIntAlgos:Factor Big Integers with the Parallel Quadratic Sieve

Features the multiple polynomial quadratic sieve (MPQS) algorithm for factoring large integers and a vectorized factoring function that returns the complete factorization of an integer. The MPQS is based off of the seminal work of Carl Pomerance (1984) <doi:10.1007/3-540-39757-4_17> along with the modification of multiple polynomials introduced by Peter Montgomery and J. Davis as outlined by Robert D. Silverman (1987) <doi:10.1090/S0025-5718-1987-0866119-8>. Utilizes the C library GMP (GNU Multiple Precision Arithmetic). For smaller integers, a simple Elliptic Curve algorithm is attempted followed by a constrained version of Pollard's rho algorithm. The Pollard's rho algorithm is the same algorithm used by the factorize function in the 'gmp' package.

Maintained by Joseph Wood. Last updated 9 months ago.

algorithm gmp integer-factorization mpqs prime-factorizations primes quadratic-sieve quadratic-sieve-algorithm cpp

24.4 match 13 stars 3.81 score 8 scripts

briencj

asremlPlus:Augments 'ASReml-R' in Fitting Mixed Models and Packages Generally in Exploring Prediction Differences

Assists in automating the selection of terms to include in mixed models when 'asreml' is used to fit the models. Procedures are available for choosing models that conform to the hierarchy or marginality principle, for fitting and choosing between two-dimensional spatial models using correlation, natural cubic smoothing spline and P-spline models. A history of the fitting of a sequence of models is kept in a data frame. Also used to compute functions and contrasts of, to investigate differences between, and to plot predictions obtained using any model fitting function. The content falls into the following natural groupings: (i) Data, (ii) Model modification functions, (iii) Model selection and description functions, (iv) Model diagnostics and simulation functions, (v) Prediction production and presentation functions, (vi) Response transformation functions, (vii) Object manipulation functions, and (viii) Miscellaneous functions (for further details see 'asremlPlus-package' in help). The 'asreml' package provides a computationally efficient algorithm for fitting a wide range of linear mixed models using Residual Maximum Likelihood. It is a commercial package and a license for it can be purchased from 'VSNi' <https://vsni.co.uk/> as 'asreml-R', who will supply a zip file for local installation/updating (see <https://asreml.kb.vsni.co.uk/>). It is not needed for functions that are methods for 'alldiffs' and 'data.frame' objects. The package 'asremlPlus' can also be installed from <http://chris.brien.name/rpackages/>.

Maintained by Chris Brien. Last updated 26 days ago.

asreml mixed-models

10.0 match 19 stars 9.34 score 200 scripts

cran

mgcv:Mixed GAM Computation Vehicle with Automatic Smoothness Estimation

Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, 'JAGS' support and distributions beyond the exponential family.

Maintained by Simon Wood. Last updated 1 year ago.

openblas openmp

7.3 match 32 stars 12.71 score 17k scripts 7.8k dependents

robustport

facmodTS:Time Series Models for Asset Returns

Supports teaching methods of estimating and testing time series models for use in robust portfolio construction and analysis. Unique in providing not only classical least squares, but also modern robust model fitting methods which are not much influenced by outliers. Includes returns and risk decompositions, with user choice of standard deviation, value-at-risk, and expected shortfall risk measures. "Robust Statistics Theory and Methods (with R)", R. A. Maronna, R. D. Martin, V. J. Yohai, M. Salibian-Barrera (2019) <doi:10.1002/9781119214656>.

Maintained by Doug Martin. Last updated 7 days ago.

30.9 match 1 star 3.00 score

mmaechler

sfsmisc:Utilities from 'Seminar fuer Statistik' ETH Zurich

Useful utilities ['goodies'] from Seminar fuer Statistik ETH Zurich, some of which were ported from S-plus in the 1990s. For graphics, have pretty (Log-scale) axes eaxis(), an enhanced Tukey-Anscombe plot, combining histogram and boxplot, 2d-residual plots, a 'tachoPlot()', pretty arrows, etc. For robustness, have a robust F test and robust range(). For system support, notably on Linux, provides 'Sys.*()' functions with more access to system and CPU information. Finally, miscellaneous utilities such as simple efficient prime numbers, integer codes, Duplicated(), toLatex.numeric() and is.whole().

Maintained by Martin Maechler. Last updated 5 months ago.

8.4 match 11 stars 10.87 score 566 scripts 119 dependents

bioc

netresponse:Functional Network Analysis

Algorithms for functional network analysis. Includes an implementation of a variational Dirichlet process Gaussian mixture model for nonparametric mixture modeling.

Maintained by Leo Lahti. Last updated 5 months ago.

cellbiology clustering geneexpression genetics network graphandnetwork differentialexpression microarray networkinference transcription

16.3 match 3 stars 5.64 score 21 scripts

projectmosaic

mosaic:Project MOSAIC Statistics and Mathematics Teaching Utilities

Data sets and utilities from Project MOSAIC (<http://www.mosaic-web.org>) used to teach mathematics, statistics, computation and modeling. Funded by the NSF, Project MOSAIC is a community of educators working to tie together aspects of quantitative work that students in science, technology, engineering and mathematics will need in their professional lives, but which are usually taught in isolation, if at all.

Maintained by Randall Pruim. Last updated 1 year ago.

6.9 match 93 stars 13.32 score 7.2k scripts 7 dependents

tdjorgensen

simsem:SIMulated Structural Equation Modeling

Provides an easy framework for Monte Carlo simulation in structural equation modeling, which can be used for various purposes, such as model fit evaluation, power analysis, or missing data handling and planning.

Maintained by Terrence D. Jorgensen. Last updated 4 years ago.

26.8 match 3.40 score 276 scripts

juba

questionr:Functions to Make Surveys Processing Easier

Set of functions to make the processing and analysis of surveys easier: interactive shiny apps and addins for data recoding, contingency tables, dataset metadata handling, and several convenience functions.

Maintained by Julien Barnier. Last updated 2 years ago.

7.2 match 83 stars 12.55 score 1.1k scripts 19 dependents

sebkrantz

dfms:Dynamic Factor Models

Efficient estimation of Dynamic Factor Models using the Expectation Maximization (EM) algorithm or Two-Step (2S) estimation, supporting datasets with missing data. The estimation options follow advances in the econometric literature: either running the Kalman Filter and Smoother once with initial values from PCA - 2S estimation as in Doz, Giannone and Reichlin (2011) <doi:10.1016/j.jeconom.2011.02.012> - or via iterated Kalman Filtering and Smoothing until EM convergence - following Doz, Giannone and Reichlin (2012) <doi:10.1162/REST_a_00225> - or using the adapted EM algorithm of Banbura and Modugno (2014) <doi:10.1002/jae.2306>, allowing arbitrary patterns of missing data. The implementation makes heavy use of the 'Armadillo' 'C++' library and the 'collapse' package, providing for particularly speedy estimation. A comprehensive set of methods supports interpretation and visualization of the model as well as forecasting. Information criteria to choose the number of factors are also provided - following Bai and Ng (2002) <doi:10.1111/1468-0262.00273>.

Maintained by Sebastian Krantz. Last updated 6 months ago.

dynamic-factor-models time-series openblas cpp

16.2 match 31 stars 5.57 score 12 scripts

cran

sna:Tools for Social Network Analysis

A range of tools for social network analysis, including node and graph-level indices, structural distance and covariance methods, structural equivalence detection, network regression, random graph generation, and 2D/3D network visualization.

Maintained by Carter T. Butts. Last updated 6 months ago.

13.2 match 8 stars 6.78 score 94 dependents

stan-dev

posterior:Tools for Working with Posterior Distributions

Provides useful tools for both users and developers of packages for fitting Bayesian models or working with output from Bayesian models. The primary goals of the package are to: (a) Efficiently convert between many different useful formats of draws (samples) from posterior or prior distributions. (b) Provide consistent methods for operations commonly performed on draws, for example, subsetting, binding, or mutating draws. (c) Provide various summaries of draws in convenient formats. (d) Provide lightweight implementations of state of the art posterior inference diagnostics. References: Vehtari et al. (2021) <doi:10.1214/20-BA1221>.

Maintained by Paul-Christian Bürkner. Last updated 9 days ago.

bayes bayesian mcmc

5.5 match 168 stars 16.13 score 3.3k scripts 342 dependents

berndbischl

BBmisc:Miscellaneous Helper Functions for B. Bischl

Miscellaneous helper functions for and from B. Bischl and some other guys, mainly for package development.

Maintained by Bernd Bischl. Last updated 2 years ago.

8.5 match 20 stars 10.59 score 980 scripts 69 dependents

hwborchers

pracma:Practical Numerical Math Functions

Provides a large number of functions from numerical analysis and linear algebra, numerical optimization, differential equations, time series, plus some well-known special mathematical functions. Uses 'MATLAB' function names where appropriate to simplify porting.

Maintained by Hans W. Borchers. Last updated 1 year ago.

7.3 match 29 stars 12.34 score 6.6k scripts 931 dependents

tidyverse

readr:Read Rectangular Text Data

The goal of 'readr' is to provide a fast and friendly way to read rectangular data (like 'csv', 'tsv', and 'fwf'). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.

Maintained by Jennifer Bryan. Last updated 8 months ago.

csv fwf parsing cpp

4.3 match 1.0k stars 21.03 score 132k scripts 2.0k dependents

revelle

psychTools:Tools to Accompany the 'psych' Package for Psychological Research

Support functions, data sets, and vignettes for the 'psych' package. Contains several of the biggest data sets for the 'psych' package as well as four vignettes. A few helper functions for file manipulation are included as well. For more information, see the <https://personality-project.org/r/> web page.

Maintained by William Revelle. Last updated 12 months ago.

15.4 match 5.80 score 178 scripts 5 dependents

cbhurley

bullseye:Visualising Multiple Pairwise Variable Correlations and Other Scores

We provide a tidy data structure and visualisations for multiple or grouped variable correlations, general association measures, scagnostics, and other pairwise scores suitable for numerical, ordinal and nominal variables. Supported measures include distance correlation, maximal information, ace correlation, Kendall's tau, and polychoric correlation.

Maintained by Catherine Hurley. Last updated 8 days ago.

15.9 match 2 stars 5.58 score 14 scripts

bioc

scuttle:Single-Cell RNA-Seq Analysis Utilities

Provides basic utility functions for performing single-cell analyses, focusing on simple normalization, quality control and data transformations. Also provides some helper functions to assist development of other packages.

Maintained by Aaron Lun. Last updated 5 months ago.

immunooncology singlecell rnaseq qualitycontrol preprocessing normalization transcriptomics geneexpression sequencing software dataimport openblas cpp

8.7 match 10.21 score 1.7k scripts 80 dependents

grvanderploeg

parafac4microbiome:Parallel Factor Analysis Modelling of Longitudinal Microbiome Data

Creation and selection of PARAllel FACtor Analysis (PARAFAC) models of longitudinal microbiome data. You can import your own data with our import functions or use one of the example datasets to create your own PARAFAC models. Selection of the optimal number of components can be done using assessModelQuality() and assessModelStability(). The selected model can then be plotted using plotPARAFACmodel(). The Parallel Factor Analysis method was originally described by Carroll and Chang (1970) <doi:10.1007/BF02310791> and Harshman (1970) <https://www.psychology.uwo.ca/faculty/harshman/wpppfac0.pdf>.

Maintained by Geert Roelof van der Ploeg. Last updated 19 days ago.

dimensionality-reduction microbiome microbiome-data multiway multiway-algorithms parallel-factor-analysis

14.0 match 6 stars 6.31 score 13 scripts

bioc

decoupleR:decoupleR: Ensemble of computational methods to infer biological activities from omics data

Many methods allow us to extract biological activities from omics data using information from prior knowledge resources, reducing the dimensionality for increased statistical power and better interpretability. Here, we present decoupleR, a Bioconductor package containing different statistical methods to extract these signatures within a unified framework. decoupleR allows the user to flexibly test any method with any resource. It incorporates methods that take into account the sign and weight of network interactions. decoupleR can be used with any omic, as long as its features can be linked to a biological process based on prior knowledge. For example, in transcriptomics gene sets regulated by a transcription factor, or in phospho-proteomics phosphosites that are targeted by a kinase.

Maintained by Pau Badia-i-Mompel. Last updated 5 months ago.

differentialexpression functionalgenomics geneexpression generegulation network software statisticalmethod transcription

7.8 match 230 stars 11.27 score 316 scripts 3 dependents

sdctools

sdcMicro:Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation

Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. The theoretical basis for the methods implemented can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface, published in Meindl and Templ (2019) <doi:10.3390/a12090191>, that provides access to various methods of this package.

Maintained by Matthias Templ. Last updated 25 days ago.

cpp

8.8 match 83 stars 9.89 score 258 scripts

danielebizzarri

MiMIR:Metabolomics-Based Models for Imputing Risk

Provides an intuitive framework for ad-hoc statistical analysis of 1H-NMR metabolomics by Nightingale Health. It makes it easy to explore new metabolomics measurements assayed by Nightingale Health, comparing the distributions with a large Consortium (BBMRI-nl); to project previously published metabolic scores [<doi:10.1016/j.ebiom.2021.103764>, <doi:10.1161/CIRCGEN.119.002610>, <doi:10.1038/s41467-019-11311-9>, <doi:10.7554/eLife.63033>, <doi:10.1161/CIRCULATIONAHA.114.013116>, <doi:10.1007/s00125-019-05001-w>]; and to calibrate the metabolic surrogate values to a desired dataset.

Maintained by Daniele Bizzarri. Last updated 2 years ago.

binary-risk-factors biomarkers linear-regression metabolites metabolomics nightingale-metabolomics risk-factor-models risk-factors surrogate-models

20.8 match 8 stars 4.11 score 32 scripts

bschneidr

svrep:Tools for Creating, Updating, and Analyzing Survey Replicate Weights

Provides tools for creating and working with survey replicate weights, extending functionality of the 'survey' package from Lumley (2004) <doi:10.18637/jss.v009.i08>. Implements bootstrap methods for complex surveys, including the generalized survey bootstrap as described by Beaumont and Patak (2012) <doi:10.1111/j.1751-5823.2011.00166.x>. Methods are provided for applying nonresponse adjustments to both full-sample and replicate weights as described by Rust and Rao (1996) <doi:10.1177/096228029600500305>. Implements methods for sample-based calibration described by Opsomer and Erciulescu (2021) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2021002/article/00006-eng.htm>. Diagnostic functions are included to compare weights and weighted estimates from different sets of replicate weights.

Maintained by Ben Schneider. Last updated 5 days ago.

10.4 match 8 stars 8.12 score 54 scripts 3 dependents

georgiosseitidis

ssifs:Stochastic Search Inconsistency Factor Selection

Evaluating the consistency assumption of Network Meta-Analysis both globally and locally in the Bayesian framework. Inconsistencies are located by applying Bayesian variable selection to the inconsistency factors. The implementation of the method is described by Seitidis et al. (2022) <arXiv:2211.07258>.

Maintained by Georgios Seitidis. Last updated 2 months ago.

consistency metaanalysis network nma ssifs ssvs variable-selection jags cpp

16.7 match 2 stars 5.08 score 4 scripts

eh-in-r

RCTS:Clustering Time Series While Resisting Outliers

Robust Clustering of Time Series (RCTS) has the functionality to cluster time series using both the classical and the robust interactive fixed effects framework. The classical framework is developed in Ando & Bai (2017) <doi:10.1080/01621459.2016.1195743>. The implementation within this package excludes the SCAD-penalty on the estimations of beta. This robust framework is developed in Boudt & Heyndels (2022) <doi:10.1016/j.ecosta.2022.01.002> and is made robust against different kinds of outliers. The algorithm iteratively updates beta (the coefficients of the observable variables), group membership, and the latent factors (which can be common and/or group-specific) along with their loadings. The number of groups and factors can be estimated if they are unknown.

Maintained by Ewoud Heyndels. Last updated 2 years ago.

42.0 match 2.00 score

cfwp

FMradio:Factor Modeling for Radiomics Data

Functions that support stable prediction and classification with radiomics data through factor-analytic modeling. For details, see Peeters et al. (2019) <arXiv:1903.11696>.

Maintained by Carel F.W. Peeters. Last updated 5 years ago.

factor-analysis machine-learning radiomics

22.4 match 11 stars 3.74 score 2 scripts

r-forge

car:Companion to Applied Regression

Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, 2019.

Maintained by John Fox. Last updated 5 months ago.

5.5 match 15.29 score 43k scripts 901 dependents

gzt

CholWishart:Cholesky Decomposition of the Wishart Distribution

Sampling from the Cholesky factorization of a Wishart random variable, sampling from the inverse Wishart distribution, sampling from the Cholesky factorization of an inverse Wishart random variable, sampling from the pseudo Wishart distribution, sampling from the generalized inverse Wishart distribution, computing densities for the Wishart and inverse Wishart distributions, and computing the multivariate gamma and digamma functions. Provides a header file so the C functions can be called directly from other programs.

Maintained by Geoffrey Thompson. Last updated 6 months ago.

cholesky-decomposition cholesky-factorization digamma-functions gamma multivariate pseudo-wishart wishart wishart-distributions openblas

11.7 match 7 stars 7.05 score 41 scripts 13 dependents

easystats

parameters:Processing of Model Parameters

Utilities for processing the parameters of various statistical models. Beyond computing p values, CIs, and other indices for a wide variety of models (see list of supported models using the function 'insight::supported_models()'), this package implements features like bootstrapping or simulating of parameters and models, feature reduction (feature extraction and variable selection) as well as functions to describe data and variable characteristics (e.g. skewness, kurtosis, smoothness or distribution).

Maintained by Daniel Lüdecke. Last updated 19 hours ago.

beta bootstrap ci confidence-intervals data-reduction easystats fa feature-extraction feature-reduction hacktoberfest parameters pca pvalues regression-models robust-statistics standardize standardized-estimates statistical-models

5.3 match 453 stars 15.65 score 1.8k scripts 56 dependents

hturner

gnm:Generalized Nonlinear Models

Functions to specify and fit generalized nonlinear models, including models with multiplicative interaction terms such as the UNIDIFF model from sociology and the AMMI model from crop science, and many others. Over-parameterized representations of models are used throughout; functions are provided for inference on estimable parameter combinations, as well as standard methods for diagnostics etc.

Maintained by Heather Turner. Last updated 1 year ago.

generalized-linear-models generalized-nonlinear-models statistical-models openblas

7.8 match 16 stars 10.51 score 290 scripts 21 dependents

chr1swallace

coloc:Colocalisation Tests of Two Genetic Traits

Performs the colocalisation tests described in Giambartolomei et al (2013) <doi:10.1371/journal.pgen.1004383>, Wallace (2020) <doi:10.1371/journal.pgen.1008720>, Wallace (2021) <doi:10.1371/journal.pgen.1009440>.

Maintained by Chris Wallace. Last updated 4 months ago.

6.7 match 162 stars 12.23 score 916 scripts 3 dependents

dsy109

tolerance:Statistical Tolerance Intervals and Regions

Statistical tolerance limits provide the limits between which we can expect to find a specified proportion of a sampled population with a given level of confidence. This package provides functions for estimating tolerance limits (intervals) for various univariate distributions (binomial, Cauchy, discrete Pareto, exponential, two-parameter exponential, extreme value, hypergeometric, Laplace, logistic, negative binomial, negative hypergeometric, normal, Pareto, Poisson-Lindley, Poisson, uniform, and Zipf-Mandelbrot), Bayesian normal tolerance limits, multivariate normal tolerance regions, nonparametric tolerance intervals, tolerance bands for regression settings (linear regression, nonlinear regression, nonparametric regression, and multivariate regression), and analysis of variance tolerance intervals. Visualizations are also available for most of these settings.

Maintained by Derek S. Young. Last updated 9 months ago.

tolerance-intervals

12.6 match 4 stars 6.39 score 153 scripts 7 dependents

omarwagih

ggseqlogo:A 'ggplot2' Extension for Drawing Publication-Ready Sequence Logos

The extensive range of functions provided by this package makes it possible to draw highly versatile sequence logos. Features include, but are not limited to, modifying colour schemes and fonts used to draw the logo, generating multiple logo plots, and aiding the visualisation with annotations. Sequence logos can easily be combined with other 'ggplot2' plots.

Maintained by Omar Wagih. Last updated 5 months ago.

7.0 match 211 stars 11.48 score 786 scripts 13 dependents

bioc

TFutils:TFutils

This package helps users to work with TF metadata from various sources. Significant catalogs of TFs and classifications thereof are made available. Tools for working with motif scans are also provided.

Maintained by Vincent Carey. Last updated 4 months ago.

transcriptomics

16.7 match 4.80 score 21 scripts

quentingronau

bridgesampling:Bridge Sampling for Marginal Likelihoods and Bayes Factors

Provides functions for estimating marginal likelihoods, Bayes factors, posterior model probabilities, and normalizing constants in general, via different versions of bridge sampling (Meng & Wong, 1996, <https://www3.stat.sinica.edu.tw/statistica/j6n4/j6n43/j6n43.htm>). Gronau, Singmann, & Wagenmakers (2020) <doi:10.18637/jss.v092.i10>.

Maintained by Quentin F. Gronau. Last updated 2 years ago.

6.6 match 32 stars 12.12 score 314 scripts 53 dependents

hdarjus

sparvaride:Variance Identification in Sparse Factor Analysis

This is an implementation of the algorithm described in Section 3 of Hosszejni and Frühwirth-Schnatter (2022) <doi:10.48550/arXiv.2211.00671>. The algorithm is used to verify that the counting rule CR(r,1) holds for the sparsity pattern of the transpose of a factor loading matrix. As detailed in Section 2 of the same paper, if CR(r,1) holds, then the idiosyncratic variances are generically identified. If CR(r,1) does not hold, then we do not know whether the idiosyncratic variances are identified or not.

Maintained by Darjus Hosszejni. Last updated 2 years ago.

econometrics factor-analysis latent-factors parameter-identification cpp

21.4 match 1 star 3.70 score 4 scripts

bioc

IRanges:Foundation of integer range manipulation in Bioconductor

Provides efficient low-level and highly reusable S4 classes for storing, manipulating and aggregating over annotated ranges of integers. Implements an algebra of range operations, including efficient algorithms for finding overlaps and nearest neighbors. Defines efficient list-like classes for storing, transforming and aggregating large grouped data, i.e., collections of atomic vectors and DataFrames.

Maintained by Hervé Pagès. Last updated 1 month ago.

infrastructure datarepresentation bioconductor-package core-package

5.3 match 22 stars 15.09 score 2.1k scripts 1.8k dependents

ashenoy-cmbi

grafify:Easy Graphs for Data Visualisation and Linear Models for ANOVA

Easily explore data by plotting graphs with a few lines of code. Use these ggplot() wrappers to quickly draw graphs of scatter/dots with box-whiskers, violins or SD error bars, data distributions, before-after graphs, factorial ANOVA and more. Customise graphs in many ways, for example, by choosing from colour blind-friendly palettes (12 discrete, 3 continuous and 2 divergent palettes). Use the simple code for ANOVA as ordinary (lm()) or mixed-effects linear models (lmer()), including randomised-block or repeated-measures designs, and fit non-linear outcomes as a generalised additive model (gam) using mgcv(). Obtain estimated marginal means and perform post-hoc comparisons on fitted models (via emmeans()). Also includes small datasets for practising code and teaching basics before users move on to more complex designs. See vignettes for details on usage <https://grafify.shenoylab.com/>. Citation: <doi:10.5281/zenodo.5136508>.

Maintained by Avinash R Shenoy. Last updated 2 days ago.

ggplot2 linear-models post-hoc-comparisons statistics vignettes

14.9 match 48 stars 5.31 score 107 scripts

cdeager

standardize:Tools for Standardizing Variables for Regression in R

Tools which allow regression variables to be placed on similar scales, offering computational benefits as well as easing interpretation of regression output.

Maintained by Christopher D. Eager. Last updated 4 years ago.

12.1 match 23 stars 6.50 score 92 scripts 1 dependent

pbreheny

grpreg:Regularization Paths for Regression Models with Grouped Covariates

Efficient algorithms for fitting the regularization path of linear regression, GLM, and Cox regression models with grouped penalties. This includes group selection methods such as group lasso, group MCP, and group SCAD as well as bi-level selection methods such as the group exponential lasso, the composite MCP, and the group bridge. For more information, see Breheny and Huang (2009) <doi:10.4310/sii.2009.v2.n3.a10>, Huang, Breheny, and Ma (2012) <doi:10.1214/12-sts392>, Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2>, and Breheny (2015) <doi:10.1111/biom.12300>, or visit the package homepage <https://pbreheny.github.io/grpreg/>.

Maintained by Patrick Breheny. Last updated 15 days ago.

6.9 match 34 stars 11.38 score 192 scripts 34 dependents

kwb-r

kwb.utils:General Utility Functions Developed at KWB

This package contains some small helper functions that aim at improving the quality of code developed at Kompetenzzentrum Wasser gGmbH (KWB).

Maintained by Hauke Sonnenberg. Last updated 12 months ago.

10.4 match 8 stars 7.33 score 12 scripts 78 dependents

bioc

TFBSTools:Software Package for Transcription Factor Binding Site (TFBS) Analysis

TFBSTools is a package for the analysis and manipulation of transcription factor binding sites. It includes matrix conversion between Position Frequency Matrix (PFM), Position Weight Matrix (PWM) and Information Content Matrix (ICM). It can also scan putative TFBS from sequence/alignment, query the JASPAR database, and provide a wrapper for de novo motif discovery software.

Maintained by Ge Tan. Last updated 3 days ago.

motifannotation generegulation motifdiscovery transcription alignment

6.1 match 28 stars 12.36 score 1.1k scripts 18 dependents

bioc

TFHAZ:Transcription Factor High Accumulation Zones

It finds transcription factor (TF) high accumulation DNA zones, i.e., regions along the genome where there is a high presence of different transcription factors. Starting from a dataset containing the genomic positions of TF binding regions, for each base of the selected chromosome the accumulation of TFs is computed. Three different types of accumulation (TF, region and base accumulation) are available, together with the possibility of considering, in the single base accumulation computing, the TFs present not only in that single base, but also in its neighborhood, within a window of a given width. Two different methods for the search of TF high accumulation DNA zones, called "binding regions" and "overlaps", are available. In addition, some functions are provided in order to analyze, visualize and compare results obtained with different input parameters.

Maintained by Gaia Ceddia. Last updated 5 months ago.

software biologicalquestion transcription chipseq coverage

18.9 match 4.00 score 2 scripts

cran

MASS:Support Functions and Datasets for Venables and Ripley's MASS

Functions and datasets to support Venables and Ripley, "Modern Applied Statistics with S" (4th edition, 2002).

Maintained by Brian Ripley. Last updated 15 days ago.

7.2 match 19 stars 10.53 score 11k dependents

bioc

RcisTarget:RcisTarget Identify transcription factor binding motifs enriched on a list of genes or genomic regions

RcisTarget identifies transcription factor binding motifs (TFBS) over-represented on a gene list. In a first step, RcisTarget selects DNA motifs that are significantly over-represented in the surroundings of the transcription start site (TSS) of the genes in the gene-set. This is achieved by using a database that contains genome-wide cross-species rankings for each motif. The motifs are then annotated to TFs, and those that have a high Normalized Enrichment Score (NES) are retained. Finally, for each motif and gene-set, RcisTarget predicts the candidate target genes (i.e. genes in the gene-set that are ranked above the leading edge).

Maintained by Gert Hulselmans. Last updated 5 months ago.

generegulation motifannotation transcriptomics transcription genesetenrichment genetarget

8.0 match 37 stars 9.47 score 191 scripts

sfcheung

semptools:Customizing Structural Equation Modelling Plots

Most functions focus on specific ways to customize a graph. They use a 'qgraph' output as the first argument, and return a modified 'qgraph' object. This allows the functions to be chained by a pipe operator.

Maintained by Shu Fai Cheung. Last updated 2 months ago.

diagram graph lavaan plot sem structural-equation-modeling

10.5 match 7 stars 7.12 score 87 scripts

radiant-rstats

radiant.data:Data Menu for Radiant: Business Analytics using R and Shiny

The Radiant Data menu includes interfaces for loading, saving, viewing, visualizing, summarizing, transforming, and combining data. It also contains functionality to generate reproducible reports of the analyses conducted in the application.

Maintained by Vincent Nijs. Last updated 5 months ago.

9.0 match 54 stars 8.30 score 146 scripts 6 dependents

mages

ChainLadder:Statistical Methods and Models for Claims Reserving in General Insurance

Various statistical methods and models which are typically used for the estimation of outstanding claims reserves in general insurance, including those to estimate the claims development result as required under Solvency II.

Maintained by Markus Gesmann. Last updated 1 month ago.

7.4 match 82 stars 10.04 score 196 scripts 2 dependents

lamho86

phylolm:Phylogenetic Linear Regression

Provides functions for fitting phylogenetic linear models and phylogenetic generalized linear models. The computation uses an algorithm that is linear in the number of tips in the tree. The package also provides functions for simulating continuous or binary traits along the tree. Other tools include functions to test the adequacy of a population tree.

Maintained by Lam Si Tung Ho. Last updated 4 months ago.

6.9 match 33 stars 10.79 score 318 scripts 14 dependents

johnfergusonnuig

graphPAF:Estimating and Displaying Population Attributable Fractions

Estimation and display of various types of population attributable fraction and impact fractions. As well as the usual calculations of attributable fractions and impact fractions, functions are provided for attributable fraction nomograms and fan plots, continuous exposures, for pathway specific population attributable fractions, and for joint, average and sequential population attributable fractions.

Maintained by John Ferguson. Last updated 7 months ago.

19.6 match 3 stars 3.78 score 6 scripts

r-lib

generics:Common S3 Generics not Provided by Base R Methods Related to Model Fitting

In order to reduce potential package dependencies and conflicts, generics provides a number of commonly used S3 generics.

Maintained by Hadley Wickham. Last updated 1 year ago.

5.3 match 61 stars 14.00 score 131 scripts 9.8k dependents

xijianzheng

coefa:Meta Analysis of Factor Analysis Based on CO-Occurrence Matrices

Provides a series of functions to conduct a meta analysis of factor analysis based on co-occurrence matrices. The tool can be used to resolve debates about factor structure (i.e. the inner structure of a construct or scale) in several disciplines, such as psychology, psychiatry, management, education, and so on. References: Shafer (2005) <doi:10.1037/1040-3590.17.3.324>; Shafer (2006) <doi:10.1002/jclp.20213>; Loeber and Schmaling (1985) <doi:10.1007/BF00910652>.

Maintained by Xijian Zheng. Last updated 2 years ago.

27.2 match 2.70 score 4 scripts

bioc

fgga:Hierarchical ensemble method based on factor graph

Package that implements the FGGA algorithm. This package provides a hierarchical ensemble method based on factor graphs for the consistent cross-ontology annotation of protein coding genes. FGGA embodies elements of predicate logic, communication theory, supervised learning and inference in graphical models.

Maintained by Flavio Spetale. Last updated 5 months ago.

software statisticalmethod classification network networkinference supportvectormachine graphandnetwork go

16.4 match 3 stars 4.48 score 6 scripts

bioc

structToolbox:Data processing & analysis tools for Metabolomics and other omics

An extensive set of data (pre-)processing and analysis methods and tools for metabolomics and other omics, with a strong emphasis on statistics and machine learning. This toolbox allows the user to build extensive and standardised workflows for data analysis. The methods and tools have been implemented using class-based templates provided by the struct (Statistics in R Using Class-based Templates) package. The toolbox includes pre-processing methods (e.g. signal drift and batch correction, normalisation, missing value imputation and scaling), univariate (e.g. ttest, various forms of ANOVA, Kruskal–Wallis test and more) and multivariate statistical methods (e.g. PCA and PLS, including cross-validation and permutation testing) as well as machine learning methods (e.g. Support Vector Machines). The STATistics Ontology (STATO) has been integrated and implemented to provide standardised definitions for the different methods, inputs and outputs.

Maintained by Gavin Rhys Lloyd. Last updated 24 days ago.

workflowstep metabolomics bioconductor-package dims lc-ms machine-learning multivariate-analysis statistics univariate

11.7 match 10 stars 6.26 score 12 scripts

rohelab

fastRG:Sample Generalized Random Dot Product Graphs in Linear Time

Samples generalized random product graphs, a generalization of a broad class of network models. Given matrices X, S, and Y with non-negative entries, samples a matrix with expectation X S Y^T and independent Poisson or Bernoulli entries using the fastRG algorithm of Rohe et al. (2017) <https://www.jmlr.org/papers/v19/17-128.html>. The algorithm first samples the number of edges and then puts them down one-by-one. As a result it is O(m) where m is the number of edges, a dramatic improvement over element-wise algorithms, which require O(n^2) operations to sample a random graph, where n is the number of nodes.

Maintained by Alex Hayes. Last updated 7 months ago.

adjacency-matrix graph-sampling latent-factors

16.1 match 5 stars 4.52 score 22 scripts
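
The two-stage sampling idea in the fastRG entry above (draw the number of edges, then place them one at a time) can be sketched in plain Python. This is a hedged illustration of the Poisson variant only, not the package's implementation: the names `fast_rg` and `poisson` are hypothetical, and the per-edge steps (pick a latent block pair, then a row and a column in proportion to X and Y) are my reading of how an expectation of X S Y^T is obtained.

```python
import math
import random

def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def fast_rg(X, S, Y, seed=0):
    """Sample edge counts whose expectation is X S Y^T (all entries non-negative).

    X: n x k1, S: k1 x k2, Y: d x k2, given as lists of lists.
    Returns a dict {(i, j): count}; counts above 1 are multi-edges.
    """
    rng = random.Random(seed)
    k1, k2 = len(S), len(S[0])
    x_cols = [[row[u] for row in X] for u in range(k1)]
    y_cols = [[row[v] for row in Y] for v in range(k2)]
    blocks = [(u, v) for u in range(k1) for v in range(k2)]
    # Expected number of edges contributed by each latent block pair.
    weights = [sum(x_cols[u]) * S[u][v] * sum(y_cols[v]) for u, v in blocks]

    m = poisson(sum(weights), rng)  # step 1: how many edges in total
    edges = {}
    for _ in range(m):              # step 2: place the edges one by one
        u, v = rng.choices(blocks, weights=weights)[0]
        i = rng.choices(range(len(X)), weights=x_cols[u])[0]
        j = rng.choices(range(len(Y)), weights=y_cols[v])[0]
        edges[(i, j)] = edges.get((i, j), 0) + 1
    return edges

# Toy example: nodes 0 and 1 load on block 0, node 2 on block 1; S is diagonal,
# so edges can only appear within a block.
X = [[1, 0], [1, 0], [0, 1]]
S = [[2, 0], [0, 1]]
edges = fast_rg(X, S, X, seed=1)
```

For brevity `rng.choices` rescans the weights on every draw, so this sketch does not achieve the O(m) cost itself; precomputed sampling tables would be needed for that.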

sestelo

npregfast:Nonparametric Estimation of Regression Models with Factor-by-Curve Interactions

A method for obtaining nonparametric estimates of regression models with or without factor-by-curve interactions using local polynomial kernel smoothers or splines. Additionally, a parametric model (allometric model) can be estimated.

Maintained by Marta Sestelo. Last updated 2 months ago.

allometric barnacle critical-points curve-interactions factor-by-curve fortran interaction nonparametric regression-models testing

12.6 match 5 stars 5.73 score 89 scripts 2 dependents

chiliubio

microeco:Microbial Community Ecology Data Analysis

A series of statistical and plotting approaches in microbial community ecology based on the R6 class. The classes are designed for data preprocessing, taxa abundance plotting, alpha diversity analysis, beta diversity analysis, differential abundance test, null model analysis, network analysis, machine learning, environmental data analysis and functional analysis.

Maintained by Chi Liu. Last updated 3 days ago.

7.1 match 219 stars 10.11 score 211 scripts 3 dependents

bioc

ccfindR:Cancer Clone Finder

A collection of tools for cancer genomic data clustering analyses, including those for single cell RNA-seq. Cell clustering and feature gene selection analysis employ Bayesian (and maximum likelihood) non-negative matrix factorization (NMF) algorithm. Input data set consists of RNA count matrix, gene, and cell bar code annotations. Analysis outputs are factor matrices for multiple ranks and marginal likelihood values for each rank. The package includes utilities for downstream analyses, including meta-gene identification, visualization, and construction of rank-based trees for clusters.

Maintained by Jun Woo. Last updated 5 months ago.

transcriptomics singlecell immunooncology bayesian clustering gsl cpp

18.0 match 4.00 score 9 scripts

slzhang-fd

mirtjml:Joint Maximum Likelihood Estimation for High-Dimensional Item Factor Analysis

Provides constrained joint maximum likelihood estimation algorithms for item factor analysis (IFA) based on multidimensional item response theory models. So far, we provide functions for exploratory and confirmatory IFA based on the multidimensional two-parameter logistic (M2PL) model for binary response data. Compared with traditional estimation methods for IFA, the methods implemented in this package scale better to data with large numbers of respondents, items, and latent factors. The computation is facilitated by the multiprocessing 'OpenMP' API. For more information, please refer to: 1. Chen, Y., Li, X., & Zhang, S. (2018). Joint Maximum Likelihood Estimation for High-Dimensional Exploratory Item Factor Analysis. Psychometrika, 1-23. <doi:10.1007/s11336-018-9646-5>; 2. Chen, Y., Li, X., & Zhang, S. (2019). Structured Latent Factor Analysis for Large-scale Data: Identifiability, Estimability, and Their Implications. Journal of the American Statistical Association, <doi:10.1080/01621459.2019.1635485>.

Maintained by Siliang Zhang. Last updated 4 years ago.

ifa item-factor-analysis large-scale-assessment parallel-computing psychometrics openblas cpp openmp

16.9 match 9 stars 4.21 score 12 scripts 1 dependent

jrmccombs

RHPCBenchmark:Benchmarks for High-Performance Computing Environments

Microbenchmarks for determining the run time performance of aspects of the R programming environment and packages relevant to high-performance computation. The benchmarks are divided into three categories: dense matrix linear algebra kernels, sparse matrix linear algebra kernels, and machine learning functionality.

Maintained by James McCombs. Last updated 8 years ago.

23.5 match 3.02 score 21 scripts

glenndavis52

colorscience:Color Science Methods and Data

Methods and data for color science - color conversions by observer, illuminant, and gamma. Color matching functions and chromaticity diagrams. Color indices, color differences, and spectral data conversion/analysis. This package is deprecated and will someday be removed; for reasons and details please see the README file.

Maintained by Glenn Davis. Last updated 11 months ago.

18.0 match 4 stars 3.93 score 214 scripts

stephenslab

mashr:Multivariate Adaptive Shrinkage

Implements the multivariate adaptive shrinkage (mash) method of Urbut et al (2019) <DOI:10.1038/s41588-018-0268-8> for estimating and testing large numbers of effects in many conditions (or many outcomes). Mash takes an empirical Bayes approach to testing and effect estimation; it estimates patterns of similarity among conditions, then exploits these patterns to improve accuracy of the effect estimates. The core linear algebra is implemented in C++ for fast model fitting and posterior computation.

Maintained by Peter Carbonetto. Last updated 4 months ago.

openblas gsl cpp openmp

6.4 match 91 stars 11.04 score 624 scripts 3 dependents

adeverse

adegraphics:An S4 Lattice-Based Package for the Representation of Multivariate Data

Graphical functionalities for the representation of multivariate data. It is a complete re-implementation of the functions available in the 'ade4' package.

Maintained by Aurélie Siberchicot. Last updated 8 months ago.

6.8 match 9 stars 10.37 score 386 scripts 6 dependents

pik-piam

mrremind:MadRat REMIND Input Data Package

The mrremind package contains data preprocessing for the REMIND model.

Maintained by Lavinia Baumstark. Last updated 2 days ago.

11.2 match 4 stars 6.25 score 15 scripts 1 dependent

ausgis

GD:Geographical Detectors for Assessing Spatial Factors

Geographical detectors for measuring spatial stratified heterogeneity, as described in Jinfeng Wang (2010) <doi:10.1080/13658810802443457> and Jinfeng Wang (2016) <doi:10.1016/j.ecolind.2016.02.052>. Includes the optimal discretization of continuous data, four primary functions of geographical detectors, comparison of size effects of spatial unit and the visualizations of results. To use the package and to refer to the descriptions of the package, methods and case datasets, please cite Yongze Song (2020) <doi:10.1080/15481603.2020.1760434>. The model has been applied in factor exploration of road performance and multi-scale spatial segmentation for network data, as described in Yongze Song (2018) <doi:10.3390/rs10111696> and Yongze Song (2020) <doi:10.1109/TITS.2020.3001193>, respectively.

Maintained by Wenbo Lv. Last updated 4 months ago.

geographical-detector spatial-stratified-heterogeneity

9.3 match 13 stars 7.49 score 51 scripts

rstudio

tfprobability:Interface to 'TensorFlow Probability'

Interface to 'TensorFlow Probability', a 'Python' library built on 'TensorFlow' that makes it easy to combine probabilistic models and deep learning on modern hardware ('TPU', 'GPU'). 'TensorFlow Probability' includes a wide selection of probability distributions and bijectors, probabilistic layers, variational inference, Markov chain Monte Carlo, and optimizers such as Nelder-Mead, BFGS, and SGLD.

Maintained by Tomasz Kalinowski. Last updated 3 years ago.

8.0 match 54 stars 8.63 score 221 scripts 3 dependents

bioc

matter:Out-of-core statistical computing and signal processing

Toolbox for larger-than-memory scientific computing and visualization, providing efficient out-of-core data structures using files or shared memory, for dense and sparse vectors, matrices, and arrays, with applications to nonuniformly sampled signals and images.

Maintained by Kylie A. Bemis. Last updated 3 months ago.

infrastructure datarepresentation dataimport dimensionreduction preprocessing cpp

7.3 match 57 stars 9.52 score 64 scripts 2 dependents

easystats

see:Model Visualisation Toolbox for 'easystats' and 'ggplot2'

Provides plotting utilities supporting packages in the 'easystats' ecosystem (<https://github.com/easystats/easystats>) and some extra themes, geoms, and scales for 'ggplot2'. Color scales are based on <https://materialui.co/>. References: Lüdecke et al. (2021) <doi:10.21105/joss.03393>.

Maintained by Indrajeet Patil. Last updated 4 days ago.

data-visualization easystats ggplot2 hacktoberfest plotting see statistics visualisation visualization

5.2 match 902 stars 13.22 score 2.0k scripts 3 dependents

cran

compositions:Compositional Data Analysis

Provides functions for the consistent analysis of compositional data (e.g. portions of substances) and positive numbers (e.g. concentrations) in the way proposed by J. Aitchison and V. Pawlowsky-Glahn.

Maintained by K. Gerald van den Boogaart. Last updated 1 year ago.

openblas

10.8 match 1 star 6.35 score 36 dependents

poissonconsulting

chk:Check User-Supplied Function Arguments

For developers to check user-supplied function arguments. It is designed to be simple, fast and customizable. Error messages follow the tidyverse style guide.

Maintained by Joe Thorley. Last updated 2 months ago.

chk

5.8 match 48 stars 11.89 score 22 scripts 95 dependents

vegandevs

vegan:Community Ecology Package

Ordination methods, diversity analysis and other functions for community and vegetation ecologists.

Maintained by Jari Oksanen. Last updated 15 days ago.

ecological-modelling ecology ordination fortran openblas

3.5 match 472 stars 19.41 score 15k scripts 440 dependents

cran

NAP:Non-Local Alternative Priors in Psychology

Conducts Bayesian hypothesis tests of a point null hypothesis against a two-sided alternative using a Non-local Alternative Prior (NAP) for one- and two-sample z- and t-tests (Pramanik and Johnson, 2022). Under the alternative, the NAP is assumed on the standardized effect size in one-sample tests and on their differences in two-sample tests. The package considers two types of NAP densities: (1) the normal moment prior, and (2) the composite alternative. In fixed design tests, the functions calculate the Bayes factors and the expected weight of evidence for varied effect size and sample size. The package also provides a sequential testing framework using the Sequential Bayes Factor (SBF) design. The functions calculate the operating characteristics (OC) and the average sample number (ASN), and also conduct sequential tests for sequentially observed data.

Maintained by Sandipan Pramanik. Last updated 3 years ago.

34.0 match 2.00 score

braverock

PortfolioAnalytics:Portfolio Analysis, Including Numerical Methods for Optimization of Portfolios

Portfolio optimization and analysis routines and graphics.

Maintained by Brian G. Peterson. Last updated 3 months ago.

5.9 match 81 stars 11.49 score 626 scripts 2 dependents

rohelab

LRMF3:Low Rank Matrix Factorization S3 Objects

Provides S3 classes to represent low rank matrix decompositions.

Maintained by Alex Hayes. Last updated 3 years ago.

matrix-factorization singular-value-decomposition

17.9 match 2 stars 3.78 score 6 scripts 2 dependents

jhorzek

mxsem:Specify 'OpenMx' Models with a 'lavaan'-Style Syntax

Provides a 'lavaan'-like syntax for 'OpenMx' models. The syntax supports definition variables, bounds, and parameter transformations. This allows for latent growth curve models with person-specific measurement occasions, moderated nonlinear factor analysis and much more.

Maintained by Jannik H. Orzek. Last updated 4 months ago.

factor-analysis lavaan openmx structural-equation-modeling cpp

11.1 match 3 stars 6.05 score 47 scripts

danheck

multinomineq:Bayesian Inference for Multinomial Models with Inequality Constraints

Implements Gibbs sampling and Bayes factors for multinomial models with linear inequality constraints on the vector of probability parameters. As special cases, the model class includes models that predict a linear order of binomial probabilities (e.g., p[1] < p[2] < p[3] < .50) and mixture models assuming that the parameter vector p must be inside the convex hull of a finite number of predicted patterns (i.e., vertices). A formal definition of inequality-constrained multinomial models and the implemented computational methods is provided in: Heck, D.W., & Davis-Stober, C.P. (2019). Multinomial models with linear inequality constraints: Overview and improvements of computational methods for Bayesian inference. Journal of Mathematical Psychology, 91, 70-87. <doi:10.1016/j.jmp.2019.03.004>. Inequality-constrained multinomial models have applications in the area of judgment and decision making to fit and test random utility models (Regenwetter, M., Dana, J., & Davis-Stober, C.P. (2011). Transitivity of preferences. Psychological Review, 118, 42–56, <doi:10.1037/a0021150>) or to perform outcome-based strategy classification to select the decision strategy that provides the best account for a vector of observed choice frequencies (Heck, D.W., Hilbig, B.E., & Moshagen, M. (2017). From information processing to decisions: Formalizing and comparing probabilistic choice models. Cognitive Psychology, 96, 26–40. <doi:10.1016/j.cogpsych.2017.05.003>).

Maintained by Daniel W. Heck. Last updated 1 year ago.

openblas cpp openmp

15.6 match 4 stars 4.30 score 4 scripts

julianfaraway

faraway:Datasets and Functions for Books by Julian Faraway

Books are "Linear Models with R" published 1st Ed. August 2004, 2nd Ed. July 2014, 3rd Ed. February 2025 by CRC press, ISBN 9781439887332, and "Extending the Linear Model with R" published by CRC press in 1st Ed. December 2005 and 2nd Ed. March 2016, ISBN 9781584884248 and "Practical Regression and ANOVA in R" contributed documentation on CRAN (now very dated).

Maintained by Julian Faraway. Last updated 1 month ago.

data

7.1 match 29 stars 9.43 score 1.7k scripts 1 dependent

lcbc-uio

questionnaires:Package with functions to calculate components and sums for LCBC questionnaires

Creates summaries and factorials of answers to questionnaires.

Maintained by Athanasia Mo Mowinckel. Last updated 2 years ago.

14.4 match 3 stars 4.63 score 13 scripts