R-universe search: association

mhahsler

arules:Mining Association Rules and Frequent Itemsets

Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. Hahsler, Gruen and Hornik (2005) <doi:10.18637/jss.v014.i15>.

Maintained by Michael Hahsler. Last updated 2 months ago.

arules association-rules frequent-itemsets

37.5 match 194 stars 13.99 score 3.3k scripts 28 dependents
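The level-wise Apriori search that 'arules' implements in C can be illustrated with a small Python sketch. It is not the 'arules' API: the functions `support`, `apriori` and `confidence` are hypothetical names chosen for illustration. The sketch relies on the Apriori principle (every subset of a frequent itemset must itself be frequent) to prune candidates, then computes the confidence of a single rule.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(transactions, min_support):
    """Return {frequent itemset: support} using a level-wise search."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [frozenset([i]) for i in items]
    k = 1
    while level:
        # Keep only candidates that meet the support threshold
        current = {c: support(c, transactions) for c in level}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates with an infrequent (k-1)-subset
        level = [c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent

def confidence(lhs, rhs, transactions):
    """conf(lhs -> rhs) = supp(lhs union rhs) / supp(lhs)."""
    transactions = [frozenset(t) for t in transactions]
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

tx = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(tx, min_support=0.5)
print(len(freq))                           # 5 frequent itemsets
print(confidence({"bread"}, {"milk"}, tx))  # about 0.667
```

The candidate set {bread, milk, butter} is pruned without counting it because its subset {milk, butter} is infrequent; avoiding such counts is what makes Apriori efficient on real transaction data.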

florianstijven

Surrogate:Evaluation of Surrogate Endpoints in Clinical Trials

In a clinical trial, it frequently occurs that the most credible outcome to evaluate the effectiveness of a new therapy (the true endpoint) is difficult to measure. In such a situation, it can be an effective strategy to replace the true endpoint by a (bio)marker that is easier to measure and that allows for a prediction of the treatment effect on the true endpoint (a surrogate endpoint). The package 'Surrogate' allows for an evaluation of the appropriateness of a candidate surrogate endpoint based on the meta-analytic, information-theoretic, and causal-inference frameworks. Part of this software has been developed using funding provided from the European Union's Seventh Framework Programme for research, technological development and demonstration (Grant Agreement no 602552), the Special Research Fund (BOF) of Hasselt University (BOF-number: BOF2OCPO3), GlaxoSmithKline Biologicals, Baekeland Mandaat (HBC.2022.0145), and Johnson & Johnson Innovative Medicine.

Maintained by Wim Van Der Elst. Last updated 1 month ago.

42.5 match 1 stars 6.15 score 133 scripts

wviechtb

metadat:Meta-Analysis Datasets

A collection of meta-analysis datasets for teaching purposes, illustrating/testing meta-analytic methods, and validating published analyses.

Maintained by Wolfgang Viechtbauer. Last updated 18 days ago.

dataset datasets meta-analysis

24.2 match 30 stars 10.54 score 65 scripts 93 dependents

ikwak2

aSPU:Adaptive Sum of Powered Score Test

R code for the (adaptive) Sum of Powered Score ('SPU' and 'aSPU') tests, the inverse-variance-weighted Sum of Powered Score ('SPUw' and 'aSPUw') tests, and gene-based and pathway-based association tests: the Pathway-based Sum of Powered Score test ('SPUpath'), the adaptive 'SPUpath' ('aSPUpath') test, the 'GEEaSPU' test for multiple traits and single 'SNP' (single nucleotide polymorphism) association in generalized estimating equations, the 'MTaSPUs' test for multiple traits and single 'SNP' association with Genome Wide Association Studies ('GWAS') summary statistics, the Gene-based Association Test that uses an extended 'Simes' procedure ('GATES'), the Hybrid Set-based Test ('HYST'), and an extended version of the 'GATES' test for pathway-based association testing ('GATES-Simes'). The tests can be used with genetic and other data sets with covariates. The response variable can be binary or quantitative. In summary: (1) single trait-'SNP' set association with individual-level data ('aSPU', 'aSPUw', 'aSPUr'); (2) single trait-'SNP' set association with summary statistics ('aSPUs'); (3) single trait-pathway association with individual-level data ('aSPUpath'); (4) single trait-pathway association with summary statistics ('aSPUsPath'); (5) multiple traits-single 'SNP' association with individual-level data ('GEEaSPU'); (6) multiple traits-single 'SNP' association with summary statistics ('MTaSPUs'); (7) multiple traits-'SNP' set association with summary statistics ('MTaSPUsSet'); (8) multiple traits-pathway association with summary statistics ('MTaSPUsSetPath').

Maintained by Il-Youp Kwak. Last updated 4 years ago.

33.2 match 12 stars 7.18 score 42 scripts 1 dependents
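The adaptive SPU idea can be sketched in Python with NumPy: compute a score vector U, form SPU(γ) = Σ_j U_j^γ for several γ (with γ = ∞ taken as max_j |U_j|), and use the smallest per-γ permutation p-value as the aSPU statistic, whose own p-value comes from the same permutations. This is a simplified sketch, not the package's code: it assumes an intercept-only null model (no covariates), takes absolute values for a two-sided test, and the names `score_vector`, `spu` and `aspu_test` are hypothetical.

```python
import numpy as np

def score_vector(y, X):
    """Score vector under an intercept-only null model (assumption: no covariates):
    U_j = sum_i (y_i - mean(y)) * X[i, j]."""
    return X.T @ (y - y.mean())

def spu(U, gamma):
    """SPU(gamma) = sum_j U_j**gamma; gamma = inf is taken as max_j |U_j|."""
    if np.isinf(gamma):
        return float(np.max(np.abs(U)))
    return float(np.sum(U ** gamma))

def aspu_test(y, X, gammas=(1, 2, 4, 8, np.inf), n_perm=1000, seed=0):
    """Permutation-based aSPU sketch; returns (p-value per gamma, aSPU p-value)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    # Two-sided version: work with absolute values of the SPU statistics
    obs = np.array([abs(spu(score_vector(y, X), g)) for g in gammas])
    perm = np.empty((n_perm, len(gammas)))
    for b in range(n_perm):
        # Permuting y breaks any association between y and X
        U_b = score_vector(rng.permutation(y), X)
        perm[b] = [abs(spu(U_b, g)) for g in gammas]
    # Permutation p-value of each observed SPU(gamma)
    p_obs = (1 + (perm >= obs).sum(axis=0)) / (n_perm + 1)
    # p-value of each permuted SPU(gamma) within the permutation distribution
    p_perm = (perm[None, :, :] >= perm[:, None, :]).sum(axis=1) / n_perm
    # aSPU statistic = min p-value over gamma; compare it with its permutation null
    min_p_null = p_perm.min(axis=1)
    p_aspu = (1 + (min_p_null <= p_obs.min()).sum()) / (n_perm + 1)
    return dict(zip(gammas, p_obs)), p_aspu

# Simulated data: 10 hypothetical SNP-like columns, only the first is associated
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 1.5 * X[:, 0] + rng.normal(size=200)
p_by_gamma, p_aspu = aspu_test(y, X, n_perm=500)
print(p_aspu)
```

Small γ favours many weak, same-direction effects, while large γ (and γ = ∞) favours a few strong ones; taking the minimum p-value over γ lets the test adapt to whichever pattern is present.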

mhahsler

arulesViz:Visualizing Association Rules and Frequent Itemsets

Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration. Michael Hahsler (2017) <doi:10.32614/RJ-2017-047>.

Maintained by Michael Hahsler. Last updated 7 months ago.

arules association-rules frequent-itemsets interactive-visualizations visualization

21.4 match 54 stars 11.03 score 1.7k scripts 2 dependents

bioc

SIAMCAT:Statistical Inference of Associations between Microbial Communities And host phenoTypes

Pipeline for Statistical Inference of Associations between Microbial Communities And host phenoTypes (SIAMCAT). A primary goal of analyzing microbiome data is to determine changes in community composition that are associated with environmental factors. In particular, linking human microbiome composition to host phenotypes such as diseases has become an area of intense research. For this, robust statistical modeling and biomarker extraction toolkits are crucially needed. SIAMCAT provides a full pipeline supporting data preprocessing, statistical association testing, statistical modeling (LASSO logistic regression) including tools for evaluation and interpretation of these models (such as cross validation, parameter selection, ROC analysis and diagnostic model plots).

Maintained by Jakob Wirbel. Last updated 5 months ago.

immunooncology metagenomics classification microbiome sequencing preprocessing clustering featureextraction geneticvariability multiplecomparison regression

35.1 match 6.72 score 147 scripts

tiledb-inc

tiledb:Modern Database Engine for Complex Data Based on Multi-Dimensional Arrays

The modern database 'TileDB' introduces a powerful on-disk format for storing and accessing any complex data based on multi-dimensional arrays. It supports dense and sparse arrays, dataframes and key-value stores, cloud storage ('S3', 'GCS', 'Azure'), chunked arrays, and multiple compression, encryption and checksum filters; it uses a fully multi-threaded implementation and supports parallel I/O, data versioning ('time travel'), metadata and groups. It is implemented as an embeddable cross-platform C++ library with APIs for several languages and various integrations. This package provides the R support.

Maintained by Isaiah Norton. Last updated 4 days ago.

array hdfs s3 storage-manager tiledb cpp

19.1 match 108 stars 11.79 score 306 scripts 4 dependents

ramiromagno

gwasrapidd:'REST' 'API' Client for the 'NHGRI'-'EBI' 'GWAS' Catalog

'GWAS' R 'API' Data Download. This package provides easy access to the 'NHGRI'-'EBI' 'GWAS' Catalog data by accessing the 'REST' 'API' <https://www.ebi.ac.uk/gwas/rest/docs/api/>.

Maintained by Ramiro Magno. Last updated 1 year ago.

thirdpartyclient biomedicalinformatics genomewideassociation snp association-studies gwas-catalog human rest-client trait trait-ontology

26.7 match 95 stars 8.10 score 49 scripts 1 dependents

jiefei-wang

aws.ecx:Communicating with AWS EC2 and ECS using AWS REST APIs

Provides functions for communicating with Amazon Web Services (AWS) Elastic Compute Cloud (EC2) and Elastic Container Service (ECS). Function names have the prefix 'ecs_' or 'ec2_', depending on the class of the API. Requests are sent via the REST API, and their parameters are given by the function arguments. Credentials can be set via 'aws_set_credentials'. The EC2 documentation can be found at <https://docs.aws.amazon.com/AWSEC2/latest/APIReference/Welcome.html> and the ECS documentation at <https://docs.aws.amazon.com/AmazonECS/latest/APIReference/Welcome.html>.

Maintained by Jiefei Wang. Last updated 3 years ago.

ec2 ecs ecs-functions

45.6 match 1 stars 4.18 score 2 scripts

elvanceyhan

pcds:Proximity Catch Digraphs and Their Applications

Contains functions for the construction and visualization of various families of proximity catch digraphs (PCDs) (see Ceyhan (2005), ISBN:978-3-639-19063-2) and for computing the graph invariants used to test patterns of segregation and association against complete spatial randomness (CSR) or uniformity in one, two and three dimensions. The package also has tools for generating points from these spatial patterns. The graph invariants used in testing spatial point data are the domination number (Ceyhan (2011) <doi:10.1080/03610921003597211>) and arc density (Ceyhan et al. (2006) <doi:10.1016/j.csda.2005.03.002>; Ceyhan et al. (2007) <doi:10.1002/cjs.5550350106>). The PCD families considered are Arc-Slice PCDs, Proportional-Edge PCDs, and Central Similarity PCDs.

Maintained by Elvan Ceyhan. Last updated 2 years ago.

32.1 match 5.80 score 21 scripts 2 dependents

isglobal-brge

SNPassoc:SNPs-Based Whole Genome Association Studies

Implements functions for most of the common analyses in genome association studies. These analyses include descriptive statistics and exploratory analysis of missing values, calculation of Hardy-Weinberg equilibrium, analysis of association based on generalized linear models (for either quantitative or binary traits), and analysis of multiple SNPs (haplotype and epistasis analysis). Permutation tests and related tests (sum statistic and truncated product) are also implemented. Exact distributions of the max-statistic and of the genetic risk-allele score can also be estimated. The methods are described in Gonzalez JR et al., 2007 <doi:10.1093/bioinformatics/btm025>.

Maintained by Dolors Pelegri. Last updated 5 months ago.

19.6 match 16 stars 9.11 score 89 scripts 6 dependents
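The Hardy-Weinberg equilibrium calculation listed above can be illustrated with the classical Pearson chi-square test on one biallelic SNP. This Python sketch is not SNPassoc's implementation, which may use a different (for example exact) test; the function name `hwe_chisq` is hypothetical, and it assumes both alleles are observed.

```python
import math

def hwe_chisq(n_AA, n_Aa, n_aa):
    """Pearson chi-square test of Hardy-Weinberg equilibrium for a biallelic SNP.
    Returns (statistic, p-value) using 1 degree of freedom."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("both alleles must be observed")
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)  # genotype counts under HWE
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))

print(hwe_chisq(25, 50, 25))  # (0.0, 1.0): counts exactly match HWE
print(hwe_chisq(50, 0, 50))   # statistic 100.0: no heterozygotes at all
```

One degree of freedom remains because three genotype classes lose one for the total count and one for the estimated allele frequency.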

nalimilan

logmult:Log-Multiplicative Models, Including Association Models

Functions to fit log-multiplicative models using 'gnm', with support for convenient printing, plots, and jackknife/bootstrap standard errors. For complex survey data, models can be fitted from design objects from the 'survey' package. Currently supported models include UNIDIFF (Erikson & Goldthorpe, 1992), also known as the log-multiplicative layer effect model (Xie, 1992) <doi:10.2307/2096242>, and several association models: Goodman (1979) <doi:10.2307/2286971> row-column association models of the RC(M) and RC(M)-L families with one or several dimensions, and two skew-symmetric association models proposed by Yamaguchi (1990) <doi:10.2307/271086> and by van der Heijden & Mooijaart (1995) <doi:10.1177/0049124195024001002>. Functions also allow computing the intrinsic association coefficient (see Bouchet-Valat (2022) <doi:10.1177/0049124119852389>) and the Altham (1970) index <doi:10.1111/j.2517-6161.1970.tb00816.x>, including via the Bayes shrinkage estimator proposed by Zhou (2015) <doi:10.1177/0081175015570097>, as well as applying the RAS/IPF/Deming-Stephan algorithm.

Maintained by Milan Bouchet-Valat. Last updated 3 years ago.

log-linear-model modelling statistics

34.1 match 4 stars 5.18 score 76 scripts

dataoneorg

dataone:R Interface to the DataONE REST API

Provides read and write access to data and metadata from the DataONE network <https://www.dataone.org> of data repositories. Each DataONE repository implements a consistent repository application programming interface. Users call methods in R to access these remote repository functions, such as methods to query the metadata catalog, get access to metadata for particular data packages, and read the data objects from the data repository. Users can also insert and update data objects on repositories that support these methods.

Maintained by Matthew B. Jones. Last updated 3 years ago.

17.5 match 36 stars 9.93 score 472 scripts 3 dependents

cran

wavethresh:Wavelets Statistics and Transforms

Performs 1D, 2D and 3D real and complex-valued wavelet transforms, nondecimated transforms, wavelet packet transforms, nondecimated wavelet packet transforms, multiple wavelet transforms, complex-valued wavelet transforms, wavelet shrinkage for various kinds of data, locally stationary wavelet time series, nonstationary multiscale transfer function modeling, and density estimation.

Maintained by Guy Nason. Last updated 7 months ago.

26.1 match 5.90 score 41 dependents
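Wavelet shrinkage, one of the features listed above, can be sketched with the simplest wavelet: transform, soft-threshold the detail coefficients, and invert. This Python sketch uses only the orthonormal Haar wavelet and a single fixed threshold; wavethresh supports many more wavelet families and threshold rules, and the names `haar_dwt`, `haar_idwt`, `soft_threshold` and `haar_shrink` are hypothetical, not wavethresh functions.

```python
import math

SQRT_HALF = 1 / math.sqrt(2)

def haar_dwt(x):
    """Orthonormal Haar transform; len(x) must be a power of two.
    Returns (coarsest approximation, details ordered finest to coarsest)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx, details = [float(v) for v in x], []
    while len(approx) > 1:
        evens, odds = approx[0::2], approx[1::2]
        details.append([(a - b) * SQRT_HALF for a, b in zip(evens, odds)])
        approx = [(a + b) * SQRT_HALF for a, b in zip(evens, odds)]
    return approx, details

def haar_idwt(approx, details):
    """Inverse of haar_dwt: rebuild the signal from coarsest to finest level."""
    a = list(approx)
    for d in reversed(details):
        out = []
        for c, w in zip(a, d):
            out.extend([(c + w) * SQRT_HALF, (c - w) * SQRT_HALF])
        a = out
    return a

def soft_threshold(c, t):
    """Shrink a coefficient towards zero by t, setting it to zero if |c| <= t."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def haar_shrink(x, t):
    """Denoise x by soft-thresholding every detail coefficient."""
    approx, details = haar_dwt(x)
    details = [[soft_threshold(c, t) for c in level] for level in details]
    return haar_idwt(approx, details)

x = [4, 2, 6, 8, 1, 3, 5, 7]
print(haar_idwt(*haar_dwt(x)))  # reconstructs x up to rounding
print(haar_shrink(x, 1e9))      # all details removed: every value close to 4.5
```

Because the transform is orthonormal, a threshold of zero returns the input unchanged, while a very large threshold keeps only the overall mean.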

bioc

GENESIS:GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness

The GENESIS package provides methodology for estimating, inferring, and accounting for population and pedigree structure in genetic analyses. The current implementation provides functions to perform PC-AiR (Conomos et al., 2015, Gen Epi) and PC-Relate (Conomos et al., 2016, AJHG). PC-AiR performs a Principal Components Analysis on genome-wide SNP data for the detection of population structure in a sample that may contain known or cryptic relatedness. Unlike standard PCA, PC-AiR accounts for relatedness in the sample to provide accurate ancestry inference that is not confounded by family structure. PC-Relate uses ancestry representative principal components to adjust for population structure/ancestry and accurately estimate measures of recent genetic relatedness such as kinship coefficients, IBD sharing probabilities, and inbreeding coefficients. Additionally, functions are provided to perform efficient variance component estimation and mixed model association testing for both quantitative and binary phenotypes.

Maintained by Stephanie M. Gogarten. Last updated 2 months ago.

snp geneticvariability genetics statisticalmethod dimensionreduction principalcomponent genomewideassociation qualitycontrol biocviews

14.5 match 36 stars 10.44 score 342 scripts 1 dependents

jinghuazhao

gap:Genetic Analysis Package

As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations for both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers, including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many projects, which is also in keeping with its name (gap).

Maintained by Jing Hua Zhao. Last updated 7 days ago.

genetics imputation lmm fortran

12.5 match 12 stars 11.94 score 448 scripts 16 dependents

bioc

APL:Association Plots

APL is a package for computing Association Plots (AP), a method for the visualization and analysis of single-cell transcriptomics data. The main focus of APL is the identification of genes characteristic of individual clusters of cells in the input data. The package performs correspondence analysis (CA) and identifies cluster-specific genes using Association Plots. Additionally, APL computes cluster-specificity scores for all genes, which makes it possible to rank genes by their specificity for a selected cell cluster of interest.

Maintained by Clemens Kohl. Last updated 5 months ago.

statisticalmethod dimensionreduction singlecell sequencing rnaseq geneexpression

23.5 match 15 stars 6.31 score 15 scripts

bioc

TFARM:Transcription Factors Association Rules Miner

It searches for relevant associations of transcription factors with a transcription factor target in specific genomic regions. It also allows evaluation of the Importance Index distribution of transcription factors (and combinations of transcription factors) in association rules.

Maintained by Liuba Nausicaa Martino. Last updated 5 months ago.

biologicalquestion infrastructure statisticalmethod transcription

36.1 match 4.00 score 2 scripts

bioc

flowWorkspace:Infrastructure for representing and interacting with gated and ungated cytometry data sets.

This package is designed to facilitate comparison of automated gating methods against manual gating done in flowJo. This package allows you to import basic flowJo workspaces into Bioconductor and replicate the gating from flowJo using the flowCore functionality. Gating hierarchies, groups of samples, compensation, and transformation are performed so that the output matches the flowJo analysis.

Maintained by Greg Finak. Last updated 25 days ago.

immunooncology flowcytometry dataimport preprocessing datarepresentation zlib openblas cpp

17.3 match 7.89 score 576 scripts 10 dependents

ralmond

RNetica:R interface to Netica(R) Bayesian Network Engine

This provides an R interface to the Netica (http://norsys.com/) Bayesian network library API.

Maintained by Russell Almond. Last updated 3 months ago.

bayesian-network

27.7 match 2 stars 4.92 score 14 scripts 2 dependents

helixcn

spaa:SPecies Association Analysis

Miscellaneous functions for analysing species association and niche overlap.

Maintained by Jinlong Zhang. Last updated 4 years ago.

18.3 match 12 stars 7.40 score 155 scripts 1 dependents

nicolas-robette

descriptio:Descriptive Statistical Analysis

Description of statistical associations between variables: measures of local and global association between variables (phi, Cramér's V, correlations, eta-squared, Goodman and Kruskal's tau, permutation tests, etc.), multiple graphical representations of the associations between variables (using 'ggplot2'), and weighted statistics.

Maintained by Nicolas Robette. Last updated 7 months ago.

26.0 match 4 stars 5.00 score 11 scripts 3 dependents
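Cramér's V, one of the global association measures listed above, rescales the Pearson chi-square statistic of a contingency table to the range 0 to 1: V = sqrt(χ² / (n · (min(r, c) − 1))). The Python sketch below implements that textbook formula for unweighted counts; it is not descriptio's API, the name `cramers_v` is hypothetical, and it assumes no row or column total is zero.

```python
import math

def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under independence
            chi2 += (obs - exp) ** 2 / exp
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

print(cramers_v([[10, 0], [0, 10]]))  # 1.0: perfect association
print(cramers_v([[5, 5], [5, 5]]))    # 0.0: independence
```

For a 2 x 2 table, V equals the absolute value of the phi coefficient.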

mhahsler

arulesCBA:Classification Based on Association Rules

Provides the infrastructure for association rule-based classification including the algorithms CBA, CMAR, CPAR, C4.5, FOIL, PART, PRM, RCAR, and RIPPER to build associative classifiers. Hahsler et al (2019) <doi:10.32614/RJ-2019-048>.

Maintained by Michael Hahsler. Last updated 7 months ago.

association-rules classification

22.1 match 3 stars 5.42 score 47 scripts 1 dependents

config-i1

greybox:Toolbox for Model Building and Forecasting

Implements functions and instruments for regression model building and its application to forecasting. The main scope of the package is variable selection and model specification for time series data. This includes promotional modelling, selection between different dynamic regressions with non-standard error distributions, selection based on cross-validation, solutions to the fat regression model problem, and more. Models developed in the package are tailored specifically for forecasting purposes. As a result, several methods are provided for producing forecasts from these models and visualising them.

Maintained by Ivan Svetunkov. Last updated 18 days ago.

forecasting model-selection model-selection-and-evaluation regression regression-models statistics cpp

10.6 match 30 stars 11.03 score 97 scripts 34 dependents

xiaoruizhu

PAsso:Assessing the Partial Association Between Ordinal Variables

An implementation of the unified framework for assessing partial association between ordinal variables after adjusting for a set of covariates (Dungang Liu, Shaobo Li, Yan Yu and Irini Moustaki (2020) <doi:10.1080/01621459.2020.1796394>, Journal of the American Statistical Association). This package provides a set of tools to quantify, visualize, and test partial associations between multiple ordinal variables. It can produce a number of phi measures, partial regression plots, 3-D plots, and p-values for testing H_0: phi = 0 or H_0: phi <= delta.

Maintained by Xiaorui (Jeremy) Zhu. Last updated 1 year ago.

association-analysis ordinal-variables partial-association statistics cpp

27.8 match 7 stars 4.14 score 13 scripts 1 dependents

victor-navarro

calmr:Canonical Associative Learning Models and their Representations

Implementations of canonical associative learning models, with tools to run experiment simulations, estimate model parameters, and compare model representations. Experiments and results are represented using S4 classes and methods.

Maintained by Victor Navarro. Last updated 10 months ago.

17.8 match 3 stars 6.40 score 17 scripts

branchlab

metasnf:Meta Clustering with Similarity Network Fusion

Framework to facilitate patient subtyping with similarity network fusion and meta clustering. The similarity network fusion (SNF) algorithm was introduced by Wang et al. (2014) in <doi:10.1038/nmeth.2810>. SNF is a data integration approach that can transform high-dimensional and diverse data types into a single similarity network suitable for clustering with minimal loss of information from each initial data source. The meta clustering approach was introduced by Caruana et al. (2006) in <doi:10.1109/ICDM.2006.103>. Meta clustering involves generating a wide range of cluster solutions by adjusting clustering hyperparameters, then clustering the solutions themselves into a manageable number of qualitatively similar solutions, and finally characterizing representative solutions to find ones that are best for the user's specific context. This package provides a framework to easily transform multi-modal data into a wide range of similarity network fusion-derived cluster solutions as well as to visualize, characterize, and validate those solutions. Core package functionality includes easy customization of distance metrics, clustering algorithms, and SNF hyperparameters to generate diverse clustering solutions; calculation and plotting of associations between features, between patients, and between cluster solutions; and standard cluster validation approaches including resampled measures of cluster stability, standard metrics of cluster quality, and label propagation to evaluate generalizability in unseen data. Associated vignettes guide the user through using the package to identify patient subtypes while adhering to best practices for unsupervised learning.

Maintained by Prashanth S Velayudhan. Last updated 7 days ago.

bioinformatics clustering metaclustering snf

13.9 match 8 stars 8.21 score 30 scripts

bioc

ISAnalytics:Analyze gene therapy vector insertion sites data identified from genomics next generation sequencing reads for clonal tracking studies

In gene therapy, stem cells are modified using viral vectors to deliver the therapeutic transgene and replace functional properties, since the genetic modification is stable and inherited by all cell progeny. The retrieval and mapping of the sequences flanking the virus-host DNA junctions allows the identification of insertion sites (IS), which is essential for monitoring the evolution of genetically modified cells in vivo. A comprehensive toolkit for the analysis of IS is required to foster clonal tracking studies and to support the assessment of safety and long-term efficacy in vivo. This package aims at (1) supporting automation of the IS workflow, (2) performing basic and advanced analyses for IS tracking (clonal abundance, clonal expansions, statistics for insertional mutagenesis, etc.), and (3) providing basic biological insights into transduced stem cells in vivo.

Maintained by Francesco Gazzo. Last updated 4 months ago.

biomedicalinformatics sequencing singlecell

19.4 match 3 stars 5.83 score 15 scripts

bioc

GWASTools:Tools for Genome Wide Association Studies

Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis.

Maintained by Stephanie M. Gogarten. Last updated 14 days ago.

snp geneticvariability qualitycontrol microarray

10.2 match 17 stars 10.67 score 396 scripts 5 dependents

tobiaskley

quantspec:Quantile-Based Spectral Analysis of Time Series

Methods to determine, smooth and plot quantile periodograms for univariate and multivariate time series.

Maintained by Tobias Kley. Last updated 9 years ago.

cpp

18.7 match 10 stars 5.84 score 46 scripts 1 dependents

bioc

DegCre:Probabilistic association of DEGs to CREs from differential data

DegCre generates associations between differentially expressed genes (DEGs) and cis-regulatory elements (CREs) based on non-parametric concordance between differential data. The user provides GRanges of DEG TSS and CRE regions with differential p-value and optionally log-fold changes and DegCre returns an annotated Hits object with associations and their calculated probabilities. Additionally, the package provides functionality for visualization and conversion to other formats.

Maintained by Brian S. Roberts. Last updated 4 months ago.

geneexpression generegulation atacseq chipseq dnaseseq rnaseq

20.5 match 5 stars 5.30 score 2 scripts

pbreheny

ncvreg:Regularization Paths for SCAD and MCP Penalized Regression Models

Fits regularization paths for linear regression, GLM, and Cox regression models using lasso or nonconvex penalties, in particular the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) penalty, with options for additional L2 penalties (the "elastic net" idea). Utilities for carrying out cross-validation as well as post-fitting visualization, summarization, inference, and prediction are also provided. For more information, see Breheny and Huang (2011) <doi:10.1214/10-AOAS388> or visit the ncvreg homepage <https://pbreheny.github.io/ncvreg/>.

Maintained by Patrick Breheny. Last updated 14 days ago.

9.0 match 43 stars 12.03 score 458 scripts 38 dependents
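The two nonconvex penalties named above can be written down directly, along with the univariate MCP solution ("firm thresholding") that coordinate descent applies to one standardized predictor at a time. This Python sketch follows the penalty definitions in Breheny and Huang (2011) but is not ncvreg's code; the function names are hypothetical, and the default γ values (3 for MCP, 3.7 for SCAD) are commonly used choices rather than anything this listing states.

```python
import math

def mcp_penalty(theta, lam, gamma=3.0):
    """Minimax concave penalty (MCP) for a single coefficient."""
    t = abs(theta)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    return gamma * lam * lam / 2  # flat beyond gamma * lam: no further shrinkage

def scad_penalty(theta, lam, gamma=3.7):
    """Smoothly clipped absolute deviation (SCAD) penalty for a single coefficient."""
    t = abs(theta)
    if t <= lam:
        return lam * t  # lasso-like near zero
    if t <= gamma * lam:
        return (2 * gamma * lam * t - t * t - lam * lam) / (2 * (gamma - 1))
    return lam * lam * (gamma + 1) / 2

def soft(z, lam):
    """Soft-thresholding operator S(z, lam)."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def mcp_threshold(z, lam, gamma=3.0):
    """Minimizer of 0.5 * (z - b)**2 + MCP(b) for a standardized predictor."""
    if abs(z) <= gamma * lam:
        return soft(z, lam) / (1 - 1 / gamma)
    return z  # large effects are left unpenalized

print(mcp_threshold(0.5, 1.0))  # 0.0: small effects are set to zero
print(mcp_threshold(4.0, 1.0))  # 4.0: large effects are not shrunk
```

Unlike the lasso, which shrinks every nonzero coefficient by λ, both penalties level off for large coefficients, which reduces the bias on strong effects.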

sparklyr

sparklyr:R Interface to Apache Spark

R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.

Maintained by Edgar Ruiz. Last updated 14 days ago.

apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr

7.0 match 959 stars 15.20 score 4.0k scripts 21 dependents

dboslab

expowo:An R package for mining global plant diversity and distribution data

Produces diversity estimates and species lists with associated global distribution for any vascular plant family and genus from 'Plants of the World Online' database <https://powo.science.kew.org/>, by interacting with the source code of each plant taxon page. It also creates global maps of species richness, graphics of species discoveries and nomenclatural changes over time. For more details

Maintained by Debora Zuanny. Last updated 8 days ago.

data-mining extractor

13.5 match 8 stars 7.44 score 64 scripts

cbhurley

bullseye:Visualising Multiple Pairwise Variable Correlations and Other Scores

We provide a tidy data structure and visualisations for multiple or grouped variable correlations, general association measures, scagnostics and other pairwise scores suitable for numerical, ordinal and nominal variables. Supported measures include distance correlation, maximal information, ace correlation, Kendall's tau, and polychoric correlation.

Maintained by Catherine Hurley. Last updated 25 days ago.

17.5 match 2 stars 5.58 score 14 scripts

ropensci

allodb:Tree Biomass Estimation at Extra-Tropical Forest Plots

Standardize and simplify the tree biomass estimation process across globally distributed extratropical forests.

Maintained by Erika Gonzalez-Akre. Last updated 26 days ago.

16.2 match 38 stars 5.94 score 38 scripts

pharmar

riskmetric:Risk Metrics to Evaluating R Packages

Facilities for assessing R packages against a number of metrics to help quantify their robustness.

Maintained by Eli Miller. Last updated 8 days ago.

10.7 match 166 stars 8.98 score 43 scripts

bioc

hmdbQuery:utilities for exploration of human metabolome database

Defines utilities for exploring the Human Metabolome Database, including functions to retrieve specific metabolite entries and data snapshots with pairwise associations (metabolite-gene, metabolite-protein, metabolite-disease).

Maintained by VJ Carey. Last updated 5 months ago.

metabolomics infrastructure

23.1 match 4.11 score 13 scripts

i02momuj

RKEEL:Using 'KEEL' in R Code

'KEEL' is a popular 'Java' software tool for a large number of different knowledge data discovery tasks. This package combines the advantages of 'KEEL' and R, allowing 'KEEL' algorithms to be used in simple R code. The implemented R code layer between R and 'KEEL' makes it easy both to use 'KEEL' algorithms in R and to implement new algorithms for 'RKEEL'. It includes more than 100 algorithms for classification, regression, preprocessing, association rules and imbalanced learning, which allows a more complete experimentation process. For more information about 'KEEL', see <http://www.keel.es/>.

Maintained by Jose M. Moyano. Last updated 2 years ago.

openjdk

38.7 match 2 stars 2.41 score 130 scripts

bioc

rGREAT:GREAT Analysis - Functional Enrichment on Genomic Regions

GREAT (Genomic Regions Enrichment of Annotations Tool) is a type of functional enrichment analysis performed directly on genomic regions. This package implements the GREAT algorithm (the local GREAT analysis) and also supports interacting directly with the GREAT web service (the online GREAT analysis). Results of both analyses can be viewed in a Shiny application. By default, rGREAT supports more than 600 organisms and a large number of gene set collections, as well as user-provided gene sets and organisms. Additionally, it implements a general method for dealing with background regions.

Maintained by Zuguang Gu. Last updated 19 days ago.

genesetenrichment go pathways software sequencing wholegenome genomeannotation coverage cpp

9.4 match 86 stars 9.96 score 320 scripts 1 dependents

revelle

psych:Procedures for Psychological, Psychometric, and Personality Research

A general purpose toolbox developed originally for personality, psychometric theory and experimental psychology. Functions are primarily for multivariate analysis and scale construction using factor analysis, principal component analysis, cluster analysis and reliability analysis, although others provide basic descriptive statistics. Item Response Theory is done using factor analysis of tetrachoric and polychoric correlations. Functions for analyzing data at multiple levels include within and between group statistics, including correlations and factor analysis. Validation and cross validation of scales developed using basic machine learning algorithms are provided, as are functions for simulating and testing particular item and test structures. Several functions serve as a useful front end for structural equation modeling. Graphical displays of path diagrams, including mediation models, factor analysis and structural equation models are created using basic graphics. Some of the functions are written to support a book on psychometric theory as well as publications in personality research. For more information, see the <https://personality-project.org/r/> web page.

Maintained by William Revelle. Last updated 11 days ago.

6.6 match 52 stars 14.12 score 29k scripts 317 dependents
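Reliability analysis, one of the core uses of psych listed above, most often starts from Cronbach's alpha: α = k/(k − 1) · (1 − Σ var(item) / var(total score)). psych provides its own functions for this (such as `alpha()`); the NumPy sketch below only re-implements the textbook formula, and the name `cronbach_alpha` is hypothetical.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_subjects x k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()     # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)      # variance of the total score
    return k / (k - 1) * (1 - item_var / total_var)

# Simulated scale: four noisy items measuring one hypothetical latent trait
rng = np.random.default_rng(0)
trait = rng.normal(size=300)
scores = np.column_stack([trait + rng.normal(scale=0.5, size=300) for _ in range(4)])
print(round(cronbach_alpha(scores), 2))
```

Alpha approaches 1 when items covary strongly, because the total-score variance then greatly exceeds the sum of the individual item variances.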

stscl

gdverse:Analysis of Spatial Stratified Heterogeneity

Detects spatial associations based on the concept of spatial stratified heterogeneity, while also considering spatial dependencies, spatial interpretability, complex spatial interactions, and robust spatial stratification. In addition, it supports the spatial stratified heterogeneity family described in Lv et al. (2025) <doi:10.1111/tgis.70032>.

Maintained by Wenbo Lv. Last updated 3 days ago.

geographical-detector geoinformatics geospatial-analysis spatial-statistics spatial-stratified-heterogeneity cpp

10.2 match 33 stars 9.10 score 41 scripts 2 dependents

usepa

tcpl:ToxCast Data Analysis Pipeline

The ToxCast Data Analysis Pipeline ('tcpl') is an R package that manages, curve-fits, plots, and stores ToxCast data to populate its linked MySQL database, 'invitrodb'. The package was developed for the chemical screening data curated by the US EPA's Toxicity Forecaster (ToxCast) program, but 'tcpl' can be used to support diverse chemical screening efforts.

Maintained by Jason Brown. Last updated 13 days ago.

ccte comptox ord

9.8 match 36 stars 9.39 score 90 scripts

kjhealy

gssrdoc:Document General Social Survey Variable

The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS, see <http://gss.norc.org>.

Maintained by Kieran Healy. Last updated 12 months ago.

40.1 match 2.28 score 38 scripts

rich-iannone

DiagrammeR:Graph/Network Visualization

Build graph/network structures using functions for stepwise addition and deletion of nodes and edges. Work with data available in tables for bulk addition of nodes, edges, and associated metadata. Use graph selections and traversals to apply changes to specific nodes or edges. A wide selection of graph algorithms allow for the analysis of graphs. Visualize the graphs and take advantage of any aesthetic properties assigned to nodes and edges.

Maintained by Richard Iannone. Last updated 2 months ago.

graph graph-functions network-graph property-graph visualization

5.9 match 1.7k stars 15.29 score 3.8k scripts 86 dependents

bioc

snpStats:SnpMatrix and XSnpMatrix classes and methods

Classes and statistical methods for large SNP association studies. This extends the earlier snpMatrix package, allowing for uncertainty in genotypes.

Maintained by David Clayton. Last updated 4 days ago.

microarray snp geneticvariability zlib

9.2 match 9.68 score 674 scripts 23 dependents

pecanproject

PEcAn.DB:PEcAn Functions Used for Ecological Forecasts and Reanalysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.

Maintained by David LeBauer. Last updated 8 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

7.3 match 216 stars 11.90 score 127 scripts 27 dependents

bioc

rhdf5:R Interface to HDF5

This package provides an interface between HDF5 and R. HDF5's main features are the ability to store and access very large and/or complex datasets and a wide variety of metadata on mass storage (disk) through a completely portable file format. The rhdf5 package is thus suited for the exchange of large and/or complex datasets between R and other software packages, and for letting R applications work on datasets that are larger than the available RAM.

Maintained by Mike Smith. Last updated 7 days ago.

infrastructure dataimport hdf5 rhdf5 openssl curl zlib cpp

5.3 match 62 stars 15.87 score 4.2k scripts 232 dependents

leelabsg

SKAT:SNP-Set (Sequence) Kernel Association Test

Functions for kernel-regression-based association tests including Burden test, SKAT and SKAT-O. These methods aggregate individual SNP score statistics in a SNP set and efficiently compute SNP-set level p-values.

Maintained by Seunggeun (Shawn) Lee. Last updated 2 months ago.

sequence cpp

8.6 match 45 stars 9.70 score 268 scripts 16 dependents

grunwaldlab

poppr:Genetic Analysis of Populations with Mixed Reproduction

Population genetic analyses for hierarchical analysis of partially clonal populations built upon the architecture of the 'adegenet' package. Originally described in Kamvar, Tabima, and Grünwald (2014) <doi:10.7717/peerj.281> with version 2.0 described in Kamvar, Brooks, and Grünwald (2015) <doi:10.3389/fgene.2015.00208>.

Maintained by Zhian N. Kamvar. Last updated 11 months ago.

clonality genetic-analysis genetic-distances minimum-spanning-networks multilocus-genotypes multilocus-lineages population-genetics populations openmp

7.7 match 69 stars 10.84 score 672 scripts

refunders

refund:Regression with Functional Data

Methods for regression for functional data, including function-on-scalar, scalar-on-function, and function-on-function regression. Some of the functions are applicable to image data.

Maintained by Julia Wrobel. Last updated 6 months ago.

8.3 match 43 stars 10.11 score 472 scripts 17 dependents

bioc

regioneR:Association analysis of genomic regions based on permutation tests

regioneR offers a statistical framework based on customizable permutation tests to assess the association between genomic region sets and other genomic features.

Maintained by Bernat Gel. Last updated 5 months ago.

genetics chipseq dnaseq methylseq copynumbervariation

9.1 match 9.01 score 2.7k scripts 21 dependents

peterreichert

utility:Construct, Evaluate and Plot Value and Utility Functions

Construct and plot objective hierarchies and associated value and utility functions. Evaluate the values and utilities and visualize the results as colored objective hierarchies or tables. Visualize uncertainty by plotting median and quantile intervals within the nodes of objective hierarchies. Get numerical results of the evaluations in standard R data types for further processing.

Maintained by Peter Reichert. Last updated 2 years ago.

24.3 match 3.35 score 82 scripts 1 dependents

azure

azuremlsdk:Interface to the 'Azure Machine Learning' 'SDK'

Interface to the 'Azure Machine Learning' Software Development Kit ('SDK'). Data scientists can use the 'SDK' to train, deploy, automate, and manage machine learning models on the 'Azure Machine Learning' service. To learn more about 'Azure Machine Learning' visit the website: <https://docs.microsoft.com/en-us/azure/machine-learning/service/overview-what-is-azure-ml>.

Maintained by Diondra Peck. Last updated 3 years ago.

amlcompute azure azure-machine-learning azureml dsi machine-learning rstudio sdk-r

9.1 match 105 stars 8.91 score 221 scripts

schochastics

networkdata:Repository of Network Datasets

The package contains a large collection of network datasets from different contexts. This includes social networks, animal networks and movie networks. All datasets are in 'igraph' format.

Maintained by David Schoch. Last updated 1 year ago.

dataset network-analysis

16.0 match 142 stars 5.01 score 143 scripts

mrcieu

ieugwasr:Interface to the 'OpenGWAS' Database API

Interface to the 'OpenGWAS' database API <https://api.opengwas.io/api/>. Includes a wrapper to make generic calls to the API, plus convenience functions for specific queries.

Maintained by Gibran Hemani. Last updated 18 days ago.

7.5 match 89 stars 10.71 score 404 scripts 6 dependents

bioc

EnrichedHeatmap:Making Enriched Heatmaps

An enriched heatmap is a special type of heatmap that visualizes the enrichment of genomic signals over specific target regions. Here we implement enriched heatmaps using the ComplexHeatmap package. Since this type of heatmap is just a normal heatmap with some special settings, the functionality of ComplexHeatmap makes it much easier to customize the heatmap and to concatenate it into a list of heatmaps that shows the correspondence between different data sources.

Maintained by Zuguang Gu. Last updated 5 months ago.

software visualization sequencing genomeannotation coverage cpp

7.3 match 190 stars 10.87 score 330 scripts 1 dependents

r-spatialecology

shar:Species-Habitat Associations

Analyse species-habitat associations in R. To do so, information about the location of the species (as a point pattern) is needed together with environmental conditions (as a categorical raster). To test habitat associations for significance, one of the two components is randomized. Methods are mainly based on Plotkin et al. (2000) <doi:10.1006/jtbi.2000.2158> and Harms et al. (2001) <doi:10.1111/j.1365-2745.2001.00615.x>.

Maintained by Maximilian H.K. Hesselbarth. Last updated 2 months ago.

habitat-association landscape-ecology point-pattern-analysis spatial-analysis

11.6 match 20 stars 6.83 score 28 scripts

bioc

podkat:Position-Dependent Kernel Association Test

This package provides an association test that is capable of dealing with very rare and even private variants. This is accomplished by a kernel-based approach that takes the positions of the variants into account. The test can be used for pre-processed matrix data, but also directly for variant data stored in VCF files. Association testing can be performed whole-genome, whole-exome, or restricted to pre-defined regions of interest. The test is complemented by tools for analyzing and visualizing the results.

Maintained by Ulrich Bodenhofer. Last updated 5 months ago.

genetics wholegenome annotation variantannotation sequencing dataimport curl bzip2 xz-utils zlib cpp

15.8 match 5.02 score 6 scripts

venelin

PCMBase:Simulation and Likelihood Calculation of Phylogenetic Comparative Models

Phylogenetic comparative methods represent models of continuous trait data associated with the tips of a phylogenetic tree. Examples of such models are Gaussian continuous time branching stochastic processes such as Brownian motion (BM) and Ornstein-Uhlenbeck (OU) processes, which regard the data at the tips of the tree as an observed (final) state of a Markov process starting from an initial state at the root and evolving along the branches of the tree. The PCMBase R package provides a general framework for manipulating such models. This framework consists of an application programming interface for specifying data and model parameters, and efficient algorithms for simulating trait evolution under a model and calculating the likelihood of model parameters for an assumed model and trait data. The package implements a growing collection of models, which currently includes BM, OU, BM/OU with jumps, two-speed OU as well as mixed Gaussian models, in which different types of the above models can be associated with different branches of the tree. The PCMBase package is limited to trait-simulation and likelihood calculation of (mixed) Gaussian phylogenetic models. The PCMFit package provides functionality for inference of these models to tree and trait data. The package web-site <https://venelin.github.io/PCMBase/> provides access to the documentation and other resources.

Maintained by Venelin Mitov. Last updated 11 months ago.

10.9 match 6 stars 7.26 score 85 scripts 3 dependents

insightsengineering

rtables:Reporting Tables

Reporting tables often have structure that goes beyond simple rectangular data. The 'rtables' package provides a framework for declaring complex multi-level tabulations and then applying them to data. This framework models both tabulation and the resulting tables as hierarchical, tree-like objects which support sibling sub-tables, arbitrary splitting or grouping of data in row and column dimensions, cells containing multiple values, and the concept of contextual summary computations. A convenient pipe-able interface is provided for declaring table layouts and the corresponding computations, and then applying them to data.

Maintained by Joe Zhu. Last updated 3 months ago.

pharmaceuticals tables

5.8 match 232 stars 13.65 score 238 scripts 17 dependents

pbreheny

grpreg:Regularization Paths for Regression Models with Grouped Covariates

Efficient algorithms for fitting the regularization path of linear regression, GLM, and Cox regression models with grouped penalties. This includes group selection methods such as group lasso, group MCP, and group SCAD as well as bi-level selection methods such as the group exponential lasso, the composite MCP, and the group bridge. For more information, see Breheny and Huang (2009) <doi:10.4310/sii.2009.v2.n3.a10>, Huang, Breheny, and Ma (2012) <doi:10.1214/12-sts392>, Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2>, and Breheny (2015) <doi:10.1111/biom.12300>, or visit the package homepage <https://pbreheny.github.io/grpreg/>.

Maintained by Patrick Breheny. Last updated 1 months ago.

6.9 match 34 stars 11.38 score 192 scripts 35 dependents

kosukehamazaki

RAINBOWR:Genome-Wide Association Study with SNP-Set Methods

By using 'RAINBOWR' (Reliable Association INference By Optimizing Weights with R), users can test multiple SNPs (Single Nucleotide Polymorphisms) simultaneously by kernel-based (SNP-set) methods. This package can also be applied to haplotype-based GWAS (Genome-Wide Association Study). Users can test not only additive effects but also dominance and epistatic effects. For details, please see our paper in PLOS Computational Biology: Kosuke Hamazaki and Hiroyoshi Iwata (2020) <doi:10.1371/journal.pcbi.1007663>.

Maintained by Kosuke Hamazaki. Last updated 4 months ago.

cpp

13.0 match 22 stars 5.99 score 22 scripts

ocbe-uio

contingencytables:Statistical Analysis of Contingency Tables

Provides functions to perform statistical inference of data organized in contingency tables. This package is a companion to the "Statistical Analysis of Contingency Tables" book by Fagerland et al. <ISBN 9781466588172>.

Maintained by Waldir Leoncio. Last updated 7 months ago.

contingency-table

21.2 match 3 stars 3.65 score 8 scripts 1 dependents

aplantin

MiRKAT:Microbiome Regression-Based Kernel Association Tests

Test for overall association between microbiome composition data and phenotypes via phylogenetic kernels. The phenotype can be univariate continuous or binary (Zhao et al. (2015) <doi:10.1016/j.ajhg.2015.04.003>), survival outcomes (Plantinga et al. (2017) <doi:10.1186/s40168-017-0239-9>), multivariate (Zhan et al. (2017) <doi:10.1002/gepi.22030>) and structured phenotypes (Zhan et al. (2017) <doi:10.1111/biom.12684>). The package can also use robust regression (unpublished work) and integrated quantile regression (Wang et al. (2021) <doi:10.1093/bioinformatics/btab668>). In each case, the microbiome community effect is modeled nonparametrically through a kernel function, which can incorporate phylogenetic tree information.

Maintained by Anna Plantinga. Last updated 2 years ago.

14.8 match 3 stars 5.22 score 183 scripts 1 dependents

stan-dev

rstanarm:Bayesian Applied Regression Modeling via Stan

Estimates previously compiled regression models using the 'rstan' package, which provides the R interface to the Stan C++ library for Bayesian estimation. Users specify models via the customary R syntax with a formula and data.frame plus some additional arguments for priors.

Maintained by Ben Goodrich. Last updated 13 days ago.

bayesian bayesian-data-analysis bayesian-inference bayesian-methods bayesian-statistics multilevel-models rstan rstanarm stan statistical-modeling cpp

4.9 match 393 stars 15.70 score 5.0k scripts 13 dependents

hanchenphd

GMMAT:Generalized Linear Mixed Model Association Tests

Perform association tests using generalized linear mixed models (GLMMs) in genome-wide association studies (GWAS) and sequencing association studies. First, GMMAT fits a GLMM with covariate adjustment and random effects to account for population structure and familial or cryptic relatedness. For GWAS, GMMAT performs score tests for each genetic variant as proposed in Chen et al. (2016) <DOI:10.1016/j.ajhg.2016.02.012>. For candidate gene studies, GMMAT can also perform Wald tests to get the effect size estimate for each genetic variant. For rare variant analysis from sequencing association studies, GMMAT performs the variant Set Mixed Model Association Tests (SMMAT) as proposed in Chen et al. (2019) <DOI:10.1016/j.ajhg.2018.12.012>, including the burden test, the sequence kernel association test (SKAT), SKAT-O and an efficient hybrid test of the burden test and SKAT, based on user-defined variant sets.

Maintained by Han Chen. Last updated 1 year ago.

openblas zlib bzip2 libzstd libdeflate cpp

9.1 match 41 stars 8.37 score 96 scripts 2 dependents

ropensci

rotl:Interface to the 'Open Tree of Life' API

An interface to the 'Open Tree of Life' API to retrieve phylogenetic trees, information about studies used to assemble the synthetic tree, and utilities to match taxonomic names to 'Open Tree identifiers'. The 'Open Tree of Life' aims at assembling a comprehensive phylogenetic tree for all named species.

Maintained by Francois Michonneau. Last updated 2 years ago.

metadata ropensci phylogenetics independant-contrasts biodiversity peer-reviewed phylogeny taxonomy

6.3 match 40 stars 12.05 score 356 scripts 29 dependents

firefly-cpp

niarules:Numerical Association Rule Mining using Population-Based Nature-Inspired Algorithms

This framework is devoted to mining numerical association rules using nature-inspired optimization algorithms. Drawing inspiration from the 'NiaARM' 'Python' and 'NiaARM' 'Julia' packages, this repository introduces the capability to perform numerical association rule mining in the R programming language. See Fister Jr., Iglesias, Galvez, Del Ser, Osaba and Fister (2018) <doi:10.1007/978-3-030-03493-1_9>.

Maintained by Iztok Jr. Fister. Last updated 28 days ago.

association-rules metaheuristics optimization

20.2 match 1 stars 3.70 score 2 scripts

bioc

traseR:GWAS trait-associated SNP enrichment analyses in genomic intervals

traseR performs GWAS trait-associated SNP enrichment analyses in genomic intervals using different hypothesis testing approaches, also provides various functionalities to explore and visualize the results.

Maintained by li chen. Last updated 5 months ago.

genetics sequencing coverage alignment qualitycontrol dataimport

22.0 match 3.30 score 3 scripts

bioc

midasHLA:R package for immunogenomics data handling and association analysis

MiDAS is an R package for immunogenetics data transformation and statistical analysis. MiDAS accepts input data in the form of HLA alleles and KIR types, and can transform it into biologically meaningful variables, enabling HLA amino acid fine mapping, analyses of HLA evolutionary divergence, KIR gene presence, as well as validated HLA-KIR interactions. Further, it allows comprehensive statistical association analysis workflows with phenotypes of diverse measurement scales. MiDAS closes a gap between the inference of immunogenetic variation and its efficient utilization to make relevant discoveries related to T cell, Natural Killer cell, and disease biology.

Maintained by Maciej Migdał. Last updated 5 months ago.

cellbiology genetics statisticalmethod

16.6 match 4.30 score 3 scripts

alenxav

NAM:Nested Association Mapping

Designed for association studies in nested association mapping (NAM) panels, experimental and random panels. The method is described by Xavier et al. (2015) <doi:10.1093/bioinformatics/btv448>. It includes tools for genome-wide associations of multiple populations, marker quality control, population genetics analysis, genome-wide prediction, solving mixed models and finding variance components through likelihood and Bayesian methods.

Maintained by Alencar Xavier. Last updated 5 years ago.

cpp

12.4 match 2 stars 5.72 score 44 scripts 1 dependents

cran

datarobot:'DataRobot' Predictive Modeling API

For working with the 'DataRobot' predictive modeling platform's API <https://www.datarobot.com/>.

Maintained by AJ Alon. Last updated 1 year ago.

20.4 match 2 stars 3.48 score

andrisignorell

DescTools:Tools for Descriptive Statistics

A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox which facilitates the (notoriously time-consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. The package also contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered across other packages and other sources written partly by Titans of R. The reason for collecting them here was primarily to have them consolidated in ONE package instead of dozens (which themselves might depend on other packages that are not needed at all), and to provide a common and consistent interface as far as function and argument naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in the absence of convincing alternatives). The 'BigCamelCase' style was consistently applied to functions borrowed from contributed R packages as well.

Maintained by Andri Signorell. Last updated 4 days ago.

fortran cpp

4.1 match 86 stars 16.73 score 7.7k scripts 101 dependents

nowosad

sabre:Spatial Association Between Regionalizations

Calculates a degree of spatial association between regionalizations or categorical maps using the information-theoretical V-measure (Nowosad and Stepinski (2018) <doi:10.1080/13658816.2018.1511794>). It also offers an R implementation of the MapCurve method (Hargrove et al. (2006) <doi:10.1007/s10109-006-0025-x>).

Maintained by Jakub Nowosad. Last updated 4 months ago.

entropy polygons regionalizations spatial spatial-analysis

9.9 match 36 stars 6.95 score 25 scripts

bioc

metagenomeSeq:Statistical analysis for sparse high-throughput sequencing

metagenomeSeq is designed to determine features (be it Operational Taxonomic Unit (OTU), species, etc.) that are differentially abundant between two or more groups of multiple samples. metagenomeSeq is designed to address the effects of both normalization and under-sampling of microbial communities on disease association detection and the testing of feature correlations.

Maintained by Joseph N. Paulson. Last updated 3 months ago.

immunooncology classification clustering geneticvariability differentialexpression microbiome metagenomics normalization visualization multiplecomparison sequencing software

5.8 match 69 stars 11.90 score 494 scripts 7 dependents

zarquon42b

Morpho:Calculations and Visualisations Related to Geometric Morphometrics

A toolset for Geometric Morphometrics and mesh processing. This includes (among other stuff) mesh deformations based on reference points, permutation tests, detection of outliers, processing of sliding semi-landmarks and semi-automated surface landmark placement.

Maintained by Stefan Schlager. Last updated 5 months ago.

openblas cpp openmp

6.8 match 51 stars 10.01 score 218 scripts 13 dependents

faosorios

SpatialPack:Tools for Assessing the Association Between Two Spatial Processes

Tools to assess the association between two spatial processes. Currently, several methodologies are implemented: A modified t-test to perform hypothesis testing about the independence between the processes, a suitable nonparametric correlation coefficient, the codispersion coefficient, and an F test for assessing the multiple correlation between one spatial process and several others. Functions for image processing and computing the spatial association between images are also provided. Functions contained in the package are intended to accompany Vallejos, R., Osorio, F., Bevilacqua, M. (2020). Spatial Relationships Between Two Georeferenced Variables: With Applications in R. Springer, Cham <doi:10.1007/978-3-030-56681-4>.

Maintained by Felipe Osorio. Last updated 12 days ago.

codispersion-coefficient modified-t-test spatial-association spatial-processes ssim structural-similarity tjostheim-coefficient fortran

11.2 match 7 stars 5.88 score 73 scripts 1 dependents

bioc

microbiome:Microbiome Analytics

Utilities for microbiome analysis.

Maintained by Leo Lahti. Last updated 5 months ago.

metagenomics microbiome sequencing systemsbiology hitchip hitchip-atlas human-microbiome microbiology microbiome-analysis phyloseq population-study

5.3 match 293 stars 12.51 score 2.0k scripts 5 dependents

bioc

PANR:Posterior association networks and functional modules inferred from rich phenotypes of gene perturbations

This package provides S4 classes and methods for inferring functional gene networks with edges encoding posterior beliefs of gene association types and nodes encoding perturbation effects.

Maintained by Xin Wang. Last updated 5 months ago.

immunooncology networkinference visualization graphandnetwork clustering cellbasedassays

19.9 match 3.30 score 2 scripts

bioc

dcanr:Differential co-expression/association network analysis

This package implements methods and an evaluation framework to infer differential co-expression/association networks. Various methods are implemented and can be evaluated using simulated datasets. Inference of differential co-expression networks can allow identification of networks that are altered between two conditions (e.g., health and disease).

Maintained by Dharmesh D. Bhuva. Last updated 5 months ago.

networkinference graphandnetwork differentialexpression network

8.8 match 6 stars 7.45 score 26 scripts 5 dependents

bioc

rexposome:Exposome exploration and outcome data analysis

Package that allows users to explore the exposome and to perform association analyses between exposures and health outcomes.

Maintained by Xavier Escribà Montagut. Last updated 5 months ago.

software biologicalquestion infrastructure dataimport datarepresentation biomedicalinformatics experimentaldesign multiplecomparison classification clustering

11.4 match 5.70 score 28 scripts 1 dependents

pln-team

PLNmodels:Poisson Lognormal Models

The Poisson-lognormal model and variants (Chiquet, Mariadassou and Robin, 2021 <doi:10.3389/fevo.2021.588292>) can be used for a variety of multivariate problems when count data are at play, including principal component analysis for count data, discriminant analysis, model-based clustering and network inference. Implements variational algorithms to fit such models, accompanied by a set of functions for visualization and diagnostics.

Maintained by Julien Chiquet. Last updated 7 days ago.

count-data multivariate-analysis network-inference pca poisson-lognormal-model openblas cpp

6.7 match 55 stars 9.54 score 226 scripts

himesgroup

snpsettest:A Set-Based Association Test using GWAS Summary Statistics

The goal of 'snpsettest' is to provide simple tools that perform set-based association tests (e.g., gene-based association tests) using GWAS (genome-wide association study) summary statistics. A set-based association test in this package is based on the statistical model described in VEGAS (versatile gene-based association study), which combines the effects of a set of SNPs accounting for linkage disequilibrium between markers. This package uses a different approach from the original VEGAS implementation to compute set-level p values more efficiently, as described in <https://github.com/HimesGroup/snpsettest/wiki/Statistical-test-in-snpsettest>.

Maintained by Jaehyun Joo. Last updated 2 years ago.

openblas cpp openmp

12.7 match 7 stars 5.02 score 9 scripts

bioc

rTRM:Identification of Transcriptional Regulatory Modules from Protein-Protein Interaction Networks

rTRM identifies transcriptional regulatory modules (TRMs) from protein-protein interaction networks.

Maintained by Diego Diez. Last updated 5 months ago.

transcription network generegulation graphandnetwork bioconductor bioinformatics

13.2 match 3 stars 4.86 score 3 scripts 1 dependents

bioc

maaslin3:"Refining and extending generalized multivariate linear models for meta-omic association discovery"

MaAsLin 3 refines and extends generalized multivariate linear models for meta-omic association discovery. It finds abundance and prevalence associations between microbiome meta-omics features and complex metadata in population-scale epidemiological studies. The software includes multiple analysis methods (including support for multiple covariates, repeated measures, and ordered predictors), filtering, normalization, and transform options to customize analysis for your specific study.

Maintained by William Nickols. Last updated 7 days ago.

metagenomics software microbiome normalization multiplecomparison

7.8 match 33 stars 8.16 score 34 scripts

nanxstats

ggsci:Scientific Journal and Sci-Fi Themed Color Palettes for 'ggplot2'

A collection of 'ggplot2' color palettes inspired by plots in scientific journals, data visualization libraries, science fiction movies, and TV shows.

Maintained by Nan Xiao. Last updated 10 months ago.

color-palettes data-visualization ggplot2 ggsci sci-fi scientific-journals visualization

3.5 match 680 stars 18.00 score 26k scripts 438 dependents

stscl

cisp:A Correlation Indicator Based on Spatial Patterns

Use the spatial association marginal contributions derived from spatial stratified heterogeneity to capture the degree of correlation between spatial patterns.

Maintained by Wenbo Lv. Last updated 2 months ago.

association correlation geoinformatics spatial-patterns

12.2 match 5 stars 5.10 score 2 scripts

babaknaimi

elsa:Entropy-Based Local Indicator of Spatial Association

A framework that provides methods for quantifying the entropy-based local indicator of spatial association (ELSA), which can be used for both continuous and categorical data. In addition, this package offers other methods to measure local indicators of spatial association (LISA). Furthermore, global spatial structure can be measured using a variogram-like diagram called an entrogram. For more information, please see the paper: Naimi, B., Hamm, N. A., Groen, T. A., Skidmore, A. K., Toxopeus, A. G., & Alibakhshi, S. (2019) <doi:10.1016/j.spasta.2018.10.001>.

Maintained by Babak Naimi. Last updated 1 year ago.

11.9 match 14 stars 5.23 score 24 scripts

sinnweja

haplo.stats:Statistical Analysis of Haplotypes with Traits and Covariates when Linkage Phase is Ambiguous

Routines for the analysis of indirectly measured haplotypes. The statistical methods assume that all subjects are unrelated and that haplotypes are ambiguous (due to unknown linkage phase of the genetic markers). The main functions are: haplo.em(), haplo.glm(), haplo.score(), and haplo.power(); all of which have detailed examples in the vignette.

Maintained by Jason P. Sinnwell. Last updated 6 months ago.

10.4 match 2 stars 5.98 score 96 scripts 12 dependents

beerda

lfl:Linguistic Fuzzy Logic

Various algorithms related to linguistic fuzzy logic: mining for linguistic fuzzy association rules, composition of fuzzy relations, performing perception-based logical deduction (PbLD), and forecasting time-series using fuzzy rule-based ensemble (FRBE). The package also contains basic fuzzy-related algebraic functions capable of handling missing values in different styles (Bochvar, Sobocinski, Kleene etc.), computation of Sugeno integrals and fuzzy transform.

Maintained by Michal Burda. Last updated 5 months ago.

association-rules forecast-model fuzzy-logic inference-rules cpp openmp

11.5 match 8 stars 5.35 score 28 scripts

briencj

asremlPlus:Augments 'ASReml-R' in Fitting Mixed Models and Packages Generally in Exploring Prediction Differences

Assists in automating the selection of terms to include in mixed models when 'asreml' is used to fit the models. Procedures are available for choosing models that conform to the hierarchy or marginality principle, and for fitting and choosing between two-dimensional spatial models using correlation, natural cubic smoothing spline and P-spline models. A history of the fitting of a sequence of models is kept in a data frame. It can also be used to compute functions and contrasts of, to investigate differences between, and to plot predictions obtained using any model fitting function. The content falls into the following natural groupings: (i) Data, (ii) Model modification functions, (iii) Model selection and description functions, (iv) Model diagnostics and simulation functions, (v) Prediction production and presentation functions, (vi) Response transformation functions, (vii) Object manipulation functions, and (viii) Miscellaneous functions (for further details see 'asremlPlus-package' in help). The 'asreml' package provides a computationally efficient algorithm for fitting a wide range of linear mixed models using Residual Maximum Likelihood. It is a commercial package and a license for it can be purchased from 'VSNi' <https://vsni.co.uk/> as 'asreml-R', who will supply a zip file for local installation/updating (see <https://asreml.kb.vsni.co.uk/>). It is not needed for functions that are methods for 'alldiffs' and 'data.frame' objects. The package 'asremlPlus' can also be installed from <http://chris.brien.name/rpackages/>.

Maintained by Chris Brien. Last updated 1 month ago.

asreml mixed-models

6.5 match 19 stars 9.37 score 200 scripts

ropensci

datapack:A Flexible Container to Transport and Manipulate Data and Associated Resources

Provides a flexible container to transport and manipulate complex sets of data. These data may consist of multiple data files and associated metadata and ancillary files. Individual data objects have associated system-level metadata, and data files are linked together using the OAI-ORE standard resource map, which describes the relationships between the files. The OAI-ORE standard is described at <https://www.openarchives.org/ore/>. Data packages can be serialized and transported as structured files that have been created following the BagIt specification. The BagIt specification is described at <https://tools.ietf.org/html/draft-kunze-bagit-08>.

Maintained by Matthew B. Jones. Last updated 3 years ago.

7.1 match 43 stars 8.55 score 195 scripts 4 dependents

insightsengineering

teal.modules.general:General Modules for 'teal' Applications

Prebuilt 'shiny' modules containing tools for viewing data, visualizing data, understanding missing and outlier values within your data and performing simple data analysis. This extends the 'teal' framework, which supports reproducible research and analysis.

Maintained by Dawid Kaledkowski. Last updated 1 month ago.

general-purpose modules nest shiny

6.1 match 13 stars 9.74 score 71 scripts

omfmartin

zebu:Local Association Measures

Implements the estimation of local (and global) association measures: Lewontin's D, Ducher's Z, pointwise mutual information, normalized pointwise mutual information and chi-squared residuals. The significance of local (and global) association is assessed using p-values estimated by permutations.

Maintained by Olivier M. F. Martin. Last updated 2 years ago.

cpp

14.9 match 3.98 score 19 scripts
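Two of the measures named in the 'zebu' entry have standard closed forms: PMI(x, y) = log[p(x, y) / (p(x) p(y))] and the normalized version NPMI = PMI / (-log p(x, y)), which lies in [-1, 1]. The Python sketch below computes both from raw co-occurrence counts. It only illustrates the textbook definitions, is not the 'zebu' R API, and the helper name `local_pmi` is hypothetical.

```python
import math


def local_pmi(n_xy, n_x, n_y, n):
    """PMI and normalized PMI for one pair of events (hypothetical helper).

    n_xy   : number of observations in which x and y co-occur
    n_x,n_y: marginal counts of x and y
    n      : total number of observations
    Uses the natural log. Assumes n_x > 0, n_y > 0 and n_xy < n.
    """
    p_xy = n_xy / n
    p_x = n_x / n
    p_y = n_y / n
    if p_xy == 0:
        # x and y never co-occur: PMI diverges, NPMI is -1 by convention
        return float("-inf"), -1.0
    pmi = math.log(p_xy / (p_x * p_y))
    npmi = pmi / -math.log(p_xy)  # rescales PMI into [-1, 1]
    return pmi, npmi


# Independent events give PMI = 0; positive association gives PMI > 0.
print(local_pmi(20, 40, 50, 100))
print(local_pmi(30, 40, 50, 100))
```

Normalizing by -log p(x, y) keeps NPMI comparable across rare and frequent event pairs, which raw PMI is not.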

bioc

ramwas:Fast Methylome-Wide Association Study Pipeline for Enrichment Platforms

A complete toolset for methylome-wide association studies (MWAS). It is specifically designed for data from enrichment-based methylation assays, but can be applied to other data as well. The analysis pipeline includes seven steps: (1) scanning aligned reads from BAM files, (2) calculation of quality control measures, (3) creation of methylation score (coverage) matrix, (4) principal component analysis for capturing batch effects and detection of outliers, (5) association analysis with respect to phenotypes of interest while correcting for top PCs and known covariates, (6) annotation of significant findings, and (7) multi-marker analysis (methylation risk score) using elastic net. Additionally, RaMWAS includes tools for joint analysis of methylation and genotype data. This work is published in Bioinformatics, Shabalin et al. (2018) <doi:10.1093/bioinformatics/bty069>.

Maintained by Andrey A Shabalin. Last updated 5 months ago.

dnamethylation sequencing qualitycontrol coverage preprocessing normalization batcheffect principalcomponent differentialmethylation visualization

9.7 match 10 stars 6.08 score 85 scripts

fishr-core-team

FSA:Simple Fisheries Stock Assessment Methods

A variety of simple fish stock assessment methods.

Maintained by Derek H. Ogle. Last updated 2 months ago.

fish fisheries fisheries-management fisheries-stock-assessment population-dynamics stock-assessment

5.3 match 69 stars 11.16 score 1.7k scripts 6 dependents

beerda

nuggets:Extensible Data Pattern Searching Framework

Extensible framework for subgroup discovery (Atzmueller (2015) <doi:10.1002/widm.1144>), contrast patterns (Chen (2022) <doi:10.48550/arXiv.2209.13556>), emerging patterns (Dong (1999) <doi:10.1145/312129.312191>), association rules (Agrawal (1994) <https://www.vldb.org/conf/1994/P487.PDF>) and conditional correlations (Hájek (1978) <doi:10.1007/978-3-642-66943-9>). Both crisp (Boolean, binary) and fuzzy data are supported. It generates conditions in the form of elementary conjunctions, evaluates them on a dataset and checks the induced sub-data for interesting statistical properties. A user-defined function may be supplied to evaluate each generated condition and search for custom patterns.

Maintained by Michal Burda. Last updated 19 days ago.

association-rule-mining contrast-pattern-mining data-mining fuzzy knowledge-discovery pattern-recognition cpp openmp

10.5 match 2 stars 5.38 score 10 scripts

r-spatial

spdep:Spatial Dependence: Weighting Schemes, Statistics

A collection of functions to create spatial weights matrix objects from polygon 'contiguities', from point patterns by distance and tessellations, for summarizing these objects, and for permitting their use in spatial data analysis, including regional aggregation by minimum spanning tree; a collection of tests for spatial 'autocorrelation', including global 'Morans I' and 'Gearys C' proposed by 'Cliff' and 'Ord' (1973, ISBN: 0850860369) and (1981, ISBN: 0850860814), 'Hubert/Mantel' general cross product statistic, Empirical Bayes estimates and 'Assunção/Reis' (1999) <doi:10.1002/(SICI)1097-0258(19990830)18:16%3C2147::AID-SIM179%3E3.0.CO;2-I> Index, 'Getis/Ord' G ('Getis' and 'Ord' 1992) <doi:10.1111/j.1538-4632.1992.tb00261.x> and multicoloured join count statistics, 'APLE' ('Li' et al.) <doi:10.1111/j.1538-4632.2007.00708.x>, local 'Moran's I', 'Gearys C' ('Anselin' 1995) <doi:10.1111/j.1538-4632.1995.tb00338.x> and 'Getis/Ord' G ('Ord' and 'Getis' 1995) <doi:10.1111/j.1538-4632.1995.tb00912.x>, 'saddlepoint' approximations ('Tiefelsdorf' 2002) <doi:10.1111/j.1538-4632.2002.tb01084.x> and exact tests for global and local 'Moran's I' ('Bivand et al.' 2009) <doi:10.1016/j.csda.2008.07.021> and 'LOSH' local indicators of spatial heteroscedasticity ('Ord' and 'Getis') <doi:10.1007/s00168-011-0492-y>. The implementation of most of these measures is described in 'Bivand' and 'Wong' (2018) <doi:10.1007/s11749-018-0599-x>, with further extensions in 'Bivand' (2022) <doi:10.1111/gean.12319>. 'Lagrange' multiplier tests for spatial dependence in linear models are provided ('Anselin et al.' 1996) <doi:10.1016/0166-0462(95)02111-6>, as are 'Rao' score tests for hypothesised spatial 'Durbin' models based on linear models ('Koley' and 'Bera' 2023) <doi:10.1080/17421772.2023.2256810>. A local indicators for categorical data (LICD) implementation based on 'Carrer et al.' (2021) <doi:10.1016/j.jas.2020.105306> and 'Bivand et al.' (2017) <doi:10.1016/j.spasta.2017.03.003> was added in 1.3-7. From 'spdep' and 'spatialreg' versions >= 1.2-1, the model fitting functions previously present in this package are defunct in 'spdep' and may be found in 'spatialreg'.

Maintained by Roger Bivand. Last updated 1 month ago.

spatial-autocorrelation spatial-dependence spatial-weights

3.4 match 131 stars 16.59 score 6.0k scripts 106 dependents
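Global Moran's I, the first autocorrelation test listed for 'spdep', is I = (n / S0) * sum_i sum_j w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2, where S0 is the sum of all spatial weights. The Python sketch below evaluates that formula for a small dense weights matrix. It is not how 'spdep' computes or tests the statistic (no inference is included), the helper name `morans_i` is hypothetical, and the example uses plain binary contiguity weights; row-standardized weights are another common choice.

```python
def morans_i(x, w):
    """Global Moran's I for values x and a dense n-by-n weights matrix w.

    Hypothetical helper: I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2,
    where z_i = x_i - mean(x) and S0 is the sum of all weights.
    """
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    variance_term = sum(zi * zi for zi in z)
    return (n / s0) * (cross / variance_term)


# Four sites on a line; neighbours share an edge (binary contiguity weights).
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], w))  # smooth trend: positive autocorrelation
print(morans_i([1, 0, 1, 0], w))  # alternating values: negative autocorrelation
```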

ralmond

Peanut:Parameterized Bayesian Networks, Abstract Classes

This provides support for learning conditional probability tables parameterized using CPTtools. It provides an object-oriented layer on top of CPTtools to facilitate calculations with parameterized models for Bayesian networks. Peanut is a collection of abstract classes and generic functions defining a protocol, with the intent that the protocol can be implemented with different Bayes net engines. The companion package PNetica provides an implementation using Netica and RNetica.

Maintained by Russell Almond. Last updated 2 years ago.

bayesian-network knowledge-representation

22.7 match 1 stars 2.48 score 4 scripts 2 dependents

zhanxw

seqminer:Efficiently Read Sequence Data (VCF Format, BCF Format, METAL Format and BGEN Format) into R

Integrate sequencing data (Variant call format, e.g. VCF or BCF) or meta-analysis results in R. This package can help you (1) read VCF/BCF/BGEN files by chromosomal ranges (e.g. 1:100-200); (2) read RareMETAL summary statistics files; (3) read tables from tabix-indexed files; (4) annotate VCF/BCF files; (5) create customized workflows based on Makefiles.

Maintained by Xiaowei Zhan. Last updated 6 months ago.

annotation bcf bgen meta-analysis next-generation-sequencing plink sequencing tabix vcf workflow zlib bzip2 libzstd sqlite3 cpp

6.8 match 30 stars 8.29 score 111 scripts 6 dependents

kornl

mutoss:Unified Multiple Testing Procedures

Designed to ease the application and comparison of multiple hypothesis testing procedures for FWER, gFWER, FDR and FDX. Methods are standardized and usable by the accompanying 'mutossGUI'.

Maintained by Kornelius Rohmeyer. Last updated 1 year ago.

6.6 match 4 stars 8.50 score 24 scripts 16 dependents

bioc

omicRexposome:Exposome and omic data association and integration analysis

omicRexposome systematizes the evaluation of associations between exposures and omic data, taking advantage of MultiDataSet for coordinated data management, rexposome for exposome data definition and limma for association testing. It can also integrate exposome and omic data using multiple co-inertia analysis (omicade4) and multi-canonical correlation analysis (PMA).

Maintained by Xavier Escribà Montagut. Last updated 5 months ago.

immunooncology workflowstep multiplecomparison visualization geneexpression differentialexpression differentialmethylation generegulation epigenetics proteomics transcriptomics statisticalmethod regression

13.0 match 4.30 score 5 scripts

stscl

itmsa:Information-Theoretic Measures for Spatial Association

Leveraging information-theoretic measures like mutual information and v-measure to quantify spatial associations between patterns (Nowosad and Stepinski (2018) <doi:10.1080/13658816.2018.1511794>; Bai, H. et al. (2023) <doi:10.1080/24694452.2023.2223700>).

Maintained by Wenbo Lv. Last updated 3 months ago.

cpp

11.1 match 5 stars 5.00 score

statnet

ergm:Fit, Simulate and Diagnose Exponential-Family Models for Networks

An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) <doi:10.18637/jss.v024.i03> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.

Maintained by Pavel N. Krivitsky. Last updated 22 days ago.

3.6 match 100 stars 15.36 score 1.4k scripts 36 dependents

bioc

LEA:LEA: an R package for Landscape and Ecological Association Studies

LEA is an R package dedicated to population genomics, landscape genomics and genotype-environment association tests. LEA can run analyses of population structure and genome-wide tests for local adaptation, and also performs imputation of missing genotypes. The package includes statistical methods for estimating ancestry coefficients from large genotypic matrices and for evaluating the number of ancestral populations (snmf). It performs statistical tests using latent factor mixed models for identifying genetic polymorphisms that exhibit association with environmental gradients or phenotypic traits (lfmm2). In addition, LEA computes values of genetic offset statistics based on new or predicted environments (genetic.gap, genetic.offset). LEA is mainly based on optimized programs that can scale with the dimensions of large data sets.

Maintained by Olivier Francois. Last updated 21 days ago.

software statistical method clustering regression openblas

8.2 match 6.63 score 534 scripts

bioc

Maaslin2:"Multivariable Association Discovery in Population-scale Meta-omics Studies"

MaAsLin2 is a comprehensive R package for efficiently determining multivariable association between clinical metadata and microbial meta'omic features. MaAsLin2 relies on general linear models to accommodate most modern epidemiological study designs, including cross-sectional and longitudinal, and offers a variety of data exploration, normalization, and transformation methods. MaAsLin2 is the next generation of MaAsLin.

Maintained by Lauren McIver. Last updated 5 months ago.

metagenomics software microbiome normalization biobakery bioconductor differential-abundance-analysis false-discovery-rate multiple-covariates public repeated-measures tools

4.9 match 133 stars 11.03 score 532 scripts 3 dependents

bioc

PCAN:Phenotype Consensus ANalysis (PCAN)

Phenotype comparison based on a pathway consensus approach. Assesses the relationship between candidate genes and a set of phenotypes based on additional genes related to the candidate (e.g. pathways or network neighbors).

Maintained by Matthew Page. Last updated 5 months ago.

annotation sequencing genetics functionalprediction variantannotation pathways network

13.0 match 4.15 score 7 scripts

biodiverse

unmarked:Models for Data from Unmarked Animals

Fits hierarchical models of animal abundance and occurrence to data collected using survey methods such as point counts, site occupancy sampling, distance sampling, removal sampling, and double observer sampling. Parameters governing the state and observation processes can be modeled as functions of covariates. References: Kellner et al. (2023) <doi:10.1111/2041-210X.14123>, Fiske and Chandler (2011) <doi:10.18637/jss.v043.i10>.

Maintained by Ken Kellner. Last updated 13 days ago.

openblas cpp openmp

4.1 match 4 stars 13.02 score 652 scripts 12 dependents

danchaltiel

crosstable:Crosstables for Descriptive Analyses

Create descriptive tables for continuous and categorical variables. Apply summary statistics and counting functions, with or without a grouping variable, and create beautiful reports using 'rmarkdown' or 'officer'. You can also compute effect sizes and statistical tests if needed.

Maintained by Dan Chaltiel. Last updated 2 months ago.

descriptive-statistics flextable frequency-table html-report msword officer

5.1 match 116 stars 10.40 score 340 scripts

j-mitchel

scITD:Single-Cell Interpretable Tensor Decomposition

Single-cell Interpretable Tensor Decomposition (scITD) employs the Tucker tensor decomposition to extract multicell-type gene expression patterns that vary across donors/individuals. This tool is geared for use with single-cell RNA-sequencing datasets consisting of many source donors. The method has a wide range of potential applications, including the study of inter-individual variation at the population-level, patient sub-grouping/stratification, and the analysis of sample-level batch effects. Each "multicellular process" that is extracted consists of (A) a multi cell type gene loadings matrix and (B) a corresponding donor scores vector indicating the level at which the corresponding loadings matrix is expressed in each donor. Additional methods are implemented to aid in selecting an appropriate number of factors and to evaluate stability of the decomposition. Additional tools are provided for downstream analysis, including integration of gene set enrichment analysis and ligand-receptor analysis. Tucker, L.R. (1966) <doi:10.1007/BF02289464>. Unkel, S., Hannachi, A., Trendafilov, N. T., & Jolliffe, I. T. (2011) <doi:10.1007/s13253-011-0055-9>. Zhou, G., & Cichocki, A. (2012) <doi:10.2478/v10175-012-0051-4>.

Maintained by Jonathan Mitchel. Last updated 2 years ago.

cpp

26.6 match 1.98 score 19 scripts

ozancinar

poolr:Methods for Pooling P-Values from (Dependent) Tests

Functions for pooling/combining the results (i.e., p-values) from (dependent) hypothesis tests. Included are Fisher's method, Stouffer's method, the inverse chi-square method, the Bonferroni method, Tippett's method, and the binomial test. Each method can be adjusted based on an estimate of the effective number of tests or using empirically derived null distribution using pseudo replicates. For Fisher's, Stouffer's, and the inverse chi-square method, direct generalizations based on multivariate theory are also available (leading to Brown's method, Strube's method, and the generalized inverse chi-square method). An introduction can be found in Cinar and Viechtbauer (2022) <doi:10.18637/jss.v101.i01>.

Maintained by Ozan Cinar. Last updated 18 days ago.

8.3 match 12 stars 6.32 score 145 scripts 1 dependent
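For independent tests, the two best-known combination rules in the 'poolr' entry reduce to short formulas: Fisher's method refers X = -2 * sum(log p_i) to a chi-square distribution with 2k degrees of freedom, and Stouffer's method refers Z = sum(Phi^-1(1 - p_i)) / sqrt(k) to a standard normal. The Python sketch below implements only these unadjusted versions; it omits the effective-number-of-tests and dependence adjustments that 'poolr' provides, is not the 'poolr' R API, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_combined(pvals):
    """Fisher's method for k independent p-values (hypothetical helper).

    X = -2 * sum(log p_i) is chi-square with 2k df under the global null.
    For even df the survival function has the closed form
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    Assumes every p lies strictly between 0 and 1.
    """
    k = len(pvals)
    half = -sum(math.log(p) for p in pvals)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # (x/2)^i / i!, built incrementally
        total += term
    return math.exp(-half) * total


def stouffer_combined(pvals):
    """Unweighted Stouffer Z method for independent one-sided p-values."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvals) / math.sqrt(len(pvals))
    return 1.0 - nd.cdf(z)


pvals = [0.04, 0.10, 0.30]
print(fisher_combined(pvals))
print(stouffer_combined(pvals))
```

Using the even-degrees-of-freedom closed form keeps the sketch dependency-free; with a statistics library one would call its chi-square survival function instead.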

bioc

xcms:LC-MS and GC-MS Data Analysis

Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.

Maintained by Steffen Neumann. Last updated 18 days ago.

immunooncology massspectrometry metabolomics bioconductor feature-detection mass-spectrometry peak-detection cpp

3.7 match 196 stars 14.31 score 984 scripts 11 dependents

r-forge

survey:Analysis of Complex Survey Samples

Summary statistics, two-sample tests, rank tests, generalised linear models, cumulative link models, Cox models, loglinear models, and general maximum pseudolikelihood estimation for multistage stratified, cluster-sampled, unequally weighted survey samples. Variances by Taylor series linearisation or replicate weights. Post-stratification, calibration, and raking. Two-phase and multiphase subsampling designs. Graphics. PPS sampling without replacement. Small-area estimation. Dual-frame designs.

Maintained by Thomas Lumley. Last updated 4 days ago.

cpp

3.8 match 1 star 13.94 score 13k scripts 234 dependents

barakbri

repfdr:Replicability Analysis for Multiple Studies of High Dimension

Estimation of Bayes and local Bayes false discovery rates for replicability analysis (Heller & Yekutieli, 2014 <doi:10.1214/13-AOAS697>; Heller et al., 2015 <doi:10.1093/bioinformatics/btu434>).

Maintained by Ruth Heller. Last updated 8 years ago.

cpp

10.5 match 3 stars 4.98 score 16 scripts

bioc

HIBAG:HLA Genotype Imputation with Attribute Bagging

Imputes HLA classical alleles using GWAS SNP data, and it relies on a training set of HLA and SNP genotypes. HIBAG can be used by researchers with published parameter estimates instead of requiring access to large training sample datasets. It combines the concepts of attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types. Attribute bagging is a technique which improves the accuracy and stability of classifier ensembles using bootstrap aggregating and random variable selection.

Maintained by Xiuwen Zheng. Last updated 4 months ago.

genetics statisticalmethod bioinformatics gpu hla imputation mhc snp cpp

6.3 match 30 stars 8.24 score 48 scripts

bioc

SANTA:Spatial Analysis of Network Associations

This package provides methods for measuring the strength of association between a network and a phenotype. It does this by measuring clustering of the phenotype across the network (Knet). Vertices can also be individually ranked by their strength of association with high-weight vertices (Knode).

Maintained by Alex Cornish. Last updated 5 months ago.

network networkenrichment clustering

10.4 match 5.02 score 6 scripts

shixiangwang

sigminer:Extract, Analyze and Visualize Mutational Signatures for Genomic Variations

Genomic alterations, including single nucleotide substitutions, copy number alterations, etc., are the major drivers of cancer initiation and development. Due to the specificity of molecular lesions caused by genomic alterations, we can generate characteristic alteration spectra, called 'signature' (Wang, Shixiang, et al. (2021) <DOI:10.1371/journal.pgen.1009557> & Alexandrov, Ludmil B., et al. (2020) <DOI:10.1038/s41586-020-1943-3> & Steele Christopher D., et al. (2022) <DOI:10.1038/s41586-022-04738-6>). This package helps users to extract, analyze and visualize signatures from genomic alteration records, thus providing new insight into cancer study.

Maintained by Shixiang Wang. Last updated 6 months ago.

bayesian-nmf bioinformatics cancer-research cnv copynumber-signatures cosmic-signatures dbs easy-to-use indel mutational-signatures nmf nmf-extraction sbs signature-extraction somatic-mutations somatic-variants visualization cpp

5.4 match 150 stars 9.48 score 123 scripts 2 dependents

vincentarelbundock

countrycode:Convert Country Names and Country Codes

Standardize country names, convert them into one of 40 different coding schemes, convert between coding schemes, and assign region descriptors.

Maintained by Vincent Arel-Bundock. Last updated 13 hours ago.

3.4 match 348 stars 14.89 score 6.3k scripts 119 dependents

maelstrom-research

madshapR:Support Technical Processes Following 'Maelstrom Research' Standards

Functions to support rigorous processes in data cleaning, evaluation, and documentation across datasets from different studies based on Maelstrom Research guidelines. The package includes the core functions to evaluate and format the main inputs that define the process, diagnose errors, and summarize and evaluate datasets and their associated data dictionaries. The main outputs are clean datasets and associated metadata, and tabular and visual summary reports. The processes follow the Maelstrom Research guidelines for rigorous retrospective data harmonization (Fortier I et al. (2017) <doi:10.1093/ije/dyw075>).

Maintained by Guillaume Fabre. Last updated 11 months ago.

9.4 match 2 stars 5.40 score 28 scripts 3 dependents

vp-biostat

comorbidPGS:Assessing Predisposition Between Phenotypes using Polygenic Scores

Using polygenic scores (PGS, or PRS/GRS for binary outcomes), this package allows users to investigate shared predisposition between different conditions, perform fast association analyses, and export plots and views of the PGS distribution as 'ggplot2' objects.

Maintained by Vincent Pascat. Last updated 3 months ago.

13.4 match 3 stars 3.78 score 3 scripts

bioc

consensusSeekeR:Detection of consensus regions inside a group of experiments using genomic positions and genomic ranges

This package compares genomic positions and genomic ranges from multiple experiments to extract common regions. The size of the analyzed region is adjustable, as is the number of experiments in which a feature must be present in a potential region to tag this region as a consensus region. In genomic analysis where feature identification generates a position value surrounded by a genomic range, such as ChIP-Seq peaks and nucleosome positions, the replication of an experiment may result in slight differences between predicted values. This package enables the conciliation of the results into consensus regions.

Maintained by Astrid Deschênes. Last updated 5 months ago.

biologicalquestion chipseq genetics multiplecomparison transcription peakdetection sequencing coverage chip-seq-analysis genomic-data-analysis nucleosome-positioning

9.6 match 1 star 5.26 score 5 scripts 1 dependent

cran

vcd:Visualizing Categorical Data

Visualization techniques, data sets, summary and inference procedures aimed particularly at categorical data. Special emphasis is given to highly extensible grid graphics. The package was originally inspired by the book "Visualizing Categorical Data" by Michael Friendly and is now the main support package for a new book, "Discrete Data Analysis with R" by Michael Friendly and David Meyer (2015).

Maintained by David Meyer. Last updated 7 months ago.

6.0 match 5 stars 8.19 score 87 dependents

blasbenito

collinear:Automated Multicollinearity Management

Effortless multicollinearity management in data frames with both numeric and categorical variables for statistical and machine learning applications. The package simplifies multicollinearity analysis by combining four robust methods: 1) target encoding for categorical variables (Micci-Barreca, D. 2001 <doi:10.1145/507533.507538>); 2) automated feature prioritization to prevent key variable loss during filtering; 3) pairwise correlation for all variable combinations (numeric-numeric, numeric-categorical, categorical-categorical); and 4) fast computation of variance inflation factors.

Maintained by Blas M. Benito. Last updated 3 months ago.

machine-learning multicollinearity statistics

8.9 match 11 stars 5.51 score 15 scripts 1 dependent

pbs-software

PBSmodelling:GUI Tools Made Easy: Interact with Models and Explore Data

Provides software to facilitate the design, testing, and operation of computer models. It focuses particularly on tools that make it easy to construct and edit a customized graphical user interface ('GUI'). Although our simplified 'GUI' language depends heavily on the R interface to the 'Tcl/Tk' package, a user does not need to know 'Tcl/Tk'. Examples illustrate models built with other R packages, including 'PBSmapping', 'PBSddesolve', and 'BRugs'. A complete user's guide 'PBSmodelling-UG.pdf' shows how to use this package effectively.

Maintained by Rowan Haigh. Last updated 5 months ago.

7.3 match 2 stars 6.76 score 120 scripts 4 dependents

cmmr

rbiom:Read/Write, Analyze, and Visualize 'BIOM' Data

A toolkit for working with Biological Observation Matrix ('BIOM') files. Read/write all 'BIOM' formats. Compute rarefaction, alpha diversity, and beta diversity (including 'UniFrac'). Summarize counts by taxonomic level. Subset based on metadata. Generate visualizations and statistical analyses. CPU intensive operations are coded in C for speed.

Maintained by Daniel P. Smith. Last updated 14 days ago.

5.4 match 15 stars 9.07 score 117 scripts 6 dependents

robindenz1

simDAG:Simulate Data from a DAG and Associated Node Information

Simulate complex data from a given directed acyclic graph and information about each individual node. Root nodes are simply sampled from the specified distribution. Child nodes are simulated according to one of many implemented regressions, such as logistic regression, linear regression, Poisson regression and more. Also includes a comprehensive framework for discrete-time simulation, which can generate even more complex longitudinal data.

Maintained by Robin Denz. Last updated 2 days ago.

causal-inference directed-acyclic-graph simulation

6.4 match 11 stars 7.69 score 77 scripts

apxr

analyzer:Data Analysis and Automated R Notebook Generation

Easy data analysis and quality checks which are commonly used in data science. It combines the tabular and graphical visualization for easier usability. This package also creates an R Notebook with detailed data exploration with one function call. The notebook can be made interactive.

Maintained by Apurv Priyam. Last updated 5 years ago.

11.8 match 4.13 score 27 scripts

tgrimes

SeqNet:Generate RNA-Seq Data from Gene-Gene Association Networks

Methods to generate random gene-gene association networks and simulate RNA-seq data from them, as described in Grimes and Datta (2021) <doi:10.18637/jss.v098.i12>. Includes functions to generate random networks of any size and perturb them to obtain differential networks. Network objects are built from individual, overlapping modules that represent pathways. The resulting network has various topological properties that are characteristic of gene regulatory networks. RNA-seq data can be generated such that the association among gene expression profiles reflects the underlying network. A reference RNA-seq dataset can be provided to model realistic marginal distributions. Plotting functions are available to visualize a network, compare two networks, and compare the expression of two genes across multiple networks.

Maintained by Tyler Grimes. Last updated 4 years ago.

cpp

17.3 match 2.82 score 22 scripts 1 dependent

usdaforestservice

gdalraster:Bindings to the 'Geospatial Data Abstraction Library' Raster API

Interface to the Raster API of the 'Geospatial Data Abstraction Library' ('GDAL', <https://gdal.org>). Bindings are implemented in an exposed C++ class encapsulating a 'GDALDataset' and its raster band objects, along with several stand-alone functions. These support manual creation of uninitialized datasets, creation from existing raster as template, read/set dataset parameters, low level I/O, color tables, raster attribute tables, virtual raster (VRT), and 'gdalwarp' wrapper for reprojection and mosaicing. Includes 'GDAL' algorithms ('dem_proc()', 'polygonize()', 'rasterize()', etc.), and functions for coordinate transformation and spatial reference systems. Calling signatures resemble the native C, C++ and Python APIs provided by the 'GDAL' project. Includes raster 'calc()' to evaluate a given R expression on a layer or stack of layers, with pixel x/y available as variables in the expression; and raster 'combine()' to identify and count unique pixel combinations across multiple input layers, with optional output of the pixel-level combination IDs. Provides raster display using base 'graphics'. Bindings to a subset of the 'OGR' API are also included for managing vector data sources. Bindings to a subset of the Virtual Systems Interface ('VSI') are also included to support operations on 'GDAL' virtual file systems. These are general utility functions that abstract file system operations on URLs, cloud storage services, 'Zip'/'GZip'/'7z'/'RAR' archives, and in-memory files. 'gdalraster' may be useful in applications that need scalable, low-level I/O, or prefer a direct 'GDAL' API.

Maintained by Chris Toney. Last updated 2 hours ago.

gdal geospatial raster vector cpp

5.1 match 41 stars 9.49 score 32 scripts 3 dependents

yulab-smu

aplot:Decorate a 'ggplot' with Associated Information

Often, users need more than the plot alignment that 'cowplot' and 'patchwork' provide: they want to align associated information, which requires axes to be exactly matched across subplots (e.g. hierarchical clustering with a heatmap). Inspired by 'Method 2' in 'ggtree' (G Yu (2018) <doi:10.1093/molbev/msy194>), 'aplot' provides utilities to align associated subplots to a main plot at different sides (left, right, top and bottom) with axes exactly matched.

Maintained by Guangchuang Yu. Last updated 1 month ago.

3.9 match 103 stars 12.25 score 520 scripts 118 dependents

rstudio

keras3:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.

Maintained by Tomasz Kalinowski. Last updated 11 days ago.

3.5 match 845 stars 13.63 score 264 scripts 2 dependents

mllaberia

Rtapas:Random Tanglegram Partitions

Applies a given global-fit method to random partial tanglegrams of a fixed size to identify the associations, terminals, and nodes that maximize phylogenetic (in)congruence. It also includes functions to compute more easily the confidence intervals of classification metrics and plot results, reducing computational time. See "Llaberia-Robledillo et al. (2023, <doi:10.1093/sysbio/syad016>)".

Maintained by Mar Llaberia-Robledillo. Last updated 10 months ago.

12.9 match 5 stars 3.70 score 9 scripts

patzaw

BED:Biological Entity Dictionary (BED)

An interface for the 'Neo4j' database providing mapping between different identifiers of biological entities. This Biological Entity Dictionary (BED) has been developed to address three main challenges. The first one is related to the completeness of identifier mappings. Indeed, direct mapping information provided by the different systems is not always complete and can be enriched by mappings provided by other resources. More interestingly, direct mappings not identified by any of these resources can be indirectly inferred by using mappings to a third reference. For example, many human Ensembl gene IDs are not directly mapped to any Entrez gene ID, but such mappings can be inferred using respective mappings to HGNC ID. The second challenge is related to the mapping of deprecated identifiers. Indeed, entity identifiers can change from one resource release to another. The identifier history is provided by some resources, such as Ensembl or the NCBI, but it is generally not used by mapping tools. The third challenge is related to the automation of the mapping process according to the relationships between the biological entities of interest. Indeed, mapping between gene and protein ID scopes should not be done the same way as between two scopes regarding gene ID. Also, converting identifiers from different organisms should be possible using gene orthologs information. The method has been published by Godard and van Eyll (2018) <doi:10.12688/f1000research.13925.3>.

Maintained by Patrice Godard. Last updated 4 months ago.

6.9 match 8 stars 6.85 score 25 scripts

jedalong

wildlifeDI:Calculate Indices of Dynamic Interaction for Wildlife Tracking Data

Dynamic interaction refers to spatial-temporal associations in the movements of two (or more) animals. This package provides tools for calculating a suite of indices used for quantifying dynamic interaction with wildlife telemetry data. For more information on each of the methods employed, see the references within. The package (as of version >= 0.3) also has new tools for automating contact analysis in large tracking datasets. The package (as of version 1.0) uses the 'move2' class of objects for working with tracking datasets.

Maintained by Jed Long. Last updated 13 days ago.

7.2 match 16 stars 6.60 score 31 scripts

acorg

Racmacs:Antigenic Cartography Macros

A toolkit for making antigenic maps from immunological assay data, in order to quantify and visualize antigenic differences between different pathogen strains as described in Smith et al. (2004) <doi:10.1126/science.1097211> and used in the World Health Organization influenza vaccine strain selection process. Additional functions allow for the diagnostic evaluation of antigenic maps and an interactive viewer is provided to explore antigenic relationships amongst several strains and incorporate the visualization of associated genetic information.

Maintained by Sam Wilks. Last updated 9 months ago.

openblas cpp openmp

5.9 match 21 stars 8.06 score 362 scripts

r-forge

Sleuth3:Data Sets from Ramsey and Schafer's "Statistical Sleuth (3rd Ed)"

Data sets from Ramsey, F.L. and Schafer, D.W. (2013), "The Statistical Sleuth: A Course in Methods of Data Analysis (3rd ed)", Cengage Learning.

Maintained by Berwin A Turlach. Last updated 1 year ago.

7.5 match 6.29 score 522 scripts

ready4-dev

ready4:Develop and Use Modular Health Economic Models

A template model module, tools to help find model modules derived from this template and a programming syntax to use these modules in health economic analyses. These elements are the foundation for a prototype software framework for developing living and transferable models and using those models in reproducible health economic analyses. The software framework is extended by other R libraries. For detailed documentation about the framework and how to use it visit <https://www.ready4-dev.com/>. For a background to the methodological issues that the framework is attempting to help solve, see Hamilton et al. (2024) <doi:10.1007/s40273-024-01378-8>.

Maintained by Matthew Hamilton. Last updated 5 months ago.

computational-modeling health-economics software-framework

6.9 match 2 stars 6.84 score 95 scripts

bioc

coMethDMR:Accurate identification of co-methylated and differentially methylated regions in epigenome-wide association studies

coMethDMR identifies genomic regions associated with continuous phenotypes by optimally leveraging covariations among CpGs within predefined genomic regions. Instead of testing all CpGs within a genomic region, coMethDMR carries out an additional step that first selects co-methylated sub-regions without using any outcome information. Next, coMethDMR tests the association between methylation within the sub-region and the continuous phenotype using a random coefficient mixed effects model, which models both variations between CpG sites within the region and differential methylation simultaneously.

Maintained by Fernanda Veitzman. Last updated 5 months ago.

dnamethylation epigenetics methylationarray differentialmethylation genomewideassociation

7.3 match 7 stars 6.47 score 42 scripts

drjphughesjr

hash:Full Featured Implementation of Hash Tables/Associative Arrays/Dictionaries

Implements a data structure similar to hashes in Perl and dictionaries in Python but with a purposefully R flavor. For objects of appreciable size, access using hashes outperforms native named lists and vectors.

Maintained by John Hughes. Last updated 2 years ago.

6.2 match 1 star 7.54 score 4.0k scripts 50 dependents

bioc

pathwayPCA:Integrative Pathway Analysis with Modern PCA Methodology and Gene Selection

pathwayPCA is an integrative analysis tool that implements the principal component analysis (PCA) based pathway analysis approaches described in Chen et al. (2008), Chen et al. (2010), and Chen (2011). pathwayPCA allows users to: (1) Test pathway association with binary, continuous, or survival phenotypes. (2) Extract relevant genes in the pathways using the SuperPCA and AES-PCA approaches. (3) Compute principal components (PCs) based on the selected genes. These estimated latent variables represent pathway activities for individual subjects, which can then be used to perform integrative pathway analysis, such as multi-omics analysis. (4) Extract relevant genes that drive pathway significance as well as data corresponding to these relevant genes for additional in-depth analysis. (5) Perform analyses with enhanced computational efficiency with parallel computing and enhanced data safety with S4-class data objects. (6) Analyze studies with complex experimental designs, with multiple covariates, and with interaction effects, e.g., testing whether pathway association with clinical phenotype is different between male and female subjects. Citations: Chen et al. (2008) <https://doi.org/10.1093/bioinformatics/btn458>; Chen et al. (2010) <https://doi.org/10.1002/gepi.20532>; and Chen (2011) <https://doi.org/10.2202/1544-6115.1697>.

Maintained by Gabriel Odom. Last updated 5 months ago.

copynumbervariation dnamethylation geneexpression snp transcription geneprediction genesetenrichment genesignaling genetarget genomewideassociation genomicvariation cellbiology epigenetics functionalgenomics genetics lipidomics metabolomics proteomics systemsbiology transcriptomics classification dimensionreduction featureextraction principalcomponent regression survival multiplecomparison pathways

6.0 match 11 stars 7.74 score 42 scripts

satijalab

SeuratObject:Data Structures for Single Cell Data

Defines S4 classes for single-cell genomic data and associated information, such as dimensionality reduction embeddings, nearest-neighbor graphs, and spatially-resolved coordinates. Provides data access methods and R-native hooks to ensure the Seurat object is familiar to other R users. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, and Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031> for more details.

Maintained by Paul Hoffman. Last updated 2 years ago.

cpp

3.9 match 25 stars 11.69 score 1.2k scripts 88 dependents

fhdsl

metricminer:Mine Metrics from Common Places on the Web

Mine metrics from common places on the web through the power of their APIs (application programming interfaces). It also helps put the data into a format that is easily used for a dashboard or other purposes. An associated dashboard template and tutorials, currently under development, help you fully utilize 'metricminer'.

Maintained by Candace Savonen. Last updated 7 days ago.

edtech-software

7.5 match 2 stars 6.13 score 21 scripts

great-northern-diver

loon:Interactive Statistical Data Visualization

An extendable toolkit for interactive data visualization and exploration.

Maintained by R. Wayne Oldford. Last updated 2 years ago.

data-analysis data-science data-visualization exploratory-analysis exploratory-data-analysis high-dimensional-data interactive-graphics interactive-visualizations loon python statistical-analysis statistical-graphics statistics tcl-extension tk

5.1 match 48 stars 9.00 score 93 scripts 5 dependents

bioc

knowYourCG:Functional analysis of DNA methylome datasets

KnowYourCG (KYCG) is a supervised learning framework designed for the functional analysis of DNA methylation data. Unlike existing tools that focus on genes or genomic intervals, KnowYourCG directly targets CpG dinucleotides, featuring automated supervised screenings of diverse biological and technical influences, including sequence motifs, transcription factor binding, histone modifications, replication timing, cell-type-specific methylation, and trait-epigenome associations. KnowYourCG addresses the challenges of data sparsity in various methylation datasets, including low-pass Nanopore sequencing, single-cell DNA methylomes, 5-hydroxymethylation profiles, spatial DNA methylation maps, and array-based datasets for epigenome-wide association studies and epigenetic clocks.

Maintained by Goldberg David. Last updated 3 months ago.

epigenetics dnamethylation sequencing singlecell spatial methylationarray zlib

7.5 match 2 stars 6.10 score 4 scripts

bioc

maftools:Summarize, Analyze and Visualize MAF Files

Analyze and visualize Mutation Annotation Format (MAF) files from large-scale sequencing studies. This package provides various functions to perform the most commonly used analyses in cancer genomics and to create feature-rich, customizable visualizations with minimal effort.

Maintained by Anand Mayakonda. Last updated 5 months ago.

datarepresentation dnaseq visualization drivermutation variantannotation featureextraction classification somaticmutation sequencing functionalgenomics survival bioinformatics cancer-genome-atlas cancer-genomics genomics maf-files tcga curl bzip2 xz-utils zlib

3.1 match 461 stars 14.59 score 948 scripts 18 dependents

bioc

lisaClust:lisaClust: Clustering of Local Indicators of Spatial Association

lisaClust provides a series of functions to identify and visualise regions of tissue where spatial associations between cell types are similar. This package can be used to provide a high-level summary of cell-type colocalization in multiplexed imaging data that has been segmented at single-cell resolution.

Maintained by Ellis Patrick. Last updated 4 months ago.

singlecell cellbasedassays spatial

6.9 match 3 stars 6.64 score 48 scripts

asalavaty

influential:Identification and Classification of the Most Influential Nodes

Contains functions for the classification and ranking of top candidate features, reconstruction of networks from adjacency matrices and data frames, analysis of the topology of the network and calculation of centrality measures, and identification of the most influential nodes. Also, a function is provided for running the SIRIR model, which combines the leave-one-out cross-validation technique with the conventional SIR model, on a network to rank the true influence of vertices in an unsupervised manner. Additionally, functions are provided for assessing the dependence and correlation of two network centrality measures, as well as the conditional probability of their deviating from their corresponding means in opposite directions. Fred Viole and David Nawrocki (2013, ISBN:1490523995). Csardi G, Nepusz T (2006). "The igraph software package for complex network research." InterJournal, Complex Systems, 1695. Adopted algorithms and sources are referenced in the function documentation.

Maintained by Adrian Salavaty. Last updated 6 months ago.

centrality-measures classification-model influence-ranking network-analysis priaritization-model

7.0 match 27 stars 6.54 score 43 scripts 1 dependent

emf-creaf

indicspecies:Relationship Between Species and Groups of Sites

Functions to assess the strength and statistical significance of the relationship between species occurrence/abundance and groups of sites [De Caceres & Legendre (2009) <doi:10.1890/08-1823.1>]. Also includes functions to measure species niche breadth using resource categories [De Caceres et al. (2011) <doi:10.1111/J.1600-0706.2011.19679.x>].

Maintained by Miquel De Cáceres. Last updated 1 month ago.

4.7 match 10 stars 9.66 score 386 scripts 4 dependents

bioc

CNVRanger:Summarization and expression/phenotype association of CNV ranges

The CNVRanger package implements a comprehensive tool suite for CNV analysis. This includes functionality for summarizing individual CNV calls across a population, assessing overlap with functional genomic regions, and association analysis with gene expression and quantitative phenotypes.

Maintained by Ludwig Geistlinger. Last updated 5 months ago.

copynumbervariation differentialexpression geneexpression genomewideassociation genomicvariation microarray rnaseq snp bioconductor-package u24ca289073

7.8 match 7 stars 5.77 score 12 scripts

covaruber

sommer:Solving Mixed Model Equations in R

Structural multivariate-univariate linear mixed model solver for estimation of multiple random effects with unknown variance-covariance structures (e.g., heterogeneous and unstructured) and known covariance among levels of random effects (e.g., pedigree and genomic relationship matrices) (Covarrubias-Pazaran, 2016 <doi:10.1371/journal.pone.0156744>; Maier et al., 2015 <doi:10.1016/j.ajhg.2014.12.006>; Jensen et al., 1997). REML estimates can be obtained using the Direct-Inversion Newton-Raphson and Direct-Inversion Average Information algorithms for the problems r x r (r being the number of records) or using the Henderson-based average information algorithm for the problem c x c (c being the number of coefficients to estimate). Spatial models can also be fitted using the two-dimensional spline functionality available.

Maintained by Giovanny Covarrubias-Pazaran. Last updated 5 days ago.

average-information mixed-models rcpparmadillo openblas cpp openmp

3.5 match 44 stars 12.63 score 300 scripts 10 dependents

sem-in-r

seminr:Building and Estimating Structural Equation Models

A powerful, easy-to-use syntax for specifying and estimating complex Structural Equation Models. Models can be estimated using Partial Least Squares Path Modeling, Covariance-Based Structural Equation Modeling, or covariance-based Confirmatory Factor Analysis. Methods described in Ray, Danks, and Valdez (2021).

Maintained by Nicholas Patrick Danks. Last updated 3 years ago.

common-factors composites construct pls-models

6.0 match 62 stars 7.46 score 284 scripts

sth1402

CNVreg:CNV-Profile Regression for Copy Number Variants Association Analysis with Penalized Regression

Performs copy number variant association analysis with Lasso and Weighted Fusion penalized regression. Creates a "CNV profile curve" to represent an individual’s CNV events across a genomic region so as to capture variations in CNV length and dosage. When evaluating association, the CNV profile curve is directly used as a predictor in the regression model, avoiding the need to predefine CNV loci. CNV profile regression estimates CNV effects at each genome position, making the results comparable across different studies. The penalization encourages sparsity in variable selection with a Lasso penalty and encourages effect smoothness between consecutive CNV events with a weighted fusion penalty, where the weight controls the level of smoothing between adjacent CNVs. For more details, see Si (2024) <doi:10.1101/2024.11.23.624994>.

Maintained by Shannon T. Holloway. Last updated 22 days ago.

16.5 match 2.70 score

cran

hytest:Hypothesis Testing Based on Neyman-Pearson Lemma and Likelihood Ratio Test

Type I error and optimal critical values for testing statistical hypotheses based on the Neyman-Pearson lemma and the likelihood ratio test, using random samples from several distributions. The families of distributions are Bernoulli, Exponential, Geometric, Inverse Normal, Normal, Gamma, Gumbel, Lognormal, Poisson, and Weibull. This package is an ideal resource to help with the teaching of Statistics. The main references for this package are Casella G. and Berger R. (2003, ISBN:0-534-24312-6, "Statistical Inference. Second Edition", Duxbury Press) and Hogg, R., McKean, J., and Craig, A. (2019, ISBN:013468699, "Introduction to Mathematical Statistics. Eighth edition", Pearson).

Maintained by Carlos Alberto Cardozo Delgado. Last updated 7 months ago.

34.1 match 1.30 score

bioc

PanomiR:Detection of miRNAs that regulate interacting groups of pathways

PanomiR is a package to detect miRNAs that target groups of pathways from gene expression data. This package provides functionality for generating pathway activity profiles, determining differentially activated pathways between user-specified conditions, determining clusters of pathways via the PCxN package, and generating miRNAs targeting clusters of pathways. These functions can be used separately or sequentially to analyze RNA-Seq data.

Maintained by Pourya Naderi. Last updated 5 months ago.

geneexpression genesetenrichment genetarget mirna pathways

9.0 match 3 stars 4.89 score 13 scripts

kkholst

lava:Latent Variable Models

A general implementation of Structural Equation Models with latent variables (MLE, 2SLS, and composite likelihood estimators) with continuous, censored, and ordinal outcomes (Holst and Budtz-Joergensen (2013) <doi:10.1007/s00180-012-0344-y>). Mixture latent variable models and non-linear latent variable models (Holst and Budtz-Joergensen (2020) <doi:10.1093/biostatistics/kxy082>). The package also provides methods for graph exploration (d-separation, back-door criterion), simulation of general non-linear latent variable models, and estimation of influence functions for a broad range of statistical models.

Maintained by Klaus K. Holst. Last updated 3 months ago.

latent-variable-models simulation statistics structural-equation-models

3.4 match 33 stars 12.87 score 610 scripts 478 dependents

friendly

vcdExtra:'vcd' Extensions and Additions

Provides additional data sets, methods and documentation to complement the 'vcd' package for Visualizing Categorical Data and the 'gnm' package for Generalized Nonlinear Models. In particular, 'vcdExtra' extends mosaic, assoc and sieve plots from 'vcd' to handle 'glm()' and 'gnm()' models and adds a 3D version in 'mosaic3d'. Additionally, methods are provided for comparing and visualizing lists of 'glm' and 'loglm' objects. This package is now a support package for the book, "Discrete Data Analysis with R" by Michael Friendly and David Meyer.

Maintained by Michael Friendly. Last updated 8 days ago.

categorical-data-visualization generalized-linear-models mosaic-plots

4.0 match 24 stars 10.85 score 472 scripts 3 dependents

billdenney

PKNCA:Perform Pharmacokinetic Non-Compartmental Analysis

Compute standard Non-Compartmental Analysis (NCA) parameters for typical pharmacokinetic analyses and summarize them.

Maintained by Bill Denney. Last updated 1 month ago.

nca noncompartmental-analysis pharmacokinetics

3.5 match 73 stars 12.53 score 214 scripts 4 dependents

ncss-tech

sharpshootR:A Soil Survey Toolkit

A collection of data processing, visualization, and export functions to support soil survey operations. Many of the functions build on the `SoilProfileCollection` S4 class provided by the aqp package, extending baseline visualization to more elaborate depictions in the context of spatial and taxonomic data. While this package is primarily developed by and for the USDA-NRCS, in support of the National Cooperative Soil Survey, the authors strive for generalization sufficient to support any soil survey operation. Many of the included functions are used by the SoilWeb suite of websites and mobile applications. These functions are provided here, with additional documentation, to enable others to replicate high-quality versions of these figures for their own purposes.

Maintained by Dylan Beaudette. Last updated 28 days ago.

5.2 match 18 stars 8.37 score 327 scripts

r-forge

Sleuth2:Data Sets from Ramsey and Schafer's "Statistical Sleuth (2nd Ed)"

Data sets from Ramsey, F.L. and Schafer, D.W. (2002), "The Statistical Sleuth: A Course in Methods of Data Analysis (2nd ed)", Duxbury.

Maintained by Berwin A Turlach. Last updated 1 year ago.

7.5 match 5.79 score 191 scripts

alexkowa

EnvStats:Package for Environmental Statistics, Including US EPA Guidance

Graphical and statistical analyses of environmental data, with focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book "EnvStats: An R Package for Environmental Statistics" (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, <doi:10.1007/978-1-4614-8456-1>).

Maintained by Alexander Kowarik. Last updated 1 day ago.

3.4 match 26 stars 12.85 score 2.4k scripts 47 dependents

ohdsi

DatabaseConnector:Connecting to Various Database Platforms

An R 'DataBase Interface' ('DBI') compatible interface to various database platforms ('PostgreSQL', 'Oracle', 'Microsoft SQL Server', 'Amazon Redshift', 'Microsoft Parallel Database Warehouse', 'IBM Netezza', 'Apache Impala', 'Google BigQuery', 'Snowflake', 'Spark', 'SQLite', and 'InterSystems IRIS'). Also includes support for fetching data as 'Andromeda' objects. Uses either 'Java Database Connectivity' ('JDBC') or other 'DBI' drivers to connect to databases.

Maintained by Martijn Schuemie. Last updated 2 months ago.

hades openjdk

3.6 match 56 stars 11.94 score 772 scripts 13 dependents

mlizhangx

NAIR:Network Analysis of Immune Repertoire

Pipelines for studying the adaptive immune repertoire of T cells and B cells via network analysis based on receptor sequence similarity. Relate clinical outcomes to immune repertoires based on their network properties, or to particular clusters and clones within a repertoire. Yang et al. (2023) <doi:10.3389/fimmu.2023.1181825>.

Maintained by Brian Neal. Last updated 3 months ago.

cpp openmp

6.4 match 7 stars 6.66 score 27 scripts

civisanalytics

civis:R Client for the 'Civis Platform API'

A convenient interface for making requests directly to the 'Civis Platform API' <https://www.civisanalytics.com/platform/>. Full documentation available 'here' <https://civisanalytics.github.io/civis-r/>.

Maintained by Peter Cooman. Last updated 2 months ago.

5.4 match 16 stars 7.84 score 144 scripts

openintrostat

openintro:Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs

Supplemental functions and data for 'OpenIntro' resources, which include open-source textbooks and resources for introductory statistics (<https://www.openintro.org/>). The package contains datasets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of the Windows operating system.

Maintained by Mine Çetinkaya-Rundel. Last updated 3 months ago.

data openintro

3.8 match 240 stars 11.29 score 6.0k scripts

mikejareds

hermiter:Efficient Sequential and Batch Estimation of Univariate and Bivariate Probability Density Functions and Cumulative Distribution Functions along with Quantiles (Univariate) and Nonparametric Correlation (Bivariate)

Facilitates estimation of full univariate and bivariate probability density functions and cumulative distribution functions along with full quantile functions (univariate) and nonparametric correlation (bivariate) using Hermite series based estimators. These estimators are particularly useful in the sequential setting (both stationary and non-stationary) and one-pass batch estimation setting for large data sets. Based on: Stephanou, Michael, Varughese, Melvin and Macdonald, Iain. "Sequential quantiles via Hermite series density estimation." Electronic Journal of Statistics 11.1 (2017): 570-607 <doi:10.1214/17-EJS1245>, Stephanou, Michael and Varughese, Melvin. "On the properties of Hermite series based distribution function estimators." Metrika (2020) <doi:10.1007/s00184-020-00785-z> and Stephanou, Michael and Varughese, Melvin. "Sequential estimation of Spearman rank correlation using Hermite series estimators." Journal of Multivariate Analysis (2021) <doi:10.1016/j.jmva.2021.104783>.

Maintained by Michael Stephanou. Last updated 7 months ago.

cumulative-distribution-function kendall-correlation-coefficient online-algorithms probability-density-function quantile spearman-correlation-coefficient statistics streaming-algorithms streaming-data cpp

8.3 match 15 stars 5.11 score 17 scripts

hanchenphd

MAGEE:Mixed Model Association Test for GEne-Environment Interaction

Use a 'glmmkin' class object (GMMAT package) from the null model to perform generalized linear mixed model-based single-variant and variant set main effect tests, gene-environment interaction tests, and joint tests for association, as proposed in Wang et al. (2020) <DOI:10.1002/gepi.22351>.

Maintained by Han Chen. Last updated 8 months ago.

openblas libzstd libdeflate cpp

8.5 match 4.95 score 9 scripts

abdel-elsayed87

GRIN2:Genomic Random Interval (GRIN)

Improved version of 'GRIN' software that streamlines its use in practice to analyze genomic lesion data, accelerate its computing, and expand its analysis capabilities to answer additional scientific questions including a rigorous evaluation of the association of genomic lesions with RNA expression. Pounds, Stan, et al. (2013) <DOI:10.1093/bioinformatics/btt372>.

Maintained by Abdelrahman Elsayed. Last updated 4 months ago.

12.6 match 3.30 score

bioc

geNetClassifier:Classify diseases and build associated gene networks using gene expression profiles

Comprehensive package to automatically train and validate a multi-class SVM classifier based on gene expression data. Provides transparent selection of gene markers, their coexpression networks, and an interface to query the classifier.

Maintained by Sara Aibar. Last updated 5 months ago.

classification differentialexpression microarray

9.5 match 4.38 score 1 script 2 dependents

bioc

SAIGEgds:Scalable Implementation of Generalized mixed models using GDS files in Phenome-Wide Association Studies

Scalable implementation of generalized mixed models with highly optimized C++ implementation and integration with Genomic Data Structure (GDS) files. It is designed for single variant tests and set-based aggregate tests in large-scale Phenome-wide Association Studies (PheWAS) with millions of variants and samples, controlling for sample structure and case-control imbalance. The implementation is based on the SAIGE R package (v0.45, Zhou et al. 2018 and Zhou et al. 2020), and it is extended to include the state-of-the-art ACAT-O set-based tests. Benchmarks show that SAIGEgds is significantly faster than the SAIGE R package.

Maintained by Xiuwen Zheng. Last updated 4 months ago.

software genetics statisticalmethod genomewideassociation gds gwas mixed-model phewas openblas cpp

6.8 match 7 stars 6.04 score 15 scripts

kasperwelbers

corpustools:Managing, Querying and Analyzing Tokenized Text

Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., part-of-speech tags, dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, exporting to DTM for compatibility with many text analysis packages, and the possibility to reconstruct original text from tokens to facilitate interpretation.

Maintained by Kasper Welbers. Last updated 6 months ago.

cpp

5.5 match 31 stars 7.50 score 174 scripts 1 dependent

rmi-pacta

r2dii.colours:2 Degrees Investing Colour Palettes in Different Formats

Get colour values from the different colour palettes used by the 2 Degrees Investing (2DII) organization in their research streams. The colour values can be obtained in different ways: as a data frame or via a function call.

Maintained by Monika Furdyna. Last updated 11 months ago.

13.9 match 3 stars 2.95 score 6 scripts 2 dependents

termehs

netropy:Statistical Entropy Analysis of Network Data

Statistical entropy analysis of network data as introduced by Frank and Shafie (2016) <doi:10.1177/0759106315615511>, and in a textbook that is in progress.

Maintained by Termeh Shafie. Last updated 5 months ago.

6.6 match 12 stars 6.26 score 9 scripts

nimble-dev

nimble:MCMC, Particle Filtering, and Programmable Hierarchical Modeling

A system for writing hierarchical statistical models largely compatible with 'BUGS' and 'JAGS', writing nimbleFunctions to operate models and do basic R-style math, and compiling both models and nimbleFunctions via custom-generated C++. 'NIMBLE' includes default methods for MCMC, Laplace Approximation, Monte Carlo Expectation Maximization, and some other tools. The nimbleFunction system makes it easy to do things like implement new MCMC samplers from R, customize the assignment of samplers to different parts of a model from R, and compile the new samplers automatically via C++ alongside the samplers 'NIMBLE' provides. 'NIMBLE' extends the 'BUGS'/'JAGS' language by making it extensible: New distributions and functions can be added, including as calls to external compiled code. Although most people think of MCMC as the main goal of the 'BUGS'/'JAGS' language for writing models, one can use 'NIMBLE' for writing arbitrary other kinds of model-generic algorithms as well. A full User Manual is available at <https://r-nimble.org>.

Maintained by Christopher Paciorek. Last updated 20 days ago.

bayesian-inference bayesian-methods hierarchical-models mcmc probabilistic-programming openblas cpp

3.2 match 169 stars 12.97 score 2.6k scripts 19 dependents

bioc

methodical:Discovering genomic regions where methylation is strongly associated with transcriptional activity

DNA methylation is generally considered to be associated with transcriptional silencing. However, comprehensive, genome-wide investigation of this relationship requires the evaluation of potentially millions of correlation values between the methylation of individual genomic loci and the expression of associated transcripts in a relatively large number of samples. Methodical makes this process quick and easy while keeping a low memory footprint. It also provides a novel method for identifying regions where a number of methylation sites are consistently strongly associated with transcriptional expression. In addition, Methodical enables housing DNA methylation data from diverse sources (e.g. WGBS, RRBS and methylation arrays) within a common framework, lifting over DNA methylation data between different genome builds, and creating base-resolution plots of the association between DNA methylation and transcriptional activity at transcriptional start sites.

Maintained by Richard Heery. Last updated 2 months ago.

dnamethylation methylationarray transcription genomewideassociation software openjdk

8.8 match 4.65 score 14 scripts

crsh

papaja:Prepare American Psychological Association Journal Articles with R Markdown

Tools to create dynamic, submission-ready manuscripts, which conform to American Psychological Association manuscript guidelines. We provide R Markdown document formats for manuscripts (PDF and Word) and revision letters (PDF). Helper functions facilitate reporting statistical analyses or create publication-ready tables and plots.

Maintained by Frederik Aust. Last updated 1 month ago.

apa apa-guidelines journal manuscript psychology reproducible-paper reproducible-research rmarkdown

3.3 match 663 stars 12.00 score 1.7k scripts 2 dependents

dstanley4

apaTables:Create American Psychological Association (APA) Style Tables

A common task faced by researchers is the creation of APA style (i.e., American Psychological Association style) tables from statistical output. In R, a large number of function calls are often needed to obtain all of the desired information for a single APA style table. As well, the process of manually creating APA style tables in a word processor is prone to transcription errors. This package creates Word files (.doc files) and LaTeX code containing APA style tables for several types of analyses. Using this package minimizes transcription errors and reduces the number of commands needed by the user.

Maintained by David Stanley. Last updated 7 months ago.

5.1 match 61 stars 7.86 score 438 scripts

ausgis

SecDim:The Second Dimension of Spatial Association

Most of the current methods explore spatial association using observations at sample locations, which are defined as the first dimension of spatial association (FDA). The proposed concept of the second dimension of spatial association (SDA), as described in Yongze Song (2022) <doi:10.1016/j.jag.2022.102834>, aims to extract in-depth information about the geographical environment from locations outside sample locations for exploring spatial association.

Maintained by Wenbo Lv. Last updated 7 months ago.

spatial-association spatial-predictions

14.8 match 1 stars 2.70 score 2 scripts

daniel-jg

BeviMed:Bayesian Evaluation of Variant Involvement in Mendelian Disease

A fast integrative genetic association test for rare diseases based on a model for disease status given allele counts at rare variant sites. Probability of association, mode of inheritance and probability of pathogenicity for individual variants are all inferred in a Bayesian framework; see 'A Fast Association Test for Identifying Pathogenic Variants Involved in Rare Diseases', Greene et al. (2017) <doi:10.1016/j.ajhg.2017.05.015>.

Maintained by Daniel Greene. Last updated 10 months ago.

cpp

11.7 match 1 stars 3.41 score 17 scripts

bioc

GEM:GEM: fast association study for the interplay of Gene, Environment and Methylation

Tools for genome-wide analysis of EWAS, methQTL and GxE.

Maintained by Hong Pan. Last updated 5 months ago.

methylseq methylationarray genomewideassociation regression dnamethylation snp geneexpression gui

7.3 match 5.43 score 27 scripts

fmichonneau

phylobase:Base Package for Phylogenetic Structures and Comparative Data

Provides a base S4 class for comparative methods, incorporating one or more trees and trait data.

Maintained by Francois Michonneau. Last updated 1 year ago.

phylogenetics cpp

3.6 match 18 stars 11.10 score 394 scripts 18 dependents

djvanderlaan

datapackage:Creating and Reading Data Packages

Open, read data from, and modify Data Packages. Data Packages are an open standard for bundling and describing data sets (<https://datapackage.org>). When data is read from a Data Package, care is taken to convert the data as much as possible to appropriate R data types. The package can be extended with plugins for additional data types.

Maintained by Jan van der Laan. Last updated 6 days ago.

datapackage frictionless

7.0 match 2 stars 5.65 score

bioc

ChIPQC:Quality metrics for ChIPseq data

Quality metrics for ChIPseq data.

Maintained by Tom Carroll. Last updated 5 months ago.

sequencing chipseq qualitycontrol reportwriting

7.2 match 5.45 score 140 scripts

bioc

CytoML:A GatingML Interface for Cross Platform Cytometry Data Sharing

Uses platform-specific implementations of the GatingML2.0 standard to exchange gated cytometry data with other software platforms.

Maintained by Mike Jiang. Last updated 25 days ago.

immunooncology flowcytometry dataimport datarepresentation zlib openblas libxml2 cpp

5.2 match 30 stars 7.60 score 132 scripts

zrmacc

RNOmni:Rank Normal Transformation Omnibus Test

Inverse normal transformation (INT) based genetic association testing. These tests are recommended for continuous traits with non-normally distributed residuals. INT-based tests robustly control the type I error in settings where standard linear regression does not, such as when the residual distribution exhibits excess skew or kurtosis. Moreover, INT-based tests outperform standard linear regression in terms of power. These tests fall into two types. In direct INT (D-INT), the phenotype itself is transformed. In indirect INT (I-INT), the phenotypic residuals are transformed. The omnibus test (O-INT) adaptively combines D-INT and I-INT into a single robust and statistically powerful approach. See McCaw ZR, Lane JM, Saxena R, Redline S, Lin X. "Operating characteristics of the rank-based inverse normal transformation for quantitative trait analysis in genome-wide association studies" <doi:10.1111/biom.13214>.
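To make the D-INT idea concrete, here is a minimal Python sketch of a rank-based inverse normal transformation applied directly to a phenotype. It assumes the common Blom-type offset formula, mapping rank r out of n to the standard-normal quantile of (r - k) / (n - 2k + 1) with k = 3/8, and averages ranks over ties. This is an illustration of the technique, not RNOmni's own (R) implementation; the helper names `average_ranks` and `rank_normal` are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def rank_normal(values, k=0.375):
    """Hypothetical rank-based INT: offset ranks mapped to N(0, 1) quantiles."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((r - k) / (n - 2 * k + 1)) for r in average_ranks(values)]


# D-INT: transform a skewed phenotype directly (illustrative values).
phenotype = [0.1, 0.2, 0.4, 1.5, 12.0]
print([round(z, 3) for z in rank_normal(phenotype)])
```

Because only ranks enter the transform, the extreme value 12.0 ends up no farther from zero than the smallest value is on the other side; I-INT would instead apply the same transform to residuals from a covariate-only model.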

Maintained by Zachary McCaw. Last updated 5 months ago.

openblas cpp

5.8 match 6.80 score 303 scripts 3 dependents

objornstad

ncf:Spatial Covariance Functions

Spatial (cross-)covariance and related geostatistical tools: the nonparametric (cross-)covariance function, the spline correlogram, the nonparametric phase coherence function, local indicators of spatial association (LISA), (Mantel) correlogram, (Partial) Mantel test.

Maintained by Ottar N. Bjornstad. Last updated 3 years ago.

6.0 match 5 stars 6.44 score 328 scripts 1 dependents

santagos

dad:Three-Way / Multigroup Data Analysis Through Densities

The data consist of a set of variables measured on several groups of individuals. Each group is associated with an estimated probability density function. The package provides tools to create or manage such data, and functional methods (principal component analysis, multidimensional scaling, cluster analysis, discriminant analysis...) for such probability densities.

Maintained by Pierre Santagostini. Last updated 4 months ago.

7.2 match 5.32 score 92 scripts

rstudio

shiny:Web Application Framework for R

Makes it incredibly easy to build interactive web applications with R. Automatic "reactive" binding between inputs and outputs and extensive prebuilt widgets make it possible to build beautiful, responsive, and powerful applications with minimal effort.

Maintained by Winston Chang. Last updated 6 days ago.

reactive rstudio shiny web-app web-development

1.8 match 5.5k stars 21.31 score 108k scripts 1.8k dependents

heliosdrm

pwr:Basic Functions for Power Analysis

Power analysis functions along the lines of Cohen (1988).

Maintained by Helios De Rosario. Last updated 1 year ago.

2.9 match 105 stars 13.05 score 2.6k scripts 28 dependents

bioc

rcellminer:rcellminer: Molecular Profiles, Drug Response, and Chemical Structures for the NCI-60 Cell Lines

The NCI-60 cancer cell line panel has been used over the course of several decades as an anti-cancer drug screen. This panel was developed as part of the Developmental Therapeutics Program (DTP, http://dtp.nci.nih.gov/) of the U.S. National Cancer Institute (NCI). Thousands of compounds have been tested on the NCI-60 cell lines, which have themselves been extensively characterized by many platforms for gene and protein expression, copy number, mutation, and other features (Reinhold et al., 2012). The purpose of the CellMiner project (http://discover.nci.nih.gov/cellminer) has been to integrate data from multiple platforms used to analyze the NCI-60 and to provide a powerful suite of tools for exploration of NCI-60 data.

Maintained by Augustin Luna. Last updated 5 months ago.

acgh cellbasedassays copynumbervariation geneexpression pharmacogenomics pharmacogenetics mirna cheminformatics visualization software systemsbiology

6.7 match 5.71 score 113 scripts

bioc

MicrobiomeProfiler:An R/shiny package for microbiome functional enrichment analysis

This is an R/shiny package to perform functional enrichment analysis for microbiome data. The package is based on clusterProfiler. MicrobiomeProfiler supports KEGG enrichment analysis, COG enrichment analysis, Microbe-Disease association enrichment analysis, and Metabo-Pathway analysis.

Maintained by Guangchuang Yu. Last updated 5 months ago.

microbiome software visualization kegg

5.6 match 38 stars 6.80 score 22 scripts

rudeboybert

fivethirtyeight:Data and Code Behind the Stories and Interactives at 'FiveThirtyEight'

Datasets and code published by the data journalism website 'FiveThirtyEight' available at <https://github.com/fivethirtyeight/data>. Note that while we received guidance from editors at 'FiveThirtyEight', this package is not officially published by 'FiveThirtyEight'.

Maintained by Albert Y. Kim. Last updated 2 years ago.

data-science datajournalism fivethirtyeight statistics

3.4 match 453 stars 10.98 score 1.7k scripts

r-dbi

DBI:R Database Interface

A database interface definition for communication between R and relational database management systems. All classes in this package are virtual and need to be extended by the various R/DBMS implementations.

Maintained by Kirill Müller. Last updated 14 days ago.

database interface

1.8 match 302 stars 20.87 score 19k scripts 2.9k dependents

bioc

Statial:A package to identify changes in cell state relative to spatial associations

Statial is a suite of functions for identifying changes in cell state. Statial provides robust quantification of cell type localisation that is invariant to changes in tissue structure. In addition, Statial uncovers changes in marker expression associated with varying levels of localisation. These features can be used to explore how the structure and function of different cell types may be altered by the agents surrounding them.

Maintained by Farhan Ameen. Last updated 5 months ago.

singlecell spatial classification single-cell

5.7 match 5 stars 6.49 score 23 scripts

bahlolab

UKB.COVID19:UK Biobank COVID-19 Data Processing and Risk Factor Association Tests

Process UK Biobank COVID-19 test result data for susceptibility, severity and mortality analyses, and perform association tests for potential non-genetic COVID-19 risk factors and co-morbidities. Wang et al. (2021) <doi:10.5281/zenodo.5174381>.

Maintained by Longfei Wang. Last updated 8 months ago.

9.2 match 1 stars 4.00 score 4 scripts

qiuanzhu

xlink:Genetic Association Models for X-Chromosome SNPs on Continuous, Binary and Survival Outcomes

Expression of X-chromosome genes is subject to three possible biological processes: X-chromosome inactivation (XCI), escape from X-chromosome inactivation (XCI-E), and skewed X-chromosome inactivation (XCI-S). To analyze X-linked genetic association with continuous, binary, and time-to-event phenotypes when the actual process is unknown, we propose a unified approach that maximizes the likelihood or partial likelihood over all of the potential biological processes. The methods are described in Wei Xu, Meiling Hao (2017) <doi:10.1002/gepi.22097>; see also Dongxiao Han, Meiling Hao, Lianqiang Qu, Wei Xu (2019) <doi:10.1177/0962280219859037>.

Maintained by Yi Zhu. Last updated 6 years ago.

9.9 match 3.70 score 4 scripts

cran

MASS:Support Functions and Datasets for Venables and Ripley's MASS

Functions and datasets to support Venables and Ripley, "Modern Applied Statistics with S" (4th edition, 2002).

Maintained by Brian Ripley. Last updated 1 month ago.

3.4 match 19 stars 10.64 score 11k dependents

mrcieu

mrbayes:Bayesian Summary Data Models for Mendelian Randomization Studies

Bayesian estimation of inverse variance weighted (IVW), Burgess et al. (2013) <doi:10.1002/gepi.21758>, and MR-Egger, Bowden et al. (2015) <doi:10.1093/ije/dyv080>, summary data models for Mendelian randomization analyses.
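For orientation, the sketch below computes the classical fixed-effect IVW estimate that the Bayesian IVW model builds on: a weighted regression of SNP-outcome associations on SNP-exposure associations through the origin, with first-order weights 1/se_Y². It is written in Python, not R, and it is only the frequentist point estimate and standard error; mrbayes itself places priors on the parameters and samples their posterior, which this does not do. The function name `ivw_estimate` and the input values are hypothetical.

```python
from math import sqrt


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse variance weighted (IVW) causal estimate.

    beta_x: SNP-exposure associations
    beta_y: SNP-outcome associations
    se_y:   standard errors of beta_y
    Returns (estimate, standard error).
    """
    weights = [1 / s**2 for s in se_y]  # first-order weights 1 / se_Y^2
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_x, beta_y))
    den = sum(w * bx**2 for w, bx in zip(weights, beta_x))
    return num / den, sqrt(1 / den)


# Illustrative summary statistics for three hypothetical SNPs.
est, se = ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.1, 0.15], [0.01, 0.02, 0.01])
print(round(est, 3), round(se, 4))
```

In this toy input every SNP has the same ratio beta_y / beta_x = 0.5, so the weighted estimate is 0.5; with heterogeneous ratios the weights favour SNPs with strong exposure associations and precise outcome associations.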

Maintained by Tom Palmer. Last updated 1 day ago.

cpp

6.5 match 4 stars 5.60 score 2 scripts

moosa-r

rbioapi:User-Friendly R Interface to Biologic Web Services' API

Currently fully supports Enrichr, JASPAR, miEAA, PANTHER, Reactome, STRING, and UniProt! The goal of rbioapi is to provide a user-friendly and consistent interface to biological databases and services, one that insulates the user from the technicalities of using web service APIs and creates a unified, easy-to-use interface to biological and medical web services. This is an ongoing project; new databases and services will be added periodically. Feel free to suggest any databases or services you often use.

Maintained by Moosa Rezwani. Last updated 2 months ago.

api-client bioinformatics biology enrichment enrichment-analysis enrichr jaspar mieaa over-representation-analysis panther reactome string uniprot

4.8 match 20 stars 7.60 score 55 scripts

theoreticalecology

sjSDM:Scalable Joint Species Distribution Modeling

A scalable and fast method for estimating joint Species Distribution Models (jSDMs) for big community data, including eDNA data. The package estimates a full (i.e. non-latent) jSDM with different response distributions (including the traditional multivariate probit model). The package allows users to perform variation partitioning (VP) / ANOVA on the fitted models to separate the contributions of environmental, spatial, and biotic associations. In addition, the total R-squared can be further partitioned per species and site to reveal the internal metacommunity structure, see Leibold et al., <doi:10.1111/oik.08618>. The internal structure can then be regressed against environmental and spatial distinctiveness, richness, and traits to analyze metacommunity assembly processes. The package includes support for accounting for spatial autocorrelation and the option to fit responses using deep neural networks instead of a standard linear predictor. As described in Pichler & Hartig (2021) <doi:10.1111/2041-210X.13687>, scalability is achieved by using a Monte Carlo approximation of the joint likelihood implemented via 'PyTorch' and 'reticulate', which can be run on CPUs or GPUs.

Maintained by Maximilian Pichler. Last updated 1 month ago.

deep-learning gpu-acceleration machine-learning species-distribution-modelling species-interactions

4.7 match 69 stars 7.64 score 70 scripts