Showing 200 of total 1795 results (show query)
mhahsler
arules:Mining Association Rules and Frequent Itemsets
Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. Hahsler, Gruen and Hornik (2005) <doi:10.18637/jss.v014.i15>.
Maintained by Michael Hahsler. Last updated 2 months ago.
arulesassociation-rulesfrequent-itemsets
37.5 match 194 stars 13.99 score 3.3k scripts 28 dependentswviechtb
metadat:Meta-Analysis Datasets
A collection of meta-analysis datasets for teaching purposes, illustrating/testing meta-analytic methods, and validating published analyses.
Maintained by Wolfgang Viechtbauer. Last updated 18 days ago.
24.2 match 30 stars 10.54 score 65 scripts 93 dependentsmhahsler
arulesViz:Visualizing Association Rules and Frequent Itemsets
Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration. Michael Hahsler (2017) <doi:10.32614/RJ-2017-047>.
Maintained by Michael Hahsler. Last updated 7 months ago.
arulesassociation-rulesfrequent-itemsetsinteractive-visualizationsvisualization
21.4 match 54 stars 11.03 score 1.7k scripts 2 dependentsbioc
SIAMCAT:Statistical Inference of Associations between Microbial Communities And host phenoTypes
Pipeline for Statistical Inference of Associations between Microbial Communities And host phenoTypes (SIAMCAT). A primary goal of analyzing microbiome data is to determine changes in community composition that are associated with environmental factors. In particular, linking human microbiome composition to host phenotypes such as diseases has become an area of intense research. For this, robust statistical modeling and biomarker extraction toolkits are crucially needed. SIAMCAT provides a full pipeline supporting data preprocessing, statistical association testing, statistical modeling (LASSO logistic regression) including tools for evaluation and interpretation of these models (such as cross validation, parameter selection, ROC analysis and diagnostic model plots).
Maintained by Jakob Wirbel. Last updated 5 months ago.
immunooncologymetagenomicsclassificationmicrobiomesequencingpreprocessingclusteringfeatureextractiongeneticvariabilitymultiplecomparisonregression
35.1 match 6.72 score 147 scriptstiledb-inc
tiledb:Modern Database Engine for Complex Data Based on Multi-Dimensional Arrays
The modern database 'TileDB' introduces a powerful on-disk format for storing and accessing any complex data based on multi-dimensional arrays. It supports dense and sparse arrays, dataframes and key-values stores, cloud storage ('S3', 'GCS', 'Azure'), chunked arrays, multiple compression, encryption and checksum filters, uses a fully multi-threaded implementation, supports parallel I/O, data versioning ('time travel'), metadata and groups. It is implemented as an embeddable cross-platform C++ library with APIs from several languages, and integrations. This package provides the R support.
Maintained by Isaiah Norton. Last updated 4 days ago.
arrayhdfss3storage-managertiledbcpp
19.1 match 108 stars 11.79 score 306 scripts 4 dependentsramiromagno
gwasrapidd:'REST' 'API' Client for the 'NHGRI'-'EBI' 'GWAS' Catalog
'GWAS' R 'API' Data Download. This package provides easy access to the 'NHGRI'-'EBI' 'GWAS' Catalog data by accessing the 'REST' 'API' <https://www.ebi.ac.uk/gwas/rest/docs/api/>.
Maintained by Ramiro Magno. Last updated 1 years ago.
thirdpartyclientbiomedicalinformaticsgenomewideassociationsnpassociation-studiesgwas-cataloghumanrest-clienttraittrait-ontology
26.7 match 95 stars 8.10 score 49 scripts 1 dependentsjiefei-wang
aws.ecx:Communicating with AWS EC2 and ECS using AWS REST APIs
Providing the functions for communicating with Amazon Web Services(AWS) Elastic Compute Cloud(EC2) and Elastic Container Service(ECS). The functions will have the prefix 'ecs_' or 'ec2_' depending on the class of the API. The request will be sent via the REST API and the parameters are given by the function argument. The credentials can be set via 'aws_set_credentials'. The EC2 documentation can be found at <https://docs.aws.amazon.com/AWSEC2/latest/APIReference/Welcome.html> and ECS can be found at <https://docs.aws.amazon.com/AmazonECS/latest/APIReference/Welcome.html>.
Maintained by Jiefei Wang. Last updated 3 years ago.
45.6 match 1 stars 4.18 score 2 scriptselvanceyhan
pcds:Proximity Catch Digraphs and Their Applications
Contains the functions for construction and visualization of various families of the proximity catch digraphs (PCDs) (see (Ceyhan (2005) ISBN:978-3-639-19063-2), for computing the graph invariants for testing the patterns of segregation and association against complete spatial randomness (CSR) or uniformity in one, two and three dimensional cases. The package also has tools for generating points from these spatial patterns. The graph invariants used in testing spatial point data are the domination number (Ceyhan (2011) <doi:10.1080/03610921003597211>) and arc density (Ceyhan et al. (2006) <doi:10.1016/j.csda.2005.03.002>; Ceyhan et al. (2007) <doi:10.1002/cjs.5550350106>). The PCD families considered are Arc-Slice PCDs, Proportional-Edge PCDs, and Central Similarity PCDs.
Maintained by Elvan Ceyhan. Last updated 2 years ago.
32.1 match 5.80 score 21 scripts 2 dependentsisglobal-brge
SNPassoc:SNPs-Based Whole Genome Association Studies
Functions to perform most of the common analysis in genome association studies are implemented. These analyses include descriptive statistics and exploratory analysis of missing values, calculation of Hardy-Weinberg equilibrium, analysis of association based on generalized linear models (either for quantitative or binary traits), and analysis of multiple SNPs (haplotype and epistasis analysis). Permutation test and related tests (sum statistic and truncated product) are also implemented. Max-statistic and genetic risk-allele score exact distributions are also possible to be estimated. The methods are described in Gonzalez JR et al., 2007 <doi: 10.1093/bioinformatics/btm025>.
Maintained by Dolors Pelegri. Last updated 5 months ago.
19.6 match 16 stars 9.11 score 89 scripts 6 dependentsnalimilan
logmult:Log-Multiplicative Models, Including Association Models
Functions to fit log-multiplicative models using 'gnm', with support for convenient printing, plots, and jackknife/bootstrap standard errors. For complex survey data, models can be fitted from design objects from the 'survey' package. Currently supported models include UNIDIFF (Erikson & Goldthorpe, 1992), a.k.a. log-multiplicative layer effect model (Xie, 1992) <doi:10.2307/2096242>, and several association models: Goodman (1979) <doi:10.2307/2286971> row-column association models of the RC(M) and RC(M)-L families with one or several dimensions; two skew-symmetric association models proposed by Yamaguchi (1990) <doi:10.2307/271086> and by van der Heijden & Mooijaart (1995) <doi:10.1177/0049124195024001002> Functions allow computing the intrinsic association coefficient (see Bouchet-Valat (2022) <doi:10.1177/0049124119852389>) and the Altham (1970) index <doi:10.1111/j.2517-6161.1970.tb00816.x>, including via the Bayes shrinkage estimator proposed by Zhou (2015) <doi:10.1177/0081175015570097>; and the RAS/IPF/Deming-Stephan algorithm.
Maintained by Milan Bouchet-Valat. Last updated 3 years ago.
log-linear-modelmodellingstatistics
34.1 match 4 stars 5.18 score 76 scriptsdataoneorg
dataone:R Interface to the DataONE REST API
Provides read and write access to data and metadata from the DataONE network <https://www.dataone.org> of data repositories. Each DataONE repository implements a consistent repository application programming interface. Users call methods in R to access these remote repository functions, such as methods to query the metadata catalog, get access to metadata for particular data packages, and read the data objects from the data repository. Users can also insert and update data objects on repositories that support these methods.
Maintained by Matthew B. Jones. Last updated 3 years ago.
17.5 match 36 stars 9.93 score 472 scripts 3 dependentscran
wavethresh:Wavelets Statistics and Transforms
Performs 1, 2 and 3D real and complex-valued wavelet transforms, nondecimated transforms, wavelet packet transforms, nondecimated wavelet packet transforms, multiple wavelet transforms, complex-valued wavelet transforms, wavelet shrinkage for various kinds of data, locally stationary wavelet time series, nonstationary multiscale transfer function modeling, density estimation.
Maintained by Guy Nason. Last updated 7 months ago.
26.1 match 5.90 score 41 dependentsbioc
GENESIS:GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness
The GENESIS package provides methodology for estimating, inferring, and accounting for population and pedigree structure in genetic analyses. The current implementation provides functions to perform PC-AiR (Conomos et al., 2015, Gen Epi) and PC-Relate (Conomos et al., 2016, AJHG). PC-AiR performs a Principal Components Analysis on genome-wide SNP data for the detection of population structure in a sample that may contain known or cryptic relatedness. Unlike standard PCA, PC-AiR accounts for relatedness in the sample to provide accurate ancestry inference that is not confounded by family structure. PC-Relate uses ancestry representative principal components to adjust for population structure/ancestry and accurately estimate measures of recent genetic relatedness such as kinship coefficients, IBD sharing probabilities, and inbreeding coefficients. Additionally, functions are provided to perform efficient variance component estimation and mixed model association testing for both quantitative and binary phenotypes.
Maintained by Stephanie M. Gogarten. Last updated 2 months ago.
snpgeneticvariabilitygeneticsstatisticalmethoddimensionreductionprincipalcomponentgenomewideassociationqualitycontrolbiocviews
14.5 match 36 stars 10.44 score 342 scripts 1 dependentsjinghuazhao
gap:Genetic Analysis Package
As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over years, the package has been developed in-between many projects hence also in line with the name (gap).
Maintained by Jing Hua Zhao. Last updated 7 days ago.
12.5 match 12 stars 11.94 score 448 scripts 16 dependentsbioc
APL:Association Plots
APL is a package developed for computation of Association Plots (AP), a method for visualization and analysis of single cell transcriptomics data. The main focus of APL is the identification of genes characteristic for individual clusters of cells from input data. The package performs correspondence analysis (CA) and allows to identify cluster-specific genes using Association Plots. Additionally, APL computes the cluster-specificity scores for all genes which allows to rank the genes by their specificity for a selected cell cluster of interest.
Maintained by Clemens Kohl. Last updated 5 months ago.
statisticalmethoddimensionreductionsinglecellsequencingrnaseqgeneexpression
23.5 match 15 stars 6.31 score 15 scriptsbioc
TFARM:Transcription Factors Association Rules Miner
It searches for relevant associations of transcription factors with a transcription factor target, in specific genomic regions. It also allows to evaluate the Importance Index distribution of transcription factors (and combinations of transcription factors) in association rules.
Maintained by Liuba Nausicaa Martino. Last updated 5 months ago.
biologicalquestioninfrastructurestatisticalmethodtranscription
36.1 match 4.00 score 2 scriptsbioc
flowWorkspace:Infrastructure for representing and interacting with gated and ungated cytometry data sets.
This package is designed to facilitate comparison of automated gating methods against manual gating done in flowJo. This package allows you to import basic flowJo workspaces into BioConductor and replicate the gating from flowJo using the flowCore functionality. Gating hierarchies, groups of samples, compensation, and transformation are performed so that the output matches the flowJo analysis.
Maintained by Greg Finak. Last updated 25 days ago.
immunooncologyflowcytometrydataimportpreprocessingdatarepresentationzlibopenblascpp
17.3 match 7.89 score 576 scripts 10 dependentsralmond
RNetica:R interface to Netica(R) Bayesian Network Engine
This provides an R interface to the Netica (http://norsys.com/) Bayesian network library API.
Maintained by Russell Almond. Last updated 3 months ago.
27.7 match 2 stars 4.92 score 14 scripts 2 dependentshelixcn
spaa:SPecies Association Analysis
Miscellaneous functions for analysing species association and niche overlap.
Maintained by Jinlong Zhang. Last updated 4 years ago.
18.3 match 12 stars 7.40 score 155 scripts 1 dependentsnicolas-robette
descriptio:Descriptive Statistical Analysis
Description of statistical associations between variables : measures of local and global association between variables (phi, Cramér V, correlations, eta-squared, Goodman and Kruskal tau, permutation tests, etc.), multiple graphical representations of the associations between variables (using 'ggplot2') and weighted statistics.
Maintained by Nicolas Robette. Last updated 7 months ago.
26.0 match 4 stars 5.00 score 11 scripts 3 dependentsmhahsler
arulesCBA:Classification Based on Association Rules
Provides the infrastructure for association rule-based classification including the algorithms CBA, CMAR, CPAR, C4.5, FOIL, PART, PRM, RCAR, and RIPPER to build associative classifiers. Hahsler et al (2019) <doi:10.32614/RJ-2019-048>.
Maintained by Michael Hahsler. Last updated 7 months ago.
association-rulesclassification
22.1 match 3 stars 5.42 score 47 scripts 1 dependentsconfig-i1
greybox:Toolbox for Model Building and Forecasting
Implements functions and instruments for regression model building and its application to forecasting. The main scope of the package is in variables selection and models specification for cases of time series data. This includes promotional modelling, selection between different dynamic regressions with non-standard distributions of errors, selection based on cross validation, solutions to the fat regression model problem and more. Models developed in the package are tailored specifically for forecasting purposes. So as a results there are several methods that allow producing forecasts from these models and visualising them.
Maintained by Ivan Svetunkov. Last updated 18 days ago.
forecastingmodel-selectionmodel-selection-and-evaluationregressionregression-modelsstatisticscpp
10.6 match 30 stars 11.03 score 97 scripts 34 dependentsxiaoruizhu
PAsso:Assessing the Partial Association Between Ordinal Variables
An implementation of the unified framework for assessing partial association between ordinal variables after adjusting for a set of covariates (Dungang Liu, Shaobo Li, Yan Yu and Irini Moustaki (2020) <doi:10.1080/01621459.2020.1796394> Journal of the American Statistical Association). This package provides a set of tools to quantify, visualize, and test partial associations between multiple ordinal variables. It can produce a number of $phi$ measures, partial regression plots, 3-D plots, and p-values for testing H_0: phi=0 or H_0: phi <= delta.
Maintained by Xiaorui (Jeremy) Zhu. Last updated 1 years ago.
association-analysisordinal-variablespartial-associationstatisticscpp
27.8 match 7 stars 4.14 score 13 scripts 1 dependentsvictor-navarro
calmr:Canonical Associative Learning Models and their Representations
Implementations of canonical associative learning models, with tools to run experiment simulations, estimate model parameters, and compare model representations. Experiments and results are represented using S4 classes and methods.
Maintained by Victor Navarro. Last updated 10 months ago.
17.8 match 3 stars 6.40 score 17 scriptsbranchlab
metasnf:Meta Clustering with Similarity Network Fusion
Framework to facilitate patient subtyping with similarity network fusion and meta clustering. The similarity network fusion (SNF) algorithm was introduced by Wang et al. (2014) in <doi:10.1038/nmeth.2810>. SNF is a data integration approach that can transform high-dimensional and diverse data types into a single similarity network suitable for clustering with minimal loss of information from each initial data source. The meta clustering approach was introduced by Caruana et al. (2006) in <doi:10.1109/ICDM.2006.103>. Meta clustering involves generating a wide range of cluster solutions by adjusting clustering hyperparameters, then clustering the solutions themselves into a manageable number of qualitatively similar solutions, and finally characterizing representative solutions to find ones that are best for the user's specific context. This package provides a framework to easily transform multi-modal data into a wide range of similarity network fusion-derived cluster solutions as well as to visualize, characterize, and validate those solutions. Core package functionality includes easy customization of distance metrics, clustering algorithms, and SNF hyperparameters to generate diverse clustering solutions; calculation and plotting of associations between features, between patients, and between cluster solutions; and standard cluster validation approaches including resampled measures of cluster stability, standard metrics of cluster quality, and label propagation to evaluate generalizability in unseen data. Associated vignettes guide the user through using the package to identify patient subtypes while adhering to best practices for unsupervised learning.
Maintained by Prashanth S Velayudhan. Last updated 7 days ago.
bioinformaticsclusteringmetaclusteringsnf
13.9 match 8 stars 8.21 score 30 scriptsbioc
ISAnalytics:Analyze gene therapy vector insertion sites data identified from genomics next generation sequencing reads for clonal tracking studies
In gene therapy, stem cells are modified using viral vectors to deliver the therapeutic transgene and replace functional properties since the genetic modification is stable and inherited in all cell progeny. The retrieval and mapping of the sequences flanking the virus-host DNA junctions allows the identification of insertion sites (IS), essential for monitoring the evolution of genetically modified cells in vivo. A comprehensive toolkit for the analysis of IS is required to foster clonal trackign studies and supporting the assessment of safety and long term efficacy in vivo. This package is aimed at (1) supporting automation of IS workflow, (2) performing base and advance analysis for IS tracking (clonal abundance, clonal expansions and statistics for insertional mutagenesis, etc.), (3) providing basic biology insights of transduced stem cells in vivo.
Maintained by Francesco Gazzo. Last updated 4 months ago.
biomedicalinformaticssequencingsinglecell
19.4 match 3 stars 5.83 score 15 scriptsbioc
GWASTools:Tools for Genome Wide Association Studies
Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis.
Maintained by Stephanie M. Gogarten. Last updated 14 days ago.
snpgeneticvariabilityqualitycontrolmicroarray
10.2 match 17 stars 10.67 score 396 scripts 5 dependentstobiaskley
quantspec:Quantile-Based Spectral Analysis of Time Series
Methods to determine, smooth and plot quantile periodograms for univariate and multivariate time series.
Maintained by Tobias Kley. Last updated 9 years ago.
18.7 match 10 stars 5.84 score 46 scripts 1 dependentsbioc
DegCre:Probabilistic association of DEGs to CREs from differential data
DegCre generates associations between differentially expressed genes (DEGs) and cis-regulatory elements (CREs) based on non-parametric concordance between differential data. The user provides GRanges of DEG TSS and CRE regions with differential p-value and optionally log-fold changes and DegCre returns an annotated Hits object with associations and their calculated probabilities. Additionally, the package provides functionality for visualization and conversion to other formats.
Maintained by Brian S. Roberts. Last updated 4 months ago.
geneexpressiongeneregulationatacseqchipseqdnaseseqrnaseq
20.5 match 5 stars 5.30 score 2 scriptspbreheny
ncvreg:Regularization Paths for SCAD and MCP Penalized Regression Models
Fits regularization paths for linear regression, GLM, and Cox regression models using lasso or nonconvex penalties, in particular the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) penalty, with options for additional L2 penalties (the "elastic net" idea). Utilities for carrying out cross-validation as well as post-fitting visualization, summarization, inference, and prediction are also provided. For more information, see Breheny and Huang (2011) <doi:10.1214/10-AOAS388> or visit the ncvreg homepage <https://pbreheny.github.io/ncvreg/>.
Maintained by Patrick Breheny. Last updated 14 days ago.
9.0 match 43 stars 12.03 score 458 scripts 38 dependentssparklyr
sparklyr:R Interface to Apache Spark
R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.
Maintained by Edgar Ruiz. Last updated 14 days ago.
apache-sparkdistributeddplyridelivymachine-learningremote-clusterssparksparklyr
7.0 match 959 stars 15.20 score 4.0k scripts 21 dependentsdboslab
expowo:An R package for mining global plant diversity and distribution data
Produces diversity estimates and species lists with associated global distribution for any vascular plant family and genus from 'Plants of the World Online' database <https://powo.science.kew.org/>, by interacting with the source code of each plant taxon page. It also creates global maps of species richness, graphics of species discoveries and nomenclatural changes over time. For more details
Maintained by Debora Zuanny. Last updated 8 days ago.
13.5 match 8 stars 7.44 score 64 scriptscbhurley
bullseye:Visualising Multiple Pairwise Variable Correlations and Other Scores
We provide a tidy data structure and visualisations for multiple or grouped variable correlations, general association measures scagnostics and other pairwise scores suitable for numerical, ordinal and nominal variables. Supported measures include distance correlation, maximal information, ace correlation, Kendall's tau, and polychoric correlation.
Maintained by Catherine Hurley. Last updated 25 days ago.
17.5 match 2 stars 5.58 score 14 scriptsropensci
allodb:Tree Biomass Estimation at Extra-Tropical Forest Plots
Standardize and simplify the tree biomass estimation process across globally distributed extratropical forests.
Maintained by Erika Gonzalez-Akre. Last updated 26 days ago.
16.2 match 38 stars 5.94 score 38 scriptspharmar
riskmetric:Risk Metrics to Evaluating R Packages
Facilities for assessing R packages against a number of metrics to help quantify their robustness.
Maintained by Eli Miller. Last updated 8 days ago.
10.7 match 166 stars 8.98 score 43 scriptsbioc
hmdbQuery:utilities for exploration of human metabolome database
Define utilities for exploration of human metabolome database, including functions to retrieve specific metabolite entries and data snapshots with pairwise associations (metabolite-gene,-protein,-disease).
Maintained by VJ Carey. Last updated 5 months ago.
23.1 match 4.11 score 13 scriptsi02momuj
RKEEL:Using 'KEEL' in R Code
'KEEL' is a popular 'Java' software for a large number of different knowledge data discovery tasks. This package takes the advantages of 'KEEL' and R, allowing to use 'KEEL' algorithms in simple R code. The implemented R code layer between R and 'KEEL' makes easy both using 'KEEL' algorithms in R as implementing new algorithms for 'RKEEL' in a very simple way. It includes more than 100 algorithms for classification, regression, preprocess, association rules and imbalance learning, which allows a more complete experimentation process. For more information about 'KEEL', see <http://www.keel.es/>.
Maintained by Jose M. Moyano. Last updated 2 years ago.
38.7 match 2 stars 2.41 score 130 scriptsbioc
rGREAT:GREAT Analysis - Functional Enrichment on Genomic Regions
GREAT (Genomic Regions Enrichment of Annotations Tool) is a type of functional enrichment analysis directly performed on genomic regions. This package implements the GREAT algorithm (the local GREAT analysis), also it supports directly interacting with the GREAT web service (the online GREAT analysis). Both analysis can be viewed by a Shiny application. rGREAT by default supports more than 600 organisms and a large number of gene set collections, as well as self-provided gene sets and organisms from users. Additionally, it implements a general method for dealing with background regions.
Maintained by Zuguang Gu. Last updated 19 days ago.
genesetenrichmentgopathwayssoftwaresequencingwholegenomegenomeannotationcoveragecpp
9.4 match 86 stars 9.96 score 320 scripts 1 dependentsstscl
gdverse:Analysis of Spatial Stratified Heterogeneity
Detecting spatial associations based on the concept of spatial stratified heterogeneity while also considering spatial dependencies, spatial interpretability, complex spatial interactions, and robust spatial stratification. In addition, it supports the spatial stratified heterogeneity family described in Lv et al. (2025)<doi:10.1111/tgis.70032>.
Maintained by Wenbo Lv. Last updated 3 days ago.
geographical-detectorgeoinformaticsgeospatial-analysisspatial-statisticsspatial-stratified-heterogeneitycpp
10.2 match 33 stars 9.10 score 41 scripts 2 dependentsusepa
tcpl:ToxCast Data Analysis Pipeline
The ToxCast Data Analysis Pipeline ('tcpl') is an R package that manages, curve-fits, plots, and stores ToxCast data to populate its linked MySQL database, 'invitrodb'. The package was developed for the chemical screening data curated by the US EPA's Toxicity Forecaster (ToxCast) program, but 'tcpl' can be used to support diverse chemical screening efforts.
Maintained by Jason Brown. Last updated 13 days ago.
9.8 match 36 stars 9.39 score 90 scriptskjhealy
gssrdoc:Document General Social Survey Variable
The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains the a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS see \url{http://gss.norc.org}.
Maintained by Kieran Healy. Last updated 12 months ago.
40.1 match 2.28 score 38 scriptsrich-iannone
DiagrammeR:Graph/Network Visualization
Build graph/network structures using functions for stepwise addition and deletion of nodes and edges. Work with data available in tables for bulk addition of nodes, edges, and associated metadata. Use graph selections and traversals to apply changes to specific nodes or edges. A wide selection of graph algorithms allow for the analysis of graphs. Visualize the graphs and take advantage of any aesthetic properties assigned to nodes and edges.
Maintained by Richard Iannone. Last updated 2 months ago.
graphgraph-functionsnetwork-graphproperty-graphvisualization
5.9 match 1.7k stars 15.29 score 3.8k scripts 86 dependentsbioc
snpStats:SnpMatrix and XSnpMatrix classes and methods
Classes and statistical methods for large SNP association studies. This extends the earlier snpMatrix package, allowing for uncertainty in genotypes.
Maintained by David Clayton. Last updated 4 days ago.
microarraysnpgeneticvariabilityzlib
9.2 match 9.68 score 674 scripts 23 dependentspecanproject
PEcAn.DB:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PECAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by David LeBauer. Last updated 8 hours ago.
bayesiancyberinfrastructuredata-assimilationdata-scienceecosystem-modelecosystem-scienceforecastingmeta-analysisnational-science-foundationpecanplants
7.3 match 216 stars 11.90 score 127 scripts 27 dependentsbioc
rhdf5:R Interface to HDF5
This package provides an interface between HDF5 and R. HDF5's main features are the ability to store and access very large and/or complex datasets and a wide variety of metadata on mass storage (disk) through a completely portable file format. The rhdf5 package is thus suited for the exchange of large and/or complex datasets between R and other software package, and for letting R applications work on datasets that are larger than the available RAM.
Maintained by Mike Smith. Last updated 7 days ago.
infrastructuredataimporthdf5rhdf5opensslcurlzlibcpp
5.3 match 62 stars 15.87 score 4.2k scripts 232 dependentsleelabsg
SKAT:SNP-Set (Sequence) Kernel Association Test
Functions for kernel-regression-based association tests including Burden test, SKAT and SKAT-O. These methods aggregate individual SNP score statistics in a SNP set and efficiently compute SNP-set level p-values.
Maintained by Seunggeun (Shawn) Lee. Last updated 2 months ago.
8.6 match 45 stars 9.70 score 268 scripts 16 dependentsgrunwaldlab
poppr:Genetic Analysis of Populations with Mixed Reproduction
Population genetic analyses for hierarchical analysis of partially clonal populations built upon the architecture of the 'adegenet' package. Originally described in Kamvar, Tabima, and Grünwald (2014) <doi:10.7717/peerj.281> with version 2.0 described in Kamvar, Brooks, and Grünwald (2015) <doi:10.3389/fgene.2015.00208>.
Maintained by Zhian N. Kamvar. Last updated 11 months ago.
clonalitygenetic-analysisgenetic-distancesminimum-spanning-networksmultilocus-genotypesmultilocus-lineagespopulation-geneticspopulationsopenmp
7.7 match 69 stars 10.84 score 672 scriptsrefunders
refund:Regression with Functional Data
Methods for regression for functional data, including function-on-scalar, scalar-on-function, and function-on-function regression. Some of the functions are applicable to image data.
Maintained by Julia Wrobel. Last updated 6 months ago.
8.3 match 43 stars 10.11 score 472 scripts 17 dependentsbioc
regioneR:Association analysis of genomic regions based on permutation tests
regioneR offers a statistical framework based on customizable permutation tests to assess the association between genomic region sets and other genomic features.
Maintained by Bernat Gel. Last updated 5 months ago.
geneticschipseqdnaseqmethylseqcopynumbervariation
9.1 match 9.01 score 2.7k scripts 21 dependentspeterreichert
utility:Construct, Evaluate and Plot Value and Utility Functions
Construct and plot objective hierarchies and associated value and utility functions. Evaluate the values and utilities and visualize the results as colored objective hierarchies or tables. Visualize uncertainty by plotting median and quantile intervals within the nodes of objective hierarchies. Get numerical results of the evaluations in standard R data types for further processing.
Maintained by Peter Reichert. Last updated 2 years ago.
24.3 match 3.35 score 82 scripts 1 dependentsazure
azuremlsdk:Interface to the 'Azure Machine Learning' 'SDK'
Interface to the 'Azure Machine Learning' Software Development Kit ('SDK'). Data scientists can use the 'SDK' to train, deploy, automate, and manage machine learning models on the 'Azure Machine Learning' service. To learn more about 'Azure Machine Learning' visit the website: <https://docs.microsoft.com/en-us/azure/machine-learning/service/overview-what-is-azure-ml>.
Maintained by Diondra Peck. Last updated 3 years ago.
amlcomputeazureazure-machine-learningazuremldsimachine-learningrstudiosdk-r
9.1 match 105 stars 8.91 score 221 scriptsschochastics
networkdata:Repository of Network Datasets
The package contains a large collection of network dataset with different context. This includes social networks, animal networks and movie networks. All datasets are in 'igraph' format.
Maintained by David Schoch. Last updated 1 years ago.
16.0 match 142 stars 5.01 score 143 scriptsmrcieu
ieugwasr:Interface to the 'OpenGWAS' Database API
Interface to the 'OpenGWAS' database API <https://api.opengwas.io/api/>. Includes a wrapper to make generic calls to the API, plus convenience functions for specific queries.
Maintained by Gibran Hemani. Last updated 18 days ago.
7.5 match 89 stars 10.71 score 404 scripts 6 dependentsbioc
EnrichedHeatmap:Making Enriched Heatmaps
Enriched heatmap is a special type of heatmap which visualizes the enrichment of genomic signals on specific target regions. Here we implement enriched heatmap by ComplexHeatmap package. Since this type of heatmap is just a normal heatmap but with some special settings, with the functionality of ComplexHeatmap, it would be much easier to customize the heatmap as well as concatenating to a list of heatmaps to show correspondance between different data sources.
Maintained by Zuguang Gu. Last updated 5 months ago.
softwarevisualizationsequencinggenomeannotationcoveragecpp
7.3 match 190 stars 10.87 score 330 scripts 1 dependentsr-spatialecology
shar:Species-Habitat Associations
Analyse species-habitat associations in R. Therefore, information about the location of the species (as a point pattern) is needed together with environmental conditions (as a categorical raster). To test for significance habitat associations, one of the two components is randomized. Methods are mainly based on Plotkin et al. (2000) <doi:10.1006/jtbi.2000.2158> and Harms et al. (2001) <doi:10.1111/j.1365-2745.2001.00615.x>.
Maintained by Maximilian H.K. Hesselbarth. Last updated 2 months ago.
habitat-associationlandscape-ecologypoint-pattern-analysisspatial-analysis
11.6 match 20 stars 6.83 score 28 scriptsbioc
podkat:Position-Dependent Kernel Association Test
This package provides an association test that is capable of dealing with very rare and even private variants. This is accomplished by a kernel-based approach that takes the positions of the variants into account. The test can be used for pre-processed matrix data, but also directly for variant data stored in VCF files. Association testing can be performed whole-genome, whole-exome, or restricted to pre-defined regions of interest. The test is complemented by tools for analyzing and visualizing the results.
Maintained by Ulrich Bodenhofer. Last updated 5 months ago.
geneticswholegenomeannotationvariantannotationsequencingdataimportcurlbzip2xz-utilszlibcpp
15.8 match 5.02 score 6 scriptsinsightsengineering
rtables:Reporting Tables
Reporting tables often have structure that goes beyond simple rectangular data. The 'rtables' package provides a framework for declaring complex multi-level tabulations and then applying them to data. This framework models both tabulation and the resulting tables as hierarchical, tree-like objects which support sibling sub-tables, arbitrary splitting or grouping of data in row and column dimensions, cells containing multiple values, and the concept of contextual summary computations. A convenient pipe-able interface is provided for declaring table layouts and the corresponding computations, and then applying them to data.
Maintained by Joe Zhu. Last updated 3 months ago.
5.8 match 232 stars 13.65 score 238 scripts 17 dependentspbreheny
grpreg:Regularization Paths for Regression Models with Grouped Covariates
Efficient algorithms for fitting the regularization path of linear regression, GLM, and Cox regression models with grouped penalties. This includes group selection methods such as group lasso, group MCP, and group SCAD as well as bi-level selection methods such as the group exponential lasso, the composite MCP, and the group bridge. For more information, see Breheny and Huang (2009) <doi:10.4310/sii.2009.v2.n3.a10>, Huang, Breheny, and Ma (2012) <doi:10.1214/12-sts392>, Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2>, and Breheny (2015) <doi:10.1111/biom.12300>, or visit the package homepage <https://pbreheny.github.io/grpreg/>.
Maintained by Patrick Breheny. Last updated 1 months ago.
6.9 match 34 stars 11.38 score 192 scripts 35 dependentskosukehamazaki
RAINBOWR:Genome-Wide Association Study with SNP-Set Methods
By using 'RAINBOWR' (Reliable Association INference By Optimizing Weights with R), users can test multiple SNPs (Single Nucleotide Polymorphisms) simultaneously by kernel-based (SNP-set) methods. This package can also be applied to haplotype-based GWAS (Genome-Wide Association Study). Users can test not only additive effects but also dominance and epistatic effects. In detail, please check our paper on PLOS Computational Biology: Kosuke Hamazaki and Hiroyoshi Iwata (2020) <doi:10.1371/journal.pcbi.1007663>.
Maintained by Kosuke Hamazaki. Last updated 4 months ago.
13.0 match 22 stars 5.99 score 22 scriptsocbe-uio
contingencytables:Statistical Analysis of Contingency Tables
Provides functions to perform statistical inference of data organized in contingency tables. This package is a companion to the "Statistical Analysis of Contingency Tables" book by Fagerland et al. <ISBN 9781466588172>.
Maintained by Waldir Leoncio. Last updated 7 months ago.
21.2 match 3 stars 3.65 score 8 scripts 1 dependentsaplantin
MiRKAT:Microbiome Regression-Based Kernel Association Tests
Test for overall association between microbiome composition data and phenotypes via phylogenetic kernels. The phenotype can be univariate continuous or binary (Zhao et al. (2015) <doi:10.1016/j.ajhg.2015.04.003>), survival outcomes (Plantinga et al. (2017) <doi:10.1186/s40168-017-0239-9>), multivariate (Zhan et al. (2017) <doi:10.1002/gepi.22030>) and structured phenotypes (Zhan et al. (2017) <doi:10.1111/biom.12684>). The package can also use robust regression (unpublished work) and integrated quantile regression (Wang et al. (2021) <doi:10.1093/bioinformatics/btab668>). In each case, the microbiome community effect is modeled nonparametrically through a kernel function, which can incorporate phylogenetic tree information.
Maintained by Anna Plantinga. Last updated 2 years ago.
14.8 match 3 stars 5.22 score 183 scripts 1 dependentsstan-dev
rstanarm:Bayesian Applied Regression Modeling via Stan
Estimates previously compiled regression models using the 'rstan' package, which provides the R interface to the Stan C++ library for Bayesian estimation. Users specify models via the customary R syntax with a formula and data.frame plus some additional arguments for priors.
Maintained by Ben Goodrich. Last updated 13 days ago.
bayesianbayesian-data-analysisbayesian-inferencebayesian-methodsbayesian-statisticsmultilevel-modelsrstanrstanarmstanstatistical-modelingcpp
4.9 match 393 stars 15.70 score 5.0k scripts 13 dependentshanchenphd
GMMAT:Generalized Linear Mixed Model Association Tests
Perform association tests using generalized linear mixed models (GLMMs) in genome-wide association studies (GWAS) and sequencing association studies. First, GMMAT fits a GLMM with covariate adjustment and random effects to account for population structure and familial or cryptic relatedness. For GWAS, GMMAT performs score tests for each genetic variant as proposed in Chen et al. (2016) <DOI:10.1016/j.ajhg.2016.02.012>. For candidate gene studies, GMMAT can also perform Wald tests to get the effect size estimate for each genetic variant. For rare variant analysis from sequencing association studies, GMMAT performs the variant Set Mixed Model Association Tests (SMMAT) as proposed in Chen et al. (2019) <DOI:10.1016/j.ajhg.2018.12.012>, including the burden test, the sequence kernel association test (SKAT), SKAT-O and an efficient hybrid test of the burden test and SKAT, based on user-defined variant sets.
Maintained by Han Chen. Last updated 1 years ago.
openblaszlibbzip2libzstdlibdeflatecpp
9.1 match 41 stars 8.37 score 96 scripts 2 dependentsropensci
rotl:Interface to the 'Open Tree of Life' API
An interface to the 'Open Tree of Life' API to retrieve phylogenetic trees, information about studies used to assemble the synthetic tree, and utilities to match taxonomic names to 'Open Tree identifiers'. The 'Open Tree of Life' aims at assembling a comprehensive phylogenetic tree for all named species.
Maintained by Francois Michonneau. Last updated 2 years ago.
metadataropensciphylogeneticsindependant-contrastsbiodiversitypeer-reviewedphylogenytaxonomy
6.3 match 40 stars 12.05 score 356 scripts 29 dependentsfirefly-cpp
niarules:Numerical Association Rule Mining using Population-Based Nature-Inspired Algorithms
Framework is devoted to mining numerical association rules through the utilization of nature-inspired algorithms for optimization. Drawing inspiration from the 'NiaARM' 'Python' and the 'NiaARM' 'Julia' packages, this repository introduces the capability to perform numerical association rule mining in the R programming language. Fister Jr., Iglesias, Galvez, Del Ser, Osaba and Fister (2018) <doi:10.1007/978-3-030-03493-1_9>.
Maintained by Iztok Jr. Fister. Last updated 28 days ago.
association-rulesmetaheuristicsoptimization
20.2 match 1 stars 3.70 score 2 scriptsbioc
traseR:GWAS trait-associated SNP enrichment analyses in genomic intervals
traseR performs GWAS trait-associated SNP enrichment analyses in genomic intervals using different hypothesis testing approaches, also provides various functionalities to explore and visualize the results.
Maintained by li chen. Last updated 5 months ago.
geneticssequencingcoveragealignmentqualitycontroldataimport
22.0 match 3.30 score 3 scriptsbioc
midasHLA:R package for immunogenomics data handling and association analysis
MiDAS is a R package for immunogenetics data transformation and statistical analysis. MiDAS accepts input data in the form of HLA alleles and KIR types, and can transform it into biologically meaningful variables, enabling HLA amino acid fine mapping, analyses of HLA evolutionary divergence, KIR gene presence, as well as validated HLA-KIR interactions. Further, it allows comprehensive statistical association analysis workflows with phenotypes of diverse measurement scales. MiDAS closes a gap between the inference of immunogenetic variation and its efficient utilization to make relevant discoveries related to T cell, Natural Killer cell, and disease biology.
Maintained by Maciej Migdał. Last updated 5 months ago.
cellbiologygeneticsstatisticalmethod
16.6 match 4.30 score 3 scriptsalenxav
NAM:Nested Association Mapping
Designed for association studies in nested association mapping (NAM) panels, experimental and random panels. The method is described by Xavier et al. (2015) <doi:10.1093/bioinformatics/btv448>. It includes tools for genome-wide associations of multiple populations, marker quality control, population genetics analysis, genome-wide prediction, solving mixed models and finding variance components through likelihood and Bayesian methods.
Maintained by Alencar Xavier. Last updated 5 years ago.
12.4 match 2 stars 5.72 score 44 scripts 1 dependentscran
datarobot:'DataRobot' Predictive Modeling API
For working with the 'DataRobot' predictive modeling platform's API <https://www.datarobot.com/>.
Maintained by AJ Alon. Last updated 1 years ago.
20.4 match 2 stars 3.48 scorenowosad
sabre:Spatial Association Between Regionalizations
Calculates a degree of spatial association between regionalizations or categorical maps using the information-theoretical V-measure (Nowosad and Stepinski (2018) <doi:10.1080/13658816.2018.1511794>). It also offers an R implementation of the MapCurve method (Hargrove et al. (2006) <doi:10.1007/s10109-006-0025-x>).
Maintained by Jakub Nowosad. Last updated 4 months ago.
entropypolygonsregionalizationsspatialspatial-analysis
9.9 match 36 stars 6.95 score 25 scriptsbioc
metagenomeSeq:Statistical analysis for sparse high-throughput sequencing
metagenomeSeq is designed to determine features (be it Operational Taxanomic Unit (OTU), species, etc.) that are differentially abundant between two or more groups of multiple samples. metagenomeSeq is designed to address the effects of both normalization and under-sampling of microbial communities on disease association detection and the testing of feature correlations.
Maintained by Joseph N. Paulson. Last updated 3 months ago.
immunooncologyclassificationclusteringgeneticvariabilitydifferentialexpressionmicrobiomemetagenomicsnormalizationvisualizationmultiplecomparisonsequencingsoftware
5.8 match 69 stars 11.90 score 494 scripts 7 dependentszarquon42b
Morpho:Calculations and Visualisations Related to Geometric Morphometrics
A toolset for Geometric Morphometrics and mesh processing. This includes (among other stuff) mesh deformations based on reference points, permutation tests, detection of outliers, processing of sliding semi-landmarks and semi-automated surface landmark placement.
Maintained by Stefan Schlager. Last updated 5 months ago.
6.8 match 51 stars 10.01 score 218 scripts 13 dependentsfaosorios
SpatialPack:Tools for Assessment the Association Between Two Spatial Processes
Tools to assess the association between two spatial processes. Currently, several methodologies are implemented: A modified t-test to perform hypothesis testing about the independence between the processes, a suitable nonparametric correlation coefficient, the codispersion coefficient, and an F test for assessing the multiple correlation between one spatial process and several others. Functions for image processing and computing the spatial association between images are also provided. Functions contained in the package are intended to accompany Vallejos, R., Osorio, F., Bevilacqua, M. (2020). Spatial Relationships Between Two Georeferenced Variables: With Applications in R. Springer, Cham <doi:10.1007/978-3-030-56681-4>.
Maintained by Felipe Osorio. Last updated 12 days ago.
codispersion-coefficientmodified-t-testspatial-associationspatial-processesssimstructural-similaritytjostheim-coefficientfortran
11.2 match 7 stars 5.88 score 73 scripts 1 dependentsbioc
microbiome:Microbiome Analytics
Utilities for microbiome analysis.
Maintained by Leo Lahti. Last updated 5 months ago.
metagenomicsmicrobiomesequencingsystemsbiologyhitchiphitchip-atlashuman-microbiomemicrobiologymicrobiome-analysisphyloseqpopulation-study
5.3 match 293 stars 12.51 score 2.0k scripts 5 dependentsbioc
PANR:Posterior association networks and functional modules inferred from rich phenotypes of gene perturbations
This package provides S4 classes and methods for inferring functional gene networks with edges encoding posterior beliefs of gene association types and nodes encoding perturbation effects.
Maintained by Xin Wang. Last updated 5 months ago.
immunooncologynetworkinferencevisualizationgraphandnetworkclusteringcellbasedassays
19.9 match 3.30 score 2 scriptsbioc
dcanr:Differential co-expression/association network analysis
This package implements methods and an evaluation framework to infer differential co-expression/association networks. Various methods are implemented and can be evaluated using simulated datasets. Inference of differential co-expression networks can allow identification of networks that are altered between two conditions (e.g., health and disease).
Maintained by Dharmesh D. Bhuva. Last updated 5 months ago.
networkinferencegraphandnetworkdifferentialexpressionnetwork
8.8 match 6 stars 7.45 score 26 scripts 5 dependentsbioc
rexposome:Exposome exploration and outcome data analysis
Package that allows to explore the exposome and to perform association analyses between exposures and health outcomes.
Maintained by Xavier Escribà Montagut. Last updated 5 months ago.
softwarebiologicalquestioninfrastructuredataimportdatarepresentationbiomedicalinformaticsexperimentaldesignmultiplecomparisonclassificationclustering
11.4 match 5.70 score 28 scripts 1 dependentspln-team
PLNmodels:Poisson Lognormal Models
The Poisson-lognormal model and variants (Chiquet, Mariadassou and Robin, 2021 <doi:10.3389/fevo.2021.588292>) can be used for a variety of multivariate problems when count data are at play, including principal component analysis for count data, discriminant analysis, model-based clustering and network inference. Implements variational algorithms to fit such models accompanied with a set of functions for visualization and diagnostic.
Maintained by Julien Chiquet. Last updated 7 days ago.
count-datamultivariate-analysisnetwork-inferencepcapoisson-lognormal-modelopenblascpp
6.7 match 55 stars 9.54 score 226 scriptshimesgroup
snpsettest:A Set-Based Association Test using GWAS Summary Statistics
The goal of 'snpsettest' is to provide simple tools that perform set-based association tests (e.g., gene-based association tests) using GWAS (genome-wide association study) summary statistics. A set-based association test in this package is based on the statistical model described in VEGAS (versatile gene-based association study), which combines the effects of a set of SNPs accounting for linkage disequilibrium between markers. This package uses a different approach from the original VEGAS implementation to compute set-level p values more efficiently, as described in <https://github.com/HimesGroup/snpsettest/wiki/Statistical-test-in-snpsettest>.
Maintained by Jaehyun Joo. Last updated 2 years ago.
12.7 match 7 stars 5.02 score 9 scriptsbioc
rTRM:Identification of Transcriptional Regulatory Modules from Protein-Protein Interaction Networks
rTRM identifies transcriptional regulatory modules (TRMs) from protein-protein interaction networks.
Maintained by Diego Diez. Last updated 5 months ago.
transcriptionnetworkgeneregulationgraphandnetworkbioconductorbioinformatics
13.2 match 3 stars 4.86 score 3 scripts 1 dependentsbioc
maaslin3:"Refining and extending generalized multivariate linear models for meta-omic association discovery"
MaAsLin 3 refines and extends generalized multivariate linear models for meta-omicron association discovery. It finds abundance and prevalence associations between microbiome meta-omics features and complex metadata in population-scale epidemiological studies. The software includes multiple analysis methods (including support for multiple covariates, repeated measures, and ordered predictors), filtering, normalization, and transform options to customize analysis for your specific study.
Maintained by William Nickols. Last updated 7 days ago.
metagenomicssoftwaremicrobiomenormalizationmultiplecomparison
7.8 match 33 stars 8.16 score 34 scriptsnanxstats
ggsci:Scientific Journal and Sci-Fi Themed Color Palettes for 'ggplot2'
A collection of 'ggplot2' color palettes inspired by plots in scientific journals, data visualization libraries, science fiction movies, and TV shows.
Maintained by Nan Xiao. Last updated 10 months ago.
color-palettesdata-visualizationggplot2ggscisci-fiscientific-journalsvisualization
3.5 match 680 stars 18.00 score 26k scripts 438 dependentsstscl
cisp:A Correlation Indicator Based on Spatial Patterns
Use the spatial association marginal contributions derived from spatial stratified heterogeneity to capture the degree of correlation between spatial patterns.
Maintained by Wenbo Lv. Last updated 2 months ago.
associationcorrelationgeoinformaticsspatial-patterns
12.2 match 5 stars 5.10 score 2 scriptsbabaknaimi
elsa:Entropy-Based Local Indicator of Spatial Association
A framework that provides the methods for quantifying entropy-based local indicator of spatial association (ELSA) that can be used for both continuous and categorical data. In addition, this package offers other methods to measure local indicators of spatial associations (LISA). Furthermore, global spatial structure can be measured using a variogram-like diagram, called entrogram. For more information, please check that paper: Naimi, B., Hamm, N. A., Groen, T. A., Skidmore, A. K., Toxopeus, A. G., & Alibakhshi, S. (2019) <doi:10.1016/j.spasta.2018.10.001>.
Maintained by Babak Naimi. Last updated 1 years ago.
11.9 match 14 stars 5.23 score 24 scriptssinnweja
haplo.stats:Statistical Analysis of Haplotypes with Traits and Covariates when Linkage Phase is Ambiguous
Routines for the analysis of indirectly measured haplotypes. The statistical methods assume that all subjects are unrelated and that haplotypes are ambiguous (due to unknown linkage phase of the genetic markers). The main functions are: haplo.em(), haplo.glm(), haplo.score(), and haplo.power(); all of which have detailed examples in the vignette.
Maintained by Jason P. Sinnwell. Last updated 6 months ago.
10.4 match 2 stars 5.98 score 96 scripts 12 dependentsbeerda
lfl:Linguistic Fuzzy Logic
Various algorithms related to linguistic fuzzy logic: mining for linguistic fuzzy association rules, composition of fuzzy relations, performing perception-based logical deduction (PbLD), and forecasting time-series using fuzzy rule-based ensemble (FRBE). The package also contains basic fuzzy-related algebraic functions capable of handling missing values in different styles (Bochvar, Sobocinski, Kleene etc.), computation of Sugeno integrals and fuzzy transform.
Maintained by Michal Burda. Last updated 5 months ago.
association-rulesforecast-modelfuzzy-logicinference-rulescppopenmp
11.5 match 8 stars 5.35 score 28 scriptsropensci
datapack:A Flexible Container to Transport and Manipulate Data and Associated Resources
Provides a flexible container to transport and manipulate complex sets of data. These data may consist of multiple data files and associated meta data and ancillary files. Individual data objects have associated system level meta data, and data files are linked together using the OAI-ORE standard resource map which describes the relationships between the files. The OAI- ORE standard is described at <https://www.openarchives.org/ore/>. Data packages can be serialized and transported as structured files that have been created following the BagIt specification. The BagIt specification is described at <https://tools.ietf.org/html/draft-kunze-bagit-08>.
Maintained by Matthew B. Jones. Last updated 3 years ago.
7.1 match 43 stars 8.55 score 195 scripts 4 dependentsinsightsengineering
teal.modules.general:General Modules for 'teal' Applications
Prebuilt 'shiny' modules containing tools for viewing data, visualizing data, understanding missing and outlier values within your data and performing simple data analysis. This extends 'teal' framework that supports reproducible research and analysis.
Maintained by Dawid Kaledkowski. Last updated 1 months ago.
general-purposemodulesnestshiny
6.1 match 13 stars 9.74 score 71 scriptsomfmartin
zebu:Local Association Measures
Implements the estimation of local (and global) association measures: Lewontin's D, Ducher's Z, pointwise mutual information, normalized pointwise mutual information and chi-squared residuals. The significance of local (and global) association is accessed using p-values estimated by permutations.
Maintained by Olivier M. F. Martin. Last updated 2 years ago.
14.9 match 3.98 score 19 scriptsbioc
ramwas:Fast Methylome-Wide Association Study Pipeline for Enrichment Platforms
A complete toolset for methylome-wide association studies (MWAS). It is specifically designed for data from enrichment based methylation assays, but can be applied to other data as well. The analysis pipeline includes seven steps: (1) scanning aligned reads from BAM files, (2) calculation of quality control measures, (3) creation of methylation score (coverage) matrix, (4) principal component analysis for capturing batch effects and detection of outliers, (5) association analysis with respect to phenotypes of interest while correcting for top PCs and known covariates, (6) annotation of significant findings, and (7) multi-marker analysis (methylation risk score) using elastic net. Additionally, RaMWAS include tools for joint analysis of methlyation and genotype data. This work is published in Bioinformatics, Shabalin et al. (2018) <doi:10.1093/bioinformatics/bty069>.
Maintained by Andrey A Shabalin. Last updated 5 months ago.
dnamethylationsequencingqualitycontrolcoveragepreprocessingnormalizationbatcheffectprincipalcomponentdifferentialmethylationvisualization
9.7 match 10 stars 6.08 score 85 scriptsfishr-core-team
FSA:Simple Fisheries Stock Assessment Methods
A variety of simple fish stock assessment methods.
Maintained by Derek H. Ogle. Last updated 2 months ago.
fishfisheriesfisheries-managementfisheries-stock-assessmentpopulation-dynamicsstock-assessment
5.3 match 69 stars 11.16 score 1.7k scripts 6 dependentsbeerda
nuggets:Extensible Data Pattern Searching Framework
Extensible framework for subgroup discovery (Atzmueller (2015) <doi:10.1002/widm.1144>), contrast patterns (Chen (2022) <doi:10.48550/arXiv.2209.13556>), emerging patterns (Dong (1999) <doi:10.1145/312129.312191>), association rules (Agrawal (1994) <https://www.vldb.org/conf/1994/P487.PDF>) and conditional correlations (Hájek (1978) <doi:10.1007/978-3-642-66943-9>). Both crisp (Boolean, binary) and fuzzy data are supported. It generates conditions in the form of elementary conjunctions, evaluates them on a dataset and checks the induced sub-data for interesting statistical properties. A user-defined function may be defined to evaluate on each generated condition to search for custom patterns.
Maintained by Michal Burda. Last updated 19 days ago.
association-rule-miningcontrast-pattern-miningdata-miningfuzzyknowledge-discoverypattern-recognitioncppopenmp
10.5 match 2 stars 5.38 score 10 scriptsr-spatial
spdep:Spatial Dependence: Weighting Schemes, Statistics
A collection of functions to create spatial weights matrix objects from polygon 'contiguities', from point patterns by distance and tessellations, for summarizing these objects, and for permitting their use in spatial data analysis, including regional aggregation by minimum spanning tree; a collection of tests for spatial 'autocorrelation', including global 'Morans I' and 'Gearys C' proposed by 'Cliff' and 'Ord' (1973, ISBN: 0850860369) and (1981, ISBN: 0850860814), 'Hubert/Mantel' general cross product statistic, Empirical Bayes estimates and 'Assunção/Reis' (1999) <doi:10.1002/(SICI)1097-0258(19990830)18:16%3C2147::AID-SIM179%3E3.0.CO;2-I> Index, 'Getis/Ord' G ('Getis' and 'Ord' 1992) <doi:10.1111/j.1538-4632.1992.tb00261.x> and multicoloured join count statistics, 'APLE' ('Li 'et al.' ) <doi:10.1111/j.1538-4632.2007.00708.x>, local 'Moran's I', 'Gearys C' ('Anselin' 1995) <doi:10.1111/j.1538-4632.1995.tb00338.x> and 'Getis/Ord' G ('Ord' and 'Getis' 1995) <doi:10.1111/j.1538-4632.1995.tb00912.x>, 'saddlepoint' approximations ('Tiefelsdorf' 2002) <doi:10.1111/j.1538-4632.2002.tb01084.x> and exact tests for global and local 'Moran's I' ('Bivand et al.' 2009) <doi:10.1016/j.csda.2008.07.021> and 'LOSH' local indicators of spatial heteroscedasticity ('Ord' and 'Getis') <doi:10.1007/s00168-011-0492-y>. The implementation of most of these measures is described in 'Bivand' and 'Wong' (2018) <doi:10.1007/s11749-018-0599-x>, with further extensions in 'Bivand' (2022) <doi:10.1111/gean.12319>. 'Lagrange' multiplier tests for spatial dependence in linear models are provided ('Anselin et al'. 1996) <doi:10.1016/0166-0462(95)02111-6>, as are 'Rao' score tests for hypothesised spatial 'Durbin' models based on linear models ('Koley' and 'Bera' 2023) <doi:10.1080/17421772.2023.2256810>. A local indicators for categorical data (LICD) implementation based on 'Carrer et al.' (2021) <doi:10.1016/j.jas.2020.105306> and 'Bivand et al.' (2017) <doi:10.1016/j.spasta.2017.03.003> was added in 1.3-7. From 'spdep' and 'spatialreg' versions >= 1.2-1, the model fitting functions previously present in this package are defunct in 'spdep' and may be found in 'spatialreg'.
Maintained by Roger Bivand. Last updated 1 months ago.
spatial-autocorrelationspatial-dependencespatial-weights
3.4 match 131 stars 16.59 score 6.0k scripts 106 dependentsralmond
Peanut:Parameterized Bayesian Networks, Abstract Classes
This provides support of learning conditional probability tables parameterized using CPTtools. This provides and object oriented layer on top of a CPTtools, to facilitate calculations with Parameterized models for Bayesian networks. Peanut is a collection of abstract classes and generic functions defining a protocol, with the intent that the protocol can be implemented with different Bayes net engines. The companion pacakge PNetica provides an implementation using Netica and RNetica.
Maintained by Russell Almond. Last updated 2 years ago.
bayesian-networkknowledge-representation
22.7 match 1 stars 2.48 score 4 scripts 2 dependentszhanxw
seqminer:Efficiently Read Sequence Data (VCF Format, BCF Format, METAL Format and BGEN Format) into R
Integrate sequencing data (Variant call format, e.g. VCF or BCF) or meta-analysis results in R. This package can help you (1) read VCF/BCF/BGEN files by chromosomal ranges (e.g. 1:100-200); (2) read RareMETAL summary statistics files; (3) read tables from a tabix-indexed files; (4) annotate VCF/BCF files; (5) create customized workflow based on Makefile.
Maintained by Xiaowei Zhan. Last updated 6 months ago.
annotationbcfbgenmeta-analysisnext-generation-sequencingplinksequencingtabixvcfworkflowzlibbzip2libzstdsqlite3cpp
6.8 match 30 stars 8.29 score 111 scripts 6 dependentskornl
mutoss:Unified Multiple Testing Procedures
Designed to ease the application and comparison of multiple hypothesis testing procedures for FWER, gFWER, FDR and FDX. Methods are standardized and usable by the accompanying 'mutossGUI'.
Maintained by Kornelius Rohmeyer. Last updated 1 years ago.
6.6 match 4 stars 8.50 score 24 scripts 16 dependentsbioc
omicRexposome:Exposome and omic data associatin and integration analysis
omicRexposome systematizes the association evaluation between exposures and omic data, taking advantage of MultiDataSet for coordinated data management, rexposome for exposome data definition and limma for association testing. Also to perform data integration mixing exposome and omic data using multi co-inherent analysis (omicade4) and multi-canonical correlation analysis (PMA).
Maintained by Xavier Escribà Montagut. Last updated 5 months ago.
immunooncologyworkflowstepmultiplecomparisonvisualizationgeneexpressiondifferentialexpressiondifferentialmethylationgeneregulationepigeneticsproteomicstranscriptomicsstatisticalmethodregression
13.0 match 4.30 score 5 scriptsstscl
itmsa:Information-Theoretic Measures for Spatial Association
Leveraging information-theoretic measures like mutual information and v-measure to quantify spatial associations between patterns (Nowosad and Stepinski (2018) <doi:10.1080/13658816.2018.1511794>; Bai, H. et al. (2023) <doi:10.1080/24694452.2023.2223700>).
Maintained by Wenbo Lv. Last updated 3 months ago.
11.1 match 5 stars 5.00 scorestatnet
ergm:Fit, Simulate and Diagnose Exponential-Family Models for Networks
An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) <doi:10.18637/jss.v024.i03> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.
Maintained by Pavel N. Krivitsky. Last updated 22 days ago.
3.6 match 100 stars 15.36 score 1.4k scripts 36 dependentsbioc
LEA:LEA: an R package for Landscape and Ecological Association Studies
LEA is an R package dedicated to population genomics, landscape genomics and genotype-environment association tests. LEA can run analyses of population structure and genome-wide tests for local adaptation, and also performs imputation of missing genotypes. The package includes statistical methods for estimating ancestry coefficients from large genotypic matrices and for evaluating the number of ancestral populations (snmf). It performs statistical tests using latent factor mixed models for identifying genetic polymorphisms that exhibit association with environmental gradients or phenotypic traits (lfmm2). In addition, LEA computes values of genetic offset statistics based on new or predicted environments (genetic.gap, genetic.offset). LEA is mainly based on optimized programs that can scale with the dimensions of large data sets.
Maintained by Olivier Francois. Last updated 21 days ago.
softwarestatistical methodclusteringregressionopenblas
8.2 match 6.63 score 534 scriptsbioc
Maaslin2:"Multivariable Association Discovery in Population-scale Meta-omics Studies"
MaAsLin2 is comprehensive R package for efficiently determining multivariable association between clinical metadata and microbial meta'omic features. MaAsLin2 relies on general linear models to accommodate most modern epidemiological study designs, including cross-sectional and longitudinal, and offers a variety of data exploration, normalization, and transformation methods. MaAsLin2 is the next generation of MaAsLin.
Maintained by Lauren McIver. Last updated 5 months ago.
metagenomicssoftwaremicrobiomenormalizationbiobakerybioconductordifferential-abundance-analysisfalse-discovery-ratemultiple-covariatespublicrepeated-measurestools
4.9 match 133 stars 11.03 score 532 scripts 3 dependentsbioc
PCAN:Phenotype Consensus ANalysis (PCAN)
Phenotypes comparison based on a pathway consensus approach. Assess the relationship between candidate genes and a set of phenotypes based on additional genes related to the candidate (e.g. Pathways or network neighbors).
Maintained by Matthew Page. Last updated 5 months ago.
annotationsequencinggeneticsfunctionalpredictionvariantannotationpathwaysnetwork
13.0 match 4.15 score 7 scriptsbiodiverse
unmarked:Models for Data from Unmarked Animals
Fits hierarchical models of animal abundance and occurrence to data collected using survey methods such as point counts, site occupancy sampling, distance sampling, removal sampling, and double observer sampling. Parameters governing the state and observation processes can be modeled as functions of covariates. References: Kellner et al. (2023) <doi:10.1111/2041-210X.14123>, Fiske and Chandler (2011) <doi:10.18637/jss.v043.i10>.
Maintained by Ken Kellner. Last updated 13 days ago.
4.1 match 4 stars 13.02 score 652 scripts 12 dependentsdanchaltiel
crosstable:Crosstables for Descriptive Analyses
Create descriptive tables for continuous and categorical variables. Apply summary statistics and counting function, with or without a grouping variable, and create beautiful reports using 'rmarkdown' or 'officer'. You can also compute effect sizes and statistical tests if needed.
Maintained by Dan Chaltiel. Last updated 2 months ago.
descriptive-statisticsflextablefrequency-tablehtml-reportmswordofficer
5.1 match 116 stars 10.40 score 340 scriptsozancinar
poolr:Methods for Pooling P-Values from (Dependent) Tests
Functions for pooling/combining the results (i.e., p-values) from (dependent) hypothesis tests. Included are Fisher's method, Stouffer's method, the inverse chi-square method, the Bonferroni method, Tippett's method, and the binomial test. Each method can be adjusted based on an estimate of the effective number of tests or using empirically derived null distribution using pseudo replicates. For Fisher's, Stouffer's, and the inverse chi-square method, direct generalizations based on multivariate theory are also available (leading to Brown's method, Strube's method, and the generalized inverse chi-square method). An introduction can be found in Cinar and Viechtbauer (2022) <doi:10.18637/jss.v101.i01>.
Maintained by Ozan Cinar. Last updated 18 days ago.
8.3 match 12 stars 6.32 score 145 scripts 1 dependentsbioc
xcms:LC-MS and GC-MS Data Analysis
Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.
Maintained by Steffen Neumann. Last updated 18 days ago.
immunooncologymassspectrometrymetabolomicsbioconductorfeature-detectionmass-spectrometrypeak-detectioncpp
3.7 match 196 stars 14.31 score 984 scripts 11 dependentsr-forge
survey:Analysis of Complex Survey Samples
Summary statistics, two-sample tests, rank tests, generalised linear models, cumulative link models, Cox models, loglinear models, and general maximum pseudolikelihood estimation for multistage stratified, cluster-sampled, unequally weighted survey samples. Variances by Taylor series linearisation or replicate weights. Post-stratification, calibration, and raking. Two-phase and multiphase subsampling designs. Graphics. PPS sampling without replacement. Small-area estimation. Dual-frame designs.
Maintained by "Thomas Lumley". Last updated 4 days ago.
3.8 match 1 stars 13.94 score 13k scripts 234 dependentsbarakbri
repfdr:Replicability Analysis for Multiple Studies of High Dimension
Estimation of Bayes and local Bayes false discovery rates for replicability analysis (Heller & Yekutieli, 2014 <doi:10.1214/13-AOAS697> ; Heller at al., 2015 <doi: 10.1093/bioinformatics/btu434>).
Maintained by Ruth Heller. Last updated 8 years ago.
10.5 match 3 stars 4.98 score 16 scriptsbioc
HIBAG:HLA Genotype Imputation with Attribute Bagging
Imputes HLA classical alleles using GWAS SNP data, and it relies on a training set of HLA and SNP genotypes. HIBAG can be used by researchers with published parameter estimates instead of requiring access to large training sample datasets. It combines the concepts of attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types. Attribute bagging is a technique which improves the accuracy and stability of classifier ensembles using bootstrap aggregating and random variable selection.
Maintained by Xiuwen Zheng. Last updated 4 months ago.
geneticsstatisticalmethodbioinformaticsgpuhlaimputationmhcsnpcpp
6.3 match 30 stars 8.24 score 48 scriptsbioc
SANTA:Spatial Analysis of Network Associations
This package provides methods for measuring the strength of association between a network and a phenotype. It does this by measuring clustering of the phenotype across the network (Knet). Vertices can also be individually ranked by their strength of association with high-weight vertices (Knode).
Maintained by Alex Cornish. Last updated 5 months ago.
networknetworkenrichmentclustering
10.4 match 5.02 score 6 scriptsshixiangwang
sigminer:Extract, Analyze and Visualize Mutational Signatures for Genomic Variations
Genomic alterations including single nucleotide substitution, copy number alteration, etc. are the major force for cancer initialization and development. Due to the specificity of molecular lesions caused by genomic alterations, we can generate characteristic alteration spectra, called 'signature' (Wang, Shixiang, et al. (2021) <DOI:10.1371/journal.pgen.1009557> & Alexandrov, Ludmil B., et al. (2020) <DOI:10.1038/s41586-020-1943-3> & Steele Christopher D., et al. (2022) <DOI:10.1038/s41586-022-04738-6>). This package helps users to extract, analyze and visualize signatures from genomic alteration records, thus providing new insight into cancer study.
Maintained by Shixiang Wang. Last updated 6 months ago.
bayesian-nmfbioinformaticscancer-researchcnvcopynumber-signaturescosmic-signaturesdbseasy-to-useindelmutational-signaturesnmfnmf-extractionsbssignature-extractionsomatic-mutationssomatic-variantsvisualizationcpp
5.4 match 150 stars 9.48 score 123 scripts 2 dependentsvincentarelbundock
countrycode:Convert Country Names and Country Codes
Standardize country names, convert them into one of 40 different coding schemes, convert between coding schemes, and assign region descriptors.
Maintained by Vincent Arel-Bundock. Last updated 13 hours ago.
3.4 match 348 stars 14.89 score 6.3k scripts 119 dependentsmaelstrom-research
madshapR:Support Technical Processes Following 'Maelstrom Research' Standards
Functions to support rigorous processes in data cleaning, evaluation, and documentation across datasets from different studies based on Maelstrom Research guidelines. The package includes the core functions to evaluate and format the main inputs that define the process, diagnose errors, and summarize and evaluate datasets and their associated data dictionaries. The main outputs are clean datasets and associated metadata, and tabular and visual summary reports. As described in Maelstrom Research guidelines for rigorous retrospective data harmonization (Fortier I and al. (2017) <doi:10.1093/ije/dyw075>).
Maintained by Guillaume Fabre. Last updated 11 months ago.
9.4 match 2 stars 5.40 score 28 scripts 3 dependentsvp-biostat
comorbidPGS:Assessing Predisposition Between Phenotypes using Polygenic Scores
Using polygenic scores (PGS, or PRS/GRS for binary outcomes), this package allows to investigate shared predisposition between different conditions, and do fast association analysis, export plots and views of the PGS distribution using 'ggplot2' object.
Maintained by Vincent Pascat. Last updated 3 months ago.
13.4 match 3 stars 3.78 score 3 scriptsbioc
consensusSeekeR:Detection of consensus regions inside a group of experiences using genomic positions and genomic ranges
This package compares genomic positions and genomic ranges from multiple experiments to extract common regions. The size of the analyzed region is adjustable as well as the number of experiences in which a feature must be present in a potential region to tag this region as a consensus region. In genomic analysis where feature identification generates a position value surrounded by a genomic range, such as ChIP-Seq peaks and nucleosome positions, the replication of an experiment may result in slight differences between predicted values. This package enables the conciliation of the results into consensus regions.
Maintained by Astrid Deschênes. Last updated 5 months ago.
biologicalquestionchipseqgeneticsmultiplecomparisontranscriptionpeakdetectionsequencingcoveragechip-seq-analysisgenomic-data-analysisnucleosome-positioning
9.6 match 1 stars 5.26 score 5 scripts 1 dependentscran
vcd:Visualizing Categorical Data
Visualization techniques, data sets, summary and inference procedures aimed particularly at categorical data. Special emphasis is given to highly extensible grid graphics. The package was package was originally inspired by the book "Visualizing Categorical Data" by Michael Friendly and is now the main support package for a new book, "Discrete Data Analysis with R" by Michael Friendly and David Meyer (2015).
Maintained by David Meyer. Last updated 7 months ago.
6.0 match 5 stars 8.19 score 87 dependentsblasbenito
collinear:Automated Multicollinearity Management
Effortless multicollinearity management in data frames with both numeric and categorical variables for statistical and machine learning applications. The package simplifies multicollinearity analysis by combining four robust methods: 1) target encoding for categorical variables (Micci-Barreca, D. 2001 <doi:10.1145/507533.507538>); 2) automated feature prioritization to prevent key variable loss during filtering; 3) pairwise correlation for all variable combinations (numeric-numeric, numeric-categorical, categorical-categorical); and 4) fast computation of variance inflation factors.
Maintained by Blas M. Benito. Last updated 3 months ago.
machine-learningmulticollinearitystatistics
8.9 match 11 stars 5.51 score 15 scripts 1 dependentspbs-software
PBSmodelling:GUI Tools Made Easy: Interact with Models and Explore Data
Provides software to facilitate the design, testing, and operation of computer models. It focuses particularly on tools that make it easy to construct and edit a customized graphical user interface ('GUI'). Although our simplified 'GUI' language depends heavily on the R interface to the 'Tcl/Tk' package, a user does not need to know 'Tcl/Tk'. Examples illustrate models built with other R packages, including 'PBSmapping', 'PBSddesolve', and 'BRugs'. A complete user's guide 'PBSmodelling-UG.pdf' shows how to use this package effectively.
Maintained by Rowan Haigh. Last updated 5 months ago.
7.3 match 2 stars 6.76 score 120 scripts 4 dependentscmmr
rbiom:Read/Write, Analyze, and Visualize 'BIOM' Data
A toolkit for working with Biological Observation Matrix ('BIOM') files. Read/write all 'BIOM' formats. Compute rarefaction, alpha diversity, and beta diversity (including 'UniFrac'). Summarize counts by taxonomic level. Subset based on metadata. Generate visualizations and statistical analyses. CPU intensive operations are coded in C for speed.
Maintained by Daniel P. Smith. Last updated 14 days ago.
5.4 match 15 stars 9.07 score 117 scripts 6 dependentsrobindenz1
simDAG:Simulate Data from a DAG and Associated Node Information
Simulate complex data from a given directed acyclic graph and information about each individual node. Root nodes are simply sampled from the specified distribution. Child Nodes are simulated according to one of many implemented regressions, such as logistic regression, linear regression, poisson regression and more. Also includes a comprehensive framework for discrete-time simulation, which can generate even more complex longitudinal data.
Maintained by Robin Denz. Last updated 2 days ago.
causal-inferencedirected-acyclic-graphsimulation
6.4 match 11 stars 7.69 score 77 scriptsapxr
analyzer:Data Analysis and Automated R Notebook Generation
Easy data analysis and quality checks which are commonly used in data science. It combines the tabular and graphical visualization for easier usability. This package also creates an R Notebook with detailed data exploration with one function call. The notebook can be made interactive.
Maintained by Apurv Priyam. Last updated 5 years ago.
11.8 match 4.13 score 27 scriptstgrimes
SeqNet:Generate RNA-Seq Data from Gene-Gene Association Networks
Methods to generate random gene-gene association networks and simulate RNA-seq data from them, as described in Grimes and Datta (2021) <doi:10.18637/jss.v098.i12>. Includes functions to generate random networks of any size and perturb them to obtain differential networks. Network objects are built from individual, overlapping modules that represent pathways. The resulting network has various topological properties that are characteristic of gene regulatory networks. RNA-seq data can be generated such that the association among gene expression profiles reflect the underlying network. A reference RNA-seq dataset can be provided to model realistic marginal distributions. Plotting functions are available to visualize a network, compare two networks, and compare the expression of two genes across multiple networks.
Maintained by Tyler Grimes. Last updated 4 years ago.
17.3 match 2.82 score 22 scripts 1 dependentsyulab-smu
aplot:Decorate a 'ggplot' with Associated Information
For many times, we are not just aligning plots as what 'cowplot' and 'patchwork' did. Users would like to align associated information that requires axes to be exactly matched in subplots, e.g. hierarchical clustering with a heatmap. Inspired by the 'Method 2' in 'ggtree' (G Yu (2018) <doi:10.1093/molbev/msy194>), 'aplot' provides utilities to aligns associated subplots to a main plot at different sides (left, right, top and bottom) with axes exactly matched.
Maintained by Guangchuang Yu. Last updated 1 months ago.
3.9 match 103 stars 12.25 score 520 scripts 118 dependentsrstudio
keras3:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.
Maintained by Tomasz Kalinowski. Last updated 11 days ago.
3.5 match 845 stars 13.63 score 264 scripts 2 dependentsmllaberia
Rtapas:Random Tanglegram Partitions
Applies a given global-fit method to random partial tanglegrams of a fixed size to identify the associations, terminals, and nodes that maximize phylogenetic (in)congruence. It also includes functions to compute more easily the confidence intervals of classification metrics and plot results, reducing computational time. See "Llaberia-Robledillo et al. (2023, <doi:10.1093/sysbio/syad016>)".
Maintained by Mar Llaberia-Robledillo. Last updated 10 months ago.
12.9 match 5 stars 3.70 score 9 scriptsjedalong
wildlifeDI:Calculate Indices of Dynamic Interaction for Wildlife Tracking Data
Dynamic interaction refers to spatial-temporal associations in the movements of two (or more) animals. This package provides tools for calculating a suite of indices used for quantifying dynamic interaction with wildlife telemetry data. For more information on each of the methods employed see the references within. The package (as of version >= 0.3) also has new tools for automating contact analysis in large tracking datasets. The package (as of version 1.0) uses the 'move2' class of objects for working with tracking dataset.
Maintained by Jed Long. Last updated 13 days ago.
7.2 match 16 stars 6.60 score 31 scriptsacorg
Racmacs:Antigenic Cartography Macros
A toolkit for making antigenic maps from immunological assay data, in order to quantify and visualize antigenic differences between different pathogen strains as described in Smith et al. (2004) <doi:10.1126/science.1097211> and used in the World Health Organization influenza vaccine strain selection process. Additional functions allow for the diagnostic evaluation of antigenic maps and an interactive viewer is provided to explore antigenic relationships amongst several strains and incorporate the visualization of associated genetic information.
Maintained by Sam Wilks. Last updated 9 months ago.
5.9 match 21 stars 8.06 score 362 scriptsr-forge
Sleuth3:Data Sets from Ramsey and Schafer's "Statistical Sleuth (3rd Ed)"
Data sets from Ramsey, F.L. and Schafer, D.W. (2013), "The Statistical Sleuth: A Course in Methods of Data Analysis (3rd ed)", Cengage Learning.
Maintained by Berwin A Turlach. Last updated 1 years ago.
7.5 match 6.29 score 522 scriptsready4-dev
ready4:Develop and Use Modular Health Economic Models
A template model module, tools to help find model modules derived from this template and a programming syntax to use these modules in health economic analyses. These elements are the foundation for a prototype software framework for developing living and transferable models and using those models in reproducible health economic analyses. The software framework is extended by other R libraries. For detailed documentation about the framework and how to use it visit <https://www.ready4-dev.com/>. For a background to the methodological issues that the framework is attempting to help solve, see Hamilton et al. (2024) <doi:10.1007/s40273-024-01378-8>.
Maintained by Matthew Hamilton. Last updated 5 months ago.
computational-modelinghealth-economicssoftware-framework
6.9 match 2 stars 6.84 score 95 scriptsbioc
coMethDMR:Accurate identification of co-methylated and differentially methylated regions in epigenome-wide association studies
coMethDMR identifies genomic regions associated with continuous phenotypes by optimally leverages covariations among CpGs within predefined genomic regions. Instead of testing all CpGs within a genomic region, coMethDMR carries out an additional step that selects co-methylated sub-regions first without using any outcome information. Next, coMethDMR tests association between methylation within the sub-region and continuous phenotype using a random coefficient mixed effects model, which models both variations between CpG sites within the region and differential methylation simultaneously.
Maintained by Fernanda Veitzman. Last updated 5 months ago.
dnamethylationepigeneticsmethylationarraydifferentialmethylationgenomewideassociation
7.3 match 7 stars 6.47 score 42 scriptsdrjphughesjr
hash:Full Featured Implementation of Hash Tables/Associative Arrays/Dictionaries
Implements a data structure similar to hashes in Perl and dictionaries in Python but with a purposefully R flavor. For objects of appreciable size, access using hashes outperforms native named lists and vectors.
Maintained by John Hughes. Last updated 2 years ago.
6.2 match 1 stars 7.54 score 4.0k scripts 50 dependentsbioc
pathwayPCA:Integrative Pathway Analysis with Modern PCA Methodology and Gene Selection
pathwayPCA is an integrative analysis tool that implements the principal component analysis (PCA) based pathway analysis approaches described in Chen et al. (2008), Chen et al. (2010), and Chen (2011). pathwayPCA allows users to: (1) Test pathway association with binary, continuous, or survival phenotypes. (2) Extract relevant genes in the pathways using the SuperPCA and AES-PCA approaches. (3) Compute principal components (PCs) based on the selected genes. These estimated latent variables represent pathway activities for individual subjects, which can then be used to perform integrative pathway analysis, such as multi-omics analysis. (4) Extract relevant genes that drive pathway significance as well as data corresponding to these relevant genes for additional in-depth analysis. (5) Perform analyses with enhanced computational efficiency with parallel computing and enhanced data safety with S4-class data objects. (6) Analyze studies with complex experimental designs, with multiple covariates, and with interaction effects, e.g., testing whether pathway association with clinical phenotype is different between male and female subjects. Citations: Chen et al. (2008) <https://doi.org/10.1093/bioinformatics/btn458>; Chen et al. (2010) <https://doi.org/10.1002/gepi.20532>; and Chen (2011) <https://doi.org/10.2202/1544-6115.1697>.
Maintained by Gabriel Odom. Last updated 5 months ago.
copynumbervariationdnamethylationgeneexpressionsnptranscriptiongenepredictiongenesetenrichmentgenesignalinggenetargetgenomewideassociationgenomicvariationcellbiologyepigeneticsfunctionalgenomicsgeneticslipidomicsmetabolomicsproteomicssystemsbiologytranscriptomicsclassificationdimensionreductionfeatureextractionprincipalcomponentregressionsurvivalmultiplecomparisonpathways
6.0 match 11 stars 7.74 score 42 scriptssatijalab
SeuratObject:Data Structures for Single Cell Data
Defines S4 classes for single-cell genomic data and associated information, such as dimensionality reduction embeddings, nearest-neighbor graphs, and spatially-resolved coordinates. Provides data access methods and R-native hooks to ensure the Seurat object is familiar to other R users. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, and Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031> for more details.
Maintained by Paul Hoffman. Last updated 2 years ago.
3.9 match 25 stars 11.69 score 1.2k scripts 88 dependentsfhdsl
metricminer:Mine Metrics from Common Places on the Web
Mine metrics on common places on the web through the power of their APIs (application programming interfaces). It also helps make the data in a format that is easily used for a dashboard or other purposes. There is an associated dashboard template and tutorials that are underdevelopment that help you fully utilize 'metricminer'.
Maintained by Candace Savonen. Last updated 7 days ago.
7.5 match 2 stars 6.13 score 21 scriptsgreat-northern-diver
loon:Interactive Statistical Data Visualization
An extendable toolkit for interactive data visualization and exploration.
Maintained by R. Wayne Oldford. Last updated 2 years ago.
data-analysisdata-sciencedata-visualizationexploratory-analysisexploratory-data-analysishigh-dimensional-datainteractive-graphicsinteractive-visualizationsloonpythonstatistical-analysisstatistical-graphicsstatisticstcl-extensiontk
5.1 match 48 stars 9.00 score 93 scripts 5 dependentsbioc
knowYourCG:Functional analysis of DNA methylome datasets
KnowYourCG (KYCG) is a supervised learning framework designed for the functional analysis of DNA methylation data. Unlike existing tools that focus on genes or genomic intervals, KnowYourCG directly targets CpG dinucleotides, featuring automated supervised screenings of diverse biological and technical influences, including sequence motifs, transcription factor binding, histone modifications, replication timing, cell-type-specific methylation, and trait-epigenome associations. KnowYourCG addresses the challenges of data sparsity in various methylation datasets, including low-pass Nanopore sequencing, single-cell DNA methylomes, 5-hydroxymethylation profiles, spatial DNA methylation maps, and array-based datasets for epigenome-wide association studies and epigenetic clocks.
Maintained by Goldberg David. Last updated 3 months ago.
epigeneticsdnamethylationsequencingsinglecellspatialmethylationarrayzlib
7.5 match 2 stars 6.10 score 4 scriptsbioc
maftools:Summarize, Analyze and Visualize MAF Files
Analyze and visualize Mutation Annotation Format (MAF) files from large scale sequencing studies. This package provides various functions to perform most commonly used analyses in cancer genomics and to create feature rich customizable visualzations with minimal effort.
Maintained by Anand Mayakonda. Last updated 5 months ago.
datarepresentationdnaseqvisualizationdrivermutationvariantannotationfeatureextractionclassificationsomaticmutationsequencingfunctionalgenomicssurvivalbioinformaticscancer-genome-atlascancer-genomicsgenomicsmaf-filestcgacurlbzip2xz-utilszlib
3.1 match 461 stars 14.59 score 948 scripts 18 dependentsbioc
lisaClust:lisaClust: Clustering of Local Indicators of Spatial Association
lisaClust provides a series of functions to identify and visualise regions of tissue where spatial associations between cell-types is similar. This package can be used to provide a high-level summary of cell-type colocalization in multiplexed imaging data that has been segmented at a single-cell resolution.
Maintained by Ellis Patrick. Last updated 4 months ago.
singlecellcellbasedassaysspatial
6.9 match 3 stars 6.64 score 48 scriptsasalavaty
influential:Identification and Classification of the Most Influential Nodes
Contains functions for the classification and ranking of top candidate features, reconstruction of networks from adjacency matrices and data frames, analysis of the topology of the network and calculation of centrality measures, and identification of the most influential nodes. Also, a function is provided for running SIRIR model, which is the combination of leave-one-out cross validation technique and the conventional SIR model, on a network to unsupervisedly rank the true influence of vertices. Additionally, some functions have been provided for the assessment of dependence and correlation of two network centrality measures as well as the conditional probability of deviation from their corresponding means in opposite direction. Fred Viole and David Nawrocki (2013, ISBN:1490523995). Csardi G, Nepusz T (2006). "The igraph software package for complex network research." InterJournal, Complex Systems, 1695. Adopted algorithms and sources are referenced in function document.
Maintained by Adrian Salavaty. Last updated 6 months ago.
centrality-measuresclassification-modelinfluence-rankingnetwork-analysispriaritization-model
7.0 match 27 stars 6.54 score 43 scripts 1 dependentsemf-creaf
indicspecies:Relationship Between Species and Groups of Sites
Functions to assess the strength and statistical significance of the relationship between species occurrence/abundance and groups of sites [De Caceres & Legendre (2009) <doi:10.1890/08-1823.1>]. Also includes functions to measure species niche breadth using resource categories [De Caceres et al. (2011) <doi:10.1111/J.1600-0706.2011.19679.x>].
Maintained by Miquel De Cáceres. Last updated 1 months ago.
4.7 match 10 stars 9.66 score 386 scripts 4 dependentsbioc
CNVRanger:Summarization and expression/phenotype association of CNV ranges
The CNVRanger package implements a comprehensive tool suite for CNV analysis. This includes functionality for summarizing individual CNV calls across a population, assessing overlap with functional genomic regions, and association analysis with gene expression and quantitative phenotypes.
Maintained by Ludwig Geistlinger. Last updated 5 months ago.
copynumbervariationdifferentialexpressiongeneexpressiongenomewideassociationgenomicvariationmicroarrayrnaseqsnpbioconductor-packageu24ca289073
7.8 match 7 stars 5.77 score 12 scriptscovaruber
sommer:Solving Mixed Model Equations in R
Structural multivariate-univariate linear mixed model solver for estimation of multiple random effects with unknown variance-covariance structures (e.g., heterogeneous and unstructured) and known covariance among levels of random effects (e.g., pedigree and genomic relationship matrices) (Covarrubias-Pazaran, 2016 <doi:10.1371/journal.pone.0156744>; Maier et al., 2015 <doi:10.1016/j.ajhg.2014.12.006>; Jensen et al., 1997). REML estimates can be obtained using the Direct-Inversion Newton-Raphson and Direct-Inversion Average Information algorithms for the problems r x r (r being the number of records) or using the Henderson-based average information algorithm for the problem c x c (c being the number of coefficients to estimate). Spatial models can also be fitted using the two-dimensional spline functionality available.
Maintained by Giovanny Covarrubias-Pazaran. Last updated 5 days ago.
average-informationmixed-modelsrcpparmadilloopenblascppopenmp
3.5 match 44 stars 12.63 score 300 scripts 10 dependentssem-in-r
seminr:Building and Estimating Structural Equation Models
A powerful, easy to syntax for specifying and estimating complex Structural Equation Models. Models can be estimated using Partial Least Squares Path Modeling or Covariance-Based Structural Equation Modeling or covariance based Confirmatory Factor Analysis. Methods described in Ray, Danks, and Valdez (2021).
Maintained by Nicholas Patrick Danks. Last updated 3 years ago.
common-factorscompositesconstructpls-models
6.0 match 62 stars 7.46 score 284 scriptscran
hytest:Hypothesis Testing Based on Neyman-Pearson Lemma and Likelihood Ratio Test
Error type I and Optimal critical values to test statistical hypothesis based on Neyman-Pearson Lemma and Likelihood ratio test based on random samples from several distributions. The families of distributions are Bernoulli, Exponential, Geometric, Inverse Normal, Normal, Gamma, Gumbel, Lognormal, Poisson, and Weibull. This package is an ideal resource to help with the teaching of Statistics. The main references for this package are Casella G. and Berger R. (2003,ISBN:0-534-24312-6 , "Statistical Inference. Second Edition", Duxbury Press) and Hogg, R., McKean, J., and Craig, A. (2019,ISBN:013468699, "Introduction to Mathematical Statistic. Eighth edition", Pearson).
Maintained by Carlos Alberto Cardozo Delgado. Last updated 7 months ago.
34.1 match 1.30 scorebioc
PanomiR:Detection of miRNAs that regulate interacting groups of pathways
PanomiR is a package to detect miRNAs that target groups of pathways from gene expression data. This package provides functionality for generating pathway activity profiles, determining differentially activated pathways between user-specified conditions, determining clusters of pathways via the PCxN package, and generating miRNAs targeting clusters of pathways. These function can be used separately or sequentially to analyze RNA-Seq data.
Maintained by Pourya Naderi. Last updated 5 months ago.
geneexpressiongenesetenrichmentgenetargetmirnapathways
9.0 match 3 stars 4.89 score 13 scriptskkholst
lava:Latent Variable Models
A general implementation of Structural Equation Models with latent variables (MLE, 2SLS, and composite likelihood estimators) with both continuous, censored, and ordinal outcomes (Holst and Budtz-Joergensen (2013) <doi:10.1007/s00180-012-0344-y>). Mixture latent variable models and non-linear latent variable models (Holst and Budtz-Joergensen (2020) <doi:10.1093/biostatistics/kxy082>). The package also provides methods for graph exploration (d-separation, back-door criterion), simulation of general non-linear latent variable models, and estimation of influence functions for a broad range of statistical models.
Maintained by Klaus K. Holst. Last updated 3 months ago.
latent-variable-modelssimulationstatisticsstructural-equation-models
3.4 match 33 stars 12.87 score 610 scripts 478 dependentsfriendly
vcdExtra:'vcd' Extensions and Additions
Provides additional data sets, methods and documentation to complement the 'vcd' package for Visualizing Categorical Data and the 'gnm' package for Generalized Nonlinear Models. In particular, 'vcdExtra' extends mosaic, assoc and sieve plots from 'vcd' to handle 'glm()' and 'gnm()' models and adds a 3D version in 'mosaic3d'. Additionally, methods are provided for comparing and visualizing lists of 'glm' and 'loglm' objects. This package is now a support package for the book, "Discrete Data Analysis with R" by Michael Friendly and David Meyer.
Maintained by Michael Friendly. Last updated 8 days ago.
categorical-data-visualizationgeneralized-linear-modelsmosaic-plots
4.0 match 24 stars 10.85 score 472 scripts 3 dependentsbilldenney
PKNCA:Perform Pharmacokinetic Non-Compartmental Analysis
Compute standard Non-Compartmental Analysis (NCA) parameters for typical pharmacokinetic analyses and summarize them.
Maintained by Bill Denney. Last updated 1 months ago.
ncanoncompartmental-analysispharmacokinetics
3.5 match 73 stars 12.53 score 214 scripts 4 dependentsncss-tech
sharpshootR:A Soil Survey Toolkit
A collection of data processing, visualization, and export functions to support soil survey operations. Many of the functions build on the `SoilProfileCollection` S4 class provided by the aqp package, extending baseline visualization to more elaborate depictions in the context of spatial and taxonomic data. While this package is primarily developed by and for the USDA-NRCS, in support of the National Cooperative Soil Survey, the authors strive for generalization sufficient to support any soil survey operation. Many of the included functions are used by the SoilWeb suite of websites and movile applications. These functions are provided here, with additional documentation, to enable others to replicate high quality versions of these figures for their own purposes.
Maintained by Dylan Beaudette. Last updated 28 days ago.
5.2 match 18 stars 8.37 score 327 scriptsr-forge
Sleuth2:Data Sets from Ramsey and Schafer's "Statistical Sleuth (2nd Ed)"
Data sets from Ramsey, F.L. and Schafer, D.W. (2002), "The Statistical Sleuth: A Course in Methods of Data Analysis (2nd ed)", Duxbury.
Maintained by Berwin A Turlach. Last updated 1 years ago.
7.5 match 5.79 score 191 scriptsalexkowa
EnvStats:Package for Environmental Statistics, Including US EPA Guidance
Graphical and statistical analyses of environmental data, with focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book "EnvStats: An R Package for Environmental Statistics" (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, <doi:10.1007/978-1-4614-8456-1>).
Maintained by Alexander Kowarik. Last updated 1 days ago.
3.4 match 26 stars 12.85 score 2.4k scripts 47 dependentsohdsi
DatabaseConnector:Connecting to Various Database Platforms
An R 'DataBase Interface' ('DBI') compatible interface to various database platforms ('PostgreSQL', 'Oracle', 'Microsoft SQL Server', 'Amazon Redshift', 'Microsoft Parallel Database Warehouse', 'IBM Netezza', 'Apache Impala', 'Google BigQuery', 'Snowflake', 'Spark', 'SQLite', and 'InterSystems IRIS'). Also includes support for fetching data as 'Andromeda' objects. Uses either 'Java Database Connectivity' ('JDBC') or other 'DBI' drivers to connect to databases.
Maintained by Martijn Schuemie. Last updated 2 months ago.
3.6 match 56 stars 11.94 score 772 scripts 13 dependentsmlizhangx
NAIR:Network Analysis of Immune Repertoire
Pipelines for studying the adaptive immune repertoire of T cells and B cells via network analysis based on receptor sequence similarity. Relate clinical outcomes to immune repertoires based on their network properties, or to particular clusters and clones within a repertoire. Yang et al. (2023) <doi:10.3389/fimmu.2023.1181825>.
Maintained by Brian Neal. Last updated 3 months ago.
6.4 match 7 stars 6.66 score 27 scriptscivisanalytics
civis:R Client for the 'Civis Platform API'
A convenient interface for making requests directly to the 'Civis Platform API' <https://www.civisanalytics.com/platform/>. Full documentation available 'here' <https://civisanalytics.github.io/civis-r/>.
Maintained by Peter Cooman. Last updated 2 months ago.
5.4 match 16 stars 7.84 score 144 scriptsopenintrostat
openintro:Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs
Supplemental functions and data for 'OpenIntro' resources, which includes open-source textbooks and resources for introductory statistics (<https://www.openintro.org/>). The package contains datasets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of Windows operating system.
Maintained by Mine Çetinkaya-Rundel. Last updated 3 months ago.
3.8 match 240 stars 11.29 score 6.0k scriptsmikejareds
hermiter:Efficient Sequential and Batch Estimation of Univariate and Bivariate Probability Density Functions and Cumulative Distribution Functions along with Quantiles (Univariate) and Nonparametric Correlation (Bivariate)
Facilitates estimation of full univariate and bivariate probability density functions and cumulative distribution functions along with full quantile functions (univariate) and nonparametric correlation (bivariate) using Hermite series based estimators. These estimators are particularly useful in the sequential setting (both stationary and non-stationary) and one-pass batch estimation setting for large data sets. Based on: Stephanou, Michael, Varughese, Melvin and Macdonald, Iain. "Sequential quantiles via Hermite series density estimation." Electronic Journal of Statistics 11.1 (2017): 570-607 <doi:10.1214/17-EJS1245>, Stephanou, Michael and Varughese, Melvin. "On the properties of Hermite series based distribution function estimators." Metrika (2020) <doi:10.1007/s00184-020-00785-z> and Stephanou, Michael and Varughese, Melvin. "Sequential estimation of Spearman rank correlation using Hermite series estimators." Journal of Multivariate Analysis (2021) <doi:10.1016/j.jmva.2021.104783>.
Maintained by Michael Stephanou. Last updated 7 months ago.
cumulative-distribution-functionkendall-correlation-coefficientonline-algorithmsprobability-density-functionquantilespearman-correlation-coefficientstatisticsstreaming-algorithmsstreaming-datacpp
8.3 match 15 stars 5.11 score 17 scriptshanchenphd
MAGEE:Mixed Model Association Test for GEne-Environment Interaction
Use a 'glmmkin' class object (GMMAT package) from the null model to perform generalized linear mixed model-based single-variant and variant set main effect tests, gene-environment interaction tests, and joint tests for association, as proposed in Wang et al. (2020) <DOI:10.1002/gepi.22351>.
Maintained by Han Chen. Last updated 8 months ago.
8.5 match 4.95 score 9 scriptsabdel-elsayed87
GRIN2:Genomic Random Interval (GRIN)
Improved version of 'GRIN' software that streamlines its use in practice to analyze genomic lesion data, accelerate its computing, and expand its analysis capabilities to answer additional scientific questions including a rigorous evaluation of the association of genomic lesions with RNA expression. Pounds, Stan, et al. (2013) <DOI:10.1093/bioinformatics/btt372>.
Maintained by Abdelrahman Elsayed. Last updated 4 months ago.
12.6 match 3.30 scorebioc
geNetClassifier:Classify diseases and build associated gene networks using gene expression profiles
Comprehensive package to automatically train and validate a multi-class SVM classifier based on gene expression data. Provides transparent selection of gene markers, their coexpression networks, and an interface to query the classifier.
Maintained by Sara Aibar. Last updated 5 months ago.
classificationdifferentialexpressionmicroarray
9.5 match 4.38 score 1 scripts 2 dependentsbioc
SAIGEgds:Scalable Implementation of Generalized mixed models using GDS files in Phenome-Wide Association Studies
Scalable implementation of generalized mixed models with highly optimized C++ implementation and integration with Genomic Data Structure (GDS) files. It is designed for single variant tests and set-based aggregate tests in large-scale Phenome-wide Association Studies (PheWAS) with millions of variants and samples, controlling for sample structure and case-control imbalance. The implementation is based on the SAIGE R package (v0.45, Zhou et al. 2018 and Zhou et al. 2020), and it is extended to include the state-of-the-art ACAT-O set-based tests. Benchmarks show that SAIGEgds is significantly faster than the SAIGE R package.
Maintained by Xiuwen Zheng. Last updated 4 months ago.
softwaregeneticsstatisticalmethodgenomewideassociationgdsgwasmixed-modelphewasopenblascpp
6.8 match 7 stars 6.04 score 15 scriptskasperwelbers
corpustools:Managing, Querying and Analyzing Tokenized Text
Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., part-of-speech tags, dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, exporting to DTM for compatibility with many text analysis packages, and the possibility to reconstruct original text from tokens to facilitate interpretation.
Maintained by Kasper Welbers. Last updated 6 months ago.
5.5 match 31 stars 7.50 score 174 scripts 1 dependentsrmi-pacta
r2dii.colours:2 Degrees Investing Colour Palettes in Different Formats
Get colour values from different colour palettes used by 2 Degrees Investing (2DII) organization in their reserach streams. Different ways to obtain the colour values are available: dataframe or a function call.
Maintained by Monika Furdyna. Last updated 11 months ago.
13.9 match 3 stars 2.95 score 6 scripts 2 dependentstermehs
netropy:Statistical Entropy Analysis of Network Data
Statistical entropy analysis of network data as introduced by Frank and Shafie (2016) <doi:10.1177/0759106315615511>, and a in textbook which is in progress.
Maintained by Termeh Shafie. Last updated 5 months ago.
6.6 match 12 stars 6.26 score 9 scriptsnimble-dev
nimble:MCMC, Particle Filtering, and Programmable Hierarchical Modeling
A system for writing hierarchical statistical models largely compatible with 'BUGS' and 'JAGS', writing nimbleFunctions to operate models and do basic R-style math, and compiling both models and nimbleFunctions via custom-generated C++. 'NIMBLE' includes default methods for MCMC, Laplace Approximation, Monte Carlo Expectation Maximization, and some other tools. The nimbleFunction system makes it easy to do things like implement new MCMC samplers from R, customize the assignment of samplers to different parts of a model from R, and compile the new samplers automatically via C++ alongside the samplers 'NIMBLE' provides. 'NIMBLE' extends the 'BUGS'/'JAGS' language by making it extensible: New distributions and functions can be added, including as calls to external compiled code. Although most people think of MCMC as the main goal of the 'BUGS'/'JAGS' language for writing models, one can use 'NIMBLE' for writing arbitrary other kinds of model-generic algorithms as well. A full User Manual is available at <https://r-nimble.org>.
Maintained by Christopher Paciorek. Last updated 20 days ago.
bayesian-inferencebayesian-methodshierarchical-modelsmcmcprobabilistic-programmingopenblascpp
3.2 match 169 stars 12.97 score 2.6k scripts 19 dependentsbioc
methodical:Discovering genomic regions where methylation is strongly associated with transcriptional activity
DNA methylation is generally considered to be associated with transcriptional silencing. However, comprehensive, genome-wide investigation of this relationship requires the evaluation of potentially millions of correlation values between the methylation of individual genomic loci and expression of associated transcripts in a relatively large numbers of samples. Methodical makes this process quick and easy while keeping a low memory footprint. It also provides a novel method for identifying regions where a number of methylation sites are consistently strongly associated with transcriptional expression. In addition, Methodical enables housing DNA methylation data from diverse sources (e.g. WGBS, RRBS and methylation arrays) with a common framework, lifting over DNA methylation data between different genome builds and creating base-resolution plots of the association between DNA methylation and transcriptional activity at transcriptional start sites.
Maintained by Richard Heery. Last updated 2 months ago.
dnamethylationmethylationarraytranscriptiongenomewideassociationsoftwareopenjdk
8.8 match 4.65 score 14 scriptscrsh
papaja:Prepare American Psychological Association Journal Articles with R Markdown
Tools to create dynamic, submission-ready manuscripts, which conform to American Psychological Association manuscript guidelines. We provide R Markdown document formats for manuscripts (PDF and Word) and revision letters (PDF). Helper functions facilitate reporting statistical analyses or create publication-ready tables and plots.
Maintained by Frederik Aust. Last updated 1 months ago.
apaapa-guidelinesjournalmanuscriptpsychologyreproducible-paperreproducible-researchrmarkdown
3.3 match 663 stars 12.00 score 1.7k scripts 2 dependentsdstanley4
apaTables:Create American Psychological Association (APA) Style Tables
A common task faced by researchers is the creation of APA style (i.e., American Psychological Association style) tables from statistical output. In R a large number of function calls are often needed to obtain all of the desired information for a single APA style table. As well, the process of manually creating APA style tables in a word processor is prone to transcription errors. This package creates Word files (.doc files) and latex code containing APA style tables for several types of analyses. Using this package minimizes transcription errors and reduces the number commands needed by the user.
Maintained by David Stanley. Last updated 7 months ago.
5.1 match 61 stars 7.86 score 438 scriptsausgis
SecDim:The Second Dimension of Spatial Association
Most of the current methods explore spatial association using observations at sample locations, which are defined as the first dimension of spatial association (FDA). The proposed concept of the second dimension of spatial association (SDA), as described in Yongze Song (2022) <doi:10.1016/j.jag.2022.102834>, aims to extract in-depth information about the geographical environment from locations outside sample locations for exploring spatial association.
Maintained by Wenbo Lv. Last updated 7 months ago.
spatial-associationspatial-predictions
14.8 match 1 stars 2.70 score 2 scriptsdaniel-jg
BeviMed:Bayesian Evaluation of Variant Involvement in Mendelian Disease
A fast integrative genetic association test for rare diseases based on a model for disease status given allele counts at rare variant sites. Probability of association, mode of inheritance and probability of pathogenicity for individual variants are all inferred in a Bayesian framework - 'A Fast Association Test for Identifying Pathogenic Variants Involved in Rare Diseases', Greene et al 2017 <doi:10.1016/j.ajhg.2017.05.015>.
Maintained by Daniel Greene. Last updated 10 months ago.
11.7 match 1 stars 3.41 score 17 scriptsbioc
GEM:GEM: fast association study for the interplay of Gene, Environment and Methylation
Tools for analyzing EWAS, methQTL and GxE genome widely.
Maintained by Hong Pan. Last updated 5 months ago.
methylseqmethylationarraygenomewideassociationregressiondnamethylationsnpgeneexpressiongui
7.3 match 5.43 score 27 scriptsfmichonneau
phylobase:Base Package for Phylogenetic Structures and Comparative Data
Provides a base S4 class for comparative methods, incorporating one or more trees and trait data.
Maintained by Francois Michonneau. Last updated 1 years ago.
3.6 match 18 stars 11.10 score 394 scripts 18 dependentsdjvanderlaan
datapackage:Creating and Reading Data Packages
Open, read data from and modify Data Packages. Data Packages are an open standard for bundling and describing data sets (<https://datapackage.org>). When data is read from a Data Package care is taken to convert the data as much a possible to R appropriate data types. The package can be extended with plugins for additional data types.
Maintained by Jan van der Laan. Last updated 6 days ago.
7.0 match 2 stars 5.65 scorebioc
ChIPQC:Quality metrics for ChIPseq data
Quality metrics for ChIPseq data.
Maintained by Tom Carroll. Last updated 5 months ago.
sequencingchipseqqualitycontrolreportwriting
7.2 match 5.45 score 140 scriptsbioc
CytoML:A GatingML Interface for Cross Platform Cytometry Data Sharing
Uses platform-specific implemenations of the GatingML2.0 standard to exchange gated cytometry data with other software platforms.
Maintained by Mike Jiang. Last updated 25 days ago.
immunooncologyflowcytometrydataimportdatarepresentationzlibopenblaslibxml2cpp
5.2 match 30 stars 7.60 score 132 scriptsobjornstad
ncf:Spatial Covariance Functions
Spatial (cross-)covariance and related geostatistical tools: the nonparametric (cross-)covariance function , the spline correlogram, the nonparametric phase coherence function, local indicators of spatial association (LISA), (Mantel) correlogram, (Partial) Mantel test.
Maintained by Ottar N. Bjornstad. Last updated 3 years ago.
6.0 match 5 stars 6.44 score 328 scripts 1 dependentssantagos
dad:Three-Way / Multigroup Data Analysis Through Densities
The data consist of a set of variables measured on several groups of individuals. To each group is associated an estimated probability density function. The package provides tools to create or manage such data and functional methods (principal component analysis, multidimensional scaling, cluster analysis, discriminant analysis...) for such probability densities.
Maintained by Pierre Santagostini. Last updated 4 months ago.
7.2 match 5.32 score 92 scriptsrstudio
shiny:Web Application Framework for R
Makes it incredibly easy to build interactive web applications with R. Automatic "reactive" binding between inputs and outputs and extensive prebuilt widgets make it possible to build beautiful, responsive, and powerful applications with minimal effort.
Maintained by Winston Chang. Last updated 6 days ago.
reactiverstudioshinyweb-appweb-development
1.8 match 5.5k stars 21.31 score 108k scripts 1.8k dependentsheliosdrm
pwr:Basic Functions for Power Analysis
Power analysis functions along the lines of Cohen (1988).
Maintained by Helios De Rosario. Last updated 1 years ago.
2.9 match 105 stars 13.05 score 2.6k scripts 28 dependentsbioc
rcellminer:rcellminer: Molecular Profiles, Drug Response, and Chemical Structures for the NCI-60 Cell Lines
The NCI-60 cancer cell line panel has been used over the course of several decades as an anti-cancer drug screen. This panel was developed as part of the Developmental Therapeutics Program (DTP, http://dtp.nci.nih.gov/) of the U.S. National Cancer Institute (NCI). Thousands of compounds have been tested on the NCI-60, which have been extensively characterized by many platforms for gene and protein expression, copy number, mutation, and others (Reinhold, et al., 2012). The purpose of the CellMiner project (http://discover.nci.nih.gov/ cellminer) has been to integrate data from multiple platforms used to analyze the NCI-60 and to provide a powerful suite of tools for exploration of NCI-60 data.
Maintained by Augustin Luna. Last updated 5 months ago.
acghcellbasedassayscopynumbervariationgeneexpressionpharmacogenomicspharmacogeneticsmirnacheminformaticsvisualizationsoftwaresystemsbiology
6.7 match 5.71 score 113 scriptsbioc
MicrobiomeProfiler:An R/shiny package for microbiome functional enrichment analysis
This is an R/shiny package to perform functional enrichment analysis for microbiome data. This package was based on clusterProfiler. Moreover, MicrobiomeProfiler support KEGG enrichment analysis, COG enrichment analysis, Microbe-Disease association enrichment analysis, Metabo-Pathway analysis.
Maintained by Guangchuang Yu. Last updated 5 months ago.
microbiomesoftwarevisualizationkegg
5.6 match 38 stars 6.80 score 22 scriptsrudeboybert
fivethirtyeight:Data and Code Behind the Stories and Interactives at 'FiveThirtyEight'
Datasets and code published by the data journalism website 'FiveThirtyEight' available at <https://github.com/fivethirtyeight/data>. Note that while we received guidance from editors at 'FiveThirtyEight', this package is not officially published by 'FiveThirtyEight'.
Maintained by Albert Y. Kim. Last updated 2 years ago.
data-sciencedatajournalismfivethirtyeightstatistics
3.4 match 453 stars 10.98 score 1.7k scriptsr-dbi
DBI:R Database Interface
A database interface definition for communication between R and relational database management systems. All classes in this package are virtual and need to be extended by the various R/DBMS implementations.
Maintained by Kirill Müller. Last updated 14 days ago.
1.8 match 302 stars 20.87 score 19k scripts 2.9k dependentsbioc
Statial:A package to identify changes in cell state relative to spatial associations
Statial is a suite of functions for identifying changes in cell state. The functionality provided by Statial provides robust quantification of cell type localisation which are invariant to changes in tissue structure. In addition to this Statial uncovers changes in marker expression associated with varying levels of localisation. These features can be used to explore how the structure and function of different cell types may be altered by the agents they are surrounded with.
Maintained by Farhan Ameen. Last updated 5 months ago.
singlecellspatialclassificationsingle-cell
5.7 match 5 stars 6.49 score 23 scriptsbahlolab
UKB.COVID19:UK Biobank COVID-19 Data Processing and Risk Factor Association Tests
Process UK Biobank COVID-19 test result data for susceptibility, severity and mortality analyses, perform potential non-genetic COVID-19 risk factor and co-morbidity association tests. Wang et al. (2021) <doi:10.5281/zenodo.5174381>.
Maintained by Longfei Wang. Last updated 8 months ago.
9.2 match 1 stars 4.00 score 4 scriptsqiuanzhu
xlink:Genetic Association Models for X-Chromosome SNPS on Continuous, Binary and Survival Outcomes
The expression of X-chromosome undergoes three possible biological processes: X-chromosome inactivation (XCI), escape of the X-chromosome inactivation (XCI-E),and skewed X-chromosome inactivation (XCI-S). To analyze the X-linked genetic association for phenotype such as continuous, binary, and time-to-event outcomes with the actual process unknown, we propose a unified approach of maximizing the likelihood or partial likelihood over all of the potential biological processes. The methods are described in Wei Xu, Meiling Hao (2017) <doi:10.1002/gepi.22097>. And also see Dongxiao Han, Meiling Hao, Lianqiang Qu, Wei Xu (2019) <doi:10.1177/0962280219859037>.
Maintained by Yi Zhu. Last updated 6 years ago.
9.9 match 3.70 score 4 scriptscran
MASS:Support Functions and Datasets for Venables and Ripley's MASS
Functions and datasets to support Venables and Ripley, "Modern Applied Statistics with S" (4th edition, 2002).
Maintained by Brian Ripley. Last updated 1 months ago.
3.4 match 19 stars 10.64 score 11k dependentsmrcieu
mrbayes:Bayesian Summary Data Models for Mendelian Randomization Studies
Bayesian estimation of inverse variance weighted (IVW), Burgess et al. (2013) <doi:10.1002/gepi.21758>, and MR-Egger, Bowden et al. (2015) <doi:10.1093/ije/dyv080>, summary data models for Mendelian randomization analyses.
Maintained by Tom Palmer. Last updated 1 days ago.
6.5 match 4 stars 5.60 score 2 scriptsmoosa-r
rbioapi:User-Friendly R Interface to Biologic Web Services' API
Currently fully supports Enrichr, JASPAR, miEAA, PANTHER, Reactome, STRING, and UniProt! The goal of rbioapi is to provide a user-friendly and consistent interface to biological databases and services. In a way that insulates the user from the technicalities of using web services API and creates a unified and easy-to-use interface to biological and medical web services. This is an ongoing project; New databases and services will be added periodically. Feel free to suggest any databases or services you often use.
Maintained by Moosa Rezwani. Last updated 2 months ago.
api-clientbioinformaticsbiologyenrichmentenrichment-analysisenrichrjasparmieaaover-representation-analysispantherreactomestringuniprot
4.8 match 20 stars 7.60 score 55 scriptstheoreticalecology
sjSDM:Scalable Joint Species Distribution Modeling
A scalable and fast method for estimating joint Species Distribution Models (jSDMs) for big community data, including eDNA data. The package estimates a full (i.e. non-latent) jSDM with different response distributions (including the traditional multivariate probit model). The package allows to perform variation partitioning (VP) / ANOVA on the fitted models to separate the contribution of environmental, spatial, and biotic associations. In addition, the total R-squared can be further partitioned per species and site to reveal the internal metacommunity structure, see Leibold et al., <doi:10.1111/oik.08618>. The internal structure can then be regressed against environmental and spatial distinctiveness, richness, and traits to analyze metacommunity assembly processes. The package includes support for accounting for spatial autocorrelation and the option to fit responses using deep neural networks instead of a standard linear predictor. As described in Pichler & Hartig (2021) <doi:10.1111/2041-210X.13687>, scalability is achieved by using a Monte Carlo approximation of the joint likelihood implemented via 'PyTorch' and 'reticulate', which can be run on CPUs or GPUs.
Maintained by Maximilian Pichler. Last updated 1 months ago.
deep-learninggpu-accelerationmachine-learningspecies-distribution-modellingspecies-interactions
4.7 match 69 stars 7.64 score 70 scripts