R-universe search: targets

ropensci

targets:Dynamic Function-Oriented 'Make'-Like Declarative Pipelines

Pipeline tools coordinate the pieces of computationally demanding analysis projects. The 'targets' package is a 'Make'-like pipeline tool for statistics and data science in R. The package skips costly runtime for tasks that are already up to date, orchestrates the necessary computation with implicit parallel computing, and abstracts files as R objects. If all the current output matches the current upstream code and data, then the whole pipeline is up to date, and the results are more trustworthy than otherwise. The methodology in this package borrows from GNU 'Make' (2015, ISBN:978-9881443519) and 'drake' (2018, <doi:10.21105/joss.00550>).

Maintained by William Michael Landau. Last updated 14 hours ago.

data-science high-performance-computing make peer-reviewed pipeline r-targetopia reproducibility reproducible-research targets workflow

205.4 match 973 stars 15.20 score 4.6k scripts 22 dependents

ropensci

drake:A Pipeline Toolkit for Reproducible Computation at Scale

A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch, there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website <https://docs.ropensci.org/drake/> and the online manual <https://books.ropensci.org/drake/>.

Maintained by William Michael Landau. Last updated 3 months ago.

data-science drake high-performance-computing makefile peer-reviewed pipeline reproducibility reproducible-research ropensci workflow

59.2 match 1.3k stars 11.49 score 1.7k scripts 1 dependents

rstudio

gt:Easily Create Presentation-Ready Display Tables

Build display tables from tabular data with an easy-to-use set of functions. With its progressive approach, we can construct display tables with a cohesive set of table parts. Table values can be formatted using any of the included formatting functions. Footnotes and cell styles can be precisely added through a location targeting system. The way in which 'gt' handles things for you means that you don't often have to worry about the fine details.

Maintained by Richard Iannone. Last updated 10 days ago.

docx easy-to-use html latex rtf summary-tables

32.3 match 2.1k stars 18.36 score 20k scripts 112 dependents

ropensci

tarchetypes:Archetypes for Targets

Function-oriented Make-like declarative pipelines for Statistics and data science are supported in the 'targets' R package. As an extension to 'targets', the 'tarchetypes' package provides convenient user-side functions to make 'targets' easier to use. By establishing reusable archetypes for common kinds of targets and pipelines, these functions help express complicated reproducible pipelines concisely and compactly. The methods in this package were influenced by the 'targets' R package. by Will Landau (2018) <doi:10.21105/joss.00550>.

Maintained by William Michael Landau. Last updated 19 days ago.

data-science high-performance-computing peer-reviewed pipeline r-targetopia reproducibility targets workflow

50.6 match 141 stars 11.43 score 1.7k scripts 10 dependents

bioc

target:Predict Combined Function of Transcription Factors

Implement the BETA algorithm for infering direct target genes from DNA-binding and perturbation expression data Wang et al. (2013) <doi: 10.1038/nprot.2013.150>. Extend the algorithm to predict the combined function of two DNA-binding elements from comprable binding and expression data.

Maintained by Mahmoud Ahmed. Last updated 5 months ago.

software statisticalmethod transcription algorithm chip-seq dna-binding gene-regulation transcription-factors

65.6 match 4 stars 7.79 score 1.3k scripts

kkholst

targeted:Targeted Inference

Various methods for targeted and semiparametric inference including augmented inverse probability weighted (AIPW) estimators for missing data and causal inference (Bang and Robins (2005) <doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional average treatment effects (CATE) (van der Laan (2006) <doi:10.2202/1557-4679.1008>), estimators for risk differences and relative risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), assumption lean inference for generalized linear model parameters (Vansteelandt et al. (2022) <doi:10.1111/rssb.12504>).

Maintained by Klaus K. Holst. Last updated 1 months ago.

causal-inference double-robust estimation semiparametric-estimation statistics openblas cpp openmp

59.2 match 11 stars 7.20 score 30 scripts 1 dependents

bioc

crisprDesign:Comprehensive design of CRISPR gRNAs for nucleases and base editors

Provides a comprehensive suite of functions to design and annotate CRISPR guide RNA (gRNAs) sequences. This includes on- and off-target search, on-target efficiency scoring, off-target scoring, full gene and TSS contextual annotations, and SNP annotation (human only). It currently support five types of CRISPR modalities (modes of perturbations): CRISPR knockout, CRISPR activation, CRISPR inhibition, CRISPR base editing, and CRISPR knockdown. All types of CRISPR nucleases are supported, including DNA- and RNA-target nucleases such as Cas9, Cas12a, and Cas13d. All types of base editors are also supported. gRNA design can be performed on reference genomes, transcriptomes, and custom DNA and RNA sequences. Both unpaired and paired gRNA designs are enabled.

Maintained by Jean-Philippe Fortin. Last updated 10 days ago.

crispr functionalgenomics genetarget bioconductor bioconductor-package crispr-cas9 crispr-design crispr-target genomics-analysis grna grna-sequence grna-sequences sgrna sgrna-design

45.8 match 22 stars 8.28 score 80 scripts 3 dependents

rolkra

explore:Simplifies Exploratory Data Analysis

Interactive data exploration with one line of code, automated reporting or use an easy to remember set of tidy functions for low code exploratory data analysis.

Maintained by Roland Krasser. Last updated 3 months ago.

data-exploration data-visualisation decision-trees eda rmarkdown shiny tidy

31.7 match 228 stars 11.43 score 221 scripts 1 dependents

interstellar-consultation-services

covid19dbcand:Selected 'Drugbank' Drugs for COVID-19 Treatment Related Data in R Format

Provides different datasets parsed from 'Drugbank' <https://www.drugbank.ca/covid-19> database using 'dbparser' package. It is a smaller version from 'dbdataset' package. It contains only information about COVID-19 possible treatment.

Maintained by Mohammed Ali. Last updated 11 months ago.

dataset dbparser drugbank drugbank-database

78.0 match 3 stars 4.48 score 6 scripts

bioc

crisprScore:On-Target and Off-Target Scoring Algorithms for CRISPR gRNAs

Provides R wrappers of several on-target and off-target scoring methods for CRISPR guide RNAs (gRNAs). The following nucleases are supported: SpCas9, AsCas12a, enAsCas12a, and RfxCas13d (CasRx). The available on-target cutting efficiency scoring methods are RuleSet1, Azimuth, DeepHF, DeepCpf1, enPAM+GB, and CRISPRscan. Both the CFD and MIT scoring methods are available for off-target specificity prediction. The package also provides a Lindel-derived score to predict the probability of a gRNA to produce indels inducing a frameshift for the Cas9 nuclease. Note that DeepHF, DeepCpf1 and enPAM+GB are not available on Windows machines.

Maintained by Jean-Philippe Fortin. Last updated 5 months ago.

crispr functionalgenomics functionalprediction bioconductor bioconductor-package crispr-cas9 crispr-design crispr-target genomics grna grna-sequence grna-sequences scoring-algorithm sgrna sgrna-design

43.4 match 16 stars 7.52 score 19 scripts 4 dependents

tvganesh

QCSimulator:5 Qubit Quantum Computing Simulator

This package simulates a 5 qubit Quantum Computer.

Maintained by Tinniam V Ganesh. Last updated 9 years ago.

67.5 match 5 stars 4.20 score 64 scripts

bioc

systemPipeR:systemPipeR: Workflow Environment for Data Analysis and Report Generation

systemPipeR is a multipurpose data analysis workflow environment that unifies R with command-line tools. It enables scientists to analyze many types of large- or small-scale data on local or distributed computer systems with a high level of reproducibility, scalability and portability. At its core is a command-line interface (CLI) that adopts the Common Workflow Language (CWL). This design allows users to choose for each analysis step the optimal R or command-line software. It supports both end-to-end and partial execution of workflows with built-in restart functionalities. Efficient management of complex analysis tasks is accomplished by a flexible workflow control container class. Handling of large numbers of input samples and experimental designs is facilitated by consistent sample annotation mechanisms. As a multi-purpose workflow toolkit, systemPipeR enables users to run existing workflows, customize them or design entirely new ones while taking advantage of widely adopted data structures within the Bioconductor ecosystem. Another important core functionality is the generation of reproducible scientific analysis and technical reports. For result interpretation, systemPipeR offers a wide range of plotting functionality, while an associated Shiny App offers many useful functionalities for interactive result exploration. The vignettes linked from this page include (1) a general introduction, (2) a description of technical details, and (3) a collection of workflow templates.

Maintained by Thomas Girke. Last updated 5 months ago.

genetics infrastructure dataimport sequencing rnaseq riboseq chipseq methylseq snp geneexpression coverage genesetenrichment alignment qualitycontrol immunooncology reportwriting workflowstep workflowmanagement

24.1 match 53 stars 11.56 score 344 scripts 3 dependents

bioc

bioassayR:Cross-target analysis of small molecule bioactivity

bioassayR is a computational tool that enables simultaneous analysis of thousands of bioassay experiments performed over a diverse set of compounds and biological targets. Unique features include support for large-scale cross-target analyses of both public and custom bioassays, generation of high throughput screening fingerprints (HTSFPs), and an optional preloaded database that provides access to a substantial portion of publicly available bioactivity data.

Maintained by Thomas Girke. Last updated 5 months ago.

immunooncology microtitreplateassay cellbasedassays visualization infrastructure dataimport bioinformatics proteomics metabolomics

34.1 match 5 stars 6.70 score 46 scripts

ropensci

geotargets:'Targets' Extensions for Geographic Spatial Formats

Provides extensions for various geographic spatial file formats, such as shape files and rasters. Currently provides support for the 'terra' geographic spatial formats. See the vignettes for worked examples, demonstrations, and explanations of how to use the various package extensions.

Maintained by Nicholas Tierney. Last updated 2 days ago.

geospatial pipeline r-targetopia raster reproducibility reproducible-research targets vector workflow

31.1 match 72 stars 6.78 score

bioc

CRISPRseek:Design of guide RNAs in CRISPR genome-editing systems

The package encompasses functions to find potential guide RNAs for the CRISPR-based genome-editing systems including the Base Editors and the Prime Editors when supplied with target sequences as input. Users have the flexibility to filter resulting guide RNAs based on parameters such as the absence of restriction enzyme cut sites or the lack of paired guide RNAs. The package also facilitates genome-wide exploration for off-targets, offering features to score and rank off-targets, retrieve flanking sequences, and indicate whether the hits are located within exon regions. All detected guide RNAs are annotated with the cumulative scores of the top5 and topN off-targets together with the detailed information such as mismatch sites and restrictuion enzyme cut sites. The package also outputs INDELs and their frequencies for Cas9 targeted sites.

Maintained by Lihua Julie Zhu. Last updated 5 days ago.

immunooncology generegulation sequencematching crispr

27.4 match 7.18 score 51 scripts 2 dependents

prioritizr

prioritizr:Systematic Conservation Prioritization in R

Systematic conservation prioritization using mixed integer linear programming (MILP). It provides a flexible interface for building and solving conservation planning problems. Once built, conservation planning problems can be solved using a variety of commercial and open-source exact algorithm solvers. By using exact algorithm solvers, solutions can be generated that are guaranteed to be optimal (or within a pre-specified optimality gap). Furthermore, conservation problems can be constructed to optimize the spatial allocation of different management actions or zones, meaning that conservation practitioners can identify solutions that benefit multiple stakeholders. To solve large-scale or complex conservation planning problems, users should install the Gurobi optimization software (available from <https://www.gurobi.com/>) and the 'gurobi' R package (see Gurobi Installation Guide vignette for details). Users can also install the IBM CPLEX software (<https://www.ibm.com/products/ilog-cplex-optimization-studio/cplex-optimizer>) and the 'cplexAPI' R package (available at <https://github.com/cran/cplexAPI>). Additionally, the 'rcbc' R package (available at <https://github.com/dirkschumacher/rcbc>) can be used to generate solutions using the CBC optimization software (<https://github.com/coin-or/Cbc>). For further details, see Hanson et al. (2025) <doi:10.1111/cobi.14376>.

Maintained by Richard Schuster. Last updated 10 days ago.

biodiversity conservation conservation-planner optimization prioritization solver spatial cpp

16.6 match 124 stars 11.82 score 584 scripts 2 dependents

bioc

RCy3:Functions to Access and Control Cytoscape

Vizualize, analyze and explore networks using Cytoscape via R. Anything you can do using the graphical user interface of Cytoscape, you can now do with a single RCy3 function.

Maintained by Alex Pico. Last updated 5 months ago.

visualization graphandnetwork thirdpartyclient network

14.1 match 52 stars 13.39 score 628 scripts 15 dependents

rstudio

keras3:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.

Maintained by Tomasz Kalinowski. Last updated 3 days ago.

13.8 match 845 stars 13.57 score 264 scripts 2 dependents

bioc

miRLAB:Dry lab for exploring miRNA-mRNA relationships

Provide tools exploring miRNA-mRNA relationships, including popular miRNA target prediction methods, ensemble methods that integrate individual methods, functions to get data from online resources, functions to validate the results, and functions to conduct enrichment analyses.

Maintained by Thuc Duy Le. Last updated 5 months ago.

mirna geneexpression networkinference network

39.5 match 4.72 score 11 scripts

usaid-oha-si

tameDP:Import targets and PLHIV data from COP Target Setting Tool (formerly Data Pack)

Import PSNUxIM targets and PLHIV data from COP Data Pack. The purpose is to make the data tidy and more usable than their current structure in the Excel data packs.

Maintained by Aaron Chafetz. Last updated 1 years ago.

34.4 match 1 stars 4.92 score 46 scripts

owp-spatial

reference.fabric:Hydrological Reference Fabric Tools

Development tools and `targets` pipeline for generating a national hydrological geospatial reference fabric.

Maintained by Justin Singh-Mohudpur. Last updated 2 months ago.

56.0 match 1 stars 3.00 score

winvector

vtreat:A Statistically Sound 'data.frame' Processor/Conditioner

A 'data.frame' processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. 'vtreat' prepares variables so that data has fewer exceptional cases, making it easier to safely use models in production. Common problems 'vtreat' defends against: 'Inf', 'NA', too many categorical levels, rare categorical levels, and new categorical levels (levels seen during application, but not during training). Reference: "'vtreat': a data.frame Processor for Predictive Modeling", Zumel, Mount, 2016, <DOI:10.5281/zenodo.1173313>.

Maintained by John Mount. Last updated 2 months ago.

categorical-variables machine-learning-algorithms nested-models prepare-data

14.8 match 285 stars 11.19 score 328 scripts 1 dependents

mlr-org

mlr3pipelines:Preprocessing Operators and Pipelines for 'mlr3'

Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned.

Maintained by Martin Binder. Last updated 7 days ago.

bagging data-science dataflow-programming ensemble-learning machine-learning mlr3 pipelines preprocessing stacking

13.0 match 141 stars 12.36 score 448 scripts 7 dependents

t-kalinowski

keras:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.

Maintained by Tomasz Kalinowski. Last updated 11 months ago.

14.1 match 10.82 score 10k scripts 54 dependents

flr

FLasher:Projection and Forecasting of Fish Populations, Stocks and Fleets

Projection of future population and fishery dynamics is carried out for a given set of management targets. A system of equations is solved, using Automatic Differentation (AD), for the levels of effort by fishery (fleet) that will result in the required abundances, catches or fishing mortalities.

Maintained by Iago Mosqueira. Last updated 8 days ago.

forecast fisheries flr cpp

22.0 match 2 stars 6.86 score 254 scripts 6 dependents

bioc

OmnipathR:OmniPath web service client and more

A client for the OmniPath web service (https://www.omnipathdb.org) and many other resources. It also includes functions to transform and pretty print some of the downloaded data, functions to access a number of other resources such as BioPlex, ConsensusPathDB, EVEX, Gene Ontology, Guide to Pharmacology (IUPHAR/BPS), Harmonizome, HTRIdb, Human Phenotype Ontology, InWeb InBioMap, KEGG Pathway, Pathway Commons, Ramilowski et al. 2015, RegNetwork, ReMap, TF census, TRRUST and Vinayagam et al. 2011. Furthermore, OmnipathR features a close integration with the NicheNet method for ligand activity prediction from transcriptomics data, and its R implementation `nichenetr` (available only on github).

Maintained by Denes Turei. Last updated 17 days ago.

graphandnetwork network pathways software thirdpartyclient dataimport datarepresentation genesignaling generegulation systemsbiology transcriptomics singlecell annotation kegg complexes enzyme-ptm networks networks-biology omnipath proteins quarto

14.4 match 126 stars 9.90 score 226 scripts 2 dependents

choonghyunryu

dlookr:Tools for Data Diagnosis, Exploration, Transformation

A collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between the target variable and predictor. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. And it creates automated reports that support these three tasks.

Maintained by Choonghyun Ryu. Last updated 9 months ago.

12.4 match 212 stars 11.05 score 748 scripts 2 dependents

bioc

crisprViz:Visualization Functions for CRISPR gRNAs

Provides functionalities to visualize and contextualize CRISPR guide RNAs (gRNAs) on genomic tracks across nucleases and applications. Works in conjunction with the crisprBase and crisprDesign Bioconductor packages. Plots are produced using the Gviz framework.

Maintained by Jean-Philippe Fortin. Last updated 5 months ago.

crispr functionalgenomics genetarget bioconductor bioconductor-package crispr-analysis crispr-design grna grna-sequence grna-sequences sgrna sgrna-design visualization

20.0 match 7 stars 6.23 score 6 scripts 2 dependents

pik-piam

mrremind:MadRat REMIND Input Data Package

The mrremind packages contains data preprocessing for the REMIND model.

Maintained by Lavinia Baumstark. Last updated 2 days ago.

19.8 match 4 stars 6.25 score 15 scripts 1 dependents

jeffreyhanson

raptr:Representative and Adequate Prioritization Toolkit in R

Biodiversity is in crisis. The overarching aim of conservation is to preserve biodiversity patterns and processes. To this end, protected areas are established to buffer species and preserve biodiversity processes. But resources are limited and so protected areas must be cost-effective. This package contains tools to generate plans for protected areas (prioritizations), using spatially explicit targets for biodiversity patterns and processes. To obtain solutions in a feasible amount of time, this package uses the commercial 'Gurobi' software (obtained from <https://www.gurobi.com/>). For more information on using this package, see Hanson et al. (2018) <doi:10.1111/2041-210X.12862>.

Maintained by Jeffrey O Hanson. Last updated 1 years ago.

cpp

22.0 match 8 stars 5.52 score 83 scripts

bioc

genomation:Summary, annotation and visualization of genomic data

A package for summary and annotation of genomic intervals. Users can visualize and quantify genomic intervals over pre-defined functional regions, such as promoters, exons, introns, etc. The genomic intervals represent regions with a defined chromosome position, which may be associated with a score, such as aligned reads from HT-seq experiments, TF binding sites, methylation scores, etc. The package can use any tabular genomic feature data as long as it has minimal information on the locations of genomic intervals. In addition, It can use BAM or BigWig files as input.

Maintained by Altuna Akalin. Last updated 5 months ago.

annotation sequencing visualization cpgisland cpp

10.9 match 75 stars 11.09 score 738 scripts 5 dependents

ropensci

stantargets:Targets for Stan Workflows

Bayesian data analysis usually incurs long runtimes and cumbersome custom code. A pipeline toolkit tailored to Bayesian statisticians, the 'stantargets' R package leverages 'targets' and 'cmdstanr' to ease these burdens. 'stantargets' makes it super easy to set up scalable Stan pipelines that automatically parallelize the computation and skip expensive steps when the results are already up to date. Minimal custom code is required, and there is no need to manually configure branching, so usage is much easier than 'targets' alone. 'stantargets' can access all of 'cmdstanr''s major algorithms (MCMC, variational Bayes, and optimization) and it supports both single-fit workflows and multi-rep simulation studies. For the statistical methodology, please refer to 'Stan' documentation (Stan Development Team 2020) <https://mc-stan.org/>.

Maintained by William Michael Landau. Last updated 1 months ago.

bayesian high-performance-computing make r-targetopia reproducibility stan statistics targets

17.5 match 49 stars 6.85 score 180 scripts

bioc

IntramiRExploreR:Predicting Targets for Drosophila Intragenic miRNAs

Intra-miR-ExploreR, an integrative miRNA target prediction bioinformatics tool, identifies targets combining expression and biophysical interactions of a given microRNA (miR). Using the tool, we have identified targets for 92 intragenic miRs in D. melanogaster, using available microarray expression data, from Affymetrix 1 and Affymetrix2 microarray array platforms, providing a global perspective of intragenic miR targets in Drosophila. Predicted targets are grouped according to biological functions using the DAVID Gene Ontology tool and are ranked based on a biologically relevant scoring system, enabling the user to identify functionally relevant targets for a given miR.

Maintained by Surajit Bhattacharya. Last updated 5 months ago.

software microarray genetarget statisticalmethod geneexpression geneprediction

25.0 match 4.60 score 4 scripts

bioc

crisprBase:Base functions and classes for CRISPR gRNA design

Provides S4 classes for general nucleases, CRISPR nucleases, CRISPR nickases, and base editors.Several CRISPR-specific genome arithmetic functions are implemented to help extract genomic coordinates of spacer and protospacer sequences. Commonly-used CRISPR nuclease objects are provided that can be readily used in other packages. Both DNA- and RNA-targeting nucleases are supported.

Maintained by Jean-Philippe Fortin. Last updated 5 months ago.

crispr functionalgenomics bioconductor bioconductor-package crispr-cas9 crispr-design crispr-target grna grna-sequence grna-sequences

15.9 match 5 stars 7.15 score 52 scripts 6 dependents

ropensci

jagstargets:Targets for JAGS Pipelines

Bayesian data analysis usually incurs long runtimes and cumbersome custom code. A pipeline toolkit tailored to Bayesian statisticians, the 'jagstargets' R package is leverages 'targets' and 'R2jags' to ease this burden. 'jagstargets' makes it super easy to set up scalable JAGS pipelines that automatically parallelize the computation and skip expensive steps when the results are already up to date. Minimal custom code is required, and there is no need to manually configure branching, so usage is much easier than 'targets' alone. For the underlying methodology, please refer to the documentation of 'targets' <doi:10.21105/joss.02959> and 'JAGS' (Plummer 2003) <https://www.r-project.org/conferences/DSC-2003/Proceedings/Plummer.pdf>.

Maintained by William Michael Landau. Last updated 3 months ago.

bayesian high-performance-computing jags make r-targetopia reproducibility rjags statistics targets cpp

16.1 match 10 stars 7.01 score 32 scripts

tlverse

tmle3:The Extensible TMLE Framework

A general framework supporting the implementation of targeted maximum likelihood estimators (TMLEs) of a diverse range of statistical target parameters through a unified interface. The goal is that the exposed framework be as general as the mathematical framework upon which it draws.

Maintained by Jeremy Coyle. Last updated 4 months ago.

causal-inference machine-learning targeted-learning variable-importance

13.8 match 38 stars 7.91 score 286 scripts 5 dependents

tlverse

tmle3shift:Targeted Learning of the Causal Effects of Stochastic Interventions

Targeted maximum likelihood estimation (TMLE) of population-level causal effects under stochastic treatment regimes and related nonparametric variable importance analyses. Tools are provided for TML estimation of the counterfactual mean under a stochastic intervention characterized as a modified treatment policy, such as treatment policies that shift the natural value of the exposure. The causal parameter and estimation were described in Díaz and van der Laan (2013) <doi:10.1111/j.1541-0420.2011.01685.x> and an improved estimation approach was given by Díaz and van der Laan (2018) <doi:10.1007/978-3-319-65304-4_14>.

Maintained by Nima Hejazi. Last updated 6 months ago.

causal-inference machine-learning marginal-structural-models stochastic-interventions targeted-learning treatment-effects variable-importance

20.4 match 17 stars 5.33 score 42 scripts 1 dependents

ropensci

gittargets:Data Version Control for the Targets Package

In computationally demanding data analysis pipelines, the 'targets' R package (2021, <doi:10.21105/joss.02959>) maintains an up-to-date set of results while skipping tasks that do not need to rerun. This process increases speed and increases trust in the final end product. However, it also overwrites old output with new output, and past results disappear by default. To preserve historical output, the 'gittargets' package captures version-controlled snapshots of the data store, and each snapshot links to the underlying commit of the source code. That way, when the user rolls back the code to a previous branch or commit, 'gittargets' can recover the data contemporaneous with that commit so that all targets remain up to date.

Maintained by William Michael Landau. Last updated 8 months ago.

data-science data-version-control data-versioning reproducibility reproducible-research targets workflow

17.4 match 88 stars 5.99 score 11 scripts

bioc

TEQC:Quality control for target capture experiments

Target capture experiments combine hybridization-based (in solution or on microarrays) capture and enrichment of genomic regions of interest (e.g. the exome) with high throughput sequencing of the captured DNA fragments. This package provides functionalities for assessing and visualizing the quality of the target enrichment process, like specificity and sensitivity of the capture, per-target read coverage and so on.

Maintained by Sarah Bonnin. Last updated 5 months ago.

qualitycontrol microarray sequencing genetics

24.0 match 4.30 score 8 scripts

daranzolin

sqltargets:'Targets' Extension for 'SQL' Queries

Provides an extension for 'SQL' queries as separate file within 'targets' pipelines. The shorthand creates two targets, the query file and the query result.

Maintained by David Ranzolin. Last updated 5 months ago.

pipeline sql targets workflow

18.0 match 39 stars 5.72 score 18 scripts

tbep-tech

tbeptools:Data and Indicators for the Tampa Bay Estuary Program

Several functions are provided for working with Tampa Bay Estuary Program data and indicators, including the water quality report card, tidal creek assessments, Tampa Bay Nekton Index, Tampa Bay Benthic Index, seagrass transect data, habitat report card, and fecal indicator bacteria. Additional functions are provided for miscellaneous tasks, such as reference library curation.

Maintained by Marcus Beck. Last updated 8 days ago.

data-analysis tampa-bay tbep water-quality

13.0 match 10 stars 7.86 score 133 scripts

bioc

limma:Linear Models for Microarray and Omics Data

Data analysis, linear models and differential expression for omics data.

Maintained by Gordon Smyth. Last updated 4 days ago.

exonarray geneexpression transcription alternativesplicing differentialexpression differentialsplicing genesetenrichment dataimport bayesian clustering regression timecourse microarray micrornaarray mrnamicroarray onechannel proprietaryplatforms twochannel sequencing rnaseq batcheffect multiplecomparison normalization preprocessing qualitycontrol biomedicalinformatics cellbiology cheminformatics epigenetics functionalgenomics genetics immunooncology metabolomics proteomics systemsbiology transcriptomics

7.4 match 13.81 score 16k scripts 585 dependents

afialkowski

SimMultiCorrData:Simulation of Correlated Data with Multiple Variable Types

Generate continuous (normal or non-normal), binary, ordinal, and count (Poisson or Negative Binomial) variables with a specified correlation matrix. It can also produce a single continuous variable. This package can be used to simulate data sets that mimic real-world situations (i.e. clinical or genetic data sets, plasmodes). All variables are generated from standard normal variables with an imposed intermediate correlation matrix. Continuous variables are simulated by specifying mean, variance, skewness, standardized kurtosis, and fifth and sixth standardized cumulants using either Fleishman's third-order (<DOI:10.1007/BF02293811>) or Headrick's fifth-order (<DOI:10.1016/S0167-9473(02)00072-5>) polynomial transformation. Binary and ordinal variables are simulated using a modification of the ordsample() function from 'GenOrd'. Count variables are simulated using the inverse cdf method. There are two simulation pathways which differ primarily according to the calculation of the intermediate correlation matrix. In Correlation Method 1, the intercorrelations involving count variables are determined using a simulation based, logarithmic correlation correction (adapting Yahav and Shmueli's 2012 method, <DOI:10.1002/asmb.901>). In Correlation Method 2, the count variables are treated as ordinal (adapting Barbiero and Ferrari's 2015 modification of GenOrd, <DOI:10.1002/asmb.2072>). There is an optional error loop that corrects the final correlation matrix to be within a user-specified precision value of the target matrix. The package also includes functions to calculate standardized cumulants for theoretical distributions or from real data sets, check if a target correlation matrix is within the possible correlation bounds (given the distributions of the simulated variables), summarize results (numerically or graphically), to verify valid power method pdfs, and to calculate lower standardized kurtosis bounds.

Maintained by Allison Cynthia Fialkowski. Last updated 7 years ago.

13.4 match 12 stars 7.58 score 44 scripts 6 dependents

statnet

ergm:Fit, Simulate and Diagnose Exponential-Family Models for Networks

An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) <doi:10.18637/jss.v024.i03> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.

Maintained by Pavel N. Krivitsky. Last updated 5 days ago.

6.4 match 100 stars 15.36 score 1.4k scripts 36 dependents

bioc

crisprShiny:Exploring curated CRISPR gRNAs via Shiny

Provides means to interactively visualize guide RNAs (gRNAs) in GuideSet objects via Shiny application. This GUI can be self-contained or as a module within a larger Shiny app. The content of the app reflects the annotations present in the passed GuideSet object, and includes intuitive tools to examine, filter, and export gRNAs, thereby making gRNA design more user-friendly.

Maintained by Jean-Philippe Fortin. Last updated 5 months ago.

crispr functionalgenomics genetarget gui crispr-analysis crispr-design shiny

20.9 match 2 stars 4.48 score 8 scripts

iohprofiler

IOHanalyzer:Data Analysis Part of 'IOHprofiler'

The data analysis module for the Iterative Optimization Heuristics Profiler ('IOHprofiler'). This module provides statistical analysis methods for the benchmark data generated by optimization heuristics, which can be visualized through a web-based interface. The benchmark data is usually generated by the experimentation module, called 'IOHexperimenter'. 'IOHanalyzer' also supports the widely used 'COCO' (Comparing Continuous Optimisers) data format for benchmarking.

Maintained by Diederick Vermetten. Last updated 10 months ago.

cpp

18.1 match 24 stars 5.10 score 13 scripts

azure

azuremlsdk:Interface to the 'Azure Machine Learning' 'SDK'

Interface to the 'Azure Machine Learning' Software Development Kit ('SDK'). Data scientists can use the 'SDK' to train, deploy, automate, and manage machine learning models on the 'Azure Machine Learning' service. To learn more about 'Azure Machine Learning' visit the website: <https://docs.microsoft.com/en-us/azure/machine-learning/service/overview-what-is-azure-ml>.

Maintained by Diondra Peck. Last updated 3 years ago.

amlcompute azure azure-machine-learning azureml dsi machine-learning rstudio sdk-r

10.3 match 106 stars 8.91 score 221 scripts

dimitri-justeau

rflsgen:Neutral Landscape Generator with Targets on Landscape Indices

Interface to the 'flsgen' neutral landscape generator <https://github.com/dimitri-justeau/flsgen>. It allows to - Generate fractal terrain; - Generate landscape structures satisfying user targets over landscape indices; - Generate landscape raster from landscape structures.

Maintained by Dimitri Justeau-Allaire. Last updated 10 months ago.

openjdk

16.7 match 12 stars 5.48 score 6 scripts

dovinij

GxEprs:Genotype-by-Environment Interaction in Polygenic Score Models

A novel PRS model is introduced to enhance the prediction accuracy by utilising GxE effects. This package performs Genome Wide Association Studies (GWAS) and Genome Wide Environment Interaction Studies (GWEIS) using a discovery dataset. The package has the ability to obtain polygenic risk scores (PRSs) for a target sample. Finally it predicts the risk values of each individual in the target sample. Users have the choice of using existing models (Li et al., 2015) <doi:10.1093/annonc/mdu565>, (Pandis et al., 2013) <doi:10.1093/ejo/cjt054>, (Peyrot et al., 2018) <doi:10.1016/j.biopsych.2017.09.009> and (Song et al., 2022) <doi:10.1038/s41467-022-32407-9>, as well as newly proposed models for genomic risk prediction (refer to the URL for more details).

Maintained by Dovini Jayasinghe. Last updated 10 months ago.

27.5 match 2 stars 3.30 score

rickhelmus

patRoon:Workflows for Mass-Spectrometry Based Non-Target Analysis

Provides an easy-to-use interface to a mass spectrometry based non-target analysis workflow. Various (open-source) tools are combined which provide algorithms for extraction and grouping of features, extraction of MS and MS/MS data, automatic formula and compound annotation and grouping related features to components. In addition, various tools are provided for e.g. data preparation and cleanup, plotting results and automatic reporting.

Maintained by Rick Helmus. Last updated 9 days ago.

mass-spectrometry non-target cpp openjdk

14.6 match 65 stars 6.22 score 43 scripts

sdctools

sdcMicro:Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation

Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. The theoretical basis for the methods implemented can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface published in Meindl and Templ (2019) <doi:10.3390/a12090191> that allows to use various methods of this package.

Maintained by Matthias Templ. Last updated 25 days ago.

cpp

9.1 match 83 stars 9.89 score 258 scripts

yufree

pmd:Paired Mass Distance Analysis for GC/LC-MS Based Non-Targeted Analysis and Reactomics Analysis

Paired mass distance (PMD) analysis proposed in Yu, Olkowicz and Pawliszyn (2018) <doi:10.1016/j.aca.2018.10.062> and PMD based reactomics analysis proposed in Yu and Petrick (2020) <doi:10.1038/s42004-020-00403-z> for gas/liquid chromatography–mass spectrometry (GC/LC-MS) based non-targeted analysis. PMD analysis including GlobalStd algorithm and structure/reaction directed analysis. GlobalStd algorithm could found independent peaks in m/z-retention time profiles based on retention time hierarchical cluster analysis and frequency analysis of paired mass distances within retention time groups. Structure directed analysis could be used to find potential relationship among those independent peaks in different retention time groups based on frequency of paired mass distances. Reactomics analysis could also be performed to build PMD network, assign sources and make biomarker reaction discovery. GUIs for PMD analysis is also included as 'shiny' applications.

Maintained by Miao YU. Last updated 2 months ago.

mass-spectrometry metabolomics non-target

13.4 match 10 stars 6.68 score 40 scripts

civisanalytics

civis:R Client for the 'Civis Platform API'

A convenient interface for making requests directly to the 'Civis Platform API' <https://www.civisanalytics.com/platform/>. Full documentation available 'here' <https://civisanalytics.github.io/civis-r/>.

Maintained by Peter Cooman. Last updated 2 months ago.

11.4 match 16 stars 7.84 score 144 scripts

smbc-nzp

MigConnectivity:Estimate Migratory Connectivity for Migratory Animals

Allows the user to estimate transition probabilities for migratory animals between any two phases of the annual cycle, using a variety of different data types. Also quantifies the strength of migratory connectivity (MC), a standardized metric to quantify the extent to which populations co-occur between two phases of the annual cycle. Includes functions to estimate MC and the more traditional metric of migratory connectivity strength (Mantel correlation) incorporating uncertainty from multiple sources of sampling error. For cross-species comparisons, methods are provided to estimate differences in migratory connectivity strength, incorporating uncertainty. See Cohen et al. (2018) <doi:10.1111/2041-210X.12916>, Cohen et al. (2019) <doi:10.1111/ecog.03974>, and Roberts et al. (2023) <doi:10.1002/eap.2788> for details on some of these methods.

Maintained by Jeffrey A. Hostetler. Last updated 12 months ago.

jags cpp

13.1 match 8 stars 6.77 score 41 scripts

bioc

DECIPHER:Tools for curating, analyzing, and manipulating biological sequences

A toolset for deciphering and managing biological sequences.

Maintained by Erik Wright. Last updated 4 days ago.

clustering genetics sequencing dataimport visualization microarray qualitycontrol qpcr alignment wholegenome microbiome immunooncology geneprediction openmp

10.3 match 8.40 score 1.1k scripts 14 dependents

prodriguezsosa

conText:'a la Carte' on Text (ConText) Embedding Regression

A fast, flexible and transparent framework to estimate context-specific word and short document embeddings using the 'a la carte' embeddings approach developed by Khodak et al. (2018) <arXiv:1805.05388> and evaluate hypotheses about covariate effects on embeddings using the regression framework developed by Rodriguez et al. (2021)<https://github.com/prodriguezsosa/EmbeddingRegression>.

Maintained by Pedro L. Rodriguez. Last updated 11 months ago.

9.2 match 104 stars 9.40 score 1.7k scripts

dpc10ster

RJafroc:Artificial Intelligence Systems and Observer Performance

Analyzing the performance of artificial intelligence (AI) systems/algorithms characterized by a 'search-and-report' strategy. Historically observer performance has dealt with measuring radiologists' performances in search tasks, e.g., searching for lesions in medical images and reporting them, but the implicit location information has been ignored. The implemented methods apply to analyzing the absolute and relative performances of AI systems, comparing AI performance to a group of human readers or optimizing the reporting threshold of an AI system. In addition to performing historical receiver operating receiver operating characteristic (ROC) analysis (localization information ignored), the software also performs free-response receiver operating characteristic (FROC) analysis, where lesion localization information is used. A book using the software has been published: Chakraborty DP: Observer Performance Methods for Diagnostic Imaging - Foundations, Modeling, and Applications with R-Based Examples, Taylor-Francis LLC; 2017: <https://www.routledge.com/Observer-Performance-Methods-for-Diagnostic-Imaging-Foundations-Modeling/Chakraborty/p/book/9781482214840>. Online updates to this book, which use the software, are at <https://dpc10ster.github.io/RJafrocQuickStart/>, <https://dpc10ster.github.io/RJafrocRocBook/> and at <https://dpc10ster.github.io/RJafrocFrocBook/>. Supported data collection paradigms are the ROC, FROC and the location ROC (LROC). ROC data consists of single ratings per images, where a rating is the perceived confidence level that the image is that of a diseased patient. An ROC curve is a plot of true positive fraction vs. false positive fraction. FROC data consists of a variable number (zero or more) of mark-rating pairs per image, where a mark is the location of a reported suspicious region and the rating is the confidence level that it is a real lesion. LROC data consists of a rating and a location of the most suspicious region, for every image. Four models of observer performance, and curve-fitting software, are implemented: the binormal model (BM), the contaminated binormal model (CBM), the correlated contaminated binormal model (CORCBM), and the radiological search model (RSM). Unlike the binormal model, CBM, CORCBM and RSM predict 'proper' ROC curves that do not inappropriately cross the chance diagonal. Additionally, RSM parameters are related to search performance (not measured in conventional ROC analysis) and classification performance. Search performance refers to finding lesions, i.e., true positives, while simultaneously not finding false positive locations. Classification performance measures the ability to distinguish between true and false positive locations. Knowing these separate performances allows principled optimization of reader or AI system performance. This package supersedes Windows JAFROC (jackknife alternative FROC) software V4.2.1, <https://github.com/dpc10ster/WindowsJafroc>. Package functions are organized as follows. Data file related function names are preceded by 'Df', curve fitting functions by 'Fit', included data sets by 'dataset', plotting functions by 'Plot', significance testing functions by 'St', sample size related functions by 'Ss', data simulation functions by 'Simulate' and utility functions by 'Util'. Implemented are figures of merit (FOMs) for quantifying performance and functions for visualizing empirical or fitted operating characteristics: e.g., ROC, FROC, alternative FROC (AFROC) and weighted AFROC (wAFROC) curves. For fully crossed study designs significance testing of reader-averaged FOM differences between modalities is implemented via either Dorfman-Berbaum-Metz or the Obuchowski-Rockette methods. Also implemented is single modality analysis, which allows comparison of performance of a group of radiologists to a specified value, or comparison of AI to a group of radiologists interpreting the same cases. Crossed-modality analysis is implemented wherein there are two crossed modality factors and the aim is to determined performance in each modality factor averaged over all levels of the second factor. Sample size estimation tools are provided for ROC and FROC studies; these use estimates of the relevant variances from a pilot study to predict required numbers of readers and cases in a pivotal study to achieve the desired power. Utility and data file manipulation functions allow data to be read in any of the currently used input formats, including Excel, and the results of the analysis can be viewed in text or Excel output files. The methods are illustrated with several included datasets from the author's collaborations. This update includes improvements to the code, some as a result of user-reported bugs and new feature requests, and others discovered during ongoing testing and code simplification.

Maintained by Dev Chakraborty. Last updated 5 months ago.

ai-optimization artificial-intelligence-algorithms computer-aided-diagnosis froc-analysis roc-analysis target-classification target-localization cpp

15.0 match 19 stars 5.69 score 65 scripts

yelleknek

MBESS:The MBESS R Package

Implements methods that are useful in designing research studies and analyzing data, with particular emphasis on methods that are developed for or used within the behavioral, educational, and social sciences (broadly defined). That being said, many of the methods implemented within MBESS are applicable to a wide variety of disciplines. MBESS has a suite of functions for a variety of related topics, such as effect sizes, confidence intervals for effect sizes (including standardized effect sizes and noncentral effect sizes), sample size planning (from the accuracy in parameter estimation [AIPE], power analytic, equivalence, and minimum-risk point estimation perspectives), mediation analysis, various properties of distributions, and a variety of utility functions. MBESS (pronounced 'em-bes') was originally an acronym for 'Methods for the Behavioral, Educational, and Social Sciences,' but MBESS became more general and now contains methods applicable and used in a wide variety of fields and is an orphan acronym, in the sense that what was an acronym is now literally its name. MBESS has greatly benefited from others, see <https://www3.nd.edu/~kkelley/site/MBESS.html> for a detailed list of those that have contributed and other details.

Maintained by Ken Kelley. Last updated 1 years ago.

10.3 match 2 stars 8.21 score 274 scripts 23 dependents

bioc

multiMiR:Integration of multiple microRNA-target databases with their disease and drug associations

A collection of microRNAs/targets from external resources, including validated microRNA-target databases (miRecords, miRTarBase and TarBase), predicted microRNA-target databases (DIANA-microT, ElMMo, MicroCosm, miRanda, miRDB, PicTar, PITA and TargetScan) and microRNA-disease/drug databases (miR2Disease, Pharmaco-miR VerSe and PhenomiR).

Maintained by Spencer Mahaffey. Last updated 5 months ago.

mirnadata homo_sapiens_data mus_musculus_data rattus_norvegicus_data organismdata microrna-sequence sql

10.0 match 20 stars 8.45 score 141 scripts

bioc

XNAString:Efficient Manipulation of Modified Oligonucleotide Sequences

The XNAString package allows for description of base sequences and associated chemical modifications in a single object. XNAString is able to capture single stranded, as well as double stranded molecules. Chemical modifications are represented as independent strings associated with different features of the molecules (base sequence, sugar sequence, backbone sequence, modifications) and can be read or written to a HELM notation. It also enables secondary structure prediction using RNAfold from ViennaRNA. XNAString is designed to be efficient representation of nucleic-acid based therapeutics, therefore it stores information about target sequences and provides interface for matching and alignment functions from Biostrings and pwalign packages.

Maintained by Marianna Plucinska. Last updated 5 months ago.

sequencematching alignment sequencing genetics cpp

20.0 match 4.18 score 4 scripts

bentaylor1

lgcp:Log-Gaussian Cox Process

Spatial and spatio-temporal modelling of point patterns using the log-Gaussian Cox process. Bayesian inference for spatial, spatiotemporal, multivariate and aggregated point processes using Markov chain Monte Carlo. See Benjamin M. Taylor, Tilman M. Davies, Barry S. Rowlingson, Peter J. Diggle (2015) <doi:10.18637/jss.v063.i07>.

Maintained by Benjamin M. Taylor. Last updated 1 years ago.

23.3 match 3.59 score 27 scripts

jkcshea

ivmte:Instrumental Variables: Extrapolation by Marginal Treatment Effects

The marginal treatment effect was introduced by Heckman and Vytlacil (2005) <doi:10.1111/j.1468-0262.2005.00594.x> to provide a choice-theoretic interpretation to instrumental variables models that maintain the monotonicity condition of Imbens and Angrist (1994) <doi:10.2307/2951620>. This interpretation can be used to extrapolate from the compliers to estimate treatment effects for other subpopulations. This package provides a flexible set of methods for conducting this extrapolation. It allows for parametric or nonparametric sieve estimation, and allows the user to maintain shape restrictions such as monotonicity. The package operates in the general framework developed by Mogstad, Santos and Torgovitsky (2018) <doi:10.3982/ECTA15463>, and accommodates either point identification or partial identification (bounds). In the partially identified case, bounds are computed using either linear programming or quadratically constrained quadratic programming. Support for four solvers is provided. Gurobi and the Gurobi R API can be obtained from <http://www.gurobi.com/index>. CPLEX can be obtained from <https://www.ibm.com/analytics/cplex-optimizer>. CPLEX R APIs 'Rcplex' and 'cplexAPI' are available from CRAN. MOSEK and the MOSEK R API can be obtained from <https://www.mosek.com/>. The lp_solve library is freely available from <http://lpsolve.sourceforge.net/5.5/>, and is included when installing its API 'lpSolveAPI', which is available from CRAN.

Maintained by Joshua Shea. Last updated 7 months ago.

15.4 match 18 stars 5.33 score 30 scripts

bioc

scMultiSim:Simulation of Multi-Modality Single Cell Data Guided By Gene Regulatory Networks and Cell-Cell Interactions

scMultiSim simulates paired single cell RNA-seq, single cell ATAC-seq and RNA velocity data, while incorporating mechanisms of gene regulatory networks, chromatin accessibility and cell-cell interactions. It allows users to tune various parameters controlling the amount of each biological factor, variation of gene-expression levels, the influence of chromatin accessibility on RNA sequence data, and so on. It can be used to benchmark various computational methods for single cell multi-omics data, and to assist in experimental design of wet-lab experiments.

Maintained by Hechen Li. Last updated 5 months ago.

singlecell transcriptomics geneexpression sequencing experimentaldesign

11.4 match 23 stars 7.15 score 11 scripts

bioc

TAPseq:Targeted scRNA-seq primer design for TAP-seq

Design primers for targeted single-cell RNA-seq used by TAP-seq. Create sequence templates for target gene panels and design gene-specific primers using Primer3. Potential off-targets can be estimated with BLAST. Requires working installations of Primer3 and BLASTn.

Maintained by Andreas R. Gschwind. Last updated 5 months ago.

singlecell sequencing technology crispr pooledscreens

15.0 match 4 stars 5.38 score 9 scripts

bioc

peakPantheR:Peak Picking and Annotation of High Resolution Experiments

An automated pipeline for the detection, integration and reporting of predefined features across a large number of mass spectrometry data files. It enables the real time annotation of multiple compounds in a single file, or the parallel annotation of multiple compounds in multiple files. A graphical user interface as well as command line functions will assist in assessing the quality of annotation and update fitting parameters until a satisfactory result is obtained.

Maintained by Arnaud Wolfer. Last updated 5 months ago.

massspectrometry metabolomics peakdetection feature-detection mass-spectrometry

11.8 match 12 stars 6.82 score 23 scripts

bioc

dittoSeq:User Friendly Single-Cell and Bulk RNA Sequencing Visualization

A universal, user friendly, single-cell and bulk RNA sequencing visualization toolkit that allows highly customizable creation of color blindness friendly, publication-quality figures. dittoSeq accepts both SingleCellExperiment (SCE) and Seurat objects, as well as the import and usage, via conversion to an SCE, of SummarizedExperiment or DGEList bulk data. Visualizations include dimensionality reduction plots, heatmaps, scatterplots, percent composition or expression across groups, and more. Customizations range from size and title adjustments to automatic generation of annotations for heatmaps, overlay of trajectory analysis onto any dimensionality reduciton plot, hidden data overlay upon cursor hovering via ggplotly conversion, and many more. All with simple, discrete inputs. Color blindness friendliness is powered by legend adjustments (enlarged keys), and by allowing the use of shapes or letter-overlay in addition to the carefully selected dittoColors().

Maintained by Daniel Bunis. Last updated 5 months ago.

software visualization rnaseq singlecell geneexpression transcriptomics dataimport

10.5 match 7.56 score 760 scripts 2 dependents

pbiecek

bgmm:Gaussian Mixture Modeling Algorithms and the Belief-Based Mixture Modeling

Two partially supervised mixture modeling methods: soft-label and belief-based modeling are implemented. For completeness, we equipped the package also with the functionality of unsupervised, semi- and fully supervised mixture modeling. The package can be applied also to selection of the best-fitting from a set of models with different component numbers or constraints on their structures. For detailed introduction see: Przemyslaw Biecek, Ewa Szczurek, Martin Vingron, Jerzy Tiuryn (2012), The R Package bgmm: Mixture Modeling with Uncertain Knowledge, Journal of Statistical Software <doi:10.18637/jss.v047.i03>.

Maintained by Przemyslaw Biecek. Last updated 2 years ago.

18.7 match 2 stars 4.22 score 55 scripts 1 dependents

bioc

MethReg:Assessing the regulatory potential of DNA methylation regions or sites on gene transcription

Epigenome-wide association studies (EWAS) detects a large number of DNA methylation differences, often hundreds of differentially methylated regions and thousands of CpGs, that are significantly associated with a disease, many are located in non-coding regions. Therefore, there is a critical need to better understand the functional impact of these CpG methylations and to further prioritize the significant changes. MethReg is an R package for integrative modeling of DNA methylation, target gene expression and transcription factor binding sites data, to systematically identify and rank functional CpG methylations. MethReg evaluates, prioritizes and annotates CpG sites with high regulatory potential using matched methylation and gene expression data, along with external TF-target interaction databases based on manually curation, ChIP-seq experiments or gene regulatory network analysis.

Maintained by Tiago Silva. Last updated 5 months ago.

methylationarray regression geneexpression epigenetics genetarget transcription

14.4 match 5 stars 5.45 score 19 scripts

barcaroli

SamplingStrata:Optimal Stratification of Sampling Frames for Multipurpose Sampling Surveys

In the field of stratified sampling design, this package offers an approach for the determination of the best stratification of a sampling frame, the one that ensures the minimum sample cost under the condition to satisfy precision constraints in a multivariate and multidomain case. This approach is based on the use of the genetic algorithm: each solution (i.e. a particular partition in strata of the sampling frame) is considered as an individual in a population; the fitness of all individuals is evaluated applying the Bethel-Chromy algorithm to calculate the sampling size satisfying precision constraints on the target estimates. Functions in the package allows to: (a) analyse the obtained results of the optimisation step; (b) assign the new strata labels to the sampling frame; (c) select a sample from the new frame accordingly to the best allocation. Functions for the execution of the genetic algorithm are a modified version of the functions in the 'genalg' package.

Maintained by Giulio Barcaroli. Last updated 13 days ago.

cpp

10.3 match 15 stars 7.60 score 178 scripts

bioc

multicrispr:Multi-locus multi-purpose Crispr/Cas design

This package is for designing Crispr/Cas9 and Prime Editing experiments. It contains functions to (1) define and transform genomic targets, (2) find spacers (4) count offtarget (mis)matches, and (5) compute Doench2016/2014 targeting efficiency. Care has been taken for multicrispr to scale well towards large target sets, enabling the design of large Crispr/Cas9 libraries.

Maintained by Aditya Bhagwat. Last updated 4 months ago.

crispr software

13.8 match 5.65 score 2 scripts

ropensci

redland:RDF Library Bindings in R

Provides methods to parse, query and serialize information stored in the Resource Description Framework (RDF). RDF is described at <https://www.w3.org/TR/rdf-primer/>. This package supports RDF by implementing an R interface to the Redland RDF C library, described at <https://librdf.org/docs/api/index.html>. In brief, RDF provides a structured graph consisting of Statements composed of Subject, Predicate, and Object Nodes.

Maintained by Matthew B. Jones. Last updated 1 years ago.

redland

9.8 match 17 stars 7.85 score 98 scripts 13 dependents

bioc

signatureSearch:Environment for Gene Expression Searching Combined with Functional Enrichment Analysis

This package implements algorithms and data structures for performing gene expression signature (GES) searches, and subsequently interpreting the results functionally with specialized enrichment methods.

Maintained by Brendan Gongol. Last updated 5 months ago.

software geneexpression go kegg networkenrichment sequencing coverage differentialexpression cpp

10.6 match 17 stars 7.18 score 74 scripts 1 dependents

cfwp

rags2ridges:Ridge Estimation of Precision Matrices from High-Dimensional Data

Proper L2-penalized maximum likelihood estimators for precision matrices and supporting functions to employ these estimators in a graphical modeling setting. For details, see Peeters, Bilgrau, & van Wieringen (2022) <doi:10.18637/jss.v102.i04> and associated publications.

Maintained by Carel F.W. Peeters. Last updated 1 years ago.

c-plus-plus graphical-models machine-learning networkscience statistics openblas cpp

13.2 match 8 stars 5.60 score 46 scripts

bioc

SIMAT:GC-SIM-MS data processing and alaysis tool

This package provides a pipeline for analysis of GC-MS data acquired in selected ion monitoring (SIM) mode. The tool also provides a guidance in choosing appropriate fragments for the targets of interest by using an optimization algorithm. This is done by considering overlapping peaks from a provided library by the user.

Maintained by M. R. Nezami Ranjbar. Last updated 5 months ago.

immunooncology software metabolomics massspectrometry

17.3 match 4.26 score 1 scripts

bioc

fCI:f-divergence Cutoff Index for Differential Expression Analysis in Transcriptomics and Proteomics

(f-divergence Cutoff Index), is to find DEGs in the transcriptomic & proteomic data, and identify DEGs by computing the difference between the distribution of fold-changes for the control-control and remaining (non-differential) case-control gene expression ratio data. fCI provides several advantages compared to existing methods.

Maintained by Shaojun Tang. Last updated 5 months ago.

proteomics

22.0 match 3.30 score 5 scripts

bioc

miRSM:Inferring miRNA sponge modules in heterogeneous data

The package aims to identify miRNA sponge or ceRNA modules in heterogeneous data. It provides several functions to study miRNA sponge modules at single-sample and multi-sample levels, including popular methods for inferring gene modules (candidate miRNA sponge or ceRNA modules), and two functions to identify miRNA sponge modules at single-sample and multi-sample levels, as well as several functions to conduct modular analysis of miRNA sponge modules.

Maintained by Junpeng Zhang. Last updated 5 months ago.

geneexpression biomedicalinformatics clustering genesetenrichment microarray software generegulation genetarget cerna mirna mirna-sponge mirna-targets modules openjdk

12.8 match 4 stars 5.68 score 5 scripts

nhs-r-community

NHSRwaitinglist:R-package to implement a waiting list management approach

R-package to implement the waiting list management approach described in this paper by Fong et al 2022.

Maintained by Tom Smith. Last updated 4 months ago.

nhs queuing-theory

11.6 match 16 stars 6.06 score 17 scripts

great-northern-diver

loon:Interactive Statistical Data Visualization

An extendable toolkit for interactive data visualization and exploration.

Maintained by R. Wayne Oldford. Last updated 2 years ago.

data-analysis data-science data-visualization exploratory-analysis exploratory-data-analysis high-dimensional-data interactive-graphics interactive-visualizations loon python statistical-analysis statistical-graphics statistics tcl-extension tk

7.5 match 48 stars 9.00 score 93 scripts 5 dependents

bioc

biotmle:Targeted Learning with Moderated Statistics for Biomarker Discovery

Tools for differential expression biomarker discovery based on microarray and next-generation sequencing data that leverage efficient semiparametric estimators of the average treatment effect for variable importance analysis. Estimation and inference of the (marginal) average treatment effects of potential biomarkers are computed by targeted minimum loss-based estimation, with joint, stable inference constructed across all biomarkers using a generalization of moderated statistics for use with the estimated efficient influence function. The procedure accommodates the use of ensemble machine learning for the estimation of nuisance functions.

Maintained by Nima Hejazi. Last updated 5 months ago.

regression geneexpression differentialexpression sequencing microarray rnaseq immunooncology bioconductor bioconductor-package bioconductor-packages bioinformatics biomarker-discovery biostatistics causal-inference computational-biology machine-learning statistics targeted-learning

12.6 match 5 stars 5.30 score 5 scripts

daya6489

SmartEDA:Summarize and Explore the Data

Exploratory analysis on any input data describing the structure and the relationships present in the data. The package automatically select the variable and does related descriptive statistics. Analyzing information value, weight of evidence, custom tables, summary statistics, graphical techniques will be performed for both numeric and categorical predictors.

Maintained by Dayanand Ubrangala. Last updated 1 years ago.

analysis exploratory-data-analysis

9.1 match 42 stars 7.25 score 214 scripts

bioc

preprocessCore:A collection of pre-processing functions

A library of core preprocessing routines.

Maintained by Ben Bolstad. Last updated 5 months ago.

infrastructure openblas

5.5 match 19 stars 12.03 score 1.8k scripts 204 dependents

fredhutch

gimap:Calculate Genetic Interactions for Paired CRISPR Targets

Helps find meaningful patterns in complex genetic experiments. First gimap takes data from paired CRISPR (Clustered regularly interspaced short palindromic repeats) screens that has been pre-processed to counts table of paired gRNA (guide Ribonucleic Acid) reads. The input data will have cell counts for how well cells grow (or don't grow) when different genes or pairs of genes are disabled. The output of the 'gimap' package is genetic interaction scores which are the distance between the observed CRISPR score and the expected CRISPR score. The expected CRISPR scores are what we expect for the CRISPR values to be for two unrelated genes. The further away an observed CRISPR score is from its expected score the more we suspect genetic interaction. The work in this package is based off of original research from the Alice Berger lab at Fred Hutchinson Cancer Center (2021) <doi:10.1016/j.celrep.2021.109597>.

Maintained by Candace Savonen. Last updated 2 days ago.

10.2 match 6.43 score 7 scripts

pablo14

funModeling:Exploratory Data Analysis and Data Preparation Tool-Box

Around 10% of almost any predictive modeling project is spent in predictive modeling, 'funModeling' and the book Data Science Live Book (<https://livebook.datascienceheroes.com/>) are intended to cover remaining 90%: data preparation, profiling, selecting best variables 'dataViz', assessing model performance and other functions.

Maintained by Pablo Casas. Last updated 2 years ago.

7.6 match 100 stars 8.57 score 654 scripts

irinagain

iglu:Interpreting Glucose Data from Continuous Glucose Monitors

Implements a wide range of metrics for measuring glucose control and glucose variability based on continuous glucose monitoring data. The list of implemented metrics is summarized in Rodbard (2009) <doi:10.1089/dia.2009.0015>. Additional visualization tools include time-series plots, lasagna plots and ambulatory glucose profile report.

Maintained by Irina Gaynanova. Last updated 9 days ago.

7.2 match 26 stars 9.00 score 39 scripts

bioc

decoupleR:decoupleR: Ensemble of computational methods to infer biological activities from omics data

Many methods allow us to extract biological activities from omics data using information from prior knowledge resources, reducing the dimensionality for increased statistical power and better interpretability. Here, we present decoupleR, a Bioconductor package containing different statistical methods to extract these signatures within a unified framework. decoupleR allows the user to flexibly test any method with any resource. It incorporates methods that take into account the sign and weight of network interactions. decoupleR can be used with any omic, as long as its features can be linked to a biological process based on prior knowledge. For example, in transcriptomics gene sets regulated by a transcription factor, or in phospho-proteomics phosphosites that are targeted by a kinase.

Maintained by Pau Badia-i-Mompel. Last updated 5 months ago.

differentialexpression functionalgenomics geneexpression generegulation network software statisticalmethod transcription

5.7 match 230 stars 11.27 score 316 scripts 3 dependents

bioc

TFutils:TFutils

This package helps users to work with TF metadata from various sources. Significant catalogs of TFs and classifications thereof are made available. Tools for working with motif scans are also provided.

Maintained by Vincent Carey. Last updated 4 months ago.

transcriptomics

13.1 match 4.80 score 21 scripts

bioc

FindIT2:find influential TF and Target based on multi-omics data

This package implements functions to find influential TF and target based on different input type. It have five module: Multi-peak multi-gene annotaion(mmPeakAnno module), Calculate regulation potential(calcRP module), Find influential Target based on ChIP-Seq and RNA-Seq data(Find influential Target module), Find influential TF based on different input(Find influential TF module), Calculate peak-gene or peak-peak correlation(peakGeneCor module). And there are also some other useful function like integrate different source information, calculate jaccard similarity for your TF.

Maintained by Guandong Shang. Last updated 5 months ago.

software annotation chipseq atacseq generegulation multiplecomparison genetarget

12.0 match 6 stars 5.26 score 7 scripts

bioc

DeepTarget:Deep characterization of cancer drugs

This package predicts a drug’s primary target(s) or secondary target(s) by integrating large-scale genetic and drug screens from the Cancer Dependency Map project run by the Broad Institute. It further investigates whether the drug specifically targets the wild-type or mutated target forms. To show how to use this package in practice, we provided sample data along with step-by-step example.

Maintained by Trinh Nguyen. Last updated 5 months ago.

genetarget geneprediction pathways geneexpression rnaseq immunooncology differentialexpression genesetenrichment reportwriting crispr

13.8 match 4.54 score 1 scripts

richardgeveritt

ggsmc:Visualising Output from Sequential Monte Carlo and Ensemble-Based Methods

Functions for plotting, and animating, the output of importance samplers, sequential Monte Carlo samplers (SMC) and ensemble-based methods. The package can be used to plot and animate histograms, densities, scatter plots and time series, and to plot the genealogy of an SMC or ensemble-based algorithm. These functions all rely on algorithm output to be supplied in tidy format. A function is provided to transform algorithm output from matrix format (one Monte Carlo point per row) to the tidy format required by the plotting and animating functions.

Maintained by Richard G Everitt. Last updated 2 months ago.

13.6 match 4.48 score 6 scripts

ucl

rmcmc:Robust Markov Chain Monte Carlo Methods

Functions for simulating Markov chains using the Barker proposal to compute Markov chain Monte Carlo (MCMC) estimates of expectations with respect to a target distribution on a real-valued vector space. The Barker proposal, described in Livingstone and Zanella (2022) <doi:10.1111/rssb.12482>, is a gradient-based MCMC algorithm inspired by the Barker accept-reject rule. It combines the robustness of simpler MCMC schemes, such as random-walk Metropolis, with the efficiency of gradient-based methods, such as the Metropolis adjusted Langevin algorithm. The key function provided by the package is sample_chain(), which allows sampling a Markov chain with a specified target distribution as its stationary distribution. The chain is sampled by generating proposals and accepting or rejecting them using a Metropolis-Hasting acceptance rule. During an initial warm-up stage, the parameters of the proposal distribution can be adapted, with adapters available to both: tune the scale of the proposals by coercing the average acceptance rate to a target value; tune the shape of the proposals to match covariance estimates under the target distribution. As well as the default Barker proposal, the package also provides implementations of alternative proposal distributions, such as (Gaussian) random walk and Langevin proposals. Optionally, if 'BridgeStan's R interface <https://roualdes.github.io/bridgestan/latest/languages/r.html>, available on GitHub <https://github.com/roualdes/bridgestan>, is installed, then 'BridgeStan' can be used to specify the target distribution to sample from.

Maintained by Matthew M. Graham. Last updated 12 days ago.

approximate-inference mcmc

10.4 match 5 stars 5.85 score 8 scripts

adrianhordyk

DLMtool:Data-Limited Methods Toolkit

A collection of data-limited management procedures that can be evaluated with management strategy evaluation with the 'MSEtool' package, or applied to fishery data to provide management recommendations.

Maintained by Adrian Hordyk. Last updated 3 years ago.

cpp

16.5 match 1 stars 3.67 score 229 scripts 1 dependents

nyuglobalties

blueprintr:Automagically Document and Test Datasets Using Targets Or Drake

Documents and tests datasets in a reproducible manner so that data lineage is easier to comprehend for small to medium tabular data. Originally designed to aid data cleaning tasks for humanitarian research groups, specifically large-scale longitudinal studies.

Maintained by Patrick Anker. Last updated 8 months ago.

drake targets

17.7 match 1 stars 3.40 score 7 scripts

nt-williams

lmtp:Non-Parametric Causal Effects of Feasible Interventions Based on Modified Treatment Policies

Non-parametric estimators for casual effects based on longitudinal modified treatment policies as described in Diaz, Williams, Hoffman, and Schenck <doi:10.1080/01621459.2021.1955691>, traditional point treatment, and traditional longitudinal effects. Continuous, binary, categorical treatments, and multivariate treatments are allowed as well are censored outcomes. The treatment mechanism is estimated via a density ratio classification procedure irrespective of treatment variable type. For both continuous and binary outcomes, additive treatment effects can be calculated and relative risks and odds ratios may be calculated for binary outcomes. Supports survival outcomes with competing risks (Diaz, Hoffman, and Hejazi; <doi:10.1007/s10985-023-09606-7>).

Maintained by Nicholas Williams. Last updated 8 days ago.

causal-inference censored-data longitudinal-data machine-learning modified-treatment-policy nonparametric-statistics precision-medicine robust-statistics statistics stochastic-interventions survival-analysis targeted-learning

9.3 match 64 stars 6.37 score 91 scripts

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 years ago.

7.2 match 3 stars 8.20 score 7.8k scripts 11 dependents

bioc

drugTargetInteractions:Drug-Target Interactions

Provides utilities for identifying drug-target interactions for sets of small molecule or gene/protein identifiers. The required drug-target interaction information is obained from a local SQLite instance of the ChEMBL database. ChEMBL has been chosen for this purpose, because it provides one of the most comprehensive and best annotatated knowledge resources for drug-target information available in the public domain.

Maintained by Thomas Girke. Last updated 5 months ago.

cheminformatics biomedicalinformatics pharmacogenetics pharmacogenomics proteomics metabolomics

13.5 match 1 stars 4.34 score 11 scripts

tdhock

penaltyLearning:Penalty Learning

Implementations of algorithms from Learning Sparse Penalties for Change-point Detection using Max Margin Interval Regression, by Hocking, Rigaill, Vert, Bach <http://proceedings.mlr.press/v28/hocking13.html> published in proceedings of ICML2013.

Maintained by Toby Dylan Hocking. Last updated 5 months ago.

cpp

9.5 match 16 stars 6.13 score 129 scripts 2 dependents

cjvanlissa

worcs:Workflow for Open Reproducible Code in Science

Create reproducible and transparent research projects in 'R'. This package is based on the Workflow for Open Reproducible Code in Science (WORCS), a step-by-step procedure based on best practices for Open Science. It includes an 'RStudio' project template, several convenience functions, and all dependencies required to make your project reproducible and transparent. WORCS is explained in the tutorial paper by Van Lissa, Brandmaier, Brinkman, Lamprecht, Struiksma, & Vreede (2021). <doi:10.3233/DS-210031>.

Maintained by Caspar J. Van Lissa. Last updated 10 days ago.

6.2 match 83 stars 9.26 score 59 scripts

jeffreyevans

yaImpute:Nearest Neighbor Observation Imputation and Evaluation Tools

Performs nearest neighbor-based imputation using one or more alternative approaches to processing multivariate data. These include methods based on canonical correlation: analysis, canonical correspondence analysis, and a multivariate adaptation of the random forest classification and regression techniques of Leo Breiman and Adele Cutler. Additional methods are also offered. The package includes functions for comparing the results from running alternative techniques, detecting imputation targets that are notably distant from reference observations, detecting and correcting for bias, bootstrapping and building ensemble imputations, and mapping results.

Maintained by Jeffrey S. Evans. Last updated 6 months ago.

imputation cpp

7.7 match 3 stars 7.40 score 94 scripts 12 dependents

bluefoxr

COINr:Composite Indicator Construction and Analysis

A comprehensive high-level package, for composite indicator construction and analysis. It is a "development environment" for composite indicators and scoreboards, which includes utilities for construction (indicator selection, denomination, imputation, data treatment, normalisation, weighting and aggregation) and analysis (multivariate analysis, correlation plotting, short cuts for principal component analysis, global sensitivity analysis, and more). A composite indicator is completely encapsulated inside a single hierarchical list called a "coin". This allows a fast and efficient work flow, as well as making quick copies, testing methodological variations and making comparisons. It also includes many plotting options, both statistical (scatter plots, distribution plots) as well as for presenting results.

Maintained by William Becker. Last updated 2 months ago.

6.3 match 26 stars 9.07 score 73 scripts 1 dependents

causal-lda

TrialEmulation:Causal Analysis of Observational Time-to-Event Data

Implements target trial emulation methods to apply randomized clinical trial design and analysis in an observational setting. Using marginal structural models, it can estimate intention-to-treat and per-protocol effects in emulated trials using electronic health records. A description and application of the method can be found in Danaei et al (2013) <doi:10.1177/0962280211403603>.

Maintained by Isaac Gravestock. Last updated 22 days ago.

causal-inference longitudinal-data survival-analysis cpp

7.3 match 25 stars 7.72 score 29 scripts

bioc

POMA:Tools for Omics Data Analysis

The POMA package offers a comprehensive toolkit designed for omics data analysis, streamlining the process from initial visualization to final statistical analysis. Its primary goal is to simplify and unify the various steps involved in omics data processing, making it more accessible and manageable within a single, intuitive R package. Emphasizing on reproducibility and user-friendliness, POMA leverages the standardized SummarizedExperiment class from Bioconductor, ensuring seamless integration and compatibility with a wide array of Bioconductor tools. This approach guarantees maximum flexibility and replicability, making POMA an essential asset for researchers handling omics datasets. See https://github.com/pcastellanoescuder/POMAShiny. Paper: Castellano-Escuder et al. (2021) <doi:10.1371/journal.pcbi.1009148> for more details.

Maintained by Pol Castellano-Escuder. Last updated 4 months ago.

batcheffect classification clustering decisiontree dimensionreduction multidimensionalscaling normalization preprocessing principalcomponent regression rnaseq software statisticalmethod visualization bioconductor bioinformatics data-visualization dimension-reduction exploratory-data-analysis machine-learning omics-data-integration pipeline pre-processing statistical-analysis user-friendly workflow

6.8 match 11 stars 8.23 score 20 scripts 1 dependents

c-rutter

imabc:Incremental Mixture Approximate Bayesian Computation (IMABC)

Provides functionality to perform a likelihood-free method for estimating the parameters of complex models that results in a simulated sample from the posterior distribution of model parameters given targets. The method begins with a accept/reject approximate bayes computation (ABC) step applied to a sample of points from the prior distribution of model parameters. Accepted points result in model predictions that are within the initially specified tolerance intervals around the target points. The sample is iteratively updated by drawing additional points from a mixture of multivariate normal distributions, accepting points within tolerance intervals. As the algorithm proceeds, the acceptance intervals are narrowed. The algorithm returns a set of points and sampling weights that account for the adaptive sampling scheme. For more details see Rutter, Ozik, DeYoreo, and Collier (2018) <arXiv:1804.02090>.

Maintained by "Christopher, E. Maerzluft". Last updated 2 years ago.

calibration simulation

10.3 match 8 stars 5.38 score 7 scripts

bioc

BulkSignalR:Infer Ligand-Receptor Interactions from bulk expression (transcriptomics/proteomics) data, or spatial transcriptomics

Inference of ligand-receptor (LR) interactions from bulk expression (transcriptomics/proteomics) data, or spatial transcriptomics. BulkSignalR bases its inferences on the LRdb database included in our other package, SingleCellSignalR available from Bioconductor. It relies on a statistical model that is specific to bulk data sets. Different visualization and data summary functions are proposed to help navigating prediction results.

Maintained by Jean-Philippe Villemin. Last updated 3 months ago.

network rnaseq software proteomics transcriptomics networkinference spatial

10.6 match 5.22 score 15 scripts

r-forge

pcalg:Methods for Graphical Models and Causal Inference

Functions for causal structure learning and causal inference using graphical models. The main algorithms for causal structure learning are PC (for observational data without hidden variables), FCI and RFCI (for observational data with hidden variables), and GIES (for a mix of data from observational studies (i.e. observational data) and data from experiments involving interventions (i.e. interventional data) without hidden variables). For causal inference the IDA algorithm, the Generalized Backdoor Criterion (GBC), the Generalized Adjustment Criterion (GAC) and some related functions are implemented. Functions for incorporating background knowledge are provided.

Maintained by Markus Kalisch. Last updated 6 months ago.

openblas cpp

7.5 match 7.32 score 700 scripts 19 dependents

bioc

EnrichedHeatmap:Making Enriched Heatmaps

Enriched heatmap is a special type of heatmap which visualizes the enrichment of genomic signals on specific target regions. Here we implement enriched heatmap by ComplexHeatmap package. Since this type of heatmap is just a normal heatmap but with some special settings, with the functionality of ComplexHeatmap, it would be much easier to customize the heatmap as well as concatenating to a list of heatmaps to show correspondance between different data sources.

Maintained by Zuguang Gu. Last updated 5 months ago.

software visualization sequencing genomeannotation coverage cpp

5.1 match 190 stars 10.87 score 330 scripts 1 dependents

bioc

RAREsim:Simulation of Rare Variant Genetic Data

Haplotype simulations of rare variant genetic data that emulates real data can be performed with RAREsim. RAREsim uses the expected number of variants in MAC bins - either as provided by default parameters or estimated from target data - and an abundance of rare variants as simulated HAPGEN2 to probabilistically prune variants. RAREsim produces haplotypes that emulate real sequencing data with respect to the total number of variants, allele frequency spectrum, haplotype structure, and variant annotation.

Maintained by Ryan Barnard. Last updated 5 months ago.

genetics software variantannotation sequencing

11.8 match 4 stars 4.60 score 4 scripts

revelle

psych:Procedures for Psychological, Psychometric, and Personality Research

A general purpose toolbox developed originally for personality, psychometric theory and experimental psychology. Functions are primarily for multivariate analysis and scale construction using factor analysis, principal component analysis, cluster analysis and reliability analysis, although others provide basic descriptive statistics. Item Response Theory is done using factor analysis of tetrachoric and polychoric correlations. Functions for analyzing data at multiple levels include within and between group statistics, including correlations and factor analysis. Validation and cross validation of scales developed using basic machine learning algorithms are provided, as are functions for simulating and testing particular item and test structures. Several functions serve as a useful front end for structural equation modeling. Graphical displays of path diagrams, including mediation models, factor analysis and structural equation models are created using basic graphics. Some of the functions are written to support a book on psychometric theory as well as publications in personality research. For more information, see the <https://personality-project.org/r/> web page.

Maintained by William Revelle. Last updated 3 months ago.

3.9 match 52 stars 13.94 score 29k scripts 317 dependents

bioc

gCrisprTools:Suite of Functions for Pooled Crispr Screen QC and Analysis

Set of tools for evaluating pooled high-throughput screening experiments, typically employing CRISPR/Cas9 or shRNA expression cassettes. Contains methods for interrogating library and cassette behavior within an experiment, identifying differentially abundant cassettes, aggregating signals to identify candidate targets for empirical validation, hypothesis testing, and comprehensive reporting. Version 2.0 extends these applications to include a variety of tools for contextualizing and integrating signals across many experiments, incorporates extended signal enrichment methodologies via the "sparrow" package, and streamlines many formal requirements to aid in interpretablity.

Maintained by Russell Bainer. Last updated 5 months ago.

immunooncology crispr pooledscreens experimentaldesign biomedicalinformatics cellbiology functionalgenomics pharmacogenomics pharmacogenetics systemsbiology differentialexpression genesetenrichment genetics multiplecomparison normalization preprocessing qualitycontrol rnaseq regression software visualization

11.3 match 4.78 score 8 scripts

flr

mse:Tools for Running Management Strategy Evaluations using FLR

A set of functions and methods to enable the development and running of Management Strategy Evaluation (MSE) analyses, using the FLR packages and classes and the a4a methods and algorithms.

Maintained by Iago Mosqueira. Last updated 20 days ago.

simulation mse fisheries flr a4a

7.6 match 4 stars 7.04 score 137 scripts 3 dependents

joshuaschwab

ltmle:Longitudinal Targeted Maximum Likelihood Estimation

Targeted Maximum Likelihood Estimation ('TMLE') of treatment/censoring specific mean outcome or marginal structural model for point-treatment and longitudinal data. Petersen et al. (2014) <doi:10.1515/jci-2013-0007>

Maintained by Joshua Schwab. Last updated 2 years ago.

8.7 match 23 stars 6.15 score 207 scripts

jiefei-wang

aws.ecx:Communicating with AWS EC2 and ECS using AWS REST APIs

Providing the functions for communicating with Amazon Web Services(AWS) Elastic Compute Cloud(EC2) and Elastic Container Service(ECS). The functions will have the prefix 'ecs_' or 'ec2_' depending on the class of the API. The request will be sent via the REST API and the parameters are given by the function argument. The credentials can be set via 'aws_set_credentials'. The EC2 documentation can be found at <https://docs.aws.amazon.com/AWSEC2/latest/APIReference/Welcome.html> and ECS can be found at <https://docs.aws.amazon.com/AmazonECS/latest/APIReference/Welcome.html>.

Maintained by Jiefei Wang. Last updated 3 years ago.

ec2 ecs ecs-functions

12.7 match 1 stars 4.18 score 2 scripts

reichlab

zoltr:Interface to the 'Zoltar' Forecast Repository API

'Zoltar' <https://www.zoltardata.com/> is a website that provides a repository of model forecast results in a standardized format and a central location. It supports storing, retrieving, comparing, and analyzing time series forecasts for prediction challenges of interest to the modeling community. This package provides functions for working with the 'Zoltar' API, including connecting and authenticating, getting meta information (projects, models, and forecasts, and truth), and uploading, downloading, and deleting forecast and truth data.

Maintained by Matthew Cornell. Last updated 9 days ago.

7.0 match 2 stars 7.58 score 175 scripts 3 dependents

bioc

CrispRVariants:Tools for counting and visualising mutations in a target location

CrispRVariants provides tools for analysing the results of a CRISPR-Cas9 mutagenesis sequencing experiment, or other sequencing experiments where variants within a given region are of interest. These tools allow users to localize variant allele combinations with respect to any genomic location (e.g. the Cas9 cut site), plot allele combinations and calculate mutation rates with flexible filtering of unrelated variants.

Maintained by Helen Lindsay. Last updated 5 months ago.

immunooncology crispr genomicvariation variantdetection geneticvariability datarepresentation visualization sequencing

9.5 match 5.51 score 32 scripts

johnmackintosh

cusumcharter:Easier CUSUM Control Charts

Create CUSUM (cumulative sum) statistics from a vector or dataframe. Also create single or faceted CUSUM control charts, with or without control limits. Accepts vector, dataframe, tibble or data.table inputs.

Maintained by John MacKintosh. Last updated 4 months ago.

cusum ggplot2 health-informatics healthcare quality-improvement rdatatable statistical-process-control

10.1 match 27 stars 5.13 score 9 scripts

nhejazi

txshift:Efficient Estimation of the Causal Effects of Stochastic Interventions

Efficient estimation of the population-level causal effects of stochastic interventions on a continuous-valued exposure. Both one-step and targeted minimum loss estimators are implemented for the counterfactual mean value of an outcome of interest under an additive modified treatment policy, a stochastic intervention that may depend on the natural value of the exposure. To accommodate settings with outcome-dependent two-phase sampling, procedures incorporating inverse probability of censoring weighting are provided to facilitate the construction of inefficient and efficient one-step and targeted minimum loss estimators. The causal parameter and its estimation were first described by Díaz and van der Laan (2013) <doi:10.1111/j.1541-0420.2011.01685.x>, while the multiply robust estimation procedure and its application to data from two-phase sampling designs is detailed in NS Hejazi, MJ van der Laan, HE Janes, PB Gilbert, and DC Benkeser (2020) <doi:10.1111/biom.13375>. The software package implementation is described in NS Hejazi and DC Benkeser (2020) <doi:10.21105/joss.02447>. Estimation of nuisance parameters may be enhanced through the Super Learner ensemble model in 'sl3', available for download from GitHub using 'remotes::install_github("tlverse/sl3")'.

Maintained by Nima Hejazi. Last updated 6 months ago.

causal-effects causal-inference censored-data machine-learning robust-statistics statistics stochastic-interventions stochastic-treatment-regimes targeted-learning treatment-effects variable-importance

9.9 match 14 stars 5.12 score 19 scripts

bioc

rtracklayer:R interface to genome annotation files and the UCSC genome browser

Extensible framework for interacting with multiple genome browsers (currently UCSC built-in) and manipulating annotation tracks in various formats (currently GFF, BED, bedGraph, BED15, WIG, BigWig and 2bit built-in). The user may export/import tracks to/from the supported browsers, as well as query and modify the browser state, such as the current viewport.

Maintained by Michael Lawrence. Last updated 7 days ago.

annotation visualization dataimport zlib openssl curl

4.0 match 12.66 score 6.7k scripts 481 dependents

tjmahr

notestar:Notebooks Using 'Targets' and 'Bookdown'

'Targets' is an R package for dependency and build management in data analysis projects. This package provides a set of targets and project infrastructure to create 'bookdown'-based notebooks using 'targets'.

Maintained by Tristan Mahr. Last updated 2 months ago.

bookdown knitr pandoc rmarkdown targets

15.9 match 30 stars 3.18 score 7 scripts

ssnn-airr

shazam:Immunoglobulin Somatic Hypermutation Analysis

Provides a computational framework for analyzing mutations in immunoglobulin (Ig) sequences. Includes methods for Bayesian estimation of antigen-driven selection pressure, mutational load quantification, building of somatic hypermutation (SHM) models, and model-dependent distance calculations. Also includes empirically derived models of SHM for both mice and humans. Citations: Gupta and Vander Heiden, et al (2015) <doi:10.1093/bioinformatics/btv359>, Yaari, et al (2012) <doi:10.1093/nar/gks457>, Yaari, et al (2013) <doi:10.3389/fimmu.2013.00358>, Cui, et al (2016) <doi:10.4049/jimmunol.1502263>.

Maintained by Susanna Marquez. Last updated 2 months ago.

6.8 match 7.43 score 222 scripts 2 dependents

bioc

circRNAprofiler:circRNAprofiler: An R-Based Computational Framework for the Downstream Analysis of Circular RNAs

R-based computational framework for a comprehensive in silico analysis of circRNAs. This computational framework allows to combine and analyze circRNAs previously detected by multiple publicly available annotation-based circRNA detection tools. It covers different aspects of circRNAs analysis from differential expression analysis, evolutionary conservation, biogenesis to functional analysis.

Maintained by Simona Aufiero. Last updated 5 months ago.

annotation structuralprediction functionalprediction geneprediction genomeassembly differentialexpression

8.7 match 10 stars 5.78 score 5 scripts

bioc

PureCN:Copy number calling and SNV classification using targeted short read sequencing

This package estimates tumor purity, copy number, and loss of heterozygosity (LOH), and classifies single nucleotide variants (SNVs) by somatic status and clonality. PureCN is designed for targeted short read sequencing data, integrates well with standard somatic variant detection and copy number pipelines, and has support for tumor samples without matching normal samples.

Maintained by Markus Riester. Last updated 2 months ago.

copynumbervariation software sequencing variantannotation variantdetection coverage immunooncology bioconductor-package cell-free-dna copy-number loh tumor-heterogeneity tumor-mutational-burden tumor-purity

5.1 match 132 stars 9.72 score 40 scripts

psychelzh

tarflow.iquizoo:Setup "targets" Workflows for "iquizoo" Data Processing

For "iquizoo" data processing, there is already a package called "preproc.iquizoo", but eventually the use of it is relied on a workflow. This package is used to build such workflows based on tools provided by "targets" package which mimics the logic of "make", automating the building processes.

Maintained by Liang Zhang. Last updated 5 months ago.

13.9 match 10 stars 3.60 score 1 scripts

bioc

crisprBowtie:Bowtie-based alignment of CRISPR gRNA spacer sequences

Provides a user-friendly interface to map on-targets and off-targets of CRISPR gRNA spacer sequences using bowtie. The alignment is fast, and can be performed using either commonly-used or custom CRISPR nucleases. The alignment can work with any reference or custom genomes. Both DNA- and RNA-targeting nucleases are supported.

Maintained by Jean-Philippe Fortin. Last updated 5 months ago.

crispr functionalgenomics alignment aligner bioconductor bioconductor-package bowtie crispr-analysis crispr-cas9 crispr-design crispr-target grna grna-sequence grna-sequences sgrna sgrna-design

8.4 match 3 stars 5.86 score 7 scripts 4 dependents

ropensci

git2r:Provides Access to Git Repositories

Interface to the 'libgit2' library, which is a pure C implementation of the 'Git' core methods. Provides access to 'Git' repositories to extract data and running some basic 'Git' commands.

Maintained by Stefan Widgren. Last updated 10 days ago.

git git-client libgit2 libgit2-library

3.5 match 218 stars 13.86 score 836 scripts 49 dependents

darwin-eu

IncidencePrevalence:Estimate Incidence and Prevalence using the OMOP Common Data Model

Calculate incidence and prevalence using data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model. Incidence and prevalence can be estimated for the total population in a database or for a stratification cohort.

Maintained by Edward Burn. Last updated 5 days ago.

6.1 match 9 stars 7.96 score 102 scripts 1 dependents

openbiox

UCSCXenaShiny:Interactive Analysis of UCSC Xena Data

Provides functions and a Shiny application for downloading, analyzing and visualizing datasets from UCSC Xena (<http://xena.ucsc.edu/>), which is a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others.

Maintained by Shixiang Wang. Last updated 4 months ago.

cancer-dataset shiny-apps ucsc-xena

5.7 match 96 stars 8.54 score 35 scripts

jarretrt

tci:Target Controlled Infusion (TCI)

Implementation of target-controlled infusion algorithms for compartmental pharmacokinetic and pharmacokinetic-pharmacodynamic models. Jacobs (1990) <doi:10.1109/10.43622>; Marsh et al. (1991) <doi:10.1093/bja/67.1.41>; Shafer and Gregg (1993) <doi:10.1007/BF01070999>; Schnider et al. (1998) <doi:10.1097/00000542-199805000-00006>; Abuhelwa, Foster, and Upton (2015) <doi:10.1016/j.vascn.2015.03.004>; Eleveld et al. (2018) <doi:10.1016/j.bja.2018.01.018>.

Maintained by Ryan Jarrett. Last updated 2 years ago.

cpp openmp

13.7 match 6 stars 3.48 score 8 scripts

bhklab

mRMRe:Parallelized Minimum Redundancy, Maximum Relevance (mRMR)

Computes mutual information matrices from continuous, categorical and survival variables, as well as feature selection with minimum redundancy, maximum relevance (mRMR) and a new ensemble mRMR technique. Published in De Jay et al. (2013) <doi:10.1093/bioinformatics/btt383>.

Maintained by Benjamin Haibe-Kains. Last updated 4 years ago.

cpp openmp

5.3 match 19 stars 8.95 score 105 scripts 2 dependents

jucheng1992

ctmle:Collaborative Targeted Maximum Likelihood Estimation

Implements the general template for collaborative targeted maximum likelihood estimation. It also provides several commonly used C-TMLE instantiation, like the vanilla/scalable variable-selection C-TMLE (Ju et al. (2017) <doi:10.1177/0962280217729845>) and the glmnet-C-TMLE algorithm (Ju et al. (2017) <arXiv:1706.10029>).

Maintained by Cheng Ju. Last updated 5 years ago.

causal-inference machine-learning statistics tmle

9.8 match 5 stars 4.83 score 27 scripts

mommy003

MSML:Model Selection Based on Machine Learning (ML)

Model evaluation based on a modified version of the recursive feature elimination algorithm. This package is designed to determine the optimal model(s) by leveraging all available features.

Maintained by Moksedul Momin. Last updated 9 months ago.

13.6 match 3.48 score 1 scripts

tlverse

tmle3mopttx:Targeted Maximum Likelihood Estimation of the Mean under Optimal Individualized Treatment

This package estimates the optimal individualized treatment rule for the categorical treatment using Super Learner (sl3). In order to avoid nested cross-validation, it uses split-specific estimates of Q and g to estimate the rule as described by Coyle et al. In addition, it provides the Targeted Maximum Likelihood estimates of the mean performance using CV-TMLE under such estimated rules. This is an adapter package for use with the tmle3 framework and the tlverse software ecosystem for Targeted Learning.

Maintained by Ivana Malenica. Last updated 3 years ago.

categorical-treatment causal-inference heterogeneous-effects machine-learning optimal-individualized-treatment targeted-learning variable-importance

11.1 match 12 stars 4.25 score 49 scripts 1 dependents

ropensci

tidyqpcr:Quantitative PCR Analysis with the Tidyverse

For reproducible quantitative PCR (qPCR) analysis building on packages from the ’tidyverse’, notably ’dplyr’ and ’ggplot2’. It normalizes (by ddCq), summarizes, and plots pre-calculated Cq data, and plots raw amplification and melt curves from Roche Lightcycler (tm) machines. It does NOT (yet) calculate Cq data from amplification curves.

Maintained by Edward Wallace. Last updated 11 months ago.

miqe qpcr qpcr-analysis tidyverse

8.3 match 54 stars 5.64 score 20 scripts

geco-bern

rsofun:The P-Model and BiomeE Modelling Framework

Implements the Simulating Optimal FUNctioning framework for site-scale simulations of ecosystem processes, including model calibration. It contains 'Fortran 90' modules for the P-model (Stocker et al. (2020) <doi:10.5194/gmd-13-1545-2020>), SPLASH (Davis et al. (2017) <doi:10.5194/gmd-10-689-2017>) and BiomeE (Weng et al. (2015) <doi:10.5194/bg-12-2655-2015>).

Maintained by Benjamin Stocker. Last updated 12 days ago.

dgvm growth modeling p-model simulation vegetation-dynamics fortran

5.3 match 26 stars 8.77 score 119 scripts

hanjunwei-lab

DTSEA:Drug Target Set Enrichment Analysis

It is a novel tool used to identify the candidate drugs against a particular disease based on the drug target set enrichment analysis. It assumes the most effective drugs are those with a closer affinity in the protein-protein interaction network to the specified disease. (See Gómez-Carballa et al. (2022) <doi: 10.1016/j.envres.2022.112890> and Feng et al. (2022) <doi: 10.7150/ijms.67815> for disease expression profiles; see Wishart et al. (2018) <doi: 10.1093/nar/gkx1037> and Gaulton et al. (2017) <doi: 10.1093/nar/gkw1074> for drug target information; see Kanehisa et al. (2021) <doi: 10.1093/nar/gkaa970> for the details of KEGG database.)

Maintained by Junwei Han. Last updated 2 years ago.

10.7 match 4.32 score 42 scripts

bioc

memes:motif matching, comparison, and de novo discovery using the MEME Suite

A seamless interface to the MEME Suite family of tools for motif analysis. 'memes' provides data aware utilities for using GRanges objects as entrypoints to motif analysis, data structures for examining & editing motif lists, and novel data visualizations. 'memes' functions and data structures are amenable to both base R and tidyverse workflows.

Maintained by Spencer Nystrom. Last updated 5 months ago.

dataimport functionalgenomics generegulation motifannotation motifdiscovery sequencematching software

5.3 match 49 stars 8.68 score 117 scripts 1 dependents

stevenmmortimer

rdfp:An Implementation of the 'DoubleClick for Publishers' API

Functions to interact with the 'Google DoubleClick for Publishers (DFP)' API <https://developers.google.com/ad-manager/api/start> (recently renamed to 'Google Ad Manager'). This package is automatically compiled from the API WSDL (Web Service Description Language) files to dictate how the API is structured. Theoretically, all API actions are possible using this package; however, care must be taken to format the inputs correctly and parse the outputs correctly. Please see the 'Google Ad Manager' API reference <https://developers.google.com/ad-manager/api/rel_notes> and this package's website <https://stevenmmortimer.github.io/rdfp/> for more information, documentation, and examples.

Maintained by Steven M. Mortimer. Last updated 6 years ago.

api-client api-wrapper dfp dfp-api doubleclick doubleclick-for-publishers google-dfp

6.5 match 16 stars 6.93 score 214 scripts

uscbiostats

partition:Agglomerative Partitioning Framework for Dimension Reduction

A fast and flexible framework for agglomerative partitioning. 'partition' uses an approach called Direct-Measure-Reduce to create new variables that maintain the user-specified minimum level of information. Each reduced variable is also interpretable: the original variables map to one and only one variable in the reduced data set. 'partition' is flexible, as well: how variables are selected to reduce, how information loss is measured, and the way data is reduced can all be customized. 'partition' is based on the Partition framework discussed in Millstein et al. (2020) <doi:10.1093/bioinformatics/btz661>.

Maintained by Malcolm Barrett. Last updated 4 months ago.

data-reduction dimensionality-reduction partitional-clustering openblas cpp

5.8 match 36 stars 7.72 score 27 scripts 1 dependents

martin3141

spant:MR Spectroscopy Analysis Tools

Tools for reading, visualising and processing Magnetic Resonance Spectroscopy data. The package includes methods for spectral fitting: Wilson (2021) <DOI:10.1002/mrm.28385> and spectral alignment: Wilson (2018) <DOI:10.1002/mrm.27605>.

Maintained by Martin Wilson. Last updated 29 days ago.

brain mri mrs mrshub spectroscopy fortran

5.2 match 24 stars 8.55 score 81 scripts

annechao

iNEXT.3D:Interpolation and Extrapolation for Three Dimensions of Biodiversity

Biodiversity is a multifaceted concept covering different levels of organization from genes to ecosystems. 'iNEXT.3D' extends 'iNEXT' to include three dimensions (3D) of biodiversity, i.e., taxonomic diversity (TD), phylogenetic diversity (PD) and functional diversity (FD). This package provides functions to compute standardized 3D diversity estimates with a common sample size or sample coverage. A unified framework based on Hill numbers and their generalizations (Hill-Chao numbers) are used to quantify 3D. All 3D estimates are in the same units of species/lineage equivalents and can be meaningfully compared. The package features size- and coverage-based rarefaction and extrapolation sampling curves to facilitate rigorous comparison of 3D diversity across individual assemblages. Asymptotic 3D diversity estimates are also provided. See Chao et al. (2021) <doi:10.1111/2041-210X.13682> for more details.

Maintained by Anne Chao. Last updated 25 days ago.

cpp

6.5 match 6.74 score 26 scripts 2 dependents

cran

tmle:Targeted Maximum Likelihood Estimation

Targeted maximum likelihood estimation of point treatment effects (Targeted Maximum Likelihood Learning, The International Journal of Biostatistics, 2(1), 2006. This version automatically estimates the additive treatment effect among the treated (ATT) and among the controls (ATC). The tmle() function calculates the adjusted marginal difference in mean outcome associated with a binary point treatment, for continuous or binary outcomes. Relative risk and odds ratio estimates are also reported for binary outcomes. Missingness in the outcome is allowed, but not in treatment assignment or baseline covariate values. The population mean is calculated when there is missingness, and no variation in the treatment assignment. The tmleMSM() function estimates the parameters of a marginal structural model for a binary point treatment effect. Effect estimation stratified by a binary mediating variable is also available. An ID argument can be used to identify repeated measures. Default settings call 'SuperLearner' to estimate the Q and g portions of the likelihood, unless values or a user-supplied regression function are passed in as arguments.

Maintained by Susan Gruber. Last updated 10 months ago.

9.3 match 1 stars 4.73 score 300 scripts 3 dependents

bioc

miRNAtap:miRNAtap: microRNA Targets - Aggregated Predictions

The package facilitates implementation of workflows requiring miRNA predictions, it allows to integrate ranked miRNA target predictions from multiple sources available online and aggregate them with various methods which improves quality of predictions above any of the single sources. Currently predictions are available for Homo sapiens, Mus musculus and Rattus norvegicus (the last one through homology translation).

Maintained by T. Ian Simpson. Last updated 5 months ago.

software classification microarray sequencing mirna

8.8 match 4.94 score 44 scripts

yingjie4science

SDGdetector:Detect SDGs and Targets in Text

Identify 17 Sustainable Development Goals and associated 169 targets in text.

Maintained by Yingjie Li. Last updated 6 months ago.

sdg sdgs sustainability sustainable-development-goals text-mining

10.5 match 14 stars 4.15 score 10 scripts

bioc

maftools:Summarize, Analyze and Visualize MAF Files

Analyze and visualize Mutation Annotation Format (MAF) files from large scale sequencing studies. This package provides various functions to perform most commonly used analyses in cancer genomics and to create feature rich customizable visualzations with minimal effort.

Maintained by Anand Mayakonda. Last updated 5 months ago.

datarepresentation dnaseq visualization drivermutation variantannotation featureextraction classification somaticmutation sequencing functionalgenomics survival bioinformatics cancer-genome-atlas cancer-genomics genomics maf-files tcga curl bzip2 xz-utils zlib

3.0 match 459 stars 14.63 score 948 scripts 18 dependents

chris-prener

areal:Areal Weighted Interpolation

A pipeable, transparent implementation of areal weighted interpolation with support for interpolating multiple variables in a single function call. These tools provide a full-featured workflow for validation and estimation that fits into both modern data management (e.g. tidyverse) and spatial data (e.g. sf) frameworks.

Maintained by Christopher Prener. Last updated 3 years ago.

4.9 match 93 stars 8.88 score 106 scripts 4 dependents

epiverse-trace

epidemics:Composable Epidemic Scenario Modelling

A library of compartmental epidemic models taken from the published literature, and classes to represent affected populations, public health response measures including non-pharmaceutical interventions on social contacts, non-pharmaceutical and pharmaceutical interventions that affect disease transmissibility, vaccination regimes, and disease seasonality, which can be combined to compose epidemic scenario models.

Maintained by Rosalind Eggo. Last updated 9 months ago.

decision-support epidemic-modelling epidemic-simulations epidemiology epiverse infectious-disease-dynamics model-library non-pharmaceutical-interventions rcpp rcppeigen scenario-analysis vaccination cpp

5.7 match 9 stars 7.48 score 59 scripts

usepa

tcpl:ToxCast Data Analysis Pipeline

The ToxCast Data Analysis Pipeline ('tcpl') is an R package that manages, curve-fits, plots, and stores ToxCast data to populate its linked MySQL database, 'invitrodb'. The package was developed for the chemical screening data curated by the US EPA's Toxicity Forecaster (ToxCast) program, but 'tcpl' can be used to support diverse chemical screening efforts.

Maintained by Jason Brown. Last updated 1 days ago.

ccte comptox ord

4.5 match 36 stars 9.41 score 90 scripts

prioriactions

prioriactions:Multi-Action Conservation Planning

This uses a mixed integer mathematical programming (MIP) approach for building and solving multi-action planning problems, where the goal is to find an optimal combination of management actions that abate threats, in an efficient way while accounting for spatial aspects. Thus, optimizing the connectivity and conservation effectiveness of the prioritized units and of the deployed actions. The package is capable of handling different commercial (gurobi, CPLEX) and non-commercial (symphony, CBC) MIP solvers. Gurobi optimization solver can be installed using comprehensive instructions in the 'gurobi' installation vignette of the prioritizr package (available in <https://prioritizr.net/articles/gurobi_installation_guide.html>). Instead, 'CPLEX' optimization solver can be obtain from IBM CPLEX web page (available here <https://www.ibm.com/es-es/products/ilog-cplex-optimization-studio>). Additionally, the 'rcbc' R package (available at <https://github.com/dirkschumacher/rcbc>) can be used to obtain solutions using the CBC optimization software (<https://github.com/coin-or/Cbc>). Methods used in the package refers to Salgado-Rojas et al. (2020) <doi:10.1016/j.ecolmodel.2019.108901>, Beyer et al. (2016) <doi:10.1016/j.ecolmodel.2016.02.005>, Cattarino et al. (2015) <doi:10.1371/journal.pone.0128027> and Watts et al. (2009) <doi:10.1016/j.envsoft.2009.06.005>. See the prioriactions website for more information, documentations and examples.

Maintained by Jose Salgado-Rojas. Last updated 2 years ago.

conservation conservation-plan optimization prioritization threats cpp

7.8 match 10 stars 5.40 score 6 scripts

scmethods

scregclust:Reconstructing the Regulatory Programs of Target Genes in scRNA-Seq Data

Implementation of the scregclust algorithm described in Larsson, Held, et al. (2024) <doi:10.1038/s41467-024-53954-3> which reconstructs regulatory programs of target genes in scRNA-seq data. Target genes are clustered into modules and each module is associated with a linear model describing the regulatory program.

Maintained by Felix Held. Last updated 2 months ago.

clustering regulatory-programs scrna-seq-analysis cpp openmp

6.5 match 9 stars 6.45 score 21 scripts

johnaponte

repana:Repeatable Analysis in R

Set of utilities to facilitate the reproduction of analysis in R. It allow to make_structure(), clean_structure(), and run and log programs in a predefined order to allow secondary files, analysis and reports be constructed in an ordered and reproducible form.

Maintained by John J. Aponte. Last updated 21 days ago.

7.0 match 5 stars 5.98 score 19 scripts

bioc

GSCA:GSCA: Gene Set Context Analysis

GSCA takes as input several lists of activated and repressed genes. GSCA then searches through a compendium of publicly available gene expression profiles for biological contexts that are enriched with a specified pattern of gene expression. GSCA provides both traditional R functions and interactive, user-friendly user interface.

Maintained by Zhicheng Ji. Last updated 5 months ago.

geneexpression visualization gui

8.2 match 5.00 score 5 scripts

cibiobcg

EthSEQ:Ethnicity Annotation from Whole-Exome and Targeted Sequencing Data

Reliable and rapid ethnicity annotation from whole exome and targeted sequencing data.

Maintained by Alessandro Romanel. Last updated 2 years ago.

ethnicity-analysis exome-sequencing genotype-data cpp

7.9 match 15 stars 5.18 score 10 scripts

cran

sna:Tools for Social Network Analysis

A range of tools for social network analysis, including node and graph-level indices, structural distance and covariance methods, structural equivalence detection, network regression, random graph generation, and 2D/3D network visualization.

Maintained by Carter T. Butts. Last updated 6 months ago.

6.0 match 8 stars 6.78 score 94 dependents

bioc

TargetDecoy:Diagnostic Plots to Evaluate the Target Decoy Approach

A first step in the data analysis of Mass Spectrometry (MS) based proteomics data is to identify peptides and proteins. With this respect the huge number of experimental mass spectra typically have to be assigned to theoretical peptides derived from a sequence database. Search engines are used for this purpose. These tools compare each of the observed spectra to all candidate theoretical spectra derived from the sequence data base and calculate a score for each comparison. The observed spectrum is then assigned to the theoretical peptide with the best score, which is also referred to as the peptide to spectrum match (PSM). It is of course crucial for the downstream analysis to evaluate the quality of these matches. Therefore False Discovery Rate (FDR) control is used to return a reliable list PSMs. The FDR, however, requires a good characterisation of the score distribution of PSMs that are matched to the wrong peptide (bad target hits). In proteomics, the target decoy approach (TDA) is typically used for this purpose. The TDA method matches the spectra to a database of real (targets) and nonsense peptides (decoys). A popular approach to generate these decoys is to reverse the target database. Hence, all the PSMs that match to a decoy are known to be bad hits and the distribution of their scores are used to estimate the distribution of the bad scoring target PSMs. A crucial assumption of the TDA is that the decoy PSM hits have similar properties as bad target hits so that the decoy PSM scores are a good simulation of the target PSM scores. Users, however, typically do not evaluate these assumptions. To this end we developed TargetDecoy to generate diagnostic plots to evaluate the quality of the target decoy method.

Maintained by Elke Debrie. Last updated 5 months ago.

massspectrometry proteomics qualitycontrol software visualization bioconductor mass-spectrometry

8.8 match 1 stars 4.60 score 9 scripts

bioc

MSstatsPTM:Statistical Characterization of Post-translational Modifications

MSstatsPTM provides general statistical methods for quantitative characterization of post-translational modifications (PTMs). Supports DDA, DIA, SRM, and tandem mass tag (TMT) labeling. Typically, the analysis involves the quantification of PTM sites (i.e., modified residues) and their corresponding proteins, as well as the integration of the quantification results. MSstatsPTM provides functions for summarization, estimation of PTM site abundance, and detection of changes in PTMs across experimental conditions.

Maintained by Devon Kohler. Last updated 4 months ago.

immunooncology massspectrometry proteomics software differentialexpression onechannel twochannel normalization qualitycontrol post-translational-modification cpp

5.0 match 10 stars 7.98 score 36 scripts 2 dependents

sinhrks

ggfortify:Data Visualization Tools for Statistical Analysis Results

Unified plotting tools for statistics commonly used, such as GLM, time series, PCA families, clustering and survival analysis. The package offers a single plotting interface for these analysis results and plots in a unified style using 'ggplot2'.

Maintained by Yuan Tang. Last updated 9 months ago.

2.8 match 529 stars 14.49 score 9.1k scripts 22 dependents

bioc

TargetScore:TargetScore: Infer microRNA targets using microRNA-overexpression data and sequence information

Infer the posterior distributions of microRNA targets by probabilistically modelling the likelihood microRNA-overexpression fold-changes and sequence-based scores. Variaitonal Bayesian Gaussian mixture model (VB-GMM) is applied to log fold-changes and sequence scores to obtain the posteriors of latent variable being the miRNA targets. The final targetScore is computed as the sigmoid-transformed fold-change weighted by the averaged posteriors of target components over all of the features.

Maintained by Yue Li. Last updated 5 months ago.

mirna

9.8 match 4.00 score 9 scripts

alarm-redist

redist:Simulation Methods for Legislative Redistricting

Enables researchers to sample redistricting plans from a pre-specified target distribution using Sequential Monte Carlo and Markov Chain Monte Carlo algorithms. The package allows for the implementation of various constraints in the redistricting process such as geographic compactness and population parity requirements. Tools for analysis such as computation of various summary statistics and plotting functionality are also included. The package implements the SMC algorithm of McCartan and Imai (2023) <doi:10.1214/23-AOAS1763>, the enumeration algorithm of Fifield, Imai, Kawahara, and Kenny (2020) <doi:10.1080/2330443X.2020.1791773>, the Flip MCMC algorithm of Fifield, Higgins, Imai and Tarr (2020) <doi:10.1080/10618600.2020.1739532>, the Merge-split/Recombination algorithms of Carter et al. (2019) <arXiv:1911.01503> and DeFord et al. (2021) <doi:10.1162/99608f92.eb30390f>, and the Short-burst optimization algorithm of Cannon et al. (2020) <arXiv:2011.02288>.

Maintained by Christopher T. Kenny. Last updated 2 months ago.

geospatial gerrymandering redistricting sampling openblas cpp openmp

4.3 match 68 stars 9.17 score 259 scripts

ohdsi

OhdsiReportGenerator:Observational Health Data Sciences and Informatics Report Generator

Extract results into R from the Observational Health Data Sciences and Informatics result database (see <https://ohdsi.github.io/Strategus/results-schema/index.html>) and generate reports/presentations via 'quarto' that summarize results in HTML format. Learn more about 'OhdsiReportGenerator' at <https://ohdsi.github.io/OhdsiReportGenerator/>.

Maintained by Jenna Reps. Last updated 17 days ago.

openjdk

8.6 match 4.54 score 2 scripts

aphalo

ggspectra:Extensions to 'ggplot2' for Radiation Spectra

Additional annotations, stats, geoms and scales for plotting "light" spectra with 'ggplot2', together with specializations of ggplot() and autoplot() methods for spectral data and waveband definitions stored in objects of classes defined in package 'photobiology'. Part of the 'r4photobiology' suite, Aphalo P. J. (2015) <doi:10.19232/uv4pb.2015.1.14>.

Maintained by Pedro J. Aphalo. Last updated 15 hours ago.

dataviz ggplot2-autoplot ggplot2-enhancementes ggplot2-geoms ggplot2-scales ggplot2-stats light r4photobiology-suite radiation spectra

4.8 match 5 stars 8.09 score 390 scripts 1 dependents

kwstat

agridat:Agricultural Datasets

Datasets from books, papers, and websites related to agriculture. Example graphics and analyses are included. Data come from small-plot trials, multi-environment trials, uniformity trials, yield monitors, and more.

Maintained by Kevin Wright. Last updated 26 days ago.

data

3.5 match 125 stars 11.02 score 1.7k scripts 2 dependents

bioc

crisprVerse:Easily install and load the crisprVerse ecosystem for CRISPR gRNA design

The crisprVerse is a modular ecosystem of R packages developed for the design and manipulation of CRISPR guide RNAs (gRNAs). All packages share a common language and design principles. This package is designed to make it easy to install and load the crisprVerse packages in a single step. To learn more about the crisprVerse, visit <https://www.github.com/crisprVerse>.

Maintained by Jean-Philippe Fortin. Last updated 5 months ago.

crispr functionalgenomics genetarget crispr-analysis crispr-design crispr-target grna grna-sequence grna-sequences

7.5 match 13 stars 5.11 score 8 scripts

centerforstatistics-ugent

xnet:Two-Step Kernel Ridge Regression for Network Predictions

Fit a two-step kernel ridge regression model for predicting edges in networks, and carry out cross-validation using shortcuts for swift and accurate performance assessment (Stock et al, 2018 <doi:10.1093/bib/bby095> ).

Maintained by Joris Meys. Last updated 4 years ago.

7.2 match 11 stars 5.30 score 12 scripts

nbarrowman

vtree:Display Information About Nested Subsets of a Data Frame

A tool for calculating and drawing "variable trees". Variable trees display information about nested subsets of a data frame.

Maintained by Nick Barrowman. Last updated 1 years ago.

data-science data-visualization exploratory-data-analysis statistics

5.4 match 76 stars 7.09 score 65 scripts

osofr

simcausal:Simulating Longitudinal Data with Causal Inference Applications

A flexible tool for simulating complex longitudinal data using structural equations, with emphasis on problems in causal inference. Specify interventions and simulate from intervened data generating distributions. Define and evaluate treatment-specific means, the average treatment effects and coefficients from working marginal structural models. User interface designed to facilitate the conduct of transparent and reproducible simulation studies, and allows concise expression of complex functional dependencies for a large number of time-varying nodes. See the package vignette for more information, documentation and examples.

Maintained by Oleg Sofrygin. Last updated 8 months ago.

counterfactual-data sem simulated-network simulating-data structural-equations

6.3 match 67 stars 6.06 score 170 scripts

dkyleward

ipfr:List Balancing for Reweighting and Population Synthesis

Performs iterative proportional updating given a seed table and an arbitrary number of marginal distributions. This is commonly used in population synthesis, survey raking, matrix rebalancing, and other applications. For example, a household survey may be weighted to match the known distribution of households by size from the census. An origin/ destination trip matrix might be balanced to match traffic counts. The approach used by this package is based on a paper from Arizona State University (Ye, Xin, et. al. (2009) <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.537.723&rep=rep1&type=pdf>). Some enhancements have been made to their work including primary and secondary target balance/importance, general marginal agreement, and weight restriction.

Maintained by Kyle Ward. Last updated 5 years ago.

7.5 match 5 stars 5.06 score 23 scripts

ropensci

chopin:Computation of Spatial Data by Hierarchical and Objective Partitioning of Inputs for Parallel Processing

Geospatial data computation is parallelized by grid, hierarchy, or raster files. Based on future and mirai parallel backends, terra and sf functions as well as convenience functions in the package can be distributed over multiple threads. The simplest way of parallelizing generic geospatial computation is to start from `par_pad_*` functions to `par_grid`, `par_hierarchy`, or `par_multirasters` functions. Virtually any functions accepting classes in terra or sf packages can be used in the three parallelization functions. A common raster-vector overlay operation is provided as a function `extract_at`, which uses exactextractr, with options for kernel weights for summarizing raster values at vector geometries. Other convenience functions for vector-vector operations including simple areal interpolation (`summarize_aw`) and summation of exponentially decaying weights (`summarize_sedc`) are also provided.

Maintained by Insang Song. Last updated 13 days ago.

6.2 match 16 stars 6.11 score 23 scripts

rstudio

pointblank:Data Validation and Organization of Metadata for Local and Remote Tables

Validate data in data frames, 'tibble' objects, 'Spark' 'DataFrames', and database tables. Validation pipelines can be made using easily-readable, consecutive validation steps. Upon execution of the validation plan, several reporting options are available. User-defined thresholds for failure rates allow for the determination of appropriate reporting actions. Many other workflows are available including an information management workflow, where the aim is to record, collect, and generate useful information on data tables.

Maintained by Richard Iannone. Last updated 8 days ago.

data-assertions data-checker data-dictionaries data-frames data-inference data-management data-profiler data-quality data-validation data-verification database-tables easy-to-understand reporting-tool schema-validation testing-tools yaml-configuration

3.5 match 932 stars 10.59 score 284 scripts

bioc

MIRit:Integrate microRNA and gene expression to decipher pathway complexity

MIRit is an R package that provides several methods for investigating the relationships between miRNAs and genes in different biological conditions. In particular, MIRit allows to explore the functions of dysregulated miRNAs, and makes it possible to identify miRNA-gene regulatory axes that control biological pathways, thus enabling the users to unveil the complexity of miRNA biology. MIRit is an all-in-one framework that aims to help researchers in all the central aspects of an integrative miRNA-mRNA analyses, from differential expression analysis to network characterization.

Maintained by Jacopo Ronchi. Last updated 18 hours ago.

software generegulation networkenrichment networkinference epigenetics functionalgenomics systemsbiology network pathways geneexpression differentialexpression mirna mirna-mrna-interaction mirna-seq mirnaseq-analysis cpp

9.3 match 4.00 score 2 scripts

usaid-oha-si

grabr:OHA/SI APIs Package

Provides a series of base functions useful to the GH OHA SI team. These function extend the utility functions in glamr, focusing primarily on API utility functions.

Maintained by Aaron Chafetz. Last updated 6 months ago.

7.2 match 1 stars 5.14 score 69 scripts

r-spatial

RSAGA:SAGA Geoprocessing and Terrain Analysis

Provides access to geocomputing and terrain analysis functions of the geographical information system (GIS) 'SAGA' (System for Automated Geoscientific Analyses) from within R by running the command line version of SAGA. This package furthermore provides several R functions for handling ASCII grids, including a flexible framework for applying local functions (including predict methods of fitted models) and focal functions to multiple grids. SAGA GIS is available under GPL-2 / LGPL-2 licences from <https://sourceforge.net/projects/saga-gis/>.

Maintained by Alexander Brenning. Last updated 1 months ago.

4.1 match 23 stars 8.72 score 275 scripts

bioc

DNAfusion:Identification of gene fusions using paired-end sequencing

DNAfusion can identify gene fusions such as EML4-ALK based on paired-end sequencing results. This package was developed using position deduplicated BAM files generated with the AVENIO Oncology Analysis Software. These files are made using the AVENIO ctDNA surveillance kit and Illumina Nextseq 500 sequencing. This is a targeted hybridization NGS approach and includes ALK-specific but not EML4-specific probes.

Maintained by Christoffer Trier Maansson. Last updated 5 months ago.

targetedresequencing genetics genefusiondetection sequencing bioconductor-package circulating-tumor-dna gene-fusion liquid-biopsy next-generation-sequencing targeted-sequencing variant-calling

8.0 match 3 stars 4.48 score 10 scripts

bioc

oppti:Outlier Protein and Phosphosite Target Identifier

The aim of oppti is to analyze protein (and phosphosite) expressions to find outlying markers for each sample in the given cohort(s) for the discovery of personalized actionable targets.

Maintained by Abdulkadir Elmas. Last updated 5 months ago.

proteomics regression differentialexpression biomedicalinformatics genetarget geneexpression network

8.3 match 2 stars 4.30 score 2 scripts

virgile-baudrot

morse:Modelling Reproduction and Survival Data in Ecotoxicology

Advanced methods for a valuable quantitative environmental risk assessment using Bayesian inference of survival and reproduction Data. Among others, it facilitates Bayesian inference of the general unified threshold model of survival (GUTS). See our companion paper Baudrot and Charles (2021) <doi:10.21105/joss.03200>, as well as complementary details in Baudrot et al. (2018) <doi:10.1021/acs.est.7b05464> and Delignette-Muller et al. (2017) <doi:10.1021/acs.est.6b05326>.

Maintained by Virgile Baudrot. Last updated 6 months ago.

jags cpp

11.0 match 3.26 score 60 scripts

munterfi

eRTG3D:Empirically Informed Random Trajectory Generation in 3-D

Creates realistic random trajectories in a 3-D space between two given fix points, so-called conditional empirical random walks (CERWs). The trajectory generation is based on empirical distribution functions extracted from observed trajectories (training data) and thus reflects the geometrical movement characteristics of the mover. A digital elevation model (DEM), representing the Earth's surface, and a background layer of probabilities (e.g. food sources, uplift potential, waterbodies, etc.) can be used to influence the trajectories. Unterfinger M (2018). "3-D Trajectory Simulation in Movement Ecology: Conditional Empirical Random Walk". Master's thesis, University of Zurich. <https://www.geo.uzh.ch/dam/jcr:6194e41e-055c-4635-9807-53c5a54a3be7/MasterThesis_Unterfinger_2018.pdf>. Technitis G, Weibel R, Kranstauber B, Safi K (2016). "An algorithm for empirically informed random trajectory generation between two endpoints". GIScience 2016: Ninth International Conference on Geographic Information Science, 9, online. <doi:10.5167/uzh-130652>.

Maintained by Merlin Unterfinger. Last updated 3 years ago.

3d birds conditional-empirical-random-walk gliding-and-soaring machine-learning movement-ecology random-trajectory-generator random-walk simulation trajectory-generation

6.3 match 6 stars 5.71 score 19 scripts

nhejazi

medoutcon:Efficient Natural and Interventional Causal Mediation Analysis

Efficient estimators of interventional (in)direct effects in the presence of mediator-outcome confounding affected by exposure. The effects estimated allow for the impact of the exposure on the outcome through a direct path to be disentangled from that through mediators, even in the presence of intermediate confounders that complicate such a relationship. Currently supported are non-parametric efficient one-step and targeted minimum loss estimators based on the formulation of Díaz, Hejazi, Rudolph, and van der Laan (2020) <doi:10.1093/biomet/asaa085>. Support for efficient estimation of the natural (in)direct effects is also provided, appropriate for settings in which intermediate confounders are absent. The package also supports estimation of these effects when the mediators are measured using outcome-dependent two-phase sampling designs (e.g., case-cohort).

Maintained by Nima Hejazi. Last updated 1 years ago.

causal-inference causal-machine-learning inverse-probability-weights machine-learning mediation-analysis stochastic-interventions targeted-learning treatment-effects

8.0 match 13 stars 4.46 score 22 scripts

bioc

crisprBwa:BWA-based alignment of CRISPR gRNA spacer sequences

Provides a user-friendly interface to map on-targets and off-targets of CRISPR gRNA spacer sequences using bwa. The alignment is fast, and can be performed using either commonly-used or custom CRISPR nucleases. The alignment can work with any reference or custom genomes. Currently not supported on Windows machines.

Maintained by Jean-Philippe Fortin. Last updated 5 months ago.

crispr functionalgenomics alignment aligner bioconductor bioconductor-package bwa crispr-analysis crispr-cas9 crispr-design crispr-target grna grna-sequence grna-sequences sgrna sgrna-design

8.3 match 1 stars 4.30 score 6 scripts

patzaw

BED:Biological Entity Dictionary (BED)

An interface for the 'Neo4j' database providing mapping between different identifiers of biological entities. This Biological Entity Dictionary (BED) has been developed to address three main challenges. The first one is related to the completeness of identifier mappings. Indeed, direct mapping information provided by the different systems are not always complete and can be enriched by mappings provided by other resources. More interestingly, direct mappings not identified by any of these resources can be indirectly inferred by using mappings to a third reference. For example, many human Ensembl gene ID are not directly mapped to any Entrez gene ID but such mappings can be inferred using respective mappings to HGNC ID. The second challenge is related to the mapping of deprecated identifiers. Indeed, entity identifiers can change from one resource release to another. The identifier history is provided by some resources, such as Ensembl or the NCBI, but it is generally not used by mapping tools. The third challenge is related to the automation of the mapping process according to the relationships between the biological entities of interest. Indeed, mapping between gene and protein ID scopes should not be done the same way than between two scopes regarding gene ID. Also, converting identifiers from different organisms should be possible using gene orthologs information. The method has been published by Godard and van Eyll (2018) <doi:10.12688/f1000research.13925.3>.

Maintained by Patrice Godard. Last updated 3 months ago.

5.2 match 8 stars 6.85 score 25 scripts

nwaller

fungible:Psychometric Functions from the Waller Lab

Computes fungible coefficients and Monte Carlo data. Underlying theory for these functions is described in the following publications: Waller, N. (2008). Fungible Weights in Multiple Regression. Psychometrika, 73(4), 691-703, <DOI:10.1007/s11336-008-9066-z>. Waller, N. & Jones, J. (2009). Locating the Extrema of Fungible Regression Weights. Psychometrika, 74(4), 589-602, <DOI:10.1007/s11336-008-9087-7>. Waller, N. G. (2016). Fungible Correlation Matrices: A Method for Generating Nonsingular, Singular, and Improper Correlation Matrices for Monte Carlo Research. Multivariate Behavioral Research, 51(4), 554-568. Jones, J. A. & Waller, N. G. (2015). The normal-theory and asymptotic distribution-free (ADF) covariance matrix of standardized regression coefficients: theoretical extensions and finite sample behavior. Psychometrika, 80, 365-378, <DOI:10.1007/s11336-013-9380-y>. Waller, N. G. (2018). Direct Schmid-Leiman transformations and rank-deficient loadings matrices. Psychometrika, 83, 858-870. <DOI:10.1007/s11336-017-9599-0>.

Maintained by Niels Waller. Last updated 1 years ago.

7.1 match 5.01 score 136 scripts 8 dependents

ngreifer

optweight:Targeted Stable Balancing Weights Using Optimization

Use optimization to estimate weights that balance covariates for binary, multinomial, and continuous treatments in the spirit of Zubizarreta (2015) <doi:10.1080/01621459.2015.1023805>. The degree of balance can be specified for each covariate. In addition, sampling weights can be estimated that allow a sample to generalize to a population specified with given target moments of covariates.

Maintained by Noah Greifer. Last updated 2 years ago.

causal-inference inverse-probability-weights observational-study optimization propensity-scores

9.4 match 8 stars 3.78 score 15 scripts

bioc

spatialHeatmap:spatialHeatmap: Visualizing Spatial Assays in Anatomical Images and Large-Scale Data Extensions

The spatialHeatmap package offers the primary functionality for visualizing cell-, tissue- and organ-specific assay data in spatial anatomical images. Additionally, it provides extended functionalities for large-scale data mining routines and co-visualizing bulk and single-cell data. A description of the project is available here: https://spatialheatmap.org.

Maintained by Jianhai Zhang. Last updated 4 months ago.

spatial visualization microarray sequencing geneexpression datarepresentation network clustering graphandnetwork cellbasedassays atacseq dnaseq tissuemicroarray singlecell cellbiology genetarget

5.6 match 5 stars 6.26 score 12 scripts

bioc

PanomiR:Detection of miRNAs that regulate interacting groups of pathways

PanomiR is a package to detect miRNAs that target groups of pathways from gene expression data. This package provides functionality for generating pathway activity profiles, determining differentially activated pathways between user-specified conditions, determining clusters of pathways via the PCxN package, and generating miRNAs targeting clusters of pathways. These function can be used separately or sequentially to analyze RNA-Seq data.

Maintained by Pourya Naderi. Last updated 5 months ago.

geneexpression genesetenrichment genetarget mirna pathways

7.2 match 3 stars 4.89 score 13 scripts

bioc

GeomxTools:NanoString GeoMx Tools

Tools for NanoString Technologies GeoMx Technology. Package provides functions for reading in DCC and PKC files based on an ExpressionSet derived object. Normalization and QC functions are also included.

Maintained by Maddy Griswold. Last updated 5 months ago.

geneexpression transcription cellbasedassays dataimport transcriptomics proteomics mrnamicroarray proprietaryplatforms rnaseq sequencing experimentaldesign normalization spatial

4.9 match 7.11 score 239 scripts 3 dependents

nlmixr2

nlmixr2targets:Targets for 'nlmixr2' Pipelines

'nlmixr2' often has long runtimes. A pipeline toolkit tailored to 'nlmixr2' workflows leverages 'targets' and 'nlmixr2' to ease reproducible workflows. 'nlmixr2targets' ensures minimal rework in model development with 'nlmixr2' and 'targets' by simplifying and standardizing models and datasets.

Maintained by Bill Denney. Last updated 20 days ago.

10.9 match 3.18 score 6 scripts

mlopez-ibanez

irace:Iterated Racing for Automatic Algorithm Configuration

Iterated race is an extension of the Iterated F-race method for the automatic configuration of optimization algorithms, that is, (offline) tuning their parameters by finding the most appropriate settings given a set of instances of an optimization problem. M. López-Ibáñez, J. Dubois-Lacoste, L. Pérez Cáceres, T. Stützle, and M. Birattari (2016) <doi:10.1016/j.orp.2016.09.002>.

Maintained by Manuel López-Ibáñez. Last updated 29 days ago.

algorithm-configuration hyperparameter-tuning irace optimization-algorithms

3.4 match 63 stars 10.28 score 103 scripts 1 dependents

kgoldfeld

simstudy:Simulation of Study Data

Simulates data sets in order to explore modeling techniques or better understand data generating processes. The user specifies a set of relationships between covariates, and generates data based on these specifications. The final data sets can represent data from randomized control trials, repeated measure (longitudinal) designs, and cluster randomized trials. Missingness can be generated using various mechanisms (MCAR, MAR, NMAR).

Maintained by Keith Goldfeld. Last updated 8 months ago.

data-generation data-simulation simulation statistical-models cpp

3.1 match 82 stars 11.00 score 972 scripts 1 dependents

natverse

nat:NeuroAnatomy Toolbox for Analysis of 3D Image Data

NeuroAnatomy Toolbox (nat) enables analysis and visualisation of 3D biological image data, especially traced neurons. Reads and writes 3D images in NRRD and 'Amira' AmiraMesh formats and reads surfaces in 'Amira' hxsurf format. Traced neurons can be imported from and written to SWC and 'Amira' LineSet and SkeletonGraph formats. These data can then be visualised in 3D via 'rgl', manipulated including applying calculated registrations, e.g. using the 'CMTK' registration suite, and analysed. There is also a simple representation for neurons that have been subjected to 3D skeletonisation but not formally traced; this allows morphological comparison between neurons including searches and clustering (via the 'nat.nblast' extension package).

Maintained by Gregory Jefferis. Last updated 5 months ago.

3d connectomics image-analysis neuroanatomy neuroanatomy-toolbox neuron neuron-morphology neuroscience visualisation

3.4 match 67 stars 9.94 score 436 scripts 2 dependents

bioc

NTW:Predict gene network using an Ordinary Differential Equation (ODE) based method

This package predicts the gene-gene interaction network and identifies the direct transcriptional targets of the perturbation using an ODE (Ordinary Differential Equation) based method.

Maintained by Yuanhua Liu. Last updated 5 months ago.

preprocessing

9.0 match 3.78 score 1 scripts

bioc

sSeq:Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size

The purpose of this package is to discover the genes that are differentially expressed between two conditions in RNA-seq experiments. Gene expression is measured in counts of transcripts and modeled with the Negative Binomial (NB) distribution using a shrinkage approach for dispersion estimation. The method of moment (MM) estimates for dispersion are shrunk towards an estimated target, which minimizes the average squared difference between the shrinkage estimates and the initial estimates. The exact per-gene probability under the NB model is calculated, and used to test the hypothesis that the expected expression of a gene in two conditions identically follow a NB distribution.

Maintained by Danni Yu. Last updated 5 months ago.

immunooncology rnaseq

6.8 match 4.98 score 4 scripts 2 dependents

insightsengineering

teal:Exploratory Web Apps for Analyzing Clinical Trials Data

A 'shiny' based interactive exploration framework for analyzing clinical trials data. 'teal' currently provides a dynamic filtering facility and different data viewers. 'teal' 'shiny' applications are built using standard 'shiny' modules.

Maintained by Dawid Kaledkowski. Last updated 19 days ago.

clinical-trials nest shiny webapp

2.7 match 197 stars 12.68 score 176 scripts 5 dependents

bioc

netZooR:Unified methods for the inference and analysis of gene regulatory networks

netZooR unifies the implementations of several Network Zoo methods (netzoo, netzoo.github.io) into a single package by creating interfaces between network inference and network analysis methods. Currently, the package has 3 methods for network inference including PANDA and its optimized implementation OTTER (network reconstruction using mutliple lines of biological evidence), LIONESS (single-sample network inference), and EGRET (genotype-specific networks). Network analysis methods include CONDOR (community detection), ALPACA (differential community detection), CRANE (significance estimation of differential modules), MONSTER (estimation of network transition states). In addition, YARN allows to process gene expresssion data for tissue-specific analyses and SAMBAR infers missing mutation data based on pathway information.

Maintained by Tara Eicher. Last updated 7 days ago.

networkinference network generegulation geneexpression transcription microarray graphandnetwork gene-regulatory-network transcription-factors

4.2 match 105 stars 7.98 score

tlverse

tmle3mediate:Targeted Learning for Causal Mediation Analysis

Targeted maximum likelihood (TML) estimation of population-level causal effects in mediation analysis. The causal effects are defined by joint static or stochastic interventions applied to the exposure and the mediator. Targeted doubly robust estimators are provided for the classical natural direct and indirect effects, as well as the more recently developed population intervention direct and indirect effects.

Maintained by Nima Hejazi. Last updated 4 years ago.

causal-inference causal-mediation-analysis machine-learning mediation-analysis stochastic-interventions targeted-learning treatment-effects

11.3 match 6 stars 2.98 score 16 scripts

zarquon42b

Rvcg:Manipulations of Triangular Meshes Based on the 'VCGLIB' API

Operations on triangular meshes based on 'VCGLIB'. This package integrates nicely with the R-package 'rgl' to render the meshes processed by 'Rvcg'. The Visualization and Computer Graphics Library (VCG for short) is an open source portable C++ templated library for manipulation, processing and displaying with OpenGL of triangle and tetrahedral meshes. The library, composed by more than 100k lines of code, is released under the GPL license, and it is the base of most of the software tools of the Visual Computing Lab of the Italian National Research Council Institute ISTI <https://vcg.isti.cnr.it/>, like 'metro' and 'MeshLab'. The 'VCGLIB' source is pulled from trunk <https://github.com/cnr-isti-vclab/vcglib> and patched to work with options determined by the configure script as well as to work with the header files included by 'RcppEigen'.

Maintained by Stefan Schlager. Last updated 5 months ago.

cpp openmp

3.4 match 25 stars 10.05 score 195 scripts 29 dependents

blasbenito

collinear:Automated Multicollinearity Management

Effortless multicollinearity management in data frames with both numeric and categorical variables for statistical and machine learning applications. The package simplifies multicollinearity analysis by combining four robust methods: 1) target encoding for categorical variables (Micci-Barreca, D. 2001 <doi:10.1145/507533.507538>); 2) automated feature prioritization to prevent key variable loss during filtering; 3) pairwise correlation for all variable combinations (numeric-numeric, numeric-categorical, categorical-categorical); and 4) fast computation of variance inflation factors.

Maintained by Blas M. Benito. Last updated 2 months ago.

machine-learning multicollinearity statistics

6.1 match 11 stars 5.51 score 15 scripts 1 dependents

hubverse-org

hubExamples:Example Hub Data

This package provides example data for forecasting and scenario modeling hubs in the hubverse format.

Maintained by Evan L Ray. Last updated 2 months ago.

6.2 match 1 stars 5.46 score 20 scripts 1 dependents

rstudio

DT:A Wrapper of the JavaScript Library 'DataTables'

Data objects in R can be rendered as HTML tables using the JavaScript library 'DataTables' (typically via R Markdown or Shiny). The 'DataTables' library has been included in this R package. The package name 'DT' is an abbreviation of 'DataTables'.

Maintained by Joe Cheng. Last updated 3 months ago.

datatables htmlwidgets javascript shiny

1.8 match 604 stars 19.15 score 38k scripts 673 dependents

sfilges

umiAnalyzer:Tools for Analyzing Sequencing Data with Unique Molecular Identifiers

Tools for analyzing sequencing data containing unique molecular identifiers generated by 'UMIErrorCorrect' (<https://github.com/stahlberggroup/umierrorcorrect>).

Maintained by Stefan Filges. Last updated 3 years ago.

targeted-sequencing unique-molecular-identifiers variant-analysis

7.5 match 4.46 score 58 scripts

usdaforestservice

gdalraster:Bindings to the 'Geospatial Data Abstraction Library' Raster API

Interface to the Raster API of the 'Geospatial Data Abstraction Library' ('GDAL', <https://gdal.org>). Bindings are implemented in an exposed C++ class encapsulating a 'GDALDataset' and its raster band objects, along with several stand-alone functions. These support manual creation of uninitialized datasets, creation from existing raster as template, read/set dataset parameters, low level I/O, color tables, raster attribute tables, virtual raster (VRT), and 'gdalwarp' wrapper for reprojection and mosaicing. Includes 'GDAL' algorithms ('dem_proc()', 'polygonize()', 'rasterize()', etc.), and functions for coordinate transformation and spatial reference systems. Calling signatures resemble the native C, C++ and Python APIs provided by the 'GDAL' project. Includes raster 'calc()' to evaluate a given R expression on a layer or stack of layers, with pixel x/y available as variables in the expression; and raster 'combine()' to identify and count unique pixel combinations across multiple input layers, with optional output of the pixel-level combination IDs. Provides raster display using base 'graphics'. Bindings to a subset of the 'OGR' API are also included for managing vector data sources. Bindings to a subset of the Virtual Systems Interface ('VSI') are also included to support operations on 'GDAL' virtual file systems. These are general utility functions that abstract file system operations on URLs, cloud storage services, 'Zip'/'GZip'/'7z'/'RAR' archives, and in-memory files. 'gdalraster' may be useful in applications that need scalable, low-level I/O, or prefer a direct 'GDAL' API.

Maintained by Chris Toney. Last updated 4 hours ago.

gdal geospatial raster vector cpp

3.5 match 42 stars 9.50 score 32 scripts 3 dependents

bioc

TCGAbiolinks:TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data

The aim of TCGAbiolinks is : i) facilitate the GDC open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses and iv) to easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.

Maintained by Tiago Chedraoui Silva. Last updated 25 days ago.

dnamethylation differentialmethylation generegulation geneexpression methylationarray differentialexpression pathways network sequencing survival software bioc bioconductor gdc integrative-analysis tcga tcga-data tcgabiolinks

2.3 match 305 stars 14.45 score 1.6k scripts 6 dependents

bioc

rTRM:Identification of Transcriptional Regulatory Modules from Protein-Protein Interaction Networks

rTRM identifies transcriptional regulatory modules (TRMs) from protein-protein interaction networks.

Maintained by Diego Diez. Last updated 5 months ago.

transcription network generegulation graphandnetwork bioconductor bioinformatics

6.9 match 3 stars 4.86 score 3 scripts 1 dependents

bioc

epiregulon:Gene regulatory network inference from single cell epigenomic data

Gene regulatory networks model the underlying gene regulation hierarchies that drive gene expression and observed phenotypes. Epiregulon infers TF activity in single cells by constructing a gene regulatory network (regulons). This is achieved through integration of scATAC-seq and scRNA-seq data and incorporation of public bulk TF ChIP-seq data. Links between regulatory elements and their target genes are established by computing correlations between chromatin accessibility and gene expressions.

Maintained by Xiaosai Yao. Last updated 5 days ago.

singlecell generegulation networkinference network geneexpression transcription genetarget cpp

5.0 match 14 stars 6.67 score 17 scripts

bioc

aroma.light:Light-Weight Methods for Normalization and Visualization of Microarray Data using Only Basic R Data Types

Methods for microarray analysis that take basic data types such as matrices and lists of vectors. These methods can be used standalone, be utilized in other packages, or be wrapped up in higher-level classes.

Maintained by Henrik Bengtsson. Last updated 5 months ago.

infrastructure microarray onechannel twochannel multichannel visualization preprocessing bioconductor

5.1 match 1 stars 6.43 score 26 scripts 20 dependents