R-universe search: discover

promidat

discoveR:Exploratory Data Analysis System

Performs an exploratory data analysis through a 'shiny' interface. It includes basic methods such as the mean, median, mode, normality test, among others. It also includes clustering techniques such as Principal Components Analysis, Hierarchical Clustering and the K-Means Method.

Maintained by Oldemar Rodriguez. Last updated 2 years ago.

50.0 match 3 stars 3.03 score 18 scripts

doi-usgs

nhdplusTools:NHDPlus Tools

Tools for traversing and working with National Hydrography Dataset Plus (NHDPlus) data. All methods implemented in 'nhdplusTools' are described in the NHDPlus documentation available from the US Environmental Protection Agency <https://www.epa.gov/waterdata/basic-information>.

Maintained by David Blodgett. Last updated 25 days ago.

5.6 match 87 stars 11.38 score 348 scripts 5 dependents

bioc

metagenomeSeq:Statistical analysis for sparse high-throughput sequencing

metagenomeSeq is designed to determine features (be it Operational Taxonomic Unit (OTU), species, etc.) that are differentially abundant between two or more groups of multiple samples. metagenomeSeq is designed to address the effects of both normalization and under-sampling of microbial communities on disease association detection and the testing of feature correlations.

Maintained by Joseph N. Paulson. Last updated 3 months ago.

immunooncology classification clustering geneticvariability differentialexpression microbiome metagenomics normalization visualization multiplecomparison sequencing software

5.2 match 69 stars 12.02 score 494 scripts 7 dependents

sergejruff

Virusparies:Visualize and Process Output from 'VirusHunterGatherer'

A collection of tools for downstream analysis of 'VirusHunterGatherer' output. Processing of hittables and plotting of results, enabling better interpretation, is made easier with the provided functions.

Maintained by Ruff Sergej. Last updated 3 months ago.

bioinformatics data-driven discover discovery ggplot2 graphical-table hidden-markov-model hmmlearn plot r-programming summary-statistics virus virus-discovery virus-scanning virusgatherer virushunter virushuntergatherer visualization

10.0 match 1 star 4.49 score 28 scripts

rstudio

pins:Pin, Discover, and Share Resources

Publish data sets, models, and other R objects, making it easy to share them across projects and with your colleagues. You can pin objects to a variety of "boards", including local folders (to share on a networked drive or with 'DropBox'), 'Posit Connect', 'AWS S3', and more.

Maintained by Julia Silge. Last updated 1 month ago.

azure gcloud rpins rsconnect s3 storage

3.1 match 321 stars 14.17 score 1.9k scripts 17 dependents

bioc

CatsCradle:This package provides methods for analysing spatial transcriptomics data and for discovering gene clusters

This package addresses two broad areas. It allows for in-depth analysis of spatial transcriptomic data by identifying tissue neighbourhoods. These are contiguous regions of tissue surrounding individual cells. 'CatsCradle' allows for the categorisation of neighbourhoods by the cell types contained in them and the genes expressed in them. In particular, it produces Seurat objects whose individual elements are neighbourhoods rather than cells. In addition, it enables the categorisation and annotation of genes by producing Seurat objects whose elements are genes.

Maintained by Michael Shapiro. Last updated 1 month ago.

biologicalquestion statisticalmethod geneexpression singlecell transcriptomics spatial

6.0 match 3 stars 6.50 score

rstudio

reticulate:Interface to 'Python'

Interface to 'Python' modules, classes, and functions. When calling into 'Python', R data types are automatically converted to their equivalent 'Python' types. When values are returned from 'Python' to R they are converted back to R types. Compatible with all versions of 'Python' >= 2.7.

Maintained by Tomasz Kalinowski. Last updated 1 day ago.

cpp

1.8 match 1.7k stars 21.07 score 18k scripts 427 dependents

bioc

tradeSeq:trajectory-based differential expression analysis for sequencing data

tradeSeq provides a flexible method for fitting regression models that can be used to find genes that are differentially expressed along one or multiple lineages in a trajectory. Based on the fitted models, it uses a variety of tests suited to answer different questions of interest, e.g. the discovery of genes for which expression is associated with pseudotime, or which are differentially expressed (in a specific region) along the trajectory. It fits a negative binomial generalized additive model (GAM) for each gene, and performs inference on the parameters of the GAM.

Maintained by Hector Roux de Bezieux. Last updated 5 months ago.

clustering regression timecourse differentialexpression geneexpression rnaseq sequencing software singlecell transcriptomics multiplecomparison visualization

3.7 match 247 stars 10.06 score 440 scripts

bioc

musicatk:Mutational Signature Comprehensive Analysis Toolkit

Mutational signatures are carcinogenic exposures or aberrant cellular processes that can cause alterations to the genome. We created musicatk (MUtational SIgnature Comprehensive Analysis ToolKit) to address shortcomings in versatility and ease of use in other pre-existing computational tools. Although many different types of mutational data have been generated, current software packages do not have a flexible framework to allow users to mix and match different types of mutations in the mutational signature inference process. Musicatk enables users to count and combine multiple mutation types, including SBS, DBS, and indels. Musicatk calculates replication strand, transcription strand and combinations of these features along with discovery from unique and proprietary genomic features associated with any mutation type. Musicatk also implements several methods for discovery of new signatures as well as methods to infer exposure given an existing set of signatures. Musicatk provides functions for visualization and downstream exploratory analysis including the ability to compare signatures between cohorts and find matching signatures in COSMIC V2 or COSMIC V3.

Maintained by Joshua D. Campbell. Last updated 5 months ago.

software biologicalquestion somaticmutation variantannotation

4.9 match 13 stars 7.02 score 20 scripts

dstgithub

GrpString:Patterns and Statistical Differences Between Two Groups of Strings

Methods include converting series of event names to strings, finding common patterns in a group of strings, discovering featured patterns when comparing two groups of strings as well as the number and starting position of each pattern in each string, obtaining transition matrix, computing transition entropy, statistically comparing the difference between two groups of strings, and clustering string groups. Event names can be any action names or labels such as events in log files or areas of interest (AOIs) in eye tracking research.

Maintained by Hui (Tom) Tang. Last updated 7 years ago.

9.2 match 2 stars 3.48 score 30 scripts

bioc

AnnotationHub:Client to access AnnotationHub resources

This package provides a client for the Bioconductor AnnotationHub web resource. The AnnotationHub web resource provides a central location where genomic files (e.g., VCF, bed, wig) and other resources from standard locations (e.g., UCSC, Ensembl) can be discovered. The resource includes metadata about each resource, e.g., a textual description, tags, and date of modification. The client creates and manages a local cache of files retrieved by the user, helping with quick and reproducible access.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

infrastructure dataimport gui thirdpartyclient core-package u24ca289073

2.2 match 17 stars 13.89 score 2.7k scripts 102 dependents

bioconductor

BiocManager:Access the Bioconductor Project Package Repository

A convenient tool to install and update Bioconductor packages.

Maintained by Marcel Ramos. Last updated 30 days ago.

core-services

1.9 match 74 stars 16.47 score 2.9k scripts 414 dependents

tidyverse

dplyr:A Grammar of Data Manipulation

A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

Maintained by Hadley Wickham. Last updated 12 days ago.

data-manipulation grammar cpp

1.3 match 4.8k stars 24.68 score 659k scripts 7.8k dependents

sparklyr

sparklyr:R Interface to Apache Spark

R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.

Maintained by Edgar Ruiz. Last updated 9 days ago.

apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr

2.0 match 959 stars 15.16 score 4.0k scripts 21 dependents

molgenis

MolgenisAuth:'OpenID Connect' Discovery and Authentication

Discover 'OpenID Connect' endpoints and authenticate using device flow. Used by 'MOLGENIS' packages.

Maintained by Mariska Slofstra. Last updated 7 months ago.

jwt

5.4 match 8 stars 5.58 score 5 scripts 2 dependents

bioc

cellxgenedp:Discover and Access Single Cell Data Sets in the CELLxGENE Data Portal

The cellxgene data portal (https://cellxgene.cziscience.com/) provides a graphical user interface to collections of single-cell sequence data processed in standard ways to 'count matrix' summaries. The cellxgenedp package provides an alternative, R-based interface, allowing data discovery, viewing, and downloading.

Maintained by Martin Morgan. Last updated 5 months ago.

singlecell dataimport thirdpartyclient

4.5 match 8 stars 6.64 score 27 scripts

bioc

motifbreakR:A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites

We introduce motifbreakR, which allows the biologist to judge in the first place whether the sequence surrounding the polymorphism is a good match, and in the second place how much information is gained or lost in one allele of the polymorphism relative to another. MotifbreakR is both flexible and extensible over previous offerings; giving a choice of algorithms for interrogation of genomes with motifs from public sources that users can choose from; these are 1) a weighted-sum probability matrix, 2) log-probabilities, and 3) weighted by relative entropy. MotifbreakR can predict effects for novel or previously described variants in public databases, making it suitable for tasks beyond the scope of its original design. Lastly, it can be used to interrogate any genome curated within Bioconductor (currently there are 32 species, a total of 109 versions).

Maintained by Simon Gert Coetzee. Last updated 5 months ago.

chipseq visualization motifannotation transcription

3.2 match 28 stars 8.96 score 103 scripts

bioc

DESpace:DESpace: a framework to discover spatially variable genes

Intuitive framework for identifying spatially variable genes (SVGs) via edgeR, a popular method for performing differential expression analyses. Based on pre-annotated spatial clusters as summarized spatial information, DESpace models gene expression using a negative binomial (NB), via edgeR, with spatial clusters as covariates. SVGs are then identified by testing the significance of spatial clusters. The method is flexible and robust, and is faster than most SV methods. Furthermore, to the best of our knowledge, it is the only SV approach that allows: - performing an SV test on each individual spatial cluster, hence identifying the key regions of the tissue affected by spatial variability; - jointly fitting multiple samples, targeting genes with consistent spatial patterns across replicates.

Maintained by Peiying Cai. Last updated 5 months ago.

spatial singlecell rnaseq transcriptomics geneexpression sequencing differentialexpression statisticalmethod visualization

5.7 match 4 stars 5.02 score 13 scripts

bioc

DifferentialRegulation:Differentially regulated genes from scRNA-seq data

DifferentialRegulation is a method for detecting differentially regulated genes between two groups of samples (e.g., healthy vs. disease, or treated vs. untreated samples), by targeting differences in the balance of spliced and unspliced mRNA abundances, obtained from single-cell RNA-sequencing (scRNA-seq) data. From a mathematical point of view, DifferentialRegulation accounts for the sample-to-sample variability, and embeds multiple samples in a Bayesian hierarchical model. Furthermore, our method also deals with two major sources of mapping uncertainty: i) 'ambiguous' reads, compatible with both spliced and unspliced versions of a gene, and ii) reads mapping to multiple genes. In particular, ambiguous reads are treated separately from spliced and unspliced reads, while reads that are compatible with multiple genes are allocated to the gene of origin. Parameters are inferred via Markov chain Monte Carlo (MCMC) techniques (Metropolis-within-Gibbs).

Maintained by Simone Tiberi. Last updated 5 months ago.

differentialsplicing bayesian genetics rnaseq sequencing differentialexpression geneexpression multiplecomparison software transcription statisticalmethod visualization singlecell genetarget openblas cpp

5.2 match 10 stars 5.30 score 4 scripts

ropensci

gert:Simple Git Client for R

Simple git client for R based on 'libgit2' <https://libgit2.org> with support for SSH and HTTPS remotes. All functions in 'gert' use basic R data types (such as vectors and data-frames) for their arguments and return values. User credentials are shared with command line 'git' through the git-credential store and ssh keys stored on disk or ssh-agent.

Maintained by Jeroen Ooms. Last updated 4 months ago.

libgit2

1.8 match 154 stars 14.82 score 158 scripts 369 dependents

jeroen

curl:A Modern and Flexible Web Client for R

Bindings to 'libcurl' <https://curl.se/libcurl/> for performing fully configurable HTTP/FTP requests where responses can be processed in memory, on disk, or streaming via the callback or connection interfaces. Some knowledge of 'libcurl' is recommended; for a more user-friendly web client see the 'httr2' package which builds on this package with HTTP-specific tools and logic.

Maintained by Jeroen Ooms. Last updated 22 days ago.

curl

1.3 match 225 stars 19.95 score 4.0k scripts 5.8k dependents

shabbychef

cocktailApp:'shiny' App to Discover Cocktails

A 'shiny' app to discover cocktails. The app allows one to search for cocktails by ingredient, filter on rating, and number of ingredients. The package also contains data with the ingredients of nearly 26 thousand cocktails scraped from the web.

Maintained by Steven E. Pav. Last updated 3 years ago.

5.5 match 43 stars 4.33 score 5 scripts

chanzuckerberg

cellxgene.census:CZ CELLxGENE Discover Cell Census

API to facilitate the use of the CZ CELLxGENE Discover Census. For more information about the API and the project visit https://github.com/chanzuckerberg/cellxgene-census/

Maintained by Chan Zuckerberg Initiative Foundation. Last updated 5 months ago.

3.5 match 96 stars 6.60 score 15 scripts

daattali

addinslist:Discover and Install Useful RStudio Addins

Browse through a continuously updated list of existing RStudio addins and install/uninstall their corresponding packages.

Maintained by Dean Attali. Last updated 7 months ago.

3.0 match 850 stars 7.73 score 18 scripts

ropensci

dataaimsr:AIMS Data Platform API Client

AIMS Data Platform API Client which provides easy access to AIMS Data Platform scientific data and information.

Maintained by Diego R. Barneche. Last updated 2 years ago.

aims australia data marine monitoring sst weather

4.5 match 4 stars 5.11 score 54 scripts

bioc

ExperimentHub:Client to access ExperimentHub resources

This package provides a client for the Bioconductor ExperimentHub web resource. ExperimentHub provides a central location where curated data from experiments, publications or training courses can be accessed. Each resource has associated metadata, tags and date of modification. The client creates and manages a local cache of files retrieved, enabling quick and reproducible access.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

infrastructure dataimport gui thirdpartyclient core-package u24ca289073

1.7 match 9 stars 11.98 score 764 scripts 55 dependents

tiledb-inc

tiledb:Modern Database Engine for Complex Data Based on Multi-Dimensional Arrays

The modern database 'TileDB' introduces a powerful on-disk format for storing and accessing any complex data based on multi-dimensional arrays. It supports dense and sparse arrays, dataframes and key-values stores, cloud storage ('S3', 'GCS', 'Azure'), chunked arrays, multiple compression, encryption and checksum filters, uses a fully multi-threaded implementation, supports parallel I/O, data versioning ('time travel'), metadata and groups. It is implemented as an embeddable cross-platform C++ library with APIs from several languages, and integrations. This package provides the R support.

Maintained by Isaiah Norton. Last updated 4 days ago.

array hdfs s3 storage-manager tiledb cpp

1.7 match 107 stars 11.96 score 306 scripts 4 dependents

ropensci

crul:HTTP Client

A simple HTTP client, with tools for making HTTP requests, and mocking HTTP requests. The package is built on R6, and takes inspiration from Ruby's 'faraday' gem (<https://rubygems.org/gems/faraday>). The package name is a play on curl, the widely used command line tool for HTTP, and this package is built on top of the R package 'curl', an interface to 'libcurl' (<https://curl.se/libcurl/>).

Maintained by Scott Chamberlain. Last updated 8 months ago.

http https api web-services curl download libcurl async mocking caching

1.3 match 107 stars 14.00 score 240 scripts 162 dependents

cran4linux

bspm:Bridge to System Package Manager

Enables binary package installations on Linux distributions. Provides functions to manage packages via the distribution's package manager. Also provides transparent integration with R's install.packages() and a fallback mechanism. When installed as a system package, interacts with the system's package manager without requiring administrative privileges via an integrated D-Bus service; otherwise, uses sudo. Currently, the following backends are supported: DNF, APT, ALPM.

Maintained by Iñaki Ucar. Last updated 5 months ago.

automation linux packages

3.0 match 82 stars 6.19 score 2 scripts

bioc

GenomicScores:Infrastructure to work with genomewide position-specific scores

Provide infrastructure to store and access genomewide position-specific scores within R and Bioconductor.

Maintained by Robert Castelo. Last updated 1 month ago.

infrastructure genetics annotation sequencing coverage annotationhubsoftware

2.0 match 8 stars 8.71 score 83 scripts 6 dependents

r-lib

gargle:Utilities for Working with Google APIs

Provides utilities for working with Google APIs <https://developers.google.com/apis-explorer>. This includes functions and classes for handling common credential types and for preparing, executing, and processing HTTP requests.

Maintained by Jennifer Bryan. Last updated 2 years ago.

authentication google

1.2 match 113 stars 14.88 score 266 scripts 192 dependents

bioc

DiscoRhythm:Interactive Workflow for Discovering Rhythmicity in Biological Data

Set of functions for estimation of cyclical characteristics, such as period, phase, amplitude, and statistical significance in large temporal datasets. Supporting functions are available for quality control, dimensionality reduction, spectral analysis, and analysis of experimental replicates. Contains an R Shiny web interface to execute all workflow steps.

Maintained by Matthew Carlucci. Last updated 5 months ago.

software timecourse qualitycontrol visualization gui principalcomponent bioconductor data-visualization oscillations rhythm-detection webapp

2.9 match 13 stars 5.89 score 9 scripts

viascientific

viafoundry:R Client for 'Via Foundry' API

'Via Foundry' API provides streamlined tools for interacting with and extracting data from structured responses, particularly for use cases involving hierarchical data from Foundry's API. It includes functions to fetch and parse process-level and file-level metadata, allowing users to efficiently query and manipulate nested data structures. Key features include the ability to list all unique process names, retrieve file metadata for specific or all processes, and dynamically load or download files based on their type. With built-in support for handling various file formats (e.g., tabular and non-tabular files) and seamless integration with API through authentication, this package is designed to enhance workflows involving large-scale data management and analysis. Robust error handling and flexible configuration ensure reliable performance across diverse data environments. Please consult the documentation for the API endpoint for your installation.

Maintained by Alper Kucukural. Last updated 2 months ago.

5.0 match 3.40 score 3 scripts

bioc

cBioPortalData:Exposes and Makes Available Data from the cBioPortal Web Resources

The cBioPortalData R package accesses study datasets from the cBio Cancer Genomics Portal. It accesses the data either from the pre-packaged zip / tar files or from the API interface that was recently implemented by the cBioPortal Data Team. The package can provide data either in tabular format or as a MultiAssayExperiment object that uses familiar Bioconductor data representations.

Maintained by Marcel Ramos. Last updated 9 days ago.

software infrastructure thirdpartyclient bioconductor-package nci-itcr u24ca289073

1.5 match 33 stars 10.15 score 147 scripts 4 dependents

cran

bnlearn:Bayesian Network Structure Learning, Parameter Learning and Inference

Bayesian network structure learning, parameter learning and inference. This package implements constraint-based (PC, GS, IAMB, Inter-IAMB, Fast-IAMB, MMPC, Hiton-PC, HPC), pairwise (ARACNE and Chow-Liu), score-based (Hill-Climbing and Tabu Search) and hybrid (MMHC, RSMAX2, H2PC) structure learning algorithms for discrete, Gaussian and conditional Gaussian networks, along with many score functions and conditional independence tests. The Naive Bayes and the Tree-Augmented Naive Bayes (TAN) classifiers are also implemented. Some utility functions (model comparison and manipulation, random data generation, arc orientation testing, simple and advanced plots) are included, as well as support for parameter estimation (maximum likelihood and Bayesian) and inference, conditional probability queries, cross-validation, bootstrap and model averaging. Development snapshots with the latest bugfixes are available from <https://www.bnlearn.com/>.

Maintained by Marco Scutari. Last updated 2 months ago.

openblas

1.8 match 57 stars 7.72 score 32 dependents

ropensci

rotl:Interface to the 'Open Tree of Life' API

An interface to the 'Open Tree of Life' API to retrieve phylogenetic trees, information about studies used to assemble the synthetic tree, and utilities to match taxonomic names to 'Open Tree identifiers'. The 'Open Tree of Life' aims at assembling a comprehensive phylogenetic tree for all named species.

Maintained by Francois Michonneau. Last updated 2 years ago.

metadata ropensci phylogenetics independant-contrasts biodiversity peer-reviewed phylogeny taxonomy

1.1 match 40 stars 12.05 score 356 scripts 29 dependents

bioc

MotifPeeker:Benchmarking Epigenomic Profiling Methods Using Motif Enrichment

MotifPeeker is used to compare and analyse datasets from epigenomic profiling methods with motif enrichment as the key benchmark. The package outputs an HTML report consisting of three sections: (1. General Metrics) Overview of peaks-related general metrics for the datasets (FRiP scores, peak widths and motif-summit distances). (2. Known Motif Enrichment Analysis) Statistics for the frequency of user-provided motifs enriched in the datasets. (3. De-Novo Motif Enrichment Analysis) Statistics for the frequency of de-novo discovered motifs enriched in the datasets and compared with known motifs.

Maintained by Hiranyamaya Dash. Last updated 2 months ago.

epigenetics genetics qualitycontrol chipseq multiplecomparison functionalgenomics motifdiscovery sequencematching software alignment bioconductor bioconductor-package chip-seq epigenomics interactive-report motif-enrichment-analysis

2.5 match 2 stars 5.48 score 6 scripts

pursuitofdatascience

tidyEmoji:Discovers Emoji from Text

Unicodes are not friendly to work with, and not all Unicodes are Emoji per se, making obtaining Emoji statistics a difficult task. This tool can help make your experience of working with Emoji as smooth as possible, as it has the 'tidyverse' style.

Maintained by Youzhi Yu. Last updated 2 years ago.

3.3 match 2 stars 4.00 score 7 scripts

bioc

methodical:Discovering genomic regions where methylation is strongly associated with transcriptional activity

DNA methylation is generally considered to be associated with transcriptional silencing. However, comprehensive, genome-wide investigation of this relationship requires the evaluation of potentially millions of correlation values between the methylation of individual genomic loci and expression of associated transcripts in a relatively large number of samples. Methodical makes this process quick and easy while keeping a low memory footprint. It also provides a novel method for identifying regions where a number of methylation sites are consistently strongly associated with transcriptional expression. In addition, Methodical enables housing DNA methylation data from diverse sources (e.g. WGBS, RRBS and methylation arrays) with a common framework, lifting over DNA methylation data between different genome builds and creating base-resolution plots of the association between DNA methylation and transcriptional activity at transcriptional start sites.

Maintained by Richard Heery. Last updated 2 months ago.

dnamethylation methylationarray transcription genomewideassociation software openjdk

2.8 match 4.65 score 14 scripts

bioc

TrajectoryGeometry:This Package Discovers Directionality in Time and Pseudo-times Series of Gene Expression Patterns

Given a time series or pseudo-times series of gene expression data, we might wish to know: Do the changes in gene expression in these data exhibit directionality? Are there turning points in this directionality? Do different subsets of the data move in different directions? This package uses spherical geometry to probe these sorts of questions. In particular, if we are looking at (say) the first n dimensions of the PCA of gene expression, directionality can be detected as the clustering of points on the (n-1)-dimensional sphere.

Maintained by Michael Shapiro. Last updated 5 months ago.

biologicalquestion statisticalmethod geneexpression singlecell

2.8 match 4.60 score 7 scripts

jbdorey

BeeBDC:Occurrence Data Cleaning

Flags and checks occurrence data that are in Darwin Core format. The package includes generic functions and data as well as some that are specific to bees. This package is meant to build upon and be complementary to other excellent occurrence cleaning packages, including 'bdc' and 'CoordinateCleaner'. This package uses datasets from several sources and particularly from the Discover Life Website, created by Ascher and Pickering (2020). For further information, please see the original publication and package website. Publication - Dorey et al. (2023) <doi:10.1101/2023.06.30.547152> and package website - Dorey et al. (2023) <https://github.com/jbdorey/BeeBDC>.

Maintained by James B. Dorey. Last updated 4 months ago.

2.2 match 3 stars 5.68 score 7 scripts

bioc

rcellminer:rcellminer: Molecular Profiles, Drug Response, and Chemical Structures for the NCI-60 Cell Lines

The NCI-60 cancer cell line panel has been used over the course of several decades as an anti-cancer drug screen. This panel was developed as part of the Developmental Therapeutics Program (DTP, http://dtp.nci.nih.gov/) of the U.S. National Cancer Institute (NCI). Thousands of compounds have been tested on the NCI-60, which have been extensively characterized by many platforms for gene and protein expression, copy number, mutation, and others (Reinhold, et al., 2012). The purpose of the CellMiner project (http://discover.nci.nih.gov/cellminer) has been to integrate data from multiple platforms used to analyze the NCI-60 and to provide a powerful suite of tools for exploration of NCI-60 data.

Maintained by Augustin Luna. Last updated 5 months ago.

acgh cellbasedassays copynumbervariation geneexpression pharmacogenomics pharmacogenetics mirna cheminformatics visualization software systemsbiology

2.1 match 5.71 score 113 scripts

idigbio

ridigbio:Interface to the iDigBio Data API

An interface to iDigBio's search API that allows downloading specimen records. Searches are returned as a data.frame. Other functions such as the metadata end points return lists of information. iDigBio is a US project focused on digitizing and serving museum specimen collections on the web. See <https://www.idigbio.org> for information on iDigBio.

Maintained by Jesse Bennett. Last updated 4 days ago.

1.2 match 16 stars 10.23 score 63 scripts 7 dependents

aravind-j

PGRdup:Discover Probable Duplicates in Plant Genetic Resources Collections

Provides functions to aid the identification of probable/possible duplicates in Plant Genetic Resources (PGR) collections using 'passport databases' comprising information records of each constituent sample. These include methods for cleaning the data, creation of a searchable Key Word in Context (KWIC) index of keywords associated with sample records and the identification of nearly identical records with similar information by fuzzy, phonetic and semantic matching of keywords.

Maintained by J. Aravind. Last updated 2 years ago.

double-metaphone double-metaphone-algorithm natural-language-processing pgr plant-genetic-resources record-linkage

2.9 match 1 star 4.06 score 23 scripts

bioc

deltaCaptureC:This Package Discovers Meso-scale Chromatin Remodeling from 3C Data

This package discovers meso-scale chromatin remodelling from 3C data. 3C data is local in nature. It gives interaction counts between restriction enzyme digestion fragments and a preferred 'viewpoint' region. By binning this data and using permutation testing, this package can test whether there are statistically significant changes in the interaction counts between the data from two cell types or two treatments.

Maintained by Michael Shapiro. Last updated 5 months ago.

biologicalquestion statisticalmethod

3.3 match 3.48 score 1 script

youhuachen

RSE:Number of Newly Discovered Rare Species Estimation

A Bayesian-weighted estimator and two unweighted estimators are developed to estimate the number of newly found rare species in additional ecological samples. Among these methods, the Bayesian-weighted estimator and an unweighted (Chao-derived) estimator are of high accuracy and recommended for practical applications. Technical details of the proposed estimators have been well described in the following paper: Shen TJ, Chen YH (2018) A Bayesian weighted approach to predicting the number of newly discovered rare species. Conservation Biology, In press.

Maintained by Youhua Chen. Last updated 6 years ago.

5.2 match 2.03 score 18 scripts

bioc

ChIPpeakAnno:Batch annotation of the peaks identified from either ChIP-seq, ChIP-chip experiments, or any experiments that result in large number of genomic interval data

The package encompasses a range of functions for identifying the closest gene, exon, miRNA, or custom features—such as highly conserved elements and user-supplied transcription factor binding sites. Additionally, users can retrieve sequences around the peaks and obtain enriched Gene Ontology (GO) or Pathway terms. In version 2.0.5 and beyond, new functionalities have been introduced. These include features for identifying peaks associated with bi-directional promoters along with summary statistics (peaksNearBDP), summarizing motif occurrences in peaks (summarizePatternInPeaks), and associating additional identifiers with annotated peaks or enrichedGO (addGeneIDs). The package integrates with various other packages such as biomaRt, IRanges, Biostrings, BSgenome, GO.db, multtest, and stat to enhance its analytical capabilities.

Maintained by Jianhong Ou. Last updated 2 months ago.

annotation chipseq chipchip

1.2 match 8.75 score 584 scripts 6 dependents

paolodalena

tastypie:Easy Pie Charts

You only need to type 'why pie charts are bad' on Google to find thousands of articles full of (valid) reasons why other types of charts should be preferred over this one. Because pie charts are therefore rarely used, making them (and related charts) in R is not straightforward, so other functions are needed to simplify things. This R package provides useful functions for making 'tasty' pie charts immediately by exploiting the many cool templates provided.

Maintained by Paolo Dalena. Last updated 2 years ago.

ggplot2 pie pie-chart tastypie

2.0 match 15 stars 5.24 score 23 scripts

bioc

EnrichDO:a Global Weighted Model for Disease Ontology Enrichment Analysis

This package implements disease ontology (DO) enrichment analysis using a double weighted model based on the latest annotations of the human genome with DO terms, integrating the DO graph topology on a global scale. It can identify more specific DO terms with high accuracy, which alleviates the over-enrichment problem. The package includes various statistical models and visualization schemes for discovering the associations between genes and diseases from biological big data.

Maintained by Hongyu Fu. Last updated 4 months ago.

annotation visualization genesetenrichment software

2.1 match 4.74 score 9 scripts

argocanada

argoFloats:Analysis of Oceanographic Argo Floats

Supports the analysis of oceanographic data recorded by Argo autonomous drifting profiling floats. Functions are provided to (a) download and cache data files, (b) subset data in various ways, (c) handle quality-control flags and (d) plot the results according to oceanographic conventions. A shiny app is provided for easy exploration of datasets. The package is designed to work well with the 'oce' package, providing a wide range of processing capabilities that are particular to oceanographic analysis. See Kelley, Harbin, and Richards (2021) <doi:10.3389/fmars.2021.635922> for more on the scientific context and applications.

Maintained by Dan Kelley. Last updated 30 days ago.

1.3 match 17 stars 7.32 score 203 scripts

nixtla

nixtlar:A Software Development Kit for 'Nixtla''s 'TimeGPT'

A Software Development Kit for working with 'Nixtla''s 'TimeGPT', a foundation model for time series forecasting. 'API' is an acronym for 'application programming interface'; this package allows users to interact with 'TimeGPT' via the 'API'. You can set and validate 'API' keys and generate forecasts via 'API' calls. It is compatible with 'tsibble' and base R. For more details visit <https://docs.nixtla.io/>.

Maintained by Mariana Menchero. Last updated 27 days ago.

1.2 match 30 stars 8.16 score 38 scripts

bioc

Harman:The removal of batch effects from datasets using a PCA and constrained optimisation based technique

Harman is a PCA and constrained optimisation based technique that maximises the removal of batch effects from datasets, with the constraint that the probability of overcorrection (i.e. removing genuine biological signal along with batch noise) is kept to a fraction which is set by the end-user.

Maintained by Jason Ross. Last updated 5 months ago.

batcheffect microarray multiplecomparison principalcomponent normalization preprocessing dnamethylation transcription software statisticalmethod cpp

1.9 match 4.97 score 31 scripts 1 dependents

bioc

VaSP:Quantification and Visualization of Variations of Splicing in Population

Discovery of genome-wide variable alternative splicing events from short-read RNA-seq data and visualizations of gene splicing information for publication-quality multi-panel figures in a population. (Warning: The visualizing function has been removed because the package it depends on, Sushi, is deprecated. If you want to use it, please switch back to an older version.)

Maintained by Huihui Yu. Last updated 5 months ago.

rnaseq alternativesplicing differentialsplicing statisticalmethod visualization preprocessing clustering differentialexpression kegg immunooncology 3s-scores alternative-splicing ballgown rna-seq splicing sqtl statistics

1.9 match 3 stars 4.78 score 3 scripts

netique

buildr:Organize & Run Build Scripts Comfortably

Working with reproducible reports or similar projects often requires running the script that builds the output file in a specified way. 'buildr' can help you organize, modify and comfortably run those scripts. The package provides a set of functions that interactively guide you through the process and that are available as an 'RStudio' Addin, meaning you can set up keyboard shortcuts that let you choose and run the desired build script with one keystroke anywhere, anytime.

Maintained by Jan Netik. Last updated 11 months ago.

addin buildr keyboard-shortcut rstudio-addin

1.8 match 15 stars 4.88 score 8 scripts

jiefei-wang

aws.ecx:Communicating with AWS EC2 and ECS using AWS REST APIs

Provides functions for communicating with Amazon Web Services (AWS) Elastic Compute Cloud (EC2) and Elastic Container Service (ECS). The functions have the prefix 'ecs_' or 'ec2_' depending on the class of the API. Requests are sent via the REST API, and the parameters are given by the function arguments. The credentials can be set via 'aws_set_credentials'. The EC2 documentation can be found at <https://docs.aws.amazon.com/AWSEC2/latest/APIReference/Welcome.html> and the ECS documentation at <https://docs.aws.amazon.com/AmazonECS/latest/APIReference/Welcome.html>.

Maintained by Jiefei Wang. Last updated 3 years ago.

ec2 ecs ecs-functions

2.0 match 1 stars 4.18 score 2 scripts

ropensci

qcoder:Lightweight Qualitative Coding

A free, lightweight, open source option for analyzing text-based qualitative data. Enables analysis of interview transcripts, observation notes, memos, and other sources. Supports the work of social scientists, historians, humanists, and other researchers who use qualitative methods. Addresses the unique challenges faced in qualitative data analysis. Provides opportunities for researchers who otherwise might not develop software to build software development skills.

Maintained by Elin Waring. Last updated 3 years ago.

unconf unconf18

1.7 match 134 stars 5.05 score 13 scripts

bioc

DOSE:Disease Ontology Semantic and Enrichment analysis

This package implements five methods proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively for measuring semantic similarities among DO terms and gene products. Enrichment analyses including hypergeometric model and gene set enrichment analysis are also implemented for discovering disease associations of high-throughput biological data.

Maintained by Guangchuang Yu. Last updated 5 months ago.

annotation visualization multiplecomparison genesetenrichment pathways software disease-ontology enrichment-analysis semantic-similarity

0.5 match 119 stars 14.97 score 2.0k scripts 61 dependents

s-fleck

rotor:Log Rotation and Conditional Backups

Conditionally rotate or back-up files based on their size or the date of the last backup; inspired by the 'Linux' utility 'logrotate'.

Maintained by Stefan Fleck. Last updated 2 years ago.

backup logging logrotate logrotation

2.0 match 12 stars 3.78 score 10 scripts
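
Size-based rotation of the kind described above can be sketched as follows. This is a generic Python illustration, not the 'rotor' API: the function name `rotate_if_larger`, the numbered-suffix naming scheme, and the default of three backups are assumptions made for the example.

```python
import os
import tempfile

def rotate_if_larger(path, max_bytes, max_backups=3):
    """Rotate `path` when its size exceeds `max_bytes`; return True if rotated.

    Hypothetical scheme: backups are named path.1 (newest) to path.N (oldest).
    """
    if not os.path.exists(path) or os.path.getsize(path) <= max_bytes:
        return False
    # Drop the oldest backup, then shift the rest up by one index.
    oldest = f"{path}.{max_backups}"
    if os.path.exists(oldest):
        os.remove(oldest)
    for index in range(max_backups - 1, 0, -1):
        source = f"{path}.{index}"
        if os.path.exists(source):
            os.replace(source, f"{path}.{index + 1}")
    os.replace(path, f"{path}.1")
    # Start a fresh, empty file under the original name.
    open(path, "w").close()
    return True

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        log = os.path.join(folder, "app.log")
        with open(log, "w") as handle:
            handle.write("x" * 2048)
        print(rotate_if_larger(log, max_bytes=1024))
        print(sorted(os.listdir(folder)))
```

A date-based condition would follow the same shape, with the size check replaced by a comparison against the timestamp of the most recent backup.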

ftwkoopmans

goat:Gene Set Analysis Using the Gene Set Ordinal Association Test

Perform gene set enrichment analyses using the Gene set Ordinal Association Test (GOAT) algorithm and visualize your results. Koopmans, F. (2024) <doi:10.1038/s42003-024-06454-5>.

Maintained by Frank Koopmans. Last updated 22 days ago.

bioinformatics geneset-enrichment geneset-enrichment-analysis cpp openmp

1.7 match 10 stars 4.40 score 8 scripts

doi-usgs

dataRetrieval:Retrieval Functions for USGS and EPA Hydrology and Water Quality Data

Collection of functions to help retrieve U.S. Geological Survey and U.S. Environmental Protection Agency water quality and hydrology data from web services. Data are discovered from National Water Information System <https://waterservices.usgs.gov/> and <https://waterdata.usgs.gov/nwis>. Water quality data are obtained from the Water Quality Portal <https://www.waterqualitydata.us/>.

Maintained by Laura DeCicco. Last updated 17 days ago.

usgs

0.5 match 280 stars 14.18 score 1.7k scripts 15 dependents

molevolepid

SEEPS:Sequence evolution and epidemiological process simulator

A modular, modern simulation suite and toolkit for simulating transmission networks, phylogenies, and evolutionary pairwise distance matrices under different models and assumptions for viral/sequence evolution. While initially developed for HIV, SEEPS offers modular utilities for custom workflows for extension beyond HIV.

Maintained by Michael Kupperman. Last updated 2 months ago.

biological-sequences epidemiology evolution hiv simulation-framework

1.8 match 1 stars 3.95 score 6 scripts

bioc

Rcollectl:Help use collectl with R in Linux, to measure resource consumption in R processes

Provide functions to obtain instrumentation data on processes in a unix environment. Parse output of a collectl run. Visualize aspects of system usage over time, with annotation.

Maintained by Vincent Carey. Last updated 5 months ago.

software infrastructure

1.8 match 3 stars 3.95 score 7 scripts

ryanzomorrodi

healthatlas:Explore and Import 'Metopio' Health Atlas Data and Spatial Layers

Allows for painless use of the 'Metopio' health atlas APIs <https://metopio.com/how-it-works/atlas/> to explore and import data. 'Metopio' health atlases store open public health data. See what topics (or indicators) are available among specific populations, periods, and geographic layers. Download relevant data along with geographic boundaries or point datasets. Spatial datasets are returned as 'sf' objects.

Maintained by Ryan Zomorrodi. Last updated 14 days ago.

1.5 match 1 stars 4.60 score 5 scripts

cran

texteffect:Discovering Latent Treatments in Text Corpora and Estimating Their Causal Effects

Implements the approach described in Fong and Grimmer (2016) <https://aclweb.org/anthology/P/P16/P16-1151.pdf> for automatically discovering latent treatments from a corpus and estimating the average marginal component effect (AMCE) of each treatment. The data is divided into a training and test set. The supervised Indian Buffet Process (sibp) is used to discover latent treatments in the training set. The fitted model is then applied to the test set to infer the values of the latent treatments in the test set. Finally, Y is regressed on the latent treatments in the test set to estimate the causal effect of each treatment.

Maintained by Christian Fong. Last updated 6 years ago.

5.3 match 2 stars 1.30 score 7 scripts

rcarragh

c212:Methods for Detecting Safety Signals in Clinical Trials Using Body-Systems (System Organ Classes)

Provides a self-contained set of methods to aid clinical trial safety investigators, statisticians and researchers, in the early detection of adverse events using groupings by body-system or system organ class. This work was supported by the Engineering and Physical Sciences Research Council (UK) (EPSRC) [award reference 1521741] and Frontier Science (Scotland) Ltd. The package title c212 is in reference to the original Engineering and Physical Sciences Research Council (UK) funded project which was named CASE 2/12.

Maintained by Raymond Carragher. Last updated 4 months ago.

cpp

1.7 match 4.06 score 57 scripts

haythorn

sr:Smooth Regression - The Gamma Test and Tools

Finds causal connections in precision data, finds lags and embeddings in time series, guides training of neural networks and other smooth models, evaluates their performance, gives a mathematically grounded answer to the over-training problem. Smooth regression is based on the Gamma test, which measures smoothness in a multivariate relationship. Causal relations are smooth, noise is not. 'sr' includes the Gamma test and search techniques that use it. References: Evans & Jones (2002) <doi:10.1098/rspa.2002.1010>, AJ Jones (2004) <doi:10.1007/s10287-003-0006-1>.

Maintained by Wayne Haythorn. Last updated 2 years ago.

1.8 match 3.70 score 9 scripts

hrbrmstr

wand:Retrieve Magic Attributes from Files and Directories

MIME types are shorthand descriptors for file contents and can be determined from "magic" bytes in file headers, file contents or intuited from file extensions. Tools are provided to perform curated "magic" tests as well as mapping MIME types from a database of over 1,800 extension mappings.

Maintained by Bob Rudis. Last updated 5 years ago.

1.8 match 3.69 score 11 scripts 3 dependents
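
The two detection routes mentioned above (leading "magic" bytes first, file extension as a fallback) can be sketched as follows. This is a generic Python illustration, not the 'wand' API: the function name `guess_mime` and both lookup tables are assumptions, and the tables cover only a handful of well-known formats.

```python
import os

# A few widely documented file signatures (leading "magic" bytes).
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]

# Tiny illustrative extension table; a real database has far more entries.
EXTENSION_TYPES = {
    ".csv": "text/csv",
    ".json": "application/json",
    ".txt": "text/plain",
}

def guess_mime(header: bytes, filename: str = "") -> str:
    """Return a MIME type from the header bytes, else from the extension."""
    for signature, mime in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return mime
    # No signature matched: fall back to the file extension, if any.
    _, extension = os.path.splitext(filename.lower())
    return EXTENSION_TYPES.get(extension, "application/octet-stream")
```

Checking the header before the extension means a renamed file (for example a PNG saved as `report.txt`) is still identified by its contents.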

ropensci

sodium:A Modern and Easy-to-Use Crypto Library

Bindings to 'libsodium' <https://doc.libsodium.org/>: a modern, easy-to-use software library for encryption, decryption, signatures, password hashing and more. Sodium uses curve25519, a state-of-the-art Diffie-Hellman function by Daniel Bernstein, which has become very popular after it was discovered that the NSA had backdoored Dual EC DRBG.

Maintained by Jeroen Ooms. Last updated 3 months ago.

libsodium

0.5 match 70 stars 12.43 score 175 scripts 103 dependents

rinterface

shinyMobile:Mobile Ready 'shiny' Apps with Standalone Capabilities

Develop outstanding 'shiny' apps for 'iOS' and 'Android' as well as beautiful 'shiny' gadgets. 'shinyMobile' is built on top of the latest 'Framework7' template <https://framework7.io>. Discover 14 new input widgets (sliders, vertical sliders, stepper, grouped action buttons, toggles, picker, smart select, ...), 2 themes (light and dark), 12 new widgets (expandable cards, badges, chips, timelines, gauges, progress bars, ...) combined with the power of server-side notifications such as alerts, modals, toasts, action sheets, sheets (and more) as well as 3 layouts (single, tabs and split).

Maintained by David Granjon. Last updated 2 months ago.

android hacktoberfest2022 pwa shiny shinyapps template

0.5 match 409 stars 11.91 score 1.1k scripts 2 dependents

neonscience

neonUtilities:Utilities for Working with NEON Data

NEON data packages can be accessed through the NEON Data Portal <https://www.neonscience.org> or through the NEON Data API (see <https://data.neonscience.org/data-api> for documentation). Data delivered from the Data Portal are provided as monthly zip files packaged within a parent zip file, while individual files can be accessed from the API. This package provides tools that aid in discovering, downloading, and reformatting data prior to use in analyses. This includes downloading data via the API, merging data tables by type, and converting formats. For more information, see the readme file at <https://github.com/NEONScience/NEON-utilities>.

Maintained by Claire Lunch. Last updated 1 month ago.

0.5 match 57 stars 10.66 score 944 scripts 15 dependents

sciviews

data.io:Read and Write Data in Different Formats

Read or write data from many different formats (tabular datasets, from statistic software ...) into R objects. Add labels and units in different languages.

Maintained by Philippe Grosjean. Last updated 11 months ago.

dataset sciviews

1.3 match 1 stars 4.32 score 20 scripts 7 dependents

bioc

TTMap:Two-Tier Mapper: a clustering tool based on topological data analysis

TTMap is a clustering method that groups together samples with the same deviation in comparison to a control group. It is especially useful when the dataset is small. It is parameter-free.

Maintained by Rachel Jeitziner. Last updated 5 months ago.

software microarray differentialexpression multiplecomparison clustering classification

1.8 match 3.00 score

slesche

acdcquery:Query the Attentional Control Data Collection

Interact with the Attentional Control Data Collection (ACDC). Connect to the database via connect_to_db(), set filter arguments via add_argument() and query the database via query_db().

Maintained by Sven Lesche. Last updated 4 months ago.

1.9 match 2.70 score 10 scripts

program--

HSClientR:A HydroShare API client for R

A RESTful API wrapper for accessing <https://hydroshare.org> data in R.

Maintained by Justin Singh-Mohudpur. Last updated 4 years ago.

api-wrapper cuashi hydrology hydroshare water-resources

2.0 match 4 stars 2.30 score 2 scripts

cran

osdatahub:Easier Interaction with the Ordnance Survey Data Hub

Ordnance Survey ('OS') is the national mapping agency for Great Britain and produces a large variety of mapping and geospatial products. Much of OS's data is available via the OS Data Hub <https://osdatahub.os.uk/>, a platform that hosts both free and premium data products. 'osdatahub' provides a user-friendly way to access, query, and download these data.

Maintained by Chris Jochem. Last updated 1 year ago.

1.3 match 3.45 score 14 scripts

jpquast

protti:Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools

Useful functions and workflows for proteomics quality control and data analysis of both limited proteolysis-coupled mass spectrometry (LiP-MS) (Feng et al. (2014) <doi:10.1038/nbt.2999>) and regular bottom-up proteomics experiments. Data generated with search tools such as 'Spectronaut', 'MaxQuant' and 'Proteome Discoverer' can be easily used due to the flexibility of the functions.

Maintained by Jan-Philipp Quast. Last updated 5 months ago.

data-analysis lip-ms mass-spectrometry omics protein proteomics systems-biology

0.5 match 61 stars 8.58 score 83 scripts

reimandlab

ActivePathways:Integrative Pathway Enrichment Analysis of Multivariate Omics Data

Framework for analysing multiple omics datasets in the context of molecular pathways, biological processes and other types of gene sets. The package uses p-value merging to combine gene- or protein-level signals, followed by ranked hypergeometric tests to determine enriched pathways and processes. Genes can be integrated using directional constraints that reflect how the input datasets are expected to interact with one another. This approach allows researchers to interpret a series of omics datasets in the context of known biology and gene function, and discover associations that are only apparent when several datasets are combined. The recent version of the package is part of the following publication: Directional integration and pathway enrichment analysis for multi-omics data. Slobodyanyuk M^, Bahcheli AT^, Klein ZP, Bayati M, Strug LJ, Reimand J. Nature Communications (2024) <doi:10.1038/s41467-024-49986-4>.

Maintained by Juri Reimand. Last updated 8 months ago.

0.5 match 107 stars 8.61 score 35 scripts 2 dependents

cthombor

SafeVote:Election Vote Counting with Safety Features

Fork of 'vote_2.3-2', Raftery et al. (2021) <DOI:10.32614/RJ-2021-086>, with additional support for stochastic experimentation.

Maintained by Clark Thomborson. Last updated 5 months ago.

1.6 match 2.70 score 5 scripts

rcannood

SCORPIUS:Inferring Developmental Chronologies from Single-Cell RNA Sequencing Data

An accurate and easy tool for performing linear trajectory inference on single cells using single-cell RNA sequencing data. In addition, 'SCORPIUS' provides functions for discovering the most important genes with respect to the reconstructed trajectory, as well as nice visualisation tools. Cannoodt et al. (2016) <doi:10.1101/079509>.

Maintained by Robrecht Cannoodt. Last updated 2 years ago.

0.5 match 59 stars 8.17 score 126 scripts

bioc

debrowser:Interactive Differential Expression Analysis Browser

Bioinformatics platform containing interactive plots and tables for differential gene and region expression studies. Allows visualizing expression data much more deeply in an interactive and faster way. By changing the parameters, users can easily discover different parts of the data in ways that have never been done before. Manually creating and inspecting these plots takes time. With DEBrowser, users can prepare plots without writing any code. Differential expression, PCA and clustering analyses are performed on site and the results are shown in various plots such as scatter, bar, box, volcano, MA plots and heatmaps.

Maintained by Alper Kucukural. Last updated 5 months ago.

sequencing chipseq rnaseq differentialexpression geneexpression clustering immunooncology

0.5 match 61 stars 7.80 score 65 scripts

bioc

animalcules:Interactive microbiome analysis toolkit

animalcules is an R package for utilizing up-to-date data analytics, visualization methods, and machine learning models to provide users an easy-to-use interactive microbiome analysis framework. It can be used as a standalone software package or users can explore their data with the accompanying interactive R Shiny application. Traditional microbiome analysis such as alpha/beta diversity and differential abundance analysis are enhanced, while new methods like biomarker identification are introduced by animalcules. Powerful interactive and dynamic figures generated by animalcules enable users to understand their data better and discover new insights.

Maintained by Jessica McClintock. Last updated 5 months ago.

microbiome metagenomics coverage visualization

0.5 match 55 stars 6.95 score 23 scripts

bioc

MoonlightR:Identify oncogenes and tumor suppressor genes from omics data

Motivation: The understanding of cancer mechanism requires the identification of genes playing a role in the development of the pathology and the characterization of their role (notably oncogenes and tumor suppressors). Results: We present an R/bioconductor package called MoonlightR which returns a list of candidate driver genes for specific cancer types on the basis of TCGA expression data. The method first infers gene regulatory networks and then carries out a functional enrichment analysis (FEA) (implementing an upstream regulator analysis, URA) to score the importance of well-known biological processes with respect to the studied cancer type. Eventually, by means of random forests, MoonlightR predicts two specific roles for the candidate driver genes: i) tumor suppressor genes (TSGs) and ii) oncogenes (OCGs). As a consequence, this methodology does not only identify genes playing a dual role (e.g. TSG in one cancer type and OCG in another) but also helps in elucidating the biological processes underlying their specific roles. In particular, MoonlightR can be used to discover OCGs and TSGs in the same cancer type. This may help in answering the question whether some genes change role between early stages (I, II) and late stages (III, IV) in breast cancer. In the future, this analysis could be useful to determine the causes of different resistances to chemotherapeutic treatments.

Maintained by Matteo Tiberti. Last updated 5 months ago.

dnamethylation differentialmethylation generegulation geneexpression methylationarray differentialexpression pathways network survival genesetenrichment networkenrichment

0.5 match 17 stars 6.57 score

bioc

Moonlight2R:Identify oncogenes and tumor suppressor genes from omics data

The understanding of cancer mechanism requires the identification of genes playing a role in the development of the pathology and the characterization of their role (notably oncogenes and tumor suppressors). We present an updated version of the R/bioconductor package called MoonlightR, namely Moonlight2R, which returns a list of candidate driver genes for specific cancer types on the basis of omics data integration. The Moonlight framework contains a primary layer where gene expression data and information about biological processes are integrated to predict genes called oncogenic mediators, divided into putative tumor suppressors and putative oncogenes. This is done through functional enrichment analyses, gene regulatory networks and upstream regulator analyses to score the importance of well-known biological processes with respect to the studied cancer type. By evaluating the effect of the oncogenic mediators on biological processes or through random forests, the primary layer predicts two putative roles for the oncogenic mediators: i) tumor suppressor genes (TSGs) and ii) oncogenes (OCGs). As gene expression data alone is not enough to explain the deregulation of the genes, a second layer of evidence is needed. We have automated the integration of a secondary mutational layer through new functionalities in Moonlight2R. These functionalities analyze mutations in the cancer cohort and classify them into driver and passenger mutations using the driver mutation prediction tool, CScape-somatic. Those oncogenic mediators with at least one driver mutation are retained as the driver genes. As a consequence, this methodology does not only identify genes playing a dual role (e.g. TSG in one cancer type and OCG in another) but also helps in elucidating the biological processes underlying their specific roles. In particular, Moonlight2R can be used to discover OCGs and TSGs in the same cancer type. This may for instance help in answering the question whether some genes change role between early stages (I, II) and late stages (III, IV). In the future, this analysis could be useful to determine the causes of different resistances to chemotherapeutic treatments. An additional mechanistic layer evaluates if there are mutations affecting the protein stability of the transcription factors (TFs) of the TSGs and OCGs, as that may have an effect on the expression of the genes.

Maintained by Matteo Tiberti. Last updated 2 months ago.

dnamethylation differentialmethylation generegulation geneexpression methylationarray differentialexpression pathways network survival genesetenrichment networkenrichment

0.5 match 5 stars 6.59 score 43 scripts

cran

newFocus:True Discovery Guarantee by Combining Partial Closed Testings

Closed testing has proved powerful for guaranteeing true discoveries. The computation of closed testing is, however, quite burdensome. A general way to reduce computational complexity is to combine partial closed testings for some prespecified feature sets of interest. Partial closed testings are performed at a Bonferroni-corrected alpha level to guarantee that the lower bounds for the number of true discoveries in prespecified sets are simultaneously valid. For any post hoc chosen sets of interest, the coherence property is used to get the lower bound. In this package, we implement closed testing with globaltest to calculate the lower bound for the number of true discoveries; see Ningning Xu et al. (2021) <arXiv:2001.01541> for a detailed description.

Maintained by Ningning Xu. Last updated 4 years ago.

3.3 match 1.00 score

przechoj

gips:Gaussian Model Invariant by Permutation Symmetry

Find the permutation symmetry group such that the covariance matrix of the given data is approximately invariant under it. Discovering such a permutation decreases the number of observations needed to fit a Gaussian model, which is of great use when it is smaller than the number of variables. Even if that is not the case, the covariance matrix found with 'gips' approximates the actual covariance with less statistical error. The methods implemented in this package are described in Graczyk et al. (2022) <doi:10.1214/22-AOS2174>.

Maintained by Adam Przemysław Chojecki. Last updated 8 months ago.

covariance-estimation machine-learning normal-distribution

0.5 match 6 stars 6.40 score 31 scripts

huanglabumn

oncoPredict:Drug Response Modeling and Biomarker Discovery

Allows for building drug response models using screening data between bulk RNA-Seq and a drug response metric and two additional tools for biomarker discovery that have been developed by the Huang Laboratory at University of Minnesota. There are 3 main functions within this package. (1) calcPhenotype is used to build drug response models on RNA-Seq data and impute them on any other RNA-Seq dataset given to the model. (2) GLDS is used to calculate the general level of drug sensitivity, which can improve biomarker discovery. (3) IDWAS can take the results from calcPhenotype and link the imputed response back to available genomic data (mutation and CNV alterations) to identify biomarkers. Each of these functions comes from a paper from the Huang research laboratory. The relevant paper for each function is given below. calcPhenotype - Geeleher et al, Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. GLDS - Geeleher et al, Cancer biomarker discovery is improved by accounting for variability in general levels of drug sensitivity in pre-clinical models. IDWAS - Geeleher et al, Discovering novel pharmacogenomic biomarkers by imputing drug response in cancer patients from large genomics studies.

Maintained by Robert Gruener. Last updated 12 months ago.

sva preprocesscore stringr biomart genefilter org.hs.eg.db genomicfeatures txdb.hsapiens.ucsc.hg19.knowngene tcgabiolinks biocgenerics genomicranges iranges s4vectors

0.5 match 18 stars 6.47 score 41 scripts

ddalthorp

eoa3:Wildlife Mortality Estimator for Low Fatality Rates and Imperfect Detection

Evidence of Absence software (EoA) is a user-friendly application for estimating bird and bat fatalities at wind farms and designing search protocols. The software is particularly useful in addressing whether the number of fatalities has exceeded a given threshold and what search parameters are needed to give assurance that thresholds were not exceeded. The models are applicable even when zero carcasses have been found in searches, following Huso et al. (2015) <doi:10.1890/14-0764.1>, Dalthorp et al. (2017) <doi:10.3133/ds1055>, and Dalthorp and Huso (2015) <doi:10.3133/ofr20151227>.

Maintained by Daniel Dalthorp. Last updated 4 months ago.

jags cpp

3.2 match 1.00 score

kforner

srcpkgs:R Source Packages Manager

Manage a collection/library of R source packages. Discover, document, load, test source packages. Enables using those packages as if they were actually installed. Quickly reload only what is needed on source code change. Run tests and checks in parallel.

Maintained by Karl Forner. Last updated 10 months ago.

0.5 match 11 stars 6.04 score 6 scripts

cran

startR:Automatically Retrieve Multidimensional Distributed Data Sets

Tool to automatically fetch, transform and arrange subsets of multidimensional data sets (collections of files) stored in local and/or remote file systems or servers, using multicore capabilities where possible. The tool provides an interface to perceive a collection of data sets as a single large multidimensional data array, and enables the user to request automatic retrieval, processing and arrangement of subsets of the large array. Wrapper functions to add support for custom file formats can be plugged in/out, making the tool suitable for any research field where large multidimensional data sets are involved.

Maintained by Victoria Agudetse. Last updated 6 months ago.

1.7 match 1.78 score 2 dependents

jamesdalg

CNVScope:A Versatile Toolkit for Copy Number Variation Relationship Data Analysis and Visualization

Provides the ability to create interaction maps, discover CNV map domains (edges), gene annotate interactions, and create interactive visualizations of these CNV interaction maps.

Maintained by James Dalgleish. Last updated 3 years ago.

0.5 match 8 stars 5.58 score 24 scripts

mazamascience

MazamaLocationUtils:Manage Spatial Metadata for Known Locations

Utility functions for discovering and managing metadata associated with spatially unique "known locations". Applications include all fields of environmental monitoring (e.g. air and water quality) where data are collected at stationary sites.

Maintained by Jonathan Callahan. Last updated 3 months ago.

0.5 match 5.64 score 108 scripts

ropensci

hddtools:Hydrological Data Discovery Tools

Tools to discover hydrological data, accessing catalogues and databases from various data providers. The package is described in Vitolo (2017) "hddtools: Hydrological Data Discovery Tools" <doi:10.21105/joss.00056>.

Maintained by Dorothea Hug Peter. Last updated 7 months ago.

data60uk grdc hydrology kgclimateclass mopex peer-reviewed precipitation sepa

0.5 match 48 stars 5.56 score 25 scripts

bioc

CytoGLMM:Conditional Differential Analysis for Flow and Mass Cytometry Experiments

The CytoGLMM R package implements two multiple regression strategies: A bootstrapped generalized linear model (GLM) and a generalized linear mixed model (GLMM). Most current data analysis tools compare expressions across many computationally discovered cell types. CytoGLMM focuses on just one cell type. Our narrower field of application allows us to define a more specific statistical model with easier to control statistical guarantees. As a result, CytoGLMM finds differential proteins in flow and mass cytometry data while reducing biases arising from marker correlations and safeguarding against false discoveries induced by patient heterogeneity.

Maintained by Christof Seiler. Last updated 5 months ago.

flowcytometry proteomics singlecell cellbasedassays cellbiology immunooncology regression statisticalmethod software

0.5 match 2 stars 5.68 score 1 scripts 1 dependents

dpc10ster

RJafroc:Artificial Intelligence Systems and Observer Performance

Analyzing the performance of artificial intelligence (AI) systems/algorithms characterized by a 'search-and-report' strategy. Historically observer performance has dealt with measuring radiologists' performances in search tasks, e.g., searching for lesions in medical images and reporting them, but the implicit location information has been ignored. The implemented methods apply to analyzing the absolute and relative performances of AI systems, comparing AI performance to a group of human readers or optimizing the reporting threshold of an AI system. In addition to performing historical receiver operating characteristic (ROC) analysis (localization information ignored), the software also performs free-response receiver operating characteristic (FROC) analysis, where lesion localization information is used. A book using the software has been published: Chakraborty DP: Observer Performance Methods for Diagnostic Imaging - Foundations, Modeling, and Applications with R-Based Examples, Taylor-Francis LLC; 2017: <https://www.routledge.com/Observer-Performance-Methods-for-Diagnostic-Imaging-Foundations-Modeling/Chakraborty/p/book/9781482214840>. Online updates to this book, which use the software, are at <https://dpc10ster.github.io/RJafrocQuickStart/>, <https://dpc10ster.github.io/RJafrocRocBook/> and at <https://dpc10ster.github.io/RJafrocFrocBook/>. Supported data collection paradigms are the ROC, FROC and the location ROC (LROC). ROC data consists of a single rating per image, where a rating is the perceived confidence level that the image is that of a diseased patient. An ROC curve is a plot of true positive fraction vs. false positive fraction. FROC data consists of a variable number (zero or more) of mark-rating pairs per image, where a mark is the location of a reported suspicious region and the rating is the confidence level that it is a real lesion. LROC data consists of a rating and a location of the most suspicious region, for every image.
Four models of observer performance, and curve-fitting software, are implemented: the binormal model (BM), the contaminated binormal model (CBM), the correlated contaminated binormal model (CORCBM), and the radiological search model (RSM). Unlike the binormal model, CBM, CORCBM and RSM predict 'proper' ROC curves that do not inappropriately cross the chance diagonal. Additionally, RSM parameters are related to search performance (not measured in conventional ROC analysis) and classification performance. Search performance refers to finding lesions, i.e., true positives, while simultaneously not finding false positive locations. Classification performance measures the ability to distinguish between true and false positive locations. Knowing these separate performances allows principled optimization of reader or AI system performance. This package supersedes Windows JAFROC (jackknife alternative FROC) software V4.2.1, <https://github.com/dpc10ster/WindowsJafroc>. Package functions are organized as follows. Data file related function names are preceded by 'Df', curve fitting functions by 'Fit', included data sets by 'dataset', plotting functions by 'Plot', significance testing functions by 'St', sample size related functions by 'Ss', data simulation functions by 'Simulate' and utility functions by 'Util'. Implemented are figures of merit (FOMs) for quantifying performance and functions for visualizing empirical or fitted operating characteristics: e.g., ROC, FROC, alternative FROC (AFROC) and weighted AFROC (wAFROC) curves. For fully crossed study designs significance testing of reader-averaged FOM differences between modalities is implemented via either Dorfman-Berbaum-Metz or the Obuchowski-Rockette methods. Also implemented is single modality analysis, which allows comparison of performance of a group of radiologists to a specified value, or comparison of AI to a group of radiologists interpreting the same cases. 
Crossed-modality analysis is implemented wherein there are two crossed modality factors and the aim is to determine performance in each modality factor averaged over all levels of the second factor. Sample size estimation tools are provided for ROC and FROC studies; these use estimates of the relevant variances from a pilot study to predict required numbers of readers and cases in a pivotal study to achieve the desired power. Utility and data file manipulation functions allow data to be read in any of the currently used input formats, including Excel, and the results of the analysis can be viewed in text or Excel output files. The methods are illustrated with several included datasets from the author's collaborations. This update includes improvements to the code, some as a result of user-reported bugs and new feature requests, and others discovered during ongoing testing and code simplification.

Maintained by Dev Chakraborty. Last updated 5 months ago.

ai-optimization artificial-intelligence-algorithms computer-aided-diagnosis froc-analysis roc-analysis target-classification target-localization cpp

0.5 match 19 stars 5.69 score 65 scripts
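
The RJafroc description above defines an ROC curve as a plot of true positive fraction against false positive fraction. As a rough illustration of that definition only, the following Python sketch computes empirical operating points and the trapezoidal area under them from two lists of ratings. The function names `empirical_roc` and `trapezoidal_auc` are hypothetical and are not part of RJafroc, whose own function names follow the prefixes listed in its description.

```python
from typing import List, Sequence, Tuple


def empirical_roc(nondiseased: Sequence[float],
                  diseased: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPF, TPF) points, one per distinct rating threshold.

    A case is called positive when its rating is >= the threshold.
    Hypothetical helper for illustration; not an RJafroc function.
    """
    thresholds = sorted(set(nondiseased) | set(diseased), reverse=True)
    points = [(0.0, 0.0)]  # strictest possible threshold: nothing is positive
    for t in thresholds:
        fpf = sum(r >= t for r in nondiseased) / len(nondiseased)
        tpf = sum(r >= t for r in diseased) / len(diseased)
        points.append((fpf, tpf))
    return points


def trapezoidal_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear curve through the operating points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    pts = empirical_roc([1, 2, 2, 3], [2, 3, 4, 4])
    print(pts)
    print(trapezoidal_auc(pts))
```

The trapezoidal area equals the Wilcoxon statistic (the fraction of diseased/non-diseased pairs ranked correctly, counting ties as one half), which is why it is a common empirical figure of merit for ROC data.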

pweidemueller

fullRankMatrix:Generation of Full Rank Design Matrix

Creates a full rank matrix out of a given matrix. The intended use is for one-hot encoded design matrices that should be used in linear models to ensure that significant associations can be correctly interpreted. However, 'fullRankMatrix' can be applied to any matrix to make it full rank. It removes columns with only 0's, merges duplicated columns and discovers linearly dependent columns and replaces them with linearly independent columns that span the space of the original columns. Columns are renamed to reflect those modifications. This results in a full rank matrix that can be used as a design matrix in linear models. The algorithm and some functions are inspired by Kuhn, M. (2008) <doi:10.18637/jss.v028.i05>.

Maintained by Paula Weidemueller. Last updated 9 months ago.

0.5 match 14 stars 5.62 score 6 scripts
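
The steps named in the fullRankMatrix description (remove all-zero columns, merge duplicated columns, detect linearly dependent columns) can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the package's algorithm: the function name `make_full_rank` and the `_AND_` naming scheme are hypothetical, and where the package replaces dependent columns with new independent columns spanning the same space, this sketch simply drops each column that is a linear combination of the columns kept before it.

```python
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple


def make_full_rank(columns: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Return a subset of (possibly merged) columns that has full column rank.

    Hypothetical helper for illustration; not the fullRankMatrix API.
    """
    # Step 1: remove columns that contain only zeros.
    nonzero = {name: list(vals) for name, vals in columns.items()
               if any(v != 0 for v in vals)}

    # Step 2: merge identical columns and record the merge in the new name.
    groups: Dict[Tuple[float, ...], List[str]] = {}
    for name, vals in nonzero.items():
        groups.setdefault(tuple(vals), []).append(name)
    merged = {"_AND_".join(names): list(key) for key, names in groups.items()}

    # Step 3: keep a column only if exact Gaussian elimination against the
    # columns kept so far leaves a nonzero residual (i.e. it is independent).
    basis: List[Tuple[int, List[Fraction]]] = []  # (pivot index, reduced vector)
    kept: Dict[str, List[float]] = {}
    for name, vals in merged.items():
        residual = [Fraction(v) for v in vals]
        for pivot, vec in basis:
            if residual[pivot] != 0:
                factor = residual[pivot] / vec[pivot]
                residual = [r - factor * b for r, b in zip(residual, vec)]
        pivot = next((i for i, r in enumerate(residual) if r != 0), None)
        if pivot is not None:
            basis.append((pivot, residual))
            kept[name] = vals
    return kept


if __name__ == "__main__":
    design = {
        "intercept": [1, 1, 1, 1],
        "a": [1, 1, 0, 0],
        "b": [0, 0, 1, 1],       # equals intercept - a, so it is redundant
        "a_copy": [1, 1, 0, 0],  # duplicate of a
        "empty": [0, 0, 0, 0],
    }
    print(list(make_full_rank(design)))
```

Exact rational arithmetic via `Fraction` avoids the tolerance choices that a floating-point rank test would need; a production implementation would more likely use a QR decomposition.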

bioc

msmsTests:LC-MS/MS Differential Expression Tests

Statistical tests for label-free LC-MS/MS data by spectral counts, to discover differentially expressed proteins between two biological conditions. Three tests are available: Poisson GLM regression, quasi-likelihood GLM regression, and the negative binomial of the edgeR package. The three models admit blocking factors to control for nuisance variables. To assure a good level of reproducibility a post-test filter is available, where we may set the minimum effect size considered biologically relevant, and the minimum expression of the most abundant condition.

Maintained by Josep Gregori i Font. Last updated 5 months ago.

immunooncology software massspectrometry proteomics

0.5 match 5.03 score 15 scripts 1 dependents

bioc

PepSetTest:Peptide Set Test

Peptide Set Test (PepSetTest) is a peptide-centric strategy to infer differentially expressed proteins in LC-MS/MS proteomics data. This test detects coordinated changes in the expression of peptides originating from the same protein and compares these changes against the rest of the peptidome. Compared to traditional aggregation-based approaches, the peptide set test demonstrates improved statistical power while still controlling the Type I error rate correctly in most cases. This test can be valuable for discovering novel biomarkers and prioritizing drug targets, especially when the direct application of statistical analysis to protein data fails to provide substantial insights.

Maintained by Junmin Wang. Last updated 5 months ago.

differentialexpression regression proteomics massspectrometry

0.5 match 2 stars 5.00 score 9 scripts

bioc

sSeq:Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size

The purpose of this package is to discover the genes that are differentially expressed between two conditions in RNA-seq experiments. Gene expression is measured in counts of transcripts and modeled with the Negative Binomial (NB) distribution using a shrinkage approach for dispersion estimation. The method-of-moments (MM) estimates for dispersion are shrunk towards an estimated target, which minimizes the average squared difference between the shrinkage estimates and the initial estimates. The exact per-gene probability under the NB model is calculated, and used to test the hypothesis that the expected expression of a gene in the two conditions follows the same NB distribution.

Maintained by Danni Yu. Last updated 5 months ago.

immunooncology rnaseq

0.5 match 4.98 score 4 scripts 2 dependents

paulgovan

eAnalytics:Dynamic Web-Based Analytics for the Energy Industry

A 'Shiny' web application for energy industry analytics. Take an overview of the industry, measure Key Performance Indicators, identify changes in the industry over time, and discover new relationships in the data.

Maintained by Paul Govan. Last updated 6 months ago.

analytics energy shiny shinydashboard visualization

0.5 match 34 stars 4.83 score 1 scripts

overton-group

eHDPrep:Quality Control and Semantic Enrichment of Datasets

A tool for the preparation and enrichment of health datasets for analysis (Toner et al. (2023) <doi:10.1093/gigascience/giad030>). Provides functionality for assessing data quality and for improving the reliability and machine interpretability of a dataset. 'eHDPrep' also enables semantic enrichment of a dataset where metavariables are discovered from the relationships between input variables determined from user-provided ontologies.

Maintained by Ian Overton. Last updated 2 years ago.

data-quality health-informatics semantic-enrichment

0.5 match 8 stars 4.90 score 10 scripts

bioc

methylscaper:Visualization of Methylation Data

methylscaper is an R package for processing and visualizing data jointly profiling methylation and chromatin accessibility (MAPit, NOMe-seq, scNMT-seq, nanoNOMe, etc.). The package supports both single-cell and single-molecule data, and a common interface for jointly visualizing both data types through the generation of ordered representational methylation-state matrices. The Shiny app allows for an interactive seriation process of refinement and re-weighting that optimally orders the cells or DNA molecules to discover methylation patterns and nucleosome positioning.

Maintained by Bacher Rhonda. Last updated 5 months ago.

dnamethylation epigenetics sequencing visualization singlecell nucleosomepositioning

0.5 match 1 stars 4.90 score 3 scripts

mpru

ggcleveland:Implementation of Plots from Cleveland's Visualizing Data Book

William S. Cleveland's book 'Visualizing Data' is a classic piece of literature on Exploratory Data Analysis. Although it was written several decades ago, its content is still relevant as it proposes several tools which are useful to discover patterns and relationships among the data under study, and also to assess the goodness of fit of a model. This package provides functions to produce the 'ggplot2' versions of the visualization tools described in this book and is intended for use in courses on Exploratory Data Analysis.

Maintained by Marcos Prunello. Last updated 3 years ago.

0.5 match 9 stars 4.83 score 15 scripts

bioc

sarks:Suffix Array Kernel Smoothing for discovery of correlative sequence motifs and multi-motif domains

Suffix Array Kernel Smoothing (see https://academic.oup.com/bioinformatics/article-abstract/35/20/3944/5418797), or SArKS, identifies sequence motifs whose presence correlates with numeric scores (such as differential expression statistics) assigned to the sequences (such as gene promoters). SArKS smooths over sequence similarity, quantified by location within a suffix array based on the full set of input sequences. A second round of smoothing over spatial proximity within sequences reveals multi-motif domains. Discovered motifs can then be merged or extended based on adjacency within MMDs. False positive rates are estimated and controlled by permutation testing.

Maintained by Dennis Wylie. Last updated 5 months ago.

motifdiscovery generegulation geneexpression transcriptomics rnaseq differentialexpression featureextraction openjdk

0.5 match 3 stars 4.78 score 3 scripts

thomasjemielita

StratifiedMedicine:Stratified Medicine

A toolkit for stratified medicine, subgroup identification, and precision medicine. Current tools include (1) filtering models (reduce covariate space), (2) patient-level estimate models (counterfactual patient-level quantities, such as the conditional average treatment effect), (3) subgroup identification models (find subsets of patients with similar treatment effects), and (4) treatment effect estimation and inference (for the overall population and discovered subgroups). These tools can be customized and are directly used in PRISM (patient response identifiers for stratified medicine; Jemielita and Mehrotra 2019 <arXiv:1912.03337>). This package is in beta and will be continually updated.

Maintained by Thomas Jemielita. Last updated 3 years ago.

0.5 match 2 stars 4.73 score 27 scripts

riccardo-df

aggTrees:Aggregation Trees

Nonparametric data-driven approach to discovering heterogeneous subgroups in a selection-on-observables framework. 'aggTrees' allows researchers to assess whether there exists relevant heterogeneity in treatment effects by generating a sequence of optimal groupings, one for each level of granularity. For each grouping, we obtain point estimation and inference about the group average treatment effects. Please reference the use as Di Francesco (2024) <doi:10.48550/arXiv.2410.11408>.

Maintained by Riccardo Di Francesco. Last updated 29 days ago.

0.5 match 4.60 score 4 scripts

cran

CASMI:'CASMI'-Based Functions

Contains Coverage Adjusted Standardized Mutual Information ('CASMI')-based functions. 'CASMI' is a fundamental concept of a series of methods. For more information about 'CASMI' and 'CASMI'-related methods, please refer to the corresponding publications (e.g., a feature selection method, Shi, J., Zhang, J., & Ge, Y. (2019) <doi:10.3390/e21121179>, and a dataset quality measurement method, Shi, J., Zhang, J., & Ge, Y. (2019) <doi:10.1109/ICHI.2019.8904553>) or contact the package author for the latest updates.

Maintained by Jingyi (Catherine) Shi. Last updated 30 days ago.

1.8 match 1.30 score

bioc

PAA:PAA (Protein Array Analyzer)

PAA imports single color (protein) microarray data that has been saved in gpr file format - esp. ProtoArray data. After preprocessing (background correction, batch filtering, normalization) univariate feature preselection is performed (e.g., using the "minimum M statistic" approach - hereinafter referred to as "mMs"). Subsequently, a multivariate feature selection is conducted to discover biomarker candidates. Therefore, either a frequency-based backwards elimination approach or ensemble feature selection can be used. PAA provides a complete toolbox of analysis tools including several different plots for results examination and evaluation.

Maintained by Michael Turewicz. Last updated 5 months ago.

classification microarray onechannel proteomics cpp

0.5 match 4.34 score 11 scripts

osysoev

psica:Decision Tree Analysis for Probabilistic Subgroup Identification with Multiple Treatments

When multiple alternative treatments or interventions are available, different population groups may respond differently to different treatments. This package implements a method that discovers the population subgroups in which a certain treatment has a better effect than the other alternative treatments. This is done by first estimating the treatment effect for a given treatment and its uncertainty by computing random forests, and the resulting model is summarized by a decision tree in which the probability that the given treatment is best for a given subgroup is shown in the corresponding terminal node of the tree.

Maintained by Oleg Sysoev. Last updated 5 years ago.

2.2 match 1.00 score 1 scripts

fbertran

networkABC:Network Reverse Engineering with Approximate Bayesian Computation

We developed an inference tool based on approximate Bayesian computation to decipher network data and assess the strength of the inferred links between the network's actors. It is a new multi-level approximate Bayesian computation (ABC) approach. At the first level, the method captures the global properties of the network, such as a scale-free structure and clustering coefficients, whereas the second level is targeted to capture local properties, including the probability of each couple of genes being linked. Up to now, Approximate Bayesian Computation (ABC) algorithms have been scarcely used in that setting and, due to the computational overhead, their application was limited to a small number of genes. In contrast, our algorithm was made to cope with that issue and has low computational cost. It can be used, for instance, for elucidating gene regulatory networks, which is an important step towards understanding the normal cell physiology and complex pathological phenotype. Reverse-engineering consists in using gene expressions over time or over different experimental conditions to discover the structure of the gene network in a targeted cellular process. The fact that gene expression data are usually noisy, highly correlated, and have high dimensionality explains the need for specific statistical methods to reverse engineer the underlying network.

Maintained by Frederic Bertrand. Last updated 2 years ago.

0.5 match 4 stars 4.34 score 11 scripts

swarm-lab

CEC:Cross-Entropy Clustering

Splits data into Gaussian type clusters using the Cross-Entropy Clustering ('CEC') method. This method allows for the simultaneous use of various types of Gaussian mixture models, for performing the reduction of unnecessary clusters, and for discovering new clusters by splitting them. 'CEC' is based on the work of Spurek, P. and Tabor, J. (2014) <doi:10.1016/j.patcog.2014.03.006>.

Maintained by Simon Garnier. Last updated 5 months ago.

clustering cross-entropy openblas cpp

0.5 match 10 stars 4.26 score 18 scripts

bioc

PDATK:Pancreatic Ductal Adenocarcinoma Tool-Kit

Pancreatic ductal adenocarcinoma (PDA) has a relatively poor prognosis and is one of the most lethal cancers. Molecular classification of gene expression profiles holds the potential to identify meaningful subtypes which can inform therapeutic strategy in the clinical setting. The Pancreatic Cancer Adenocarcinoma Tool-Kit (PDATK) provides an S4 class-based interface for performing unsupervised subtype discovery, cross-cohort meta-clustering, gene-expression-based classification, and subsequent survival analysis to identify prognostically useful subtypes in pancreatic cancer and beyond. Two novel methods, Consensus Subtypes in Pancreatic Cancer (CSPC) and Pancreatic Cancer Overall Survival Predictor (PCOSP) are included for consensus-based meta-clustering and overall-survival prediction, respectively. Additionally, four published subtype classifiers and three published prognostic gene signatures are included to allow users to easily recreate published results, apply existing classifiers to new data, and benchmark the relative performance of new methods. The use of existing Bioconductor classes as input to all PDATK classes and methods enables integration with existing Bioconductor datasets, including the 21 pancreatic cancer patient cohorts available in the MetaGxPancreas data package. PDATK has been used to replicate results from Sandhu et al (2019) [https://doi.org/10.1200/cci.18.00102] and an additional paper is in the works using CSPC to validate subtypes from the included published classifiers, both of which use the data available in MetaGxPancreas. The inclusion of subtype centroids and prognostic gene signatures from these and other publications will enable researchers and clinicians to classify novel patient gene expression data, allowing the direct clinical application of the classifiers included in PDATK. 
Overall, PDATK provides a rich set of tools to identify and validate useful prognostic and molecular subtypes based on gene-expression data, benchmark new classifiers against existing ones, and apply discovered classifiers on novel patient data to inform clinical decision making.

Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.

geneexpression pharmacogenetics pharmacogenomics software classification survival clustering geneprediction

0.5 match 1 stars 4.31 score 17 scripts

nystat

COLP:Causal Discovery for Categorical Data with Label Permutation

Discover causality for bivariate categorical data. This package aims to enable users to discover causality for bivariate observational categorical data. See Ni, Y. (2022) <arXiv:2209.08579> "Bivariate Causal Discovery for Categorical Data via Classification with Optimal Label Permutation. Advances in Neural Information Processing Systems 35 (in press)".

Maintained by Yang Ni. Last updated 2 years ago.

0.8 match 1 stars 2.70 score

olajumokeevangelina

MetabolicSurv:A Biomarker Validation Approach for Classification and Predicting Survival Using Metabolomics Signature

An approach to identify a metabolic biomarker signature for metabolic data by discovering predictive metabolites for predicting survival and classifying patients into risk groups. Classifiers are constructed as a linear combination of predictive/important metabolites, prognostic factors and treatment effects if necessary. Several methods were implemented to reduce the metabolomics matrix such as the principal component analysis of Wold Svante et al. (1987) <doi:10.1016/0169-7439(87)80084-9>, the LASSO method by Robert Tibshirani (1998) <doi:10.1002/(SICI)1097-0258(19970228)16:4%3C385::AID-SIM380%3E3.0.CO;2-3>, and the elastic net approach by Hui Zou and Trevor Hastie (2005) <doi:10.1111/j.1467-9868.2005.00503.x>. Sensitivity analysis on the quantile used for the classification can also be performed to check the deviation of the classification group based on the quantile specified. Large scale cross validation can be performed in order to investigate the most frequently selected predictive metabolites and for internal validation. During the evaluation process, validation is assessed using the hazard ratio (HR) distribution of the test set and inference is mainly based on resampling and permutation techniques.

Maintained by Olajumoke Evangelina Owokotomo. Last updated 6 years ago.

0.5 match 4.13 score 27 scripts

bupaverse

heuristicsmineR:Discovery of Process Models with the Heuristics Miner

Provides the heuristics miner algorithm for process discovery as proposed by Weijters et al. (2011) <doi:10.1109/CIDM.2011.5949453>. The algorithm builds a causal net from an event log created with the 'bupaR' package. Event logs are a set of ordered sequences of events for which 'bupaR' provides the S3 class eventlog(). The discovered causal nets can be visualised as 'htmlwidgets' and it is possible to annotate them with the occurrence frequency or processing and waiting time of process activities.

Maintained by Felix Mannhardt. Last updated 3 years ago.

bupar event-log heuristics-miner petri-net process-mining cpp

0.5 match 14 stars 4.08 score 17 scripts
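
The heuristicsmineR description above mentions building a causal net from an event log. A central ingredient of heuristics mining in the process-mining literature is a dependency measure computed from directly-follows counts; the Python sketch below illustrates that measure on a toy log. It assumes the commonly cited formula (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), with |a>a| / (|a>a| + 1) for length-one loops. The function names are hypothetical, and the package's own implementation and thresholds may differ.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence, Tuple


def directly_follows(traces: Iterable[Sequence[Hashable]]) -> Counter:
    """Count how often activity a is immediately followed by activity b."""
    counts: Counter = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def dependency(counts: Counter, a: Hashable, b: Hashable) -> float:
    """Dependency measure in (-1, 1); values near 1 suggest that a causes b.

    Assumed formula from the heuristics-miner literature, not read from
    the heuristicsmineR source.
    """
    ab = counts[(a, b)]
    if a == b:  # loop of length one
        return ab / (ab + 1)
    ba = counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


if __name__ == "__main__":
    # Toy event log: five traces a-b-c and one noisy trace a-c-b.
    log = [["a", "b", "c"]] * 5 + [["a", "c", "b"]]
    df = directly_follows(log)
    for pair in [("a", "b"), ("b", "c"), ("c", "b")]:
        print(pair, round(dependency(df, *pair), 3))
```

The `+ 1` in the denominator damps the measure for rarely observed pairs, so a relation seen only once cannot reach a dependency close to 1.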

bioc

survtype:Subtype Identification with Survival Data

Subtypes are defined as groups of samples that have distinct molecular and clinical features. Genomic data can be analyzed to discover patient subtypes associated with clinical data, especially survival information. This package aims to identify subtypes that are both clinically relevant and biologically meaningful.

Maintained by Dongmin Jung. Last updated 5 months ago.

software statisticalmethod geneexpression survival clustering sequencing coverage

0.5 match 4.00 score 3 scripts

s-u

scagnostics:Compute scagnostics - scatterplot diagnostics

Calculates graph theoretic scagnostics. Scagnostics describe various measures of interest for pairs of variables, based on their appearance on a scatterplot. They are a useful tool for discovering interesting or unusual scatterplots from a scatterplot matrix, without having to look at every individual plot.

Maintained by Simon Urbanek. Last updated 3 years ago.

openjdk

0.5 match 1 stars 3.81 score 87 scripts 1 dependents

bioc

RareVariantVis:A suite for analysis of rare genomic variants in whole genome sequencing data

The second version of the RareVariantVis package aims to provide comprehensive information about rare variants for your genome data. It annotates, filters and presents genomic variants (especially rare ones) in a global, per chromosome way. For discovered rare variants CRISPR guide RNAs are designed, so the user can plan further functional studies. Large structural variants, including copy number variants, are also supported. The package accepts variants directly from a variant caller, for example GATK or Speedseq. The package outputs lists of variants together with adequate visualization. Visualization of variants is performed in two ways: standard, which outputs png figures, and interactive, which uses the JavaScript d3 package. Interactive visualization allows analysis of trio/family data, for example in search for causative variants in rare Mendelian diseases, in a point-and-click interface. The package includes a homozygous region caller and allows analysis of whole human genomes in less than 30 minutes on a desktop computer. RareVariantVis disclosed novel causes of several rare monogenic disorders, including one with a non-coding causative variant: keratolytic winter erythema.

Maintained by Tomasz Stokowy. Last updated 5 months ago.

genomicvariation sequencing wholegenome

0.5 match 3.90 score 1 scripts

bioc

GeneNetworkBuilder:GeneNetworkBuilder: a bioconductor package for building regulatory network using ChIP-chip/ChIP-seq data and Gene Expression Data

Application for discovering direct or indirect targets of transcription factors using ChIP-chip or ChIP-seq, and microarray or RNA-seq gene expression data. Given a list of genes that are potential targets of one TF from ChIP-chip or ChIP-seq, and the gene expression results, GeneNetworkBuilder generates a regulatory network of the TF.

Maintained by Jianhong Ou. Last updated 9 days ago.

sequencing microarray graphandnetwork cpp

0.5 match 3.77 score 17 scripts

statuser

RGremlinsConjoint:Estimate the "Gremlins in the Data" Model for Conjoint Studies

The tools and utilities to estimate the model described in "Gremlins in the Data: Identifying the Information Content of Research Subjects" (Howell et al. (2021) <doi:10.1177/0022243720965930>) using conjoint analysis data such as that collected in Sawtooth Software's 'Lighthouse' or 'Discover' products. Additional utilities are included for formatting the input data.

Maintained by John Howell. Last updated 2 years ago.

0.5 match 3.70 score 6 scripts

n-t-huyen

MicrobiomeSurv:A Biomarker Validation Approach for Classification and Predicting Survival Using Microbiome Data

An approach to identify microbiome biomarkers for time-to-event data by discovering microbiome features for predicting survival and classifying subjects into risk groups. Classifiers are constructed as a linear combination of important microbiome features and treatment effects if necessary. Several methods were implemented to estimate the microbiome risk score such as the majority voting technique, LASSO, elastic net, supervised principal component analysis (SPCA), and supervised partial least squares analysis (SPLS). Sensitivity analysis on the quantile used for the classification can also be performed to check the deviation of the classification group based on the quantile specified. Large scale cross validation can be performed in order to investigate the most frequently selected microbiome features and for internal validation. During the evaluation process, validation is assessed using the hazard ratio (HR) distribution of the test set and inference is mainly based on resampling and permutation techniques.

Maintained by Thi Huyen Nguyen. Last updated 1 years ago.

0.5 match 3.70 score 2 scripts

bioc

pandaR:PANDA Algorithm

Runs PANDA, an algorithm for discovering novel network structure by combining information from multiple complementary data sources.

Maintained by Joseph N. Paulson. Last updated 5 months ago.

statisticalmethod graphandnetwork microarray generegulation networkinference geneexpression transcription network

0.5 match 3.30 score 8 scripts

joerigdon

nearfar:Near-Far Matching

Near-far matching is a study design technique for preprocessing observational data to mimic a pair-randomized trial. Individuals are matched to be near on measured confounders and far on levels of an instrumental variable. Methods outlined in further detail in Rigdon, Baiocchi, and Basu (2018) <doi:10.18637/jss.v086.c05>.

Maintained by Joseph Rigdon. Last updated 1 years ago.

1.6 match 1.08 score 12 scripts

yhenryli

PAC:Partition-Assisted Clustering and Multiple Alignments of Networks

Implements partition-assisted clustering and multiple alignments of networks. It 1) utilizes partition-assisted clustering to find robust and accurate clusters and 2) discovers coherent relationships of clusters across multiple samples. It is particularly useful for analyzing single-cell data sets. Please see Li et al. (2017) <doi:10.1371/journal.pcbi.1005875> for a detailed description of the method.

Maintained by Ye Henry Li. Last updated 4 years ago.

cpp

0.5 match 3.30 score 7 scripts

korydjohnson

rai:Revisiting-Alpha-Investing for Polynomial Regression

A modified implementation of stepwise regression that greedily searches the space of interactions among features in order to build polynomial regression models. Furthermore, the hypothesis tests conducted are valid post-model selection due to the use of a revisiting procedure that implements an alpha-investing rule. As a result, the set of rejected sequential hypotheses is proven to control the marginal false discovery rate. When not searching for polynomials, the package provides a statistically valid algorithm to run and terminate stepwise regression. For more information, see Johnson, Stine, and Foster (2019) <arXiv:1510.06322>.

Maintained by Kory D. Johnson. Last updated 3 years ago.

0.5 match 3 stars 3.18 score 7 scripts
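
The rai description above refers to an alpha-investing rule. As a hedged illustration of the general idea only, the Python sketch below runs plain alpha-investing over a stream of p-values: each test spends part of the current "alpha wealth", a rejection earns a fixed payout, and a non-rejection costs alpha_j / (1 - alpha_j). This is not the package's revisiting procedure. The function name, the initial wealth, the payout, and the rule of bidding half the current wealth are all assumptions chosen for the example.

```python
from typing import Iterable, List, Tuple


def alpha_investing(p_values: Iterable[float],
                    initial_wealth: float = 0.05,
                    payout: float = 0.05) -> Tuple[List[bool], float]:
    """Sequentially test hypotheses under a simple alpha-investing rule.

    Returns the rejection decisions and the remaining alpha wealth.
    Hypothetical sketch; the bidding rule below is an assumption.
    """
    wealth = initial_wealth
    decisions: List[bool] = []
    for p in p_values:
        # Bid so that a failed test costs exactly half of the current wealth:
        # alpha / (1 - alpha) == wealth / 2.
        stake = wealth / 2.0
        alpha = stake / (1.0 + stake)
        reject = p <= alpha
        if reject:
            wealth += payout                 # a discovery earns more wealth
        else:
            wealth -= alpha / (1.0 - alpha)  # a miss spends the stake
        decisions.append(reject)
    return decisions, wealth


if __name__ == "__main__":
    decisions, remaining = alpha_investing([0.001, 0.5, 0.03])
    print(decisions, remaining)
```

Because wealth grows after each rejection, later hypotheses can be tested at a more generous level when early tests succeed, which is what distinguishes alpha-investing from a fixed per-test threshold.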

cran

rcausim:Generate Causally-Simulated Data

Generate causally-simulated data to serve as ground truth for evaluating methods in causal discovery and effect estimation. The package provides tools to assist in defining functions based on specified edges, and conversely, defining edges based on functions. It enables the generation of data according to these predefined functions and causal structures. This is particularly useful for researchers in fields such as artificial intelligence, statistics, biology, medicine, epidemiology, economics, and social sciences, who are developing general or domain-specific methods to discover causal structures and estimate causal effects. Data simulation adheres to principles of structural causal modeling. Detailed methodologies and examples are documented in our vignette, available at <https://htmlpreview.github.io/?https://github.com/herdiantrisufriyana/rcausim/blob/master/doc/causal_simulation_exemplar.html>.

Maintained by Herdiantri Sufriyana. Last updated 9 months ago.

0.5 match 3.00 score

nystat

OrdCD:Ordinal Causal Discovery

Algorithms for ordinal causal discovery. This package aims to enable users to discover causality for observational ordinal categorical data with greedy and exhaustive search. See Ni, Y., & Mallick, B. (2022) "Ordinal Causal Discovery", Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence (UAI 2022), PMLR 180:1530–1540 <https://proceedings.mlr.press/v180/ni22a/ni22a.pdf>.

Maintained by Yang Ni. Last updated 2 years ago.

0.5 match 2.70 score

vankesteren

cmfilter:Coordinate-Wise Mediation Filter

Functions to discover, plot, and select multiple mediators from an x -> M -> y linear system. This exploratory mediation analysis is performed using the Coordinate-wise Mediation Filter as introduced by Van Kesteren and Oberski (2019) <doi:10.1080/10705511.2019.1588124>.

Maintained by Erik-Jan van Kesteren. Last updated 2 years ago.

openblas cpp openmp

0.5 match 4 stars 2.30 score 4 scripts

gjhunt

rrscale:Robust Re-Scaling to Better Recover Latent Effects in Data

Non-linear transformations of data to better discover latent effects. Applies a sequence of three transformations: (1) a Gaussianizing transformation, (2) a Z-score transformation, and (3) an outlier removal transformation. A publication describing the method has the following citation: Gregory J. Hunt, Mark A. Dane, James E. Korkola, Laura M. Heiser & Johann A. Gagnon-Bartsch (2020) "Automatic Transformation and Integration to Improve Visualization and Discovery of Latent Effects in Imaging Data", Journal of Computational and Graphical Statistics, <doi:10.1080/10618600.2020.1741379>.

Maintained by Gregory Hunt. Last updated 5 years ago.

0.5 match 2.30 score 9 scripts

kobiperl

mHG:Minimum-Hypergeometric Test

Runs a minimum-hypergeometric (mHG) test as described in: Eden, E. (2007). Discovering Motifs in Ranked Lists of DNA Sequences. Haifa.

Maintained by Kobi Perl. Last updated 8 years ago.

0.5 match 1 star 2.19 score 31 scripts
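
As a rough illustration of the minimum-hypergeometric idea named in the mHG entry, the sketch below scans every cutoff of a ranked 0/1 list and keeps the smallest hypergeometric tail probability. It is a minimal sketch, not the package's interface: the function names are hypothetical, and the returned minimum is a raw statistic that still needs a multiple-testing correction before it can be read as a p-value.

```python
from math import comb


def hypergeom_tail(total, successes, drawn, observed):
    """P(X >= observed) when drawing `drawn` items from `total`, of which `successes` are 1s."""
    upper = min(drawn, successes)
    numerator = sum(
        comb(drawn, i) * comb(total - drawn, successes - i)
        for i in range(observed, upper + 1)
    )
    return numerator / comb(total, successes)


def mhg_statistic(labels):
    """Return (smallest tail probability, cutoff) over all prefixes of a ranked 0/1 list."""
    total = len(labels)
    successes = sum(labels)
    best, best_cutoff = 1.0, 0
    seen = 0
    for cutoff, label in enumerate(labels, start=1):
        seen += label  # number of 1s among the top `cutoff` entries
        tail = hypergeom_tail(total, successes, cutoff, seen)
        if tail < best:
            best, best_cutoff = tail, cutoff
    return best, best_cutoff


if __name__ == "__main__":
    # 1 marks an item of interest; the list is ordered from best to worst rank.
    print(mhg_statistic([1, 1, 0, 1, 0, 0, 0, 0]))
```

Scanning all cutoffs removes the need to pick a "top n" threshold in advance, at the price of the correction noted above.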

cran

exploreR:Tools for Quickly Exploring Data

Simplifies some complicated and labor intensive processes involved in exploring and explaining data. Allows you to quickly and efficiently visualize the interaction between variables and simplifies the process of discovering covariation in your data. Also includes some convenience features designed to remove as much redundant typing as possible.

Maintained by Michael Coates. Last updated 9 years ago.

0.5 match 2.00 score

cran

DMtest:Differential Methylation Tests (DMtest)

Several tests for differential methylation in methylation array data, including one-sided differential mean and variance tests. The methods used in the package are described in Dai, J, Wang, X, Chen, H and others (2021) "Incorporating increased variability in discovering cancer methylation markers", Biostatistics, submitted.

Maintained by James Dai. Last updated 4 years ago.

0.5 match 2.00 score 3 scripts

empiricalbayes

LFDREmpiricalBayes:Estimating Local False Discovery Rates Using Empirical Bayes Methods

New empirical Bayes methods aiming at analyzing the association of single nucleotide polymorphisms (SNPs) to some particular disease are implemented in this package. The package uses local false discovery rate (LFDR) estimates of SNPs within a sample population defined as a "reference class" and discovers if SNPs are associated with the corresponding disease. Although SNPs are used throughout this document, other biological data such as protein data and other gene data can be used. Karimnezhad, Ali and Bickel, D. R. (2016) <http://hdl.handle.net/10393/34889>.

Maintained by Ali Karimnezhad. Last updated 7 years ago.

bayesian mathematicalbiology multiplecomparison

0.5 match 2.00 score 5 scripts

reimand0

ActiveDriver:Finding Cancer Driver Proteins with Enriched Mutations in Post-Translational Modification Sites

A mutation analysis tool that discovers cancer driver genes with frequent mutations in protein signalling sites such as post-translational modifications (phosphorylation, ubiquitination, etc). The Poisson generalised linear regression model identifies genes where cancer mutations in signalling sites are more frequent than expected from the sequence of the entire gene. Integration of mutations with signalling information helps find new driver genes and propose candidate mechanisms for known drivers. Reference: Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers. Juri Reimand and Gary D Bader. Molecular Systems Biology (2013) 9:637 <doi:10.1038/msb.2012.68>.

Maintained by Juri Reimand. Last updated 8 years ago.

0.5 match 2.00 score 6 scripts

neonira

wyz.code.metaTesting:Wizardry Code Meta Testing

Meta testing is the ability to test a function without having to provide its parameter values. Those values will be generated, based on semantic naming of parameters, as introduced by package 'wyz.code.offensiveProgramming'. Value generation logic can be extended with your own data types and generation schemes to meet your most specific requirements and to cover a wide variety of uses, from general use cases to very specific ones. With meta testing, it becomes easier to generate stress test campaigns, non-regression test campaigns and robustness test campaigns, as generated tests can be saved and reused from session to session. The main benefits of using 'wyz.code.metaTesting' are the ability to discover valid and invalid function parameter combinations, the ability to infer valid parameter values, and smart summaries that allow you to focus on dysfunctional cases.

Maintained by Fabien Gelineau. Last updated 1 year ago.

0.5 match 2.00 score

cran

squant:Subgroup Identification Based on Quantitative Objectives

A subgroup identification method for precision medicine based on quantitative objectives. This method can handle continuous, binary and survival endpoints for both prognostic and predictive cases. For the predictive case, the method aims at identifying a subgroup for which treatment is better than control by at least a pre-specified or auto-selected constant. For the prognostic case, the method aims at identifying a subgroup that is at least better than a pre-specified/auto-selected constant. The derived signature is a linear combination of predictors, and the selected subgroup consists of subjects with the signature > 0. The false discovery rate when no true subgroup exists is controlled at a user-specified level.

Maintained by YAN SUN. Last updated 7 months ago.

0.5 match 1.30 score

cheweichang1992

GGoutlieR:Identify Individuals with Unusual Geo-Genetic Patterns

Identify and visualize individuals with unusual association patterns of genetics and geography using the approach of Chang and Schmid (2023) <doi:10.1101/2023.04.06.535838>. It detects potential outliers that violate the isolation-by-distance assumption using the K-nearest neighbor approach. You can obtain a table of outliers with statistics and visualize unusual geo-genetic patterns on a geographical map. This is useful for landscape genomics studies to discover individuals with unusual geography and genetics associations from a large biological sample.

Maintained by Che-Wei Chang. Last updated 1 year ago.

0.5 match 1.18 score 15 scripts

cran

onc.api:Oceans 2.0 API Client Library

Allows users to discover and retrieve Ocean Networks Canada's oceanographic data in raw, text, image, audio, video or any other format available. Provides a class that wraps web service calls and business logic so that users can download data with a single line of code.

Maintained by Bennit Mueller. Last updated 4 years ago.

0.5 match 1.00 score 1 script

cran

costat:Time Series Costationarity Determination

Contains functions that can determine whether a time series is second-order stationary or not (and hence provide evidence for local stationarity). Given two non-stationary series (i.e. locally stationary series) this package can then discover time-varying linear combinations that are second-order stationary. Cardinali, A. and Nason, G.P. (2013) <doi:10.18637/jss.v055.i01>.

Maintained by Guy Nason. Last updated 2 years ago.

0.5 match 1.00 score

cbergmeir

opusminer:OPUS Miner Algorithm for Filtered Top-k Association Discovery

Provides a simple R interface to the OPUS Miner algorithm (implemented in C++) for finding the top-k productive, non-redundant itemsets from transaction data. The OPUS Miner algorithm uses the OPUS search algorithm to efficiently discover the key associations in transaction data, in the form of self-sufficient itemsets, using either leverage or lift. See <http://i.giwebb.com/index.php/research/association-discovery/> for more information in relation to the OPUS Miner algorithm.

Maintained by Christoph Bergmeir. Last updated 5 years ago.

cpp

0.5 match 1 star 1.00 score 2 scripts
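
The leverage and lift measures mentioned in the opusminer description can be illustrated with their common textbook definitions for a pair of itemsets X and Y. This is only a sketch under that assumption: it does not reproduce the OPUS search or the package's notion of self-sufficient itemsets, and the function names and toy transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for transaction in transactions if wanted <= set(transaction))
    return hits / len(transactions)


def leverage_and_lift(transactions, x, y):
    """Textbook leverage and lift for itemsets x and y.

    leverage = supp(x and y) - supp(x) * supp(y)
    lift     = supp(x and y) / (supp(x) * supp(y))
    """
    joint = support(transactions, set(x) | set(y))
    expected = support(transactions, x) * support(transactions, y)
    leverage = joint - expected
    lift = joint / expected if expected > 0 else float("nan")
    return leverage, lift


if __name__ == "__main__":
    # Toy transaction data, invented for this example.
    baskets = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
    print(leverage_and_lift(baskets, {"a"}, {"b"}))
```

Leverage measures the absolute gap between observed and expected co-occurrence, so it favours frequent associations, while lift is a ratio and can be large even for rare itemsets.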

syedhaider5

iDOS:Integrated Discovery of Oncogenic Signatures

A method to integrate molecular profiles of cancer patients (gene copy number and mRNA abundance) to identify candidate gain of function alterations. These candidate alterations can subsequently be tested further to discover cancer driver alterations. Briefly, this method tests for genomic correlates of mRNA dysregulation and prioritises those where DNA gains/amplifications are associated with elevated mRNA expression of the same gene. For details, see Haider S et al. (2016) "Genomic alterations underlie a pan-cancer metabolic shift associated with tumour hypoxia", Genome Biology, <https://pubmed.ncbi.nlm.nih.gov/27358048/>.

Maintained by Syed Haider. Last updated 1 year ago.

0.5 match 1.00 score 10 scripts