Showing 140 of 140 results
promidat
discoveR:Exploratory Data Analysis System
Performs an exploratory data analysis through a 'shiny' interface. It includes basic methods such as the mean, median, mode, normality test, among others. It also includes clustering techniques such as Principal Components Analysis, Hierarchical Clustering and the K-Means Method.
Maintained by Oldemar Rodriguez. Last updated 2 years ago.
50.0 match 3 stars 3.03 score 18 scripts
doi-usgs
nhdplusTools:NHDPlus Tools
Tools for traversing and working with National Hydrography Dataset Plus (NHDPlus) data. All methods implemented in 'nhdplusTools' are available in the NHDPlus documentation available from the US Environmental Protection Agency <https://www.epa.gov/waterdata/basic-information>.
Maintained by David Blodgett. Last updated 25 days ago.
5.6 match 87 stars 11.38 score 348 scripts 5 dependents
bioc
metagenomeSeq:Statistical analysis for sparse high-throughput sequencing
metagenomeSeq is designed to determine features (be it Operational Taxonomic Unit (OTU), species, etc.) that are differentially abundant between two or more groups of multiple samples. metagenomeSeq is designed to address the effects of both normalization and under-sampling of microbial communities on disease association detection and the testing of feature correlations.
Maintained by Joseph N. Paulson. Last updated 3 months ago.
immunooncology classification clustering geneticvariability differentialexpression microbiome metagenomics normalization visualization multiplecomparison sequencing software
5.2 match 69 stars 12.02 score 494 scripts 7 dependents
sergejruff
Virusparies:Visualize and Process Output from 'VirusHunterGatherer'
A collection of tools for downstream analysis of 'VirusHunterGatherer' output. Processing of hittables and plotting of results, enabling better interpretation, is made easier with the provided functions.
Maintained by Ruff Sergej. Last updated 3 months ago.
bioinformaticsdata-drivendiscoverdiscoveryggplot2graphical-tablehidden-markov-modelhmmlearnplotr-programmingsummary-statisticsvirusvirus-discoveryvirus-scanningvirusgatherervirushuntervirushuntergatherervisualization
10.0 match 1 star 4.49 score 28 scripts
rstudio
pins:Pin, Discover, and Share Resources
Publish data sets, models, and other R objects, making it easy to share them across projects and with your colleagues. You can pin objects to a variety of "boards", including local folders (to share on a networked drive or with 'DropBox'), 'Posit Connect', 'AWS S3', and more.
Maintained by Julia Silge. Last updated 1 month ago.
azuregcloudrpinsrsconnects3storage
3.1 match 321 stars 14.17 score 1.9k scripts 17 dependents
bioc
CatsCradle:This package provides methods for analysing spatial transcriptomics data and for discovering gene clusters
This package addresses two broad areas. It allows for in-depth analysis of spatial transcriptomic data by identifying tissue neighbourhoods. These are contiguous regions of tissue surrounding individual cells. 'CatsCradle' allows for the categorisation of neighbourhoods by the cell types contained in them and the genes expressed in them. In particular, it produces Seurat objects whose individual elements are neighbourhoods rather than cells. In addition, it enables the categorisation and annotation of genes by producing Seurat objects whose elements are genes.
Maintained by Michael Shapiro. Last updated 1 month ago.
biologicalquestion statisticalmethod geneexpression singlecell transcriptomics spatial
6.0 match 3 stars 6.50 score
rstudio
reticulate:Interface to 'Python'
Interface to 'Python' modules, classes, and functions. When calling into 'Python', R data types are automatically converted to their equivalent 'Python' types. When values are returned from 'Python' to R they are converted back to R types. Compatible with all versions of 'Python' >= 2.7.
Maintained by Tomasz Kalinowski. Last updated 1 day ago.
1.8 match 1.7k stars 21.07 score 18k scripts 427 dependents
bioc
tradeSeq:trajectory-based differential expression analysis for sequencing data
tradeSeq provides a flexible method for fitting regression models that can be used to find genes that are differentially expressed along one or multiple lineages in a trajectory. Based on the fitted models, it uses a variety of tests suited to answer different questions of interest, e.g. the discovery of genes for which expression is associated with pseudotime, or which are differentially expressed (in a specific region) along the trajectory. It fits a negative binomial generalized additive model (GAM) for each gene, and performs inference on the parameters of the GAM.
Maintained by Hector Roux de Bezieux. Last updated 5 months ago.
clustering regression timecourse differentialexpression geneexpression rnaseq sequencing software singlecell transcriptomics multiplecomparison visualization
3.7 match 247 stars 10.06 score 440 scripts
bioc
musicatk:Mutational Signature Comprehensive Analysis Toolkit
Mutational signatures are carcinogenic exposures or aberrant cellular processes that can cause alterations to the genome. We created musicatk (MUtational SIgnature Comprehensive Analysis ToolKit) to address shortcomings in versatility and ease of use in other pre-existing computational tools. Although many different types of mutational data have been generated, current software packages do not have a flexible framework to allow users to mix and match different types of mutations in the mutational signature inference process. Musicatk enables users to count and combine multiple mutation types, including SBS, DBS, and indels. Musicatk calculates replication strand, transcription strand and combinations of these features along with discovery from unique and proprietary genomic features associated with any mutation type. Musicatk also implements several methods for discovery of new signatures as well as methods to infer exposure given an existing set of signatures. Musicatk provides functions for visualization and downstream exploratory analysis including the ability to compare signatures between cohorts and find matching signatures in COSMIC V2 or COSMIC V3.
Maintained by Joshua D. Campbell. Last updated 5 months ago.
software biologicalquestion somaticmutation variantannotation
4.9 match 13 stars 7.02 score 20 scripts
dstgithub
GrpString:Patterns and Statistical Differences Between Two Groups of Strings
Methods include converting series of event names to strings, finding common patterns in a group of strings, discovering featured patterns when comparing two groups of strings as well as the number and starting position of each pattern in each string, obtaining transition matrix, computing transition entropy, statistically comparing the difference between two groups of strings, and clustering string groups. Event names can be any action names or labels such as events in log files or areas of interest (AOIs) in eye tracking research.
Maintained by Hui (Tom) Tang. Last updated 7 years ago.
9.2 match 2 stars 3.48 score 30 scripts
bioc
AnnotationHub:Client to access AnnotationHub resources
This package provides a client for the Bioconductor AnnotationHub web resource. The AnnotationHub web resource provides a central location where genomic files (e.g., VCF, bed, wig) and other resources from standard locations (e.g., UCSC, Ensembl) can be discovered. The resource includes metadata about each resource, e.g., a textual description, tags, and date of modification. The client creates and manages a local cache of files retrieved by the user, helping with quick and reproducible access.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
infrastructure dataimport gui thirdpartyclient core-package u24ca289073
2.2 match 17 stars 13.89 score 2.7k scripts 102 dependents
bioconductor
BiocManager:Access the Bioconductor Project Package Repository
A convenient tool to install and update Bioconductor packages.
Maintained by Marcel Ramos. Last updated 30 days ago.
1.9 match 74 stars 16.47 score 2.9k scripts 414 dependents
tidyverse
dplyr:A Grammar of Data Manipulation
A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
Maintained by Hadley Wickham. Last updated 12 days ago.
1.3 match 4.8k stars 24.68 score 659k scripts 7.8k dependents
sparklyr
sparklyr:R Interface to Apache Spark
R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.
Maintained by Edgar Ruiz. Last updated 9 days ago.
apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr
2.0 match 959 stars 15.16 score 4.0k scripts 21 dependents
molgenis
MolgenisAuth:'OpenID Connect' Discovery and Authentication
Discover 'OpenID Connect' endpoints and authenticate using device flow. Used by 'MOLGENIS' packages.
Maintained by Mariska Slofstra. Last updated 7 months ago.
5.4 match 8 stars 5.58 score 5 scripts 2 dependents
bioc
cellxgenedp:Discover and Access Single Cell Data Sets in the CELLxGENE Data Portal
The cellxgene data portal (https://cellxgene.cziscience.com/) provides a graphical user interface to collections of single-cell sequence data processed in standard ways to 'count matrix' summaries. The cellxgenedp package provides an alternative, R-based interface, allowing data discovery, viewing, and downloading.
Maintained by Martin Morgan. Last updated 5 months ago.
singlecell dataimport thirdpartyclient
4.5 match 8 stars 6.64 score 27 scripts
bioc
motifbreakR:A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites
We introduce motifbreakR, which allows the biologist to judge in the first place whether the sequence surrounding the polymorphism is a good match, and in the second place how much information is gained or lost in one allele of the polymorphism relative to another. MotifbreakR is both flexible and extensible over previous offerings; giving a choice of algorithms for interrogation of genomes with motifs from public sources that users can choose from; these are 1) a weighted-sum probability matrix, 2) log-probabilities, and 3) weighted by relative entropy. MotifbreakR can predict effects for novel or previously described variants in public databases, making it suitable for tasks beyond the scope of its original design. Lastly, it can be used to interrogate any genome curated within Bioconductor (currently there are 32 species, a total of 109 versions).
Maintained by Simon Gert Coetzee. Last updated 5 months ago.
chipseq visualization motifannotation transcription
3.2 match 28 stars 8.96 score 103 scripts
bioc
DESpace:DESpace: a framework to discover spatially variable genes
Intuitive framework for identifying spatially variable genes (SVGs) via edgeR, a popular method for performing differential expression analyses. Based on pre-annotated spatial clusters as summarized spatial information, DESpace models gene expression using a negative binomial (NB), via edgeR, with spatial clusters as covariates. SVGs are then identified by testing the significance of spatial clusters. The method is flexible and robust, and is faster than most SV methods. Furthermore, to the best of our knowledge, it is the only SV approach that allows: - performing a SV test on each individual spatial cluster, hence identifying the key regions of the tissue affected by spatial variability; - jointly fitting multiple samples, targeting genes with consistent spatial patterns across replicates.
Maintained by Peiying Cai. Last updated 5 months ago.
spatial singlecell rnaseq transcriptomics geneexpression sequencing differentialexpression statisticalmethod visualization
5.7 match 4 stars 5.02 score 13 scripts
bioc
DifferentialRegulation:Differentially regulated genes from scRNA-seq data
DifferentialRegulation is a method for detecting differentially regulated genes between two groups of samples (e.g., healthy vs. disease, or treated vs. untreated samples), by targeting differences in the balance of spliced and unspliced mRNA abundances, obtained from single-cell RNA-sequencing (scRNA-seq) data. From a mathematical point of view, DifferentialRegulation accounts for the sample-to-sample variability, and embeds multiple samples in a Bayesian hierarchical model. Furthermore, our method also deals with two major sources of mapping uncertainty: i) 'ambiguous' reads, compatible with both spliced and unspliced versions of a gene, and ii) reads mapping to multiple genes. In particular, ambiguous reads are treated separately from spliced and unspliced reads, while reads that are compatible with multiple genes are allocated to the gene of origin. Parameters are inferred via Markov chain Monte Carlo (MCMC) techniques (Metropolis-within-Gibbs).
Maintained by Simone Tiberi. Last updated 5 months ago.
differentialsplicing bayesian genetics rnaseq sequencing differentialexpression geneexpression multiplecomparison software transcription statisticalmethod visualization singlecell genetarget openblas cpp
5.2 match 10 stars 5.30 score 4 scripts
ropensci
gert:Simple Git Client for R
Simple git client for R based on 'libgit2' <https://libgit2.org> with support for SSH and HTTPS remotes. All functions in 'gert' use basic R data types (such as vectors and data-frames) for their arguments and return values. User credentials are shared with command line 'git' through the git-credential store and ssh keys stored on disk or ssh-agent.
Maintained by Jeroen Ooms. Last updated 4 months ago.
1.8 match 154 stars 14.82 score 158 scripts 369 dependents
jeroen
curl:A Modern and Flexible Web Client for R
Bindings to 'libcurl' <https://curl.se/libcurl/> for performing fully configurable HTTP/FTP requests where responses can be processed in memory, on disk, or streaming via the callback or connection interfaces. Some knowledge of 'libcurl' is recommended; for a more-user-friendly web client see the 'httr2' package which builds on this package with http specific tools and logic.
Maintained by Jeroen Ooms. Last updated 22 days ago.
1.3 match 225 stars 19.95 score 4.0k scripts 5.8k dependents
shabbychef
cocktailApp:'shiny' App to Discover Cocktails
A 'shiny' app to discover cocktails. The app allows one to search for cocktails by ingredient, filter on rating, and number of ingredients. The package also contains data with the ingredients of nearly 26 thousand cocktails scraped from the web.
Maintained by Steven E. Pav. Last updated 3 years ago.
5.5 match 43 stars 4.33 score 5 scripts
chanzuckerberg
cellxgene.census:CZ CELLxGENE Discover Cell Census
API to facilitate the use of the CZ CELLxGENE Discover Census. For more information about the API and the project visit https://github.com/chanzuckerberg/cellxgene-census/
Maintained by Chan Zuckerberg Initiative Foundation. Last updated 5 months ago.
3.5 match 96 stars 6.60 score 15 scripts
daattali
addinslist:Discover and Install Useful RStudio Addins
Browse through a continuously updated list of existing RStudio addins and install/uninstall their corresponding packages.
Maintained by Dean Attali. Last updated 7 months ago.
3.0 match 850 stars 7.73 score 18 scripts
ropensci
dataaimsr:AIMS Data Platform API Client
AIMS Data Platform API Client which provides easy access to AIMS Data Platform scientific data and information.
Maintained by Diego R. Barneche. Last updated 2 years ago.
aimsaustraliadatamarinemonitoringsstweather
4.5 match 4 stars 5.11 score 54 scripts
bioc
ExperimentHub:Client to access ExperimentHub resources
This package provides a client for the Bioconductor ExperimentHub web resource. ExperimentHub provides a central location where curated data from experiments, publications or training courses can be accessed. Each resource has associated metadata, tags and date of modification. The client creates and manages a local cache of files retrieved enabling quick and reproducible access.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
infrastructure dataimport gui thirdpartyclient core-package u24ca289073
1.7 match 9 stars 11.98 score 764 scripts 55 dependents
tiledb-inc
tiledb:Modern Database Engine for Complex Data Based on Multi-Dimensional Arrays
The modern database 'TileDB' introduces a powerful on-disk format for storing and accessing any complex data based on multi-dimensional arrays. It supports dense and sparse arrays, dataframes and key-values stores, cloud storage ('S3', 'GCS', 'Azure'), chunked arrays, multiple compression, encryption and checksum filters, uses a fully multi-threaded implementation, supports parallel I/O, data versioning ('time travel'), metadata and groups. It is implemented as an embeddable cross-platform C++ library with APIs from several languages, and integrations. This package provides the R support.
Maintained by Isaiah Norton. Last updated 4 days ago.
array hdfs s3 storage-manager tiledb cpp
1.7 match 107 stars 11.96 score 306 scripts 4 dependents
ropensci
crul:HTTP Client
A simple HTTP client, with tools for making HTTP requests, and mocking HTTP requests. The package is built on R6, and takes inspiration from Ruby's 'faraday' gem (<https://rubygems.org/gems/faraday>). The package name is a play on curl, the widely used command line tool for HTTP, and this package is built on top of the R package 'curl', an interface to 'libcurl' (<https://curl.se/libcurl/>).
Maintained by Scott Chamberlain. Last updated 8 months ago.
http https api web-services curl download libcurl async mocking caching
1.3 match 107 stars 14.00 score 240 scripts 162 dependents
cran4linux
bspm:Bridge to System Package Manager
Enables binary package installations on Linux distributions. Provides functions to manage packages via the distribution's package manager. Also provides transparent integration with R's install.packages() and a fallback mechanism. When installed as a system package, interacts with the system's package manager without requiring administrative privileges via an integrated D-Bus service; otherwise, uses sudo. Currently, the following backends are supported: DNF, APT, ALPM.
Maintained by Iñaki Ucar. Last updated 5 months ago.
3.0 match 82 stars 6.19 score 2 scripts
bioc
GenomicScores:Infrastructure to work with genomewide position-specific scores
Provide infrastructure to store and access genomewide position-specific scores within R and Bioconductor.
Maintained by Robert Castelo. Last updated 1 month ago.
infrastructure genetics annotation sequencing coverage annotationhubsoftware
2.0 match 8 stars 8.71 score 83 scripts 6 dependents
r-lib
gargle:Utilities for Working with Google APIs
Provides utilities for working with Google APIs <https://developers.google.com/apis-explorer>. This includes functions and classes for handling common credential types and for preparing, executing, and processing HTTP requests.
Maintained by Jennifer Bryan. Last updated 2 years ago.
1.2 match 113 stars 14.88 score 266 scripts 192 dependents
bioc
DiscoRhythm:Interactive Workflow for Discovering Rhythmicity in Biological Data
Set of functions for estimation of cyclical characteristics, such as period, phase, amplitude, and statistical significance in large temporal datasets. Supporting functions are available for quality control, dimensionality reduction, spectral analysis, and analysis of experimental replicates. Contains an R Shiny web interface to execute all workflow steps.
Maintained by Matthew Carlucci. Last updated 5 months ago.
software timecourse qualitycontrol visualization gui principalcomponent bioconductor data-visualization oscillations rhythm-detection webapp
2.9 match 13 stars 5.89 score 9 scripts
bioc
cBioPortalData:Exposes and Makes Available Data from the cBioPortal Web Resources
The cBioPortalData R package accesses study datasets from the cBio Cancer Genomics Portal. It accesses the data either from the pre-packaged zip / tar files or from the API interface that was recently implemented by the cBioPortal Data Team. The package can provide data in either tabular format or with MultiAssayExperiment object that uses familiar Bioconductor data representations.
Maintained by Marcel Ramos. Last updated 9 days ago.
software infrastructure thirdpartyclient bioconductor-package nci-itcr u24ca289073
1.5 match 33 stars 10.15 score 147 scripts 4 dependents
ropensci
rotl:Interface to the 'Open Tree of Life' API
An interface to the 'Open Tree of Life' API to retrieve phylogenetic trees, information about studies used to assemble the synthetic tree, and utilities to match taxonomic names to 'Open Tree identifiers'. The 'Open Tree of Life' aims at assembling a comprehensive phylogenetic tree for all named species.
Maintained by Francois Michonneau. Last updated 2 years ago.
metadata ropensci phylogenetics independant-contrasts biodiversity peer-reviewed phylogeny taxonomy
1.1 match 40 stars 12.05 score 356 scripts 29 dependents
bioc
MotifPeeker:Benchmarking Epigenomic Profiling Methods Using Motif Enrichment
MotifPeeker is used to compare and analyse datasets from epigenomic profiling methods with motif enrichment as the key benchmark. The package outputs an HTML report consisting of three sections: (1. General Metrics) Overview of peaks-related general metrics for the datasets (FRiP scores, peak widths and motif-summit distances). (2. Known Motif Enrichment Analysis) Statistics for the frequency of user-provided motifs enriched in the datasets. (3. De-Novo Motif Enrichment Analysis) Statistics for the frequency of de-novo discovered motifs enriched in the datasets and compared with known motifs.
Maintained by Hiranyamaya Dash. Last updated 2 months ago.
epigenetics genetics qualitycontrol chipseq multiplecomparison functionalgenomics motifdiscovery sequencematching software alignment bioconductor bioconductor-package chip-seq epigenomics interactive-report motif-enrichment-analysis
2.5 match 2 stars 5.48 score 6 scripts
pursuitofdatascience
tidyEmoji:Discovers Emoji from Text
Unicodes are not friendly to work with, and not all Unicodes are Emoji per se, making obtaining Emoji statistics a difficult task. This tool helps make your experience of working with Emoji as smooth as possible, as it follows the 'tidyverse' style.
Maintained by Youzhi Yu. Last updated 2 years ago.
3.3 match 2 stars 4.00 score 7 scripts
bioc
methodical:Discovering genomic regions where methylation is strongly associated with transcriptional activity
DNA methylation is generally considered to be associated with transcriptional silencing. However, comprehensive, genome-wide investigation of this relationship requires the evaluation of potentially millions of correlation values between the methylation of individual genomic loci and expression of associated transcripts in a relatively large number of samples. Methodical makes this process quick and easy while keeping a low memory footprint. It also provides a novel method for identifying regions where a number of methylation sites are consistently strongly associated with transcriptional expression. In addition, Methodical enables housing DNA methylation data from diverse sources (e.g. WGBS, RRBS and methylation arrays) with a common framework, lifting over DNA methylation data between different genome builds and creating base-resolution plots of the association between DNA methylation and transcriptional activity at transcriptional start sites.
Maintained by Richard Heery. Last updated 2 months ago.
dnamethylation methylationarray transcription genomewideassociation software openjdk
2.8 match 4.65 score 14 scripts
bioc
TrajectoryGeometry:This Package Discovers Directionality in Time and Pseudo-times Series of Gene Expression Patterns
Given a time series or pseudo-times series of gene expression data, we might wish to know: Do the changes in gene expression in these data exhibit directionality? Are there turning points in this directionality? Do different subsets of the data move in different directions? This package uses spherical geometry to probe these sorts of questions. In particular, if we are looking at (say) the first n dimensions of the PCA of gene expression, directionality can be detected as the clustering of points on the (n-1)-dimensional sphere.
Maintained by Michael Shapiro. Last updated 5 months ago.
biologicalquestion statisticalmethod geneexpression singlecell
2.8 match 4.60 score 7 scripts
jbdorey
BeeBDC:Occurrence Data Cleaning
Flags and checks occurrence data that are in Darwin Core format. The package includes generic functions and data as well as some that are specific to bees. This package is meant to build upon and be complementary to other excellent occurrence cleaning packages, including 'bdc' and 'CoordinateCleaner'. This package uses datasets from several sources and particularly from the Discover Life Website, created by Ascher and Pickering (2020). For further information, please see the original publication and package website. Publication - Dorey et al. (2023) <doi:10.1101/2023.06.30.547152> and package website - Dorey et al. (2023) <https://github.com/jbdorey/BeeBDC>.
Maintained by James B. Dorey. Last updated 4 months ago.
2.2 match 3 stars 5.68 score 7 scripts
bioc
rcellminer:rcellminer: Molecular Profiles, Drug Response, and Chemical Structures for the NCI-60 Cell Lines
The NCI-60 cancer cell line panel has been used over the course of several decades as an anti-cancer drug screen. This panel was developed as part of the Developmental Therapeutics Program (DTP, http://dtp.nci.nih.gov/) of the U.S. National Cancer Institute (NCI). Thousands of compounds have been tested on the NCI-60, which have been extensively characterized by many platforms for gene and protein expression, copy number, mutation, and others (Reinhold, et al., 2012). The purpose of the CellMiner project (http://discover.nci.nih.gov/cellminer) has been to integrate data from multiple platforms used to analyze the NCI-60 and to provide a powerful suite of tools for exploration of NCI-60 data.
Maintained by Augustin Luna. Last updated 5 months ago.
acgh cellbasedassays copynumbervariation geneexpression pharmacogenomics pharmacogenetics mirna cheminformatics visualization software systemsbiology
2.1 match 5.71 score 113 scripts
idigbio
ridigbio:Interface to the iDigBio Data API
An interface to iDigBio's search API that allows downloading specimen records. Searches are returned as a data.frame. Other functions such as the metadata end points return lists of information. iDigBio is a US project focused on digitizing and serving museum specimen collections on the web. See <https://www.idigbio.org> for information on iDigBio.
Maintained by Jesse Bennett. Last updated 4 days ago.
1.2 match 16 stars 10.23 score 63 scripts 7 dependents
aravind-j
PGRdup:Discover Probable Duplicates in Plant Genetic Resources Collections
Provides functions to aid the identification of probable/possible duplicates in Plant Genetic Resources (PGR) collections using 'passport databases' comprising information records of each constituent sample. These include methods for cleaning the data, creation of a searchable Key Word in Context (KWIC) index of keywords associated with sample records and the identification of nearly identical records with similar information by fuzzy, phonetic and semantic matching of keywords.
Maintained by J. Aravind. Last updated 2 years ago.
double-metaphone double-metaphone-algorithm natural-language-processing pgr plant-genetic-resources record-linkage
2.9 match 1 star 4.06 score 23 scripts
bioc
deltaCaptureC:This Package Discovers Meso-scale Chromatin Remodeling from 3C Data
This package discovers meso-scale chromatin remodelling from 3C data. 3C data is local in nature. It gives interaction counts between restriction enzyme digestion fragments and a preferred 'viewpoint' region. By binning this data and using permutation testing, this package can test whether there are statistically significant changes in the interaction counts between the data from two cell types or two treatments.
Maintained by Michael Shapiro. Last updated 5 months ago.
biologicalquestion statisticalmethod
3.3 match 3.48 score 1 script
youhuachen
RSE:Number of Newly Discovered Rare Species Estimation
A Bayesian-weighted estimator and two unweighted estimators are developed to estimate the number of newly found rare species in additional ecological samples. Among these methods, the Bayesian-weighted estimator and an unweighted (Chao-derived) estimator are of high accuracy and recommended for practical applications. Technical details of the proposed estimators have been well described in the following paper: Shen TJ, Chen YH (2018) A Bayesian weighted approach to predicting the number of newly discovered rare species. Conservation Biology, In press.
Maintained by Youhua Chen. Last updated 6 years ago.
5.2 match 2.03 score 18 scripts
paolodalena
tastypie:Easy Pie Charts
You only need to type 'why pie charts are bad' on Google to find thousands of articles full of (valid) reasons why other types of charts should be preferred over this one. Therefore, because of the little use due to the reasons already mentioned, making pie charts (and related) in R is not straightforward, so other functions are needed to simplify things. In this R package there are useful functions to make 'tasty' pie charts immediately by exploiting the many cool templates provided.
Maintained by Paolo Dalena. Last updated 2 years ago.
2.0 match 15 stars 5.24 score 23 scripts
bioc
EnrichDO:a Global Weighted Model for Disease Ontology Enrichment Analysis
This package implements disease ontology (DO) enrichment analysis using a double weighted model based on the latest annotations of the human genome with DO terms, integrating the DO graph topology on a global scale. It identifies more specific DO terms with high accuracy, which alleviates the over-enrichment problem. The package includes various statistical models and visualization schemes for discovering the associations between genes and diseases from biological big data.
Maintained by Hongyu Fu. Last updated 4 months ago.
annotation visualization genesetenrichment software
2.1 match 4.74 score 9 scripts
argocanada
argoFloats:Analysis of Oceanographic Argo Floats
Supports the analysis of oceanographic data recorded by Argo autonomous drifting profiling floats. Functions are provided to (a) download and cache data files, (b) subset data in various ways, (c) handle quality-control flags and (d) plot the results according to oceanographic conventions. A shiny app is provided for easy exploration of datasets. The package is designed to work well with the 'oce' package, providing a wide range of processing capabilities that are particular to oceanographic analysis. See Kelley, Harbin, and Richards (2021) <doi:10.3389/fmars.2021.635922> for more on the scientific context and applications.
Maintained by Dan Kelley. Last updated 30 days ago.
1.3 match 17 stars 7.32 score 203 scripts
nixtla
nixtlar:A Software Development Kit for 'Nixtla''s 'TimeGPT'
A Software Development Kit for working with 'Nixtla''s 'TimeGPT', a foundation model for time series forecasting. 'API' is an acronym for 'application programming interface'; this package allows users to interact with 'TimeGPT' via the 'API'. You can set and validate 'API' keys and generate forecasts via 'API' calls. It is compatible with 'tsibble' and base R. For more details visit <https://docs.nixtla.io/>.
Maintained by Mariana Menchero. Last updated 27 days ago.
1.2 match 30 stars 8.16 score 38 scripts
bioc
Harman:The removal of batch effects from datasets using a PCA and constrained optimisation based technique
Harman is a PCA and constrained optimisation based technique that maximises the removal of batch effects from datasets, with the constraint that the probability of overcorrection (i.e. removing genuine biological signal along with batch noise) is kept to a fraction which is set by the end-user.
Maintained by Jason Ross. Last updated 5 months ago.
batcheffectmicroarraymultiplecomparisonprincipalcomponentnormalizationpreprocessingdnamethylationtranscriptionsoftwarestatisticalmethodcpp
1.9 match 4.97 score 31 scripts 1 dependent
bioc
VaSP:Quantification and Visualization of Variations of Splicing in Population
Discovery of genome-wide variable alternative splicing events from short-read RNA-seq data and visualizations of gene splicing information for publication-quality multi-panel figures in a population. (Warning: The visualization function has been removed because the dependency package Sushi is deprecated. To use it, please revert to an older version.)
Maintained by Huihui Yu. Last updated 5 months ago.
rnaseqalternativesplicingdifferentialsplicingstatisticalmethodvisualizationpreprocessingclusteringdifferentialexpressionkeggimmunooncology3s-scoresalternative-splicingballgownrna-seqsplicingsqtlstatistics
1.9 match 3 stars 4.78 score 3 scripts
netique
buildr:Organize & Run Build Scripts Comfortably
Working with reproducible reports or other similar projects often requires running the script that builds the output file in a specified way. 'buildr' can help you organize, modify and comfortably run those scripts. The package provides a set of functions that interactively guide you through the process and that are available as an 'RStudio' Addin, meaning you can set up keyboard shortcuts that let you choose and run the desired build script with one keystroke, anywhere and anytime.
Maintained by Jan Netik. Last updated 11 months ago.
addinbuildrkeyboard-shortcutrstudio-addin
1.8 match 15 stars 4.88 score 8 scripts
jiefei-wang
aws.ecx:Communicating with AWS EC2 and ECS using AWS REST APIs
Provides functions for communicating with Amazon Web Services (AWS) Elastic Compute Cloud (EC2) and Elastic Container Service (ECS). The functions have the prefix 'ecs_' or 'ec2_' depending on the class of the API. Requests are sent via the REST API and the parameters are given by the function arguments. The credentials can be set via 'aws_set_credentials'. The EC2 documentation can be found at <https://docs.aws.amazon.com/AWSEC2/latest/APIReference/Welcome.html> and the ECS documentation can be found at <https://docs.aws.amazon.com/AmazonECS/latest/APIReference/Welcome.html>.
Maintained by Jiefei Wang. Last updated 3 years ago.
2.0 match 1 star 4.18 score 2 scripts
ropensci
qcoder:Lightweight Qualitative Coding
A free, lightweight, open source option for analyzing text-based qualitative data. Enables analysis of interview transcripts, observation notes, memos, and other sources. Supports the work of social scientists, historians, humanists, and other researchers who use qualitative methods. Addresses the unique challenges faced in analyzing qualitative data. Provides opportunities for researchers who otherwise might not develop software to build software development skills.
Maintained by Elin Waring. Last updated 3 years ago.
1.7 match 134 stars 5.05 score 13 scripts
bioc
DOSE:Disease Ontology Semantic and Enrichment analysis
This package implements five methods proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively for measuring semantic similarities among DO terms and gene products. Enrichment analyses including hypergeometric model and gene set enrichment analysis are also implemented for discovering disease associations of high-throughput biological data.
Maintained by Guangchuang Yu. Last updated 5 months ago.
annotationvisualizationmultiplecomparisongenesetenrichmentpathwayssoftwaredisease-ontologyenrichment-analysissemantic-similarity
0.5 match 119 stars 14.97 score 2.0k scripts 61 dependents
s-fleck
rotor:Log Rotation and Conditional Backups
Conditionally rotate or back-up files based on their size or the date of the last backup; inspired by the 'Linux' utility 'logrotate'.
Maintained by Stefan Fleck. Last updated 2 years ago.
backuplogginglogrotatelogrotation
2.0 match 12 stars 3.78 score 10 scripts
ftwkoopmans
goat:Gene Set Analysis Using the Gene Set Ordinal Association Test
Perform gene set enrichment analyses using the Gene set Ordinal Association Test (GOAT) algorithm and visualize your results. Koopmans, F. (2024) <doi:10.1038/s42003-024-06454-5>.
Maintained by Frank Koopmans. Last updated 22 days ago.
bioinformaticsgeneset-enrichmentgeneset-enrichment-analysiscppopenmp
1.7 match 10 stars 4.40 score 8 scripts
doi-usgs
dataRetrieval:Retrieval Functions for USGS and EPA Hydrology and Water Quality Data
Collection of functions to help retrieve U.S. Geological Survey and U.S. Environmental Protection Agency water quality and hydrology data from web services. Data are discovered from National Water Information System <https://waterservices.usgs.gov/> and <https://waterdata.usgs.gov/nwis>. Water quality data are obtained from the Water Quality Portal <https://www.waterqualitydata.us/>.
Maintained by Laura DeCicco. Last updated 17 days ago.
0.5 match 280 stars 14.18 score 1.7k scripts 15 dependents
molevolepid
SEEPS:Sequence evolution and epidemiological process simulator
A modular, modern simulation suite and toolkit for simulating transmission networks, phylogenies, and evolutionary pairwise distance matrices under different models and assumptions for viral/sequence evolution. While initially developed for HIV, SEEPS offers modular utilities for custom workflows that extend beyond HIV.
Maintained by Michael Kupperman. Last updated 2 months ago.
biological-sequencesepidemiologyevolutionhivsimulation-framework
1.8 match 1 star 3.95 score 6 scripts
bioc
Rcollectl:Help use collectl with R in Linux, to measure resource consumption in R processes
Provides functions to obtain instrumentation data on processes in a Unix environment. Parses the output of a collectl run. Visualizes aspects of system usage over time, with annotation.
Maintained by Vincent Carey. Last updated 5 months ago.
1.8 match 3 stars 3.95 score 7 scripts
ryanzomorrodi
healthatlas:Explore and Import 'Metopio' Health Atlas Data and Spatial Layers
Allows for painless use of the 'Metopio' health atlas APIs <https://metopio.com/how-it-works/atlas/> to explore and import data. 'Metopio' health atlases store open public health data. See what topics (or indicators) are available among specific populations, periods, and geographic layers. Download relevant data along with geographic boundaries or point datasets. Spatial datasets are returned as 'sf' objects.
Maintained by Ryan Zomorrodi. Last updated 14 days ago.
1.5 match 1 star 4.60 score 5 scripts
cran
texteffect:Discovering Latent Treatments in Text Corpora and Estimating Their Causal Effects
Implements the approach described in Fong and Grimmer (2016) <https://aclweb.org/anthology/P/P16/P16-1151.pdf> for automatically discovering latent treatments from a corpus and estimating the average marginal component effect (AMCE) of each treatment. The data is divided into a training and test set. The supervised Indian Buffet Process (sibp) is used to discover latent treatments in the training set. The fitted model is then applied to the test set to infer the values of the latent treatments in the test set. Finally, Y is regressed on the latent treatments in the test set to estimate the causal effect of each treatment.
Maintained by Christian Fong. Last updated 6 years ago.
5.3 match 2 stars 1.30 score 7 scripts
rcarragh
c212:Methods for Detecting Safety Signals in Clinical Trials Using Body-Systems (System Organ Classes)
Provides a self-contained set of methods to aid clinical trial safety investigators, statisticians and researchers, in the early detection of adverse events using groupings by body-system or system organ class. This work was supported by the Engineering and Physical Sciences Research Council (UK) (EPSRC) [award reference 1521741] and Frontier Science (Scotland) Ltd. The package title c212 is in reference to the original Engineering and Physical Sciences Research Council (UK) funded project which was named CASE 2/12.
Maintained by Raymond Carragher. Last updated 4 months ago.
1.7 match 4.06 score 57 scripts
haythorn
sr:Smooth Regression - The Gamma Test and Tools
Finds causal connections in precision data, finds lags and embeddings in time series, guides training of neural networks and other smooth models, evaluates their performance, gives a mathematically grounded answer to the over-training problem. Smooth regression is based on the Gamma test, which measures smoothness in a multivariate relationship. Causal relations are smooth, noise is not. 'sr' includes the Gamma test and search techniques that use it. References: Evans & Jones (2002) <doi:10.1098/rspa.2002.1010>, AJ Jones (2004) <doi:10.1007/s10287-003-0006-1>.
Maintained by Wayne Haythorn. Last updated 2 years ago.
1.8 match 3.70 score 9 scripts
hrbrmstr
wand:Retrieve Magic Attributes from Files and Directories
MIME types are shorthand descriptors for file contents and can be determined from "magic" bytes in file headers, file contents or intuited from file extensions. Tools are provided to perform curated "magic" tests as well as mapping MIME types from a database of over 1,800 extension mappings.
Maintained by Bob Rudis. Last updated 5 years ago.
1.8 match 3.69 score 11 scripts 3 dependents
ropensci
sodium:A Modern and Easy-to-Use Crypto Library
Bindings to 'libsodium' <https://doc.libsodium.org/>: a modern, easy-to-use software library for encryption, decryption, signatures, password hashing and more. Sodium uses curve25519, a state-of-the-art Diffie-Hellman function by Daniel Bernstein, which has become very popular after it was discovered that the NSA had backdoored Dual EC DRBG.
Maintained by Jeroen Ooms. Last updated 3 months ago.
0.5 match 70 stars 12.43 score 175 scripts 103 dependents
rinterface
shinyMobile:Mobile Ready 'shiny' Apps with Standalone Capabilities
Develop outstanding 'shiny' apps for 'iOS' and 'Android' as well as beautiful 'shiny' gadgets. 'shinyMobile' is built on top of the latest 'Framework7' template <https://framework7.io>. Discover 14 new input widgets (sliders, vertical sliders, stepper, grouped action buttons, toggles, picker, smart select, ...), 2 themes (light and dark), 12 new widgets (expandable cards, badges, chips, timelines, gauges, progress bars, ...) combined with the power of server-side notifications such as alerts, modals, toasts, action sheets, sheets (and more) as well as 3 layouts (single, tabs and split).
Maintained by David Granjon. Last updated 2 months ago.
androidhacktoberfest2022pwashinyshinyappstemplate
0.5 match 409 stars 11.91 score 1.1k scripts 2 dependents
neonscience
neonUtilities:Utilities for Working with NEON Data
NEON data packages can be accessed through the NEON Data Portal <https://www.neonscience.org> or through the NEON Data API (see <https://data.neonscience.org/data-api> for documentation). Data delivered from the Data Portal are provided as monthly zip files packaged within a parent zip file, while individual files can be accessed from the API. This package provides tools that aid in discovering, downloading, and reformatting data prior to use in analyses. This includes downloading data via the API, merging data tables by type, and converting formats. For more information, see the readme file at <https://github.com/NEONScience/NEON-utilities>.
Maintained by Claire Lunch. Last updated 1 month ago.
0.5 match 57 stars 10.66 score 944 scripts 15 dependents
sciviews
data.io:Read and Write Data in Different Formats
Read or write data in many different formats (tabular datasets, files from statistical software ...) into R objects. Add labels and units in different languages.
Maintained by Philippe Grosjean. Last updated 11 months ago.
1.3 match 1 star 4.32 score 20 scripts 7 dependents
bioc
TTMap:Two-Tier Mapper: a clustering tool based on topological data analysis
TTMap is a clustering method that groups together samples with the same deviation in comparison to a control group. It is especially useful when the dataset is small. It is parameter-free.
Maintained by Rachel Jeitziner. Last updated 5 months ago.
softwaremicroarraydifferentialexpressionmultiplecomparisonclusteringclassification
1.8 match 3.00 score
slesche
acdcquery:Query the Attentional Control Data Collection
Interact with the Attentional Control Data Collection (ACDC). Connect to the database via connect_to_db(), set filter arguments via add_argument() and query the database via query_db().
Maintained by Sven Lesche. Last updated 4 months ago.
1.9 match 2.70 score 10 scripts
program--
HSClientR:A HydroShare API client for R
A RESTful API wrapper for accessing <https://hydroshare.org> data in R.
Maintained by Justin Singh-Mohudpur. Last updated 4 years ago.
api-wrappercuashihydrologyhydrosharewater-resources
2.0 match 4 stars 2.30 score 2 scripts
cran
osdatahub:Easier Interaction with the Ordnance Survey Data Hub
Ordnance Survey ('OS') is the national mapping agency for Great Britain and produces a large variety of mapping and geospatial products. Much of OS's data is available via the OS Data Hub <https://osdatahub.os.uk/>, a platform that hosts both free and premium data products. 'osdatahub' provides a user-friendly way to access, query, and download these data.
Maintained by Chris Jochem. Last updated 1 year ago.
1.3 match 3.45 score 14 scripts
jpquast
protti:Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools
Useful functions and workflows for proteomics quality control and data analysis of both limited proteolysis-coupled mass spectrometry (LiP-MS) (Feng et al. (2014) <doi:10.1038/nbt.2999>) and regular bottom-up proteomics experiments. Data generated with search tools such as 'Spectronaut', 'MaxQuant' and 'Proteome Discoverer' can be easily used thanks to the flexibility of the functions.
Maintained by Jan-Philipp Quast. Last updated 5 months ago.
data-analysislip-msmass-spectrometryomicsproteinproteomicssystems-biology
0.5 match 61 stars 8.58 score 83 scripts
reimandlab
ActivePathways:Integrative Pathway Enrichment Analysis of Multivariate Omics Data
Framework for analysing multiple omics datasets in the context of molecular pathways, biological processes and other types of gene sets. The package uses p-value merging to combine gene- or protein-level signals, followed by ranked hypergeometric tests to determine enriched pathways and processes. Genes can be integrated using directional constraints that reflect how the input datasets are expected to interact with one another. This approach allows researchers to interpret a series of omics datasets in the context of known biology and gene function, and discover associations that are only apparent when several datasets are combined. The recent version of the package is part of the following publication: Directional integration and pathway enrichment analysis for multi-omics data. Slobodyanyuk M^, Bahcheli AT^, Klein ZP, Bayati M, Strug LJ, Reimand J. Nature Communications (2024) <doi:10.1038/s41467-024-49986-4>.
Maintained by Juri Reimand. Last updated 8 months ago.
0.5 match 107 stars 8.61 score 35 scripts 2 dependents
cthombor
SafeVote:Election Vote Counting with Safety Features
Fork of 'vote_2.3-2', Raftery et al. (2021) <DOI:10.32614/RJ-2021-086>, with additional support for stochastic experimentation.
Maintained by Clark Thomborson. Last updated 5 months ago.
1.6 match 2.70 score 5 scripts
rcannood
SCORPIUS:Inferring Developmental Chronologies from Single-Cell RNA Sequencing Data
An accurate and easy tool for performing linear trajectory inference on single cells using single-cell RNA sequencing data. In addition, 'SCORPIUS' provides functions for discovering the most important genes with respect to the reconstructed trajectory, as well as nice visualisation tools. Cannoodt et al. (2016) <doi:10.1101/079509>.
Maintained by Robrecht Cannoodt. Last updated 2 years ago.
0.5 match 59 stars 8.17 score 126 scripts
bioc
debrowser:Interactive Differential Expression Analysis Browser
Bioinformatics platform containing interactive plots and tables for differential gene and region expression studies. Allows visualizing expression data much more deeply in an interactive and faster way. By changing the parameters, users can easily discover different parts of the data like never before. Manually creating and inspecting these plots takes time. With DEBrowser users can prepare plots without writing any code. Differential expression, PCA and clustering analyses are performed on site and the results are shown in various plots such as scatter, bar, box, volcano and MA plots and heatmaps.
Maintained by Alper Kucukural. Last updated 5 months ago.
sequencingchipseqrnaseqdifferentialexpressiongeneexpressionclusteringimmunooncology
0.5 match 61 stars 7.80 score 65 scripts
bioc
animalcules:Interactive microbiome analysis toolkit
animalcules is an R package for utilizing up-to-date data analytics, visualization methods, and machine learning models to provide users an easy-to-use interactive microbiome analysis framework. It can be used as a standalone software package or users can explore their data with the accompanying interactive R Shiny application. Traditional microbiome analysis such as alpha/beta diversity and differential abundance analysis are enhanced, while new methods like biomarker identification are introduced by animalcules. Powerful interactive and dynamic figures generated by animalcules enable users to understand their data better and discover new insights.
Maintained by Jessica McClintock. Last updated 5 months ago.
microbiomemetagenomicscoveragevisualization
0.5 match 55 stars 6.95 score 23 scripts
bioc
MoonlightR:Identify oncogenes and tumor suppressor genes from omics data
Motivation: The understanding of cancer mechanism requires the identification of genes playing a role in the development of the pathology and the characterization of their role (notably oncogenes and tumor suppressors). Results: We present an R/bioconductor package called MoonlightR which returns a list of candidate driver genes for specific cancer types on the basis of TCGA expression data. The method first infers gene regulatory networks and then carries out a functional enrichment analysis (FEA) (implementing an upstream regulator analysis, URA) to score the importance of well-known biological processes with respect to the studied cancer type. Eventually, by means of random forests, MoonlightR predicts two specific roles for the candidate driver genes: i) tumor suppressor genes (TSGs) and ii) oncogenes (OCGs). As a consequence, this methodology does not only identify genes playing a dual role (e.g. TSG in one cancer type and OCG in another) but also helps in elucidating the biological processes underlying their specific roles. In particular, MoonlightR can be used to discover OCGs and TSGs in the same cancer type. This may help in answering the question whether some genes change role between early stages (I, II) and late stages (III, IV) in breast cancer. In the future, this analysis could be useful to determine the causes of different resistances to chemotherapeutic treatments.
Maintained by Matteo Tiberti. Last updated 5 months ago.
dnamethylationdifferentialmethylationgeneregulationgeneexpressionmethylationarraydifferentialexpressionpathwaysnetworksurvivalgenesetenrichmentnetworkenrichment
0.5 match 17 stars 6.57 score
bioc
Moonlight2R:Identify oncogenes and tumor suppressor genes from omics data
The understanding of cancer mechanism requires the identification of genes playing a role in the development of the pathology and the characterization of their role (notably oncogenes and tumor suppressors). We present an updated version of the R/bioconductor package called MoonlightR, namely Moonlight2R, which returns a list of candidate driver genes for specific cancer types on the basis of omics data integration. The Moonlight framework contains a primary layer where gene expression data and information about biological processes are integrated to predict genes called oncogenic mediators, divided into putative tumor suppressors and putative oncogenes. This is done through functional enrichment analyses, gene regulatory networks and upstream regulator analyses to score the importance of well-known biological processes with respect to the studied cancer type. By evaluating the effect of the oncogenic mediators on biological processes or through random forests, the primary layer predicts two putative roles for the oncogenic mediators: i) tumor suppressor genes (TSGs) and ii) oncogenes (OCGs). As gene expression data alone is not enough to explain the deregulation of the genes, a second layer of evidence is needed. We have automated the integration of a secondary mutational layer through new functionalities in Moonlight2R. These functionalities analyze mutations in the cancer cohort and classify them into driver and passenger mutations using the driver mutation prediction tool, CScape-somatic. Those oncogenic mediators with at least one driver mutation are retained as the driver genes. As a consequence, this methodology does not only identify genes playing a dual role (e.g. TSG in one cancer type and OCG in another) but also helps in elucidating the biological processes underlying their specific roles. In particular, Moonlight2R can be used to discover OCGs and TSGs in the same cancer type.
This may for instance help in answering the question whether some genes change role between early stages (I, II) and late stages (III, IV). In the future, this analysis could be useful to determine the causes of different resistances to chemotherapeutic treatments. An additional mechanistic layer evaluates if there are mutations affecting the protein stability of the transcription factors (TFs) of the TSGs and OCGs, as that may have an effect on the expression of the genes.
Maintained by Matteo Tiberti. Last updated 2 months ago.
dnamethylationdifferentialmethylationgeneregulationgeneexpressionmethylationarraydifferentialexpressionpathwaysnetworksurvivalgenesetenrichmentnetworkenrichment
0.5 match 5 stars 6.59 score 43 scripts
przechoj
gips:Gaussian Model Invariant by Permutation Symmetry
Find the permutation symmetry group such that the covariance matrix of the given data is approximately invariant under it. Discovering such a permutation decreases the number of observations needed to fit a Gaussian model, which is of great use when it is smaller than the number of variables. Even if that is not the case, the covariance matrix found with 'gips' approximates the actual covariance with less statistical error. The methods implemented in this package are described in Graczyk et al. (2022) <doi:10.1214/22-AOS2174>.
Maintained by Adam Przemysław Chojecki. Last updated 8 months ago.
covariance-estimationmachine-learningnormal-distribution
0.5 match 6 stars 6.40 score 31 scripts
huanglabumn
oncoPredict:Drug Response Modeling and Biomarker Discovery
Allows for building drug response models using screening data between bulk RNA-Seq and a drug response metric, and provides two additional tools for biomarker discovery that have been developed by the Huang Laboratory at University of Minnesota. There are 3 main functions within this package. (1) calcPhenotype is used to build drug response models on RNA-Seq data and impute them on any other RNA-Seq dataset given to the model. (2) GLDS is used to calculate the general level of drug sensitivity, which can improve biomarker discovery. (3) IDWAS can take the results from calcPhenotype and link the imputed response back to available genomic data (mutation and CNV alterations) to identify biomarkers. Each of these functions comes from a paper from the Huang research laboratory. The relevant paper for each function is given below. calcPhenotype - Geeleher et al, Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. GLDS - Geeleher et al, Cancer biomarker discovery is improved by accounting for variability in general levels of drug sensitivity in pre-clinical models. IDWAS - Geeleher et al, Discovering novel pharmacogenomic biomarkers by imputing drug response in cancer patients from large genomics studies.
Maintained by Robert Gruener. Last updated 12 months ago.
svapreprocesscorestringrbiomartgenefilterorg.hs.eg.dbgenomicfeaturestxdb.hsapiens.ucsc.hg19.knowngenetcgabiolinksbiocgenericsgenomicrangesirangess4vectors
0.5 match 18 stars 6.47 score 41 scripts
ddalthorp
eoa3:Wildlife Mortality Estimator for Low Fatality Rates and Imperfect Detection
Evidence of Absence software (EoA) is a user-friendly application for estimating bird and bat fatalities at wind farms and designing search protocols. The software is particularly useful in addressing whether the number of fatalities has exceeded a given threshold and what search parameters are needed to give assurance that thresholds were not exceeded. The models are applicable even when zero carcasses have been found in searches, following Huso et al. (2015) <doi:10.1890/14-0764.1>, Dalthorp et al. (2017) <doi:10.3133/ds1055>, and Dalthorp and Huso (2015) <doi:10.3133/ofr20151227>.
Maintained by Daniel Dalthorp. Last updated 4 months ago.
3.2 match 1.00 score
kforner
srcpkgs:R Source Packages Manager
Manage a collection/library of R source packages. Discover, document, load, test source packages. Enables using those packages as if they were actually installed. Quickly reload only what is needed on source code change. Run tests and checks in parallel.
Maintained by Karl Forner. Last updated 10 months ago.
0.5 match 11 stars 6.04 score 6 scripts
cran
startR:Automatically Retrieve Multidimensional Distributed Data Sets
Tool to automatically fetch, transform and arrange subsets of multidimensional data sets (collections of files) stored in local and/or remote file systems or servers, using multicore capabilities where possible. The tool provides an interface to perceive a collection of data sets as a single large multidimensional data array, and enables the user to request automatic retrieval, processing and arrangement of subsets of the large array. Wrapper functions to add support for custom file formats can be plugged in/out, making the tool suitable for any research field where large multidimensional data sets are involved.
Maintained by Victoria Agudetse. Last updated 6 months ago.
1.7 match 1.78 score 2 dependents
jamesdalg
CNVScope:A Versatile Toolkit for Copy Number Variation Relationship Data Analysis and Visualization
Provides the ability to create interaction maps, discover CNV map domains (edges), gene annotate interactions, and create interactive visualizations of these CNV interaction maps.
Maintained by James Dalgleish. Last updated 3 years ago.
0.5 match 8 stars 5.58 score 24 scripts
mazamascience
MazamaLocationUtils:Manage Spatial Metadata for Known Locations
Utility functions for discovering and managing metadata associated with spatially unique "known locations". Applications include all fields of environmental monitoring (e.g. air and water quality) where data are collected at stationary sites.
Maintained by Jonathan Callahan. Last updated 3 months ago.
0.5 match 5.64 score 108 scripts
ropensci
hddtools:Hydrological Data Discovery Tools
Tools to discover hydrological data, accessing catalogues and databases from various data providers. The package is described in Vitolo (2017) "hddtools: Hydrological Data Discovery Tools" <doi:10.21105/joss.00056>.
Maintained by Dorothea Hug Peter. Last updated 7 months ago.
data60ukgrdchydrologykgclimateclassmopexpeer-reviewedprecipitationsepa
0.5 match 48 stars 5.56 score 25 scripts
bioc
CytoGLMM:Conditional Differential Analysis for Flow and Mass Cytometry Experiments
The CytoGLMM R package implements two multiple regression strategies: A bootstrapped generalized linear model (GLM) and a generalized linear mixed model (GLMM). Most current data analysis tools compare expressions across many computationally discovered cell types. CytoGLMM focuses on just one cell type. Our narrower field of application allows us to define a more specific statistical model with easier to control statistical guarantees. As a result, CytoGLMM finds differential proteins in flow and mass cytometry data while reducing biases arising from marker correlations and safeguarding against false discoveries induced by patient heterogeneity.
Maintained by Christof Seiler. Last updated 5 months ago.
flowcytometryproteomicssinglecellcellbasedassayscellbiologyimmunooncologyregressionstatisticalmethodsoftware
0.5 match 2 stars 5.68 score 1 script 1 dependent
dpc10ster
RJafroc:Artificial Intelligence Systems and Observer Performance
Analyzing the performance of artificial intelligence (AI) systems/algorithms characterized by a 'search-and-report' strategy. Historically observer performance has dealt with measuring radiologists' performances in search tasks, e.g., searching for lesions in medical images and reporting them, but the implicit location information has been ignored. The implemented methods apply to analyzing the absolute and relative performances of AI systems, comparing AI performance to a group of human readers or optimizing the reporting threshold of an AI system. In addition to performing historical receiver operating characteristic (ROC) analysis (localization information ignored), the software also performs free-response receiver operating characteristic (FROC) analysis, where lesion localization information is used. A book using the software has been published: Chakraborty DP: Observer Performance Methods for Diagnostic Imaging - Foundations, Modeling, and Applications with R-Based Examples, Taylor-Francis LLC; 2017: <https://www.routledge.com/Observer-Performance-Methods-for-Diagnostic-Imaging-Foundations-Modeling/Chakraborty/p/book/9781482214840>. Online updates to this book, which use the software, are at <https://dpc10ster.github.io/RJafrocQuickStart/>, <https://dpc10ster.github.io/RJafrocRocBook/> and at <https://dpc10ster.github.io/RJafrocFrocBook/>. Supported data collection paradigms are the ROC, FROC and the location ROC (LROC). ROC data consists of a single rating per image, where a rating is the perceived confidence level that the image is that of a diseased patient. An ROC curve is a plot of true positive fraction vs. false positive fraction. FROC data consists of a variable number (zero or more) of mark-rating pairs per image, where a mark is the location of a reported suspicious region and the rating is the confidence level that it is a real lesion. LROC data consists of a rating and a location of the most suspicious region, for every image.
Four models of observer performance, and curve-fitting software, are implemented: the binormal model (BM), the contaminated binormal model (CBM), the correlated contaminated binormal model (CORCBM), and the radiological search model (RSM). Unlike the binormal model, CBM, CORCBM and RSM predict 'proper' ROC curves that do not inappropriately cross the chance diagonal. Additionally, RSM parameters are related to search performance (not measured in conventional ROC analysis) and classification performance. Search performance refers to finding lesions, i.e., true positives, while simultaneously not finding false positive locations. Classification performance measures the ability to distinguish between true and false positive locations. Knowing these separate performances allows principled optimization of reader or AI system performance. This package supersedes Windows JAFROC (jackknife alternative FROC) software V4.2.1, <https://github.com/dpc10ster/WindowsJafroc>. Package functions are organized as follows. Data file related function names are preceded by 'Df', curve fitting functions by 'Fit', included data sets by 'dataset', plotting functions by 'Plot', significance testing functions by 'St', sample size related functions by 'Ss', data simulation functions by 'Simulate' and utility functions by 'Util'. Implemented are figures of merit (FOMs) for quantifying performance and functions for visualizing empirical or fitted operating characteristics: e.g., ROC, FROC, alternative FROC (AFROC) and weighted AFROC (wAFROC) curves. For fully crossed study designs significance testing of reader-averaged FOM differences between modalities is implemented via either Dorfman-Berbaum-Metz or the Obuchowski-Rockette methods. Also implemented is single modality analysis, which allows comparison of performance of a group of radiologists to a specified value, or comparison of AI to a group of radiologists interpreting the same cases. 
Crossed-modality analysis is implemented wherein there are two crossed modality factors and the aim is to determine performance in each modality factor averaged over all levels of the second factor. Sample size estimation tools are provided for ROC and FROC studies; these use estimates of the relevant variances from a pilot study to predict required numbers of readers and cases in a pivotal study to achieve the desired power. Utility and data file manipulation functions allow data to be read in any of the currently used input formats, including Excel, and the results of the analysis can be viewed in text or Excel output files. The methods are illustrated with several included datasets from the author's collaborations. This update includes improvements to the code, some as a result of user-reported bugs and new feature requests, and others discovered during ongoing testing and code simplification.
Maintained by Dev Chakraborty. Last updated 5 months ago.
ai-optimizationartificial-intelligence-algorithmscomputer-aided-diagnosisfroc-analysisroc-analysistarget-classificationtarget-localizationcpp
0.5 match 19 stars 5.69 score 65 scriptspweidemueller
fullRankMatrix:Generation of Full Rank Design Matrix
Creates a full rank matrix out of a given matrix. The intended use is for one-hot encoded design matrices that should be used in linear models to ensure that significant associations can be correctly interpreted. However, 'fullRankMatrix' can be applied to any matrix to make it full rank. It removes columns with only 0's, merges duplicated columns and discovers linearly dependent columns and replaces them with linearly independent columns that span the space of the original columns. Columns are renamed to reflect those modifications. This results in a full rank matrix that can be used as a design matrix in linear models. The algorithm and some functions are inspired by Kuhn, M. (2008) <doi:10.18637/jss.v028.i05>.
Maintained by Paula Weidemueller. Last updated 9 months ago.
0.5 match 14 stars 5.62 score 6 scriptsbioc
msmsTests:LC-MS/MS Differential Expression Tests
Statistical tests for label-free LC-MS/MS data by spectral counts, to discover differentially expressed proteins between two biological conditions. Three tests are available: Poisson GLM regression, quasi-likelihood GLM regression, and the negative binomial of the edgeR package. The three models admit blocking factors to control for nuisance variables. To assure a good level of reproducibility a post-test filter is available, where we may set the minimum effect size considered biologically relevant, and the minimum expression of the most abundant condition.
Maintained by Josep Gregori i Font. Last updated 5 months ago.
immunooncologysoftwaremassspectrometryproteomics
0.5 match 5.03 score 15 scripts 1 dependentsbioc
PepSetTest:Peptide Set Test
Peptide Set Test (PepSetTest) is a peptide-centric strategy to infer differentially expressed proteins in LC-MS/MS proteomics data. This test detects coordinated changes in the expression of peptides originating from the same protein and compares these changes against the rest of the peptidome. Compared to traditional aggregation-based approaches, the peptide set test demonstrates improved statistical power while still controlling the Type I error rate correctly in most cases. This test can be valuable for discovering novel biomarkers and prioritizing drug targets, especially when the direct application of statistical analysis to protein data fails to provide substantial insights.
Maintained by Junmin Wang. Last updated 5 months ago.
differentialexpressionregressionproteomicsmassspectrometry
0.5 match 2 stars 5.00 score 9 scriptsbioc
sSeq:Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size
The purpose of this package is to discover the genes that are differentially expressed between two conditions in RNA-seq experiments. Gene expression is measured in counts of transcripts and modeled with the Negative Binomial (NB) distribution using a shrinkage approach for dispersion estimation. The method of moments (MM) estimates for dispersion are shrunk towards an estimated target, which minimizes the average squared difference between the shrinkage estimates and the initial estimates. The exact per-gene probability under the NB model is calculated, and used to test the hypothesis that the expected expression of a gene in the two conditions follows the same NB distribution.
Maintained by Danni Yu. Last updated 5 months ago.
0.5 match 4.98 score 4 scripts 2 dependentspaulgovan
eAnalytics:Dynamic Web-Based Analytics for the Energy Industry
A 'Shiny' web application for energy industry analytics. Take an overview of the industry, measure Key Performance Indicators, identify changes in the industry over time, and discover new relationships in the data.
Maintained by Paul Govan. Last updated 6 months ago.
analyticsenergyshinyshinydashboardvisualization
0.5 match 34 stars 4.83 score 1 scriptsoverton-group
eHDPrep:Quality Control and Semantic Enrichment of Datasets
A tool for the preparation and enrichment of health datasets for analysis (Toner et al. (2023) <doi:10.1093/gigascience/giad030>). Provides functionality for assessing data quality and for improving the reliability and machine interpretability of a dataset. 'eHDPrep' also enables semantic enrichment of a dataset where metavariables are discovered from the relationships between input variables determined from user-provided ontologies.
Maintained by Ian Overton. Last updated 2 years ago.
data-qualityhealth-informaticssemantic-enrichment
0.5 match 8 stars 4.90 score 10 scriptsbioc
methylscaper:Visualization of Methylation Data
methylscaper is an R package for processing and visualizing data jointly profiling methylation and chromatin accessibility (MAPit, NOMe-seq, scNMT-seq, nanoNOMe, etc.). The package supports both single-cell and single-molecule data, and a common interface for jointly visualizing both data types through the generation of ordered representational methylation-state matrices. The Shiny app allows for an interactive seriation process of refinement and re-weighting that optimally orders the cells or DNA molecules to discover methylation patterns and nucleosome positioning.
Maintained by Bacher Rhonda. Last updated 5 months ago.
dnamethylationepigeneticssequencingvisualizationsinglecellnucleosomepositioning
0.5 match 1 stars 4.90 score 3 scriptsmpru
ggcleveland:Implementation of Plots from Cleveland's Visualizing Data Book
William S. Cleveland's book 'Visualizing Data' is a classic piece of literature on Exploratory Data Analysis. Although it was written several decades ago, its content is still relevant as it proposes several tools which are useful to discover patterns and relationships among the data under study, and also to assess the goodness of fit of a model. This package provides functions to produce the 'ggplot2' versions of the visualization tools described in this book and is intended for use in courses on Exploratory Data Analysis.
Maintained by Marcos Prunello. Last updated 3 years ago.
0.5 match 9 stars 4.83 score 15 scriptsbioc
sarks:Suffix Array Kernel Smoothing for discovery of correlative sequence motifs and multi-motif domains
Suffix Array Kernel Smoothing (see https://academic.oup.com/bioinformatics/article-abstract/35/20/3944/5418797), or SArKS, identifies sequence motifs whose presence correlates with numeric scores (such as differential expression statistics) assigned to the sequences (such as gene promoters). SArKS smooths over sequence similarity, quantified by location within a suffix array based on the full set of input sequences. A second round of smoothing over spatial proximity within sequences reveals multi-motif domains. Discovered motifs can then be merged or extended based on adjacency within MMDs. False positive rates are estimated and controlled by permutation testing.
Maintained by Dennis Wylie. Last updated 5 months ago.
motifdiscoverygeneregulationgeneexpressiontranscriptomicsrnaseqdifferentialexpressionfeatureextractionopenjdk
0.5 match 3 stars 4.78 score 3 scriptsthomasjemielita
StratifiedMedicine:Stratified Medicine
A toolkit for stratified medicine, subgroup identification, and precision medicine. Current tools include (1) filtering models (reduce covariate space), (2) patient-level estimate models (counterfactual patient-level quantities, such as the conditional average treatment effect), (3) subgroup identification models (find subsets of patients with similar treatment effects), and (4) treatment effect estimation and inference (for the overall population and discovered subgroups). These tools can be customized and are directly used in PRISM (patient response identifiers for stratified medicine; Jemielita and Mehrotra 2019 <arXiv:1912.03337>). This package is in beta and will be continually updated.
Maintained by Thomas Jemielita. Last updated 3 years ago.
0.5 match 2 stars 4.73 score 27 scriptsriccardo-df
aggTrees:Aggregation Trees
Nonparametric data-driven approach to discovering heterogeneous subgroups in a selection-on-observables framework. 'aggTrees' allows researchers to assess whether there exists relevant heterogeneity in treatment effects by generating a sequence of optimal groupings, one for each level of granularity. For each grouping, we obtain point estimation and inference about the group average treatment effects. Please reference the use as Di Francesco (2024) <doi:10.48550/arXiv.2410.11408>.
Maintained by Riccardo Di Francesco. Last updated 29 days ago.
0.5 match 4.60 score 4 scriptscran
CASMI:'CASMI'-Based Functions
Contains Coverage Adjusted Standardized Mutual Information ('CASMI')-based functions. 'CASMI' is a fundamental concept of a series of methods. For more information about 'CASMI' and 'CASMI'-related methods, please refer to the corresponding publications (e.g., a feature selection method, Shi, J., Zhang, J., & Ge, Y. (2019) <doi:10.3390/e21121179>, and a dataset quality measurement method, Shi, J., Zhang, J., & Ge, Y. (2019) <doi:10.1109/ICHI.2019.8904553>) or contact the package author for the latest updates.
Maintained by Jingyi (Catherine) Shi. Last updated 30 days ago.
1.8 match 1.30 scorebioc
PAA:PAA (Protein Array Analyzer)
PAA imports single color (protein) microarray data that has been saved in gpr file format - esp. ProtoArray data. After preprocessing (background correction, batch filtering, normalization) univariate feature preselection is performed (e.g., using the "minimum M statistic" approach - hereinafter referred to as "mMs"). Subsequently, a multivariate feature selection is conducted to discover biomarker candidates. For this purpose, either a frequency-based backwards elimination approach or ensemble feature selection can be used. PAA provides a complete toolbox of analysis tools including several different plots for results examination and evaluation.
Maintained by Michael Turewicz. Last updated 5 months ago.
classificationmicroarrayonechannelproteomicscpp
0.5 match 4.34 score 11 scriptsosysoev
psica:Decision Tree Analysis for Probabilistic Subgroup Identification with Multiple Treatments
When multiple alternative treatments or interventions are available, different population groups may respond differently to different treatments. This package implements a method that discovers the population subgroups in which a certain treatment has a better effect than the other alternative treatments. This is done by first estimating the treatment effect for a given treatment and its uncertainty by computing random forests, and the resulting model is summarized by a decision tree in which the probability that the given treatment is best for a given subgroup is shown in the corresponding terminal node of the tree.
Maintained by Oleg Sysoev. Last updated 5 years ago.
2.2 match 1.00 score 1 scriptsswarm-lab
CEC:Cross-Entropy Clustering
Splits data into Gaussian type clusters using the Cross-Entropy Clustering ('CEC') method. This method allows for the simultaneous use of various types of Gaussian mixture models, for performing the reduction of unnecessary clusters, and for discovering new clusters by splitting them. 'CEC' is based on the work of Spurek, P. and Tabor, J. (2014) <doi:10.1016/j.patcog.2014.03.006>.
Maintained by Simon Garnier. Last updated 5 months ago.
clusteringcross-entropyopenblascpp
0.5 match 10 stars 4.26 score 18 scriptsbioc
PDATK:Pancreatic Ductal Adenocarcinoma Tool-Kit
Pancreatic ductal adenocarcinoma (PDA) has a relatively poor prognosis and is one of the most lethal cancers. Molecular classification of gene expression profiles holds the potential to identify meaningful subtypes which can inform therapeutic strategy in the clinical setting. The Pancreatic Cancer Adenocarcinoma Tool-Kit (PDATK) provides an S4 class-based interface for performing unsupervised subtype discovery, cross-cohort meta-clustering, gene-expression-based classification, and subsequent survival analysis to identify prognostically useful subtypes in pancreatic cancer and beyond. Two novel methods, Consensus Subtypes in Pancreatic Cancer (CSPC) and Pancreatic Cancer Overall Survival Predictor (PCOSP) are included for consensus-based meta-clustering and overall-survival prediction, respectively. Additionally, four published subtype classifiers and three published prognostic gene signatures are included to allow users to easily recreate published results, apply existing classifiers to new data, and benchmark the relative performance of new methods. The use of existing Bioconductor classes as input to all PDATK classes and methods enables integration with existing Bioconductor datasets, including the 21 pancreatic cancer patient cohorts available in the MetaGxPancreas data package. PDATK has been used to replicate results from Sandhu et al (2019) [https://doi.org/10.1200/cci.18.00102] and an additional paper is in the works using CSPC to validate subtypes from the included published classifiers, both of which use the data available in MetaGxPancreas. The inclusion of subtype centroids and prognostic gene signatures from these and other publications will enable researchers and clinicians to classify novel patient gene expression data, allowing the direct clinical application of the classifiers included in PDATK. 
Overall, PDATK provides a rich set of tools to identify and validate useful prognostic and molecular subtypes based on gene-expression data, benchmark new classifiers against existing ones, and apply discovered classifiers on novel patient data to inform clinical decision making.
Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.
geneexpressionpharmacogeneticspharmacogenomicssoftwareclassificationsurvivalclusteringgeneprediction
0.5 match 1 stars 4.31 score 17 scriptsnystat
COLP:Causal Discovery for Categorical Data with Label Permutation
Discover causality for bivariate categorical data. This package aims to enable users to discover causality for bivariate observational categorical data. See Ni, Y. (2022) <arXiv:2209.08579> "Bivariate Causal Discovery for Categorical Data via Classification with Optimal Label Permutation. Advances in Neural Information Processing Systems 35 (in press)".
Maintained by Yang Ni. Last updated 2 years ago.
0.8 match 1 stars 2.70 scorebupaverse
heuristicsmineR:Discovery of Process Models with the Heuristics Miner
Provides the heuristics miner algorithm for process discovery as proposed by Weijters et al. (2011) <doi:10.1109/CIDM.2011.5949453>. The algorithm builds a causal net from an event log created with the 'bupaR' package. Event logs are a set of ordered sequences of events for which 'bupaR' provides the S3 class eventlog(). The discovered causal nets can be visualised as 'htmlwidgets' and it is possible to annotate them with the occurrence frequency or processing and waiting time of process activities.
Maintained by Felix Mannhardt. Last updated 3 years ago.
buparevent-logheuristics-minerpetri-netprocess-miningcpp
0.5 match 14 stars 4.08 score 17 scriptsbioc
survtype:Subtype Identification with Survival Data
Subtypes are defined as groups of samples that have distinct molecular and clinical features. Genomic data can be analyzed to discover patient subtypes associated with clinical data, especially survival information. This package aims to identify subtypes that are both clinically relevant and biologically meaningful.
Maintained by Dongmin Jung. Last updated 5 months ago.
softwarestatisticalmethodgeneexpressionsurvivalclusteringsequencingcoverage
0.5 match 4.00 score 3 scriptss-u
scagnostics:Compute scagnostics - scatterplot diagnostics
Calculates graph theoretic scagnostics. Scagnostics describe various measures of interest for pairs of variables, based on their appearance on a scatterplot. They are a useful tool for discovering interesting or unusual scatterplots from a scatterplot matrix, without having to look at every individual plot.
Maintained by Simon Urbanek. Last updated 3 years ago.
0.5 match 1 stars 3.81 score 87 scripts 1 dependentsbioc
RareVariantVis:A suite for analysis of rare genomic variants in whole genome sequencing data
The second version of the RareVariantVis package aims to provide comprehensive information about rare variants for your genome data. It annotates, filters and presents genomic variants (especially rare ones) in a global, per chromosome way. For discovered rare variants CRISPR guide RNAs are designed, so the user can plan further functional studies. Large structural variants, including copy number variants, are also supported. The package accepts variants directly from a variant caller - for example GATK or Speedseq. The output of the package is lists of variants together with adequate visualization. Visualization of variants is performed in two ways - standard that outputs png figures and interactive that uses the JavaScript d3 package. Interactive visualization allows users to analyze trio/family data, for example in search for causative variants in rare Mendelian diseases, in a point-and-click interface. The package includes a homozygous region caller and allows users to analyse whole human genomes in less than 30 minutes on a desktop computer. RareVariantVis disclosed novel causes of several rare monogenic disorders, including one with a non-coding causative variant - keratolytic winter erythema.
Maintained by Tomasz Stokowy. Last updated 5 months ago.
genomicvariationsequencingwholegenome
0.5 match 3.90 score 1 scriptsbioc
GeneNetworkBuilder:GeneNetworkBuilder: a bioconductor package for building regulatory network using ChIP-chip/ChIP-seq data and Gene Expression Data
Application for discovering direct or indirect targets of transcription factors using ChIP-chip or ChIP-seq, and microarray or RNA-seq gene expression data. Given a list of potential target genes of one TF from ChIP-chip or ChIP-seq, and the gene expression results, GeneNetworkBuilder generates a regulatory network of the TF.
Maintained by Jianhong Ou. Last updated 9 days ago.
sequencingmicroarraygraphandnetworkcpp
0.5 match 3.77 score 17 scriptsstatuser
RGremlinsConjoint:Estimate the "Gremlins in the Data" Model for Conjoint Studies
Tools and utilities to estimate the model described in "Gremlins in the Data: Identifying the Information Content of Research Subjects" (Howell et al. (2021) <doi:10.1177/0022243720965930>) using conjoint analysis data such as that collected in Sawtooth Software's 'Lighthouse' or 'Discover' products. Additional utilities are included for formatting the input data.
Maintained by John Howell. Last updated 2 years ago.
0.5 match 3.70 score 6 scriptsbioc
pandaR:PANDA Algorithm
Runs PANDA, an algorithm for discovering novel network structure by combining information from multiple complementary data sources.
Maintained by Joseph N. Paulson. Last updated 5 months ago.
statisticalmethodgraphandnetworkmicroarraygeneregulationnetworkinferencegeneexpressiontranscriptionnetwork
0.5 match 3.30 score 8 scriptsjoerigdon
nearfar:Near-Far Matching
Near-far matching is a study design technique for preprocessing observational data to mimic a pair-randomized trial. Individuals are matched to be near on measured confounders and far on levels of an instrumental variable. Methods outlined in further detail in Rigdon, Baiocchi, and Basu (2018) <doi:10.18637/jss.v086.c05>.
Maintained by Joseph Rigdon. Last updated 1 year ago.
1.6 match 1.08 score 12 scriptsyhenryli
PAC:Partition-Assisted Clustering and Multiple Alignments of Networks
Implements partition-assisted clustering and multiple alignments of networks. It 1) utilizes partition-assisted clustering to find robust and accurate clusters and 2) discovers coherent relationships of clusters across multiple samples. It is particularly useful for analyzing single-cell data sets. Please see Li et al. (2017) <doi:10.1371/journal.pcbi.1005875> for a detailed description of the method.
Maintained by Ye Henry Li. Last updated 4 years ago.
0.5 match 3.30 score 7 scriptskorydjohnson
rai:Revisiting-Alpha-Investing for Polynomial Regression
A modified implementation of stepwise regression that greedily searches the space of interactions among features in order to build polynomial regression models. Furthermore, the hypothesis tests conducted are valid post-model selection due to the use of a revisiting procedure that implements an alpha-investing rule. As a result, the set of rejected sequential hypotheses is proven to control the marginal false discovery rate. When not searching for polynomials, the package provides a statistically valid algorithm to run and terminate stepwise regression. For more information, see Johnson, Stine, and Foster (2019) <arXiv:1510.06322>.
Maintained by Kory D. Johnson. Last updated 3 years ago.
0.5 match 3 stars 3.18 score 7 scriptsnystat
OrdCD:Ordinal Causal Discovery
Algorithms for ordinal causal discovery. This package aims to enable users to discover causality for observational ordinal categorical data with greedy and exhaustive search. See Ni, Y., & Mallick, B. (2022) <https://proceedings.mlr.press/v180/ni22a/ni22a.pdf> "Ordinal Causal Discovery. Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence, (UAI 2022), PMLR 180:1530–1540".
Maintained by Yang Ni. Last updated 2 years ago.
0.5 match 2.70 scorevankesteren
cmfilter:Coordinate-Wise Mediation Filter
Functions to discover, plot, and select multiple mediators from an x -> M -> y linear system. This exploratory mediation analysis is performed using the Coordinate-wise Mediation Filter as introduced by Van Kesteren and Oberski (2019) <doi:10.1080/10705511.2019.1588124>.
Maintained by Erik-Jan van Kesteren. Last updated 2 years ago.
0.5 match 4 stars 2.30 score 4 scriptsgjhunt
rrscale:Robust Re-Scaling to Better Recover Latent Effects in Data
Non-linear transformations of data to better discover latent effects. Applies a sequence of three transformations: (1) a Gaussianizing transformation, (2) a Z-score transformation, and (3) an outlier removal transformation. A publication describing the method has the following citation: Gregory J. Hunt, Mark A. Dane, James E. Korkola, Laura M. Heiser & Johann A. Gagnon-Bartsch (2020) "Automatic Transformation and Integration to Improve Visualization and Discovery of Latent Effects in Imaging Data", Journal of Computational and Graphical Statistics, <doi:10.1080/10618600.2020.1741379>.
Maintained by Gregory Hunt. Last updated 5 years ago.
0.5 match 2.30 score 9 scriptskobiperl
mHG:Minimum-Hypergeometric Test
Runs a minimum-hypergeometric (mHG) test as described in: Eden, E. (2007). Discovering Motifs in Ranked Lists of DNA Sequences. Haifa.
Maintained by Kobi Perl. Last updated 8 years ago.
0.5 match 1 stars 2.19 score 31 scriptscran
exploreR:Tools for Quickly Exploring Data
Simplifies some complicated and labor intensive processes involved in exploring and explaining data. Allows you to quickly and efficiently visualize the interaction between variables and simplifies the process of discovering covariation in your data. Also includes some convenience features designed to remove as much redundant typing as possible.
Maintained by Michael Coates. Last updated 9 years ago.
0.5 match 2.00 scorecran
DMtest:Differential Methylation Tests (DMtest)
Several tests for differential methylation in methylation array data, including one-sided differential mean and variance test. Methods used in the package refer to Dai, J, Wang, X, Chen, H and others (2021) "Incorporating increased variability in discovering cancer methylation markers", Biostatistics, submitted.
Maintained by James Dai. Last updated 4 years ago.
0.5 match 2.00 score 3 scriptsempiricalbayes
LFDREmpiricalBayes:Estimating Local False Discovery Rates Using Empirical Bayes Methods
New empirical Bayes methods aiming at analyzing the association of single nucleotide polymorphisms (SNPs) to some particular disease are implemented in this package. The package uses local false discovery rate (LFDR) estimates of SNPs within a sample population defined as a "reference class" and discovers if SNPs are associated with the corresponding disease. Although SNPs are used throughout this document, other biological data such as protein data and other gene data can be used. Karimnezhad, Ali and Bickel, D. R. (2016) <http://hdl.handle.net/10393/34889>.
Maintained by Ali Karimnezhad. Last updated 7 years ago.
bayesianmathematicalbiologymultiplecomparison
0.5 match 2.00 score 5 scriptsreimand0
ActiveDriver:Finding Cancer Driver Proteins with Enriched Mutations in Post-Translational Modification Sites
A mutation analysis tool that discovers cancer driver genes with frequent mutations in protein signalling sites such as post-translational modifications (phosphorylation, ubiquitination, etc). The Poisson generalised linear regression model identifies genes where cancer mutations in signalling sites are more frequent than expected from the sequence of the entire gene. Integration of mutations with signalling information helps find new driver genes and propose candidate mechanisms to known drivers. Reference: Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers. Juri Reimand and Gary D Bader. Molecular Systems Biology (2013) 9:637 <doi:10.1038/msb.2012.68>.
Maintained by Juri Reimand. Last updated 8 years ago.
0.5 match 2.00 score 6 scriptscheweichang1992
GGoutlieR:Identify Individuals with Unusual Geo-Genetic Patterns
Identify and visualize individuals with unusual association patterns of genetics and geography using the approach of Chang and Schmid (2023) <doi:10.1101/2023.04.06.535838>. It detects potential outliers that violate the isolation-by-distance assumption using the K-nearest neighbor approach. You can obtain a table of outliers with statistics and visualize unusual geo-genetic patterns on a geographical map. This is useful for landscape genomics studies to discover individuals with unusual geography and genetics associations from a large biological sample.
Maintained by Che-Wei Chang. Last updated 1 year ago.
0.5 match 1.18 score 15 scriptscran
onc.api:Oceans 2.0 API Client Library
Allows users to discover and retrieve Ocean Networks Canada's oceanographic data in raw, text, image, audio, video or any other format available. Provides a class that wraps web service calls and business logic so that users can download data with a single line of code.
Maintained by Bennit Mueller. Last updated 4 years ago.
0.5 match 1.00 score 1 scriptscran
costat:Time Series Costationarity Determination
Contains functions that can determine whether a time series is second-order stationary or not (and hence provide evidence for local stationarity). Given two non-stationary series (i.e. locally stationary series) this package can then discover time-varying linear combinations that are second-order stationary. Cardinali, A. and Nason, G.P. (2013) <doi:10.18637/jss.v055.i01>.
Maintained by Guy Nason. Last updated 2 years ago.
0.5 match 1.00 scorecbergmeir
opusminer:OPUS Miner Algorithm for Filtered Top-k Association Discovery
Provides a simple R interface to the OPUS Miner algorithm (implemented in C++) for finding the top-k productive, non-redundant itemsets from transaction data. The OPUS Miner algorithm uses the OPUS search algorithm to efficiently discover the key associations in transaction data, in the form of self-sufficient itemsets, using either leverage or lift. See <http://i.giwebb.com/index.php/research/association-discovery/> for more information in relation to the OPUS Miner algorithm.
Maintained by Christoph Bergmeir. Last updated 5 years ago.
0.5 match 1 stars 1.00 score 2 scriptssyedhaider5
iDOS:Integrated Discovery of Oncogenic Signatures
A method to integrate molecular profiles of cancer patients (gene copy number and mRNA abundance) to identify candidate gain of function alterations. These candidate alterations can be subsequently further tested to discover cancer driver alterations. Briefly, this method tests for genomic correlates of mRNA dysregulation and prioritises those where DNA gains/amplifications are associated with elevated mRNA expression of the same gene. For details see Haider S et al. (2016) "Genomic alterations underlie a pan-cancer metabolic shift associated with tumour hypoxia", Genome Biology, <https://pubmed.ncbi.nlm.nih.gov/27358048/>.
Maintained by Syed Haider. Last updated 1 year ago.
0.5 match 1.00 score 10 scripts