Showing 200 of total 328 results

crsh

rmdfiltr: 'Lua'-Filters for R Markdown

A collection of 'Lua' filters that extend the functionality of R Markdown templates (e.g., count words or post-process citations).

Maintained by Frederik Aust. Last updated 5 months ago.

3.1 match 42 stars 8.08 score 4 scripts 3 dependents

anestistouloumis

SimCorMultRes: Simulates Correlated Multinomial Responses

Simulates correlated multinomial responses conditional on a marginal model specification.

Maintained by Anestis Touloumis. Last updated 12 months ago.

binary, longitudinal-studies, multinomial, simulation

2.0 match 7 stars 6.04 score 26 scripts 2 dependents

patzaw

BED: Biological Entity Dictionary (BED)

An interface for the 'Neo4j' database providing mapping between different identifiers of biological entities. This Biological Entity Dictionary (BED) has been developed to address three main challenges. The first one is related to the completeness of identifier mappings. Indeed, direct mapping information provided by the different systems is not always complete and can be enriched by mappings provided by other resources. More interestingly, direct mappings not identified by any of these resources can be indirectly inferred by using mappings to a third reference. For example, many human Ensembl gene IDs are not directly mapped to any Entrez gene ID, but such mappings can be inferred using their respective mappings to HGNC IDs. The second challenge is related to the mapping of deprecated identifiers. Indeed, entity identifiers can change from one resource release to another. The identifier history is provided by some resources, such as Ensembl or the NCBI, but it is generally not used by mapping tools. The third challenge is related to the automation of the mapping process according to the relationships between the biological entities of interest. Indeed, mapping between gene and protein ID scopes should not be done the same way as mapping between two gene ID scopes. Also, converting identifiers from different organisms should be possible using gene ortholog information. The method has been published by Godard and van Eyll (2018) <doi:10.12688/f1000research.13925.3>.

Maintained by Patrice Godard. Last updated 3 months ago.

1.5 match 8 stars 6.85 score 25 scripts
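
The indirect inference mentioned in the BED description (Ensembl to Entrez via HGNC) can be illustrated with a small sketch. This is not the BED package's API or its Neo4j implementation; it only composes two plain dictionaries, and the function name and all identifiers below are hypothetical placeholders.

```python
def infer_indirect(a_to_ref, ref_to_b, direct=None):
    """Compose A->reference and reference->B mappings into A->B.

    Any direct A->B mappings passed in are kept and enriched, never replaced.
    """
    result = {a: set(bs) for a, bs in (direct or {}).items()}
    for a, refs in a_to_ref.items():
        for ref in refs:
            for b in ref_to_b.get(ref, ()):
                result.setdefault(a, set()).add(b)
    return result


# Hypothetical identifiers, for illustration only.
ensembl_to_hgnc = {"ENSG_A": {"HGNC:1"}, "ENSG_B": {"HGNC:2"}}
hgnc_to_entrez = {"HGNC:1": {"101"}, "HGNC:2": {"202"}}
direct_ensembl_to_entrez = {"ENSG_A": {"101"}}  # ENSG_B has no direct mapping

mapping = infer_indirect(ensembl_to_hgnc, hgnc_to_entrez, direct_ensembl_to_entrez)
```

Here `ENSG_B` gains an Entrez mapping only through the shared HGNC reference, while the existing direct mapping for `ENSG_A` is left unchanged.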

neon-biodiversity

Ostats: O-Stats, or Pairwise Community-Level Niche Overlap Statistics

O-statistics, or overlap statistics, measure the degree of community-level trait overlap. They are estimated by fitting nonparametric kernel density functions to each species’ trait distribution and calculating their areas of overlap. For instance, the median pairwise overlap for a community is calculated by first determining the overlap of each species pair in trait space, and then taking the median overlap of each species pair in a community. This median overlap value is called the O-statistic (O for overlap). The Ostats() function calculates separate univariate overlap statistics for each trait, while the Ostats_multivariate() function calculates a single multivariate overlap statistic for all traits. O-statistics can be evaluated against null models to obtain standardized effect sizes. 'Ostats' is part of the collaborative Macrosystems Biodiversity Project "Local- to continental-scale drivers of biodiversity across the National Ecological Observatory Network (NEON)." For more information on this project, see the Macrosystems Biodiversity Website (<https://neon-biodiversity.github.io/>). Calculation of O-statistics is described in Read et al. (2018) <doi:10.1111/ecog.03641>, and a teaching module for introducing the underlying biological concepts at an undergraduate level is described in Grady et al. (2018) <http://tiee.esa.org/vol/v14/issues/figure_sets/grady/abstract.html>.

Maintained by Quentin D. Read. Last updated 4 months ago.

ecology

1.5 match 7 stars 6.69 score 28 scripts
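
The median pairwise overlap described for Ostats can be sketched as follows. This is a minimal illustration, not the package's Ostats() implementation: it assumes a Gaussian kernel with a fixed bandwidth, unweighted medians, and midpoint-rule integration, all of which are choices made here for brevity. The function names and trait values are hypothetical.

```python
import math
from itertools import combinations
from statistics import median


def kde(sample, bandwidth):
    """Return a Gaussian kernel density estimate of `sample` as a function."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density


def overlap(a, b, bandwidth=0.5, steps=2000):
    """Area under min(f, g) for the two estimated densities (midpoint rule)."""
    lo = min(a + b) - 5 * bandwidth
    hi = max(a + b) + 5 * bandwidth
    f, g = kde(a, bandwidth), kde(b, bandwidth)
    dx = (hi - lo) / steps
    xs = (lo + (i + 0.5) * dx for i in range(steps))
    return sum(min(f(x), g(x)) for x in xs) * dx


def median_pairwise_overlap(traits_by_species, bandwidth=0.5):
    """Median of the overlaps over all species pairs in one community."""
    pairs = combinations(traits_by_species.values(), 2)
    return median(overlap(a, b, bandwidth) for a, b in pairs)


# Hypothetical trait measurements (lists of floats) for one community.
community = {
    "sp1": [1.0, 1.2, 1.5, 1.9],
    "sp2": [1.4, 1.8, 2.1, 2.5],
    "sp3": [3.0, 3.3, 3.6, 4.0],
}
o_stat = median_pairwise_overlap(community)
```

Identical trait distributions give an overlap near 1 and well-separated ones give an overlap near 0, so the resulting statistic lies between 0 and 1.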

eltebioinformatics

mulea: Enrichment Analysis Using Multiple Ontologies and False Discovery Rate

Background - Traditional gene set enrichment analyses are typically limited to a few ontologies and do not account for the interdependence of gene sets or terms, resulting in overcorrected p-values. To address these challenges, we introduce mulea, an R package offering comprehensive overrepresentation and functional enrichment analysis. Results - mulea employs a progressive empirical false discovery rate (eFDR) method, specifically designed for interconnected biological data, to accurately identify significant terms within diverse ontologies. mulea expands beyond traditional tools by incorporating a wide range of ontologies, encompassing Gene Ontology, pathways, regulatory elements, genomic locations, and protein domains. This flexibility enables researchers to tailor enrichment analysis to their specific questions, such as identifying enriched transcriptional regulators in gene expression data or overrepresented protein domains in protein sets. To facilitate seamless analysis, mulea provides gene sets (in standardised GMT format) for 27 model organisms, covering 22 ontology types from 16 databases and various identifiers resulting in almost 900 files. Additionally, the muleaData ExperimentData Bioconductor package simplifies access to these pre-defined ontologies. Finally, mulea's architecture allows for easy integration of user-defined ontologies, or GMT files from external sources (e.g., MSigDB or Enrichr), expanding its applicability across diverse research areas. Conclusions - mulea is distributed as a CRAN R package. It offers researchers a powerful and flexible toolkit for functional enrichment analysis, addressing limitations of traditional tools with its progressive eFDR and by supporting a variety of ontologies. Overall, mulea fosters the exploration of diverse biological questions across various model organisms.

Maintained by Tamas Stirling. Last updated 3 months ago.

annotation, differentialexpression, geneexpression, genesetenrichment, go, graphandnetwork, multiplecomparison, pathways, reactome, software, transcription, visualization, enrichment, enrichment-analysis, functional-enrichment-analysis, gene-set-enrichment, ontologies, transcriptomics, cpp

1.3 match 28 stars 7.36 score 34 scripts

lakshay-anand

chromoMap: Interactive Genomic Visualization of Biological Data

Provides interactive, configurable and elegant graphics visualization of the chromosomes or chromosome regions of any living organism, allowing users to map chromosome elements (like genes, SNPs, etc.) on the chromosome plot. It introduces a special plot, the "chromosome heatmap", that, in addition to mapping elements, can visualize the data associated with chromosome elements (like gene expression) in the form of heat colors, which can be highly advantageous in scientific interpretation and research work. Because of the large size of the chromosomes, it is impractical to visualize each element on the same plot. However, the plot provides a magnified view for each chromosome locus to render additional information and visualization specific to that location. You can map thousands of genes and view all mappings easily. Users can investigate detailed information about the mappings (like gene names or total genes mapped on a location) or can view the magnified single- or double-stranded view of the chromosome at a location, showing each mapped element in sequential order. The package provides multiple features like visualizing multiple sets, chromosome heat-maps, group annotations, adding hyperlinks, and labelling. The plots can be saved as HTML documents that can be customized and shared easily. In addition, you can include them in R Markdown or in R 'Shiny' applications.

Maintained by Lakshay Anand. Last updated 3 years ago.

2.2 match 9 stars 4.46 score 80 scripts

lcrawlab

mvMAPIT: Multivariate Genome Wide Marginal Epistasis Test

Epistasis, commonly defined as the interaction between genetic loci, is known to play an important role in the phenotypic variation of complex traits. As a result, many statistical methods have been developed to identify genetic variants that are involved in epistasis, and nearly all of these approaches carry out this task by focusing on analyzing one trait at a time. Previous studies have shown that jointly modeling multiple phenotypes can often dramatically increase statistical power for association mapping. In this package, we present the 'multivariate MArginal ePIstasis Test' ('mvMAPIT') – a multi-outcome generalization of a recently proposed epistatic detection method which seeks to detect marginal epistasis or the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact – thus, potentially alleviating much of the statistical and computational burden associated with conventional explicit search based methods. Our proposed 'mvMAPIT' builds upon this strategy by taking advantage of correlation structure between traits to improve the identification of variants involved in epistasis. We formulate 'mvMAPIT' as a multivariate linear mixed model and develop a multi-trait variance component estimation algorithm for efficient parameter inference and P-value computation. Together with reasonable model approximations, our proposed approach is scalable to moderately sized genome-wide association studies. Crawford et al. (2017) <doi:10.1371/journal.pgen.1006869>. Stamp et al. (2023) <doi:10.1093/g3journal/jkad118>.

Maintained by Julian Stamp. Last updated 5 months ago.

cpp, epistasis, epistasis-analysis, gwas, gwas-tools, linear-mixed-models, mapit, mvmapit, variance-components, openblas, cpp, openmp

1.3 match 11 stars 6.90 score 17 scripts 1 dependent

mikejareds

hermiter: Efficient Sequential and Batch Estimation of Univariate and Bivariate Probability Density Functions and Cumulative Distribution Functions along with Quantiles (Univariate) and Nonparametric Correlation (Bivariate)

Facilitates estimation of full univariate and bivariate probability density functions and cumulative distribution functions along with full quantile functions (univariate) and nonparametric correlation (bivariate) using Hermite series based estimators. These estimators are particularly useful in the sequential setting (both stationary and non-stationary) and one-pass batch estimation setting for large data sets. Based on: Stephanou, Michael, Varughese, Melvin and Macdonald, Iain. "Sequential quantiles via Hermite series density estimation." Electronic Journal of Statistics 11.1 (2017): 570-607 <doi:10.1214/17-EJS1245>, Stephanou, Michael and Varughese, Melvin. "On the properties of Hermite series based distribution function estimators." Metrika (2020) <doi:10.1007/s00184-020-00785-z> and Stephanou, Michael and Varughese, Melvin. "Sequential estimation of Spearman rank correlation using Hermite series estimators." Journal of Multivariate Analysis (2021) <doi:10.1016/j.jmva.2021.104783>.

Maintained by Michael Stephanou. Last updated 7 months ago.

cumulative-distribution-function, kendall-correlation-coefficient, online-algorithms, probability-density-function, quantile, spearman-correlation-coefficient, statistics, streaming-algorithms, streaming-data, cpp

1.5 match 15 stars 5.58 score 17 scripts

jl5000

tidyged: Handle GEDCOM Files Using Tidyverse Principles

Create and summarise family tree GEDCOM files using tidy dataframes.

Maintained by Jamie Lendrum. Last updated 3 years ago.

1.3 match 8 stars 5.96 score 23 scripts 3 dependents

muschellij2

gcite: Google Citation Parser

Scrapes Google Citation pages and creates data frames of citations over time.

Maintained by John Muschelli. Last updated 3 years ago.

2.0 match 3 stars 3.67 score 31 scripts

bioc

TargetDecoy: Diagnostic Plots to Evaluate the Target Decoy Approach

A first step in the data analysis of Mass Spectrometry (MS) based proteomics data is to identify peptides and proteins. In this respect, the huge number of experimental mass spectra typically have to be assigned to theoretical peptides derived from a sequence database. Search engines are used for this purpose. These tools compare each of the observed spectra to all candidate theoretical spectra derived from the sequence database and calculate a score for each comparison. The observed spectrum is then assigned to the theoretical peptide with the best score, which is also referred to as the peptide to spectrum match (PSM). It is of course crucial for the downstream analysis to evaluate the quality of these matches. Therefore, False Discovery Rate (FDR) control is used to return a reliable list of PSMs. The FDR, however, requires a good characterisation of the score distribution of PSMs that are matched to the wrong peptide (bad target hits). In proteomics, the target decoy approach (TDA) is typically used for this purpose. The TDA method matches the spectra to a database of real (targets) and nonsense peptides (decoys). A popular approach to generate these decoys is to reverse the target database. Hence, all the PSMs that match to a decoy are known to be bad hits, and the distribution of their scores is used to estimate the distribution of the bad scoring target PSMs. A crucial assumption of the TDA is that the decoy PSM hits have similar properties to bad target hits, so that the decoy PSM scores are a good simulation of the bad target PSM scores. Users, however, typically do not evaluate these assumptions. To this end, we developed TargetDecoy to generate diagnostic plots to evaluate the quality of the target decoy method.

Maintained by Elke Debrie. Last updated 5 months ago.

massspectrometry, proteomics, qualitycontrol, software, visualization, bioconductor, mass-spectrometry

1.5 match 1 star 4.60 score 9 scripts
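
The way decoy hits stand in for bad target hits in the TargetDecoy description can be made concrete with a short sketch. It uses one common estimator, the ratio of decoy PSMs to target PSMs at or above a score threshold. The description does not state which estimator the package assumes, so this choice, the function name, and the scores below are all assumptions, and higher scores are taken to be better.

```python
def estimate_fdr(scores, is_decoy, threshold):
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    Accepted decoys are used as an estimate of the number of accepted
    bad target hits. Returns None if no target PSM passes the threshold.
    """
    targets = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and not d)
    decoys = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and d)
    if targets == 0:
        return None
    return decoys / targets


# Hypothetical PSM scores and decoy flags, for illustration only.
scores = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0]
is_decoy = [False, False, False, True, False, True]

fdr_at_6 = estimate_fdr(scores, is_decoy, threshold=6.0)  # 1 decoy / 4 targets
fdr_at_5 = estimate_fdr(scores, is_decoy, threshold=5.0)  # 2 decoys / 4 targets
```

The estimate is only as good as the assumption that decoy scores behave like bad target scores, which is the assumption the package's diagnostic plots are meant to check.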