Showing 106 of total 106 results (show query)

bioc

pepStat:Statistical analysis of peptide microarrays

Statistical analysis of peptide microarrays

Maintained by Gregory C Imholte. Last updated 5 months ago.

microarraypreprocessing

18.6 match 7 stars 5.62 score 4 scripts

wraff

wrProteo:Proteomics Data Analysis Functions

Data analysis of proteomics experiments by mass spectrometry is supported by this collection of functions mostly dedicated to the analysis of (bottom-up) quantitative (XIC) data. Fasta-formatted proteomes (eg from UniProt Consortium <doi:10.1093/nar/gky1049>) can be read with automatic parsing and multiple annotation types (like species origin, abbreviated gene names, etc) extracted. Initial results from multiple software for protein (and peptide) quantitation can be imported (to a common format): MaxQuant (Tyanova et al 2016 <doi:10.1038/nprot.2016.136>), Dia-NN (Demichev et al 2020 <doi:10.1038/s41592-019-0638-x>), Fragpipe (da Veiga et al 2020 <doi:10.1038/s41592-020-0912-y>), ionbot (Degroeve et al 2021 <doi:10.1101/2021.07.02.450686>), MassChroq (Valot et al 2011 <doi:10.1002/pmic.201100120>), OpenMS (Strauss et al 2021 <doi:10.1038/nmeth.3959>), ProteomeDiscoverer (Orsburn 2021 <doi:10.3390/proteomes9010015>), Proline (Bouyssie et al 2020 <doi:10.1093/bioinformatics/btaa118>), AlphaPept (preprint Strauss et al <doi:10.1101/2021.07.23.453379>) and Wombat-P (Bouyssie et al 2023 <doi:10.1021/acs.jproteome.3c00636>. Meta-data provided by initial analysis software and/or in sdrf format can be integrated to the analysis. Quantitative proteomics measurements frequently contain multiple NA values, due to physical absence of given peptides in some samples, limitations in sensitivity or other reasons. Help is provided to inspect the data graphically to investigate the nature of NA-values via their respective replicate measurements and to help/confirm the choice of NA-replacement algorithms. Meta-data in sdrf-format (Perez-Riverol et al 2020 <doi:10.1021/acs.jproteome.0c00376>) or similar tabular formats can be imported and included. Missing values can be inspected and imputed based on the concept of NA-neighbours or other methods. Dedicated filtering and statistical testing using the framework of package 'limma' <doi:10.18129/B9.bioc.limma> can be run, enhanced by multiple rounds of NA-replacements to provide robustness towards rare stochastic events. Multi-species samples, as frequently used in benchmark-tests (eg Navarro et al 2016 <doi:10.1038/nbt.3685>, Ramus et al 2016 <doi:10.1016/j.jprot.2015.11.011>), can be run with special options considering such sub-groups during normalization and testing. Subsequently, ROC curves (Hand and Till 2001 <doi:10.1023/A:1010920819831>) can be constructed to compare multiple analysis approaches. As detailed example the data-set from Ramus et al 2016 <doi:10.1016/j.jprot.2015.11.011>) quantified by MaxQuant, ProteomeDiscoverer, and Proline is provided with a detailed analysis of heterologous spike-in proteins.

Maintained by Wolfgang Raffelsberger. Last updated 4 months ago.

19.9 match 3.67 score 17 scripts 1 dependents

cran

OrgMassSpecR:Organic Mass Spectrometry

Organic/biological mass spectrometry data analysis.

Maintained by Nathan Dodder. Last updated 8 years ago.

19.5 match 3.68 score 2 dependents

laurafancello

net4pg:Handle Ambiguity of Protein Identifications from Shotgun Proteomics

In shotgun proteomics, shared peptides (i.e., peptides that might originate from different proteins sharing homology, from different proteoforms due to alternative mRNA splicing, post-translational modifications, proteolytic cleavages, and/or allelic variants) represent a major source of ambiguity in protein identifications. The 'net4pg' package allows to assess and handle ambiguity of protein identifications. It implements methods for two main applications. First, it allows to represent and quantify ambiguity of protein identifications by means of graph connected components (CCs). In graph theory, CCs are defined as the largest subgraphs in which any two vertices are connected to each other by a path and not connected to any other of the vertices in the supergraph. Here, proteins sharing one or more peptides are thus gathered in the same CC (multi-protein CC), while unambiguous protein identifications constitute CCs with a single protein vertex (single-protein CCs). Therefore, the proportion of single-protein CCs and the size of multi-protein CCs can be used to measure the level of ambiguity of protein identifications. The package implements a strategy to efficiently calculate graph connected components on large datasets and allows to visually inspect them. Secondly, the 'net4pg' package allows to exploit the increasing availability of matched transcriptomic and proteomic datasets to reduce ambiguity of protein identifications. More precisely, it implement a transcriptome-based filtering strategy fundamentally consisting in the removal of those proteins whose corresponding transcript is not expressed in the sample-matched transcriptome. The underlying assumption is that, according to the central dogma of biology, there can be no proteins without the corresponding transcript. Most importantly, the package allows to visually inspect the effect of the filtering on protein identifications and quantify ambiguity before and after filtering by means of graph connected components. As such, it constitutes a reproducible and transparent method to exploit transcriptome information to enhance protein identifications. All methods implemented in the 'net4pg' package are fully described in Fancello and Burger (2022) <doi:10.1186/s13059-022-02701-2>.

Maintained by Laura Fancello. Last updated 3 years ago.

15.2 match 2 stars 4.00 score 3 scripts

patzaw

BED:Biological Entity Dictionary (BED)

An interface for the 'Neo4j' database providing mapping between different identifiers of biological entities. This Biological Entity Dictionary (BED) has been developed to address three main challenges. The first one is related to the completeness of identifier mappings. Indeed, direct mapping information provided by the different systems are not always complete and can be enriched by mappings provided by other resources. More interestingly, direct mappings not identified by any of these resources can be indirectly inferred by using mappings to a third reference. For example, many human Ensembl gene ID are not directly mapped to any Entrez gene ID but such mappings can be inferred using respective mappings to HGNC ID. The second challenge is related to the mapping of deprecated identifiers. Indeed, entity identifiers can change from one resource release to another. The identifier history is provided by some resources, such as Ensembl or the NCBI, but it is generally not used by mapping tools. The third challenge is related to the automation of the mapping process according to the relationships between the biological entities of interest. Indeed, mapping between gene and protein ID scopes should not be done the same way than between two scopes regarding gene ID. Also, converting identifiers from different organisms should be possible using gene orthologs information. The method has been published by Godard and van Eyll (2018) <doi:10.12688/f1000research.13925.3>.

Maintained by Patrice Godard. Last updated 3 months ago.

7.7 match 8 stars 6.85 score 25 scripts

benbruyneel

massSpectrometryR:massSpectrometryR

Provides calculations, plotting etc for chemistry & mass spectrometry.

Maintained by Ben Bruyneel. Last updated 4 months ago.

mass-spectrometryproteomicsopenjdk

8.1 match 1.70 score 1 scripts

bioc

IsoBayes:IsoBayes: Single Isoform protein inference Method via Bayesian Analyses

IsoBayes is a Bayesian method to perform inference on single protein isoforms. Our approach infers the presence/absence of protein isoforms, and also estimates their abundance; additionally, it provides a measure of the uncertainty of these estimates, via: i) the posterior probability that a protein isoform is present in the sample; ii) a posterior credible interval of its abundance. IsoBayes inputs liquid cromatography mass spectrometry (MS) data, and can work with both PSM counts, and intensities. When available, trascript isoform abundances (i.e., TPMs) are also incorporated: TPMs are used to formulate an informative prior for the respective protein isoform relative abundance. We further identify isoforms where the relative abundance of proteins and transcripts significantly differ. We use a two-layer latent variable approach to model two sources of uncertainty typical of MS data: i) peptides may be erroneously detected (even when absent); ii) many peptides are compatible with multiple protein isoforms. In the first layer, we sample the presence/absence of each peptide based on its estimated probability of being mistakenly detected, also known as PEP (i.e., posterior error probability). In the second layer, for peptides that were estimated as being present, we allocate their abundance across the protein isoforms they map to. These two steps allow us to recover the presence and abundance of each protein isoform.

Maintained by Simone Tiberi. Last updated 5 months ago.

statisticalmethodbayesianproteomicsmassspectrometryalternativesplicingsequencingrnaseqgeneexpressiongeneticsvisualizationsoftwarecpp

2.2 match 7 stars 5.39 score 10 scripts

jacky11

imp4p:Imputation for Proteomics

Functions to analyse missing value mechanisms and to impute data sets in the context of bottom-up MS-based proteomics.

Maintained by Quentin Giai Gianetto. Last updated 4 years ago.

cpp

5.2 match 1 stars 2.00 score 33 scripts 1 dependents

grosenberger

aLFQ:Estimating Absolute Protein Quantities from Label-Free LC-MS/MS Proteomics Data

Determination of absolute protein quantities is necessary for multiple applications, such as mechanistic modeling of biological systems. Quantitative liquid chromatography tandem mass spectrometry (LC-MS/MS) proteomics can measure relative protein abundance on a system-wide scale. To estimate absolute quantitative information using these relative abundance measurements requires additional information such as heavy-labeled references of known concentration. Multiple methods have been using different references and strategies; some are easily available whereas others require more effort on the users end. Hence, we believe the field might benefit from making some of these methods available under an automated framework, which also facilitates validation of the chosen strategy. We have implemented the most commonly used absolute label-free protein abundance estimation methods for LC-MS/MS modes quantifying on either MS1-, MS2-levels or spectral counts together with validation algorithms to enable automated data analysis and error estimation. Specifically, we used Monte-carlo cross-validation and bootstrapping for model selection and imputation of proteome-wide absolute protein quantity estimation. Our open-source software is written in the statistical programming language R and validated and demonstrated on a synthetic sample.

Maintained by George Rosenberger. Last updated 5 years ago.

5.3 match 1.85 score 14 scripts

sareameri

ftrCOOL:Feature Extraction from Biological Sequences

Extracts features from biological sequences. It contains most features which are presented in related work and also includes features which have never been introduced before. It extracts numerous features from nucleotide and peptide sequences. Each feature converts the input sequences to discrete numbers in order to use them as predictors in machine learning models. There are many features and information which are hidden inside a sequence. Utilizing the package, users can convert biological sequences to discrete models based on chosen properties. References: 'iLearn' 'Z. Chen et al.' (2019) <DOI:10.1093/bib/bbz041>. 'iFeature' 'Z. Chen et al.' (2018) <DOI:10.1093/bioinformatics/bty140>. <https://CRAN.R-project.org/package=rDNAse>. 'PseKRAAC' 'Y. Zuo et al.' 'PseKRAAC: a flexible web server for generating pseudo K-tuple reduced amino acids composition' (2017) <DOI:10.1093/bioinformatics/btw564>. 'iDNA6mA-PseKNC' 'P. Feng et al.' 'iDNA6mA-PseKNC: Identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC' (2019) <DOI:10.1016/j.ygeno.2018.01.005>. 'I. Dubchak et al.' 'Prediction of protein folding class using global description of amino acid sequence' (1995) <DOI:10.1073/pnas.92.19.8700>. 'W. Chen et al.' 'Identification and analysis of the N6-methyladenosine in the Saccharomyces cerevisiae transcriptome' (2015) <DOI:10.1038/srep13859>.

Maintained by Sare Amerifar. Last updated 3 years ago.

2.3 match 2 stars 2.26 score 1 scripts 3 dependents

bioc

TargetDecoy:Diagnostic Plots to Evaluate the Target Decoy Approach

A first step in the data analysis of Mass Spectrometry (MS) based proteomics data is to identify peptides and proteins. With this respect the huge number of experimental mass spectra typically have to be assigned to theoretical peptides derived from a sequence database. Search engines are used for this purpose. These tools compare each of the observed spectra to all candidate theoretical spectra derived from the sequence data base and calculate a score for each comparison. The observed spectrum is then assigned to the theoretical peptide with the best score, which is also referred to as the peptide to spectrum match (PSM). It is of course crucial for the downstream analysis to evaluate the quality of these matches. Therefore False Discovery Rate (FDR) control is used to return a reliable list PSMs. The FDR, however, requires a good characterisation of the score distribution of PSMs that are matched to the wrong peptide (bad target hits). In proteomics, the target decoy approach (TDA) is typically used for this purpose. The TDA method matches the spectra to a database of real (targets) and nonsense peptides (decoys). A popular approach to generate these decoys is to reverse the target database. Hence, all the PSMs that match to a decoy are known to be bad hits and the distribution of their scores are used to estimate the distribution of the bad scoring target PSMs. A crucial assumption of the TDA is that the decoy PSM hits have similar properties as bad target hits so that the decoy PSM scores are a good simulation of the target PSM scores. Users, however, typically do not evaluate these assumptions. To this end we developed TargetDecoy to generate diagnostic plots to evaluate the quality of the target decoy method.

Maintained by Elke Debrie. Last updated 5 months ago.

massspectrometryproteomicsqualitycontrolsoftwarevisualizationbioconductormass-spectrometry

1.0 match 1 stars 4.60 score 9 scripts

billchenxi

BMRBr:'BMRB' File Downloader

Nuclear magnetic resonance (NMR) is a highly versatile analytical technique for studying molecular configuration, conformation, and dynamics, especially those of biomacromolecules such as proteins. Biological Magnetic Resonance Data Bank ('BMRB') is a repository for Data from NMR Spectroscopy on Proteins, Peptides, Nucleic Acids, and other Biomolecules. Currently, 'BMRB' offers an R package 'RBMRB' to fetch data, however, it doesn't easily offer individual data file downloading and storing in a local directory. When using 'RBMRB', the data will stored as an R object, which fundamentally hinders the NMR researches to access the rich information from raw data, for example, the metadata. Here, 'BMRBr' File Downloader ('BMRBr') offers a more fundamental, low level downloader, which will download original deposited .str format file. This type of file contains information such as entry title, authors, citation, protein sequences, and so on. Many factors affect NMR experiment outputs, such as temperature, resonance sensitivity and etc., approximately 40% of the entries in the 'BMRB' have chemical shift accuracy problems [1,2] Unfortunately, current reference correction methods are heavily dependent on the availability of assigned protein chemical shifts or protein structure. This is my current research project is going to solve, which will be included in the future release of the package. The current version of the package is sufficient and robust enough for downloading individual 'BMRB' data file from the 'BMRB' database <http://www.bmrb.wisc.edu>. The functionalities of this package includes but not limited: * To simplifies NMR researches by combine data downloading and results analysis together. * To allows NMR data reaches a broader audience that could utilize more than just chemical shifts but also metadata. * To offer reference corrected data for entries without assignment or structure information (future release). Reference: [1] E.L. Ulrich, H. Akutsu, J.F. Doreleijers, Y. Harano, Y.E. Ioannidis, J. Lin, et al., BioMagResBank, Nucl. Acids Res. 36 (2008) D402–8. <doi:10.1093/nar/gkm957>. [2] L. Wang, H.R. Eghbalnia, A. Bahrami, J.L. Markley, Linear analysis of carbon-13 chemical shift differences and its application to the detection and correction of errors in referencing and spin system identifications, J. Biomol. NMR. 32 (2005) 13–22. <doi:10.1007/s10858-005-1717-0>.

Maintained by Xi Chen. Last updated 6 years ago.

0.5 match 1 stars 3.74 score 11 scripts