R-universe search: peptides

dosorio

Peptides:Calculate Indices and Theoretical Physicochemical Properties of Protein Sequences

Includes functions to calculate several physicochemical properties and indices for amino-acid sequences as well as to read and plot 'XVG' output files from the 'GROMACS' molecular dynamics package.

Maintained by Daniel Osorio. Last updated 1 year ago.

bioinformatics calculate-indices peptides protein-sequences qsar cpp

63.7 match 82 stars 9.14 score 245 scripts 7 dependents

cpanse

protViz:Visualizing and Analyzing Mass Spectrometry Related Data in Proteomics

Helps with quality checks, visualizations and analysis of mass spectrometry data, coming from proteomics experiments. The package is developed, tested and used at the Functional Genomics Center Zurich <https://fgcz.ch>. We use this package mainly for prototyping, teaching, and having fun with proteomics data. But it can also be used to do data analysis for small scale data sets.

Maintained by Christian Panse. Last updated 1 year ago.

fun mass-spectrometry peptide-identification proteomics quantification visualization cpp

43.6 match 11 stars 7.88 score 72 scripts 2 dependents

bioc

isobar:Analysis and quantitation of isobarically tagged MSMS proteomics data

isobar provides methods for preprocessing, normalization, and report generation for the analysis of quantitative mass spectrometry proteomics data labeled with isobaric tags, such as iTRAQ and TMT. Features modules for integrating and validating PTM-centric datasets (isobar-PTM). More information on http://www.ms-isobar.org.

Maintained by Florian P Breitwieser. Last updated 5 months ago.

immunooncology proteomics massspectrometry bioinformatics multiplecomparisons qualitycontrol

26.0 match 10 stars 6.96 score 19 scripts

bioc

DAPAR:Tools for the Differential Analysis of Proteins Abundance with R

The package DAPAR is a Bioconductor-distributed R package which provides all the necessary functions to analyze quantitative data from label-free proteomics experiments. Unlike most other similar R packages, it is endowed with rich and user-friendly graphical interfaces, so that no programming skill is required (see `Prostar` package).

Maintained by Samuel Wieczorek. Last updated 5 months ago.

proteomics normalization preprocessing massspectrometry qualitycontrol go dataimport prostar1

32.4 match 2 stars 5.42 score 22 scripts 1 dependent

jpquast

protti:Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools

Useful functions and workflows for proteomics quality control and data analysis of both limited proteolysis-coupled mass spectrometry (LiP-MS) (Feng et al. (2014) <doi:10.1038/nbt.2999>) and regular bottom-up proteomics experiments. Data generated with search tools such as 'Spectronaut', 'MaxQuant' and 'Proteome Discoverer' can be easily used thanks to the flexibility of the functions.

Maintained by Jan-Philipp Quast. Last updated 5 months ago.

data-analysis lip-ms mass-spectrometry omics protein proteomics systems-biology

20.4 match 61 stars 8.58 score 83 scripts

bioc

PSMatch:Handling and Managing Peptide Spectrum Matches

The PSMatch package helps proteomics practitioners to load, handle and manage Peptide Spectrum Matches. It provides functions to model peptide-protein relations as adjacency matrices and connected components, visualise these as graphs and make informed decisions about shared peptide filtering. The package also provides functions to calculate and visualise MS2 fragment ions.

Maintained by Laurent Gatto. Last updated 5 months ago.

infrastructure proteomics massspectrometry mass-spectrometry peptide-spectrum-matches

20.1 match 3 stars 8.40 score 15 scripts 39 dependents

cogdisreslab

KRSA:KRSA: Kinome Random Sampling Analyzer

The goal of this package is to analyze the PamChip data and identify the changes in the active kinome. The package can preprocess the PamChip data output from BioNavigator and use Random Sampling and Permutation Analysis to identify upstream kinases. Additionally, this package provides a set of useful visualizations for the PamChip data.

Maintained by Ali Sajid Imami. Last updated 10 days ago.

kinase phosphatases pamchip kinome random sampling permutation analysis

30.3 match 4 stars 4.42 score 49 scripts

mkajano

HDXBoxeR:Analysis of Hydrogen-Deuterium Exchange Mass-Spectrometry Data

A protocol that facilitates the processing and analysis of Hydrogen-Deuterium Exchange Mass Spectrometry data using p-value statistics and Critical Interval analysis. It provides a pipeline for analyzing data from 'HDXExaminer' (Sierra Analytics, Trajan Scientific), automating matching and comparison of protein states through Welch's T-test and the Critical Interval statistical framework. Additionally, it simplifies data export, generates 'PyMol' scripts, and ensures calculations meet publication standards. 'HDXBoxeR' assists in various aspects of hydrogen-deuterium exchange data analysis, including reprocessing data, calculating parameters, identifying significant peptides, generating plots, and facilitating comparison between protein states. For details check papers by Hageman and Weis (2019) <doi:10.1021/acs.analchem.9b01325> and Masson et al. (2019) <doi:10.1038/s41592-019-0459-y>. 'HDXBoxeR' citation: Janowska et al. (2024) <doi:10.1093/bioinformatics/btae479>.

Maintained by Maria K. Janowska. Last updated 7 months ago.

26.5 match 2 stars 4.60 score 3 scripts

pmartr

pmartR:Panomics Marketplace - Quality Control and Statistical Analysis for Panomics Data

Provides functionality for quality control processing and statistical analysis of mass spectrometry (MS) omics data, in particular proteomic (either at the peptide or the protein level), lipidomic, and metabolomic data, as well as RNA-seq based count data and nuclear magnetic resonance (NMR) data. This includes data transformation, specification of groups that are to be compared against each other, filtering of features and/or samples, data normalization, data summarization (correlation, PCA), and statistical comparisons between defined groups. Implements methods described in: Webb-Robertson et al. (2014) <doi:10.1074/mcp.M113.030932>. Webb-Robertson et al. (2011) <doi:10.1002/pmic.201100078>. Matzke et al. (2011) <doi:10.1093/bioinformatics/btr479>. Matzke et al. (2013) <doi:10.1002/pmic.201200269>. Polpitiya et al. (2008) <doi:10.1093/bioinformatics/btn217>. Webb-Robertson et al. (2010) <doi:10.1021/pr1005247>.

Maintained by Lisa Bramer. Last updated 3 days ago.

data-summarization lipids mass-spectrometry metabolites metabolomics-data peptides proteins rna-seq-analysis openblas cpp

14.9 match 40 stars 7.69 score 144 scripts

bioc

specL:specL - Prepare Peptide Spectrum Matches for Use in Targeted Proteomics

Provides functions for generating spectral libraries that can be used for MRM/SRM MS workflows in proteomics. The package provides a BiblioSpec reader, a function which can add the protein information using a FASTA formatted amino acid file, and an export method for using the created library in the Spectronaut software. The package is developed, tested and used at the Functional Genomics Center Zurich <https://fgcz.ch>.

Maintained by Christian Panse. Last updated 5 months ago.

massspectrometry proteomics dda dia mass-spectrometry

20.8 match 1 star 5.46 score 12 scripts

bioc

DEqMS:a tool to perform statistical analysis of differential protein expression for quantitative proteomics data.

DEqMS is developed on top of Limma. However, Limma assumes the same prior variance for all genes. In proteomics, the accuracy of protein abundance estimates varies with the number of peptides/PSMs quantified, in both label-free and labelled data. Proteins quantified by multiple peptides or PSMs are estimated more accurately. The DEqMS package is able to estimate different prior variances for proteins quantified by different numbers of PSMs/peptides, therefore achieving better accuracy. The package can be applied to analyze both label-free and labelled proteomics data.

Maintained by Yafeng Zhu. Last updated 5 months ago.

immunooncology proteomics massspectrometry preprocessing differentialexpression multiplecomparison normalization bayesian experimenthubsoftware limma quantitative-proteomic-analysis

13.3 match 23 stars 8.18 score 58 scripts 1 dependent

bioc

SWATH2stats:Transform and Filter SWATH Data for Statistical Packages

This package is intended to transform SWATH data from the OpenSWATH software into a format readable by other statistics packages while performing filtering, annotation and FDR estimation.

Maintained by Peter Blattmann. Last updated 5 months ago.

proteomics annotation experimentaldesign preprocessing massspectrometry immunooncology

17.1 match 1 star 6.30 score 22 scripts

biogenies

CancerGram:Prediction of Anticancer Peptides

Predicts anticancer peptides using random forests trained on the n-gram encoded peptides. The implemented algorithm can be accessed from both the command line and shiny-based GUI. The CancerGram model is too large for CRAN and it has to be downloaded separately from the repository: <https://github.com/BioGenies/CancerGramModel>. For more information see: Burdukiewicz et al. (2020) <doi:10.3390/pharmaceutics12111045>.

Maintained by Michal Burdukiewicz. Last updated 4 years ago.

anticancer-peptides bioinformatics k-mer n-gram peptide-identification random-forests

27.1 match 4 stars 3.90 score 3 scripts

bioc

pepStat:Statistical analysis of peptide microarrays

Statistical analysis of peptide microarrays

Maintained by Gregory C Imholte. Last updated 5 months ago.

microarray preprocessing

18.6 match 7 stars 5.62 score 4 scripts

nanxstats

protr:Generating Various Numerical Representation Schemes for Protein Sequences

Comprehensive toolkit for generating various numerical features of protein sequences described in Xiao et al. (2015) <DOI:10.1093/bioinformatics/btv042>. For full functionality, the software 'ncbi-blast+' is needed, see <https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html> for more information.

Maintained by Nan Xiao. Last updated 6 months ago.

bioinformatics feature-engineering feature-extraction machine-learning peptides protein-sequences sequence-analysis

10.0 match 52 stars 10.02 score 173 scripts 3 dependents

bioc

msImpute:Imputation of label-free mass spectrometry peptides

MsImpute is a package for imputation of peptide intensity in proteomics experiments. It additionally contains tools for MAR/MNAR diagnosis and assessment of distortions to the probability distribution of the data post imputation. The missing values are imputed by low-rank approximation of the underlying data matrix if they are MAR (method = "v2"), by the Barycenter approach if missingness is MNAR ("v2-mnar"), or by Peptide Identity Propagation (PIP).

Maintained by Soroor Hediyeh-zadeh. Last updated 5 months ago.

massspectrometry proteomics software label-free-proteomics low-rank-approximation

18.6 match 14 stars 5.15 score 7 scripts

protviz

prozor:Minimal Protein Set Explaining Peptide Spectrum Matches

Determine minimal protein set explaining peptide spectrum matches. Utility functions for creating fasta amino acid databases with decoys and contaminants. Peptide false discovery rate estimation for target decoy search results on psm, precursor, peptide and protein level. Computing dynamic swath window sizes based on MS1 or MS2 signal distributions.

Maintained by Witold Wolski. Last updated 4 months ago.

software massspectrometry proteomics experimenthubsoftware

21.5 match 6 stars 4.45 score 93 scripts

cran

PepMapViz:A Versatile Toolkit for Peptide Mapping, Visualization, and Comparative Exploration

A versatile R visualization package that empowers researchers with comprehensive visualization tools for seamlessly mapping peptides to protein sequences, identifying distinct domains and regions of interest, accentuating mutations, and highlighting post-translational modifications, all while enabling comparisons across diverse experimental conditions. Potential applications of 'PepMapViz' include the visualization of cross-software mass spectrometry results at the peptide level for specific protein and domain details in a linearized format, and of post-translational modification coverage across different experimental conditions, which can help unravel insights into disease mechanisms. It also enables visualization of major histocompatibility complex-presented peptides in different antibody regions predicting immunogenicity in antibody drug development.

Maintained by Zhenru Zhou. Last updated 4 months ago.

immunogenicity massspectrometry proteomics peptidomics software visualization

33.2 match 2.70 score

mariechion

mi4p:Multiple Imputation for Proteomics

A framework for multiple imputation for proteomics is proposed by Marie Chion, Christine Carapito and Frederic Bertrand (2021) <doi:10.1371/journal.pcbi.1010420>. It is dedicated to dealing with multiple imputation for proteomics.

Maintained by Frederic Bertrand. Last updated 6 months ago.

15.9 match 6 stars 4.91 score 27 scripts

bioc

PepSetTest:Peptide Set Test

Peptide Set Test (PepSetTest) is a peptide-centric strategy to infer differentially expressed proteins in LC-MS/MS proteomics data. This test detects coordinated changes in the expression of peptides originating from the same protein and compares these changes against the rest of the peptidome. Compared to traditional aggregation-based approaches, the peptide set test demonstrates improved statistical power while still controlling the Type I error rate correctly in most cases. This test can be valuable for discovering novel biomarkers and prioritizing drug targets, especially when the direct application of statistical analysis to protein data fails to provide substantial insights.

Maintained by Junmin Wang. Last updated 5 months ago.

differentialexpression regression proteomics massspectrometry

15.2 match 2 stars 5.00 score 9 scripts

bioc

cleaver:Cleavage of Polypeptide Sequences

In-silico cleavage of polypeptide sequences. The cleavage rules are taken from: http://web.expasy.org/peptide_cutter/peptidecutter_enzymes.html

Maintained by Sebastian Gibb. Last updated 5 months ago.

proteomics bioc cleavage digest peptides

10.0 match 12 stars 7.46 score 22 scripts 2 dependents
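
As a rough illustration of what in-silico cleavage means, the Python sketch below digests a sequence with a deliberately simplified trypsin rule: cut after K or R unless the next residue is P. This is an assumption made for illustration only; it is not the cleaver API, and the PeptideCutter rule set referenced above contains further exceptions. The function name `cleave_trypsin_simplified` and its `missed_cleavages` parameter are hypothetical.

```python
import re


def cleave_trypsin_simplified(sequence, missed_cleavages=0):
    """Digest a peptide sequence with a simplified trypsin rule.

    Assumption: cleave C-terminal to K or R, except when the next
    residue is P. Real rule sets include additional exceptions.
    """
    # Zero-width split: position preceded by K/R and not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]

    peptides = []
    for start in range(len(fragments)):
        # Join up to `missed_cleavages` extra neighbouring fragments.
        last = min(start + missed_cleavages, len(fragments) - 1)
        for end in range(start, last + 1):
            peptides.append("".join(fragments[start:end + 1]))
    return peptides


if __name__ == "__main__":
    print(cleave_trypsin_simplified("AKRPGKLLR"))
    print(cleave_trypsin_simplified("AKRPGKLLR", missed_cleavages=1))
```

With `missed_cleavages=1`, each fragment is also reported joined to its immediate neighbour, which mimics incomplete digestion.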

cbielow

PTXQC:Quality Report Generation for MaxQuant and mzTab Results

Generates Proteomics (PTX) quality control (QC) reports for shotgun LC-MS data analyzed with the MaxQuant software suite (from .txt files) or mzTab files (ideally from OpenMS 'QualityControl' tool). Reports are customizable (target thresholds, subsetting) and available in HTML or PDF format. Published in J. Proteome Res., Proteomics Quality Control: Quality Control Software for MaxQuant Results (2015) <doi:10.1021/acs.jproteome.5b00780>.

Maintained by Chris Bielow. Last updated 1 year ago.

drag-and-drop hacktoberfest heatmap match-between-runs maxquant metric mztab openms proteomics quality-control quality-metrics report

7.9 match 42 stars 9.35 score 105 scripts 1 dependent

wraff

wrProteo:Proteomics Data Analysis Functions

Data analysis of proteomics experiments by mass spectrometry is supported by this collection of functions mostly dedicated to the analysis of (bottom-up) quantitative (XIC) data. Fasta-formatted proteomes (eg from UniProt Consortium <doi:10.1093/nar/gky1049>) can be read with automatic parsing and multiple annotation types (like species origin, abbreviated gene names, etc) extracted. Initial results from multiple software for protein (and peptide) quantitation can be imported (to a common format): MaxQuant (Tyanova et al 2016 <doi:10.1038/nprot.2016.136>), Dia-NN (Demichev et al 2020 <doi:10.1038/s41592-019-0638-x>), Fragpipe (da Veiga et al 2020 <doi:10.1038/s41592-020-0912-y>), ionbot (Degroeve et al 2021 <doi:10.1101/2021.07.02.450686>), MassChroq (Valot et al 2011 <doi:10.1002/pmic.201100120>), OpenMS (Strauss et al 2021 <doi:10.1038/nmeth.3959>), ProteomeDiscoverer (Orsburn 2021 <doi:10.3390/proteomes9010015>), Proline (Bouyssie et al 2020 <doi:10.1093/bioinformatics/btaa118>), AlphaPept (preprint Strauss et al <doi:10.1101/2021.07.23.453379>) and Wombat-P (Bouyssie et al 2023 <doi:10.1021/acs.jproteome.3c00636>). Meta-data provided by initial analysis software and/or in sdrf format can be integrated into the analysis. Quantitative proteomics measurements frequently contain multiple NA values, due to physical absence of given peptides in some samples, limitations in sensitivity or other reasons. Help is provided to inspect the data graphically to investigate the nature of NA-values via their respective replicate measurements and to help/confirm the choice of NA-replacement algorithms. Meta-data in sdrf-format (Perez-Riverol et al 2020 <doi:10.1021/acs.jproteome.0c00376>) or similar tabular formats can be imported and included. Missing values can be inspected and imputed based on the concept of NA-neighbours or other methods. Dedicated filtering and statistical testing using the framework of package 'limma' <doi:10.18129/B9.bioc.limma> can be run, enhanced by multiple rounds of NA-replacements to provide robustness towards rare stochastic events. Multi-species samples, as frequently used in benchmark-tests (eg Navarro et al 2016 <doi:10.1038/nbt.3685>, Ramus et al 2016 <doi:10.1016/j.jprot.2015.11.011>), can be run with special options considering such sub-groups during normalization and testing. Subsequently, ROC curves (Hand and Till 2001 <doi:10.1023/A:1010920819831>) can be constructed to compare multiple analysis approaches. As a detailed example, the data-set from Ramus et al 2016 <doi:10.1016/j.jprot.2015.11.011> quantified by MaxQuant, ProteomeDiscoverer, and Proline is provided with a detailed analysis of heterologous spike-in proteins.

Maintained by Wolfgang Raffelsberger. Last updated 4 months ago.

19.9 match 3.67 score 17 scripts 1 dependent

bioc

msqrob2:Robust statistical inference for quantitative LC-MS proteomics

msqrob2 provides a robust linear mixed model framework for assessing differential abundance in MS-based quantitative proteomics experiments. Our workflows can start from raw peptide intensities or summarised protein expression values. The model parameter estimates can be stabilized by ridge regression, empirical Bayes variance estimation and robust M-estimation. msqrob2's hurdle workflow can handle missing data without having to rely on hard-to-verify imputation assumptions, and outcompetes state-of-the-art methods with and without imputation for both high and low missingness. It builds on the QFeatures infrastructure for quantitative mass spectrometry data to store the model results together with the raw data and preprocessed data.

Maintained by Lieven Clement. Last updated 18 days ago.

proteomics massspectrometry differentialexpression multiplecomparison regression experimentaldesign software immunooncology normalization timecourse preprocessing

10.5 match 10 stars 6.94 score 83 scripts

cran

OrgMassSpecR:Organic Mass Spectrometry

Organic/biological mass spectrometry data analysis.

Maintained by Nathan Dodder. Last updated 8 years ago.

19.5 match 3.68 score 2 dependents

bioc

MSnID:Utilities for Exploration and Assessment of Confidence of LC-MSn Proteomics Identifications

Extracts MS/MS ID data from mzIdentML (leveraging mzID package) or text files. After collating the search results from multiple datasets it assesses their identification quality and optimizes filtering criteria to achieve the maximum number of identifications while not exceeding a specified false discovery rate. Also contains a number of utilities to explore the MS/MS results and assess missed and irregular enzymatic cleavages, mass measurement accuracy, etc.

Maintained by Vlad Petyuk. Last updated 5 months ago.

proteomics massspectrometry immunooncology

13.9 match 5.06 score 57 scripts

bioc

MSstatsLiP:LiP Significance Analysis in shotgun mass spectrometry-based proteomic experiments

Tools for LiP peptide and protein significance analysis. Provides functions for summarization, estimation of LiP peptide abundance, and detection of changes across conditions. Utilizes functionality across the MSstats family of packages.

Maintained by Devon Kohler. Last updated 5 months ago.

immunooncology massspectrometry proteomics software differentialexpression onechannel twochannel normalization qualitycontrol cpp

12.3 match 7 stars 5.62 score 5 scripts

bioc

dagLogo:dagLogo: a Bioconductor package for visualizing conserved amino acid sequence pattern in groups based on probability theory

Visualize significant conserved amino acid sequence patterns in groups based on probability theory.

Maintained by Jianhong Ou. Last updated 2 months ago.

sequencematching visualization

14.0 match 4.48 score 9 scripts

michbur

AmpGram:Prediction of Antimicrobial Peptides

Predicts antimicrobial peptides using random forests trained on the n-gram encoded peptides (<doi:10.3390/ijms21124310>). The implemented algorithm can be accessed from both the command line and shiny-based GUI. The AmpGram model is too large for CRAN and it has to be downloaded separately from the repository: <https://github.com/michbur/AmpGramModel>.

Maintained by Michal Burdukiewicz. Last updated 3 years ago.

14.0 match 4 stars 4.38 score 5 scripts

laurafancello

net4pg:Handle Ambiguity of Protein Identifications from Shotgun Proteomics

In shotgun proteomics, shared peptides (i.e., peptides that might originate from different proteins sharing homology, from different proteoforms due to alternative mRNA splicing, post-translational modifications, proteolytic cleavages, and/or allelic variants) represent a major source of ambiguity in protein identifications. The 'net4pg' package makes it possible to assess and handle ambiguity of protein identifications. It implements methods for two main applications. First, it represents and quantifies ambiguity of protein identifications by means of graph connected components (CCs). In graph theory, CCs are defined as the largest subgraphs in which any two vertices are connected to each other by a path and not connected to any other of the vertices in the supergraph. Here, proteins sharing one or more peptides are thus gathered in the same CC (multi-protein CC), while unambiguous protein identifications constitute CCs with a single protein vertex (single-protein CCs). Therefore, the proportion of single-protein CCs and the size of multi-protein CCs can be used to measure the level of ambiguity of protein identifications. The package implements a strategy to efficiently calculate graph connected components on large datasets and to visually inspect them. Secondly, the 'net4pg' package exploits the increasing availability of matched transcriptomic and proteomic datasets to reduce ambiguity of protein identifications. More precisely, it implements a transcriptome-based filtering strategy that consists in removing those proteins whose corresponding transcript is not expressed in the sample-matched transcriptome. The underlying assumption is that, according to the central dogma of biology, there can be no proteins without the corresponding transcript. Most importantly, the package makes it possible to visually inspect the effect of the filtering on protein identifications and to quantify ambiguity before and after filtering by means of graph connected components. As such, it constitutes a reproducible and transparent method to exploit transcriptome information to enhance protein identifications. All methods implemented in the 'net4pg' package are fully described in Fancello and Burger (2022) <doi:10.1186/s13059-022-02701-2>.

Maintained by Laura Fancello. Last updated 3 years ago.

15.2 match 2 stars 4.00 score 3 scripts
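
The connected-component idea in the description above can be sketched in a few lines of Python. This is a minimal illustration using a union-find structure, not the 'net4pg' implementation. The function names, the input format (a dict mapping each peptide to the proteins it could originate from), and the example data are all assumptions made for the sketch.

```python
from collections import defaultdict


def protein_connected_components(peptide_to_proteins):
    """Group proteins that are linked by at least one shared peptide."""
    parent = {}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for proteins in peptide_to_proteins.values():
        proteins = list(proteins)
        for p in proteins:
            parent.setdefault(p, p)
        # Every protein matched by this peptide joins the same component.
        for p in proteins[1:]:
            root_a, root_b = find(proteins[0]), find(p)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(set)
    for p in parent:
        groups[find(p)].add(p)
    return sorted(sorted(g) for g in groups.values())


def single_protein_fraction(components):
    """Proportion of components containing exactly one protein."""
    if not components:
        return 0.0
    return sum(len(c) == 1 for c in components) / len(components)


if __name__ == "__main__":
    # Hypothetical peptide-to-protein assignments.
    mapping = {
        "PEPA": ["P1"],
        "PEPB": ["P2", "P3"],
        "PEPC": ["P3", "P4"],
        "PEPD": ["P5"],
    }
    ccs = protein_connected_components(mapping)
    print(ccs)
    print(single_protein_fraction(ccs))
```

In this toy input, P2, P3 and P4 fall into one multi-protein component because the peptides PEPB and PEPC chain them together, while P1 and P5 each form a single-protein component.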

bioc

MSstats:Protein Significance Analysis in DDA, SRM and DIA for Label-free or Label-based Proteomics Experiments

A set of tools for statistical relative protein significance analysis in DDA, SRM and DIA experiments.

Maintained by Meena Choi. Last updated 11 days ago.

immunooncology massspectrometry proteomics software normalization qualitycontrol timecourse openblas cpp

7.2 match 8.49 score 164 scripts 7 dependents

cogdisreslab

creedenzymatic:creedenzymatic

Combine kinome results from KRSA, UKA, and other tools. A package for integrating upstream kinase analyses.

Maintained by Ali Sajid Imami. Last updated 12 months ago.

bioconductor bioconductor-package kinome

16.7 match 1 star 3.60 score 20 scripts

rrrlw

helixvis:Visualize Alpha-Helical Peptide Sequences

Create publication-quality, 2-dimensional visualizations of alpha-helical peptide sequences. Specifically, allows the user to programmatically generate helical wheels and wenxiang diagrams to provide a bird's eye, top-down view of alpha-helical oligopeptides. See Wadhwa RR, et al. (2018) <doi:10.21105/joss.01008> for more information.

Maintained by Raoul Wadhwa. Last updated 6 years ago.

biochemistry helical-wheels peptide-sequences visualization wenxiang-diagrams

14.5 match 2 stars 4.00 score 3 scripts

bioc

scp:Mass Spectrometry-Based Single-Cell Proteomics Data Analysis

Utility functions for manipulating, processing, and analyzing mass spectrometry-based single-cell proteomics data. The package is an extension to the 'QFeatures' package and relies on 'SingleCellExperiment' to enable single-cell proteomics analyses. The package offers the user the functionality to process quantitative tables (as generated by MaxQuant, Proteome Discoverer, and more) into data tables ready for downstream analysis and data visualization.

Maintained by Christophe Vanderaa. Last updated 16 days ago.

geneexpression proteomics singlecell massspectrometry preprocessing cellbasedassays bioconductor mass-spectrometry single-cell software

6.4 match 25 stars 8.94 score 115 scripts

patzaw

BED:Biological Entity Dictionary (BED)

An interface for the 'Neo4j' database providing mapping between different identifiers of biological entities. This Biological Entity Dictionary (BED) has been developed to address three main challenges. The first one is related to the completeness of identifier mappings. Indeed, direct mapping information provided by the different systems is not always complete and can be enriched by mappings provided by other resources. More interestingly, direct mappings not identified by any of these resources can be indirectly inferred by using mappings to a third reference. For example, many human Ensembl gene IDs are not directly mapped to any Entrez gene ID but such mappings can be inferred using respective mappings to HGNC IDs. The second challenge is related to the mapping of deprecated identifiers. Indeed, entity identifiers can change from one resource release to another. The identifier history is provided by some resources, such as Ensembl or the NCBI, but it is generally not used by mapping tools. The third challenge is related to the automation of the mapping process according to the relationships between the biological entities of interest. Indeed, mapping between gene and protein ID scopes should not be done the same way as between two scopes regarding gene IDs. Also, converting identifiers from different organisms should be possible using gene ortholog information. The method has been published by Godard and van Eyll (2018) <doi:10.12688/f1000research.13925.3>.

Maintained by Patrice Godard. Last updated 3 months ago.

7.7 match 8 stars 6.85 score 25 scripts

bioc

MSnbase:Base Functions and Classes for Mass Spectrometry and Proteomics

MSnbase provides infrastructure for manipulation, processing and visualisation of mass spectrometry and proteomics data, ranging from raw to quantitative and annotated data.

Maintained by Laurent Gatto. Last updated 2 days ago.

immunooncology infrastructure proteomics massspectrometry qualitycontrol dataimport bioconductor bioinformatics mass-spectrometry proteomics-data visualisation cpp

3.9 match 130 stars 12.81 score 772 scripts 36 dependents

bioc

Doscheda:A DownStream Chemo-Proteomics Analysis Pipeline

Doscheda focuses on quantitative chemoproteomics used to determine protein interaction profiles of small molecules from whole cell or tissue lysates using Mass Spectrometry data. The package provides a shiny application to run the pipeline, several visualisations and a downloadable report of an experiment.

Maintained by Bruno Contrino. Last updated 5 months ago.

proteomics normalization preprocessing massspectrometry qualitycontrol dataimport regression

11.8 match 4.00 score 2 scripts

bioc

ComPrAn:Complexome Profiling Analysis package

This package is for the analysis of SILAC labeled complexome profiling data. It uses a peptide table in tab-delimited format as an input and produces ready-to-use tables and plots.

Maintained by Petra Palenikova. Last updated 5 months ago.

massspectrometry proteomics visualization

10.2 match 4.48 score 5 scripts

bioc

mspms:Tools for the analysis of MSP-MS data

This package provides functions for the analysis of data generated by the multiplex substrate profiling by mass spectrometry for proteases (MSP-MS) method. Data exported from upstream proteomics software is accepted as input and subsequently processed for analysis. Tools for statistical analysis, visualization, and interpretation of the data are provided.

Maintained by Charlie Bayne. Last updated 3 months ago.

proteomics massspectrometry preprocessing protease proteomics-data-analysis

9.2 match 4.95 score 4 scripts

bioc

beer:Bayesian Enrichment Estimation in R

BEER implements a Bayesian model for analyzing phage-immunoprecipitation sequencing (PhIP-seq) data. Given a PhIPData object, BEER returns posterior probabilities of enriched antibody responses, point estimates for the relative fold-change in comparison to negative control samples, and more. Additionally, BEER provides a convenient implementation for using edgeR to identify enriched antibody responses.

Maintained by Athena Chen. Last updated 5 months ago.

software statisticalmethod bayesian sequencing coverage jags cpp

8.2 match 10 stars 5.38 score 12 scripts

jrcodina

peptoolkit:A Toolkit for Using Peptide Sequences in Machine Learning

This toolkit is designed for manipulation and analysis of peptides. It provides functionalities to assist researchers in peptide engineering and proteomics. Users can manipulate peptides by adding amino acids at every position, count occurrences of each amino acid at each position, and transform amino acid counts based on probabilities. The package offers functionalities to select the best versus the worst peptides and analyze these peptides, which includes counting specific residues, reducing peptide sequences, extracting features through One Hot Encoding (OHE), and utilizing Quantitative Structure-Activity Relationship (QSAR) properties (based on the package 'Peptides' by Osorio et al. (2015) <doi:10.32614/RJ-2015-001>). This package is intended for both researchers and bioinformatics enthusiasts working on peptide-based projects, especially for their use with machine learning.

Maintained by Josep-Ramon Codina. Last updated 1 year ago.

15.0 match 2.70 score

bioc

ProteoMM:Multi-Dataset Model-based Differential Expression Proteomics Analysis Platform

ProteoMM is a statistical method to perform model-based peptide-level differential expression analysis of single or multiple datasets. For multiple datasets ProteoMM produces a single fold change and p-value for each protein across multiple datasets. ProteoMM provides functionality for normalization, missing value imputation and differential expression. The model-based peptide-level imputation and differential expression analysis component of the package follows the analysis described in "A statistical framework for protein quantitation in bottom-up MS based proteomics" (Karpievitch et al. Bioinformatics 2009). EigenMS normalisation is implemented as described in "Normalization of peak intensities in bottom-up MS-based proteomics using singular value decomposition" (Karpievitch et al. Bioinformatics 2009).

Maintained by Yuliya V Karpievitch. Last updated 5 months ago.

immunooncology massspectrometry proteomics normalization differentialexpression

11.8 match 3.38 score 12 scripts

bioc

MSstatsLOBD:Assay characterization: estimation of limit of blank (LOB) and limit of detection (LOD)

The MSstatsLOBD package allows calculation and visualization of limit of blank (LOB) and limit of detection (LOD). We define the LOB as the highest apparent concentration of a peptide expected when replicates of a blank sample containing no peptides are measured. The LOD is defined as the measured concentration value for which the probability of falsely claiming the absence of a peptide in the sample is 0.05, given a probability of 0.05 of falsely claiming its presence. These functionalities were previously a part of the MSstats package. The methodology is described in Galitzine (2018) <doi:10.1074/mcp.RA117.000322>.

Maintained by Devon Kohler. Last updated 5 months ago.

immunooncology massspectrometry proteomics software differentialexpression onechannel twochannel normalization qualitycontrol mass-spectrometry

8.8 match 4.30 score 1 scripts

bioc

qPLEXanalyzer:Tools for quantitative proteomics data analysis

Tools for TMT based quantitative proteomics data analysis.

Maintained by Ashley Sawle. Last updated 5 months ago.

immunooncology proteomics massspectrometry normalization preprocessing qualitycontrol dataimport

9.3 match 1 stars 4.08 score 9 scripts

legana

ampir:Predict Antimicrobial Peptides

A toolkit to predict antimicrobial peptides from protein sequences on a genome-wide scale. It incorporates two support vector machine models ("precursor" and "mature") trained on publicly available antimicrobial peptide data using calculated physico-chemical and compositional sequence properties described in Meher et al. (2017) <doi:10.1038/srep42362>. In order to support genome-wide analyses, these models are designed to accept any type of protein as input and calculation of compositional properties has been optimised for high-throughput use. For best results it is important to select the model that accurately represents your sequence type: for full length proteins, it is recommended to use the default "precursor" model. The alternative, "mature", model is best suited for mature peptide sequences that represent the final antimicrobial peptide sequence after post-translational processing. For details see Fingerhut et al. (2020) <doi:10.1093/bioinformatics/btaa653>. The 'ampir' package is also available via a Shiny based GUI at <https://ampir.marine-omics.net/>.

Maintained by Legana Fingerhut. Last updated 4 years ago.

cpp

6.1 match 27 stars 5.96 score 34 scripts

bioc

synapter:Label-free data analysis pipeline for optimal identification and quantitation

The synapter package provides functionality to reanalyse label-free proteomics data acquired on a Synapt G2 mass spectrometer. One or several runs, possibly processed with additional ion mobility separation to increase identification accuracy can be combined to other quantitation files to maximise identification and quantitation accuracy.

Maintained by Laurent Gatto. Last updated 4 days ago.

immunooncology massspectrometry proteomics qualitycontrol

7.4 match 4 stars 4.73 score 5 scripts

cogdisreslab

KINNET:Kinase INteraction NETwork Generation

This package provides the functionality to process PamGene's PamChip Data Output and generate kinase interaction networks from it. This project uses a Bayesian algorithm to generate Bayesian networks for defining dependence relationships between peptide sequences in the PamChip data. It then uses a novel kinase assignment method to assign upstream kinases to each peptide; the result is output as a graph.

Maintained by Ali Sajid Imami. Last updated 3 years ago.

16.0 match 2 stars 2.00 score 3 scripts

bioc

Pirat:Precursor or Peptide Imputation under Random Truncation

Pirat enables the imputation of missing values (either MNARs or MCARs) in bottom-up LC-MS/MS proteomics data using a penalized maximum likelihood strategy. It does not require any parameter tuning and models the instrument censorship from the available data. It accounts for correlations between sibling peptides and can leverage complementary transcriptomics measurements.

Maintained by Samuel Wieczorek. Last updated 5 months ago.

proteomics massspectrometry preprocessing software

6.6 match 4.81 score 9 scripts

michbur

signalHsmm:Predict Presence of Signal Peptides

Predicts the presence of signal peptides in eukaryotic proteins using hidden semi-Markov models. The implemented algorithm can be accessed from both the command line and GUI.

Maintained by Michal Burdukiewicz. Last updated 5 years ago.

cpp

9.0 match 2 stars 3.48 score 7 scripts

bioc

MSstatsPTM:Statistical Characterization of Post-translational Modifications

MSstatsPTM provides general statistical methods for quantitative characterization of post-translational modifications (PTMs). Supports DDA, DIA, SRM, and tandem mass tag (TMT) labeling. Typically, the analysis involves the quantification of PTM sites (i.e., modified residues) and their corresponding proteins, as well as the integration of the quantification results. MSstatsPTM provides functions for summarization, estimation of PTM site abundance, and detection of changes in PTMs across experimental conditions.

Maintained by Devon Kohler. Last updated 4 months ago.

immunooncology massspectrometry proteomics software differentialexpression onechannel twochannel normalization qualitycontrol post-translational-modification cpp

3.7 match 10 stars 7.98 score 36 scripts 2 dependents

bioc

HERON:Hierarchical Epitope pROtein biNding

HERON is a software package for analyzing peptide binding array data. In addition to identifying significant binding probes, HERON also provides functions for finding epitopes (string of consecutive peptides within a protein). HERON also calculates significance on the probe, epitope, and protein level by employing meta p-value methods. HERON is designed for obtaining calls on the sample level and calculates fractions of hits for different conditions.

Maintained by Sean McIlwain. Last updated 5 months ago.

microarray software

7.0 match 1 stars 4.18 score 6 scripts

bioc

ProtGenerics:Generic infrastructure for Bioconductor mass spectrometry packages

S4 generic functions and classes needed by Bioconductor proteomics packages.

Maintained by Laurent Gatto. Last updated 2 months ago.

infrastructure proteomics massspectrometry bioconductor mass-spectrometry metabolomics

3.0 match 8 stars 9.43 score 4 scripts 188 dependents

bioc

mzID:An mzIdentML parser for R

A parser for mzIdentML files implemented using the XML package. The parser tries to be general and able to handle all types of mzIdentML files with the drawback of having less 'pretty' output than a vendor specific parser. Please contact the maintainer with any problems and supply an mzIdentML file so the problems can be fixed quickly.

Maintained by Laurent Gatto. Last updated 5 months ago.

immunooncology dataimport massspectrometry proteomics

3.5 match 7.83 score 32 scripts 38 dependents

michbur

biogram:N-Gram Analysis of Biological Sequences

Tools for extraction and analysis of various n-grams (k-mers) derived from biological sequences (proteins or nucleic acids). Contains QuiPT (quick permutation test) for fast feature-filtering of the n-gram data.

Maintained by Michal Burdukiewicz. Last updated 7 months ago.

biological-sequences ngram-analysis

3.6 match 10 stars 7.50 score 87 scripts 3 dependents
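
To make the n-gram (k-mer) extraction described above concrete, here is a small sketch. It is not biogram's interface: the function name `count_ngrams` is hypothetical, and only contiguous n-grams are counted.

```python
# Hypothetical illustration of n-gram (k-mer) counting; not biogram code.
from collections import Counter


def count_ngrams(sequence: str, n: int) -> Counter:
    """Count every contiguous, overlapping n-gram in a sequence."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # A sequence of length L contains L - n + 1 overlapping n-grams
    # (none if the sequence is shorter than n).
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


if __name__ == "__main__":
    print(count_ngrams("AAKAA", 2))
```

The resulting counts can serve as a feature table, one column per n-gram, before any feature filtering is applied.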

moosa-r

rbioapi:User-Friendly R Interface to Biologic Web Services' API

Currently fully supports Enrichr, JASPAR, miEAA, PANTHER, Reactome, STRING, and UniProt! The goal of rbioapi is to provide a user-friendly and consistent interface to biological databases and services, in a way that insulates the user from the technicalities of using web service APIs and creates a unified and easy-to-use interface to biological and medical web services. This is an ongoing project; new databases and services will be added periodically. Feel free to suggest any databases or services you often use.

Maintained by Moosa Rezwani. Last updated 1 months ago.

api-client bioinformatics biology enrichment enrichment-analysis enrichr jaspar mieaa over-representation-analysis panther reactome string uniprot

3.5 match 20 stars 7.60 score 55 scripts

okdll

flowTraceR:Tracing Information Flow for Inter-Software Comparisons in Mass Spectrometry-Based Bottom-Up Proteomics

Useful functions to standardize software outputs from ProteomeDiscoverer, Spectronaut, DIA-NN and MaxQuant on precursor, modified peptide and proteingroup level and to trace software differences for identifications such as varying proteingroup denotations for common precursor.

Maintained by Oliver Kardell. Last updated 3 years ago.

4.9 match 3 stars 5.17 score 11 scripts 1 dependents

shixiangwang

neopeptides:Calculate and Explore Property Indices of Neopeptides

Includes functions to calculate and explore several property indices for neopeptides, which are abnormal peptides generated from genome.

Maintained by Shixiang Wang. Last updated 3 years ago.

somaticmutation immunooncology alignment immunotherapy neoantigen peptides

14.2 match 1 stars 1.70 score 1 scripts

bioc

PhIPData:Container for PhIP-Seq Experiments

PhIPData defines an S4 class for phage-immunoprecipitation sequencing (PhIP-seq) experiments. Building upon the RangedSummarizedExperiment class, PhIPData enables users to coordinate metadata with experimental data in analyses. Additionally, PhIPData provides specialized methods to subset and identify beads-only samples, subset objects using virus aliases, and use existing peptide libraries to populate object parameters.

Maintained by Athena Chen. Last updated 5 months ago.

infrastructure datarepresentation sequencing coverage

4.3 match 6 stars 5.26 score 6 scripts 1 dependents

cran

VDAP:Peptide Array Analysis Tools

Analyze Peptide Array Data and characterize peptide sequence space. Allows for high level visualization of global signal, Quality control based on replicate correlation and/or relative Kd, calculation of peptide Length/Charge/Kd parameters, Hits selection based on RFU Signal, and amino acid composition/basic motif recognition with RFU signal weighting. Basic signal trends can be used to generate peptides that follow the observed compositional trends.

Maintained by Cody Moore. Last updated 9 years ago.

18.1 match 1.15 score 14 scripts

benbruyneel

proteinDiscover:ProteinDiscover

Provides an interface to the data contained in Proteome Discoverer (Thermo Scientific) results.

Maintained by Ben Bruyneel. Last updated 1 years ago.

mass-spectrometry proteomics proteomics-data-analysis

6.7 match 2 stars 3.00 score 2 scripts

bioc

pepXMLTab:Parsing pepXML files and filter based on peptide FDR.

Parses pepXML files based on the XML package. The package tries to handle pepXML files generated by different software. The output is a tabular file of peptide-spectrum matches (PSMs). The package also provides a function to filter the PSMs based on FDR.

Maintained by Xiaojing Wang. Last updated 5 months ago.

immunooncology proteomics massspectrometry

5.4 match 3.60 score 9 scripts

bioc

MSstatsTMT:Protein Significance Analysis in shotgun mass spectrometry-based proteomic experiments with tandem mass tag (TMT) labeling

The package provides statistical tools for detecting differentially abundant proteins in shotgun mass spectrometry-based proteomic experiments with tandem mass tag (TMT) labeling. It provides multiple functionalities, including data visualization, protein quantification and normalization, and statistical modeling and inference. Furthermore, it is inter-operable with other data processing tools, such as Proteome Discoverer, MaxQuant, OpenMS and SpectroMine.

Maintained by Devon Kohler. Last updated 11 days ago.

immunooncology massspectrometry proteomics software

2.9 match 6.60 score 35 scripts 3 dependents

werpuc

HaDeX:Analysis and Visualisation of Hydrogen/Deuterium Exchange Mass Spectrometry Data

Functions for processing, analysis and visualization of Hydrogen Deuterium eXchange monitored by Mass Spectrometry experiments (HDX-MS) (10.1093/bioinformatics/btaa587). 'HaDeX' introduces a new standardized and reproducible workflow for the analysis of the HDX-MS data, including novel uncertainty intervals. Additionally, it covers data exploration, quality control and generation of publication-quality figures. All functionalities are also available in the in-built 'Shiny' app.

Maintained by Weronika Puchala. Last updated 4 years ago.

4.7 match 3.93 score 43 scripts

philipberg

baldur:Bayesian Hierarchical Modeling for Label-Free Proteomics

Statistical decision in proteomics data using a hierarchical Bayesian model. There are two regression models for describing the mean-variance trend, a gamma regression or a latent gamma mixture regression. The regression model is then used as an Empirical Bayes estimator for the prior on the variance in a peptide. Further, it assumes that each measurement has an uncertainty (increased variance) associated with it that is also inferred. Finally, it tries to estimate the posterior distribution (by Hamiltonian Monte Carlo) for the differences in means for each peptide in the data. Once the posterior is inferred, it integrates the tails to estimate the probability of error from which a statistical decision can be made. See Berg and Popescu for details (<doi:10.1101/2023.05.11.540411>).

Maintained by Philip Berg. Last updated 7 months ago.

cpp

4.5 match 1 stars 4.00 score 8 scripts

bioc

KinSwingR:KinSwingR: network-based kinase activity prediction

KinSwingR integrates phosphosite data derived from mass-spectrometry data and kinase-substrate predictions to predict kinase activity. Several functions allow the user to build PWM models of kinase-substrates, statistically infer PWM:substrate matches, and integrate these data to infer kinase activity.

Maintained by Ashley J. Waardenberg. Last updated 5 months ago.

proteomics sequencematching network

4.2 match 4.00 score 4 scripts

richelbilderbeek

netmhc2pan:Interface to 'NetMHCIIpan'

The field of immunology benefits from software that can predict which peptide sequences trigger an immune response. 'NetMHCIIpan' is such a tool: it predicts the binding strength of a short peptide to a Major Histocompatibility Complex class II (MHC-II) molecule. 'NetMHCIIpan' can be used from a web server at <https://services.healthtech.dtu.dk/services/NetMHCIIpan-3.2/> or from the command-line, using a local installation. This package allows calling 'NetMHCIIpan' from R.

Maintained by Richèl J.C. Bilderbeek. Last updated 8 months ago.

4.5 match 1 stars 3.70 score 8 scripts

bioc

ensembldb:Utilities to create and use Ensembl-based annotation databases

The package provides functions to create and use transcript-centric annotation databases/packages. The annotations for the databases are directly fetched from Ensembl using their Perl API. The functionality and data are similar to those of the TxDb packages from the GenomicFeatures package, but, in addition to retrieving all gene/transcript models and annotations from the database, ensembldb provides a filter framework that allows retrieving annotations for specific entries like genes encoded on a chromosome region or transcript models of lincRNA genes. EnsDb databases built with ensembldb also contain protein annotations and mappings between proteins and their encoding transcripts. Finally, ensembldb provides functions to map between genomic, transcript and protein coordinates.

Maintained by Johannes Rainer. Last updated 5 months ago.

genetics annotationdata sequencing coverage annotation bioconductor bioconductor-packages ensembl

1.2 match 35 stars 14.08 score 892 scripts 108 dependents

wraff

wrTopDownFrag:Internal Fragment Identification from Top-Down Mass Spectrometry

Top-Down mass spectrometry aims to identify entire proteins as well as their (post-translational) modifications or ions bound (e.g. Chen et al (2018) <doi:10.1021/acs.analchem.7b04747>). The pattern of internal fragments (Haverland et al (2017) <doi:10.1007/s13361-017-1635-x>) may reveal important information about the original structure of the proteins studied (Skinner et al (2018) <doi:10.1038/nchembio.2515> and Li et al (2018) <doi:10.1038/nchem.2908>). However, the number of possible internal fragments gets huge with longer proteins and subsequent identification of internal fragments remains challenging, in particular since the accuracy of measurements with current mass spectrometers represents a limiting factor. This package attempts to deal with the complexity of internal fragments and allows identification of terminal and internal fragments from deconvoluted mass-spectrometry data.

Maintained by Wolfgang Raffelsberger. Last updated 5 years ago.

8.1 match 2.00 score 2 scripts

bioc

ProteoDisco:Generation of customized protein variant databases from genomic variants, splice-junctions and manual sequences

ProteoDisco is an R package to facilitate proteogenomics studies. It houses functions to create customized (variant) protein databases based on user-submitted genomic variants, splice-junctions, fusion genes and manual transcript sequences. The flexible workflow can be adopted to suit a myriad of research and experimental settings.

Maintained by Job van Riet. Last updated 5 months ago.

software proteomics rnaseq snp sequencing variantannotation dataimport

3.0 match 5 stars 5.30 score 4 scripts

benbruyneel

massSpectrometryR:massSpectrometryR

Provides calculations, plotting etc for chemistry & mass spectrometry.

Maintained by Ben Bruyneel. Last updated 4 months ago.

mass-spectrometry proteomics openjdk

8.1 match 1.70 score 1 scripts

bioc

glmSparseNet:Network Centrality Metrics for Elastic-Net Regularized Models

glmSparseNet is an R-package that generalizes sparse regression models when the features (e.g. genes) have a graph structure (e.g. protein-protein interactions), by including network-based regularizers. glmSparseNet uses the glmnet R-package, by including centrality measures of the network as penalty weights in the regularization. The current version implements regularization based on node degree, i.e. the strength and/or number of its associated edges, either by promoting hubs in the solution or orphan genes in the solution. All the glmnet distribution families are supported, namely "gaussian", "poisson", "binomial", "multinomial", "cox", and "mgaussian".

Maintained by André Veríssimo. Last updated 5 months ago.

software statisticalmethod dimensionreduction regression classification survival network graphandnetwork

1.8 match 6 stars 7.42 score 41 scripts 1 dependents

bioc

Pviz:Peptide Annotation and Data Visualization using Gviz

Pviz adapts the Gviz package for protein sequences and data.

Maintained by Renan Sauteraud. Last updated 5 months ago.

visualization proteomics microarray

2.9 match 4.48 score 4 scripts

bioc

IsoBayes:IsoBayes: Single Isoform protein inference Method via Bayesian Analyses

IsoBayes is a Bayesian method to perform inference on single protein isoforms. Our approach infers the presence/absence of protein isoforms, and also estimates their abundance; additionally, it provides a measure of the uncertainty of these estimates, via: i) the posterior probability that a protein isoform is present in the sample; ii) a posterior credible interval of its abundance. IsoBayes takes liquid chromatography mass spectrometry (MS) data as input, and can work with both PSM counts and intensities. When available, transcript isoform abundances (i.e., TPMs) are also incorporated: TPMs are used to formulate an informative prior for the respective protein isoform relative abundance. We further identify isoforms where the relative abundance of proteins and transcripts significantly differ. We use a two-layer latent variable approach to model two sources of uncertainty typical of MS data: i) peptides may be erroneously detected (even when absent); ii) many peptides are compatible with multiple protein isoforms. In the first layer, we sample the presence/absence of each peptide based on its estimated probability of being mistakenly detected, also known as PEP (i.e., posterior error probability). In the second layer, for peptides that were estimated as being present, we allocate their abundance across the protein isoforms they map to. These two steps allow us to recover the presence and abundance of each protein isoform.

Maintained by Simone Tiberi. Last updated 5 months ago.

statisticalmethod bayesian proteomics massspectrometry alternativesplicing sequencing rnaseq geneexpression genetics visualization software cpp

2.2 match 7 stars 5.39 score 10 scripts

samwieczorek

imputeLCMD:A Collection of Methods for Left-Censored Missing Data Imputation

A collection of functions for left-censored missing data imputation. Left-censoring is a special case of missing not at random (MNAR) mechanism that generates non-responses in proteomics experiments. The package also contains functions to artificially generate peptide/protein expression data (log-transformed) as random draws from a multivariate Gaussian distribution as well as a function to generate missing data (both randomly and non-randomly). For comparison reasons, the package also contains several wrapper functions for the imputation of non-responses that are missing at random. * New functionality has been added: a hybrid method that allows the imputation of missing values in a more complex scenario where the missing data are both MAR and MNAR.

Maintained by Samuel Wieczorek. Last updated 3 years ago.

2.5 match 2 stars 4.55 score 93 scripts 5 dependents

cran

mpwR:Standardized Comparison of Workflows in Mass Spectrometry-Based Bottom-Up Proteomics

Useful functions to analyze proteomic workflows including number of identifications, data completeness, missed cleavages, quantitative and retention time precision etc. Various software outputs are supported such as 'ProteomeDiscoverer', 'Spectronaut', 'DIA-NN' and 'MaxQuant'.

Maintained by Oliver Kardell. Last updated 1 years ago.

3.4 match 3.30 score

bioc

SubCellBarCode:SubCellBarCode: Integrated workflow for robust mapping and visualizing whole human spatial proteome

Mass-Spectrometry based spatial proteomics have enabled the proteome-wide mapping of protein subcellular localization (Orre et al. 2019, Molecular Cell). SubCellBarCode R package robustly classifies proteins into corresponding subcellular localization.

Maintained by Taner Arslan. Last updated 5 months ago.

proteomics massspectrometry classification

2.9 match 3.78 score 1 scripts

sophie-lebre

DCODE:List Linear n-Peptide Constraints for Overlapping Protein Regions

Traversal graph algorithm for listing linear n-peptide constraints for overlapping protein regions (Lebre and Gascuel, The combinatorics of overlapping genes, freely available from arXiv at http://arxiv.org/abs/1602.04971).

Maintained by Sophie Lebre. Last updated 9 years ago.

6.7 match 1.60 score 3 scripts

sjmack

HLAtools:Toolkit for HLA Immunogenomics

A toolkit for the analysis and management of data for genes in the so-called "Human Leukocyte Antigen" (HLA) region. Functions extract reference data from the Anthony Nolan HLA Informatics Group/ImmunoGeneTics HLA 'GitHub' repository (ANHIG/IMGTHLA) <https://github.com/ANHIG/IMGTHLA>, validate Genotype List (GL) Strings, convert between UNIFORMAT and GL String Code (GLSC) formats, translate HLA alleles and GLSCs across ImmunoPolymorphism Database (IPD) IMGT/HLA Database release versions, identify differences between pairs of alleles at a locus, generate customized, multi-position sequence alignments, trim and convert allele-names across nomenclature epochs, and extend existing data-analysis methods.

Maintained by Steven Mack. Last updated 12 days ago.

1.7 match 4 stars 6.21 score 7 scripts 1 dependents

bioc

customProDB:Generate customized protein database from NGS data, with a focus on RNA-Seq data, for proteomics search

Database search is the most widely used approach for peptide and protein identification in mass spectrometry-based proteomics studies. Our previous study showed that sample-specific protein databases derived from RNA-Seq data can better approximate the real protein pools in the samples and thus improve protein identification. More importantly, single nucleotide variations, short insertion and deletions and novel junctions identified from RNA-Seq data make protein database more complete and sample-specific. Here, we report an R package customProDB that enables the easy generation of customized databases from RNA-Seq data for proteomics search. This work bridges genomics and proteomics studies and facilitates cross-omics data integration.

Maintained by Xiaojing Wang. Last updated 5 months ago.

immunooncology sequencing massspectrometry proteomics snp rnaseq software transcription alternativesplicing functionalgenomics

2.2 match 4.72 score 15 scripts

jacky11

imp4p:Imputation for Proteomics

Functions to analyse missing value mechanisms and to impute data sets in the context of bottom-up MS-based proteomics.

Maintained by Quentin Giai Gianetto. Last updated 4 years ago.

cpp

5.2 match 1 stars 2.00 score 33 scripts 1 dependents

cran

RHybridFinder:Identification of Hybrid Peptides in Immunopeptidomic Analyses

Tool for the analysis of Mass Spectrometry (MS) data in the context of immunopeptidomic analysis for the identification of hybrid peptides and the predictions of binding affinity of all peptides using 'netMHCpan' <doi:10.1093/nar/gkaa379> while providing a summary of the netMHCpan output. 'RHybridFinder' (RHF) is intended for researchers who are looking to analyze their MS data for the purpose of identification of potential spliced peptides. This package, developed mainly in base R, is based on the workflow published by Faridi et al. in 2018 <doi:10.1126/sciimmunol.aar3947>.

Maintained by Frederic Saab. Last updated 4 years ago.

5.1 match 2.00 score

grosenberger

aLFQ:Estimating Absolute Protein Quantities from Label-Free LC-MS/MS Proteomics Data

Determination of absolute protein quantities is necessary for multiple applications, such as mechanistic modeling of biological systems. Quantitative liquid chromatography tandem mass spectrometry (LC-MS/MS) proteomics can measure relative protein abundance on a system-wide scale. Estimating absolute quantitative information from these relative abundance measurements requires additional information such as heavy-labeled references of known concentration. Multiple methods use different references and strategies; some are easily available whereas others require more effort on the user's end. Hence, we believe the field might benefit from making some of these methods available under an automated framework, which also facilitates validation of the chosen strategy. We have implemented the most commonly used absolute label-free protein abundance estimation methods for LC-MS/MS modes quantifying on either MS1-, MS2-levels or spectral counts together with validation algorithms to enable automated data analysis and error estimation. Specifically, we used Monte Carlo cross-validation and bootstrapping for model selection and imputation of proteome-wide absolute protein quantity estimation. Our open-source software is written in the statistical programming language R and validated and demonstrated on a synthetic sample.

Maintained by George Rosenberger. Last updated 5 years ago.

5.3 match 1.85 score 14 scripts

bioc

QFeatures:Quantitative features for mass spectrometry data

The QFeatures infrastructure enables the management and processing of quantitative features for high-throughput mass spectrometry assays. It provides a familiar Bioconductor user experience to manage quantitative data across different assay levels (such as peptide spectrum matches, peptides and proteins) in a coherent and tractable format.

Maintained by Laurent Gatto. Last updated 12 days ago.

infrastructure massspectrometry proteomics metabolomics bioconductor mass-spectrometry

0.8 match 27 stars 11.87 score 278 scripts 49 dependents

bioc

adductomicsR:Processing of adductomic mass spectral datasets

Processes MS2 data to identify potentially adducted peptides from spectra that have been corrected for mass drift and retention time drift, and quantifies MS1 level mass spectral peaks.

Maintained by Josie Hayes. Last updated 5 months ago.

massspectrometry metabolomics software thirdpartyclient dataimport gui

2.2 match 1 stars 4.00 score 5 scripts

bioc

MsBackendRawFileReader:Mass Spectrometry Backend for Reading Thermo Fisher Scientific raw Files

Implements a MsBackend for the Spectra package using Thermo Fisher Scientific's NewRawFileReader .Net libraries. The package generalizes the functionality introduced by the rawrr package. Methods defined in this package are supposed to extend the Spectra Bioconductor package.

Maintained by Christian Panse. Last updated 4 months ago.

massspectrometry proteomics metabolomics

1.5 match 5 stars 5.94 score 5 scripts

isoverse

isoorbi:Process Orbitrap Isotopocule Data

Read and process isotopocule data from an Orbitrap Isotope Solutions mass spectrometer. Citation: Kantnerova et al. (Nature Protocols, 2024).

Maintained by Caj Neubauer. Last updated 7 months ago.

1.3 match 4 stars 6.31 score 17 scripts

uclouvain-cbio

rWSBIM1207:Companion Package for WSBIM1207 Course

Companion package for the WSBIM1207 course, distributing data and general documentation, and making course administration easier.

Maintained by Laurent Gatto. Last updated 9 months ago.

4.0 match 2.00 score 7 scripts

cran

LSPFP:Lysate and Secretome Peptide Feature Plotter

Creates plots of peptides from shotgun proteomics analysis of secretome and lysate samples. These plots contain associated protein features and scores for potential secretion and truncation.

Maintained by Gereon Poschmann. Last updated 5 years ago.

7.6 match 1.00 score 3 scripts

bioc

artMS:Analytical R tools for Mass Spectrometry

artMS provides a set of tools for the analysis of proteomics label-free datasets. It takes as input the MaxQuant search result output (evidence.txt file) and performs quality control, relative quantification using MSstats, downstream analysis and integration. artMS also provides a set of functions to re-format and make it compatible with other analytical tools, including, SAINTq, SAINTexpress, Phosfate, and PHOTON. Check [http://artms.org](http://artms.org) for details.

Maintained by David Jimenez-Morales. Last updated 5 months ago.

proteomics differentialexpression biomedicalinformatics systemsbiology massspectrometry annotation qualitycontrol genesetenrichment clustering normalization immunooncology multiplecomparison analysis analytical ap-ms bioconductor bioinformatics mass-spectrometry phosphoproteomics post-translational-modification quantitative-analysis

1.1 match 14 stars 6.41 score 13 scripts

yuanbofaith

protag:Search Tagged Peptides & Draw Highlighted Mass Spectra

In a typical protein labelling procedure, proteins are chemically tagged with a functional group, usually at specific sites, then digested into peptides, which are then analyzed using matrix-assisted laser desorption ionization - time of flight mass spectrometry (MALDI-TOF MS) to generate a peptide fingerprint. Relative to the control, peptides that are heavier by the mass of the labelling group are informative for sequence determination. Searching for peptides with such mass shifts, however, can be difficult. This package, designed to tackle this inconvenience, takes as input two or more MALDI-TOF MS mass lists, makes pairwise comparisons between the labeled groups and the control, and restores centroid mass spectra with highlighted peaks of interest for easier visual examination. Particularly, peaks differentiated by the mass of the labelling group are defined as a “pair”, those with equal masses as a “match”, and all the other peaks as a “mismatch”. For more bioanalytical background information, refer to the following publications: Jingjing Deng (2015) <doi:10.1007/978-1-4939-2550-6_19>; Elizabeth Chang (2016) <doi:10.7171/jbt.16-2702-002>.

Maintained by Bo Yuan. Last updated 6 years ago.

5.5 match 1 stars 1.00 score 1 scripts

cran

AntAngioCOOL:Anti-Angiogenic Peptide Prediction

Machine learning based package to predict anti-angiogenic peptides using heterogeneous sequence descriptors. 'AntAngioCOOL' exploits five descriptor types of a peptide of interest for prediction: pseudo amino acid composition, k-mer composition, k-mer composition (reduced alphabet), physico-chemical profile and atomic profile. According to the obtained results, 'AntAngioCOOL' reached satisfactory performance in anti-angiogenic peptide prediction on a benchmark non-redundant independent test dataset.

Maintained by Javad Zahiri. Last updated 9 years ago.

openjdk

4.0 match 1.30 score 1 scripts

sareameri

ftrCOOL:Feature Extraction from Biological Sequences

Extracts features from biological sequences. It contains most features which are presented in related work and also includes features which have never been introduced before. It extracts numerous features from nucleotide and peptide sequences. Each feature converts the input sequences to discrete numbers in order to use them as predictors in machine learning models. There are many features and information which are hidden inside a sequence. Utilizing the package, users can convert biological sequences to discrete models based on chosen properties. References: 'iLearn' 'Z. Chen et al.' (2019) <DOI:10.1093/bib/bbz041>. 'iFeature' 'Z. Chen et al.' (2018) <DOI:10.1093/bioinformatics/bty140>. <https://CRAN.R-project.org/package=rDNAse>. 'PseKRAAC' 'Y. Zuo et al.' 'PseKRAAC: a flexible web server for generating pseudo K-tuple reduced amino acids composition' (2017) <DOI:10.1093/bioinformatics/btw564>. 'iDNA6mA-PseKNC' 'P. Feng et al.' 'iDNA6mA-PseKNC: Identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC' (2019) <DOI:10.1016/j.ygeno.2018.01.005>. 'I. Dubchak et al.' 'Prediction of protein folding class using global description of amino acid sequence' (1995) <DOI:10.1073/pnas.92.19.8700>. 'W. Chen et al.' 'Identification and analysis of the N6-methyladenosine in the Saccharomyces cerevisiae transcriptome' (2015) <DOI:10.1038/srep13859>.

Maintained by Sare Amerifar. Last updated 3 years ago.

2.3 match 2 stars 2.26 score 1 scripts 3 dependents

bioc

MSstatsQC:Longitudinal system suitability monitoring and quality control for proteomic experiments

MSstatsQC is an R package which provides longitudinal system suitability monitoring and quality control tools for proteomic experiments.

Maintained by Eralp Dogu. Last updated 5 months ago.

software qualitycontrol proteomics massspectrometry

1.1 match 4.48 score 7 scripts 1 dependents

bioc

TargetDecoy:Diagnostic Plots to Evaluate the Target Decoy Approach

A first step in the data analysis of Mass Spectrometry (MS) based proteomics data is to identify peptides and proteins. In this respect, the huge number of experimental mass spectra typically has to be assigned to theoretical peptides derived from a sequence database. Search engines are used for this purpose. These tools compare each of the observed spectra to all candidate theoretical spectra derived from the sequence database and calculate a score for each comparison. The observed spectrum is then assigned to the theoretical peptide with the best score, which is also referred to as the peptide to spectrum match (PSM). It is of course crucial for the downstream analysis to evaluate the quality of these matches. Therefore False Discovery Rate (FDR) control is used to return a reliable list of PSMs. The FDR, however, requires a good characterisation of the score distribution of PSMs that are matched to the wrong peptide (bad target hits). In proteomics, the target decoy approach (TDA) is typically used for this purpose. The TDA method matches the spectra to a database of real (targets) and nonsense peptides (decoys). A popular approach to generate these decoys is to reverse the target database. Hence, all the PSMs that match to a decoy are known to be bad hits and the distribution of their scores is used to estimate the distribution of the bad scoring target PSMs. A crucial assumption of the TDA is that the decoy PSM hits have similar properties to bad target hits, so that the decoy PSM scores are a good simulation of the bad target PSM scores. Users, however, typically do not evaluate these assumptions. To this end we developed TargetDecoy to generate diagnostic plots to evaluate the quality of the target decoy method.

Maintained by Elke Debrie. Last updated 5 months ago.

massspectrometry proteomics qualitycontrol software visualization bioconductor mass-spectrometry

1.0 match 1 stars 4.60 score 9 scripts

zzheng68

MixTwice:Large-Scale Hypothesis Testing by Variance Mixing

Implements large-scale hypothesis testing by variance mixing. It takes two statistics per testing unit -- an estimated effect and its associated squared standard error -- and fits a nonparametric, shape-constrained mixture separately on two latent parameters. It reports local false discovery rates (lfdr) and local false sign rates (lfsr). Manuscript describing the MixTwice algorithm: Zheng et al. (2021) <doi:10.1093/bioinformatics/btab162>.

Maintained by Zihao Zheng. Last updated 3 years ago.

3.8 match 1.00 score 1 scripts

dapritchard

PepSAVIms:PepSAVI-MS Data Analysis

An implementation of the data processing and data analysis portion of the PepSAVI-MS pipeline developed by principal investigator Christine Kirkpatrick and the Hicks Laboratory at the University of North Carolina, as presented in the paper "The 'PepSAVI-MS' Pipeline for Natural Product Bioactive Peptide Discovery" (DOI:10.1021/acs.analchem.6b03625). The statistical analysis package presented herein provides a collection of software tools used to facilitate the prioritization of putative bioactive peptides from a complex biological matrix. Tools are provided to deconvolute mass spectrometry features into a single representation for each peptide charge state, filter compounds to include only those possibly contributing to the observed bioactivity, and prioritize these remaining compounds for those most likely contributing to each bioactivity data set.

Maintained by Pritchard David. Last updated 7 years ago.

0.9 match 1 stars 4.00 score 10 scripts

cran

R4HCR:R for Health Care Research

A collection of datasets that accompany the forthcoming book "R for Health Care Research".

Maintained by Jason L. Oke. Last updated 6 months ago.

3.5 match 1.00 score

uclouvain-cbio

scpdata:Single-Cell Proteomics Data Package

The package disseminates mass spectrometry (MS)-based single-cell proteomics (SCP) datasets. The data were collected from published work and formatted using the `scp` data structure. The data sets contain quantitative information at spectrum, peptide and/or protein level for single cells or minute sample amounts.

Maintained by Christophe Vanderaa. Last updated 9 days ago.

experimentdata expressiondata experimenthub reproducibleresearch massspectrometrydata proteome singlecelldata packagetypedata

0.5 match 6 stars 5.58 score 16 scripts

bioc

MsDataHub:Mass Spectrometry Data on ExperimentHub

The MsDataHub package uses the ExperimentHub infrastructure to distribute raw mass spectrometry data files, peptide spectrum matches or quantitative data from proteomics and metabolomics experiments.

Maintained by Laurent Gatto. Last updated 23 days ago.

experimenthubsoftware massspectrometry proteomics metabolomics bioconductor data mass-spectrometry

0.5 match 1 stars 4.95 score 2 scripts

bioc

immunotation:Tools for working with diverse immune genes

MHC (major histocompatibility complex) molecules are cell surface complexes that present antigens to T cells. The repertoire of antigens presented in a given genetic background largely depends on the sequence of the encoded MHC molecules, and thus, in humans, on the highly variable HLA (human leukocyte antigen) genes of the hyperpolymorphic HLA locus. More than 28,000 different HLA alleles have been reported, with significant differences in allele frequencies between human populations worldwide. Reproducible and consistent annotation of HLA alleles in large-scale bioinformatics workflows remains challenging, because the available reference databases and software tools often use different HLA naming schemes. The package immunotation provides tools for consistent annotation of HLA genes in typical immunoinformatics workflows, such as the prediction of MHC-presented peptides in different human donors. Converter functions that provide mappings between different HLA naming schemes are based on the MHC restriction ontology (MRO). The package also provides automated access to HLA allele frequencies in worldwide human reference populations stored in the Allele Frequency Net Database.

Maintained by Katharina Imkeller. Last updated 5 months ago.

software immunooncology biomedicalinformatics genetics annotation

0.5 match 8 stars 4.90 score 3 scripts

bioc

immunogenViewer:Visualization and evaluation of protein immunogens

Plots protein properties and visualizes the position of peptide immunogens within the protein sequence. Allows evaluation of immunogens based on structural and functional annotations to infer suitability for antibody-based methods aiming to detect native proteins.

Maintained by Katharina Waury. Last updated 1 months ago.

featureextraction proteomics software visualization

0.5 match 4.65 score 10 scripts

michbur

AmyloGram:Prediction of Amyloid Proteins

Predicts amyloid proteins using random forests trained on n-gram encoded peptides. The implemented algorithm can be accessed from both the command line and a Shiny-based GUI.

Maintained by Michal Burdukiewicz. Last updated 5 years ago.

0.5 match 9 stars 4.60 score 11 scripts

bioc

PECA:Probe-level Expression Change Averaging

Calculates Probe-level Expression Change Averages (PECA) to identify differential expression in Affymetrix gene expression microarray studies or in proteomic studies using peptide-level measurements.

Maintained by Tomi Suomi. Last updated 5 months ago.

software proteomics microarray differentialexpression geneexpression exonarray differentialsplicing

0.5 match 4.34 score 11 scripts

billchenxi

BMRBr:'BMRB' File Downloader

Nuclear magnetic resonance (NMR) is a highly versatile analytical technique for studying molecular configuration, conformation, and dynamics, especially those of biomacromolecules such as proteins. The Biological Magnetic Resonance Data Bank ('BMRB') is a repository for data from NMR spectroscopy on proteins, peptides, nucleic acids, and other biomolecules. Currently, 'BMRB' offers an R package, 'RBMRB', to fetch data; however, it does not make it easy to download individual data files and store them in a local directory. When using 'RBMRB', the data are stored as an R object, which hinders NMR researchers from accessing the rich information in the raw data, for example the metadata. 'BMRBr' File Downloader ('BMRBr') offers a more fundamental, low-level downloader that retrieves the originally deposited .str format file. This type of file contains information such as the entry title, authors, citation, and protein sequences. Many factors affect NMR experiment outputs, such as temperature and resonance sensitivity, and approximately 40% of the entries in 'BMRB' have chemical shift accuracy problems [1,2]. Unfortunately, current reference correction methods depend heavily on the availability of assigned protein chemical shifts or protein structure. My current research project aims to solve this problem, and the solution will be included in a future release of the package. The current version of the package is sufficient and robust enough for downloading individual 'BMRB' data files from the 'BMRB' database <http://www.bmrb.wisc.edu>. The functionalities of this package include, but are not limited to: * Simplifying NMR research by combining data downloading and results analysis. * Allowing NMR data to reach a broader audience that could use not just chemical shifts but also metadata. * Offering reference-corrected data for entries without assignment or structure information (future release). References: [1] E.L. Ulrich, H. Akutsu, J.F. Doreleijers, Y. Harano, Y.E. Ioannidis, J. Lin, et al., BioMagResBank, Nucl. Acids Res. 36 (2008) D402–8. <doi:10.1093/nar/gkm957>. [2] L. Wang, H.R. Eghbalnia, A. Bahrami, J.L. Markley, Linear analysis of carbon-13 chemical shift differences and its application to the detection and correction of errors in referencing and spin system identifications, J. Biomol. NMR. 32 (2005) 13–22. <doi:10.1007/s10858-005-1717-0>.

Maintained by Xi Chen. Last updated 6 years ago.

0.5 match 1 stars 3.74 score 11 scripts

shashankkumbhare

ypssc:Yeast-Proteome Secondary-Structure Calculator

An extension for 'NetSurfP-2.0' (Klausen et al. (2019) <doi:10.1002/prot.25674>) which is specifically designed to analyze the results of bottom-up proteomics primarily analyzed with 'MaxQuant' (Cox, J., Mann, M. (2008) <doi:10.1038/nbt.1511>). This tool is designed to process a large number of yeast peptides that are produced as a result of whole yeast cell-proteome digestion and to provide a coherent picture of the secondary structure of proteins.

Maintained by Shashank Kumbhare. Last updated 3 years ago.

0.5 match 2.70 score 1 scripts

aaronmilloro

metaprotr:Metaproteomics Post-Processing Analysis

Set of tools for descriptive analysis of metaproteomics data generated from high-throughput mass spectrometry instruments. These tools allow users to cluster peptide and protein abundances, expressed as spectral counts, and to manipulate them in groups of metaproteins. This information can be represented using multiple visualization functions to portray the global metaproteome landscape and to differentiate samples or conditions in terms of abundance of metaproteins, taxonomic levels and/or functional annotation. The provided tools make it possible to implement flexible analytical pipelines that can be easily applied to studies interested in metaproteomics analysis.

Maintained by Aaron Millan-Oropeza. Last updated 4 years ago.

0.5 match 2 stars 2.00 score