R-universe search: biostrings

Showing 20 of 20 results

bioc

Biostrings:Efficient manipulation of biological strings

Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.

Maintained by Hervé Pagès. Last updated 24 days ago.

sequencematching alignment sequencing genetics dataimport datarepresentation infrastructure bioconductor-package core-package

56.2 match 61 stars 17.83 score 8.6k scripts 1.2k dependents

bioc

Modstrings:Working with modified nucleotide sequences

Representing nucleotide modifications in a nucleotide sequence is usually done via special characters from a number of sources. This represents a challenge to work with in R and the Biostrings package. The Modstrings package implements this functionality for RNA and DNA sequences containing modified nucleotides by translating the characters internally in order to work with the infrastructure of the Biostrings package. For this, the ModRNAString and ModDNAString classes and derivatives, along with functions to construct and modify these objects despite the encoding issues, are implemented. In addition, the conversion from sequences to list-like location information (and the reverse operation) is implemented as well.

Maintained by Felix G.M. Ernst. Last updated 5 months ago.

dataimport datarepresentation infrastructure sequencing software bioconductor biostrings dna dna-modifications modified-nucleotides nucleotides rna rna-modification-alphabet rna-modifications sequences

12.4 match 1 star 6.64 score 5 scripts 8 dependents

ropensci

biomartr:Genomic Data Retrieval

Perform large scale genomic data retrieval and functional annotation retrieval. This package aims to provide users with a standardized way to automate genome, proteome, 'RNA', coding sequence ('CDS'), 'GFF', and metagenome retrieval from 'NCBI RefSeq', 'NCBI Genbank', 'ENSEMBL', and 'UniProt' databases. Furthermore, an interface to the 'BioMart' database (Smedley et al. (2009) <doi:10.1186/1471-2164-10-22>) allows users to retrieve functional annotation for genomic loci. In addition, users can download entire databases such as 'NCBI RefSeq' (Pruitt et al. (2007) <doi:10.1093/nar/gkl842>), 'NCBI nr', 'NCBI nt', 'NCBI Genbank' (Benson et al. (2013) <doi:10.1093/nar/gks1195>), etc. with only one command.

Maintained by Hajk-Georg Drost. Last updated 1 month ago.

biomart genomic-data-retrieval annotation-retrieval database-retrieval ncbi ensembl biological-data-retrieval ensembl-servers genome genome-annotation genome-retrieval genomics meta-analysis metagenomics ncbi-genbank peer-reviewed proteome sequenced-genomes

7.0 match 218 stars 11.35 score 129 scripts 3 dependents

bioc

BSgenome:Software infrastructure for efficient representation of full genomes and their SNPs

Infrastructure shared by all the Biostrings-based genome data packages.

Maintained by Hervé Pagès. Last updated 2 months ago.

genetics infrastructure datarepresentation sequencematching annotation snp bioconductor-package core-package

3.4 match 9 stars 14.12 score 1.2k scripts 269 dependents

svilsen

STRMPS:Analysis of Short Tandem Repeat (STR) Massively Parallel Sequencing (MPS) Data

Loading, identifying, aggregating, manipulating, and analysing short tandem repeat regions of massively parallel sequencing data in forensic genetics. The analyses and framework implemented in this package rely on the papers of Vilsen et al. (2017) <doi:10.1016/j.fsigen.2017.01.017> and Vilsen et al. (2018) <doi:10.1016/j.fsigen.2018.04.003>. Note that the parallelisation in the package relies on mclapply() and, thus, speed-ups will only be seen on UNIX-based systems.

Maintained by Søren B. Vilsen. Last updated 3 days ago.

biostrings pwalign shortread iranges

10.0 match 4.30 score

bioc

Structstrings:Implementation of the dot bracket annotations with Biostrings

The Structstrings package implements the widely used dot bracket annotation for storing base pairing information in structured RNA. Structstrings uses the infrastructure provided by the Biostrings package and derives the DotBracketString and related classes from the BString class. From these, base pair tables can be produced for in-depth analysis. In addition, the loop indices of the base pairs can be retrieved as well. For better efficiency, information conversion is implemented in C, inspired to a large extent by the ViennaRNA package.

Maintained by Felix G.M. Ernst. Last updated 5 months ago.

dataimport datarepresentation infrastructure sequencing software alignment sequencematching bioconductor rna rna-structural-analysis rna-structure sequences structures

5.3 match 4 stars 6.46 score 3 scripts 4 dependents

jdieramon

refseqR:Common Computational Operations Working with RefSeq Entries (GenBank)

Fetches NCBI data (RefSeq <https://www.ncbi.nlm.nih.gov/refseq/> database) and provides an environment to extract information at the level of gene, mRNA or protein accessions.

Maintained by Jose V. Die. Last updated 3 months ago.

3.5 match 4 stars 5.34 score 5 scripts

bioc

alabaster.string:Save and Load Biostrings to/from File

Save Biostrings objects to file artifacts, and load them back into memory. This is a more portable alternative to serialization of such objects into RDS files. Each artifact is associated with metadata for further interpretation; downstream applications can enrich this metadata with context-specific properties.

Maintained by Aaron Lun. Last updated 5 months ago.

dataimport datarepresentation

3.6 match 4.95 score 5 scripts 2 dependents

bioc

musicatk:Mutational Signature Comprehensive Analysis Toolkit

Mutational signatures are carcinogenic exposures or aberrant cellular processes that can cause alterations to the genome. We created musicatk (MUtational SIgnature Comprehensive Analysis ToolKit) to address shortcomings in versatility and ease of use in other pre-existing computational tools. Although many different types of mutational data have been generated, current software packages do not have a flexible framework to allow users to mix and match different types of mutations in the mutational signature inference process. Musicatk enables users to count and combine multiple mutation types, including SBS, DBS, and indels. Musicatk calculates replication strand, transcription strand and combinations of these features along with discovery from unique and proprietary genomic features associated with any mutation type. Musicatk also implements several methods for discovery of new signatures as well as methods to infer exposure given an existing set of signatures. Musicatk provides functions for visualization and downstream exploratory analysis including the ability to compare signatures between cohorts and find matching signatures in COSMIC V2 or COSMIC V3.

Maintained by Joshua D. Campbell. Last updated 5 months ago.

software biologicalquestion somaticmutation variantannotation

1.8 match 13 stars 7.02 score 20 scripts

bioc

altcdfenvs:alternative CDF environments (aka probeset mappings)

Convenience data structures and functions to handle cdfenvs.

Maintained by Laurent Gautier. Last updated 5 months ago.

microarray onechannel qualitycontrol preprocessing annotation proprietaryplatforms transcription

2.3 match 4.95 score 5 scripts 1 dependent

bioc

regutools:regutools: an R package for data extraction from RegulonDB

RegulonDB has collected, harmonized and centralized data from hundreds of experiments for nearly two decades and is considered a point of reference for transcriptional regulation in Escherichia coli K12. Here, we present the regutools R package to facilitate programmatic access to RegulonDB data in computational biology. regutools provides researchers with the possibility of writing reproducible workflows with automated queries to RegulonDB. The regutools package serves as a bridge between RegulonDB data and the Bioconductor ecosystem by reusing the data structures and statistical methods powered by other Bioconductor packages. We demonstrate the integration of regutools with Bioconductor by analyzing transcription factor DNA binding sites and transcriptional regulatory networks from RegulonDB. We anticipate that regutools will serve as a useful building block in our progress to further our understanding of gene regulatory networks.

Maintained by Joselyn Chavez. Last updated 3 months ago.

generegulation geneexpression systemsbiology network networkinference visualization transcription bioconductor cdsb regulondb

1.7 match 4 stars 5.20 score 6 scripts

bioc

factR:Functional Annotation of Custom Transcriptomes

factR contains tools to process and interact with custom-assembled transcriptomes (GTF). At its core, factR constructs CDS information on custom transcripts and subsequently predicts their functional output. In addition, factR has tools capable of plotting transcripts, correcting chromosome and gene information and shortlisting new transcripts.

Maintained by Fursham Hamid. Last updated 5 months ago.

alternativesplicing functionalprediction geneprediction custom-transcriptomes functional-annotation gtf rna-seq-analysis

1.7 match 1 star 4.00 score 5 scripts

bioc

MSA2dist:MSA2dist calculates pairwise distances between all sequences of a DNAStringSet or an AAStringSet using a custom score matrix and conducts codon based analysis

MSA2dist calculates pairwise distances between all sequences of a DNAStringSet or an AAStringSet using a custom score matrix and conducts codon-based analysis. The scoring matrices used in these pairwise distance calculations can be adapted to any scoring for DNA or AA characters. For example, by using literal distances, MSA2dist calculates pairwise IUPAC distances.

Maintained by Kristian K Ullrich. Last updated 4 months ago.

alignment sequencing genetics go cpp

1.2 match 5.02 score 7 scripts 1 dependent

bioc

ChIPpeakAnno:Batch annotation of the peaks identified from either ChIP-seq, ChIP-chip experiments, or any experiments that result in a large number of genomic interval data

The package encompasses a range of functions for identifying the closest gene, exon, miRNA, or custom features—such as highly conserved elements and user-supplied transcription factor binding sites. Additionally, users can retrieve sequences around the peaks and obtain enriched Gene Ontology (GO) or Pathway terms. In version 2.0.5 and beyond, new functionalities have been introduced. These include features for identifying peaks associated with bi-directional promoters along with summary statistics (peaksNearBDP), summarizing motif occurrences in peaks (summarizePatternInPeaks), and associating additional identifiers with annotated peaks or enrichedGO (addGeneIDs). The package integrates with various other packages such as biomaRt, IRanges, Biostrings, BSgenome, GO.db, multtest, and stat to enhance its analytical capabilities.

Maintained by Jianhong Ou. Last updated 2 months ago.

annotation chipseq chipchip

0.5 match 8.75 score 584 scripts 6 dependents

regisoc

kibior:A Simple Data Management and Sharing Tool

An interface to store, retrieve, search, join and share datasets, based on the Elasticsearch (ES) API. As a decentralized, FAIR and collaborative search engine and database effort, it proposes a simple push/pull/search mechanism based only on ES, a tool which can be deployed on nearly any hardware. It is a high-level R-ES binding to ease data usage using the 'elastic' package (S. Chamberlain (2020)) <https://docs.ropensci.org/elastic/>, extends joins from the 'dplyr' package (H. Wickham et al. (2020)) <https://dplyr.tidyverse.org/> and integrates specific biological format importation with Bioconductor packages such as 'rtracklayer' (M. Lawrence et al. (2009) <doi:10.1093/bioinformatics/btp328>) <http://bioconductor.org/packages/rtracklayer>, 'Biostrings' (H. Pagès et al. (2020) <doi:10.18129/B9.bioc.Biostrings>) <http://bioconductor.org/packages/Biostrings>, and 'Rsamtools' (M. Morgan et al. (2020) <doi:10.18129/B9.bioc.Rsamtools>) <http://bioconductor.org/packages/Rsamtools>, but also a long list of more common ones with 'rio' (C-h. Chan et al. (2018)) <https://cran.r-project.org/package=rio>.

Maintained by Régis Ongaro-Carcy. Last updated 4 years ago.

dataimport datarepresentation thirdpartyclient data-science database datasets elasticsearch elasticsearch-client push-pull search search-engine

0.9 match 3 stars 4.48 score 8 scripts

bioc

PWMEnrich:PWM enrichment analysis

A toolkit of high-level functions for DNA motif scanning and enrichment analysis built upon Biostrings. The main functionality is PWM enrichment analysis of already known PWMs (e.g. from databases such as MotifDb), but the package also implements high-level functions for PWM scanning and visualisation. The package does not perform "de novo" motif discovery, but is instead focused on using motifs that are either experimentally derived or computationally constructed by other tools.

Maintained by Diego Diez. Last updated 5 months ago.

motifannotation sequencematching software

0.5 match 5.08 score 60 scripts

bioc

ggseqalign:Minimal Visualization of Sequence Alignments

Simple visualizations of alignments of DNA or AA sequences as well as arbitrary strings. Compatible with Biostrings and ggplot2. The plots are fully customizable using ggplot2 modifiers such as theme().

Maintained by Simeon Lim Rossmann. Last updated 13 days ago.

alignment multiplesequencealignment software visualization bioinformatics ggplot2-enhancements minimalistic

0.5 match 4.48 score

bioc

XNAString:Efficient Manipulation of Modified Oligonucleotide Sequences

The XNAString package allows for description of base sequences and associated chemical modifications in a single object. XNAString is able to capture single-stranded as well as double-stranded molecules. Chemical modifications are represented as independent strings associated with different features of the molecules (base sequence, sugar sequence, backbone sequence, modifications) and can be read from or written to HELM notation. It also enables secondary structure prediction using RNAfold from ViennaRNA. XNAString is designed to be an efficient representation of nucleic-acid-based therapeutics; it therefore stores information about target sequences and provides an interface for matching and alignment functions from the Biostrings and pwalign packages.

Maintained by Marianna Plucinska. Last updated 5 months ago.

sequencematching alignment sequencing genetics cpp

0.5 match 4.18 score 4 scripts

bioc

FastqCleaner:A Shiny Application for Quality Control, Filtering and Trimming of FASTQ Files

An interactive web application for quality control, filtering and trimming of FASTQ files. This user-friendly tool combines a pipeline for data processing based on Biostrings and ShortRead infrastructure, with a cutting-edge visual environment. Single-Read and Paired-End files can be locally processed. Diagnostic interactive plots (CG content, per-base sequence quality, etc.) are provided for both the input and output files.

Maintained by Leandro Roser. Last updated 5 months ago.

qualitycontrol sequencing software sangerseq sequencematching cpp

0.5 match 4.00 score 4 scripts

mbeer3

gkmSVM:Gapped-Kmer Support Vector Machine

Imports the 'gkmSVM' v2.0 functionalities into R <https://www.beerlab.org/gkmsvm/>. It also uses the 'kernlab' library (separate R package by different authors) for various SVM algorithms. Users should note that the suggested packages 'rtracklayer', 'GenomicRanges', 'BSgenome', 'BiocGenerics', 'Biostrings', 'GenomeInfoDb', 'IRanges', and 'S4Vectors' are all Bioconductor packages <https://bioconductor.org>.

Maintained by Mike Beer. Last updated 2 years ago.

cpp

0.5 match 2.48 score 30 scripts