R-universe search: fingerprint

bioc

Rcpi:Molecular Informatics Toolkit for Compound-Protein Interaction in Drug Discovery

A molecular informatics toolkit with an integration of bioinformatics and chemoinformatics tools for drug discovery.

Maintained by Nan Xiao. Last updated 5 months ago.

software dataimport datarepresentation featureextraction cheminformatics biomedicalinformatics proteomics go systemsbiology bioconductor bioinformatics drug-discovery feature-extraction fingerprint molecular-descriptors protein-sequences

54.6 match 37 stars 7.81 score 29 scripts

rajarshi

fingerprint:Functions to Operate on Binary Fingerprint Data

Functions to manipulate binary fingerprints of arbitrary length. A fingerprint is represented by an object of S4 class 'fingerprint', which is internally represented as a vector of integers, such that each element represents the position in the fingerprint that is set to 1. The bitwise logical functions in R are overridden so that they can be used directly with 'fingerprint' objects. A number of distance metrics are also available (many contributed by Michael Fadock). Fingerprints can be converted to Euclidean vectors (i.e., points on the unit hypersphere) and can also be folded using OR. Arbitrary fingerprint formats can be handled via line handlers. Currently handlers are provided for CDK, MOE and BCI fingerprint data.

Maintained by Rajarshi Guha. Last updated 7 years ago.

93.3 match 4.22 score 82 scripts 12 dependents
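
The representation described for 'fingerprint' (a vector of the positions that are set to 1) can be sketched in a few lines of Python. This is an illustration only, not the package's API: the function names, the 1-based positions, the half-length folding rule and the choice to return 0 for two empty fingerprints are all assumptions made for the example.

```python
import math

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as lists of set positions."""
    a, b = set(a), set(b)
    union = len(a | b)
    # Assumption: two empty fingerprints are given similarity 0.0.
    return len(a & b) / union if union else 0.0

def fold(bits, nbits):
    """Fold a fingerprint of length nbits to half its length by OR-ing both halves."""
    half = nbits // 2
    # Positions are assumed to be 1-based; position p > half maps to p - half.
    return sorted({((p - 1) % half) + 1 for p in bits})

def to_unit_vector(bits, nbits):
    """Expand to a dense vector scaled to unit Euclidean length."""
    on = set(bits)
    norm = math.sqrt(len(on))
    return [1.0 / norm if i in on else 0.0 for i in range(1, nbits + 1)]

fp1 = [1, 3, 5, 8]   # hypothetical 8-bit fingerprints
fp2 = [1, 3, 6]
similarity = tanimoto(fp1, fp2)
folded = fold(fp1, 8)
vector = to_unit_vector(fp1, 8)
```

Storing only the set positions keeps sparse fingerprints small, and the bitwise operations reduce to set intersection and union.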

bioc

ChemmineR:Cheminformatics Toolkit for R

ChemmineR is a cheminformatics package for analyzing drug-like small molecule data in R. Its latest version contains functions for efficient processing of large numbers of molecules, physicochemical/structural property predictions, structural similarity searching, classification and clustering of compound libraries with a wide spectrum of algorithms. In addition, it offers visualization functions for compound clustering results and chemical structures.

Maintained by Thomas Girke. Last updated 5 months ago.

cheminformatics biomedicalinformatics pharmacogenetics pharmacogenomics microtitreplateassay cellbasedassays visualization infrastructure dataimport clustering proteomics metabolomics cpp

21.5 match 14 stars 9.42 score 253 scripts 12 dependents

bioc

flowFP:Fingerprinting for Flow Cytometry

Fingerprint generation of flow cytometry data, used to facilitate the application of machine learning and datamining tools for flow cytometry.

Maintained by Herb Holyst. Last updated 5 months ago.

flowcytometry cellbasedassays clustering visualization

28.6 match 4.72 score 11 scripts 2 dependents

jeroen

openssl:Toolkit for Encryption, Signatures and Certificates Based on OpenSSL

Bindings to OpenSSL libssl and libcrypto, plus custom SSH key parsers. Supports RSA, DSA and EC curves P-256, P-384, P-521, and curve25519. Cryptographic signatures can either be created and verified manually or via x509 certificates. AES can be used in cbc, ctr or gcm mode for symmetric encryption; RSA for asymmetric (public key) encryption or EC for Diffie Hellman. High-level envelope functions combine RSA and AES for encrypting arbitrary sized data. Other utilities include key generators, hash functions (md5, sha1, sha256, etc), base64 encoder, a secure random number generator, and 'bignum' math methods for manually performing crypto calculations on large multibyte integers.

Maintained by Jeroen Ooms. Last updated 1 month ago.

openssl

5.6 match 65 stars 18.00 score 632 scripts 5.0k dependents

pik-piam

madrat:May All Data be Reproducible and Transparent (MADRaT) *

Provides a framework which should improve reproducibility and transparency in data processing. It provides functionality such as automatic metadata creation and management, rudimentary quality management, data caching, workflow management and data aggregation. * The title is a wish, not a promise. By no means do we expect this package to deliver everything that is needed to achieve full reproducibility and transparency, but we believe that it supports efforts in this direction.

Maintained by Jan Philipp Dietrich. Last updated 2 days ago.

6.0 match 15 stars 11.01 score 83 scripts 38 dependents

bioc

BloodGen3Module:This R package for performing module repertoire analyses and generating fingerprint representations

The BloodGen3Module package provides functions for R users performing module repertoire analyses and generating fingerprint representations. Functions can perform group comparison or individual sample analysis and visualization by fingerprint grid plot or fingerprint heatmap. Module repertoire analyses typically involve determining the percentage of the constitutive genes for each module that are significantly increased or decreased. As we describe in detail in https://www.biorxiv.org/content/10.1101/525709v2 and https://pubmed.ncbi.nlm.nih.gov/33624743/, the results of module repertoire analyses can be represented in a fingerprint format, where red and blue spots indicate increases or decreases in module activity. These spots are subsequently represented either on a grid, with each position being assigned to a given module, or in a heatmap where the samples are arranged in columns and the modules in rows.

Maintained by Darawan Rinchai. Last updated 5 months ago.

software visualization geneexpression

12.8 match 4.30 score 5 scripts
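
The per-module percentage that the BloodGen3Module description mentions can be illustrated with a short Python sketch. It is not the package's code: the function name, the module and gene identifiers, and the encoding of significance calls as +1, -1 and 0 are hypothetical choices for the example.

```python
def module_response(modules, gene_calls):
    """Return, per module, the percentage of its genes called up and down.

    modules:    dict mapping module name -> list of constitutive gene ids
    gene_calls: dict mapping gene id -> +1 (significantly increased),
                -1 (significantly decreased) or 0 (no change); this encoding
                is an assumption made for the example.
    """
    result = {}
    for name, genes in modules.items():
        up = sum(1 for g in genes if gene_calls.get(g, 0) > 0)
        down = sum(1 for g in genes if gene_calls.get(g, 0) < 0)
        result[name] = (100.0 * up / len(genes), 100.0 * down / len(genes))
    return result

# Hypothetical module with four constitutive genes.
modules = {"M1": ["geneA", "geneB", "geneC", "geneD"]}
calls = {"geneA": 1, "geneB": 1, "geneC": -1}
percentages = module_response(modules, calls)
```

In a fingerprint grid, a pair such as (50.0, 25.0) would typically be drawn as a spot whose colour reflects whether increases or decreases dominate.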

liyanstat

dacc:Detection and Attribution Analysis of Climate Change

Conduct detection and attribution of climate change using methods including optimal fingerprinting via generalized total least squares or estimating equation approach from Ma et al. (2023) <doi:10.1175/JCLI-D-22-0681.1>. Provide shrinkage estimators for covariance matrix from Ledoit and Wolf (2004) <doi:10.1016/S0047-259X(03)00096-4>, and Ledoit and Wolf (2017) <doi:10.2139/ssrn.2383361>.

Maintained by Yan Li. Last updated 2 months ago.

10.4 match 14 stars 4.45 score

zachcp

rcdk:Interface to the 'CDK' Libraries

Allows the user to access functionality in the 'CDK', a Java framework for chemoinformatics. This allows the user to load molecules, evaluate fingerprints, calculate molecular descriptors and so on. In addition, the 'CDK' API allows the user to view structures in 2D.

Maintained by Zachary Charlop-Powers. Last updated 2 years ago.

openjdk

6.8 match 1 star 6.78 score 287 scripts 11 dependents

friendly

HistData:Data Sets from the History of Statistics and Data Visualization

The 'HistData' package provides a collection of small data sets that are interesting and important in the history of statistics and data visualization. The goal of the package is to make these available, both for instructional use and for historical research. Some of these present interesting challenges for graphics or analysis in R.

Maintained by Michael Friendly. Last updated 10 months ago.

graphics historical-data

3.8 match 63 stars 9.19 score 732 scripts 2 dependents

nanxstats

protr:Generating Various Numerical Representation Schemes for Protein Sequences

Comprehensive toolkit for generating various numerical features of protein sequences described in Xiao et al. (2015) <DOI:10.1093/bioinformatics/btv042>. For full functionality, the software 'ncbi-blast+' is needed, see <https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html> for more information.

Maintained by Nan Xiao. Last updated 6 months ago.

bioinformatics feature-engineering feature-extraction machine-learning peptides protein-sequences sequence-analysis

3.3 match 52 stars 10.02 score 173 scripts 3 dependents

bioc

omicsPrint:Cross omic genetic fingerprinting

omicsPrint provides functionality for cross omic genetic fingerprinting, for example, to verify sample relationships between multiple omics data types, i.e. genomic, transcriptomic and epigenetic (DNA methylation).

Maintained by Davy Cats. Last updated 5 months ago.

qualitycontrol genetics epigenetics transcriptomics dnamethylation transcription geneticvariability immunooncology

3.6 match 5.20 score 32 scripts

bioc

ChemmineOB:R interface to a subset of OpenBabel functionalities

ChemmineOB provides an R interface to a subset of cheminformatics functionalities implemented by the OpenBabel C++ project. OpenBabel is an open source cheminformatics toolbox that includes utilities for structure format interconversions, descriptor calculations, compound similarity searching and more. ChemmineOB aims to make a subset of these utilities available from within R. For non-developers, ChemmineOB is primarily intended to be used from ChemmineR as an add-on package rather than used directly.

Maintained by Thomas Girke. Last updated 5 months ago.

cheminformatics biomedicalinformatics pharmacogenetics pharmacogenomics microtitreplateassay cellbasedassays visualization infrastructure dataimport clustering proteomics metabolomics openbabel cpp

2.3 match 10 stars 7.87 score 77 scripts 1 dependent

chrismuir

refinr:Cluster and Merge Similar Values Within a Character Vector

These functions take a character vector as input, identify and cluster similar values, and then merge clusters together so their values become identical. The functions are an implementation of the key collision and ngram fingerprint algorithms from the open source tool Open Refine <https://openrefine.org/>. More info on key collision and ngram fingerprint can be found here <https://openrefine.org/docs/technical-reference/clustering-in-depth>.

Maintained by Chris Muir. Last updated 1 year ago.

approximate-string-matching clustering data-cleaning data-clustering fuzzy-matching ngram openrefine cpp

2.6 match 104 stars 6.80 score 121 scripts
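
The key collision idea that refinr implements can be sketched as follows. This is a simplified Python illustration in the spirit of the OpenRefine fingerprint method, not refinr's implementation; the function names and the exact order of the normalisation steps are assumptions.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint_key(value):
    """Build a normalised key so that similar strings collide on the same key."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accents
    text = re.sub(r"[^\w\s]", "", text)                    # drop punctuation
    tokens = sorted(set(text.split()))                     # dedupe and sort tokens
    return " ".join(tokens)

def cluster(values):
    """Group values that share a key; return only groups with more than one member."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint_key(value)].append(value)
    return [members for members in groups.values() if len(members) > 1]

names = ["Acme, Inc.", "inc ACME", "Widget Co"]
clusters = cluster(names)
```

Values in the same cluster can then be merged to a single spelling, for example the most frequent member of the cluster.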

inbo

n2kanalysis:Generic Functions to Analyse Data from the 'Natura 2000' Monitoring

All generic functions and classes for the analysis of the 'Natura 2000' monitoring. The classes contain all required data and definitions to fit the model without the need to access other sources. Potentially they might need access to one or more parent objects. An aggregation object might for example need the result of an imputation object. The actual definition of the analysis, using these generic functions and classes, is defined in dedicated analysis R packages for every monitoring scheme. For example 'abvanalysis' and 'watervogelanalysis'.

Maintained by Thierry Onkelinx. Last updated 2 months ago.

analysis monitoring natura2000

5.4 match 1 star 3.18 score 7 scripts

bioc

Scale4C:Scale4C: an R/Bioconductor package for scale-space transformation of 4C-seq data

Scale4C is an R/Bioconductor package for scale-space transformation and visualization of 4C-seq data. The scale-space transformation is a multi-scale visualization technique to transform a 2D signal (e.g. 4C-seq reads on a genomic interval of choice) into a tessellation in the scale space (2D, genomic position x scale factor) by applying different smoothing kernels (Gauss, with increasing sigma). This transformation allows for explorative analysis and comparisons of the data's structure with other samples.

Maintained by Carolin Walter. Last updated 5 months ago.

visualization qualitycontrol dataimport sequencing coverage

5.1 match 3.30 score 1 script
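
The first step of a scale-space transformation as described for Scale4C, smoothing one signal with Gaussian kernels of increasing sigma, might look like the Python sketch below. It covers only the smoothing stack and not the tessellation that Scale4C derives from it; the function names, the kernel radius of three sigma and the edge clamping are assumptions made for the example.

```python
import math

def gaussian_kernel(sigma):
    """Normalised Gaussian weights out to a radius of about three sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma):
    """Convolve the signal with a Gaussian kernel, clamping indices at the edges."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)  # repeat the edge value
            acc += weight * signal[j]
        out.append(acc)
    return out

def scale_space(signal, sigmas):
    """One smoothed copy of the signal per sigma (rows: scale, columns: position)."""
    return [smooth(signal, sigma) for sigma in sigmas]

reads = [0, 0, 0, 10, 0, 0, 0]   # hypothetical read counts along an interval
rows = scale_space(reads, [1, 2])
```

Larger sigmas flatten narrow peaks, so features that survive across many rows correspond to broader structures in the signal.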

esansano

ipft:Indoor Positioning Fingerprinting Toolset

Algorithms and utility functions for indoor positioning using fingerprinting techniques. These functions are designed for manipulation of RSSI (Received Signal Strength Intensity) data sets, estimation of positions, comparison of the performance of different models, and graphical visualization of data. Machine learning algorithms and methods such as k-nearest neighbors or probabilistic fingerprinting are implemented in this package to perform analysis and estimations over RSSI data sets.

Maintained by Emilio Sansano. Last updated 7 years ago.

cpp

11.8 match 1.23 score 17 scripts
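
A k-nearest-neighbors position estimate of the kind the ipft description refers to can be sketched as follows. This is a generic Python illustration, not ipft's interface: the function name, the radio-map layout, the Euclidean distance in signal space and the unweighted average are assumptions.

```python
import math

def knn_position(radio_map, observed, k=3):
    """Estimate a position as the mean location of the k closest fingerprints.

    radio_map: list of (rssi_vector, (x, y)) pairs recorded at known locations
    observed:  RSSI vector measured at the unknown location
    """
    ranked = sorted(radio_map, key=lambda entry: math.dist(entry[0], observed))
    nearest = ranked[:k]
    x = sum(pos[0] for _, pos in nearest) / len(nearest)
    y = sum(pos[1] for _, pos in nearest) / len(nearest)
    return (x, y)

# Hypothetical radio map with two access points and three survey locations.
radio_map = [
    ((-40.0, -70.0), (0.0, 0.0)),
    ((-42.0, -68.0), (1.0, 0.0)),
    ((-80.0, -30.0), (10.0, 10.0)),
]
estimate = knn_position(radio_map, (-41.0, -69.0), k=2)
```

Weighting each neighbor by the inverse of its signal-space distance is a common refinement of this unweighted average.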

ropensci

bowerbird:Keep a Collection of Sparkly Data Resources

Tools to get and maintain a data repository from third-party data providers.

Maintained by Ben Raymond. Last updated 5 days ago.

ropensci antarctic southern ocean data environmental satellite climate peer-reviewed

1.8 match 50 stars 7.16 score 16 scripts 1 dependent

bioc

Uniquorn:Identification of cancer cell lines based on their weighted mutational/ variational fingerprint

'Uniquorn' enables users to identify cancer cell lines. Cancer cell line misidentification and cross-contamination represent a significant challenge for cancer researchers. The identification is vital and, in the frame of this package, based on the locations/loci of somatic and germline mutations/variations. The input format is vcf/vcf.gz and the files have to contain a single cancer cell line sample (i.e. a single member/genotype/gt column in the vcf file).

Maintained by Raik Otto. Last updated 5 months ago.

immunooncology statisticalmethod wholegenome exomeseq

2.8 match 4.30 score

ropensci

DataPackageR:Construct Reproducible Analytic Data Sets as R Packages

A framework to help construct R data packages in a reproducible manner. Potentially time consuming processing of raw data sets into analysis ready data sets is done in a reproducible manner and decoupled from the usual 'R CMD build' process so that data sets can be processed into R objects in the data package and the data package can then be shared, built, and installed by others without the need to repeat computationally costly data processing. The package maintains data provenance by turning the data processing scripts into package vignettes, as well as enforcing documentation and version checking of included data objects. Data packages can be version controlled on 'GitHub', and used to share data for manuscripts, collaboration and reproducible research.

Maintained by Dave Slager. Last updated 6 months ago.

peer-reviewed reproducibility

1.3 match 156 stars 9.38 score 72 scripts

bioc

rcellminer:rcellminer: Molecular Profiles, Drug Response, and Chemical Structures for the NCI-60 Cell Lines

The NCI-60 cancer cell line panel has been used over the course of several decades as an anti-cancer drug screen. This panel was developed as part of the Developmental Therapeutics Program (DTP, http://dtp.nci.nih.gov/) of the U.S. National Cancer Institute (NCI). Thousands of compounds have been tested on the NCI-60, which have been extensively characterized by many platforms for gene and protein expression, copy number, mutation, and others (Reinhold, et al., 2012). The purpose of the CellMiner project (http://discover.nci.nih.gov/cellminer) has been to integrate data from multiple platforms used to analyze the NCI-60 and to provide a powerful suite of tools for exploration of NCI-60 data.

Maintained by Augustin Luna. Last updated 5 months ago.

acgh cellbasedassays copynumbervariation geneexpression pharmacogenomics pharmacogenetics mirna cheminformatics visualization software systemsbiology

2.0 match 5.71 score 113 scripts

cran

fmri:Analysis of fMRI Experiments

Contains R-functions to perform an fMRI analysis as described in Polzehl and Tabelow (2019) <DOI:10.1007/978-3-030-29184-6>, Tabelow et al. (2006) <DOI:10.1016/j.neuroimage.2006.06.029>, Polzehl et al. (2010) <DOI:10.1016/j.neuroimage.2010.04.241>, Tabelow and Polzehl (2011) <DOI:10.18637/jss.v044.i11>.

Maintained by Karsten Tabelow. Last updated 8 months ago.

fortran openblas

2.3 match 2 stars 4.47 score 99 scripts 1 dependent

ivanlizaga

fingerPro:Sediment Source Fingerprinting

Quantifies the provenance of the sediments in a catchment or study area. Based on a comprehensive characterization of the sediment sources and the end sediment mixtures a mixing model algorithm is applied to the sediment mixtures in order to estimate the relative contribution of each potential source. The package includes several statistical methods such as Kruskal-Wallis test, discriminant function analysis ('DFA'), principal component plot ('PCA') to select the optimal subset of tracer properties. The variability within each sediment source is also considered to estimate the statistical distribution of the sources contribution.

Maintained by Ivan Lizaga. Last updated 7 years ago.

gsl cpp

8.8 match 1.11 score 13 scripts

bioc

DeepPINCS:Protein Interactions and Networks with Compounds based on Sequences using Deep Learning

The identification of novel compound-protein interactions (CPI) is important in drug discovery. Revealing unknown compound-protein interactions is useful for designing a new drug for a target protein by screening candidate compounds. Accurate CPI prediction assists an effective drug discovery process. To identify potential CPI effectively, prediction methods based on machine learning and deep learning have been developed. Data for sequences are provided as discrete symbolic data. In the data, compounds are represented as SMILES (simplified molecular-input line-entry system) strings and proteins are sequences in which the characters are amino acids. The outcome is defined as a variable that indicates how strongly two molecules interact with each other or whether there is an interaction between them. In this package, a deep-learning based model that takes only sequence information of both compounds and proteins as input and the outcome as output is used to predict CPI. The model is implemented by using compound and protein encoders with useful features. The CPI model also supports other modeling tasks, including protein-protein interaction (PPI), chemical-chemical interaction (CCI), or single compounds and proteins. Although the model is designed for proteins, DNA and RNA can be used if they are represented as sequences.

Maintained by Dongmin Jung. Last updated 5 months ago.

software network graphandnetwork neuralnetwork openjdk

1.8 match 4.78 score 4 scripts 2 dependents

varungiri

RxnSim:Functions to Compute Chemical and Chemical Reaction Similarity

Methods to compute chemical similarity between two or more reactions and molecules. Allows masking of chemical substructures for weighted similarity computations. Uses packages 'rCDK' and 'fingerprint' for cheminformatics functionality. Methods for reaction similarity and sub-structure masking are as described in: Giri et al. (2015) <doi:10.1093/bioinformatics/btv416>.

Maintained by Varun Giri. Last updated 2 years ago.

openjdk

2.5 match 2 stars 3.18 score 15 scripts

jmcurran

jaggR:Supporting Files and Functions for the Book Bayesian Modelling with 'JAGS'

All the data and functions used to produce the book. We do not expect most people to use the package for any other reason than to get simple access to the 'JAGS' model files, the data, and perhaps run some of the simple examples. The authors of the book are David Lucy (now sadly deceased) and James Curran. It is anticipated that a manuscript will be provided to Taylor and Francis around August 2020, with bibliographic details to follow at that point. Until such time, further information can be obtained by emailing James Curran.

Maintained by James Curran. Last updated 1 year ago.

3.4 match 2.00 score 1 script

kwb-r

kwb.miacso:functions used in KWB project MIA-CSO

functions used in KWB project MIA-CSO, for example for plotting data availabilities.

Maintained by Hauke Sonnenberg. Last updated 3 years ago.

project-miacso

3.8 match 1.70 score 1 script

cran

IntegratedJM:Joint Modeling of the Gene-Expression and Bioassay Data, Taking Care of the Effect Due to a Fingerprint Feature

Offers modeling the association between gene-expression and bioassay data, taking care of the effect due to a fingerprint feature and helps with several plots to better understand the analysis.

Maintained by Rudradev Sengupta. Last updated 8 years ago.

4.9 match 1.08 score 12 scripts

bioc

bioassayR:Cross-target analysis of small molecule bioactivity

bioassayR is a computational tool that enables simultaneous analysis of thousands of bioassay experiments performed over a diverse set of compounds and biological targets. Unique features include support for large-scale cross-target analyses of both public and custom bioassays, generation of high throughput screening fingerprints (HTSFPs), and an optional preloaded database that provides access to a substantial portion of publicly available bioactivity data.

Maintained by Thomas Girke. Last updated 5 months ago.

immunooncology microtitreplateassay cellbasedassays visualization infrastructure dataimport bioinformatics proteomics metabolomics

0.5 match 5 stars 6.70 score 46 scripts

tesselle

nexus:Sourcing Archaeological Materials by Chemical Composition

Exploration and analysis of compositional data in the framework of Aitchison (1986, ISBN: 978-94-010-8324-9). This package provides tools for chemical fingerprinting and source tracking of ancient materials.

Maintained by Nicolas Frerebeau. Last updated 12 days ago.

archaeology archaeological-science archaeometry compositional-data provenance-studies

0.5 match 5.21 score 26 scripts 1 dependents

zeehio

sgolay:Efficient Savitzky-Golay Filtering

Smoothing signals and computing their derivatives is a common requirement in signal processing workflows. Savitzky-Golay filters are an established method able to do both (Savitzky and Golay, 1964 <doi:10.1021/ac60214a047>). This package implements one dimensional Savitzky-Golay filters that can be applied to vectors and matrices (either row-wise or column-wise). Vectorization and memory allocations have been profiled to reduce computational fingerprint. Short filter lengths are implemented in the direct space, while longer filters are implemented in frequency space, using a Fast Fourier Transform (FFT).

Maintained by Sergio Oller Moreno. Last updated 2 years ago.

openblas

0.5 match 7 stars 3.54 score 2 scripts
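
How a Savitzky-Golay smoothing filter works can be illustrated with a small direct-space sketch in Python. It is not the sgolay implementation and has no FFT path: it derives the filter coefficients from a least-squares polynomial fit using exact fractions and applies them to interior points only. All function names are hypothetical.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve a small linear system exactly with Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def savgol_coefficients(window, order):
    """Smoothing weights for an odd window, from a least-squares polynomial fit."""
    half = window // 2
    positions = range(-half, half + 1)
    # Normal-equation matrix: entry (j, k) is the sum of z**(j + k) over the window.
    ata = [[Fraction(sum(z ** (j + k) for z in positions))
            for k in range(order + 1)] for j in range(order + 1)]
    unit = [Fraction(1)] + [Fraction(0)] * order
    x = solve(ata, unit)
    # Weight at position z is the fitted polynomial's value at the window centre.
    return [sum(x[j] * Fraction(z) ** j for j in range(order + 1))
            for z in positions]

def savgol_smooth(signal, window=5, order=2):
    """Apply the filter to interior points; edge handling is left out of this sketch."""
    coeffs = savgol_coefficients(window, order)
    half = window // 2
    return [float(sum(c * signal[i + k - half] for k, c in enumerate(coeffs)))
            for i in range(half, len(signal) - half)]

weights = savgol_coefficients(5, 2)
smoothed = savgol_smooth([i * i for i in range(7)])
```

For a window of 5 and a quadratic fit the weights are the classic (-3, 12, 17, 12, -3)/35, and a signal that is itself quadratic passes through unchanged.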

eddelbuettel

RcppFarmHash:Interface to the Google 'FarmHash' Family of Hash Functions

The Google 'FarmHash' family of hash functions is used by the Google 'BigQuery' data warehouse via the 'FARM_FINGERPRINT' function. This package allows these hash digest fingerprints to be calculated directly from R, and uses the included 'FarmHash' files written by G. Pike and copyrighted by Google, Inc.

Maintained by Dirk Eddelbuettel. Last updated 6 months ago.

farmhash cpp

0.5 match 2 stars 3.48 score 2 scripts

ablommaert

spectralAnalysis:Pre-Process, Visualize and Analyse Spectral Data

Infrared, near-infrared and Raman spectroscopic data measured during chemical reactions, provide structural fingerprints by which molecules can be identified and quantified. The application of these spectroscopic techniques as inline process analytical tools (PAT), provides the pharmaceutical and chemical industry with novel tools, allowing to monitor their chemical processes, resulting in a better process understanding through insight in reaction rates, mechanistics, stability, etc. Data can be read into R via the generic spc-format, which is generally supported by spectrometer vendor software. Versatile pre-processing functions are available to perform baseline correction by linking to the 'baseline' package; noise reduction via the 'signal' package; as well as time alignment, normalization, differentiation, integration and interpolation. Implementation based on the S4 object system allows storing a pre-processing pipeline as part of a spectral data object, and easily transferring it to other datasets. Interactive plotting tools are provided based on the 'plotly' package. Non-negative matrix factorization (NMF) has been implemented to perform multivariate analyses on individual spectral datasets or on multiple datasets at once. NMF provides a parts-based representation of the spectral data in terms of spectral signatures of the chemical compounds and their relative proportions. See 'hNMF'-package for references on available methods. The functionality to read in spc-files was adapted from the 'hyperSpec' package.

Maintained by Adriaan Blommaert. Last updated 1 year ago.

0.5 match 2.26 score 18 scripts

tkcaccia

musicNMR:Conversion of Nuclear Magnetic Resonance Spectra in Audio Files

A collection of functions for converting the free induction decay of mono-dimensional nuclear magnetic resonance (NMR) spectra into an audio file and visualizing it. It facilitates the conversion of Bruker datasets into WAV files. The sound of NMR signals could provide an alternative to the current representation of the individual metabolic fingerprint and supply equally significant information. The package also includes NMR spectra of the urine samples provided by four healthy donors. Based on Cacciatore S, Saccenti E, Piccioli M. Hypothesis: the sound of the individual metabolic phenotype? Acoustic detection of NMR experiments. OMICS. 2015;19(3):147-56. <doi:10.1089/omi.2014.0131>.

Maintained by Stefano Cacciatore. Last updated 1 year ago.

0.5 match 1.00 score 4 scripts

yuanbofaith

protag:Search Tagged Peptides & Draw Highlighted Mass Spectra

In a typical protein labelling procedure, proteins are chemically tagged with a functional group, usually at specific sites, then digested into peptides, which are then analyzed using matrix-assisted laser desorption ionization - time of flight mass spectrometry (MALDI-TOF MS) to generate a peptide fingerprint. Relative to the control, peptides that are heavier by the mass of the labelling group are informative for sequence determination. Searching for peptides with such mass shifts, however, can be difficult. This package, designed to tackle this inconvenience, takes as input two or more MALDI-TOF MS mass lists, makes pairwise comparisons between the labeled groups and the control, and restores centroid mass spectra with highlighted peaks of interest for easier visual examination. In particular, peaks differentiated by the mass of the labelling group are defined as a “pair”, those with equal masses as a “match”, and all the other peaks as a “mismatch”. For more bioanalytical background information, refer to the following publications: Jingjing Deng (2015) <doi:10.1007/978-1-4939-2550-6_19>; Elizabeth Chang (2016) <doi:10.7171/jbt.16-2702-002>.

Maintained by Bo Yuan. Last updated 6 years ago.

0.5 match 1 star 1.00 score 1 script
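
The "pair", "match" and "mismatch" definitions in the protag description can be illustrated with a short Python sketch. It is not protag's code: the function name, the absolute mass tolerance and the rule that a "pair" takes precedence over a "match" are assumptions made for the example.

```python
def classify_peaks(control, labeled, tag_mass, tol=0.5):
    """Label each peak of the labeled sample relative to the control mass list.

    "pair":     heavier than some control peak by the tag mass (within tol)
    "match":    equal to some control peak (within tol)
    "mismatch": neither of the above
    """
    labels = {}
    for mass in labeled:
        if any(abs(mass - c - tag_mass) <= tol for c in control):
            labels[mass] = "pair"       # assumption: "pair" is checked first
        elif any(abs(mass - c) <= tol for c in control):
            labels[mass] = "match"
        else:
            labels[mass] = "mismatch"
    return labels

# Hypothetical mass lists (m/z) and a hypothetical labelling group of 142.0 Da.
control_peaks = [1000.0, 1500.0]
labeled_peaks = [1000.1, 1642.0, 2000.0]
labels = classify_peaks(control_peaks, labeled_peaks, tag_mass=142.0)
```

Peaks labelled "pair" are the ones a highlighted spectrum would draw attention to, since they carry the labelling group.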