R-universe search: encoding

bergsmat

encode:Represent Ordered Lists and Pairs as Strings

Interconverts between ordered lists and compact string notation. Useful for capturing code lists, and pair-wise codes and decodes, for text storage. Analogous to factor levels and labels. Generics encode() and decode() perform interconversion, while codes() and decodes() extract components of an encoding. The function encoded() checks whether something is interpretable as an encoding. If a vector has an encoded 'guide' attribute, as_factor() uses it to coerce to factor.

Maintained by Tim Bergsma. Last updated 6 years ago.

71.2 match 2 stars 4.03 score 12 scripts 5 dependents

gagolews

stringi:Fast and Portable Character String Processing Facilities

A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalisation, date-time formatting and parsing, and many more. They are fast, consistent, convenient, and - thanks to 'ICU' (International Components for Unicode) - portable across all locales and platforms. Documentation about 'stringi' is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).

Maintained by Marek Gagolewski. Last updated 1 month ago.

icu icu4c natural-language-processing nlp regex regexp string-manipulation stringi stringr text text-processing tidy-data unicode cpp

15.0 match 309 stars 18.31 score 10k scripts 8.6k dependents

quanteda

readtext:Import and Handling for Plain and Formatted Text Files

Functions for importing and handling text files and formatted text files with additional meta-data, including '.csv', '.tab', '.json', '.xml', '.html', '.pdf', '.doc', '.docx', '.rtf', '.xls', '.xlsx', and others.

Maintained by Kenneth Benoit. Last updated 4 months ago.

encoding quanteda text

21.9 match 122 stars 10.66 score 1.2k scripts 5 dependents

qsbase

qs:Quick Serialization of R Objects

Provides functions for quickly writing and reading any R object to and from disk.

Maintained by Travers Ching. Last updated 9 days ago.

compression data-storage encoding serialization libzstd lz4 cpp

16.4 match 414 stars 13.91 score 2.5k scripts 51 dependents

cefet-rj-dal

daltoolbox:Leveraging Experiment Lines to Data Analytics

The natural increase in the complexity of current research experiments and data demands better tools to enhance productivity in Data Analytics. The package is a framework designed to address the modern challenges in data analytics workflows. The package is inspired by Experiment Line concepts. It aims to provide seamless support for users in developing their data mining workflows by offering a uniform data model and method API. It enables the integration of various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction. It also offers options for hyper-parameter tuning and supports integration with existing libraries and languages. Overall, the package provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>.

Maintained by Eduardo Ogasawara. Last updated 1 month ago.

31.7 match 1 star 6.65 score 536 scripts 4 dependents

symbolixau

googlePolylines:Encoding Coordinates into 'Google' Polylines

Encodes simple feature ('sf') objects and coordinates, and decodes polylines using the 'Google' polyline encoding algorithm (<https://developers.google.com/maps/documentation/utilities/polylinealgorithm>).

Maintained by David Cooley. Last updated 2 days ago.

geospatial gis google-maps polyline-encoder r-spatial spatial cpp

21.7 match 18 stars 8.11 score 9 dependents
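
For readers unfamiliar with the format this package reads and writes, the following is a minimal Python sketch of the publicly documented Google polyline encoding steps: scale each coordinate by 1e5, take the difference from the previous point, fold the sign into the lowest bit, and emit 5-bit chunks offset by 63. It illustrates the algorithm only and is not the package's own C++ code; the function names `encode_value` and `encode_polyline` are hypothetical.

```python
def encode_value(value: int) -> str:
    """Encode one signed integer delta as a run of printable ASCII characters."""
    # Shift left by one bit; invert the bits when the original value is negative.
    value = ~(value << 1) if value < 0 else value << 1
    chars = []
    # Emit 5-bit chunks, least significant first; 0x20 flags "more chunks follow".
    while value >= 0x20:
        chars.append(chr((0x20 | (value & 0x1F)) + 63))
        value >>= 5
    chars.append(chr(value + 63))
    return "".join(chars)


def encode_polyline(points, precision=5):
    """Encode (lat, lng) pairs; `precision` is the number of decimal digits kept."""
    factor = 10 ** precision
    out = []
    prev_lat = prev_lng = 0
    for lat, lng in points:
        ilat, ilng = round(lat * factor), round(lng * factor)
        # Only the offset from the previous point is stored.
        out.append(encode_value(ilat - prev_lat))
        out.append(encode_value(ilng - prev_lng))
        prev_lat, prev_lng = ilat, ilng
    return "".join(out)


if __name__ == "__main__":
    print(encode_polyline([(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]))
```

Run on the three sample points used in Google's documentation, the sketch should reproduce the documented string `_p~iF~ps|U_ulLnnqC_mqNvxq`@`.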

rstudio

keras3:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.

Maintained by Tomasz Kalinowski. Last updated 3 days ago.

12.0 match 845 stars 13.57 score 264 scripts 2 dependents

mlr-org

mlr3pipelines:Preprocessing Operators and Pipelines for 'mlr3'

Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned.

Maintained by Martin Binder. Last updated 8 days ago.

bagging data-science dataflow-programming ensemble-learning machine-learning mlr3 pipelines preprocessing stacking

13.2 match 141 stars 12.36 score 448 scripts 7 dependents

munterfi

flexpolyline:Flexible Polyline Encoding

Binding to the C++ implementation of the flexible polyline encoding by HERE <https://github.com/heremaps/flexible-polyline>. The flexible polyline encoding is a lossy compressed representation of a list of coordinate pairs or coordinate triples. The encoding is achieved by: (1) Reducing the decimal digits of each value; (2) encoding only the offset from the previous point; (3) using variable length for each coordinate delta; and (4) using 64 URL-safe characters to display the result.

Maintained by Merlin Unterfinger. Last updated 2 years ago.

gis heremaps polyline polyline-decoder polyline-encoder rspatial cpp

28.2 match 9 stars 5.75 score 14 scripts 1 dependent

polmine

polmineR:Verbs and Nouns for Corpus Analysis

Package for corpus analysis using the Corpus Workbench ('CWB', <https://cwb.sourceforge.io>) as an efficient back end for indexing and querying large corpora. The package offers functionality to flexibly create subcorpora and to carry out basic statistical operations (count, co-occurrences etc.). The original full text of documents can be reconstructed and inspected at any time. Beyond that, the package is intended to serve as an interface to packages implementing advanced statistical procedures. Respective data structures (document-term matrices, term-co-occurrence matrices etc.) can be created based on the indexed corpora.

Maintained by Andreas Blaette. Last updated 1 year ago.

19.8 match 49 stars 7.96 score 311 scripts

jszitas

categoryEncodings:Category Variable Encodings

Simple, fast, and automatic encodings for category data using a data.table backend. Most of the methods are an implementation of Johannemann, Hadad, Athey, Wager (2019) <arXiv:1908.09874>, particularly their "means", "sPCA", "low-rank" and "multinomial logit" encodings.

Maintained by Juraj Szitas. Last updated 3 years ago.

categorical-variables feature-encoding feature-engineering

46.4 match 3 stars 3.18 score 2 scripts

yihui

xfun:Supporting Functions for Packages Maintained by 'Yihui Xie'

Miscellaneous functions commonly used in other packages maintained by 'Yihui Xie'.

Maintained by Yihui Xie. Last updated 2 days ago.

6.6 match 145 stars 18.18 score 916 scripts 4.4k dependents

rpolars

polars:Lightning-Fast 'DataFrame' Library

Lightning-fast 'DataFrame' library written in 'Rust'. Convert R data to 'Polars' data and vice versa. Perform fast, lazy, larger-than-memory and optimized data queries. 'Polars' is interoperable with the package 'arrow', as both are based on the 'Apache Arrow' Columnar Format.

Maintained by Soren Welling. Last updated 3 days ago.

arrow polars rust

9.9 match 499 stars 12.01 score 1.0k scripts 2 dependents

michbur

biogram:N-Gram Analysis of Biological Sequences

Tools for extraction and analysis of various n-grams (k-mers) derived from biological sequences (proteins or nucleic acids). Contains QuiPT (quick permutation test) for fast feature-filtering of the n-gram data.

Maintained by Michal Burdukiewicz. Last updated 7 months ago.

biological-sequences ngram-analysis

15.5 match 10 stars 7.50 score 87 scripts 3 dependents

bquast

HomomorphicEncryption:BFV, BGV, CKKS Schema for Fully Homomorphic Encryption

Implements the Brakerski-Fan-Vercauteren (BFV, 2012) <https://eprint.iacr.org/2012/144>, Brakerski-Gentry-Vaikuntanathan (BGV, 2014) <doi:10.1145/2633600>, and Cheon-Kim-Kim-Song (CKKS, 2016) <https://eprint.iacr.org/2016/421.pdf> schema for Fully Homomorphic Encryption. The included vignettes demonstrate the encryption procedures.

Maintained by Bastiaan Quast. Last updated 1 year ago.

19.5 match 1 star 5.52 score 39 scripts

bioc

graph:graph: A package to handle graph data structures

A package that implements some simple graph handling capabilities.

Maintained by Bioconductor Package Maintainer. Last updated 10 days ago.

graphandnetwork

8.7 match 11.78 score 764 scripts 342 dependents

bioc

consensusSeekeR:Detection of consensus regions inside a group of experiences using genomic positions and genomic ranges

This package compares genomic positions and genomic ranges from multiple experiments to extract common regions. The size of the analyzed region is adjustable, as is the number of experiments in which a feature must be present in a potential region to tag this region as a consensus region. In genomic analysis where feature identification generates a position value surrounded by a genomic range, such as ChIP-Seq peaks and nucleosome positions, the replication of an experiment may result in slight differences between predicted values. This package enables the conciliation of the results into consensus regions.

Maintained by Astrid Deschênes. Last updated 5 months ago.

biologicalquestion chipseq genetics multiplecomparison transcription peakdetection sequencing coverage chip-seq-analysis genomic-data-analysis nucleosome-positioning

19.2 match 1 star 5.26 score 5 scripts 1 dependent

polkas

cat2cat:Handling an Inconsistently Coded Categorical Variable in a Longitudinal Dataset

Unifies an inconsistently coded categorical variable between two different time points in accordance with a mapping table. The main rule is to replicate the observation if it could be assigned to a few categories. Frequencies or statistical methods are then used to approximate the probabilities of being assigned to each of them. This procedure was invented and implemented in the paper by Nasinski, Majchrowska, and Broniatowska (2020) <doi:10.24425/cejeme.2020.134747>.

Maintained by Maciej Nasinski. Last updated 1 year ago.

categories encoding encodings factor longitudinal mapping mappings panel transitions

23.2 match 4 stars 4.30 score 2 scripts

patperry

utf8:Unicode Text Processing

Process and print 'UTF-8' encoded international text (Unicode). Input, validate, normalize, encode, format, and display.

Maintained by Kirill Müller. Last updated 3 months ago.

6.0 match 113 stars 16.48 score 295 scripts 11k dependents

tidymodels

embed:Extra Recipes for Encoding Predictors

Predictors can be converted to one or more numeric representations using a variety of methods. Effect encodings using simple generalized linear models <doi:10.48550/arXiv.1611.09477> or nonlinear models <doi:10.48550/arXiv.1604.06737> can be used. There are also functions for dimension reduction and other approaches.

Maintained by Emil Hvitfeldt. Last updated 2 months ago.

10.5 match 142 stars 9.35 score 1.1k scripts

bioc

DEWSeq:Differential Expressed Windows Based on Negative Binomial Distribution

DEWSeq is a sliding window approach for the analysis of differentially enriched binding regions in eCLIP or iCLIP next-generation sequencing data.

Maintained by bioinformatics team Hentze. Last updated 5 months ago.

sequencing generegulation functionalgenomics differentialexpression bioinformatics eclip ngs-analysis

17.8 match 5 stars 5.30 score 4 scripts

btskinner

crosswalkr:Rename and Encode Data Frames Using External Crosswalk Files

A pair of functions for renaming and encoding data frames using external crosswalk files. It is especially useful when constructing master data sets from multiple smaller data sets that do not name or encode variables consistently across files. Based on similar commands in 'Stata'.

Maintained by Benjamin Skinner. Last updated 1 year ago.

crosswalk encode labels rename

17.3 match 9 stars 5.26 score 20 scripts

bioc

GreyListChIP:Grey Lists -- Mask Artefact Regions Based on ChIP Inputs

Identify regions of ChIP experiments with high signal in the input that lead to spurious peaks during peak calling. Remove reads aligning to these regions prior to peak calling, for cleaner ChIP analysis.

Maintained by Matt Eldridge. Last updated 5 months ago.

chipseq alignment preprocessing differentialpeakcalling sequencing genomeannotation coverage

18.3 match 4.93 score 10 scripts 4 dependents

s-u

base64enc:Tools for base64 Encoding

Tools for handling base64 encoding. It is more flexible than the orphaned base64 package.

Maintained by Simon Urbanek. Last updated 3 years ago.

7.0 match 9 stars 12.62 score 680 scripts 4.8k dependents

overton-group

eHDPrep:Quality Control and Semantic Enrichment of Datasets

A tool for the preparation and enrichment of health datasets for analysis (Toner et al. (2023) <doi:10.1093/gigascience/giad030>). Provides functionality for assessing data quality and for improving the reliability and machine interpretability of a dataset. 'eHDPrep' also enables semantic enrichment of a dataset where metavariables are discovered from the relationships between input variables determined from user-provided ontologies.

Maintained by Ian Overton. Last updated 2 years ago.

data-quality health-informatics semantic-enrichment

17.9 match 8 stars 4.90 score 10 scripts

mllg

base64url:Fast and URL-Safe Base64 Encoder and Decoder

In contrast to RFC3548, the 62nd character ("+") is replaced with "-", the 63rd character ("/") is replaced with "_". The resulting encoded strings comply with the regular expression pattern "[A-Za-z0-9_-]" and thus are safe to use in URLs or for file names. The package also comes with a simple base32 encoder/decoder suited for case insensitive file systems.

Maintained by Michel Lang. Last updated 5 years ago.

base32 base64 base64url

10.0 match 12 stars 8.37 score 15 scripts 68 dependents
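
The character substitution described in this entry can be sketched in a few lines of Python using only the standard library. This is an illustration of the URL-safe alphabet, not the package's own implementation; the helper names `b64url_encode` and `b64url_decode` are hypothetical, and stripping the trailing "=" padding is an assumption made so that the output matches the character class quoted above.

```python
import base64
import re


def b64url_encode(data: bytes) -> str:
    """Standard base64, then swap '+' -> '-' and '/' -> '_'."""
    std = base64.b64encode(data).decode("ascii")
    # Assumption: drop '=' padding so only [A-Za-z0-9_-] characters remain.
    return std.replace("+", "-").replace("/", "_").rstrip("=")


def b64url_decode(text: str) -> bytes:
    """Reverse the substitution and restore padding before decoding."""
    std = text.replace("-", "+").replace("_", "/")
    std += "=" * (-len(std) % 4)
    return base64.b64decode(std)


if __name__ == "__main__":
    raw = b"\xfb\xff"                 # standard base64 of these bytes is "+/8="
    encoded = b64url_encode(raw)
    print(encoded)                    # "-_8"
    print(re.fullmatch(r"[A-Za-z0-9_-]+", encoded) is not None)
    print(b64url_decode(encoded) == raw)
```

The two-byte input is chosen because its standard base64 form contains both "+" and "/", so both substitutions are exercised.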

rexyai

RestRserve:A Framework for Building HTTP API

Makes it easy to create high-performance, full-featured HTTP APIs from R functions. Provides high-level classes such as 'Request', 'Response', 'Application', 'Middleware' in order to streamline server side application development. Out of the box, it serves requests using the 'Rserve' package, but it is flexible enough to integrate with other HTTP servers such as 'httpuv'.

Maintained by Dmitry Selivanov. Last updated 3 days ago.

http-server openapi rest-api swagger-ui cpp

8.7 match 283 stars 9.56 score 95 scripts 1 dependent

bnosac

tokenizers.bpe:Byte Pair Encoding Text Tokenization

Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library <https://github.com/VKCOM/YouTokenToMe> which is an implementation of fast Byte Pair Encoding (BPE) <https://aclanthology.org/P16-1162/>.

Maintained by Jan Wijffels. Last updated 2 years ago.

bpe byte-pair-encoding text-mining tokenization cpp

17.3 match 15 stars 4.56 score 48 scripts

ironholds

urltools:Vectorised Tools for URL Handling and Parsing

A toolkit for all URL-handling needs, including encoding and decoding, parsing, parameter extraction and modification. All functions are designed to be both fast and entirely vectorised. It is intended to be useful for people dealing with web-related datasets, such as server-side logs, although may be useful for other situations involving large sets of URLs.

Maintained by Os Keyes. Last updated 4 years ago.

access-logs data-import url cpp

5.7 match 131 stars 13.43 score 968 scripts 264 dependents

bioc

matter:Out-of-core statistical computing and signal processing

Toolbox for larger-than-memory scientific computing and visualization, providing efficient out-of-core data structures using files or shared memory, for dense and sparse vectors, matrices, and arrays, with applications to nonuniformly sampled signals and images.

Maintained by Kylie A. Bemis. Last updated 3 months ago.

infrastructure datarepresentation dataimport dimensionreduction preprocessing cpp

7.9 match 57 stars 9.52 score 64 scripts 2 dependents

hrbrmstr

qrencoder:Quick Response Code (QR Code) / Matrix Barcode Creator

Quick Response codes (QR codes) are a type of matrix bar code and can be used to authenticate transactions, provide access to multi-factor authentication services and enable general data transfer in an image. QR codes use four standardized encoding modes (numeric, alphanumeric, byte/binary, and kanji) to efficiently store data. Matrix barcode generation is performed efficiently in C via the included 'libqrencoder' library created by Kentaro Fukuchi.

Maintained by Bob Rudis. Last updated 6 years ago.

qrcode qrcode-generator cpp

12.1 match 61 stars 6.03 score 59 scripts 1 dependent

bioc

BPRMeth:Model higher-order methylation profiles

The BPRMeth package is a probabilistic method to quantify explicit features of methylation profiles, in a way that would make it easier to formally use such profiles in downstream modelling efforts, such as predicting gene expression levels or clustering genomic regions or cells according to their methylation profiles.

Maintained by Chantriolnt-Andreas Kapourani. Last updated 5 months ago.

immunooncology dnamethylation geneexpression generegulation epigenetics genetics clustering featureextraction regression rnaseq bayesian kegg sequencing coverage singlecell openblas cpp

12.5 match 5.75 score 94 scripts 1 dependent

merck

r2rtf:Easily Create Production-Ready Rich Text Format (RTF) Tables and Figures

Create production-ready Rich Text Format (RTF) tables and figures with flexible format.

Maintained by Benjamin Wang. Last updated 5 days ago.

6.6 match 78 stars 10.82 score 171 scripts 10 dependents

computationalstylistics

stylo:Stylometric Multivariate Analyses

Supervised and unsupervised multivariate methods, supplemented by GUI and some visualizations, to perform various analyses in the field of computational stylistics, authorship attribution, etc. For further reference, see Eder et al. (2016), <https://journal.r-project.org/archive/2016/RJ-2016-007/index.html>. You are also encouraged to visit the Computational Stylistics Group's website <https://computationalstylistics.github.io/>, where a reasonable amount of information about the package and related projects is provided.

Maintained by Maciej Eder. Last updated 2 months ago.

8.3 match 186 stars 8.59 score 462 scripts

jeroen

base64:Base64 Encoder and Decoder

Compatibility wrapper to replace the orphaned package. New applications should use base64 encoders from 'jsonlite' or 'openssl' or 'base64enc'.

Maintained by Jeroen Ooms. Last updated 5 months ago.

10.6 match 2 stars 6.62 score 163 scripts 42 dependents

statnet

ergm:Fit, Simulate and Diagnose Exponential-Family Models for Networks

An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) <doi:10.18637/jss.v024.i03> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.

Maintained by Pavel N. Krivitsky. Last updated 6 days ago.

4.5 match 100 stars 15.36 score 1.4k scripts 36 dependents

bioc

GenomicAlignments:Representation and manipulation of short genomic alignments

Provides efficient containers for storing and manipulating short genomic alignments (typically obtained by aligning short reads to a reference genome). This includes read counting, computing the coverage, junction detection, and working with the nucleotide content of the alignments.

Maintained by Hervé Pagès. Last updated 5 months ago.

infrastructure dataimport genetics sequencing rnaseq snp coverage alignment immunooncology bioconductor-package core-package

4.7 match 10 stars 13.61 score 3.1k scripts 529 dependents

r-lib

jose:JavaScript Object Signing and Encryption

Read and write JSON Web Keys (JWK, rfc7517), generate and verify JSON Web Signatures (JWS, rfc7515) and encode/decode JSON Web Tokens (JWT, rfc7519) <https://datatracker.ietf.org/wg/jose/documents/>. These standards provide modern signing and encryption formats that are natively supported by browsers via the JavaScript WebCryptoAPI <https://www.w3.org/TR/WebCryptoAPI/#jose>, and used by services like OAuth 2.0, LetsEncrypt, and Github Apps.

Maintained by Jeroen Ooms. Last updated 5 months ago.

5.6 match 50 stars 10.98 score 63 scripts 35 dependents

bioc

motifbreakR:A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites

We introduce motifbreakR, which allows the biologist to judge in the first place whether the sequence surrounding the polymorphism is a good match, and in the second place how much information is gained or lost in one allele of the polymorphism relative to another. MotifbreakR is both more flexible and more extensible than previous offerings, giving users a choice of algorithms for interrogating genomes with motifs from public sources: 1) a weighted-sum probability matrix, 2) log-probabilities, and 3) weighting by relative entropy. MotifbreakR can predict effects for novel or previously described variants in public databases, making it suitable for tasks beyond the scope of its original design. Lastly, it can be used to interrogate any genome curated within Bioconductor (currently there are 32 species, a total of 109 versions).

Maintained by Simon Gert Coetzee. Last updated 5 months ago.

chipseq visualization motifannotation transcription

6.5 match 28 stars 8.96 score 103 scripts

bioc

DNAshapeR:High-throughput prediction of DNA shape features

DNAshapeR is an R/BioConductor package for ultra-fast, high-throughput predictions of DNA shape features. The package can predict, visualize and encode DNA shape features for statistical learning.

Maintained by Tsu-Pei Chiu. Last updated 5 months ago.

structuralprediction dna3dstructure software cpp

10.4 match 5.57 score 37 scripts

extendr

b64:Fast and Vectorized Base 64 Engine

Provides a fast, lightweight, and vectorized base 64 engine to encode and decode character and raw vectors as well as files stored on disk. Common base 64 alphabets are supported out of the box including the standard, URL-safe, bcrypt, crypt, 'BinHex', and IMAP-modified UTF-7 alphabets. Custom engines can be created to support unique base 64 encoding and decoding needs.

Maintained by Josiah Parry. Last updated 2 months ago.

rust cargo

9.5 match 16 stars 6.09 score 4 scripts 3 dependents

thomasp85

farver:High Performance Colour Space Manipulation

The encoding of colour can be handled in many different ways, using different colour spaces. As different colour spaces have different uses, efficient conversion between these representations is important. The 'farver' package provides a set of functions that gives access to very fast colour space conversion and comparisons implemented in C++, and offers speed improvements over the 'convertColor' function in the 'grDevices' package.

Maintained by Thomas Lin Pedersen. Last updated 10 months ago.

color-conversion cpp

4.0 match 136 stars 14.17 score 164 scripts 7.9k dependents

rstudio

httpuv:HTTP and WebSocket Server Library

Provides low-level socket and protocol support for handling HTTP and WebSocket requests directly from within R. It is primarily intended as a building block for other packages, rather than making it particularly easy to create complete web applications using httpuv alone. httpuv is built on top of the libuv and http-parser C libraries, both of which were developed by Joyent, Inc. (See LICENSE file for libuv and http-parser license information.)

Maintained by Winston Chang. Last updated 12 months ago.

libuv1 cpp

3.8 match 235 stars 15.09 score 708 scripts 2.1k dependents

eitsupi

neopolars:R Bindings for the 'polars' Rust Library

Lightning-fast 'DataFrame' library written in 'Rust'. Convert R data to 'Polars' data and vice versa. Perform fast, lazy, larger-than-memory and optimized data queries. 'Polars' is interoperable with the package 'arrow', as both are based on the 'Apache Arrow' Columnar Format.

Maintained by Tatsuya Shima. Last updated 12 hours ago.

rust cargo

11.6 match 40 stars 4.86 score 1 script

t-kalinowski

keras:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.

Maintained by Tomasz Kalinowski. Last updated 11 months ago.

5.2 match 10.82 score 10k scripts 54 dependents

bioc

immApex:Tools for Adaptive Immune Receptor Sequence-Based Machine and Deep Learning

A set of tools to build tensorflow/keras3-based models in R from amino acid and nucleotide sequences focusing on adaptive immune receptors. The package includes pre-processing of sequences, unifying gene nomenclature usage, encoding sequences, and combining models. This package will serve as the basis of future immune receptor sequence functions/packages/models compatible with the scRepertoire ecosystem.

Maintained by Nick Borcherding. Last updated 19 days ago.

software immunooncology singlecell classification annotation sequencing motifannotation

9.5 match 8 stars 5.92 score 3 scripts

cardiomoon

rrtable:Reproducible Research with a Table of R Codes

Makes documents containing plots and tables from a table of R codes. Can make "HTML", "pdf('LaTex')", "docx('MS Word')" and "pptx('MS Powerpoint')" documents with or without R code. In the package, modularized 'shiny' app codes are provided. These modules are intended for reuse across applications.

Maintained by Keon-Woong Moon. Last updated 2 years ago.

8.6 match 3 stars 6.45 score 76 scripts 2 dependents

dataoneorg

dataone:R Interface to the DataONE REST API

Provides read and write access to data and metadata from the DataONE network <https://www.dataone.org> of data repositories. Each DataONE repository implements a consistent repository application programming interface. Users call methods in R to access these remote repository functions, such as methods to query the metadata catalog, get access to metadata for particular data packages, and read the data objects from the data repository. Users can also insert and update data objects on repositories that support these methods.

Maintained by Matthew B. Jones. Last updated 3 years ago.

5.5 match 36 stars 9.93 score 472 scripts 3 dependents

bioc

Biostrings:Efficient manipulation of biological strings

Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.

Maintained by Hervé Pagès. Last updated 23 days ago.

sequencematching alignment sequencing genetics dataimport datarepresentation infrastructure bioconductor-package core-package

3.0 match 61 stars 17.83 score 8.6k scripts 1.2k dependents

hrbrmstr

vegalite:Tools to Encode Visualizations with the 'Grammar of Graphics'-Like 'Vega-Lite' 'Spec'

The 'Vega-Lite' 'JavaScript' framework provides a higher-level grammar for visual analysis, akin to 'ggplot' or 'Tableau', that generates complete 'Vega' specifications. Functions exist which enable building a valid 'spec' from scratch or importing a previously created 'spec' file. Functions also exist to export 'spec' files and to generate code which will enable plots to be embedded in properly configured web pages. The default behavior is to generate an 'htmlwidget'.

Maintained by Bob Rudis. Last updated 7 years ago.

data-visualization datavisualization vega-lite vega-lite-spec visualization widget

7.0 match 158 stars 7.60 score 84 scripts

jeroen

jsonlite:A Simple and Robust JSON Parser and Generator for R

A reasonably fast JSON parser and generator, optimized for statistical data and the web. Offers simple, flexible tools for working with JSON in R, and is particularly powerful for building pipelines and interacting with a web API. The implementation is based on the mapping described in the vignette (Ooms, 2014). In addition to converting JSON data from/to R objects, 'jsonlite' contains functions to stream, validate, and prettify JSON data. The unit tests included with the package verify that all edge cases are encoded and decoded consistently for use with dynamic data in systems and applications.

Maintained by Jeroen Ooms. Last updated 22 days ago.

json parser

2.5 match 384 stars 21.15 score 27k scripts 8.6k dependents

bioc

cTRAP:Identification of candidate causal perturbations from differential gene expression data

Compare differential gene expression results with those from known cellular perturbations (such as gene knock-down, overexpression or small molecules) derived from the Connectivity Map. Such analyses make it possible not only to infer the molecular causes of the observed difference in gene expression but also to identify small molecules that could drive or revert specific transcriptomic alterations.

Maintained by Nuno Saraiva-Agostinho. Last updated 5 months ago.

differentialexpression geneexpression rnaseq transcriptomics pathways immunooncology genesetenrichment bioconductor bioinformatics cmap gene-expression l1000

10.4 match 5 stars 5.08 score 16 scripts

laresbernardo

lares:Analytics & Machine Learning Sidekick

Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of family functions, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scrapper, it helps the analyst or data scientist to get quick and robust results, without the need of repetitive coding or advanced R programming skills.

Maintained by Bernardo Lares. Last updated 23 days ago.

analytics api automation automl data-science descriptive-statistics h2o machine-learning marketing mmm predictive-modeling puzzle rlanguage robyn visualization

5.2 match 233 stars 9.84 score 185 scripts 1 dependent

kwb-r

kwb.utils:General Utility Functions Developed at KWB

This package contains some small helper functions that aim at improving the quality of code developed at Kompetenzzentrum Wasser gGmbH (KWB).

Maintained by Hauke Sonnenberg. Last updated 12 months ago.

6.8 match 8 stars 7.33 score 12 scripts 78 dependents

qsbase

qs2:Efficient Serialization of R Objects

Streamlines and accelerates the process of saving and loading R objects, improving speed and compression compared to other methods. The package provides two compression formats: the 'qs2' format, which uses R serialization via the C API while optimizing compression and disk I/O, and the 'qdata' format, featuring custom serialization for slightly faster performance and better compression. Additionally, the 'qs2' format can be directly converted to the standard 'RDS' format, ensuring long-term compatibility with future versions of R.

Maintained by Travers Ching. Last updated 9 days ago.

compression data-storage serialization cpp

6.4 match 15 stars 7.57 score 25 scripts 2 dependents

mbojan

rgraph6:Representing Graphs as 'graph6', 'digraph6' or 'sparse6' Strings

Encode network data as strings of printable ASCII characters. Implemented functions include encoding and decoding adjacency matrices, edgelists, igraph, and network objects to/from formats 'graph6', 'sparse6', and 'digraph6'. The formats and methods are described in McKay, B.D. and Piperno, A (2014) <doi:10.1016/j.jsc.2013.09.003>.

Maintained by Michal Bojanowski. Last updated 6 months ago.

network-analysis cpp

9.4 match 12 stars 5.08 score 8 scripts

modal-inria

cfda:Categorical Functional Data Analysis

Package for the analysis of categorical functional data. The main purpose is to compute an encoding (real functional variable) for each state <doi:10.3390/math9233074>. It also provides functions to perform basic statistical analysis on categorical functional data.

Maintained by Quentin Grimonprez. Last updated 2 months ago.

categorical-data functional-data-analysis hacktoberfest

10.3 match 4 stars 4.60 score 3 scripts

teunbrand

ggh4x:Hacks for 'ggplot2'

A 'ggplot2' extension that does a variety of little helpful things. The package extends 'ggplot2' facets through customisation, by setting individual scales per panel, resizing panels and providing nested facets. Also allows multiple colour and fill scales per plot. Also hosts a smaller collection of stats, geoms and axis guides.

Maintained by Teun van den Brand. Last updated 3 months ago.

ggplot-extension ggplot2

3.3 match 616 stars 13.98 score 4.4k scripts 20 dependents

tidyverse

ellmer:Chat with Large Language Models

Chat with large language models from a range of providers including 'Claude' <https://claude.ai>, 'OpenAI' <https://chatgpt.com>, and more. Supports streaming, asynchronous calls, tool calling, and structured data extraction.

Maintained by Hadley Wickham. Last updated 5 hours ago.

3.7 match 391 stars 12.65 score 98 scripts 7 dependents

eltoulemonde

dataPreparation:Automated Data Preparation

Does most of the painful data preparation for a data science project with a minimum amount of code. Takes advantage of 'data.table' efficiency and uses some algorithmic tricks to perform data preparation in a time- and RAM-efficient way.

Maintained by Emmanuel-Lin Toulemonde. Last updated 2 years ago.

data-preparation data-preprocessing data-science date-conversion speed variable-elimination variable-selection

8.5 match 31 stars 5.46 score 86 scripts

sdam-au

sdam:Digital Tools for the SDAM Project at Aarhus University

Provides digital tools for performing analyses within Social Dynamics and complexity in the Ancient Mediterranean (SDAM), which is a research group based at the Department of History and Classical Studies at Aarhus University.

Maintained by Antonio Rivero Ostoic. Last updated 3 years ago.

aarhus-university cartography data-visualization dataset digital-humanities encoding extract inscriptions rest-api temporal

11.8 match 4 stars 3.86 score 36 scripts

jeroen

openssl:Toolkit for Encryption, Signatures and Certificates Based on OpenSSL

Bindings to OpenSSL libssl and libcrypto, plus custom SSH key parsers. Supports RSA, DSA and EC curves P-256, P-384, P-521, and curve25519. Cryptographic signatures can either be created and verified manually or via x509 certificates. AES can be used in cbc, ctr or gcm mode for symmetric encryption; RSA for asymmetric (public key) encryption or EC for Diffie Hellman. High-level envelope functions combine RSA and AES for encrypting arbitrary sized data. Other utilities include key generators, hash functions (md5, sha1, sha256, etc), base64 encoder, a secure random number generator, and 'bignum' math methods for manually performing crypto calculations on large multibyte integers.

Maintained by Jeroen Ooms. Last updated 1 months ago.

openssl

2.5 match 65 stars 18.00 score 632 scripts 5.0k dependents

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 years ago.

5.5 match 3 stars 8.20 score 7.8k scripts 11 dependents

jeroen

curl:A Modern and Flexible Web Client for R

Bindings to 'libcurl' <https://curl.se/libcurl/> for performing fully configurable HTTP/FTP requests where responses can be processed in memory, on disk, or streaming via the callback or connection interfaces. Some knowledge of 'libcurl' is recommended; for a more-user-friendly web client see the 'httr2' package which builds on this package with http specific tools and logic.

Maintained by Jeroen Ooms. Last updated 22 days ago.

curl

2.3 match 225 stars 19.95 score 4.0k scripts 5.8k dependents

ices-tools-prod

TAF:Transparent Assessment Framework for Reproducible Research

General framework to organize data, methods, and results used in reproducible scientific analyses. A TAF analysis consists of four scripts (data.R, model.R, output.R, report.R) that are run sequentially. Each script starts by reading files from a previous step and ends with writing out files for the next step. Convenience functions are provided to version control the required data and software, run analyses, clean residues from previous runs, manage files, manipulate tables, and produce figures. With a focus on stability and reproducible analyses, the TAF package comes with no dependencies. TAF forms a base layer for the 'icesTAF' package and other scientific applications.

Maintained by Arni Magnusson. Last updated 4 months ago.

6.5 match 3 stars 6.85 score 282 scripts 2 dependents

bioc

atSNP:Affinity test for identifying regulatory SNPs

atSNP performs affinity tests of motif matches with the SNP or the reference genomes and SNP-led changes in motif matches.

Maintained by Sunyoung Shin. Last updated 5 months ago.

software chipseq genomeannotation motifannotation visualization cpp

7.7 match 1 stars 5.73 score 36 scripts

tidyverse

stringr:Simple, Consistent Wrappers for Common String Operations

A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another.

Maintained by Hadley Wickham. Last updated 7 months ago.

regular-expression strings

2.0 match 622 stars 21.97 score 164k scripts 8.2k dependents

blasbenito

collinear:Automated Multicollinearity Management

Effortless multicollinearity management in data frames with both numeric and categorical variables for statistical and machine learning applications. The package simplifies multicollinearity analysis by combining four robust methods: 1) target encoding for categorical variables (Micci-Barreca, D. 2001 <doi:10.1145/507533.507538>); 2) automated feature prioritization to prevent key variable loss during filtering; 3) pairwise correlation for all variable combinations (numeric-numeric, numeric-categorical, categorical-categorical); and 4) fast computation of variance inflation factors.

Maintained by Blas M. Benito. Last updated 2 months ago.

machine-learning multicollinearity statistics

7.9 match 11 stars 5.51 score 15 scripts 1 dependents

yihui

knitr:A General-Purpose Package for Dynamic Report Generation in R

Provides a general-purpose tool for dynamic report generation in R using Literate Programming techniques.

Maintained by Yihui Xie. Last updated 1 days ago.

dynamic-documents knitr literate-programming rmarkdown sweave

1.8 match 2.4k stars 23.62 score 116k scripts 4.2k dependents

tidyverse

readr:Read Rectangular Text Data

The goal of 'readr' is to provide a fast and friendly way to read rectangular data (like 'csv', 'tsv', and 'fwf'). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.

Maintained by Jennifer Bryan. Last updated 8 months ago.

csv fwf parsing cpp

2.0 match 1.0k stars 21.03 score 132k scripts 2.0k dependents

luca-scr

GA:Genetic Algorithms

Flexible general-purpose toolbox implementing genetic algorithms (GAs) for stochastic optimisation. Binary, real-valued, and permutation representations are available to optimize a fitness function, i.e. a function provided by users depending on their objective function. Several genetic operators are available and can be combined to explore the best settings for the current task. Furthermore, users can define new genetic operators and easily evaluate their performances. Local search using general-purpose optimisation algorithms can be applied stochastically to exploit interesting regions. GAs can be run sequentially or in parallel, using an explicit master-slave parallelisation or a coarse-grain islands approach. For more details see Scrucca (2013) <doi:10.18637/jss.v053.i04> and Scrucca (2017) <doi:10.32614/RJ-2017-008>.

Maintained by Luca Scrucca. Last updated 6 months ago.

genetic-algorithm optimisation cpp

3.6 match 93 stars 11.58 score 624 scripts 52 dependents

mhahsler

arules:Mining Association Rules and Frequent Itemsets

Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. Hahsler, Gruen and Hornik (2005) <doi:10.18637/jss.v014.i15>.

Maintained by Michael Hahsler. Last updated 1 months ago.

arules association-rules frequent-itemsets

3.0 match 194 stars 13.99 score 3.3k scripts 28 dependents

ecodynizw

vietnameseConverter:Convert Vietnamese Encodings

Conversion of characters from unsupported Vietnamese character encodings to Unicode characters. These Vietnamese encodings (TCVN3, VISCII, VPS) are not natively supported in R and lead to printing of wrong characters and garbled text (mojibake). This package fixes that problem and provides readable output with the correct Unicode characters (with or without diacritics).

Maintained by Juergen Niedballa. Last updated 3 years ago.

10.4 match 2 stars 4.00 score 4 scripts

hetong007

pullword:R Interface to Pullword Service

R Interface to Pullword Service for natural language processing in Chinese. It enables users to extract valuable words from text by deep learning models. For more details please visit the official site (in Chinese) <http://www.pullword.com/>.

Maintained by Tong He. Last updated 4 years ago.

10.5 match 19 stars 3.98 score 1 scripts

tidyverse

lubridate:Make Dealing with Dates a Little Easier

Functions to work with date-times and time-spans: fast and user friendly parsing of date-time data, extraction and updating of components of a date-time (years, months, days, hours, minutes, and seconds), algebraic manipulation on date-time and time-span objects. The 'lubridate' package has a consistent and memorable syntax that makes working with dates easy and fun.

Maintained by Vitalie Spinu. Last updated 3 months ago.

date date-time

1.9 match 757 stars 20.95 score 135k scripts 1.9k dependents

ropensci

redland:RDF Library Bindings in R

Provides methods to parse, query and serialize information stored in the Resource Description Framework (RDF). RDF is described at <https://www.w3.org/TR/rdf-primer/>. This package supports RDF by implementing an R interface to the Redland RDF C library, described at <https://librdf.org/docs/api/index.html>. In brief, RDF provides a structured graph consisting of Statements composed of Subject, Predicate, and Object Nodes.

Maintained by Matthew B. Jones. Last updated 1 years ago.

redland

5.0 match 17 stars 7.85 score 98 scripts 13 dependents

bnosac

sentencepiece:Text Tokenization using Byte Pair Encoding and Unigram Modelling

Unsupervised text tokenizer that performs byte pair encoding and unigram modelling. Wraps the 'sentencepiece' library <https://github.com/google/sentencepiece>, which provides a language-independent tokenizer to split text into words and smaller subword units. The techniques are explained in the paper "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing" by Taku Kudo and John Richardson (2018) <doi:10.18653/v1/D18-2012>. Also provides straightforward access to pretrained byte pair encoding models and subword embeddings trained on Wikipedia using 'word2vec', as described in "BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages" by Benjamin Heinzerling and Michael Strube (2018) <http://www.lrec-conf.org/proceedings/lrec2018/pdf/1049.pdf>.

Maintained by Jan Wijffels. Last updated 2 years ago.

byte natural-language-processing sentencepiece word-segmentation cpp

9.4 match 25 stars 4.10 score 8 scripts

conjugateprior

events:Store and Manipulate Event Data

The events package manipulates, aggregates and otherwise messes with event data from 'KEDS' and 'TABARI' software and those with similar output. It also bundles several classic event data sets. Most functions are superseded by those in 'dplyr' and 'tidyr'.

Maintained by William Lowe. Last updated 3 years ago.

10.9 match 3 stars 3.52 score 22 scripts

jalvesaq

descr:Descriptive Statistics

Weighted frequency and contingency tables of categorical variables and of the comparison of the mean value of a numerical variable by the levels of a factor, and methods to produce xtable objects of the tables and to plot them. There are also functions to facilitate the character encoding conversion of objects, to quickly convert fixed width files into csv ones, and to export a data.frame to a text file with the necessary R and SPSS codes to reread the data.

Maintained by Jakson Aquino. Last updated 1 years ago.

4.3 match 18 stars 8.80 score 692 scripts 4 dependents

bioc

genomation:Summary, annotation and visualization of genomic data

A package for summary and annotation of genomic intervals. Users can visualize and quantify genomic intervals over pre-defined functional regions, such as promoters, exons, introns, etc. The genomic intervals represent regions with a defined chromosome position, which may be associated with a score, such as aligned reads from HT-seq experiments, TF binding sites, methylation scores, etc. The package can use any tabular genomic feature data as long as it has minimal information on the locations of genomic intervals. In addition, it can use BAM or BigWig files as input.

Maintained by Altuna Akalin. Last updated 5 months ago.

annotation sequencing visualization cpgisland cpp

3.4 match 75 stars 11.09 score 738 scripts 5 dependents

bleutner

RStoolbox:Remote Sensing Data Analysis

Toolbox for remote sensing image processing and analysis such as calculating spectral indexes, principal component transformation, unsupervised and supervised classification or fractional cover analyses.

Maintained by Konstantin Mueller. Last updated 1 months ago.

ggplot2 land-cover-mapping remote-sensing spectral-unmixing supervised-classification unsupervised-classification openblas cpp

3.7 match 275 stars 10.10 score 1.1k scripts

e-sensing

sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes

An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<https://stacspec.org/>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organized maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce training samples imbalance proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <https://torch.mlverse.org/>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <https://jakubnowosad.com/supercells/>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>. 
Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.

Maintained by Gilberto Camara. Last updated 1 months ago.

big-earth-data cbers earth-observation eo-datacubes geospatial image-time-series land-cover-classification landsat planetary-computer r-spatial remote-sensing rspatial satellite-image-time-series satellite-imagery sentinel-2 stac-api stac-catalog cpp

3.9 match 494 stars 9.50 score 384 scripts

tidyverse

rvest:Easily Harvest (Scrape) Web Pages

Wrappers around the 'xml2' and 'httr' packages to make it easy to download, then manipulate, HTML and XML.

Maintained by Hadley Wickham. Last updated 5 months ago.

html web-scraping

1.9 match 1.5k stars 19.62 score 29k scripts 546 dependents

r-rust

gifski:Highest Quality GIF Encoder

Multi-threaded GIF encoder written in Rust: <https://gif.ski/>. Converts images to GIF animations using pngquant's efficient cross-frame palettes and temporal dithering with thousands of colors per frame.

Maintained by Jeroen Ooms. Last updated 5 months ago.

rust cargo

3.6 match 74 stars 10.05 score 2.6k scripts 8 dependents

datawookie

emayili:Send Email Messages

A light, simple tool for sending emails with minimal dependencies.

Maintained by Andrew B. Collier. Last updated 1 months ago.

hacktoberfest

3.8 match 180 stars 9.59 score 95 scripts 3 dependents

bioc

ShortRead:FASTQ input and manipulation

This package implements sampling, iteration, and input of FASTQ files. The package includes functions for filtering and trimming reads, and for generating a quality assessment report. Data are represented as DNAStringSet-derived objects, and easily manipulated for a diversity of purposes. The package also contains legacy support for early single-end, ungapped alignment formats.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

dataimport sequencing qualitycontrol bioconductor-package core-package zlib cpp

3.0 match 8 stars 12.08 score 1.8k scripts 49 dependents

michaelchirico

geohashTools:Tools for Working with Geohashes

Tools for working with Gustavo Niemeyer's geohash coordinate system, including API for interacting with other common R GIS libraries.

Maintained by Michael Chirico. Last updated 1 years ago.

5.0 match 52 stars 7.18 score 30 scripts 6 dependents

rstudio

htmltools:Tools for HTML

Tools for HTML generation and output.

Maintained by Carson Sievert. Last updated 10 months ago.

2.0 match 218 stars 17.61 score 10k scripts 4.5k dependents

plotly

plotly:Create Interactive Web Graphics via 'plotly.js'

Create interactive web graphics from 'ggplot2' graphs and/or a custom interface to the (MIT-licensed) JavaScript library 'plotly.js' inspired by the grammar of graphics.

Maintained by Carson Sievert. Last updated 3 months ago.

d3js data-visualization ggplot2 javascript plotly shiny webgl

1.8 match 2.6k stars 19.36 score 93k scripts 783 dependents

myles-lewis

nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'

Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.

Maintained by Myles Lewis. Last updated 5 days ago.

4.4 match 12 stars 7.92 score 46 scripts

ropensci

magick:Advanced Graphics and Image-Processing in R

Bindings to 'ImageMagick': the most comprehensive open-source image processing library available. Supports many common formats (png, jpeg, tiff, pdf, etc) and manipulations (rotate, scale, crop, trim, flip, blur, etc). All operations are vectorized via the Magick++ STL meaning they operate either on a single frame or a series of frames for working with layers, collages, or animation. In RStudio images are automatically previewed when printed to the console, resulting in an interactive editing environment. The latest version of the package includes a native graphics device for creating in-memory graphics or drawing onto images using pixel coordinates.

Maintained by Jeroen Ooms. Last updated 19 days ago.

image-manipulation image-processing imagemagick cpp

2.0 match 468 stars 17.31 score 9.0k scripts 256 dependents

bioc

ensembldb:Utilities to create and use Ensembl-based annotation databases

The package provides functions to create and use transcript-centric annotation databases/packages. The annotations for the databases are fetched directly from Ensembl using their Perl API. The functionality and data are similar to those of the TxDb packages from the GenomicFeatures package, but, in addition to retrieving all gene/transcript models and annotations from the database, ensembldb provides a filter framework that allows retrieving annotations for specific entries, such as genes encoded on a chromosome region or transcript models of lincRNA genes. EnsDb databases built with ensembldb also contain protein annotations and mappings between proteins and their encoding transcripts. Finally, ensembldb provides functions to map between genomic, transcript and protein coordinates.

Maintained by Johannes Rainer. Last updated 5 months ago.

genetics annotationdata sequencing coverage annotation bioconductor bioconductor-packages ensembl

2.4 match 35 stars 14.08 score 892 scripts 108 dependents

hauselin

ollamar:'Ollama' Language Models

An interface to easily run local language models with 'Ollama' <https://ollama.com> server and API endpoints (see <https://github.com/ollama/ollama/blob/main/docs/api.md> for details). It lets you run open-source large language models locally on your machine.

Maintained by Hause Lin. Last updated 2 months ago.

ai api llm llms ollama ollama-api

3.6 match 84 stars 9.36 score 74 scripts 5 dependents

hetong007

rLTP:R Interface to the 'LTP'-Cloud Service

R interface to the 'LTP'-Cloud service for Natural Language Processing in Chinese (http://www.ltp-cloud.com/).

Maintained by Tong He. Last updated 8 years ago.

10.5 match 3 stars 3.18 score 1 scripts

sfirke

janitor:Simple Tools for Examining and Cleaning Dirty Data

The main janitor functions can: perfectly format data.frame column names; provide quick counts of variable combinations (i.e., frequency tables and crosstabs); and explore duplicate records. Other janitor functions nicely format the tabulation results. These tabulate-and-report functions approximate popular features of SPSS and Microsoft Excel. This package follows the principles of the "tidyverse" and works well with the pipe function %>%. janitor was built with beginning-to-intermediate R users in mind and is optimized for user-friendliness.

Maintained by Sam Firke. Last updated 3 months ago.

data-analysis data-cleaning data-science dirty-data excel pivot-tables spss tabulations tidyverse

1.7 match 1.4k stars 19.15 score 35k scripts 231 dependents

schaffman5

rtf:Rich Text Format (RTF) Output

A set of R functions to output Rich Text Format (RTF) files with high resolution tables and graphics that may be edited with a standard word processor such as Microsoft Word.

Maintained by Michael E. Schaffer. Last updated 6 years ago.

3.8 match 5 stars 8.55 score 169 scripts 10 dependents

joshwlambert

DAISIEprep:Extracts Phylogenetic Island Community Data from Phylogenetic Trees

Extracts colonisation and branching times of island species to be used for analysis in the R package 'DAISIE'. It uses phylogenetic and endemicity data to extract the separate island colonists and store them.

Maintained by Joshua W. Lambert. Last updated 1 months ago.

data-science island-biogeography phylogenetics

4.7 match 6 stars 6.78 score 24 scripts

r-lib

httr2:Perform HTTP Requests and Process the Responses

Tools for creating and modifying HTTP requests, then performing them and processing the results. 'httr2' is a modern re-imagining of 'httr' that uses a pipe-based interface and solves more of the problems that API wrapping packages face.

Maintained by Hadley Wickham. Last updated 7 days ago.

http

1.8 match 246 stars 17.66 score 1.9k scripts 1.1k dependents

sherrisherry

cleandata:To Inspect and Manipulate Data; and to Keep Track of This Process

Functions to work with data frames to prepare data for further analysis. The functions for imputation, encoding, partitioning, and other manipulation can produce log files to keep track of the process.

Maintained by Sherry Zhao. Last updated 6 years ago.

data-analysis data-mining machine-learning wrangling

8.5 match 3 stars 3.72 score 35 scripts

r-lib

processx:Execute and Control System Processes

Tools to run system processes in the background. It can check if a background process is running; wait on a background process to finish; get the exit status of finished processes; kill background processes. It can read the standard output and error of the processes, using non-blocking connections. 'processx' can poll a process for standard output or error, with a timeout. It can also poll several processes at once.

Maintained by Gábor Csárdi. Last updated 22 days ago.

2.0 match 235 stars 15.53 score 340 scripts 1.4k dependents

cran

tmcn:A Text Mining Toolkit for Chinese

A text mining toolkit for Chinese, which includes facilities for Chinese string processing, Chinese NLP support, and encoding detection and conversion. Moreover, it provides some functions to support the 'tm' package in Chinese.

Maintained by Jian Li. Last updated 6 years ago.

12.9 match 1 stars 2.38 score 5 dependents

henrikbengtsson

aroma.affymetrix:Analysis of Large Affymetrix Microarray Data Sets

A cross-platform R framework that facilitates processing of any number of Affymetrix microarray samples regardless of computer system. The only parameter that limits the number of chips that can be processed is the amount of available disk space. The Aroma Framework has successfully been used in studies to process tens of thousands of arrays. This package has actively been used since 2006.

Maintained by Henrik Bengtsson. Last updated 1 years ago.

infrastructure proprietaryplatforms exonarray microarray onechannel gui dataimport datarepresentation preprocessing qualitycontrol visualization reportwriting acgh copynumbervariants differentialexpression geneexpression snp transcription affymetrix analysis copy-number dna expression hpc large-scale notebook reproducibility rna

5.3 match 10 stars 5.79 score 112 scripts 3 dependents

jhelvy

logitr:Logit Models w/Preference & WTP Space Utility Parameterizations

Fast estimation of multinomial (MNL) and mixed logit (MXL) models in R. Models can be estimated using "Preference" space or "Willingness-to-pay" (WTP) space utility parameterizations. Weighted models can also be estimated. An option is available to run a parallelized multistart optimization loop with random starting points in each iteration, which is useful for non-convex problems like MXL models or models with WTP space utility parameterizations. The main optimization loop uses the 'nloptr' package to minimize the negative log-likelihood function. Additional functions are available for computing and comparing WTP from both preference space and WTP space models and for predicting expected choices and choice probabilities for sets of alternatives based on an estimated model. Mixed logit models can include uncorrelated or correlated heterogeneity covariances and are estimated using maximum simulated likelihood based on the algorithms in Train (2009) <doi:10.1017/CBO9780511805271>. More details can be found in Helveston (2023) <doi:10.18637/jss.v105.i10>.

Maintained by John Helveston. Last updated 4 months ago.

log-likelihood logit logit-model mixed-logit mlogit multinomial-regression mxl mxl-models preference-space preferences willingness-to-pay wtp

3.3 match 54 stars 9.10 score 119 scripts 1 dependents

bioc

Melissa:Bayesian clustering and imputation of single cell methylomes

Melissa is a Bayesian probabilistic model for jointly clustering and imputing single cell methylomes. This is done by taking into account local correlations via a Generalised Linear Model approach and global similarities using a mixture modelling approach.

Maintained by C. A. Kapourani. Last updated 5 months ago.

immunooncology dnamethylation geneexpression generegulation epigenetics genetics clustering featureextraction regression rnaseq bayesian kegg sequencing coverage singlecell

6.2 match 4.90 score 7 scripts

bioc

ChIPpeakAnno:Batch annotation of the peaks identified from either ChIP-seq, ChIP-chip experiments, or any experiments that result in large number of genomic interval data

The package encompasses a range of functions for identifying the closest gene, exon, miRNA, or custom features—such as highly conserved elements and user-supplied transcription factor binding sites. Additionally, users can retrieve sequences around the peaks and obtain enriched Gene Ontology (GO) or Pathway terms. In version 2.0.5 and beyond, new functionalities have been introduced. These include features for identifying peaks associated with bi-directional promoters along with summary statistics (peaksNearBDP), summarizing motif occurrences in peaks (summarizePatternInPeaks), and associating additional identifiers with annotated peaks or enrichedGO (addGeneIDs). The package integrates with various other packages such as biomaRt, IRanges, Biostrings, BSgenome, GO.db, multtest, and stat to enhance its analytical capabilities.

Maintained by Jianhong Ou. Last updated 2 months ago.

annotation chipseq chipchip

3.4 match 8.75 score 584 scripts 6 dependents

cran

RCurl:General Network (HTTP/FTP/...) Client Interface for R

A wrapper for 'libcurl' <https://curl.se/libcurl/>. Provides functions to allow one to compose general HTTP requests and provides convenient functions to fetch URIs, get & post forms, etc. and process the results returned by the Web server. This provides a great deal of control over the HTTP/FTP/... connection and the form of the request while providing a higher-level interface than is available just using R socket connections. Additionally, the underlying implementation is robust and extensive, supporting FTP/FTPS/TFTP (uploads and downloads), SSL/HTTPS, telnet, dict, ldap, and also supports cookies, redirects, authentication, etc.

Maintained by CRAN Team. Last updated 8 months ago.

curl

3.6 match 2 stars 8.13 score 1.0k dependents

bflammers

ANN2:Artificial Neural Networks for Anomaly Detection

Training of neural networks for classification and regression tasks using mini-batch gradient descent. Special features include a function for training autoencoders, which can be used to detect anomalies, and some related plotting functions. Multiple activation functions are supported, including tanh, relu, step and ramp. For the use of the step and ramp activation functions in detecting anomalies using autoencoders, see Hawkins et al. (2002) <doi:10.1007/3-540-46145-0_17>. Furthermore, several loss functions are supported, including robust ones such as Huber and pseudo-Huber loss, as well as L1 and L2 regularization. The possible options for optimization algorithms are RMSprop, Adam and SGD with momentum. The package contains a vectorized C++ implementation that facilitates fast training through mini-batch learning.

Maintained by Bart Lammers. Last updated 4 years ago.

anomaly-detection artificial-neural-networks autoencoders neural-networks robust-statistics openblas cpp openmp

5.3 match 13 stars 5.59 score 60 scripts

riatelab

gepaf:Google Encoded Polyline Algorithm Format

Encode and decode the Google Encoded Polyline Algorithm Format. See <https://developers.google.com/maps/documentation/utilities/polylinealgorithm> for more information.

Maintained by Timothée Giraud. Last updated 5 months ago.

7.5 match 6 stars 3.84 score 23 scripts

alshum

hashids:Generate Short Unique YouTube-Like IDs (Hashes) from Integers

An R port of the hashids library. hashids generates YouTube-like hashes from integers or vectors of integers. Hashes generated from integers are relatively short, unique and non-sequential. hashids can be used to generate unique ids for URLs and hide database row numbers from the user. By default hashids will avoid generating common English curse words by preventing certain letters being next to each other. hashids are not one-way: it is easy to encode an integer to a hashid and decode a hashid back into an integer.

Maintained by Alex Shum. Last updated 6 years ago.

7.0 match 18 stars 4.10 score 14 scripts

thierryo

qrcode:Generate QRcodes with R

Create static QR codes in R. The content of the QR code is exactly what the user defines. We don't add a redirect URL, making it impossible for us to track the usage of the QR code. This allows generating fast, free-to-use and privacy-friendly QR codes.

Maintained by Thierry Onkelinx. Last updated 6 months ago.

qrcode qrcode-generator r-project

3.6 match 44 stars 7.56 score 456 scripts 7 dependents

enricoschumann

textutils:Utilities for Handling Strings and Text

Utilities for handling character vectors that store human-readable text (either plain or with markup, such as HTML or LaTeX). The package provides, in particular, functions that help with the preparation of plain-text reports, e.g. for expanding and aligning strings that form the lines of such reports. The package also provides generic functions for transforming R objects to HTML and to plain text.

Maintained by Enrico Schumann. Last updated 2 months ago.

html5 string-manipulation

3.7 match 11 stars 7.37 score 47 scripts 12 dependents

dmkaplan2000

knitrdata:Data Language Engine for 'knitr' / 'rmarkdown'

Implements a data language engine for incorporating data directly in 'rmarkdown' documents so that they can be made completely standalone.

Maintained by David M. Kaplan. Last updated 3 years ago.

5.7 match 7 stars 4.75 score 16 scripts

gravesee

onehot:Fast Onehot Encoding for Data.frames

Quickly create numeric matrices for machine learning algorithms that require them. It converts factor columns into onehot vectors.

Maintained by Eric E. Graves. Last updated 6 years ago.

4.9 match 11 stars 5.45 score 86 scripts 2 dependents

egenn

rtemis:Machine Learning and Visualization

Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.

Maintained by E.D. Gennatas. Last updated 1 months ago.

data-science data-visualization machine-learning machine-learning-library visualization

3.8 match 145 stars 7.09 score 50 scripts 2 dependents

tidymodels

hardhat:Construct Modeling Packages

Building modeling packages is hard. A large amount of effort generally goes into providing an implementation for a new method that is efficient, fast, and correct, but often less emphasis is put on the user interface. A good interface requires specialized knowledge about S3 methods and formulas, which the average package developer might not have. The goal of 'hardhat' is to reduce the burden around building new modeling packages by providing functionality for preprocessing, predicting, and validating input.

Maintained by Hannah Frick. Last updated 1 months ago.

1.8 match 103 stars 14.88 score 175 scripts 436 dependents

aphalo

SunCalcMeeus:Sun Position and Daylight Calculations

Compute the position of the sun, and local solar time using Meeus' formulae. Compute day and/or night length using different twilight definitions or arbitrary sun elevation angles. This package is part of the 'r4photobiology' suite, Aphalo, P. J. (2015) <doi:10.19232/uv4pb.2015.1.14>. Algorithms from Meeus (1998, ISBN:0943396611).

Maintained by Pedro J. Aphalo. Last updated 2 months ago.

4.0 match 1 stars 6.49 score 6 scripts 13 dependents

r-lib

desc:Manipulate DESCRIPTION Files

Tools to read, write, create, and manipulate DESCRIPTION files. It is intended for packages that create or manipulate other packages.

Maintained by Gábor Csárdi. Last updated 1 months ago.

1.8 match 123 stars 14.68 score 409 scripts 1.1k dependents

trevorhastie

glmnet:Lasso and Elastic-Net Regularized Generalized Linear Models

Extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression, Cox model, multiple-response Gaussian, and the grouped multinomial regression; see <doi:10.18637/jss.v033.i01> and <doi:10.18637/jss.v039.i05>. There are two new and important additions. The family argument can be a GLM family object, which opens the door to any programmed family (<doi:10.18637/jss.v106.i01>). This comes with a modest computational cost, so when the built-in families suffice, they should be used instead. The other novelty is the relax option, which refits each of the active sets in the path unpenalized. The algorithm uses cyclical coordinate descent in a path-wise fashion, as described in the papers cited.

Maintained by Trevor Hastie. Last updated 2 years ago.

fortran cpp

1.7 match 82 stars 15.15 score 22k scripts 736 dependents

luca-scr

mclust:Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation

Gaussian finite mixture models fitted via EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.

Maintained by Luca Scrucca. Last updated 11 months ago.

fortran openblas

2.0 match 21 stars 12.23 score 6.6k scripts 587 dependents

bioc

IPO:Automated Optimization of XCMS Data Processing parameters

The outcome of XCMS data processing strongly depends on the parameter settings. IPO (`Isotopologue Parameter Optimization`) is a parameter optimization tool that is applicable for different kinds of samples and liquid chromatography coupled to high resolution mass spectrometry devices, fast and free of labeling steps. IPO uses natural, stable 13C isotopes to calculate a peak picking score. Retention time correction is optimized by minimizing the relative retention time differences within features and grouping parameters are optimized by maximizing the number of features showing exactly one peak from each injection of a pooled sample. The different parameter settings are achieved by design of experiment. The resulting scores are evaluated using response surface models.

Maintained by Thomas Lieb. Last updated 5 months ago.

immunooncology metabolomics massspectrometry

3.0 match 34 stars 8.14 score 41 scripts

symbolrush

osrmr:Wrapper for the 'OSRM' API

Wrapper around the 'Open Source Routing Machine (OSRM)' API <http://project-osrm.org/>. 'osrmr' works with API versions 4 and 5 and can handle servers that run locally as well as the 'OSRM' webserver.

Maintained by Adrian Staempfli. Last updated 4 years ago.

8.0 match 3.06 score 23 scripts

schochastics

shortuuid:Generate and Translate Standard UUIDs

Generate and translate standard UUIDs into shorter - or just different - formats and back. Also implements base58 encoders and decoders.

Maintained by David Schoch. Last updated 7 months ago.

cpp

9.4 match 4 stars 2.60 score 4 scripts

myominnoo

mStats:Medical Statistics & Epidemiological Analysis

A set of tidyverse-friendly functions for data management, calculation of epidemiological measures, statistical analysis, and table creation.

Maintained by Myo Minn Oo. Last updated 1 years ago.

data-management epidemiological-calculations medical-statistics

4.9 match 4.98 score 16 scripts 1 dependents

rubenarslan

codebook:Automatic Codebooks from Metadata Encoded in Dataset Attributes

Easily automate the following tasks to describe data frames: Summarise the distributions, and labelled missings of variables graphically and using descriptive statistics. For surveys, compute and summarise reliabilities (internal consistencies, retest, multilevel) for psychological scales. Combine this information with metadata (such as item labels and labelled values) that is derived from R attributes. To do so, the package relies on 'rmarkdown' partials, so you can generate HTML, PDF, and Word documents. Codebooks are also available as tables (CSV, Excel, etc.) and in JSON-LD, so that search engines can find your data and index the metadata. The metadata are also available at your fingertips via RStudio Addins.

Maintained by Ruben Arslan. Last updated 3 months ago.

codebook documentation formr json-ld metadata spss webapp

2.9 match 142 stars 8.31 score 229 scripts

ben519

mltools:Machine Learning Tools

A collection of machine learning helper functions, particularly assisting in the Exploratory Data Analysis phase. Makes heavy use of the 'data.table' package for optimal speed and memory efficiency. Highlights include a versatile bin_data() function, sparsify() for converting a data.table to sparse matrix format with one-hot encoding, fast evaluation metrics, and empirical_cdf() for calculating empirical Multivariate Cumulative Distribution Functions.

Maintained by Ben Gorman. Last updated 3 years ago.

exploratory-data-analysis machine-learning

2.5 match 72 stars 9.58 score 1.2k scripts 13 dependents

cran

notebookutils:Dummy R APIs Used in 'Azure Synapse Analytics' for Local Developments

This is a pure dummy interface package which mirrors 'MsSparkUtils' APIs <https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/microsoft-spark-utilities?pivots=programming-language-r> of 'Azure Synapse Analytics' <https://learn.microsoft.com/en-us/azure/synapse-analytics/> for R users. Customers of Azure Synapse can download this package from CRAN for local development.

Maintained by runtimeexp. Last updated 11 months ago.

10.2 match 2.36 score 23 scripts

bioc

GenomicDistributions:GenomicDistributions: fast analysis of genomic intervals with Bioconductor

If you have a set of genomic ranges, this package can help you with visualization and comparison. It produces several kinds of plots, for example: Chromosome distribution plots, which visualize how your regions are distributed over chromosomes; feature distance distribution plots, which visualize how your regions are distributed relative to a feature of interest, like Transcription Start Sites (TSSs); genomic partition plots, which visualize how your regions overlap given genomic features such as promoters, introns, exons, or intergenic regions. It also makes it easy to compare one set of ranges to another.

Maintained by Kristyna Kupkova. Last updated 5 months ago.

software genomeannotation genomeassembly datarepresentation sequencing coverage functionalgenomics visualization

3.2 match 26 stars 7.44 score 25 scripts

bioc

multiGSEA:Combining GSEA-based pathway enrichment with multi omics data integration

Extracted features from pathways derived from 8 different databases (KEGG, Reactome, Biocarta, etc.) can be used on transcriptomic, proteomic, and/or metabolomic level to calculate a combined GSEA-based enrichment score.

Maintained by Sebastian Canzler. Last updated 2 months ago.

genesetenrichment pathways reactome biocarta

3.4 match 18 stars 7.06 score 32 scripts

vandomed

accelerometry:Functions for Processing Accelerometer Data

A collection of functions that perform operations on time-series accelerometer data, such as identifying non-wear time, flagging minutes that are part of an activity bout, and finding the maximum 10-minute average count value. The functions are generally very flexible, allowing for a variety of algorithms to be implemented. Most of the functions are written in C++ for efficiency.

Maintained by Dane R. Van Domelen. Last updated 6 years ago.

accelerometer exercise moving-average physical-activity sedentary-life wearable-devices cpp

3.5 match 6 stars 6.62 score 31 scripts 5 dependents

bioc

ModCon:Modifying splice site usage by changing the mRNP code, while maintaining the genetic code

Collection of functions to calculate a nucleotide sequence surrounding for splice donor sites to either activate or repress donor usage. The proposed alternative nucleotide sequence encodes the same amino acid and could be applied e.g. in reporter systems to silence or activate cryptic splice donor sites.

Maintained by Johannes Ptok. Last updated 5 months ago.

functionalgenomics alternativesplicing

5.8 match 1 stars 4.00 score 2 scripts

bioc

xCell2:A Tool for Generic Cell Type Enrichment Analysis

xCell2 provides methods for cell type enrichment analysis using cell type signatures. It includes three main functions - 1. xCell2Train for training custom reference objects from bulk or single-cell RNA-seq datasets. 2. xCell2Analysis for conducting the cell type enrichment analysis using the custom reference. 3. xCell2GetLineage for identifying dependencies between different cell types using ontology.

Maintained by Almog Angel. Last updated 2 months ago.

geneexpression transcriptomics microarray rnaseq singlecell differentialexpression immunooncology genesetenrichment

3.8 match 6 stars 6.17 score 15 scripts

r-dbi

RPostgres:C++ Interface to PostgreSQL

Fully DBI-compliant C++-backed interface to PostgreSQL <https://www.postgresql.org/>, an open-source relational database.

Maintained by Kirill Müller. Last updated 19 days ago.

database postgres postgresql cpp

1.5 match 338 stars 14.78 score 1.6k scripts 31 dependents

gjmvanboxtel

gsignal:Signal Processing

R implementation of the 'Octave' package 'signal', containing a variety of signal processing tools, such as signal generation and measurement, correlation and convolution, filtering, filter design, filter analysis and conversion, power spectrum analysis, system identification, decimation and sample rate change, and windowing.

Maintained by Geert van Boxtel. Last updated 2 months ago.

signal-processing signals cpp

2.3 match 24 stars 10.03 score 133 scripts 34 dependents

shikokuchuo

secretbase:Cryptographic Hash, Extendable-Output and Base64 Functions

Fast and memory-efficient streaming hash functions and base64 encoding / decoding. Hashes strings and raw vectors directly. Stream hashes files which can be larger than memory, as well as in-memory objects through R's serialization mechanism. Implementations include the SHA-256, SHA-3 and 'Keccak' cryptographic hash functions, SHAKE256 extendable-output function (XOF), and 'SipHash' pseudo-random function.

Maintained by Charlie Gao. Last updated 1 days ago.

base64 cryptographic-hash-functions extendable-output-functions keccak sha256 sha3 shake256 siphash

2.8 match 11 stars 8.14 score 8 scripts 24 dependents

melff

memisc:Management of Survey Data and Presentation of Analysis Results

An infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) 'SPSS' and 'Stata' files is provided. Further, the package can produce tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates, which can be exported to 'LaTeX' and HTML.

Maintained by Martin Elff. Last updated 11 days ago.

survey-data

1.8 match 46 stars 12.34 score 1.2k scripts 13 dependents

insightsengineering

tern:Create Common TLGs Used in Clinical Trials

Table, Listings, and Graphs (TLG) library for common outputs used in clinical trials.

Maintained by Joe Zhu. Last updated 2 months ago.

clinical-trials graphs listings nest outputs tables

1.8 match 79 stars 12.62 score 186 scripts 9 dependents

r-lib

nanoparquet:Read and Write 'Parquet' Files

Self-sufficient reader and writer for flat 'Parquet' files. Can read most 'Parquet' data types. Can write many 'R' data types, including factors and temporal types. See docs for limitations.

Maintained by Gábor Csárdi. Last updated 22 days ago.

parquet cpp

2.3 match 60 stars 9.78 score 99 scripts 8 dependents

symbolixau

googleway:Accesses Google Maps APIs to Retrieve Data and Plot Maps

Provides a mechanism to plot a 'Google Map' from 'R' and overlay it with shapes and markers. Also provides access to 'Google Maps' APIs, including places, directions, roads, distances, geocoding, elevation and timezone.

Maintained by David Cooley. Last updated 6 months ago.

google-map google-maps google-maps-api google-maps-javascript-api spatial spatial-analysis

2.3 match 236 stars 9.67 score 536 scripts 2 dependents

ouhscbbmc

REDCapR:Interaction Between R and REDCap

Encapsulates functions to streamline calls from R to the REDCap API. REDCap (Research Electronic Data CAPture) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The Application Programming Interface (API) offers an avenue to access and modify data programmatically, improving the capacity for literate and reproducible programming.

Maintained by Will Beasley. Last updated 2 months ago.

redcap redcap-api

1.8 match 118 stars 12.36 score 438 scripts 6 dependents

andrija-djurovic

PDtoolkit:Collection of Tools for PD Rating Model Development and Validation

The goal of this package is to cover the most common steps in probability of default (PD) rating model development and validation. The main procedures available are those that refer to univariate, bivariate, multivariate analysis, calibration and validation. Along with the accompanying 'monobin' and 'monobinShiny' packages, 'PDtoolkit' provides functions which are suitable for different data transformation and modeling tasks such as: imputations, monotonic binning of numeric risk factors, binning of categorical risk factors, weights of evidence (WoE) and information value (IV) calculations, WoE coding (replacement of risk factors modalities with WoE values), risk factor clustering, area under curve (AUC) calculation and others. Additionally, the package provides a set of validation functions for testing homogeneity, heterogeneity, discriminatory and predictive power of the model.

Maintained by Andrija Djurovic. Last updated 1 years ago.

4.5 match 14 stars 4.78 score 86 scripts

trelliscope

trelliscope:Create Interactive Multi-Panel Displays

Trelliscope enables interactive exploration of data frames of visualizations.

Maintained by Ryan Hafen. Last updated 7 months ago.

visualization

3.3 match 29 stars 6.43 score 117 scripts

fanhansen

creditmodel:Toolkit for Credit Modeling, Analysis and Visualization

Provides a highly efficient R tool suite for Credit Modeling, Analysis and Visualization. Contains infrastructure functionalities such as data exploration and preparation, missing values treatment, outliers treatment, variable derivation, variable selection, dimensionality reduction, grid search for hyper parameters, data mining and visualization, model evaluation, strategy analysis etc. This package is designed to make the development of binary classification models (machine learning based models as well as credit scorecard) simpler and faster. The references include: 1. Refaat, M. (2011, ISBN: 9781447511199). Credit Risk Scorecard: Development and Implementation Using SAS; 2. Bezdek, James C. FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences (0098-3004), <DOI:10.1016/0098-3004(84)90020-7>.

Maintained by Dongping Fan. Last updated 3 years ago.

6.1 match 4 stars 3.48 score 15 scripts

r-lib

gmailr:Access the 'Gmail' 'RESTful' API

An interface to the 'Gmail' 'RESTful' API. Allows access to your 'Gmail' messages, threads, drafts and labels.

Maintained by Jennifer Bryan. Last updated 1 years ago.

1.8 match 230 stars 11.49 score 289 scripts 1 dependents

ruthkr

deepredeff:Deep Learning Prediction of Effectors

A tool that contains trained deep learning models for predicting effector proteins. 'deepredeff' has been trained to identify effector proteins using a set of known experimentally validated effectors from either bacteria, fungi, or oomycetes. Documentation is available via several vignettes, and the paper by Kristianingsih and MacLean (2020) <doi:10.1101/2020.07.08.193250>.

Maintained by Ruth Kristianingsih. Last updated 2 years ago.

4.3 match 4 stars 4.86 score 18 scripts

apache

nanoarrow:Interface to the 'nanoarrow' 'C' Library

Provides an 'R' interface to the 'nanoarrow' 'C' library and the 'Apache Arrow' application binary interface. Functions to import and export 'ArrowArray', 'ArrowSchema', and 'ArrowArrayStream' 'C' structures to and from 'R' objects are provided alongside helpers to facilitate zero-copy data transfer among 'R' bindings to libraries implementing the 'Arrow' 'C' data interface.

Maintained by Dewey Dunnington. Last updated 23 hours ago.

cpp

1.8 match 183 stars 11.79 score 37 scripts 27 dependents

bodkan

slendr:A Simulation Framework for Spatiotemporal Population Genetics

A framework for simulating spatially explicit genomic data which leverages real cartographic information for programmatic and visual encoding of spatiotemporal population dynamics on real geographic landscapes. Population genetic models are then automatically executed by the 'SLiM' software by Haller et al. (2019) <doi:10.1093/molbev/msy228> behind the scenes, using a custom built-in simulation 'SLiM' script. Additionally, fully abstract spatial models not tied to a specific geographic location are supported, and users can also simulate data from standard, non-spatial, random-mating models. These can be simulated either with the 'SLiM' built-in back-end script, or using an efficient coalescent population genetics simulator 'msprime' by Baumdicker et al. (2022) <doi:10.1093/genetics/iyab229> with a custom-built 'Python' script bundled with the R package. Simulated genomic data is saved in a tree-sequence format and can be loaded, manipulated, and summarised using tree-sequence functionality via an R interface to the 'Python' module 'tskit' by Kelleher et al. (2019) <doi:10.1038/s41588-019-0483-y>. Complete model configuration, simulation and analysis pipelines can therefore be constructed without a need to leave the R environment, eliminating friction between disparate tools for population genetic simulations and data analysis.

Maintained by Martin Petr. Last updated 11 days ago.

popgen population-genetics simulations spatial-statistics

2.3 match 56 stars 9.15 score 88 scripts

vjcitn

combinat:combinatorics utilities

routines for combinatorics

Maintained by Vince Carey. Last updated 12 years ago.

2.7 match 7.75 score 744 scripts 229 dependents

richfitz

storr:Simple Key Value Stores

Creates and manages simple key-value stores. These can use a variety of approaches for storing the data. This package implements the base methods and support for file system, in-memory and DBI-based database stores.

Maintained by Rich FitzJohn. Last updated 4 years ago.

2.0 match 117 stars 10.21 score 57 scripts 33 dependents

cran

wavethresh:Wavelets Statistics and Transforms

Performs 1, 2 and 3D real and complex-valued wavelet transforms, nondecimated transforms, wavelet packet transforms, nondecimated wavelet packet transforms, multiple wavelet transforms, complex-valued wavelet transforms, wavelet shrinkage for various kinds of data, locally stationary wavelet time series, nonstationary multiscale transfer function modeling, density estimation.

Maintained by Guy Nason. Last updated 7 months ago.

3.5 match 5.89 score 41 dependents

qinwf

jiebaR:Chinese Text Segmentation

Chinese text segmentation, keyword extraction and speech tagging for R.

Maintained by Qin Wenfeng. Last updated 5 years ago.

chinese chinese-text-segmentation cppjieba jieba lexical-analysis nlp cpp

2.0 match 348 stars 10.18 score 456 scripts 6 dependents

michelnivard

gptstudio:Use Large Language Models Directly in your Development Environment

Large language models are readily accessible via API. This package lowers the barrier to use the API inside of your development environment. For more on the API, see <https://platform.openai.com/docs/introduction>.

Maintained by James Wade. Last updated 5 days ago.

chatgpt gpt-3 rstudio rstudio-addin

1.9 match 924 stars 10.83 score 43 scripts 1 dependents

bioc

DropletUtils:Utilities for Handling Single-Cell Droplet Data

Provides a number of utility functions for handling single-cell (RNA-seq) data from droplet technologies such as 10X Genomics. This includes data loading from count matrices or molecule information files, identification of cells from empty droplets, removal of barcode-swapped pseudo-cells, and downsampling of the count matrix.

Maintained by Jonathan Griffiths. Last updated 3 months ago.

immunooncology singlecell sequencing rnaseq geneexpression transcriptomics dataimport coverage zlib cpp

2.0 match 10.08 score 2.7k scripts 9 dependents

ropensci

drake:A Pipeline Toolkit for Reproducible Computation at Scale

A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch, there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website <https://docs.ropensci.org/drake/> and the online manual <https://books.ropensci.org/drake/>.

Maintained by William Michael Landau. Last updated 3 months ago.

data-science drake high-performance-computing makefile peer-reviewed pipeline reproducibility reproducible-research ropensci workflow

1.8 match 1.3k stars 11.49 score 1.7k scripts 1 dependents

bioc

immunotation:Tools for working with diverse immune genes

MHC (major histocompatibility complex) molecules are cell surface complexes that present antigens to T cells. The repertoire of antigens presented in a given genetic background largely depends on the sequence of the encoded MHC molecules, and thus, in humans, on the highly variable HLA (human leukocyte antigen) genes of the hyperpolymorphic HLA locus. More than 28,000 different HLA alleles have been reported, with significant differences in allele frequencies between human populations worldwide. Reproducible and consistent annotation of HLA alleles in large-scale bioinformatics workflows remains challenging, because the available reference databases and software tools often use different HLA naming schemes. The package immunotation provides tools for consistent annotation of HLA genes in typical immunoinformatics workflows such as the prediction of MHC-presented peptides in different human donors. Converter functions that provide mappings between different HLA naming schemes are based on the MHC restriction ontology (MRO). The package also provides automated access to HLA allele frequencies in worldwide human reference populations stored in the Allele Frequency Net Database.

Maintained by Katharina Imkeller. Last updated 5 months ago.

software immunooncology biomedicalinformatics genetics annotation

4.1 match 8 stars 4.90 score 3 scripts

germanrecordlinkage

PPRL:Privacy Preserving Record Linkage

A toolbox for deterministic, probabilistic and privacy-preserving record linkage techniques. Combines the functionality of the 'Merge ToolBox' (<https://www.record-linkage.de>) with current privacy-preserving techniques.

Maintained by Dorothea Rukasz. Last updated 2 years ago.

cpp

7.5 match 2 stars 2.64 score 22 scripts

ropensci

frictionless:Read and Write Frictionless Data Packages

Read and write Frictionless Data Packages. A 'Data Package' (<https://specs.frictionlessdata.io/data-package/>) is a simple container format and standard to describe and package a collection of (tabular) data. It is typically used to publish FAIR (<https://www.go-fair.org/fair-principles/>) and open datasets.

Maintained by Peter Desmet. Last updated 6 months ago.

frictionlessdata oscibio

2.0 match 30 stars 9.79 score 55 scripts 6 dependents

tidymodels

textrecipes:Extra 'Recipes' for Text Processing

Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf) and feature hashing.

Maintained by Emil Hvitfeldt. Last updated 8 days ago.

1.8 match 160 stars 10.87 score 964 scripts 1 dependents

coolbutuseless

fastpng:Read and Write PNG Files with Configurable Decoder/Encoder Options

Read and write PNG images with arrays, rasters, native rasters, numeric arrays, integer arrays, raw vectors and indexed values. This PNG encoder exposes configurable internal options enabling the user to select a speed-size tradeoff. For example, disabling compression can speed up writing PNG by a factor of 50. Multiple image formats are supported including raster, native rasters, and integer and numeric arrays at color depths of 1, 2, 3 or 4. 16-bit images are also supported. This implementation uses the 'libspng' 'C' library which is available from <https://github.com/randy408/libspng/>.

Maintained by Mike Cheng. Last updated 2 months ago.

3.3 match 18 stars 5.86 score 7 scripts

ncss-tech

soilDB:Soil Database Interface

A collection of functions for reading soil data from U.S. Department of Agriculture Natural Resources Conservation Service (USDA-NRCS) and National Cooperative Soil Survey (NCSS) databases.

Maintained by Andrew Brown. Last updated 6 days ago.

kssl nasis nrcs soil soil-data-access soil-survey soilweb sql usda

1.7 match 87 stars 11.34 score 1.0k scripts 1 dependents

bioc

Structstrings:Implementation of the dot bracket annotations with Biostrings

The Structstrings package implements the widely used dot bracket annotation for storing base pairing information in structured RNA. Structstrings uses the infrastructure provided by the Biostrings package and derives the DotBracketString and related classes from the BString class. From these, base pair tables can be produced for in depth analysis. In addition, the loop indices of the base pairs can be retrieved as well. For better efficiency, information conversion is implemented in C, inspired to a large extent by the ViennaRNA package.

Maintained by Felix G.M. Ernst. Last updated 4 months ago.

dataimport datarepresentation infrastructure sequencing software alignment sequencematching bioconductor rna rna-structural-analysis rna-structure sequences structures

3.0 match 4 stars 6.46 score 3 scripts 4 dependents

bioc

seqTools:Analysis of nucleotide, sequence and quality content on fastq files

Analyze read length, phred scores and alphabet frequency and DNA k-mers on uncompressed and compressed fastq files.

Maintained by Wolfgang Kaisers. Last updated 5 months ago.

qualitycontrol sequencing zlib

3.5 match 5.57 score 52 scripts 1 dependents

ropensci

av:Working with Audio and Video in R

Bindings to 'FFmpeg' <http://www.ffmpeg.org/> AV library for working with audio and video in R. Generates high quality video from images or R graphics with custom audio. Also offers high performance tools for reading raw audio, creating 'spectrograms', and converting between countless audio / video formats. This package interfaces directly to the C API and does not require any command line utilities.

Maintained by Jeroen Ooms. Last updated 1 months ago.

ffmpeg

1.9 match 93 stars 10.28 score 552 scripts 15 dependents

s-u

PKI:Public Key Infrastructure for R Based on the X.509 Standard

Public Key Infrastructure functions such as verifying certificates, RSA encryption and signing which can be used to build PKI infrastructure and perform cryptographic tasks.

Maintained by Simon Urbanek. Last updated 7 months ago.

openssl

2.3 match 18 stars 8.52 score 63 scripts 8 dependents

prabinameher

EncDNA:Encoding of Nucleotide Sequences into Numeric Feature Vectors

We describe fifteen different splice site sequence encoding schemes that have been used in earlier studies for mapping of splice site sequences into numeric feature vectors. These encoding schemes will also be helpful for transforming other nucleotide sequences into numeric forms, provided they are of equal length. These encoding schemes will help the computational biologist working in the field of classification (binary or multiclass) or prediction involving nucleic acid sequences of equal length.

Maintained by Prabina Kumar Meher. Last updated 6 years ago.

19.1 match 1 stars 1.00 score

program--

hilbert:Coordinate Indexing on Hilbert Curves

Provides utilities for encoding and decoding coordinates to/from Hilbert curves based on the iterative encoding implementation described in Chen et al. (2006) <doi:10.1002/spe.793>.

Maintained by Justin Singh-Mohudpur. Last updated 3 years ago.

hilbert-curve spatial cpp

4.3 match 5 stars 4.40 score 5 scripts

bioc

orthos:`orthos` is an R package for variance decomposition using conditional variational auto-encoders

`orthos` decomposes RNA-seq contrasts, for example obtained from a gene knock-out or compound treatment experiment, into unspecific and experiment-specific components. Original and decomposed contrasts can be efficiently queried against a large database of contrasts (derived from ARCHS4, https://maayanlab.cloud/archs4/) to identify similar experiments. `orthos` furthermore provides plotting functions to visualize the results of such a search for similar contrasts.

Maintained by Panagiotis Papasaikas. Last updated 4 days ago.

rnaseq differentialexpression geneexpression

4.4 match 4.18 score 2 scripts

ironholds

olctools:Open Location Code Handling in R

'Open Location Codes' (https://openlocationcode.com/) are a Google-created standard for identifying geographic locations. olctools provides utilities for validating, encoding and decoding entries that follow this standard.

Maintained by Oliver Keyes. Last updated 9 years ago.

cpp

3.6 match 13 stars 5.16 score 11 scripts

r-quantities

errors:Uncertainty Propagation for R Vectors

Support for measurement errors in R vectors, matrices and arrays: automatic uncertainty propagation and reporting. Documentation about 'errors' is provided in the paper by Ucar, Pebesma & Azcorra (2018, <doi:10.32614/RJ-2018-075>), included in this package as a vignette; see 'citation("errors")' for details.

Maintained by Iñaki Ucar. Last updated 2 months ago.

error-propagation uncertainty

2.3 match 49 stars 8.18 score 86 scripts 4 dependents

gdemin

expss:Tables, Labels and Some Useful Functions from Spreadsheets and 'SPSS' Statistics

Package computes and displays tables with support for 'SPSS'-style labels, multiple and nested banners, weights, multiple-response variables and significance testing. There are facilities for nice output of tables in 'knitr', 'Shiny', '*.xlsx' files, R and 'Jupyter' notebooks. Methods for labelled variables add value labels support to base R functions and to some functions from other packages. Additionally, the package brings popular data transformation functions from 'SPSS' Statistics and 'Excel': 'RECODE', 'COUNT', 'COUNTIF', 'VLOOKUP', etc. These functions are very useful for data processing in marketing research surveys. The package is intended to help people move data processing from 'Excel' and 'SPSS' to R.

Maintained by Gregory Demin. Last updated 11 months ago.

excel labels labels-support msexcel pivot-tables recode spss spss-statistics tables variable-labels vlookup

1.7 match 84 stars 11.00 score 1.8k scripts 4 dependents

gojiplus

tuber:Client for the YouTube API

Get comments posted on YouTube videos, information on how many times a video has been liked, search for videos with particular content, and much more. You can also scrape captions from a few videos. To learn more about the YouTube API, see <https://developers.google.com/youtube/v3/>.

Maintained by Gaurav Sood. Last updated 8 months ago.

access-youtube caption video youtube youtube-api youtube-oauth

2.0 match 184 stars 8.99 score 206 scripts

bergsmat

yamlet:Versatile Curation of Table Metadata

A YAML-based mechanism for working with table metadata. Supports compact syntax for creating, modifying, viewing, exporting, importing, displaying, and plotting metadata coded as column attributes. The 'yamlet' dialect is valid 'YAML' with defaults and conventions chosen to improve readability. See ?yamlet, ?decorate, ?modify, ?io_csv, and ?ggplot.decorated.

Maintained by Tim Bergsma. Last updated 22 days ago.

3.0 match 2 stars 5.99 score 60 scripts 1 dependents

natverse

nat:NeuroAnatomy Toolbox for Analysis of 3D Image Data

NeuroAnatomy Toolbox (nat) enables analysis and visualisation of 3D biological image data, especially traced neurons. Reads and writes 3D images in NRRD and 'Amira' AmiraMesh formats and reads surfaces in 'Amira' hxsurf format. Traced neurons can be imported from and written to SWC and 'Amira' LineSet and SkeletonGraph formats. These data can then be visualised in 3D via 'rgl', manipulated including applying calculated registrations, e.g. using the 'CMTK' registration suite, and analysed. There is also a simple representation for neurons that have been subjected to 3D skeletonisation but not formally traced; this allows morphological comparison between neurons including searches and clustering (via the 'nat.nblast' extension package).

Maintained by Gregory Jefferis. Last updated 5 months ago.

3d connectomics image-analysis neuroanatomy neuroanatomy-toolbox neuron neuron-morphology neuroscience visualisation

1.8 match 67 stars 9.94 score 436 scripts 2 dependents

rapler

dst:Using the Theory of Belief Functions

Using the Theory of Belief Functions for evidence calculus. Basic probability assignments, or mass functions, can be defined on the subsets of a set of possible values and combined. A mass function can be extended to a larger frame. Marginalization, i.e. reduction to a smaller frame can also be done. These features can be combined to analyze small belief networks and take into account situations where information cannot be satisfactorily described by probability distributions.

Maintained by Peiyuan Zhu. Last updated 3 months ago.

3.0 match 6 stars 5.96 score 126 scripts

nepem-ufsc

metan:Multi Environment Trials Analysis

Performs stability analysis of multi-environment trial data using parametric and non-parametric methods. Parametric methods include Additive Main Effects and Multiplicative Interaction (AMMI) analysis by Gauch (2013) <doi:10.2135/cropsci2013.04.0241>, Ecovalence by Wricke (1965), Genotype plus Genotype-Environment (GGE) biplot analysis by Yan & Kang (2003) <doi:10.1201/9781420040371>, geometric adaptability index by Mohammadi & Amri (2008) <doi:10.1007/s10681-007-9600-6>, joint regression analysis by Eberhart & Russel (1966) <doi:10.2135/cropsci1966.0011183X000600010011x>, genotypic confidence index by Annicchiarico (1992), Murakami & Cruz's (2004) method, power law residuals (POLAR) statistics by Doring et al. (2015) <doi:10.1016/j.fcr.2015.08.005>, scale-adjusted coefficient of variation by Doring & Reckling (2018) <doi:10.1016/j.eja.2018.06.007>, stability variance by Shukla (1972) <doi:10.1038/hdy.1972.87>, weighted average of absolute scores by Olivoto et al. (2019a) <doi:10.2134/agronj2019.03.0220>, and multi-trait stability index by Olivoto et al. (2019b) <doi:10.2134/agronj2019.03.0221>. Non-parametric methods include superiority index by Lin & Binns (1988) <doi:10.4141/cjps88-018>, nonparametric measures of phenotypic stability by Huehn (1990) <doi:10.1007/BF00024241>, TOP third statistic by Fox et al. (1990) <doi:10.1007/BF00040364>. Functions for computing biometrical analysis such as path analysis, canonical correlation, partial correlation, clustering analysis, and tools for inspecting, manipulating, summarizing and plotting typical multi-environment trial data are also provided.

Maintained by Tiago Olivoto. Last updated 9 days ago.

1.9 match 2 stars 9.48 score 1.3k scripts 2 dependents

statnet

rle:Common Functions for Run-Length Encoded Vectors

Common 'base' and 'stats' methods for 'rle' objects, aiming to make it possible to treat them transparently as vectors.

Maintained by Pavel N. Krivitsky. Last updated 4 months ago.

2.9 match 2 stars 6.07 score 1 scripts 37 dependents

jamesyang007

adelie:Group Lasso and Elastic Net Solver for Generalized Linear Models

Extremely efficient procedures for fitting the entire group lasso and group elastic net regularization path for GLMs, multinomial, the Cox model and multi-task Gaussian models. Similar to the R package 'glmnet' in scope of models, and in computational speed. This package provides R bindings to the C++ code underlying the corresponding Python package 'adelie'. These bindings offer a general purpose group elastic net solver, a wide range of matrix classes that can exploit special structure to allow large-scale inputs, and an assortment of generalized linear model classes for fitting various types of data. The package is an implementation of Yang, J. and Hastie, T. (2024) <doi:10.48550/arXiv.2405.08631>.

Maintained by Trevor Hastie. Last updated 15 days ago.

cpp openmp

3.0 match 6 stars 5.86 score 3 scripts

bioc

ROntoTools:R Onto-Tools suite

Suite of tools for functional analysis.

Maintained by Sorin Draghici. Last updated 5 months ago.

networkanalysis microarray graphsandnetworks

3.4 match 5.10 score 15 scripts 2 dependents

hope-data-science

tidyfst:Tidy Verbs for Fast Data Manipulation

A toolkit of tidy data manipulation verbs with 'data.table' as the backend. Combining the syntactic elegance of 'dplyr' with the computing performance of 'data.table', 'tidyfst' intends to provide users with state-of-the-art data manipulation tools with the least pain. This package is an extension of 'data.table'. While enjoying a tidy syntax, it also wraps combinations of efficient functions to facilitate frequently-used data operations.

Maintained by Tian-Yuan Huang. Last updated 6 months ago.

1.7 match 98 stars 10.09 score 118 scripts 4 dependents

bioc

OUTRIDER:OUTRIDER - OUTlier in RNA-Seq fInDER

Identification of aberrant gene expression in RNA-seq data. Read count expectations are modeled by an autoencoder to control for confounders in the data. Given these expectations, the RNA-seq read counts are assumed to follow a negative binomial distribution with a gene-specific dispersion. Outliers are then identified as read counts that significantly deviate from this distribution. Furthermore, OUTRIDER provides useful plotting functions to analyze and visualize the results.

Maintained by Christian Mertes. Last updated 5 months ago.

immunooncology rnaseq transcriptomics alignment sequencing geneexpression genetics count-data diagnostics expression-analysis mendelian-genetics outlier-detection rna-seq openblas cpp

1.9 match 49 stars 9.07 score 110 scripts 1 dependents

tim-band

shinylight:Web Interface to 'R' Functions

Web front end for your 'R' functions producing plots or tables. If you have a function or set of related functions, you can make them available over the internet through a web browser. This is the same motivation as the 'shiny' package, but note that the development of 'shinylight' is not in any way linked to that of 'shiny' (beyond the use of the 'httpuv' package). You might prefer 'shinylight' to 'shiny' if you want a lighter weight deployment with easier horizontal scaling, or if you want to develop your front end yourself in JavaScript and HTML just using a lightweight remote procedure call interface to your R code on the server.

Maintained by Tim Band. Last updated 1 years ago.

5.3 match 3.18 score 1 scripts 1 dependents

bioc

HPiP:Host-Pathogen Interaction Prediction

HPiP (Host-Pathogen Interaction Prediction) uses an ensemble learning algorithm for prediction of host-pathogen protein-protein interactions (HP-PPIs) using structural and physicochemical descriptors computed from amino acid-composition of host and pathogen proteins. The proposed package can effectively address data shortages and data unavailability for HP-PPI network reconstructions. Moreover, establishing computational frameworks in that regard will reveal mechanistic insights into infectious diseases and suggest potential HP-PPI targets, thus narrowing down the range of possible candidates for subsequent wet-lab experimental validations.

Maintained by Matineh Rahmatbakhsh. Last updated 5 months ago.

proteomics systemsbiology networkinference structuralprediction geneprediction network

3.4 match 3 stars 4.95 score 6 scripts

bioc

epigraHMM:Epigenomic R-based analysis with hidden Markov models

epigraHMM provides a set of tools for the analysis of epigenomic data based on hidden Markov Models. It contains two separate peak callers, one for consensus peaks from biological or technical replicates, and one for differential peaks from multi-replicate multi-condition experiments. In differential peak calling, epigraHMM provides window-specific posterior probabilities associated with every possible combinatorial pattern of read enrichment across conditions.

Maintained by Pedro Baldoni. Last updated 5 months ago.

chipseq atacseq dnaseseq hiddenmarkovmodel epigenetics zlib openblas cpp openmp

3.4 match 4.94 score 88 scripts

bioc

ChIPseqR:Identifying Protein Binding Sites in High-Throughput Sequencing Data

ChIPseqR identifies protein binding sites from ChIP-seq and nucleosome positioning experiments. The model used to describe binding events was developed to locate nucleosomes but should be flexible enough to handle other types of experiments as well.

Maintained by Peter Humburg. Last updated 5 months ago.

chipseq infrastructure

3.5 match 4.70 score 1 scripts

henrikbengtsson

R.matlab:Read and Write MAT Files and Call MATLAB from Within R

Methods readMat() and writeMat() for reading and writing MAT files. For users with MATLAB v6 or newer installed (either locally or on a remote host), the package also provides methods for controlling MATLAB (trademark) via R and sending and retrieving data between R and MATLAB.

Maintained by Henrik Bengtsson. Last updated 3 years ago.

matlab

1.6 match 85 stars 10.55 score 2.9k scripts 25 dependents

shichenxie

scorecard:Credit Risk Scorecard

The `scorecard` package makes the development of credit risk scorecards easier and more efficient by providing functions for some common tasks, such as data partition, variable selection, woe binning, scorecard scaling, performance evaluation and report generation. These functions can also be used in the development of machine learning models. References include: 1. Refaat, M. (2011, ISBN: 9781447511199). Credit Risk Scorecard: Development and Implementation Using SAS. 2. Siddiqi, N. (2006, ISBN: 9780471754510). Credit risk scorecards. Developing and Implementing Intelligent Credit Scoring.

Maintained by Shichen Xie. Last updated 11 months ago.

binning credit-scoring release scorecard woe woebinning

2.0 match 164 stars 8.07 score 94 scripts

gi0na

ghypernet:Fit and Simulate Generalised Hypergeometric Ensembles of Graphs

Provides functions for model fitting and selection of generalised hypergeometric ensembles of random graphs (gHypEG). To learn how to use it, check the vignettes for a quick tutorial. Please reference its use as Casiraghi, G., Nanumyan, V. (2019) <doi:10.5281/zenodo.2555300> together with the relevant references from those listed below. The package is based on the research developed at the Chair of Systems Design, ETH Zurich. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2016) <arXiv:1607.02441>. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2017) <doi:10.1007/978-3-319-67256-4_11>. Casiraghi, G. (2017) <arXiv:1702.02048>. Brandenberger, L., Casiraghi, G., Nanumyan, V., Schweitzer, F. (2019) <doi:10.1145/3341161.3342926>. Casiraghi, G. (2019) <doi:10.1007/s41109-019-0241-1>. Casiraghi, G., Nanumyan, V. (2021) <doi:10.1038/s41598-021-92519-y>. Casiraghi, G. (2021) <doi:10.1088/2632-072X/ac0493>.

Maintained by Giona Casiraghi. Last updated 11 months ago.

data-mining data-science graphs network network-analysis random-graph-generation random-graphs

2.8 match 8 stars 5.68 score 20 scripts

cran

XML:Tools for Parsing and Generating XML Within R and S-Plus

Many approaches for both reading and creating XML (and HTML) documents (including DTDs), both local and accessible via HTTP or FTP. Also offers access to an 'XPath' "interpreter".

Maintained by CRAN Team. Last updated 2 months ago.

libxml2

1.8 match 3 stars 8.87 score 1.3k dependents

bioc

FRASER:Find RAre Splicing Events in RNA-Seq Data

Detection of rare aberrant splicing events in transcriptome profiles. Read count ratio expectations are modeled by an autoencoder to control for confounding factors in the data. Given these expectations, the ratios are assumed to follow a beta-binomial distribution with a junction specific dispersion. Outlier events are then identified as read-count ratios that deviate significantly from this distribution. FRASER is able to detect alternative splicing, but also intron retention. The package aims to support diagnostics in the field of rare diseases where RNA-seq is performed to identify aberrant splicing defects.

Maintained by Christian Mertes. Last updated 5 months ago.

rnaseq alternativesplicing sequencing software genetics coverage aberrant-splicing diagnostics outlier-detection rare-disease rna-seq splicing openblas cpp

1.9 match 41 stars 8.50 score 155 scripts

bioboot

bio3d:Biological Structure Analysis

Utilities to process, organize and explore protein structure, sequence and dynamics data. Features include the ability to read and write structure, sequence and dynamic trajectory data, perform sequence and structure database searches, data summaries, atom selection, alignment, superposition, rigid core identification, clustering, torsion analysis, distance matrix analysis, structure and sequence conservation analysis, normal mode analysis, principal component analysis of heterogeneous structure data, and correlation network analysis from normal mode and molecular dynamics data. In addition, various utility functions are provided to enable the statistical and graphical power of the R environment to work with biological sequence and structural data. Please refer to the URLs below for more information.

Maintained by Barry Grant. Last updated 5 months ago.

zlib cpp

1.9 match 5 stars 8.49 score 1.4k scripts 10 dependents

saraswatmks

superml:Build Machine Learning Models Like Using Python's Scikit-Learn Library in R

The idea is to provide a standard interface to users who use both R and Python for building machine learning models. This package provides a scikit-learn-style fit/predict interface to train machine learning models in R.

Maintained by Manish Saraswat. Last updated 1 years ago.

openblas cpp

2.3 match 32 stars 7.05 score 117 scripts

bioc

QFeatures:Quantitative features for mass spectrometry data

The QFeatures infrastructure enables the management and processing of quantitative features for high-throughput mass spectrometry assays. It provides a familiar Bioconductor user experience to manage quantitative data across different assay levels (such as peptide spectrum matches, peptides and proteins) in a coherent and tractable format.

Maintained by Laurent Gatto. Last updated 12 days ago.

infrastructure massspectrometry proteomics metabolomics bioconductor mass-spectrometry

1.3 match 27 stars 11.87 score 278 scripts 49 dependents

dipterix

dipsaus:A Dipping Sauce for Data Analysis and Visualizations

Works as an "add-on" to packages like 'shiny', 'future', as well as 'rlang', and provides utility functions. Just like dipping sauce adding flavors to potato chips or pita bread, 'dipsaus' for data analysis and visualizations adds handy functions and enhancements to popular packages. The goal is to provide simple solutions that are frequently asked for online, such as how to synchronize 'shiny' inputs without freezing the app, or how to get memory size on 'Linux' or 'MacOS' systems. The enhancements roughly fall into these four categories: 1. 'shiny' input widgets; 2. high-performance computing using the 'future' package; 3. modifying R calls and converting among numbers, strings, and other objects; 4. utility functions to get system information such as CPU chip-set, memory limit, etc.

Maintained by Zhengjia Wang. Last updated 4 days ago.

cpp

2.0 match 13 stars 7.90 score 85 scripts 3 dependents

parmsam

lzstring:Wrapper for 'lz-string' 'C++' Library

Provide access to the 'lz-string' <http://pieroxy.net/blog/pages/lz-string/index.html> 'C++' library for Lempel-Ziv (LZ) based compression and decompression of strings.

Maintained by Sam Parmar. Last updated 2 months ago.

lzstring cpp

3.6 match 1 stars 4.38 score 4 scripts 1 dependents

bnosac

udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser, namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.

Maintained by Jan Wijffels. Last updated 2 years ago.

conll dependency-parser lemmatization natural-language-processing nlp pos-tagging r-pkg rcpp text-mining tokenizer udpipe cpp

1.3 match 215 stars 11.83 score 1.2k scripts 9 dependents

emilhvitfeldt

extrasteps:More Miscellaneous Steps for the 'recipes' Package

Contains additional miscellaneous steps for the 'recipes' package. These steps are useful, but don't have a good home in other 'recipes' packages or its extensions.

Maintained by Emil Hvitfeldt. Last updated 5 months ago.

3.8 match 10 stars 4.15 score 14 scripts

bioc

TFutils:TFutils

This package helps users to work with TF metadata from various sources. Significant catalogs of TFs and classifications thereof are made available. Tools for working with motif scans are also provided.

Maintained by Vincent Carey. Last updated 4 months ago.

transcriptomics

3.3 match 4.80 score 21 scripts

bioc

seqArchR:Identify Different Architectures of Sequence Elements

seqArchR enables unsupervised discovery of _de novo_ clusters with characteristic sequence architectures characterized by position-specific motifs or composition of stretches of nucleotides, e.g., CG-richness. seqArchR does _not_ require any specifications w.r.t. the number of clusters, the length of any individual motifs, or the distance between motifs if and when they occur in pairs/groups; it directly detects them from the data. seqArchR uses non-negative matrix factorization (NMF) as its backbone, and employs a chunking-based iterative procedure that enables processing of large sequence collections efficiently. Wrapper functions are provided for visualizing cluster architectures as sequence logos.

Maintained by Sarvesh Nikumbh. Last updated 5 months ago.

motifdiscovery generegulation mathematicalbiology systemsbiology transcriptomics genetics clustering dimensionreduction featureextraction dnaseq nmf nonnegative-matrix-factorization promoter-sequence-architectures scikit-learn sequence-analysis sequence-architectures unsupervised-machine-learning

3.5 match 1 stars 4.48 score 9 scripts 1 dependents

trivialfis

xgboost:Extreme Gradient Boosting

Extreme Gradient Boosting, which is an efficient implementation of the gradient boosting framework from Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>. This package is its R interface. The package includes an efficient linear model solver and tree learning algorithms. The package can automatically do parallel computation on a single machine which could be more than 10 times faster than existing gradient boosting packages. It supports various objective functions, including regression, classification and ranking. The package is made to be extensible, so that users are also allowed to define their own objectives easily.

Maintained by Jiaming Yuan. Last updated 8 months ago.

cpp openmp

1.3 match 6 stars 11.70 score 13k scripts 112 dependents

bioc

KBoost:Inference of gene regulatory networks from gene expression data

Reconstructing gene regulatory networks and transcription factor activity is crucial to understand biological processes and holds potential for developing personalized treatment. Yet, it is still an open problem as state-of-the-art algorithms are often not able to handle large amounts of data. Furthermore, many of the present methods predict numerous false positives and are unable to integrate other sources of information such as previously known interactions. Here we introduce KBoost, an algorithm that uses kernel PCA regression, boosting and Bayesian model averaging for fast and accurate reconstruction of gene regulatory networks. KBoost can also use a prior network built on previously known transcription factor targets. We have benchmarked KBoost using three different datasets against other high performing algorithms. The results show that our method compares favourably to other methods across datasets.

Maintained by Luis F. Iglesias-Martinez. Last updated 5 months ago.

network graphandnetwork bayesian networkinference generegulation transcriptomics systemsbiology transcription geneexpression regression principalcomponent

3.4 match 4 stars 4.60 score 9 scripts

bioc

FastqCleaner:A Shiny Application for Quality Control, Filtering and Trimming of FASTQ Files

An interactive web application for quality control, filtering and trimming of FASTQ files. This user-friendly tool combines a pipeline for data processing based on Biostrings and ShortRead infrastructure, with a cutting-edge visual environment. Single-Read and Paired-End files can be locally processed. Diagnostic interactive plots (CG content, per-base sequence quality, etc.) are provided for both the input and output files.

Maintained by Leandro Roser. Last updated 5 months ago.

qualitycontrol sequencing software sangerseq sequencematching cpp

3.8 match 4.00 score 4 scripts