Showing 200 of 455 results
bergsmat
encode:Represent Ordered Lists and Pairs as Strings
Interconverts between ordered lists and compact string notation. Useful for capturing code lists, and pair-wise codes and decodes, for text storage. Analogous to factor levels and labels. Generics encode() and decode() perform interconversion, while codes() and decodes() extract components of an encoding. The function encoded() checks whether something is interpretable as an encoding. If a vector has an encoded 'guide' attribute, as_factor() uses it to coerce to factor.
Maintained by Tim Bergsma. Last updated 6 years ago.
71.2 match 2 stars 4.03 score 12 scripts 5 dependents
gagolews
stringi:Fast and Portable Character String Processing Facilities
A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalisation, date-time formatting and parsing, and many more. They are fast, consistent, convenient, and - thanks to 'ICU' (International Components for Unicode) - portable across all locales and platforms. Documentation about 'stringi' is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).
Maintained by Marek Gagolewski. Last updated 1 month ago.
icu, icu4c, natural-language-processing, nlp, regex, regexp, string-manipulation, stringi, stringr, text, text-processing, tidy-data, unicode, cpp
15.0 match 309 stars 18.31 score 10k scripts 8.6k dependents
quanteda
readtext:Import and Handling for Plain and Formatted Text Files
Functions for importing and handling text files and formatted text files with additional meta-data, including '.csv', '.tab', '.json', '.xml', '.html', '.pdf', '.doc', '.docx', '.rtf', '.xls', '.xlsx', and others.
Maintained by Kenneth Benoit. Last updated 4 months ago.
21.9 match 122 stars 10.66 score 1.2k scripts 5 dependents
qsbase
qs:Quick Serialization of R Objects
Provides functions for quickly writing and reading any R object to and from disk.
Maintained by Travers Ching. Last updated 9 days ago.
compression, data-storage, encoding, serialization, libzstd, lz4, cpp
16.4 match 414 stars 13.91 score 2.5k scripts 51 dependents
cefet-rj-dal
daltoolbox:Leveraging Experiment Lines to Data Analytics
The natural increase in the complexity of current research experiments and data demands better tools to enhance productivity in Data Analytics. The package is a framework designed to address the modern challenges in data analytics workflows. The package is inspired by Experiment Line concepts. It aims to provide seamless support for users in developing their data mining workflows by offering a uniform data model and method API. It enables the integration of various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction. It also offers options for hyper-parameter tuning and supports integration with existing libraries and languages. Overall, the package provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>.
Maintained by Eduardo Ogasawara. Last updated 1 month ago.
31.7 match 1 star 6.65 score 536 scripts 4 dependents
symbolixau
googlePolylines:Encoding Coordinates into 'Google' Polylines
Encodes simple feature ('sf') objects and coordinates, and decodes polylines using the 'Google' polyline encoding algorithm (<https://developers.google.com/maps/documentation/utilities/polylinealgorithm>).
Maintained by David Cooley. Last updated 2 days ago.
geospatial, gis, google-maps, polyline-encoder, r-spatial, spatial, cpp
21.7 match 18 stars 8.11 score 9 dependents
rstudio
keras3:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.
Maintained by Tomasz Kalinowski. Last updated 3 days ago.
12.0 match 845 stars 13.57 score 264 scripts 2 dependents
mlr-org
mlr3pipelines:Preprocessing Operators and Pipelines for 'mlr3'
Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned.
Maintained by Martin Binder. Last updated 8 days ago.
bagging, data-science, dataflow-programming, ensemble-learning, machine-learning, mlr3, pipelines, preprocessing, stacking
13.2 match 141 stars 12.36 score 448 scripts 7 dependents
munterfi
flexpolyline:Flexible Polyline Encoding
Binding to the C++ implementation of the flexible polyline encoding by HERE <https://github.com/heremaps/flexible-polyline>. The flexible polyline encoding is a lossy compressed representation of a list of coordinate pairs or coordinate triples. The encoding is achieved by: (1) Reducing the decimal digits of each value; (2) encoding only the offset from the previous point; (3) using variable length for each coordinate delta; and (4) using 64 URL-safe characters to display the result.
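The four steps named in this description can be sketched in Python. This is a simplified illustration only, not the HERE wire format: it omits any header, handles coordinate pairs but not triples, and the function names (`encode_value`, `encode_points`, `decode_points`), the 5-bit chunking with a continuation bit, and the sign-folding scheme are assumptions chosen to make the idea concrete.

```python
# Simplified sketch of a delta + variable-length polyline encoding.
# Assumption: illustrative only; no header, 2D points only, hypothetical names.

# 64 URL-safe characters (step 4).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def encode_value(value: int) -> str:
    """Encode one signed integer as a variable-length run of characters (step 3)."""
    # Fold the sign into the lowest bit so small negative deltas stay short.
    zz = 2 * value if value >= 0 else -2 * value - 1
    out = []
    while zz > 0x1F:
        # Emit the low 5 bits; 0x20 flags that more chunks follow.
        out.append(ALPHABET[(zz & 0x1F) | 0x20])
        zz >>= 5
    out.append(ALPHABET[zz])
    return "".join(out)


def encode_points(points, precision=5):
    """Encode (lat, lng) pairs; precision is the number of decimals kept."""
    factor = 10 ** precision
    prev = [0, 0]
    out = []
    for point in points:
        for i in (0, 1):
            scaled = round(point[i] * factor)            # step 1: cut decimals (lossy)
            out.append(encode_value(scaled - prev[i]))   # step 2: offset from previous
            prev[i] = scaled
    return "".join(out)


def decode_points(text, precision=5):
    """Inverse of encode_points, up to the rounding applied in step 1."""
    factor = 10 ** precision
    values, acc, shift = [], 0, 0
    for ch in text:
        idx = ALPHABET.index(ch)
        acc |= (idx & 0x1F) << shift
        if idx & 0x20:
            shift += 5
        else:
            # Undo the sign folding.
            values.append(acc >> 1 if acc % 2 == 0 else -((acc + 1) >> 1))
            acc, shift = 0, 0
    points, prev = [], [0, 0]
    for j in range(0, len(values), 2):
        prev[0] += values[j]
        prev[1] += values[j + 1]
        points.append((prev[0] / factor, prev[1] / factor))
    return points
```

Because consecutive points on a route are usually close together, the deltas are small and most of them fit in one or two characters, which is where the compression comes from.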
Maintained by Merlin Unterfinger. Last updated 2 years ago.
gis, heremaps, polyline, polyline-decoder, polyline-encoder, rspatial, cpp
28.2 match 9 stars 5.75 score 14 scripts 1 dependent
polmine
polmineR:Verbs and Nouns for Corpus Analysis
Package for corpus analysis using the Corpus Workbench ('CWB', <https://cwb.sourceforge.io>) as an efficient back end for indexing and querying large corpora. The package offers functionality to flexibly create subcorpora and to carry out basic statistical operations (count, co-occurrences etc.). The original full text of documents can be reconstructed and inspected at any time. Beyond that, the package is intended to serve as an interface to packages implementing advanced statistical procedures. Respective data structures (document-term matrices, term-co-occurrence matrices etc.) can be created based on the indexed corpora.
Maintained by Andreas Blaette. Last updated 1 year ago.
19.8 match 49 stars 7.96 score 311 scripts
jszitas
categoryEncodings:Category Variable Encodings
Simple, fast, and automatic encodings for category data using a data.table backend. Most of the methods are an implementation of Johannemann, Hadad, Athey, Wager (2019) <arXiv:1908.09874>, particularly their 'means', "sPCA", "low-rank" and "multinomial logit".
Maintained by Juraj Szitas. Last updated 3 years ago.
categorical-variables, feature-encoding, feature-engineering
46.4 match 3 stars 3.18 score 2 scripts
yihui
xfun:Supporting Functions for Packages Maintained by 'Yihui Xie'
Miscellaneous functions commonly used in other packages maintained by 'Yihui Xie'.
Maintained by Yihui Xie. Last updated 2 days ago.
6.6 match 145 stars 18.18 score 916 scripts 4.4k dependents
rpolars
polars:Lightning-Fast 'DataFrame' Library
Lightning-fast 'DataFrame' library written in 'Rust'. Convert R data to 'Polars' data and vice versa. Perform fast, lazy, larger-than-memory and optimized data queries. 'Polars' is interoperable with the package 'arrow', as both are based on the 'Apache Arrow' Columnar Format.
Maintained by Soren Welling. Last updated 3 days ago.
9.9 match 499 stars 12.01 score 1.0k scripts 2 dependents
michbur
biogram:N-Gram Analysis of Biological Sequences
Tools for extraction and analysis of various n-grams (k-mers) derived from biological sequences (proteins or nucleic acids). Contains QuiPT (quick permutation test) for fast feature-filtering of the n-gram data.
Maintained by Michal Burdukiewicz. Last updated 7 months ago.
biological-sequences, ngram-analysis
15.5 match 10 stars 7.50 score 87 scripts 3 dependents
bquast
HomomorphicEncryption:BFV, BGV, CKKS Schema for Fully Homomorphic Encryption
Implements the Brakerski-Fan-Vercauteren (BFV, 2012) <https://eprint.iacr.org/2012/144>, Brakerski-Gentry-Vaikuntanathan (BGV, 2014) <doi:10.1145/2633600>, and Cheon-Kim-Kim-Song (CKKS, 2016) <https://eprint.iacr.org/2016/421.pdf> schema for Fully Homomorphic Encryption. The included vignettes demonstrate the encryption procedures.
Maintained by Bastiaan Quast. Last updated 1 year ago.
19.5 match 1 star 5.52 score 39 scripts
bioc
graph:graph: A package to handle graph data structures
A package that implements some simple graph handling capabilities.
Maintained by Bioconductor Package Maintainer. Last updated 10 days ago.
8.7 match 11.78 score 764 scripts 342 dependents
bioc
consensusSeekeR:Detection of consensus regions inside a group of experiences using genomic positions and genomic ranges
This package compares genomic positions and genomic ranges from multiple experiments to extract common regions. The size of the analyzed region is adjustable as well as the number of experiences in which a feature must be present in a potential region to tag this region as a consensus region. In genomic analysis where feature identification generates a position value surrounded by a genomic range, such as ChIP-Seq peaks and nucleosome positions, the replication of an experiment may result in slight differences between predicted values. This package enables the conciliation of the results into consensus regions.
Maintained by Astrid Deschênes. Last updated 5 months ago.
biologicalquestion, chipseq, genetics, multiplecomparison, transcription, peakdetection, sequencing, coverage, chip-seq-analysis, genomic-data-analysis, nucleosome-positioning
19.2 match 1 star 5.26 score 5 scripts 1 dependent
polkas
cat2cat:Handling an Inconsistently Coded Categorical Variable in a Longitudinal Dataset
Unifying an inconsistently coded categorical variable between two different time points in accordance with a mapping table. The main rule is to replicate the observation if it could be assigned to a few categories. Then using frequencies or statistical methods to approximate the probabilities of being assigned to each of them. This procedure was invented and implemented in the paper by Nasinski, Majchrowska, and Broniatowska (2020) <doi:10.24425/cejeme.2020.134747>.
Maintained by Maciej Nasinski. Last updated 1 year ago.
categories, encoding, encodings, factor, longitudinal, mapping, mappings, panel, transitions
23.2 match 4 stars 4.30 score 2 scripts
patperry
utf8:Unicode Text Processing
Process and print 'UTF-8' encoded international text (Unicode). Input, validate, normalize, encode, format, and display.
Maintained by Kirill Müller. Last updated 3 months ago.
6.0 match 113 stars 16.48 score 295 scripts 11k dependents
tidymodels
embed:Extra Recipes for Encoding Predictors
Predictors can be converted to one or more numeric representations using a variety of methods. Effect encodings using simple generalized linear models <doi:10.48550/arXiv.1611.09477> or nonlinear models <doi:10.48550/arXiv.1604.06737> can be used. There are also functions for dimension reduction and other approaches.
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
10.5 match 142 stars 9.35 score 1.1k scripts
bioc
DEWSeq:Differential Expressed Windows Based on Negative Binomial Distribution
DEWSeq is a sliding window approach for the analysis of differentially enriched binding regions in eCLIP or iCLIP next-generation sequencing data.
Maintained by bioinformatics team Hentze. Last updated 5 months ago.
sequencing, generegulation, functionalgenomics, differentialexpression, bioinformatics, eclip, ngs-analysis
17.8 match 5 stars 5.30 score 4 scripts
btskinner
crosswalkr:Rename and Encode Data Frames Using External Crosswalk Files
A pair of functions for renaming and encoding data frames using external crosswalk files. It is especially useful when constructing master data sets from multiple smaller data sets that do not name or encode variables consistently across files. Based on similar commands in 'Stata'.
Maintained by Benjamin Skinner. Last updated 1 year ago.
17.3 match 9 stars 5.26 score 20 scripts
bioc
GreyListChIP:Grey Lists -- Mask Artefact Regions Based on ChIP Inputs
Identify regions of ChIP experiments with high signal in the input, that lead to spurious peaks during peak calling. Remove reads aligning to these regions prior to peak calling, for cleaner ChIP analysis.
Maintained by Matt Eldridge. Last updated 5 months ago.
chipseq, alignment, preprocessing, differentialpeakcalling, sequencing, genomeannotation, coverage
18.3 match 4.93 score 10 scripts 4 dependents
s-u
base64enc:Tools for base64 Encoding
Tools for handling base64 encoding. It is more flexible than the orphaned base64 package.
Maintained by Simon Urbanek. Last updated 3 years ago.
7.0 match 9 stars 12.62 score 680 scripts 4.8k dependents
overton-group
eHDPrep:Quality Control and Semantic Enrichment of Datasets
A tool for the preparation and enrichment of health datasets for analysis (Toner et al. (2023) <doi:10.1093/gigascience/giad030>). Provides functionality for assessing data quality and for improving the reliability and machine interpretability of a dataset. 'eHDPrep' also enables semantic enrichment of a dataset where metavariables are discovered from the relationships between input variables determined from user-provided ontologies.
Maintained by Ian Overton. Last updated 2 years ago.
data-quality, health-informatics, semantic-enrichment
17.9 match 8 stars 4.90 score 10 scripts
mllg
base64url:Fast and URL-Safe Base64 Encoder and Decoder
In contrast to RFC3548, the 62nd character ("+") is replaced with "-" and the 63rd character ("/") is replaced with "_". The resulting encoded strings comply with the regular expression pattern "[A-Za-z0-9_-]" and are therefore safe to use in URLs or as file names. The package also comes with a simple base32 encoder/decoder suited for case-insensitive file systems.
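As a rough illustration of the character substitution described here, the following Python sketch uses only the standard-library `base64` module. The helper names `to_url_safe` and `from_url_safe` are hypothetical, and dropping the "=" padding is an assumption made so that the output stays within the character class quoted in the description; it is not taken from the package's source.

```python
import base64


def to_url_safe(data: bytes) -> str:
    """Standard base64, then swap '+' -> '-' and '/' -> '_'."""
    standard = base64.b64encode(data).decode("ascii")
    # Assumption: strip '=' padding so only [A-Za-z0-9_-] remains.
    return standard.replace("+", "-").replace("/", "_").rstrip("=")


def from_url_safe(text: str) -> bytes:
    """Reverse the substitution and restore the padding before decoding."""
    standard = text.replace("-", "+").replace("_", "/")
    standard += "=" * (-len(standard) % 4)
    return base64.b64decode(standard)
```

For example, the two bytes `b"\xfb\xff"` encode to `+/8=` in standard base64 and to `-_8` with this substitution, which can be placed in a URL path or file name without escaping.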
Maintained by Michel Lang. Last updated 5 years ago.
10.0 match 12 stars 8.37 score 15 scripts 68 dependents
rexyai
RestRserve:A Framework for Building HTTP API
Makes it easy to create high-performance, full-featured HTTP APIs from R functions. Provides high-level classes such as 'Request', 'Response', 'Application', and 'Middleware' to streamline server-side application development. Out of the box, it serves requests using the 'Rserve' package, but it is flexible enough to integrate with other HTTP servers such as 'httpuv'.
Maintained by Dmitry Selivanov. Last updated 3 days ago.
http-server, openapi, rest-api, swagger-ui, cpp
8.7 match 283 stars 9.56 score 95 scripts 1 dependent
bnosac
tokenizers.bpe:Byte Pair Encoding Text Tokenization
Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library <https://github.com/VKCOM/YouTokenToMe> which is an implementation of fast Byte Pair Encoding (BPE) <https://aclanthology.org/P16-1162/>.
Maintained by Jan Wijffels. Last updated 2 years ago.
bpe, byte-pair-encoding, text-mining, tokenization, cpp
17.3 match 15 stars 4.56 score 48 scripts
ironholds
urltools:Vectorised Tools for URL Handling and Parsing
A toolkit for all URL-handling needs, including encoding and decoding, parsing, parameter extraction and modification. All functions are designed to be both fast and entirely vectorised. It is intended to be useful for people dealing with web-related datasets, such as server-side logs, although may be useful for other situations involving large sets of URLs.
Maintained by Os Keyes. Last updated 4 years ago.
5.7 match 131 stars 13.43 score 968 scripts 264 dependents
bioc
matter:Out-of-core statistical computing and signal processing
Toolbox for larger-than-memory scientific computing and visualization, providing efficient out-of-core data structures using files or shared memory, for dense and sparse vectors, matrices, and arrays, with applications to nonuniformly sampled signals and images.
Maintained by Kylie A. Bemis. Last updated 3 months ago.
infrastructure, datarepresentation, dataimport, dimensionreduction, preprocessing, cpp
7.9 match 57 stars 9.52 score 64 scripts 2 dependents
hrbrmstr
qrencoder:Quick Response Code (QR Code) / Matrix Barcode Creator
Quick Response codes (QR codes) are a type of matrix bar code and can be used to authenticate transactions, provide access to multi-factor authentication services and enable general data transfer in an image. QR codes use four standardized encoding modes (numeric, alphanumeric, byte/binary, and kanji) to efficiently store data. Matrix barcode generation is performed efficiently in C via the included 'libqrencoder' library created by Kentaro Fukuchi.
Maintained by Bob Rudis. Last updated 6 years ago.
12.1 match 61 stars 6.03 score 59 scripts 1 dependent
bioc
BPRMeth:Model higher-order methylation profiles
The BPRMeth package is a probabilistic method to quantify explicit features of methylation profiles, in a way that would make it easier to formally use such profiles in downstream modelling efforts, such as predicting gene expression levels or clustering genomic regions or cells according to their methylation profiles.
Maintained by Chantriolnt-Andreas Kapourani. Last updated 5 months ago.
immunooncology, dnamethylation, geneexpression, generegulation, epigenetics, genetics, clustering, featureextraction, regression, rnaseq, bayesian, kegg, sequencing, coverage, singlecell, openblas, cpp
12.5 match 5.75 score 94 scripts 1 dependent
merck
r2rtf:Easily Create Production-Ready Rich Text Format (RTF) Tables and Figures
Create production-ready Rich Text Format (RTF) tables and figures with flexible format.
Maintained by Benjamin Wang. Last updated 5 days ago.
6.6 match 78 stars 10.82 score 171 scripts 10 dependents
computationalstylistics
stylo:Stylometric Multivariate Analyses
Supervised and unsupervised multivariate methods, supplemented by GUI and some visualizations, to perform various analyses in the field of computational stylistics, authorship attribution, etc. For further reference, see Eder et al. (2016), <https://journal.r-project.org/archive/2016/RJ-2016-007/index.html>. You are also encouraged to visit the Computational Stylistics Group's website <https://computationalstylistics.github.io/>, where a reasonable amount of information about the package and related projects is provided.
Maintained by Maciej Eder. Last updated 2 months ago.
8.3 match 186 stars 8.59 score 462 scripts
jeroen
base64:Base64 Encoder and Decoder
Compatibility wrapper to replace the orphaned package. New applications should use base64 encoders from 'jsonlite' or 'openssl' or 'base64enc'.
Maintained by Jeroen Ooms. Last updated 5 months ago.
10.6 match 2 stars 6.62 score 163 scripts 42 dependents
statnet
ergm:Fit, Simulate and Diagnose Exponential-Family Models for Networks
An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) <doi:10.18637/jss.v024.i03> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.
Maintained by Pavel N. Krivitsky. Last updated 6 days ago.
4.5 match 100 stars 15.36 score 1.4k scripts 36 dependents
bioc
GenomicAlignments:Representation and manipulation of short genomic alignments
Provides efficient containers for storing and manipulating short genomic alignments (typically obtained by aligning short reads to a reference genome). This includes read counting, computing the coverage, junction detection, and working with the nucleotide content of the alignments.
Maintained by Hervé Pagès. Last updated 5 months ago.
infrastructure, dataimport, genetics, sequencing, rnaseq, snp, coverage, alignment, immunooncology, bioconductor-package, core-package
4.7 match 10 stars 13.61 score 3.1k scripts 529 dependents
r-lib
jose:JavaScript Object Signing and Encryption
Read and write JSON Web Keys (JWK, rfc7517), generate and verify JSON Web Signatures (JWS, rfc7515) and encode/decode JSON Web Tokens (JWT, rfc7519) <https://datatracker.ietf.org/wg/jose/documents/>. These standards provide modern signing and encryption formats that are natively supported by browsers via the JavaScript WebCryptoAPI <https://www.w3.org/TR/WebCryptoAPI/#jose>, and used by services like OAuth 2.0, LetsEncrypt, and Github Apps.
Maintained by Jeroen Ooms. Last updated 5 months ago.
5.6 match 50 stars 10.98 score 63 scripts 35 dependents
bioc
motifbreakR:A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites
We introduce motifbreakR, which allows the biologist to judge in the first place whether the sequence surrounding the polymorphism is a good match, and in the second place how much information is gained or lost in one allele of the polymorphism relative to another. MotifbreakR is both flexible and extensible over previous offerings; giving a choice of algorithms for interrogation of genomes with motifs from public sources that users can choose from; these are 1) a weighted-sum probability matrix, 2) log-probabilities, and 3) weighted by relative entropy. MotifbreakR can predict effects for novel or previously described variants in public databases, making it suitable for tasks beyond the scope of its original design. Lastly, it can be used to interrogate any genome curated within Bioconductor (currently there are 32 species, a total of 109 versions).
Maintained by Simon Gert Coetzee. Last updated 5 months ago.
chipseq, visualization, motifannotation, transcription
6.5 match 28 stars 8.96 score 103 scripts
bioc
DNAshapeR:High-throughput prediction of DNA shape features
DNAshapeR is an R/Bioconductor package for ultra-fast, high-throughput predictions of DNA shape features. The package can predict, visualize and encode DNA shape features for statistical learning.
Maintained by Tsu-Pei Chiu. Last updated 5 months ago.
structuralprediction, dna3dstructure, software, cpp
10.4 match 5.57 score 37 scripts
extendr
b64:Fast and Vectorized Base 64 Engine
Provides a fast, lightweight, and vectorized base 64 engine to encode and decode character and raw vectors as well as files stored on disk. Common base 64 alphabets are supported out of the box including the standard, URL-safe, bcrypt, crypt, 'BinHex', and IMAP-modified UTF-7 alphabets. Custom engines can be created to support unique base 64 encoding and decoding needs.
Maintained by Josiah Parry. Last updated 2 months ago.
9.5 match 16 stars 6.09 score 4 scripts 3 dependents
thomasp85
farver:High Performance Colour Space Manipulation
The encoding of colour can be handled in many different ways, using different colour spaces. As different colour spaces have different uses, efficient conversion between these representations is important. The 'farver' package provides a set of functions that give access to very fast colour space conversion and comparisons implemented in C++, and offers speed improvements over the 'convertColor' function in the 'grDevices' package.
Maintained by Thomas Lin Pedersen. Last updated 10 months ago.
4.0 match 136 stars 14.17 score 164 scripts 7.9k dependents
rstudio
httpuv:HTTP and WebSocket Server Library
Provides low-level socket and protocol support for handling HTTP and WebSocket requests directly from within R. It is primarily intended as a building block for other packages, rather than making it particularly easy to create complete web applications using httpuv alone. httpuv is built on top of the libuv and http-parser C libraries, both of which were developed by Joyent, Inc. (See LICENSE file for libuv and http-parser license information.)
Maintained by Winston Chang. Last updated 12 months ago.
3.8 match 235 stars 15.09 score 708 scripts 2.1k dependents
eitsupi
neopolars:R Bindings for the 'polars' Rust Library
Lightning-fast 'DataFrame' library written in 'Rust'. Convert R data to 'Polars' data and vice versa. Perform fast, lazy, larger-than-memory and optimized data queries. 'Polars' is interoperable with the package 'arrow', as both are based on the 'Apache Arrow' Columnar Format.
Maintained by Tatsuya Shima. Last updated 12 hours ago.
11.6 match 40 stars 4.86 score 1 script
t-kalinowski
keras:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.
Maintained by Tomasz Kalinowski. Last updated 11 months ago.
5.2 match 10.82 score 10k scripts 54 dependents
bioc
immApex:Tools for Adaptive Immune Receptor Sequence-Based Machine and Deep Learning
A set of tools to build tensorflow/keras3-based models in R from amino acid and nucleotide sequences focusing on adaptive immune receptors. The package includes pre-processing of sequences, unifying gene nomenclature usage, encoding sequences, and combining models. This package will serve as the basis of future immune receptor sequence functions/packages/models compatible with the scRepertoire ecosystem.
Maintained by Nick Borcherding. Last updated 19 days ago.
software, immunooncology, singlecell, classification, annotation, sequencing, motifannotation
9.5 match 8 stars 5.92 score 3 scripts
cardiomoon
rrtable:Reproducible Research with a Table of R Codes
Makes documents containing plots and tables from a table of R codes. Can make "HTML", "pdf('LaTex')", "docx('MS Word')" and "pptx('MS Powerpoint')" documents with or without R code. In the package, modularized 'shiny' app codes are provided. These modules are intended for reuse across applications.
Maintained by Keon-Woong Moon. Last updated 2 years ago.
8.6 match 3 stars 6.45 score 76 scripts 2 dependents
dataoneorg
dataone:R Interface to the DataONE REST API
Provides read and write access to data and metadata from the DataONE network <https://www.dataone.org> of data repositories. Each DataONE repository implements a consistent repository application programming interface. Users call methods in R to access these remote repository functions, such as methods to query the metadata catalog, get access to metadata for particular data packages, and read the data objects from the data repository. Users can also insert and update data objects on repositories that support these methods.
Maintained by Matthew B. Jones. Last updated 3 years ago.
5.5 match 36 stars 9.93 score 472 scripts 3 dependents
bioc
Biostrings:Efficient manipulation of biological strings
Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.
Maintained by Hervé Pagès. Last updated 23 days ago.
sequencematching, alignment, sequencing, genetics, dataimport, datarepresentation, infrastructure, bioconductor-package, core-package
3.0 match 61 stars 17.83 score 8.6k scripts 1.2k dependents
hrbrmstr
vegalite:Tools to Encode Visualizations with the 'Grammar of Graphics'-Like 'Vega-Lite' 'Spec'
The 'Vega-Lite' 'JavaScript' framework provides a higher-level grammar for visual analysis, akin to 'ggplot' or 'Tableau', that generates complete 'Vega' specifications. Functions exist which enable building a valid 'spec' from scratch or importing a previously created 'spec' file. Functions also exist to export 'spec' files and to generate code which will enable plots to be embedded in properly configured web pages. The default behavior is to generate an 'htmlwidget'.
Maintained by Bob Rudis. Last updated 7 years ago.
data-visualization, datavisualization, vega-lite, vega-lite-spec, visualization, widget
7.0 match 158 stars 7.60 score 84 scripts
jeroen
jsonlite:A Simple and Robust JSON Parser and Generator for R
A reasonably fast JSON parser and generator, optimized for statistical data and the web. Offers simple, flexible tools for working with JSON in R, and is particularly powerful for building pipelines and interacting with a web API. The implementation is based on the mapping described in the vignette (Ooms, 2014). In addition to converting JSON data from/to R objects, 'jsonlite' contains functions to stream, validate, and prettify JSON data. The unit tests included with the package verify that all edge cases are encoded and decoded consistently for use with dynamic data in systems and applications.
Maintained by Jeroen Ooms. Last updated 22 days ago.
2.5 match 384 stars 21.15 score 27k scripts 8.6k dependents
bioc
cTRAP:Identification of candidate causal perturbations from differential gene expression data
Compare differential gene expression results with those from known cellular perturbations (such as gene knock-down, overexpression or small molecules) derived from the Connectivity Map. Such analyses allow not only to infer the molecular causes of the observed difference in gene expression but also to identify small molecules that could drive or revert specific transcriptomic alterations.
Maintained by Nuno Saraiva-Agostinho. Last updated 5 months ago.
differentialexpression, geneexpression, rnaseq, transcriptomics, pathways, immunooncology, genesetenrichment, bioconductor, bioinformatics, cmap, gene-expression, l1000
10.4 match 5 stars 5.08 score 16 scripts
laresbernardo
lares:Analytics & Machine Learning Sidekick
Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of family functions, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scrapper, it helps the analyst or data scientist to get quick and robust results, without the need of repetitive coding or advanced R programming skills.
Maintained by Bernardo Lares. Last updated 23 days ago.
analytics, api, automation, automl, data-science, descriptive-statistics, h2o, machine-learning, marketing, mmm, predictive-modeling, puzzle, rlanguage, robyn, visualization
5.2 match 233 stars 9.84 score 185 scripts 1 dependent
kwb-r
kwb.utils:General Utility Functions Developed at KWB
This package contains some small helper functions that aim at improving the quality of code developed at Kompetenzzentrum Wasser gGmbH (KWB).
Maintained by Hauke Sonnenberg. Last updated 12 months ago.
6.8 match 8 stars 7.33 score 12 scripts 78 dependents
qsbase
qs2:Efficient Serialization of R Objects
Streamlines and accelerates the process of saving and loading R objects, improving speed and compression compared to other methods. The package provides two compression formats: the 'qs2' format, which uses R serialization via the C API while optimizing compression and disk I/O, and the 'qdata' format, featuring custom serialization for slightly faster performance and better compression. Additionally, the 'qs2' format can be directly converted to the standard 'RDS' format, ensuring long-term compatibility with future versions of R.
Maintained by Travers Ching. Last updated 9 days ago.
compression, data-storage, serialization, cpp
6.4 match 15 stars 7.57 score 25 scripts 2 dependents
mbojan
rgraph6:Representing Graphs as 'graph6', 'digraph6' or 'sparse6' Strings
Encode network data as strings of printable ASCII characters. Implemented functions include encoding and decoding adjacency matrices, edgelists, igraph, and network objects to/from formats 'graph6', 'sparse6', and 'digraph6'. The formats and methods are described in McKay, B.D. and Piperno, A (2014) <doi:10.1016/j.jsc.2013.09.003>.
Maintained by Michal Bojanowski. Last updated 6 months ago.
9.4 match 12 stars 5.08 score 8 scripts
modal-inria
cfda:Categorical Functional Data Analysis
Package for the analysis of categorical functional data. The main purpose is to compute an encoding (real functional variable) for each state <doi:10.3390/math9233074>. It also provides functions to perform basic statistical analysis on categorical functional data.
Maintained by Quentin Grimonprez. Last updated 2 months ago.
categorical-data, functional-data-analysis, hacktoberfest
10.3 match 4 stars 4.60 score 3 scripts
teunbrand
ggh4x:Hacks for 'ggplot2'
A 'ggplot2' extension that does a variety of little helpful things. The package extends 'ggplot2' facets through customisation, by setting individual scales per panel, resizing panels and providing nested facets. Also allows multiple colour and fill scales per plot. Also hosts a smaller collection of stats, geoms and axis guides.
Maintained by Teun van den Brand. Last updated 3 months ago.
3.3 match 616 stars 13.98 score 4.4k scripts 20 dependentstidyverse
ellmer:Chat with Large Language Models
Chat with large language models from a range of providers including 'Claude' <https://claude.ai>, 'OpenAI' <https://chatgpt.com>, and more. Supports streaming, asynchronous calls, tool calling, and structured data extraction.
Maintained by Hadley Wickham. Last updated 5 hours ago.
3.7 match 391 stars 12.65 score 98 scripts 7 dependentseltoulemonde
dataPreparation:Automated Data Preparation
Do most of the painful data preparation for a data science project with a minimum amount of code. Takes advantage of 'data.table' efficiency and uses some algorithmic tricks to perform data preparation in a time- and RAM-efficient way.
Maintained by Emmanuel-Lin Toulemonde. Last updated 2 years ago.
data-preparationdata-preprocessingdata-sciencedate-conversionspeedvariable-eliminationvariable-selection
8.5 match 31 stars 5.46 score 86 scriptssdam-au
sdam:Digital Tools for the SDAM Project at Aarhus University
Provides digital tools for performing analyses within Social Dynamics and complexity in the Ancient Mediterranean (SDAM), which is a research group based at the Department of History and Classical Studies at Aarhus University.
Maintained by Antonio Rivero Ostoic. Last updated 3 years ago.
aarhus-universitycartographydata-visualizationdatasetdigital-humanitiesencodingextractinscriptionsrest-apitemporal
11.8 match 4 stars 3.86 score 36 scriptsjeroen
openssl:Toolkit for Encryption, Signatures and Certificates Based on OpenSSL
Bindings to OpenSSL libssl and libcrypto, plus custom SSH key parsers. Supports RSA, DSA and EC curves P-256, P-384, P-521, and curve25519. Cryptographic signatures can either be created and verified manually or via x509 certificates. AES can be used in cbc, ctr or gcm mode for symmetric encryption; RSA for asymmetric (public key) encryption or EC for Diffie Hellman. High-level envelope functions combine RSA and AES for encrypting arbitrary sized data. Other utilities include key generators, hash functions (md5, sha1, sha256, etc), base64 encoder, a secure random number generator, and 'bignum' math methods for manually performing crypto calculations on large multibyte integers.
Maintained by Jeroen Ooms. Last updated 1 months ago.
2.5 match 65 stars 18.00 score 632 scripts 5.0k dependentstomasfryda
h2o:R Interface for the 'H2O' Scalable Machine Learning Platform
R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
Maintained by Tomas Fryda. Last updated 1 years ago.
5.5 match 3 stars 8.20 score 7.8k scripts 11 dependentsjeroen
curl:A Modern and Flexible Web Client for R
Bindings to 'libcurl' <https://curl.se/libcurl/> for performing fully configurable HTTP/FTP requests where responses can be processed in memory, on disk, or streaming via the callback or connection interfaces. Some knowledge of 'libcurl' is recommended; for a more-user-friendly web client see the 'httr2' package which builds on this package with http specific tools and logic.
Maintained by Jeroen Ooms. Last updated 22 days ago.
2.3 match 225 stars 19.95 score 4.0k scripts 5.8k dependentsices-tools-prod
TAF:Transparent Assessment Framework for Reproducible Research
General framework to organize data, methods, and results used in reproducible scientific analyses. A TAF analysis consists of four scripts (data.R, model.R, output.R, report.R) that are run sequentially. Each script starts by reading files from a previous step and ends with writing out files for the next step. Convenience functions are provided to version control the required data and software, run analyses, clean residues from previous runs, manage files, manipulate tables, and produce figures. With a focus on stability and reproducible analyses, the TAF package comes with no dependencies. TAF forms a base layer for the 'icesTAF' package and other scientific applications.
Maintained by Arni Magnusson. Last updated 4 months ago.
6.5 match 3 stars 6.85 score 282 scripts 2 dependentsbioc
atSNP:Affinity test for identifying regulatory SNPs
atSNP performs affinity tests of motif matches with the SNP or the reference genomes and SNP-led changes in motif matches.
Maintained by Sunyoung Shin. Last updated 5 months ago.
softwarechipseqgenomeannotationmotifannotationvisualizationcpp
7.7 match 1 stars 5.73 score 36 scriptstidyverse
stringr:Simple, Consistent Wrappers for Common String Operations
A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another.
Maintained by Hadley Wickham. Last updated 7 months ago.
2.0 match 622 stars 21.97 score 164k scripts 8.2k dependentsblasbenito
collinear:Automated Multicollinearity Management
Effortless multicollinearity management in data frames with both numeric and categorical variables for statistical and machine learning applications. The package simplifies multicollinearity analysis by combining four robust methods: 1) target encoding for categorical variables (Micci-Barreca, D. 2001 <doi:10.1145/507533.507538>); 2) automated feature prioritization to prevent key variable loss during filtering; 3) pairwise correlation for all variable combinations (numeric-numeric, numeric-categorical, categorical-categorical); and 4) fast computation of variance inflation factors.
Maintained by Blas M. Benito. Last updated 2 months ago.
machine-learningmulticollinearitystatistics
7.9 match 11 stars 5.51 score 15 scripts 1 dependentsyihui
knitr:A General-Purpose Package for Dynamic Report Generation in R
Provides a general-purpose tool for dynamic report generation in R using Literate Programming techniques.
Maintained by Yihui Xie. Last updated 1 days ago.
dynamic-documentsknitrliterate-programmingrmarkdownsweave
1.8 match 2.4k stars 23.62 score 116k scripts 4.2k dependentstidyverse
readr:Read Rectangular Text Data
The goal of 'readr' is to provide a fast and friendly way to read rectangular data (like 'csv', 'tsv', and 'fwf'). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.
Maintained by Jennifer Bryan. Last updated 8 months ago.
2.0 match 1.0k stars 21.03 score 132k scripts 2.0k dependentsluca-scr
GA:Genetic Algorithms
Flexible general-purpose toolbox implementing genetic algorithms (GAs) for stochastic optimisation. Binary, real-valued, and permutation representations are available to optimize a fitness function, i.e. a function provided by users depending on their objective function. Several genetic operators are available and can be combined to explore the best settings for the current task. Furthermore, users can define new genetic operators and easily evaluate their performances. Local search using general-purpose optimisation algorithms can be applied stochastically to exploit interesting regions. GAs can be run sequentially or in parallel, using an explicit master-slave parallelisation or a coarse-grain islands approach. For more details see Scrucca (2013) <doi:10.18637/jss.v053.i04> and Scrucca (2017) <doi:10.32614/RJ-2017-008>.
Maintained by Luca Scrucca. Last updated 6 months ago.
genetic-algorithmoptimisationcpp
3.6 match 93 stars 11.58 score 624 scripts 52 dependentsmhahsler
arules:Mining Association Rules and Frequent Itemsets
Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. Hahsler, Gruen and Hornik (2005) <doi:10.18637/jss.v014.i15>.
Maintained by Michael Hahsler. Last updated 1 months ago.
arulesassociation-rulesfrequent-itemsets
3.0 match 194 stars 13.99 score 3.3k scripts 28 dependentsecodynizw
vietnameseConverter:Convert Vietnamese Encodings
Conversion of characters from unsupported Vietnamese character encodings to Unicode characters. These Vietnamese encodings (TCVN3, VISCII, VPS) are not natively supported in R and lead to printing of wrong characters and garbled text (mojibake). This package fixes that problem and provides readable output with the correct Unicode characters (with or without diacritics).
Maintained by Juergen Niedballa. Last updated 3 years ago.
10.4 match 2 stars 4.00 score 4 scriptshetong007
pullword:R Interface to Pullword Service
R Interface to Pullword Service for natural language processing in Chinese. It enables users to extract valuable words from text by deep learning models. For more details please visit the official site (in Chinese) <http://www.pullword.com/>.
Maintained by Tong He. Last updated 4 years ago.
10.5 match 19 stars 3.98 score 1 scriptstidyverse
lubridate:Make Dealing with Dates a Little Easier
Functions to work with date-times and time-spans: fast and user friendly parsing of date-time data, extraction and updating of components of a date-time (years, months, days, hours, minutes, and seconds), algebraic manipulation on date-time and time-span objects. The 'lubridate' package has a consistent and memorable syntax that makes working with dates easy and fun.
Maintained by Vitalie Spinu. Last updated 3 months ago.
1.9 match 757 stars 20.95 score 135k scripts 1.9k dependentsropensci
redland:RDF Library Bindings in R
Provides methods to parse, query and serialize information stored in the Resource Description Framework (RDF). RDF is described at <https://www.w3.org/TR/rdf-primer/>. This package supports RDF by implementing an R interface to the Redland RDF C library, described at <https://librdf.org/docs/api/index.html>. In brief, RDF provides a structured graph consisting of Statements composed of Subject, Predicate, and Object Nodes.
Maintained by Matthew B. Jones. Last updated 1 years ago.
5.0 match 17 stars 7.85 score 98 scripts 13 dependentsbnosac
sentencepiece:Text Tokenization using Byte Pair Encoding and Unigram Modelling
Unsupervised text tokenizer that performs byte pair encoding and unigram modelling. Wraps the 'sentencepiece' library <https://github.com/google/sentencepiece>, which provides a language-independent tokenizer to split text into words and smaller subword units. The techniques are explained in the paper "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing" by Taku Kudo and John Richardson (2018) <doi:10.18653/v1/D18-2012>. Also provides straightforward access to pretrained byte pair encoding models and subword embeddings trained on Wikipedia using 'word2vec', as described in "BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages" by Benjamin Heinzerling and Michael Strube (2018) <http://www.lrec-conf.org/proceedings/lrec2018/pdf/1049.pdf>.
Maintained by Jan Wijffels. Last updated 2 years ago.
bytenatural-language-processingsentencepieceword-segmentationcpp
9.4 match 25 stars 4.10 score 8 scriptsconjugateprior
events:Store and Manipulate Event Data
The events package manipulates, aggregates and otherwise messes with event data from 'KEDS' and 'TABARI' software and those with similar output. It also bundles several classic event data sets. Most functions are superseded by those in 'dplyr' and 'tidyr'.
Maintained by William Lowe. Last updated 3 years ago.
10.9 match 3 stars 3.52 score 22 scriptsjalvesaq
descr:Descriptive Statistics
Weighted frequency and contingency tables of categorical variables and of the comparison of the mean value of a numerical variable by the levels of a factor, and methods to produce xtable objects of the tables and to plot them. There are also functions to facilitate the character encoding conversion of objects, to quickly convert fixed width files into csv ones, and to export a data.frame to a text file with the necessary R and SPSS codes to reread the data.
Maintained by Jakson Aquino. Last updated 1 years ago.
4.3 match 18 stars 8.80 score 692 scripts 4 dependentsbioc
genomation:Summary, annotation and visualization of genomic data
A package for summary and annotation of genomic intervals. Users can visualize and quantify genomic intervals over pre-defined functional regions, such as promoters, exons, introns, etc. The genomic intervals represent regions with a defined chromosome position, which may be associated with a score, such as aligned reads from HT-seq experiments, TF binding sites, methylation scores, etc. The package can use any tabular genomic feature data as long as it has minimal information on the locations of genomic intervals. In addition, it can use BAM or BigWig files as input.
Maintained by Altuna Akalin. Last updated 5 months ago.
annotationsequencingvisualizationcpgislandcpp
3.4 match 75 stars 11.09 score 738 scripts 5 dependentsbleutner
RStoolbox:Remote Sensing Data Analysis
Toolbox for remote sensing image processing and analysis such as calculating spectral indexes, principal component transformation, unsupervised and supervised classification or fractional cover analyses.
Maintained by Konstantin Mueller. Last updated 1 months ago.
ggplot2land-cover-mappingremote-sensingspectral-unmixingsupervised-classificationunsupervised-classificationopenblascpp
3.7 match 275 stars 10.10 score 1.1k scriptse-sensing
sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes
An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<https://stacspec.org/>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organized maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce training samples imbalance proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <https://torch.mlverse.org/>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <https://jakubnowosad.com/supercells/>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>. 
Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.
Maintained by Gilberto Camara. Last updated 1 months ago.
big-earth-datacbersearth-observationeo-datacubesgeospatialimage-time-seriesland-cover-classificationlandsatplanetary-computerr-spatialremote-sensingrspatialsatellite-image-time-seriessatellite-imagerysentinel-2stac-apistac-catalogcpp
3.9 match 494 stars 9.50 score 384 scriptstidyverse
rvest:Easily Harvest (Scrape) Web Pages
Wrappers around the 'xml2' and 'httr' packages to make it easy to download, then manipulate, HTML and XML.
Maintained by Hadley Wickham. Last updated 5 months ago.
1.9 match 1.5k stars 19.62 score 29k scripts 546 dependentsr-rust
gifski:Highest Quality GIF Encoder
Multi-threaded GIF encoder written in Rust: <https://gif.ski/>. Converts images to GIF animations using pngquant's efficient cross-frame palettes and temporal dithering with thousands of colors per frame.
Maintained by Jeroen Ooms. Last updated 5 months ago.
3.6 match 74 stars 10.05 score 2.6k scripts 8 dependentsdatawookie
emayili:Send Email Messages
A light, simple tool for sending emails with minimal dependencies.
Maintained by Andrew B. Collier. Last updated 1 months ago.
3.8 match 180 stars 9.59 score 95 scripts 3 dependentsbioc
ShortRead:FASTQ input and manipulation
This package implements sampling, iteration, and input of FASTQ files. The package includes functions for filtering and trimming reads, and for generating a quality assessment report. Data are represented as DNAStringSet-derived objects, and easily manipulated for a diversity of purposes. The package also contains legacy support for early single-end, ungapped alignment formats.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
dataimportsequencingqualitycontrolbioconductor-packagecore-packagezlibcpp
3.0 match 8 stars 12.08 score 1.8k scripts 49 dependentsmichaelchirico
geohashTools:Tools for Working with Geohashes
Tools for working with Gustavo Niemeyer's geohash coordinate system, including API for interacting with other common R GIS libraries.
Maintained by Michael Chirico. Last updated 1 years ago.
5.0 match 52 stars 7.18 score 30 scripts 6 dependentsrstudio
htmltools:Tools for HTML
Tools for HTML generation and output.
Maintained by Carson Sievert. Last updated 10 months ago.
2.0 match 218 stars 17.61 score 10k scripts 4.5k dependentsplotly
plotly:Create Interactive Web Graphics via 'plotly.js'
Create interactive web graphics from 'ggplot2' graphs and/or a custom interface to the (MIT-licensed) JavaScript library 'plotly.js' inspired by the grammar of graphics.
Maintained by Carson Sievert. Last updated 3 months ago.
d3jsdata-visualizationggplot2javascriptplotlyshinywebgl
1.8 match 2.6k stars 19.36 score 93k scripts 783 dependentsmyles-lewis
nestedcv:Nested Cross-Validation with 'glmnet' and 'caret'
Implements nested k*l-fold cross-validation for lasso and elastic-net regularised linear models via the 'glmnet' package and other machine learning models via the 'caret' package <doi:10.1093/bioadv/vbad048>. Cross-validation of 'glmnet' alpha mixing parameter and embedded fast filter functions for feature selection are provided. Described as double cross-validation by Stone (1977) <doi:10.1111/j.2517-6161.1977.tb01603.x>. Also implemented is a method using outer CV to measure unbiased model performance metrics when fitting Bayesian linear and logistic regression shrinkage models using the horseshoe prior over parameters to encourage a sparse model as described by Piironen & Vehtari (2017) <doi:10.1214/17-EJS1337SI>.
Maintained by Myles Lewis. Last updated 5 days ago.
4.4 match 12 stars 7.92 score 46 scriptsropensci
magick:Advanced Graphics and Image-Processing in R
Bindings to 'ImageMagick': the most comprehensive open-source image processing library available. Supports many common formats (png, jpeg, tiff, pdf, etc) and manipulations (rotate, scale, crop, trim, flip, blur, etc). All operations are vectorized via the Magick++ STL meaning they operate either on a single frame or a series of frames for working with layers, collages, or animation. In RStudio images are automatically previewed when printed to the console, resulting in an interactive editing environment. The latest version of the package includes a native graphics device for creating in-memory graphics or drawing onto images using pixel coordinates.
Maintained by Jeroen Ooms. Last updated 19 days ago.
image-manipulationimage-processingimagemagickcpp
2.0 match 468 stars 17.31 score 9.0k scripts 256 dependentsbioc
ensembldb:Utilities to create and use Ensembl-based annotation databases
The package provides functions to create and use transcript-centric annotation databases/packages. The annotation for the databases is fetched directly from Ensembl using their Perl API. The functionality and data are similar to those of the TxDb packages from the GenomicFeatures package, but, in addition to retrieving all gene/transcript models and annotations from the database, ensembldb provides a filter framework for retrieving annotations for specific entries, such as genes encoded on a chromosome region or transcript models of lincRNA genes. EnsDb databases built with ensembldb also contain protein annotations and mappings between proteins and their encoding transcripts. Finally, ensembldb provides functions to map between genomic, transcript and protein coordinates.
Maintained by Johannes Rainer. Last updated 5 months ago.
geneticsannotationdatasequencingcoverageannotationbioconductorbioconductor-packagesensembl
2.4 match 35 stars 14.08 score 892 scripts 108 dependentshauselin
ollamar:'Ollama' Language Models
An interface to easily run local language models with 'Ollama' <https://ollama.com> server and API endpoints (see <https://github.com/ollama/ollama/blob/main/docs/api.md> for details). It lets you run open-source large language models locally on your machine.
Maintained by Hause Lin. Last updated 2 months ago.
3.6 match 84 stars 9.36 score 74 scripts 5 dependentshetong007
rLTP:R Interface to the 'LTP'-Cloud Service
R interface to the 'LTP'-Cloud service for Natural Language Processing in Chinese (http://www.ltp-cloud.com/).
Maintained by Tong He. Last updated 8 years ago.
10.5 match 3 stars 3.18 score 1 scriptssfirke
janitor:Simple Tools for Examining and Cleaning Dirty Data
The main janitor functions can: perfectly format data.frame column names; provide quick counts of variable combinations (i.e., frequency tables and crosstabs); and explore duplicate records. Other janitor functions nicely format the tabulation results. These tabulate-and-report functions approximate popular features of SPSS and Microsoft Excel. This package follows the principles of the "tidyverse" and works well with the pipe function %>%. janitor was built with beginning-to-intermediate R users in mind and is optimized for user-friendliness.
Maintained by Sam Firke. Last updated 3 months ago.
data-analysisdata-cleaningdata-sciencedirty-dataexcelpivot-tablesspsstabulationstidyverse
1.7 match 1.4k stars 19.15 score 35k scripts 231 dependentsschaffman5
rtf:Rich Text Format (RTF) Output
A set of R functions to output Rich Text Format (RTF) files with high resolution tables and graphics that may be edited with a standard word processor such as Microsoft Word.
Maintained by Michael E. Schaffer. Last updated 6 years ago.
3.8 match 5 stars 8.55 score 169 scripts 10 dependentsjoshwlambert
DAISIEprep:Extracts Phylogenetic Island Community Data from Phylogenetic Trees
Extracts colonisation and branching times of island species to be used for analysis in the R package 'DAISIE'. It uses phylogenetic and endemicity data to extract the separate island colonists and store them.
Maintained by Joshua W. Lambert. Last updated 1 months ago.
data-scienceisland-biogeographyphylogenetics
4.7 match 6 stars 6.78 score 24 scriptsr-lib
httr2:Perform HTTP Requests and Process the Responses
Tools for creating and modifying HTTP requests, then performing them and processing the results. 'httr2' is a modern re-imagining of 'httr' that uses a pipe-based interface and solves more of the problems that API wrapping packages face.
Maintained by Hadley Wickham. Last updated 7 days ago.
1.8 match 246 stars 17.66 score 1.9k scripts 1.1k dependentssherrisherry
cleandata:To Inspect and Manipulate Data; and to Keep Track of This Process
Functions to work with data frames to prepare data for further analysis. The functions for imputation, encoding, partitioning, and other manipulation can produce log files to keep track of the process.
Maintained by Sherry Zhao. Last updated 6 years ago.
data-analysisdata-miningmachine-learningwrangling
8.5 match 3 stars 3.72 score 35 scriptsr-lib
processx:Execute and Control System Processes
Tools to run system processes in the background. It can check if a background process is running; wait on a background process to finish; get the exit status of finished processes; kill background processes. It can read the standard output and error of the processes, using non-blocking connections. 'processx' can poll a process for standard output or error, with a timeout. It can also poll several processes at once.
Maintained by Gábor Csárdi. Last updated 22 days ago.
2.0 match 235 stars 15.53 score 340 scripts 1.4k dependentscran
tmcn:A Text Mining Toolkit for Chinese
A text mining toolkit for Chinese, which includes facilities for Chinese string processing, Chinese NLP support, and encoding detection and conversion. It also provides some functions to support the 'tm' package in Chinese.
Maintained by Jian Li. Last updated 6 years ago.
12.9 match 1 stars 2.38 score 5 dependentshenrikbengtsson
aroma.affymetrix:Analysis of Large Affymetrix Microarray Data Sets
A cross-platform R framework that facilitates processing of any number of Affymetrix microarray samples regardless of computer system. The only parameter that limits the number of chips that can be processed is the amount of available disk space. The Aroma Framework has successfully been used in studies to process tens of thousands of arrays. This package has actively been used since 2006.
Maintained by Henrik Bengtsson. Last updated 1 years ago.
infrastructureproprietaryplatformsexonarraymicroarrayonechannelguidataimportdatarepresentationpreprocessingqualitycontrolvisualizationreportwritingacghcopynumbervariantsdifferentialexpressiongeneexpressionsnptranscriptionaffymetrixanalysiscopy-numberdnaexpressionhpclarge-scalenotebookreproducibilityrna
5.3 match 10 stars 5.79 score 112 scripts 3 dependentsjhelvy
logitr:Logit Models w/Preference & WTP Space Utility Parameterizations
Fast estimation of multinomial (MNL) and mixed logit (MXL) models in R. Models can be estimated using "Preference" space or "Willingness-to-pay" (WTP) space utility parameterizations. Weighted models can also be estimated. An option is available to run a parallelized multistart optimization loop with random starting points in each iteration, which is useful for non-convex problems like MXL models or models with WTP space utility parameterizations. The main optimization loop uses the 'nloptr' package to minimize the negative log-likelihood function. Additional functions are available for computing and comparing WTP from both preference space and WTP space models and for predicting expected choices and choice probabilities for sets of alternatives based on an estimated model. Mixed logit models can include uncorrelated or correlated heterogeneity covariances and are estimated using maximum simulated likelihood based on the algorithms in Train (2009) <doi:10.1017/CBO9780511805271>. More details can be found in Helveston (2023) <doi:10.18637/jss.v105.i10>.
Maintained by John Helveston. Last updated 4 months ago.
log-likelihoodlogitlogit-modelmixed-logitmlogitmultinomial-regressionmxlmxl-modelspreference-spacepreferenceswillingness-to-paywtp
3.3 match 54 stars 9.10 score 119 scripts 1 dependentsbioc
Melissa:Bayesian clustering and imputation of single cell methylomes
Melissa is a Bayesian probabilistic model for jointly clustering and imputing single cell methylomes. This is done by taking into account local correlations via a Generalised Linear Model approach and global similarities using a mixture modelling approach.
Maintained by C. A. Kapourani. Last updated 5 months ago.
immunooncologydnamethylationgeneexpressiongeneregulationepigeneticsgeneticsclusteringfeatureextractionregressionrnaseqbayesiankeggsequencingcoveragesinglecell
6.2 match 4.90 score 7 scriptscran
RCurl:General Network (HTTP/FTP/...) Client Interface for R
A wrapper for 'libcurl' <https://curl.se/libcurl/>. Provides functions to allow one to compose general HTTP requests and provides convenient functions to fetch URIs, get & post forms, etc. and process the results returned by the Web server. This provides a great deal of control over the HTTP/FTP/... connection and the form of the request while providing a higher-level interface than is available just using R socket connections. Additionally, the underlying implementation is robust and extensive, supporting FTP/FTPS/TFTP (uploads and downloads), SSL/HTTPS, telnet, dict, ldap, and also supports cookies, redirects, authentication, etc.
Maintained by CRAN Team. Last updated 8 months ago.
3.6 match 2 stars 8.13 score 1.0k dependentsbflammers
ANN2:Artificial Neural Networks for Anomaly Detection
Training of neural networks for classification and regression tasks using mini-batch gradient descent. Special features include a function for training autoencoders, which can be used to detect anomalies, and some related plotting functions. Multiple activation functions are supported, including tanh, relu, step and ramp. For the use of the step and ramp activation functions in detecting anomalies using autoencoders, see Hawkins et al. (2002) <doi:10.1007/3-540-46145-0_17>. Furthermore, several loss functions are supported, including robust ones such as Huber and pseudo-Huber loss, as well as L1 and L2 regularization. The possible options for optimization algorithms are RMSprop, Adam and SGD with momentum. The package contains a vectorized C++ implementation that facilitates fast training through mini-batch learning.
Maintained by Bart Lammers. Last updated 4 years ago.
anomaly-detectionartificial-neural-networksautoencodersneural-networksrobust-statisticsopenblascppopenmp
5.3 match 13 stars 5.59 score 60 scriptsriatelab
gepaf:Google Encoded Polyline Algorithm Format
Encode and decode the Google Encoded Polyline Algorithm Format. See <https://developers.google.com/maps/documentation/utilities/polylinealgorithm> for more information.
Maintained by Timothée Giraud. Last updated 5 months ago.
7.5 match 6 stars 3.84 score 23 scriptsalshum
hashids:Generate Short Unique YouTube-Like IDs (Hashes) from Integers
An R port of the hashids library. hashids generates YouTube-like hashes from integers or vectors of integers. Hashes generated from integers are relatively short, unique and non-sequential. hashids can be used to generate unique ids for URLs and hide database row numbers from the user. By default hashids will avoid generating common English curse words by preventing certain letters from being next to each other. hashids are not one-way: it is easy to encode an integer to a hashid and decode a hashid back into an integer.
Maintained by Alex Shum. Last updated 6 years ago.
7.0 match 18 stars 4.10 score 14 scriptsthierryo
qrcode:Generate QRcodes with R
Create static QR codes in R. The content of the QR code is exactly what the user defines. We don't add a redirect URL, making it impossible for us to track the usage of the QR code. This makes it possible to generate fast, free-to-use and privacy-friendly QR codes.
Maintained by Thierry Onkelinx. Last updated 6 months ago.
qrcodeqrcode-generatorr-project
3.6 match 44 stars 7.56 score 456 scripts 7 dependentsenricoschumann
textutils:Utilities for Handling Strings and Text
Utilities for handling character vectors that store human-readable text (either plain or with markup, such as HTML or LaTeX). The package provides, in particular, functions that help with the preparation of plain-text reports, e.g. for expanding and aligning strings that form the lines of such reports. The package also provides generic functions for transforming R objects to HTML and to plain text.
Maintained by Enrico Schumann. Last updated 2 months ago.
3.7 match 11 stars 7.37 score 47 scripts 12 dependentsdmkaplan2000
knitrdata:Data Language Engine for 'knitr' / 'rmarkdown'
Implements a data language engine for incorporating data directly in 'rmarkdown' documents so that they can be made completely standalone.
Maintained by David M. Kaplan. Last updated 3 years ago.
5.7 match 7 stars 4.75 score 16 scriptsgravesee
onehot:Fast Onehot Encoding for Data.frames
Quickly create numeric matrices for machine learning algorithms that require them. It converts factor columns into onehot vectors.
Maintained by Eric E. Graves. Last updated 6 years ago.
4.9 match 11 stars 5.45 score 86 scripts 2 dependentsegenn
rtemis:Machine Learning and Visualization
Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.
Maintained by E.D. Gennatas. Last updated 1 months ago.
data-sciencedata-visualizationmachine-learningmachine-learning-libraryvisualization
3.8 match 145 stars 7.09 score 50 scripts 2 dependentstidymodels
hardhat:Construct Modeling Packages
Building modeling packages is hard. A large amount of effort generally goes into providing an implementation for a new method that is efficient, fast, and correct, but often less emphasis is put on the user interface. A good interface requires specialized knowledge about S3 methods and formulas, which the average package developer might not have. The goal of 'hardhat' is to reduce the burden around building new modeling packages by providing functionality for preprocessing, predicting, and validating input.
Maintained by Hannah Frick. Last updated 1 months ago.
1.8 match 103 stars 14.88 score 175 scripts 436 dependentsaphalo
SunCalcMeeus:Sun Position and Daylight Calculations
Compute the position of the sun, and local solar time using Meeus' formulae. Compute day and/or night length using different twilight definitions or arbitrary sun elevation angles. This package is part of the 'r4photobiology' suite, Aphalo, P. J. (2015) <doi:10.19232/uv4pb.2015.1.14>. Algorithms from Meeus (1998, ISBN:0943396611).
Maintained by Pedro J. Aphalo. Last updated 2 months ago.
4.0 match 1 stars 6.49 score 6 scripts 13 dependentsr-lib
desc:Manipulate DESCRIPTION Files
Tools to read, write, create, and manipulate DESCRIPTION files. It is intended for packages that create or manipulate other packages.
Maintained by Gábor Csárdi. Last updated 1 months ago.
1.8 match 123 stars 14.68 score 409 scripts 1.1k dependentstrevorhastie
glmnet:Lasso and Elastic-Net Regularized Generalized Linear Models
Extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression, Cox model, multiple-response Gaussian, and the grouped multinomial regression; see <doi:10.18637/jss.v033.i01> and <doi:10.18637/jss.v039.i05>. There are two new and important additions. The family argument can be a GLM family object, which opens the door to any programmed family (<doi:10.18637/jss.v106.i01>). This comes with a modest computational cost, so when the built-in families suffice, they should be used instead. The other novelty is the relax option, which refits each of the active sets in the path unpenalized. The algorithm uses cyclical coordinate descent in a path-wise fashion, as described in the papers cited.
Maintained by Trevor Hastie. Last updated 2 years ago.
1.7 match 82 stars 15.15 score 22k scripts 736 dependentsluca-scr
mclust:Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation
Gaussian finite mixture models fitted via EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.
Maintained by Luca Scrucca. Last updated 11 months ago.
2.0 match 21 stars 12.23 score 6.6k scripts 587 dependentsbioc
IPO:Automated Optimization of XCMS Data Processing parameters
The outcome of XCMS data processing strongly depends on the parameter settings. IPO (`Isotopologue Parameter Optimization`) is a parameter optimization tool that is applicable for different kinds of samples and liquid chromatography coupled to high resolution mass spectrometry devices, fast and free of labeling steps. IPO uses natural, stable 13C isotopes to calculate a peak picking score. Retention time correction is optimized by minimizing the relative retention time differences within features and grouping parameters are optimized by maximizing the number of features showing exactly one peak from each injection of a pooled sample. The different parameter settings are achieved by design of experiment. The resulting scores are evaluated using response surface models.
Maintained by Thomas Lieb. Last updated 5 months ago.
immunooncologymetabolomicsmassspectrometry
3.0 match 34 stars 8.14 score 41 scriptssymbolrush
osrmr:Wrapper for the 'OSRM' API
Wrapper around the 'Open Source Routing Machine (OSRM)' API <http://project-osrm.org/>. 'osrmr' works with API versions 4 and 5 and can handle servers that run locally as well as the 'OSRM' webserver.
Maintained by Adrian Staempfli. Last updated 4 years ago.
8.0 match 3.06 score 23 scriptsschochastics
shortuuid:Generate and Translate Standard UUIDs
Generate and translate standard UUIDs into shorter - or just different - formats and back. Also implements base58 encoders and decoders.
Maintained by David Schoch. Last updated 7 months ago.
9.4 match 4 stars 2.60 score 4 scriptsmyominnoo
mStats:Medical Statistics & Epidemiological Analysis
A set of tidyverse-friendly functions for data management, calculation of epidemiological measures, statistical analysis, and table creation.
Maintained by Myo Minn Oo. Last updated 1 years ago.
data-managementepidemiological-calculationsmedical-statistics
4.9 match 4.98 score 16 scripts 1 dependentsrubenarslan
codebook:Automatic Codebooks from Metadata Encoded in Dataset Attributes
Easily automate the following tasks to describe data frames: Summarise the distributions, and labelled missings of variables graphically and using descriptive statistics. For surveys, compute and summarise reliabilities (internal consistencies, retest, multilevel) for psychological scales. Combine this information with metadata (such as item labels and labelled values) that is derived from R attributes. To do so, the package relies on 'rmarkdown' partials, so you can generate HTML, PDF, and Word documents. Codebooks are also available as tables (CSV, Excel, etc.) and in JSON-LD, so that search engines can find your data and index the metadata. The metadata are also available at your fingertips via RStudio Addins.
Maintained by Ruben Arslan. Last updated 3 months ago.
codebookdocumentationformrjson-ldmetadataspsswebapp
2.9 match 142 stars 8.31 score 229 scriptsben519
mltools:Machine Learning Tools
A collection of machine learning helper functions, particularly assisting in the Exploratory Data Analysis phase. Makes heavy use of the 'data.table' package for optimal speed and memory efficiency. Highlights include a versatile bin_data() function, sparsify() for converting a data.table to sparse matrix format with one-hot encoding, fast evaluation metrics, and empirical_cdf() for calculating empirical Multivariate Cumulative Distribution Functions.
Maintained by Ben Gorman. Last updated 3 years ago.
exploratory-data-analysismachine-learning
2.5 match 72 stars 9.58 score 1.2k scripts 13 dependentscran
notebookutils:Dummy R APIs Used in 'Azure Synapse Analytics' for Local Developments
A pure dummy-interface package that mirrors the 'MsSparkUtils' APIs <https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/microsoft-spark-utilities?pivots=programming-language-r> of 'Azure Synapse Analytics' <https://learn.microsoft.com/en-us/azure/synapse-analytics/> for R users. Customers of Azure Synapse can download this package from CRAN for local development.
Maintained by runtimeexp. Last updated 11 months ago.
10.2 match 2.36 score 23 scriptsbioc
GenomicDistributions:GenomicDistributions: fast analysis of genomic intervals with Bioconductor
If you have a set of genomic ranges, this package can help you with visualization and comparison. It produces several kinds of plots, for example: Chromosome distribution plots, which visualize how your regions are distributed over chromosomes; feature distance distribution plots, which visualize how your regions are distributed relative to a feature of interest, like Transcription Start Sites (TSSs); genomic partition plots, which visualize how your regions overlap given genomic features such as promoters, introns, exons, or intergenic regions. It also makes it easy to compare one set of ranges to another.
Maintained by Kristyna Kupkova. Last updated 5 months ago.
softwaregenomeannotationgenomeassemblydatarepresentationsequencingcoveragefunctionalgenomicsvisualization
3.2 match 26 stars 7.44 score 25 scriptsbioc
multiGSEA:Combining GSEA-based pathway enrichment with multi omics data integration
Extracted features from pathways derived from 8 different databases (KEGG, Reactome, Biocarta, etc.) can be used on transcriptomic, proteomic, and/or metabolomic level to calculate a combined GSEA-based enrichment score.
Maintained by Sebastian Canzler. Last updated 2 months ago.
genesetenrichmentpathwaysreactomebiocarta
3.4 match 18 stars 7.06 score 32 scriptsvandomed
accelerometry:Functions for Processing Accelerometer Data
A collection of functions that perform operations on time-series accelerometer data, such as identify non-wear time, flag minutes that are part of an activity bout, and find the maximum 10-minute average count value. The functions are generally very flexible, allowing for a variety of algorithms to be implemented. Most of the functions are written in C++ for efficiency.
Maintained by Dane R. Van Domelen. Last updated 6 years ago.
accelerometerexercisemoving-averagephysical-activitysedentary-lifewearable-devicescpp
3.5 match 6 stars 6.62 score 31 scripts 5 dependentsbioc
ModCon:Modifying splice site usage by changing the mRNP code, while maintaining the genetic code
Collection of functions to calculate a surrounding nucleotide sequence for splice donor sites to either activate or repress donor usage. The proposed alternative nucleotide sequence encodes the same amino acid and could be applied e.g. in reporter systems to silence or activate cryptic splice donor sites.
Maintained by Johannes Ptok. Last updated 5 months ago.
functionalgenomicsalternativesplicing
5.8 match 1 stars 4.00 score 2 scriptsbioc
xCell2:A Tool for Generic Cell Type Enrichment Analysis
xCell2 provides methods for cell type enrichment analysis using cell type signatures. It includes three main functions - 1. xCell2Train for training custom references objects from bulk or single-cell RNA-seq datasets. 2. xCell2Analysis for conducting the cell type enrichment analysis using the custom reference. 3. xCell2GetLineage for identifying dependencies between different cell types using ontology.
Maintained by Almog Angel. Last updated 2 months ago.
geneexpressiontranscriptomicsmicroarrayrnaseqsinglecelldifferentialexpressionimmunooncologygenesetenrichment
3.8 match 6 stars 6.17 score 15 scriptsr-dbi
RPostgres:C++ Interface to PostgreSQL
Fully DBI-compliant C++-backed interface to PostgreSQL <https://www.postgresql.org/>, an open-source relational database.
Maintained by Kirill Müller. Last updated 19 days ago.
1.5 match 338 stars 14.78 score 1.6k scripts 31 dependentsgjmvanboxtel
gsignal:Signal Processing
R implementation of the 'Octave' package 'signal', containing a variety of signal processing tools, such as signal generation and measurement, correlation and convolution, filtering, filter design, filter analysis and conversion, power spectrum analysis, system identification, decimation and sample rate change, and windowing.
Maintained by Geert van Boxtel. Last updated 2 months ago.
2.3 match 24 stars 10.03 score 133 scripts 34 dependentsshikokuchuo
secretbase:Cryptographic Hash, Extendable-Output and Base64 Functions
Fast and memory-efficient streaming hash functions and base64 encoding / decoding. Hashes strings and raw vectors directly. Stream hashes files which can be larger than memory, as well as in-memory objects through R's serialization mechanism. Implementations include the SHA-256, SHA-3 and 'Keccak' cryptographic hash functions, SHAKE256 extendable-output function (XOF), and 'SipHash' pseudo-random function.
Maintained by Charlie Gao. Last updated 1 days ago.
base64cryptographic-hash-functionsextendable-output-functionskeccaksha256sha3shake256siphash
2.8 match 11 stars 8.14 score 8 scripts 24 dependentsmelff
memisc:Management of Survey Data and Presentation of Analysis Results
An infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) 'SPSS' and 'Stata' files is provided. Further, the package can produce tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates, which can be exported to 'LaTeX' and HTML.
Maintained by Martin Elff. Last updated 11 days ago.
1.8 match 46 stars 12.34 score 1.2k scripts 13 dependentsinsightsengineering
tern:Create Common TLGs Used in Clinical Trials
Table, Listings, and Graphs (TLG) library for common outputs used in clinical trials.
Maintained by Joe Zhu. Last updated 2 months ago.
clinical-trialsgraphslistingsnestoutputstables
1.8 match 79 stars 12.62 score 186 scripts 9 dependentsr-lib
nanoparquet:Read and Write 'Parquet' Files
Self-sufficient reader and writer for flat 'Parquet' files. Can read most 'Parquet' data types. Can write many 'R' data types, including factors and temporal types. See docs for limitations.
Maintained by Gábor Csárdi. Last updated 22 days ago.
2.3 match 60 stars 9.78 score 99 scripts 8 dependentssymbolixau
googleway:Accesses Google Maps APIs to Retrieve Data and Plot Maps
Provides a mechanism to plot a 'Google Map' from 'R' and overlay it with shapes and markers. Also provides access to 'Google Maps' APIs, including places, directions, roads, distances, geocoding, elevation and timezone.
Maintained by David Cooley. Last updated 6 months ago.
google-mapgoogle-mapsgoogle-maps-apigoogle-maps-javascript-apispatialspatial-analysis
2.3 match 236 stars 9.67 score 536 scripts 2 dependentsouhscbbmc
REDCapR:Interaction Between R and REDCap
Encapsulates functions to streamline calls from R to the REDCap API. REDCap (Research Electronic Data CAPture) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The Application Programming Interface (API) offers an avenue to access and modify data programmatically, improving the capacity for literate and reproducible programming.
Maintained by Will Beasley. Last updated 2 months ago.
1.8 match 118 stars 12.36 score 438 scripts 6 dependentsandrija-djurovic
PDtoolkit:Collection of Tools for PD Rating Model Development and Validation
The goal of this package is to cover the most common steps in probability of default (PD) rating model development and validation. The main procedures available are those that refer to univariate, bivariate, multivariate analysis, calibration and validation. Along with accompanied 'monobin' and 'monobinShiny' packages, 'PDtoolkit' provides functions which are suitable for different data transformation and modeling tasks such as: imputations, monotonic binning of numeric risk factors, binning of categorical risk factors, weights of evidence (WoE) and information value (IV) calculations, WoE coding (replacement of risk factors modalities with WoE values), risk factor clustering, area under curve (AUC) calculation and others. Additionally, package provides set of validation functions for testing homogeneity, heterogeneity, discriminatory and predictive power of the model.
Maintained by Andrija Djurovic. Last updated 1 years ago.
4.5 match 14 stars 4.78 score 86 scriptstrelliscope
trelliscope:Create Interactive Multi-Panel Displays
Trelliscope enables interactive exploration of data frames of visualizations.
Maintained by Ryan Hafen. Last updated 7 months ago.
3.3 match 29 stars 6.43 score 117 scriptsfanhansen
creditmodel:Toolkit for Credit Modeling, Analysis and Visualization
Provides a highly efficient R tool suite for Credit Modeling, Analysis and Visualization. Contains infrastructure functionalities such as data exploration and preparation, missing values treatment, outliers treatment, variable derivation, variable selection, dimensionality reduction, grid search for hyper parameters, data mining and visualization, model evaluation, strategy analysis etc. This package is designed to make the development of binary classification models (machine learning based models as well as credit scorecard) simpler and faster. The references include: 1 Refaat, M. (2011, ISBN: 9781447511199). Credit Risk Scorecard: Development and Implementation Using SAS; 2 Bezdek, James C. FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences (0098-3004), <DOI:10.1016/0098-3004(84)90020-7>.
Maintained by Dongping Fan. Last updated 3 years ago.
6.1 match 4 stars 3.48 score 15 scriptsr-lib
gmailr:Access the 'Gmail' 'RESTful' API
An interface to the 'Gmail' 'RESTful' API. Allows access to your 'Gmail' messages, threads, drafts and labels.
Maintained by Jennifer Bryan. Last updated 1 years ago.
1.8 match 230 stars 11.49 score 289 scripts 1 dependentsruthkr
deepredeff:Deep Learning Prediction of Effectors
A tool that contains trained deep learning models for predicting effector proteins. 'deepredeff' has been trained to identify effector proteins using a set of known experimentally validated effectors from either bacteria, fungi, or oomycetes. Documentation is available via several vignettes, and the paper by Kristianingsih and MacLean (2020) <doi:10.1101/2020.07.08.193250>.
Maintained by Ruth Kristianingsih. Last updated 2 years ago.
4.3 match 4 stars 4.86 score 18 scriptsapache
nanoarrow:Interface to the 'nanoarrow' 'C' Library
Provides an 'R' interface to the 'nanoarrow' 'C' library and the 'Apache Arrow' application binary interface. Functions to import and export 'ArrowArray', 'ArrowSchema', and 'ArrowArrayStream' 'C' structures to and from 'R' objects are provided alongside helpers to facilitate zero-copy data transfer among 'R' bindings to libraries implementing the 'Arrow' 'C' data interface.
Maintained by Dewey Dunnington. Last updated 23 hours ago.
1.8 match 183 stars 11.79 score 37 scripts 27 dependentsbodkan
slendr:A Simulation Framework for Spatiotemporal Population Genetics
A framework for simulating spatially explicit genomic data which leverages real cartographic information for programmatic and visual encoding of spatiotemporal population dynamics on real geographic landscapes. Population genetic models are then automatically executed by the 'SLiM' software by Haller et al. (2019) <doi:10.1093/molbev/msy228> behind the scenes, using a custom built-in simulation 'SLiM' script. Additionally, fully abstract spatial models not tied to a specific geographic location are supported, and users can also simulate data from standard, non-spatial, random-mating models. These can be simulated either with the 'SLiM' built-in back-end script, or using an efficient coalescent population genetics simulator 'msprime' by Baumdicker et al. (2022) <doi:10.1093/genetics/iyab229> with a custom-built 'Python' script bundled with the R package. Simulated genomic data is saved in a tree-sequence format and can be loaded, manipulated, and summarised using tree-sequence functionality via an R interface to the 'Python' module 'tskit' by Kelleher et al. (2019) <doi:10.1038/s41588-019-0483-y>. Complete model configuration, simulation and analysis pipelines can be therefore constructed without a need to leave the R environment, eliminating friction between disparate tools for population genetic simulations and data analysis.
Maintained by Martin Petr. Last updated 11 days ago.
popgenpopulation-geneticssimulationsspatial-statistics
2.3 match 56 stars 9.15 score 88 scriptsvjcitn
combinat:combinatorics utilities
routines for combinatorics
Maintained by Vince Carey. Last updated 12 years ago.
2.7 match 7.75 score 744 scripts 229 dependentsrichfitz
storr:Simple Key Value Stores
Creates and manages simple key-value stores. These can use a variety of approaches for storing the data. This package implements the base methods and support for file system, in-memory and DBI-based database stores.
Maintained by Rich FitzJohn. Last updated 4 years ago.
2.0 match 117 stars 10.21 score 57 scripts 33 dependentscran
wavethresh:Wavelets Statistics and Transforms
Performs 1, 2 and 3D real and complex-valued wavelet transforms, nondecimated transforms, wavelet packet transforms, nondecimated wavelet packet transforms, multiple wavelet transforms, complex-valued wavelet transforms, wavelet shrinkage for various kinds of data, locally stationary wavelet time series, nonstationary multiscale transfer function modeling, density estimation.
Maintained by Guy Nason. Last updated 7 months ago.
3.5 match 5.89 score 41 dependentsqinwf
jiebaR:Chinese Text Segmentation
Chinese text segmentation, keyword extraction and speech tagging for R.
Maintained by Qin Wenfeng. Last updated 5 years ago.
chinesechinese-text-segmentationcppjiebajiebalexical-analysisnlpcpp
2.0 match 348 stars 10.18 score 456 scripts 6 dependentsmichelnivard
gptstudio:Use Large Language Models Directly in your Development Environment
Large language models are readily accessible via API. This package lowers the barrier to use the API inside of your development environment. For more on the API, see <https://platform.openai.com/docs/introduction>.
Maintained by James Wade. Last updated 5 days ago.
chatgptgpt-3rstudiorstudio-addin
1.9 match 924 stars 10.83 score 43 scripts 1 dependentsbioc
DropletUtils:Utilities for Handling Single-Cell Droplet Data
Provides a number of utility functions for handling single-cell (RNA-seq) data from droplet technologies such as 10X Genomics. This includes data loading from count matrices or molecule information files, identification of cells from empty droplets, removal of barcode-swapped pseudo-cells, and downsampling of the count matrix.
Maintained by Jonathan Griffiths. Last updated 3 months ago.
immunooncologysinglecellsequencingrnaseqgeneexpressiontranscriptomicsdataimportcoveragezlibcpp
2.0 match 10.08 score 2.7k scripts 9 dependentsropensci
drake:A Pipeline Toolkit for Reproducible Computation at Scale
A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch, there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website <https://docs.ropensci.org/drake/> and the online manual <https://books.ropensci.org/drake/>.
Maintained by William Michael Landau. Last updated 3 months ago.
data-sciencedrakehigh-performance-computingmakefilepeer-reviewedpipelinereproducibilityreproducible-researchropensciworkflow
1.8 match 1.3k stars 11.49 score 1.7k scripts 1 dependentsbioc
immunotation:Tools for working with diverse immune genes
MHC (major histocompatibility complex) molecules are cell surface complexes that present antigens to T cells. The repertoire of antigens presented in a given genetic background largely depends on the sequence of the encoded MHC molecules, and thus, in humans, on the highly variable HLA (human leukocyte antigen) genes of the hyperpolymorphic HLA locus. More than 28,000 different HLA alleles have been reported, with significant differences in allele frequencies between human populations worldwide. Reproducible and consistent annotation of HLA alleles in large-scale bioinformatics workflows remains challenging, because the available reference databases and software tools often use different HLA naming schemes. The package immunotation provides tools for consistent annotation of HLA genes in typical immunoinformatics workflows such as for example the prediction of MHC-presented peptides in different human donors. Converter functions that provide mappings between different HLA naming schemes are based on the MHC restriction ontology (MRO). The package also provides automated access to HLA alleles frequencies in worldwide human reference populations stored in the Allele Frequency Net Database.
Maintained by Katharina Imkeller. Last updated 5 months ago.
softwareimmunooncologybiomedicalinformaticsgeneticsannotation
4.1 match 8 stars 4.90 score 3 scriptsgermanrecordlinkage
PPRL:Privacy Preserving Record Linkage
A toolbox for deterministic, probabilistic and privacy-preserving record linkage techniques. Combines the functionality of the 'Merge ToolBox' (<https://www.record-linkage.de>) with current privacy-preserving techniques.
Maintained by Dorothea Rukasz. Last updated 2 years ago.
7.5 match 2 stars 2.64 score 22 scriptsropensci
frictionless:Read and Write Frictionless Data Packages
Read and write Frictionless Data Packages. A 'Data Package' (<https://specs.frictionlessdata.io/data-package/>) is a simple container format and standard to describe and package a collection of (tabular) data. It is typically used to publish FAIR (<https://www.go-fair.org/fair-principles/>) and open datasets.
Maintained by Peter Desmet. Last updated 6 months ago.
2.0 match 30 stars 9.79 score 55 scripts 6 dependentstidymodels
textrecipes:Extra 'Recipes' for Text Processing
Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf) and feature hashing.
Maintained by Emil Hvitfeldt. Last updated 8 days ago.
1.8 match 160 stars 10.87 score 964 scripts 1 dependentscoolbutuseless
fastpng:Read and Write PNG Files with Configurable Decoder/Encoder Options
Read and write PNG images with arrays, rasters, native rasters, numeric arrays, integer arrays, raw vectors and indexed values. This PNG encoder exposes configurable internal options enabling the user to select a speed-size tradeoff. For example, disabling compression can speed up writing PNG by a factor of 50. Multiple image formats are supported including raster, native rasters, and integer and numeric arrays at color depths of 1, 2, 3 or 4. 16-bit images are also supported. This implementation uses the 'libspng' 'C' library which is available from <https://github.com/randy408/libspng/>.
Maintained by Mike Cheng. Last updated 2 months ago.
3.3 match 18 stars 5.86 score 7 scriptsncss-tech
soilDB:Soil Database Interface
A collection of functions for reading soil data from U.S. Department of Agriculture Natural Resources Conservation Service (USDA-NRCS) and National Cooperative Soil Survey (NCSS) databases.
Maintained by Andrew Brown. Last updated 6 days ago.
ksslnasisnrcssoilsoil-data-accesssoil-surveysoilwebsqlusda
1.7 match 87 stars 11.34 score 1.0k scripts 1 dependentsbioc
Structstrings:Implementation of the dot bracket annotations with Biostrings
The Structstrings package implements the widely used dot bracket annotation for storing base pairing information in structured RNA. Structstrings uses the infrastructure provided by the Biostrings package and derives the DotBracketString and related classes from the BString class. From these, base pair tables can be produced for in-depth analysis. In addition, the loop indices of the base pairs can be retrieved as well. For better efficiency, information conversion is implemented in C, inspired to a large extent by the ViennaRNA package.
Maintained by Felix G.M. Ernst. Last updated 4 months ago.
dataimportdatarepresentationinfrastructuresequencingsoftwarealignmentsequencematchingbioconductorrnarna-structural-analysisrna-structuresequencesstructures
3.0 match 4 stars 6.46 score 3 scripts 4 dependentsbioc
seqTools:Analysis of nucleotide, sequence and quality content on fastq files
Analyze read length, phred scores and alphabet frequency and DNA k-mers on uncompressed and compressed fastq files.
Maintained by Wolfgang Kaisers. Last updated 5 months ago.
3.5 match 5.57 score 52 scripts 1 dependentsropensci
av:Working with Audio and Video in R
Bindings to 'FFmpeg' <http://www.ffmpeg.org/> AV library for working with audio and video in R. Generates high quality video from images or R graphics with custom audio. Also offers high performance tools for reading raw audio, creating 'spectrograms', and converting between countless audio / video formats. This package interfaces directly to the C API and does not require any command line utilities.
Maintained by Jeroen Ooms. Last updated 1 months ago.
1.9 match 93 stars 10.28 score 552 scripts 15 dependentss-u
PKI:Public Key Infrastructure for R Based on the X.509 Standard
Public Key Infrastructure functions such as verifying certificates, RSA encryption and signing, which can be used to build PKI infrastructure and perform cryptographic tasks.
Maintained by Simon Urbanek. Last updated 7 months ago.
2.3 match 18 stars 8.52 score 63 scripts 8 dependentsprabinameher
EncDNA:Encoding of Nucleotide Sequences into Numeric Feature Vectors
We describe fifteen different splice site sequence encoding schemes that have been used in earlier studies for mapping of splice site sequences into numeric feature vectors. These encoding schemes will also be helpful for transforming other nucleotide sequences into numeric forms, provided they are of equal length. These encoding schemes will help the computational biologist working in the field of classification (binary or multiclass) or prediction involving nucleic acid sequences of equal length.
Maintained by Prabina Kumar Meher. Last updated 6 years ago.
19.1 match 1 stars 1.00 scoreprogram--
hilbert:Coordinate Indexing on Hilbert Curves
Provides utilities for encoding and decoding coordinates to/from Hilbert curves based on the iterative encoding implementation described in Chen et al. (2006) <doi:10.1002/spe.793>.
Maintained by Justin Singh-Mohudpur. Last updated 3 years ago.
4.3 match 5 stars 4.40 score 5 scriptsbioc
orthos:`orthos` is an R package for variance decomposition using conditional variational auto-encoders
`orthos` decomposes RNA-seq contrasts, for example obtained from a gene knock-out or compound treatment experiment, into unspecific and experiment-specific components. Original and decomposed contrasts can be efficiently queried against a large database of contrasts (derived from ARCHS4, https://maayanlab.cloud/archs4/) to identify similar experiments. `orthos` furthermore provides plotting functions to visualize the results of such a search for similar contrasts.
Maintained by Panagiotis Papasaikas. Last updated 4 days ago.
rnaseqdifferentialexpressiongeneexpression
4.4 match 4.18 score 2 scripts
ironholds
olctools:Open Location Code Handling in R
'Open Location Codes' (https://openlocationcode.com/) are a Google-created standard for identifying geographic locations. olctools provides utilities for validating, encoding and decoding entries that follow this standard.
Maintained by Oliver Keyes. Last updated 9 years ago.
3.6 match 13 stars 5.16 score 11 scripts
r-quantities
errors:Uncertainty Propagation for R Vectors
Support for measurement errors in R vectors, matrices and arrays: automatic uncertainty propagation and reporting. Documentation about 'errors' is provided in the paper by Ucar, Pebesma & Azcorra (2018, <doi:10.32614/RJ-2018-075>), included in this package as a vignette; see 'citation("errors")' for details.
Maintained by Iñaki Ucar. Last updated 2 months ago.
2.3 match 49 stars 8.18 score 86 scripts 4 dependents
gdemin
expss:Tables, Labels and Some Useful Functions from Spreadsheets and 'SPSS' Statistics
Package computes and displays tables with support for 'SPSS'-style labels, multiple and nested banners, weights, multiple-response variables and significance testing. There are facilities for nice output of tables in 'knitr', 'Shiny', '*.xlsx' files, R and 'Jupyter' notebooks. Methods for labelled variables add value labels support to base R functions and to some functions from other packages. Additionally, the package brings popular data transformation functions from 'SPSS' Statistics and 'Excel': 'RECODE', 'COUNT', 'COUNTIF', 'VLOOKUP', etc. These functions are very useful for data processing in marketing research surveys. The package is intended to help people move data processing from 'Excel' and 'SPSS' to R.
Maintained by Gregory Demin. Last updated 11 months ago.
excellabelslabels-supportmsexcelpivot-tablesrecodespssspss-statisticstablesvariable-labelsvlookup
1.7 match 84 stars 11.00 score 1.8k scripts 4 dependents
gojiplus
tuber:Client for the YouTube API
Get comments posted on YouTube videos, information on how many times a video has been liked, search for videos with particular content, and much more. You can also scrape captions from a few videos. To learn more about the YouTube API, see <https://developers.google.com/youtube/v3/>.
Maintained by Gaurav Sood. Last updated 8 months ago.
access-youtubecaptionvideoyoutubeyoutube-apiyoutube-oauth
2.0 match 184 stars 8.99 score 206 scripts
bergsmat
yamlet:Versatile Curation of Table Metadata
A YAML-based mechanism for working with table metadata. Supports compact syntax for creating, modifying, viewing, exporting, importing, displaying, and plotting metadata coded as column attributes. The 'yamlet' dialect is valid 'YAML' with defaults and conventions chosen to improve readability. See ?yamlet, ?decorate, ?modify, ?io_csv, and ?ggplot.decorated.
Maintained by Tim Bergsma. Last updated 22 days ago.
3.0 match 2 stars 5.99 score 60 scripts 1 dependents
natverse
nat:NeuroAnatomy Toolbox for Analysis of 3D Image Data
NeuroAnatomy Toolbox (nat) enables analysis and visualisation of 3D biological image data, especially traced neurons. Reads and writes 3D images in NRRD and 'Amira' AmiraMesh formats and reads surfaces in 'Amira' hxsurf format. Traced neurons can be imported from and written to SWC and 'Amira' LineSet and SkeletonGraph formats. These data can then be visualised in 3D via 'rgl', manipulated including applying calculated registrations, e.g. using the 'CMTK' registration suite, and analysed. There is also a simple representation for neurons that have been subjected to 3D skeletonisation but not formally traced; this allows morphological comparison between neurons including searches and clustering (via the 'nat.nblast' extension package).
Maintained by Gregory Jefferis. Last updated 5 months ago.
3dconnectomicsimage-analysisneuroanatomyneuroanatomy-toolboxneuronneuron-morphologyneurosciencevisualisation
1.8 match 67 stars 9.94 score 436 scripts 2 dependents
rapler
dst:Using the Theory of Belief Functions
Using the Theory of Belief Functions for evidence calculus. Basic probability assignments, or mass functions, can be defined on the subsets of a set of possible values and combined. A mass function can be extended to a larger frame. Marginalization, i.e. reduction to a smaller frame can also be done. These features can be combined to analyze small belief networks and take into account situations where information cannot be satisfactorily described by probability distributions.
Maintained by Peiyuan Zhu. Last updated 3 months ago.
3.0 match 6 stars 5.96 score 126 scripts
statnet
rle:Common Functions for Run-Length Encoded Vectors
Common 'base' and 'stats' methods for 'rle' objects, aiming to make it possible to treat them transparently as vectors.
Maintained by Pavel N. Krivitsky. Last updated 4 months ago.
2.9 match 2 stars 6.07 score 1 scripts 37 dependents
jamesyang007
adelie:Group Lasso and Elastic Net Solver for Generalized Linear Models
Extremely efficient procedures for fitting the entire group lasso and group elastic net regularization path for GLMs, multinomial, the Cox model and multi-task Gaussian models. Similar to the R package 'glmnet' in scope of models, and in computational speed. This package provides R bindings to the C++ code underlying the corresponding Python package 'adelie'. These bindings offer a general purpose group elastic net solver, a wide range of matrix classes that can exploit special structure to allow large-scale inputs, and an assortment of generalized linear model classes for fitting various types of data. The package is an implementation of Yang, J. and Hastie, T. (2024) <doi:10.48550/arXiv.2405.08631>.
Maintained by Trevor Hastie. Last updated 15 days ago.
3.0 match 6 stars 5.86 score 3 scripts
bioc
ROntoTools:R Onto-Tools suite
Suite of tools for functional analysis.
Maintained by Sorin Draghici. Last updated 5 months ago.
networkanalysismicroarraygraphsandnetworks
3.4 match 5.10 score 15 scripts 2 dependents
hope-data-science
tidyfst:Tidy Verbs for Fast Data Manipulation
A toolkit of tidy data manipulation verbs with 'data.table' as the backend. Combining the merits of syntax elegance from 'dplyr' and computing performance from 'data.table', 'tidyfst' intends to provide users with state-of-the-art data manipulation tools with least pain. This package is an extension of 'data.table'. While enjoying a tidy syntax, it also wraps combinations of efficient functions to facilitate frequently-used data operations.
Maintained by Tian-Yuan Huang. Last updated 6 months ago.
1.7 match 98 stars 10.09 score 118 scripts 4 dependents
bioc
OUTRIDER:OUTRIDER - OUTlier in RNA-Seq fInDER
Identification of aberrant gene expression in RNA-seq data. Read count expectations are modeled by an autoencoder to control for confounders in the data. Given these expectations, the RNA-seq read counts are assumed to follow a negative binomial distribution with a gene-specific dispersion. Outliers are then identified as read counts that significantly deviate from this distribution. Furthermore, OUTRIDER provides useful plotting functions to analyze and visualize the results.
Maintained by Christian Mertes. Last updated 5 months ago.
immunooncologyrnaseqtranscriptomicsalignmentsequencinggeneexpressiongeneticscount-datadiagnosticsexpression-analysismendelian-geneticsoutlier-detectionrna-seqopenblascpp
1.9 match 49 stars 9.07 score 110 scripts 1 dependents
tim-band
shinylight:Web Interface to 'R' Functions
Web front end for your 'R' functions producing plots or tables. If you have a function or set of related functions, you can make them available over the internet through a web browser. This is the same motivation as the 'shiny' package, but note that the development of 'shinylight' is not in any way linked to that of 'shiny' (beyond the use of the 'httpuv' package). You might prefer 'shinylight' to 'shiny' if you want a lighter weight deployment with easier horizontal scaling, or if you want to develop your front end yourself in JavaScript and HTML just using a lightweight remote procedure call interface to your R code on the server.
Maintained by Tim Band. Last updated 1 years ago.
5.3 match 3.18 score 1 scripts 1 dependents
bioc
HPiP:Host-Pathogen Interaction Prediction
HPiP (Host-Pathogen Interaction Prediction) uses an ensemble learning algorithm for prediction of host-pathogen protein-protein interactions (HP-PPIs) using structural and physicochemical descriptors computed from amino acid-composition of host and pathogen proteins. The proposed package can effectively address data shortages and data unavailability for HP-PPI network reconstructions. Moreover, establishing computational frameworks in that regard will reveal mechanistic insights into infectious diseases and suggest potential HP-PPI targets, thus narrowing down the range of possible candidates for subsequent wet-lab experimental validations.
Maintained by Matineh Rahmatbakhsh. Last updated 5 months ago.
proteomicssystemsbiologynetworkinferencestructuralpredictiongenepredictionnetwork
3.4 match 3 stars 4.95 score 6 scripts
bioc
epigraHMM:Epigenomic R-based analysis with hidden Markov models
epigraHMM provides a set of tools for the analysis of epigenomic data based on hidden Markov Models. It contains two separate peak callers, one for consensus peaks from biological or technical replicates, and one for differential peaks from multi-replicate multi-condition experiments. In differential peak calling, epigraHMM provides window-specific posterior probabilities associated with every possible combinatorial pattern of read enrichment across conditions.
Maintained by Pedro Baldoni. Last updated 5 months ago.
chipseqatacseqdnaseseqhiddenmarkovmodelepigeneticszlibopenblascppopenmp
3.4 match 4.94 score 88 scripts
bioc
ChIPseqR:Identifying Protein Binding Sites in High-Throughput Sequencing Data
ChIPseqR identifies protein binding sites from ChIP-seq and nucleosome positioning experiments. The model used to describe binding events was developed to locate nucleosomes but should be flexible enough to handle other types of experiments as well.
Maintained by Peter Humburg. Last updated 5 months ago.
3.5 match 4.70 score 1 scripts
henrikbengtsson
R.matlab:Read and Write MAT Files and Call MATLAB from Within R
Methods readMat() and writeMat() for reading and writing MAT files. For users with MATLAB v6 or newer installed (either locally or on a remote host), the package also provides methods for controlling MATLAB (trademark) via R and sending and retrieving data between R and MATLAB.
Maintained by Henrik Bengtsson. Last updated 3 years ago.
1.6 match 85 stars 10.55 score 2.9k scripts 25 dependents
shichenxie
scorecard:Credit Risk Scorecard
The `scorecard` package makes the development of credit risk scorecards easier and more efficient by providing functions for some common tasks, such as data partition, variable selection, woe binning, scorecard scaling, performance evaluation and report generation. These functions can also be used in the development of machine learning models. References include: 1. Refaat, M. (2011, ISBN: 9781447511199). Credit Risk Scorecard: Development and Implementation Using SAS. 2. Siddiqi, N. (2006, ISBN: 9780471754510). Credit risk scorecards. Developing and Implementing Intelligent Credit Scoring.
Maintained by Shichen Xie. Last updated 11 months ago.
binningcredit-scoringreleasescorecardwoewoebinning
2.0 match 164 stars 8.07 score 94 scripts
gi0na
ghypernet:Fit and Simulate Generalised Hypergeometric Ensembles of Graphs
Provides functions for model fitting and selection of generalised hypergeometric ensembles of random graphs (gHypEG). To learn how to use it, check the vignettes for a quick tutorial. Please reference its use as Casiraghi, G., Nanumyan, V. (2019) <doi:10.5281/zenodo.2555300> together with those relevant references from the one listed below. The package is based on the research developed at the Chair of Systems Design, ETH Zurich. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2016) <arXiv:1607.02441>. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2017) <doi:10.1007/978-3-319-67256-4_11>. Casiraghi, G., (2017) <arXiv:1702.02048> Brandenberger, L., Casiraghi, G., Nanumyan, V., Schweitzer, F. (2019) <doi:10.1145/3341161.3342926> Casiraghi, G. (2019) <doi:10.1007/s41109-019-0241-1>. Casiraghi, G., Nanumyan, V. (2021) <doi:10.1038/s41598-021-92519-y>. Casiraghi, G. (2021) <doi:10.1088/2632-072X/ac0493>.
Maintained by Giona Casiraghi. Last updated 11 months ago.
data-miningdata-sciencegraphsnetworknetwork-analysisrandom-graph-generationrandom-graphs
2.8 match 8 stars 5.68 score 20 scripts
cran
XML:Tools for Parsing and Generating XML Within R and S-Plus
Many approaches for both reading and creating XML (and HTML) documents (including DTDs), both local and accessible via HTTP or FTP. Also offers access to an 'XPath' "interpreter".
Maintained by CRAN Team. Last updated 2 months ago.
1.8 match 3 stars 8.87 score 1.3k dependents
bioc
FRASER:Find RAre Splicing Events in RNA-Seq Data
Detection of rare aberrant splicing events in transcriptome profiles. Read count ratio expectations are modeled by an autoencoder to control for confounding factors in the data. Given these expectations, the ratios are assumed to follow a beta-binomial distribution with a junction specific dispersion. Outlier events are then identified as read-count ratios that deviate significantly from this distribution. FRASER is able to detect alternative splicing, but also intron retention. The package aims to support diagnostics in the field of rare diseases where RNA-seq is performed to identify aberrant splicing defects.
Maintained by Christian Mertes. Last updated 5 months ago.
rnaseqalternativesplicingsequencingsoftwaregeneticscoverageaberrant-splicingdiagnosticsoutlier-detectionrare-diseaserna-seqsplicingopenblascpp
1.9 match 41 stars 8.50 score 155 scripts
bioboot
bio3d:Biological Structure Analysis
Utilities to process, organize and explore protein structure, sequence and dynamics data. Features include the ability to read and write structure, sequence and dynamic trajectory data, perform sequence and structure database searches, data summaries, atom selection, alignment, superposition, rigid core identification, clustering, torsion analysis, distance matrix analysis, structure and sequence conservation analysis, normal mode analysis, principal component analysis of heterogeneous structure data, and correlation network analysis from normal mode and molecular dynamics data. In addition, various utility functions are provided to enable the statistical and graphical power of the R environment to work with biological sequence and structural data. Please refer to the URLs below for more information.
Maintained by Barry Grant. Last updated 5 months ago.
1.9 match 5 stars 8.49 score 1.4k scripts 10 dependents
saraswatmks
superml:Build Machine Learning Models Like Using Python's Scikit-Learn Library in R
The idea is to provide a standard interface to users who use both R and Python for building machine learning models. This package provides a scikit-learn-style fit/predict interface to train machine learning models in R.
Maintained by Manish Saraswat. Last updated 1 years ago.
2.3 match 32 stars 7.05 score 117 scripts
bioc
QFeatures:Quantitative features for mass spectrometry data
The QFeatures infrastructure enables the management and processing of quantitative features for high-throughput mass spectrometry assays. It provides a familiar Bioconductor user experience to manage quantitative data across different assay levels (such as peptide spectrum matches, peptides and proteins) in a coherent and tractable format.
Maintained by Laurent Gatto. Last updated 12 days ago.
infrastructuremassspectrometryproteomicsmetabolomicsbioconductormass-spectrometry
1.3 match 27 stars 11.87 score 278 scripts 49 dependents
dipterix
dipsaus:A Dipping Sauce for Data Analysis and Visualizations
Works as an "add-on" to packages like 'shiny', 'future', as well as 'rlang', and provides utility functions. Just like dipping sauce adding flavors to potato chips or pita bread, 'dipsaus' for data analysis and visualizations adds handy functions and enhancements to popular packages. The goal is to provide simple solutions that are frequently asked for online, such as how to synchronize 'shiny' inputs without freezing the app, or how to get memory size on 'Linux' or 'MacOS' system. The enhancements roughly fall into these four categories: 1. 'shiny' input widgets; 2. high-performance computing using the 'future' package; 3. modify R calls and convert among numbers, strings, and other objects; 4. utility functions to get system information such as CPU chip-set, memory limit, etc.
Maintained by Zhengjia Wang. Last updated 4 days ago.
2.0 match 13 stars 7.90 score 85 scripts 3 dependents
parmsam
lzstring:Wrapper for 'lz-string' 'C++' Library
Provide access to the 'lz-string' <http://pieroxy.net/blog/pages/lz-string/index.html> 'C++' library for Lempel-Ziv (LZ) based compression and decompression of strings.
Maintained by Sam Parmar. Last updated 2 months ago.
3.6 match 1 stars 4.38 score 4 scripts 1 dependents
bnosac
udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. Namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns) sentiment scoring and semantic similarity analysis.
Maintained by Jan Wijffels. Last updated 2 years ago.
conlldependency-parserlemmatizationnatural-language-processingnlppos-taggingr-pkgrcpptext-miningtokenizerudpipecpp
1.3 match 215 stars 11.83 score 1.2k scripts 9 dependents
emilhvitfeldt
extrasteps:More Miscellaneous Steps for the 'recipes' Package
Contains additional miscellaneous steps for the 'recipes' package. These steps are useful, but don't have a good home in other 'recipes' packages or its extensions.
Maintained by Emil Hvitfeldt. Last updated 5 months ago.
3.8 match 10 stars 4.15 score 14 scripts
bioc
TFutils:TFutils
This package helps users to work with TF metadata from various sources. Significant catalogs of TFs and classifications thereof are made available. Tools for working with motif scans are also provided.
Maintained by Vincent Carey. Last updated 4 months ago.
3.3 match 4.80 score 21 scripts
bioc
seqArchR:Identify Different Architectures of Sequence Elements
seqArchR enables unsupervised discovery of _de novo_ clusters with characteristic sequence architectures characterized by position-specific motifs or composition of stretches of nucleotides, e.g., CG-richness. seqArchR does _not_ require any specifications w.r.t. the number of clusters, the length of any individual motifs, or the distance between motifs if and when they occur in pairs/groups; it directly detects them from the data. seqArchR uses non-negative matrix factorization (NMF) as its backbone, and employs a chunking-based iterative procedure that enables processing of large sequence collections efficiently. Wrapper functions are provided for visualizing cluster architectures as sequence logos.
Maintained by Sarvesh Nikumbh. Last updated 5 months ago.
motifdiscoverygeneregulationmathematicalbiologysystemsbiologytranscriptomicsgeneticsclusteringdimensionreductionfeatureextractiondnaseqnmfnonnegative-matrix-factorizationpromoter-sequence-architecturesscikit-learnsequence-analysissequence-architecturesunsupervised-machine-learning
3.5 match 1 stars 4.48 score 9 scripts 1 dependents
trivialfis
xgboost:Extreme Gradient Boosting
Extreme Gradient Boosting, which is an efficient implementation of the gradient boosting framework from Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>. This package is its R interface. The package includes efficient linear model solver and tree learning algorithms. The package can automatically do parallel computation on a single machine which could be more than 10 times faster than existing gradient boosting packages. It supports various objective functions, including regression, classification and ranking. The package is made to be extensible, so that users are also allowed to define their own objectives easily.
Maintained by Jiaming Yuan. Last updated 8 months ago.
1.3 match 6 stars 11.70 score 13k scripts 112 dependents
bioc
KBoost:Inference of gene regulatory networks from gene expression data
Reconstructing gene regulatory networks and transcription factor activity is crucial to understand biological processes and holds potential for developing personalized treatment. Yet, it is still an open problem as state-of-the-art algorithms are often not able to handle large amounts of data. Furthermore, many of the present methods predict numerous false positives and are unable to integrate other sources of information such as previously known interactions. Here we introduce KBoost, an algorithm that uses kernel PCA regression, boosting and Bayesian model averaging for fast and accurate reconstruction of gene regulatory networks. KBoost can also use a prior network built on previously known transcription factor targets. We have benchmarked KBoost using three different datasets against other high performing algorithms. The results show that our method compares favourably to other methods across datasets.
Maintained by Luis F. Iglesias-Martinez. Last updated 5 months ago.
networkgraphandnetworkbayesiannetworkinferencegeneregulationtranscriptomicssystemsbiologytranscriptiongeneexpressionregressionprincipalcomponent
3.4 match 4 stars 4.60 score 9 scripts
bioc
FastqCleaner:A Shiny Application for Quality Control, Filtering and Trimming of FASTQ Files
An interactive web application for quality control, filtering and trimming of FASTQ files. This user-friendly tool combines a pipeline for data processing based on Biostrings and ShortRead infrastructure, with a cutting-edge visual environment. Single-Read and Paired-End files can be locally processed. Diagnostic interactive plots (CG content, per-base sequence quality, etc.) are provided for both the input and output files.
Maintained by Leandro Roser. Last updated 5 months ago.
qualitycontrolsequencingsoftwaresangerseqsequencematchingcpp
3.8 match 4.00 score 4 scripts