Showing 200 of 737 total results
rpolars
polars:Lightning-Fast 'DataFrame' Library
Lightning-fast 'DataFrame' library written in 'Rust'. Convert R data to 'Polars' data and vice versa. Perform fast, lazy, larger-than-memory and optimized data queries. 'Polars' is interoperable with the package 'arrow', as both are based on the 'Apache Arrow' Columnar Format.
Maintained by Soren Welling. Last updated 1 day ago.
24.7 match 501 stars 12.01 score 1.0k scripts 2 dependents
joshuaulrich
xts:eXtensible Time Series
Provide for uniform handling of R's different time-based data classes by extending zoo, maximizing native format information preservation and allowing for user level customization and extension, while simplifying cross-class interoperability.
Maintained by Joshua M. Ulrich. Last updated 4 months ago.
10.4 match 221 stars 18.38 score 12k scripts 654 dependents
tidyverse
dplyr:A Grammar of Data Manipulation
A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
Maintained by Hadley Wickham. Last updated 17 days ago.
6.9 match 4.8k stars 24.68 score 659k scripts 7.8k dependents
safetygraphics
safetyCharts:Charts for Monitoring Clinical Trial Safety
Contains chart code for monitoring clinical trial safety. Charts can be used as standalone output, but are also designed for use with the 'safetyGraphics' package, which makes it easy to load data and customize the charts using an interactive web-based interface created with Shiny.
Maintained by Jeremy Wildfire. Last updated 6 months ago.
28.4 match 9 stars 5.36 score 21 scripts 1 dependent
eitsupi
neopolars:R Bindings for the 'polars' Rust Library
Lightning-fast 'DataFrame' library written in 'Rust'. Convert R data to 'Polars' data and vice versa. Perform fast, lazy, larger-than-memory and optimized data queries. 'Polars' is interoperable with the package 'arrow', as both are based on the 'Apache Arrow' Columnar Format.
Maintained by Tatsuya Shima. Last updated 3 days ago.
25.6 match 40 stars 4.87 score 1 script
bioc
S4Vectors:Foundation of vector-like and list-like containers in Bioconductor
The S4Vectors package defines the Vector and List virtual classes and a set of generic functions that extend the semantic of ordinary vectors and lists in R. Package developers can easily implement vector-like or list-like objects as concrete subclasses of Vector or List. In addition, a few low-level concrete subclasses of general interest (e.g. DataFrame, Rle, Factor, and Hits) are implemented in the S4Vectors package itself (many more are implemented in the IRanges package and in other Bioconductor infrastructure packages).
Maintained by Hervé Pagès. Last updated 1 month ago.
infrastructure, datarepresentation, bioconductor-package, core-package
7.5 match 18 stars 16.05 score 1.0k scripts 1.9k dependents
bioc
dada2:Accurate, high-resolution sample inference from amplicon sequencing data
The dada2 package infers exact amplicon sequence variants (ASVs) from high-throughput amplicon sequencing data, replacing the coarser and less accurate OTU clustering approach. The dada2 pipeline takes as input demultiplexed fastq files, and outputs the sequence variants and their sample-wise abundances after removing substitution and chimera errors. Taxonomic classification is available via a native implementation of the RDP naive Bayesian classifier, and species-level assignment to 16S rRNA gene fragments by exact matching.
Maintained by Benjamin Callahan. Last updated 5 months ago.
immunooncology, microbiome, sequencing, classification, metagenomics, amplicon, bioconductor, bioinformatics, metabarcoding, taxonomy, cpp
8.8 match 485 stars 13.17 score 3.0k scripts 4 dependents
bioc
gDRutils:A package with helper functions for processing drug response data
This package contains utility functions used throughout the gDR platform to fit data, manipulate data, and convert and validate data structures. This package also has the necessary default constants for gDR platform. Many of the functions are utilized by the gDRcore package.
Maintained by Arkadiusz Gladki. Last updated 15 hours ago.
14.2 match 2 stars 7.40 score 3 scripts 3 dependents
spatstat
spatstat.geom:Geometrical Functionality of the 'spatstat' Family
Defines spatial data types and supports geometrical operations on them. Data types include point patterns, windows (domains), pixel images, line segment patterns, tessellations and hyperframes. Capabilities include creation and manipulation of data (using command line or graphical interaction), plotting, geometrical operations (rotation, shift, rescale, affine transformation), convex hull, discretisation and pixellation, Dirichlet tessellation, Delaunay triangulation, pairwise distances, nearest-neighbour distances, distance transform, morphological operations (erosion, dilation, closing, opening), quadrat counting, geometrical measurement, geometrical covariance, colour maps, calculus on spatial domains, Gaussian blur, level sets of images, transects of images, intersections between objects, minimum distance matching. (Excludes spatial data on a network, which are supported by the package 'spatstat.linnet'.)
Maintained by Adrian Baddeley. Last updated 2 days ago.
classes-and-objects, distance-calculation, geometry, geometry-processing, images, mensuration, plotting, point-patterns, spatial-data, spatial-data-analysis
8.1 match 7 stars 12.10 score 241 scripts 227 dependents
r-lib
bit:Classes and Methods for Fast Memory-Efficient Boolean Selections
Provided are classes for boolean and skewed boolean vectors, fast boolean methods, fast unique and non-unique integer sorting, fast set operations on sorted and unsorted sets of integers, and foundations for ff (range index, compression, chunked processing).
Maintained by Michael Chirico. Last updated 9 days ago.
6.3 match 12 stars 15.15 score 131 scripts 3.2k dependents
rspatial
terra:Spatial Data Analysis
Methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data. Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data. Processing of very large files is supported. See the manual and tutorials on <https://rspatial.org/> to get started. 'terra' replaces the 'raster' package ('terra' can do more, and it is faster and easier to use).
Maintained by Robert J. Hijmans. Last updated 4 hours ago.
geospatial, raster, spatial, vector, onetbb, proj, gdal, geos, cpp
5.3 match 560 stars 17.64 score 17k scripts 851 dependents
talgalili
dendextend:Extending 'dendrogram' Functionality in R
Offers a set of functions for extending 'dendrogram' objects in R, letting you visualize and compare trees of 'hierarchical clusterings'. You can (1) Adjust a tree's graphical parameters - the color, size, type, etc of its branches, nodes and labels. (2) Visually and statistically compare different 'dendrograms' to one another.
Maintained by Tal Galili. Last updated 2 months ago.
5.4 match 154 stars 17.02 score 6.0k scripts 164 dependents
rekyt
funrar:Functional Rarity Indices Computation
Computes functional rarity indices as proposed by Violle et al. (2017) <doi:10.1016/j.tree.2017.02.002>. Various indices can be computed using both regional and local information. Functional Rarity combines both the functional aspect of rarity as well as the extent aspect of rarity. 'funrar' is presented in Grenié et al. (2017) <doi:10.1111/ddi.12629>.
Maintained by Matthias Grenié. Last updated 11 months ago.
ecological-models, ecology, rarity, traits
11.6 match 17 stars 7.85 score 233 scripts 1 dependent
rspatial
raster:Geographic Data Analysis and Modeling
Reading, writing, manipulating, analyzing and modeling of spatial data. This package has been superseded by the "terra" package <https://CRAN.R-project.org/package=terra>.
Maintained by Robert J. Hijmans. Last updated 2 months ago.
5.3 match 164 stars 17.05 score 58k scripts 555 dependents
r-lib
bit64:A S3 Class for Vectors of 64bit Integers
Package 'bit64' provides serializable S3 atomic 64bit (signed) integers. These are useful for handling database keys and exact counting in +-2^63. WARNING: do not use them as replacement for 32bit integers, integer64 are not supported for subscripting by R-core and they have different semantics when combined with double, e.g. integer64 + double => integer64. Class integer64 can be used in vectors, matrices, arrays and data.frames. Methods are available for coercion from and to logicals, integers, doubles, characters and factors as well as many elementwise and summary functions. Many fast algorithmic operations such as 'match' and 'order' support interactive data exploration and manipulation and optionally leverage caching.
Maintained by Michael Chirico. Last updated 7 days ago.
6.0 match 35 stars 14.91 score 1.5k scripts 3.2k dependents
ncss-tech
aqp:Algorithms for Quantitative Pedology
The Algorithms for Quantitative Pedology (AQP) project was started in 2009 to organize a loosely-related set of concepts and source code on the topic of soil profile visualization, aggregation, and classification into this package (aqp). Over the past 8 years, the project has grown into a suite of related R packages that enhance and simplify the quantitative analysis of soil profile data. Central to the AQP project is a new vocabulary of specialized functions and data structures that can accommodate the inherent complexity of soil profile information; freeing the scientist to focus on ideas rather than boilerplate data processing tasks <doi:10.1016/j.cageo.2012.10.020>. These functions and data structures have been extensively tested and documented, applied to projects involving hundreds of thousands of soil profiles, and deeply integrated into widely used tools such as SoilWeb <https://casoilresource.lawr.ucdavis.edu/soilweb-apps>. Components of the AQP project (aqp, soilDB, sharpshootR, soilReports packages) serve an important role in routine data analysis within the USDA-NRCS Soil Science Division. The AQP suite of R packages offer a convenient platform for bridging the gap between pedometric theory and practice.
Maintained by Dylan Beaudette. Last updated 1 month ago.
digital-soil-mapping, ncss-tech, nrcs, pedology, pedometrics, soil, soil-survey, usda
7.3 match 55 stars 11.90 score 1.2k scripts 2 dependents
cynkra
dm:Relational Data Models
Provides tools for working with multiple related tables, stored as data frames or in a relational database. Multiple tables (data and metadata) are stored in a compound object, which can then be manipulated with a pipe-friendly syntax.
Maintained by Kirill Müller. Last updated 3 months ago.
data-model, data-warehousing, datawarehousing, dbi, dbplyr, relational-databases
5.8 match 511 stars 14.81 score 410 scripts 8 dependents
markfairbanks
tidytable:Tidy Interface to 'data.table'
A tidy interface to 'data.table', giving users the speed of 'data.table' while using tidyverse-like syntax.
Maintained by Mark Fairbanks. Last updated 2 months ago.
7.2 match 458 stars 11.41 score 732 scripts 10 dependents
bioc
DelayedArray:A unified framework for working transparently with on-disk and in-memory array-like datasets
Wrapping an array-like object (typically an on-disk object) in a DelayedArray object allows one to perform common array operations on it without loading the object in memory. In order to reduce memory usage and optimize performance, operations on the object are either delayed or executed using a block processing mechanism. Note that this also works on in-memory array-like objects like DataFrame objects (typically with Rle columns), Matrix objects, ordinary arrays, and data frames.
Maintained by Hervé Pagès. Last updated 1 month ago.
infrastructure, datarepresentation, annotation, genomeannotation, bioconductor-package, core-package, u24ca289073
5.3 match 27 stars 15.59 score 538 scripts 1.2k dependents
winvector
wrapr:Wrap R Tools for Debugging and Parametric Programming
Tools for writing and debugging R code. Provides: '%.>%' dot-pipe (an 'S3' configurable pipe), unpack/to (R style multiple assignment/return), 'build_frame()'/'draw_frame()' ('data.frame' example tools), 'qc()' (quoting concatenate), ':=' (named map builder), 'let()' (converts non-standard evaluation interfaces to parametric standard evaluation interfaces, inspired by 'gtools::strmacro()' and 'base::bquote()'), and more.
Maintained by John Mount. Last updated 2 years ago.
6.6 match 137 stars 11.11 score 390 scripts 12 dependents
troyhernandez
tinyspotifyr:Tinyverse R Wrapper for the 'Spotify' Web API
An R wrapper for the 'Spotify' Web API <https://developer.spotify.com/web-api/>.
Maintained by Troy Hernandez. Last updated 1 year ago.
14.8 match 13 stars 4.81 score 5 scripts
r-lib
vctrs:Vector Helpers
Defines new notions of prototype and size that are used to provide tools for consistent and well-founded type-coercion and size-recycling, and are in turn connected to ideas of type- and size-stability useful for analysing function interfaces.
Maintained by Davis Vaughan. Last updated 5 months ago.
3.8 match 290 stars 18.97 score 1.1k scripts 13k dependents
bioc
BiocGenerics:S4 generic functions used in Bioconductor
The package defines many S4 generic functions used in Bioconductor.
Maintained by Hervé Pagès. Last updated 1 month ago.
infrastructure, bioconductor-package, core-package
5.0 match 12 stars 14.22 score 612 scripts 2.2k dependents
sebkrantz
collapse:Advanced and Fast Data Transformation
A C/C++ based package for advanced data transformation and statistical computing in R that is extremely fast, class-agnostic, robust and programmer friendly. Core functionality includes a rich set of S3 generic grouped and weighted statistical functions for vectors, matrices and data frames, which provide efficient low-level vectorizations, OpenMP multithreading, and skip missing values by default. These are integrated with fast grouping and ordering algorithms (also callable from C), and efficient data manipulation functions. The package also provides a flexible and rigorous approach to time series and panel data in R. It further includes fast functions for common statistical procedures, detailed (grouped, weighted) summary statistics, powerful tools to work with nested data, fast data object conversions, functions for memory efficient R programming, and helpers to effectively deal with variable labels, attributes, and missing data. It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf', 'units', 'plm' (panel-series and data frames), and 'xts'/'zoo'.
Maintained by Sebastian Krantz. Last updated 9 days ago.
data-aggregation, data-analysis, data-manipulation, data-processing, data-science, data-transformation, econometrics, high-performance, panel-data, scientific-computing, statistics, time-series, weighted, weights, cpp, openmp
4.3 match 672 stars 16.63 score 708 scripts 97 dependents
darwin-eu
omopgenerics:Methods and Classes for the OMOP Common Data Model
Provides definitions of core classes and methods used by analytic pipelines that query the OMOP (Observational Medical Outcomes Partnership) common data model.
Maintained by Martí Català. Last updated 13 days ago.
6.9 match 9.97 score 193 scripts 16 dependents
bioc
InterCellar:InterCellar: an R-Shiny app for interactive analysis and exploration of cell-cell communication in single-cell transcriptomics
InterCellar is implemented as an R/Bioconductor Package containing a Shiny app that allows users to interactively analyze cell-cell communication from scRNA-seq data. Starting from precomputed ligand-receptor interactions, InterCellar provides filtering options, annotations and multiple visualizations to explore clusters, genes and functions. Finally, based on functional annotation from Gene Ontology and pathway databases, InterCellar implements data-driven analyses to investigate cell-cell communication in one or multiple conditions.
Maintained by Marta Interlandi. Last updated 5 months ago.
software, singlecell, visualization, go, transcriptomics
13.8 match 9 stars 4.95 score 7 scripts
kwb-r
kwb.utils:General Utility Functions Developed at KWB
This package contains some small helper functions that aim at improving the quality of code developed at Kompetenzzentrum Wasser gGmbH (KWB).
Maintained by Hauke Sonnenberg. Last updated 12 months ago.
9.2 match 8 stars 7.33 score 12 scripts 78 dependents
davidcsterratt
geometry:Mesh Generation and Surface Tessellation
Makes the 'Qhull' library <http://www.qhull.org> available in R, in a similar manner as in Octave and MATLAB. Qhull computes convex hulls, Delaunay triangulations, halfspace intersections about a point, Voronoi diagrams, furthest-site Delaunay triangulations, and furthest-site Voronoi diagrams. It runs in 2D, 3D, 4D, and higher dimensions. It implements the Quickhull algorithm for computing the convex hull. Qhull does not support constrained Delaunay triangulations, or mesh generation of non-convex objects, but the package does include some R functions that allow for this.
Maintained by David C. Sterratt. Last updated 1 month ago.
5.0 match 16 stars 12.98 score 776 scripts 139 dependents
mkearney
funique:A Faster Unique Function
Similar to base's unique function, only optimized for working with data frames, especially those that contain date-time columns.
Maintained by Michael Wayne Kearney. Last updated 7 years ago.
data-frame, data-wrangling, date-time, duplicates, mkearney-r-package, posix, posixct, unique
15.9 match 20 stars 4.00 score 7 scripts
bioc
ORFik:Open Reading Frames in Genomics
R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed; as the name hints, it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as its primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. The package allows reassigning transcript starts using CAGE-Seq data, automatic shifting of RiboSeq reads, finding Open Reading Frames for whole genomes, and much more.
Maintained by Haakon Tjeldnes. Last updated 1 month ago.
immunooncology, software, sequencing, riboseq, rnaseq, functionalgenomics, coverage, alignment, dataimport, cpp
5.4 match 33 stars 10.56 score 115 scripts 2 dependents
charlie86
spotifyr:R Wrapper for the 'Spotify' Web API
An R wrapper for pulling data from the 'Spotify' Web API <https://developer.spotify.com/documentation/web-api/> in bulk, or post items on a 'Spotify' user's playlist.
Maintained by Daniel Antal. Last updated 5 months ago.
music-information-retrieval, spotify
6.6 match 374 stars 8.54 score 936 scripts
bioc
limma:Linear Models for Microarray and Omics Data
Data analysis, linear models and differential expression for omics data.
Maintained by Gordon Smyth. Last updated 1 day ago.
exonarray, geneexpression, transcription, alternativesplicing, differentialexpression, differentialsplicing, genesetenrichment, dataimport, bayesian, clustering, regression, timecourse, microarray, micrornaarray, mrnamicroarray, onechannel, proprietaryplatforms, twochannel, sequencing, rnaseq, batcheffect, multiplecomparison, normalization, preprocessing, qualitycontrol, biomedicalinformatics, cellbiology, cheminformatics, epigenetics, functionalgenomics, genetics, immunooncology, metabolomics, proteomics, systemsbiology, transcriptomics
4.0 match 13.81 score 16k scripts 586 dependents
usdaforestservice
FIESTAutils:Utility Functions for Forest Inventory Estimation and Analysis
A set of tools for data wrangling, spatial data analysis, statistical modeling (including direct, model-assisted, photo-based, and small area tools), and USDA Forest Service data base tools. These tools are aimed to help Foresters, Analysts, and Scientists extract and perform analyses on USDA Forest Service data.
Maintained by Grayson White. Last updated 5 days ago.
8.1 match 8 stars 6.33 score 1 dependent
bioc
scuttle:Single-Cell RNA-Seq Analysis Utilities
Provides basic utility functions for performing single-cell analyses, focusing on simple normalization, quality control and data transformations. Also provides some helper functions to assist development of other packages.
Maintained by Aaron Lun. Last updated 5 months ago.
immunooncology, singlecell, rnaseq, qualitycontrol, preprocessing, normalization, transcriptomics, geneexpression, sequencing, software, dataimport, openblas, cpp
5.0 match 10.21 score 1.7k scripts 80 dependents
tomasfryda
h2o:R Interface for the 'H2O' Scalable Machine Learning Platform
R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
Maintained by Tomas Fryda. Last updated 1 year ago.
6.2 match 3 stars 8.20 score 7.8k scripts 11 dependents
jmw86069
jamba:Just Analysis Methods Base
Just analysis methods ('jam') base functions focused on bioinformatics. Version- and gene-centric alphanumeric sort, unique name and version assignment, colorized console and 'HTML' output, color ramp and palette manipulation, 'Rmarkdown' cache import, styled 'Excel' worksheet import and export, interpolated raster output from smooth scatter and image plots, list to delimited vector, efficient list tools.
Maintained by James M. Ward. Last updated 13 days ago.
9.1 match 6 stars 5.43 score
tdhock
directlabels:Direct Labels for Multicolor Plots
An extensible framework for automatically placing direct labels onto multicolor 'lattice' or 'ggplot2' plots. Label positions are described using Positioning Methods which can be re-used across several different plots. There are heuristics for examining "trellis" and "ggplot" objects and inferring an appropriate Positioning Method.
Maintained by Toby Dylan Hocking. Last updated 11 months ago.
4.5 match 83 stars 10.62 score 1.8k scripts 16 dependents
renkun-ken
rlist:A Toolbox for Non-Tabular Data Manipulation
Provides a set of functions for data manipulation with list objects, including mapping, filtering, grouping, sorting, updating, searching, and other useful functions. Most functions are designed to be pipeline friendly so that data processing with lists can be chained.
Maintained by Kun Ren. Last updated 2 years ago.
3.5 match 206 stars 13.73 score 2.2k scripts 123 dependents
animint
animint2:Animated Interactive Grammar of Graphics
Functions are provided for defining animated, interactive data visualizations in R code, and rendering on a web page. The 2018 Journal of Computational and Graphical Statistics paper, <doi:10.1080/10618600.2018.1513367> describes the concepts implemented.
Maintained by Toby Hocking. Last updated 1 month ago.
5.3 match 64 stars 8.84 score 173 scripts
bioc
IRanges:Foundation of integer range manipulation in Bioconductor
Provides efficient low-level and highly reusable S4 classes for storing, manipulating and aggregating over annotated ranges of integers. Implements an algebra of range operations, including efficient algorithms for finding overlaps and nearest neighbors. Defines efficient list-like classes for storing, transforming and aggregating large grouped data, i.e., collections of atomic vectors and DataFrames.
Maintained by Hervé Pagès. Last updated 1 month ago.
infrastructure, datarepresentation, bioconductor-package, core-package
3.0 match 22 stars 15.09 score 2.1k scripts 1.8k dependents
dataoneorg
dataone:R Interface to the DataONE REST API
Provides read and write access to data and metadata from the DataONE network <https://www.dataone.org> of data repositories. Each DataONE repository implements a consistent repository application programming interface. Users call methods in R to access these remote repository functions, such as methods to query the metadata catalog, get access to metadata for particular data packages, and read the data objects from the data repository. Users can also insert and update data objects on repositories that support these methods.
Maintained by Matthew B. Jones. Last updated 3 years ago.
4.5 match 36 stars 9.93 score 472 scripts 3 dependents
dariah-fi-survey-concept-network
finnsurveytext:Analyse Open-Ended Survey Responses in Finnish
Annotates Finnish textual survey responses into CoNLL-U format using Finnish treebanks from <https://universaldependencies.org/format.html> using UDPipe as described in Straka and Straková (2017) <doi:10.18653/v1/K17-3009>. Formatted data is then analysed using single or comparison n-gram plots, wordclouds, summary tables and Concept Network plots. The Concept Network plots use the TextRank algorithm as outlined in Mihalcea, Rada & Tarau, Paul (2004) <https://aclanthology.org/W04-3252/>.
Maintained by Adeline Clarke. Last updated 14 days ago.
8.4 match 5.39 score 27 scripts
sfilges
umiAnalyzer:Tools for Analyzing Sequencing Data with Unique Molecular Identifiers
Tools for analyzing sequencing data containing unique molecular identifiers generated by 'UMIErrorCorrect' (<https://github.com/stahlberggroup/umierrorcorrect>).
Maintained by Stefan Filges. Last updated 3 years ago.
targeted-sequencing, unique-molecular-identifiers, variant-analysis
10.1 match 4.46 score 58 scripts
tidyverse
ggplot2:Create Elegant Data Visualisations Using the Grammar of Graphics
A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
Maintained by Thomas Lin Pedersen. Last updated 13 days ago.
data-visualisation, visualisation
1.8 match 6.6k stars 25.10 score 645k scripts 7.5k dependents
data-cleaning
validate:Data Validation Infrastructure
Declare data validation rules and data quality indicators; confront data with them and analyze or visualize the results. The package supports rules that are per-field, in-record, cross-record or cross-dataset. Rules can be automatically analyzed for rule type and connectivity. Supports checks implied by an SDMX DSD file as well. See also Van der Loo and De Jonge (2018) <doi:10.1002/9781118897126>, Chapter 6 and the JSS paper (2021) <doi:10.18637/jss.v097.i10>.
Maintained by Mark van der Loo. Last updated 16 days ago.
3.5 match 418 stars 12.50 score 448 scripts 9 dependents
mmaechler
sfsmisc:Utilities from 'Seminar fuer Statistik' ETH Zurich
Useful utilities ['goodies'] from Seminar fuer Statistik ETH Zurich, some of which were ported from S-plus in the 1990s. For graphics, have pretty (Log-scale) axes eaxis(), an enhanced Tukey-Anscombe plot, combining histogram and boxplot, 2d-residual plots, a 'tachoPlot()', pretty arrows, etc. For robustness, have a robust F test and robust range(). For system support, notably on Linux, provides 'Sys.*()' functions with more access to system and CPU information. Finally, miscellaneous utilities such as simple efficient prime numbers, integer codes, Duplicated(), toLatex.numeric() and is.whole().
Maintained by Martin Maechler. Last updated 5 months ago.
4.0 match 11 stars 10.87 score 566 scripts 119 dependents
larmarange
labelled:Manipulating Labelled Data
Work with labelled data imported from 'SPSS' or 'Stata' with 'haven' or 'foreign'. This package provides useful functions to deal with "haven_labelled" and "haven_labelled_spss" classes introduced by 'haven' package.
Maintained by Joseph Larmarange. Last updated 30 days ago.
haven, labels, metadata, sas, spss, stata
2.9 match 76 stars 15.04 score 2.4k scripts 98 dependents
schochastics
netrankr:Analyzing Partial Rankings in Networks
Implements methods for centrality related analyses of networks. While the package includes the possibility to build more than 20 indices, its main focus lies on index-free assessment of centrality via partial rankings obtained by neighborhood-inclusion or positional dominance. These partial rankings can be analyzed with different methods, including probabilistic methods like computing expected node ranks and relative rank probabilities (how likely is it that a node is more central than another?). The methodology is described in depth in the vignettes and in Schoch (2018) <doi:10.1016/j.socnet.2017.12.003>.
Maintained by David Schoch. Last updated 1 month ago.
network-analysis, network-centrality, openblas, cpp, openmp
4.5 match 49 stars 9.56 score 91 scripts 2 dependents
epimodel
EpiModel:Mathematical Modeling of Infectious Disease Dynamics
Tools for simulating mathematical models of infectious disease dynamics. Epidemic model classes include deterministic compartmental models, stochastic individual-contact models, and stochastic network models. Network models use the robust statistical methods of exponential-family random graph models (ERGMs) from the Statnet suite of software packages in R. Standard templates for epidemic modeling include SI, SIR, and SIS disease types. EpiModel features an API for extending these templates to address novel scientific research aims. Full methods for EpiModel are detailed in Jenness et al. (2018, <doi:10.18637/jss.v084.i08>).
Maintained by Samuel Jenness. Last updated 2 months ago.
agent-based-modeling, epidemics, epidemiology, infectious-diseases, network-graph, cpp
3.7 match 250 stars 11.57 score 315 scripts
mhahsler
arules:Mining Association Rules and Frequent Itemsets
Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. Hahsler, Gruen and Hornik (2005) <doi:10.18637/jss.v014.i15>.
Maintained by Michael Hahsler. Last updated 1 month ago.
arules, association-rules, frequent-itemsets
3.0 match 194 stars 13.99 score 3.3k scripts 28 dependents
ggseg
ggseg:Plotting Tool for Brain Atlases
Contains 'ggplot2' geom for plotting brain atlases using simple features. The largest component of the package is the data for the two built-in atlases. Mowinckel & Vidal-Piñeiro (2020) <doi:10.1177/2515245920928009>.
Maintained by Athanasia Mo Mowinckel. Last updated 2 years ago.
3.6 match 221 stars 11.57 score 590 scripts 14 dependents
randrescastaneda
joyn:Tool for Diagnosis of Tables Joins and Complementary Join Features
Tool for diagnosing table joins. It combines the speed of `collapse` and `data.table`, the flexibility of `dplyr`, and the diagnosis and features of the `merge` command in `Stata`.
Maintained by R.Andres Castaneda. Last updated 3 months ago.
6.1 match 9 stars 6.83 score 31 scripts
rdatatable
data.table:Extension of `data.frame`
Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
Maintained by Tyson Barrett. Last updated 3 days ago.
1.8 match 3.7k stars 23.53 score 230k scripts 4.6k dependents
metabocomp
MUVR2:Multivariate Methods with Unbiased Variable Selection
Predictive multivariate modelling for metabolomics. Types: Classification and regression. Methods: Partial Least Squares, Random Forest and Elastic Net. Data structures: Paired and unpaired. Validation: repeated double cross-validation (Westerhuis et al. (2008) <doi:10.1007/s11306-007-0099-6>, Filzmoser et al. (2009) <doi:10.1002/cem.1225>). Variable selection: Performed internally, through tuning in the inner cross-validation loop.
Maintained by Yingxiao Yan. Last updated 6 months ago.
10.0 match 2 stars 4.04 score 1 script
andrie
surveydata:Tools to Work with Survey Data
Data obtained from surveys contains information not only about the survey responses, but also the survey metadata, e.g. the original survey questions and the answer options. The 'surveydata' package makes it easy to keep track of this metadata, and to easily extract columns with specific questions.
Maintained by Andrie de Vries. Last updated 2 years ago.
7.0 match 23 stars 5.68 score 42 scripts
bioc
QFeatures:Quantitative features for mass spectrometry data
The QFeatures infrastructure enables the management and processing of quantitative features for high-throughput mass spectrometry assays. It provides a familiar Bioconductor user experience to manage quantitative data across different assay levels (such as peptide spectrum matches, peptides and proteins) in a coherent and tractable format.
Maintained by Laurent Gatto. Last updated 16 days ago.
infrastructure massspectrometry proteomics metabolomics bioconductor mass-spectrometry
3.3 match 27 stars 11.87 score 278 scripts 49 dependents
rstudio
reticulate:Interface to 'Python'
Interface to 'Python' modules, classes, and functions. When calling into 'Python', R data types are automatically converted to their equivalent 'Python' types. When values are returned from 'Python' to R they are converted back to R types. Compatible with all versions of 'Python' >= 2.7.
Maintained by Tomasz Kalinowski. Last updated 2 days ago.
1.9 match 1.7k stars 21.05 score 18k scripts 432 dependents
egenn
rtemis:Machine Learning and Visualization
Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.
Maintained by E.D. Gennatas. Last updated 1 month ago.
data-science data-visualization machine-learning machine-learning-library visualization
5.5 match 145 stars 7.09 score 50 scripts 2 dependents
nathaneastwood
poorman:A Poor Man's Dependency Free Recreation of 'dplyr'
A replication of key functionality from 'dplyr' and the wider 'tidyverse' using only 'base'.
Maintained by Nathan Eastwood. Last updated 1 year ago.
base-r data-manipulation grammar
3.6 match 341 stars 10.79 score 156 scripts 27 dependents
ropensci
ritis:Integrated Taxonomic Information System Client
An interface to the Integrated Taxonomic Information System ('ITIS') (<https://www.itis.gov>). Includes functions to work with the 'ITIS' REST API methods (<https://www.itis.gov/ws_description.html>), as well as the 'Solr' web service (<https://www.itis.gov/solr_documentation.html>).
Maintained by Julia Blum. Last updated 1 month ago.
taxonomybiologynomenclaturejsonapiwebapi-clientidentifiersspeciesnamesapi-wrapperitistaxize
5.1 match 16 stars 7.72 score 64 scripts 24 dependents
emmanuelparadis
ape:Analyses of Phylogenetics and Evolution
Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.
Maintained by Emmanuel Paradis. Last updated 4 days ago.
2.3 match 64 stars 17.22 score 13k scripts 599 dependents
jonesor
Rcompadre:Utilities for using the 'COM(P)ADRE' Matrix Model Database
Utility functions for interacting with the 'COMPADRE' and 'COMADRE' databases of matrix population models. Described in Jones et al. (2021) <doi:10.1101/2021.04.26.441330>.
Maintained by Owen Jones. Last updated 5 months ago.
4.9 match 11 stars 7.74 score 55 scripts 2 dependents
tmungle
allMT:Acute Lymphoblastic Leukemia Maintenance Therapy Analysis
Evaluates acute lymphoblastic leukemia maintenance therapy practice at patient and cohort level.
Maintained by Tushar Mungle. Last updated 29 days ago.
10.1 match 3.70 score 2 scripts
opengeos
whitebox:'WhiteboxTools' R Frontend
An R frontend for the 'WhiteboxTools' library, which is an advanced geospatial data analysis platform developed by Prof. John Lindsay at the University of Guelph's Geomorphometry and Hydrogeomatics Research Group. 'WhiteboxTools' can be used to perform common geographical information systems (GIS) analysis operations, such as cost-distance analysis, distance buffering, and raster reclassification. Remote sensing and image processing tasks include image enhancement (e.g. panchromatic sharpening, contrast adjustments), image mosaicing, numerous filtering operations, simple classification (k-means), and common image transformations. 'WhiteboxTools' also contains advanced tooling for spatial hydrological analysis (e.g. flow-accumulation, watershed delineation, stream network analysis, sink removal), terrain analysis (e.g. common terrain indices such as slope, curvatures, wetness index, hillshading; hypsometric analysis; multi-scale topographic position analysis), and LiDAR data processing. Suggested citation: Lindsay (2016) <doi:10.1016/j.cageo.2016.07.003>.
Maintained by Andrew Brown. Last updated 5 months ago.
geomorphometry geoprocessing geospatial gis hydrology remote-sensing rstudio
3.9 match 173 stars 9.65 score 203 scripts 2 dependents
ropensci
rsat:Dealing with Multiplatform Satellite Images
Downloading, customizing, and processing time series of satellite images for a region of interest. 'rsat' functions allow a unified access to multispectral images from Landsat, MODIS and Sentinel repositories. 'rsat' also offers capabilities for customizing satellite images, such as tile mosaicking, image cropping and new variables computation. Finally, 'rsat' covers the processing, including cloud masking, compositing and gap-filling/smoothing time series of images (Militino et al., 2018 <doi:10.3390/rs10030398> and Militino et al., 2019 <doi:10.1109/TGRS.2019.2904193>).
Maintained by Unai Pérez - Goya. Last updated 11 months ago.
5.0 match 54 stars 7.45 score 52 scripts
melff
memisc:Management of Survey Data and Presentation of Analysis Results
An infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) 'SPSS' and 'Stata' files is provided. Further, the package can produce tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates, which can be exported to 'LaTeX' and HTML.
Maintained by Martin Elff. Last updated 15 days ago.
3.0 match 46 stars 12.34 score 1.2k scripts 13 dependents
tidyverse
dbplyr:A 'dplyr' Back End for Databases
A 'dplyr' back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a 'DBI' back end; more advanced features require 'SQL' translation to be provided by the package author.
Maintained by Hadley Wickham. Last updated 4 months ago.
1.9 match 481 stars 19.72 score 5.2k scripts 736 dependents
gagolews
stringi:Fast and Portable Character String Processing Facilities
A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalisation, date-time formatting and parsing, and many more. They are fast, consistent, convenient, and - thanks to 'ICU' (International Components for Unicode) - portable across all locales and platforms. Documentation about 'stringi' is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).
Maintained by Marek Gagolewski. Last updated 2 months ago.
icu icu4c natural-language-processing nlp regex regexp string-manipulation stringi stringr text text-processing tidy-data unicode cpp
2.0 match 309 stars 18.31 score 10k scripts 8.6k dependents
tidyverse
forcats:Tools for Working with Categorical Variables (Factors)
Helpers for reordering factor levels (including moving specified levels to front, ordering by first appearance, reversing, and randomly shuffling), and tools for modifying factor levels (including collapsing rare levels into other, 'anonymising', and manually 'recoding').
Maintained by Hadley Wickham. Last updated 1 year ago.
1.9 match 555 stars 18.77 score 21k scripts 1.2k dependents
pauljohn32
kutils:Project Management Tools
Tools for data importation, recoding, and inspection. There are functions to create new project folders, R code templates, create uniquely named output directories, and to quickly obtain a visual summary for each variable in a data frame. The main feature here is the systematic implementation of the "variable key" framework for data importation and recoding. We are eager to have community feedback about the variable key and the vignette about it. In version 1.7, the function 'semTable' is removed. It was deprecated since 1.67. That is provided in a separate package, 'semTable'.
Maintained by Paul Johnson. Last updated 2 years ago.
6.0 match 5.85 score 110 scripts 20 dependents
r-forge
Rmpfr:Interface R to MPFR - Multiple Precision Floating-Point Reliable
Arithmetic (via S4 classes and methods) for arbitrary precision floating point numbers, including transcendental ("special") functions. To this end, the package interfaces to the 'LGPL' licensed 'MPFR' (Multiple Precision Floating-Point Reliable) Library which itself is based on the 'GMP' (GNU Multiple Precision) Library.
Maintained by Martin Maechler. Last updated 4 months ago.
3.0 match 11.30 score 316 scripts 141 dependents
cran
ensembleTax:Ensemble Taxonomic Assignments of Amplicon Sequencing Data
Creates ensemble taxonomic assignments of amplicon sequencing data in R using outputs of multiple taxonomic assignment algorithms and/or reference databases. Includes flexible algorithms for mapping taxonomic nomenclatures onto one another and for computing ensemble taxonomic assignments.
Maintained by Dylan Catlett. Last updated 4 years ago.
13.5 match 2.48 score 7 scripts
bioc
Biobase:Biobase: Base functions for Bioconductor
Functions that are needed by many other packages or which replace R functions.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
infrastructure bioconductor-package core-package
2.0 match 9 stars 16.45 score 6.6k scripts 1.8k dependents
grunwaldlab
poppr:Genetic Analysis of Populations with Mixed Reproduction
Population genetic analyses for hierarchical analysis of partially clonal populations built upon the architecture of the 'adegenet' package. Originally described in Kamvar, Tabima, and Grünwald (2014) <doi:10.7717/peerj.281> with version 2.0 described in Kamvar, Brooks, and Grünwald (2015) <doi:10.3389/fgene.2015.00208>.
Maintained by Zhian N. Kamvar. Last updated 10 months ago.
clonality genetic-analysis genetic-distances minimum-spanning-networks multilocus-genotypes multilocus-lineages population-genetics populations openmp
3.0 match 69 stars 10.84 score 672 scripts
rstudio
tfruns:Training Run Tools for 'TensorFlow'
Create and manage unique directories for each 'TensorFlow' training run. Provides a unique, time stamped directory for each run along with functions to retrieve the directory of the latest run or latest several runs.
Maintained by Tomasz Kalinowski. Last updated 11 months ago.
2.7 match 34 stars 11.80 score 325 scripts 77 dependents
cran
circular:Circular Statistics
Circular Statistics, from "Topics in circular Statistics" (2001) S. Rao Jammalamadaka and A. SenGupta, World Scientific.
Maintained by Eduardo García-Portugués. Last updated 7 months ago.
4.0 match 7 stars 7.76 score 1.1k scripts 40 dependents
cbielow
PTXQC:Quality Report Generation for MaxQuant and mzTab Results
Generates Proteomics (PTX) quality control (QC) reports for shotgun LC-MS data analyzed with the MaxQuant software suite (from .txt files) or mzTab files (ideally from OpenMS 'QualityControl' tool). Reports are customizable (target thresholds, subsetting) and available in HTML or PDF format. Published in J. Proteome Res., Proteomics Quality Control: Quality Control Software for MaxQuant Results (2015) <doi:10.1021/acs.jproteome.5b00780>.
Maintained by Chris Bielow. Last updated 1 year ago.
drag-and-drop hacktoberfest heatmap match-between-runs maxquant metric mztab openms proteomics quality-control quality-metrics report
3.3 match 42 stars 9.35 score 105 scripts 1 dependent
hubverse-org
hubValidations:Testing framework for hubverse hub validations
This package aims at providing a simple interface to run validations on data and metadata submitted to a hubverse modeling hub. Validation tests can be run at different levels (single file, single folder, whole repository) and locally as well as part of a continuous integration workflow.
Maintained by Anna Krystalli. Last updated 8 days ago.
6.6 match 1 star 4.67 score 27 scripts 1 dependent
hypertidy
silicate:Common Forms for Complex Hierarchical and Relational Data Structures
Generate common data forms for complex data suitable for conversions and transmission by decomposition as paths or primitives. Paths are sequentially-linked records, primitives are basic atomic elements and both can model many forms and be grouped into hierarchical structures. The universal models 'SC0' (structural) and 'SC' (labelled, relational) are composed of edges and can represent any hierarchical form. Specialist models 'PATH', 'ARC' and 'TRI' provide the most common intermediate forms used for converting from one form to another. The methods are inspired by the simplicial complex <https://en.wikipedia.org/wiki/Simplicial_complex> and provide intermediate forms that relate spatial data structures to this mathematical construct.
Maintained by Michael D. Sumner. Last updated 1 year ago.
hierarchical-data simplicial-complex spatial-data structural-primitives topology triangulation
4.3 match 54 stars 7.28 score 111 scripts 7 dependents
tidyverse
dtplyr:Data Table Back-End for 'dplyr'
Provides a data.table backend for 'dplyr'. The goal of 'dtplyr' is to allow you to write 'dplyr' code that is automatically translated to the equivalent, but usually much faster, data.table code.
Maintained by Hadley Wickham. Last updated 2 months ago.
1.9 match 671 stars 16.27 score 2.5k scripts 147 dependents
quantmeth
Rnest:Next Eigenvalue Sufficiency Test
Determine the number of dimensions to retain in exploratory factor analysis. The main function, nest(), returns the solution, and plot(nest()) returns a plot.
Maintained by P.-O. Caron. Last updated 2 months ago.
exploratory-data-analysis factor-analysis
7.3 match 2 stars 4.02 score 13 scripts
revelle
psychTools:Tools to Accompany the 'psych' Package for Psychological Research
Support functions, data sets, and vignettes for the 'psych' package. Contains several of the biggest data sets for the 'psych' package as well as four vignettes. A few helper functions for file manipulation are included as well. For more information, see the <https://personality-project.org/r/> web page.
Maintained by William Revelle. Last updated 1 year ago.
5.0 match 5.89 score 178 scripts 5 dependents
beckerbenj
eatGADS:Data Management of Large Hierarchical Data
Import 'SPSS' data, handle and change 'SPSS' meta data, store and access large hierarchical data in 'SQLite' data bases.
Maintained by Benjamin Becker. Last updated 27 days ago.
4.0 match 1 star 7.36 score 34 scripts 1 dependent
oobianom
quickcode:Quick and Essential 'R' Tricks for Better Scripts
The NOT functions, 'R' tricks and a compilation of some simple quick plus often used 'R' codes to improve your scripts. Improve the quality and reproducibility of 'R' scripts.
Maintained by Obinna Obianom. Last updated 17 days ago.
3.8 match 5 stars 7.76 score 7 scripts 6 dependents
hughparsonage
hutils:Miscellaneous R Functions and Aliases
Provides utility functions for, and drawing on, the 'data.table' package. The package also collates useful miscellaneous functions extending base R not available elsewhere. The name is a portmanteau of 'utils' and the author.
Maintained by Hugh Parsonage. Last updated 2 years ago.
3.8 match 12 stars 7.76 score 219 scripts 8 dependents
statisticsnorway
SSBtools:Algorithms and Tools for Tabular Statistics and Hierarchical Computations
Includes general data manipulation functions, algorithms for statistical disclosure control (Langsrud, 2024) <doi:10.1007/978-3-031-69651-0_6> and functions for hierarchical computations by sparse model matrices (Langsrud, 2023) <doi:10.32614/RJ-2023-088>.
Maintained by Øyvind Langsrud. Last updated 6 days ago.
3.7 match 7 stars 7.62 score 68 scripts 7 dependents
mlverse
torch:Tensors and Neural Networks with 'GPU' Acceleration
Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al (2019) <doi:10.48550/arXiv.1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.
Maintained by Daniel Falbel. Last updated 10 days ago.
1.7 match 520 stars 16.52 score 1.4k scripts 38 dependents
bioc
CNEr:CNE Detection and Visualization
Large-scale identification and advanced visualization of sets of conserved noncoding elements.
Maintained by Ge Tan. Last updated 5 months ago.
generegulation visualization dataimport
3.0 match 3 stars 9.28 score 35 scripts 19 dependents
reconhub
epicontacts:Handling, Visualisation and Analysis of Epidemiological Contacts
A collection of tools for representing epidemiological contact data, composed of case line lists and contacts between cases. Also contains procedures for data handling, interactive graphics, and statistics.
Maintained by Finlay Campbell. Last updated 2 months ago.
3.1 match 15 stars 8.86 score 112 scripts 2 dependents
rqtl
qtl2convert:Convert Data among QTL Mapping Packages
Functions to convert data structures among the 'qtl2', 'qtl', and 'DOQTL' packages for mapping quantitative trait loci (QTL).
Maintained by Karl W Broman. Last updated 12 months ago.
5.2 match 5 stars 5.24 score 230 scripts 1 dependent
gforge
htmlTable:Advanced Tables for Markdown/HTML
Tables with state-of-the-art layout elements such as row spanners, column spanners, table spanners, zebra striping, and more. While allowing advanced layout, the underlying css-structure is simple in order to maximize compatibility with common word processors. The package also contains a few text formatting functions that help outputting text compatible with HTML/LaTeX.
Maintained by Max Gordon. Last updated 8 months ago.
1.8 match 79 stars 15.32 score 1.3k scripts 763 dependents
palaeoverse
palaeoverse:Prepare and Explore Data for Palaeobiological Analyses
Provides functionality to support data preparation and exploration for palaeobiological analyses, improving code reproducibility and accessibility. The wider aim of 'palaeoverse' is to bring the palaeobiological community together to establish agreed standards. The package currently includes functionality for data cleaning, binning (time and space), exploration, summarisation and visualisation. Reference datasets (i.e. Geological Time Scales <https://stratigraphy.org/chart>) and auxiliary functions are also provided. Details can be found in Jones et al. (2023) <doi:10.1111/2041-210X.14099>.
Maintained by Lewis A. Jones. Last updated 5 months ago.
biodiversity fossil palaeobiology paleobiology
3.1 match 21 stars 8.57 score 44 scripts 1 dependent
poissonconsulting
chk:Check User-Supplied Function Arguments
For developers to check user-supplied function arguments. It is designed to be simple, fast and customizable. Error messages follow the tidyverse style guide.
Maintained by Joe Thorley. Last updated 2 months ago.
2.3 match 48 stars 11.89 score 22 scripts 95 dependents
sparklyr
sparklyr:R Interface to Apache Spark
R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.
Maintained by Edgar Ruiz. Last updated 2 days ago.
apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr
1.8 match 959 stars 15.20 score 4.0k scripts 21 dependents
r-lib
here:A Simpler Way to Find Your Files
Constructs paths to your project's files. Declare the relative path of a file within your project with 'i_am()'. Use the 'here()' function as a drop-in replacement for 'file.path()'; it will always locate the files relative to your project root.
Maintained by Kirill Müller. Last updated 4 days ago.
1.3 match 417 stars 19.62 score 96k scripts 607 dependents
hojsgaard
doBy:Groupwise Statistics, LSmeans, Linear Estimates, Utilities
Utility package containing: 1) Facilities for working with grouped data: 'do' something to data stratified 'by' some variables. 2) LSmeans (least-squares means), general linear estimates. 3) Restrict functions to a smaller domain. 4) Miscellaneous other utilities.
Maintained by Søren Højsgaard. Last updated 8 days ago.
1.8 match 1 star 14.94 score 3.2k scripts 939 dependents
shaunpwilkinson
insect:Informatic Sequence Classification Trees
Provides tools for probabilistic taxon assignment with informatic sequence classification trees. See Wilkinson et al (2018) <doi:10.7287/peerj.preprints.26812v1>.
Maintained by Shaun Wilkinson. Last updated 4 years ago.
4.5 match 14 stars 5.80 score 91 scripts
pauljohn32
rockchalk:Regression Estimation and Presentation
A collection of functions for interpretation and presentation of regression analysis. These functions are used to produce the statistics lectures in <https://pj.freefaculty.org/guides/>. Includes regression diagnostics, regression tables, and plots of interactions and "moderator" variables. The emphasis is on "mean-centered" and "residual-centered" predictors. The vignette 'rockchalk' offers a fairly comprehensive overview. The vignette 'Rstyle' has advice about coding in R. The package title 'rockchalk' refers to our school motto, 'Rock Chalk Jayhawk, Go K.U.'.
Maintained by Paul E. Johnson. Last updated 3 years ago.
3.7 match 7.13 score 584 scripts 18 dependents
socialresearchcentre
testdat:Data Unit Testing for R
Test your data! An extension of the 'testthat' unit testing framework with a family of functions and reporting tools for checking and validating data frames.
Maintained by Danny Smith. Last updated 10 months ago.
4.5 match 8 stars 5.78 score 50 scripts
lrberge
stringmagic:Character String Operations and Interpolation, Magic Edition
Performs complex string operations compactly and efficiently. Supports string interpolation jointly with over 50 string operations. Also enhances regular string functions (like grep() and co). See an introduction at <https://lrberge.github.io/stringmagic/>.
Maintained by Laurent R Berge. Last updated 7 months ago.
2.5 match 15 stars 10.56 score 37 scripts 33 dependents
tidymodels
dials:Tools for Creating Tuning Parameter Values
Many models contain tuning parameters (i.e. parameters that cannot be directly estimated from the data). These tools can be used to define objects for creating, simulating, or validating values for such parameters.
Maintained by Hannah Frick. Last updated 1 month ago.
1.8 match 114 stars 14.31 score 426 scripts 52 dependents
lrberge
fixest:Fast Fixed-Effects Estimations
Fast and user-friendly estimation of econometric models with multiple fixed-effects. Includes ordinary least squares (OLS), generalized linear models (GLM) and the negative binomial. The core of the package is based on optimized parallel C++ code, scaling especially well for large data sets. The method to obtain the fixed-effects coefficients is based on Berge (2018) <https://github.com/lrberge/fixest/blob/master/_DOCS/FENmlm_paper.pdf>. Further provides tools to export and view the results of several estimations with intuitive design to cluster the standard-errors.
Maintained by Laurent Berge. Last updated 7 months ago.
1.8 match 387 stars 14.69 score 3.8k scripts 25 dependents
okdll
flowTraceR:Tracing Information Flow for Inter-Software Comparisons in Mass Spectrometry-Based Bottom-Up Proteomics
Useful functions to standardize software outputs from ProteomeDiscoverer, Spectronaut, DIA-NN and MaxQuant on precursor, modified peptide and proteingroup level and to trace software differences for identifications such as varying proteingroup denotations for common precursor.
Maintained by Oliver Kardell. Last updated 3 years ago.
4.9 match 3 stars 5.17 score 11 scripts 1 dependent
dselivanov
text2vec:Modern Text Mining Framework for R
Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), similarities. This package provides a source-agnostic streaming API, which allows researchers to perform analysis of collections of documents which are larger than available RAM. All core functions are parallelized to benefit from multicore machines.
Maintained by Dmitriy Selivanov. Last updated 7 months ago.
glove latent-dirichlet-allocation natural-language-processing text-mining topic-modeling vectorization word-embeddings word2vec cpp
1.9 match 860 stars 13.48 score 1.3k scripts 23 dependents
bioc
rols:An R interface to the Ontology Lookup Service
The rols package is an interface to the Ontology Lookup Service (OLS) to access and query hundreds of ontologies directly from R.
Maintained by Laurent Gatto. Last updated 5 months ago.
immunooncology software annotation massspectrometry go
3.0 match 11 stars 8.30 score 89 scripts 5 dependents
cmlmagneville
mFD:Compute and Illustrate the Multiple Facets of Functional Diversity
Computes functional trait-based distances between pairs of species gathered in assemblages, which allows several functional spaces to be built. The package also computes functional diversity indices assessing the distribution of species (and of their dominance) in a given functional space for each assemblage and the overlap between assemblages in a given functional space, see: Chao et al. (2018) <doi:10.1002/ecm.1343>, Maire et al. (2015) <doi:10.1111/geb.12299>, Mouillot et al. (2013) <doi:10.1016/j.tree.2012.10.004>, Mouillot et al. (2014) <doi:10.1073/pnas.1317625111>, Ricotta and Szeidl (2009) <doi:10.1016/j.tpb.2009.10.001>. Graphical outputs are included. Visit the 'mFD' website for more information, documentation and examples.
Maintained by Camille Magneville. Last updated 3 months ago.
3.4 match 26 stars 7.22 score 61 scripts
cran
flexmix:Flexible Mixture Modeling
A general framework for finite mixtures of regression models using the EM algorithm is implemented. The E-step and all data handling are provided, while the M-step can be supplied by the user to easily define new models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering.
Maintained by Bettina Gruen. Last updated 20 days ago.
3.0 match 5 stars 8.19 score 113 dependents
bioc
VennDetail:A package for visualization and extract details
A set of functions to generate high-resolution Venn and Vennpie plots, and to extract and combine details of these subsets with user datasets in a data frame.
Maintained by Kai Guo. Last updated 5 months ago.
datarepresentation graphandnetwork extract venndiagram
3.6 match 29 stars 6.75 score 65 scripts
knausb
vcfR:Manipulate and Visualize VCF Data
Facilitates easy manipulation of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R a parser function extracts matrices of data. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete data may be written to a VCF file (*.vcf.gz). It also may be converted into other popular R objects (e.g., genlight, DNAbin). VcfR provides a link between VCF data and familiar R software.
Maintained by Brian J. Knaus. Last updated 26 days ago.
genomics population-genetics population-genomics rcpp vcf-data visualization zlib cpp
1.8 match 254 stars 13.59 score 3.1k scripts 19 dependents
kkholst
lava:Latent Variable Models
A general implementation of Structural Equation Models with latent variables (MLE, 2SLS, and composite likelihood estimators) with continuous, censored, and ordinal outcomes (Holst and Budtz-Joergensen (2013) <doi:10.1007/s00180-012-0344-y>). Mixture latent variable models and non-linear latent variable models (Holst and Budtz-Joergensen (2020) <doi:10.1093/biostatistics/kxy082>). The package also provides methods for graph exploration (d-separation, back-door criterion), simulation of general non-linear latent variable models, and estimation of influence functions for a broad range of statistical models.
Maintained by Klaus K. Holst. Last updated 2 months ago.
latent-variable-models simulation statistics structural-equation-models
1.9 match 33 stars 12.85 score 610 scripts 476 dependents
statisticsnorway
GaussSuppression:Tabular Data Suppression using Gaussian Elimination
A statistical disclosure control tool to protect tables by suppression using the Gaussian elimination secondary suppression algorithm (Langsrud, 2024) <doi:10.1007/978-3-031-69651-0_6>. A suggestion is to start by working with functions SuppressSmallCounts() and SuppressDominantCells(). These functions use primary suppression functions for the minimum frequency rule and the dominance rule, respectively. Novel functionality for suppression of disclosive cells is also included. General primary suppression functions can be supplied as input to the general workhorse function, GaussSuppressionFromData(). Suppressed frequencies can be replaced by synthetic decimal numbers as described in Langsrud (2019) <doi:10.1007/s11222-018-9848-9>.
Maintained by Øyvind Langsrud. Last updated 6 days ago.
3.6 match 2 stars 6.61 score 50 scripts
ai4ci
interfacer:Define and Enforce Contracts for Dataframes as Function Parameters
A dataframe validation framework for package builders who use dataframes as function parameters. It performs checks on column names, coerces data-types, and checks grouping to make sure user inputs conform to a specification provided by the package author. It provides a mechanism for package authors to automatically document supported dataframe inputs and selectively dispatch to functions depending on the format of a dataframe much like S3 does for classes. It also contains some developer tools to make working with and documenting dataframe specifications easier. It helps package developers to improve their documentation and simplifies parameter validation where dataframes are used as function parameters.
Maintained by Robert Challen. Last updated 2 months ago.
3.7 match 2 stars 6.43 score 2 dependents
bioc
xcms:LC-MS and GC-MS Data Analysis
Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.
Maintained by Steffen Neumann. Last updated 6 days ago.
immunooncology massspectrometry metabolomics bioconductor feature-detection mass-spectrometry peak-detection cpp
1.7 match 196 stars 14.31 score 984 scripts 11 dependents
cran
mgcv:Mixed GAM Computation Vehicle with Automatic Smoothness Estimation
Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, 'JAGS' support and distributions beyond the exponential family.
Maintained by Simon Wood. Last updated 1 year ago.
1.9 match 32 stars 12.71 score 17k scripts 7.8k dependents
pachadotdev
cpp11armadillo:An 'Armadillo' Interface
Provides function declarations and inline function definitions that facilitate communication between R and the 'Armadillo' 'C++' library for linear algebra and scientific computing. This implementation is detailed in Vargas Sepulveda and Schneider Malamud (2024) <doi:10.48550/arXiv.2408.11074>.
Maintained by Mauricio Vargas Sepulveda. Last updated 29 days ago.
armadillocppcpp11hacktoberfestlinear-algebra
2.6 match 9 stars 9.14 score 1 scripts 16 dependents
bioc
phyloseq:Handling and analysis of high-throughput microbiome census data
phyloseq provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data.
Maintained by Paul J. McMurdie. Last updated 5 months ago.
immunooncologysequencingmicrobiomemetagenomicsclusteringclassificationmultiplecomparisongeneticvariability
1.7 match 597 stars 13.90 score 8.4k scripts 37 dependents
tin900
vvauditor:Creates Assertion Tests
Offers a comprehensive set of assertion tests to help users validate the integrity of their data. These tests can be used to check for specific conditions or properties within a dataset and help ensure that data is accurate and reliable. The package is designed to make it easy to add quality control checks to data analysis workflows and to aid in identifying and correcting any errors or inconsistencies in data.
Maintained by Tomer Iwan. Last updated 1 month ago.
5.8 match 4.03 score 7 scripts
dieghernan
tidyterra:'tidyverse' Methods and 'ggplot2' Helpers for 'terra' Objects
Extension of the 'tidyverse' for 'SpatRaster' and 'SpatVector' objects of the 'terra' package. It also includes new 'geom_' functions that provide a convenient way of visualizing 'terra' objects with 'ggplot2'.
Maintained by Diego Hernangómez. Last updated 11 hours ago.
terraggplot-extensionr-spatialrspatial
1.7 match 191 stars 13.61 score 1.9k scripts 25 dependents
ropensci
rentrez:'Entrez' in R
Provides an R interface to the NCBI's 'EUtils' API, allowing users to search databases like 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> and 'PubMed' <https://pubmed.ncbi.nlm.nih.gov/>, process the results of those searches and pull data into their R sessions.
Maintained by David Winter. Last updated 4 years ago.
1.7 match 199 stars 13.60 score 784 scripts 95 dependents
rstudio
keras3:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.
Maintained by Tomasz Kalinowski. Last updated 3 days ago.
1.7 match 845 stars 13.60 score 264 scripts 2 dependents
ropensci
rgbif:Interface to the Global Biodiversity Information Facility API
A programmatic interface to the Web Service methods provided by the Global Biodiversity Information Facility (GBIF; <https://www.gbif.org/developer/summary>). GBIF is a database of species occurrence records from sources all over the globe. rgbif includes functions for searching for taxonomic names, retrieving information on data providers, getting species occurrence records, getting counts of occurrence records, and using the GBIF tile map service to make rasters summarizing huge amounts of data.
Maintained by John Waller. Last updated 7 days ago.
gbifspecimensapiweb-servicesoccurrencesspeciestaxonomybiodiversitydatalifewatchoscibiospocc
1.8 match 161 stars 13.26 score 2.1k scripts 20 dependents
gluc
data.tree:General Purpose Hierarchical Data Structure
Create tree structures from hierarchical data, and traverse the tree in various orders. Aggregate, cumulate, print, plot, convert to and from data.frame and more. Useful for decision trees, machine learning, finance, conversion from and to JSON, and many other applications.
Maintained by Christoph Glur. Last updated 5 months ago.
1.8 match 209 stars 12.84 score 1.1k scripts 88 dependents
sandrinepavoine
adiv:Analysis of Diversity
Functions, data sets and examples for the calculation of various indices of biodiversity including species, functional and phylogenetic diversity. Part of the indices are expressed in terms of equivalent numbers of species. The package also provides ways to partition biodiversity across spatial or temporal scales (alpha, beta, gamma diversities). In addition to the quantification of biodiversity, ordination approaches are available which rely on diversity indices and allow the detailed identification of species, functional or phylogenetic differences between communities.
Maintained by Sandrine Pavoine. Last updated 1 year ago.
10.1 match 1 stars 2.28 score 63 scripts
bioc
doubletrouble:Identification and classification of duplicated genes
doubletrouble aims to identify duplicated genes from whole-genome protein sequences and classify them based on their modes of duplication. The duplication modes are i. segmental duplication (SD); ii. tandem duplication (TD); iii. proximal duplication (PD); iv. transposed duplication (TRD) and; v. dispersed duplication (DD). Transposon-derived duplicates (TRD) can be further subdivided into rTRD (retrotransposon-derived duplication) and dTRD (DNA transposon-derived duplication). If users want a simpler classification scheme, duplicates can also be classified into SD- and SSD-derived (small-scale duplication) gene pairs. Besides classifying gene pairs, users can also classify genes, so that each gene is assigned a unique mode of duplication. Users can also calculate substitution rates per substitution site (i.e., Ka and Ks) from duplicate pairs, find peaks in Ks distributions with Gaussian Mixture Models (GMMs), and classify gene pairs into age groups based on Ks peaks.
Maintained by Fabrício Almeida-Silva. Last updated 7 days ago.
softwarewholegenomecomparativegenomicsfunctionalgenomicsphylogeneticsnetworkclassificationbioinformaticscomparative-genomicsgene-duplicationmolecular-evolutionwhole-genome-duplication
3.5 match 23 stars 6.44 score 17 scripts
bioc
metagenomeSeq:Statistical analysis for sparse high-throughput sequencing
metagenomeSeq is designed to determine features (be it Operational Taxonomic Unit (OTU), species, etc.) that are differentially abundant between two or more groups of multiple samples. metagenomeSeq is designed to address the effects of both normalization and under-sampling of microbial communities on disease association detection and the testing of feature correlations.
Maintained by Joseph N. Paulson. Last updated 3 months ago.
immunooncologyclassificationclusteringgeneticvariabilitydifferentialexpressionmicrobiomemetagenomicsnormalizationvisualizationmultiplecomparisonsequencingsoftware
1.9 match 69 stars 11.90 score 494 scripts 7 dependents
fatelarico
FinNet:Quickly Build and Manipulate Financial Networks
Provides classes, methods, and functions to deal with financial networks. Users can easily store information about both physical and legal persons by using pre-made classes that are designed for integration with scraping packages such as 'rvest' and 'RSelenium'. Moreover, the package assists in creating various types of financial networks depending on the relation under scrutiny (ownership, board interlocks, etc.) and the desired tie type (valued or binary), and renders them in the most common formats (adjacency matrix, incidence matrix, edge list, 'igraph', 'network'). There are also ad-hoc functions for the Fiedler value, global network efficiency, and cascade-failure analysis.
Maintained by Fabio Ashtar Telarico. Last updated 5 months ago.
4.7 match 2 stars 4.78 score 7 scripts
insightsengineering
teal:Exploratory Web Apps for Analyzing Clinical Trials Data
A 'shiny' based interactive exploration framework for analyzing clinical trials data. 'teal' currently provides a dynamic filtering facility and different data viewers. 'teal' 'shiny' applications are built using standard 'shiny' modules.
Maintained by Dawid Kaledkowski. Last updated 24 days ago.
clinical-trialsnestshinywebapp
1.8 match 197 stars 12.68 score 176 scripts 5 dependents
hadley
reshape:Flexibly Reshape Data
Flexibly restructure and aggregate data using just two functions: melt and cast.
Maintained by Hadley Wickham. Last updated 3 years ago.
2.3 match 9.83 score 21k scripts 231 dependents
luca-scr
mclust:Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation
Gaussian finite mixture models fitted via EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.
Maintained by Luca Scrucca. Last updated 11 months ago.
1.8 match 21 stars 12.23 score 6.6k scripts 587 dependents
gogonzo
runner:Running Operations for Vectors
Lightweight library for rolling-window operations. The package enables full control over the window length, window lag, and time indices. With runner, one can apply any R function on rolling windows. The package eases work with equally and unequally spaced time series.
Maintained by Dawid Kałędkowski. Last updated 1 year ago.
2.4 match 51 stars 9.11 score 212 scripts 4 dependents
bblonder
hypervolume:High Dimensional Geometry, Set Operations, Projection, and Inference Using Kernel Density Estimation, Support Vector Machines, and Convex Hulls
Estimates the shape and volume of high-dimensional datasets and performs set operations: intersection / overlap, union, unique components, inclusion test, and hole detection. Uses stochastic geometry approach to high-dimensional kernel density estimation, support vector machine delineation, and convex hull generation. Applications include modeling trait and niche hypervolumes and species distribution modeling.
Maintained by Benjamin Blonder. Last updated 2 months ago.
2.3 match 23 stars 9.69 score 211 scripts 7 dependents
bioc
BioQC:Detect tissue heterogeneity in expression profiles with gene sets
BioQC performs quality control of high-throughput expression data based on tissue gene signatures. It can detect tissue heterogeneity in gene expression data. The core algorithm is a Wilcoxon-Mann-Whitney test that is optimised for high performance.
Maintained by Jitao David Zhang. Last updated 5 months ago.
geneexpressionqualitycontrolstatisticalmethodgenesetenrichmentcpp
2.7 match 5 stars 8.16 score 86 scripts
bioc
plyranges:A fluent interface for manipulating GenomicRanges
A dplyr-like interface for interacting with the common Bioconductor classes Ranges and GenomicRanges. By providing a grammatical and consistent way of manipulating these classes their accessibility for new Bioconductor users is hopefully increased.
Maintained by Michael Love. Last updated 20 hours ago.
infrastructuredatarepresentationworkflowstepcoveragebioconductordata-analysisdplyrgenomic-rangesgenomicstidy-data
1.7 match 144 stars 12.66 score 1.9k scripts 20 dependents
zhenkewu
baker:"Nested Partially Latent Class Models"
Provides functions to specify, fit and visualize nested partially-latent class models ( Wu, Deloria-Knoll, Hammitt, and Zeger (2016) <doi:10.1111/rssc.12101>; Wu, Deloria-Knoll, and Zeger (2017) <doi:10.1093/biostatistics/kxw037>; Wu and Chen (2021) <doi:10.1002/sim.8804>) for inference of population disease etiology and individual diagnosis. In the motivating Pneumonia Etiology Research for Child Health (PERCH) study, because both quantities of interest sum to one hundred percent, the PERCH scientists frequently refer to them as population etiology pie and individual etiology pie, hence the name of the package.
Maintained by Zhenke Wu. Last updated 11 months ago.
bayesiancase-controllatent-class-analysisjagscpp
3.6 match 8 stars 6.00 score 21 scripts
ropensci
phylotaR:Automated Phylogenetic Sequence Cluster Identification from 'GenBank'
A pipeline for the identification, within taxonomic groups, of orthologous sequence clusters from 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> as the first step in a phylogenetic analysis. The pipeline depends on a local alignment search tool and is, therefore, not dependent on differences in gene naming conventions and naming errors.
Maintained by Shixiang Wang. Last updated 8 months ago.
blastngenbankpeer-reviewedphylogeneticssequence-alignment
3.7 match 23 stars 5.86 score 156 scripts
predictiveecology
LandR:Landscape Ecosystem Modelling in R
Utilities for 'LandR' suite of landscape simulation models. These models simulate forest vegetation dynamics based on LANDIS-II, and incorporate fire and insect disturbance, as well as other important ecological processes. Models are implemented as 'SpaDES' modules.
Maintained by Eliot J B McIntire. Last updated 8 days ago.
ecological-modellinglandscape-ecosystem-modellingspades
3.5 match 17 stars 6.07 score 12 scripts 4 dependents
joshwlambert
DAISIEprep:Extracts Phylogenetic Island Community Data from Phylogenetic Trees
Extracts colonisation and branching times of island species to be used for analysis in the R package 'DAISIE'. It uses phylogenetic and endemicity data to extract the separate island colonists and store them.
Maintained by Joshua W. Lambert. Last updated 2 months ago.
data-scienceisland-biogeographyphylogenetics
3.1 match 6 stars 6.78 score 24 scripts
sjackman
uniqtag:Abbreviate Strings to Short, Unique Identifiers
For each string in a set of strings, determine a unique tag that is a substring of fixed size k unique to that string, if it has one. If no such unique substring exists, the least frequent substring is used. If multiple unique substrings exist, the lexicographically smallest substring is used. This lexicographically smallest substring of size k is called the "UniqTag" of that string.
Maintained by Shaun Jackman. Last updated 3 years ago.
4.0 match 24 stars 5.26 score 50 scripts 1 dependents
tidyverse
duckplyr:A 'DuckDB'-Backed Version of 'dplyr'
A drop-in replacement for 'dplyr', powered by 'DuckDB' for performance. Offers convenient utilities for working with in-memory and larger-than-memory data while retaining full 'dplyr' compatibility.
Maintained by Kirill Müller. Last updated 8 days ago.
analyticsdataframedplyrduckdbperformance
1.9 match 309 stars 11.33 score 220 scripts
yufree
pmd:Paired Mass Distance Analysis for GC/LC-MS Based Non-Targeted Analysis and Reactomics Analysis
Paired mass distance (PMD) analysis proposed in Yu, Olkowicz and Pawliszyn (2018) <doi:10.1016/j.aca.2018.10.062> and PMD based reactomics analysis proposed in Yu and Petrick (2020) <doi:10.1038/s42004-020-00403-z> for gas/liquid chromatography–mass spectrometry (GC/LC-MS) based non-targeted analysis. PMD analysis includes the GlobalStd algorithm and structure/reaction directed analysis. The GlobalStd algorithm can find independent peaks in m/z-retention time profiles based on retention time hierarchical cluster analysis and frequency analysis of paired mass distances within retention time groups. Structure directed analysis can be used to find potential relationships among those independent peaks in different retention time groups based on the frequency of paired mass distances. Reactomics analysis can also be performed to build a PMD network, assign sources, and discover biomarker reactions. GUIs for PMD analysis are also included as 'shiny' applications.
Maintained by Miao YU. Last updated 2 months ago.
mass-spectrometrymetabolomicsnon-target
3.2 match 10 stars 6.68 score 40 scripts
paulponcet
bazar:Miscellaneous Basic Functions
A collection of miscellaneous functions for copying objects to the clipboard ('Copy'); manipulating strings ('concat', 'mgsub', 'trim', 'verlan'); loading or showing packages ('library_with_dep', 'require_with_dep', 'sessionPackages'); creating or testing for named lists ('nlist', 'as_nlist', 'is_nlist'), formulas ('is_formula'), empty objects ('as_empty', 'is_empty'), whole numbers ('as_wholenumber', 'is_wholenumber'); testing for equality ('almost_equal', 'almost_zero') and computing uniqueness ('almost_unique'); getting modified versions of usual functions ('rle2', 'sumNA'); making a pause or a stop ('pause', 'stopif'); converting into a function ('as_fun'); providing a C like ternary operator ('condition %?% true %:% false'); finding packages and functions ('get_all_pkgs', 'get_all_funs'); and others ('erase', '%nin%', 'unwhich', 'top', 'bot', 'normalize').
Maintained by Paul Poncet. Last updated 6 years ago.
4.8 match 1 stars 4.46 score 79 scripts 2 dependents
bergsmat
wrangle:A Systematic Data Wrangling Idiom
Supports systematic scrutiny, modification, and integration of data. The function status() counts rows that have missing values in grouping columns (returned by na() ), have non-unique combinations of grouping columns (returned by dup() ), and that are not locally sorted (returned by unsorted() ). Functions enumerate() and itemize() give sorted unique combinations of columns, with or without occurrence counts, respectively. Function ignore() drops columns in x that are present in y, and informative() drops columns in x that are entirely NA; constant() returns values that are constant, given a key. Data that have defined unique combinations of grouping values behave more predictably during merge operations.
Maintained by Tim Bergsma. Last updated 5 months ago.
7.2 match 2 stars 2.91 score 41 scripts
gnguy
assertable:Verbose Assertions for Tabular Data (Data.frames and Data.tables)
Simple, flexible, assertions on data.frame or data.table objects with verbose output for vetting. While other assertion packages apply towards more general use-cases, assertable is tailored towards tabular data. It includes functions to check variable names and values, whether the dataset contains all combinations of a given set of unique identifiers, and whether it is a certain length. In addition, assertable includes utility functions to check the existence of target files and to efficiently import multiple tabular data files into one data.table.
Maintained by Grant Nguyen. Last updated 4 years ago.
3.3 match 6.29 score 219 scripts 2 dependents
teebusch
noah:Create Unique Pseudonymous Animal Names
Generate pseudonymous animal names that are delightful and easy to remember like the Likable Leech and the Proud Chickadee. A unique pseudonym can be created for every unique element in a vector or row in a data frame. Pseudonyms can be customized and tracked over time, so that the same input is always assigned the same pseudonym.
Maintained by Tobias Busch. Last updated 4 years ago.
5.8 match 7 stars 3.54 score 9 scripts
apache
adbcdrivermanager:'Arrow' Database Connectivity ('ADBC') Driver Manager
Provides a developer-facing interface to 'Arrow' Database Connectivity ('ADBC') for the purposes of driver development, driver testing, and building high-level database interfaces for users. 'ADBC' <https://arrow.apache.org/adbc/> is an API standard for database access libraries that uses 'Arrow' for result sets and query parameters.
Maintained by Dewey Dunnington. Last updated 2 days ago.
1.8 match 419 stars 11.39 score 73 scripts 6 dependents
jmbarbone
mark:Miscellaneous, Analytic R Kernels
Miscellaneous functions and wrappers for development in other packages created and maintained by Jordan Mark Barbone.
Maintained by Jordan Mark Barbone. Last updated 1 month ago.
4.1 match 6 stars 4.95 score 9 scripts
bnosac
udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. Namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns) sentiment scoring and semantic similarity analysis.
Maintained by Jan Wijffels. Last updated 2 years ago.
conlldependency-parserlemmatizationnatural-language-processingnlppos-taggingr-pkgrcpptext-miningtokenizerudpipecpp
1.7 match 215 stars 11.83 score 1.2k scripts 9 dependents
bioc
regioneR:Association analysis of genomic regions based on permutation tests
regioneR offers a statistical framework based on customizable permutation tests to assess the association between genomic region sets and other genomic features.
Maintained by Bernat Gel. Last updated 5 months ago.
geneticschipseqdnaseqmethylseqcopynumbervariation
2.3 match 9.00 score 2.7k scripts 21 dependents
winvector
vtreat:A Statistically Sound 'data.frame' Processor/Conditioner
A 'data.frame' processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. 'vtreat' prepares variables so that data has fewer exceptional cases, making it easier to safely use models in production. Common problems 'vtreat' defends against: 'Inf', 'NA', too many categorical levels, rare categorical levels, and new categorical levels (levels seen during application, but not during training). Reference: "'vtreat': a data.frame Processor for Predictive Modeling", Zumel, Mount, 2016, <DOI:10.5281/zenodo.1173313>.
Maintained by John Mount. Last updated 2 months ago.
categorical-variablesmachine-learning-algorithmsnested-modelsprepare-data
1.8 match 285 stars 11.19 score 328 scripts 1 dependents
bioc
amplican:Automated analysis of CRISPR experiments
`amplican` performs alignment of the amplicon reads, normalizes gathered data, calculates multiple statistics (e.g. cut rates, frameshifts) and presents results in form of aggregated reports. Data and statistics can be broken down by experiments, barcodes, user defined groups, guides and amplicons allowing for quick identification of potential problems.
Maintained by Eivind Valen. Last updated 5 months ago.
immunooncologytechnologyalignmentqpcrcrisprcpp
2.7 match 10 stars 7.54 score 41 scripts
noreastermt
allelematch:Identifying Unique Multilocus Genotypes where Genotyping Error and Missing Data may be Present
Tools for the identification of unique multilocus genotypes when both genotyping error and missing data may be present; targeted for use with large datasets and databases containing multiple samples of each individual (a common situation in conservation genetics, particularly in non-invasive wildlife sampling applications). Functions explicitly incorporate missing data and can tolerate allele mismatches created by genotyping error. If you use this package, please cite the original publication in Molecular Ecology Resources (Galpern et al., 2012), the details for which can be generated using citation('allelematch'). For a complete vignette, please access via the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.
Maintained by Todd Cross. Last updated 12 months ago.
8.8 match 2.26 score 8 scripts 1 dependents
jepusto
scdhlm:Estimating Hierarchical Linear Models for Single-Case Designs
Provides a set of tools for estimating hierarchical linear models and effect sizes based on data from single-case designs. Functions are provided for calculating standardized mean difference effect sizes that are directly comparable to standardized mean differences estimated from between-subjects randomized experiments, as described in Hedges, Pustejovsky, and Shadish (2012) <DOI:10.1002/jrsm.1052>; Hedges, Pustejovsky, and Shadish (2013) <DOI:10.1002/jrsm.1086>; Pustejovsky, Hedges, and Shadish (2014) <DOI:10.3102/1076998614547577>; and Chen, Pustejovsky, Klingbeil, and Van Norman (2023) <DOI:10.1016/j.jsp.2023.02.002>. Includes an interactive web interface.
Maintained by James Pustejovsky. Last updated 1 year ago.
3.5 match 4 stars 5.62 score 52 scripts
dickoa
robotoolbox:Client for the 'KoboToolbox' API
Suite of utilities for accessing and manipulating data from the 'KoboToolbox' API. 'KoboToolbox' is a robust platform designed for field data collection in various disciplines. This package aims to simplify the process of fetching and handling data from the API. Detailed documentation for the 'KoboToolbox' API can be found at <https://support.kobotoolbox.org/api.html>.
Maintained by Ahmadou Dicko. Last updated 3 months ago.
open-datakobotoolboxodkkpiapidatadataset
3.4 match 5.86 score 48 scripts
bioc
OmnipathR:OmniPath web service client and more
A client for the OmniPath web service (https://www.omnipathdb.org) and many other resources. It also includes functions to transform and pretty print some of the downloaded data, functions to access a number of other resources such as BioPlex, ConsensusPathDB, EVEX, Gene Ontology, Guide to Pharmacology (IUPHAR/BPS), Harmonizome, HTRIdb, Human Phenotype Ontology, InWeb InBioMap, KEGG Pathway, Pathway Commons, Ramilowski et al. 2015, RegNetwork, ReMap, TF census, TRRUST and Vinayagam et al. 2011. Furthermore, OmnipathR features a close integration with the NicheNet method for ligand activity prediction from transcriptomics data, and its R implementation `nichenetr` (available only on github).
Maintained by Denes Turei. Last updated 22 days ago.
graphandnetworknetworkpathwayssoftwarethirdpartyclientdataimportdatarepresentationgenesignalinggeneregulationsystemsbiologytranscriptomicssinglecellannotationkeggcomplexesenzyme-ptmnetworksnetworks-biologyomnipathproteinsquarto
2.0 match 126 stars 9.90 score 226 scripts 2 dependents
spkaluzny
splus2R:Supplemental S-PLUS Functionality in R
Currently there are many functions in S-PLUS that are missing in R. To facilitate the conversion of S-PLUS packages to R packages, this package provides some missing S-PLUS functionality in R.
Maintained by Stephen Kaluzny. Last updated 1 year ago.
3.0 match 1 stars 6.56 score 82 scripts 30 dependents
hfgolino
EGAnet:Exploratory Graph Analysis – a Framework for Estimating the Number of Dimensions in Multivariate Data using Network Psychometrics
Implements the Exploratory Graph Analysis (EGA) framework for dimensionality and psychometric assessment. EGA estimates the number of dimensions in psychological data using network estimation methods and community detection algorithms. A bootstrap method is provided to assess the stability of dimensions and items. Fit is evaluated using the Entropy Fit family of indices. Unique Variable Analysis evaluates the extent to which items are locally dependent (or redundant). Network loadings provide similar information to factor loadings and can be used to compute network scores. A bootstrap and permutation approach are available to assess configural and metric invariance. Hierarchical structures can be detected using Hierarchical EGA. Time series and intensive longitudinal data can be analyzed using Dynamic EGA, supporting individual, group, and population level assessments.
Maintained by Hudson Golino. Last updated 3 days ago.
2.5 match 47 stars 7.83 score 61 scripts 1 dependents
bioc
annotate:Annotation for microarrays
Using R environments for annotation.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
1.7 match 11.41 score 812 scripts 243 dependents
ngreifer
cobalt:Covariate Balance Tables and Plots
Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification, for example, using propensity scores. Includes integration with 'MatchIt', 'WeightIt', 'MatchThem', 'twang', 'Matching', 'optmatch', 'CBPS', 'ebal', 'cem', 'sbw', and 'designmatch' for assessing balance on the output of their preprocessing functions. Users can also specify data for balance assessment not generated through the above packages. Also included are methods for assessing balance in clustered or multiply imputed data sets or data sets with multi-category, continuous, or longitudinal treatments.
Maintained by Noah Greifer. Last updated 12 months ago.
causal-inferencepropensity-scores
1.5 match 75 stars 12.98 score 1.0k scripts 8 dependents
openanalytics
inTextSummaryTable:Creation of in-Text Summary Table
Creation of tables of summary statistics or counts for clinical data (for 'TLFs'). These tables can be exported as in-text table (with the 'flextable' package) for a Clinical Study Report (Word format) or a 'topline' presentation (PowerPoint format), or as interactive table (with the 'DT' package) to an html document for clinical data review.
Maintained by Laure Cougnaud. Last updated 9 months ago.
3.5 match 1 stars 5.52 score 47 scripts
berndbischl
BBmisc:Miscellaneous Helper Functions for B. Bischl
Miscellaneous helper functions for and from B. Bischl and some other guys, mainly for package development.
Maintained by Bernd Bischl. Last updated 2 years ago.
1.8 match 20 stars 10.59 score 980 scripts 69 dependents
bioc
matter:Out-of-core statistical computing and signal processing
Toolbox for larger-than-memory scientific computing and visualization, providing efficient out-of-core data structures using files or shared memory, for dense and sparse vectors, matrices, and arrays, with applications to nonuniformly sampled signals and images.
Maintained by Kylie A. Bemis. Last updated 4 months ago.
infrastructuredatarepresentationdataimportdimensionreductionpreprocessingcpp
2.0 match 57 stars 9.52 score 64 scripts 2 dependents
bioc
GUIDEseq:GUIDE-seq and PEtag-seq analysis pipeline
The package implements GUIDE-seq and PEtag-seq analysis workflow including functions for filtering UMI and reads with low coverage, obtaining unique insertion sites (proxy of cleavage sites), estimating the locations of the insertion sites, aka, peaks, merging estimated insertion sites from plus and minus strand, and performing off target search of the extended regions around insertion sites with mismatches and indels.
Maintained by Lihua Julie Zhu. Last updated 5 months ago.
immunooncologygeneregulationsequencingworkflowstepcrispr
4.3 match 4.45 score 14 scripts
bioc
scRepertoire:A toolkit for single-cell immune receptor profiling
scRepertoire is a toolkit for processing and analyzing single-cell T-cell receptor (TCR) and immunoglobulin (Ig). The scRepertoire framework supports use of 10x, AIRR, BD, MiXCR, Omniscope, TRUST4, and WAT3R single-cell formats. The functionality includes basic clonal analyses, repertoire summaries, distance-based clustering and interaction with the popular Seurat and SingleCellExperiment/Bioconductor R workflows.
Maintained by Nick Borcherding. Last updated 2 months ago.
softwareimmunooncologysinglecellclassificationannotationsequencingcpp
1.8 match 326 stars 10.49 score 240 scripts
t-kalinowski
keras:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.
Maintained by Tomasz Kalinowski. Last updated 11 months ago.
1.7 match 10.93 score 10k scripts 55 dependents
rickhelmus
patRoon:Workflows for Mass-Spectrometry Based Non-Target Analysis
Provides an easy-to-use interface to a mass spectrometry based non-target analysis workflow. Various (open-source) tools are combined which provide algorithms for extraction and grouping of features, extraction of MS and MS/MS data, automatic formula and compound annotation and grouping related features to components. In addition, various tools are provided for e.g. data preparation and cleanup, plotting results and automatic reporting.
Maintained by Rick Helmus. Last updated 21 hours ago.
mass-spectrometry non-target cpp openjdk
3.0 match 65 stars 6.24 score 43 scripts
pik-piam
quitte:Bits and pieces of code to use with quitte-style data frames
A collection of functions for easily dealing with quitte-style data frames, doing multi-model comparisons and plots.
Maintained by Falk Benke. Last updated 18 hours ago.
2.3 match 8.24 score 184 scripts 35 dependents
predictiveecology
reproducible:Enhance Reproducibility of R Code
A collection of high-level, machine- and OS-independent tools for making reproducible and reusable content in R. The two workhorse functions are Cache() and prepInputs(). Cache() allows for nested caching, is robust to environments and objects with environments (like functions), and deals with some classes of file-backed R objects e.g., from terra and raster packages. Both functions have been developed to be foundational components of data retrieval and processing in continuous workflow situations. In both functions, efforts are made to make the first and subsequent calls of functions have the same result, but faster at subsequent times by way of checksums and digesting. Several features are still under development, including cloud storage of cached objects allowing for sharing between users. Several advanced options are available, see ?reproducibleOptions().
Maintained by Eliot J B McIntire. Last updated 1 month ago.
reproducibility reproducible-research
1.8 match 41 stars 10.52 score 122 scripts 15 dependents
bioc
adverSCarial:adverSCarial, generate and analyze the vulnerability of scRNA-seq classifier to adversarial attacks
adverSCarial is an R Package designed for generating and analyzing the vulnerability of scRNA-seq classifiers to adversarial attacks. The package is versatile and provides a format for integrating any type of classifier. It offers functions for studying and generating two types of attacks, single gene attack and max change attack. The single-gene attack involves making a small modification to the input to alter the classification. The max-change attack involves making a large modification to the input without changing its classification. The package provides a comprehensive solution for evaluating the robustness of scRNA-seq classifiers against adversarial attacks.
Maintained by Ghislain FIEVET. Last updated 5 months ago.
software singlecell transcriptomics classification
3.4 match 5.42 score 19 scripts
2005m
kit:Data Manipulation Functions Implemented in C
Basic functions, implemented in C, for large data manipulation. Fast vectorised ifelse()/nested if()/switch() functions, psum()/pprod() functions equivalent to pmin()/pmax() plus others which are missing from base R. Most of these functions are callable at C level.
Maintained by Morgan Jacob. Last updated 6 months ago.
2.0 match 58 stars 9.11 score 92 scripts 5 dependents
sdctools
sdcMicro:Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation
Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. The theoretical basis for the methods implemented can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface, published in Meindl and Templ (2019) <doi:10.3390/a12090191>, that gives access to various methods of this package.
Maintained by Matthias Templ. Last updated 30 days ago.
1.9 match 84 stars 9.63 score 258 scripts
spatstat
spatstat.univar:One-Dimensional Probability Distribution Support for the 'spatstat' Family
Estimation of one-dimensional probability distributions including kernel density estimation, weighted empirical cumulative distribution functions, Kaplan-Meier and reduced-sample estimators for right-censored data, heat kernels, kernel properties, quantiles and integration.
Maintained by Adrian Baddeley. Last updated 15 days ago.
1.8 match 3 stars 9.93 score 1 scripts 239 dependents
wenchao-ma
GDINA:The Generalized DINA Model Framework
A set of psychometric tools for cognitive diagnosis modeling based on the generalized deterministic inputs, noisy and gate (G-DINA) model by de la Torre (2011) <DOI:10.1007/s11336-011-9207-7> and its extensions, including the sequential G-DINA model by Ma and de la Torre (2016) <DOI:10.1111/bmsp.12070> for polytomous responses, and the polytomous G-DINA model by Chen and de la Torre <DOI:10.1177/0146621613479818> for polytomous attributes. Joint attribute distribution can be independent, saturated, higher-order, loglinear smoothed or structured. Q-matrix validation, item and model fit statistics, model comparison at test and item level and differential item functioning can also be conducted. A graphical user interface is also provided. For tutorials, please check Ma and de la Torre (2020) <DOI:10.18637/jss.v093.i14>, Ma and de la Torre (2019) <DOI:10.1111/emip.12262>, Ma (2019) <DOI:10.1007/978-3-030-05584-4_29> and de la Torre and Akbay (2019).
Maintained by Wenchao Ma. Last updated 1 month ago.
cdm cognitive-diagnosis dcm dina-model dino estimation-models gdina item-response-theory psychometrics openblas cpp
2.0 match 30 stars 8.92 score 94 scripts 6 dependents
stemangiola
tidyseurat:Brings Seurat to the Tidyverse
It creates an invisible layer that allows the 'Seurat' object to be viewed as a tibble and to interact seamlessly with the tidyverse.
Maintained by Stefano Mangiola. Last updated 8 months ago.
assaydomain infrastructure rnaseq differentialexpression geneexpression normalization clustering qualitycontrol sequencing transcription transcriptomics dplyr ggplot2 pca purrr sct seurat single-cell single-cell-rna-seq tibble tidyr tidyverse transcripts tsne umap
1.9 match 159 stars 9.48 score 398 scripts 1 dependents
bioc
SpatialFeatureExperiment:Integrating SpatialExperiment with Simple Features in sf
A new S4 class integrating Simple Features with the R package sf to bring geospatial data analysis methods based on vector data to spatial transcriptomics. Also implements management of spatial neighborhood graphs and geometric operations. This package builds upon SpatialExperiment and SingleCellExperiment, hence methods for these parent classes can still be used.
Maintained by Lambda Moses. Last updated 1 month ago.
datarepresentation transcriptomics spatial
1.9 match 49 stars 9.40 score 322 scripts 1 dependents
hope-data-science
tidyfst:Tidy Verbs for Fast Data Manipulation
A toolkit of tidy data manipulation verbs with 'data.table' as the backend. Combining the syntactic elegance of 'dplyr' with the computing performance of 'data.table', 'tidyfst' intends to provide users with state-of-the-art data manipulation tools with the least pain. This package is an extension of 'data.table'. While enjoying a tidy syntax, it also wraps combinations of efficient functions to facilitate frequently used data operations.
Maintained by Tian-Yuan Huang. Last updated 6 months ago.
1.8 match 100 stars 10.06 score 118 scripts 4 dependents
kbroman
broman:Karl Broman's R Code
Miscellaneous R functions, including functions related to graphics (mostly for base graphics), permutation tests, running mean/median, and general utilities.
Maintained by Karl W Broman. Last updated 10 months ago.
2.0 match 183 stars 8.80 score 648 scripts 1 dependents
ramhiser
itertools2:Iterators for efficient looping
A port of Python's excellent itertools module to R for efficient looping.
Maintained by John A. Ramey. Last updated 9 years ago.
3.4 match 12 stars 5.10 score 35 scripts 2 dependents
eddelbuettel
RcppUUID:Generating Universally Unique Identificators
Using the efficient implementation in the Boost C++ library, functions are provided to generate vectors of 'Universally Unique Identifiers (UUID)' from R supporting random (version 4), name (version 5) and time (version 7) 'UUIDs'. The initial repository was at <https://gitlab.com/artemklevtsov/rcppuuid>.
Maintained by Dirk Eddelbuettel. Last updated 1 month ago.
5.5 match 1 stars 3.18 score 1 scripts
jinkim3
kim:A Toolkit for Behavioral Scientists
A collection of functions for analyzing data typically collected or used by behavioral scientists. Examples of the functions include a function that compares groups in a factorial experimental design, a function that conducts two-way analysis of variance (ANOVA), and a function that cleans a data set generated by Qualtrics surveys. Some of the functions will require installing additional package(s). Such packages and other references are cited within the section describing the relevant functions. Many functions in this package rely heavily on these two popular R packages: Dowle et al. (2021) <https://CRAN.R-project.org/package=data.table>. Wickham et al. (2021) <https://CRAN.R-project.org/package=ggplot2>.
Maintained by Jin Kim. Last updated 22 days ago.
3.8 match 7 stars 4.66 score 3 scripts
s-u
fastmatch:Fast 'match()' Function
Package providing a fast match() replacement for cases that require repeated look-ups. It is slightly faster than R's built-in match() function on the first match against a table, but extremely fast on any subsequent lookup as it keeps the hash table in memory.
Maintained by Simon Urbanek. Last updated 3 months ago.
1.8 match 20 stars 9.93 score 251 scripts 375 dependents
ms609
TreeTools:Create, Modify and Analyse Phylogenetic Trees
Efficient implementations of functions for the creation, modification and analysis of phylogenetic trees. Applications include: generation of trees with specified shapes; tree rearrangement; analysis of tree shape; rooting of trees and extraction of subtrees; calculation and depiction of split support; plotting the position of rogue taxa (Klopfstein & Spasojevic 2019) <doi:10.1371/journal.pone.0212942>; calculation of ancestor-descendant relationships, of 'stemwardness' (Asher & Smith, 2022) <doi:10.1093/sysbio/syab072>, and of tree balance (Mir et al. 2013, Lemant et al. 2022) <doi:10.1016/j.mbs.2012.10.005>, <doi:10.1093/sysbio/syac027>; artificial extinction (Asher & Smith, 2022) <doi:10.1093/sysbio/syab072>; import and export of trees from Newick, Nexus (Maddison et al. 1997) <doi:10.1093/sysbio/46.4.590>, and TNT <https://www.lillo.org.ar/phylogeny/tnt/> formats; and analysis of splits and cladistic information.
Maintained by Martin R. Smith. Last updated 1 month ago.
evolutionary-biology phylogenetic-trees phylogenetics cpp
1.8 match 21 stars 9.92 score 124 scripts 10 dependents
bioc
clustifyr:Classifier for Single-cell RNA-seq Using Cell Clusters
Package designed to aid in classifying cells from single-cell RNA sequencing data using external reference data (e.g., bulk RNA-seq, scRNA-seq, microarray, gene lists). A variety of correlation based methods and gene list enrichment methods are provided to assist cell type assignment.
Maintained by Rui Fu. Last updated 5 months ago.
singlecell annotation sequencing microarray geneexpression assign-identities clusters marker-genes rna-seq single-cell-rna-seq
1.8 match 119 stars 9.63 score 296 scripts
trinker
qdap:Bridging the Gap Between Qualitative Data and Quantitative Analysis
Automates many of the tasks associated with quantitative discourse analysis of transcripts, including frequency counts of sentence types, words, sentences, turns of talk, syllables and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher level analysis and visualization of text. This affords the user a more efficient and targeted analysis. 'qdap' is designed for transcript analysis; however, many functions are applicable to other areas of Text Mining/Natural Language Processing.
Maintained by Tyler Rinker. Last updated 4 years ago.
qdap quantitative-discourse-analysis text-analysis text-mining text-plotting openjdk
1.8 match 176 stars 9.61 score 1.3k scripts 3 dependents
cnrakt
haplotypes:Manipulating DNA Sequences and Estimating Unambiguous Haplotype Network with Statistical Parsimony
Provides S4 classes and methods for reading and manipulating aligned DNA sequences, supporting indel coding methods (only the simple indel coding method is available in the current version), showing base substitutions and indels, calculating absolute pairwise distances between DNA sequences, and collapsing identical DNA sequences into haplotypes or inferring haplotypes using a user-provided absolute pairwise character difference matrix. This package also includes S4 classes and methods for estimating genealogical relationships among haplotypes using statistical parsimony and plotting parsimony networks.
Maintained by Caner Aktas. Last updated 2 years ago.
5.0 match 1 stars 3.43 score 54 scripts
nickch-k
vtable:Variable Table for Variable Documentation
Automatically generates HTML variable documentation including variable names, labels, classes, value labels (if applicable), value ranges, and summary statistics. See the vignette "vtable" for a package overview.
Maintained by Nick Huntington-Klein. Last updated 3 months ago.
1.9 match 40 stars 9.10 score 1.2k scripts
usepa
tcpl:ToxCast Data Analysis Pipeline
The ToxCast Data Analysis Pipeline ('tcpl') is an R package that manages, curve-fits, plots, and stores ToxCast data to populate its linked MySQL database, 'invitrodb'. The package was developed for the chemical screening data curated by the US EPA's Toxicity Forecaster (ToxCast) program, but 'tcpl' can be used to support diverse chemical screening efforts.
Maintained by Jason Brown. Last updated 13 hours ago.
1.8 match 36 stars 9.39 score 90 scripts
ropensci
qualtRics:Download 'Qualtrics' Survey Data
Provides functions to access survey results directly into R using the 'Qualtrics' API. 'Qualtrics' <https://www.qualtrics.com/about/> is an online survey and data collection software platform. See <https://api.qualtrics.com/> for more information about the 'Qualtrics' API. This package is community-maintained and is not officially supported by 'Qualtrics'.
Maintained by Julia Silge. Last updated 6 months ago.
api qualtrics qualtrics-api survey survey-data
1.7 match 221 stars 10.23 score 272 scripts
carlosp-carmona
TPD:Methods for Measuring Functional Diversity Based on Trait Probability Density
Tools to calculate trait probability density functions (TPD) at any scale (e.g. populations, species, communities). TPD functions are used to compute several indices of functional diversity, as well as its partition across scales. These indices constitute a unified framework that incorporates the underlying probabilistic nature of trait distributions into uni- or multidimensional functional trait-based studies. See Carmona et al. (2016) <doi:10.1016/j.tree.2016.02.003> for further information.
Maintained by Carlos P. Carmona. Last updated 6 years ago.
4.9 match 2 stars 3.42 score 33 scripts