R-universe search: needs:vctrs

tidyverse

ggplot2:Create Elegant Data Visualisations Using the Grammar of Graphics

A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

Maintained by Thomas Lin Pedersen. Last updated 6 days ago.

data-visualisation visualisation

6.6k stars 25.10 score 645k scripts 7.6k dependents

tidyverse

dplyr:A Grammar of Data Manipulation

A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

Maintained by Hadley Wickham. Last updated 28 days ago.

data-manipulation grammar cpp

4.8k stars 24.68 score 659k scripts 7.8k dependents

tidyverse

tibble:Simple Data Frames

Provides a 'tbl_df' class (the 'tibble') with stricter checking and better formatting than the traditional data frame.

Maintained by Kirill Müller. Last updated 9 hours ago.

tidy-data

696 stars 22.88 score 47k scripts 11k dependents

tidyverse

tidyr:Tidy Messy Data

Tools to help to create tidy data, where each column is a variable, each row is an observation, and each cell contains a single value. 'tidyr' contains tools for changing the shape (pivoting) and hierarchy (nesting and 'unnesting') of a dataset, turning deeply nested lists into rectangular data frames ('rectangling'), and extracting values out of string columns. It also includes tools for working with missing values (both implicit and explicit).

Maintained by Hadley Wickham. Last updated 28 days ago.

tidy-data cpp

1.4k stars 22.88 score 168k scripts 5.5k dependents

tidyverse

purrr:Functional Programming Tools

A complete and consistent functional programming toolkit for R.

Maintained by Hadley Wickham. Last updated 2 months ago.

functional-programming

1.3k stars 22.12 score 59k scripts 6.9k dependents

tidyverse

stringr:Simple, Consistent Wrappers for Common String Operations

A consistent, simple, and easy-to-use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"s and zero-length vectors in the same way, and the output from one function is easy to feed into the input of another.

Maintained by Hadley Wickham. Last updated 7 months ago.

regular-expression strings

628 stars 21.99 score 164k scripts 8.3k dependents

tidymodels

broom:Convert Statistical Objects into Tidy Tibbles

Summarizes key information about statistical objects in tidy tibbles. This makes it easy to report results, create plots and consistently work with large numbers of models at once. Broom provides three verbs that each provide different types of information about a model. tidy() summarizes information about model components such as coefficients of a regression. glance() reports information about an entire model, such as goodness of fit measures like AIC and BIC. augment() adds information about individual observations to a dataset, such as fitted values or influence measures.

Maintained by Simon Couch. Last updated 3 days ago.

modeling tidy-data

1.5k stars 21.58 score 37k scripts 1.5k dependents

igraph

igraph:Network Analysis and Visualization

Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.

Maintained by Kirill Müller. Last updated 8 hours ago.

complex-networks graph-algorithms graph-theory mathematics network-analysis network-graph fortran libxml2 glpk openblas cpp

584 stars 21.14 score 31k scripts 1.9k dependents

tidyverse

readxl:Read Excel Files

Import Excel files into R. Supports '.xls' via the embedded 'libxls' C library <https://github.com/libxls/libxls> and '.xlsx' via the embedded 'RapidXML' C++ library <https://rapidxml.sourceforge.net/>. Works on Windows, Mac and Linux without external dependencies.

Maintained by Jennifer Bryan. Last updated 24 days ago.

excel spreadsheet xls xlsx cpp

734 stars 20.85 score 160k scripts 815 dependents

tidyverse

tidyverse:Easily Install and Load the 'Tidyverse'

The 'tidyverse' is a set of packages that work in harmony because they share common data representations and 'API' design. This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step. Learn more about the 'tidyverse' at <https://www.tidyverse.org>.

Maintained by Hadley Wickham. Last updated 5 months ago.

data-science tidyverse

1.7k stars 20.23 score 664k scripts 125 dependents

tidyverse

readr:Read Rectangular Text Data

The goal of 'readr' is to provide a fast and friendly way to read rectangular data (like 'csv', 'tsv', and 'fwf'). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.

Maintained by Jennifer Bryan. Last updated 8 months ago.

csv fwf parsing cpp

1.0k stars 20.06 score 132k scripts 2.1k dependents

thomasp85

patchwork:The Composer of Plots

The 'ggplot2' package provides a strong API for sequentially building up a plot, but does not concern itself with composition of multiple plots. 'patchwork' is a package that expands the API to allow for arbitrarily complex composition of plots by, among others, providing mathematical operators for combining multiple plots. Other packages that try to address this need (but with a different approach) are 'gridExtra' and 'cowplot'.

Maintained by Thomas Lin Pedersen. Last updated 6 days ago.

ggplot-extension ggplot2 visualization

2.5k stars 19.83 score 82k scripts 657 dependents

tidyverse

dbplyr:A 'dplyr' Back End for Databases

A 'dplyr' back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a 'DBI' back end; more advanced features require 'SQL' translation to be provided by the package author.

Maintained by Hadley Wickham. Last updated 4 months ago.

database

481 stars 19.72 score 5.2k scripts 736 dependents

r-lib

devtools:Tools to Make Developing R Packages Easier

Collection of package development tools.

Maintained by Jennifer Bryan. Last updated 6 months ago.

package-creation

2.4k stars 19.55 score 51k scripts 150 dependents

plotly

plotly:Create Interactive Web Graphics via 'plotly.js'

Create interactive web graphics from 'ggplot2' graphs and/or a custom interface to the (MIT-licensed) JavaScript library 'plotly.js' inspired by the grammar of graphics.

Maintained by Carson Sievert. Last updated 4 months ago.

d3js data-visualization ggplot2 javascript plotly shiny webgl

2.6k stars 19.43 score 93k scripts 797 dependents

sfirke

janitor:Simple Tools for Examining and Cleaning Dirty Data

The main janitor functions can: perfectly format data.frame column names; provide quick counts of variable combinations (i.e., frequency tables and crosstabs); and explore duplicate records. Other janitor functions nicely format the tabulation results. These tabulate-and-report functions approximate popular features of SPSS and Microsoft Excel. This package follows the principles of the "tidyverse" and works well with the pipe function %>%. janitor was built with beginning-to-intermediate R users in mind and is optimized for user-friendliness.

Maintained by Sam Firke. Last updated 3 months ago.

data-analysis data-cleaning data-science dirty-data excel pivot-tables spss tabulations tidyverse

1.4k stars 19.40 score 35k scripts 231 dependents

haozhu233

kableExtra:Construct Complex Table with 'kable' and Pipe Syntax

Build complex HTML or 'LaTeX' tables using 'kable()' from 'knitr' and the piping syntax from 'magrittr'. Function 'kable()' is a lightweight table generator coming from 'knitr'. This package simplifies the way to manipulate the HTML or 'LaTeX' code generated by 'kable()' and allows users to construct complex tables and customize styles using a readable syntax.

Maintained by Hao Zhu. Last updated 25 days ago.

html kable kableextra knitr latex rmarkdown

702 stars 19.35 score 55k scripts 163 dependents

tidyverse

rvest:Easily Harvest (Scrape) Web Pages

Wrappers around the 'xml2' and 'httr' packages to make it easy to download, then manipulate, HTML and XML.

Maintained by Hadley Wickham. Last updated 5 months ago.

html web-scraping

1.5k stars 19.33 score 29k scripts 549 dependents

apache

arrow:Integration to 'Apache' 'Arrow'

'Apache' 'Arrow' <https://arrow.apache.org/> is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. This package provides an interface to the 'Arrow C++' library.

Maintained by Jonathan Keane. Last updated 2 months ago.

arrow curl openssl cpp

15k stars 19.25 score 10k scripts 82 dependents

topepo

caret:Classification and Regression Training

Miscellaneous functions for training and plotting classification and regression models.

Maintained by Max Kuhn. Last updated 4 months ago.

1.6k stars 19.24 score 61k scripts 303 dependents

slowkow

ggrepel:Automatically Position Non-Overlapping Text Labels with 'ggplot2'

Provides text and label geoms for 'ggplot2' that help to avoid overlapping text labels. Labels repel away from each other and away from the data points.

Maintained by Kamil Slowikowski. Last updated 5 months ago.

ggplot2 text visualization cpp

1.2k stars 19.20 score 37k scripts 1.2k dependents

stan-dev

rstan:R Interface to Stan

User-facing R functions are provided to parse, compile, test, estimate, and analyze Stan models by accessing the header-only Stan library provided by the 'StanHeaders' package. The Stan project develops a probabilistic programming language that implements full Bayesian statistical inference via Markov Chain Monte Carlo, rough Bayesian inference via 'variational' approximation, and (optionally penalized) maximum likelihood estimation via optimization. In all three cases, automatic differentiation is used to quickly and accurately evaluate gradients without burdening the user with the need to derive the partial derivatives.

Maintained by Ben Goodrich. Last updated 3 days ago.

bayesian-data-analysis bayesian-inference bayesian-statistics mcmc stan cpp

1.1k stars 18.86 score 14k scripts 281 dependents

wilkelab

cowplot:Streamlined Plot Theme and Plot Annotations for 'ggplot2'

Provides various features that help with creating publication-quality figures with 'ggplot2', such as a set of themes, functions to align plots and arrange them into complex compound figures, and functions that make it easy to annotate plots and/or mix plots with images. The package was originally written for internal use in the Wilke lab, hence the name (Claus O. Wilke's plot package). It has also been used extensively in the book Fundamentals of Data Visualization.

Maintained by Claus O. Wilke. Last updated 3 months ago.

714 stars 18.83 score 75k scripts 1.4k dependents

tidymodels

recipes:Preprocessing and Feature Engineering Steps for Modeling

A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps that provide preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.

Maintained by Max Kuhn. Last updated 3 days ago.

586 stars 18.80 score 7.2k scripts 383 dependents

tidyverse

forcats:Tools for Working with Categorical Variables (Factors)

Helpers for reordering factor levels (including moving specified levels to front, ordering by first appearance, reversing, and randomly shuffling), and tools for modifying factor levels (including collapsing rare levels into other, 'anonymising', and manually 'recoding').

Maintained by Hadley Wickham. Last updated 1 year ago.

factor tidyverse

553 stars 18.79 score 21k scripts 1.2k dependents

r-dbi

RSQLite:SQLite Interface for R

Embeds the SQLite database engine in R and provides an interface compliant with the DBI package. The source for the SQLite engine and for various extensions in a recent version is included. System libraries will never be consulted because this package relies on static linking for the plugins it includes; this also ensures a consistent experience across all installations.

Maintained by Kirill Müller. Last updated 1 day ago.

database sqlite3 cpp

331 stars 18.78 score 8.1k scripts 1.1k dependents

tidyverse

haven:Import and Export 'SPSS', 'Stata' and 'SAS' Files

Import foreign statistical formats into R via the embedded 'ReadStat' C library, <https://github.com/WizardMac/ReadStat>.

Maintained by Hadley Wickham. Last updated 6 months ago.

sas spss stata zlib cpp

427 stars 18.63 score 18k scripts 682 dependents

r-lib

roxygen2:In-Line Documentation for R

Generate your Rd documentation, 'NAMESPACE' file, and collation field using specially formatted comments. Writing documentation in-line with code makes it easier to keep your documentation up-to-date as your requirements change. 'roxygen2' is inspired by the 'Doxygen' system for C++.

Maintained by Hadley Wickham. Last updated 8 months ago.

devtools documentation cpp

606 stars 18.51 score 2.3k scripts 219 dependents

r-lib

pkgdown:Make Static HTML Documentation for a Package

Generate an attractive and useful website from a source package. 'pkgdown' converts your documentation, vignettes, 'README', and more to 'HTML' making it easy to share information about your package online.

Maintained by Hadley Wickham. Last updated 12 days ago.

documentation-tool

734 stars 18.46 score 588 scripts 162 dependents

rstudio

gt:Easily Create Presentation-Ready Display Tables

Build display tables from tabular data with an easy-to-use set of functions. With its progressive approach, we can construct display tables with a cohesive set of table parts. Table values can be formatted using any of the included formatting functions. Footnotes and cell styles can be precisely added through a location targeting system. The way in which 'gt' handles things for you means that you don't often have to worry about the fine details.

Maintained by Richard Iannone. Last updated 26 days ago.

docx easy-to-use html latex rtf summary-tables

2.1k stars 18.36 score 20k scripts 112 dependents

r-lib

pillar:Coloured Formatting for Columns

Provides 'pillar' and 'colonnade' generics designed for formatting columns of data using the full range of colours provided by modern terminals.

Maintained by Kirill Müller. Last updated 4 days ago.

colour

178 stars 18.33 score 233 scripts 11k dependents

r-lib

tidyselect:Select from a Set of Strings

A backend for the selecting functions of the 'tidyverse'. It makes it easy to implement select-like functions in your own packages in a way that is consistent with other 'tidyverse' interfaces for selection.

Maintained by Lionel Henry. Last updated 4 months ago.

130 stars 18.31 score 1.9k scripts 8.2k dependents

nanxstats

ggsci:Scientific Journal and Sci-Fi Themed Color Palettes for 'ggplot2'

A collection of 'ggplot2' color palettes inspired by plots in scientific journals, data visualization libraries, science fiction movies, and TV shows.

Maintained by Nan Xiao. Last updated 10 months ago.

color-palettes data-visualization ggplot2 ggsci sci-fi scientific-journals visualization

680 stars 18.00 score 26k scripts 438 dependents

sjmgarnier

viridis:Colorblind-Friendly Color Maps for R

Color maps designed to improve graph readability for readers with common forms of color blindness and/or color vision deficiency. The color maps are also perceptually-uniform, both in regular form and also when converted to black-and-white for printing. This package also contains 'ggplot2' bindings for discrete and continuous color and fill scales. A lean version of the package called 'viridisLite' that does not include the 'ggplot2' bindings can be found at <https://cran.r-project.org/package=viridisLite>.

Maintained by Simon Garnier. Last updated 1 year ago.

color-blindness color-scheme

298 stars 17.96 score 49k scripts 1.2k dependents

tidyverse

vroom:Read and Write Rectangular Text Data Quickly

The goal of 'vroom' is to read and write data (like 'csv', 'tsv' and 'fwf') quickly. When reading it uses a quick initial indexing step, then reads the values lazily, so only the data you actually use needs to be read. The writer formats the data in parallel and writes to disk asynchronously from formatting.

Maintained by Jennifer Bryan. Last updated 7 months ago.

csv csv-parser fixed-width-text tsv tsv-parser cpp

625 stars 17.82 score 4.5k scripts 2.1k dependents

r-lib

httr2:Perform HTTP Requests and Process the Responses

Tools for creating and modifying HTTP requests, then performing them and processing the results. 'httr2' is a modern re-imagining of 'httr' that uses a pipe-based interface and solves more of the problems that API wrapping packages face.

Maintained by Hadley Wickham. Last updated 4 days ago.

http

246 stars 17.64 score 1.9k scripts 1.1k dependents

harrelfe

Hmisc:Harrell Miscellaneous

Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, simulation, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and HTML code, recoding variables, caching, simplified parallel computing, encrypting and decrypting data using a safe workflow, general moving window statistical estimation, and assistance in interpreting principal component analysis.

Maintained by Frank E Harrell Jr. Last updated 6 days ago.

fortran

209 stars 17.64 score 17k scripts 750 dependents

r-lib

usethis:Automate Package and Project Setup

Automate package and project setup tasks that are otherwise performed manually. This includes setting up unit testing, test coverage, continuous integration, Git, 'GitHub', licenses, 'Rcpp', 'RStudio' projects, and more.

Maintained by Jennifer Bryan. Last updated 26 days ago.

github setup

869 stars 17.54 score 5.6k scripts 336 dependents

robjhyndman

forecast:Forecasting Functions for Time Series and Linear Models

Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.

Maintained by Rob Hyndman. Last updated 7 months ago.

forecast forecasting openblas cpp

1.1k stars 17.46 score 16k scripts 240 dependents

stan-dev

loo:Efficient Leave-One-Out Cross-Validation and WAIC for Bayesian Models

Efficient approximate leave-one-out cross-validation (LOO) for Bayesian models fit using Markov chain Monte Carlo, as described in Vehtari, Gelman, and Gabry (2017) <doi:10.1007/s11222-016-9696-4>. The approximation uses Pareto smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. As a byproduct of the calculations, we also obtain approximate standard errors for estimated predictive errors and for the comparison of predictive errors between models. The package also provides methods for using stacking and other model weighting techniques to average Bayesian predictive distributions.

Maintained by Jonah Gabry. Last updated 18 days ago.

bayes bayesian bayesian-data-analysis bayesian-inference bayesian-methods bayesian-statistics cross-validation information-criterion model-comparison stan

152 stars 17.30 score 2.6k scripts 297 dependents

hadley

reshape2:Flexibly Reshape Data: A Reboot of the Reshape Package

Flexibly restructure and aggregate data using just two functions: melt and 'dcast' (or 'acast').

Maintained by Hadley Wickham. Last updated 4 years ago.

cpp

210 stars 17.19 score 94k scripts 2.0k dependents

talgalili

dendextend:Extending 'dendrogram' Functionality in R

Offers a set of functions for extending 'dendrogram' objects in R, letting you visualize and compare trees of 'hierarchical clusterings'. You can (1) adjust a tree's graphical parameters (the color, size, type, etc. of its branches, nodes and labels) and (2) visually and statistically compare different 'dendrograms' to one another.

Maintained by Tal Galili. Last updated 2 months ago.

154 stars 17.13 score 6.0k scripts 165 dependents

gesistsa

rio:A Swiss-Army Knife for Data I/O

Streamlined data import and export by making assumptions that the user is probably willing to make: 'import()' and 'export()' determine the data format from the file extension, reasonable defaults are used for data import and export, web-based import is natively supported (including from SSL/HTTPS), compressed files can be read directly, and fast import packages are used where appropriate. An additional convenience function, 'convert()', provides a simple method for converting between file types.

Maintained by Chung-hong Chan. Last updated 3 months ago.

csv csvy data data-science excel io rio sas spss stata

610 stars 17.10 score 7.8k scripts 74 dependents

bioc

clusterProfiler:A universal enrichment tool for interpreting omics data

This package supports functional characteristics of both coding and non-coding genomics data for thousands of species with up-to-date gene annotation. It provides a universal interface for gene functional annotation from a variety of sources and thus can be applied in diverse scenarios. It provides a tidy interface to access, manipulate, and visualize enrichment results to help users achieve efficient data interpretation. Datasets obtained from multiple treatments and time points can be analyzed and compared in a single run, easily revealing functional consensus and differences among distinct conditions.

Maintained by Guangchuang Yu. Last updated 4 months ago.

annotation clustering genesetenrichment go kegg multiplecomparison pathways reactome visualization enrichment-analysis gsea

1.1k stars 17.03 score 11k scripts 48 dependents

ddsjoberg

gtsummary:Presentation-Ready Data Summary and Analytic Result Tables

Creates presentation-ready tables summarizing data sets, regression models, and more. The code to create the tables is concise and highly customizable. Data frames can be summarized with any function, e.g. mean(), median(), even user-written functions. Regression models are summarized and include the reference rows for categorical variables. Common regression models, such as logistic regression and Cox proportional hazards regression, are automatically identified and the tables are pre-filled with appropriate column headers.

Maintained by Daniel D. Sjoberg. Last updated 6 days ago.

easy-to-use gt html5 regression-models reproducibility reproducible-research statistics summary-statistics summary-tables table1 tableone

1.1k stars 17.02 score 8.2k scripts 15 dependents

thomasp85

ggraph:An Implementation of Grammar of Graphics for Graphs and Networks

The grammar of graphics as implemented in ggplot2 is a poor fit for graph and network visualizations due to its reliance on tabular data input. ggraph is an extension of the ggplot2 API tailored to graph visualizations and provides the same flexible approach to building up plots layer by layer.

Maintained by Thomas Lin Pedersen. Last updated 1 year ago.

ggplot-extension ggplot2 graph-visualization network-visualization visualization cpp

1.1k stars 16.96 score 9.2k scripts 111 dependents

const-ae

ggsignif:Significance Brackets for 'ggplot2'

Enrich your 'ggplots' with group-wise comparisons. This package provides an easy way to indicate if two groups are significantly different. Commonly this is shown by a bracket on top connecting the groups of interest which itself is annotated with the level of significance (NS, *, **, ***). The package provides a single layer (geom_signif()) that takes the groups for comparison and the test (t.test(), wilcox.test() etc.) as arguments and adds the annotation to the plot.

Maintained by Constantin Ahlmann-Eltze. Last updated 8 months ago.

asterisk ggplot-extension ggplot2 significance-stars

601 stars 16.89 score 3.6k scripts 417 dependents

satijalab

Seurat:Tools for Single Cell Genomics

A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031>, and Hao, Hao, et al (2020) <doi:10.1101/2020.10.12.335331> for more details.

Maintained by Paul Hoffman. Last updated 1 year ago.

human-cell-atlas single-cell-genomics single-cell-rna-seq cpp

2.4k stars 16.86 score 50k scripts 73 dependents

juliasilge

tidytext:Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools

Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.

Maintained by Julia Silge. Last updated 12 months ago.

natural-language-processing text-mining tidy-data tidyverse

1.2k stars 16.86 score 17k scripts 61 dependents

bioc

ggtree:an R package for visualization of tree and annotation data

'ggtree' extends the 'ggplot2' plotting system, which implements the grammar of graphics. 'ggtree' is designed for visualization and annotation of phylogenetic trees and other tree-like structures with their annotation data.

Maintained by Guangchuang Yu. Last updated 5 months ago.

alignment annotation clustering dataimport multiplesequencealignment phylogenetics reproducibleresearch software visualization annotations ggplot2 phylogenetic-trees

871 stars 16.83 score 5.1k scripts 109 dependents

ropensci

skimr:Compact and Flexible Summaries of Data

A simple to use summary function that can be used with pipes and displays nicely in the console. The default summary statistics may be modified by the user as can the default formatting. Support for data frames and vectors is included, and users can implement their own skim methods for specific object types as described in a vignette. Default summaries include support for inline spark graphs. Instructions for managing these on specific operating systems are given in the "Using skimr" vignette and the README.

Maintained by Elin Waring. Last updated 2 months ago.

peer-reviewed ropensci summary-statistics unconf unconf17

1.1k stars 16.80 score 18k scripts 14 dependents

andrisignorell

DescTools:Tools for Descriptive Statistics

A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox that facilitates the (notoriously time-consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. The package furthermore contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and argument naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in absence of convincing alternatives). The 'BigCamelCase' style was consequently applied to functions borrowed from contributed R packages as well.

Maintained by Andri Signorell. Last updated 3 days ago.

fortran cpp

86 stars 16.73 score 7.7k scripts 101 dependents

tidymodels

rsample:General Resampling Infrastructure

Classes and functions to create and summarize different types of resampling objects (e.g. bootstrap, cross-validation).

Maintained by Hannah Frick. Last updated 20 days ago.

341 stars 16.72 score 5.2k scripts 79 dependents

wilkelab

ggridges:Ridgeline Plots in 'ggplot2'

Ridgeline plots provide a convenient way of visualizing changes in distributions over time or space. This package enables the creation of such plots in 'ggplot2'.

Maintained by Claus O. Wilke. Last updated 4 months ago.

418 stars 16.71 score 14k scripts 285 dependents

klausvigo

phangorn:Phylogenetic Reconstruction and Analysis

Allows for estimation of phylogenetic trees and networks using Maximum Likelihood, Maximum Parsimony, distance methods and Hadamard conjugation (Schliep 2011). Offers methods for tree comparison, model selection and visualization of phylogenetic networks as described in Schliep et al. (2017).

Maintained by Klaus Schliep. Last updated 3 days ago.

software technology qualitycontrol phylogenetic-analysis phylogenetics openblas cpp

206 stars 16.70 score 2.5k scripts 135 dependents

stan-dev

bayesplot:Plotting for Bayesian Models

Plotting functions for posterior analysis, MCMC diagnostics, prior and posterior predictive checks, and other visualizations to support the applied Bayesian workflow advocated in Gabry, Simpson, Vehtari, Betancourt, and Gelman (2019) <doi:10.1111/rssa.12378>. The package is designed not only to provide convenient functionality for users, but also a common set of functions that can be easily used by developers working on a variety of R packages for Bayesian modeling, particularly (but not exclusively) packages interfacing with 'Stan'.

Maintained by Jonah Gabry. Last updated 2 months ago.

bayesian ggplot2 mcmc pandoc stan statistical-graphics visualization

436 stars 16.69 score 6.5k scripts 98 dependents

kassambara

ggpubr:'ggplot2' Based Publication Ready Plots

The 'ggplot2' package is excellent and flexible for elegant data visualization in R. However, the default generated plots require some formatting before we can send them for publication. Furthermore, to customize a 'ggplot', the syntax is opaque and this raises the level of difficulty for researchers with no advanced R programming skills. 'ggpubr' provides some easy-to-use functions for creating and customizing 'ggplot2'-based publication-ready plots.

Maintained by Alboukadel Kassambara. Last updated 2 years ago.

1.2k stars 16.68 score 65k scripts 409 dependents

paul-buerkner

brms:Bayesian Regression Models using 'Stan'

Fit Bayesian generalized (non-)linear multivariate multilevel models using 'Stan' for full Bayesian inference. A wide range of distributions and link functions are supported, allowing users to fit -- among others -- linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, and even self-defined mixture models all in a multilevel context. Further modeling options include both theory-driven and data-driven non-linear terms, auto-correlation structures, censoring and truncation, meta-analytic standard errors, and quite a few more. In addition, all parameters of the response distribution can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their prior knowledge. Models can easily be evaluated and compared using several methods assessing posterior or prior predictions. References: Bürkner (2017) <doi:10.18637/jss.v080.i01>; Bürkner (2018) <doi:10.32614/RJ-2018-017>; Bürkner (2021) <doi:10.18637/jss.v100.i05>; Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>.

Maintained by Paul-Christian Bürkner. Last updated 5 days ago.

bayesian-inference brms multilevel-models stan statistical-models

1.3k stars 16.64 score 13k scripts 35 dependents

amices

mice:Multivariate Imputation by Chained Equations

Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.

Maintained by Stef van Buuren. Last updated 3 days ago.

chained-equations fcs imputation mice missing-data missing-values multiple-imputation multivariate-data cpp

462 stars 16.64 score 10k scripts 154 dependents

tidyverse

hms:Pretty Time of Day

Implements an S3 class for storing and formatting time-of-day values, based on the 'difftime' class.

Maintained by Kirill Müller. Last updated 5 days ago.

hms time

141 stars 16.54 score 1.3k scripts 3.2k dependents

tidymodels

tidymodels:Easily Install and Load the 'Tidymodels' Packages

The tidy modeling "verse" is a collection of packages for modeling and statistical analysis that share the underlying design philosophy, grammar, and data structures of the tidyverse.

Maintained by Max Kuhn. Last updated 1 month ago.

783 stars 16.52 score 66k scripts 15 dependents

tidyverse

modelr:Modelling Functions that Work with the Pipe

Functions for modelling that help you seamlessly integrate modelling into a pipeline of data manipulation and visualisation.

Maintained by Hadley Wickham. Last updated 1 year ago.

modelling

400 stars 16.46 score 6.9k scripts 1.1k dependents

tidymodels

parsnip:A Common API to Modeling and Analysis Functions

A common interface is provided to allow users to specify a model without having to remember the different argument names across different functions or computational engines (e.g. 'R', 'Spark', 'Stan', 'H2O', etc).

Maintained by Max Kuhn. Last updated 19 days ago.

612 stars 16.37 score 3.4k scripts 69 dependents

bioc

fgsea:Fast Gene Set Enrichment Analysis

The package implements an algorithm for fast gene set enrichment analysis. The fast algorithm makes it possible to run more permutations and obtain more fine-grained p-values, which in turn allows accurate standard approaches to multiple hypothesis correction to be used.

Maintained by Alexey Sergushichev. Last updated 12 days ago.

geneexpression differentialexpression genesetenrichment pathways cpp

392 stars 16.31 score 3.9k scripts 101 dependents

r-dbi

odbc:Connect to ODBC Compatible Databases (using the DBI Interface)

A DBI-compatible interface to ODBC databases.

Maintained by Hadley Wickham. Last updated 4 days ago.

database odbc unixodbc cpp

396 stars 16.31 score 2.9k scripts 23 dependents

r-tmap

tmap:Thematic Maps

Thematic maps are geographical maps in which spatial data distributions are visualized. This package offers a flexible, layer-based, and easy to use approach to create thematic maps, such as choropleths and bubble maps.

Maintained by Martijn Tennekes. Last updated 3 days ago.

choropleth-maps maps spatial thematic-maps visualisation

879 stars 16.25 score 13k scripts 24 dependents

stan-dev

posterior:Tools for Working with Posterior Distributions

Provides useful tools for both users and developers of packages for fitting Bayesian models or working with output from Bayesian models. The primary goals of the package are to: (a) Efficiently convert between many different useful formats of draws (samples) from posterior or prior distributions. (b) Provide consistent methods for operations commonly performed on draws, for example, subsetting, binding, or mutating draws. (c) Provide various summaries of draws in convenient formats. (d) Provide lightweight implementations of state of the art posterior inference diagnostics. References: Vehtari et al. (2021) <doi:10.1214/20-BA1221>.

Maintained by Paul-Christian Bürkner. Last updated 3 days ago.

bayes bayesian mcmc

168 stars 16.21 score 3.3k scripts 346 dependents

jrnold

ggthemes:Extra Themes, Scales and Geoms for 'ggplot2'

Some extra themes, geoms, and scales for 'ggplot2'. Provides 'ggplot2' themes and scales that replicate the look of plots by Edward Tufte, Stephen Few, 'Fivethirtyeight', 'The Economist', 'Stata', 'Excel', and 'The Wall Street Journal', among others. Provides 'geoms' for Tufte's box plot and range frame.

Maintained by Jeffrey B. Arnold. Last updated 1 year ago.

data-visualisation ggplot2 ggplot2-themes plot plotting theme visualization

1.3k stars 16.17 score 40k scripts 102 dependents

ggobi

GGally:Extension to 'ggplot2'

The R package 'ggplot2' is a plotting system based on the grammar of graphics. 'GGally' extends 'ggplot2' by adding several functions to reduce the complexity of combining geometric objects with transformed data. Some of these functions include a pairwise plot matrix, a two group pairwise plot matrix, a parallel coordinates plot, a survival plot, and several functions to plot networks.

Maintained by Barret Schloerke. Last updated 11 months ago.

597 stars 16.15 score 17k scripts 154 dependents

r-lib

styler:Non-Invasive Pretty Printing of R Code

Pretty-prints R code without changing the user's formatting intent.

Maintained by Lorenz Walthert. Last updated 2 months ago.

pretty-print

754 stars 16.15 score 940 scripts 62 dependents

bioc

DESeq2:Differential gene expression analysis based on the negative binomial distribution

Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.

Maintained by Michael Love. Last updated 26 days ago.

sequencing rnaseq chipseq geneexpression transcription normalization differentialexpression bayesian regression principalcomponent clustering immunooncology openblas cpp

375 stars 16.11 score 17k scripts 115 dependents

tidyverse

dtplyr:Data Table Back-End for 'dplyr'

Provides a data.table backend for 'dplyr'. The goal of 'dtplyr' is to allow you to write 'dplyr' code that is automatically translated to the equivalent, but usually much faster, data.table code.

Maintained by Hadley Wickham. Last updated 2 months ago.

datatable dplyr

671 stars 16.11 score 2.5k scripts 148 dependents

bioc

biomaRt:Interface to BioMart databases (i.e. Ensembl)

In recent years a wealth of biological data has become available in public data repositories. Easy access to these valuable data resources and firm integration with data analysis is needed for comprehensive bioinformatics data analysis. biomaRt provides an interface to a growing collection of databases implementing the BioMart software suite (<http://www.biomart.org>). The package enables retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas or write complex SQL queries. The most prominent examples of BioMart databases are maintained by Ensembl, which provides biomaRt users direct access to a diverse set of data and enables a wide range of powerful online queries from gene annotation to database mining.

Maintained by Mike Smith. Last updated 17 days ago.

annotation bioconductor biomart ensembl

38 stars 15.99 score 13k scripts 230 dependents

thomasp85

ggforce:Accelerating 'ggplot2'

The aim of 'ggplot2' is to aid in visual data investigations. This focus has led to a lack of facilities for composing specialised plots. 'ggforce' aims to be a collection of mainly new stats and geoms that fills this gap. All additional functionality is aimed to come through the official extension system so using 'ggforce' should be a stable experience.

Maintained by Thomas Lin Pedersen. Last updated 6 days ago.

ggplot-extension ggplot2 visualization cpp

929 stars 15.98 score 9.3k scripts 298 dependents

kassambara

survminer:Drawing Survival Curves using 'ggplot2'

Contains the function 'ggsurvplot()' for easily drawing beautiful and 'ready-to-publish' survival curves with the 'number at risk' table and 'censoring count plot'. Other functions are also available to plot adjusted curves for the 'Cox' model and to visually examine 'Cox' model assumptions.

Maintained by Alboukadel Kassambara. Last updated 5 months ago.

524 stars 15.87 score 7.0k scripts 55 dependents

tidymodels

infer:Tidy Statistical Inference

The objective of this package is to perform inference using an expressive statistical grammar that coheres with the tidy design framework.

Maintained by Simon Couch. Last updated 6 months ago.

736 stars 15.75 score 3.5k scripts 18 dependents

r-lib

progress:Terminal Progress Bars

Configurable progress bars that may include percentage, elapsed time, and/or the estimated completion time. They work in terminals, in 'Emacs' 'ESS', 'RStudio', 'Windows' 'Rgui' and the 'macOS' 'R.app'. The package also provides a 'C++' 'API' that works with or without 'Rcpp'.

Maintained by Gábor Csárdi. Last updated 5 months ago.

470 stars 15.72 score 2.1k scripts 3.0k dependents

wilkelab

ggtext:Improved Text Rendering Support for 'ggplot2'

A 'ggplot2' extension that enables the rendering of complex formatted plot labels (titles, subtitles, facet labels, axis labels, etc.). Text boxes with automatic word wrap are also supported.

Maintained by Brenton M. Wiernik. Last updated 3 years ago.

657 stars 15.71 score 13k scripts 155 dependents

bioc

enrichplot:Visualization of Functional Enrichment Result

The 'enrichplot' package implements several visualization methods for interpreting functional enrichment results obtained from ORA or GSEA analysis. It is mainly designed to work with the 'clusterProfiler' package suite. All the visualization methods are developed based on 'ggplot2' graphics.

Maintained by Guangchuang Yu. Last updated 3 months ago.

annotation genesetenrichment go kegg pathways software visualization enrichment-analysis pathway-analysis

239 stars 15.71 score 3.1k scripts 58 dependents

stan-dev

rstanarm:Bayesian Applied Regression Modeling via Stan

Estimates previously compiled regression models using the 'rstan' package, which provides the R interface to the Stan C++ library for Bayesian estimation. Users specify models via the customary R syntax with a formula and data.frame plus some additional arguments for priors.

Maintained by Ben Goodrich. Last updated 12 days ago.

bayesian bayesian-data-analysis bayesian-inference bayesian-methods bayesian-statistics multilevel-models rstan rstanarm stan statistical-modeling cpp

393 stars 15.70 score 5.0k scripts 13 dependents

r-lib

profvis:Interactive Visualizations for Profiling R Code

Interactive visualizations for profiling R code.

Maintained by Hadley Wickham. Last updated 6 months ago.

310 stars 15.64 score 1.3k scripts 153 dependents

njtierney

naniar:Data Structures, Summaries, and Visualisations for Missing Data

Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. 'naniar' provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of 'ggplot2' and tidy data. The work is fully discussed at Tierney & Cook (2023) <doi:10.18637/jss.v105.i07>.

Maintained by Nicholas Tierney. Last updated 18 days ago.

data-visualisation ggplot2 missing-data missingness tidy-data

657 stars 15.63 score 5.1k scripts 9 dependents

facebook

prophet:Automatic Forecasting Procedure

Implements a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.

Maintained by Sean Taylor. Last updated 5 months ago.

forecasting python cpp

19k stars 15.59 score 976 scripts 13 dependents

r-lib

gh:'GitHub' 'API'

Minimal client to access the 'GitHub' 'API'.

Maintained by Gábor Csárdi. Last updated 2 months ago.

github github-api

224 stars 15.55 score 444 scripts 401 dependents

thomasp85

gganimate:A Grammar of Animated Graphics

The grammar of graphics as implemented in the 'ggplot2' package has been successful in providing a powerful API for creating static visualisations. In order to extend the API to animated graphics, this package provides a completely new grammar, fully compatible with 'ggplot2', for specifying transitions and animations in a flexible and extensible way.

Maintained by Thomas Lin Pedersen. Last updated 6 days ago.

animation data-visualization ggplot-extension ggplot2 transition

2.0k stars 15.53 score 13k scripts 24 dependents

rstudio

tensorflow:R Interface to 'TensorFlow'

Interface to 'TensorFlow' <https://www.tensorflow.org/>, an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more 'CPUs' or 'GPUs' in a desktop, server, or mobile device with a single 'API'. 'TensorFlow' was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.

Maintained by Tomasz Kalinowski. Last updated 5 days ago.

1.3k stars 15.47 score 3.2k scripts 75 dependents

tidymodels

yardstick:Tidy Characterizations of Model Performance

Tidy tools for quantifying how well a model fits a data set, such as confusion matrices, class probability curve summaries, and regression metrics (e.g., RMSE).

Maintained by Emil Hvitfeldt. Last updated 19 days ago.

387 stars 15.47 score 2.2k scripts 60 dependents

eclarke

ggbeeswarm:Categorical Scatter (Violin Point) Plots

Provides two methods of plotting categorical scatter plots such that the arrangement of points within a category reflects the density of data at that region, and avoids over-plotting.

Maintained by Erik Clarke. Last updated 5 months ago.

550 stars 15.45 score 7.6k scripts 84 dependents

r-forge

car:Companion to Applied Regression

Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, 2019.

Maintained by John Fox. Last updated 5 months ago.

15.38 score 43k scripts 919 dependents

statnet

ergm:Fit, Simulate and Diagnose Exponential-Family Models for Networks

An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) <doi:10.18637/jss.v024.i03> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.

Maintained by Pavel N. Krivitsky. Last updated 21 days ago.

100 stars 15.36 score 1.4k scripts 36 dependents

bioc

GenomicFeatures:Query the gene models of a given organism/assembly

Extract the genomic locations of genes, transcripts, exons, introns, and CDS, for the gene models stored in a TxDb object. A TxDb object is a small database that contains the gene models of a given organism/assembly. Bioconductor provides a small collection of TxDb objects in the form of ready-to-install TxDb packages for the most commonly studied organisms. Additionally, the user can easily make a TxDb object (or package) for the organism/assembly of their choice by using the tools from the txdbmaker package.

Maintained by H. Pagès. Last updated 5 months ago.

genetics infrastructure annotation sequencing genomeannotation bioconductor-package core-package

26 stars 15.34 score 5.3k scripts 339 dependents

hms-dbmi

UpSetR:A More Scalable Alternative to Venn and Euler Diagrams for Visualizing Intersecting Sets

Creates visualizations of intersecting sets using a novel matrix design, along with visualizations of several common set, element and attribute related tasks (Conway 2017) <doi:10.1093/bioinformatics/btx364>.

Maintained by Jake Conway. Last updated 4 years ago.

gehlenborglab ggplot2 upset upsetr visualization

781 stars 15.33 score 4.8k scripts 42 dependents

gforge

htmlTable:Advanced Tables for Markdown/HTML

Tables with state-of-the-art layout elements such as row spanners, column spanners, table spanners, zebra striping, and more. While allowing advanced layout, the underlying css-structure is simple in order to maximize compatibility with common word processors. The package also contains a few text formatting functions that help outputting text compatible with HTML/LaTeX.

Maintained by Max Gordon. Last updated 8 months ago.

knitr table

79 stars 15.33 score 1.3k scripts 767 dependents

rich-iannone

DiagrammeR:Graph/Network Visualization

Build graph/network structures using functions for stepwise addition and deletion of nodes and edges. Work with data available in tables for bulk addition of nodes, edges, and associated metadata. Use graph selections and traversals to apply changes to specific nodes or edges. A wide selection of graph algorithms allows for the analysis of graphs. Visualize the graphs and take advantage of any aesthetic properties assigned to nodes and edges.

Maintained by Richard Iannone. Last updated 2 months ago.

graph graph-functions network-graph property-graph visualization

1.7k stars 15.29 score 3.8k scripts 86 dependents

kassambara

rstatix:Pipe-Friendly Framework for Basic Statistical Tests

Provides a simple and intuitive pipe-friendly framework, coherent with the 'tidyverse' design philosophy, for performing basic statistical tests, including t-test, Wilcoxon test, ANOVA, Kruskal-Wallis and correlation analyses. The output of each test is automatically transformed into a tidy data frame to facilitate visualization. Additional functions are available for reshaping, reordering, manipulating and visualizing a correlation matrix. Functions are also included to facilitate the analysis of factorial experiments, including purely 'within-Ss' designs (repeated measures), purely 'between-Ss' designs, and mixed 'within-and-between-Ss' designs. It's also possible to compute several effect size metrics, including "eta squared" for ANOVA, "Cohen's d" for t-test and 'Cramer V' for the association between categorical variables. The package contains helper functions for identifying univariate and multivariate outliers, and for assessing normality and homogeneity of variances.

Maintained by Alboukadel Kassambara. Last updated 2 years ago.

458 stars 15.27 score 11k scripts 432 dependents

bbolker

broom.mixed:Tidying Methods for Mixed Models

Convert fitted objects from various R mixed-model packages into tidy data frames along the lines of the 'broom' package. The package provides three S3 generics for each model: tidy(), which summarizes a model's statistical findings such as coefficients of a regression; augment(), which adds columns to the original data such as predictions, residuals and cluster assignments; and glance(), which provides a one-row summary of model-level statistics.

Maintained by Ben Bolker. Last updated 7 days ago.

230 stars 15.22 score 4.0k scripts 37 dependents

sparklyr

sparklyr:R Interface to Apache Spark

R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.

Maintained by Edgar Ruiz. Last updated 13 days ago.

apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr

959 stars 15.20 score 4.0k scripts 21 dependents

ropensci

targets:Dynamic Function-Oriented 'Make'-Like Declarative Pipelines

Pipeline tools coordinate the pieces of computationally demanding analysis projects. The 'targets' package is a 'Make'-like pipeline tool for statistics and data science in R. The package skips costly runtime for tasks that are already up to date, orchestrates the necessary computation with implicit parallel computing, and abstracts files as R objects. If all the current output matches the current upstream code and data, then the whole pipeline is up to date, and the results are more trustworthy than otherwise. The methodology in this package borrows from GNU 'Make' (2015, ISBN:978-9881443519) and 'drake' (2018, <doi:10.21105/joss.00550>).

Maintained by William Michael Landau. Last updated 3 days ago.

data-science high-performance-computing make peer-reviewed pipeline r-targetopia reproducibility reproducible-research targets workflow

978 stars 15.16 score 4.6k scripts 22 dependents

bioc

AnnotationDbi:Manipulation of SQLite-based annotations in Bioconductor

Implements a user-friendly interface for querying SQLite-based annotation data packages.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

annotation microarray sequencing genomeannotation bioconductor-package core-package

9 stars 15.05 score 3.6k scripts 769 dependents

larmarange

labelled:Manipulating Labelled Data

Work with labelled data imported from 'SPSS' or 'Stata' with 'haven' or 'foreign'. This package provides useful functions to deal with "haven_labelled" and "haven_labelled_spss" classes introduced by 'haven' package.

Maintained by Joseph Larmarange. Last updated 1 month ago.

haven labels metadata sas spss stata

76 stars 15.04 score 2.4k scripts 98 dependents

hojsgaard

doBy:Groupwise Statistics, LSmeans, Linear Estimates, Utilities

Utility package containing: 1) Facilities for working with grouped data: 'do' something to data stratified 'by' some variables. 2) LSmeans (least-squares means), general linear estimates. 3) Restrict functions to a smaller domain. 4) Miscellaneous other utilities.

Maintained by Søren Højsgaard. Last updated 20 hours ago.

1 star 14.99 score 3.2k scripts 948 dependents

bioc

DOSE:Disease Ontology Semantic and Enrichment analysis

This package implements five methods proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively for measuring semantic similarities among DO terms and gene products. Enrichment analyses including hypergeometric model and gene set enrichment analysis are also implemented for discovering disease associations of high-throughput biological data.

Maintained by Guangchuang Yu. Last updated 5 months ago.

annotation visualization multiplecomparison genesetenrichment pathways software disease-ontology enrichment-analysis semantic-similarity

119 stars 14.97 score 2.0k scripts 61 dependents

tidyverse

googledrive:An Interface to Google Drive

Manage Google Drive files from R.

Maintained by Jennifer Bryan. Last updated 8 months ago.

google-drive

329 stars 14.97 score 2.1k scripts 164 dependents

mjskay

ggdist:Visualizations of Distributions and Uncertainty

Provides primitives for visualizing distributions using 'ggplot2' that are particularly tuned for visualizing uncertainty in either a frequentist or Bayesian mode. Both analytical distributions (such as frequentist confidence distributions or Bayesian priors) and distributions represented as samples (such as bootstrap distributions or Bayesian posterior samples) are easily visualized. Visualization primitives include but are not limited to: points with multiple uncertainty intervals, eye plots (Spiegelhalter D., 1999) <https://ideas.repec.org/a/bla/jorssa/v162y1999i1p45-58.html>, density plots, gradient plots, dot plots (Wilkinson L., 1999) <doi:10.1080/00031305.1999.10474474>, quantile dot plots (Kay M., Kola T., Hullman J., Munson S., 2016) <doi:10.1145/2858036.2858558>, complementary cumulative distribution function barplots (Fernandes M., Walls L., Munson S., Hullman J., Kay M., 2018) <doi:10.1145/3173574.3173718>, and fit curves with multiple uncertainty ribbons.

Maintained by Matthew Kay. Last updated 4 months ago.

ggplot2 uncertainty uncertainty-visualization visualization cpp

859 stars 14.95 score 3.1k scripts 62 dependents

guido-s

meta:General Package for Meta-Analysis

User-friendly general package providing standard methods for meta-analysis and supporting Schwarzer, Carpenter, and Rücker <DOI:10.1007/978-3-319-21416-0>, "Meta-Analysis with R" (2015): - common effect and random effects meta-analysis; - several plots (forest, funnel, Galbraith / radial, L'Abbe, Baujat, bubble); - three-level meta-analysis model; - generalised linear mixed model; - logistic regression with penalised likelihood for rare events; - Hartung-Knapp method for random effects model; - Kenward-Roger method for random effects model; - prediction interval; - statistical tests for funnel plot asymmetry; - trim-and-fill method to evaluate bias in meta-analysis; - meta-regression; - cumulative meta-analysis and leave-one-out meta-analysis; - import data from 'RevMan 5'; - produce forest plot summarising several (subgroup) meta-analyses.

Maintained by Guido Schwarzer. Last updated 3 days ago.

meta-analysis rstudio

89 stars 14.95 score 2.3k scripts 30 dependents

bioc

MultiAssayExperiment:Software for the integration of multi-omics experiments in Bioconductor

Harmonize data management of multiple experimental assays performed on an overlapping set of specimens. It provides a familiar Bioconductor user experience by extending concepts from SummarizedExperiment, supporting an open-ended mix of standard data classes for individual assays, and allowing subsetting by genomic ranges or rownames. Facilities are provided for reshaping data into wide and long formats for adaptability to graphing and downstream analysis.

Maintained by Marcel Ramos. Last updated 2 months ago.

infrastructure datarepresentation bioconductor bioconductor-package genomics nci-itcr tcga u24ca289073

71 stars 14.94 score 670 scripts 126 dependents

philchalmers

mirt:Multidimensional Item Response Theory

Analysis of discrete response data using unidimensional and multidimensional item analysis models under the Item Response Theory paradigm (Chalmers (2012) <doi:10.18637/jss.v048.i06>). Exploratory and confirmatory item factor analysis models are estimated with quadrature (EM) or stochastic (MHRM) methods. Confirmatory bi-factor and two-tier models are available for modeling item testlets using dimension reduction EM algorithms, while multiple group analyses and mixed effects designs are included for detecting differential item, bundle, and test functioning, and for modeling item and person covariates. Finally, latent class models such as the DINA, DINO, multidimensional latent class, mixture IRT models, and zero-inflated response models are supported, as well as a wide family of probabilistic unfolding models.

Maintained by Phil Chalmers. Last updated 4 days ago.

irt mirt openblas cpp openmp

212 stars 14.93 score 2.5k scripts 40 dependents

tidymodels

hardhat:Construct Modeling Packages

Building modeling packages is hard. A large amount of effort generally goes into providing an implementation for a new method that is efficient, fast, and correct, but often less emphasis is put on the user interface. A good interface requires specialized knowledge about S3 methods and formulas, which the average package developer might not have. The goal of 'hardhat' is to reduce the burden around building new modeling packages by providing functionality for preprocessing, predicting, and validating input.

Maintained by Hannah Frick. Last updated 2 months ago.

103 stars 14.88 score 175 scripts 436 dependents

cynkra

dm:Relational Data Models

Provides tools for working with multiple related tables, stored as data frames or in a relational database. Multiple tables (data and metadata) are stored in a compound object, which can then be manipulated with a pipe-friendly syntax.

Maintained by Kirill Müller. Last updated 3 months ago.

data-model data-warehousing datawarehousing dbi dbplyr relational-databases

511 stars 14.81 score 410 scripts 8 dependents

r-dbi

RPostgres:C++ Interface to PostgreSQL

Fully DBI-compliant C++-backed interface to PostgreSQL <https://www.postgresql.org/>, an open-source relational database.

Maintained by Kirill Müller. Last updated 1 month ago.

database postgres postgresql cpp

338 stars 14.78 score 1.6k scripts 31 dependents

bioc

GSVA:Gene Set Variation Analysis for Microarray and RNA-Seq Data

Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment through the samples of an expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNV-pathway analysis or cross-tissue pathway analysis, in a pathway-centric manner.

Maintained by Robert Castelo. Last updated 10 days ago.

functionalgenomics microarray rnaseq pathways genesetenrichment gene-set-enrichment genomics pathway-enrichment-analysis

212 stars 14.74 score 1.6k scripts 19 dependents

florianhartig

DHARMa:Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models

The 'DHARMa' package uses a simulation-based approach to create readily interpretable scaled (quantile) residuals for fitted (generalized) linear mixed models. Currently supported are linear and generalized linear (mixed) models from 'lme4' (classes 'lmerMod', 'glmerMod'), 'glmmTMB', 'GLMMadaptive', and 'spaMM'; phylogenetic linear models from 'phylolm' (classes 'phylolm' and 'phyloglm'); generalized additive models ('gam' from 'mgcv'); 'glm' (including 'negbin' from 'MASS', but excluding quasi-distributions) and 'lm' model classes. Moreover, externally created simulations, e.g. posterior predictive simulations from Bayesian software such as 'JAGS', 'STAN', or 'BUGS' can be processed as well. The resulting residuals are standardized to values between 0 and 1 and can be interpreted as intuitively as residuals from a linear regression. The package also provides a number of plot and test functions for typical model misspecification problems, such as over/underdispersion, zero-inflation, and residual spatial, phylogenetic and temporal autocorrelation.

Maintained by Florian Hartig. Last updated 27 days ago.

glmm regression regression-diagnostics residual

226 stars 14.74 score 2.8k scripts 10 dependents

thomasp85

tidygraph:A Tidy API for Graph Manipulation

A graph, while not "tidy" in itself, can be thought of as two tidy data frames describing node and edge data respectively. 'tidygraph' provides an approach to manipulate these two virtual data frames using the API defined in the 'dplyr' package, and also provides tidy interfaces to a lot of common graph algorithms.

Maintained by Thomas Lin Pedersen. Last updated 2 months ago.

graph-algorithms graph-manipulation igraph network-analysis tidyverse cpp

553 stars 14.74 score 4.6k scripts 136 dependents

mjskay

tidybayes:Tidy Data and 'Geoms' for Bayesian Models

Compose data for and extract, manipulate, and visualize posterior draws from Bayesian models ('JAGS', 'Stan', 'rstanarm', 'brms', 'MCMCglmm', 'coda', ...) in a tidy data format. Functions are provided to help extract tidy data frames of draws from Bayesian models and that generate point summaries and intervals in a tidy format. In addition, 'ggplot2' 'geoms' and 'stats' are provided for common visualization primitives like points with multiple uncertainty intervals, eye plots (intervals plus densities), and fit curves with multiple, arbitrary uncertainty bands.

Maintained by Matthew Kay. Last updated 7 months ago.

bayesian-data-analysis brms ggplot2 jags stan tidy-data visualization

733 stars 14.72 score 7.3k scripts 20 dependents

husson

FactoMineR:Multivariate Exploratory Data Analysis and Data Mining

Exploratory data analysis methods to summarize, visualize and describe datasets. The main principal component methods are available, those with the largest potential in terms of applications: principal component analysis (PCA) when variables are quantitative, correspondence analysis (CA) and multiple correspondence analysis (MCA) when variables are categorical, Multiple Factor Analysis when variables are structured in groups, etc. and hierarchical cluster analysis. F. Husson, S. Le and J. Pages (2017).

Maintained by Francois Husson. Last updated 4 months ago.

47 stars 14.71 score 5.6k scripts 112 dependents

dcomtois

summarytools:Tools to Quickly and Neatly Summarize Data

Data frame summaries, cross-tabulations, weight-enabled frequency tables and common descriptive (univariate) statistics in concise tables available in a variety of formats (plain ASCII, Markdown and HTML). A good point-of-entry for exploring data, both for experienced and new R users.

Maintained by Dominic Comtois. Last updated 5 days ago.

descriptive-statistics frequency-table html-report markdown pander pandoc pandoc-markdown rmarkdown rstudio

527 stars 14.62 score 2.9k scripts 6 dependents

sinhrks

ggfortify:Data Visualization Tools for Statistical Analysis Results

Unified plotting tools for commonly used statistics, such as GLM, time series, PCA families, clustering and survival analysis. The package offers a single plotting interface for these analysis results and plots them in a unified style using 'ggplot2'.

Maintained by Yuan Tang. Last updated 9 months ago.

528 stars 14.60 score 9.1k scripts 24 dependents

tidyverse

googlesheets4:Access Google Sheets using the Sheets API V4

Interact with Google Sheets through the Sheets API v4 <https://developers.google.com/sheets/api>. "API" is an acronym for "application programming interface"; the Sheets API allows users to interact with Google Sheets programmatically, instead of via a web browser. The "v4" refers to the fact that the Sheets API is currently at version 4. This package can read and write both the metadata and the cell data in a Sheet.

Maintained by Jennifer Bryan. Last updated 8 months ago.

google-drive google-sheets spreadsheet

363 stars 14.55 score 7.0k scripts 142 dependents

r-lib

clock:Date-Time Types and Tools

Provides a comprehensive library for date-time manipulations using a new family of orthogonal date-time classes (durations, time points, zoned-times, and calendars) that partition responsibilities so that the complexities of time zones are only considered when they are really needed. Capabilities include: date-time parsing, formatting, arithmetic, extraction and updating of components, and rounding.

Maintained by Davis Vaughan. Last updated 14 days ago.

cpp

106 stars 14.53 score 296 scripts 407 dependents

hojsgaard

pbkrtest:Parametric Bootstrap, Kenward-Roger and Satterthwaite Based Methods for Test in Mixed Models

Computes p-values based on (a) Satterthwaite or Kenward-Roger degrees-of-freedom methods and (b) parametric bootstrap for mixed effects models as implemented in the 'lme4' package. Implements a parametric bootstrap test for generalized linear mixed models as implemented in 'lme4' and for generalized linear models. The package is documented in the paper by Halekoh and Højsgaard (2012, <doi:10.18637/jss.v059.i09>). Please see 'citation("pbkrtest")' for citation details.

Maintained by Søren Højsgaard. Last updated 15 hours ago.

6 stars 14.53 score 648 scripts 929 dependents

ropensci

osmdata:Import 'OpenStreetMap' Data as Simple Features or Spatial Objects

Download and import of 'OpenStreetMap' ('OSM') data as 'sf' or 'sp' objects. 'OSM' data are extracted from the 'Overpass' web server (<https://overpass-api.de/>) and processed with very fast 'C++' routines for return to 'R'.

Maintained by Mark Padgham. Last updated 2 months ago.

open0street0map openstreetmap overpass0api osm cpp osm-data overpass-api peer-reviewed cpp

322 stars 14.53 score 2.8k scripts 14 dependents

jacob-long

jtools:Analysis and Presentation of Social Scientific Data

This is a collection of tools for more efficiently understanding and sharing the results of (primarily) regression analyses. There are also a number of miscellaneous functions for statistical and programming purposes. Support for models produced by the survey and lme4 packages is a point of emphasis.

Maintained by Jacob A. Long. Last updated 7 months ago.

social-sciences

167 stars 14.48 score 4.0k scripts 14 dependents

tidyverts

tsibble:Tidy Temporal Data Frames and Tools

Provides a 'tbl_ts' class (the 'tsibble') for temporal data in a data- and model-oriented format. The 'tsibble' provides tools to easily manipulate and analyse temporal data, such as filling in time gaps and aggregating over calendar periods.

Maintained by Earo Wang. Last updated 2 months ago.

536 stars 14.48 score 4.4k scripts 42 dependents

bioc

TCGAbiolinks:TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data

The aims of TCGAbiolinks are to: i) facilitate GDC open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses, and iv) easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.

Maintained by Tiago Chedraoui Silva. Last updated 1 month ago.

dnamethylation differentialmethylation generegulation geneexpression methylationarray differentialexpression pathways network sequencing survival software bioc bioconductor gdc integrative-analysis tcga tcga-data tcgabiolinks

310 stars 14.47 score 1.6k scripts 6 dependents

indrajeetpatil

ggstatsplot:'ggplot2' Based Plots with Statistical Details

Extension of 'ggplot2', 'ggstatsplot' creates graphics with details from statistical tests included in the plots themselves. It provides an easier syntax to generate information-rich plots for statistical analysis of continuous (violin plots, scatterplots, histograms, dot plots, dot-and-whisker plots) or categorical (pie and bar charts) data. Currently, it supports the most common types of statistical approaches and tests: parametric, nonparametric, robust, and Bayesian versions of t-test/ANOVA, correlation analyses, contingency table analysis, meta-analysis, and regression analyses. References: Patil (2021) <doi:10.21105/joss.03236>.

Maintained by Indrajeet Patil. Last updated 1 month ago.

bayes-factors datascience dataviz effect-size ggplot-extension hypothesis-testing non-parametric-statistics regression-models statistical-analysis

2.1k stars 14.46 score 3.0k scripts 1 dependent

statistikat

VIM:Visualization and Imputation of Missing Values

New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on the structure of the missing values, the corresponding methods may help to identify the mechanism generating the missing values and allow exploration of the data including missing values. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface available in the separate package VIMGUI allows easy handling of the implemented plot methods.

Maintained by Matthias Templ. Last updated 8 months ago.

hotdeck imputation-methods model-predictions visualization cpp

85 stars 14.44 score 2.6k scripts 19 dependents

singmann

afex:Analysis of Factorial Experiments

Convenience functions for analyzing factorial experiments using ANOVA or mixed models. aov_ez(), aov_car(), and aov_4() allow specification of between, within (i.e., repeated-measures), or mixed (i.e., split-plot) ANOVAs for data in long format (i.e., one observation per row), automatically aggregating multiple observations per individual and cell of the design. mixed() fits mixed models using lme4::lmer() and computes p-values for all fixed effects using either Kenward-Roger or Satterthwaite approximation for degrees of freedom (LMM only), parametric bootstrap (LMMs and GLMMs), or likelihood ratio tests (LMMs and GLMMs). afex_plot() provides a high-level interface for interaction or one-way plots using ggplot2, combining raw data and model estimates. afex uses type 3 sums of squares as default (imitating commercial statistical software).

Maintained by Henrik Singmann. Last updated 7 months ago.

124 stars 14.43 score 1.4k scripts 15 dependents

davidgohel

ggiraph:Make 'ggplot2' Graphics Interactive

Create interactive 'ggplot2' graphics using 'htmlwidgets'.

Maintained by David Gohel. Last updated 3 days ago.

libpng cpp

822 stars 14.37 score 4.1k scripts 35 dependents

tidymodels

dials:Tools for Creating Tuning Parameter Values

Many models contain tuning parameters (i.e. parameters that cannot be directly estimated from the data). These tools can be used to define objects for creating, simulating, or validating values for such parameters.

Maintained by Hannah Frick. Last updated 2 months ago.

114 stars 14.31 score 426 scripts 52 dependents

bioc

xcms:LC-MS and GC-MS Data Analysis

Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.

Maintained by Steffen Neumann. Last updated 17 days ago.

immunooncology massspectrometry metabolomics bioconductor feature-detection mass-spectrometry peak-detection cpp

196 stars 14.31 score 984 scripts 11 dependents

tidymodels

tune:Tidy Tuning Tools

The ability to tune models is important. 'tune' contains functions and classes to be used in conjunction with other 'tidymodels' packages for finding reasonable values of hyper-parameters in models, pre-processing methods, and post-processing steps.

Maintained by Max Kuhn. Last updated 27 days ago.

293 stars 14.27 score 756 scripts 39 dependents

talgalili

heatmaply:Interactive Cluster Heat Maps Using 'plotly' and 'ggplot2'

Create interactive cluster 'heatmaps' that can be saved as a stand-alone HTML file, embedded in 'R Markdown' documents or in a 'Shiny' app, and available in the 'RStudio' viewer pane. Hover the mouse pointer over a cell to show details or drag a rectangle to zoom. A 'heatmap' is a popular graphical method for visualizing high-dimensional data, in which a table of numbers is encoded as a grid of colored cells. The rows and columns of the matrix are ordered to highlight patterns and are often accompanied by 'dendrograms'. 'Heatmaps' are used in many fields for visualizing observations, correlations, missing values patterns, and more. Interactive 'heatmaps' allow the inspection of a specific value by hovering the mouse over a cell, as well as zooming into a region of the 'heatmap' by dragging a rectangle around the relevant area. This work is based on the 'ggplot2' and 'plotly.js' engine. It produces 'heatmaps' similar to those of 'heatmap.2', with the advantages of speed ('plotly.js' is able to handle larger matrices), the ability to zoom from the 'dendrogram' panes, and the placing of factor variables in the sides of the 'heatmap'.

Maintained by Tal Galili. Last updated 9 months ago.

d3-heatmap dendextend dendrogram ggplot2 heatmap plotly

386 stars 14.21 score 2.0k scripts 45 dependents

business-science

timetk:A Tool Kit for Working with Time Series

Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.

Maintained by Matt Dancho. Last updated 1 year ago.

coercion coercion-functions data-mining dplyr forecast forecasting forecasting-models machine-learning series-decomposition series-signature tibble tidy tidyquant tidyverse time time-series timeseries

626 stars 14.20 score 4.0k scripts 16 dependents

eliocamp

ggnewscale:Multiple Fill and Colour Scales in 'ggplot2'

Use multiple fill and colour scales in 'ggplot2'.

Maintained by Elio Campitelli. Last updated 1 month ago.

ggplot2

414 stars 14.18 score 4.9k scripts 136 dependents

rstudio

pins:Pin, Discover, and Share Resources

Publish data sets, models, and other R objects, making it easy to share them across projects and with your colleagues. You can pin objects to a variety of "boards", including local folders (to share on a networked drive or with 'DropBox'), 'Posit Connect', 'AWS S3', and more.

Maintained by Julia Silge. Last updated 2 months ago.

azure gcloud rpins rsconnect s3 storage

321 stars 14.17 score 1.9k scripts 17 dependents

dkahle

ggmap:Spatial Visualization with ggplot2

A collection of functions to visualize spatial data and models on top of static maps from various online sources (e.g. Google Maps and Stamen Maps). It includes tools common to those tasks, including functions for geolocation and routing.

Maintained by David Kahle. Last updated 1 year ago.

770 stars 14.17 score 12k scripts 31 dependents

doi-usgs

dataRetrieval:Retrieval Functions for USGS and EPA Hydrology and Water Quality Data

Collection of functions to help retrieve U.S. Geological Survey and U.S. Environmental Protection Agency water quality and hydrology data from web services. Data are discovered from National Water Information System <https://waterservices.usgs.gov/> and <https://waterdata.usgs.gov/nwis>. Water quality data are obtained from the Water Quality Portal <https://www.waterqualitydata.us/>.

Maintained by Laura DeCicco. Last updated 5 days ago.

usgs

286 stars 14.16 score 1.7k scripts 15 dependents

corybrunson

ggalluvial:Alluvial Plots in 'ggplot2'

Alluvial plots use variable-width ribbons and stacked bar plots to represent multi-dimensional or repeated-measures data with categorical or ordinal variables; see Riehmann, Hanfler, and Froehlich (2005) <doi:10.1109/INFVIS.2005.1532152> and Rosvall and Bergstrom (2010) <doi:10.1371/journal.pone.0008694>. Alluvial plots are statistical graphics in the sense of Wilkinson (2006) <doi:10.1007/0-387-28695-0>; they share elements with Sankey diagrams and parallel sets plots but are uniquely determined from the data and a small set of parameters. This package extends Wickham's (2010) <doi:10.1198/jcgs.2009.07098> layered grammar of graphics to generate alluvial plots from tidy data.

Maintained by Jason Cory Brunson. Last updated 8 months ago.

alluvial-diagrams alluvial-plots categorical-data-visualization ggplot2 repeated-measures-data

507 stars 14.14 score 3.0k scripts 21 dependents

kassambara

factoextra:Extract and Visualize the Results of Multivariate Data Analyses

Provides some easy-to-use functions to extract and visualize the output of multivariate data analyses, including 'PCA' (Principal Component Analysis), 'CA' (Correspondence Analysis), 'MCA' (Multiple Correspondence Analysis), 'FAMD' (Factor Analysis of Mixed Data), 'MFA' (Multiple Factor Analysis) and 'HMFA' (Hierarchical Multiple Factor Analysis) functions from different R packages. It also contains functions for simplifying some clustering analysis steps and provides elegant 'ggplot2'-based data visualization.

Maintained by Alboukadel Kassambara. Last updated 5 years ago.

363 stars 14.13 score 15k scripts 52 dependents

bioc

GOSemSim:GO-terms Semantic Similarity Measures

The semantic comparisons of Gene Ontology (GO) annotations provide quantitative ways to compute similarities between genes and gene groups, and have become an important basis for many bioinformatics analysis approaches. GOSemSim is an R package for semantic similarity computation among GO terms, sets of GO terms, gene products and gene clusters. GOSemSim implements five methods, proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively.

Maintained by Guangchuang Yu. Last updated 5 months ago.

annotation go clustering pathways network software bioinformatics gene-ontology semantic-similarity cpp

63 stars 14.12 score 708 scripts 68 dependents

bioc

ensembldb:Utilities to create and use Ensembl-based annotation databases

The package provides functions to create and use transcript-centric annotation databases/packages. The annotation for the databases is fetched directly from Ensembl using their Perl API. The functionality and data are similar to those of the TxDb packages from the GenomicFeatures package, but, in addition to retrieving all gene/transcript models and annotations from the database, ensembldb provides a filter framework that allows retrieving annotations for specific entries like genes encoded on a chromosome region or transcript models of lincRNA genes. EnsDb databases built with ensembldb also contain protein annotations and mappings between proteins and their encoding transcripts. Finally, ensembldb provides functions to map between genomic, transcript and protein coordinates.

Maintained by Johannes Rainer. Last updated 5 months ago.

genetics annotationdata sequencing coverage annotation bioconductor bioconductor-packages ensembl

35 stars 14.08 score 892 scripts 108 dependents

bioc

qvalue:Q-value estimation for false discovery rate control

This package takes a list of p-values resulting from the simultaneous testing of many hypotheses and estimates their q-values and local FDR values. The q-value of a test measures the proportion of false positives incurred (called the false discovery rate) when that particular test is called significant. The local FDR measures the posterior probability the null hypothesis is true given the test's p-value. Various plots are automatically generated, allowing one to make sensible significance cut-offs. Several mathematical results have recently been shown on the conservative accuracy of the estimated q-values from this software. The software can be applied to problems in genomics, brain imaging, astrophysics, and data mining.

Maintained by John D. Storey. Last updated 5 months ago.

multiplecomparisons

116 stars 14.07 score 3.0k scripts 139 dependents
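
The q-value idea described above can be illustrated with a short sketch. This is a simplified Python illustration, not the 'qvalue' package's code: the function name `q_values` is hypothetical, and the proportion of true null hypotheses `pi0` is passed in as a fixed assumption rather than estimated from the p-values. With `pi0 = 1` the result coincides with Benjamini-Hochberg adjusted p-values.

```python
def q_values(p_values, pi0=1.0):
    """Estimate q-values from p-values for a fixed, assumed pi0.

    For the test with the k-th smallest p-value, the estimated false
    discovery rate when calling it significant is pi0 * m * p / k.
    The q-value is the smallest such estimate over all thresholds
    at least as large as that p-value.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum
    # enforces monotonicity of the q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q

q = q_values([0.01, 0.04, 0.03, 0.5])
```

Calling every test with a q-value at or below 0.05 significant would then be expected to give roughly 5% false positives among the calls, under the assumed `pi0`.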

teunbrand

ggh4x:Hacks for 'ggplot2'

A 'ggplot2' extension that does a variety of little helpful things. The package extends 'ggplot2' facets through customisation, by setting individual scales per panel, resizing panels and providing nested facets. Also allows multiple colour and fill scales per plot. Also hosts a smaller collection of stats, geoms and axis guides.

Maintained by Teun van den Brand. Last updated 12 days ago.

ggplot-extension ggplot2

617 stars 14.06 score 4.4k scripts 21 dependents

walkerke

tidycensus:Load US Census Boundary and Attribute Data as 'tidyverse' and 'sf'-Ready Data Frames

An integrated R interface to several United States Census Bureau APIs (<https://www.census.gov/data/developers/data-sets.html>) and the US Census Bureau's geographic boundary files. Allows R users to return Census and ACS data as tidyverse-ready data frames, and optionally returns a list-column with feature geometry for mapping and spatial analysis.

Maintained by Kyle Walker. Last updated 2 months ago.

648 stars 14.02 score 7.5k scripts 10 dependents

r-lib

slider:Sliding Window Functions

Provides type-stable rolling window functions over any R data type. Cumulative and expanding windows are also supported. For more advanced usage, an index can be used as a secondary vector that defines how sliding windows are to be created.

Maintained by Davis Vaughan. Last updated 2 months ago.

302 stars 13.99 score 848 scripts 99 dependents
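
A sliding window function of the kind described above can be sketched in a few lines. This is a language-neutral illustration in Python, not the 'slider' API: the function `slide` and its parameters `before`, `after` and `complete` are names chosen for this example.

```python
def slide(values, fn, before=0, after=0, complete=False):
    """Apply fn to a window around each element of values.

    The window for position i covers i - before .. i + after.
    With complete=True, positions whose window would run past
    either end of the input yield None instead of a partial result.
    """
    n = len(values)
    out = []
    for i in range(n):
        start, stop = i - before, i + after
        if complete and (start < 0 or stop >= n):
            out.append(None)
            continue
        window = values[max(start, 0):min(stop, n - 1) + 1]
        out.append(fn(window))
    return out

xs = [1, 2, 3, 4, 5]
rolling = slide(xs, sum, before=1)           # current and previous element
strict = slide(xs, sum, before=1, complete=True)
cumulative = slide(xs, sum, before=len(xs))  # expanding window
```

Setting `before` to the input length makes every window reach back to the first element, which gives the cumulative (expanding) behaviour mentioned in the description.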

tazinho

snakecase:Convert Strings into any Case

A consistent, flexible and easy to use tool to parse and convert strings into cases like snake or camel among others.

Maintained by Malte Grosser. Last updated 2 years ago.

camelcase case conversion pascalcase snake-case

148 stars 13.98 score 744 scripts 288 dependents
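
Case conversion of this kind usually works by first splitting a string into words and then rejoining them in the target style. The sketch below shows one way to do that in Python; it is not the 'snakecase' implementation, the helper names are hypothetical, and the real package may split edge cases (digits, abbreviations) differently.

```python
import re

def to_snake_case(text):
    """Split on case changes and non-alphanumerics, join with underscores."""
    # Boundary between a lowercase letter or digit and an uppercase letter.
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", text)
    # Boundary between an abbreviation and a following capitalised word.
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", s)
    # Any run of other characters becomes a single separator.
    s = re.sub(r"[^A-Za-z0-9]+", "_", s)
    return s.strip("_").lower()

def to_camel_case(text):
    """Go through snake case, then capitalise every word but the first."""
    parts = to_snake_case(text).split("_")
    return parts[0] + "".join(p.capitalize() for p in parts[1:])

snake = to_snake_case("HTTPResponse code-here")
camel = to_camel_case("some variable_name")
```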

pharmaverse

admiral:ADaM in R Asset Library

A toolbox for programming Clinical Data Interchange Standards Consortium (CDISC) compliant Analysis Data Model (ADaM) datasets in R. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide" (CDISC Analysis Data Model Team, 2021, <https://www.cdisc.org/standards/foundational/adam>).

Maintained by Ben Straub. Last updated 6 days ago.

cdisc clinical-trials open-source

239 stars 13.97 score 486 scripts 4 dependents

tidymodels

workflows:Modeling Workflows

Managing both a 'parsnip' model and a preprocessor, such as a model formula or recipe from 'recipes', can often be challenging. The goal of 'workflows' is to streamline this process by bundling the model alongside the preprocessor, all within the same object.

Maintained by Simon Couch. Last updated 1 month ago.

207 stars 13.97 score 876 scripts 43 dependents

hughjonesd

huxtable:Easily Create and Style Tables for LaTeX, HTML and Other Formats

Creates styled tables for data presentation. Export to HTML, LaTeX, RTF, 'Word', 'Excel', and 'PowerPoint'. Simple, modern interface to manipulate borders, size, position, captions, colours, text styles and number formatting. Table cells can span multiple rows and/or columns. Includes a 'huxreg' function for creation of regression tables, and 'quick_*' one-liners to print data to a new document.

Maintained by David Hugh-Jones. Last updated 27 days ago.

html huxtable latex microsoft-word powerpoint reproducible-research tables

323 stars 13.93 score 1.9k scripts 16 dependents

jbkunst

highcharter:A Wrapper for the 'Highcharts' Library

A wrapper for the 'Highcharts' library including shortcut functions to plot R objects. 'Highcharts' <https://www.highcharts.com/> is a charting library offering numerous chart types with a simple configuration syntax.

Maintained by Joshua Kunst. Last updated 1 year ago.

highcharts htmlwidgets shiny shiny-r visualization wrapper

725 stars 13.93 score 4.9k scripts 18 dependents

hrbrmstr

hrbrthemes:Additional Themes, Theme Components and Utilities for 'ggplot2'

A compilation of extra 'ggplot2' themes, scales and utilities, including a spell check function for plot label fields and an overall emphasis on typography. A copy of the 'Google' font 'Roboto Condensed' is also included.

Maintained by Bob Rudis. Last updated 17 days ago.

data-visualization datavisualization ggplot-extension ggplot2 ggplot2-scales ggplot2-themes visualization

1.3k stars 13.92 score 13k scripts 15 dependents

bioc

phyloseq:Handling and analysis of high-throughput microbiome census data

phyloseq provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data.

Maintained by Paul J. McMurdie. Last updated 5 months ago.

immunooncology sequencing microbiome metagenomics clustering classification multiplecomparison geneticvariability

600 stars 13.91 score 8.4k scripts 38 dependents

bioc

AnnotationHub:Client to access AnnotationHub resources

This package provides a client for the Bioconductor AnnotationHub web resource. The AnnotationHub web resource provides a central location where genomic files (e.g., VCF, bed, wig) and other resources from standard locations (e.g., UCSC, Ensembl) can be discovered. The resource includes metadata about each resource, e.g., a textual description, tags, and date of modification. The client creates and manages a local cache of files retrieved by the user, helping with quick and reproducible access.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

infrastructure dataimport gui thirdpartyclient core-package u24ca289073

17 stars 13.88 score 2.7k scripts 104 dependents

gergness

srvyr:'dplyr'-Like Syntax for Summary Statistics of Survey Data

Use piping, verbs like 'group_by' and 'summarize', and other 'dplyr' inspired syntactic style when calculating summary statistics on survey data using functions from the 'survey' package.

Maintained by Greg Freedman Ellis. Last updated 2 months ago.

survey

215 stars 13.88 score 1.8k scripts 15 dependents

biomodhub

biomod2:Ensemble Platform for Species Distribution Modeling

Functions for species distribution modeling, calibration and evaluation, ensemble of models, ensemble forecasting and visualization. The package can consistently run up to 10 single models on a presence/absence (or presence/pseudo-absence) dataset and combine them in ensemble models and ensemble projections. A number of other evaluation and visualisation tools are also available within the package.

Maintained by Maya Guéguen. Last updated 3 days ago.

95 stars 13.85 score 536 scripts 7 dependents

rsheets

cellranger:Translate Spreadsheet Cell Ranges to Rows and Columns

Helper functions to work with spreadsheets and the "A1:D10" style of cell range specification.

Maintained by Jennifer Bryan. Last updated 7 years ago.

51 stars 13.84 score 80 scripts 843 dependents
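
Translating an "A1:D10" style range into row and column limits comes down to reading column letters as a base-26 number. The following is a minimal Python sketch of that translation, not the 'cellranger' API: the function names are hypothetical, and only the plain two-corner form without sheet names or "$" anchors is handled.

```python
import re

def col_to_index(letters):
    """Convert column letters (A, B, ..., Z, AA, ...) to a 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index

def parse_a1_range(cell_range):
    """Return 1-based (first, last) row and column limits of a range."""
    m = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", cell_range)
    if m is None:
        raise ValueError(f"not a simple A1-style range: {cell_range!r}")
    c1, r1, c2, r2 = m.groups()
    return {
        "rows": (int(r1), int(r2)),
        "cols": (col_to_index(c1), col_to_index(c2)),
    }

limits = parse_a1_range("A1:D10")  # rows 1 to 10, columns 1 to 4
```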

liamrevell

phytools:Phylogenetic Tools for Comparative Biology (and Other Things)

A wide range of methods for phylogenetic analysis - concentrated in phylogenetic comparative biology, but also including numerous techniques for visualizing, analyzing, manipulating, reading or writing, and even inferring phylogenetic trees. Included among the functions in phylogenetic comparative biology are various methods for ancestral state reconstruction, model-fitting, and simulation of phylogenies and trait data. A broad range of plotting methods for phylogenies and comparative data include (but are not restricted to) methods for mapping trait evolution on trees, for projecting trees into phenotype space or onto a geographic map, and for visualizing correlated speciation between trees. Lastly, numerous functions are designed for reading, writing, analyzing, inferring, simulating, and manipulating phylogenetic trees and comparative data. For instance, there are functions for computing consensus phylogenies from a set, for simulating phylogenetic trees and data under a range of models, for randomly or non-randomly attaching species or clades to a tree, as well as for a wide range of other manipulations and analyses that phylogenetic biologists might find useful in their research.

Maintained by Liam J. Revell. Last updated 1 month ago.

220 stars 13.84 score 4.8k scripts 77 dependents

tidyverse

blob:A Simple S3 Class for Representing Vectors of Binary Data ('BLOBS')

R's raw vector is useful for storing a single binary object. What if you want to put a vector of them in a data frame? The 'blob' package provides the blob object, a list of raw vectors, suitable for use as a column in a data frame.

Maintained by Kirill Müller. Last updated 4 months ago.

database

45 stars 13.82 score 157 scripts 1.4k dependents

tidymodels

corrr:Correlations in R

A tool for exploring correlations. It makes it possible to easily perform routine tasks when exploring correlation matrices such as ignoring the diagonal, focusing on the correlations of certain variables against others, or rearranging and visualizing the matrix in terms of the strength of the correlations.

Maintained by Max Kuhn. Last updated 1 year ago.

593 stars 13.82 score 2.9k scripts 7 dependents

bioc

BiocFileCache:Manage Files Across Sessions

This package creates a persistent on-disk cache of files that the user can add, update, and retrieve. It is useful for managing resources (such as custom Txdb objects) that are costly or difficult to create, web resources, and data files used across sessions.

Maintained by Lori Shepherd. Last updated 2 months ago.

dataimport core-package u24ca289073

13 stars 13.76 score 486 scripts 436 dependents

bioc

mixOmics:Omics Data Integration Project

Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc.) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non-exhaustive list of methods includes variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.

Maintained by Eva Hamrud. Last updated 3 days ago.

immunooncology microarray sequencing metabolomics metagenomics proteomics geneprediction multiplecomparison classification regression bioconductor genomics genomics-data genomics-visualization multivariate-analysis multivariate-statistics omics r-pkg r-project

185 stars 13.75 score 1.3k scripts 22 dependents

immunogenomics

harmony:Fast, Sensitive, and Accurate Integration of Single Cell Data

Implementation of the Harmony algorithm for single cell integration, described in Korsunsky et al <doi:10.1038/s41592-019-0619-0>. Package includes a standalone Harmony function and interfaces to external frameworks.

Maintained by Ilya Korsunsky. Last updated 5 months ago.

algorithm data-integration scrna-seq openblas cpp

554 stars 13.74 score 5.5k scripts 8 dependents

richarddmorey

BayesFactor:Computation of Bayes Factors for Common Designs

A suite of functions for computing various Bayes factors for simple designs, including contingency tables, one- and two-sample designs, one-way designs, general ANOVA designs, and linear regression.

Maintained by Richard D. Morey. Last updated 1 year ago.

cpp

132 stars 13.71 score 1.7k scripts 21 dependents

knausb

vcfR:Manipulate and Visualize VCF Data

Facilitates easy manipulation of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R, a parser function extracts matrices of data. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete, data may be written to a VCF file (*.vcf.gz). It also may be converted into other popular R objects (e.g., genlight, DNAbin). VcfR provides a link between VCF data and familiar R software.

Maintained by Brian J. Knaus. Last updated 1 month ago.

genomics population-genetics population-genomics rcpp vcf-data visualization zlib cpp

256 stars 13.66 score 3.1k scripts 19 dependents

aphalo

ggpmisc:Miscellaneous Extensions to 'ggplot2'

Extensions to 'ggplot2' respecting the grammar of graphics paradigm. Statistics: locate and tag peaks and valleys; label plot with the equation of a fitted polynomial or other types of models; labels with P-value, R^2 or adjusted R^2 or information criteria for fitted models; label with ANOVA table for fitted models; label with summary for fitted models. Model fit classes for which suitable methods are provided by package 'broom' and 'broom.mixed' are supported. Scales and stats to build volcano and quadrant plots based on outcomes, fold changes, p-values and false discovery rates.

Maintained by Pedro J. Aphalo. Last updated 2 days ago.

data-analysis dataviz ggplot2-annotations ggplot2-stats statistics

107 stars 13.64 score 4.4k scripts 14 dependents

ropensci

taxize:Taxonomic Information from Around the Web

Interacts with a suite of web application programming interfaces (API) for taxonomic tasks, such as getting database specific taxonomic identifiers, verifying species names, getting taxonomic hierarchies, fetching downstream and upstream taxonomic names, getting taxonomic synonyms, converting scientific to common names and vice versa, and more. Some of the services supported include 'NCBI E-utilities' (<https://www.ncbi.nlm.nih.gov/books/NBK25501/>), 'Encyclopedia of Life' (<https://eol.org/docs/what-is-eol/data-services>), 'Global Biodiversity Information Facility' (<https://techdocs.gbif.org/en/openapi/>), and many more. Links to the API documentation for other supported services are available in the documentation for their respective functions in this package.

Maintained by Zachary Foster. Last updated 27 days ago.

taxonomy biology nomenclature json api web api-client identifiers species names api-wrapper biodiversity darwincore data taxize

274 stars 13.63 score 1.6k scripts 23 dependents

rstudio

keras3:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.

Maintained by Tomasz Kalinowski. Last updated 11 days ago.

845 stars 13.63 score 264 scripts 2 dependents

christophergandrud

networkD3:D3 JavaScript Network Graphs from R

Creates 'D3' 'JavaScript' network, tree, dendrogram, and Sankey graphs from 'R'.

Maintained by Christopher Gandrud. Last updated 6 years ago.

d3js networks

654 stars 13.60 score 3.4k scripts 31 dependents

yulab-smu

scatterpie:Scatter Pie Plot

Creates scatterpie plots, especially useful for plotting pies on a map.

Maintained by Guangchuang Yu. Last updated 3 months ago.

62 stars 13.60 score 820 scripts 68 dependents

dieghernan

tidyterra:'tidyverse' Methods and 'ggplot2' Helpers for 'terra' Objects

Extension of the 'tidyverse' for 'SpatRaster' and 'SpatVector' objects of the 'terra' package. It also includes new 'geom_' functions that provide a convenient way of visualizing 'terra' objects with 'ggplot2'.

Maintained by Diego Hernangómez. Last updated 17 hours ago.

terra ggplot-extension r-spatial rspatial

190 stars 13.59 score 1.9k scripts 26 dependents

kaz-yos

tableone:Create 'Table 1' to Describe Baseline Characteristics with or without Propensity Score Weights

Creates 'Table 1', i.e., a description of baseline patient characteristics, which is essential in medical research. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. Weighted data are supported via the 'survey' package.

Maintained by Kazuki Yoshida. Last updated 3 years ago.

baseline-characteristics descriptive-statistics statistics

221 stars 13.55 score 2.3k scripts 12 dependents

andrie

ggdendro:Create Dendrograms and Tree Diagrams Using 'ggplot2'

This is a set of tools for dendrograms and tree plots using 'ggplot2'. The 'ggplot2' philosophy is to clearly separate data from the presentation. Unfortunately the plot method for dendrograms plots directly to a plot device without exposing the data. The 'ggdendro' package resolves this by making available functions that extract the dendrogram plot data. The package provides implementations for 'tree', 'rpart', as well as diana and agnes (from 'cluster') diagrams.

Maintained by Andrie de Vries. Last updated 4 months ago.

ggplot2

86 stars 13.54 score 3.9k scripts 62 dependents

mitchelloharawild

distributional:Vectorised Probability Distributions

Vectorised distribution objects with tools for manipulating, visualising, and using probability distributions. Designed to allow model prediction outputs to return distributions rather than their parameters, allowing users to directly interact with predictive distributions in a data-oriented workflow. In addition to providing generic replacements for p/d/q/r functions, other useful statistics can be computed including means, variances, intervals, and highest density regions.

Maintained by Mitchell OHara-Wild. Last updated 4 days ago.

probability-distribution statistics vctrs

100 stars 13.54 score 744 scripts 388 dependents

tidyverts

fable:Forecasting Models for Tidy Time Series

Provides a collection of commonly used univariate and multivariate time series forecasting models including automatically selected exponential smoothing (ETS) and autoregressive integrated moving average (ARIMA) models. These models work within the 'fable' framework provided by the 'fabletools' package, which provides the tools to evaluate, visualise, and combine models in a workflow consistent with the tidyverse.

Maintained by Mitchell OHara-Wild. Last updated 4 months ago.

forecasting cpp

569 stars 13.54 score 2.1k scripts 6 dependents

asgr

imager:Image Processing Library Based on 'CImg'

Fast image processing for images in up to 4 dimensions (two spatial dimensions, one time/depth dimension, one colour dimension). Provides most traditional image processing tools (filtering, morphology, transformations, etc.) as well as various functions for easily analysing image data using R. The package wraps 'CImg', <http://cimg.eu>, a simple, modern C++ library for image processing.

Maintained by Aaron Robotham. Last updated 7 days ago.

libx11 fftw3 tiff cpp openmp

17 stars 13.53 score 2.4k scripts 44 dependents

emilhvitfeldt

paletteer:Comprehensive Collection of Color Palettes

The choice of color palettes in R can be quite overwhelming, with palettes spread over many packages with many different APIs. This package aims to collect all color palettes across the R ecosystem under the same package with a streamlined API.

Maintained by Emil Hvitfeldt. Last updated 9 months ago.

color-palette palettes

964 stars 13.53 score 6.9k scripts 23 dependents

bioc

GEOquery:Get data from NCBI Gene Expression Omnibus (GEO)

The NCBI Gene Expression Omnibus (GEO) is a public repository of microarray data. Given the rich and varied nature of this resource, it is only natural to want to apply BioConductor tools to these data. GEOquery is the bridge between GEO and BioConductor.

Maintained by Sean Davis. Last updated 5 months ago.

microarray dataimport onechannel twochannel sage bioconductor bioinformatics data-science genomics ncbi-geo

93 stars 13.48 score 4.1k scripts 45 dependents

bioc

RCy3:Functions to Access and Control Cytoscape

Visualize, analyze and explore networks using Cytoscape via R. Anything you can do using the graphical user interface of Cytoscape, you can now do with a single RCy3 function.

Maintained by Alex Pico. Last updated 4 days ago.

visualization graphandnetwork thirdpartyclient network

52 stars 13.47 score 628 scripts 17 dependents

daattali

ggExtra:Add Marginal Histograms to 'ggplot2', and More 'ggplot2' Enhancements

Collection of functions and layers to enhance 'ggplot2'. The flagship function is 'ggMarginal()', which can be used to add marginal histograms/boxplots/density plots to 'ggplot2' scatterplots.

Maintained by Dean Attali. Last updated 10 months ago.

ggplot2 ggplot2-enhancements marginal-plots

387 stars 13.45 score 3.3k scripts 28 dependents

philchalmers

SimDesign:Structure for Organizing Monte Carlo Simulation Designs

Provides tools to safely and efficiently organize and execute Monte Carlo simulation experiments in R. The package controls the structure and back-end of Monte Carlo simulation experiments by utilizing a generate-analyse-summarise workflow. The workflow safeguards against common simulation coding issues: it automatically re-simulates non-convergent results, prevents inadvertent overwriting of simulation files, catches error and warning messages during execution, implicitly supports parallel processing with high-quality random number generation, and provides tools for managing high-performance computing (HPC) array jobs submitted to schedulers such as SLURM. For a pedagogical introduction to the package see Sigal and Chalmers (2016) <doi:10.1080/10691898.2016.1246953>. For a more in-depth overview of the package and its design philosophy see Chalmers and Adkins (2020) <doi:10.20982/tqmp.16.4.p248>.

Maintained by Phil Chalmers. Last updated 3 days ago.

monte-carlo-simulation simulation simulation-framework

62 stars 13.41 score 253 scripts 47 dependents

modeloriented

DALEX:moDel Agnostic Language for Exploration and eXplanation

Any unverified black box model is the path to failure. Opaqueness leads to distrust. Distrust leads to ignoration. Ignoration leads to rejection. The DALEX package x-rays any model and helps to explore and explain its behaviour. Machine Learning (ML) models are widely used and have various applications in classification or regression. Models created with boosting, bagging, stacking or similar techniques are often used due to their high performance. But such black-box models usually lack direct interpretability. The DALEX package contains various methods that help to understand the link between input variables and model output. Implemented methods help to explore the model at the level of a single instance as well as at the level of the whole dataset. All model explainers are model agnostic and can be compared across different models. The DALEX package is the cornerstone of the 'DrWhy.AI' universe of packages for visual model exploration. Find more details in (Biecek 2018) <https://jmlr.org/papers/v19/18-416.html>.

Maintained by Przemyslaw Biecek. Last updated 2 months ago.

black-box dalex data-science explainable-ai explainable-artificial-intelligence explainable-ml explanations explanatory-model-analysis fairness iml interpretability interpretable-machine-learning machine-learning model-visualization predictive-modeling responsible-ai responsible-ml xai

1.4k stars 13.40 score 876 scripts 21 dependents

vpetukhov

ggrastr:Rasterize Layers for 'ggplot2'

Rasterize only specific layers of a 'ggplot2' plot while simultaneously keeping all labels and text in vector format. This allows users to keep plots within a reasonable size limit without losing vector properties of the scale-sensitive information.

Maintained by Evan Biederstedt. Last updated 2 years ago.

220 stars 13.37 score 1.9k scripts 53 dependents

yulab-smu

tidytree:A Tidy Tool for Phylogenetic Tree Data Manipulation

A phylogenetic tree generally contains multiple components including nodes, edges, branches and associated data. 'tidytree' provides an approach to convert a tree object to a tidy data frame and provides tidy interfaces to manipulate tree data.

Maintained by Guangchuang Yu. Last updated 8 months ago.

phylogenetic-tree tidyverse tree-data

56 stars 13.36 score 584 scripts 128 dependents

business-science

tidyquant:Tidy Quantitative Financial Analysis

Bringing business and financial analysis to the 'tidyverse'. The 'tidyquant' package provides a convenient wrapper to various 'xts', 'zoo', 'quantmod', 'TTR' and 'PerformanceAnalytics' package functions and returns the objects in the tidy 'tibble' format. The main advantage is being able to use quantitative functions with the 'tidyverse' functions including 'purrr', 'dplyr', 'tidyr', 'ggplot2', 'lubridate', etc. See the 'tidyquant' website for more information, documentation and examples.

Maintained by Matt Dancho. Last updated 2 months ago.

dplyr financial-analysis financial-data financial-statements multiple-stocks performance-analysis performanceanalytics quantmod stock stock-exchanges stock-indexes stock-lists stock-performance stock-prices stock-symbol tidyverse time-series timeseries xts

872 stars 13.34 score 5.2k scripts

projectmosaic

mosaic:Project MOSAIC Statistics and Mathematics Teaching Utilities

Data sets and utilities from Project MOSAIC (<http://www.mosaic-web.org>) used to teach mathematics, statistics, computation and modeling. Funded by the NSF, Project MOSAIC is a community of educators working to tie together aspects of quantitative work that students in science, technology, engineering and mathematics will need in their professional lives, but which are usually taught in isolation, if at all.

Maintained by Randall Pruim. Last updated 1 year ago.

93 stars 13.32 score 7.2k scripts 7 dependents

ropensci

visdat:Preliminary Visualisation of Data

Create preliminary exploratory data visualisations of an entire dataset to identify problems or unexpected features using 'ggplot2'.

Maintained by Nicholas Tierney. Last updated 9 months ago.

exploratory-data-analysis missingness peer-reviewed ropensci visualisation

452 stars 13.31 score 2.1k scripts 11 dependents

chjackson

flexsurv:Flexible Parametric Survival and Multi-State Models

Flexible parametric models for time-to-event data, including the Royston-Parmar spline model, generalized gamma and generalized F distributions. Any user-defined parametric distribution can be fitted, given at least an R function defining the probability density or hazard. There are also tools for fitting and predicting from fully parametric multi-state models, based on either cause-specific hazards or mixture models.

Maintained by Christopher Jackson. Last updated 2 months ago.

cpp

57 stars 13.31 score 632 scripts 43 dependents

dreamrs

esquisse:Explore and Visualize Your Data Interactively

A 'shiny' gadget to create 'ggplot2' figures interactively with drag-and-drop to map your variables to different aesthetics. You can quickly visualize your data according to their type, export in various formats, and retrieve the code to reproduce the plot.

Maintained by Victor Perrier. Last updated 1 month ago.

addin data-visualization ggplot2 rstudio-addin visualization

1.8k stars 13.31 score 1.1k scripts 1 dependent

trafficonese

leaflet.extras:Extra Functionality for 'leaflet' Package

The 'leaflet' JavaScript library provides many plugins, some of which are available in the core 'leaflet' package, but there are many more. It is not possible to support them all in the core 'leaflet' package. This package serves as an add-on to the 'leaflet' package by providing extra functionality via 'leaflet' plugins.

Maintained by Sebastian Gatscha. Last updated 3 months ago.

data-visualization geospatial leaflet

218 stars 13.27 score 2.5k scripts 25 dependents

ropensci

rgbif:Interface to the Global Biodiversity Information Facility API

A programmatic interface to the Web Service methods provided by the Global Biodiversity Information Facility (GBIF; <https://www.gbif.org/developer/summary>). GBIF is a database of species occurrence records from sources all over the globe. rgbif includes functions for searching for taxonomic names, retrieving information on data providers, getting species occurrence records, getting counts of occurrence records, and using the GBIF tile map service to make rasters summarizing huge amounts of data.

Maintained by John Waller. Last updated 18 days ago.

gbif specimens api web-services occurrences species taxonomy biodiversity data lifewatch oscibio spocc

161 stars 13.26 score 2.1k scripts 20 dependents

guangchuangyu

ggplotify:Convert Plot to 'grob' or 'ggplot' Object

Convert a plot function call (using an expression or formula) to a 'grob' or 'ggplot' object that is compatible with the 'grid' and 'ggplot2' ecosystem. With this package, we are able to, e.g., use 'cowplot' to align plots produced by 'base' graphics, 'ComplexHeatmap', 'eulerr', 'grid', 'lattice', 'magick', 'pheatmap', 'vcd' etc. by converting them to 'ggplot' objects.

Maintained by Guangchuang Yu. Last updated 1 year ago.

baseplot ggplot2 grid lattice upsetr vcd

108 stars 13.23 score 2.0k scripts 174 dependents

easystats

see:Model Visualisation Toolbox for 'easystats' and 'ggplot2'

Provides plotting utilities supporting packages in the 'easystats' ecosystem (<https://github.com/easystats/easystats>) and some extra themes, geoms, and scales for 'ggplot2'. Color scales are based on <https://materialui.co/>. References: Lüdecke et al. (2021) <doi:10.21105/joss.03393>.

Maintained by Indrajeet Patil. Last updated 20 days ago.

data-visualization easystats ggplot2 hacktoberfest plotting see statistics visualisation visualization

902 stars 13.22 score 2.0k scripts 3 dependents

oscarkjell

text:Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning

Link R with Transformers from Hugging Face to transform text variables to word embeddings, where the word embeddings are used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words according to various dimensions etc. For more information see <https://www.r-text.org>.

Maintained by Oscar Kjell. Last updated 9 days ago.

deep-learning machine-learning nlp transformers openjdk

145 stars 13.21 score 436 scripts 1 dependent

wadpac

GGIR:Raw Accelerometer Data Analysis

A tool to process and analyse data collected with wearable raw acceleration sensors as described in Migueles and colleagues (JMPB 2019), and van Hees and colleagues (JApplPhysiol 2014; PLoSONE 2015). The package has been developed and tested for binary data from 'GENEActiv' <https://activinsights.com/>, binary (.gt3x) and .csv-export data from 'Actigraph' <https://theactigraph.com> devices, and binary (.cwa) and .csv-export data from 'Axivity' <https://axivity.com>. These devices are currently widely used in research on human daily physical activity. Further, the package can handle accelerometer data files from any other sensor brand, provided that the data is stored in csv format. The package also allows for external function embedding.

Maintained by Vincent T van Hees. Last updated 17 days ago.

accelerometer activity-recognition circadian-rhythm movement-sensor sleep

109 stars 13.20 score 342 scripts 3 dependents

bioc

dada2:Accurate, high-resolution sample inference from amplicon sequencing data

The dada2 package infers exact amplicon sequence variants (ASVs) from high-throughput amplicon sequencing data, replacing the coarser and less accurate OTU clustering approach. The dada2 pipeline takes as input demultiplexed fastq files, and outputs the sequence variants and their sample-wise abundances after removing substitution and chimera errors. Taxonomic classification is available via a native implementation of the RDP naive Bayesian classifier, and species-level assignment to 16S rRNA gene fragments by exact matching.

Maintained by Benjamin Callahan. Last updated 5 months ago.

immunooncology microbiome sequencing classification metagenomics amplicon bioconductor bioinformatics metabarcoding taxonomy cpp

487 stars 13.17 score 3.0k scripts 4 dependents

jacobkap

fastDummies:Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables

Creates dummy columns from columns that have categorical variables (character or factor types). You can also specify which columns to make dummies out of, or which columns to ignore. Also creates dummy rows from character, factor, and Date columns. This package provides a significant speed increase from creating dummy variables through model.matrix().

Maintained by Jacob Kaplan. Last updated 2 months ago.

binary-data dummy-columns dummy-data dummy-rows dummy-variable

38 stars 13.13 score 2.5k scripts 134 dependents

stan-dev

shinystan:Interactive Visual and Numerical Diagnostics and Posterior Analysis for Bayesian Models

A graphical user interface for interactive Markov chain Monte Carlo (MCMC) diagnostics and plots and tables helpful for analyzing a posterior sample. The interface is powered by the 'Shiny' web application framework from 'RStudio' and works with the output of MCMC programs written in any programming language (and has extended functionality for 'Stan' models fit using the 'rstan' and 'rstanarm' packages).

Maintained by Jonah Gabry. Last updated 3 years ago.

bayesian bayesian-data-analysis bayesian-inference bayesian-methods bayesian-statistics mcmc shiny-apps stan statistical-graphics

200 stars 13.13 score 1.6k scripts 15 dependents

runehaubo

lmerTest:Tests in Linear Mixed Effects Models

Provides p-values in type I, II or III anova and summary tables for lmer model fits (cf. lme4) via Satterthwaite's degrees of freedom method. A Kenward-Roger method is also available via the pbkrtest package. Model selection methods include step, drop1 and anova-like tables for random effects (ranova). Methods for Least-Square means (LS-means) and tests of linear contrasts of fixed effects are also available.

Maintained by Rune Haubo Bojesen Christensen. Last updated 4 years ago.

52 stars 13.09 score 13k scripts 91 dependents

larmarange

ggstats:Extension to 'ggplot2' for Plotting Stats

Provides new statistics, new geometries and new positions for 'ggplot2' and a suite of functions to facilitate the creation of statistical plots.

Maintained by Joseph Larmarange. Last updated 21 days ago.

37 stars 13.08 score 190 scripts 156 dependents

tagteam

riskRegression:Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks

Implementation of the following methods for event history analysis. Risk regression models for survival endpoints also in the presence of competing risks are fitted using binomial regression based on a time sequence of binary event status variables. A formula interface for the Fine-Gray regression model and an interface for the combination of cause-specific Cox regression models. A toolbox for assessing and comparing performance of risk predictions (risk markers and risk prediction models). Prediction performance is measured by the Brier score and the area under the ROC curve for binary possibly time-dependent outcome. Inverse probability of censoring weighting and pseudo values are used to deal with right censored data. Lists of risk markers and lists of risk models are assessed simultaneously. Cross-validation repeatedly splits the data, trains the risk prediction models on one part of each split and then summarizes and compares the performance across splits.

Maintained by Thomas Alexander Gerds. Last updated 1 month ago.

openblas cpp

47 stars 13.07 score 736 scripts 37 dependents