edgar:Tool for the U.S. SEC EDGAR Retrieval and Parsing of Corporate Filings
In the USA, companies file various forms with the U.S. Securities and Exchange Commission (SEC) through EDGAR (Electronic Data Gathering, Analysis, and Retrieval system). The automated EDGAR database system collects all the necessary filings and makes them publicly available. This package facilitates retrieving, storing, searching, and parsing all the available filings on the EDGAR server. It downloads filings from the SEC server in bulk with a single query. Additionally, it provides various useful functions: it extracts 8-K triggering events, extracts the "Business (Item 1)" and "Management's Discussion and Analysis (Item 7)" sections of annual statements, searches filings for desired keywords, provides sentiment measures, parses filing header information, and provides an HTML view of SEC filings.
Maintained by Gunratan Lonare. Last updated 9 days ago.
sparklyr:R Interface to Apache Spark
R interface to Apache Spark, a fast and general engine for big data processing. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.
Maintained by Edgar Ruiz. Last updated 10 days ago.
blavaan:Bayesian Latent Variable Analysis
Fit a variety of Bayesian latent variable models, including confirmatory factor analysis, structural equation models, and latent growth curve models. References: Merkle & Rosseel (2018) <doi:10.18637/jss.v085.i04>; Merkle et al. (2021) <doi:10.18637/jss.v100.i06>.
Maintained by Edgar Merkle. Last updated 5 days ago.
chattr:Interact with Large Language Models in 'RStudio'
Enables user interactivity with large-language models ('LLM') inside the 'RStudio' integrated development environment (IDE). The user can interact with the model using the 'shiny' app included in this package, or directly in the 'R' console. It comes with back-ends for 'OpenAI', 'GitHub' 'Copilot', and 'LlamaGPT'.
Maintained by Edgar Ruiz. Last updated 2 months ago.
nonnest2:Tests of Non-Nested Models
Testing non-nested models via theory supplied by Vuong (1989) <DOI:10.2307/1912557>. Includes tests of model distinguishability and of model fit that can be applied to both nested and non-nested models. Also includes functionality to obtain confidence intervals associated with AIC and BIC. This material is partially based on work supported by the National Science Foundation under Grant Number SES-1061334.
Maintained by Edgar Merkle. Last updated 7 months ago.
edgarWebR:SEC Filings Access
A set of methods to access and parse live filing information from the U.S. Securities and Exchange Commission (SEC), including company and fund filings along with all associated metadata.
Maintained by Micah J Waldstein. Last updated 4 years ago.
mall:Run Multiple Large Language Model Predictions Against a Table, or Vectors
Run multiple 'Large Language Model' predictions against a table. The predictions run row-wise over a specified column. It works using a one-shot prompt, along with the current row's content. The prompt that is used will depend on the type of analysis needed.
Maintained by Edgar Ruiz. Last updated 3 months ago.
connections:Integrates with the 'RStudio' Connections Pane and 'pins'
Enables 'DBI' compliant packages to integrate with the 'RStudio' connections pane, and the 'pins' package. It automates the display of schemata, tables, views, as well as the preview of the table's top 1000 records.
Maintained by Edgar Ruiz. Last updated 1 year ago.
sparklyr.flint:Sparklyr Extension for 'Flint'
This sparklyr extension makes 'Flint' time series library functionalities easily accessible through R.
Maintained by Edgar Ruiz. Last updated 3 years ago.
dbplot:Simplifies Plotting Data Inside Databases
Leverages 'dplyr' to process the calculations of a plot inside a database. This package provides helper functions that abstract the work at three levels: outputs a 'ggplot', outputs the calculations, outputs the formula needed to calculate bins.
Maintained by Edgar Ruiz. Last updated 5 years ago.
ggstatsplot:'ggplot2' Based Plots with Statistical Details
Extension of 'ggplot2', 'ggstatsplot' creates graphics with details from statistical tests included in the plots themselves. It provides an easier syntax to generate information-rich plots for statistical analysis of continuous (violin plots, scatterplots, histograms, dot plots, dot-and-whisker plots) or categorical (pie and bar charts) data. Currently, it supports the most common types of statistical approaches and tests: parametric, nonparametric, robust, and Bayesian versions of t-test/ANOVA, correlation analyses, contingency table analysis, meta-analysis, and regression analyses. References: Patil (2021) <doi:10.21105/joss.03236>.
Maintained by Indrajeet Patil. Last updated 20 days ago.
SSNbayes:Bayesian Spatio-Temporal Analysis in Stream Networks
Fits Bayesian spatio-temporal models and makes predictions on stream networks using the approach by Santos-Fernandez, Edgar, et al. (2022). "Bayesian spatio-temporal models for stream networks" and Santos-Fernandez, Edgar, et al. (2023). "SSNbayes: An R Package for Bayesian Spatio-Temporal Modelling on Stream Networks". In these models, spatial dependence is captured using stream distance and flow connectivity, while temporal autocorrelation is modelled using vector autoregression methods.
Maintained by Edgar Santos-Fernandez. Last updated 2 months ago.
pysparklyr:Provides a 'PySpark' Back-End for the 'sparklyr' Package
It enables 'sparklyr' to integrate with 'Spark Connect' and 'Databricks Connect' by providing a wrapper over the 'PySpark' 'python' library.
Maintained by Edgar Ruiz. Last updated 4 days ago.
datos:Translates Several Practice Datasets into Spanish
Provides a Spanish-translated version of the following datasets: 'airlines', 'airports', 'AwardsManagers', 'babynames', 'Batting', 'credit_data', 'diamonds', 'faithful', 'fueleconomy', 'Fielding', 'flights', 'gapminder', 'gss_cat', 'iris', 'Managers', 'mpg', 'mtcars', 'atmos', 'palmerpenguins', 'People', 'Pitching', 'planes', 'presidential', 'table1', 'table2', 'table3', 'table4a', 'table4b', 'table5', 'vehicles', 'weather', 'who'.
Maintained by Riva Quiroga. Last updated 1 year ago.
eixport:Export Emissions to Atmospheric Models
Emissions are the mass of pollutants released into the atmosphere. Air quality models need emissions data, with spatial and temporal distribution, to represent air pollutant concentrations. This package, eixport, creates inputs for the air quality models 'WRF-Chem' Grell et al (2005) <doi:10.1016/j.atmosenv.2005.04.027>, 'MUNICH' Kim et al (2018) <doi:10.5194/gmd-11-611-2018>, 'BRAMS-SPM' Freitas et al (2005) <doi:10.1016/j.atmosenv.2005.07.017> and 'RLINE' Snyder et al (2013) <doi:10.1016/j.atmosenv.2013.05.074>. See the 'eixport' website for more information, documentation and examples. More details in Ibarra-Espinosa et al (2018) <doi:10.21105/joss.00607>.
Maintained by Sergio Ibarra-Espinosa. Last updated 26 days ago.
statsExpressions:Tidy Dataframes and Expressions with Statistical Details
Utilities for producing dataframes with rich details for the most common types of statistical approaches and tests: parametric, nonparametric, robust, and Bayesian t-test, one-way ANOVA, correlation analyses, contingency table analyses, and meta-analyses. The functions are pipe-friendly and provide a consistent syntax to work with tidy data. These dataframes additionally contain expressions with statistical details, and can be used in graphing packages. This package also forms the statistical processing backend for 'ggstatsplot'. References: Patil (2021) <doi:10.21105/joss.03236>.
Maintained by Indrajeet Patil. Last updated 21 days ago.
sparkwarc:Load WARC Files into Apache Spark
Load WARC (Web ARChive) files into Apache Spark using 'sparklyr'. This allows reading files from the Common Crawl project.
Maintained by Edgar Ruiz. Last updated 3 years ago.
tidyedgar:Tidy Fundamental Financial Data from 'SEC's 'EDGAR' 'API'
Streamline the process of accessing fundamental financial data from the United States Securities and Exchange Commission's ('SEC') Electronic Data Gathering, Analysis, and Retrieval system ('EDGAR') 'API', transforming it into a tidy, analysis-ready format.
Maintained by Gerard Gimenez-Adsuar. Last updated 1 year ago.
dbplyr:A 'dplyr' Back End for Databases
A 'dplyr' back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a 'DBI' back end; more advanced features require 'SQL' translation to be provided by the package author.
Maintained by Hadley Wickham. Last updated 3 months ago.
finreportr:Financial Data from U.S. Securities and Exchange Commission
Download and display company financial data from the U.S. Securities and Exchange Commission's EDGAR database. It contains a suite of functions with web scraping and XBRL parsing capabilities that allows users to extract data from EDGAR in an automated and scalable manner.
Maintained by Seward Lee. Last updated 3 years ago.
dataset:Create Data Frames that are Easier to Exchange and Reuse
The aim of the 'dataset' package is to make tidy datasets easier to release, exchange and reuse. It organizes and formats data frame 'R' objects into well-referenced, well-described, interoperable datasets in release- and reuse-ready form.
Maintained by Daniel Antal. Last updated 21 days ago.
dados:Translate Datasets to Portuguese
This package translates the following datasets into Portuguese: 'airlines', 'airports', 'ames_raw', 'AwardsManagers', 'babynames', 'Batting', 'diamonds', 'faithful', 'fueleconomy', 'Fielding', 'flights', 'gapminder', 'gss_cat', 'iris', 'Managers', 'mpg', 'mtcars', 'atmos', 'penguins', 'People', 'Pitching', 'pixarfilms', 'planes', 'presidential', 'table1', 'table2', 'table3', 'table4a', 'table4b', 'table5', 'vehicles', 'weather', 'who'.
Maintained by Riva Quiroga. Last updated 7 months ago.
MedDataSets:Comprehensive Medical, Disease, Treatment, and Drug Datasets
Provides an extensive collection of datasets related to medicine, diseases, treatments, drugs, and public health. This package covers topics such as drug effectiveness, vaccine trials, survival rates, infectious disease outbreaks, and medical treatments. The included datasets span various health conditions, including AIDS, cancer, bacterial infections, and COVID-19, along with information on pharmaceuticals and vaccines. These datasets are sourced from the R ecosystem and other R packages, remaining unaltered to ensure data integrity. This package serves as a valuable resource for researchers, analysts, and healthcare professionals interested in conducting medical and public health data analysis in R.
Maintained by Renzo Caceres Rossi. Last updated 5 months ago.
semTools:Useful Tools for Structural Equation Modeling
Provides miscellaneous tools for structural equation modeling, many of which extend the 'lavaan' package. For example, latent interactions can be estimated using product indicators (Lin et al., 2010, <doi:10.1080/10705511.2010.488999>) and simple effects probed; analytical power analyses can be conducted (Jak et al., 2021, <doi:10.3758/s13428-020-01479-0>); and scale reliability can be estimated based on estimated factor-model parameters.
Maintained by Terrence D. Jorgensen. Last updated 4 days ago.
dendextend:Extending 'dendrogram' Functionality in R
Offers a set of functions for extending 'dendrogram' objects in R, letting you visualize and compare trees of 'hierarchical clusterings'. You can (1) adjust a tree's graphical parameters - the color, size, type, etc. of its branches, nodes and labels; (2) visually and statistically compare different 'dendrograms' to one another.
Maintained by Tal Galili. Last updated 2 months ago.
sparkxgb:Interface for 'XGBoost' on 'Apache Spark'
A 'sparklyr' extension that provides an R interface for 'XGBoost' on 'Apache Spark'. 'XGBoost' is an optimized distributed gradient boosting library.
Maintained by Edgar Ruiz. Last updated 11 months ago.
probably:Tools for Post-Processing Predicted Values
Models can be improved by post-processing class probabilities, by: recalibration, conversion to hard probabilities, assessment of equivocal zones, and other activities. 'probably' contains tools for conducting these operations as well as calibration tools and conformal inference techniques for regression models.
Maintained by Max Kuhn. Last updated 5 months ago.
tidypredict:Run Predictions Inside the Database
It parses a fitted 'R' model object, and returns a formula in 'Tidy Eval' code that calculates the predictions. It works with several database back-ends because it leverages 'dplyr' and 'dbplyr' for the final 'SQL' translation of the algorithm. It currently supports lm(), glm(), randomForest(), ranger(), earth(), xgb.Booster.complete(), cubist(), and ctree() models.
Maintained by Emil Hvitfeldt. Last updated 3 months ago.
farr:Data and Code for Financial Accounting Research
Handy functions and data to support a course book for accounting research. Gow, Ian D. and Tongqing Ding (2024) 'Empirical Research in Accounting: Tools and Methods'.
Maintained by Ian Gow. Last updated 1 month ago.
h2o:R Interface for the 'H2O' Scalable Machine Learning Platform
R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
Maintained by Tomas Fryda. Last updated 1 year ago.
ecostats:Code and Data Accompanying the Eco-Stats Text (Warton 2022)
Functions and data supporting the Eco-Stats text (Warton, 2022, Springer), and solutions to exercises. Functions include tools for using simulation envelopes in diagnostic plots, and a function for diagnostic plots of multivariate linear models. Datasets mentioned in the package are included here (where not available elsewhere) and there is a vignette for each chapter of the text with solutions to exercises.
Maintained by David Warton. Last updated 1 year ago.
modeldb:Fits Models Inside the Database
Uses 'dplyr' and 'tidyeval' to fit statistical models inside the database. It currently supports KMeans and linear regression models.
Maintained by Max Kuhn. Last updated 1 year ago.
sdtmchecks:Data Quality Checks for Study Data Tabulation Model (SDTM) Datasets
A series of checks to identify common issues in Study Data Tabulation Model (SDTM) datasets. These checks are intended to be generalizable, actionable, and meaningful for analysis.
Maintained by Will Harris. Last updated 3 months ago.
Autoseed:Retrieve Disease-Related Genes from Public Sources
To help researchers quickly and comprehensively acquire disease genes, and so understand the mechanisms of disease, this program retrieves disease-related genes. The data are integrated from three public databases: 'eDGAR', 'DrugBank' and 'MalaCards'. 'eDGAR' is a comprehensive database containing data on the relationship between diseases and genes. 'DrugBank' contains information on 13443 drugs and 5157 targets. 'MalaCards' integrates human disease information, including disease-related genes.
Maintained by Jiawei Wu. Last updated 5 years ago.
gimme:Group Iterative Multiple Model Estimation
Data-driven approach for arriving at person-specific time series models. The method first identifies which relations replicate across the majority of individuals to detect signal from noise. These group-level relations are then used as a foundation for starting the search for person-specific (or individual-level) relations. See Gates & Molenaar (2012) <doi:10.1016/j.neuroimage.2012.06.026>.
Maintained by Kathleen M Gates. Last updated 6 months ago.
optimStrat:Choosing the Sample Strategy
Intended to assist in the choice of the sampling strategy to implement in a survey.
Maintained by Edgar Bueno. Last updated 2 years ago.
rcorpora:A Collection of Small Text Corpora of Interesting Data
A collection of small text corpora of interesting data. It contains all data sets from 'dariusk/corpora'. Some examples: names of animals: birds, dinosaurs, dogs; foods: beer categories, pizza toppings; geography: English towns, rivers, oceans; humans: authors, US presidents, occupations; science: elements, planets; words: adjectives, verbs, proverbs, US president quotes.
Maintained by Gábor Csárdi. Last updated 7 years ago.
evolqg:Evolutionary Quantitative Genetics
Provides functions for covariance matrix comparisons, estimation of repeatabilities in measurements and matrices, and general evolutionary quantitative genetics tools. Melo D, Garcia G, Hubbe A, Assis A P, Marroig G. (2016) <doi:10.12688/f1000research.7082.3>.
Maintained by Diogo Melo. Last updated 11 months ago.
rscontract:Generic implementation of the 'RStudio' connections contract
Provides a generic implementation of the 'RStudio' connection contract to make it easier for database connections, and other types of connections, opened via R packages to integrate with the connections pane inside the 'RStudio' integrated development environment (IDE).
Maintained by Nathan Stephens. Last updated 4 years ago.
muscle:Multiple Sequence Alignment with MUSCLE
MUSCLE performs multiple sequence alignments of nucleotide or amino acid sequences.
Maintained by Alex T. Kalinka. Last updated 5 months ago.
merDeriv:Case-Wise and Cluster-Wise Derivatives for Mixed Effects Models
Compute case-wise and cluster-wise derivatives for mixed effects models with respect to fixed effects parameters, random effect (co)variances, and residual variance. This material is partially based on work supported by the National Science Foundation under Grant Number 1460719.
Maintained by Ting Wang. Last updated 3 years ago.
MPCI:Multivariate Process Capability Indices (MPCI)
Computes the following Multivariate Process Capability Indices: Shahriari et al. (1995) Multivariate Capability Vector, Taam et al. (1993) Multivariate Capability Index (MCpm), Pan and Lee (2010) proposal (NMCpm), and the following indices based on Principal Component Analysis (PCA): Wang and Chen (1998), Xekalaki and Perakis (2002) and Wang (2005). Two datasets are included.
Maintained by Edgar Santos-Fernandez. Last updated 9 years ago.
strucchangeRcpp:Testing, Monitoring, and Dating Structural Changes: C++ Version
A fast implementation with additional experimental features for testing, monitoring and dating structural changes in (linear) regression models. 'strucchangeRcpp' features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g. cumulative/moving sum, recursive/moving estimates) and F statistics, respectively. These methods are described in Zeileis et al. (2002) <doi:10.18637/jss.v007.i02>. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals, and their magnitude as well as the model fit can be evaluated using a variety of statistical measures.
Maintained by Dainius Masiliunas. Last updated 5 months ago.
TapeR:Flexible Tree Taper Curves Based on Semiparametric Mixed Models
Implementation of functions for fitting taper curves (a semiparametric linear mixed effects taper model) to diameter measurements along stems. Further functions are provided to estimate the uncertainty around the predicted curves, to calculate timber volume (also by sections) and marginal (e.g., upper) diameters. For cases where tree heights are not measured, methods for estimating additional variance in volume predictions resulting from uncertainties in tree height models (tariffs) are provided. The example data include the taper curve parameters for Norway spruce used in the 3rd German NFI fitted to 380 trees and a subset of section-wise diameter measurements of these trees. The functions implemented here are detailed in Kublin, E., Breidenbach, J., Kaendler, G. (2013) <doi:10.1007/s10342-013-0715-0>.
Maintained by Christian Vonderach. Last updated 1 year ago.
TapeS:Tree Taper Curves and Sorting Based on 'TapeR'
Provides new Germany-wide 'TapeR' models and functions for their evaluation. Included are the most common tree species in Germany (Norway spruce, Scots pine, European larch, Douglas fir, Silver fir as well as European beech, Common/Sessile oak and Red oak). Many other species are mapped to them so that 36 tree species / groups can be processed. Single trees are defined by species code, one or multiple diameters in arbitrary measuring height and tree height. The functions then provide information on diameters along the stem, bark thickness, height of diameters, volume of the total or parts of the trunk and total and component above-ground biomass. It is also possible to calculate assortments from the taper curves. Uncertainty information is provided for diameter, volume and component biomass estimation.
Maintained by Christian Vonderach. Last updated 1 month ago.
rBDAT:Implementation of BDAT Tree Taper Fortran Functions
Implements the BDAT tree taper Fortran routines, which were developed for the German National Forest Inventory (NFI), to calculate diameters, volume, assortments, double bark thickness and biomass for different tree species based on tree characteristics and sorting information. See Kublin (2003) <doi:10.1046/j.1439-0337.2003.00183.x> for details.
Maintained by Christian Vonderach. Last updated 6 months ago.
Efficient calculation of pseudo-ranks and (pseudo)-rank based test statistics. In case of equal sample sizes, pseudo-ranks and mid-ranks are equal. When used for inference mid-ranks may lead to paradoxical results. Pseudo-ranks are in general not affected by such a problem. See Happ et al. (2020, <doi:10.18637/jss.v095.c01>) for details.
Maintained by Martin Happ. Last updated 27 days ago.
WMWssp:Wilcoxon-Mann-Whitney Sample Size Planning
Calculates the minimal sample size for the Wilcoxon-Mann-Whitney test that is needed for a given power and two-sided type I error rate. The method works for metric data with and without ties, count data, ordered categorical data, and even dichotomous data. However, data for the reference group are needed to generate synthetic data for the treatment group based on a relevant effect. See Happ et al. (2019, <doi:10.1002/sim.7983>) for details.
Maintained by Martin Happ. Last updated 27 days ago.
nparLD:Nonparametric Analysis of Longitudinal Data in Factorial Experiments
Performs nonparametric analysis of longitudinal data in factorial experiments. Longitudinal data are those which are collected from the same subjects over time, and they frequently arise in biological sciences. Nonparametric methods do not require distributional assumptions, and are applicable to a variety of data types (continuous, discrete, purely ordinal, and dichotomous). Such methods are also robust with respect to outliers and for small sample sizes.
Maintained by Frank Konietschke. Last updated 3 years ago.
rquery:Relational Query Generator for Data Manipulation at Scale
A piped query generator based on Edgar F. Codd's relational algebra, and on production experience using 'SQL' and 'dplyr' at big data scale. The design represents an attempt to make 'SQL' more teachable by denoting composition by a sequential pipeline notation instead of nested queries or functions. The implementation delivers reliable high performance data processing on large data systems such as 'Spark', databases, and 'data.table'. Package features include: data processing trees or pipelines as observable objects (able to report both columns produced and columns used), optimized 'SQL' generation as an explicit user visible table modeling step, plus explicit query reasoning and checking.
Maintained by John Mount. Last updated 2 years ago.
rankFD:Rank-Based Tests for General Factorial Designs
The rankFD() function calculates the Wald-type statistic (WTS) and the ANOVA-type statistic (ATS) for nonparametric factorial designs, e.g., for count, ordinal or score data in a crossed design with an arbitrary number of factors. Brunner, E., Bathke, A. and Konietschke, F. (2018) <doi:10.1007/978-3-030-02914-2>.
Maintained by Frank Konietschke. Last updated 3 years ago.
