R-universe search: role

tidymodels

recipes:Preprocessing and Feature Engineering Steps for Modeling

A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps that provide preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as input to statistical or machine learning models.

Maintained by Max Kuhn. Last updated 5 days ago.

11.4 match 584 stars 18.71 score 7.2k scripts 380 dependents
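The recipes description says step parameters are estimated on an initial data set and then applied to other data sets. The sketch below illustrates that idea with a single center-and-scale step in Python. It is a hypothetical class, not the recipes API (in R, recipes exposes this split through its `prep()` and `bake()` verbs); `NormalizeStep` and its methods are made-up names chosen to mirror that split.

```python
from statistics import mean, pstdev


class NormalizeStep:
    """Hypothetical center-and-scale step (illustration only, not the recipes API)."""

    def __init__(self, columns):
        self.columns = columns
        self.params = {}  # column -> (mean, sd), filled in by prep()

    def prep(self, rows):
        # Estimate the statistics from the initial (training) data only.
        for col in self.columns:
            values = [row[col] for row in rows]
            sd = pstdev(values)
            self.params[col] = (mean(values), sd if sd > 0 else 1.0)  # avoid dividing by zero
        return self

    def bake(self, rows):
        # Apply the stored training statistics to any data set, e.g. new test data.
        baked = []
        for row in rows:
            new_row = dict(row)
            for col, (mu, sd) in self.params.items():
                new_row[col] = (row[col] - mu) / sd
            baked.append(new_row)
        return baked


train = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
test = [{"x": 4.0}]
step = NormalizeStep(["x"]).prep(train)
print(step.bake(test))  # uses the training mean and sd, not statistics of `test`
```

Keeping estimation and application separate is what prevents information from the test data leaking into the preprocessing.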

arnaudgallou

plume:A Simple Author Handler for Scientific Writing

Handles and formats author information in scientific writing in 'R Markdown' and 'Quarto'. 'plume' provides easy-to-use and flexible tools for injecting author metadata in 'YAML' headers as well as generating author and contribution lists (among others) as strings from tabular data.

Maintained by Arnaud Gallou. Last updated 30 days ago.

authors contribution contributions list lists markdown paper preprint quarto role roles

24.6 match 21 stars 6.84 score 15 scripts
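plume builds author and affiliation lists from tabular data. As a rough illustration of that kind of transformation (not plume's own functions or output format), the hypothetical `author_line()` below numbers affiliations in order of first appearance and marks each author with Pandoc superscripts (`^...^`), which R Markdown and Quarto render as superscript.

```python
def author_line(authors):
    """Build a Pandoc-style author line and a numbered affiliation list (hypothetical helper)."""
    aff_number = {}  # affiliation -> number, assigned in order of first appearance
    names = []
    for author in authors:
        marks = []
        for aff in author["affiliations"]:
            if aff not in aff_number:
                aff_number[aff] = len(aff_number) + 1
            marks.append(str(aff_number[aff]))
        # ^...^ is Pandoc superscript syntax; authors without affiliations get no mark.
        mark = f'^{",".join(marks)}^' if marks else ""
        names.append(f'{author["given"]} {author["family"]}{mark}')
    affiliations = [f"^{n}^ {aff}" for aff, n in aff_number.items()]
    return ", ".join(names), affiliations


team = [
    {"given": "Jane", "family": "Doe", "affiliations": ["University A"]},
    {"given": "John", "family": "Roe", "affiliations": ["Institute B", "University A"]},
]
line, affiliations = author_line(team)
print(line)
print("\n".join(affiliations))
```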

josesamos

rolap:Obtaining Star Databases from Flat Tables

Data in multidimensional systems is obtained from operational systems and is transformed to adapt it to the new structure. Frequently, the operations to be performed aim to transform a flat table into a ROLAP (Relational On-Line Analytical Processing) star database. The main objective of the package is to allow the definition of these transformations easily. The implementation of the multidimensional database obtained can be exported to work with multidimensional analysis tools on spreadsheets or relational databases.

Maintained by Jose Samos. Last updated 1 year ago.

openjdk

14.8 match 5 stars 6.12 score 25 scripts 1 dependents
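The rolap description centers on turning a flat table into a star database. The Python sketch below shows the core move under simplifying assumptions: each dimension gets a table of distinct attribute combinations with a surrogate key, and the fact table keeps only those keys plus the measures. `flat_to_star()` is a hypothetical helper; it does not aggregate facts to a coarser grain or handle the export targets rolap supports.

```python
def flat_to_star(rows, dimensions, measures):
    """Split flat rows into dimension tables (with surrogate keys) and a fact table."""
    lookups = {name: {} for name in dimensions}  # dimension -> {attribute tuple: key}
    facts = []
    for row in rows:
        fact = {}
        for name, cols in dimensions.items():
            attrs = tuple(row[c] for c in cols)
            lookup = lookups[name]
            if attrs not in lookup:
                lookup[attrs] = len(lookup) + 1  # next surrogate key
            fact[f"{name}_key"] = lookup[attrs]
        for m in measures:
            fact[m] = row[m]
        facts.append(fact)
    dimension_tables = {
        name: [{f"{name}_key": key, **dict(zip(dimensions[name], attrs))}
               for attrs, key in lookup.items()]
        for name, lookup in lookups.items()
    }
    return dimension_tables, facts


flat = [
    {"city": "Granada", "year": 2023, "sales": 10},
    {"city": "Granada", "year": 2024, "sales": 12},
    {"city": "Madrid", "year": 2023, "sales": 7},
]
dims, facts = flat_to_star(flat, {"where": ["city"], "when": ["year"]}, ["sales"])
print(dims["where"])
print(facts)
```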

tychobra

polished:Authentication and Hosting for 'shiny' Apps

Authentication, user administration, hosting, and additional infrastructure for 'shiny' apps. See <https://polished.tech> for additional documentation and examples.

Maintained by Andy Merlino. Last updated 1 year ago.

10.9 match 233 stars 8.09 score 75 scripts

azure

AzureRMR:Interface to 'Azure Resource Manager'

A lightweight but powerful R interface to the 'Azure Resource Manager' REST API. The package exposes a comprehensive class framework and related tools for creating, updating and deleting 'Azure' resource groups, resources and templates. While 'AzureRMR' can be used to manage any 'Azure' service, it can also be extended by other packages to provide extra functionality for specific services. Part of the 'AzureR' family of packages.

Maintained by Hong Ooi. Last updated 1 year ago.

azure azure-resource-manager azure-sdk-r cloud

8.1 match 20 stars 9.94 score 51 scripts 12 dependents

r-lib

desc:Manipulate DESCRIPTION Files

Tools to read, write, create, and manipulate DESCRIPTION files. It is intended for packages that create or manipulate other packages.

Maintained by Gábor Csárdi. Last updated 1 month ago.

5.3 match 123 stars 14.68 score 409 scripts 1.1k dependents
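DESCRIPTION files use the Debian Control File (DCF) format: `Field: value` lines, with continuation lines indented by whitespace. The hypothetical `parse_dcf()` below is a minimal Python reader for a single-record file such as a DESCRIPTION; it is not how desc works internally, and joining continuation lines with a newline is a simplification (R's own `read.dcf()` applies its own whitespace rules).

```python
def parse_dcf(text):
    """Parse a single-record DCF text (such as an R DESCRIPTION file) into a dict."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # a DESCRIPTION file holds one record, so blank lines are skipped
        if line[0] in " \t":
            # Indented line: continuation of the previous field's value.
            if current is None:
                raise ValueError("continuation line before any field")
            fields[current] += "\n" + line.strip()
        else:
            # "Field: value"; split at the first colon only, so URLs in values survive.
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"malformed line: {line!r}")
            current = name.strip()
            fields[current] = value.strip()
    return fields


example = """Package: mypkg
Title: An Example Package
Version: 0.1.0
URL: https://example.org/mypkg
Description: A first line
    that continues here.
"""
print(parse_dcf(example))
```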

kjhealy

gssrdoc:Document General Social Survey Variable

The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS, see <http://gss.norc.org>.

Maintained by Kieran Healy. Last updated 11 months ago.

24.8 match 2.28 score 38 scripts

revelle

psychTools:Tools to Accompany the 'psych' Package for Psychological Research

Support functions, data sets, and vignettes for the 'psych' package. Contains several of the biggest data sets for the 'psych' package as well as four vignettes. A few helper functions for file manipulation are included as well. For more information, see the <https://personality-project.org/r/> web page.

Maintained by William Revelle. Last updated 12 months ago.

6.9 match 5.89 score 178 scripts 5 dependents

vubiostat

redcapAPI:Interface to 'REDCap'

Access data stored in 'REDCap' databases using the Application Programming Interface (API). 'REDCap' (Research Electronic Data CAPture; <https://projectredcap.org>, Harris, et al. (2009) <doi:10.1016/j.jbi.2008.08.010>, Harris, et al. (2019) <doi:10.1016/j.jbi.2019.103208>) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The API allows users to access data and project meta data (such as the data dictionary) from the web programmatically. The 'redcapAPI' package facilitates the process of accessing data with options to prepare an analysis-ready data set consistent with the definitions in a database's data dictionary.

Maintained by Shawn Garbett. Last updated 8 days ago.

3.5 match 22 stars 10.47 score 134 scripts 2 dependents

tom-wolff

ideanet:Integrating Data Exchange and Analysis for Networks ('ideanet')

A suite of convenient tools for social network analysis geared toward students, entry-level users, and non-expert practitioners. ‘ideanet’ features unique functions for the processing and measurement of sociocentric and egocentric network data. These functions automatically generate node- and system-level measures commonly used in the analysis of these types of networks. Outputs from these functions maximize the ability of novice users to employ network measurements in further analyses while making all users less prone to common data analytic errors. Additionally, ‘ideanet’ features an R Shiny graphical user interface that allows novices to explore network data with minimal need for coding.

Maintained by Tom Wolff. Last updated 2 days ago.

5.1 match 6 stars 6.80 score 10 scripts

bioc

rsbml:R support for SBML, using libsbml

Links R to libsbml for SBML parsing, validation and output, provides an S4 SBML DOM, and converts SBML to R graph objects. Optionally links to the SBML ODE Solver Library (SOSLib) for simulating models.

Maintained by Michael Lawrence. Last updated 17 days ago.

graphandnetwork pathways network libsbml cpp

6.3 match 4.71 score 19 scripts 1 dependents

bioc

Biobase:Biobase: Base functions for Bioconductor

Functions that are needed by many other packages or which replace R functions.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

infrastructure bioconductor-package core-package

1.8 match 9 stars 16.45 score 6.6k scripts 1.8k dependents

avdrark

cmm:Categorical Marginal Models

Quite extensive package for maximum likelihood estimation and weighted least squares estimation of categorical marginal models (CMMs; e.g., Bergsma and Rudas, 2002, <http://www.jstor.org/stable/2700006>; Bergsma, Croon and Hagenaars, 2009, <DOI:10.1007/b12532>).

Maintained by L. A. van der Ark. Last updated 2 years ago.

10.2 match 2.73 score 25 scripts 4 dependents

thothorn

HSAUR3:A Handbook of Statistical Analyses Using R (3rd Edition)

Functions, data sets, analyses and examples from the third edition of the book "A Handbook of Statistical Analyses Using R" (Torsten Hothorn and Brian S. Everitt, Chapman & Hall/CRC, 2014). The first chapter of the book, entitled "An Introduction to R", is included in full in this package; for all other chapters, a vignette containing all data analyses is available. In addition, Sweave source code for slides of selected chapters is included in this package (see HSAUR3/inst/slides). The publisher's web page is <https://www.routledge.com/A-Handbook-of-Statistical-Analyses-using-R/Hothorn-Everitt/p/book/9781482204582>.

Maintained by Torsten Hothorn. Last updated 7 months ago.

4.0 match 6 stars 6.72 score 120 scripts 2 dependents

akgold

onelogin:Interact with the 'OneLogin' API

The identity provider 'OneLogin' <http://onelogin.com> is used for authentication via Single Sign-On (SSO). This package provides an R interface to its API.

Maintained by Alex Gold. Last updated 6 years ago.

9.9 match 2.70 score 1 script

markedmondson1234

googleAuthR:Authenticate and Create Google APIs

Create R functions that interact with OAuth2 Google APIs <https://developers.google.com/apis-explorer/> easily, with auto-refresh and Shiny compatibility.

Maintained by Erik Grönroos. Last updated 10 months ago.

api authentication google googleauthr oauth2-flow shiny

2.0 match 178 stars 12.84 score 804 scripts 13 dependents

smartdata-analysis-and-statistics

metamisc:Meta-Analysis of Diagnosis and Prognosis Research Studies

Facilitate frequentist and Bayesian meta-analysis of diagnosis and prognosis research studies. It includes functions to summarize multiple estimates of prediction model discrimination and calibration performance (Debray et al., 2019) <doi:10.1177/0962280218785504>. It also includes functions to evaluate funnel plot asymmetry (Debray et al., 2018) <doi:10.1002/jrsm.1266>. Finally, the package provides functions for developing multivariable prediction models from datasets with clustering (de Jong et al., 2021) <doi:10.1002/sim.8981>.

Maintained by Thomas Debray. Last updated 30 days ago.

meta-analysis prognosis prognostic-models

3.4 match 7 stars 7.48 score 102 scripts

azure

AzureStor:Storage Management in 'Azure'

Manage storage in Microsoft's 'Azure' cloud: <https://azure.microsoft.com/en-us/product-categories/storage/>. On the admin side, 'AzureStor' includes features to create, modify and delete storage accounts. On the client side, it includes an interface to blob storage, file storage, and 'Azure Data Lake Storage Gen2': upload and download files and blobs; list containers and files/blobs; create containers; and so on. Authenticated access to storage is supported, via either a shared access key or a shared access signature (SAS). Part of the 'AzureR' family of packages.

Maintained by Hong Ooi. Last updated 2 years ago.

azure-data-lake azure-sdk-r azure-storage azure-storage-blob azure-storage-file

2.3 match 64 stars 10.72 score 298 scripts 4 dependents

bioc

MoonlightR:Identify oncogenes and tumor suppressor genes from omics data

Motivation: Understanding cancer mechanisms requires identifying the genes that play a role in the development of the pathology and characterizing their role (notably as oncogenes or tumor suppressors). Results: We present an R/Bioconductor package called MoonlightR, which returns a list of candidate driver genes for specific cancer types on the basis of TCGA expression data. The method first infers gene regulatory networks and then carries out a functional enrichment analysis (FEA), implementing an upstream regulator analysis (URA), to score the importance of well-known biological processes with respect to the studied cancer type. Finally, using random forests, MoonlightR predicts two specific roles for the candidate driver genes: i) tumor suppressor genes (TSGs) and ii) oncogenes (OCGs). As a consequence, this methodology not only identifies genes playing a dual role (e.g. TSG in one cancer type and OCG in another) but also helps elucidate the biological processes underlying their specific roles. In particular, MoonlightR can be used to discover OCGs and TSGs in the same cancer type. This may help answer the question of whether some genes change role between early stages (I, II) and late stages (III, IV) in breast cancer. In the future, this analysis could be useful for determining the causes of different resistances to chemotherapeutic treatments.

Maintained by Matteo Tiberti. Last updated 5 months ago.

dnamethylation differentialmethylation generegulation geneexpression methylationarray differentialexpression pathways network survival genesetenrichment networkenrichment

3.7 match 17 stars 6.57 score

tidymodels

textrecipes:Extra 'Recipes' for Text Processing

Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tf-idf) and feature hashing.

Maintained by Emil Hvitfeldt. Last updated 8 days ago.

2.3 match 160 stars 10.87 score 964 scripts 1 dependents
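textrecipes lists tf-idf among its counting steps. The Python sketch below computes one common tf-idf variant (term frequency as count divided by document length, idf as the natural log of the number of documents over document frequency) with plain whitespace tokenization. The weighting and tokenization options in textrecipes may differ, so treat the hypothetical `tf_idf()` as an illustration of the idea only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: tf-idf} dict per document (whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        scores.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = ["the cat sat", "the dog sat", "the cat ran"]
for row in tf_idf(docs):
    print(row)
```

A term that occurs in every document, such as "the" here, gets an idf of zero and therefore carries no weight.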

thothorn

HSAUR:A Handbook of Statistical Analyses Using R (1st Edition)

Functions, data sets, analyses and examples from the book "A Handbook of Statistical Analyses Using R" (Brian S. Everitt and Torsten Hothorn, Chapman & Hall/CRC, 2006). The first chapter of the book, entitled "An Introduction to R", is included in full in this package; for all other chapters, a vignette containing all data analyses is available.

Maintained by Torsten Hothorn. Last updated 3 years ago.

4.0 match 6.07 score 253 scripts 5 dependents

azure

AzureGraph:Simple Interface to 'Microsoft Graph'

A simple interface to the 'Microsoft Graph' API <https://learn.microsoft.com/en-us/graph/overview>. 'Graph' is a comprehensive framework for accessing data in various online Microsoft services. This package was originally intended to provide an R interface only to the 'Azure Active Directory' part, with a view to supporting interoperability of R and 'Azure': users, groups, registered apps and service principals. However it has since been expanded into a more general tool for interacting with Graph. Part of the 'AzureR' family of packages.

Maintained by Hong Ooi. Last updated 2 years ago.

azure-active-directory-graph-api azure-sdk-r microsoft-graph-api

2.3 match 32 stars 10.30 score 36 scripts 21 dependents

mlr-org

mlr3pipelines:Preprocessing Operators and Pipelines for 'mlr3'

Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned.

Maintained by Martin Binder. Last updated 8 days ago.

bagging data-science dataflow-programming ensemble-learning machine-learning mlr3 pipelines preprocessing stacking

1.9 match 141 stars 12.36 score 448 scripts 7 dependents

stouffer

rnetcarto:Fast Network Modularity and Roles Computation by Simulated Annealing (Rgraph C Library Wrapper for R)

Provides functions to compute the modularity and modularity-related roles in networks. It is a wrapper around the rgraph library (Guimera & Amaral, 2005, <doi:10.1038/nature03288>).

Maintained by Daniel B. Stouffer. Last updated 2 years ago.

gsl

5.0 match 1 star 4.58 score 38 scripts
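rnetcarto wraps rgraph, which searches for the partition that maximizes modularity by simulated annealing. The Python sketch below only evaluates the quantity being maximized for a given partition, using the Guimerà and Amaral form M = sum over modules s of (l_s / L - (d_s / 2L)^2), where L is the number of edges, l_s the number of edges inside module s, and d_s the summed degree of the nodes in module s. It assumes an undirected, unweighted edge list, and the hypothetical `modularity()` does not reproduce the annealing search or the role classification.

```python
from collections import defaultdict


def modularity(edges, module_of):
    """Modularity M = sum_s [l_s / L - (d_s / (2L))**2] for an undirected, unweighted graph."""
    n_edges = len(edges)               # L
    internal = defaultdict(int)        # l_s: edges with both ends in module s
    degree_sum = defaultdict(int)      # d_s: summed degree of the nodes in module s
    for u, v in edges:
        degree_sum[module_of[u]] += 1
        degree_sum[module_of[v]] += 1
        if module_of[u] == module_of[v]:
            internal[module_of[u]] += 1
    return sum(internal[s] / n_edges - (degree_sum[s] / (2 * n_edges)) ** 2
               for s in degree_sum)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}
print(modularity(edges, split), modularity(edges, merged))
```

Putting every node in one module always gives M = 0, while the two-triangle split scores higher, which is why a search over partitions (such as simulated annealing) tries to push M upward.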

r-forge

Sleuth3:Data Sets from Ramsey and Schafer's "Statistical Sleuth (3rd Ed)"

Data sets from Ramsey, F.L. and Schafer, D.W. (2013), "The Statistical Sleuth: A Course in Methods of Data Analysis (3rd ed)", Cengage Learning.

Maintained by Berwin A Turlach. Last updated 1 year ago.

3.6 match 6.38 score 522 scripts

paws-r

paws:Amazon Web Services Software Development Kit

Interface to Amazon Web Services <https://aws.amazon.com>, including storage, database, and compute services, such as 'Simple Storage Service' ('S3'), 'DynamoDB' 'NoSQL' database, and 'Lambda' functions-as-a-service.

Maintained by Dyfan Jones. Last updated 3 days ago.

aws aws-sdk

2.0 match 332 stars 11.25 score 177 scripts 12 dependents

jiefei-wang

aws.ecx:Communicating with AWS EC2 and ECS using AWS REST APIs

Provides functions for communicating with Amazon Web Services (AWS) Elastic Compute Cloud (EC2) and Elastic Container Service (ECS). The functions have the prefix 'ecs_' or 'ec2_' depending on the class of the API. Requests are sent via the REST API, and the parameters are given by the function arguments. The credentials can be set via 'aws_set_credentials'. The EC2 documentation can be found at <https://docs.aws.amazon.com/AWSEC2/latest/APIReference/Welcome.html> and the ECS documentation at <https://docs.aws.amazon.com/AmazonECS/latest/APIReference/Welcome.html>.

Maintained by Jiefei Wang. Last updated 3 years ago.

ec2 ecs ecs-functions

5.3 match 1 star 4.18 score 2 scripts

thothorn

HSAUR2:A Handbook of Statistical Analyses Using R (2nd Edition)

Functions, data sets, analyses and examples from the second edition of the book "A Handbook of Statistical Analyses Using R" (Brian S. Everitt and Torsten Hothorn, Chapman & Hall/CRC, 2008). The first chapter of the book, entitled "An Introduction to R", is included in full in this package; for all other chapters, a vignette containing all data analyses is available. In addition, the package contains Sweave code for producing slides for selected chapters (see HSAUR2/inst/slides).

Maintained by Torsten Hothorn. Last updated 2 years ago.

4.0 match 5.51 score 181 scripts 1 dependents

bioc

DiffLogo:DiffLogo: A comparative visualisation of biooligomer motifs

DiffLogo is an easy-to-use tool to visualize motif differences.

Maintained by Hendrik Treutler. Last updated 5 months ago.

software sequencematching multiplecomparison motifannotation visualization alignment

3.2 match 8 stars 6.66 score 27 scripts

jkropko

coxed:Duration-Based Quantities of Interest for the Cox Proportional Hazards Model

Functions for generating, simulating, and visualizing expected durations and marginal changes in duration from the Cox proportional hazards model as described in Kropko and Harden (2017) <doi:10.1017/S000712341700045X> and Harden and Kropko (2018) <doi:10.1017/psrm.2018.19>.

Maintained by Jonathan Kropko. Last updated 4 years ago.

3.3 match 25 stars 6.00 score 132 scripts 1 dependents

bioc

ReactomeGraph4R:Interface for the Reactome Graph Database

Pathways, reactions, and biological entities in Reactome knowledge are systematically represented as an ordered network. Instances are represented as nodes and relationships between instances as edges; they are all stored in the Reactome Graph Database. This package serves as an interface to query the interconnected data from a local Neo4j database, with the aim of minimizing the usage of Neo4j Cypher queries.

Maintained by Chi-Lam Poon. Last updated 5 months ago.

dataimport pathways reactome network graphandnetwork

3.5 match 6 stars 5.26 score 6 scripts

paws-r

paws.security.identity:'Amazon Web Services' Security, Identity, & Compliance Services

Interface to 'Amazon Web Services' security, identity, and compliance services, including the 'Identity & Access Management' ('IAM') service for managing access to services and resources, and more <https://aws.amazon.com/>.

Maintained by Dyfan Jones. Last updated 3 days ago.

aws aws-sdk

2.0 match 332 stars 9.17 score 15 dependents

bioc

OmnipathR:OmniPath web service client and more

A client for the OmniPath web service (https://www.omnipathdb.org) and many other resources. It also includes functions to transform and pretty print some of the downloaded data, functions to access a number of other resources such as BioPlex, ConsensusPathDB, EVEX, Gene Ontology, Guide to Pharmacology (IUPHAR/BPS), Harmonizome, HTRIdb, Human Phenotype Ontology, InWeb InBioMap, KEGG Pathway, Pathway Commons, Ramilowski et al. 2015, RegNetwork, ReMap, TF census, TRRUST and Vinayagam et al. 2011. Furthermore, OmnipathR features a close integration with the NicheNet method for ligand activity prediction from transcriptomics data, and its R implementation `nichenetr` (available only on github).

Maintained by Denes Turei. Last updated 18 days ago.

graphandnetwork network pathways software thirdpartyclient dataimport datarepresentation genesignaling generegulation systemsbiology transcriptomics singlecell annotation kegg complexes enzyme-ptm networks networks-biology omnipath proteins quarto

1.8 match 126 stars 9.90 score 226 scripts 2 dependents

civisanalytics

civis:R Client for the 'Civis Platform API'

A convenient interface for making requests directly to the 'Civis Platform API' <https://www.civisanalytics.com/platform/>. Full documentation is available at <https://civisanalytics.github.io/civis-r/>.

Maintained by Peter Cooman. Last updated 2 months ago.

2.3 match 16 stars 7.84 score 144 scripts

bioc

DAPAR:Tools for the Differential Analysis of Proteins Abundance with R

The package DAPAR is a Bioconductor-distributed R package that provides all the necessary functions to analyze quantitative data from label-free proteomics experiments. Unlike most other similar R packages, it is endowed with rich and user-friendly graphical interfaces, so that no programming skill is required (see `Prostar` package).

Maintained by Samuel Wieczorek. Last updated 5 months ago.

proteomics normalization preprocessing massspectrometry qualitycontrol go dataimport prostar1

3.2 match 2 stars 5.42 score 22 scripts 1 dependents

melff

memisc:Management of Survey Data and Presentation of Analysis Results

An infrastructure for the management of survey data including value labels, definable missing values, recoding of variables, production of code books, and import of (subsets of) 'SPSS' and 'Stata' files is provided. Further, the package allows users to produce tables and data frames of arbitrary descriptive statistics and (almost) publication-ready tables of regression model estimates, which can be exported to 'LaTeX' and HTML.

Maintained by Martin Elff. Last updated 11 days ago.

survey-data

1.3 match 46 stars 12.34 score 1.2k scripts 13 dependents

mages

googleVis:R Interface to Google Charts

R interface to Google's chart tools, allowing users to create interactive charts based on data frames. Charts are displayed locally via the R HTTP help server. A modern browser with an Internet connection is required. The data remains local and is not uploaded to Google.

Maintained by Markus Gesmann. Last updated 10 months ago.

1.3 match 361 stars 12.98 score 2.4k scripts 11 dependents

brian-j-smith

MachineShop:Machine Learning Models and Tools

Meta-package for statistical and machine learning with a unified interface for model fitting, prediction, performance assessment, and presentation of results. Approaches for model fitting and prediction of numerical, categorical, or censored time-to-event outcomes include traditional regression models, regularization methods, tree-based methods, support vector machines, neural networks, ensembles, data preprocessing, filtering, and model tuning and selection. Performance metrics are provided for model assessment and can be estimated with independent test sets, split sampling, cross-validation, or bootstrap resampling. Resample estimation can be executed in parallel for faster processing and nested in cases of model tuning and selection. Modeling results can be summarized with descriptive statistics; calibration curves; variable importance; partial dependence plots; confusion matrices; and ROC, lift, and other performance curves.

Maintained by Brian J Smith. Last updated 7 months ago.

classification-models machine-learning predictive-modeling regression-models survival-models

2.0 match 61 stars 7.95 score 121 scripts

datawookie

clockify:A Wrapper for the 'Clockify' API

A wrapper for the Clockify API <https://docs.clockify.me/>, making it possible to query, insert and update time keeping data.

Maintained by Andrew B. Collier. Last updated 10 months ago.

hacktoberfest

4.0 match 2 stars 3.95 score 6 scripts

bioc

PIUMA:Phenotypes Identification Using Mapper from topological data Analysis

The PIUMA package offers a tidy pipeline of Topological Data Analysis frameworks to identify and characterize communities in high-dimensional and heterogeneous data.

Maintained by Mattia Chiesa. Last updated 5 months ago.

clustering graphandnetwork dimensionreduction network classification

3.0 match 4 stars 5.08 score 2 scripts

huanglabumn

oncoPredict:Drug Response Modeling and Biomarker Discovery

Allows for building drug response models from screening data that pair bulk RNA-Seq with a drug response metric, and provides two additional tools for biomarker discovery developed by the Huang Laboratory at the University of Minnesota. There are three main functions in this package. (1) calcPhenotype builds drug response models on RNA-Seq data and imputes drug response for any other RNA-Seq dataset given to the model. (2) GLDS calculates the general level of drug sensitivity, which can improve biomarker discovery. (3) IDWAS takes the results from calcPhenotype and links the imputed response back to available genomic data (mutation and CNV alterations) to identify biomarkers. Each of these functions comes from a paper from the Huang research laboratory; the relevant paper for each function is listed below. calcPhenotype: Geeleher et al., Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. GLDS: Geeleher et al., Cancer biomarker discovery is improved by accounting for variability in general levels of drug sensitivity in pre-clinical models. IDWAS: Geeleher et al., Discovering novel pharmacogenomic biomarkers by imputing drug response in cancer patients from large genomics studies.

Maintained by Robert Gruener. Last updated 12 months ago.

sva preprocesscore stringr biomart genefilter org.hs.eg.db genomicfeatures txdb.hsapiens.ucsc.hg19.knowngene tcgabiolinks biocgenerics genomicranges iranges s4vectors

2.4 match 18 stars 6.47 score 41 scripts

nicebread

fSRM:Social Relations Analyses with Roles ("Family SRM")

Social Relations Analyses with roles ("Family SRM") are computed using a structural equation modeling approach. Groups ranging from three members up to an unlimited number of members are supported, and the mean structure can be computed. Means and variances can be compared between different groups of families and between roles.

Maintained by Felix Schönbrodt. Last updated 4 years ago.

14.7 match 1.04 score 11 scripts

dyfanjones

noctua:Connect to 'AWS Athena' using R 'AWS SDK' 'paws' ('DBI' Interface)

Designed to be compatible with the 'R' package 'DBI' (Database Interface) when connecting to Amazon Web Services ('AWS') Athena <https://aws.amazon.com/athena/>. To do this, the 'R' 'AWS' Software Development Kit ('SDK') 'paws' <https://github.com/paws-r/paws> is used as a driver.

Maintained by Dyfan Jones. Last updated 11 months ago.

athena aws database

1.9 match 46 stars 7.48 score 58 scripts

mspeekenbrink

sdamr:Statistics: Data Analysis and Modelling

Data sets and functions to support the books "Statistics: Data analysis and modelling" by Speekenbrink, M. (2021) <https://mspeekenbrink.github.io/sdam-book/> and "An R companion to Statistics: data analysis and modelling" by Speekenbrink, M. (2021) <https://mspeekenbrink.github.io/sdam-r-companion/>. All datasets analysed in these books are provided in this package. In addition, the package provides functions to compute sample statistics (variance, standard deviation, mode), create raincloud and enhanced Q-Q plots, and expand Anova results into omnibus tests and tests of individual contrasts.

Maintained by Maarten Speekenbrink. Last updated 1 month ago.

3.1 match 5 stars 4.39 score 99 scripts

big-life-lab

recodeflow:Contains functions to interface with variable details sheets, including recoding variables and converting them to PMML

Recode and harmonize data using variable and details sheets.

Maintained by Yulric Sequeria. Last updated 5 days ago.

2.0 match 6 stars 6.75 score 7 scripts

dyfanjones

RAthena:Connect to 'AWS Athena' using 'Boto3' ('DBI' Interface)

Designed to be compatible with the R package 'DBI' (Database Interface) when connecting to Amazon Web Services ('AWS') Athena <https://aws.amazon.com/athena/>. To do this, the 'Python' 'Boto3' Software Development Kit ('SDK') <https://boto3.amazonaws.com/v1/documentation/api/latest/index.html> is used as a driver.

Maintained by Dyfan Jones. Last updated 1 year ago.

athena aws boto3 database

1.9 match 37 stars 7.10 score 38 scripts

pablobarbera

Rfacebook:Access to Facebook API via R

Provides an interface to the Facebook API.

Maintained by Pablo Barbera. Last updated 5 years ago.

1.7 match 351 stars 7.75 score 268 scripts

lvclark

polyRAD:Genotype Calling with Uncertainty from Sequencing Data in Polyploids and Diploids

Read depth data from genotyping-by-sequencing (GBS) or restriction site-associated DNA sequencing (RAD-seq) are imported and used to make Bayesian probability estimates of genotypes in polyploids or diploids. The genotype probabilities, posterior mean genotypes, or most probable genotypes can then be exported for downstream analysis. 'polyRAD' is described by Clark et al. (2019) <doi:10.1534/g3.118.200913>, and the Hind/He statistic for marker filtering is described by Clark et al. (2022) <doi:10.1186/s12859-022-04635-9>. A variant calling pipeline for highly duplicated genomes is also included and is described by Clark et al. (2020, Version 1) <doi:10.1101/2020.01.11.902890>.

Maintained by Lindsay V. Clark. Last updated 7 days ago.

bioinformatics dna-sequencing genotype-likelihoods genotyping-by-sequencing hacktoberfest rad-seq rad-sequencing snp-genotyping cpp

1.8 match 28 stars 6.98 score 85 scripts
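polyRAD makes Bayesian probability estimates of genotypes from read depth. The sketch below shows the basic idea under deliberately simplified assumptions that are not polyRAD's model: allele dosage in a polyploid ranges from 0 to the ploidy, alternative-allele reads follow a binomial distribution with a small symmetric sequencing error, and the prior is uniform unless one is passed in. `genotype_posterior()` and its `error` default are hypothetical.

```python
def genotype_posterior(alt_reads, total_reads, ploidy, error=0.001, prior=None):
    """Posterior probability of each allele dosage 0..ploidy given read counts.

    Simplified model (an assumption, not polyRAD's): alt reads ~ Binomial(total_reads, p),
    where p is dosage/ploidy adjusted for a symmetric sequencing error rate.
    """
    if prior is None:
        prior = [1.0 / (ploidy + 1)] * (ploidy + 1)  # uniform prior over dosages
    if len(prior) != ploidy + 1:
        raise ValueError("prior needs one probability per dosage 0..ploidy")
    unnormalized = []
    for dosage, prior_prob in enumerate(prior):
        p = dosage / ploidy
        p = p * (1 - error) + (1 - p) * error  # chance that a single read shows the alt allele
        # The binomial coefficient is the same for every dosage, so it cancels on normalizing.
        likelihood = p ** alt_reads * (1 - p) ** (total_reads - alt_reads)
        unnormalized.append(likelihood * prior_prob)
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


post = genotype_posterior(alt_reads=5, total_reads=10, ploidy=2)
mean_genotype = sum(d * p for d, p in enumerate(post))  # posterior mean dosage
most_probable = max(range(len(post)), key=post.__getitem__)
print(post, mean_genotype, most_probable)
```

The three outputs correspond to the three export options the description mentions: genotype probabilities, the posterior mean genotype, and the most probable genotype.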

cran

ILSM:Analyze Interconnection Structure of Multilayer Interaction Networks

Although the structural characteristics of multilayer networks can already be analyzed, there is still no unified set of operations for quickly obtaining these characteristics. To fill this gap, 'ILSM' provides R functions for calculating such metrics of multilayer networks.

Maintained by WeiCheng Sun. Last updated 6 months ago.

3.7 match 3.30 score

vusaverse

vvcanvas:'Canvas' LMS API Integration

Allow R users to interact with the 'Canvas' Learning Management System (LMS) API (see <https://canvas.instructure.com/doc/api/all_resources.html> for details). It provides a set of functions to access and manipulate course data, assignments, grades, users, and other resources available through the 'Canvas' API.

Maintained by Tomer Iwan. Last updated 3 days ago.

canvas canvas-lms canvas-lms-api canvasapi educational instructure-canvas

1.9 match 7 stars 6.23 score 10 scripts

cran

randomLCA:Random Effects Latent Class Analysis

Fits standard and random effects latent class models. The single level random effects model is described in Qu et al <doi:10.2307/2533043> and the two level random effects model in Beath and Heller <doi:10.1177/1471082X0800900302>. Examples are given for their use in diagnostic testing.

Maintained by Ken Beath. Last updated 6 months ago.

3.8 match 3.10 score 42 scripts

matloff

qeML:Quick and Easy Machine Learning Tools

The letters 'qe' in the package title stand for "quick and easy," alluding to the convenience goal of the package. We bring together a variety of machine learning (ML) tools from standard R packages, providing wrappers with a simple, convenient, and uniform interface.

Maintained by Norm Matloff. Last updated 25 days ago.

1.3 match 41 stars 8.41 score 48 scripts 1 dependents

daroczig

botor:'AWS Python SDK' ('boto3') for R

Fork-safe, raw access to the 'Amazon Web Services' ('AWS') 'SDK' via the 'boto3' 'Python' module, and convenient helper functions to query the 'Simple Storage Service' ('S3') and 'Key Management Service' ('KMS'), partial support for 'IAM', the 'Systems Manager Parameter Store' and 'Secrets Manager'.

Maintained by Gergely Daróczi. Last updated 2 months ago.

amazon-web-services aws boto3 python

1.7 match 31 stars 6.61 score 32 scripts

mlr-org

mlr3spatiotempcv:Spatiotemporal Resampling Methods for 'mlr3'

Extends the mlr3 machine learning framework with spatio-temporal resampling methods to account for the presence of spatiotemporal autocorrelation (STAC) in predictor variables. STAC may cause highly biased performance estimates in cross-validation if ignored. A JSS article is available at <doi:10.18637/jss.v111.i07>.

Maintained by Patrick Schratz. Last updated 4 months ago.

cross-validation mlr3 resampling resampling-methods spatial temporal

1.3 match 50 stars 8.09 score 123 scripts

floschuberth

cSEM:Composite-Based Structural Equation Modeling

Estimate, assess, test, and study linear, nonlinear, hierarchical and multigroup structural equation models using composite-based approaches and procedures, including estimation techniques such as partial least squares path modeling (PLS-PM) and its derivatives (PLSc, ordPLSc, robustPLSc), generalized structured component analysis (GSCA), generalized structured component analysis with uniqueness terms (GSCAm), generalized canonical correlation analysis (GCCA), principal component analysis (PCA), factor score regression (FSR) using sum score, regression or Bartlett scores (including bias correction using Croon’s approach), as well as several tests and typical postestimation procedures (e.g., verify admissibility of the estimates, assess the model fit, test the model fit etc.).

Maintained by Florian Schuberth. Last updated 16 days ago.

1.1 match 28 stars 9.11 score 56 scripts 2 dependents

mbannert

timeseriesdb:A Time Series Database for Official Statistics with R and PostgreSQL

Archive and manage time series data from official statistics. The 'timeseriesdb' package was designed to manage a large catalog of time series from official statistics, which are typically published on a monthly, quarterly or yearly basis. Thus, timeseriesdb is optimized to handle updates caused by data revisions as well as elaborate, multilingual meta information.

Maintained by Matthias Bannert. Last updated 6 months ago.

1.5 match 24 stars 6.89 score 26 scripts

josesamos

starschemar:Obtaining Stars from Flat Tables

Data in multidimensional systems is obtained from operational systems and is transformed to adapt it to the new structure. Frequently, the operations to be performed aim to transform a flat table into a star schema. Transformations can be carried out using professional extract, transform and load tools or tools intended for data transformation for end users. With the tools mentioned, this transformation can be carried out, but it requires a lot of work. The main objective of this package is to define transformations that allow obtaining stars from flat tables easily. In addition, it includes basic data cleaning, dimension enrichment, incremental data refresh and query operations, adapted to this context.

Maintained by Jose Samos. Last updated 11 months ago.

1.8 match 7 stars 5.66 score 11 scripts 2 dependents

emitanaka

edibble:Encapsulating Elements of Experimental Design

A system to facilitate designing comparative (and non-comparative) experiments using the grammar of experimental designs <https://emitanaka.org/edibble-book/>. An experimental design is treated as an intermediate, mutable object that is built progressively by fundamental experimental components like units, treatments, and their relation. The system aids in experimental planning, management and workflow.

Maintained by Emi Tanaka. Last updated 4 months ago.

experimental-designs

1.3 match 217 stars 7.43 score 62 scripts

samlestrade2

MoLE:Modeling Language Evolution

Model for simulating language evolution in terms of cultural evolution (Smith & Kirby (2008) <DOI:10.1098/rstb.2008.0145>; Deacon 1997). The focus is on the emergence of argument-marking systems (Dowty (1991) <DOI:10.1353/lan.1991.0021>, Van Valin 1999, Dryer 2002, Lestrade 2015a), i.e. noun marking (Aristar (1997) <DOI:10.1075/sl.21.2.04ari>, Lestrade (2010) <DOI:10.7282/T3ZG6R4S>), person indexing (Ariel 1999, Dahl (2000) <DOI:10.1075/fol.7.1.03dah>, Bhat 2004), and word order (Dryer 2013), but extensions are foreseen. Agents start out with a protolanguage (a language without grammar; Bickerton (1981) <DOI:10.17169/langsci.b91.109>, Jackendoff 2002, Arbib (2015) <DOI:10.1002/9781118346136.ch27>) and interact through language games (Steels 1997). Over time, grammatical constructions emerge that may or may not become obligatory (for which the tolerance principle is assumed; Yang 2016). Throughout the simulation, uniformitarianism of principles is assumed (Hopper (1987) <DOI:10.3765/bls.v13i0.1834>, Givon (1995) <DOI:10.1075/z.74>, Croft (2000), Saffran (2001) <DOI:10.1111/1467-8721.01243>, Heine & Kuteva 2007), maximal psychological validity is aimed at (Grice (1975) <DOI:10.1057/9780230005853_5>, Levelt 1989, Gaerdenfors 2000), and language representation is usage based (Tomasello 2003, Bybee 2010). In Lestrade (2015b) <DOI:10.15496/publikation-8640>, Lestrade (2015c) <DOI:10.1075/avt.32.08les>, and Lestrade (2016) <DOI:10.17617/2.2248195>, which reported on the results of preliminary versions, this package was announced as WDWTW (for "who does what to whom"), but the title was changed for reasons of pronunciation and generalization.

Maintained by Sander Lestrade. Last updated 7 years ago.

4.0 match 2.46 score 58 scripts

ropensci

tic:Tasks Integrating Continuously: CI-Agnostic Workflow Definitions

Provides a way to describe common build and deployment workflows for R-based projects: packages, websites (e.g. blogdown, pkgdown), or data processing (e.g. research compendia). The recipe is described independently of the continuous integration tool used for processing the workflow (e.g. 'GitHub Actions' or 'Circle CI'). This package has been peer-reviewed by rOpenSci (v0.3.0.9004).

Maintained by Eli Miller. Last updated 1 month ago.

appveyor continuous-integration deployment githubactions travis-ci

1.3 match 155 stars 7.57 score 16 scripts

mbojan

isnar:Introduction to Social Network Analysis with R

Functions and datasets accompanying the workshop "Introduction to Social Network Analysis with R" on annual INSNA Sunbelt conferences.

Maintained by Michal Bojanowski. Last updated 4 years ago.

3.3 match 8 stars 2.86 score 18 scripts

cloudyr

aws.iam:AWS IAM Client Package

A simple client for the Amazon Web Services ('AWS') Identity and Access Management ('IAM') 'API' <https://aws.amazon.com/iam/>.

Maintained by Simon Urbanek. Last updated 5 years ago.

aws aws-iam cloudyr iam

2.0 match 15 stars 4.65 score 10 scripts

bioc

GenomicRanges:Representation and manipulation of genomic intervals

The ability to efficiently represent and manipulate genomic annotations and alignments is playing a central role when it comes to analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.

Maintained by Hervé Pagès. Last updated 4 months ago.

genetics infrastructure datarepresentation sequencing annotation genomeannotation coverage bioconductor-package core-package

0.5 match 44 stars 17.75 score 13k scripts 1.3k dependents

ropensci

CRediTas:Generate CRediT Author Statements

A tiny package to generate CRediT author statements (<https://credit.niso.org/>). It provides three functions: create a template, read it back and generate the CRediT author statement in a text file.

Maintained by Josep Pueyo-Ros. Last updated 2 years ago.

1.9 match 8 stars 4.75 score 14 scripts

jiscah

sequoia:Pedigree Inference from SNPs

Multi-generational pedigree inference from incomplete data on hundreds of SNPs, including parentage assignment and sibship clustering. See Huisman (2017) (<DOI:10.1111/1755-0998.12665>) for more information.

Maintained by Jisca Huisman. Last updated 9 months ago.

pedigree pedigree-reconstruction pedigrees sequoia snp snp-data fortran

1.2 match 26 stars 7.40 score 79 scripts

jimbrig

lossrx:Actuarial Loss Development and Reserving with R

Actuarial Loss Development and Reserving Helper Functions and ShinyApp.

Maintained by Jimmy Briggs. Last updated 3 months ago.

actuarial-science claims-data claims-reserving data-science insurance modelling property-casualty reserving rshiny workflow

1.5 match 14 stars 5.89 score 7 scripts

jglev

veccompare:Perform Set Operations on Vectors, Automatically Generating All n-Wise Comparisons, and Create Markdown Output

Automates set operations (i.e., comparisons of overlap) between multiple vectors. It also contains a function for automating reporting in 'RMarkdown', by generating markdown output for easy analysis, as well as an 'RMarkdown' template for use with 'RStudio'.

Maintained by Jacob Gerard Levernier. Last updated 8 years ago.

2.4 match 8 stars 3.60 score 10 scripts

kzavez

LearnVizLMM:Learning and Communicating Linear Mixed Models Without Data

Summarizes characteristics of linear mixed effects models without data or a fitted model by converting code for fitting lmer() from 'lme4' and lme() from 'nlme' into tables, equations, and visuals. Outputs can be used to learn how to fit linear mixed effects models in 'R' and to communicate about these models in presentations, manuscripts, and analysis plans.

Maintained by Katherine Zavez. Last updated 5 months ago.

2.3 match 3.70 score 2 scripts

drg-123

IIS:Datasets to Accompany Wolfe and Schneider - Intuitive Introductory Statistics

These datasets and functions accompany Wolfe and Schneider (2017) - Intuitive Introductory Statistics (ISBN: 978-3-319-56070-0) <doi:10.1007/978-3-319-56072-4>. They are used in the examples throughout the text and in the end-of-chapter exercises. The datasets are meant to cover a broad range of topics in order to appeal to the diverse set of interests and backgrounds typically present in an introductory Statistics class.

Maintained by Grant Schneider. Last updated 1 month ago.

4.5 match 1.74 score 55 scripts

cran

datarobot:'DataRobot' Predictive Modeling API

For working with the 'DataRobot' predictive modeling platform's API <https://www.datarobot.com/>.

Maintained by AJ Alon. Last updated 1 year ago.

2.3 match 2 stars 3.48 score

bioc

awst:Asymmetric Within-Sample Transformation

We propose an Asymmetric Within-Sample Transformation (AWST) to regularize RNA-seq read counts and reduce the effect of noise on the classification of samples. AWST comprises two main steps: standardization and smoothing. These steps transform gene expression data to reduce the noise of the lowly expressed features, which suffer from background effects and low signal-to-noise ratio, and the influence of the highly expressed features, which may be the result of amplification bias and other experimental artifacts.

Maintained by Davide Risso. Last updated 5 months ago.

normalization geneexpression rnaseq software transcriptomics sequencing singlecell

1.5 match 3 stars 4.95 score 15 scripts

gi0na

ghypernet:Fit and Simulate Generalised Hypergeometric Ensembles of Graphs

Provides functions for model fitting and selection of generalised hypergeometric ensembles of random graphs (gHypEG). To learn how to use it, check the vignettes for a quick tutorial. Please reference its use as Casiraghi, G., Nanumyan, V. (2019) <doi:10.5281/zenodo.2555300> together with the relevant references from the list below. The package is based on the research developed at the Chair of Systems Design, ETH Zurich. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2016) <arXiv:1607.02441>. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2017) <doi:10.1007/978-3-319-67256-4_11>. Casiraghi, G. (2017) <arXiv:1702.02048>. Brandenberger, L., Casiraghi, G., Nanumyan, V., Schweitzer, F. (2019) <doi:10.1145/3341161.3342926>. Casiraghi, G. (2019) <doi:10.1007/s41109-019-0241-1>. Casiraghi, G., Nanumyan, V. (2021) <doi:10.1038/s41598-021-92519-y>. Casiraghi, G. (2021) <doi:10.1088/2632-072X/ac0493>.

Maintained by Giona Casiraghi. Last updated 11 months ago.

data-mining data-science graphs network network-analysis random-graph-generation random-graphs

1.3 match 8 stars 5.68 score 20 scripts

karlines

NetIndices:Estimating Network Indices, Including Trophic Structure of Foodwebs in R

Given a network (e.g. a food web), estimates several network indices. These include: Ascendency network indices, Direct and indirect dependencies, Effective measures, Environ network indices, General network indices, Pathway analysis, Network uncertainty indices and constraint efficiencies and the trophic level and omnivory indices of food webs.

Maintained by Karline Soetaert. Last updated 3 years ago.

1.7 match 3.91 score 134 scripts 2 dependents

bioc

Moonlight2R:Identify oncogenes and tumor suppressor genes from omics data

The understanding of cancer mechanisms requires the identification of genes playing a role in the development of the pathology and the characterization of their role (notably oncogenes and tumor suppressors). We present an updated version of the R/Bioconductor package MoonlightR, namely Moonlight2R, which returns a list of candidate driver genes for specific cancer types on the basis of omics data integration. The Moonlight framework contains a primary layer where gene expression data and information about biological processes are integrated to predict genes called oncogenic mediators, divided into putative tumor suppressors and putative oncogenes. This is done through functional enrichment analyses, gene regulatory networks and upstream regulator analyses to score the importance of well-known biological processes with respect to the studied cancer type. By evaluating the effect of the oncogenic mediators on biological processes or through random forests, the primary layer predicts two putative roles for the oncogenic mediators: i) tumor suppressor genes (TSGs) and ii) oncogenes (OCGs). As gene expression data alone is not enough to explain the deregulation of the genes, a second layer of evidence is needed. We have automated the integration of a secondary mutational layer through new functionalities in Moonlight2R. These functionalities analyze mutations in the cancer cohort and classify them into driver and passenger mutations using the driver mutation prediction tool CScape-somatic. Those oncogenic mediators with at least one driver mutation are retained as the driver genes. As a consequence, this methodology not only identifies genes playing a dual role (e.g. TSG in one cancer type and OCG in another) but also helps in elucidating the biological processes underlying their specific roles. In particular, Moonlight2R can be used to discover OCGs and TSGs in the same cancer type.
This may, for instance, help answer the question of whether some genes change role between early stages (I, II) and late stages (III, IV). In the future, this analysis could be useful to determine the causes of different resistances to chemotherapeutic treatments. An additional mechanistic layer evaluates whether there are mutations affecting the protein stability of the transcription factors (TFs) of the TSGs and OCGs, as these may have an effect on the expression of the genes.

Maintained by Matteo Tiberti. Last updated 2 months ago.

dnamethylation differentialmethylation generegulation geneexpression methylationarray differentialexpression pathways network survival genesetenrichment networkenrichment

1.0 match 5 stars 6.59 score 43 scripts

openvolley

ovlytics:Functions and Algorithms for Volleyball Analytics

Analytical functions for volleyball analytics, to be used in conjunction with the datavolley and peranavolley packages.

Maintained by Ben Raymond. Last updated 3 months ago.

2.0 match 3.13 score 9 scripts 3 dependents

cran

airGRteaching:Teaching Hydrological Modelling with the GR Rainfall-Runoff Models ('Shiny' Interface Included)

Add-on package to the 'airGR' package that simplifies its use and is aimed at teaching hydrology. The package provides 1) three functions that make it very simple to complete a hydrological modelling exercise, 2) plotting functions to help students explore observed data and interpret the results of calibration and simulation of the GR ('Génie rural') models, and 3) a 'Shiny' graphical interface that displays the impact of model parameters on hydrographs and on the models' internal variables.

Maintained by Olivier Delaigue. Last updated 1 month ago.

1.3 match 6 stars 4.82 score

ninohardt

echoice2:Choice Models with Economic Foundation

Implements choice models based on economic theory, including estimation using Markov chain Monte Carlo (MCMC), prediction, and more. Its usability is inspired by ideas from 'tidyverse'. Models include versions of the Hierarchical Multinomial Logit and Multiple Discrete-Continuous (Volumetric) models with and without screening. The foundations of these models are described in Allenby, Hardt and Rossi (2019) <doi:10.1016/bs.hem.2019.04.002>. Models with conjunctive screening are described in Kim, Hardt, Kim and Allenby (2022) <doi:10.1016/j.ijresmar.2022.04.001>. Models with set-size variation are described in Hardt and Kurz (2020) <doi:10.2139/ssrn.3418383>.

Maintained by Nino Hardt. Last updated 1 year ago.

choice-models openblas cpp openmp

1.5 match 1 star 4.00 score 7 scripts

ifpri

ARIA:App for IMPACT (App foR ImpAct)

App for IMPACT (App foR ImpAct).

Maintained by Abhijeet Mishra. Last updated 10 months ago.

2.2 match 2.70 score 6 scripts

ncss-tech

aqp:Algorithms for Quantitative Pedology

The Algorithms for Quantitative Pedology (AQP) project was started in 2009 to organize a loosely-related set of concepts and source code on the topic of soil profile visualization, aggregation, and classification into this package (aqp). Over the past 8 years, the project has grown into a suite of related R packages that enhance and simplify the quantitative analysis of soil profile data. Central to the AQP project is a new vocabulary of specialized functions and data structures that can accommodate the inherent complexity of soil profile information, freeing the scientist to focus on ideas rather than boilerplate data processing tasks <doi:10.1016/j.cageo.2012.10.020>. These functions and data structures have been extensively tested and documented, applied to projects involving hundreds of thousands of soil profiles, and deeply integrated into widely used tools such as SoilWeb <https://casoilresource.lawr.ucdavis.edu/soilweb-apps>. Components of the AQP project (aqp, soilDB, sharpshootR, soilReports packages) serve an important role in routine data analysis within the USDA-NRCS Soil Science Division. The AQP suite of R packages offers a convenient platform for bridging the gap between pedometric theory and practice.

Maintained by Dylan Beaudette. Last updated 28 days ago.

digital-soil-mapping ncss-tech nrcs pedology pedometrics soil soil-survey usda

0.5 match 55 stars 11.77 score 1.2k scripts 2 dependents

dyfanjones

smdocker:Build 'Docker Images' in 'Amazon SageMaker Studio' using 'Amazon Web Service CodeBuild'

Allows users to easily build custom 'docker images' <https://docs.docker.com/> from 'Amazon Web Service Sagemaker' <https://aws.amazon.com/sagemaker/> using 'Amazon Web Service CodeBuild' <https://aws.amazon.com/codebuild/>.

Maintained by Dyfan Jones. Last updated 2 years ago.

1.7 match 5 stars 3.40 score 4 scripts

psolymos

clickrup:Interacting with the ClickUp v2 API from R

Work with the ClickUp productivity app from R to manage tasks, goals, time tracking, and more.

Maintained by Peter Solymos. Last updated 1 year ago.

api clickup clickup-api project-management

1.8 match 18 stars 3.26 score 7 scripts

jhmaindonald

hddplot:Use Known Groups in High-Dimensional Data to Derive Scores for Plots

Cross-validated linear discriminant calculations determine the optimum number of features. Test and training scores from successive cross-validation steps determine, via a principal components calculation, a low-dimensional global space onto which test scores are projected, in order to plot them. Further functions are included that are intended for didactic use. The package implements, and extends, methods described in J.H. Maindonald and C.J. Burden (2005) <https://journal.austms.org.au/V46/CTAC2004/Main/home.html>.

Maintained by John Maindonald. Last updated 2 years ago.

1.7 match 3.00 score 10 scripts

dyfanjones

sagemaker.core:Sagemaker core classes, methods and functions

Contains core classes, methods and functions that support `AWS Sagemaker R Software Development Kit (SDK)`.

Maintained by Dyfan Jones. Last updated 3 years ago.

amazon-sagemaker aws machine-learning sagemaker sdk

1.7 match 2.88 score 1 script 5 dependents

tobiasschoch

wbacon:Weighted BACON Algorithms

The BACON algorithms are methods for multivariate outlier nomination (detection) and robust linear regression by Billor, Hadi, and Velleman (2000) <doi:10.1016/S0167-9473(99)00101-2>. The extension to weighted problems is due to Beguin and Hulliger (2008) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X200800110616>; see also <doi:10.21105/joss.03238>.

Maintained by Tobias Schoch. Last updated 6 months ago.

outlier outlier-detection robust-regression statistics openblas openmp

1.2 match 2 stars 4.00 score 8 scripts

bioc

midasHLA:R package for immunogenomics data handling and association analysis

MiDAS is an R package for immunogenetics data transformation and statistical analysis. MiDAS accepts input data in the form of HLA alleles and KIR types, and can transform it into biologically meaningful variables, enabling HLA amino acid fine mapping, analyses of HLA evolutionary divergence, KIR gene presence, as well as validated HLA-KIR interactions. Further, it allows comprehensive statistical association analysis workflows with phenotypes of diverse measurement scales. MiDAS closes a gap between the inference of immunogenetic variation and its efficient utilization to make relevant discoveries related to T cell, Natural Killer cell, and disease biology.

Maintained by Maciej Migdał. Last updated 5 months ago.

cellbiology genetics statisticalmethod

1.1 match 4.30 score 3 scripts

inqs909

csucistats:CSU Channel Islands R Tools

An R package containing functions for statistics courses at CSUCI.

Maintained by Isaac Quintanilla Salinas. Last updated 2 months ago.

1.6 match 2.99 score 14 scripts

johannes-titz

passt:Probability Associator Time (PASS-T)

Simulates judgments of frequency and duration based on the Probability Associator Time (PASS-T) model. PASS-T is a memory model based on a simple competitive artificial neural network. It can imitate human judgments of frequency and duration, which have been extensively studied in cognitive psychology (e.g. Hintzman (1970) <doi:10.1037/h0028865>, Betsch et al. (2010) <https://psycnet.apa.org/record/2010-18204-003>). The PASS-T model is an extension of the PASS model (Sedlmeier, 2002, ISBN:0198508638). The package provides an easy way to run simulations, which can then be compared with empirical data in human judgments of frequency and duration.

Maintained by Johannes Titz. Last updated 4 years ago.

1.3 match 3.70 score 3 scripts

abdisalammuse

AHSurv:Flexible Parametric Accelerated Hazards Models

Flexible parametric Accelerated Hazards (AH) regression models in overall and relative survival frameworks with 13 distinct Baseline Distributions. The AH Model can also be applied to lifetime data with crossed survival curves. Any user-defined parametric distribution can be fitted, given at least an R function defining the cumulative hazard and hazard rate functions. See Chen and Wang (2000) <doi:10.1080/01621459.2000.10474236>, and Lee (2015) <doi:10.1007/s10985-015-9349-5> for more details.

Maintained by Abdisalam Hassan Muse. Last updated 3 years ago.

3.1 match 1.48 score 1 dependent

nhs-r-community

NHSRtools:NHS-R Tools

Provides tools commonly used by Analysts within the NHS.

Maintained by Tom Jemmett. Last updated 4 years ago.

2.3 match 2 stars 2.00 score 1 script

ropengov

mpg:FuelEconomy.gov Data

Extract fuel economy data from FuelEconomy.gov.

Maintained by Thomas J. Leeper. Last updated 3 years ago.

ropengov

1.6 match 12 stars 2.78 score

erictleung

pyblack:Style Python code blocks with black

RStudio addin to help format and style Python code in RMarkdown and Quarto documents with the Python code formatter, black.

Maintained by Eric Leung. Last updated 9 months ago.

black formatting python rstudio rstudio-addin

1.6 match 8 stars 2.60 score

shah-in-boots

rmdl:A Causality-Informed Modeling Approach

A system for describing and manipulating the many models that are generated in causal inference and data analysis projects, as based on the causal theory and criteria of Austin Bradford Hill (1965) <doi:10.1177/003591576505800503>. This system includes the addition of formal attributes that modify base `R` objects, including terms and formulas, with a focus on variable roles in the "do-calculus" of modeling, as described in Pearl (2010) <doi:10.2202/1557-4679.1203>. For example, the definition of exposure, outcome, and interaction are implicit in the roles variables take in a formula. These premises allow for a more fluent modeling approach focusing on variable relationships, and assessing effect modification, as described by VanderWeele and Robins (2007) <doi:10.1097/EDE.0b013e318127181b>. The essential goal is to help contextualize formulas and models in causality-oriented workflows.

Maintained by Anish S. Shah. Last updated 10 months ago.

epidemiology modeling statistics

0.8 match 4.60 score 7 scripts

lcrawlab

mvMAPIT:Multivariate Genome Wide Marginal Epistasis Test

Epistasis, commonly defined as the interaction between genetic loci, is known to play an important role in the phenotypic variation of complex traits. As a result, many statistical methods have been developed to identify genetic variants that are involved in epistasis, and nearly all of these approaches carry out this task by focusing on analyzing one trait at a time. Previous studies have shown that jointly modeling multiple phenotypes can often dramatically increase statistical power for association mapping. In this package, we present the 'multivariate MArginal ePIstasis Test' ('mvMAPIT') – a multi-outcome generalization of a recently proposed epistatic detection method which seeks to detect marginal epistasis or the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact – thus, potentially alleviating much of the statistical and computational burden associated with conventional explicit search based methods. Our proposed 'mvMAPIT' builds upon this strategy by taking advantage of correlation structure between traits to improve the identification of variants involved in epistasis. We formulate 'mvMAPIT' as a multivariate linear mixed model and develop a multi-trait variance component estimation algorithm for efficient parameter inference and P-value computation. Together with reasonable model approximations, our proposed approach is scalable to moderately sized genome-wide association studies. Crawford et al. (2017) <doi:10.1371/journal.pgen.1006869>. Stamp et al. (2023) <doi:10.1093/g3journal/jkad118>.

Maintained by Julian Stamp. Last updated 5 months ago.

cpp epistasis epistasis-analysis gwas gwas-tools linear-mixed-models mapit mvmapit variance-components openblas cpp openmp

0.5 match 11 stars 6.90 score 17 scripts 1 dependent

laxchan

SparseMSE:'Multiple Systems Estimation for Sparse Capture Data'

Implements the routines and algorithms developed and analysed in "Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists" Chan, L, Silverman, B. W., Vincent, K (2019) <https://www.tandfonline.com/doi/full/10.1080/01621459.2019.1708748> and in "Bootstrapping multiple systems estimates to account for model selection" Silverman, B. W., Chan, L, Vincent, K (2023) <https://doi.org/10.1007/s11222-023-10346-9>. This package explicitly handles situations where there are pairs of lists which have no observed individuals in common. It deals correctly with parameters whose estimated values can be considered as being negative infinity. It also addresses other possible issues of non-existence and non-identifiability of maximum likelihood estimates.

Maintained by Lax Chan. Last updated 1 year ago.

1.7 match 2.00 score 7 scripts

bioc

gDRstyle:A package with style requirements for the gDR suite

The package serves as a helper package for the whole gDR suite. It supports good development practices by keeping style requirements and style tests for other packages. It also contains build helpers to ensure that all package requirements are met.

Maintained by Arkadiusz Gladki. Last updated 1 month ago.

software infrastructure

0.5 match 2 stars 6.10 score 2 scripts

bioc

scMET:Bayesian modelling of cell-to-cell DNA methylation heterogeneity

High-throughput single-cell measurements of DNA methylomes can quantify methylation heterogeneity and uncover its role in gene regulation. However, technical limitations and sparse coverage can preclude this task. scMET is a hierarchical Bayesian model which overcomes sparsity, sharing information across cells and genomic features to robustly quantify genuine biological heterogeneity. scMET can identify highly variable features that drive epigenetic heterogeneity, and perform differential methylation and variability analyses. We illustrate how scMET facilitates the characterization of epigenetically distinct cell populations and how it enables the formulation of novel hypotheses on the epigenetic regulation of gene expression.

Maintained by Andreas C. Kapourani. Last updated 5 months ago.

immunooncology dnamethylation differentialmethylation differentialexpression geneexpression generegulation epigenetics genetics clustering featureextraction regression bayesian sequencing coverage singlecell bayesian-inference generalised-linear-models heterogeneity hierarchical-models methylation-analysis single-cell cpp

0.5 match 20 stars 6.23 score 42 scripts

rsetienne

secsse:Several Examined and Concealed States-Dependent Speciation and Extinction

Simultaneously infers state-dependent diversification across two or more states of a single or multiple traits while accounting for the role of a possible concealed trait. See Herrera-Alsina et al. (2019) <doi:10.1093/sysbio/syy057>.

Maintained by Rampal S. Etienne. Last updated 11 months ago.

cpp

0.5 match 1 star 5.83 score 34 scripts

cran

latcontrol:Evaluation of the Role of Control Variables in Structural Equation Models

Various opportunities to evaluate the effects of including one or more control variable(s) in structural equation models on model-implied variances, covariances, and parameter estimates. The derivation of the methodology employed in this package can be obtained from Blötner (2023) <doi:10.31234/osf.io/dy79z>.

Maintained by Christian Blötner. Last updated 9 months ago.

2.9 match 1.00 score

bioc

PICB:piRNA Cluster Builder

piRNAs (short for PIWI-interacting RNAs) and their PIWI protein partners play a key role in fertility and maintaining genome integrity by restricting mobile genetic elements (transposons) in germ cells. piRNAs originate from genomic regions known as piRNA clusters. The piRNA Cluster Builder (PICB) is a versatile toolkit designed to identify genomic regions with a high density of piRNAs. It constructs piRNA clusters through a stepwise integration of unique and multimapping piRNAs and offers wide-ranging parameter settings, supported by an optimization function that allows users to test different parameter combinations to tailor the analysis to their specific piRNA system. The output includes extensive metadata columns, enabling researchers to rank clusters and extract cluster characteristics.

Maintained by Franziska Ahrend. Last updated 1 month ago.

genetics genomeannotation sequencing functionalprediction coverage transcriptomics

0.5 match 5 stars 5.57 score

cran

vocaldia:Create and Manipulate Vocalisation Diagrams

Create adjacency matrices of vocalisation graphs from dataframes containing sequences of speech and silence intervals, transform these matrices into Markov diagrams, and generate datasets for classification of these diagrams by 'flattening' them and adding global properties (functionals), etc. Vocalisation diagrams date back to early work in psychiatry (Jaffe and Feldstein, 1970) and social psychology (Dabbs and Ruback, 1987) but have only recently been employed as a data representation method for machine learning tasks including meeting segmentation (Luz, 2012) <doi:10.1145/2328967.2328970> and classification (Luz, 2013) <doi:10.1145/2522848.2533788>.

Maintained by Saturnino Luz. Last updated 3 years ago.

1.7 match 1.70 score
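The vocaldia description turns interval sequences into adjacency matrices and then Markov diagrams. The sketch below illustrates that idea only; it is not the package's R API. It assumes the dataframe has already been reduced to an ordered list of interval labels (speaker IDs plus a silence label), and the helper names are hypothetical.

```python
def transition_counts(labels):
    """Adjacency matrix (as nested dicts) counting transitions between consecutive intervals."""
    states = sorted(set(labels))
    counts = {a: {b: 0 for b in states} for a in states}
    for prev, nxt in zip(labels, labels[1:]):
        counts[prev][nxt] += 1
    return states, counts


def to_markov(counts):
    """Row-normalise counts into transition probabilities (rows with no exits stay all zero)."""
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: (c / total if total else 0.0) for b, c in row.items()}
    return probs


# Hypothetical example: speakers A and B, with silence intervals labelled "Sil".
seq = ["A", "Sil", "B", "Sil", "A", "A", "Sil", "B"]
states, counts = transition_counts(seq)
probs = to_markov(counts)
```

Here speaker A moves to silence twice and to itself once, so the sketch gives A a 2/3 probability of being followed by silence. Flattening such a matrix row by row yields a fixed-length feature vector of the kind the description mentions.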

amoeba

pkgsci:Does science on R packages

Various utility functions for analyzing coding practices in R packages.

Maintained by Bryce Mecum. Last updated 6 years ago.

1.6 match 1.70 score 1 script

bioc

GRaNIE:GRaNIE: Reconstruction of cell-type-specific gene regulatory networks including enhancers using single-cell or bulk chromatin accessibility and RNA-seq data

Genetic variants associated with diseases often affect non-coding regions, thus likely having a regulatory role. To understand the effects of genetic variants in these regulatory regions, identifying genes that are modulated by specific regulatory elements (REs) is crucial. The effect of gene regulatory elements, such as enhancers, is often cell-type specific, likely because the combinations of transcription factors (TFs) that are regulating a given enhancer have cell-type specific activity. This TF activity can be quantified with existing tools such as diffTF and captures differences in binding of a TF in open chromatin regions. Collectively, this forms a gene regulatory network (GRN) with cell-type and data-specific TF-RE and RE-gene links. Here, we reconstruct such a GRN using single-cell or bulk RNAseq and open chromatin (e.g., using ATACseq or ChIPseq for open chromatin marks) and optionally (Capture) Hi-C data. Our network contains different types of links, connecting TFs to regulatory elements, the latter of which are connected to genes in the vicinity or within the same chromatin domain (TAD). We use a statistical framework to assign empirical FDRs and weights to all links using a permutation-based approach.

Maintained by Christian Arnold. Last updated 5 months ago.

software geneexpression generegulation networkinference genesetenrichment biomedicalinformatics genetics transcriptomics atacseq rnaseq graphandnetwork regression transcription chipseq

0.5 match 5.40 score 24 scripts

cran

DevTreatRules:Develop Treatment Rules with Observational Data

Develop and evaluate treatment rules based on: (1) the standard indirect approach of split-regression, which fits regressions separately in both treatment groups and assigns an individual to the treatment option under which predicted outcome is more desirable; (2) the direct approach of outcome-weighted-learning proposed by Yingqi Zhao, Donglin Zeng, A. John Rush, and Michael Kosorok (2012) <doi:10.1080/01621459.2012.695674>; (3) the direct approach, which we refer to as direct-interactions, proposed by Shuai Chen, Lu Tian, Tianxi Cai, and Menggang Yu (2017) <doi:10.1111/biom.12676>. Please see the vignette for a walk-through of how to start with an observational dataset whose design is understood scientifically and end up with a treatment rule that is trustworthy statistically, along with an estimation of rule benefit in an independent sample.

Maintained by Jeremy Roth. Last updated 5 years ago.

1.3 match 2.00 score

hezibu

alien:Estimate Invasive and Alien Species (IAS) Introduction Rates

Easily estimate the introduction rates of alien species given first records data. It specializes in addressing the role of sampling on the pattern of discoveries, thus providing better estimates than using Generalized Linear Models which assume perfect immediate detection of newly introduced species.

Maintained by Yehezkel Buba. Last updated 9 months ago.

0.5 match 1 star 5.08 score 10 scripts

briandk

granovaGG:Graphical Analysis of Variance Using ggplot2

Create what we call Elemental Graphics for display of anova results. The term elemental derives from the fact that each function is aimed at construction of graphical displays that afford direct visualizations of data with respect to the fundamental questions that drive the particular anova methods. This package represents a modification of the original granova package; the key change is to use 'ggplot2', Hadley Wickham's package based on Grammar of Graphics concepts (due to Wilkinson). The main function is granovagg.1w() (a graphic for one way ANOVA); two other functions (granovagg.ds() and granovagg.contr()) are to construct graphics for dependent sample analyses and contrast-based analyses respectively. (The function granova.2w(), which entails dynamic displays of data, is not currently part of 'granovaGG'.) The 'granovaGG' functions are to display data for any number of groups, regardless of their sizes (however, very large data sets or numbers of groups can be problematic). For granovagg.1w() a specialized approach is used to construct data-based contrast vectors for which anova data are displayed. The result is that the graphics use a straight line to facilitate clear interpretations while being faithful to the standard effect test in anova. The graphic results are complementary to standard summary tables; indeed, numerical summary statistics are provided as side effects of the graphic constructions. granovagg.ds() and granovagg.contr() provide graphic displays and numerical outputs for dependent-sample and contrast-based analyses. The graphics based on these functions can be especially helpful for learning how the respective methods work to answer the basic question(s) that drive the analyses. This means they can be particularly helpful for students and non-statistician analysts. 
But these methods can be of assistance for work-a-day applications of many kinds, as they can help to identify outliers, clusters or patterns, as well as highlight the role of non-linear transformations of data. In the case of granovagg.1w() and granovagg.ds() several arguments are provided to facilitate flexibility in the construction of graphics that accommodate diverse features of data, according to their corresponding display requirements. See the help files for individual functions.

Maintained by Brian A. Danielak. Last updated 1 year ago.

0.5 match 15 stars 4.90 score 35 scripts

mbq

vistla:Detecting Influence Paths with Information Theory

Traces information spread through interactions between features, utilising information theory measures and a higher-order generalisation of the concept of widest paths in graphs. In particular, 'vistla' can be used to better understand the results of high-throughput biomedical experiments, by organising the effects of the investigated intervention in a tree-like hierarchy from direct to indirect ones, following the plausible information relay circuits. Due to its higher-order nature, 'vistla' can handle multi-modality and assign multiple roles to a single feature.

Maintained by Miron B. Kursa. Last updated 24 days ago.

openmp

0.5 match 4.78 score 3 scripts

thiyangt

DSjobtracker:What Skills and Qualifications are Required for Data Science Related Jobs?

Dataset containing information about job listings for data science job roles.

Maintained by Thiyanga S. Talagala. Last updated 1 year ago.

dataset qualifications skills statistics tidy

0.6 match 3 stars 4.29 score 13 scripts

stscl

sshicm:Information Consistency-Based Measures for Spatial Stratified Heterogeneity

Spatial stratified heterogeneity (SSH) denotes the coexistence of within-strata homogeneity and between-strata heterogeneity. Information consistency-based methods provide a rigorous approach to quantify SSH and evaluate its role in spatial processes, grounded in principles of geographical stratification and information theory (Bai, H. et al. (2023) <doi:10.1080/24694452.2023.2223700>; Wang, J. et al. (2024) <doi:10.1080/24694452.2023.2289982>).

Maintained by Wenbo Lv. Last updated 3 months ago.

geoinformatics geospatial-analysis information-theory spatial-statistics spatial-stratified-heterogeneity cpp

0.5 match 3 stars 4.65 score 2 scripts

bioc

NoRCE:NoRCE: Noncoding RNA Sets Cis Annotation and Enrichment

While some non-coding RNAs (ncRNAs) are assigned critical regulatory roles, most remain functionally uncharacterized. This presents a challenge whenever an interesting set of ncRNAs needs to be analyzed in a functional context. Transcripts located close-by on the genome are often regulated together. This genomic proximity on the sequence can hint at a functional association. We present a tool, NoRCE, that performs cis enrichment analysis for a given set of ncRNAs. Enrichment is carried out using the functional annotations of the coding genes located proximal to the input ncRNAs. Other biologically relevant information such as topologically associating domain (TAD) boundaries, co-expression patterns, and miRNA target prediction information can be incorporated to conduct a richer enrichment analysis. To this end, NoRCE includes several relevant datasets as part of its data repository, including cell-line specific TAD boundaries, functional gene sets, and expression data for coding & ncRNAs specific to cancer. Additionally, the users can utilize custom data files in their investigation. Enrichment results can be retrieved in a tabular format or visualized in several different ways. NoRCE is currently available for the following species: human, mouse, rat, zebrafish, fruit fly, worm, and yeast.

Maintained by Gulden Olgun. Last updated 5 months ago.

biologicalquestion differentialexpression genomeannotation genesetenrichment genetarget genomeassembly go

0.5 match 1 star 4.60 score 6 scripts

josesamos

geomultistar:Multidimensional Queries Enriched with Geographic Data

Multidimensional systems allow complex queries to be carried out in an easy way. The geographical dimension, together with the temporal dimension, plays a fundamental role in multidimensional systems. Through this package, vector geographic data layers can be associated to the attributes of geographic dimensions, so that the results of multidimensional queries can be obtained directly as vector layers. The multidimensional structures on which we can define the queries can be created from a flat table or imported directly using functions from this package.

Maintained by Jose Samos. Last updated 8 months ago.

0.5 match 2 stars 4.48 score 8 scripts 1 dependent

kisungyou

mclustcomp:Measures for Comparing Clusters

Given a set of data points, a clustering is defined as a disjoint partition, where no two sets in the partition share any elements. This package provides 25 methods that play a role similar to a distance or metric, measuring the similarity of two clusterings (partitions). For a more detailed description, see Meila, M. (2005) <doi:10.1145/1102351.1102424>.

Maintained by Kisung You. Last updated 2 years ago.

cpp

0.5 match 1 star 4.43 score 18 scripts 10 dependents
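To make "a measure of similarity between two clusterings" concrete, here is a minimal sketch of the Rand index, one classic pair-counting measure of this kind. It is written in Python for illustration and is not the mclustcomp interface; the function name is hypothetical.

```python
from itertools import combinations


def rand_index(x, y):
    """Fraction of point pairs on which two clusterings agree:
    both put the pair in the same cluster, or both separate it."""
    if len(x) != len(y):
        raise ValueError("both clusterings must label the same points")
    agree = 0
    total = 0
    for i, j in combinations(range(len(x)), 2):
        same_in_x = x[i] == x[j]
        same_in_y = y[i] == y[j]
        agree += same_in_x == same_in_y  # True counts as 1
        total += 1
    return agree / total if total else 1.0


# Two partitions of four points, given as cluster labels.
score = rand_index([1, 1, 2, 2], [1, 1, 1, 2])
```

Because only co-membership of pairs matters, the measure ignores the label values themselves: relabelling clusters 1 and 2 as 5 and 7 still gives a perfect score of 1.0. In the example above, three of the six pairs agree, so the score is 0.5.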

bioc

RLassoCox:A reweighted Lasso-Cox by integrating gene interaction information

RLassoCox is a package that implements the RLasso-Cox model proposed by Wei Liu. The RLasso-Cox model integrates gene interaction information into the Lasso-Cox model for accurate survival prediction and survival biomarker discovery. It is based on the hypothesis that topologically important genes in the gene interaction network tend to have stable expression changes. The RLasso-Cox model uses random walk to evaluate the topological weight of genes, and then highlights topologically important genes to improve the generalization ability of the Lasso-Cox model. The RLasso-Cox model has the advantage of identifying small gene sets with high prognostic performance on independent datasets, which may play an important role in identifying robust survival biomarkers for various cancer types.

Maintained by Wei Liu. Last updated 5 months ago.

survival regression geneexpression geneprediction network

0.5 match 3 stars 4.48 score 2 scripts

bioc

flowSpecs:Tools for processing of high-dimensional cytometry data

This package is intended to fill the role of conventional cytometry pre-processing software, for spectral decomposition, transformation, visualization and cleanup, and to aid further downstream analyses, such as with DepecheR, by enabling transformation of flowFrames and flowSets to dataframes. Functions for flowCore-compliant automatic 1D-gating/filtering are in the pipeline. The package name has been chosen both as it will deal with spectral cytometry and as it will hopefully give the user a nice pair of spectacles through which to view their data.

Maintained by Jakob Theorell. Last updated 5 months ago.

software cellbasedassays datarepresentation immunooncology flowcytometry singlecell visualization normalization dataimport

0.5 match 6 stars 4.38 score 7 scripts

bioc

transite:RNA-binding protein motif analysis

transite is a computational method that allows comprehensive analysis of the regulatory role of RNA-binding proteins in various cellular processes by leveraging preexisting gene expression data and current knowledge of binding preferences of RNA-binding proteins.

Maintained by Konstantin Krismer. Last updated 5 months ago.

geneexpression transcription differentialexpression microarray mrnamicroarray genetics genesetenrichment cpp

0.5 match 4.30 score 20 scripts

bbcrown

solrad:Calculating Solar Radiation and Related Variables Based on Location, Time and Topographical Conditions

For surface energy models and estimation of solar positions and components with varying topography, time and locations. The functions calculate solar top-of-atmosphere, open, diffuse and direct components, atmospheric transmittance and diffuse factors, day length, sunrise and sunset, solar azimuth, zenith, altitude, incidence, and hour angles, earth declination angle, equation of time, and solar constant. Details about the methods and equations are explained in Seyednasrollah, Bijan, Mukesh Kumar, and Timothy E. Link. 'On the role of vegetation density on net snow cover radiation at the forest floor.' Journal of Geophysical Research: Atmospheres 118.15 (2013): 8359-8374, <doi:10.1002/jgrd.50575>.

Maintained by Bijan Seyednasrollah. Last updated 6 years ago.

0.5 match 10 stars 4.32 score 42 scripts
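Two of the quantities listed for solrad, the declination angle and day length, can be illustrated with textbook relations. The sketch below is an assumption for illustration only: it uses Cooper's (1969) declination approximation and the standard sunset hour-angle formula, which may differ from the equations solrad actually implements (those follow Seyednasrollah et al., 2013). Function names are hypothetical, and refraction and topography are ignored.

```python
import math


def declination_deg(day_of_year):
    """Solar declination in degrees, Cooper's (1969) approximation."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def day_length_hours(latitude_deg, day_of_year):
    """Geometric day length from the sunset hour angle (flat horizon, no refraction)."""
    phi = math.radians(latitude_deg)
    delta = math.radians(declination_deg(day_of_year))
    x = -math.tan(phi) * math.tan(delta)
    # Clamp: x <= -1 means the sun never sets (polar day), x >= 1 means it never rises.
    x = max(-1.0, min(1.0, x))
    omega_s = math.degrees(math.acos(x))  # sunset hour angle in degrees
    return 2.0 * omega_s / 15.0  # the Earth rotates 15 degrees per hour
```

A quick sanity check is that the sketch returns 12 hours at the equator on every day, and a declination near +23.45 degrees around the June solstice (day 172).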

josesamos

geodimension:Definition of Geographic Dimensions

The geographic dimension plays a fundamental role in multidimensional systems. To define a geographic dimension in a star schema, we need a table with attributes corresponding to the levels of the dimension. Additionally, we will also need one or more geographic layers to represent the data using this dimension. The goal of this package is to support the definition of geographic dimensions from layers of geographic information related to each other. It makes it easy to define relationships between layers and obtain the necessary data from them.

Maintained by Jose Samos. Last updated 1 year ago.

0.5 match 2 stars 4.00 score 8 scripts

jefriedel

discAUC:Linear and Non-Linear AUC for Discounting Data

Area under the curve (AUC; Myerson et al., 2001) <doi:10.1901/jeab.2001.76-235> is a popular measure used in discounting research. Although the calculation of AUC is standardized, there are differences in AUC based on some assumptions. For example, Myerson et al. (2001) <doi:10.1901/jeab.2001.76-235> assumed that (with delay discounting data) a researcher would impute an indifference point at zero delay equal to the value of the larger, later outcome. However, this practice is not clearly followed. This imputed zero-delay indifference point plays an important role in log and ordinal versions of AUC. Ordinal and log versions of AUC are described by Borges et al. (2016) <doi:10.1002/jeab.219>. The package can calculate all three versions of AUC [and includes a new version: IHS(AUC)], impute indifference points when x = 0, calculate ordinal AUC in the case of Halton sampling of x-values, and account for probability discounting AUC.

Maintained by Jonathan E. Friedel. Last updated 5 months ago.

0.5 match 4.00 score 5 scripts
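The linear AUC that discAUC builds on (Myerson et al., 2001) normalizes delays by the longest delay and indifference points by the larger, later amount, then sums trapezoids. The sketch below shows that calculation, including the optional zero-delay imputation discussed above. It is an illustrative Python sketch, not discAUC's R interface; the function and argument names are hypothetical, and the log, ordinal, and IHS variants are not shown.

```python
def discounting_auc(delays, indiff, amount, impute_zero=True):
    """Normalised linear AUC for delay-discounting data (trapezoidal rule).

    delays      -- delays at which indifference points were measured
    indiff      -- indifference points, in the same units as `amount`
    amount      -- value of the larger, later outcome
    impute_zero -- if True, add an indifference point equal to `amount` at delay 0
    """
    pts = sorted(zip(delays, indiff))
    if impute_zero and pts[0][0] > 0:
        pts.insert(0, (0.0, amount))
    max_delay = pts[-1][0]
    xs = [d / max_delay for d, _ in pts]  # delays scaled to at most 1
    ys = [v / amount for _, v in pts]  # values scaled to the larger, later outcome
    area = 0.0
    for k in range(1, len(xs)):
        area += (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2.0
    return area


auc_imputed = discounting_auc([1, 7, 30], [90, 70, 40], 100)
auc_raw = discounting_auc([1, 7, 30], [90, 70, 40], 100, impute_zero=False)
```

With these illustrative numbers the imputed point adds the first trapezoid, so `auc_imputed` (about 0.613) is larger than `auc_raw` (about 0.582). This is the kind of assumption-driven difference the description highlights.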

bioc

DMCHMM:Differentially Methylated CpG using Hidden Markov Model

A pipeline for identifying differentially methylated CpG sites using Hidden Markov Model in bisulfite sequencing data. DNA methylation studies have enabled researchers to understand methylation patterns and their regulatory roles in biological processes and disease. However, only a limited number of statistical approaches have been developed to provide formal quantitative analysis. Specifically, a few available methods do identify differentially methylated CpG (DMC) sites or regions (DMR), but they suffer from limitations that arise mostly due to challenges inherent in bisulfite sequencing data. These challenges include: (1) that read-depths vary considerably among genomic positions and are often low; (2) both methylation and autocorrelation patterns change as regions change; and (3) CpG sites are distributed unevenly. Furthermore, there are several methodological limitations: almost none of these tools is capable of comparing multiple groups and/or working with missing values, and only a few allow continuous or multiple covariates. The last of these is of great interest among researchers, as the goal is often to find which regions of the genome are associated with several exposures and traits. To tackle these issues, we have developed an efficient DMC identification method based on Hidden Markov Models (HMMs) called “DMCHMM” which is a three-step approach (model selection, prediction, testing) aiming to address the aforementioned drawbacks.

Maintained by Farhad Shokoohi. Last updated 5 months ago.

differentialmethylation sequencing hiddenmarkovmodel coverage

0.5 match 3.78 score 3 scripts

augustobrusaca

KHQ:Methods for Calculating 'KHQ' Scores and 'KHQ5D' Utility Index Scores

The King's Health Questionnaire (KHQ) is a disease-specific, self-administered questionnaire designed specifically to assess the impact of Urinary Incontinence (UI) on Quality of Life. The questionnaire was developed by Kelleher and collaborators (1997) <doi:10.1111/j.1471-0528.1997.tb11006.x>. It is a simple, acceptable and reliable measure to use in the clinical setting and a research tool that is useful in evaluating UI treatment outcomes. The KHQ five dimensions (KHQ5D) is a condition-specific preference-based measure developed by Brazier and collaborators (2008) <doi:10.1177/0272989X07301820>. Although not as popular as the SF6D <doi:10.1016/S0895-4356(98)00103-6> and EQ-5D <https://euroqol.org/>, the KHQ5D measures health-related quality of life (HRQoL) specifically for UI, not general conditions like the other two instruments mentioned. The KHQ5D can be used in the clinical and economic evaluation of health care. The subject self-rates their health in terms of five dimensions: Role Limitation (RL), Physical Limitations (PL), Social Limitations (SL), Emotions (E), and Sleep (S). Frequently the states on these five dimensions are converted to a single utility index using country specific value sets, which can be used in the clinical and economic evaluation of health care as well as in population health surveys. This package provides methods to calculate scores for each dimension of the KHQ; converts KHQ item scores to KHQ5D scores; and also calculates the utility index of the KHQ5D.

Maintained by Luiz Augusto Brusaca. Last updated 4 years ago.

0.5 match 3.70 score 4 scripts

florian-laroumagne

previsionio:'Prevision.io' R SDK

For working with the 'Prevision.io' AI model management platform's API <https://prevision.io/>.

Maintained by Florian Laroumagne. Last updated 3 years ago.

1.8 match 1.00 score 1 script

kolassa-dev

PHInfiniteEstimates:Tools for Inference in the Presence of a Monotone Likelihood

Proportional hazards estimation in the presence of a partially monotone likelihood has difficulties, in that finite estimators do not exist. These difficulties are related to those arising from logistic and multinomial regression. References for methods are given in the separate function documents. Supported by grant NSF DMS 1712839.

Maintained by John E. Kolassa. Last updated 1 year ago.

1.6 match 1.00 score

cran

HDMT:A Multiple Testing Procedure for High-Dimensional Mediation Hypotheses

A multiple-testing procedure for high-dimensional mediation hypotheses. Mediation analysis is of rising interest in epidemiology and clinical trials. Among existing methods for mediation analyses, the popular joint significance (JS) test yields an overly conservative type I error rate and therefore low power. In the R package 'HDMT' we implement a multiple-testing procedure that accurately controls the family-wise error rate (FWER) and the false discovery rate (FDR) when using JS for testing high-dimensional mediation hypotheses. The core of our procedure is based on estimating the proportions of three component null hypotheses and deriving the corresponding mixture distribution of null p-values. Results of the data examples include better-behaved quantile-quantile plots and improved detection of novel mediation relationships on the role of DNA methylation in genetic regulation of gene expression. With increasing interest in mediation by molecular intermediaries such as gene expression, the proposed method addresses an unmet methodological challenge. Methods used in the package refer to James Y. Dai, Janet L. Stanford & Michael LeBlanc (2020) <doi:10.1080/01621459.2020.1765785>.

Maintained by James Dai. Last updated 3 years ago.

0.5 match 2.86 score 12 scripts 2 dependents

cran

NST:Normalized Stochasticity Ratio

To estimate ecological stochasticity in community assembly. Understanding the community assembly mechanisms controlling biodiversity patterns is a central issue in ecology. Although it is generally accepted that both deterministic and stochastic processes play important roles in community assembly, quantifying their relative importance is challenging. The new index, normalized stochasticity ratio (NST), is to estimate ecological stochasticity, i.e. relative importance of stochastic processes, in community assembly. With functions in this package, NST can be calculated based on different similarity metrics and/or different null model algorithms, as well as some previous indexes, e.g. previous Stochasticity Ratio (ST), Standard Effect Size (SES), modified Raup-Crick metrics (RC). Functions for permutational test and bootstrapping analysis are also included. Previous ST is published by Zhou et al (2014) <doi:10.1073/pnas.1324044111>. NST is modified from ST by considering two alternative situations and normalizing the index to range from 0 to 1 (Ning et al 2019) <doi:10.1073/pnas.1904623116>. A modified version, MST, is a special case of NST, used in some recent or upcoming publications, e.g. Liang et al (2020) <doi:10.1016/j.soilbio.2020.108023>. SES is calculated as described in Kraft et al (2011) <doi:10.1126/science.1208584>. RC is calculated as reported by Chase et al (2011) <doi:10.1890/ES10-00117.1> and Stegen et al (2013) <doi:10.1038/ismej.2013.93>. Version 3 added NST based on phylogenetic beta diversity, used by Ning et al (2020) <doi:10.1038/s41467-020-18560-z>.

Maintained by Daliang Ning. Last updated 3 years ago.

0.5 match 2 stars 2.85 score 35 scripts

r-forge

multiColl:Collinearity Detection in a Multiple Linear Regression Model

The detection of worrying approximate collinearity in a multiple linear regression model is a problem addressed in all existing statistical packages. However, we have detected deficits regarding the incorrect treatment of qualitative independent variables and the role of the intercept of the model. The objective of this package is to correct these deficits. This package provides both traditionally used and recently developed detection and treatment techniques. D.A. Belsley (1982) <doi:10.1016/0304-4076(82)90020-3>. D. A. Belsley (1991, ISBN: 978-0471528890). C. Garcia, R. Salmeron and C.B. Garcia (2019) <doi:10.1080/00949655.2018.1543423>. R. Salmeron, C.B. Garcia and J. Garcia (2018) <doi:10.1080/00949655.2018.1463376>. G.W. Stewart (1987) <doi:10.1214/ss/1177013444>.

Maintained by R. Salmeron. Last updated 2 years ago.

0.5 match 2.59 score 13 scripts 1 dependent

cran

cencrne:Consistent Estimation of the Number of Communities via Regularized Network Embedding

Network analysis plays an important role in numerous application domains including biomedicine. Estimation of the number of communities is a fundamental and critical issue in network analysis. Most existing studies assume that the number of communities is known a priori, or lack rigorous theoretical guarantees on the estimation consistency. This method proposes a regularized network embedding model to simultaneously estimate the community structure and the number of communities in a unified formulation. The proposed model equips network embedding with a novel composite regularization term, which pushes the embedding vector towards its center and collapses similar community centers with each other. A rigorous theoretical analysis is conducted, establishing asymptotic consistency in terms of community detection and estimation of the number of communities. Reference: Ren, M., Zhang S. and Wang J. (2022). "Consistent Estimation of the Number of Communities via Regularized Network Embedding". Biometrics, <doi:10.1111/biom.13815>.

Maintained by Mingyang Ren. Last updated 2 years ago.

0.5 match 2.00 score

wittenburg

hsrecombi:Estimation of Recombination Rate and Maternal LD in Half-Sibs

Paternal recombination rate and maternal linkage disequilibrium (LD) are estimated for pairs of biallelic markers such as single nucleotide polymorphisms (SNPs) from progeny genotypes and sire haplotypes. The implementation relies on paternal half-sib families. If maternal half-sib families are used, the roles of sire/dam are swapped. Multiple families can be considered. For parameter estimation, at least one sire has to be double heterozygous at the investigated pairs of SNPs. Based on recombination rates, genetic distances between markers can be estimated. Markers with unusually large recombination rate to markers in close proximity (i.e. putatively misplaced markers) shall be discarded in this derivation. A workflow description is attached as vignette. *A pipeline is available at GitHub* <https://github.com/wittenburg/hsrecombi> Hampel, Teuscher, Gomez-Raya, Doschoris, Wittenburg (2018) "Estimation of recombination rate and maternal linkage disequilibrium in half-sibs" <doi:10.3389/fgene.2018.00186>. Gomez-Raya (2012) "Maximum likelihood estimation of linkage disequilibrium in half-sib families" <doi:10.1534/genetics.111.137521>.

Maintained by Dörte Wittenburg. Last updated 2 years ago.

cpp

0.5 match 2.00 score 7 scripts

matt-dray

potato:Play a Game of 'Potato'

Play in your console an interactive version of 'Potato', a one-page role-playing game by Oliver Darkshire.

Maintained by Matt Dray. Last updated 3 years ago.

halfling potatoes

0.5 match 1.70 score 6 scripts

fbertran

granova:Graphical Analysis of Variance

This small collection of functions provides what we call elemental graphics for display of analysis of variance results, David C. Hoaglin, Frederick Mosteller and John W. Tukey (1991, ISBN:978-0-471-52735-0), Paul R. Rosenbaum (1989) <doi:10.2307/2684513>, Robert M. Pruzek and James E. Helmreich <https://jse.amstat.org/v17n1/helmreich.html>. The term elemental derives from the fact that each function is aimed at construction of graphical displays that afford direct visualizations of data with respect to the fundamental questions that drive the particular analysis of variance methods. These functions can be particularly helpful for students and non-statistician analysts. But these methods should be quite generally helpful for work-a-day applications of all kinds, as they can help to identify outliers, clusters or patterns, as well as highlight the role of non-linear transformations of data.

Maintained by Frederic Bertrand. Last updated 2 years ago.

0.5 match 1.56 score 36 scripts

diprosinha

OpEnHiMR:Optimization Based Ensemble Model for Prediction of Histone Modifications in Rice

Comprehensive knowledge of epigenetic modifications in plants, encompassing histone modifications in regulating gene expression, remains incomplete. It is noteworthy that histone deacetylation and histone H3 lysine 27 trimethylation (H3K27me3) play a role in repressing transcription in eukaryotes. In contrast, histone acetylation (H3K9ac) and H3K4me3 have been inevitably linked to the stimulation of gene expression, which significantly influences plant development and plays a role in plant responses to biotic and abiotic stresses. To our knowledge this is the first multiclass classifier for predicting histone modification in plants. <doi:10.1186/s12864-019-5489-4>.

Maintained by Dipro Sinha. Last updated 10 months ago.

0.8 match 1.00 score

cran

selectiongain:A Tool for Calculation and Optimization of the Expected Gain from Multi-Stage Selection

Multi-stage selection is practiced in numerous fields of life and social sciences and particularly in breeding. A special characteristic of multi-stage selection is that candidates are evaluated in successive stages with increasing intensity and effort, and only a fraction of the superior candidates is selected and promoted to the next stage. For the optimum design of such selection programs, the selection gain plays a crucial role. It can be calculated by integration of a truncated multivariate normal (MVN) distribution. While mathematical formulas for calculating the selection gain and the variance among selected candidates were developed a long time ago, solutions for numerical calculation were not available. This package can also be used for optimizing multi-stage selection programs for a given total budget and different costs of evaluating the candidates in each stage.

Maintained by Xuefei Mi. Last updated 2 years ago.

0.5 match 2 stars 1.51 score 16 scripts

slimaneregui

PerRegMod:Fitting Periodic Coefficients Linear Regression Models

Provides tools for fitting periodic coefficients regression models to data where periodicity plays a crucial role. It allows users to model and analyze relationships between variables that exhibit cyclical or seasonal patterns, offering functions for estimating parameters and testing the periodicity of coefficients in linear regression models. For simple periodic coefficient regression model see Regui et al. (2024) <doi:10.1080/03610918.2024.2314662>.

Maintained by Slimane Regui. Last updated 3 months ago.

0.5 match 1.48 score 1 script

randel

GMAC:Genomic Mediation Analysis with Adaptive Confounding Adjustment

Performs genomic mediation analysis with adaptive confounding adjustment (GMAC) proposed by Yang et al. (2017) <doi:10.1101/078683>. It implements large scale mediation analysis and adaptively selects potential confounding variables to adjust for each mediation test from a pool of candidate confounders. The package is tailored for but not limited to genomic mediation analysis (e.g., cis-gene mediating trans-gene regulation pattern where an eQTL, its cis-linking gene transcript, and its trans-gene transcript play the roles as treatment, mediator and the outcome, respectively), restricting to scenarios with the presence of cis-association (i.e., treatment-mediator association) and random eQTL (i.e., treatment).

Maintained by Jiebiao Wang. Last updated 3 years ago.

0.5 match 1.48 score 3 scripts

xliaosdsu

ShapeSelectForest:Shape Selection for Landsat Time Series of Forest Dynamics

Landsat satellites collect important data about global forest conditions. Documentation about Landsat's role in forest disturbance estimation is available at <https://landsat.gsfc.nasa.gov/>. Using constrained quadratic B-splines, this package fits an optimal shape-restricted trajectory to a time series of Landsat imagery in order to model annual forest disturbance dynamics so that they behave in an ecologically sensible manner, assuming one of seven possible "shapes": flat, decreasing, one-jump (decreasing, jump up, decreasing), inverted vee (increasing then decreasing), vee (decreasing then increasing), linear increasing, and double-jump (decreasing, jump up, decreasing, jump up, decreasing). The main routine selects the best shape according to the minimum Bayes information criterion (BIC) or the cone information criterion (CIC), which is defined as the log of the estimated predictive squared error. The package also provides parameters summarizing the temporal pattern, including the year(s) of inflection, the magnitude of change, and the pre- and post-inflection rates of growth or recovery. In addition, it contains routines for converting a flat map of disturbance agents to time-series disturbance maps, and a graphical routine that displays the fitted trajectory of Landsat imagery.

Maintained by Xiyue Liao. Last updated 2 years ago.

0.5 match 1.48 score 9 scripts

cran

DNAmotif:DNA Sequence Motifs

Motifs within biological sequences play a significant role. This package uses user-defined threshold values (window size and similarity) to create consensus segments, or motifs, through dynamic-programming local alignment with gaps, and it calculates the frequency of each identified motif, offering a detailed view of their prevalence within the dataset. It allows for thorough exploration and understanding of sequence patterns and their biological importance.

Maintained by Subham Ghosh. Last updated 6 months ago.

cpp

0.5 match 1.30 score

cran

censCov:Linear Regression with a Randomly Censored Covariate

Implementations of threshold regression approaches for linear regression models with a covariate subject to random censoring, including deletion threshold regression and completion threshold regression. Reverse survival regression, which flips the roles of the response variable and the covariate, is also considered.

Maintained by Sy Han (Steven) Chiou. Last updated 8 years ago.

0.5 match 1.00 score 1 scripts

cran

EnviroPRA2:Environmental Probabilistic Risk Assessment Tools

It contains functions for dose calculation for different routes, fitting data to probability distributions, random number generation (Monte Carlo simulation) and calculation of systemic and carcinogenic risks. For more information see the publication: Barrio-Parra et al. (2019) "Human-health probabilistic risk assessment: the role of exposure factors in an urban garden scenario" <doi:10.1016/j.landurbplan.2019.02.005>.

Maintained by Fernando Barrio-Parra. Last updated 1 year ago.

0.5 match 1.00 score

cran

CoreMicrobiomeR:Identification of Core Microbiome

The core microbiome refers to the group of microorganisms that are consistently present in a particular environment, habitat, or host species. These microorganisms play a crucial role in the functioning and stability of that ecosystem. Identifying them can contribute to the emerging field of personalized medicine. 'CoreMicrobiomeR' is designed to facilitate the identification, statistical testing, and visualization of this group of microorganisms. The package offers three key functions to analyze and visualize microbial community data. It has been developed based on the research papers published by Pereira et al. (2018) <doi:10.1186/s12864-018-4637-6> and Beule and Karlovsky (2020) <doi:10.7717/peerj.9593>.

Maintained by Mohammad Samir Farooqi. Last updated 12 months ago.

0.5 match 1.00 score

cran

MEGB:Gradient Boosting for Longitudinal Data

Gradient boosting is a powerful statistical learning method known for its ability to model complex relationships between predictors and outcomes while performing inherent variable selection. However, traditional gradient boosting methods lack flexibility in handling longitudinal data, where within-subject correlations play a critical role. In this package, we propose a novel approach, Mixed Effect Gradient Boosting ('MEGB'), designed specifically for high-dimensional longitudinal data. 'MEGB' incorporates a flexible semi-parametric model that embeds random effects within the gradient boosting framework, allowing it to account for within-individual covariance over time. Additionally, the method efficiently handles scenarios where the number of predictors greatly exceeds the number of observations (p >> n), making it particularly suitable for genomics data and other large-scale biomedical studies.

Maintained by Oyebayo Ridwan Olaniran. Last updated 2 months ago.

0.5 match 1.00 score

safai

varbin:Optimal Binning of Continuous and Categorical Variables

Tool for easy and efficient discretization of continuous and categorical data. The package calculates the optimal binning of a given explanatory variable with respect to a user-specified target variable. The purpose is to assign a unique Weight-of-Evidence value to each of the calculated bins in order to recode the original variable. The package allows users to impose certain restrictions on the functional form of the resulting binning while maximizing the overall information value in the original data. It is well suited for logistic scoring models where input variables may be subject to restrictions, such as linearity imposed by, e.g., regulatory authorities. An excellent source describing in detail the development of scorecards and the role of Weight-of-Evidence coding in credit scoring is Siddiqi (2006, ISBN: 978-0-471-75451-0). The package uses the discrete nature of decision trees and isotonic regression to accommodate the trade-off between flexible functional forms and maximum information value.

Maintained by Daniel Safai. Last updated 6 years ago.

0.5 match 1.00 score 6 scripts

wittenburg

hscovar:Calculation of Covariance Between Markers for Half-Sib Families

The theoretical covariance between pairs of markers is calculated from either paternal haplotypes and maternal linkage disequilibrium (LD) or vice versa. A genetic map is required. Grouping of markers is based on the correlation matrix, and a representative marker is suggested for each group. Using the correlation matrix, the optimal sample size can be derived for association studies based on a SNP-BLUP approach. The implementation relies on paternal half-sib families and biallelic markers. If maternal half-sib families are used, the roles of sire and dam are swapped. Multiple families can be considered. Wittenburg, Bonk, Doschoris, Reyer (2020) "Design of Experiments for Fine-Mapping Quantitative Trait Loci in Livestock Populations" <doi:10.1186/s12863-020-00871-1>. Carlson, Eberle, Rieder, Yi, Kruglyak, Nickerson (2004) "Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium" <doi:10.1086/381000>.

Maintained by Dörte Wittenburg. Last updated 4 years ago.

0.5 match 1.00 score 2 scripts

cran

GEInter:Robust Gene-Environment Interaction Analysis

For the risk, progression, and response to treatment of many complex diseases, it has been increasingly recognized that gene-environment interactions play important roles beyond the main genetic and environmental effects. In practical interaction analyses, outliers in response variables and covariates are not uncommon. In addition, missingness in environmental factors is routinely encountered in epidemiological studies. The package provides five robust approaches to address outlier problems, two of which can also accommodate missingness in environmental factors. Both continuous and right-censored responses are considered. The proposed approaches are based on penalization and sparse boosting techniques for identifying important interactions, which are realized using efficient algorithms. Beyond gene-environment analysis, the package can also be used to analyze interactions between other types of low-dimensional and high-dimensional data. (Mengyun Wu et al. (2017), <doi:10.1080/00949655.2018.1523411>; Mengyun Wu et al. (2017), <doi:10.1002/gepi.22055>; Yaqing Xu et al. (2018), <doi:10.1080/00949655.2018.1523411>; Yaqing Xu et al. (2019), <doi:10.1016/j.ygeno.2018.07.006>; Mengyun Wu et al. (2021), <doi:10.1093/bioinformatics/btab318>).

Maintained by Xing Qin. Last updated 3 years ago.

0.5 match 1.00 score

harryyy213

descstatsr:Descriptive Univariate Statistics

It generates summary statistics for the input dataset using different descriptive univariate statistical measures, on the entire data or at a group level. Although other packages do a similar job, each is deficient in some respect: in the measures generated, in treating numeric, character, and date variables alike, in lacking the functionality to view these measures at a group level, or in the way the output is presented. Given the central role of descriptive statistics in exploratory data analysis and solution development, there is a need for a more constructive, structured, and refined alternative to these packages. This is the idea behind the package: it brings together all the required descriptive measures to give an initial understanding of data quality and distribution in a faster, easier, and more detailed way. The function can generate these statistical measures on the entire dataset or at a group level. It calculates measures of central tendency (mean, median), distribution (count, proportion), dispersion (min, max, quantile, standard deviation, variance), and shape (skewness, kurtosis). In addition to these measures, it provides the data type, the number of rows, the number of unique entries, and the percentage of missing entries. More importantly, the measures are generated according to the data type they require, rather than applying numerical measures to character and date variables and vice versa. The output is a data frame, which gives a neat representation that is often useful when working with a large number of columns. It can easily be exported as a CSV file and analyzed further, or presented as a summary report for the data.

Maintained by Harish Kumar. Last updated 6 years ago.

0.5 match 1.00 score 1 scripts