R-universe search: pos

skranz

stringtools:Tools for working with strings in R

Tools for working with strings in R

Maintained by Sebastian Kranz. Last updated 3 years ago.

43.5 match 2 stars 3.66 score 29 scripts 26 dependents

bnosac

udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionality for commonly used data manipulations on texts which are enriched with the output of the parser, namely functions and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.

Maintained by Jan Wijffels. Last updated 2 years ago.

conll dependency-parser lemmatization natural-language-processing nlp pos-tagging r-pkg rcpp text-mining tokenizer udpipe cpp

12.6 match 215 stars 11.83 score 1.2k scripts 9 dependents

brodieg

vetr:Trust, but Verify

Declarative template-based framework for verifying that objects meet structural requirements, and auto-composing error messages when they do not.

Maintained by Brodie Gaslam. Last updated 9 months ago.

argument-checks input-validation

12.4 match 79 stars 7.50 score 67 scripts 1 dependent

junhewk

RcppMeCab:'rcpp' Wrapper for 'mecab' Library

R package based on 'Rcpp' for 'MeCab': Yet Another Part-of-Speech and Morphological Analyzer. The purpose of this package is to provide a seamless development and analysis environment for CJK texts. The package uses parallel programming to provide a highly efficient text preprocessing function, 'posParallel()'. For installation, please refer to the README.md file.

Maintained by Junhewk Kim. Last updated 7 months ago.

cjk nlp pos rcpp tagger mecab cpp

14.3 match 25 stars 5.30 score 40 scripts

bioc

BiocGenerics:S4 generic functions used in Bioconductor

The package defines many S4 generic functions used in Bioconductor.

Maintained by Hervé Pagès. Last updated 2 months ago.

infrastructure bioconductor-package core-package

5.0 match 12 stars 14.22 score 612 scripts 2.2k dependents

trinker

qdap:Bridging the Gap Between Qualitative Data and Quantitative Analysis

Automates many of the tasks associated with quantitative discourse analysis of transcripts containing discourse, including frequency counts of sentence types, words, sentences, turns of talk, syllables and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher level analysis and visualization of text. This affords the user a more efficient and targeted analysis. 'qdap' is designed for transcript analysis; however, many functions are applicable to other areas of Text Mining/Natural Language Processing.

Maintained by Tyler Rinker. Last updated 5 years ago.

qdap quantitative-discourse-analysis text-analysis text-mining text-plotting openjdk

7.3 match 176 stars 9.47 score 1.3k scripts 3 dependents

bioc

GenomicRanges:Representation and manipulation of genomic intervals

The ability to efficiently represent and manipulate genomic annotations and alignments is playing a central role when it comes to analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.

Maintained by Hervé Pagès. Last updated 5 months ago.

genetics infrastructure datarepresentation sequencing annotation genomeannotation coverage bioconductor-package core-package

3.3 match 44 stars 17.68 score 13k scripts 1.3k dependents

bioc

IRanges:Foundation of integer range manipulation in Bioconductor

Provides efficient low-level and highly reusable S4 classes for storing, manipulating and aggregating over annotated ranges of integers. Implements an algebra of range operations, including efficient algorithms for finding overlaps and nearest neighbors. Defines efficient list-like classes for storing, transforming and aggregating large grouped data, i.e., collections of atomic vectors and DataFrames.

Maintained by Hervé Pagès. Last updated 2 months ago.

infrastructure datarepresentation bioconductor-package core-package

3.3 match 22 stars 16.09 score 2.1k scripts 1.8k dependents

paithiov909

gibasa:An Alternative 'Rcpp' Wrapper of 'MeCab'

A plain 'Rcpp' wrapper for 'MeCab' that can segment Chinese, Japanese, and Korean text into tokens. The main goal of this package is to provide an alternative to 'tidytext' using morphological analysis.

Maintained by Akiru Kato. Last updated 14 days ago.

mecab pos-tagging rcpp cpp

9.3 match 15 stars 5.02 score 3 scripts

cvxgrp

CVXR:Disciplined Convex Optimization

An object-oriented modeling language for disciplined convex programming (DCP) as described in Fu, Narasimhan, and Boyd (2020, <doi:10.18637/jss.v094.i14>). It allows the user to formulate convex optimization problems in a natural way following mathematical convention and DCP rules. The system analyzes the problem, verifies its convexity, converts it into a canonical form, and hands it off to an appropriate solver to obtain the solution. Interfaces to solvers on CRAN and elsewhere are provided, both commercial and open source.

Maintained by Anqi Fu. Last updated 5 months ago.

cpp

3.3 match 207 stars 12.89 score 768 scripts 51 dependents

jinghuazhao

gap:Genetic Analysis Package

As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many projects, which is also in line with its name (gap).

Maintained by Jing Hua Zhao. Last updated 7 days ago.

genetics imputation lmm fortran

3.4 match 12 stars 11.94 score 448 scripts 16 dependents

mlr-org

mlr3pipelines:Preprocessing Operators and Pipelines for 'mlr3'

Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned.

Maintained by Martin Binder. Last updated 24 days ago.

bagging data-science dataflow-programming ensemble-learning machine-learning mlr3 pipelines preprocessing stacking

3.3 match 141 stars 12.36 score 448 scripts 7 dependents

markbravington

mvbutils:General utilities, workspace organization, code and docu editing, live package maintenance, etc

Hierarchical workspace tree, code editing and backup, easy package prep, editing of packages while loaded, per-object lazy-loading, easy documentation, macro functions, and miscellaneous utilities. Needed by debug package.

Maintained by Mark V. Bravington. Last updated 7 days ago.

5.6 match 6.57 score 138 scripts 18 dependents

bioc

clustifyr:Classifier for Single-cell RNA-seq Using Cell Clusters

Package designed to aid in classifying cells from single-cell RNA sequencing data using external reference data (e.g., bulk RNA-seq, scRNA-seq, microarray, gene lists). A variety of correlation based methods and gene list enrichment methods are provided to assist cell type assignment.

Maintained by Rui Fu. Last updated 5 months ago.

singlecell annotation sequencing microarray geneexpression assign-identities clusters marker-genes rna-seq single-cell-rna-seq

3.4 match 120 stars 9.63 score 296 scripts

computationalstylistics

stylo:Stylometric Multivariate Analyses

Supervised and unsupervised multivariate methods, supplemented by GUI and some visualizations, to perform various analyses in the field of computational stylistics, authorship attribution, etc. For further reference, see Eder et al. (2016), <https://journal.r-project.org/archive/2016/RJ-2016-007/index.html>. You are also encouraged to visit the Computational Stylistics Group's website <https://computationalstylistics.github.io/>, where a reasonable amount of information about the package and related projects is provided.

Maintained by Maciej Eder. Last updated 3 months ago.

3.8 match 187 stars 8.58 score 462 scripts

skranz

RTutor:Interactive R problem sets with automatic testing of solutions and automatic hints

Interactive R problem sets with automatic testing of solutions and automatic hints

Maintained by Sebastian Kranz. Last updated 1 year ago.

economics learn-to-code problem-set rstudio rtutor shiny teaching

5.3 match 205 stars 5.83 score 111 scripts 1 dependent

mlr-org

mlr3verse:Easily Install and Load the 'mlr3' Package Family

The 'mlr3' package family is a set of packages for machine-learning purposes built in a modular fashion. This wrapper package is aimed to simplify the installation and loading of the core 'mlr3' packages. Get more information about the 'mlr3' project at <https://mlr3book.mlr-org.com/>.

Maintained by Marc Becker. Last updated 3 months ago.

machine-learning meta mlr3

3.3 match 55 stars 8.32 score 720 scripts 1 dependent

bioc

CNAnorm:A normalization method for Copy Number Aberration in cancer samples

Performs ratio, GC content correction and normalization of data obtained using low coverage (one read every 100-10,000 bp) high-throughput sequencing. It performs a "discrete" normalization looking for the ploidy of the genome. It will also provide tumour content if at least two ploidy states can be found.

Maintained by Stefano Berri. Last updated 5 months ago.

copynumbervariation sequencing coverage normalization wholegenome dnaseq genomicvariation fortran

6.3 match 4.30 score 6 scripts

shusei-e

RcppJagger:An R Wrapper for Jagger

A wrapper for Jagger, a morphological analyzer proposed in Yoshinaga (2023) <arXiv:2305.19045>. Jagger uses patterns derived from morphological dictionaries and training data sets and applies them from the beginning of the input. This simultaneous and deterministic process enables it to effectively perform tokenization, POS tagging, and lemmatization.

Maintained by Shusei Eshima. Last updated 2 years ago.

japanese-nlp morphological-analyser nlp part-of-speech-tagger text-analysis cpp

8.5 match 3 stars 3.18 score 3 scripts

jamesramsay5

fda:Functional Data Analysis

These functions were developed to support functional data analysis as described in Ramsay, J. O. and Silverman, B. W. (2005) Functional Data Analysis. New York: Springer and in Ramsay, J. O., Hooker, Giles, and Graves, Spencer (2009). Functional Data Analysis with R and Matlab (Springer). The package includes data sets and script files that work through many examples, including all but one of the 76 figures in this latter book. Matlab versions are available by ftp from <https://www.psych.mcgill.ca/misc/fda/downloads/FDAfuns/>.

Maintained by James Ramsay. Last updated 4 months ago.

2.3 match 3 stars 11.88 score 2.0k scripts 142 dependents

repboxr

repboxUtils:Utility functions shared by several repbox packages

Utility functions shared by several repbox packages

Maintained by Sebastian Kranz. Last updated 2 months ago.

6.1 match 4.21 score 9 dependents

rpahl

container:Extending Base 'R' Lists

Extends the functionality of base 'R' lists and provides specialized data structures 'deque', 'set', 'dict', and 'dict.table', the latter to extend the 'data.table' package.

Maintained by Roman Pahl. Last updated 3 months ago.

container data-structures deque dict sets

3.3 match 16 stars 7.13 score 140 scripts

knausb

vcfR:Manipulate and Visualize VCF Data

Facilitates easy manipulation of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R a parser function extracts matrices of data. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete data may be written to a VCF file (*.vcf.gz). It also may be converted into other popular R objects (e.g., genlight, DNAbin). VcfR provides a link between VCF data and familiar R software.

Maintained by Brian J. Knaus. Last updated 1 month ago.

genomics population-genetics population-genomics rcpp vcf-data visualization zlib cpp

1.7 match 256 stars 13.66 score 3.1k scripts 19 dependents

r-forge

tm:Text Mining Package

A framework for text mining applications within R.

Maintained by Kurt Hornik. Last updated 1 month ago.

cpp

1.8 match 13.00 score 14k scripts 100 dependents

patrickroocks

rPref:Database Preferences and Skyline Computation

Routines to select and visualize the maxima for a given strict partial order. This especially includes the computation of the Pareto frontier, also known as (Top-k) Skyline operator (see Börzsönyi, et al. (2001) <doi:10.1109/ICDE.2001.914855>), and some generalizations known as database preferences (see Kießling (2002) <doi:10.1016/B978-155860869-6/50035-4>).

Maintained by Patrick Roocks. Last updated 2 years ago.

cpp

3.3 match 2 stars 6.55 score 115 scripts 4 dependents

pbs-software

PBSmapping:Mapping Fisheries Data and Spatial Analysis Tools

This software has evolved from fisheries research conducted at the Pacific Biological Station (PBS) in 'Nanaimo', British Columbia, Canada. It extends the R language to include two-dimensional plotting features similar to those commonly available in a Geographic Information System (GIS). Embedded C code speeds algorithms from computational geometry, such as finding polygons that contain specified point events or converting between longitude-latitude and Universal Transverse Mercator (UTM) coordinates. Additionally, we include 'C++' code developed by Angus Johnson for the 'Clipper' library, data for a global shoreline, and other data sets in the public domain. Under the user's R library directory '.libPaths()', specifically in './PBSmapping/doc', a complete user's guide is offered and should be consulted to use package functions effectively.

Maintained by Rowan Haigh. Last updated 6 months ago.

cpp

1.9 match 11 stars 10.16 score 652 scripts 9 dependents

mrcieu

ieugwasr:Interface to the 'OpenGWAS' Database API

Interface to the 'OpenGWAS' database API <https://api.opengwas.io/api/>. Includes a wrapper to make generic calls to the API, plus convenience functions for specific queries.

Maintained by Gibran Hemani. Last updated 18 days ago.

1.8 match 89 stars 10.71 score 404 scripts 6 dependents

agusnieto77

ACEP:Computational Analysis of Protest Events (Análisis Computacional de Eventos de Protesta)

The 'ACEP' library contains specific functions to perform computational analysis of protest events. It also contains databases with collections of notes on protests and dictionaries of conflict-related words. The collection of dictionaries brings together dictionaries from different sources.

Maintained by Agustín Nieto. Last updated 1 year ago.

computer-aided-detection conflict-analysis conflict-detection dictionaries nlp-keywords-extraction protest-events text-mining visualization

3.4 match 10 stars 5.48 score 9 scripts

paithiov909

sudachir2:R Wrapper for 'sudachi.rs'

Offers bindings to 'sudachi.rs' <https://github.com/WorksApplications/sudachi.rs>, a Rust implementation of 'Sudachi' Japanese morphological analyzer.

Maintained by Akiru Kato. Last updated 11 days ago.

pos-tagging rust cargo

7.5 match 3 stars 2.48 score 3 scripts

icosa-grid

icosa:Global Triangular and Penta-Hexagonal Grids Based on Tessellated Icosahedra

Implementation of icosahedral grids in three dimensions. The spherical-triangular tessellation can be set to create grids with custom resolutions. Both the primary triangular and their inverted penta-hexagonal grids can be calculated. Additional functions are provided that allow plotting of the grids and associated data, the interaction of the grids with other raster and vector objects, and treating the grids as graphs.

Maintained by Adam T. Kocsis. Last updated 8 months ago.

grid cpp

3.3 match 4 stars 5.41 score 65 scripts

paithiov909

vibrrt:An R Wrapper for 'vibrato'

An R wrapper for 'vibrato' <https://github.com/daac-tools/vibrato>, a Rust reimplementation of 'MeCab' for fast tokenization.

Maintained by Akiru Kato. Last updated 1 month ago.

pos-tagging rust cargo

7.5 match 2.30 score 1 script

pboutros

VennDiagram:Generate High-Resolution Venn and Euler Plots

A set of functions to generate high-resolution Venn and Euler plots. Includes handling for several special cases, including two-case scaling, and extensive customization of plot shape and structure.

Maintained by Paul Boutros. Last updated 3 years ago.

2.0 match 3 stars 8.60 score 5.7k scripts 40 dependents

stefanocoretta

rticulate:Articulatory Data Processing in R

A tool for processing Articulate Assistant Advanced™ (AAA) ultrasound tongue imaging data and Carstens AG500/1 electro-magnetic articulographic data.

Maintained by Stefano Coretta. Last updated 1 month ago.

phonetics software tongue-image ultrasound ultrasound-tongue-imaging

2.9 match 5 stars 5.88 score 17 scripts

kurthornik

NLP:Natural Language Processing Infrastructure

Basic classes and methods for Natural Language Processing.

Maintained by Kurt Hornik. Last updated 4 months ago.

1.8 match 6 stars 9.42 score 1.0k scripts 127 dependents

sigbertklinke

HKRbook:Apps and Data for the Book "Introduction to Statistics"

Functions, Shiny apps and data for the book "Introduction to Statistics" by Wolfgang Karl Härdle, Sigbert Klinke, and Bernd Rönz (2015) <doi:10.1007/978-3-319-17704-5>.

Maintained by Sigbert Klinke. Last updated 2 years ago.

4.5 match 1 star 3.70 score

bioc

trackViewer:An R/Bioconductor package with web interface for drawing elegant interactive tracks or lollipop plots to facilitate integrated analysis of multi-omics data

Visualize mapped reads along with annotation as track layers for NGS datasets such as ChIP-seq, RNA-seq, miRNA-seq, DNA-seq, SNPs and methylation data.

Maintained by Jianhong Ou. Last updated 5 days ago.

visualization

1.9 match 8.68 score 145 scripts 2 dependents

tidymodels

textrecipes:Extra 'Recipes' for Text Processing

Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf) and feature hashing.

Maintained by Emil Hvitfeldt. Last updated 13 days ago.

1.3 match 160 stars 10.86 score 964 scripts 1 dependent

bioc

MEDME:Modelling Experimental Data from MeDIP Enrichment

MEDME allows the prediction of absolute and relative methylation levels based on measures obtained by MeDIP-microarray experiments

Maintained by Mattia Pelizzola. Last updated 5 months ago.

microarray cpgisland dnamethylation

3.3 match 4.30 score 2 scripts

liukf10

DDPNA:Disease-Drived Differential Proteins Co-Expression Network Analysis

Functions designed to connect disease-related differential proteins and co-expression networks. It provides basic statistical analyses, including t-tests and ANOVA. Network construction is not offered by the package; you can use the 'WGCNA' package, which is described in Peter et al. (2008) <doi:10.1186/1471-2105-9-559>. It also provides module analyses, including PCA, two enrichment analyses, planar maximally filtered graph extraction and hub analysis.

Maintained by Kefu Liu. Last updated 4 years ago.

4.5 match 2 stars 3.00 score 4 scripts

tobiaskley

quantspec:Quantile-Based Spectral Analysis of Time Series

Methods to determine, smooth and plot quantile periodograms for univariate and multivariate time series.

Maintained by Tobias Kley. Last updated 9 years ago.

cpp

2.3 match 10 stars 5.84 score 46 scripts 1 dependent

cran

tensorA:Advanced Tensor Arithmetic with Named Indices

Provides convenience functions for advanced linear algebra with tensors and computation with data sets of tensors at a higher level of abstraction. It includes Einstein and Riemann summing conventions, dragging, co- and contravariant indices, and parallel computations on sequences of tensors.

Maintained by K. Gerald van den Boogaart. Last updated 1 year ago.

2.3 match 5.83 score 399 dependents

bioc

BioQC:Detect tissue heterogeneity in expression profiles with gene sets

BioQC performs quality control of high-throughput expression data based on tissue gene signatures. It can detect tissue heterogeneity in gene expression data. The core algorithm is a Wilcoxon-Mann-Whitney test that is optimised for high performance.

Maintained by Jitao David Zhang. Last updated 5 months ago.

geneexpression qualitycontrol statisticalmethod genesetenrichment cpp

1.6 match 5 stars 8.16 score 86 scripts

sigbertklinke

exams.forge:Support for Compiling Examination Tasks using the 'exams' Package

The main aim is to further facilitate the creation of exercises based on the package 'exams' by Grün, B., and Zeileis, A. (2009) <doi:10.18637/jss.v029.i10>. Creating effective student exercises involves challenges such as creating appropriate data sets and ensuring access to intermediate values for accurate explanation of solutions. The functionality includes the generation of univariate and bivariate data including simple time series, functions for theoretical distributions and their approximation, statistical and mathematical calculations for tasks in basic statistics courses as well as general tasks such as string manipulation, LaTeX/HTML formatting and the editing of XML task files for 'Moodle'.

Maintained by Sigbert Klinke. Last updated 9 months ago.

4.6 match 2.70 score 1 script

florianstijven

Surrogate:Evaluation of Surrogate Endpoints in Clinical Trials

In a clinical trial, it frequently occurs that the most credible outcome to evaluate the effectiveness of a new therapy (the true endpoint) is difficult to measure. In such a situation, it can be an effective strategy to replace the true endpoint by a (bio)marker that is easier to measure and that allows for a prediction of the treatment effect on the true endpoint (a surrogate endpoint). The package 'Surrogate' allows for an evaluation of the appropriateness of a candidate surrogate endpoint based on the meta-analytic, information-theoretic, and causal-inference frameworks. Part of this software has been developed using funding provided from the European Union's Seventh Framework Programme for research, technological development and demonstration (Grant Agreement no 602552), the Special Research Fund (BOF) of Hasselt University (BOF-number: BOF2OCPO3), GlaxoSmithKline Biologicals, Baekeland Mandaat (HBC.2022.0145), and Johnson & Johnson Innovative Medicine.

Maintained by Wim Van Der Elst. Last updated 1 month ago.

2.0 match 1 star 6.15 score 133 scripts

oucru-modelling

serosv:Model Infectious Disease Parameters from Serosurveys

An easy-to-use and efficient tool to estimate infectious disease parameters using serological data. Implemented models include SIR models (basic_sir_model(), static_sir_model(), mseir_model(), sir_subpops_model()), parametric models (polynomial_model(), fp_model()), nonparametric models (lp_model()), semiparametric models (penalized_splines_model()), hierarchical models (hierarchical_bayesian_model()). The package is based on the book "Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective" (Hens, Niel & Shkedy, Ziv & Aerts, Marc & Faes, Christel & Damme, Pierre & Beutels, Philippe, 2013) <doi:10.1007/978-1-4614-4072-7>.

Maintained by Anh Phan Truong Quynh. Last updated 11 days ago.

cpp

1.7 match 6.77 score 24 scripts

yufree

pmd:Paired Mass Distance Analysis for GC/LC-MS Based Non-Targeted Analysis and Reactomics Analysis

Paired mass distance (PMD) analysis proposed in Yu, Olkowicz and Pawliszyn (2018) <doi:10.1016/j.aca.2018.10.062> and PMD based reactomics analysis proposed in Yu and Petrick (2020) <doi:10.1038/s42004-020-00403-z> for gas/liquid chromatography–mass spectrometry (GC/LC-MS) based non-targeted analysis. PMD analysis includes the GlobalStd algorithm and structure/reaction directed analysis. The GlobalStd algorithm can find independent peaks in m/z-retention time profiles based on retention time hierarchical cluster analysis and frequency analysis of paired mass distances within retention time groups. Structure directed analysis can be used to find potential relationships among those independent peaks in different retention time groups based on the frequency of paired mass distances. Reactomics analysis can also be performed to build PMD networks, assign sources and discover biomarker reactions. GUIs for PMD analysis are also included as 'shiny' applications.

Maintained by Miao YU. Last updated 11 days ago.

mass-spectrometry metabolomics non-target

1.7 match 10 stars 6.78 score 40 scripts

the-hull

RAPTOR:Row and Position Tracheid Organizer

Performs wood cell anatomical data analyses on spatially explicit xylem (tracheids) datasets derived from thin sections of woody tissue. The package includes functions for visualisation, detection and alignment of continuous tracheid radial files (defined as rows) and individual tracheid positions within an annual ring of coniferous species. This package is designed to be used with elaborate cell output, e.g. as provided with ROXAS (von Arx & Carrer, 2014 <doi:10.1016/j.dendro.2013.12.001>). The package has been validated for Picea abies, Larix sibirica, Pinus cembra and Pinus sylvestris.

Maintained by Richard L. Peters. Last updated 4 years ago.

2.3 match 2 stars 4.59 score 39 scripts

dariah-fi-survey-concept-network

finnsurveytext:Analyse Open-Ended Survey Responses in Finnish

Annotates Finnish textual survey responses into CoNLL-U format using Finnish treebanks from <https://universaldependencies.org/format.html> using UDPipe as described in Straka and Straková (2017) <doi:10.18653/v1/K17-3009>. Formatted data is then analysed using single or comparison n-gram plots, wordclouds, summary tables and Concept Network plots. The Concept Network plots use the TextRank algorithm as outlined in Mihalcea, Rada & Tarau, Paul (2004) <https://aclanthology.org/W04-3252/>.

Maintained by Adeline Clarke. Last updated 26 days ago.

dariah-fi

1.9 match 5.39 score 27 scripts

crew102

slowraker:A Slow Version of the Rapid Automatic Keyword Extraction (RAKE) Algorithm

A mostly pure-R implementation of the RAKE algorithm (Rose, S., Engel, D., Cramer, N. and Cowley, W. (2010) <doi:10.1002/9780470689646.ch1>), which can be used to extract keywords from documents without any training data.

Maintained by Christopher Baker. Last updated 7 months ago.

openjdk

1.9 match 6 stars 5.37 score 13 scripts 1 dependent

kurthornik

openNLP:Apache OpenNLP Tools Interface

An interface to the Apache OpenNLP tools (version 1.5.3). The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text written in Java. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. See <https://opennlp.apache.org/> for more information.

Maintained by Kurt Hornik. Last updated 5 years ago.

openjdk

1.8 match 4 stars 5.48 score 386 scripts 8 dependents

adrian-bowman

rpanel:Simple Interactive Controls for R using the 'tcltk' Package

A set of functions to build simple GUI controls for R functions. These are built on the 'tcltk' package. Uses could include changing a parameter on a graph by animating it with a slider or a "doublebutton", up to more sophisticated control panels. Some functions for specific graphical tasks, referred to as 'cartoons', are provided.

Maintained by Adrian Bowman. Last updated 2 years ago.

2.3 match 1 star 4.24 score 157 scripts 9 dependents

mw201608

NetWeaver:Graphic Presentation of Complex Genomic and Network Data Analysis

Implements various simple function utilities and flexible pipelines to generate circular images for visualizing complex genomic and network data analysis features.

Maintained by Minghui Wang. Last updated 2 years ago.

2.0 match 4 stars 4.75 score 28 scripts

usaid-mozambique

sismar:Tidy SISMA Data

Provides a set of functions for creating analytical datasets from SISMA and DISA downloads. Includes functions that tidy the files into a long format, remove unnecessary variables, and create columns useful for analysis.

Maintained by Joe Lara. Last updated 5 days ago.

1.8 match 2 stars 5.23 score 9 scripts

rkbauer

oceanmap:A Plotting Toolbox for 2D Oceanographic Data

Plotting toolbox for 2D oceanographic data (satellite data, sea surface temperature, chlorophyll, ocean fronts & bathymetry). Recognized classes and formats include netcdf, Raster, '.nc' and '.gz' files.

Maintained by Robert K. Bauer. Last updated 1 year ago.

bathymetry chla ggplot mapping-tools ncdf oceanographic-data remote-sensing satellite-im spatial-data sst

2.0 match 4 stars 4.54 score 58 scripts 1 dependent

bioc

TransView:Read density map construction and accession. Visualization of ChIPSeq and RNASeq data sets

This package provides efficient tools to generate, access and display read densities of sequencing based data sets such as from RNA-Seq and ChIP-Seq.

Maintained by Julius Muller. Last updated 2 months ago.

immunooncology dnamethylation geneexpression transcription microarray sequencing chipseq rnaseq methylseq dataimport visualization clustering multiplecomparison curl bzip2 xz-utils zlib

3.3 match 2.60 score

tathey

VLF:Frequency Matrix Approach for Assessing Very Low Frequency Variants in Sequence Records

Using frequency matrices, very low frequency variants (VLFs) are assessed for amino acid and nucleotide sequences. The VLFs are then compared to see if they occur in only one member of a species, singleton VLFs, or if they occur in multiple members of a species, shared VLFs. The amino acid and nucleotide VLFs are then compared to see if they are concordant with one another. Amino acid VLFs are also assessed to determine if they lead to a change in amino acid residue type, and potential changes to protein structures. Based on Stoeckle and Kerr (2012) <doi:10.1371/journal.pone.0043992>.

Maintained by Taryn B. T. Athey. Last updated 3 years ago.

3.9 match 2.16 score 48 scripts 1 dependents

quadrama

DramaAnalysis:Analysis of Dramatic Texts

Analysis of preprocessed dramatic texts, with respect to literary research. The package provides functions to analyze and visualize information about characters, stage directions, the dramatic structure and the text itself. The dramatic texts are expected to be in CSV format and can be installed from within the package; sample texts are provided. The package and the reasoning behind it are described in Reiter et al. (2017) <doi:10.18420/in2017_119>.

Maintained by Nils Reiter. Last updated 5 years ago.

corpus-linguistics digital-humanities drama dramatic-texts statistics

1.7 match 15 stars 4.79 score 41 scripts

nagodem

rebmix:Finite Mixture Modeling, Clustering & Classification

Random univariate and multivariate finite mixture model generation, estimation, clustering, latent class analysis and classification. Variables can be continuous, discrete, independent or dependent and may follow normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or circular von Mises parametric families.

Maintained by Marko Nagode. Last updated 9 months ago.

cpp

3.0 match 1 stars 2.66 score 43 scripts

idslme

IDSL.FSA:Fragmentation Spectra Analysis (FSA)

The 'IDSL.FSA' package was designed to annotate standard .msp (mass spectra format) and .mgf (Mascot generic format) files using mass spectral entropy similarity, dot product (cosine) similarity, and normalized Euclidean mass error (NEME) followed by intelligent pre-filtering steps for rapid spectra searches. 'IDSL.FSA' also provides a number of modules to convert and manipulate .msp and .mgf files. The 'IDSL.FSA' workflow was integrated in the 'IDSL.CSA' and 'IDSL.NPA' packages introduced in <doi:10.1021/acs.analchem.3c00376>.

Maintained by Dinesh Barupal. Last updated 8 months ago.

fragmentation-spectra mass-spectrometry massbank mgf mgf-parser msp msp-parser spectral-entropy

1.9 match 1 stars 3.48 score 2 dependents

khanna-lab

influenceR:Software Tools to Quantify Structural Importance of Nodes in a Network

Provides functionality to compute various node centrality measures on networks. Included are functions to compute betweenness centrality (by utilizing Madduri and Bader's SNAP library), implementations of constraint and effective network size by Burt (2000) <doi:10.1016/S0191-3085(00)22009-1>; algorithm to identify key players by Borgatti (2006) <doi:10.1007/s10588-006-7084-x>; and the bridging algorithm by Valente and Fujimoto (2010) <doi:10.1016/j.socnet.2010.03.003>. On Unix systems, the betweenness, Key Players, and bridging implementations are parallelized with OpenMP, which may run faster on systems which have OpenMP configured.

Maintained by Aditya Khanna. Last updated 2 years ago.

openmp

1.8 match 1 stars 3.61 score 24 scripts

pariya

netgwas:Network-Based Genome Wide Association Studies

A multi-core R package that contains a set of tools based on copula graphical models for accomplishing the three interrelated goals in genetics and genomics in a unified way: (1) linkage map construction, (2) constructing linkage disequilibrium networks, and (3) exploring high-dimensional genotype-phenotype networks and genotype-phenotype-environment interaction networks. The 'netgwas' package can deal with biparental inbreeding and outbreeding species with any ploidy level, namely diploid (2 sets of chromosomes), triploid (3 sets of chromosomes), tetraploid (4 sets of chromosomes) and so on. We target high-dimensional data where the number of variables p is considerably larger than the sample size n (p >> n). The computations are memory-optimized using sparse matrix output. The 'netgwas' package implements the methodological developments in Behrouzi and Wit (2017) <doi:10.1111/rssc.12287> and Behrouzi and Wit (2017) <doi:10.1093/bioinformatics/bty777>.

Maintained by Pariya Behrouzi. Last updated 1 year ago.

2.3 match 3 stars 2.65 score 5 scripts 1 dependents

strategicprojects

pikchr:R Wrapper for 'pikchr' (PIC) Diagram Language

An 'R' interface to 'pikchr' (<https://pikchr.org>, pronounced “picture”), a 'PIC'-like markup language for creating diagrams within technical documentation. Originally developed by Brian Kernighan, 'PIC' has been adapted into 'pikchr' by D. Richard Hipp, the creator of 'SQLite'. 'pikchr' is designed to be embedded in fenced code blocks of Markdown or other documentation markup languages, making it ideal for generating diagrams in text-based formats. This package allows R users to seamlessly integrate the descriptive syntax of 'pikchr' for diagram creation directly within the 'R' environment.

Maintained by Andre Leite. Last updated 17 hours ago.

1.1 match 1 stars 4.90 score 7 scripts

martinzaefferer

CEGO:Combinatorial Efficient Global Optimization

Model building, surrogate model based optimization and Efficient Global Optimization in combinatorial or mixed search spaces.

Maintained by Martin Zaefferer. Last updated 3 months ago.

1.8 match 1 stars 3.04 score 73 scripts

repboxr

repboxStata:Repbox analysis of stata scripts in reproduction packages

Repbox analysis of Stata scripts in reproduction packages

Maintained by Sebastian Kranz. Last updated 2 months ago.

2.0 match 2.73 score 4 scripts 2 dependents

taylor-arnold

coreNLP:Wrappers Around Stanford CoreNLP Tools

Provides a minimal interface for applying annotators from the 'Stanford CoreNLP' Java library. Methods are provided for tasks such as tokenisation, part of speech tagging, lemmatisation, named entity recognition, coreference detection and sentiment analysis.

Maintained by Taylor Arnold. Last updated 3 years ago.

openjdk

1.8 match 1 stars 3.04 score 55 scripts

chrchang

pgenlibr:PLINK 2 Binary (.pgen) Reader

A thin wrapper over PLINK 2's core libraries which provides an R interface for reading .pgen files. A minimal .pvar loader is also included. Chang et al. (2015) <doi:10.1186/s13742-015-0047-8>.

Maintained by Christopher Chang. Last updated 3 months ago.

libzstd libdeflate zlib cpp

1.7 match 2.98 score 64 scripts

cran

qtlhot:Inference for QTL Hotspots

Functions to infer co-mapping trait hotspots and causal models. Chaibub Neto E, Keller MP, Broman AF, Attie AD, Jansen RC, Broman KW, Yandell BS (2012) Quantile-based permutation thresholds for QTL hotspots. Genetics 191: 1355-1365. <doi:10.1534/genetics.112.139451>. Chaibub Neto E, Broman AT, Keller MP, Attie AD, Zhang B, Zhu J, Yandell BS (2013) Modeling causality for pairs of phenotypes in system genetics. Genetics 193: 1003-1013. <doi:10.1534/genetics.112.147124>.

Maintained by Brian S. Yandell. Last updated 7 years ago.

1.8 match 2.30 score

consmavr

FLR:Fuzzy Logic Rule Classifier

FLR algorithm for classification

Maintained by Constantinos Mavridis. Last updated 13 years ago.

2.3 match 1.78 score 3 scripts

cran

propOverlap:Feature (gene) selection based on the Proportional Overlapping Scores

A package for selecting the most relevant features (genes) in high-dimensional binary classification problems. The discriminative features are identified by analyzing the overlap between the expression values across both classes. The package includes functions for measuring the proportional overlapping score for each gene while avoiding the effect of outliers. The overlap measure used is the one defined in the "Proportional Overlapping Score (POS)" technique for feature selection. A gene mask, which represents a gene's classification power, can also be produced for each gene (feature). The size of the selected gene set can be set by the user. The minimum set of genes that correctly classifies the maximum number of the given tissue samples (observations) can also be produced.

Maintained by Osama Mahmoud. Last updated 11 years ago.

3.8 match 1 stars 1.00 score

skranz

rmdtools:Tools for RMarkdown

Tools for RMarkdown

Maintained by Sebastian Kranz. Last updated 4 years ago.

1.9 match 1 stars 1.78 score 6 scripts 2 dependents

cran

Tex4exams:Generating 'Sweave' Code for 'R/exams' Questions in Mathematics

When using the R package 'exams' to write mathematics questions in 'Sweave' files, the output of many R functions needs to be adjusted for display in mathematical formulas. Specifically, the functions were accumulated when writing questions for the topics of the mathematics courses College Algebra, Precalculus, Calculus, Differential Equations, Introduction to Probability, and Linear Algebra. The output of the developed functions can be used in 'Sweave' files.

Maintained by Qingwen Hu. Last updated 2 years ago.

3.3 match 1.00 score

madanstat

LongCART:Recursive Partitioning for Longitudinal Data and Right Censored Data Using Baseline Covariates

Constructs trees for continuous longitudinal data and survival data using baseline covariates as partitioning variables, according to the 'LongCART' and 'SurvCART' algorithms, respectively. Also includes functions to calculate conditional power and predictive power of success based on interim results, and probability of success for a prospective trial.

Maintained by Madan G Kundu. Last updated 3 years ago.

3.3 match 1.00 score 4 scripts

tconwell

textTools:Functions for Text Cleansing and Text Analysis

A framework for text cleansing and analysis. Conveniently prepare and process large amounts of text for analysis. Includes various metrics for word counts/frequencies that scale efficiently. Quickly analyze large amounts of text data using a text.table (a data.table created with one word (or unit of text analysis) per row, similar to the tidytext format). Offers flexibility to efficiently work with text data stored in vectors as well as text data formatted as a text.table.

Maintained by Timothy Conwell. Last updated 4 years ago.

3.3 match 1.00 score 4 scripts

computationalstylistics

litRiddle:Dataset and Tools to Research the Riddle of Literary Quality

Dataset and functions to explore quality of literary novels. The package is a part of the Riddle of Literary Quality project, and it contains the data of a reader survey about fiction in Dutch, a description of the novels the readers rated, and the results of stylistic measurements of the novels. The package also contains functions to combine, analyze, and visualize these data. For more details, see: Eder M, van Zundert J, Lensink S, van Dalen-Oskam K (2022). Replicating The Riddle of Literary Quality: The litRiddle package for R. In _Digital Humanities 2022: Conference Abstracts_, 636-637.

Maintained by Maciej Eder. Last updated 2 years ago.

1.2 match 2.70 score 2 scripts

cran

chinese.misc:Miscellaneous Tools for Chinese Text Mining and More

Efforts are made to make Chinese text mining easier, faster, and robust to errors. A document term matrix can be generated with a single line of code; detecting encoding, segmenting and removing stop words are done automatically. Some convenient tools are also supplied.

Maintained by Jiang Wu. Last updated 5 years ago.

1.8 match 1.78 score 2 dependents

cran

RegKink:Regression Kink with a Time-Varying Threshold

An algorithm for estimating the regression kink model proposed in the paper by Lixiong Yang and Jen-Je Su (2018) <doi:10.1016/j.jimonfin.2018.06.002>.

Maintained by Lixiong Yang. Last updated 4 years ago.

2.3 match 1 stars 1.00 score

prabinameher

EncDNA:Encoding of Nucleotide Sequences into Numeric Feature Vectors

We describe fifteen different splice site sequence encoding schemes that have been used in earlier studies for mapping of splice site sequences into numeric feature vectors. These encoding schemes will also be helpful for transforming other nucleotide sequences into numeric forms, provided they are of equal length. These encoding schemes will help the computational biologist working in the field of classification (binary or multiclass) or prediction involving nucleic acid sequences of equal length.

Maintained by Prabina Kumar Meher. Last updated 6 years ago.

2.3 match 1 stars 1.00 score

schweflo

NLPclient:Stanford 'CoreNLP' Annotation Client

Stanford 'CoreNLP' annotation client. Stanford 'CoreNLP' <https://stanfordnlp.github.io/CoreNLP/index.html> integrates all NLP tools from the Stanford Natural Language Processing Group, including a part-of-speech (POS) tagger, a named entity recognizer (NER), a parser, and a coreference resolution system, and provides model files for the analysis of English. More information can be found in the README.

Maintained by Florian Schwendinger. Last updated 5 years ago.

0.5 match 1.70 score