Showing 78 of 78 results
skranz
stringtools:Tools for working with strings in R
Tools for working with strings in R
Maintained by Sebastian Kranz. Last updated 3 years ago.
43.5 match 2 stars 3.66 score 29 scripts 26 dependents
bnosac
udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser, namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.
Maintained by Jan Wijffels. Last updated 2 years ago.
conll dependency-parser lemmatization natural-language-processing nlp pos-tagging r-pkg rcpp text-mining tokenizer udpipe cpp
12.6 match 215 stars 11.83 score 1.2k scripts 9 dependents
brodieg
vetr:Trust, but Verify
Declarative template-based framework for verifying that objects meet structural requirements, and auto-composing error messages when they do not.
Maintained by Brodie Gaslam. Last updated 9 months ago.
argument-checks input-validation
12.4 match 79 stars 7.50 score 67 scripts 1 dependents
junhewk
RcppMeCab:'rcpp' Wrapper for 'mecab' Library
R package based on 'Rcpp' for 'MeCab': Yet Another Part-of-Speech and Morphological Analyzer. The purpose of this package is to provide a seamless development and analysis environment for CJK texts. The package uses parallel programming to provide a highly efficient text preprocessing function, 'posParallel()'. For installation, please refer to the README.md file.
Maintained by Junhewk Kim. Last updated 7 months ago.
14.3 match 25 stars 5.30 score 40 scripts
bioc
BiocGenerics:S4 generic functions used in Bioconductor
The package defines many S4 generic functions used in Bioconductor.
Maintained by Hervé Pagès. Last updated 2 months ago.
infrastructure bioconductor-package core-package
5.0 match 12 stars 14.22 score 612 scripts 2.2k dependents
trinker
qdap:Bridging the Gap Between Qualitative Data and Quantitative Analysis
Automates many of the tasks associated with quantitative discourse analysis of transcripts containing discourse including frequency counts of sentence types, words, sentences, turns of talk, syllables and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher level analysis and visualization of text. This affords the user a more efficient and targeted analysis. 'qdap' is designed for transcript analysis, however, many functions are applicable to other areas of Text Mining/ Natural Language Processing.
Maintained by Tyler Rinker. Last updated 5 years ago.
qdap quantitative-discourse-analysis text-analysis text-mining text-plotting openjdk
7.3 match 176 stars 9.47 score 1.3k scripts 3 dependents
bioc
GenomicRanges:Representation and manipulation of genomic intervals
The ability to efficiently represent and manipulate genomic annotations and alignments is playing a central role when it comes to analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.
Maintained by Hervé Pagès. Last updated 5 months ago.
genetics infrastructure datarepresentation sequencing annotation genomeannotation coverage bioconductor-package core-package
3.3 match 44 stars 17.68 score 13k scripts 1.3k dependents
bioc
IRanges:Foundation of integer range manipulation in Bioconductor
Provides efficient low-level and highly reusable S4 classes for storing, manipulating and aggregating over annotated ranges of integers. Implements an algebra of range operations, including efficient algorithms for finding overlaps and nearest neighbors. Defines efficient list-like classes for storing, transforming and aggregating large grouped data, i.e., collections of atomic vectors and DataFrames.
Maintained by Hervé Pagès. Last updated 2 months ago.
infrastructure datarepresentation bioconductor-package core-package
3.3 match 22 stars 16.09 score 2.1k scripts 1.8k dependents
paithiov909
gibasa:An Alternative 'Rcpp' Wrapper of 'MeCab'
A plain 'Rcpp' wrapper for 'MeCab' that can segment Chinese, Japanese, and Korean text into tokens. The main goal of this package is to provide an alternative to 'tidytext' using morphological analysis.
Maintained by Akiru Kato. Last updated 14 days ago.
9.3 match 15 stars 5.02 score 3 scripts
cvxgrp
CVXR:Disciplined Convex Optimization
An object-oriented modeling language for disciplined convex programming (DCP) as described in Fu, Narasimhan, and Boyd (2020, <doi:10.18637/jss.v094.i14>). It allows the user to formulate convex optimization problems in a natural way following mathematical convention and DCP rules. The system analyzes the problem, verifies its convexity, converts it into a canonical form, and hands it off to an appropriate solver to obtain the solution. Interfaces to solvers on CRAN and elsewhere are provided, both commercial and open source.
Maintained by Anqi Fu. Last updated 5 months ago.
3.3 match 207 stars 12.89 score 768 scripts 51 dependents
jinghuazhao
gap:Genetic Analysis Package
As first reported [Zhao, J. H. 2007. "gap: Genetic Analysis Package". J Stat Soft 23(8):1-18. <doi:10.18637/jss.v023.i08>], it is designed as an integrated package for genetic data analysis of both population and family data. Currently, it contains functions for sample size calculations of both population-based and family-based designs, probability of familial disease aggregation, kinship calculation, statistics in linkage analysis, and association analysis involving genetic markers including haplotype analysis with or without environmental covariates. Over the years, the package has been developed in between many projects, which is also in line with its name (gap).
Maintained by Jing Hua Zhao. Last updated 7 days ago.
3.4 match 12 stars 11.94 score 448 scripts 16 dependents
mlr-org
mlr3pipelines:Preprocessing Operators and Pipelines for 'mlr3'
Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned.
Maintained by Martin Binder. Last updated 24 days ago.
bagging data-science dataflow-programming ensemble-learning machine-learning mlr3 pipelines preprocessing stacking
3.3 match 141 stars 12.36 score 448 scripts 7 dependents
markbravington
mvbutils:General utilities, workspace organization, code and docu editing, live package maintenance, etc
Hierarchical workspace tree, code editing and backup, easy package prep, editing of packages while loaded, per-object lazy-loading, easy documentation, macro functions, and miscellaneous utilities. Needed by debug package.
Maintained by Mark V. Bravington. Last updated 7 days ago.
5.6 match 6.57 score 138 scripts 18 dependents
bioc
clustifyr:Classifier for Single-cell RNA-seq Using Cell Clusters
Package designed to aid in classifying cells from single-cell RNA sequencing data using external reference data (e.g., bulk RNA-seq, scRNA-seq, microarray, gene lists). A variety of correlation based methods and gene list enrichment methods are provided to assist cell type assignment.
Maintained by Rui Fu. Last updated 5 months ago.
singlecell annotation sequencing microarray geneexpression assign-identities clusters marker-genes rna-seq single-cell-rna-seq
3.4 match 120 stars 9.63 score 296 scripts
computationalstylistics
stylo:Stylometric Multivariate Analyses
Supervised and unsupervised multivariate methods, supplemented by GUI and some visualizations, to perform various analyses in the field of computational stylistics, authorship attribution, etc. For further reference, see Eder et al. (2016), <https://journal.r-project.org/archive/2016/RJ-2016-007/index.html>. You are also encouraged to visit the Computational Stylistics Group's website <https://computationalstylistics.github.io/>, where a reasonable amount of information about the package and related projects is provided.
Maintained by Maciej Eder. Last updated 3 months ago.
3.8 match 187 stars 8.58 score 462 scripts
skranz
RTutor:Interactive R problem sets with automatic testing of solutions and automatic hints
Interactive R problem sets with automatic testing of solutions and automatic hints
Maintained by Sebastian Kranz. Last updated 1 years ago.
economics learn-to-code problem-set rstudio rtutor shiny teaching
5.3 match 205 stars 5.83 score 111 scripts 1 dependents
mlr-org
mlr3verse:Easily Install and Load the 'mlr3' Package Family
The 'mlr3' package family is a set of packages for machine-learning purposes built in a modular fashion. This wrapper package is aimed to simplify the installation and loading of the core 'mlr3' packages. Get more information about the 'mlr3' project at <https://mlr3book.mlr-org.com/>.
Maintained by Marc Becker. Last updated 3 months ago.
3.3 match 55 stars 8.32 score 720 scripts 1 dependents
bioc
CNAnorm:A normalization method for Copy Number Aberration in cancer samples
Performs ratio, GC content correction and normalization of data obtained using low coverage (one read every 100-10,000 bp) high throughput sequencing. It performs a "discrete" normalization looking for the ploidy of the genome. It will also provide tumour content if at least two ploidy states can be found.
Maintained by Stefano Berri. Last updated 5 months ago.
copynumbervariation sequencing coverage normalization wholegenome dnaseq genomicvariation fortran
6.3 match 4.30 score 6 scripts
shusei-e
RcppJagger:An R Wrapper for Jagger
A wrapper for Jagger, a morphological analyzer proposed in Yoshinaga (2023) <arXiv:2305.19045>. Jagger uses patterns derived from morphological dictionaries and training data sets and applies them from the beginning of the input. This simultaneous and deterministic process enables it to effectively perform tokenization, POS tagging, and lemmatization.
Maintained by Shusei Eshima. Last updated 2 years ago.
japanese-nlp morphological-analyser nlp part-of-speech-tagger text-analysis cpp
8.5 match 3 stars 3.18 score 3 scripts
jamesramsay5
fda:Functional Data Analysis
These functions were developed to support functional data analysis as described in Ramsay, J. O. and Silverman, B. W. (2005) Functional Data Analysis. New York: Springer and in Ramsay, J. O., Hooker, Giles, and Graves, Spencer (2009). Functional Data Analysis with R and Matlab (Springer). The package includes data sets and script files that work through many examples, including all but one of the 76 figures in the latter book. Matlab versions are available by ftp from <https://www.psych.mcgill.ca/misc/fda/downloads/FDAfuns/>.
Maintained by James Ramsay. Last updated 4 months ago.
2.3 match 3 stars 11.88 score 2.0k scripts 142 dependents
repboxr
repboxUtils:Utility functions shared by several repbox packages
Utility functions shared by several repbox packages
Maintained by Sebastian Kranz. Last updated 2 months ago.
6.1 match 4.21 score 9 dependents
rpahl
container:Extending Base 'R' Lists
Extends the functionality of base 'R' lists and provides specialized data structures 'deque', 'set', 'dict', and 'dict.table', the latter to extend the 'data.table' package.
Maintained by Roman Pahl. Last updated 3 months ago.
container data-structures deque dict sets
3.3 match 16 stars 7.13 score 140 scripts
knausb
vcfR:Manipulate and Visualize VCF Data
Facilitates easy manipulation of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R a parser function extracts matrices of data. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete data may be written to a VCF file (*.vcf.gz). It also may be converted into other popular R objects (e.g., genlight, DNAbin). VcfR provides a link between VCF data and familiar R software.
Maintained by Brian J. Knaus. Last updated 1 months ago.
genomics population-genetics population-genomics rcpp vcf-data visualization zlib cpp
1.7 match 256 stars 13.66 score 3.1k scripts 19 dependents
r-forge
tm:Text Mining Package
A framework for text mining applications within R.
Maintained by Kurt Hornik. Last updated 1 months ago.
1.8 match 13.00 score 14k scripts 100 dependents
patrickroocks
rPref:Database Preferences and Skyline Computation
Routines to select and visualize the maxima for a given strict partial order. This especially includes the computation of the Pareto frontier, also known as (Top-k) Skyline operator (see Börzsönyi, et al. (2001) <doi:10.1109/ICDE.2001.914855>), and some generalizations known as database preferences (see Kießling (2002) <doi:10.1016/B978-155860869-6/50035-4>).
Maintained by Patrick Roocks. Last updated 2 years ago.
3.3 match 2 stars 6.55 score 115 scripts 4 dependents
pbs-software
PBSmapping:Mapping Fisheries Data and Spatial Analysis Tools
This software has evolved from fisheries research conducted at the Pacific Biological Station (PBS) in 'Nanaimo', British Columbia, Canada. It extends the R language to include two-dimensional plotting features similar to those commonly available in a Geographic Information System (GIS). Embedded C code speeds algorithms from computational geometry, such as finding polygons that contain specified point events or converting between longitude-latitude and Universal Transverse Mercator (UTM) coordinates. Additionally, we include 'C++' code developed by Angus Johnson for the 'Clipper' library, data for a global shoreline, and other data sets in the public domain. Under the user's R library directory '.libPaths()', specifically in './PBSmapping/doc', a complete user's guide is offered and should be consulted to use package functions effectively.
Maintained by Rowan Haigh. Last updated 6 months ago.
1.9 match 11 stars 10.16 score 652 scripts 9 dependents
mrcieu
ieugwasr:Interface to the 'OpenGWAS' Database API
Interface to the 'OpenGWAS' database API <https://api.opengwas.io/api/>. Includes a wrapper to make generic calls to the API, plus convenience functions for specific queries.
Maintained by Gibran Hemani. Last updated 18 days ago.
1.8 match 89 stars 10.71 score 404 scripts 6 dependents
agusnieto77
ACEP:Computational Analysis of Protest Events
The 'ACEP' library contains specific functions for the computational analysis of protest events. It also contains databases with collections of news notes on protests and dictionaries of conflict-related words. The dictionary collection brings together dictionaries from different sources.
Maintained by Agustín Nieto. Last updated 1 years ago.
computer-aided-detection conflict-analysis conflict-detection dictionaries nlp-keywords-extraction protest-events text-mining visualization
3.4 match 10 stars 5.48 score 9 scripts
paithiov909
sudachir2:R Wrapper for 'sudachi.rs'
Offers bindings to 'sudachi.rs' <https://github.com/WorksApplications/sudachi.rs>, a Rust implementation of 'Sudachi' Japanese morphological analyzer.
Maintained by Akiru Kato. Last updated 11 days ago.
7.5 match 3 stars 2.48 score 3 scripts
icosa-grid
icosa:Global Triangular and Penta-Hexagonal Grids Based on Tessellated Icosahedra
Implementation of icosahedral grids in three dimensions. The spherical-triangular tessellation can be set to create grids with custom resolutions. Both the primary triangular and their inverted penta-hexagonal grids can be calculated. Additional functions are provided that allow plotting of the grids and associated data, the interaction of the grids with other raster and vector objects, and treating the grids as graphs.
Maintained by Adam T. Kocsis. Last updated 8 months ago.
3.3 match 4 stars 5.41 score 65 scripts
paithiov909
vibrrt:An R Wrapper for 'vibrato'
An R wrapper for 'vibrato' <https://github.com/daac-tools/vibrato>, a Rust reimplementation of 'MeCab' for fast tokenization.
Maintained by Akiru Kato. Last updated 1 months ago.
7.5 match 2.30 score 1 scripts
pboutros
VennDiagram:Generate High-Resolution Venn and Euler Plots
A set of functions to generate high-resolution Venn and Euler plots. Includes handling for several special cases, including two-case scaling, and extensive customization of plot shape and structure.
Maintained by Paul Boutros. Last updated 3 years ago.
2.0 match 3 stars 8.60 score 5.7k scripts 40 dependents
stefanocoretta
rticulate:Articulatory Data Processing in R
A tool for processing Articulate Assistant Advanced™ (AAA) ultrasound tongue imaging data and Carstens AG500/1 electro-magnetic articulographic data.
Maintained by Stefano Coretta. Last updated 1 months ago.
phonetics software tongue-image ultrasound ultrasound-tongue-imaging
2.9 match 5 stars 5.88 score 17 scripts
kurthornik
NLP:Natural Language Processing Infrastructure
Basic classes and methods for Natural Language Processing.
Maintained by Kurt Hornik. Last updated 4 months ago.
1.8 match 6 stars 9.42 score 1.0k scripts 127 dependents
sigbertklinke
HKRbook:Apps and Data for the Book "Introduction to Statistics"
Functions, Shiny apps and data for the book "Introduction to Statistics" by Wolfgang Karl Härdle, Sigbert Klinke, and Bernd Rönz (2015) <doi:10.1007/978-3-319-17704-5>.
Maintained by Sigbert Klinke. Last updated 2 years ago.
4.5 match 1 stars 3.70 score
bioc
trackViewer:A R/Bioconductor package with web interface for drawing elegant interactive tracks or lollipop plot to facilitate integrated analysis of multi-omics data
Visualize mapped reads along with annotation as track layers for NGS dataset such as ChIP-seq, RNA-seq, miRNA-seq, DNA-seq, SNPs and methylation data.
Maintained by Jianhong Ou. Last updated 5 days ago.
1.9 match 8.68 score 145 scripts 2 dependents
tidymodels
textrecipes:Extra 'Recipes' for Text Processing
Converting text to numerical features requires specifically created procedures, which are implemented as steps according to the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tfidf) and feature hashing.
Maintained by Emil Hvitfeldt. Last updated 13 days ago.
1.3 match 160 stars 10.86 score 964 scripts 1 dependents
bioc
MEDME:Modelling Experimental Data from MeDIP Enrichment
MEDME allows the prediction of absolute and relative methylation levels based on measures obtained by MeDIP-microarray experiments
Maintained by Mattia Pelizzola. Last updated 5 months ago.
microarray cpgisland dnamethylation
3.3 match 4.30 score 2 scripts
liukf10
DDPNA:Disease-Drived Differential Proteins Co-Expression Network Analysis
Functions designed to connect disease-related differential proteins and co-expression networks. It provides basic statistical analysis, including the t test and ANOVA. Network construction is not offered by the package; you can use the 'WGCNA' package, described in Peter et al. (2008) <doi:10.1186/1471-2105-9-559>. It also provides module analysis, including PCA, two enrichment analyses, planar maximally filtered graph extraction and hub analysis.
Maintained by Kefu Liu. Last updated 4 years ago.
4.5 match 2 stars 3.00 score 4 scripts
tobiaskley
quantspec:Quantile-Based Spectral Analysis of Time Series
Methods to determine, smooth and plot quantile periodograms for univariate and multivariate time series.
Maintained by Tobias Kley. Last updated 9 years ago.
2.3 match 10 stars 5.84 score 46 scripts 1 dependents
cran
tensorA:Advanced Tensor Arithmetic with Named Indices
Provides convenience functions for advanced linear algebra with tensors and computation with data sets of tensors on a higher level abstraction. It includes Einstein and Riemann summing conventions, dragging, co- and contravariate indices, parallel computations on sequences of tensors.
Maintained by K. Gerald van den Boogaart. Last updated 1 years ago.
2.3 match 5.83 score 399 dependents
bioc
BioQC:Detect tissue heterogeneity in expression profiles with gene sets
BioQC performs quality control of high-throughput expression data based on tissue gene signatures. It can detect tissue heterogeneity in gene expression data. The core algorithm is a Wilcoxon-Mann-Whitney test that is optimised for high performance.
Maintained by Jitao David Zhang. Last updated 5 months ago.
geneexpression qualitycontrol statisticalmethod genesetenrichment cpp
1.6 match 5 stars 8.16 score 86 scripts
sigbertklinke
exams.forge:Support for Compiling Examination Tasks using the 'exams' Package
The main aim is to further facilitate the creation of exercises based on the package 'exams' by Grün, B., and Zeileis, A. (2009) <doi:10.18637/jss.v029.i10>. Creating effective student exercises involves challenges such as creating appropriate data sets and ensuring access to intermediate values for accurate explanation of solutions. The functionality includes the generation of univariate and bivariate data including simple time series, functions for theoretical distributions and their approximation, statistical and mathematical calculations for tasks in basic statistics courses as well as general tasks such as string manipulation, LaTeX/HTML formatting and the editing of XML task files for 'Moodle'.
Maintained by Sigbert Klinke. Last updated 9 months ago.
4.6 match 2.70 score 1 scripts
oucru-modelling
serosv:Model Infectious Disease Parameters from Serosurveys
An easy-to-use and efficient tool to estimate infectious diseases parameters using serological data. Implemented models include SIR models (basic_sir_model(), static_sir_model(), mseir_model(), sir_subpops_model()), parametric models (polynomial_model(), fp_model()), nonparametric models (lp_model()), semiparametric models (penalized_splines_model()), hierarchical models (hierarchical_bayesian_model()). The package is based on the book "Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective" (Hens, Niel & Shkedy, Ziv & Aerts, Marc & Faes, Christel & Damme, Pierre & Beutels, Philippe., 2013) <doi:10.1007/978-1-4614-4072-7>.
Maintained by Anh Phan Truong Quynh. Last updated 11 days ago.
1.7 match 6.77 score 24 scripts
yufree
pmd:Paired Mass Distance Analysis for GC/LC-MS Based Non-Targeted Analysis and Reactomics Analysis
Paired mass distance (PMD) analysis proposed in Yu, Olkowicz and Pawliszyn (2018) <doi:10.1016/j.aca.2018.10.062> and PMD based reactomics analysis proposed in Yu and Petrick (2020) <doi:10.1038/s42004-020-00403-z> for gas/liquid chromatography–mass spectrometry (GC/LC-MS) based non-targeted analysis. PMD analysis includes the GlobalStd algorithm and structure/reaction directed analysis. The GlobalStd algorithm can find independent peaks in m/z-retention time profiles based on retention time hierarchical cluster analysis and frequency analysis of paired mass distances within retention time groups. Structure directed analysis can be used to find potential relationships among those independent peaks in different retention time groups based on the frequency of paired mass distances. Reactomics analysis can also be performed to build a PMD network, assign sources and discover biomarker reactions. GUIs for PMD analysis are also included as 'shiny' applications.
Maintained by Miao YU. Last updated 11 days ago.
mass-spectrometry metabolomics non-target
1.7 match 10 stars 6.78 score 40 scripts
the-hull
RAPTOR:Row and Position Tracheid Organizer
Performs wood cell anatomical data analyses on spatially explicit xylem (tracheids) datasets derived from thin sections of woody tissue. The package includes functions for visualisation, detection and alignment of continuous tracheid radial file (defined as rows) and individual tracheid position within an annual ring of coniferous species. This package is designed to be used with elaborate cell output, e.g. as provided with ROXAS (von Arx & Carrer, 2014 <doi:10.1016/j.dendro.2013.12.001>). The package has been validated for Picea abies, Larix Siberica, Pinus cembra and Pinus sylvestris.
Maintained by Richard L. Peters. Last updated 4 years ago.
2.3 match 2 stars 4.59 score 39 scripts
dariah-fi-survey-concept-network
finnsurveytext:Analyse Open-Ended Survey Responses in Finnish
Annotates Finnish textual survey responses into CoNLL-U format using Finnish treebanks from <https://universaldependencies.org/format.html> using UDPipe as described in Straka and Straková (2017) <doi:10.18653/v1/K17-3009>. Formatted data is then analysed using single or comparison n-gram plots, wordclouds, summary tables and Concept Network plots. The Concept Network plots use the TextRank algorithm as outlined in Mihalcea, Rada & Tarau, Paul (2004) <https://aclanthology.org/W04-3252/>.
Maintained by Adeline Clarke. Last updated 26 days ago.
1.9 match 5.39 score 27 scripts
crew102
slowraker:A Slow Version of the Rapid Automatic Keyword Extraction (RAKE) Algorithm
A mostly pure-R implementation of the RAKE algorithm (Rose, S., Engel, D., Cramer, N. and Cowley, W. (2010) <doi:10.1002/9780470689646.ch1>), which can be used to extract keywords from documents without any training data.
Maintained by Christopher Baker. Last updated 7 months ago.
1.9 match 6 stars 5.37 score 13 scripts 1 dependents
kurthornik
openNLP:Apache OpenNLP Tools Interface
An interface to the Apache OpenNLP tools (version 1.5.3). The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text written in Java. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. See <https://opennlp.apache.org/> for more information.
Maintained by Kurt Hornik. Last updated 5 years ago.
1.8 match 4 stars 5.48 score 386 scripts 8 dependents
adrian-bowman
rpanel:Simple Interactive Controls for R using the 'tcltk' Package
A set of functions to build simple GUI controls for R functions. These are built on the 'tcltk' package. Uses could include changing a parameter on a graph by animating it with a slider or a "doublebutton", up to more sophisticated control panels. Some functions for specific graphical tasks, referred to as 'cartoons', are provided.
Maintained by Adrian Bowman. Last updated 2 years ago.
2.3 match 1 stars 4.24 score 157 scripts 9 dependents
mw201608
NetWeaver:Graphic Presentation of Complex Genomic and Network Data Analysis
Implements various simple function utilities and flexible pipelines to generate circular images for visualizing complex genomic and network data analysis features.
Maintained by Minghui Wang. Last updated 2 years ago.
2.0 match 4 stars 4.75 score 28 scripts
usaid-mozambique
sismar:Tidy SISMA Data
Provides a set of functions for creating analytical datasets from SISMA and DISA downloads. Includes functions that tidy the files into a long format, remove unnecessary variables, and create columns useful for analysis.
Maintained by Joe Lara. Last updated 5 days ago.
1.8 match 2 stars 5.23 score 9 scripts
rkbauer
oceanmap:A Plotting Toolbox for 2D Oceanographic Data
Plotting toolbox for 2D oceanographic data (satellite data, sea surface temperature, chlorophyll, ocean fronts & bathymetry). Recognized classes and formats include netcdf, Raster, '.nc' and '.gz' files.
Maintained by Robert K. Bauer. Last updated 1 years ago.
bathymetrychlaggplotmapping-toolsncdfoceanographic-dataremote-sensingsatellite-imspatial-datasst
2.0 match 4 stars 4.54 score 58 scripts 1 dependents
bioc
TransView:Read density map construction and accession. Visualization of ChIPSeq and RNASeq data sets
This package provides efficient tools to generate, access and display read densities of sequencing based data sets such as from RNA-Seq and ChIP-Seq.
Maintained by Julius Muller. Last updated 2 months ago.
immunooncology dnamethylation geneexpression transcription microarray sequencing chipseq rnaseq methylseq dataimport visualization clustering multiplecomparison curl bzip2 xz-utils zlib
3.3 match 2.60 score
tathey
VLF:Frequency Matrix Approach for Assessing Very Low Frequency Variants in Sequence Records
Using frequency matrices, very low frequency variants (VLFs) are assessed for amino acid and nucleotide sequences. The VLFs are then compared to see if they occur in only one member of a species, singleton VLFs, or if they occur in multiple members of a species, shared VLFs. The amino acid and nucleotide VLFs are then compared to see if they are concordant with one another. Amino acid VLFs are also assessed to determine if they lead to a change in amino acid residue type, and potential changes to protein structures. Based on Stoeckle and Kerr (2012) <doi:10.1371/journal.pone.0043992>.
Maintained by Taryn B. T. Athey. Last updated 3 years ago.
3.9 match 2.16 score 48 scripts 1 dependents
quadrama
DramaAnalysis:Analysis of Dramatic Texts
Analysis of preprocessed dramatic texts, with respect to literary research. The package provides functions to analyze and visualize information about characters, stage directions, the dramatic structure and the text itself. The dramatic texts are expected to be in CSV format, which can be installed from within the package; sample texts are provided. The package and the reasoning behind it are described in Reiter et al. (2017) <doi:10.18420/in2017_119>.
Maintained by Nils Reiter. Last updated 5 years ago.
corpus-linguistics digital-humanities drama dramatic-texts statistics
1.7 match 15 stars 4.79 score 41 scripts
nagodem
rebmix:Finite Mixture Modeling, Clustering & Classification
Random univariate and multivariate finite mixture model generation, estimation, clustering, latent class analysis and classification. Variables can be continuous, discrete, independent or dependent and may follow normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or circular von Mises parametric families.
Maintained by Marko Nagode. Last updated 9 months ago.
3.0 match 1 stars 2.66 score 43 scripts
idslme
IDSL.FSA:Fragmentation Spectra Analysis (FSA)
The 'IDSL.FSA' package was designed to annotate standard .msp (mass spectra format) and .mgf (Mascot generic format) files using mass spectral entropy similarity, dot product (cosine) similarity, and normalized Euclidean mass error (NEME) followed by intelligent pre-filtering steps for rapid spectra searches. 'IDSL.FSA' also provides a number of modules to convert and manipulate .msp and .mgf files. The 'IDSL.FSA' workflow was integrated in the 'IDSL.CSA' and 'IDSL.NPA' packages introduced in <doi:10.1021/acs.analchem.3c00376>.
Maintained by Dinesh Barupal. Last updated 8 months ago.
fragmentation-spectra mass-spectrometry massbank mgf mgf-parser msp msp-parser spectral-entropy
1.9 match 1 star 3.48 score 2 dependents
khanna-lab
influenceR:Software Tools to Quantify Structural Importance of Nodes in a Network
Provides functionality to compute various node centrality measures on networks. Included are functions to compute betweenness centrality (by utilizing Madduri and Bader's SNAP library); implementations of constraint and effective network size by Burt (2000) <doi:10.1016/S0191-3085(00)22009-1>; an algorithm to identify key players by Borgatti (2006) <doi:10.1007/s10588-006-7084-x>; and the bridging algorithm by Valente and Fujimoto (2010) <doi:10.1016/j.socnet.2010.03.003>. On Unix systems, the betweenness, Key Players, and bridging implementations are parallelized with OpenMP, which may run faster on systems that have OpenMP configured.
Maintained by Aditya Khanna. Last updated 2 years ago.
1.8 match 1 star 3.61 score 24 scripts
pariya
netgwas:Network-Based Genome Wide Association Studies
A multi-core R package that contains a set of tools based on copula graphical models for accomplishing three interrelated goals in genetics and genomics in a unified way: (1) linkage map construction, (2) constructing linkage disequilibrium networks, and (3) exploring high-dimensional genotype-phenotype networks and genotype-phenotype-environment interaction networks. The 'netgwas' package can deal with biparental inbreeding and outbreeding species with any ploidy level, namely diploid (2 sets of chromosomes), triploid (3 sets of chromosomes), tetraploid (4 sets of chromosomes) and so on. It targets high-dimensional data where the number of variables p is considerably larger than the sample size n (p >> n). The computation is memory-optimized using sparse matrix output. The 'netgwas' package implements the methodological developments in Behrouzi and Wit (2017) <doi:10.1111/rssc.12287> and Behrouzi and Wit (2017) <doi:10.1093/bioinformatics/bty777>.
Maintained by Pariya Behrouzi. Last updated 1 year ago.
2.3 match 3 stars 2.65 score 5 scripts 1 dependents
strategicprojects
pikchr:R Wrapper for 'pikchr' (PIC) Diagram Language
An 'R' interface to 'pikchr' (<https://pikchr.org>, pronounced “picture”), a 'PIC'-like markup language for creating diagrams within technical documentation. Originally developed by Brian Kernighan, 'PIC' has been adapted into 'pikchr' by D. Richard Hipp, the creator of 'SQLite'. 'pikchr' is designed to be embedded in fenced code blocks of Markdown or other documentation markup languages, making it ideal for generating diagrams in text-based formats. This package allows R users to seamlessly integrate the descriptive syntax of 'pikchr' for diagram creation directly within the 'R' environment.
Maintained by Andre Leite. Last updated 17 hours ago.
1.1 match 1 star 4.90 score 7 scripts
martinzaefferer
CEGO:Combinatorial Efficient Global Optimization
Model building, surrogate model based optimization and Efficient Global Optimization in combinatorial or mixed search spaces.
Maintained by Martin Zaefferer. Last updated 3 months ago.
1.8 match 1 star 3.04 score 73 scripts
repboxr
repboxStata:Repbox analysis of stata scripts in reproduction packages
Repbox analysis of stata scripts in reproduction packages
Maintained by Sebastian Kranz. Last updated 2 months ago.
2.0 match 2.73 score 4 scripts 2 dependents
taylor-arnold
coreNLP:Wrappers Around Stanford CoreNLP Tools
Provides a minimal interface for applying annotators from the 'Stanford CoreNLP' java library. Methods are provided for tasks such as tokenisation, part of speech tagging, lemmatisation, named entity recognition, coreference detection and sentiment analysis.
Maintained by Taylor Arnold. Last updated 3 years ago.
1.8 match 1 star 3.04 score 55 scripts
chrchang
pgenlibr:PLINK 2 Binary (.pgen) Reader
A thin wrapper over PLINK 2's core libraries which provides an R interface for reading .pgen files. A minimal .pvar loader is also included. Chang et al. (2015) <doi:10.1186/s13742-015-0047-8>.
Maintained by Christopher Chang. Last updated 3 months ago.
1.7 match 2.98 score 64 scripts
cran
qtlhot:Inference for QTL Hotspots
Functions to infer co-mapping trait hotspots and causal models. Chaibub Neto E, Keller MP, Broman AF, Attie AD, Jansen RC, Broman KW, Yandell BS (2012) Quantile-based permutation thresholds for QTL hotspots. Genetics 191 : 1355-1365. <doi:10.1534/genetics.112.139451>. Chaibub Neto E, Broman AT, Keller MP, Attie AD, Zhang B, Zhu J, Yandell BS (2013) Modeling causality for pairs of phenotypes in system genetics. Genetics 193 : 1003-1013. <doi:10.1534/genetics.112.147124>.
Maintained by Brian S. Yandell. Last updated 7 years ago.
1.8 match 2.30 score
consmavr
FLR:Fuzzy Logic Rule Classifier
FLR algorithm for classification
Maintained by Constantinos Mavridis. Last updated 13 years ago.
2.3 match 1.78 score 3 scripts
skranz
rmdtools:Tools for RMarkdown
Tools for RMarkdown
Maintained by Sebastian Kranz. Last updated 4 years ago.
1.9 match 1 star 1.78 score 6 scripts 2 dependents
cran
Tex4exams:Generating 'Sweave' Code for 'R/exams' Questions in Mathematics
When using the R package 'exams' to write mathematics questions in 'Sweave' files, the output of many R functions needs to be adjusted for display in mathematical formulas. The functions in this package were collected while writing questions for the mathematics courses College Algebra, Precalculus, Calculus, Differential Equations, Introduction to Probability, and Linear Algebra. Their output can be used in 'Sweave' files.
Maintained by Qingwen Hu. Last updated 2 years ago.
3.3 match 1.00 score
madanstat
LongCART:Recursive Partitioning for Longitudinal Data and Right Censored Data Using Baseline Covariates
Constructs trees for continuous longitudinal data and survival data using baseline covariates as partitioning variables, according to the 'LongCART' and 'SurvCART' algorithms, respectively. Also includes functions to calculate conditional power and predictive power of success based on interim results, and probability of success for a prospective trial.
Maintained by Madan G Kundu. Last updated 3 years ago.
3.3 match 1.00 score 4 scripts
tconwell
textTools:Functions for Text Cleansing and Text Analysis
A framework for text cleansing and analysis. Conveniently prepare and process large amounts of text for analysis. Includes various metrics for word counts/frequencies that scale efficiently. Quickly analyze large amounts of text data using a text.table (a data.table created with one word (or unit of text analysis) per row, similar to the tidytext format). Offers flexibility to efficiently work with text data stored in vectors as well as text data formatted as a text.table.
Maintained by Timothy Conwell. Last updated 4 years ago.
3.3 match 1.00 score 4 scripts
computationalstylistics
litRiddle:Dataset and Tools to Research the Riddle of Literary Quality
Dataset and functions to explore quality of literary novels. The package is a part of the Riddle of Literary Quality project, and it contains the data of a reader survey about fiction in Dutch, a description of the novels the readers rated, and the results of stylistic measurements of the novels. The package also contains functions to combine, analyze, and visualize these data. For more details, see: Eder M, van Zundert J, Lensink S, van Dalen-Oskam K (2022). Replicating The Riddle of Literary Quality: The litRiddle package for R. In _Digital Humanities 2022: Conference Abstracts_, 636-637.
Maintained by Maciej Eder. Last updated 2 years ago.
1.2 match 2.70 score 2 scripts
cran
chinese.misc:Miscellaneous Tools for Chinese Text Mining and More
Efforts are made to make Chinese text mining easier, faster, and robust to errors. A document term matrix can be generated with a single line of code; detecting encoding, segmenting and removing stop words are done automatically. Some convenient tools are also supplied.
Maintained by Jiang Wu. Last updated 5 years ago.
1.8 match 1.78 score 2 dependents
cran
RegKink:Regression Kink with a Time-Varying Threshold
Provides an algorithm to estimate the regression kink model proposed by Lixiong Yang and Jen-Je Su (2018) <doi:10.1016/j.jimonfin.2018.06.002>.
Maintained by Lixiong Yang. Last updated 4 years ago.
2.3 match 1 star 1.00 score
prabinameher
EncDNA:Encoding of Nucleotide Sequences into Numeric Feature Vectors
We describe fifteen different splice site sequence encoding schemes that have been used in earlier studies for mapping of splice site sequences into numeric feature vectors. These encoding schemes will also be helpful for transforming other nucleotide sequences into numeric forms, provided they are of equal length. These encoding schemes will help the computational biologist working in the field of classification (binary or multiclass) or prediction involving nucleic acid sequences of equal length.
Maintained by Prabina Kumar Meher. Last updated 6 years ago.
2.3 match 1 star 1.00 score
schweflo
NLPclient:Stanford 'CoreNLP' Annotation Client
Stanford 'CoreNLP' annotation client. Stanford 'CoreNLP' <https://stanfordnlp.github.io/CoreNLP/index.html> integrates all NLP tools from the Stanford Natural Language Processing Group, including a part-of-speech (POS) tagger, a named entity recognizer (NER), a parser, and a coreference resolution system, and provides model files for the analysis of English. More information can be found in the README.
Maintained by Florian Schwendinger. Last updated 5 years ago.
0.5 match 1.70 score