R-universe search: linguistics

ropensci

lingtypology:Linguistic Typology and Mapping

Provides R with the Glottolog database <https://glottolog.org/> and some additional functionality for linguistic mapping. The Glottolog database contains a catalogue of the languages of the world. This package helps researchers make linguistic maps, following the philosophy of the Cross-Linguistic Linked Data project <https://clld.org/>, which facilitates uniform access to the data across publications. A tutorial for this package is available on GitHub pages <https://docs.ropensci.org/lingtypology/> and in the package vignette. Maps created by this package can be used for both research and linguistics teaching. In addition, the package provides the ability to download data from typological databases such as WALS, AUTOTYP and some others and to create your own database website.

Maintained by George Moroz. Last updated 5 months ago.

abvd afbo atlas autotype bivaltyp clld glottolog-database linguistic-maps linguistics phoible sails typology wals

21.8 match 51 stars 9.58 score 694 scripts

beerda

lfl:Linguistic Fuzzy Logic

Various algorithms related to linguistic fuzzy logic: mining for linguistic fuzzy association rules, composition of fuzzy relations, performing perception-based logical deduction (PbLD), and forecasting time-series using fuzzy rule-based ensemble (FRBE). The package also contains basic fuzzy-related algebraic functions capable of handling missing values in different styles (Bochvar, Sobocinski, Kleene etc.), computation of Sugeno integrals and fuzzy transform.

Maintained by Michal Burda. Last updated 4 months ago.

association-rules forecast-model fuzzy-logic inference-rules cpp openmp

20.4 match 8 stars 5.35 score 28 scripts

sylvainloiseau

interlineaR:Importing Interlinearized Corpora and Dictionaries as Produced by Descriptive Linguistics Software

Interlinearized glossed texts (IGT) are used in descriptive linguistics for representing a morphological analysis of a text through a morpheme-by-morpheme gloss. 'InterlineaR' provides a set of functions that target several popular formats of IGT ('SIL Toolbox', 'EMELD XML') and that turn an IGT into a set of data frames following a relational model (the tables represent the different linguistic units: texts, sentences, words, morphemes). The same pieces of software ('SIL FLEX', 'SIL Toolbox') typically produce dictionaries of the morphemes used in the glosses. 'InterlineaR' provides a function for turning the LIFT XML dictionary format into a set of data frames following a relational model in order to represent the dictionary entries, the sense(s) attached to the entries, the example(s) attached to senses, etc.

Maintained by Sylvain Loiseau. Last updated 7 years ago.

corpus-linguistics descriptive-linguistics dictionaries interlinear-gloss

20.3 match 4 stars 4.60 score 9 scripts

agricolamz

lingglosses:Interlinear Glossed Linguistic Examples and Abbreviation Lists Generation

Helps to render interlinear glossed linguistic examples in html 'rmarkdown' documents and then semi-automatically compiles the list of glosses at the end of the document. It also provides a database of linguistic glosses.

Maintained by George Moroz. Last updated 10 days ago.

glosses glosses-list interlinear-gloss language-documentation linguistics rmarkdown typology

13.7 match 15 stars 5.88 score 167 scripts

k3jph

phonics:Phonetic Spelling Algorithms

Provides a collection of phonetic algorithms including Soundex, Metaphone, NYSIIS, Caverphone, and others. The package is documented in <doi:10.18637/jss.v095.i08>.

Maintained by James Howard. Last updated 4 years ago.

bsd-2-license linguistics metaphone nysiis phonetic-spelling-algorithms phonics record-linkage soundex text-processing cpp

10.0 match 30 stars 7.18 score 56 scripts 3 dependents
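
The Soundex algorithm named in the phonics entry above can be sketched as follows. This is a minimal Python illustration of the classic American Soundex rules, not the package's own implementation; the function name `soundex` is chosen here for illustration only.

```python
def soundex(name: str) -> str:
    """Classic American Soundex: first letter plus three digits."""
    # Consonant groups that share a digit; vowels, H, W and Y have no code.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W do not separate consonants that share a code.
            continue
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        # A vowel resets prev, so a repeated code after it counts again.
        prev = code
    # Pad with zeros or truncate to four characters.
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))
```

Names that sound alike, such as "Robert" and "Rupert", receive the same code, which is what makes the method useful for record linkage.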

liao961120

linguisticsdown:Easy Linguistics Document Writing with R Markdown

Provides 'Shiny gadgets' to search, type, and insert IPA symbols into documents or scripts, requiring only knowledge about phonetics or 'X-SAMPA'. Also provides functions to facilitate the rendering of IPA symbols in 'LaTeX' and PDF format, making IPA symbols properly rendered in all output formats. A minimal R Markdown template for authoring linguistics-related documents is also bundled with the package. Some helper functions to facilitate authoring with R Markdown are also provided.

Maintained by Yongfu Liao. Last updated 6 years ago.

linguistics rmarkdown rmarkdown-template

13.4 match 26 stars 4.59 score 30 scripts

seancarmody

ngramr:Retrieve and Plot Google n-Gram Data

Retrieve and plot word frequencies through time from the "Google Ngram Viewer" <https://books.google.com/ngrams>.

Maintained by Sean Carmody. Last updated 2 months ago.

linguistics

10.0 match 49 stars 5.79 score 42 scripts

cran

FuzzySTs:Fuzzy Statistical Tools

The main goal of this package is to present various fuzzy statistical tools. It intends to provide an implementation of the theoretical and empirical approaches presented in the book entitled "The signed distance measure in fuzzy statistical analysis. Some theoretical, empirical and programming advances" <doi:10.1007/978-3-030-76916-1>. For the theoretical approaches, see Berkachy R. and Donze L. (2019) <doi:10.1007/978-3-030-03368-2_1>. For the empirical approaches, see Berkachy R. and Donze L. (2016) <ISBN: 978-989-758-201-1>. Important (non-exhaustive) implementation highlights of this package are as follows: (1) a numerical procedure to estimate the fuzzy difference and the fuzzy square; (2) two numerical methods of fuzzification; (3) a function computing different distances, including the signed distance and the generalized signed distance with all its properties; (4) numerical estimations of fuzzy statistical measures such as the variance, the moment, etc.; (5) two methods of estimation of the bootstrap distribution of the likelihood ratio in the fuzzy context; (6) an estimation of a fuzzy confidence interval by the likelihood ratio method; (7) testing fuzzy hypotheses and/or fuzzy data by fuzzy confidence intervals in the Kwakernaak - Kruse and Meyer sense; (8) a general method to estimate the fuzzy p-value with fuzzy hypotheses and/or fuzzy data; (9) a method of estimation of global and individual evaluations of linguistic questionnaires; (10) numerical estimations of multi-ways analysis of variance models in the fuzzy context. Unbalanced designs are also supported.

Maintained by Redina Berkachy. Last updated 8 months ago.

17.0 match 3.40 score

patrickreidy

textgRid:Praat TextGrid Objects in R

The software application Praat can be used to annotate waveform data (e.g., to mark intervals of interest or to label events). (See <http://www.fon.hum.uva.nl/praat/> for more information about Praat.) These annotations are stored in a Praat TextGrid object, which consists of a number of interval tiers and point tiers. An interval tier consists of sequential (i.e., not overlapping) labeled intervals. A point tier consists of labeled events that have no duration. The 'textgRid' package provides S4 classes, generics, and methods for accessing information that is stored in Praat TextGrid objects.

Maintained by Patrick Reidy. Last updated 7 years ago.

acoustic-phonetics linguistics praat speech-science textgrid

10.0 match 24 stars 4.58 score 16 scripts

masterclm

mclm:Mastering Corpus Linguistics Methods

Read, inspect and process corpus files for quantitative corpus linguistics. Obtain concordances via regular expressions, tokenize texts, and compute frequencies and association measures. Useful for collocation analysis, keywords analysis and variationist studies (comparison of linguistic variants and of linguistic varieties).

Maintained by Mariana Montes. Last updated 2 years ago.

corpus linguistics cpp

14.1 match 1 star 3.24 score 35 scripts

ropensci

phonfieldwork:Linguistic Phonetic Fieldwork Tools

Many typical tasks have to be solved during phonetic research and experiments. These include creating a presentation that will contain all stimuli, renaming and concatenating multiple sound files recorded during a session, automatic annotation in 'Praat' TextGrids (this is one of the sound annotation standards provided by 'Praat' software, see Boersma & Weenink 2020 <https://www.fon.hum.uva.nl/praat/>), creating an html table with annotations and spectrograms, and converting multiple formats ('Praat' TextGrid, 'ELAN', 'EXMARaLDA', 'Audacity', subtitles '.srt', and 'FLEx' flextext). All of these tasks can be solved by a mixture of different tools (any programming language has programs for automatic renaming, and Praat contains scripts for concatenating and renaming files, etc.). 'phonfieldwork' provides functionality that makes it easier to solve those tasks independently of any additional tools. You can also compare the functionality with other packages: 'rPraat' <https://CRAN.R-project.org/package=rPraat>, 'textgRid' <https://CRAN.R-project.org/package=textgRid>.

Maintained by George Moroz. Last updated 8 months ago.

audacity eaf elan exb exmaralda fieldwork flextext phonetics phonology praat srt-subtitles textgrid

5.9 match 20 stars 6.68 score 20 scripts

tallguyjenks

runes:Convert Strings to Elder Futhark Runes

Convert a string of text characters to Elder Futhark Runes <https://en.wikipedia.org/wiki/Elder_Futhark>.

Maintained by Bryan Jenks. Last updated 4 years ago.

bryan-jenks elder-futhark-runes futhark futhark-runes linguistics nordic rstudio rune runes

10.0 match 10 stars 3.70 score 2 scripts

quadrama

DramaAnalysis:Analysis of Dramatic Texts

Analysis of preprocessed dramatic texts, with respect to literary research. The package provides functions to analyze and visualize information about characters, stage directions, the dramatic structure and the text itself. The dramatic texts are expected to be in CSV format, which can be installed from within the package; sample texts are provided. The package and the reasoning behind it are described in Reiter et al. (2017) <doi:10.18420/in2017_119>.

Maintained by Nils Reiter. Last updated 4 years ago.

corpus-linguistics digital-humanities drama dramatic-texts statistics

7.5 match 15 stars 4.79 score 41 scripts

glottospace

glottospace:Language Mapping and Geospatial Analysis of Linguistic and Cultural Data

Streamlined workflows for geolinguistic analysis, including: accessing global linguistic and cultural databases, data import, data entry, data cleaning, data exploration, mapping, visualization and export.

Maintained by Rui Dong. Last updated 3 months ago.

5.2 match 23 stars 5.54 score 6 scripts

miserman

lingmatch:Linguistic Matching and Accommodation

Measure similarity between texts. Offers a variety of processing tools and similarity metrics to facilitate flexible representation of texts and matching. Implements forms of Language Style Matching (Ireland & Pennebaker, 2010) <doi:10.1037/a0020386> and Latent Semantic Analysis (Landauer & Dumais, 1997) <doi:10.1037/0033-295X.104.2.211>.

Maintained by Micah Iserman. Last updated 25 days ago.

nlp rcpp text-analysis cpp

5.3 match 11 stars 4.80 score 23 scripts

mspeekenbrink

sdamr:Statistics: Data Analysis and Modelling

Data sets and functions to support the books "Statistics: Data analysis and modelling" by Speekenbrink, M. (2021) <https://mspeekenbrink.github.io/sdam-book/> and "An R companion to Statistics: data analysis and modelling" by Speekenbrink, M. (2021) <https://mspeekenbrink.github.io/sdam-r-companion/>. All datasets analysed in these books are provided in this package. In addition, the package provides functions to compute sample statistics (variance, standard deviation, mode), create raincloud and enhanced Q-Q plots, and expand Anova results into omnibus tests and tests of individual contrasts.

Maintained by Maarten Speekenbrink. Last updated 1 month ago.

4.8 match 5 stars 4.39 score 99 scripts

dselivanov

text2vec:Modern Text Mining Framework for R

Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), similarities. This package provides a source-agnostic streaming API, which allows researchers to perform analysis of collections of documents which are larger than available RAM. All core functions are parallelized to benefit from multicore machines.

Maintained by Dmitriy Selivanov. Last updated 7 months ago.

glove latent-dirichlet-allocation natural-language-processing text-mining topic-modeling vectorization word-embeddings word2vec cpp

1.5 match 860 stars 13.48 score 1.3k scripts 23 dependents

bnosac

udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. Namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns) sentiment scoring and semantic similarity analysis.

Maintained by Jan Wijffels. Last updated 2 years ago.

conll dependency-parser lemmatization natural-language-processing nlp pos-tagging r-pkg rcpp text-mining tokenizer udpipe cpp

1.6 match 215 stars 11.83 score 1.2k scripts 9 dependents

alexchristensen

SemNetCleaner:An Automated Cleaning Tool for Semantic and Linguistic Data

Implements several functions that automate the cleaning and spell-checking of text data. Also converges, finalizes, removes plurals and continuous strings, and puts text data in binary format for semantic network analysis. Uses the 'SemNetDictionaries' package to make the cleaning process more accurate, efficient, and reproducible.

Maintained by Alexander P. Christensen. Last updated 3 years ago.

preprocessing semantic-network-analysis

2.9 match 10 stars 6.16 score 48 scripts 1 dependent

cran

frbs:Fuzzy Rule-Based Systems for Classification and Regression Tasks

An implementation of various learning algorithms based on fuzzy rule-based systems (FRBSs) for dealing with classification and regression tasks. Moreover, it allows users to construct an FRBS model defined by human experts. FRBSs are based on the concept of fuzzy sets, proposed by Zadeh in 1965, which aims at representing the reasoning of human experts in a set of IF-THEN rules, to handle real-life problems in, e.g., control, prediction and inference, data mining, bioinformatics data processing, and robotics. FRBSs are also known as fuzzy inference systems and fuzzy models. During the modeling of an FRBS, there are two important steps that need to be conducted: structure identification and parameter estimation. Nowadays, there exists a wide variety of algorithms to generate fuzzy IF-THEN rules automatically from numerical data, covering both steps. Approaches that have been used in the past are, e.g., heuristic procedures, neuro-fuzzy techniques, clustering methods, genetic algorithms, squares methods, etc. Furthermore, in this version we provide a universal framework named 'frbsPMML', which is adopted from the Predictive Model Markup Language (PMML), for representing FRBS models. PMML is an XML-based language to provide a standard for describing models produced by data mining and machine learning algorithms. Therefore, an FRBS model can be exported to and imported from 'frbsPMML'. Finally, this package aims to implement the most widely used standard procedures, thus offering a standard package for FRBS modeling to the R community.

Maintained by Christoph Bergmeir. Last updated 5 years ago.

3.5 match 12 stars 5.03 score 74 scripts 1 dependent

r-forge

corpora:Statistics and Data Sets for Corpus Frequency Data

Utility functions for the statistical analysis of corpus frequency data. This package is a companion to the open-source course "Statistical Inference: A Gentle Introduction for Computational Linguists and Similar Creatures" ('SIGIL').

Maintained by Stephanie Evert. Last updated 1 month ago.

5.8 match 3.01 score 34 scripts

cran

languageR:Analyzing Linguistic Data: A Practical Introduction to Statistics

Data sets exemplifying statistical methods, and some facilitatory utility functions used in "Analyzing Linguistic Data: A practical introduction to statistics using R", Cambridge University Press, 2008.

Maintained by R. H. Baayen. Last updated 6 years ago.

5.2 match 2.32 score

fncokg

lingdist:Fast Linguistic Distance and Alignment Computation

A fast generalized edit distance and string alignment computation, mainly for linguistic purposes. As a generalization of the classic edit distance algorithms, the package allows users to define custom costs for every symbol's insertion, deletion, and substitution. The package also allows character combinations of any length to be treated as a single symbol, which is very useful for International Phonetic Alphabet (IPA) transcriptions with diacritics. In addition to the edit distance result, users can get detailed alignment information, such as all possible alignment scenarios between two strings, which is useful for testing, illustration or any further usage. Either the distance matrix or its long table form can be obtained, and tools for converting between the two are provided. All functions in the package are implemented in 'C++' and the distance matrix computation is parallelized leveraging the 'RcppThread' package.

Maintained by Chao Kong. Last updated 1 year ago.

language-distance rcpp cpp

3.5 match 3 stars 3.18 score 1 script
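
The generalized edit distance described in the lingdist entry above can be illustrated with a short dynamic-programming sketch in Python. It is not the package's API: the function name, the `sub_cost` dictionary, and the single `indel_cost` parameter are assumptions made for illustration. Each string is passed as a list of symbols, so a multi-character unit such as an IPA segment with a diacritic counts as one symbol.

```python
def generalized_edit_distance(a, b, sub_cost=None, indel_cost=1.0):
    """Edit distance over symbol sequences with custom substitution costs.

    a, b       -- sequences of symbols (each symbol may be several characters)
    sub_cost   -- hypothetical dict mapping (x, y) pairs to substitution costs;
                  unlisted pairs of distinct symbols cost 1.0
    indel_cost -- cost of inserting or deleting one symbol
    """
    sub_cost = sub_cost or {}

    def substitution(x, y):
        if x == y:
            return 0.0
        # Look the pair up in either order so the table can be one-sided.
        return sub_cost.get((x, y), sub_cost.get((y, x), 1.0))

    n, m = len(a), len(b)
    # d[i][j] holds the cheapest way to turn a[:i] into b[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,                          # deletion
                d[i][j - 1] + indel_cost,                          # insertion
                d[i - 1][j - 1] + substitution(a[i - 1], b[j - 1]),  # substitution
            )
    return d[n][m]


# An aspirated and a plain stop are treated as close to each other.
costs = {("pʰ", "p"): 0.2}
print(generalized_edit_distance(["pʰ", "a", "t"], ["p", "a", "t"], costs))
```

With all costs left at their defaults the function reduces to the classic Levenshtein distance.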

sammo3182

regioncode:Convert Region Names and Division Codes of China Over Years

A tool to overcome the difficulties of converting various region names and administrative division codes of Chinese regions. The current version enables seamless conversion between Chinese regions' formal names, commonly used names, and codes at the city level from 1986 to 2019.

Maintained by Yue Hu. Last updated 3 months ago.

1.5 match 11 stars 6.48 score 14 scripts

cysouw

qlcData:Processing Data for Quantitative Language Comparison

Functionality to read, recode, and transcode data as used in quantitative language comparison, specifically to deal with multilingual orthographic variation (Moran & Cysouw (2018) <doi:10.5281/zenodo.1296780>) and with the recoding of nominal data.

Maintained by Michael Cysouw. Last updated 9 months ago.

1.8 match 3 stars 5.38 score 40 scripts

prabhanjan-tattar

ACSWR:A Companion Package for the Book "A Course in Statistics with R"

A book designed to meet the requirements of masters students. Tattar, P.N., Suresh, R., and Manjunath, B.G. "A Course in Statistics with R", J. Wiley, ISBN 978-1-119-15272-9.

Maintained by Prabhanjan Tattar. Last updated 10 years ago.

3.8 match 2.03 score 106 scripts

cysouw

qlcVisualize:Visualization for Quantitative Language Comparison

Collection of visualizations as used in quantitative language comparison. Currently implemented are visualisations dealing with nominal data with multiple levels ("level map" and "factor map"), and assistance for making weighted geographical Voronoi-maps ("weighted map").

Maintained by Michael Cysouw. Last updated 6 months ago.

1.7 match 4.03 score 24 scripts

jimbrig

jimstools:Tools for R

What the package does (one paragraph).

Maintained by Jimmy Briggs. Last updated 3 years ago.

functions personal utility

2.0 match 2 stars 3.00 score 2 scripts

bnosac

nametagger:Named Entity Recognition in Texts using 'NameTag'

Wraps the 'nametag' library <https://github.com/ufal/nametag>, allowing users to find and extract entities (names, persons, locations, addresses, ...) in raw text and build their own entity recognition models. Based on a maximum entropy Markov model which is described in Strakova J., Straka M. and Hajic J. (2013) <https://ufal.mff.cuni.cz/~straka/papers/2013-tsd_ner.pdf>.

Maintained by Jan Wijffels. Last updated 1 year ago.

ner cpp

1.6 match 11 stars 3.74 score 8 scripts

browndw

pseudobibeR:Aggregate Counts of Linguistic Features

Calculates the lexicogrammatical and functional features described by Biber (1985) <doi:10.1515/ling.1985.23.2.337> and widely used for text-type, register, and genre classification tasks.

Maintained by David Brown. Last updated 4 months ago.

3.1 match 1.70 score 3 scripts

samlestrade2

MoLE:Modeling Language Evolution

Model for simulating language evolution in terms of cultural evolution (Smith & Kirby (2008) <DOI:10.1098/rstb.2008.0145>; Deacon 1997). The focus is on the emergence of argument-marking systems (Dowty (1991) <DOI:10.1353/lan.1991.0021>, Van Valin 1999, Dryer 2002, Lestrade 2015a), i.e. noun marking (Aristar (1997) <DOI:10.1075/sl.21.2.04ari>, Lestrade (2010) <DOI:10.7282/T3ZG6R4S>), person indexing (Ariel 1999, Dahl (2000) <DOI:10.1075/fol.7.1.03dah>, Bhat 2004), and word order (Dryer 2013), but extensions are foreseen. Agents start out with a protolanguage (a language without grammar; Bickerton (1981) <DOI:10.17169/langsci.b91.109>, Jackendoff 2002, Arbib (2015) <DOI:10.1002/9781118346136.ch27>) and interact through language games (Steels 1997). Over time, grammatical constructions emerge that may or may not become obligatory (for which the tolerance principle is assumed; Yang 2016). Throughout the simulation, uniformitarianism of principles is assumed (Hopper (1987) <DOI:10.3765/bls.v13i0.1834>, Givon (1995) <DOI:10.1075/z.74>, Croft (2000), Saffran (2001) <DOI:10.1111/1467-8721.01243>, Heine & Kuteva 2007), in which maximal psychological validity is aimed at (Grice (1975) <DOI:10.1057/9780230005853_5>, Levelt 1989, Gaerdenfors 2000) and language representation is usage based (Tomasello 2003, Bybee 2010). In Lestrade (2015b) <DOI:10.15496/publikation-8640>, Lestrade (2015c) <DOI:10.1075/avt.32.08les>, and Lestrade (2016) <DOI:10.17617/2.2248195>, which reported on the results of preliminary versions, this package was announced as WDWTW (for who does what to whom), but for reasons of pronunciation and generalization the title was changed.

Maintained by Sander Lestrade. Last updated 7 years ago.

2.0 match 2.46 score 58 scripts

j-ll

GeoFIS:Spatial Data Processing for Decision Making

Methods for processing spatial data for decision-making. This package is an R implementation of methods provided by the open source software GeoFIS <https://www.geofis.org> (Leroux et al. 2018) <doi:10.3390/agriculture8060073>. The main functionalities are the management zone delineation (Pedroso et al. 2010) <doi:10.1016/j.compag.2009.10.007> and data aggregation (Mora-Herrera et al. 2020) <doi:10.1016/j.compag.2020.105624>.

Maintained by Jean-Luc Lablée. Last updated 3 months ago.

mpfr4 gmp cpp

1.3 match 3.30 score 8 scripts

myeomans

politeness:Detecting Politeness Features in Text

Detecting markers of politeness in English natural language. This package allows researchers to easily visualize and quantify politeness between groups of documents. This package combines prior research on the linguistic markers of politeness. We thank the Spencer Foundation, the Hewlett Foundation, and Harvard's Institute for Quantitative Social Science for support.

Maintained by Mike Yeomans. Last updated 1 month ago.

0.5 match 25 stars 7.49 score 41 scripts 1 dependent

ropensci

pangoling:Access to Large Language Model Predictions

Provides access to word predictability estimates using large language models (LLMs) based on 'transformer' architectures via integration with the 'Hugging Face' ecosystem. The package interfaces with pre-trained neural networks and supports both causal/auto-regressive LLMs (e.g., 'GPT-2'; Radford et al., 2019) and masked/bidirectional LLMs (e.g., 'BERT'; Devlin et al., 2019, <doi:10.48550/arXiv.1810.04805>) to compute the probability of words, phrases, or tokens given their linguistic context. By enabling a straightforward estimation of word predictability, the package facilitates research in psycholinguistics, computational linguistics, and natural language processing (NLP).

Maintained by Bruno Nicenboim. Last updated 4 days ago.

nlp psycholinguistics transformers

0.8 match 8 stars 4.90 score

oliverehmer

act:Aligned Corpus Toolkit

The Aligned Corpus Toolkit (act) is designed for linguists who work with time-aligned transcription data. It offers functions to import and export various annotation file formats ('ELAN' .eaf, 'EXMARaLDA' .exb and 'Praat' .TextGrid files), create print transcripts in the style of conversation analysis, search transcripts (span searches across multiple annotations, search in normalized annotations, make concordances etc.), export and re-import search results (.csv and 'Excel' .xlsx format), create cuts for the search results (print transcripts, audio/video cuts using 'FFmpeg' and video subtitles in 'SubRip title' .srt format), modify the data in a corpus (search/replace, delete, filter etc.), interact with 'Praat' using 'Praat'-scripts, and exchange data with the 'rPraat' package. The package is itself written in R and may be expanded by other users.

Maintained by Oliver Ehmer. Last updated 2 years ago.

0.5 match 4 stars 6.65 score 184 scripts

myeomans

doc2concrete:Measuring Concreteness in Natural Language

Models for detecting concreteness in natural language. This package is built in support of Yeomans (2021) <doi:10.1016/j.obhdp.2020.10.008>, which reviews linguistic models of concreteness in several domains. Here, we provide an implementation of the best-performing domain-general model (from Brysbaert et al. (2014) <doi:10.3758/s13428-013-0403-5>) as well as two pre-trained models for the feedback and plan-making domains.

Maintained by Mike Yeomans. Last updated 1 year ago.

0.5 match 13 stars 5.59 score 20 scripts 1 dependent

ptaranti

coppeCosenzaR:COPPE-Cosenza Fuzzy Hierarchy Model

The program implements the COPPE-Cosenza Fuzzy Hierarchy Model. The model was based on the evaluation of local alternatives, representing regional potentialities, so as to fulfill demands of economic projects. After defining demand profiles in terms of their technological coefficients, the degree of importance of factors is defined so as to represent the productive activity. The method can detect a surplus of supply without the restriction of the distance of classical algebra, defining a hierarchy of location alternatives. In the COPPE-Cosenza Model, the distance between factors is measured in terms of the difference between grades of membership of the same factors belonging to two or more sets under comparison. The required factors are classified under the following linguistic variables: Critical (CR); Conditioning (C); Little Conditioning (LC); and Irrelevant (I). The alternatives can assume the following linguistic variables: Excellent (Ex), Good (G), Regular (R), Weak (W), Empty (Em), Zero (Z) and Inexistent (In). The model also provides flexibility, allowing different aggregation rules to be performed and defined by the Decision Maker. This feature is supported in this package, allowing the user to define other aggregation matrices, provided they use the same linguistic variables mentioned above.

Maintained by Pier Taranti. Last updated 5 years ago.

coppe-cosenza

0.9 match 3.00 score 20 scripts

qtalr

qtkit:Quantitative Text Kit

Support package for the textbook "An Introduction to Quantitative Text Analysis for Linguists: Reproducible Research Using R" (Francom, 2024) <doi:10.4324/9781003393764>. Includes functions to acquire, clean, and analyze text data as well as functions to document and share the results of text analysis. The package is designed to be used in conjunction with the book, but can also be used as a standalone package for text analysis.

Maintained by Jerid Francom. Last updated 2 months ago.

0.5 match 5.03 score 12 scripts

maciejdanko

hopit:Hierarchical Ordered Probit Models with Application to Reporting Heterogeneity

Self-reported health, happiness, attitudes, and other statuses or perceptions are often the subject of biases that may come from different sources. For example, the evaluation of an individual’s own health may depend on previous medical diagnoses, functional status, and symptoms and signs of illness; as well as on life-style behaviors, including contextual social, gender, age-specific, linguistic and other cultural factors (Jylha 2009 <doi:10.1016/j.socscimed.2009.05.013>; Oksuzyan et al. 2019 <doi:10.1016/j.socscimed.2019.03.002>). The hopit package offers versatile functions for analyzing different self-reported ordinal variables, and for helping to estimate their biases. Specifically, the package provides the function to fit a generalized ordered probit model that regresses original self-reported status measures on two sets of independent variables (King et al. 2004 <doi:10.1017/S0003055403000881>; Jurges 2007 <doi:10.1002/hec.1134>; Oksuzyan et al. 2019 <doi:10.1016/j.socscimed.2019.03.002>). The first set of variables (e.g., health variables) included in the regression are individual statuses and characteristics that are directly related to the self-reported variable. In the case of self-reported health, these could be chronic conditions, mobility level, difficulties with daily activities, performance on grip strength tests, anthropometric measures, and lifestyle behaviors. The second set of independent variables (threshold variables) is used to model cut-points between adjacent self-reported response categories as functions of individual characteristics, such as gender, age group, education, and country (Oksuzyan et al. 2019 <doi:10.1016/j.socscimed.2019.03.002>). The model helps to adjust for specific socio-demographic and cultural differences in how the continuous latent health is projected onto the ordinal self-rated measure. The fitted model can be used to calculate an individual predicted latent status variable, a latent index, and standardized latent coefficients; and makes it possible to reclassify a categorical status measure that has been adjusted for inter-individual differences in reporting behavior.

Maintained by Maciej J. Danko. Last updated 2 years ago.

cpp

0.5 match 6 stars 4.95 score 5 scripts

korap

RKorAPClient:'KorAP' Web Service Client Package

A client package that makes the 'KorAP' web service API accessible from R. The corpus analysis platform 'KorAP' has been developed as a scientific tool to make potentially large, stratified and multiply annotated corpora, such as the 'German Reference Corpus DeReKo' or the 'Corpus of the Contemporary Romanian Language CoRoLa', accessible for linguists to let them verify hypotheses and to find interesting patterns in real language use. The 'RKorAPClient' package provides access to 'KorAP' and the corpora behind it for user-created R code, as a programmatic alternative to the 'KorAP' web user-interface. You can learn more about 'KorAP' and use it directly on 'DeReKo' at <https://korap.ids-mannheim.de/>.

Maintained by Marc Kupietz. Last updated 15 days ago.

0.5 match 6 stars 4.81 score 30 scripts

aquincum

Rexperigen:R Interface to Experigen

Provides convenience functions to communicate with an Experigen server: Experigen (<http://github.com/aquincum/experigen>) is an online framework for creating linguistic experiments, and it stores the results on a dedicated server. This package can be used to retrieve the results from the server, and it is especially helpful with registered experiments, as authentication with the server has to happen.

Maintained by Daniel Szeredi. Last updated 9 years ago.

0.5 match 1 star 2.95 score 18 scripts

thmild

keyperm:Keyword Analysis Using Permutation Tests

Fast implementation of permutation tests for keyword analysis in corpus linguistics. The aim is to identify words that are significantly more frequent in one corpus than in another. The method is described in Mildenberger (2023) <arXiv:2308.13383>.

Maintained by Thoralf Mildenberger. Last updated 2 years ago.

cpp

0.5 match 2.70 score 3 scripts
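
The idea behind the keyperm entry above, testing whether a word is significantly more frequent in one corpus than in another by permutation, can be sketched in Python. This is a simplified illustration under stated assumptions, not the method of Mildenberger (2023) as implemented in the package: the statistic (difference in relative frequency), the shuffling of whole documents between corpora, and all names are chosen here for illustration.

```python
import random


def relative_frequency(docs, word):
    """Share of all tokens in docs (a list of token lists) equal to word."""
    total = sum(len(doc) for doc in docs)
    hits = sum(doc.count(word) for doc in docs)
    return hits / total if total else 0.0


def keyword_permutation_test(corpus_a, corpus_b, word, n_perm=2000, seed=0):
    """One-sided p-value for 'word is more frequent in corpus_a'.

    Whole documents are shuffled between the two corpora, so the
    clustering of a word within documents is preserved.
    """
    rng = random.Random(seed)
    observed = relative_frequency(corpus_a, word) - relative_frequency(corpus_b, word)
    pooled = list(corpus_a) + list(corpus_b)
    n_a = len(corpus_a)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = (relative_frequency(pooled[:n_a], word)
                - relative_frequency(pooled[n_a:], word))
        if stat >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Toy corpora: "rune" occurs only in corpus A, "the" occurs equally in both.
corpus_a = [["the", "rune", "stone"] for _ in range(6)]
corpus_b = [["the", "old", "stone"] for _ in range(6)]
print(keyword_permutation_test(corpus_a, corpus_b, "rune"))
print(keyword_permutation_test(corpus_a, corpus_b, "the"))
```

Permuting documents rather than individual tokens is the design choice that matters here: a word concentrated in a few documents is not automatically flagged as a keyword.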

cran

rLDCP:Text Generation from Data

Linguistic Descriptions of Complex Phenomena (LDCP) is an architecture and methodology that allows us to model complex phenomena, interpreting input data, and generating automatic text reports customized to the user needs (see <doi:10.1016/j.ins.2016.11.002> and <doi:10.1007/s00500-016-2430-5>). The proposed package contains a set of methods that facilitates the development of LDCP systems. Its main goal is increasing the visibility and practical use of this research line.

Maintained by Patricia Conde-Clemente. Last updated 7 years ago.

0.5 match 1.43 score 27 scripts

cran

PrInDT:Prediction and Interpretation in Decision Trees for Classification and Regression

Optimization of conditional inference trees from the package 'party' for classification and regression. For optimization, the model space is searched for the best tree on the full sample by means of repeated subsampling. Restrictions are allowed so that only trees are accepted which do not include pre-specified uninterpretable split results (cf. Weihs & Buschfeld, 2021a). The function PrInDT() represents the basic resampling loop for 2-class classification (cf. Weihs & Buschfeld, 2021a). The function RePrInDT() (repeated PrInDT()) allows for repeated applications of PrInDT() for different percentages of the observations of the large and the small classes (cf. Weihs & Buschfeld, 2021c). The function NesPrInDT() (nested PrInDT()) allows for an extra layer of subsampling for a specific factor variable (cf. Weihs & Buschfeld, 2021b). The functions PrInDTMulev() and PrInDTMulab() deal with multilevel and multilabel classification. In addition to these PrInDT() variants for classification, the function PrInDTreg() has been developed for regression problems. Finally, the function PostPrInDT() allows for a posterior analysis of the distribution of a specified variable in the terminal nodes of a given tree. References are: -- Weihs, C., Buschfeld, S. (2021a) "Combining Prediction and Interpretation in Decision Trees (PrInDT) - a Linguistic Example" <arXiv:2103.02336>; -- Weihs, C., Buschfeld, S. (2021b) "NesPrInDT: Nested undersampling in PrInDT" <arXiv:2103.14931>; -- Weihs, C., Buschfeld, S. (2021c) "Repeated undersampling in PrInDT (RePrInDT): Variation in undersampling and prediction, and ranking of predictors in ensembles" <arXiv:2108.05129>.

Maintained by Claus Weihs. Last updated 2 years ago.

0.5 match 1.00 score