R-universe search: portuguese

pommedeterresautee

unine:Unine Light Stemmer

Implementation of "light" stemmers for French, German, Italian, Spanish, Portuguese, Finnish and Swedish. They are based on the same work as the "light" stemmers found in 'SolR' <https://lucene.apache.org/solr/> or 'ElasticSearch' <https://www.elastic.co/fr/products/elasticsearch>. A "light" stemmer removes inflections only from nouns and adjectives. Indexing verbs for these languages is not of primary importance compared to nouns and adjectives. The stemming procedure for French is described in Savoy (1999) <doi:10.1002/(SICI)1097-4571(1999)50:10%3C944::AID-ASI9%3E3.3.CO;2-H>.

Maintained by Michaël Benesty. Last updated 6 years ago.

finish french german information-retrieval ir italian nlp portuguese spanish stemmer swedish cpp

16.5 match 4 stars 3.30 score 1 scripts

xiaoruizhu

SurrogateRsq:Goodness-of-Fit Analysis for Categorical Data using the Surrogate R-Squared

R-squared is one of the most popular measures for assessing and comparing the goodness of fit of models. For categorical data analysis, however, no universally adopted R-squared measure resembles the ordinary least squares (OLS) R-squared for linear models with continuous data. This package implements the surrogate R-squared measure for categorical data analysis, proposed by Dungang Liu, Xiaorui Zhu, Brandon Greenwell, and Zewei Lin (2022) <doi:10.1111/bmsp.12289>. It can generate a point or interval measure of the surrogate R-squared. It can also provide a ranking measure of the percentage contribution of each variable to the overall surrogate R-squared. This ranking assessment allows one to check the importance of each variable in terms of its explained variance. This package can be used jointly with other existing R packages for variable selection and model diagnostics in the model-building process.

Maintained by Xiaorui (Jeremy) Zhu. Last updated 12 months ago.

categorical-data-analysis goodness-of-fit r-squared-statistic statistics

6.8 match 5 stars 4.48 score 12 scripts

datasketch

genero:Estimate Gender from Names in Spanish and Portuguese

Estimate gender from names in Spanish and Portuguese. Works with vectors and data frames. The estimation works not only for first names but also for full names. The package relies on a compilation of common names and their most frequently associated gender in both languages, which are used as lookup tables for gender inference.

Maintained by Juan Pablo Marin Diaz. Last updated 5 years ago.

7.5 match 2 stars 4.00 score 9 scripts

cienciadedatos

dados:Translate Datasets to Portuguese

This package provides Portuguese translations of the following datasets: 'airlines', 'airports', 'ames_raw', 'AwardsManagers', 'babynames', 'Batting', 'diamonds', 'faithful', 'fueleconomy', 'Fielding', 'flights', 'gapminder', 'gss_cat', 'iris', 'Managers', 'mpg', 'mtcars', 'atmos', 'penguins', 'People', 'Pitching', 'pixarfilms', 'planes', 'presidential', 'table1', 'table2', 'table3', 'table4a', 'table4b', 'table5', 'vehicles', 'weather', 'who'.

Maintained by Riva Quiroga. Last updated 7 months ago.

3.8 match 46 stars 7.13 score 266 scripts

dmarcelinobr

SoundexBR:Phonetic-Coding for Portuguese

The SoundexBR package provides an algorithm for encoding names into phonetic codes, as pronounced in Portuguese. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling. The algorithm mainly encodes consonants; a vowel is not encoded unless it is the first letter. The resulting Soundex code is a four-character string made up of one letter followed by three digits: the letter is the first letter of the name, and the digits encode the remaining consonants.

Maintained by Daniel Marcelino. Last updated 6 years ago.

5.8 match 13 stars 3.99 score 15 scripts
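
To make the letter-plus-three-digits scheme described for SoundexBR concrete, here is a minimal Python sketch of the classic English Soundex encoding. It is an illustration only: the digit groups below are the standard English ones, not the Portuguese-specific rules that SoundexBR applies, and the function name `soundex` is an assumption of this sketch, not the package's API. Accented letters are simply dropped here, which a Portuguese-aware implementation would handle differently.

```python
# Classic English Soundex digit groups.
# Assumption: these are NOT the Portuguese rules used by SoundexBR.
CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return a four-character code: first letter plus three digits."""
    # Keep only unaccented ASCII letters; everything else is ignored.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # code of the previous relevant letter
    for c in letters[1:]:
        code = CODES.get(c, "")  # vowels, H, W and Y have no code
        if code and code != prev:
            digits.append(code)
        # H and W do not separate two consonants with the same code;
        # vowels do, because they reset prev to "".
        if c not in "HW":
            prev = code
    # Pad with zeros and truncate to four characters.
    return (first + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    # Two spellings of the same surname map to the same code.
    print(soundex("Sousa"), soundex("Souza"))
```

Under these rules, spelling variants such as "Sousa" and "Souza" receive the same code, which is the matching behaviour the description refers to.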

koheiw

newsmap:Semi-Supervised Model for Geographical Document Classification

Semi-supervised model for geographical document classification (Watanabe 2018) <doi:10.1080/21670811.2017.1293487>. This package currently contains seed dictionaries in English, German, French, Spanish, Italian, Russian, Hebrew, Arabic, Turkish, Japanese and Chinese (Simplified and Traditional).

Maintained by Kohei Watanabe. Last updated 9 months ago.

machine-learning news-stories quanteda text-analysis

3.8 match 62 stars 6.05 score 8 scripts

ropensci

charlatan:Make Fake Data

Make fake data that looks realistic, supporting addresses, person names, dates, times, colors, coordinates, currencies, digital object identifiers ('DOIs'), jobs, phone numbers, 'DNA' sequences, doubles and integers from distributions and within a range.

Maintained by Roel M. Hogervorst. Last updated 1 month ago.

data dataset fake-data faker peer-reviewed

2.3 match 296 stars 10.06 score 180 scripts 1 dependents

nhs-pt

hospitals:Portuguese 'NHS' Hospitals

A data set of the Portuguese 'NHS' hospitals.

Maintained by Ramiro Magno. Last updated 3 years ago.

7.9 match 1 stars 2.70 score 2 scripts

sillasgonzaga

lexiconPT:Lexicons for Portuguese Text Analysis

Provides easy access to sentiment lexicons for text analysis of Portuguese texts. As of now, two Portuguese lexicons are available: 'SentiLex-PT02' and 'OpLexicon' (v2.1 and v3.0).

Maintained by Sillas Gonzaga. Last updated 7 years ago.

3.9 match 57 stars 5.12 score 46 scripts

trangdata

treeheatr:Heatmap-Integrated Decision Tree Visualizations

Creates interpretable decision tree visualizations with the data represented as a heatmap at the tree's leaf nodes. 'treeheatr' utilizes the customizable 'ggparty' package for drawing decision trees.

Maintained by Trang Le. Last updated 2 years ago.

dataviz decision-trees ggplot heatmap visualization

3.5 match 57 stars 5.71 score 18 scripts

zumbov2

deeplr:Interface to the 'DeepL' Translation API

A wrapper for the 'DeepL' Pro API <https://www.deepl.com/docs-api>, a web service for translating texts between different languages. A DeepL API developer account is required to use the service (see <https://www.deepl.com/pro#developer>).

Maintained by David Zumbach. Last updated 12 months ago.

api-wrapper deepl translation

3.4 match 41 stars 5.57 score 70 scripts

dfalbel

rslp:A Stemming Algorithm for the Portuguese Language

Implements the "Stemming Algorithm for the Portuguese Language" <DOI:10.1109/SPIRE.2001.10024>.

Maintained by Daniel Falbel. Last updated 5 years ago.

3.7 match 21 stars 4.10 score 12 scripts

koenderks

aRtsy:Generative Art with 'ggplot2'

Provides algorithms for creating artworks in the 'ggplot2' language that incorporate some form of randomness.

Maintained by Koen Derks. Last updated 21 hours ago.

generative-art ggplot2 cpp

2.0 match 174 stars 7.52 score 59 scripts

dfalbel

ptwikiwords:Words Used in Portuguese Wikipedia

Contains a dataset of words used in 15,000 randomly extracted pages from the Portuguese Wikipedia (<https://pt.wikipedia.org/>).

Maintained by Daniel Falbel. Last updated 8 years ago.

3.7 match 4 stars 3.30 score 6 scripts

cran

MVar.pt:Multivariate Analysis (Brazilian Portuguese)

Multivariate analysis, with functions that perform simple correspondence analysis (CA) and multiple correspondence analysis (MCA), principal component analysis (PCA), canonical correlation analysis (CCA), factor analysis (FA), multidimensional scaling (MDS), linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA), hierarchical and non-hierarchical cluster analysis, simple and multiple linear regression, multiple factor analysis (MFA) for quantitative, qualitative, frequency (MFACT) and mixed data, biplot, scatter plot, projection pursuit (PP), grand tour and other useful functions for multivariate analysis.

Maintained by Paulo Cesar Ossani. Last updated 4 months ago.

5.0 match 2.17 score 37 scripts

mvogel78

childsds:Data and Methods Around Reference Values in Pediatrics

Calculation of standard deviation scores and percentiles derived from different standards (WHO, UK, Germany, Italy, China, etc.). Also, references for laboratory values in children and adults are available, e.g., serum lipids, iron-related blood parameters, IGF, liver enzymes. See package documentation for full list.

Maintained by Mandy Vogel. Last updated 2 months ago.

3.8 match 2.83 score 51 scripts

jessesadler

debkeepr:Analysis of Non-Decimal Currencies and Double-Entry Bookkeeping

Analysis of historical non-decimal currencies and value systems that use tripartite or tetrapartite systems such as pounds, shillings, and pence. It introduces new vector classes to represent non-decimal currencies, making them compatible with numeric classes, and provides functions to work with these classes in data frames in the context of double-entry bookkeeping.

Maintained by Jesse Sadler. Last updated 2 years ago.

accounting digital-humanities economic-history history

1.5 match 9 stars 5.51 score 24 scripts

kumes

deepRstudio:Seamless Language Translation in 'RStudio' using 'DeepL' API and 'Rstudioapi'

Enhancing cross-language compatibility within the 'RStudio' environment and supporting seamless language understanding, the 'deepRstudio' package leverages the power of the 'DeepL' API (see <https://www.deepl.com/docs-api>) to enable seamless, fast, accurate, and affordable translation of code comments, documents, and text. This package offers the ability to translate selected text into English (EN), as well as from English into various languages, namely Japanese (JA), Chinese (ZH), Spanish (ES), French (FR), Russian (RU), Portuguese (PT), and Indonesian (ID). With much of the text being written in English, the emphasis is on compatibility from English. It is also designed for developers working on multilingual projects and data analysts collaborating with international teams, simplifying the translation process and making code more accessible and comprehensible to people with diverse language backgrounds. This package uses the 'rstudioapi' package and 'DeepL' API, and is simply implemented, executed from addins or via shortcuts on 'RStudio'. With just a few steps, content can be translated between supported languages, promoting better collaboration and expanding the global reach of work. The functionality of this package works only on 'RStudio' using 'rstudioapi'.

Maintained by Satoshi Kume. Last updated 1 year ago.

deepl deeprstudio language-translation rstudio rstudioapi seamless seamless-language translation

2.2 match 2 stars 3.48 score 4 scripts 1 dependents

nalimilan

SnowballC:Snowball Stemmers Based on the C 'libstemmer' UTF-8 Library

An R interface to the C 'libstemmer' library that implements Porter's word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary. Currently supported languages are Arabic, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Lithuanian, Nepali, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil and Turkish.

Maintained by Milan Bouchet-Valat. Last updated 18 days ago.

text-mining

0.5 match 27 stars 12.63 score 4.4k scripts 171 dependents

abjur

abjutils:Useful Tools for Jurimetrical Analysis Used by the Brazilian Jurimetrics Association

The Brazilian Jurimetrics Association (ABJ in Portuguese, see <https://abj.org.br/> for more information) is a non-profit organization which aims to investigate and promote the use of statistics and probability in the study of Law and its institutions. This package implements general-purpose tools used by ABJ, such as functions for sampling and basic manipulation of Brazilian lawsuit identification numbers. It also implements functions for text cleaning, such as accent removal.

Maintained by Caio Lente. Last updated 1 year ago.

jurimetrics toolkit

0.5 match 55 stars 6.76 score 78 scripts 1 dependents

abjur

abjData:Databases Used Routinely by the Brazilian Jurimetrics Association

The Brazilian Jurimetrics Association (ABJ in Portuguese, see <https://abj.org.br/> for more information) is a non-profit organization which aims to investigate and promote the use of statistics and probability in the study of Law and its institutions. This package has a set of datasets commonly used in our book.

Maintained by Julio Trecenti. Last updated 2 years ago.

dados ibge pnud

0.5 match 19 stars 5.32 score 55 scripts

guidoamoreira

pompp:Presence-Only for Marked Point Process

Inspired by Moreira and Gamerman (2022) <doi:10.1214/21-AOAS1569>, this methodology expands the idea by including Marks in the point process. Using efficient 'C++' code, the estimation is possible and made faster with 'OpenMP' <https://www.openmp.org/> enabled computers. This package was developed under the project PTDC/MAT-STA/28243/2017, supported by Portuguese funds through the Portuguese Foundation for Science and Technology (FCT).

Maintained by Guido Alberti Moreira. Last updated 2 years ago.

cpp openmp

0.8 match 2.70 score

williamorim

pokemon:Pokemon Data

Provides a dataset of Pokemon information in English, Brazilian Portuguese and Danish. The dataset contains 949 rows and 22 columns, including information such as the Pokemon's name, ID, height, weight, stats, type, and more.

Maintained by William Amorim. Last updated 21 days ago.

0.5 match 2 stars 3.18 score 15 scripts

cran

orcamentoBR:Download Official Data on Brazil's Federal Budget

Allows users to download and analyze official data on Brazil's federal budget through the 'SPARQL' endpoint provided by the Integrated Budget and Planning System ('SIOP'). This package enables access to detailed information on budget allocations and expenditures of the federal government, making it easier to analyze and visualize these data. Technical information on the Brazilian federal budget is available (Portuguese only) at <https://www1.siop.planejamento.gov.br/mto/>. The 'SIOP' endpoint is available at <https://www1.siop.planejamento.gov.br/sparql/>.

Maintained by Daniel Gersten Reiss. Last updated 12 days ago.

0.5 match 1 stars 1.30 score

lucasprocessi

lero.lero:Generate 'Lero Lero' Quotes

Generates quotes from 'Lero Lero', a database of meaningless sentences filled with corporate buzzwords, intended to be used as corporate lorem ipsum (see <http://www.lerolero.com/> for more information). Unfortunately, quotes are currently Portuguese-only.

Maintained by Lucas Processi. Last updated 7 years ago.

0.5 match 1.00 score 1 scripts