Showing 25 of 25 results
pommedeterresautee
unine:Unine Light Stemmer
Implementation of "light" stemmers for French, German, Italian, Spanish, Portuguese, Finnish, and Swedish. They are based on the same work as the "light" stemmers found in 'Solr' <https://lucene.apache.org/solr/> and 'Elasticsearch' <https://www.elastic.co/fr/products/elasticsearch>. A "light" stemmer removes inflections only from nouns and adjectives; for these languages, indexing verbs is less important than indexing nouns and adjectives. The stemming procedure for French is described in (Savoy, 1999) <doi:10.1002/(SICI)1097-4571(1999)50:10%3C944::AID-ASI9%3E3.3.CO;2-H>.
Maintained by Michaël Benesty. Last updated 6 years ago.
finish, french, german, information-retrieval, ir, italian, nlp, portuguese, spanish, stemmer, swedish, cpp
16.5 match 4 stars 3.30 score 1 script
xiaoruizhu
SurrogateRsq:Goodness-of-Fit Analysis for Categorical Data using the Surrogate R-Squared
R-squared is one of the most popular measures for assessing and comparing the goodness of fit of models. For categorical data analysis, however, no universally adopted R-squared measure resembles the ordinary least squares (OLS) R-squared for linear models with continuous data. This package implements the surrogate R-squared measure for categorical data analysis proposed by Dungang Liu, Xiaorui Zhu, Brandon Greenwell, and Zewei Lin (2022) <doi:10.1111/bmsp.12289>. It can generate a point or interval estimate of the surrogate R-squared. It can also rank variables by their percentage contribution to the overall surrogate R-squared, which allows one to assess the importance of each variable in terms of explained variance. The package can be used jointly with other R packages for variable selection and model diagnostics during model building.
Maintained by Xiaorui (Jeremy) Zhu. Last updated 12 months ago.
categorical-data-analysis, goodness-of-fit, r-squared-statistic, statistics
6.8 match 5 stars 4.48 score 12 scripts
datasketch
genero:Estimate Gender from Names in Spanish and Portuguese
Estimate gender from names in Spanish and Portuguese. Works with vectors and data frames. Estimation works not only for first names but also for full names. The package relies on a compilation of common names in both languages, each paired with its most frequently associated gender, which serves as a lookup table for gender inference.
Maintained by Juan Pablo Marin Diaz. Last updated 5 years ago.
7.5 match 2 stars 4.00 score 9 scripts
cienciadedatos
dados:Translate Datasets to Portuguese
This package provides Portuguese translations of the following datasets: 'airlines', 'airports', 'ames_raw', 'AwardsManagers', 'babynames', 'Batting', 'diamonds', 'faithful', 'fueleconomy', 'Fielding', 'flights', 'gapminder', 'gss_cat', 'iris', 'Managers', 'mpg', 'mtcars', 'atmos', 'penguins', 'People', 'Pitching', 'pixarfilms', 'planes', 'presidential', 'table1', 'table2', 'table3', 'table4a', 'table4b', 'table5', 'vehicles', 'weather', 'who'.
Maintained by Riva Quiroga. Last updated 7 months ago.
3.8 match 46 stars 7.13 score 266 scripts
dmarcelinobr
SoundexBR:Phonetic-Coding for Portuguese
The SoundexBR package provides an algorithm for encoding names into phonetic codes, as pronounced in Portuguese. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling. The algorithm mainly encodes consonants; a vowel is not encoded unless it is the first letter. The resulting Soundex code is a four-character string consisting of one letter followed by three digits: the letter is the first letter of the name, and the digits encode the remaining consonants.
Maintained by Daniel Marcelino. Last updated 6 years ago.
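The SoundexBR description above outlines the general Soundex scheme. As a point of reference only, here is a minimal Python sketch of the classic English Soundex rules; it is not SoundexBR's R interface, and SoundexBR's Portuguese-specific letter groups are not reproduced. The function name `soundex` and the `_CODES` table are illustrative.

```python
# Classic (English) Soundex, shown only to illustrate the general scheme.
# SoundexBR adapts the letter groups to Portuguese pronunciation; those
# Portuguese-specific rules are NOT reproduced here.
_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return a 4-character code: the first letter followed by three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # The first letter's own code suppresses an immediately repeated code.
    prev = _CODES.get(first, "")
    for c in letters[1:]:
        code = _CODES.get(c, "")  # vowels (and unlisted letters) map to ""
        if code and code != prev:
            digits.append(code)
        # H and W leave `prev` unchanged, so same-coded consonants around
        # them collapse; vowels reset `prev`, so they keep both codes.
        if c not in "HW":
            prev = code
        if len(digits) == 3:
            break
    # Pad with zeros when fewer than three consonant codes were found.
    return (first + "".join(digits)).ljust(4, "0")
```

With these rules, `soundex("Robert")` and `soundex("Rupert")` both yield `"R163"`, which is the homophone-matching behaviour the description refers to.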
5.8 match 13 stars 3.99 score 15 scripts
koheiw
newsmap:Semi-Supervised Model for Geographical Document Classification
Semi-supervised model for geographical document classification (Watanabe 2018) <doi:10.1080/21670811.2017.1293487>. This package currently contains seed dictionaries in English, German, French, Spanish, Italian, Russian, Hebrew, Arabic, Turkish, Japanese, and Chinese (Simplified and Traditional).
Maintained by Kohei Watanabe. Last updated 9 months ago.
machine-learning, news-stories, quanteda, text-analysis
3.8 match 62 stars 6.05 score 8 scripts
ropensci
charlatan:Make Fake Data
Make fake data that looks realistic, supporting addresses, person names, dates, times, colors, coordinates, currencies, digital object identifiers ('DOIs'), jobs, phone numbers, 'DNA' sequences, doubles and integers from distributions and within a range.
Maintained by Roel M. Hogervorst. Last updated 1 month ago.
data, dataset, fake-data, faker, peer-reviewed
2.3 match 296 stars 10.06 score 180 scripts 1 dependent
nhs-pt
hospitals:Portuguese 'NHS' Hospitals
A data set of the Portuguese 'NHS' hospitals.
Maintained by Ramiro Magno. Last updated 3 years ago.
7.9 match 1 star 2.70 score 2 scripts
sillasgonzaga
lexiconPT:Lexicons for Portuguese Text Analysis
Provides easy access to sentiment lexicons for text analysis of Portuguese texts. Two Portuguese lexicons are currently available: 'SentiLex-PT02' and 'OpLexicon' (v2.1 and v3.0).
Maintained by Sillas Gonzaga. Last updated 7 years ago.
3.9 match 57 stars 5.12 score 46 scripts
trangdata
treeheatr:Heatmap-Integrated Decision Tree Visualizations
Creates interpretable decision tree visualizations with the data represented as a heatmap at the tree's leaf nodes. 'treeheatr' utilizes the customizable 'ggparty' package for drawing decision trees.
Maintained by Trang Le. Last updated 2 years ago.
dataviz, decision-trees, ggplot, heatmap, visualization
3.5 match 57 stars 5.71 score 18 scripts
zumbov2
deeplr:Interface to the 'DeepL' Translation API
A wrapper for the 'DeepL' Pro API <https://www.deepl.com/docs-api>, a web service for translating texts between different languages. A DeepL API developer account is required to use the service (see <https://www.deepl.com/pro#developer>).
Maintained by David Zumbach. Last updated 12 months ago.
3.4 match 41 stars 5.57 score 70 scripts
dfalbel
rslp:A Stemming Algorithm for the Portuguese Language
Implements the "Stemming Algorithm for the Portuguese Language" <DOI:10.1109/SPIRE.2001.10024>.
Maintained by Daniel Falbel. Last updated 5 years ago.
3.7 match 21 stars 4.10 score 12 scripts
koenderks
aRtsy:Generative Art with 'ggplot2'
Provides algorithms for creating artworks in the 'ggplot2' language that incorporate some form of randomness.
Maintained by Koen Derks. Last updated 21 hours ago.
2.0 match 174 stars 7.52 score 59 scripts
dfalbel
ptwikiwords:Words Used in Portuguese Wikipedia
Contains a dataset of words used in 15,000 randomly extracted pages from the Portuguese Wikipedia (<https://pt.wikipedia.org/>).
Maintained by Daniel Falbel. Last updated 8 years ago.
3.7 match 4 stars 3.30 score 6 scripts
cran
MVar.pt:Multivariate Analysis (Brazilian Portuguese)
Multivariate analysis, with functions that perform simple (CA) and multiple (MCA) correspondence analysis, principal component analysis (PCA), canonical correlation analysis (CCA), factor analysis (FA), multidimensional scaling (MDS), linear (LDA) and quadratic (QDA) discriminant analysis, hierarchical and non-hierarchical cluster analysis, simple and multiple linear regression, multiple factor analysis (MFA) for quantitative, qualitative, frequency (MFACT), and mixed data, biplots, scatter plots, projection pursuit (PP), grand tour, and other useful functions for multivariate analysis.
Maintained by Paulo Cesar Ossani. Last updated 4 months ago.
5.0 match 2.17 score 37 scripts
mvogel78
childsds:Data and Methods Around Reference Values in Pediatrics
Calculation of standard deviation scores and percentiles derived from different reference standards (WHO, UK, Germany, Italy, China, etc.). Reference values for laboratory parameters in children and adults are also available, e.g., serum lipids, iron-related blood parameters, IGF, and liver enzymes. See the package documentation for the full list.
Maintained by Mandy Vogel. Last updated 2 months ago.
3.8 match 2.83 score 51 scripts
jessesadler
debkeepr:Analysis of Non-Decimal Currencies and Double-Entry Bookkeeping
Analysis of historical non-decimal currencies and value systems that use tripartite or tetrapartite systems such as pounds, shillings, and pence. It introduces new vector classes to represent non-decimal currencies, making them compatible with numeric classes, and provides functions to work with these classes in data frames in the context of double-entry bookkeeping.
Maintained by Jesse Sadler. Last updated 2 years ago.
accounting, digital-humanities, economic-history, history
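The debkeepr description mentions tripartite systems such as pounds, shillings, and pence. The core arithmetic behind such values is normalization: convert to the smallest unit, then carry back up. The Python sketch below illustrates only that arithmetic, assuming the standard British bases (1 pound = 20 shillings, 1 shilling = 12 pence) and non-negative amounts; the class `LSD` is hypothetical and is not debkeepr's R vector class or API.

```python
from dataclasses import dataclass

# Standard British bases: 1 pound = 20 shillings, 1 shilling = 12 pence.
SHILLINGS_PER_POUND = 20
PENCE_PER_SHILLING = 12


@dataclass(frozen=True)
class LSD:
    """Hypothetical pounds-shillings-pence value (non-negative amounts only)."""
    pounds: int
    shillings: int
    pence: int

    def to_pence(self) -> int:
        # Flatten everything into the smallest unit.
        total_shillings = self.pounds * SHILLINGS_PER_POUND + self.shillings
        return total_shillings * PENCE_PER_SHILLING + self.pence

    @classmethod
    def from_pence(cls, total: int) -> "LSD":
        if total < 0:
            raise ValueError("this sketch handles non-negative amounts only")
        # divmod carries any overflow up to the next larger unit.
        shillings, pence = divmod(total, PENCE_PER_SHILLING)
        pounds, shillings = divmod(shillings, SHILLINGS_PER_POUND)
        return cls(pounds, shillings, pence)

    def __add__(self, other: "LSD") -> "LSD":
        # Add in pence, then normalize back to pounds/shillings/pence.
        return LSD.from_pence(self.to_pence() + other.to_pence())
```

For example, `LSD(1, 15, 8) + LSD(0, 6, 7)` gives `LSD(2, 2, 3)`: the 15 pence carry into 1 shilling, and the resulting 22 shillings carry into 1 pound.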
1.5 match 9 stars 5.51 score 24 scripts
kumes
deepRstudio:Seamless Language Translation in 'RStudio' using 'DeepL' API and 'Rstudioapi'
The 'deepRstudio' package uses the 'DeepL' API (see <https://www.deepl.com/docs-api>) to provide fast, accurate, and affordable translation of code comments, documents, and text within the 'RStudio' environment. It can translate selected text into English (EN), as well as from English into Japanese (JA), Chinese (ZH), Spanish (ES), French (FR), Russian (RU), Portuguese (PT), and Indonesian (ID). Because much of this text is written in English, the emphasis is on translation from English. The package is aimed at developers working on multilingual projects and data analysts collaborating with international teams, making code more accessible to people with diverse language backgrounds. It is built on the 'rstudioapi' package and the 'DeepL' API, and runs from 'RStudio' add-ins or keyboard shortcuts, so content can be translated between supported languages in a few steps. The package works only within 'RStudio', via 'rstudioapi'.
Maintained by Satoshi Kume. Last updated 1 year ago.
deepl, deeprstudio, language-translation, rstudio, rstudioapi, seamless, seamless-language, translation
2.2 match 2 stars 3.48 score 4 scripts 1 dependent
nalimilan
SnowballC:Snowball Stemmers Based on the C 'libstemmer' UTF-8 Library
An R interface to the C 'libstemmer' library that implements Porter's word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary. Currently supported languages are Arabic, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Lithuanian, Nepali, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil and Turkish.
Maintained by Milan Bouchet-Valat. Last updated 18 days ago.
0.5 match 27 stars 12.63 score 4.4k scripts 171 dependents
abjur
abjutils:Useful Tools for Jurimetrical Analysis Used by the Brazilian Jurimetrics Association
The Brazilian Jurimetrics Association (ABJ in Portuguese, see <https://abj.org.br/> for more information) is a non-profit organization that aims to investigate and promote the use of statistics and probability in the study of Law and its institutions. This package implements general-purpose tools used by ABJ, such as functions for sampling and basic manipulation of Brazilian lawsuit identification numbers. It also implements text-cleaning functions, such as accent removal.
Maintained by Caio Lente. Last updated 1 year ago.
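The abjutils description lists accent removal among its text-cleaning tools. One common way to strip accents, shown here as an assumed technique rather than the method abjutils itself uses, is Unicode NFD decomposition followed by dropping combining marks. The Python function `remove_accents` below is hypothetical and is not part of the package's R API.

```python
import unicodedata


def remove_accents(text: str) -> str:
    """Hypothetical helper: strip diacritics while keeping base letters."""
    # NFD splits precomposed letters, e.g. "ç" -> "c" + U+0327 (combining
    # cedilla) and "ã" -> "a" + U+0303 (combining tilde).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing combining marks (Unicode category "Mn").
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

For instance, `remove_accents("Ação Judicial São Paulo")` returns `"Acao Judicial Sao Paulo"`.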
0.5 match 55 stars 6.76 score 78 scripts 1 dependent
abjur
abjData:Databases Used Routinely by the Brazilian Jurimetrics Association
The Brazilian Jurimetrics Association (ABJ in Portuguese, see <https://abj.org.br/> for more information) is a non-profit organization which aims to investigate and promote the use of statistics and probability in the study of Law and its institutions. This package has a set of datasets commonly used in our book.
Maintained by Julio Trecenti. Last updated 2 years ago.
0.5 match 19 stars 5.32 score 55 scripts
guidoamoreira
pompp:Presence-Only for Marked Point Process
Inspired by Moreira and Gamerman (2022) <doi:10.1214/21-AOAS1569>, this methodology extends their idea by including marks in the point process. Estimation is implemented in efficient 'C++' code and runs faster on computers with 'OpenMP' <https://www.openmp.org/> enabled. This package was developed under the project PTDC/MAT-STA/28243/2017, supported by Portuguese funds through the Portuguese Foundation for Science and Technology (FCT).
Maintained by Guido Alberti Moreira. Last updated 2 years ago.
0.8 match 2.70 score
williamorim
pokemon:Pokemon Data
Provides a dataset of Pokemon information in English, Brazilian Portuguese, and Danish. The dataset contains 949 rows and 22 columns, including information such as each Pokemon's name, ID, height, weight, stats, type, and more.
Maintained by William Amorim. Last updated 21 days ago.
0.5 match 2 stars 3.18 score 15 scripts
cran
orcamentoBR:Download Official Data on Brazil's Federal Budget
Allows users to download and analyze official data on Brazil's federal budget through the 'SPARQL' endpoint provided by the Integrated Budget and Planning System ('SIOP'). This package enables access to detailed information on budget allocations and expenditures of the federal government, making it easier to analyze and visualize these data. Technical information on the Brazilian federal budget is available (Portuguese only) at <https://www1.siop.planejamento.gov.br/mto/>. The 'SIOP' endpoint is available at <https://www1.siop.planejamento.gov.br/sparql/>.
Maintained by Daniel Gersten Reiss. Last updated 12 days ago.
0.5 match 1 star 1.30 score
lucasprocessi
lero.lero:Generate 'Lero Lero' Quotes
Generates quotes from 'Lero Lero', a database of meaningless sentences filled with corporate buzzwords, intended to be used as corporate lorem ipsum (see <http://www.lerolero.com/> for more information). Quotes are currently available in Portuguese only.
Maintained by Lucas Processi. Last updated 7 years ago.
0.5 match 1.00 score 1 script