R-universe search: topic:outliers

Showing 9 of 9 results

easystats

correlation:Methods for Correlation Analysis

Lightweight package for computing different kinds of correlations, such as partial correlations, Bayesian correlations, multilevel correlations, polychoric correlations, biweight correlations, distance correlations and more. Part of the 'easystats' ecosystem. References: Makowski et al. (2020) <doi:10.21105/joss.02306>.

Maintained by Brenton M. Wiernik. Last updated 28 days ago.

bayesian bayesian-correlations biserial cor correlation correlation-analysis correlations easystats gamma gaussian-graphical-models hacktoberfest matrix multilevel-correlations outliers partial partial-correlations regression robust spearman

439 stars 14.23 score 672 scripts 10 dependents

convexfi

fitHeavyTail:Mean and Covariance Matrix Estimation under Heavy Tails

Robust estimation methods for the mean vector, scatter matrix, and covariance matrix (if it exists) from data (possibly containing NAs) under multivariate heavy-tailed distributions such as angular Gaussian (via Tyler's method), Cauchy, and Student's t distributions. Additionally, a factor model structure can be specified for the covariance matrix. The latest revision also includes the multivariate skewed t distribution. The package is based on the papers: Sun, Babu, and Palomar (2014); Sun, Babu, and Palomar (2015); Liu and Rubin (1995); Zhou, Liu, Kumar, and Palomar (2019); Pascal, Ollila, and Palomar (2021).

Maintained by Daniel P. Palomar. Last updated 2 years ago.

cauchy covariance-estimation covariance-matrix heavy-tailed-distributions outliers robust-estimation student-t tyler

22 stars 6.27 score 28 scripts 1 dependent

dppalomar

imputeFin:Imputation of Financial Time Series with Missing Values and/or Outliers

Missing values often occur in financial data due to a variety of reasons (errors in the collection process or in the processing stage, lack of asset liquidity, lack of reporting of funds, etc.). However, most data analysis methods expect complete data and cannot be employed with missing values. One convenient way to deal with this issue without having to redesign the data analysis method is to impute the missing values. This package provides an efficient way to impute the missing values based on modeling the time series with a random walk or an autoregressive (AR) model, convenient to model log-prices and log-volumes in financial data. In the current version, the imputation is univariate-based (so no asset correlation is used). In addition, outliers can be detected and removed. The package is based on the paper: J. Liu, S. Kumar, and D. P. Palomar (2019). Parameter Estimation of Heavy-Tailed AR Model With Missing Data Via Stochastic EM. IEEE Trans. on Signal Processing, vol. 67, no. 8, pp. 2159-2172. <doi:10.1109/TSP.2019.2899816>.

Maintained by Daniel P. Palomar. Last updated 4 years ago.

financial-data missing-values outliers time-series

25 stars 5.80 score 25 scripts

christiangoueguel

HotellingEllipse:Hotelling’s T-Squared Statistic and Ellipse

Functions to calculate the Hotelling’s T-squared statistic and corresponding confidence ellipses. Provides the semi-axes of the Hotelling’s T-squared ellipses at 95% and 99% confidence levels. Enables users to obtain the coordinates in two or three dimensions at user-defined confidence levels, allowing for the construction of 2D or 3D ellipses with customized confidence levels. Bro and Smilde (2014) <DOI:10.1039/c3ay41907j>. Brereton (2016) <DOI:10.1002/cem.2763>.

Maintained by Christian L. Goueguel. Last updated 3 months ago.

confidence-ellipse hotelling-ellipse hotelling-s-t-square hotelling-t2 hotellings-t2-distribution multivariate-distribution outliers partial-least-squares-regression pca pls principal-component-analysis

7 stars 5.29 score 14 scripts
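
For readers unfamiliar with the statistic named in this entry, the following is a minimal sketch of the Hotelling's T-squared value of each observation in a two-dimensional sample. It is written in plain Python and does not use or reproduce the HotellingEllipse API; the function name `hotelling_t2` and the sample data are hypothetical. The confidence cutoff that the package derives for its 95% and 99% ellipses is not computed here.

```python
from statistics import mean


def hotelling_t2(points):
    """Return the T-squared value of each 2D point relative to the sample.

    Hypothetical helper for illustration only: T2 = d' S^-1 d, where d is the
    deviation from the sample mean and S is the sample covariance matrix.
    """
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    # Sample covariance matrix entries (divisor n - 1).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    t2 = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the 2x2 inverse written out explicitly.
        t2.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return t2


# Made-up sample: five points along a rough trend and one far from it.
sample = [(1.0, 2.0), (2.0, 1.5), (3.0, 3.5), (4.0, 3.0), (5.0, 5.5), (10.0, 1.0)]
scores = hotelling_t2(sample)
```

Points with a large T-squared value lie far from the centre of the sample once its covariance is taken into account, which is why the statistic is commonly used to flag candidate outliers.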

fbartos

RoBTT:Robust Bayesian T-Test

An implementation of Bayesian model-averaged t-tests that allows users to draw inferences about the presence versus absence of an effect, variance heterogeneity, and potential outliers. The 'RoBTT' package estimates ensembles of models created by combining competing hypotheses and applies Bayesian model averaging using posterior model probabilities. Users can obtain model-averaged posterior distributions and inclusion Bayes factors, accounting for uncertainty in the data-generating process (Maier et al., 2024, <doi:10.3758/s13423-024-02590-5>). The package also provides a truncated likelihood version of the model-averaged t-test, enabling users to exclude potential outliers without introducing bias (Godmann et al., 2024, <doi:10.31234/osf.io/j9f3s>). Users can specify a wide range of informative priors for all parameters of interest. The package offers convenient functions for summary, visualization, and fit diagnostics.

Maintained by František Bartoš. Last updated 5 months ago.

bayesian model-averaging outliers t-test cpp

3 stars 5.26 score 9 scripts

talegari

solitude:An Implementation of Isolation Forest

Isolation forest is an anomaly detection method introduced in the paper "Isolation-Based Anomaly Detection" (Liu, Ting and Zhou <doi:10.1145/2133360.2133363>).

Maintained by Komala Sheshachala Srikanth. Last updated 4 years ago.

isolation-forest outliers rpackages

24 stars 5.23 score 70 scripts 1 dependent
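
The idea behind isolation-based detection can be sketched in a few lines: points that are separated from the rest of the data by only a few random splits are likely anomalies. The Python sketch below is a simplified illustration, not the solitude package's interface and not a full isolation forest. It assumes no subsampling and no path-length normalisation, and it grows a fresh random path per query instead of building reusable trees. The names `path_length` and `mean_path_length` and the data are hypothetical.

```python
import random


def path_length(point, data, rng, depth=0, max_depth=12):
    """Number of random splits needed to separate `point` from `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))          # pick a random feature
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:                             # cannot split on this feature
        return depth
    split = rng.uniform(lo, hi)              # pick a random split value
    # Keep only the rows that fall on the same side of the split as `point`.
    same_side = [row for row in data
                 if (row[dim] < split) == (point[dim] < split)]
    return path_length(point, same_side, rng, depth + 1, max_depth)


def mean_path_length(point, data, n_trees=200, seed=0):
    """Average path length over many random paths (shorter = more anomalous)."""
    rng = random.Random(seed)
    total = sum(path_length(point, data, rng) for _ in range(n_trees))
    return total / n_trees


# Made-up data: a tight 5x5 grid of points plus one distant point.
grid = [(0.1 * (i % 5), 0.1 * (i // 5)) for i in range(25)]
outlier = (10.0, 10.0)
data = grid + [outlier]

outlier_depth = mean_path_length(outlier, data)
inlier_depth = mean_path_length((0.2, 0.2), data)
```

In this sketch the distant point is usually isolated by the first split, so its average path length is much shorter than that of a point inside the grid.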

gagolews

genie:Fast, Robust, and Outlier Resistant Hierarchical Clustering

Includes the reference implementation of Genie - a hierarchical clustering algorithm that links two point groups in such a way that an inequity measure (namely, the Gini index) of the cluster sizes does not significantly increase above a given threshold. This method most often outperforms many other data segmentation approaches in terms of clustering quality as tested on a wide range of benchmark datasets. At the same time, Genie retains the high speed of the single linkage approach, therefore it is also suitable for analysing larger data sets. For more details see (Gagolewski et al. 2016 <DOI:10.1016/j.ins.2016.05.003>). For an even faster and more feature-rich implementation, including, amongst others, noise point detection, see the 'genieclust' package (Gagolewski, 2021 <DOI:10.1016/j.softx.2021.100722>).

Maintained by Marek Gagolewski. Last updated 3 years ago.

cluster cluster-analysis clustering data-analysis data-mining data-science datascience genie hierarchical-clustering-algorithm machine-learning machine-learning-algorithms outliers cpp openmp

22 stars 4.55 score 16 scripts
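
The inequity measure mentioned in this entry can be illustrated with a short Python sketch that computes a Gini index over cluster sizes. It assumes one common normalisation (pairwise absolute differences divided by (k - 1) times the total size, so that the value lies between 0 and 1). The function name `gini_index` and the threshold value are hypothetical, and the sketch does not reproduce the package's linkage procedure.

```python
def gini_index(sizes):
    """Normalised Gini index of a list of cluster sizes (0 = perfectly balanced)."""
    k = len(sizes)
    total = sum(sizes)
    if k < 2 or total == 0:
        return 0.0
    # Sum of absolute differences over all unordered pairs of clusters.
    pair_diff = sum(abs(sizes[i] - sizes[j])
                    for i in range(k) for j in range(i + 1, k))
    return pair_diff / ((k - 1) * total)


balanced = gini_index([5, 5, 5])      # all clusters the same size
skewed = gini_index([8, 1, 1])        # one dominant cluster
extreme = gini_index([10, 0, 0])      # everything in one cluster

# Hypothetical threshold, used here only to show how such a check might look.
threshold = 0.3
too_unequal = skewed > threshold
```

A clustering procedure that monitors this value can avoid merges that would leave one very large cluster next to many tiny ones.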

kvasilopoulos

transx:Transform Univariate Time Series

Univariate time series operations that follow an opinionated design. The main principle of 'transx' is to keep the number of observations the same. Operations that would reduce this number must fill the resulting gap in observations.

Maintained by Kostas Vasilopoulos. Last updated 4 years ago.

detrend filters outliers time-series transx

3 stars 4.29 score 13 scripts
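
The length-preserving principle described in this entry can be illustrated with a lagged difference, an operation that normally shortens a series. The Python sketch below pads the result so the output has as many observations as the input. It is an illustration of the idea only, not the 'transx' interface; the function name `diff_keep_length` and its `fill` parameter are hypothetical.

```python
def diff_keep_length(x, lag=1, fill=None):
    """Lagged difference that returns a series of the same length as `x`.

    The first `lag` positions have no predecessor to subtract, so they are
    filled with `fill` instead of being dropped.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    head = [fill] * min(lag, len(x))
    tail = [x[i] - x[i - lag] for i in range(lag, len(x))]
    return head + tail


series = [1, 4, 9, 16]
first_diff = diff_keep_length(series)            # [None, 3, 5, 7]
second_lag = diff_keep_length(series, lag=2)     # [None, None, 8, 12]
```

Keeping the length fixed means the transformed series can be placed alongside the original, for example as a new column in the same table, without realigning indices.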

zcebeci

odetector:Outlier Detection Using Partitioning Clustering Algorithms

An object is called an "outlier" if it deviates markedly from the other objects in a data set. Outlier detection is the process of finding outliers using methods based on distance measures, clustering, and spatial methods (Ben-Gal, 2005 <ISBN 0-387-24435-2>). It is an intensively studied research topic for identifying novelties, frauds, anomalies, deviations, or exceptions, in addition to its use for outlier removal in data processing. This package provides implementations of some novel approaches to detecting outliers based on typicality degrees obtained with soft partitioning clustering algorithms such as Fuzzy C-means and its variants.

Maintained by Zeynel Cebeci. Last updated 2 years ago.

anomaly-detection cluster-analysis clustering clustering-methods data datapreparation datapreprocessing exception-handling fcm fraud-detection fuzzy-clustering novelty-detection outlier-detection outlier-removal outliers partitioning pcm surprise-exploration

3.70 score 4 scripts