50,632 research outputs found
A Unique Pipeline Model to Improve Anomaly Detection in High Dimensional Data
This paper presents a comprehensive method for dimension reduction and detecting anomalies in high-dimensional data (on healthcare datasets) using R. Realizing that traditional linear methods such as Principal Component Analysis (PCA) often ignore the complexity of the non-linear manifold of the data, our approach exploits iterative learning, on the belief that high-dimensional data is largely based on a low-dimensional manifold. The methodology starts by preparing the data using R libraries like Keras, dplyr, and ggplot2, addressing challenges like missing values ??and visualizing meaningful information. Using the Mahalanobis distance, the paper identifies and removes country-specific outliers. The pipelined model integrates Principal Component Analysis (PCA) for data transformation and combines an Autoencoder with t-SNE for dimensionality reduction. This refined dataset is then used to train a Multi-Layer Perceptron (MLP) artificial neural network, which facilitates anomaly detection based on reconstruction errors, illustrated by the point cloud. Additionally, the paper explores metric multidimensional scaling using artificial neural networks, tests large datasets such as healthcare and wine, and compares the results of the work using conventional techniques. This study highlights the effectiveness of integrating various pre-processing, visualization, and artificial neural network strategies through R for effective anomaly detection
Principal manifolds and graphs in practice: from molecular biology to dynamical systems
We present several applications of non-linear data modeling, using principal
manifolds and principal graphs constructed using the metaphor of elasticity
(elastic principal graph approach). These approaches are generalizations of the
Kohonen's self-organizing maps, a class of artificial neural networks. On
several examples we show advantages of using non-linear objects for data
approximation in comparison to the linear ones. We propose four numerical
criteria for comparing linear and non-linear mappings of datasets into the
spaces of lower dimension. The examples are taken from comparative political
science, from analysis of high-throughput data in molecular biology, from
analysis of dynamical systems.Comment: 12 pages, 9 figure
- …