Geometric Cross-Modal Comparison of Heterogeneous Sensor Data
In this work, we address the problem of cross-modal comparison of aerial data
streams. A variety of simulated automobile trajectories are sensed using two
different modalities: full-motion video, and radio-frequency (RF) signals
received by detectors at various locations. The information represented by the
two modalities is compared using self-similarity matrices (SSMs) corresponding
to time-ordered point clouds in feature spaces of each of these data sources;
we note that these feature spaces can be of entirely different scale and
dimensionality. Several metrics for comparing SSMs are explored, including a
cutting-edge time-warping technique that can simultaneously handle local time
warping and partial matches, while also controlling for the change in geometry
between feature spaces of the two modalities. We note that this technique is
quite general, and does not depend on the choice of modalities. In this
particular setting, we demonstrate that the cross-modal distance between SSMs
corresponding to the same trajectory type is smaller than the cross-modal
distance between SSMs corresponding to distinct trajectory types, and we
formalize this observation via precision-recall metrics in experiments.
Finally, we comment on promising implications of these ideas for future
integration into multiple-hypothesis tracking systems.
Comment: 10 pages, 13 figures, Proceedings of IEEE Aeroconf 201
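To make the SSM idea in the abstract above concrete, here is a minimal sketch of how two time-ordered point clouds from feature spaces of different scale and dimensionality can still be compared. It assumes Euclidean distances, equal sampling of both streams, max-normalization, and a plain Frobenius distance; the function names are hypothetical, and this is not the time-warping technique the abstract describes.

```python
import math


def normalized_ssm(points):
    """Pairwise Euclidean distances between time-ordered points, scaled to [0, 1].

    Assumption: max-normalization is used here only to remove overall scale.
    """
    ssm = [[math.dist(p, q) for q in points] for p in points]
    peak = max(max(row) for row in ssm)
    if peak == 0:
        return ssm  # all points coincide; nothing to rescale
    return [[d / peak for d in row] for row in ssm]


def ssm_distance(a, b):
    """Frobenius norm of the difference between two equally sized SSMs."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


# Hypothetical example: the same L-shaped path seen in a 2-D and a 3-D feature space.
video_features = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
rf_features = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (5.0, 5.0, 0.0)]
gap = ssm_distance(normalized_ssm(video_features), normalized_ssm(rf_features))
```

Because each SSM only records distances within its own feature space, the two matrices have the same shape whenever the streams have the same number of time samples, regardless of the dimensionality of the underlying features.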
Anomaly Detection in Telecommunication Traffic by Statistical Methods
Anomaly detection is an important task in many areas of human activity, and many statistical methods are used for it. In this paper, five statistical methods of data analysis were chosen to detect anomalies: survival analysis, fractal time series analysis, classification with decision trees, cluster analysis, and the entropy method. A description of each selected method is given. Traffic and attack traces were taken from an open dataset, and more than 3 million packets from it were used to evaluate the methods. The dataset contained legitimate traffic (75%) and attacks (25%). The selected methods were simulated on network traffic traces from telecommunication networks using different protocols, with the simulation programs written in Python. DDoS attacks, UDP flood, TCP SYN, ARP attacks, and HTTP flood were chosen as anomalies. The methods were compared on the probability of anomaly detection, the probability of false positive detection, and the running time needed to detect an anomaly. The experimental results showed that every method is workable. The decision tree method performed best: it had the highest probability of identifying anomalies, the fewest false positives, and the shortest detection time. The entropy analysis method was slightly slower and gave slightly more false positives. The cluster analysis method came next, being slightly worse at detecting anomalies. The fractal analysis method showed a lower probability of detecting anomalies, a higher probability of false positives, and a longer running time. The survival analysis method performed worst.
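As an illustration of the entropy method named in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes the monitored feature is the packet source address, that traffic is split into fixed windows, and that a window is flagged when its Shannon entropy deviates from a baseline by more than a tolerance; `flag_windows`, `baseline`, and `tolerance` are hypothetical names and the abstract does not specify any of these choices.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def flag_windows(windows, baseline, tolerance):
    """Return indices of windows whose entropy deviates from the baseline.

    Assumption: `baseline` is the entropy measured on known legitimate traffic
    and `tolerance` is an operator-chosen threshold.
    """
    return [
        i
        for i, window in enumerate(windows)
        if abs(shannon_entropy(window) - baseline) > tolerance
    ]


# Hypothetical source addresses: a mixed window and a single-source flood.
windows = [
    ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"],
    ["10.0.0.9", "10.0.0.9", "10.0.0.9", "10.0.0.9"],
]
suspicious = flag_windows(windows, baseline=2.0, tolerance=0.5)
```

A flood from a single source collapses the source-address distribution, so its entropy drops well below that of mixed legitimate traffic; a spoofed, highly distributed attack would instead push the entropy up, which is why the sketch tests the absolute deviation.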
Topological Signals of Singularities in Ricci Flow
We implement methods from computational homology to obtain a topological
signal of singularity formation in a selection of geometries evolved
numerically by Ricci flow. Our approach, based on persistent homology, produces
precise, quantitative measures describing the behavior of an entire collection
of data across a discrete sample of times. We analyze the topological signals
of geometric criticality obtained numerically from the application of
persistent homology to models manifesting singularities under Ricci flow. The
results we obtain for these numerical models suggest that the topological
signals distinguish global singularity formation (collapse to a round point)
from local singularity formation (neckpinch). Finally, we discuss the
interpretation and implication of these results and future applications.
Comment: 24 pages, 14 figures
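Persistent homology, which the abstract above relies on, records the scales at which topological features appear and disappear. The sketch below covers only the simplest case, connected components (dimension 0) of a point cloud, and assumes a Vietoris-Rips style filtration on Euclidean distances in which a component dies at the length of the edge that merges it. The abstract does not state which filtration the authors use, and `h0_death_scales` is a hypothetical name.

```python
import math
from itertools import combinations


def h0_death_scales(points):
    """Death scales of connected components; every component is born at scale 0.

    Edges are added in order of increasing length (Kruskal's algorithm with a
    union-find structure). Each merge of two components kills one of them.
    The last surviving component never dies and is not listed.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


# Hypothetical sample: two nearby points and one distant point on a line.
sample_deaths = h0_death_scales([(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)])
```

Computing such a summary at each sampled time of an evolving geometry gives one sequence of diagrams per run, which is the kind of time-indexed topological signal the abstract refers to.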
Multi-Scale Local Shape Analysis and Feature Selection in Machine Learning Applications
We introduce a method called multi-scale local shape analysis, or MLSA, for
extracting features that describe the local structure of points within a
dataset. The method uses both geometric and topological features at multiple
levels of granularity to capture diverse types of local information for
subsequent machine learning algorithms operating on the dataset. Using
synthetic and real dataset examples, we demonstrate significant performance
improvement of classification algorithms constructed for these datasets with
correspondingly augmented features.
Comment: 15 pages, 6 figures, 8 tables
Persistent homology analysis of brain artery trees
New representations of tree-structured data objects, using ideas from topological data analysis, enable improved statistical analyses of a population of brain artery trees. A number of representations of each data tree arise from persistence diagrams that quantify branching and looping of vessels at multiple scales. Novel approaches to the statistical analysis, through various summaries of the persistence diagrams, lead to heightened correlations with covariates such as age and sex, relative to earlier analyses of this data set. The correlation with age continues to be significant even after controlling for correlations from earlier significant summaries.
Hypothesis Testing for Shapes using Vectorized Persistence Diagrams
Topological data analysis involves the statistical characterization of the
shape of data. Persistent homology, a primary tool of topological data
analysis, can be used to analyze the topological features of data and perform
statistical inference. In this paper, we present a two-stage hypothesis test
for vectorized persistence diagrams. The first stage filters elements in the
vectorized persistence diagrams to reduce false positives. The second stage
consists of multiple hypothesis tests, with false positives controlled by false
discovery rates. We demonstrate applications of the proposed procedure on
simulated point clouds and three-dimensional rock image data. Our results show
that the proposed hypothesis tests can provide flexible and informative
inferences on the shape of data at lower computational cost than the
permutation test.
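The second stage described above controls false positives through the false discovery rate. A common way to do that is the Benjamini-Hochberg step-up procedure, sketched here under the assumption that each retained element of the vectorized persistence diagram has already produced a p-value. The abstract does not name the specific FDR procedure or the first-stage filter, so this illustrates the general idea only, and `benjamini_hochberg` is a hypothetical helper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: find the largest rank k (1-based, p-values sorted ascending)
    with p_(k) <= alpha * k / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            cutoff = rank
    rejected = [False] * m
    for index in order[:cutoff]:
        rejected[index] = True
    return rejected


# Hypothetical p-values, one per element that survived the first-stage filter.
decisions = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], alpha=0.05)
```

Filtering elements before this step reduces the number of tests m, which loosens the per-rank thresholds and is one reason a two-stage design can retain power.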
Using Persistent Homology Topological Features to Characterize Medical Images: Case Studies on Lung and Brain Cancers
Tumor shape is a key factor that affects tumor growth and metastasis. This
paper proposes a topological feature computed by persistent homology to
characterize tumor progression from digital pathology and radiology images and
examines its association with time-to-event data. The proposed topological
features are invariant to scale-preserving transformations and can summarize
various tumor shape patterns. The topological features are represented in
functional space and used as functional predictors in a functional Cox
proportional hazards model. The proposed model enables interpretable inference
about the association between topological shape features and survival risks.
Two case studies are conducted using 143 consecutive lung cancer patients and
77 consecutive brain tumor patients. The results of both studies show that the
topological features predict survival prognosis after adjusting for clinical
variables, and the
predicted high-risk groups have significantly (at the level of 0.01) worse
survival outcomes than the low-risk groups. Also, the topological shape
features found to be positively associated with survival hazards are irregular
and heterogeneous shape patterns, which are known to be related to tumor
progression.