6,178 research outputs found

    Outlier Mining Methods Based on Graph Structure Analysis

    Outlier detection in high-dimensional datasets is a fundamental and challenging problem across disciplines that also has practical implications, as removing outliers from the training set improves the performance of machine learning algorithms. While many outlier mining algorithms have been proposed in the literature, they tend to be valid or efficient only for specific types of datasets (time series, images, videos, etc.). Here we propose two methods that can be applied to generic datasets, as long as there is a meaningful measure of distance between pairs of elements of the dataset. Both methods start by defining a graph in which the nodes are the elements of the dataset and each link carries a weight equal to the distance between the two nodes it joins. The first method assigns an outlier score based on the percolation (i.e., the fragmentation) of the graph. The second method uses the popular IsoMap non-linear dimensionality reduction algorithm and assigns an outlier score by comparing the geodesic distances with the distances in the reduced space. We test these algorithms on real and synthetic datasets and show that they either outperform or perform on par with other popular outlier detection methods. A main advantage of the percolation method is that it is parameter-free and therefore does not require any training; the IsoMap method, on the other hand, has two integer parameters, and when they are appropriately selected, it performs similarly to or better than all the other methods tested.
    Peer Reviewed. Postprint (published version)
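The abstract does not spell out how the percolation score is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the score of an element is the link weight at which it last joins the largest connected component as links are added from shortest to longest (roughly the reverse of fragmenting the graph by removing the longest links first). The function name `percolation_scores`, the union-find bookkeeping, and the Euclidean default distance are all illustrative assumptions.

```python
import math
from itertools import combinations


def percolation_scores(items, dist=math.dist):
    """Hypothetical percolation-style outlier score (assumed definition).

    Each item is scored by the link weight at which it last joins the
    largest connected component while links are added in increasing order.
    """
    n = len(items)
    # Fully connected graph: one weighted link per pair, shortest first.
    links = sorted((dist(items[i], items[j]), i, j)
                   for i, j in combinations(range(n), 2))

    parent = list(range(n))               # union-find forest
    members = {i: [i] for i in range(n)}  # component root -> its nodes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    scores = [0.0] * n
    giant, giant_size = None, 1  # root and size of the largest component
    for w, i, j in links:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # link lies inside an existing component
        if len(members[ri]) < len(members[rj]):
            ri, rj = rj, ri  # keep ri as the larger side
        if giant in (ri, rj):
            # The other side is absorbed into the giant component.
            newcomers = members[rj] if giant == ri else members[ri]
        elif len(members[ri]) + len(members[rj]) > giant_size:
            # The merged component overtakes the previous giant.
            newcomers = members[ri] + members[rj]
        else:
            newcomers = []
        for k in newcomers:
            scores[k] = w
        parent[rj] = ri
        members[ri] = members[ri] + members.pop(rj)
        if newcomers:
            giant, giant_size = ri, len(members[ri])
    return scores


# Four tightly packed points and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = percolation_scores(points)
```

In this toy example the distant point receives the largest score, because it only connects to the main cluster once a long link is admitted. No parameter is tuned, which is consistent with the parameter-free property the abstract claims for the percolation method; any callable can be passed as `dist` when the data are not points in Euclidean space.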

    Emergent Chiral Symmetry: Parity and Time Reversal Doubles

    There are numerous examples of approximately degenerate states of opposite parity in molecular physics. Theory indicates that these doubles can occur in molecules that are reflection-asymmetric. Such parity doubles occur in nuclear physics as well, among nuclei with odd $A \sim 219$-$229$. We have also suggested elsewhere that such doubles occur in particle physics for baryons made up of 'cbu' and 'cbd' quarks. In this article, we discuss the theoretical foundations of these doubles in detail, demonstrating their emergence as a surprisingly subtle consequence of the Born-Oppenheimer approximation, and emphasizing their bundle-theoretic and topological underpinnings. Starting with certain "low-energy" effective theories in which classical symmetries like parity and time reversal are anomalously broken on quantization, we show how these symmetries can be restored by judicious inclusion of "high-energy" degrees of freedom. This mechanism of restoring the symmetry naturally leads to the aforementioned doublet structure. A novel by-product of this mechanism is the emergence of an approximate symmetry (corresponding to the approximate degeneracy of the doubles) at low energies which is not evident in the full Hamiltonian. We also discuss the implications of this mechanism for Skyrmion physics, monopoles, anomalies and quantum gravity.
    Comment: 32 pages, LaTeX. Minor changes in presentation and reference

    Spectral Embedding Norm: Looking Deep into the Spectrum of the Graph Laplacian

    The extraction of clusters from a dataset which includes multiple clusters and a significant background component is a non-trivial task of practical importance. In image analysis this manifests, for example, in anomaly detection and target detection. The traditional spectral clustering algorithm, which relies on the leading $K$ eigenvectors to detect $K$ clusters, fails in such cases. In this paper we propose the spectral embedding norm, which sums the squared values of the first $I$ normalized eigenvectors, where $I$ can be significantly larger than $K$. We prove that this quantity can be used to separate clusters from the background in unbalanced settings, including extreme cases such as outlier detection. The performance of the algorithm is not sensitive to the choice of $I$, and we demonstrate its application on synthetic and real-world remote sensing and neuroimaging datasets.
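As a reading aid, the quantity described in words above can be written as a formula. The symbols are assumptions, since the abstract fixes none of them: $\psi_i$ denotes the $i$-th normalized eigenvector of the graph Laplacian, $x$ a data point (a node of the graph), and $S$ the spectral embedding norm.

```latex
S(x) = \sum_{i=1}^{I} \psi_i(x)^2
```

Under this reading, classical spectral clustering would use only $\psi_1, \dots, \psi_K$, whereas the sum here runs over the first $I$ eigenvectors, with $I$ possibly much larger than $K$.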