35,055 research outputs found
Implications of Z-normalization in the matrix profile
Companies are increasingly measuring their products and services, resulting in a rising amount of available time series data, making techniques to extract usable information needed. One state-of-the-art technique for time series is the Matrix Profile, which has been used for various applications including motif/discord discovery, visualizations and semantic segmentation. Internally, the Matrix Profile utilizes the z-normalized Euclidean distance to compare the shape of subsequences between two series. However, when comparing subsequences that are relatively flat and contain noise, the resulting distance is high despite the visual similarity of these subsequences. This property violates some of the assumptions made by Matrix Profile based techniques, resulting in worse performance when series contain flat and noisy subsequences. By studying the properties of the z-normalized Euclidean distance, we derived a method to eliminate this effect requiring only an estimate of the standard deviation of the noise. In this paper we describe various practical properties of the z-normalized Euclidean distance and show how these can be used to correct the performance of Matrix Profile related techniques. We demonstrate our techniques using anomaly detection using a Yahoo! Webscope anomaly dataset, semantic segmentation on the PAMAP2 activity dataset and for data visualization on a UCI activity dataset, all containing real-world data, and obtain overall better results after applying our technique. Our technique is a straightforward extension of the distance calculation in the Matrix Profile and will benefit any derived technique dealing with time series containing flat and noisy subsequences
Reference face graph for face recognition
Face recognition has been studied extensively; however, real-world face recognition still remains a challenging task. The demand for unconstrained practical face recognition is rising with the explosion of online multimedia such as social networks, and video surveillance footage where face analysis is of significant importance. In this paper, we approach face recognition in the context of graph theory. We recognize an unknown face using an external reference face graph (RFG). An RFG is generated and recognition of a given face is achieved by comparing it to the faces in the constructed RFG. Centrality measures are utilized to identify distinctive faces in the reference face graph. The proposed RFG-based face recognition algorithm is robust to the changes in pose and it is also alignment free. The RFG recognition is used in conjunction with DCT locality sensitive hashing for efficient retrieval to ensure scalability. Experiments are conducted on several publicly available databases and the results show that the proposed approach outperforms the state-of-the-art methods without any preprocessing necessities such as face alignment. Due to the richness in the reference set construction, the proposed method can also handle illumination and expression variation
Visual identification by signature tracking
We propose a new camera-based biometric: visual signature identification. We discuss the importance of the parameterization of the signatures in order to achieve good classification results, independently of variations in the position of the camera with respect to the writing surface. We show that affine arc-length parameterization performs better than conventional time and Euclidean arc-length ones. We find that the system verification performance is better than 4 percent error on skilled forgeries and 1 percent error on random forgeries, and that its recognition performance is better than 1 percent error rate, comparable to the best camera-based biometrics
SiGMa: Simple Greedy Matching for Aligning Large Knowledge Bases
The Internet has enabled the creation of a growing number of large-scale
knowledge bases in a variety of domains containing complementary information.
Tools for automatically aligning these knowledge bases would make it possible
to unify many sources of structured knowledge and answer complex queries.
However, the efficient alignment of large-scale knowledge bases still poses a
considerable challenge. Here, we present Simple Greedy Matching (SiGMa), a
simple algorithm for aligning knowledge bases with millions of entities and
facts. SiGMa is an iterative propagation algorithm which leverages both the
structural information from the relationship graph as well as flexible
similarity measures between entity properties in a greedy local search, thus
making it scalable. Despite its greedy nature, our experiments indicate that
SiGMa can efficiently match some of the world's largest knowledge bases with
high precision. We provide additional experiments on benchmark datasets which
demonstrate that SiGMa can outperform state-of-the-art approaches both in
accuracy and efficiency.Comment: 10 pages + 2 pages appendix; 5 figures -- initial preprin
Adapting Sequence to Sequence models for Text Normalization in Social Media
Social media offer an abundant source of valuable raw data, however informal
writing can quickly become a bottleneck for many natural language processing
(NLP) tasks. Off-the-shelf tools are usually trained on formal text and cannot
explicitly handle noise found in short online posts. Moreover, the variety of
frequently occurring linguistic variations presents several challenges, even
for humans who might not be able to comprehend the meaning of such posts,
especially when they contain slang and abbreviations. Text Normalization aims
to transform online user-generated text to a canonical form. Current text
normalization systems rely on string or phonetic similarity and classification
models that work on a local fashion. We argue that processing contextual
information is crucial for this task and introduce a social media text
normalization hybrid word-character attention-based encoder-decoder model that
can serve as a pre-processing step for NLP applications to adapt to noisy text
in social media. Our character-based component is trained on synthetic
adversarial examples that are designed to capture errors commonly found in
online user-generated text. Experiments show that our model surpasses neural
architectures designed for text normalization and achieves comparable
performance with state-of-the-art related work.Comment: Accepted at the 13th International AAAI Conference on Web and Social
Media (ICWSM 2019
Systematic Analysis of Cluster Similarity Indices: How to Validate Validation Measures
Many cluster similarity indices are used to evaluate clustering algorithms,
and choosing the best one for a particular task remains an open problem. We
demonstrate that this problem is crucial: there are many disagreements among
the indices, these disagreements do affect which algorithms are preferred in
applications, and this can lead to degraded performance in real-world systems.
We propose a theoretical framework to tackle this problem: we develop a list of
desirable properties and conduct an extensive theoretical analysis to verify
which indices satisfy them. This allows for making an informed choice: given a
particular application, one can first select properties that are desirable for
the task and then identify indices satisfying these. Our work unifies and
considerably extends existing attempts at analyzing cluster similarity indices:
we introduce new properties, formalize existing ones, and mathematically prove
or disprove each property for an extensive list of validation indices. This
broader and more rigorous approach leads to recommendations that considerably
differ from how validation indices are currently being chosen by practitioners.
Some of the most popular indices are even shown to be dominated by previously
overlooked ones
- …