8 research outputs found

    Using Case Prototypicality as a Semantic Primitive

    Online pattern recognition in subsequence time series clustering

    One of the open issues in subsequence time series clustering is online pattern recognition. This kind of clustering is applied in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. Among these fields, pattern recognition is one of the essential concepts. To implement the idea of online pattern recognition, we choose sequences of ECG data as the subsequence time series data. Additionally, using ECG data can help to interpret heart activity in order to find heart diseases. This paper offers a way to perform online pattern recognition in subsequence time series clustering so that results are available at runtime.

    Determining provenance in phishing websites using automated conceptual analysis

    Phishing is a form of online fraud with drastic consequences for the victims and institutions being defrauded. A phishing attack tries to create a believable environment for the intended victim to enter their confidential data such that the attacker can use or sell this information later. In order to apprehend phishers, law enforcement agencies need automated systems capable of tracking the size and scope of phishing attacks, in order to more wisely use their resources shutting down the major players, rather than wasting resources stopping smaller operations. In order to develop these systems, phishing attacks need to be clustered by provenance in a way that adequately profiles these evolving attackers. The research presented in this paper looks at the viability of using automated conceptual analysis through cluster analysis techniques on phishing websites, with the aim of determining provenance of these phishing attacks. Conceptual analysis is performed on the source code of the websites, rather than the final text that is displayed to the user, eliminating problems with rendering obfuscation and increasing the distinctiveness brought about by differences in coding styles of the phishers. By using cluster analysis algorithms, distinguishing factors between groups of phishing websites can be obtained. The results indicate that it is difficult to separate websites by provenance without also separating by intent, by looking at the phishing websites alone. Instead, the methods discussed in this paper should form part of a larger system that uses more information about the phishing attacks.

    The Minimum Description Length Principle for Pattern Mining: A Survey

    This is about the Minimum Description Length (MDL) principle applied to pattern mining. The length of this description is kept to the minimum. Mining patterns is a core task in data analysis and, beyond issues of efficient enumeration, the selection of patterns constitutes a major challenge. The MDL principle, a model selection method grounded in information theory, has been applied to pattern mining with the aim of obtaining compact, high-quality sets of patterns. After giving an outline of relevant concepts from information theory and coding, as well as of work on the theory behind the MDL and similar principles, we review MDL-based methods for mining various types of data and patterns. Finally, we open a discussion on some issues regarding these methods, and highlight currently active related data analysis problems.

    A Graph Analytics Framework for Knowledge Discovery

    Title from PDF of title page, viewed on June 20, 2016. Dissertation advisor: Yugyung Lee. Vita. Includes bibliographical references (pages 203-222). Thesis (Ph.D.)--School of Computing and Engineering. University of Missouri--Kansas City, 2016.
    In the current data movement, numerous efforts have been made to convert and normalize a large number of traditionally structured and unstructured data to semi-structured data (e.g., RDF, OWL). With the increasing number of semi-structured data coming into the big data community, data integration and knowledge discovery from heterogeneous domains become important research problems. In the application level, detection of related concepts among ontologies shows a huge potential to do knowledge discovery with big data. In RDF graph, concepts represent entities and predicates indicate properties that connect different entities. It is more crucial to figure out how different concepts are related within a single ontology or across multiple ontologies by analyzing predicates in different knowledge bases. However, the world today is one of information explosion, and it is extremely difficult for researchers to find existing or potential predicates to perform linking among cross domains concepts without any support from schema pattern analysis. Therefore, there is a need for a mechanism to do predicate oriented pattern analysis to partition heterogeneous ontologies into closer small topics and generate query to discover cross domains knowledge from each topic. In this work, we present such a model that conducts predicate oriented pattern analysis based on their close relationship and generates a similarity matrix. Based on this similarity matrix, we apply an innovative unsupervised learning algorithm to partition large data sets into smaller and closer topics that generate meaningful queries to fully discover knowledge over a set of interlinked data sources.
    In this dissertation, we present a graph analytics framework that aims at providing semantic methods for analysis and pattern discovery from graph data with cross domains. Our contributions can be summarized as follows:
    • The definition of predicate oriented neighborhood measures to determine the neighborhood relationships among different RDF predicates of linked data across domains;
    • The design of the global and local optimization of clustering and retrieval algorithms to maximize the knowledge discovery from large linked data: i) top-down clustering, called the Hierarchical Predicate oriented K-means Clustering; ii) bottom up clustering, called the Predicate oriented Hierarchical Agglomerative Clustering; iii) automatic topic discovery and query generation, context aware topic path finding for a given source and target pair;
    • The implementation of an interactive tool and endpoints for knowledge discovery and visualization from integrated query design and query processing for cross domains;
    • Experimental evaluations conducted to validate proposed methodologies of the framework using DBpedia, YAGO, and Bio2RDF datasets and comparison of the proposed methods with existing graph partition methods and topic discovery methods.
    In this dissertation, we propose a framework called the GraphKDD. The GraphKDD is able to analyze and quantify close relationship among predicates based on Predicate Oriented Neighbor Pattern (PONP). Based on PONP, the GraphKDD conducts a Hierarchical Predicate oriented K-Means clustering (HPKM) algorithm and a Predicate oriented Hierarchical Agglomerative clustering (PHAL) algorithm to partition graphs into semantically related sub-graphs. In addition, in application level, the GraphKDD is capable of generating query dynamically from topic discovery results and testing reachability between source target nodes.
    We validate the proposed GraphKDD framework through comprehensive evaluations using DBpedia, YAGO and Bio2RDF datasets.
    Introduction -- Predicate oriented neighborhood patterns -- Unsupervised learning on PONP Association Measurement -- Query generation and topic aware link discovery -- The GraphKDD ontology learning framework -- Conclusion and future work

    Argumentative zoning information extraction from scientific text

    Let me tell you, writing a thesis is not always a barrel of laughs, and strange things can happen, too. For example, at the height of my thesis paranoia, I had a recurrent dream in which my cat Amy gave me detailed advice on how to restructure the thesis chapters, which was awfully nice of her. But I also had a lot of human help throughout this time, whether things were going fine or berserk. Most of all, I want to thank Marc Moens: I could not have had a better or more knowledgeable supervisor. He always took time for me, however busy he might have been, reading chapters thoroughly in two days. He had both the calmness of mind to give me lots of freedom in research, and the right judgement to guide me away, tactfully but determinedly, from the occasional catastrophe or other waiting along the way. He was great fun to work with and also became a good friend. My work has profited from the interdisciplinary, interactive and enlightened atmosphere at the Human Communication Centre and the Centre for Cognitive Science (which is now called something else). The Language Technology Group was a great place to work in, as my research was grounded in practical applications develope

    Clustering Words with the MDL Principle
