
    Interpretable Categorization of Heterogeneous Time Series Data

    Understanding heterogeneous multivariate time series data is important in many applications ranging from smart homes to aviation. Learning models of heterogeneous multivariate time series that are also human-interpretable is challenging and not adequately addressed by the existing literature. We propose grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs extend decision trees with a grammar framework. Logical expressions derived from a context-free grammar are used for branching in place of simple thresholds on attributes. The added expressivity enables support for a wide range of data types while retaining the interpretability of decision trees. In particular, when a grammar based on temporal logic is used, we show that GBDTs can be used for the interpretable classification of high-dimensional and heterogeneous time series data. Furthermore, we show how GBDTs can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply GBDTs to analyze the classic Australian Sign Language dataset as well as data on near mid-air collisions (NMACs). The NMAC data comes from aircraft simulations used in the development of the next-generation Airborne Collision Avoidance System (ACAS X).
    Comment: 9 pages, 5 figures, 2 tables, SIAM International Conference on Data Mining (SDM) 201
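    To make the branching idea concrete, below is a minimal sketch of a grammar-based decision tree node in Python. The predicate, class labels, and altitude data are illustrative assumptions standing in for expressions a temporal-logic grammar would generate; this is not the paper's learned model or its learning algorithm.

        class GBDTNode:
            """A node that branches on a boolean expression over a whole
            multivariate time series instead of a single-attribute threshold."""

            def __init__(self, predicate=None, true_child=None,
                         false_child=None, label=None):
                self.predicate = predicate      # callable: series -> bool
                self.true_child = true_child
                self.false_child = false_child
                self.label = label              # set only at leaves

            def classify(self, series):
                if self.label is not None:
                    return self.label
                branch = self.true_child if self.predicate(series) else self.false_child
                return branch.classify(series)

        # The kind of predicate a temporal-logic grammar could derive:
        # "eventually, vertical separation drops below 100 ft" (hypothetical).
        def eventually_close(series):
            return any(abs(d) < 100 for d in series["alt_diff"])

        tree = GBDTNode(predicate=eventually_close,
                        true_child=GBDTNode(label="NMAC"),
                        false_child=GBDTNode(label="safe encounter"))

        print(tree.classify({"alt_diff": [900, 400, 80]}))  # NMAC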

    Speeding-up q-gram mining on grammar-based compressed texts

    We present an efficient algorithm for calculating $q$-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP $\mathcal{T}$ of size $n$ that represents string $T$, the algorithm computes the occurrence frequencies of all $q$-grams in $T$ by reducing the problem to the weighted $q$-gram frequencies problem on a trie-like structure of size $m = |T| - \mathit{dup}(q, \mathcal{T})$, where $\mathit{dup}(q, \mathcal{T})$ is a quantity that represents the amount of redundancy that the SLP captures with respect to $q$-grams. The reduced problem can be solved in linear time. Since $m = O(qn)$, the running time of our algorithm is $O(\min\{|T| - \mathit{dup}(q, \mathcal{T}),\, qn\})$, improving on our previous $O(qn)$ algorithm when $q = \Omega(|T|/n)$.
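    For readers unfamiliar with the setting, the sketch below shows what an SLP is and what the $q$-gram frequency problem asks, using a toy rule set and the naive decompress-and-count baseline that the paper's compressed-domain algorithm avoids. The rules are an illustrative assumption, not taken from the paper.

        from collections import Counter

        # A straight line program: each variable derives one character or the
        # concatenation of two earlier variables; expanding the last variable
        # yields the represented string T.
        rules = {
            "X1": "a", "X2": "b",
            "X3": ("X1", "X2"),   # ab
            "X4": ("X3", "X3"),   # abab
            "X5": ("X4", "X3"),   # ababab
        }

        def expand(var, memo):
            # Memoization visits each rule once, though the expanded string
            # can be exponentially long in the number of rules in general.
            if var not in memo:
                rhs = rules[var]
                memo[var] = rhs if isinstance(rhs, str) else \
                    expand(rhs[0], memo) + expand(rhs[1], memo)
            return memo[var]

        T = expand("X5", {})
        q = 2
        print(Counter(T[i:i + q] for i in range(len(T) - q + 1)))
        # Counter({'ab': 3, 'ba': 2})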

    Learning Language from a Large (Unannotated) Corpus

    A novel approach to the fully automated, unsupervised extraction of dependency grammars and associated syntax-to-semantic-relationship mappings from large text corpora is described. The suggested approach builds on the authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well as on a number of prior papers and approaches from the statistical language learning literature. If successful, this approach would enable the mining of all the information needed to power a natural language comprehension and generation system, directly from a large, unannotated corpus.
    Comment: 29 pages, 5 figures, research proposal

    Application of Formal Grammar in Text Mining and Construction of an Ontology

    This work describes an investigation of formal grammar with application to text mining. This is an important area, since text is the most widespread type of data and contains a great deal of potentially useful information. The unstructured nature of text requires processing methods different from those used for other types of data mining. In this work, the authors propose an original approach to text mining that builds a parse tree for each sentence using a regular grammar and creates an ontology from the parses, and they demonstrate the system implemented in a constrained scenario. The resulting ontology can be used for different tasks, ranging from expert systems to automatic machine translation. The ontology is a network consisting of concepts linked by relations. The authors developed a new system that implements the proposed approach and works in different languages.
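    A minimal sketch of that pipeline in Python: match each sentence against a toy regular-grammar pattern (here a regular expression), read off a subject-relation-object triple, and accumulate the triples as concepts linked by relations. The pattern and sentences are illustrative assumptions, not the authors' grammar or system.

        import re

        # Toy "regular grammar" for sentences of the form
        # [det] noun verb [det] noun.
        PATTERN = re.compile(r"^(?:the |a )?(\w+) (\w+) (?:the |a )?(\w+)$")

        ontology = {}  # (concept, relation) -> set of related concepts

        for sentence in ["a parser builds the tree",
                         "the tree feeds the ontology"]:
            match = PATTERN.match(sentence.lower())
            if match:
                subject, relation, obj = match.groups()
                ontology.setdefault((subject, relation), set()).add(obj)

        print(ontology)
        # {('parser', 'builds'): {'tree'}, ('tree', 'feeds'): {'ontology'}}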

    Coin.AI: A Proof-of-Useful-Work Scheme for Blockchain-based Distributed Deep Learning

    One decade ago, Bitcoin was introduced, becoming the first cryptocurrency and establishing the concept of "blockchain" as a distributed ledger. As of today, there are many different implementations of cryptocurrencies working over a blockchain, with different approaches and philosophies. However, many of them share one common feature: they require proof-of-work to support the generation of blocks (mining) and, eventually, the generation of money. This proof-of-work scheme often consists of solving a cryptographic puzzle, most commonly finding a hash value that satisfies a difficulty target, which can only be achieved through brute force. The main drawback of proof-of-work is that it requires extremely large amounts of energy that have no useful outcome beyond supporting the currency. In this paper, we present a theoretical proposal that introduces a proof-of-useful-work scheme to support a cryptocurrency running over a blockchain, which we name Coin.AI. In this system, the mining scheme requires training deep learning models, and a block is only mined when the performance of such a model exceeds a threshold. The distributed system allows nodes to verify the models delivered by miners easily (certainly much more efficiently than the mining process itself), determining when a block is to be generated. Additionally, this paper presents a proof-of-storage scheme for rewarding users who provide storage for the deep learning models, as well as a theoretical dissertation on how the mechanics of the system could be articulated with the ultimate goal of democratizing access to artificial intelligence.
    Comment: 17 pages, 5 figures
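    The asymmetry the abstract relies on, that verifying a model is far cheaper than training it, can be sketched as below. The threshold value, toy model, and validation data are illustrative assumptions; the paper's actual consensus mechanics are more involved.

        ACCURACY_THRESHOLD = 0.9   # hypothetical consensus parameter

        def accuracy(model, dataset):
            """Fraction of held-out examples the candidate model gets right."""
            return sum(model(x) == y for x, y in dataset) / len(dataset)

        def verify_block(model, validation_set):
            """Accept a block only if the miner's model clears the threshold;
            one evaluation pass is far cheaper than the training (mining)
            that produced the model."""
            return accuracy(model, validation_set) >= ACCURACY_THRESHOLD

        # Toy stand-in for a trained deep model: classify by sign.
        model = lambda x: 1 if x >= 0 else 0
        validation_set = [(-2, 0), (-1, 0), (0.5, 1), (3, 1)]
        print(verify_block(model, validation_set))  # True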