
    Interpretable Categorization of Heterogeneous Time Series Data

    Understanding heterogeneous multivariate time series data is important in many applications ranging from smart homes to aviation. Learning models of heterogeneous multivariate time series that are also human-interpretable is challenging and not adequately addressed by the existing literature. We propose grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs extend decision trees with a grammar framework. Logical expressions derived from a context-free grammar are used for branching in place of simple thresholds on attributes. The added expressivity enables support for a wide range of data types while retaining the interpretability of decision trees. In particular, when a grammar based on temporal logic is used, we show that GBDTs can be used for the interpretable classification of high-dimensional and heterogeneous time series data. Furthermore, we show how GBDTs can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply GBDTs to analyze the classic Australian Sign Language dataset as well as data on near mid-air collisions (NMACs). The NMAC data comes from aircraft simulations used in the development of the next-generation Airborne Collision Avoidance System (ACAS X).
    Comment: 9 pages, 5 figures, 2 tables, SIAM International Conference on Data Mining (SDM) 201
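    To make the branching idea concrete, below is a minimal sketch of a grammar-based decision tree node in Python. The predicate, class labels, and altitude data are illustrative assumptions standing in for expressions a temporal-logic grammar would generate; this is not the paper's learned model or its learning algorithm.

        class GBDTNode:
            """A node that branches on a boolean expression over a whole
            multivariate time series instead of a single-attribute threshold."""

            def __init__(self, predicate=None, true_child=None,
                         false_child=None, label=None):
                self.predicate = predicate      # callable: series -> bool
                self.true_child = true_child
                self.false_child = false_child
                self.label = label              # set only at leaves

            def classify(self, series):
                if self.label is not None:
                    return self.label
                branch = self.true_child if self.predicate(series) else self.false_child
                return branch.classify(series)

        # The kind of predicate a temporal-logic grammar could derive:
        # "eventually, vertical separation drops below 100 ft" (hypothetical).
        def eventually_close(series):
            return any(abs(d) < 100 for d in series["alt_diff"])

        tree = GBDTNode(predicate=eventually_close,
                        true_child=GBDTNode(label="NMAC"),
                        false_child=GBDTNode(label="safe encounter"))

        print(tree.classify({"alt_diff": [900, 400, 80]}))  # NMAC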

    Speeding-up q-gram mining on grammar-based compressed texts

    We present an efficient algorithm for calculating $q$-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP $\mathcal{T}$ of size $n$ that represents string $T$, the algorithm computes the occurrence frequencies of all $q$-grams in $T$ by reducing the problem to the weighted $q$-gram frequencies problem on a trie-like structure of size $m = |T| - \mathit{dup}(q, \mathcal{T})$, where $\mathit{dup}(q, \mathcal{T})$ is a quantity that represents the amount of redundancy that the SLP captures with respect to $q$-grams. The reduced problem can be solved in linear time. Since $m = O(qn)$, the running time of our algorithm is $O(\min\{|T| - \mathit{dup}(q, \mathcal{T}),\, qn\})$, improving on our previous $O(qn)$ algorithm when $q = \Omega(|T|/n)$.
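    For readers unfamiliar with the setting, the sketch below shows what an SLP is and what the $q$-gram frequency problem asks, using a toy rule set and the naive decompress-and-count baseline that the paper's compressed-domain algorithm avoids. The rules are an illustrative assumption, not taken from the paper.

        from collections import Counter

        # A straight line program: each variable derives one character or the
        # concatenation of two earlier variables; expanding the last variable
        # yields the represented string T.
        rules = {
            "X1": "a", "X2": "b",
            "X3": ("X1", "X2"),   # ab
            "X4": ("X3", "X3"),   # abab
            "X5": ("X4", "X3"),   # ababab
        }

        def expand(var, memo):
            # Memoization visits each rule once, though the expanded string
            # can be exponentially long in the number of rules in general.
            if var not in memo:
                rhs = rules[var]
                memo[var] = rhs if isinstance(rhs, str) else \
                    expand(rhs[0], memo) + expand(rhs[1], memo)
            return memo[var]

        T = expand("X5", {})
        q = 2
        print(Counter(T[i:i + q] for i in range(len(T) - q + 1)))
        # Counter({'ab': 3, 'ba': 2})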

    Learning Language from a Large (Unannotated) Corpus

    A novel approach to the fully automated, unsupervised extraction of dependency grammars and associated syntax-to-semantic-relationship mappings from large text corpora is described. The suggested approach builds on the authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well as on a number of prior papers and approaches from the statistical language learning literature. If successful, this approach would enable the mining of all the information needed to power a natural language comprehension and generation system, directly from a large, unannotated corpus.
    Comment: 29 pages, 5 figures, research proposal

    Application of Formal Grammar in Text Mining and Construction of an Ontology

    This work describes an investigation of formal grammar with application to text mining. This is an important area, since text is the most widespread type of data and contains a great deal of potentially useful information. The unstructured nature of text requires processing methods different from those used for other types of data mining. In this work, the authors propose an original approach to text mining that builds a parse tree for each sentence using a regular grammar and creates an ontology from the parses, and they demonstrate the system implemented in a constrained scenario. The resulting ontology can be used for different tasks, ranging from expert systems to automatic machine translation. The ontology is a network consisting of concepts linked by relations. The authors developed a new system that implements the proposed approach and works in different languages.
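    A minimal sketch of that pipeline in Python: match each sentence against a toy regular-grammar pattern (here a regular expression), read off a subject-relation-object triple, and accumulate the triples as concepts linked by relations. The pattern and sentences are illustrative assumptions, not the authors' grammar or system.

        import re

        # Toy "regular grammar" for sentences of the form
        # [det] noun verb [det] noun.
        PATTERN = re.compile(r"^(?:the |a )?(\w+) (\w+) (?:the |a )?(\w+)$")

        ontology = {}  # (concept, relation) -> set of related concepts

        for sentence in ["a parser builds the tree",
                         "the tree feeds the ontology"]:
            match = PATTERN.match(sentence.lower())
            if match:
                subject, relation, obj = match.groups()
                ontology.setdefault((subject, relation), set()).add(obj)

        print(ontology)
        # {('parser', 'builds'): {'tree'}, ('tree', 'feeds'): {'ontology'}}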

    Coin.AI: A Proof-of-Useful-Work Scheme for Blockchain-based Distributed Deep Learning

    One decade ago, Bitcoin was introduced, becoming the first cryptocurrency and establishing the concept of "blockchain" as a distributed ledger. As of today, there are many different implementations of cryptocurrencies working over a blockchain, with different approaches and philosophies. However, many of them share one common feature: they require proof-of-work to support the generation of blocks (mining) and, eventually, the generation of money. This proof-of-work scheme often consists of solving a cryptographic puzzle, most commonly finding a hash value that satisfies a difficulty target, which can only be achieved through brute force. The main drawback of proof-of-work is that it requires extremely large amounts of energy that have no useful outcome beyond supporting the currency. In this paper, we present a theoretical proposal that introduces a proof-of-useful-work scheme to support a cryptocurrency running over a blockchain, which we name Coin.AI. In this system, the mining scheme requires training deep learning models, and a block is only mined when the performance of such a model exceeds a threshold. The distributed system allows nodes to verify the models delivered by miners easily (certainly much more efficiently than the mining process itself), determining when a block is to be generated. Additionally, this paper presents a proof-of-storage scheme for rewarding users who provide storage for the deep learning models, as well as a theoretical dissertation on how the mechanics of the system could be articulated with the ultimate goal of democratizing access to artificial intelligence.
    Comment: 17 pages, 5 figures
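    The asymmetry the abstract relies on, that verifying a model is far cheaper than training it, can be sketched as below. The threshold value, toy model, and validation data are illustrative assumptions; the paper's actual consensus mechanics are more involved.

        ACCURACY_THRESHOLD = 0.9   # hypothetical consensus parameter

        def accuracy(model, dataset):
            """Fraction of held-out examples the candidate model gets right."""
            return sum(model(x) == y for x, y in dataset) / len(dataset)

        def verify_block(model, validation_set):
            """Accept a block only if the miner's model clears the threshold;
            one evaluation pass is far cheaper than the training (mining)
            that produced the model."""
            return accuracy(model, validation_set) >= ACCURACY_THRESHOLD

        # Toy stand-in for a trained deep model: classify by sign.
        model = lambda x: 1 if x >= 0 else 0
        validation_set = [(-2, 0), (-1, 0), (0.5, 1), (3, 1)]
        print(verify_block(model, validation_set))  # True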