Interpretable Categorization of Heterogeneous Time Series Data
Understanding heterogeneous multivariate time series data is important in
many applications ranging from smart homes to aviation. Learning models of
heterogeneous multivariate time series that are also human-interpretable is
challenging and not adequately addressed by the existing literature. We propose
grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs
extend decision trees with a grammar framework. Logical expressions derived
from a context-free grammar are used for branching in place of simple
thresholds on attributes. The added expressivity enables support for a wide
range of data types while retaining the interpretability of decision trees. In
particular, when a grammar based on temporal logic is used, we show that GBDTs
can be used for the interpretable classification of high-dimensional and
heterogeneous time series data. Furthermore, we show how GBDTs can also be used
for categorization, which is a combination of clustering and generating
interpretable explanations for each cluster. We apply GBDTs to analyze the
classic Australian Sign Language dataset as well as data on near mid-air
collisions (NMACs). The NMAC data comes from aircraft simulations used in the
development of the next-generation Airborne Collision Avoidance System (ACAS
X).
Comment: 9 pages, 5 figures, 2 tables, SIAM International Conference on Data Mining (SDM) 201
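The branching idea above can be sketched in a few lines: interior nodes test a logical/temporal expression over the whole time series rather than a threshold on a single attribute. This is a minimal illustration, not the paper's actual API; the temporal operators, node layout, labels, and the toy flight data are all assumptions made here.

```python
# Minimal sketch of a grammar-based decision tree (GBDT): interior nodes
# branch on temporal-logic expressions over a whole multivariate time
# series instead of simple attribute thresholds. All names are
# illustrative, not the paper's implementation.

def eventually(pred, series):
    """Temporal-logic 'F' operator: pred holds at some time step."""
    return any(pred(t) for t in series)

def always(pred, series):
    """Temporal-logic 'G' operator: pred holds at every time step."""
    return all(pred(t) for t in series)

class Node:
    def __init__(self, expr=None, yes=None, no=None, label=None):
        self.expr, self.yes, self.no, self.label = expr, yes, no, label

    def classify(self, series):
        if self.label is not None:            # leaf: return its class label
            return self.label
        branch = self.yes if self.expr(series) else self.no
        return branch.classify(series)

# Toy tree over made-up flight data: branch first on "eventually the
# altitude drops below 100", then on "the speed is always positive".
tree = Node(
    expr=lambda s: eventually(lambda t: t["alt"] < 100, s),
    yes=Node(label="near-miss"),
    no=Node(
        expr=lambda s: always(lambda t: t["speed"] > 0, s),
        yes=Node(label="nominal"),
        no=Node(label="anomalous"),
    ),
)

flight = [{"alt": 5000, "speed": 240}, {"alt": 90, "speed": 230}]
print(tree.classify(flight))  # -> near-miss
```

Because each branch is a grammar-derived expression, the path from root to leaf reads as a human-interpretable explanation of the classification.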
Speeding-up q-gram mining on grammar-based compressed texts
We present an efficient algorithm for calculating q-gram frequencies on
strings represented in compressed form, namely, as a straight line program
(SLP). Given an SLP of size n that represents string T, the
algorithm computes the occurrence frequencies of all q-grams in T, by
reducing the problem to the weighted q-gram frequencies problem on a
trie-like structure of size m = |T| - dup(q, T), where dup(q, T)
is a quantity that represents the amount of
redundancy that the SLP captures with respect to q-grams. The reduced problem
can be solved in linear time. Since m = O(qn), the running time of our
algorithm is O(min{|T| - dup(q, T), qn}), improving our
previous O(qn) algorithm when q = Omega(|T|/n)
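The rule-by-rule counting idea can be made concrete. For q >= 2, every occurrence of a q-gram in the derived string straddles the left/right boundary of exactly one variable occurrence in the SLP's derivation tree, so frequencies can be summed per rule, weighted by how often each variable occurs. The sketch below assumes rules are listed in topological order (children before parents); the rule encoding and names are illustrative, not the paper's notation.

```python
# Runnable sketch: q-gram frequencies directly from an SLP, without
# decompressing the whole string. Each rule is a terminal character or a
# pair of earlier rule ids; the last rule is the start symbol.
from collections import Counter

q = 3
k = q - 1

rules = ["a", "b", (0, 1), (2, 2), (3, 2)]   # this SLP derives "ababab"

# Bottom-up pass: prefix/suffix of each variable, truncated to q-1 chars.
prefix, suffix = [], []
for r in rules:
    if isinstance(r, str):
        prefix.append(r)
        suffix.append(r)
    else:
        l, rgt = r
        prefix.append((prefix[l] + prefix[rgt])[:k])
        suffix.append((suffix[l] + suffix[rgt])[-k:])

# vocc[i]: number of occurrences of variable i in the derivation tree.
vocc = [0] * len(rules)
vocc[-1] = 1                                  # the start symbol occurs once
for i in range(len(rules) - 1, -1, -1):
    if not isinstance(rules[i], str):
        l, rgt = rules[i]
        vocc[l] += vocc[i]
        vocc[rgt] += vocc[i]

# Count only the q-grams that straddle each rule's boundary, weighted by
# vocc, so every q-gram occurrence in the text is counted exactly once.
freq = Counter()
for i, r in enumerate(rules):
    if isinstance(r, str):
        continue
    l, rgt = r
    t = suffix[l] + prefix[rgt]               # at most 2(q-1) characters
    split = len(suffix[l])
    for p in range(len(t) - q + 1):
        if p < split and p + q > split:       # straddles the boundary
            freq[t[p:p + q]] += vocc[i]

print(dict(freq))  # {'aba': 2, 'bab': 2}, matching a direct count on "ababab"
```

The work per rule is bounded by the 2(q-1)-character boundary window, which is where the O(qn) bound in the abstract comes from.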
Learning Language from a Large (Unannotated) Corpus
A novel approach to the fully automated, unsupervised extraction of
dependency grammars and associated syntax-to-semantic-relationship mappings
from large text corpora is described. The suggested approach builds on the
authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well
as on a number of prior papers and approaches from the statistical language
learning literature. If successful, this approach would enable the mining of
all the information needed to power a natural language comprehension and
generation system, directly from a large, unannotated corpus.
Comment: 29 pages, 5 figures, research proposal
Application of Formal Grammar in Text Mining and Construction of an Ontology
This work describes an investigation of formal grammar with application to text mining. This is an important area, since text is the most widespread type of data and contains a great deal of potentially useful information. The unstructured nature of text requires processing methods different from those used in other types of data mining. The authors propose an original approach to text mining that builds a parse tree for each sentence using a regular grammar and constructs an ontology from the result, and they demonstrate the system implemented in a constrained scenario. The ontology can be used for tasks ranging from expert systems to automatic machine translation. The ontology is a network of concepts linked by relations. The authors developed a new system implementing the proposed approach that works across different languages.
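The pipeline described (regular grammar, per-sentence parse, concepts linked by relations) can be sketched in miniature. This is purely illustrative and not the authors' system: the tiny lexicon, the single N V N production, and the triple format are all assumptions made here.

```python
# Illustrative sketch: tag words with a tiny lexicon, accept sentences
# matched by the regular grammar S -> N V N, and store each parse as a
# (subject, relation, object) edge in an ontology of concepts.
import re

LEXICON = {"cat": "N", "dog": "N", "milk": "N", "drinks": "V", "chases": "V"}

def parse(sentence):
    """Return (subject, relation, object) if the sentence matches N V N."""
    words = sentence.lower().split()
    tags = "".join(LEXICON.get(w, "?") for w in words)
    if re.fullmatch("NVN", tags):
        return words[0], words[1], words[2]
    return None

# Build the ontology network: concept -> list of (relation, concept) edges.
ontology = {}
for s in ["cat drinks milk", "dog chases cat"]:
    triple = parse(s)
    if triple:
        subj, rel, obj = triple
        ontology.setdefault(subj, []).append((rel, obj))

print(ontology)  # {'cat': [('drinks', 'milk')], 'dog': [('chases', 'cat')]}
```

Swapping the lexicon and the accepted tag patterns is what would let the same machinery run over different languages.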
Coin.AI: A Proof-of-Useful-Work Scheme for Blockchain-based Distributed Deep Learning
One decade ago, Bitcoin was introduced, becoming the first cryptocurrency and
establishing the concept of "blockchain" as a distributed ledger. As of today,
there are many different implementations of cryptocurrencies working over a
blockchain, with different approaches and philosophies. However, many of them
share one common feature: they require proof-of-work to support the generation
of blocks (mining) and, eventually, the generation of money. This proof-of-work
scheme often consists of solving a cryptographic problem, most commonly a
hash puzzle, which can only be solved through brute force.
The main drawback of proof-of-work is that it requires enormous
amounts of energy that have no useful outcome beyond supporting the
currency. In this paper, we present a theoretical proposal that introduces a
proof-of-useful-work scheme to support a cryptocurrency running over a
blockchain, which we named Coin.AI. In this system, the mining scheme requires
training deep learning models, and a block is only mined when the performance
of such a model exceeds a threshold. The distributed system allows nodes to
verify the models delivered by miners easily (far more efficiently than the
mining process itself), determining when a block is to be
generated. Additionally, this paper presents a proof-of-storage scheme for
rewarding users that provide storage for the deep learning models, as well as a
theoretical dissertation on how the mechanics of the system could be
articulated with the ultimate goal of democratizing access to artificial
intelligence.
Comment: 17 pages, 5 figures
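The acceptance rule described, mining is expensive training, verification is a cheap evaluation against a threshold, can be sketched as follows. This is a hedged toy illustration, not Coin.AI's actual scheme: the threshold value, the XOR validation set, and the stand-in "models" are all assumptions made here.

```python
# Toy sketch of a proof-of-useful-work acceptance rule: a block is mined
# when a submitted model beats a performance threshold, and any node can
# verify cheaply by re-evaluating the model (no retraining needed).

THRESHOLD = 0.75  # minimum accuracy required to mine a block (illustrative)

# A tiny held-out validation set (XOR) standing in for real task data.
VALIDATION = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def accuracy(model, data):
    """Cheap verification step: one forward pass per example."""
    return sum(model(x) == y for x, y in data) / len(data)

def verify_block(model, data, threshold=THRESHOLD):
    """Nodes accept the block iff the miner's model clears the threshold."""
    return accuracy(model, data) >= threshold

good_model = lambda x: x[0] ^ x[1]     # stands in for a trained submission
bad_model = lambda x: 0                # stands in for a lazy miner

print(verify_block(good_model, VALIDATION))  # True
print(verify_block(bad_model, VALIDATION))   # False
```

The asymmetry between training (expensive search for a passing model) and evaluation (a single pass over the validation data) plays the role that hash brute-forcing versus hash checking plays in ordinary proof-of-work.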