The Case for Learned Index Structures
Indexes are models: a B-Tree-Index can be seen as a model to map a key to the
position of a record within a sorted array, a Hash-Index as a model to map a
key to a position of a record within an unsorted array, and a BitMap-Index as a
model to indicate if a data record exists or not. In this exploratory research
paper, we start from this premise and posit that all existing index structures
can be replaced with other types of models, including deep-learning models,
which we term learned indexes. The key idea is that a model can learn the sort
order or structure of lookup keys and use this signal to effectively predict
the position or existence of records. We theoretically analyze under which
conditions learned indexes outperform traditional index structures and describe
the main challenges in designing learned index structures. Our initial results
show that by using neural nets we are able to outperform cache-optimized
B-Trees by up to 70% in speed while saving an order of magnitude in memory over
several real-world data sets. More importantly, though, we believe that the idea
of replacing core components of a data management system with learned models
has far-reaching implications for future system designs, and that this work
provides just a glimpse of what might be possible.
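The premise that an index is a model can be illustrated with a minimal sketch (an assumption for illustration, not the paper's actual architecture): a least-squares linear model approximates the position of each key in a sorted array, and the model's maximum training error bounds a final local search.

```python
import numpy as np

# Illustrative sketch only: a linear model standing in for a learned index
# over a sorted key array. Keys and positions here are synthetic.
keys = np.sort(np.random.default_rng(0).uniform(0, 1000, 10_000))
positions = np.arange(len(keys))

# "Train" the index: least-squares fit of position as a function of key.
slope, intercept = np.polyfit(keys, positions, 1)

# The worst-case prediction error bounds the local search window.
pred = slope * keys + intercept
max_err = int(np.ceil(np.max(np.abs(pred - positions))))

def lookup(key):
    """Predict a position, then binary-search only within the error window."""
    guess = int(slope * key + intercept)
    lo = max(0, guess - max_err)
    hi = min(len(keys), guess + max_err + 1)
    return lo + np.searchsorted(keys[lo:hi], key)
```

The point of the sketch is the trade-off the abstract describes: the model replaces most of the tree traversal, and correctness is preserved because the residual search is confined to a provably small window.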
Financial time series prediction using spiking neural networks
In this paper a novel application of a particular type of spiking neural network, a Polychronous Spiking Network, to financial time series prediction is presented. It is argued that the inherent temporal capabilities of this type of network are suited to non-stationary data such as this. The performance of the spiking neural network was benchmarked against three systems: two "traditional" rate-encoded neural networks (a Multi-Layer Perceptron and a Dynamic Ridge Polynomial neural network) and a standard Linear Predictor Coefficients model. For this comparison, three non-stationary and noisy time series were used: IBM stock data, US/Euro exchange rate data, and the price of Brent crude oil. The experiments demonstrated favourable prediction results for the Spiking Neural Network in terms of Annualised Return and prediction error for 5-step-ahead predictions. These results were also supported by other relevant metrics such as Maximum Drawdown and Signal-to-Noise Ratio. This work demonstrated the applicability of the Polychronous Spiking Network to financial data forecasting, which in turn indicates the potential of such networks over traditional systems in difficult-to-manage non-stationary environments. © 2014 Reid et al.
Quantifying and identifying the overlapping community structure in networks
It has been shown that the communities of complex networks often overlap with
each other. However, there is no effective method to quantify the overlapping
community structure. In this paper, we propose a metric to address this
problem. Instead of assuming that one node can only belong to one community,
our metric assumes that a maximal clique only belongs to one community. In this
way, the overlaps between communities are allowed. To identify the overlapping
community structure, we construct a maximal clique network from the original
network, and prove that the optimization of our metric on the original network
is equivalent to the optimization of Newman's modularity on the maximal clique
network. Thus the overlapping community structure can be identified through
partitioning the maximal clique network using any modularity optimization
method. The effectiveness of our metric is demonstrated by extensive tests on
both the artificial networks and the real world networks with known community
structure. The application to the word association network also reproduces
excellent results.
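The construction the abstract describes can be sketched with networkx (an assumption for illustration; `greedy_modularity_communities` stands in for "any modularity optimization method", and this sketch uses an unweighted clique network, which may differ from the paper's exact metric):

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy input graph; the paper's tests use artificial and real-world networks.
G = nx.karate_club_graph()

# Step 1: the nodes of the clique network are G's maximal cliques.
cliques = [frozenset(c) for c in nx.find_cliques(G)]

# Step 2: link two cliques when they share at least one node of G.
C = nx.Graph()
C.add_nodes_from(range(len(cliques)))
for i in range(len(cliques)):
    for j in range(i + 1, len(cliques)):
        if cliques[i] & cliques[j]:
            C.add_edge(i, j)

# Step 3: partition the clique network with a standard modularity
# optimizer, then map each clique community back to G's nodes. Overlaps
# arise because a node can appear in cliques assigned to different
# communities.
communities = [set().union(*(cliques[i] for i in part))
               for part in greedy_modularity_communities(C)]
```

Because every node of the original network belongs to at least one maximal clique, the recovered communities together cover the whole graph, while shared nodes may legitimately appear in several of them.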
A Survey of Bayesian Statistical Approaches for Big Data
The modern era is characterised as an era of information or Big Data. This
has motivated a huge literature on new methods for extracting information and
insights from these data. A natural question is how these approaches differ
from those that were available prior to the advent of Big Data. We present a
review of published studies that present Bayesian statistical approaches
specifically for Big Data and discuss the reported and perceived benefits of
these approaches. We conclude by addressing the question of whether focusing
only on improving computational algorithms and infrastructure will be enough to
face the challenges of Big Data.
Learning American English Accents Using Ensemble Learning with GMMs
Interest in accent identification has grown over the past decade, with reasonable success when a priori knowledge about the accents is available. A typical approach entails detecting certain syllables and phonemes, which in turn requires phoneme-based models. Recently, Gaussian Mixture Models (GMMs) have been used as an unsupervised alternative to these phoneme-based models, but they have had limited success unless they used a priori knowledge. We studied extensions of the GMMs using ensemble learning (i.e., bagging and boosting).
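The bagging variant of the approach can be sketched with scikit-learn (an assumption for illustration: synthetic feature vectors stand in for acoustic features such as MFCCs, and the ensemble here averages per-model log-likelihoods, which is one reasonable bagging scheme rather than the paper's exact method):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-accent acoustic feature vectors.
accent_a = rng.normal(0.0, 1.0, size=(500, 4))
accent_b = rng.normal(2.0, 1.0, size=(500, 4))

def bagged_gmms(data, n_models=5, n_components=3):
    """Fit one GMM per bootstrap resample of the training data."""
    models = []
    for _ in range(n_models):
        sample = data[rng.integers(0, len(data), len(data))]
        models.append(GaussianMixture(n_components=n_components,
                                      random_state=0).fit(sample))
    return models

models = {"a": bagged_gmms(accent_a), "b": bagged_gmms(accent_b)}

def classify(utterance):
    """Average each accent ensemble's per-sample log-likelihood over the
    utterance's frames and return the highest-scoring accent label."""
    scores = {label: np.mean([m.score(utterance) for m in ms])
              for label, ms in models.items()}
    return max(scores, key=scores.get)
```

Averaging over bootstrap-trained models smooths out the sensitivity of a single GMM to its initialization and training sample, which is the motivation for applying bagging here.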
