Inferring short-term volatility indicators from Bitcoin blockchain
In this paper, we study the possibility of inferring early warning indicators
(EWIs) for periods of extreme bitcoin price volatility using features obtained
from daily Bitcoin transaction graphs. We infer low-dimensional
representations of the transaction graphs for the period from 2012 to 2017
using the Bitcoin blockchain, and demonstrate how these representations can be used
to predict extreme price volatility events. Our EWI, which is obtained with a
non-negative decomposition, contains more predictive information than those
obtained with singular value decomposition or the scalar value of the total Bitcoin
transaction volume.
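As a rough illustration of what a non-negative decomposition of daily features might look like, the sketch below factorizes a small non-negative matrix V (rows as days, columns as graph features) into W and H using Lee and Seung's multiplicative updates. The toy matrix, the rank, and all function names are assumptions made for illustration; the abstract does not specify the paper's features or algorithm.

```python
import random


def matmul(a, b):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate V (n x m, non-negative) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialization keeps every update well defined.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy data: 5 days x 4 non-negative graph features.
daily_features = [
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 0.0, 4.0],
    [0.0, 1.0, 5.0, 4.0],
]
w, h = nmf(daily_features, rank=2)
print(round(frobenius_error(daily_features, w, h), 3))
```

Each row of W is then a two-dimensional, non-negative summary of one day, which could be fed to a downstream volatility predictor. An SVD-based representation would allow negative entries, which is the contrast the abstract draws.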
Scalable Teacher Forcing Network for Semi-Supervised Large Scale Data Streams
The large-scale data stream problem refers to a high-speed information flow
that cannot be processed in a scalable manner on a traditional computing
platform. The problem also imposes expensive labelling costs, making the
deployment of fully supervised algorithms infeasible. On the other hand, the
problem of semi-supervised large-scale data streams is little explored in the
literature because most works are designed in the traditional single-node
computing environments while also being fully supervised approaches. This paper
offers Weakly Supervised Scalable Teacher Forcing Network (WeScatterNet) to
cope with the scarcity of labelled samples and the large-scale data streams
simultaneously. WeScatterNet is built on the Apache Spark distributed computing
platform, with a data-free model fusion strategy for model compression after
the parallel computing stage. It features an open network structure to address the
global and local drift problems while integrating a data augmentation,
annotation and auto-correction method for handling partially labelled
data streams. The performance of WeScatterNet is numerically evaluated in the
six large-scale data stream problems with only label proportions. It
shows highly competitive performance even if compared with fully supervised
learners with label proportions.
Comment: This paper has been accepted for publication in Information Science
Evolving Ensemble Fuzzy Classifier
The concept of ensemble learning offers a promising avenue in learning from
data streams under complex environments because it addresses the bias and
variance dilemma better than its single model counterpart and features a
reconfigurable structure, which is well suited to the given context. While
various extensions of ensemble learning for mining non-stationary data streams
can be found in the literature, most of them are built on a static base
classifier and revisit preceding samples in a sliding window for a
retraining step. This feature causes computationally prohibitive complexity and
is not flexible enough to cope with rapidly changing environments. Their
complexity is often demanding because they involve a large collection of
offline classifiers, owing to the absence of structural complexity reduction
mechanisms and the lack of an online feature selection mechanism. A novel evolving
ensemble classifier, namely Parsimonious Ensemble (pENsemble), is proposed in
this paper. pENsemble differs from existing architectures in that it
is built upon an evolving classifier from data streams, termed Parsimonious
Classifier (pClass). pENsemble is equipped with an ensemble pruning mechanism,
which estimates a localized generalization error of a base classifier. A
dynamic online feature selection scenario is integrated into the pENsemble.
This method allows for dynamic selection and deselection of input features on
the fly. pENsemble adopts a dynamic ensemble structure to output a final
classification decision where it features a novel drift detection scenario to
grow the ensemble structure. The efficacy of pENsemble has been
demonstrated through rigorous numerical studies with dynamic and evolving data
streams, where it delivers the most encouraging performance in attaining a
tradeoff between accuracy and complexity.
Comment: this paper has been published by IEEE Transactions on Fuzzy Systems
Pol-InSAR-Island - A benchmark dataset for multi-frequency Pol-InSAR data land cover classification
This paper presents Pol-InSAR-Island, the first publicly available multi-frequency Polarimetric Interferometric Synthetic Aperture Radar (Pol-InSAR) dataset labeled with detailed land cover classes, which serves as a challenging benchmark dataset for land cover classification. In recent years, machine learning has become a powerful tool for remote sensing image analysis. While there are numerous large-scale benchmark datasets for training and evaluating machine learning models for the analysis of optical data, the availability of labeled SAR or, more specifically, Pol-InSAR data is very limited. The lack of labeled data for training, as well as for testing and comparing different approaches, hinders the rapid development of machine learning algorithms for Pol-InSAR image analysis. The Pol-InSAR-Island benchmark dataset presented in this paper aims to fill this gap. The dataset consists of Pol-InSAR data acquired in S- and L-band by DLR's airborne F-SAR system over the East Frisian island Baltrum. The interferometric image pairs are the result of a repeat-pass measurement with a time offset of several minutes. The image data are given as 6 × 6 coherency matrices in ground range on a 1 m × 1 m grid. Pixel-accurate class labels, consisting of 12 different land cover classes, are generated in a semi-automatic process based on an existing biotope type map and visual interpretation of SAR and optical images. Fixed training and test subsets are defined to ensure the comparability of different approaches trained and tested prospectively on the Pol-InSAR-Island dataset. In addition to the dataset, results of supervised Wishart and Random Forest classifiers that achieve mean Intersection-over-Union scores between 24% and 67% are provided to serve as a baseline for future work. The dataset is provided via KITopenData: https://doi.org/10.35097/170
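For readers unfamiliar with the metric quoted above, the following minimal sketch computes a mean Intersection-over-Union score from flattened per-pixel labels. The function name and toy labels are hypothetical, and details such as how unlabeled pixels or absent classes are handled are assumptions here; the dataset paper's exact evaluation protocol may differ.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Average the per-class IoU over classes that occur in either label list."""
    ious = []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        intersection = sum(1 for t, p in pairs if t == c and p == c)
        union = sum(1 for t, p in pairs if t == c or p == c)
        if union > 0:  # assumption: skip classes absent from both lists
            ious.append(intersection / union)
    if not ious:
        raise ValueError("no class occurs in the labels or the predictions")
    return sum(ious) / len(ious)


# Hypothetical flattened pixel labels for a 3-class toy example.
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
score = mean_iou(reference, predicted, num_classes=3)
print(score)  # per-class IoUs are 1/3, 2/3 and 1/2, so the mean is about 0.5
```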
On the Intersection of Communication and Machine Learning
The intersection of communication and machine learning is attracting increasing interest from both communities. On the one hand, the development of modern communication systems brings large amounts of data and high performance requirements, which challenges the classic study philosophy based on analytical derivation and encourages researchers to explore data-driven methods, such as machine learning, to solve problems of high complexity and large scale. On the other hand, the use of distributed machine learning introduces communication cost as one of the basic considerations in the design of machine learning algorithms and systems.
In this thesis, we first explore the application of machine learning to one of the classic problems in wireless networks, resource allocation, for heterogeneous millimeter wave networks in highly dynamic environments. We address the practical concerns by providing an efficient online and distributed framework. In the second part, a sampling-based communication-efficient distributed learning algorithm is proposed. We exploit the trade-off between local computation and total communication cost and propose an algorithm with a good theoretical bound. In more detail, this thesis makes the following contributions.
We introduce a reinforcement learning framework to solve resource allocation problems in heterogeneous millimeter wave networks. The large state/action space is decomposed according to the topology of the network and solved by an efficient distributed message passing algorithm. We further speed up the inference process with an online updating process.
We propose a distributed coreset-based boosting framework. An efficient coreset construction algorithm is proposed based on the prior knowledge provided by clustering. The coreset is then integrated with boosting, with an improved convergence rate. We extend the proposed boosting framework to the distributed setting, where the communication cost is reduced by the good approximation of the coreset.
We propose a selective sampling framework to construct a subset of samples that can effectively represent the model space. Based on the prior distribution of the model space or a large number of samples from the model space, we derive a computationally efficient method to construct such a subset by minimizing the error of classifying a classifier
A multimodal deep learning framework using local feature representations for face recognition
The most recent face recognition systems are
mainly dependent on feature representations obtained using
either local handcrafted descriptors, such as local binary patterns
(LBP), or a deep learning approach, such as a deep
belief network (DBN). However, the former usually suffers
from the wide variations in face images, while the latter
usually discards the local facial features, which are proven
to be important for face recognition. In this paper, a novel
framework based on merging the advantages of the local
handcrafted feature descriptors with the DBN is proposed to
address the face recognition problem in unconstrained conditions.
Firstly, a novel multimodal local feature extraction
approach based on merging the advantages of the Curvelet
transform with Fractal dimension is proposed and termed
the Curvelet–Fractal approach. The main motivation of this
approach is that the Curvelet transform, a new anisotropic and
multidirectional transform, can efficiently represent the main
structure of the face (e.g., edges and curves), while the Fractal
dimension is one of the most powerful texture descriptors
for face images. Secondly, a novel framework is proposed,
termed the multimodal deep face recognition (MDFR) framework,
to add feature representations by training a DBN on top
of the local feature representations instead of the pixel intensity
representations. We demonstrate that representations acquired by the proposed MDFR framework are complementary
to those acquired by the Curvelet–Fractal approach.
Finally, the performance of the proposed approaches has
been evaluated by conducting a number of extensive experiments
on four large-scale face datasets: the SDUMLA-HMT,
FERET, CAS-PEAL-R1, and LFW databases. The results
obtained from the proposed approaches outperform other
state-of-the-art approaches (e.g., LBP, DBN, WPCA) by
achieving new state-of-the-art results on all the employed
datasets.
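To make the local binary pattern (LBP) descriptor mentioned in the abstract concrete, here is a minimal sketch of the basic 3 × 3 operator and its histogram. This illustrates one of the handcrafted baselines the abstract cites, not the Curvelet–Fractal or MDFR methods. The bit ordering (clockwise from the top-left neighbour) and the function names are assumptions chosen for illustration.

```python
# Neighbour offsets (row, col), clockwise from the top-left pixel (assumed order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """8-bit LBP code: a bit is set when the neighbour is >= the centre pixel."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# Hypothetical 3 x 3 grey-level patch; the centre pixel has value 6.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch, 1, 1))  # bits 0, 4, 5, 6, 7 are set, giving 241
```

In a typical LBP pipeline, histograms like this are computed per image region and concatenated into a feature vector. The abstract's point is that local features of this kind can be fed to a DBN instead of raw pixel intensities.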