2,591 research outputs found

    Inferring short-term volatility indicators from Bitcoin blockchain

    In this paper, we study the possibility of inferring early warning indicators (EWIs) for periods of extreme Bitcoin price volatility using features obtained from Bitcoin daily transaction graphs. We infer low-dimensional representations of transaction graphs from the Bitcoin blockchain over the period 2012 to 2017 and demonstrate how these representations can be used to predict extreme price volatility events. Our EWI, which is obtained with a non-negative decomposition, contains more predictive information than indicators obtained with singular value decomposition or the scalar value of the total Bitcoin transaction volume.
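    The abstract gives no implementation details; the following is a minimal, hypothetical sketch of the general idea only: a non-negative daily feature matrix derived from transaction graphs is factorised with NMF and with truncated SVD, and each low-dimensional representation is scored as a predictor of extreme-volatility days. The feature dimensions, synthetic data, and volatility labels are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch: low-dimensional daily indicators from a non-negative
# decomposition vs. SVD, used to predict extreme-volatility days.
# All data here is synthetic; the paper's actual features and labels differ.
import numpy as np
from sklearn.decomposition import NMF, TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_days, n_graph_features = 2000, 50          # e.g. degree/volume statistics per day
X = rng.gamma(shape=2.0, scale=1.0, size=(n_days, n_graph_features))  # non-negative
y = (X[:, :5].sum(axis=1) + rng.normal(0, 1, n_days)) > 12.0          # "extreme volatility" flag

def auc_with(decomposer):
    Z = decomposer.fit_transform(X)          # low-dimensional daily representation
    Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, test_size=0.3, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(Z_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(Z_te)[:, 1])

print("NMF-based EWI AUC:", auc_with(NMF(n_components=5, init="nndsvda", max_iter=500)))
print("SVD-based EWI AUC:", auc_with(TruncatedSVD(n_components=5, random_state=0)))
```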

    Scalable Teacher Forcing Network for Semi-Supervised Large Scale Data Streams

    The large-scale data stream problem refers to high-speed information flows that cannot be processed in a scalable manner under a traditional computing platform. This problem also imposes an expensive labelling cost, making the deployment of fully supervised algorithms infeasible. On the other hand, the problem of semi-supervised large-scale data streams is little explored in the literature, because most works are designed for traditional single-node computing environments and are fully supervised approaches. This paper offers the Weakly Supervised Scalable Teacher Forcing Network (WeScatterNet) to cope with the scarcity of labelled samples and large-scale data streams simultaneously. WeScatterNet is crafted under the distributed computing platform of Apache Spark, with a data-free model fusion strategy for model compression after the parallel computing stage. It features an open network structure to address the global and local drift problems, while integrating a data augmentation, annotation and auto-correction (DA^3) method for handling partially labelled data streams. The performance of WeScatterNet is numerically evaluated on six large-scale data stream problems with only 25% label proportions. It shows highly competitive performance even when compared with fully supervised learners given 100% label proportions. Comment: This paper has been accepted for publication in Information Sciences.
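    WeScatterNet itself is not reproduced here. As a rough illustration of the partially labelled stream setting it targets, the sketch below runs a prequential (test-then-train) loop over mini-batches where only 25% of samples carry labels, fills in confidence-filtered pseudo-labels for the rest, and updates an incremental model across a concept drift. It is a generic loop under assumed thresholds and a standard scikit-learn learner, not the paper's Spark-based architecture or its DA^3 method.

```python
# Generic sketch of learning from a partially labelled stream (~25% labels):
# prequential test-then-train with confidence-filtered pseudo-labels.
# This is NOT WeScatterNet; names and thresholds are illustrative.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
model = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])
accs = []

for t in range(200):                                  # 200 mini-batches of the stream
    X = rng.normal(size=(100, 10))
    drift = 1.0 if t < 100 else -1.0                  # abrupt concept drift halfway
    y = (drift * X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    labelled = rng.random(100) < 0.25                 # only 25% of samples are labelled

    if t > 0:                                         # test-then-train evaluation
        accs.append(accuracy_score(y, model.predict(X)))
        proba = model.predict_proba(X[~labelled])
        confident = proba.max(axis=1) > 0.9           # pseudo-label only confident samples
        if confident.any():
            model.partial_fit(X[~labelled][confident], proba.argmax(axis=1)[confident])
    model.partial_fit(X[labelled], y[labelled], classes=classes)

print("prequential accuracy:", np.mean(accs))
```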

    Evolving Ensemble Fuzzy Classifier

    The concept of ensemble learning offers a promising avenue for learning from data streams in complex environments, because it addresses the bias-variance dilemma better than its single-model counterpart and features a reconfigurable structure well suited to the given context. While various extensions of ensemble learning for mining non-stationary data streams can be found in the literature, most of them are crafted around a static base classifier and revisit preceding samples in a sliding window for a retraining step. This causes computationally prohibitive complexity and is not flexible enough to cope with rapidly changing environments. Their complexities are often demanding because they involve a large collection of offline classifiers, owing to the absence of a structural complexity reduction mechanism and the lack of an online feature selection mechanism. A novel evolving ensemble classifier, namely Parsimonious Ensemble (pENsemble), is proposed in this paper. pENsemble differs from existing architectures in that it is built upon an evolving classifier from data streams, termed Parsimonious Classifier (pClass). pENsemble is equipped with an ensemble pruning mechanism, which estimates a localized generalization error of a base classifier. A dynamic online feature selection scenario is integrated into pENsemble, allowing dynamic selection and deselection of input features on the fly. pENsemble adopts a dynamic ensemble structure to output a final classification decision, and features a novel drift detection scenario to grow the ensemble structure. The efficacy of pENsemble has been demonstrated through rigorous numerical studies with dynamic and evolving data streams, where it delivers the most encouraging performance in attaining a trade-off between accuracy and complexity. Comment: This paper has been published in IEEE Transactions on Fuzzy Systems.
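    As a loose illustration of the grow-and-prune pattern described above (not pClass or pENsemble), the sketch below maintains an ensemble over a stream, adds a member when a crude drift test fires, and prunes the member with the worst recent, windowed error. The base learner, drift test, and thresholds are assumptions for demonstration only.

```python
# Illustrative sketch (not pClass/pENsemble): an evolving ensemble that grows a new
# member when a simple drift test fires and prunes the member with the worst
# recent (localized) error. Thresholds and the base learner are assumptions.
import numpy as np
from collections import deque
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
ensemble, errors = [], []                    # members and their recent-error windows
MAX_MEMBERS, WINDOW = 4, 20

def make_member(X, y):
    return DecisionTreeClassifier(max_depth=3).fit(X, y)

prev_batch_err = None
for t in range(60):                          # stream of batches with a drift at t == 30
    X = rng.normal(size=(200, 5))
    flip = 1.0 if t < 30 else -1.0
    y = (flip * X[:, 0] > 0).astype(int)

    if ensemble:
        votes = np.mean([m.predict(X) for m in ensemble], axis=0)
        y_hat = (votes >= 0.5).astype(int)
        batch_err = np.mean(y_hat != y)
        for m, w in zip(ensemble, errors):   # localized error per member
            w.append(np.mean(m.predict(X) != y))
        # crude drift test: error jumps well above the previous batch
        if prev_batch_err is not None and batch_err > prev_batch_err + 0.2:
            ensemble.append(make_member(X, y)); errors.append(deque(maxlen=WINDOW))
        prev_batch_err = batch_err
    else:
        ensemble.append(make_member(X, y)); errors.append(deque(maxlen=WINDOW))

    if len(ensemble) > MAX_MEMBERS:          # prune worst member by windowed error
        worst = int(np.argmax([np.mean(w) if w else 0.0 for w in errors]))
        ensemble.pop(worst); errors.pop(worst)

print("ensemble size:", len(ensemble))
```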

    Pol-InSAR-Island - A benchmark dataset for multi-frequency Pol-InSAR data land cover classification

    This paper presents Pol-InSAR-Island, the first publicly available multi-frequency Polarimetric Interferometric Synthetic Aperture Radar (Pol-InSAR) dataset labeled with detailed land cover classes, which serves as a challenging benchmark dataset for land cover classification. In recent years, machine learning has become a powerful tool for remote sensing image analysis. While there are numerous large-scale benchmark datasets for training and evaluating machine learning models for the analysis of optical data, the availability of labeled SAR or, more specifically, Pol-InSAR data is very limited. The lack of labeled data for training, as well as for testing and comparing different approaches, hinders the rapid development of machine learning algorithms for Pol-InSAR image analysis. The Pol-InSAR-Island benchmark dataset presented in this paper aims to fill this gap. The dataset consists of Pol-InSAR data acquired in S- and L-band by DLR's airborne F-SAR system over the East Frisian island Baltrum. The interferometric image pairs are the result of a repeat-pass measurement with a time offset of several minutes. The image data are given as 6 × 6 coherency matrices in ground range on a 1 m × 1 m grid. Pixel-accurate class labels, covering 12 different land cover classes, are generated in a semi-automatic process based on an existing biotope-type map and visual interpretation of SAR and optical images. Fixed training and test subsets are defined to ensure the comparability of different approaches trained and tested prospectively on the Pol-InSAR-Island dataset. In addition to the dataset, results of supervised Wishart and Random Forest classifiers that achieve mean Intersection-over-Union scores between 24% and 67% are provided to serve as a baseline for future work. The dataset is provided via KITopen: https://doi.org/10.35097/170
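    A minimal baseline sketch in the spirit of the Random Forest reference result might look as follows: per-pixel features taken from the upper triangle of a 6 × 6 coherency matrix, a Random Forest classifier, and mean Intersection-over-Union as the score. The data generated below is synthetic; the real dataset's file format, loading code, and train/test splits are not described in the abstract and are not reproduced here.

```python
# Baseline sketch for a Pol-InSAR-Island-style setup: per-pixel features from the
# upper triangle of a 6x6 coherency matrix, a Random Forest classifier, and mean
# Intersection-over-Union. The data is synthetic; the real dataset's file format
# and loading code are assumptions outside the abstract.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import jaccard_score

rng = np.random.default_rng(3)
n_train, n_test, n_classes = 5000, 2000, 12
iu = np.triu_indices(6)                                   # 21 independent complex entries

def coherency_features(n):
    T = rng.normal(size=(n, 6, 6)) + 1j * rng.normal(size=(n, 6, 6))
    T = T @ T.conj().transpose(0, 2, 1)                   # Hermitian, positive semi-definite
    upper = T[:, iu[0], iu[1]]
    return np.hstack([upper.real, upper.imag])            # real-valued feature vector

X_train, X_test = coherency_features(n_train), coherency_features(n_test)
y_train = rng.integers(0, n_classes, n_train)
y_test = rng.integers(0, n_classes, n_test)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
miou = jaccard_score(y_test, rf.predict(X_test), average="macro")   # mean IoU over classes
print("mean IoU:", miou)
```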

    On the Intersection of Communication and Machine Learning

    The intersection of communication and machine learning is attracting increasing interest from both communities. On the one hand, the development of modern communication systems brings large amounts of data and high performance requirements, which challenges the classic analytical-derivation-based study philosophy and encourages researchers to explore data-driven methods, such as machine learning, to solve problems of high complexity and large scale. On the other hand, the use of distributed machine learning introduces the communication cost as one of the basic considerations in the design of machine learning algorithms and systems.
    In this thesis, we first explore the application of machine learning to one of the classic problems in wireless networks, resource allocation, for heterogeneous millimeter wave networks in highly dynamic environments. We address the practical concerns by providing an efficient online and distributed framework. In the second part, sampling-based communication-efficient distributed learning algorithms are proposed. We exploit the trade-off between local computation and total communication cost and propose algorithms with good theoretical bounds. In more detail, this thesis makes the following contributions:
    We introduce a reinforcement learning framework to solve resource allocation problems in heterogeneous millimeter wave networks. The large state/action space is decomposed according to the topology of the network and solved by an efficient distributed message passing algorithm. We further speed up inference with an online updating process.
    We propose a distributed coreset-based boosting framework. An efficient coreset construction algorithm is proposed based on the prior knowledge provided by clustering (see the sketch after this abstract). The coreset is then integrated with boosting, yielding an improved convergence rate. We extend the proposed boosting framework to the distributed setting, where the communication cost is reduced by the good approximation quality of the coreset.
    We propose a selective sampling framework to construct a subset of samples that effectively represents the model space. Based on the prior distribution of the model space, or a large number of samples from it, we derive a computationally efficient method to construct such a subset by minimizing the error of classifying a classifier.
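    The thesis's coreset construction is not spelled out in the abstract; as a generic illustration of the clustering-based coreset idea it mentions, the sketch below summarises each k-means cluster by its centre and size and trains on the weighted summary instead of the full data. The learner, cluster counts, and dataset are assumptions, and no boosting or distributed step is shown.

```python
# Hypothetical sketch of a clustering-based coreset: summarise each k-means
# cluster by its centre and size, then train on the weighted summary instead of
# the full data. A generic illustration, not the thesis's coreset construction
# or boosting algorithm.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=20000, n_features=20, random_state=0)

centres, weights, labels = [], [], []
for c in (0, 1):                                   # build a per-class coreset
    Xc = X[y == c]
    km = KMeans(n_clusters=50, n_init=5, random_state=0).fit(Xc)
    centres.append(km.cluster_centers_)
    weights.append(np.bincount(km.labels_, minlength=50))
    labels.append(np.full(50, c))
X_core = np.vstack(centres); w_core = np.hstack(weights); y_core = np.hstack(labels)

full = LogisticRegression(max_iter=1000).fit(X, y)
core = LogisticRegression(max_iter=1000).fit(X_core, y_core, sample_weight=w_core)
print("full-data accuracy :", full.score(X, y))
print("coreset accuracy   :", core.score(X, y))    # 100 weighted points instead of 20000
```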

    A multimodal deep learning framework using local feature representations for face recognition

    The most recent face recognition systems are mainly dependent on feature representations obtained using either local handcrafted descriptors, such as local binary patterns (LBP), or a deep learning approach, such as a deep belief network (DBN). However, the former usually suffers from the wide variations in face images, while the latter usually discards the local facial features, which are proven to be important for face recognition. In this paper, a novel framework based on merging the advantages of local handcrafted feature descriptors with the DBN is proposed to address the face recognition problem in unconstrained conditions. Firstly, a novel multimodal local feature extraction approach based on merging the advantages of the Curvelet transform with the Fractal dimension is proposed and termed the Curvelet–Fractal approach. The main motivation of this approach is that the Curvelet transform, a new anisotropic and multidirectional transform, can efficiently represent the main structure of the face (e.g., edges and curves), while the Fractal dimension is one of the most powerful texture descriptors for face images. Secondly, a novel framework is proposed, termed the multimodal deep face recognition (MDFR) framework, to add feature representations by training a DBN on top of the local feature representations instead of the pixel intensity representations. We demonstrate that representations acquired by the proposed MDFR framework are complementary to those acquired by the Curvelet–Fractal approach. Finally, the performance of the proposed approaches has been evaluated by conducting a number of extensive experiments on four large-scale face datasets: the SDUMLA-HMT, FERET, CAS-PEAL-R1, and LFW databases. The results obtained from the proposed approaches outperform other state-of-the-art approaches (e.g., LBP, DBN, WPCA) by achieving new state-of-the-art results on all the employed datasets.
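    The fractal-dimension component of a Curvelet–Fractal style descriptor can be illustrated with a standard box-counting estimate, sketched below for a single grayscale patch. The Curvelet transform and DBN stages are omitted, and the threshold, scales, and the random patch are assumptions; this is not the paper's pipeline.

```python
# Sketch of one ingredient of a Curvelet-Fractal style descriptor: estimating the
# box-counting fractal dimension of a binarised face patch. The Curvelet transform
# and DBN stages are omitted; this is an illustration, not the paper's pipeline.
import numpy as np

def box_counting_dimension(img, threshold=0.5):
    """Estimate the fractal dimension of a 2-D grayscale patch via box counting."""
    binary = img > threshold * img.max()              # binarise the patch
    size = min(binary.shape)
    scales = [s for s in (2, 4, 8, 16, 32) if s <= size]
    counts = []
    for s in scales:
        h, w = (binary.shape[0] // s) * s, (binary.shape[1] // s) * s
        blocks = binary[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(np.count_nonzero(blocks.any(axis=(1, 3))))  # occupied boxes
    # slope of log(count) vs log(1/scale) gives the dimension estimate
    coeffs = np.polyfit(np.log(1.0 / np.array(scales)), np.log(counts), 1)
    return coeffs[0]

patch = np.random.default_rng(4).random((64, 64))     # stand-in for a face region
print("estimated fractal dimension:", box_counting_dimension(patch))
```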