20,042 research outputs found

    Low Power and Scalable Many-Core Architecture for Big-Data Stream Computing

    Get PDF
    In the last years the process of examining large amounts of different types of data, or Big-Data, in an effort to uncover hidden patterns or unknown correlations has become a major need in our society. In this context, stream mining applications are now widely used in several domains such as financial analysis, video annotation, surveillance, medical services, traffic prediction, etc. In order to cope with the Big-Data stream input and its high variability, modern stream mining applications implement systems with heterogeneous classifiers and adapt online to its input data stream characteristics variation. Moreover, unlike existing architectures for video processing and compression applications, where the processing units are reconfigurable in terms of parameters and possibly even functions as the input data is changing, in Big-Data stream mining applications the complete computing pipeline is changing, as entirely new classifiers and processing functions are invoked depending on the input stream. As a result, new approaches of reconfigurable hardware platform architectures are needed to handle Big-Data streams. However, hardware solutions that have been proposed so far for stream mining applications either target high performance computing without any power consideration (i.e., limiting their applicability in small-scale computing infrastructures or current embedded systems), or they are simply dedicated to a specific learning algorithm (i.e., limited to run with a single type of classifiers). Therefore, in this paper we propose a novel low-power manycore architecture for stream mining applications that is able to cope with the dynamic data-driven nature of stream mining applications while consuming limited power. Our exploration indicates that this new proposed architecture is able to adapt to different classifiers complexities thanks to its multiple scalable vector processing units and their re-configurability feature at runtime. Moreover, our platform architecture includes a memory hierarchy optimized for Big-Data streaming and implements modern fine-grained power management techniques over all the different types of cores allowing then minimum energy consumption for each type of executed classifie

    Evolving Large-Scale Data Stream Analytics based on Scalable PANFIS

    Full text link
    Many distributed machine learning frameworks have recently been built to speed up the large-scale data learning process. However, most distributed machine learning used in these frameworks still uses an offline algorithm model which cannot cope with the data stream problems. In fact, large-scale data are mostly generated by the non-stationary data stream where its pattern evolves over time. To address this problem, we propose a novel Evolving Large-scale Data Stream Analytics framework based on a Scalable Parsimonious Network based on Fuzzy Inference System (Scalable PANFIS), where the PANFIS evolving algorithm is distributed over the worker nodes in the cloud to learn large-scale data stream. Scalable PANFIS framework incorporates the active learning (AL) strategy and two model fusion methods. The AL accelerates the distributed learning process to generate an initial evolving large-scale data stream model (initial model), whereas the two model fusion methods aggregate an initial model to generate the final model. The final model represents the update of current large-scale data knowledge which can be used to infer future data. Extensive experiments on this framework are validated by measuring the accuracy and running time of four combinations of Scalable PANFIS and other Spark-based built in algorithms. The results indicate that Scalable PANFIS with AL improves the training time to be almost two times faster than Scalable PANFIS without AL. The results also show both rule merging and the voting mechanisms yield similar accuracy in general among Scalable PANFIS algorithms and they are generally better than Spark-based algorithms. In terms of running time, the Scalable PANFIS training time outperforms all Spark-based algorithms when classifying numerous benchmark datasets.Comment: 20 pages, 5 figure

    Chipmunk: A Systolically Scalable 0.9 mm2{}^2, 3.08 Gop/s/mW @ 1.2 mW Accelerator for Near-Sensor Recurrent Neural Network Inference

    Get PDF
    Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition. On-device computation of RNNs on low-power mobile and wearable devices would be key to applications such as zero-latency voice-based human-machine interfaces. Here we present Chipmunk, a small (<1 mm2{}^2) hardware accelerator for Long-Short Term Memory RNNs in UMC 65 nm technology capable to operate at a measured peak efficiency up to 3.08 Gop/s/mW at 1.24 mW peak power. To implement big RNN models without incurring in huge memory transfer overhead, multiple Chipmunk engines can cooperate to form a single systolic array. In this way, the Chipmunk architecture in a 75 tiles configuration can achieve real-time phoneme extraction on a demanding RNN topology proposed by Graves et al., consuming less than 13 mW of average power
    • …
    corecore