2 research outputs found

    Towards federated learning over large-scale streaming data

    2020 Spring. Includes bibliographical references. Distributed Stream Processing Engines (DSPEs) have seen significant deployment growth along with an increase in streaming data sources such as sensor networks. These DSPEs enable processing large amounts of streaming data in a cluster of commodity machines to extract knowledge and insights in real-time. Due to fluctuating data arrival rates in real-world applications, modern DSPEs often provide auto-scaling. However, the existing designs of advanced analytical frameworks are not effectively aligned with scalable streaming computing environments. We have designed and developed ORCA, a federated learning architecture that supports the training of traditional Artificial Neural Networks as well as Convolutional Neural Network and Long Short-Term Memory network based models while ensuring resiliency during scaling. ORCA also introduces dynamic adjustment of the 'elasticity' hyper-parameter for rescaled computing environments. We estimate this elasticity hyper-parameter using reinforcement learning. Our empirical benchmarks show that ORCA is capable of achieving an MSE of 0.038 over real-world streaming datasets.

    Design and implementation of a component-based distributed system for text mining in social networks

    This report presents the design and implementation of a component-based distributed system for text mining in social networks. The system consists of three main types of components: data collection, data processing, and data visualization. Three candidate designs are explored and implemented: a simple linear architecture, a message feedback architecture, and a Kafka-centric architecture. The final system adopts the Kafka-centric architecture, in which all components are connected through Kafka brokers. In terms of functionality, data collection components are responsible for collecting data from Twitter and producing messages to Kafka brokers. Data processing components contain a series of basic text mining topologies. Data visualization is built on JavaScript libraries, presented on web pages, and allows users to interact with graphs and charts. To improve the scalability and performance of text mining, the project uses the Apache Storm framework to implement the data processing components. In this report, we evaluate the availability of Kafka and Storm, the rates of the data collection components, and the performance of the data processing components. The experimental results demonstrate that our system is available and scalable, and that its component-based structure enables it to be extended easily.
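The Kafka-centric arrangement described in this abstract, where components never call each other directly and only exchange messages through brokers, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `InMemoryBroker` is a hypothetical in-process stand-in for a Kafka broker, the topic names `raw` and `counts` are invented, and the word count stands in for the report's unspecified text mining topologies. It is not the report's implementation and uses neither the Kafka nor the Storm API.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Hypothetical stand-in for a Kafka broker: named topics holding ordered messages."""

    def __init__(self):
        self._topics = defaultdict(deque)

    def produce(self, topic, message):
        """Append a message to the end of a topic."""
        self._topics[topic].append(message)

    def consume(self, topic):
        """Drain and yield every message currently queued on a topic."""
        queue = self._topics[topic]
        while queue:
            yield queue.popleft()


def collect(broker, posts):
    """Data collection: publish raw posts to the (assumed) 'raw' topic."""
    for post in posts:
        broker.produce("raw", post)


def process(broker):
    """Data processing: a basic word count over raw posts, published to 'counts'."""
    counts = defaultdict(int)
    for post in broker.consume("raw"):
        for word in post.lower().split():
            counts[word] += 1
    broker.produce("counts", dict(counts))


def visualize(broker):
    """Data visualization: return the latest counts, most frequent words first."""
    latest = {}
    for snapshot in broker.consume("counts"):
        latest = snapshot
    return sorted(latest.items(), key=lambda kv: (-kv[1], kv[0]))


# Each stage only touches the broker, so stages can be swapped or added independently.
broker = InMemoryBroker()
collect(broker, ["Kafka connects components", "Storm processes Kafka messages"])
process(broker)
top_words = visualize(broker)
```

Because every stage depends only on the broker interface, a new processing component can be added by subscribing it to an existing topic, which is one plausible reading of why the component-based structure is described as easy to extend.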