64,131 research outputs found

    PROTEUS: Scalable Online Machine Learning for Predictive Analytics and Real-Time Interactive Visualization

    ABSTRACT Big data analytics is a critical and unavoidable process in any business and industrial environment. Nowadays, companies that exploit the inner value of big data earn more revenue than those that do not. Once companies have determined their big data strategy, they face another serious problem: designing and building in-house a scalable system that runs their business intelligence is difficult. The PROTEUS project aims to design, develop, and provide an open, ready-to-use big data software architecture that can handle extremely large historical data and data streams and that supports online machine learning predictive analytics and real-time interactive visualization. The overall evaluation of PROTEUS is carried out using a real industrial scenario. PROJECT DESCRIPTION PROTEUS is an EU Horizon 2020 funded research project whose goal is to investigate and develop ready-to-use, scalable online machine learning algorithms and real-time interactive visual analytics, taking care of scalability, usability, and effectiveness. In particular, PROTEUS aims to solve the following big data challenges by surpassing the current state-of-the-art technologies with original contributions: 1. Handling extremely large historical data and data streams 2. Analytics on massive, high-rate, and complex data streams 3. Real-time interactive visual analytics of massive datasets, continuous unbounded streams, and learned models PROTEUS's solutions for the challenges above are: 1) a real-time hybrid processing system built on top of Apache Flink (formerly Stratosphere [1]) with optimized relational algebra and linear algebra operations support through LARA declarative language PROTEUS faces an additional challenge which deals with cor

    A review on big data stream processing applications: contributions, benefits, and limitations

    The amount of data in our world keeps growing rapidly. In the era of big data, efficient processing and analysis of big data using machine learning algorithms is highly required, especially when the data arrive in the form of streams. There is no doubt that big data has become an important source of information and knowledge in the decision-making process. Nevertheless, dealing with this kind of data comes with great difficulties; thus, several techniques have been used to analyze data in the form of streams. Many techniques have been proposed and studied to handle big data and make decisions based on off-line batch analysis. Today, we need to make constructive decisions based on online streaming data analysis. In recent years, many researchers have proposed different kinds of frameworks for processing big data streams. In this work, we explore and present in detail some of the recent achievements in big data streaming in terms of contributions, benefits, and limitations, as well as some recent platforms suitable for big data streaming analytics. Moreover, we highlight several issues that will be faced in big data stream processing. In conclusion, it is hoped that this study will assist researchers in choosing the best and most suitable framework for big data streaming projects.

    Essays on Health Information Technology: Insights from Analyses of Big Datasets

    The current dissertation provides an examination of health information technology (HIT) by analyzing big datasets. It contains two separate essays focused on: (1) the evolving intellectual structure of the healthcare informatics (HI) and healthcare IT (HIT) scholarly communities, and (2) the impact of social support exchange embedded in social interactions on health promotion outcomes associated with online health community use. Overall, this dissertation extends current theories by applying a unique combination of methods (natural language processing, machine learning, social network analysis, structural equation modeling, etc.) to the analyses of primary datasets. The goal of the first study is to obtain a full understanding of the underlying dynamics of the intellectual structures of HI and its sub-discipline HIT. Using multiple statistical methods including citation and co-citation analysis, social network analysis (SNA), and latent semantic analysis (LSA), this essay shows how HIT research has emerged in IS journals and distinguished itself from the larger HI context. The research themes, intellectual leadership, cohesion of these themes and networks of researchers, and journal presence revealed in our longitudinal intellectual structure analyses foretell how, in particular, these HI and HIT fields have evolved to date and also how they could evolve in the future. Our findings identify which research streams are central (versus peripheral) and which are cohesive (as opposed to disparate). Suggestions for vibrant areas of future research emerge from our analysis. The second part of the dissertation focuses on comprehensively understanding the effect of social support exchange in online health communities on individual members’ health promotion outcomes. This study examines the effectiveness of online consumer-to-consumer social support exchange on health promotion outcomes via analyses of big health data. Based on previous research, we propose a conceptual framework which integrates social capital theory and social support theory in the context of online health communities and test it through a quantitative field study and multiple analyses of a big online health community dataset. Specifically, natural language processing and machine learning techniques are utilized to automate content analysis of digital trace data. This research not only extends current theories of social support exchange in online health communities, but also sheds light on the design and management of such communities.

    Crowdbreaks: Tracking Health Trends using Public Social Media Data and Crowdsourcing

    In the past decade, tracking health trends using social media data has shown great promise, due to a powerful combination of massive adoption of social media around the world and increasingly potent hardware and software that enable us to work with these new big data streams. At the same time, many challenging problems have been identified. First, there is often a mismatch between how rapidly online data can change and how rapidly algorithms are updated, which limits the reusability of algorithms trained on past data, as their performance decreases over time. Second, much of the work focuses on specific issues during a specific past period, even though public health institutions would need flexible tools to assess multiple evolving situations in real time. Third, most tools providing such capabilities are proprietary systems with little algorithmic or data transparency, and thus little buy-in from the global public health and research community. Here, we introduce Crowdbreaks, an open platform which allows tracking of health trends by making use of continuous crowdsourced labelling of public social media content. The system is built in a way that automates the typical workflow of data collection, filtering, labelling, and training of machine learning classifiers, and therefore can greatly accelerate the research process in the public health domain. This work introduces the technical aspects of the platform and explores its future use cases.

    Stream Learning in Energy IoT Systems: A Case Study in Combined Cycle Power Plants

    The prediction of electrical power produced in combined cycle power plants is a key challenge in the electrical power and energy systems field. This power production can vary depending on environmental variables, such as temperature, pressure, and humidity. Thus, the business problem is how to predict the power production as a function of these environmental conditions, in order to maximize the profit. The research community has solved this problem by applying Machine Learning techniques, and has managed to reduce the computational and time costs in comparison with the traditional thermodynamical analysis. Until now, this challenge has been tackled from a batch learning perspective, in which data is assumed to be at rest, and where models do not continuously integrate new information into already constructed models. We present an approach closer to the Big Data and Internet of Things paradigms, in which data are continuously arriving and where models learn incrementally, achieving significant enhancements in terms of data processing (time, memory and computational costs), and obtaining competitive performances. This work compares and examines the hourly electrical power prediction of several streaming regressors, and discusses the best technique in terms of processing time and predictive performance to be applied in this streaming scenario. This work has been partially supported by the EU project iDev40. This project has received funding from the ECSEL Joint Undertaking (JU) under grant agreement No 783163. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Austria, Germany, Belgium, Italy, Spain, Romania. It has also been supported by the Basque Government (Spain) through the project VIRTUAL (KK-2018/00096), and by Ministerio de Economía y Competitividad of Spain (Grant Ref. TIN2017-85887-C2-2-P).
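The incremental setting described in this abstract, where a model predicts on each arriving sample and then updates itself, can be illustrated with a minimal sketch. The abstract does not name the regressors or the evaluation protocol it uses, so everything below is an assumption: a plain linear model trained by stochastic gradient descent, a test-then-train (prequential) loop, and a synthetic three-feature stream standing in for the environmental variables. The names `OnlineLinearRegressor`, `prequential_mae`, and `synthetic_stream` are hypothetical.

```python
import random


class OnlineLinearRegressor:
    """Hypothetical linear model updated one sample at a time with SGD."""

    def __init__(self, n_features, lr=0.01):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def learn_one(self, x, y):
        # Gradient step on the squared error of this single sample.
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def prequential_mae(model, stream):
    """Test-then-train: predict each sample first, then learn from it."""
    total, n = 0.0, 0
    for x, y in stream:
        total += abs(model.predict(x) - y)
        model.learn_one(x, y)
        n += 1
    return total / n if n else float("nan")


def synthetic_stream(n, seed=0):
    """Synthetic data (an assumption, not the power plant dataset)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(3)]
        y = 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2] + 3.0 + rng.gauss(0, 0.05)
        yield x, y


model = OnlineLinearRegressor(n_features=3, lr=0.05)
mae = prequential_mae(model, synthetic_stream(5000))
print(f"prequential MAE: {mae:.3f}")
```

The model never stores past samples, which is where the memory savings over batch learning come from: each sample is used once for evaluation and once for the update, then discarded.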

    Probabilistic Graphical Models on Multi-Core CPUs using Java 8

    In this paper, we discuss software design issues related to the development of parallel computational intelligence algorithms on multi-core CPUs, using the new Java 8 functional programming features. In particular, we focus on probabilistic graphical models (PGMs) and present the parallelisation of a collection of algorithms that deal with inference and learning of PGMs from data, namely maximum likelihood estimation, importance sampling, and greedy search for solving combinatorial optimisation problems. Through these concrete examples, we tackle the problem of defining efficient data structures for PGMs and parallel processing of same-size batches of data sets using Java 8 features. We also provide straightforward techniques to code parallel algorithms that seamlessly exploit multi-core processors. The experimental analysis, carried out using our open source AMIDST (Analysis of MassIve Data STreams) Java toolbox, shows the merits of the proposed solutions. Comment: Pre-print version of the paper presented in the special issue on Computational Intelligence Software at IEEE Computational Intelligence Magazine journa
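The idea of parallel maximum likelihood estimation over same-size batches can be sketched as a map-reduce over sufficient statistics. This is not the paper's Java 8 code and does not use the AMIDST API: it is a Python illustration, assuming the simplest possible model (a single categorical variable, whose MLE is the normalised count of each state). The function names `batches`, `batch_counts`, and `parallel_mle` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def batches(data, size):
    """Split data into consecutive batches of `size` (the last may be shorter)."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def batch_counts(batch):
    """Map step: sufficient statistics (state counts) for one batch."""
    return Counter(batch)


def parallel_mle(data, batch_size=4, workers=4):
    """Reduce step: merge per-batch counts, then normalise to probabilities."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for counts in pool.map(batch_counts, batches(data, batch_size)):
            total.update(counts)
    n = sum(total.values())
    return {state: c / n for state, c in total.items()}


samples = ["a", "b", "a", "c", "a", "b", "a", "a"]
estimate = parallel_mle(samples)
print(estimate)
```

Because counts are additive, batches can be processed in any order and merged afterwards, which is what makes the estimate easy to parallelise. In standard CPython, threads mainly illustrate the structure; a process pool would be the usual choice for CPU-bound work.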