2,371 research outputs found
Big Data and the Internet of Things
Advances in sensing and computing capabilities are making it possible to
embed increasing computing power in small devices. This has enabled the sensing
devices not just to passively capture data at very high resolution but also to
take sophisticated actions in response. Combined with advances in
communication, this is resulting in an ecosystem of highly interconnected
devices referred to as the Internet of Things - IoT. In conjunction, the
advances in machine learning have allowed building models on this ever
increasing amounts of data. Consequently, devices all the way from heavy assets
such as aircraft engines to wearables such as health monitors can all now not
only generate massive amounts of data but can draw back on aggregate analytics
to "improve" their performance over time. Big data analytics has been
identified as a key enabler for the IoT. In this chapter, we discuss various
avenues of the IoT where big data analytics either is already making a
significant impact or is on the cusp of doing so. We also discuss social
implications and areas of concern.Comment: 33 pages. draft of upcoming book chapter in Japkowicz and Stefanowski
(eds.) Big Data Analysis: New algorithms for a new society, Springer Series
on Studies in Big Data, to appea
Towards Analytics Aware Ontology Based Access to Static and Streaming Data (Extended Version)
Real-time analytics that requires integration and aggregation of
heterogeneous and distributed streaming and static data is a typical task in
many industrial scenarios such as diagnostics of turbines in Siemens. OBDA
approach has a great potential to facilitate such tasks; however, it has a
number of limitations in dealing with analytics that restrict its use in
important industrial applications. Based on our experience with Siemens, we
argue that in order to overcome those limitations OBDA should be extended and
become analytics, source, and cost aware. In this work we propose such an
extension. In particular, we propose an ontology, mapping, and query language
for OBDA, where aggregate and other analytical functions are first class
citizens. Moreover, we develop query optimisation techniques that allow to
efficiently process analytical tasks over static and streaming data. We
implement our approach in a system and evaluate our system with Siemens turbine
data
Warehousing and Analyzing Streaming Data Quality Information
The development of integrative IS architectures focuses typically on solving problems related to the functionality of the system. It is attempted to design optimally flexible interfaces that can achieve the most agile architecture. The quality of the data that will be exchanged across these interfaces is often disregarded (implicitly or explicitly). This results in distributed applications which are functionally correct but cannot be deployed due to the low quality of the data involved. In order to avoid wrong business decisions due to âdirty dataâ, quality characteristics have to be captured, processed, and provided to the respective business task. However, the issue of how to efficiently provide applications with information about data quality is still an open research problem. Our approach tackles the problems posed by data quality deficiencies by presenting a novel concept to stream and warehouse data together with its describing data quality information
Recommended from our members
Big Data in the Oil and Gas Industry: A Promising Courtship
The energy industry remains one of the highest money-producing and investment industries in the world. The United Statesâ own economic stability depends greatly on the stability of oil and gas prices. Various factors affect the amount of money that will continue to be invested in producing oil. A main disadvantage to the oil and gas industry is its lack of technological adaptation. This weakens the industry because the surest measures are not currently being taken to produce oil in optimally efficient, safe, and cost-effective ways. Big data has gained global recognition as an opportunity to gather large volumes of information in real-time and translate data sets into actionable insights. In a low commodity price environment, saving time, reducing costs, and improving safety are crucial outcomes that can be realized using machine learning in oil and gas operations. Big data provides the opportunity to use unsupervised learning. For example, with this approach, engineers can predict oil wellsâ optimal barrels of production given the completion data in a specific area. However, a caveat to utilizing big data in the oil and gas industry is that there simply is neither enough physical data nor data velocity in the industry to be properly referred to as âbig data.â Big data, as it develops, will nonetheless significantly change the energy business in the future, as it already has in various other industries.Petroleum and Geosystems Engineerin
Analyzing data in the Internet of Things
The Internet of Things (IoT) is growing fast. According to the International Data Corporation (IDC), more than 28 billion things will be connected to the Internet by 2020âfrom smartwatches and other wearables to smart cities, smart homes, and smart cars. This OâReilly report dives into the IoT industry through a series of illuminating talks and case studies presented at 2015 Strata + Hadoop World Conferences in San Jose, New York, and Singapore.
Among the topics in this report, youâll explore the use of sensors to generate predictions, using data to create predictive maintenance applications, and modeling the smart and connected city of the future with Kafka and Spark.
Case studies include:
Using Spark Streaming for proactive maintenance and accident prevention in railway equipment
Monitoring subway and expressway traffic in Singapore using telco data
Managing emergency vehicles through situation awareness of traffic and weather in the smart city pilot in Oulu, Finland
Capturing and routing device-based health data to reduce cardiovascular disease
Using data analytics to reduce human space flight risk in NASAâs Orion program
This report concludes with a discussion of ethics related to algorithms that control things in the IoT. Youâll explore decisions related to IoT data, as well as opportunities to influence the moral implications involved in using the IoT
Efficient Data Streaming Analytic Designs for Parallel and Distributed Processing
Today, ubiquitously sensing technologies enable inter-connection of physical\ua0objects, as part of Internet of Things (IoT), and provide massive amounts of\ua0data streams. In such scenarios, the demand for timely analysis has resulted in\ua0a shift of data processing paradigms towards continuous, parallel, and multitier\ua0computing. However, these paradigms are followed by several challenges\ua0especially regarding analysis speed, precision, costs, and deterministic execution.\ua0This thesis studies a number of such challenges to enable efficient continuous\ua0processing of streams of data in a decentralized and timely manner.In the first part of the thesis, we investigate techniques aiming at speeding\ua0up the processing without a loss in precision. The focus is on continuous\ua0machine learning/data mining types of problems, appearing commonly in IoT\ua0applications, and in particular continuous clustering and monitoring, for which\ua0we present novel algorithms; (i) Lisco, a sequential algorithm to cluster data\ua0points collected by LiDAR (a distance sensor that creates a 3D mapping of the\ua0environment), (ii) p-Lisco, the parallel version of Lisco to enhance pipeline- and\ua0data-parallelism of the latter, (iii) pi-Lisco, the parallel and incremental version\ua0to reuse the information and prevent redundant computations, (iv) g-Lisco, a\ua0generalized version of Lisco to cluster any data with spatio-temporal locality\ua0by leveraging the implicit ordering of the data, and (v) Amble, a continuous\ua0monitoring solution in an industrial process.In the second part, we investigate techniques to reduce the analysis costs\ua0in addition to speeding up the processing while also supporting deterministic\ua0execution. The focus is on problems associated with availability and utilization\ua0of computing resources, namely reducing the volumes of data, involving\ua0concurrent computing elements, and adjusting the level of concurrency. For\ua0that, we propose three frameworks; (i) DRIVEN, a framework to continuously\ua0compress the data and enable efficient transmission of the compact data in the\ua0processing pipeline, (ii) STRATUM, a framework to continuously pre-process\ua0the data before transferring the later to upper tiers for further processing, and\ua0(iii) STRETCH, a framework to enable instantaneous elastic reconfigurations\ua0to adjust intra-node resources at runtime while ensuring determinism.The algorithms and frameworks presented in this thesis contribute to an\ua0efficient processing of data streams in an online manner while utilizing available\ua0resources. Using extensive evaluations, we show the efficiency and achievements\ua0of the proposed techniques for IoT representative applications that involve a\ua0wide spectrum of platforms, and illustrate that the performance of our work\ua0exceeds that of state-of-the-art techniques
- âŠ