Big Data and the Internet of Things
Advances in sensing and computing capabilities are making it possible to embed increasing computing power in small devices. This has enabled sensing devices not just to passively capture data at very high resolution but also to take sophisticated actions in response. Combined with advances in communication, this is resulting in an ecosystem of highly interconnected devices referred to as the Internet of Things (IoT). In conjunction, advances in machine learning have allowed models to be built on these ever-increasing amounts of data. Consequently, devices all the way from heavy assets such as aircraft engines to wearables such as health monitors can now not only generate massive amounts of data but also draw on aggregate analytics to "improve" their performance over time. Big data analytics has been identified as a key enabler for the IoT. In this chapter, we discuss various avenues of the IoT where big data analytics either is already making a significant impact or is on the cusp of doing so. We also discuss social implications and areas of concern.
Comment: 33 pages. Draft of upcoming book chapter in Japkowicz and Stefanowski (eds.) Big Data Analysis: New Algorithms for a New Society, Springer Series on Studies in Big Data, to appear.
A Domain Specific Language for Digital Forensics and Incident Response Analysis
One of the longstanding conceptual problems in digital forensics is the dichotomy between the need for verifiable and reproducible forensic investigations, and the lack of practical mechanisms to accomplish them. With nearly four decades of professional digital forensic practice, investigator notes are still the primary source of reproducibility information, and much of it is tied to the functions of specific, often proprietary, tools.
The lack of a formal means of specification for digital forensic operations results in three major problems. Specifically, there is a critical lack of:
a) standardized and automated means to scientifically verify the accuracy of digital forensic tools;
b) methods to reliably reproduce forensic computations (and their results); and
c) a framework for interoperability among forensic tools.
Additionally, there is no standardized means for communicating software requirements among users, researchers, and developers, resulting in a mismatch of expectations. Combined with the exponential growth in data volume and in the complexity of the applications and systems to be investigated, these concerns result in major case backlogs and inherently reduce the reliability of digital forensic analyses.
This work proposes a new approach to the specification of forensic computations, such that the above concerns can be addressed on a scientific basis with a new domain specific language (DSL) called nugget. DSLs are specialized languages that aim to address the concerns of particular domains by providing practical abstractions. Successful DSLs, such as SQL, can transform an application domain by providing a standardized way for users to communicate what they need without specifying how the computation should be performed.
This is the first effort to build a DSL for (digital) forensic computations with the following research goals:
1) provide an intuitive formal specification language that covers core types of forensic computations and common data types;
2) provide a mechanism to extend the language that can incorporate arbitrary computations;
3) provide a prototype execution environment that allows the fully automatic execution of the computation;
4) provide a complete, formal, and auditable log of computations that can be used to reproduce an investigation;
5) demonstrate cloud-ready processing that can match the growth in data volumes and complexity.
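The goals above can be illustrated with a minimal Python sketch. This is not nugget's actual syntax or API; every name here is invented for illustration. The point is the shape of the idea: each forensic operation is specified declaratively (what to compute, not how), and every execution step appends a hash-anchored record to an auditable log that can later be replayed to verify the investigation.

```python
import hashlib
import json

def run_step(name, func, data, log):
    """Run one forensic operation and append a reproducibility record.

    The record ties the step name to a SHA-256 digest of the input,
    so a later replay can verify it operated on the same evidence.
    """
    result = func(data)
    log.append({
        "step": name,
        "input_sha256": hashlib.sha256(data).hexdigest(),
        "output": result,
    })
    return result

evidence = b"example disk image bytes"  # stand-in for a real evidence stream
audit_log = []

# Declarative-style step: the spec names the computation ("hash.sha256");
# the execution environment decides how to carry it out.
digest = run_step("hash.sha256",
                  lambda d: hashlib.sha256(d).hexdigest(),
                  evidence, audit_log)

# The log is a complete, formal, machine-readable trace of the computation.
print(json.dumps(audit_log, indent=2))
```

A real DSL would add typed forensic data (disk images, file systems, network captures), an extension mechanism for arbitrary computations, and distributed execution; this sketch only shows the specification-plus-audit-log core.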
Blending big data analytics: review on challenges and a recent study
With the collection of massive amounts of data every day, big data analytics has emerged as an important trend for many organizations. These collected data can contain important information that may be key to solving wide-ranging problems, such as cyber security, marketing, healthcare, and fraud. To analyze their large volumes of data for business analyses and decisions, large companies, such as Facebook and Google, adopt analytics. Such analyses and decisions impact existing and future technology. In this paper, we explore how big data analytics is utilized as a technique for solving problems of complex and unstructured data using technologies such as Hadoop, Spark, and MapReduce. We also discuss the data challenges introduced by big data according to the literature, including its six V's. Moreover, we investigate case studies of big data analytics covering various analytic techniques, namely text, voice, video, and network analytics. We conclude that big data analytics can bring positive changes in many fields, such as education, military, healthcare, politics, business, agriculture, banking, and marketing, in the future. © 2013 IEEE
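The MapReduce model the abstract mentions can be sketched in a few lines of plain Python (no Hadoop or Spark): a map phase emits (key, 1) pairs from each record, and a reduce phase aggregates by key. This toy word count is only an illustration of the programming model, not of any particular framework's API.

```python
from collections import defaultdict
from itertools import chain

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every record."""
    return chain.from_iterable(((w, 1) for w in r.split()) for r in records)

def reduce_phase(pairs):
    """Reduce: sum the counts for each key (word)."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

records = ["big data analytics", "big data challenges"]
word_counts = reduce_phase(map_phase(records))
print(word_counts)  # {'big': 2, 'data': 2, 'analytics': 1, 'challenges': 1}
```

In a real cluster the map outputs would be partitioned by key and shuffled across machines before reduction; that distribution step is exactly what Hadoop and Spark automate.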
Design and evaluation of a cloud native data analysis pipeline for cyber physical production systems
Since the birth of the World Wide Web in 1991, the rate of data growth has been increasing, reaching record levels in the last couple of years. Big companies tackled this data growth with expensive and enormous data centres built to process this data and extract value from it. With social media, the Internet of Things (IoT), new business processes, monitoring, and multimedia, the capacities of those data centres started to become a problem and required continuous and expensive expansion. Thus, Big Data was something that only a few were able to access. This changed fast when Amazon launched Amazon Web Services (AWS) around 15 years ago, giving rise to the public cloud. At that time the capabilities were still very new and limited, but 10 years later the cloud was a whole new business that changed the Big Data landscape forever. This not only commoditised computing power but was also accompanied by a pricing model that gave medium and small players the possibility of accessing it. In consequence, new problems arose regarding the nature of these distributed systems and the software architectures required for proper data processing. The present work analyses typical Big Data workloads and proposes an architecture for a cloud-native data analysis pipeline. Lastly, it provides a chapter on tools and services that can be used in the architecture, taking advantage of their open source nature and cloud pricing models.
Fil: Ferrer Daub, Facundo Javier. Universidad Católica de Córdoba. Instituto de Ciencias de la Administración; Argentina
Big Data Model Simulation on a Graph Database for Surveillance in Wireless Multimedia Sensor Networks
Sensors are present in various forms all around the world, such as mobile phones, surveillance cameras, smart televisions, intelligent refrigerators, and blood pressure monitors. Usually, most sensors are part of some larger system of similar sensors that compose a network. One such network is composed of millions of sensors connected to the Internet, which is called the Internet of Things (IoT). With the advances in wireless communication technologies, multimedia sensors and their networks are expected to be major components of the IoT. Many studies have already been done on wireless multimedia sensor networks in diverse domains like fire detection, city surveillance, early warning systems, etc. All those applications position sensor nodes and collect their data over a long time period with real-time data flow, which is considered big data. Big data may be structured or unstructured and needs to be stored for further processing and analysis. Analyzing multimedia big data is a challenging task, requiring high-level modeling to efficiently extract valuable information and knowledge from the data. In this study, we propose a big database model based on a graph database model for handling data generated by wireless multimedia sensor networks. We introduce a simulator to generate synthetic data, and we store and query the big data using the graph model as a big database. For this purpose, we evaluate the well-known graph-based NoSQL databases Neo4j and OrientDB, and a relational database, MySQL. We have run a number of query experiments on our implemented simulator to show which database systems are efficient and scalable for surveillance in wireless multimedia sensor networks.
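The graph-model idea can be sketched in plain Python, with no database engine: sensors and readings become nodes, and relationships become labeled edges, so a surveillance query is a traversal rather than a table join. The schema below (node types, the "OBSERVED" relationship) is a hypothetical illustration, not the paper's actual model.

```python
# Hypothetical graph model of a multimedia sensor network:
# nodes carry properties; edges are (source, relationship, target) triples.
graph = {
    "nodes": {
        "cam1": {"type": "camera", "area": "gate"},
        "cam2": {"type": "camera", "area": "lobby"},
        "r1":   {"type": "reading", "event": "motion"},
    },
    "edges": [
        ("cam1", "OBSERVED", "r1"),  # camera cam1 produced reading r1
    ],
}

def readings_of(graph, sensor):
    """Traverse OBSERVED edges from a sensor node to its readings."""
    return [dst for src, rel, dst in graph["edges"]
            if src == sensor and rel == "OBSERVED"]

print(readings_of(graph, "cam1"))  # ['r1']
print(readings_of(graph, "cam2"))  # []
```

In Neo4j or OrientDB this traversal would be a native graph query (e.g. a Cypher pattern match in Neo4j), whereas the equivalent MySQL query would join a sensors table against a readings table; that difference is what the paper's query experiments compare.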