643 research outputs found
Recommended from our members
An Architecture for Big Data Analytics
Big Data is the new experience curve in the new economy driven by data with high volume, velocity, variety, and veracity. They come from various sources that include the Internet, mobile devices, social media, geospatial devices, sensors, and other machine-generated data. Unlocking the value of Big Data allows business to better sense and respond to the environment, and is becoming a key to creating competitive advantages in a complex and rapidly changing market. Government is also taking notice of the Big Data phenomenon and has created initiatives to exploit Big Data in many areas such as science and engineering, healthcare and national security. Traditional data processing and analysis of structured data using RDBMS and data warehousing no longer satisfy the challenges of Big Data. Technology trends for Big Data embrace open source software, commodity servers, and massively parallel-distributed processing platforms. Analytics is at the core of exploiting values from Big Data to create consumable insights for business and government. This paper presents architecture for Big Data Analytics and explores Big Data technologies that include NoSQL databases, Hadoop Distributed File System and MapReduce
Service Oriented Big Data Management for Transport
International audienceThe increasing power of computer hardware and the sophistication of computer software have brought many new possibilities to information world. On one side the possibility to analyse massive data sets has brought new insight, knowledge and information. On the other, it has enabled to massively distribute computing and has opened to a new programming paradigm called Service Oriented Computing particularly well adapted to cloud computing. Applying these new technologies to the transport industry can bring new understanding to town transport infrastructures. The objective of our work is to manage and aggregate cloud services for managing big data and assist decision making for transport systems. Thus this paper presents our approach to propose a service oriented architecture for big data analytics for transport systems based on the cloud. Proposing big data management strategies for data produced by transport infra‐ structures, whilst maintaining cost effective systems deployed on the cloud, is a promising approach. We present the advancement for developing the Data acquisition service and Information extraction and cleaning service as well as the analysis for choosing a sharding strategy
Scalable Architecture for Integrated Batch and Streaming Analysis of Big Data
Thesis (Ph.D.) - Indiana University, Computer Sciences, 2015As Big Data processing problems evolve, many modern applications demonstrate special characteristics. Data exists in the form of both large historical datasets and high-speed real-time streams, and many analysis pipelines require integrated parallel batch processing and stream processing. Despite the large size of the whole dataset, most analyses focus on specific subsets according to certain criteria. Correspondingly, integrated support for efficient queries and post- query analysis is required.
To address the system-level requirements brought by such characteristics, this dissertation proposes a scalable architecture for integrated queries, batch analysis, and streaming analysis of Big Data in the cloud. We verify its effectiveness using a representative application domain - social media data analysis - and tackle related research challenges emerging from each module of the architecture by integrating and extending multiple state-of-the-art Big Data storage and processing systems.
In the storage layer, we reveal that existing text indexing techniques do not work well for the unique queries of social data, which put constraints on both textual content and social context. To address this issue, we propose a flexible indexing framework over NoSQL databases to support fully customizable index structures, which can embed necessary social context information for efficient queries.
The batch analysis module demonstrates that analysis workflows consist of multiple algorithms with different computation and communication patterns, which are suitable for different processing frameworks. To achieve efficient workflows, we build an integrated analysis stack based on YARN, and make novel use of customized indices in developing sophisticated analysis algorithms.
In the streaming analysis module, the high-dimensional data representation of social media streams poses special challenges to the problem of parallel stream clustering. Due to the sparsity of the high-dimensional data, traditional synchronization method becomes expensive and severely impacts the scalability of the algorithm. Therefore, we design a novel strategy that broadcasts the incremental changes rather than the whole centroids of the clusters to achieve scalable parallel stream clustering algorithms.
Performance tests using real applications show that our solutions for parallel data loading/indexing, queries, analysis tasks, and stream clustering all significantly outperform implementations using current state-of-the-art technologies
Application of HADOOP to Store and Process Big Data Gathered from an Urban Water Distribution System
Information technology has become an integral part of municipal water distribution systems (WDS). Various types of sensors, e.g.,
smart water meters, usually work in real-time mode delivering a huge amount of data. Big data must be stored in appropriate
databases. Along with the development of data mining tools, the analysis of big data is very important for the management of
WDS. Valuation of NoSQL databases for water data is currently in its very early stages. In this paper, the Apache Hadoop platform
is investigated with respect to a possible database solution based on NoSQL. We present comparative experiments evaluating the
performance of the Hadoop and MySQL databases
Big Data in the Cloud: A Survey
Big Data has become a hot topic across several business areas requiring the storage and processing of huge volumes of data. Cloud computing leverages Big Data by providing high storage and processing capabilities and enables corporations to consume resources in a pay-as-you-go model making clouds the optimal environment for storing and processing huge quantities of data. By using virtualized resources, Cloud can scale very easily, be highly available and provide massive storage capacity and processing power. This paper surveys existing databases models to store and process Big Data within a Cloud environment. Particularly, we detail the following traditional NoSQL databases: BigTable, Cassandra, DynamoDB, HBase, Hypertable, and MongoDB. The MapReduce framework and its developments Apache Spark, HaLoop, Twister, and other alternatives such as Apache Giraph, GraphLab, Pregel and MapD - a novel platform that uses GPU processing to accelerate Big Data processing - are also analyzed. Finally, we present two case studies that demonstrate the successful use of Big Data within Cloud environments and the challenges that must be addressed in the future
Semantic Support for Log Analysis of Safety-Critical Embedded Systems
Testing is a relevant activity for the development life-cycle of Safety
Critical Embedded systems. In particular, much effort is spent for analysis and
classification of test logs from SCADA subsystems, especially when failures
occur. The human expertise is needful to understand the reasons of failures,
for tracing back the errors, as well as to understand which requirements are
affected by errors and which ones will be affected by eventual changes in the
system design. Semantic techniques and full text search are used to support
human experts for the analysis and classification of test logs, in order to
speedup and improve the diagnosis phase. Moreover, retrieval of tests and
requirements, which can be related to the current failure, is supported in
order to allow the discovery of available alternatives and solutions for a
better and faster investigation of the problem.Comment: EDCC-2014, BIG4CIP-2014, Embedded systems, testing, semantic
discovery, ontology, big dat
- …