    Big data and the SP theory of intelligence

    This article is about how the "SP theory of intelligence" and its realisation in the "SP machine" may, with advantage, be applied to the management and analysis of big data. The SP system -- introduced in the article and fully described elsewhere -- may help to overcome the problem of variety in big data: it has potential as "a universal framework for the representation and processing of diverse kinds of knowledge" (UFK), helping to reduce the diversity of formalisms and formats for knowledge and the different ways in which they are processed. It has strengths in the unsupervised learning or discovery of structure in data, in pattern recognition, in the parsing and production of natural language, in several kinds of reasoning, and more. It lends itself to the analysis of streaming data, helping to overcome the problem of velocity in big data. Central in the workings of the system is lossless compression of information: making big data smaller and reducing problems of storage and management. There is potential for substantial economies in the transmission of data, for big cuts in the use of energy in computing, for faster processing, and for smaller and lighter computers. The system provides a handle on the problem of veracity in big data, with potential to assist in the management of errors and uncertainties in data. It lends itself to the visualisation of knowledge structures and inferential processes. A high-parallel, open-source version of the SP machine would provide a means for researchers everywhere to explore what can be done with the system and to create new versions of it. Comment: Accepted for publication in IEEE Access.
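
    The role that lossless compression plays in shrinking big data can be illustrated with an ordinary off-the-shelf codec. The sketch below uses Python's zlib on a hypothetical, highly redundant log sample, purely to show that removing redundancy reduces storage while keeping the data fully recoverable; it is not an implementation of the SP machine's own compression mechanism.

        # Illustrative only: generic lossless compression of a redundant data sample,
        # not the SP machine's approach described in the article.
        import zlib

        # Hypothetical "big data" fragment with heavy redundancy (repeated log records).
        records = b"sensor_id=42,status=OK,reading=3.14\n" * 10_000

        compressed = zlib.compress(records, 9)   # lossless: the original is fully recoverable
        restored = zlib.decompress(compressed)

        assert restored == records
        print(f"original: {len(records)} bytes, compressed: {len(compressed)} bytes")
        print(f"ratio: {len(records) / len(compressed):.1f}x smaller")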

    Big data analytics and application for logistics and supply chain management

    This special issue explores big data analytics and applications for logistics and supply chain management by examining novel methods, practices, and opportunities. The articles present and analyse a variety of opportunities to improve big data analytics and applications for logistics and supply chain management, such as technology-driven tracking strategies, the relationship between financial performance and data-driven supply chains, and implementation issues and supply chain capability maturity with big data. This editorial note summarizes the discussion of big data attributes, effective practices for implementation, and evaluation and implementation methods.

    Big Data Management Challenges, Approaches, Tools and their limitations

    Big Data is the buzzword everyone talks about. Independently of the application domain, today there is a consensus about the V's characterizing Big Data: Volume, Variety, and Velocity. By focusing on data management issues and past experiences in the area of database systems, this chapter examines the main challenges involved in the three V's of Big Data. It then reviews the main characteristics of existing solutions for addressing each of the V's (e.g., NoSQL, parallel RDBMS, stream data management systems and complex event processing systems). Finally, it provides a classification of the different functions offered by NewSQL systems and discusses their benefits and limitations for processing Big Data.
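
    To make the velocity challenge concrete, here is a minimal, single-process sketch of the kind of windowed aggregation that the stream data management and complex event processing systems surveyed in the chapter perform at scale. The event schema and window size are hypothetical assumptions; real systems add distribution, out-of-order handling and fault tolerance.

        # A tumbling-window average over a stream of (timestamp, key, value) events.
        from collections import defaultdict
        from typing import Iterable, Tuple

        Event = Tuple[float, str, float]  # (timestamp_seconds, key, value) -- hypothetical schema

        def tumbling_window_avg(events: Iterable[Event], window_s: float = 60.0):
            """Yield (window_start, key, average_value) once each window closes."""
            current_start = None
            sums, counts = defaultdict(float), defaultdict(int)
            for ts, key, value in events:
                start = ts - (ts % window_s)
                if current_start is None:
                    current_start = start
                if start != current_start:            # window closed: emit its aggregates
                    for k in sums:
                        yield current_start, k, sums[k] / counts[k]
                    sums.clear(); counts.clear()
                    current_start = start
                sums[key] += value
                counts[key] += 1
            for k in sums:                            # flush the final window
                yield current_start, k, sums[k] / counts[k]

        # Usage with a toy in-memory "stream" (assumed to arrive in timestamp order):
        stream = [(0.0, "cpu", 10.0), (30.0, "cpu", 20.0), (65.0, "cpu", 40.0)]
        for row in tumbling_window_avg(stream, window_s=60.0):
            print(row)   # (0.0, 'cpu', 15.0) then (60.0, 'cpu', 40.0)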

    Solar-Wind Energy Assessment by Big Data Analysis

    Big data refers to the massive datasets collected from a variety of data sources for business needs, to reveal new insights for optimized decision-making. Solar and wind energy systems represent the modernization of electrical energy generation, owing to their pollution-free nature and the continuous advancement of photovoltaic and wind turbine technologies. In solar and wind energy settings, big-data-based decision-making and control are applied mainly in three aspects: data stream side management, storage side management and load side management. The objective of this research is to present a technological framework for managing the large volume, variety, and velocity of solar- and wind-related information through big data tools such as Hadoop, in order to support the assessment of solar and wind energy systems. The framework covers system modeling, storage, management, monitoring and forecasting based on large amounts of global and diffuse solar radiation and wind energy data. The chapter also covers the market basket model, the concept of a solar and wind depository, and the application of the MapReduce algorithm.
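
    As a sketch of the MapReduce pattern mentioned above, the snippet below maps hypothetical radiation readings to (station, value) pairs, groups them by key, and reduces each group to a per-station mean, all on a single machine. The record layout and field names are assumptions for illustration; in the chapter's framework the same map and reduce steps would run on Hadoop rather than in one process.

        from itertools import groupby
        from operator import itemgetter

        def map_phase(record):
            """record: 'station_id,timestamp,global_radiation_wm2' (hypothetical CSV layout)."""
            station, _ts, radiation = record.split(",")
            yield station, float(radiation)

        def reduce_phase(station, values):
            values = list(values)
            return station, sum(values) / len(values)   # mean radiation per station

        raw = [
            "S1,2024-06-01T12:00,810.5",
            "S2,2024-06-01T12:00,640.0",
            "S1,2024-06-01T13:00,795.3",
        ]

        # map, then "shuffle" (sort and group by key), then reduce
        mapped = [pair for rec in raw for pair in map_phase(rec)]
        mapped.sort(key=itemgetter(0))
        results = [reduce_phase(k, (v for _, v in grp))
                   for k, grp in groupby(mapped, key=itemgetter(0))]
        print(results)   # approximately [('S1', 802.9), ('S2', 640.0)]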

    Influence of Big Data in managing cyber assets

    Purpose: Today, Big Data plays an imperative role in the creation, maintenance and loss of the cyber assets of organisations. Research connecting Big Data and cyber asset management is embryonic. Using evidence, the purpose of this paper is to argue that asset management in the context of Big Data is punctuated by a variety of vulnerabilities that can only be estimated when characteristics of such assets, such as being intangible, are adequately accounted for. Design/methodology/approach: Evidence for the study has been drawn from interviews with leaders of digital transformation projects in three organisations within the insurance, natural gas and oil, and manufacturing industries. Findings: By examining the extant literature, the authors traced the type of influence that Big Data has over asset management within organisations. In a context defined by the variability and volume of data, it is unlikely that organisations will go back to restricting data flows. The focus for asset-managing organisations now is to improve semantic processors to deal with the vast array of data in variable formats. Research limitations/implications: The evidence for the study is based on interviews as well as desk research. The use of real-time data, along with quantitative analysis, could lead to insights that have hitherto eluded the research community. Originality/value: There is a serious dearth of research on innovative leadership in dealing with a threatened asset management space. Interpreting creative initiatives to deal with a variety of risks to data assets has clear value for a variety of audiences.

    Tutorial: Big Data Analytics: Concepts, Technologies, and Applications

    We have entered the big data era. Organizations are capturing, storing, and analyzing data that has high volume, velocity, and variety and that comes from a variety of new sources, including social media, machines, log files, video, text, image, RFID, and GPS. These sources have strained the capabilities of traditional relational database management systems and spawned a host of new technologies, approaches, and platforms. The potential value of big data analytics is great and is clearly established by a growing number of studies. The keys to success with big data analytics include a clear business need, strong committed sponsorship, alignment between the business and IT strategies, a fact-based decision-making culture, a strong data infrastructure, the right analytical tools, and people skilled in the use of analytics. Because of the paradigm shift in the kinds of data being analyzed and how this data is used, big data can be considered a new, fourth generation of decision support data management. Though the business value from big data is great, especially for online companies like Google and Facebook, the way it is being used is raising significant privacy concerns.

    Km4City Ontology Building vs Data Harvesting and Cleaning for Smart-city Services

    Presently, a very large number of public and private data sets are available from local governments. In most cases they are not semantically interoperable, and a huge human effort would be needed to create integrated ontologies and a knowledge base for a smart city. Smart-city ontologies are not yet standardized, and considerable research work is needed to identify models that can easily support data reconciliation and the management of complexity, and that allow reasoning over the data. In this paper, a system is proposed for the ingestion and reconciliation of data on smart-city-related aspects such as the road graph, services available on the roads, and traffic sensors. The system manages a large volume of data coming from a variety of sources, considering both static and dynamic data. These data are mapped to a smart-city ontology, called Km4City (Knowledge Model for City), and stored into an RDF store, where they are available to applications via SPARQL queries in order to provide new services to users through specific applications of public administrations and enterprises. The paper presents the process adopted to produce the ontology, the big data architecture for feeding the knowledge base from open and private data, and the mechanisms adopted for data verification, reconciliation and validation. Some examples of the possible usage of the resulting coherent big data knowledge base, accessible from the RDF store and related services, are also offered. The article also presents the work performed on reconciliation algorithms and their comparative assessment and selection.
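
    Since the knowledge base is exposed through SPARQL, a client application would typically issue queries against the RDF store. The sketch below, using the SPARQLWrapper Python library, shows the general shape of such a call; the endpoint URL, prefix, class and property names are placeholders, not the actual Km4City vocabulary or service address.

        # Hedged sketch: querying an RDF store holding a smart-city knowledge base.
        from SPARQLWrapper import SPARQLWrapper, JSON

        endpoint = "https://example.org/km4city/sparql"   # hypothetical SPARQL endpoint
        sparql = SPARQLWrapper(endpoint)
        sparql.setReturnFormat(JSON)
        sparql.setQuery("""
            PREFIX km4c: <http://example.org/km4city/schema#>   # placeholder prefix
            SELECT ?service ?name
            WHERE {
                ?service a km4c:Service ;        # placeholder class name
                         km4c:name ?name .       # placeholder property name
            }
            LIMIT 10
        """)

        # Each binding maps variable names to {"type": ..., "value": ...} dictionaries.
        for row in sparql.query().convert()["results"]["bindings"]:
            print(row["service"]["value"], row["name"]["value"])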

    A pilot “big data” education module curriculum for engineering graduate education: Development and implementation

    Projects in engineering higher education increasingly produce data with the volume, variety, velocity, and need for veracity such that the output of the research is considered “Big Data”. While engineering faculty members conceive of and direct the research producing these data, there may be gaps in their knowledge when it comes to training graduate and undergraduate research assistants in the practical management of Big Data. The project described in this research paper details the development of a Big Data education module for graduate researchers in Electrical and Computer Engineering. The project has the following objectives: to document and describe current data management practices within a specific research group; to identify gaps in knowledge that need to be addressed for research assistants to successfully manage Big Data; and to create curricular interventions to address these gaps. This paper details the motivation, relevant literature, research methodology, curricular intervention, and pilot presentation of the Big Data module. Results indicate that the fundamental concepts governing the management of Big Data have been covered only cursorily in previous coursework and that students need a comprehensive introduction to the topic, contextualized to the work that they are performing in the research or classroom environment.
    • …