Challenges for MapReduce in Big Data
In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting the continuously increasing demands on computing resources imposed by massive data sets. The reason for this is the high scalability of the MapReduce paradigm, which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data with the objective of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research in this field. The identified challenges are grouped into four main categories corresponding to Big Data task types: data storage (relational databases and NoSQL stores), Big Data analytics (machine learning and interactive analytics), online processing, and security and privacy. Moreover, current efforts aimed at improving and extending MapReduce to address the identified challenges are presented. Consequently, by identifying the issues and challenges MapReduce faces when handling Big Data, this study encourages future Big Data research.
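The map/reduce paradigm described above can be sketched in a few lines of plain Python: a map phase emits key–value pairs from each input split, and a reduce phase aggregates values per key. This is only a single-process illustration of the programming model, not a distributed implementation; the split texts are invented for the example.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input split.
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    # Shuffle: group values by key; Reduce: sum the counts per word.
    grouped = defaultdict(int)
    for word, count in pairs:
        grouped[word] += count
    return dict(grouped)

# In a real MapReduce cluster, each split is mapped on a separate node
# and the shuffle moves pairs between machines; here both phases run locally.
splits = ["big data needs scale", "big data needs tools"]
pairs = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(pairs)
```

Because each map call touches only its own split, the map phase parallelizes trivially over many nodes, which is the scalability property the abstract refers to.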
DATA_SPHERE
This paper presents a comprehensive overview of Database Management Systems (DBMS) and their significance in modern information management. DBMS technology plays a crucial role in the storage, organisation, retrieval, and manipulation of vast amounts of data in various domains, ranging from business operations to scientific research. This abstract highlights the key aspects covered in the paper, including the evolution of DBMS, its architectural components, and the challenges and advancements in the field.
The paper begins by discussing the historical development of DBMS, tracing its origins from file-based systems to the emergence of relational databases and the subsequent rise of object-oriented and NoSQL databases. We explore the motivations behind these advancements and their impact on data management.
Next, we delve into the fundamental architectural components of a DBMS. We examine the storage layer, which encompasses data structures and access methods, and discuss different indexing techniques for efficient data retrieval. The query processing and optimization module is explored, focusing on query execution plans and cost-based optimization strategies. Additionally, we analyse the transaction management component, highlighting concepts such as ACID properties, concurrency control, and recovery mechanisms.
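The transaction-management concepts mentioned above (ACID properties, rollback on failure) can be sketched with Python's built-in sqlite3 module. The table and transfer amounts are invented for illustration; the point is that both legs of the transfer commit or roll back together, which is the atomicity guarantee.

```python
import sqlite3

# In-memory database with two illustrative accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

try:
    with conn:  # opens a transaction; sqlite3 rolls back on any exception
        conn.execute("UPDATE accounts SET balance = balance - 70 "
                     "WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 70 "
                     "WHERE name = 'bob'")
        # Simulate a crash after both updates but before commit:
        # the whole transfer must be undone (the "A" in ACID).
        raise RuntimeError("crash before commit")
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
# Both balances are unchanged: the partial transfer never became visible.
```

Durability and concurrency control build on the same transaction boundary: the write-ahead log makes a committed transaction survive crashes, and locking or MVCC isolates concurrent transactions from each other.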
The abstract also highlights the challenges faced by modern DBMS. With the proliferation of big data and the advent of cloud computing, scalability, availability, and performance have become critical concerns. We examine techniques such as parallel and distributed databases, replication, and sharding to address these challenges. Furthermore, we discuss the integration of DBMS with emerging technologies like machine learning and blockchain to leverage their capabilities in data analytics and secure data transactions.
Lastly, the abstract touches upon recent advancements in DBMS, including the rise of graph databases for managing interconnected data, the adoption of in-memory databases for high-performance applications, and the exploration of new database models to handle unstructured and semi-structured data.
In conclusion, this paper provides a comprehensive overview of DBMS, covering its historical evolution, architectural components, challenges, and recent advancements. By understanding the principles and advancements in DBMS, researchers and practitioners can effectively harness the power of data management systems to tackle the complexities of modern data-driven applications.
Unified Data Management and Comprehensive Performance Evaluation for Urban Spatial-Temporal Prediction [Experiment, Analysis & Benchmark]
The field of urban spatial-temporal prediction is advancing rapidly with the
development of deep learning techniques and the availability of large-scale
datasets. However, challenges persist in accessing and utilizing diverse urban
spatial-temporal datasets from different sources and stored in different
formats, as well as determining effective model structures and components with
the proliferation of deep learning models. This work addresses these challenges
and provides three significant contributions. Firstly, we introduce "atomic
files", a unified storage format designed for urban spatial-temporal big data,
and validate its effectiveness on 40 diverse datasets, simplifying data
management. Secondly, we present a comprehensive overview of technological
advances in urban spatial-temporal prediction models, guiding the development
of robust models. Thirdly, we conduct extensive experiments using diverse
models and datasets, establishing a performance leaderboard and identifying
promising research directions. Overall, this work effectively manages urban
spatial-temporal data, guides future efforts, and facilitates the development
of accurate and efficient urban spatial-temporal prediction models. It can
potentially make long-term contributions to urban spatial-temporal data
management and prediction, ultimately leading to improved urban living
standards.

Comment: 14 pages, 3 figures. arXiv admin note: text overlap with
arXiv:2304.1434
BIG DATA MINING TOOLS FOR UNSTRUCTURED DATA: A REVIEW
Big data is a buzzword used for large-scale data that includes structured data, semi-structured data, and unstructured data. The size of big data is so large that it is nearly impossible to collect, process, and store it using traditional database management systems and software techniques. Therefore, big data requires different approaches and tools to analyze data. The process of collecting, storing, and analyzing large amounts of data to find unknown patterns is called big data analytics. The information and patterns found by the analysis process are used by large enterprises and companies to gain deeper knowledge and make better decisions faster, giving them an advantage over the competition. So, better techniques and tools must be developed to analyze and process big data. Big data mining is used to extract useful information from large datasets, most of which are unstructured. Unstructured data has no particular structure and can take any form. Today, high-dimensional data is often stored without a standard structure or schema, which is what gives rise to this problem. This paper gives an overview of big data sources, challenges, scope, and unstructured data mining techniques that can be used for big data.
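A first step in many unstructured-data mining pipelines is turning free text into countable terms. The sketch below tokenizes raw documents, drops a small stopword list, and reports the most frequent terms; the stopword set and sample documents are invented for illustration, and real pipelines would add stemming, TF-IDF weighting, or more advanced models on top.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real systems use much larger ones.
STOPWORDS = {"the", "is", "a", "of", "and", "to", "in"}

def top_terms(documents, k=3):
    # Tokenize free text with a simple regex, lowercase everything,
    # drop stopwords, and count term frequencies across all documents.
    tokens = []
    for doc in documents:
        tokens += [t for t in re.findall(r"[a-z]+", doc.lower())
                   if t not in STOPWORDS]
    return Counter(tokens).most_common(k)

docs = ["Big data mining finds patterns.",
        "Patterns in big data guide decisions."]
terms = top_terms(docs)
```

Even this crude frequency count already surfaces recurring themes across documents, which is the basic idea behind the pattern-finding that the abstract attributes to big data analytics.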
Cassandra File System Over Hadoop Distributed File System
Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple data centers, with asynchronous masterless replication that allows low-latency operations for all clients. NoSQL data stores target unstructured data, which is dynamic in nature and a key focus area for "Big Data" research. New-generation data can prove costly and impractical to administer with SQL databases, due to its lack of structure, high scalability, and elasticity needs. NoSQL data stores such as MongoDB and Cassandra provide a desirable platform for fast and efficient data queries. The Hadoop Distributed File System is one of many different components and projects contained within the community Hadoop ecosystem. The Apache Hadoop project defines Hadoop-DFS as "the primary storage system which is used by Hadoop applications" that enables "reliable, extremely rapid computations". This paper provides a high-level overview of how Hadoop-styled analytics (MapReduce, Pig, Mahout and Hive) can be run on data contained in Apache Cassandra without the need for Hadoop-DFS.
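The masterless design described above rests on consistent-hash partitioning: every node owns a range of a hash ring, and a row key's replicas are found by walking the ring, so no coordinator node is a single point of failure. The toy sketch below illustrates that idea only; real Cassandra uses virtual nodes, pluggable partitioners, and configurable replication strategies, and the node names here are invented.

```python
import hashlib
from bisect import bisect_right

class HashRing:
    """Toy consistent-hash ring, sketching how a masterless store in the
    style of Cassandra maps row keys to replica nodes (no vnodes)."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        # Place each node on the ring at the hash of its name.
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def replicas_for(self, key):
        # Walk clockwise from the key's position on the ring,
        # collecting the required number of distinct replica nodes.
        idx = bisect_right(self.ring, (self._hash(key), ""))
        owners = []
        i = idx
        while len(owners) < self.replicas:
            node = self.ring[i % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners

ring = HashRing(["node-a", "node-b", "node-c"])
owners = ring.replicas_for("row-42")
```

Because any node can compute the same ring placement locally, any node can coordinate a read or write, which is what makes low-latency, masterless operation possible.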
Security and Privacy Issues of Big Data
This chapter revisits the most important aspects of how computing
infrastructures should be configured and intelligently managed to fulfill the
most notable security requirements of Big Data applications. One of them is
privacy. It is a pertinent aspect to address because users share more and
more personal data and content through their devices and computers with social
networks and public clouds, so a secure framework for social networks is a
very hot research topic. This last topic is addressed in one of the two
sections of the current chapter with case studies. In addition, traditional
security mechanisms such as firewalls and demilitarized zones are not suitable
for computing systems that support Big Data. SDN is an emergent management
solution that could become a convenient mechanism to implement security in Big
Data systems, as we show through a second case study at the end of the
chapter. The chapter also discusses current relevant work and identifies open
issues.

Comment: In book Handbook of Research on Trends and Future Directions in Big
Data and Web Intelligence, IGI Global, 201
Context Aware Middleware Architectures: Survey and Challenges
Abstract: Context aware applications, which can adapt their behaviors to changing environments, are attracting more and more attention. To simplify the complexity of
developing applications, context aware middleware, which introduces context awareness into the traditional middleware, is highlighted to provide a homogeneous interface involving generic context management solutions. This paper provides a survey of state-of-the-art context aware middleware architectures proposed during the period from 2009 through 2015. First, a preliminary background, such as the principles of context, context awareness,
context modelling, and context reasoning, is provided for a comprehensive understanding of context aware middleware. On this basis, an overview of eleven carefully selected
middleware architectures is presented and their main features explained. Then, thorough comparisons and analysis of the presented middleware architectures are performed based on technical parameters including architectural style, context abstraction, context reasoning, scalability, fault tolerance, interoperability, service discovery, storage, security & privacy, context awareness level, and cloud-based big data analytics. The analysis shows that there is actually no context aware middleware architecture that complies with all requirements. Finally, challenges are pointed out as open issues for future work
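The "homogeneous interface" that context-aware middleware offers can be sketched as a publish/subscribe core: sensors push context updates into the middleware, and applications subscribe to the attributes they care about. The API below is hypothetical, invented for illustration, and does not correspond to any of the surveyed systems.

```python
class ContextMiddleware:
    """Minimal sketch of a context-aware middleware core: applications
    subscribe to context attributes and are notified when the stored
    value actually changes (hypothetical API)."""

    def __init__(self):
        self._context = {}       # current context model: attribute -> value
        self._subscribers = {}   # attribute -> list of callbacks

    def subscribe(self, attribute, callback):
        # Application side: register interest in one context attribute.
        self._subscribers.setdefault(attribute, []).append(callback)

    def update(self, attribute, value):
        # Sensor side: store the new value; notify only on real changes,
        # a simple form of context reasoning (change detection).
        old = self._context.get(attribute)
        self._context[attribute] = value
        if value != old:
            for cb in self._subscribers.get(attribute, []):
                cb(value)

events = []
mw = ContextMiddleware()
mw.subscribe("location", lambda v: events.append(("location", v)))
mw.update("location", "office")
mw.update("location", "office")   # duplicate reading: no notification
mw.update("location", "home")
```

Real middleware layers add the concerns the survey compares: richer context models (ontologies), reasoning engines, discovery, fault tolerance, and security, all behind this kind of uniform interface.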