
    Challenges for MapReduce in Big Data

    In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting the continuously increasing demands on computing resources imposed by massive data sets. This is due to the high scalability of the MapReduce paradigm, which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data with the objective of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research in this field. The identified challenges are grouped into four main categories corresponding to Big Data task types: data storage (relational databases and NoSQL stores), Big Data analytics (machine learning and interactive analytics), online processing, and security and privacy. Moreover, current efforts aimed at improving and extending MapReduce to address the identified challenges are presented. By identifying the issues and challenges MapReduce faces when handling Big Data, this study encourages future Big Data research.
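
    The MapReduce paradigm this abstract refers to splits a job into a map phase that emits key-value pairs, a shuffle that groups those pairs by key, and a reduce phase that aggregates each group. The following is a minimal single-process Python sketch of that flow using the classic word-count example. It is an illustration, not code from the paper, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical. A real framework such as Hadoop would run many map and reduce tasks in parallel across nodes and perform the shuffle over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every lower-cased, whitespace-separated token.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group intermediate values by key, as the framework does
    # between the map and reduce phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: aggregate all values that share a key.
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # Hypothetical driver; in a cluster each document (split) would be
    # mapped on a different node.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data needs MapReduce", "MapReduce scales big jobs"]
    print(word_count(docs))
```

Because map and reduce are pure functions of their inputs, the framework is free to run them on any node and re-run them after a failure, which is the property that makes the paradigm scale out.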

    DATA_SPHERE

    This paper presents a comprehensive overview of Database Management Systems (DBMS) and their significance in modern information management. DBMS technology plays a crucial role in the storage, organisation, retrieval, and manipulation of vast amounts of data in various domains, ranging from business operations to scientific research. This abstract highlights the key aspects covered in the paper, including the evolution of DBMS, their architectural components, and the challenges and advancements in the field. The paper begins by discussing the historical development of DBMS, tracing their origins from file-based systems to the emergence of relational databases and the subsequent rise of object-oriented and NoSQL databases. We explore the motivations behind these advancements and their impact on data management. Next, we delve into the fundamental architectural components of a DBMS. We examine the storage layer, which encompasses data structures and access methods, and discuss different indexing techniques for efficient data retrieval. The query processing and optimization module is explored, focusing on query execution plans and cost-based optimization strategies. Additionally, we analyse the transaction management component, highlighting concepts such as ACID properties, concurrency control, and recovery mechanisms. The abstract also highlights the challenges faced by modern DBMS. With the proliferation of big data and the advent of cloud computing, scalability, availability, and performance have become critical concerns. We examine techniques such as parallel and distributed databases, replication, and sharding to address these challenges. Furthermore, we discuss the integration of DBMS with emerging technologies like machine learning and blockchain to leverage their capabilities in data analytics and secure data transactions.
Lastly, the abstract touches upon recent advancements in DBMS, including the rise of graph databases for managing interconnected data, the adoption of in-memory databases for high-performance applications, and the exploration of new database models to handle unstructured and semi-structured data. In conclusion, this paper provides a comprehensive overview of DBMS, covering their historical evolution, architectural components, challenges, and recent advancements. By understanding the principles and advancements in DBMS, researchers and practitioners can effectively harness the power of data management systems to tackle the complexities of modern data-driven applications.
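
    Among the scalability techniques the abstract lists, sharding partitions data across several independent stores so that each holds only a subset of the keys. Below is a minimal sketch of hash-based sharding in Python, assuming a fixed shard count and MD5 as a stable hash. ShardedStore and shard_for are hypothetical names, in-memory dicts stand in for real database nodes, and this is an illustration rather than the method described in the paper.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    # Use a stable hash (Python's built-in hash() is salted per process)
    # so that every client maps the same key to the same shard.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Hypothetical key-value store split across in-memory shards."""

    def __init__(self, num_shards: int) -> None:
        self.shards: list[dict[str, object]] = [dict() for _ in range(num_shards)]

    def put(self, key: str, value: object) -> None:
        # Route the write to the single shard that owns this key.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> object:
        # Reads are routed the same way, so only one shard is consulted.
        return self.shards[shard_for(key, len(self.shards))].get(key)


if __name__ == "__main__":
    store = ShardedStore(num_shards=4)
    store.put("user:1", "alice")
    print(store.get("user:1"), shard_for("user:1", 4))
```

With plain modulo placement, changing the number of shards remaps most keys, which is why many distributed stores use consistent hashing instead.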

    Unified Data Management and Comprehensive Performance Evaluation for Urban Spatial-Temporal Prediction [Experiment, Analysis & Benchmark]

    The field of urban spatial-temporal prediction is advancing rapidly with the development of deep learning techniques and the availability of large-scale datasets. However, challenges persist in accessing and utilizing diverse urban spatial-temporal datasets from different sources and stored in different formats, as well as determining effective model structures and components with the proliferation of deep learning models. This work addresses these challenges and provides three significant contributions. Firstly, we introduce "atomic files", a unified storage format designed for urban spatial-temporal big data, and validate its effectiveness on 40 diverse datasets, simplifying data management. Secondly, we present a comprehensive overview of technological advances in urban spatial-temporal prediction models, guiding the development of robust models. Thirdly, we conduct extensive experiments using diverse models and datasets, establishing a performance leaderboard and identifying promising research directions. Overall, this work effectively manages urban spatial-temporal data, guides future efforts, and facilitates the development of accurate and efficient urban spatial-temporal prediction models. It can potentially make long-term contributions to urban spatial-temporal data management and prediction, ultimately leading to improved urban living standards. Comment: 14 pages, 3 figures. arXiv admin note: text overlap with arXiv:2304.1434

    BIG DATA MINING TOOLS FOR UNSTRUCTURED DATA: A REVIEW

    Big data is a buzzword used for very large volumes of data, including structured, semi-structured and unstructured data. Big data is so large that it is nearly impossible to collect, process and store it using traditional database management systems and software techniques. Therefore, big data requires different approaches and tools for analysis. The process of collecting, storing and analyzing large amounts of data to find unknown patterns is called big data analytics. The information and patterns found by this analysis are used by large enterprises and companies to gain deeper knowledge and to make better decisions faster, giving them an advantage over the competition. Better techniques and tools must therefore be developed to analyze and process big data. Big data mining is used to extract useful information from large datasets, which consist mostly of unstructured data. Unstructured data has no particular structure; it can take any form. Today, high-dimensional data is stored without a standard structure or schema, and this has given rise to new problems. This paper gives an overview of big data sources, challenges and scope, and of unstructured data mining techniques that can be used for big data.

    Cassandra File System Over Hadoop Distributed File System

    Cassandra is an open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple data centers, with asynchronous masterless replication that allows low-latency operations for all clients. NoSQL data stores target unstructured data, which is dynamic in nature and a key focus area for "Big Data" research. New-generation data can prove costly and impractical to administer with SQL databases, due to its lack of structure and its need for high scalability and elasticity. NoSQL data stores such as MongoDB and Cassandra provide a desirable platform for fast and efficient data queries. The Hadoop Distributed File System is one of many components and projects contained within the Hadoop ecosystem. The Apache Hadoop project defines Hadoop DFS as "the primary storage system which is used by Hadoop applications" that enables "reliable, extremely rapid computations". This paper provides a high-level overview of how Hadoop-style analytics (MapReduce, Pig, Mahout and Hive) can be run on data contained in Apache Cassandra without the need for Hadoop DFS.
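
    Masterless replication works because every node can compute, on its own, which nodes hold a given key. The sketch below approximates that idea with a consistent-hash token ring: each node owns a token, a key belongs to the first node whose token is at or after the key's token (wrapping around), and further replicas go to the next nodes clockwise, loosely similar to Cassandra's SimpleStrategy. Assumptions: MD5-derived tokens, exactly one token per node (no virtual nodes), and hypothetical names (token, TokenRing, replicas). Cassandra's actual Murmur3 partitioner and its replication strategies differ in detail.

```python
import bisect
import hashlib


def token(value: str) -> int:
    # Hypothetical token function: first 8 bytes of MD5 as an unsigned int.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Consistent-hash ring with one token per node (simplified sketch)."""

    def __init__(self, nodes: list[str], replication_factor: int) -> None:
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds node count")
        self.rf = replication_factor
        # Sort nodes by token to form the ring.
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str) -> list[str]:
        # Primary owner: first node whose token is >= the key's token,
        # wrapping to the start of the ring if none is.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        # Remaining replicas: the next rf - 1 nodes walking clockwise.
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


if __name__ == "__main__":
    ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
    print(ring.replicas("user:42"))
```

Because placement is a pure function of the ring and the key, any node can act as coordinator without consulting a master, and adding a node only moves the keys in the token range it takes over.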

    Security and Privacy Issues of Big Data

    This chapter reviews the most important aspects of how computing infrastructures should be configured and intelligently managed to fulfill the most notable security requirements of Big Data applications. One of them is privacy. It is a pertinent aspect to address because users share more and more personal data and content through their devices and computers with social networks and public clouds. A secure framework for social networks is therefore a very active research topic. This topic is addressed with case studies in one of the two sections of the current chapter. In addition, traditional security mechanisms such as firewalls and demilitarized zones are not suitable for computing systems that support Big Data. Software-Defined Networking (SDN) is an emerging management solution that could become a convenient mechanism for implementing security in Big Data systems, as we show through a second case study at the end of the chapter. The chapter also discusses current relevant work and identifies open issues. Comment: In book Handbook of Research on Trends and Future Directions in Big Data and Web Intelligence, IGI Global, 201

    Context Aware Middleware Architectures: Survey and Challenges

    Abstract: Context-aware applications, which can adapt their behaviors to changing environments, are attracting more and more attention. To reduce the complexity of developing such applications, context-aware middleware, which introduces context awareness into traditional middleware, is highlighted as a way to provide a homogeneous interface involving generic context management solutions. This paper provides a survey of state-of-the-art context-aware middleware architectures proposed from 2009 through 2015. First, preliminary background, such as the principles of context, context awareness, context modelling, and context reasoning, is provided for a comprehensive understanding of context-aware middleware. On this basis, an overview of eleven carefully selected middleware architectures is presented and their main features are explained. Then, thorough comparisons and analyses of the presented middleware architectures are performed based on technical parameters including architectural style, context abstraction, context reasoning, scalability, fault tolerance, interoperability, service discovery, storage, security & privacy, context awareness level, and cloud-based big data analytics. The analysis shows that no existing context-aware middleware architecture complies with all requirements. Finally, challenges are pointed out as open issues for future work.