51 research outputs found

    Challenges for MapReduce in Big Data

    In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting the continuously increasing demands that massive data sets impose on computing resources. The reason is the high scalability of the MapReduce paradigm, which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data, with the objective of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research. The identified challenges are grouped into four main categories corresponding to Big Data task types: data storage (relational databases and NoSQL stores), Big Data analytics (machine learning and interactive analytics), online processing, and security and privacy. Moreover, current efforts aimed at improving and extending MapReduce to address the identified challenges are presented. By identifying the issues and challenges MapReduce faces when handling Big Data, this study aims to encourage future Big Data research.
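    The scalability described above comes from MapReduce's two-phase model: a map step that emits key-value pairs and a reduce step that aggregates them per key. A minimal single-process sketch of the classic word-count example (toy data; a real framework such as Hadoop distributes both phases across many nodes) is:

    ```python
    from collections import defaultdict
    from itertools import chain

    # Toy corpus for illustration only.
    documents = ["big data needs scale", "map reduce enables scale"]

    def map_phase(doc):
        # Map: emit a (word, 1) pair for every word in a document.
        return [(word, 1) for word in doc.split()]

    def reduce_phase(pairs):
        # Reduce: sum the counts per word (standing in for the shuffle/group step).
        counts = defaultdict(int)
        for word, n in pairs:
            counts[word] += n
        return dict(counts)

    word_counts = reduce_phase(chain.from_iterable(map_phase(d) for d in documents))
    print(word_counts["scale"])  # 2
    ```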

    Document Clustering with Map Reduce using Hadoop Framework

    Big data refers to collections of data sets so enormous and complex that they become difficult to process and analyse using conventional database management tools or traditional data processing applications. Big data poses many challenges; a central one is storing data and retrieving it through search engines. Document data is also growing rapidly in the era of the internet, and analysing it is important for many applications. Document clustering is one of the key techniques for analysing document data, with applications such as organizing large document collections, finding similar documents, recommendation systems, duplicate content detection, and search optimization. This work is motivated by the need for efficient retrieval of data from massive data repositories through search engines, and it focuses on clustering large collections of documents efficiently using MapReduce. DOI: 10.17762/ijritcc2321-8169.15018
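    Document clustering fits the MapReduce model naturally: the map step assigns each document to its nearest cluster seed, and the reduce step groups documents by the cluster they were assigned to. A minimal sketch using Jaccard similarity over word sets (the seeds and documents here are invented for illustration; the paper's actual pipeline on Hadoop is not specified in the abstract) is:

    ```python
    from collections import defaultdict

    # Hypothetical cluster seeds and documents.
    seeds = {0: {"hadoop", "mapreduce"}, 1: {"search", "engine"}}
    docs = ["hadoop mapreduce jobs", "search engine index", "mapreduce cluster"]

    def jaccard(a, b):
        # Similarity of two word sets: |intersection| / |union|.
        return len(a & b) / len(a | b)

    def map_assign(doc):
        # Map: emit (best_cluster, doc) for the nearest seed.
        words = set(doc.split())
        best = max(seeds, key=lambda c: jaccard(words, seeds[c]))
        return best, doc

    def reduce_group(pairs):
        # Reduce: collect documents under their assigned cluster key.
        clusters = defaultdict(list)
        for cluster, doc in pairs:
            clusters[cluster].append(doc)
        return dict(clusters)

    clusters = reduce_group(map_assign(d) for d in docs)
    ```

    A full clustering run would recompute the seeds from each group and iterate, as in MapReduce formulations of k-means.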

    Machine Learning Algorithms in the Context of Big Data

    Big Data promises to change our habitual way of daily life, work, and leisure. However, extracting information from huge data sets is not a trivial process and is rather resource-intensive. Furthermore, data-analysis tools that were relevant 10 years ago are far less effective in the current context. This paper considers modern and popular machine learning methods that are suitable for processing Big Data, addresses their advantages in particular environments, and describes how they cope with the challenges arising from Big Data. Finally, the methodology that most broadly covers the stated challenges was chosen, and its current problems are briefly described.

    Big Data and MapReduce Challenges, Opportunities and Trends

    Nowadays we are all surrounded by Big Data. The term 'Big Data' itself indicates huge volume, high velocity, variety, and veracity (i.e. uncertainty) of data, which give rise to new difficulties and challenges. The data generated may be structured, semi-structured, or unstructured, and existing databases and systems face many difficulties in processing, analyzing, storing, and managing it. The main Big Data challenges are protection, curation, capture, analysis, searching, visualization, storage, transfer, and sharing. MapReduce is a framework with which we can write applications that process huge amounts of data, in parallel, on large clusters of commodity hardware in a reliable manner. Many researchers have worked to make it simple, easy, effective, and efficient. In this survey paper we emphasize the working of MapReduce and its challenges, opportunities, and recent trends, so that researchers can consider further improvements.

    CLASSIFICATION ALGORITHMS FOR BIG DATA ANALYSIS, A MAP REDUCE APPROACH


    Efficient Storage Management over Cloud Using Data Compression without Losing Searching Capacity

    Nowadays, thanks to social media, people communicate with each other and share their thoughts and moments of life in the form of texts, images, and videos. We upload our private data, as photos, videos, and documents, to internet services such as Facebook, WhatsApp, Google+, and YouTube. In short, today's world is surrounded by large volumes of data in different forms, which creates a requirement for effective management of these billions of terabytes of electronic data, generally called Big Data. Handling such large data sets is a major challenge for data centers. One solution is to add as many hard disks as required, but if the data is kept unformatted, the hard disk requirement will be very high. Cloud technology is becoming popular, yet efficient storage management for large volumes of data on the cloud remains an open question. Many frameworks are available to address this problem; Hadoop is one of them. Hadoop provides an efficient way to store and retrieve large volumes of data, but it is efficient only if the files containing the data are large enough: Hadoop stores data in large hard disk blocks, which makes it inefficient where the total volume of data is large but individual files are small. To satisfy both challenges, storing a large volume of data in less space and storing small files without wasting space, we need to store the data not in its usual form but in compressed form, so that the block size can be kept small. Doing so, however, adds another dimension to the problem: searching the content of a compressed file is very inefficient. We therefore require an efficient algorithm that compresses files without disturbing the search capacity of the data center. Here we show how these challenges can be addressed. Keywords: Cloud, Big Data, Hadoop, Data Compression, MapReduce
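    The abstract's trade-off, saving space with compression while keeping data searchable, can be sketched by compressing data as many small chunks: space is saved, yet a term search only ever decompresses one chunk at a time. This is an illustrative sketch under assumed parameters, not the paper's algorithm; the chunk size and sample text are invented:

    ```python
    import zlib

    CHUNK_SIZE = 64  # bytes per chunk; a real store would use far larger blocks

    def store(text):
        # Compress the data chunk by chunk rather than as one large blob.
        raw = text.encode()
        return [zlib.compress(raw[i:i + CHUNK_SIZE])
                for i in range(0, len(raw), CHUNK_SIZE)]

    def search(chunks, term):
        # Decompress chunk by chunk; memory use stays bounded by CHUNK_SIZE.
        # Limitation: a term that straddles a chunk boundary would be missed.
        return [i for i, c in enumerate(chunks)
                if term.encode() in zlib.decompress(c)]

    chunks = store("big data " * 50 + "needle " + "big data " * 50)
    print(search(chunks, "needle"))  # → [7]
    ```

    Schemes that search compressed data directly (e.g. compressed self-indexes) avoid even the per-chunk decompression, which is the direction the paper's keywords point toward.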

    A STUDY ON DATA STREAMING IN FOG COMPUTING ENVIRONMENT

    In recent years, data streaming has become more important day by day, considering the technologies employed to serve it and the number of terminals within a system that interact with it either directly or indirectly. Smart devices now play an active role in the data streaming environment, alongside fog and cloud compatibility. This affects data collection, a trend that appears clearly with the new technologies being introduced and the growing number of users of such systems. Because of the number of users and the resources available, systems have started to move computational power to the fog, that is, to the network edge, and to treat each streamed data source as an object. These inter-connected objects are expected to produce ever more significant data streams, generated at unique rates and in some cases analysed nearly in real time. This paper presents a survey of data streaming technologies. It clarifies the main notions behind big data stream concepts as well as fog computing. From the presented study, the industrial and research communities can gain information about the requirements for creating a fog computing environment, with a clearer view of managing resources in the fog. The main objective of this paper is to provide a short brief on data streaming in fog computing environments and to explain the major research fields within this area.

    A first attempt on global evolutionary undersampling for imbalanced big data

    The design of efficient big data learning models has become a common need in a great number of applications. The massive amounts of available data may hinder the use of traditional data mining techniques, especially when evolutionary algorithms are involved as a key step. Existing solutions typically follow a divide-and-conquer approach in which the data is split into several chunks that are addressed individually. Next, the partial knowledge acquired from every slice of data is aggregated in multiple ways to solve the entire problem. However, these approaches lack a global view of the data as a whole, which may result in less accurate models. In this work we carry out a first attempt at the design of a global evolutionary undersampling model for imbalanced classification problems. Such problems are characterised by a highly skewed class distribution, which evolutionary models balance by selecting only the most relevant data. Using Apache Spark as the big data technology, we have introduced a number of variations to the well-known CHC algorithm to work with very large chromosomes and to reduce the costs associated with fitness evaluation. We discuss some preliminary results, showing the great potential of this new kind of evolutionary big data model.
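    The core idea, a chromosome encoding which majority-class samples to keep, can be shown with a deliberately simplified sketch. This is not the paper's CHC variant or its Spark implementation: it is a plain hill climb with a fitness that only rewards a class ratio near 1:1, and the dataset sizes are invented:

    ```python
    import random

    random.seed(0)

    # Assumed toy class sizes for an imbalanced problem.
    N_MAJORITY, N_MINORITY = 100, 10

    def fitness(chrom):
        # 1.0 when the kept majority samples exactly match the minority count.
        kept = sum(chrom)
        return 1.0 - abs(kept - N_MINORITY) / N_MAJORITY

    def evolve(generations=2000):
        chrom = [1] * N_MAJORITY  # start by keeping every majority sample
        for _ in range(generations):
            child = chrom[:]
            child[random.randrange(N_MAJORITY)] ^= 1  # single bit-flip mutation
            if fitness(child) >= fitness(chrom):
                chrom = child  # keep the non-worse candidate (simple hill climb)
        return chrom

    best = evolve()
    print(sum(best))  # majority samples kept; converges to N_MINORITY
    ```

    A real evolutionary undersampler would also score each candidate subset by classifier performance (e.g. a geometric-mean measure), which is exactly the fitness-evaluation cost the paper's Spark-based variations aim to reduce.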

    Novel holistic architecture for analytical operation on sensory data relayed as cloud services

    With the increasing adoption of sensor-based applications, there is an exponential rise in sensory data, which eventually takes the shape of big data. However, the practicality of executing high-end analytical operations over resource-constrained big data has never been studied closely. A review of existing approaches shows that there is no cost-effective scheme for big data analytics over large-scale sensory data processing that can be used directly as a service. The proposed system therefore introduces a holistic architecture in which streamed data, after knowledge extraction, can be offered in the form of services. Implemented in MATLAB, the proposed study uses a very simple approach that accounts for the energy constraints of the sensor nodes, and finds that the proposed system offers better accuracy, reduced mining duration (i.e. faster response time), and reduced memory dependencies, proving that it offers a cost-effective analytical solution in contrast to existing systems.