Challenges for MapReduce in Big Data
In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting the continuously increasing demands on computing resources imposed by massive data sets. The reason is the high scalability of the MapReduce paradigm, which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data with the objective of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research. The identified challenges are grouped into four main categories corresponding to Big Data task types: data storage (relational databases and NoSQL stores), Big Data analytics (machine learning and interactive analytics), online processing, and security and privacy. Moreover, current efforts aimed at improving and extending MapReduce to address the identified challenges are presented. Consequently, by identifying the issues and challenges MapReduce faces when handling Big Data, this study encourages future Big Data research.
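The MapReduce paradigm discussed above can be illustrated with a minimal single-machine word-count sketch (plain Python standing in for a distributed runtime such as Hadoop; the function names are ours and purely illustrative):

```python
from itertools import groupby

def map_phase(doc_id, text):
    # Map: emit an intermediate (key, value) pair per word occurrence.
    for word in text.lower().split():
        yield word, 1

def reduce_phase(word, counts):
    # Reduce: aggregate all values that share a key.
    return word, sum(counts)

def run_mapreduce(docs):
    # Shuffle: group intermediate pairs by key before reducing.
    pairs = [kv for i, d in enumerate(docs) for kv in map_phase(i, d)]
    pairs.sort(key=lambda kv: kv[0])
    return dict(reduce_phase(k, (v for _, v in g))
                for k, g in groupby(pairs, key=lambda kv: kv[0]))

counts = run_mapreduce(["big data needs scale", "big clusters scale"])
print(counts["big"], counts["scale"])  # each word counted across all documents
```

Because map and reduce operate on independent key groups, a real runtime can run them on many nodes in parallel, which is the scalability property the paper highlights.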
Document Clustering with Map Reduce using Hadoop Framework
Big data is a collection of data sets so enormous and complex that it becomes difficult to process and analyse them using normal database management tools or traditional data processing applications. Big data presents many challenges; a central one is storing and retrieving data through search engines. Document data is also growing rapidly in the era of the internet, and analysing it is very important for many applications. Document clustering is one of the important techniques for analysing document data, with applications such as organizing large document collections, finding similar documents, recommendation systems, duplicate content detection, and search optimization. This work is motivated by the recognition of the need for efficient retrieval of data from massive data repositories through search engines. It focuses mainly on clustering collections of documents in an efficient manner using MapReduce.
DOI: 10.17762/ijritcc2321-8169.15018
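A common way to cast document clustering in MapReduce terms (a simplified sketch under our own assumptions, not the exact method of the paper above) is a k-means iteration: the map step assigns each document vector to its nearest centroid, and the reduce step averages the vectors assigned to each centroid:

```python
def assign(doc_vec, centroids):
    # Map step: key = index of the nearest centroid (squared Euclidean distance).
    dists = [sum((a - b) ** 2 for a, b in zip(doc_vec, c)) for c in centroids]
    return min(range(len(centroids)), key=dists.__getitem__)

def kmeans_mapreduce(docs, centroids, iters=5):
    dim = len(docs[0])
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for d in docs:                         # "map" over all documents
            groups[assign(d, centroids)].append(d)
        centroids = [                          # "reduce": per-cluster mean
            [sum(v[i] for v in g) / len(g) for i in range(dim)] if g else c
            for g, c in zip(groups, centroids)]
    return centroids

# Toy 2-D "document vectors"; real systems would use TF-IDF features.
docs = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]]
final = kmeans_mapreduce(docs, centroids=[[0.0, 0.0], [1.0, 1.0]])
labels = [assign(d, docs and final) for d in docs]
print(labels)  # → [0, 0, 1, 1]
```

Each iteration is one MapReduce job, which is why the approach parallelises over very large document collections.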
Machine Learning Algorithms in the Context of Big Data
Big Data promises to change our habitual way of daily life, work, and leisure. However, extracting information from huge data sets is not a trivial process and is rather resource-intensive. Furthermore, data-analysis tools that were relevant 10 years ago are not as effective in the current context. This paper considers modern and popular machine learning methods that are suitable for processing Big Data, presents their advantages in particular environments, and describes how they cope with the various challenges arising from Big Data. Finally, the one methodology that today most broadly covers the stated challenges is chosen and emphasized, with a short description of its problems in the current state.
Big Data and MapReduce Challenges, Opportunities and Trends
Nowadays we are all surrounded by Big Data. The term "Big Data" itself indicates huge volume, high velocity, variety, and veracity (i.e. uncertainty) of data, which gives rise to new difficulties and challenges. The Big Data generated may be structured, semi-structured, or unstructured. Existing databases and systems face many difficulties in processing, analysing, storing, and managing such Big Data. The Big Data challenges are protection, curation, capture, analysis, searching, visualization, storage, transfer, and sharing. MapReduce is a framework with which we can write applications that process huge amounts of data, in parallel, on large clusters of commodity hardware in a reliable manner. Many efforts have been made by different researchers to make it simple, easy, effective, and efficient. In our survey paper we emphasize the working of MapReduce and its challenges, opportunities, and recent trends, so that researchers can consider further improvements.
Efficient Storage Management over Cloud Using Data Compression without Losing Searching Capacity
Nowadays, thanks to social media, people communicate with each other and share their thoughts and moments of life in the form of text, images, or videos. We upload our private data, such as photos, videos, and documents, to internet websites like Facebook, WhatsApp, Google+, and YouTube. In short, today's world is surrounded by a large volume of data in different forms, which creates a requirement for effective management of these billions of terabytes of electronic data, generally called Big Data. Handling large data sets is a major challenge for data centers. One solution is to add as many hard disks as required, but if the data is kept unformatted, the hard-disk requirement will be very high. Cloud technology is becoming popular today, yet efficient storage management for large volumes of data on the cloud remains a big question. Many frameworks are available to address this problem; Hadoop is one of them. Hadoop provides an efficient way to store and retrieve large volumes of data, but it is efficient only if the file containing the data is large enough: Hadoop uses a big hard-disk block to store data, which makes it inefficient where the total volume of data is large but individual files are small. To satisfy both challenges (storing a large volume of data in less space, and storing small files without wasting space) we need to store data not in its usual form but in compressed form, so that the block size can be kept small. But doing so adds another dimension to the problem: searching the content of a compressed file is very inefficient. Therefore we require an efficient algorithm that compresses files without disturbing the search capacity of the data center. Here we show how these challenges can be solved. Keywords: Cloud, Big Data, Hadoop, Data Compression, MapReduce
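One way to reconcile compression with searchability, sketched here in plain Python under our own assumptions (the paper's actual scheme may differ), is to compress data in small blocks while keeping an uncompressed inverted index from words to block ids, so that a search decompresses only the blocks that can match:

```python
import zlib
from collections import defaultdict

class CompressedStore:
    def __init__(self):
        self.blocks = []                  # zlib-compressed text blocks
        self.index = defaultdict(set)     # word -> block ids, kept uncompressed

    def add(self, text):
        block_id = len(self.blocks)
        self.blocks.append(zlib.compress(text.encode()))
        for word in set(text.lower().split()):
            self.index[word].add(block_id)
        return block_id

    def search(self, word):
        # Decompress only candidate blocks instead of the whole store.
        return [zlib.decompress(self.blocks[b]).decode()
                for b in sorted(self.index.get(word.lower(), ()))]

store = CompressedStore()
store.add("cloud storage for big data")
store.add("hadoop block size tuning")
print(store.search("hadoop"))  # only the second block is decompressed
```

The index costs some space, but search no longer requires scanning decompressed data, which is the trade-off the abstract describes.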
A STUDY ON DATA STREAMING IN FOG COMPUTING ENVIRONMENT
In recent years, data streaming has become more important day by day, considering the technologies employed to serve it and the number of terminals within the system interacting with it directly or indirectly. Smart devices now play an active role in the data-streaming environment, together with fog and cloud compatibility. This affects data collection, and it appears clearly with the new technologies provided and the increase in the number of users of such systems. Because of the number of users and the resources available, systems have started to move computational power to the fog, that is, to the network edge. Fog computing is adopted to connect systems that treat streamed data as objects. These inter-connected objects are expected to produce ever more significant data streams, generated at unique rates and in some cases analysed in near real time. This paper presents a survey of data-streaming system technologies. It clarifies the main notions behind big data stream concepts as well as fog computing. From the presented study, the industrial and research communities can gain information about the requirements for creating a fog computing environment, with a clearer view of managing resources in the fog. The main objective of this paper is to provide a short brief about data streaming in fog computing environments and to explain the major research fields in this area.
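Pre-aggregating a stream near the edge, as fog nodes are described as doing above, can be sketched as a sliding-window mean over incoming sensor readings (plain Python; the names and window size are illustrative, not from the surveyed systems):

```python
from collections import deque

def windowed_mean(stream, window=3):
    # Fog-node-style pre-aggregation: emit a rolling mean instead of raw
    # readings, cutting the volume of data forwarded on to the cloud.
    buf = deque(maxlen=window)
    for reading in stream:
        buf.append(reading)
        yield sum(buf) / len(buf)

readings = [10.0, 12.0, 11.0, 30.0, 29.0]
smoothed = [round(v, 2) for v in windowed_mean(readings)]
print(smoothed)  # → [10.0, 11.0, 11.0, 17.67, 23.33]
```

Because the generator holds only the last `window` readings, it runs in constant memory, which suits the resource-constrained edge devices the survey discusses.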
A first attempt on global evolutionary undersampling for imbalanced big data
The design of efficient big data learning models has become a common need in a great number of applications. The massive amounts of available data may hinder the use of traditional data mining techniques, especially when evolutionary algorithms are involved as a key step. Existing solutions typically follow a divide-and-conquer approach in which the data is split into several chunks that are addressed individually. Next, the partial knowledge acquired from every slice of data is aggregated in multiple ways to solve the entire problem. However, these approaches are missing a global view of the data as a whole, which may result in less accurate models.
In this work we carry out a first attempt at the design of a global evolutionary undersampling model for imbalanced classification problems. These are characterised by a highly skewed distribution of classes, which evolutionary models balance by selecting only the most relevant data. Using Apache Spark as the big data technology, we have introduced a number of variations to the well-known CHC algorithm to work with very large chromosomes and to reduce the costs associated with fitness evaluation. We discuss some preliminary results, showing the great potential of this new kind of evolutionary big data model.
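A toy single-machine version of evolutionary undersampling (illustrative only; the CHC-on-Spark variant described above is far more elaborate) evolves a bitmask over majority-class instances and scores each mask by how well the retained subset balances the two classes:

```python
import random

def fitness(mask, n_minority):
    # Toy objective: reward subsets whose retained majority size matches the
    # minority size. A real fitness would also score classifier accuracy on
    # the retained subset, as evolutionary undersampling methods do.
    return -abs(sum(mask) - n_minority)

def evolve(n_majority, n_minority, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_majority)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, n_minority), reverse=True)
        survivors = pop[:pop_size // 2]            # truncation selection
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_majority)     # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n_majority)] ^= 1  # point mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda m: fitness(m, n_minority))

best = evolve(n_majority=50, n_minority=10)
print(sum(best))  # majority instances retained, ideally close to 10
```

The very large chromosomes mentioned in the abstract arise because the bitmask has one gene per majority instance, which is what motivates distributing fitness evaluation over Spark.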
Novel holistic architecture for analytical operation on sensory data relayed as cloud services
With the increasing adoption of sensor-based applications, there is an exponential rise in sensory data that eventually takes the shape of big data. However, the practicality of executing high-end analytical operations over resource-constrained big data has never been studied closely. After reviewing existing approaches, we find that there are no cost-effective schemes for big data analytics over large-scale sensory data processing that can be used directly as a service. Therefore, the proposed system introduces a holistic architecture where streamed data, after knowledge extraction, can be offered in the form of services. Implemented in MATLAB, the proposed study uses a very simplistic approach, considering the energy constraints of the sensor nodes, and finds that the proposed system offers better accuracy, reduced mining duration (i.e. faster response time), and reduced memory dependencies, proving that it is a cost-effective analytical solution in contrast to existing systems.
- β¦