Challenges for MapReduce in Big Data
In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting the continuously increasing demands on computing resources imposed by massive data sets. The reason is the high scalability of the MapReduce paradigm, which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data with the objective of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research. The identified challenges are grouped into four main categories corresponding to Big Data task types: data storage (relational databases and NoSQL stores), Big Data analytics (machine learning and interactive analytics), online processing, and security and privacy. Moreover, current efforts aimed at improving and extending MapReduce to address the identified challenges are presented. Consequently, by identifying the issues and challenges MapReduce faces when handling Big Data, this study encourages future Big Data research.
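The MapReduce paradigm discussed above can be illustrated with a minimal single-machine word-count sketch (plain Python standing in for a distributed runtime such as Hadoop; the function names are ours and purely illustrative):

```python
from itertools import groupby

def map_phase(doc_id, text):
    # Map: emit an intermediate (key, value) pair per word occurrence.
    for word in text.lower().split():
        yield word, 1

def reduce_phase(word, counts):
    # Reduce: aggregate all values that share a key.
    return word, sum(counts)

def run_mapreduce(docs):
    # Shuffle: group intermediate pairs by key before reducing.
    pairs = [kv for i, d in enumerate(docs) for kv in map_phase(i, d)]
    pairs.sort(key=lambda kv: kv[0])
    return dict(reduce_phase(k, (v for _, v in g))
                for k, g in groupby(pairs, key=lambda kv: kv[0]))

counts = run_mapreduce(["big data needs scale", "big clusters scale"])
print(counts["big"], counts["scale"])  # each word counted across all documents
```

Because map and reduce operate on independent key groups, a real runtime can run them on many nodes in parallel, which is the scalability property the paper highlights.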
Document Clustering with Map Reduce using Hadoop Framework
Big data is a collection of data sets so enormous and complex that it becomes difficult to process and analyse them using normal database management tools or traditional data processing applications. Big data presents many challenges; a central one is storing and retrieving data through search engines. Document data is also growing rapidly in the era of the internet, and analysing it is very important for many applications. Document clustering is one of the important techniques for analysing document data, with applications such as organizing large document collections, finding similar documents, recommendation systems, duplicate content detection, and search optimization. This work is motivated by the recognition of the need for efficient retrieval of data from massive data repositories through search engines. It focuses mainly on clustering collections of documents in an efficient manner using MapReduce.
DOI: 10.17762/ijritcc2321-8169.15018
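A common way to cast document clustering in MapReduce terms (a simplified sketch under our own assumptions, not the exact method of the paper above) is a k-means iteration: the map step assigns each document vector to its nearest centroid, and the reduce step averages the vectors assigned to each centroid:

```python
def assign(doc_vec, centroids):
    # Map step: key = index of the nearest centroid (squared Euclidean distance).
    dists = [sum((a - b) ** 2 for a, b in zip(doc_vec, c)) for c in centroids]
    return min(range(len(centroids)), key=dists.__getitem__)

def kmeans_mapreduce(docs, centroids, iters=5):
    dim = len(docs[0])
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for d in docs:                         # "map" over all documents
            groups[assign(d, centroids)].append(d)
        centroids = [                          # "reduce": per-cluster mean
            [sum(v[i] for v in g) / len(g) for i in range(dim)] if g else c
            for g, c in zip(groups, centroids)]
    return centroids

# Toy 2-D "document vectors"; real systems would use TF-IDF features.
docs = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]]
final = kmeans_mapreduce(docs, centroids=[[0.0, 0.0], [1.0, 1.0]])
labels = [assign(d, docs and final) for d in docs]
print(labels)  # → [0, 0, 1, 1]
```

Each iteration is one MapReduce job, which is why the approach parallelises over very large document collections.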
Machine Learning Algorithms in the Context of Big Data
Big Data promises to change our habitual way of daily life, work, and leisure. However, extracting information from huge data sets is not a trivial process and is rather resource-intensive. Furthermore, data-analysis tools that were relevant 10 years ago are not as effective in the current context. This paper considers modern and popular machine learning methods that are suitable for processing Big Data, presents their advantages in particular environments, and describes how they cope with the various challenges arising from Big Data. Finally, the one methodology that today most broadly covers the stated challenges is chosen and emphasized, with a short description of its problems in the current state.
Big Data and MapReduce Challenges, Opportunities and Trends
Nowadays we are all surrounded by Big Data. The term "Big Data" itself indicates huge volume, high velocity, variety, and veracity (i.e. uncertainty) of data, which gives rise to new difficulties and challenges. The Big Data generated may be structured, semi-structured, or unstructured. Existing databases and systems face many difficulties in processing, analysing, storing, and managing such Big Data. The Big Data challenges are protection, curation, capture, analysis, searching, visualization, storage, transfer, and sharing. MapReduce is a framework with which we can write applications that process huge amounts of data, in parallel, on large clusters of commodity hardware in a reliable manner. Many efforts have been made by different researchers to make it simple, easy, effective, and efficient. In our survey paper we emphasize the working of MapReduce and its challenges, opportunities, and recent trends, so that researchers can consider further improvements.
Efficient Storage Management over Cloud Using Data Compression without Losing Searching Capacity
Nowadays, thanks to social media, people communicate with each other and share their thoughts and moments of life in the form of text, images, or videos. We upload our private data, such as photos, videos, and documents, to internet websites like Facebook, WhatsApp, Google+, and YouTube. In short, today's world is surrounded by a large volume of data in different forms, which creates a requirement for effective management of these billions of terabytes of electronic data, generally called Big Data. Handling large data sets is a major challenge for data centers. One solution is to add as many hard disks as required, but if the data is kept unformatted, the hard-disk requirement will be very high. Cloud technology is becoming popular today, yet efficient storage management for large volumes of data on the cloud remains a big question. Many frameworks are available to address this problem; Hadoop is one of them. Hadoop provides an efficient way to store and retrieve large volumes of data, but it is efficient only if the file containing the data is large enough: Hadoop uses a big hard-disk block to store data, which makes it inefficient where the total volume of data is large but individual files are small. To satisfy both challenges (storing a large volume of data in less space, and storing small files without wasting space) we need to store data not in its usual form but in compressed form, so that the block size can be kept small. But doing so adds another dimension to the problem: searching the content of a compressed file is very inefficient. Therefore we require an efficient algorithm that compresses files without disturbing the search capacity of the data center. Here we show how these challenges can be solved. Keywords: Cloud, Big Data, Hadoop, Data Compression, MapReduce
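One way to reconcile compression with searchability, sketched here in plain Python under our own assumptions (the paper's actual scheme may differ), is to compress data in small blocks while keeping an uncompressed inverted index from words to block ids, so that a search decompresses only the blocks that can match:

```python
import zlib
from collections import defaultdict

class CompressedStore:
    def __init__(self):
        self.blocks = []                  # zlib-compressed text blocks
        self.index = defaultdict(set)     # word -> block ids, kept uncompressed

    def add(self, text):
        block_id = len(self.blocks)
        self.blocks.append(zlib.compress(text.encode()))
        for word in set(text.lower().split()):
            self.index[word].add(block_id)
        return block_id

    def search(self, word):
        # Decompress only candidate blocks instead of the whole store.
        return [zlib.decompress(self.blocks[b]).decode()
                for b in sorted(self.index.get(word.lower(), ()))]

store = CompressedStore()
store.add("cloud storage for big data")
store.add("hadoop block size tuning")
print(store.search("hadoop"))  # only the second block is decompressed
```

The index costs some space, but search no longer requires scanning decompressed data, which is the trade-off the abstract describes.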
A STUDY ON DATA STREAMING IN FOG COMPUTING ENVIRONMENT
In recent years, data streaming has become more important day by day, considering the technologies employed to serve it and the number of terminals within the system interacting with it directly or indirectly. Smart devices now play an active role in the data-streaming environment, together with fog and cloud compatibility. This affects data collection, and it appears clearly with the new technologies provided and the increase in the number of users of such systems. Because of the number of users and the resources available, systems have started to move computational power to the fog, that is, to the network edge. Fog computing is adopted to connect systems that treat streamed data as objects. These inter-connected objects are expected to produce ever more significant data streams, generated at unique rates and in some cases analysed in near real time. This paper presents a survey of data-streaming system technologies. It clarifies the main notions behind big data stream concepts as well as fog computing. From the presented study, the industrial and research communities can gain information about the requirements for creating a fog computing environment, with a clearer view of managing resources in the fog. The main objective of this paper is to provide a short brief about data streaming in fog computing environments and to explain the major research fields in this area.
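Pre-aggregating a stream near the edge, as fog nodes are described as doing above, can be sketched as a sliding-window mean over incoming sensor readings (plain Python; the names and window size are illustrative, not from the surveyed systems):

```python
from collections import deque

def windowed_mean(stream, window=3):
    # Fog-node-style pre-aggregation: emit a rolling mean instead of raw
    # readings, cutting the volume of data forwarded on to the cloud.
    buf = deque(maxlen=window)
    for reading in stream:
        buf.append(reading)
        yield sum(buf) / len(buf)

readings = [10.0, 12.0, 11.0, 30.0, 29.0]
smoothed = [round(v, 2) for v in windowed_mean(readings)]
print(smoothed)  # → [10.0, 11.0, 11.0, 17.67, 23.33]
```

Because the generator holds only the last `window` readings, it runs in constant memory, which suits the resource-constrained edge devices the survey discusses.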
A first attempt on global evolutionary undersampling for imbalanced big data
The design of efficient big data learning models has become a common need in a great number of applications. The massive amounts of available data may hinder the use of traditional data mining techniques, especially when evolutionary algorithms are involved as a key step. Existing solutions typically follow a divide-and-conquer approach in which the data is split into several chunks that are addressed individually. Next, the partial knowledge acquired from every slice of data is aggregated in multiple ways to solve the entire problem. However, these approaches are missing a global view of the data as a whole, which may result in less accurate models.
In this work we carry out a first attempt at the design of a global evolutionary undersampling model for imbalanced classification problems. These are characterised by a highly skewed distribution of classes, which evolutionary models balance by selecting only the most relevant data. Using Apache Spark as the big data technology, we have introduced a number of variations to the well-known CHC algorithm to work with very large chromosomes and to reduce the costs associated with fitness evaluation. We discuss some preliminary results, showing the great potential of this new kind of evolutionary big data model.
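A toy single-machine version of evolutionary undersampling (illustrative only; the CHC-on-Spark variant described above is far more elaborate) evolves a bitmask over majority-class instances and scores each mask by how well the retained subset balances the two classes:

```python
import random

def fitness(mask, n_minority):
    # Toy objective: reward subsets whose retained majority size matches the
    # minority size. A real fitness would also score classifier accuracy on
    # the retained subset, as evolutionary undersampling methods do.
    return -abs(sum(mask) - n_minority)

def evolve(n_majority, n_minority, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_majority)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, n_minority), reverse=True)
        survivors = pop[:pop_size // 2]            # truncation selection
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_majority)     # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n_majority)] ^= 1  # point mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda m: fitness(m, n_minority))

best = evolve(n_majority=50, n_minority=10)
print(sum(best))  # majority instances retained, ideally close to 10
```

The very large chromosomes mentioned in the abstract arise because the bitmask has one gene per majority instance, which is what motivates distributing fitness evaluation over Spark.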
Novel holistic architecture for analytical operation on sensory data relayed as cloud services
With the increasing adoption of sensor-based applications, there is an exponential rise in sensory data that eventually takes the shape of big data. However, the practicality of executing high-end analytical operations over resource-constrained big data has never been studied closely. After reviewing existing approaches, we find that there are no cost-effective schemes for big data analytics over large-scale sensory data processing that can be used directly as a service. Therefore, the proposed system introduces a holistic architecture where streamed data, after knowledge extraction, can be offered in the form of services. Implemented in MATLAB, the proposed study uses a very simplistic approach, considering the energy constraints of the sensor nodes, and finds that the proposed system offers better accuracy, reduced mining duration (i.e. faster response time), and reduced memory dependencies, proving that it is a cost-effective analytical solution in contrast to existing systems.
- β¦