
    Hadoop The Emerging Tool in the Present Scenario for Accessing the Large Sets of Data

    Hadoop is one of the tools designed to handle big data. Hadoop and other software products interpret or parse the results of big data searches through specific proprietary algorithms and methods. Hadoop is an open-source program under the Apache license that is maintained by a global community of users. It includes two main components: a MapReduce set of functions and the Hadoop Distributed File System (HDFS). The idea behind MapReduce is that Hadoop can first map a large data set and then perform a reduction on that content for specific results; a reduce function can be thought of as a kind of filter for raw data. The HDFS system then distributes data across a network or migrates it as necessary. The term Hadoop often refers not just to these base modules but also to the collection of additional software packages that can be installed on top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase, and Apache Spark. Prominent corporate users of Hadoop include Facebook and Yahoo. It can be deployed in traditional onsite data centers as well as in the cloud; for example, it is available on Microsoft Azure, Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), Google App Engine, and IBM Bluemix. In this paper, we identify and describe the major ways in which the Hadoop approach improves access to large data sets ("big data") to meet rapidly changing business environments. We also briefly compare Hadoop techniques with those of traditional systems and discuss the current state of Hadoop adoption. We argue that Hadoop has emerged as an alternative to traditional methods out of the need to serve customers in a timely manner. The purpose of this paper is to provide an in-depth understanding of the major benefits of the Hadoop approach to data access, together with a study of Hadoop's importance in the present scenario.
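    The map-then-reduce idea the abstract describes can be made concrete with a minimal sketch. The snippet below simulates Hadoop's map, shuffle/sort, and reduce stages locally for word counting; on a real cluster the two functions would run as distributed tasks over data blocks stored in HDFS, and the sample input here is purely illustrative.

```python
# Local simulation of Hadoop's map -> shuffle/sort -> reduce pipeline
# for word counting. On a real cluster, map and reduce run as parallel
# tasks (e.g., via Hadoop Streaming) over blocks stored in HDFS.
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: sum the counts per word -- a filter over the raw pairs."""
    shuffled = sorted(pairs, key=itemgetter(0))   # the "shuffle/sort" step
    for word, group in groupby(shuffled, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

if __name__ == "__main__":
    sample = ["big data needs big tools", "hadoop maps then reduces data"]
    for word, count in reduce_phase(map_phase(sample)):
        print(word, count)
```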

    Evaluation of Hadoop/Mapreduce Framework Migration Tools

    In distributed systems, database migration is not an easy task. Companies encounter challenges when moving data, including legacy data, to the big data platform. This paper reviews several tools for migrating from traditional databases to the big data platform and, based on that review, suggests a migration model.
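    The paper surveys migration tools rather than prescribing one. As a hedged illustration of the kind of transfer such tools automate, the sketch below copies one relational table into the Hadoop ecosystem using Spark's JDBC reader and writes it out as Parquet on HDFS; the connection URL, credentials, table name, and paths are all hypothetical, and this is not any specific tool from the paper.

```python
# Illustrative only: stage one relational table onto HDFS as Parquet.
# JDBC URL, credentials, table, and output path are placeholders, and
# the matching JDBC driver jar must be on Spark's classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("legacy-db-migration").getOrCreate()

legacy = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://legacy-host:3306/erp")  # hypothetical source
          .option("dbtable", "orders")
          .option("user", "migrator")
          .option("password", "secret")
          .load())

# Land the data on the big data platform in a columnar format.
legacy.write.mode("overwrite").parquet("hdfs:///warehouse/orders")
```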

    Performance Evaluation of Structured and Unstructured Data in PIG/HADOOP and MONGO-DB Environments

    The exponential growth of data initially presented difficulties for prominent organizations such as Google, Yahoo, Amazon, Microsoft, Facebook, and Twitter. The size of the information that cloud applications must handle is growing significantly faster than storage capacity, and this growth requires new systems for managing and analyzing data. The term Big Data refers to large volumes of unstructured (or semi-structured) and structured data created by applications, messages, weblogs, and social networks. Big Data is data whose size, variety, and uncertainty require new models, procedures, algorithms, and research to manage it and to extract value and hidden knowledge from it. To process more information efficiently, parallelism is used for analysis, and NoSQL databases have been introduced to handle unstructured and semi-structured information. Hadoop serves Big Data analysis requirements well: it is designed to scale from a single server up to a large cluster of machines with a high degree of fault tolerance. Many business and research institutions, such as Facebook, Yahoo, and Google, have a growing need to import, store, and analyze dynamic semi-structured data and its metadata. The significant growth of semi-structured data inside large web-based organizations has prompted the creation of NoSQL data stores for flexible storage and of MapReduce for scalable parallel analysis. These institutions have evaluated, used, and modified Hadoop, the most popular open-source implementation of MapReduce, to address the needs of various analytics problems; they also use MongoDB, a document-oriented NoSQL store. Even so, there is limited understanding of the performance trade-offs of these two technologies. This paper evaluates the performance, scalability, and fault tolerance of MongoDB and Hadoop, toward the goal of identifying the right software environment for scientific data analytics and research. Recently, a growing number of organizations have developed distinct kinds of non-relational databases (such as MongoDB, Cassandra, Hypertable, HBase/Hadoop, and CouchDB), generally referred to as NoSQL databases. The enormous amount of information generated requires an effective system to analyze the data in various scenarios and under various limits. The objective of this paper is to find the break-even point of Hadoop/Pig and MongoDB and to develop a robust environment for data analytics.
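    A break-even comparison of this kind amounts to timing the same query on both stacks across data sizes. The sketch below, which substitutes PySpark for the Pig scripts used in the paper, times one group-by count in MongoDB (via pymongo) and over the same data staged on HDFS; the connection string, database, collection, field names, and HDFS path are assumptions, and a real study would sweep many data volumes.

```python
# Hypothetical micro-benchmark: the same group-by count on both stacks.
# All names and paths are placeholders; PySpark stands in for the Pig
# scripts the paper actually benchmarks.
import time
from pymongo import MongoClient
from pyspark.sql import SparkSession

def time_mongo():
    coll = MongoClient("mongodb://localhost:27017")["logs"]["events"]
    start = time.perf_counter()
    list(coll.aggregate([{"$group": {"_id": "$event_type",
                                     "n": {"$sum": 1}}}]))
    return time.perf_counter() - start

def time_hadoop():
    spark = SparkSession.builder.appName("breakeven").getOrCreate()
    df = spark.read.json("hdfs:///logs/events")   # same data staged on HDFS
    start = time.perf_counter()
    df.groupBy("event_type").count().collect()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"MongoDB: {time_mongo():.2f}s  Hadoop/Spark: {time_hadoop():.2f}s")
```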

    PlantES: A plant electrophysiological multi-source data online analysis and sharing platform

    At present, plant electrophysiological data volumes and complexity are increasing rapidly, creating a demand for efficient big data management, data sharing among research groups, and fast analysis. In this paper, we propose PlantES (Plant Electrophysiological data Sharing), a distributed-computing-based prototype system that can be used to store, manage, visualize, analyze, and share plant electrophysiological data. We designed a storage schema that manages multi-source plant electrophysiological data by integrating the distributed storage systems HDFS and HBase, so that all kinds of files can be accessed efficiently. To improve online analysis efficiency, we propose and implement parallel computing algorithms on Spark, including a plant electrical signal extraction method, an adaptive derivative threshold algorithm, and a template matching algorithm. The experimental results indicate that Spark substantially improves online analysis efficiency. Online visualization and sharing of multiple types of data in the web browser are also implemented. Our prototype platform provides a solution for web-based sharing and analysis of multi-source plant electrophysiological data and improves the comprehension of plant electrical signals from a systemic perspective.
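    The paper does not publish its algorithms' code, but the parallelization pattern it describes can be sketched. The snippet below shows one way a derivative-threshold event extraction might be distributed with Spark, assuming each HDFS record holds one comma-separated voltage trace; the per-trace thresholding rule (mean plus three standard deviations of the derivative) and the data layout are assumptions, not the paper's adaptive algorithm.

```python
# Hedged sketch: parallel event extraction with a derivative threshold.
# Each input record is assumed to be a comma-separated voltage trace;
# the mean + 3*std rule is illustrative, not the algorithm from PlantES.
import numpy as np
from pyspark.sql import SparkSession

def extract_events(line):
    """Return sample indices where the signal's derivative spikes."""
    signal = np.array(line.split(","), dtype=float)
    deriv = np.diff(signal)
    threshold = deriv.mean() + 3 * deriv.std()   # per-trace adaptation
    return np.flatnonzero(np.abs(deriv) > threshold).tolist()

spark = SparkSession.builder.appName("plant-signal-extraction").getOrCreate()
traces = spark.sparkContext.textFile("hdfs:///plantes/traces")  # placeholder path
events_per_trace = traces.map(extract_events).collect()
```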

    A Study on Efficient Design of A Multimedia Conversion Module in PESMS for Social Media Services

    The main contribution of this paper is to present the Platform-as-a-Service (PaaS) Environment for Social Multimedia Service (PESMS), derived from the Social Media Cloud Computing Service Environment. The main role of our PESMS is to support the development of social networking services that include audio, image, and video formats. In this paper, we focus in particular on the design and implementation of PESMS, including a transcoding function for processing large amounts of social media in a parallel and distributed manner. PESMS is designed to improve the quality and speed of multimedia conversion by incorporating a Hadoop-based multimedia conversion module, consisting of the Hadoop Distributed File System for storing large quantities of social data and MapReduce for distributed parallel processing of these data. In this way, our PESMS has the prospect of dramatically reducing the time needed to transcode large numbers of image files into specific formats. To test system performance, we measured image transcoding time under a variety of experimental conditions. In experiments performed on a 28-node cluster, our system delivered excellent performance in the image transcoding function.
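    Distributed image transcoding of this kind is often expressed as a map-only job: each mapper independently converts its share of the files. The sketch below is a minimal Hadoop Streaming-style mapper in Python using Pillow, not the PESMS implementation; the input convention (one image path per line on stdin), the output directory, and the JPEG target format are assumptions.

```python
# Hedged sketch of a map-only image transcoding task in the Hadoop
# Streaming style: stdin supplies one image path per line and each
# mapper converts its images to JPEG. Not the PESMS module itself.
import sys
from pathlib import Path
from PIL import Image

TARGET_DIR = Path("/converted")              # hypothetical output directory
TARGET_DIR.mkdir(parents=True, exist_ok=True)

for line in sys.stdin:
    src = Path(line.strip())
    if not src.is_file():
        continue
    out = TARGET_DIR / (src.stem + ".jpg")
    # Convert to RGB first so formats with alpha channels save cleanly.
    Image.open(src).convert("RGB").save(out, "JPEG", quality=90)
    print(f"{src}\t{out}")                   # emit (input, output) for bookkeeping
```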

    A proposed architecture of big educational data using hadoop at the University of Kufa

    Nowadays, educational data have grown rapidly because of the online services provided for both students and staff. The University of Kufa (UoK) generates a massive amount of data annually through its e-learning web-based systems, network servers, Windows applications, and Student Information System (SIS). These data are wasted because traditional management software cannot analyze them. The Big Educational Data concept has therefore arisen to help education sectors by providing new e-learning methods, meeting individual demands and learners' goals, and supporting student-teacher interaction. This paper focuses on designing a Big Data analysis architecture based on Hadoop for UoK, an approach that applies equally to other Iraqi universities. This work helps students learn, emphasizes the need for academic researchers and data science specialists to learn and practice Big Data analytics, supports analysis of the e-learning management system, and takes a first step toward developing a data repository and data policy at UoK.
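    The paper proposes an architecture rather than code. As a hedged sketch of one analysis such an architecture could serve, the snippet below loads e-learning activity logs from HDFS with PySpark and computes weekly active students per course; the HDFS path and the (student_id, course_id, event, ts) schema are assumptions, not UoK's actual data layout.

```python
# Hedged sketch: one analysis an educational Hadoop stack could serve.
# The path and the (student_id, course_id, event, ts) schema are
# assumptions; ts is assumed to be parsed as a timestamp.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("uok-elearning-activity").getOrCreate()

logs = spark.read.csv("hdfs:///uok/sis/activity_logs",
                      header=True, inferSchema=True)

# Weekly activity per course: a basic signal for spotting disengagement.
weekly = (logs.withColumn("week", F.weekofyear(F.col("ts")))
              .groupBy("course_id", "week")
              .agg(F.countDistinct("student_id").alias("active_students")))

weekly.orderBy("course_id", "week").show()
```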