
    Hadoop The Emerging Tool in the Present Scenario for Accessing the Large Sets of Data

    Hadoop is one of the tools designed to handle big data. Hadoop and other software products interpret or parse the results of big data searches through specific proprietary algorithms and methods. Hadoop is an open-source program under the Apache license that is maintained by a global community of users. It includes two main components: a MapReduce set of functions and the Hadoop Distributed File System (HDFS). The idea behind MapReduce is that Hadoop can first map a large data set and then perform a reduction on that content for specific results; a reduce function can be thought of as a kind of filter for raw data. The HDFS system then distributes data across a network or migrates it as necessary. The term Hadoop often refers not just to these base modules but also to the collection of additional software packages that can be installed on top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase, and Apache Spark. Prominent corporate users of Hadoop include Facebook and Yahoo. It can be deployed in traditional onsite data centers as well as in the cloud; for example, it is available on Microsoft Azure, Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), Google App Engine, and IBM Bluemix. In this paper, we identify and describe the major ways in which the Hadoop approach improves access to large data sets ("big data") to meet rapidly changing business environments. We also briefly compare Hadoop techniques with those of traditional systems and discuss the current state of Hadoop adoption. We argue that Hadoop has emerged as an alternative to traditional methods out of the need to serve customers in a timely manner. The purpose of this paper is to provide an in-depth understanding of the major benefits of the Hadoop approach to data access, together with a study of Hadoop's importance in the present scenario.
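    The map-then-reduce idea the abstract describes can be made concrete with a minimal sketch. The snippet below simulates Hadoop's map, shuffle/sort, and reduce stages locally for word counting; on a real cluster the two functions would run as distributed tasks over data blocks stored in HDFS, and the sample input here is purely illustrative.

```python
# Local simulation of Hadoop's map -> shuffle/sort -> reduce pipeline
# for word counting. On a real cluster, map and reduce run as parallel
# tasks (e.g., via Hadoop Streaming) over blocks stored in HDFS.
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: sum the counts per word -- a filter over the raw pairs."""
    shuffled = sorted(pairs, key=itemgetter(0))   # the "shuffle/sort" step
    for word, group in groupby(shuffled, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

if __name__ == "__main__":
    sample = ["big data needs big tools", "hadoop maps then reduces data"]
    for word, count in reduce_phase(map_phase(sample)):
        print(word, count)
```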

    Evaluation of Hadoop/Mapreduce Framework Migration Tools

    In distributed systems, database migration is not an easy task. Companies encounter challenges when moving data, including legacy data, to the big data platform. This paper reviews several tools for migrating from traditional databases to the big data platform and, based on that review, suggests a migration model.
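    The paper surveys migration tools rather than prescribing one. As a hedged illustration of the kind of transfer such tools automate, the sketch below copies one relational table into the Hadoop ecosystem using Spark's JDBC reader and writes it out as Parquet on HDFS; the connection URL, credentials, table name, and paths are all hypothetical, and this is not any specific tool from the paper.

```python
# Illustrative only: stage one relational table onto HDFS as Parquet.
# JDBC URL, credentials, table, and output path are placeholders, and
# the matching JDBC driver jar must be on Spark's classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("legacy-db-migration").getOrCreate()

legacy = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://legacy-host:3306/erp")  # hypothetical source
          .option("dbtable", "orders")
          .option("user", "migrator")
          .option("password", "secret")
          .load())

# Land the data on the big data platform in a columnar format.
legacy.write.mode("overwrite").parquet("hdfs:///warehouse/orders")
```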

    Performance Evaluation of Structured and Unstructured Data in PIG/HADOOP and MONGO-DB Environments

    The exponential growth of data initially presented difficulties for prominent organizations such as Google, Yahoo, Amazon, Microsoft, Facebook, and Twitter. The size of the information that cloud applications must handle is growing significantly faster than storage capacity, and this growth requires new systems for managing and analyzing data. The term Big Data refers to large volumes of unstructured (or semi-structured) and structured data created by applications, messages, weblogs, and social networks. Big Data is data whose size, variety, and uncertainty require new models, procedures, algorithms, and research to manage it and to extract value and hidden knowledge from it. To process more information efficiently, parallelism is used for analysis, and NoSQL databases have been introduced to handle unstructured and semi-structured information. Hadoop serves Big Data analysis requirements well: it is designed to scale from a single server up to a large cluster of machines with a high degree of fault tolerance. Many business and research institutions, such as Facebook, Yahoo, and Google, have a growing need to import, store, and analyze dynamic semi-structured data and its metadata. The significant growth of semi-structured data inside large web-based organizations has prompted the creation of NoSQL data stores for flexible storage and of MapReduce for scalable parallel analysis. These institutions have evaluated, used, and modified Hadoop, the most popular open-source implementation of MapReduce, to address the needs of various analytics problems; they also use MongoDB, a document-oriented NoSQL store. Even so, there is limited understanding of the performance trade-offs of these two technologies. This paper evaluates the performance, scalability, and fault tolerance of MongoDB and Hadoop, toward the goal of identifying the right software environment for scientific data analytics and research. Recently, a growing number of organizations have developed distinct kinds of non-relational databases (such as MongoDB, Cassandra, Hypertable, HBase/Hadoop, and CouchDB), generally referred to as NoSQL databases. The enormous amount of information generated requires an effective system to analyze the data in various scenarios and under various limits. The objective of this paper is to find the break-even point of Hadoop/Pig and MongoDB and to develop a robust environment for data analytics.
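    A break-even comparison of this kind amounts to timing the same query on both stacks across data sizes. The sketch below, which substitutes PySpark for the Pig scripts used in the paper, times one group-by count in MongoDB (via pymongo) and over the same data staged on HDFS; the connection string, database, collection, field names, and HDFS path are assumptions, and a real study would sweep many data volumes.

```python
# Hypothetical micro-benchmark: the same group-by count on both stacks.
# All names and paths are placeholders; PySpark stands in for the Pig
# scripts the paper actually benchmarks.
import time
from pymongo import MongoClient
from pyspark.sql import SparkSession

def time_mongo():
    coll = MongoClient("mongodb://localhost:27017")["logs"]["events"]
    start = time.perf_counter()
    list(coll.aggregate([{"$group": {"_id": "$event_type",
                                     "n": {"$sum": 1}}}]))
    return time.perf_counter() - start

def time_hadoop():
    spark = SparkSession.builder.appName("breakeven").getOrCreate()
    df = spark.read.json("hdfs:///logs/events")   # same data staged on HDFS
    start = time.perf_counter()
    df.groupBy("event_type").count().collect()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"MongoDB: {time_mongo():.2f}s  Hadoop/Spark: {time_hadoop():.2f}s")
```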

    PlantES: A plant electrophysiological multi-source data online analysis and sharing platform

    At present, plant electrophysiological data volumes and complexity are increasing rapidly, creating a demand for efficient big data management, data sharing among research groups, and fast analysis. In this paper, we propose PlantES (Plant Electrophysiological data Sharing), a distributed-computing-based prototype system that can be used to store, manage, visualize, analyze, and share plant electrophysiological data. We designed a storage schema that manages multi-source plant electrophysiological data by integrating the distributed storage systems HDFS and HBase, so that all kinds of files can be accessed efficiently. To improve online analysis efficiency, we propose and implement parallel computing algorithms on Spark, including a plant electrical signal extraction method, an adaptive derivative threshold algorithm, and a template matching algorithm. The experimental results indicate that Spark substantially improves online analysis efficiency. Online visualization and sharing of multiple types of data in the web browser are also implemented. Our prototype platform provides a solution for web-based sharing and analysis of multi-source plant electrophysiological data and improves the comprehension of plant electrical signals from a systemic perspective.
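    The paper does not publish its algorithms' code, but the parallelization pattern it describes can be sketched. The snippet below shows one way a derivative-threshold event extraction might be distributed with Spark, assuming each HDFS record holds one comma-separated voltage trace; the per-trace thresholding rule (mean plus three standard deviations of the derivative) and the data layout are assumptions, not the paper's adaptive algorithm.

```python
# Hedged sketch: parallel event extraction with a derivative threshold.
# Each input record is assumed to be a comma-separated voltage trace;
# the mean + 3*std rule is illustrative, not the algorithm from PlantES.
import numpy as np
from pyspark.sql import SparkSession

def extract_events(line):
    """Return sample indices where the signal's derivative spikes."""
    signal = np.array(line.split(","), dtype=float)
    deriv = np.diff(signal)
    threshold = deriv.mean() + 3 * deriv.std()   # per-trace adaptation
    return np.flatnonzero(np.abs(deriv) > threshold).tolist()

spark = SparkSession.builder.appName("plant-signal-extraction").getOrCreate()
traces = spark.sparkContext.textFile("hdfs:///plantes/traces")  # placeholder path
events_per_trace = traces.map(extract_events).collect()
```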

    A Study on Efficient Design of A Multimedia Conversion Module in PESMS for Social Media Services

    The main contribution of this paper is to present the Platform-as-a-Service (PaaS) Environment for Social Multimedia Service (PESMS), derived from the Social Media Cloud Computing Service Environment. The main role of our PESMS is to support the development of social networking services that include audio, image, and video formats. In this paper, we focus in particular on the design and implementation of PESMS, including a transcoding function for processing large amounts of social media in a parallel and distributed manner. PESMS is designed to improve the quality and speed of multimedia conversion by incorporating a Hadoop-based multimedia conversion module, consisting of the Hadoop Distributed File System for storing large quantities of social data and MapReduce for distributed parallel processing of these data. In this way, our PESMS has the prospect of dramatically reducing the time needed to transcode large numbers of image files into specific formats. To test system performance, we measured image transcoding time under a variety of experimental conditions. In experiments performed on a 28-node cluster, our system delivered excellent performance in the image transcoding function.
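    Distributed image transcoding of this kind is often expressed as a map-only job: each mapper independently converts its share of the files. The sketch below is a minimal Hadoop Streaming-style mapper in Python using Pillow, not the PESMS implementation; the input convention (one image path per line on stdin), the output directory, and the JPEG target format are assumptions.

```python
# Hedged sketch of a map-only image transcoding task in the Hadoop
# Streaming style: stdin supplies one image path per line and each
# mapper converts its images to JPEG. Not the PESMS module itself.
import sys
from pathlib import Path
from PIL import Image

TARGET_DIR = Path("/converted")              # hypothetical output directory
TARGET_DIR.mkdir(parents=True, exist_ok=True)

for line in sys.stdin:
    src = Path(line.strip())
    if not src.is_file():
        continue
    out = TARGET_DIR / (src.stem + ".jpg")
    # Convert to RGB first so formats with alpha channels save cleanly.
    Image.open(src).convert("RGB").save(out, "JPEG", quality=90)
    print(f"{src}\t{out}")                   # emit (input, output) for bookkeeping
```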

    A proposed architecture of big educational data using hadoop at the University of Kufa

    Nowadays, educational data have grown rapidly because of the online services provided for both students and staff. The University of Kufa (UoK) generates a massive amount of data annually through its e-learning web-based systems, network servers, Windows applications, and Student Information System (SIS). These data are wasted because traditional management software cannot analyze them. The Big Educational Data concept has therefore arisen to help education sectors by providing new e-learning methods, meeting individual demands and learners' goals, and supporting student-teacher interaction. This paper focuses on designing a Big Data analysis architecture based on Hadoop for UoK, an approach that applies equally to other Iraqi universities. This work helps students learn, emphasizes the need for academic researchers and data science specialists to learn and practice Big Data analytics, supports analysis of the e-learning management system, and takes a first step toward developing a data repository and data policy at UoK.
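    The paper proposes an architecture rather than code. As a hedged sketch of one analysis such an architecture could serve, the snippet below loads e-learning activity logs from HDFS with PySpark and computes weekly active students per course; the HDFS path and the (student_id, course_id, event, ts) schema are assumptions, not UoK's actual data layout.

```python
# Hedged sketch: one analysis an educational Hadoop stack could serve.
# The path and the (student_id, course_id, event, ts) schema are
# assumptions; ts is assumed to be parsed as a timestamp.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("uok-elearning-activity").getOrCreate()

logs = spark.read.csv("hdfs:///uok/sis/activity_logs",
                      header=True, inferSchema=True)

# Weekly activity per course: a basic signal for spotting disengagement.
weekly = (logs.withColumn("week", F.weekofyear(F.col("ts")))
              .groupBy("course_id", "week")
              .agg(F.countDistinct("student_id").alias("active_students")))

weekly.orderBy("course_id", "week").show()
```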