33 research outputs found

    QoS oriented MapReduce Optimization for Hadoop Based BigData Application

    International Journal of Engineering Research and Applications (IJERA) is an open-access, online, peer-reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nanotechnology and Science, Power Electronics, Electronics and Communication Engineering, Computational Mathematics, Image Processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing and Low-Power VLSI Design, etc.

    A High-Performance Data Accessing and Processing System for Campus Real-time Power Usage

    With the flourishing of Internet of Things (IoT) technology, ubiquitous power data can be linked to the Internet and analyzed to meet real-time monitoring requirements. Over time, the accumulated power data can grow to the terabyte level. To realize a real-time power-monitoring platform over such data, an efficient and novel set of implementation techniques has been developed and forms the core of this thesis. Built by integrating multiple software subsystems in a layered manner, the proposed power-monitoring platform is composed, from bottom to top, of Ubuntu (as the operating system), Hadoop (as the storage subsystem), Hive (as the data warehouse), and Spark MLlib (as the data analytics layer). The power-data source is the smart meters installed in the factories of an enterprise. Data collection and storage are handled by the Hadoop subsystem, and data ingestion into the Hive data warehouse is conducted by Spark. For system verification, the HiveQL and Impala SQL modules were compared in terms of query-response efficiency under single-record queries, and the same modules were also evaluated on full-table queries. The contributions of this work are twofold: the details of building an efficient real-time power-monitoring platform, and the measured query-response efficiency provided for reference.
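    As a rough illustration of the ingestion step described above, the following PySpark sketch loads smart-meter records and appends them to a Hive-managed table; the file path, table name, and column layout are hypothetical, not taken from the thesis:

        from pyspark.sql import SparkSession

        # Enable Hive support so DataFrames can be saved as Hive tables.
        spark = (SparkSession.builder
                 .appName("power-ingestion")
                 .enableHiveSupport()
                 .getOrCreate())

        # Hypothetical smart-meter export: timestamp, meter_id, kilowatts.
        readings = (spark.read
                    .option("header", True)
                    .option("inferSchema", True)
                    .csv("hdfs:///data/meters/readings.csv"))

        # Append the batch into a Hive table for later HiveQL/Impala queries.
        readings.write.mode("append").saveAsTable("power_readings")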

    Big Data Harmonization – Challenges and Applications

    As data grows, the need for big data solutions increases day by day. The concept of data harmonization has existed for two decades: data is collected from various heterogeneous sources, and harmonization techniques bring it into a single format in a single place, commonly called a data warehouse. Much progress has been made in analyzing historical data through data warehousing, and new innovations continually expose the challenges and problems that data warehousing faces. When the volume and variety of data grow exponentially, existing tools may no longer support OLAP operations under the traditional warehouse approach. In this paper we survey, category by category, the research being done in the field of big data warehousing, presenting research issues and proposed approaches for various kinds of datasets. The challenges and advantages of using a data warehouse before the data mining task are also explained in detail.
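    To make the harmonization idea concrete, here is a minimal Python sketch, with invented source schemas and field names, that maps two heterogeneous records into the single common format a warehouse load would expect:

        # Two heterogeneous sources describing the same kind of entity.
        source_a = {"cust_id": 17, "full_name": "Ada Lovelace", "spend_usd": 120.5}
        source_b = {"id": "B-42", "name": "Alan Turing", "spend_cents": 9900}

        def harmonize_a(rec):
            # Source A already uses dollars; only the keys differ.
            return {"id": str(rec["cust_id"]), "name": rec["full_name"],
                    "spend_usd": rec["spend_usd"]}

        def harmonize_b(rec):
            # Source B stores cents, so convert to the common unit.
            return {"id": rec["id"], "name": rec["name"],
                    "spend_usd": rec["spend_cents"] / 100}

        # A single, uniform schema ready for the warehouse.
        warehouse_rows = [harmonize_a(source_a), harmonize_b(source_b)]
        print(warehouse_rows)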

    A comparative analysis of the performance of the relational database and the Hadoop environment in the context of analytical data processing

    The article presents a detailed comparative analysis of the performance of a Microsoft SQL Server relational database and an Apache Hadoop environment in the context of analytical data processing. The study was carried out by executing more than a dozen research scenarios with different queries on datasets of varying sizes. For each research scenario, the average query execution time on different datasets was compared. Based on the results, it was found that the average execution time of queries from the presented scenarios is significantly shorter in MS SQL Server than in Apache Hadoop.
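    A study of this kind reduces to repeatedly timing the same query against each engine and averaging the results. The generic Python harness below is not the authors' code; run_query stands in for whatever driver call (pyodbc for MS SQL Server, PyHive for Hadoop/Hive, etc.) actually executes the SQL:

        import time

        def average_query_time(run_query, sql, repetitions=10):
            """Run `sql` several times via `run_query`; return the mean seconds."""
            elapsed = []
            for _ in range(repetitions):
                start = time.perf_counter()
                run_query(sql)  # engine-specific callable (pyodbc, PyHive, ...)
                elapsed.append(time.perf_counter() - start)
            return sum(elapsed) / len(elapsed)

        # Hypothetical usage: compare one scenario on both engines.
        # mssql_avg = average_query_time(mssql_cursor.execute, "SELECT COUNT(*) FROM sales")
        # hive_avg  = average_query_time(hive_cursor.execute,  "SELECT COUNT(*) FROM sales")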

    DNA BARCODING DENGAN ALGORITMA PARTICLE SWARM OPTIMIZATION MENGGUNAKAN APACHE SPARK SQL

    One stage of DNA barcoding, the similarity check, is still performed manually, which makes this stage both error-prone and time-consuming, and the DNA sequence data of living organisms is among the largest data in biology. This research therefore builds a computational model that obtains DNA barcodes quickly and effectively by implementing the particle swarm optimization algorithm on the big data platforms Apache Hadoop and Apache Spark. The data used in this study is SARS-CoV-2 RNA data. The program outputs the DNA barcodes found in the available samples together with the time needed to complete the calculation. Two experimental scenarios were run: the first uses 4 cores with a varying number of worker nodes, and the second uses a cluster with 2 worker nodes and a varying number of cores. The results show a significant speedup of the big data platform over the standalone setup and prove that the computational model built on the big data platform extends the features and accelerates the computation of previous research.
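    The abstract names particle swarm optimization as its core algorithm. As a reminder of how PSO works, here is a minimal, self-contained Python sketch of a generic PSO minimizing a toy fitness function; it is not the authors' Spark implementation:

        import random

        def pso(fitness, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
            """Minimize `fitness` over [-1, 1]^dim with basic particle swarm optimization."""
            pos = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
            vel = [[0.0] * dim for _ in range(n_particles)]
            pbest = [p[:] for p in pos]            # each particle's best position so far
            gbest = min(pbest, key=fitness)[:]     # swarm-wide best position
            for _ in range(iters):
                for i in range(n_particles):
                    for d in range(dim):
                        r1, r2 = random.random(), random.random()
                        vel[i][d] = (w * vel[i][d]
                                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                                     + c2 * r2 * (gbest[d] - pos[i][d]))
                        pos[i][d] += vel[i][d]
                    if fitness(pos[i]) < fitness(pbest[i]):
                        pbest[i] = pos[i][:]
                        if fitness(pbest[i]) < fitness(gbest):
                            gbest = pbest[i][:]
            return gbest

        # Toy usage: minimize the sphere function; in the paper the fitness would
        # instead score candidate barcode regions against DNA sequence data.
        best = pso(lambda x: sum(v * v for v in x), dim=3)
        print(best)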

    Efficient Storage Management over Cloud Using Data Compression without Losing Searching Capacity

    Nowadays, thanks to social media, people communicate with each other and share their thoughts and moments of life as texts, images, or videos. We upload our private data, such as photos, videos, and documents, to internet services like Facebook, WhatsApp, Google+, and YouTube. In short, today's world is surrounded by large volumes of data in different forms, which demands effective management of these billions of terabytes of electronic data, generally called BIG DATA. Handling large datasets is a major challenge for data centers. One answer is simply to add as many hard disks as required, but if the data is kept unformatted the disk requirement becomes very high. Cloud technology is becoming popular, yet efficient storage management for large volumes of data on the cloud remains an open question. Many frameworks address this problem; Hadoop is one of them. Hadoop provides an efficient way to store and retrieve large volumes of data, but because it stores data in large disk blocks it is efficient only when the individual files are large, which makes it inefficient where the total volume of data is large but each file is small. To store a large volume of data in less space, and to store small files without wasting space, the data must be kept not in its usual form but in compressed form, so that the block size can stay small. Doing so, however, adds another dimension to the problem: searching the content of a compressed file is very inefficient. We therefore need an efficient algorithm that compresses files without disturbing the search capacity of the data center. Here we show how these challenges can be solved. Keywords: Cloud, Big Data, Hadoop, Data Compression, MapReduce.
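    The core idea, compression that keeps data searchable, can be approximated by compressing records in small independent chunks so that a search decompresses only one chunk at a time. The Python sketch below illustrates the principle with zlib; the chunk size and record format are invented for illustration and this is not the paper's algorithm:

        import zlib

        CHUNK_RECORDS = 1000  # records per independently compressed chunk (illustrative)

        def compress_records(records):
            """Compress records in fixed-size chunks; each chunk is searchable on its own."""
            chunks = []
            for i in range(0, len(records), CHUNK_RECORDS):
                blob = "\n".join(records[i:i + CHUNK_RECORDS]).encode("utf-8")
                chunks.append(zlib.compress(blob))
            return chunks

        def search(chunks, needle):
            """Scan chunk by chunk, decompressing only the chunk currently inspected."""
            hits = []
            for chunk in chunks:
                for line in zlib.decompress(chunk).decode("utf-8").splitlines():
                    if needle in line:
                        hits.append(line)
            return hits

        # Toy usage: store log-like records compressed, then search them.
        data = [f"record-{n}" for n in range(5000)]
        stored = compress_records(data)
        print(search(stored, "record-4242"))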

    Big Data Management Challenges, Approaches, Tools and their limitations

    Big Data is the buzzword everyone talks about. Independently of the application domain, there is today a consensus about the V's characterizing Big Data: Volume, Variety, and Velocity. Focusing on data management issues and past experience in the area of database systems, this chapter examines the main challenges involved in the three V's of Big Data. It then reviews the main characteristics of existing solutions for addressing each of the V's (e.g., NoSQL, parallel RDBMS, stream data management systems, and complex event processing systems). Finally, it provides a classification of the different functions offered by NewSQL systems and discusses their benefits and limitations for processing Big Data.