149 research outputs found

    Increasing Efficiency of Recommendation System using Big Data Analysis

    In today's digital space, many Internet users respond to a problem by suggesting solutions that already exist on the Internet. This lowers the originality of posts, a problem that can be addressed by applying prediction models to the data sets. It is important for a user to contribute original ideas in order to gain upvotes, which in turn reflect the quality of a post. Because of the constant influx of data, big data analytics becomes essential, and an open-source framework such as Hadoop is therefore needed to increase the effectiveness of a recommender system built on these prediction models.
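    As a rough illustration of the kind of feature extraction such a prediction model might rely on, the following Java MapReduce sketch sums upvotes per post from a hypothetical CSV of (post_id, upvotes) records. The input layout, paths, and class names are assumptions for illustration, not taken from the article.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical feature-extraction job: total upvotes per post, which a
// downstream prediction model could use as a quality signal.
public class UpvoteAggregator {

    public static class UpvoteMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // Assumed input line: post_id,upvotes
            String[] fields = value.toString().split(",");
            if (fields.length == 2) {
                ctx.write(new Text(fields[0]),
                          new IntWritable(Integer.parseInt(fields[1].trim())));
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text postId, Iterable<IntWritable> votes, Context ctx)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable v : votes) total += v.get();
            ctx.write(postId, new IntWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "upvote-aggregation");
        job.setJarByClass(UpvoteAggregator.class);
        job.setMapperClass(UpvoteMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```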

    Elementary Concepts of Big Data and Hadoop

    This paper presents the basic concepts of Big Data and its importance to an organization from a performance point of view. The term Big Data refers to data sets whose volume, complexity, and rate of growth make them difficult to capture, manage, process, and analyze. For such data-intensive applications, the Apache Hadoop framework has recently attracted a lot of attention. Hadoop is a core platform for structuring Big Data and for making it useful for analytics. It is an open-source software project that enables the distributed processing of enormous data sets and provides a framework for the analysis and transformation of very large data sets using the MapReduce paradigm. This paper describes the architecture of Hadoop and its various components.
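    To make the storage component of that architecture concrete, here is a minimal Java sketch using the HDFS FileSystem API. The NameNode address and paths are hypothetical, and the sketch assumes a reachable cluster; it is only an illustration of the component described in the paper.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal HDFS client sketch: the NameNode tracks file metadata, while
// DataNodes hold the replicated blocks that MapReduce jobs later read.
public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; normally taken from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        // Copy a local file into the distributed file system.
        fs.copyFromLocalFile(new Path("/tmp/input.csv"),
                             new Path("/data/input.csv"));

        // List the directory and show block size / replication per file.
        for (FileStatus status : fs.listStatus(new Path("/data"))) {
            System.out.printf("%s  blockSize=%d  replication=%d%n",
                    status.getPath(), status.getBlockSize(),
                    status.getReplication());
        }
        fs.close();
    }
}
```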

    Hadoop-BAM: directly manipulating next generation sequencing data in the cloud

    Summary: Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can directly operate on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large-scale distributed processing. In this article we demonstrate the use of Hadoop-BAM by building a coverage-summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability and that one should avoid moving data in and out of Hadoop between analysis steps.
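    As a hedged sketch of what a map function operating directly on BAM records might look like, the following Java job counts aligned reads per reference sequence. It assumes the BAMInputFormat and SAMRecordWritable classes as exposed in later (org.seqdoop) Hadoop-BAM releases, so package names may differ from the version described in the article, and it is not the coverage tool the authors built.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer;
import org.seqdoop.hadoop_bam.BAMInputFormat;      // assumed package name
import org.seqdoop.hadoop_bam.SAMRecordWritable;   // wraps an htsjdk/Picard SAMRecord

// Counts aligned reads per reference sequence directly from BAM input splits.
public class AlignedReadCounter {

    public static class ReadMapper
            extends Mapper<LongWritable, SAMRecordWritable, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        @Override
        protected void map(LongWritable key, SAMRecordWritable value, Context ctx)
                throws IOException, InterruptedException {
            htsjdk.samtools.SAMRecord record = value.get();
            if (!record.getReadUnmappedFlag()) {
                ctx.write(new Text(record.getReferenceName()), ONE);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "aligned-read-count");
        job.setJarByClass(AlignedReadCounter.class);
        job.setInputFormatClass(BAMInputFormat.class);
        job.setMapperClass(ReadMapper.class);
        job.setReducerClass(LongSumReducer.class);   // sums the per-reference counts
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```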

    A High-Performance Data Accessing and Processing System for Campus Real-time Power Usage

    With the flourishing of Internet of Things (IoT) technology, ubiquitous power data can be linked to the Internet and analyzed to meet real-time monitoring requirements. The accumulated power data can grow to the terabyte level over time. To realize a real-time power-monitoring platform on such data, an efficient and novel implementation technique has been developed and forms the core of this thesis. Based on the integration of multiple software subsystems in a layered manner, the proposed power-monitoring platform is composed of, from bottom to top, Ubuntu (operating system), Hadoop (storage subsystem), Hive (data warehouse), and Spark MLlib (data analytics). The power data are provided by smart meters installed in the factories of an enterprise. Data collection and storage are handled by the Hadoop subsystem, and data ingestion into the Hive data warehouse is performed by Spark. For system verification, the HiveQL and Impala SQL modules were tested for query-response efficiency under single-record queries, and the same modules were also evaluated on full-table queries. The main contributions of this work are the details of building an efficient real-time power-monitoring platform and the resulting query-response measurements for reference.
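    A minimal sketch of that ingestion path, assuming the Java Spark API with Hive support enabled; the table name, column names, and CSV layout are hypothetical and not taken from the thesis.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

// Ingest smart-meter readings into a Hive table and run a single-record query.
public class PowerIngest {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("power-ingest")
                .enableHiveSupport()          // lets Spark write to the Hive warehouse
                .getOrCreate();

        // Hypothetical CSV layout: meter_id, timestamp, kwh
        Dataset<Row> readings = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/smart_meters/*.csv");

        // Append the batch into the Hive data warehouse.
        readings.write().mode(SaveMode.Append).saveAsTable("power_usage");

        // Example single-record query of the kind used in the response-time tests.
        spark.sql("SELECT kwh FROM power_usage "
                + "WHERE meter_id = 'M-0001' AND timestamp = '2021-01-01 00:00:00'")
             .show();

        spark.stop();
    }
}
```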

    Performansi Response Time Query Pada Hadoop-Hive Menggunakan Metode Partition

    Hive replaces traditional RDBMS processing techniques that cannot be used on big data. However, in its default configuration Hive scans the data exhaustively when executing a query. The partition method can group the data, so experiments were carried out to determine whether grouping the data improves query response-time performance or not. In this study, a multi-node Hadoop cluster infrastructure was built using virtual machines. The dataset used is the MovieLens dataset with attribute cardinalities of 5, 50, and 100; each dataset consists of 15 million records. Based on the results, the partition method not only groups the data but also yields query response times that are 30.8% faster than the default configuration. In addition, the partition method with cardinality 100 performs better than the two smaller cardinalities, 5 and 50.
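    A minimal JDBC sketch of the partition method under test, assuming a reachable HiveServer2 instance; the host, table, and column names are hypothetical, and the partition column merely stands in for the MovieLens attribute whose cardinality is varied in the experiments.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Creates a partitioned Hive table and queries a single partition, which
// limits the scan to that partition's directory instead of the full table.
public class HivePartitionDemo {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/default", "", "");
             Statement stmt = conn.createStatement()) {

            stmt.execute("CREATE TABLE IF NOT EXISTS ratings_part ("
                       + " user_id INT, movie_id INT, rating DOUBLE)"
                       + " PARTITIONED BY (genre STRING)");

            // Populate one partition from an unpartitioned staging table (assumed to exist).
            stmt.execute("INSERT OVERWRITE TABLE ratings_part PARTITION (genre='Comedy')"
                       + " SELECT user_id, movie_id, rating FROM ratings_raw"
                       + " WHERE genre = 'Comedy'");

            // Query that touches only the Comedy partition.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT AVG(rating) FROM ratings_part WHERE genre = 'Comedy'")) {
                while (rs.next()) {
                    System.out.println("avg rating = " + rs.getDouble(1));
                }
            }
        }
    }
}
```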

    Query Recommender System Using Hierarchical Classification

    In data warehouses, large amounts of data are gathered, navigated, and explored for analytical purposes. Even for expert users, handling such large data is a tough task, and it is more difficult still for non-expert users or users who are not familiar with the database schema. The aim of this paper is to help this class of users by recommending SQL queries that they might use. The recommendations are selected by tracking a user's past behavior and comparing it with that of other users. First, users may not know where to start their exploration; second, they may overlook queries that would retrieve important information. The recorded queries are compared using hierarchical classification and then re-ranked according to relevance, which is derived from the user's querying behavior. Users issue a series of SQL queries through a query interface in order to analyze the data and mine it for interesting information. DOI: 10.17762/ijritcc2321-8169.15067
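    The paper's hierarchical classification is not reproduced here, but the following Java sketch illustrates the general idea of scoring candidate queries from other users' logs against the current user's session, using a simple Jaccard similarity over query fragments (tables and columns). The class names, the fragment extraction, and the choice of similarity measure are assumptions for illustration only.

```java
import java.util.*;
import java.util.stream.Collectors;

// Ranks candidate SQL queries by how much their fragments (tables, columns)
// overlap with the fragments seen in the current user's session.
public class QueryRecommender {

    // Crude fragment extraction: lower-case identifiers, ignoring common SQL keywords.
    private static final Set<String> KEYWORDS = Set.of(
            "select", "from", "where", "group", "by", "order", "and", "or", "join", "on");

    static Set<String> fragments(String sql) {
        return Arrays.stream(sql.toLowerCase().split("[^a-z0-9_]+"))
                .filter(t -> !t.isEmpty() && !KEYWORDS.contains(t))
                .collect(Collectors.toSet());
    }

    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a); inter.retainAll(b);
        Set<String> union = new HashSet<>(a); union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    // Returns candidate queries ordered by similarity to the session, most relevant first.
    static List<String> recommend(List<String> session, List<String> candidates) {
        Set<String> profile = new HashSet<>();
        session.forEach(q -> profile.addAll(fragments(q)));
        return candidates.stream()
                .sorted(Comparator.comparingDouble(
                        (String q) -> jaccard(profile, fragments(q))).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> session = List.of("SELECT region, SUM(sales) FROM orders GROUP BY region");
        List<String> candidates = List.of(
                "SELECT region, AVG(sales) FROM orders GROUP BY region",
                "SELECT name FROM employees");
        recommend(session, candidates).forEach(System.out::println);
    }
}
```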