
    Data Transfers in Hadoop: A Comparative Study

    Hadoop is an open-source framework for processing large amounts of data in a distributed computing environment, and it plays an important role in processing and analyzing Big Data. The framework stores data on large clusters of commodity hardware. Moving data into and out of Hadoop is an indispensable step in any data processing job, and many tools have evolved for importing and exporting data. This article surveys some commonly used tools for importing and exporting data and presents a state-of-the-art comparative study among them. The study indicates when to prefer one tool over another, with emphasis on data transfer to and from the Hadoop system. The article also discusses how Hadoop handles backup and disaster recovery, along with some open research questions concerning Big Data transfer when dealing with cloud-based services.
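    The abstract does not name the specific import/export tools that are compared. As a minimal, illustrative sketch of the underlying operation (pushing a local file into HDFS), the Python snippet below uses Hadoop's WebHDFS REST API; the namenode host, port, and target path are assumptions for illustration, not values taken from the article.

        import requests

        # Assumed cluster details; not taken from the article.
        NAMENODE = "http://namenode.example.org:9870"   # WebHDFS endpoint (Hadoop 3.x default port)
        HDFS_PATH = "/user/demo/input/records.csv"      # hypothetical target path

        def put_file_to_hdfs(local_path: str, hdfs_path: str) -> None:
            """Upload a local file to HDFS using the two-step WebHDFS CREATE operation."""
            # Step 1: ask the namenode where to write; it replies with a 307
            # redirect whose Location header points at a datanode.
            create_url = f"{NAMENODE}/webhdfs/v1{hdfs_path}?op=CREATE&overwrite=true"
            r = requests.put(create_url, allow_redirects=False)
            r.raise_for_status()
            datanode_url = r.headers["Location"]

            # Step 2: stream the file contents to the datanode returned in step 1.
            with open(local_path, "rb") as f:
                r = requests.put(datanode_url, data=f)
            r.raise_for_status()

        if __name__ == "__main__":
            put_file_to_hdfs("records.csv", HDFS_PATH)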

    A Goal Driven Framework for Service Discovery in Service-Oriented Architecture: A Multiagent Based Approach

    Automated service discovery is one of the most important features of any Semantic Web Service (SWS) based framework. Achieving this functionality in an e-resource sharing system is not an easy task because of the size and heterogeneity of the available resources, and even an efficient automated service discovery remains worthless unless the discovered services fulfill the goal(s) demanded by the user or the client program. In this paper we propose a goal-driven approach to automated service discovery that uses an Agent Swarm in an innovative way. A novel multi-agent architecture for service discovery is introduced, the communication among the agents in the service-oriented framework is illustrated, and finally a pictorial view of the agents running in the system is shown.
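    The abstract describes the multi-agent architecture only at a high level. As a purely illustrative toy sketch (not the paper's design), the Python code below shows the basic idea of several agents matching advertised service capabilities against a user goal and pooling their results; all class and field names are hypothetical.

        from dataclasses import dataclass, field

        # All names below are hypothetical illustrations, not the paper's architecture.

        @dataclass
        class ServiceDescription:
            name: str
            capabilities: set          # e.g. {"search", "full-text", "pdf"}

        @dataclass
        class DiscoveryAgent:
            """A toy agent that knows about a subset of the services in the registry."""
            known_services: list = field(default_factory=list)

            def discover(self, goal: set) -> list:
                # Return only those services whose capabilities cover the whole goal.
                return [s for s in self.known_services if goal <= s.capabilities]

        def swarm_discover(agents: list, goal: set) -> list:
            """Combine the partial views of several agents into one result set."""
            found = {}
            for agent in agents:
                for service in agent.discover(goal):
                    found[service.name] = service
            return list(found.values())

        if __name__ == "__main__":
            a1 = DiscoveryAgent([ServiceDescription("CatalogueSearch", {"search", "metadata"})])
            a2 = DiscoveryAgent([ServiceDescription("FullTextRepo", {"search", "full-text", "pdf"})])
            print([s.name for s in swarm_discover([a1, a2], goal={"search", "full-text"})])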

    Efficiently Processing and Storing Library Linked Data using Apache Spark and Parquet

    Resource Description Framework (RDF) is a commonly used data model in the Semantic Web environment. Libraries and various other communities have been using the RDF data model to store valuable data after extracting it from traditional storage systems. However, because of the large volume of the data, processing and storing it is becoming a nightmare for traditional data-management tools, a challenge that demands a scalable, distributed system able to manage the data in parallel. In this article, a distributed solution is proposed for efficiently processing and storing the large volume of library linked data held in traditional storage systems. Apache Spark is used for parallel processing of the large data sets, and a column-oriented schema is proposed for storing the RDF data. The storage system is built on top of the Hadoop Distributed File System (HDFS) and uses the Apache Parquet format to store the data in compressed form. Experimental evaluation showed that storage requirements were reduced significantly compared with Jena TDB, Sesame, RDF/XML, and N-Triples file formats. SPARQL queries over the compressed data are processed with Spark SQL, and the evaluation showed good query response times that drop significantly as the number of worker nodes increases.
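    The abstract outlines the pipeline (Spark for parallel processing, a column-oriented Parquet layout on HDFS, Spark SQL for query evaluation) without giving the exact schema. The PySpark sketch below illustrates one possible reading of that pipeline: N-Triples lines are parsed into a simple subject/predicate/object table, written as Parquet, and queried with Spark SQL. The file paths, column names, and flat triple-table layout are assumptions, not the paper's actual column-oriented schema.

        from pyspark.sql import SparkSession
        from pyspark.sql.functions import regexp_extract

        spark = SparkSession.builder.appName("rdf-parquet-sketch").getOrCreate()

        # Assumed input/output locations; replace with real HDFS paths.
        NTRIPLES_PATH = "hdfs:///data/library.nt"
        PARQUET_PATH = "hdfs:///data/library_triples.parquet"

        # Very rough N-Triples line pattern: <subject> <predicate> object .
        # (A real loader would use a proper RDF parser; this is only a sketch.)
        pattern = r"^(\S+)\s+(\S+)\s+(.+)\s*\.\s*$"

        lines = spark.read.text(NTRIPLES_PATH)
        triples = lines.select(
            regexp_extract("value", pattern, 1).alias("subject"),
            regexp_extract("value", pattern, 2).alias("predicate"),
            regexp_extract("value", pattern, 3).alias("object"),
        )

        # Store the triples in compressed, column-oriented Parquet files on HDFS.
        triples.write.mode("overwrite").parquet(PARQUET_PATH)

        # Query the compressed data with Spark SQL, analogous to a simple SPARQL
        # basic graph pattern such as:  SELECT ?s ?o WHERE { ?s dc:title ?o }
        spark.read.parquet(PARQUET_PATH).createOrReplaceTempView("triples")
        titles = spark.sql("""
            SELECT subject, object
            FROM triples
            WHERE predicate = '<http://purl.org/dc/elements/1.1/title>'
        """)
        titles.show(truncate=False)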