
    Data Transfers in Hadoop: A Comparative Study

    Hadoop is an open-source framework for processing large amounts of data in a distributed computing environment, and it plays an important role in processing and analyzing Big Data. The framework stores data on large clusters of commodity hardware. Moving data into and out of Hadoop is an indispensable step in any data processing job, and many tools have evolved for importing and exporting data. This article surveys some commonly used tools for importing and exporting data and presents a state-of-the-art comparative study among them. The study indicates when to prefer one tool over another, with emphasis on data transfer to and from the Hadoop system. The article also discusses how Hadoop handles backup and disaster recovery, along with some open research questions concerning Big Data transfer when dealing with cloud-based services.
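    The abstract does not name the specific import/export tools that are compared. As a minimal, illustrative sketch of the underlying operation (pushing a local file into HDFS), the Python snippet below uses Hadoop's WebHDFS REST API; the namenode host, port, and target path are assumptions for illustration, not values taken from the article.

        import requests

        # Assumed cluster details; not taken from the article.
        NAMENODE = "http://namenode.example.org:9870"   # WebHDFS endpoint (Hadoop 3.x default port)
        HDFS_PATH = "/user/demo/input/records.csv"      # hypothetical target path

        def put_file_to_hdfs(local_path: str, hdfs_path: str) -> None:
            """Upload a local file to HDFS using the two-step WebHDFS CREATE operation."""
            # Step 1: ask the namenode where to write; it replies with a 307
            # redirect whose Location header points at a datanode.
            create_url = f"{NAMENODE}/webhdfs/v1{hdfs_path}?op=CREATE&overwrite=true"
            r = requests.put(create_url, allow_redirects=False)
            r.raise_for_status()
            datanode_url = r.headers["Location"]

            # Step 2: stream the file contents to the datanode returned in step 1.
            with open(local_path, "rb") as f:
                r = requests.put(datanode_url, data=f)
            r.raise_for_status()

        if __name__ == "__main__":
            put_file_to_hdfs("records.csv", HDFS_PATH)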

    A Goal Driven Framework for Service Discovery in Service-Oriented Architecture: A Multiagent Based Approach

    Automated service discovery is one of the most important features of any Semantic Web Service (SWS) based framework. Achieving this functionality in an e-resource sharing system is not an easy task because of the size and heterogeneity of the available resources, and even an efficient automated service discovery remains worthless unless the discovered services fulfill the goal(s) demanded by the user or the client program. In this paper we propose a goal-driven approach to automated service discovery that uses an Agent Swarm in an innovative way. A novel multi-agent architecture for service discovery is introduced, the communication among the agents in the service-oriented framework is illustrated, and finally a pictorial view of the agents running in the system is shown.
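    The abstract describes the multi-agent architecture only at a high level. As a purely illustrative toy sketch (not the paper's design), the Python code below shows the basic idea of several agents matching advertised service capabilities against a user goal and pooling their results; all class and field names are hypothetical.

        from dataclasses import dataclass, field

        # All names below are hypothetical illustrations, not the paper's architecture.

        @dataclass
        class ServiceDescription:
            name: str
            capabilities: set          # e.g. {"search", "full-text", "pdf"}

        @dataclass
        class DiscoveryAgent:
            """A toy agent that knows about a subset of the services in the registry."""
            known_services: list = field(default_factory=list)

            def discover(self, goal: set) -> list:
                # Return only those services whose capabilities cover the whole goal.
                return [s for s in self.known_services if goal <= s.capabilities]

        def swarm_discover(agents: list, goal: set) -> list:
            """Combine the partial views of several agents into one result set."""
            found = {}
            for agent in agents:
                for service in agent.discover(goal):
                    found[service.name] = service
            return list(found.values())

        if __name__ == "__main__":
            a1 = DiscoveryAgent([ServiceDescription("CatalogueSearch", {"search", "metadata"})])
            a2 = DiscoveryAgent([ServiceDescription("FullTextRepo", {"search", "full-text", "pdf"})])
            print([s.name for s in swarm_discover([a1, a2], goal={"search", "full-text"})])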

    Efficiently Processing and Storing Library Linked Data using Apache Spark and Parquet

    Resource Description Framework (RDF) is a commonly used data model in the Semantic Web environment. Libraries and various other communities have been using the RDF data model to store valuable data after extracting it from traditional storage systems. However, because of the large volume of the data, processing and storing it is becoming a nightmare for traditional data-management tools, a challenge that demands a scalable, distributed system able to manage the data in parallel. In this article, a distributed solution is proposed for efficiently processing and storing the large volume of library linked data held in traditional storage systems. Apache Spark is used for parallel processing of the large data sets, and a column-oriented schema is proposed for storing the RDF data. The storage system is built on top of the Hadoop Distributed File System (HDFS) and uses the Apache Parquet format to store the data in compressed form. Experimental evaluation showed that storage requirements were reduced significantly compared with Jena TDB, Sesame, RDF/XML, and N-Triples file formats. SPARQL queries over the compressed data are processed with Spark SQL, and the evaluation showed good query response times that drop significantly as the number of worker nodes increases.
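    The abstract outlines the pipeline (Spark for parallel processing, a column-oriented Parquet layout on HDFS, Spark SQL for query evaluation) without giving the exact schema. The PySpark sketch below illustrates one possible reading of that pipeline: N-Triples lines are parsed into a simple subject/predicate/object table, written as Parquet, and queried with Spark SQL. The file paths, column names, and flat triple-table layout are assumptions, not the paper's actual column-oriented schema.

        from pyspark.sql import SparkSession
        from pyspark.sql.functions import regexp_extract

        spark = SparkSession.builder.appName("rdf-parquet-sketch").getOrCreate()

        # Assumed input/output locations; replace with real HDFS paths.
        NTRIPLES_PATH = "hdfs:///data/library.nt"
        PARQUET_PATH = "hdfs:///data/library_triples.parquet"

        # Very rough N-Triples line pattern: <subject> <predicate> object .
        # (A real loader would use a proper RDF parser; this is only a sketch.)
        pattern = r"^(\S+)\s+(\S+)\s+(.+)\s*\.\s*$"

        lines = spark.read.text(NTRIPLES_PATH)
        triples = lines.select(
            regexp_extract("value", pattern, 1).alias("subject"),
            regexp_extract("value", pattern, 2).alias("predicate"),
            regexp_extract("value", pattern, 3).alias("object"),
        )

        # Store the triples in compressed, column-oriented Parquet files on HDFS.
        triples.write.mode("overwrite").parquet(PARQUET_PATH)

        # Query the compressed data with Spark SQL, analogous to a simple SPARQL
        # basic graph pattern such as:  SELECT ?s ?o WHERE { ?s dc:title ?o }
        spark.read.parquet(PARQUET_PATH).createOrReplaceTempView("triples")
        titles = spark.sql("""
            SELECT subject, object
            FROM triples
            WHERE predicate = '<http://purl.org/dc/elements/1.1/title>'
        """)
        titles.show(truncate=False)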