A Community-based Cloud Computing Caching Service

Author: Idachaba Unekwu
Wang Frank Z.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2015

Caching has become an important technology in the development of cloud computing-based high-performance web services. Caches reduce the request-to-response latency experienced by users and reduce the workload on backend databases. To be fit for purpose, they need a high cache-hit rate, and this rate depends on the cache management policy used. Existing cache management policies are not designed to prevent cache pollution or cache monopoly, both of which lower the cache-hit rate. This paper proposes a community-based caching approach (CC) to address these two problems. CC was evaluated against thirteen commercially available cache management policies, and the results show that the cache-hit rate achieved by CC was between 0.7% and 55% higher than that of the alternative policies.
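The abstract judges every policy by its cache-hit rate. The CC algorithm itself is not specified here, so the following is only a minimal sketch of how a hit rate can be measured by replaying a request trace through a plain LRU policy. LRU is one common baseline, not CC, and the function name, trace, and capacity are illustrative assumptions.

```python
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Replay a request trace through an LRU cache and return the hit rate."""
    cache = OrderedDict()  # keys only; insertion order tracks recency (oldest first)
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = None  # admit every missed key
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(requests) if requests else 0.0


# Hypothetical trace: "d" is requested once, yet still forces an eviction.
trace = ["a", "b", "a", "c", "a", "d", "b", "a"]
print(lru_hit_rate(trace, capacity=2))  # 0.25
```

Because LRU admits every missed key, a burst of single-use keys can evict entries that are about to be requested again. This is one way cache pollution lowers the hit rate.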

Crossref

Kent Academic Repository

BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

Author: Barbosa Helio J. C.
Foster Ian
Gadelha Jr Luiz M. R.
Katz Daniel S.
Loss Guilherme
Magalhães Thiago
Mattoso Marta
Mondelli Maria Luiza
Ocaña Kary
Vasconcelos Ana Tereza R.
Wilde Michael
Publication venue: 'PeerJ'
Publication date: 11/01/2018

Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives by using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process.
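The abstract says BioWorkbench's web application abstracts a set of queries over a provenance database. That schema is not given here, so the sketch below assumes a hypothetical `task` table in an in-memory SQLite database, filled with toy timings for made-up workflows `wf_a` and `wf_b`. It shows the kind of performance query such an application might wrap: total execution time per activity for one workflow.

```python
import sqlite3


def slowest_activities(conn, workflow):
    """Return (activity, total_seconds) pairs for one workflow, slowest first."""
    query = """
        SELECT activity, SUM(end_time - start_time) AS total_seconds
        FROM task
        WHERE workflow = ?
        GROUP BY activity
        ORDER BY total_seconds DESC
    """
    return conn.execute(query, (workflow,)).fetchall()


# Hypothetical provenance table: one row per executed task.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (id INTEGER PRIMARY KEY, workflow TEXT, activity TEXT, "
    "start_time REAL, end_time REAL)"
)
conn.executemany(
    "INSERT INTO task (workflow, activity, start_time, end_time) VALUES (?, ?, ?, ?)",
    [
        ("wf_a", "align", 0.0, 40.0),  # toy timings, not measured data
        ("wf_a", "align", 0.0, 35.0),
        ("wf_a", "build_tree", 40.0, 100.0),
        ("wf_b", "compare", 0.0, 10.0),
    ],
)
print(slowest_activities(conn, "wf_a"))  # [('align', 75.0), ('build_tree', 60.0)]
```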

arXiv.org e-Print Archive

Directory of Open Access Journals

A Distributed Approach to Crawl Domain Specific Hidden Web

Author: Desai Lovekeshkumar
Publication venue: ScholarWorks @ Georgia State University
Publication date: 03/08/2007

A large amount of online information resides on the invisible web: web pages generated dynamically from databases and other data sources, hidden from current crawlers, which retrieve content only from the publicly indexable Web. In particular, these crawlers ignore the tremendous amount of high-quality content hidden behind search forms and pages that require authorization or prior registration in large searchable electronic databases. To extract data from the hidden web, it is necessary to find the search forms and fill them in with appropriate information so as to retrieve the maximum amount of relevant information. Searching the hidden web requires extensive analysis of both the search forms and the retrieved information; to meet these challenges, it becomes essential to design and implement a distributed web crawler that runs on a network of workstations to extract data from the hidden web. We describe the software architecture of this distributed, scalable system and present a number of novel techniques used in its design and implementation to extract the maximum amount of relevant data from the hidden web with high performance.
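The first step the abstract describes, finding search forms so they can be filled in, can be sketched with Python's standard `html.parser` module. This is a minimal sketch, not the crawler's actual technique. The rule that a form with at least one text or search input counts as a candidate search form is an assumption, as are the helper names, and only GET submissions are handled.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class SearchFormFinder(HTMLParser):
    """Collect each <form>'s action, method, and fillable text-field names."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "text_fields": [],
            }
        elif tag == "input" and self._current is not None:
            # Assumed heuristic: text-like inputs are the fields a crawler fills.
            input_type = (attrs.get("type") or "text").lower()
            if input_type in ("text", "search") and attrs.get("name"):
                self._current["text_fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def find_search_forms(html):
    """Return only the forms that expose at least one fillable text field."""
    finder = SearchFormFinder()
    finder.feed(html)
    finder.close()
    return [form for form in finder.forms if form["text_fields"]]


def build_query_url(form, base_url, term):
    """Fill every text field with `term` and build the GET request URL."""
    if form["method"] != "get":
        raise ValueError("this sketch only builds GET query URLs")
    params = {name: term for name in form["text_fields"]}
    return urljoin(base_url, form["action"]) + "?" + urlencode(params)


# Hypothetical page: a search form and a login form (the latter is skipped).
page = (
    '<form action="/search" method="GET"><input type="text" name="q">'
    '<input type="submit" value="Go"></form>'
    '<form action="/login" method="post"><input type="password" name="pw"></form>'
)
for form in find_search_forms(page):
    print(build_query_url(form, "http://example.com/index.html", "gene therapy"))
# http://example.com/search?q=gene+therapy
```

In a distributed setting, each worker could apply a function like this to the pages it fetches; how the work is divided among workstations is not described in the abstract.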

ScholarWorks @ Georgia State University

Automatic Migration of Data to NoSQL Databases Using Service Oriented Architecture

Author: Koshy Rohan
Publication date: 01/01/2015

For the past few years there has been an exponential rise in the use of databases that are not true relational databases. There is no formal definition of such databases; they can only be described by a set of common characteristics, such as the absence of a fixed schema, inherent scalability features, high performance, data, etc. These databases have come to be known as NoSQL databases. Various companies see the advantages of NoSQL and want to migrate to these databases, but they find it difficult to migrate their data because a great deal of study and analysis is required: each type of database has its own terminology and query language. We propose a novel automated migration model that uses service-oriented architecture to help these companies easily migrate to the NoSQL databases of their choice. We use web services that encapsulate a few of the most popular NoSQL databases, such as MongoDB, Neo4j, and Cassandra, so that the inner details of these databases are hidden while the data is still migrated efficiently, with little or no knowledge of how these databases work internally. As a proof of concept, relational data was migrated successfully from an Apache Derby database to MongoDB, Cassandra, Neo4j, and DynamoDB, each representing a different type of NoSQL database.
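The core transformation behind such a migration, reshaping relational rows into documents, can be sketched without any NoSQL driver. The sketch below is an illustration under stated assumptions, not the paper's web-service model: SQLite stands in for Apache Derby, the `customer` and `orders` tables are hypothetical, and the output is a list of MongoDB-style dictionaries (with `_id` keys) rather than an actual MongoDB insert.

```python
import json
import sqlite3


def rows_to_documents(conn):
    """Turn customer rows and their orders into nested, document-style dicts."""
    conn.row_factory = sqlite3.Row  # access columns by name
    docs = []
    for cust in conn.execute("SELECT id, name FROM customer ORDER BY id"):
        orders = conn.execute(
            "SELECT id, total FROM orders WHERE customer_id = ? ORDER BY id",
            (cust["id"],),
        ).fetchall()
        docs.append({
            "_id": cust["id"],
            "name": cust["name"],
            # The one-to-many relation is embedded as a list, not a foreign key.
            "orders": [{"order_id": o["id"], "total": o["total"]} for o in orders],
        })
    return docs


# Hypothetical relational source schema and rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        total REAL
    );
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Lin');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.5);
""")
documents = rows_to_documents(conn)
print(json.dumps(documents, indent=2))
```

Embedding child rows inside the parent document is one common document-model choice. A graph target such as Neo4j would more naturally turn the foreign key into a relationship, which is why each NoSQL type needs its own mapping.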

ethesis@nitr

A Balanced Memory-Based Collaborative Filtering Similarity Measure

Author: Arroyo Castillo Angel
Bobadilla Sancho Jesus
Hernando Esteban Antonio
Ortega Requena Fernando
Publication venue: 'Wiley'
Publication date: 01/01/2012

Collaborative filtering recommender systems help alleviate the information overload that exists on the Internet as a result of the mass use of Web 2.0 applications. The use of an adequate similarity measure is a determining factor in the quality of the recommender system's prediction and recommendation results, as well as in its performance. In this paper, we present a memory-based collaborative filtering similarity measure that provides extremely high-quality and balanced results; these results are complemented by a low processing time (high performance), similar to that required by traditional similarity metrics. The experiments were carried out on the MovieLens and Netflix databases, using a representative set of information retrieval quality measures.
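The abstract does not give the proposed similarity formula, so the sketch below instead shows one traditional metric of the kind it refers to: Pearson correlation over co-rated items, used in a memory-based (user-based) rating prediction. The helper names, the neighborhood size `k`, the toy ratings, and the choice to compute means over co-rated items only are assumptions.

```python
from math import sqrt


def pearson_similarity(ratings_u, ratings_v):
    """Pearson correlation between two users over the items both have rated."""
    common = ratings_u.keys() & ratings_v.keys()
    if len(common) < 2:
        return 0.0  # assumption: too little overlap counts as no similarity
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # a constant rater has undefined correlation
    return num / (den_u * den_v)


def predict_rating(user, item, ratings, k=2):
    """Similarity-weighted average of the k most similar users' ratings."""
    scored = [
        (pearson_similarity(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    # Keep the k nearest neighbours, then only positively correlated ones.
    top = [(s, r) for s, r in sorted(scored, reverse=True)[:k] if s > 0]
    total = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / total if total else None


# Hypothetical toy ratings on a 1-5 scale.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 3, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m3": 3, "m4": 1},
}
print(round(predict_rating("alice", "m4", ratings), 3))  # 5.0
```

Every prediction recomputes similarities against the stored ratings, which is why the processing time of the similarity measure matters for memory-based systems.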

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Following the Mobile Student: Can We Develop the Capacity for a Comprehensive Database to Assess Student Progression?

Author: Karen Paulson
Paula R. Schild
Peter T. Ewell
Publication venue: Lumina Foundation
Publication date: 04/04/2003

Presents a study of state-level databases on postsecondary student retention and completion rates and of the feasibility of tracking students across state lines. Outlines challenges and recommendations, including establishing a common reporting standard.

IssueLab

Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

Author: Alam Mansaf
Ali Syed Arshad
Khan Samiya
Liu Xiufeng
Publication date: 01/01/2019

Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, the fundamental decisions in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies to optimize the solution to a specific real-world problem, big data systems are no exception. As far as the storage aspect of any big data system is concerned, the primary facet is the storage infrastructure, and NoSQL seems to be the right technology to fulfill its requirements. However, every big data application has its own data characteristics, and thus the corresponding data fits a different data model. This paper presents a feature and use-case analysis and comparison of the four main data models, namely document-oriented, key-value, graph, and wide-column. Moreover, a feature analysis of 80 NoSQL solutions is provided, elaborating on the criteria and points that a developer must consider when making a choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings, and possible use cases of the available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.
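To make the four data models concrete, the sketch below expresses one hypothetical fact (user 42 bought two units of product P-7) in each model using plain Python structures. It illustrates the shapes only and does not use any NoSQL product's API; all keys, labels, and the `neighbours` helper are invented for illustration.

```python
# Key-value: an opaque value that can only be fetched by its key.
key_value = {
    "user:42": '{"name": "Ada", "city": "Oslo", "purchases": [["P-7", 2]]}',
}

# Document-oriented: a self-describing nested record, queryable by field.
document = {
    "_id": 42,
    "name": "Ada",
    "city": "Oslo",
    "purchases": [{"sku": "P-7", "qty": 2}],
}

# Wide-column: row key -> column family -> column -> value; columns vary per row.
wide_column = {
    "42": {
        "profile": {"name": "Ada", "city": "Oslo"},
        "purchases": {"P-7": 2},
    },
}

# Graph: nodes plus typed, directed edges that carry their own properties.
graph = {
    "nodes": {
        "u42": {"label": "User", "name": "Ada"},
        "p7": {"label": "Product", "sku": "P-7"},
    },
    "edges": [("u42", "BOUGHT", "p7", {"qty": 2})],
}


def neighbours(g, node, rel):
    """Follow a node's outgoing edges of one relationship type."""
    return [dst for src, r, dst, _ in g["edges"] if src == node and r == rel]


print(neighbours(graph, "u42", "BOUGHT"))  # ['p7']
```

A key-value store returns `user:42` only as a whole, a document store can query inside the nested `purchases` field, a wide-column store groups columns into families under each row key, and a graph store answers relationship questions by traversing edges, as `neighbours` does.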

arXiv.org e-Print Archive

Online Research Database In Technology