Search CORE

17,943 research outputs found

Umzi: Unified Multi-Zone Indexing for Large-Scale HTAP

Author: Barber Ronald
Luo Chen
Raman Vijayshankar
Sidle Richard
Tian Yuanyuan
Tözün Pinar
Publication venue
Publication date: 01/01/2019
Field of study

The IT University of Copenhagen's Repository

Interactive visual exploration of a large spatio-temporal dataset: Reflections on a geovisualization mashup

Author: Clarke K.
Dykes J.
Slingsby A.
Wood J.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007
Field of study

Exploratory visual analysis is useful for the preliminary investigation of large structured, multifaceted spatio-temporal datasets. This process requires the selection and aggregation of records by time, space and attribute, the ability to transform data and the flexibility to apply appropriate visual encodings and interactions. We propose an approach inspired by geographical 'mashups' in which freely-available functionality and data are loosely but flexibly combined using de facto exchange standards. Our case study combines MySQL, PHP and the LandSerf GIS to allow Google Earth to be used for visual synthesis and interaction with encodings described in KML. This approach is applied to the exploration of a log of 1.42 million requests made of a mobile directory service. Novel combinations of interaction and visual encoding are developed including spatial 'tag clouds', 'tag maps', 'data dials' and multi-scale density surfaces. Four aspects of the approach are informally evaluated: the visual encodings employed, their success in the visual exploration of the clataset, the specific tools used and the 'rnashup' approach. Preliminary findings will be beneficial to others considering using mashups for visualization. The specific techniques developed may be more widely applied to offer insights into the structure of multifarious spatio-temporal data of the type explored here

Many-Task Computing and Blue Waters

Author: Armstrong Timothy G.
Katz Daniel S.
Wilde Michael
Wozniak Justin M.
Zhang Zhao
Publication venue
Publication date: 01/01/2012
Field of study

This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters systems, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects to middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, by definition MTC applications are structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware

arXiv.org e-Print Archive

CiteSeerX

Measuring and Managing Answer Quality for Online Data-Intensive Services

Author: Elnikety Sameh
He Yuxiong
Kelley Jaimie
Morris Nathaniel
Stewart Christopher
Tiwari Devesh
Publication venue
Publication date: 16/06/2015
Field of study

Online data-intensive services parallelize query execution across distributed software components. Interactive response time is a priority, so online query executions return answers without waiting for slow running components to finish. However, data from these slow components could lead to better answers. We propose Ubora, an approach to measure the effect of slow running components on the quality of answers. Ubora randomly samples online queries and executes them twice. The first execution elides data from slow components and provides fast online answers; the second execution waits for all components to complete. Ubora uses memoization to speed up mature executions by replaying network messages exchanged between components. Our systems-level implementation works for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the EasyRec Recommendation Engine, and the OpenEphyra question answering system. Ubora computes answer quality much faster than competing approaches that do not use memoization. With Ubora, we show that answer quality can and should be used to guide online admission control. Our adaptive controller processed 37% more queries than a competing controller guided by the rate of timeouts.Comment: Technical Repor

arXiv.org e-Print Archive

Crossref

Implementation of an efficient Fuzzy Logic based Information Retrieval System

Author: Agarwal Shubham
Dhawan Sumit
Singh Prabhjot
Thakur Narina
Publication venue
Publication date: 13/03/2015
Field of study

This paper exemplifies the implementation of an efficient Information Retrieval (IR) System to compute the similarity between a dataset and a query using Fuzzy Logic. TREC dataset has been used for the same purpose. The dataset is parsed to generate keywords index which is used for the similarity comparison with the user query. Each query is assigned a score value based on its fuzzy similarity with the index keywords. The relevant documents are retrieved based on the score value. The performance and accuracy of the proposed fuzzy similarity model is compared with Cosine similarity model using Precision-Recall curves. The results prove the dominance of Fuzzy Similarity based IR system.Comment: arXiv admin note: substantial text overlap with http://ntz-develop.blogspot.in/ , http://www.micsymposium.org/mics2012/submissions/mics2012_submission_8.pdf , http://www.slideshare.net/JeffreyStricklandPhD/predictive-modeling-and-analytics-selectchapters-41304405 by other author

arXiv.org e-Print Archive

Directory of Open Access Journals

Collaborative e-science architecture for Reaction Kinetics research community

Author: Dew P.M.
Lau L.M.S.
Pham T.V.
Pilling M.J.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005
Field of study

This paper presents a novel collaborative e-science architecture (CeSA) to address two challenging issues in e-science that arise from the management of heterogeneous distributed environments: (i) how to provide individual scientists an integrated environment to collaborate with each other in distributed, loosely coupled research communities where each member might be using a disparate range of tools; and (ii) how to provide easy access to a range of computationally intensive resources from a desktop. The Reaction Kinetics research community was used to capture the requirements and in the evaluation of the proposed architecture. The result demonstrated the feasibility of the approach and the potential benefits of the CeSA

Crossref

White Rose Research Online

A Case for Cooperative and Incentive-Based Coupling of Distributed Clusters

Author: Buyya Rajkumar
Harwood Aaron
Ranjan Rajiv
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005
Field of study

Research interest in Grid computing has grown significantly over the past five years. Management of distributed resources is one of the key issues in Grid computing. Central to management of resources is the effectiveness of resource allocation as it determines the overall utility of the system. The current approaches to superscheduling in a grid environment are non-coordinated since application level schedulers or brokers make scheduling decisions independently of the others in the system. Clearly, this can exacerbate the load sharing and utilization problems of distributed resources due to suboptimal schedules that are likely to occur. To overcome these limitations, we propose a mechanism for coordinated sharing of distributed clusters based on computational economy. The resulting environment, called \emph{Grid-Federation}, allows the transparent use of resources from the federation when local resources are insufficient to meet its users' requirements. The use of computational economy methodology in coordinating resource allocation not only facilitates the QoS based scheduling, but also enhances utility delivered by resources.Comment: 22 pages, extended version of the conference paper published at IEEE Cluster'05, Boston, M

arXiv.org e-Print Archive

CiteSeerX

Crossref

Ontological Matchmaking in Recommender Systems

Author: Bonifati Angela
Mecca Giansalvatore
Sileo Domenica
Summa Gianvito
Publication venue
Publication date: 01/01/2010
Field of study

The electronic marketplace offers great potential for the recommendation of supplies. In the so called recommender systems, it is crucial to apply matchmaking strategies that faithfully satisfy the predicates specified in the demand, and take into account as much as possible the user preferences. We focus on real-life ontology-driven matchmaking scenarios and identify a number of challenges, being inspired by such scenarios. A key challenge is that of presenting the results to the users in an understandable and clear-cut fashion in order to facilitate the analysis of the results. Indeed, such scenarios evoke the opportunity to rank and group the results according to specific criteria. A further challenge consists of presenting the results to the user in an asynchronous fashion, i.e. the 'push' mode, along with the 'pull' mode, in which the user explicitly issues a query, and displays the results. Moreover, an important issue to consider in real-life cases is the possibility of submitting a query to multiple providers, and collecting the various results. We have designed and implemented an ontology-based matchmaking system that suitably addresses the above challenges. We have conducted a comprehensive experimental study, in order to investigate the usability of the system, the performance and the effectiveness of the matchmaking strategies with real ontological datasets.Comment: 28 pages, 8 figure

arXiv.org e-Print Archive

Archivio della Ricerca - Università della Basilicata

Conceptual information processing: A robust approach to KBS-DBMS integration

Author: Lazzara Allen V.
Liuzzi Raymond
Tepfenhart William
White Richard C.
Publication venue
Publication date
Field of study

Integrating the respective functionality and architectural features of knowledge base and data base management systems is a topic of considerable interest. Several aspects of this topic and associated issues are addressed. The significance of integration and the problems associated with accomplishing that integration are discussed. The shortcomings of current approaches to integration and the need to fuse the capabilities of both knowledge base and data base management systems motivates the investigation of information processing paradigms. One such paradigm is concept based processing, i.e., processing based on concepts and conceptual relations. An approach to robust knowledge and data base system integration is discussed by addressing progress made in the development of an experimental model for conceptual information processing

NASA Technical Reports Server