Efficient data representation for XML in peer-based systems
Purpose - New directions in the provision of end-user computing experiences mean that the best way to share data between small mobile computing devices needs to be determined. Partitioning large structures so that they can be shared efficiently provides a basis for data-intensive applications on such platforms. The partitioned structure can be compressed using dictionary-based approaches and then directly queried without firstly decompressing the whole structure. Design/methodology/approach - The paper describes an architecture for partitioning XML into structural and dictionary elements and the subsequent manipulation of the dictionary elements to make the best use of available space. Findings - The results indicate that considerable savings are available by removing duplicate dictionaries. The paper also identifies the most effective strategy for defining dictionary scope. Research limitations/implications - This evaluation is based on a range of benchmark XML structures and the approach to minimising dictionary size shows benefit in the majority of these. Where structures are small and regular, the benefits of efficient dictionary representation are lost. The authors' future research now focuses on heuristics for further partitioning of structural elements. Practical implications - Mobile applications that need access to large data collections will benefit from the findings of this research. Traditional client/server architectures are not suited to dealing with high volume demands from a multitude of small mobile devices. Peer data sharing provides a more scalable solution and the experiments that the paper describes demonstrate the most effective way of sharing data in this context. Social implications - Many services are available via smartphone devices but users are wary of exploiting the full potential because of the need to conserve battery power. 
The approach mitigates this challenge and consequently expands the potential for users to benefit from mobile information systems. This will have impact in areas such as advertising, entertainment and education but will depend on the acceptability of file sharing being extended from the desktop to the mobile environment. Originality/value - The original work characterises the most effective way of sharing large data sets between small mobile devices. This will save battery power on devices such as smartphones, thus providing benefits to users of such devices
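The abstract above describes splitting XML into a structural part and a dictionary part, then querying the result without decompressing it. The following Python is a minimal sketch of that general idea only, not the paper's architecture: it assumes a single document-wide dictionary of text values, and the names `partition` and `find_tags_with_value` are hypothetical.

```python
import xml.etree.ElementTree as ET


def partition(xml_text):
    """Split XML into (structure, dictionary).

    structure  : nested tuples (tag, value_ref_or_None, children)
    dictionary : list of unique text values; repeated values are stored once
    """
    root = ET.fromstring(xml_text)
    dictionary = []  # unique text values, in first-seen order
    index = {}       # text value -> position in dictionary

    def encode(node):
        text = (node.text or "").strip()
        ref = None
        if text:
            if text not in index:
                index[text] = len(dictionary)
                dictionary.append(text)
            ref = index[text]
        return (node.tag, ref, [encode(child) for child in node])

    return encode(root), dictionary


def find_tags_with_value(structure, dictionary, value):
    """Return tags whose text equals `value`, comparing integer refs only."""
    if value not in dictionary:
        return []
    target = dictionary.index(value)  # one dictionary lookup per query
    hits = []

    def walk(node):
        tag, ref, children = node
        if ref == target:
            hits.append(tag)
        for child in children:
            walk(child)

    walk(structure)
    return hits


doc = ("<lib><book><lang>en</lang><title>A</title></book>"
       "<book><lang>en</lang><title>B</title></book></lib>")
structure, dictionary = partition(doc)
print(dictionary)                                          # ['en', 'A', 'B']
print(find_tags_with_value(structure, dictionary, "en"))  # ['lang', 'lang']
```

In this sketch the repeated value `en` is stored once, and the query walks the structure comparing integer references, so the original text is never reconstructed.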
Sharing large data collections between mobile peers
New directions in the provision of end-user computing experiences mean that we need to determine the best way to share data between small mobile computing devices. Partitioning large structures so that they can be shared efficiently provides a basis for data-intensive applications on such platforms. In conjunction with such an approach, dictionary-based compression techniques provide additional benefits and help to prolong battery life
Efficient Multi-way Theta-Join Processing Using MapReduce
Multi-way Theta-join queries are powerful in describing complex relations and
are therefore widely employed in practice. However, existing solutions from
traditional distributed and parallel databases for multi-way Theta-join queries
cannot be easily extended to fit a shared-nothing distributed computing
paradigm, which is proven to be able to support OLAP applications over immense
data volumes. In this work, we study the problem of efficient processing of
multi-way Theta-join queries using MapReduce from a cost-effective perspective.
Although there have been some works using the (key,value) pair-based
programming model to support join operations, efficient processing of multi-way
Theta-join queries has never been fully explored. The substantial challenge
lies in, given a number of processing units (that can run Map or Reduce tasks),
mapping a multi-way Theta-join query to a number of MapReduce jobs and having
them executed in a well scheduled sequence, such that the total processing time
span is minimized. Our solution mainly includes two parts: 1) cost metrics for
both a single MapReduce job and a number of MapReduce jobs executed in a certain
order; 2) the efficient execution of a chain-typed Theta-join with only one
MapReduce job. Compared with the query evaluation strategy proposed in [23]
and the widely adopted Pig Latin and Hive SQL solutions, our method achieves
a significant improvement in join processing efficiency. Comment: VLDB201
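The abstract does not spell out how a Theta-join can be evaluated in one MapReduce job. As a minimal sketch of the general idea only (a generic grid-partitioning scheme, not the paper's algorithm or cost model), the following Python simulates a two-way Theta-join in a single map and reduce pass. The function name `theta_join_mapreduce` and the parameters `rows` and `cols` are hypothetical, and the reducers are simulated in-process rather than run on a cluster.

```python
from collections import defaultdict
from itertools import product


def theta_join_mapreduce(r_rows, s_rows, theta, rows=2, cols=2):
    """Simulate a single-job theta-join on a rows x cols grid of reducers."""
    # Each bucket holds the R tuples and S tuples sent to one reducer.
    buckets = defaultdict(lambda: ([], []))

    # Map phase: replicate each R tuple across one grid row and
    # each S tuple down one grid column.
    for i, r in enumerate(r_rows):
        row = i % rows
        for col in range(cols):
            buckets[(row, col)][0].append(r)
    for j, s in enumerate(s_rows):
        col = j % cols
        for row in range(rows):
            buckets[(row, col)][1].append(s)

    # Reduce phase: every (r, s) pair meets in exactly one cell, so a
    # local nested loop with the theta predicate yields no duplicates.
    out = []
    for key in sorted(buckets):
        rs, ss = buckets[key]
        out.extend((r, s) for r, s in product(rs, ss) if theta(r, s))
    return out


result = theta_join_mapreduce([1, 5, 9], [2, 6], lambda r, s: r < s)
print(sorted(result))  # [(1, 2), (1, 6), (5, 6)]
```

Because the predicate is arbitrary, tuples cannot be routed by a join key as in an equi-join. Replicating along rows and columns trades extra network traffic for the guarantee that each candidate pair is compared exactly once.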
Peer to Peer Information Retrieval: An Overview
Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real-world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective.
The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines.
From a socio-economic perspective we inventory the impact and legal consequences of these technical advances and point out future directions of research
Multi-Paradigm Reasoning for Access to Heterogeneous GIS
Accessing and querying geographical data in a uniform way has become easier in recent years. Emerging standards like WFS turn
the web into a geospatial web services enabled place. Mediation
architectures like VirGIS overcome syntactical and semantical heterogeneity
between several distributed sources. On mobile devices,
however, this kind of solution is not suitable, due to limitations,
mostly regarding bandwidth, computation power, and available storage
space. The aim of this paper is to present a solution for providing
powerful reasoning mechanisms accessible from mobile applications
and involving data from several heterogeneous sources.
By adapting contents to time and location, mobile web information
systems can not only increase the value and suitability of the
service itself, but can substantially reduce the amount of data delivered
to users. Because many problems pertain to infrastructures
and transportation in general and to way finding in particular, one
cornerstone of the architecture is higher level reasoning on graph
networks with the Multi-Paradigm Location Language MPLL. A
mediation architecture is used as a “graph provider” in order to
transfer the load of computation to the best suited component –
graph construction and transformation for example being heavy on
resources. Reasoning in general can be conducted either near the
“source” or near the end user, depending on the specific use case.
The concepts underlying the proposal described in this paper are
illustrated by a typical and concrete scenario for web applications
Web-Based Visualization of Very Large Scientific Astronomy Imagery
Visualizing and navigating through large astronomy images from a remote
location with current astronomy display tools can be a frustrating experience
in terms of speed and ergonomics, especially on mobile devices. In this paper,
we present a high performance, versatile and robust client-server system for
remote visualization and analysis of extremely large scientific images.
Applications of this work include survey image quality control, interactive
data query and exploration, citizen science, as well as public outreach. The
proposed software is entirely open source and is designed to be generic and
applicable to a variety of datasets. It provides access to floating point data
at terabyte scales, with the ability to precisely adjust image settings in
real-time. The proposed clients are light-weight, platform-independent web
applications built on standard HTML5 web technologies and compatible with both
touch and mouse-based devices. We test the system, assess its performance,
and show that a single server can comfortably handle
more than a hundred simultaneous users accessing full precision 32 bit
astronomy data. Comment: Published in Astronomy & Computing. IIPImage server available from
http://iipimage.sourceforge.net . Visiomatic code and demos available from
http://www.visiomatic.org
AiiDA: Automated Interactive Infrastructure and Database for Computational Science
Computational science has seen in the last decades a spectacular rise in the
scope, breadth, and depth of its efforts. Notwithstanding this prevalence and
impact, it is often still performed using the renaissance model of individual
artisans gathered in a workshop, under the guidance of an established
practitioner. Great benefits could follow instead from adopting concepts and
tools coming from computer science to manage, preserve, and share these
computational efforts. We illustrate here our paradigm sustaining such vision,
based around the four pillars of Automation, Data, Environment, and Sharing. We
then discuss its implementation in the open-source AiiDA platform
(http://www.aiida.net), that has been tuned first to the demands of
computational materials science. AiiDA's design is based on directed acyclic
graphs to track the provenance of data and calculations, and ensure
preservation and searchability. Remote computational resources are managed
transparently, and automation is coupled with data storage to ensure
reproducibility. Last, complex sequences of calculations can be encoded into
scientific workflows. We believe that AiiDA's design and its sharing
capabilities will encourage the creation of social ecosystems to disseminate
codes, data, and scientific workflows. Comment: 30 pages, 7 figure
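The abstract states that the design uses directed acyclic graphs to track the provenance of data and calculations. The following Python is a minimal sketch of that idea under stated assumptions: it is plain Python, not the AiiDA API, and the class `ProvenanceGraph`, its methods, and the node names are all hypothetical.

```python
class ProvenanceGraph:
    """Toy provenance store: nodes are data or calculations, links point
    from an input to the node derived from it."""

    def __init__(self):
        self.parents = {}  # node -> set of direct inputs

    def ancestors(self, node):
        """Return every node that `node` was directly or indirectly derived from."""
        seen = set()
        stack = [node]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add_link(self, source, target):
        """Record that `target` was derived from `source`, keeping the graph acyclic."""
        if source == target or target in self.ancestors(source):
            raise ValueError(f"link {source} -> {target} would create a cycle")
        self.parents.setdefault(target, set()).add(source)


g = ProvenanceGraph()
g.add_link("structure", "relax_calc")
g.add_link("relax_calc", "relaxed_structure")
g.add_link("relaxed_structure", "bands_calc")
print(sorted(g.ancestors("bands_calc")))
# ['relax_calc', 'relaxed_structure', 'structure']
```

Rejecting any link that would close a cycle is what keeps the ancestor query a well-defined answer to the question "which inputs produced this result".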