7,580 research outputs found
Towards Exascale Scientific Metadata Management
Advances in technology and computing hardware are enabling scientists from
all areas of science to produce massive amounts of data using large-scale
simulations or observational facilities. In this era of data deluge, effective
coordination between the data production and the analysis phases hinges on the
availability of metadata that describe the scientific datasets. Existing
workflow engines have been capturing a limited form of metadata to provide
provenance information about the identity and lineage of the data. However,
much of the data produced by simulations, experiments, and analyses still need
to be annotated manually in an ad hoc manner by domain scientists. Systematic
and transparent acquisition of rich metadata becomes a crucial prerequisite to
sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and
domain-agnostic metadata management infrastructure that can meet the demands of
extreme-scale science is notable by its absence.
To address this gap in scientific data management research and practice, we
present our vision for an integrated approach that (1) automatically captures
and manipulates information-rich metadata while the data is being produced or
analyzed and (2) stores metadata within each dataset to permeate
metadata-oblivious processes and to query metadata through established and
standardized data access interfaces. We motivate the need for the proposed
integrated approach using applications from plasma physics, climate modeling
and neuroscience, and then discuss research challenges and possible solutions
Information visualisation and data analysis using web mash-up systems
A thesis submitted in partial fulfilment for the degree of Doctor of PhilosophyThe arrival of E-commerce systems have contributed greatly to the economy and have played a vital role in collecting a huge amount of transactional data. It is becoming difficult day by day to analyse business and consumer behaviour with the production of such a colossal volume of data. Enterprise 2.0 has the ability to store and create an enormous amount of transactional data; the purpose for which data was collected could quite easily be disassociated as the essential information goes unnoticed in large and complex data sets. The information overflow is a major contributor to the dilemma. In the current environment, where hardware systems have the ability to store such large volumes of data and the software systems have the capability of substantial data production, data exploration problems are on the rise. The problem is not with the production or storage of data but with the effectiveness of the systems and techniques where essential information could be retrieved from complex data sets in a comprehensive and logical approach as the data questions are asked. Using the existing information retrieval systems and visualisation tools, the more specific questions are asked, the more definitive and unambiguous are the visualised results that could be attained, but when it comes to complex and large data sets there are no elementary or simple questions. Therefore a profound information visualisation model and system is required to analyse complex data sets through data analysis and information visualisation, to make it possible for the decision makers to identify the expected and discover the unexpected. In order to address complex data problems, a comprehensive and robust visualisation model and system is introduced. The visualisation model consists of four major layers, (i) acquisition and data analysis, (ii) data representation, (iii) user and computer interaction and (iv) results repositories. There are major contributions in all four layers but particularly in data acquisition and data representation. Multiple attribute and dimensional data visualisation techniques are identified in Enterprise 2.0 and Web 2.0 environment. Transactional tagging and linked data are unearthed which is a novel contribution in information visualisation. The visualisation model and system is first realised as a tangible software system, which is then validated through different and large types of data sets in three experiments. The first experiment is based on the large Royal Mail postcode data set. The second experiment is based on a large transactional data set in an enterprise environment while the same data set is processed in a non-enterprise environment. The system interaction facilitated through new mashup techniques enables users to interact more fluently with data and the representation layer. The results are exported into various reusable formats and retrieved for further comparison and analysis purposes. The information visualisation model introduced in this research is a compact process for any size and type of data set which is a major contribution in information visualisation and data analysis. Advanced data representation techniques are employed using various web mashup technologies. New visualisation techniques have emerged from the research such as transactional tagging visualisation and linked data visualisation. The information visualisation model and system is extremely useful in addressing complex data problems with strategies that are easy to interact with and integrate
The power of indirect social ties
While direct social ties have been intensely studied in the context of
computer-mediated social networks, indirect ties (e.g., friends of friends)
have seen little attention. Yet in real life, we often rely on friends of our
friends for recommendations (of good doctors, good schools, or good
babysitters), for introduction to a new job opportunity, and for many other
occasional needs. In this work we attempt to 1) quantify the strength of
indirect social ties, 2) validate it, and 3) empirically demonstrate its
usefulness for distributed applications on two examples. We quantify social
strength of indirect ties using a(ny) measure of the strength of the direct
ties that connect two people and the intuition provided by the sociology
literature. We validate the proposed metric experimentally by comparing
correlations with other direct social tie evaluators. We show via data-driven
experiments that the proposed metric for social strength can be used
successfully for social applications. Specifically, we show that it alleviates
known problems in friend-to-friend storage systems by addressing two previously
documented shortcomings: reduced set of storage candidates and data
availability correlations. We also show that it can be used for predicting the
effects of a social diffusion with an accuracy of up to 93.5%.Comment: Technical Repor
Data consistency in transactional storage systems: a centralised approach.
We introduce an interleaving operational semantics for describing the client-observable behaviour of atomic transactions on distributed key-value stores. Our semantics builds on abstract states comprising centralised, global key-value stores and partial client views. We provide operational definitions of consistency models for our key-value stores which are shown to be equivalent to the well-known declarative definitions of consistency model for execution graphs. We explore two immediate applications of our semantics: specific protocols of geo-replicated databases (e.g. COPS) and partitioned databases (e.g. Clock-SI) can be shown to be correct for a specific consistency model by embedding them in our centralised semantics; programs can be directly shown to have invariant properties such as robustness results against a weak consistency model
The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing: Extended Survey
Graph processing is becoming increasingly prevalent across many application
domains. In spite of this prevalence, there is little research about how graphs
are actually used in practice. We performed an extensive study that consisted
of an online survey of 89 users, a review of the mailing lists, source
repositories, and whitepapers of a large suite of graph software products, and
in-person interviews with 6 users and 2 developers of these products. Our
online survey aimed at understanding: (i) the types of graphs users have; (ii)
the graph computations users run; (iii) the types of graph software users use;
and (iv) the major challenges users face when processing their graphs. We
describe the participants' responses to our questions highlighting common
patterns and challenges. Based on our interviews and survey of the rest of our
sources, we were able to answer some new questions that were raised by
participants' responses to our online survey and understand the specific
applications that use graph data and software. Our study revealed surprising
facts about graph processing in practice. In particular, real-world graphs
represent a very diverse range of entities and are often very large,
scalability and visualization are undeniably the most pressing challenges faced
by participants, and data integration, recommendations, and fraud detection are
very popular applications supported by existing graph software. We hope these
findings can guide future research
- …