User Applications Driven by the Community Contribution Framework MPContribs in the Materials Project
This work discusses how the MPContribs framework in the Materials Project
(MP) allows user-contributed data to be shown and analyzed alongside the core
MP database. The Materials Project is a searchable database of electronic
structure properties of over 65,000 bulk solid materials that is accessible
through a web-based science-gateway. We describe the motivation for enabling
user contributions to the materials data and present the framework's features
and challenges in the context of two real applications. These use-cases
illustrate how scientific collaborations can build applications with their own
"user-contributed" data using MPContribs. The Nanoporous Materials Explorer
application provides a unique search interface to a novel dataset of hundreds
of thousands of materials, each with tables of user-contributed values related
to material adsorption and density at varying temperature and pressure. The
Unified Theoretical and Experimental x-ray Spectroscopy application discusses a
full workflow for the association, dissemination and combined analyses of
experimental data from the Advanced Light Source with MP's theoretical core
data, using MPContribs tools for data formatting, management and exploration.
The capabilities being developed for these collaborations are serving as the
model for how new materials data can be incorporated into the Materials Project
website with minimal staff overhead while giving powerful tools for data search
and display to the user community.
Comment: 12 pages, 5 figures, Proceedings of 10th Gateway Computing
Environments Workshop (2015), to be published in "Concurrency in Computation:
Practice and Experience"
Distributed Management of Massive Data: an Efficient Fine-Grain Data Access Scheme
This paper addresses the problem of efficiently storing and accessing massive
data blocks in a large-scale distributed environment, while providing efficient
fine-grain access to data subsets. This issue is crucial in the context of
applications in the field of databases, data mining and multimedia. We propose
a data sharing service based on distributed, RAM-based storage of data, while
leveraging a DHT-based, natively parallel metadata management scheme. As
opposed to the most commonly used grid storage infrastructures that provide
mechanisms for explicit data localization and transfer, we provide a
transparent access model, where data are accessed through global identifiers.
Our proposal has been validated through a prototype implementation whose
preliminary evaluation provides promising results.
A Provenance Tracking Model for Data Updates
For data-centric systems, provenance tracking is particularly important when
the system is open and decentralised, such as the Web of Linked Data. In this
paper, a concise but expressive calculus which models data updates is
presented. The calculus is used to provide an operational semantics for a
system where data and updates interact concurrently. The operational semantics
of the calculus also tracks the provenance of data with respect to updates.
This provides a new formal semantics extending provenance diagrams which takes
into account the execution of processes in a concurrent setting. Moreover, a
sound and complete model for the calculus based on ideals of series-parallel
DAGs is provided. The notion of provenance introduced can be used as a
subjective indicator of the quality of data in concurrent interacting systems.
Comment: In Proceedings FOCLASA 2012, arXiv:1208.432
Functional Data Analysis in Electronic Commerce Research
This paper describes opportunities and challenges of using functional data
analysis (FDA) for the exploration and analysis of data originating from
electronic commerce (eCommerce). We discuss the special data structures that
arise in the online environment and why FDA is a natural approach for
representing and analyzing such data. The paper reviews several FDA methods and
motivates their usefulness in eCommerce research by providing a glimpse into
new domain insights that they allow. We argue that the wedding of eCommerce
with FDA leads to innovations both in statistical methodology, due to the
challenges and complications that arise in eCommerce data, and in online
research, by being able to ask (and subsequently answer) new research questions
that classical statistical methods are not able to address, and also by
expanding on research questions beyond the ones traditionally asked in the
offline environment. We describe several applications originating from online
transactions which are new to the statistics literature, and point out
statistical challenges accompanied by some solutions. We also discuss some
promising future directions for joint research efforts between researchers in
eCommerce and statistics.
Comment: Published at http://dx.doi.org/10.1214/088342306000000132 in the
Statistical Science (http://www.imstat.org/sts/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Protocols for Integrity Constraint Checking in Federated Databases
A federated database comprises multiple interconnected database systems that primarily operate independently but cooperate to a certain extent. Global integrity constraints can be very useful in federated databases, but the lack of global queries, global transaction mechanisms, and global concurrency control renders traditional constraint management techniques inapplicable. This paper presents a threefold contribution to integrity constraint checking in federated databases: (1) The problem of constraint checking in a federated database environment is clearly formulated. (2) A family of protocols for constraint checking is presented. (3) The differences across protocols in the family are analyzed with respect to system requirements, properties guaranteed by the protocols, and processing and communication costs. Thus, our work yields a suite of options from which a protocol can be chosen to suit the system capabilities and integrity requirements of a particular federated database environment.
Concurrency Control for Perceivedly Instantaneous Transactions in Valid-Time Databases
Although temporal databases have received considerable attention as a research topic, little work in the area has addressed the concurrency control mechanisms that might be employed in them. This paper describes how the notion of the current time (also called 'now') in valid-time databases can cause standard serialisation theory to give results that are at least unintuitive, if not actually incorrect. The paper then describes two modifications to standard serialisation theory which correct the behaviour to give what we term perceivably instantaneous transactions: transactions where serialising T1 and T2 as [T1; T2] always implies that the current time seen by T1 is less than or equal to the current time seen by T2.
1 Introduction
Query languages for valid-time temporal databases normally contain a notion of "current time" [TCG+93, Sno95], usually represented as the value of a special variable now. While it is agreed that the value of..
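The ordering condition stated in the abstract above can be illustrated with a small check. This is only a sketch and not the paper's algorithm: the Txn class, the integer clock, and the function name respects_now_order are all hypothetical names introduced here. It tests whether a given serial order satisfies "the current time seen by the earlier transaction is less than or equal to the current time seen by the later one".

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction record: a name and the value of `now` it observed."""
    name: str
    now: int  # assumed integer clock reading; the paper does not fix a representation


def respects_now_order(serial_order):
    """Return True if, for every pair serialised as [A; B], A.now <= B.now.

    combinations() yields pairs in input order, so `a` always precedes `b`
    in the proposed serialisation.
    """
    return all(a.now <= b.now for a, b in combinations(serial_order, 2))


if __name__ == "__main__":
    t1 = Txn("T1", now=10)
    t2 = Txn("T2", now=12)
    print(respects_now_order([t1, t2]))  # T1 saw an earlier `now` than T2
    print(respects_now_order([t2, t1]))  # T2 serialised first but saw a later `now`
```

A serialisation that passes this check is one in which no transaction appears to run "before" another while having observed a later current time; how a scheduler enforces that is the subject of the paper and is not modelled here.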