Co-evolution of RDF Datasets
Linking Data initiatives have fostered the publication of a large number of RDF datasets in the Linked Open Data (LOD) cloud, as well as the development of query processing infrastructures to access these data in a federated fashion. However, experimental studies have shown that the availability of LOD datasets cannot always be ensured, making RDF data replication a requirement for reliable federated query frameworks. Albeit enhancing data availability, RDF data replication requires synchronization and conflict resolution when replicas and source datasets are allowed to change data over time, i.e., co-evolution management needs to be provided to ensure consistency. In this paper, we tackle the problem of RDF data co-evolution and devise an approach for conflict resolution during co-evolution of RDF datasets. Our proposed approach is property-oriented and allows for exploiting semantics about RDF properties during co-evolution management. The quality of our approach is empirically evaluated in different scenarios on the DBpedia-live dataset. Experimental results suggest that the proposed techniques have a positive impact on the quality of data in source datasets and replicas.
Comment: 18 pages, 4 figures, Accepted in ICWE, 201
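A property-oriented merge can be sketched roughly as follows: triples that evolved independently in the source and in a replica are reconciled per property, with a resolution policy chosen according to the property's semantics. The policy names and the "prefer_source"/"union" strategies below are assumptions for illustration, not the paper's actual strategies.

```python
# Hypothetical sketch of property-oriented conflict resolution for RDF triples.
# Policies and strategy names are assumptions made for illustration.
from collections import defaultdict

def resolve(source_triples, replica_triples, policies, default="union"):
    """Merge two evolved versions of an RDF dataset, deciding per property."""
    by_key = defaultdict(lambda: (set(), set()))
    for s, p, o in source_triples:
        by_key[(s, p)][0].add(o)
    for s, p, o in replica_triples:
        by_key[(s, p)][1].add(o)

    merged = set()
    for (s, p), (src_objs, rep_objs) in by_key.items():
        policy = policies.get(p, default)
        if policy == "union":              # multi-valued property: keep all values
            kept = src_objs | rep_objs
        elif policy == "prefer_source":    # functional property: source wins on conflict
            kept = src_objs or rep_objs
        elif policy == "prefer_replica":   # functional property: replica wins on conflict
            kept = rep_objs or src_objs
        else:
            raise ValueError(f"unknown policy {policy!r}")
        merged.update((s, p, o) for o in kept)
    return merged

# Example: birthDate is treated as functional, interest as multi-valued.
source = {("ex:Alice", "ex:birthDate", '"1980-01-01"'),
          ("ex:Alice", "ex:interest", "ex:Databases")}
replica = {("ex:Alice", "ex:birthDate", '"1980-02-01"'),
           ("ex:Alice", "ex:interest", "ex:SemanticWeb")}
print(resolve(source, replica, {"ex:birthDate": "prefer_source"}))
```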
Qualitative Effects of Knowledge Rules in Probabilistic Data Integration
One of the problems in data integration is data overlap: the fact that different data sources have data on the same real-world entities. Much development time in data integration projects is devoted to entity resolution. Often advanced similarity measurement techniques are used to remove semantic duplicates from the integration result or to solve other semantic conflicts, but it proves impossible to get rid of all semantic problems in data integration. An often-used rule of thumb states that about 90% of the development effort is devoted to solving the remaining 10% of hard cases. In an attempt to significantly decrease human effort at data integration time, we have proposed an approach that stores any remaining semantic uncertainty and conflicts in a probabilistic database, enabling the data to be meaningfully used right away. The main development effort in our approach is devoted to defining and tuning knowledge rules and thresholds. Rules and thresholds directly impact the size and quality of the integration result. We measure integration quality indirectly by measuring the quality of answers to queries on the integrated data set in an information retrieval-like way. The main contribution of this report is an experimental investigation of the effects and sensitivity of rule definition and threshold tuning on the integration quality. This proves that our approach indeed reduces development effort, and does not merely shift the effort to rule definition and threshold tuning, by showing that setting rough safe thresholds and defining only a few rules suffices to produce a "good enough" integration that can be meaningfully used
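In the spirit of the approach described above, a threshold-based matcher can settle the easy cases and keep the hard ones as probabilistic alternatives instead of resolving them by hand. The thresholds, the similarity measure, and the probability estimate below are illustrative assumptions, not the report's actual rules.

```python
# Minimal sketch: thresholded duplicate detection that stores uncertainty
# instead of forcing a decision. All parameters are illustrative assumptions.
from difflib import SequenceMatcher

T_SAME, T_DIFF = 0.9, 0.5   # assumed upper/lower thresholds

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def integrate(rec_a, rec_b):
    """Return a list of (probability, interpretation) alternatives."""
    sim = similarity(rec_a["name"], rec_b["name"])
    if sim >= T_SAME:                 # confident match: merge the records
        return [(1.0, ("same", rec_a, rec_b))]
    if sim <= T_DIFF:                 # confident non-match: keep both records
        return [(1.0, ("distinct", rec_a, rec_b))]
    # Uncertain zone: keep both interpretations with probabilities rather than
    # spending development effort on resolving the hard case up front.
    return [(sim, ("same", rec_a, rec_b)),
            (1.0 - sim, ("distinct", rec_a, rec_b))]

print(integrate({"name": "J. Smith"}, {"name": "John Smith"}))
```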
Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art
Information extraction, data integration, and uncertain data management are different areas of research that have received considerable attention in the last two decades. Much of this research has tackled the areas individually. However, information extraction systems should be integrated with data integration methods to make use of the extracted information. Handling uncertainty in the extraction and integration process is an important issue for enhancing the quality of the data in such integrated systems. This article presents the state of the art of these areas of research, shows their common grounds, and discusses how to integrate information extraction and data integration under the cover of uncertainty management
DataHub: Collaborative Data Science & Dataset Version Management at Scale
Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference, and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale.
Comment: 7 pages
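The create/branch/commit/diff operations mentioned above can be illustrated with a toy version graph over sets of records. This is not the DataHub API; every class and method name here is invented for illustration.

```python
# Toy sketch of git-style dataset versioning; names are invented, not DataHub's.
class DatasetRepo:
    def __init__(self, records=None):
        self.versions = {0: set(records or set())}   # version id -> set of records
        self.parents = {0: None}
        self.branches = {"master": 0}
        self._next = 1

    def commit(self, branch, records):
        """Record a new dataset version on the given branch."""
        vid, parent = self._next, self.branches[branch]
        self.versions[vid] = set(records)
        self.parents[vid] = parent
        self.branches[branch] = vid
        self._next += 1
        return vid

    def branch(self, name, from_branch="master"):
        """Create a new branch pointing at the head of an existing branch."""
        self.branches[name] = self.branches[from_branch]

    def diff(self, v1, v2):
        """Return the records added and removed between two versions."""
        a, b = self.versions[v1], self.versions[v2]
        return {"added": b - a, "removed": a - b}

repo = DatasetRepo({("alice", 34)})
repo.branch("cleanup")
v1 = repo.commit("cleanup", {("alice", 34), ("bob", 29)})
print(repo.diff(0, v1))   # {'added': {('bob', 29)}, 'removed': set()}
```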
Requirements modelling and formal analysis using graph operations
The increasing complexity of enterprise systems requires a more advanced analysis of the representation of the expected services than is currently possible. Consequently, the specification stage, which could be facilitated by formal verification, becomes very important to the system life-cycle. This paper presents a formal modelling approach which may be used to better represent the reality of the system and to verify the expected or existing system's properties, taking into account the environmental characteristics. To this end, we first propose a formalization process based upon property specification, and second we use Conceptual Graphs operations to develop reasoning mechanisms for verifying requirements statements. The graphical visualization of this reasoning enables us to correctly capture the system specifications by making it easier to determine whether the desired properties hold. The approach is applied to the field of Enterprise modelling
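A Conceptual Graph projection roughly asks whether a requirement graph can be mapped into the system model graph. The sketch below uses networkx subgraph matching as a stand-in for projection; it ignores concept-type hierarchies, and the node and edge labels are invented for illustration.

```python
# Rough sketch of a projection-style check: can the requirement graph be mapped
# into the system model graph? networkx matching stands in for CG projection.
import networkx as nx
from networkx.algorithms import isomorphism as iso

def build(edges):
    g = nx.DiGraph()
    for src, src_type, rel, dst, dst_type in edges:
        g.add_node(src, type=src_type)
        g.add_node(dst, type=dst_type)
        g.add_edge(src, dst, rel=rel)
    return g

# System model: orders are placed by customers and contain products.
system = build([("o", "Order", "placedBy", "c", "Customer"),
                ("o", "Order", "contains", "p", "Product"),
                ("c", "Customer", "hasAddress", "a", "Address")])

# Requirement: every modelled order must be placed by a customer.
requirement = build([("x", "Order", "placedBy", "y", "Customer")])

matcher = iso.DiGraphMatcher(
    system, requirement,
    node_match=iso.categorical_node_match("type", None),
    edge_match=iso.categorical_edge_match("rel", None))

print("requirement holds:", matcher.subgraph_is_isomorphic())  # True
```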
Attendee-Sourcing: Exploring The Design Space of Community-Informed Conference Scheduling
Constructing a good conference schedule for a large multi-track conference needs to take into account the preferences and constraints of organizers, authors, and attendees. Creating a schedule that has fewer conflicts for authors and attendees, together with thematically coherent sessions, is a challenging task.
Cobi introduced an alternative approach to conference scheduling by engaging the community to play an active role in the planning process. The current Cobi pipeline consists of committee-sourcing and author-sourcing to plan a conference schedule. We further explore the design space of community-sourcing by introducing attendee-sourcing -- a process that collects input from conference attendees and encodes it as preferences and constraints for creating sessions and a schedule. For CHI 2014, a large multi-track conference in human-computer interaction with more than 3,000 attendees and 1,000 authors, we collected attendees' preferences by making all the accepted papers available on Confer, a paper recommendation tool we built, for a period of 45 days before announcing the conference program (sessions and schedule). We compare the preferences marked on Confer with the preferences collected from Cobi's author-sourcing approach. We show that attendee-sourcing can provide insights beyond what can be discovered by author-sourcing. For CHI 2014, the results show the value of the method and of attendees' participation. It produces data that provides more alternatives in scheduling and complements data collected from other methods for creating coherent sessions and reducing conflicts.
Comment: HCOMP 201
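One way attendee preferences can feed a scheduler is by aggregating them into pairwise paper affinities: papers that attract the same attendees are candidates for a coherent session, and should not be scheduled against each other. This is only an illustrative sketch, not the Cobi/Confer pipeline; the data and function names are invented.

```python
# Illustrative sketch (not the Cobi/Confer pipeline): turn attendee preferences
# into pairwise paper affinities usable for session formation and conflict checks.
from collections import Counter
from itertools import combinations

def paper_affinities(preferences):
    """preferences: attendee id -> set of preferred paper ids.
    Returns a Counter over paper pairs, counting shared interested attendees."""
    affinity = Counter()
    for papers in preferences.values():
        for a, b in combinations(sorted(papers), 2):
            affinity[(a, b)] += 1
    return affinity

prefs = {"attendee1": {"p1", "p2", "p3"},
         "attendee2": {"p1", "p2"},
         "attendee3": {"p2", "p3"}}

aff = paper_affinities(prefs)
# High-affinity pairs attract the same attendees: group them in one session
# (coherence) rather than scheduling them in parallel (conflict).
print(aff.most_common(3))
```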