1,078 research outputs found
Managing Linguistic Data Summaries in Advanced P2P Applications
chapitre... à corrigerAs the amount of stored data increases, data localization techniques become no longer sufficient in P2P systems. A practical approach is to rely on compact database summaries rather than raw database records, whose access is costly in large P2P systems. In this chapter, we describe a solution for managing linguistic data summaries in advanced P2P applications which are dealing with semantically rich data. The produced summaries are synthetic, multidimensional views over relational tables. The novelty of this proposal relies on the double summary exploitation in distributed P2P systems. First, as semantic indexes, they support locating relevant nodes based on their data descriptions. Second, due to their intelligibility, these summaries can be directly queried and thus approximately answer a query without the need for exploring original data. The proposed solution consists first in defining a summary model for hierarchical P2P systems. Second, appropriate algorithms for summary creation and maintenance are presented. A query processing mechanism, which relies on summary querying, is then proposed to demonstrate the benefits that might be obtained from summary exploitation
Schema matching in a peer-to-peer database system
Includes bibliographical references (p. 112-118).Peer-to-peer or P2P systems are applications that allow a network of peers to share resources in a scalable and efficient manner. My research is concerned with the use of P2P systems for sharing databases. To allow data mediation between peers' databases, schema mappings need to exist, which are mappings between semantically equivalent attributes in different peers' schemas. Mappings can either be defined manually or found semi-automatically using a technique called schema matching. However, schema matching has not been used much in dynamic environments, such as P2P networks. Therefore, this thesis investigates how to enable effective semi-automated schema matching within a P2P network
Reply With: Proactive Recommendation of Email Attachments
Email responses often contain items-such as a file or a hyperlink to an
external document-that are attached to or included inline in the body of the
message. Analysis of an enterprise email corpus reveals that 35% of the time
when users include these items as part of their response, the attachable item
is already present in their inbox or sent folder. A modern email client can
proactively retrieve relevant attachable items from the user's past emails
based on the context of the current conversation, and recommend them for
inclusion, to reduce the time and effort involved in composing the response. In
this paper, we propose a weakly supervised learning framework for recommending
attachable items to the user. As email search systems are commonly available,
we constrain the recommendation task to formulating effective search queries
from the context of the conversations. The query is submitted to an existing IR
system to retrieve relevant items for attachment. We also present a novel
strategy for generating labels from an email corpus---without the need for
manual annotations---that can be used to train and evaluate the query
formulation model. In addition, we describe a deep convolutional neural network
that demonstrates satisfactory performance on this query formulation task when
evaluated on the publicly available Avocado dataset and a proprietary dataset
of internal emails obtained through an employee participation program.Comment: CIKM2017. Proceedings of the 26th ACM International Conference on
Information and Knowledge Management. 201
Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval
Neural networks with deep architectures have demonstrated significant
performance improvements in computer vision, speech recognition, and natural
language processing. The challenges in information retrieval (IR), however, are
different from these other application areas. A common form of IR involves
ranking of documents--or short passages--in response to keyword-based queries.
Effective IR systems must deal with query-document vocabulary mismatch problem,
by modeling relationships between different query and document terms and how
they indicate relevance. Models should also consider lexical matches when the
query contains rare terms--such as a person's name or a product model
number--not seen during training, and to avoid retrieving semantically related
but irrelevant results. In many real-life IR tasks, the retrieval involves
extremely large collections--such as the document index of a commercial Web
search engine--containing billions of documents. Efficient IR methods should
take advantage of specialized IR data structures, such as inverted index, to
efficiently retrieve from large collections. Given an information need, the IR
system also mediates how much exposure an information artifact receives by
deciding whether it should be displayed, and where it should be positioned,
among other results. Exposure-aware IR systems may optimize for additional
objectives, besides relevance, such as parity of exposure for retrieved items
and content publishers. In this thesis, we present novel neural architectures
and methods motivated by the specific needs and challenges of IR tasks.Comment: PhD thesis, Univ College London (2020
A semantic and agent-based approach to support information retrieval, interoperability and multi-lateral viewpoints for heterogeneous environmental databases
PhDData stored in individual autonomous databases often needs to be combined and
interrelated. For example, in the Inland Water (IW) environment monitoring domain,
the spatial and temporal variation of measurements of different water quality indicators
stored in different databases are of interest. Data from multiple data sources is more
complex to combine when there is a lack of metadata in a computation forin and when
the syntax and semantics of the stored data models are heterogeneous. The main types
of information retrieval (IR) requirements are query transparency and data
harmonisation for data interoperability and support for multiple user views. A
combined Semantic Web based and Agent based distributed system framework has
been developed to support the above IR requirements. It has been implemented using
the Jena ontology and JADE agent toolkits. The semantic part supports the
interoperability of autonomous data sources by merging their intensional data, using a
Global-As-View or GAV approach, into a global semantic model, represented in
DAML+OIL and in OWL. This is used to mediate between different local database
views. The agent part provides the semantic services to import, align and parse
semantic metadata instances, to support data mediation and to reason about data
mappings during alignment. The framework has applied to support information
retrieval, interoperability and multi-lateral viewpoints for four European environmental
agency databases.
An extended GAV approach has been developed and applied to handle queries that can
be reformulated over multiple user views of the stored data. This allows users to
retrieve data in a conceptualisation that is better suited to them rather than to have to
understand the entire detailed global view conceptualisation. User viewpoints are
derived from the global ontology or existing viewpoints of it. This has the advantage
that it reduces the number of potential conceptualisations and their associated
mappings to be more computationally manageable. Whereas an ad hoc framework
based upon conventional distributed programming language and a rule framework
could be used to support user views and adaptation to user views, a more formal
framework has the benefit in that it can support reasoning about the consistency,
equivalence, containment and conflict resolution when traversing data models. A
preliminary formulation of the formal model has been undertaken and is based upon
extending a Datalog type algebra with hierarchical, attribute and instance value
operators. These operators can be applied to support compositional mapping and
consistency checking of data views. The multiple viewpoint system was implemented
as a Java-based application consisting of two sub-systems, one for viewpoint
adaptation and management, the other for query processing and query result
adjustment
Proceedings of the 18th Irish Conference on Artificial Intelligence and Cognitive Science
These proceedings contain the papers that were accepted for publication at AICS-2007, the 18th Annual Conference on Artificial Intelligence and Cognitive Science, which was held in the Technological University Dublin; Dublin, Ireland; on the 29th to the 31st August 2007. AICS is the annual conference of the Artificial Intelligence Association of Ireland (AIAI)
- …