151 research outputs found
CRIS-IR 2006
The recognition of entities and their relationships in document collections is an important step towards the discovery of latent knowledge, as well as towards supporting knowledge management applications. The challenge lies in how to extract and correlate entities in order to answer key knowledge management questions, such as: who works with whom, on which projects, with which customers, and in which research areas. The present work proposes a knowledge mining approach supported by information retrieval and text mining tasks, whose core is the correlation of textual elements through the LRD (Latent Relation Discovery) method. Our experiments show that LRD outperforms other correlation methods. We also present an application that demonstrates the approach in knowledge management scenarios.
Fundação para a Ciência e a Tecnologia (FCT)
Denmark's Electronic Research Library
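As a rough illustration of the correlation idea in this abstract (a simple co-occurrence count, not the LRD method itself, and with made-up entities), pairs of entities that frequently appear in the same documents can be ranked to answer questions like "who works with whom":

```python
from collections import Counter
from itertools import combinations

# Toy document collection: each document is the set of entities it mentions.
docs = [
    {"Alice", "Bob", "ProjectX"},
    {"Alice", "ProjectX", "CustomerY"},
    {"Bob", "CustomerY"},
    {"Alice", "Bob"},
]

# Count how often each entity pair co-occurs in the same document.
pair_counts = Counter()
for entities in docs:
    for pair in combinations(sorted(entities), 2):
        pair_counts[pair] += 1

# Rank pairs by co-occurrence strength: a crude answer to
# "who works with whom, on which projects, with which customers?"
ranked = pair_counts.most_common()
print(ranked)
```

Real correlation methods such as LRD go beyond raw counts, but the input/output shape is the same: entity pairs in, ranked relations out.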
Reasoning & Querying – State of the Art
Various query languages for Web and Semantic Web data have emerged in recent years, both for practical use and as an area of research in the scientific community. At the same time, the broad adoption of the internet, where keyword search is used in many applications such as search engines, has familiarized casual users with keyword queries as a way to retrieve information. Unlike this easy-to-use form of querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming to enable simple querying of semi-structured data, which is relevant, e.g., in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF.
A Survey on Retrieval of Mathematical Knowledge
We present a short survey of the literature on indexing and retrieval of mathematical knowledge, with pointers to 72 papers and tentative taxonomies of both retrieval problems and recurring techniques.
Comment: CICM 2015, 20 pages
Development of a Framework for Ontology Population Using Web Scraping in Mechatronics
One of the major challenges in engineering contexts is the efficient collection, management, and sharing of data. To address this problem, semantic technologies and ontologies are potent assets, although some tasks, such as ontology population, usually demand high maintenance effort. This thesis proposes a framework to automate data collection from sparse web resources and insert it into an ontology. In the first place, a product ontology is created based on the combination of several reference vocabularies, namely GoodRelations, the Basic Formal Ontology, the ECLASS standard, and an information model. Then, this study introduces a general procedure for developing a web scraping agent to collect data from the web. Subsequently, an algorithm based on lexical similarity measures is presented to map the collected data to the concepts of the ontology. Lastly, the collected data is inserted into the ontology. To validate the proposed solution, this thesis implements the previous steps to collect information about microcontrollers from three different websites. Finally, the thesis evaluates the use case results, draws conclusions, and suggests promising directions for future research.
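The mapping step based on lexical similarity measures can be sketched roughly as follows; this is a minimal illustration using a standard string-similarity ratio, with invented field and concept names, not the thesis's actual algorithm or vocabulary:

```python
from difflib import SequenceMatcher

# Hypothetical ontology concepts and scraped field names (illustrative only).
ontology_concepts = ["clockFrequency", "flashMemorySize", "operatingVoltage"]
scraped_fields = ["Clock Freq. (MHz)", "Flash memory", "Operating voltage (V)"]

def similarity(a: str, b: str) -> float:
    """Normalized lexical similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Map each scraped field to the most lexically similar ontology concept.
mapping = {
    field: max(ontology_concepts, key=lambda c: similarity(field, c))
    for field in scraped_fields
}
print(mapping)
```

In practice, such a mapper would also apply a minimum-similarity threshold so that fields with no good ontology counterpart are flagged for manual review rather than force-mapped.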
Keyword-Based Querying for the Social Semantic Web
Enabling non-experts to publish data on the web is an important
achievement of the social web and one of the primary goals of the social
semantic web. Making the data easily accessible in turn has received only
little attention, which is problematic from the point of view of
incentives: users are likely to be less motivated to participate in the
creation of content if the use of this content is mostly reserved to
experts.
Querying in semantic wikis, for example, is typically realized in terms of
full text search over the textual content and a web query language such as
SPARQL for the annotations. This approach has two shortcomings that limit
the extent to which data can be leveraged by users: combined queries over
content and annotations are not possible, and users either are restricted
to expressing their query intent using simple but vague keyword queries or
have to learn a complex web query language.
The work presented in this dissertation investigates a more suitable form
of querying for semantic wikis that consolidates two seemingly conflicting
characteristics of query languages, ease of use and expressiveness. This
work was carried out in the context of the semantic wiki KiWi, but the
underlying ideas apply more generally to the social semantic and social
web.
We begin by defining a simple modular conceptual model for the KiWi wiki
that enables rich and expressive knowledge representation. A component of
this model are structured tags, an annotation formalism that is simple yet
flexible and expressive, and aims at bridging the gap between atomic tags
and RDF. The viability of the approach is confirmed by a user study, which
finds that structured tags are suitable for quickly annotating evolving
knowledge and are perceived well by the users.
The main contribution of this dissertation is the design and
implementation of KWQL, a query language for semantic wikis. KWQL combines
keyword search and web querying to enable querying that scales with user
experience and information need: basic queries are easy to express; as the
search criteria become more complex, more expertise is needed to formulate
the corresponding query. A novel aspect of KWQL is that it combines both
paradigms in a bottom-up fashion. It treats neither of the two as an
extension to the other, but instead integrates both in one framework. The
language allows for rich combined queries of full text, metadata, document
structure, and informal to formal semantic annotations. KWilt, the KWQL
query engine, provides the full expressive power of first-order queries,
but at the same time can evaluate basic queries at almost the speed of the
underlying search engine. KWQL is accompanied by the visual query language
visKWQL, and an editor that displays both the textual and visual form of
the current query and reflects changes to either representation in the
other. A user study shows that participants quickly learn to construct
KWQL and visKWQL queries, even when given only a short introduction.
KWQL allows users to sift the wealth of structure and annotations in an
information system for relevant data. If relevant data constitutes a
substantial fraction of all data, ranking becomes important. To this end,
we propose PEST, a novel ranking method that propagates relevance among
structurally related or similarly annotated data. Extensive experiments,
including a user study on a real-life wiki, show that PEST improves the
quality of the ranking over a range of existing ranking approaches.
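The core idea of propagating relevance among related items can be sketched as a simple iterative diffusion over a link graph. This is a generic random-walk-style propagation with a toy graph, not the published PEST algorithm:

```python
# Toy adjacency: node -> structurally or annotation-related neighbours.
graph = {
    "pageA": ["pageB", "pageC"],
    "pageB": ["pageA"],
    "pageC": ["pageA", "pageB"],
}
relevance = {"pageA": 1.0, "pageB": 0.0, "pageC": 0.0}  # initial match scores
alpha = 0.5  # fraction of each node's relevance shared with neighbours per round

for _ in range(20):  # iterate towards a fixed point
    nxt = {n: (1 - alpha) * relevance[n] for n in graph}
    for node, neighbours in graph.items():
        share = alpha * relevance[node] / len(neighbours)
        for nb in neighbours:
            nxt[nb] += share
    relevance = nxt

# Nodes related to the matching page now carry some relevance too.
print(sorted(relevance.items(), key=lambda kv: -kv[1]))
```

The total relevance mass is conserved each round, so the loop converges to a stationary distribution in which directly matched pages stay on top but well-connected neighbours are ranked above unrelated content.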
Proceedings of the 9th International Workshop on Information Retrieval on Current Research Information Systems
Annotation-based storage and retrieval of models and simulation descriptions in computational biology
This work aimed at enhancing the reuse of computational biology models by identifying and formalizing relevant meta-information. One type of meta-information investigated in this thesis is experiment-related meta-information attached to a model, which is necessary to accurately recreate simulations. The main results are: a detailed concept for model annotation, a proposed format for the encoding of simulation experiment setups, a storage solution for standardized model representations, and the development of a retrieval concept.
The present work addressed the improved reuse of biological simulation models. Its goals were the identification and formalization of relevant model meta-information, as well as the development of suitable concepts for model storage and model retrieval. The main results of the work are a detailed model annotation concept, a proposed format for the standardized encoding of simulation experiments in XML, a storage solution for model representations, and a retrieval concept.
Automatic Extraction and Assessment of Entities from the Web
The search for information about entities, such as people or movies, plays an increasingly important role on the Web. This information is still scattered across many Web pages, making it more time-consuming for a user to find all relevant information about an entity. This thesis describes techniques to extract entities and information about these entities from the Web, such as facts, opinions, questions and answers, interactive multimedia objects, and events. The findings of this thesis are that it is possible to create a large knowledge base automatically using a manually crafted ontology. The precision of the extracted information was found to be between 75 % (facts) and 90 % (entities) after applying assessment algorithms. The algorithms from this thesis can be used to create such a knowledge base, which can be used in various research fields, such as question answering, named entity recognition, and information retrieval.
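One simple assessment heuristic for extracted facts, sketched here with invented data and not necessarily the thesis's actual algorithm, is to accept only facts that multiple independent sources agree on:

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object, source) extractions from the Web.
extractions = [
    ("Jim Carrey", "born", "1962", "site-a.example"),
    ("Jim Carrey", "born", "1962", "site-b.example"),
    ("Jim Carrey", "born", "1972", "site-c.example"),  # conflicting extraction
]

# Group extractions and record how many distinct sources support each fact.
support = defaultdict(set)
for subj, pred, obj, source in extractions:
    support[(subj, pred, obj)].add(source)

# Accept a fact only if it is supported by at least MIN_SOURCES sources.
MIN_SOURCES = 2
accepted = [fact for fact, sources in support.items()
            if len(sources) >= MIN_SOURCES]
print(accepted)
```

Thresholding on cross-source agreement trades recall for precision, which is how assessment steps like the one described can push extraction precision into the 75–90 % range reported above.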
Automated Identification and Exploitation of Market Insufficiencies in Fixed-Odds Betting Markets
In this Master's thesis, we study the possibility of automated identification and exploitation of market insufficiencies in less explored on-line markets. We define the problem of market inefficiency, determine the desired market conditions, and finally examine whether insufficiencies exist that can offer arbitrage opportunities in on-line prediction markets. In the context of this thesis, we developed a software system named Delphi that is able to identify such arbitrage opportunities in large volumes of data in relatively short time. Delphi is a web data extraction and classification system capable of dealing with the heterogeneity of market information and producing classified, suitably structured data for processing. Finally, the problem is reduced to a mathematical model able to understand the market data, evaluate the market conditions, and identify arbitrage opportunities.
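The classic arbitrage condition in fixed-odds markets can be sketched as follows; this is an illustrative textbook check with made-up odds, not Delphi's actual mathematical model: if the best decimal odds per outcome, taken across bookmakers, have implied probabilities summing to less than 1, a risk-free profit exists.

```python
# Best decimal odds per outcome, each possibly from a different bookmaker
# (illustrative numbers).
best_odds = {"home": 2.10, "draw": 3.60, "away": 4.20}

# Sum of implied probabilities; a value below 1 signals an arbitrage.
margin = sum(1 / o for o in best_odds.values())
print(f"sum of implied probabilities: {margin:.4f}")

if margin < 1:
    bankroll = 100.0
    # Stake each outcome proportionally to its implied probability, so the
    # payout is identical no matter which outcome occurs.
    stakes = {k: bankroll * (1 / o) / margin for k, o in best_odds.items()}
    payout = bankroll / margin
    print(f"stakes: {stakes}")
    print(f"guaranteed payout: {payout:.2f} on a bankroll of {bankroll:.2f}")
```

The practical difficulty, and the reason a system like Delphi is needed, is collecting and normalizing odds from heterogeneous sources fast enough, since such windows close quickly.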