826 research outputs found
Semantic and Syntactic Matching of Heterogeneous e-Catalogues
In e-procurement, companies use e-catalogues to exchange product information with business partners. Matching e-catalogues with product requests helps suppliers identify the best business opportunities in B2B e-Marketplaces. However, the various ways of specifying products and the large variety of e-catalogue formats used by different business actors make this matching difficult.
This Ph.D. thesis aims to discover potential syntactic and semantic relationships among product data in procurement documents and to exploit them to find similar e-catalogues. Using a Concept-based Vector Space Model, product data and their semantic interpretation are used to find correlations among product data. To identify important terms in procurement documents, standard e-catalogues and e-tenders are used as a resource to train a Product Named Entity Recognizer that finds B2B product mentions in e-catalogues.
The proposed approach makes it possible to use the benefits of all available semantic resources and schemas without depending on any specific assumption. The solution can serve as a B2B product search system in e-Procurement platforms and e-Marketplaces.
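The idea of comparing product data in a concept space can be illustrated with a minimal sketch. It is not the thesis's implementation: the term-to-concept map, the function names, and the sample data are all hypothetical, and plain cosine similarity over concept counts stands in for whatever weighting the Concept-based Vector Space Model actually uses.

```python
import math
from collections import Counter

# Hypothetical term-to-concept map (an assumption for illustration); a real
# system would derive it from product classification schemas or ontologies.
TERM_TO_CONCEPT = {
    "laptop": "computer",
    "notebook": "computer",
    "pc": "computer",
    "printer": "printing_device",
    "toner": "printing_supply",
    "cartridge": "printing_supply",
}


def concept_vector(text):
    """Map each token to its concept and count occurrences.

    Tokens without a known concept are kept as their own dimension.
    """
    tokens = text.lower().split()
    return Counter(TERM_TO_CONCEPT.get(tok, tok) for tok in tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_catalogues(request, catalogues):
    """Return (score, name) pairs, most similar catalogue first."""
    req = concept_vector(request)
    scored = [(cosine(req, concept_vector(text)), name)
              for name, text in catalogues.items()]
    return sorted(scored, reverse=True)
```

With this map, a request for "laptop toner" and a catalogue entry "notebook cartridge" share no surface terms but land on the same two concepts, which is the effect a purely syntactic match would miss.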
Improving average ranking precision in user searches for biomedical research datasets
Availability of research datasets is a keystone for health and life science
study reproducibility and scientific progress. Due to the heterogeneity and
complexity of these data, a main challenge to be overcome by research data
management systems is to provide users with the best answers for their search
queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we
investigate a novel ranking pipeline to improve the search of datasets used in
biomedical experiments. Our system comprises a query expansion model based on
word embeddings, a similarity measure algorithm that takes into consideration
the relevance of the query terms, and a dataset categorisation method that
boosts the rank of datasets matching query constraints. The system was
evaluated using a corpus with 800k datasets and 21 annotated user queries. Our
system provides competitive results when compared to the other challenge
participants. In the official run, it achieved the highest infAP among the
participants, 22.3% higher than the median infAP of the participants'
best submissions. Overall, it ranks in the top 2 if an aggregated metric using
the best official measures per participant is considered. The query expansion
method had a positive impact on the system's performance, improving on our
baseline by up to 5.0% and 3.4% for the infAP and infNDCG metrics, respectively.
Our similarity measure algorithm seems to be robust, in particular compared to
the Divergence From Randomness framework, having smaller performance variations
under different training conditions. Finally, the result categorization did not
have a significant impact on the system's performance. We believe that our
solution could be used to enhance biomedical dataset management systems. In
particular, the use of data-driven query expansion methods could be an
alternative to the complexity of biomedical terminologies.
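A query expansion step based on word embeddings might look roughly like the sketch below. The abstract does not say how expansion terms are selected, so the nearest-neighbour rule, the `k` and `min_sim` parameters, and the toy three-dimensional vectors are all assumptions made for illustration.

```python
import math

# Toy embeddings invented for this example; a real system would load vectors
# trained on a biomedical corpus.
EMBEDDINGS = {
    "cancer": (0.9, 0.1, 0.0),
    "tumor": (0.85, 0.15, 0.05),
    "carcinoma": (0.8, 0.2, 0.1),
    "gene": (0.1, 0.9, 0.1),
    "protein": (0.2, 0.8, 0.3),
    "mouse": (0.0, 0.2, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def expand_query(terms, k=2, min_sim=0.95):
    """Append up to k embedding neighbours per query term.

    A neighbour is added only if its cosine similarity to the query term is
    at least min_sim. Terms without an embedding are left unexpanded.
    """
    expanded = list(terms)
    for term in terms:
        if term not in EMBEDDINGS:
            continue
        neighbours = sorted(
            ((cosine(EMBEDDINGS[term], vec), word)
             for word, vec in EMBEDDINGS.items()
             if word != term and word not in expanded),
            reverse=True,
        )
        expanded.extend(word for sim, word in neighbours[:k] if sim >= min_sim)
    return expanded
```

With these toy vectors, `expand_query(["cancer"])` adds "tumor" and "carcinoma", while unrelated terms such as "gene" fall below the similarity threshold. The threshold matters because expansion that is too permissive tends to hurt precision.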
Enriching ontological user profiles with tagging history for multi-domain recommendations
Many advanced recommendation frameworks employ ontologies of various complexities to model individuals and items, providing a mechanism for the expression of user interests and the representation of item attributes. As a result, complex matching techniques can be applied to support individuals in the discovery of items according to explicit and implicit user preferences. Recently, the rapid adoption of Web 2.0 and the proliferation of social networking sites have resulted in more and more users providing an increasing amount of information about themselves that could be exploited for recommendation purposes. However, the unification of personal information with ontologies using the contemporary knowledge representation methods often associated with Web 2.0 applications, such as community tagging, is a non-trivial task. In this paper, we propose a method for the unification of tags with ontologies by grounding tags to a shared representation in the form of WordNet and Wikipedia. We incorporate individuals' tagging history into their ontological profiles by matching tags with ontology concepts. This approach is preliminarily evaluated by extending an existing news recommendation system with user tagging histories harvested from popular social networking sites.
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting of the manual
definition of a classifier by domain experts) are very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.
Comment: Accepted for publication in ACM Computing Surveys
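The inductive process described in this abstract, building a classifier from preclassified documents, can be sketched in a few lines. The example uses a multinomial Naive Bayes learner with add-one smoothing, chosen here only because it is compact; the survey covers many learners and this is not presented as its recommended method. Function names and the whitespace tokenizer are assumptions.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    """Learn per-category word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_docs:
        tokens = text.lower().split()  # bag-of-words document representation
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the category with the highest log-posterior for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior of the category.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothed log likelihood of the word.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Trained on a handful of labelled sentences about finance and sport, `classify` assigns a new sentence to whichever category makes its words most probable. No expert-written rules are involved, which is the portability advantage the abstract refers to.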
Proceedings ICPW'07: 2nd International Conference on the Pragmatic Web, 22-23 Oct. 2007, Tilburg: NL
Natural Language Processing for Teaching Ancient Languages
Konstantin Schulz shows various applications of natural language processing (NLP) to the field of Classics, especially to Latin texts. He addresses different levels of linguistic analysis while also highlighting educational benefits and important theoretical pitfalls, especially in vocabulary learning. NLP can solve some problems reasonably well, like tailoring exercises to the learners' current state of knowledge. However, some tasks still prove to be too difficult for machines at the moment, e.g. reliable and highly accurate parsing of syntax for historical languages.