Term-driven E-Commerce
This thesis addresses the textual dimension of e-commerce. Its basic hypothesis is that information and transactions in electronic commerce are bound to text. Wherever products and services are offered, requested, perceived and evaluated, natural-language expressions are used. It follows, on the one hand, that capturing the variance of textual descriptions in e-commerce is important; on the other hand, the extensive textual resources that accumulate during e-commerce interactions can be drawn on to gain a better understanding of natural language.
Applying Wikipedia to Interactive Information Retrieval
There are many opportunities to improve the interactivity of information retrieval systems beyond the ubiquitous search box. One idea is to use knowledge bases—e.g. controlled vocabularies, classification schemes, thesauri and ontologies—to organize, describe and navigate the information space. These resources are popular in libraries and specialist collections, but have proven too expensive and narrow to be applied to everyday webscale search. Wikipedia has the potential to bring structured knowledge into more widespread use. This online, collaboratively generated encyclopaedia is one of the largest and most consulted reference works in existence. It is broader, deeper and more agile than the knowledge bases put forward to assist retrieval in the past. Rendering this resource machine-readable is a challenging task that has captured the interest of many researchers. Many see it as a key step required to break the knowledge acquisition bottleneck that crippled previous efforts. This thesis claims that the roadblock can be sidestepped: Wikipedia can be applied effectively to open-domain information retrieval with minimal natural language processing or information extraction. The key is to focus on gathering and applying human-readable rather than machine-readable knowledge. To demonstrate this claim, the thesis tackles three separate problems: extracting knowledge from Wikipedia; connecting it to textual documents; and applying it to the retrieval process. First, we demonstrate that a large thesaurus-like structure can be obtained directly from Wikipedia, and that accurate measures of semantic relatedness can be efficiently mined from it. Second, we show that Wikipedia provides the necessary features and training data for existing data mining techniques to accurately detect and disambiguate topics when they are mentioned in plain text. 
Third, we provide two systems and user studies that demonstrate the utility of the Wikipedia-derived knowledge base for interactive information retrieval.
Thirty Musts for Meaning Banking
Meaning banking--creating a semantically annotated corpus for the purpose of
semantic parsing or generation--is a challenging task. It is quite simple to
come up with a complex meaning representation, but it is hard to design a
simple meaning representation that captures many nuances of meaning. This paper
lists some lessons learned in nearly ten years of meaning annotation during the
development of the Groningen Meaning Bank (Bos et al., 2017) and the Parallel
Meaning Bank (Abzianidze et al., 2017). The paper's format is rather
unconventional: there is no explicit related work, no methodology section, no
results, and no discussion (and the current snippet is not an abstract but
actually an introductory preface). Instead, its structure is inspired by work
of Traum (2000) and Bender (2013). The list starts with a brief overview of the
existing meaning banks (Section 1) and the rest of the items are roughly
divided into three groups: corpus collection (Sections 2 and 3), annotation
methods (Sections 4-11), and design of meaning representations (Sections 12-30).
We hope this overview will give inspiration and guidance in creating improved
meaning banks in the future.
Comment: https://www.aclweb.org/anthology/W19-3302
Web Usage Mining Architecture and Applications
WEBMINER is a system that implements parts of this general architecture. The first part is the domain-dependent application. The second part is the domain-independent application. This includes pattern discovery and analysis as part of the system's data mining engine. The overall architecture for the Web mining process is depicted below.
Mining Meaning from Wikipedia
Wikipedia is a goldmine of information; not just for its many readers, but
also for the growing community of researchers who recognize it as a resource of
exceptional scale and utility. It represents a vast investment of manual effort
and judgment: a huge, constantly evolving tapestry of concepts and relations
that is being applied to a host of tasks.
This article provides a comprehensive description of this work. It focuses on
research that extracts and makes use of the concepts, relations, facts and
descriptions found in Wikipedia, and organizes the work into four broad
categories: applying Wikipedia to natural language processing; using it to
facilitate information retrieval; using it for information extraction; and
using it as a resource for ontology building. The article addresses how
Wikipedia is being used as is,
how it is being improved and adapted, and how it is being combined with other
structures to create entirely new resources. We identify the research groups
and individuals involved, and how their work has developed in the last few
years. We provide a comprehensive list of the open-source software they have
produced.
Comment: An extensive survey of re-using information in Wikipedia in natural
language processing, information retrieval and extraction and ontology
building. Accepted for publication in International Journal of Human-Computer
Studies
Creation and extension of ontologies for describing communications in the context of organizations
Thesis submitted to Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa, in partial fulfillment of the requirements for the degree of Master in Computer Science.
The use of ontologies is now a sufficiently mature and solid field of work to be considered an efficient alternative for knowledge representation. With the continued growth of the Semantic Web, this alternative can be expected to gain even more prominence in the near future.
In the context of a collaboration established between FCT-UNL and the R&D department of a national software company, a new solution entitled ECC – Enterprise Communications Center was developed. This application manages the communications that enter, leave or are made within an organization, and includes intelligent classification of communications and conceptual search techniques in a communications repository. As specificity may be the key to obtaining acceptable results with these processes, the use of ontologies becomes crucial to represent the existing knowledge about the specific domain of an organization.
This work yielded a core set of ontologies capable of expressing the general context of the communications made in an organization, and a methodology, based on a series of concrete steps, for extending the ontologies to any business domain. Applying these steps minimizes the conceptualization and setup effort in new organizations and business domains.
The adequacy of the chosen core set of ontologies and of the specified methodology is demonstrated in this thesis by their application to a real case study, which allowed us to work with the different types of sources considered in the methodology and the activities that support its construction and evolution.