A Conceptual Representation of Documents and Queries for Information Retrieval Systems by Using Light Ontologies
This article presents a vector space model approach to representing documents and queries that is based on concepts instead of terms and uses WordNet as a light ontology. Such a representation reduces information overlap compared with classic semantic expansion techniques. Experiments carried out on the MuchMore benchmark and on the TREC-7 and TREC-8 Ad-hoc collections demonstrate the effectiveness of the proposed approach.
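A minimal sketch of the concept-based idea follows. It assumes a hypothetical hand-written term-to-concept table (`TERM_TO_CONCEPT`) in place of WordNet, and raw counts in place of whatever weighting the article actually uses. Synonymous terms collapse onto one concept dimension before cosine similarity is computed.

```python
import math
from collections import Counter

# Hypothetical toy table standing in for WordNet: synonymous terms share one
# concept ID (IDs invented for illustration, not real synset names).
TERM_TO_CONCEPT = {
    "car": "C_vehicle",
    "automobile": "C_vehicle",
    "engine": "C_engine",
    "motor": "C_engine",
    "repair": "C_repair",
    "fix": "C_repair",
}


def term_vector(text: str) -> Counter:
    """Classic term-based representation: one dimension per surface term."""
    return Counter(text.lower().split())


def concept_vector(text: str) -> Counter:
    """Concept-based representation: each known term is replaced by its concept;
    unknown terms fall back to their own surface form."""
    return Counter(TERM_TO_CONCEPT.get(tok, tok) for tok in text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

With these toy mappings, the query `car motor fix` and the document `automobile engine repair` share no terms, so their term-based cosine is 0.0. Their concept vectors are identical, however, so the concept-based cosine is 1.0 up to floating-point rounding.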
Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon
This paper proposes to advance the current state of the art of automatic Language Resource (LR) building by taking three elements into consideration: (i) the knowledge available in existing LRs, (ii) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) is extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses interoperability, an important problem currently affecting the Computational Linguistics area, by using the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, to check the usefulness of the constructed resource, we apply it to a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system’s accuracy by 28.1%. Compared to previous approaches to building NE repositories, the current proposal represents a step forward in terms of automation, language independence, number of NEs acquired and richness of the information represented.
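The lexicon is encoded with ISO LMF. As an illustration only, the sketch below uses Python's standard `xml.etree.ElementTree` to build one LMF-style lexicon following the `feat att/val` convention of LMF XML serializations. The `build_ne_lexicon` helper, the ID scheme and the `ontologyClass` attribute linking a sense to an ontology class are hypothetical and do not reproduce the paper's actual schema.

```python
import xml.etree.ElementTree as ET


def feat(parent: ET.Element, att: str, val: str) -> ET.Element:
    """Add an LMF-style attribute/value pair: <feat att="..." val="..."/>."""
    return ET.SubElement(parent, "feat", att=att, val=val)


def build_ne_lexicon(language: str, entries: list) -> ET.Element:
    """Build a LexicalResource with one Lexicon of named entities.

    entries: list of (written_form, ontology_class) tuples (hypothetical input).
    """
    resource = ET.Element("LexicalResource")
    lexicon = ET.SubElement(resource, "Lexicon")
    feat(lexicon, "language", language)
    for i, (form, onto_class) in enumerate(entries):
        entry = ET.SubElement(lexicon, "LexicalEntry", id=f"{language}_ne_{i}")
        feat(entry, "partOfSpeech", "properNoun")
        lemma = ET.SubElement(entry, "Lemma")
        feat(lemma, "writtenForm", form)
        sense = ET.SubElement(entry, "Sense", id=f"{language}_ne_{i}_s1")
        # Hypothetical link from the NE sense to an ontology class (e.g. SUMO).
        feat(sense, "ontologyClass", onto_class)
    return resource
```

For example, `ET.tostring(build_ne_lexicon("eng", [("Barcelona", "City")]), encoding="unicode")` serializes a one-entry lexicon. A real LMF encoding would also link each sense to its WordNet or Parole-Simple-Clips source entry.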
Microtheories for SDI - Accounting for diversity of local conceptualisations at a global level
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
The categorization and conceptualization of geographic features is fundamental to cartography,
geographic information retrieval, routing applications, spatial decision support
and data sharing in general. However, there is no standard conceptualization of
the world. Humans conceptualize features based on numerous factors including cultural
background, knowledge, motivation and particularly space and time. Thus, geographic
features are prone to multiple, context-dependent conceptualizations reflecting local
conditions. This creates semantic heterogeneity and undermines interoperability. Standardization
of a shared definition is often employed to overcome semantic heterogeneity.
However, this approach loses important local diversity in feature conceptualizations and
may result in feature definitions which are too broad or too specific. This work proposes
the use of microtheories in Spatial Data Infrastructures, such as INSPIRE, to account
for diversity of local conceptualizations while maintaining interoperability at a global
level. It introduces a novel method of structuring microtheories based on space and
time, represented by administrative boundaries, to reflect variations in feature conceptualization.
A bottom-up approach, based on non-standard inference, is used to create
an appropriate global-level feature definition from the local definitions. Conceptualizations
of rivers, forests and estuaries throughout Europe are used to demonstrate how
the approach can improve the INSPIRE data model and ease its adoption by European
member states.
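The bottom-up step can be illustrated with one non-standard inference, the least common subsumer (LCS), under a strong simplifying assumption: each local definition is a plain conjunction of atomic properties. In that case the LCS is simply the conjunction of the atoms shared by all local definitions. The microtheory keys (administrative unit, period) and all properties below are hypothetical.

```python
from functools import reduce

# Hypothetical microtheories keyed by (administrative unit, validity period).
# Each defines feature types as conjunctions of atomic properties.
MICROTHEORIES = {
    ("ES", "2010-"): {"River": {"NaturalWatercourse", "FlowsToSea", "Perennial"}},
    ("PT", "2010-"): {"River": {"NaturalWatercourse", "FlowsToSea"}},
    ("NL", "2010-"): {"River": {"NaturalWatercourse", "Perennial", "Navigable"}},
}


def global_definition(feature_type: str, microtheories: dict) -> set:
    """LCS for conjunctions of atomic concepts: keep only the atoms that
    every local definition of the feature type contains."""
    local_defs = [mt[feature_type] for mt in microtheories.values() if feature_type in mt]
    if not local_defs:
        return set()
    # Copy so callers cannot mutate a microtheory through the result.
    return set(reduce(set.intersection, local_defs))
```

Here the global "River" definition reduces to `{"NaturalWatercourse"}`. Each local microtheory keeps its richer definition, which is the point of the approach: the global level stays interoperable without erasing local diversity.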
Definitions in ontologies
Definitions vary according to their context of use and target audience. They must be made relevant to each context to fulfill their cognitive and linguistic goals, which involves adapting their logical structure, type of content and form to each context of use. From these perspectives, we examine the case of definitions in ontologies.
Challenges to knowledge representation in multilingual contexts
To meet the increasing demands of complex inter-organizational processes and the need for
continuous innovation and internationalization, new forms of organisation are being adopted that
foster more intensive collaboration and sharing of resources, in what can be called collaborative
networks (Camarinha-Matos, 2006:03). Information and knowledge are crucial resources in
collaborative networks, and managing them is a fundamental process to optimize.
Knowledge organisation and collaboration systems are thus important instruments for the success of
collaborative networks of organisations, and over the last decade they have been researched in
computer science, information science, management sciences, terminology and linguistics.
Nevertheless, research in this area has paid little attention to multilingual contexts of
collaboration, which pose specific and challenging problems. Access to and representation of
knowledge will increasingly take place in multilingual settings, which requires overcoming the
difficulties inherent in the presence of multiple languages through processes such as the
localization of ontologies.
Although localization, like other processes that involve multilingualism, is a well-developed
practice whose methodologies and tools are fruitfully employed by the language industry in the
development and adaptation of multilingual content, it has not yet been sufficiently explored as a
means of supporting the development of knowledge representations, in particular ontologies,
expressed in more than one language. Multilingual knowledge representation is therefore an open
research area calling for cross-contributions from knowledge engineering, terminology, ontology
engineering, cognitive sciences, computational linguistics, natural language processing, and
management sciences.
This workshop brought together researchers interested in multilingual knowledge representation in a
multidisciplinary environment to debate the possibilities of cross-fertilization between knowledge
engineering, terminology, ontology engineering, cognitive sciences, computational linguistics,
natural language processing, and management sciences applied to contexts where multilingualism
continuously creates new and demanding challenges for current knowledge representation methods
and techniques.
In this workshop six papers dealing with different approaches to multilingual knowledge
representation are presented, most of them describing tools, approaches and results obtained in the
development of ongoing projects.
In the first paper, Andrés Domínguez Burgos, Koen Kerremans and Rita Temmerman present
Termontospider, a software module that is part of a workbench for terminological and ontological
mining. Termontospider is a wiki crawler that aims to traverse Wikipedia optimally in search of
domain-specific texts from which terminological and ontological information can be extracted. The
crawler is part of a tool suite for automatically developing multilingual termontological
databases, i.e. ontologically underpinned multilingual terminological databases. In this paper, the
authors describe the basic principles behind the crawler and summarize the research setting in
which the tool is currently being tested.
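The summary does not describe Termontospider's traversal strategy. As a hedged illustration of the general idea, here is a generic best-first focused crawler over a small in-memory link graph. `PAGES`, `LINKS` and `DOMAIN_TERMS` are hypothetical stand-ins for Wikipedia content and a domain vocabulary. Each page is scored by the share of its tokens that are domain terms, and links are followed only from pages above a threshold.

```python
import heapq

# Hypothetical in-memory stand-ins for Wikipedia pages and their links.
PAGES = {
    "Solar_energy": "solar energy photovoltaic panel converts sunlight",
    "Photovoltaics": "photovoltaic cell converts sunlight into electricity",
    "Sun": "the sun is a star",
    "Star": "a star is a luminous sphere of plasma",
}
LINKS = {
    "Solar_energy": ["Photovoltaics", "Sun"],
    "Photovoltaics": ["Solar_energy"],
    "Sun": ["Star"],
}
DOMAIN_TERMS = {"solar", "photovoltaic", "sunlight", "electricity", "energy", "panel", "cell"}


def relevance(text: str, domain_terms: set) -> float:
    """Fraction of tokens in the page that are domain terms."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(tok in domain_terms for tok in tokens) / len(tokens)


def focused_crawl(seed: str, domain_terms: set, max_pages: int = 10,
                  threshold: float = 0.2) -> list:
    """Best-first crawl; returns relevant pages in the order they were visited."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so scores are negated
    visited, collected = set(), []
    while frontier and len(visited) < max_pages:
        _, page = heapq.heappop(frontier)
        if page in visited:
            continue
        visited.add(page)
        score = relevance(PAGES.get(page, ""), domain_terms)
        if score >= threshold:
            collected.append(page)
            # Expand only relevant pages; children inherit the parent's score.
            for nxt in LINKS.get(page, []):
                if nxt not in visited:
                    heapq.heappush(frontier, (-score, nxt))
    return collected
```

Starting from `Solar_energy`, the crawler collects `Solar_energy` and `Photovoltaics`. It visits `Sun` but rejects it as off-domain, so it never reaches `Star`.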
In the second paper, Fumiko Kano presents work comparing four feature-based similarity measures
derived from the cognitive sciences. The purpose of the comparative analysis is to identify the model
that is potentially most effective for mapping independent ontologies in a culturally influenced
domain. To this end, datasets based on standardized, pre-defined feature dimensions and values,
obtainable from the UNESCO Institute for Statistics (UIS), were used to compare the similarity
measures, so that they are evaluated on objectively developed datasets. According to the author,
the results demonstrate that the Bayesian Model of Generalization provides the most effective
cognitive model for identifying the most similar corresponding concepts existing for a targeted
socio-cultural community.
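The four measures are not named in this summary. For orientation, the sketch implements two feature-based measures under stated assumptions. The first is Tversky's ratio model, a classic feature-based similarity measure. The second is a simplified Bayesian generalization model with three simplifications: each feature's extension (the set of objects having it) is a hypothesis, priors are uniform, and strong sampling weights each hypothesis by 1/|extension|. The objects and features are hypothetical, not UIS data.

```python
# Hypothetical objects (e.g. national education systems) described by features.
OBJECTS = {
    "A": {"compulsory_9y", "free_tuition", "entry_age_6"},
    "B": {"compulsory_9y", "free_tuition", "entry_age_7"},
    "C": {"compulsory_12y", "entry_age_6"},
}


def tversky_ratio(a: set, b: set, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky's ratio model with feature-set size as the salience measure."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


def bayesian_generalization(x: str, y: str, objects: dict) -> float:
    """Probability that y shares a consequential region with x, where the
    hypotheses are the extensions of x's features (uniform prior, strong sampling)."""
    def extension(feature):
        return {o for o, feats in objects.items() if feature in feats}

    # Unnormalized posterior p(h|x) is proportional to p(h) * 1/|h|; uniform p(h) drops out.
    weights = {f: 1.0 / len(extension(f)) for f in objects[x]}
    total = sum(weights.values())
    shared = sum(w for f, w in weights.items() if f in objects[y])
    return shared / total if total else 0.0
```

On this toy data both measures rank B as more similar to A than C is. The Bayesian variant additionally down-weights broad features shared by many objects, because their extensions are large.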
In another presentation, Thierry Declerck, Hans-Ulrich Krieger and Dagmar Gromann present ongoing
work on the automatic extraction of information from multilingual financial Web resources, aimed at
providing candidate terms for building ontology elements or instances of ontology concepts. The
authors propose an approach complementary to the direct localization/translation of ontology labels:
terminologies are acquired by accessing and harvesting the multilingual Web presences of structured
information providers in the field of finance. This leads to the detection of candidate terms in
various multilingual sources in the financial domain, which can be used not only as labels of
ontology classes and properties but possibly also for generating (multilingual) domain ontologies
themselves.
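Suppose a provider publishes the same structured financial data in several languages and reuses stable field identifiers across them. Under that assumption (and the hypothetical records below), candidate multilingual terms can be harvested by grouping labels on the shared identifiers. This is a sketch of the general idea, not the authors' extraction pipeline.

```python
from collections import defaultdict

# Hypothetical harvested records: one structured report published by the same
# provider in several languages, with field IDs shared across languages.
HARVESTED = [
    {"lang": "en", "fields": {"f01": "Total assets", "f02": "Net income"}},
    {"lang": "de", "fields": {"f01": "Summe der Aktiva", "f02": "Jahresüberschuss"}},
    {"lang": "es", "fields": {"f01": "Total activo"}},
]


def candidate_terms(records: list, min_langs: int = 2) -> dict:
    """Group labels by field ID; each group is a candidate multilingual term set,
    kept only if it is attested in at least `min_langs` languages."""
    by_field = defaultdict(dict)
    for rec in records:
        for field_id, label in rec["fields"].items():
            by_field[field_id][rec["lang"]] = label
    return {fid: labels for fid, labels in by_field.items() if len(labels) >= min_langs}
```

Each resulting group, such as the English, German and Spanish labels for `f01`, could then be proposed as multilingual labels for a single ontology class or property.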
In the next paper, Manuel Silva, António Lucas Soares and Rute Costa argue that, despite the
availability of tools, resources and techniques aimed at the construction of ontological artifacts,
developing a shared conceptualization of a given reality still raises questions about the principles
and methods that support the initial phases of conceptualization. According to the authors, these
questions become more complex when the conceptualization occurs in a multilingual setting. To tackle
these issues, the authors present conceptME, a collaborative platform in which terminological and
knowledge representation processes support domain experts throughout a conceptualization
framework. The platform allows the inclusion of multilingual data as a way to promote knowledge
sharing, enhance conceptualization and support a multilingual ontology specification.
In another presentation, Frieda Steurs and Hendrik J. Kockaert present TermWise, a large project
dealing with legal terminology and phraseology for the Belgian public services, namely the translation
office of the Ministry of Justice. The project aims to develop an advanced tool that incorporates
expert knowledge into the algorithms that extract specialized language from textual data (legal
documents). Its outcome is a knowledge database of Dutch/French equivalents for legal concepts,
enriched with the phraseology related to those terms.
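A concept-oriented entry holding Dutch/French equivalents and their phraseology could be modeled as below. This is a hypothetical data structure (`LegalConcept` and `TermBase` are invented names), not the actual TermWise database design.

```python
from dataclasses import dataclass, field


@dataclass
class LegalConcept:
    concept_id: str
    terms: dict[str, str]  # language code -> preferred term
    phraseology: dict[str, list[str]] = field(default_factory=dict)  # language -> collocations


class TermBase:
    """Concept-oriented store: every language's term points to the same concept."""

    def __init__(self) -> None:
        self._by_term: dict[tuple[str, str], LegalConcept] = {}

    def add(self, concept: LegalConcept) -> None:
        for lang, term in concept.terms.items():
            self._by_term[(lang, term.lower())] = concept

    def equivalent(self, term: str, src: str, tgt: str):
        """Return (target-language term, target-language phraseology), or None."""
        concept = self._by_term.get((src, term.lower()))
        if concept is None:
            return None
        return concept.terms.get(tgt), concept.phraseology.get(tgt, [])
```

For example, after adding a concept with the terms `vonnis` (nl) and `jugement` (fr), looking up `vonnis` from Dutch to French returns `jugement` together with any French collocations stored for it, such as `prononcer un jugement`.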
Finally, Deborah Grbac, Luca Losito, Andrea Sada and Paolo Sirito report on the preliminary
results of an ongoing pilot project at the UCSC Central Library, in which they propose adapting the
model used by translators working within European Union institutions to subject librarians employed
in large, multilingual academic institutions. The authors use User Experience (UX) analysis to
provide subject librarians with visual support by means of "ontology tables" that depict conceptual
links and the connections of words with concepts, presented according to their semantic and
linguistic meaning.
The organizers hope that the selection of papers presented here will be of interest to a broad audience and will be a starting point for further discussion and cooperation.
Types and taxonomic structures in conceptual modeling: A novel ontological theory and engineering support
Types are fundamental to conceptual modeling and knowledge representation, and they are an essential construct in all major modeling languages in these fields. Despite that, from an ontological and cognitive point of view, there has been a lack of theoretical support for precisely defining a consensual view on types. As a consequence, there has been a lack of precise methodological support for users when choosing the best way to model general terms representing types that appear in a domain, and for building sound taxonomic structures involving them. For over a decade now, a community of researchers has contributed to the development of the Unified Foundational Ontology (UFO), aimed at providing foundations for all major conceptual modeling constructs. At the core of this enterprise, there has been a theory of types specially designed to address these issues. This theory is ontologically well-founded, psychologically informed, and formally characterized. These results have led to the development of a conceptual modeling language dubbed OntoUML, reflecting the ontological micro-theories comprising UFO. Over the years, UFO and OntoUML have been successfully employed in conceptual model design in a variety of domains, including academic, industrial, and governmental settings. These experiences exposed improvement opportunities for both the OntoUML language and its underlying theory, UFO. In this paper, we revise the theory of types in UFO in response to empirical evidence. The new version of this theory shows that many of OntoUML's meta-types (e.g. kind, role, phase, mixin) should not be considered as restricted to substantial types but should instead be applied to model endurant types in general, including relator types, quality types, and mode types. We also contribute a formal characterization of this fragment of the theory, which is then used to advance a new metamodel for OntoUML (termed OntoUML 2).
To demonstrate that the benefits of this approach extend beyond OntoUML, the proposed formal theory is then employed to support the definition of UFO-based lightweight Semantic Web ontologies with ontological constraint checking in OWL. Additionally, we report on empirical evidence from the literature, mainly from cognitive psychology but also from linguistics, supporting some of the key claims made by this theory. Finally, we propose computational support for this updated metamodel.
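Some of the classic OntoUML taxonomy rules that follow from UFO's theory of types can be checked mechanically:

- a rigid or semi-rigid type cannot specialize an anti-rigid one;
- a non-sortal cannot specialize a sortal;
- every sortal must fall under exactly one kind.

The sketch below covers only these rules, and only for the classic substantial-type stereotypes. It does not implement the OntoUML 2 metamodel, which extends the meta-types to relator, quality and mode types. The stereotype strings and the input format are assumptions.

```python
# Rigidity and sortality of classic OntoUML stereotypes (assumed string names).
RIGIDITY = {"kind": "rigid", "subkind": "rigid", "role": "anti", "phase": "anti",
            "category": "rigid", "mixin": "semi", "roleMixin": "anti"}
SORTALS = {"kind", "subkind", "role", "phase"}


def kinds_above(t: str, stereotypes: dict, parents: dict) -> set:
    """All kinds reachable from t by following generalizations upward (t included)."""
    stack, seen, kinds = [t], set(), set()
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        if stereotypes[cur] == "kind":
            kinds.add(cur)
        stack.extend(parents.get(cur, []))
    return kinds


def check_taxonomy(stereotypes: dict, parents: dict) -> list:
    """stereotypes: type -> stereotype; parents: type -> direct supertypes."""
    errors = []
    for child, sups in parents.items():
        for sup in sups:
            c, p = stereotypes[child], stereotypes[sup]
            if RIGIDITY[c] != "anti" and RIGIDITY[p] == "anti":
                errors.append(f"{child} ({c}) cannot specialize anti-rigid {sup} ({p})")
            if c not in SORTALS and p in SORTALS:
                errors.append(f"non-sortal {child} cannot specialize sortal {sup}")
    for t, st in stereotypes.items():
        if st in SORTALS:
            n = len(kinds_above(t, stereotypes, parents))
            if n != 1:
                errors.append(f"sortal {t} must fall under exactly one kind, found {n}")
    return errors
```

For instance, a model with `Person` as a kind specializing the category `LivingBeing`, and with `Student` (role) and `Adult` (phase) specializing `Person`, passes all checks. Adding a subkind under `Student`, or a role with no kind above it, produces one error each.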
Using an ontology for guiding natural language interaction with knowledge based systems
Since the 1980s, knowledge-based systems (KBSs), programs that use knowledge to model situations and solve problems, have spread throughout industry, finance and science. Human communication with these systems deals with complex concepts and relationships that are not present in other software applications. Although natural language (NL) is especially appropriate for expressing these concepts, few KBSs incorporate NL interfaces.
The main reasons for this are the limited efficiency of NL interfaces (NLIs), their inadequacy for the communication needs of the applications, and the high cost of developing and maintaining them. The aim of this thesis is to study how the communication process and engineering features can be improved in NL interaction with KBSs. This study focuses on the efficient and reusable representation of the knowledge involved in NL communication with KBSs. GISE (Generador de Interfaces a Sistemas Expertos), a system supporting NL communication with KBSs, has been developed. This system adapts the general linguistic resources to application requirements in order to automatically obtain application-restricted grammars. The central feature of the system design is a separate and reusable representation of all types of knowledge involved in communication with KBSs. This knowledge consists of the application knowledge appearing in the communication, the communication tasks, the linguistic knowledge supporting their expression, and the general relationships between conceptual knowledge and its linguistic realization. Three general bases were designed to represent all this knowledge: the Conceptual Ontology (CO), the Linguistic Ontology (LO) and a set of control rules. Conceptual knowledge is represented in the CO. This conceptual knowledge includes domain and functionality issues. All knowledge required to model the applications, as well as the description of all possible communication acts, is provided in the CO. The CO is the skeleton for anchoring the domain and the functionality of the applications. The complexity of KBS performance makes a formal and explicit representation of their domain and functionality necessary. The general linguistic knowledge needed to express in NL the tasks the system performs is represented by means of the LO and a set containing all possible realizations of the application terms. The LO is domain and application independent.
The control information that relates the general linguistic knowledge to conceptual application knowledge in order to generate the application-restricted grammars is represented by a set of production rules. The modular organization of the relevant knowledge into separate data structures provides great flexibility for adapting the system to different types of applications and users. The grammars generated by GISE use expressive and precise language tuned to the application and adapted to the evolution of the communicative process. A menu system that guides the user in entering NL input is integrated into the GISE interface. GISE has been applied to two applications: SIREDOJ, an expert system (ES) in law, and a railway communication system that provides information about trains.
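The abstract does not specify GISE's rule formalism. The following hypothetical sketch shows the general idea in three steps:

1. Control rules map each application concept's linguistic-ontology class to a grammar category.
2. Lexical productions come from the realizations of the application terms.
3. General sentence patterns are kept only if the application covers every category they need.

All names and data below are invented for illustration, loosely echoing a railway domain.

```python
# Hypothetical conceptual ontology fragment: application concept -> LO class.
CONCEPTS = {"train": "object", "station": "object", "departure_time": "attribute"}
# Hypothetical realizations of the application terms.
REALIZATIONS = {"train": ["train"], "station": ["station", "stop"],
                "departure_time": ["departure time"]}
# Hypothetical control (production) rules: LO class -> grammar category.
CONTROL_RULES = {"object": "NP", "attribute": "ATTR"}
# Hypothetical general linguistic knowledge: sentence patterns over categories.
GENERAL_RULES = [
    ("S", ["what is the", "ATTR", "of the", "NP"]),
    ("S", ["list all", "NP"]),
    ("S", ["when does the", "NP", "EVENT"]),
]
NONTERMINALS = {"ATTR", "NP", "EVENT"}


def restrict_grammar(concepts: dict, realizations: dict, control_rules: dict) -> list:
    """Return application-restricted productions as (lhs, rhs) pairs."""
    # 1. Lexical productions: each realization becomes a word under its category.
    lexical = []
    for concept, lo_class in concepts.items():
        category = control_rules.get(lo_class)
        if category:
            lexical += [(category, [word]) for word in realizations.get(concept, [])]
    covered = {lhs for lhs, _ in lexical}
    # 2. Keep general sentence rules whose categories are all covered by the application.
    sentence = [(lhs, rhs) for lhs, rhs in GENERAL_RULES
                if all(sym in covered for sym in rhs if sym in NONTERMINALS)]
    return sentence + lexical
```

Because no application concept maps to `EVENT`, the third general pattern is dropped. The resulting grammar only accepts questions the application can actually answer, which is the sense in which such grammars are "application-restricted".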