Generic software for benchmarking formal concept analysis: Orange3 integration

Authors: Leutwyler Nicolas; Lezoche Mario; Panetto Hervé; Torres Diego
Publication date: 12/10/2023

Thanks to the Internet of Things (IoT) and cyber-physical systems (CPS), the amount of available data keeps growing, both on the internet and in private databases. As a result, data mining techniques have become an essential part of the information retrieval process. Moreover, trends such as Industry 4.0 encourage their use, for instance to support data-driven decisions. Formal Concept Analysis (FCA) is one of the most widely used techniques in unsupervised data mining because of its inherent ability to find patterns between concepts. Consequently, many applications need fast algorithms to compute either the lattice or the association rules for the data at their disposal. For this reason, scientists often rely on manually crafted benchmarks to compare how certain algorithms perform under different circumstances. In this work, we propose the architecture of a software tool that generalizes these benchmarks independently of the algorithms, to be integrated into the open-source data analysis software Orange3.

Servicio de Difusión de la Creación Intelectual

Formal methods for knowledge extraction and reuse from heterogeneous sources for semantic interoperability of distributed architectures

Author: Leutwyler Nicolás
Publication date: 12/10/2023

Industry, manufacturing, and agriculture are increasingly adopting Industry 4.0 practices. Additionally, the use of the Internet of Things (IoT) has grown enormously over the last decade, and companies are eager to profit from the advantages it offers. Common to these trends is the use of data as a means to increase productivity or, similarly, to minimize production losses. Along those lines, Formal Concept Analysis (FCA) is a clustering method whose output is based on patterns of concepts (sets of objects and attributes). Extensions such as Relational Concept Analysis have arisen to handle the case in which there are relations between seemingly different objects, which FCA cannot do. However, the automatic use of the conceptual data resulting from these methods is still immature in terms of formalization and usage. The goal of this Ph.D. is to expand the boundaries of knowledge regarding the existing algorithms, mainly by looking for optimizations and extending their current capabilities.
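To make the notion of a concept as a pair of object and attribute sets concrete, the following minimal sketch enumerates the formal concepts of a tiny context by brute force. It is not taken from the thesis: the function name `formal_concepts`, the helper names, and the toy animal data are hypothetical, and practical FCA algorithms avoid this exhaustive enumeration.

```python
from itertools import combinations


def formal_concepts(context):
    """Return all (extent, intent) pairs of a small formal context.

    `context` maps each object to the set of attributes it has.
    Illustrative brute force only: it closes every subset of objects.
    """
    objects = set(context)
    attributes = set().union(*context.values()) if context else set()

    def common_attributes(objs):
        # Attributes shared by every object in objs (all attributes if objs is empty).
        shared = set(attributes)
        for obj in objs:
            shared &= context[obj]
        return shared

    def objects_having(attrs):
        # Objects that possess every attribute in attrs.
        return {obj for obj in objects if attrs <= context[obj]}

    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(sorted(objects), size):
            intent = common_attributes(subset)
            extent = objects_having(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Hypothetical toy context: objects and the attributes they have.
toy_context = {
    "frog": {"aquatic", "terrestrial"},
    "fish": {"aquatic"},
    "dog": {"terrestrial"},
}

for extent, intent in sorted(formal_concepts(toy_context), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each printed pair is closed in both directions: the attributes are exactly those shared by the listed objects, and the objects are exactly those having all the listed attributes.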

Servicio de Difusión de la Creación Intelectual

Description Logic for Scene Understanding at the Example of Urban Road Intersections

Author: Hummel Britta
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2009

Understanding a natural scene on the basis of external sensors is a task yet to be solved by computer algorithms. The present thesis investigates the suitability of a particular family of explicit, formal representation and reasoning formalisms for this task, which are subsumed under the term Description Logic

KITopen

Towards A Powerful Knowledge Database To Think Outside The Box And Select Multi-Purpose Plants

Authors: Jean Silvie Pierre; Martin Pierre
Publication date: 01/01/2021

Knowledge on plant uses is often compartmentalized into fields. Our hypothesis is that if a plant species allows several uses of interest, in different fields, then its cultivation and formulation may be of interest to new value chains. Establishing a knowledge database on multi-purpose plants is one way of facilitating interactions between disciplines, in addition to proposing new solutions to be explored in organic or ecological farming. This paper describes how to construct such a Knowledge Database. A comparison between the uses present in the different tabs makes it possible to identify 16 plant species reported for at least four different uses

Organic Eprints

Formalisation and evaluation of focus theories for requirements elicitation dialogues in natural language

Author: Lecoeuche Renaud
Publication venue: The University of Edinburgh
Publication date: 01/01/2000

Requirements engineering is an important part of software engineering. It consists in defining the needs of users when building a new system. These needs may be functional, i.e., what services the system should be able to provide, as well as non-functional, i.e., under which constraints the system should operate. Errors in requirements may have disastrous effects in the rest of the software engineering process (Brooks 1995, p.199), since they would lead to the construction of a system of little interest to its users or would require expensive modifications to correct. Because requirements documents may be very large, errors are usually hard to detect manually. Computer support is therefore often beneficial for their analysis. This is made easier if requirements are expressed formally. However, this support must also be adapted to and be usable by people who are expressing their requirements. These people are usually not computer specialists and are not accustomed to using formal languages. It is therefore necessary to help them express their requirements. Numerous approaches have been suggested as aids to the acquisition of requirements (Reubenstein 1990). Much less attention has been paid to the control of the dialogue taking place between the users and the system whilst using such frameworks (Bubenko et al. 1994). Frameworks for requirements acquisition are not normally accompanied by theories of the types of dialogue which they support. Our ability to develop sophisticated formal frameworks to analyse requirements makes this deficiency more acutely felt, since increases in formality are often accompanied by greater difficulty in understanding and using the frameworks (Robertson et al. 1989). Users write their requirements in more or less natural language. This is then translated into a formal language that can be interpreted by the elicitation module. This module works on the requirements and provides feedback. 
The translation process is then applied to convert feedback into more or less natural language. Different systems put different emphasis on the parts of that general architecture. Some are very good at natural language interpretation while others put more emphasis on analysing the requirements and providing feedback. Natural language approaches to requirements elicitation put an emphasis on natural language interpretation (see section 1.2.1). In these approaches, users write their specification in a subset of natural language. The system then translates it into a formal notation. The main benefit provided by these approaches is the improvement in the ease of use of the system: natural language is the main means of communication for human beings and does not need to be learned. However, most of these approaches do not provide a dialogue well suited for the requirements elicitation process. Because they translate the natural language specification into a formal notation but do not provide guidance on how to write the specification in the first place, users are left in charge of writing correct requirements. If a mistake is made while writing the specification, it will simply be translated into the formal notation. In order to actively help users in the process of writing the requirements, the elicitation system must interact with them. The emphasis, here, is no longer on translating requirements, but on actively extracting them through a dialogue with users. This is useful, since the requirements elicitation process is complex, and offering guidance is a big help for users. Unfortunately, most of the approaches providing guidance expose their formal underlying frameworks directly to users (see section 1.2.2). In order to benefit from the guidance provided, users have to learn the idiosyncrasies of the system they use. The task of providing guidance is complicated by the fact that there are numerous ways of carrying out the requirements elicitation. 
Very little research has been done on how best to organise the elicitation process to provide effective guidance. An arbitrary choice could be made, but forcing users to adopt a predefined method is usually not possible as it would make the elicitation process very difficult to follow and understand. The system must therefore be able to adapt itself to various elicitation methods. On the other hand, it is necessary for the system to make choices in order to provide active guidance. A "least-commitment" strategy, such as asking users at every choice point what to do next, is not a useful approach (Ferguson et al. 1996). One way of offering guidance without restricting users too much is by communicating with them in natural language, and by using natural language constraints to inform the choices made by the system to select a guidance strategy. These constraints ensure that the system adopts a strategy that will guide users in a natural and understandable manner, by taking into account the current state of the dialogue. In other words, the system takes into account the current state of the specification to help users complete it, but the current state of the dialogue is the principal factor constraining what will be spoken about next. Using such an approach reduces some of the problems discussed above. The specification does not need to be immediately correct as it will be checked and reworked by the system. The formal framework is hidden from users but is still there to ensure the correctness of the specifications. Guidance is continuously offered through dialogue, which is influenced by but does not directly follow the steps of construction of the specification. The natural language constraints we use in this thesis are theories of dialogue coherence, called "focus" theories. They define what can be spoken about next in a dialogue based on what has already been discussed and the subject under discussion. 
The theories take into account what participants in a dialogue pay attention to and try to ensure that the rest of the dialogue is related to it. The system tries to help its users define what a research group WWW site should look like. The way the dialogue evolves from discussing the research group, to discussing the site and its associated home page, to discussing the set of publications can quite easily be followed. The use of pronouns helps in making the text feel natural. It would have been difficult to achieve the same result without using focus rules. Other techniques for organising dialogues, such as those based on the intentions underlying the dialogue (Cohen et al. 1990), would require the dialogue manager to know what the elicitation system is trying to achieve and what its plan is. For some elicitation systems, this knowledge may not be available. Similarly, techniques based on the content of the communications exchanged and how they relate, e.g., based on RST (Mann and Thompson 1987), usually require a lot of domain knowledge. They are therefore time-consuming to code. Focus theories require less information from the elicitation module while enabling the dialogue manager to structure the dialogue. However, in some cases, focus theories are not sufficient to organise a dialogue. We use a theory based on speech acts (see section 3.4.1) and some ideas from Grice's work on conversation (see section 5.2.1) to deal with these cases. More generally, although we tried to minimise the impact of other theories in order to study focus theories in detail, it would be interesting to know whether and how we can integrate them with the work presented in this thesis. In particular, the notion of dialog act and its application to dialog grammar could be of interest. 
General frameworks developed to study various aspects of dialogue, including dialog acts and focus, have started to appear but work is still at an early stage (C-Star Consortium 1998; Allen and Core 1997). Organising a dialogue based on attention requires a lot of domain knowledge in order to know how things mentioned in the dialogue relate to each other. Therefore, the amount of knowledge engineering needed to build natural language applications is also an important issue. We have tried to limit the engineering difficulties by clearly separating the domain knowledge needed by our dialogue manager from its management capabilities, and by providing a way of re-using the existing domain knowledge as far as possible. This is done by using rules which enable us to re-use part of the domain knowledge already used by the elicitation module. The contribution of this thesis is therefore the formalisation and evaluation of focus theories for requirements elicitation dialogues in natural language. The main questions we deal with are the following:
• Which focus theories should we use?
• What are the relations between the constraints imposed by the focus theories and the constraints inherent to the requirements elicitation process?
• Does this approach improve the perceived quality of the dialogue between the elicitation tool and its users?
A prototype system has been developed. This system mainly operates in the WWW site design domain. It has also been applied in other domains as an initial demonstration of the range of problems that can be tackled by our approach.

Edinburgh Research Archive

Mechanising an algebraic rely-guarantee refinement calculus

Author: Machado Dias Diego
Publication venue: Newcastle University
Publication date: 01/01/2017

PhD Thesis. Despite rely-guarantee (RG) being a well-studied program logic established in the 1980s, it was not until recently that researchers realised that rely and guarantee conditions could be treated as independent programming constructs. This recent reformulation of RG paved the way to algebraic characterisations which have helped to better understand the difficulties that arise in the practical application of this development approach. The primary focus of this thesis is to provide automated tool support for a rely-guarantee refinement calculus proposed by Hayes et al., where rely and guarantee are defined as independent commands. Our motivation is to investigate the application of an algebraic approach to derive concrete examples using this calculus. In the course of this thesis, we locate and fix a few issues involving the refinement language, its operational semantics and preexisting proofs. Moreover, we extend the refinement calculus of Hayes et al. to cover indexed parallel composition, non-atomic evaluation of expressions within specifications, and assignment to indexed arrays. These extensions are illustrated via concrete examples. Special attention is given to design decisions that simplify the application of the mechanised theory. For example, we leave part of the design of the expression language in the hands of the user, at the cost of requiring the user to define the notion of undefinedness for unary and binary operators; and we also formalise a notion of indexed parallelism that is parametric on the type of the indexes, which is done deliberately to simplify the formalisation of algorithms. Additionally, we use stratification to reduce the number of cases in simulation proofs involving the operational semantics. Finally, we also use the algebra to discuss the role of types in program derivation.

Newcastle University eTheses

OntoTag - A Linguistic and Ontological Annotation Model Suitable for the Semantic Web

Author: Pareja-Lora A.
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2012

1. INTRODUCTION. LINGUISTIC TOOLS AND ANNOTATIONS: THEIR LIGHTS AND SHADOWS Computational Linguistics is already a consolidated research area. It builds upon the results of two other major ones, namely Linguistics and Computer Science and Engineering, and it aims at developing computational models of human language (or natural language, as it is termed in this area). Possibly, its most well-known applications are the different tools developed so far for processing human language, such as machine translation systems and speech recognizers or dictation programs. These tools for processing human language are commonly referred to as linguistic tools. Apart from the examples mentioned above, there are also other types of linguistic tools that perhaps are not so well-known, but on which most of the other applications of Computational Linguistics are built. These other types of linguistic tools comprise POS taggers, natural language parsers and semantic taggers, amongst others. All of them can be termed linguistic annotation tools. Linguistic annotation tools are important assets. In fact, POS and semantic taggers (and, to a lesser extent, also natural language parsers) have become critical resources for the computer applications that process natural language. Hence, any computer application that has to analyse a text automatically and ‘intelligently’ will include at least a module for POS tagging. The more an application needs to ‘understand’ the meaning of the text it processes, the more linguistic tools and/or modules it will incorporate and integrate. However, linguistic annotation tools still have some limitations, which can be summarised as follows: 1. Normally, they perform annotations only at a certain linguistic level (that is, Morphology, Syntax, Semantics, etc.). 2. They usually introduce a certain rate of errors and ambiguities when tagging. 
This error rate ranges from 10 percent up to 50 percent of the units annotated for unrestricted, general texts. 3. Their annotations are most frequently formulated in terms of an annotation schema designed and implemented ad hoc. A priori, it seems that the interoperation and the integration of several linguistic tools into an appropriate software architecture could most likely solve the limitations stated in (1). Besides, integrating several linguistic annotation tools and making them interoperate could also minimise the limitation stated in (2). Nevertheless, in the latter case, all these tools should produce annotations for a common level, which would have to be combined in order to correct their corresponding errors and inaccuracies. Yet, the limitation stated in (3) prevents both types of integration and interoperation from being easily achieved. In addition, most high-level annotation tools rely on other lower-level annotation tools and their outputs to generate their own ones. For example, sense-tagging tools (operating at the semantic level) often use POS taggers (operating at a lower level, i.e., the morphosyntactic) to identify the grammatical category of the word or lexical unit they are annotating. Accordingly, if a faulty or inaccurate low-level annotation tool is to be used by other higher-level one in its process, the errors and inaccuracies of the former should be minimised in advance. Otherwise, these errors and inaccuracies would be transferred to (and even magnified in) the annotations of the high-level annotation tool. Therefore, it would be quite useful to find a way to (i) correct or, at least, reduce the errors and the inaccuracies of lower-level linguistic tools; (ii) unify the annotation schemas of different linguistic annotation tools or, more generally speaking, make these tools (as well as their annotations) interoperate. 
Clearly, solving (i) and (ii) should ease the automatic annotation of web pages by means of linguistic tools, and their transformation into Semantic Web pages (Berners-Lee, Hendler and Lassila, 2001). Yet, as stated above, (ii) is a type of interoperability problem. There again, ontologies (Gruber, 1993; Borst, 1997) have been successfully applied thus far to solve several interoperability problems. Hence, ontologies should help solve also the problems and limitations of linguistic annotation tools aforementioned. Thus, to summarise, the main aim of the present work was to combine somehow these separated approaches, mechanisms and tools for annotation from Linguistics and Ontological Engineering (and the Semantic Web) in a sort of hybrid (linguistic and ontological) annotation model, suitable for both areas. This hybrid (semantic) annotation model should (a) benefit from the advances, models, techniques, mechanisms and tools of these two areas; (b) minimise (and even solve, when possible) some of the problems found in each of them; and (c) be suitable for the Semantic Web. The concrete goals that helped attain this aim are presented in the following section. 2. GOALS OF THE PRESENT WORK As mentioned above, the main goal of this work was to specify a hybrid (that is, linguistically-motivated and ontology-based) model of annotation suitable for the Semantic Web (i.e. it had to produce a semantic annotation of web page contents). 
This entailed that the tags included in the annotations of the model had to (1) represent linguistic concepts (or linguistic categories, as they are termed in ISO/DCR (2008)), in order for this model to be linguistically-motivated; (2) be ontological terms (i.e., use an ontological vocabulary), in order for the model to be ontology-based; and (3) be structured (linked) as a collection of ontology-based triples, as in the usual Semantic Web languages (namely RDF(S) and OWL), in order for the model to be considered suitable for the Semantic Web. Besides, to be useful for the Semantic Web, this model should provide a way to automate the annotation of web pages. As for the present work, this requirement involved reusing the linguistic annotation tools purchased by the OEG research group (http://www.oeg-upm.net), but solving beforehand (or, at least, minimising) some of their limitations. Therefore, this model had to minimise these limitations by means of the integration of several linguistic annotation tools into a common architecture. Since this integration required the interoperation of tools and their annotations, ontologies were proposed as the main technological component to make them effectively interoperate. From the very beginning, it seemed that the formalisation of the elements and the knowledge underlying linguistic annotations within an appropriate set of ontologies would be a great step forward towards the formulation of such a model (henceforth referred to as OntoTag). Obviously, first, to combine the results of the linguistic annotation tools that operated at the same level, their annotation schemas had to be unified (or, preferably, standardised) in advance. This entailed the unification (id. standardisation) of their tags (both their representation and their meaning), and their format or syntax. 
Second, to merge the results of the linguistic annotation tools operating at different levels, their respective annotation schemas had to be (a) made interoperable and (b) integrated. And third, in order for the resulting annotations to suit the Semantic Web, they had to be specified by means of an ontology-based vocabulary, and structured by means of ontology-based triples, as hinted above. Therefore, a new annotation scheme had to be devised, based both on ontologies and on this type of triples, which allowed for the combination and the integration of the annotations of any set of linguistic annotation tools. This annotation scheme was considered a fundamental part of the model proposed here, and its development was, accordingly, another major objective of the present work. All these goals, aims and objectives could be re-stated more clearly as follows: Goal 1: Development of a set of ontologies for the formalisation of the linguistic knowledge relating linguistic annotation. Sub-goal 1.1: Ontological formalisation of the EAGLES (1996a; 1996b) de facto standards for morphosyntactic and syntactic annotation, in a way that helps respect the triple structure recommended for annotations in these works (which is isomorphic to the triple structures used in the context of the Semantic Web). Sub-goal 1.2: Incorporation into this preliminary ontological formalisation of other existing standards and standard proposals relating the levels mentioned above, such as those currently under development within ISO/TC 37 (the ISO Technical Committee dealing with Terminology, which deals also with linguistic resources and annotations). Sub-goal 1.3: Generalisation and extension of the recommendations in EAGLES (1996a; 1996b) and ISO/TC 37 to the semantic level, for which no ISO/TC 37 standards have been developed yet. 
Sub-goal 1.4: Ontological formalisation of the generalisations and/or extensions obtained in the previous sub-goal as generalisations and/or extensions of the corresponding ontology (or ontologies). Sub-goal 1.5: Ontological formalisation of the knowledge required to link, combine and unite the knowledge represented in the previously developed ontology (or ontologies). Goal 2: Development of OntoTag’s annotation scheme, a standard-based abstract scheme for the hybrid (linguistically-motivated and ontological-based) annotation of texts. Sub-goal 2.1: Development of the standard-based morphosyntactic annotation level of OntoTag’s scheme. This level should include, and possibly extend, the recommendations of EAGLES (1996a) and also the recommendations included in the ISO/MAF (2008) standard draft. Sub-goal 2.2: Development of the standard-based syntactic annotation level of the hybrid abstract scheme. This level should include, and possibly extend, the recommendations of EAGLES (1996b) and the ISO/SynAF (2010) standard draft. Sub-goal 2.3: Development of the standard-based semantic annotation level of OntoTag’s (abstract) scheme. Sub-goal 2.4: Development of the mechanisms for a convenient integration of the three annotation levels already mentioned. These mechanisms should take into account the recommendations included in the ISO/LAF (2009) standard draft. Goal 3: Design of OntoTag’s (abstract) annotation architecture, an abstract architecture for the hybrid (semantic) annotation of texts (i) that facilitates the integration and interoperation of different linguistic annotation tools, and (ii) whose results comply with OntoTag’s annotation scheme. Sub-goal 3.1: Specification of the decanting processes that allow for the classification and separation, according to their corresponding levels, of the results of the linguistic tools annotating at several different levels. 
Sub-goal 3.2: Specification of the standardisation processes that allow (a) complying with the standardisation requirements of OntoTag’s annotation scheme, as well as (b) combining the results of those linguistic tools that share some level of annotation. Sub-goal 3.3: Specification of the merging processes that allow for the combination of the output annotations and the interoperation of those linguistic tools that share some level of annotation. Sub-goal 3.4: Specification of the merge processes that allow for the integration of the results and the interoperation of those tools performing their annotations at different levels. Goal 4: Generation of OntoTagger’s schema, a concrete instance of OntoTag’s abstract scheme for a concrete set of linguistic annotations. These linguistic annotations result from the tools and the resources available in the research group, namely • Bitext’s DataLexica (http://www.bitext.com/EN/datalexica.asp), • LACELL’s (POS) tagger (http://www.um.es/grupos/grupo-lacell/quees.php), • Connexor’s FDG (http://www.connexor.eu/technology/machinese/glossary/fdg/), and • EuroWordNet (Vossen et al., 1998). This schema should help evaluate OntoTag’s underlying hypotheses, stated below. Consequently, it should implement, at least, those levels of the abstract scheme dealing with the annotations of the set of tools considered in this implementation. This includes the morphosyntactic, the syntactic and the semantic levels. Goal 5: Implementation of OntoTagger’s configuration, a concrete instance of OntoTag’s abstract architecture for this set of linguistic tools and annotations. This configuration (1) had to use the schema generated in the previous goal; and (2) should help support or refute the hypotheses of this work as well (see the next section). 
Sub-goal 5.1: Implementation of the decanting processes that facilitate the classification and separation of the results of those linguistic resources that provide annotations at several different levels (on the one hand, LACELL’s tagger operates at the morphosyntactic level and, minimally, also at the semantic level; on the other hand, FDG operates at the morphosyntactic and the syntactic levels and, minimally, at the semantic level as well). Sub-goal 5.2: Implementation of the standardisation processes that allow (i) specifying the results of those linguistic tools that share some level of annotation according to the requirements of OntoTagger’s schema, as well as (ii) combining these shared level results. In particular, all the tools selected perform morphosyntactic annotations and they had to be conveniently combined by means of these processes. Sub-goal 5.3: Implementation of the merging processes that allow for the combination (and possibly the improvement) of the annotations and the interoperation of the tools that share some level of annotation (in particular, those relating the morphosyntactic level, as in the previous sub-goal). Sub-goal 5.4: Implementation of the merging processes that allow for the integration of the different standardised and combined annotations aforementioned, relating all the levels considered. Sub-goal 5.5: Improvement of the semantic level of this configuration by adding a named entity recognition, (sub-)classification and annotation subsystem, which also uses the named entities annotated to populate a domain ontology, in order to provide a concrete application of the present work in the two areas involved (the Semantic Web and Corpus Linguistics). 
3. MAIN RESULTS: ASSESSMENT OF ONTOTAG’S UNDERLYING HYPOTHESES The model developed in the present thesis tries to shed some light on (i) whether linguistic annotation tools can effectively interoperate; (ii) whether their results can be combined and integrated; and, if they can, (iii) how they can, respectively, interoperate and be combined and integrated. Accordingly, several hypotheses had to be supported (or rejected) by the development of the OntoTag model and OntoTagger (its implementation). The hypotheses underlying OntoTag are surveyed below. Only one of the hypotheses (H.6) was rejected; the other five could be confirmed. H.1 The annotations of different levels (or layers) can be integrated into a sort of overall, comprehensive, multilayer and multilevel annotation, so that their elements can complement and refer to each other. • CONFIRMED by the development of: o OntoTag’s annotation scheme, o OntoTag’s annotation architecture, o OntoTagger’s (XML, RDF, OWL) annotation schemas, o OntoTagger’s configuration. H.2 Tool-dependent annotations can be mapped onto a sort of tool-independent annotations and, thus, can be standardised. • CONFIRMED by means of the standardisation phase incorporated into OntoTag and OntoTagger for the annotations yielded by the tools. H.3 Standardisation should ease: H.3.1: The interoperation of linguistic tools. H.3.2: The comparison, combination (at the same level and layer) and integration (at different levels or layers) of annotations. • H.3 was CONFIRMED by means of the development of OntoTagger’s ontology-based configuration: o Interoperation, comparison, combination and integration of the annotations of three different linguistic tools (Connexor’s FDG, Bitext’s DataLexica and LACELL’s tagger); o Integration of EuroWordNet-based, domain-ontology-based and named entity annotations at the semantic level. o Integration of morphosyntactic, syntactic and semantic annotations. 
H.4 Ontologies and Semantic Web technologies (can) play a crucial role in the standardisation of linguistic annotations, by providing consensual vocabularies and standardised formats for annotation (e.g., RDF triples). • CONFIRMED by means of the development of OntoTagger’s RDF-triple-based annotation schemas. H.5 The rate of errors introduced by a linguistic tool at a given level, when annotating, can be reduced automatically by contrasting and combining its results with the ones coming from other tools, operating at the same level. However, these other tools might be built following a different technological (stochastic vs. rule-based, for example) or theoretical (dependency vs. HPS-grammar-based, for instance) approach. • CONFIRMED by the results yielded by the evaluation of OntoTagger. H.6 Each linguistic level can be managed and annotated independently. • REJECTED: OntoTagger’s experiments and the dependencies observed among the morphosyntactic annotations, and between them and the syntactic annotations. In fact, Hypothesis H.6 was already rejected when OntoTag’s ontologies were developed. We observed then that several linguistic units stand on an interface between levels, belonging thereby to both of them (such as morphosyntactic units, which belong to both the morphological level and the syntactic level). Therefore, the annotations of these levels overlap and cannot be handled independently when merged into a unique multileveled annotation. 4. OTHER MAIN RESULTS AND CONTRIBUTIONS First, interoperability is a hot topic for both the linguistic annotation community and the whole Computer Science field. The specification (and implementation) of OntoTag’s architecture for the combination and integration of linguistic (annotation) tools and annotations by means of ontologies shows a way to make these different linguistic annotation tools and annotations interoperate in practice. 
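As an illustration of hypotheses H.2 and H.5 above, the following is a minimal sketch of how tool-dependent part-of-speech tags might first be standardised and then combined. It is not OntoTagger's actual procedure, which the abstract does not describe: simple majority voting is used here as a stand-in, and the tag names, the mapping tables and the tie-breaking priority order are all hypothetical.

```python
from collections import Counter

# Hypothetical mappings from each tool's own tags to a shared,
# tool-independent tagset (assumption: the real mappings are not
# given in the abstract).
TAG_MAPS = {
    "FDG": {"N": "Noun", "V": "Verb", "A": "Adjective"},
    "DataLexica": {"NC": "Noun", "VB": "Verb", "ADJ": "Adjective"},
    "LACELL": {"SUST": "Noun", "VERB": "Verb", "ADJ": "Adjective"},
}


def standardise(tool, tag):
    """Map a tool-dependent tag onto the shared tagset (H.2)."""
    return TAG_MAPS[tool].get(tag, "Unknown")


def combine(token_tags, priority=("LACELL", "FDG", "DataLexica")):
    """Combine the tags proposed for one token by several tools (H.5).

    Uses a majority vote over the standardised tags. Ties are broken
    in favour of the first tool in `priority` (an assumed ordering).
    """
    std = {tool: standardise(tool, tag) for tool, tag in token_tags.items()}
    if not std:
        return "Unknown"
    counts = Counter(std.values())
    best = max(counts.values())
    winners = {tag for tag, n in counts.items() if n == best}
    for tool in priority:
        if tool in std and std[tool] in winners:
            return std[tool]
    # No prioritised tool proposed a winning tag: pick deterministically.
    return sorted(winners)[0]


if __name__ == "__main__":
    # Two tools agree on a noun reading; one proposes a verb reading.
    print(combine({"FDG": "N", "DataLexica": "NC", "LACELL": "VERB"}))
```

In this sketch a single dissenting tool is outvoted, which is the intuition behind H.5: errors made by one tool can be corrected when other tools working at the same level agree with each other.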
Second, as mentioned above, the elements involved in linguistic annotation were formalised in a set (or network) of ontologies (OntoTag’s linguistic ontologies).
• On the one hand, OntoTag’s network of ontologies consists of
− The Linguistic Unit Ontology (LUO), which includes a mostly hierarchical formalisation of the different types of linguistic elements (i.e., units) identifiable in a written text;
− The Linguistic Attribute Ontology (LAO), which also includes a mostly hierarchical formalisation of the different types of features that characterise the linguistic units included in the LUO;
− The Linguistic Value Ontology (LVO), which includes the corresponding formalisation of the different values that the attributes in the LAO can take;
− The OIO (OntoTag’s Integration Ontology), which (i) includes the knowledge required to link, combine and unite the knowledge represented in the LUO, the LAO and the LVO; and (ii) can be viewed as a knowledge representation ontology that describes the most elementary vocabulary used in the area of annotation.
• On the other hand, OntoTag’s ontologies incorporate the knowledge included in the different standards and recommendations for linguistic annotation released so far, such as those developed within the EAGLES and the SIMPLE European projects or by the ISO/TC 37 committee:
− As far as morphosyntactic annotations are concerned, OntoTag’s ontologies formalise the terms in the EAGLES (1996a) recommendations and their corresponding terms within the ISO Morphosyntactic Annotation Framework (ISO/MAF, 2008) standard;
− As for syntactic annotations, OntoTag’s ontologies incorporate the terms in the EAGLES (1996b) recommendations and their corresponding terms within the ISO Syntactic Annotation Framework (ISO/SynAF, 2010) standard draft;
− Regarding semantic annotations, OntoTag’s ontologies generalise and extend the recommendations in EAGLES (1996a; 1996b) and, since no stable standards or standard drafts have been released for semantic annotation by ISO/TC 37 yet, they incorporate the terms in SIMPLE (2000) instead;
− The terms coming from all these recommendations and standards were supplemented by those within the ISO Data Category Registry (ISO/DCR, 2008) and also those of the ISO Linguistic Annotation Framework (ISO/LAF, 2009) standard draft when developing OntoTag’s ontologies.
Third, we showed that the combination of the results of tools annotating at the same level can yield better results (both in precision and in recall) than each tool separately. In particular,
1. OntoTagger clearly outperformed two of the tools integrated into its configuration, namely DataLexica and FDG, in all the combination sub-phases in which they overlapped (i.e. POS tagging, lemma annotation and morphological feature annotation). As for the remaining tool, LACELL’s tagger, it was also outperformed by OntoTagger in POS tagging and lemma annotation, and it did not perform better than OntoTagger in the morphological feature annotation layer.
2. As an immediate result, this implies that
a) This type of combination architecture configuration can be applied in order to significantly improve the accuracy of linguistic annotations; and
b) Concerning the morphosyntactic level, this could be regarded as a way of constructing more robust and more accurate POS tagging systems.
Fourth, Semantic Web annotations are usually pe

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM