Conceptual graph-based knowledge representation for supporting reasoning in African traditional medicine
Although African patients use conventional (modern) and traditional healthcare simultaneously, it has been proven that 80% of people rely on African traditional medicine (ATM). ATM includes medical activities stemming from practices, customs and traditions which were integral to the distinctive African cultures. It is based mainly on the oral transfer of knowledge, with the risk of losing critical knowledge. Moreover, practices differ according to the regions and the availability of medicinal plants. Therefore, it is necessary to compile tacit, disseminated and complex knowledge from various Tradi-Practitioners (TP) in order to determine interesting patterns for treating a given disease. Knowledge engineering methods for traditional medicine are useful to suitably model complex information needs, formalize the knowledge of domain experts and highlight effective practices for integration into conventional medicine. The work described in this paper presents an approach which addresses two issues. First, it aims at proposing a formal representation model of ATM knowledge and practices to facilitate their sharing and reuse. Second, it aims at providing a visual reasoning mechanism for selecting the best available procedures and medicinal plants to treat diseases. The approach is based on the Delphi method for capturing knowledge from various experts, which necessitates reaching a consensus. Conceptual graph formalism is used to model ATM knowledge with visual reasoning capabilities and processes. Nested conceptual graphs are used to visually express the semantic meaning of Computational Tree Logic (CTL) constructs that are useful for the formal specification of temporal properties of ATM domain knowledge. Our approach has the advantage of mitigating knowledge loss, with conceptual development assistance to improve the quality of ATM care (medical diagnosis and therapeutics) as well as patient safety (drug monitoring).
Abstract syntax as interlingua: Scaling up the grammatical framework from controlled languages to robust pipelines
Abstract syntax is an interlingual representation used in compilers. Grammatical Framework (GF) applies the abstract syntax idea to natural languages. The development of GF started in 1998, first as a tool for controlled language implementations, where it has gained an established position in both academic and commercial projects. GF provides grammar resources for over 40 languages, enabling accurate generation and translation, as well as grammar engineering tools and components for mobile and Web applications. On the research side, the focus in the last ten years has been on scaling up GF to wide-coverage language processing. The concept of abstract syntax offers a unified view on many other approaches: Universal Dependencies, WordNets, FrameNets, Construction Grammars, and Abstract Meaning Representations. This makes it possible for GF to utilize data from the other approaches and to build robust pipelines. In return, GF can contribute to data-driven approaches with methods to transfer resources from one language to others, to augment data by rule-based generation, to check the consistency of hand-annotated corpora, and to pipe analyses into high-precision semantic back ends. This article gives an overview of the use of abstract syntax as interlingua through both established and emerging NLP applications involving GF.
Handbook of Lexical Functional Grammar
Lexical Functional Grammar (LFG) is a nontransformational theory of
linguistic structure, first developed in the 1970s by Joan Bresnan and
Ronald M. Kaplan, which assumes that language is best described and
modeled by parallel structures representing different facets of
linguistic organization and information, related by means of
functional correspondences. This volume has seven parts. Part I,
Overview and Introduction, provides an introduction to core syntactic
concepts and representations. Part II, Grammatical Phenomena, reviews
LFG work on a range of grammatical phenomena or constructions. Part
III, Grammatical modules and interfaces, provides an overview of LFG
work on semantics, argument structure, prosody, information structure,
and morphology. Part IV, Linguistic disciplines, reviews LFG work in
the disciplines of historical linguistics, learnability,
psycholinguistics, and second language learning. Part V, Formal and
computational issues and applications, provides an overview of
computational and formal properties of the theory, implementations,
and computational work on parsing, translation, grammar induction, and
treebanks. Part VI, Language families and regions, reviews LFG work
on languages spoken in particular geographical areas or in particular
language families. The final section, Comparing LFG with other
linguistic theories, discusses LFG work in relation to other
theoretical approaches.
Learning Sentence-internal Temporal Relations
In this paper we propose a data intensive approach for inferring
sentence-internal temporal relations. Temporal inference is relevant for
practical NLP applications which either extract or synthesize temporal
information (e.g., summarisation, question answering). Our method bypasses the
need for manual coding by exploiting the presence of markers like "after", which
overtly signal a temporal relation. We first show that models trained on main
and subordinate clauses connected with a temporal marker achieve good
performance on a pseudo-disambiguation task simulating temporal inference
(during testing the temporal marker is treated as unseen and the models must
select the right marker from a set of possible candidates). Secondly, we assess
whether the proposed approach holds promise for the semi-automatic creation of
temporal annotations. Specifically, we use a model trained on noisy and
approximate data (i.e., main and subordinate clauses) to predict
intra-sentential relations present in TimeBank, a corpus annotated with rich
temporal information. Our experiments compare and contrast several
probabilistic models differing in their feature space, linguistic assumptions
and data requirements. We evaluate performance against gold standard corpora
and also against human subjects.
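The pseudo-disambiguation task described in this abstract can be illustrated with a minimal sketch. It assumes a naive Bayes model with add-one smoothing over bag-of-words clause features; the abstract does not say which probabilistic models or features the authors used, so the model, the toy data, and the names `clause_features`, `train`, and `predict_marker` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (main clause, subordinate clause, marker that joined them).
TOY_EXAMPLES = [
    ("they celebrated", "they won", "after"),
    ("they trained", "they competed", "before"),
]


def clause_features(main, sub):
    """Bag-of-words features, tagged by clause so the two clauses stay distinct."""
    main_feats = [f"M:{w}" for w in main.lower().split()]
    sub_feats = [f"S:{w}" for w in sub.lower().split()]
    return main_feats + sub_feats


def train(examples):
    """Count markers and clause features per marker (assumed naive Bayes model)."""
    marker_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for main, sub, marker in examples:
        marker_counts[marker] += 1
        for feat in clause_features(main, sub):
            feature_counts[marker][feat] += 1
            vocab.add(feat)
    return marker_counts, feature_counts, vocab


def predict_marker(model, main, sub, candidates):
    """Treat the marker as unseen and pick the best-scoring candidate."""
    marker_counts, feature_counts, vocab = model
    total = sum(marker_counts.values())

    def score(marker):
        # Add-one smoothing on both the marker prior and the feature likelihoods.
        denom = sum(feature_counts[marker].values()) + len(vocab)
        log_p = math.log((marker_counts[marker] + 1) / (total + len(candidates)))
        for feat in clause_features(main, sub):
            log_p += math.log((feature_counts[marker][feat] + 1) / denom)
        return log_p

    return max(candidates, key=score)
```

Calling `predict_marker(train(TOY_EXAMPLES), main, sub, ["after", "before"])` hides the original marker and asks the model to recover it from the candidate set, which is the evaluation setup the abstract outlines.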
Processing temporal information in unstructured documents
Doctoral thesis, Informatics (Computer Science), Universidade de Lisboa, Faculdade de Ciências, 2013. Temporal information processing has received substantial attention in the last few years, due to the appearance of evaluation challenges focused on the extraction of temporal information from texts written in natural language. This research area belongs to the broader field of information extraction, which aims to automatically find specific pieces of information in texts, producing structured representations of that information, which can then be easily used by other computer applications. It has the potential to be useful in several applications that deal with natural language, given that many languages, Portuguese among them, refer to time extensively. Despite that, temporal processing is still incipient for many languages, Portuguese being one of them. The present dissertation has various goals. On the one hand, it addresses this gap by developing and making available resources that support the development of tools for this task in Portuguese, and also by developing tools of precisely this kind. On the other hand, its purpose is also to report on important results of research in this area of temporal processing. This work shows how temporal processing requires and benefits from modeling different kinds of knowledge: grammatical knowledge, logical knowledge, knowledge about the world, etc. Additionally, both machine learning methods and rule-based approaches are explored and used in the development of hybrid systems that are capable of taking advantage of the strengths of each of these two types of approach. Funding: Fundação para a Ciência e a Tecnologia (FCT), SFRH/BD/40140/2007.
Frame semantics for the field of climate change: discovering frames based on Chinese and English terms
Most of the existing Mandarin Chinese specialised dictionaries of environmental terms are paper dictionaries, compiled and revised more than ten years ago, and contain mainly noun terms. Terminological information is restricted to knowledge conveyed by the term and its English equivalent(s). For readers who want to learn about the semantic or syntactic properties of terms, and for readers who want to see how terms are used in real contexts in specialised texts, the information provided in existing dictionaries is insufficient. In this research, we compiled an online Mandarin Chinese terminological resource describing Chinese verb terms in the field of climate change. This resource makes up for some of the deficiencies of existing Chinese environmental dictionaries by revealing the meaning(s) of a term through its actantial structure(s) and by showing, through annotated contexts, the semantic and syntactic properties of the term as well as its practical usage in specialised texts. This resource better meets the needs of the audience.
The theoretical basis underpinning this research is Frame Semantics (Fillmore, 1976, 1977, 1982, 1985; Fillmore & Atkins, 1992), and the FrameNet built from it. The main objective of this research is to discover and define Chinese semantic frames in the field of climate change, and to establish relations between the Chinese frames defined. The Chinese semantic frames are discovered with the help of the methodology of the multilingual environmental dictionary DiCoEnviro (and its accompanying resource Framed DiCoEnviro) (L’Homme, 2018; L’Homme et al., 2020). In order to make this methodology applicable to a Sino-Tibetan language, Chinese, we modified and adapted this methodology to suit the description of Chinese terms and definition of Chinese semantic frames. Some of the changes and adaptations are based on the Chinese FrameNet (CFN) (Liu & You, 2015).
In order to discover Chinese semantic frames, a monolingual Mandarin (Chinese) Climate Change Corpus (MCCC) was first compiled. This corpus contains 224 authentic Chinese specialised texts in the field of climate change, totaling 1,228,333 Chinese characters, or 547,592 Chinese words. Following this, candidate terms were automatically extracted from the MCCC using the corpus management and analysis software Sketch Engine. After manual analysis and validation, we determined which of the candidate terms are true terms. Subsequently, the actantial structure of each term was written by analysing the contexts where the term occurs. Next, each sense of a polysemous term was placed in a separate entry and 16-20 contexts were selected for each entry. Then, each context was annotated in terms of three layers: semantic structure, syntactic function and syntactic group. After this, the terms were classified according to the scenarios they evoke. Terms that depict the same scene or situation in the field of climate change, have a similar actantial structure, and share the majority of circumstants are categorised into one semantic frame (criteria based on the DiCoEnviro project (L’Homme, 2018; L’Homme et al., 2020)). After the Chinese semantic frames were identified, each frame was defined. Finally, the discovered Chinese frames were linked according to the eight types of frame relations proposed by Ruppenhofer et al. (2016). To be displayed online, term entries and semantic frames were encoded in XML files.
Guided by this research methodology, we eventually discovered and defined 23 Chinese semantic frames. The end result of this research is a frame-based Mandarin Chinese terminological resource specialised in the field of climate change. This terminological resource consists of two parts. The first part is the description of a total of 39 Chinese verb terms. With each meaning of a polysemous verb term placed in a separate entry, there are a total of 59 entries (each entry contains the actantial structure and annotated contexts). A total of 1,027 contexts were annotated. The second part of this resource presents the 23 Chinese semantic frames identified as well as the relations between frames.
- …