382 research outputs found

    Exploring the Unexplored: Identifying Implicit and Indirect Descriptions of Biomedical Terminologies Based on Multifaceted Weighting Combinations

    To retrieve relevant scholarly information from biomedical databases, researchers generally query with technical terms such as the names of proteins, genes, diseases, and other biomedical descriptors. However, technical terms have limits as query terms because the scientific literature contains many indirect and conceptual expressions that denote them. As an initial approach to this problem, combinatorial weighting schemes are proposed, which use various indexing and weighting methods and their combinations. In experiments based on the proposed system and a previously constructed evaluation collection, this approach showed promising results: new relevant expressions could be located continually by combining the proposed weighting schemes. Furthermore, the best-performing binary combinations of weighting schemes, which reflect the inherent traits of the individual schemes, proved complementary to each other, so hidden relevant documents can be found with the proposed methods.

    Using natural language processing for question answering in closed and open domains

    As the amount of social, environmental, and biomedical information available digitally grows, so does the need for Question Answering (QA) systems that help users master this new wealth of information. Despite recent progress in QA, the quality of interpretation and extraction of the desired answer is not adequate. We believe that achieving higher accuracy in QA systems remains an open research problem; that is, it is better to give no answer than a wrong one. However, there are diverse queries that state-of-the-art QA systems cannot interpret and answer properly. Interpreting a question in a way that preserves its syntactic-semantic structure is considered one of the most important challenges in this area. In this work we focus on the problems of semantic-based QA systems and analyse the effectiveness of NLP techniques, query mapping, and answer inferencing in both closed (first scenario) and open (second scenario) domains. For this purpose, the architecture of a Semantic-based closed and open domain Question Answering System (hereafter "ScoQAS") over ontology resources is presented with two different prototypes: an ontology-based closed domain and an open domain over Linked Open Data (LOD) resources. ScoQAS is based on NLP techniques that combine semantic-based structure-feature patterns for question classification with the creation of a question syntactic-semantic information structure (QSiS). The QSiS builds constraints that formulate the related terms in syntactic-semantic terms and generates a question graph (QGraph), which facilitates inference to obtain a precise answer in the closed domain. In addition, our approach provides a convenient method for mapping the formulated information into a SPARQL query template to query the LOD resources in the open domain. The main contributions of this dissertation are as follows: 1. Developing the ScoQAS architecture, integrated with common and specific components compatible with closed and open domain ontologies. 2. Analysing the user's question and building a question syntactic-semantic information structure (QSiS), which is constituted by several processes of the methodology: question classification, Expected Answer Type (EAT) determination, and generated constraints. 3. Presenting an empirical semantic-based structure-feature pattern for question classification and generalizing heuristic constraints to formulate the syntactic and semantic relations between the features in the recognized pattern. 4. Developing a syntactic-semantic QGraph for representing the core components of the question. 5. Presenting an empirical graph-based answer inference in the closed domain. In a nutshell, a semantic-based QA system is presented which provides some experimental results over the closed and open domains. The efficiency of ScoQAS is evaluated using measures such as precision, recall, and F-measure on LOD challenges in the open domain. In the closed domain scenario we focus on quantitative evaluation, analysing accuracy on an enterprise ontology. Because the first scenario lacks a predefined benchmark, we define measures that demonstrate the actual complexity of the problem and the actual efficiency of the solutions. The results of the analysis corroborate the performance and effectiveness of our approach in achieving reasonable accuracy.

    Contributions to information extraction for Spanish written biomedical text

    285 p. Healthcare practice and clinical research produce vast amounts of digitised, unstructured data in multiple languages that are currently underexploited, despite their potential for improving healthcare experiences, supporting trainee education, or enabling biomedical research. To automatically transform those contents into relevant, structured information, advanced Natural Language Processing (NLP) mechanisms are required. In NLP, this task is known as Information Extraction. Our work takes place within this growing field of clinical NLP for the Spanish language, and tackles three distinct problems. First, we compare several supervised machine learning approaches to the problem of sensitive data detection and classification. Specifically, we study the different approaches and their transferability across two corpora, one synthetic and the other authentic. Second, we present and evaluate UMLSmapper, a knowledge-intensive system for biomedical term identification based on the UMLS Metathesaurus. This system recognises and codifies terms without relying on annotated data or external Named Entity Recognition tools. Although technically naive, it performs on par with more evolved systems, and does not deviate considerably from other approaches that rely on oracle terms. Finally, we present and exploit a new corpus of real health records manually annotated with negation and uncertainty information: NUBes. This corpus is the basis for two sets of experiments, one on cue and scope detection, and the other on assertion classification. Throughout the thesis, we apply and compare techniques of varying levels of sophistication and novelty, which reflects the rapid advancement of the field.

    The communicative theory of Terminology (CTT) applied to the development of a corpus-based specialised dictionary of the ceramics industry

    This PhD dissertation is the result of an ongoing project aimed at creating a bilingual (Spanish-English; English-Spanish), corpus-based, specialised active dictionary of the ceramic and tile industry, with the Communicative Theory of Terminology (CTT) as its mainstay. In line with the grounding principles of the CTT, the research started from a study of a corpus compiled ad hoc, in which terms were analysed in vivo and characterised according to the natural habitat in which they occur in specialised communication and discourse. In this light, the dissertation shows how the study of terms, made possible by the activity of compiling and describing them (terminography), may be complemented by the wider projection of specialised lexicography in order to compile and elaborate user-oriented, user-friendly, quality LSP products in the form of dictionaries. This specialised lexicographical dimension has implied the need to renew the concept of speciality language dictionaries applied to the ceramic industry and has led to the creation of a (prospective) active dictionary in this field with a marked emphasis on context, the natural associations of terms (collocations), and the communicative nature of terminology. Accordingly, the importance of pragmatic aspects in a work of this sort has made it necessary to undertake an in-depth review and analysis of the socio-economic context of the research in order to establish and address the specific terminological needs of the ceramic industrial discourse community. On the basis of this theoretical framework, the method followed for the development of the prospective dictionary has comprised 8 broad stages: the stage of work preparation and corpus compilation, the elaboration of the field diagram, the stage of documentary corpus management, term extraction, data processing, revision and normalisation and, finally, the edition stage. The method relies extensively on software, both for exploiting the corpus (WordSmith Tools 4.0) and for creating the terminological database (TermStar XV) and generating the final entries (GENDIC). Two main types of results are presented: those obtained as work in progress at the different stages of the method, and final results strictly speaking, that is, 4,000 English-Spanish entries in their final format (as they will appear in the prospective dictionary) belonging to the letters A, B, N, O, U and V of a complete dictionary that will include a total of 26,000 entries.

    Ethnography as Commentary

    The Internet allows ethnographers to deposit the textual materials on which they base their writing in virtual archives. Electronically archived fieldwork documents can be accessed at any time by the writer, his or her readers, and the people studied. Johannes Fabian, a leading theorist of anthropological practice, argues that virtual archives have the potential to shift the emphasis in ethnographic writing from the monograph to commentary. In this insightful study, he returns to the recording of a conversation he had with a ritual healer in the Congolese town of Lubumbashi more than three decades ago. Fabian’s transcript and translation of the exchange have been deposited on a website (Language and Popular Culture in Africa), and in Ethnography as Commentary he provides a model of writing in the presence of a virtual archive

    Religions around the Arctic: Source Criticism and Comparisons


    Religions around the Arctic

    At a seminar at the University of Bergen, Norway, in September 2018, scholars from Denmark, Finland, Norway, and Sweden presented and discussed various forms of source criticism and comparison with examples from the Arctic and Sub-Arctic regions of Eurasia and North America. A selection of the papers read at the seminar is published in this volume. Each of the chapters in the first part compares local phenomena from two or more cultural contexts: Swedish, Karelian, Estonian, and Irish place names that include words for hostage (Stefan Olsson), Old Icelandic and Sami ancestor mountains (Eldar Heide), and Finno-Karelian bear incantations and Ob-Ugrian bear songs (Vesa Matteo Piludu). The second part gives examples of different forms of source criticism in the analysis of indigenous Sami religion. The functions of a newly found ritual drum are discussed in relation to contemporary written sources (Dikka Storm & Trude Fonneland), the court proceedings from a witchcraft trial in 1692 are discussed with the help of Gérard Genette’s category ‘voice’ (Liv Helene Willumsen), and a content analysis of an introduction to indigenous Sami religion shows that the editor added text of his own to the original manuscript (Konsta Kaikkonen). In the third part, the area is widened to other parts of the Arctic. Here, a selection of theoretical perspectives is used to illuminate local empirical material. The chapters give examples of how Native North American bear rituals and sweat bath traditions can be analysed with the help of an ecology of religion model and ritual theories, respectively (Riku Hämäläinen), of how Soviet researchers used the concepts of ‘spirits’ and ‘gods’ when they analysed the world view of the Nganasan (Olle Sundström), and of how representatives of academia have been instrumental in the ‘finding, claiming, and authorizing’ of Sakha religions (Liudmila Nikanorova). Although the papers deal with only a few of the peoples living in the Arctic and Sub-Arctic regions, the examples of source-critical and comparative problems they discuss are of great general relevance.
