2,678 research outputs found

    Doctor of Philosophy

    Get PDF
    dissertationThe objective of this work is to examine the efficacy of natural language processing (NLP) in summarizing bibliographic text for multiple purposes. Researchers have noted the accelerating growth of bibliographic databases. Information seekers using traditional information retrieval techniques when searching large bibliographic databases are often overwhelmed by excessive, irrelevant data. Scientists have applied natural language processing technologies to improve retrieval. Text summarization, a natural language processing approach, simplifies bibliographic data while filtering it to address a user's need. Traditional text summarization can necessitate the use of multiple software applications to accommodate diverse processing refinements known as "points-of-view." A new, statistical approach to text summarization can transform this process. Combo, a statistical algorithm comprised of three individual metrics, determines which elements within input data are relevant to a user's specified information need, thus enabling a single software application to summarize text for many points-of-view. In this dissertation, I describe this algorithm, and the research process used in developing and testing it. Four studies comprised the research process. The goal of the first study was to create a conventional schema accommodating a genetic disease etiology point-of-view, and an evaluative reference standard. This was accomplished through simulating the task of secondary genetic database curation. The second study addressed the development iv and initial evaluation of the algorithm, comparing its performance to the conventional schema using the previously established reference standard, again within the task of secondary genetic database curation. The third and fourth studies evaluated the algorithm's performance in accommodating additional points-of-view in a simulated clinical decision support task. The third study explored prevention, while the fourth evaluated performance for prevention and drug treatment, comparing results to a conventional treatment schema's output. Both summarization methods identified data that were salient to their tasks. The conventional genetic disease etiology and treatment schemas located salient information for database curation and decision support, respectively. The Combo algorithm located salient genetic disease etiology, treatment, and prevention data, for the associated tasks. Dynamic text summarization could potentially serve additional purposes, such as consumer health information delivery, systematic review creation, and primary research. This technology may benefit many user groups

    OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Information extraction (IE) efforts are widely acknowledged to be important in harnessing the rapid advance of biomedical knowledge, particularly in areas where important factual information is published in a diverse literature. Here we report on the design, implementation and several evaluations of OpenDMAP, an ontology-driven, integrated concept analysis system. It significantly advances the state of the art in information extraction by leveraging knowledge in ontological resources, integrating diverse text processing applications, and using an expanded pattern language that allows the mixing of syntactic and semantic elements and variable ordering.</p> <p>Results</p> <p>OpenDMAP information extraction systems were produced for extracting protein transport assertions (transport), protein-protein interaction assertions (interaction) and assertions that a gene is expressed in a cell type (expression). Evaluations were performed on each system, resulting in F-scores ranging from .26 – .72 (precision .39 – .85, recall .16 – .85). Additionally, each of these systems was run over all abstracts in MEDLINE, producing a total of 72,460 transport instances, 265,795 interaction instances and 176,153 expression instances. </p> <p>Conclusion</p> <p>OpenDMAP advances the performance standards for extracting protein-protein interaction predications from the full texts of biomedical research articles. Furthermore, this level of performance appears to generalize to other information extraction tasks, including extracting information about predicates of more than two arguments. The output of the information extraction system is always constructed from elements of an ontology, ensuring that the knowledge representation is grounded with respect to a carefully constructed model of reality. The results of these efforts can be used to increase the efficiency of manual curation efforts and to provide additional features in systems that integrate multiple sources for information extraction. The open source OpenDMAP code library is freely available at <url>http://bionlp.sourceforge.net/</url></p

    Disambiguating the species of biomedical named entities using natural language parsers

    Get PDF
    Motivation: Text mining technologies have been shown to reduce the laborious work involved in organizing the vast amount of information hidden in the literature. One challenge in text mining is linking ambiguous word forms to unambiguous biological concepts. This article reports on a comprehensive study on resolving the ambiguity in mentions of biomedical named entities with respect to model organisms and presents an array of approaches, with focus on methods utilizing natural language parsers

    Explainable methods for knowledge graph refinement and exploration via symbolic reasoning

    Get PDF
    Knowledge Graphs (KGs) have applications in many domains such as Finance, Manufacturing, and Healthcare. While recent efforts have created large KGs, their content is far from complete and sometimes includes invalid statements. Therefore, it is crucial to refine the constructed KGs to enhance their coverage and accuracy via KG completion and KG validation. It is also vital to provide human-comprehensible explanations for such refinements, so that humans have trust in the KG quality. Enabling KG exploration, by search and browsing, is also essential for users to understand the KG value and limitations towards down-stream applications. However, the large size of KGs makes KG exploration very challenging. While the type taxonomy of KGs is a useful asset along these lines, it remains insufficient for deep exploration. In this dissertation we tackle the aforementioned challenges of KG refinement and KG exploration by combining logical reasoning over the KG with other techniques such as KG embedding models and text mining. Through such combination, we introduce methods that provide human-understandable output. Concretely, we introduce methods to tackle KG incompleteness by learning exception-aware rules over the existing KG. Learned rules are then used in inferring missing links in the KG accurately. Furthermore, we propose a framework for constructing human-comprehensible explanations for candidate facts from both KG and text. Extracted explanations are used to insure the validity of KG facts. Finally, to facilitate KG exploration, we introduce a method that combines KG embeddings with rule mining to compute informative entity clusters with explanations.Wissensgraphen haben viele Anwendungen in verschiedenen Bereichen, beispielsweise im Finanz- und Gesundheitswesen. Wissensgraphen sind jedoch unvollständig und enthalten auch ungültige Daten. Hohe Abdeckung und Korrektheit erfordern neue Methoden zur Wissensgraph-Erweiterung und Wissensgraph-Validierung. Beide Aufgaben zusammen werden als Wissensgraph-Verfeinerung bezeichnet. Ein wichtiger Aspekt dabei ist die Erklärbarkeit und Verständlichkeit von Wissensgraphinhalten für Nutzer. In Anwendungen ist darüber hinaus die nutzerseitige Exploration von Wissensgraphen von besonderer Bedeutung. Suchen und Navigieren im Graph hilft dem Anwender, die Wissensinhalte und ihre Limitationen besser zu verstehen. Aufgrund der riesigen Menge an vorhandenen Entitäten und Fakten ist die Wissensgraphen-Exploration eine Herausforderung. Taxonomische Typsystem helfen dabei, sind jedoch für tiefergehende Exploration nicht ausreichend. Diese Dissertation adressiert die Herausforderungen der Wissensgraph-Verfeinerung und der Wissensgraph-Exploration durch algorithmische Inferenz über dem Wissensgraph. Sie erweitert logisches Schlussfolgern und kombiniert es mit anderen Methoden, insbesondere mit neuronalen Wissensgraph-Einbettungen und mit Text-Mining. Diese neuen Methoden liefern Ausgaben mit Erklärungen für Nutzer. Die Dissertation umfasst folgende Beiträge: Insbesondere leistet die Dissertation folgende Beiträge: • Zur Wissensgraph-Erweiterung präsentieren wir ExRuL, eine Methode zur Revision von Horn-Regeln durch Hinzufügen von Ausnahmebedingungen zum Rumpf der Regeln. Die erweiterten Regeln können neue Fakten inferieren und somit Lücken im Wissensgraphen schließen. Experimente mit großen Wissensgraphen zeigen, dass diese Methode Fehler in abgeleiteten Fakten erheblich reduziert und nutzerfreundliche Erklärungen liefert. • Mit RuLES stellen wir eine Methode zum Lernen von Regeln vor, die auf probabilistischen Repräsentationen für fehlende Fakten basiert. Das Verfahren erweitert iterativ die aus einem Wissensgraphen induzierten Regeln, indem es neuronale Wissensgraph-Einbettungen mit Informationen aus Textkorpora kombiniert. Bei der Regelgenerierung werden neue Metriken für die Regelqualität verwendet. Experimente zeigen, dass RuLES die Qualität der gelernten Regeln und ihrer Vorhersagen erheblich verbessert. • Zur Unterstützung der Wissensgraph-Validierung wird ExFaKT vorgestellt, ein Framework zur Konstruktion von Erklärungen für Faktkandidaten. Die Methode transformiert Kandidaten mit Hilfe von Regeln in eine Menge von Aussagen, die leichter zu finden und zu validieren oder widerlegen sind. Die Ausgabe von ExFaKT ist eine Menge semantischer Evidenzen für Faktkandidaten, die aus Textkorpora und dem Wissensgraph extrahiert werden. Experimente zeigen, dass die Transformationen die Ausbeute und Qualität der entdeckten Erklärungen deutlich verbessert. Die generierten unterstützen Erklärungen unterstütze sowohl die manuelle Wissensgraph- Validierung durch Kuratoren als auch die automatische Validierung. • Zur Unterstützung der Wissensgraph-Exploration wird ExCut vorgestellt, eine Methode zur Erzeugung von informativen Entitäts-Clustern mit Erklärungen unter Verwendung von Wissensgraph-Einbettungen und automatisch induzierten Regeln. Eine Cluster-Erklärung besteht aus einer Kombination von Relationen zwischen den Entitäten, die den Cluster identifizieren. ExCut verbessert gleichzeitig die Cluster- Qualität und die Cluster-Erklärbarkeit durch iteratives Verschränken des Lernens von Einbettungen und Regeln. Experimente zeigen, dass ExCut Cluster von hoher Qualität berechnet und dass die Cluster-Erklärungen für Nutzer informativ sind

    深層学習に基づく感情会話分析に関する研究

    Get PDF
    Owning the capability to express specific emotions by a chatbot during a conversation is one of the key parts of artificial intelligence, which has an intuitive and quantifiable impact on the improvement of chatbot’s usability and user satisfaction. Enabling machines to emotion recognition in conversation is challenging, mainly because the information in human dialogue innately conveys emotions by long-term experience, abundant knowledge, context, and the intricate patterns between the affective states. Recently, many studies on neural emotional conversational models have been conducted. However, enabling the chatbot to control what kind of emotion to respond to upon its own characters in conversation is still underexplored. At this stage, people are no longer satisfied with using a dialogue system to solve specific tasks, and are more eager to achieve spiritual communication. In the chat process, if the robot can perceive the user's emotions and can accurately process them, it can greatly enrich the content of the dialogue and make the user empathize. In the process of emotional dialogue, our ultimate goal is to make the machine understand human emotions and give matching responses. Based on these two points, this thesis explores and in-depth emotion recognition in conversation task and emotional dialogue generation task. In the past few years, although considerable progress has been made in emotional research in dialogue, there are still some difficulties and challenges due to the complex nature of human emotions. The key contributions in this thesis are summarized as below: (1) Researchers have paid more attention to enhancing natural language models with knowledge graphs these days, since knowledge graph has gained a lot of systematic knowledge. A large number of studies had shown that the introduction of external commonsense knowledge is very helpful to improve the characteristic information. We address the task of emotion recognition in conversations using external knowledge to enhance semantics. In this work, we employ an external knowledge graph ATOMIC to extract the knowledge sources. We proposed KES model, a new framework that incorporates different elements of external knowledge and conversational semantic role labeling, where build upon them to learn interactions between interlocutors participating in a conversation. The conversation is a sequence of coherent and orderly discourses. For neural networks, the capture of long-range context information is a weakness. We adopt Transformer a structure composed of self-attention and feed forward neural network, instead of the traditional RNN model, aiming at capturing remote context information. We design a self-attention layer specialized for enhanced semantic text features with external commonsense knowledge. Then, two different networks composed of LSTM are responsible for tracking individual internal state and context external state. In addition, the proposed model has experimented on three datasets in emotion detection in conversation. The experimental results show that our model outperforms the state-of-the-art approaches on most of the tested datasets. (2) We proposed an emotional dialogue model based on Seq2Seq, which is improved from three aspects: model input, encoder structure, and decoder structure, so that the model can generate responses with rich emotions, diversity, and context. In terms of model input, emotional information and location information are added based on word vectors. In terms of the encoder, the proposed model first encodes the current input and sentence sentiment to generate a semantic vector, and additionally encodes the context and sentence sentiment to generate a context vector, adding contextual information while ensuring the independence of the current input. On the decoder side, attention is used to calculate the weights of the two semantic vectors separately and then decode, to fully integrate the local emotional semantic information and the global emotional semantic information. We used seven objective evaluation indicators to evaluate the model's generation results, context similarity, response diversity, and emotional response. Experimental results show that the model can generate diverse responses with rich sentiment, contextual associations

    Semantically-guided evolutionary knowledge discovery from texts

    Get PDF
    This thesis proposes a new approach for structured knowledge discovery from texts which considers both the mining process itself, the evaluation of this knowledge by the model, and the human assessment of the quality of the outcome.This is achieved by integrating Natural-Language technology and Genetic Algorithms to produce explanatory novel hypotheses. Natural-Language techniques are specifically used to extract genre-based information from text documents. Additional semantic and rhetorical information for generating training data and for feeding a semistructured Latent Semantic Analysis process is also captured.The discovery process is modeled by a semantically-guided Genetic Algorithm which uses training data to guide the search and optimization process. A number of novel criteria to evaluate the quality of the new knowledge are proposed. Consequently, new genetic operations suitable for text mining are designed, and techniques for Evolutionary Multi-Objective Optimization are adapted for the model to trade off between different criteria in the hypotheses.Domain experts were used in an experiment to assess the quality of the hypotheses produced by the model so as to establish their effectiveness in terms of novel and interesting knowledge. The assessment showed encouraging results for the discovered knowledge and for the correlation between the model and the human opinions

    Learning Language from a Large (Unannotated) Corpus

    Full text link
    A novel approach to the fully automated, unsupervised extraction of dependency grammars and associated syntax-to-semantic-relationship mappings from large text corpora is described. The suggested approach builds on the authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well as on a number of prior papers and approaches from the statistical language learning literature. If successful, this approach would enable the mining of all the information needed to power a natural language comprehension and generation system, directly from a large, unannotated corpus.Comment: 29 pages, 5 figures, research proposa
    corecore