10 research outputs found

    Semantic annotation and summarization of biomedical text

    Get PDF
    Advancements in the biomedical community are largely documented and published in text form in scientific forums such as conference papers and journals. To address the scalability of utilizing the large volume of text-based information generated by continuing advances in the biomedical field, two complementary areas are studied. The first area is Semantic Annotation, a method for providing machine-understandable information based on domain-specific resources. A novel semantic annotator, CONANN, is implemented for online matching of concepts defined by a biomedical metathesaurus. CONANN uses a multi-level filter based on both information retrieval and shallow natural language processing techniques. CONANN is evaluated against a state-of-the-art biomedical annotator using the performance measures of time (e.g. number of milliseconds per noun phrase) and precision/recall of the resulting concept matches. CONANN shows that annotation can be performed online, rather than offline, without a significant loss of precision and recall compared to current offline systems. The second area of study is Text Summarization, used as a way to perform data reduction of clinical trial texts while still describing the main themes of a biomedical document. The text summarization work is unique in that it focuses exclusively on summarizing biomedical full-text sources as opposed to abstracts, and exclusively uses domain-specific concepts, rather than terms, to identify important information within a biomedical text. Two novel text summarization algorithms are implemented: one using a concept chaining method based on existing work in lexical chaining (BioChain), and the other using concept distribution to match important sentences between a source text and a generated summary (FreqDist). The BioChain and FreqDist summarizers are evaluated using the publicly available ROUGE summary evaluation tool. ROUGE compares n-gram co-occurrences between a system summary and one or more model summaries. The text summarization evaluation shows that the two approaches outperform nearly all of the existing term-based approaches.
    Ph.D., Information Science and Technology -- Drexel University, 200
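
    As a rough illustration of the n-gram co-occurrence measure that ROUGE relies on, the following Python sketch computes ROUGE-N recall against one or more model summaries. The function names and the simple averaging over model summaries are simplifications for illustration, not the evaluation tool itself or the dissertation's code.

        from collections import Counter

        def ngrams(tokens, n):
            # Multiset of n-grams in a token list.
            return Counter(tuple(tokens[i:i + n])
                           for i in range(len(tokens) - n + 1))

        def rouge_n_recall(system, models, n=1):
            # Fraction of each model summary's n-grams that also occur in
            # the system summary (clipped counts), averaged over models.
            sys_counts = ngrams(system.lower().split(), n)
            scores = []
            for model in models:
                ref_counts = ngrams(model.lower().split(), n)
                overlap = sum(min(c, sys_counts[g])
                              for g, c in ref_counts.items())
                total = sum(ref_counts.values())
                scores.append(overlap / total if total else 0.0)
            return sum(scores) / len(scores)

        print(rouge_n_recall("the trial reduced mortality",
                             ["the trial significantly reduced mortality"]))  # 0.8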

    Detecting subjectivity through lexicon-grammar: strategies, databases, rules and apps for the Italian language

    Get PDF
    2014 - 2015
    The present research handles the detection of linguistic phenomena connected to subjectivity, emotions and opinions from a computational point of view. The necessity of quickly monitoring huge quantities of semi-structured and unstructured data from the web poses several challenges to Natural Language Processing, which must provide strategies and tools to analyze their structures from lexical, syntactic and semantic points of view. The general aim of Sentiment Analysis, shared with the broader fields of NLP, Data Mining, Information Extraction, etc., is the automatic extraction of value from chaos; its specific focus, instead, is on opinions rather than on factual information. This is the aspect that differentiates it from other computational linguistics subfields. The majority of sentiment lexicons have been manually or automatically created for the English language; existing Italian lexicons are therefore mostly built through the translation and adaptation of English lexical databases, e.g. SentiWordNet and WordNet-Affect. Unlike many other Italian and English sentiment lexicons, our database SentIta, built on the interaction of electronic dictionaries and lexicon-dependent local grammars, is able to manage simple and multiword structures that can take the shape of distributionally free structures, distributionally restricted structures and frozen structures. Moreover, differently from other lexicon-based Sentiment Analysis methods, our approach is grounded in the solidity of the Lexicon-Grammar resources and classifications, which provide fine-grained semantic as well as syntactic descriptions of the lexical entries. In accordance with the major contributions in the Sentiment Analysis literature, we did not consider polar words in isolation. We computed their elementary sentence contexts, with the allowed transformations, and then their interaction with contextual valence shifters, the linguistic devices that can modify the prior polarity of the words from SentIta when occurring with them in the same sentences. To do so, we took advantage of the computational power of finite-state technology. We formalized a set of rules that model intensification, downtoning and negation, modality detection and the analysis of comparative forms. With regard to the applicative part of the research, we conducted, with satisfactory results, three experiments on three Sentiment Analysis subtasks: the sentiment classification of documents and sentences, feature-based Sentiment Analysis, and Semantic Role Labeling based on sentiments. [edited by author]
    XIV n.s
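
    The interaction between prior polarities and contextual valence shifters can be sketched in a few lines of Python. The toy lexicon, shifter lists and two-token lookback window below are illustrative assumptions; SentIta and its finite-state local grammars model these phenomena with far richer lexical and syntactic descriptions.

        # Hypothetical toy prior-polarity lexicon and valence shifters; the
        # real SentIta resources are far larger and finer-grained than this.
        PRIOR = {"ottimo": 3, "buono": 2, "cattivo": -2, "pessimo": -3}
        INTENSIFIERS = {"molto": 1.5, "davvero": 1.5}   # strengthen polarity
        DOWNTONERS = {"abbastanza": 0.5, "poco": 0.5}   # weaken polarity
        NEGATIONS = {"non"}                             # flip polarity

        def sentence_polarity(tokens):
            # Each polar word's prior score is scaled by a nearby
            # intensifier/downtoner and flipped by a nearby negation.
            score = 0.0
            for i, tok in enumerate(tokens):
                if tok not in PRIOR:
                    continue
                value = float(PRIOR[tok])
                for w in tokens[max(0, i - 2):i]:   # look two tokens back
                    if w in INTENSIFIERS:
                        value *= INTENSIFIERS[w]
                    elif w in DOWNTONERS:
                        value *= DOWNTONERS[w]
                    elif w in NEGATIONS:
                        value *= -1
                score += value
            return score

        print(sentence_polarity("il film non è buono".split()))  # -2.0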

    Complex Question Answering Based on a Semantic Domain Model of Clinical Medicine

    Get PDF
    Much research in recent years has focused on question answering. Due to significant advances in answering simple fact-seeking questions, research is moving towards resolving complex questions. An approach adopted by many researchers is to decompose a complex question into a series of fact-seeking questions and reuse techniques developed for answering simple questions. This thesis presents an alternative novel approach to domain-specific complex question answering based on consistently applying a semantic domain model to question and document understanding as well as to answer extraction and generation. This study uses a semantic domain model of clinical medicine to encode (a) a clinician's information need expressed as a question and (b) the meaning of scientific publications, yielding a common representation. It is hypothesized that this approach will work well for (1) finding documents that contain answers to clinical questions and (2) extracting these answers from the documents. The domain of clinical question answering was selected primarily because its unparalleled resources permit a proof by construction for this hypothesis. In addition, a working prototype of a clinical question answering system will support research in informed clinical decision making. The proposed methodology is based on the semantic domain model developed within the paradigm of Evidence-Based Medicine. Three basic components of this model - the clinical task, a framework for capturing a synopsis of the clinical scenario that generated the question, and the strength of evidence presented in an answer - are identified and discussed in detail. Algorithms and methods were developed that combine knowledge-based and statistical techniques to extract the basic components of the domain model from abstracts of biomedical articles. These algorithms serve as the foundation for the prototype end-to-end clinical question answering system that was built and evaluated to test the hypotheses. Evaluation of the system on test collections developed in the course of this work and based on real-life clinical questions demonstrates the feasibility of complex question answering and high-accuracy information retrieval using a semantic domain model.
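
    Although the abstract does not spell out the exact schema, the three components of the domain model lend themselves to a frame-like representation. The Python sketch below is a hypothetical illustration: the PICO-style scenario fields, attribute names and example values are assumptions, not the dissertation's actual data structures.

        from dataclasses import dataclass

        # Illustrative frame for the three model components named in the
        # abstract: clinical task, scenario synopsis, strength of evidence.
        @dataclass
        class ClinicalQuestionFrame:
            task: str                 # e.g. "therapy", "diagnosis", "prognosis"
            population: str = ""      # scenario synopsis: who is treated
            intervention: str = ""    # what is done
            comparison: str = ""      # alternative, if any
            outcome: str = ""         # effect of interest
            evidence_grade: str = ""  # filled from a retrieved abstract

        q = ClinicalQuestionFrame(
            task="therapy",
            population="adults with type 2 diabetes",
            intervention="metformin",
            comparison="sulfonylureas",
            outcome="glycemic control",
        )
        print(q)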

    Automated Classification of Argument Stance in Student Essays: A Linguistically Motivated Approach with an Application for Supporting Argument Summarization

    Full text link
    This study describes a set of document- and sentence-level classification models designed to automate the task of determining the argument stance (for or against) of a student argumentative essay and the task of identifying any arguments in the essay that provide reasons in support of that stance. A suggested application utilizing these models is presented, which involves the automated extraction of a single-sentence summary of an argumentative essay. This summary sentence indicates the overall argument stance of the essay from which the sentence was extracted and provides a representative argument in support of that stance. A novel set of document-level stance classification features motivated by linguistic research on stancetaking language is described. Several document-level classification models incorporating these features are trained and tested on a corpus of student essays annotated for stance. These models achieve accuracies significantly above those of two baseline models. High-accuracy features used by these models include a dependency subtree feature incorporating information about the targets of any stancetaking language in the essay text and a feature capturing the semantic relationship between the essay prompt text and stancetaking language in the essay text. We also describe the construction of a corpus of essay sentences annotated for supporting-argument stance. The resulting corpus is used to train and test two sentence-level classification models. The first model is designed to classify a given sentence as a supporting argument or not a supporting argument, while the second model is designed to classify a supporting argument as holding a for or against stance. Features motivated by influential linguistic analyses of the lexical, discourse, and rhetorical features of supporting arguments are used to build these two models, both of which achieve accuracies above those of their respective baseline models. An application illustrating an interesting use case for the models presented in this dissertation is described. This application incorporates all three classification models to extract a single sentence summarizing both the overall stance of a given text and a convincing reason in support of that stance.
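
    The suggested application chains the three models: classify the essay's overall stance, filter sentences down to supporting arguments, and select a supporting argument whose own stance agrees with the document's. The Python sketch below illustrates that control flow only; the keyword-based stub models and their predict interface are placeholders, not the dissertation's features or classifiers.

        # Schematic pipeline over three stand-in models.
        class StubModel:
            def __init__(self, rule):
                self.rule = rule
            def predict(self, text):
                return self.rule(text)

        def summarize_stance(sentences, doc_stance, is_support, support_stance):
            # Return the essay-level stance plus one supporting sentence
            # whose own stance agrees with it.
            stance = doc_stance.predict(" ".join(sentences))
            for sent in sentences:
                if is_support.predict(sent) and support_stance.predict(sent) == stance:
                    return stance, sent
            return stance, None

        doc_stance = StubModel(lambda t: "for" if "should" in t else "against")
        is_support = StubModel(lambda t: "because" in t)
        support_stance = StubModel(lambda t: "for")

        essay = ["Schools should adopt uniforms.",
                 "Uniforms help because they reduce peer pressure."]
        print(summarize_stance(essay, doc_stance, is_support, support_stance))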

    Grundlagen der Informationswissenschaft

    Get PDF

    Visualization of classified search results in the context of exploratory information searching: a usability evaluation

    Full text link
    Conducting exploratory searches on the Web presents a number of cognitive difficulties as regards search strategies and tactics. The "question-response" model used by current search engines does not respond adequately to exploratory searches, which are akin to cognitive learning strategies. Visualizing search results involves graphic and interactive properties that are pertinent for information processing, memory use and, more broadly, human cognition. Many studies have been conducted in this exploratory search context, but none has specifically isolated the graphic and interactive factor of visualization in its evaluation. The principal objective of this thesis is to verify whether the visualization of results in an exploratory search context offers the cognitive and interactive advantages predicted by its theoretical underpinnings. In order to describe and determine the added value of visualizing search results in the context of exploratory Web searches, this research proposes to measure its usability. Comparing it to an equivalent textual interface on the same criteria and indicators, we postulate that the visual interface will achieve greater effectiveness, efficiency and satisfaction than the textual interface in an exploratory search context. The objective measures of effectiveness and efficiency rest mainly on the analysis of traces of user interaction, including their number and duration. The subjective measures of satisfaction rest on users' perceptions of the ease of use and usefulness of the interface tested, and on broader questions about the search experience. A questionnaire and an interview were administered to each of the twenty-three participants, and each search session was recorded with screen-capture software. Statistical analysis of the data from the twenty-three participants, divided into two groups, revealed few significant differences between the two interfaces. On the measures taken, the textual interface proved more effective in terms of recall and relevance, and more efficient with respect to search duration. As for satisfaction, both interfaces were rated positively and could not be distinguished on the great majority of the metrics. However, in terms of interactive behaviour, notable differences showed that users of the visual interface performed more exploratory interactions and collected search results more selectively. Statistical and content analysis of the reported experience showed that visualization invites users to engage more deeply in the information search process, owing to the positive impact of the visual interface's aesthetics. The classification feature, by contrast, was perceived ambivalently, dividing the participants regardless of the interface tested. Finally, analysis of the verbatim comments of the "visual" group identified a need for user-feedback features that would let users communicate their information need, or weight results and classes, through direct-manipulation interaction with the classes in a graphic space.
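
    The study's objective measures can be illustrated with a short Python sketch that aggregates interaction traces per interface group. The log fields and the numbers below are invented for illustration; they are not the study's data or analysis code.

        import statistics

        # Toy per-participant interaction logs for the two interfaces.
        logs = {
            "visual":  [{"seconds": 412, "interactions": 58, "results_saved": 9},
                        {"seconds": 388, "interactions": 64, "results_saved": 11}],
            "textual": [{"seconds": 341, "interactions": 41, "results_saved": 7},
                        {"seconds": 355, "interactions": 39, "results_saved": 6}],
        }

        def efficiency_summary(group):
            # Objective measures of the kind used in the study: mean task
            # duration and mean interaction counts per participant.
            return {
                "mean_seconds": statistics.mean(p["seconds"] for p in group),
                "mean_interactions": statistics.mean(p["interactions"] for p in group),
                "mean_saved": statistics.mean(p["results_saved"] for p in group),
            }

        for name, group in logs.items():
            print(name, efficiency_summary(group))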