Search CORE

1,361 research outputs found

Hypotheses, evidence and relationships: The HypER approach for representing scientific knowledge claims

Author: Buckingham Shum S.
Carusi A.
de Waard A.
Park J.
Samwald M.
Sándor Á.
Publication venue
Publication date: 01/01/2009
Field of study

Biological knowledge is increasingly represented as a collection of (entity-relationship-entity) triplets. These are queried, mined, appended to papers, and published. However, this representation ignores the argumentation contained within a paper and the relationships between hypotheses, claims and evidence put forth in the article. In this paper, we propose an alternate view of the research article as a network of 'hypotheses and evidence'. Our knowledge representation focuses on scientific discourse as a rhetorical activity, which leads to a different direction in the development of tools and processes for modeling this discourse. We propose to extract knowledge from the article to allow the construction of a system where a specific scientific claim is connected, through trails of meaningful relationships, to experimental evidence. We discuss some current efforts and future plans in this area

CiteSeerX

Open Research Online (The Open University)

Copenhagen University Research Information System

Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

Author: Anantharam Pramod
Anantharam Pramod
Balasuriya Lakshika
Ferrucci David
Kimmig Angelika
McMahon Connor
Meng Lingling
Perera Sujan
Sheth Amit
Wijeratne Sanjaya
Wijeratne Sanjaya
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017
Field of study

Machine Learning has been a big success story during the AI resurgence. One particular stand out success relates to learning from a massive amount of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition for utilizing knowledge whenever it is available or can be created purposefully. In this paper, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex, (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data and continued incorporation of knowledge in learning techniques.Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). arXiv admin note: substantial text overlap with arXiv:1610.0770

arXiv.org e-Print Archive

Crossref

Scholar Commons - Institutional Repository of the University of South Carolina

CORE

A text-mining system for extracting metabolic reactions from full-text articles

Author: Czarnecki Jan M.
Nobeli Irene
Shepherd Adrian J.
Smith Adrian M.L.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Field of study

Background: Increasingly biological text mining research is focusing on the extraction of complex relationships relevant to the construction and curation of biological networks and pathways. However, one important category of pathway—metabolic pathways—has been largely neglected. Here we present a relatively simple method for extracting metabolic reaction information from free text that scores different permutations of assigned entities (enzymes and metabolites) within a given sentence based on the presence and location of stemmed keywords. This method extends an approach that has proved effective in the context of the extraction of protein–protein interactions. Results: When evaluated on a set of manually-curated metabolic pathways using standard performance criteria, our method performs surprisingly well. Precision and recall rates are comparable to those previously achieved for the well-known protein-protein interaction extraction task. Conclusions: We conclude that automated metabolic pathway construction is more tractable than has often been assumed, and that (as in the case of protein–protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective. It is hoped that these results will provide an impetus to further research and act as a useful benchmark for judging the performance of more sophisticated methods that are yet to be developed

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Birkbeck Institutional Research Online

Extracting and Visualizing Semantic Relationships from Chinese Biomedical Text

Author: Meng Yao
Miao Qingliang
Yu Hao
Zhang Bo
Zhang Shu
Publication venue: 'Faculty of Computer Science, Universitas Indonesia'
Publication date: 01/01/2012
Field of study

Waseda University Repository

Recommended from our members

Lexical patterns, features and knowledge resources for coreference resolution in clinical notes

Author: Abdul Roudsari
D’Avolio
Miller
Phil Gooch
Rahman
Recasens
Rosse
Savova
Savova
Uzuner
van Deemter
Zheng
Zheng
Publication venue: 'Elsevier BV'
Publication date: 01/10/2012
Field of study

Generation of entity coreference chains provides a means to extract linked narrative events from clinical notes, but despite being a well-researched topic in natural language processing, general- purpose coreference tools perform poorly on clinical texts. This paper presents a knowledge-centric and pattern-based approach to resolving coreference across a wide variety of clinical records comprising discharge summaries, progress notes, pathology, radiology and surgical reports from two corpora (Ontology Development and Information Extraction (ODIE) and i2b2/VA). In addition, a method for generating coreference chains using progressively pruned linked lists is demonstrated that reduces the search space and facilitates evaluation by a number of metrics. Independent evaluation results show an F-measure for each corpus of 79.2% and 87.5%, respectively, which offers performance at least as good as human annotators, greatly increased performance over general- purpose tools, and improvement on previously reported clinical coreference systems. The system uses a number of open-source components that are available to download

City Research Online

Elsevier - Publisher Connector

Crossref

Exploring subdomain variation in biomedical language.

Author: Korhonen Anna
Lippincott Thomas
Séaghdha Diarmuid Ó
Publication venue: BMC Bioinformatics
Publication date: 27/05/2011
Field of study

BACKGROUND: Applications of Natural Language Processing (NLP) technology to biomedical texts have generated significant interest in recent years. In this paper we identify and investigate the phenomenon of linguistic subdomain variation within the biomedical domain, i.e., the extent to which different subject areas of biomedicine are characterised by different linguistic behaviour. While variation at a coarser domain level such as between newswire and biomedical text is well-studied and known to affect the portability of NLP systems, we are the first to conduct an extensive investigation into more fine-grained levels of variation. RESULTS: Using the large OpenPMC text corpus, which spans the many subdomains of biomedicine, we investigate variation across a number of lexical, syntactic, semantic and discourse-related dimensions. These dimensions are chosen for their relevance to the performance of NLP systems. We use clustering techniques to analyse commonalities and distinctions among the subdomains. CONCLUSIONS: We find that while patterns of inter-subdomain variation differ somewhat from one feature set to another, robust clusters can be identified that correspond to intuitive distinctions such as that between clinical and laboratory subjects. In particular, subdomains relating to genetics and molecular biology, which are the most common sources of material for training and evaluating biomedical NLP tools, are not representative of all biomedical subdomains. We conclude that an awareness of subdomain variation is important when considering the practical use of language processing applications by biomedical researchers

Springer - Publisher Connector

PubMed Central

Apollo (Cambridge)

Using data-driven sublanguage pattern mining to induce knowledge models: application in medical image reports knowledge representation

Author: Fesharaki Nooshin J.
Liu Hongfang
Luo Jake
Zhao Yiqing
Publication venue: UWM Digital Commons
Publication date: 01/01/2018
Field of study

Background: The use of knowledge models facilitates information retrieval, knowledge base development, and therefore supports new knowledge discovery that ultimately enables decision support applications. Most existing works have employed machine learning techniques to construct a knowledge base. However, they often suffer from low precision in extracting entity and relationships. In this paper, we described a data-driven sublanguage pattern mining method that can be used to create a knowledge model. We combined natural language processing (NLP) and semantic network analysis in our model generation pipeline. Methods: As a use case of our pipeline, we utilized data from an open source imaging case repository, Radiopaedia.org, to generate a knowledge model that represents the contents of medical imaging reports. We extracted entities and relationships using the Stanford part-of-speech parser and the “Subject:Relationship:Object” syntactic data schema. The identified noun phrases were tagged with the Unified Medical Language System (UMLS) semantic types. An evaluation was done on a dataset comprised of 83 image notes from four data sources. Results: A semantic type network was built based on the co-occurrence of 135 UMLS semantic types in 23,410 medical image reports. By regrouping the semantic types and generalizing the semantic network, we created a knowledge model that contains 14 semantic categories. Our knowledge model was able to cover 98% of the content in the evaluation corpus and revealed 97% of the relationships. Machine annotation achieved a precision of 87%, recall of 79%, and F-score of 82%. Conclusion: The results indicated that our pipeline was able to produce a comprehensive content-based knowledge model that could represent context from various sources in the same domain

University of Wisconsin-Milwaukee

Knowledge Rich Natural Language Queries over Structured Biological Databases

Author: Chu W. W.
Goldsmith E. J.
InterProlog
Kossmann D.
Lawrence C.
Maio C. D.
Mir S.
Mou X.
Nandi A.
Novik L.
Safran M.
Swofford D. L.
Publication venue
Publication date: 30/03/2017
Field of study

Increasingly, keyword, natural language and NoSQL queries are being used for information retrieval from traditional as well as non-traditional databases such as web, document, image, GIS, legal, and health databases. While their popularity are undeniable for obvious reasons, their engineering is far from simple. In most part, semantics and intent preserving mapping of a well understood natural language query expressed over a structured database schema to a structured query language is still a difficult task, and research to tame the complexity is intense. In this paper, we propose a multi-level knowledge-based middleware to facilitate such mappings that separate the conceptual level from the physical level. We augment these multi-level abstractions with a concept reasoner and a query strategy engine to dynamically link arbitrary natural language querying to well defined structured queries. We demonstrate the feasibility of our approach by presenting a Datalog based prototype system, called BioSmart, that can compute responses to arbitrary natural language queries over arbitrary databases once a syntactic classification of the natural language query is made

arXiv.org e-Print Archive

Crossref

A Rule-based Methodology and Feature-based Methodology for Effect Relation Extraction in Chinese Unstructured Text

Author: Wang Jingcheng
Publication venue: Faculty of Engineering and Information Technologies, School of Information Technologies
Publication date: 01/01/2015
Field of study

The Chinese language differs significantly from English, both in lexical representation and grammatical structure. These differences lead to problems in the Chinese NLP, such as word segmentation and flexible syntactic structure. Many conventional methods and approaches in Natural Language Processing (NLP) based on English text are shown to be ineffective when attending to these language specific problems in late-started Chinese NLP. Relation Extraction is an area under NLP, looking to identify semantic relationships between entities in the text. The term “Effect Relation” is introduced in this research to refer to a specific content type of relationship between two entities, where one entity has a certain “effect” on the other entity. In this research project, a case study on Chinese text from Traditional Chinese Medicine (TCM) journal publications is built, to closely examine the forms of Effect Relation in this text domain. This case study targets the effect of a prescription or herb, in treatment of a disease, symptom or body part. A rule-based methodology is introduced in this thesis. It utilises predetermined rules and templates, derived from the characteristics and pattern observed in the dataset. This methodology achieves the F-score of 0.85 in its Named Entity Recognition (NER) module; 0.79 in its Semantic Relationship Extraction (SRE) module; and the overall performance of 0.46. A second methodology taking a feature-based approach is also introduced in this thesis. It views the RE task as a classification problem and utilises mathematical classification model and features consisting of contextual information and rules. It achieves the F-scores of: 0.73 (NER), 0.88 (SRE) and overall performance of 0.41. The role of functional words in the contemporary Chinese language and in relation to the ERs in this research is explored. Functional words have been found to be effective in detecting the complex structure ER entities as rules in the rule-based methodology

Sydney eScholarship

Nominalization and Alternations in Biomedical Language

Author: Adam Meyers
Adam Meyers
Adam Meyers
Adam Meyers
BarbaraH Partee
Ben Goertzel
Beth Levin
Carol Friedman
CharlesJ Fillmore
Christiane Fellbaum
DeborahA Dahl
Douglas Biber
George Dunham
George Hripcsak
Gondy Leroy
Gondy Leroy
James Pustejovsky
Jin-Dong Kim
JM Ko
John Lehrberger
Jonathan Schuman
K. Bretonnel Cohen
Karin Verspoor
KBretonnel Cohen
KBretonnel Cohen
Laurie Bauer
Lawrence Hunter
Leroy Gondy
Lynette Hirschman
M Narayanaswamy
Malka Rappaport-Hovav
Maria Koptjevskaja-Tamm
Martha Palmer
Martha Palmer
Martha Palmer
MartinF Porter
Michael Johnston
Michael Johnston
Naomi Sager
Naomi Sager
ParantuK Shah
PhilipV Ogren
PhilipV Ogren
Pierre Zweigenbaum
Ralph Grishman
Randolph Quirk
Richard Kittredge
Richard Tzong-Han Tsai
Robert P. Futrelle
RobertB Lees
Ron Artstein
Sameer Pradhan
Seth Kulick
T Ono
Thomas Herbst
Thomas Roeper
ThomasC Rindflesch
TimothyW Finin
Tony McEnery
Tuangthong Wattarujeekrit
Wen-Chi Chou
X Yuan
Yacov Kogan
Yuka Tateisi
Zellig Harris
Zheng Ping Jiang
ZZ Hu
Publication venue: Public Library of Science
Publication date: 01/01/2008
Field of study

Background: This paper presents data on alternations in the argument structure of common domain-specific verbs and their associated verbal nominalizations in the PennBioIE corpus. Alternation is the term in theoretical linguistics for variations in the surface syntactic form of verbs, e.g. the different forms of stimulate in FSH stimulates follicular development and follicular development is stimulated by FSH. The data is used to assess the implications of alternations for biomedical text mining systems and to test the fit of the sublanguage model to biomedical texts. Methodology/Principal Findings: We examined 1,872 tokens of the ten most common domain-specific verbs or their zerorelated nouns in the PennBioIE corpus and labelled them for the presence or absence of three alternations. We then annotated the arguments of 746 tokens of the nominalizations related to these verbs and counted alternations related to the presence or absence of arguments and to the syntactic position of non-absent arguments. We found that alternations are quite common both for verbs and for nominalizations. We also found a previously undescribed alternation involving an adjectival present participle. Conclusions/Significance: We found that even in this semantically restricted domain, alternations are quite common, and alternations involving nominalizations are exceptionally diverse. Nonetheless, the sublanguage model applies to biomedica

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central