
16 research outputs found

Exploiting biomedical web resources: a case study

Author: DESSI NICOLETTA
PES BARBARA
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

A growing number of web resources are used extensively by healthcare operators to obtain more accurate diagnostic results. In particular, health care is reaping the benefits of technological advances in genomics to meet the demand for genetic tests that allow a better comprehension of diagnostic results. Within this context, Gene Ontology (GO) is a popular and effective means of extracting knowledge from a list of genes and evaluating their semantic similarity. This paper investigates the potential and the limits of GO as support for capturing information about a set of genes that are supposed to play a significant role in a pathological condition. In particular, we present a case study that exploits some biomedical web resources to devise several groups of functionally coherent genes, and we report experiments on the evaluation of their semantic similarity over GO. Due to the GO structure and content, the results reveal limitations that do not affect the evaluation of semantic similarity when genes exhibit simple correlations but do influence the estimation of the relatedness of genes belonging to complex organizations

Elsevier - Publisher Connector

Crossref

Archivio istituzionale della ricerca - Università di Cagliari

Knowledge-based extraction of adverse drug events from biomedical text

Author: Afzal M.Z. (Zubair)
Bui C. (Chinh)
Kang N. (Ning)
Kors J.A. (Jan)
Mulligen E.M. (Erik) van
Singh B. (Bharat)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Background: Many biomedical relation extraction systems are machine-learning based and have to be trained on large annotated corpora that are expensive and cumbersome to construct. We developed a knowledge-based relation extraction system that requires minimal training data, and applied the system for the extraction of adverse drug events from biomedical text. The system consists of a concept recognition module that identifies drugs and adverse effects in sentences, and a knowledg

EUR Research Repository

Erasmus University Digital Repository

Term Extraction and Disambiguation for Semantic Knowledge Enrichment: A Case Study on Initial Public Offering (IPO)

Author: Chang Yen-Ling
Deokar Amit
El-Gayar Omar F
Tao Jie
Publication venue: Beadle Scholar
Publication date: 01/01/2015

Domain knowledge bases are a basis for advanced knowledge-based systems, but manually creating a formal knowledge base for a given domain is both resource-consuming and non-trivial. In this paper, we propose an approach that supports extracting, selecting, and disambiguating terms embedded in domain-specific documents. The extracted terms are later used to enrich existing ontologies/taxonomies, as well as to bridge a domain-specific knowledge base with a generic knowledge base such as WordNet. The proposed approach addresses two major issues in term extraction, namely quality and efficiency. It also adopts a feature-based method that assists in topic extraction and integration with existing ontologies in the given domain. The approach is realized in a research prototype, and a case study in the finance domain illustrates its feasibility and efficiency. A preliminary empirical validation by domain experts is also conducted to determine the accuracy of the proposed approach. The results from the case study indicate the advantages and potential of the proposed approach

Beadle Scholar at Dakota State University

The Study on Automatic Annotation using Structural/Linguistic Characteristics of biomedical documents

Author: 남세진
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/08/2015

Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Dental Science, Major in Healthcare Management and Informatics, August 2015. 김홍기.
Research on automatic annotation is important because it provides the foundation for searching the rapidly growing body of biomedical papers and clinical documents more accurately and for extracting only the information that is needed. This study focuses on two activities: searching the literature, which is essential to research, and completing clinical forms, which are essential for recording a patient's diagnosis, tests, and prescriptions. It investigates the annotation techniques these activities require. Both activities take place routinely on papers and clinical forms, the representative documents of the biomedical field, and improving their efficiency is of significant value to the field.
First, for research papers in text form, we studied automatic tagging of abstracts, which play an important role in setting the direction of research, into the IMRAD (Introduction, Methods, Results, and Discussion) structure commonly used in the biomedical field. Building on results obtained in linguistics for biomedical papers and on results from computer science, we proposed and developed a new automatic tagging system that achieves high performance at low computational cost. With the proposed method, unstructured abstracts could be classified with an accuracy of 77.0 to 90.3% using only 17 features extracted from sentences. When combined with the features used in previous studies, the method reached a maximum accuracy of 91.7%.
For clinical documents, most documents in environments that use an EMR (Electronic Medical Record) system are generated through clinical forms, so we attempted automatic tagging of clinical forms. Unlike research abstracts, clinical forms already have a structured format, so this study sought to tag the expert knowledge embedded in that structure. To this end, we developed a new knowledge model and STEP (Smart Clinical Document Template Editing and Production System), a system that uses the model to support the authoring of clinical forms. To verify the usability of STEP, we developed a clinical form authoring tool and showed that the knowledge base built through the knowledge model can improve the authoring of clinical forms.
The results show biomedical researchers that the large volume of biomedical papers and the clinical documents continuously produced in clinical practice can be searched and reused more accurately. These results are important in that they can improve researchers' activities across the biomedical field. Finally, so that other researchers can also make use of the outcomes of this study, the linguistic resources extracted during the research and a system for checking the results have been made available on the web.
Contents:
I. Introduction: 1. Research background; 2. Research objectives; 3. Organization of the thesis
II. Extracting linguistic features of structured abstracts: 1. Research background; 2. Research objectives; 3. Related work; 4. Methods (4.1. Data corpus; 4.2. Section normalization; 4.3. Section mapping; 4.4. Linguistic feature extraction); 5. Results (5.1. Use of verbs/verb phrases by section; 5.2. Use of N-grams by section; 5.3. Use of nouns (noun phrases) by section; 5.4. Section-discriminating power of the linguistic features); 6. Conclusion
III. Classifying abstract sentences using linguistic features: 1. Research background; 2. Research objectives; 3. Related work; 4. Methods (4.1. Feature set construction; 4.2. Test document set; 4.3. Training and evaluation with SVM); 5. Results (5.1. Performance by linguistic feature; 5.2. Performance by feature group combination); 6. Discussion
IV. Automatic tagging system for biomedical abstract sentences: 1. System overview; 2. Service components (2.1. INTRODUCTION; 2.2 LEXICAL FEATURES; 2.3 RESULTS; 2.4 ONLINE DEMO); 3. Use Cases
V. Tagging clinical forms using structural features: 1. Research background; 2. Research goals; 3. Knowledge model for tagging clinical forms (3.1. Ontology; 3.2. Conceptual model; 3.3. CDT ontology); 4. Tagging clinical forms using the CDT ontology; 5. Conclusion
VI. Form-authoring support system based on a clinical form knowledge base: 1. System overview; 2. System architecture (2.1. Knowledge base management module; 2.2. Core module; 2.3. Web user interface; 2.4. Web Services interface); 3. Use Case; 4. Conclusion
VII. Conclusion
VIII. Limitations of the study and suggestions
References; Appendix; Abstract
Docto

SNU Open Repository and Archive

Text Mining for Chemical Compounds

Author: Akhondi S.A. (Saber)
Publication date: 02/10/2018

Exploring the chemical and biological space covered by patent and journal publications is crucial in early-stage medicinal chemistry activities. The analysis provides understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting points for chemical exploration. Extracting chemical and biological entities from patents and journals through manual extraction by expert curators can take a substantial amount of time and resources. Text mining methods can help to ease this process. In this book, we addressed the lack of quality measurements for assessing the correctness of structural representation within and across chemical databases; the lack of resources to build text-mining systems; the lack of high-performance systems to extract chemical compounds from journals and patents; and the lack of automated systems to identify relevant compounds in patents. The consistency and ambiguity of chemical identifiers were analyzed within and between small-molecule databases in Chapter 2 and Chapter 3. In Chapter 4 and Chapter 7 we developed resources to enable the construction of chemical text-mining systems. In Chapter 5 and Chapter 6, we used community challenges (BioCreative V and BioCreative VI) and their corresponding resources to identify mentions of chemical compounds in journal abstracts and patents. In Chapter 7 we used our findings in previous chapters to extract chemical named entities from patent full text and to classify the relevancy of chemical compounds

EUR Research Repository

Erasmus University Digital Repository

Agile in-litero experiments: how can semi-automated information extraction from neuroscientific literature help neuroscience model building?

Author: Richardet Renaud Luc
Publication venue: Lausanne, EPFL
Publication date: 08/02/2016

In neuroscience, as in many other scientific domains, the primary form of knowledge dissemination is through published articles in peer-reviewed journals. One challenge for modern neuroinformatics is to design methods that make the knowledge in the tremendous backlog of publications accessible for search, analysis and integration into computational models. In this thesis, we introduce novel natural language processing (NLP) models and systems to mine the neuroscientific literature. In addition to in vivo, in vitro or in silico experiments, we term the NLP methods developed in this thesis in litero experiments, aiming at analyzing and making accessible the extended body of neuroscientific literature. In particular, we focus on two important neuroscientific entities: brain regions and neural cells. An integrated NLP model is designed to automatically extract brain region connectivity statements from very large corpora. This system is applied to a large corpus of 25M PubMed abstracts and 600K full-text articles. Central to this system is the creation of a searchable database of brain region connectivity statements, allowing neuroscientists to gain an overview of all brain regions connected to a given region of interest. More importantly, the database enables researchers to provide feedback on connectivity results and links back to the original article sentence to provide the relevant context. The database is evaluated by neuroanatomists on real connectomics tasks (targets of Nucleus Accumbens) and results in a significant reduction in effort compared with previous manual methods (from 1 week to 2h). Subsequently, we introduce neuroNER to identify, normalize and compare instances of neurons in the scientific literature. Our method relies on identifying and analyzing each of the domain features used to annotate a specific neuron mention, like the morphological term 'basket' or brain region 'hippocampus'.
We apply our method to the same corpus of 25M PubMed abstracts and 600K full-text articles and find over 500K unique neuron type mentions. To demonstrate the utility of our approach, we also apply our method towards cross-comparing the NeuroLex and Human Brain Project (HBP) cell type ontologies. By decoupling a neuron mention's identity into its specific compositional features, our method can successfully identify specific neuron types even if they are not explicitly listed within a predefined neuron type lexicon, thus greatly facilitating cross-laboratory studies. In order to build such large databases, several tools and infrastructures for large-scale NLP were developed: a robust pipeline to preprocess full-text PDF articles, as well as bluima, an NLP processing pipeline specialized for neuroscience to perform text mining at PubMed scale. During the development of those two NLP systems, we acknowledged the need for novel NLP approaches to rapidly develop custom text mining solutions. This led to the formalization of the agile text-mining methodology to improve the communication and collaboration between subject matter experts and text miners. Agile text mining is characterized by short development cycles, frequent task redefinition and continuous performance monitoring through integration tests. To support our approach, we developed Sherlok, an NLP framework designed for the development of agile text mining applications
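The idea of decoupling a neuron mention into compositional features, described in the abstract above, can be illustrated with a minimal sketch. This is not the neuroNER implementation: the two mini-lexicons and the function names `decompose` and `same_type` are hypothetical, chosen only to show how two differently worded mentions could be compared feature by feature.

```python
# Hypothetical mini-lexicons; a real system would use curated ontologies.
MORPHOLOGY = {"basket", "pyramidal", "chandelier"}
BRAIN_REGIONS = {"hippocampus", "neocortex", "cerebellum"}


def decompose(mention: str) -> dict[str, set[str]]:
    """Split a neuron mention into the feature terms it contains."""
    tokens = set(mention.lower().replace(",", " ").split())
    return {
        "morphology": tokens & MORPHOLOGY,
        "brain_region": tokens & BRAIN_REGIONS,
    }


def same_type(mention_a: str, mention_b: str) -> bool:
    """Treat two mentions as the same type when all their features match."""
    return decompose(mention_a) == decompose(mention_b)


print(decompose("hippocampus basket cell"))
print(same_type("Hippocampus basket cell", "basket neuron of the hippocampus"))
```

Because the comparison works on features rather than on whole strings, a mention that is absent from any fixed list of neuron type names can still be matched, provided each of its feature terms is known.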

Infoscience - École polytechnique fédérale de Lausanne

A framework for an adaptable and personalised e-learning system based on free web resources

Author: Aeiad E

An adaptable and personalised E-learning system (APELS) architecture is developed to provide a framework for the development of comprehensive learning environments for learners who cannot follow a conventional programme of study. The system extracts information from freely available resources on the Web, taking into consideration the learners' background and requirements, to design modules, and uses a planner system to organise the extracted learning material to facilitate the learning process. The process is supported by the development of an ontology to optimise and support the information extraction process. Additionally, natural language processing techniques are utilised to evaluate a topic's content against a set of learning outcomes as defined by standard curricula. An application in the computer science field is used to illustrate the working mechanisms of the proposed framework and its evaluation based on the ACM/IEEE Computing Curriculum. A variety of models are developed and techniques used to support the adaptability and personalisation features of APELS. First, a learner’s model was designed by incorporating students’ details, students’ requirements and the domain they wish to study into the system. In addition, learning style theories were adopted as a way of identifying and categorising individuals, in order to improve their on-line learning experience, and were applied to the learner’s model. Secondly, the knowledge extraction model is responsible for extracting from the Web the learning resources that satisfy the learners’ needs and learning outcomes. To support this process, an ontology was developed to retrieve the relevant information as per users’ needs. In addition, the model transforms HTML documents to XHTML to provide the information in a format that is accessible and easier to use for extraction and comparison purposes.
Moreover, a matching process was implemented to compute the similarity measure between the ontology concepts that are used in the ACM/IEEE Computer Science Curriculum and those extracted from the websites. The website with the highest similarity score is selected as the best matching website that satisfies the learners’ request. A further step is required to evaluate whether the content extracted by the system is the appropriate learning material for the subject. For this purpose, a learning outcome validation process is added to ensure that the content of the selected websites will enable the appropriate learning based on the learning outcomes set by standard curricula. Finally, the information extracted by the system is passed to a Planner model that structures the content into lectures, tutorials and workshops based on some predefined learning constraints. The APELS system provides a novel addition to the field of adaptive E-learning systems by providing more personalised learning material to each user in a time-efficient way, saving the time he/she would otherwise spend looking for the right course among the vast resources available on the Web or going through the large number of websites and links returned by traditional search engines. The APELS system will adapt better to the learner’s style based on feedback and assessment once the learning process is initiated by the learner. The APELS system is expected to develop over time with more users
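The matching step described in the abstract above (score each website against the curriculum concepts, then pick the highest score) can be sketched as follows. The abstract does not say which similarity measure APELS uses, so Jaccard similarity over concept sets is an assumption here, and the names `jaccard`, `best_matching_site`, and the sample data are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two concept sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matching_site(
    curriculum: set[str], sites: dict[str, set[str]]
) -> tuple[str, dict[str, float]]:
    """Score every site against the curriculum and return the top scorer."""
    scores = {name: jaccard(curriculum, concepts) for name, concepts in sites.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical concept sets, for illustration only.
curriculum = {"stack", "queue", "linked list", "tree"}
sites = {
    "site_a": {"stack", "queue", "recipe"},
    "site_b": {"stack", "queue", "tree", "graph"},
}
best, scores = best_matching_site(curriculum, sites)
print(best, scores)
```

Any measure that maps a pair of concept sets to a comparable score could be substituted for `jaccard` without changing the selection logic.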

University of Salford Institutional Repository