13 research outputs found
A Study on Gene Annotation from Biological Literature
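To help biologists quickly and effectively find information about the genes they need on the rapidly growing Internet, this dissertation examines issues in gene annotation of biological documents, for example improving gene recognition performance, classifying the documents of interest to database curators, extracting gene functions, and annotating Gene Ontology terms, and proposes solutions so that gene annotation can be integrated into databases.
Recognizing genes is the most basic step in extracting gene information from biological documents. To improve gene recognition performance, we propose a hybrid strategy that combines filtering and integration. In our experiments, we automatically mine words that frequently co-occur with genes and use them to filter out unlikely gene candidates. In addition, we use an integration strategy to raise recall. Experiments show that this hybrid strategy improves on the original recognition performance. Once genes can be annotated correctly, another important task is to retrieve the documents of interest to database curators. This dissertation examines classification at different levels of granularity for Gene Ontology. We train an SVM on three parts of an article, (1) title and abstract, (2) MeSH terms, and (3) table and figure captions, together with the UMLS semantic network, and the experimental results show quite good performance. Extracting gene functions is also an important way to understand genes. At present, gene functions can be recorded manually through the GeneRIF entries of the Entrez Gene database. We propose two methods, (1) a function extraction method and (2) a machine learning method, to generate GeneRIFs automatically from documents, which is a great help to the traditional manual GeneRIF submission process.
Finally, to integrate the information in documents into databases, we express gene properties with a standard vocabulary; this dissertation uses the widely adopted Gene Ontology (GO). In GO annotation research, annotation is usually carried out at one of two levels: (1) the document level and (2) the gene level. The former annotates the GO terms an article carries, while the latter states more precisely which gene has which GO annotation in the article. This dissertation examines both levels. At the document level we propose a relevance detection method, and at the gene level we propose a density model and a gravitation model. At the document level, the previously extracted GeneRIFs serve as evidence for GO annotation for database curators to consult. At the gene level, besides using the proximity of genes and GO terms, we also apply the physical law of universal gravitation; the experimental results show that density and gravitation relationships are both good features for GO annotation.
In this dissertation we study various issues which will help biologists obtain relevant gene information from the rapidly growing body of online material in the biomedical field. Namely, our studies aim to improve the performance of biomedical named entity recognition, to classify the relevant documents for database curators, and to improve gene function extraction and Gene Ontology annotation. We propose approaches for each issue. Our final goal is to integrate the information extracted from the biological literature into the existing databases.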
Given biomedical documents, it is fundamental to recognize the biomedical entities first. To improve the performance of biomedical entity recognition, we introduce a hybrid strategy that combines a filtering strategy and an integration strategy. We show a fully automatic method of mining collocates from scientific texts in the protein and gene domain and applying the collocates to filter out unlikely protein/gene candidates. Furthermore, we use the integration strategy to increase recall rates. The experimental results demonstrate that this hybrid strategy performs better than the original protein/gene taggers. After biomedical entities are recognized, another important issue is to retrieve the relevant documents for database curators so that this information can be added to the existing database. The dissertation also investigates different granularities of classification for GO annotation. We utilize three parts of an article, i.e., (1) titles and abstracts, (2) MeSH terms and (3) captions of tables and figures, as well as the semantic network of UMLS, as features for an SVM. Evaluation results demonstrate overall high performance in this work. Thirdly, gene function extraction is essential for biologists to understand genes. Currently, researchers can manually submit GeneRIFs to the Entrez Gene database. We propose two approaches, a "function extraction approach" and a "machine learning approach", to automatically extract GeneRIFs from the curatable documents generated in the previous step. The experimental results are promising.
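The collocate mining described above can be illustrated with a t test, which the thesis outline lists among its collocation statistics (Section 2.2.3.3). The following is a minimal sketch of the textbook t score for a word co-occurring with a tagged entity; the function name, its parameters, and the Bernoulli variance approximation are assumptions for illustration, not the dissertation's exact formulation.

```python
import math


def collocation_t_score(pair_count, word_count, entity_count, total):
    """Textbook t score for a candidate collocate (hypothetical helper).

    pair_count   -- times the word co-occurs with a tagged gene/protein
    word_count   -- total occurrences of the word
    entity_count -- total occurrences of tagged genes/proteins
    total        -- total number of co-occurrence slots in the corpus
    """
    observed = pair_count / total                             # sample mean
    expected = (word_count / total) * (entity_count / total)  # mean under independence
    variance = observed * (1.0 - observed)                    # Bernoulli variance (assumption)
    if variance == 0.0:
        return 0.0
    return (observed - expected) / math.sqrt(variance / total)


# A word seen 30 times next to an entity scores higher than one seen twice,
# given the same marginal counts.
strong = collocation_t_score(30, 100, 500, 100_000)
weak = collocation_t_score(2, 100, 500, 100_000)
```

Words whose score exceeds a chosen threshold would be kept as collocates; candidates that never appear near any collocate could then be filtered out. The threshold itself is not given in the abstract.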
Finally, in order to integrate the extracted information into the database, it is necessary to present genes with standard vocabularies. We use the widely used controlled vocabulary Gene Ontology (GO) in this dissertation. Researchers usually do GO annotation at different levels, i.e., "document level" and "gene level." The former annotates the GO terms in the document without identifying the relevant genes, while the latter explicitly identifies the annotation of genes, GO terms and documents. This dissertation explores GO annotation at both levels. At the document level, we annotate genes by the relevance detection approach. At the gene level, we introduce density and gravitation models. Moreover, we utilize the GeneRIFs extracted in the previous stage as references for annotating GO terms at the document level, which will be of great help for database curators. In addition, we explore the proximity of genes and GO terms in the paragraph at the gene level. Our experiments show that density and gravitation relationships are good features for GO annotation.
Oral Examination Committee Certification
Acknowledgements
Abstract (Chinese)
Abstract
Table of Contents
List of Illustrations
List of Tables
Chapter 1 Introduction
1.1 Data Mining for Gene Annotation from Biological Literature
1.2 Biomedical Entity Recognition
1.3 Gene Function Generation
1.4 Biological Database Curation
1.5 The Goal of the Study
Chapter 2 Biomedical Entity Recognition
2.1 Introduction
2.2 Collocates of Biomedical Named Entities
2.2.1 Tagging the Corpus
2.2.2 Preprocessing
2.2.2.1 Exclusion of Stopwords
2.2.2.2 Stemming
2.2.3 Computing Collocation Statistics
2.2.3.1 Frequency
2.2.3.2 Mean and Variance
2.2.3.3 t Test Model
2.2.4 Extraction of Collocates
2.3 Experiments
2.3.1 Consideration of Precision Rate
2.3.1.1 Filtering Strategies
2.3.1.2 Evaluation of Filtering Strategies
2.3.2 Consideration of Recall Rate
2.3.2.1 Integration Strategies
2.3.2.2 Integration Evaluation of Proteins
2.3.2.3 Integration Evaluation of Genes
2.4 Discussion
Chapter 3 Biological Database Curatable Documents
3.1 Introduction
3.2 Study of Retrieving Relevant Documents for GO Annotation
3.2.1 System Overview
3.2.2 Methods
3.2.2.1 Document Preprocessing
3.2.2.2 Employing Domain-Specific Knowledge
3.2.2.3 Model Selection
3.2.3 Experiments
3.2.3.1 Experimental Data
3.2.3.2 Evaluation Metrics
3.2.3.3 Results and Discussion
3.3 Study of Retrieving Relevant Documents for GO Sub-ontology Annotation
3.3.1 System Overview
3.3.2 Methods
3.3.2.1 Document Preprocessing
3.3.2.2 Feature Extraction
3.3.2.3 Exploitation of Full Text Documents
3.3.2.4 SVM Classification
3.3.2.5 Normalization versus Stemming
3.3.3 Experiments
3.3.3.1 Experimental Data
3.3.3.2 Evaluation Metrics
3.3.3.3 Results and Discussion
3.4 Discussion
Chapter 4 Gene Function Generation
4.1 Introduction
4.2 Architecture Overview
4.2.1 Background
4.2.2 Overall Architecture
4.3 Methods
4.3.1 Function Extraction Approach
4.3.1.1 Training Material Preparation
4.3.1.2 Function Words Extraction
4.3.1.3 Introducers Extraction
4.3.1.4 Computing the Weight for Each Function Word, weight(wi)
4.3.1.5 Computing the Score for Each Sentence in the Testing Abstract
4.3.1.6 Function Extraction Algorithm
4.3.2 Machine Learning Approach
4.3.2.1 Training and Test Material Preparation
4.3.2.2 GRIF Words Extraction and Weighting Scheme
4.3.2.3 Class Definition and Feature Extraction
4.3.2.4 Training SVMs
4.3.2.5 Picking Up the Answer Sentence
4.4 Experiments
4.4.1 Experimental Data
4.4.2 Evaluation Metrics
4.4.3 Function Extraction Approach
4.4.3.1 Results of the First Experiment
4.4.3.2 Results with Different Weight Schemes
4.4.4 Machine Learning Approach
4.5 Discussion
Chapter 5 GO Annotation at Document Level
5.1 Introduction
5.2 Annotation Flow
5.3 Methods
5.3.1 Experimental Data
5.3.2 Similarity Measure
5.3.3 Generation of Predicted GO Terms
5.4 Results
5.5 Discussion
Chapter 6 GO Annotation at Gene Level
6.1 Introduction
6.2 System Overview
6.3 Corpus Consideration
6.4 Methods
6.4.1 Gene Name Tagging
6.4.2 GO Term Tagging
6.4.3 GO to Gene Association
6.4.3.1 Density Model
6.4.3.2 Gravitation Model
6.5 Experiments
6.5.1 Evaluation Metrics
6.5.2 Results
6.6 Discussion
Chapter 7 Conclusions and Future Work
7.1 Achievements
7.2 Future Work
References
Appendix
Appendix A Terms suggested by an expert
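The abstract above names a density model and a gravitation model for associating GO terms with genes (Sections 6.4.3.1 and 6.4.3.2) but does not give their formulas. As a hedged illustration only, the sketch below scores a gene and a GO term from the token positions of their mentions in one paragraph; the unit masses, the inverse-square form borrowed from Newton's law, the window size, and both function names are assumptions, not the dissertation's definitions.

```python
from itertools import product


def gravitation_score(gene_positions, go_positions, gene_mass=1.0, go_mass=1.0):
    """Sum m1 * m2 / d^2 over all (gene mention, GO term mention) pairs.

    Positions are token offsets within a paragraph (hypothetical input format).
    """
    score = 0.0
    for gene_pos, go_pos in product(gene_positions, go_positions):
        distance = abs(gene_pos - go_pos)
        if distance == 0:
            continue  # overlapping mentions are skipped (assumption)
        score += gene_mass * go_mass / distance ** 2
    return score


def density_score(gene_positions, go_positions, window=10):
    """Fraction of GO term mentions lying within `window` tokens of a gene mention."""
    if not go_positions:
        return 0.0
    near = sum(
        1 for go_pos in go_positions
        if any(abs(go_pos - gene_pos) <= window for gene_pos in gene_positions)
    )
    return near / len(go_positions)


# One gene mention at token 0; GO term mentions at tokens 2 and 50.
g = gravitation_score([0], [2, 50])
d = density_score([0], [2, 50], window=10)
```

Both scores grow as mentions move closer together, which matches the abstract's use of proximity as a feature; how the dissertation actually weights mentions is not stated here.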
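Application of neural network on student modelling in ITS
In general, an intelligent tutoring system is divided into four modules: the domain expert module, the student module, the tutoring module, and the interface module. The student module plays a very important role. Because it provides the other modules with basic information about the student, the intelligent tutoring system can assess the student's familiarity with the teaching material, record the student's learning progress, and further infer the student's current state, thereby meeting the needs of individualized instruction.
This thesis uses a neural network architecture to build the student module. First, we obtain the curriculum structure and teaching steps from the teacher in order to build a neural network suited to the course. The responses the student gives during learning are read in by the neural network as input data. As the student continues to learn, the network adjusts its internal data, records the student's learning history, infers the student's learning state, and determines which course subtopics the student is not sufficiently familiar with. Finally, the neural network makes suggestions to the tutoring system to decide the next subtopic to teach.
The distinguishing feature of this neural network is that it is built from basic information already available in teaching (the curriculum structure and teaching steps) and therefore fits the characteristics of teaching. The main aim is to build a flexible student module and to exploit the strong analytical capability of neural networks to detect gaps in certain concepts during the student's learning process and report them to the tutoring system.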
Relationship of common dyadic coping to marital satisfaction and quality of life for patients with brain injury and their spouses in a rehabilitation facility: using common fate model
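Objective: Based on the common fate model, to examine the level of common dyadic coping among patients with functional impairment after brain injury and their spouses in a rehabilitation facility, and to explore its relationship with the couple's marital satisfaction and quality of life. Methods: From October 2022 to June 2023, 101 inpatients with brain injury at Beijing Bo'ai Hospital were selected; the patients and their spouses completed the Common Dyadic Coping scale, the Kansas Marital Satisfaction Scale, and the World Health Organization Quality of Life brief scale. Results: Common dyadic coping was significantly positively correlated with marital satisfaction (β=0.814, P<0.001). Common dyadic coping was significantly positively correlated with each partner's quality of life (β=0.271, P=0.038; β=0.481, P<0.001); spouses showed significant positive correlations with common dyadic coping in the physical health, psychological, social relationships, and environment domains, whereas patients showed significant correlations only in the psychological and social relationships domains. Conclusion: In the face of the stress of brain injury, the couple's level of common dyadic coping positively predicts both partners' marital satisfaction and quality of life, with a stronger predictive effect for spouses. Patient couples should be treated as a unit and included in clinical management to promote positive psychological adaptation in both partners and improve rehabilitation outcomes.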
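Parallel Research Institutes: From Digital Transformation to Intelligent Transformation
To address the dual complexity challenges that research institutes currently face in management and in research operations, the concept of parallel research institutes has been proposed. Parallel research institutes are grounded in parallel intelligence theory with virtual-real interaction. They draw on digital construction technologies based on digital twins and the metaverse, distributed governance technologies based on blockchain and decentralized autonomous organizations and operations (DAO), intelligent decision-making technologies based on multimodal big data and large models, and scientific innovation paradigms based on decentralized science (DeSci) and AI-driven science (AI4S). Together these form a complexity-science-based plan for guiding the transformation of research institutes and build a trustworthy, reliable, usable, and highly beneficial ecosystem for smart research organization and operation. The paper presents the system design and key technologies of parallel research institutes, describes their main features and advantages, and discusses typical application scenarios. Parallel research institutes go beyond simple digital transformation of research institutes and emphasize a more advanced intelligent transformation, aiming to promote their sustainable and healthy development.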
Application of neural network for implementing a practical student model
The student model plays a critical role in intelligent tutoring systems. With a student model, the tutoring system can adapt the learning process to satisfy the individual needs of every student. Here we use neural network methods to implement the student model. In the proposed approach, a teacher first supplies the curriculum structure and the crucial pedagogical steps of the learning process. These two kinds of data are encoded as the weights and layers of the network. Our neural network then takes the student's responses as input data and performs learning. During learning, the network records the student's history and changes the corresponding weights in order to record his or her learning state. Finally, the system infers the student's misunderstandings and provides suggestions to the tutor. Our goal is to embed a student model in a neural network and to evaluate misunderstandings in the learning processes of learners. In the experiment, the proposed neural network can infer the topic where the student's understanding is weakest, as a reference for the next lesson.
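To make the update-and-infer loop described above concrete, here is a deliberately simplified sketch. It assumes a single layer in which each question is linked to curriculum topics by fixed weights and per-topic mastery is nudged by a delta-rule step after every response. The function names, the learning rate, and the data layout are all hypothetical; the paper's actual network architecture is not specified in this abstract.

```python
def update_mastery(mastery, question_topics, correct, rate=0.2):
    """Return a new mastery map after one student response.

    mastery         -- dict topic -> estimated mastery in [0, 1]
    question_topics -- dict topic -> weight linking the question to that topic
                       (stands in for the teacher-supplied curriculum structure)
    correct         -- True if the student answered correctly
    """
    target = 1.0 if correct else 0.0
    updated = dict(mastery)
    for topic, weight in question_topics.items():
        # Delta-rule step: move the estimate toward the observed outcome.
        updated[topic] += rate * weight * (target - updated[topic])
    return updated


def weakest_topic(mastery):
    """Topic with the lowest estimated mastery, suggested for the next lesson."""
    return min(mastery, key=mastery.get)


state = {"fractions": 0.5, "decimals": 0.5}
state = update_mastery(state, {"fractions": 1.0}, correct=False)
suggestion = weakest_topic(state)
```

After one wrong answer on a question tied only to "fractions", that topic's estimate drops below the other, so it becomes the suggestion passed to the tutor.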
JUNO Sensitivity on Proton Decay Searches
The Jiangmen Underground Neutrino Observatory (JUNO) is a large liquid scintillator detector designed to explore many topics in fundamental physics. In this paper, the potential of searching for proton decay in mode with JUNO is investigated. The kaon and its decay particles feature a clear three-fold coincidence signature that results in a high efficiency for identification. Moreover, the excellent energy resolution of JUNO makes it possible to suppress the sizable background caused by other delayed signals. Based on these advantages, the detection efficiency for the proton decay via is 36.9% with a background level of 0.2 events after 10 years of data taking. The estimated sensitivity based on 200 kton-years exposure is years, competitive with the current best limits on the proton lifetime in this channel.
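The sensitivity quoted above is a lower limit on the partial lifetime of the proton. As a hedged sketch, assuming the standard counting-experiment form and a 90% confidence level (neither the formula nor the confidence level is stated in this abstract), the limit relates to the quantities in the abstract as follows:

```latex
\frac{\tau}{B} \;>\; \frac{N_p \, T \, \epsilon}{n_{90}}
```

Here \(\epsilon\) is the detection efficiency (36.9% above), \(N_p T\) is the exposure expressed in proton-years (corresponding to the 200 kton-years above), \(B\) is the branching ratio of the decay channel, and \(n_{90}\) is the upper limit on the number of signal events given the expected background (0.2 events above). The values of \(N_p\) and \(n_{90}\) are not given in this abstract, so no number is derived here.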
