19 research outputs found

    Morphological Disambiguator For Turkish

    Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2007
    Determining the properties of a word is an important problem in natural language processing tasks such as summarization, text categorization, translation between languages, and semantic analysis. At the most basic level, the property of a word is its part of speech: whether it is a noun, a verb, or an adjective. At a finer level, sub-categories must also be identified, for instance the tense and person of a verb. Because Turkish is an agglutinative language, its words are made up of roots, stems, and affixes. Turkish has very few prefixes but is extremely rich in suffixes: there are roughly 300, about half of which are in heavy use. Although this feature contributes greatly to the formation of new words, it also causes problems in morphological analysis. Whereas morphological analysis of a word in an Indo-European language yields little ambiguity, a Turkish word has 1.8 analyses. The aim of this thesis is to select, from the results produced by a morphological analyzer for Turkish, the parse that is correct in the given context. To this end, the distribution of morphological ambiguities in Turkish words was first determined, and the words were then grouped by ambiguity type. Finally, specific disambiguation rules were written for each ambiguity type to resolve the ambiguities.
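
The rule-based selection described in this abstract can be illustrated with a minimal sketch. The tag names, the single rule, and the fallback to the first remaining parse are all assumptions made for illustration; they are not the rules written in the thesis.

```python
from typing import Callable, Dict, List, Optional

# A parse is a root plus a part-of-speech tag (hypothetical tag names).
Parse = Dict[str, str]
Rule = Callable[[Optional[Parse], List[Parse]], List[Parse]]


def rule_noun_after_adjective(prev: Optional[Parse], candidates: List[Parse]) -> List[Parse]:
    """Illustrative rule: after an adjective, keep only noun readings if any exist."""
    if prev is not None and prev["pos"] == "Adj":
        nouns = [c for c in candidates if c["pos"] == "Noun"]
        if nouns:
            return nouns
    return candidates


def disambiguate(sentence: List[List[Parse]], rules: List[Rule]) -> List[Parse]:
    """Pick one parse per word by applying each rule to the analyzer's candidates."""
    chosen: List[Parse] = []
    prev: Optional[Parse] = None
    for candidates in sentence:
        remaining = candidates
        for rule in rules:
            remaining = rule(prev, remaining)
        prev = remaining[0]  # assumption: fall back to the first remaining parse
        chosen.append(prev)
    return chosen


# "güzel yüz": "yüz" can be read as a numeral (hundred), a noun (face), or a verb (swim).
sentence = [
    [{"root": "güzel", "pos": "Adj"}],
    [{"root": "yüz", "pos": "Num"}, {"root": "yüz", "pos": "Noun"}, {"root": "yüz", "pos": "Verb"}],
]
result = disambiguate(sentence, [rule_noun_after_adjective])
```

With this toy rule set, the noun reading of "yüz" is selected because the previous word is an adjective.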

    Linking entities through an ontology using word embeddings and syntactic re-ranking

    Background: Although there is an enormous number of textual resources in the biomedical domain, manually curated resources currently cover only a small part of the existing knowledge. The vast majority of this information is in unstructured form and contains nonstandard naming conventions. Named entity recognition, the identification of entity names in text, is not adequate without a standardization step. Linking each identified entity mention in text to an ontology/dictionary concept is essential for making sense of the identified entities. This paper presents an unsupervised approach for linking named entities to concepts in an ontology/dictionary. We propose an approach for the normalization of biomedical entities through an ontology/dictionary that uses word embeddings to represent semantic spaces and a syntactic parser to give higher weight to the most informative word in the named entity mentions.
    Results: We applied the proposed method to two different normalization tasks: the normalization of bacteria biotope entities through the Onto-Biotope ontology and the normalization of adverse drug reaction entities through the Medical Dictionary for Regulatory Activities (MedDRA). The proposed method achieved a precision score of 65.9%, which is 2.9 percentage points above the state-of-the-art result, on the BioNLP Shared Task 2016 Bacteria Biotope test data, and a macro-averaged precision score of 68.7% on the Text Analysis Conference 2017 Adverse Drug Reaction test data.
    Conclusions: The core contribution of this paper is a syntax-based way of combining the individual word vectors to form vectors for the named entity mentions and ontology concepts, which can then be used to measure the similarity between them. The proposed approach is unsupervised and does not require labeled data, making it easily applicable to different domains.
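
The idea of weighting the most informative word when combining word vectors can be sketched as follows. This is only an illustration under stated assumptions: the head word is taken as given (the paper obtains it from a syntactic parser), the weight of 2.0 is arbitrary, the weighted average and cosine similarity are stand-ins for whatever combination the authors use, and the two-dimensional embeddings are toy values.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def weighted_vector(tokens: Sequence[str], head: str,
                    embeddings: Dict[str, Vector], head_weight: float = 2.0) -> Vector:
    """Weighted average of word vectors; the head word counts head_weight times."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:  # skip out-of-vocabulary words
            continue
        w = head_weight if tok == head else 1.0
        total = [t + w * v for t, v in zip(total, vec)]
        weight_sum += w
    return [t / weight_sum for t in total] if weight_sum else total


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def link(mention: Tuple[Sequence[str], str],
         concepts: Dict[str, Tuple[Sequence[str], str]],
         embeddings: Dict[str, Vector]) -> str:
    """Return the concept whose vector is most similar to the mention vector."""
    m = weighted_vector(mention[0], mention[1], embeddings)
    return max(concepts,
               key=lambda name: cosine(m, weighted_vector(*concepts[name], embeddings)))


# Toy embeddings and concept names, invented for the example.
toy_embeddings = {"human": [1.0, 0.0], "gut": [0.0, 1.0], "intestine": [0.1, 0.9]}
toy_concepts = {"intestine": (["intestine"], "intestine"), "human": (["human"], "human")}
best = link((["human", "gut"], "gut"), toy_concepts, toy_embeddings)
```

In this toy setup, marking "gut" as the head pulls the mention vector toward "intestine"; marking "human" as the head instead would link the same mention to "human".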

    ISIKSumm at BioLaySumm task 1: BART-based summarization system enhanced with Bio-entity labels

    Communicating scientific research to the general public is an essential yet challenging task. Lay summaries, which provide a simplified version of research findings, can bridge the gap between scientific knowledge and public understanding. The BioLaySumm task (Goldsack et al., 2023) is a shared task that seeks to automate this process by generating lay summaries from biomedical articles. The task organizers provide two datasets, curated from two biomedical journals (PLOS and eLife). As a participant in this shared task, we developed a system to generate a lay summary from an article's abstract and main text.
    Publisher's Version

    ISIKUN at the FinCausal 2020: Linguistically informed machine-learning approach for causality identification in financial documents

    This paper presents our participation in the FinCausal-2020 Shared Task, whose ultimate aim is to extract cause-effect relations from a given financial text. Our participation includes two systems, one for each of the two sub-tasks. The first sub-task (Task-1) consists of the binary classification of the given sentences as causally meaningful (1) or causally meaningless (0). Our approach for Task-1 applies linear support vector machines after transforming the input sentences into vector representations using the term frequency-inverse document frequency scheme with 3-grams. The second sub-task (Task-2) consists of identifying the cause-effect relations in the sentences that are detected as causally meaningful. Our approach for Task-2 is a CRF-based model that uses linguistically informed features. For Task-1, the results show only a small difference between the proposed approach based on linear support vector machines (F-score 94%), which requires less time, and the BERT-based baseline (F-score 95%). For Task-2, although only minor modifications, such as to the learning algorithm type and the feature representations, were made to the conditional random fields based baseline (F-score 52%), we obtained better results (F-score 60%). The source code for both tasks is available online (https://github.com/ozenirgokberk/FinCausal2020.git/).
    Publisher's Version
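
The Task-1 recipe (TF-IDF over n-grams followed by a linear SVM) can be sketched in plain Python. Several choices here are assumptions rather than the authors' configuration: word n-grams of length 1 to 3 are used (the abstract says only "3-grams"), the IDF is a smoothed variant, the SVM is trained by simple subgradient descent on the regularized hinge loss, and the bias term is omitted for brevity. The training sentences are invented.

```python
import math
from collections import Counter
from typing import Dict, List

SparseVec = Dict[str, float]


def ngrams(text: str, max_n: int = 3) -> List[str]:
    """Word n-grams of length 1..max_n (assumed; could also be character n-grams)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def fit_idf(docs: List[str]) -> Dict[str, float]:
    """Smoothed inverse document frequency for every n-gram seen in training."""
    df: Counter = Counter()
    for doc in docs:
        df.update(set(ngrams(doc)))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc: str, idf: Dict[str, float]) -> SparseVec:
    """L2-normalized TF-IDF vector; n-grams unseen in training are ignored."""
    counts = Counter(t for t in ngrams(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def score(w: SparseVec, x: SparseVec) -> float:
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(vectors: List[SparseVec], labels: List[int],
                     epochs: int = 50, lr: float = 0.1, lam: float = 0.01) -> SparseVec:
    """Subgradient descent on the L2-regularized hinge loss; labels are +1 or -1."""
    w: SparseVec = {}
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            margin = y * score(w, x)
            for t in w:  # shrink weights (regularization step)
                w[t] *= 1 - lr * lam
            if margin < 1:  # hinge loss is active: move toward the example
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * y * v
    return w


def predict(w: SparseVec, x: SparseVec) -> int:
    """Return 1 (causal) or 0 (non-causal)."""
    return 1 if score(w, x) > 0 else 0


train_docs = [
    "profits rose because demand increased",
    "shares fell due to weak sales",
    "the company is based in paris",
    "the meeting was held on monday",
]
train_labels = [1, 1, -1, -1]
idf = fit_idf(train_docs)
weights = train_linear_svm([tfidf(d, idf) for d in train_docs], train_labels)
```

A new sentence is classified with `predict(weights, tfidf(sentence, idf))`. In practice a library implementation of both steps would normally be preferred over this hand-rolled version.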

    BOUN-ISIK participation: an unsupervised approach for the named entity normalization and relation extraction of Bacteria Biotopes

    This paper presents our participation in the Bacteria Biotope Task of the BioNLP Shared Task 2019. Our participation includes two systems for the two subtasks of the Bacteria Biotope Task: the normalization of entities (BB-norm) and the identification of the relations between the entities in a given biomedical text (BB-rel). For the normalization of entities, we used word embeddings and syntactic re-ranking. For the relation extraction task, pre-defined rules are used. Although both approaches are unsupervised, in the sense that they do not need any labeled data, they achieved promising results. For the BB-norm task in particular, the results show that the proposed method performs as well as deep learning based methods, which require labeled data.
    Publisher's Version

    Protein interaction prediction on PHI networks using graph convolution networks

    Proteins are biological molecules that play a critical role in vital biological processes. Interactions between pathogen proteins and host proteins form pathogen-host interaction (PHI) networks. These bipartite interaction networks are of great importance in determining which vital activities are affected by the pathogen and, consequently, which diseases it may cause. Experimental detection of protein interactions in wet labs is both time-consuming and costly. The limited number of experimentally detectable interactions, and the fact that some potential interactions are overlooked, motivate the development of computational prediction methods. In this study, a graph convolution network (GCN) based method is presented for predicting protein-protein interactions in PHI networks. The GCN model (GraphSAGE), which is trained in an unsupervised manner, uses amino acid sequences as node features in addition to the topological information. To the best of our knowledge, this is the first study to provide GCN-based protein-protein interaction prediction in PHI networks. The experimental results show that the developed model performs 10% better than the state-of-the-art algorithms on the benchmark PHI dataset and predicts interactions with 96% accuracy.
    Publisher's Version
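
A single GraphSAGE-style layer with a mean aggregator can be sketched as below to show how node features and topology are combined. This is a simplified illustration, not the model from the study: the weight matrix is hand-picked rather than learned, the output normalization and the unsupervised training objective are omitted, and the two-dimensional features stand in for sequence-derived features. The node names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sage_layer(features: Dict[str, Vector],
               neighbors: Dict[str, List[str]],
               weight: List[Vector]) -> Dict[str, Vector]:
    """One layer: h_v' = ReLU(W · concat(h_v, mean of neighbor features))."""
    out: Dict[str, Vector] = {}
    for node, h in features.items():
        nbrs = neighbors.get(node, [])
        agg = mean([features[u] for u in nbrs]) if nbrs else [0.0] * len(h)
        concat = h + agg  # self features followed by aggregated neighbor features
        z = [sum(w * x for w, x in zip(row, concat)) for row in weight]
        out[node] = [max(0.0, v) for v in z]  # ReLU
    return out


# Toy bipartite graph: one host protein linked to one pathogen protein.
toy_features = {"host_A": [1.0, 0.0], "pathogen_B": [0.0, 1.0]}
toy_neighbors = {"host_A": ["pathogen_B"], "pathogen_B": ["host_A"]}
toy_weight = [[1.0, 0.0, 0.0, 0.0],   # copies the node's own first feature
              [0.0, 0.0, 0.0, 1.0]]   # copies the neighbors' mean second feature
embeddings = sage_layer(toy_features, toy_neighbors, toy_weight)
```

Stacking several such layers lets each node's embedding reflect neighbors that are more than one hop away.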

    Graph convolutional network based virus-human protein-protein interaction prediction for novel viruses

    Computational identification of human-virus protein-protein interactions (PHIs) is a worthwhile step towards understanding infection mechanisms. Analysis of PHI networks is important for the determination of pathogenic diseases. Prediction of these interactions is a popular problem, since experimental detection of PHIs is both time-consuming and expensive. The available methods use biological features such as amino acid sequences, molecular structure, or biological activities for prediction. Recent studies show that the topological properties of proteins in protein-protein interaction (PPI) networks increase the performance of the predictions. Basic network projections, random-walk-based models, or graph neural networks are used for generating topologically enriched (hybrid) protein embeddings. In this study, we propose a three-stage machine learning pipeline that generates and uses hybrid embeddings for PHI prediction. In the first stage, numerical features are extracted from the amino acid sequences using the Doc2Vec and Byte Pair Encoding methods. In the second stage, the amino acid embeddings are used as node features while training a modified GraphSAGE model, which is an improved version of the graph convolutional network. Lastly, the hybrid protein embeddings are used for training a binary interaction classifier that predicts whether or not there is an interaction between two given proteins. The proposed method is evaluated with comprehensive experiments to test its functionality and compare it with the state-of-the-art methods. The experimental results on the benchmark dataset show the effectiveness of the proposed model, which achieves a 3–23% better area under the curve (AUC) score than its competitors.
    Publisher's Version. Q2. WOS:000858873400002. PMID: 3603772
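
For readers unfamiliar with the AUC metric reported here, the following minimal sketch computes it as the probability that a randomly chosen interacting pair receives a higher score than a randomly chosen non-interacting pair, with ties counted as one half. The function name and the example labels and scores are invented for illustration and are unrelated to the paper's data.

```python
from typing import List


def auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for two interacting (1) and two non-interacting (0) pairs.
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise form is quadratic in the number of examples; it is meant to show the definition, not to be efficient.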

    Increasing the awareness of the parents regarding the oral health status of their 0-3 years-old children

    Objective: This study aimed to (a) compare nursing students' knowledge about the oral and dental health of children before and after oral health education, and (b) evaluate the effectiveness of the oral health education given by these trained students, during home visits, to parents with children aged 0-3 years.
    Methods: In this quasi-experimental study, 60 senior students in the nursing department were first trained on infants' and young children's oral and dental health through standardized training modules. Then, 180 parents of low socioeconomic status with children aged 0-3 were trained by the nursing students in face-to-face interviews during home visits. The pre- and post-training knowledge levels of both the nursing students and the parents, and the changes in their awareness after training, were measured using standard questions administered as a pre-test (preT) and a post-test (postT) at 1-week intervals. The Wilcoxon test was used to assess the differences in correct answers between the pre- and post-tests. The significance level was set at p<0.05.
    Results: The mean age of the parents' children was 18.92 ± 9.96 months. For the 50 questions on the seven-module education, the median numbers of correct answers in preT and postT were 28 and 37.5, respectively. The total number of correct answers increased statistically significantly in postT (p<0.001). For the 14 questions on the education given to parents through brochures, the median numbers of correct answers in preT and postT were 6 and 10, respectively. The total number of correct answers again increased statistically significantly in postT after the training (p<0.001).
    Conclusion: The knowledge level of nursing students about the oral and dental health of children can be increased through standardized training programs. Home visits and face-to-face interviews by non-dental healthcare professionals are a practicable way to increase awareness in parents.
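
The paired pre-test/post-test comparison relies on the Wilcoxon signed-rank test. A minimal sketch of its test statistic follows, assuming the common convention of dropping zero differences and assigning average ranks to ties; it returns only the rank sums, not a p-value, and the scores in the example are hypothetical, not the study's data.

```python
from typing import List, Tuple


def wilcoxon_signed_rank(pre: List[float], post: List[float]) -> Tuple[float, float]:
    """Return (W+, W-): rank sums of positive and negative paired differences."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # zero differences are dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied absolute differences
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical numbers of correct answers for four participants.
w_plus, w_minus = wilcoxon_signed_rank([28, 30, 25, 31], [37, 33, 29, 30])
```

A large imbalance between the two rank sums indicates a systematic shift between the pre-test and post-test scores; converting it to a p-value requires the reference distribution, which is left out here.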

    Research into the effect of proton pump inhibitors on lungs and leukocytes

    Background: Proton pump inhibitors (PPI) are the most commonly used medication in the world. They are prescribed as an effective treatment choice, especially for gastrointestinal system diseases linked to hyperacidity. Off-indication and unnecessary use is also very common. Many recent publications have reported significant side effects. However, there are insufficient studies on the mechanism behind these side effects.
    Methods: Twenty-four Wistar albino rats were used in this study. The rats were divided into 3 groups: a control group, a group administered H2 receptor blockers, and a group administered PPI. Medications were administered intraperitoneally for 30 days. After 30 days, the rats were euthanized and lung tissue was obtained. The lung tissue was stained immunohistochemically for catalase, superoxide dismutase, glutathione peroxidase, and myeloperoxidase, and with toluidine blue, and was examined under a light microscope. Transmission electron microscopy (TEM) was used to investigate lung tissue and neutrophil leukocytes. Additionally, hydrogen peroxide (H2O2) levels in the lung tissue were measured biochemically.
    Results: The amount of H2O2, which is produced by lysosomes and has important duties in neutrophil function in lung tissue, was statistically significantly reduced in the group administered PPI. Examination of the immunohistochemically stained specimens showed increased amounts of antioxidants in the PPI group. TEM identified more findings of inflammation in the lung tissue of the group administered PPI than in the control group and the group administered H2 receptor blockers.
    Conclusion: We found that long-term PPI use disrupts neutrophil leukocyte functions in the lung. All clinicians should be much more careful about PPI use.