376 research outputs found

    Dutch named entity recognition using ensemble classifiers

    Get PDF

    Improved Named Entity Recognition Through SVM-Based Combination

    Get PDF
    Named Entity Extraction (NER) consists in identifying specific textual expressions, which represent various types of concepts: persons, locations, organizations, etc. It is an important part of natural language processing, because it is often used when building more advanced text-based tools, especially in the context of information extraction. Consequently, many NER tools are now available, designed to handle various sorts of texts, languages and entity types. A recent study on biographical texts showed the overall indices used to assess the performance of these tools hide the fact they can behave rather differently depending on the textual context, and could actually be complementary. In this work, we check this assumption by proposing two methods allowing to combine several NER tools: one relies on a voting process and the other is SVM-based. Both take advantage of a global text feature to guide the combination process. We extend an existing corpus to provide enough data for training and testing. We implement an open source flexible platform aiming at benchmarking NER tools. We apply our combination methods on a selection of NER tools, including state-of-the-art ones, as well as our custom tool specifically designed to process hyperlinked biographical texts. Our results show both proposed combination approaches outmatch the individual performance of all the considered standalone NER tools. Of the two, the SVM-based approach reaches the highest performance

    Fine-grained Dutch named entity recognition

    Get PDF
    This paper describes the creation of a fine-grained named entity annotation scheme and corpus for Dutch, and experiments on automatic main type and subtype named entity recognition. We give an overview of existing named entity annotation schemes, and motivate our own, which describes six main types (persons, organizations, locations, products, events and miscellaneous named entities) and finer-grained information on subtypes and metonymic usage. This was applied to a one-million-word subset of the Dutch SoNaR reference corpus. The classifier for main type named entities achieves a micro-averaged F-score of 84.91 %, and is publicly available, along with the corpus and annotations

    Enhancing navigation in biomedical databases by community voting and database-driven text classification

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them.</p> <p>Results</p> <p>Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly.</p> <p>Conclusion</p> <p>Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases.</p> <p>The system can be accessed at <url>http://pepbank.mgh.harvard.edu</url>.</p

    The role of AI classifiers in skin cancer images

    Get PDF
    Background: The use of different imaging modalities to assist in skin cancer diagnosis is a common practice in clinical scenarios. Different features representative of the lesion under evaluation can be retrieved from image analysis and processing. However, the integration and understanding of these additional parameters can be a challenging task for physicians, so artificial intelligence (AI) methods can be implemented to assist in this process. This bibliographic research was performed with the goal of assessing the current applications of AI algorithms as an assistive tool in skin cancer diagnosis, based on information retrieved from different imaging modalities. Materials and methods: The bibliography databases ISI Web of Science, PubMed and Scopus were used for the literature search, with the combination of keywords: skin cancer, skin neoplasm, imaging and classification methods. Results: The search resulted in 526 publications, which underwent a screening process, considering the established eligibility criteria. After screening, only 65 were qualified for revision. Conclusion: Different imaging modalities have already been coupled with AI methods, particularly dermoscopy for melanoma recognition. Learners based on support vector machines seem to be the preferred option. Future work should focus on image analysis, processing stages and image fusion assuring the best possible classification outcome.info:eu-repo/semantics/publishedVersio

    Development of a recommendation system for scientific literature based on deep learning

    Get PDF
    Dissertação de mestrado em BioinformaticsThe previous few decades have seen an enormous volume of articles from the scientific commu nity on the most diverse biomedical topics, making it extremely challenging for researchers to find relevant information. Methods like Machine Learning (ML) and Deep Learning (DL) have been used to create tools that can speed up this process. In that context, this work focuses on examining the performance of different ML and DL techniques when classifying biomedical documents, mainly regarding their relevance to given topics. To evaluate the different techniques, the dataset from the BioCreative VI Track 4 challenge was used. The objective of the challenge was to identify documents related to protein-protein interactions altered by mutations, a topic extremely important in precision medicine. Protein-protein interactions play a crucial role in the cellular mechanisms of all living organisms, and mutations in these interaction sites could be indicative of diseases. To handle the data to be used in training, some text processing methods were implemented in the Omnia package from OmniumAI, the host company of this work. Several preprocessing and feature extraction methods were implemented, such as removing stopwords and TF-IDF, which may be used in other case studies. They can be used either with generic text or biomedical text. These methods, in conjunction with ML pipelines already developed by the Omnia team, allowed the training of several traditional ML models. We were able to achieve a small improvement on performance, compared to the challenge baseline, when applying these traditional ML models on the same dataset. Regarding DL, testing with a CNN model, it was clear that the BioWordVec pre-trained embedding achieved the best performance of all pre-trained embeddings. Additionally, we explored the application of more complex DL models. These models achieved a better performance than the best challenge submission. BioLinkBERT managed an improvement of 0.4 percent points on precision, 4.9 percent points on recall, and 2.2 percent points on F1.As décadas anteriores assistiram a um enorme aumento no volume de artigos da comunidade científica sobre os mais diversos tópicos biomédicos, tornando extremamente difícil para os investigadores encontrar informação relevante. Métodos como Aprendizagem Máquina (AM) e Aprendizagem Profunda (AP) tem sido utilizados para criar ferramentas que podem acelerar este processo. Neste contexto, este trabalho centra-se na avaliação do desempenho de diferentes técnicas de AM e AP na classificação de documentos biomédicos, principalmente no que diz respeito à sua relevância para determinados tópicos. Para avaliar as diferentes técnicas, foi utilizado o conjunto de dados do desafio BioCreative VI Track 4. O objectivo do desafio era identificar documentos relacionados com as interações proteína-proteína alteradas por mutações, um tópico extremamente importante na medicina de precisão. As interacções proteína-proteína desempenham um papel crucial nos mecanismos celulares de todos os organismos vivos, e as mutações nestes locais de interacção podem ser indicativas de doenças. Para tratar os dados a utilizar no treino, alguns métodos de processamento de texto foram implementados no pacote Omnia da OmniumAI, a empresa anfitriã deste trabalho. Foram implementados vários métodos de pré-processamento e extracção de características, tais como a remoção de palavras irrelevantes e TF-IDF, que podem ser utilizados em outros casos de estudos, tanto com texto genérico quer com texto biomédico. Estes métodos, em conjunto com as pipelines de AM já desenvolvidas pela equipa da Omnia, permitiram o treino de vários modelos tradicionais de AM. Conseguimos alcançar uma pequena melhoria no desempenho, em comparação com a linha de referência do desafio, ao aplicar estes modelos tradicionais de AM no mesmo conjunto de dados. Relativamente a AP, testando com um modelo CNN, ficou claro que o embedding pré-treinado BioWordVec alcançou o melhor desempenho de todos os embeddings pré-treinados. Adicionalmente, exploramos a aplicação de modelos de AP mais complexos. Estes modelos alcançaram um melhor desempenho do que a melhor submissão do desafio. BioLinkBERT conseguiu uma melhoria de 0,4 pontos percentuais na precisão, 4,9 pontos percentuais no recall, e 2,2 pontos percentuais em F1

    Usefulness of Artificial Neural Networks in the Diagnosis and Treatment of Sleep Apnea-Hypopnea Syndrome

    Get PDF
    Sleep apnea-hypopnea syndrome (SAHS) is a chronic and highly prevalent disease considered a major health problem in industrialized countries. The gold standard diagnostic methodology is in-laboratory nocturnal polysomnography (PSG), which is complex, costly, and time consuming. In order to overcome these limitations, novel and simplified diagnostic alternatives are demanded. Sleep scientists carried out an exhaustive research during the last decades focused on the design of automated expert systems derived from artificial intelligence able to help sleep specialists in their daily practice. Among automated pattern recognition techniques, artificial neural networks (ANNs) have demonstrated to be efficient and accurate algorithms in order to implement computer-aided diagnosis systems aimed at assisting physicians in the management of SAHS. In this regard, several applications of ANNs have been developed, such as classification of patients suspected of suffering from SAHS, apnea-hypopnea index (AHI) prediction, detection and quantification of respiratory events, apneic events classification, automated sleep staging and arousal detection, alertness monitoring systems, and airflow pressure optimization in positive airway pressure (PAP) devices to fit patients’ needs. In the present research, current applications of ANNs in the framework of SAHS management are thoroughly reviewed
    corecore