Search CORE

6 research outputs found

What company does my news article refer to? Tackling multiclass problems with topic modeling

Author: Farrell Patricio
Kunkel Julian
Lübbering Max
Publication date: 01/01/2019

While it is technically trivial to search for a company name to predict which company a news article refers to, this often leads to incorrect results. In this article, we compare two approaches, bag-of-words with k-nearest neighbors and Latent Dirichlet Allocation with k-nearest neighbors, by assessing their applicability for predicting which S&P 500 company is mentioned in a business news article or press release. Both approaches are evaluated on a corpus of 13k documents consisting of 84% news articles and 16% press releases. While the bag-of-words approach yields accurate predictions, it is highly inefficient due to its very large feature space. The Latent Dirichlet Allocation approach, on the other hand, achieves roughly the same prediction accuracy (0.58 instead of 0.62) while reducing the feature space by a factor of seven.
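The bag-of-words with k-nearest neighbors baseline named in this abstract can be illustrated with a minimal sketch. It uses only the Python standard library, cosine similarity as the distance measure, and a tiny made-up training set. The ticker labels, the example texts, and the function names are all hypothetical and are not taken from the paper. The Latent Dirichlet Allocation variant, which replaces the raw term counts with topic proportions, is not shown.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Map a document to a sparse term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, query, k=3):
    """Return the majority label among the k most similar training documents."""
    query_vec = bag_of_words(query)
    ranked = sorted(
        train,
        key=lambda item: cosine(bag_of_words(item[0]), query_vec),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus: (document text, company label).
TRAIN = [
    ("iphone sales lift quarterly revenue", "AAPL"),
    ("new macbook and iphone models announced", "AAPL"),
    ("app store revenue grows", "AAPL"),
    ("windows and azure cloud growth", "MSFT"),
    ("azure cloud revenue beats estimates", "MSFT"),
    ("office and windows licensing update", "MSFT"),
]

print(knn_predict(TRAIN, "iphone revenue outlook"))
print(knn_predict(TRAIN, "azure cloud growth"))
```

Every distinct term becomes its own dimension in this representation, which is why the feature space grows so quickly on a realistic corpus.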

Publications Server of the Weierstrass Institute for Applied Analysis and Stochastics

Repositorium für Naturwissenschaften und Technik

A Multi-Label Machine Learning Approach to Support Pathologist's Histological Analysis

Author: Amir Topalović
Antonia Azzini
Nicola Cortesi
Stefania Marrara
Publication date: 01/01/2019

This paper proposes a new tool in the field of telemedicine, defined as the specific branch in which IT supports medicine when distance impairs the delivery of proper care to a patient. All the information contained in medical texts, if properly extracted, may be suitable for searching, classification, or statistical analysis. For this reason, a proper information extraction tool may be useful for reducing errors and improving quality control. In this direction, this work presents a multi-label machine learning approach for classifying the information extracted from pathology reports into relevant categories. The aim is to integrate automatic classifiers to improve the current workflow of medical experts by defining a multi-label approach that is able to consider all the features of a model, together with their relationships. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

A Fuzzy Approach Model for Uncovering Hidden Latent Semantic Structure in Medical Text Collections

Author: Gangopadhyay Aryya
Karami Amir
Kharrazi Hadi
Zhou Bin
Publication venue: 'iSchools'
Publication date: 15/03/2015

One of the challenges for text analysis in the medical domain, including clinical notes and research papers, is analyzing large-scale medical documents. As a consequence, finding relevant documents has become more difficult, and previous work has also shown unique problems of medical documents. The themes in documents help to retrieve documents on the same topic with and without a query. One of the popular methods for retrieving information by discovering the themes in documents is topic modeling. In this paper we describe a novel approach to topic modeling, FATM, using fuzzy clustering. To assess the value of FATM, we experiment with two text datasets of medical documents. The quantitative evaluation, carried out through log-likelihood on held-out data, shows that FATM outperforms LDA. This research contributes to the emerging field of understanding the characteristics of medical documents and how to account for them in text mining.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Detecting Hypoglycemia Incidents Reported in Patients' Secure Messages: Using Cost-Sensitive Learning and Oversampling to Reduce Data Imbalance

Author: Chen Jinying
Druhl Emily
Granillo Edgard A.
Lalor John
Liu Weisong
Vimalananda Varsha G.
Yu Hong
Publication venue: eScholarship@UMassChan
Publication date: 11/03/2019

BACKGROUND: Improper dosing of medications such as insulin can cause hypoglycemic episodes, which may lead to severe morbidity or even death. Although secure messaging was designed for exchanging nonurgent messages, patients sometimes report hypoglycemia events through secure messaging. Detecting these patient-reported adverse events may help alert clinical teams and enable early corrective actions to improve patient safety. OBJECTIVE: We aimed to develop a natural language processing system, called HypoDetect (Hypoglycemia Detector), to automatically identify hypoglycemia incidents reported in patients' secure messages. METHODS: An expert in public health annotated 3000 secure message threads between patients with diabetes and US Department of Veterans Affairs clinical teams as containing patient-reported hypoglycemia incidents or not. A physician independently annotated 100 threads randomly selected from this dataset to determine interannotator agreement. We used this dataset to develop and evaluate HypoDetect. HypoDetect incorporates 3 machine learning algorithms widely used for text classification: linear support vector machines, random forest, and logistic regression. We explored different learning features, including new knowledge-driven features. Because only 114 (3.80%) messages were annotated as positive, we investigated cost-sensitive learning and oversampling methods to mitigate the challenge of imbalanced data. RESULTS: The interannotator agreement was Cohen kappa=.976. Using cross-validation, logistic regression with cost-sensitive learning achieved the best performance (area under the receiver operating characteristic curve=0.954, sensitivity=0.693, specificity=0.974, F1 score=0.590). Cost-sensitive learning and the ensembled synthetic minority oversampling technique improved the sensitivity of the baseline systems substantially (by 0.123 to 0.728 absolute gains). Our results show that a variety of features contributed to the best performance of HypoDetect. CONCLUSIONS: Despite the challenge of data imbalance, HypoDetect achieved promising results for the task of detecting hypoglycemia incidents from secure messages. The system has great potential to facilitate early detection and treatment of hypoglycemia.
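The reported F1 score looks modest next to the high sensitivity and specificity, and the class imbalance explains why. The sketch below reconstructs expected confusion-matrix counts from the figures in the abstract (3000 threads, 114 positive, sensitivity 0.693, specificity 0.974) and derives precision and F1 from them. The function name is hypothetical, and treating the cross-validated rates as if they applied to one pooled confusion matrix is an assumption, so the result is only an approximate consistency check.

```python
def f1_from_rates(n_total, n_positive, sensitivity, specificity):
    """Derive precision and F1 from class counts and the two rates.

    Assumes the rates describe a single pooled confusion matrix,
    which is only an approximation of cross-validated results.
    """
    n_negative = n_total - n_positive
    true_pos = sensitivity * n_positive            # expected true positives
    false_pos = (1.0 - specificity) * n_negative   # expected false positives
    precision = true_pos / (true_pos + false_pos)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, f1


# Figures taken from the abstract above.
precision, f1 = f1_from_rates(3000, 114, 0.693, 0.974)
print(f"precision ~ {precision:.3f}, F1 ~ {f1:.3f}")
```

Because negatives outnumber positives by roughly 25 to 1, even a 2.6% false-positive rate yields about as many false alarms as true detections, so precision sits near one half and the derived F1 lands close to the reported 0.590.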

eScholarship@UMMS