53 research outputs found

    Combining Evolving Neural Network Classifiers Using Bagging

    The performance of a neural network classifier depends significantly on its architecture and generalization ability. The proper architecture is usually found by trial and error, which is time consuming and may not yield the optimal network. For this reason, we apply genetic algorithms to the automatic generation of neural networks. Many researchers have shown that combining multiple classifiers improves generalization. One of the most effective combining methods is bagging, in which training sets are selected by resampling from the original training set and the classifiers trained on these sets are combined by voting. We incorporate the bagging technique into the training of evolving neural network classifiers to improve generalization.
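
    A minimal sketch of the bagging step described above, assuming scikit-learn: an MLPClassifier with a fixed architecture stands in for the genetically evolved networks in the paper, bootstrap samples are drawn with replacement, and the members' predictions are combined by majority vote.

```python
# Bagging sketch: bootstrap-resample the training set, train one network per
# sample, and combine predictions by majority vote. The fixed-architecture
# MLPClassifier is a stand-in for the paper's evolved networks (an assumption).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
ensemble = []
for _ in range(10):                                   # 10 bagged members
    idx = rng.integers(0, len(X_tr), size=len(X_tr))  # bootstrap resample
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    clf.fit(X_tr[idx], y_tr[idx])
    ensemble.append(clf)

# Majority vote over the members' predictions.
votes = np.stack([clf.predict(X_te) for clf in ensemble])
y_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("bagged accuracy:", (y_pred == y_te).mean())
```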

    Advantages of using Fuzzy Class Memberships in Self-Organizing Map and Support Vector Machines

    The self-organizing map (SOM) is naturally an unsupervised learning method, but if class labels are known it can be used as a classifier. In a SOM classifier, each neuron is assigned a class label based on the maximum class frequency, and patterns are classified by a nearest-neighbor strategy. The drawback of this strategy is that every pattern counts toward the class frequency with equal importance, regardless of how typical it is. For this reason, fuzzy class memberships can be used instead of crisp class frequencies, and these fuzzy membership labels on the neurons provide another perspective on the feature map. The fuzzy class membership can also be used to select training samples for a support vector machine (SVM) classifier. This method allows us to reduce the training set, as well as the number of support vectors, without significant loss of classification performance.
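
    A minimal sketch of the fuzzy labelling idea, assuming NumPy and an already-trained SOM codebook; the Gaussian typicality weight and the toy data below are illustrative assumptions, not the paper's exact formulation.

```python
# Fuzzy class memberships on a (pre-trained) SOM codebook: instead of crisp
# majority labels, each neuron accumulates class evidence weighted by how
# typical each pattern is (here a Gaussian of its distance to the best-matching
# unit -- an assumed weighting, not the paper's formula).
import numpy as np

def fuzzy_neuron_memberships(codebook, X, y, n_classes, sigma=1.0):
    """codebook: (n_neurons, d) SOM weights; X: (n, d) patterns; y: (n,) labels."""
    member = np.zeros((len(codebook), n_classes))
    d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)  # (n, n_neurons)
    bmu = d.argmin(axis=1)                        # best-matching unit per pattern
    typical = np.exp(-(d[np.arange(len(X)), bmu] ** 2) / (2 * sigma ** 2))
    for i, (unit, label) in enumerate(zip(bmu, y)):
        member[unit, label] += typical[i]         # weighted, not crisp, counting
    # Normalize each neuron's accumulated evidence into a fuzzy membership vector.
    sums = member.sum(axis=1, keepdims=True)
    return np.divide(member, sums, out=np.zeros_like(member), where=sums > 0)

def classify(codebook, memberships, x):
    """Nearest-neuron classification using the fuzzy memberships."""
    unit = np.linalg.norm(codebook - x, axis=1).argmin()
    return memberships[unit].argmax()

# Toy usage with a random stand-in codebook.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)); y = (X[:, 0] > 0).astype(int)
codebook = rng.normal(size=(16, 3))
M = fuzzy_neuron_memberships(codebook, X, y, n_classes=2)
print(classify(codebook, M, X[0]), y[0])
```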

    LOST: A Mental Health Dataset of Low Self-esteem in Reddit Posts

    Low self-esteem and interpersonal needs (i.e., thwarted belongingness (TB) and perceived burdensomeness (PB)) have a major impact on depression and suicide attempts. Individuals seek social connectedness on social media to alleviate their loneliness. Social media platforms allow people to express their thoughts, experiences, beliefs, and emotions. Prior studies on mental health from social media have focused on symptoms, causes, and disorders. In contrast, an initial screening of social media content for interpersonal risk factors and low self-esteem could raise early alerts and help assign therapists to users at risk of mental disturbance. Standardized scales measure self-esteem and interpersonal needs through questions grounded in psychological theories. In the current research, we introduce a psychology-grounded and expertly annotated dataset, LoST: Low Self esTeem, to study and detect low self-esteem on Reddit. Through an annotation approach involving checks on coherence, correctness, consistency, and reliability, we ensure a gold standard for supervised learning. We present results from different deep language models tested using two data augmentation techniques. Our findings suggest the need for a class of language models that infuse psychological and clinical knowledge.
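
    As a purely illustrative sketch of framing low self-esteem detection as supervised text classification (the LoST dataset, its labels, and the paper's models and augmentation techniques are not reproduced here), one might fine-tune a generic pretrained transformer with Hugging Face; the model name and the toy posts below are placeholder assumptions.

```python
# Hedged sketch: binary text classification with a generic pretrained
# transformer. The model, labels, and example posts are placeholders,
# not the LoST paper's actual setup or data.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

texts = ["I feel like I'm never good enough for anyone.",   # placeholder posts
         "Had a great day hiking with friends."]
labels = [1, 0]                                              # 1 = low self-esteem

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

ds = Dataset.from_dict({"text": texts, "label": labels})
ds = ds.map(lambda b: tok(b["text"], truncation=True, padding="max_length",
                          max_length=64), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lost_demo", num_train_epochs=1,
                           per_device_train_batch_size=2, report_to=[]),
    train_dataset=ds,
)
trainer.train()
```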

    DEEPEN: A negation detection system for clinical text incorporating dependency relation into NegEx

    In Electronic Health Records (EHRs), much of the valuable information regarding patients' conditions is embedded in free-text format. Natural language processing (NLP) techniques have been developed to extract clinical information from free text. One challenge in clinical NLP is that the meaning of clinical entities is heavily affected by modifiers such as negation. NegEx, a negation detection algorithm, applies a simple approach that has been shown to be powerful in clinical NLP. However, because it does not consider the contextual relationships between words within a sentence, NegEx fails to correctly capture the negation status of concepts in complex sentences. Incorrect negation assignment can lead to inaccurate assessment of a patient's condition or to contaminated study cohorts. We developed a negation algorithm called DEEPEN that decreases NegEx's false positives by taking into account the dependency relationships between negation words and concepts within a sentence, using the Stanford dependency parser. The system was developed and tested on EHR data from Indiana University (IU) and was further evaluated on a Mayo Clinic dataset to assess its generalizability. The evaluation results demonstrate that DEEPEN, which incorporates dependency parsing into NegEx, can reduce the number of incorrect negation assignments for patients with positive findings, and therefore improve the identification of patients with the target clinical findings in EHRs.
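
    A hedged sketch of the dependency-aware idea: a concept is flagged as negated only if a negation cue is syntactically connected to it in the parse, rather than merely appearing nearby. spaCy stands in for the Stanford dependency parser used by DEEPEN, and the small cue list and traversal rule below are simplifications, not the paper's rule set.

```python
# Dependency-aware negation check (simplified). spaCy stands in for the Stanford
# parser used by DEEPEN; the cue list and traversal rule are toy assumptions.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
NEG_CUES = {"no", "not", "without", "denies", "denied", "negative"}  # toy cue list

def is_negated(sentence, concept):
    """True if a negation cue is connected to `concept` in the dependency parse."""
    doc = nlp(sentence)
    for tok in doc:
        if tok.text.lower() != concept.lower():
            continue
        # Walk from the concept token up through its syntactic ancestors; a cue
        # counts only if it lies on this path or hangs directly off a node on it.
        for node in [tok, *tok.ancestors]:
            if node.lower_ in NEG_CUES:
                return True
            if any(c.dep_ == "neg" or c.lower_ in NEG_CUES for c in node.children):
                return True
    return False

print(is_negated("No evidence of pneumothorax.", "pneumothorax"))
print(is_negated("Patient reports severe headache.", "headache"))
```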

    Early Identification of Childhood Asthma: The Role of Informatics in an Era of Electronic Health Records

    Emerging literature suggests that delayed identification of childhood asthma results in an increased risk of various long-term morbidities compared with timely diagnosis and intervention, and yet this risk is still overlooked. Even when children and adolescents have a history of recurrent asthma-like symptoms and risk factors documented in their medical records, this information is sometimes overlooked by clinicians at the point of care. Given the rapid adoption of electronic health record (EHR) systems, early identification of childhood asthma can be achieved by utilizing (1) asthma ascertainment criteria that leverage the relevant clinical information embedded in EHRs and (2) innovative informatics approaches, such as natural language processing (NLP) algorithms that apply those criteria, to enable such a strategy. In this review, we discuss the literature relevant to this topic and introduce recently published informatics algorithms (criteria-based NLP) as a potential solution to the current challenge of early identification of childhood asthma.
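
    To make the criteria-based NLP idea concrete, here is a minimal sketch assuming simple regular-expression matching over free-text notes; the keyword patterns and the two-criteria threshold are illustrative placeholders, not the published asthma ascertainment criteria or the NLP algorithms cited in the review.

```python
# Criteria-based NLP sketch: scan free-text notes for symptom and diagnosis
# mentions and apply a simple rule over the hits. The patterns and threshold
# are illustrative placeholders only.
import re
from collections import defaultdict

CRITERIA_PATTERNS = {
    "wheeze":         re.compile(r"\bwheez(e|ing|es)\b", re.I),
    "cough":          re.compile(r"\b(chronic|nocturnal|recurrent)\s+cough\b", re.I),
    "physician_dx":   re.compile(r"\basthma\b", re.I),
    "bronchodilator": re.compile(r"\b(albuterol|bronchodilator)\b", re.I),
}

def ascertain_asthma(notes, min_criteria=2):
    """Return (flag, matched criteria) for a list of a patient's clinical notes."""
    hits = defaultdict(int)
    for note in notes:
        for name, pattern in CRITERIA_PATTERNS.items():
            if pattern.search(note):
                hits[name] += 1
    return len(hits) >= min_criteria, dict(hits)

notes = ["3 yo with recurrent cough at night, started albuterol PRN.",
         "Audible wheezing on exam today."]
print(ascertain_asthma(notes))
```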

    Abbreviation definition identification based on automatic precision estimates

    Background: The rapid growth of biomedical literature presents challenges for automatic text processing, one of which is abbreviation identification. The presence of unrecognized abbreviations in text hinders indexing algorithms and adversely affects information retrieval and extraction. Automatic abbreviation definition identification can help resolve these issues. However, abbreviations and their definitions identified by an automatic process are of uncertain validity, and because of the size of databases such as MEDLINE, only a small fraction of abbreviation-definition pairs can be examined manually. An automatic way to estimate the accuracy of abbreviation-definition pairs extracted from text is therefore needed. In this paper we propose an abbreviation definition identification algorithm that employs a variety of strategies to identify the most probable abbreviation definition. In addition, our algorithm produces an accuracy estimate, pseudo-precision, for each strategy without using a human-judged gold standard. The pseudo-precisions determine the order in which the algorithm applies the strategies in seeking to identify the definition of an abbreviation.
    Results: On the Medstract corpus our algorithm produced 97% precision and 85% recall, which is higher than previously reported results. We also annotated 1250 randomly selected MEDLINE records as a gold standard. On this set we achieved 96.5% precision and 83.2% recall. This compares favourably with the well-known Schwartz and Hearst algorithm.
    Conclusion: We developed an algorithm for abbreviation identification that uses a variety of strategies to identify the most probable definition for an abbreviation and also produces an estimated accuracy of the result. This process is purely automatic.
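
    For context, here is a minimal sketch of the Schwartz and Hearst-style short-form/long-form matching that the abstract uses as its comparison baseline (the paper's own multi-strategy algorithm and its pseudo-precision estimates are not reproduced); the parenthesis pattern and candidate window below are simplified assumptions.

```python
# Schwartz-Hearst-style abbreviation/definition matching (simplified baseline).
# Given text with "long form (SF)" patterns, align each short-form character,
# right to left, with the words immediately preceding the parentheses.
import re

def find_best_long_form(short, candidate):
    """Return the shortest trailing substring of `candidate` whose characters
    cover the short form right-to-left (the core Schwartz-Hearst rule)."""
    s, c = short.lower(), candidate.lower()
    si, ci = len(s) - 1, len(c) - 1
    while si >= 0:
        ch = s[si]
        if not ch.isalnum():
            si -= 1
            continue
        # Move left through the candidate until this character matches;
        # the first short-form character must start a word.
        while ci >= 0 and (c[ci] != ch or
                           (si == 0 and ci > 0 and c[ci - 1].isalnum())):
            ci -= 1
        if ci < 0:
            return None
        si -= 1
        ci -= 1
    return candidate[ci + 1:].strip()

def extract_pairs(text):
    pairs = []
    for m in re.finditer(r"\(([^()]{1,10})\)", text):   # short form in parentheses
        short = m.group(1).strip()
        # Candidate long form: a window of words immediately before the parens.
        words = text[:m.start()].split()
        window = " ".join(words[-min(len(words), 2 * len(short) + 5):])
        long_form = find_best_long_form(short, window)
        if long_form:
            pairs.append((short, long_form))
    return pairs

print(extract_pairs("Patients with chronic obstructive pulmonary disease (COPD) "
                    "were assessed by magnetic resonance imaging (MRI)."))
```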