53 research outputs found

    Combining Evolving Neural Network Classifiers Using Bagging

    The performance of a neural network classifier depends significantly on its architecture and generalization ability. The proper architecture is usually found by trial and error, which is time consuming and may not yield the optimal network. For this reason, we apply genetic algorithms to the automatic generation of neural networks. Many researchers have shown that combining multiple classifiers improves generalization. One of the most effective combining methods is bagging, in which training sets are selected by resampling from the original training set and the classifiers trained on these sets are combined by voting. We incorporate the bagging technique into the training of evolving neural network classifiers to improve generalization.
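
    A minimal sketch of the bagging step described above, assuming scikit-learn: an MLPClassifier with a fixed architecture stands in for the genetically evolved networks in the paper, bootstrap samples are drawn with replacement, and the members' predictions are combined by majority vote.

```python
# Bagging sketch: bootstrap-resample the training set, train one network per
# sample, and combine predictions by majority vote. The fixed-architecture
# MLPClassifier is a stand-in for the paper's evolved networks (an assumption).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
ensemble = []
for _ in range(10):                                   # 10 bagged members
    idx = rng.integers(0, len(X_tr), size=len(X_tr))  # bootstrap resample
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    clf.fit(X_tr[idx], y_tr[idx])
    ensemble.append(clf)

# Majority vote over the members' predictions.
votes = np.stack([clf.predict(X_te) for clf in ensemble])
y_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("bagged accuracy:", (y_pred == y_te).mean())
```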

    Advantages of using Fuzzy Class Memberships in Self-Organizing Map and Support Vector Machines

    The self-organizing map (SOM) is naturally an unsupervised learning method, but if class labels are known it can be used as a classifier. In a SOM classifier, each neuron is assigned a class label based on the maximum class frequency, and patterns are classified by a nearest-neighbor strategy. The drawback of this strategy is that every pattern counts toward the class frequency with equal importance, regardless of how typical it is. For this reason, fuzzy class memberships can be used instead of crisp class frequencies, and these fuzzy membership labels on the neurons provide another perspective on the feature map. The fuzzy class membership can also be used to select training samples for a support vector machine (SVM) classifier. This method allows us to reduce the training set, as well as the number of support vectors, without significant loss of classification performance.
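
    A minimal sketch of the fuzzy labelling idea, assuming NumPy and an already-trained SOM codebook; the Gaussian typicality weight and the toy data below are illustrative assumptions, not the paper's exact formulation.

```python
# Fuzzy class memberships on a (pre-trained) SOM codebook: instead of crisp
# majority labels, each neuron accumulates class evidence weighted by how
# typical each pattern is (here a Gaussian of its distance to the best-matching
# unit -- an assumed weighting, not the paper's formula).
import numpy as np

def fuzzy_neuron_memberships(codebook, X, y, n_classes, sigma=1.0):
    """codebook: (n_neurons, d) SOM weights; X: (n, d) patterns; y: (n,) labels."""
    member = np.zeros((len(codebook), n_classes))
    d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)  # (n, n_neurons)
    bmu = d.argmin(axis=1)                        # best-matching unit per pattern
    typical = np.exp(-(d[np.arange(len(X)), bmu] ** 2) / (2 * sigma ** 2))
    for i, (unit, label) in enumerate(zip(bmu, y)):
        member[unit, label] += typical[i]         # weighted, not crisp, counting
    # Normalize each neuron's accumulated evidence into a fuzzy membership vector.
    sums = member.sum(axis=1, keepdims=True)
    return np.divide(member, sums, out=np.zeros_like(member), where=sums > 0)

def classify(codebook, memberships, x):
    """Nearest-neuron classification using the fuzzy memberships."""
    unit = np.linalg.norm(codebook - x, axis=1).argmin()
    return memberships[unit].argmax()

# Toy usage with a random stand-in codebook.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)); y = (X[:, 0] > 0).astype(int)
codebook = rng.normal(size=(16, 3))
M = fuzzy_neuron_memberships(codebook, X, y, n_classes=2)
print(classify(codebook, M, X[0]), y[0])
```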

    LOST: A Mental Health Dataset of Low Self-esteem in Reddit Posts

    Low self-esteem and interpersonal needs (i.e., thwarted belongingness (TB) and perceived burdensomeness (PB)) have a major impact on depression and suicide attempts. Individuals seek social connectedness on social media to alleviate their loneliness. Social media platforms allow people to express their thoughts, experiences, beliefs, and emotions. Prior studies on mental health from social media have focused on symptoms, causes, and disorders. In contrast, an initial screening of social media content for interpersonal risk factors and low self-esteem could raise early alerts and help assign therapists to users at risk of mental disturbance. Standardized scales measure self-esteem and interpersonal needs through questions grounded in psychological theories. In the current research, we introduce a psychology-grounded and expertly annotated dataset, LoST: Low Self esTeem, to study and detect low self-esteem on Reddit. Through an annotation approach involving checks on coherence, correctness, consistency, and reliability, we ensure a gold standard for supervised learning. We present results from different deep language models tested using two data augmentation techniques. Our findings suggest the need for a class of language models that infuse psychological and clinical knowledge.
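
    As a purely illustrative sketch of framing low self-esteem detection as supervised text classification (the LoST dataset, its labels, and the paper's models and augmentation techniques are not reproduced here), one might fine-tune a generic pretrained transformer with Hugging Face; the model name and the toy posts below are placeholder assumptions.

```python
# Hedged sketch: binary text classification with a generic pretrained
# transformer. The model, labels, and example posts are placeholders,
# not the LoST paper's actual setup or data.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

texts = ["I feel like I'm never good enough for anyone.",   # placeholder posts
         "Had a great day hiking with friends."]
labels = [1, 0]                                              # 1 = low self-esteem

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

ds = Dataset.from_dict({"text": texts, "label": labels})
ds = ds.map(lambda b: tok(b["text"], truncation=True, padding="max_length",
                          max_length=64), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lost_demo", num_train_epochs=1,
                           per_device_train_batch_size=2, report_to=[]),
    train_dataset=ds,
)
trainer.train()
```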

    DEEPEN: A negation detection system for clinical text incorporating dependency relation into NegEx

    In Electronic Health Records (EHRs), much of the valuable information regarding patients' conditions is embedded in free-text format. Natural language processing (NLP) techniques have been developed to extract clinical information from free text. One challenge in clinical NLP is that the meaning of clinical entities is heavily affected by modifiers such as negation. NegEx, a negation detection algorithm, applies a simple approach that has been shown to be powerful in clinical NLP. However, because it does not consider the contextual relationships between words within a sentence, NegEx fails to correctly capture the negation status of concepts in complex sentences. Incorrect negation assignment can lead to inaccurate assessment of a patient's condition or to contaminated study cohorts. We developed a negation algorithm called DEEPEN that decreases NegEx's false positives by taking into account the dependency relationships between negation words and concepts within a sentence, using the Stanford dependency parser. The system was developed and tested on EHR data from Indiana University (IU) and was further evaluated on a Mayo Clinic dataset to assess its generalizability. The evaluation results demonstrate that DEEPEN, which incorporates dependency parsing into NegEx, can reduce the number of incorrect negation assignments for patients with positive findings, and therefore improve the identification of patients with the target clinical findings in EHRs.
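
    A hedged sketch of the dependency-aware idea: a concept is flagged as negated only if a negation cue is syntactically connected to it in the parse, rather than merely appearing nearby. spaCy stands in for the Stanford dependency parser used by DEEPEN, and the small cue list and traversal rule below are simplifications, not the paper's rule set.

```python
# Dependency-aware negation check (simplified). spaCy stands in for the Stanford
# parser used by DEEPEN; the cue list and traversal rule are toy assumptions.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
NEG_CUES = {"no", "not", "without", "denies", "denied", "negative"}  # toy cue list

def is_negated(sentence, concept):
    """True if a negation cue is connected to `concept` in the dependency parse."""
    doc = nlp(sentence)
    for tok in doc:
        if tok.text.lower() != concept.lower():
            continue
        # Walk from the concept token up through its syntactic ancestors; a cue
        # counts only if it lies on this path or hangs directly off a node on it.
        for node in [tok, *tok.ancestors]:
            if node.lower_ in NEG_CUES:
                return True
            if any(c.dep_ == "neg" or c.lower_ in NEG_CUES for c in node.children):
                return True
    return False

print(is_negated("No evidence of pneumothorax.", "pneumothorax"))
print(is_negated("Patient reports severe headache.", "headache"))
```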

    Early Identification of Childhood Asthma: The Role of Informatics in an Era of Electronic Health Records

    Emerging literature suggests that delayed identification of childhood asthma results in an increased risk of various long-term morbidities compared with timely diagnosis and intervention, and yet this risk is still overlooked. Even when children and adolescents have a history of recurrent asthma-like symptoms and risk factors documented in their medical records, this information is sometimes overlooked by clinicians at the point of care. Given the rapid adoption of electronic health record (EHR) systems, early identification of childhood asthma can be achieved by utilizing (1) asthma ascertainment criteria that leverage the relevant clinical information embedded in EHRs and (2) innovative informatics approaches, such as natural language processing (NLP) algorithms that apply those criteria, to enable such a strategy. In this review, we discuss the literature relevant to this topic and introduce recently published informatics algorithms (criteria-based NLP) as a potential solution to the current challenge of early identification of childhood asthma.
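
    To make the criteria-based NLP idea concrete, here is a minimal sketch assuming simple regular-expression matching over free-text notes; the keyword patterns and the two-criteria threshold are illustrative placeholders, not the published asthma ascertainment criteria or the NLP algorithms cited in the review.

```python
# Criteria-based NLP sketch: scan free-text notes for symptom and diagnosis
# mentions and apply a simple rule over the hits. The patterns and threshold
# are illustrative placeholders only.
import re
from collections import defaultdict

CRITERIA_PATTERNS = {
    "wheeze":         re.compile(r"\bwheez(e|ing|es)\b", re.I),
    "cough":          re.compile(r"\b(chronic|nocturnal|recurrent)\s+cough\b", re.I),
    "physician_dx":   re.compile(r"\basthma\b", re.I),
    "bronchodilator": re.compile(r"\b(albuterol|bronchodilator)\b", re.I),
}

def ascertain_asthma(notes, min_criteria=2):
    """Return (flag, matched criteria) for a list of a patient's clinical notes."""
    hits = defaultdict(int)
    for note in notes:
        for name, pattern in CRITERIA_PATTERNS.items():
            if pattern.search(note):
                hits[name] += 1
    return len(hits) >= min_criteria, dict(hits)

notes = ["3 yo with recurrent cough at night, started albuterol PRN.",
         "Audible wheezing on exam today."]
print(ascertain_asthma(notes))
```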

    Abbreviation definition identification based on automatic precision estimates

    Background: The rapid growth of biomedical literature presents challenges for automatic text processing, one of which is abbreviation identification. The presence of unrecognized abbreviations in text hinders indexing algorithms and adversely affects information retrieval and extraction. Automatic abbreviation definition identification can help resolve these issues. However, abbreviations and their definitions identified by an automatic process are of uncertain validity, and because of the size of databases such as MEDLINE, only a small fraction of abbreviation-definition pairs can be examined manually. An automatic way to estimate the accuracy of abbreviation-definition pairs extracted from text is therefore needed. In this paper we propose an abbreviation definition identification algorithm that employs a variety of strategies to identify the most probable abbreviation definition. In addition, our algorithm produces an accuracy estimate, pseudo-precision, for each strategy without using a human-judged gold standard. The pseudo-precisions determine the order in which the algorithm applies the strategies in seeking to identify the definition of an abbreviation.
    Results: On the Medstract corpus our algorithm produced 97% precision and 85% recall, which is higher than previously reported results. We also annotated 1250 randomly selected MEDLINE records as a gold standard. On this set we achieved 96.5% precision and 83.2% recall. This compares favourably with the well-known Schwartz and Hearst algorithm.
    Conclusion: We developed an algorithm for abbreviation identification that uses a variety of strategies to identify the most probable definition for an abbreviation and also produces an estimated accuracy of the result. This process is purely automatic.
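
    For context, here is a minimal sketch of the Schwartz and Hearst-style short-form/long-form matching that the abstract uses as its comparison baseline (the paper's own multi-strategy algorithm and its pseudo-precision estimates are not reproduced); the parenthesis pattern and candidate window below are simplified assumptions.

```python
# Schwartz-Hearst-style abbreviation/definition matching (simplified baseline).
# Given text with "long form (SF)" patterns, align each short-form character,
# right to left, with the words immediately preceding the parentheses.
import re

def find_best_long_form(short, candidate):
    """Return the shortest trailing substring of `candidate` whose characters
    cover the short form right-to-left (the core Schwartz-Hearst rule)."""
    s, c = short.lower(), candidate.lower()
    si, ci = len(s) - 1, len(c) - 1
    while si >= 0:
        ch = s[si]
        if not ch.isalnum():
            si -= 1
            continue
        # Move left through the candidate until this character matches;
        # the first short-form character must start a word.
        while ci >= 0 and (c[ci] != ch or
                           (si == 0 and ci > 0 and c[ci - 1].isalnum())):
            ci -= 1
        if ci < 0:
            return None
        si -= 1
        ci -= 1
    return candidate[ci + 1:].strip()

def extract_pairs(text):
    pairs = []
    for m in re.finditer(r"\(([^()]{1,10})\)", text):   # short form in parentheses
        short = m.group(1).strip()
        # Candidate long form: a window of words immediately before the parens.
        words = text[:m.start()].split()
        window = " ".join(words[-min(len(words), 2 * len(short) + 5):])
        long_form = find_best_long_form(short, window)
        if long_form:
            pairs.append((short, long_form))
    return pairs

print(extract_pairs("Patients with chronic obstructive pulmonary disease (COPD) "
                    "were assessed by magnetic resonance imaging (MRI)."))
```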