
    TF-IDF vs Word Embeddings for Morbidity Identification in Clinical Notes: An Initial Study

    Today, we are seeing an ever-increasing number of clinical notes that contain clinical results, images, and textual descriptions of patients' health states. All these data can be analyzed and used to provide novel services that help people and domain experts with common healthcare tasks. However, technologies such as Deep Learning and tools such as Word Embeddings have only recently begun to be investigated, and many challenges remain open in healthcare applications. To address these challenges, we propose using Deep Learning and Word Embeddings to identify sixteen morbidity types within textual descriptions of clinical records. For this purpose, we have used a Deep Learning model based on Bidirectional Long Short-Term Memory (LSTM) layers, which can exploit state-of-the-art vector representations such as Word Embeddings. We have employed the pre-trained Word Embeddings GloVe and Word2Vec, as well as our own Word Embeddings trained on the target domain. Furthermore, we have compared the performance of the deep learning approaches against traditional tf-idf features with Support Vector Machine and Multilayer Perceptron classifiers (our baselines). The results suggest that the latter outperform the Deep Learning approaches, whichever word embeddings are used. Our preliminary results indicate that specific features of the dataset bias it in favour of traditional machine learning approaches.
    Comment: 12 pages, 2 figures, 2 tables, SmartPhil 2020-First Workshop on Smart Personal Health Interfaces, Associated to ACM IUI 202
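
    The tf-idf baseline mentioned above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: whitespace tokenisation, the smoothed IDF variant log((1 + N) / (1 + df)) + 1, L2 normalisation and the toy notes are all assumptions. In practice the resulting vectors would be the input to the SVM or MLP classifier.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document.

    Assumptions: lower-cased whitespace tokenisation, raw term counts as TF,
    and the smoothed IDF log((1 + N) / (1 + df)) + 1 (one common variant).
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vec = {t: count * idf[t] for t, count in tf.items()}
        # L2-normalise so that long notes do not dominate the classifier input.
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


# Hypothetical toy notes, for illustration only.
notes = [
    "patient reports chest pain and obesity",
    "history of asthma no chest pain",
    "obesity and diabetes noted",
]
vecs = tfidf_vectors(notes)
```

    A term that is rare across notes, such as "asthma" here, ends up with a higher weight than a term shared by several notes, such as "chest". This reweighting is what lets a linear SVM pick up discriminative morbidity-related words without any learned embeddings.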

    Exploiting Emotions via Composite Pretrained Embedding and Ensemble Language Model

    Decisions in the modern era are based on more than just the available data; they also incorporate feedback from online sources. Processing such reviews is known as Sentiment Analysis (SA) or Emotion Analysis. Understanding users' perspectives and routines is crucial nowadays for multiple reasons, and both businesses and governments use it to make strategic decisions. Various architectures and vector embedding strategies have been developed for SA, and an accurate representation of text is crucial for automatic SA. Because of the large number of languages spoken and written, polysemy and syntactic or semantic issues are common. To get around these problems, we developed effective composite embedding (ECE), a method that combines the advantages of context-independent (e.g., GloVe and fastText) and context-aware (e.g., XLNet) vector embedding techniques to effectively represent the features needed for processing. To improve performance on emotion and sentiment tasks, we propose a stacked ensemble of deep language models. ECE with the ensemble model is evaluated on a balanced dataset to show that it is a reliable embedding technique and a generalised model for SA. To evaluate ECE, cutting-edge ML and deep network language models are deployed and compared. The model is evaluated on benchmark datasets such as MR and Kindle, along with a real-time tweet dataset of user complaints. LIME is used to verify the model's predictions and to provide statistical results for sentences. The model with ECE embeddings achieves state-of-the-art results on the real-time dataset as well.
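
    The abstract does not say how ECE combines context-independent and context-aware vectors. One simple possibility is per-token concatenation, sketched below in standard-library Python. The function name, the toy lookup tables standing in for GloVe and fastText, and the precomputed per-token vectors standing in for XLNet outputs are all hypothetical. This is not the authors' ECE implementation.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def composite_embedding(
    tokens: Sequence[str],
    static_tables: Sequence[Dict[str, Vector]],
    contextual: Sequence[Vector],
) -> List[Vector]:
    """Hypothetical composite embedding: concatenate, per token, the vectors
    from each static (context-independent) table with one contextual vector.

    Assumes each static table is non-empty and all of its vectors share one
    dimensionality; out-of-vocabulary tokens get a zero vector of that size.
    `contextual` holds one vector per token, assumed to come from a
    context-aware encoder (a stand-in for XLNet) run over the same tokens.
    """
    if len(contextual) != len(tokens):
        raise ValueError("need exactly one contextual vector per token")
    combined = []
    for i, token in enumerate(tokens):
        parts: Vector = []
        for table in static_tables:
            dim = len(next(iter(table.values())))
            parts.extend(table.get(token, [0.0] * dim))
        parts.extend(contextual[i])
        combined.append(parts)
    return combined


# Toy stand-ins, for illustration only.
glove_like = {"great": [0.1, 0.2], "movie": [0.3, 0.4]}
fasttext_like = {"great": [0.5], "movie": [0.6]}
xlnet_like = [[1.0, 1.1, 1.2], [2.0, 2.1, 2.2]]
vectors = composite_embedding(["great", "film"], [glove_like, fasttext_like], xlnet_like)
# vectors[1] == [0.0, 0.0, 0.0, 2.0, 2.1, 2.2]  ("film" is out of vocabulary)
```

    Concatenation keeps the static and contextual signals in separate dimensions. A downstream model, such as the stacked ensemble described above, can then learn how much to rely on each. Averaging or weighted summation would require all sources to share one dimensionality.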

    An assessment of deep learning models and word embeddings for toxicity detection within online textual comments

    Today, increasing numbers of people interact online, and the explosion of online communication is producing a large volume of textual comments. However, a major problem in online environments is that comments shared on digital platforms can hide hazards such as fake news, insults, harassment and, more generally, comments that may hurt someone's feelings. In this scenario, detecting such toxicity plays an important role in moderating online communication. Deep learning technologies have recently delivered impressive performance in Natural Language Processing applications, including Sentiment Analysis and emotion detection, across numerous datasets. Such models do not need pre-defined, hand-picked features; they learn sophisticated features from the input data by themselves. In this domain, word embeddings have been widely used to represent words in Sentiment Analysis tasks and have proven very effective. Therefore, in this paper, we investigate the use of deep learning and word embeddings to detect six different types of toxicity within online comments. In doing so, we evaluate the most suitable deep learning layers and state-of-the-art word embeddings for identifying toxicity. The results suggest that Long Short-Term Memory layers combined with mimicked word embeddings are a good choice for this task.
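
    A single comment can carry several toxicity types at once. Detecting six types is therefore often framed as multi-label classification, with one independent sigmoid per type rather than a single softmax. The abstract does not confirm this framing, so treat it as an assumption. The label names below are hypothetical, and the logits stand in for the output of a dense layer on top of the LSTM states.

```python
import math
from typing import List, Sequence

# Hypothetical label names; the abstract only says "six different types".
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_toxicity(logits: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Return every label whose independent sigmoid probability reaches
    the threshold, so zero, one or several labels may be active at once."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Hypothetical logits for one comment.
flags = decode_toxicity([2.0, -3.0, 1.5, -4.0, 0.3, -2.0])
# flags == ["toxic", "obscene", "insult"]
```

    With a softmax, the six probabilities would compete and sum to one, which suits mutually exclusive classes. Independent sigmoids let an insulting and obscene comment be flagged for both.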

    Evaluating neural word embeddings created from online course reviews for sentiment analysis

    Social media provide fertile ground for sharing knowledge and experiences and for the growth of community activities (e.g., debates on different topics). The analysis of user-generated content in this area usually relies on Sentiment Analysis. Word embeddings and Deep Learning have attracted extensive attention in various sentiment detection tasks. In parallel, the literature has exposed the drawbacks of traditional approaches when content from specific contexts is processed with general techniques, so ad-hoc solutions are needed to improve the effectiveness of such systems. In this paper, we focus on user-generated content from the e-learning context to demonstrate that distributional semantic approaches trained on smaller, context-specific textual resources are more effective than approaches trained on bigger, general-purpose ones. To this end, we build context-trained embeddings from online course reviews using state-of-the-art generators. These embeddings are then integrated into a deep neural network we designed to solve a polarity detection task on e-learning reviews, modeled as a regression. By applying our approach to embeddings trained on background corpora from different contexts, we show that performance is better when the background context is aligned with the regression context.