11 research outputs found

    Forecasting Retail Client Flow with LSTMs on Inconsistent Time Series

    Forecasting client flow in stores is an important input to retail planning. This research introduces two Long Short-Term Memory (LSTM) network architectures for time series forecasting of client flow in retail stores. The models are combined with three main data preprocessing approaches: a data imputation method that standardizes store schedules; a harmonic regression method that captures and removes the seasonal and trend components of the time series; and a sliding window sampling method that builds the samples for the network's training phase. Results were not extensively optimized, but the framework leaves the door open for further improvements.
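The sliding window sampling step mentioned in this abstract can be illustrated with a minimal sketch. The function name `sliding_windows`, the `window` and `horizon` parameters, and the client counts are assumptions made for illustration; the abstract does not give the authors' window sizes or data.

```python
from typing import List, Sequence, Tuple


def sliding_windows(
    series: Sequence[float], window: int, horizon: int = 1
) -> Tuple[List[List[float]], List[List[float]]]:
    """Split a series into (input window, target) pairs for supervised training."""
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    inputs: List[List[float]] = []
    targets: List[List[float]] = []
    # The last usable start index leaves room for a full window plus the horizon.
    for start in range(len(series) - window - horizon + 1):
        inputs.append(list(series[start:start + window]))
        targets.append(list(series[start + window:start + window + horizon]))
    return inputs, targets


# Hypothetical hourly client counts for one store (illustrative values only).
hourly_clients = [12, 15, 20, 18, 25, 30, 28]
X, y = sliding_windows(hourly_clients, window=3, horizon=1)
```

Each row of `X` holds three consecutive observations and the matching row of `y` holds the value that follows them, which is the shape an LSTM forecaster would typically be trained on.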

    Content vs metrics: Using language modeling to evaluate in-line source code comments for Python

    Undergraduate thesis submitted to the Department of Computer Science, Ashesi University, in partial fulfillment of the Bachelor of Science degree in Computer Science, May 2020. Documentation is vital to the understanding, maintenance and, ultimately, survival of software projects. And yet, many software projects either lack documentation or are very poorly documented. This results in a gradual decline in the quality of the code and may require complete overhauls in extreme cases. It is therefore important to evaluate documentation to ensure that it conveys clear and meaningful ideas. While existing methods of evaluating documentation are metrics-based and look at the structure of documentation examples, this paper explores the possibility of evaluating documentation by assessing its contents. There is, however, no existing corpus of documentation for natural language processing tasks. A corpus of Python function/method comments is assembled, and a language modeling experiment is performed on it. The results of this experiment are mixed. While they show that it is possible to evaluate documentation by looking at its content as opposed to its structure, they also show that this approach may not necessarily be more accurate, as lower-quality comment examples received higher probability than higher-quality ones.
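To make the idea of scoring comments with a language model concrete, here is a minimal sketch. It uses a bigram model with add-one smoothing, which is an assumption: the abstract does not say which language model the thesis used. The class name `BigramModel` and the sample comments are hypothetical.

```python
import math
from collections import Counter
from typing import List, Set


class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, comments: List[str]) -> None:
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        self.vocab: Set[str] = set()
        for comment in comments:
            tokens = self._tokenize(comment)
            self.vocab.update(tokens)
            # Every token except the last one acts as a bigram context.
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))

    @staticmethod
    def _tokenize(text: str) -> List[str]:
        # Lowercase whitespace tokens wrapped in start and end markers.
        return ["<s>"] + text.lower().split() + ["</s>"]

    def avg_log_prob(self, comment: str) -> float:
        """Average per-bigram log-probability; higher means more 'expected'."""
        tokens = self._tokenize(comment)
        vocab_size = len(self.vocab)
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + vocab_size
            total += math.log(numerator / denominator)
        return total / (len(tokens) - 1)


# Hypothetical training corpus of function comments.
model = BigramModel([
    "return the sum of two numbers",
    "return the product of two numbers",
])
seen_score = model.avg_log_prob("return the sum of two numbers")
unseen_score = model.avg_log_prob("fix stuff later")
```

A score like this measures how typical a comment is relative to the training corpus, not how helpful it is, which is consistent with the mixed results the abstract reports.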

    Exploiting Emotions via Composite Pretrained Embedding and Ensemble Language Model

    Decisions in the modern era are based on more than just the available data; they also incorporate feedback from online sources. The processing of reviews is known as sentiment analysis (SA) or emotion analysis. Understanding the user's perspective and routines is crucial nowadays for multiple reasons, and it is used by both businesses and governments to make strategic decisions. Various architectural and vector embedding strategies have been developed for SA processing. Accurate representation of text is crucial for automatic SA. Because of the large number of languages spoken and written, polysemy and syntactic or semantic issues are common. To get around these problems, we developed effective composite embedding (ECE), a method that combines the advantages of vector embedding techniques that are either context-independent (like GloVe and fastText) or context-aware (like XLNet) to effectively represent the features needed for processing. To improve performance on emotion or sentiment tasks, we propose a stacked ensemble of deep language models. ECE with the ensemble model is evaluated on a balanced dataset to show that it is a reliable embedding technique and a generalised model for SA. To evaluate ECE, cutting-edge ML and deep network language models are deployed and compared. The model is evaluated using benchmark datasets such as MR and Kindle, along with a real-time tweet dataset of user complaints. LIME is used to verify the model's predictions and to provide statistical results for sentences. The model with ECE embedding provides state-of-the-art results on the real-time dataset as well.

    Automatic symptoms identification from a massive volume of unstructured medical consultations using deep neural and BERT models

    Automatic symptom identification plays a crucial role in assisting doctors during the diagnosis process in telemedicine. In general, physicians spend considerable time on clinical documentation and symptom identification, which is unfeasible given their full schedules. With text-based consultation services in telemedicine, identifying symptoms from a user's consultation is a sophisticated and time-consuming process. Moreover, at Altibbi, which is an Arabic telemedicine platform and the context of this work, users consult doctors and describe their conditions in different Arabic dialects, which makes the problem more complex and challenging. Therefore, in this work, an advanced deep learning approach is developed for consultations in multiple dialects. The approach is formulated as a multi-label multi-class classification using features extracted based on AraBERT and fine-tuned on the bidirectional long short-term memory (BiLSTM) network. The fine-tuning of the BiLSTM relies on features engineered from different variants of the bidirectional encoder representations from transformers (BERT). Evaluating the models on precision, recall, and a customized hit rate showed successful identification of symptoms from Arabic texts with promising accuracy. This paves the way toward deploying an automated symptom identification model in production at Altibbi, which can help general practitioners in telemedicine provide more efficient and accurate consultations.
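The evaluation described here, precision and recall over multi-label predictions plus a hit rate, can be sketched as follows. The abstract does not define its "customized hit rate", so the definition below (a consultation counts as a hit when at least one predicted symptom is correct) is an assumption, as are the function name and the sample labels.

```python
from typing import List, Set, Tuple


def multilabel_scores(
    gold: List[Set[str]], predicted: List[Set[str]]
) -> Tuple[float, float, float]:
    """Return micro-averaged precision, recall, and an assumed hit rate."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    true_pos = sum(len(g & p) for g, p in zip(gold, predicted))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    # ASSUMPTION: a "hit" means at least one predicted symptom is correct.
    hits = sum(1 for g, p in zip(gold, predicted) if g & p)
    hit_rate = hits / len(gold) if gold else 0.0
    return precision, recall, hit_rate


# Hypothetical gold and predicted symptom sets for three consultations.
gold = [{"fever", "cough"}, {"headache"}, {"nausea", "fatigue"}]
pred = [{"fever"}, {"cough"}, {"nausea", "fatigue", "fever"}]
precision, recall, hit_rate = multilabel_scores(gold, pred)
```

Micro-averaging pools all labels before dividing, so frequent symptoms weigh more than rare ones; a macro average per symptom would be an equally reasonable choice.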

    SENTIMENT ANALYSIS FOR SEARCH ENGINE

    The chief purpose of this study is to detect and eliminate sentiment bias in a search engine. Sentiment bias means a bias induced in the search results by the sentiment of the user's search query. As people increasingly depend on search engines for information, it is important to understand the quality of the results those search engines produce. This study does not try to build a search engine but leverages existing search engines to provide better results to the user. Only queries that have high sentiment polarity are analyzed. Machine learning models are used to predict the sentiment polarity of the input query, to predict the sentiment polarity of the documents the search engine returns for that query, and to change the sentiment polarity of the input query to its opposite. This project proposes an end-to-end system that eliminates search engine bias by producing results that align with the query sentiment as well as with the opposite sentiment. The system comprises three models: document-level sentiment analysis, aspect-level sentiment analysis, and sentiment style transfer. The document-level sentiment analyzer is an LSTM-based model that uses GloVe word embeddings to analyze the sentiment of the documents produced by the search engine. The aspect-level sentiment analyzer uses a deep memory network with attention and auxiliary memory to analyze the sentiment of each search query. To obtain documents of the opposite polarity, the sentiment of the search query is reversed using the sentiment style transfer model, which uses a bi-directional LSTM. The results are analyzed to determine the sentiment bias of the search engine based on the input query. In our experiments, we observed that positive sentiment queries yielded 67% documents with positive sentiment and negative sentiment queries yielded 70% documents with negative sentiment. The proposed system eliminates this bias by providing users with two sets of results, one with positive sentiment and one with negative sentiment.
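The bias measurement and the two result sets described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the document titles and polarity labels are made up, and in the described system the labels would come from the document-level sentiment model rather than being given.

```python
from typing import Dict, List, Tuple

# (title, polarity) pairs; polarity is "positive" or "negative".
Document = Tuple[str, str]


def polarity_share(documents: List[Document], polarity: str) -> float:
    """Fraction of retrieved documents whose sentiment matches `polarity`."""
    if not documents:
        return 0.0
    matching = sum(1 for _, label in documents if label == polarity)
    return matching / len(documents)


def split_by_polarity(documents: List[Document]) -> Dict[str, List[str]]:
    """Group document titles into one positive and one negative result set."""
    groups: Dict[str, List[str]] = {"positive": [], "negative": []}
    for title, label in documents:
        groups.setdefault(label, []).append(title)
    return groups


# Hypothetical results for a negative-sentiment query and for its
# polarity-reversed counterpart.
results_for_query = [("doc A", "negative"), ("doc B", "negative"), ("doc C", "positive")]
results_for_reversed = [("doc D", "positive"), ("doc E", "positive")]

bias = polarity_share(results_for_query, "negative")
two_sets = split_by_polarity(results_for_query + results_for_reversed)
```

Here `bias` plays the role of the 67% and 70% figures quoted in the abstract, and `two_sets` corresponds to the paired positive and negative result lists shown to the user.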

    Evaluasi Kepuasan Pelanggan Hotel Berdasarkan Analisa Sentiment Pada Review Pelanggan (Evaluation of Hotel Customer Satisfaction Based on Sentiment Analysis of Customer Reviews)

    Customer relationship management (CRM) has a large influence on company performance. Customers can now reach companies easily, for example through online reviews on a website. Online reviews help a company learn which parts of its business customers like and which they do not. To make it easier for companies to gauge customer satisfaction, this research proposes finding the satisfaction sentiment of each review by hotel aspect category and then evaluating satisfaction. The aspects are location, meal, service, comfort, and cleanliness. The research uses review texts in English. Aspect categorization is carried out in several stages. First, Latent Dirichlet Allocation (LDA) is used to find the hidden topics of the reviews. LDA cannot classify a document directly into one of the aspects, so in the second stage a Semantic Similarity method is proposed to assign each hidden topic produced by LDA to one of the 5 hotel aspects. When computing Semantic Similarity, the term list is expanded using Term Frequency-Inverse Cluster Frequency (TF-ICF). Finally, customer sentiment (satisfied or dissatisfied) is classified using Word Embedding to turn each word and document into word vectors, which are then used as input to a Long Short-Term Memory (LSTM) classifier. Once the sentiment for each aspect has been found, the results are evaluated. The performance of each method is evaluated using precision, recall, and F1-measure. The trials show that the best aspect categorization performance, up to 85%, comes from combining LDA for finding hidden topics, TF-ICF 100% for term expansion, and Semantic Similarity for aspect categorization, and that Word Embedding for vector representation combined with LSTM for sentiment classification performs very well, reaching 94%. The researchers therefore combined LDA + TF-ICF 100% + Semantic Similarity to categorize aspects and then used Word Embedding + LSTM to classify the sentiment of each review. In the final evaluation, the comfort aspect had a very high share of reviews with negative sentiment, reaching 11.369%, compared with the other aspects (location: 0.464, meal: 0.696, service: 3.016, and cleanliness: 1.160), so hotel management needs to make improvements and pay more attention to customer comfort in order to reduce the number of negative reviews on that aspect. The results also show that changes in sentiment (positive or negative) are influenced by the aspect of each review.
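The second stage of this pipeline, assigning each hidden topic to one of the five hotel aspects, can be illustrated with a small sketch. Jaccard overlap between term sets is used here as a simple stand-in for the Semantic Similarity measure, which the abstract does not specify. The aspect term lists, the function names, and the sample topic are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical seed term lists for the five aspects named in the abstract.
ASPECT_TERMS: Dict[str, Set[str]] = {
    "location": {"location", "near", "station", "airport", "downtown"},
    "meal": {"breakfast", "food", "dinner", "buffet", "meal"},
    "service": {"staff", "service", "reception", "helpful", "check-in"},
    "comfort": {"bed", "noise", "quiet", "comfortable", "pillow"},
    "cleanliness": {"clean", "dirty", "bathroom", "towels", "smell"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_aspect(topic_terms: Set[str]) -> Optional[str]:
    """Pick the aspect whose term list overlaps most with the topic's terms."""
    best = max(ASPECT_TERMS, key=lambda name: jaccard(topic_terms, ASPECT_TERMS[name]))
    # Return None when the topic shares no term with any aspect list.
    return best if jaccard(topic_terms, ASPECT_TERMS[best]) > 0 else None


# Hypothetical top terms of one hidden topic produced by a topic model.
topic = {"bed", "noise", "room", "pillow"}
aspect = assign_aspect(topic)
```

Expanding each aspect's term list, which the abstract does with TF-ICF, would raise the chance that a topic overlaps with the right aspect; that expansion step is not shown here.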

    Recent Advances in Stock Market Prediction Using Text Mining: A Survey

    Market prediction offers great profit avenues and is a fundamental stimulus for most researchers in this area. To predict the market, most researchers use either technical or fundamental analysis. Technical analysis focuses on analyzing the direction of prices to predict future prices, while fundamental analysis depends on analyzing unstructured textual information like financial news and earnings reports. More and more valuable market information has now become publicly available online, which underlines the significance of text mining strategies for extracting the information needed to analyze market behavior. While many papers have reviewed prediction techniques based on technical analysis methods, papers that concentrate on the use of text mining methods are scarce. In contrast to other current review articles, which concentrate on discussing the many methods used for forecasting the stock market, this study aims to compare many machine learning (ML) and deep learning (DL) methods used for sentiment analysis to find which method could be more effective in prediction and for which types and amounts of data. The study also clarifies recent research findings and their potential future directions by giving a detailed analysis of the textual data processing and the future research opportunities for each reviewed study.

    Document-level sentiment analysis of email data

    Sisi Liu investigated machine learning methods for email document sentiment analysis. She developed a systematic framework that has been qualitatively and quantitatively shown to be effective and efficient in identifying sentiment from massive amounts of email data. Analytical results obtained from the document-level email sentiment analysis framework are beneficial for better decision making in various business settings.