118 research outputs found

    TED-S: Twitter Event Data in Sports and Politics with Aggregated Sentiments

    Even though social media contain rich information on events and public opinions, it is impractical to filter this information manually because of the volume and dynamic nature of the data. Thus, automated extraction mechanisms are invaluable to the community. We need real data with ground truth labels to build and evaluate such systems. Still, to the best of our knowledge, no available social media dataset covers continuous periods with event and sentiment labels together; existing datasets carry either event labels or sentiment labels. Datasets without time gaps are huge because of high data generation and require extensive effort for manual labelling. Previous research has proposed different approaches, ranging from unsupervised to supervised, for labelling such datasets. However, their generic nature mainly fails to capture event-specific sentiment expressions, making them inappropriate for labelling event sentiments. Filling this gap, we propose a novel data annotation approach in this paper involving several neural networks. Our approach outperforms commonly used sentiment annotation models such as VADER and TextBlob. Besides providing a single category per tweet, it also generates probability values for all sentiment categories, supporting aggregated sentiment analyses. Using this approach, we annotate and release a dataset named TED-S, covering two diverse domains, sports and politics. TED-S has complete subsets of Twitter data streams with both sub-event and sentiment labels, providing the ability to support event sentiment-based research.
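    The abstract above notes that per-category probability values support aggregated sentiment analyses. As a minimal sketch of what such aggregation could look like, the snippet below averages per-tweet probabilities inside fixed time windows. The label set, the input layout, the function name `aggregate_sentiment`, and the 60-second window are all assumptions made for illustration; none of them is taken from TED-S or from the paper's method.

```python
from collections import defaultdict

# Assumed label set; the dataset's actual categories may differ.
CATEGORIES = ("negative", "neutral", "positive")


def aggregate_sentiment(tweets, window_seconds=60):
    """Average per-tweet sentiment probabilities within fixed time windows.

    tweets: iterable of (timestamp_seconds, {category: probability}) pairs
            (a hypothetical layout chosen for this sketch).
    Returns {window_index: {category: mean probability}}.
    """
    sums = defaultdict(lambda: dict.fromkeys(CATEGORIES, 0.0))
    counts = defaultdict(int)
    for timestamp, probs in tweets:
        window = int(timestamp // window_seconds)
        for category in CATEGORIES:
            sums[window][category] += probs.get(category, 0.0)
        counts[window] += 1
    return {
        window: {c: sums[window][c] / counts[window] for c in CATEGORIES}
        for window in sorted(sums)
    }
```

    Averaging probabilities rather than counting the single top category per tweet keeps the uncertainty of each annotation in the aggregate, which is one plausible reason to expose the full distribution.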

    Identifying Semantically Duplicate Questions Using Data Science Approach: A Quora Case Study

    Two questions are semantically duplicate if precisely the same answer can satisfy both of them. Identifying semantically identical questions on question-and-answer (Q&A) social media platforms like Quora is exceptionally significant for ensuring the quality and quantity of content presented to users, based on the intent of the question, thus enriching the overall user experience. Detecting duplicate questions is a challenging problem because natural language is very expressive, and a unique intent can be conveyed using different words, phrases, and sentence structures. Machine learning and deep learning methods are known to have achieved superior results over traditional natural language processing techniques in identifying similar texts. In this thesis, taking Quora as our case study, we explored and applied different machine learning and deep learning techniques to the task of identifying duplicate questions in Quora's question pair dataset. By using feature engineering and feature importance techniques, and by experimenting with seven selected machine learning classifiers, we demonstrated that our models outperformed a few of the previous studies on this task. The Xgboost model, when fed with character-level term frequency and inverse term frequency features, achieved superior results to the other machine learning models and also outperformed a few of the deep learning baseline models. We applied deep learning techniques to model four different deep neural networks of multiple layers consisting of Glove embeddings, Long Short Term Memory, Convolution, Max pooling, Dense, Batch Normalization, Activation functions, and model merge.
Our deep learning models achieved better accuracy than the machine learning models. Three out of four proposed architectures outperformed the accuracy reported in previous machine learning and deep learning research, two out of four models outperformed the accuracy of a previous deep learning study on Quora's question pair dataset, and our best model achieved an accuracy of 85.82%, which is close to the state-of-the-art accuracy reported by Quora.
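    The abstract above describes feeding character-level term-frequency features to a classifier. The sketch below is one way such features could be computed for a pair of questions, reading the abstract's wording as character n-gram TF-IDF, which is an assumption. The trigram size, the smoothed IDF formula, and the function names are illustrative choices, and the classifier itself (Xgboost in the thesis) is left out.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Lower-case, collapse whitespace, and split into character n-grams."""
    text = " ".join(text.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Build one sparse TF-IDF vector (dict) per document.

    Uses a smoothed IDF, ln((1 + N) / (1 + df)) + 1, chosen for this sketch
    so that n-grams present in every document still get a non-zero weight.
    """
    counts = [Counter(char_ngrams(doc, n)) for doc in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    total = len(docs)
    idf = {g: math.log((1 + total) / (1 + doc_freq[g])) + 1.0 for g in doc_freq}
    return [{g: tf * idf[g] for g, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0
```

    A similarity score like `cosine(vec_q1, vec_q2)` could serve as one engineered feature among many; character n-grams tend to be robust to small spelling and inflection differences between paraphrased questions.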

    Researchers eye-view of sarcasm detection in social media textual content

    The enormous use of sarcastic text in all forms of communication on social media will have a physiological effect on target users. Each user has a different approach to misusing and recognising sarcasm. Sarcasm detection is difficult even for users, and it depends on many things, such as perspective, context, and special symbols. It is therefore a challenging task for machines to differentiate sarcastic sentences from non-sarcastic ones. At present, there are no exact rules by which a model can accurately detect sarcasm across many text corpora. So, one needs to focus on optimistic and forthcoming approaches in the sarcasm detection domain. This paper discusses various sarcasm detection techniques and concludes with some approaches, related datasets with optimal features, and the challenges researchers face. Comment: 8 pages

    Toponym detection in the bio-medical domain: A hybrid approach with deep learning

    This paper compares how different machine learning classifiers can be used together with simple string matching and named entity recognition to detect locations in texts. We compare five different state-of-the-art machine learning classifiers in order to predict whether a sentence contains a location or not. Following this classification task, we use a string matching algorithm with a gazetteer to identify the exact index of a toponym within the sentence. We evaluate different approaches in terms of machine learning classifiers, text pre-processing and location extraction on the SemEval-2019 Task 12 dataset, compiled for toponym resolution in the bio-medical domain. Finally, we compare the results with our system that was previously submitted to the SemEval-2019 task evaluation. Published version
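    The gazetteer step described above, locating the exact index of a toponym in a sentence, can be sketched with plain string matching. This is an illustrative sketch, not the authors' algorithm: the function name `find_toponyms`, the case-insensitive whole-word matching, and the longest-name-first rule are assumptions, and the example gazetteer entries used with it are made up.

```python
import re


def find_toponyms(sentence, gazetteer):
    """Return (start, end, matched_text) for each gazetteer name in the sentence.

    Offsets are character indices into `sentence`, with `end` exclusive.
    """
    # Try longer names first so "New York City" wins over "New York".
    names = sorted(gazetteer, key=len, reverse=True)
    if not names:
        return []
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(sentence)]
```

    In a hybrid pipeline of the kind the abstract describes, a matcher like this would only run on sentences that the classifier has already flagged as containing a location, which limits false positives from ambiguous names.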

    Productivity Measurement of Call Centre Agents using a Multimodal Classification Approach

    Call centre channels play a cornerstone role in business communications and transactions, especially in challenging business situations. Operational efficiency, service quality, and resource productivity are core aspects of call centres' competitive advantage in fast-moving markets. Performance evaluation in call centres is challenging because of subjective human evaluation, manual sorting of massive numbers of calls, and inconsistent evaluations from different raters. These challenges reduce operational efficiency and lead to frustrated customers. This study aims to automate performance evaluation in call centres using various deep learning approaches. Calls recorded in a call centre are modelled and classified as high- or low-performance, categorised as productive or nonproductive calls. The proposed conceptual model considers a deep learning network approach to model the recorded calls as text and speech. It is based on the following: 1) focus on the technical part of agent performance, 2) objective evaluation of the corpus, 3) extension of features for both text and speech, and 4) combination of the most accurate text and speech models using a multimodal structure. Accordingly, the diarisation algorithm separates the part of the call where the agent is talking from the part where the customer is talking. Manual annotation is also necessary to divide the modelling corpus into productive and nonproductive calls (supervised training). Krippendorff's alpha was applied to limit subjectivity in the manual annotation. An Arabic speech recognition system is then developed to transcribe the speech into text. The text features are the words embedded using the embedding layer. The speech features are explored in several experiments that use the Mel Frequency Cepstral Coefficient (MFCC) augmented with Low-Level Descriptors (LLD) to improve classification accuracy. The data modelling architectures for speech and text are based on CNNs, BiLSTMs, and the attention layer.
The multimodal approach builds on the generated models to improve performance accuracy by concatenating the text and speech models using the joint representation methodology. The main contributions of this thesis are:
    • Developing an Arabic speech recognition method for automatic transcription of speech into text.
    • Designing several DNN architectures to improve performance evaluation using speech features based on MFCC and LLD.
    • Developing a Max Weight Similarity (MWS) function to outperform the SoftMax function used in the attention layer.
    • Proposing a multimodal approach for combining the text and speech models for best performance evaluation.
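    The abstract above applies Krippendorff's alpha to the manual annotation of calls. As a rough illustration of that agreement measure, the sketch below computes alpha for nominal labels from a coincidence matrix, using alpha = 1 - (n - 1) * (observed disagreement) / (expected disagreement). The input layout and function name are assumptions, and the thesis may have used a different variant or tool.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: list of lists; each inner list holds the labels that the raters
           gave to one item (missing ratings are simply left out).
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item rated once carries no agreement information
        # Every ordered pair of ratings within an item counts 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        # Alpha is undefined when only one category occurs;
        # returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return 1.0 - (n - 1) * observed / expected
```

    With labels such as "productive" and "nonproductive", values near 1 indicate that raters agree well beyond chance, while values near 0 indicate agreement no better than chance.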

    A Multimodal Approach to Sarcasm Detection on Social Media

    In recent times, a major share of human communication takes place online, mainly because of the ease of communication on social networking sites (SNSs). Due to the variety and large number of users, SNSs have drawn the attention of the computer science (CS) community, particularly the affective computing (also known as emotional AI), information retrieval, natural language processing, and data mining groups. Researchers are trying to make computers understand the nuances of human communication, including sentiment and sarcasm. Emotion or sentiment detection requires more insight into the communication than factual information retrieval does. Sarcasm detection is more difficult still than categorizing sentiment, because in sarcasm the meaning intended by the user is the opposite of the literal meaning. Because of its complex nature, it is often difficult even for humans to detect sarcasm without proper context. However, people on social media succeed in detecting sarcasm despite interacting with strangers across the world. That motivates us to investigate the human process of detecting sarcasm on social media, where abundant context information is often unavailable and the users communicating with each other are rarely well-acquainted. We have conducted a qualitative study to examine the patterns of users conveying sarcasm on social media. Whereas most sarcasm detection systems work on a word-by-word basis, we focused on the holistic sentiment conveyed by the post. We argue that relying on word-level information limits a system's performance to the domain of the dataset used to train it, and that such a system might not perform well for non-English languages. As an endeavor to make our system less dependent on text data, we proposed a multimodal approach for sarcasm detection. We showed the applicability of images and reaction emoticons as other sources of hints about the sentiment of the post.
Our research showed superior results from a multimodal approach when compared to a unimodal approach. Multimodal sarcasm detection systems such as the one presented in this research, with the inclusion of more modes or sources of data, might lead to a better sarcasm detection model.

    An Integrative Behavioral Model of Information Security Policy Compliance

    The authors identified the behavioral factors that influence organization members' compliance with the information security policy on the basis of neutralization theory, the theory of planned behavior, and protection motivation theory. Following the theory of planned behavior, members' attitudes towards compliance, as well as normative belief and self-efficacy, were believed to determine the intention to comply with the information security policy. Neutralization theory, a prominent theory in criminology, could be expected to provide the explanation for information system security policy violations. Based on protection motivation theory, it was inferred that the expected efficacy could have an impact on intentions of compliance. From this reasoning, the integrative behavioral model and eight hypotheses were derived. Data were collected by conducting a survey; 194 out of 207 questionnaires were usable. The causal model was tested with PLS. The reliability, validity, and model fit were found to be statistically significant. The results of the hypothesis tests showed that seven of the eight hypotheses were accepted. The theoretical implications of this study are as follows: (1) the study is expected to serve as a baseline for future research on organization members' compliance with the information security policy, (2) the study attempted an interdisciplinary approach by combining psychology and information system security research, and (3) the study suggested concrete operational definitions of influencing factors for information security policy compliance through a comprehensive theoretical review. The study also has some practical implications. First, it can provide a guideline to support the successful execution of the strategic establishment for the implementation of information system security policies in organizations.
Second, it shows the need to emphasize education and training programs that suppress members' neutralization intention to violate the information security policy.

    KEER2022

    Pre-title: KEER2022. Diversities. Resource description: 25 July 202