22 research outputs found

    An Evaluation of Text Representation Techniques for Fake News Detection Using: TF-IDF, Word Embeddings, Sentence Embeddings with Linear Support Vector Machine.

    In a world where anybody can share views and opinions and present them as facts about current events, fake news poses a serious threat, especially to the reputation of prominent people and organizations. In politics, opposition parties could exploit it to gain popularity in elections. In medicine, a scandalous false message about a medicine's side effects, a hospital treatment gone wrong, or a practicing doctor could harm everyone named in it. In business, a single false story that becomes a trending topic could disrupt a company's future earnings. Detecting such false news is very important today, when almost everyone has access to a mobile phone and can cause considerable disruption by creating one false statement and making it go viral. The generation of fake news articles drew more attention during the 2016 US Presidential Election, leading a large number of scientists and researchers to explore this NLP problem with deep interest and a sense of urgency. This research intends to develop a fake news classifier using a Linear Support Vector Machine built on a traditional text feature representation, Term Frequency Inverse Document Frequency (Ahmed, Traore & Saad, 2017), and to compare it against classifiers built on more recent text feature representations: word embeddings using ‘word2vec’ and sentence embeddings using ‘Universal Sentence Encoder’.

    Learning discrete word embeddings to achieve better interpretability and processing efficiency

    The ubiquitous use of word embeddings in Natural Language Processing is proof of their usefulness and adaptivity to a multitude of tasks. However, their continuous nature is prohibitive in terms of computation, storage and interpretation. In this work, we propose a method of learning discrete word embeddings directly. The model is an adaptation of a novel database searching method using state-of-the-art natural language processing techniques like Transformers and LSTMs. On top of obtaining embeddings requiring a fraction of the resources to store and process, our experiments strongly suggest that our representations learn basic units of meaning in latent space akin to lexical morphemes. We call these units sememes, i.e., semantic morphemes. We demonstrate that our model has great generalization potential and outputs representations showing strong semantic and conceptual relations between related words.

    How Do Multilingual Encoders Learn Cross-lingual Representation?

    NLP systems typically require support for more than one language. As different languages have different amounts of supervision, cross-lingual transfer benefits languages with little to no training data by transferring from other languages. From an engineering perspective, multilingual NLP benefits development and maintenance by serving multiple languages with a single system. Both cross-lingual transfer and multilingual NLP rely on cross-lingual representations serving as the foundation. As BERT revolutionized representation learning and NLP, it also revolutionized cross-lingual representations and cross-lingual transfer. Multilingual BERT was released as a replacement for single-language BERT, trained with Wikipedia data in 104 languages. Surprisingly, without any explicit cross-lingual signal, multilingual BERT learns cross-lingual representations in addition to representations for individual languages. This thesis first shows such surprising cross-lingual effectiveness compared against prior art on various tasks. Naturally, it raises a set of questions, most notably how do these multilingual encoders learn cross-lingual representations. In exploring these questions, this thesis will analyze the behavior of multilingual models in a variety of settings on high and low resource languages. We also look at how to inject different cross-lingual signals into multilingual encoders, and the optimization behavior of cross-lingual transfer with these models. Together, they provide a better understanding of multilingual encoders on cross-lingual transfer. Our findings will lead us to suggested improvements to multilingual encoders and cross-lingual transfer

    Developing a Framework to Identify Professional Skills Required for Banking Sector Employee in UK using Natural Language Processing (NLP) Techniques

    The banking sector is changing dramatically, and new studies reveal that many financial institutions are struggling to keep up with technological advancements and face an acute shortage of skilled workers. The banking industry is becoming a dynamic field where success requires a wide range of talents. For the industry to properly analyse, match, and develop personnel, a strong skill identification process is needed. The objective of this research is to establish a framework for determining the competencies needed by banking industry experts through data extraction from job postings on UK websites. Data is extracted from job vacancy websites using web-based annotation tools and Natural Language Processing (NLP) techniques. The study starts with a thorough examination of the literature to investigate the theoretical underpinnings of NLP techniques, their applications in talent management and human resources within the banking industry, and their potential for skill identification. Next, textual data from job ads is processed using NLP techniques to extract and categorize the skills specific to these categories. Advanced algorithms and approaches are used in the NLP-based development process to automatically extract skills from unstructured textual material, with the aim that the skills gathered are accurate and relevant to the needs of the banking industry. To make sure the NLP-driven skill identification is accurate and up to date, the extracted skills are verified through expert feedback. In the final phase, machine learning models are employed to predict the skills required for banking sector employees. The study examines various machine learning techniques, which are implemented within the framework. After preprocessing and training on skills extracted from job advertisements, these models are evaluated to assess their effectiveness in skill prediction. The results offer a detailed analysis of each model's performance, assessed with metrics such as recall, precision, and F1-score. This examination underscores the potential of machine learning in skill identification and highlights its relevance in the banking sector. Key Words: Machine Learning, Banking Sector, Employability, Data Mining, NLP, Semantic analysis, Skill assessment, Skill Recognition, Talent management

    The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE)

    Proceedings of the 19th Sound and Music Computing Conference

    Proceedings of the 19th Sound and Music Computing Conference - June 5-12, 2022 - Saint-Étienne (France). https://smc22.grame.f