Linguistic Features and Bi-LSTM for Identification of Fake News
With the spread of Internet technologies, the use of social media has increased exponentially. Although social media has many benefits, it has become a primary source of disinformation, or fake news, whose spread is creating many societal and economic problems. It has therefore become critical to develop an effective method for detecting fake news so that it can be stopped, removed, or flagged before it spreads. To address the challenge of accurately detecting fake news, this paper proposes a solution called Statistical Word Embedding over Linguistic Features via Deep Learning (SWELDL Fake), which utilizes deep learning techniques to improve accuracy. The proposed model applies a statistical method, principal component analysis (PCA), to textual representations of news to identify significant features that help distinguish fake news. In addition, word embedding is employed to capture linguistic features, and Bidirectional Long Short-Term Memory (Bi-LSTM) is used to classify news as true or fake. We validated the proposed model on a benchmark dataset, also called SWELDL Fake, containing about 72,000 news articles collected from different benchmark datasets. Our model achieved a classification accuracy of 98.52% on fake news detection, surpassing the performance of state-of-the-art deep learning and machine learning models.
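As a rough illustration of the classification stage, the sketch below builds a word-embedding + Bi-LSTM binary classifier in Keras. All sizes (vocabulary cap, sequence length, layer widths) are illustrative assumptions, not the paper's reported configuration, and the PCA step on statistical features described in the abstract is omitted here.

```python
import numpy as np
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000  # assumed vocabulary cap, not from the paper
MAX_LEN = 300       # assumed tokens per article
EMBED_DIM = 100     # assumed embedding dimension

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),  # word-embedding layer
    layers.Bidirectional(layers.LSTM(64)),    # Bi-LSTM encoder
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),    # fake (1) vs. true (0)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy batch to confirm shapes; real input would be padded token-ID sequences.
x = np.random.randint(0, VOCAB_SIZE, size=(8, MAX_LEN))
print(model.predict(x).shape)  # (8, 1)
```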
A STUDY OF VARIOUS DATA SIZES USING MACHINE LEARNING
Social media is a great domain for news consumption; however, it is a double-edged sword. While it is user-friendly and low-cost, social media also allows fake news to spread rapidly, which is detrimental to society, businesses, and consumers. Fake news detection is therefore an emerging field, but challenges such as the lack of large datasets have kept researchers from developing a universal machine learning model that is fast, efficient, and reliable enough to stop the proliferation. The goal of this culminating experience project is to explore how varying dataset sizes affect the accuracy of a machine learning model. The research questions are: Q1) How do large volumes of fake news data affect the accuracy of a machine learning model? Q2) As one increases the volume of data fed into a machine learning model from small to large datasets, what will the cutoff accuracy percentage be? Datasets of various sizes collected from Kaggle were fed into a Naïve Bayes model to answer the two questions. All three datasets were then combined to see whether the model's accuracy improves as more data is fed into it. The findings for each question are: 1) Larger datasets do increase accuracy because there is more data to train and test on. 2) The cutoff accuracy depends on the number of unique values within the dataset. Since this number is not fixed, large datasets can be expected to reach a cutoff accuracy above 90%, provided the data is cleaned and pre-processed; small-to-medium datasets, by comparison, achieve accuracy scores of around 70%-90%. An accuracy score below 70% means that the model is highly unreliable and the dataset is too small. For instance, Dataset 1 achieved an accuracy score of 66%, Dataset 2 83%, and Dataset 3 92%. To effectively study how to build an optimized model, one must therefore use a large dataset. Future directions arising from this study include building a new, improved fact-checking website that quickly and accurately processes large databases, and investigating how well Naïve Bayes works with other modalities, such as images and videos.
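A minimal sketch of the experimental setup, assuming scikit-learn and pandas: train Multinomial Naïve Bayes on labeled text datasets of increasing size and compare test accuracy. The file names and the "text"/"label" column names are hypothetical placeholders, not the project's actual Kaggle files.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical CSVs of increasing size, each with "text" and "label" columns.
for path in ["dataset1.csv", "dataset2.csv", "dataset3.csv"]:
    df = pd.read_csv(path)
    X_train, X_test, y_train, y_test = train_test_split(
        df["text"], df["label"], test_size=0.2, random_state=42)

    vec = TfidfVectorizer(stop_words="english")
    clf = MultinomialNB()
    clf.fit(vec.fit_transform(X_train), y_train)

    acc = accuracy_score(y_test, clf.predict(vec.transform(X_test)))
    print(f"{path}: {len(df)} rows, accuracy = {acc:.2%}")
```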
Geographic information extraction from texts
A large volume of unstructured text containing valuable geographic information is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although considerable progress has been made in geographic information extraction from texts, unsolved challenges and issues remain, ranging from methods, systems, and data to applications and privacy. This workshop will therefore provide a timely opportunity to discuss recent advances, new ideas, and concepts, and to identify research gaps in geographic information extraction.
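For a concrete flavor of the task, here is a minimal toponym-recognition sketch, the usual first step of a geographic information extraction pipeline, assuming spaCy and its small English model are installed; it is not tied to any particular system from the workshop.

```python
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

text = "Flooding along the Rhine forced evacuations near Cologne, Germany."
doc = nlp(text)

# GPE (countries, cities) and LOC (rivers, mountains) labels cover most
# explicit place references; resolving them to coordinates (toponym
# disambiguation) would additionally need a gazetteer such as GeoNames.
places = [(ent.text, ent.label_) for ent in doc.ents
          if ent.label_ in ("GPE", "LOC")]
print(places)
```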