6 research outputs found

    Authorship Authentication for Twitter Messages Using Support Vector Machine

    With the rapid growth of internet usage, authorship authentication of online messages has become a challenging research topic in recent decades. In this paper, we used a team of support vector machines (SVMs) to authenticate the messages of 5 Twitter authors. The SVM is one of the most commonly used and strongest classification algorithms for authorship attribution problems. An SVM maps linearly non-separable input data to a higher-dimensional space via radial basis functions, where the data can be separated by a hyperplane. First, using the training data, 10 hyperplanes are built, one separating the training data of each pair of the five authors. Then the expertise of these SVMs is combined to classify the testing data into five classes. 20 tweets with 16 features from each author were used for evaluation. Despite the random choice of features, an accuracy of around 75% is achieved for one of the authors.
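
The pairwise scheme in the abstract above (10 binary classifiers for 5 authors, combined into one five-class decision) can be sketched as one-vs-one voting. This is a minimal illustration, not the paper's implementation: the `nearest_centroid` function is a toy stand-in for a trained RBF-kernel SVM, and the author labels, centroids, and input value are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_models(authors):
    """Return every unordered author pair; one binary classifier per pair."""
    return list(combinations(authors, 2))


def one_vs_one_predict(x, pairs, decide):
    """Classify x by majority vote over all pairwise decisions.

    decide(a, b, x) is a hypothetical binary classifier returning a or b.
    """
    votes = Counter(decide(a, b, x) for a, b in pairs)
    # most_common(1) returns the label with the most votes.
    return votes.most_common(1)[0][0]


# Toy stand-in for a trained SVM (assumption): each author has a
# one-dimensional centroid, and the nearer centroid wins the duel.
centroids = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0, "E": 4.0}


def nearest_centroid(a, b, x):
    return a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b


pairs = pairwise_models(sorted(centroids))
print(len(pairs))  # 10 pairs for 5 authors
print(one_vs_one_predict(2.1, pairs, nearest_centroid))  # C
```

With five authors there are 5 × 4 / 2 = 10 pairs, which matches the 10 hyperplanes the abstract mentions.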

    Analysis of Students Emotion for Twitter Data using Naïve Bayes and Non Linear Support Vector Machine Approachs

    Students' informal discussions on social media (e.g., Twitter, Facebook) shed light on their educational experiences: their opinions, feelings, and concerns about the learning process. Data from such environments can provide valuable knowledge about students' learning. Examining such data, however, can be challenging. The complexity of students' experiences reflected in social media content requires human interpretation, yet the growing scale of the data demands automatic analysis techniques. This work focuses on engineering students' Twitter posts in order to understand the issues and problems in their educational experiences. Samples of tweets related to engineering students' college life are analyzed. Problems that engineering students encounter, such as heavy study load, lack of social engagement, and sleep deprivation, are used as labels. To classify tweets reflecting students' problems, multi-label classification algorithms are implemented. Non-linear Support Vector Machine, Naïve Bayes, and linear Support Vector Machine methods are implemented as multi-label classifiers and compared in terms of accuracy. The non-linear SVM showed higher accuracy than the Naïve Bayes and linear Support Vector Machine classifiers. The algorithms are used to train a detector of student problems from tweets. DOI: 10.17762/ijritcc2321-8169.150515

    Authorship Authentication of Short Messages from Social Networks Machines

    The dataset consists of 17,000 tweets collected from Twitter, 500 tweets for each of 34 authors who meet certain criteria. The raw data is collected using the software NVivo. The collected raw data is preprocessed to extract the frequencies of 200 features. In the data analysis, 128 of the features are eliminated because they are rare in tweets. To present the results progressively, five, fifteen, twenty, twenty-five, thirty, and thirty-four of these authors are selected in turn. Since recurrent artificial neural networks are more stable, and ANNs in general are more successful at distinguishing two classes, N×N neural networks are trained for pairwise classification of N authors. These experts are then organized into N competing teams (CANNT) to aggregate the decisions of the N×N experts. This procedure is repeated seven times, and committees of seven members vote on the final decision. Voting for the most common decision boosts accuracy by around ten percent. The number of authors is found to have little effect on the accuracy of authentication, and around 80% accuracy is achieved for any number of authors.
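
The abstract above drops 128 of 200 features because they are rare in tweets, leaving 72. The criterion for "rare" is not stated, so the sketch below assumes one: a feature is kept only if it is nonzero in at least a given fraction of tweets. The function name, the `min_fraction` threshold, and the toy count matrix are all hypothetical.

```python
def drop_rare_features(rows, min_fraction):
    """Keep feature columns that are nonzero in at least min_fraction of rows.

    rows is a list of per-tweet feature-count lists of equal length.
    Returns the kept column indices and the filtered rows.
    """
    n_rows = len(rows)
    n_features = len(rows[0])
    keep = [
        j
        for j in range(n_features)
        if sum(1 for row in rows if row[j] > 0) / n_rows >= min_fraction
    ]
    filtered = [[row[j] for j in keep] for row in rows]
    return keep, filtered


# Toy data (assumption): 4 tweets, 4 features.
rows = [
    [3, 0, 1, 0],
    [2, 0, 0, 0],
    [5, 1, 2, 0],
    [1, 0, 4, 0],
]
keep, filtered = drop_rare_features(rows, min_fraction=0.5)
print(keep)         # [0, 2]
print(filtered[0])  # [3, 1]
```

Features 1 and 3 appear in fewer than half of the toy tweets, so only columns 0 and 2 survive.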

    Authorship Authentication Using Short Messages from Social Networking Sites

    No full text
    This paper presents and discusses several experiments in authorship authentication of short social network postings, an average of 20.6 words, from Facebook. The goal of this research is to determine the degree to which such postings can be authenticated as coming from the purported user and not from an intruder. Various sets of stylometry and ad hoc social networking features were developed to categorize short messages from thirty Facebook authors as authentic or non-authentic using Support Vector Machines. The challenges of applying traditional stylometry on short messages were discussed. The test results showed the impact of sample size, features, and user writing style on the effectiveness of authorship authentication, indicating varying degrees of success compared to previous studies in authorship authentication.

    AUTHOR VERIFICATION OF ELECTRONIC MESSAGING SYSTEMS

    Messaging systems have become a hugely popular new paradigm for sending and delivering text messages; however, online messaging platforms have also become an ideal place for criminals due to their anonymity, ease of use and low cost. Therefore, the ability to verify the identity of individuals involved in criminal activity is becoming increasingly important. The majority of research in this area has focused on traditional authorship problems that deal with single-domain datasets and large bodies of text. Few research studies have sought to explore multi-platform author verification as a possible solution to problems around forensics and security. Therefore, this research has investigated the ability to identify individuals on messaging systems, and has applied this to the modern messaging platforms of Email, Twitter, Facebook and Text messages, using different single-domain datasets for population-based and user-based verification approaches. Through a novel technique of cross-domain research using real scenarios, the domain incompatibilities of profiles from different distributions have been assessed, based on real-life corpora using data from 50 authors who use each of the aforementioned domains. The results show that the use of linguistics is likely to be similar between platforms, on average, for a population-based approach. The best corpus experimental result achieved a low EER of 7.97% for Text messages, showing the usefulness of single-domain platforms where the use of linguistics is likely to be similar, such as Text messages and Emails. For the user-based approach, there is very little evidence of a strong correlation of stylometry between platforms. It has been shown that linguistic features on some individual platforms have features in common with other platforms, and lexical features play a crucial role in the similarities between users' modern platforms.
Therefore, this research shows that the ability to identify individuals on messaging platforms may provide a viable solution to problems around forensics and security, and help against a range of criminal activities, such as sending spam texts, grooming children, and encouraging violence and terrorism.
    Royal Embassy of Saudi Arabia, Londo
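
The abstract above reports its best result as an equal error rate (EER), the operating point at which the false acceptance rate and the false rejection rate are equal. As a rough sketch of how such a figure can be estimated from verification scores, the function below sweeps every observed score as a threshold and returns the error rate where the two rates are closest. The scores are made-up toy values, and the convention that a higher score means "more likely the claimed author" is an assumption, not taken from the thesis.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping thresholds over all observed scores.

    genuine:  similarity scores for samples from the claimed author.
    impostor: similarity scores for samples from other authors.
    """
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        # False rejection: a genuine sample scores below the threshold.
        frr = sum(1 for s in genuine if s < t) / len(genuine)
        # False acceptance: an impostor sample scores at or above it.
        far = sum(1 for s in impostor if s >= t) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Toy scores (assumption), not data from the thesis.
genuine = [0.9, 0.8, 0.7, 0.6, 0.3]
impostor = [0.1, 0.2, 0.4, 0.5, 0.65]
eer = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%}")
```

On real data the two rates rarely cross exactly at an observed score, which is why the sketch averages them at the closest point.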