1,850 research outputs found

    Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter

    Individual happiness is a fundamental societal metric. Normally measured through self-report, happiness has often been indirectly characterized and overshadowed by more readily quantifiable economic indicators such as gross domestic product. Here, we examine expressions made on the online, global microblog and social networking service Twitter, uncovering and explaining temporal variations in happiness and information levels over timescales ranging from hours to years. Our data set comprises over 46 billion words contained in nearly 4.6 billion expressions posted over a 33-month span by over 63 million unique users. In measuring happiness, we use a real-time, remote-sensing, non-invasive, text-based approach, a kind of hedonometer. In building our metric, made available with this paper, we conducted a survey to obtain happiness evaluations of over 10,000 individual words, representing a tenfold size improvement over similar existing word sets. Rather than being ad hoc, our word list is chosen solely by frequency of usage, and we show how a highly robust metric can be constructed and defended. Comment: 27 pages, 17 figures, 3 tables. Supplementary Information: 1 table, 52 figures.
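
    As a rough illustration of the hedonometric idea, the sketch below scores a text as the frequency-weighted average of per-word happiness ratings; the tiny lexicon and its 1-9 scores are placeholders invented here, not the 10,000-word survey data released with the paper.

        # Minimal hedonometer sketch (illustrative scores, not actual survey values).
        # Text happiness = frequency-weighted average of per-word happiness ratings.
        from collections import Counter

        # Hypothetical excerpt of a word-happiness lexicon on a 1-9 scale.
        happiness_scores = {
            "laughter": 8.5, "happy": 8.3, "love": 8.4,
            "rain": 5.0, "traffic": 3.6, "sad": 2.4,
        }

        def hedonometer(text, scores=happiness_scores):
            """Return the average happiness of `text`, weighted by word frequency."""
            counts = Counter(word for word in text.lower().split() if word in scores)
            total = sum(counts.values())
            if total == 0:
                return None  # no scored words found
            return sum(scores[w] * c for w, c in counts.items()) / total

        print(hedonometer("love love laughter and a little rain"))  # ~7.6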

    Malicious-URL Detection using Logistic Regression Technique

    Over the last few years, the Web has seen massive growth in the number and kinds of web services. Web facilities such as online banking, gaming, and social networking have evolved rapidly, as has the trust people place in them to perform daily tasks. As a result, a large amount of information is uploaded to the Web every day. As these web services create new opportunities for people to interact, they also create new opportunities for criminals. URLs are launch pads for web attacks: a malicious user can steal a legitimate person's identity simply by sending a malicious URL. Malicious URLs are a keystone of illegitimate Internet activity, and the dangers of these sites have created a mandate for defences that protect end users from visiting them. The proposed approach classifies URLs automatically using logistic regression, a machine-learning algorithm for binary classification. The classifier achieves 97% accuracy when trained on phishing URLs.
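
    A minimal sketch of this kind of setup, assuming character n-gram features over raw URL strings and scikit-learn's logistic regression; the toy URLs and labels are invented for illustration, and the paper's exact feature set is not reproduced here.

        # Sketch of binary URL classification with logistic regression (assumed
        # character n-gram features; the paper's feature engineering may differ).
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        urls = [
            "http://paypal-secure-login.example-verify.ru/account",  # malicious (1)
            "https://www.wikipedia.org/wiki/Phishing",               # benign    (0)
            "http://free-gift-cards.win/claim?id=123",               # malicious (1)
            "https://github.com/scikit-learn/scikit-learn",          # benign    (0)
        ]
        labels = [1, 0, 1, 0]

        # Character n-grams capture suspicious tokens and obfuscation patterns in URLs.
        model = make_pipeline(
            TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
            LogisticRegression(max_iter=1000),
        )
        model.fit(urls, labels)
        print(model.predict(["http://secure-verify-account.example.ru/login"]))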

    Implementation of a Noise Filter for Grouping in Bibliographic Databases using Latent Semantic Indexing

    Clustering algorithms can assist in scientific research by presenting themes related to topics of interest, from which information can be extracted more easily. However, it is common for many of these clusters to contain documents with no relevance to the topic of interest, thereby reducing the quality of the information. We can address this reduced cluster quality in a bibliographic database by dealing with noise in the semantic space that represents the relations between the grouped documents. In this work, we support the hypothesis that Latent Semantic Indexing (LSI) is an efficient instrument for reducing noise and promoting better group quality. Using a database of 90 scientific publications from different areas, we pre-processed the documents with LSI and grouped them using six clustering algorithms. The results improved significantly compared with our initial results without LSI-based pre-processing. Among the individual algorithms, CMeans achieved the highest average gain, approximately 25%, followed by K-Means and SKmeans with 17% each, PAM with 16.5%, and EM with 15%. We conclude that Latent Semantic Indexing is a helpful tool for noise reduction and recommend its use to significantly improve the cluster quality of bibliographic databases.
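
    The sketch below shows one common way to realise LSI-based pre-processing before clustering, assuming a TF-IDF term-document matrix reduced by truncated SVD and fed to K-Means; the documents, rank, and cluster count are illustrative only, and the six algorithms compared in the study are not reproduced.

        # Sketch of LSI pre-processing followed by clustering (illustrative parameters).
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD
        from sklearn.preprocessing import Normalizer
        from sklearn.pipeline import make_pipeline
        from sklearn.cluster import KMeans

        documents = [
            "latent semantic indexing reduces noise in document clustering",
            "k-means groups bibliographic records by topic",
            "gene expression profiles in cancer tissue samples",
            "tumor classification from gene expression data",
        ]

        # TF-IDF followed by truncated SVD is the standard LSI construction:
        # documents are projected onto a low-rank semantic space before clustering.
        lsi = make_pipeline(
            TfidfVectorizer(stop_words="english"),
            TruncatedSVD(n_components=2),   # rank chosen for illustration only
            Normalizer(copy=False),
        )
        X = lsi.fit_transform(documents)
        cluster_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
        print(cluster_labels)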

    Social Media Mining for Toxicovigilance: Automatic Monitoring of Prescription Medication Abuse from Twitter

    Introduction Prescription medication overdose is the fastest growing drug-related problem in the USA. The growing nature of this problem necessitates the implementation of improved monitoring strategies for investigating the prevalence and patterns of abuse of specific medications. Objectives Our primary aims were to assess the possibility of utilizing social media as a resource for automatic monitoring of prescription medication abuse and to devise an automatic classification technique that can identify potentially abuse-indicating user posts. Methods We collected Twitter user posts (tweets) associated with three commonly abused medications (Adderall®, oxycodone, and quetiapine). We manually annotated 6400 tweets mentioning these three medications and a control medication (metformin) that is not a subject of abuse due to its mechanism of action. We performed quantitative and qualitative analyses of the annotated data to determine whether posts on Twitter contain signals of prescription medication abuse. Finally, we designed an automatic supervised classification technique to distinguish posts containing signals of medication abuse from those that do not, and assessed the utility of Twitter in investigating patterns of abuse over time. Results Our analyses show that clear signals of medication abuse can be drawn from Twitter posts, and that the percentage of tweets containing abuse signals is significantly higher for the three case medications (Adderall®: 23 %, quetiapine: 5.0 %, oxycodone: 12 %) than for the control medication (metformin: 0.3 %). Our automatic classification approach achieves 82 % accuracy overall (medication abuse class recall: 0.51, precision: 0.41, F measure: 0.46). To illustrate the utility of automatic classification, we show how the classification data can be used to analyze abuse patterns over time. Conclusion Our study indicates that social media can be a crucial resource for obtaining abuse-related information about medications, and that automatic approaches involving supervised classification and natural language processing hold promise for essential future monitoring and intervention tasks.
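
    A minimal sketch of the supervised-classification step, assuming a simple bag-of-words model over invented toy posts; the study's annotated 6400-tweet corpus, feature engineering, and classifier choice are not reproduced here.

        # Sketch of a supervised classifier separating abuse-indicating posts from
        # others (toy examples and a linear model chosen for illustration only).
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.svm import LinearSVC
        from sklearn.pipeline import make_pipeline

        tweets = [
            "took two adderall to stay up all night, feeling wired",  # abuse signal (1)
            "picked up my adderall prescription at the pharmacy",     # no signal    (0)
            "crushed an oxy just to get through the weekend",         # abuse signal (1)
            "doctor adjusted my metformin dose at today's checkup",   # no signal    (0)
        ]
        labels = [1, 0, 1, 0]

        # Unigram and bigram counts feed a linear classifier trained on the labels.
        classifier = make_pipeline(
            CountVectorizer(ngram_range=(1, 2)),
            LinearSVC(),
        )
        classifier.fit(tweets, labels)
        print(classifier.predict(["popped a couple adderall before the exam"]))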