374 research outputs found

    Semantics-driven event clustering in Twitter feeds

    Detecting events using social media such as Twitter has many useful applications in real-life situations. Many algorithms have been developed for this task, each drawing on different information sources: textual, temporal, geographic or community features. Semantic information is often added at the end of event detection to classify events into semantic topics, but it can also be used to drive the event detection itself, which is less well covered by academic research. We therefore supplemented an existing baseline event clustering algorithm with semantic information about the tweets in order to improve its performance. This paper lays out the details of the semantics-driven event clustering algorithms developed, discusses a novel method to aid in the creation of a ground truth for event detection purposes, and analyses how well the algorithms improve over the baseline. We find that assigning semantic information to every individual tweet results in worse F1 performance than the baseline. If, however, semantics are assigned at the coarser hashtag level, the improvement over the baseline is substantial and significant in both precision and recall.
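    The two granularities compared above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the record layout, the majority-vote rule for hashtag-level topics and the hour-bucket clustering are all assumptions.

```python
from collections import Counter, defaultdict

def hashtag_topics(tweets):
    """Hashtag-level semantics: give each hashtag the majority topic
    of the tweets that mention it (coarser than per-tweet labels)."""
    votes = defaultdict(Counter)
    for t in tweets:
        for tag in t["hashtags"]:
            votes[tag][t["topic"]] += 1
    return {tag: c.most_common(1)[0][0] for tag, c in votes.items()}

def cluster_by_hashtag_topic(tweets, tag_topic):
    """Group tweets into candidate event clusters keyed by the
    hashtag-level topic and an hourly time bucket (assumed scheme)."""
    clusters = defaultdict(list)
    for t in tweets:
        if t["hashtags"]:
            key = (tag_topic[t["hashtags"][0]], t["ts"] // 3600)
            clusters[key].append(t["id"])
    return dict(clusters)

tweets = [
    {"id": 1, "hashtags": ["#merce"], "topic": "festival", "ts": 100},
    {"id": 2, "hashtags": ["#merce"], "topic": "festival", "ts": 200},
    {"id": 3, "hashtags": ["#merce"], "topic": "sports",   "ts": 250},
    {"id": 4, "hashtags": ["#cup"],   "topic": "sports",   "ts": 300},
]
topics = hashtag_topics(tweets)
groups = cluster_by_hashtag_topic(tweets, topics)
```

    Note that tweet 3 lands in the "festival" cluster: the hashtag-level majority vote overrides its noisy per-tweet label, which is exactly the smoothing effect a coarser assignment provides.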

    Personal life event detection from social media

    Creating video clips out of personal content from social media is on the rise: MuseumOfMe, Facebook Lookback, and Google Awesome are some popular examples. One core challenge in creating such life summaries is identifying personal events and their time frames. Such videos benefit greatly from automatically distinguishing social media content about someone's own wedding that week from content about an old wedding, or about a friend's. In this paper, we describe our approach for identifying a number of common personal life events from social media content (using Twitter as our test platform), using multiple feature-based classifiers. Results show that combining linguistic and social-interaction features increases overall classification accuracy for most of the events, while some events remain harder than others (e.g. "new born", with a mean precision of 0.6 across all three models).
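    A minimal sketch of a classifier combining linguistic and social-interaction features, in the spirit of the approach above. Every feature, weight and threshold here is a hypothetical stand-in for illustration, not a value from the paper.

```python
def extract_features(tweet):
    """Toy feature vector: two linguistic cues plus two
    social-interaction cues (all cue choices are hypothetical)."""
    text = tweet["text"].lower()
    return [
        1.0 if "my wedding" in text or "we got married" in text else 0.0,       # ownership cue
        1.0 if any(w in text for w in ("today", "yesterday", "just")) else 0.0,  # recency cue
        min(tweet["replies"] / 10.0, 1.0),              # social volume
        1.0 if tweet["congrats_replies"] > 0 else 0.0,  # congratulatory reaction
    ]

WEIGHTS = [1.5, 0.5, 0.5, 1.0]  # hypothetical weights, not learned values

def is_own_recent_wedding(tweet, bias=-1.0):
    """Linear scorer over the combined linguistic + social feature vector."""
    score = sum(f * w for f, w in zip(extract_features(tweet), WEIGHTS)) + bias
    return score > 0.0

own = {"text": "We got married today!", "replies": 12, "congrats_replies": 3}
friend = {"text": "Lovely photos from my friend's wedding last year",
          "replies": 1, "congrats_replies": 0}
```

    In a real system the weights would be learned by the per-event classifiers; the point of the sketch is only that the linguistic and social signals enter the same score.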

    Avatar CAPTCHA: telling computers and humans apart via face classification and mouse dynamics

    Bots are automated computer programs that execute malicious scripts and predefined functions on an affected computer. They pose cybersecurity threats and are among the most sophisticated and common cybercrime tools today. They spread viruses, generate spam, steal sensitive personal information, rig online polls and commit other types of online crime and fraud. They sneak into unprotected systems through the Internet by seeking vulnerable entry points, and they access a system's resources the way a human user does. How do we counter this? How do we block bots while still allowing human users to access system resources? One solution is a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), a program that can generate and grade tests that most humans can pass but computers cannot. It is used as a tool to distinguish humans from malicious bots. CAPTCHAs are a class of Human Interactive Proofs (HIPs), meant to be easily solvable by humans and economically infeasible for computers. Text CAPTCHAs are very popular and commonly used: for each challenge, they generate a sequence of characters by distorting standard fonts and ask users to identify and type them. However, they are vulnerable to character-segmentation attacks by bots, depend on the English language, and are becoming too complex for people to solve. A solution is to design image CAPTCHAs, which use images instead of text and require users to identify certain images to solve the challenge. They are user-friendly and convenient for human users and a much more challenging problem for bots. In today's Internet, user profiling and user identification have gained great significance: identity theft can be prevented by providing authorized access to resources, and timely response to a security breach requires frequent user verification.
    However, this process must be passive, transparent and non-obtrusive, and for such a system to be practical it must be accurate, efficient and difficult to forge. Behavioral biometric systems are less prominent than traditional biometric systems, yet they provide numerous significant advantages over them: collecting behavior data is non-obtrusive and cost-effective, since it requires no special hardware. While these systems are not distinctive enough for reliable human identification, they have been shown to be highly accurate in identity verification. In accomplishing everyday tasks, humans apply different styles, strategies, skills and knowledge; these define the behavioral traits of a user, and behavioral biometrics attempts to quantify these traits to profile users and establish their identity. Human-computer interaction (HCI)-based biometrics comprise the interaction strategies and styles between a human and a computer; these unique user traits are quantified to build profiles for identification. A specific category of HCI-based biometrics records human interaction with the mouse as the input device and is known as Mouse Dynamics. By monitoring the mouse-usage activity a user produces while interacting with the GUI, a unique profile can be created that helps identify that user. Mouse-based verification approaches do not record sensitive user credentials like usernames and passwords, and thus avoid privacy issues. We propose an image CAPTCHA that incorporates Mouse Dynamics to fortify it. It displays random images obtained from Yahoo's Flickr, and to solve the challenge the user must identify and select a certain class of images. Two theme-based challenges have been designed: the Avatar CAPTCHA, which displays human and avatar faces, and the Zoo CAPTCHA, which displays different animal species.
    In addition to the dynamically selected images, the way each user interacts with the mouse while attempting to solve the CAPTCHA (mouse clicks, mouse movements, cursor screen coordinates, etc.) is recorded non-obtrusively at regular time intervals. These recorded mouse movements constitute the Mouse Dynamics Signature (MDS) of the user, which provides an additional secure technique to segregate humans from bots. The security of the CAPTCHA is tested against an adversary executing a mouse bot that attempts to solve the challenges.
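    A sketch of how a Mouse Dynamics Signature might be summarized from a raw event stream. The event layout and the two summary features (mean cursor speed, mean gap between clicks) are illustrative assumptions, not the dissertation's exact feature set.

```python
import math

def mds_features(events):
    """Summarize a raw mouse-event stream into a tiny Mouse Dynamics
    Signature. Each event is (t_seconds, x, y, kind) with kind in
    {"move", "click"}; events are assumed time-ordered."""
    speeds, click_times = [], []
    prev = None
    for t, x, y, kind in events:
        if kind == "click":
            click_times.append(t)
        if prev is not None:
            pt, px, py, _ = prev
            dt = t - pt
            if dt > 0:
                # cursor speed over this segment, in pixels per second
                speeds.append(math.hypot(x - px, y - py) / dt)
        prev = (t, x, y, kind)
    mean_speed = sum(speeds) / len(speeds) if speeds else 0.0
    gaps = [b - a for a, b in zip(click_times, click_times[1:])]
    mean_click_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return {"mean_speed": mean_speed, "mean_click_gap": mean_click_gap}

events = [
    (0.0, 0, 0, "move"),
    (0.1, 30, 40, "move"),   # 50 px in 0.1 s -> 500 px/s on this segment
    (0.2, 30, 40, "click"),
    (1.2, 30, 40, "click"),
]
sig = mds_features(events)
```

    A verifier would compare such a signature against the stored profile of a human user; a scripted bot typically produces unnaturally uniform speeds and gaps.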

    Framework for Real-Time Event Detection using Multiple Social Media Sources

    Information about events happening in the real world is generated on social media in real time. Substantial research has been done on detecting these events from information posted on websites like Twitter, Tumblr, and Instagram. The form of the posted information depends on the platform: short messages, pictures, or long-form articles. In this paper, we extend an existing approach for real-time event detection at onset to include multiple websites. We present three different approaches to merging information from two different social media sources and analyze the strengths and weaknesses of each. We validate the detected events against newswire data collected during the same time period. Our results show that including multiple sources increases both the number and the quality of detected events.
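    One plausible way to merge event candidates from two sources, shown as a hedged sketch: the time window, the keyword Jaccard overlap and both thresholds are assumptions, not necessarily any of the paper's three approaches.

```python
def merge_events(src_a, src_b, window=3600, min_overlap=0.5):
    """Merge event candidates from two sources when they fall within the
    same time window and their keyword sets overlap enough (Jaccard)."""
    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if (a | b) else 0.0

    merged, used = [], set()
    for ea in src_a:
        match = next((i for i, eb in enumerate(src_b)
                      if i not in used
                      and abs(ea["t"] - eb["t"]) <= window
                      and jaccard(ea["kw"], eb["kw"]) >= min_overlap), None)
        if match is None:
            merged.append({**ea, "sources": 1})
        else:
            used.add(match)
            eb = src_b[match]
            merged.append({"t": min(ea["t"], eb["t"]),
                           "kw": sorted(set(ea["kw"]) | set(eb["kw"])),
                           "sources": 2})
    # events seen by only one source are kept, with lower support
    merged += [{**eb, "sources": 1}
               for i, eb in enumerate(src_b) if i not in used]
    return merged

twitter = [{"t": 0, "kw": ["quake", "sf"]}]
tumblr = [{"t": 600, "kw": ["quake", "earthquake", "sf"]},
          {"t": 90000, "kw": ["concert"]}]
events = merge_events(twitter, tumblr)
```

    The "sources" count gives a natural quality signal: events confirmed by both platforms can be ranked above single-source detections.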

    Optimized Ensemble Approach for Multi-model Event Detection in Big data

    Event detection plays an important role in modern society, and it is a popular computational task that identifies events automatically. Big data is especially useful for event detection because of its sheer volume, and multi-modal event detection exploits heterogeneous types of data. This work classifies diverse events using an optimized ensemble learning approach. Multi-modal event data comprising text, images and audio is sent to user devices from a cloud or server, where three models are generated for processing audio, text and images. First, the text, image and audio data are processed separately. The text pipeline consists of pre-processing (imputation of missing values and data normalization), textual feature extraction using an integrated N-gram approach, and model generation using a convolutional two-directional LSTM (2DCon_LSTM). The image pipeline consists of pre-processing using Min-Max Gaussian filtering (MMGF), feature extraction using a VGG-16 network, and model generation using a Tweaked Auto-Encoder (TAE). The audio pipeline consists of pre-processing using the Discrete Wavelet Transform (DWT), feature extraction using the Hilbert-Huang Transform (HHT), and model generation using an attention-based convolutional capsule network (Attn_CCNet). The features obtained from the text, image and audio models are fused by a feature-ensemble approach. From the fused feature vector, the optimal features are selected through the improved Battle Royale Optimization (IBRO) algorithm. A deep-learning model, a convolutional duo gated recurrent unit with auto-encoder (C-Duo GRU_AE), is used as the classifier. Finally, the different event types are classified, and the global model is sent back to the user devices securely, supporting a better decision-making process.
    The proposed methodology achieves an accuracy of 99.93%, an F1-score of 99.91%, a precision of 99.93%, a recall of 99.93%, a processing time of 17 seconds and a training time of 0.05 seconds. It exceeds several comparable methodologies in precision, recall, accuracy, F1-score, training time and processing time, indicating improved performance over the compared schemes. In addition, the proposed scheme detects multi-modal events accurately.
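    The fusion and selection steps can be illustrated with a toy sketch. Concatenating per-modality feature vectors is standard practice; the importance-score selection below is a simple stand-in for the IBRO optimizer, whose actual search procedure is beyond this sketch.

```python
def fuse_features(text_vec, image_vec, audio_vec):
    """Feature-ensemble fusion: concatenate the per-modality feature
    vectors produced by the text, image and audio models."""
    return list(text_vec) + list(image_vec) + list(audio_vec)

def select_top_k(vec, importance, k):
    """Stand-in for the optimizer-driven selection: keep the k features
    with the highest (assumed precomputed) importance scores, preserving
    their original order in the fused vector."""
    keep = sorted(sorted(range(len(vec)), key=lambda i: importance[i],
                         reverse=True)[:k])
    return [vec[i] for i in keep]

# tiny made-up vectors standing in for the 2DCon_LSTM / TAE / Attn_CCNet outputs
fused = fuse_features([0.1, 0.2], [0.9], [0.4, 0.3])
selected = select_top_k(fused, importance=[0.2, 0.9, 0.5, 0.8, 0.1], k=3)
```

    The selected vector would then be passed to the downstream classifier (C-Duo GRU_AE in the paper).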

    Tweet-SCAN: an event discovery technique for geo-located tweets

    Twitter has become one of the most popular Location-based Social Networks (LBSNs), bridging the physical and virtual worlds. Tweets, 140-character-long messages, are meant to answer the question "What's happening?". Occurrences and events in real life (such as political protests, music concerts, natural disasters or terrorist acts) are usually reported through geo-located tweets by users on site. Separating event-related tweets from the rest is a challenging problem that necessarily requires exploiting different tweet features. With that in mind, we propose Tweet-SCAN, a novel event discovery technique based on the popular density-based clustering algorithm DBSCAN. Tweet-SCAN takes into account four main features of a tweet, namely content, time, location and user, to group together event-related tweets. The proposed technique models textual content through a probabilistic topic model called the Hierarchical Dirichlet Process and introduces the Jensen-Shannon distance for neighborhood identification in the textual dimension. We demonstrate Tweet-SCAN's performance on two real data sets of geo-located tweets posted during Barcelona local festivities in 2014 and 2015, for which some of the events were identified beforehand by domain experts. Through these tagged data sets, we assess Tweet-SCAN's capability to discover events, justify the use of a textual component and highlight the effects of several parameters.
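    The textual neighborhood test can be sketched directly from the definitions above: the Jensen-Shannon distance is the square root of the Jensen-Shannon divergence between two topic distributions, and a DBSCAN-style neighborhood then requires closeness in space, time and topic simultaneously. The epsilon thresholds and record layout below are assumptions for illustration.

```python
import math

def js_distance(p, q):
    """Jensen-Shannon distance between two discrete topic distributions
    (square root of the JS divergence; base-2 logs keep it in [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

def neighbors(tweet, others, eps_geo, eps_time, eps_text):
    """DBSCAN-style neighborhood: a tweet is a neighbor only if it is
    close in space, time AND topic space (thresholds are assumptions)."""
    out = []
    for o in others:
        d_geo = math.hypot(tweet["x"] - o["x"], tweet["y"] - o["y"])
        if (d_geo <= eps_geo
                and abs(tweet["t"] - o["t"]) <= eps_time
                and js_distance(tweet["topics"], o["topics"]) <= eps_text):
            out.append(o)
    return out
```

    With neighborhoods defined this way, the usual DBSCAN core-point expansion yields the event clusters; the bounded JS distance makes the textual epsilon easy to choose on a [0, 1] scale.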