626 research outputs found

    Geocoding location expressions in Twitter messages: A preference learning method

    Resolving location expressions in text to the correct physical location, also known as geocoding or grounding, is complicated by the fact that so many places around the world share the same name. Correct resolution is made even more difficult when there is little context to determine which place is intended, as in a 140-character Twitter message, or when location cues from different sources conflict, as may be the case among different metadata fields of a Twitter message. We used supervised machine learning to weigh the different fields of the Twitter message and the features of a world gazetteer to create a model that will prefer the correct gazetteer candidate to resolve the extracted expression. We evaluated our model using the F1 measure and compared it to similar algorithms. Our method achieved results higher than state-of-the-art competitors.
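The F1 measure mentioned in this abstract is the harmonic mean of precision and recall. A minimal sketch of how it could be computed for a geocoding task follows; the dictionary layout, the identifiers, and the function name `f1_score` are illustrative assumptions, not the authors' evaluation code.

```python
def f1_score(predicted, gold):
    """F1 over resolved location expressions.

    predicted, gold: dicts mapping an expression id to a gazetteer entry id.
    (Hypothetical data layout, chosen only for illustration.)
    """
    # A prediction counts as correct only if it matches the gold entry exactly.
    true_positives = sum(
        1 for key, value in predicted.items() if gold.get(key) == value
    )
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example: three predictions, four gold annotations, two matches.
predicted = {"t1": "Paris_FR", "t2": "Springfield_IL", "t3": "London_UK"}
gold = {
    "t1": "Paris_FR",
    "t2": "Springfield_MA",
    "t3": "London_UK",
    "t4": "Perth_AU",
}
score = f1_score(predicted, gold)
```

Here precision is 2/3 and recall is 2/4, so the score is 4/7.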

    Predicting Influencer Virality on Twitter

    The ability to successfully predict virality on Twitter holds great potential as a resource for Twitter influencers, enabling the development of more sophisticated strategies for audience engagement, audience monetization, and information sharing. To our knowledge, focusing exclusively on tweets posted by influencers is a novel context for studying Twitter virality. We find, among feature categories traditionally considered in the literature, that combining categories covering a range of information performs better than models only incorporating individual feature categories. Moreover, our general predictive model, encompassing a range of feature categories, achieves a prediction accuracy of 68% for influencer virality. We also investigate the role of influencer audiences in predicting virality, a topic we believe to be understudied in the literature. We suspect that incorporating audience information will allow us to better discriminate between virality classes, thus leading to better predictions. We pursue two different approaches, resulting in 10 different predictive models that leverage influencer audience information in addition to traditional feature categories. Both of our attempts to incorporate audience information plateau at an accuracy of approximately 61%, roughly a 7 percentage point decrease in performance compared to our general predictive model. We conclude that we are unable to find experimental evidence to support our claim that incorporating influencer audience information will improve virality predictions. Nonetheless, the performance of our general model holds promise for the deployment of a tool that allows influencers to reap the benefits of virality prediction. As stronger performance from the underlying model would make this tool more useful in practice to influencers, improving the predictive performance of our general model is a cornerstone of future work.

    Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media

    Social media is often viewed as a sensor into various societal events such as disease outbreaks, protests, and elections. We describe the use of social media as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our approach detects a broad range of cyber-attacks (e.g., distributed denial of service (DDoS) attacks, data breaches, and account hijacking) in an unsupervised manner using just a limited fixed set of seed event triggers. A new query expansion strategy based on convolutional kernels and dependency parses helps model reporting structure and aids in identifying key event characteristics. Through a large-scale analysis over Twitter, we demonstrate that our approach consistently identifies and encodes events, outperforming existing methods. Comment: 13 single column pages, 5 figures, submitted to KDD 201

    Multiple Kernel-Based Multimedia Fusion for Automated Event Detection from Tweets

    A method for detecting hot events such as wildfires is proposed. It uses visual and textual information to improve detection. Starting from tweets that contain both text and images, it preprocesses the data to eliminate unwanted data, transforms unstructured data into structured data, and then extracts features. Text features include term frequency-inverse document frequency (TF-IDF). Image features include histogram of oriented gradients, gray-level co-occurrence matrix, color histogram, and scale-invariant feature transform. Next, the features are passed to multiple kernel learning (MKL), which automatically combines both feature types to achieve the best performance. Finally, it performs event detection. The method was tested on the 2014 Brisbane hailstorm and the 2017 California wildfires. It was compared with methods that used text only or images only. With the Brisbane hailstorm data, the proposed method achieved the best performance, with a fusion accuracy of 0.93, compared with 0.89 for text only and 0.85 for images only. With the California wildfires data, a similar performance was recorded. These results demonstrate that event detection in Twitter is improved by combining multiple features. The result is an accurate and effective event detection method for spreading awareness and organizing responses, leading to better disaster management.
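The fusion step in this abstract can be pictured as combining one kernel (Gram) matrix per feature type into a single kernel. The sketch below uses a fixed convex combination of linear kernels, which is a simplification: real multiple kernel learning learns the weights from data, and the abstract does not say which kernels or weights were used. The function names, feature values, and weights are all illustrative assumptions.

```python
def linear_kernel(features):
    """Gram matrix of dot products between the rows of `features`."""
    return [
        [sum(a * b for a, b in zip(x, y)) for y in features]
        for x in features
    ]


def fuse_kernels(kernels, weights):
    """Convex combination of same-sized Gram matrices.

    The weights are fixed by hand here; an MKL solver would learn them.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


# Two made-up tweets, each with a text feature vector and an image feature vector.
text_features = [[1.0, 0.0], [0.0, 1.0]]
image_features = [[2.0], [1.0]]

fused = fuse_kernels(
    [linear_kernel(text_features), linear_kernel(image_features)],
    [0.5, 0.5],
)
```

The fused matrix could then be handed to any kernel-based classifier that accepts a precomputed kernel.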

    Automatic stance detection on political discourse in Twitter

    The majority of opinion mining tasks in natural language processing (NLP) have focused on sentiment analysis of texts about products and services, while there is comparatively less research on automatic detection of political opinion. Almost all previous research has been done for English, while this thesis focuses on the automatic detection of stance (whether the author is favorable or not towards an important political topic) from Twitter posts in Catalan, Spanish and English. The main objective of this work is to build and compare automatic stance detection systems using both classic supervised machine learning and deep learning techniques. We also study the influence of text normalization and perform experiments with different methods for word representations such as TF-IDF measures for unigrams, word embeddings, tweet embeddings, and contextual character-based embeddings. We obtain state-of-the-art results in the stance detection task on the IberEval 2018 dataset. Our research shows that text normalization and feature selection are important for systems with unigram features, and do not affect performance when working with word vector representations. Classic methods such as unigrams with an SVM classifier still outperform deep learning techniques, but seem to be prone to overfitting. The classifiers trained using word vector representations and the neural network models encoded with contextual character-based vectors show greater robustness.
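As a rough illustration of the TF-IDF unigram representation mentioned in this abstract, the sketch below weights each word by its relative frequency in a tweet times the log of its inverse document frequency. This is one common TF-IDF variant; the thesis may use a different weighting, tokenizer, or normalization, and the function name and example tweets are made up.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up tweets: "vote" occurs in every tweet, so its weight is zero.
tweets = ["independence vote", "vote no", "unity vote"]
vectors = tfidf(tweets)
```

Vectors like these, mapped to a fixed vocabulary order, are the kind of input a linear classifier such as an SVM would take.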