118 research outputs found
Real-time Content Identification for Events and Sub-Events from Microblogs.
PhD
In an age when people are predisposed to report real-world events through their social
media accounts, many researchers value the advantages of mining such unstructured
and informal data from social media. Compared with traditional news media, online
social media services, such as Twitter, can provide more comprehensive and timely
information about real-world events. Existing Twitter event monitoring systems analyse
partial event data and are unable to report the underlying stories or sub-events in
real-time. To fill this gap, this research focuses on the automatic identification of
content for events and sub-events through the analysis of Twitter streams in real-time.
To fulfil the need for real-time content identification for events and sub-events, this
research first proposes a novel adaptive crawling model that retrieves extra event content
from the Twitter Streaming API. The proposed model analyses the characteristics of
hashtags and tweets collected from live Twitter streams to automate the expansion of
subsequent queries. By investigating the characteristics of Twitter hashtags, this research
then proposes three Keyword Adaptation Algorithms (KwAAs), which are based
on the term frequency (TF-KwAA), the traffic pattern (TP-KwAA), and the text content
of associated tweets (CS-KwAA) of the emerging hashtags. Based on the comparison
between traditional keyword crawling and adaptive crawling with different KwAAs, this
thesis demonstrates that the KwAAs retrieve extra event content about sub-events in
real-time for both planned and unplanned events.
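The abstract does not spell out how the KwAAs work internally; as an illustration of the term-frequency idea behind TF-KwAA only, here is a minimal sketch in which the function name, thresholds, and tweet field names are all hypothetical, not taken from the thesis:

```python
from collections import Counter

def expand_keywords(tweets, current_keywords, min_count=5, top_n=3):
    """Count hashtags in a batch of collected tweets and promote the most
    frequent new ones to crawl keywords (a term-frequency heuristic)."""
    counts = Counter(
        tag.lower()
        for tweet in tweets
        for tag in tweet.get("hashtags", [])
    )
    candidates = [
        (tag, n) for tag, n in counts.most_common()
        if n >= min_count and tag not in current_keywords
    ]
    return current_keywords | {tag for tag, _ in candidates[:top_n]}

# Hypothetical batch of tweets from a live stream
batch = [{"hashtags": ["Olympics", "opening"]} for _ in range(6)]
print(sorted(expand_keywords(batch, {"olympics"})))  # ['olympics', 'opening']
```

In a real adaptive crawler the returned keyword set would be fed back into the next Streaming API query, which is the feedback loop the abstract describes.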
To examine the usefulness of extra event content for the event monitoring system, a
Twitter event monitoring solution is proposed. This "Detection of Sub-events by Twitter
Real-time Monitoring" (DSTReaM) framework concurrently runs multiple instances
of a statistics-based event detection algorithm over different stream components. By
evaluating the detection performance using detection accuracy and event entropy, this
research demonstrates that better event detection can be achieved with a broader coverage
of event content.
School of Electronic Engineering and Computer Science (EECS), Queen Mary University of London (QMUL)
China Scholarship Council (CSC)
Tracking physical events on social media
Social media platforms have emerged as one of the most widely used communication channels on the web today. The first social networking website came into existence in 2002, and there are currently about 2.08 billion social media users around the globe. The participation of users within a social network can be considered an act of sensing: users interact with the physical world and record the corresponding observations in the form of text, pictures, videos, etc. This phenomenon is termed Social Sensing, and it motivates us to develop robust techniques that can estimate the physical state of the world from human observations.
This dissertation addresses a set of problems related to the detection and tracking of real-world events. The term ‘event’ refers to an entity that can be characterized by spatial and temporal properties, and with the help of these properties we design novel mathematical models that serve our goals. We first focus on a simple event detection technique using ‘Twitter’ as the source of information. The method described in this work allows us to perform detection in a completely language-independent and unsupervised fashion. We next extend the event detection problem to a different type of social media, ‘Instagram’, which allows users to share pictorial information about nearby observations. With the availability of geotagged data we solve two different subproblems: the first is to detect and geolocalize an instance of an event, and the second is to estimate the path taken by an event during its course. The next problem we look at is improving the quality of event localization with the help of text and metadata information. Twitter, in general, has a smaller volume of geotagged data than Instagram, which requires us to design methods that exploit the supplementary information available from the detected events. Finally, we consider both social networks jointly in order to exploit their complementary advantages and perform better than the methods designed for the individual networks.
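The abstract does not detail the unsupervised detection method, but a common language-independent signal of the kind it describes is a burst in per-interval message counts. The following sketch is an illustration only; the window size, the z-score threshold, and the function name are assumptions, not the dissertation's actual algorithm:

```python
from statistics import mean, stdev

def burst_intervals(counts, window=8, z_thresh=3.0):
    """Flag time intervals whose message count deviates sharply from a
    trailing-window baseline -- a simple, language-independent event cue."""
    flagged = []
    for i in range(window, len(counts)):
        base = counts[i - window:i]
        mu, sigma = mean(base), stdev(base)
        if sigma == 0:
            sigma = 1.0  # avoid division by zero on a perfectly flat baseline
        if (counts[i] - mu) / sigma > z_thresh:
            flagged.append(i)
    return flagged

# Hypothetical per-minute tweet counts: a quiet baseline, then a spike
series = [10, 12, 11, 9, 10, 13, 11, 12, 60]
print(burst_intervals(series))  # [8]
```

Because the signal is a pure count, the same code works regardless of the language of the underlying messages, which is the property the abstract emphasizes.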
Detecting Political Framing Shifts and the Adversarial Phrases within Rival Factions and Ranking Temporal Snapshot Contents in Social Media
Social Computing is an area of computer science concerned with the dynamics of communities and cultures created through computer-mediated social interaction. Various social media platforms, such as social network services and microblogging, enable users to come together and create social movements expressing their opinions on diverse sets of issues, events, complaints, grievances, and goals. Methods are needed for monitoring and summarizing these types of sociopolitical trends, their leaders and followers, messages, and dynamics. In this dissertation, a framework comprising community- and content-based computational methods is presented to provide insights into multilingual and noisy political social media content. First, a model is developed to predict the emergence of viral hashtag breakouts using network features. Next, another model is developed to detect and compare individual and organizational accounts using a set of domain- and language-independent features. The third model exposes contentious issues driving reactionary dynamics between opposing camps. The fourth model develops community detection and visualization methods to reveal underlying dynamics and the key messages that drive them. The final model presents a use-case methodology for detecting and monitoring foreign influence, wherein a state actor and news media under its control attempt to shift public opinion by framing information to support multiple adversarial narratives that facilitate their goals. In each case, a discussion of the novel aspects and contributions of the models is presented, along with quantitative and qualitative evaluations. An analysis of multiple conflict situations is conducted, covering areas in the UK, Bangladesh, Libya, and Ukraine, where adversarial framing led to polarization, declines in social cohesion, social unrest, and even civil wars (e.g., Libya and Ukraine).
Doctoral Dissertation, Computer Science, 201
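The abstract mentions community detection over social media interactions without naming a specific algorithm; one simple heuristic in that family drops "bridge" edges whose endpoints share no common neighbour and then takes connected components. This sketch is an assumed illustration of the general technique, not the dissertation's method, and all names in it are hypothetical:

```python
def communities(edges):
    """Split a user-interaction graph into communities by dropping bridge
    edges (endpoints with no shared neighbour), then taking connected
    components of the remaining, well-embedded edges."""
    neigh = {}
    for a, b in edges:
        neigh.setdefault(a, set()).add(b)
        neigh.setdefault(b, set()).add(a)
    # keep only edges whose endpoints have at least one common neighbour
    kept = [(a, b) for a, b in edges if neigh[a] & neigh[b]]
    # union-find over the kept edges to extract connected components
    parent = {n: n for n in neigh}
    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n
    for a, b in kept:
        parent[find(a)] = find(b)
    groups = {}
    for n in neigh:
        groups.setdefault(find(n), set()).add(n)
    return sorted(map(sorted, groups.values()))

# Two hypothetical retweet cliques joined by one bridge edge
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("x", "y"), ("y", "z"), ("x", "z"), ("c", "x")]
print(communities(edges))  # [['a', 'b', 'c'], ['x', 'y', 'z']]
```

The bridge edge ("c", "x") has no common neighbour on either side, so it is dropped and the two opposing camps separate cleanly.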
Webometrics benefitting from web mining? An investigation of methods and applications of two research fields
Webometrics and web mining are two fields where research is focused on quantitative analyses of the web. This literature review outlines definitions of the fields, then focuses on their methods and applications, and discusses the potential of closer contact and collaboration between them. A key difference between the fields is that webometrics has focused on exploratory studies, whereas web mining has been dominated by studies focusing on the development of methods and algorithms. Differences in the types of data can also be seen, with webometrics more focused on analyses of the structure of the web and web mining more focused on web content and usage, even though both fields have embraced the possibilities of user-generated content. It is concluded that research problems where big data is needed can benefit from collaboration between webometricians, with their tradition of exploratory studies, and web miners, with their tradition of developing methods and algorithms.
Regulating Data in the European Union and United States: Privacy, Access, Portability & APIs
This dissertation examines the way that demands for more control over the collection, processing, and sharing of personal data are being managed by both government and industry leaders with strategies that appear to comply with regulations, but that fail to do so. These are “by-design” strategies used by individuals to unilaterally manage their data with automated tools.
I take a multimethod approach that combines autoethnography, reverse engineering techniques, and data analysis to assess the implementation of by-design services implemented by Facebook, Twitter, and Instagram in compliance with current European Union regulations for access and portability. I also employ archival research, discourse analysis, interviews, and participant observation.
I argue that self-led, by-design approaches do not answer the demands for more control over personal data. The regulatory and technical resources put in place for individuals to control their data are not effective, because they turn decisions about execution over to an industry with no interest in sharing that data or being regulated. If policymakers continue to pursue by-design approaches, they will need to learn how to test both the techniques provided by industry and the execution of those techniques, and to assess the impact on the data that is made available. For results to be evaluated, by-design tools like the ones I assessed must be accompanied by clear and detailed documentation of design choices and procedures. In this vein, I offer directions for critical scrutiny, including standards and measuring the impact of APIs.
I conclude that self-managed, by-design approaches are not the source of the problem, but a symptom of the need for critical scrutiny over the execution of tools like the ones offered by Facebook, Twitter, and Instagram. Ultimately, I found that portability and access are legally and technically fraught. However, despite the shortcomings of by-design approaches, personal data can be more effectively regulated in Europe than in the United States as a result of current regulations.
Repurposing digital traces to organize social attention
In the late 1990s, Google pioneered the idea of scraping and repurposing digital traces
as a new form of data with which to understand people’s preferences and behaviour.
This way of generating empirical sensitivity towards the world can be termed digital
methods and the last five years have seen such methods gain influence beyond the field
of Internet search. Organizations of different kinds are increasingly mentioning the
need to harness the intelligence of ‘big’ digital datasets, and the social sciences have
similarly been marked by suggestions to move away from established methods such as
surveys and focus groups, and learn from the way Google and other companies have
succeeded in turning big datasets into knowledge of social dynamics. By enabling new
combinations of data and software and by providing new ways of searching,
aggregating, and cross-referencing empirical datasets, it seems probable that the spread
of digital methods will re-configure the way organizations, social scientists, and
citizens ‘see’ the world in which they live.
Combating Attacks and Abuse in Large Online Communities
Internet users today are connected more widely and ubiquitously than ever before. As a result, various online communities have formed, ranging from online social networks (Facebook, Twitter), to mobile communities (Foursquare, Waze), to content- and interest-based networks (Wikipedia, Yelp, Quora). While users benefit from the ease of access to information and social interactions, there is a growing concern for users' security and privacy against various attacks such as spam, phishing, malware infection, and identity theft. Combating attacks and abuse in online communities is challenging. First, today's online communities are increasingly dependent on users and user-generated content, and securing online systems demands a deep understanding of complex and often unpredictable human behaviors. Second, online communities can easily have millions or even billions of users, which requires the corresponding security mechanisms to be highly scalable. Finally, cybercriminals are constantly evolving to launch new types of attacks, which further demands high robustness of security defenses. In this thesis, we take concrete steps towards measuring, understanding, and defending against attacks and abuse in online communities. We begin with a series of empirical measurements to understand user behaviors in different online services and the unique security and privacy challenges that users face. This effort covers a broad set of popular online services, including social networks for question answering (Quora), anonymous social networks (Whisper), and crowdsourced mobile communities (Waze). Despite the differences between specific online communities, our study provides a first look at their user activity patterns based on empirical data, and reveals the need for reliable mechanisms to curate user content, protect privacy, and defend against emerging attacks. Next, we turn our attention to attacks targeting online communities, with a focus on spam campaigns.
While traditional spam is mostly generated by automated software, attackers today have started to introduce "human intelligence" to implement attacks. This is malicious crowdsourcing (or crowdturfing), where a large group of real users is organized to carry out malicious campaigns, such as writing fake reviews or spreading rumors on social media. Using collective human efforts, attackers can easily bypass many existing defenses (e.g., CAPTCHA). To understand the ecosystem of crowdturfing, we first use measurements to examine campaign organization, workers, and revenue in detail. Based on insights from empirical data, we develop effective machine learning classifiers to detect crowdturfing activities. In the meantime, considering the adversarial nature of crowdturfing, we also build practical adversarial models to simulate how attackers can evade or disrupt machine-learning-based defenses. To aid in this effort, we next explore using user behavior models to detect a wider range of attacks. Instead of making assumptions about attacker behavior, our idea is to model normal user behaviors and capture (malicious) behaviors that deviate from the norm. In this way, we can detect previously unknown attacks. Our behavior model is based on detailed clickstream data: sequences of click events generated by users when using the service. We build a similarity graph where each user is a node and the edges are weighted by clickstream similarity. By partitioning this graph, we obtain "clusters" of users with similar behaviors. We then use a small set of known good users to "color" these clusters and differentiate the malicious ones. This technique has been adopted by real-world social networks (Renren and LinkedIn), and has already detected unexpected attacks. Finally, we extend the clickstream model to understand finer-grained behaviors of attackers (and real users), and to track how user behavior changes over time.
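The clickstream pipeline described here (similarity graph, partitioning, coloring with known good users) can be sketched in miniature. The similarity measure below (Jaccard overlap of click-event bigrams) and the hard similarity threshold are assumptions for illustration; the thesis does not specify them in this abstract, and all names are hypothetical:

```python
from itertools import combinations

def bigrams(seq):
    """Set of consecutive click-event pairs in one user's clickstream."""
    return {tuple(seq[i:i + 2]) for i in range(len(seq) - 1)}

def flag_suspicious(clickstreams, known_good, threshold=0.3):
    """Cluster users by Jaccard similarity of their click-event bigrams
    (union-find over an edge-thresholded similarity graph), then flag
    every cluster that contains no known-good user."""
    users = list(clickstreams)
    parent = {u: u for u in users}
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    for a, b in combinations(users, 2):
        ga, gb = bigrams(clickstreams[a]), bigrams(clickstreams[b])
        sim = len(ga & gb) / len(ga | gb) if ga | gb else 0.0
        if sim >= threshold:
            parent[find(a)] = find(b)  # merge into one behavioural cluster
    good_roots = {find(u) for u in known_good}
    return sorted(u for u in users if find(u) not in good_roots)

# Hypothetical clickstreams: two normal users, two lockstep follow-bots
streams = {
    "alice": ["login", "feed", "post", "feed", "logout"],
    "bob":   ["login", "feed", "post", "feed", "logout"],
    "bot1":  ["login", "follow", "follow", "follow", "follow"],
    "bot2":  ["login", "follow", "follow", "follow", "follow"],
}
print(flag_suspicious(streams, known_good={"alice"}))  # ['bot1', 'bot2']
```

Note how "coloring" works: bob is never labeled, yet he escapes flagging because he shares a cluster with the known-good seed alice, while the bots' cluster has no seed and is flagged wholesale.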
In summary, this thesis illustrates a data-driven approach to understanding and defending against attacks and abuse in online communities. Our measurements have revealed new insights into how attackers are evolving to bypass existing security defenses. In addition, our data-driven systems provide new solutions for online services to gain a deep understanding of their users, and to defend them from emerging attacks and abuse.
Advanced Location-Based Technologies and Services
Since the publication of the first edition in 2004, advances in mobile devices, positioning sensors, WiFi fingerprinting, and wireless communications, among others, have paved the way for developing new and advanced location-based services (LBSs). This second edition provides up-to-date information on LBSs, including WiFi fingerprinting, mobile computing, geospatial clouds, geospatial data mining, location privacy, and location-based social networking. It also includes new chapters on application areas such as LBSs for public health, indoor navigation, and advertising. In addition, the chapter on remote sensing has been revised to address recent advancements.
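WiFi fingerprinting, mentioned above, is commonly implemented by matching a measured received-signal-strength (RSSI) vector against a pre-surveyed radio map. The following is a generic k-nearest-neighbour sketch of that idea, not an example from the book; the radio map, positions, and function name are invented for illustration:

```python
from math import dist

def locate(fingerprint, radio_map, k=3):
    """Estimate a position by k-nearest-neighbour matching of a measured
    Wi-Fi RSSI vector against reference fingerprints at known positions,
    averaging the positions of the k closest references."""
    ranked = sorted(radio_map, key=lambda ref: dist(ref["rssi"], fingerprint))
    nearest = ranked[:k]
    x = sum(ref["pos"][0] for ref in nearest) / k
    y = sum(ref["pos"][1] for ref in nearest) / k
    return (x, y)

# Hypothetical survey: RSSI (dBm) from three access points at known spots
radio_map = [
    {"pos": (0.0, 0.0), "rssi": [-40, -70, -80]},
    {"pos": (0.0, 1.0), "rssi": [-45, -65, -78]},
    {"pos": (5.0, 5.0), "rssi": [-80, -42, -60]},
]
print(locate([-42, -68, -79], radio_map, k=2))  # (0.0, 0.5)
```

The measured vector sits between the two nearby references, so averaging their positions places the estimate between them and far from the distant third spot.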