Non-Hierarchical Networks for Censorship-Resistant Personal Communication.
The Internet promises widespread access to the world's collective information and fast communication among people, but widespread government censorship and spying undermine this potential. This censorship is facilitated by the Internet's hierarchical structure: most traffic flows through routers owned by a small number of ISPs, who can be secretly coerced into aiding such efforts. Traditional cryptographic defenses are confusing to ordinary users. This thesis instead advocates removing the underlying hierarchical infrastructure and replacing it with non-hierarchical networks. These networks lack such chokepoints, requiring would-be censors to control a substantial fraction of the participating devices, an expensive proposition. We take four steps towards the development of practical non-hierarchical networks. (1) We first describe Whisper, a non-hierarchical mobile ad hoc network (MANET) architecture for personal communication among friends and family that resists censorship and surveillance. At its core are two novel techniques: an efficient routing scheme based on the predictability of human locations, and a variant of onion routing suitable for decentralized MANETs. (2) We describe the design and implementation of Shout, a MANET architecture for censorship-resistant, Twitter-like public microblogging. (3) We describe the Mason test, a method for detecting Sybil attacks in ad hoc networks in which trusted authorities are not available. (4) We characterize and model the aggregate behavior of Twitter users to enable simulation-based study of systems like Shout. We use our characterization of the retweet graph to analyze a novel spammer detection technique for Shout.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/107314/1/drbild_1.pd
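As a hedged illustration of the layered-encryption idea behind onion routing (the second technique above), the following Python sketch wraps a message in one encryption layer per hop and peels the layers in path order. The hop names, key handling, and use of Fernet are assumptions for illustration; they are not the thesis's decentralized MANET variant.

```python
# Minimal sketch of onion-style layered encryption. Hop names and the Fernet
# choice are illustrative, not from the thesis.
from cryptography.fernet import Fernet

# Each relay on the path holds its own symmetric key (hypothetical identifiers).
path = ["relay_a", "relay_b", "relay_c"]
keys = {hop: Fernet.generate_key() for hop in path}

def wrap(message: bytes, path, keys) -> bytes:
    """Encrypt the message in layers, innermost layer for the last hop."""
    onion = message
    for hop in reversed(path):
        onion = Fernet(keys[hop]).encrypt(onion)
    return onion

def peel(onion: bytes, path, keys) -> bytes:
    """Each hop removes exactly one layer; only the last hop sees the payload."""
    for hop in path:
        onion = Fernet(keys[hop]).decrypt(onion)
    return onion

onion = wrap(b"hello over the MANET", path, keys)
assert peel(onion, path, keys) == b"hello over the MANET"
```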
Combating Attacks and Abuse in Large Online Communities
Internet users today are connected more widely and ubiquitously than ever before. As a result, various online communities have formed, ranging from online social networks (Facebook, Twitter), to mobile communities (Foursquare, Waze), to content- and interest-based networks (Wikipedia, Yelp, Quora). While users benefit from the ease of access to information and social interactions, there is growing concern for users' security and privacy against attacks such as spam, phishing, malware infection, and identity theft. Combating attacks and abuse in online communities is challenging. First, today's online communities are increasingly dependent on users and user-generated content. Securing online systems demands a deep understanding of complex and often unpredictable human behaviors. Second, online communities can easily have millions or even billions of users, which requires the corresponding security mechanisms to be highly scalable. Finally, cybercriminals are constantly evolving to launch new types of attacks, which further demands highly robust security defenses.
In this thesis, we take concrete steps towards measuring, understanding, and defending against attacks and abuse in online communities. We begin with a series of empirical measurements to understand user behaviors in different online services and the unique security and privacy challenges that users face. This effort covers a broad set of popular online services, including social networks for question answering (Quora), anonymous social networks (Whisper), and crowdsourced mobile communities (Waze). Despite the differences among these communities, our study provides a first look at their user activity patterns based on empirical data, and reveals the need for reliable mechanisms to curate user content, protect privacy, and defend against emerging attacks.
Next, we turn our attention to attacks targeting online communities, with a focus on spam campaigns. While traditional spam is mostly generated by automated software, attackers today are starting to introduce "human intelligence" into their attacks. This is malicious crowdsourcing (or crowdturfing), in which a large group of real users is organized to carry out malicious campaigns, such as writing fake reviews or spreading rumors on social media. Using collective human effort, attackers can easily bypass many existing defenses (e.g., CAPTCHA). To understand the ecosystem of crowdturfing, we first use measurements to examine campaign organization, workers, and revenue in detail. Based on insights from empirical data, we develop effective machine learning classifiers to detect crowdturfing activities. Given the adversarial nature of crowdturfing, we also build practical adversarial models to simulate how attackers can evade or disrupt machine-learning-based defenses.
To aid in this effort, we next explore user behavior models to detect a wider range of attacks. Instead of making assumptions about attacker behavior, our idea is to model normal user behavior and capture (malicious) behavior that deviates from the norm. In this way, we can detect previously unknown attacks. Our behavior model is based on detailed clickstream data, sequences of click events generated by users while using the service. We build a similarity graph in which each user is a node and edges are weighted by clickstream similarity. By partitioning this graph, we obtain "clusters" of users with similar behaviors.
We then use a small set of known good users to "color" these clusters and differentiate the malicious ones. This technique has been adopted by real-world social networks (Renren and LinkedIn), and has already detected unexpected attacks. Finally, we extend the clickstream model to understand finer-grained behaviors of attackers (and real users) and to track how user behavior changes over time.
In summary, this thesis illustrates a data-driven approach to understanding and defending against attacks and abuse in online communities. Our measurements have revealed new insights about how attackers are evolving to bypass existing security defenses. In addition, our data-driven systems provide new solutions for online services to gain a deep understanding of their users and defend them from emerging attacks and abuse.
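As a hedged sketch of the clickstream similarity-graph technique, the code below represents each clickstream by its set of consecutive event pairs, links users whose similarity exceeds a threshold, partitions the graph into connected components, and colors components that contain known-good seed users. The Jaccard-over-bigrams similarity, the threshold, and the example data are assumptions, not the exact metric or partitioning algorithm used in the thesis.

```python
# Build a user-user graph weighted by clickstream similarity, partition it,
# then "color" clusters that contain known-good seed users.
from itertools import combinations

def bigrams(clicks):
    """Represent a clickstream by its set of consecutive event pairs."""
    return {(a, b) for a, b in zip(clicks, clicks[1:])}

def jaccard(s1, s2):
    return len(s1 & s2) / len(s1 | s2) if (s1 | s2) else 0.0

def cluster_users(clickstreams, threshold=0.5):
    """Threshold pairwise similarity and return connected components."""
    feats = {u: bigrams(c) for u, c in clickstreams.items()}
    adj = {u: set() for u in clickstreams}
    for u, v in combinations(clickstreams, 2):
        if jaccard(feats[u], feats[v]) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    clusters, seen = [], set()
    for u in clickstreams:
        if u in seen:
            continue
        stack, comp = [u], set()
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters

# Hypothetical data: event sequences per user, plus a few vetted "good" users.
clickstreams = {
    "alice": ["login", "feed", "like", "feed", "logout"],
    "bob":   ["login", "feed", "like", "feed", "logout"],
    "bot1":  ["login", "post", "post", "post", "post"],
    "bot2":  ["login", "post", "post", "post", "post"],
}
known_good = {"alice"}
for cluster in cluster_users(clickstreams):
    label = "normal" if cluster & known_good else "suspicious"
    print(sorted(cluster), "->", label)
```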
Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods
Machine generated text is increasingly difficult to distinguish from human
authored text. Powerful open-source models are freely available, and
user-friendly tools that democratize access to generative models are
proliferating. ChatGPT, which was released shortly after the first preprint of
this survey, epitomizes these trends. The great potential of state-of-the-art
natural language generation (NLG) systems is tempered by the multitude of
avenues for abuse. Detection of machine generated text is a key countermeasure
for reducing abuse of NLG models, with significant technical challenges and
numerous open problems. We provide a survey that includes both 1) an extensive
analysis of threat models posed by contemporary NLG systems, and 2) the most
complete review of machine generated text detection methods to date. This
survey places machine generated text within its cybersecurity and social
context, and provides strong guidance for future work addressing the most
critical threat models, and ensuring detection systems themselves demonstrate
trustworthiness through fairness, robustness, and accountability.
Comment: Manuscript submitted to ACM Special Session on Trustworthy AI.
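As a purely illustrative baseline, and not a method reviewed in the survey, the sketch below trains a simple bag-of-words classifier to separate machine-generated from human-written text; the example texts, labels, and model choice are assumptions.

```python
# Illustrative machine-generated-text detection baseline only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled examples: 1 = machine generated, 0 = human written.
texts = [
    "The results demonstrate a significant improvement across all metrics.",
    "honestly i just winged the talk and it somehow went fine lol",
    "In conclusion, the aforementioned factors contribute to the outcome.",
    "we argued about the citation style for an hour, classic lab meeting",
]
labels = [1, 0, 1, 0]

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)
print(detector.predict(["Furthermore, the proposed approach yields robust results."]))
```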
Utilizing Multi-modal Weak Signals to Improve User Stance Inference in Social Media
Social media has become an integral component of daily life, with millions of pieces of content of various types released into social networks every day. This offers an interesting window into users' views on everyday life. Exploring the opinions of users in social media networks has long been an interesting subject for Natural Language Processing researchers: knowing the opinion of the masses allows anyone to make informed policy or marketing decisions, which is why comprehensive social opinions are desirable. The nature of social media is complex, however, and obtaining the social opinion is therefore a challenging task. Because social media networks are so diverse and complex, they typically mirror real social connections on a digital platform: just as users make friends and companions in the real world, digital platforms enable users to form similar social connections. This work mainly looks at how to obtain a comprehensive social opinion from a social media network. Typical social opinion quantifiers look at text contributions made by users to find their opinions. This is currently challenging because the majority of users on social media consume content rather than express their opinions, which makes natural language processing based methods impractical due to the lack of linguistic features. In our work we improve a method named stance inference, which can utilize multi-domain features to extract the social opinion, and we introduce a method which can expose users' opinions even when they have no on-topic content. We also show how introducing weak supervision into the unsupervised task of stance inference improves performance. The weak supervision we bring into the pipeline comes through hashtags: hashtags are contextual indicators added by humans, which are much likelier to be topically related than a topic model. Lastly, we introduce disentanglement methods for chronological social media networks, which allow the methods introduced above to be applied on these types of platforms.
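As a hedged illustration of hashtag-based weak supervision for stance inference, the sketch below propagates stance votes from seed hashtags to users and abstains when no signal is present. The seed sets, scoring rule, and example users are assumptions, not the dissertation's pipeline.

```python
# Minimal sketch of hashtag-based weak supervision for user stance inference.
from collections import Counter

# Hypothetical seed hashtags acting as weak stance labels for one topic.
SEED_HASHTAGS = {
    "support": {"#yes2reform", "#reformnow"},
    "oppose":  {"#no2reform", "#stopreform"},
}

def infer_user_stance(user_hashtags):
    """Vote over a user's hashtags; abstain when no seed hashtag is present."""
    votes = Counter()
    for tag in user_hashtags:
        for stance, seeds in SEED_HASHTAGS.items():
            if tag.lower() in seeds:
                votes[stance] += 1
    if not votes:
        return "unknown"          # user never used a seed hashtag
    return votes.most_common(1)[0][0]

# Hypothetical users: many consume content and use few topical hashtags.
users = {
    "u1": ["#reformnow", "#weekend", "#yes2reform"],
    "u2": ["#no2reform"],
    "u3": ["#food", "#travel"],   # no topical signal: weak labels abstain
}
for user, tags in users.items():
    print(user, "->", infer_user_stance(tags))
```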
Twitter Analysis to Predict the Satisfaction of Saudi Telecommunication Companies’ Customers
The flexibility in mobile communications allows customers to quickly switch from one service provider to
another, making customer churn one of the most critical challenges for the data and voice telecommunication
service industry. In 2019, the percentage of post-paid telecommunication customers in Saudi Arabia
decreased; this represents a great deal of customer dissatisfaction and subsequent corporate fiscal losses.
Many studies correlate customer satisfaction with customer churn. Telecom companies have depended
on historical customer data to measure customer churn. However, historical data does not reveal current
customer satisfaction or the future likelihood of switching between telecom companies. Current methods of
analysing churn rates are inadequate and face several issues, particularly in the Saudi market.
This research was conducted to examine the relationship between customer satisfaction and customer churn,
and how social media mining can be used to measure customer satisfaction and predict customer churn.
This research conducted a systematic review to address the problems of churn prediction models and their
relation to Arabic Sentiment Analysis (ASA). The findings show that current churn models do not integrate
structural data frameworks with real-time analytics to target customers in real time. In addition, the findings
show that the specific issues in existing churn prediction models in Saudi Arabia relate to the Arabic
language itself, its complexity, and the lack of resources.
As a result, I have constructed the first gold-standard corpus of Saudi tweets related to telecom companies,
comprising 20,000 manually annotated tweets, together with a dialect sentiment lexicon extracted from a
larger Twitter dataset that I collected to capture text characteristics in social media. I developed a
new ASA prediction model for telecommunication that fills the gaps identified in the ASA literature and fits
the telecommunication field. The proposed model proved its effectiveness for Arabic sentiment analysis and
churn prediction. This is the first work to use Twitter mining to predict potential customer loss (churn) in
Saudi telecom companies. Different fields, such as education, have different features, which makes applying
the proposed model to them interesting because it is based on text mining.
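As a hedged illustration of how a dialect sentiment lexicon can feed a churn signal, the sketch below scores a customer's tweets against a tiny hypothetical lexicon and flags persistently negative accounts. The lexicon entries, threshold, and example tweets are assumptions, not the constructed corpus or the proposed ASA model.

```python
# Minimal sketch of lexicon-based sentiment scoring used as a churn-risk flag.

# Hypothetical Saudi-dialect lexicon: word -> sentiment polarity.
LEXICON = {
    "ممتاز": 1,      # "excellent"
    "سريع": 1,       # "fast"
    "سيء": -1,       # "bad"
    "انقطاع": -1,    # "outage"
    "بطيء": -1,      # "slow"
}

def tweet_score(tweet: str) -> int:
    """Sum lexicon polarities of the words appearing in the tweet."""
    return sum(LEXICON.get(word, 0) for word in tweet.split())

def churn_risk(tweets, threshold=-0.2) -> bool:
    """Flag a customer when their average tweet sentiment is clearly negative."""
    if not tweets:
        return False
    avg = sum(tweet_score(t) for t in tweets) / len(tweets)
    return avg <= threshold

customer_tweets = ["الانترنت بطيء جدا", "انقطاع الخدمة مرة ثانية"]  # hypothetical
print("churn risk:", churn_risk(customer_tweets))
```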
Conversational artificial intelligence - demystifying statistical vs linguistic NLP solutions
This paper aims to demystify the hype and attention around chatbots and their association with conversational artificial intelligence. Both are slowly emerging as a real presence in our lives thanks to impressive technological developments in machine learning, deep learning, and natural language understanding solutions. However, what is under the hood, and how far and to what extent can chatbot/conversational artificial intelligence solutions work? That is our question. Natural language is the most easily understood knowledge representation for people, but certainly not the best for computers because of its inherently ambiguous, complex, and dynamic nature. We critique the knowledge representation of heavily statistical chatbot solutions against linguistic alternatives. In order to react intelligently to the user, natural language solutions must critically consider other factors such as context, memory, intelligent understanding, previous experience, and personalized knowledge of the user. We delve into the spectrum of conversational interfaces and focus on a strong artificial intelligence concept. This is explored via a text-based conversational software agent with a deep strategic role: to hold a conversation and provide the mechanisms needed to plan, decide what to do next, and manage the dialogue to achieve a goal. To demonstrate this, a deep linguistically aware and knowledge-aware text-based conversational agent (LING-CSA) presents a proof of concept of a non-statistical conversational AI solution.
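As a hedged sketch of the goal-driven dialogue management the paper argues for, the following loop keeps a goal as unfilled slots, plans the next question, and manages the exchange until the goal is met. The slots, prompts, and scripted replies are hypothetical and do not reflect LING-CSA's actual design.

```python
# Minimal sketch of a goal-driven dialogue manager loop.

GOAL_SLOTS = {"destination": None, "travel_date": None}
PROMPTS = {
    "destination": "Where would you like to travel?",
    "travel_date": "When do you want to leave?",
}

def next_action(slots):
    """Plan the next move: ask for the first missing slot, or finish."""
    for name, value in slots.items():
        if value is None:
            return ("ask", name)
    return ("finish", None)

def run_dialogue(scripted_user_replies):
    slots = dict(GOAL_SLOTS)
    replies = iter(scripted_user_replies)
    while True:
        action, slot = next_action(slots)
        if action == "finish":
            return f"Booking a trip to {slots['destination']} on {slots['travel_date']}."
        print("AGENT:", PROMPTS[slot])
        answer = next(replies)           # stand-in for real language understanding
        print("USER: ", answer)
        slots[slot] = answer

print(run_dialogue(["Riyadh", "next Friday"]))
```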
Addressing the new generation of spam (Spam 2.0) through Web usage models
New Internet collaborative media introduce new ways of communicating that are not immune to abuse. A fake eye-catching profile on a social networking website, a promotional review, a response to a thread in an online forum with unsolicited content, or a manipulated Wiki page are examples of the new generation of spam on the web, referred to as Web 2.0 Spam or Spam 2.0. Spam 2.0 is defined as the propagation of unsolicited, anonymous, mass content to infiltrate legitimate Web 2.0 applications.
The current literature does not address Spam 2.0 in depth, and the outcomes of efforts to date are inadequate. The aim of this research is to formalise a definition for Spam 2.0 and provide Spam 2.0 filtering solutions. Early detection, extendibility, robustness, and adaptability are key factors in the design of the proposed method.
This dissertation provides a comprehensive survey of state-of-the-art web spam and Spam 2.0 filtering methods to highlight the unresolved issues and open problems, while at the same time effectively capturing the knowledge in the domain of spam filtering.
This dissertation proposes three solutions in the area of Spam 2.0 filtering: (1) characterising and profiling Spam 2.0, (2) an Early-Detection based Spam 2.0 Filtering (EDSF) approach, and (3) an On-the-Fly Spam 2.0 Filtering (OFSF) approach. All the proposed solutions are tested against real-world datasets and their performance is compared with that of existing Spam 2.0 filtering methods.
This work has coined the term ‘Spam 2.0’, provided insight into the nature of Spam 2.0, and proposed filtering mechanisms to address this new and rapidly evolving problem.
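As a hedged sketch of the web-usage-model idea (not the EDSF or OFSF methods themselves), the following classifies submissions from navigation behaviour rather than content. The feature set, training data, and random-forest choice are illustrative assumptions.

```python
# Minimal sketch of a web-usage-model spam filter: classify submissions from
# navigation behaviour rather than the submitted text.
from sklearn.ensemble import RandomForestClassifier

# Features per session: [pages viewed before posting, seconds from first
# request to form submission, number of form submissions in the session].
X_train = [
    [6, 95.0, 1],   # human-like browsing, then one post
    [8, 140.0, 1],
    [1, 2.0, 12],   # lands directly on the form and floods it
    [0, 1.5, 30],
]
y_train = [0, 0, 1, 1]  # 0 = legitimate, 1 = Spam 2.0 behaviour

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

new_session = [[2, 3.0, 9]]  # hypothetical incoming session
print("spam" if clf.predict(new_session)[0] == 1 else "legitimate")
```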
Social network support for data delivery infrastructures
Network infrastructures often need to stage content so that it is accessible to consumers. The standard solution, deploying the content on a centralised server, can be inadequate in several situations.
Our thesis is that information encoded in social networks can be used to tailor content staging decisions to the user base and thereby build better data delivery infrastructures. This claim is supported by two case studies, which apply social information in challenging situations where traditional content staging is infeasible. Our approach works by examining empirical traces to identify relevant social properties, and then exploits them.
The first study looks at cost-effectively serving the "Long Tail" of rich-media user-generated content, which needs to be staged close to viewers to control latency and jitter. Our traces show that a preference for the unpopular tail items often spreads virally and is localised to some part of the social network. Exploiting this, we propose Buzztraq, which decreases replication costs by selectively copying items to locations favoured by viral spread. We also design SpinThrift, which separates popular and unpopular content based on the relative proportion of viral accesses, and opportunistically spins down disks containing unpopular content, thereby saving energy.
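A minimal sketch of the popularity-separation step just described, assuming a hypothetical access log annotated with referral types; the threshold and disk-layout decision are illustrative, not the actual SpinThrift implementation.

```python
# Separate content by the proportion of accesses arriving via social (viral)
# referrals, grouping mostly-unpopular items so their disks can be spun down.

def viral_fraction(accesses):
    """accesses: list of ('social' | 'search' | 'direct') referral types."""
    if not accesses:
        return 0.0
    return sum(1 for a in accesses if a == "social") / len(accesses)

def partition_content(access_log, viral_threshold=0.6):
    """Items reached mostly through viral spread go to the 'unpopular' group."""
    popular, unpopular = [], []
    for item, accesses in access_log.items():
        (unpopular if viral_fraction(accesses) >= viral_threshold
         else popular).append(item)
    return popular, unpopular

# Hypothetical access log: tail items are mostly reached via social links.
access_log = {
    "hit_video":  ["search"] * 80 + ["social"] * 20,
    "tail_video": ["social"] * 9 + ["direct"] * 1,
}
popular, unpopular = partition_content(access_log)
print("keep spinning:", popular)                  # broadly accessed items
print("candidates for spin-down disks:", unpopular)
```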
The second study examines whether human face-to-face contacts can efficiently create paths over time between arbitrary users. Here, content is staged by spreading it through intermediate users until the destination is reached. Flooding every node minimises delivery times but is not scalable. We show that the human contact network is resilient to individual path failures and that, for unicast, randomly sampling a handful of the paths found by flooding can efficiently approximate flooding's delivery time distribution. Multicast by contained flooding within a community is also efficient. However, connectivity relies on rare contacts, and frequent contacts are often not useful for data delivery.
Also, periods of similar duration could achieve different levels of connectivity; we devise a test to identify good periods. We finish by discussing how these properties influence routing algorithms.
This work was supported by a St. John's College Benefactor's Scholarship and a Research Studentship from the Cambridge Philosophical Society.
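To make the flooding baseline concrete, here is a minimal sketch that computes the earliest delivery time over a timestamped contact trace; the trace, node names, and function are hypothetical, and the path-sampling approximation discussed above is not shown.

```python
# Earliest delivery time under flooding over a timestamped human-contact trace.

def flood_delivery_time(contacts, source, destination, start_time=0):
    """contacts: list of (time, node_a, node_b), processed in time order."""
    informed = {source: start_time}
    for t, a, b in sorted(contacts):
        if t < start_time:
            continue
        # A contact lets either party hand the content to the other.
        if a in informed and informed[a] <= t and b not in informed:
            informed[b] = t
        elif b in informed and informed[b] <= t and a not in informed:
            informed[a] = t
        if destination in informed:
            return informed[destination] - start_time
    return None  # never delivered within the trace

# Hypothetical face-to-face contact trace: (time, person, person).
trace = [(1, "A", "B"), (3, "B", "C"), (5, "C", "D"), (6, "A", "D")]
print(flood_delivery_time(trace, "A", "D"))  # -> 5 (A->B at 1, B->C at 3, C->D at 5)
```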