Text Classification: Exploiting the Social Network
Within the context of social networks, existing methods for document classification tasks typically only capture textual semantics while ignoring the text’s metadata, e.g., the users who exchange emails and the communication networks they form. However, some work has shown that incorporating the social network information in addition to information from language is useful for various NLP applications, including sentiment analysis, inferring user attributes, and predicting interpersonal relations.
In this thesis, we present empirical studies of incorporating social network information from the underlying communication graphs for various text classification tasks. We show different graph representations for different problems. Also, we introduce social network features extracted from these graphs. We use and extend graph embedding models for text classification.
Our contributions are as follows. First, we have annotated large datasets of emails with fine-grained business and personal labels. Second, we propose graph representations for the social networks induced from documents and users and apply them to different text classification tasks. Third, we propose social network features extracted from these structures for documents and users. Fourth, we exploit different methods for modeling the social network of communication for four tasks: email classification into business and personal, overt display of power detection in emails, hierarchical power detection in emails, and Reddit post classification.
Our main findings are: incorporating the social network information using our proposed methods improves the classification performance for all four tasks, and we beat the state-of-the-art graph embedding based model on the three email tasks; additionally, for the fourth task (Reddit post classification), we argue that simple methods with the proper representation for the task can outperform a state-of-the-art generic model.
Identifying Graphs from Noisy Observational Data
There is a growing amount of data describing networks -- examples include social networks, communication networks, and biological networks. As the amount of available data increases, so does our interest in analyzing the properties and characteristics of these networks. However, in most cases the data is noisy, incomplete, and the result of passively acquired observational data; naively analyzing these networks without taking these errors into account can result in inaccurate and misleading conclusions. In my dissertation, I study the tasks of entity resolution, link prediction, and collective classification to address these deficiencies. I describe these tasks in detail and discuss my own work on each of these tasks. For entity resolution, I develop a method for resolving the identities of name mentions in email communications. For link prediction, I develop a method for inferring subordinate-manager relationships between individuals in an email communication network. For collective classification, I propose an adaptive active surveying method to address node labeling in a query-driven setting on network data. In many real-world settings, however, these deficiencies are not found in isolation and all need to be addressed to infer the desired complete and accurate network. Furthermore, because of the dependencies typically found in these tasks, the tasks are inherently inter-related and must be performed jointly. I define the general problem of graph identification which simultaneously performs these tasks; removing the noise and missing values in the observed input network and inferring the complete and accurate output network. I present a novel approach to graph identification using a collection of Coupled Collective Classifiers, C3, which, in addition to capturing the variety of features typically used for each task, can capture the intra- and inter-dependencies required to correctly infer nodes, edges, and labels in the output network. 
I discuss variants of C3 using different learning and inference paradigms and show the superior performance of C3, in terms of both prediction quality and runtime performance, over various previous approaches. I then conclude by presenting the Graph Alignment, Identification, and Analysis (GAIA) open-source software library, which provides not only an implementation of C3 but also algorithms for various tasks in network data such as entity resolution, link prediction, collective classification, clustering, active learning, data generation, and analysis.
Understanding implicit social context in electronic communication
Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2002. Includes bibliographical references (p. 71-72).
Artificial Intelligence (AI) has shown competence in helping people with complex cognitive decisions like air traffic control and playing chess. The goal of this work is to demonstrate that AI can help people with social decisions. In this work Artificial Intelligence of Social Networks is used to improve human-human communication, recognizing the social characteristics of human relations in order to achieve a more natural online communication interface. Can a computer learn to understand the value of communication? It is shown here that a first attempt at social context classification performs with almost 70% reliability. Could a computer use this to help a person relate to other people through technology? The addition of social context to an email interface is shown to have a positive effect on a user's online communication behavior. Email is a tool that people use practically every day, making an implicit statement about their relationships with other people, and providing an opportunity for a computer to learn about their social network. Furthermore, over the years people have come to utilize and depend on email more in their daily lives, but the tool has hardly changed to help people deal with the overwhelming amount of information. Many of the social cues that allow people to naturally function with their social network are not inherent or obvious in Computer Mediated Communication (CMC). This work offers automatic social network analysis as a means to bring these cues to CMC and to foster the user's coherent understanding of the people and resources of their communication network.
by Andrea Lyn Lockerd. S.M.
Analyzing Social and Stylometric Features to Identify Spear phishing Emails
Spear phishing is a complex targeted attack in which an attacker harvests
information about the victim prior to the attack. This information is then used
to create sophisticated, genuine-looking attack vectors, drawing the victim to
compromise confidential information. What makes spear phishing different, and
more powerful than normal phishing, is this contextual information about the
victim. Online social media services can be one such source for gathering vital
information about an individual. In this paper, we characterize and examine a
true positive dataset of spear phishing, spam, and normal phishing emails from
Symantec's enterprise email scanning service. We then present a model to detect
spear phishing emails sent to employees of 14 international organizations, by
using social features extracted from LinkedIn. Our dataset consists of 4,742
targeted attack emails sent to 2,434 victims, and 9,353 non-targeted attack
emails sent to 5,912 non-victims; and publicly available information from their
LinkedIn profiles. We applied various machine learning algorithms to this
labeled data, and achieved an overall maximum accuracy of 97.76% in identifying
spear phishing emails. We used a combination of social features from LinkedIn
profiles, and stylometric features extracted from email subjects, bodies, and
attachments. However, we achieved a slightly better accuracy of 98.28% without
the social features. Our analysis revealed that social features extracted from
LinkedIn do not help in identifying spear phishing emails. To the best of our
knowledge, this is one of the first attempts to make use of a combination of
stylometric features extracted from emails, and social features extracted from
an online social network to detect targeted spear phishing emails.
Comment: Detection of spear phishing using social media feature