43,463 research outputs found
Behavior Profiling of Email
This paper describes the forensic and intelligence analysis capabilities of the Email Mining Toolkit (EMT) under development at the Columbia Intrusion Detection (IDS) Lab. EMT provides the means of loading, parsing and analyzing email logs, including content, in a wide range of formats. Many tools and techniques have been available from the fields of Information Retrieval (IR) and Natural Language Processing (NLP) for analyzing documents of various sorts, including emails. EMT, however, extends these kinds of analyses with an entirely new set of analyses that model "user behavior." EMT thus models the behavior of individual user email accounts, or groups of accounts, including the "social cliques" revealed by a user's email behavior
Profiling Users by Modeling Web Transactions
Users of electronic devices, e.g., laptop, smartphone, etc. have
characteristic behaviors while surfing the Web. Profiling this behavior can
help identify the person using a given device. In this paper, we introduce a
technique to profile users based on their web transactions. We compute several
features extracted from a sequence of web transactions and use them with
one-class classification techniques to profile a user. We assess the efficacy
and speed of our method at differentiating 25 users on a dataset representing 6
months of web traffic monitoring from a small company network.Comment: Extended technical report of an IEEE ICDCS 2017 publicatio
Beyond Classification: Latent User Interests Profiling from Visual Contents Analysis
User preference profiling is an important task in modern online social
networks (OSN). With the proliferation of image-centric social platforms, such
as Pinterest, visual contents have become one of the most informative data
streams for understanding user preferences. Traditional approaches usually
treat visual content analysis as a general classification problem where one or
more labels are assigned to each image. Although such an approach simplifies
the process of image analysis, it misses the rich context and visual cues that
play an important role in people's perception of images. In this paper, we
explore the possibilities of learning a user's latent visual preferences
directly from image contents. We propose a distance metric learning method
based on Deep Convolutional Neural Networks (CNN) to directly extract
similarity information from visual contents and use the derived distance metric
to mine individual users' fine-grained visual preferences. Through our
preliminary experiments using data from 5,790 Pinterest users, we show that
even for the images within the same category, each user possesses distinct and
individually-identifiable visual preferences that are consistent over their
lifetime. Our results underscore the untapped potential of finer-grained visual
preference profiling in understanding users' preferences.Comment: 2015 IEEE 15th International Conference on Data Mining Workshop
Virus Propagation in Multiple Profile Networks
Suppose we have a virus or one competing idea/product that propagates over a
multiple profile (e.g., social) network. Can we predict what proportion of the
network will actually get "infected" (e.g., spread the idea or buy the
competing product), when the nodes of the network appear to have different
sensitivity based on their profile? For example, if there are two profiles
and in a network and the nodes of profile
and profile are susceptible to a highly spreading
virus with probabilities and
respectively, what percentage of both profiles will actually get infected from
the virus at the end? To reverse the question, what are the necessary
conditions so that a predefined percentage of the network is infected? We
assume that nodes of different profiles can infect one another and we prove
that under realistic conditions, apart from the weak profile (great
sensitivity), the stronger profile (low sensitivity) will get infected as well.
First, we focus on cliques with the goal to provide exact theoretical results
as well as to get some intuition as to how a virus affects such a multiple
profile network. Then, we move to the theoretical analysis of arbitrary
networks. We provide bounds on certain properties of the network based on the
probabilities of infection of each node in it when it reaches the steady state.
Finally, we provide extensive experimental results that verify our theoretical
results and at the same time provide more insight on the problem
Digital Food Marketing to Children and Adolescents: Problematic Practices and Policy Interventions
Examines trends in digital marketing to youth that uses "immersive" techniques, social media, behavioral profiling, location targeting and mobile marketing, and neuroscience methods. Recommends principles for regulating inappropriate advertising to youth
"Are you accepting new patients?" A pilot field experiment on telephone-based gatekeeping and Black patients' access to pediatric care.
STUDY OBJECTIVES:
To determine whether name and accent cues that the caller is Black shape physician offices' responses to telephone-based requests for well-child visits.
METHOD AND DATA:
In this pilot study, we employed a quasi-experimental audit design and examined a stratified national sample of pediatric and family practice offices. Our final data include information from 205 audits (410 completed phone calls). Qualitative data were blind-coded into binary variables. Our case-control comparisons using McNemar's tests focused on acceptance of patients, withholding information, shaping conversations, and misattributions.
FINDINGS:
Compared to the control group, "Black" auditors were less likely to be told an office was accepting new patients and were more likely to experience both withholding behaviors and misattributions about public insurance. The strength of associations varied according to whether the cue was based on name or accent. Additionally, the likelihood and ways office personnel communicated that they were not accepting patients varied by region.
CONCLUSIONS:
Linguistic profiling over the telephone is an aspect of structural racism that should be further studied and perhaps integrated into efforts to promote equitable access to care. Future research should look reactions to both name and accent, taking practice characteristics and regional differences into consideration
Profiling user activities with minimal traffic traces
Understanding user behavior is essential to personalize and enrich a user's
online experience. While there are significant benefits to be accrued from the
pursuit of personalized services based on a fine-grained behavioral analysis,
care must be taken to address user privacy concerns. In this paper, we consider
the use of web traces with truncated URLs - each URL is trimmed to only contain
the web domain - for this purpose. While such truncation removes the
fine-grained sensitive information, it also strips the data of many features
that are crucial to the profiling of user activity. We show how to overcome the
severe handicap of lack of crucial features for the purpose of filtering out
the URLs representing a user activity from the noisy network traffic trace
(including advertisement, spam, analytics, webscripts) with high accuracy. This
activity profiling with truncated URLs enables the network operators to provide
personalized services while mitigating privacy concerns by storing and sharing
only truncated traffic traces.
In order to offset the accuracy loss due to truncation, our statistical
methodology leverages specialized features extracted from a group of
consecutive URLs that represent a micro user action like web click, chat reply,
etc., which we call bursts. These bursts, in turn, are detected by a novel
algorithm which is based on our observed characteristics of the inter-arrival
time of HTTP records. We present an extensive experimental evaluation on a real
dataset of mobile web traces, consisting of more than 130 million records,
representing the browsing activities of 10,000 users over a period of 30 days.
Our results show that the proposed methodology achieves around 90% accuracy in
segregating URLs representing user activities from non-representative URLs
- …