
    Behavior Profiling of Email

    This paper describes the forensic and intelligence analysis capabilities of the Email Mining Toolkit (EMT) under development at the Columbia Intrusion Detection (IDS) Lab. EMT provides the means of loading, parsing and analyzing email logs, including content, in a wide range of formats. Many tools and techniques are available from the fields of Information Retrieval (IR) and Natural Language Processing (NLP) for analyzing documents of various sorts, including emails. EMT, however, extends these kinds of analyses with an entirely new set of analyses that model "user behavior." EMT thus models the behavior of individual user email accounts, or groups of accounts, including the "social cliques" revealed by a user's email behavior.
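
    The abstract does not describe how EMT finds social cliques. As a sketch of the general idea only, assuming each of a user's outgoing emails is given as a list of recipient addresses (a hypothetical input format), one can link recipients who are repeatedly addressed together and then list the maximal cliques of that graph with the Bron-Kerbosch algorithm. `min_count` is an illustrative threshold, not an EMT parameter.

```python
from collections import Counter
from itertools import combinations


def co_recipient_graph(emails, min_count=2):
    """Link recipients addressed together in at least `min_count` emails.

    `emails` is a hypothetical format: one list of recipient addresses
    per outgoing email of a single user.
    """
    pair_counts = Counter()
    for recipients in emails:
        for a, b in combinations(sorted(set(recipients)), 2):
            pair_counts[(a, b)] += 1
    graph = {}
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph):
    """Bron-Kerbosch with pivoting; yields each maximal clique as a frozenset."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(graph), set())
```

    For example, if a user twice writes to alice, bob and carol together and twice to dave and erin, the sketch reports those two groups as cliques, while a one-off alice/dave email falls below the threshold.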

    Profiling Users by Modeling Web Transactions

    Users of electronic devices (e.g., laptops and smartphones) exhibit characteristic behaviors while surfing the Web. Profiling this behavior can help identify the person using a given device. In this paper, we introduce a technique to profile users based on their web transactions. We compute several features from a sequence of web transactions and use them with one-class classification techniques to profile a user. We assess the efficacy and speed of our method at differentiating 25 users on a dataset representing 6 months of web traffic monitoring from a small company network. Comment: Extended technical report of an IEEE ICDCS 2017 publication.
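
    The abstract names neither the features nor the one-class classifier. The sketch below is only an illustration under stated assumptions: each transaction is a hypothetical (timestamp in seconds, domain, response bytes) tuple, the three window features are invented for the example, and the classifier is a simple nearest-neighbour one-class rule standing in for whatever technique the authors used. It is trained on one user's windows only and accepts a new window if it lies about as close to the training data as the training points lie to each other.

```python
import math
from statistics import mean, pstdev


def window_features(transactions):
    """Hypothetical features for one window of (timestamp_s, domain, n_bytes)."""
    times = [t for t, _, _ in transactions]
    gaps = [b - a for a, b in zip(times, times[1:])] or [0.0]
    domains = [d for _, d, _ in transactions]
    return [
        mean(gaps),                                        # mean inter-request gap (s)
        len(set(domains)) / len(domains),                  # domain diversity
        mean(math.log1p(n) for _, _, n in transactions),   # typical size (log bytes)
    ]


class NearestNeighborOneClass:
    """Accept a sample if its distance to the nearest training sample is within
    the `quantile` of the leave-one-out nearest-neighbour distances."""

    def __init__(self, quantile=0.95):
        self.quantile = quantile

    def fit(self, X):
        cols = list(zip(*X))
        self.mu = [mean(c) for c in cols]
        self.sd = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
        self.train = [self._scale(x) for x in X]
        loo = sorted(
            min(math.dist(a, b) for j, b in enumerate(self.train) if j != i)
            for i, a in enumerate(self.train)
        )
        self.threshold = loo[min(len(loo) - 1, int(self.quantile * len(loo)))]
        return self

    def _scale(self, x):
        # Standardize with the training mean and std so features are comparable.
        return [(v - m) / s for v, m, s in zip(x, self.mu, self.sd)]

    def predict(self, x):
        """Return True if x looks like the profiled user."""
        z = self._scale(x)
        return min(math.dist(z, t) for t in self.train) <= self.threshold
```

    One model would be fitted per user; identifying who is behind a device then means asking which users' models accept the current window.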

    Beyond Classification: Latent User Interests Profiling from Visual Contents Analysis

    User preference profiling is an important task in modern online social networks (OSN). With the proliferation of image-centric social platforms, such as Pinterest, visual content has become one of the most informative data streams for understanding user preferences. Traditional approaches usually treat visual content analysis as a general classification problem where one or more labels are assigned to each image. Although such an approach simplifies the process of image analysis, it misses the rich context and visual cues that play an important role in people's perception of images. In this paper, we explore the possibilities of learning a user's latent visual preferences directly from image content. We propose a distance metric learning method based on Deep Convolutional Neural Networks (CNN) to directly extract similarity information from visual content and use the derived distance metric to mine individual users' fine-grained visual preferences. Through our preliminary experiments using data from 5,790 Pinterest users, we show that even for images within the same category, each user possesses distinct and individually identifiable visual preferences that are consistent over their lifetime. Our results underscore the untapped potential of finer-grained visual preference profiling in understanding users' preferences. Comment: 2015 IEEE 15th International Conference on Data Mining Workshop
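
    The abstract does not name the training objective for the CNN. Triplet margin loss is one common choice for distance metric learning and is shown here purely as an illustration: the embedding vectors are assumed to come from a CNN that is not shown, and `preference_score` is a hypothetical way of using the learned distance to rank candidate images for one user.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that is zero once d(anchor, positive) is at least `margin`
    smaller than d(anchor, negative); training pushes embeddings toward that."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)


def preference_score(candidate, user_images):
    """Lower is better: mean learned distance from a candidate image embedding
    to the embeddings of images the user has already saved."""
    return sum(math.dist(candidate, e) for e in user_images) / len(user_images)
```

    With embeddings from a trained network, ranking candidates by `preference_score` would separate images a user favours from others in the same category, which is the kind of within-category distinction the abstract reports.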

    Virus Propagation in Multiple Profile Networks

    Suppose we have a virus or one competing idea/product that propagates over a multiple profile (e.g., social) network. Can we predict what proportion of the network will actually get "infected" (e.g., spread the idea or buy the competing product), when the nodes of the network appear to have different sensitivity based on their profile? For example, if there are two profiles $\mathcal{A}$ and $\mathcal{B}$ in a network and the nodes of profile $\mathcal{A}$ and profile $\mathcal{B}$ are susceptible to a highly spreading virus with probabilities $\beta_{\mathcal{A}}$ and $\beta_{\mathcal{B}}$ respectively, what percentage of both profiles will actually get infected from the virus at the end? To reverse the question, what are the necessary conditions so that a predefined percentage of the network is infected? We assume that nodes of different profiles can infect one another, and we prove that under realistic conditions, apart from the weak profile (high sensitivity), the stronger profile (low sensitivity) will get infected as well. First, we focus on cliques with the goal of providing exact theoretical results as well as gaining some intuition as to how a virus affects such a multiple profile network. Then, we move to the theoretical analysis of arbitrary networks. We provide bounds on certain properties of the network based on the probabilities of infection of each node in it when it reaches the steady state. Finally, we provide extensive experimental results that verify our theoretical results and at the same time provide more insight into the problem.
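
    The exact clique results are not reproduced in the abstract. To build intuition for the claim that the low-sensitivity profile also gets infected, here is a mean-field approximation for a clique with two profiles. The SIS-style recovery probability `delta`, the independent per-contact infection rule, and all parameter values are assumptions made for this sketch, not the paper's model.

```python
def mean_field_clique(n_a, n_b, beta_a, beta_b, delta, x0=0.1, steps=1000):
    """Iterate a discrete-time mean-field SIS-style model on a clique with two
    profiles (an illustrative assumption, not the paper's exact model).

    x_a, x_b: infected fractions of profiles A and B.
    A susceptible node of profile p avoids infection from each of the I
    infected nodes independently with probability 1 - beta_p; an infected
    node recovers with probability delta per step. Both profiles share the
    same pool of infected nodes, so profiles infect one another.
    """
    x_a = x_b = x0
    for _ in range(steps):
        infected = n_a * x_a + n_b * x_b          # expected infected count I
        q_a = 1 - (1 - beta_a) ** infected        # infection prob., profile A
        q_b = 1 - (1 - beta_b) ** infected        # infection prob., profile B
        x_a = (1 - delta) * x_a + (1 - x_a) * q_a
        x_b = (1 - delta) * x_b + (1 - x_b) * q_b
    return x_a, x_b
```

    With 50 nodes per profile, $\beta_{\mathcal{A}} = 0.1$, $\beta_{\mathcal{B}} = 0.02$ and `delta=0.2`, the iteration settles with profile $\mathcal{B}$ infected almost as widely as $\mathcal{A}$, because the many infected $\mathcal{A}$ nodes keep exposing $\mathcal{B}$ nodes. Setting both $\beta$ values to zero lets the infection die out.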

    Digital Food Marketing to Children and Adolescents: Problematic Practices and Policy Interventions

    Examines trends in digital marketing to youth that uses "immersive" techniques, social media, behavioral profiling, location targeting and mobile marketing, and neuroscience methods. Recommends principles for regulating inappropriate advertising to youth.

    "Are you accepting new patients?" A pilot field experiment on telephone-based gatekeeping and Black patients' access to pediatric care.

    STUDY OBJECTIVES: To determine whether name and accent cues that the caller is Black shape physician offices' responses to telephone-based requests for well-child visits. METHOD AND DATA: In this pilot study, we employed a quasi-experimental audit design and examined a stratified national sample of pediatric and family practice offices. Our final data include information from 205 audits (410 completed phone calls). Qualitative data were blind-coded into binary variables. Our case-control comparisons using McNemar's tests focused on acceptance of patients, withholding information, shaping conversations, and misattributions. FINDINGS: Compared to the control group, "Black" auditors were less likely to be told an office was accepting new patients and were more likely to experience both withholding behaviors and misattributions about public insurance. The strength of associations varied according to whether the cue was based on name or accent. Additionally, the likelihood and ways office personnel communicated that they were not accepting patients varied by region. CONCLUSIONS: Linguistic profiling over the telephone is an aspect of structural racism that should be further studied and perhaps integrated into efforts to promote equitable access to care. Future research should look at reactions to both name and accent, taking practice characteristics and regional differences into consideration.
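
    Each audit pairs a control call with a call carrying the cue, so McNemar's test uses only the discordant audits: those where one call got a "yes" and the other did not. The sketch shows the exact binomial form and the continuity-corrected chi-square form; the abstract does not say which variant the authors used, and the counts in the usage note are made up for illustration, not the study's data.

```python
import math


def discordant_counts(pairs):
    """pairs: (control_outcome, cue_outcome) booleans, one pair per audit."""
    b = sum(1 for ctrl, cue in pairs if ctrl and not cue)   # only control said yes
    c = sum(1 for ctrl, cue in pairs if cue and not ctrl)   # only cue call said yes
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value: binomial test of b vs c with p = 0.5."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_chi2(b, c):
    """Continuity-corrected chi-square statistic (1 df) and its p-value."""
    if b + c == 0:
        return 0.0, 1.0
    stat = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # For 1 degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

    With hypothetical counts b = 10 and c = 2, `mcnemar_exact(10, 2)` gives about 0.039 and the chi-square form gives a p-value of about 0.043. Audits where both calls got the same answer do not affect either result.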

    Profiling user activities with minimal traffic traces

    Understanding user behavior is essential to personalize and enrich a user's online experience. While there are significant benefits to be accrued from the pursuit of personalized services based on a fine-grained behavioral analysis, care must be taken to address user privacy concerns. In this paper, we consider the use of web traces with truncated URLs (each URL is trimmed to contain only the web domain) for this purpose. While such truncation removes the fine-grained sensitive information, it also strips the data of many features that are crucial to the profiling of user activity. We show how to overcome this severe handicap and, with high accuracy, filter out the URLs representing a user activity from the noisy network traffic trace (which includes advertisements, spam, analytics, and web scripts). This activity profiling with truncated URLs enables network operators to provide personalized services while mitigating privacy concerns by storing and sharing only truncated traffic traces. To offset the accuracy loss due to truncation, our statistical methodology leverages specialized features extracted from a group of consecutive URLs that represent a micro user action like a web click or chat reply, which we call a burst. These bursts, in turn, are detected by a novel algorithm based on our observed characteristics of the inter-arrival time of HTTP records. We present an extensive experimental evaluation on a real dataset of mobile web traces, consisting of more than 130 million records and representing the browsing activities of 10,000 users over a period of 30 days. Our results show that the proposed methodology achieves around 90% accuracy in segregating URLs representing user activities from non-representative URLs.
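
    The paper's burst detector is a novel algorithm built on observed inter-arrival statistics and is not reproduced here. The sketch substitutes a much simpler rule, assuming records arrive as (timestamp in seconds, URL) pairs. It truncates each URL to its domain, then starts a new burst whenever the gap since the previous record exceeds a fixed `max_gap`, a hypothetical parameter chosen for illustration.

```python
from urllib.parse import urlsplit


def truncate(url):
    """Keep only the web domain, mimicking the truncated traces."""
    return urlsplit(url).hostname or ""


def split_bursts(records, max_gap=1.0):
    """Group (timestamp_s, url) records into bursts of truncated URLs.

    Simplified fixed-threshold rule (not the paper's algorithm): a new burst
    starts whenever the gap to the previous record exceeds `max_gap` seconds.
    """
    bursts = []
    last_t = None
    for t, url in sorted(records):
        if last_t is None or t - last_t > max_gap:
            bursts.append([])
        bursts[-1].append((t, truncate(url)))
        last_t = t
    return bursts
```

    Per-burst features such as the number of records or the set of domains could then feed a classifier that separates user-action URLs from background traffic. The fixed threshold is the main simplification; the paper instead derives burst boundaries from the measured inter-arrival distribution.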