Characterizing Search Behavior in Productivity Software
Complex software applications expose hundreds of commands to users through intricate menu hierarchies. One of the most popular productivity software suites, Microsoft Office, has recently developed functionality that allows users to issue free-form text queries to a search system to quickly find commands they want to execute, retrieve help documentation or access web results in a unified interface. In this paper, we analyze millions of search sessions originating from within Microsoft Office applications, collected over one month of activity, in an effort to characterize search behavior in productivity software. Our research brings together previous efforts in analyzing command usage in large-scale applications and efforts in understanding search behavior in environments other than the web. Our findings show that users engage primarily in command search, and that re-accessing commands through search is a frequent behavior. Our work represents the first large-scale analysis of search over command spaces and is an important first step in understanding how search systems integrated with productivity software can be successfully developed.
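The two headline findings (command search dominates, and users often re-access commands through search) can be expressed as simple log metrics. The sketch below assumes a hypothetical log schema (`SearchEvent` with `user_id`, `timestamp`, `result_type`, `target`). The abstract does not describe such a schema. The code computes the share of each result type and the fraction of command executions that repeat a command the same user already reached through search.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class SearchEvent(NamedTuple):
    """Hypothetical log record for one search result the user acted on."""
    user_id: str
    timestamp: int    # any monotonically increasing clock value
    result_type: str  # assumed labels, e.g. "command", "help", "web"
    target: str       # command id, help article id, or URL


def result_type_shares(events: Iterable[SearchEvent]) -> Dict[str, float]:
    """Fraction of all events that fall into each result type."""
    counts = Counter(ev.result_type for ev in events)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}


def command_reaccess_rate(events: Iterable[SearchEvent]) -> float:
    """Fraction of command executions via search that repeat a command the
    same user had already reached through search earlier in the log."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    total = repeats = 0
    for ev in sorted(events, key=lambda e: (e.user_id, e.timestamp)):
        if ev.result_type != "command":
            continue
        total += 1
        if ev.target in seen[ev.user_id]:
            repeats += 1
        seen[ev.user_id].add(ev.target)
    return repeats / total if total else 0.0


events: List[SearchEvent] = [
    SearchEvent("u1", 1, "command", "InsertTable"),
    SearchEvent("u1", 5, "help", "doc42"),
    SearchEvent("u1", 9, "command", "InsertTable"),
    SearchEvent("u2", 3, "command", "Bold"),
]
print(result_type_shares(events))     # {'command': 0.75, 'help': 0.25}
print(command_reaccess_rate(events))  # 0.3333333333333333
```

Sorting by user and time means a command only counts as re-accessed after that user has found it once through search.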
Determining WWW User's Next Access and Its Application to Pre-fetching
World-Wide Web (WWW) services have grown to a point where significant delays are to be expected. Techniques such as pre-fetching are likely to help by adapting to individual users' needs and reducing their waiting times. However, pre-fetching is only effective if the right documents are identified and the user's next move is correctly predicted; otherwise, it only wastes bandwidth. It is therefore useful to determine whether a revisit will occur before starting to pre-fetch.
In this paper we develop two user models that help determine the user's next move. One model uses a Random Walk approximation and the other is based on Digital Signal Processing techniques. We also give hints on how to use such models with a simple pre-fetching technique that we are developing.
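Pre-fetching only pays off when the predicted next access is likely enough. As a minimal sketch, the class below swaps in a first-order Markov transition model in place of the paper's Random Walk or Digital Signal Processing models. It pre-fetches only when the estimated probability of the next page clears a threshold. `NextAccessPredictor`, `should_prefetch`, and the 0.5 default threshold are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Sequence, Tuple


class NextAccessPredictor:
    """First-order Markov model of page-to-page transitions.

    A stand-in for illustration; it is not the Random Walk or DSP model
    described in the paper.
    """

    def __init__(self) -> None:
        # transitions[page][next_page] = how often next_page followed page
        self.transitions: Dict[str, Counter] = defaultdict(Counter)

    def train(self, session: Sequence[str]) -> None:
        """Count every consecutive (page, next page) pair in one session."""
        for cur, nxt in zip(session, session[1:]):
            self.transitions[cur][nxt] += 1

    def predict(self, page: str) -> Optional[Tuple[str, float]]:
        """Most frequent successor of `page` and its empirical probability."""
        counts = self.transitions.get(page)  # .get() avoids inserting keys
        if not counts:
            return None
        nxt, n = counts.most_common(1)[0]
        return nxt, n / sum(counts.values())

    def should_prefetch(self, page: str, threshold: float = 0.5) -> Optional[str]:
        """Page worth pre-fetching, or None when the guess is too uncertain
        and fetching it would probably waste bandwidth."""
        guess = self.predict(page)
        if guess is not None and guess[1] >= threshold:
            return guess[0]
        return None


predictor = NextAccessPredictor()
for session in (["home", "news", "sports"], ["home", "news", "weather"], ["home", "mail"]):
    predictor.train(session)
print(predictor.should_prefetch("home"))                 # news (p = 2/3)
print(predictor.should_prefetch("news", threshold=0.6))  # None (p = 1/2)
```

Raising the threshold trades fewer wasted fetches for fewer hits. That is the trade-off the abstract describes.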
Emergence of scaling in random networks
Systems as diverse as genetic networks or the world wide web are best
described as networks with complex topology. A common property of many large
networks is that the vertex connectivities follow a scale-free power-law
distribution. This feature is found to be a consequence of two generic
mechanisms: networks expand continuously by the addition of new vertices,
and new vertices attach preferentially to already well-connected sites. A model
based on these two ingredients reproduces the observed stationary scale-free
distributions, indicating that the development of large networks is governed by
robust self-organizing phenomena that go beyond the particulars of the
individual systems.
Comment: 11 pages, 2 figures
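The two ingredients, continuous growth and preferential attachment, can be simulated directly. This sketch makes two assumptions. It starts from m seed vertices with no edges. It samples targets from a list in which each vertex appears once per incident edge, so a uniform draw picks a vertex in proportion to its degree. `barabasi_albert` and `degree_counts` are illustrative names, not code from the paper.

```python
import random
from collections import Counter
from typing import List, Tuple


def barabasi_albert(n: int, m: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Grow a graph to n vertices by growth plus preferential attachment.

    Assumed initial condition: m seed vertices 0..m-1 without edges; the
    first new vertex links to all of them. Every later vertex adds m edges
    to distinct existing vertices, each drawn with probability proportional
    to its current degree (approximately, since duplicates are redrawn).
    """
    if m < 1 or n <= m:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges: List[Tuple[int, int]] = []
    targets = list(range(m))
    repeated: List[int] = []  # each vertex appears once per incident edge
    for source in range(m, n):  # growth: one new vertex per step
        edges.extend((source, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([source] * m)
        # Preferential attachment: a uniform draw from `repeated` picks a
        # vertex with probability proportional to its degree.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = sorted(chosen)
    return edges


def degree_counts(edges: List[Tuple[int, int]]) -> Counter:
    """Degree of every vertex that has at least one edge."""
    deg: Counter = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg
```

For a large graph, for example `degree_counts(barabasi_albert(100_000, 3))`, a histogram of the degree values plotted on log-log axes should show a roughly straight tail, consistent with a power law.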
Computational Content Analysis of Negative Tweets for Obesity, Diet, Diabetes, and Exercise
Social-media-based digital epidemiology has the potential to support faster
response and deeper understanding of public health related threats. This study
proposes a new framework to analyze unstructured health-related textual data
from Twitter users' posts (tweets) to characterize the negative health sentiments
and non-health-related concerns in relation to the corpus of negative
sentiments regarding Diet, Diabetes, Exercise, and Obesity (DDEO). Through the
collection of 6 million tweets over one month, this study identified the
prominent topics of users as they relate to the negative sentiments. Our
proposed framework uses two text mining methods, sentiment analysis and topic
modeling, to discover negative topics. The negative sentiments of Twitter users
support the literature narratives and the many morbidity issues that are
associated with DDEO and the linkage between obesity and diabetes. The
framework offers a potential method to understand the public's opinions and
sentiments regarding DDEO. More importantly, this research provides new
opportunities for computational social scientists, medical experts, and public
health professionals to collectively address DDEO-related issues.
Comment: The 2017 Annual Meeting of the Association for Information Science
and Technology (ASIST)
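The first stage of such a framework, isolating negative tweets and grouping them by DDEO theme, might look like the sketch below. The word lists are toy assumptions, not the study's sentiment resource. The keyword grouping only prepares per-theme corpora; the topic-modeling stage the abstract describes is not shown.

```python
import re
from typing import Dict, Iterable, List, Set

# Toy word lists: assumptions for illustration, not the study's lexicon.
NEGATIVE: Set[str] = {"hate", "sick", "tired", "awful", "worst", "sad", "fail"}
POSITIVE: Set[str] = {"love", "great", "happy", "best", "good", "win"}
DDEO_KEYWORDS: Dict[str, Set[str]] = {
    "diet": {"diet", "calories", "carbs", "sugar"},
    "diabetes": {"diabetes", "insulin", "glucose"},
    "exercise": {"exercise", "gym", "workout", "running"},
    "obesity": {"obesity", "obese", "overweight"},
}


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text: str) -> int:
    """Positive minus negative lexicon hits; below zero counts as negative."""
    words = tokenize(text)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def negative_tweets_by_theme(tweets: Iterable[str]) -> Dict[str, List[str]]:
    """Keep negative tweets and bucket them under every DDEO theme they
    mention. Each bucket could then be passed to a topic model (not shown)."""
    buckets: Dict[str, List[str]] = {theme: [] for theme in DDEO_KEYWORDS}
    for tweet in tweets:
        if sentiment_score(tweet) >= 0:
            continue
        words = set(tokenize(tweet))
        for theme, keywords in DDEO_KEYWORDS.items():
            if words & keywords:
                buckets[theme].append(tweet)
    return buckets
```

A lexicon count is the simplest possible sentiment classifier. A real pipeline would use a validated sentiment tool.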
Characterizing Phishing Threats with Natural Language Processing
Spear phishing is a widespread concern in the modern network security
landscape, but there are few metrics that measure the extent to which
reconnaissance is performed on phishing targets. Spear phishing emails closely
match the expectations of the recipient, based on details of their experiences
and interests, making them a popular propagation vector for harmful malware. In
this work we use Natural Language Processing techniques to investigate a
specific real-world phishing campaign and quantify attributes that indicate a
targeted spear phishing attack. Our phishing campaign data sample comprises 596
emails, all containing a web bug and a Curriculum Vitae (CV) PDF attachment,
sent to our institution from a foreign IP space. The campaign was found to
exclusively target specific demographics within our institution. Performing a
semantic similarity analysis between the senders' CV attachments and the
recipients' LinkedIn profiles, we conclude with high statistical certainty (p
) that the attachments contain targeted rather than randomly
selected material. Latent Semantic Analysis further demonstrates that
individuals who were a primary focus of the campaign received CVs that are
highly topically clustered. These findings differentiate this campaign from one
that leverages random spam.
Comment: This paper has been accepted for publication by the IEEE Conference
on Communications and Network Security in September 2015 at Florence, Italy.
Copyright may be transferred without notice, after which this version may no
longer be accessible
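A semantic-similarity test of this kind can be illustrated with three pieces:

- TF-IDF vectors for each document.
- Cosine similarity between each CV and its recipient's profile.
- A permutation test that asks how often a random assignment of CVs to recipients scores as high as the observed one.

This is an assumed stand-in: the abstract does not specify the paper's similarity measure or statistical test. `tfidf_vectors`, `cosine`, and `pairing_p_value` are hypothetical helpers.

```python
import math
import random
import re
from collections import Counter
from typing import Dict, List, Sequence

Vector = Dict[str, float]


def tfidf_vectors(docs: Sequence[str]) -> List[Vector]:
    """TF-IDF with a smoothed IDF, log((1 + n) / (1 + df)) + 1."""
    tokens = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    df = Counter(w for toks in tokens for w in set(toks))
    n = len(docs)
    return [
        {w: tf * (math.log((1 + n) / (1 + df[w])) + 1) for w, tf in Counter(toks).items()}
        for toks in tokens
    ]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(x * b.get(w, 0.0) for w, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def pairing_p_value(cvs: Sequence[str], profiles: Sequence[str],
                    trials: int = 1000, seed: int = 0) -> float:
    """One-sided permutation p-value: how often does a random assignment of
    CVs to recipients reach the mean similarity of the observed assignment?"""
    vecs = tfidf_vectors(list(cvs) + list(profiles))
    k = len(cvs)
    cv_vecs, profile_vecs = vecs[:k], vecs[k:]

    def mean_similarity(order: Sequence[int]) -> float:
        return sum(cosine(cv_vecs[i], profile_vecs[j]) for i, j in enumerate(order)) / k

    observed = mean_similarity(range(k))
    rng = random.Random(seed)
    order = list(range(k))
    hits = 0
    for _ in range(trials):
        rng.shuffle(order)
        if mean_similarity(order) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)
```

A small p-value means the observed CV-to-recipient pairing is more similar than random pairings usually are, which is the "targeted rather than randomly selected" argument.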
Mean-field theory for scale-free random networks
Random networks with complex topology are common in Nature, describing
systems as diverse as the world wide web or social and business networks.
Recently, it has been demonstrated that most large networks for which
topological information is available display scale-free features. Here we study
the scaling properties of the recently introduced scale-free model, which can
account for the observed power-law distribution of the connectivities. We
develop a mean-field method to predict the growth dynamics of the individual
vertices, and use this to calculate analytically the connectivity distribution
and the scaling exponents. The mean-field method can also be used to address the
properties of two variants of the scale-free model that do not display
power-law scaling.
Comment: 19 pages, 6 figures
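The mean-field argument can be written out for the growth model under the usual assumptions:

- There are m_0 initial vertices.
- Each new vertex adds m edges.
- A new vertex attaches to vertex i with probability Π(k_i) = k_i / Σ_j k_j.
- k_i is treated as a continuous variable.
- The initial edges are neglected, so Σ_j k_j ≈ 2mt.
- Vertices arrive at a uniform rate, so P(t_i) = 1/(m_0 + t).

The steps below are a sketch of that standard continuum derivation.

```latex
\begin{align}
\frac{\partial k_i}{\partial t} &= m\,\Pi(k_i) = m\,\frac{k_i}{\sum_j k_j} \approx \frac{k_i}{2t},
\qquad k_i(t_i) = m \\
k_i(t) &= m\left(\frac{t}{t_i}\right)^{\beta}, \qquad \beta = \tfrac{1}{2} \\
P\bigl(k_i(t) < k\bigr) &= P\!\left(t_i > \frac{m^2 t}{k^2}\right) = 1 - \frac{m^2 t}{k^2\,(t + m_0)} \\
P(k) &= \frac{\partial P\bigl(k_i(t) < k\bigr)}{\partial k} = \frac{2 m^2 t}{m_0 + t}\,\frac{1}{k^3}
\;\xrightarrow{\,t \to \infty\,}\; 2 m^2 k^{-3}, \qquad \gamma = \frac{1}{\beta} + 1 = 3
\end{align}
```

The exponent γ = 3 does not depend on m. This is the kind of analytical result for the connectivity distribution and scaling exponents that the abstract refers to.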
A Grammatical Inference Approach to Language-Based Anomaly Detection in XML
False positives are a problem in anomaly-based intrusion detection systems.
To counter this issue, we discuss anomaly detection for the eXtensible Markup
Language (XML) in a language-theoretic view. We argue that many XML-based
attacks target the syntactic level, i.e. the tree structure or element content,
and syntax validation of XML documents reduces the attack surface. XML offers
so-called schemas for validation, but in the real world, schemas are often
unavailable, ignored or too general. In this work-in-progress paper we describe
a grammatical inference approach to learn an automaton from example XML
documents for detecting documents with anomalous syntax.
We discuss the properties and expressiveness of XML to understand the limits of
learnability. Our contributions are an XML Schema compatible lexical datatype
system to abstract content in XML and an algorithm to learn visibly pushdown
automata (VPA) directly from a set of examples. The proposed algorithm does not
require the tree representation of XML, so it can process large documents or
streams. The resulting deterministic VPA then allows stream validation of
documents to recognize deviations in the underlying tree structure or
datatypes.
Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and
Countermeasures ECTCM 201
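Stream validation without a tree can be sketched with a much simpler learner than the paper's. For each element name, it records two things:

- Which child follows which, with pseudo-children `^` and `$` marking the start and end of the child sequence.
- Which coarse datatype the element's text has.

It then checks a new document against those sets while streaming parse events onto a stack, pushing on an opening tag and popping on a closing tag, as a visibly pushdown automaton does. The datatype names, `observations`, and `StreamModel` are hypothetical. This is not the paper's XML Schema compatible datatype system or its VPA learning algorithm.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple

START, END = "^", "$"  # pseudo-children marking the ends of a child sequence


def datatype(text) -> str:
    """Coarse lexical datatype of element text (a tiny assumed subset)."""
    s = (text or "").strip()
    if not s:
        return "empty"
    if re.fullmatch(r"[+-]?\d+", s):
        return "integer"
    if re.fullmatch(r"[+-]?\d*\.\d+", s):
        return "decimal"
    if s in ("true", "false"):
        return "boolean"
    return "string"


def observations(xml: bytes) -> Iterator[Tuple[str, str, object]]:
    """Stream parse events; push on an opening tag, pop on a closing tag.

    Yields ("child", parent, (previous_child, child)) for sibling order and
    ("type", tag, datatype) for element text. Only the text before an
    element's first child is inspected (mixed content is ignored).
    """
    stack: List[List[str]] = []  # frames: [tag, previous child seen]
    for event, elem in ET.iterparse(io.BytesIO(xml), events=("start", "end")):
        if event == "start":
            if stack:
                parent = stack[-1]
                yield "child", parent[0], (parent[1], elem.tag)
                parent[1] = elem.tag
            stack.append([elem.tag, START])
        else:
            tag, prev = stack.pop()
            yield "child", tag, (prev, END)
            yield "type", tag, datatype(elem.text)
            elem.clear()  # release content that is no longer needed


class StreamModel:
    """Sets of allowed observations per (kind, element name)."""

    def __init__(self) -> None:
        self.allowed: Dict[Tuple[str, str], Set[object]] = defaultdict(set)

    def learn(self, examples: List[bytes]) -> None:
        for xml in examples:
            for kind, tag, value in observations(xml):
                self.allowed[(kind, tag)].add(value)

    def anomalies(self, xml: bytes) -> List[str]:
        found = []
        for kind, tag, value in observations(xml):
            if value not in self.allowed.get((kind, tag), set()):
                found.append(f"{kind} anomaly in <{tag}>: {value}")
        return found
```

Learned from `<order><id>1</id><qty>2</qty></order>`, the model flags a document that swaps `qty` before `id`, or one whose `id` holds text instead of an integer. Malformed input raises `ET.ParseError`. Calling `elem.clear()` on each closing tag releases most of the parsed content, although `iterparse` keeps the cleared element shells attached to their ancestors.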