Search CORE

72,263 research outputs found

Characterizing Search Behavior in Productivity Software

Author: Bota Horatiu
Dumais Susan T.
Fourney Adam
Religa Tomasz L.
Rounthwaite Robert
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018
Field of study

Complex software applications expose hundreds of commands to users through intricate menu hierarchies. One of the most popular productivity software suites, Microsoft Office, has recently developed functionality that allows users to issue free-form text queries to a search system to quickly find commands they want to execute, retrieve help documentation or access web results in a unified interface. In this paper, we analyze millions of search sessions originating from within Microsoft Office applications, collected over one month of activity, in an effort to characterize search behavior in productivity software. Our research brings together previous efforts in analyzing command usage in large-scale applications and efforts in understanding search behavior in environments other than the web. Our findings show that users engage primarily in command search, and that re-accessing commands through search is a frequent behavior. Our work represents the first large-scale analysis of search over command spaces and is an important first step in understanding how search systems integrated with productivity software can be successfully developed

Crossref

Enlighten

Determining WWW User's Next Access and Its Application to Pre-fetching

Author: Cunha Carlos R.
Jaccoud Carlos F.B.
Publication venue: Boston University Computer Science Department
Publication date: 26/03/1997
Field of study

World-Wide Web (WWW) services have grown to levels where significant delays are expected to happen. Techniques like pre-fetching are likely to help users to personalize their needs, reducing their waiting times. However, pre-fetching is only effective if the right documents are identified and if user's move is correctly predicted. Otherwise, pre-fetching will only waste bandwidth. Therefore, it is productive to determine whether a revisit will occur or not, before starting pre-fetching. In this paper we develop two user models that help determining user's next move. One model uses Random Walk approximation and the other is based on Digital Signal Processing techniques. We also give hints on how to use such models with a simple pre-fetching technique that we are developing.CNP

Boston University Institutional Repository (OpenBU)

Emergence of scaling in random networks

Author: Albert-László Barabási
Barthélémy
Huberman
Milgram
Réka Albert
Rényi
Publication venue: 'American Association for the Advancement of Science (AAAS)'
Publication date: 01/01/1999
Field of study

Systems as diverse as genetic networks or the world wide web are best described as networks with complex topology. A common property of many large networks is that the vertex connectivities follow a scale-free power-law distribution. This feature is found to be a consequence of the two generic mechanisms that networks expand continuously by the addition of new vertices, and new vertices attach preferentially to already well connected sites. A model based on these two ingredients reproduces the observed stationary scale-free distributions, indicating that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems.Comment: 11 pages, 2 figure

arXiv.org e-Print Archive

CiteSeerX

Crossref

Institute of Mathematics AS CR, v. v. i.

Computational Content Analysis of Negative Tweets for Obesity, Diet, Diabetes, and Exercise

Author: Karami Amir
Shaw Jr. George
Publication venue
Publication date: 01/01/2017
Field of study

Social media based digital epidemiology has the potential to support faster response and deeper understanding of public health related threats. This study proposes a new framework to analyze unstructured health related textual data via Twitter users' post (tweets) to characterize the negative health sentiments and non-health related concerns in relations to the corpus of negative sentiments, regarding Diet Diabetes Exercise, and Obesity (DDEO). Through the collection of 6 million Tweets for one month, this study identified the prominent topics of users as it relates to the negative sentiments. Our proposed framework uses two text mining methods, sentiment analysis and topic modeling, to discover negative topics. The negative sentiments of Twitter users support the literature narratives and the many morbidity issues that are associated with DDEO and the linkage between obesity and diabetes. The framework offers a potential method to understand the publics' opinions and sentiments regarding DDEO. More importantly, this research provides new opportunities for computational social scientists, medical experts, and public health professionals to collectively address DDEO-related issues.Comment: The 2017 Annual Meeting of the Association for Information Science and Technology (ASIST

arXiv.org e-Print Archive

Scholar Commons - Institutional Repository of the University of South Carolina

Characterizing Phishing Threats with Natural Language Processing

Author: Kotson Michael C.
Schulz Alexia
Publication venue
Publication date: 31/08/2015
Field of study

Spear phishing is a widespread concern in the modern network security landscape, but there are few metrics that measure the extent to which reconnaissance is performed on phishing targets. Spear phishing emails closely match the expectations of the recipient, based on details of their experiences and interests, making them a popular propagation vector for harmful malware. In this work we use Natural Language Processing techniques to investigate a specific real-world phishing campaign and quantify attributes that indicate a targeted spear phishing attack. Our phishing campaign data sample comprises 596 emails - all containing a web bug and a Curriculum Vitae (CV) PDF attachment - sent to our institution by a foreign IP space. The campaign was found to exclusively target specific demographics within our institution. Performing a semantic similarity analysis between the senders' CV attachments and the recipients' LinkedIn profiles, we conclude with high statistical certainty (p

< 10^{-4}

) that the attachments contain targeted rather than randomly selected material. Latent Semantic Analysis further demonstrates that individuals who were a primary focus of the campaign received CVs that are highly topically clustered. These findings differentiate this campaign from one that leverages random spam.Comment: This paper has been accepted for publication by the IEEE Conference on Communications and Network Security in September 2015 at Florence, Italy. Copyright may be transferred without notice, after which this version may no longer be accessibl

arXiv.org e-Print Archive

Crossref

Mean-field theory for scale-free random networks

Author: Albert Reka
Barabasi Albert-Laszlo
Jeong Hawoong
Publication venue: 'Elsevier BV'
Publication date: 01/01/1999
Field of study

Random networks with complex topology are common in Nature, describing systems as diverse as the world wide web or social and business networks. Recently, it has been demonstrated that most large networks for which topological information is available display scale-free features. Here we study the scaling properties of the recently introduced scale-free model, that can account for the observed power-law distribution of the connectivities. We develop a mean-field method to predict the growth dynamics of the individual vertices, and use this to calculate analytically the connectivity distribution and the scaling exponents. The mean-field method can be used to address the properties of two variants of the scale-free model, that do not display power-law scaling.Comment: 19 pages, 6 figure

arXiv.org e-Print Archive

CiteSeerX

Crossref

A Grammatical Inference Approach to Language-Based Anomaly Detection in XML

Author: Lampesberger Harald
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
Field of study

False-positives are a problem in anomaly-based intrusion detection systems. To counter this issue, we discuss anomaly detection for the eXtensible Markup Language (XML) in a language-theoretic view. We argue that many XML-based attacks target the syntactic level, i.e. the tree structure or element content, and syntax validation of XML documents reduces the attack surface. XML offers so-called schemas for validation, but in real world, schemas are often unavailable, ignored or too general. In this work-in-progress paper we describe a grammatical inference approach to learn an automaton from example XML documents for detecting documents with anomalous syntax. We discuss properties and expressiveness of XML to understand limits of learnability. Our contributions are an XML Schema compatible lexical datatype system to abstract content in XML and an algorithm to learn visibly pushdown automata (VPA) directly from a set of examples. The proposed algorithm does not require the tree representation of XML, so it can process large documents or streams. The resulting deterministic VPA then allows stream validation of documents to recognize deviations in the underlying tree structure or datatypes.Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and Countermeasures ECTCM 201

arXiv.org e-Print Archive

CiteSeerX

Crossref