Cyberspace and Real-World Behavioral Relationships: Towards the Application of Internet Search Queries to Identify Individuals At-risk for Suicide
The Internet has become an integral and pervasive aspect of society. Not surprisingly, the growth of ecommerce has led to focused research on identifying relationships between user behavior in cyberspace and the real world: retailers track the items customers view and purchase in order to recommend additional products and to better direct advertising. As the relationship between online search patterns and real-world behavior becomes better understood, the practice is likely to expand to other applications. Indeed, Google Flu Trends has implemented an algorithm that accurately charts the relationship between the number of people searching for flu-related topics on the Internet and the number of people who actually have flu symptoms in that region. Because the results are real-time, studies show Google Flu Trends estimates are typically two weeks ahead of the Centers for Disease Control and Prevention. The Air Force has devoted considerable resources to suicide awareness and prevention. Despite these efforts, suicide rates have remained largely unaffected. The Air Force Suicide Prevention Program assists family, friends, and co-workers of airmen in recognizing and discussing behavioral changes with at-risk individuals. Based on other successes in correlating behaviors in cyberspace and the real world, is it possible to leverage online activities to help identify individuals who exhibit suicidal or depression-related symptoms? This research explores the notion of using Internet search queries to classify individuals with common search patterns. Text mining was performed on user search histories for a one-month period from nine Air Force installations. The search histories were clustered based on search term probabilities, providing the ability to identify relationships between individuals searching for common terms. Analysis was then performed to identify relationships between individuals searching for key terms associated with suicide, anxiety, and post-traumatic stress.
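As a rough illustration of what clustering search histories by term probabilities could look like, here is a minimal Python sketch. It is not the study's method: the whitespace tokenization, the cosine similarity measure, the greedy single-pass grouping, the 0.5 threshold, and all function names are assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def term_probabilities(queries):
    """Map each term to its relative frequency across one user's queries."""
    counts = Counter(term for q in queries for term in q.lower().split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse term-probability vectors."""
    dot = sum(v * q.get(t, 0.0) for t, v in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def cluster_users(histories, threshold=0.5):
    """Greedy grouping (an assumption, not the paper's algorithm):
    a user joins the first cluster whose seed profile is similar enough,
    otherwise the user starts a new cluster."""
    clusters = []  # list of (seed_profile, member_ids)
    for user, queries in histories.items():
        profile = term_probabilities(queries)
        for seed, members in clusters:
            if cosine(profile, seed) >= threshold:
                members.append(user)
                break
        else:
            clusters.append((profile, [user]))
    return [members for _, members in clusters]


# Toy, made-up histories for illustration only.
histories = {
    "u1": ["flu symptoms", "flu treatment"],
    "u2": ["flu symptoms fever"],
    "u3": ["cheap flights", "flights to denver"],
}
groups = cluster_users(histories)
```

With these toy histories, the two users who search for flu-related terms fall into one group and the third user into another. A real analysis would need stop-word handling, a principled clustering method, and far more data.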
Why We Read Wikipedia
Wikipedia is one of the most popular sites on the Web, with millions of users
relying on it to satisfy a broad range of information needs every day. Although
it is crucial to understand what exactly these needs are in order to be able to
meet them, little is currently known about why users visit Wikipedia. The goal
of this paper is to fill this gap by combining a survey of Wikipedia readers
with a log-based analysis of user activity. Based on an initial series of user
surveys, we build a taxonomy of Wikipedia use cases along several dimensions,
capturing users' motivations to visit Wikipedia, the depth of knowledge they
are seeking, and their knowledge of the topic of interest prior to visiting
Wikipedia. Then, we quantify the prevalence of these use cases via a
large-scale user survey conducted on live Wikipedia with almost 30,000
responses. Our analyses highlight the variety of factors driving users to
Wikipedia, such as current events, media coverage of a topic, personal
curiosity, work or school assignments, or boredom. Finally, we match survey
responses to the respondents' digital traces in Wikipedia's server logs,
enabling the discovery of behavioral patterns associated with specific use
cases. For instance, we observe long and fast-paced page sequences across
topics for users who are bored or exploring randomly, whereas those using
Wikipedia for work or school spend more time on individual articles focused on
topics such as science. Our findings advance our understanding of reader
motivations and behavior on Wikipedia and can have implications for developers
aiming to improve Wikipedia's user experience, editors striving to cater to
their readers' needs, third-party services (such as search engines) providing
access to Wikipedia content, and researchers aiming to build tools such as
recommendation engines.
Comment: Published in WWW'17; v2 fixes caption of Table
CYCLOSA: Decentralizing Private Web Search Through SGX-Based Browser Extensions
By regularly querying Web search engines, users (unconsciously) disclose
large amounts of their personal data as part of their search queries, among
which some might reveal sensitive information (e.g. health issues, sexual,
political or religious preferences). Several solutions exist to allow users
to query search engines while improving privacy protection. However, these
solutions suffer from a number of limitations: some are subject to user
re-identification attacks, while others lack scalability or are unable to
provide accurate results. This paper presents CYCLOSA, a secure, scalable and
accurate private Web search solution. CYCLOSA improves security by relying on
trusted execution environments (TEEs) as provided by Intel SGX. Further,
CYCLOSA proposes a novel adaptive privacy protection solution that reduces the
risk of user re-identification. CYCLOSA sends fake queries to the search
engine and dynamically adapts their count according to the sensitivity of the
user query. In addition, CYCLOSA achieves scalability as it is fully
decentralized, spreading the load for distributing fake queries among other
nodes. Finally, CYCLOSA achieves accuracy of Web search as it handles the real
query and the fake queries separately, in contrast to other existing solutions
that mix fake and real query results.
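The idea of adapting the number of fake queries to the sensitivity of the real query can be sketched as follows. This is only an illustration under stated assumptions, not CYCLOSA's implementation: the keyword-based sensitivity score, the `SENSITIVE_TERMS` set, the `max_fakes` cap, and the decoy pool are all hypothetical, and the SGX enclave and peer-to-peer forwarding are omitted entirely.

```python
import math
import random

# Hypothetical keyword list; the abstract does not say how sensitivity is assessed.
SENSITIVE_TERMS = {"cancer", "hiv", "depression", "religion"}


def sensitivity(query):
    """Fraction of query terms found in the (assumed) sensitive-term set."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    return sum(t in SENSITIVE_TERMS for t in terms) / len(terms)


def fake_query_count(query, max_fakes=7):
    """More sensitive queries get more fake queries, up to max_fakes."""
    return math.ceil(sensitivity(query) * max_fakes)


def protect(query, decoy_pool, max_fakes=7, rng=random):
    """Pick decoys for a query. Real and fake queries stay in separate
    fields so that their results never need to be mixed."""
    k = min(fake_query_count(query, max_fakes), len(decoy_pool))
    return {"real": query, "fakes": rng.sample(decoy_pool, k)}


# Made-up decoy queries for illustration only.
decoys = ["weather paris", "pasta recipe", "bike repair", "train times", "movie reviews"]
plan = protect("hiv test", decoys)
```

Keeping the real query and the fake queries in separate fields mirrors the abstract's point that results are handled separately rather than merged and filtered afterwards.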
Using Probabilistic Topic Modeling of Library Access Records to Identify Learning Trends in Educational Research
Advances in the architecture of digital library service infrastructure enable the collection of various types of data related to the use of library resources, tools, and services. The Big Data being generated provides valuable insight into library operations and has the potential to reshape the future of library work. In this paper, we describe the innovative application of topic modeling (supervised Latent Dirichlet Allocation) to research corpora accessed by patrons through a library proxy server. We found that the underlying topics of this corpus (e.g., psychology, family education, and methodology) converge with the general interests one would expect from a Graduate School of Education. In addition, we discuss the potential and challenges of utilizing library proxy log data in learning analytics research.
JNET: Learning User Representations via Joint Network Embedding and Topic Embedding
User representation learning is vital to capture diverse user preferences,
while it is also challenging as user intents are latent and scattered among
complex and different modalities of user-generated data, and thus not directly
measurable. Inspired by the concept of user schema in social psychology, we
take a new perspective to perform user representation learning by constructing
a shared latent space to capture the dependency among different modalities of
user-generated data. Both users and topics are embedded in the same space to
encode users' social connections and text content, to facilitate joint modeling
of different modalities, via a probabilistic generative framework. We evaluated
the proposed solution on large collections of Yelp reviews and StackOverflow
discussion posts, with their associated network structures. The proposed model
outperformed several state-of-the-art topic modeling based user models with
better predictive power in unseen documents, and state-of-the-art network
embedding based user models with improved link prediction quality in unseen
nodes. The learnt user representations also prove useful in content
recommendation, e.g., expert finding in StackOverflow.