Using Search Queries to Understand Health Information Needs in Africa
The lack of comprehensive, high-quality health data in developing nations
creates a roadblock for combating the impacts of disease. One key challenge is
understanding the health information needs of people in these nations. Without
understanding people's everyday needs, concerns, and misconceptions, health
organizations and policymakers lack the ability to effectively target education
and programming efforts. In this paper, we propose a bottom-up approach that
uses search data from individuals to uncover and gain insight into health
information needs in Africa. We analyze Bing searches related to HIV/AIDS,
malaria, and tuberculosis from all 54 African nations. For each disease, we
automatically derive a set of common search themes or topics, revealing a
widespread interest in various types of information, including disease
symptoms, drugs, concerns about breastfeeding, as well as stigma, beliefs in
natural cures, and other topics that may be hard to uncover through traditional
surveys. We expose the different patterns that emerge in health information
needs by demographic groups (age and sex) and country. We also uncover
discrepancies in the quality of content returned by search engines to users by
topic. Combined, our results suggest that search data can help illuminate
health information needs in Africa and inform discussions on health policy and
targeted education efforts both on- and offline.

Comment: Extended version of an ICWSM 2019 paper
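The abstract reports themes such as symptoms, drugs, breastfeeding, natural cures, and stigma emerging from the query data. As a minimal sketch of how queries might be grouped into such themes, the snippet below uses simple keyword matching; the query strings and keyword lists are illustrative assumptions, and the paper itself derives topics automatically rather than with a fixed keyword map.

```python
# Hedged sketch: keyword-based grouping of search queries into the kinds of
# themes the paper reports. The queries and keyword lists are made-up
# illustrations, not the paper's actual method or data.
from collections import defaultdict

THEME_KEYWORDS = {
    "symptoms": ["symptom", "signs", "cough", "fever"],
    "drugs": ["drug", "pill", "medication"],
    "breastfeeding": ["breastfeed", "breast milk"],
    "natural cures": ["natural cure", "herbal", "home remedy"],
    "stigma": ["stigma", "ashamed"],
}

def group_by_theme(queries):
    """Assign each query to every theme whose keywords it mentions."""
    themes = defaultdict(list)
    for q in queries:
        q_lower = q.lower()
        for theme, keywords in THEME_KEYWORDS.items():
            if any(k in q_lower for k in keywords):
                themes[theme].append(q)
    return dict(themes)

queries = [
    "early signs of hiv",
    "can hiv pass through breastfeeding",
    "herbal treatment for malaria",
    "tb cough at night",
]
print(group_by_theme(queries))
```

A fixed keyword map like this would miss the unanticipated themes the paper highlights; that is precisely why the authors derive topics from the data bottom-up instead.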
Measuring success of open source projects using web search engines
What makes an open source project successful?
In this paper we show that the traditional factors of success of open source projects, such as the number of downloads, deployments, or commits, are sometimes inconvenient or even insufficient. We then correlate the success of an open source project with its popularity on the Web. We present several ways in which such popularity could be measured using Web search engines and provide experimental results from a quantitative analysis of the proposed measures on representative large samples of open source projects from SourceForge.
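The correlation step described above can be sketched with a rank correlation between a traditional success measure and a Web-popularity measure. The figures below are hypothetical stand-ins for download counts and search-result counts, not the paper's data, and the paper does not prescribe Spearman's coefficient specifically.

```python
# Hedged sketch: Spearman rank correlation between a traditional success
# measure (downloads) and Web-search popularity (result counts).
# All numbers are hypothetical; ties are not handled, which is adequate
# for a sketch but not for real data.
def ranks(values):
    """1-based rank of each value within the list (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman's rho via the sum-of-squared-rank-differences formula."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

downloads = [120, 4500, 300, 98000, 1200]    # hypothetical download counts
web_hits = [900, 15000, 700, 250000, 4000]   # hypothetical search-result counts
print(spearman(downloads, web_hits))
```

A high rank correlation on real data would support the paper's premise that Web popularity tracks the conventional success measures while being cheaper to obtain.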
Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results
In-degree, PageRank, number of visits and other measures of Web page
popularity significantly influence the ranking of search results by modern
search engines. The assumption is that popularity is closely correlated with
quality, a more elusive concept that is difficult to measure directly.
Unfortunately, the correlation between popularity and quality is very weak for
newly-created pages that have yet to receive many visits and/or in-links.
Worse, since discovery of new content is largely done by querying search
engines, and because users usually focus their attention on the top few
results, newly-created but high-quality pages are effectively "shut out," and
it can take a very long time before they become popular.
We propose a simple and elegant solution to this problem: the introduction of
a controlled amount of randomness into search result ranking methods. Doing so
offers new pages a chance to prove their worth, although clearly using too much
randomness will degrade result quality and annul any benefits achieved. Hence
there is a tradeoff between exploration to estimate the quality of new pages
and exploitation of pages already known to be of high quality. We study this
tradeoff both analytically and via simulation, in the context of an economic
objective function based on aggregate result quality amortized over time. We
show that a modest amount of randomness leads to improved search results.
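The core idea of the abstract can be sketched as an epsilon-style selection rule: each result slot is usually filled by the most popular remaining page, but with a small probability a random page is promoted instead. The mechanism, the `epsilon` parameter, and the data below are illustrative assumptions, not the paper's exact ranking method or analysis.

```python
# Hedged sketch: partially randomized ranking. With probability epsilon a
# slot is filled by a uniformly random remaining page (exploration); otherwise
# by the most popular remaining page (exploitation). Illustrative only.
import random

def partially_randomized_ranking(pages, epsilon, k, rng):
    """Return the top-k pages under the mixed popularity/random policy."""
    remaining = sorted(pages, key=lambda p: p["popularity"], reverse=True)
    result = []
    for _ in range(min(k, len(remaining))):
        if rng.random() < epsilon:
            pick = rng.choice(remaining)      # explore: random page
        else:
            pick = remaining[0]               # exploit: most popular left
        remaining.remove(pick)
        result.append(pick)
    return result

pages = [{"url": f"page{i}", "popularity": i} for i in range(10)]
rng = random.Random(42)
top = partially_randomized_ranking(pages, epsilon=0.2, k=5, rng=rng)
print([p["url"] for p in top])
```

With `epsilon=0` this degenerates to pure popularity ranking; the paper's tradeoff is exactly how large `epsilon` can be before the exploration cost outweighs the benefit to new pages.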
Efficient Diversification of Web Search Results
In this paper we analyze the efficiency of various search results
diversification methods. While efficacy of diversification approaches has been
deeply investigated in the past, response time and scalability issues have been
rarely addressed. A unified framework for studying performance and feasibility
of result diversification solutions is thus proposed. First we define a new
methodology for detecting when, and how, query results need to be diversified.
To this purpose, we rely on the concept of "query refinement" to estimate the
probability of a query to be ambiguous. Then, relying on this novel ambiguity
detection method, we deploy and compare on a standard test set, three different
diversification methods: IASelect, xQuAD, and OptSelect. While the first two
are recent state-of-the-art proposals, the latter is an original algorithm
introduced in this paper. We evaluate both the efficiency and the effectiveness
of our approach against its competitors by using the standard TREC Web
diversification track testbed. Results show that OptSelect runs two orders of
magnitude faster than the other two state-of-the-art approaches while obtaining
comparable diversification effectiveness.

Comment: VLDB201
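Greedy result diversification of the kind the compared methods perform can be sketched with an MMR-style objective: repeatedly pick the candidate that best balances relevance against similarity to what is already selected. This is a generic sketch in the spirit of the compared methods, not an implementation of IASelect, xQuAD, or OptSelect; the scores, similarity function, and `lam` trade-off below are assumptions.

```python
# Hedged sketch: greedy MMR-style diversification. lam=1 reduces to pure
# relevance ranking; lower lam penalizes redundancy with already-selected
# results. Toy data only; not any of the paper's specific algorithms.
def diversify(results, relevance, similarity, lam, k):
    """Greedily select k results maximizing lam*rel - (1-lam)*max_sim."""
    selected = []
    candidates = list(results)
    while candidates and len(selected) < k:
        def mmr(d):
            max_sim = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance(d) - (1 - lam) * max_sim
        best = max(candidates, key=mmr)
        candidates.remove(best)
        selected.append(best)
    return selected

# Toy ambiguous query "jaguar": (doc id, relevance score, intended sense).
docs = [("d1", 0.9, "jaguar car"), ("d2", 0.8, "jaguar car"),
        ("d3", 0.7, "jaguar cat"), ("d4", 0.6, "jaguar os")]
rel = lambda d: d[1]
sim = lambda a, b: 1.0 if a[2] == b[2] else 0.0
print([d[0] for d in diversify(docs, rel, sim, lam=0.5, k=3)])
```

On this toy data the diversifier covers three distinct senses of the query instead of returning the two top-scored but redundant "car" documents, which is the behavior the ambiguity-detection step in the paper is meant to trigger.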