SQL query log analysis for identifying user interests and query recommendations
In the sciences and elsewhere, the use of relational databases has become ubiquitous.
To get the most out of a database, one needs in-depth knowledge of both
SQL and the domain (the structure and meaning of the data that the database contains). To assist
inexperienced users in formulating their needs, an SQL query recommendation system
(SQL QRS) has been proposed. It utilizes the experience of previous users, captured in
the SQL query log, as well as the user's own query history to suggest queries. When constructing such
a system, one has to solve two related problems: (1) clean the query log and (2) define
appropriate query similarity functions. These two tasks are not only necessary for
building an SQL QRS; they also apply to other problems. In what follows, we describe
three scenarios of SQL query log analysis: (1) cleaning an SQL query log, (2) SQL
query log clustering when testing SQL query similarity functions and (3) recommending
SQL queries. We also explain how these three branches are related to each other.
Scenario 1. Cleaning SQL query log as a general pre-processing step
The raw query log is often not suitable for query log analysis tasks such as clustering
or giving recommendations. That is because it contains antipatterns and robotic data
downloads, also known as Sliding Window Search (SWS). An antipattern in software
engineering is a special case of a pattern. While a pattern is a standard solution, an
antipattern is a pattern with a negative effect.
When it comes to SQL query recommendation, leaving such artifacts in the log during
analysis results in poor suggestions. Firstly, the behaviour of "mortal" users who
need a recommendation differs from that of robots, which perform SWS. Secondly, one
does not want to recommend antipatterns, so they need to be excluded from the query
pool. Thirdly, the bigger a log is, the slower a recommendation engine operates. Thus,
excluding SWS and antipatterns from the input data makes the recommendation
better and faster.
The effect of SWS and antipatterns on query log clustering depends on the chosen
similarity function. The result either (1) does not change or (2) gains additional clusters which
cover a large part of the data. In either case, having antipatterns and SWS in an input log
only increases the time one needs for clustering and does not improve the quality of the results.
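As a rough illustration of the cleaning step, the sketch below flags suspected SWS runs in a log. It rests on assumptions that are not part of the abstract: each log entry has already been reduced to a hypothetical `(template, lower, upper)` triple for a range query, and a "sliding window" is taken to mean consecutive queries with the same template whose lower bound equals the previous query's upper bound. The names `find_sws_runs`, `drop_runs` and `min_run` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical log entry: (query template, lower bound, upper bound).
Entry = Tuple[str, float, float]


def find_sws_runs(entries: List[Entry], min_run: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of suspected SWS runs.

    Assumption: a run is a sequence of at least `min_run` consecutive entries
    that share a template and whose ranges are adjacent (each lower bound
    equals the previous upper bound).
    """
    runs = []
    start = 0
    for i in range(1, len(entries) + 1):
        contiguous = (
            i < len(entries)
            and entries[i][0] == entries[i - 1][0]
            and entries[i][1] == entries[i - 1][2]
        )
        if not contiguous:
            # The current run ends at i; keep it only if it is long enough.
            if i - start >= min_run:
                runs.append((start, i))
            start = i
    return runs


def drop_runs(entries: List[Entry], runs: List[Tuple[int, int]]) -> List[Entry]:
    """Return the log without the entries that fall inside any flagged run."""
    flagged = {i for start, end in runs for i in range(start, end)}
    return [e for i, e in enumerate(entries) if i not in flagged]


log = [("T", 0, 10), ("T", 10, 20), ("T", 20, 30), ("U", 5, 6), ("T", 30, 40)]
sws_runs = find_sws_runs(log)
cleaned = drop_runs(log, sws_runs)
```

A real log would need query parsing and per-user or per-session grouping before such a check; the sketch only shows the run-detection idea.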
Scenario 2. Identifying User Interests via Clustering
To identify the hot spots of user interests, one clusters SQL queries. In a scientific
domain, it exposes research trends. In business, it points to popular data slices which
one might want to refactor for better accessibility. A good clustering result must be
precise (match ground truth) and interpretable.
Query similarity relies on SQL query representation. There are three strategies to
represent an SQL query. The FB (feature-based) query representation sees a query as
a structure, without considering the data that the query accesses. The WB (witness-based) approach
treats a query as the set of tuples in its result set. The AAB (access area-based) representation
considers a query as an expression in relational algebra. While WB and FB query
similarity functions are straightforward (Jaccard or cosine similarities), AAB query
similarity requires additional definition. We proposed two variants of the AAB similarity
measure: overlap (AABovl) and closeness (AABcl). In AABovl, the similarity of two
queries is the overlap of their access areas. AABcl relies on the distance between two
access areas in the data space, so two queries may be similar even if their access areas
do not overlap.
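The difference between these similarity notions can be sketched on a toy case. The code below is an illustration under stated assumptions, not the definitions from the work itself: features and witnesses are modelled as plain Python sets, an access area is simplified to a closed interval on a single numeric attribute, and the closeness formula `1 / (1 + gap)` is a hypothetical choice made only to show that non-overlapping areas can still score above zero.

```python
from typing import Set, Tuple

Interval = Tuple[float, float]  # assumed 1-D access area: (low, high), low <= high


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity of two sets (query features, or result tuples)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def overlap_similarity(a: Interval, b: Interval) -> float:
    """Overlap-style similarity: shared length divided by covered length."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    if union == 0:
        # Both areas are single points; they match only if identical.
        return 1.0 if a == b else 0.0
    return inter / union


def closeness_similarity(a: Interval, b: Interval) -> float:
    """Closeness-style similarity (hypothetical formula): decays with the gap."""
    gap = max(0.0, max(a[0], b[0]) - min(a[1], b[1]))
    return 1.0 / (1.0 + gap)


# Two range queries whose access areas do not overlap:
q1, q2 = (0.0, 1.0), (3.0, 4.0)
ovl = overlap_similarity(q1, q2)      # no shared area
cls = closeness_similarity(q1, q2)    # still non-zero, since the areas are near

# Feature-based comparison of two queries described by hypothetical features:
fb = jaccard({"FROM:stars", "WHERE:ra"}, {"FROM:stars", "WHERE:dec"})
```

In this toy setting the overlap variant treats `q1` and `q2` as unrelated, while the closeness variant still assigns them a positive score, which mirrors the distinction drawn in the paragraph above.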
The extensive experiments consist of two parts. The first one is clustering a rather
small dataset with ground truth. This experiment serves to study the precision of
various similarity functions by comparing clustering results to the ground truth. The
second experiment investigates the interpretability of clustering results with
different similarity functions. It clusters a large real-world query log. A domain expert
then evaluates the results. Together, the experiments show that AAB similarity functions
produce better results in terms of both precision and interpretability.
Scenario 3. SQL Query Recommendation
A sound SQL query recommendation system (1) provides a query which can be run
directly, (2) supports comparison operators and various logical operators, (3) is scalable
and has low response times, (4) provides recommendations of high quality. None of the existing
approaches fulfills all of these requirements. We proposed DASQR, a scalable and
data-aware query recommendation approach that meets all four needs. In a nutshell, DASQR is
a hybrid (collaborative filtering + content-based) approach. Its variations utilize all
the similarity functions that we define or find in the related work.
Measuring the quality of an SQL query recommendation system (QRS) is particularly
challenging since there is no standard way of approaching it. Previous studies have
evaluated the results using quality metrics which only rely on the query representations
used in these studies. Such an evaluation is somewhat subjective, since the similarity function and the
quality metric depend on each other. We propose AAB quality metrics and then evaluate
each approach based on all the metrics.
The experiments test DASQR approaches and competitors. Both performance and
runtime experiments indicate that DASQR approaches outperform the existing ones.
Deriving query suggestions for site search
Modern search engines have been moving away from simplistic interfaces that aimed at satisfying a user's need with a single-shot query. Interactive features are now integral parts of web search engines. However, generating good query modification suggestions remains a challenging issue. Query log analysis is one of the major strands of work in this direction. Although much research has been performed on query logs collected on the web as a whole, query log analysis to enhance search on smaller and more focused collections has attracted less attention, despite its increasing practical importance. In this article, we report on a systematic study of different query modification methods applied to a substantial query log collected on a local website that already uses an interactive search engine. We conducted experiments in which we asked users to assess the relevance of potential query modification suggestions that have been constructed using a range of log analysis methods and different baseline approaches. The experimental results demonstrate the usefulness of log analysis to extract query modification suggestions. Furthermore, our experiments demonstrate that a more fine-grained approach than grouping search requests into sessions allows for extraction of better refinement terms from query log files. © 2013 ASIS&T
Theory-based user modeling for personalized interactive information retrieval
In an effort to improve users’ search experiences during their information seeking process, providing a personalized information retrieval system has been proposed as one of the effective approaches. Personalizing search systems requires a good understanding of the users. User modeling has proven to be a good method for learning about and representing users. Therefore, many user modeling studies have been carried out and some user models have been developed. The majority of user modeling studies apply an inductive approach, and only a small number of studies employ a deductive approach. In this paper, an EISE (Extended Information goal, Search strategy and Evaluation threshold) user model is proposed, which uses the deductive approach based on psychology theories and an existing user model. Interactive search logs of ten users, obtained from a real search engine, are used to validate the proposed user model. The preliminary validation results show that the EISE model can be applied to identify different types of users. The search preferences of the different user types can be applied to inform interactive search system design and development.
Anticipating Information Needs Based on Check-in Activity
In this work we address the development of a smart personal assistant that is
capable of anticipating a user's information needs based on a novel type of
context: the person's activity inferred from her check-in records on a
location-based social network. Our main contribution is a method that
translates a check-in activity into an information need, which is in turn
addressed with an appropriate information card. This task is challenging
because of the large number of possible activities and related information
needs, which need to be addressed in a mobile dashboard that is limited in
size. Our approach considers each possible activity that might follow after the
last (and already finished) activity, and selects the top information cards
such that they maximize the likelihood of satisfying the user's information
needs for all possible future scenarios. The proposed models also incorporate
knowledge about the temporal dynamics of information needs. Using a combination
of historical check-in data and manual assessments collected via crowdsourcing,
we show experimentally the effectiveness of our approach.
Comment: Proceedings of the 10th ACM International Conference on Web Search
and Data Mining (WSDM '17), 201
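The selection step described in this abstract, choosing the top information cards over all possible next activities, can be sketched as an expected-score ranking. This is a minimal sketch under assumptions that do not come from the paper: the transition probabilities, the per-activity need probabilities, the card names and the function `rank_cards` are all hypothetical, and temporal dynamics are ignored.

```python
from typing import Dict, List


def rank_cards(
    next_activity_probs: Dict[str, float],
    need_given_activity: Dict[str, Dict[str, float]],
    k: int = 2,
) -> List[str]:
    """Return the k cards with the highest expected usefulness.

    A card's score is the sum, over candidate next activities, of
    P(activity | last check-in) * P(card is needed | activity).
    """
    scores: Dict[str, float] = {}
    for activity, p_activity in next_activity_probs.items():
        for card, p_need in need_given_activity.get(activity, {}).items():
            scores[card] = scores.get(card, 0.0) + p_activity * p_need
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda card: (-scores[card], card))[:k]


# Made-up numbers for a user who has just finished an activity:
next_probs = {"restaurant": 0.6, "cinema": 0.4}
needs = {
    "restaurant": {"menu": 0.7, "traffic": 0.3},
    "cinema": {"showtimes": 0.8, "traffic": 0.2},
}
top_cards = rank_cards(next_probs, needs, k=2)
```

Summing over all candidate activities, rather than committing to the single most likely one, is what lets a card that is moderately useful in several scenarios compete with a card that is very useful in only one.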
Towards an automated query modification assistant
Users who need several queries before finding what they need can benefit from
an automatic search assistant that provides feedback on their query
modification strategies. We present a method to learn from a search log which
types of query modifications have and have not been effective in the past. The
method analyses query modifications along two dimensions: a traditional
term-based dimension and a semantic dimension, for which queries are enriched
with linked data entities. Applying the method to the search logs of two search
engines, we identify six opportunities for a query modification assistant to
improve search: modification strategies that are commonly used, but that often
do not lead to satisfactory results.
Comment: 1st International Workshop on Usage Analysis and the Web of Data
(USEWOD2011) in the 20th International World Wide Web Conference (WWW2011),
Hyderabad, India, March 28th, 201
Supporting aspect-based video browsing - analysis of a user study
In this paper, we present a novel video search interface based on the concept of aspect browsing. The proposed strategy is to assist the user in exploratory video search by actively suggesting new query terms and video shots. Our approach has the potential to narrow the "Semantic Gap" by allowing users to explore the data collection. First, we describe a clustering technique to identify potential aspects of a search. Then, we use the results to propose suggestions to users to help them in their search task. Finally, we analyse this approach by exploiting the log files and the feedback of a user study.
Asymptotically Truthful Equilibrium Selection in Large Congestion Games
Studying games in the complete information model makes them analytically
tractable. However, large player interactions are more realistically
modeled as games of incomplete information, where players may know little to
nothing about the types of other players. Unfortunately, games in incomplete
information settings lose many of the nice properties of complete information
games: the quality of equilibria can become worse, the equilibria lose their
ex-post properties, and coordinating on an equilibrium becomes even more
difficult. Because of these problems, we would like to study games of
incomplete information, but still implement equilibria of the complete
information game induced by the (unknown) realized player types.
This problem was recently studied by Kearns et al. and solved in large games
by means of introducing a weak mediator: their mediator took as input reported
types of players, and output suggested actions which formed a correlated
equilibrium of the underlying game. Players had the option to play
independently of the mediator, or ignore its suggestions, but crucially, if
they decided to opt-in to the mediator, they did not have the power to lie
about their type. In this paper, we rectify this deficiency in the setting of
large congestion games. We give, in a sense, the weakest possible mediator: it
cannot enforce participation, verify types, or enforce its suggestions.
Moreover, our mediator implements a Nash equilibrium of the complete
information game. We show that it is an (asymptotic) ex-post equilibrium of the
incomplete information game for all players to use the mediator honestly, and
that when they do so, they end up playing an approximate Nash equilibrium of
the induced complete information game. In particular, truthful use of the
mediator is a Bayes-Nash equilibrium in any Bayesian game for any prior.
Comment: The conference version of this paper appeared in EC 2014. This
manuscript has been merged and subsumed by the preprint "Robust Mediators in
Large Games": http://arxiv.org/abs/1512.0269