Graphical web based tool for generating query from star schema
This paper presents the development of a graphical SQL query tool that allows novice and non-technical users to navigate through database tables and generate their own queries. The tool presents query output in graphical and tabular forms, which can help users, especially top management, better understand and interpret query results. The algorithms used to construct complex SQL queries from star schemas in databases are also presented.
Graphical Web Based Tool for Generating Query from Star Schema
Novice users have difficulty generating Structured Query Language (SQL) queries from star schemas because they are not familiar with SQL syntax or with formulating SQL queries. This study proposes a graphical web-based tool that generates queries from a star schema and presents the data in tabular or graphical form, helping novice users formulate SQL queries. A prototype of the tool has been developed using JavaServer Pages. The tool facilitates the complex query construction that non-technical and novice users struggle with. The output of the SQL query is presented in tabular and graphical forms, which can help users, especially top management, better understand and interpret query results.
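The kind of query generation described in the two entries above can be illustrated with a minimal sketch. It is not the tool's algorithm (the abstracts do not give it); it only assumes that each selected dimension table joins to the fact table on a single key. The function name `build_star_query`, its parameters, and the example table names are all hypothetical.

```python
def build_star_query(fact_table, joins, dimension_columns, measures):
    """Assemble a SELECT statement over a star schema (illustrative only).

    fact_table:        name of the fact table
    joins:             dict mapping dimension table -> (fact foreign key, dimension key)
    dimension_columns: list of "table.column" strings to display and group by
    measures:          list of (aggregate function, fact column) pairs

    Identifiers are interpolated directly into the SQL text, so a real tool
    would have to validate them against the schema before use.
    """
    select_parts = list(dimension_columns)
    select_parts += [
        f"{agg}({fact_table}.{col}) AS {agg.lower()}_{col}" for agg, col in measures
    ]
    lines = ["SELECT " + ", ".join(select_parts), f"FROM {fact_table}"]
    # One join per dimension table, always against the central fact table.
    for dim, (fact_key, dim_key) in joins.items():
        lines.append(f"JOIN {dim} ON {fact_table}.{fact_key} = {dim}.{dim_key}")
    # Aggregated measures require grouping by the displayed dimension columns.
    if dimension_columns and measures:
        lines.append("GROUP BY " + ", ".join(dimension_columns))
    return "\n".join(lines)


query = build_star_query(
    "sales",
    {"product": ("product_id", "id"), "store": ("store_id", "id")},
    ["product.category", "store.region"],
    [("SUM", "amount")],
)
print(query)
```

A graphical front end would fill these arguments from the user's table and column selections, so the user never writes SQL directly.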
Profile Diversity for Phenotyping Data Search and Recommendation
Session: Applications innovantes. National audience. In this work, we study profile diversity, a novel approach to searching scientific documents. Many previous works have combined keyword relevance with document popularity in a "social" scoring function. Diversifying the content of returned documents has also been studied extensively in search, advertising, database queries, and recommendation. We believe our work is the first to address profile diversity in order to tackle the problem of result lists that are highly popular but too narrowly focused. We show how we adapt Fagin's threshold algorithm to return the documents that are the most relevant and the most popular, but also the most diverse in terms of content or profiles. We also run a set of simulations on two benchmarks to validate our scoring function.
MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension
Given a dataset of points in a metric space and an integer k, a diversity
maximization problem requires determining a subset of k points maximizing
some diversity objective measure, e.g., the minimum or the average distance
between two points in the subset. Diversity maximization is computationally
hard, hence only approximate solutions can be hoped for. Although its
applications are mainly in massive data analysis, most of the past research on
diversity maximization focused on the sequential setting. In this work we
present space and pass/round-efficient diversity maximization algorithms for
the Streaming and MapReduce models and analyze their approximation guarantees
for the relevant class of metric spaces of bounded doubling dimension. Like
other approaches in the literature, our algorithms rely on the determination of
high-quality core-sets, i.e., (much) smaller subsets of the input which contain
good approximations to the optimal solution for the whole input. For a variety
of diversity objective functions, our algorithms attain an
(α+ε)-approximation ratio, for any constant ε > 0, where
α is the best approximation ratio achieved by a polynomial-time,
linear-space sequential algorithm for the same diversity objective. This
improves substantially over the approximation ratios attainable in Streaming
and MapReduce by state-of-the-art algorithms for general metric spaces. We
provide extensive experimental evidence of the effectiveness of our algorithms
on both real world and synthetic datasets, scaling up to over a billion points. Comment: Extended version of
http://www.vldb.org/pvldb/vol10/p469-ceccarello.pdf, PVLDB Volume 10, No. 5,
January 201
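To make the objective concrete, here is a minimal sketch for the "minimum distance between two points in the subset" measure named in the abstract. It uses the standard sequential greedy farthest-point heuristic, not the streaming or MapReduce core-set algorithms of the paper. Euclidean points and the function names `greedy_farthest_points` and `min_pairwise_distance` are assumptions made for illustration.

```python
import math
from itertools import combinations


def greedy_farthest_points(points, k):
    """Select k points by farthest-point traversal (sequential heuristic)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [points[0]]  # arbitrary starting point
    # nearest[i] = distance from points[i] to its closest selected point
    nearest = [math.dist(p, selected[0]) for p in points]
    while len(selected) < k:
        # Take the point that is farthest from everything chosen so far.
        idx = max(range(len(points)), key=nearest.__getitem__)
        selected.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)]
    return selected


def min_pairwise_distance(subset):
    """Diversity objective: the smallest distance between any two chosen points."""
    return min(math.dist(a, b) for a, b in combinations(subset, 2))


data = [(0, 0), (1, 0), (10, 0), (0, 10), (5, 5)]
chosen = greedy_farthest_points(data, 3)
print(chosen, min_pairwise_distance(chosen))
```

Each round scans the whole dataset, which is the sequential cost that motivates the small core-sets the abstract describes for massive inputs.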
On the Usefulness of SQL-Query-Similarity Measures to Find User Interests
In the sciences and elsewhere, the use of relational databases has become ubiquitous. An important challenge is finding hot spots of user interests. In principle, one can discover user interests by clustering the queries in the query log. Such a clustering requires a notion of query similarity. This, in turn, raises the question of what features of SQL queries are meaningful. We have studied the query representations proposed in the literature and corresponding similarity functions and have identified shortcomings of all of them. To overcome these limitations, we propose new similarity functions for SQL queries. They rely on the so-called access area of a query and, more specifically, on the overlap and the closeness of the access areas. We have carried out experiments systematically to compare the various similarity functions described in this article. The first series of experiments measures the quality of clustering and compares it to a ground truth. In the second series, we focus on the query log from the well-known SkyServer database. Here, a domain expert has interpreted various clusters by hand. We conclude that clusters obtained with our new measures of similarity seem to be good indicators of user interests
Addressing Diverse User Preferences in SQL-Query-Result Navigation
Database queries are often exploratory, and users often find that their queries return too many answers, many of them irrelevant. Existing work either categorizes or ranks the results to help users locate interesting ones. The success of both approaches depends on the utilization of user preferences. However, most existing work assumes that all users have the same preferences, whereas in real life different users often have different preferences. This paper proposes a two-step solution to address the diversity of user preferences for the categorization approach. The proposed solution does not require explicit user involvement. The first step analyzes the query history of all users in the system offline and generates a set of clusters over the data, each corresponding to one type of user preference. When a user asks a query, the second step presents to the user a navigational tree over the clusters generated in the first step, such that the user can easily select the subset of clusters matching their needs. The user can then browse, rank, or categorize the results in the selected clusters. The navigational tree is automatically constructed using a cost-based algorithm which considers the cost of visiting both intermediate nodes and leaf nodes in the tree. An empirical study demonstrates the benefits of our approach.