12,701 research outputs found
A Call to Arms: Revisiting Database Design
Good database design is crucial to obtain a sound, consistent database, and -
in turn - good database design methodologies are the best way to achieve the
right design. These methodologies are taught to most Computer Science
undergraduates, as part of any Introduction to Database class. They can be
considered part of the "canon", and indeed, the overall approach to database
design has been unchanged for years. Moreover, none of the major database
research assessments identify database design as a strategic research
direction.
Should we conclude that database design is a solved problem?
Our thesis is that database design remains a critical unsolved problem.
Hence, it should be the subject of more research. Our starting point is the
observation that traditional database design is not used in practice - and if
it were used it would result in designs that are not well adapted to current
environments. In short, database design has failed to keep up with the times.
In this paper, we put forth arguments to support our viewpoint, analyze the
root causes of this situation and suggest some avenues of research.Comment: Removed spurious column break. Nothing else was change
Generalized h-index for Disclosing Latent Facts in Citation Networks
What is the value of a scientist and its impact upon the scientific thinking?
How can we measure the prestige of a journal or of a conference? The evaluation
of the scientific work of a scientist and the estimation of the quality of a
journal or conference has long attracted significant interest, due to the
benefits from obtaining an unbiased and fair criterion. Although it appears to
be simple, defining a quality metric is not an easy task. To overcome the
disadvantages of the present metrics used for ranking scientists and journals,
J.E. Hirsch proposed a pioneering metric, the now famous h-index. In this
article, we demonstrate several inefficiencies of this index and develop a pair
of generalizations and effective variants of it to deal with scientist ranking
and with publication forum ranking. The new citation indices are able to
disclose trendsetters in scientific research, as well as researchers that
constantly shape their field with their influential work, no matter how old
they are. We exhibit the effectiveness and the benefits of the new indices to
unfold the full potential of the h-index, with extensive experimental results
obtained from DBLP, a widely known on-line digital library.Comment: 19 pages, 17 tables, 27 figure
Interactive Constrained Association Rule Mining
We investigate ways to support interactive mining sessions, in the setting of
association rule mining. In such sessions, users specify conditions (queries)
on the associations to be generated. Our approach is a combination of the
integration of querying conditions inside the mining phase, and the incremental
querying of already generated associations. We present several concrete
algorithms and compare their performance.Comment: A preliminary report on this work was presented at the Second
International Conference on Knowledge Discovery and Data Mining (DaWaK 2000
A Tight Upper Bound on the Number of Candidate Patterns
In the context of mining for frequent patterns using the standard levelwise
algorithm, the following question arises: given the current level and the
current set of frequent patterns, what is the maximal number of candidate
patterns that can be generated on the next level? We answer this question by
providing a tight upper bound, derived from a combinatorial result from the
sixties by Kruskal and Katona. Our result is useful to reduce the number of
database scans
Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency
Subsequence matching has appeared to be an ideal approach for solving many
problems related to the fields of data mining and similarity retrieval. It has
been shown that almost any data class (audio, image, biometrics, signals) is or
can be represented by some kind of time series or string of symbols, which can
be seen as an input for various subsequence matching approaches. The variety of
data types, specific tasks and their partial or full solutions is so wide that
the choice, implementation and parametrization of a suitable solution for a
given task might be complicated and time-consuming; a possibly fruitful
combination of fragments from different research areas may not be obvious nor
easy to realize. The leading authors of this field also mention the
implementation bias that makes difficult a proper comparison of competing
approaches. Therefore we present a new generic Subsequence Matching Framework
(SMF) that tries to overcome the aforementioned problems by a uniform frame
that simplifies and speeds up the design, development and evaluation of
subsequence matching related systems. We identify several relatively separate
subtasks solved differently over the literature and SMF enables to combine them
in straightforward manner achieving new quality and efficiency. This framework
can be used in many application domains and its components can be reused
effectively. Its strictly modular architecture and openness enables also
involvement of efficient solutions from different fields, for instance
efficient metric-based indexes. This is an extended version of a paper
published on DEXA 2012.Comment: This is an extended version of a paper published on DEXA 201
- …