80 research outputs found
Search Result Clustering via Randomized Partitioning of Query-Induced Subgraphs
In this paper, we present an approach to search result clustering, using
partitioning of underlying link graph. We define the notion of "query-induced
subgraph" and formulate the problem of search result clustering as a problem of
efficient partitioning of given subgraph into topic-related clusters. Also, we
propose a novel algorithm for approximative partitioning of such graph, which
results in cluster quality comparable to the one obtained by deterministic
algorithms, while operating in more efficient computation time, suitable for
practical implementations. Finally, we present a practical clustering search
engine developed as a part of this research and use it to get results about
real-world performance of proposed concepts.Comment: 16th Telecommunications Forum TELFOR 200
Explicit probabilistic models for databases and networks
Recent work in data mining and related areas has highlighted the importance
of the statistical assessment of data mining results. Crucial to this endeavour
is the choice of a non-trivial null model for the data, to which the found
patterns can be contrasted. The most influential null models proposed so far
are defined in terms of invariants of the null distribution. Such null models
can be used by computation intensive randomization approaches in estimating the
statistical significance of data mining results.
Here, we introduce a methodology to construct non-trivial probabilistic
models based on the maximum entropy (MaxEnt) principle. We show how MaxEnt
models allow for the natural incorporation of prior information. Furthermore,
they satisfy a number of desirable properties of previously introduced
randomization approaches. Lastly, they also have the benefit that they can be
represented explicitly. We argue that our approach can be used for a variety of
data types. However, for concreteness, we have chosen to demonstrate it in
particular for databases and networks.Comment: Submitte
- …