19 research outputs found

    Efficient Mining of Heterogeneous Star-Structured Data

    Many real-world clustering problems arising in data mining applications are heterogeneous in nature. Heterogeneous co-clustering involves simultaneous clustering of objects of two or more data types. While pairwise co-clustering of two data types has been well studied in the literature, research on high-order heterogeneous co-clustering is still limited. In this paper, we propose a graph theoretical framework for addressing star-structured co-clustering problems in which a central data type is connected to all the other data types. Partitioning this graph leads to co-clustering of all the data types under the constraints of the star structure. Although a graph partitioning approach has been adopted before to address star-structured heterogeneous complex problems, the main contribution of this work lies in an efficient algorithm that we propose for partitioning the star-structured graph. Computationally, our algorithm is very quick, as it requires only a simple solution to a sparse system of overdetermined linear equations. Theoretical analysis and extensive experiments performed on toy and real datasets demonstrate the quality, efficiency and stability of the proposed algorithm.
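
    The abstract states that the method reduces to solving a sparse overdetermined linear system, but it does not give the system itself. As a rough illustration only, the sketch below solves a generic overdetermined system in the least-squares sense through the normal equations. The function names, the dict-based sparse row format and the toy equations are assumptions made for this example, not the authors' formulation.

```python
def solve_linear(M, v):
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [M[i][:] + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def least_squares(sparse_rows, b, n_cols):
    """Minimize ||A x - b||^2, where each row of A is a dict {column: value}.

    Builds the normal equations (A^T A) x = A^T b touching only nonzero entries.
    """
    AtA = [[0.0] * n_cols for _ in range(n_cols)]
    Atb = [0.0] * n_cols
    for row, b_i in zip(sparse_rows, b):
        for j, a_j in row.items():
            Atb[j] += a_j * b_i
            for k, a_k in row.items():
                AtA[j][k] += a_j * a_k
    return solve_linear(AtA, Atb)


# Hypothetical toy system: four equations, two unknowns.
rows = [{0: 1.0}, {1: 1.0}, {0: 1.0, 1: 1.0}, {0: 1.0, 1: -1.0}]
rhs = [1.0, 2.0, 3.0, -1.0]
solution = least_squares(rows, rhs, n_cols=2)
```

    For large sparse problems, an iterative sparse least-squares solver would normally replace the dense elimination step used here.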

    A Relational Approach for Efficient Service Selection

    Web services are gaining momentum as a major vehicle for delivering business functionality on the Web. More and more business organizations have begun to use Web services to facilitate user interactions and collaboration among themselves. This forms a large service space that keeps growing. Meanwhile, functionality may overlap among different service providers. The concept of Quality of Web Service (QoWS) is emerging as a key feature for distinguishing between competing service providers. In this paper, we present a systematic approach for efficient service selection that uses QoWS as the major criterion. In particular, we adopt a relational approach that stores QoWS information in a relational DBMS and leverages standard relational operators for efficient service selection. We perform a preliminary set of experiments to evaluate the proposed service selection algorithms.
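
    To make the relational idea concrete, the following is a minimal sketch that stores QoWS attributes in a table and selects services with ordinary SQL operators (selection, ordering, limit). The schema, the attribute names (latency_ms, availability, fee), the sample rows and the ranking rule are all hypothetical; the abstract does not specify them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE qows (
        service      TEXT PRIMARY KEY,
        operation    TEXT NOT NULL,
        latency_ms   REAL NOT NULL,
        availability REAL NOT NULL,
        fee          REAL NOT NULL
    )
    """
)
# Hypothetical providers offering overlapping functionality.
conn.executemany(
    "INSERT INTO qows VALUES (?, ?, ?, ?, ?)",
    [
        ("quoteA", "getQuote", 120.0, 0.990, 0.05),
        ("quoteB", "getQuote", 80.0, 0.950, 0.08),
        ("quoteC", "getQuote", 300.0, 0.999, 0.01),
        ("mapA", "getMap", 50.0, 0.990, 0.02),
    ],
)


def select_services(conn, operation, max_latency_ms, min_availability, limit=1):
    """Return up to `limit` services for an operation that satisfy the QoWS
    bounds, cheapest first (ties broken by lower latency)."""
    rows = conn.execute(
        """
        SELECT service
        FROM qows
        WHERE operation = ?
          AND latency_ms <= ?
          AND availability >= ?
        ORDER BY fee ASC, latency_ms ASC
        LIMIT ?
        """,
        (operation, max_latency_ms, min_availability, limit),
    ).fetchall()
    return [row[0] for row in rows]


chosen = select_services(conn, "getQuote", 200.0, 0.9, limit=2)
```

    Here quoteC is filtered out by the latency bound, and the remaining candidates are ordered by fee.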

    Clustering Web Images with Multi-modal Features

    Web image clustering has drawn significant attention in the research community recently. However, not much work has been done on using multi-modal information for clustering Web images. In this paper, we address the problem of Web image clustering by simultaneous integration of visual and textual features from a graph partitioning perspective. In particular, we modelled visual features, images, and words from the surrounding text of the images using a tripartite graph. This graph is considered as a fusion of two bipartite graphs that are partitioned simultaneously by the proposed Consistent Isoperimetric High-order Co-clustering (CIHC) framework. Although a similar approach has been adopted before, the main contribution of this work lies in the computational efficiency, the quality of Web image clustering, and the scalability to large image repositories that CIHC is able to achieve. We demonstrate this through experimental results on real Web images.

    Graph theoretical framework for simultaneously integrating visual and textual features for efficient web image clustering

    With the explosive growth of the Web and recent developments in digital media technology, the number of images on the Web has grown tremendously. Consequently, Web image clustering has emerged as an important application. Some of the initial efforts in this direction clustered Web images based either on the visual features of the images or on textual features drawn from the text surrounding them. However, not much work has been done on using multimodal information for clustering Web images. In this paper, we propose a graph theoretical framework for simultaneously integrating visual and textual features for efficient Web image clustering. Specifically, we model visual features, images and words from the surrounding text using a tripartite graph. Partitioning this graph leads to clustering of the Web images. Although a graph partitioning approach has been adopted before, the main contribution of this work lies in a new algorithm that we propose for partitioning the tripartite graph, Consistent Isoperimetric High-order Co-clustering (CIHC). Computationally, CIHC is very quick, as it requires only a simple solution to a sparse system of linear equations. Our theoretical analysis and extensive experiments performed on real Web images demonstrate the performance of CIHC in terms of quality, efficiency and scalability in partitioning the visual feature-image-word tripartite graph.
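
    The abstract does not spell out CIHC, so the sketch below shows only the basic isoperimetric partitioning idea that the name suggests, applied to a toy feature-image-word graph: ground one vertex, solve the reduced Laplacian system L0 x = d0, then sweep a threshold over x and keep the cut with the lowest isoperimetric ratio. The graph, its weights, the choice of ground vertex and all function names are assumptions for illustration; this is not the authors' algorithm.

```python
def solve_linear(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [M[i][:] + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def isoperimetric_partition(nodes, edges, ground):
    """Split a connected weighted graph into two parts (low side, high side)."""
    idx = {name: k for k, name in enumerate(nodes)}
    n = len(nodes)
    W = [[0.0] * n for _ in range(n)]
    for (u, v), w in edges.items():
        W[idx[u]][idx[v]] += w
        W[idx[v]][idx[u]] += w
    deg = [sum(row) for row in W]

    # Reduced Laplacian: drop the row and column of the grounded vertex.
    keep = [k for k in range(n) if k != idx[ground]]
    L0 = [[(deg[i] if i == j else 0.0) - W[i][j] for j in keep] for i in keep]
    x0 = solve_linear(L0, [deg[i] for i in keep])
    x = [0.0] * n  # the grounded vertex keeps potential 0
    for k, value in zip(keep, x0):
        x[k] = value

    # Sweep thresholds over the sorted potentials; keep the best ratio cut.
    order = sorted(range(n), key=lambda k: x[k])
    total = sum(deg)
    best_ratio, best_size = float("inf"), 1
    for size in range(1, n):
        inside = set(order[:size])
        cut = sum(W[i][j] for i in inside for j in range(n) if j not in inside)
        vol = sum(deg[i] for i in inside)
        ratio = cut / min(vol, total - vol)
        if ratio < best_ratio:
            best_ratio, best_size = ratio, size
    low = {nodes[k] for k in order[:best_size]}
    return low, set(nodes) - low


# Hypothetical tripartite graph: features (f*) and words (w*) link only to images (i*).
nodes = ["f0", "f1", "i0", "i1", "i2", "i3", "w0", "w1"]
edges = {
    ("f0", "i0"): 1.0, ("f0", "i1"): 1.0, ("w0", "i0"): 1.0, ("w0", "i1"): 1.0,
    ("f1", "i2"): 1.0, ("f1", "i3"): 1.0, ("w1", "i2"): 1.0, ("w1", "i3"): 1.0,
    ("f0", "i2"): 0.1,  # weak link between the two groups
}
low_side, high_side = isoperimetric_partition(nodes, edges, ground="i3")
```

    On this toy graph the selected cut removes only the weak f0-i2 edge, so each image lands in the same part as the feature and word it is strongly tied to.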

    Incorporating User Provided Constraints into Document Clustering

    Document clustering without any prior knowledge or background information is a challenging problem. In this paper, we propose SS-NMF: a semi-supervised non-negative matrix factorization framework for document clustering. In SS-NMF, users are able to provide supervision for document clustering in terms of pairwise constraints on a few documents specifying whether they “must” or “cannot” be clustered together. Through an iterative algorithm, we perform symmetric tri-factorization of the document-document similarity matrix to infer the document clusters. Theoretically, we show that SS-NMF provides a general framework for semi-supervised clustering and that existing approaches can be considered as special cases of SS-NMF. Through extensive experiments conducted on publicly available data sets, we demonstrate the superior performance of SS-NMF for clustering documents.
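
    The abstract does not say how the pairwise constraints enter the factorization or what the update rules are. As a sketch under stated assumptions, the code below folds "must" and "cannot" constraints into a similarity matrix by adding a reward or subtracting a penalty (clipped at zero to stay non-negative), and then evaluates the symmetric tri-factorization objective ||A - G S G^T||^2 for two candidate cluster assignments. The reward/penalty scheme, the function names and the toy data are hypothetical, and no iterative solver is shown.

```python
def apply_constraints(sim, must_link, cannot_link, reward=1.0, penalty=1.0):
    """Return a copy of the similarity matrix adjusted by pairwise constraints.

    Assumption: must-link pairs gain `reward`; cannot-link pairs lose
    `penalty`, clipped at 0 so the matrix stays non-negative.
    """
    adjusted = [row[:] for row in sim]
    for i, j in must_link:
        adjusted[i][j] += reward
        adjusted[j][i] += reward
    for i, j in cannot_link:
        adjusted[i][j] = max(0.0, adjusted[i][j] - penalty)
        adjusted[j][i] = max(0.0, adjusted[j][i] - penalty)
    return adjusted


def matmul(X, Y):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def reconstruction_error(A, G, S):
    """Squared Frobenius error of the symmetric tri-factorization A ~ G S G^T."""
    G_t = [list(col) for col in zip(*G)]
    approx = matmul(matmul(G, S), G_t)
    n = len(A)
    return sum((A[i][j] - approx[i][j]) ** 2 for i in range(n) for j in range(n))


# Hypothetical similarities between three documents.
sim = [
    [1.0, 0.2, 0.6],
    [0.2, 1.0, 0.1],
    [0.6, 0.1, 1.0],
]
adjusted = apply_constraints(sim, must_link=[(0, 1)], cannot_link=[(0, 2)])

S = [[1.0, 0.0], [0.0, 1.0]]                      # cluster-to-cluster association
G_01 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]       # documents 0 and 1 together
G_02 = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]       # documents 0 and 2 together

err_plain_01 = reconstruction_error(sim, G_01, S)
err_plain_02 = reconstruction_error(sim, G_02, S)
err_constrained_01 = reconstruction_error(adjusted, G_01, S)
err_constrained_02 = reconstruction_error(adjusted, G_02, S)
```

    Without constraints, grouping documents 0 and 2 gives the lower error; after the constraints are applied, grouping documents 0 and 1 fits better, which is the effect user supervision is meant to have.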