8 research outputs found

    Automatically Discovering the Number of Clusters in Web Page Datasets

    Clustering is well suited to Web mining because it automatically organizes Web pages into categories, each containing pages with similar content. However, clustering lacks general methods for automatically determining the number of categories or clusters, and no existing method is suitable for Web page clustering in particular. To address this problem, we identify a constant factor that characterizes the Web domain and, based on it, propose a new method for automatically determining the number of clusters in Web page data sets. In all our experiments, the average inter-cluster similarity reached a constant of 1.7 when clustering produced its best results. We therefore use this constant as the stopping factor in our clustering process: individual Web pages are arranged into clusters, the clusters are arranged into larger clusters, and so on until the average inter-cluster similarity approaches the constant. Combining the method described in this paper with our new Bidirectional Hierarchical Clustering algorithm, reported elsewhere, we have developed a clustering system suitable for mining the Web.
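The stopping rule in the abstract above can be illustrated with a minimal sketch. Everything concrete here is an assumption, since the abstract does not define its similarity measure or its merge strategy: the sketch uses cosine similarity between cluster centroids, merges the most similar pair at each step, and stops once the average inter-cluster similarity falls to a caller-supplied `stop_value`. The paper's constant of 1.7 belongs to its own measure and cannot be reused with cosine similarity, which never exceeds 1. The function names are hypothetical, and this is not the authors' Bidirectional Hierarchical Clustering algorithm.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two vectors (assumed measure, not the paper's)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(cluster):
    """Component-wise mean of the vectors in a cluster."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]


def avg_inter_cluster_similarity(clusters):
    """Mean centroid-to-centroid similarity over all pairs of clusters."""
    pairs = list(combinations(clusters, 2))
    if not pairs:
        return 0.0
    return sum(cosine(centroid(a), centroid(b)) for a, b in pairs) / len(pairs)


def cluster_until(points, stop_value):
    """Merge the most similar pair of clusters until the average
    inter-cluster similarity is no longer above stop_value."""
    clusters = [[p] for p in points]
    while len(clusters) > 1 and avg_inter_cluster_similarity(clusters) > stop_value:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cosine(centroid(clusters[ij[0]]), centroid(clusters[ij[1]])),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


pages = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]  # toy term vectors
result = cluster_until(pages, stop_value=0.3)
print(len(result), "clusters")
```

The number of clusters is never passed in; it falls out of where the threshold stops the merging, which is the idea the abstract describes.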

    WINVR09-701 VIRTUAL PROTOTYPING BY USING HOLOGRAPHIC DISPLAYS - BUT WHAT ABOUT LARGE DATA PROBLEMS?

    The work reported in this paper is part of research exploring the viability of using holographic displays within a virtual prototyping package of supporting tools. The focus is principally on handling large data problems, which are common to most volumetric displays. The paper first reviews related work. After describing the large-data problems that designers might face when using holographic displays and identifying the conceptual design tasks that these displays could support, it introduces a concept for handling large data problems by eliminating irrelevant image details. An application example showing how some elements of the proposed concept function in the real world is also presented. The main contributions of this work are: (i) we have demonstrated that simplifications, visual abstractions, data clustering, or other generalization methods can produce less complicated holographic images that require fewer computing resources yet are still suitable 3D images for some conceptual design or virtual prototyping tasks; and (ii) we have defined the steps of a scalable high-level algorithm that can be expanded or tuned to suit visualization demands in various conceptual design tasks. In ongoing work, we aim to develop built-in procedures within the proposed algorithm that reduce the amount of image detail without significantly affecting the appropriateness of the overall virtual model. With reduced image detail, less complicated 3D virtual objects can be displayed, saving computing resources. Keywords: 3D product visualization, virtual prototyping, volumetric displays, computer-aided virtual design, conceptual design

    Hierarchical Model-Based Clustering of Large Datasets Through Fractionation and Refractionation.

    No full text
    The goal of clustering is to identify distinct groups in a dataset. Compared to non-parametric clustering methods such as complete linkage, hierarchical model-based clustering has the advantage of offering a way to estimate the number of groups present in the data. However, its computational cost is quadratic in the number of items to be clustered, so it is not applicable to large problems. We review an idea called Fractionation, originally conceived by Cutting, Karger, Pedersen and Tukey for non-parametric hierarchical clustering of large datasets, and describe an adaptation of Fractionation to model-based clustering. A further extension, called Refractionation, leads to a procedure that can succeed even in the difficult situation where there are large numbers of small groups.
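The abstract above does not spell out how Fractionation works, so the following is only a rough sketch of the general idea as it is commonly described: split the data into fixed-size fractions, cluster each fraction separately with the expensive method, replace each cluster by a summary point, and repeat on the summaries until they fit in a single fraction. The details are assumptions made for illustration: a plain centroid-linkage agglomeration stands in for model-based clustering, cluster means serve as summaries, and the parameter names `fraction_size` and `reduction` are hypothetical. Refractionation is not shown.

```python
def mean(cluster):
    """Component-wise mean of a list of points, returned as a tuple."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def sq_dist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def agglomerate(points, k):
    """Naive centroid-linkage agglomeration down to k clusters.
    A stand-in for the expensive clusterer; its cost grows quickly with len(points)."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: sq_dist(mean(clusters[ij[0]]), mean(clusters[ij[1]])))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters


def fractionate(points, fraction_size, reduction, k):
    """Cluster a large point set by repeatedly clustering small fractions.
    Returns k cluster centers (unweighted means, a simplification)."""
    if fraction_size < 2 or not 0 < reduction < 1:
        raise ValueError("need fraction_size >= 2 and 0 < reduction < 1")
    items = list(points)
    while len(items) > fraction_size:
        summaries = []
        for start in range(0, len(items), fraction_size):
            fraction = items[start:start + fraction_size]
            n_clusters = max(1, int(len(fraction) * reduction))
            # Only fraction_size items are clustered at once, which bounds the cost.
            summaries.extend(mean(c) for c in agglomerate(fraction, n_clusters))
        items = summaries
    return [mean(c) for c in agglomerate(items, k)]


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers = fractionate(data, fraction_size=4, reduction=0.5, k=2)
print(sorted(centers))
```

Because the expensive clusterer only ever sees `fraction_size` items at a time, the quadratic cost applies per fraction rather than to the whole dataset.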

    Escalabilidad de algoritmos on-line con optimización de ram [Scalability of Online Algorithms with RAM Optimization]

    Over the years, humans have tried to endow machines with a degree of intelligence so that they can automatically perform tasks normally carried out by people, with the aim of deploying them in real environments that demand high reliability and performance. One such research area is Pattern Recognition (PR), which attempts to reproduce the biological process through algorithms that help a machine identify and classify objects as a human does. With automation, the continuous growth of data, and the demand for information, the use of PR systems has spread to the point that the ability to recognize patterns is now of great importance in many areas, allowing tasks to be carried out quickly and effectively [Ripley, 2008]. Applications of pattern recognition include computer vision; speech, character, and image recognition; computer-aided diagnosis; and data mining, which can uncover relevant information hidden in a data set.