2 research outputs found

    Generating Market Basket Data with Temporal Information

    This paper presents a synthetic data generator that outputs timestamped transactional data with embedded temporal patterns controlled by a set of input parameters. In particular, a calendar schema, determined by a hierarchy of input time granularities, serves as a framework for the possible temporal patterns. An example of a calendar schema is (year, month, day), which provides a framework for calendar-based temporal patterns of the form <d3, d2, d1>, where each di is either an integer or the symbol *. For example, <2000, *, 16> is such a pattern, which corresponds to the time intervals consisting of all the 16th days of all months in year 2000. This paper also evaluates the data generator through a series of experiments. The synthetic data generator is intended to support the data mining community in evaluating various aspects (especially the temporal aspects and the scalability) of data mining algorithms.
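    To make the calendar-based pattern idea concrete, the following is a minimal sketch of how a pattern over the (year, month, day) schema, with each position either an integer or a wildcard, could be matched against dates. The names `WILDCARD`, `matches`, and `covered_days` are hypothetical and chosen for illustration; this is not the generator described in the paper, only one possible reading of the pattern semantics given in the abstract.

```python
from datetime import date, timedelta

WILDCARD = "*"  # assumed stand-in for the pattern's "any value" symbol


def matches(pattern, day):
    """Return True if `day` is covered by a (year, month, day) pattern.

    Each pattern position is either an integer or WILDCARD.
    """
    fields = (day.year, day.month, day.day)
    return all(p == WILDCARD or p == f for p, f in zip(pattern, fields))


def covered_days(pattern, start, end):
    """List every date in [start, end] that the pattern covers."""
    n_days = (end - start).days + 1
    candidates = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in candidates if matches(pattern, d)]


# The abstract's example: the 16th day of every month in year 2000.
pattern = (2000, WILDCARD, 16)
days = covered_days(pattern, date(2000, 1, 1), date(2000, 12, 31))
```

    Representing the pattern as a plain tuple keeps the sketch independent of any particular schema depth; a schema with other granularities would need its own mapping from timestamps to fields.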

    The distance–similarity metaphor in network-display spatializations

    Dimensionality reduction algorithms are applied in the field of information visualization to generate spatializations: low-dimensional, visuo-spatial displays of complex, multivariate databases. Most popular dimensionality reduction algorithms project relatedness in data content among entities in an information space (e.g., semantic similarity) onto some form of distance among the entities, such that semantically similar documents are placed closer to one another than less similar ones. In previous studies of point-display spatializations we have shown that people indeed associate metric straight-line inter-point distances with the semantic dissimilarity of documents depicted as points in two-dimensional space. In this paper we investigate the strategies viewers employ when conflicting notions of distance (straight-line metric vs. network metric vs. topological proximity) are jointly shown in a spatialized network display of Reuters news articles depicted as points connected by links. We report empirical results of an experiment where viewers are asked to assess document similarity, depending on various distance types. We also investigate how cartographic symbolization principles (the use of visual variables, such as size, color hue, and value) influence similarity judgments. These findings provide rare empirical evidence for generally accepted design practices within the cartographic community (e.g., the effects of visual variables). In addition, empirical results from this and related studies can be used to develop design guidelines for constructing cognitively adequate spatializations for knowledge discovery in very large databases. We conclude by presenting design guidelines for network spatializations within the context of cartographic practice and theory.
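    The three notions of distance named in the abstract can disagree for the same pair of points. The sketch below illustrates this on a small hypothetical layout. It assumes that "network metric" means the summed length of links along the shortest path and that "topological proximity" means the number of links on the shortest path; the abstract does not define these terms, so both readings are assumptions, as are the node positions, the links, and the function names.

```python
import heapq
import math

# Hypothetical 2-D layout of four documents and the links drawn between them.
positions = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (6.0, 0.0), "D": (3.0, 0.0)}
links = [("A", "B"), ("B", "C"), ("C", "D")]

# Undirected adjacency list built from the link list.
adjacency = {node: [] for node in positions}
for u, v in links:
    adjacency[u].append(v)
    adjacency[v].append(u)


def straight_line(u, v):
    """Euclidean distance between two points in the display."""
    return math.dist(positions[u], positions[v])


def path_cost(source, target, edge_cost):
    """Dijkstra's algorithm: cheapest path cost under a given edge cost."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return cost
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt in adjacency[node]:
            new_cost = cost + edge_cost(node, nxt)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt))
    return math.inf  # target not reachable


euclidean = straight_line("A", "D")                 # ignores the links
network = path_cost("A", "D", straight_line)        # length along the links
hops = path_cost("A", "D", lambda u, v: 1.0)        # number of links traversed
```

    In this layout A and D are close in straight-line terms but far apart along the links, which is the kind of conflict between distance types that the study asks viewers to resolve.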