
    Exploratory topic modeling with distributional semantics

    As we continue to collect and store textual data in a multitude of domains, we are regularly confronted with material whose largely unknown thematic structure we want to uncover. With unsupervised, exploratory analysis, no prior knowledge about the content is required and highly open-ended tasks can be supported. In the past few years, probabilistic topic modeling has emerged as a popular approach to this problem. Nevertheless, the representation of the latent topics as aggregations of semi-coherent terms limits their interpretability and level of detail. This paper presents an alternative approach to topic modeling that maps topics as a network for exploration, based on distributional semantics using learned word vectors. From the granular level of terms and their semantic similarity relations, global topic structures emerge as clustered regions and gradients of concepts. Moreover, the paper discusses the visual interactive representation of the topic map, which plays an important role in supporting its exploration. Comment: Conference: The Fourteenth International Symposium on Intelligent Data Analysis (IDA 2015)
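    The core idea of a term network built from word vectors can be illustrated with a minimal sketch, assuming pre-trained vectors are already available (the toy 3-dimensional vectors below are made up). It links terms whose cosine similarity reaches a threshold (0.8 here, an arbitrary choice) and reports connected components as candidate topic regions. This is a simplification for illustration, not the clustering or layout method used in the paper.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_term_network(vectors, threshold=0.8):
    """Link every pair of terms whose cosine similarity is >= threshold."""
    edges = {term: set() for term in vectors}
    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            edges[a].add(b)
            edges[b].add(a)
    return edges

def topic_regions(edges):
    """Return the connected components of the term network, each sorted."""
    seen, regions = set(), []
    for start in sorted(edges):
        if start in seen:
            continue
        stack, region = [start], set()
        while stack:  # iterative depth-first traversal
            term = stack.pop()
            if term in region:
                continue
            region.add(term)
            stack.extend(edges[term] - region)
        seen |= region
        regions.append(sorted(region))
    return regions

# Hypothetical toy "word vectors"; real ones would come from a trained model.
vectors = {
    "galaxy": [0.9, 0.1, 0.0],
    "star":   [0.8, 0.2, 0.1],
    "planet": [0.85, 0.15, 0.05],
    "injury": [0.0, 0.9, 0.3],
    "trauma": [0.1, 0.85, 0.35],
}
print(topic_regions(build_term_network(vectors)))
# → [['galaxy', 'planet', 'star'], ['injury', 'trauma']]
```

    A single hard threshold discards the "gradients of concepts" mentioned above; keeping edge weights and using a weighted clustering or force-directed layout would retain them.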

    Gaia Data Release 1: the archive visualisation service

    Context. The first Gaia data release (DR1) delivered a catalogue of astrometry and photometry for over a billion astronomical sources. Within the panoply of methods used for data exploration, visualisation is often the starting point and even the guiding reference for scientific thought. However, this is a volume of data that cannot be efficiently explored using traditional tools, techniques, and habits. Aims. We aim to provide a global visual exploration service for the Gaia archive, something that is not possible out of the box for most people. The service has two main goals. The first is to provide a software platform for interactive visual exploration of the archive contents, using common personal computers and mobile devices available to most users. The second aim is to produce intelligible and appealing visual representations of the enormous information content of the archive. Methods. The interactive exploration service follows a client-server design. The server runs close to the data, at the archive, and is responsible for hiding as far as possible the complexity and volume of the Gaia data from the client. This is achieved by serving visual detail on demand. Levels of detail are pre-computed using data aggregation and subsampling techniques. For DR1, the client is a web application that provides an interactive multi-panel visualisation workspace as well as a graphical user interface. Results. The Gaia archive Visualisation Service offers a web-based multi-panel interactive visualisation desktop in a browser tab. It currently provides highly configurable 1D histograms and 2D scatter plots of Gaia DR1 and the Tycho-Gaia Astrometric Solution (TGAS) with linked views. An innovative feature is the creation of ADQL queries from visually defined regions in plots. These visual queries are ready for use in the Gaia Archive Search/data retrieval service.
In addition, regions around user-selected objects can be further examined with automatically generated SIMBAD searches. The integration of the Aladin Lite and JS9 applications adds support for the visualisation of HiPS and FITS maps. The production of the all-sky source density map that became the iconic image of Gaia DR1 is described in detail. Conclusions. On the day of DR1, over seven thousand users accessed the Gaia Archive visualisation portal. The system, running on a single machine, proved robust and did not fail while enabling thousands of users to visualise and explore the over one billion sources in DR1. There are still several limitations, most notably that users may only choose from a list of pre-computed visualisations. Thus, other visualisation applications that can complement the archive service are examined. Finally, development plans for Data Release 2 are presented.
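    The idea of pre-computed levels of detail can be illustrated with a density-grid pyramid: sources are counted into a fine grid once, and each coarser level is obtained by summing 2x2 blocks of cells, so a client can be sent only the level that matches its current zoom. This is a hypothetical sketch on a plain equirectangular (ra, dec) grid; the tiling scheme, grid sizes and function names are assumptions, not the archive's actual implementation.

```python
def density_grid(points, nx, ny, x_range=(0.0, 360.0), y_range=(-90.0, 90.0)):
    """Count (ra, dec) points, in degrees, into an nx-by-ny grid (rows = dec bands)."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        # Points are assumed to lie inside the ranges; the upper edge is clamped
        # into the last cell so that ra = 360 or dec = 90 still lands in the grid.
        i = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
        j = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
        grid[j][i] += 1
    return grid

def coarsen(grid):
    """Sum each 2x2 block of cells into one cell (grid dimensions must be even)."""
    return [
        [grid[2 * j][2 * i] + grid[2 * j][2 * i + 1]
         + grid[2 * j + 1][2 * i] + grid[2 * j + 1][2 * i + 1]
         for i in range(len(grid[0]) // 2)]
        for j in range(len(grid) // 2)
    ]

def build_pyramid(points, nx, ny, levels):
    """Precompute `levels` density grids, finest first, by repeated 2x2 aggregation."""
    pyramid = [density_grid(points, nx, ny)]
    for _ in range(levels - 1):
        pyramid.append(coarsen(pyramid[-1]))
    return pyramid

# Made-up (ra, dec) positions standing in for catalogue sources.
points = [(10.0, 5.0), (200.0, -30.0), (359.9, 89.9), (15.0, 6.0)]
pyramid = build_pyramid(points, nx=8, ny=4, levels=3)
print(pyramid[-1])  # coarsest level: [[2, 2]]
```

    Every level preserves the total source count, so coarse views stay consistent with fine ones while the amount of data sent to the client shrinks by a factor of four per level.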

    Evaluating web-based static, animated and interactive maps for injury prevention

    This is the final version of the article. Available from PAGEpress via the DOI in this record. Public health planning can benefit from visual exploration and analysis of geospatial data. Maps and geovisualization tools must be developed with the user group in mind. User-needs assessment and usability testing are crucial elements in the iterative process of map design and implementation. This study presents the results of a usability test, conducted with a sample of potential end-users in Toronto, Canada, of static, animated and interactive maps of injury rates and socio-demographic determinants of injury. The results of the user testing suggest that different map types are useful for different purposes and for satisfying the varying skill levels of individual users. The static maps were deemed easy to use and versatile, while the animated maps could be made more useful if animation controls were provided. The split-screen concept of the interactive maps was highlighted as particularly effective for map comparison. Overall, interactive maps were identified as the preferred map type for comparing patterns of injury and related socio-demographic risk factors. Information collected from the user tests is being used to expand and refine the injury web maps for Toronto, and could inform other public health-related geovisualization projects. Partial funding for this project was provided by the Natural Sciences and Engineering Research Council of Canada and the Canadian Institutes of Health Research.

    Blaeu: Mapping and navigating large tables with cluster analysis

    Blaeu is an interactive database exploration tool. Its aim is to guide casual users through large data tables, ultimately triggering insights and serendipity. To do so, it relies on a double cluster analysis mechanism. First, it clusters the data vertically: it detects themes, groups of mutually dependent columns that each highlight one aspect of the data. Then it clusters the data horizontally: for each theme, it produces a data map, an interactive visualization of the clusters in the table. The data maps summarize the data. They provide a visual synopsis of the clusters, as well as facilities to inspect their content and annotate them. But they also let users navigate further: users can change the active set of columns or drill down into the clusters to refine their selection. Our prototype is fully operational and ready to deliver insights from complex databases.
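    The two-step (vertical, then horizontal) clustering can be sketched as follows. This is a hypothetical illustration rather than Blaeu's actual algorithms: column dependency is approximated by absolute Pearson correlation above an arbitrary threshold to form themes, and rows are then grouped with a plain k-means restricted to one theme's columns. The table and its column names are made up.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns (0 if constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def detect_themes(table, threshold=0.7):
    """Vertical step: merge columns whose |correlation| >= threshold (union-find)."""
    parent = {c: c for c in table}

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    for a, b in combinations(table, 2):
        if abs(pearson(table[a], table[b])) >= threshold:
            parent[find(a)] = find(b)
    themes = {}
    for c in table:
        themes.setdefault(find(c), []).append(c)
    return sorted(sorted(cols) for cols in themes.values())

def cluster_rows(table, columns, k=2, iterations=10):
    """Horizontal step: k-means on one theme's columns, seeded with the first k rows."""
    rows = list(zip(*(table[c] for c in columns)))
    centers = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(r, centers[j])) for r in rows]
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centers[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels

# Made-up table: "age" and "income" move together, "shoe" does not.
table = {
    "age":    [25, 30, 35, 60, 65, 70],
    "income": [20, 25, 30, 60, 65, 70],
    "shoe":   [40, 44, 41, 43, 40, 44],
}
themes = detect_themes(table)
print(themes)                             # → [['age', 'income'], ['shoe']]
print(cluster_rows(table, themes[0]))     # → [0, 0, 0, 1, 1, 1]
```

    Clustering rows separately per theme is what lets each data map summarize one aspect of the table instead of mixing unrelated columns in a single distance measure.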

    Interactive tag maps and tag clouds for the multiscale exploration of large spatio-temporal datasets

    'Tag clouds' and 'tag maps' are introduced to represent geographically referenced text. In combination, these aspatial and spatial views are used to explore a large structured spatio-temporal data set by providing overviews and filtering by text and geography. Prototypes are implemented using freely available technologies including Google Earth and Yahoo!'s Tag Map applet. The interactive tag map and tag cloud techniques and the rapid prototyping method used are informally evaluated through the successes and limitations encountered. Preliminary evaluation suggests that the techniques may be useful for generating insights when visualizing large data sets containing geo-referenced text strings. The rapid prototyping approach enabled the techniques to be developed and evaluated, leading to a geovisualization through which a number of ideas were generated. Limitations of this approach are reflected upon. Tag placement, generalisation and prominence at different scales are issues that came to light in this study and warrant further work.
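    The difference between the aspatial and spatial views can be sketched under simple assumptions: records are (latitude, longitude, tag) triples, a tag cloud sizes each tag by its frequency over the whole data set, and a tag map sizes tags separately within each geographic grid cell. The linear font scaling, the 10 to 32 pt range and the 1° cells are illustrative choices, not the behaviour of the Yahoo! applet or the Google Earth prototypes.

```python
from collections import Counter, defaultdict

def font_sizes(counts, min_pt=10, max_pt=32):
    """Map tag frequencies linearly onto a font-size range in points.

    If all counts are equal, every tag gets min_pt.
    """
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {tag: round(min_pt + (n - lo) * (max_pt - min_pt) / span)
            for tag, n in counts.items()}

def tag_cloud(records):
    """Aspatial view: one size per tag over the whole data set."""
    return font_sizes(Counter(tag for _, _, tag in records))

def tag_map(records, cell_deg=1.0):
    """Spatial view: tags sized separately within each lat/lon grid cell."""
    cells = defaultdict(Counter)
    for lat, lon, tag in records:
        cells[(int(lat // cell_deg), int(lon // cell_deg))][tag] += 1
    return {cell: font_sizes(counts) for cell, counts in cells.items()}

# Made-up geo-referenced text records: (lat, lon, tag).
records = [
    (51.5, -0.1, "pizza"), (51.5, -0.1, "pizza"), (51.6, -0.2, "taxi"),
    (53.4, -2.2, "taxi"), (53.4, -2.2, "taxi"), (53.4, -2.2, "taxi"),
]
print(tag_cloud(records))  # → {'pizza': 10, 'taxi': 32}
print(tag_map(records))    # "pizza" dominates only in the first cell
```

    The example shows why both views help: "taxi" dominates the global cloud, while the tag map reveals that "pizza" is the most prominent term in one locality.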

    Visual and interactive exploration of point data

    Point data, such as Unit Postcodes (UPC), can provide very detailed information at fine scales of resolution. For instance, socio-economic attributes are commonly assigned to UPC, so they can be represented as points and observed at the postcode level. Using UPC as a common field allows variables from disparate data sources to be concatenated, which can potentially support sophisticated spatial analysis. However, visualising UPC in urban areas has at least three limitations. First, at small scales UPC occurrences can be very dense, making their visualisation as points difficult; conversely, patterns in the associated attribute values are often hard to recognise at large scales. Secondly, because UPC serve as a common field for concatenating data sets with an associated postcode, the resulting data are highly multivariate. Finally, socio-economic variables assigned to UPC (such as the ones used here) can have non-Normal distributions as a result of a large proportion of zero values and high variances, which constrains their analysis using traditional statistics. This paper discusses a Point Visualisation Tool (PVT), a proof-of-concept system developed to visually explore point data. Various well-known visualisation techniques were implemented to enable interactive and dynamic interrogation of the data. PVT provides multiple representations of point data to facilitate the understanding of the relations between attributes or variables as well as their spatial characteristics. Brushing between alternative views is used to link several representations of a single attribute, as well as to explore more than one variable simultaneously. PVT’s functionality shows how visual techniques embedded in an interactive environment enable the exploration of large amounts of multivariate point data.
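    Brushing between linked views can be modelled as a single selection of record identifiers shared by every view: a range selected on one attribute highlights the same records wherever they are drawn. The class below is a hypothetical minimal model, not PVT's implementation; the postcode-like identifiers and attribute values are made up.

```python
class LinkedViews:
    """Hypothetical model of brushing: all views share one selection of record ids."""

    def __init__(self, records):
        self.records = records   # record id -> {attribute: value}
        self.views = []          # (view name, attribute shown by that view)
        self.selected = set()    # ids currently brushed

    def add_view(self, name, attribute):
        self.views.append((name, attribute))

    def brush(self, attribute, lo, hi):
        """Select every record whose `attribute` lies in [lo, hi]."""
        self.selected = {rid for rid, rec in self.records.items()
                         if lo <= rec[attribute] <= hi}

    def render(self):
        """For each view, list (id, value, highlighted) in id order."""
        return {name: [(rid, self.records[rid][attr], rid in self.selected)
                       for rid in sorted(self.records)]
                for name, attr in self.views}

# Made-up point records keyed by hypothetical postcode-like ids.
records = {
    "UPC-1": {"unemployed": 0, "easting": 100},
    "UPC-2": {"unemployed": 12, "easting": 150},
    "UPC-3": {"unemployed": 30, "easting": 200},
}
views = LinkedViews(records)
views.add_view("histogram", "unemployed")
views.add_view("map", "easting")
views.brush("unemployed", 10, 40)   # brush a range in the histogram
print(views.render()["map"])
# → [('UPC-1', 100, False), ('UPC-2', 150, True), ('UPC-3', 200, True)]
```

    Because the selection lives outside any single view, adding a further representation (for example a scatter plot of a second variable) requires no change to the brushing logic.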

    Information maps: tools for document exploration


    Interactive visual exploration of a large spatio-temporal dataset: Reflections on a geovisualization mashup

    Exploratory visual analysis is useful for the preliminary investigation of large structured, multifaceted spatio-temporal datasets. This process requires the selection and aggregation of records by time, space and attribute, the ability to transform data and the flexibility to apply appropriate visual encodings and interactions. We propose an approach inspired by geographical 'mashups' in which freely available functionality and data are loosely but flexibly combined using de facto exchange standards. Our case study combines MySQL, PHP and the LandSerf GIS to allow Google Earth to be used for visual synthesis and interaction with encodings described in KML. This approach is applied to the exploration of a log of 1.42 million requests made of a mobile directory service. Novel combinations of interaction and visual encoding are developed, including spatial 'tag clouds', 'tag maps', 'data dials' and multi-scale density surfaces. Four aspects of the approach are informally evaluated: the visual encodings employed, their success in the visual exploration of the dataset, the specific tools used and the 'mashup' approach. Preliminary findings will be beneficial to others considering using mashups for visualization. The specific techniques developed may be more widely applied to offer insights into the structure of multifarious spatio-temporal data of the type explored here.
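    The aggregate-then-encode step of such a mashup can be sketched in a few lines: request locations are counted per spatial grid cell and each cell is written as a KML Placemark that Google Earth can display. The paper's pipeline used MySQL, PHP and LandSerf; this Python version, its 0.1° cell size, the sample coordinates and the function names are assumptions for illustration only. KML expects coordinates in longitude,latitude order.

```python
from collections import Counter
from xml.sax.saxutils import escape

def aggregate(requests, cell_deg):
    """Count (lat, lon) requests per grid cell, keyed by integer cell indices."""
    counts = Counter()
    for lat, lon in requests:
        counts[(int(lat // cell_deg), int(lon // cell_deg))] += 1
    return counts

def to_kml(counts, cell_deg):
    """Write one Placemark per cell at the cell centre (KML uses lon,lat order)."""
    placemarks = []
    for (i, j), n in sorted(counts.items()):
        lat, lon = (i + 0.5) * cell_deg, (j + 0.5) * cell_deg
        name = escape(f"{n} requests")
        placemarks.append(
            f"  <Placemark><name>{name}</name>"
            f"<Point><coordinates>{lon:.4f},{lat:.4f}</coordinates></Point></Placemark>"
        )
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n'
            + "\n".join(placemarks)
            + "\n</Document>\n</kml>\n")

# Made-up request locations (lat, lon).
requests = [(51.51, -0.12), (51.52, -0.13), (53.48, -2.24)]
kml = to_kml(aggregate(requests, 0.1), 0.1)
print(kml)
```

    Keeping the aggregation separate from the KML writer mirrors the loose coupling of the mashup idea: either side can be swapped (a database query, a different encoding) without touching the other.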

    Using treemaps for variable selection in spatio-temporal visualisation

    We demonstrate and reflect upon the use of enhanced treemaps that incorporate spatial and temporal ordering for exploring a large multivariate spatio-temporal data set. The resulting data-dense views summarise and simultaneously present hundreds of space-, time-, and variable-constrained subsets of a large multivariate data set in a structure that facilitates their meaningful comparison and supports visual analysis. Interactive techniques allow localised patterns to be explored and subsets of interest selected and compared with the spatial aggregate. Spatial variation is considered through interactive raster maps and high-resolution local road maps. The techniques are developed in the context of 42.2 million records of vehicular activity in a 98 km² area of central London and informally evaluated through a design used in the exploratory visualisation of this data set. The main advantages of our technique are the means to simultaneously display hundreds of summaries of the data and to interactively browse hundreds of variable combinations with ordering and symbolism that are consistent and appropriate for space- and time-based variables. These capabilities are difficult to achieve in the case of spatio-temporal data with categorical attributes using existing geovisualisation methods. We acknowledge limitations in the treemap representation but enhance the cognitive plausibility of this popular layout through our two-dimensional ordering algorithm and interactions. Patterns that are expected (e.g. more traffic in central London), interesting (e.g. the spatial and temporal distribution of particular vehicle types) and anomalous (e.g. low speeds on particular road sections) are detected at various scales and locations using the approach. In many cases, anomalies identify biases that may have implications for future use of the data set for analyses and applications.
Ordered treemaps appear to have potential as interactive interfaces for variable selection in spatio-temporal visualisation. Information Visualization (2008) 7, 210-224. doi: 10.1057/palgrave.ivs.950018
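    Order-preserving treemap layouts can be illustrated with slice-and-dice, the simplest layout that keeps siblings in their input order, alternating horizontal and vertical splits at each level of the hierarchy. It is shown here only as a baseline on hypothetical vehicle counts by time period and vehicle type; it is not the authors' two-dimensional ordering algorithm.

```python
def size(node):
    """Total value of a node: a leaf's number or the sum of its children."""
    _, content = node
    if isinstance(content, (int, float)):
        return content
    return sum(size(child) for child in content)

def slice_and_dice(node, x, y, w, h, depth=0, prefix="", out=None):
    """Lay out `node` in the rectangle (x, y, w, h), keeping sibling order.

    Even depths split left-to-right, odd depths top-to-bottom, so the input
    order of children (e.g. time of day) is preserved along one axis per level.
    Returns a list of (path, x, y, w, h) for the leaves.
    """
    if out is None:
        out = []
    label, content = node
    path = f"{prefix}/{label}" if prefix else label
    if isinstance(content, (int, float)):
        out.append((path, x, y, w, h))
        return out
    total = size(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in content:
        frac = size(child) / total if total else 0.0
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, w * frac, h, depth + 1, path, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, h * frac, depth + 1, path, out)
        offset += frac
    return out

# Hypothetical counts: time period first (ordered AM -> PM), then vehicle type.
tree = ("all", [("AM", [("car", 3), ("bus", 1)]),
                ("PM", [("car", 2), ("bus", 2)])])
for rect in slice_and_dice(tree, 0, 0, 100, 100):
    print(rect)
```

    Slice-and-dice tends to produce thin rectangles for deep or uneven hierarchies, which is one motivation for more elaborate ordered layouts.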