8 research outputs found

    Doctor of Philosophy

    Dissertation. With the ever-increasing amount of available computing resources and sensing devices, a wide variety of high-dimensional datasets are being produced in numerous fields. The complexity and increasing popularity of these data have led to new challenges and opportunities in visualization. Since most display devices are limited to communication through two-dimensional (2D) images, many visualization methods rely on 2D projections to express high-dimensional information. Such a reduction of dimension leads to an explosion in the number of 2D representations required to visualize high-dimensional spaces, each giving only a glimpse of the high-dimensional information. As a result, one of the most important challenges in visualizing high-dimensional datasets is the automatic filtration and summarization of the large exploration space consisting of all 2D projections. In this dissertation, a new type of algorithm is introduced that reduces the exploration space by identifying a small set of projections that capture the intrinsic structure of high-dimensional data. In addition, a general framework for summarizing the structure of quality measures in the space of all linear 2D projections is presented. However, identifying the representative or informative projections is only part of the challenge. Due to the high-dimensional nature of these datasets, obtaining insights and arriving at conclusions based solely on 2D representations is limited and prone to error. How to interpret the inaccuracies and resolve the ambiguity in the 2D projections is the other half of the puzzle. This dissertation introduces projection distortion error measures and interactive manipulation schemes that allow the understanding of high-dimensional structures via data manipulation in 2D projections.

    Cognitive Foundations for Visual Analytics


    New visualization of surfaces in parallel coordinates: eliminating ambiguity and some “over-plotting”

    A point P ∈ ℝⁿ is represented in Parallel Coordinates by a polygonal line P̄ (see [Ins99] for a recent survey). Earlier [Ins85], a surface σ was represented as the envelope of the polygonal lines representing its points. This is ambiguous in the sense that different surfaces can produce the same envelopes. Here the ambiguity is eliminated by considering the surface σ as the envelope of its tangent planes and, in turn, representing each of these planes by n-1 points [Ins99]. This, with some future extension, can yield a new and unambiguous representation, σ̄, of the surface consisting of n-1 planar regions whose properties lead to the recognition of the surface's properties (i.e. developable, ruled, etc. [Hun92]) and classification criteria. It is further shown that the image (i.e. representation) of an algebraic surface of degree 2 in ℝⁿ is a region whose boundary is also an algebraic curve of degree 2. This includes some non-convex surfaces which could not be treated with the previous ambiguous representation. An efficient construction algorithm for the representation of the quadratic surfaces (given either by explicit or implicit equation) is provided. The results obtained are suitable for applications, to be presented in a future paper, and in particular for the approximation of complex surfaces based on their planar images. An additional benefit is the elimination of the “over-plotting” problem, i.e. the “bunching” of polygonal lines which often obscures part of the parallel-coordinate display.

    A visual analytics approach for visualisation and knowledge discovery from time-varying personal life data

    A thesis submitted to the University of Bedfordshire, in fulfilment of the requirements for the degree of Doctor of Philosophy. Today, the importance of big data from lifestyles and work activities has been the focus of much research. At the same time, advances in modern sensor technologies have enabled self-logging of a significant number of daily activities and movements. Lifestyle logging produces a wide variety of personal data along the lifespan of individuals, including locations, movements, travel distance, step counts and the like, and can be useful in many areas such as healthcare, personal life management, memory recall, and socialisation. However, the amount of obtainable personal life logging data has increased enormously and stands in need of effective processing, analysis, and visualisation to provide hidden insights, owing to the lack of semantic information (particularly in spatiotemporal data), complexity, large volume of trivial records, and absence of effective information visualisation on a large scale. Meanwhile, new technologies such as visual analytics have emerged with great potential in data mining and visualisation to overcome the challenges in handling such data and to support individuals in many aspects of their life. Thus, this thesis contemplates the importance of scalability and conducts a comprehensive investigation into visual analytics and its impact on the process of knowledge discovery from the European Commission project MyHealthAvatar at the Centre for Visualisation and Data Analytics by actively involving individuals in order to establish a credible reasoning and effectual interactive visualisation of such multivariate data with particular focus on lifestyle and personal events.
To this end, this work widely reviews the foremost existing work on data mining (with particular focus on semantic enrichment and ranking), data visualisation (of time-oriented, personal, and spatiotemporal data), and methodical evaluations of such approaches. Subsequently, a novel automated place annotation is introduced with multilevel probabilistic latent semantic analysis to automatically attach relevant information to the collected personal spatiotemporal data with low or no semantic information in order to address the inadequate information, which is essential for the process of knowledge discovery. Correspondingly, a multi-significance event ranking model is introduced by involving a number of factors as well as individuals' preferences, which can influence the result within the process of analysis towards credible and high-quality knowledge discovery. The data mining models are assessed in terms of accuracy and performance. The results showed that both models are highly capable of enriching the raw data and providing significant events based on user preferences. An interactive visualisation is also designed and implemented, including a set of novel visual components significantly based upon human perception and attentiveness to visualise the extracted knowledge. Each visual component is evaluated iteratively based on usability and perceptibility in order to enhance the visualisation towards reaching the goal of this thesis. Lastly, three integrated visual analytics tools (platforms) are designed and implemented in order to demonstrate how the data mining models and interactive visualisation can be exploited to support different aspects of personal life, such as lifestyle, life pattern, and memory recall (reminiscence). The result of the evaluation for the three integrated visual analytics tools showed that this visual analytics approach can deliver a remarkable experience in gaining knowledge and supporting the users' life in certain aspects.

    Text in Visualization: Extending the Visualization Design Space

    This thesis is a systematic exploration and expansion of the design space of data visualization specifically with regard to text. A critical analysis of text in data visualizations reveals gaps in existing frameworks and the use of text in practice. A cross-disciplinary review across fields such as typography, cartography and technical applications yields typographic techniques to encode data into text and provides the scope for the expanded design space. Mapping new attributes, techniques and considerations back to well understood visualization principles organizes the design space of text in visualization. This design space includes: 1) text as a primary data type literally encoded into alphanumeric glyphs, 2) typographic attributes, such as bold and italic, capable of encoding additional data onto literal text, 3) scope of mark, ranging from individual glyphs, syllables and words to sentences, paragraphs and documents, and 4) layout of these text elements, applicable to most known visualization techniques and text-specific techniques such as tables. This is the primary contribution of this thesis (Part A and B). Then, this design space is used to facilitate the design, implementation and evaluation of new types of visualization techniques, ranging from enhancements of existing techniques, such as extending scatterplots and graphs with literal marks, stem & leaf plots with multivariate glyphs and broader scope, and microtext line charts, to new visualization techniques, such as multivariate typographic thematic maps, text formatted to facilitate skimming, and proportionally encoding quantitative values in running text, all of which are new contributions to the field (Part C). Finally, a broad evaluation across the framework and the sample visualizations with cross-discipline expert critiques and a metrics-based approach reveals some concerns and many opportunities pointing towards a breadth of future research work now possible with this new framework (Part D and E).

    Development of an Advanced Molecular Profiling Pipeline for Human Population Screening

    The interaction between a human’s genes and their environment is dynamic, producing phenotypes that are subject to variance among individuals and across time. Metabolic interpretation of phenotypes, including the elucidation of underlying biochemical causes and effects for physiological or pathological processes, allows for the potential discovery of biomarkers and diagnostics which are important in understanding human health and disease. The study of large cohorts has been pursued in hopes of gaining sufficient statistical power to observe subtle biochemical processes relevant to human phenotypes. In order to minimise the effects of analytical variance in metabolic profiling and maximise extractable information, it is necessary to develop a refined analytical approach to large-scale metabolic profiling that allows for efficient and high-quality collection of data, facilitating analysis on a scale appropriate for molecular epidemiology applications. The analytical methods used for the multidimensional separation and detection of metabolic content from complex biofluids must be made fit for this purpose, deriving data with unprecedented reproducibility for direct comparison of metabolic profiles across thousands of individuals. Furthermore, computational methods must be established for collating these data into a form that is suitable for analysis and interpretation without compromising the quality achieved in the raw data. These developments together constitute a pipeline for large-scale analysis, the components of which are explored and refined herein with a common thread of improving laboratory efficiency and measurement precision. Complementary chromatographic methods are developed and implemented in the separation of human urine samples, and further mated to separation and detection by mass spectrometry to provide information-rich metabolic maps.
This system is optimised to derive precision from sustained analysis, with emphasis on minimisation of sample batching, thereby allowing the development of metabolite collation tools that leverage the chromatographic reproducibility. Finally, the challenge of metabolite identification in molecular profiling is conceptually addressed in a manner that does not preclude the further reinvention of the analytical approaches established within this thesis. In summary, the thesis offers a novel and practical analytical pipeline suitable for achieving high-quality population phenotyping and metabolome-wide association studies.