8 research outputs found

    Interactive rendering and efficient querying for large multivariate seismic volumes on consumer level PCs

    Pre-print. We present a volume visualization method that allows interactive rendering and efficient querying of large multivariate seismic volume data on consumer-level PCs. The volume rendering pipeline uses a virtual memory structure that supports out-of-core multivariate multi-resolution data and a GPU-based ray caster that allows interactive multivariate transfer function design. A Gaussian mixture model representation is precomputed, and nearly interactive querying is achieved by testing the Gaussian functions against user-defined transfer functions on the GPU at runtime. Finally, the method has been tested on a multivariate 3D seismic dataset that is larger than the main memory of the testing machine.

    Comparative Uncertainty Visualization for High-Level Analysis of Scalar- and Vector-Valued Ensembles

    With this thesis, I contribute to the research field of uncertainty visualization, considering parameter dependencies in multi-valued fields and the uncertainty of automated data analysis. Like uncertainty visualization in general, both of these fields are becoming more and more important due to increasing computational power, the growing importance and availability of complex models and collected data, and progress in artificial intelligence. I contribute in the following application areas: Uncertain Topology of Scalar Field Ensembles. The generalization of topology-based visualizations to multi-valued data involves many challenges. An example is the comparative visualization of multiple contour trees, complicated by the random nature of prevalent contour tree layout algorithms. I present a novel approach for the comparative visualization of contour trees: the Fuzzy Contour Tree. Uncertain Topological Features in Time-Dependent Scalar Fields. Tracking features in time-dependent scalar fields is an active field of research, where most approaches rely on the comparison of consecutive time steps. I created a more holistic visualization for time-varying scalar field topology by adapting Fuzzy Contour Trees to the time-dependent setting. Uncertain Trajectories in Vector Field Ensembles. Visitation maps are an intuitive and well-known visualization of uncertain trajectories in vector field ensembles. For large ensembles, visitation maps are either not applicable or require extensive computation time. I developed Visitation Graphs, a new representation and data reduction method for vector field ensembles that can be calculated in situ and is an optimal basis for the efficient generation of visitation maps. This is accomplished by moving computation into the pre-processing stage. Visually Supported Anomaly Detection in Cyber Security. 
Numerous cyber attacks and the increasing complexity of networks and their protection necessitate the application of automated data analysis in cyber security. Due to uncertainty in automated anomaly detection, the results need to be communicated to analysts to ensure appropriate reactions. I introduce a visualization system combining device readings and anomaly detection results: the Security in Process System. To further support analysts, I developed an application-agnostic framework that supports the integration of knowledge assistance and applied it to the Security in Process System. I present this Knowledge Rocks Framework, its application, and the results of evaluations for both the original and the knowledge-assisted Security in Process System. For all presented systems, I provide implementation details, illustrations and applications.

    A Data-driven Methodology Towards Mobility- and Traffic-related Big Spatiotemporal Data Frameworks

    The human population is increasing at unprecedented rates, particularly in urban areas. This increase, along with the rise of a more economically empowered middle class, brings new and complex challenges to the mobility of people within urban areas. To tackle such challenges, transportation and mobility authorities and operators are trying to adopt innovative Big Data-driven Mobility- and Traffic-related solutions. Such solutions will help decision-making processes that aim to ease the load on an already overloaded transport infrastructure. The information collected from day-to-day mobility and traffic can help to mitigate some of these mobility challenges in urban areas. Road infrastructure and traffic management operators (RITMOs) face several limitations in effectively extracting value from the exponentially growing volumes of mobility- and traffic-related Big Spatiotemporal Data (MobiTrafficBD) that are being acquired and gathered. Research on the topics of Big Data, Spatiotemporal Data and especially MobiTrafficBD is scattered, and existing literature does not offer a concrete, common methodological approach to set up, configure, deploy and use a complete Big Data-based framework to manage the lifecycle of mobility-related spatiotemporal data, mainly focused on geo-referenced time series (GRTS) and spatiotemporal events (ST Events), extract value from it and support decision-making processes of RITMOs. This doctoral thesis proposes a data-driven, prescriptive methodological approach towards the design, development and deployment of MobiTrafficBD Frameworks focused on GRTS and ST Events. 
Besides a thorough literature review on Spatiotemporal Data, Big Data and the merging of these two fields through MobiTrafficBD, the methodological approach comprises a set of general characteristics, technical requirements, logical components, data flows and technological infrastructure models, as well as guidelines and best practices that aim to guide researchers, practitioners and stakeholders, such as RITMOs, throughout the design, development and deployment phases of any MobiTrafficBD Framework. This work is intended to be a supporting methodological guide, based on widely used Reference Architectures and guidelines for Big Data, but enriched with inherent characteristics and concerns brought about by Big Spatiotemporal Data, such as in the case of GRTS and ST Events. The proposed methodology was evaluated and demonstrated in various real-world use cases that deployed MobiTrafficBD-based Data Management, Processing, Analytics and Visualisation methods, tools and technologies, under the umbrella of several research projects funded by the European Commission and the Portuguese Government.

    Monitoring, analysis and optimisation of I/O in parallel applications

    High performance computing (HPC) is changing the way science is performed in the 21st century; experiments that once took enormous amounts of time, were dangerous and often produced inaccurate results can now be performed and refined in a fraction of the time in a simulation environment. Current generation supercomputers are running in excess of 10^16 floating point operations per second, and the push towards exascale will see this increase by two orders of magnitude. To achieve this level of performance it is thought that applications may have to scale to potentially billions of simultaneous threads, pushing hardware to its limits and severely impacting failure rates. To reduce the cost of these failures, many applications use checkpointing to periodically save their state to persistent storage, such that, in the event of a failure, computation can be restarted without significant data loss. As computational power has grown by approximately 2x every 18 to 24 months, persistent storage has lagged behind; checkpointing is fast becoming a bottleneck to performance. Several software and hardware solutions have been presented to solve the current I/O problem being experienced in the HPC community and this thesis examines some of these. Specifically, this thesis presents a tool designed for analysing and optimising the I/O behaviour of scientific applications, as well as a tool designed to allow the rapid analysis of one software solution to the problem of parallel I/O, namely the parallel log-structured file system (PLFS). This thesis ends with an analysis of a modern Lustre file system under contention from multiple applications and multiple compute nodes running the same problem through PLFS. The results and analysis presented outline a framework through which application settings and procurement decisions can be made.

    Constraint Programming-based Job Dispatching for Modern HPC Applications

    A High-Performance Computing job dispatcher is a critical piece of software that assigns the finite computing resources to submitted jobs. This resource assignment over time is known as the on-line job dispatching problem in HPC systems. The fact that the problem is on-line means that solutions must be computed in real time, and the time required cannot exceed some threshold, so as not to affect normal system functioning. In addition, a job dispatcher must deal with a lot of uncertainty: submission times, the number of requested resources, and the duration of jobs. Heuristic-based techniques have been broadly used in HPC systems, at the cost of achieving (sub-)optimal solutions in a short time. However, the scheduling and resource allocation components are separated, which generates a decoupled decision that may cause a performance loss. Optimization-based techniques are less used for this problem, although they can significantly improve the performance of HPC systems at the expense of higher computation time. Nowadays, HPC systems are being used for modern applications, such as big data analytics and predictive model building, that employ, in general, many short jobs. However, this information is unknown at dispatching time, and job dispatchers need to process large numbers of them quickly while ensuring high Quality-of-Service (QoS) levels. Constraint Programming (CP) has been shown to be an effective approach to tackle job dispatching problems. However, state-of-the-art CP-based job dispatchers are unable to satisfy the challenges of on-line dispatching, such as generating dispatching decisions in a brief period and integrating current and past information of the housing system. 
For these reasons, we propose CP-based dispatchers that are more suitable for HPC systems running modern applications, that generate on-line dispatching decisions in a timely manner, and that are able to make effective use of job duration predictions to improve QoS levels, especially for workloads dominated by short jobs.

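    New Concept for Scalable, Exploratory Analysis of Large Time Series Data with Application to Extensive Power Grid Measurement Data (original title: Neues Konzept zur skalierbaren, explorativen Analyse großer Zeitreihendaten mit Anwendung auf umfangreiche Stromnetz-Messdaten)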

    This thesis deals with the development and application of a new concept for the scalable exploratory analysis of large time series data. To this end, numerous data-intensive methods from data mining and time series analysis are examined with respect to their scalability as data volume grows, and new methods and data representations are presented that allow the exploration of very large time series data that cannot be evaluated efficiently with conventional methods and that can be classified as Big Data. Methods for managing and visualizing large multivariate time series are combined with methods for detecting rare and frequent patterns, so-called discords and motifs, and integrated into a powerful exploration system named ViAT (Visual Analysis of Time series). To enable analyses of time series data whose volume amounts to hundreds of terabytes and more, a data-parallel distributed processing approach based on Apache Hadoop was developed. It allows the derivation of data-reduced metadata containing statistical properties and novel structural descriptions of the time series. On this basis, new content-based queries and evaluations become possible, as do searches for known and previously unknown patterns in the data. The design of the newly developed methods and their integration into an overall system named FraScaTi (Framework for Scalable management and analysis of Time series data) is presented. The system is evaluated and tested in the application field of power grid analysis, which benefits from the scalability and the novel analysis capabilities. For this purpose, an exploratory analysis of high-frequency power grid measurement data is carried out, and its results are presented and discussed in the context of the application domain.

    Explorative coastal oceanographic visual analytics : oceans of data

    The widely acknowledged challenge to data analysis and understanding, resulting from the exponential increase in volumes of data generated by increasingly complex modelling and sampling systems, is a problem experienced by many researchers, including ocean scientists. The thesis explores a visualization and visual analytics solution for predictive studies of modelled coastal shelf and estuarine hydrodynamics, undertaken to understand sea level rise, as a contribution to wider climate change studies, and to underpin coastal zone planning, flood prevention and extreme event management. But these studies are complex and require numerous simulations of estuarine hydrodynamics, generating extremely large datasets of multi-field data. This type of data is acknowledged as difficult to visualize and analyse, as its numerous attributes present significant computational challenges, and ideally require a wide range of approaches to provide the necessary insight. These challenges are not easily overcome with the current visualization and analysis methodologies employed by coastal shelf hydrodynamic researchers, who use several software systems to generate graphs, each taking considerable time to operate; it is therefore difficult to explore different scenarios and to explore the data interactively and visually. The thesis, therefore, develops novel visualization and visual analytics techniques to help researchers overcome the limitations of existing methods (for example in understanding key tidal components), analyse data in a timely manner and explore different scenarios. There were a number of challenges to this: the size of the data, resulting in lengthy computing time, and many data values being plotted on one pixel (overplotting). 
The thesis presents: (1) a new visualization framework (VINCA) using caching and hierarchical aggregation techniques to make the data more interactive, plus explorative, coordinated multiple views, to enable the scientists to explore the data. (2) A novel estuarine transect profiler and flux tool, which provides instantaneous flux calculations across an estuary. Measures of flux are of great significance in oceanographic studies, yet are notoriously difficult and time consuming to calculate with the commonly used tools. This derived data is added back into the database for further investigation and analysis. (3) New views, including a novel, dynamic, spatially aggregated Parallel Coordinate Plot (Sa-PCP), are developed to provide different perspectives of the spatial, time-dependent data, along with methodologies for developing high-quality (journal-ready) output from the visualization tool. Finally, (4) the dissertation explores the use of hierarchical data structures and caching techniques to enable fast analysis on a desktop computer and to overcome the overplotting challenge for this data.