
    Efficient Point Clustering for Visualization

    The visualization of large spatial point data sets poses a problem with respect to both runtime and quality. A visualization of raw data often leads to occlusion and clutter and thus to a loss of information. Furthermore, mobile devices in particular struggle to display millions of data items. Thinning via sampling is often not the optimal choice because users want to see distributional patterns, cardinalities and outliers. In particular for visual analytics, an aggregation of this type of data is very valuable for providing an interactive user experience. This thesis defines the problem of visual point clustering that leads to proportional circle maps. It furthermore introduces a set of quality measures that assess different aspects of the resulting circle representations. The Circle Merging Quadtree constitutes a novel and efficient method to produce visual point clusterings via aggregation. It outperforms comparable methods in terms of runtime as well as under the aforementioned quality measures. Moreover, the introduction of a preprocessing step leads to further substantial performance improvements and a guaranteed stability of the Circle Merging Quadtree. This thesis furthermore addresses the incorporation of miscellaneous attributes into the aggregation. It discusses means to provide statistical values for numerical and textual attributes that are suitable for side views such as plots and data tables. The incorporation of multiple data sets, or of data sets that contain class attributes, poses another problem for aggregation and visualization. This thesis provides methods for extending the Circle Merging Quadtree to output pie chart maps or maps that contain circle packings. For the latter variant, this thesis provides the results of a user study that investigates the methods and the introduced quality criteria. In the context of providing methods for interactive data visualization, this thesis finally presents the VAT System, where VAT stands for visualization, analysis and transformation. This system constitutes an exploratory geographical information system that implements principles of visual analytics for working with spatio-temporal data. This thesis details the user interface concept for facilitating exploratory analysis and provides the results of two user studies that assess the approach.
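    To make the aggregation idea concrete, the following is a minimal Python sketch of quadtree-based circle aggregation in the spirit described above. The structure (bucketing points into quadtree leaves, emitting one count-proportional circle per leaf, then greedily merging overlapping circles) and all names and constants are illustrative assumptions, not the thesis's actual Circle Merging Quadtree.

```python
# Illustrative sketch only: aggregate 2D points into proportional circles by
# bucketing them into quadtree leaves and greedily merging overlapping circles.
# Names, constants and the merge rule are assumptions, not the thesis's method.
import math
import random
from dataclasses import dataclass

@dataclass
class Circle:
    x: float
    y: float
    count: int

    @property
    def radius(self) -> float:
        # Area-proportional radius; the scale factor is arbitrary for this demo.
        return 0.02 * math.sqrt(self.count)

def leaf_circles(points, depth=4, bounds=(0.0, 0.0, 1.0, 1.0)):
    """Recursively split the bounding box; every non-empty leaf becomes one
    circle placed at the centroid of its points."""
    if not points:
        return []
    x0, y0, x1, y1 = bounds
    if depth == 0 or len(points) == 1:
        cx = sum(p[0] for p in points) / len(points)
        cy = sum(p[1] for p in points) / len(points)
        return [Circle(cx, cy, len(points))]
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    quads = {(qx, qy): [] for qx in (False, True) for qy in (False, True)}
    for p in points:
        quads[(p[0] > mx, p[1] > my)].append(p)
    sub_bounds = {(False, False): (x0, y0, mx, my), (True, False): (mx, y0, x1, my),
                  (False, True): (x0, my, mx, y1), (True, True): (mx, my, x1, y1)}
    out = []
    for key, pts in quads.items():
        out += leaf_circles(pts, depth - 1, sub_bounds[key])
    return out

def merge_overlapping(circles):
    """Greedily merge any two overlapping circles into one whose count is the
    sum and whose centre is the count-weighted centroid."""
    merged = True
    while merged:
        merged = False
        for i in range(len(circles)):
            for j in range(i + 1, len(circles)):
                a, b = circles[i], circles[j]
                if math.hypot(a.x - b.x, a.y - b.y) < a.radius + b.radius:
                    n = a.count + b.count
                    circles[i] = Circle((a.x * a.count + b.x * b.count) / n,
                                        (a.y * a.count + b.y * b.count) / n, n)
                    del circles[j]
                    merged = True
                    break
            if merged:
                break
    return circles

if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(500)]
    circles = merge_overlapping(leaf_circles(pts))
    print(f"{len(circles)} circles aggregate 500 points")
```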

    Accessible software frameworks for reproducible image analysis of host-pathogen interactions

    To understand the mechanisms behind life-threatening diseases, the underlying interactions between host cells and pathogenic microorganisms must be known. Continuous improvements in imaging techniques and computing technologies enable the application of methods from image-based systems biology, which uses modern computer algorithms to precisely measure the behaviour of cells, tissues or whole organs. To meet the standards of digital research data management, algorithms must comply with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) and contribute to their dissemination within the scientific community. This is particularly important for interdisciplinary teams of experimentalists and computer scientists, in which software can improve communication and speed up the adoption of new technologies. In this work, software frameworks were therefore developed that help spread the FAIR principles by providing standardized, reproducible, high-performance, and easily accessible software packages for quantifying interactions in biological systems. In summary, this work shows how software frameworks can contribute to the characterization of interactions between host cells and pathogens by simplifying the design and application of quantitative, FAIR-compliant image analysis programs. These improvements will facilitate future collaborations with life scientists and clinicians, which, following the principle of image-based systems biology, will lead to the development of new experiments, imaging procedures, algorithms, and computational models.

    Dynamics of Hybrid Zones at a Continental Scale

    Hybridization has traditionally been viewed as a happenstance that negatively impacts populations, but it is now recognized as an important evolutionary mechanism that can substantially impact the evolutionary trajectories of gene pools, influence adaptive capacity, and contravene or reinforce divergence. Physiographic processes are important drivers of dispersal, alternately funneling populations into isolation, promoting divergence, or facilitating secondary contact of diverged populations, increasing the potential for hybridization. In North America, glacial-interglacial cycles and geomorphological changes have provided a dynamic backdrop over the last two million years that promoted such oscillations of population contraction and expansion. These biogeographic processes have resulted in regional hybrid zones where hybridization spans generations. Herein, I explored hybrid zones in two species complexes of reptiles across Eastern, Central, and Southwestern North America. Hybrid zones can influence evolutionary trajectories, and understanding the mechanisms underlying their formation is important for defining appropriate management strategies and can help avoid actions that would inadvertently lead to new hybrid zones. Chapter I assessed differential introgression in a complex of terrestrial turtles, the American Box Turtles (Terrapene spp.), from a contemporary hybrid zone in the southeastern United States. Transcriptomic loci were correlated with environmental predictors to evaluate mechanisms engendering maladapted hybrids and adaptive introgression. Selection against hybrids predominated among inter-specific comparisons, whereas directional introgression predominated among conspecifics. Outlier loci also primarily correlated with temperature, reflecting the temperature dependency of ectotherms and underscoring their vulnerability to climate change. Chapter II performed a robust assessment of recently developed machine learning (M-L) approaches to delimit four Terrapene species and evaluated the impact of data filtering and M-L parameter choices. Parameter selections were varied to determine their effects on resolving clusters. The results provide necessary recommendations on using M-L for species delimitation in species complexes defined by secondary contact. These data exemplify the usage of M-L software in a phylogenetically complex group. Chapter III describes an R package to visualize some of the analyses from Chapter I. Current software for generating genomic clines does not include functions to visualize the results. Thus, I wrote an API (application programming interface) that does so and also performs other genomic and geographic cline-related tasks. Chapter IV examines historical and contemporary phylogeographic patterns in the Massasaugas (Sistrurus spp.), a group of dwarf rattlesnakes found across the Southwest and Central Great Plains. In the Southwest, S. tergeminus tergeminus and S. t. edwardsii putatively diverged in the absence of strong physiographic barriers and physical glaciers, suggesting primary divergence. In contrast, a disjunct population of S. t. tergeminus in Missouri reflects potentially historical secondary contact with S. catenatus. These taxa represent contrasting examples of divergence resulting from alternative phylogeographic processes and contextualize evolutionarily significant units and management units. Combined, the four chapters present population genomic data to elucidate the impacts of phylogeographic processes on hybrid zones at a continental scale. 
The data will promote effective conservation management strategies, as many species in the focal regions have been affected by anthropogenic pressures. In this sense, the results can be extrapolated to co-distributed taxa with similar phylogeographic histories.
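    The genomic clines visualized by the Chapter III package can be pictured with a generic logistic cline: the probability of ancestry from one parental taxon at a locus as a function of genome-wide hybrid index. The Python sketch below is a hypothetical illustration with made-up parameters; it is neither the thesis's R package nor its actual cline parameterization.

```python
# Hypothetical illustration of a genomic cline: probability of taxon-A ancestry
# at a locus as a logistic function of genome-wide hybrid index. The parameter
# names (centre, rate) and the curves plotted are illustrative assumptions.
import numpy as np
import matplotlib.pyplot as plt

def logistic_cline(hybrid_index, centre=0.5, rate=10.0):
    """P(taxon-A ancestry at a locus) given the genome-wide hybrid index."""
    return 1.0 / (1.0 + np.exp(-rate * (hybrid_index - centre)))

h = np.linspace(0.0, 1.0, 200)
for centre, rate, label in [(0.5, 10.0, "neutral-like locus"),
                            (0.3, 10.0, "shifted centre (directional introgression)"),
                            (0.5, 25.0, "steep cline (selection against hybrids)")]:
    plt.plot(h, logistic_cline(h, centre, rate), label=label)
plt.xlabel("genome-wide hybrid index")
plt.ylabel("P(taxon-A ancestry at locus)")
plt.legend()
plt.show()
```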

    The Observational Signatures of Cosmic Strings

    Cosmic strings were postulated by Kibble in 1976 and, from a theoretical point of view, their existence finds support in modern superstring theories, both in compactification models and in theories with extended additional dimensions. One of the best observational probes of cosmic strings is the gravitational lensing effect they produce. A first effect is produced by an intervening string along the line of sight, which splits faint background galaxies into two components (double images), thus forming a chain of lensed galaxies along the path of the string. The second optical method is the serendipitous discovery through anomalous lensing of extended objects. The huge ratio between the length and width of a string leads to a sort of step-function signature on the gravitationally lensed images of background sources. The optical search for cosmic string signatures suffers from many spurious effects, mainly because, in order to be effective, the detection of background galaxies needs to be pushed down to very low flux limits. At these flux levels, photometric errors as well as noise statistics increase the number of spurious detections; for instance, an application to the Sloan Digital Sky Survey leads to a huge and unrealistic number of candidate pairs. One way to minimize the contamination introduced into the catalogues by spurious detections is to increase the contrast by selecting pairs in 3D space, i.e. by attributing a redshift estimate to each galaxy. For this purpose, a new method for photometric redshift estimation has been created. The method is based on multiwavelength photometry and on a combination of various data mining techniques developed under the EuroVO and NVO frameworks for data gathering, pre-processing and mining, while relying on the scaling capabilities of the computing grid. This method allowed us to obtain photometric redshifts with an accuracy improved by up to 30% with respect to the literature. The second fundamental observational evidence for cosmic strings is the signature they are expected to leave in the CMB, a signature which may be sought in the available WMAP data and in the soon-to-come Planck data. Theory shows that a moving string should produce a step-like discontinuity of low S/N ratio in the CMB, as a consequence of the Doppler shift due to the relative velocity between the string and the observer, thus causing the temperature distribution to deviate from a Gaussian. Under the simplifying assumption that the string is a straight discontinuity in space-time, we used the S.Co.P.E. computational grid to produce a large number of simulations covering a wide range of values for the velocity of the string, its direction and its distance from the observer. Simulations are produced using a C++ code that generates realistic maps of the CMB temperature distribution in the presence of a straight cosmic string. By varying its characteristic parameters, it is possible to explore the signatures left by various types of moving strings. In order to amplify the step-like discontinuity and smooth the noise, maps are then subjected to a “squeezing” procedure. Subsequently, on the “squeezed” maps, we tested filters that recognize large value differences between nearby pixels. The excellent results of our filter on simulations prompted us to apply it to the WMAP 5-year data.
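    The core of the simulation idea, a Gaussian CMB-like temperature map with a step-like discontinuity along a straight string, can be illustrated with the minimal Python sketch below. It is only a toy analogue of the C++ simulation code described above: map size, noise level, step amplitude and the crude difference filter are arbitrary choices, not values or methods from the thesis.

```python
# Toy illustration: a Gaussian random "CMB" map plus a step-like discontinuity
# along a straight line, mimicking the signature of a moving straight string.
# Amplitude, map size, noise level and filter are arbitrary demo choices.
import numpy as np

def string_step_map(n=256, step_amplitude=0.5, angle_deg=30.0, seed=0):
    rng = np.random.default_rng(seed)
    gaussian_sky = rng.normal(0.0, 1.0, size=(n, n))   # stand-in for CMB fluctuations
    y, x = np.mgrid[0:n, 0:n]
    # Signed distance of each pixel from a straight line through the map centre.
    theta = np.deg2rad(angle_deg)
    signed = (x - n / 2) * np.sin(theta) - (y - n / 2) * np.cos(theta)
    # Pixels on one side of the string are shifted up, the other side down.
    return gaussian_sky + step_amplitude * np.sign(signed)

def neighbour_difference_filter(temperature_map):
    """Crude detector: absolute differences between horizontally adjacent pixels."""
    return np.abs(np.diff(temperature_map, axis=1))

if __name__ == "__main__":
    response = neighbour_difference_filter(string_step_map())
    print("maximum neighbour difference:", response.max())
```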

    Rapid Segmentation Techniques for Cardiac and Neuroimage Analysis

    Recent technological advances in medical imaging have allowed for the quick acquisition of highly resolved data to aid in the diagnosis and characterization of diseases or to guide interventions. In order to be integrated into a clinical workflow, accurate and robust methods of analysis must be developed which manage this increase in data. Recent improvements in inexpensive, commercially available graphics hardware and General-Purpose Programming on Graphics Processing Units (GPGPU) have allowed many large-scale data analysis problems to be addressed in a meaningful amount of time, and will continue to do so as parallel computing technology improves. In this thesis we propose methods to tackle two clinically relevant image segmentation problems: a user-guided segmentation of myocardial scar from Late-Enhancement Magnetic Resonance Images (LE-MRI) and a multi-atlas segmentation pipeline to automatically segment and partition brain tissue from multi-channel MRI. Both methods are based on recent advances in computer vision, in particular max-flow optimization, which aims at solving the segmentation problem in continuous space. This allows (approximately) globally optimal solvers to be employed in multi-region segmentation problems, without the particular drawbacks of their discrete counterparts, graph cuts, which typically present metrication artefacts. Max-flow solvers are generally able to produce robust results, but are known for being computationally expensive, especially with large data sets such as volume images. Additionally, we propose two new deformable registration methods based on Gauss-Newton optimization and smooth the resulting deformation fields via total-variation regularization to guarantee that the problem is mathematically well-posed. We compare the performance of these two methods against four highly ranked and well-known deformable registration methods on four publicly available databases and demonstrate highly accurate performance with low run times. The best performing variant is subsequently used in a multi-atlas segmentation pipeline for the segmentation of brain tissue and facilitates fast run times for this computationally expensive approach. All proposed methods are implemented using GPGPU for a substantial increase in computational performance and thus facilitate deployment into clinical workflows. We evaluate all proposed algorithms in terms of run times, accuracy, repeatability and errors arising from user interactions, and we demonstrate that these methods are able to outperform established methods. The presented approaches demonstrate high performance in comparison with established methods in terms of accuracy and repeatability while largely reducing run times due to the employment of GPU hardware.
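    For context, the convex relaxation that underlies continuous max-flow/min-cut solvers of this kind can be written, for the two-region case, as the generic energy below. This is the standard textbook form with a relaxed labeling function u and per-region data costs, not necessarily the exact multi-region formulation used in the thesis.

```latex
% Generic two-region continuous min-cut model (convex relaxation).
% u(x) in [0,1] is the relaxed labeling, C_s and C_t are the data costs of the
% two regions, and C(x) weights the total-variation term penalizing boundary length.
\min_{u(x)\,\in\,[0,1]} \;
  \int_\Omega \bigl(1 - u(x)\bigr)\, C_s(x)\, \mathrm{d}x
  \;+\; \int_\Omega u(x)\, C_t(x)\, \mathrm{d}x
  \;+\; \int_\Omega C(x)\, \lvert \nabla u(x) \rvert \, \mathrm{d}x
```

Thresholding the minimizer u yields the binary segmentation; in the two-region case this relaxation is exact, while multi-region extensions are only approximately optimal, which matches the "(approximately) globally optimal" wording above.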

    Geoinformatic methodologies and quantitative tools for detecting hotspots and for multicriteria ranking and prioritization: application on biodiversity monitoring and conservation

    Whoever has the responsibility to manage a conservation zone must not only be aware of the area's environmental problems but should also have at their disposal up-to-date data and appropriate methodological instruments to examine each individual case carefully. In effect, the environmental decision-maker has to arrange, in advance, the steps necessary to withstand the foreseeable variations in the trends of human pressure on conservation zones. The essential objective of this thesis is methodological, namely to compare different multivariate statistical methods useful for environmental hotspot detection and for environmental ranking and prioritization. The general environmental goal is the conservation of the biodiversity patrimony. The identification, through multivariate statistical tools, of habitats having top ecological priority is only the first basic step towards this aim. Integrating the ecological information into the human context is an essential further step for making environmental evaluations and planning correct conservation actions. A wide series of data and information has been necessary to accomplish these environmental management tasks. The ecological data are provided by the Italian Ministry of the Environment and come from the Map of Italian Nature Project database. The demographic data derive from the Italian Institute of Statistics (ISTAT). The data refer to two Italian areas: the Baganza Valley (Parma) and the Oltrepò Pavese and Ligurian-Emilian Apennine. The analysis has been carried out at two different spatial levels, ecological-naturalistic (the habitat) and administrative (the Commune), and the main results are, correspondingly: 
    1. Habitat level: comparing two ranking and prioritization methods, the Ideal Vector and the Salience method, using important ecological metrics such as Ecological Value (E.V.) and Ecological Sensitivity (E.S.), gives results that are not directly comparable. Not being based on a ranking of the original values, the Ideal Vector method seems preferable in spatially very heterogeneous landscapes. Conversely, the Salience method is probably to be preferred in ecological landscapes with a low degree of heterogeneity, i.e. where the differences in habitat E.V. and E.S. are not large. 
    2. Commune level: since habitats are only naturalistic subdivisions of a given territory, it is necessary, in order to take management decisions, to move to the corresponding territorial administrative units (the Communes). From this point of view, the introduction of demography is a central element, as well as a novelty, in ecological-environmental analysis. In effect, the demographic analysis makes the result of point 1 much more realistic by introducing other dimensions (current human pressure and its trend) that allow the identification of ecologically fragile areas. Furthermore, this approach clearly identifies the environmental responsibility of each administrative body with respect to the defence of biodiversity: ranking the Communes according to their environmental and demographic features clarifies the management responsibilities of each of them. 
    A concrete application of this necessary and useful integration of ecological and demographic data is discussed by designing an Ecological Network (E.N.). The novelty of the resulting network is that it is not “static” but “dynamic”, in the sense that its planning takes the trend of human pressure into account in order to identify the probable future points of fragility and hence of most critical management.
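    To make the comparison in point 1 concrete, the sketch below contrasts two generic prioritization schemes on hypothetical habitat scores: a distance-to-ideal-point ranking, which works on the original values, and a simple rank-sum scheme, which works on ranks only. Both formulas and the data are illustrative stand-ins, not the thesis's actual Ideal Vector and Salience procedures.

```python
# Illustrative comparison of two prioritization schemes on hypothetical habitat
# scores for Ecological Value (EV) and Ecological Sensitivity (ES).
# Both formulas are generic stand-ins, not the thesis's Ideal Vector / Salience methods.
import math

habitats = {            # hypothetical (EV, ES) scores on a common 0-10 scale
    "H1": (9.0, 3.0),
    "H2": (6.5, 6.0),
    "H3": (4.0, 9.5),
    "H4": (5.0, 5.0),
}

def distance_to_ideal_order(data):
    """Rank by Euclidean distance to the ideal point (max EV, max ES),
    computed on the original values (smaller distance = higher priority)."""
    ideal = (max(v[0] for v in data.values()), max(v[1] for v in data.values()))
    return sorted(data, key=lambda h: math.hypot(ideal[0] - data[h][0],
                                                 ideal[1] - data[h][1]))

def rank_sum_order(data):
    """Rank by the sum of per-criterion ranks (uses ranks, not original values)."""
    ev_rank = {h: i for i, h in enumerate(sorted(data, key=lambda h: -data[h][0]))}
    es_rank = {h: i for i, h in enumerate(sorted(data, key=lambda h: -data[h][1]))}
    return sorted(data, key=lambda h: ev_rank[h] + es_rank[h])

print("distance-to-ideal priority order:", distance_to_ideal_order(habitats))
print("rank-sum priority order:         ", rank_sum_order(habitats))
```

Running the sketch shows that the two schemes can order the same habitats differently, which is the sense in which the results of such families of methods are not directly comparable.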

    Challenges and prospects of spatial machine learning

    The main objective of this thesis is to improve the usefulness of spatial machine learning for the spatial sciences and to allow its unused potential to be exploited. To achieve this objective, this thesis addresses several important but distinct challenges that spatial machine learning faces: the modeling of spatial autocorrelation and spatial heterogeneity, the selection of an appropriate model for a given spatial problem, and the understanding of complex spatial machine learning models.
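    One common, generic way to confront the first of these challenges is to expose spatial structure to an otherwise aspatial learner, for example by adding coordinates or distance-based features. The Python sketch below illustrates that generic idea on synthetic data with scikit-learn; it is not one of the thesis's methods, and the synthetic data, feature choice and model are illustrative assumptions.

```python
# Generic illustration: giving an aspatial learner access to spatial structure
# by adding the coordinates as extra features. Synthetic data; not a thesis method.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000
coords = rng.uniform(0.0, 100.0, size=(n, 2))          # point locations (x, y)
covariate = rng.normal(size=n)
# Target with a smooth spatial trend, a crude stand-in for spatial autocorrelation.
y = (2.0 * covariate
     + np.sin(coords[:, 0] / 15.0)
     + 0.01 * coords[:, 1]
     + rng.normal(scale=0.2, size=n))

X_aspatial = covariate.reshape(-1, 1)
X_spatial = np.column_stack([covariate, coords])        # covariate plus coordinates

for name, X in [("aspatial features", X_aspatial),
                ("with coordinates ", X_spatial)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print(name, "R^2 =", round(r2_score(y_te, model.predict(X_te)), 3))

# Note: a random split is used here only for brevity; with spatially autocorrelated
# data, a spatial cross-validation scheme would give a more honest error estimate.
```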

    A Data-driven Methodology Towards Mobility- and Traffic-related Big Spatiotemporal Data Frameworks

    The human population is increasing at unprecedented rates, particularly in urban areas. This increase, along with the rise of a more economically empowered middle class, brings new and complex challenges to the mobility of people within urban areas. To tackle such challenges, transportation and mobility authorities and operators are trying to adopt innovative Big Data-driven mobility- and traffic-related solutions. Such solutions will help decision-making processes that aim to ease the load on an already overloaded transport infrastructure. The information collected from day-to-day mobility and traffic can help to mitigate some of these mobility challenges in urban areas. Road infrastructure and traffic management operators (RITMOs) face several limitations in effectively extracting value from the exponentially growing volumes of mobility- and traffic-related Big Spatiotemporal Data (MobiTrafficBD) that are being acquired and gathered. Research on the topics of Big Data, Spatiotemporal Data and especially MobiTrafficBD is scattered, and the existing literature does not offer a concrete, common methodological approach to set up, configure, deploy and use a complete Big Data-based framework to manage the lifecycle of mobility-related spatiotemporal data, mainly focused on geo-referenced time series (GRTS) and spatiotemporal events (ST Events), to extract value from it, and to support the decision-making processes of RITMOs. This doctoral thesis proposes a data-driven, prescriptive methodological approach towards the design, development and deployment of MobiTrafficBD Frameworks focused on GRTS and ST Events. Besides a thorough literature review on Spatiotemporal Data, Big Data and the merging of these two fields through MobiTrafficBD, the methodological approach comprises a set of general characteristics, technical requirements, logical components, data flows and technological infrastructure models, as well as guidelines and best practices that aim to guide researchers, practitioners and stakeholders, such as RITMOs, through the design, development and deployment phases of any MobiTrafficBD Framework. This work is intended to be a supporting methodological guide, based on widely used Reference Architectures and guidelines for Big Data, but enriched with the inherent characteristics and concerns brought about by Big Spatiotemporal Data, such as GRTS and ST Events. The proposed methodology was evaluated and demonstrated in various real-world use cases that deployed MobiTrafficBD-based Data Management, Processing, Analytics and Visualisation methods, tools and technologies, under the umbrella of several research projects funded by the European Commission and the Portuguese Government.
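    The two central data shapes named above, geo-referenced time series (GRTS) and spatiotemporal events (ST Events), can be pictured with the minimal hypothetical Python schema below; the class and field names are illustrative assumptions and do not reproduce the thesis's actual data models.

```python
# Hypothetical minimal schema for the two data shapes discussed above:
# a geo-referenced time series (GRTS) and a spatiotemporal event (ST Event).
# Class and field names are illustrative, not the thesis's actual data models.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple

@dataclass
class GeoReferencedTimeSeries:
    """Measurements over time at one fixed, geo-referenced location (e.g. a traffic counter)."""
    sensor_id: str
    location: Tuple[float, float]                  # (latitude, longitude)
    variable: str                                  # e.g. "vehicle_count"
    samples: List[Tuple[datetime, float]] = field(default_factory=list)

    def append(self, timestamp: datetime, value: float) -> None:
        self.samples.append((timestamp, value))

@dataclass
class SpatioTemporalEvent:
    """A discrete occurrence with its own position and time span (e.g. a road incident)."""
    event_id: str
    event_type: str                                # e.g. "accident", "road_closure"
    location: Tuple[float, float]
    start: datetime
    end: datetime
    attributes: Dict[str, str] = field(default_factory=dict)

# Usage example with made-up values.
grts = GeoReferencedTimeSeries("loop-042", (38.736, -9.142), "vehicle_count")
grts.append(datetime(2020, 5, 1, 8, 0), 312.0)
incident = SpatioTemporalEvent("ev-7", "accident", (38.740, -9.150),
                               datetime(2020, 5, 1, 8, 5), datetime(2020, 5, 1, 9, 0),
                               {"severity": "minor"})
print(grts.variable, len(grts.samples), incident.event_type)
```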