14 research outputs found

    Parallel Mapper

    The construction of Mapper has emerged in the last decade as a powerful and effective topological data analysis tool that approximates and generalizes other topological summaries, such as the Reeb graph, the contour tree, and split and join trees. In this paper, we study the parallel analysis of the construction of Mapper. We give a provably correct parallel algorithm for executing Mapper on multiple processors and compare our approach to a reference sequential Mapper implementation, reporting performance experiments that demonstrate the efficiency of our method.
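    For readers unfamiliar with the construction, below is a minimal sketch of the standard Mapper pipeline, not the paper's implementation: each cover element's preimage is clustered independently, which is exactly the unit of work that a parallel Mapper can distribute across processors. The cover layout, the DBSCAN clusterer, and the process-pool setup are assumptions for illustration.

```python
# Minimal sketch of the standard Mapper construction (not the paper's code).
# Clustering each cover element's preimage is independent work, which is the
# natural unit of parallelism for a parallel Mapper.
from multiprocessing import Pool

import numpy as np
from sklearn.cluster import DBSCAN  # clusterer choice is an assumption


def cover_intervals(fmin, fmax, n=10, overlap=0.3):
    """Overlapping 1-D intervals covering the filter range [fmin, fmax]."""
    length = (fmax - fmin) / (n - (n - 1) * overlap)
    step = length * (1 - overlap)
    return [(fmin + i * step, fmin + i * step + length) for i in range(n)]


def cluster_preimage(args):
    """Cluster the points whose filter value falls in one interval."""
    X, f, (lo, hi) = args
    idx = np.where((f >= lo) & (f <= hi))[0]
    if idx.size == 0:
        return []
    labels = DBSCAN(eps=0.5).fit_predict(X[idx])
    return [frozenset(idx[labels == c]) for c in set(labels) if c != -1]


def parallel_mapper(X, f, n=10, overlap=0.3, workers=4):
    intervals = cover_intervals(f.min(), f.max(), n, overlap)
    with Pool(workers) as pool:  # preimages are clustered in parallel
        clusters = pool.map(cluster_preimage, [(X, f, iv) for iv in intervals])
    nodes = [c for group in clusters for c in group]
    # nerve: connect two clusters whenever they share a data point
    edges = [(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))
             if nodes[i] & nodes[j]]
    return nodes, edges
```

    On platforms that spawn worker processes, `parallel_mapper` should be called under an `if __name__ == "__main__":` guard.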

    PROPOSAL FOR A LANDSCAPE EVALUATION SYSTEM

    This paper describes a system, currently being designed, for both perceptual analysis and aesthetic evaluation of a landscape. The choice of this topic is motivated by the fact that systems related to landscape visibility (e.g. Geographical Information Systems (GIS) for visual impact assessment) are not fully satisfactory when it comes to assessing aesthetic appearance. They mainly analyse geometric aspects, such as the width of visual basins or the interference of visual trajectories, which can be expressed by objective and comparable parameters. For effective landscape knowledge and protection, however, it is important to consider other factors that cannot be easily measured, namely the quality of human perception, i.e. the aesthetic judgements that people can express about a landscape. Based on these considerations, a system has been designed to analyse the elements that can influence the aesthetic judgement of a landscape and thereby simulate the most probable aesthetic judgement. Unlike the way GIS tools generally work, this system does not use maps but perspective views obtained by means of vehicle-mounted cameras, as in mobile mapping technology (MMT). Research into the system described below consisted of two parts: firstly, how to form the database on which the system is based, and secondly, how to use the system. The database contains a large number of views analysed in terms of geometric, qualitative, thematic, topological, and gestalt aspects; the results of these analyses are recorded in tables and augmented with a parameter expressing an aesthetic judgement. This aesthetic judgement is obtained by processing the responses of a group of participants to a sociological and/or neurological survey (e.g. functional magnetic resonance imaging). In the operational phase, a new view will be evaluated by comparison with the views stored in the database: the new view will be given a judgement obtained by processing the judgements of the most similar views. The idea of this system applies both to the assessment of a single view and to the evaluation of territorial contexts. Once this system has been defined, it will have to be tested through practical application.
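    As a concrete illustration of the operational phase described above (a minimal sketch under stated assumptions, not the system's actual design), the judgement for a new view can be estimated by weighting the judgements of the most similar stored views. The feature representation, the Euclidean distance, and the inverse-distance weighting are all assumptions.

```python
# Hedged sketch of the evaluation step: a new view is scored by averaging the
# aesthetic judgements of its nearest stored views in feature space.
import numpy as np


def predict_judgement(new_features, db_features, db_judgements, k=5):
    """k-nearest-neighbour estimate of the aesthetic judgement.

    db_features:   (n, d) array of per-view descriptors
                   (geometric, qualitative, thematic, topological, gestalt)
    db_judgements: (n,) aesthetic scores from the participant survey
    """
    d = np.linalg.norm(db_features - new_features, axis=1)  # Euclidean distance
    nearest = np.argsort(d)[:k]
    w = 1.0 / (d[nearest] + 1e-9)  # closer views weigh more
    return float(np.dot(w, db_judgements[nearest]) / w.sum())
```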

    Cliques are bricks for k-CT graphs

    Many real networks in biology, chemistry, industry, ecological systems, or social networks have an inherent structure of simplicial complexes reflecting many-body interactions. Over the past few decades, a variety of complex systems have been successfully described as networks whose links connect interacting pairs of nodes. Simplicial complexes capture the many-body interactions between two or more nodes and generalize network structures, allowing us to go beyond the framework of pairwise interactions. To analyze the topological and dynamic properties of simplicial complex networks, the closed trail metric is proposed here. In this article, we focus on the evolution of simplicial complex networks from cliques and k-CT graphs, and this approach is used to describe the evolution of real simplicial complex networks. We conclude with a summary of composed k-CT graphs (glued graphs), whose closed trail distances lie in a specified range.
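    For illustration only (the article's formal definitions are not reproduced here), the sketch below composes a graph by gluing cliques on shared vertices, the "bricks" construction the title alludes to. The chaining interface and overlap size are assumptions.

```python
# Illustrative sketch: composing a graph from cliques ("bricks") by gluing
# consecutive cliques on shared vertices.
import networkx as nx


def glue_cliques(clique_sizes, shared=1):
    """Chain cliques of the given sizes, overlapping consecutive cliques on
    `shared` vertices; within each brick of size >= 3, every pair of vertices
    lies on a short closed trail."""
    G, offset = nx.Graph(), 0
    for s in clique_sizes:
        verts = list(range(offset, offset + s))
        G.add_edges_from((u, v) for i, u in enumerate(verts) for v in verts[i + 1:])
        offset += s - shared  # reuse `shared` vertices of the previous clique
    return G


G = glue_cliques([4, 5, 3])
print(G.number_of_nodes(), G.number_of_edges())  # 10 nodes in the glued graph
```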

    Verifying big data topologies by-design: a semi-automated approach

    Big data architectures have been gaining momentum in recent years. For instance, Twitter uses stream processing frameworks like Apache Storm to analyse billions of tweets per minute and learn the trending topics. However, architectures that process big data involve many different components interconnected via semantically different connectors. Such complex architectures make refactoring the applications a difficult task for software architects, as deployed applications might diverge considerably from the initial designs. As an aid to designers and developers, we developed OSTIA (Ordinary Static Topology Inference Analysis), which detects the occurrence of common anti-patterns across big data architectures and exploits software verification techniques on the elicited architectural models. This paper illustrates OSTIA and evaluates its uses and benefits on three industrial-scale case studies.
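    As a hedged illustration of what static topology analysis can look like (the abstract does not describe OSTIA's actual checks or API, so the patterns below are assumptions), one can model a Storm-like topology as a directed graph and flag structural smells:

```python
# Illustrative anti-pattern detection over a streaming topology modelled as a
# directed graph; the specific checks are assumptions, not OSTIA's rules.
import networkx as nx


def find_antipatterns(topology: nx.DiGraph):
    issues = []
    # a stream-processing topology is normally expected to be acyclic
    if not nx.is_directed_acyclic_graph(topology):
        issues.append("cycle: topology contains a feedback loop")
    for node in topology:
        if topology.in_degree(node) > 4:  # heavy fan-in is a classic bottleneck
            issues.append(f"fan-in bottleneck at {node}")
        if topology.degree(node) == 0:    # disconnected component
            issues.append(f"orphan component {node}")
    return issues


T = nx.DiGraph([("spout", "parse"), ("parse", "count"), ("count", "report")])
print(find_antipatterns(T) or "no anti-patterns found")
```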

    Articles indexats publicats per investigadors del Campus de Terrassa: 2017

    This report compiles the 241 papers published by 222 researchers of the Campus de Terrassa in journals indexed in the Journal Citation Reports during 2017.

    Sheaf Theory as a Foundation for Heterogeneous Data Fusion

    A major impediment to scientific progress in many fields is the inability to make sense of the huge amounts of data that have been collected via experiment or computer simulation. This dissertation provides tools to visualize, represent, and analyze the collection of sensors and data all at once in a single combinatorial geometric object. Encoding and translating heterogeneous data into a common language is modeled by supporting objects. In this methodology, the behavior of the system is studied via related geometric objects, based on the detection of noise in the system, possible failures in data exchange, and the recognition of redundant or complementary sensors. Applications of the constructed methodology are described in two case studies: one from wildfire threat monitoring and the other from air traffic monitoring. Both cases are distributed (spatial and temporal) information systems dealing with the temporal and spatial fusion of heterogeneous data obtained from multiple sources, where the schema, availability, and quality vary. The behavior of both systems is explained thoroughly in terms of the detection of failures in the systems and the recognition of redundant and complementary sensors. A comparison between the methodology in this dissertation and alternative methods further verifies the validity of the sheaf-theoretic method, which is seen to have lower computational complexity in both space and time.
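    To make the fusion idea concrete, here is a toy sketch (an assumption-laden illustration, not the dissertation's construction): sensor readings are treated as assignments over overlapping domains, and disagreement on an overlap flags noise, failure, or a sensor worth cross-checking.

```python
# Toy consistency check in the spirit of sheaf-based fusion: sensors report
# values over regions; conflicts on shared regions indicate possible failure.
import itertools

readings = {  # hypothetical sensors -> (covered regions, reported value)
    "satellite": ({"zoneA", "zoneB"}, 41.0),
    "tower_cam": ({"zoneB"}, 43.5),
    "drone":     ({"zoneB", "zoneC"}, 41.2),
}


def consistency_check(readings, tol=1.0):
    """Flag pairs of sensors whose values disagree on a shared region."""
    conflicts = []
    for (s1, (r1, v1)), (s2, (r2, v2)) in itertools.combinations(readings.items(), 2):
        if r1 & r2 and abs(v1 - v2) > tol:  # overlap + disagreement
            conflicts.append((s1, s2, round(abs(v1 - v2), 2)))
    return conflicts


print(consistency_check(readings))  # tower_cam disagrees with both neighbours
```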

    A framework for multidimensional indexes on distributed and highly-available data stores

    Spatial Big Data is considered an essential trend in future scientific and business applications. Indeed, research instruments, medical devices, and social networks generate hundreds of petabytes of spatial data per year. However, as many authors have pointed out, the lack of specialized frameworks for dealing with this kind of data is limiting possible applications and probably precluding many scientific breakthroughs. In this thesis, we describe three HPC scientific applications, ranging from molecular dynamics and neuroscience analysis to physics simulations, where we experienced first-hand the limits of the existing technologies. Building on this experience, we define the desirable missing functionalities and focus on two features that, when combined, significantly improve the way scientific data is analyzed. On one side, scientific simulations generate complex datasets where multiple correlated characteristics describe each item. For instance, a particle might have a space position (x, y, z) at a given time (t). If we want to find all elements within the same area and period, we either have to scan the whole dataset, or we must organize the data so that all items in the same space and time are stored together. The second approach is called Multidimensional Indexing (MI), and it uses different techniques to cluster and organize similar data together. On the other side, approximate analytics has often been indicated as a smart and flexible way to explore large datasets in a short time. Approximate analytics includes a broad family of algorithms which aim to speed up analytical workloads by relaxing the precision of the results within a specific confidence interval. For instance, if we want to know the average age in a group with 1-year precision, we can consider just a random fraction of all the people, thus reducing the amount of calculation. But if we also want fewer I/O operations, we need efficient data sampling, which means organizing data in such a way that we do not need to scan the whole dataset to generate a random sample of it. According to our analysis, combining Multidimensional Indexing with efficient data Sampling (MIS) is a vital feature that is missing from current distributed data management solutions. This thesis aims to fill this gap and provides novel, scalable solutions. First, we describe the existing data management alternatives and motivate our preference for NoSQL key-value databases. Second, we propose an analytical model to study the influence of data models on the scalability and performance of this kind of distributed database. Third, we use the analytical model to design two novel multidimensional indexes with efficient data sampling: the D8tree and the AOTree. Our first solution, the D8tree, improves the state of the art for approximate spatial queries on static, mostly-read datasets. We then enhanced the data ingestion capability of our approach by introducing the AOTree, an algorithm that preserves the query performance of the D8tree even for write-intensive HPC applications. We compared our solution with PostgreSQL and plain storage, and we demonstrate that our proposal has better performance and scalability.
Finally, we describe Qbeast, the novel distributed system that implements the D8tree and the AOTree using NoSQL technologies, and we illustrate how Qbeast simplifies the workflow of scientists in various HPC applications by providing a scalable and integrated solution for data analysis and management.
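    The sketch below illustrates in miniature the combination the thesis argues for: a spatial tree whose inner nodes keep a bounded random sample, so approximate queries can stop early instead of scanning every leaf. It is a 2-D quadtree written for brevity under our own assumptions; the actual D8tree (octree-based) and AOTree designs differ.

```python
# Minimal sketch of multidimensional indexing + embedded sampling: each node
# maintains a fixed-size reservoir sample of all points inserted beneath it.
import random


class SampledQuadtreeNode:  # 2-D for brevity; the D8tree works on octants
    CAP = 8       # leaf capacity before splitting
    SAMPLE = 4    # reservoir size kept at every node

    def __init__(self, x0, y0, x1, y1):
        self.box, self.points, self.children = (x0, y0, x1, y1), [], None
        self.sample, self.n = [], 0

    def insert(self, p):
        self.n += 1
        # reservoir sampling (algorithm R): a uniform sample without rescans
        if len(self.sample) < self.SAMPLE:
            self.sample.append(p)
        elif random.random() < self.SAMPLE / self.n:
            self.sample[random.randrange(self.SAMPLE)] = p
        if self.children:
            self._child(p).insert(p)
        else:
            self.points.append(p)
            if len(self.points) > self.CAP:
                self._split()

    def _split(self):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [SampledQuadtreeNode(*b) for b in
                         ((x0, y0, mx, my), (mx, y0, x1, my),
                          (x0, my, mx, y1), (mx, my, x1, y1))]
        pts, self.points = self.points, []
        for p in pts:
            self._child(p).insert(p)

    def _child(self, p):
        x0, y0, x1, y1 = self.box
        i = (p[0] >= (x0 + x1) / 2) + 2 * (p[1] >= (y0 + y1) / 2)
        return self.children[i]
```

    An approximate "average age"-style query can then read the reservoirs of the nodes covering the query box, touching a handful of nodes instead of every record.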

    Métodos de Machine Learning para Eficiência Energética

    Data centers are facilities housing a significant amount of data storage and the equipment to process it. Due to the rapid growth of the global volume of data (Big Data) and the trend towards processing ever larger quantities of data, managing data centers has become more complex. They are the core of the modern business of large companies today, where cloud usage grows exponentially, making data access and storage faster and more effective, combined with a concept that keeps gaining credibility: the Internet of Things. With these factors driving the rapid growth of Big Data, the equipment in a data center is consequently subjected to heavier utilization. This increased utilization produces more heat in the equipment, which raises temperatures in the data center. One of the largest costs for the organization operating a data center is energy, especially the energy used for cooling. Because of these developments, the energy used for cooling increases drastically, and when this increase is not controlled, energy use becomes quite inefficient, representing an excessive cost for organizations that own data centers. This project presents solutions to combat inefficient energy use in chillers, the equipment that cools data centers. These solutions include implementations of machine learning algorithms that add intelligence to this cooling equipment, with the aim of adapting its operation to the situation at hand. Algorithms of this kind can learn continuously, making it possible to predict future situations and discover interesting patterns so that they can be handled appropriately. Several tests were carried out to validate the implemented algorithms, the main practical objective being to increase the energy efficiency of the chillers and consequently reduce costs for data center owners.
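    As a hedged illustration of the kind of model the project describes (the abstract does not specify features, data, or algorithms, so everything below, including the synthetic data, is an assumption), a regressor can predict chiller power from operating conditions and be used to compare candidate setpoints:

```python
# Sketch: train a regressor on (load, outdoor temp, chilled-water setpoint)
# -> chiller power, then compare predicted power across setpoints.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.uniform(20, 90, n),   # IT load (% of capacity)      -- synthetic
    rng.uniform(15, 35, n),   # outdoor air temperature (C)  -- synthetic
    rng.uniform(6, 14, n),    # chilled-water setpoint (C)   -- synthetic
])
# synthetic stand-in for measured chiller power (kW)
y = 0.9 * X[:, 0] + 2.0 * X[:, 1] - 3.0 * X[:, 2] + rng.normal(0, 5, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("MAE (kW):", mean_absolute_error(y_te, model.predict(X_te)))
# a higher setpoint yields lower predicted power, hinting at a saving
print(model.predict([[70, 30, 8]]), model.predict([[70, 30, 12]]))
```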