246 research outputs found

    Qualitative Spatial Query Processing : Towards Cognitive Geographic Information Systems

    Get PDF
    For a long time, Geographic Information Systems (GISs) have been used by GIS-experts to perform numerous tasks including way finding, mapping, and querying geo-spatial databases. The advancement of Web 2.0 technologies and the development of mobile-based device applications present an excellent opportunity to allow the public -non-expert users- to access information of GISs. However, the interfaces of GISs were mainly designed and developed based on quantitative values of spatial databases to serve GIS-experts, whereas non-expert users usually prefer a qualitative approach to interacting with GISs. For example, humans typically resort to expressions such as the building is near a riverbank or there is a restaurant inside a park which qualitatively locate the spatial entity with respect to another. In other words, the users' interaction with current GISs is still not intuitive and not efficient. This dissertation thusly aims at enabling users to intuitively and efficiently search spatial databases of GISs by means of qualitative relations or terms such as left, north of, or inside. We use these qualitative relations to formalise so-called Qualitative Spatial Queries (QSQs). Aside from existing topological models, we integrate distance and directional qualitative models into Spatial Data-Base Management Systems (SDBMSs) to allow the qualitative and intuitive formalism of queries in GISs. Furthermore, we abstract binary Qualitative Spatial Relations (QSRs) covering the aforementioned aspects of space from the database objects. We store the abstracted QSRs in a Qualitative Spatial Layer (QSL) that we extend into current SDBMSs to avoid the additional cost of the abstraction process when dealing with every single query. Nevertheless, abstracting the QSRs of QSL results in a high space complexity in terms of qualitative representations

    Compression vidéo basée sur l'exploitation d'un décodeur intelligent

    Get PDF
    This Ph.D. thesis studies the novel concept of Smart Decoder (SDec) where the decoder is given the ability to simulate the encoder and is able to conduct the R-D competition similarly as in the encoder. The proposed technique aims to reduce the signaling of competing coding modes and parameters. The general SDec coding scheme and several practical applications are proposed, followed by a long-term approach exploiting machine learning concept in video coding. The SDec coding scheme exploits a complex decoder able to reproduce the choice of the encoder based on causal references, eliminating thus the need to signal coding modes and associated parameters. Several practical applications of the general outline of the SDec scheme are tested, using different coding modes during the competition on the reference blocs. Despite the choice for the SDec reference block being still simple and limited, interesting gains are observed. The long-term research presents an innovative method that further makes use of the processing capacity of the decoder. Machine learning techniques are exploited in video coding with the purpose of reducing the signaling overhead. Practical applications are given, using a classifier based on support vector machine to predict coding modes of a block. The block classification uses causal descriptors which consist of different types of histograms. Significant bit rate savings are obtained, which confirms the potential of the approach.Cette thèse de doctorat étudie le nouveau concept de décodeur intelligent (SDec) dans lequel le décodeur est doté de la possibilité de simuler l’encodeur et est capable de mener la compétition R-D de la même manière qu’au niveau de l’encodeur. Cette technique vise à réduire la signalisation des modes et des paramètres de codage en compétition. Le schéma général de codage SDec ainsi que plusieurs applications pratiques sont proposées, suivis d’une approche en amont qui exploite l’apprentissage automatique pour le codage vidéo. Le schéma de codage SDec exploite un décodeur complexe capable de reproduire le choix de l’encodeur calculé sur des blocs de référence causaux, éliminant ainsi la nécessité de signaler les modes de codage et les paramètres associés. Plusieurs applications pratiques du schéma SDec sont testées, en utilisant différents modes de codage lors de la compétition sur les blocs de référence. Malgré un choix encore simple et limité des blocs de référence, les gains intéressants sont observés. La recherche en amont présente une méthode innovante qui permet d’exploiter davantage la capacité de traitement d’un décodeur. Les techniques d’apprentissage automatique sont exploitées pour but de réduire la signalisation. Les applications pratiques sont données, utilisant un classificateur basé sur les machines à vecteurs de support pour prédire les modes de codage d’un bloc. La classification des blocs utilise des descripteurs causaux qui sont formés à partir de différents types d’histogrammes. Des gains significatifs en débit sont obtenus, confirmant ainsi le potentiel de l’approche

    Low-latency, query-driven analytics over voluminous multidimensional, spatiotemporal datasets

    Get PDF
    2017 Summer.Includes bibliographical references.Ubiquitous data collection from sources such as remote sensing equipment, networked observational devices, location-based services, and sales tracking has led to the accumulation of voluminous datasets; IDC projects that by 2020 we will generate 40 zettabytes of data per year, while Gartner and ABI estimate 20-35 billion new devices will be connected to the Internet in the same time frame. The storage and processing requirements of these datasets far exceed the capabilities of modern computing hardware, which has led to the development of distributed storage frameworks that can scale out by assimilating more computing resources as necessary. While challenging in its own right, storing and managing voluminous datasets is only the precursor to a broader field of study: extracting knowledge, insights, and relationships from the underlying datasets. The basic building block of this knowledge discovery process is analytic queries, encompassing both query instrumentation and evaluation. This dissertation is centered around query-driven exploratory and predictive analytics over voluminous, multidimensional datasets. Both of these types of analysis represent a higher-level abstraction over classical query models; rather than indexing every discrete value for subsequent retrieval, our framework autonomously learns the relationships and interactions between dimensions in the dataset (including time series and geospatial aspects), and makes the information readily available to users. This functionality includes statistical synopses, correlation analysis, hypothesis testing, probabilistic structures, and predictive models that not only enable the discovery of nuanced relationships between dimensions, but also allow future events and trends to be predicted. This requires specialized data structures and partitioning algorithms, along with adaptive reductions in the search space and management of the inherent trade-off between timeliness and accuracy. The algorithms presented in this dissertation were evaluated empirically on real-world geospatial time-series datasets in a production environment, and are broadly applicable across other storage frameworks

    indexing and querying moving objects databases

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Distributed Partitioning and Processing of Large Spatial Datasets

    Full text link
    Data collection is one of the most common practices in today’s world. The data collection rate has rapidly increased over the past decade and is not showing any signs of decline. Data sources are many; the Internet of Things devices, mobile gadgets, social media posts, connected cars, and web servers constantly report on their users’ interactions and habits. Much of the collected data is spatial data which contains attributes that denote the physical origin of the data. As a result of the tremendous growth in data collection, higher demand for new techniques emerged to efficiently process and extract valuable insights in a relatively acceptable time frame. The current standard approach to large-scale data analysis uses distributed parallel processing systems like Apache Hadoop and Apache Spark. However, these systems are designed for general-purpose parallel processing and require an additional layer to recognize and efficiently process spatial datasets. Motivated by its many applications, we examine the several challenges facing spatial data partitioning and processing and propose solutions customized for each task. We detail our techniques for building spatial partitioners over large datasets for use with spatial queries like map-matching and kNN spatial join. Additionally, we present an accuracy benchmarking framework for comparing and classifying the results of two input files based on specific criteria. Our proposed work targets batch processing of large spatial datasets, including structured, unstructured, and semi-structured datasets

    Similarity processing in multi-observation data

    Get PDF
    Many real-world application domains such as sensor-monitoring systems for environmental research or medical diagnostic systems are dealing with data that is represented by multiple observations. In contrast to single-observation data, where each object is assigned to exactly one occurrence, multi-observation data is based on several occurrences that are subject to two key properties: temporal variability and uncertainty. When defining similarity between data objects, these properties play a significant role. In general, methods designed for single-observation data hardly apply for multi-observation data, as they are either not supported by the data models or do not provide sufficiently efficient or effective solutions. Prominent directions incorporating the key properties are the fields of time series, where data is created by temporally successive observations, and uncertain data, where observations are mutually exclusive. This thesis provides research contributions for similarity processing - similarity search and data mining - on time series and uncertain data. The first part of this thesis focuses on similarity processing in time series databases. A variety of similarity measures have recently been proposed that support similarity processing w.r.t. various aspects. In particular, this part deals with time series that consist of periodic occurrences of patterns. Examining an application scenario from the medical domain, a solution for activity recognition is presented. Finally, the extraction of feature vectors allows the application of spatial index structures, which support the acceleration of search and mining tasks resulting in a significant efficiency gain. As feature vectors are potentially of high dimensionality, this part introduces indexing approaches for the high-dimensional space for the full-dimensional case as well as for arbitrary subspaces. The second part of this thesis focuses on similarity processing in probabilistic databases. The presence of uncertainty is inherent in many applications dealing with data collected by sensing devices. Often, the collected information is noisy or incomplete due to measurement or transmission errors. Furthermore, data may be rendered uncertain due to privacy-preserving issues with the presence of confidential information. This creates a number of challenges in terms of effectively and efficiently querying and mining uncertain data. Existing work in this field either neglects the presence of dependencies or provides only approximate results while applying methods designed for certain data. Other approaches dealing with uncertain data are not able to provide efficient solutions. This part presents query processing approaches that outperform existing solutions of probabilistic similarity ranking. This part finally leads to the application of the introduced techniques to data mining tasks, such as the prominent problem of probabilistic frequent itemset mining.Viele Anwendungsgebiete, wie beispielsweise die Umweltforschung oder die medizinische Diagnostik, nutzen Systeme der Sensorüberwachung. Solche Systeme müssen oftmals in der Lage sein, mit Daten umzugehen, welche durch mehrere Beobachtungen repräsentiert werden. Im Gegensatz zu Daten mit nur einer Beobachtung (Single-Observation Data) basieren Daten aus mehreren Beobachtungen (Multi-Observation Data) auf einer Vielzahl von Beobachtungen, welche zwei Schlüsseleigenschaften unterliegen: Zeitliche Veränderlichkeit und Datenunsicherheit. Im Bereich der Ähnlichkeitssuche und im Data Mining spielen diese Eigenschaften eine wichtige Rolle. Gängige Lösungen in diesen Bereichen, die für Single-Observation Data entwickelt wurden, sind in der Regel für den Umgang mit mehreren Beobachtungen pro Objekt nicht anwendbar. Der Grund dafür liegt darin, dass diese Ansätze entweder nicht mit den Datenmodellen vereinbar sind oder keine Lösungen anbieten, die den aktuellen Ansprüchen an Lösungsqualität oder Effizienz genügen. Bekannte Forschungsrichtungen, die sich mit Multi-Observation Data und deren Schlüsseleigenschaften beschäftigen, sind die Analyse von Zeitreihen und die Ähnlichkeitssuche in probabilistischen Datenbanken. Während erstere Richtung eine zeitliche Ordnung der Beobachtungen eines Objekts voraussetzt, basieren unsichere Datenobjekte auf Beobachtungen, die sich gegenseitig bedingen oder ausschließen. Diese Dissertation umfasst aktuelle Forschungsbeiträge aus den beiden genannten Bereichen, wobei Methoden zur Ähnlichkeitssuche und zur Anwendung im Data Mining vorgestellt werden. Der erste Teil dieser Arbeit beschäftigt sich mit Ähnlichkeitssuche und Data Mining in Zeitreihendatenbanken. Insbesondere werden Zeitreihen betrachtet, welche aus periodisch auftretenden Mustern bestehen. Im Kontext eines medizinischen Anwendungsszenarios wird ein Ansatz zur Aktivitätserkennung vorgestellt. Dieser erlaubt mittels Merkmalsextraktion eine effiziente Speicherung und Analyse mit Hilfe von räumlichen Indexstrukturen. Für den Fall hochdimensionaler Merkmalsvektoren stellt dieser Teil zwei Indexierungsmethoden zur Beschleunigung von ähnlichkeitsanfragen vor. Die erste Methode berücksichtigt alle Attribute der Merkmalsvektoren, während die zweite Methode eine Projektion der Anfrage auf eine benutzerdefinierten Unterraum des Vektorraums erlaubt. Im zweiten Teil dieser Arbeit wird die Ähnlichkeitssuche im Kontext probabilistischer Datenbanken behandelt. Daten aus Sensormessungen besitzen häufig Eigenschaften, die einer gewissen Unsicherheit unterliegen. Aufgrund von Mess- oder übertragungsfehlern sind gemessene Werte oftmals unvollständig oder mit Rauschen behaftet. In diversen Szenarien, wie beispielsweise mit persönlichen oder medizinisch vertraulichen Daten, können Daten auch nachträglich von Hand verrauscht werden, so dass eine genaue Rekonstruktion der ursprünglichen Informationen nicht möglich ist. Diese Gegebenheiten stellen Anfragetechniken und Methoden des Data Mining vor einige Herausforderungen. In bestehenden Forschungsarbeiten aus dem Bereich der unsicheren Datenbanken werden diverse Probleme oftmals nicht beachtet. Entweder wird die Präsenz von Abhängigkeiten ignoriert, oder es werden lediglich approximative Lösungen angeboten, welche die Anwendung von Methoden für sichere Daten erlaubt. Andere Ansätze berechnen genaue Lösungen, liefern die Antworten aber nicht in annehmbarer Laufzeit zurück. Dieser Teil der Arbeit präsentiert effiziente Methoden zur Beantwortung von Ähnlichkeitsanfragen, welche die Ergebnisse absteigend nach ihrer Relevanz, also eine Rangliste der Ergebnisse, zurückliefern. Die angewandten Techniken werden schließlich auf Problemstellungen im probabilistischen Data Mining übertragen, um beispielsweise das Problem des Frequent Itemset Mining unter Berücksichtigung des vollen Gehalts an Unsicherheitsinformation zu lösen

    Modeling and Simulation in Engineering

    Get PDF
    This book provides an open platform to establish and share knowledge developed by scholars, scientists, and engineers from all over the world, about various applications of the modeling and simulation in the design process of products, in various engineering fields. The book consists of 12 chapters arranged in two sections (3D Modeling and Virtual Prototyping), reflecting the multidimensionality of applications related to modeling and simulation. Some of the most recent modeling and simulation techniques, as well as some of the most accurate and sophisticated software in treating complex systems, are applied. All the original contributions in this book are jointed by the basic principle of a successful modeling and simulation process: as complex as necessary, and as simple as possible. The idea is to manipulate the simplifying assumptions in a way that reduces the complexity of the model (in order to make a real-time simulation), but without altering the precision of the results

    Introductory Computer Forensics

    Get PDF
    INTERPOL (International Police) built cybercrime programs to keep up with emerging cyber threats, and aims to coordinate and assist international operations for ?ghting crimes involving computers. Although signi?cant international efforts are being made in dealing with cybercrime and cyber-terrorism, ?nding effective, cooperative, and collaborative ways to deal with complicated cases that span multiple jurisdictions has proven dif?cult in practic

    Predictive maintenance of electrical grid assets: internship at EDP Distribuição - Energia S.A

    Get PDF
    Internship Report presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business IntelligenceThis report will describe the activities developed during an internship at EDP Distribuição, focusing on a Predictive Maintenance analytics project directed at high voltage electrical grid assets including Overhead Lines, Power Transformers and Circuit Breakers. The project’s main goal is to support EDP’s asset management processes by improving maintenance and investing planning. The project’s main deliverables are the Probability of Failure metric that forecast asset failures 15 days ahead of time, estimated through supervised machine learning models; the Health Index metric that indicates asset’s current state and condition, implemented though the Ofgem methodology; and two asset management dashboards. The project was implemented by an external service provider, a consultant company, and during the internship it was possible to integrate the team, and participate in the development activities

    Design of a secure architecture for the exchange of biomedical information in m-Health scenarios

    Get PDF
    El paradigma de m-Salud (salud móvil) aboga por la integración masiva de las más avanzadas tecnologías de comunicación, red móvil y sensores en aplicaciones y sistemas de salud, para fomentar el despliegue de un nuevo modelo de atención clínica centrada en el usuario/paciente. Este modelo tiene por objetivos el empoderamiento de los usuarios en la gestión de su propia salud (p.ej. aumentando sus conocimientos, promocionando estilos de vida saludable y previniendo enfermedades), la prestación de una mejor tele-asistencia sanitaria en el hogar para ancianos y pacientes crónicos y una notable disminución del gasto de los Sistemas de Salud gracias a la reducción del número y la duración de las hospitalizaciones. No obstante, estas ventajas, atribuidas a las aplicaciones de m-Salud, suelen venir acompañadas del requisito de un alto grado de disponibilidad de la información biomédica de sus usuarios para garantizar una alta calidad de servicio, p.ej. fusionar varias señales de un usuario para obtener un diagnóstico más preciso. La consecuencia negativa de cumplir esta demanda es el aumento directo de las superficies potencialmente vulnerables a ataques, lo que sitúa a la seguridad (y a la privacidad) del modelo de m-Salud como factor crítico para su éxito. Como requisito no funcional de las aplicaciones de m-Salud, la seguridad ha recibido menos atención que otros requisitos técnicos que eran más urgentes en etapas de desarrollo previas, tales como la robustez, la eficiencia, la interoperabilidad o la usabilidad. Otro factor importante que ha contribuido a retrasar la implementación de políticas de seguridad sólidas es que garantizar un determinado nivel de seguridad implica unos costes que pueden ser muy relevantes en varias dimensiones, en especial en la económica (p.ej. sobrecostes por la inclusión de hardware extra para la autenticación de usuarios), en el rendimiento (p.ej. reducción de la eficiencia y de la interoperabilidad debido a la integración de elementos de seguridad) y en la usabilidad (p.ej. configuración más complicada de dispositivos y aplicaciones de salud debido a las nuevas opciones de seguridad). Por tanto, las soluciones de seguridad que persigan satisfacer a todos los actores del contexto de m-Salud (usuarios, pacientes, personal médico, personal técnico, legisladores, fabricantes de dispositivos y equipos, etc.) deben ser robustas y al mismo tiempo minimizar sus costes asociados. Esta Tesis detalla una propuesta de seguridad, compuesta por cuatro grandes bloques interconectados, para dotar de seguridad a las arquitecturas de m-Salud con unos costes reducidos. El primer bloque define un esquema global que proporciona unos niveles de seguridad e interoperabilidad acordes con las características de las distintas aplicaciones de m-Salud. Este esquema está compuesto por tres capas diferenciadas, diseñadas a la medidas de los dominios de m-Salud y de sus restricciones, incluyendo medidas de seguridad adecuadas para la defensa contra las amenazas asociadas a sus aplicaciones de m-Salud. El segundo bloque establece la extensión de seguridad de aquellos protocolos estándar que permiten la adquisición, el intercambio y/o la administración de información biomédica -- por tanto, usados por muchas aplicaciones de m-Salud -- pero no reúnen los niveles de seguridad detallados en el esquema previo. Estas extensiones se concretan para los estándares biomédicos ISO/IEEE 11073 PHD y SCP-ECG. El tercer bloque propone nuevas formas de fortalecer la seguridad de los tests biomédicos, que constituyen el elemento esencial de muchas aplicaciones de m-Salud de carácter clínico, mediante codificaciones novedosas. Finalmente el cuarto bloque, que se sitúa en paralelo a los anteriores, selecciona herramientas genéricas de seguridad (elementos de autenticación y criptográficos) cuya integración en los otros bloques resulta idónea, y desarrolla nuevas herramientas de seguridad, basadas en señal -- embedding y keytagging --, para reforzar la protección de los test biomédicos.The paradigm of m-Health (mobile health) advocates for the massive integration of advanced mobile communications, network and sensor technologies in healthcare applications and systems to foster the deployment of a new, user/patient-centered healthcare model enabling the empowerment of users in the management of their health (e.g. by increasing their health literacy, promoting healthy lifestyles and the prevention of diseases), a better home-based healthcare delivery for elderly and chronic patients and important savings for healthcare systems due to the reduction of hospitalizations in number and duration. It is a fact that many m-Health applications demand high availability of biomedical information from their users (for further accurate analysis, e.g. by fusion of various signals) to guarantee high quality of service, which on the other hand entails increasing the potential surfaces for attacks. Therefore, it is not surprising that security (and privacy) is commonly included among the most important barriers for the success of m-Health. As a non-functional requirement for m-Health applications, security has received less attention than other technical issues that were more pressing at earlier development stages, such as reliability, eficiency, interoperability or usability. Another fact that has contributed to delaying the enforcement of robust security policies is that guaranteeing a certain security level implies costs that can be very relevant and that span along diferent dimensions. These include budgeting (e.g. the demand of extra hardware for user authentication), performance (e.g. lower eficiency and interoperability due to the addition of security elements) and usability (e.g. cumbersome configuration of devices and applications due to security options). Therefore, security solutions that aim to satisfy all the stakeholders in the m-Health context (users/patients, medical staff, technical staff, systems and devices manufacturers, regulators, etc.) shall be robust and, at the same time, minimize their associated costs. This Thesis details a proposal, composed of four interrelated blocks, to integrate appropriate levels of security in m-Health architectures in a cost-efcient manner. The first block designes a global scheme that provides different security and interoperability levels accordingto how critical are the m-Health applications to be implemented. This consists ofthree layers tailored to the m-Health domains and their constraints, whose security countermeasures defend against the threats of their associated m-Health applications. Next, the second block addresses the security extension of those standard protocols that enable the acquisition, exchange and/or management of biomedical information | thus, used by many m-Health applications | but do not meet the security levels described in the former scheme. These extensions are materialized for the biomedical standards ISO/IEEE 11073 PHD and SCP-ECG. Then, the third block proposes new ways of enhancing the security of biomedical standards, which are the centerpiece of many clinical m-Health applications, by means of novel codings. Finally the fourth block, with is parallel to the others, selects generic security methods (for user authentication and cryptographic protection) whose integration in the other blocks results optimal, and also develops novel signal-based methods (embedding and keytagging) for strengthening the security of biomedical tests. The layer-based extensions of the standards ISO/IEEE 11073 PHD and SCP-ECG can be considered as robust, cost-eficient and respectful with their original features and contents. The former adds no attributes to its data information model, four new frames to the service model |and extends four with new sub-frames|, and only one new sub-state to the communication model. Furthermore, a lightweight architecture consisting of a personal health device mounting a 9 MHz processor and an aggregator mounting a 1 GHz processor is enough to transmit a 3-lead electrocardiogram in real-time implementing the top security layer. The extra requirements associated to this extension are an initial configuration of the health device and the aggregator, tokens for identification/authentication of users if these devices are to be shared and the implementation of certain IHE profiles in the aggregator to enable the integration of measurements in healthcare systems. As regards to the extension of SCP-ECG, it only adds a new section with selected security elements and syntax in order to protect the rest of file contents and provide proper role-based access control. The overhead introduced in the protected SCP-ECG is typically 2{13 % of the regular file size, and the extra delays to protect a newly generated SCP-ECG file and to access it for interpretation are respectively a 2{10 % and a 5 % of the regular delays. As regards to the signal-based security techniques developed, the embedding method is the basis for the proposal of a generic coding for tests composed of biomedical signals, periodic measurements and contextual information. This has been adjusted and evaluated with electrocardiogram and electroencephalogram-based tests, proving the objective clinical quality of the coded tests, the capacity of the coding-access system to operate in real-time (overall delays of 2 s for electrocardiograms and 3.3 s for electroencephalograms) and its high usability. Despite of the embedding of security and metadata to enable m-Health services, the compression ratios obtained by this coding range from ' 3 in real-time transmission to ' 5 in offline operation. Complementarily, keytagging permits associating information to images (and other signals) by means of keys in a secure and non-distorting fashion, which has been availed to implement security measures such as image authentication, integrity control and location of tampered areas, private captioning with role-based access control, traceability and copyright protection. The tests conducted indicate a remarkable robustness-capacity tradeoff that permits implementing all this measures simultaneously, and the compatibility of keytagging with JPEG2000 compression, maintaining this tradeoff while setting the overall keytagging delay in only ' 120 ms for any image size | evidencing the scalability of this technique. As a general conclusion, it has been demonstrated and illustrated with examples that there are various, complementary and structured manners to contribute in the implementation of suitable security levels for m-Health architectures with a moderate cost in budget, performance, interoperability and usability. The m-Health landscape is evolving permanently along all their dimensions, and this Thesis aims to do so with its security. Furthermore, the lessons learned herein may offer further guidance for the elaboration of more comprehensive and updated security schemes, for the extension of other biomedical standards featuring low emphasis on security or privacy, and for the improvement of the state of the art regarding signal-based protection methods and applications
    corecore