78 research outputs found

    Privacy-Preserving Publishing of Knowledge Graphs

    Online social networks (OSNs) attract a huge number of users who share their data every day. These data can be shared with third parties for various purposes, such as data analytics and machine learning. Unfortunately, adversaries can exploit shared data to infer users’ sensitive information. Various anonymization solutions have been presented to anonymize shared data so that it is harder for adversaries to infer users’ personal information. Whereas OSNs contain both users’ attributes and relationships, previous work considers anonymizing either attributes (represented in relational data) or relationships (represented in directed graphs). To cope with this issue, in this thesis we address the research challenge of anonymizing knowledge graphs (KGs), given their flexibility in representing both the attribute values and the relationships of users. The anonymization of KGs is not trivial, since adversaries can exploit both the attributes and the relationships of their victims. In the era of big data, these solutions are significant because they allow data providers to share attribute values and relationships together. Over the last three years, our research efforts have resulted in different anonymization solutions for KGs covering several relevant scenarios: anonymization of static KGs, sequential anonymization of KGs, and personalized anonymization of KGs. Since KGs are directed graphs, we started our research by investigating anonymization solutions for directed graphs. As anonymization algorithms proposed in the literature (e.g., the Paired k-degree) cannot always anonymize graphs, we first presented the Cluster-Based Directed Graph Anonymization Algorithm (CDGA). We proved that CDGA can always generate anonymized directed graphs. We then analyzed an attacking scenario in which an adversary can exploit the attribute values and relationships of his/her victims to re-identify them in anonymized KGs. 
To protect users in this scenario, we presented the k-Attribute Degree (k-ad) protection model to ensure that users cannot be re-identified with a confidence higher than 1/k. We proposed the Cluster-Based Knowledge Graph Anonymization Algorithm (CKGA) to anonymize KGs for this scenario. CKGA is designed for a setting where KGs are anonymized statically. Unfortunately, the adversary can still re-identify his/her victims if he/she has access to many versions of the anonymized KG. To cope with this issue, we further presented the k^w-Time-Varying Attribute Degree to give users the same protection as k-ad even if the adversary gains access to w consecutive anonymized KGs. In addition, we proposed the Cluster-Based Time-Varying Knowledge Graph Anonymization Algorithm to anonymize KGs while allowing data providers to insert/re-insert/remove/update nodes and edges of their KGs. However, users are not allowed to specify their privacy preferences, which are crucial for users requiring strong privacy protection, such as influencers. To this end, we proposed the Personalized k-Attribute Degree to allow users to specify their own value of k. The effectiveness of the proposed algorithms has been tested with experiments on real-life datasets.
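
The k-ad guarantee can be illustrated with a minimal sketch (the thesis's actual definition operates on knowledge graphs; the users, quasi-identifying attributes, and degrees below are hypothetical stand-ins): a dataset satisfies the condition only if every combination of attribute values and (in-degree, out-degree) is shared by at least k users, so an adversary's re-identification confidence never exceeds 1/k.

```python
from collections import Counter

def satisfies_k_ad(users, k):
    """Check a simplified k-Attribute Degree condition: every combination of
    quasi-identifying attribute values and (in-degree, out-degree) must be
    shared by at least k users, so no user can be re-identified with
    confidence above 1/k."""
    signatures = Counter(
        (tuple(sorted(attrs.items())), in_deg, out_deg)
        for attrs, in_deg, out_deg in users
    )
    return all(count >= k for count in signatures.values())

# Hypothetical users: each signature below is shared by exactly two users,
# so k=2 holds but k=3 does not.
users = [
    ({"age": 30, "city": "Rome"}, 2, 1),
    ({"age": 30, "city": "Rome"}, 2, 1),
    ({"age": 41, "city": "Oslo"}, 0, 3),
    ({"age": 41, "city": "Oslo"}, 0, 3),
]
print(satisfies_k_ad(users, 2))  # True
print(satisfies_k_ad(users, 3))  # False
```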

    Real-time Traffic State Assessment using Multi-source Data

    The normal flow of traffic is impeded by abnormal events, and the impacts of those events extend over time and space. In recent years, with the rapid growth of multi-source data, traffic researchers have sought to leverage those data to identify the spatial-temporal dynamics of traffic flow and proactively manage abnormal traffic conditions. However, the characteristics of data collected by different techniques have not been fully understood. To this end, this study presents a series of studies that provide insight into data from different sources and dynamically detect real-time traffic states using those data. Speed is one of the three fundamental parameters in traffic flow theory that describe traffic flow states. While speed collection techniques have evolved over the past decades, the average speed calculation method has not been updated. The first section of this study points out that the traditional harmonic mean-based average speed calculation method can produce erroneous results for probe-based data. A new speed calculation method based on the fundamental definition is proposed instead. The second section evaluates the spatial-temporal accuracy of a different type of crowdsourced data, crowdsourced user reports, and reveals Waze user behavior. Based on the evaluation results, a traffic detection system is developed to support the dynamic detection of incidents and traffic queues. A critical problem with current automatic incident detection algorithms (AIDs), which limits their application in practice, is their heavy calibration requirements. The third section solves this problem by proposing a self-evaluation module that determines the occurrence of traffic incidents and serves as an auto-calibration procedure. Following the incident detection, the fourth section proposes a clustering algorithm to detect the spatial-temporal movements of congestion by clustering crowdsourced reports. 
This study contributes to the understanding of fundamental traffic parameters and expands the knowledge of multi-source data. It has implications for future speed, flow, and density calculation as data collection techniques advance. Additionally, the proposed dynamic algorithms allow the system to run automatically with minimal human intervention, thus promoting the intelligence of the traffic operation system. The algorithms apply not only to incident and queue detection but also to a variety of detection systems.
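
The speed-calculation issue raised in the first section can be illustrated with a small sketch (the probe traces and numbers below are invented, and the thesis's exact estimator may differ): the harmonic mean weights every probe's spot speed equally, whereas the fundamental definition, total distance divided by total time, weights each probe by the travel it actually represents.

```python
def harmonic_mean_speed(speeds):
    """Traditional space-mean speed estimate: harmonic mean of spot speeds."""
    return len(speeds) / sum(1.0 / s for s in speeds)

def fundamental_speed(segments):
    """Average speed from the fundamental definition:
    total distance traveled divided by total travel time."""
    total_dist = sum(d for d, _ in segments)
    total_time = sum(t for _, t in segments)
    return total_dist / total_time

# Hypothetical probe traces: (distance in km, travel time in hours).
segments = [(10.0, 0.20), (2.0, 0.10)]     # a 50 km/h probe and a 20 km/h probe
spot_speeds = [d / t for d, t in segments]  # [50.0, 20.0]
print(harmonic_mean_speed(spot_speeds))     # ~28.6 km/h: each probe counted equally
print(fundamental_speed(segments))          # 40.0 km/h: weighted by actual travel
```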

    Machine learning for aircraft trajectory prediction: a solution for pre-tactical air traffic flow management

    Pla de Doctorats Industrials de la Generalitat de Catalunya. The goal of air traffic flow and capacity management (ATFCM) is to ensure that airport and airspace capacity meet traffic demand while optimising traffic flows to avoid exceeding the available capacity when it cannot be further increased. In Europe, ATFCM is handled by EUROCONTROL, in its role of Network Manager (NM), and comprises three phases: strategic, pre-tactical, and tactical. This thesis is focused on the pre-tactical phase, which covers the six days prior to the day of operations. During the pre-tactical phase, few or no flight plans (FPLs) have been filed by airspace users (AUs), and the only flight information available to the NM is the so-called flight intentions (FIs), consisting mainly of flight schedules. Trajectory information becomes available only when the AUs send their FPLs. This information is required to ensure a correct allocation of resources in coordination with air navigation service providers (ANSPs). To forecast FPLs before they are filed by the AUs, the NM relies on the PREDICT tool, which generates traffic forecasts for the whole European Civil Aviation Conference (ECAC) area according to the trajectories chosen by the same or similar flights in the recent past, without taking advantage of the information on AU choices encoded in historical data. The goal of the present PhD thesis is to develop a solution for pre-tactical traffic forecast that improves the predictive performance of the PREDICT tool while being able to cope with the entire set of flights in the ECAC network in a computationally efficient manner. To this end, trajectory forecasting approaches based on machine learning models trained on historical data have been explored, evaluating their predictive performance. 
In the application of machine learning techniques to trajectory prediction, three fundamental methodological choices have to be made: (i) the approach to trajectory clustering, which is used to group similar trajectories in order to simplify the trajectory prediction problem; (ii) the model formulation; and (iii) the model training approach. The contribution of this PhD thesis to the state of the art lies in the first two areas. First, we have developed a novel route clustering technique based on the area comprised between two routes that reduces the required computational time and increases scalability with respect to other clustering techniques described in the literature. Second, we have developed, tested and evaluated two new modelling approaches for route prediction. The first approach consists in building and training an independent machine learning model for each origin-destination (OD) pair in the network, taking as inputs different variables available from FIs plus other variables related to weather and to the number of regulations. This approach improves the performance of the PREDICT model, but it also has an important limitation: it does not consider changes in the airspace structure, thus being unable to predict routes not available in the training data and sometimes predicting routes that are not compatible with the airspace structure. The second approach is an airline-based approach, which consists in building and training a model for each airline. The limitations of the first model are overcome by considering as input variables not only the variables available from the FIs and the weather, but also airspace restrictions and route characteristics (e.g., route cost, length, etc.). 
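
A minimal sketch of the area-based route distance (assuming planar coordinates and two routes sharing the same origin and destination; the thesis's actual geometric treatment is certainly more involved): walking one route forward and the other backward forms a closed polygon, and its shoelace area measures how far apart the routes are.

```python
def polygon_area(points):
    """Shoelace formula for the absolute area of a simple
    (non-self-intersecting) polygon given as a list of (x, y) vertices."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def route_distance(route_a, route_b):
    """Area enclosed between two routes sharing origin and destination:
    traverse route_a forward, then route_b backward, and take the area
    of the resulting polygon."""
    return polygon_area(route_a + route_b[::-1])

# Two hypothetical routes between the same OD pair (planar coordinates).
a = [(0, 0), (1, 1), (2, 0)]
b = [(0, 0), (1, -1), (2, 0)]
print(route_distance(a, b))  # 2.0: area of the enclosed quadrilateral
```

Identical routes enclose zero area, so similar routes naturally cluster together under this distance.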
The airline-based approach yields a significant improvement with respect to PREDICT and to the OD pair-based model, achieving a route prediction accuracy of 0.896 (versus PREDICT’s accuracy of 0.828), while being able to deal with the full ECAC network within reasonable computational time. These promising results encourage us to be optimistic about the future implementation of the proposed system.

    An explainable artificial intelligence (xAI) framework for improving trust in automated ATM tools

    With the increased use of intelligent decision support tools in Air Traffic Management (ATM) and the inclusion of non-traditional entities, regulators and end users need assurance that new technologies such as Artificial Intelligence (AI) and Machine Learning (ML) are trustworthy and safe. Although there is a large amount of research on the technologies themselves, there seems to be a gap between research projects and practical implementation due to different regulatory and practical challenges, including the need for transparency and explainability of solutions. To help address these challenges, a novel framework to enable trust in AI-based automated solutions is presented, based on current guidelines and end-user feedback. Finally, recommendations are provided to bridge the gap between research and implementation of AI- and ML-based solutions, using our framework as a mechanism to aid advances of AI technology within ATM.

    Testing the T-loop Model of Telomeric End Protection

    Telomeres are the key structures that protect the ends of linear chromosomes. Although they are often thought about in the context of cellular aging, their most important role is actually to protect the end of the DNA from being misinterpreted as a site of DNA damage. Telomeres are thought to accomplish this through the action of shelterin. Shelterin is a multi-protein complex in which each subunit is dedicated to a specific role: repressing a form of DNA damage signaling or repair, or recruiting telomerase to extend the telomere end. One of the critical anchor points of shelterin is a protein known as TRF2. TRF2 is necessary to protect telomeres from becoming fused by non-homologous end-joining, and from one of the two main DNA damage signaling pathways, in this case the one driven by ATM and CHK2. It is thought to do this by rearranging the very 3’ end of the DNA into a duplex loop, known as the t-loop. Together, TRF2 and t-loops are the main pillars of the t-loop model of end-protection, which is the focus of this thesis. The first part of this thesis presents an overview of telomere protection and focuses specifically on what is known about end-protection in mammalian cells. From there, we test an alternative model of telomere end-protection and find it to be unsubstantiated. We next analyze how TRF2 contributes to t-loop formation, including whether TRF2 cooperates with other shelterin components, whether it uses non-shelterin factors, and which of its domains contribute. Finally, we try to understand how t-loops are made, and whether there are any external factors that assist TRF2, or whether TRF2 is self-sufficient in repressing signaling, repressing fusions, and forming t-loops. We then discuss the evolution of telomeres, which serves as an important reference point toward understanding the greater context of the t-loop model and its plausibility. The appendix discusses attempts to push the resolution of t-loop imaging in the context of whole cells. 
The work presented here is of relevance to understanding the central mechanism of telomere end protection. What t-loops do, if anything, and how they are made is a question at the heart of telomere biology.

    Development and evaluation of low cost 2-d lidar based traffic data collection methods

    Traffic data collection is one of the essential components of a transportation planning exercise. Granular traffic data, such as volume counts, vehicle classification, speed measurements, and occupancy, allow transportation systems to be managed more effectively. For effective traffic operation and management, authorities need to deploy many sensors across the network. Moreover, the growing push toward smart transportation puts immense pressure on planning authorities to deploy even more sensors to cover an extensive network. This research focuses on the development and evaluation of an inexpensive data collection methodology based on two-dimensional (2-D) Light Detection and Ranging (LiDAR) technology. LiDAR is adopted since it is an economical and easily accessible technology. Moreover, its 360-degree visibility and accurate distance information make it more reliable. To collect traffic count data, the proposed method integrates a Continuous Wavelet Transform (CWT) and a Support Vector Machine (SVM) into a single framework. A Proof-of-Concept (POC) test is conducted at three different locations in Newark, New Jersey, to examine the performance of the proposed method. The POC test results demonstrate that the proposed method achieves acceptable performance, with 83%–94% accuracy. It is discovered that the proposed method's accuracy is affected by the color of the exterior surface of a vehicle, since some colored surfaces do not produce enough reflected rays. It is noticed that blue and black surfaces are less reflective, while white surfaces produce highly reflective returns. A methodology is then proposed that comprises K-means clustering, an inverse sensor model, and a Kalman filter to obtain the trajectories of vehicles at intersections. The primary purpose of vehicle detection and tracking is to obtain turning movement counts at an intersection. 
K-means clustering is an unsupervised machine learning technique that partitions data into groups by minimizing the distance of each data point from its cluster centroid. The ultimate objective of applying K-means clustering here is to distinguish pedestrians from vehicles. An inverse sensor model is a state model for occupancy grid mapping that localizes the detected vehicles on the grid map. A constant-velocity-model-based Kalman filter is defined to track the trajectories of the vehicles. Data are collected from two intersections located in Newark, New Jersey, to study the accuracy of the proposed method. The results show that the proposed method has an average accuracy of 83.75%. Furthermore, the obtained R-squared value for localization of the vehicles on the grid map ranges between 0.87 and 0.89. Finally, a preliminary cost comparison is made to study the cost efficiency of the developed methodology. The cost comparison shows that the proposed methodology, based on 2-D LiDAR technology, can achieve acceptable accuracy at a low price and supports the smart-city goal of conducting large-scale data collection.
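
The constant-velocity Kalman filter used for tracking can be sketched as follows (the matrices, noise covariances, scan interval, and measurements are illustrative assumptions, not the calibrated values from this study): the state holds position and velocity, the filter predicts forward by one scan, and each LiDAR position measurement corrects the prediction.

```python
import numpy as np

# Minimal constant-velocity Kalman filter for tracking one vehicle on a
# 2-D grid map; all numbers are illustrative, not this study's calibration.
dt = 0.1                                   # assumed LiDAR scan interval (s)
F = np.array([[1, 0, dt, 0],               # state transition over [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],                # only position is observed
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 0.01                       # process noise covariance
R = np.eye(2) * 0.25                       # measurement noise covariance

def kalman_step(x, P, z):
    """One predict/update cycle for state x, covariance P, measurement z."""
    x = F @ x                              # predict state forward one scan
    P = F @ P @ F.T + Q                    # predict covariance
    y = z - H @ x                          # innovation (measurement residual)
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ y                          # corrected state
    P = (np.eye(4) - K @ H) @ P            # corrected covariance
    return x, P

x = np.array([0.0, 0.0, 5.0, 0.0])         # start at origin, 5 m/s along x
P = np.eye(4)
for z in [np.array([0.52, 0.01]), np.array([1.01, -0.02])]:
    x, P = kalman_step(x, P, z)
print(x[:2])                               # filtered position estimate
```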

    Big data analytics for preventive medicine

    © 2019, Springer-Verlag London Ltd., part of Springer Nature. Medical data is one of the most rewarding and yet most complicated data to analyze. How can healthcare providers use modern data analytics tools and technologies to analyze and create value from complex data? Data analytics promises to efficiently discover valuable patterns by analyzing large amounts of unstructured, heterogeneous, non-standard, and incomplete healthcare data. It not only forecasts but also supports decision making, and it is increasingly seen as a breakthrough in the ongoing advancement toward improving the quality of patient care and reducing healthcare costs. The aim of this study is to provide a comprehensive and structured overview of the extensive research on data analytics methods for disease prevention. This review first introduces disease prevention and its challenges, followed by traditional prevention methodologies. We summarize state-of-the-art data analytics algorithms used for disease classification, clustering (unusually high incidence of a particular disease), anomaly detection (detection of disease), and association, along with their respective advantages, drawbacks, and guidelines for selecting a specific model, followed by a discussion of recent developments and successful applications of disease prevention methods. The article concludes with open research challenges and recommendations.
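
As a concrete instance of the anomaly-detection category, a generic z-score baseline can flag weeks whose incidence deviates far from the historical mean (this is an illustrative textbook baseline, not a specific method surveyed in the article; the weekly case counts are invented):

```python
import statistics

def flag_outbreaks(weekly_cases, threshold=3.0):
    """Flag indices of weeks whose case count exceeds the historical mean
    by more than `threshold` standard deviations: a simple baseline for
    detecting unusually high incidence of a disease."""
    mean = statistics.mean(weekly_cases)
    stdev = statistics.stdev(weekly_cases)
    return [i for i, c in enumerate(weekly_cases)
            if stdev > 0 and (c - mean) / stdev > threshold]

cases = [12, 9, 11, 10, 13, 11, 60, 12, 10]   # hypothetical weekly counts
print(flag_outbreaks(cases, threshold=2.0))   # [6]: the spike week
```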

    Tools and Experiments for Software Security

    The computer security problems that we face begin in the computer programs that we write. The exploitation of vulnerabilities that leads to the theft of private information and other nefarious activities often begins with a vulnerability accidentally created in a computer program by that program's author. What are the factors that lead to the creation of these vulnerabilities? Software development and programming is in part a synthetic activity that we can control with technology, i.e., different programming languages and software development tools. Does changing the technology used to program software help programmers write more secure code? Can we create technology that will help programmers make fewer mistakes? This dissertation examines these questions. We start with the Build It Break It Fix It project, a security-focused programming competition. This project provides data on software security problems by allowing contestants to write security-focused software in any programming language. We discover that using C leads to memory safety issues that can compromise security. Next, we consider making C safer. We develop and examine the Checked C programming language, a strict superset of C that adds types for spatial safety. We also introduce an automatic rewriting tool that can convert C code into Checked C code. We evaluate the approach on benchmarks used by prior work on making C safer. We then consider static analysis. After an examination of different parameters of numeric static analyzers, we develop a disjunctive abstract domain that uses a novel merge heuristic based on a notion of volumetric difference, either approximated via MCMC sampling or computed precisely via conical decomposition. This domain is implemented in a static analyzer for C programs and evaluated. After static analysis, we consider fuzzing. We examine what it takes to perform a good evaluation of a fuzzing technique through our own experiments and a review of recent fuzzing papers. 
We develop a checklist for conducting new fuzzing research and a general strategy for identifying the root causes of failures found during fuzzing. We evaluate new root cause analysis approaches that use coverage information as input to statistical clustering algorithms.
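
The root-cause clustering idea, using coverage information as the input to a clustering algorithm, can be sketched as follows (the greedy single-pass clustering and Jaccard distance are illustrative stand-ins for the statistical clustering evaluated in the dissertation; the crash IDs and coverage sets are hypothetical):

```python
def jaccard_distance(cov_a, cov_b):
    """Distance between two coverage sets (edges/branches hit during a run)."""
    union = len(cov_a | cov_b)
    return 1.0 - len(cov_a & cov_b) / union if union else 0.0

def cluster_crashes(crashes, max_dist=0.5):
    """Greedy single-pass clustering: each crash joins the first cluster
    whose representative coverage is within max_dist, otherwise it starts
    a new cluster. Crashes with similar coverage likely share a root cause."""
    clusters = []  # list of (representative coverage set, [crash ids])
    for crash_id, cov in crashes:
        for rep, members in clusters:
            if jaccard_distance(rep, cov) <= max_dist:
                members.append(crash_id)
                break
        else:
            clusters.append((cov, [crash_id]))
    return [members for _, members in clusters]

# Hypothetical coverage sets recorded while reproducing three crashes.
crashes = [
    ("crash-1", {1, 2, 3, 4}),
    ("crash-2", {1, 2, 3, 5}),   # near-identical path: likely same root cause
    ("crash-3", {9, 10, 11}),    # disjoint path: likely different root cause
]
print(cluster_crashes(crashes))  # [['crash-1', 'crash-2'], ['crash-3']]
```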

    Comparative Analysis Based on Survey of DDOS Attacks’ Detection Techniques at Transport, Network, and Application Layers

    Distributed Denial of Service (DDOS) is one of the most prevalent attacks and can be executed in diverse ways using various tools and codes. This makes it very difficult for security researchers and engineers to come up with a rigorous and efficient security methodology. Even with thorough research, analysis, real-time implementation, and application of the best mechanisms in test environments, there are various ways to exploit the smallest vulnerability within the system that gets overlooked while designing the defense mechanism. This paper presents a comprehensive survey of various methodologies implemented by researchers and engineers to detect DDOS attacks at the network, transport, and application layers, using comparative analysis. DDOS attacks are most prevalent at the network, transport, and application layers, justifying the focus on these three layers of the OSI model.
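
As one concrete example of the kind of network-layer detection signal such surveys cover (an entropy-based baseline common in the literature, not a method this paper proposes; the addresses and traffic volumes are invented): the Shannon entropy of the source-IP distribution in a traffic window collapses when a few sources flood, and spikes under randomly spoofed sources, so a large deviation from a baseline is a detection signal.

```python
import math
from collections import Counter

def source_ip_entropy(packets):
    """Shannon entropy (bits) of the source-IP distribution in a traffic
    window. A sharp drop versus a baseline suggests a flood from few
    sources; a sharp rise suggests spoofed random sources."""
    counts = Counter(packets)
    total = len(packets)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

normal = ["10.0.0.%d" % (i % 50) for i in range(1000)]  # 50 balanced sources
attack = normal + ["198.51.100.7"] * 5000               # one flooding source
print(source_ip_entropy(normal))   # log2(50) ~ 5.64 bits
print(source_ip_entropy(attack))   # much lower: distribution has collapsed
```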