168 research outputs found

    Multimodal analysis for object classification and event detection

    Get PDF

    Earth Observation Open Science and Innovation

    Get PDF
    geospatial analytics; social observatory; big earth data; open data; citizen science; open innovation; earth system science; crowdsourced geospatial data; citizen science; science in society; data scienc

    End-to-end Privacy Preserving Training and Inference for Air Pollution Forecasting with Data from Rival Fleets

    Get PDF
    Privacy-preserving machine learning (PPML) promises to train machine learning (ML) models by combining data spread across multiple data silos. Theoretically, secure multiparty computation (MPC) allows multiple data owners to train models on their joint data without revealing the data to each other. However, the prior implementations of this secure training using MPC have three limitations: they have only been evaluated on CNNs, and LSTMs have been ignored; fixed point approximations have affected training accuracies compared to training in floating point; and due to significant latency overheads of secure training via MPC, its relevance for practical tasks with streaming data remains unclear. The motivation of this work is to report our experience of addressing the practical problem of secure training and inference of models for urban sensing problems, e.g., traffic congestion estimation, or air pollution monitoring in large cities, where data can be contributed by rival fleet companies while balancing the privacy-accuracy trade-offs using MPC-based techniques. Our first contribution is to design a custom ML model for this task that can be efficiently trained with MPC within a desirable latency. In particular, we design a GCN-LSTM and securely train it on time-series sensor data for accurate forecasting, within 7 minutes per epoch. As our second contribution, we build an end-toend system of private training and inference that provably matches the training accuracy of cleartext ML training. This work is the first to securely train a model with LSTM cells. Third, this trained model is kept secret-shared between the fleet companies and allows clients to make sensitive queries to this model while carefully handling potentially invalid queries. Our custom protocols allow clients to query predictions from privately trained models in milliseconds, all the while maintaining accuracy and cryptographic securit

    Text Similarity Between Concepts Extracted from Source Code and Documentation

    Get PDF
    Context: Constant evolution in software systems often results in its documentation losing sync with the content of the source code. The traceability research field has often helped in the past with the aim to recover links between code and documentation, when the two fell out of sync. Objective: The aim of this paper is to compare the concepts contained within the source code of a system with those extracted from its documentation, in order to detect how similar these two sets are. If vastly different, the difference between the two sets might indicate a considerable ageing of the documentation, and a need to update it. Methods: In this paper we reduce the source code of 50 software systems to a set of key terms, each containing the concepts of one of the systems sampled. At the same time, we reduce the documentation of each system to another set of key terms. We then use four different approaches for set comparison to detect how the sets are similar. Results: Using the well known Jaccard index as the benchmark for the comparisons, we have discovered that the cosine distance has excellent comparative powers, and depending on the pre-training of the machine learning model. In particular, the SpaCy and the FastText embeddings offer up to 80% and 90% similarity scores. Conclusion: For most of the sampled systems, the source code and the documentation tend to contain very similar concepts. Given the accuracy for one pre-trained model (e.g., FastText), it becomes also evident that a few systems show a measurable drift between the concepts contained in the documentation and in the source code.</p

    A Deep Learning approach for monitoring severe rainfall in urban catchments using consumer cameras. Models development and deployment on a case study in Matera (Italy) Un approccio basato sul Deep Learning per monitorare le piogge intense nei bacini urbani utilizzando fotocamere generiche. Sviluppo e implementazione di modelli su un caso di studio a Matera (Italia)

    Get PDF
    In the last 50 years, flooding has figured as the most frequent and widespread natural disaster globally. Extreme precipitation events stemming from climate change could alter the hydro-geological regime resulting in increased flood risk. Near real-time precipitation monitoring at local scale is essential for flood risk mitigation in urban and suburban areas, due to their high vulnerability. Presently, most of the rainfall data is obtained from ground‐based measurements or remote sensing that provide limited information in terms of temporal or spatial resolution. Other problems may be due to the high costs. Furthermore, rain gauges are unevenly spread and usually placed away from urban centers. In this context, a big potential is represented by the use of innovative techniques to develop low-cost monitoring systems. Despite the diversity of purposes, methods and epistemological fields, the literature on the visual effects of the rain supports the idea of camera-based rain sensors but tends to be device-specific. The present thesis aims to investigate the use of easily available photographing devices as rain detectors-gauges to develop a dense network of low-cost rainfall sensors to support the traditional methods with an expeditious solution embeddable into smart devices. As opposed to existing works, the study focuses on maximizing the number of image sources (like smartphones, general-purpose surveillance cameras, dashboard cameras, webcams, digital cameras, etc.). This encompasses cases where it is not possible to adjust the camera parameters or obtain shots in timelines or videos. Using a Deep Learning approach, the rainfall characterization can be achieved through the analysis of the perceptual aspects that determine whether and how a photograph represents a rainy condition. The first scenario of interest for the supervised learning was a binary classification; the binary output (presence or absence of rain) allows the detection of the presence of precipitation: the cameras act as rain detectors. Similarly, the second scenario of interest was a multi-class classification; the multi-class output described a range of quasi-instantaneous rainfall intensity: the cameras act as rain estimators. Using Transfer Learning with Convolutional Neural Networks, the developed models were compiled, trained, validated, and tested. The preparation of the classifiers included the preparation of a suitable dataset encompassing unconstrained verisimilar settings: open data, several data owned by National Research Institute for Earth Science and Disaster Prevention - NIED (dashboard cameras in Japan coupled with high precision multi-parameter radar data), and experimental activities conducted in the NIED Large Scale Rainfall Simulator. The outcomes were applied to a real-world scenario, with the experimentation through a pre-existent surveillance camera using 5G connectivity provided by Telecom Italia S.p.A. in the city of Matera (Italy). Analysis unfolded on several levels providing an overview of generic issues relating to the urban flood risk paradigm and specific territorial questions inherent with the case study. These include the context aspects, the important role of rainfall from driving the millennial urban evolution to determining present criticality, and components of a Web prototype for flood risk communication at local scale. The results and the model deployment raise the possibility that low‐cost technologies and local capacities can help to retrieve rainfall information for flood early warning systems based on the identification of a significant meteorological state. The binary model reached accuracy and F1 score values of 85.28% and 0.86 for the test, and 83.35% and 0.82 for the deployment. The multi-class model reached test average accuracy and macro-averaged F1 score values of 77.71% and 0.73 for the 6-way classifier, and 78.05% and 0.81 for the 5-class. The best performances were obtained in heavy rainfall and no-rain conditions, whereas the mispredictions are related to less severe precipitation. The proposed method has limited operational requirements, can be easily and quickly implemented in real use cases, exploiting pre-existent devices with a parsimonious use of economic and computational resources. The classification can be performed on single photographs taken in disparate conditions by commonly used acquisition devices, i.e. by static or moving cameras without adjusted parameters. This approach is especially useful in urban areas where measurement methods such as rain gauges encounter installation difficulties or operational limitations or in contexts where there is no availability of remote sensing data. The system does not suit scenes that are also misleading for human visual perception. The approximations inherent in the output are acknowledged. Additional data may be gathered to address gaps that are apparent and improve the accuracy of the precipitation intensity prediction. Future research might explore the integration with further experiments and crowdsourced data, to promote communication, participation, and dialogue among stakeholders and to increase public awareness, emergency response, and civic engagement through the smart community idea.Negli ultimi 50 anni, le alluvioni si sono confermate come il disastro naturale più frequente e diffuso a livello globale. Tra gli impatti degli eventi meteorologici estremi, conseguenti ai cambiamenti climatici, rientrano le alterazioni del regime idrogeologico con conseguente incremento del rischio alluvionale. Il monitoraggio delle precipitazioni in tempo quasi reale su scala locale è essenziale per la mitigazione del rischio di alluvione in ambito urbano e periurbano, aree connotate da un'elevata vulnerabilità. Attualmente, la maggior parte dei dati sulle precipitazioni è ottenuta da misurazioni a terra o telerilevamento che forniscono informazioni limitate in termini di risoluzione temporale o spaziale. Ulteriori problemi possono derivare dagli elevati costi. Inoltre i pluviometri sono distribuiti in modo non uniforme e spesso posizionati piuttosto lontano dai centri urbani, comportando criticità e discontinuità nel monitoraggio. In questo contesto, un grande potenziale è rappresentato dall'utilizzo di tecniche innovative per sviluppare sistemi inediti di monitoraggio a basso costo. Nonostante la diversità di scopi, metodi e campi epistemologici, la letteratura sugli effetti visivi della pioggia supporta l'idea di sensori di pioggia basati su telecamera, ma tende ad essere specifica per dispositivo scelto. La presente tesi punta a indagare l'uso di dispositivi fotografici facilmente reperibili come rilevatori-misuratori di pioggia, per sviluppare una fitta rete di sensori a basso costo a supporto dei metodi tradizionali con una soluzione rapida incorporabile in dispositivi intelligenti. A differenza dei lavori esistenti, lo studio si concentra sulla massimizzazione del numero di fonti di immagini (smartphone, telecamere di sorveglianza generiche, telecamere da cruscotto, webcam, telecamere digitali, ecc.). Ciò comprende casi in cui non sia possibile regolare i parametri fotografici o ottenere scatti in timeline o video. Utilizzando un approccio di Deep Learning, la caratterizzazione delle precipitazioni può essere ottenuta attraverso l'analisi degli aspetti percettivi che determinano se e come una fotografia rappresenti una condizione di pioggia. Il primo scenario di interesse per l'apprendimento supervisionato è una classificazione binaria; l'output binario (presenza o assenza di pioggia) consente la rilevazione della presenza di precipitazione: gli apparecchi fotografici fungono da rivelatori di pioggia. Analogamente, il secondo scenario di interesse è una classificazione multi-classe; l'output multi-classe descrive un intervallo di intensità delle precipitazioni quasi istantanee: le fotocamere fungono da misuratori di pioggia. Utilizzando tecniche di Transfer Learning con reti neurali convoluzionali, i modelli sviluppati sono stati compilati, addestrati, convalidati e testati. La preparazione dei classificatori ha incluso la preparazione di un set di dati adeguato con impostazioni verosimili e non vincolate: dati aperti, diversi dati di proprietà del National Research Institute for Earth Science and Disaster Prevention - NIED (telecamere dashboard in Giappone accoppiate con dati radar multiparametrici ad alta precisione) e attività sperimentali condotte nel simulatore di pioggia su larga scala del NIED. I risultati sono stati applicati a uno scenario reale, con la sperimentazione attraverso una telecamera di sorveglianza preesistente che utilizza la connettività 5G fornita da Telecom Italia S.p.A. nella città di Matera (Italia). L'analisi si è svolta su più livelli, fornendo una panoramica sulle questioni relative al paradigma del rischio di alluvione in ambito urbano e questioni territoriali specifiche inerenti al caso di studio. Queste ultime includono diversi aspetti del contesto, l'importante ruolo delle piogge dal guidare l'evoluzione millenaria della morfologia urbana alla determinazione delle criticità attuali, oltre ad alcune componenti di un prototipo Web per la comunicazione del rischio alluvionale su scala locale. I risultati ottenuti e l'implementazione del modello corroborano la possibilità che le tecnologie a basso costo e le capacità locali possano aiutare a caratterizzare la forzante pluviometrica a supporto dei sistemi di allerta precoce basati sull'identificazione di uno stato meteorologico significativo. Il modello binario ha raggiunto un'accuratezza e un F1-score di 85,28% e 0,86 per il set di test e di 83,35% e 0,82 per l'implementazione nel caso di studio. Il modello multi-classe ha raggiunto un'accuratezza media e F1-score medio (macro-average) di 77,71% e 0,73 per il classificatore a 6 vie e 78,05% e 0,81 per quello a 5 classi. Le prestazioni migliori sono state ottenute nelle classi relative a forti precipitazioni e assenza di pioggia, mentre le previsioni errate sono legate a precipitazioni meno estreme. Il metodo proposto richiede requisiti operativi limitati, può essere implementato facilmente e rapidamente in casi d'uso reali, sfruttando dispositivi preesistenti con un uso parsimonioso di risorse economiche e computazionali. La classificazione può essere eseguita su singole fotografie scattate in condizioni disparate da dispositivi di acquisizione di uso comune, ovvero da telecamere statiche o in movimento senza regolazione dei parametri. Questo approccio potrebbe essere particolarmente utile nelle aree urbane in cui i metodi di misurazione come i pluviometri incontrano difficoltà di installazione o limitazioni operative o in contesti in cui non sono disponibili dati di telerilevamento o radar. Il sistema non si adatta a scene che sono fuorvianti anche per la percezione visiva umana. I limiti attuali risiedono nelle approssimazioni intrinseche negli output. Per colmare le lacune evidenti e migliorare l'accuratezza della previsione dell'intensità di precipitazione, sarebbe possibile un'ulteriore raccolta di dati. Sviluppi futuri potrebbero riguardare l'integrazione con ulteriori esperimenti in campo e dati da crowdsourcing, per promuovere comunicazione, partecipazione e dialogo aumentando la resilienza attraverso consapevolezza pubblica e impegno civico in una concezione di comunità smart

    Algorithms, applications and systems towards interpretable pattern mining from multi-aspect data

    Get PDF
    How do humans move around in the urban space and how do they differ when the city undergoes terrorist attacks? How do users behave in Massive Open Online courses~(MOOCs) and how do they differ if some of them achieve certificates while some of them not? What areas in the court elite players, such as Stephen Curry, LeBron James, like to make their shots in the course of the game? How can we uncover the hidden habits that govern our online purchases? Are there unspoken agendas in how different states pass legislation of certain kinds? At the heart of these seemingly unconnected puzzles is this same mystery of multi-aspect mining, i.g., how can we mine and interpret the hidden pattern from a dataset that simultaneously reveals the associations, or changes of the associations, among various aspects of the data (e.g., a shot could be described with three aspects, player, time of the game, and area in the court)? Solving this problem could open gates to a deep understanding of underlying mechanisms for many real-world phenomena. While much of the research in multi-aspect mining contribute broad scope of innovations in the mining part, interpretation of patterns from the perspective of users (or domain experts) is often overlooked. Questions like what do they require for patterns, how good are the patterns, or how to read them, have barely been addressed. Without efficient and effective ways of involving users in the process of multi-aspect mining, the results are likely to lead to something difficult for them to comprehend. This dissertation proposes the M^3 framework, which consists of multiplex pattern discovery, multifaceted pattern evaluation, and multipurpose pattern presentation, to tackle the challenges of multi-aspect pattern discovery. Based on this framework, we develop algorithms, applications, and analytic systems to enable interpretable pattern discovery from multi-aspect data. Following the concept of meaningful multiplex pattern discovery, we propose PairFac to close the gap between human information needs and naive mining optimization. We demonstrate its effectiveness in the context of impact discovery in the aftermath of urban disasters. We develop iDisc to target the crossing of multiplex pattern discovery with multifaceted pattern evaluation. iDisc meets the specific information need in understanding multi-level, contrastive behavior patterns. As an example, we use iDisc to predict student performance outcomes in Massive Open Online Courses given users' latent behaviors. FacIt is an interactive visual analytic system that sits at the intersection of all three components and enables for interpretable, fine-tunable, and scrutinizable pattern discovery from multi-aspect data. We demonstrate each work's significance and implications in its respective problem context. As a whole, this series of studies is an effort to instantiate the M^3 framework and push the field of multi-aspect mining towards a more human-centric process in real-world applications
    corecore