
    A Computational Framework for Finding Interestingness Hotspots in Spatial Datasets

    The rapid growth of spatial data has increased the need for automated discovery of spatial knowledge. An important task when analyzing spatial data is hotspot discovery. In this dissertation, we propose a novel methodology for discovering interestingness hotspots in spatial datasets. We define interestingness hotspots as contiguous regions in space that are interesting based on a domain expert's notion of interestingness, captured by an interestingness function. We propose computational methods for finding interestingness hotspots in point-based and polygonal spatial datasets and in gridded spatial-temporal datasets. The proposed framework identifies hotspots maximizing an externally given interestingness function defined on any number of spatial or non-spatial attributes using a five-step methodology: (1) identifying neighboring objects in the dataset, (2) generating hotspot seeds, (3) growing hotspots from the identified seeds, (4) post-processing to remove highly overlapping, redundant neighboring hotspots, and (5) finding the scope of hotspots. In particular, we introduce novel hotspot-growing algorithms that grow hotspots from hotspot seeds. A novel growing algorithm for point-based datasets operates on Gabriel graphs, which capture the neighboring relationships of objects in a spatial dataset. Moreover, we present a novel graph-based post-processing algorithm that removes highly overlapping hotspots and employs a graph-simplification step which significantly improves the runtime of finding a maximum-weight independent set in the overlap graph of hotspots. The proposed post-processing algorithm is quite generic and can be used with any method that must cope with overlapping hotspots or clusters; the graph-simplification step can also be adapted as a preprocessing step by algorithms that find maximum-weight cliques and maximum-weight independent sets in graphs. Furthermore, we propose a computational framework for finding the scope of two-dimensional point-based hotspots. We evaluate our framework in case studies using a gridded air-pollution dataset and point-based crime and taxicab datasets, in which we find hotspots based on different interestingness functions and compare our framework with a state-of-the-art hotspot discovery technique. Experiments show that our methodology accurately discovers interestingness hotspots and compares well with traditional hotspot detection methods.
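    The growing step (3) is the heart of the framework and can be pictured with a minimal Python sketch of greedy, contiguity-preserving hotspot growing on a neighborhood graph. The names `neighbors` and `interestingness` are placeholders for the dissertation's Gabriel-graph adjacency and domain-supplied interestingness function, and the simple greedy rule is an illustration, not the exact algorithm.

```python
# Minimal sketch of greedy hotspot growing on a neighborhood graph.
# `neighbors` maps each object id to its adjacent objects (e.g. the edges
# of a Gabriel graph); `interestingness` scores any set of objects. Both
# are assumptions standing in for the dissertation's domain-specific inputs.

def grow_hotspot(seed, neighbors, interestingness):
    region = {seed}
    score = interestingness(region)
    while True:
        # Candidates adjacent to the current region (keeps the region contiguous).
        frontier = {n for obj in region for n in neighbors[obj]} - region
        best, best_score = None, score
        for cand in frontier:
            s = interestingness(region | {cand})
            if s > best_score:
                best, best_score = cand, s
        if best is None:          # no neighbor improves the score: stop growing
            return region, score
        region.add(best)
        score = best_score
```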

    Geoinformatic methodologies and quantitative tools for detecting hotspots and for multicriteria ranking and prioritization: application on biodiversity monitoring and conservation

    Whoever has the responsibility for managing a conservation zone must not only be aware of the area's environmental problems but should also have at their disposal up-to-date databases and appropriate methodological instruments to examine each individual case carefully. In effect, the environmental decision maker has to arrange in advance the steps necessary to withstand the foreseeable variations in the trends of human pressure on conservation zones. The essential objective of this thesis is methodological: to compare different multivariate statistical methods useful for environmental hotspot detection and for environmental ranking and prioritization. The general environmental goal is the conservation of the biodiversity patrimony. The identification, through multivariate statistical tools, of habitats having top ecological priority is only the first basic step towards this aim. Ecological information integrated into the human context is an essential further step for making environmental evaluations and planning correct conservation actions. A wide series of data and information was necessary to accomplish these environmental management tasks. The ecological data are provided by the Italian Ministry of the Environment and come from the "Map of Italian Nature" project database; the demographic data come from the Italian Institute of Statistics (ISTAT). The data refer to two Italian areas: the Baganza Valley (Parma) and the Oltrepò Pavese and Ligurian-Emilian Apennine. The analysis was carried out at two different spatial levels: ecological-naturalistic (the habitat) and administrative (the Commune). Correspondingly, the main results obtained are: 1. Habitat level: comparing two ranking and prioritization methods, the Ideal Vector and the Salience method, through important ecological metrics such as Ecological Value (E.V.) and Ecological Sensitivity (E.S.), gives results that are not directly comparable. Not being based on ranking the original values, the Ideal Vector method seems preferable in landscapes characterized by high spatial heterogeneity. Conversely, the Salience method is probably to be preferred in ecological landscapes with a low degree of heterogeneity, that is, where differences in habitat E.V. and E.S. are not too large. 2. Commune level: since habitats are only naturalistic partitions of a given territory, for management decisions it is necessary to move to the corresponding administrative units (the Communes). From this point of view, the introduction of demography is an essential element of novelty in ecological-environmental analysis. Indeed, demographic analysis makes the result at point 1 much more realistic by introducing other dimensions (current human pressure and its trends) that allow the identification of environmentally fragile areas. Furthermore, this approach clearly identifies the environmental responsibility of each administrative body with respect to biodiversity conservation: a ranking of the Communes according to environmental and demographic features clarifies the management responsibilities of each of them.
    A concrete application of this necessary and useful integration of ecological and demographic data is discussed in the design of an Ecological Network (E.N.). The novelty of the resulting network is that it is not "static" but "dynamic": its planning takes the trends of human pressure into account in order to identify the probable future fragile points, and hence the areas whose management will be most critical.
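    As an illustration of the Ideal Vector idea, the sketch below ranks habitats by their distance from an ideal point in the (E.V., E.S.) plane. This is one plausible reading of the method, with both metrics assumed rescaled to [0, 1], and it may differ from the thesis's exact formulation.

```python
import math

# Illustrative sketch of an Ideal-Vector-style prioritization: each habitat
# is described by two ecological metrics (Ecological Value and Ecological
# Sensitivity, both assumed rescaled to [0, 1]); habitats closest to the
# ideal point (1, 1) get top conservation priority.

def ideal_vector_rank(habitats):
    """habitats: dict mapping habitat id -> (ev, es), both in [0, 1]."""
    ideal = (1.0, 1.0)
    dist = {h: math.dist(v, ideal) for h, v in habitats.items()}
    return sorted(dist, key=dist.get)  # smallest distance = highest priority

print(ideal_vector_rank({"H1": (0.9, 0.8), "H2": (0.4, 0.7), "H3": (0.95, 0.9)}))
# -> ['H3', 'H1', 'H2']
```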

    Generalized and efficient outlier detection for spatial, temporal, and high-dimensional data mining

    Knowledge Discovery in Databases (KDD) is the process of extracting non-trivial patterns from large databases, with the goal that these patterns are novel, potentially useful, statistically valid, and understandable. The process involves multiple phases, including selection, preprocessing, evaluation, and the analysis step known as data mining. One of the key tasks in data mining is outlier detection: identifying observations that are unusual and appear inconsistent with the majority of the data. Such rare observations can have various causes: measurement errors, unusually extreme (but genuine) measurements, corrupted data, or even manipulated data. Over the past years, numerous outlier detection algorithms have been proposed that often appear only slightly different from previous ones, yet are presented in the publications as experimentally "clearly outperforming" the others. A key focus of this thesis is to unify and modularize the various approaches into a common formalism. This makes the analysis of their actual differences easier and, at the same time, increases the flexibility of the approaches, since modules can be added or replaced to adapt a method to different requirements and data types. To show the benefits of the modularized structure, (i) several existing algorithms are formalized within the new framework; (ii) new modules are added that improve the robustness, efficiency, statistical validity, and usability of the scores and that can be combined with existing methods; (iii) modules are modified to allow existing and new algorithms to run on other, often more complex data types, including spatial, temporal, and high-dimensional data; (iv) multiple algorithm instances are combined into an ensemble method; and (v) the scalability to large data sets is improved using approximate as well as exact indexing. The starting point is the Local Outlier Factor (LOF) algorithm, which is first extended with small modifications to increase its robustness and the usability of the produced scores. These methods are then abstracted into a general framework for local outlier detection, so that the same benefits become available to other algorithms. By abstracting from a single vector space to general data types, spatial and temporal relationships can also be analyzed. The use of subspace and correlation-based neighborhoods then allows the algorithms to detect new kinds of outliers in arbitrarily oriented projections. Improvements in score normalization restore the statistical intuition of probabilities to the outlier scores, which previously were only useful for ranking objects, while improved models also offer explanations of why an object was considered an outlier.
    Subsequently, improved modules are presented for different parts of the framework that, for example, allow the same algorithms to run on significantly larger data sets (in approximately linear instead of quadratic time) by accepting approximate neighborhoods at little loss in precision and effectiveness. Additionally, multiple algorithms with different intuitions can be run at the same time and their results combined into an ensemble method that is able to detect outliers of different types. Finally, new outlier detection methods are constructed, customized to the specific problems of real data sets. These new methods yield insightful results that could not be obtained with existing methods, and since they are constructed from the same building blocks, there is a strong and explicit connection to the previous approaches. By using the indexing structures introduced earlier, the algorithms can be executed efficiently even on large data sets.
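    For readers who want to experiment with the starting point, LOF is available off the shelf; the minimal example below uses scikit-learn's implementation (one common implementation, not the modular framework developed in the thesis).

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Minimal LOF usage example with scikit-learn: 100 Gaussian inliers
# plus two obvious outliers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               [[8.0, 8.0], [-7.0, 9.0]]])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)             # -1 marks predicted outliers
scores = -lof.negative_outlier_factor_  # larger = more outlying
print(np.where(labels == -1)[0], scores[-2:])
```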

    Optimal sensor placement for sewer capacity risk management

    Complex linear assets, such as those found in transportation and utilities, are vital to economies and, in some cases, to public health. Wastewater collection systems in the United States are vital to both. Yet effective approaches to remediating failures in these systems remain an unresolved shortfall for system operators. This shortfall is evident in the estimated 850 billion gallons of untreated sewage that escape combined sewer pipes each year (US EPA 2004a) and the estimated 40,000 sanitary sewer overflows and 400,000 backups of untreated sewage into basements (US EPA 2001). Failures in wastewater collection systems can be prevented if they are detected in time to apply intervention strategies such as pipe maintenance, repair, or rehabilitation. This is the essence of a risk management process. The International Council on Systems Engineering recommends that risks be prioritized as a function of severity and occurrence and that criteria be established for acceptable and unacceptable risks (INCOSE 2007). A significant impediment to applying generally accepted risk models to wastewater collection systems is the difficulty of quantifying risk likelihoods. These difficulties stem from the size and complexity of the systems, the lack of data and statistics characterizing the distribution of risk, the high cost of evaluating even a small number of components, and the lack of methods to quantify risk. This research investigates new methods to assess the likelihood of failure through a novel approach to the placement of sensors in wastewater collection systems. The hypothesis is that iterative movement of water-level sensors, directed by a specialized metaheuristic search technique, can improve the efficiency of discovering locations of unacceptable risk. An agent-based simulation is constructed to validate the performance of this technique and to test its sensitivity to varying environments. The results demonstrate that a multi-phase search strategy, with a varying number of sensors deployed in each phase, can efficiently discover locations of unacceptable risk, which can then be managed through a perpetual monitoring, analysis, and remediation process. A number of promising, well-defined future research opportunities also emerged from this research.
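    The iterative sensor-movement hypothesis can be pictured with the toy sketch below, in which a handful of sensors are repeatedly redeployed toward the riskiest observation found so far. All names and the simple relocation rule are illustrative stand-ins for the specialized metaheuristic and agent-based simulation used in the research.

```python
import random

# Toy sketch of multi-phase sensor relocation over a line of candidate
# locations (e.g. manholes). `true_risk` is revealed only where a sensor
# is actually deployed; each phase redeploys sensors to the unvisited
# locations closest to the riskiest reading so far.

def iterative_search(locations, true_risk, n_sensors=5, phases=10, seed=0):
    rng = random.Random(seed)
    observed = {}                                  # location -> measured risk
    placed = rng.sample(locations, n_sensors)
    for _ in range(phases):
        for loc in placed:
            observed[loc] = true_risk(loc)         # deploy and measure
        hot = max(observed, key=observed.get)      # riskiest spot so far
        unvisited = [l for l in locations if l not in observed]
        if not unvisited:
            break
        unvisited.sort(key=lambda l: abs(l - hot)) # move toward the hotspot
        placed = unvisited[:n_sensors]
    return sorted(observed, key=observed.get, reverse=True)

# Example: risk peaks around location 70 on a 100-node line.
ranked = iterative_search(list(range(100)), lambda l: -abs(l - 70), 5, 8)
print(ranked[:5])   # locations near 70 surface quickly
```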

    Ubiquitous intelligence for smart cities: a public safety approach

    Citizen-centered safety enhancement is an integral component of public safety and a top priority for decision makers in smart city development. However, public safety agencies are constantly faced with the challenge of deterring crime. While most smart city initiatives have placed emphasis on the use of modern technology for fighting crime, this may not be sufficient to achieve a sustainably safe and smart city in a resource-constrained environment, such as in Africa. In particular, crime series, that is, sets of crimes considered to have been committed by the same offender, are currently little explored in developing nations and have great potential for helping to fight crime and promote safety in smart cities. This research focuses on detecting crime situations through data mining approaches that can be used to promote citizens' safety and assist security agencies in knowledge-driven decision support, such as crime series identification. While much research has been conducted on crime hotspots, not enough has been done on identifying crime series. This thesis presents a novel crime clustering model, CriClust, for crime series pattern (CSP) detection and mapping, designed to derive useful knowledge from a crime dataset by drawing on sound scientific and mathematical principles as well as assumptions from theories of environmental criminology. The analysis is augmented using a dual-threshold model, and pattern prevalence information is encoded in similarity graphs. Clusters are identified by finding highly connected subgraphs, using an adaptive graph size and Monte Carlo heuristics in the Karger-Stein mincut algorithm. We introduce two new interest measures: (i) Proportion Difference Evaluation (PDE), which reveals the propagation effect of a series and the dominant series; and (ii) Pattern Space Enumeration (PSE), which reveals underlying strong correlations and defining features of a series. Our findings on an experimental quasi-real dataset, generated based on expert knowledge, reveal that identifying CSPs and statistically interpretable patterns could contribute significantly to strengthening public safety service delivery in smart city development. The evaluation investigated: (i) the reliability of the model in identifying all inherent series in a crime dataset; (ii) the scalability of the model with varying crime record volumes; and (iii) unique features of the model compared with competing baseline algorithms and related research. We found that the Monte Carlo technique and the adaptive graph size mechanism for crime similarity clustering yield substantial improvements. The study also found that proportion estimation (PDE) and PSE of series clusters can provide valuable insight into crime deterrence strategies. Furthermore, visual enhancement of clusters, using graphical approaches to organizing information and presenting a unified view, promotes prompt identification of important areas demanding attention. Our model particularly attempts to preserve desirable and robust statistical properties. This research presents considerable empirical evidence that the proposed crime clustering (CriClust) model is promising and can assist in deriving useful crime pattern knowledge, contributing knowledge services for public safety authorities and intelligence-gathering organisations in developing nations, thereby promoting a sustainable "safe and smart" city.
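    The randomized mincut at the core of the clustering step can be sketched with Karger's contraction algorithm, the building block of the Karger-Stein method named above. The sketch below is illustrative only and omits CriClust's adaptive graph size, dual-threshold, and Monte Carlo machinery.

```python
import random

# Sketch of Karger's randomized contraction on an undirected graph given
# as a list of (u, v) edges: contract random edges until two supernodes
# remain, then report the edges crossing between them.

def karger_mincut(edges, rng):
    parent = {}
    def find(x):
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]          # path halving
            x = parent[x]
        return x
    n = len({v for e in edges for v in e})
    while n > 2:
        u, v = rng.choice(edges)
        ru, rv = find(u), find(v)
        if ru != rv:                               # contract one edge
            parent[ru] = rv
            n -= 1
    return [e for e in edges if find(e[0]) != find(e[1])]

rng = random.Random(1)
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
cut = min((karger_mincut(edges, rng) for _ in range(30)), key=len)
print(cut)   # with high probability the bridge, e.g. [(3, 4)]
```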

    A COMPREHENSIVE GEOSPATIAL KNOWLEDGE DISCOVERY FRAMEWORK FOR SPATIAL ASSOCIATION RULE MINING

    Continuous advances in modern data collection techniques help spatial scientists gain access to massive, high-resolution spatial and spatio-temporal data. There is thus an urgent need for effective and efficient methods to find unknown and useful information embedded in datasets of unprecedented size (e.g., millions of observations), high dimensionality (e.g., hundreds of variables), and complexity (e.g., heterogeneous data sources, space–time dynamics, multivariate connections, explicit and implicit spatial relations and interactions). Responding to this line of development, this research focuses on the use of association rule (AR) mining for geospatial knowledge discovery. Prior attempts have sidestepped the complexity of the spatial dependence structure embedded in the studied phenomenon, which makes adopting association rule mining in spatial analysis rather problematic. Interestingly, a very similar predicament afflicts spatial regression analysis with a spatial weight matrix assigned a priori, without validation on the specific domain of application. Moreover, a dependable geospatial knowledge discovery process requires algorithms supporting automatic, robust, and accurate procedures for evaluating mined results; surprisingly, this has received little attention in the context of spatial association rule mining. To remedy these deficiencies, the foremost goal of this research is to construct a comprehensive geospatial knowledge discovery framework using spatial association rule mining for the detection of spatial patterns embedded in geospatial databases, and to demonstrate its application in the domain of crime analysis. It is the first attempt at delivering a complete geospatial knowledge discovery framework using spatial association rule mining.
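    The core of association rule mining, enumerating rules that meet minimum support and confidence thresholds, can be sketched briefly. The transactions of spatial predicates below are hypothetical stand-ins for itemsets that a spatial AR miner would derive from a geospatial database, and the brute-force enumeration replaces the candidate pruning a real miner would use.

```python
from itertools import combinations

# Hypothetical "transactions" of spatial predicates attached to crime events.
transactions = [
    {"near(bar)", "poor_lighting", "assault"},
    {"near(bar)", "assault"},
    {"near(bar)", "poor_lighting", "theft"},
    {"poor_lighting", "assault"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Enumerate rules A -> b meeting minimum support and confidence.
items = sorted({i for t in transactions for i in t})
for k in (2, 3):
    for combo in combinations(items, k):
        s = support(set(combo))
        if s < 0.5:
            continue
        for rhs in combo:
            lhs = set(combo) - {rhs}
            conf = s / support(lhs)
            if conf >= 0.6:
                print(f"{sorted(lhs)} -> {rhs}  (sup={s:.2f}, conf={conf:.2f})")
```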

    VLSI Design

    This book presents recent advances in the design of nanometer VLSI chips. The selected chapters address open problems and challenges in areas ranging from design tools, new post-silicon devices, GPU-based parallel computing, and emerging 3D integration to antenna design. The book consists of two parts, with chapters such as: VLSI design for multi-sensor smart systems on a chip; Three-dimensional integrated circuits design for thousand-core processors; Parallel symbolic analysis of large analog circuits on GPU platforms; Algorithms for CAD tools VLSI design; and A multilevel memetic algorithm for large SAT-encoded problems.

    Sustainable Smart Cities and Smart Villages Research

    There is ever more research on smart cities, and new interdisciplinary approaches to their study are being proposed. At the same time, problems pertinent to communities inhabiting rural areas are being addressed as part of discussions in contiguous fields of research, be it environmental studies, sociology, or agriculture. Even though rural areas and countryside communities have previously been a subject of concern for robust policy frameworks, such as the European Union's Cohesion Policy and Common Agricultural Policy, the concept of 'the village' has arguably been largely absent from the debate. As a result, when advances in sophisticated information and communication technology (ICT) led to the emergence of a rich body of research on smart cities, the application and usability of ICT in the context of a village remained underdiscussed in the literature. Against this backdrop, this volume delivers on four objectives: it delineates the conceptual boundaries of the concept of 'smart village'; it highlights the ways in which a 'smart village' is distinct from a 'smart city'; it examines the ways in which smart cities research can enrich smart villages research; and it sheds light on the smart village research agenda as it unfolds in European and global contexts.