195 research outputs found

    Novelty Detection And Cluster Analysis In Time Series Data Using Variational Autoencoder Feature Maps

    The identification of atypical events and anomalies in complex data systems is an essential yet challenging task. The dynamic nature of these systems produces huge volumes of data that are often heterogeneous, and failure to account for this impedes the detection of anomalies. Time series data exhibit these issues, and their high dimensionality intensifies the challenges. This research presents a framework for identifying anomalies in temporal data. A comparative analysis of centroid-, density- and neural-network-based clustering techniques was performed and their scalability was assessed. This led to the development of a new algorithm, the Variational Autoencoder Feature Map (VAEFM), an ensemble method based on Kohonen's Self-Organizing Maps (SOM) and Variational Autoencoders. The VAEFM is an unsupervised learning algorithm that models the distribution of temporal data without making a priori assumptions. It incorporates principles of novelty detection to enhance the representational capacity of SOM neurons, which improves their ability to generalize to novel data. The VAEFM technique was demonstrated on a dataset of accumulated aircraft sensor recordings to detect atypical events that occurred in the approach phase of flight. This is a proactive means of accident prevention and is therefore advantageous to the aviation industry. Furthermore, accumulated aircraft data present big data challenges that require scalable analytical solutions. The results indicated that VAEFM successfully identified temporal dependencies in the flight data and produced several clusters and outliers. It analyzed over 2500 flights in under 5 minutes and identified 12 clusters, two of which contained stabilized approaches. The remaining clusters comprised aborted approaches, excessively high or fast descent patterns, and other contributory factors for unstabilized approaches. The detected outliers revealed oscillations in aircraft trajectories, some of which would have a lower detection rate using traditional flight safety analytical techniques. The results further indicated that VAEFM facilitates large-scale analysis; its scaling efficiency was demonstrated on a High Performance Computing system by using an increased number of processors, where it achieved an average speedup of 70%.

    Qualitative Spatial Query Processing : Towards Cognitive Geographic Information Systems

    For a long time, Geographic Information Systems (GISs) have been used by GIS experts to perform numerous tasks, including wayfinding, mapping, and querying geospatial databases. The advancement of Web 2.0 technologies and the development of mobile applications present an excellent opportunity to give the public (non-expert users) access to the information held in GISs. However, GIS interfaces were mainly designed around the quantitative values of spatial databases to serve GIS experts, whereas non-expert users usually prefer a qualitative approach to interacting with GISs. For example, humans typically resort to expressions such as "the building is near a riverbank" or "there is a restaurant inside a park", which qualitatively locate one spatial entity with respect to another. In other words, users' interaction with current GISs is still neither intuitive nor efficient. This dissertation thus aims to enable users to search the spatial databases of GISs intuitively and efficiently by means of qualitative relations or terms such as left, north of, or inside. We use these qualitative relations to formalise so-called Qualitative Spatial Queries (QSQs). In addition to existing topological models, we integrate qualitative distance and direction models into Spatial Database Management Systems (SDBMSs) to allow qualitative and intuitive formulation of queries in GISs. Furthermore, we abstract binary Qualitative Spatial Relations (QSRs) covering the aforementioned aspects of space from the database objects. We store the abstracted QSRs in a Qualitative Spatial Layer (QSL), with which we extend current SDBMSs, to avoid the additional cost of repeating the abstraction process for every single query. Nevertheless, abstracting the QSRs of the QSL results in a high space complexity in terms of qualitative representations.
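    The idea of abstracting binary qualitative relations once and answering queries from the stored layer can be sketched as follows. This is a toy illustration only, not the dissertation's QSL or any SDBMS API: the object names, the bounding-box representation, the assumption that the y axis increases northwards, and the `relations` and `query` helpers are all hypothetical.

```python
from itertools import permutations

# Hypothetical toy database: name -> axis-aligned box (xmin, ymin, xmax, ymax).
# Assumption: the y axis increases towards the north.
objects = {
    "park": (0, 0, 10, 10),
    "restaurant": (2, 2, 3, 3),
    "museum": (4, 20, 6, 22),
}

def relations(a, b):
    """Return the qualitative relations that box a holds with respect to box b."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    rels = set()
    if ax0 >= bx0 and ay0 >= by0 and ax1 <= bx1 and ay1 <= by1:
        rels.add("inside")
    if ay0 >= by1:
        rels.add("north of")
    if ay1 <= by0:
        rels.add("south of")
    return rels

# Abstract every ordered pair once, so that queries need no geometry at all.
layer = {
    (a, b): relations(objects[a], objects[b])
    for a, b in permutations(objects, 2)
}

def query(relation, reference):
    """Names of all objects standing in `relation` to `reference`."""
    return sorted(
        a for (a, b), rels in layer.items()
        if b == reference and relation in rels
    )

inside_park = query("inside", "park")
north_of_park = query("north of", "park")
south_of_museum = query("south of", "museum")
```

    Storing one entry per ordered pair means n objects need n(n-1) entries (6 for the 3 objects here), which illustrates the space cost the abstract mentions.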

    Human activity recognition with accelerometry: novel time and frequency features

    Human Activity Recognition systems require objective and reliable methods that can be used in daily routine and that give results consistent with the activities performed. These systems are under development and offer objective and personalized support for several applications, for example in healthcare. This thesis aims to create a framework for human activity recognition based on accelerometry signals. Some new features and techniques inspired by audio recognition methodology are introduced in this work, namely Log Scale Power Bandwidth and the application of Markov Models. Forward Feature Selection was adopted as the feature selection algorithm in order to improve clustering performance and limit the computational demands. This method selects the most suitable set of features for activity recognition in accelerometry from a 423-dimensional feature vector. Several Machine Learning algorithms were applied to the accelerometry databases used (the FCHA and PAMAP databases), and these showed promising results in activity recognition. The developed set of algorithms constitutes a significant contribution to the development of reliable evaluation methods of movement disorders for diagnosis and treatment applications.
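    Greedy forward feature selection, as named in the abstract above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the toy data, the nearest-centroid scoring function and the stopping rule (stop when no candidate improves the score) are all chosen here for illustration.

```python
from statistics import mean

def nearest_centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier using only `features`."""
    labels = sorted(set(y))
    centroids = {
        c: [mean(row[f] for row, lab in zip(X, y) if lab == c) for f in features]
        for c in labels
    }
    correct = 0
    for row, lab in zip(X, y):
        point = [row[f] for f in features]
        pred = min(
            labels,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == lab
    return correct / len(y)

def forward_feature_selection(X, y, score, max_features):
    """Add one feature at a time, always the one that raises the score most."""
    selected = []
    best_score = float("-inf")
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(score(X, y, selected + [f]), f) for f in remaining]
        cand_score, cand = max(scored, key=lambda t: t[0])
        if cand_score <= best_score:
            break  # no remaining feature improves the score
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected, best_score

# Toy data (assumption): only feature 1 separates the two classes cleanly.
X = [
    [0.5, 0.0, 3.0], [0.1, 0.2, 1.0], [0.4, 0.1, 2.0],
    [0.3, 1.0, 2.5], [0.2, 0.9, 1.5], [0.6, 1.1, 3.0],
]
y = [0, 0, 0, 1, 1, 1]

selected, best = forward_feature_selection(
    X, y, nearest_centroid_accuracy, max_features=2
)
```

    On this toy data the search keeps only feature 1, because adding either other feature cannot improve on a perfect score. In practice the score would normally be measured on held-out data rather than on the training set.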

    Spatial Keyword Querying: Ranking Evaluation and Efficient Query Processing


    An Information-Theoretic Approach for Estimating Scenario Generalization in Crowd Motion Prediction

    Learning-based approaches to modeling crowd motion have become increasingly successful but require training and evaluation on large datasets, coupled with complex model selection and parameter tuning. To circumvent this tremendously time-consuming process, we propose a novel scoring method that characterizes the generalization of models trained on source crowd scenarios and applied to target crowd scenarios using a training-free, model-agnostic Interaction + Diversity Quantification score, ISDQ. The Interaction component aims to characterize the difficulty of scenario domains, while the diversity of a scenario domain is captured in the Diversity score. Both scores can be computed in a computationally tractable manner. Our experimental results validate the efficacy of the proposed method on several simulated and real-world (source, target) generalization tasks, demonstrating its potential to select optimal domain pairs before training and testing a model.

    Taramalı electron mikroskobu görüntülerinde mitokondrilerin otomatik olarak bölütlenmesi [Automatic segmentation of mitochondria in scanning electron microscopy images]

    Many studies have shown that the shape of mitochondria indicates the occurrence of diseases. Scanning Electron Microscopy (SEM) makes it possible to image the internal structures of the cell, including mitochondria. Automatic segmentation of mitochondria supports specialists in diagnosing diseases. Few studies address the automatic segmentation of mitochondria in Serial Block-Face Scanning Electron Microscopy (SBFSEM) images. The SBFSEM imaging technique provides full automation and well-registered images, and requires less time and effort for data acquisition; it was therefore selected for this study. Recently, deep learning methods have been applied to image processing of SEM datasets. However, because they require huge datasets, considerable effort and powerful computers to prepare training and testing data, an energy-based model is used in this study instead. The algorithms used in this thesis are primarily those developed by Tasel et al. for mitochondria segmentation in TEM images. The method comprises preprocessing, ridge detection, energy mapping, curve fitting, snake-based shape extraction, validation and post-processing steps. In this thesis, these algorithms are adapted and refined for SBFSEM images to obtain optimum performance. Evaluation uses the Dice Similarity Coefficient (DSC), precision, recall and F-score metrics. M.S. - Master of Science
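    The evaluation metrics named in this abstract (Dice Similarity Coefficient, precision, recall and F-score) can be computed from a predicted and a reference binary mask as sketched below. The flattened toy masks and the `segmentation_metrics` helper are illustrative assumptions, not the thesis's evaluation code; returning 0.0 when a denominator is zero is also a convention chosen here.

```python
def segmentation_metrics(pred, truth):
    """Compare two flattened binary masks (sequences of 0/1 of equal length)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)          # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)      # false positives
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)      # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if precision + recall else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "dice": dice,
        "f_score": f_score,
    }

# Toy masks (assumption): 3 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = segmentation_metrics(pred, truth)
```

    For binary masks the Dice coefficient and the F1 score are algebraically the same quantity, 2TP / (2TP + FP + FN), so the two values agree in this sketch.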

    슬라이딩 윈도우상의 빠른 점진적 밀도 기반 클러스터링 [Fast incremental density-based clustering over sliding windows]

    Doctoral dissertation -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, August 2022. Bongki Moon.
    Given the prevalence of mobile and IoT devices, continuous clustering of streaming data has become an essential and increasingly important tool for data analytics. Among the many clustering approaches, density-based clustering has garnered much attention because of its unique advantage: it can detect clusters of arbitrary shape in the presence of noise. However, when the clusters need to be updated continuously along with an evolving input dataset, a relatively high computational cost is required. In particular, deleting data points from the clusters causes severe performance degradation. This dissertation addresses the performance limits of incremental density-based clustering over sliding windows and proposes two algorithms, DISC and DenForest. The first algorithm, DISC, is an incremental density-based clustering algorithm that efficiently produces the same clustering results as DBSCAN over sliding windows. It focuses on redundancy issues that occur when updating clusters: when multiple data points are inserted or deleted individually, surrounding data points are explored and retrieved redundantly. DISC addresses these issues and improves performance by updating multiple points in a batch. It also presents several optimization techniques. The second algorithm, DenForest, is an incremental density-based clustering algorithm that primarily focuses on the deletion process. Unlike previous methods that manage clusters as a graph, DenForest manages clusters as a group of spanning trees, which makes deletion very efficient. Moreover, it provides a batch-optimized technique to improve insertion performance. To demonstrate the effectiveness of the two algorithms, extensive evaluations were conducted; they show that DISC and DenForest significantly outperform state-of-the-art density-based clustering algorithms.
    Table of contents:
    1 Introduction
      1.1 Overview of Dissertation
    2 Related Works
      2.1 Clustering
      2.2 Density-Based Clustering for Static Datasets
        2.2.1 Extension of DBSCAN
        2.2.2 Approximation of Density-Based Clustering
        2.2.3 Parallelization of Density-Based Clustering
      2.3 Incremental Density-Based Clustering
        2.3.1 Approximated Density-Based Clustering for Dynamic Datasets
      2.4 Density-Based Clustering for Data Streams
        2.4.1 Micro-clusters
        2.4.2 Density-Based Clustering in Damped Window Model
        2.4.3 Density-Based Clustering in Sliding Window Model
      2.5 Non-Density-Based Clustering
        2.5.1 Partitional Clustering and Hierarchical Clustering
        2.5.2 Distribution-Based Clustering
        2.5.3 High-Dimensional Data Clustering
        2.5.4 Spectral Clustering
    3 Background
      3.1 DBSCAN
        3.1.1 Reformulation of Density-Based Clustering
      3.2 Incremental DBSCAN
      3.3 Sliding Windows
        3.3.1 Density-Based Clustering over Sliding Windows
        3.3.2 Slow Deletion Problem
    4 Avoiding Redundant Searches in Updating Clusters
      4.1 The DISC Algorithm
        4.1.1 Overview of DISC
        4.1.2 COLLECT
        4.1.3 CLUSTER
          4.1.3.1 Splitting a Cluster
          4.1.3.2 Merging Clusters
        4.1.4 Horizontal Manner vs. Vertical Manner
      4.2 Checking Reachability
        4.2.1 Multi-Starter BFS
        4.2.2 Epoch-Based Probing of R-tree Index
      4.3 Updating Labels
    5 Avoiding Graph Traversals in Updating Clusters
      5.1 The DenForest Algorithm
        5.1.1 Overview of DenForest
          5.1.1.1 Supported Types of the Sliding Window Model
        5.1.2 Nostalgic Core and Density-based Clusters
          5.1.2.1 Cluster Membership of Border
        5.1.3 DenTree
      5.2 Operations of DenForest
        5.2.1 Insertion
          5.2.1.1 MST based on Link-Cut Tree
          5.2.1.2 Time Complexity of Insert Operation
        5.2.2 Deletion
          5.2.2.1 Time Complexity of Delete Operation
        5.2.3 Insertion/Deletion Examples
        5.2.4 Cluster Membership
        5.2.5 Batch-Optimized Update
      5.3 Clustering Quality of DenForest
        5.3.1 Clustering Quality for Static Data
        5.3.2 Discussion
        5.3.3 Replaceability
          5.3.3.1 Nostalgic Cores and Density
          5.3.3.2 Nostalgic Cores and Quality
        5.3.4 1D Example
    6 Evaluation
      6.1 Real-World Datasets
      6.2 Competing Methods
        6.2.1 Exact Methods
        6.2.2 Non-Exact Methods
      6.3 Experimental Settings
      6.4 Evaluation of DISC
        6.4.1 Parameters
        6.4.2 Baseline Evaluation
        6.4.3 Drilled-Down Evaluation
          6.4.3.1 Effects of Threshold Values
          6.4.3.2 Insertions vs. Deletions
          6.4.3.3 Range Searches
          6.4.3.4 MS-BFS and Epoch-Based Probing
        6.4.4 Comparison with Summarization/Approximation-Based Methods
      6.5 Evaluation of DenForest
        6.5.1 Parameters
        6.5.2 Baseline Evaluation
        6.5.3 Drilled-Down Evaluation
          6.5.3.1 Varying Size of Window/Stride
          6.5.3.2 Effect of Density and Distance Thresholds
          6.5.3.3 Memory Usage
          6.5.3.4 Clustering Quality over Sliding Windows
          6.5.3.5 Clustering Quality under Various Density and Distance Thresholds
          6.5.3.6 Relaxed Parameter Settings
        6.5.4 Comparison with Summarization-Based Methods
    7 Future Work: Extension to Varying/Relative Densities
    8 Conclusion
    Abstract (In Korean)
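    For context, the naive baseline for density-based clustering over sliding windows is to re-run DBSCAN from scratch on every window, as sketched below. This is deliberately not DISC or DenForest, whose details are not given in this listing; it only illustrates the repeated work that incremental algorithms aim to avoid. The 1-D integer stream and the values of `eps`, `min_pts`, `window` and `stride` are illustrative assumptions.

```python
from collections import deque

def dbscan(points, eps, min_pts):
    """Plain DBSCAN on 1-D points; returns one label per point (-1 = noise)."""
    n = len(points)
    # A point's neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if abs(points[i] - points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # only core points expand the cluster
    return labels

def cluster_stream(stream, window, stride, eps, min_pts):
    """Re-cluster each sliding window from scratch (the costly baseline)."""
    return [
        dbscan(stream[start:start + window], eps, min_pts)
        for start in range(0, len(stream) - window + 1, stride)
    ]

# Toy stream (assumption): a dense group expires while a new one arrives.
stream = [10, 11, 12, 50, 90, 91, 92, 93]
results = cluster_stream(stream, window=6, stride=2, eps=1, min_pts=3)
```

    In the first window the points 10 to 12 form the only cluster; after the window slides by two, they have partly expired and 90 to 93 form the cluster instead. Every slide repeats all neighborhood searches, including those for points that stayed in the window.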

    Exploring anomalies in time


    A Smart Charging Assistant for Electric Vehicles Considering Battery Degradation, Power Grid and User Constraints

    The rise of intermittent electricity generation from renewable energy sources increasingly complicates the efficient and reliable operation of supply grids. At the same time, the number of electric vehicles, which require considerable amounts of electrical energy for charging, is growing rapidly. The energy and mobility sectors are thus inevitably linked, with the consequence that reliable electric mobility depends on a robust power supply. In addition, vehicle users perceive their individual mobility as restricted, because electric vehicles currently have a shorter range than vehicles with combustion engines and need more time to charge. This thesis therefore presents a novel concept and a software application (charging assistant) that supports users in charging their electric vehicles while taking into account the interests of all stakeholders involved. To this end, design features of possible software architectures are first compared in order to define a suitable structure of modules and their interconnection. Then, using real data, both energy consumption models and battery models are developed, improved and validated; these represent the driving and charging characteristics of electric vehicles. The main contributions of this thesis result from the development and validation of the following three core components of the charging assistant. First, the individual mobility behavior of users is modeled and evaluated using recorded and semi-synthetic driving data from electric vehicles. In particular, a novel two-stage clustering algorithm is developed to identify users' frequently visited locations. Ensembles of random forest models are then used to predict the next locations and the typical parking durations there. Second, mixed-integer stochastic optimization is applied to plan charging stops over a future time horizon as conveniently and cost-effectively as possible. A graph-based algorithm is used to quantify the energy demand and the probability of occurrence of an electric vehicle user's mobility scenarios. For validation, two alternative charging strategies are defined and compared with the proposed system. Third, a nonlinear optimization scheme is developed to exploit existing time and energy flexibility in electric vehicle charging processes. The integration of a detailed battery model enables precise quantification of the cost savings resulting from reduced battery aging and dynamic electricity tariffs. Using data from real electric vehicle charging processes, influences on the profitability of vehicle-to-grid applications can be identified. Implementing the presented approach in a realistic environment yields an architecture design and a communication concept for optimization-based smart charging systems. In the process, further challenges related to standardized charging communication, interventions by energy suppliers and user acceptance are uncovered.