13 research outputs found

    Combined Deep Learning and Traditional NLP Approaches for Fire Burst Detection Based on Twitter Posts

    The current chapter introduces a procedure that aims to determine, as early as possible, regions that are on fire, based on Twitter posts. The proposed scheme utilizes a deep learning approach for analyzing the text of Twitter posts announcing fire bursts. Deep learning is becoming very popular in text applications such as text generalization, text summarization, and text information extraction. A deep learning network is trained to distinguish valid fire-announcing Twitter posts from junk posts. The posts labeled as valid by the network then undergo traditional NLP-based information extraction, in which the initially unstructured text is converted into a structured form, from which the potential location and timestamp of the incident are derived for further exploitation. Analytic processing is then applied to produce aggregated reports, which are finally used to detect geographical areas that are probably threatened by fire. So far, the traditional NLP-based part has been implemented and has already yielded promising results in testing under real-world conditions. The deep learning component is yet to be implemented and is expected to build upon and further improve the performance of the existing architecture.
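
    To make the two-stage idea concrete, the following minimal Python sketch uses a scikit-learn text classifier as a stand-in for the deep network described above, followed by a simple rule-based extraction of a tentative timestamp and location. The toy training posts, the gazetteer and the regular expression are hypothetical and only illustrate the flow; they are not the authors' implementation.

        import re
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Hypothetical labeled examples: 1 = valid fire-announcing post, 0 = junk.
        posts = [
            ("Huge fire burning near Penteli since 14:30, smoke everywhere", 1),
            ("Wildfire spotted close to Marathonas around 6pm", 1),
            ("This new mixtape is fire!!!", 0),
            ("Fired up for the game tonight", 0),
        ]
        texts, labels = zip(*posts)

        # Stand-in for the deep-learning classifier that filters junk posts.
        clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
        clf.fit(texts, labels)

        KNOWN_PLACES = {"penteli", "marathonas"}                      # illustrative gazetteer
        TIME_RE = re.compile(r"\b(\d{1,2}:\d{2}|\d{1,2}\s?[ap]m)\b", re.I)

        def extract_incident(post: str):
            """Rule-based extraction of a tentative location and timestamp."""
            if clf.predict([post])[0] != 1:
                return None                                           # classified as junk
            words = {w.strip(",.!").lower() for w in post.split()}
            place = next(iter(words & KNOWN_PLACES), None)
            time_match = TIME_RE.search(post)
            return {"location": place, "time": time_match.group(0) if time_match else None}

        print(extract_incident("Fire reported near Penteli at 15:10"))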

    Automated Real-time Anomaly Detection in Human Trajectories using Sequence to Sequence Networks

    Detection of anomalous trajectories is an important problem with potential applications in various domains, such as video surveillance, risk assessment, vessel monitoring and high-energy physics. Modeling the distribution of trajectories with statistical approaches has been a challenging task, because such time series are usually non-stationary and high-dimensional. However, modern machine learning techniques provide robust approaches for data-driven modeling and critical information extraction. In this paper, we propose a Sequence to Sequence architecture for real-time detection of anomalies in human trajectories, in the context of risk-based security. Our detection scheme is tested on a synthetic dataset of diverse and realistic trajectories generated by the ISL iCrowd simulator. The experimental results indicate that our scheme accurately detects motion patterns that deviate from normal behaviors and is promising for future real-world applications.
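
    As an illustration of the general reconstruction-based sequence-to-sequence idea (not the authors' exact network), the Python sketch below encodes a 2D trajectory with an LSTM, decodes it back, and flags trajectories whose reconstruction error exceeds a threshold. The layer sizes, the decoding scheme and the threshold are assumptions.

        import torch
        import torch.nn as nn

        class Seq2SeqAE(nn.Module):
            """Minimal LSTM encoder-decoder that reconstructs (x, y) trajectories."""
            def __init__(self, input_dim=2, hidden_dim=32):
                super().__init__()
                self.encoder = nn.LSTM(input_dim, hidden_dim, batch_first=True)
                self.decoder = nn.LSTM(input_dim, hidden_dim, batch_first=True)
                self.out = nn.Linear(hidden_dim, input_dim)

            def forward(self, x):
                _, (h, c) = self.encoder(x)              # summarize the trajectory
                dec_in = torch.zeros_like(x)             # teacher-free decoder input
                dec_out, _ = self.decoder(dec_in, (h, c))
                return self.out(dec_out)

        def anomaly_score(model, traj):
            """Mean squared reconstruction error; higher means more anomalous."""
            with torch.no_grad():
                recon = model(traj)
            return torch.mean((recon - traj) ** 2).item()

        model = Seq2SeqAE()
        # ... train on normal trajectories by minimizing MSE(model(x), x) ...
        traj = torch.cumsum(torch.randn(1, 50, 2) * 0.1, dim=1)   # smooth random walk
        score = anomaly_score(model, traj)
        THRESHOLD = 0.5                                           # assumed, tuned on validation data
        print("anomalous" if score > THRESHOLD else "normal", score)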

    Facial Expression Retrieval Using 3D Mesh Sequences

    In recent years, the increased availability of inexpensive 3D object acquisition hardware and simplified 3D modeling software has resulted in the creation of massive datasets of 3D facial expression mesh sequences, either publicly available or for proprietary use. Consequently, two new problems arose for the research community: facial expression recognition from 3D mesh sequences and facial expression retrieval from 3D mesh sequences. The first problem has attracted considerable interest; in contrast, little research has been conducted on the second, which concerns retrieval.
    This dissertation focuses on facial expression retrieval from large datasets of 3D facial expression mesh sequences. To address this problem, a 3-step retrieval scheme is developed: (i) eight 3D facial landmarks are automatically detected on each 3D face mesh of the sequence; (ii) the extracted landmarks are then used to construct the descriptors of the 3D facial expression mesh sequence; (iii) finally, appropriate distance functions compare descriptors (i.e., a query descriptor against the dataset descriptors) and produce the retrieval list. The core of the problem is the construction of appropriate descriptors. Six novel descriptors were created in this dissertation for 3D mesh sequence facial expression retrieval (GeoTopo, GeoTopo+, DCT-GeoTopo, WT-GeoTopo+, CVD, WT-CVD), each steadily improving the retrieval evaluation metrics. Two of them are spatial, meaning they are based only on the spatial changes of the face caused by an expression, and the remaining four are spatio-temporal, meaning they are based on both the spatial and the temporal changes of the facial expressions.
    GeoTopo is a hybrid spatial descriptor which captures both the topological and the geometric information of the 3D face meshes over time, by concatenating two sub-descriptors, one for the topology and one for the geometry of each 3D face mesh. GeoTopo+ is a hybrid spatial descriptor and an improved version of GeoTopo; it uses two sub-descriptors to capture facial geometry and one to capture facial topology. The motivation behind the proposed hybrid spatial descriptors is that some facial expressions, such as happiness and surprise, are characterized by obvious changes in mouth topology, while others, such as anger, fear and sadness, produce geometric but no significant topological changes.
    DCT-GeoTopo is the first attempt at constructing a spatio-temporal descriptor for 3D mesh sequence facial expression retrieval. It first captures the topological information of the 3D facial expression sequence and then applies the Discrete Cosine Transform to this information, yielding the final spatio-temporal descriptor. WT-GeoTopo+ is a hybrid spatio-temporal descriptor which captures the geometric and topological information of the 3D meshes in a similar way to GeoTopo+; this spatial information is subsequently filtered using the Wavelet Transform, resulting in the final spatio-temporal descriptor. CVD is a spatio-temporal descriptor which exploits the depth information of the eight chosen facial landmarks. Finally, WT-CVD is an improved version of CVD, produced by applying the Wavelet Transform to the depth information of the extracted facial landmarks. The motivation behind the proposed spatio-temporal descriptors is that, in general, they are much more frugal in terms of space and time requirements than spatial descriptors. In addition, spatio-temporal descriptors are invariant to the number of 3D face meshes in a facial expression sequence.
    The descriptors developed in this dissertation are evaluated in terms of retrieval accuracy, using both quantitative and qualitative measures, through an extensive and consistent evaluation against state-of-the-art descriptors on standard datasets. This comparison illustrates the superiority of the proposed descriptors over the state-of-the-art ones. Furthermore, a technique which exploits the retrieval results to achieve unsupervised facial expression recognition from 3D mesh sequences is presented; it achieves better classification accuracy than supervised state-of-the-art dynamic 3D facial expression recognition techniques.
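
    Step (iii) of the retrieval scheme can be illustrated with a short Python sketch, assuming the descriptors have already been computed as fixed-length vectors (which holds for the spatio-temporal descriptors above). The plain Euclidean metric shown is only a placeholder for the dissertation's distance functions.

        import numpy as np

        def retrieve(query_desc, dataset_descs, metric=None):
            """Rank dataset mesh sequences by descriptor distance to the query."""
            if metric is None:
                metric = lambda a, b: np.linalg.norm(a - b)   # placeholder distance
            dists = np.array([metric(query_desc, d) for d in dataset_descs])
            ranking = np.argsort(dists)                       # retrieval list, best match first
            return ranking, dists[ranking]

        # Hypothetical 64-dimensional descriptors for a query and a small dataset.
        rng = np.random.default_rng(0)
        query = rng.normal(size=64)
        dataset = [rng.normal(size=64) for _ in range(10)]
        order, distances = retrieve(query, dataset)
        print(order[:3], distances[:3])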

    Image-based Somatotype as a Biometric Trait for Non-Collaborative Person Recognition at a Distance and On-The-Move

    It has recently been shown in Re-Identification (Re-ID) work that full-body images of people reveal their somatotype, even after a change of apparel. A significant advantage of this biometric trait is that it can easily be captured, even at a distance, as a full-body image of a person taken by a standard 2D camera. In this work, full-body image-based somatotype is investigated as a novel soft biometric feature for person recognition at a distance and on-the-move. The two common scenarios of (i) identification and (ii) verification are both studied and evaluated. To this end, two different deep networks have been employed, one for the identification and one for the verification scenario. Experiments have been conducted on popular, publicly available datasets, and the results indicate that somatotype can indeed be a valuable biometric trait for identity recognition at a distance and on-the-move (and hence also suitable for non-collaborative individuals), due to the ease of obtaining the required images. This soft biometric trait can be especially useful within a wider biometric fusion scheme.
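
    For the verification scenario, a minimal sketch of the general pattern (not the specific networks used in the paper) is to map each full-body image to an embedding and accept the identity claim when the embeddings are close enough. The ResNet-18 backbone, the preprocessing, the embedding size and the threshold below are assumptions; the file names are hypothetical.

        import torch
        import torch.nn.functional as F
        from torchvision import models, transforms
        from PIL import Image

        # Stand-in backbone producing a 512-D embedding per full-body image.
        backbone = models.resnet18(weights=None)
        backbone.fc = torch.nn.Identity()
        backbone.eval()

        preprocess = transforms.Compose([
            transforms.Resize((256, 128)),       # typical Re-ID-style aspect ratio
            transforms.ToTensor(),
        ])

        def embed(path: str) -> torch.Tensor:
            img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
            with torch.no_grad():
                return F.normalize(backbone(img), dim=1)

        def verify(path_a: str, path_b: str, threshold: float = 0.7) -> bool:
            """Accept the identity claim if cosine similarity exceeds the threshold."""
            return F.cosine_similarity(embed(path_a), embed(path_b)).item() > threshold

        # verify("probe_full_body.jpg", "enrolled_full_body.jpg")   # hypothetical file names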

    Action unit detection in 3D facial videos with application in facial expression retrieval and recognition

    This work introduces a new scheme for action unit (AU) detection in 3D facial videos. Sets of features that define action unit activation in a robust manner are proposed. These features are computed from eight facial landmarks detected on each facial mesh and involve angles, areas and distances. Support vector machine classifiers are then trained on these features to perform action unit detection. The proposed AU detection scheme is used in a dynamic 3D facial expression retrieval and recognition pipeline, highlighting the AUs that are most informative of facial expression while, at the same time, achieving better performance than state-of-the-art methodologies.
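
    The feature-plus-SVM idea can be sketched as follows in Python. This is illustrative only: the eight landmarks are assumed to be given as 3D points, and the particular distances, angle and area below are hypothetical examples of the kinds of measurements described, not the paper's exact feature set.

        import numpy as np
        from sklearn.svm import SVC

        def au_features(landmarks: np.ndarray) -> np.ndarray:
            """Distances, an angle and a triangle area from 8 detected 3D landmarks (8x3 array)."""
            d_mouth = np.linalg.norm(landmarks[0] - landmarks[1])      # e.g. mouth corners
            d_brows = np.linalg.norm(landmarks[2] - landmarks[3])      # e.g. inner eyebrows
            v1, v2 = landmarks[4] - landmarks[0], landmarks[5] - landmarks[0]
            cos_a = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
            angle = np.arccos(np.clip(cos_a, -1, 1))
            area = 0.5 * np.linalg.norm(np.cross(v1, v2))              # triangle area
            return np.array([d_mouth, d_brows, angle, area])

        # Hypothetical training data: one feature vector per facial mesh, one binary label per AU.
        rng = np.random.default_rng(1)
        X = np.stack([au_features(rng.normal(size=(8, 3))) for _ in range(40)])
        y = rng.integers(0, 2, size=40)                                # AU active / inactive
        detector = SVC(kernel="rbf").fit(X, y)
        print(detector.predict(X[:3]))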

    Blind image deconvolution using a banded matrix method

    In this paper we study the blind image deconvolution problem in the presence of noise and measurement errors. We use a stable approach based on banded matrices to robustly compute the greatest common divisor of two univariate polynomials, and we introduce the notion of the approximate greatest common divisor to encapsulate this approach for blind image restoration. Our method is analyzed with respect to its stability and complexity, leading to useful conclusions. It is proved that our approach has better complexity than the other known greatest-common-divisor-based blind image deconvolution techniques. Examples illustrating our procedures are given.
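
    The underlying principle can be shown in Python on a 1D toy example with exact integer coefficients and an exact symbolic GCD; this stands in for, but is not, the paper's banded-matrix approximate GCD, which is what handles noise and measurement errors. Two differently blurred signals share the true signal as the greatest common divisor of their z-transform polynomials.

        import numpy as np
        from sympy import symbols, Poly, gcd

        z = symbols("z")

        # 1D toy example: a "true" signal and two different blur kernels.
        f  = [1, 3, 2]          # true signal coefficients
        h1 = [1, 1]             # blur kernel 1
        h2 = [1, 2]             # blur kernel 2

        # Blurring is convolution, i.e. polynomial multiplication of the z-transforms.
        g1 = np.convolve(f, h1)
        g2 = np.convolve(f, h2)

        # The true signal is the greatest common divisor of the two blurred polynomials.
        P1 = Poly(g1.tolist(), z)
        P2 = Poly(g2.tolist(), z)
        recovered = gcd(P1, P2)
        print(recovered.all_coeffs())   # -> [1, 3, 2]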

    Cross-time registration of 3D point clouds

    Registration is a ubiquitous operation in visual computing and constitutes an important pre-processing step for operations such as 3D object reconstruction, retrieval and recognition. Particularly in cultural heritage (CH) applications, registration techniques are essential for the digitization and restoration pipelines. Cross-time registration is a special case in which the objects to be registered are instances of the same object after it has undergone processes such as erosion or restoration. Traditional registration techniques are inadequate for this problem, lacking the high accuracy required to detect minute changes, and some are extremely slow. A deep learning framework for cross-time registration is proposed which uses the DeepGMR network in combination with a novel down-sampling scheme. A dataset specifically designed for cross-time registration, called ECHO, is presented, and an extensive evaluation of state-of-the-art methods is conducted for the challenging case of cross-time registration.
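
    As background for what rigid registration computes here, the Python sketch below applies the classical Kabsch/Procrustes solution to two uniformly down-sampled point clouds with known point correspondences. It is a baseline illustration only, not the DeepGMR-based framework or the paper's down-sampling scheme, which do not assume known correspondences.

        import numpy as np

        def downsample(points: np.ndarray, step: int = 10) -> np.ndarray:
            """Naive uniform down-sampling by keeping every step-th point."""
            return points[::step]

        def kabsch(src: np.ndarray, dst: np.ndarray):
            """Rigid transform (R, t) aligning corresponding points src -> dst."""
            src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
            U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
            d = np.sign(np.linalg.det(Vt.T @ U.T))
            R = Vt.T @ np.diag([1, 1, d]) @ U.T
            t = dst.mean(0) - R @ src.mean(0)
            return R, t

        # Synthetic check: rotate and translate a random cloud, then recover the transform.
        rng = np.random.default_rng(2)
        cloud = rng.normal(size=(1000, 3))
        angle = np.deg2rad(30)
        R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                           [np.sin(angle),  np.cos(angle), 0],
                           [0, 0, 1]])
        moved = cloud @ R_true.T + np.array([0.1, -0.2, 0.3])
        R_est, t_est = kabsch(downsample(cloud), downsample(moved))
        print(np.allclose(R_est, R_true, atol=1e-6))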

    Survey of automated multiple sclerosis lesion segmentation techniques on magnetic resonance imaging

    Multiple sclerosis (MS) is a chronic disease. It affects the central nervous system and its clinical manifestations can vary. Magnetic Resonance Imaging (MRI) is often used to detect, characterize and quantify MS lesions in the brain, owing to the detailed structural information it can provide. Manual detection and measurement of MS lesions in MRI data is time-consuming, subjective and prone to errors. Therefore, multiple automated methodologies for MRI-based MS lesion segmentation have been proposed. Here, a review of the state-of-the-art automatic methods available in the literature is presented. The survey categorizes the existing methodologies in terms of their input data handling, their main segmentation strategy and their type of supervision. The strengths and weaknesses of each category are analyzed and explicitly discussed, and the positive and negative aspects of the methods are highlighted, pointing out future trends and thus possible promising directions for future research. In addition, a further clustering of the methods, based on the databases used for their evaluation, is provided; this clustering enables a reliable comparison among methods evaluated on the same databases. Despite the large number of methods that have emerged in the field, no commonly accepted methodology has yet been established in clinical practice. Future challenges, such as the simultaneous exploitation of more sophisticated MRI protocols and the hybridization of the most promising methods, are expected to further improve segmentation performance.
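
    Segmentation methods of this kind are commonly compared with overlap measures such as the Dice similarity coefficient. The short Python sketch below shows that metric on hypothetical binary lesion masks; it is generic background, not tied to any particular method in the survey.

        import numpy as np

        def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
            """Dice similarity coefficient between two binary lesion masks (1 = lesion)."""
            pred, truth = pred.astype(bool), truth.astype(bool)
            overlap = np.logical_and(pred, truth).sum()
            denom = pred.sum() + truth.sum()
            return 2.0 * overlap / denom if denom else 1.0   # both empty -> perfect agreement

        # Hypothetical 3D masks of the same MRI volume.
        rng = np.random.default_rng(3)
        truth = rng.random((32, 32, 16)) > 0.95
        pred = truth.copy()
        pred[0, 0, :] = ~pred[0, 0, :]                       # perturb a few voxels
        print(round(dice_coefficient(pred, truth), 3))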

    Effective Descriptors for Human Action Retrieval from 3D Mesh Sequences

    Two novel methods for fully unsupervised human action retrieval from 3D mesh sequences are presented. The first achieves high accuracy but is suitable for sequences consisting of clean meshes, such as artificial sequences or highly post-processed real sequences, while the second is robust and suitable for noisy meshes, such as those that often result from unprocessed scanning or 3D surface reconstruction errors. The first method uses a spatio-temporal descriptor based on the trajectories of 6 salient points of the human body (i.e. the centroid, the top of the head and the ends of the two upper and two lower limbs), from which a set of kinematic features is extracted. The resulting features are transformed using the wavelet transform at different scales, and a set of statistics is used to obtain the descriptor. An important characteristic of this descriptor is that its length is constant, independent of the number of frames in the sequence. The second descriptor consists of two complementary sub-descriptors, one based on the trajectory of the centroid of the human body across frames and the other based on the Hybrid static shape descriptor adapted for mesh sequences. The robustness of the second descriptor derives from the robustness of extracting the centroid and the Hybrid sub-descriptors. Performance figures on publicly available real and artificial datasets demonstrate our accuracy and robustness claims, and in most cases the results outperform the state-of-the-art.
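
    To illustrate the flavor of the first descriptor (a simplified Python sketch, not the exact feature set of the paper), the code below turns the trajectories of 6 salient points into per-frame speeds, applies a wavelet decomposition with PyWavelets, and keeps a few statistics per band so that the descriptor length does not depend on the number of frames. The choice of wavelet, decomposition level and statistics is an assumption.

        import numpy as np
        import pywt

        def action_descriptor(trajectories: np.ndarray, wavelet: str = "db2", level: int = 3) -> np.ndarray:
            """trajectories: (frames, 6, 3) positions of 6 salient body points.

            Returns a fixed-length vector regardless of the number of frames.
            """
            speeds = np.linalg.norm(np.diff(trajectories, axis=0), axis=2)   # (frames-1, 6)
            feats = []
            for point in range(speeds.shape[1]):
                coeffs = pywt.wavedec(speeds[:, point], wavelet, level=level)
                for band in coeffs:                                          # per-scale statistics
                    feats.extend([band.mean(), band.std(), np.abs(band).max()])
            return np.array(feats)

        # Two hypothetical sequences of different lengths yield descriptors of equal length.
        rng = np.random.default_rng(4)
        d1 = action_descriptor(rng.normal(size=(120, 6, 3)))
        d2 = action_descriptor(rng.normal(size=(80, 6, 3)))
        print(d1.shape == d2.shape, np.linalg.norm(d1 - d2))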

    Robust 3D Face Reconstruction Using One/Two Facial Images

    Being able to robustly reconstruct 3D faces from 2D images is a topic of pivotal importance for a variety of computer vision branches, such as face analysis and face recognition, whose applications are steadily growing. Unlike 2D facial images, 3D facial data are less affected by lighting conditions and pose. Recent advances in the computer vision field have enabled the use of convolutional neural networks (CNNs) for the production of 3D facial reconstructions from 2D facial images. This paper proposes a novel CNN-based method which targets 3D facial reconstruction from two facial images, one frontal and one from the side, as are often available to law enforcement agencies (LEAs). The proposed CNN was trained on both synthetic and real facial data. We show that the proposed network was able to predict 3D faces in the MICC Florence dataset with greater accuracy than the current state-of-the-art. Moreover, a scheme for using the proposed network in cases where only one facial image is available is also presented. This is achieved by introducing an additional network whose task is to generate a rotated version of the original image, which, in conjunction with the original facial image, makes up the image pair used for reconstruction via the previous method.
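
    A schematic of the two-image input idea could look as follows in Python; this is an illustrative architecture only, and the paper's actual network, training data and output parameterization are not reproduced here. Two shared-weight convolutional branches encode the frontal and side views, and a fully connected head regresses a fixed set of 3D vertex coordinates (the vertex count below is an assumption).

        import torch
        import torch.nn as nn

        class TwoViewFaceNet(nn.Module):
            """Encodes a frontal and a side facial image and regresses N 3D vertices."""
            def __init__(self, n_vertices: int = 5023):          # vertex count is an assumption
                super().__init__()
                self.encoder = nn.Sequential(                     # shared-weight branch
                    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                )
                self.head = nn.Sequential(
                    nn.Linear(2 * 64, 256), nn.ReLU(),
                    nn.Linear(256, n_vertices * 3),
                )
                self.n_vertices = n_vertices

            def forward(self, front: torch.Tensor, side: torch.Tensor) -> torch.Tensor:
                feat = torch.cat([self.encoder(front), self.encoder(side)], dim=1)
                return self.head(feat).view(-1, self.n_vertices, 3)

        model = TwoViewFaceNet()
        front = torch.randn(1, 3, 128, 128)                       # placeholder image tensors
        side = torch.randn(1, 3, 128, 128)
        print(model(front, side).shape)                           # -> torch.Size([1, 5023, 3])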