
    Medical Image Segmentation Based on Multi-Modal Convolutional Neural Network: Study on Image Fusion Schemes

    Image analysis using more than one modality (i.e. multi-modal analysis) has been increasingly applied in the field of biomedical imaging. One of the challenges in performing multi-modal analysis is that there exist multiple schemes for fusing the information from different modalities; such schemes are application-dependent and lack a unified framework to guide their design. In this work we first propose a conceptual architecture for the image fusion schemes in supervised biomedical image analysis: fusing at the feature level, fusing at the classifier level, and fusing at the decision-making level. Further, motivated by the recent success in applying deep learning to natural image analysis, we implement the three image fusion schemes above based on Convolutional Neural Networks (CNNs) with varied structures, and combine them into a single framework. The proposed image segmentation framework is capable of analyzing multi-modality images using different fusion schemes simultaneously. The framework is applied to detect the presence of soft tissue sarcoma from the combination of Magnetic Resonance Imaging (MRI), Computed Tomography (CT) and Positron Emission Tomography (PET) images. The results show that while all the fusion schemes outperform the single-modality schemes, fusing at the feature level generally achieves the best performance in terms of both accuracy and computational cost, but also suffers from decreased robustness in the presence of large errors in any of the image modalities. Comment: Zhe Guo and Xiang Li contribute equally to this wor
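The three fusion levels named in the abstract above can be illustrated with a toy sketch. This is not the authors' CNN framework: the linear-plus-ReLU `extract` function, the logistic `classify` head, the random weights, the patch sizes and the majority-vote rule are all assumptions chosen only to show where in the pipeline the modalities are combined.

```python
import math
import random

random.seed(0)  # fixed seed so the toy weights are reproducible

MODALITIES = ("CT", "MRI", "PET")
N_PIX, N_FEAT = 16, 4  # hypothetical flattened patch size and feature width


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters (assumption)."""
    return [[random.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


def extract(x, w):
    """Toy stand-in for convolutional layers: linear map followed by ReLU."""
    return [max(0.0, sum(xi * wi for xi, wi in zip(x, col))) for col in zip(*w)]


def classify(f, v):
    """Toy stand-in for a classifier head: logistic unit returning a probability."""
    return 1.0 / (1.0 + math.exp(-sum(fi * vi for fi, vi in zip(f, v))))


def feature_level(patches, w, v):
    # Stack all modalities into one input, then one extractor and one classifier.
    stacked = [p for m in MODALITIES for p in patches[m]]
    return classify(extract(stacked, w), v)


def classifier_level(patches, ws, v):
    # One extractor per modality; the concatenated features feed one classifier.
    feats = [f for m in MODALITIES for f in extract(patches[m], ws[m])]
    return classify(feats, v)


def decision_level(patches, ws, vs):
    # One extractor and one classifier per modality; labels fused by majority vote.
    votes = [classify(extract(patches[m], ws[m]), vs[m]) > 0.5 for m in MODALITIES]
    return int(2 * sum(votes) > len(votes))


# Hypothetical co-registered patches, one per modality.
patches = {m: [random.gauss(0.0, 1.0) for _ in range(N_PIX)] for m in MODALITIES}
ws = {m: rand_matrix(N_PIX, N_FEAT) for m in MODALITIES}
vs = {m: [random.gauss(0.0, 0.3) for _ in range(N_FEAT)] for m in MODALITIES}

p_feature = feature_level(
    patches,
    rand_matrix(len(MODALITIES) * N_PIX, N_FEAT),
    [random.gauss(0.0, 0.3) for _ in range(N_FEAT)],
)
p_classifier = classifier_level(
    patches, ws, [random.gauss(0.0, 0.3) for _ in range(len(MODALITIES) * N_FEAT)]
)
label_decision = decision_level(patches, ws, vs)
print(p_feature, p_classifier, label_decision)
```

In this sketch, a corrupted modality reaches the single shared extractor directly in `feature_level`, whereas in `decision_level` it can only change one of three votes, which is one way to read the robustness trade-off the abstract reports.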

    Pedestrian Trajectory Prediction with Structured Memory Hierarchies

    This paper presents a novel framework for human trajectory prediction based on multimodal data (video and radar). Motivated by recent neuroscience discoveries, we propose incorporating a structured memory component into the human trajectory prediction pipeline to capture historical information and improve performance. We introduce structured LSTM cells for modelling the memory content hierarchically, preserving the spatiotemporal structure of the information and enabling us to capture both short-term and long-term context. We demonstrate how this architecture can be extended to integrate salient information from multiple modalities, automatically storing and retrieving important information for decision making without any supervision. We evaluate the effectiveness of the proposed models on a novel multimodal dataset that we introduce, consisting of 40,000 pedestrian trajectories acquired jointly from a radar system and a CCTV camera system installed in a public place. The performance is also evaluated on the publicly available New York Grand Central pedestrian database. In both settings, the proposed models demonstrate their capability to better anticipate future pedestrian motion compared to the existing state of the art. Comment: To appear in ECML-PKDD 201

    An evaluation of a three-modal hand-based database to forensic-based gender recognition

    In recent years, behavioural soft biometrics have been widely used to improve the performance of biometric systems. Information such as gender, age and ethnicity can be obtained from more than one behavioural modality. In this paper, we propose a multimodal hand-based behavioural database for gender recognition, and our goal is to evaluate its performance. The experiment was carried out with 76 users, from whom keyboard dynamics, touchscreen dynamics and handwritten signature data were collected. Our approach consists of comparing the one-modal and two-modal subsets of the biometric data with the full multimodal database. Traditional and new classifiers were used, and the Kruskal-Wallis statistical test was applied to analyse the accuracy of the databases. The results showed that the multimodal database outperforms the other databases.
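The Kruskal-Wallis comparison mentioned in the abstract above can be sketched as follows. This is a minimal pure-Python version of the standard rank-based H statistic with tie correction; the accuracy values are made-up placeholders, not results from the paper, and the closed-form p-value used here holds only for three groups (two degrees of freedom).

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return the tie-corrected H statistic and the degrees of freedom."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum of this group
        h += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tie groups.
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n)), len(groups) - 1


# Placeholder per-fold accuracies (invented for illustration only).
one_modal = [0.61, 0.64, 0.66]
two_modal = [0.70, 0.72, 0.74]
multimodal = [0.78, 0.80, 0.83]

h_stat, dof = kruskal_wallis([one_modal, two_modal, multimodal])
# With 2 degrees of freedom the chi-square survival function is exp(-H / 2).
p_value = math.exp(-h_stat / 2)
print(h_stat, dof, p_value)
```

A small p-value indicates that at least one database's accuracy distribution differs from the others; it does not by itself say which one, so a post-hoc pairwise comparison would still be needed.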

    On Acquisition and Analysis of a Dataset Comprising of Gait, Ear and Semantic data

    In outdoor scenarios such as surveillance, where there is very little control over the environment, complex computer vision algorithms are often required for analysis. However, constrained environments, such as walkways in airports where the surroundings and the path taken by individuals can be controlled, provide an ideal application for such systems. Figure 1.1 depicts an idealised constrained environment. The subject is restricted to a narrow path and, once inside, is in a volume where lighting and other conditions are controlled to facilitate biometric analysis. The ability to control the surroundings and the flow of people greatly simplifies the computer vision task, compared to typical unconstrained environments. Even though biometric datasets with more than one hundred people are increasingly common, there is still very little known about the inter- and intra-subject variation in many biometrics. This information is essential to estimate the recognition capability and limits of automatic recognition systems. In order to accurately estimate the inter- and intra-class variance, substantially larger datasets are required [40]. Covariates such as facial expression, headwear, footwear type, surface type and carried items are attracting increasing attention; however, considering their potentially large impact on an individual's biometrics, large trials need to be conducted to establish how much variance results. This chapter is the first description of the multibiometric data acquired using the University of Southampton's Multi-Biometric Tunnel [26, 37], a biometric portal using automatic gait, face and ear recognition for identification purposes. The tunnel provides a constrained environment and is ideal for use in high-throughput security scenarios and for the collection of large datasets.
We describe the current state of data acquisition of face, gait, ear, and semantic data and present early results showing the quality and range of data that has been collected. The main novelties of this dataset in comparison with other multi-biometric datasets are: 1. gait data exists for multiple views and is synchronised, allowing 3D reconstruction and analysis; 2. the face data is a sequence of images, allowing for face recognition in video; 3. the ear data is acquired in a relatively unconstrained environment, as a subject walks past; and 4. the semantic data is considerably more extensive than has been available previously. We aim to show the advantages of this new data in biometric analysis, though the scope for such analysis is considerably greater than time and space allow for here.

    Effectiveness of Multi-View Face Images and Anthropometric Data In Real-Time Networked Biometrics

    Over the years, biometric systems have evolved into a reliable mechanism for establishing the identity of individuals in the context of applications such as access control, personnel screening and criminal identification. However, recent terror attacks, security threats and intrusion attempts have necessitated a transition to modern biometric systems that can identify humans in unconstrained environments, in real time. Specifically, the following three critical transitions are needed and form the focus of this thesis: (1) In contrast to operation in an offline mode using previously acquired photographs and videos obtained under controlled environments, identification must be performed in a real-time dynamic mode using images that are continuously streaming in, each from a potentially different view (front, profile, partial profile) and with different quality (pose and resolution). (2) While different multi-modal fusion techniques have been developed to improve system accuracy, these techniques have mainly focused on combining face biometrics with modalities such as iris and fingerprints that are more reliable but require user cooperation for acquisition. In contrast, the challenge in a real-time networked biometric system is that of combining opportunistically captured multi-view facial images with soft biometric traits such as height, gait, attire and color that do not require user cooperation.
(3) Typical operation is expected to be in an open-set mode where the number of subjects enrolled in the system is much smaller than the number of probe subjects; yet the system is required to achieve high accuracy. To address these challenges and to make a successful transition to real-time human identification systems, this thesis makes the following contributions: (1) A score-based multi-modal, multi-sample fusion technique is designed to combine face images acquired by a multi-camera network, and the effectiveness of opportunistically acquired multi-view face images in improving the identification performance is characterized. (2) The multi-view face acquisition system is complemented by a network of Microsoft Kinects for extracting human anthropometric features (specifically height, shoulder width and arm length). The score-fusion technique is augmented to utilize human anthropometric data, and the effectiveness of this data is characterized. (3) The performance of the system is demonstrated using a database of 51 subjects collected using the networked biometric data acquisition system. Our results show improved recognition accuracy when face information from multiple views is utilized for recognition, and also indicate that a given level of accuracy can be attained with fewer probe images (less time) when compared with a uni-modal biometric system.
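As a rough illustration of score-level fusion with open-set rejection, the sketch below normalises each matcher's scores and combines them with a weighted sum. The abstract does not specify the thesis's actual fusion rule, so min-max normalisation, the weights, the threshold, the subject labels and all score values here are assumptions.

```python
def min_max(scores):
    """Rescale one matcher's scores to [0, 1] so matchers are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {subject: 0.0 for subject in scores}
    return {subject: (s - lo) / (hi - lo) for subject, s in scores.items()}


def fuse(score_sets, weights):
    """Weighted-sum fusion of normalised scores across views and modalities."""
    total = sum(weights)
    fused = {}
    for scores, w in zip(score_sets, weights):
        for subject, s in min_max(scores).items():
            fused[subject] = fused.get(subject, 0.0) + w * s / total
    return fused


def identify(fused, threshold):
    """Open-set decision: return the best match, or None if it is too weak."""
    best = max(fused, key=fused.get)
    return best if fused[best] >= threshold else None


# Hypothetical similarity scores of one probe against three enrolled subjects.
frontal_face = {"A": 0.9, "B": 0.4, "C": 0.1}
profile_face = {"A": 0.5, "B": 0.6, "C": 0.2}
height_match = {"A": 0.8, "B": 0.3, "C": 0.3}

fused = fuse([frontal_face, profile_face, height_match], weights=[0.5, 0.3, 0.2])
print(identify(fused, threshold=0.6))
```

Raising the threshold makes the system reject more probes as not enrolled, which matters in the open-set setting described above, where most probe subjects are not in the gallery.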