
    3D human action recognition in multiple view scenarios

    This paper presents a novel view-independent approach to recognising the gestures of several people in low-resolution sequences from multiple calibrated cameras. In contrast to other multi-ocular gesture recognition systems, which classify a fusion of features drawn from the different views, our system first performs data fusion (a 3D representation of the scene) and then feature extraction and classification. The motion descriptors introduced by Bobick et al. for 2D data are extended to 3D, and a set of features based on 3D invariant statistical moments is computed. Finally, a Bayesian classifier performs recognition over a small set of actions. Results demonstrate the effectiveness of the proposed algorithm in a SmartRoom scenario. Peer Reviewed. Postprint (published version).
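The abstract describes extending Bobick et al.'s 2D motion history descriptors to 3D and computing moment-based features. As a rough illustration (not the authors' implementation; the decay constant `tau` and the moment orders are assumed values), a motion history volume update and a 3D central moment could be sketched as:

```python
import numpy as np

def update_mhv(mhv, occupancy, tau=10):
    """One update step of a 3D motion history volume: voxels that
    moved this frame are set to tau, all others decay by one."""
    return np.where(occupancy, float(tau), np.maximum(mhv - 1.0, 0.0))

def central_moment(volume, p, q, r):
    """3D central moment mu_pqr of a voxel volume, a building block
    for moment-invariant features."""
    zs, ys, xs = np.indices(volume.shape)
    m = volume.sum()
    cz = (zs * volume).sum() / m
    cy = (ys * volume).sum() / m
    cx = (xs * volume).sum() / m
    return ((zs - cz) ** p * (ys - cy) ** q * (xs - cx) ** r * volume).sum()

# Toy example: a small moving blob in an 8x8x8 voxel grid.
mhv = np.zeros((8, 8, 8))
occ = np.zeros((8, 8, 8), dtype=bool)
occ[2:4, 2:4, 2:4] = True
mhv = update_mhv(mhv, occ)
mu200 = central_moment(mhv, 2, 0, 0)
```

In practice such raw moments would be combined into rotation-invariant functions before feeding the Bayesian classifier.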

    An Empirical Study On Sampling Approaches For 3D Image Classification Using Deep Learning

    A 3D classification method requires more training data than a 2D image classification method to achieve good performance. These training data usually take the form of multiple 2D images (e.g., slices in a CT scan) or point clouds (e.g., 3D CAD models) representing volumetric objects. The additional data required for this higher-dimensional problem comes at the cost of more processing time and space, which can be mitigated by reducing the data size (i.e., sampling). In this thesis, we empirically study and compare the classification performance and deep learning training time of PointNet using uniform random sampling and farthest point sampling, SampleNet, which uses a reduction approach based on a weighted average of nearest-neighbour points, and the Multi-view Convolutional Neural Network (MVCNN). Contrary to recent research claiming that SampleNet outright outperforms the simpler sampling approaches used by PointNet, our experimental results show that SampleNet may not significantly reduce processing time while achieving poorer classification performance. Additionally, reducing the resolution of the views in MVCNN yields poor accuracy compared with reducing the number of views. Our results show that the simple sampling approaches used by PointNet, as well as simple view reduction in a multi-view classifier, can maintain accuracy while decreasing processing time for the 3D classification task.
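Farthest point sampling, one of the PointNet sampling strategies compared in the thesis, greedily selects the point furthest from those already chosen. A minimal NumPy sketch (the array sizes are illustrative, not the thesis's settings):

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy farthest point sampling: start from a random point and
    repeatedly add the point with the largest distance to the set
    already selected."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = [int(rng.integers(n))]
    # Distance from every point to the nearest chosen point so far.
    dists = np.linalg.norm(points - points[chosen[0]], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dists))
        chosen.append(idx)
        dists = np.minimum(dists, np.linalg.norm(points - points[idx], axis=1))
    return points[chosen]

cloud = np.random.default_rng(1).normal(size=(2048, 3))
sampled = farthest_point_sampling(cloud, 256)
print(sampled.shape)  # (256, 3)
```

Uniform random sampling, by contrast, is a single `rng.choice` over the point indices, which is cheaper but tends to cover the shape less evenly.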

    Compressive Synthetic Aperture Sonar Imaging with Distributed Optimization

    Synthetic aperture sonar (SAS) provides high-resolution acoustic imaging by processing coherently the backscattered acoustic signal recorded over consecutive pings. Traditionally, object detection and classification tasks rely on high-resolution seafloor mapping achieved with widebeam, broadband SAS systems. However, aspect- or frequency-specific information is crucial for improving the performance of automatic target recognition algorithms. For example, low frequencies can be partly transmitted through objects or penetrate the seafloor providing information about internal structure and buried objects, while multiple views provide information about the object's shape and dimensions. Sub-band and limited-view processing, though, degrades the SAS resolution. In this paper, SAS imaging is formulated as an l1-norm regularized least-squares optimization problem which improves the resolution by promoting a parsimonious representation of the data. The optimization problem is solved in a distributed and computationally efficient way with an algorithm based on the alternating direction method of multipliers. The resulting SAS image is the consensus outcome of collaborative filtering of the data from each ping. The potential of the proposed method for high-resolution, narrowband, and limited-aspect SAS imaging is demonstrated with simulated and experimental data.
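The paper's distributed per-ping solver is based on the alternating direction method of multipliers (ADMM). A minimal, centralised sketch of ADMM for the generic l1-regularized least-squares problem (the values of `lam` and `rho` are illustrative, not the paper's tuning):

```python
import numpy as np

def lasso_admm(A, b, lam=0.1, rho=1.0, iters=200):
    """ADMM for min_x 0.5*||Ax - b||^2 + lam*||x||_1.
    A generic centralised sketch, not the paper's distributed
    per-ping variant."""
    m, n = A.shape
    AtA = A.T @ A
    Atb = A.T @ b
    # Factor (A^T A + rho*I) once; reused in every x-update.
    L = np.linalg.cholesky(AtA + rho * np.eye(n))
    z = np.zeros(n)
    u = np.zeros(n)
    for _ in range(iters):
        # x-update: ridge-regularised least squares.
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        # z-update: soft thresholding (proximal step for the l1 term).
        w = x + u
        z = np.sign(w) * np.maximum(np.abs(w) - lam / rho, 0.0)
        # Dual update.
        u = u + x - z
    return z

# Noise-free sparse recovery demo.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 30))
x_true = np.zeros(30)
x_true[[3, 7, 20]] = [1.5, -2.0, 1.0]
x_hat = lasso_admm(A, A @ x_true, lam=0.1)
```

In the distributed setting described in the abstract, each ping would hold its own data block and the z-update would enforce consensus across pings.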

    A Comparative Assessment of Multi-view fusion learning for Crop Classification

    With a rapidly increasing amount and diversity of remote sensing (RS) data sources, there is a strong need for multi-view learning modeling. This is a complex task when considering the differences in resolution, magnitude, and noise of RS data. The typical approach for merging multiple RS sources has been input-level fusion, but other, more advanced, fusion strategies may outperform this traditional approach. This work assesses different fusion strategies for crop classification in the CropHarvest dataset. The fusion methods proposed in this work outperform models based on individual views and previous fusion methods. We do not find one single fusion method that consistently outperforms all other approaches. Instead, we present a comparison of multi-view fusion methods for three different datasets and show that, depending on the test region, different methods obtain the best performance. Despite this, we suggest a preliminary criterion for the selection of fusion methods. Comment: Accepted at IEEE International Geoscience and Remote Sensing Symposium 202

    Fast, collaborative acquisition of multi-view face images using a camera network and its impact on real-time human identification

    Biometric systems have typically been designed to operate in controlled environments using previously acquired photographs and videos. However, recent terror attacks, security threats and intrusion attempts have necessitated a transition to modern biometric systems that can identify humans in real time in unconstrained environments. Distributed camera networks are appropriate for unconstrained scenarios because they can provide multiple views of a scene, thus offering tolerance to the variable pose of a human subject and possible occlusions. In dynamic environments, face images continually arrive at the base station with differing quality, pose and resolution, which makes designing a fusion strategy challenging. Such a scenario demands that only the relevant information is processed and that the verdict (match / no match) regarding a particular subject is released quickly, yet accurately, so that more subjects in the scene can be evaluated. To address this, we designed a wireless data acquisition system capable of acquiring multi-view faces accurately and at a rapid rate. Epipolar geometry is exploited to achieve high multi-view face detection rates. Face images are labeled with their corresponding poses and transmitted to the base station. To evaluate the impact of face images acquired with our real-time acquisition system on overall recognition accuracy, we interface it with a face matching subsystem, creating a prototype real-time multi-view face recognition system. For frontal face matching, we use the commercial PittPatt software; for non-frontal matching, a Local Binary Pattern (LBP) based classifier. Matching scores from frontal and non-frontal face images are fused for the final classification. Our results show a significant improvement in recognition accuracy, especially when the frontal face images are of low resolution.
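The frontal/non-frontal score fusion can be illustrated with a simple weighted-sum rule; the weight and the min-max normalisation here are assumptions for illustration, not necessarily the authors' exact scheme:

```python
import numpy as np

def fuse_scores(frontal, profile, w=0.6):
    """Weighted-sum fusion of per-gallery match scores after min-max
    normalisation. w is an assumed weight on the frontal matcher."""
    def norm(s):
        s = np.asarray(s, dtype=float)
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else np.zeros_like(s)
    fused = w * norm(frontal) + (1 - w) * norm(profile)
    return int(np.argmax(fused)), fused

# Scores for three gallery identities from two matchers
# on incompatible scales (hence the normalisation).
best, fused = fuse_scores([0.2, 0.9, 0.4], [10, 80, 95])
print(best)  # 1
```

Min-max normalisation matters here because commercial matchers (like PittPatt) and LBP distances report scores on different, incompatible scales.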

    Multi-View Region Adaptive Multi-temporal DMM and RGB Action Recognition

    Human action recognition remains an important yet challenging task. This work proposes a novel action recognition system based on a Multiple View Region Adaptive Multi-resolution-in-time Depth Motion Map (MV-RAMDMM) formulation combined with appearance information. Multiple-stream 3D Convolutional Neural Networks (CNNs) are trained on the different views and time resolutions of the region-adaptive Depth Motion Maps. Multiple views are synthesised to enhance view invariance. The region-adaptive weights, based on localised motion, accentuate and differentiate the parts of actions with faster motion. Dedicated 3D CNN streams for multi-time-resolution appearance (RGB) information are also included; these help to identify and differentiate between small object interactions. A pre-trained 3D CNN is fine-tuned for each stream, along with multi-class Support Vector Machines (SVMs), and average score fusion is applied to the outputs. The developed approach is capable of recognising both human actions and human-object interactions. Three public-domain datasets, MSR 3D Action, Northwestern-UCLA Multi-view Actions and MSR 3D Daily Activity, are used to evaluate the proposed solution. The experimental results demonstrate the robustness of this approach compared with state-of-the-art algorithms. Comment: 14 pages, 6 figures, 13 tables. Submitted

    PAMOGK: A pathway graph kernel based multi-omics clustering approach for discovering cancer patient subgroups

    Accurate classification of patients into homogeneous molecular subgroups is critical for the development of effective therapeutics and for deciphering what drives these different subtypes to cancer. However, the extensive molecular heterogeneity observed among cancer patients presents a challenge. The availability of multi-omic data catalogs for large cohorts of cancer patients provides multiple views into the molecular biology of the tumors with unprecedented resolution. In this work, we develop PAMOGK, which integrates multi-omics patient data and incorporates the existing knowledge on biological pathways. PAMOGK is well suited to deal with the sparsity of alterations in assessing patient similarities. We develop a novel graph kernel which we denote as smoothed shortest path graph kernel, which evaluates patient similarities based on a single molecular alteration type in the context of pathway. To corroborate multiple views of patients evaluated by hundreds of pathways and molecular alteration combinations, PAMOGK uses multi-view kernel clustering. We apply PAMOGK to find subgroups of kidney renal clear cell carcinoma (KIRC) patients, which results in four clusters with significantly different survival times (p-value = 7.4e-10). The patient subgroups also differ with respect to other clinical parameters such as tumor stage and grade, and primary tumor and metastasis tumor spreads. When we compare PAMOGK to 8 other state-of-the-art existing multi-omics clustering methods, PAMOGK consistently outperforms these in terms of its ability to partition patients into groups with different survival distributions. PAMOGK enables extracting the relative importance of pathways and molecular data types. PAMOGK is available at github.com/tastanlab/pamog
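PAMOGK's multi-view kernel clustering combines hundreds of per-view patient-similarity kernels. As a simplified stand-in (the normalisation-and-averaging choice here is an assumption for illustration, not PAMOGK's actual combination method), merging a few views might look like:

```python
import numpy as np

def average_kernels(kernels):
    """Cosine-normalise each view's kernel matrix to unit diagonal,
    then average them into one combined patient-similarity kernel."""
    combined = np.zeros_like(kernels[0], dtype=float)
    for K in kernels:
        d = np.sqrt(np.diag(K))
        combined += K / np.outer(d, d)
    return combined / len(kernels)

# Toy example: 5 patients, 3 views (e.g. one kernel per
# pathway / molecular alteration combination).
rng = np.random.default_rng(0)
views = []
for _ in range(3):
    X = rng.normal(size=(5, 4))   # per-view patient features
    views.append(X @ X.T)         # a valid PSD kernel per view
K = average_kernels(views)
```

The combined kernel could then feed any kernel-based clustering method (e.g. kernel k-means) to partition patients into subgroups.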

    Quantifying the effect of aerial imagery resolution in automated hydromorphological river characterisation

    Existing regulatory frameworks aiming to improve the quality of rivers place hydromorphology as a key factor in the assessment of hydrology, morphology and river continuity. The majority of available methods for hydromorphological characterisation rely on the identification of homogeneous areas (i.e., features) of flow, vegetation and substrate. For that purpose, aerial imagery is used to identify existing features through either visual observation or automated classification techniques. There is evidence to suggest that success in feature identification depends on the resolution of the imagery used; however, little effort has yet been made to quantify the uncertainty in feature identification associated with that resolution. This paper contributes to addressing this gap by contrasting results in automated hydromorphological feature identification from unmanned aerial vehicle (UAV) imagery captured at three resolutions (2.5 cm, 5 cm and 10 cm) along a 1.4 km river reach. The results show that resolution plays a key role in the accuracy and variety of features identified, with larger identification errors observed for riffles and side bars. This in turn has an impact on the ecological characterisation of the river reach. The research shows that UAV technology could be essential for unbiased hydromorphological assessment.