
    Automatic Alignment of 3D Multi-Sensor Point Clouds

    Automatic 3D point cloud alignment is a major research topic in photogrammetry, computer vision and computer graphics. In this research, two keypoint feature matching approaches are developed and proposed for the automatic alignment of 3D point clouds that have been acquired from different sensor platforms and lie in different 3D conformal coordinate systems. The first approach is based on 3D keypoint feature matching: surface curvature information is first utilized for scale-invariant 3D keypoint extraction, and adaptive non-maxima suppression (ANMS) is then applied to retain the most distinct and well-distributed set of keypoints. Afterwards, every keypoint is characterized by a scale-, rotation- and translation-invariant 3D surface descriptor, called the radial geodesic distance-slope histogram. Similar keypoint descriptors on the source and target datasets are then matched using bipartite graph matching, followed by a modified RANSAC for outlier removal. The second approach is based on 2D keypoint matching performed on height map images of the 3D point clouds. Height map images are generated by projecting the 3D point clouds onto a planimetric plane. A multi-scale wavelet 2D keypoint detector with ANMS is then proposed to extract keypoints on the height maps, and a scale-, rotation- and translation-invariant 2D descriptor, referred to as the Gabor, Log-Polar-Rapid Transform descriptor, is computed for all keypoints. Finally, source and target height map keypoint correspondences are determined using bi-directional nearest-neighbour matching, together with the modified RANSAC for outlier removal. Each method is assessed on multi-sensor, urban and non-urban 3D point cloud datasets. Results show that, unlike the 3D-based method, the height map-based approach is able to align source and target datasets that differ in point density, point distribution and missing point data. Findings also show that the 3D-based method obtains lower transformation errors and a greater number of correspondences when the source and target have similar point characteristics. The 3D-based approach attained absolute mean alignment differences in the range of 0.23 m to 2.81 m, whereas the height map approach ranged from 0.17 m to 1.21 m. These differences satisfy the proximity requirements of the data characteristics and allow the further application of fine co-registration approaches.
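
    Since the height map-based method hinges on rasterising each 3D point cloud into a 2D height image, a minimal sketch of that projection step may help. This is illustrative only: the max-z-per-cell rule and the cell_size parameter are assumptions, not details taken from the thesis.

```python
# Illustrative sketch (not the thesis code): rasterising a 3D point cloud
# into a height map by projecting it onto the horizontal (planimetric) plane.
import numpy as np

def height_map(points: np.ndarray, cell_size: float = 0.5) -> np.ndarray:
    """points: (N, 3) array of x, y, z coordinates."""
    xy_min = points[:, :2].min(axis=0)
    # Grid index of each point in the planimetric plane.
    ij = np.floor((points[:, :2] - xy_min) / cell_size).astype(int)
    shape = ij.max(axis=0) + 1
    hmap = np.full(shape, np.nan)            # NaN marks empty cells
    for (i, j), z in zip(ij, points[:, 2]):
        # Keep the highest z per cell (one common convention; an assumption here).
        if np.isnan(hmap[i, j]) or z > hmap[i, j]:
            hmap[i, j] = z
    return hmap
```

    Keypoint detection and descriptor matching would then operate on the resulting height map exactly as on any 2D image.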

    Angular Visual Hardness

    Recent convolutional neural networks (CNNs) have led to impressive performance but often suffer from poor calibration. They tend to be overconfident, with the model confidence not always reflecting the underlying true ambiguity and hardness. In this paper, we propose angular visual hardness (AVH), a score given by the normalized angular distance between the sample feature embedding and the target classifier, to measure sample hardness. We validate this score with an in-depth and extensive scientific study, and observe that CNN models with the highest accuracy also have the best AVH scores. This agrees with an earlier finding that state-of-the-art models improve on the classification of harder examples. We observe that the training dynamics of AVH are vastly different from those of the training loss. Specifically, AVH quickly reaches a plateau for all samples even though the training loss keeps improving. This suggests the need for designing better loss functions that can target harder examples more effectively. We also find that AVH has a statistically significant correlation with human visual hardness. Finally, we demonstrate the benefit of AVH in a variety of applications such as self-training for domain adaptation and domain generalization.
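
    As a concrete reading of this definition, the sketch below computes an AVH-style score from a feature embedding and a classifier weight matrix. Normalising the target angle by the sum of angular distances to all class weights follows the paper's published definition, but treat the details here as an assumption rather than the authors' code.

```python
# Sketch of an AVH-style score: the angular distance from a sample's feature
# embedding to its target class weight vector, normalised over all classes.
import numpy as np

def avh(feature: np.ndarray, class_weights: np.ndarray, target: int) -> float:
    """feature: (d,) embedding; class_weights: (C, d); target: true class index."""
    f = feature / np.linalg.norm(feature)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    angles = np.arccos(np.clip(w @ f, -1.0, 1.0))  # angular distance to each class
    return angles[target] / angles.sum()           # smaller = easier sample
```

    A lower score means the embedding sits angularly closer to its target class than to the alternatives, i.e. an easier sample.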

    Re-identifying people in the crowd

    Developing an automated surveillance system is of great interest for various reasons, including forensic and security applications. In the case of a network of surveillance cameras with non-overlapping fields of view, person detection and tracking alone are insufficient to track a subject of interest across the network. In this case, instances of a person captured in one camera view need to be retrieved among a gallery of different people in other camera views. This vision problem is commonly known as person re-identification (re-id). Cross-view instances of pedestrians exhibit varied levels of illumination, viewpoint and pose variation, which makes the problem very challenging. Despite recent progress towards improving accuracy, existing systems suffer from low applicability to real-world scenarios. This is mainly caused by the need for large amounts of annotated data from pairwise camera views to be available for training. Given the difficulty of obtaining and annotating such data, this thesis aims to bring the person re-id problem a step closer to real-world deployment. In the first contribution, the single-shot protocol, where each individual is represented by a pair of images that need to be matched, is considered. Following the extensive annotation of four datasets for six attributes, an evaluation of the most widely used feature extraction schemes is conducted. The results reveal two high-performing descriptors among those evaluated, and show illumination variation to have the most impact on re-id accuracy. Motivated by the wide availability of videos from surveillance cameras and the additional visual and temporal information they provide, video-based person re-id is then investigated, and a supervised system is developed. This is achieved by improving and extending the best performing image-based person descriptor into three dimensions and combining it with distance metric learning. The resulting system achieves state-of-the-art results on two widely used datasets. Given the cost and difficulty of obtaining labelled data from pairwise cameras in a network to train the model, an unsupervised video-based person re-id method is also developed. It is based on a set-based distance measure that leverages rank vectors to estimate the similarity scores between person tracklets. The proposed system outperforms other unsupervised methods by a large margin on two datasets while competing with deep learning methods on another large-scale dataset.
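
    To make the rank-vector idea concrete, here is a hypothetical sketch of a set-based similarity between two tracklets. The thesis's actual measure is not reproduced here; the mean-descriptor pooling, the shared reference gallery, and the use of Spearman correlation between rank vectors are all illustrative assumptions.

```python
# Hypothetical sketch of a rank-vector similarity between two tracklets
# (sets of per-frame descriptors): compare how each tracklet ranks a
# shared reference gallery, rather than comparing raw distances.
import numpy as np
from scipy.stats import spearmanr

def tracklet_similarity(t_a: np.ndarray, t_b: np.ndarray,
                        gallery: np.ndarray) -> float:
    """t_a, t_b: (n, d) per-frame descriptors; gallery: (G, d) reference set."""
    def rank_vector(tracklet):
        # Distance of the pooled (mean) descriptor to every gallery item.
        d = np.linalg.norm(gallery - tracklet.mean(axis=0), axis=1)
        return d.argsort().argsort()        # rank of each gallery item
    rho, _ = spearmanr(rank_vector(t_a), rank_vector(t_b))
    return rho  # higher = more similar ranking behaviour
```

    The appeal of comparing rank vectors rather than raw distances is that rankings are invariant to monotonic distortions of the distance scale across cameras.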

    The cerebellar predictions for social interactions: theory of mind abilities in patients with degenerative cerebellar atrophy

    Recent studies have focused on the role of the cerebellum in the social domain, including in Theory of Mind (ToM). ToM, or the "mentalizing" process, is the ability to attribute mental states, such as emotions, intentions and beliefs, to others in order to explain and predict their behavior. It is a fundamental aspect of social cognition and crucial for social interactions, together with more automatic mechanisms such as emotion contagion. Social cognition requires complex interactions between limbic and associative areas and subcortical structures, including the cerebellum. It has been hypothesized that the typical cerebellar role in adaptive control and predictive coding could also extend to social behavior. The present study aimed to investigate the social cognition abilities of patients with degenerative cerebellar atrophy to understand whether the cerebellum acts on specific ToM components, playing a role as a predictive structure. To this aim, a social cognition battery was administered to 27 patients with degenerative cerebellar pathology and 27 healthy controls. In addition, 3D T1-weighted and resting-state fMRI scans were collected to characterize the structural and functional changes in cerebello-cortical loops. The results showed that the patients were impaired in lower-level processes of immediate perception as well as at the more complex conceptual level of mentalization. Furthermore, they presented a pattern of grey matter (GM) reduction in cerebellar portions that are involved in the social domain, such as crus I-II, lobule IX and lobule VIIIa. These areas showed decreased functional connectivity with cerebral projection areas involved in specific aspects of social cognition. These findings support the idea that the cerebellar modulatory function on the cortical projection areas underlies the social cognition process at different levels. In particular, for the lower-level processes, the cerebellum may act by implicitly matching external information (i.e., the expression of the eyes) with the respective internal representation to guarantee an immediate judgment about the mental state of others. At the more complex conceptual level, by contrast, the cerebellum seems to be involved in the construction of internal models of mental processes during social interactions in which the prediction of sequential events plays a role, allowing us to anticipate the other person's behavior.

    Investigating information processing within the brain using multi-electrode array (MEA) electrophysiology data

    How a stimulus, such as an odour, is represented in the brain is one of the main questions in neuroscience. It is becoming clearer that information is encoded by a population of neurons, but how the spiking activity of a population of neurons conveys this information is unknown. Several population coding hypotheses have been formulated over the years, and therefore, to obtain a more definitive answer as to how a population of neurons represents stimulus information, we need to test, i.e. support or falsify, each of the hypotheses. One way of addressing these hypotheses is to record and analyse the activity of multiple individual neurons from the brain of a test subject when a stimulus is, and is not, presented. With the advent of multi-electrode arrays (MEAs) we can now record such activity. However, before we can investigate these population coding hypotheses using such recordings, we need to determine the number of neurons recorded by the MEA and their spiking activity, after spike detection, using an automatic spike sorting algorithm (we refer to the spiking activity of the neurons extracted from the MEA recordings as MEA sorted data). While there are many automatic spike sorting methods available, they have limitations. In addition, we lack methods to test and investigate the population coding hypotheses in detail using the MEA sorted data; that is, methods that show whether neurons respond in a hypothesised way and, if they do, show how the stimulus is represented within the recorded area. Thus, in this thesis, we were motivated, firstly, to develop a new automatic spike sorting method that avoids the limitations of other methods. We validated our method using simulated and biological data, and found that it can perform better than other standard methods. We next focused on the population rate coding hypothesis (i.e. the hypothesis that information is conveyed in the number of spikes fired by a population of neurons within a relevant time period). More specifically, we developed a method for testing and investigating the population rate coding hypothesis using the MEA sorted data: a method that uses the multivariate analysis of variance (MANOVA) test, with a modified output, to show the most responsive subareas within the recorded area. We validated this using simulated and biological data. Finally, we investigated whether noise correlation between neurons (i.e. correlations in the trial-to-trial variability of the response of neurons to the same stimulus) in a rat's olfactory bulb can affect the amount of information a population rate code conveys about a set of stimuli. We found that noise correlation between neurons was predominantly positive, which ultimately reduced the amount of information a population containing >45 neurons could convey about the stimuli by ~30%.
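
    The noise-correlation analysis mentioned at the end has a standard form: for repeated presentations of the same stimulus, subtract each neuron's mean (stimulus-evoked) response and correlate the residuals across trials. A minimal sketch follows, assuming spike counts are already binned per trial; the function name and array layout are illustrative, not taken from the thesis.

```python
# Sketch of pairwise noise correlations: correlate the trial-to-trial
# variability of neurons' responses to the same stimulus, after removing
# each neuron's mean (signal) response.
import numpy as np

def noise_correlations(counts: np.ndarray) -> np.ndarray:
    """counts: (trials, neurons) spike counts for repeated presentations of
    one stimulus. Returns the (neurons, neurons) correlation matrix of the
    residuals around each neuron's mean response."""
    residuals = counts - counts.mean(axis=0)   # subtract stimulus-evoked mean
    return np.corrcoef(residuals, rowvar=False)
```

    With several stimuli, one common choice is to z-score the counts within each stimulus condition before pooling trials, so that signal correlations do not leak into the estimate.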

    Investigating the function of the ventral visual reading pathway and its involvement in acquired reading disorders

    This thesis investigated the role of the left ventral occipitotemporal (vOT) cortex and how damage to this area causes peripheral reading disorders. Functional magnetic resonance imaging (fMRI) studies in volunteers demonstrated that the left vOT is activated more by written words than by numbers or perceptually-matched baselines, irrespective of the word's location in the visual field. Mixed results were observed for the comparison of words versus false-font stimuli. This response profile suggests that the left vOT is preferentially activated by words or word-like stimuli, due to either: (1) bottom-up specialisation for processing familiar word-forms; (2) top-down task-dependent modulation; or (3) a combination of the two. Further studies are proposed to discriminate between these possibilities. Thirteen patients with left occipitotemporal damage participated in the rehabilitation and fMRI studies. The patients were impaired on word, text and letter reading. A structural analysis showed that damage to the left occipitotemporal white matter, in the vicinity of the inferior longitudinal fasciculus, was associated with slow word reading speed. The fMRI study showed that the patients had reduced activation of the bilateral posterior superior temporal sulci relative to controls. Activity in this area correlated with reading speed. The efficacy of intensive whole-word recognition training was tested. Immediately after the training, trained words were read faster than untrained words, but the effects did not persist to the follow-up assessment. Hence, damage to the left vOT white matter impairs rapid whole-word recognition and is resistant to rehabilitation. The final study investigated the role of spatial frequency (SF) in the lateralisation of vOT function. Lateralisation of high and low SF processing was demonstrated, concordant with the lateralisation of words and faces to the left and right vOT respectively. A perceptual basis for the organisation of vOT cortex might explain why left vOT damage is resistant to treatment.

    Object Recognition

    Vision-based object recognition tasks are very familiar in our everyday activities, such as driving our car in the correct lane, and we perform them effortlessly in real time. In recent decades, with the advancement of computer technology, researchers and application developers have been trying to mimic the human capability of visual recognition. Such a capability would allow machines to free humans from tedious or dangerous jobs.