
    Single view depth estimation from train images

    Depth prediction is the task of computing the distance of different points in the scene from the camera. Knowing how far away a given object is from the camera makes it possible to understand its spatial representation. Early methods used stereo pairs of images to extract depth, which requires a calibrated pair of cameras. It is simpler to work from a single image, since no calibration or synchronization is needed; for this reason, learning-based methods, which estimate depth from monocular images, have been introduced. Early learning-based solutions used ground-truth depth for training, usually acquired from sensors such as Kinect or Lidar. Acquiring ground-truth depth is expensive and difficult, which is why self-supervised methods, which require no such ground truth, have appeared and have shown promising results for single-image depth estimation. In this work, we propose to estimate depth maps for images taken from the train driver's viewpoint. To do so, we use geometric constraints and the standard parameters of rail tracks to extract the depth map between the rails, and provide it as a supervisory signal to the network. To this end, we first gathered a dataset of train sequences and determined their focal lengths to compute the depth map between the rails. We then used this dataset and the computed focal lengths to fine-tune an existing model, Monodepth2, previously trained on the KITTI dataset. We show that the ground-truth depth map provided to the network solves the problem of rail tracks otherwise appearing as upright objects in front of the camera, and that it also improves depth estimation results on train sequences
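The rail-gauge depth cue described in this abstract follows directly from the pinhole camera model: a known real-world width W observed as w pixels lies at depth Z = f·W / w. A minimal sketch of that relation, assuming the standard 1.435 m track gauge; the function name is illustrative, not from the thesis:

```python
# Under a pinhole camera model, a known real width W that spans w pixels
# lies at depth Z = f * W / w (f = focal length in pixels). The standard
# rail gauge supplies W, so the pixel distance between the two rails on
# each image row yields a depth value for that row.

STANDARD_GAUGE_M = 1.435  # standard-gauge track width, in metres

def depth_between_rails(focal_px, rail_px_width):
    """Depth (m) of the track at a row where the rails appear
    `rail_px_width` pixels apart, for a camera whose focal length is
    `focal_px` pixels."""
    if rail_px_width <= 0:
        raise ValueError("rail width in pixels must be positive")
    return focal_px * STANDARD_GAUGE_M / rail_px_width

# Example: rails 100 px apart under a 1000 px focal length lie 14.35 m ahead.
```

Repeating this per image row yields a sparse depth map covering only the track region, which is the kind of supervisory signal the abstract describes.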

    Emotional Expression of Korean Dance Assisted by a Virtual Environment System


    Censusing and modeling the dynamics of a population of eastern hemlock (Tsuga canadensis L.) using remote sensing

    A population of eastern hemlock (Tsuga canadensis L.) was censused from the ground using traditional field methods and from the air using large-scale, high-resolution aerial imagery in the early spring of 1997, 1998 and 1999. A manual crown survey map of the population, prepared from aerial imagery, was compared to a traditional field census. Over 60% of the individuals measured on the ground were not detected in the aerial census. Tree size, crown density and crown position all played roles in determining a crown's visibility from the air. Nearly all large, upper-canopy hemlocks were visible in the aerial census. An important minority of small, lower-canopy hemlocks were also visible in the aerial census. An automated spatial segmentation procedure was developed to identify and measure individual population units, or blobs, within the forest population. A blob was defined as a distinct portion of crown segmented from its neighbors on the basis of size, shape, and connectivity. To ensure the comparability of multi-year segmentation maps, an automated blob reconciliation procedure was also developed to make certain that no hemlock pixels were assigned to different blobs in different years. Following spatial segmentation and reconciliation, a large majority of hemlock blobs (~64–72%) were found to be closely associated with ground-referenced, manually delineated individual hemlock crowns. The remaining blobs consisted of spatially distinct parts of a crown or closely clumped multiple crowns. Matrix population models were constructed from the ground-derived and aerial-derived population data. Matrix analysis produced a number of useful population characteristics including the overall population growth rate (lambda), stable stage distributions, reproductive values, and sensitivity values. Lambdas calculated from the aerial- and ground-derived matrices were compared using randomization tests. 
While providing a different perspective and description of a population than traditional ground studies, demographic studies using remote sensing provide some promising advantages. The spatially explicit nature of the data permits more biologically realistic modeling of the population and the investigation of potential environmental influences on population dynamics. Automated extraction of demographic or megademographic data from remotely sensed images represents an important first step toward scaling population analysis to the landscape and regional levels
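The matrix-model quantities named above (lambda, stable stage distribution) come from a standard eigenanalysis of the stage-transition matrix: lambda is the dominant eigenvalue and the stable stage distribution is the normalized dominant right eigenvector. A minimal sketch with an illustrative 3-stage matrix, not data from this study:

```python
# Eigenanalysis of a stage-structured matrix population model: the
# asymptotic growth rate lambda is the dominant eigenvalue, and the
# stable stage distribution is the associated right eigenvector,
# normalized to sum to one. The matrix entries are illustrative.
import numpy as np

A = np.array([
    [0.0, 0.5, 1.2],   # fecundities of stage-2 and stage-3 individuals
    [0.3, 0.6, 0.0],   # stage 1 -> 2 transition; stage-2 retention
    [0.0, 0.3, 0.9],   # stage 2 -> 3 transition; stage-3 retention
])

eigvals, eigvecs = np.linalg.eig(A)
i = np.argmax(eigvals.real)
lam = eigvals[i].real                  # population growth rate (lambda)
stable = np.abs(eigvecs[:, i].real)
stable /= stable.sum()                 # stable stage distribution
```

Sensitivity and reproductive values follow the same pattern, using the left eigenvector of the dominant eigenvalue.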

    Offshore oil spill detection using synthetic aperture radar

    Among the different types of marine pollution, oil spills are considered a major threat to sea ecosystems. The source of oil pollution can be located on the mainland or directly at sea. Sea-based sources include discharges from ships, offshore platforms, and natural seepage from the sea bed, and such pollution can be accidental or deliberate. Sensors to detect and monitor oil spills can be carried on board vessels, aircraft, or satellites. Vessels equipped with specialised radars can detect oil at sea but cover a very limited area. One of the established ways to monitor sea-based oil pollution is the use of satellites equipped with Synthetic Aperture Radar (SAR). The aim of the work presented in this thesis is to identify an optimum set of extracted feature parameters and to implement methods at the various stages of oil spill detection from SAR imagery, which involves three stages: segmentation for dark spot detection, feature extraction, and classification of the feature vector. More than 200 images from the ERS-2, ENVISAT and RADARSAT-2 SAR sensors have been used to assess the proposed feature vector. Unfortunately, oil spills are not the only phenomenon that can create a dark spot in SAR imagery: several other meteorological, oceanographic, and wind-induced phenomena may also produce dark spots. Because these dark objects appear similar to the dark spots caused by oil spills, they are called look-alikes, and they make oil spill spots difficult to detect since their primary characteristics are similar. To overcome this difficulty, feature extraction becomes important, a stage which may involve the selection of appropriate feature extraction parameters. The main objective of this dissertation is to identify the optimum feature vector to segregate oil spill and look-alike spots. 
A total of 44 extracted feature parameters have been studied. For segmentation, four methods, based on edge detection, adaptive thresholding, artificial neural network (ANN) segmentation, and contrast split segmentation, have been implemented. Spot features are extracted from both the dark spots themselves and their surroundings. The classification stage was performed using two different techniques: one based on an ANN and the other on two-stage processing that combines classification tree analysis and fuzzy logic. A modified feature vector, including both new and improved features, is suggested for a better description of different types of dark spots. An ANN classifier using the full spectrum of feature parameters has also been developed and evaluated. The implemented methodology appears promising in detecting dark spots and discriminating oil spills from look-alikes, and its processing time is well below any operational service requirements
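The adaptive-thresholding segmentation step mentioned above can be illustrated with a generic local-mean rule: a pixel is a candidate dark spot when its backscatter falls below a fixed fraction of its neighborhood mean. A minimal numpy sketch; the window size and threshold factor are illustrative choices, not the thesis's tuned values:

```python
# Generic adaptive thresholding for dark-spot candidates: mark pixels
# darker than k times the mean of their local window. An integral image
# makes the local-mean computation O(1) per pixel.
import numpy as np

def dark_spot_mask(img, win=9, k=0.6):
    """Boolean mask of pixels darker than k * (local mean over a
    win x win window)."""
    pad = win // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    ii = np.cumsum(np.cumsum(padded, axis=0), axis=1)
    ii = np.pad(ii, ((1, 0), (1, 0)))        # zero row/col for the sums
    h, w = img.shape
    local_sum = (ii[win:win + h, win:win + w]
                 - ii[:h, win:win + w]
                 - ii[win:win + h, :w]
                 + ii[:h, :w])
    local_mean = local_sum / (win * win)
    return img < k * local_mean
```

The resulting mask is only the first stage; as the abstract stresses, separating true spills from look-alikes is left to the feature-extraction and classification stages.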

    Multi-touch Detection and Semantic Response on Non-parametric Rear-projection Surfaces

    The ability of human beings to physically touch our surroundings has had a profound impact on our daily lives. Young children learn to explore their world by touch; likewise, many simulation and training applications benefit from natural touch interactivity. As a result, modern interfaces supporting touch input are ubiquitous. Typically, such interfaces are implemented on integrated touch-display surfaces with simple geometry that can be mathematically parameterized, such as planar surfaces and spheres; for more complicated non-parametric surfaces, such parameterizations are not available. In this dissertation, we introduce a method for generalizable optical multi-touch detection and semantic response on uninstrumented non-parametric rear-projection surfaces using an infrared-light-based multi-camera multi-projector platform. In this paradigm, touch input allows users to manipulate complex virtual 3D content that is registered to and displayed on a physical 3D object. Detected touches trigger responses with specific semantic meaning in the context of the virtual content, such as animations or audio responses. The broad problem of touch detection and response can be decomposed into three major components: determining if a touch has occurred, determining where a detected touch has occurred, and determining how to respond to a detected touch. Our fundamental contribution is the design and implementation of a relational lookup table architecture that addresses these challenges through the encoding of coordinate relationships among the cameras, the projectors, the physical surface, and the virtual content. Detecting the presence of touch input primarily involves distinguishing between touches (actual contact events) and hovers (near-contact proximity events). We present and evaluate two algorithms for touch detection and localization utilizing the lookup table architecture. 
One of the algorithms, a bounded plane sweep, is additionally able to estimate hover-surface distances, which we explore for interactions above surfaces. The proposed method is designed to operate with low latency and to be generalizable. We demonstrate touch-based interactions on several physical parametric and non-parametric surfaces, and we evaluate both system accuracy and the accuracy of typical users in touching desired targets on these surfaces. In a formative human-subject study, we examine how touch interactions are used in the context of healthcare and present an exploratory application of this method in patient simulation. A second study highlights the advantages of touch input on content-matched physical surfaces achieved by the proposed approach, such as decreases in induced cognitive load, increases in system usability, and increases in user touch performance. In this experiment, novice users were nearly as accurate when touching targets on a 3D head-shaped surface as when touching targets on a flat surface, and their self-perception of their accuracy was higher
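The relational lookup-table idea described above can be pictured as a precomputed mapping from a camera pixel to a surface point, a content coordinate, and a semantic region that selects the response. A toy hand-built sketch; the dissertation's tables are produced by multi-camera/projector calibration, and all names and entries here are illustrative:

```python
# Toy relational lookup table: (camera, pixel) -> (3D surface point,
# content texture coordinate, semantic region). A detected touch is
# resolved to a response by looking up its pixel and mapping the
# region to a semantic action.
lookup = {
    ("cam0", (120, 88)): ((0.10, 0.32, 0.05), (0.25, 0.40), "cheek"),
    ("cam0", (121, 88)): ((0.11, 0.32, 0.05), (0.26, 0.40), "cheek"),
    ("cam1", (300, 40)): ((0.00, 0.55, 0.12), (0.50, 0.80), "forehead"),
}

responses = {
    "cheek": "play_blush_animation",
    "forehead": "play_wince_audio",
}

def respond_to_touch(camera_id, pixel):
    """Resolve a detected touch to its semantic response, or None if
    the pixel does not map onto the physical surface."""
    entry = lookup.get((camera_id, pixel))
    if entry is None:
        return None
    _surface_pt, _uv, region = entry
    return responses[region]
```

The point of the table is that the expensive geometric relationships (camera, projector, surface, content) are encoded once offline, so runtime touch handling reduces to constant-time lookups.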

    Algorithms for the reconstruction, analysis, repairing and enhancement of 3D urban models from multiple data sources

    Over the last few years, there has been remarkable growth in the field of digitization of 3D buildings and urban environments. The substantial improvement of both scanning hardware and reconstruction algorithms has led to the development of representations of buildings and cities that can be remotely transmitted and inspected in real-time. Among the applications that implement these technologies are several GPS navigators and virtual globes such as Google Earth or the tools provided by the Institut Cartogràfic i Geològic de Catalunya. In particular, in this thesis, we conceptualize cities as a collection of individual buildings. Hence, we focus on the individual processing of one structure at a time, rather than on the larger-scale processing of urban environments. Nowadays, there is a wide diversity of digitization technologies, and the choice of the appropriate one is key for each particular application. Roughly, these techniques can be grouped into three main families: - Time-of-flight (terrestrial and aerial LiDAR). - Photogrammetry (street-level, satellite, and aerial imagery). - Human-edited vector data (cadastre and other map sources). Each of these has its advantages in terms of covered area, data quality, economic cost, and processing effort. Aircraft- and car-mounted LiDAR devices are optimal for sweeping huge areas, but acquiring and calibrating such devices is not a trivial task. Moreover, the capturing process is done by scan lines, which need to be registered using GPS and inertial data. As an alternative, terrestrial LiDAR devices are more accessible but cover smaller areas, and their sampling strategy usually produces massive point clouds with over-represented planar regions. A more inexpensive option is street-level imagery. A dense set of images captured with a commodity camera can be fed to state-of-the-art multi-view stereo algorithms to produce realistic-enough reconstructions. 
    Another advantage of this approach is the capture of high-quality color data, although the geometric information it yields is usually of lower quality. In this thesis, we analyze in depth some of the shortcomings of these data-acquisition methods and propose new ways to overcome them. Mainly, we focus on the technologies that allow high-quality digitization of individual buildings: terrestrial LiDAR for geometric information and street-level imagery for color information. Our main goal is the processing and completion of detailed 3D urban representations. For this, we work with multiple data sources and combine them when possible to produce models that can be inspected in real-time. Our research has focused on the following contributions: - Effective and feature-preserving simplification of massive point clouds. - Normal estimation algorithms explicitly designed for LiDAR data. - Low-stretch panoramic representation for point clouds. - Semantic analysis of street-level imagery for improved multi-view stereo reconstruction. - Color improvement through heuristic techniques and the registration of LiDAR and imagery data. - Efficient and faithful visualization of massive point clouds using image-based techniques
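The point-cloud simplification problem mentioned among the contributions can be illustrated with the common voxel-grid baseline: points are binned into cells and each occupied cell is replaced by the centroid of its points, which collapses the over-represented planar regions terrestrial LiDAR produces. This is a generic sketch, not the thesis's feature-preserving algorithm:

```python
# Voxel-grid downsampling: bin points into cubic cells and keep one
# centroid per occupied cell. Dense planar regions collapse to a
# regular sampling while isolated points survive unchanged.
import numpy as np

def voxel_simplify(points, cell=0.05):
    """Downsample an (N, 3) point cloud to one centroid per occupied
    voxel of side `cell` (same units as the points)."""
    keys = np.floor(points / cell).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    n_cells = inverse.max() + 1
    sums = np.zeros((n_cells, 3))
    counts = np.zeros(n_cells)
    np.add.at(sums, inverse, points)   # unbuffered accumulation per cell
    np.add.at(counts, inverse, 1)
    return sums / counts[:, None]
```

A feature-preserving variant, as pursued in the thesis, would additionally adapt the cell size or keep extra representatives near edges and high-curvature regions.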

    SAR Image-Based Ship Detection Using an Automatic Training Data Extraction Algorithm and Machine Learning

    Thesis (Master's) -- Graduate School of Seoul National University: School of Earth and Environmental Sciences, College of Natural Sciences, 2021. 2. Duk-jin Kim. Detection and surveillance of vessels are regarded as a crucial application of SAR for their contribution to the preservation of marine resources and the assurance of maritime safety. The introduction of machine learning to vessel detection significantly enhanced the performance and efficiency of detection, but a substantial majority of studies focused on modifying the object detector algorithm. As fundamental enhancement of detection performance would be nearly impossible without accurate training data of vessels, this study used AIS information, which contains real-time information on vessel movements, to propose a robust algorithm that acquires training data of vessels in an automated manner. As AIS information is obtained irregularly and discretely, the exact target interpolation time for each vessel was precisely determined, followed by the application of a Kalman filter, which mitigates the measurement error of the AIS sensor. In addition, as the velocity of each vessel leaves an imprint in the SAR image known as the Doppler frequency shift, this shift was calibrated by restoring the elliptic satellite orbit from the satellite state vector and estimating the distance between the satellite and the target vessel. From the calibrated position of the AIS sensor inside the corresponding SAR image, training data was directly obtained, taking into account the placement of the AIS sensor within each vessel. For fishing boats, a separate information system, VPASS, was used for the same training-data retrieval procedure. Training data obtained via the automated procurement algorithm was evaluated with a conventional object detector using three detection evaluation metrics: precision, recall and F1 score. All three metrics for the proposed training data acquisition significantly exceeded those for manual acquisition. 
    The major difference between the two training datasets was demonstrated in inshore regions and in the vicinity of strongly scattering vessels, where land artifacts, ships and the ghost signals derived from them were indiscernible by visual inspection. This study additionally introduced the possibility of resolving the unclassified usage of each vessel by comparing AIS information with the accurate vessel detection results.
    Contents: Chapter 1. Introduction (1.1 Research Background; 1.2 Research Objective); Chapter 2. Data Acquisition (2.1 Acquisition of SAR Image Data; 2.2 Acquisition of AIS and VPASS Information); Chapter 3. Methodology on Training Data Procurement (3.1 Interpolation of Discrete AIS Data: 3.1.1 Estimation of Target Interpolation Time for Vessels, 3.1.2 Application of Kalman Filter to AIS Data; 3.2 Doppler Frequency Shift Correction: 3.2.1 Theoretical Basis of Doppler Frequency Shift, 3.2.2 Mitigation of Doppler Frequency Shift; 3.3 Retrieval of Training Data of Vessels; 3.4 Algorithm on Vessel Training Data Acquisition from VPASS Information); Chapter 4. Methodology on Object Detection Architecture; Chapter 5. Results (5.1 Assessment on Training Data; 5.2 Assessment on AIS-based Ship Detection; 5.3 Assessment on VPASS-based Fishing Boat Detection); Chapter 6. Discussions (6.1 Discussion on AIS-Based Ship Detection; 6.2 Application on Determining Unclassified Vessels); Chapter 7. Conclusion; Abstract in Korean; Bibliography
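The AIS-smoothing step described in this abstract can be sketched with a textbook constant-velocity Kalman filter: noisy, discretely received position fixes are fused into a smoothed trajectory that can then be interpolated to the SAR acquisition time. The 1-D state, noise levels, and time step below are illustrative; the thesis works with real 2-D AIS tracks:

```python
# Constant-velocity Kalman filter over noisy 1-D position fixes.
# State x = [position, velocity]; only position is observed.
import numpy as np

def kalman_smooth(positions, dt=1.0, meas_var=25.0, accel_var=0.1):
    """Filter a sequence of noisy 1-D positions; returns the filtered
    position estimates."""
    F = np.array([[1.0, dt], [0.0, 1.0]])            # motion model
    H = np.array([[1.0, 0.0]])                       # observation model
    Q = accel_var * np.array([[dt**4 / 4, dt**3 / 2],
                              [dt**3 / 2, dt**2]])   # process noise
    R = np.array([[meas_var]])                       # measurement noise
    x = np.array([positions[0], 0.0])
    P = np.eye(2) * 100.0                            # vague initial covariance
    out = []
    for z in positions:
        x = F @ x                                    # predict
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + R                          # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
        x = x + K @ (np.array([z]) - H @ x)          # update
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0])
    return np.array(out)
```

Because the state includes velocity, a converged filter tracks a constant-velocity vessel with no steady-state lag, which is what makes interpolation to an arbitrary target time reasonable.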

    A generic knowledge-guided image segmentation and labeling system using fuzzy clustering algorithms


    Field Testing of a Stochastic Planner for ASV Navigation Using Satellite Images

    We introduce a multi-sensor navigation system for autonomous surface vessels (ASV) intended for water-quality monitoring in freshwater lakes. Our mission planner uses satellite imagery as a prior map, formulating offline a mission-level policy for global navigation of the ASV and enabling autonomous online execution via local perception and local planning modules. A significant challenge is posed by inconsistencies in traversability estimation between satellite images and real lakes, due to environmental effects such as wind, aquatic vegetation, shallow waters, and fluctuating water levels. Hence, we explicitly model these traversability uncertainties as stochastic edges in a graph and optimize for a mission-level policy that minimizes the expected total travel distance. To execute the policy, we propose a modern local planner architecture that processes sensor inputs and plans paths to execute the high-level policy under uncertain traversability conditions. Our system was tested on three km-scale missions on a Northern Ontario lake, demonstrating that our GPS-, vision-, and sonar-enabled ASV system can effectively execute the mission-level policy and disambiguate the traversability of stochastic edges. Finally, we provide insights gained from practical field experience and offer several future directions to enhance the overall reliability of ASV navigation systems. Comment: 33 pages, 20 figures. Project website https://pcctp.github.io. arXiv admin note: text overlap with arXiv:2209.1186
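The stochastic-edge reasoning described above can be illustrated on the smallest possible case: an ambiguous shortcut is an edge that is traversable with probability p, and the planner compares the expected distance of "attempt the shortcut, backtrack if blocked" against an always-traversable detour. The graph shape and all distances below are illustrative, not from the paper:

```python
# Expected-distance comparison over one stochastic edge: with
# probability p the shortcut is open (traverse it); otherwise the
# vessel must backtrack and take the detour from there.

def expected_cost_try_shortcut(d_to_edge, d_shortcut, d_backtrack_detour, p):
    """Expected travel distance when attempting the stochastic edge."""
    return d_to_edge + p * d_shortcut + (1 - p) * d_backtrack_detour

def best_policy(d_detour, d_to_edge, d_shortcut, d_backtrack_detour, p):
    """Pick the policy with the smaller expected travel distance."""
    try_cost = expected_cost_try_shortcut(
        d_to_edge, d_shortcut, d_backtrack_detour, p)
    if try_cost < d_detour:
        return ("try_shortcut", try_cost)
    return ("take_detour", d_detour)
```

With a 10 km safe detour, a 2 km approach, a 3 km shortcut, and a 12 km backtrack-plus-detour penalty, attempting the shortcut only pays off when it is likely to be open; a full mission-level policy generalizes this comparison over all stochastic edges in the graph.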