
    Single view depth estimation from train images

    Depth prediction is the task of computing the distance of different points in the scene from the camera. Knowing how far away a given object is from the camera makes it possible to understand its spatial representation. Early methods used stereo pairs of images to extract depth, which requires a calibrated pair of cameras. A single image is simpler to obtain, since no calibration or synchronization is needed; for this reason, learning-based methods, which estimate depth from monocular images, were introduced. Early learning-based solutions used ground-truth depth for training, usually acquired from sensors such as Kinect or Lidar. Because acquiring ground-truth depth is expensive and difficult, self-supervised methods, which do not require such ground truth, appeared naturally as a solution and have shown promising results for single-image depth estimation. In this work, we propose to estimate depth maps for images taken from the train driver's viewpoint. To do so, we use geometric constraints and the standard parameters of the rails to extract the depth map inside the rails and provide it as a supervisory signal to the network. To this end, we first gathered a dataset of train sequences and determined their focal lengths to compute the depth map inside the rails. We then used this dataset and the computed focal lengths to fine-tune an existing model, "Monodepth2", previously trained on the Kitti dataset. We show that the depth map provided to the network solves the problem of rail tracks otherwise appearing as standing objects in front of the camera, and that it improves the results of depth estimation on train sequences.
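    The supervisory signal described above follows directly from the pinhole camera model: the real-world distance between the rails is fixed by the standard gauge (about 1.435 m), so at an image row where the rails appear w pixels apart, the depth is roughly Z = f * G / w. Below is a minimal sketch of this computation, assuming the rail positions per image row have already been detected; the function and variable names are illustrative, not taken from the paper.

        import numpy as np

        # Standard rail gauge in metres: the known real-world distance between rails.
        RAIL_GAUGE_M = 1.435

        def rail_depth_map(rows, left_rail_u, right_rail_u, focal_px, height, width):
            """Sparse depth supervision inside the rails via the pinhole model.

            rows: image rows where the rails were detected.
            left_rail_u, right_rail_u: x-coordinates (pixels) of the two rails,
                one value per row in `rows`.
            focal_px: focal length in pixels.
            Returns a (height, width) depth map, zero outside the rails.
            """
            depth = np.zeros((height, width), dtype=np.float32)
            for v, ul, ur in zip(rows, left_rail_u, right_rail_u):
                w_px = ur - ul                       # apparent gauge width in pixels
                if w_px <= 0:
                    continue
                z = focal_px * RAIL_GAUGE_M / w_px   # similar triangles: Z = f * G / w
                depth[v, int(ul):int(ur) + 1] = z    # fill the span between the rails
            return depth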

    The Programmable Web: Agile, Social, and Grassroots Computing

    Web services, the semantic Web, and Web 2.0 are three somewhat separate movements trying to make the Web a programmable substrate. While each has achieved some level of success in its own right, it is becoming apparent that the grassroots approach of Web 2.0 is gaining greater success than the other two. In this paper we analyze each movement, briefly describing its main traits and outlining its primary assumptions. We then frame the common problem of achieving a programmable Web within the context of distributed computing and software engineering, and attempt to show why Web 2.0 comes closest to giving a pragmatic solution to the problem and will therefore likely continue to have the most success, while the other two make only cursory contributions.

    Cross-modal Learning for Domain Adaptation in 3D Semantic Segmentation

    Domain adaptation is an important task to enable learning when labels are scarce. While most works focus only on the image modality, there are many important multi-modal datasets. In order to leverage multi-modality for domain adaptation, we propose cross-modal learning, where we enforce consistency between the predictions of two modalities via mutual mimicking. We constrain our network to make correct predictions on labeled data and consistent predictions across modalities on unlabeled target-domain data. Experiments in unsupervised and semi-supervised domain adaptation settings prove the effectiveness of this novel domain adaptation strategy. Specifically, we evaluate on the task of 3D semantic segmentation using the image and point cloud modalities. We leverage recent autonomous driving datasets to produce a wide variety of domain adaptation scenarios, including changes in scene layout, lighting, sensor setup and weather, as well as the synthetic-to-real setup. Our method significantly improves over previous uni-modal adaptation baselines on all adaptation scenarios. Code will be made available.
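    The mutual mimicking described above can be read as a consistency loss between the per-point class distributions predicted by the 2D (image) and 3D (point cloud) branches, added to the usual supervised segmentation loss on labeled data. A minimal PyTorch sketch under those assumptions follows; the loss weight and function names are illustrative and not taken from the paper's released code.

        import torch.nn.functional as F

        def mimicry_loss(logits_2d, logits_3d):
            """Each modality mimics the other's (detached) prediction.

            logits_2d, logits_3d: (N, C) class scores for the same N 3D points,
            one set from the image branch, one from the point cloud branch.
            """
            log_p2d = F.log_softmax(logits_2d, dim=1)
            log_p3d = F.log_softmax(logits_3d, dim=1)
            # KL(teacher || student); teachers are detached so each branch is
            # pulled toward the other without moving the "teacher" itself.
            loss_3d = F.kl_div(log_p3d, log_p2d.exp().detach(), reduction="batchmean")
            loss_2d = F.kl_div(log_p2d, log_p3d.exp().detach(), reduction="batchmean")
            return loss_2d + loss_3d

        def total_loss(logits_2d, logits_3d, labels=None, xm_weight=0.1):
            """Consistency on all data; supervision only where labels exist."""
            loss = xm_weight * mimicry_loss(logits_2d, logits_3d)
            if labels is not None:  # labeled (source-domain) batch
                loss = loss + F.cross_entropy(logits_2d, labels)
                loss = loss + F.cross_entropy(logits_3d, labels)
            return loss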

    An algorithm for the selection of route dependent orientation information

    Landmarks are important features of spatial cognition and are naturally included in human route descriptions. In the past, algorithms were developed to select the most salient landmarks at decision points and automatically incorporate them into route instructions. Moreover, it was shown that human route descriptions contain a significant amount of orientation information, which helps users orient themselves with respect to known environmental features, and that orientation information supports the acquisition of survey knowledge. There is thus a need to extend landmark selection to automatically select orientation information as well. In this work, we present an algorithm for the computational selection of route-dependent orientation information, which extends previous algorithms and includes a salience calculation for orientation information at any location along the route. We implemented the algorithm and demonstrate its functionality on OpenStreetMap data.
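    At its core, such an algorithm needs a salience score for every candidate orientation feature at every location along the route, from which the best candidates are selected. The sketch below illustrates one plausible scoring scheme; the weights and feature attributes are assumptions made for illustration and are not the paper's actual salience model.

        import math

        def salience(feature, location, route_bearing,
                     w_dist=0.5, w_angle=0.3, w_uniq=0.2):
            """Score one candidate orientation feature at one route location.

            feature: dict with 'x', 'y' map coordinates and a 'uniqueness'
                value in [0, 1] (how rare the feature type is nearby).
            location: (x, y) of the current position along the route.
            route_bearing: travel direction in radians at that position.
            """
            dx, dy = feature["x"] - location[0], feature["y"] - location[1]
            # Nearer features are easier to see; decay with distance (100 m scale).
            dist_score = math.exp(-math.hypot(dx, dy) / 100.0)
            # Features ahead of the traveller are easier to use for orientation.
            angle = abs(math.atan2(dy, dx) - route_bearing)
            angle = min(angle, 2 * math.pi - angle)  # wrap to [0, pi]
            angle_score = 1.0 - angle / math.pi
            return (w_dist * dist_score + w_angle * angle_score
                    + w_uniq * feature["uniqueness"])

        def select_orientation_info(candidates, location, route_bearing, k=1):
            """Pick the k most salient orientation features at a route location."""
            ranked = sorted(candidates,
                            key=lambda f: salience(f, location, route_bearing),
                            reverse=True)
            return ranked[:k]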