4,093 research outputs found
Deep Interactive Region Segmentation and Captioning
With recent innovations in dense image captioning, it is now possible to
describe every object of the scene with a caption while objects are determined
by bounding boxes. However, interpretation of such an output is not trivial due
to the existence of many overlapping bounding boxes. Furthermore, in current
captioning frameworks, the user is not able to involve personal preferences to
exclude out of interest areas. In this paper, we propose a novel hybrid deep
learning architecture for interactive region segmentation and captioning where
the user is able to specify an arbitrary region of the image that should be
processed. To this end, a dedicated Fully Convolutional Network (FCN) named
Lyncean FCN (LFCN) is trained using our special training data to isolate the
User Intention Region (UIR) as the output of an efficient segmentation. In
parallel, a dense image captioning model is utilized to provide a wide variety
of captions for that region. Then, the UIR will be explained with the caption
of the best match bounding box. To the best of our knowledge, this is the first
work that provides such a comprehensive output. Our experiments show the
superiority of the proposed approach over state-of-the-art interactive
segmentation methods on several well-known datasets. In addition, replacement
of the bounding boxes with the result of the interactive segmentation leads to
a better understanding of the dense image captioning output as well as accuracy
enhancement for the object detection in terms of Intersection over Union (IoU).Comment: 17, pages, 9 figure
Electronic Imaging & the Visual Arts. EVA 2012 Florence
The key aim of this Event is to provide a forum for the user, supplier and scientific research communities to meet and exchange experiences, ideas and plans in the wide area of Culture & Technology. Participants receive up to date news on new EC and international arts computing & telecommunications initiatives as well as on Projects in the visual arts field, in archaeology and history. Working Groups and new Projects are promoted. Scientific and technical demonstrations are presented
Ultrasound imaging operation capture and image analysis for speckle noise reduction and detection of shadows
Ultrasound is becoming increasingly important in medicine, both as a diagnostic tool
and as a therapeutic modality. At present, experienced sonographers observe trainees
as they generate hundreds of images, constantly providing them feedback and eventually
deciding if they have the appropriate skills and knowledge to perform ultrasound
independently.
This research seeks to advance towards developing an automated system capable of
assessing the motion of an ultrasound transducer and differentiate between a novice,
an intermediate and an expert sonographer. The research in this thesis synchronizes
the ultrasound images with three depth sensors (Microsoft Kinect) placed on the top,
left and right side of the patient to ensure the visibility of the ultrasound probe.
Videos obtained from the three categories of sonographers are manually labeled and
compared using Studiocode Development Environment to complete the items on the
medical form checklist.
Next, this thesis investigates and applies well known techniques used to smooth and
suppress speckle noise in ultrasound images by using quality metrics to test their
performance and show the benefits each one can contribute. Finally, this thesis investigates
the problem of shadow detection in ultrasound imaging and proposes to detect shadows automatically with an ultrasound confidence map using a random
walks algorithm. The results show that the proposed algorithm achieves an accuracy
of automatic detection of up to 85%, based on both the expert and manual
segmentation
A Cross-Season Correspondence Dataset for Robust Semantic Segmentation
In this paper, we present a method to utilize 2D-2D point matches between
images taken during different image conditions to train a convolutional neural
network for semantic segmentation. Enforcing label consistency across the
matches makes the final segmentation algorithm robust to seasonal changes. We
describe how these 2D-2D matches can be generated with little human interaction
by geometrically matching points from 3D models built from images. Two
cross-season correspondence datasets are created providing 2D-2D matches across
seasonal changes as well as from day to night. The datasets are made publicly
available to facilitate further research. We show that adding the
correspondences as extra supervision during training improves the segmentation
performance of the convolutional neural network, making it more robust to
seasonal changes and weather conditions.Comment: In Proc. CVPR 201
Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.Comment: 10 pages, 19 figure
A generic tool for interactive complex image editing
Plenty of complex image editing techniques require certain per-pixel property or magnitude to be known, e.g., simulating depth of field effects requires a depth map. This work presents an efficient interaction paradigm that approximates any per-pixel magnitude from a few user strokes by propagating the sparse user input to each pixel of the image. The propagation scheme is based on a linear least-squares system of equations which represents local and neighboring restrictions over superpixels. After each user input, the system responds immediately, propagating the values and applying the corresponding filter. Our interaction paradigm is generic, enabling image editing applications to run at interactive rates by changing just the image processing algorithm, but keeping our proposed propagation scheme. We illustrate this through three interactive applications: depth of field simulation, dehazing and tone mapping
Advances in Data-Driven Analysis and Synthesis of 3D Indoor Scenes
This report surveys advances in deep learning-based modeling techniques that
address four different 3D indoor scene analysis tasks, as well as synthesis of
3D indoor scenes. We describe different kinds of representations for indoor
scenes, various indoor scene datasets available for research in the
aforementioned areas, and discuss notable works employing machine learning
models for such scene modeling tasks based on these representations.
Specifically, we focus on the analysis and synthesis of 3D indoor scenes. With
respect to analysis, we focus on four basic scene understanding tasks -- 3D
object detection, 3D scene segmentation, 3D scene reconstruction and 3D scene
similarity. And for synthesis, we mainly discuss neural scene synthesis works,
though also highlighting model-driven methods that allow for human-centric,
progressive scene synthesis. We identify the challenges involved in modeling
scenes for these tasks and the kind of machinery that needs to be developed to
adapt to the data representation, and the task setting in general. For each of
these tasks, we provide a comprehensive summary of the state-of-the-art works
across different axes such as the choice of data representation, backbone,
evaluation metric, input, output, etc., providing an organized review of the
literature. Towards the end, we discuss some interesting research directions
that have the potential to make a direct impact on the way users interact and
engage with these virtual scene models, making them an integral part of the
metaverse.Comment: Published in Computer Graphics Forum, Aug 202
- …