Coarse-to-Fine Annotation Enrichment for Semantic Segmentation Learning
Rich, high-quality annotated data is critical for semantic segmentation
learning, yet acquiring dense, pixel-wise ground truth is both labor- and
time-consuming. Coarse annotations (e.g., scribbles, coarse polygons) offer an
economical alternative, but models trained on them rarely reach satisfactory
performance. To generate high-quality annotated data at a low time cost for
accurate segmentation, in this paper we propose a novel annotation enrichment
strategy that expands the existing coarse annotations of training data to a
finer scale. Extensive experiments on the Cityscapes and PASCAL VOC 2012
benchmarks show that neural networks trained with the enriched annotations
from our framework yield a significant improvement over those trained with
the original coarse labels, and are highly competitive with networks trained
on dense human annotations. The proposed method also outperforms other
state-of-the-art weakly-supervised segmentation methods.
Comment: CIKM 2018 International Conference on Information and Knowledge Management
Weakly-supervised Semantic Segmentation in Cityscape via Hyperspectral Image
High-resolution hyperspectral images (HSIs) record the response of each
pixel in different spectral bands, which can be used to effectively distinguish
various objects in complex scenes. Although HSI cameras have become low-cost,
algorithms based on them have not been well exploited. In this paper, we focus on
a novel topic: weakly-supervised semantic segmentation in cityscapes via HSIs.
It is based on the idea that high-resolution HSIs of city scenes contain rich
spectral information, which can easily be associated with semantics without
manual labeling. This enables low-cost, highly reliable semantic
segmentation in complex scenes. Specifically, we theoretically
analyze the HSIs and introduce a weakly-supervised HSI semantic segmentation
framework, which uses spectral information to refine coarse labels to
a finer degree. The experimental results show that our method obtains highly
competitive labels and, for some classes, even finer edges than manually
produced fine labels. The results also show that the refined labels
effectively improve semantic segmentation performance. This combination of
HSIs and semantic segmentation demonstrates that HSIs have great
potential for high-level vision tasks.
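The core idea of the abstract above — letting spectral information refine coarse labels — can be illustrated with a minimal sketch. This is not the paper's framework; it is a generic nearest-mean-spectrum assignment using the spectral angle, and all function and variable names are ours.

```python
import numpy as np

def refine_labels_by_spectrum(hsi, coarse, unlabeled=-1):
    """Assign each unlabeled pixel to the class whose mean spectrum is
    closest in spectral angle. hsi: (H, W, B) hyperspectral cube;
    coarse: (H, W) integer labels, with `unlabeled` marking pixels to fill."""
    refined = coarse.copy()
    classes = [c for c in np.unique(coarse) if c != unlabeled]
    # Per-class mean spectra, estimated from the coarsely labeled pixels.
    means = np.stack([hsi[coarse == c].mean(axis=0) for c in classes])  # (C, B)
    pix = hsi[coarse == unlabeled]                                      # (N, B)
    # Spectral angle: arccos of the normalized dot product (smaller = closer).
    num = pix @ means.T
    den = np.linalg.norm(pix, axis=1, keepdims=True) * np.linalg.norm(means, axis=1)
    angles = np.arccos(np.clip(num / np.maximum(den, 1e-12), -1.0, 1.0))
    refined[coarse == unlabeled] = np.take(classes, angles.argmin(axis=1))
    return refined
```

Real HSI pipelines would add spatial regularization on top of this purely spectral assignment.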
Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
The massive amounts of digitized historical documents acquired over the last
decades naturally lend themselves to automatic processing and exploration.
Research efforts seeking to automatically process facsimiles and extract
information from them are multiplying, with document layout analysis as a
first essential step. While the identification and categorization of segments
of interest in document images have seen significant progress in recent years
thanks to deep learning techniques, many challenges remain, among them
the use of finer-grained segmentation typologies and the handling of
complex, heterogeneous documents such as historical newspapers. Moreover, most
approaches consider visual features only, ignoring the textual signal. In this
context, we introduce a multimodal approach for the semantic segmentation of
historical newspapers that combines visual and textual features. Through a
series of experiments on diachronic Swiss and Luxembourgish newspapers, we
investigate, among other questions, the predictive power of visual and textual
features and their capacity to generalize across time and sources. Results show
a consistent improvement of multimodal models over a strong visual
baseline, as well as better robustness to high material variance.
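One simple way to combine visual and textual features for page segmentation is early fusion: rasterize word embeddings of the OCR tokens onto the pixel grid, then concatenate them channel-wise with the visual feature map. The sketch below is a generic illustration of that idea, not the paper's actual architecture; all names (`text_embedding_map`, `fuse`, the token format) are our assumptions.

```python
import numpy as np

def text_embedding_map(shape, tokens, embed, dim):
    """Rasterize word embeddings onto the pixel grid: each OCR token paints
    its embedding over its bounding box. tokens: list of
    (word, (y0, y1, x0, x1)); embed: dict word -> vector. Unknown words and
    uncovered pixels stay zero."""
    h, w = shape
    tmap = np.zeros((h, w, dim))
    for word, (y0, y1, x0, x1) in tokens:
        tmap[y0:y1, x0:x1] = embed.get(word, np.zeros(dim))
    return tmap

def fuse(visual, tmap):
    """Early fusion: channel-wise concatenation before the segmentation head."""
    assert visual.shape[:2] == tmap.shape[:2]
    return np.concatenate([visual, tmap], axis=-1)
```

A segmentation network then consumes the fused (H, W, Cv + Ct) tensor in place of the purely visual input.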
A Study on Seed Information Enrichment Techniques for Personalized Interactive Image Segmentation Algorithms
Thesis (Ph.D.) -- Graduate School of Seoul National University: Department of Electrical and Computer Engineering, College of Engineering, February 2021. Advisor: Kyoung Mu Lee.
Segmentation of the area corresponding to a desired object in an image is
essential in computer vision, because most algorithms operate on semantic
units when interpreting or analyzing images. However, segmenting the desired
object from a given image is inherently ambiguous: the target object varies
with the user and the purpose. To address this, interactive segmentation
techniques have been proposed, in which segmentation is steered in the
desired direction through interaction with the user. Here, the seed
information provided by the user plays a crucial role: if the seeds contain
abundant information, segmentation accuracy increases, but providing rich
seed information places a heavy burden on the user. The main goal of the
present study is therefore to obtain satisfactory segmentation results from
simple seed information.
We focus on converting the sparse seed information provided by the user into
a rich state from which accurate segmentation results can be derived. To this
end, we take minimal user input and enrich it through various seed enrichment
techniques. We propose three interactive segmentation techniques, based
respectively on (1) seed expansion, (2) seed generation, and (3) seed
attention: expansion of the area around a seed, generation of new seeds at
new positions, and attention to semantic information.
First, in seed expansion, we enlarge the scope of the seed: a two-stage
expansion step integrates reliable pixels around the initial seed into the
seed set. Because the extended seed covers a wider area than the initial one,
the seed's scarcity and imbalance problems are resolved. Next, in seed
generation, we create seeds at new points away from the existing seed. We
train the system to imitate user behavior by providing a new seed point in
the erroneous region; by learning the user's intention, our model can
efficiently create new seed points. The generated seeds help segmentation and
can also serve as additional information for weakly supervised learning.
Finally, through seed attention, we inject semantic information into the
seed. Unlike the previous models, we integrate the segmentation process and
the seed enrichment process: instead of spatial expansion, we reinforce the
seed by adding semantic information, enriching it through mutual attention
with the feature maps generated during the segmentation process.
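The first two enrichment strategies can be sketched in highly simplified form. The code below is not the thesis's actual method (which uses pyramidal random walk with restart for expansion and a learned DQN policy for placement); it only illustrates the two ideas with a colour-similarity flood fill and a centroid heuristic. All names are ours.

```python
import numpy as np
from collections import deque

def expand_seed(image, seed_mask, tol=0.1):
    """Grow a sparse seed by absorbing 4-connected neighbours whose colour
    lies within `tol` of the seed's mean colour (a simplified stand-in for
    the thesis's two-stage, pyramidal-RWR-based expansion)."""
    h, w = seed_mask.shape
    expanded = seed_mask.copy()
    mean = image[seed_mask].mean(axis=0)           # mean colour of seed pixels
    frontier = deque(zip(*np.nonzero(seed_mask)))
    while frontier:
        y, x = frontier.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not expanded[ny, nx]:
                if np.linalg.norm(image[ny, nx] - mean) <= tol:
                    expanded[ny, nx] = True
                    frontier.append((ny, nx))
    return expanded

def next_seed_point(pred, gt):
    """Mimic a correcting user: place the next seed at the error pixel
    nearest the error region's centroid (a heuristic stand-in for the
    thesis's learned, DQN-based seed placement)."""
    err = np.argwhere(pred != gt)
    if len(err) == 0:
        return None
    centroid = err.mean(axis=0)
    return tuple(err[np.linalg.norm(err - centroid, axis=1).argmin()])
```

In the thesis, both steps are learned or graph-based rather than heuristic, but the interfaces are analogous: a sparse seed mask in, an enriched seed (or a new click) out.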
The proposed models demonstrate their superiority over existing techniques in
various experiments. Notably, even with sparse seed information, our proposed
seed enrichment techniques give far more accurate segmentation results
than the other existing methods.
1 Introduction
1.1 Previous Works
1.2 Proposed Methods
2 Interactive Segmentation with Seed Expansion
2.1 Introduction
2.2 Proposed Method
2.2.1 Background
2.2.2 Pyramidal RWR
2.2.3 Seed Expansion
2.2.4 Refinement with Global Information
2.3 Experiments
2.3.1 Dataset
2.3.2 Implementation Details
2.3.3 Performance
2.3.4 Contribution of Each Part
2.3.5 Seed Consistency
2.3.6 Running Time
2.4 Summary
3 Interactive Segmentation with Seed Generation
3.1 Introduction
3.2 Related Works
3.3 Proposed Method
3.3.1 System Overview
3.3.2 Markov Decision Process
3.3.3 Deep Q-Network
3.3.4 Model Architecture
3.4 Experiments
3.4.1 Implementation Details
3.4.2 Performance
3.4.3 Ablation Study
3.4.4 Other Datasets
3.5 Summary
4 Interactive Segmentation with Seed Attention
4.1 Introduction
4.2 Related Works
4.3 Proposed Method
4.3.1 Interactive Segmentation Network
4.3.2 Bi-directional Seed Attention Module
4.4 Experiments
4.4.1 Datasets
4.4.2 Metrics
4.4.3 Implementation Details
4.4.4 Performance
4.4.5 Ablation Study
4.4.6 Seed Enrichment Methods
4.5 Summary
5 Conclusions
5.1 Summary
Bibliography
Abstract (in Korean)
Spott : on-the-spot e-commerce for television using deep learning-based video analysis techniques
Spott is an innovative second-screen mobile multimedia application which offers viewers relevant information on objects (e.g., clothing, furniture, food) they see and like on their television screens. The application enables interaction between TV audiences and brands, so producers and advertisers can offer potential consumers tailored promotions, e-shop items, and/or free samples. In line with current views on innovation management, the technological excellence of the Spott application is coupled with iterative user involvement throughout the entire development process. This article discusses both of these aspects and how they influence each other. First, we focus on the technological building blocks that facilitate the (semi-)automatic interactive tagging of objects in video streams. The majority of these building blocks make extensive use of novel, state-of-the-art deep learning concepts and methodologies. We show how these deep-learning-based video analysis techniques facilitate video summarization, semantic keyframe clustering, and (similar) object retrieval. Second, we provide insights into the user tests performed to evaluate and optimize the application's user experience. The lessons learned from these open field tests have already been an essential input to the technology development and will further shape future modifications of the Spott application.
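Semantic keyframe clustering, one of the building blocks mentioned above, can be sketched generically: cluster per-frame feature vectors (e.g., CNN embeddings) and keep the frame nearest each centroid as a keyframe. This is a plain k-means stand-in, not Spott's actual pipeline; all names are ours.

```python
import numpy as np

def keyframes_by_clustering(features, k, iters=20, seed=0):
    """Pick k representative keyframes by k-means over per-frame feature
    vectors: cluster the frames, then return the index of the frame closest
    to each centroid. features: (n_frames, dim) float array."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # Assign each frame to its nearest centroid, then update centroids.
        d = np.linalg.norm(features[:, None] - centroids[None], axis=-1)
        labels = d.argmin(axis=1)
        for c in range(k):
            if (labels == c).any():
                centroids[c] = features[labels == c].mean(axis=0)
    d = np.linalg.norm(features[:, None] - centroids[None], axis=-1)
    keyframes = []
    for c in range(k):
        idx = np.where(labels == c)[0]
        if len(idx):
            keyframes.append(int(idx[d[idx, c].argmin()]))
    return sorted(keyframes)
```

With semantically meaningful frame embeddings, the selected frames summarize the distinct scenes of a video.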
Iterative Few-shot Semantic Segmentation from Image Label Text
Few-shot semantic segmentation aims to learn to segment unseen-class objects
with the guidance of only a few support images. Most previous methods rely on
pixel-level labels for the support images. In this paper, we focus on a more
challenging setting in which only image-level labels are available. We
propose a general framework that first generates coarse masks with the help
of the powerful vision-language model CLIP, and then iteratively and mutually
refines the mask predictions of the support and query images. Extensive
experiments on the PASCAL-5i and COCO-20i datasets demonstrate that our
method not only outperforms state-of-the-art weakly supervised approaches by
a significant margin, but also achieves results comparable to or better than
recent fully supervised methods. Moreover, our method generalizes well to
in-the-wild images and uncommon classes. Code will be available at
https://github.com/Whileherham/IMR-HSNet.
Comment: IJCAI 2022
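The alternating support/query refinement described above can be reduced to a small loop skeleton. In the paper the refinement step is a trained segmentation network (an HSNet-style model guided by the other image's mask); here `refine` is just a placeholder callable, so the sketch shows only the control flow, not the method itself.

```python
def iterative_mutual_refine(support_mask, query_mask, refine, rounds=3):
    """Skeleton of mutual refinement: each round, the current support mask
    guides a refinement of the query mask, and the refined query mask in turn
    guides a refinement of the support mask. `refine(guide, target)` stands
    in for the segmentation network; any callable with that signature works."""
    for _ in range(rounds):
        query_mask = refine(support_mask, query_mask)    # support guides query
        support_mask = refine(query_mask, support_mask)  # query guides support
    return support_mask, query_mask
```

With a contractive `refine`, the two predictions pull toward agreement over the rounds, which is the intended effect of the mutual scheme.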