14,884 research outputs found
The Impact of Acoustic Imaging Geometry on the Fidelity of Seabed Bathymetric Models
Attributes derived from digital bathymetric models (DBM) are a powerful means of analyzing seabed characteristics. Those models, however, are inherently constrained by the method of seabed sampling. Most bathymetric models are derived by collating a number of discrete corridors of multibeam sonar data. Within each corridor the data are collected over a wide range of distances, azimuths and elevation angles, and thus the quality varies significantly. That variability therefore becomes imprinted into the DBM. Subsequent users of the DBM, unfamiliar with the original acquisition geometry, may misinterpret such variability as attributes of the seabed. This paper examines the impact of the imaging geometry on the accuracy and resolution of the resulting derived model. The geometry can be broken down into range, angle, azimuth, density and overlap attributes, which in turn are affected by the sonar configuration, including beam widths, beam spacing, bottom detection algorithms, stabilization strategies, and platform speed and stability. Superimposed on the imaging geometry are residual effects due to imperfect integration of ancillary sensors. As the platform (normally a surface vessel) moves with characteristic motions driven by the ocean wave spectrum, periodic residuals can become imprinted in the seafloor model that may again be misinterpreted as geomorphological information.
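As a back-of-the-envelope illustration of the range and angle dependence described above, the across-track footprint of a single beam on a flat seafloor grows with both slant range and incidence angle. This is a hedged sketch, not from the paper: the flat-seafloor geometry, small-angle approximation, and all names and values below are our own assumptions.

```python
import math

def across_track_footprint(depth_m, steer_angle_deg, beamwidth_deg):
    """Approximate across-track footprint of one multibeam sonar beam
    on a flat seafloor. The footprint widens with slant range and with
    oblique incidence, so outer-swath soundings resolve less detail."""
    theta = math.radians(steer_angle_deg)  # steering angle from nadir
    psi = math.radians(beamwidth_deg)      # across-track beam width
    slant_range = depth_m / math.cos(theta)
    return slant_range * psi / math.cos(theta)

# A 1-degree beam in 100 m of water: nadir vs. 60 degrees off-nadir.
nadir = across_track_footprint(100.0, 0.0, 1.0)
outer = across_track_footprint(100.0, 60.0, 1.0)  # 4x wider than nadir
```

Under these assumptions the outer-swath footprint is a factor of 1/cos²(60°) = 4 coarser than at nadir, which is one way the acquisition geometry gets imprinted into the gridded model.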
Improving the Performance of Autoencoder-Based Computer Vision Models Using Body Embeddings
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, August 2021. Advisor: Jonghun Park.
Deep learning models have dominated the field of computer vision, achieving state-of-the-art performance in various tasks. In particular, with the recent increase in images and videos of people posted on social media, computer vision research on analyzing human visual information is being applied in a variety of ways.
This thesis addresses two human-related computer vision tasks: fashion style classification and motion similarity measurement. In real-world fashion style classification, the number of samples collected for each style class varies with the fashion trend at the time of data collection, resulting in class imbalance. To cope with this imbalance, the thesis employs generalized few-shot learning, in which both minority and majority classes are used for training and evaluation. Additionally, two modalities, foreground images cropped to show only the body and fashion items, and fashion attribute information, are reflected in the fashion image embedding through a variational autoencoder. The K-fashion dataset, collected from a Korean fashion shopping mall, is used for model training and evaluation.
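The core of the multimodal variational approach can be sketched as follows. This is our own toy illustration, not the thesis code: the thesis builds on CADA-VAE with learned deep encoders plus reconstruction and KL terms, whereas here linear maps and a single alignment loss stand in for the idea of pulling both modalities' latent codes together.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w_mu, w_logvar):
    """Toy linear 'encoder' mapping one modality to latent mean and
    log-variance (real models use deep networks)."""
    return x @ w_mu, x @ w_logvar

def reparameterize(mu, logvar):
    """VAE reparameterization trick: z = mu + sigma * eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

latent_dim = 8
img_feat = rng.standard_normal((4, 32))   # foreground-image features
attr_feat = rng.standard_normal((4, 16))  # fashion-attribute features

w_img = (rng.standard_normal((32, latent_dim)), rng.standard_normal((32, latent_dim)))
w_attr = (rng.standard_normal((16, latent_dim)), rng.standard_normal((16, latent_dim)))

z_img = reparameterize(*encode(img_feat, *w_img))
z_attr = reparameterize(*encode(attr_feat, *w_attr))

# Cross-modal alignment term: push the two modalities' latent codes
# for the same item toward each other, so either modality can be used
# to classify the style at inference time.
align_loss = float(np.mean((z_img - z_attr) ** 2))
```

Minimizing this alignment term during training is what lets the image and attribute modalities share one embedding space.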
Motion similarity measurement is used as a sub-module in various tasks such as action recognition, anomaly detection, and person re-identification; however, it has attracted less attention than those tasks because the same motion can look different depending on the performer's body structure and the camera angle. The lack of public datasets for model training and evaluation also makes research challenging. We therefore propose a synthetic dataset for model training and use an autoencoder architecture to learn motion embeddings that are disentangled from body structure and camera angle. The autoencoder generates a motion embedding for each body part, so that motion similarity can be measured per body part. Furthermore, motion speed is synchronized by aligning segments that perform similar motions using dynamic time warping. The similarity score dataset for evaluation was collected through a crowdsourcing platform using videos from NTU RGB+D 120, an action recognition dataset.
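The dynamic-time-warping step can be sketched as follows. This is a toy 1-D version with our own example sequences; the thesis applies the same alignment idea to sequences of per-body-part embeddings rather than scalars.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping between two 1-D sequences.
    The warping path aligns similar sub-motions even when one
    sequence is performed faster than the other."""
    n, m = len(a), len(b)
    dist = np.full((n + 1, m + 1), np.inf)
    dist[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dist[i, j] = cost + min(dist[i - 1, j],      # stretch a
                                    dist[i, j - 1],      # stretch b
                                    dist[i - 1, j - 1])  # advance both
    return dist[n, m]

fast = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
slow = np.repeat(fast, 2)  # the same motion at half speed

# The warp absorbs the speed difference: distance is exactly 0,
# whereas a frame-by-frame comparison would penalize the slower clip.
same_motion = dtw_distance(slow, fast)
```

This speed synchronization is why two performers executing the same motion at different tempos can still receive a high similarity score.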
When the proposed models were verified on their respective evaluation datasets, both outperformed the baselines. In the fashion style classification problem, the proposed model showed the most balanced performance among all compared models, without bias toward either the minority or the majority classes. In the motion similarity measurement experiments, the correlation of the proposed model's scores with human-annotated similarity scores was higher than that of the baselines.
Chapter 1 Introduction
1.1 Background and motivation
1.2 Research contribution
1.2.1 Fashion style classification
1.2.2 Human motion similarity
1.2.3 Summary of the contributions
1.3 Thesis outline
Chapter 2 Literature Review
2.1 Fashion style classification
2.1.1 Machine learning and deep learning-based approaches
2.1.2 Class imbalance
2.1.3 Variational autoencoder
2.2 Human motion similarity
2.2.1 Measuring the similarity between two people
2.2.2 Human body embedding
2.2.3 Datasets for measuring the similarity
2.2.4 Triplet and quadruplet losses
2.2.5 Dynamic time warping
Chapter 3 Fashion Style Classification
3.1 Dataset for fashion style classification: K-fashion
3.2 Multimodal variational inference for fashion style classification
3.2.1 CADA-VAE
3.2.2 Generating multimodal features
3.2.3 Classifier training with cyclic oversampling
3.3 Experimental results for fashion style classification
3.3.1 Implementation details
3.3.2 Settings for experiments
3.3.3 Experimental results on K-fashion
3.3.4 Qualitative analysis
3.3.5 Effectiveness of the cyclic oversampling
Chapter 4 Motion Similarity Measurement
4.1 Datasets for motion similarity
4.1.1 Synthetic motion dataset: SARA dataset
4.1.2 NTU RGB+D 120 similarity annotations
4.2 Framework for measuring motion similarity
4.2.1 Body part embedding model
4.2.2 Measuring motion similarity
4.3 Experimental results for measuring motion similarity
4.3.1 Implementation details
4.3.2 Experimental results on NTU RGB+D 120 similarity annotations
4.3.3 Visualization of motion latent clusters
4.4 Application
4.4.1 Real-world application with dancing videos
4.4.2 Tuning similarity scores to match human perception
Chapter 5 Conclusions
5.1 Summary and contributions
5.2 Limitations and future research
Appendices
Chapter A NTU RGB+D 120 Similarity Annotations
A.1 Data collection
A.2 AMT score analysis
Chapter B Data Cleansing of NTU RGB+D 120 Skeletal Data
Chapter C Motion Sequence Generation Using Mixamo
Bibliography
Abstract in Korean
Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources
We propose a method for visual question answering which combines an internal
representation of the content of an image with information extracted from a
general knowledge base to answer a broad range of image-based questions. This
allows more complex questions to be answered using the predominant neural
network-based approach than has previously been possible. It particularly
allows questions to be asked about the contents of an image, even when the
image itself does not contain the whole answer. The method constructs a textual
representation of the semantic content of an image, and merges it with textual
information sourced from a knowledge base, to develop a deeper understanding of
the scene viewed. Priming a recurrent neural network with this combined
information, and the submitted question, leads to a very flexible visual
question answering approach. We are specifically able to answer questions posed
in natural language, that refer to information not contained in the image. We
demonstrate the effectiveness of our model on two publicly available datasets,
Toronto COCO-QA and MS COCO-VQA, and show that it produces the best reported
results in both cases.
Comment: Accepted to IEEE Conf. Computer Vision and Pattern Recognition
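The input construction described above, a textual scene representation followed by retrieved knowledge and then the question, might look like the following. This is an illustrative sketch only: the function name, separator token, and example strings are ours, and the actual model feeds word embeddings of this combined sequence to a recurrent network rather than a joined string.

```python
def build_primed_input(scene_description, kb_facts, question):
    """Assemble the sequence used to prime the answering network:
    scene description first, then external knowledge, then the
    question. A separator token marks segment boundaries."""
    parts = [scene_description] + list(kb_facts) + [question]
    return " <sep> ".join(parts)

primed = build_primed_input(
    "a red double-decker bus on a city street",
    ["Double-decker buses are commonly associated with London."],
    "in which city might this photo have been taken?",
)
```

The point of the ordering is that by the time the network reads the question, it has already consumed both the image content and the relevant external knowledge.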
- …