
    The Impact of Acoustic Imaging Geometry on the Fidelity of Seabed Bathymetric Models

    Attributes derived from digital bathymetric models (DBM) are a powerful means of analyzing seabed characteristics. Those models, however, are inherently constrained by the method of seabed sampling. Most bathymetric models are derived by collating a number of discrete corridors of multibeam sonar data. Within each corridor the data are collected over a wide range of distances, azimuths and elevation angles, so the quality varies significantly, and that variability becomes imprinted into the DBM. Subsequent users of the DBM who are unfamiliar with the original acquisition geometry may misinterpret such variability as attributes of the seabed. This paper examines how the imaging geometry affects the accuracy and resolution of the resulting model. The geometry can be broken down into range, angle, azimuth, density and overlap attributes. These attributes are in turn affected by the sonar configuration, including beam widths, beam spacing, bottom detection algorithms, stabilization strategies, and platform speed and stability. Superimposed on the imaging geometry are residual effects due to imperfect integration of ancillary sensors. Because the platform (normally a surface vessel) moves with characteristic motions driven by the ocean wave spectrum, periodic residuals can become imprinted in the modelled seafloor and may again be misinterpreted as geomorphological information.
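To make the dependence of resolution on range and angle concrete, here is a minimal geometric sketch. It is not the paper's method: it assumes a flat, level seabed, a single beamwidth for both the along-track and across-track directions, and a hypothetical helper name `beam_footprint`.

```python
import math

def beam_footprint(depth_m, angle_from_nadir_deg, beamwidth_deg):
    """Return (across_track_m, along_track_m) beam footprint on a flat, level seabed.

    Illustrative assumption: the same beamwidth applies in both directions.
    """
    theta = math.radians(angle_from_nadir_deg)
    half = math.radians(beamwidth_deg) / 2.0
    if theta + half >= math.pi / 2:
        raise ValueError("outer beam edge is at or beyond horizontal")
    # Across-track: horizontal distance between the two beam edges on the seabed.
    across = depth_m * (math.tan(theta + half) - math.tan(theta - half))
    # Along-track: the beam spreads over the slant range to the seabed.
    slant_range = depth_m / math.cos(theta)
    along = 2.0 * slant_range * math.tan(half)
    return across, along

# Same sonar, same 100 m depth: a nadir beam versus a beam steered 60 degrees out.
nadir = beam_footprint(100.0, 0.0, 1.0)
outer = beam_footprint(100.0, 60.0, 1.0)
print("nadir footprint (across, along) in m:", nadir)
print("outer footprint (across, along) in m:", outer)
```

Under these assumptions the across-track footprint grows roughly as 1/cos² of the angle from nadir and the along-track footprint as 1/cos, so soundings from the outer part of a corridor average over a much larger patch of seabed than those near nadir.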

    Improving the Performance of Autoencoder-Based Computer Vision Models Using Body Embeddings

    Doctoral thesis (Ph.D.) -- Seoul National University Graduate School: Department of Industrial Engineering, College of Engineering, August 2021. Park Jong-heon (박종헌).
    Deep learning models have dominated the field of computer vision, achieving state-of-the-art performance in various tasks. In particular, as more images and videos of people are posted on social media, computer vision methods for analyzing visual information about people are finding a growing range of uses. This thesis addresses two human-related computer vision tasks: classifying fashion styles and measuring motion similarity. In real-world fashion style classification problems, the number of samples collected for each style class varies with the fashion trend at the time of data collection, resulting in class imbalance. To cope with this class imbalance, the thesis employs generalized few-shot learning, in which both minority classes and majority classes are used for learning and evaluation. Additionally, two modalities are reflected in the fashion image embedding through a variational autoencoder: foreground images, cropped to show only the body and fashion items, and fashion attribute information. The K-fashion dataset, collected from a Korean fashion shopping mall, is used for model training and evaluation. Motion similarity measurement is used as a sub-module in various tasks such as action recognition, anomaly detection, and person re-identification; however, it has attracted less attention than those tasks because the same motion can be represented differently depending on the performer's body structure and the camera angle. The lack of public datasets for model training and evaluation also makes research challenging. Therefore, we propose an artificial dataset for model training and use an autoencoder architecture to learn motion embeddings that are separated from the body structure and camera angle attributes.
    The autoencoder is designed to generate motion embeddings for each body part so that motion similarity can be measured by body part. Furthermore, differences in motion speed are compensated for by using dynamic time warping to align segments that perform similar motions. The similarity score dataset for evaluation was collected through a crowdsourcing platform using videos from NTU RGB+D 120, a dataset for action recognition. When the proposed models were verified with their respective evaluation datasets, both outperformed the baselines. In the fashion style classification problem, the proposed model showed the most balanced performance among all the models, without bias toward either the minority classes or the majority classes. In the motion similarity measurement experiments, the correlation coefficient between the proposed model's scores and the human-measured similarity scores was higher than that of the baselines.

    Table of contents
    Chapter 1 Introduction
        1.1 Background and motivation
        1.2 Research contribution
            1.2.1 Fashion style classification
            1.2.2 Human motion similarity
            1.2.3 Summary of the contributions
        1.3 Thesis outline
    Chapter 2 Literature Review
        2.1 Fashion style classification
            2.1.1 Machine learning and deep learning-based approaches
            2.1.2 Class imbalance
            2.1.3 Variational autoencoder
        2.2 Human motion similarity
            2.2.1 Measuring the similarity between two people
            2.2.2 Human body embedding
            2.2.3 Datasets for measuring the similarity
            2.2.4 Triplet and quadruplet losses
            2.2.5 Dynamic time warping
    Chapter 3 Fashion Style Classification
        3.1 Dataset for fashion style classification: K-fashion
        3.2 Multimodal variational inference for fashion style classification
            3.2.1 CADA-VAE
            3.2.2 Generating multimodal features
            3.2.3 Classifier training with cyclic oversampling
        3.3 Experimental results for fashion style classification
            3.3.1 Implementation details
            3.3.2 Settings for experiments
            3.3.3 Experimental results on K-fashion
            3.3.4 Qualitative analysis
            3.3.5 Effectiveness of the cyclic oversampling
    Chapter 4 Motion Similarity Measurement
        4.1 Datasets for motion similarity
            4.1.1 Synthetic motion dataset: SARA dataset
            4.1.2 NTU RGB+D 120 similarity annotations
        4.2 Framework for measuring motion similarity
            4.2.1 Body part embedding model
            4.2.2 Measuring motion similarity
        4.3 Experimental results for measuring motion similarity
            4.3.1 Implementation details
            4.3.2 Experimental results on NTU RGB+D 120 similarity annotations
            4.3.3 Visualization of motion latent clusters
        4.4 Application
            4.4.1 Real-world application with dancing videos
            4.4.2 Tuning similarity scores to match human perception
    Chapter 5 Conclusions
        5.1 Summary and contributions
        5.2 Limitations and future research
    Appendices
        Chapter A NTU RGB+D 120 Similarity Annotations
            A.1 Data collection
            A.2 AMT score analysis
        Chapter B Data Cleansing of NTU RGB+D 120 Skeletal Data
        Chapter C Motion Sequence Generation Using Mixamo
    Bibliography
    Abstract in Korean
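The thesis abstract above uses dynamic time warping to compensate for differences in motion speed. As a rough illustration of that general technique, the sketch below implements the textbook DTW recurrence on toy one-dimensional "embeddings". It is not the thesis's implementation: the function names, the Euclidean frame distance, and the toy sequences are all assumptions made for this example.

```python
import math

def dtw_distance(a, b, dist):
    """Textbook dynamic time warping cost between sequences a and b."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: advance in a only, in b only, or in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed frame distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Toy example: the same motion performed at half speed (each frame repeated).
fast = [(0.0,), (1.0,), (2.0,)]
slow = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,), (2.0,)]
print(dtw_distance(slow, fast, euclidean))  # 0.0: warping absorbs the speed difference
```

A frame-by-frame comparison would penalize the slower performance, whereas the warped alignment treats the two sequences as the same motion.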

    Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources

    We propose a method for visual question answering which combines an internal representation of the content of an image with information extracted from a general knowledge base to answer a broad range of image-based questions. This allows more complex questions to be answered using the predominant neural network-based approach than has previously been possible. In particular, it allows questions to be asked about the contents of an image even when the image itself does not contain the whole answer. The method constructs a textual representation of the semantic content of an image and merges it with textual information sourced from a knowledge base to develop a deeper understanding of the scene viewed. Priming a recurrent neural network with this combined information and the submitted question leads to a very flexible visual question answering approach. We are specifically able to answer questions posed in natural language that refer to information not contained in the image. We demonstrate the effectiveness of our model on two publicly available datasets, Toronto COCO-QA and MS COCO-VQA, and show that it produces the best reported results in both cases.
    Comment: Accepted to IEEE Conf. Computer Vision and Pattern Recognition