
    EMPATH: A Neural Network that Categorizes Facial Expressions

    There are two competing theories of facial expression recognition. Some researchers have suggested that it is an example of "categorical perception." In this view, expression categories are considered to be discrete entities with sharp boundaries, and discrimination of nearby pairs of expressive faces is enhanced near those boundaries. Other researchers, however, suggest that facial expression perception is more graded and that facial expressions are best thought of as points in a continuous, low-dimensional space, where, for instance, "surprise" expressions lie between "happiness" and "fear" expressions due to their perceptual similarity. In this article, we show that a simple yet biologically plausible neural network model, trained to classify facial expressions into six basic emotions, predicts data used to support both of these theories. Without any parameter tuning, the model matches a variety of psychological data on categorization, similarity, reaction times, discrimination, and recognition difficulty, both qualitatively and quantitatively. We thus explain many of the seemingly complex psychological phenomena related to facial expression perception as natural consequences of the tasks' implementations in the brain.
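
    A minimal, hypothetical sketch of the kind of model the abstract describes: a small feedforward classifier trained on face feature vectors to output the six basic emotions, whose graded softmax responses can be read either categorically or as positions in a continuous space. This is not the authors' EMPATH implementation; the feature dimension, layer sizes, and training data below are placeholders (PyTorch assumed).

```python
# Minimal sketch (not the authors' EMPATH implementation): a small feedforward
# classifier mapping face feature vectors to the six basic emotions.
# Feature size, layer sizes, and training data are hypothetical placeholders.
import torch
import torch.nn as nn

EMOTIONS = ["happiness", "sadness", "fear", "anger", "surprise", "disgust"]

model = nn.Sequential(
    nn.Linear(1024, 64),   # 1024-d face features (assumed size)
    nn.ReLU(),
    nn.Linear(64, len(EMOTIONS)),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Dummy training batch standing in for labeled face images.
x = torch.randn(32, 1024)
y = torch.randint(0, len(EMOTIONS), (32,))

for _ in range(10):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

# The softmax outputs give graded category responses, which is what lets a
# categorical classifier also produce continuous, similarity-like structure.
probs = torch.softmax(model(x[:1]), dim=1)
print(dict(zip(EMOTIONS, probs[0].tolist())))
```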

    Model-Based Matching by Linear Combinations of Prototypes

    We describe a method for modeling object classes (such as faces) using 2D example images and an algorithm for matching a model to a novel image. The object class models are "learned" from example images that we call prototypes. In addition to the images, the pixelwise correspondences between a reference prototype and each of the other prototypes must also be provided. Thus a model consists of a linear combination of prototypical shapes and textures. A stochastic gradient descent algorithm is used to match a model to a novel image by minimizing the error between the model and the novel image. Example models are shown as well as example matches to novel images. The robustness of the matching algorithm is also evaluated. The technique can be used for a number of applications including the computation of correspondence between novel images of a certain known class, object recognition, image synthesis and image compression.
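
    The matching idea lends itself to a short worked example. The sketch below is a simplification under stated assumptions: it fits only the texture part (a linear combination of prototype images, ignoring the pixelwise shape correspondences the paper also models) by plain gradient descent on the reconstruction error, and all images are random placeholders.

```python
# Simplified sketch of matching by a linear combination of prototypes
# (texture only; the paper also models shape via pixelwise correspondences).
# Prototype images and the novel image here are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_prototypes, n_pixels = 5, 64 * 64
prototypes = rng.normal(size=(n_prototypes, n_pixels))   # example images
novel = rng.normal(size=n_pixels)                        # image to be matched

c = np.zeros(n_prototypes)        # linear-combination coefficients
lr = 1e-4                         # step size for gradient descent

for _ in range(500):
    model_image = c @ prototypes                 # current model estimate
    residual = model_image - novel               # error to be minimized
    grad = 2 * prototypes @ residual             # d/dc of the squared error
    c -= lr * grad

print("coefficients:", np.round(c, 3))
print("matching error:", np.sum((c @ prototypes - novel) ** 2))
```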

    Refined reverse correlation : a technique for investigating the power of faces

    People effortlessly and rapidly form a first impression of an individual's personality based on their facial appearance. Forming an impression based on facial cues can have real-world implications, for example, for the outcome of elections, courtroom decisions, or workplace interviews. Research using traditional methods has, however, failed to identify the facial features that are related to specific personality traits in a reliable and valid way. This challenge can be overcome using a reverse correlation method. Here I present a refinement of the traditional reverse correlation image classification technique. Over the course of four projects I highlight the different possibilities that the refined technique offers. In the first project I present how the technique was used to extract the facial prototype of someone who is likely to be ostracized. In the second project, I show how we extracted prototypes that evoke different emotions, applied them to real facial photographs, and set the different prototypes in relation to each other. The third project offers insights into how the technique was used to investigate self-perception without any external standard of comparison except the participants' own face. Finally, I present a fourth project in which the technique was used to investigate whether beliefs about how two personality traits co-occur on a conceptual level are reflected in the facial characteristics that are used to form an impression from faces. The refined technique presented here adds to the traditional reverse correlation technique in that internal representations can be visualized without visible artifacts, and the extracted prototypes can be applied to real photographs and set in relation to each other. The discussion focuses on the reliability and validity of the method and presents future research possibilities.
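
    For orientation, here is a minimal sketch of the standard two-image reverse correlation procedure that the thesis refines (the refined, artifact-free variant itself is not reproduced). The simulated observer, its hidden template, and the image sizes are illustrative assumptions.

```python
# Sketch of standard two-image reverse correlation: on each trial an observer
# chooses between a base face plus noise and the base face minus that noise;
# averaging the chosen noise recovers an approximation of the observer's
# internal template (classification image). The observer here is simulated.
import numpy as np

rng = np.random.default_rng(1)
size = 32 * 32
base_face = rng.normal(size=size)          # placeholder base face image
template = rng.normal(size=size)           # observer's hidden internal template

chosen_noise = []
for _ in range(2000):                      # trials
    noise = rng.normal(size=size)
    a, b = base_face + noise, base_face - noise
    # Observer picks the stimulus more similar to its internal template.
    pick_a = template @ a > template @ b
    chosen_noise.append(noise if pick_a else -noise)

# The classification image approximates the hidden template up to scale.
classification_image = np.mean(chosen_noise, axis=0)
corr = np.corrcoef(classification_image, template)[0, 1]
print(f"correlation with hidden template: {corr:.2f}")
```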

    Observations on Cortical Mechanisms for Object Recognition and Learning

    This paper sketches a hypothetical cortical architecture for visual 3D object recognition based on a recent computational model. The view-centered scheme relies on modules for learning from examples, such as HyperBF-like networks. Such models capture a class of explanations we call Memory-Based Models (MBM) that contains sparse population coding, memory-based recognition, and codebooks of prototypes. Unlike the sigmoidal units of some artificial neural networks, the units of MBMs are consistent with the description of cortical neurons. We describe how an example MBM may be realized in terms of cortical circuitry and biophysical mechanisms, consistent with psychophysical and physiological data.
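
    A compact sketch of a HyperBF-style, memory-based module of the kind the paper discusses: Gaussian units centred on stored prototype views are combined by learned weights. The prototypes, weights, tuning width, and probe below are illustrative placeholders, not values from the paper.

```python
# Minimal sketch of a memory-based, HyperBF-style module: the response is a
# weighted sum of Gaussian units centred on stored prototype views.
# Prototypes, weights, and the probe are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(2)
d, n_prototypes = 128, 10
prototypes = rng.normal(size=(n_prototypes, d))   # stored example views
weights = rng.normal(size=n_prototypes)           # learned combination weights
sigma = 4.0                                       # unit tuning width

def hyperbf_response(x):
    # Each unit fires according to the probe's distance to its prototype
    # (a sparse, population-coded representation); units are then combined.
    sq_dist = np.sum((prototypes - x) ** 2, axis=1)
    activations = np.exp(-sq_dist / (2 * sigma ** 2))
    return activations @ weights, activations

probe = prototypes[3] + 0.1 * rng.normal(size=d)  # a view near a stored example
response, acts = hyperbf_response(probe)
print("most active unit:", int(np.argmax(acts)), "response:", round(response, 3))
```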

    Uncertainty-guided Boundary Learning for Imbalanced Social Event Detection

    Real-world social events typically exhibit a severe class-imbalance distribution, which makes the trained detection model face a serious generalization challenge. Most studies address this problem from the frequency perspective and emphasize representation or classifier learning for tail classes. In our observation, however, the calibrated uncertainty estimated from well-trained evidential deep learning networks reflects model performance better than class rarity does. To this end, we propose a novel uncertainty-guided class imbalance learning framework, UCL_SED, and its variant, UCL-EC_SED, for imbalanced social event detection tasks. We aim to improve overall model performance by enhancing model generalization to uncertain classes. Considering that performance degradation usually comes from misclassifying samples as their confusing neighboring classes, we focus on boundary learning in latent space and classifier learning with high-quality uncertainty estimation. First, we design a novel uncertainty-guided contrastive learning loss, namely UCL, and its variant, UCL-EC, to manipulate distinguishable representation distributions for imbalanced data. During training, they force all classes, especially uncertain ones, to adaptively adjust clear, separable boundaries in the feature space. Second, to obtain more robust and accurate class uncertainty, we combine the results of multi-view evidential classifiers via Dempster-Shafer theory under the supervision of an additional calibration method. We conduct experiments on three severely imbalanced social event datasets: Events2012_100, Events2018_100, and CrisisLexT_7. Our model significantly improves social event representation and classification in almost all classes, especially the uncertain ones. Comment: Accepted by TKDE 202
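
    Two ingredients mentioned above, evidential (Dirichlet) uncertainty and Dempster-Shafer fusion of multi-view classifier outputs, can be illustrated in a few lines. The sketch below follows the common subjective-logic formulation and a reduced two-view Dempster combination; it is not the UCL_SED training code, and the evidence vectors are made-up numbers.

```python
# Hedged sketch of the evidential-uncertainty ingredients mentioned above:
# per-view Dirichlet evidence gives class beliefs plus an uncertainty mass,
# and two views are fused with Dempster's rule of combination.
# The evidence vectors below are made-up numbers, not model outputs.
import numpy as np

def beliefs_and_uncertainty(evidence):
    # Subjective-logic reading of Dirichlet evidence: b_k = e_k / S, u = K / S,
    # with S = sum_k (e_k + 1), so beliefs and uncertainty sum to one.
    K = len(evidence)
    S = np.sum(evidence) + K
    return evidence / S, K / S

def dempster_combine(b1, u1, b2, u2):
    # Reduced Dempster's rule for two mass assignments over singletons + u.
    conflict = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)
    scale = 1.0 - conflict
    b = (b1 * b2 + b1 * u2 + b2 * u1) / scale
    u = (u1 * u2) / scale
    return b, u

e_view1 = np.array([9.0, 1.0, 0.5])   # confident view (hypothetical)
e_view2 = np.array([4.0, 3.0, 0.5])   # more ambiguous view (hypothetical)

b1, u1 = beliefs_and_uncertainty(e_view1)
b2, u2 = beliefs_and_uncertainty(e_view2)
b, u = dempster_combine(b1, u1, b2, u2)
print("fused beliefs:", np.round(b, 3), "fused uncertainty:", round(u, 3))
```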

    Content-Based Image Retrieval under Various Deep Learning Training Environments

    Doctoral dissertation, Seoul National University, February 2022.

    Content-based image retrieval, which finds images relevant to a query in a huge database, is one of the fundamental tasks in the field of computer vision. For fast and accurate retrieval in particular, Approximate Nearest Neighbor (ANN) search approaches represented by Hashing and Product Quantization (PQ) have been proposed in the image retrieval community. Ever since neural-network-based deep learning showed excellent performance in many computer vision tasks, both Hashing- and product-quantization-based image retrieval systems have adopted deep learning for improvement. In this dissertation, image retrieval methods under various deep learning conditions are investigated in order to suggest appropriate retrieval systems. Specifically, considering the purpose of image retrieval, supervised learning methods are proposed to develop deep Hashing systems that retrieve semantically similar images, and semi-supervised and unsupervised learning methods are proposed to establish deep product quantization systems that retrieve images that are both semantically and visually similar. Moreover, considering the characteristics of image retrieval databases, face image sets with numerous class categories and general image sets with one or more labels per image are explored separately when building a retrieval system.

    First, supervised learning with the semantic labels given to images is introduced to build a Hashing-based retrieval system. To address the difficulties of distinguishing face images, such as inter-class similarities (similar appearance between different persons) and intra-class variations (the same person under different poses, facial expressions, and illuminations), the identity label of each image is employed to derive discriminative binary codes. To further improve face image retrieval quality, a Similarity Guided Hashing (SGH) scheme is proposed, in which self-similarity learning with multiple data augmentation results is employed during training. For Hashing-based general image retrieval, a Deep Hash Distillation (DHD) scheme is proposed, in which a trainable hash proxy representing each class is introduced to take advantage of supervised signals. Moreover, a self-distillation scheme adapted for Hashing is utilized to improve general image retrieval performance by appropriately exploiting the potential of augmented data.

    Second, semi-supervised learning that utilizes both labeled and unlabeled image data is investigated to build a PQ-based retrieval system. Even though supervised deep methods show excellent performance, they do not meet expectations unless expensive label information is sufficient, and a vast amount of unlabeled image data is excluded from training. To resolve this issue, a vector-quantization-based semi-supervised image retrieval scheme, the Generalized Product Quantization (GPQ) network, is proposed. A novel metric learning strategy that preserves semantic similarity between labeled data and an entropy regularization term that fully exploits the inherent potential of unlabeled data are employed to improve the retrieval system. This solution increases the generalization capacity of the quantization network and thereby overcomes the previous limitations.

    Lastly, to enable the network to perform visually similar image retrieval on its own without any human supervision, unsupervised learning algorithms are explored. Although deep supervised Hashing and PQ methods achieve outstanding retrieval performance compared to conventional methods by fully exploiting label annotations, it is painstaking to assign labels precisely for a vast amount of training data, and the annotation process is error-prone. To tackle these issues, a deep unsupervised image retrieval method dubbed the Self-supervised Product Quantization (SPQ) network, which is label-free and trained in a self-supervised manner, is proposed. A newly designed Cross Quantized Contrastive learning strategy is applied to jointly learn the PQ codewords and the deep visual representations by comparing individually transformed images (views). This allows the network to understand image content and extract descriptive features so that visually accurate retrieval can be performed.

    Extensive image retrieval experiments on benchmark datasets confirm that the proposed methods yield outstanding results under various evaluation protocols. For supervised face image retrieval, SGH achieves the best retrieval performance on both low- and high-resolution face images, and DHD demonstrates its efficiency in general image retrieval experiments with state-of-the-art retrieval accuracy. For semi-supervised general image retrieval, GPQ shows the best search results for protocols that use both labeled and unlabeled image data. Finally, for unsupervised general image retrieval, the best retrieval scores are achieved with SPQ even without supervised pre-training, and visually similar images are successfully retrieved as search results.
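
    As background for the PQ-based systems discussed above, here is a generic product quantization sketch: descriptors are split into sub-vectors, each quantized against its own small codebook, and search uses asymmetric distances. It is not the dissertation's deep GPQ/SPQ models; the database, the codebook construction, and all sizes are toy placeholders.

```python
# Generic product quantization sketch (not the dissertation's deep models):
# split each descriptor into sub-vectors, quantize each sub-vector against its
# own small codebook, and search with asymmetric distances.
import numpy as np

rng = np.random.default_rng(3)
d, M, K = 64, 8, 16             # dimension, sub-spaces, codewords per sub-space
ds = d // M                     # sub-vector dimension

database = rng.normal(size=(1000, d))
# Toy codebooks: sample codewords from the database itself (real systems
# would learn them with k-means or, as in the dissertation, end-to-end).
codebooks = database[rng.choice(len(database), K, replace=False)]
codebooks = codebooks.reshape(K, M, ds).transpose(1, 0, 2)   # (M, K, ds)

def encode(x):
    # For each sub-space, store the index of the nearest codeword.
    subs = x.reshape(M, ds)
    return np.array([np.argmin(np.sum((codebooks[m] - subs[m]) ** 2, axis=1))
                     for m in range(M)], dtype=np.uint8)

codes = np.stack([encode(v) for v in database])              # compressed DB

def search(query, topk=5):
    # Asymmetric distance: the query stays uncompressed, the database is coded.
    subs = query.reshape(M, ds)
    tables = np.stack([np.sum((codebooks[m] - subs[m]) ** 2, axis=1)
                       for m in range(M)])                   # (M, K) lookup
    dists = tables[np.arange(M), codes].sum(axis=1)          # per-item distance
    return np.argsort(dists)[:topk]

query = database[42] + 0.05 * rng.normal(size=d)
print("approximate neighbours:", search(query))
```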

    Empirical Lessons for Philosophical Theories of Mental Content

    This thesis concerns the content of mental representations. It draws lessons for philosophical theories of content from some empirical findings about brains and behaviour drawn from experimental psychology (cognitive, developmental, comparative), cognitive neuroscience and cognitive science (computational modelling). Chapter 1 motivates a naturalist and realist approach to mental representation. Chapter 2 sets out and defends a theory of content for static feedforward connectionist networks, and explains how the theory can be extended to other supervised networks. The theory takes forward Churchland's state space semantics by making a new and clearer proposal about the syntax of connectionist networks, one which nicely accounts for representational development. Chapter 3 argues that the same theoretical approach can be extended to unsupervised connectionist networks, and to some of the representational systems found in real brains. The approach can also show why connectionist systems sometimes show typicality effects, explaining them without relying upon prototype structure. That is discussed in chapter 4, which also argues that prototype structure, where it does exist, does not determine content. The thesis goes on to defend some unorthodox features of the foregoing theory: that a role is assigned to external samples in specifying syntax, that both inputs to and outputs from the system have a role in determining content, and that the content of a representation is partly determined by the circumstances in which it developed. Each, it is argued, may also be a fruitful way of thinking about mental content more generally. Reliance on developmental factors prompts a swampman-type objection. This is rebutted by reference to three possible reasons why content is attributed at all. Two of these motivations support the idea that content is partly determined by historical factors, and the third is consistent with it. The result: some empirical lessons for philosophical theories of mental content.

    Stability from variety: the prototype effect in face recognition

    The central goal of the current thesis was to increase our understanding of how representations of individual faces are built from instances that vary. The prototype effect was used as a tool to probe the nature of our internal face representations. In face recognition, the prototype effect refers to the tendency to recognize, or find familiar, the average image of a face after having studied a series of similar face images. The experiments presented in this thesis investigated the modulating role of different variables on the prototype effect in face recognition. In the study phase, two or more different exemplars based on the same identity were presented. In the test phase, one of the seen exemplars, the unseen prototype, and an unseen exemplar of each studied identity were presented one at a time, and participants were asked to make a recognition judgement about the prior occurrence of either the exact image or the person's face. Variants of each face identity were either unaltered images of real people's faces, or they were created artificially by manipulating images of faces using several different techniques. All experiments using artificial variants produced strong prototype effects. The unseen prototype image was recognized more confidently than the actually studied images. This was true even when the variants were so similar that they were barely perceptually discriminable. Importantly, even when participants were given additional exposure to the studied exemplars, no weakening of the prototype effect was observed. Surprisingly, in the experiments using natural images of real people's faces, no clear recognition advantage for the prototype image was observed. Results suggest that the prototype effect in face recognition might not be tapping an averaging mechanism that operates solely on variations within the same identity.
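
    The core manipulation, forming an unseen prototype as the central tendency of studied exemplars, can be illustrated with a toy computation. The sketch below uses random pixel arrays and simple averaging as stand-ins for the thesis's artificially varied face images; it only shows why the prototype sits closer to the underlying identity than any single exemplar does.

```python
# Simplified illustration of the study logic: artificial exemplars are small
# variations around one face, and the unseen "prototype" is their average.
# Random pixel arrays stand in for face images, and simple pixel averaging
# stands in for the thesis's image-manipulation techniques.
import numpy as np

rng = np.random.default_rng(4)
identity = rng.normal(size=(64, 64))            # the underlying face

# Studied exemplars: small perturbations of the same identity.
exemplars = [identity + 0.2 * rng.normal(size=(64, 64)) for _ in range(4)]

# The prototype shown only at test is the central tendency of the exemplars.
prototype = np.mean(exemplars, axis=0)

def dist(a, b):
    return float(np.linalg.norm(a - b))

# The prototype lies closer to the underlying identity than any studied
# exemplar, which is the property the recognition advantage relies on.
print("exemplar-to-identity distances:",
      [round(dist(e, identity), 1) for e in exemplars])
print("prototype-to-identity distance:", round(dist(prototype, identity), 1))
```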

    Emerging Linguistic Functions in Early Infancy

    This paper presents results from experimental studies on early language acquisition in infants and attempts to interpret the experimental results within the framework of the Ecological Theory of Language Acquisition (ETLA) recently proposed by Lacerda et al. (2004a). From this perspective, the infant's first steps in the acquisition of the ambient language are seen as a consequence of the infant's general capacity to represent sensory input and the infant's interaction with other actors in its immediate ecological environment. On the basis of available experimental evidence, it will be argued that ETLA offers a productive alternative to traditional descriptive views of the language acquisition process by presenting an operative model of how early linguistic function may emerge through interaction.