
    Fine-Grained Image Classification via Combining Vision and Language

    Fine-grained image classification aims to recognize hundreds of sub-categories belonging to the same basic-level category, and is challenging due to large intra-class variance and small inter-class variance. Most existing fine-grained image classification methods learn part detection models to obtain semantic parts for better classification accuracy. Despite achieving promising results, these methods have two main limitations: (1) not all parts obtained through the part detection models are beneficial and indispensable for classification, and (2) fine-grained image classification requires more detailed visual descriptions than part locations or attribute annotations can provide. To address these two limitations, this paper proposes a two-stream model combining vision and language (CVL) for learning latent semantic representations. The vision stream learns deep representations from the original visual information via a deep convolutional neural network. The language stream utilizes natural language descriptions, which can point out the discriminative parts or characteristics of each image, and provides a flexible and compact way of encoding the salient visual aspects that distinguish sub-categories. Since the two streams are complementary, combining them further improves classification accuracy. Compared with 12 state-of-the-art methods on the widely used CUB-200-2011 dataset for fine-grained image classification, the experimental results demonstrate that our CVL approach achieves the best performance.
    Comment: 9 pages, to appear in CVPR 201
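The two-stream idea described in the abstract can be sketched in a few lines. The feature extractors below are hypothetical stand-ins: in the paper the vision stream is a deep CNN and the language stream encodes natural-language descriptions, while here both are represented by toy feature vectors, with simple concatenation as one possible fusion.

```python
# Minimal sketch of the two-stream (CVL-style) idea: fuse a visual
# feature vector with a language feature vector, then score the
# sub-categories with a linear classifier. The features and weights
# below are toy placeholders, not the paper's actual model.

def fuse_streams(vision_feat, language_feat):
    """Late fusion by concatenation: one simple way to combine the
    two complementary streams into a single representation."""
    return vision_feat + language_feat  # list concatenation

def score_subcategories(fused, weights):
    """Linear scores over sub-categories: one weight vector per class."""
    return [sum(w * f for w, f in zip(wvec, fused)) for wvec in weights]

# Toy example: 2-dim features per stream, 3 sub-categories.
vision_feat = [0.8, 0.1]    # e.g. pooled CNN activations (hypothetical)
language_feat = [0.3, 0.9]  # e.g. encoded text description (hypothetical)
fused = fuse_streams(vision_feat, language_feat)  # 4-dim joint vector
weights = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
scores = score_subcategories(fused, weights)
predicted = max(range(len(scores)), key=scores.__getitem__)
```

The third class wins here because its weight vector attends to the language dimensions, illustrating how the text stream can tip a decision the visual evidence alone would not settle.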

    A psychology literature study on modality related issues for multimodal presentation in crisis management

    The motivation of this psychology literature study is to obtain modality-related guidelines for real-time information presentation in a crisis management environment. The crisis management task is usually accompanied by time urgency, risk, uncertainty, and high information density. Decision makers (crisis managers) may suffer cognitive overload and tend to show biases in their performance. Therefore, the ongoing crisis event needs to be presented in a manner that enhances perception, assists diagnosis, and prevents cognitive overload. To this end, this study examined modality effects on perception, cognitive load, working memory, learning, and attention. Selected topics include working memory, dual-coding theory, cognitive load theory, multimedia learning, and attention. The findings are several modality usage guidelines that may lead to more efficient use of the user's cognitive capacity and enhance information perception.

    The influence of external and internal motor processes on human auditory rhythm perception

    Musical rhythm is composed of organized temporal patterns, and the processes underlying rhythm perception have been found to engage both auditory and motor systems. Despite behavioral and neuroscience evidence converging on this audio-motor interaction, relatively little is known about the effect of specific motor processes on auditory rhythm perception. This doctoral thesis was devoted to investigating the influence of both external and internal motor processes on the way we perceive an auditory rhythm. The first half of the thesis sought to establish whether overt body movement had a facilitatory effect on our ability to perceive auditory rhythmic structure, and whether this effect was modulated by musical training. To this end, musicians and non-musicians performed a pulse-finding task either using natural body movement or through listening only, and produced their identified pulse by finger tapping. The results showed that overt movement benefited rhythm (pulse) perception, especially for non-musicians, confirming the facilitatory role of external motor activities in hearing the rhythm, as well as its interaction with musical training. The second half of the thesis tested the idea that indirect, covert motor input, such as that transformed from visual stimuli, could influence the perceived structure of an auditory rhythm. Three experiments examined the subjectively perceived tempo of an auditory sequence under different visual motion stimulations, while the auditory and visual streams were presented independently of each other. The results revealed that the perceived auditory tempo was influenced by the concurrent visual motion conditions, and the effect was related to the increment or decrement of visual motion speed. This supported the hypothesis that internal motor information extracted from visuomotor stimulation can be incorporated into the percept of an auditory rhythm.
Taken together, the present thesis concludes that, rather than merely reacting to the given auditory input, our motor system plays an important role in the perceptual processing of auditory rhythm. This can occur via both external and internal motor activities, and may not only influence how we hear a rhythm but also, under some circumstances, improve our ability to hear the rhythm.

    Processing resources and interplay among sensory modalities: an EEG investigation

    The primary aim of the present thesis was to investigate how the human brain handles and distributes limited processing resources among different sensory modalities. Two main hypotheses have conventionally been proposed: (1) common processing resources shared among sensory modalities (a supra-modal attentional system), or (2) independent processing resources for each sensory modality. By means of four EEG experiments, we tested whether putative competitive interactions between sensory modalities (regardless of attentional influences) are present in early sensory areas. We observed no competitive interactions between sensory modalities, supporting independent processing resources in early sensory areas. Consequently, we tested the influence of top-down attention on a cross-modal dual task. We found evidence for shared attentional resources between the visual and tactile modalities. Taken together, our results point toward a hybrid model of inter-modal attention: attentional processing resources seem to be controlled by a supra-modal attentional system; however, in early sensory areas, the absence of competitive interactions strongly reduces interference between sensory modalities, thus providing strong processing-resource independence.

    PersonRank: Detecting Important People in Images

    In events such as a presentation, a basketball game, or a speech, some individuals in images are more important or attractive than others. However, it is challenging to find important people among all individuals in an image directly from their spatial or appearance information, due to the diverse variations in pose, action, and appearance of persons and the various changes of occasion. We overcome this difficulty by constructing a Hybrid-Interaction Graph that treats each individual in an image as a node, and by inferring the most active node from interactions estimated by various types of cues. We model pairwise interactions between persons as edge messages communicated between nodes, resulting in a bidirectional pairwise-interaction graph. To enrich the person-person interaction estimation, we further introduce a unidirectional hyper-interaction graph that models the consensus of interaction between a focal person and any person in a local region around them. Finally, we modify the PageRank algorithm to infer the activeness of persons on the Hybrid-Interaction Graph (HIG), the union of the pairwise-interaction and hyper-interaction graphs, and we call our algorithm PersonRank. To provide publicly available datasets for evaluation, we have contributed a new dataset called the Multi-scene Important People Image Dataset and gathered an NCAA Basketball Image Dataset from sports game sequences. We have demonstrated that the proposed PersonRank clearly and substantially outperforms related methods.
    Comment: 8 pages, conference
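The PageRank-style inference the abstract describes can be sketched as follows. The interaction matrix here is a hypothetical stand-in for the pairwise interactions PersonRank estimates from visual cues, and the hyper-interaction term is omitted for brevity; this is a minimal sketch of the ranking step only, not the paper's full method.

```python
# PageRank-style sketch: infer per-person "activeness" from a matrix of
# directed pairwise interaction strengths. interactions[i][j] is the
# (hypothetical) strength of the interaction from person i to person j.

def person_rank(interactions, damping=0.85, iters=100):
    """Iteratively propagate rank along interaction edges; returns one
    activeness score per person."""
    n = len(interactions)
    # Row sums, so each person distributes a unit of "attention" over
    # their outgoing interactions (guard against all-zero rows).
    out = [sum(row) or 1.0 for row in interactions]
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for j in range(n):
            inflow = sum(rank[i] * interactions[i][j] / out[i]
                         for i in range(n))
            new.append((1 - damping) / n + damping * inflow)
        rank = new
    return rank

# Toy scene: persons 0 and 1 both direct strong interactions at person 2,
# so person 2 should come out as the most important.
interactions = [
    [0.0, 0.2, 0.8],
    [0.1, 0.0, 0.9],
    [0.1, 0.1, 0.0],
]
ranks = person_rank(interactions)
most_important = max(range(len(ranks)), key=ranks.__getitem__)
```

The damping term plays the same role as in classic PageRank: it mixes the propagated interaction evidence with a uniform prior so every person retains a baseline score.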
