
    Fine-Grained Image Classification via Combining Vision and Language

    Fine-grained image classification aims to recognize hundreds of sub-categories belonging to the same basic-level category, and is challenging due to large intra-class variance and small inter-class variance. Most existing fine-grained image classification methods learn part detection models to obtain semantic parts for better classification accuracy. Despite achieving promising results, these methods have two main limitations: (1) not all parts obtained through the part detection models are beneficial and indispensable for classification, and (2) fine-grained image classification requires more detailed visual descriptions than part locations or attribute annotations can provide. To address these two limitations, this paper proposes a two-stream model combining vision and language (CVL) for learning latent semantic representations. The vision stream learns deep representations from the original visual information via a deep convolutional neural network. The language stream utilizes natural language descriptions, which can point out the discriminative parts or characteristics of each image, and provides a flexible and compact way of encoding the salient visual aspects that distinguish sub-categories. Since the two streams are complementary, combining them further improves classification accuracy. Compared with 12 state-of-the-art methods on the widely used CUB-200-2011 dataset for fine-grained image classification, the experimental results demonstrate that our CVL approach achieves the best performance.
    Comment: 9 pages, to appear in CVPR 201
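The two-stream idea described in the abstract can be sketched in a few lines. The feature extractors below are hypothetical stand-ins: in the paper the vision stream is a deep CNN and the language stream encodes natural-language descriptions, while here both are represented by toy feature vectors, with simple concatenation as one possible fusion.

```python
# Minimal sketch of the two-stream (CVL-style) idea: fuse a visual
# feature vector with a language feature vector, then score the
# sub-categories with a linear classifier. The features and weights
# below are toy placeholders, not the paper's actual model.

def fuse_streams(vision_feat, language_feat):
    """Late fusion by concatenation: one simple way to combine the
    two complementary streams into a single representation."""
    return vision_feat + language_feat  # list concatenation

def score_subcategories(fused, weights):
    """Linear scores over sub-categories: one weight vector per class."""
    return [sum(w * f for w, f in zip(wvec, fused)) for wvec in weights]

# Toy example: 2-dim features per stream, 3 sub-categories.
vision_feat = [0.8, 0.1]    # e.g. pooled CNN activations (hypothetical)
language_feat = [0.3, 0.9]  # e.g. encoded text description (hypothetical)
fused = fuse_streams(vision_feat, language_feat)  # 4-dim joint vector
weights = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
scores = score_subcategories(fused, weights)
predicted = max(range(len(scores)), key=scores.__getitem__)
```

The third class wins here because its weight vector attends to the language dimensions, illustrating how the text stream can tip a decision the visual evidence alone would not settle.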

    A psychology literature study on modality related issues for multimodal presentation in crisis management

    The motivation of this psychology literature study is to obtain modality-related guidelines for real-time information presentation in a crisis management environment. The crisis management task is usually accompanied by time urgency, risk, uncertainty, and high information density. Decision makers (crisis managers) may suffer cognitive overload and tend to show biases in their performance. Therefore, the ongoing crisis event needs to be presented in a manner that enhances perception, assists diagnosis, and prevents cognitive overload. To this end, this study examined modality effects on perception, cognitive load, working memory, learning, and attention. Selected topics include working memory, dual-coding theory, cognitive load theory, multimedia learning, and attention. The findings are several modality usage guidelines that may lead to more efficient use of the user's cognitive capacity and enhance information perception.

    The influence of external and internal motor processes on human auditory rhythm perception

    Musical rhythm is composed of organized temporal patterns, and the processes underlying rhythm perception have been found to engage both auditory and motor systems. Despite behavioral and neuroscience evidence converging on this audio-motor interaction, relatively little is known about the effect of specific motor processes on auditory rhythm perception. This doctoral thesis was devoted to investigating the influence of both external and internal motor processes on the way we perceive an auditory rhythm. The first half of the thesis sought to establish whether overt body movement had a facilitatory effect on our ability to perceive auditory rhythmic structure, and whether this effect was modulated by musical training. To this end, musicians and non-musicians performed a pulse-finding task either using natural body movement or through listening only, and produced their identified pulse by finger tapping. The results showed that overt movement benefited rhythm (pulse) perception, especially for non-musicians, confirming the facilitatory role of external motor activities in hearing the rhythm, as well as its interaction with musical training. The second half of the thesis tested the idea that indirect, covert motor input, such as that transformed from visual stimuli, could influence the perceived structure of an auditory rhythm. Three experiments examined the subjectively perceived tempo of an auditory sequence under different visual motion stimulations, while the auditory and visual streams were presented independently of each other. The results revealed that the perceived auditory tempo was influenced by the concurrent visual motion conditions, and the effect was related to the increment or decrement of visual motion speed. This supported the hypothesis that internal motor information extracted from visuomotor stimulation can be incorporated into the percept of an auditory rhythm.
Taken together, the present thesis concludes that, rather than merely reacting to the given auditory input, our motor system plays an important role in the perceptual processing of auditory rhythm. This can occur via both external and internal motor activities, and may not only influence how we hear a rhythm but also, under some circumstances, improve our ability to hear the rhythm.

    Processing resources and interplay among sensory modalities: an EEG investigation

    The primary aim of the present thesis was to investigate how the human brain handles and distributes limited processing resources among different sensory modalities. Two main hypotheses have conventionally been proposed: (1) common processing resources shared among sensory modalities (a supra-modal attentional system), or (2) independent processing resources for each sensory modality. By means of four EEG experiments, we tested whether putative competitive interactions between sensory modalities (regardless of attentional influences) are present in early sensory areas. We observed no competitive interactions between sensory modalities, supporting independent processing resources in early sensory areas. Consequently, we tested the influence of top-down attention on a cross-modal dual task. We found evidence for shared attentional resources between the visual and tactile modalities. Taken together, our results point toward a hybrid model of inter-modal attention: attentional processing resources seem to be controlled by a supra-modal attentional system; however, in early sensory areas, the absence of competitive interactions strongly reduces interference between sensory modalities, thus providing strong processing-resource independence.

    PersonRank: Detecting Important People in Images

    In events such as a presentation, a basketball game, or a speech, some individuals in images are more important or attractive than others. However, it is challenging to find important people among all individuals in an image directly from their spatial or appearance information, due to the diverse variations in pose, action, and appearance of persons and the various changes of occasion. We overcome this difficulty by constructing a Hybrid-Interaction Graph that treats each individual in an image as a node, and by inferring the most active node from interactions estimated by various types of cues. We model pairwise interactions between persons as edge messages communicated between nodes, resulting in a bidirectional pairwise-interaction graph. To enrich the person-person interaction estimation, we further introduce a unidirectional hyper-interaction graph that models the consensus of interaction between a focal person and any person in a local region around them. Finally, we modify the PageRank algorithm to infer the activeness of persons on the Hybrid-Interaction Graph (HIG), the union of the pairwise-interaction and hyper-interaction graphs, and we call our algorithm PersonRank. To provide publicly available datasets for evaluation, we have contributed a new dataset called the Multi-scene Important People Image Dataset and gathered an NCAA Basketball Image Dataset from sports game sequences. We have demonstrated that the proposed PersonRank clearly and substantially outperforms related methods.
    Comment: 8 pages, conference
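The PageRank-style inference the abstract describes can be sketched as follows. The interaction matrix here is a hypothetical stand-in for the pairwise interactions PersonRank estimates from visual cues, and the hyper-interaction term is omitted for brevity; this is a minimal sketch of the ranking step only, not the paper's full method.

```python
# PageRank-style sketch: infer per-person "activeness" from a matrix of
# directed pairwise interaction strengths. interactions[i][j] is the
# (hypothetical) strength of the interaction from person i to person j.

def person_rank(interactions, damping=0.85, iters=100):
    """Iteratively propagate rank along interaction edges; returns one
    activeness score per person."""
    n = len(interactions)
    # Row sums, so each person distributes a unit of "attention" over
    # their outgoing interactions (guard against all-zero rows).
    out = [sum(row) or 1.0 for row in interactions]
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for j in range(n):
            inflow = sum(rank[i] * interactions[i][j] / out[i]
                         for i in range(n))
            new.append((1 - damping) / n + damping * inflow)
        rank = new
    return rank

# Toy scene: persons 0 and 1 both direct strong interactions at person 2,
# so person 2 should come out as the most important.
interactions = [
    [0.0, 0.2, 0.8],
    [0.1, 0.0, 0.9],
    [0.1, 0.1, 0.0],
]
ranks = person_rank(interactions)
most_important = max(range(len(ranks)), key=ranks.__getitem__)
```

The damping term plays the same role as in classic PageRank: it mixes the propagated interaction evidence with a uniform prior so every person retains a baseline score.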
