Video annotation for studying the brain in naturalistic settings
Studying the brain in naturalistic settings is a recent trend in neuroscience. Traditional brain imaging experiments have relied on highly simplified and artificial stimuli, but recently efforts have been put into studying the human brain in conditions closer to real life. The methodology used in these studies involves imitating naturalistic stimuli with a movie.
Because of the complexity of the naturalistic stimulus, a simplified model of it is needed to handle it computationally. This model is obtained by annotation: collecting information about the salient features of the movie to form a data structure. This data is compared with the brain activity evolving in time to search for possible correlations. Not all features of a movie can be reliably annotated automatically: semantic features require manual annotation, which is in some cases problematic because of the various cinematic techniques a movie uses. Understanding these techniques helps in analyzing and annotating movies.
The movie Match Factory Girl (Aki Kaurismäki, 1990) was used as a stimulus in studying the brain in naturalistic settings. To help the analysis of the acquired data, the salient visual features of the movie were annotated. In this work, existing annotation approaches and available technologies for annotation were reviewed.
Annotations help organize information, which is why they are now found everywhere. Different tools and technologies are being developed constantly. Furthermore, the development of automatic video analysis methods is going to provide more meaningful annotations in the future.
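The comparison between an annotation and time-evolving brain activity can be illustrated with a plain Pearson correlation. This is only a minimal sketch: the abstract does not say which statistic the study used, the annotation (`face_on_screen`) and the signal (`region_signal`) are made-up values, and the delay of the haemodynamic response is ignored.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / sqrt(vx * vy)


# Hypothetical annotation: 1 if a face is on screen during a time bin, else 0.
face_on_screen = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
# Hypothetical brain signal sampled at the same time bins (arbitrary units).
region_signal = [0.1, 0.2, 0.9, 1.1, 1.0, 0.3, 0.2, 0.8, 1.2, 0.1]

r = pearson(face_on_screen, region_signal)
print(f"correlation between annotation and signal: {r:.2f}")
```

In practice the annotation would usually be shifted or convolved with a response model before the comparison; that step is left out here to keep the sketch short.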
Guest Editorial : Special issue on advanced computing for image-guided intervention
In the past years, we have witnessed a growing number of applications of minimally invasive or non-invasive interventions in clinical practice, where imaging is playing an essential role for the success of both diagnosis and therapy. In particular, advanced signal and image processing algorithms, which aim to provide accurate and reliable information directly to physicians, are receiving increasing attention. We have seen the applications of these technologies during all stages of an intervention, including preoperational planning, intra-operational guidance and post-operational verification. For this special issue, we have received a significant number of submissions from both academia and industry, out of which we have carefully selected eleven articles of outstanding quality. These articles cover the topics of anatomic structure identification and tracking, image registration, data visualization and newly emerging applications. In [1] have addressed the image registration problem between pre- and post-radiated MRI to facilitate the evaluation of the therapeutic response after External Beam Radiation Treatment (EBRT) for prostate cancer. A different approach has been employed by We have also included three papers on ultrasound-guided image interventions. In We have included in this special issue two papers on tissue characterization from endoscopic images. Nawarathna et al. have proposed in With the increasing use of various imaging modalities in image-guided intervention and therapy, how to optimally present and visualize the data also becomes an important issue. In [10], the authors have addressed the use of autostereoscopic volumetric visualization of the patient's anatomy, which has the potential to be combined with augmented reality. The paper especially addresses the latency problem in the visualization chain, and a few improvements have been proposed.
A new adjacent application has been presented in In summary, we have seen from submissions to this special issue a growing interest in applying advanced signal and image processing technologies to image-guided interventions. The submissions have covered a wide range of clinical applications using various imaging modalities. Image feature extraction remains an important subject, and it has to be specifically designed to suit the needs of specific applications. Learning-based approaches have also attracted a lot of attention, especially in applications requiring automatic tissue characterization and classification. We are also very happy to have received newly emerging applications that extend traditional interventional imaging into wider application areas. Acknowledgments: We would like to thank all the reviewers who helped to peer-review the submitted papers; their constructive comments are much appreciated.
User-centred video abstraction
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. The rapid growth of digital video content in recent years has imposed the need for technologies capable of producing condensed but semantically rich versions of the input video stream in an effective manner. Consequently, the topic of Video Summarisation is becoming increasingly popular in the multimedia community, and numerous video abstraction approaches have been proposed. These techniques can be divided into two major categories, automatic and semi-automatic, according to the level of human intervention required in the summarisation process. The fully automated methods mainly adopt low-level visual, aural and textual features alongside mathematical and statistical algorithms in order to extract the most significant segments of the original video. However, the effectiveness of this type of technique is restricted by a number of factors such as domain dependency, computational expense and the inability to understand the semantics of videos from low-level features. The second category of techniques, however, attempts to improve the quality of summaries by involving humans in the abstraction process to bridge the semantic gap. Nonetheless, a single user’s subjectivity and other external contributing factors such as distraction can deteriorate the performance of this group of approaches. Accordingly, in this thesis we have focused on the development of three effective user-centred video summarisation techniques that can be applied to different video categories and generate satisfactory results. In our first proposed approach, a novel mechanism for user-centred video summarisation is presented for scenarios in which multiple actors are employed in the video summarisation process in order to minimise the negative effects of relying on a single user.
In our proposed algorithm, the video frames were initially scored by a group of video annotators ‘on the fly’. These assigned scores were then averaged to generate a single saliency score for each video frame and, finally, the highest-scoring video frames, alongside the corresponding audio and textual content, were extracted for inclusion in the final summary. The effectiveness of our approach was assessed by comparing the video summaries it generated against the results obtained from three existing automatic summarisation tools that adopt different modalities for abstraction purposes. The experimental results indicated that our proposed method is capable of delivering remarkable outcomes in terms of Overall Satisfaction and Precision with an acceptable Recall rate, indicating the usefulness of involving user input in the video summarisation process. In an attempt to provide a better user experience, we have proposed a personalised video summarisation method able to customise the generated summaries in accordance with the viewers’ preferences. Accordingly, the end-user’s priority levels towards different video scenes were captured and used to update the average scores previously assigned by the video annotators. Finally, our earlier proposed summarisation method was adopted to extract the most significant audio-visual content of the video. Experimental results indicated that this approach delivers superior outcomes compared with our previously proposed method and the three other automatic summarisation tools. Finally, we have attempted to reduce the level of audience involvement required for personalisation by proposing a new method for producing personalised video summaries. Accordingly, SIFT visual features were adopted to identify the video scenes’ semantic categories.
By fusing this retrieved data with pre-built user profiles, personalised video abstracts can be created. Experimental results showed the effectiveness of this method in delivering superior outcomes compared to our previously proposed algorithm and the three other automatic summarisation techniques.
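The first approach described in this abstract (several annotators score each frame, the scores are averaged into one saliency score, and the highest-scoring frames are kept) can be sketched as follows. This is an illustrative sketch, not the thesis's implementation: the function name `summarise`, the score scale and the example values are assumptions, and the extraction of the matching audio and text is omitted.

```python
def summarise(scores_by_annotator, keep):
    """Return the indices of the `keep` highest-scoring frames, in playback order.

    scores_by_annotator: one list of per-frame scores for each annotator.
    """
    n_frames = len(scores_by_annotator[0])
    if any(len(s) != n_frames for s in scores_by_annotator):
        raise ValueError("every annotator must score every frame")
    n_annotators = len(scores_by_annotator)
    # Average the annotators' scores into one saliency score per frame.
    mean_scores = [
        sum(s[i] for s in scores_by_annotator) / n_annotators
        for i in range(n_frames)
    ]
    # Rank frames by saliency, keep the best ones, then restore playback order.
    ranked = sorted(range(n_frames), key=lambda i: mean_scores[i], reverse=True)
    return sorted(ranked[:keep])


# Hypothetical scores from three annotators for a five-frame clip.
scores = [
    [1, 5, 2, 4, 1],
    [2, 4, 1, 5, 1],
    [1, 5, 3, 3, 2],
]
selected = summarise(scores, keep=2)
print(selected)
```

Averaging is what dampens a single annotator's subjectivity here: a frame only ranks highly if most annotators scored it highly.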
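Brain responses to event segmentation in the hippocampus and cerebral cortex while listening to an audio narrative (Tapahtumasegmentaation aivovasteet hippokampuksessa ja aivokuorella äänitarinan kuuntelun aikana)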
Event segmentation structures our experience as well as our memories. The representation of the currently ongoing event is likely dependent on a network of cortical areas, but the ability to retain a memory of the event requires an intact hippocampus. It is thus a relevant question how and when this hippocampal episodic encoding happens. It has previously been shown that the hippocampus is sensitive to event boundaries and responds to them with transient fMRI activation peaks. It has been proposed that these hippocampal end-of-event activations represent a high-level, modality-independent process of sharpening or “printing out” of the memory trace of the situation. However, the studies reporting hippocampal peaks have been conducted on audio-visual stimuli, so it is unclear whether these results generalise to narratives without a visual component, as the hippocampus is known to support visual processing as well as episodic encoding.
In this study I aim to answer this question by analysing fMRI data from participants experiencing a purely auditory narrative. The stimulus was a 71-minute-long audio book, and it was segmented behaviourally by a separate group of participants with a naïve intuitive segmentation paradigm. The data was analysed with a region of interest (ROI) analysis in the hippocampus, as well as in an exploratory manner on all areas from a cortical atlas.
The hippocampus was found to respond significantly to event boundaries in the story. Strong responses were also found in areas of the posterior medial cortex (PMC), as well as in ventromedial prefrontal cortex (vmPFC), parahippocampal gyrus, anterior cingulate (ACC) and the insula. Many of these are known to be involved in representing the event model, and some with switching between internal and external processing modes. ACC in particular is known to be involved in conflict monitoring – this might link with the proposal that segmentation in general is driven by prediction error and would merit further study.
I conclude that the hippocampus does detect and respond to event boundaries in a naturalistic auditory narrative, which is in line with the “print out” hypothesis and implies that these activations are related to domain-general episodic encoding. The increased hippocampal processing is likely to happen in collaboration with cortical areas involved in signalling change and representing the working event model. However, the causal connections between these areas during the boundary-related processing cascade need to be elaborated in future studies.
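One simple way to look for boundary-related activation peaks in a region's time series is to average the signal in a window around each event boundary. The abstract does not state the statistical model that was used, so the sketch below is only an illustration of that idea; the function name, window sizes and data are hypothetical.

```python
def boundary_locked_average(signal, boundaries, before, after):
    """Average the signal over windows [b - before, b + after] around each boundary.

    signal: region-of-interest time series, one value per volume.
    boundaries: volume indices at which an event boundary was marked.
    Boundaries whose window would run past either end of the signal are skipped.
    """
    windows = [
        signal[b - before : b + after + 1]
        for b in boundaries
        if b - before >= 0 and b + after < len(signal)
    ]
    if not windows:
        raise ValueError("no boundary has a complete window")
    # zip(*windows) groups the samples that share the same offset from a boundary.
    return [sum(column) / len(windows) for column in zip(*windows)]


# Hypothetical ROI signal with a peak one volume after each marked boundary.
roi_signal = [0, 0, 1, 3, 1, 0, 0, 0, 1, 3, 1, 0]
boundary_volumes = [2, 8]

average_response = boundary_locked_average(roi_signal, boundary_volumes, before=1, after=2)
print(average_response)
```

A response that rises after the boundary in the averaged window, as in this toy data, is the kind of transient peak the abstract refers to; a real analysis would also test it against a null distribution.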
The neuroscience of musical creativity using complexity tools
This project is heavily experimental and draws on a wide variety of disciplines from musicology and music psychology to cognitive neuroscience and (neuro)philosophy.
The objective is to explore and characterise brain activity during the process of creativity and to corroborate this with self-assessments from participants and external assessments from professional “judges”. This three-way experimental design bypasses the semantically difficult task of defining and assessing creativity by asking both participants and judges to rate ‘How creative did you think that was?’.
Characterising creativity is pertinent to complexity as it is an opportunity to comprehensively investigate a neural and cognitive system from multiple experimental and analytical facets. This thesis explores the anatomical and functional system underlying the creative cognitive state by analysing the concurrent time series recorded from the brain and furthermore, investigates a model in the stages of creativity using a behavioural experiment, in more detail than hitherto done in this domain.
Experimentally, the investigation is done in the domain of music, and the time series is the recorded electroencephalogram (EEG) of a pianist whilst performing the two creative musical tasks of ‘Interpretation’ and ‘Improvisation’ on musical extracts. An initial pilot study consisted of 5 participants being shown 30 musical extracts spanning the Classical soundworld across different rhythms, keys and tonalities. The study was then refined to only 20 extracts and modified to include 10 Jazz extracts and 8 participants from a roughly equal spread of Classical and Jazz backgrounds and gender. 5 external assessors had a roughly even spread of expertise in Jazz and Classical music.
Source localisation was performed on the collected experimental EEG data using software called sLORETA, which allows a linear inverse mapping of the electrical activity recorded at the scalp surface onto deeper cortical structures as the source of the recorded activity. Brodmann area (BA) 37, which has previously been linked to semantic processing, was robustly related to participants from a Classical background, and BA 7, which has previously been linked to altered states of consciousness such as hypnagogia and sleep, was robustly related to participants from a Jazz background whilst Improvising.
Analyses exploring the spread, agreement and biases of ratings across the different judges and self-ratings revealed a judge and participant inter-rater reliability at participant level. There was also an equal agreement between judges when rating the different genres Jazz or Classical, across the different tasks of ‘Improvisation’ and ‘Interpretation’, increasing confidence in inter-genre rating reliability for further analyses on the EEG of the extracts themselves. Furthermore, based on the ratings alone, it was possible to partition participants into either Jazz or Classical, which agreed with phenomenological interview information taken from the participants themselves.
With the added conditions of extracts that were deemed creative by objective judge assessment, source localisation analyses pinpointed BA 32 as a robust indicator of Creativity within the participants’ brain. It is an area that is particularly well connected and allows an integration of motoric and emotional communication with a maintenance of executive control.
Network analysis was performed using the Phase Locking Value (PLV) index between the 64 electrodes as the strength of the links in the adjacency matrix of a complex network. This revealed that the brain network is significantly more efficient and more strongly synchronised and clustered when participants are playing Classical extracts compared to Jazz extracts, in the fronto-central region with a clear right hemispheric lateralization.
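The PLV between two channels is commonly defined as the magnitude of the mean unit phasor of their phase difference, PLV = |(1/N) Σ exp(i(φ_a(t) − φ_b(t)))|, which is 1 for a constant phase lag and near 0 for unrelated phases. The sketch below builds a PLV adjacency matrix from per-channel phase series. It assumes the instantaneous phases have already been extracted (for example from band-passed EEG); how the thesis obtained them is not stated in the abstract, and the three synthetic channels are made up.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """PLV of two equal-length phase series (radians); result lies in [0, 1]."""
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("need two non-empty phase series of equal length")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)


def plv_matrix(phases):
    """Symmetric adjacency matrix with the PLV of every pair of channels."""
    n = len(phases)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            value = phase_locking_value(phases[i], phases[j])
            matrix[i][j] = matrix[j][i] = value
    return matrix


n_samples = 100
# Synthetic phases: channel 1 lags channel 0 by a constant 0.5 rad,
# channel 2 drifts through one full cycle relative to channel 0.
channel_0 = [0.1 * k for k in range(n_samples)]
channel_1 = [p + 0.5 for p in channel_0]
channel_2 = [p + 2 * math.pi * k / n_samples for k, p in enumerate(channel_0)]

adjacency = plv_matrix([channel_0, channel_1, channel_2])
```

With 64 electrodes the same function yields a 64 × 64 matrix, on which graph measures such as efficiency and clustering can then be computed.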
A behavioural study explored the role of distraction in the ‘Incubation’ period for both interpretation and improvisation, using a 2-back number exercise occupying working memory as the distractor. Analysis shows that a distractor has no significant effect on ‘Improvisation’ but significantly impairs ‘Interpretation’, based on the self-assessments by the participants.
Large-scale Affective Computing for Visual Multimedia
In recent years, Affective Computing has arisen as a prolific interdisciplinary field for engineering systems that integrate human affections. While human-computer relationships have long revolved around cognitive interactions, it is becoming increasingly important to account for human affect, or feelings or emotions, to avert user experience frustration, provide disability services, predict virality of social media content, etc. In this thesis, we specifically focus on Affective Computing as it applies to large-scale visual multimedia, and in particular, still images, animated image sequences and video streams, above and beyond the traditional approaches of face expression and gesture recognition. By taking a principled psychology-grounded approach, we seek to paint a more holistic and colorful view of computational affect in the context of visual multimedia. For example, should emotions like 'surprise' and 'fear' be assumed to be orthogonal output dimensions? Or does a 'positive' image in one culture's view elicit the same feelings of positivity in another culture? We study affect frameworks and ontologies to define, organize and develop machine learning models with such questions in mind to automatically detect affective visual concepts.
In the push for what we call "Big Affective Computing," we focus on two dimensions of scale for affect -- scaling up and scaling out -- which we propose are both imperative if we are to scale the Affective Computing problem successfully. Intuitively, simply increasing the number of data points corresponds to "scaling up". Less intuitive, however, is when problems like Affective Computing "scale out," or diversify. We show that this latter dimension of introducing data variety, alongside the former of introducing data volume, can yield particular insights, since human affections naturally depart from traditional Machine Learning and Computer Vision problems where there is an objectively truthful target. While no one might debate that a picture of a 'dog' should be tagged as a 'dog,' not all may agree that it looks 'ugly'. We present extensive discussions on why scaling out is critical and how it can be accomplished in the context of large-volume visual data.
At a high-level, the main contributions of this thesis include:
Multiplicity of Affect Oracles:
Prior to the work in this thesis, little consideration has been paid to the affective label generating mechanism when learning functional mappings between inputs and labels. Throughout this thesis but first in Chapter 2, starting in Section 2.1.2, we make a case for a conceptual partitioning of the affect oracle governing the label generation process in Affective Computing problems, resulting in a multiplicity of oracles, whereas prior works assumed there was a single universal oracle. In Chapter 3, the differences between intended versus expressed versus induced versus perceived emotion are discussed, where we argue that perceived emotion is particularly well-suited for scaling up because it reduces the label variance due to its more objective nature compared to other affect states. And in Chapters 4 and 5, a division of the affect oracle along cultural lines with manifestations along both language and geography is explored. We accomplish all this without sacrificing the 'scale up' dimension, and tackle significantly larger volume problems than prior comparable visual affective computing research.
Content-driven Visual Affect Detection:
Traditionally, in most Affective Computing work, prediction tasks use psycho-physiological signals from subjects viewing the stimuli of interest, e.g., a video advertisement, as the system inputs. In essence, this means that the machine learns to label a proxy signal rather than the stimulus itself. In this thesis, with the rise of strong Computer Vision and Multimedia techniques, we focus on learning to label the stimuli directly, without a biometric proxy signal provided by a human subject (except in the unique circumstances of Chapter 7). This shift toward learning from the stimuli directly is important because it allows us to scale up with much greater ease, given that biometric measurement acquisition is both low-throughput and somewhat invasive while stimuli are often readily available. In addition, moving toward learning directly from the stimuli will allow researchers to precisely determine which low-level features in the stimuli are actually coupled with affect states, e.g., which set of frames caused viewer discomfort rather than a broad sense that a video was discomforting. In Part I of this thesis, we illustrate an emotion prediction task with a psychology-grounded affect representation. In particular, in Chapter 3, we develop a prediction task over semantic emotional classes, e.g., 'sad,' 'happy' and 'angry,' using animated image sequences given annotations from over 2.5 million users. Subsequently, in Part II, we develop visual sentiment and adjective-based semantics models from million-scale digital imagery mined from a social multimedia platform.
Mid-level Representations for Visual Affect:
While discrete semantic emotions and sentiment are classical representations of affect with decades of psychology grounding, the interdisciplinary nature of Affective Computing, now only about two decades old, allows for new avenues of representation. Mid-level representations have been proposed in numerous Computer Vision and Multimedia problems as an intermediary, and often more computable, step toward bridging the semantic gap between low-level system inputs and high-level label semantic abstractions. In Part II, inspired by this work, we adapt it for vision-based Affective Computing and adopt a semantic construct called adjective-noun pairs. Specifically, in Chapter 4, we explore the use of such adjective-noun pairs in the context of a social multimedia platform and develop a multilingual visual sentiment ontology with over 15,000 affective mid-level visual concepts across 12 languages associated with over 7.3 million images and representations from over 235 countries, resulting in the largest affective digital image corpus in both depth and breadth to date. In Chapter 5, we develop computational methods to predict such adjective-noun pairs and also explore their usefulness in traditional sentiment analysis but with a previously unexplored cross-lingual perspective. And in Chapter 6, we propose a new learning setting called 'cross-residual learning' building off recent successes in deep neural networks, and specifically, in residual learning; we show that cross-residual learning can be used effectively to jointly learn across even multiple related tasks in object detection (noun), more traditional affect modeling (adjectives), and affective mid-level representations (adjective-noun pairs), giving us a framework for better grounding the adjective-noun pair bridge in both vision and affect simultaneously.
The role of phonology in visual word recognition: evidence from Chinese
Posters - Letter/Word Processing V: abstract no. 5024. The hypothesis of bidirectional coupling of orthography and phonology predicts that phonology plays a role in visual word recognition, as observed in the effects of feedforward and feedback spelling to sound consistency on lexical decision. However, because orthography and phonology are closely related in alphabetic languages (homophones in alphabetic languages are usually orthographically similar), it is difficult to exclude an influence of orthography on phonological effects in visual word recognition. Chinese languages contain many written homophones that are orthographically dissimilar, allowing a test of the claim that phonological effects can be independent of orthographic similarity. We report a study of visual word recognition in Chinese based on a mega-analysis of lexical decision performance with 500 characters. The results from multiple regression analyses, after controlling for orthographic frequency, stroke number, and radical frequency, showed main effects of feedforward and feedback consistency, as well as interactions between these variables and phonological frequency and number of homophones. Implications of these results for resonance models of visual word recognition are discussed.
Interactive effects of orthography and semantics in Chinese picture naming
Posters - Language Production/Writing: abstract no. 4035. Picture-naming performance in English and Dutch is enhanced by presentation of a word that is similar in form to the picture name. However, it is unclear whether facilitation has an orthographic or a phonological locus. We investigated the loci of the facilitation effect in Cantonese Chinese speakers by manipulating semantic, orthographic, and phonological similarity at three SOAs (-100, 0, and +100 msec). We identified an effect of orthographic facilitation that was independent of and larger than phonological facilitation across all SOAs. Semantic interference was also found at SOAs of -100 and 0 msec. Critically, an interaction of semantics and orthography was observed at an SOA of +100 msec. This interaction suggests that independent effects of orthographic facilitation on picture naming are located either at the level of semantic processing or at the lemma level and are not due to the activation of picture name segments at the level of phonological retrieval.
Cortical network responses map onto data-driven features that capture visual semantics of movie fragments
Research on how the human brain extracts meaning from sensory input relies in principle on methodological reductionism. In the present study, we adopt a more holistic approach by modeling the cortical responses to semantic information that was extracted from the visual stream of a feature film, employing artificial neural network models. Advances in both computer vision and natural language processing were utilized to extract the semantic representations from the film by combining perceptual and linguistic information. We tested whether these representations were useful in studying the human brain data. To this end, we collected electrocorticography responses to a short movie from 37 subjects and fitted their cortical patterns across multiple regions using the semantic components extracted from film frames. We found that individual semantic components reflected fundamental semantic distinctions in the visual input, such as presence or absence of people, human movement, landscape scenes, human faces, etc. Moreover, each semantic component mapped onto a distinct functional cortical network involving high-level cognitive regions in occipitotemporal, frontal and parietal cortices. The present work demonstrates the potential of data-driven methods from information processing fields to explain patterns of cortical responses, and contributes to the overall discussion about the encoding of high-level perceptual information in the human brain.