    Enhanced visualisation of dance performance from automatically synchronised multimodal recordings

    The Huawei/3DLife Grand Challenge Dataset provides multimodal recordings of Salsa dancing, consisting of audiovisual streams along with depth maps and inertial measurements. In this paper, we propose a system for augmented-reality-based evaluation of Salsa dancer performances. An essential step for such a system is the automatic temporal synchronisation of the multiple modalities captured by different sensors, for which we propose efficient solutions. Furthermore, we contribute modules for the automatic analysis of dance performances and present an original software application, specifically designed for the evaluation scenario considered, that enables an enhanced dance visualisation experience by augmenting the original media with the results of our automatic analyses.
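
    The synchronisation method itself is not detailed in this abstract; as a rough, hypothetical illustration of one common alignment technique (not necessarily the paper's), the sketch below estimates the temporal offset between two recordings by cross-correlating their amplitude envelopes. All names and parameters here are our own assumptions.

```python
# Minimal sketch of envelope cross-correlation for temporal alignment.
# Illustrative only; not the paper's synchronisation method.
import numpy as np

def estimate_lag(sig_a: np.ndarray, sig_b: np.ndarray, rate_hz: float) -> float:
    """Return the lag (seconds) at which sig_b best aligns with sig_a."""
    # Correlate amplitude envelopes so waveform differences matter less.
    env_a = np.abs(sig_a) - np.mean(np.abs(sig_a))
    env_b = np.abs(sig_b) - np.mean(np.abs(sig_b))
    corr = np.correlate(env_a, env_b, mode="full")
    # The peak index maps to a lag in samples.
    lag_samples = np.argmax(corr) - (len(env_b) - 1)
    return lag_samples / rate_hz

# Toy usage: sig_b is sig_a delayed by 0.5 s at a 100 Hz sample rate.
rng = np.random.default_rng(0)
a = rng.standard_normal(1000)
b = np.concatenate([np.zeros(50), a])[:1000]
print(round(estimate_lag(a, b, 100.0), 2))  # -0.5: shift b back 0.5 s to match a
```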

    RGB-D datasets using Microsoft Kinect or similar sensors: a survey

    RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, with those of the depth image, which is immune to variations in color, illumination, rotation angle, and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially designed for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications, including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset and compare the popularity and difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.
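
    As background on how RGB-D data is typically consumed, here is a minimal sketch (not from the survey; the intrinsics below are placeholder values, not calibrated Kinect parameters) that back-projects a depth map into a 3D point cloud with the pinhole camera model:

```python
# Minimal sketch: back-project a depth map to 3D camera-frame points using
# the pinhole model. Intrinsics are placeholders, not calibrated values.
import numpy as np

def depth_to_points(depth_m: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """depth_m: (H, W) depth in metres -> (H*W, 3) points in the camera frame."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - cx) * z / fx  # pinhole back-projection: X = (u - cx) * Z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy usage: a flat scene 1 m from the camera at Kinect-like resolution.
points = depth_to_points(np.ones((480, 640)), fx=525.0, fy=525.0,
                         cx=319.5, cy=239.5)
print(points.shape)  # (307200, 3)
```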

    ImageSpirit: Verbal Guided Image Parsing

    Humans describe images in terms of nouns and adjectives, while algorithms operate on images represented as sets of pixels. Bridging this gap between how humans would like to access images and their typical representation is the goal of image parsing, which involves assigning object and attribute labels to pixels. In this paper we propose treating nouns as object labels and adjectives as visual attribute labels. This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images. We propose an efficient (interactive-time) solution. Using the extracted labels as handles, our system empowers a user to verbally refine the results. This enables hands-free parsing of an image into pixel-wise object/attribute labels that correspond to human semantics. Verbally selecting objects of interest enables a novel and natural interaction modality that could be used to interact with new-generation devices (e.g. smartphones, Google Glass, living room devices). We demonstrate our system on a large number of real-world images with varying complexity. To help understand the trade-offs compared to traditional mouse-based interactions, results are reported for both a large-scale quantitative evaluation and a user study.
    Comment: http://mmcheng.net/imagespirit
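
    The paper formulates this as a joint per-pixel optimisation; as a toy sketch of the output structure only (one noun label per pixel plus per-pixel adjective masks, with all arrays, shapes, and the threshold being our own assumptions), consider:

```python
# Toy sketch of joint object/attribute image parsing output: pick the
# best object (noun) label per pixel, and flag every attribute (adjective)
# whose score clears a threshold. Not the paper's joint optimisation.
import numpy as np

def parse_image(obj_scores: np.ndarray, attr_scores: np.ndarray,
                attr_thresh: float = 0.5):
    """obj_scores: (K, H, W); attr_scores: (A, H, W), both in [0, 1]."""
    obj_labels = np.argmax(obj_scores, axis=0)  # (H, W) one noun per pixel
    attr_masks = attr_scores > attr_thresh      # (A, H, W) adjectives per pixel
    return obj_labels, attr_masks

rng = np.random.default_rng(0)
objs, attrs = parse_image(rng.random((5, 4, 4)), rng.random((3, 4, 4)))
print(objs.shape, attrs.shape)  # (4, 4) (3, 4, 4)
```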

    Scaling Up Medical Visualization: Multi-Modal, Multi-Patient, and Multi-Audience Approaches for Medical Data Exploration, Analysis and Communication

    Medical visualization is one of the most application-oriented areas of visualization research. Close collaboration with medical experts is essential for interpreting medical imaging data and creating meaningful visualization techniques and applications. Cancer is one of the most common causes of death, and with the increasing average age in developed countries, gynecological malignancy case numbers are rising. Modern imaging techniques are an essential tool in assessing tumors and produce a growing amount of imaging data that radiologists must interpret. Besides the number of imaging modalities, the number of patients is also rising, so visualization solutions must be scaled up to address the growing complexity of multi-modal and multi-patient data. Furthermore, medical visualization is not only targeted toward medical professionals but also aims to inform patients, relatives, and the public about the risks of certain diseases and potential treatments. We therefore identify the need to scale medical visualization solutions to cope with multi-audience data. This thesis addresses the scaling of these dimensions through several contributions.
    First, we present our techniques for scaling medical visualizations across multiple modalities. We introduce a visualization technique that uses small multiples to display the data of multiple modalities within one imaging slice, allowing radiologists to explore the data efficiently without several juxtaposed windows. Next, we developed an analysis platform that applies radiomic tumor profiling to multiple imaging modalities to analyze cohort data and find new imaging biomarkers. Imaging biomarkers are indicators based on imaging data that predict variables related to clinical outcomes. Radiomic tumor profiling is a technique that generates potential imaging biomarkers from first- and second-order statistical measurements. The application allows medical experts to analyze multi-parametric imaging data to find potential correlations between clinical parameters and the radiomic tumor profiling data. This approach scales in two dimensions, multi-modal and multi-patient. In a later version, we added features to scale the multi-audience dimension by making the application applicable to cervical and prostate cancer data in addition to the endometrial cancer data it was designed for. In a subsequent contribution, we focus on tumor data at another scale and enable the analysis of tumor sub-parts by applying hierarchical clustering to the multi-modal imaging data; the application finds potentially interesting regions that could inform future treatment decisions. In another contribution, the digital probing interaction, we focus on multi-patient data: the imaging data of multiple patients can be compared to find interesting tumor patterns potentially linked to the aggressiveness of the tumors.
    Lastly, we scale the multi-audience dimension with our similarity visualization, which is applicable to endometrial cancer research, neurological cancer imaging research, and machine learning research on the automatic segmentation of tumor data. In contrast to the previously highlighted contributions, our last contribution, ScrollyVis, focuses primarily on multi-audience communication. We enable the creation of dynamic scientific scrollytelling experiences for specific or general audiences. Such stories can be used in specific use cases, such as doctor-patient communication, or to communicate scientific results to the general audience in a digital museum exhibition. Our proposed applications and interaction techniques have been demonstrated in application use cases and evaluated with domain experts and focus groups. As a result, some of our contributions are already in use at other research institutes. In future work, we want to evaluate their impact on other scientific fields and the general public.
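
    As a rough illustration of the hierarchical clustering idea used for the tumor sub-part analysis above (not the thesis implementation; the modality names and data below are synthetic assumptions), the sketch groups voxels by their multi-modal intensity profiles:

```python
# Minimal sketch: find tumour sub-regions by hierarchically clustering
# voxels on their multi-modal intensities. Synthetic data; illustrative
# only, not the thesis's application.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
# One row per voxel; columns are intensities from three assumed modalities
# (e.g. T1, T2, ADC). Two synthetic sub-parts with distinct profiles.
voxels = np.vstack([
    rng.normal(loc=[1.0, 0.2, 0.5], scale=0.05, size=(100, 3)),  # sub-part A
    rng.normal(loc=[0.3, 0.9, 0.1], scale=0.05, size=(100, 3)),  # sub-part B
])

tree = linkage(voxels, method="ward")               # hierarchical merge tree
labels = fcluster(tree, t=2, criterion="maxclust")  # cut the tree into 2 groups
print(np.bincount(labels)[1:])  # [100 100]: the two sub-parts are recovered
```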

    Understanding Video Transformers for Segmentation: A Survey of Application and Interpretability

    Video segmentation encompasses a wide range of problem formulations, e.g., object, scene, actor-action, and multimodal video segmentation, for delineating task-specific scene components with pixel-level masks. Recently, approaches in this research area have shifted from ConvNet-based to transformer-based models. In addition, various interpretability approaches have appeared for transformer models and video temporal dynamics, motivated by the growing interest in basic scientific understanding, model diagnostics, and the societal implications of real-world deployment. Previous surveys mainly focused on ConvNet models for a subset of video segmentation tasks or on transformers for classification tasks. Moreover, component-wise discussion of transformer-based video segmentation models has not yet received due focus. Likewise, previous reviews of interpretability methods focused on transformers for classification, while analysis of the video temporal dynamics modelling capabilities of video models has received less attention. In this survey, we address the above with a thorough discussion of the various categories of video segmentation, a component-wise discussion of state-of-the-art transformer-based models, and a review of related interpretability methods. We first present an introduction to the different video segmentation task categories, their objectives, specific challenges, and benchmark datasets. Next, we provide a component-wise review of recent transformer-based models and document the state of the art on different video segmentation tasks. Subsequently, we discuss post-hoc and ante-hoc interpretability methods for transformer models as well as interpretability methods for understanding the role of the temporal dimension in video models. Finally, we conclude our discussion with future research directions.
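
    To make the transformer-based segmentation pipeline concrete, here is a deliberately minimal sketch of a forward pass (our own toy construction, not any model from the survey; all dimensions and layer choices are illustrative): frames are tokenised into patches, a transformer encoder attends jointly over space and time, and a linear head decodes per-pixel mask logits.

```python
# Toy transformer-based video segmenter: patch-tokenise each frame, run
# joint space-time self-attention, decode tokens back to per-pixel logits.
# Purely illustrative; not a model from the survey.
import torch
import torch.nn as nn

class TinyVideoSegmenter(nn.Module):
    def __init__(self, patch=8, dim=64, classes=3):
        super().__init__()
        self.patch, self.classes = patch, classes
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Each token predicts class logits for every pixel in its patch.
        self.head = nn.Linear(dim, classes * patch * patch)

    def forward(self, video):                  # video: (B, T, 3, H, W)
        b, t, _, h, w = video.shape
        tok = self.embed(video.flatten(0, 1))  # (B*T, dim, H/p, W/p)
        gh, gw = tok.shape[-2:]
        tok = tok.flatten(2).transpose(1, 2).reshape(b, t * gh * gw, -1)
        tok = self.encoder(tok)                # joint space-time attention
        logits = self.head(tok).reshape(b, t, gh, gw,
                                        self.classes, self.patch, self.patch)
        # Stitch the per-patch predictions back into full-resolution masks.
        logits = logits.permute(0, 1, 4, 2, 5, 3, 6)
        return logits.reshape(b, t, self.classes, h, w)

masks = TinyVideoSegmenter()(torch.randn(1, 4, 3, 32, 32))
print(masks.shape)  # torch.Size([1, 4, 3, 32, 32])
```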