322 research outputs found

    Joint Task and Data Oriented Semantic Communications: A Deep Separate Source-channel Coding Scheme

    Semantic communications are expected to accomplish various semantic tasks with relatively less spectrum resource by exploiting the semantic features of source data. To serve both data transmission and semantic tasks simultaneously, joint data compression and semantic analysis has become a pivotal issue in semantic communications. This paper proposes a deep separate source-channel coding (DSSCC) framework for joint task and data oriented semantic communications (JTD-SC) and uses the variational autoencoder approach to solve the rate-distortion problem with semantic distortion. First, by analyzing the Bayesian model of the DSSCC framework, we derive a novel rate-distortion optimization problem via Bayesian inference for general data distributions and semantic tasks. Next, for a typical application of joint image transmission and classification, we combine the variational autoencoder approach with a forward adaptation scheme to effectively extract image features and adaptively learn the density information of the obtained features. Finally, an iterative training algorithm is proposed to tackle the overfitting issue of deep learning models. Simulation results reveal that, in most scenarios, the proposed scheme achieves better coding gain as well as better data recovery and classification performance than classical compression schemes and the emerging deep joint source-channel schemes.

    ADU-Depth: Attention-based Distillation with Uncertainty Modeling for Depth Estimation

    Monocular depth estimation is challenging due to its inherent ambiguity and ill-posed nature, yet it is important to many applications. While recent works achieve limited accuracy by designing increasingly complicated networks to extract features with limited spatial geometric cues from a single RGB image, we introduce spatial cues by training a teacher network that takes left-right image pairs as inputs and transferring the learned 3D geometry-aware knowledge to the monocular student network. Specifically, we present a novel knowledge distillation framework, named ADU-Depth, which uses the well-trained teacher network to guide the learning of the student network, thus improving depth estimation precision with the help of extra spatial scene information. To enable domain adaptation and ensure effective and smooth knowledge transfer from teacher to student, we apply both attention-adapted feature distillation and focal-depth-adapted response distillation in the training stage. In addition, we explicitly model the uncertainty of depth estimation to guide distillation in both feature space and result space, to better produce 3D-aware knowledge from monocular observations and thus enhance learning for hard-to-predict image regions. Our extensive experiments on the real depth estimation datasets KITTI and DrivingStereo demonstrate the effectiveness of the proposed method, which ranked 1st on the challenging KITTI online benchmark. Comment: accepted by CoRL 202

    Negative first impression judgements of autistic children by non-autistic adults

    Introduction: Although autism inclusion and acceptance has increased in recent years, autistic people continue to face stigmatization, exclusion, and victimization. Based on brief 10-second videos, non-autistic adults rate autistic adults less favourably than they rate non-autistic adults in terms of traits and behavioural intentions. In the current study, we extended this paradigm to investigate the first impressions of autistic and non-autistic children by non-autistic adult raters and examined the relationship between the rater's own characteristics and bias against autistic children. Method: Segments of video recorded interviews from 15 autistic and 15 non-autistic children were shown to 346 undergraduate students in audio with video, audio only, video only, transcript, or still image conditions. Participants rated each child on a series of traits and behavioural intentions toward the child, and then completed a series of questionnaires measuring their own social competence, autistic traits, quantity and quality of past experiences with autistic people, and explicit autism stigma. Results: Overall, autistic children were rated more negatively than non-autistic children, particularly in conditions containing audio. Raters with higher social competence and explicit autism stigma rated autistic children more negatively, whereas raters with more autistic traits and more positive past experiences with autistic people rated autistic children more positively. Discussion: These rapid negative judgments may contribute to the social exclusion experienced by autistic children. The findings indicate that certain personal characteristics may be related to more stigmatised views of autism and decreased willingness to interact with the autistic person. The implications of the findings are discussed in relation to the social inclusion and well-being of autistic people.

    Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos

    The success of Neural Radiance Fields (NeRFs) for modeling and free-view rendering of static objects has inspired numerous attempts on dynamic scenes. Current techniques that use neural rendering for free-view videos (FVVs) are either restricted to offline rendering or capable of processing only brief sequences with minimal motion. In this paper, we present a novel technique, Residual Radiance Field or ReRF, as a highly compact neural representation to achieve real-time FVV rendering on long-duration dynamic scenes. ReRF explicitly models the residual information between adjacent timestamps in the spatial-temporal feature space, with a global coordinate-based tiny MLP as the feature decoder. Specifically, ReRF employs a compact motion grid along with a residual feature grid to exploit inter-frame feature similarities. We show that such a strategy can handle large motions without sacrificing quality. We further present a sequential training scheme to maintain the smoothness and the sparsity of the motion/residual grids. Based on ReRF, we design a special FVV codec that achieves a compression rate of three orders of magnitude, and provide a companion ReRF player to support online streaming of long-duration FVVs of dynamic scenes. Extensive experiments demonstrate the effectiveness of ReRF for compactly representing dynamic radiance fields, enabling an unprecedented free-viewpoint viewing experience in speed and quality. Comment: Accepted by CVPR 2023. Project page, see https://aoliao12138.github.io/ReRF

    Automatic Animation of Hair Blowing in Still Portrait Photos

    We propose a novel approach to animate human hair in a still portrait photo. Existing work has largely studied the animation of fluid elements such as water and fire. However, hair animation for a real image remains underexplored and is a challenging problem due to the high complexity of hair structure and dynamics. Considering the complexity of hair structure, we treat hair wisp extraction as an instance segmentation problem, where a hair wisp is referred to as an instance. With advanced instance segmentation networks, our method extracts meaningful and natural hair wisps. Furthermore, we propose a wisp-aware animation module that animates hair wisps with pleasing motions without noticeable artifacts. Extensive experiments show the superiority of our method. Our method provides the most pleasing and compelling viewing experience in the qualitative experiments and outperforms state-of-the-art still-image animation methods by a large margin in the quantitative evaluation. Project URL: https://nevergiveu.github.io/AutomaticHairBlowing/ Comment: Accepted to ICCV 202

    Boas práticas para submissão de recursos educacionais em repositórios

    This paper aims to investigate good practices for the submission and self-archiving of educational resources in digital repositories, in order to encourage the implementation of an institutional policy for the collection of the Educational Resources Community of Lume, the Digital Repository of the Federal University of Rio Grande do Sul. Educational resources are materials aimed at teaching, learning and research, produced in different media and formats. They are deposited in digital repositories through self-archiving, a procedure that can be performed by the author or an authorized person. Policies are coordinated strategic actions designed to achieve specific goals or objectives of a collectivity, while good practices are the set of techniques or procedures identified as most appropriate for performing a given task. Regarding the methodology used, this is a qualitative study, of an applied nature, with exploratory-descriptive analysis. The procedure adopted was the case study. The research comprised five stages. In the first stage, a diagnosis was made of the situation found at UFRGS regarding the submission and self-archiving procedures for educational resources at Lume. In the second stage, repositories were selected that could be evaluated for the adoption of good practices in the evaluation of the objects in question. The third stage comprised documentary research in the selected repositories, while the fourth stage analyzed the data obtained in the previous stage. With this, it was possible to compose a set of good practices applicable not only to the UFRGS context, but also to other institutions. Finally, the last stage recommended the creation of an institutional policy for the collection of the Educational Resources Community of the Lume Repository. It is concluded that the development of norms for the collection is essential and will improve the flows and procedures adopted, thus promoting the sharing, use and reuse of the educational resources made available.

    StillFast: An End-to-End Approach for Short-Term Object Interaction Anticipation

    The anticipation problem has been studied from different angles, such as predicting humans' locations, predicting hand and object trajectories, and forecasting actions and human-object interactions. In this paper, we study the short-term object interaction anticipation problem from the egocentric point of view, proposing a new end-to-end architecture named StillFast. Our approach simultaneously processes a still image and a video, detecting and localizing next-active objects, predicting the verb that describes the future interaction, and determining when the interaction will start. Experiments on the large-scale egocentric dataset EGO4D show that our method outperforms state-of-the-art approaches on the considered task. Our method is ranked first in the public leaderboard of the EGO4D short term object interaction anticipation challenge 2022. Please see the project web page for code and additional details: https://iplab.dmi.unict.it/stillfast/

    Exploring Effective Mask Sampling Modeling for Neural Image Compression

    Image compression aims to reduce the information redundancy in images. Most existing neural image compression methods rely on side information from hyperprior or context models to eliminate spatial redundancy, but rarely address channel redundancy. Inspired by the mask sampling modeling in recent self-supervised learning methods for natural language processing and high-level vision, we propose a novel pretraining strategy for neural image compression. Specifically, the Cube Mask Sampling Module (CMSM) is proposed to apply both spatial and channel mask sampling modeling to image compression in the pre-training stage. Moreover, to further reduce channel redundancy, we propose the Learnable Channel Mask Module (LCMM) and the Learnable Channel Completion Module (LCCM). Our plug-and-play CMSM, LCMM, and LCCM modules can be applied to both CNN-based and Transformer-based architectures, significantly reduce the computational cost, and improve the quality of images. Experiments on the public Kodak and Tecnick datasets demonstrate that our method achieves competitive performance with lower computational complexity compared to state-of-the-art image compression methods. Comment: 10 page

    Scene representation and matching for visual localization in hybrid camera scenarios

    Scene representation and matching are crucial steps in a variety of tasks, ranging from 3D reconstruction to virtual/augmented/mixed reality applications, robotics, and others. While approaches exist that tackle these tasks, they mostly overlook the issue of efficiency in the scene representation, which is fundamental in resource-constrained systems and for increasing computing speed. They also normally assume the use of projective cameras, while performance on systems based on other camera geometries remains suboptimal. This dissertation contributes a new efficient scene representation method that dramatically reduces the number of 3D points. The approach sets up an optimization problem for the automated selection of the most relevant points to retain. This leads to a constrained quadratic program, which is solved optimally with a newly introduced variant of the sequential minimal optimization method. In addition, a new initialization approach is introduced for the fast convergence of the method. Extensive experimentation on public benchmark datasets demonstrates that the approach produces a compressed scene representation quickly while delivering accurate pose estimates. The dissertation also contributes new methods for scene matching that go beyond the use of projective cameras. Alternative camera geometries, like fisheye cameras, produce images with very high distortion, making current image feature point detectors and descriptors less efficient, since they were designed for projective cameras. New methods based on deep learning are introduced to address this problem, in which feature detectors and descriptors can overcome distortion effects and more effectively perform feature matching between pairs of fisheye images, and also between hybrid pairs of fisheye and perspective images. Due to the limited availability of fisheye-perspective image datasets, three datasets were collected for training and testing the methods. The results demonstrate increased detection and matching rates that outperform the current state-of-the-art methods.