2,565 research outputs found

    Fine-grained Controllable Video Generation via Object Appearance and Context

    Text-to-video generation has shown promising results. However, because these models take only natural language as input, users often have difficulty providing the detailed information needed to control the output precisely. In this work, we propose fine-grained controllable video generation (FACTOR) to achieve detailed control. Specifically, FACTOR aims to control objects' appearances and context, including their location and category, in conjunction with the text prompt. To achieve detailed control, we propose a unified framework that jointly injects control signals into an existing text-to-video model. Our model consists of a joint encoder and adaptive cross-attention layers. By optimizing the encoder and the inserted layers, we adapt the model to generate videos that are aligned with both the text prompts and the fine-grained control. Compared to existing methods that rely on dense control signals such as edge maps, we provide a more intuitive and user-friendly interface that allows object-level fine-grained control. Our method achieves controllability of object appearances without finetuning, which reduces the per-subject optimization effort for users. Extensive experiments on standard benchmark datasets and user-provided inputs validate that our model obtains a 70% improvement in controllability metrics over competitive baselines. Comment: Project page: https://hhsinping.github.io/facto

    KEER2022

    Pre-title: KEER2022. Diversities. Resource description: 25 July 202

    Knowledge exchange as a predictor of business innovation among SMEs in Ogun State, Nigeria.

    Purpose: Business innovation is the improvement made to a firm's array of products and services. This study investigates the influence of knowledge exchange on business innovation among SMEs in Ogun State, Nigeria. Design / Methodology: The study adopted a survey research design. The population of the study comprised 415 staff employed by 55 SMEs in the state. A sample size of 204 was determined with the aid of the Taro Yamane formula. Respondents were selected by proportionate sampling, and data were collected with a structured and validated questionnaire designed for the study. Cronbach's alpha reliability coefficients for the variables ranged from 0.70 to 0.88, and a return rate of 100% was achieved. Data were analysed with descriptive and inferential (simple linear and multiple regression) statistics. Findings: Knowledge exchange had a positive, significant influence on the business innovation of SMEs in Ogun State, Nigeria. The results further indicated that all the measures of knowledge exchange were significant. Based on the findings, this study recommends that SMEs make more provision for collaboration, improved communication, training and encouragement of staff so as to improve their current achievements in business innovation. Originality / Value: The study fills some gaps noticed in the literature and provides empirical justification for the influence of knowledge exchange on business innovation. It has also helped to generate new knowledge needed for policy and decision-making relating to accessing innovativeness in SMEs.
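
The sample size reported in the abstract above can be checked with a short calculation. The sketch below applies Yamane's formula, n = N / (1 + N·e²), to the stated population of 415. The 5% margin of error is an assumption, since the abstract does not state it, and the function name `yamane_sample_size` is purely illustrative.

```python
import math


def yamane_sample_size(population: int, margin_of_error: float = 0.05) -> int:
    """Return the Yamane sample size n = N / (1 + N * e^2), rounded up.

    margin_of_error defaults to 0.05, which is an assumed value:
    the abstract does not report the margin that was actually used.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    return math.ceil(population / (1 + population * margin_of_error ** 2))


# Population of 415 staff, as stated in the abstract.
print(yamane_sample_size(415))  # prints 204
```

With the assumed 5% margin, the formula gives 415 / 2.0375 ≈ 203.7, which rounds up to the 204 respondents reported.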

    On the Design, Implementation and Application of Novel Multi-disciplinary Techniques for explaining Artificial Intelligence Models

    284 p. Artificial Intelligence is a relentlessly active field of research that has experienced remarkable growth in recent decades. Some of the reasons for this apparently exponential growth are the improvements in computational power, sensing capabilities and data storage, which have resulted in a huge increase in data availability. However, this growth has been mostly led by a performance-based mindset that has pushed models towards a black-box nature. The performance prowess of these methods, along with the rising demand for their implementation, has triggered the birth of a new research field: Explainable Artificial Intelligence (XAI). Like any new field, XAI falls short in cohesiveness. Added to the consequences of dealing with concepts that do not come from the natural sciences (explanations), the tumultuous scene is palpable. This thesis contributes to the field from two different perspectives: a theoretical one and a practical one. The former is based on a profound literature review that resulted in two main contributions: 1) the proposition of a new definition for Explainable Artificial Intelligence and 2) the creation of a new taxonomy for the field. The latter is composed of two XAI frameworks that address some of the pressing gaps found in the field, namely: 1) an XAI framework for Echo State Networks (ESNs) and 2) an XAI framework for the generation of counterfactuals. The first accounts for the gap concerning randomized neural networks, since they have never been considered within the field of XAI. Unfortunately, choosing the right parameters to initialize these reservoirs depends more on luck and the scientist's past experience than on sound reasoning. The current approach for assessing whether a reservoir is suited for a particular task is to observe whether it yields accurate results, either by handcrafting the values of the reservoir parameters or by automating their configuration via an external optimizer. All in all, this poses tough questions to address when developing an ESN for a certain application, since knowing whether the created structure is optimal for the problem at hand is not possible without actually training it. However, one of the main reasons for not pursuing their application is the mistrust generated by their "black-box" nature. The second presents a new paradigm for counterfactual generation. Among the alternatives for reaching a universal understanding of model explanations, counterfactual examples are arguably the one that best conforms to human understanding principles when faced with unknown phenomena. Indeed, discerning what would happen should the initial conditions differ in a plausible fashion is a mechanism often adopted by humans when attempting to understand any unknown. The search for counterfactuals proposed in this thesis is governed by three different objectives. As opposed to the classical approach, in which counterfactuals are just generated following a minimum-distance approach of some type, this framework allows for an in-depth analysis of a target model by means of counterfactuals responding to: Adversarial Power, Plausibility and Change Intensity.

    Bridging the gap between reconstruction and synthesis

    Embargo applied from the defence date until 15 January 2022. 3D reconstruction and image synthesis are two of the main pillars of computer vision. Early works focused on simple tasks such as multi-view reconstruction and texture synthesis. Spurred by Deep Learning, the field has progressed rapidly, making it possible to achieve more complex and high-level tasks. For example, the 3D reconstruction results of traditional multi-view approaches are currently obtained with single-view methods. Similarly, early pattern-based texture synthesis works have resulted in techniques that allow generating novel high-resolution images. In this thesis we have developed a hierarchy of tools that covers this whole range of problems, lying at the intersection of computer vision, graphics and machine learning. We tackle the problem of 3D reconstruction and synthesis in the wild. Importantly, we advocate for a paradigm in which not everything should be learned. Instead of applying Deep Learning naively, we propose novel representations, layers and architectures that directly embed prior 3D geometric knowledge for the task of 3D reconstruction and synthesis. We apply these techniques to problems including scene/person reconstruction and photo-realistic rendering. We first address methods to reconstruct a scene and the clothed people in it while estimating the camera position. Then, we tackle image and video synthesis for clothed people in the wild. Finally, we bridge the gap between reconstruction and synthesis under the umbrella of a unique novel formulation. Extensive experiments conducted throughout this thesis show that the proposed techniques improve the performance of Deep Learning models in terms of the quality of the reconstructed 3D shapes and synthesised images, while reducing the amount of supervision and training data required to train them. In summary, we provide a variety of low-, mid- and high-level algorithms that can be used to incorporate prior knowledge into different stages of the Deep Learning pipeline and improve performance in tasks of 3D reconstruction and image synthesis. Postprint (published version)