18 research outputs found

    Cross-Modal Learning for Sketch Visual Understanding.

    PhD Theses. As touch devices have rapidly proliferated, sketch has gained popularity as an alternative input to text descriptions and speech. Sketch is both informative and convenient, which has stimulated sketch-related research in areas such as sketch recognition, sketch segmentation, sketch-based image retrieval (SBIR), and photo-to-sketch synthesis. Though these fields have been well explored, existing sketch work still struggles to align the sketch and photo domains, resulting in unsatisfactory quality for both fine-grained retrieval and synthesis between the sketch and photo modalities. To address these problems, in this thesis we propose a series of novel works on free-hand sketch tasks and offer insights to help future research. Sketch conveys fine-grained information, making fine-grained sketch-based image retrieval one of the most important topics in sketch research. The basic solution for this task is learning to exploit the informativeness of sketches and link it to other modalities. Apart from the informativeness of sketches, semantic information is also important for understanding the sketch modality and linking it with other related modalities. In this thesis, we show that semantic information can act as a bridge that effectively fills the domain gap between the sketch and photo modalities. Based on this observation, we propose an attribute-aware deep framework that exploits attribute information to aid fine-grained SBIR. Text descriptions are considered another semantic alternative to attributes, with the advantage of being more flexible and natural, and are exploited in our proposed deep multi-task framework. The experimental study shows that semantic attribute information can improve fine-grained SBIR performance by a large margin. Sketch also has unique features, such as containing temporal information. 
In the sketch synthesis task, an understanding of both the semantic meaning behind sketches and the sketching process is required. The semantic meaning of sketches has been well explored in sketch recognition and sketch retrieval. However, the sketching process has been largely ignored, even though it is also very important for understanding the sketch modality, especially considering the unique temporal characteristics of sketches. In this thesis, we propose the first deep photo-to-sketch synthesis framework, which performs well on the sketch synthesis task, as shown in the experiment section. Generalisability is an important criterion for judging whether existing methods can be applied to real-world scenarios, especially considering the difficulty and expense of collecting sketches and pairwise annotations. We thus propose a generalised fine-grained SBIR framework. In detail, we follow a meta-learning strategy and train a hyper-network to generate instance-level classification weights for the subsequent matching network. The effectiveness of the proposed method is validated by extensive experimental results.

    From Robust to Generalizable Representation Learning for Person Re-Identification

    Person Re-Identification (ReID) is a retrieval task across non-overlapping cameras. Given a person-of-interest as a query, the goal of ReID is to determine whether this person has appeared in another place at a distinct time captured by a different camera, or even the same camera at a different time instant. ReID is considered a zero-shot learning task because the identities present in the training data may not necessarily overlap with those in the test data within the label space. This fundamental characteristic adds a layer of complexity to the task, making ReID a highly challenging representation learning problem. This thesis addresses the problem of learning generalizable yet discriminative representations with the following solutions: Chapter 3: Noisy and unrepresentative frames in automatically generated object bounding boxes from video sequences cause significant challenges in learning discriminative representations in video ReID. Most existing methods tackle this problem by assessing the importance of video frames according to their local part alignments or global appearance correlations separately. However, given the diverse and unknown sources of noise that usually co-exist in captured video data, existing methods have not been sufficiently effective. In this chapter, we explore both local alignments and global correlations jointly, with further consideration of their mutual reinforcement, to better assemble complementary discriminative ReID information within all relevant frames in video tracklets. We propose a model named Local-Global Associative Assembling (LOGA). Specifically, we concurrently optimize a Local Aligned Quality (LAQ) module that distinguishes the quality of each frame based on local alignments, and a Global Correlated Quality (GCQ) module that estimates global appearance correlations. With a locally-assembled global appearance prototype, we associate LAQ and GCQ to exploit their mutual complement. 
Chapter 4: While deep learning has significantly improved ReID model accuracy under the independent and identically distributed (IID) assumption, it has become clear that such models degrade notably when applied to an unseen novel domain due to unpredictable domain shifts. Contemporary Domain Generalizable ReID models struggle to learn domain-invariant representations solely through training on an instance classification objective. We consider that deep learning models are heavily influenced and thus biased towards domain-specific characteristics, such as background clutter, scale, and viewpoint variations, limiting the generalizability of the learned model. We hypothesize that pedestrians are domain-invariant as they share the same structural characteristics. To enable the ReID model to be less domain-specific, we introduce a Primary-Auxiliary Objectives Association (PAOA) model that guides model learning of the primary ReID instance classification objective by a concurrent auxiliary learning objective on weakly labeled pedestrian saliency detection. To solve the problem of conflicting optimization criteria in the model parameter space between the two learning objectives, PAOA calibrates the loss gradients of the auxiliary task towards the primary learning task gradients. Benefiting from the harmonious multitask learning design, our model can be extended with the recent test-time paradigm to form PAOA+, which performs on-the-fly optimization against the auxiliary objective to maximize the model’s generative capacity in the test target domain. Experiments demonstrate the superiority of the proposed PAOA model. Chapter 5: In this chapter, we propose a Feature-Distribution Perturbation and Calibration (PECA) method to derive generic feature representations for person ReID, which are not only discriminative across cameras but also agnostic and deployable to arbitrary unseen target domains. 
Specifically, we perform per-domain feature-distribution perturbation to prevent the model from overfitting to the domain-biased distribution of each source (seen) domain by enforcing feature invariance to distribution shifts caused by perturbation. Complementarily, we design a global calibration mechanism to align feature distributions across all source domains to improve the model’s generalization capacity by eliminating domain bias. These local perturbation and global calibration processes are conducted simultaneously, sharing the same principle of avoiding overfitting by regularization on the perturbed and original distributions, respectively. Extensive experiments conducted on eight person ReID datasets show that the proposed PECA model outperformed state-of-the-art competitors by significant margins. Chapter 6: Existing Domain Generalizable ReID methods explore feature disentanglement to learn a compact generic feature space by eliminating domain-specific knowledge. Such methods not only sacrifice discrimination in target domains but also limit the model’s robustness against per-identity appearance variations across views, an inherent characteristic of ReID. In this chapter, we formulate a Cross-Domain Variations Mining (CDVM) model to simultaneously explore explicit domain-specific knowledge while advancing generalizable representation learning. Our key insight is that cross-domain style variations need to be explicitly modeled to represent per-identity cross-view appearance changes. CDVM retains the model’s robustness against cross-view style variations that reflect the specific characteristics of different domains while maximizing the learning of a globally generalizable (invariant) representation. To this end, we propose utilizing cross-domain consensus to learn a domain-agnostic generic prototype. This prototype is then refined by incorporating cross-domain style variations, thereby achieving cross-view feature augmentation. 
Additionally, we enhance the discriminative power of the augmented representation by formulating an identity attribute constraint to emphasize the importance of individual attributes while maintaining overall consistency across all pedestrians. Extensive experiments validate that the proposed CDVM model outperforms existing state-of-the-art methods by significant margins. These four solutions jointly solve the problem of domain distribution shift for out-of-distribution (OOD) data by enabling the network to derive robust yet generalizable representations for identities, thereby facilitating the differentiation of inter-class decision boundaries and improving matching accuracy between query and gallery instances.

    White Paper 11: Artificial intelligence, robotics & data science

    198 p.; 17 cm. The CSIC white paper on Artificial Intelligence, Robotics and Data Science sketches a preliminary roadmap for addressing current R&D challenges associated with automated and autonomous machines. More than 50 research challenges investigated all over Spain by more than 150 experts within CSIC are presented in eight chapters. Chapter One introduces key concepts and tackles the issue of the integration of knowledge (representation), reasoning and learning in the design of artificial entities. Chapter Two analyses challenges associated with the development of theories, and supporting technologies, for modelling the behaviour of autonomous agents. Specifically, it pays attention to the interplay between elements at the micro level (individual autonomous agent interactions) and the macro world (the properties we seek in large and complex societies). While Chapter Three discusses the variety of data science applications currently used in all fields of science, paying particular attention to Machine Learning (ML) techniques, Chapter Four presents current developments in various areas of robotics. Chapter Five explores the challenges associated with computational cognitive models. Chapter Six pays attention to the ethical, legal, economic and social challenges coming alongside the development of smart systems. Chapter Seven engages with the problem of the environmental sustainability of deploying intelligent systems at large scale. Finally, Chapter Eight deals with the complexity of ensuring the security, safety, resilience and privacy-protection of smart systems against cyber threats.
    EXECUTIVE SUMMARY: ARTIFICIAL INTELLIGENCE, ROBOTICS AND DATA SCIENCE. Topic Coordinators: Sara Degli Esposti (IPP-CCHS, CSIC) and Carles Sierra (IIIA, CSIC)
    CHALLENGE 1: INTEGRATING KNOWLEDGE, REASONING AND LEARNING. Challenge Coordinators: Felip Manyà (IIIA, CSIC) and Adrià Colomé (IRI, CSIC – UPC)
    CHALLENGE 2: MULTIAGENT SYSTEMS. Challenge Coordinators: N. Osman (IIIA, CSIC) and D. López (IFS, CSIC)
    CHALLENGE 3: MACHINE LEARNING AND DATA SCIENCE. Challenge Coordinators: J. J. Ramasco Sukia (IFISC) and L. Lloret Iglesias (IFCA, CSIC)
    CHALLENGE 4: INTELLIGENT ROBOTICS. Topic Coordinators: G. Alenyà (IRI, CSIC – UPC) and J. Villagra (CAR, CSIC)
    CHALLENGE 5: COMPUTATIONAL COGNITIVE MODELS. Challenge Coordinators: M. D. del Castillo (CAR, CSIC) and M. Schorlemmer (IIIA, CSIC)
    CHALLENGE 6: ETHICAL, LEGAL, ECONOMIC, AND SOCIAL IMPLICATIONS. Challenge Coordinators: P. Noriega (IIIA, CSIC) and T. Ausín (IFS, CSIC)
    CHALLENGE 7: LOW-POWER SUSTAINABLE HARDWARE FOR AI. Challenge Coordinators: T. Serrano (IMSE-CNM, CSIC – US) and A. Oyanguren (IFIC, CSIC – UV)
    CHALLENGE 8: SMART CYBERSECURITY. Challenge Coordinators: D. Arroyo Guardeño (ITEFI, CSIC) and P. Brox Jiménez (IMSE-CNM, CSIC – US)
    Peer reviewed

    Multidisciplinary perspectives on Artificial Intelligence and the law

    This open access book presents an interdisciplinary, multi-authored, edited collection of chapters on Artificial Intelligence (‘AI’) and the Law. AI technology has come to play a central role in the modern data economy. Through a combination of increased computing power, the growing availability of data and the advancement of algorithms, AI has now become an umbrella term for some of the most transformational technological breakthroughs of this age. The importance of AI stems from both the opportunities that it offers and the challenges that it entails. While AI applications hold the promise of economic growth and efficiency gains, they also create significant risks and uncertainty. The potential and perils of AI have thus come to dominate modern discussions of technology and ethics – and although AI was initially allowed to largely develop without guidelines or rules, few would deny that the law is set to play a fundamental role in shaping the future of AI. As the debate over AI is far from over, the need for rigorous analysis has never been greater. This book thus brings together contributors from different fields and backgrounds to explore how the law might provide answers to some of the most pressing questions raised by AI. An outcome of the Católica Research Centre for the Future of Law and its interdisciplinary working group on Law and Artificial Intelligence, it includes contributions by leading scholars in the fields of technology, ethics and the law.

    Digital work in the planetary market

    Many of the world’s most valuable companies rely on planetary networks of digital work that underpin their products and services. This important book examines implications for both work and workers when jobs are commodified and traded beyond local labor markets. For instance, Amazon’s contractors in Costa Rica, India, and Romania are paid to structure, annotate, and organize conversations captured by ‘Alexa’ to train Amazon’s speech recognition systems. Findings show that despite its planetary connections, labor remains geographically “sticky” and embedded in distinct contexts. The research emphasizes the globe-spanning nature of contemporary networks without resorting to an understanding of “the global” as a place beyond space.
    Today, many jobs can be done from anywhere. Digital technology and widespread Internet connectivity allow almost anyone, anywhere, to connect with anyone else to communicate and interact at a planetary scale. This book examines the consequences, for both work and workers, of jobs being commodified and traded beyond local labor markets. Going beyond the usual “world is flat” globalization discourse, the contributors examine both the transformation of work itself and the wider systems, networks, and processes that enable digital work in a planetary market, offering empirical and theoretical perspectives. The contributors, leading scholars and experts from a range of disciplines, address a variety of issues, including content moderation, autonomous vehicles, and voice assistants. They first look at the new experience of work and find that, despite its planetary connections, labor remains geographically sticky and embedded in distinct contexts. They then examine how planetary networks of work can be mapped and problematized, discuss the productive multiplicity and interdisciplinarity of thinking about digital work and its networks, and, finally, imagine how planetary work might be regulated. The editors: Mark Graham is Professor of Internet Geography at the Oxford Internet Institute and a Faculty Fellow at the Alan Turing Institute. He is the editor of Digital Economies at Global Margins (MIT Press and IDRC, 2019). Fabian Ferrari is a doctoral candidate at the Oxford Internet Institute.

    P5 eHealth: An Agenda for the Health Technologies of the Future

    This open access volume focuses on the development of a P5 eHealth, or better, a methodological resource for developing the health technologies of the future, based on patients’ personal characteristics and needs as the fundamental guidelines for design. It provides practical guidelines and evidence-based examples on how to design, implement, use and elevate new technologies for healthcare to support the management of incurable, chronic conditions. The volume further discusses the criticalities of eHealth, why it is difficult to employ eHealth from an organizational point of view or why patients do not always accept the technology, and how eHealth interventions can be improved in the future. By dealing with the state of the art in eHealth technologies, this volume is of great interest to researchers in the field of physical and mental healthcare, psychologists, stakeholders and policymakers, as well as technology developers working in the healthcare sector.

    Recent Advances in Social Data and Artificial Intelligence 2019

    The importance and usefulness of subjects and topics involving social data and artificial intelligence are becoming widely recognized. This book contains invited review, expository, and original research articles dealing with, and presenting state-of-the-art accounts of, the recent advances in the subjects of social data and artificial intelligence, and potentially their links to Cyberspace.