18 research outputs found
Cross-Modal Learning for Sketch Visual Understanding.
PhD Theses. As touch devices have rapidly proliferated, sketch has gained popularity as an
alternative input to text descriptions and speech. Sketch is both informative and
convenient, which has stimulated sketch-related research in areas such as sketch
recognition, sketch segmentation, sketch-based image retrieval, and photo-to-sketch
synthesis. Though this field has been well studied, existing sketch works still struggle
to align the sketch and photo domains, resulting in unsatisfactory quality for both
fine-grained retrieval and synthesis between sketch and photo modalities. To address
these problems, in this thesis we propose a series of novel works on free-hand
sketch-related tasks and offer insights to help future research.
Sketch conveys fine-grained information, making fine-grained sketch-based image retrieval
(SBIR) one of the most important topics in sketch research. The basic solution for this
task is learning to exploit the informativeness of sketches and link it to other modalities.
Apart from the informativeness of sketches, semantic information is also important for
understanding the sketch modality and linking it with other related modalities. In this
thesis, we show that semantic information can act as a bridge that effectively fills the
domain gap between sketch and photo modalities. Based on this observation, we propose an
attribute-aware deep framework that exploits attribute information to aid fine-grained
SBIR. Text descriptions are considered as another semantic alternative to attributes, with
the added advantage of being more flexible and natural, and are exploited in our proposed
deep multi-task framework. The experimental study shows that semantic attribute
information can improve fine-grained SBIR performance by a large margin.
Sketch also has unique features, such as containing temporal information. The sketch
synthesis task requires understanding both the semantic meaning behind sketches and the
sketching process. The semantic meaning of sketches has been well explored in the
sketch recognition and sketch retrieval challenges. However, the sketching process has
largely been ignored, even though it is also very important for understanding the sketch
modality, especially considering the unique temporal characteristics of sketches. In this
thesis, we propose the first deep photo-to-sketch synthesis framework, which performs
well on the sketch synthesis task, as shown in the experiment section.
Generalisability is an important criterion for judging whether existing methods can
be applied to real-world scenarios, especially considering the difficulty and cost
of collecting sketches and pairwise annotations. We thus propose a generalised
fine-grained SBIR framework. In detail, we follow a meta-learning strategy and train
a hyper-network to generate instance-level classification weights for the subsequent
matching network. The effectiveness of the proposed method has been validated by
extensive experimental results.
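The hyper-network idea in the last paragraph can be illustrated with a minimal sketch. It assumes a single linear hyper-network and dot-product matching; the names `HyperNetwork`, `generate`, and `rank_photos` are hypothetical, and the abstract does not specify the thesis's actual architecture or training procedure.

```python
import random


def linear(weights, bias, x):
    """Apply an affine map: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


class HyperNetwork:
    """Hypothetical hyper-network: maps a sketch embedding to the weight
    vector of an instance-level linear classifier (an assumption, not the
    thesis's actual model)."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # Randomly initialised parameters; in practice these would be meta-learned.
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
        self.bias = [0.0] * dim

    def generate(self, sketch_embedding):
        """Produce classifier weights tailored to one query sketch."""
        return linear(self.weights, self.bias, sketch_embedding)


def rank_photos(classifier_weights, photo_embeddings):
    """Score every photo with the generated classifier; best match first."""
    scores = [sum(w * p for w, p in zip(classifier_weights, photo))
              for photo in photo_embeddings]
    return sorted(range(len(photo_embeddings)), key=lambda i: scores[i], reverse=True)


# Usage: generate a classifier for one sketch, then rank a small photo gallery.
hyper = HyperNetwork(dim=2)
query_weights = hyper.generate([1.0, 0.0])
ranking = rank_photos(query_weights, [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
```

Because the classifier is generated per query rather than learned per category, such a design does not need retraining when new sketch categories appear, which is the generalisation property the abstract emphasises.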
From Robust to Generalizable Representation Learning for Person Re-Identification
Person Re-Identification (ReID) is a retrieval task across non-overlapping cameras. Given a person-of-interest as a query, the goal of ReID is to determine whether this person has appeared in another place at a distinct time captured by a different camera, or even the same camera at a different time instant. ReID is considered a zero-shot learning task because the identities present in the training data may not necessarily overlap with those in the test data within the label space. This fundamental characteristic adds a layer of complexity to the task, making ReID a highly challenging representation learning problem. This thesis addresses the problem of learning generalizable yet discriminative representations with the following solutions: Chapter 3: Noisy and unrepresentative frames in automatically generated object bounding boxes from video sequences cause significant challenges in learning discriminative representations in video ReID. Most existing methods tackle this problem by assessing the importance of video frames according to their local part alignments or global appearance correlations separately. However, given the diverse and unknown sources of noise that usually co-exist in captured video data, existing methods have not been sufficiently effective. In this chapter, we explore both local alignments and global correlations jointly, with further consideration of their mutual reinforcement, to better assemble complementary discriminative ReID information within all relevant frames in video tracklets. We propose a model named Local-Global Associative Assembling (LOGA). Specifically, we concurrently optimize a Local Aligned Quality (LAQ) module that distinguishes the quality of each frame based on local alignments, and a Global Correlated Quality (GCQ) module that estimates global appearance correlations. With a locally-assembled global appearance prototype, we associate LAQ and GCQ to exploit their mutual complement. 
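As a rough illustration of the frame assembling described for Chapter 3, the sketch below first builds a prototype from locally scored frames and then re-weights each frame by its similarity to that prototype. The additive combination of scores, the cosine similarity, and the function names are assumptions made for illustration; the abstract does not give the actual LAQ and GCQ formulations.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_average(features, weights):
    """Weighted sum of feature vectors (weights are expected to sum to 1)."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def assemble_tracklet(frame_features, local_scores):
    """Hypothetical local-global assembling of one video tracklet.

    1. Local quality scores give a first, locally assembled prototype.
    2. Each frame's global score is its similarity to that prototype.
    3. Both scores are summed (an assumed rule) to weight the final average.
    """
    prototype = weighted_average(frame_features, softmax(local_scores))
    global_scores = [cosine(f, prototype) for f in frame_features]
    combined = [l + g for l, g in zip(local_scores, global_scores)]
    return weighted_average(frame_features, softmax(combined))


# Usage: the third frame is an outlier with a low local quality score.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
tracklet_feature = assemble_tracklet(frames, [2.0, 2.0, -2.0])
```

In this toy setup the outlier frame receives a low score from both stages, so it contributes very little to the assembled tracklet feature.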
Chapter 4: While deep learning has significantly improved ReID model accuracy under the Independent and Identical Distribution (IID) assumption, it has become clear that such models degrade notably when applied to an unseen novel domain due to unpredictable domain shifts. Contemporary Domain Generalizable ReID models struggle to learn domain-invariant representations solely through training on an instance classification objective. We consider that deep learning models are heavily influenced and thus biased towards domain-specific characteristics, such as background clutter, scale, and viewpoint variations, limiting the generalizability of the learned model. We hypothesize that pedestrians are domain-invariant as they share the same structural characteristics. To enable the ReID model to be less domain-specific, we introduce a Primary-Auxiliary Objectives Association (PAOA) model that guides model learning of the primary ReID instance classification objective by a concurrent auxiliary learning objective on weakly labeled pedestrian saliency detection. To solve the problem of conflicting optimization criteria in the model parameter space between the two learning objectives, PAOA calibrates the loss gradients of the auxiliary task towards the primary learning task gradients. Benefiting from the harmonious multitask learning design, our model can be extended with the recent test-time diagram to form the PAOA+, which performs on-the-fly optimization against the auxiliary objective to maximize the model's generative capacity in the test target domain. Experiments demonstrate the superiority of the proposed PAOA model. Chapter 5: In this chapter, we propose a Feature-Distribution Perturbation and Calibration (PECA) method to derive generic feature representations for person ReID, which are not only discriminative across cameras but also agnostic and deployable to arbitrary unseen target domains. 
Specifically, we perform per-domain feature-distribution perturbation to prevent the model from overfitting to the domain-biased distribution of each source (seen) domain by enforcing feature invariance to distribution shifts caused by perturbation. Complementarily, we design a global calibration mechanism to align feature distributions across all source domains to improve the model's generalization capacity by eliminating domain bias. These local perturbation and global calibration processes are conducted simultaneously, sharing the same principle of avoiding overfitting by regularization on the perturbed and original distributions, respectively. Extensive experiments conducted on eight person ReID datasets show that the proposed PECA model outperformed state-of-the-art competitors by significant margins. Chapter 6: Existing Domain Generalizable ReID methods explore feature disentanglement to learn a compact generic feature space by eliminating domain-specific knowledge. Such methods not only sacrifice discrimination in target domains but also limit the model's robustness against per-identity appearance variations across views, an inherent characteristic of ReID. In this chapter, we formulate a Cross-Domain Variations Mining (CDVM) model to simultaneously explore explicit domain-specific knowledge while advancing generalizable representation learning. Our key insight is that cross-domain style variations need to be explicitly modeled to represent per-identity cross-view appearance changes. CDVM retains the model's robustness against cross-view style variations that reflect the specific characteristics of different domains while maximizing the learning of a globally generalizable (invariant) representation. To this end, we propose utilizing cross-domain consensus to learn a domain-agnostic generic prototype. This prototype is then refined by incorporating cross-domain style variations, thereby achieving cross-view feature augmentation. 
Additionally, we enhance the discriminative power of the augmented representation by formulating an identity attribute constraint to emphasize the importance of individual attributes while maintaining overall consistency across all pedestrians. Extensive experiments validate that the proposed CDVM model outperforms existing state-of-the-art methods by significant margins. These four solutions jointly solve the problem of domain distribution shift for out-of-distribution (OOD) data by enabling the network to derive robust yet generalizable representations for identities, thereby facilitating the differentiation of inter-class decision boundaries and improving matching accuracy among query and gallery instances
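The gradient calibration mentioned for Chapter 4 can be sketched as follows. The abstract does not state PAOA's exact rule, so this example substitutes a common gradient-projection heuristic: when the auxiliary gradient conflicts with the primary one, its conflicting component is removed. The function names and the learning rate are hypothetical.

```python
def dot(a, b):
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def calibrate_auxiliary(primary_grad, auxiliary_grad):
    """Assumed calibration rule (not taken from the thesis).

    If the two gradients conflict (negative inner product), project the
    auxiliary gradient onto the plane orthogonal to the primary gradient.
    Otherwise leave it unchanged.
    """
    inner = dot(primary_grad, auxiliary_grad)
    if inner >= 0.0:
        return list(auxiliary_grad)
    scale = inner / dot(primary_grad, primary_grad)
    return [a - scale * p for a, p in zip(auxiliary_grad, primary_grad)]


def combined_update(params, primary_grad, auxiliary_grad, lr=0.1):
    """One gradient-descent step using the primary and calibrated auxiliary gradients."""
    calibrated = calibrate_auxiliary(primary_grad, auxiliary_grad)
    return [w - lr * (p + a) for w, p, a in zip(params, primary_grad, calibrated)]


# Usage: the auxiliary gradient points partly against the primary one.
calibrated = calibrate_auxiliary([1.0, 0.0], [-1.0, 1.0])  # -> [0.0, 1.0]
new_params = combined_update([0.0, 0.0], [1.0, 0.0], [-1.0, 1.0], lr=0.5)
```

After calibration the auxiliary gradient no longer opposes the primary objective, so the auxiliary task can only help or stay neutral with respect to the primary update direction.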
White Paper 11: Artificial intelligence, robotics & data science
198 p. : 17 cm. The CSIC white paper on Artificial Intelligence, Robotics and Data Science sketches a preliminary roadmap for addressing current R&D challenges associated with automated and autonomous machines. More than 50 research challenges investigated all over Spain by more than 150 experts within CSIC are presented in eight chapters. Chapter One introduces key concepts and tackles the issue of the integration of knowledge (representation), reasoning and learning in the design of artificial entities. Chapter Two analyses challenges associated with the development of theories (and supporting technologies) for modelling the behaviour of autonomous agents. Specifically, it pays attention to the interplay between elements at micro level (individual autonomous agent interactions) with the macro world (the properties we seek in large and complex societies). While Chapter Three discusses the variety of data science applications currently used in all fields of science, paying particular attention to Machine Learning (ML) techniques, Chapter Four presents current development in various areas of robotics. Chapter Five explores the challenges associated with computational cognitive models. Chapter Six pays attention to the ethical, legal, economic and social challenges coming alongside the development of smart systems. Chapter Seven engages with the problem of the environmental sustainability of deploying intelligent systems at large scale. Finally, Chapter Eight deals with the complexity of ensuring the security, safety, resilience and privacy-protection of smart systems against cyber threats.
EXECUTIVE SUMMARY: ARTIFICIAL INTELLIGENCE, ROBOTICS AND DATA SCIENCE. Topic Coordinators: Sara Degli Esposti (IPP-CCHS, CSIC) and Carles Sierra (IIIA, CSIC)
CHALLENGE 1: INTEGRATING KNOWLEDGE, REASONING AND LEARNING. Challenge Coordinators: Felip Manyà (IIIA, CSIC) and Adrià Colomé (IRI, CSIC - UPC)
CHALLENGE 2: MULTIAGENT SYSTEMS. Challenge Coordinators: N. Osman (IIIA, CSIC) and D. López (IFS, CSIC)
CHALLENGE 3: MACHINE LEARNING AND DATA SCIENCE. Challenge Coordinators: J. J. Ramasco Sukia (IFISC) and L. Lloret Iglesias (IFCA, CSIC)
CHALLENGE 4: INTELLIGENT ROBOTICS. Topic Coordinators: G. Alenyà (IRI, CSIC - UPC) and J. Villagra (CAR, CSIC)
CHALLENGE 5: COMPUTATIONAL COGNITIVE MODELS. Challenge Coordinators: M. D. del Castillo (CAR, CSIC) and M. Schorlemmer (IIIA, CSIC)
CHALLENGE 6: ETHICAL, LEGAL, ECONOMIC, AND SOCIAL IMPLICATIONS. Challenge Coordinators: P. Noriega (IIIA, CSIC) and T. Ausín (IFS, CSIC)
CHALLENGE 7: LOW-POWER SUSTAINABLE HARDWARE FOR AI. Challenge Coordinators: T. Serrano (IMSE-CNM, CSIC - US) and A. Oyanguren (IFIC, CSIC - UV)
CHALLENGE 8: SMART CYBERSECURITY. Challenge Coordinators: D. Arroyo Guardeño (ITEFI, CSIC) and P. Brox Jiménez (IMSE-CNM, CSIC - US)
Peer reviewed
Proceedings of EVA London 2024
The Electronic Visualisation and the Arts London 2024 Conference (EVA London 2024) is co-sponsored by the Computer Arts Society (CAS) and BCS, the Chartered Institute for IT, of which the CAS is a Specialist Group. As for 2022, the EVA London 2023 Conference is a physical and online "hybrid" conference. We continue with publishing the proceedings, both online, with open access via ScienceOpen, and also in our traditional printed form, in full colour. The main conference presentations run during 10-13 July 2023, with workshops and other activities, especially for students, on 14 July 2023.
Multidisciplinary perspectives on Artificial Intelligence and the law
This open access book presents an interdisciplinary, multi-authored, edited collection of chapters on Artificial Intelligence ("AI") and the Law. AI technology has come to play a central role in the modern data economy. Through a combination of increased computing power, the growing availability of data and the advancement of algorithms, AI has now become an umbrella term for some of the most transformational technological breakthroughs of this age. The importance of AI stems from both the opportunities that it offers and the challenges that it entails. While AI applications hold the promise of economic growth and efficiency gains, they also create significant risks and uncertainty. The potential and perils of AI have thus come to dominate modern discussions of technology and ethics, and although AI was initially allowed to largely develop without guidelines or rules, few would deny that the law is set to play a fundamental role in shaping the future of AI. As the debate over AI is far from over, the need for rigorous analysis has never been greater. This book thus brings together contributors from different fields and backgrounds to explore how the law might provide answers to some of the most pressing questions raised by AI. An outcome of the Católica Research Centre for the Future of Law and its interdisciplinary working group on Law and Artificial Intelligence, it includes contributions by leading scholars in the fields of technology, ethics and the law.
Digital work in the planetary market
Many of the world's most valuable companies rely on planetary networks of digital work that underpin their products and services. This important book examines implications for both work and workers when jobs are commodified and traded beyond local labor markets. For instance, Amazon's contractors in Costa Rica, India, and Romania are paid to structure, annotate, and organize conversations captured by "Alexa" to train Amazon's speech recognition systems. Findings show that despite its planetary connections, labor remains geographically "sticky" and embedded in distinct contexts. The research emphasizes the globe-spanning nature of contemporary networks without resorting to an understanding of "the global" as a place beyond space.
Today, many jobs can be done from anywhere. Digital technology and widespread Internet connectivity allow almost anyone, anywhere, to connect with anyone else to communicate and interact on a planetary scale. This book examines the consequences, for both work and workers, of jobs being commodified and traded beyond local labor markets. Going beyond the usual "the world is flat" globalization discourse, the contributors examine both the transformation of work itself and the broader systems, networks, and processes that enable digital work in a planetary market, offering empirical and theoretical perspectives. The contributors, leading scholars and experts from a range of disciplines, address a variety of issues, including content moderation, autonomous vehicles, and voice assistants. They first look at the new experience of work, finding that, despite its planetary connections, labor remains geographically sticky and embedded in distinct contexts. They then examine how planetary networks of work can be mapped and problematized, discuss the productive multiplicity and interdisciplinarity of thinking about digital work and its networks, and finally imagine how planetary work might be regulated.
The editors: Mark Graham is Professor of Internet Geography at the Oxford Internet Institute and a Faculty Fellow at the Alan Turing Institute. He is the editor of Digital Economies at Global Margins (MIT Press and IDRC, 2019). Fabian Ferrari is a doctoral candidate at the Oxford Internet Institute.
Final Report: National Security Commission on Artificial Intelligence
Final report presenting the National Security Commission for Artificial Intelligence (NSCAI)'s recommendations for winning the AI era. It includes a 16-chapter Main Report and "Blueprints for Action that outline the concrete steps departments and agencies can take to implement NSCAI's recommendations." - Introduction
P5 eHealth: An Agenda for the Health Technologies of the Future
This open access volume focuses on the development of a P5 eHealth, or better, a methodological resource for developing the health technologies of the future, based on patients' personal characteristics and needs as the fundamental guidelines for design. It provides practical guidelines and evidence-based examples on how to design, implement, use and elevate new technologies for healthcare to support the management of incurable, chronic conditions. The volume further discusses the criticalities of eHealth, why it is difficult to employ eHealth from an organizational point of view or why patients do not always accept the technology, and how eHealth interventions can be improved in the future. By dealing with the state-of-the-art in eHealth technologies, this volume is of great interest to researchers in the field of physical and mental healthcare, psychologists, stakeholders and policymakers as well as technology developers working in the healthcare sector.
Recent Advances in Social Data and Artificial Intelligence 2019
The importance and usefulness of subjects and topics involving social data and artificial intelligence are becoming widely recognized. This book contains invited review, expository, and original research articles dealing with, and presenting state-of-the-art accounts of, the recent advances in the subjects of social data and artificial intelligence, and potentially their links to Cyberspace.