
    Learning Robust Visual-Semantic Embedding for Generalizable Person Re-identification

    Generalizable person re-identification (Re-ID) is an active research topic in machine learning and computer vision, and it plays a significant role in real-world scenarios owing to its many applications in public security and video surveillance. However, previous methods mainly focus on visual representation learning and neglect to explore the potential of semantic features during training, which easily leads to poor generalization when adapted to a new domain. In this paper, we propose a Multi-Modal Equivalent Transformer, called MMET, for more robust visual-semantic embedding learning on visual, textual and visual-textual tasks, respectively. To further enhance robust feature learning in the context of the transformer, a dynamic masking mechanism called the Masked Multimodal Modeling strategy (MMM) is introduced to mask both image patches and text tokens. It can work jointly on multimodal or unimodal data and significantly boosts the performance of generalizable person Re-ID. Extensive experiments on benchmark datasets demonstrate the competitive performance of our method over previous approaches. We hope this method can advance research on visual-semantic representation learning. Our source code is publicly available at https://github.com/JeremyXSC/MMET
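    The masking step described for MMM can be pictured with a minimal Python sketch. This is not the MMET implementation. The abstract gives no masking ratios or replacement values, so the function names `masked_multimodal_modeling` and `mask_sequence`, the default ratios, the zero-vector patch replacement, and the `[MASK]` text token are all hypothetical. The sketch only illustrates the stated idea that one masking routine handles image patches, text tokens, or both, drawing a fresh random mask on each call.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

MASK_TOKEN = "[MASK]"  # hypothetical placeholder for a masked text token


def mask_sequence(items: Sequence[T], ratio: float, mask_value: T,
                  rng: random.Random) -> Tuple[List[T], List[int]]:
    """Replace a random `ratio` fraction of `items` with `mask_value`.

    Returns the masked copy and the sorted indices that were masked.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    n_mask = int(len(items) * ratio)  # floor, so very short inputs may mask nothing
    masked_idx = sorted(rng.sample(range(len(items)), n_mask))
    out = list(items)
    for i in masked_idx:
        out[i] = mask_value
    return out, masked_idx


def masked_multimodal_modeling(patches: Sequence[Tuple[float, ...]],
                               tokens: Sequence[str],
                               patch_ratio: float = 0.5,   # hypothetical default
                               token_ratio: float = 0.15,  # hypothetical default
                               seed: Optional[int] = None) -> Dict[str, list]:
    """Mask image patches and text tokens independently.

    Either modality may be empty, so the same call covers image-only,
    text-only and image-text inputs. Unless `seed` is fixed, a new mask
    is drawn on every call, which is what makes the masking dynamic.
    """
    rng = random.Random(seed)
    # Assumption: a masked patch is replaced by an all-zero vector of the same size.
    zero_patch = tuple(0.0 for _ in patches[0]) if patches else ()
    masked_patches, patch_idx = mask_sequence(patches, patch_ratio, zero_patch, rng)
    masked_tokens, token_idx = mask_sequence(tokens, token_ratio, MASK_TOKEN, rng)
    return {
        "patches": masked_patches,
        "patch_idx": patch_idx,
        "tokens": masked_tokens,
        "token_idx": token_idx,
    }
```

    In a training loop, a loss would typically be computed only at the returned masked indices. How MMET actually uses the masked positions is not described in the abstract.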

    The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions

    The Metaverse offers a second world beyond reality, where boundaries are non-existent and possibilities are endless through engagement and immersive experiences using virtual reality (VR) technology. Many disciplines can benefit from the advancement of the Metaverse when it is accurately developed, including technology, gaming, education, art, and culture. Nevertheless, developing the Metaverse environment to its full potential is an ambiguous task that needs proper guidance and direction. Existing surveys on the Metaverse focus only on a specific aspect or discipline of the Metaverse and lack a holistic view of the entire process. A more holistic, multidisciplinary, and in-depth review, oriented toward both academia and industry, is therefore required to provide a thorough study of the Metaverse development pipeline. To address these issues, we present in this survey a novel multi-layered pipeline ecosystem composed of (1) the Metaverse computing, networking, communications and hardware infrastructure, (2) environment digitization, and (3) user interactions. For every layer, we discuss the components that detail the steps of its development. For each of these components, we also examine the impact of a set of enabling technologies and empowering domains (e.g., Artificial Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on its advancement. In addition, we explain the importance of these technologies in supporting decentralization, interoperability, user experiences, interactions, and monetization. Our study highlights the existing challenges for each component, followed by research directions and potential solutions. To the best of our knowledge, this survey is the most comprehensive to date and enables users, scholars, and entrepreneurs to gain an in-depth understanding of the Metaverse ecosystem and to find their opportunities and potential areas of contribution.

    Mobile Arts for Peace: Small Grants Evaluation Report

    The Mobile Arts for Peace (MAP) project is an international study that seeks to provide a comparative approach to peacebuilding utilising interdisciplinary arts-based practices, working with communities in Indonesia, Kyrgyzstan, Nepal and Rwanda (see figure 1.1). This research was commissioned by the project lead organisation, the University of Lincoln, and has been delivered by the University of Northampton’s Institute for Social Innovation and Impact (see Appendix A for research biographies). This report focuses on the Small Grants awarded across the four countries and acts as a follow-up to the Phase One Report produced in the winter of 2021. The Small Grants projects have been delivered over the last 12 months across these four countries, and this report uses a narrative case-study approach to demonstrate how the Small Grants work has promoted arts-based peacebuilding and supported community cohesion. The research reported in this document took place between February and October 2022 and focused on the research aim and four key research questions below. Aim: To evaluate the efficacy of the MAP Small Grants projects and understand their impact in communities. Specifically: 1. What outputs were delivered through the Small Grants projects? 2. What outcomes for beneficiaries/stakeholders were delivered through the Small Grants projects? 3. What impacts for communities and societies across the four countries were delivered through the Small Grants projects?
The report is structured as follows: first, the methodological approach undertaken in the evaluation is presented; second, the case studies across the four countries are presented and discussed, drawing on data gathered by the in-country research teams and the arts-based outputs produced; third, the findings are summarised, together with specific recommendations for the MAP Large Grant evaluation projects and the recently awarded MAP Medium Grant projects. References and Appendices can be found at the end of the report.

    Comedians without a Cause: The Politics and Aesthetics of Humour in Dutch Cabaret (1966-2020)

    Comedians play an important role in society and public debate. While comedians have been considered important cultural critics for quite some time, comedy has acquired a new social and political significance in recent years, with humour taking centre stage in political and social debates around issues of identity, social justice, and freedom of speech. To understand the shifting meanings and political implications of humour within a Dutch context, this PhD thesis examines the political and aesthetic workings of humour in the highly popular Dutch cabaret genre, focusing on cabaret performances from the 1960s to the present. The central questions of the thesis are: how do comedians use humour to deliver social critique, and how does their humour resonate with political ideologies? These questions are answered by adopting a cultural studies approach to humour, which is used to analyse Dutch cabaret performances, and by studying related materials such as reviews and media interviews with comedians. This thesis shows that, from the 1960s onwards, Dutch comedians have been considered ‘progressive rebels’ – politically engaged, subversive, and carrying a left-wing political agenda – but that this image is in need of correction. While we tend to look for progressive political messages in the work of comedians who present themselves as being anti-establishment rebels – such as Youp van ‘t Hek, Hans Teeuwen, and Theo Maassen – this thesis demonstrates that their transgressive and provocative humour tends to protect social hierarchies and relationships of power. Moreover, it shows that, paradoxically, both the deliberately moderate and nuanced humour of Wim Kan and Claudia de Breij, and the seemingly past-oriented nostalgia of Alex Klaasen, are more radical and progressive than the transgressive humour of van ‘t Hek, Teeuwen and Maassen. 
Finally, comedians who present absurdist or deconstructionist forms of humour, such as the early student cabarets, Freek de Jonge, and Micha Wertheim, tend to dissociate themselves from explicit political engagement. By challenging the dominant image of the Dutch comedian as a ‘progressive rebel’, this thesis contributes to a better understanding of humour in the present cultural moment, in which humour is often either not taken seriously or one-sidedly celebrated as merely pleasurable, innocent, or progressively liberating. As a result, this thesis concludes, the ‘dark’ and more conservative sides of humour tend to be obscured.

    People make Places

    For centuries Glasgow, as a bucolic fishing village and ecclesiastical centre on the banks of the River Clyde, held little strategic significance. When success, and later threats, came to the city, they were a consequence of explosive growth during the industrial era, which left a significant civic presence accompanied by social and environmental challenges. Wartime damage to the fabric of the city and the subsequent implementation of modernist planning left Glasgow with a series of existential threats to the lives and health of its people that have taken time to understand and come to terms with. In a few remarkable decades of late 20th-century regeneration, Glasgow began to be put back together. The trauma of the second half of the 20th century is fading but is not yet a distant memory. Existential threats from the climate emergency can provoke the reaction “what, again?” However, the resilience built over the last 50 years has instilled a belief in a constructive, proactive and creative approach to this challenge, along with the recognition that such action can be transformational for safeguarding and improving people’s lives and the quality of their places. This process, described as a just transition, has become central to Glasgow’s approach. Of Scotland’s four big cities, three are surrounded by landscape and sea; only Glasgow is surrounded by itself. Even with a small territory, Glasgow is still the largest of Scotland’s big cities, and by some margin. When the wider metropolitan area is considered, Glasgow is – like Birmingham, Manchester and Liverpool – no mean city. People make Places begins with a review of the concept and complexities of place, discusses why these matter and reviews the growing body of evidence that place quality can deliver economic, social and environmental value.
The following chapters focus on the history and evolution of modern Glasgow in four eras: 19th- and early 20th-century industrialisation; mid-20th-century de-industrialisation and modernism; late 20th-century regeneration; and a 21st-century recovery towards transition and renaissance. They also document the process, synthesis and results of a major engagement programme and explore systematic approaches to place and to consensus building around the principal issues. The second half of the work takes stock of place in contemporary Glasgow, looking at the city through the lenses of an international, a metropolitan and an everyday city, and concludes with a review of the places of Glasgow and what may be learned from them, with insights presented in a series of Place Stories. The concluding chapter sets out the findings of the investigation and analysis, reviewing place goals, challenges and opportunities for Glasgow over the decades to 2030 and 2040, and ends with recommendations about what Glasgow might do better to combine place thinking and climate awareness, setting out practical steps to mobilise Glasgow’s ‘place ecosystem’.

    SViTT: Temporal Learning of Sparse Video-Text Transformers

    Do video-text transformers learn to model temporal relationships across frames? Despite their immense capacity and the abundance of multimodal training data, recent work has revealed the strong tendency of video-text models towards frame-based spatial representations, while temporal reasoning remains largely unsolved. In this work, we identify several key challenges in temporal learning of video-text transformers: the spatiotemporal trade-off from limited network size; the curse of dimensionality for multi-frame modeling; and the diminishing returns of semantic information from extending clip length. Guided by these findings, we propose SViTT, a sparse video-text architecture that performs multi-frame reasoning at significantly lower cost than naive transformers with dense attention. Analogous to graph-based networks, SViTT employs two forms of sparsity: edge sparsity, which limits the query-key communications between tokens in self-attention, and node sparsity, which discards uninformative visual tokens. Trained with a curriculum that increases model sparsity with clip length, SViTT outperforms dense transformer baselines on multiple video-text retrieval and question answering benchmarks at a fraction of the computational cost. Project page: http://svcl.ucsd.edu/projects/svitt. Comment: CVPR 202
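    The two forms of sparsity and the length-dependent curriculum can be illustrated with a small Python sketch. The abstract does not state which attention pattern, token-scoring rule, or schedule SViTT uses. The local window for edge sparsity, top-k pruning by a per-token score for node sparsity, and the `keep_ratio_for_clip` schedule are therefore assumptions made for illustration, and every function name here is hypothetical.

```python
from typing import List, Sequence


def local_attention_mask(n_tokens: int, window: int) -> List[List[bool]]:
    """Edge sparsity (assumed pattern): query i may attend to key j only if |i - j| <= window."""
    return [[abs(i - j) <= window for j in range(n_tokens)]
            for i in range(n_tokens)]


def count_edges(mask: List[List[bool]]) -> int:
    """Number of query-key pairs that are actually computed."""
    return sum(sum(row) for row in mask)


def prune_tokens(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Node sparsity (assumed rule): keep the highest-scoring tokens in original order."""
    if not scores:
        return []
    k = max(1, int(len(scores) * keep_ratio))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def keep_ratio_for_clip(n_frames: int, base_frames: int = 4,
                        min_keep: float = 0.1) -> float:
    """Curriculum (hypothetical schedule): keep fewer visual tokens as clips get longer."""
    return max(min_keep, base_frames / max(n_frames, base_frames))
```

    With a window of w, each query attends to at most 2w + 1 keys, so the number of computed query-key pairs grows linearly with the number of tokens rather than quadratically. This is the kind of saving that edge sparsity targets, while the pruning step shrinks the token count itself.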

    Machine Learning Research Trends in Africa: A 30 Years Overview with Bibliometric Analysis Review

    In this paper, a critical bibliometric analysis is conducted, coupled with an extensive literature survey on recent developments and associated applications in machine learning research from an African perspective. The bibliometric analysis covers 2761 machine learning-related documents, of which 98% were articles with at least 482 citations published in 903 journals during the past 30 years. The collated documents were retrieved from the Science Citation Index EXPANDED and comprise research publications from 54 African countries between 1993 and 2021. The bibliometric study visualizes the current landscape and future trends in machine learning research and its applications, in order to facilitate future collaborative research and knowledge exchange among authors from different research institutions across the African continent.

    Learning disentangled speech representations

    A variety of informational factors are contained within the speech signal, and a single short recording of speech reveals much more than the spoken words. The best method to extract and represent informational factors from the speech signal ultimately depends on which factors are desired and how they will be used. In addition, some methods capture more than one informational factor at the same time, such as speaker identity, spoken content, and speaker prosody. The goal of this dissertation is to explore different ways to deconstruct the speech signal into abstract representations that can be learned and later reused in various speech technology tasks. This task of deconstruction, also known as disentanglement, is a form of distributed representation learning. As a general approach to disentanglement, there are some guiding principles that describe what a learned representation should contain as well as how it should function. In particular, learned representations should contain all of the requisite information in a more compact form, be interpretable, remove nuisance factors of irrelevant information, be useful in downstream tasks, and be independent of the task at hand. The learned representations should also be able to answer counterfactual questions. In some cases, learned speech representations can be re-assembled in different ways according to the requirements of downstream applications. For example, in a voice conversion task, the speech content is retained while the speaker identity is changed. In a content-privacy task, some targeted content may be concealed without affecting how the surrounding words sound. While there is no single best method to disentangle all types of factors, some end-to-end approaches demonstrate a promising degree of generalization to diverse speech tasks.
This thesis explores a variety of use cases for disentangled representations, including phone recognition, speaker diarization, linguistic code-switching, voice conversion, and content-based privacy masking. Speech representations can also be utilised for automatically assessing the quality and authenticity of speech, such as automatic MOS ratings or deepfake detection. The meaning of the term "disentanglement" is not well defined in previous work, and it has acquired several meanings depending on the domain (e.g. image vs. speech). Sometimes the term "disentanglement" is used interchangeably with the term "factorization". This thesis proposes that disentanglement of speech is distinct, and offers a viewpoint of disentanglement that can be considered both theoretically and practically.