1,497 research outputs found

    Accessibility at Film Festivals: Guidelines for Inclusive Subtitling

    Get PDF
    In today's media-dominated world, the imperative for accessibility has never been greater, and ensuring that audiovisual experiences cater to individuals with sensory disabilities has become a pressing concern. One of the key initiatives in this endeavour is inclusive subtitling (IS), a practice rooted in the broader contexts of subtitling for the deaf and hard of hearing (SDH/CC), audiovisual translation studies (AVTS), media accessibility studies (MAS), and the evolving field of Deaf studies (DS). This study aims to offer a comprehensive exploration of how inclusive subtitling contributes to fostering accessible and inclusive audiovisual experiences, with a particular focus on its implications within the unique environment of film festivals. To gain a holistic perspective on inclusive subtitling, it is essential to examine its lineage in relation to analogous practices, which is the focus of the first chapter. Inclusive subtitling is an extension of SDH/CC, designed for individuals with hearing impairments, and SDH/CC, in turn, is a nuanced variation of traditional subtitling extensively explored within the realm of AVTS. To encapsulate the diverse techniques and modalities aimed at making audiovisual content universally accessible, the study adopts the term "Audiovisual Accessibility" (AVA). The second chapter explores the interconnection of accessibility studies (AS), AVTS, and MAS, highlighting their symbiotic relationship and their role in framing inclusive subtitles within these fields. These interconnections are pivotal in shaping a framework for the practice of inclusive subtitling, enabling a comprehensive examination of its applicability and research implications. The third chapter delves into Deaf studies and the evolution of Deafhood, which is grounded in the history and culture of Deaf individuals. This chapter elucidates the distinction between ‘deafness’ as a medical construct and ‘Deafhood’ as a cultural identity, a distinction crucial to the understanding of audiovisual accessibility and its intersection with the Deaf community's perspectives. In the fourth chapter, the focus turns to film festivals, with a specific emphasis on the crucial role of subtitles in enhancing accessibility, particularly when films are presented in their original languages. The chapter marks a critical point, highlighting the inherent connection between subtitles and the immersive nature of film festivals that aspire to promote inclusivity in the cinematic experience. The emphasis on inclusivity extends to the evolution of film festivals, giving rise to more advanced forms, including accessible film festivals and Deaf film festivals. At the core of the chapter is a thorough examination of the corpus, specifically the SDH/CC of films from the 2020 to 2023 editions of two highly significant film festivals, BFI Flare and the London Film Festival. The corpus serves as the foundation upon which my research unfolds, providing a nuanced understanding of the role subtitles play in film festival contexts. The main chapter, chapter five, thoroughly analyses the technical and linguistic aspects of inclusive subtitling, drawing insights from the Inclusive Subtitling Guidelines, a two-version document that I devised, and offering real-world applications supported by a case study at an Italian film festival and another case study of the short film Pure, with the relevant inclusive subtitles file annexed.
    In conclusion, the research sets the stage for a comprehensive exploration of inclusive subtitling's role in ensuring accessible and inclusive audiovisual experiences, particularly within film festivals. It underscores the importance of accessibility in the world of audiovisual media and highlights the need for inclusive practices that cater to diverse audiences.

    OpenAGI: When LLM Meets Domain Experts

    Full text link
    Human intelligence has the remarkable ability to assemble basic skills into complex ones so as to solve complex tasks. This ability is equally important for Artificial Intelligence (AI), and thus, we assert that in addition to the development of large, comprehensive intelligent models, it is equally crucial to equip such models with the capability to harness various domain-specific expert models for complex task-solving in the pursuit of Artificial General Intelligence (AGI). Recent developments in Large Language Models (LLMs) have demonstrated remarkable learning and reasoning abilities, making them promising as a controller to select, synthesize, and execute external models to solve complex tasks. In this project, we develop OpenAGI, an open-source AGI research platform, specifically designed to offer complex, multi-step tasks and accompanied by task-specific datasets, evaluation metrics, and a diverse range of extensible models. OpenAGI formulates complex tasks as natural language queries, serving as input to the LLM. The LLM subsequently selects, synthesizes, and executes models provided by OpenAGI to address the task. Furthermore, we propose a Reinforcement Learning from Task Feedback (RLTF) mechanism, which uses the task-solving result as feedback to improve the LLM's task-solving ability. Thus, the LLM is responsible for synthesizing various external models for solving complex tasks, while RLTF provides feedback to improve its task-solving ability, enabling a feedback loop for self-improving AI. We believe that the paradigm of LLMs operating various expert models for complex task-solving is a promising approach towards AGI. To facilitate the community's long-term improvement and evaluation of AGI's ability, we open-source the code, benchmark, and evaluation methods of the OpenAGI project at https://github.com/agiresearch/OpenAGI. Comment: 18 pages, 6 figures, 7 tables.
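
    A minimal sketch may make the controller-plus-experts loop described above concrete: a task phrased in natural language is mapped to a sequence of expert models, the sequence is executed, and the task score is fed back in the spirit of RLTF. Everything below (plan_with_llm, EXPERT_MODELS, rltf_update, the fixed plan) is a hypothetical stand-in, not the OpenAGI codebase.

```python
# Hypothetical sketch of the "LLM as controller" loop: plan -> execute -> score -> feedback.
from typing import Callable, Dict, List

# Registry of domain-specific expert models (stubs standing in for real models).
EXPERT_MODELS: Dict[str, Callable[[str], str]] = {
    "denoise_image": lambda x: f"denoised({x})",
    "caption_image": lambda x: f"caption of {x}",
    "translate_text": lambda x: f"translation of {x}",
}

def plan_with_llm(task_query: str) -> List[str]:
    """Stand-in for the LLM planner: map a task description to a model pipeline."""
    # A real system would prompt an LLM; here we return a fixed, plausible plan.
    return ["denoise_image", "caption_image", "translate_text"]

def execute_plan(plan: List[str], task_input: str) -> str:
    """Run the selected expert models in sequence, piping each output to the next input."""
    data = task_input
    for name in plan:
        data = EXPERT_MODELS[name](data)
    return data

def task_score(output: str) -> float:
    """Placeholder for a benchmark's evaluation metric (e.g., accuracy or BLEU)."""
    return 1.0 if "translation" in output else 0.0

def rltf_update(plan: List[str], reward: float) -> None:
    """Placeholder for task-feedback learning: the reward would be used to
    fine-tune the planning LLM; here we only log it."""
    print(f"reward={reward:.2f} for plan {plan}")

if __name__ == "__main__":
    query = "Given a noisy image, describe it in German."
    plan = plan_with_llm(query)
    result = execute_plan(plan, "noisy_image.png")
    rltf_update(plan, task_score(result))
```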

    CiteSee: Augmenting Citations in Scientific Papers with Persistent and Personalized Historical Context

    Full text link
    When reading a scholarly article, inline citations help researchers contextualize the current article and discover relevant prior work. However, it can be challenging to prioritize and make sense of the hundreds of citations encountered during literature reviews. This paper introduces CiteSee, a paper reading tool that leverages a user's publishing, reading, and saving activities to provide personalized visual augmentations and context around citations. First, CiteSee connects the current paper to familiar contexts by surfacing known citations a user had cited or opened. Second, CiteSee helps users prioritize their exploration by highlighting relevant but unknown citations based on saving and reading history. We conducted a lab study that suggests CiteSee is significantly more effective for paper discovery than three baselines. A field deployment study shows CiteSee helps participants keep track of their explorations and leads to better situational awareness and increased paper discovery via inline citations when conducting real-world literature reviews.
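
    To make the personalization idea concrete, the small sketch below labels inline citations using a reader's history: previously cited or opened papers are marked as familiar context, and unseen citations that recur in the reader's saved papers are flagged as worth prioritizing. The data structures, labels, and threshold are assumptions for illustration, not the CiteSee implementation.

```python
# Hypothetical sketch of history-based citation labeling (familiar / prioritize / plain).
from collections import Counter
from typing import Dict, Iterable, List, Set

def annotate_citations(
    inline_citations: Iterable[str],
    previously_cited_or_opened: Set[str],
    citations_in_saved_papers: List[str],
    min_saved_count: int = 2,
) -> Dict[str, str]:
    """Return a label per citation: 'familiar', 'prioritize', or 'plain'."""
    saved_counts = Counter(citations_in_saved_papers)
    labels = {}
    for paper_id in inline_citations:
        if paper_id in previously_cited_or_opened:
            labels[paper_id] = "familiar"      # known context from past activity
        elif saved_counts[paper_id] >= min_saved_count:
            labels[paper_id] = "prioritize"    # unknown but recurring in saved work
        else:
            labels[paper_id] = "plain"
    return labels

print(annotate_citations(
    ["paperA", "paperB", "paperC"],
    previously_cited_or_opened={"paperA"},
    citations_in_saved_papers=["paperB", "paperB", "paperD"],
))
```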

    Dataset And Deep Neural Network Based Approach To Audio Question Answering

    Get PDF
    Audio question answering (AQA) is a multimodal task in which a system analyzes an audio signal and a question in natural language to produce the desired answer in natural language. In this thesis, a new dataset for audio question answering, Clotho-AQA, consisting of 1,991 audio files each between 15 and 30 seconds in duration, is presented. For each audio file in the dataset, six different questions and their corresponding answers were crowdsourced using Amazon Mechanical Turk (AMT). The questions and their corresponding answers were created by different annotators. Out of the six questions for each audio file, two were designed to have ‘yes’ as the answer and two to have ‘no’, while the remaining two questions have other single-word answers. For every question, answers from three independent annotators were collected. In this thesis, two baseline experiments are presented to portray the usage of the Clotho-AQA dataset: a multimodal binary classifier for ‘yes’ or ‘no’ answers and a multimodal multi-class classifier for single-word answers, both based on long short-term memory (LSTM) layers. The binary classifier achieved an accuracy of 62.7%, and the multi-class classifier achieved a top-1 accuracy of 54.2% and a top-5 accuracy of 93.7%. Further, an attention-based model was proposed, which increased the binary classifier accuracy to 66.2% and the top-1 and top-5 multi-class classifier accuracy to 57.5% and 99.8% respectively. Some drawbacks of the Clotho-AQA dataset, such as the presence of the same answer words in different tenses, singular and plural forms, etc., which are treated as different classes in the classification problem, were addressed, and a refined version called Clotho-AQA_v2 is also presented. The multimodal baseline model achieved a top-1 and top-5 accuracy of 59.8% and 96.6% respectively, while the attention-based model achieved a top-1 and top-5 accuracy of 61.3% and 99.6% respectively on this refined dataset.
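
    The following is a hedged sketch of what such an LSTM-based multimodal classifier could look like: one LSTM encodes the audio features, another encodes the question tokens, and the concatenated final hidden states are classified. The dimensions, concatenation fusion, and single-layer LSTMs are assumptions for illustration, not the thesis code.

```python
# Hypothetical LSTM-based multimodal answer classifier (PyTorch).
import torch
import torch.nn as nn

class AudioQuestionClassifier(nn.Module):
    def __init__(self, audio_feat_dim=64, vocab_size=5000, embed_dim=128,
                 hidden_dim=256, num_classes=2):
        super().__init__()
        self.audio_lstm = nn.LSTM(audio_feat_dim, hidden_dim, batch_first=True)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.text_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, audio_feats, question_ids):
        # audio_feats: (batch, time, audio_feat_dim); question_ids: (batch, tokens)
        _, (audio_h, _) = self.audio_lstm(audio_feats)
        _, (text_h, _) = self.text_lstm(self.embed(question_ids))
        fused = torch.cat([audio_h[-1], text_h[-1]], dim=-1)  # concatenate final states
        return self.classifier(fused)  # 2 logits for yes/no; more classes for open answers

model = AudioQuestionClassifier()
logits = model(torch.randn(4, 100, 64), torch.randint(0, 5000, (4, 12)))
print(logits.shape)  # torch.Size([4, 2])
```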

    TaleCrafter: Interactive Story Visualization with Multiple Characters

    Full text link
    Accurate story visualization requires several necessary elements, such as identity consistency across frames, the alignment between plain text and visual content, and a reasonable layout of objects in images. Most previous works endeavor to meet these requirements by fitting a text-to-image (T2I) model on a set of videos in the same style and with the same characters, e.g., the FlintstonesSV dataset. However, the learned T2I models typically struggle to adapt to new characters, scenes, and styles, and often lack the flexibility to revise the layout of the synthesized images. This paper proposes a system for generic interactive story visualization, capable of handling multiple novel characters and supporting the editing of layout and local structure. It is developed by leveraging the prior knowledge of large language and T2I models, trained on massive corpora. The system comprises four interconnected components: story-to-prompt generation (S2P), text-to-layout generation (T2L), controllable text-to-image generation (C-T2I), and image-to-video animation (I2V). First, the S2P module converts concise story information into detailed prompts required for subsequent stages. Next, T2L generates diverse and reasonable layouts based on the prompts, offering users the ability to adjust and refine the layout to their preference. The core component, C-T2I, enables the creation of images guided by layouts, sketches, and actor-specific identifiers to maintain consistency and detail across visualizations. Finally, I2V enriches the visualization process by animating the generated images. Extensive experiments and a user study are conducted to validate the effectiveness and flexibility of interactive editing of the proposed system. Comment: GitHub repository: https://github.com/VideoCrafter/TaleCrafter
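
    The staging of the four components can be illustrated with a small orchestration sketch. The functions below are stubs with hypothetical signatures, not the TaleCrafter API; they only show how S2P, T2L, C-T2I, and I2V chain together, with the T2L output being the natural point for user edits.

```python
# Hypothetical orchestration of the four-stage story-visualization pipeline.
from typing import Dict, List

def story_to_prompts(story: str) -> List[str]:
    """S2P: expand a short story into one detailed prompt per shot (stub)."""
    return [f"Shot {i + 1}: {sentence.strip()}"
            for i, sentence in enumerate(story.split(".")) if sentence.strip()]

def text_to_layout(prompt: str) -> List[Dict]:
    """T2L: propose bounding boxes for the entities in a prompt (stub, user-editable)."""
    return [{"entity": "character", "box": (0.1, 0.2, 0.4, 0.9)}]

def controllable_t2i(prompt: str, layout: List[Dict], identity_tokens: List[str]) -> str:
    """C-T2I: render an image conditioned on layout, sketch, and identity tokens (stub)."""
    return f"image[{prompt} | {len(layout)} boxes | ids={identity_tokens}]"

def image_to_video(image: str) -> str:
    """I2V: animate a still frame into a short clip (stub)."""
    return f"video[{image}]"

story = "A fox finds a lantern. The fox carries it through the forest."
for prompt in story_to_prompts(story):
    layout = text_to_layout(prompt)      # a user could adjust these boxes before rendering
    frame = controllable_t2i(prompt, layout, identity_tokens=["<fox>"])
    print(image_to_video(frame))
```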

    A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks

    Full text link
    The transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data. Unlike conventional neural networks or updated versions of Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM), transformer models excel at handling long-range dependencies between input sequence elements and enable parallel processing. As a result, transformer-based models have attracted substantial interest among researchers in the field of artificial intelligence. This can be attributed to their immense potential and remarkable achievements, not only in Natural Language Processing (NLP) tasks but also in a wide range of domains, including computer vision, audio and speech processing, healthcare, and the Internet of Things (IoT). Although several survey papers have been published highlighting the transformer's contributions in specific fields, architectural differences, or performance evaluations, there is still a significant absence of a comprehensive survey paper encompassing its major applications across various domains. Therefore, we undertook the task of filling this gap by conducting an extensive survey of proposed transformer models from 2017 to 2022. Our survey encompasses the identification of the top five application domains for transformer-based models, namely: NLP, Computer Vision, Multi-Modality, Audio and Speech Processing, and Signal Processing. We analyze the impact of highly influential transformer-based models in these domains and subsequently classify them based on their respective tasks using a proposed taxonomy. Our aim is to shed light on the existing potential and future possibilities of transformers for enthusiastic researchers, thus contributing to the broader understanding of this groundbreaking technology.
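
    For readers new to the mechanism the survey centres on, a textbook scaled dot-product self-attention in a few lines shows how every position attends to every other position in parallel, which is what lets transformers capture long-range dependencies. This is generic attention, not any specific surveyed model.

```python
# Textbook single-head scaled dot-product self-attention (NumPy).
import numpy as np

def self_attention(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray) -> np.ndarray:
    """x: (seq_len, d_model); projection matrices: (d_model, d_head)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the sequence
    return weights @ v                                 # context-mixed representations

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 16))                           # 6 tokens, model width 16
w = [rng.normal(size=(16, 8)) for _ in range(3)]       # query/key/value projections
print(self_attention(x, *w).shape)                     # (6, 8)
```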

    Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval

    Full text link
    Multi-channel video-language retrieval requires models to understand information from different channels (e.g., video+question, video+speech) to correctly link a video with a textual response or query. Fortunately, contrastive multimodal models are shown to be highly effective at aligning entities in images/videos and text, e.g., CLIP; text contrastive models have been extensively studied recently for their strong ability to produce discriminative sentence embeddings, e.g., SimCSE. However, there is no clear way to quickly adapt these two lines of work to multi-channel video-language retrieval with limited data and resources. In this paper, we identify a principled model design space with two axes: how to represent videos and how to fuse video and text information. Based on a categorization of recent methods, we investigate the options of representing videos using continuous feature vectors or discrete text tokens; for the fusion method, we explore the use of a multimodal transformer or a pretrained contrastive text model. We extensively evaluate the four combinations on five video-language datasets. We surprisingly find that discrete text tokens coupled with a pretrained contrastive text model yield the best performance, which can even outperform the state of the art on the iVQA and How2QA datasets without additional training on millions of video-text pairs. Further analysis shows that this is because representing videos as text tokens captures the key visual information, and text tokens are naturally aligned with text models that are strong retrievers after the contrastive pretraining process. All the empirical analysis establishes a solid foundation for future research on affordable and upgradable multimodal intelligence. Comment: To appear in CVPR 2023; the code will be released at https://github.com/XudongLinthu/upgradable-multimodal-intelligence
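
    The best-performing combination reported above can be sketched as follows: the video is represented by discrete text tokens (e.g., captions or transcribed speech), fused with the query as plain text, and scored against candidate responses by a contrastive sentence encoder. The encoder below is a self-contained stand-in for a SimCSE-style model, not the authors' model or a real checkpoint.

```python
# Hypothetical sketch: video-as-text-tokens + contrastive text encoder for retrieval.
import numpy as np

def text_encoder(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in for a pretrained contrastive sentence encoder; a hash-seeded
    random projection keeps the sketch self-contained and runnable."""
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def retrieval_score(video_text_tokens: list, query: str, candidate: str) -> float:
    """Fuse video-derived text with the query, then compare to a candidate response."""
    fused = " ".join(video_text_tokens) + " " + query
    return float(text_encoder(fused) @ text_encoder(candidate))  # cosine of unit vectors

video_tokens = ["a person slices onions", "speech: now add them to the pan"]
for answer in ["they are frying onions", "they are repairing a bike"]:
    print(answer, retrieval_score(video_tokens, "what are they doing?", answer))
```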

    Specialized translation at work for a small expanding company: my experience of internationalization into Chinese for Bioretics© S.r.l.

    Get PDF
    Global markets are currently immersed in two all-encompassing and unstoppable processes: internationalization and globalization. While the former pushes companies to look beyond the borders of their country of origin to forge relationships with foreign trading partners, the latter fosters standardization across countries by reducing spatiotemporal distances and breaking down geographical, political, economic and socio-cultural barriers. In recent decades, another domain has emerged to propel these unifying drives: Artificial Intelligence, with its advanced technologies that aim to reproduce human cognitive abilities in machines. The “Language Toolkit – Le lingue straniere al servizio dell’internazionalizzazione dell’impresa” project, promoted by the Department of Interpreting and Translation (Forlì Campus) in collaboration with the Romagna Chamber of Commerce (Forlì-Cesena and Rimini), seeks to help Italian SMEs make their way into the global market. It is precisely within this project that this dissertation was conceived. Its purpose is to present the translation and localization project from English into Chinese of a series of texts produced by Bioretics© S.r.l.: an investor deck, the company website and part of the installation and use manual of the Aliquis© framework software, its flagship product. This dissertation is structured as follows: Chapter 1 presents the project and the company in detail; Chapter 2 outlines the internationalization and globalization processes and the Artificial Intelligence market both in Italy and in China; Chapter 3 provides the theoretical foundations for every aspect related to Specialized Translation, including website localization; Chapter 4 describes the resources and tools used to perform the translations; Chapter 5 proposes an analysis of the source texts; Chapter 6 is a commentary on translation strategies and choices.

    Speculative futures on ChatGPT and generative artificial intelligence (AI): a collective reflection from the educational landscape

    Get PDF
    While ChatGPT has recently become very popular, AI has a long history and philosophy. This paper intends to explore the promises and pitfalls of the Generative Pre-trained Transformer (GPT) AI and potential future technologies by adopting a speculative methodology. Speculative future narratives with a specific focus on educational contexts are provided in an attempt to identify emerging themes and discuss their implications for education in the 21st century. Affordances of (using) AI in Education (AIEd) and possible adverse effects that emerge from the narratives are identified and discussed. It is argued that now is the best of times to define the human vs. AI contribution to education, because AI can accomplish more and more educational activities that used to be the prerogative of human educators. Therefore, it is imperative to rethink the respective roles of technology and human educators in education with a future-oriented mindset.