1,497 research outputs found
Accessibility at Film Festivals: Guidelines for Inclusive Subtitling
In today's media-dominated world, the imperative for accessibility has never been greater, and ensuring that audiovisual experiences cater to individuals with sensory disabilities has become a pressing concern. One of the key initiatives in this endeavour is inclusive subtitling (IS), a practice rooted in the broader contexts of subtitling for the deaf and hard of hearing (SDH/CC), audiovisual translation studies (AVTS), media accessibility studies (MAS), and the evolving field of Deaf studies (DS). This study aims to offer a comprehensive exploration of how inclusive subtitling contributes to fostering accessible and inclusive audiovisual experiences, with a particular focus on its implications within the unique environment of film festivals. To gain a holistic perspective of inclusive subtitling, it is essential to examine its lineage in relation to analogous practices, which is the focus of the first chapter. Inclusive subtitling is an extension of SDH/CC, designed for individuals with hearing impairments, and SDH/CC, in turn, is a nuanced variation of traditional subtitling extensively explored within the realm of AVTS. To encapsulate the diverse techniques and modalities aimed at making audiovisual content universally accessible, the study recognises the term "Audiovisual Accessibility" (AVA). The second chapter explores the interconnection of accessibility studies (AS), AVTS, and MAS, highlighting their symbiotic relationship and their role in framing inclusive subtitles within these fields. These interconnections are pivotal in shaping a framework for the practice of inclusive subtitling, enabling a comprehensive examination of its applicability and research implications. The third chapter delves into Deaf studies and the evolution of Deafhood, which hinges on the history and culture of Deaf individuals. 
This chapter elucidates the distinction between "deafness" as a medical construct and "Deafhood" as a cultural identity, a distinction crucial to understanding audiovisual accessibility and its intersection with the Deaf community's perspectives. In the fourth chapter, the focus turns to film festivals, with a specific emphasis on the crucial role of subtitles in enhancing accessibility, particularly when films are presented in their original languages. The chapter marks a critical point, highlighting the inherent connection between subtitles and the immersive nature of film festivals that aspire to promote inclusivity in the cinematic experience. This emphasis on inclusivity extends to the evolution of film festivals themselves, giving rise to more advanced forms, including accessible film festivals and Deaf film festivals. At the core of the chapter is a thorough examination of the corpus: the SDH/CC of films from the 2020 to 2023 editions of two highly significant film festivals, BFI Flare and the London Film Festival. The corpus serves as the foundation upon which my research unfolds, providing a nuanced understanding of the role subtitles play in film festival contexts. The main chapter, chapter five, thoroughly analyses the technical and linguistic aspects of inclusive subtitling, drawing insights from the Inclusive Subtitling Guidelines, a two-version document I devised, and offering real-world applications supported by a case study at an Italian film festival and another case study of the short film Pure, with the relevant inclusive subtitles file annexed. In conclusion, the research sets the stage for a comprehensive exploration of inclusive subtitling's role in ensuring accessible and inclusive audiovisual experiences, particularly within film festivals. It underscores the importance of accessibility in the world of audiovisual media and highlights the need for inclusive practices that cater to diverse audiences.
OpenAGI: When LLM Meets Domain Experts
Human intelligence has the remarkable ability to assemble basic skills into
complex ones so as to solve complex tasks. This ability is equally important
for Artificial Intelligence (AI), and thus, we assert that in addition to the
development of large, comprehensive intelligent models, it is equally crucial
to equip such models with the capability to harness various domain-specific
expert models for complex task-solving in the pursuit of Artificial General
Intelligence (AGI). Recent developments in Large Language Models (LLMs) have
demonstrated remarkable learning and reasoning abilities, making them promising
as a controller to select, synthesize, and execute external models to solve
complex tasks. In this project, we develop OpenAGI, an open-source AGI research
platform, specifically designed to offer complex, multi-step tasks and
accompanied by task-specific datasets, evaluation metrics, and a diverse range
of extensible models. OpenAGI formulates complex tasks as natural language
queries, serving as input to the LLM. The LLM subsequently selects,
synthesizes, and executes models provided by OpenAGI to address the task.
Furthermore, we propose a Reinforcement Learning from Task Feedback (RLTF)
mechanism, which uses the task-solving result as feedback to improve the LLM's
task-solving ability. Thus, the LLM is responsible for synthesizing various
external models for solving complex tasks, while RLTF provides feedback to
improve its task-solving ability, enabling a feedback loop for self-improving
AI. We believe that the paradigm of LLMs operating various expert models for
complex task-solving is a promising approach towards AGI. To facilitate the
community's long-term improvement and evaluation of AGI's ability, we
open-source the code, benchmark, and evaluation methods of the OpenAGI project
at https://github.com/agiresearch/OpenAGI.
Comment: 18 pages, 6 figures, 7 tables
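The control loop described above (an LLM choosing and chaining expert models, with the task result fed back as a learning signal) can be sketched in miniature. Everything below is a toy stand-in, not OpenAGI's actual API: the expert models, the controller's selection rule, and the scoring function are all hypothetical.

```python
# Toy sketch of the LLM-as-controller loop: a "controller" maps a natural
# language task to a chain of expert models, the chain is executed, and a
# task score serves as feedback (the RLTF idea). All names are hypothetical.

# Stand-in expert models: each transforms a piece of data (here, a string).
EXPERT_MODELS = {
    "denoise": lambda x: x.replace("~", ""),  # pretend noise removal
    "translate": lambda x: x.upper(),         # pretend translation
}

def controller(task_query):
    """Stand-in for the LLM: pick a chain of expert models for the task."""
    chain = []
    if "clean" in task_query:
        chain.append("denoise")
    if "translate" in task_query:
        chain.append("translate")
    return chain

def execute_chain(chain, data):
    """Run the selected expert models in sequence."""
    for name in chain:
        data = EXPERT_MODELS[name](data)
    return data

def task_score(output, expected):
    """Task feedback signal used to improve the controller (RLTF)."""
    return 1.0 if output == expected else 0.0

chain = controller("clean and translate the text")
result = execute_chain(chain, "he~llo")
print(chain, result, task_score(result, "HELLO"))
```

In the real platform the controller is a trained LLM and the feedback updates its policy; here the loop only illustrates the data flow.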
CiteSee: Augmenting Citations in Scientific Papers with Persistent and Personalized Historical Context
When reading a scholarly article, inline citations help researchers
contextualize the current article and discover relevant prior work. However, it
can be challenging to prioritize and make sense of the hundreds of citations
encountered during literature reviews. This paper introduces CiteSee, a paper
reading tool that leverages a user's publishing, reading, and saving activities
to provide personalized visual augmentations and context around citations.
First, CiteSee connects the current paper to familiar contexts by surfacing
known citations a user had cited or opened. Second, CiteSee helps users
prioritize their exploration by highlighting relevant but unknown citations
based on saving and reading history. We conducted a lab study that suggests
CiteSee is significantly more effective for paper discovery than three
baselines. A field deployment study shows CiteSee helps participants keep track
of their explorations and leads to better situational awareness and increased
paper discovery via inline citations when conducting real-world literature
reviews.
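The prioritization idea (surfacing citations weighted by a user's citing, opening, and saving history) can be illustrated with a toy scoring function. The action weights and the history structure below are assumptions for illustration, not CiteSee's actual design:

```python
# Hypothetical weights: citing a paper signals more familiarity than opening
# it, which signals more than merely saving it.
HISTORY_WEIGHTS = {"cited": 3, "opened": 2, "saved": 1}

def citation_score(paper_id, history):
    """Sum the weights of every past action the user took on this paper."""
    return sum(w for action, w in HISTORY_WEIGHTS.items()
               if paper_id in history.get(action, set()))

def prioritize(citations, history):
    """Order inline citations so the most familiar/relevant come first."""
    return sorted(citations, key=lambda p: citation_score(p, history),
                  reverse=True)

history = {"cited": {"p1"}, "opened": {"p1", "p3"}, "saved": {"p2"}}
print(prioritize(["p1", "p2", "p3", "p4"], history))
```

A never-seen paper scores zero, which matches the tool's goal of visually separating known citations from unexplored ones.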
Dataset And Deep Neural Network Based Approach To Audio Question Answering
Audio question answering (AQA) is a multimodal task in which a system analyzes an audio signal and a question in natural language to produce an answer in natural language. In this thesis, a new dataset for audio question answering, Clotho-AQA, consisting of 1991 audio files each between 15 and 30 seconds in duration, is presented. For each audio file in the dataset, six different questions and their corresponding answers were crowdsourced using Amazon Mechanical Turk (AMT). The questions and their corresponding answers were created by different annotators. Of the six questions for each audio file, two each were designed to have "yes" and "no" as answers respectively, while the remaining two have other single-word answers. For every question, answers from three independent annotators were collected. In this thesis, two baseline experiments are presented to demonstrate the use of the Clotho-AQA dataset: a multimodal binary classifier for "yes"/"no" answers and a multimodal multi-class classifier for single-word answers, both based on long short-term memory (LSTM) layers. The binary classifier achieved an accuracy of 62.7%, and the multi-class classifier achieved a top-1 accuracy of 54.2% and a top-5 accuracy of 93.7%. Further, an attention-based model was proposed, which increased the binary classifier's accuracy to 66.2% and the top-1 and top-5 multi-class accuracies to 57.5% and 99.8% respectively. Some drawbacks of the Clotho-AQA dataset, such as the presence of the same answer words in different tenses, singular and plural forms, etc., which are treated as different classes in the classification problem, were addressed, and a refined version called Clotho-AQA_v2 is also presented. On this refined dataset, the multimodal baseline model achieved top-1 and top-5 accuracies of 59.8% and 96.6% respectively, while the attention-based model achieved top-1 and top-5 accuracies of 61.3% and 99.6% respectively.
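The binary baseline's late-fusion idea (encode each modality, combine, squash to a yes/no probability) can be caricatured without any deep-learning framework. The real baselines use LSTM encoders; the encoders and weights below are toy stand-ins chosen only to show the data flow:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode_audio(features):
    """Stand-in for an LSTM audio encoder: mean-pool the frame features."""
    return sum(features) / len(features)

def encode_question(tokens):
    """Stand-in for an LSTM text encoder: a toy length-based embedding."""
    return sum(len(t) for t in tokens) / 10.0

def answer_yes_no(audio_features, question_tokens, w=(1.0, -0.5), b=0.0):
    """Fuse both encodings and threshold the probability at 0.5."""
    a = encode_audio(audio_features)
    q = encode_question(question_tokens)
    p_yes = sigmoid(w[0] * a + w[1] * q + b)
    return ("yes" if p_yes >= 0.5 else "no"), p_yes

ans, p = answer_yes_no([0.2, 0.8, 0.5], "is a dog barking".split())
print(ans, round(p, 3))
```

The multi-class variant would replace the sigmoid with a softmax over the single-word answer vocabulary.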
TaleCrafter: Interactive Story Visualization with Multiple Characters
Accurate story visualization requires several necessary elements, such as
identity consistency across frames, the alignment between plain text and visual
content, and a reasonable layout of objects in images. Most previous works
endeavor to meet these requirements by fitting a text-to-image (T2I) model on a
set of videos in the same style and with the same characters, e.g., the
FlintstonesSV dataset. However, the learned T2I models typically struggle to
adapt to new characters, scenes, and styles, and often lack the flexibility to
revise the layout of the synthesized images. This paper proposes a system for
generic interactive story visualization, capable of handling multiple novel
characters and supporting the editing of layout and local structure. It is
developed by leveraging the prior knowledge of large language and T2I models,
trained on massive corpora. The system comprises four interconnected
components: story-to-prompt generation (S2P), text-to-layout generation (T2L),
controllable text-to-image generation (C-T2I), and image-to-video animation
(I2V). First, the S2P module converts concise story information into detailed
prompts required for subsequent stages. Next, T2L generates diverse and
reasonable layouts based on the prompts, offering users the ability to adjust
and refine the layout to their preference. The core component, C-T2I, enables
the creation of images guided by layouts, sketches, and actor-specific
identifiers to maintain consistency and detail across visualizations. Finally,
I2V enriches the visualization process by animating the generated images.
Extensive experiments and a user study are conducted to validate the
effectiveness and flexibility of interactive editing of the proposed system.
Comment: GitHub repository: https://github.com/VideoCrafter/TaleCrafter
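The four-stage pipeline named in the abstract (S2P, T2L, C-T2I, I2V) composes naturally as a chain of functions. This is a structural sketch only; the bodies below are placeholders, not the paper's generative models:

```python
def s2p(story):
    """Story-to-prompt: expand concise story lines into detailed prompts."""
    return [f"detailed prompt for: {line}" for line in story]

def t2l(prompts):
    """Text-to-layout: propose a bounding box per prompt (user-editable)."""
    return [{"prompt": p, "boxes": [(0.1, 0.1, 0.5, 0.5)]} for p in prompts]

def c_t2i(layouts):
    """Controllable text-to-image: render one image per layout, guided by
    the layout boxes (and, in the real system, sketches and identity)."""
    return [f"image<{i}>" for i, _ in enumerate(layouts)]

def i2v(images):
    """Image-to-video: animate each generated image into a clip."""
    return [f"clip({img})" for img in images]

story = ["a fox finds a key", "the fox opens a door"]
clips = i2v(c_t2i(t2l(s2p(story))))
print(clips)  # one animated clip per story line
```

The interactive editing described in the abstract corresponds to letting the user modify the `t2l` output before it reaches `c_t2i`.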
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Transformer is a deep neural network that employs a self-attention mechanism
to comprehend the contextual relationships within sequential data. Unlike
conventional neural networks or updated versions of Recurrent Neural Networks
(RNNs) such as Long Short-Term Memory (LSTM), transformer models excel in
handling long dependencies between input sequence elements and enable parallel
processing. As a result, transformer-based models have attracted substantial
interest among researchers in the field of artificial intelligence. This can be
attributed to their immense potential and remarkable achievements, not only in
Natural Language Processing (NLP) tasks but also in a wide range of domains,
including computer vision, audio and speech processing, healthcare, and the
Internet of Things (IoT). Although several survey papers have been published
highlighting the transformer's contributions in specific fields, architectural
differences, or performance evaluations, there is still a significant absence
of a comprehensive survey paper encompassing its major applications across
various domains. Therefore, we undertook the task of filling this gap by
conducting an extensive survey of proposed transformer models from 2017 to
2022. Our survey encompasses the identification of the top five application
domains for transformer-based models, namely: NLP, Computer Vision,
Multi-Modality, Audio and Speech Processing, and Signal Processing. We analyze
the impact of highly influential transformer-based models in these domains and
subsequently classify them based on their respective tasks using a proposed
taxonomy. Our aim is to shed light on the existing potential and future
possibilities of transformers for enthusiastic researchers, thus contributing
to the broader understanding of this groundbreaking technology.
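The self-attention mechanism the survey centers on reduces to attention(Q, K, V) = softmax(QK^T / sqrt(d)) V. A minimal, dependency-free sketch on plain lists (toy inputs, single head, no learned projections):

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(q, k, v):
    """attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    # Every query attends to every key: this all-pairs comparison is what
    # lets transformers capture long-range dependencies in parallel.
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
              for qi in q]
    weights = [softmax(row) for row in scores]
    # Each output row is a convex combination of the value rows.
    return [[sum(w * vj[t] for w, vj in zip(row, v))
             for t in range(len(v[0]))] for row in weights]

# Two tokens in dimension 2; each output row mixes the value rows according
# to how strongly its query matches each key.
q = k = v = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(q, k, v)
print(out)
```

Real transformers add learned Q/K/V projections, multiple heads, and masking, but the core computation is exactly this weighted mixing.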
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval
Multi-channel video-language retrieval requires models to understand
information from different channels (e.g. video+question, video+speech) to
correctly link a video with a textual response or query. Fortunately,
contrastive multimodal models are shown to be highly effective at aligning
entities in images/videos and text, e.g., CLIP; text contrastive models are
extensively studied recently for their strong ability of producing
discriminative sentence embeddings, e.g., SimCSE. However, there is not a clear
way to quickly adapt these two lines to multi-channel video-language retrieval
with limited data and resources. In this paper, we identify a principled model
design space with two axes: how to represent videos and how to fuse video and
text information. Based on categorization of recent methods, we investigate the
options of representing videos using continuous feature vectors or discrete
text tokens; for the fusion method, we explore the use of a multimodal
transformer or a pretrained contrastive text model. We extensively evaluate the
four combinations on five video-language datasets. We surprisingly find that
discrete text tokens coupled with a pretrained contrastive text model yields
the best performance, which can even outperform state-of-the-art on the iVQA
and How2QA datasets without additional training on millions of video-text data.
Further analysis shows that this is because representing videos as text tokens
captures the key visual information and text tokens are naturally aligned with
text models that are strong retrievers after the contrastive pretraining
process. All the empirical analysis establishes a solid foundation for future
research on affordable and upgradable multimodal intelligence.
Comment: To appear in CVPR 2023; the code will be released at
https://github.com/XudongLinthu/upgradable-multimodal-intelligence
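Once a video is represented as text tokens and both sides are embedded by a contrastive text model, retrieval reduces to nearest-neighbor search under cosine similarity. A minimal sketch, with toy vectors standing in for SimCSE-style embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = (math.sqrt(sum(a * a for a in u))
           * math.sqrt(sum(b * b for b in v)))
    return num / den

def rank(query_vec, candidates):
    """Return candidate ids sorted by similarity to the query, best first."""
    return sorted(candidates, key=lambda c: cosine(query_vec, candidates[c]),
                  reverse=True)

videos = {                 # "video as text tokens" -> toy embedding
    "cooking": [0.9, 0.1, 0.0],
    "football": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.1]    # toy embedding of a cooking-related query
print(rank(query, videos))
```

The paper's finding is that this simple text-space matching, with no extra video-text pretraining, can outperform heavier fusion approaches.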
Specialized translation at work for a small expanding business: my experience internationalizing Bioretics© S.r.l. into Chinese
Global markets are currently immersed in two all-encompassing and unstoppable processes: internationalization and globalization. While the former pushes companies to look beyond the borders of their country of origin to forge relationships with foreign trading partners, the latter fosters standardization across countries by reducing spatiotemporal distances and breaking down geographical, political, economic and socio-cultural barriers. In recent decades, another domain has emerged to propel these unifying drives: Artificial Intelligence, together with its technologies aiming to implement human cognitive abilities in machinery. The "Language Toolkit – Le lingue straniere al servizio dell'internazionalizzazione dell'impresa" project, promoted by the Department of Interpreting and Translation (Forlì Campus) in collaboration with the Romagna Chamber of Commerce (Forlì-Cesena and Rimini), seeks to help Italian SMEs make their way into the global market. It is precisely within this project that this dissertation was conceived. Its purpose is to present the translation and localization project from English into Chinese of a series of texts produced by Bioretics© S.r.l.: an investor deck, the company website, and part of the installation and use manual of the Aliquis© framework software, the company's flagship product. The dissertation is structured as follows: Chapter 1 presents the project and the company in detail; Chapter 2 outlines the internationalization and globalization processes and the Artificial Intelligence market in both Italy and China; Chapter 3 provides the theoretical foundations for every aspect of specialized translation, including website localization; Chapter 4 describes the resources and tools used to perform the translations; Chapter 5 proposes an analysis of the source texts; Chapter 6 is a commentary on translation strategies and choices.
Speculative futures on ChatGPT and generative artificial intelligence (AI): a collective reflection from the educational landscape
While ChatGPT has recently become very popular, AI has a long history and philosophy. This paper explores the promises and pitfalls of Generative Pre-trained Transformer (GPT) AI and potential future technologies by adopting a speculative methodology. Speculative future narratives with a specific focus on educational contexts are provided in an attempt to identify emerging themes and discuss their implications for education in the 21st century. Affordances of (using) AI in Education (AIEd) and possible adverse effects emerging from the narratives are identified and discussed. It is argued that now is the best of times to define the human vs. AI contribution to education, because AI can accomplish more and more of the educational activities that used to be the prerogative of human educators. It is therefore imperative to rethink the respective roles of technology and human educators in education with a future-oriented mindset.