TSST: A Benchmark and Evaluation Models for Text Speech-Style Transfer
Text style is highly abstract, as it encompasses various aspects of a
speaker's characteristics, habits, logical thinking, and the content they
express. However, previous text-style transfer tasks have primarily focused on
data-driven approaches, lacking in-depth analysis and research from the
perspectives of linguistics and cognitive science. In this paper, we introduce
a novel task called Text Speech-Style Transfer (TSST). The main objective is to
further explore topics related to human cognition, such as personality and
emotion, based on the capabilities of existing LLMs. Considering the objective
of our task and the distinctive characteristics of oral speech in real-life
scenarios, we trained multi-dimensional (i.e., filler words, vividness,
interactivity, emotionality) evaluation models for the TSST and validated their
correlation with human assessments. We thoroughly analyze the performance of
several large language models (LLMs) and identify areas where further
improvement is needed. Moreover, driven by our evaluation models, we have
released a new corpus that improves the capabilities of LLMs in generating text
with speech-style characteristics. In summary, we present the TSST task, a new
benchmark for style transfer that emphasizes human-oriented evaluation,
exploring and advancing the performance of current LLMs.
Comment: Work in progress
Multimedia content description framework
A framework is provided for describing multimedia content and a system in which a plurality of multimedia storage devices employing the content description methods of the present invention can interoperate. In accordance with one form of the present invention, the content description framework is a description scheme (DS) for describing streams or aggregations of multimedia objects, which may comprise audio, images, video, text, time series, and various other modalities. This description scheme can accommodate an essentially limitless number of descriptors in terms of features, semantics or metadata, and facilitate content-based search, index, and retrieval, among other capabilities, for both streamed and aggregated multimedia objects.
An Investigation of the Persuasive Effects of Rhetorical Questions, Message Framing, and the ELM in Promoting Responsible Cell Phone Usage
This study evaluated persuasive messages that advocate support for a ban against cell phones while driving using Petty and Cacioppo's Elaboration Likelihood Model of persuasion as its theoretical framework. Seven hypotheses were tested using a 2 x 2 x 2 factorial design assessing the influence of need for cognition (high vs. low) in tandem with the variables of message framing (gain vs. loss statements) and message form (questions vs. statements) upon assessments of elaboration (ME), cognition message value (CMV), message effectiveness ratings (MEF), and attitude toward the prescribed behavior (ATPB).
A significant main effect was found for message framing as positively framed messages produced more positive ratings for CMV, the degree to which individuals found the advocacy to be intellectually stimulating and worthwhile as vehicles for persuasion.
A pair of significant two-way interactions was detected: (1) high need-for-cognition individuals registered a stronger commitment toward the prescribed behavior ("don't use a cell phone while driving") when exposed to negatively framed messages, and (2) low need-for-cognition receivers exposed to negatively framed messages registered a greater willingness to adopt the targeted behavior, that is, a future intent not to use a cell phone while driving. This latter result partially contradicted the original hypothesis.
Contextual awareness, messaging and communication in nomadic audio environments
Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1998. Includes bibliographical references (p. 119-122). Nitin Sawhney. M.S.
Automatic Mobile Video Remixing and Collaborative Watching Systems
In the thesis, the implications of combining collaboration with automation for remix creation are analyzed. We first present a sensor-enhanced Automatic Video Remixing System (AVRS), which intelligently processes mobile videos in combination with mobile device sensor information. The sensor-enhanced AVRS involves certain architectural choices, which meet the key system requirements (leverage user-generated content, use sensor information, reduce end-user burden) and user experience requirements. Architecture adaptations are required to improve certain key performance parameters. In addition, certain operating parameters need to be constrained for real-world deployment feasibility. Subsequently, sensorless cloud-based AVRS and low-footprint sensorless AVRS approaches are presented. The three approaches exemplify the importance of operating parameter tradeoffs for system design. The approaches cover a wide spectrum, ranging from a multimodal multi-user client-server system (sensor-enhanced AVRS) to a mobile application which can automatically generate a multi-camera remix experience from a single video. Next, we present the findings from the four user studies involving 77 users related to automatic mobile video remixing. The goal was to validate selected system design goals, provide insights for additional features and identify the challenges and bottlenecks. Topics studied include the role of automation, the value of a video remix as event memorabilia, the requirements for different types of events and the perceived user value from creating a multi-camera remix from a single video. System design implications derived from the user studies are presented. Subsequently, sport summarization, which is a specific form of remix creation, is analyzed. In particular, the role of the content capture method is analyzed with two complementary approaches.
The first approach performs saliency detection in casually captured mobile videos; in contrast, the second one creates multi-camera summaries from role-based captured content. Furthermore, a method for interactive customization of the summary is presented. Next, the discussion is extended to include the role of users' situational context and the consumed content in facilitating a collaborative watching experience. Mobile-based collaborative watching architectures are described, which facilitate a common shared context between the participants. The concept of movable multimedia is introduced to highlight the multi-device environment of present-day users. The thesis presents results which have been derived from end-to-end system prototypes tested in real-world conditions and corroborated with extensive user impact evaluation.
Recent Trends in Computational Intelligence
Traditional models struggle to cope with complexity, noise, and changing environments, while Computational Intelligence (CI) offers solutions to complicated problems as well as inverse problems. The main feature of CI is adaptability, spanning the fields of machine learning and computational neuroscience. CI also comprises biologically inspired technologies such as swarm intelligence, as part of evolutionary computation, and encompasses wider areas such as image processing, data collection, and natural language processing. This book aims to discuss the use of CI for optimally solving various applications, demonstrating its wide reach and relevance. Combining optimization methods with data mining strategies makes a strong and reliable prediction tool for handling real-life applications.
Creating an iPED Tour of Nantucket
To enhance visitor learning and enjoyment, museums are transitioning from the traditional delivery of information via maps and guidebooks to the use of handheld interpretive and wayfinding devices. The Nantucket Historical Association (NHA) wanted a handheld device to disseminate information about its historic sites. To address this need, we evaluated handheld technologies, tested their acceptability among NHA patrons, developed our own prototype tour, and then tested it. Our project resulted in an expandable prototype tour and recommendations for the NHA.