1,827 research outputs found
Augmenting conversations through context-aware multimedia retrieval based on speech recognition
Future environments will be sensitive and responsive to the presence of people, supporting their everyday activities, tasks and rituals in an easy and natural way. Such interactive spaces will use information and communication technologies to bring computation into the physical world, in order to enhance the ordinary activities of their users. This paper describes a speech-based multimedia retrieval system that can present relevant video-podcast (vodcast) footage in response to spontaneous speech and conversations during daily life activities. The proposed system allows users to search the spoken content of multimedia files rather than their associated meta-information, and lets them navigate to the exact portion where the queried words are spoken, by facilitating within-medium searches of multimedia content through a bag-of-words approach. Finally, we have studied the proposed system on different scenarios, using English-language vodcasts from various categories as the target multimedia, and discussed how it could enhance people's everyday activities in scenarios including education, entertainment, marketing, news and the workplace.
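The within-medium search the abstract describes can be sketched as a timestamped inverted index over a speech-recognition transcript: each recognized word maps to the times it is spoken, and a bag-of-words query returns the offsets to jump to. This is a minimal illustration, not the paper's implementation; the transcript data and function names are assumptions.

```python
from collections import defaultdict

def build_index(transcript):
    """Map each spoken word to the timestamps (seconds) at which it occurs."""
    index = defaultdict(list)
    for start_time, word in transcript:
        index[word.lower()].append(start_time)
    return index

def search(index, query):
    """Bag-of-words lookup: return sorted time offsets where any query word is spoken."""
    hits = []
    for word in query.lower().split():
        hits.extend(index.get(word, []))
    return sorted(hits)

# hypothetical (timestamp, word) pairs from a speech recognizer
transcript = [(0.0, "welcome"), (1.2, "to"), (1.5, "the"), (1.8, "lecture"),
              (42.7, "today"), (43.1, "we"), (43.4, "discuss"),
              (44.0, "lecture"), (44.6, "planning")]
index = build_index(transcript)
print(search(index, "lecture planning"))  # offsets a player could seek to
```

A real system would index recognizer lattices or n-best hypotheses rather than a single transcript, to tolerate recognition errors.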
Utilizing Review Summarization in a Spoken Recommendation System
In this paper we present a framework for spoken recommendation systems. To provide reliable recommendations to users, we incorporate a review summarization technique which extracts informative opinion summaries from grass-roots users' reviews. The dialogue system then utilizes these review summaries to support both quality-based opinion inquiry and feature-specific entity search. We propose a probabilistic language generation approach to automatically creating recommendations in spoken natural language from the text-based opinion summaries. A user study in the restaurant domain shows that the proposed approaches can effectively generate reliable and helpful recommendations in human-computer conversations.
T-Party Project; Quanta Computer (Firm)
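Generating a spoken recommendation from an opinion summary can be illustrated with template selection and slot filling; the templates, entity name, and summary structure below are assumptions for illustration, and the random choice stands in for the paper's probabilistic generation model.

```python
import random

def generate_recommendation(entity, summary, seed=0):
    """Pick a surface template (a stand-in for probabilistic template
    selection) and fill it with the summarized pros and cons."""
    rng = random.Random(seed)
    templates = [
        "{name} is praised for its {pros}, though some mention {cons}.",
        "Reviewers like the {pros} at {name}; a common complaint is {cons}.",
    ]
    template = rng.choice(templates)
    return template.format(name=entity,
                           pros=" and ".join(summary["pros"]),
                           cons=summary["cons"][0])

# hypothetical opinion summary extracted from restaurant reviews
summary = {"pros": ["friendly service", "fresh seafood"], "cons": ["long waits"]}
print(generate_recommendation("Harbor Grill", summary))
```

The output string would then be passed to a text-to-speech component to close the spoken dialogue loop.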
Audio Transcription and Summarization System using Cloud Computing and Artificial Intelligence
In the modern era, organizations increasingly rely on virtual meetings to address customer issues promptly and effectively. However, dealing with recorded customer calls can be arduous. This review abstract introduces an innovative methodology to summarize audio data from customer interactions, which can streamline virtual meetings. Leveraging a speech recognizer, like AssemblyAI's API, the methodology converts audio data into text, and then employs a Graph-theoretic approach to generate concise summaries.
This review abstract delves into the growing prominence of cloud-based AI and ML services in the tech industry. It underscores the unique competitive strategies and focuses of major players, namely Amazon, Microsoft, and Google, in the realm of AI and ML platform development. The analysis explores these companies' internal applications and external ecosystems, dissecting their respective AI and ML development strategies. Finally, it predicts future directions for AI and ML platforms, including potential business models and emerging trends, while considering how Amazon, Microsoft, and Google align their platform development strategies with these future prospects.
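The graph-theoretic summarization step mentioned above can be sketched as follows: build a sentence-similarity graph over the transcript (edges weighted by word overlap) and keep the most central sentences. This is a generic TextRank-style sketch under assumed inputs, not the specific method of the reviewed system.

```python
def summarize(sentences, k=1):
    """Graph-theoretic extractive summary: weight each pair of sentences by
    Jaccard word overlap and keep the k sentences with highest total weight."""
    words = [set(s.lower().split()) for s in sentences]
    centrality = []
    for i, wi in enumerate(words):
        score = sum(len(wi & wj) / len(wi | wj)
                    for j, wj in enumerate(words) if i != j)
        centrality.append((score, i))
    top = sorted(i for _, i in sorted(centrality, reverse=True)[:k])
    return [sentences[i] for i in top]

# hypothetical sentences transcribed from a recorded customer call
calls = [
    "The customer reported the app crashes on login.",
    "The app crashes whenever the customer tries to login.",
    "Weather was discussed briefly.",
]
print(summarize(calls, k=1))
```

Production systems typically replace raw word overlap with TF-IDF or embedding similarity and run PageRank on the graph instead of using degree centrality.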
Abstractive Multi-Document Summarization via Phrase Selection and Merging
We propose an abstraction-based multi-document summarization framework that
can construct new sentences by exploring more fine-grained syntactic units than
sentences, namely, noun/verb phrases. Different from existing abstraction-based
approaches, our method first constructs a pool of concepts and facts
represented by phrases from the input documents. Then new sentences are
generated by selecting and merging informative phrases to maximize the salience
of phrases and meanwhile satisfy the sentence construction constraints. We
employ integer linear optimization for conducting phrase selection and merging
simultaneously in order to achieve the global optimal solution for a summary.
Experimental results on the benchmark data set TAC 2011 show that our framework
outperforms the state-of-the-art models under automated pyramid evaluation
metric, and achieves reasonably well results on manual linguistic quality
evaluation.Comment: 11 pages, 1 figure, accepted as a full paper at ACL 201
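The core of salience-maximizing phrase selection under a length constraint can be illustrated with a toy exhaustive search; the phrases, salience scores, and budget below are invented for illustration, and the brute-force loop stands in for the paper's integer linear program (which also handles merging constraints).

```python
from itertools import combinations

def select_phrases(phrases, budget):
    """Pick the subset of (phrase, salience) pairs with maximum total
    salience whose combined word count fits the summary budget.
    Exhaustive search stands in for an ILP solver here."""
    best, best_score = [], 0.0
    for r in range(1, len(phrases) + 1):
        for subset in combinations(phrases, r):
            length = sum(len(p.split()) for p, _ in subset)
            score = sum(s for _, s in subset)
            if length <= budget and score > best_score:
                best, best_score = list(subset), score
    return best

# hypothetical phrases mined from a document cluster, with salience scores
phrases = [("rescue teams", 0.9), ("reached the flooded town", 0.8),
           ("on Tuesday morning", 0.3), ("amid heavy rain", 0.4)]
selected = select_phrases(phrases, budget=6)
print(selected)
```

A real ILP formulation would add binary compatibility variables so that only grammatically mergeable phrases can be combined into one sentence.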
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Recently, there has been growing interest in extending the context length of instruction-following models in order to effectively process single-turn long input (e.g. summarizing a paper) and conversations with more extensive histories. While proprietary models such as GPT-4 and Claude have demonstrated considerable advancements in handling tens of thousands of tokens of context, open-sourced models are still in the early stages of experimentation. It also remains unclear whether developing these long-context models can offer substantial gains on practical downstream tasks over retrieval-based methods or models simply trained on chunked contexts. To address this challenge, we propose to institute standardized evaluation for long context language models. Concretely, we develop L-Eval, which contains 411 long documents and over 2,000 query-response pairs, manually annotated and checked by the authors, encompassing areas such as law, finance, school lectures, lengthy conversations, news, long-form novels, and meetings. L-Eval also adopts diverse evaluation methods and instruction styles, enabling a more reliable assessment of Long Context Language Models (LCLMs). Our findings indicate that while open-source models typically lag behind their commercial counterparts, they still exhibit impressive performance. LLaMA2 achieves the best results (45% wins vs. turbo-16k) on open-ended tasks with only a 4k context length, and ChatGLM2 achieves the best results on closed-ended tasks with 8k input tokens. We release our new evaluation suite, code, and all generation results, including predictions from all open-sourced LCLMs, GPT4-32k, and Claude-100k, at https://github.com/OpenLMLab/LEval
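The win-rate figure quoted for open-ended tasks can be illustrated with a small scoring helper: in pairwise judging, a win counts as 1 point, a tie as 0.5, and a loss as 0, averaged over all comparisons. The scoring convention and the sample judgements are assumptions for illustration, not L-Eval's exact protocol.

```python
def win_rate(judgements):
    """Pairwise win rate: win = 1, tie = 0.5, loss = 0, as a percentage
    over all comparisons against a reference model."""
    points = {"win": 1.0, "tie": 0.5, "loss": 0.0}
    total = sum(points[j] for j in judgements)
    return 100.0 * total / len(judgements)

# hypothetical per-example judgements of model A vs. a reference model
judgements = ["win", "loss", "tie", "win"]
print(win_rate(judgements))  # 62.5
```

In practice the per-example judgements come from human annotators or an LLM judge comparing two models' responses to the same query.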
Bringing a Design to the Table
This writing reflects upon an artist and designer's shared experience of making a film about a design and planning team's collaborative process. The film, entitled "(Re)searching a Welsh Design Vernacular", documents a meeting that took place on Friday 3rd October 2008 at Grwp Gwalia, a social housing organisation based in Swansea, Wales. The film was exhibited as part of "Reflecting Wales: an architectural exhibition of innovative, speculative and built work in Wales" at the Senedd in October 2008.
Access to recorded interviews: A research agenda
Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.