From Audio Paper to Next Generation Paper
It has been 24 years since the publication of Wellner's (1993)
DigitalDesk, demonstrating the augmentation of paper
documents with projected information. Since then there have
been many related developments in computing, including the
world wide web, e-book readers, maturation of the augmented
reality paradigm, embedded and printed electronics, and the
internet of things. In this talk I draw on some of my own design
explorations of augmenting paper with sound over the years, to
illustrate the value of ‘audiopaper’ but also the way these
explorations were rooted in the applications and technology of
the day. I show that two key technologies have been important
to the implementation of audiopaper over the years, and that the
bigger opportunity is in connecting paper to the web. This
culminates in a vision for two future generations of paper which
communicate and interact with the digital devices around them.
Sparks of Large Audio Models: A Survey and Outlook
This survey paper provides a comprehensive overview of the recent
advancements and challenges in applying large language models to the field of
audio signal processing. Audio processing, with its diverse signal
representations and a wide range of sources--from human voices to musical
instruments and environmental sounds--poses challenges distinct from those
found in traditional Natural Language Processing scenarios. Nevertheless,
\textit{Large Audio Models}, epitomized by transformer-based architectures,
have shown marked efficacy in this sphere. By leveraging massive amounts of
data, these models have demonstrated prowess in a variety of audio tasks,
spanning from Automatic Speech Recognition and Text-To-Speech to Music
Generation, among others. Notably, Foundational Audio Models such as
SeamlessM4T have recently begun to act as universal
translators, supporting multiple speech tasks for up to 100 languages without
any reliance on separate task-specific systems. This paper presents an in-depth
analysis of state-of-the-art methodologies regarding \textit{Foundational Large
Audio Models}, their performance benchmarks, and their applicability to
real-world scenarios. We also highlight current limitations and provide
insights into potential future research directions in the realm of
\textit{Large Audio Models} with the intent to spark further discussion,
thereby fostering innovation in the next generation of audio-processing
systems. Furthermore, to keep pace with the rapid development in this area, we
will continually update a repository of recent articles and
their open-source implementations at
https://github.com/EmulationAI/awesome-large-audio-models.
Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed
Speechreading, or lipreading, is the technique of recovering phonetic
information from a speaker's visual features, such as the movement of the lips,
face, teeth and tongue. It has a wide range of multimedia applications, such as
in surveillance, Internet telephony, and as an aid to a person with hearing
impairments. However, most of the work in speechreading has been limited to
text generation from silent videos. Recently, research has started venturing
into generating (audio) speech from silent video sequences but there have been
no developments thus far in dealing with divergent views and poses of a
speaker. Thus, although multiple camera feeds of a speaker are often available,
they have not been exploited to handle these different poses. To this end,
this paper presents the world's first multi-view speechreading and
reconstruction system. This work pushes the boundaries of multimedia research
by putting forth a model which leverages
silent video feeds from multiple cameras recording the same subject to generate
intelligible speech for a speaker. Initial results confirm the usefulness of
exploiting multiple camera views in building an efficient speech reading and
reconstruction system. It further shows the optimal placement of cameras which
would lead to the maximum intelligibility of speech. Finally, it lays out
various innovative applications of the proposed system, focusing on its
potentially prodigious impact not just in the security arena but in many other
multimedia analytics problems.
Comment: 2018 ACM Multimedia Conference (MM '18), October 22--26, 2018, Seoul,
Republic of Korea.
Real-time systems development with SDL and next generation validation tools
The language SDL has long been applied in the development of various kinds of systems. Real-time systems are one application area where SDL has been applied extensively. Whilst SDL allows certain aspects of real-time systems to be modelled, the language and its associated tool support have certain drawbacks for modelling and reasoning about such systems. In this paper we highlight the limitations of SDL and its associated tool support in this domain, and present language extensions and next generation real-time system tool support to help overcome them. The applicability of the extensions and tools is demonstrated through a case study based upon a multimedia binding object used to support a configuration of time-dependent information producers and consumers realising the so-called lip-synchronisation algorithm.
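The lip-synchronisation requirement in the case study can be illustrated with a minimal sketch (an illustrative assumption on my part, not the paper's SDL binding-object model): a presenter keeps the skew between audio and video presentation timestamps within a perceptual tolerance, dropping late video frames and delaying early ones. The 80 ms tolerance and the function name `sync_action` are hypothetical choices for the example.

```python
# Illustrative lip-sync decision rule (not the paper's SDL model):
# keep audio/video skew within a perceptual tolerance.

SKEW_TOLERANCE_MS = 80.0  # assumed bound for acceptable lip-sync


def sync_action(video_pts_ms: float, audio_pts_ms: float,
                tolerance_ms: float = SKEW_TOLERANCE_MS) -> str:
    """Decide what to do with the next video frame relative to the audio clock."""
    skew = video_pts_ms - audio_pts_ms
    if abs(skew) <= tolerance_ms:
        return "play"   # within tolerance: present the frame now
    if skew < 0:
        return "drop"   # video lags audio: discard the frame to catch up
    return "wait"       # video leads audio: delay presentation
```

A time-dependent consumer would apply this rule per frame, which is the kind of timing constraint the paper argues is awkward to express in unextended SDL.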
What did the Romans ever do for us? ‘Next generation’ networks and hybrid learning resources
Networked learning is fundamentally concerned with the use of information and communication technologies (ICT) to link people to people and resources, to support the process of learning. This paper explores some current and forthcoming changes in ICT and some potential implications of these developments for networked learning. Whilst we aim to avoid taking a technologically determinist stance, we explore the potential for future practice and how some educational and pedagogic practices are evolving to exploit and shape the digital environment. We argue that we can change both the ways in which connections between people (learners and other learners; learners and tutors) are made and the nature of the resources that learning communities (particularly distributed communities) can engage with. In doing this we draw on two strands of work. Firstly, we draw on 'IBZL Education', a UK Open University initiative to develop new scholarship in the context of STEM (Science, Technology, Engineering and Mathematics), through which educators are encouraged to think about technological change in the next five to ten years and ways in which we can intervene in and shape these developments. We use problem-based learning as an example of a learning experience that can be difficult to implement in a networked learning environment. IBZL identified two broad strands of significant technological development. 'Superfast' broadband networks capable of supporting novel applications are being rolled out in the UK (and elsewhere). Also, boundaries between the real and virtual worlds are becoming blurred, as in the 'internet of things' where, for example, RFID tags enable information about the real world to be brought into the virtual one. We use the term 'artefact' to describe designed components, whether entirely digital, such as a computer forum, or material, such as a tablet PC.
Networked 'hybrid' technologies of virtual and material components may have great potential for use in education.
Secondly, we illustrate how these changes may be beginning to happen in distance education using the example of TU100 My Digital Life, a new introductory Open University module. TU100 students use an electronics board in their own homes to work on a programming problem in collaboration with other students, through a tutor-led tutorial in a web conferencing system. We also note some of the evident complexity that establishing such resources as part of wider infrastructures of networked learning would be likely to involve.