74 research outputs found
Speech Recognition
Chapters in the first part of the book cover the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and methods for speech feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other applications that operate in real-world environments, such as mobile communication services and smart homes.
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective.
The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives for measuring the performance of multimedia search engines.
From a socio-economic perspective, we take inventory of the impact and legal consequences of these technical advances and point out future directions of research.
Learning commonsense human-language descriptions from temporal and spatial sensor-network data
Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2006. Includes bibliographical references (p. 105-109) and index. Embedded-sensor platforms are advancing toward such sophistication that they can differentiate between subtle actions. For example, when placed in a wristwatch, such platforms can tell whether a person is shaking hands or turning a doorknob. Sensors placed on objects in the environment now report many parameters, including object location, movement, sound, and temperature. A persistent problem, however, is the description of these sense data in meaningful human language. This is an important problem that appears across domains ranging from organizational security surveillance to individual activity journaling. Previous models of activity recognition pigeonhole descriptions into small, formal categories specified in advance; for example, location is often categorized as "at home" or "at the office." These models have not been able to adapt to the wider range of complex, dynamic, and idiosyncratic human activities. We hypothesize that commonsense, semantically related knowledge bases can be used to bootstrap learning algorithms for classifying and recognizing human activities from sensors. Our system, LifeNet, is a first-person commonsense inference model, which consists of a graph with nodes drawn from a large repository of commonsense assertions expressed in human-language phrases. LifeNet is used to construct a mapping between streams of sensor data and partially ordered sequences of events, co-located in time and space. Further, by gathering sensor data in vivo, we are able to validate and extend the commonsense knowledge from which LifeNet is derived. LifeNet is evaluated in the context of its performance on a sensor-network platform distributed in an office environment.
We hypothesize that mapping sensor data into LifeNet will act as a "semantic mirror" to meaningfully interpret sensory data into cohesive patterns in order to understand and predict human action. By Bo Morgan. S.M.
Review: Deep learning in electron microscopy
Deep learning is transforming most areas of science and technology, including electron microscopy. This review paper offers a practical perspective aimed at developers with limited familiarity. For context, we review popular applications of deep learning in electron microscopy. Next, we discuss the hardware and software needed to get started with deep learning and to interface with electron microscopes. We then review neural network components, popular architectures, and their optimization. Finally, we discuss future directions of deep learning in electron microscopy.
Designing sound: procedural audio research based on the book by Andy Farnell
In procedural media, data normally acquired by measuring something, commonly described as sampling, is replaced by a set of computational rules (a procedure) that defines the typical structure and/or behaviour of that thing. Here, a general approach to sound as a definable process, rather than a recording, is developed. Through analysis of their physical and perceptual qualities, natural objects or processes that produce sound are modelled as digital Sounding Objects for use in arts and entertainment. This thesis discusses different aspects of Procedural Audio, introducing several new approaches and solutions to this emerging field of Sound Design.