21 research outputs found
An analysis of frequency recognition algorithms and implementation in realtime
Frequency recognition is an important task in many engineering fields, such as audio signal processing and telecommunications engineering. There are numerous applications where frequency recognition is indispensable, such as Dual-Tone Multi-Frequency (DTMF) detection or the recognition of the carrier frequency of a Global Positioning System (GPS) signal. Frequency recognition has also entered many other engineering disciplines, such as sonar and radar technology, spectral analysis of astronomic data, seismography, acoustics and consumer electronics. Listening to electronic music and playing electronic musical instruments is becoming more and more popular, and not only among young musicians. This dissertation details background information and a preliminary analysis of a musical system, the Generic Musical Instrument System (GMIS), which allows composers to experiment with electronic instruments without actually learning how to play them. It gives background information about frequency recognition algorithms implemented in real time, and analyses state-of-the-art techniques, such as DTMF implementations and MIDI-based musical systems, in order to work out their similarities. The key idea is to adapt the well-proven frequency recognition algorithms of DTMF systems, which are successfully and widely used in telephony. The investigations show to what extent these principles and algorithms can be applied to a musical system like the GMIS. The dissertation presents results of investigations into frequency recognition algorithms implemented on a Texas Instruments (TI) TMS320C6713 Digital Signal Processor (DSP) core, in order to estimate the frequency of an audio signal in real time. The algorithms are evaluated against selected criteria for speed and accuracy over more than 9600 individual measurements.
The evaluations use both simple sinusoids and musical notes played by instruments as input signals, which allows a well-founded decision as to which of these frequency recognition algorithms is appropriate for audio signal processing under the real-time constraints of the GMIS.
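The DTMF detectors this abstract builds on are classically implemented with the Goertzel algorithm, which evaluates the power at a single DFT bin very cheaply. A minimal sketch (the block size of 205 samples at 8 kHz follows common DTMF practice; everything else here is illustrative, not the dissertation's implementation):

```python
import math

def goertzel_power(samples, sample_rate, target_freq):
    """Power of `samples` at `target_freq` via the Goertzel algorithm,
    the standard technique behind DTMF tone detection."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)   # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # squared magnitude of the k-th DFT bin
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

# Example: find which DTMF row frequency dominates in a synthetic tone
fs = 8000
tone = [math.sin(2 * math.pi * 770 * t / fs) for t in range(205)]  # 770 Hz row tone
powers = {f: goertzel_power(tone, fs, f) for f in (697, 770, 852, 941)}
print(max(powers, key=powers.get))  # prints 770
```

Running the same measurement at each candidate frequency is exactly what a DTMF detector does for the four row and four column tones; the strongest row and column bins identify the key pressed.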
Interfaces avanzados aplicados a la interacción musical (Advanced interfaces applied to musical interaction)
The latest advances in human-computer interaction technologies have brought forth changes in the way we interact with computing devices of any kind, from the standard desktop computer to the more recent smartphones. The development of these technologies has thus introduced new interaction metaphors that provide more enriching experiences for a wide range of different applications.
Music is one of the most ancient forms of art and entertainment in our legacy, and constitutes a strongly interactive experience in itself. The application of new technologies to enhance computer-based music interaction paradigms can potentially provide all sorts of improvements: low-cost access to music rehearsal, lower knowledge barriers to music learning, virtual instrument simulation, etc. Yet, surprisingly, there has been rather limited research on the application of new interaction models and technologies to the specific field of music interaction compared with other areas.
This thesis aims to address the aforementioned need by presenting a set of studies which cover the use of innovative interaction models for music-based applications, from interaction paradigms for music learning to more entertainment-oriented interaction interfaces, such as virtual musical instruments, ensemble conductor simulation, etc. The main contributions of this thesis are:
· It is shown that the use of signal processing techniques on the music signal and music information retrieval techniques can create enticing interfaces for music learning. Concretely, the research conducted includes the implementation and experimental evaluation of a set of different learning-oriented applications which make use of these techniques to implement inexpensive, easy-to-use human-computer interfaces, which serve as support tools in music learning processes.
· This thesis explores the use of tracking systems and machine learning techniques to achieve more sophisticated interfaces for innovative music interaction paradigms. Concretely, the studies conducted have shown that it is indeed feasible to emulate the functionality of musical instruments such as the drumkit or the theremin. In a similar way, it is shown that more complex musical roles can also be recreated through the use of new interaction models, such as the case of the ensemble conductor or a step-aerobics application.
· The benefits of using advanced human-computer interfaces in musical experiences are reviewed and assessed through experimental evaluation. It is shown that the addition of these interfaces contributes positively to user perception, providing more satisfying and enriching experiences overall.
· The thesis also illustrates that the use of machine learning algorithms and signal processing along with new interaction devices provides an effective framework for human gesture recognition and prediction, and even mood estimation.
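As one concrete example of the signal-processing building blocks such learning-oriented interfaces rely on, a naive short-time-energy onset detector can flag where notes begin in an audio stream. This is an illustrative sketch only; the frame sizes and threshold are assumptions, not the thesis's implementation:

```python
import numpy as np

def onset_frames(signal, frame_len=512, hop=256, threshold=1.5):
    """Flag frames whose short-time energy jumps by more than
    `threshold` times the previous frame's energy (a naive onset cue)."""
    energies = []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len]
        energies.append(float(np.dot(frame, frame)))
    onsets = []
    for i in range(1, len(energies)):
        if energies[i] > threshold * energies[i - 1] + 1e-9:
            onsets.append(i)
    return onsets

# Silence followed by a burst of noise: an onset is expected at the boundary
sig = np.concatenate([np.zeros(4096),
                      np.random.default_rng(0).standard_normal(4096)])
print(onset_frames(sig))  # the first flagged frame sits at the boundary
```

Real systems refine this idea with spectral flux, adaptive thresholds and peak picking, but the energy-jump cue is the core of the approach.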
Automatic transcription of polyphonic music exploiting temporal evolution
Automatic music transcription is the process of converting an audio recording
into a symbolic representation using musical notation. It has numerous applications
in music information retrieval, computational musicology, and the
creation of interactive systems. Even for expert musicians, transcribing polyphonic
pieces of music is not a trivial task, and while the problem of automatic
pitch estimation for monophonic signals is considered to be solved, the creation
of an automated system able to transcribe polyphonic music without setting
restrictions on the degree of polyphony and the instrument type still remains
open.
In this thesis, research on automatic transcription is performed by explicitly
incorporating information on the temporal evolution of sounds. First efforts address
the problem by focusing on signal processing techniques and by proposing
audio features utilising temporal characteristics. Techniques for note onset and
offset detection are also utilised for improving transcription performance. Subsequent
approaches propose transcription models based on shift-invariant probabilistic
latent component analysis (SI-PLCA), modeling the temporal evolution
of notes in a multiple-instrument case and supporting frequency modulations in
produced notes. Datasets and annotations for transcription research have also
been created during this work. Proposed systems have been privately as well as
publicly evaluated within the Music Information Retrieval Evaluation eXchange
(MIREX) framework. Proposed systems have been shown to outperform several
state-of-the-art transcription approaches.
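SI-PLCA is closely related to non-negative matrix factorisation (NMF), which decomposes a magnitude spectrogram into spectral templates and their activations over time. A toy sketch of that underlying idea, using plain Euclidean NMF rather than the thesis's shift-invariant probabilistic model:

```python
import numpy as np

def nmf(V, rank, iters=300, seed=0):
    """Euclidean multiplicative-update NMF: factorise V ~= W @ H with
    non-negative W (spectral templates) and H (per-frame activations)."""
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    W = rng.random((n_bins, rank)) + 0.1
    H = rng.random((rank, n_frames)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)   # update templates
    return W, H

# Toy "spectrogram": note A sounds in frames 0-1, note B in frames 2-3
note_a = np.array([1.0, 0.0, 0.5, 0.0])   # fake spectrum of note A
note_b = np.array([0.0, 1.0, 0.0, 0.5])   # fake spectrum of note B
V = np.column_stack([note_a, note_a, note_b, note_b])
W, H = nmf(V, rank=2)
print(np.round(W @ H, 2))  # reconstruction should be close to V
```

In a transcription setting the columns of W correspond to note spectra and the rows of H indicate when each note is active; the shift-invariant extension additionally lets templates slide in log-frequency to model vibrato and frequency modulation.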
Developed techniques have also been employed for other tasks related to music
technology, such as for key modulation detection, temperament estimation,
and automatic piano tutoring. Finally, proposed music transcription models
have also been utilized in a wider context, namely for modeling acoustic scenes.
Applications of loudness models in audio engineering
This thesis investigates the application of perceptual models to areas of audio engineering, with a particular focus on music production. The goal was to establish efficient and practical tools for the measurement and control of the perceived loudness of musical sounds. Two types of loudness model were investigated: the single-band model and the multiband excitation pattern (EP) model. The heuristic single-band devices were designed to be simple but sufficiently effective for real-world application, whereas the multiband procedures were developed to give a reasonable account of a large body of psychoacoustic findings according to a functional model of the peripheral hearing system. The research addresses the extent to which current models of loudness generalise to musical instruments, and whether they can be successfully employed in music applications. The domain-specific disparity between the two types of model was first tackled by reducing the computational load of state-of-the-art EP models to allow for fast but low-error auditory signal processing. Two elaborate hearing models were analysed and optimised using musical instruments and speech as test stimuli. It was shown that, after significantly reducing the complexity of both procedures, estimates of global loudness, such as peak loudness, as well as the intermediate auditory representations, can be preserved with high accuracy. Based on the optimisations, two real-time applications were developed: a binaural loudness meter and an automatic multitrack mixer. The second system was designed to work independently of the loudness measurement procedure, and therefore supports both linear and nonlinear models. This allowed a single mixing device to be assessed using different loudness metrics, which was demonstrated by evaluating three configurations through subjective assessment.
Unexpectedly, when asked to rate both the overall quality of a mix and the degree to which instruments were equally loud, listeners preferred mixes generated using heuristic single-band models over those produced using a multiband procedure. A series of more systematic listening tests were conducted to further investigate this finding. Subjective loudness matches of musical instruments commonly found in western popular music were collected to evaluate the performance of five published models. The results were in accord with the application-based assessment, namely that current EP procedures do not generalise well when estimating the relative loudness of musical sounds which have marked differences in spectral content. Model-specific issues were identified relating to the calculation of spectral loudness summation (SLS) and the method used to determine the global-loudness percept of time-varying musical sounds; associated refinements were proposed. It was shown that a new multiband loudness model with a heuristic loudness transformation yields superior performance over existing methods. This supports the idea that a revised model of SLS is needed, and therefore that modification to this stage in existing psychoacoustic procedures is an essential step towards the goal of achieving real-world deployment.
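As a point of reference for the single-band family of models, the simplest heuristic loudness statistic is the RMS level of the signal. A minimal sketch of that kind of measurement (a generic illustration, not any of the models evaluated in the thesis):

```python
import math

def rms_level_dbfs(samples):
    """RMS level in dB relative to full scale (dBFS): the kind of simple
    single-band statistic that heuristic loudness meters build on."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square + 1e-12)  # epsilon guards log(0)

# A full-scale sine has an RMS of 1/sqrt(2), i.e. about -3.01 dBFS
sine = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
print(round(rms_level_dbfs(sine), 2))  # prints -3.01
```

Practical single-band meters add a frequency weighting and temporal integration on top of this statistic, while the multiband EP models replace it entirely with an auditory filterbank and an excitation-to-loudness transformation.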
ZATLAB : recognizing gestures for artistic performance interaction
Most artistic performances rely on human gestures, ultimately resulting in an elaborate
interaction between the performer and the audience.
Humans, even without any kind of formal background in music, dance or
gesture analysis, are typically able to extract, almost unconsciously, a great amount
of relevant information from a gesture. In fact, a gesture contains so much
information that one may ask: why not use it to further enhance a performance?
Gestures and expressive communication are intrinsically connected, and being
intimately attached to our own daily existence, both have a central position in our
(nowadays) technological society. However, the use of technology to understand
gestures remains only vaguely explored: it has moved beyond its first steps,
but the way towards systems fully capable of analyzing gestures is still long and
difficult (Volpe, 2005). This is probably because, while the recognition of
gestures is a largely trivial task for humans, the endeavor of translating gestures
to the virtual world with a digital encoding is a difficult and ill-defined task.
It is necessary to bridge this gap, stimulating a constructive
interaction between gestures and technology, culture and science, performance
and communication, and thus opening new and unexplored frontiers in the design of
a novel generation of multimodal interactive systems.
This work proposes an interactive, real-time gesture recognition framework called
the Zatlab System (ZtS). The framework is flexible and extensible, and thus in
permanent evolution, keeping up with the different technologies and algorithms that emerge at a fast pace nowadays. The basis of the proposed approach is to partition
a temporal stream of captured movement into perceptually motivated descriptive
features and transmit them for further processing in Machine Learning algorithms.
The framework described takes the view that perception primarily depends on
previous knowledge or learning. Just like humans, the framework has
to learn gestures and their main features so that later it can identify them. It is
however planned to be flexible enough to allow learning gestures on the fly.
This dissertation also presents a qualitative and quantitative experimental validation
of the framework. The qualitative analysis provides results concerning
the users' acceptance of the framework, while the quantitative validation provides
results for the gesture recognition algorithms. The use of Machine Learning
algorithms in these tasks achieves results that compare with or
outperform typical state-of-the-art systems.
In addition, two artistic implementations of the framework are presented,
thus assessing its usability within the artistic performance domain.
Although a specific implementation of the proposed framework is presented in this
dissertation and made available as open-source software, the proposed approach
is flexible enough to be used in other scenarios, paving the way to applications
that can benefit not only the performative arts domain but also, probably in the near
future, other types of communication, such as the gestural sign language
for the hearing impaired.
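The pipeline described above (descriptive movement features feeding a machine-learning stage) can be illustrated with a toy nearest-neighbour classifier. The features and gestures here are hypothetical stand-ins, not the ZtS feature set:

```python
import math

def features(trajectory):
    """Toy descriptive features of a 2-D movement trajectory:
    total path length and net vertical displacement."""
    length = sum(math.dist(a, b) for a, b in zip(trajectory, trajectory[1:]))
    rise = trajectory[-1][1] - trajectory[0][1]
    return (length, rise)

def nearest_neighbour(sample, labelled):
    """1-NN classification in feature space (a stand-in for the ML stage)."""
    return min(labelled, key=lambda lf: math.dist(features(sample), lf[0]))[1]

# Two training gestures: an upward "raise" and a horizontal "sweep" (synthetic)
raise_g = [(0, y / 10) for y in range(11)]
sweep_g = [(x / 10, 0) for x in range(11)]
train = [(features(raise_g), "raise"), (features(sweep_g), "sweep")]

test_g = [(0, y / 12) for y in range(13)]     # another upward movement
print(nearest_neighbour(test_g, train))       # prints "raise"
```

A real system would use richer, perceptually motivated features and temporal models (e.g. hidden Markov models or dynamic time warping) to cope with variation in speed and amplitude, but the feature-then-classify structure is the same.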
The Nexus between Artificial Intelligence and Economics
This book is organized as follows. Section 2 introduces the notion of the Singularity, a stage in development in which technological progress and economic growth increase at a near-infinite rate. Section 3 describes what artificial intelligence is and how it has been applied. Section 4 considers artificial happiness and the likelihood that artificial intelligence might increase human happiness. Section 5 discusses some prominent related concepts and issues. Section 6 describes the use of artificial agents in economic modeling, and Section 7 considers some ways in which economic analysis can offer some hints about what the advent of artificial intelligence might bring. Section 8 presents some thoughts about the current state of AI and its future prospects.
Engineering systematic musicology : methods and services for computational and empirical music research
One of the main research questions of *systematic musicology* is concerned with how people make sense of their musical environment. It is concerned with signification and meaning-formation and relates musical structures to effects of music. These fundamental aspects can be approached from many different directions. One could take a cultural perspective where music is considered a phenomenon of human expression, firmly embedded in tradition. Another approach would be a cognitive perspective, where music is considered as an acoustical signal of which perception involves categorizations linked to representations and learning. A performance perspective where music is the outcome of human interaction is also an equally valid view. To understand a phenomenon combining multiple perspectives often makes sense. The methods employed within each of these approaches turn questions into
concrete musicological research projects. It is safe to say that today many of these methods draw upon digital data and tools. Some of these general methods are feature extraction from audio and movement signals, machine learning, classification and statistics. However, the problem is that, very often, the *empirical and computational methods require technical solutions* beyond the skills of researchers who typically have a humanities background. At that point, these researchers need access to specialized technical knowledge to advance their research. My PhD work should be seen within the context of that tradition. In many respects I adopt a problem-solving attitude to the problems posed by research in systematic musicology. This work *explores solutions that are relevant for systematic musicology*. It does this by engineering solutions for measurement problems in empirical research and by developing research software which facilitates computational research. These solutions are placed in an
engineering-humanities plane. The first axis of the plane contrasts *services* with *methods*. Methods *in* systematic musicology propose ways to generate new insights into music-related phenomena or contribute to how research can be done. Services *for* systematic musicology, on the other hand, support or automate research tasks, which allows the scope of research to change. A shift in scope allows researchers to cope with larger data sets, which offers a broader view of the phenomenon. The
second axis indicates how important Music Information Retrieval (MIR) techniques are in a solution. MIR techniques are contrasted with various techniques to support empirical research. My research resulted in a total of thirteen solutions which are placed in this plane. Seven of these are described in this dissertation: three fall into the methods category and four into the services category. For example, Tarsos presents a method to compare performance practice with theoretical scales on a large scale, while SyncSink is an example of a service.
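A common technique behind large-scale comparison of performance practice with theoretical scales, as in Tarsos, is the pitch-class histogram in cents. The sketch below assumes a plain list of detected pitch frequencies as input; it is an illustration of the idea, not Tarsos's implementation:

```python
import math

def pitch_class_cents(freq_hz, ref_hz=440.0):
    """Fold a frequency onto one octave, expressed in cents (0-1200)
    relative to a reference pitch."""
    cents = 1200.0 * math.log2(freq_hz / ref_hz)
    return cents % 1200.0

def histogram(freqs, bin_width=100.0):
    """Pitch-class histogram: counts per cents bin. Comparing such
    histograms against theoretical scales reveals the tuning in use."""
    bins = [0] * int(1200 / bin_width)
    for f in freqs:
        bins[int(pitch_class_cents(f) // bin_width)] += 1
    return bins

# Detected pitches clustered around A (440/880 Hz) and E (about 659.26 Hz)
detected = [440.0, 441.0, 880.0, 659.26, 660.0]
h = histogram(detected)
print(h.index(max(h)))  # prints 0: the bin containing pitch class A
```

Because octaves are folded away, the histogram captures the scale rather than the melody, which is what makes it useful for studying tunings and temperaments at scale.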
Digital signal processing optical receivers for the mitigation of physical layer impairments in dynamic optical networks
It is generally believed by the research community that the introduction of complex
network functions—such as routing—in the optical domain will allow better network
utilisation, lower cost and footprint, and more efficient energy usage. The new optical
components and sub-systems intended for dynamic optical networking introduce
new kinds of physical layer impairments in the optical signal, and it is of paramount
importance to overcome this problem if dynamic optical networks are to become a
reality. Thus, the aim of this thesis was first to identify and characterise the physical
layer impairments of dynamic optical networks, and then to develop digital signal
processing techniques to mitigate them.
The initial focus of this work was the design and characterisation of digital optical
receivers for dynamic core optical networks. Digital receiver techniques allow for complex
algorithms to be implemented in the digital domain; these usually surpass
their analogue counterparts in both performance and flexibility. An AC-coupled digital receiver
for core networks—consisting of a standard PIN photodiode and a digitiser that
takes samples at twice the Nyquist rate—was characterised in terms of both bit-error
rate and packet-error rate, and it is shown that the packet-error rate can be optimised by
appropriately setting the preamble length. Also, a realistic model of a digital receiver
that includes the quantisation impairments was developed. Finally, the influence of
the network load and the traffic sparsity on the packet-error rate performance of the
receiver was investigated.
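The link between bit-error rate and packet-error rate can be made concrete: for independent bit errors, a packet of n bits fails if any single bit fails, so PER = 1 - (1 - BER)^n. A small sketch of this first-order model (it captures packet length only, not the synchronisation gains of a longer preamble studied in the thesis):

```python
def packet_error_rate(ber, packet_bits):
    """PER under independent bit errors: the packet survives only if
    every one of its bits is received correctly."""
    return 1.0 - (1.0 - ber) ** packet_bits

# At a fixed BER, longer packets are more likely to contain an error
ber = 1e-4
for n in (100, 1000, 10000):
    print(n, packet_error_rate(ber, n))
```

This is why PER, not BER alone, is the relevant figure of merit for packet-based dynamic networks: the same BER yields very different PERs depending on packet length, and overheads such as the preamble trade synchronisation quality against that length.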
Digital receiver technologies can be equally applied to optical access networks,
which share many traits with dynamic core networks. A dual-rate digital receiver, capable
of detecting optical packets at 10 and 1.25 Gb/s, was developed and characterised.
The receiver dynamic range was extended by means of DC-coupling and non-linear
signal clipping, and it is shown that the receiver performance is limited by digitiser
noise at low received power and by non-linear clipping at high received power.