21 research outputs found

    An analysis of frequency recognition algorithms and implementation in realtime

    Frequency recognition is an important task in many engineering fields, such as audio signal processing and telecommunications engineering. There are numerous applications where frequency recognition is essential, such as Dual-Tone Multi-Frequency (DTMF) detection or the recognition of the carrier frequency of a Global Positioning System (GPS) signal. Furthermore, frequency recognition has entered many other engineering disciplines such as sonar and radar technology, spectral analysis of astronomical data, seismography, acoustics and consumer electronics. Listening to electronic music and playing electronic musical instruments is becoming more and more popular, not only among young musicians. This dissertation details background information and a preliminary analysis of a musical system, the Generic Musical Instrument System (GMIS), which allows composers to experiment with electronic instruments without actually learning how to play them. It also gives background information about frequency recognition algorithms implemented in real time. It analyses state-of-the-art techniques, such as DTMF implementations and MIDI-based musical systems, in order to work out their similarities. The key idea is to adapt the well-proven frequency recognition algorithms of DTMF systems, which are successfully and widely used in telephony. The investigations show to what extent these principles and algorithms can be applied to a musical system like the GMIS. This dissertation presents results of investigations into frequency recognition algorithms implemented on a Texas Instruments (TI) TMS320C6713 Digital Signal Processor (DSP) core, in order to estimate the frequency of an audio signal in real time. The algorithms are evaluated against selected criteria for speed and accuracy, based on over 9600 single measurements.
The evaluations use simple sinusoids and musical notes played by instruments as input signals, which allows a solid decision as to which of these frequency recognition algorithms is appropriate for audio signal processing and for the real-time constraints of the GMIS.
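The abstract above does not name the DTMF-style algorithms it evaluates. As a rough illustration of the kind of technique commonly used for DTMF detection, the sketch below uses the Goertzel algorithm, which measures signal power at a single target frequency without computing a full FFT. The choice of Goertzel, the function names, the 8 kHz sample rate and the 205-sample block length are all assumptions made for this example, not details taken from the dissertation.

```python
import math

# Standard DTMF row and column frequencies in Hz.
DTMF_LOW = (697.0, 770.0, 852.0, 941.0)
DTMF_HIGH = (1209.0, 1336.0, 1477.0, 1633.0)


def goertzel_power(samples, sample_rate, target_hz):
    """Squared magnitude of `samples` at the DFT bin nearest to target_hz."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        # Second-order recursion: s[i] = x[i] + coeff * s[i-1] - s[i-2]
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def strongest_tone(samples, sample_rate, candidates):
    """Return the candidate frequency that carries the most power."""
    return max(candidates, key=lambda f: goertzel_power(samples, sample_rate, f))


def make_tone(freqs, sample_rate, n):
    """Hypothetical test helper: sum of unit-amplitude sinusoids."""
    return [
        sum(math.sin(2.0 * math.pi * f * i / sample_rate) for f in freqs)
        for i in range(n)
    ]


if __name__ == "__main__":
    # Assumed example input: the two tones of the DTMF digit "5".
    block = make_tone((770.0, 1336.0), 8000, 205)
    print(strongest_tone(block, 8000, DTMF_LOW))
    print(strongest_tone(block, 8000, DTMF_HIGH))
```

Each candidate frequency costs one multiply and two additions per sample, which is why this style of detector suits a fixed, small set of tones. A musical system would need many more candidate frequencies and finer resolution, which is the kind of trade-off between speed and accuracy the abstract describes.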

    Interfaces avanzados aplicados a la interacción musical

    The latest advances in human-computer interaction technologies have changed the way we interact with computing devices of any kind, from the standard desktop computer to the more recent smartphones. The development of these technologies has introduced new interaction metaphors that provide more enriching experiences for a wide range of applications. Music is one of the most ancient forms of art and entertainment in our legacy, and constitutes a strongly interactive experience in itself. The application of new technologies to enhance computer-based music interaction paradigms can potentially provide all sorts of improvements: low-cost access to music rehearsal, lower knowledge barriers in music learning, virtual instrument simulation, etc. Yet, surprisingly, there has been rather limited research on the application of new interaction models and technologies to music interaction compared with other areas. This thesis aims to address this need by presenting a set of studies which cover the use of innovative interaction models for music-based applications, from interaction paradigms for music learning to more entertainment-oriented interfaces, such as virtual musical instruments, ensemble conductor simulation, etc. The main contributions of this thesis are:
    · It is shown that the use of signal processing techniques on the music signal, together with music information retrieval techniques, can create enticing interfaces for music learning. Concretely, the research conducted includes the implementation and experimental evaluation of a set of learning-oriented applications which use these techniques to implement inexpensive, easy-to-use human-computer interfaces that serve as support tools in music learning processes.
    · This thesis explores the use of tracking systems and machine learning techniques to achieve more sophisticated interfaces for innovative music interaction paradigms. Concretely, the studies conducted have shown that it is feasible to emulate the functionality of musical instruments such as the drum kit or the theremin. Similarly, it is shown that more complex musical roles can also be recreated through the use of new interaction models, as in the case of the ensemble conductor or a step-aerobics application.
    · The benefits of using advanced human-computer interfaces in musical experiences are reviewed and assessed through experimental evaluation. It is shown that the addition of these interfaces contributes positively to user perception, providing more satisfying and enriching experiences overall.
    · The thesis also illustrates that the use of machine learning algorithms and signal processing along with new interaction devices provides an effective framework for human gesture recognition and prediction, and even mood estimation.

    Automatic transcription of polyphonic music exploiting temporal evolution

    Automatic music transcription is the process of converting an audio recording into a symbolic representation using musical notation. It has numerous applications in music information retrieval, computational musicology, and the creation of interactive systems. Even for expert musicians, transcribing polyphonic pieces of music is not a trivial task, and while the problem of automatic pitch estimation for monophonic signals is considered to be solved, the creation of an automated system able to transcribe polyphonic music without setting restrictions on the degree of polyphony and the instrument type still remains open. In this thesis, research on automatic transcription is performed by explicitly incorporating information on the temporal evolution of sounds. First efforts address the problem by focusing on signal processing techniques and by proposing audio features utilising temporal characteristics. Techniques for note onset and offset detection are also utilised for improving transcription performance. Subsequent approaches propose transcription models based on shift-invariant probabilistic latent component analysis (SI-PLCA), modeling the temporal evolution of notes in a multiple-instrument case and supporting frequency modulations in produced notes. Datasets and annotations for transcription research have also been created during this work. Proposed systems have been privately as well as publicly evaluated within the Music Information Retrieval Evaluation eXchange (MIREX) framework, and have been shown to outperform several state-of-the-art transcription approaches. Developed techniques have also been employed for other tasks related to music technology, such as key modulation detection, temperament estimation, and automatic piano tutoring. Finally, proposed music transcription models have also been utilised in a wider context, namely for modeling acoustic scenes.

    Applications of loudness models in audio engineering

    This thesis investigates the application of perceptual models to areas of audio engineering, with a particular focus on music production. The goal was to establish efficient and practical tools for the measurement and control of the perceived loudness of musical sounds. Two types of loudness model were investigated: the single-band model and the multiband excitation pattern (EP) model. The heuristic single-band devices were designed to be simple but sufficiently effective for real-world application, whereas the multiband procedures were developed to give a reasonable account of a large body of psychoacoustic findings according to a functional model of the peripheral hearing system. The research addresses the extent to which current models of loudness generalise to musical instruments, and whether they can be successfully employed in music applications. The domain-specific disparity between the two types of model was first tackled by reducing the computational load of state-of-the-art EP models to allow for fast but low-error auditory signal processing. Two elaborate hearing models were analysed and optimised using musical instruments and speech as test stimuli. It was shown that, after significantly reducing the complexity of both procedures, estimates of global loudness, such as peak loudness, as well as the intermediate auditory representations can be preserved with high accuracy. Based on the optimisations, two real-time applications were developed: a binaural loudness meter and an automatic multitrack mixer. This second system was designed to work independently of the loudness measurement procedure, and therefore supports both linear and nonlinear models. This allowed a single mixing device to be assessed using different loudness metrics, which was demonstrated by evaluating three configurations through subjective assessment.
Unexpectedly, when asked to rate both the overall quality of a mix and the degree to which instruments were equally loud, listeners preferred mixes generated using heuristic single-band models over those produced using a multiband procedure. A series of more systematic listening tests was conducted to further investigate this finding. Subjective loudness matches of musical instruments commonly found in western popular music were collected to evaluate the performance of five published models. The results were in accord with the application-based assessment, namely that current EP procedures do not generalise well when estimating the relative loudness of musical sounds which have marked differences in spectral content. Model-specific issues were identified relating to the calculation of spectral loudness summation (SLS) and the method used to determine the global-loudness percept of time-varying musical sounds; associated refinements were proposed. It was shown that a new multiband loudness model with a heuristic loudness transformation yields superior performance over existing methods. This supports the idea that a revised model of SLS is needed, and therefore that modification to this stage in existing psychoacoustic procedures is an essential step towards the goal of achieving real-world deployment.

    ZATLAB : recognizing gestures for artistic performance interaction

    Most artistic performances rely on human gestures, ultimately resulting in an elaborate interaction between the performer and the audience. Humans, even without any formal background in the analysis of music, dance or gesture, are typically able to extract, almost unconsciously, a great amount of relevant information from a gesture. Since a gesture contains so much information, why not use it to further enhance a performance? Gestures and expressive communication are intrinsically connected and, being intimately attached to our daily existence, both have a central position in our present-day technological society. However, the use of technology to understand gestures is still only vaguely explored: it has moved beyond its first steps, but the way towards systems fully capable of analyzing gestures is still long and difficult (Volpe, 2005). This is probably because, while the recognition of gestures is a fairly trivial task for humans, translating gestures to the virtual world with a digital encoding is a difficult and ill-defined task. It is necessary to bridge this gap, stimulating a constructive interaction between gestures and technology, culture and science, performance and communication, and thus opening new and unexplored frontiers in the design of a novel generation of multimodal interactive systems. This work proposes an interactive, real-time gesture recognition framework called the Zatlab System (ZtS). This framework is flexible and extensible. Thus, it is in permanent evolution, keeping up with the different technologies and algorithms that emerge at a fast pace nowadays. The basis of the proposed approach is to partition a temporal stream of captured movement into perceptually motivated descriptive features and transmit them for further processing by Machine Learning algorithms. The framework takes the view that perception primarily depends on previous knowledge or learning.
Just like humans do, the framework will have to learn gestures and their main features so that later it can identify them. It is, however, planned to be flexible enough to allow learning gestures on the fly. This dissertation also presents a qualitative and quantitative experimental validation of the framework. The qualitative analysis provides results concerning the users' acceptance of the framework. The quantitative validation provides results about the gesture recognition algorithms. The use of Machine Learning algorithms in these tasks allows the achievement of final results that match or outperform typical and state-of-the-art systems. In addition, two artistic implementations of the framework are presented, thus assessing its usability in the artistic performance domain. Although a specific implementation of the proposed framework is presented in this dissertation and made available as open source software, the proposed approach is flexible enough to be used in other scenarios, paving the way to applications that can benefit not only the performing arts domain but also, probably in the near future, other types of communication, such as the gestural sign language for the hearing impaired.

    The Nexus between Artificial Intelligence and Economics

    This book is organized as follows. Section 2 introduces the notion of the Singularity, a stage in development in which technological progress and economic growth increase at a near-infinite rate. Section 3 describes what artificial intelligence is and how it has been applied. Section 4 considers artificial happiness and the likelihood that artificial intelligence might increase human happiness. Section 5 discusses some prominent related concepts and issues. Section 6 describes the use of artificial agents in economic modeling, and Section 7 considers some ways in which economic analysis can offer hints about what the advent of artificial intelligence might bring. Section 8 presents some thoughts about the current state of AI and its future prospects.

    Engineering systematic musicology : methods and services for computational and empirical music research

    One of the main research questions of *systematic musicology* is concerned with how people make sense of their musical environment. It is concerned with signification and meaning-formation and relates musical structures to effects of music. These fundamental aspects can be approached from many different directions. One could take a cultural perspective where music is considered a phenomenon of human expression, firmly embedded in tradition. Another approach would be a cognitive perspective, where music is considered as an acoustical signal whose perception involves categorizations linked to representations and learning. A performance perspective, where music is the outcome of human interaction, is an equally valid view. To understand a phenomenon, combining multiple perspectives often makes sense. The methods employed within each of these approaches turn questions into concrete musicological research projects. It is safe to say that today many of these methods draw upon digital data and tools. Some of those general methods are feature extraction from audio and movement signals, machine learning, classification and statistics. However, the problem is that, very often, the *empirical and computational methods require technical solutions* beyond the skills of researchers, who typically have a humanities background. At that point, these researchers need access to specialized technical knowledge to advance their research. My PhD work should be seen within the context of that tradition. In many respects I adopt a problem-solving attitude to problems that are posed by research in systematic musicology. This work *explores solutions that are relevant for systematic musicology*. It does this by engineering solutions for measurement problems in empirical research and developing research software which facilitates computational research. These solutions are placed in an engineering-humanities plane. The first axis of the plane contrasts *services* with *methods*.
Methods *in* systematic musicology propose ways to generate new insights into music-related phenomena or contribute to how research can be done. Services *for* systematic musicology, on the other hand, support or automate research tasks, which allows the scope of research to change. A shift in scope allows researchers to cope with larger data sets, which offers a broader view of the phenomenon. The second axis indicates how important Music Information Retrieval (MIR) techniques are in a solution. MIR techniques are contrasted with various techniques to support empirical research. My research resulted in a total of thirteen solutions which are placed in this plane. The descriptions of seven of these are bundled in this dissertation. Three fall into the methods category and four into the services category. For example, Tarsos presents a method to compare performance practice with theoretical scales on a large scale. SyncSink is an example of a service

    Digital signal processing optical receivers for the mitigation of physical layer impairments in dynamic optical networks

    It is generally believed by the research community that the introduction of complex network functions, such as routing, in the optical domain will allow better network utilisation, lower cost and footprint, and more efficient energy usage. The new optical components and sub-systems intended for dynamic optical networking introduce new kinds of physical layer impairments in the optical signal, and it is of paramount importance to overcome this problem if dynamic optical networks are to become a reality. Thus, the aim of this thesis was first to identify and characterise the physical layer impairments of dynamic optical networks, and then to develop digital signal processing techniques to mitigate them. The initial focus of this work was the design and characterisation of digital optical receivers for dynamic core optical networks. Digital receiver techniques allow complex algorithms to be implemented in the digital domain, which usually outperform their analogue counterparts in performance and flexibility. An AC-coupled digital receiver for core networks, consisting of a standard PIN photodiode and a digitiser that takes samples at twice the Nyquist rate, was characterised in terms of both bit-error rate and packet-error rate, and it is shown that the packet-error rate can be optimised by appropriately setting the preamble length. Also, a realistic model of a digital receiver that includes the quantisation impairments was developed. Finally, the influence of the network load and the traffic sparsity on the packet-error rate performance of the receiver was investigated. Digital receiver technologies can be equally applied to optical access networks, which share many traits with dynamic core networks. A dual-rate digital receiver, capable of detecting optical packets at 10 and 1.25 Gb/s, was developed and characterised.
The receiver dynamic range was extended by means of DC-coupling and non-linear signal clipping, and it is shown that the receiver performance is limited by digitiser noise for low received power and by non-linear clipping for high received power.
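The two limits mentioned at the end of this abstract, digitiser noise at low power and clipping at high power, can be illustrated with a very small model of a uniform quantiser. The sketch below is not the receiver model from the thesis: the function name, the 6-bit resolution and the unit full-scale range are hypothetical values chosen only to make the behaviour visible.

```python
def quantise(sample, full_scale=1.0, bits=6):
    """Clip to the digitiser range, then round to the nearest uniform level.

    Assumed (hypothetical) digitiser: 2**bits levels spanning
    [-full_scale, full_scale - step], as in a two's-complement ADC.
    """
    levels = 2 ** bits
    step = 2.0 * full_scale / levels
    # Non-linear clipping: anything outside the range saturates.
    clipped = max(-full_scale, min(full_scale - step, sample))
    # Quantisation: the rounding error here is at most half a step.
    index = round(clipped / step)
    return index * step


if __name__ == "__main__":
    step = 2.0 / 2 ** 6
    for x in (0.004, 0.1, 0.9, 5.0):
        q = quantise(x)
        print(f"in={x:<6} out={q:<8} error={abs(q - x):.5f} step={step}")
```

In this toy model a very small input is rounded to zero, so the quantisation step dominates the error at low signal levels, while an input far above full scale saturates at the top level, so clipping dominates at high levels. A realistic receiver model would also have to account for noise sources and the AC or DC coupling described in the abstract.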