Hidden Markov Models
Hidden Markov Models (HMMs), although known for decades, are now widely used and still under active development. This book presents theoretical issues and a variety of HMM applications in speech recognition and synthesis, medicine, neuroscience, computational biology, bioinformatics, seismology, environmental protection, and engineering. I hope that readers will find this book useful and helpful for their own research.
Audio Spatio-Temporal Fingerprints for Cloudless Real-Time Hands-Free Diarization on Mobile Devices
In this paper, we propose a new low-bit-rate representation of a sound field and a corresponding method for cloudless, low-delay, hands-free diarization suitable for low-performance mobile devices, e.g. mobile phones. The proposed audio spatio-temporal fingerprint representation has a low bit rate (500 bytes/second) yet contains complete information about continuous audio tracking of multiple acoustic sources in an open, unconstrained environment. The core of the algorithm is the simultaneous processing of multiple data streams using the audio spatio-temporal fingerprint representation to cover higher-level events relevant for diarization, e.g. turns, interruptions, crosstalk, and speech and non-speech segments. Performance levels achieved to date on 5 hours of hand-labelled datasets show the feasibility of the approach, with a 7.58% CPU load on a 1-core ultra-low-power mobile processor running at 1 GHz and a low algorithmic delay of 112 ms.
Cross-layer design for multimedia applications in cognitive radio networks.
Ph.D. University of KwaZulu-Natal, Durban 2015. The exponential growth in wireless services and the current trend of development in wireless communication technologies have left the radio spectrum so overcrowded that it can no longer meet the ever-increasing requirements of wireless applications. In contrast, literature surveys indicate that a large amount of the licensed radio spectrum is underutilized. This has created the need for efficient spectrum sharing among different systems, applications and services in a dynamic wireless environment. Cognitive radio (CR) technology has emerged as a way to improve the overall efficiency of radio spectrum utilization by allowing unlicensed users (also known as secondary users) to use a licensed band when it is vacant.
Multimedia applications are being targeted for CR networks. However, the performance and success of CR technology will be determined by the quality of service (QoS) perceived by secondary users. To transmit multimedia content with stringent QoS requirements over CR networks, many technical challenges that are constrained by the layered protocol architecture have to be addressed. Cross-layer design has shown promise as an approach to optimizing network performance across different layers. This work addresses the question of how to provide QoS guarantees for multimedia transmission over CR networks in terms of throughput maximization while ensuring that interference to primary users is avoided or minimized. Spectrum sensing is a fundamental problem in cognitive radio networks for the protection of primary users, and therefore the first part of this work reviews some low-complexity spectrum sensing schemes. A cooperative spectrum sensing scheme in which multiple users perform spectrum sensing independently is also developed. To address the hidden node problem, a cooperative relay based on the amplify-and-forward (AF) technique is formulated. The performance of a spectrum sensor is usually evaluated using the receiver operating characteristic (ROC) curve, which captures the trade-off between the probability of missed detection and the probability of false alarm. Due to hardware limitations, the spectrum sensor cannot sense the whole range of the radio spectrum, which results in partial information about the channel state. To model a media access control (MAC) protocol that can make channel access decisions under partial information about the state of the system, we apply a partially observable Markov decision process (POMDP), a suitable tool for making decisions under uncertainty. A throughput-optimizing MAC scheme in the presence of spectrum sensing errors is then developed using the concept of cross-layer design, which integrates the design of spectrum sensing at the physical layer (PHY) with sensing and access strategies at the MAC layer in order to maximize the overall network throughput. The problem is formulated as a POMDP and the throughput performance of the scheme is evaluated using computer simulations under a greedy sensing algorithm. Simulation results demonstrate improved overall throughput performance. Furthermore, multiple channels with multiple secondary users having random message arrivals are considered during simulation, and the throughput performance is evaluated under the greedy sensing scheme, which forms a benchmark for the cross-layer MAC scheme in the presence of spectrum sensing errors. Recognizing that speech communication is still the most dominant and common service in wireless applications, we develop a cross-layer MAC scheme for speech transmission in CR networks. The design aims to maximize the throughput of secondary users by integrating the design of spectrum sensing at PHY, the quantization parameter of speech traffic at the application layer (APP), and the strategy for spectrum access at the MAC layer, with the main goal of improving the QoS perceived by secondary users in CR networks. Simulation results demonstrate improved throughput performance and hence improved QoS.
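The channel access decision under partial observation described above can be illustrated with a minimal sketch. It is not the thesis's scheme: it assumes each channel is an independent two-state (idle/busy) Markov chain, that the secondary user senses one channel per slot, and that the belief is updated directly from the sensing result. The names `p01`, `p11`, `p_fa`, `p_md` and `bandwidths` are hypothetical parameters introduced only for this illustration.

```python
def predict(belief, p01, p11):
    """Propagate P(channel idle) one slot through the two-state Markov chain.

    p01 = P(idle next | busy now), p11 = P(idle next | idle now) (assumed names).
    """
    return belief * p11 + (1.0 - belief) * p01


def correct(belief, sensed_idle, p_fa, p_md):
    """Bayes update of P(idle) for the sensed channel given a noisy sensing result.

    p_fa = P(sensed busy | idle), p_md = P(sensed idle | busy) (assumed names).
    """
    if sensed_idle:
        num = belief * (1.0 - p_fa)
        den = num + (1.0 - belief) * p_md
    else:
        num = belief * p_fa
        den = num + (1.0 - belief) * (1.0 - p_md)
    return num / den


def greedy_channel(beliefs, bandwidths):
    """Greedy sensing: pick the channel with the largest expected immediate reward."""
    rewards = [b * w for b, w in zip(beliefs, bandwidths)]
    return max(range(len(rewards)), key=rewards.__getitem__)


def step(beliefs, bandwidths, sensed_idle, p01, p11, p_fa, p_md):
    """One slot: choose a channel greedily, fold in its sensing result, predict ahead."""
    k = greedy_channel(beliefs, bandwidths)
    posterior = list(beliefs)
    posterior[k] = correct(beliefs[k], sensed_idle, p_fa, p_md)
    return k, [predict(b, p01, p11) for b in posterior]


if __name__ == "__main__":
    chosen, new_beliefs = step([0.5, 0.8], [1.0, 1.0], True,
                               p01=0.2, p11=0.8, p_fa=0.1, p_md=0.2)
    print(chosen, new_beliefs)
```

The belief vector here is the sufficient statistic that a POMDP policy acts on; the greedy rule only maximizes the reward of the current slot, which is why it serves as a benchmark rather than an optimal policy.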
One of the main features of modern communication systems is parameterized operation at different layers of the protocol stack. This feature aims to give them the capability to adapt to rapidly changing traffic, channel and system conditions. Another interesting research problem in this thesis is the combination of individual adaptation mechanisms into a cross-layer design that can maximize their effectiveness. We propose a joint cross-layer MAC scheme that integrates the design of spectrum sensing at the PHY layer, access at the MAC layer and APP information in order to improve the QoS for video transmission in CR networks. The end-to-end video distortion, which is considered an APP parameter, resides in the video encoder. It is integrated into the state space and the problem is formulated as a constrained POMDP. The H.264 coding algorithm, one of the highly efficient video coding standards, is considered. The objective is to minimize this end-to-end video distortion while maximizing the overall network throughput for video transmission in CR networks. The end-to-end video distortion has significant effects on the QoS perceived by the user and is viewed as the cost in the overall system design. Given the target system throughput and the packet loss ratio when the system is in state i and a composite action is taken in time slot t, the system's immediate cost is evaluated. The expected total cost for overall end-to-end video distortion over the total time slots is then computed. A joint optimal policy which minimizes the expected total end-to-end distortion over the total time slots is computed iteratively. The minimum expected cost (also known as the value function) is likewise evaluated iteratively over the total time slots. The throughput performance of the proposed scheme is evaluated through computer simulation in four scenarios: A, B, C and D. In simulation scenario A, the average throughput performance as a function of the time horizon is studied. The throughput performance under channel access decisions based on the belief vector is compared with that under channel access decisions based on the end-to-end distortion. Simulation results show that channel access decisions based on end-to-end distortion outperform those based on the belief vector. In simulation scenario B, we study the spectral efficiency as a function of the prescribed collision probability. The simulation results show that at large values of collision probability the overall spectral efficiency is poor. However, there is an optimal value of collision probability at which the spectral efficiency approaches that of the perfect channel access decision. In simulation scenario C, we study the average throughput performance and the spectral efficiency, both as functions of the prescribed collision probability. The simulation results show that both the average throughput and the spectral efficiency are strongly affected by an increase in collision probability. However, there is an optimal prescribed collision probability which achieves the maximum average throughput and maximum spectral efficiency.
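The iterative computation of the minimum expected total cost over a finite number of time slots can be sketched as backward induction. This is a deliberately simplified illustration and not the thesis's constrained POMDP: it assumes a small, fully observed state space, and the function name `backward_induction` and the tables `cost[i][a]` (immediate cost in state i under action a) and `trans[a][i][j]` (transition probability) are hypothetical.

```python
def backward_induction(cost, trans, horizon):
    """Finite-horizon dynamic programming for a small, fully observed MDP.

    cost[i][a]     : immediate cost in state i under action a (assumed table)
    trans[a][i][j] : P(next state j | state i, action a) (assumed table)
    Returns the value function at the first slot and the per-slot policy.
    """
    n_states = len(cost)
    n_actions = len(cost[0])
    value = [0.0] * n_states      # no cost is incurred after the final slot
    policy = []                   # policy[t][i] = best action in state i at slot t
    for _ in range(horizon):
        new_value, decision = [], []
        for i in range(n_states):
            # Expected total cost of each action: immediate cost plus cost-to-go.
            q = [cost[i][a] + sum(trans[a][i][j] * value[j] for j in range(n_states))
                 for a in range(n_actions)]
            best = min(range(n_actions), key=q.__getitem__)
            decision.append(best)
            new_value.append(q[best])
        value = new_value
        policy.insert(0, decision)  # slots are filled from last to first
    return value, policy


if __name__ == "__main__":
    cost = [[1.0, 2.0], [4.0, 0.5]]
    trans = [
        [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in the current state
        [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch to the other state
    ]
    print(backward_induction(cost, trans, horizon=2))
```

In a POMDP the same recursion runs over belief states rather than observed states, and a constrained formulation adds a side condition such as a collision probability bound; both are omitted here to keep the sketch short.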
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and the methods for speech feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other applications that are able to operate in real-world environments, such as mobile communication services and smart homes.