Colour technologies for content production and distribution of broadcast content
The requirement of colour reproduction has long been a priority driving the development of new colour imaging systems that maximise human perceptual plausibility. This thesis explores machine learning algorithms for colour processing to assist both content production and distribution. First, this research studies colourisation technologies with practical use cases in restoration and processing of archived content. The research targets practical deployable solutions, developing a cost-effective pipeline which integrates the activity of the producer into the processing workflow. In particular, a fully automatic image colourisation paradigm using Conditional GANs is proposed to improve content generalisation and colourfulness of existing baselines. Moreover, a more conservative solution is considered by providing references to guide the system towards more accurate colour predictions. A fast end-to-end architecture is proposed to improve existing exemplar-based image colourisation methods while decreasing the complexity and runtime. Finally, the proposed image-based methods are integrated into a video colourisation pipeline. A general framework is proposed to reduce the generation of temporal flickering or propagation of errors when such methods are applied frame-to-frame. The proposed model is jointly trained to stabilise the input video and to cluster its frames with the aim of learning scene-specific modes. Second, this research explores colour processing technologies for content distribution with the aim of effectively delivering the processed content to a broad audience. In particular, video compression is tackled by introducing a novel methodology for chroma intra prediction based on attention models. Although the proposed architecture helped to gain control over the reference samples and better understand the prediction process, the complexity of the underlying neural network significantly increased the encoding and decoding time. Therefore, aiming at efficient deployment within the latest video coding standards, this work also focused on the simplification of the proposed architecture to obtain a more compact and explainable model.
Neural mechanisms of attention and speech perception in complex, spatial acoustic environment
We can hold conversations in environments where there are typically additional simultaneous talkers in the background, or noise such as vehicles on the street or music playing at a sidewalk café. This seemingly trivial everyday task is difficult for people with hearing deficits and is extremely hard to model in machines. This dissertation explores the neural mechanisms by which the human brain encodes such complex acoustic environments and how cognitive processes like attention shape the processing of attended speech. My initial experiments explore the representation of acoustic features that help us localize single sound sources in the environment (features such as the direction and spectrotemporal content of sounds) and the interaction of these representations with each other. I play natural American English sentences coming from five azimuthal directions in space.
Using intracranial electrocorticography (ECoG) recordings from the human auditory cortex of the listener, I show that the direction of sound and the spectrotemporal content are encoded in two distinct aspects of the neural response: the direction modulates the mean of the response, and the spectrotemporal features contribute to the modulation of the neural response around its mean. Furthermore, I show that these features are orthogonal to each other and do not interact. This representation enables successful decoding of both spatial and phonetic information. These findings contribute to defining the functional organization of responses in the human auditory cortex, with implications for more accurate neurophysiological models of spatial speech processing.
I take a step further to investigate the role of attention in encoding the direction and phonetic features of speech. I play a mixture of male and female spatialized talkers, e.g., a male talker to the left of the listener and a female talker to the right (the talkers’ locations switch randomly after each sentence). I ask the listener to follow a given talker, e.g., to follow the male talker as the talkers switch location after each uttered sentence. While the listener performs this task, I collect intracranial EEG data from their auditory cortex. I investigate the bottom-up, stimulus-dependent and attention-independent encoding of such cocktail-party speech and the top-down, attention-driven role in the encoding of location and speech features. I find a bottom-up, stimulus-driven contralateral preference in the encoding of the mixed speech, i.e., the left brain hemisphere automatically and predominantly encodes speech coming from the right, and vice versa. On top of this bottom-up representation, I find that the attended talker’s direction modulates the baseline of the neural response and the attended talker’s voice modulates the spectrotemporal tuning of the neural response. Moreover, the modulation by the attended talker’s location is present throughout the auditory cortex, but the modulation by the attended talker’s voice is present only in higher-order auditory cortex areas. My findings provide crucially needed evidence to determine how bottom-up and top-down signals interact in the auditory cortex in crowded and complex acoustic scenes to enable robust speech perception. Furthermore, they shed light on the hierarchical encoding of attended speech, which has implications for improving auditory attention decoding models.
Finally, I present a clinical case study in which we show that electrical stimulation of specific sites in the planum temporale (PT) of an epilepsy patient implanted with intracranial electrodes enhances speech-in-noise perception. When noisy speech is played during such electrical stimulation, the patient perceives that the noise disappears and that the speech is similar to the clean speech they hear without any noise. We performed a series of analyses to determine the functional organization of the three main subregions of the human auditory cortex: planum temporale (PT), Heschl’s gyrus (HG) and superior temporal gyrus (STG). Using Cortico-Cortical Evoked Potentials (CCEPs), we modeled the PT sites to be located between the sites in HG and STG. Furthermore, we find that the discriminability of speech from nonspeech sounds increased in population neural responses from the HG to the PT to the STG sites. These findings causally implicate the PT in background noise suppression and may point to a novel potential neuroprosthetic solution to assist in the challenging task of speech perception in noise.
Together, this dissertation presents new evidence for the neural encoding of spatial speech, the interaction of stimulus-driven and attention-driven neural processes in spatial multi-talker speech perception, and the enhancement of speech-in-noise perception by electrical brain stimulation.
2023-2024 Boise State University Undergraduate Catalog
This catalog is primarily for and directed at students. However, it serves many audiences, such as high school counselors, academic advisors, and the public. In this catalog you will find an overview of Boise State University and information on admission, registration, grades, tuition and fees, financial aid, housing, student services, and other important policies and procedures. However, most of this catalog is devoted to describing the various programs and courses offered at Boise State.
Modelling, Monitoring, Control and Optimization for Complex Industrial Processes
This reprint includes 22 research papers and an editorial, collected from the Special Issue "Modelling, Monitoring, Control and Optimization for Complex Industrial Processes", highlighting recent research advances and emerging research directions in complex industrial processes. This reprint aims to promote the research field and benefit readers from both academic communities and industrial sectors.
Security and Privacy Problems in Voice Assistant Applications: A Survey
Voice assistant applications have become ubiquitous nowadays. Two models that
provide the two most important functions for real-life applications (e.g.,
Google Home, Amazon Alexa, and Siri) are Automatic Speech Recognition (ASR)
models and Speaker Identification (SI) models. According to recent studies,
security and privacy threats have also emerged with the rapid development of
the Internet of Things (IoT). The security issues researched include attack
techniques toward machine learning models and other hardware components widely
used in voice assistant applications. The privacy issues include technical-wise
information stealing and policy-wise privacy breaches. Voice assistant
applications take a steadily growing market share every year, but their privacy
and security issues continue to cause huge economic losses and endanger
users' sensitive personal information. Thus, it is important to have a
comprehensive survey to outline the categorization of the current research
regarding the security and privacy problems of voice assistant applications.
This paper summarizes and assesses five kinds of security attacks and three
types of privacy threats in papers published in the top-tier conferences of the
cyber security and voice domains.
Decoding spatial location of attended audio-visual stimulus with EEG and fNIRS
When analyzing complex scenes, humans often focus their attention on an object at a particular spatial location in the presence of background noises and irrelevant visual objects. The ability to decode the attended spatial location would facilitate brain computer interfaces (BCI) for complex scene analysis. Here, we tested two different neuroimaging technologies and investigated their capability to decode audio-visual spatial attention in the presence of competing stimuli from multiple locations. For functional near-infrared spectroscopy (fNIRS), we targeted the dorsal frontoparietal network, including the frontal eye field (FEF) and intra-parietal sulcus (IPS), as well as the superior temporal gyrus/planum temporale (STG/PT). All of these regions were shown in previous functional magnetic resonance imaging (fMRI) studies to be activated by auditory, visual, or audio-visual spatial tasks. We found that fNIRS provides robust decoding of attended spatial locations for most participants and correlates with behavioral performance. Moreover, we found that FEF makes a large contribution to decoding performance. Surprisingly, the performance was significantly above chance level 1 s after cue onset, which is well before the peak of the fNIRS response.
For electroencephalography (EEG), while there are several successful EEG-based algorithms, to date all of them have focused exclusively on the auditory modality, where eye-related artifacts are minimized or controlled. Successful integration into more ecologically typical usage requires careful consideration of eye-related artifacts, which are inevitable. We showed that fast and reliable decoding can be done with or without an ocular artifact removal algorithm. Our results show that EEG and fNIRS are promising platforms for compact, wearable technologies that could be applied to decode attended spatial location and reveal contributions of specific brain regions during complex scene analysis.
Anticholinergic use in the UK: longitudinal trends and associations with cognitive outcomes
Observational studies have shown an association between the use of anticholinergic drugs and various negative health outcomes. However, when studying cognitive outcomes, there is great heterogeneity in previous results. The objectives of the present thesis are threefold. First, to explore the longitudinal patterns of anticholinergic prescribing in the UK. Second, to examine the association between anticholinergic burden and dementia. Third, to probe the relationship between anticholinergic burden, general cognitive ability, and brain structural MRI in relatively healthy participants.
Chapter 1 provides an overview of the role of acetylcholine as a neurotransmitter in the human body. It begins with a description of its molecular characteristics and continues with a summary of anatomical and cellular features of cholinergic pathways in the brain. The chapter concludes with a description of the relevance of cholinergic processing in cognition and Alzheimer’s disease.
Chapter 2 gives a summary of anticholinergic drugs. It describes the history of anticholinergic compounds and their present use in medicine. It then appraises the tools used to gauge the anticholinergic potency of drugs. I conclude the Chapter by evaluating the available evidence on the effects of anticholinergic drugs on various important health outcomes.
Chapter 3 focuses on UK Biobank, the sample used in all analyses presented in this thesis. The chapter briefly describes the conception of the study, the timeline of assessments, and the available variables. I focus in my descriptions on the variables that were used in the present thesis, especially cognitive tests, brain imaging, and linked health data.
Chapters 4 to 6 present the empirical work conducted as part of this thesis. Chapter 4 presents an analysis of anticholinergic prescribing trends in UK primary care from the year 1990 to 2015. I first calculate an anticholinergic burden (AChB) according to 13 different anticholinergic scales and an average to derive a “Meta-scale”. I then describe the prevalence of anticholinergic prescribing and its longitudinal trend for all scales. I use different plots of age-, period- and cohort effects on the AChB according to the Meta-scale to evaluate the contributions of these effects to the linear longitudinal trend. The study finds AChB to have increased 9-fold over 25 years and that this effect was attributable to both age- and cohort/period-related changes. In other words, ageing of the sample is not sufficient to explain the increase in anticholinergic prescribing; cohort- or period-effects must have contributed to the observed changes.
Chapter 5 explores the relationship between anticholinergic prescribing and dementia. Previous studies on this topic had provided varied results. One of the goals of the present study was to probe potential factors for this heterogeneity. We find that greater AChB according to most of the studied anticholinergic scales (standardised HR range: 1.027-1.125), as well as the slope of anticholinergic change (HR=1.094; 95% CI: 1.068-1.119), are associated with dementia. However, we find that not all drug classes are associated with dementia. Antidepressants (HR=1.11, 95% CI=1.07-1.17), antiepileptics (HR=1.07, 95% CI=1.04-1.11), and the diuretic furosemide (HR=1.06, 95% CI=1.02-1.10) exhibit the strongest effects. Interestingly, when exploring the effects of groups of anticholinergic drugs with different anticholinergic potencies, only the moderate potency group shows significant associations with dementia (HR=1.10, 95% CI=1.05-1.15).
Chapter 6 examines the association between AChB, general cognitive ability, and brain structural MRI. It aims both to explore the potential sources of heterogeneity in previous work, as well as to expand on it by studying relatively healthy community-dwelling adults. We study brain structural MRI in a much bigger sample (at least 5x bigger) and use many more outcomes than previous studies. We find weak, but significant associations between AChB and general cognitive ability, and with 7/9 individual cognitive tests (standardised betas (β) range: -0.039, -0.003). Again, AChB in only some drug classes is associated with lower general cognitive ability, especially β-lactam antibiotics (β=-0.035, pFDR.
Finally, Chapter 7 summarizes the findings presented in Chapters 4 to 6. The chapter also provides a critique of the sample and of my approach when conducting the analyses presented in the present thesis. The chapter concludes by discussing suggestions for future work on this topic.
Deep Learning Enabled Semantic Communication Systems
In the past decades, communications have primarily focused on how to accurately and effectively transmit symbols (measured by bits) from the transmitter to the receiver. Recently, various new applications have appeared, such as autonomous transportation, consumer robotics, environmental monitoring, and tele-health. The interconnection of these applications will generate a staggering amount of data on the order of zettabytes and require massive connectivity over limited spectrum resources with lower latency, which poses critical challenges to conventional communication systems. Semantic communication has been proposed to overcome these challenges by extracting the meanings of data and filtering out useless, irrelevant, and unessential information, which is expected to be robust to harsh channel environments and to reduce the size of transmitted data. While semantic communications were proposed decades ago, their applications to the wireless communication scenario remain limited. Deep learning (DL) based neural networks can effectively extract semantic information and can be optimized in an end-to-end (E2E) manner. The inherent characteristics of DL are suitable for semantic communications, which motivates us to exploit DL-enabled semantic communication. Inspired by the above, this thesis focuses on exploring semantic communication theory and designing semantic communication systems. First, a basic DL based semantic communication system, named DeepSC, is proposed for text transmission. In addition, DL based multi-user semantic communication systems are investigated for transmitting single-modal data and multimodal data, respectively, in which intelligent tasks are performed at the receiver directly. Moreover, a semantic communication system with a memory module, named Mem-DeepSC, is designed to support both memoryless and memory intelligent tasks.
Finally, a lite distributed semantic communication system based on DL, named L-DeepSC, is proposed with low complexity, where the data transmission from the Internet-of-Things (IoT) devices to the cloud/edge works at the semantic level to improve transmission efficiency. Compared to conventional communication systems, the various proposed DeepSC systems can achieve less data transmission to reduce the transmission latency, lower complexity to fit capacity-constrained devices, higher robustness to multi-user interference and channel noise, and better performance on various intelligent tasks.