Time as a supervisor: temporal regularity and auditory object learning
Sensory systems appear to learn to transform incoming sensory information into perceptual representations, or “objects,” that can inform and guide behavior with minimal explicit supervision. Here, we propose that the auditory system can achieve this goal by using time as a supervisor, i.e., by learning features of a stimulus that are temporally regular. We will show that this procedure generates a feature space sufficient to support fundamental computations of auditory perception. In detail, we consider the problem of discriminating between instances of a prototypical class of natural auditory objects, i.e., rhesus macaque vocalizations. We test discrimination in two ethologically relevant tasks: discrimination in a cluttered acoustic background and generalization to discriminate between novel exemplars. We show that an algorithm that learns these temporally regular features affords better or equivalent discrimination and generalization than conventional feature-selection algorithms, i.e., principal component analysis and independent component analysis. Our findings suggest that the slow temporal features of auditory stimuli may be sufficient for parsing auditory scenes and that the auditory brain could utilize these slowly changing temporal features
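As a rough illustration of what "temporally regular" can mean in practice, the sketch below scores candidate features by how slowly they change over time. The abstract does not name its algorithm, so this is an assumption: the score is the slowness objective minimized by slow feature analysis (mean squared step-to-step change, normalized by variance), and the function name `slowness` and the two sine-wave features are hypothetical.

```python
import math


def slowness(signal):
    """Mean squared step-to-step change of the signal, divided by its variance.

    Lower values mean the feature varies more slowly over time.
    Dividing by the variance makes the score independent of amplitude.
    """
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(signal) / n
    variance = sum((s - mean) ** 2 for s in signal) / n
    if variance == 0:
        raise ValueError("a constant signal has no defined slowness")
    diffs = [b - a for a, b in zip(signal, signal[1:])]
    return sum(d * d for d in diffs) / len(diffs) / variance


# Two hypothetical candidate features sampled over the same time window.
steps = 1000
times = [2 * math.pi * i / steps for i in range(steps)]
candidates = {
    "slow": [math.sin(t) for t in times],
    "fast": [math.sin(25 * t) for t in times],
}

# Rank candidates from most to least temporally regular.
ranked = sorted(candidates, key=lambda name: slowness(candidates[name]))
```

A learning algorithm of this kind would search for projections of the stimulus that minimize such a score, rather than ranking a fixed set of candidates as done here.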
Multimodal spatio-temporal deep learning framework for 3D object detection in instrumented vehicles
This thesis presents the utilization of multiple modalities, such as image and lidar, to incorporate spatio-temporal information from sequence data into deep learning architectures for 3D object detection in instrumented vehicles. The race to autonomy in instrumented vehicles or self-driving cars has stimulated significant research in developing autonomous driver assistance systems (ADAS) technologies related explicitly to perception systems. Object detection plays a crucial role in perception systems by providing spatial information to its subsequent modules; hence, accurate detection is a significant task supporting autonomous driving. The advent of deep learning in computer vision applications and the availability of multiple sensing modalities such as 360° imaging, lidar, and radar have led to state-of-the-art 2D and 3D object detection architectures. Most current state-of-the-art 3D object detection frameworks consider single-frame reference. However, these methods do not utilize temporal information associated with the objects or scenes from the sequence data. Thus, the present research hypothesizes that multimodal temporal information can contribute to bridging the gap between 2D and 3D metric space by improving the accuracy of deep learning frameworks for 3D object estimations. The thesis presents understanding multimodal data representations and selecting hyper-parameters using public datasets such as KITTI and nuScenes with Frustum-ConvNet as a baseline architecture. Secondly, an attention mechanism was employed along with convolutional-LSTM to extract spatial-temporal information from sequence data to improve 3D estimations and to aid the architecture in focusing on salient lidar point cloud features. Finally, various fusion strategies are applied to fuse the modalities and temporal information into the architecture to assess its efficacy on performance and computational complexity.
Overall, this thesis has established the importance and utility of multimodal systems for refined 3D object detection and proposed a complex pipeline incorporating spatial, temporal and attention mechanisms to improve specific and general class accuracy, demonstrated on key autonomous driving data sets.
Endogenous measures for contextualising large-scale social phenomena: a corpus-based method for mediated public discourse
This work presents an interdisciplinary methodology for developing endogenous measures of group membership through analysis of pervasive linguistic patterns in public discourse. Focusing on political discourse, this work critiques the conventional approach to the study of political participation, which is premised on decontextualised, exogenous measures to characterise groups. Considering the theoretical and empirical weaknesses of decontextualised approaches to large-scale social phenomena, this work suggests that contextualisation using endogenous measures might provide a complementary perspective to mitigate such weaknesses.
This work develops a sociomaterial perspective on political participation in mediated discourse as affiliatory action performed through language. While the affiliatory function of language is often performed consciously (such as statements of identity), this work is concerned with unconscious features (such as patterns in lexis and grammar). This work argues that pervasive patterns in such features that emerge through socialisation are resistant to change and manipulation, and thus might serve as endogenous measures of sociopolitical contexts, and thus of groups.
In terms of method, the work takes a corpus-based approach to the analysis of data from the Twitter messaging service whereby patterns in users’ speech are examined statistically in order to trace potential community membership. The method is applied in the US state of Michigan during the second half of 2018—6 November having been the date of midterm (i.e. non-Presidential) elections in the United States. The corpus is assembled from the original posts of 5,889 users, who are nominally geolocalised to 417 municipalities. These users are clustered according to pervasive language features. Comparing the linguistic clusters according to the municipalities they represent finds that there are regular sociodemographic differentials across clusters. This is understood as an indication of social structure, suggesting that endogenous measures derived from pervasive patterns in language may indeed offer a complementary, contextualised perspective on large-scale social phenomena
Security and Privacy Problems in Voice Assistant Applications: A Survey
Voice assistant applications have become ubiquitous nowadays. Two models that
provide the two most important functions for real-life applications (i.e.,
Google Home, Amazon Alexa, Siri, etc.) are Automatic Speech Recognition (ASR)
models and Speaker Identification (SI) models. According to recent studies,
security and privacy threats have also emerged with the rapid development of
the Internet of Things (IoT). The security issues researched include attack
techniques toward machine learning models and other hardware components widely
used in voice assistant applications. The privacy issues include technical-wise
information stealing and policy-wise privacy breaches. Voice assistant
applications take a steadily growing market share every year, but their privacy
and security issues continue to cause huge economic losses and endanger
users' sensitive personal information. Thus, it is important to have a
comprehensive survey to outline the categorization of the current research
regarding the security and privacy problems of voice assistant applications.
This paper summarizes and assesses five kinds of security attacks and three
types of privacy threats in the papers published in the top-tier conferences of
cyber security and voice domain.
Learning disentangled speech representations
A variety of informational factors are contained within the speech signal and a single short recording of speech reveals much more than the spoken words. The best method to extract and represent informational factors from the speech signal ultimately depends on which informational factors are desired and how they will be used. In addition, sometimes methods will capture more than one informational factor at the same time such as speaker identity, spoken content, and speaker prosody.
The goal of this dissertation is to explore different ways to deconstruct the speech signal into abstract representations that can be learned and later reused in various speech technology tasks. This task of deconstructing, also known as disentanglement, is a form of distributed representation learning. As a general approach to disentanglement, there are some guiding principles that elaborate what a learned representation should contain as well as how it should function. In particular, learned representations should contain all of the requisite information in a more compact manner, be interpretable, remove nuisance factors of irrelevant information, be useful in downstream tasks, and independent of the task at hand. The learned representations should also be able to answer counter-factual questions.
In some cases, learned speech representations can be re-assembled in different ways according to the requirements of downstream applications. For example, in a voice conversion task, the speech content is retained while the speaker identity is changed. And in a content-privacy task, some targeted content may be concealed without affecting how surrounding words sound. While there is no single-best method to disentangle all types of factors, some end-to-end approaches demonstrate a promising degree of generalization to diverse speech tasks.
This thesis explores a variety of use-cases for disentangled representations including phone recognition, speaker diarization, linguistic code-switching, voice conversion, and content-based privacy masking. Speech representations can also be utilised for automatically assessing the quality and authenticity of speech, such as automatic MOS ratings or detecting deep fakes. The meaning of the term "disentanglement" is not well defined in previous work, and it has acquired several meanings depending on the domain (e.g. image vs. speech). Sometimes the term "disentanglement" is used interchangeably with the term "factorization". This thesis proposes that disentanglement of speech is distinct, and offers a viewpoint of disentanglement that can be considered both theoretically and practically
Discovering the hidden structure of financial markets through bayesian modelling
Understanding what is driving the price of a financial asset is a question that is currently mostly unanswered. In this work we go beyond the classic one step ahead prediction and instead construct models that create new information on the behaviour of these time series. Our aim is to get a better understanding of the hidden structures that drive the moves of each financial time series and thus the market as a whole.
We propose a tool to decompose multiple time series into economically-meaningful variables to explain the endogenous and exogenous factors driving their underlying variability. The methodology we introduce goes beyond the direct model forecast. Indeed, since our model continuously adapts its variables and coefficients, we can study the time series of coefficients and selected variables. We also present a model to construct the causal graph of relations between these time series and include them in the exogenous factors.
Hence, we obtain a model able to explain what is driving the moves of both each specific time series and the market as a whole. In addition, the obtained graph of the time series provides new information on the underlying risk structure of this environment. With this deeper understanding of the hidden structure we propose novel ways to detect and forecast risks in the market. We investigate our results with inferences up to one month into the future using stocks, FX futures and ETF futures, demonstrating the model's superior performance in terms of accuracy on large moves, longer-term prediction and consistency over time. We also go into more detail on the economic interpretation of the new variables and discuss the created graph structure of the market.
Machine learning enabled millimeter wave cellular system and beyond
Millimeter-wave (mmWave) communication, with its advantages of abundant bandwidth and immunity to interference, has been deemed a promising technology for the next generation network and beyond. With the help of mmWave, the requirements envisioned for the future mobile network could be met, such as addressing the massive growth required in coverage, capacity, and traffic, providing a better quality of service and experience to users, supporting ultra-high data rates and reliability, and ensuring ultra-low latency. However, due to the characteristics of mmWave, such as short transmission distance, high sensitivity to blockage, and large propagation path loss, there are some challenges for mmWave cellular network design. In this context, to enjoy the benefits of mmWave networks, the architecture of the next generation cellular network will be more complex. A more complex network brings more complex problems. The plethora of possibilities makes planning and managing a complex network system more difficult. Specifically, to provide better Quality of Service and Quality of Experience for users in such a network, efficient and effective handover for mobile users is important. The probability of a handover trigger will significantly increase in the next generation network, due to the dense small cell deployment. Since the resources in the base station (BS) are limited, handover management will be a great challenge. Further, to achieve the maximum transmission rate for the users, the line-of-sight (LOS) channel would be the main transmission channel. However, due to the characteristics of mmWave and the complexity of the environment, an LOS channel is not always feasible. Non-line-of-sight channels should be explored and used as the backup link to serve the users. With all the problems trending toward being complex and nonlinear, and the data traffic dramatically increasing, conventional methods are no longer effective or efficient.
In this case, how to solve the problems in the most efficient manner becomes important.
Therefore, some new concepts, as well as novel technologies, need to be explored. Among them, one promising solution is the utilization of machine learning (ML) in the mmWave cellular network. On the one hand, with the aid of ML approaches, the network could learn from the mobile data, which allows the system to use adaptable strategies while avoiding unnecessary human intervention. On the other hand, when ML is integrated in the network, the complexity and workload could be reduced, while the huge number of devices and data could be efficiently managed.
Therefore, in this thesis, different ML techniques that assist in optimizing different areas in the mmWave cellular network are explored, in terms of non-line-of-sight (NLOS) beam tracking, handover management, and beam management. To be specific, first of all, a procedure to predict the angle of arrival (AOA) and angle of departure (AOD) both in azimuth and elevation in non-line-of-sight mmWave communications based on a deep neural network is proposed. Moreover, along with the AOA and AOD prediction, a trajectory prediction is employed based on the dynamic window approach (DWA). The simulation scenario is built with ray tracing technology to generate data. Based on the generated data, two deep neural networks (DNNs) are used to predict AOA/AOD in the azimuth (AAOA/AAOD) and AOA/AOD in the elevation (EAOA/EAOD). Furthermore, under an assumption that the UE mobility and the precise location are unknown, the UE trajectory is predicted and input into the trained DNNs as a parameter to predict the AAOA/AAOD and EAOA/EAOD to show the performance under a realistic assumption. The robustness of both procedures is evaluated in the presence of errors, and it is concluded that a DNN is a promising tool to predict AOA and AOD in an NLOS scenario. Second, a novel handover scheme is designed aiming to optimize the overall system throughput and the total system delay while guaranteeing the quality of service (QoS) of each user equipment (UE). Specifically, the proposed handover scheme called O-MAPPO integrates a reinforcement learning (RL) algorithm with optimization theory. An RL algorithm known as multi-agent proximal policy optimization (MAPPO) plays a role in determining handover trigger conditions. Further, an optimization problem is proposed in conjunction with MAPPO to select the target base station and determine beam selection.
It aims to evaluate and optimize the system performance of total throughput and delay while guaranteeing the QoS of each UE after the handover decision is made.
Third, a multi-agent RL-based beam management scheme is proposed, where multi-agent deep deterministic policy gradient (MADDPG) is applied on each small-cell base station (SCBS) to maximize the system throughput while guaranteeing the quality of service. With MADDPG, smart beam management methods can serve the UEs more efficiently and accurately. Specifically, the mobility of UEs causes dynamic changes in the network environment, and the MADDPG algorithm learns from the experience of these changes. Based on that, the beam management in the SCBS is optimized according to the reward or penalty when serving different UEs. The approach could improve the overall system throughput and delay performance compared with traditional beam management methods.
The works presented in this thesis demonstrate the potential of ML for addressing problems in the mmWave cellular network. Moreover, they provide specific solutions for optimizing NLOS beam tracking, handover management and beam management. For the NLOS beam tracking part, simulation results show that the prediction errors of the AOA and AOD can be maintained within an acceptable range of ±2. Further, for the handover optimization part, the numerical results show the system throughput and delay are improved by 10% and 25%, respectively, when compared with two typical RL algorithms, Deep Deterministic Policy Gradient (DDPG) and Deep Q-learning (DQL). Lastly, for the intelligent beam management part, numerical results reveal the convergence performance of the MADDPG and its superiority in improving the system throughput compared with other typical RL algorithms and the traditional beam management method.
Computing Interpretable Representations of Cell Morphodynamics
Shape changes (morphodynamics) are one of the principal ways cells interact with their environments and perform key intrinsic behaviours like division. These dynamics arise from a myriad of complex signalling pathways that often organise with emergent simplicity to carry out critical functions including predation, collaboration and migration. A powerful method for analysis can therefore be to quantify this emergent structure, bypassing the low-level complexity. Enormous image datasets are now available to mine. However, it can be difficult to uncover interpretable representations of the global organisation of these heterogeneous dynamic processes. Here, such representations were developed for interpreting morphodynamics in two key areas: mode of action (MoA) comparison for drug discovery (developed using the economically devastating Asian soybean rust crop pathogen) and 3D migration of immune system T cells through extracellular matrices (ECMs). For MoA comparison, population development over a 2D space of shapes (morphospace) was described using two models with condition-dependent parameters: a top-down model of diffusive development over Waddington-type landscapes, and a bottom-up model of tip growth. A variety of landscapes were discovered, describing phenotype transitions during growth, and possible perturbations in the tip growth machinery that cause this variation were identified. For interpreting T cell migration, a new 3D shape descriptor that incorporates key polarisation information was developed, revealing low-dimensionality of shape, and the distinct morphodynamics of run-and-stop modes that emerge at minute timescales were mapped. Periodically oscillating morphodynamics that include retrograde deformation flows were found to underlie active translocation (run mode). Overall, it was found that highly interpretable representations could be uncovered while still leveraging the enormous discovery power of deep learning algorithms. 
The results show that whole-cell morphodynamics can be a convenient and powerful place to search for structure, with potentially life-saving applications in medicine and biocide discovery as well as immunotherapeutics.