14 research outputs found

    Simulating realistic multiparty speech data: for the development of distant microphone ASR systems

    Automatic speech recognition has become a ubiquitous technology integrated into our daily lives. However, the problem remains challenging when the speaker is far away from the microphone. In such scenarios, the speech is degraded both by reverberation and by the presence of additive noise. This situation is particularly challenging when there are competing speakers present (i.e. multi-party scenarios). Acoustic scene simulation has been a major tool for training and developing distant microphone speech recognition systems, and is now being used to develop solutions for multi-party scenarios. It has been used both in training, as it allows cheap generation of limitless amounts of data, and for evaluation, because it can provide easy access to a ground truth (i.e. a noise-free target signal). However, whilst much work has been conducted to produce realistic artificial scene simulators, the signals produced from such simulators are only as good as the 'metadata' being used to define the setups, i.e., the data describing, for example, the number of speakers and their distribution relative to the microphones. This thesis looks at how realistic metadata can be derived by analysing how speakers behave in real domestic environments. In particular, it examines how to produce scenes that provide a realistic distribution for various factors that are known to influence the 'difficulty' of the scene, including the separation angle between speakers, the absolute and relative distances of speakers to microphones, and the pattern of temporal overlap of speech. Using an existing audio-visual multi-party conversational dataset, CHiME-5, each of these aspects has been studied in turn. First, producing a realistic angular separation between speakers allows for algorithms which enhance signals based on the direction of arrival to be fairly evaluated, reducing the mismatch between real and simulated data.
This was estimated using automatic people detection techniques in video recordings from CHiME-5. Results show that commonly used datasets of simulated signals do not follow a realistic distribution, and when a realistic distribution is enforced, a significant drop in performance is observed. Second, by using multiple cameras it has been possible to estimate the 2-D positions of people inside each scene. This has allowed the estimation of realistic distributions for the absolute distance to the microphone and relative distance to the competing speaker. The results show grouping behaviour among participants when located in a room, and the impact this has on performance depends on the room size considered. Finally, the amount of overlap and the points in the mixture which contain overlap were explored using finite-state models. These models allowed mixtures to be generated that approached the overlap patterns observed in the real data. Features derived from these models were also shown to be a predictor of the difficulty of the mixture. At each stage of the project, simulated datasets derived using the realistic metadata distributions have been compared to existing standard datasets that use naive or uninformed metadata distributions, and implications for speech recognition performance are observed and discussed. This work has demonstrated how unrealistic approaches can produce over-promising results, and can bias research towards techniques that might not work well in practice. Results will also be valuable in informing the design of future simulated datasets.

    Online Audio-Visual Multi-Source Tracking and Separation: A Labeled Random Finite Set Approach

    The dissertation proposes an online solution for separating an unknown and time-varying number of moving sources using audio and visual data. The random finite set framework is used for the modeling and fusion of audio and visual data. This enables an online tracking algorithm to estimate the source positions and identities for each time point. With this information, a set of beamformers can be designed to separate each desired source and suppress the interfering sources.

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines are typically dependent on the images captured in real-world environments. This means that images might be affected by different sources of perturbations (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision communities. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before being processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.

    Human-Computer Interaction

    In this book the reader will find a collection of 31 papers presenting different facets of Human-Computer Interaction, the result of research projects and experiments as well as new approaches to designing user interfaces. The book is organized according to the following main topics, in sequential order: new interaction paradigms, multimodality, usability studies on several interaction mechanisms, human factors, universal design, and development methodologies and tools.

    Security and Privacy Threats on Mobile Devices through Side-Channels Analysis

    In recent years, mobile devices (such as smartphones and tablets) have become essential tools in everyday life for billions of people all around the world. Users continuously carry such devices with them and use them for daily communication activities and social network interactions. Hence, such devices contain a huge amount of private and sensitive information. For this reason, mobile devices have become popular targets of attacks. In most attack settings, the adversary aims to take local or remote control of a device to access sensitive user information. However, such violations are not easy to carry out, since they need to leverage a vulnerability of the system or a careless user (e.g., one who installs a malware app from an unreliable source). A different approach that does not have these shortcomings is side-channel analysis. Side-channels are physical phenomena that can be measured from either inside or outside a device. They are mostly due to the user's interaction with a mobile device, but also to the context in which the device is used; hence they can reveal sensitive information about the user (such as identity and habits), the environment, and the operating system itself. This approach therefore consists of inferring private information that is leaked by a mobile device through a side-channel. Side-channel information is also extremely valuable for enforcing security mechanisms such as user authentication, intrusion detection, and information leak detection. This dissertation investigates novel security and privacy challenges in the analysis of side-channels of mobile devices.
This thesis is composed of three parts, each focused on a different side-channel: (i) the usage of network traffic analysis to infer user private information; (ii) the energy consumption of mobile devices during battery recharge as a way to identify a user and as a covert channel to exfiltrate data; and (iii) the possible security application of data collected from built-in sensors in mobile devices to authenticate the user and to evade sandbox detection by malware. In the first part of this dissertation, we consider an adversary who is able to eavesdrop on the network traffic of the device on the network side (e.g., by controlling a WiFi access point). The fact that the network traffic is often encrypted makes the attack even more challenging. Our work proves that it is possible to leverage machine learning techniques to identify user activity and apps installed on mobile devices by analyzing the encrypted network traffic they produce. Such insights are becoming a very attractive data gathering technique for adversaries, network administrators, investigators and marketing agencies. In the second part of this thesis, we investigate the analysis of electric energy consumption. In this case, an adversary is able to measure with a power monitor the amount of energy supplied to a mobile device. We observed that the usage of mobile device resources (e.g., CPU, network capabilities) directly impacts the amount of energy retrieved from the supplier, i.e., USB port for smartphones, wall-socket for laptops. Leveraging energy traces, we are able to recognize a specific laptop user among a group and detect intruders (i.e., users not belonging to the group). Moreover, we show the feasibility of a covert channel to exfiltrate user data which relies on timed energy consumption bursts. In the last part of this dissertation, we present a side-channel that can be measured within the mobile device itself.
This channel consists of data collected from the sensors a mobile device is equipped with (e.g., accelerometer, gyroscope). First, we present DELTA, a novel tool that collects data from such sensors, and logs user and operating system events. Then, we develop MIRAGE, a framework that relies on sensor data to enhance sandboxes against malware analysis evasion.

    Abstracts on Radio Direction Finding (1899 - 1995)

    The files on this record represent the various databases that originally composed the CD-ROM issue of "Abstracts on Radio Direction Finding" database, which is now part of the Dudley Knox Library's Abstracts and Selected Full Text Documents on Radio Direction Finding (1899 - 1995) Collection. (See Calhoun record https://calhoun.nps.edu/handle/10945/57364 for further information on this collection and the bibliography). Due to issues of technological obsolescence preventing current and future audiences from accessing the bibliography, DKL exported and converted into the three files on this record the various databases contained in the CD-ROM. The contents of these files are: 1) RDFA_CompleteBibliography_xls.zip [RDFA_CompleteBibliography.xls: Metadata for the complete bibliography, in Excel 97-2003 Workbook format; RDFA_Glossary.xls: Glossary of terms, in Excel 97-2003 Workbook format; RDFA_Biographies.xls: Biographies of leading figures, in Excel 97-2003 Workbook format]; 2) RDFA_CompleteBibliography_csv.zip [RDFA_CompleteBibliography.TXT: Metadata for the complete bibliography, in CSV format; RDFA_Glossary.TXT: Glossary of terms, in CSV format; RDFA_Biographies.TXT: Biographies of leading figures, in CSV format]; 3) RDFA_CompleteBibliography.pdf: A human-readable display of the bibliographic data, as a means of double-checking any possible deviations due to conversion.

    Optimising route comfort indices for neonatal transfers by road

    The risk of severe brain injuries in sick premature infants increases when transferred between hospitals. Causality is uncertain, but stress levels are elevated during ambulance journeys, potentially due to excessive levels of noise and vibration. It has been proposed that reducing these levels would reduce the risk, with one prospective method being comfort-optimised navigation. An Android app was developed that logs noise level, Inertial Measurement Unit (IMU) and location data during journeys, sampling at the fastest rates the hardware and firmware allow. The smartphone used during development was found to sample noise levels accurate to 0.3 dB up to 80 dB(A) and accelerations accurate to 10% up to 40 Hz, although considerable jitter was present in the IMU sampling. Recorded data were shown to be repeatable for multiple passes over the same stretch of road (acceleration interquartile range (IQR): 0.14 m/s²; noise IQR: 2.8 dB). Data were influenced by both supplementary audio and the smartphone model, so an initial idea of gathering data through public engagement was deemed unsuitable. Controlled collection of data was planned, utilising the neonatal ambulances operated by CenTre Neonatal Transport (CenTre). A new smartphone model was identified that was capable of sampling accelerations at a sufficient rate to comply with the "Evaluation of human exposure to whole-body vibration" standard, ISO 2631. This model also had greater processing power than the previous model used during initial testing, resulting in reduced jitter, and was found to provide more accurate accelerations (within 5% up to 55 Hz). Logging of periods before and after each journey was added along with meta-data describing each journey. Journeys performed by CenTre were recorded over the course of 12 months. Recorded variables were supplemented by calculation of ISO-weighted vibration parameters. The final dataset comprises 1,487 journeys over 81,901 km and 1,318 hours.
Strong similarities between meta-data and officially reported transport data suggested there was no bias in the journeys that the staff recorded. Roads driven between Nottingham City Hospital (NCH) and Leicester Royal Infirmary (LRI) were chosen as a case study. Data from 588 journeys contributed towards the analysis. A range of metrics, derived from previous studies and adult standards, were used to assess the roads of the NCH to LRI network. Both speed and road classification were found to influence vibration and noise level; however, their influences could not be separated because the two parameters are inherently linked. All routes involved either use of motorway or a concrete A-road, with the latter producing worse vibration. Although individual road sections varied, differences were reduced between the routes. Assessments were also performed of the metrics at each of the 42 hospitals (36 departing; 38 arriving) present in the data. Results were similar between hospitals, but differed between loading and unloading phases. High magnitude shocks were more abundant during the loading phases, whereas low impact vibrations were more frequent during unloading. Both phases registered greater shocks than those found during journeys. In summary, this work provides a low-cost method of obtaining large amounts of data describing the ambulance environment without requiring any technical knowledge to operate. The theory that the physical environment could be altered through routing has also been confirmed. The data collected during this work could be utilised in the future to aid determination of neonatal responses and subsequently establish optimal routes.

    Proceedings of the 1st European conference on disability, virtual reality and associated technologies (ECDVRAT 1996)

    The proceedings of the conferenc

    METROPOLITAN ENCHANTMENT AND DISENCHANTMENT. METROPOLITAN ANTHROPOLOGY FOR THE CONTEMPORARY LIVING MAP CONSTRUCTION

    We can no longer interpret the contemporary metropolis as we did in the last century. The thought of civil economy regarding the contemporary Metropolis conflicts more or less radically with the merely acquisitive dimension of the behaviour of its citizens. What is needed is therefore a new capacity for imagining the economic-productive future of the city: hybrid social enterprises, economically sustainable, structured and capable of using technologies, could be a solution for producing value and distributing it fairly and inclusively. Metropolitan Urbanity is another issue to establish. Metropolis needs new spaces where inclusion can occur, and where a repository of the imagery can be recreated. What is the ontology behind the technique of metropolitan planning and management, its vision and its symbols? Competitiveness, speed, and meritocracy are political words, not technical ones. Metropolitan Urbanity is the characteristic of a polis that expresses itself in its public places. Today, however, public places are private ones that are destined for public use. The Common Good has always had a space of representation in the city, which was the public space. Today, the Green-Grey Infrastructure is the metropolitan city's monument that communicates a value for future generations and must therefore be recognised and imagined; it is the production of the metropolitan symbolic imagery, the new magic of the city.