243 research outputs found
Energy-based temporal neural networks for imputing missing values
Imputing missing values in high-dimensional time series is a difficult problem. There have been some approaches to the problem [11,8] where neural architectures were trained as probabilistic models of the data. However, we argue that this approach is not optimal. We propose to view temporal neural networks with latent variables as energy-based models and train them for missing value recovery directly. In this paper we introduce two energy-based models. The first model is based on a one-dimensional convolution and the second model utilizes a recurrent neural network. We demonstrate how ideas from the energy-based learning framework can be used to train these models to recover missing values. The models are evaluated on a motion capture dataset.
Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition
Automatic speech recognition can potentially benefit from the lip motion
patterns, complementing acoustic speech to improve the overall recognition
performance, particularly in noise. In this paper we propose an audio-visual
fusion strategy that goes beyond simple feature concatenation and learns to
automatically align the two modalities, leading to enhanced representations
which increase the recognition accuracy in both clean and noisy conditions. We
test our strategy on the TCD-TIMIT and LRS2 datasets, designed for large
vocabulary continuous speech recognition, applying three types of noise at
different power ratios. We also exploit state-of-the-art Sequence-to-Sequence
architectures, showing that our method can be easily integrated. Results show
relative improvements from 7% up to 30% on TCD-TIMIT over the acoustic modality
alone, depending on the acoustic noise level. We anticipate that the fusion
strategy can easily generalise to many other multimodal tasks which involve
correlated modalities. Code available online on GitHub:
https://github.com/georgesterpu/Sigmedia-AVSR
Comment: In ICMI'18, October 16-20, 2018, Boulder, CO, USA. Equation (2)
corrected on this version.
The relationship between mandibular advancement, tongue movement, and treatment outcome in obstructive sleep apnea
Study Objectives: To characterize how mandibular advancement enlarges the upper airway via posterior tongue advancement in people with obstructive sleep apnea (OSA) and whether this is associated with mandibular advancement splint (MAS) treatment outcome. Methods: One hundred and one untreated people with OSA underwent a 3T magnetic resonance imaging (MRI) scan. Dynamic mid-sagittal posterior tongue and mandible movements during passive jaw advancement were measured with tagged MRI. Upper airway cross-sectional areas were measured with the mandible in a neutral position and advanced to 70% of maximum advancement. Treatment outcome was determined after a minimum of 9 weeks of therapy. Results: Seventy-one participants completed the study: 33 were responders (AHI50% AHI reduction), 11 were partial responders (>50% AHI reduction but AHI>10 events/hr), and 27 nonresponders (AHI reduction 4 mm). In comparison, a model using only baseline AHI correctly classified 50.0% of patients (5-fold cross-validated 52.5%, n = 40). Conclusions: Tongue advancement and upper airway enlargement with mandibular advancement in conjunction with baseline AHI improve treatment response categorization to a satisfactory level (69.2%, 5-fold cross-validated 62.5%).
Enhancing the operational stability of unencapsulated perovskite solar cells through Cu-Ag bilayer electrode incorporation
We identify a facile strategy that significantly reduces electrode corrosion and device degradation in unencapsulated perovskite solar cells (PSCs) operating in ambient air. By employing Cu-Ag bilayer top electrodes in PSCs, we show enhanced operational lifetime compared with devices prepared from single-metal (Al, Ag and Cu) analogues. Time-of-flight secondary ion mass spectrometry depth profiles indicate that the insertion of a thin layer of Cu (10 nm) below the Ag (100 nm) electrode significantly reduces diffusion of species originating in the perovskite active layer into the electron transport layer and electrode. X-ray diffraction (XRD) analysis reveals the mutually beneficial relationship between the bilayer metals, whereby the thermally evaporated Ag inhibits Cu oxidation and the Cu prevents interfacial reactions between the perovskite and Ag. The results here not only demonstrate a simple approach to preventing electrode and device degradation, thereby enhancing lifetime and stability, but also give an insight into ageing-related ion migration and structural reorganisation.
Checklist of the Odonata (Insecta) of Sundaland and Wallacea (Malaysia, Singapore, Brunei, Indonesia and Timor Leste)
A checklist, based on a database containing published data, of the Odonata (dragonflies and damselflies) occurring in Sundaland and Wallacea is presented. The presence of (sub)species is indicated for eight main regions (Singapore & Peninsular Malaysia, South China Sea (islands in the South China Sea that are not sensibly treated as satellites of larger landmasses), Borneo, Sumatra, Java & Bali, Lesser Sunda, Sulawesi, Moluccas), 22 subregions and 80 smaller islands and island groups. In total, 743 full species are recorded from the entire area, with 549 species known from Sundaland and 270 from Wallacea. Of these, 482 are not found outside Sundaland and Wallacea, 385 (ca. 52% of the fauna) of which are single-region endemics; the majority of these are actually single-island endemics. Notes are provided on taxonomic problems or indicating problematic distribution records. Prodasineura lansbergei is considered to be a nomen nudum (stat. nov.). For each of the eight main regions the history of the study of odonates is briefly discussed, information is provided on the coverage of the available data and the faunal composition is described. An overview is given of genera for which no larvae have been described. A brief comparison is made between the faunas of Sundaland and Wallacea, showing that they share only 10% of the species between them (76 of 743).
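The species counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that every one of the 743 species occurs in at least one of the two regions, so that the shared count follows by inclusion-exclusion; the variable names are illustrative and do not come from the checklist.

```python
# Counts quoted in the abstract.
total_species = 743      # full species recorded from the entire area
sundaland_species = 549  # species known from Sundaland
wallacea_species = 270   # species known from Wallacea
single_region_endemics = 385

# Assumption: each species occurs in Sundaland, Wallacea, or both,
# so |A and B| = |A| + |B| - |A or B| (inclusion-exclusion).
shared_species = sundaland_species + wallacea_species - total_species

shared_percent = 100 * shared_species / total_species
endemic_percent = 100 * single_region_endemics / total_species

print(shared_species)
print(round(shared_percent, 1))
print(round(endemic_percent, 1))
```

Under that assumption the derived shared count agrees with the "76 of 743" given in the abstract, and the two percentages round to the quoted 10% and ca. 52%.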
EEG ERP preregistration template
This preregistration template guides researchers who wish to preregister their EEG projects, more specifically studies investigating event-related potentials (ERPs) in the sensor space.
A Domain Adaptation Approach to Improve Speaker Turn Embedding Using Face Representation
This paper proposes a novel approach to improve speaker modeling using knowledge transferred from face representation. In particular, we are interested in learning a discriminative metric which allows speaker turns to be compared directly, which is beneficial for tasks such as diarization and dialogue analysis. Our method improves the embedding space of speaker turns by applying maximum mean discrepancy loss to minimize the disparity between the distributions of facial and acoustic embedded features. This approach aims to discover the shared underlying structure of the two embedded spaces, thus enabling the transfer of knowledge from the richer face representation to the counterpart in speech. Experiments are conducted on broadcast TV news datasets, REPERE and ETAPE, to demonstrate the validity of our method. Quantitative results in verification and clustering tasks show promising improvement, especially in cases where speaker turns are short or the training data size is limited.
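As a rough illustration of the maximum mean discrepancy (MMD) loss mentioned in this abstract, the following minimal sketch computes the standard biased estimate of squared MMD between two sets of embeddings. The RBF kernel, the bandwidth heuristic, and the synthetic `face`/`speech` arrays are assumptions made for the example; the abstract does not specify the kernel or training setup the authors used.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Pairwise RBF kernel exp(-gamma * ||a_i - b_j||^2)."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def mmd2(x, y, gamma=None):
    """Biased estimate of squared MMD between samples x and y."""
    if gamma is None:
        # Assumed bandwidth heuristic for this sketch: 1 / (2 * dimension).
        gamma = 1.0 / (2.0 * x.shape[1])
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Synthetic stand-ins for facial and acoustic embeddings (hypothetical data).
rng = np.random.default_rng(0)
face = rng.normal(0.0, 1.0, size=(200, 8))
speech_near = rng.normal(0.0, 1.0, size=(200, 8))  # same distribution as face
speech_far = rng.normal(1.5, 1.0, size=(200, 8))   # shifted distribution

mmd_same = mmd2(face, face)
mmd_near = mmd2(face, speech_near)
mmd_far = mmd2(face, speech_far)
```

The value is zero when both sample sets are identical and grows as the two distributions move apart, which is why minimizing it can pull one embedding space toward the other.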
Learning Multimodal Temporal Representation for Dubbing Detection in Broadcast Media
Person discovery in the absence of prior identity knowledge requires accurate association of visual and auditory cues. In broadcast data, multimodal analysis faces additional challenges due to narrated voices over muted scenes or dubbing in different languages. To address these challenges, we define and analyze the problem of dubbing detection in broadcast data, which has not been explored before. We propose a method to represent the temporal relationship between the auditory and visual streams. This method consists of canonical correlation analysis to learn a joint multimodal space, and long short-term memory (LSTM) networks to model cross-modality temporal dependencies. Our contributions also include the introduction of a newly acquired dataset of face-speech segments from TV data, which we have made publicly available. The proposed method achieves promising performance on this real-world dataset as compared to several baselines.
Multinational characterization of neurological phenotypes in patients hospitalized with COVID-19
Neurological complications worsen outcomes in COVID-19. To define the prevalence of neurological conditions among hospitalized patients with a positive SARS-CoV-2 reverse transcription polymerase chain reaction test in geographically diverse multinational populations during the early pandemic, we used electronic health records (EHR) from 338 participating hospitals across 6 countries and 3 continents (January–September 2020) for a cross-sectional analysis. We assessed the frequency of International Classification of Diseases codes for neurological conditions by country, healthcare system, time before and after admission for COVID-19, and COVID-19 severity. Among 35,177 hospitalized patients with SARS-CoV-2 infection, there was an increase in the proportion with disorders of consciousness (5.8%, 95% confidence interval [CI] 3.7–7.8%, pFDR < 0.001) and unspecified disorders of the brain (8.1%, 5.7–10.5%, pFDR < 0.001) when compared to the pre-admission proportion. During hospitalization, the relative risks of disorders of consciousness (22%, 19–25%), cerebrovascular diseases (24%, 13–35%), nontraumatic intracranial hemorrhage (34%, 20–50%), encephalitis and/or myelitis (37%, 17–60%) and myopathy (72%, 67–77%) were higher for patients with severe COVID-19 when compared to those who never experienced severe COVID-19. Leveraging a multinational network to capture standardized EHR data, we highlighted the increased prevalence of central and peripheral neurological phenotypes in patients hospitalized with COVID-19, particularly among those with severe disease.