
    Deep Multimodal Learning for Audio-Visual Speech Recognition

    In this paper, we present methods in deep multimodal learning for fusing speech and visual modalities for Audio-Visual Automatic Speech Recognition (AV-ASR). First, we study an approach where uni-modal deep networks are trained separately and their final hidden layers are fused to obtain a joint feature space in which another deep network is built. While the audio network alone achieves a phone error rate (PER) of 41% under clean conditions on the IBM large vocabulary audio-visual studio dataset, this fusion model achieves a PER of 35.83%, demonstrating the tremendous value of the visual channel in phone classification even in audio with a high signal-to-noise ratio. Second, we present a new deep network architecture that uses a bilinear softmax layer to account for class-specific correlations between modalities. We show that combining the posteriors from the bilinear networks with those from the fused model mentioned above results in a further significant phone error rate reduction, yielding a final PER of 34.03%. Comment: ICASSP 201
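To put the three PER figures quoted above on a common footing, the relative error reductions can be worked out directly. The following is only an arithmetic sketch over the numbers in the abstract; the helper name `relative_reduction` is ours and does not come from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate, as a percentage of the baseline."""
    return 100.0 * (baseline - improved) / baseline


# PER values (in percent) as quoted in the abstract.
per_audio_only = 41.0   # audio network alone, clean conditions
per_fused = 35.83       # fused audio-visual model
per_final = 34.03       # fused model combined with bilinear networks

print(f"fusion vs. audio only: {relative_reduction(per_audio_only, per_fused):.2f}%")
print(f"final vs. fusion:      {relative_reduction(per_fused, per_final):.2f}%")
print(f"final vs. audio only:  {relative_reduction(per_audio_only, per_final):.2f}%")
```

On these figures, fusion removes roughly 12.6% of the audio-only errors, the bilinear combination a further 5.0% or so, for about 17% overall.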

    Cluster analysis of multiplex ligation-dependent probe amplification data in choroidal melanoma.

    Purpose: To determine underlying correlations in multiplex ligation-dependent probe amplification (MLPA) data and their significance regarding survival following treatment of choroidal melanoma (CM). Methods: MLPA data were available for 31 loci across four chromosomes (1p, 3, 6, and 8) in tumor material obtained from 602 patients with CM treated at the Liverpool Ocular Oncology Center (LOOC) between 1993 and 2012. Data representing chromosomes 3 and 8q were analyzed in depth since their association with CM patient survival is well-known. Unsupervised k-means cluster analysis was performed to detect latent structure in the data set. Principal component analysis (PCA) was also performed to determine the intrinsic dimensionality of the data. Survival analyses of the identified clusters were performed using Kaplan-Meier (KM) and log-rank statistical tests. Correlation with largest basal tumor diameter (LTD) was investigated. Results: Chromosome 3: A two-cluster (bimodal) solution was found in chromosome 3, characterized by centroids at unilaterally normal probe values and unilateral deletion. There was a large, significant difference in the survival characteristics of the two clusters (log-rank, p<0.001; 5-year survival: 80% versus 40%). Both clusters had a broad distribution in LTD, although larger tumors were characteristically in the poorer outcome group (Mann-Whitney, p<0.001). Threshold values of 0.85 for deletion and 1.15 for gain optimized the classification of the clusters. PCA showed that the first principal component (PC1) contained more than 80% of the data set variance and all of the bimodality, with uniform coefficients (0.28±0.03). Chromosome 8q: No clusters were found in chromosome 8q. Using a conventional threshold-based definition of 8q gain, and in conjunction with the chromosome 3 clusters, three prognostic groups were identified: chromosomes 3 and 8q both normal, either chromosome 3 or 8q abnormal, and both chromosomes 3 and 8q abnormal. KM analysis showed 5-year survival figures of approximately 97%, 80%, and 30% for these prognostic groups, respectively (log-rank, p<0.001). All MLPA probes within both chromosomes were significantly correlated with each other (Spearman, p<0.001). Conclusions: Within chromosome 3, the strong correlation between the MLPA variables and the uniform coefficients from the PCA indicates a lack of evidence for a signature gene that might account for the bimodality we observed. We hypothesize that the two clusters we found correspond to binary underlying states of complete monosomy or disomy 3 and that these states are sampled by the complete ensemble of probes. Consequently, we would expect a similar pattern to emerge in higher-resolution MLPA data sets. LTD may be a significant confounding factor. Considering chromosome 8q, we found that chromosome 3 cluster membership and 8q gain as traditionally defined have an indistinguishable impact on patient outcome.
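The thresholds and the three prognostic groups described in this abstract can be illustrated with a small sketch. Only the two threshold values (0.85 and 1.15) and the group definitions come from the abstract. Everything else is an assumption made for illustration: the function names, the strict inequalities at the thresholds, and the use of a per-chromosome mean of probe values as a stand-in for the k-means cluster membership the authors actually used for chromosome 3.

```python
from statistics import mean

DELETION_THRESHOLD = 0.85  # value quoted in the abstract
GAIN_THRESHOLD = 1.15      # value quoted in the abstract


def classify_value(value: float) -> str:
    """Label a probe value; strict inequalities are an assumption."""
    if value < DELETION_THRESHOLD:
        return "deletion"
    if value > GAIN_THRESHOLD:
        return "gain"
    return "normal"


def chromosome_status(probe_values: list[float]) -> str:
    """Hypothetical summary: classify the mean of all probes on a chromosome.

    This replaces the cluster membership used in the study, purely
    to keep the example short.
    """
    return classify_value(mean(probe_values))


def prognostic_group(chr3_abnormal: bool, chr8q_abnormal: bool) -> str:
    """Map the two abnormality flags onto the three groups in the abstract."""
    abnormal_count = int(chr3_abnormal) + int(chr8q_abnormal)
    return ("both normal", "one abnormal", "both abnormal")[abnormal_count]


# Made-up probe values for one tumor, not data from the study.
chr3_probes = [0.62, 0.70, 0.66]
chr8q_probes = [1.30, 1.42, 1.25]

chr3_abnormal = chromosome_status(chr3_probes) == "deletion"
chr8q_abnormal = chromosome_status(chr8q_probes) == "gain"
print(prognostic_group(chr3_abnormal, chr8q_abnormal))
```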

    Review: Do the Different Sensory Areas within the Cat Anterior Ectosylvian Sulcal Cortex Collectively Represent a Network Multisensory Hub?

    Current theory holds that the numerous functional areas of the cerebral cortex are organized and function as a network. Using connectional databases and computational approaches, the cerebral network has been demonstrated to exhibit a hierarchical structure composed of areas, clusters and, ultimately, hubs. Hubs are highly connected, higher-order regions that also facilitate communication between different sensory modalities. One region computationally identified as a network hub is the visual area of the Anterior Ectosylvian Sulcal cortex (AESc) of the cat. The Anterior Ectosylvian Visual area (AEV) is but one component of the AESc, which also includes auditory (Field of the Anterior Ectosylvian Sulcus - FAES) and somatosensory (Fourth somatosensory representation - SIV) areas. To better understand the nature of cortical network hubs, the present report reviews the biological features of the AESc. Within the AESc, each area has extensive connections with external cortical regions as well as with the other AESc areas. Each of these core representations is separated by a transition zone characterized by bimodal neurons that share sensory properties of both adjoining core areas. Finally, core and transition zones are underlain by a continuous sheet of layer 5 neurons that project to common output structures. Altogether, these shared properties suggest that the collective AESc region represents a multiple sensory/multisensory cortical network hub. Ultimately, such an interconnected, composite structure adds complexity and biological detail to the understanding of cortical network hubs and their function in cortical processing.

    Use of remotely-derived bathymetry for modelling biomass in marine environments

    The paper presents results on the influence of geometric attributes of satellite-derived raster bathymetric data, namely the General Bathymetric Charts of the Oceans, on spatial statistical modelling of marine biomass. In the initial experiment, both the resolution and projection of the raster dataset are taken into account. It was found that, independently of the equal-area projection chosen for the analysis, the calculated areas are very similar, and the differences between them are insignificant. Likewise, variation in the raster resolution did not change the computed area. Although the differences were shown to be insignificant, for the subsequent analysis we selected the cylindrical equal area projection, as it implies a rectangular spatial extent, along with the automatically derived resolution. Then, in the second experiment, we focused on demersal fish biomass data acquired from trawl samples taken near the sea floor in the western parts of ICES Sub-area VII. The aforementioned investigation into processing bathymetric data allowed us to build various statistical models that account for a relationship between biomass, sea floor topography and geographic location. We fitted a set of generalised additive models and generalised additive mixed models to combinations of trawl data of the roundnose grenadier (Coryphaenoides rupestris) and bathymetry. Using standard statistical techniques (analysis of variance, Akaike information criterion, root mean squared error, mean absolute error and cross-validation), we compared the performance of the models and found that depth and latitude may serve as statistically significant explanatory variables for biomass of roundnose grenadier in the study area. However, the results should be interpreted with caution as sampling locations may have an impact on the biomass–depth relationship.
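Three of the comparison criteria named in this abstract (root mean squared error, mean absolute error and the Akaike information criterion) are simple to compute once a model's predictions are available. The sketch below is illustrative only. The function names and the sample numbers are ours, and the AIC uses the standard least-squares form under an assumed Gaussian error model, which is not necessarily how the authors computed it for their additive models.

```python
import math


def rmse(observed: list[float], predicted: list[float]) -> float:
    """Root mean squared error between observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed: list[float], predicted: list[float]) -> float:
    """Mean absolute error between observations and predictions."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


def gaussian_aic(observed: list[float], predicted: list[float], n_params: int) -> float:
    """AIC up to an additive constant, assuming Gaussian errors:
    n * ln(RSS / n) + 2 * k. Requires a non-zero residual sum of squares."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(rss / n) + 2 * n_params


# Made-up values standing in for observed and modelled biomass.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]

print(rmse(observed, predicted))
print(mae(observed, predicted))
print(gaussian_aic(observed, predicted, n_params=1))
```

Lower values are better on all three criteria. Only the AIC penalises the number of fitted parameters, so it is the one that guards against preferring a model merely because it is more flexible.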