1,820 research outputs found

    Machine learning methods for sign language recognition: a critical review and analysis.

    Sign language is an essential tool to bridge the communication gap between hearing and hearing-impaired people. However, the diversity of over 7000 present-day sign languages, with variability in motion position, hand shape, and position of body parts, makes automatic sign language recognition (ASLR) a complex task. In order to overcome such complexity, researchers are investigating better ways of developing ASLR systems to seek intelligent solutions and have demonstrated remarkable success. This paper aims to analyse the research published on intelligent systems in sign language recognition over the past two decades. A total of 649 publications related to decision support and intelligent systems on sign language recognition (SLR) are extracted from the Scopus database and analysed. The extracted publications are analysed using bibliometric VOSViewer software to (1) obtain the publications' temporal and regional distributions, and (2) create the cooperation networks between affiliations and authors and identify productive institutions in this context. Moreover, reviews of techniques for vision-based sign language recognition are presented. Various feature extraction and classification techniques used in SLR to achieve good results are discussed. The literature review presented in this paper shows the importance of incorporating intelligent solutions into sign language recognition systems and reveals that perfect intelligent systems for sign language recognition are still an open problem. Overall, it is expected that this study will facilitate knowledge accumulation and creation of intelligent-based SLR and provide readers, researchers, and practitioners with a roadmap to guide future direction.

    Ethnicity : UK colorectal cancer screening pilot : final report

    27. In summary, the overall evaluation of the UK Pilot has demonstrated that key parameters of test and programme performance observed in randomised studies of FOBt screening can be repeated in population-based pilot programmes. However, our study provides strong evidence of very low CRC screening uptake for ethnic groups in the Pilot area. This is coupled with a very low uptake of colonoscopy for individuals from ethnic groups with a positive FOBt result. 28. It has long been acknowledged that a diverse population may require diverse responses. Following the implementation of the Race Relations Amendment Act 2000, there has been a statutory duty laid upon all NHS agencies to ‘have due regard to the need to eliminate unlawful discrimination’, and to make explicit consideration of the implications for racial equality of every action or policy. 29. Because the observed overall outcomes in the UK Pilot generally compare favourably with the results of previous randomised trials of FOBt screening, the main Evaluation Group has concluded that benefits observed in the trials should be repeatable in a national roll-out. 30. However, our study indicates that any national colorectal cancer screening programme would need to consider very carefully the implications of ethnicity for roll-out, and develop a strategic plan on how best to accommodate this at both a national and local level. Based on our findings, consideration will clearly need to be given to improved access and screening service provision for ethnic minorities. 31. In order to ensure adequate CRC screening provision for a diverse UK population, and to address the explicit implications for racial equality highlighted by our findings, interventions now urgently need to be evaluated to improve access for ethnic minorities. This work should be undertaken as part of the second round of CRC screening currently underway in the English Pilot.

    A review of temporal aspects of hand gesture analysis applied to discourse analysis and natural conversation

    Lately, there has been an increasing interest in hand gesture analysis systems. Recent works have employed pattern recognition techniques and have focused on the development of systems with more natural user interfaces. These systems may use gestures to control interfaces or recognize sign language gestures, which can provide systems with multimodal interaction; or consist in multimodal tools to help psycholinguists to understand new aspects of discourse analysis and to automate laborious tasks. Gestures are characterized by several aspects, mainly by movements and sequence of postures. Since data referring to movements or sequences carry temporal information, this paper presents a literature review about temporal aspects of hand gesture analysis, focusing on applications related to natural conversation and psycholinguistic analysis, using Systematic Literature Review methodology. In our results, we organized works according to type of analysis, methods, highlighting the use of Machine Learning techniques, and applications. FAPESP 2011/04608-

    Reference and 'référence dangereuse' to persons in Kilivila: An overview and a case study

    Based on the conversation analysts’ insights into the various forms of third person reference in English, this paper first presents the inventory of forms that Kilivila, the Austronesian language of the Trobriand Islanders of Papua New Guinea, offers its speakers for making such references. To illustrate such references to third persons in talk-in-interaction in Kilivila, a case study on gossiping is presented in the second part of the paper. This case study shows that ambiguous anaphoric references to two first mentioned third persons turn out to not only exceed and even violate the frame of a clearly defined situational-intentional variety of Kilivila that is constituted by the genre “gossip”, but also that these references are extremely dangerous for speakers in the Trobriand Islanders’ society. I illustrate how this culturally dangerous situation escalates and how other participants of the group of gossiping men try to “repair” this violation of the frame of a culturally defined and metalinguistically labelled “way of speaking”. The paper ends with some general remarks on how the understanding of forms of person reference in a language is dependent on the culture-specific context in which they are produced.

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines are typically dependent on the images captured in real-world environments. This means that images might be affected by different sources of perturbations (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before being processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. It turned out that the proposed CORF-augmented pipeline achieved comparable results on noise-free images to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.

    Do Personality and Culture Influence Perceived Video Quality and Enjoyment?

    The interplay between system, context and human factors is important in perception of multimedia quality. However, studies on human factors are very limited in comparison to those for system and context factors. This article presents an attempt to explore the influence of personality and cultural traits on perception of multimedia quality. As a first step, a database consisting of 144 video sequences from 12 short movie excerpts has been assembled and rated by 114 participants from a cross-cultural population, thereby providing a useful ground-truth for this (as well as future) study. As a second step, three statistical models are compared: (i) a baseline model to only consider system factors; (ii) an extended model to include personality and culture; and (iii) an optimistic model in which each participant is modeled. As a third step, predictive models based on content, affect, system, and human factors are trained to generalize the statistical findings. As shown by statistical analysis, personality and cultural traits represent 9.3% of the variance attributable to human factors and human factors overall predict an equal or higher proportion of variance compared to system factors. Moreover, the quality-enjoyment correlation varies across the excerpts. Predictive models trained by including human factors demonstrate about 3% and 9% improvement over models trained solely based on system factors for predicting perceived quality and enjoyment. As evidenced by this, human factors indeed are important in perceptual multimedia quality, but the results suggest that further investigation of moderation effects and of a broader range of human factors is necessary.

    Exploring the influence of suprasegmental features of speech on rater judgements of intelligibility

    A thesis submitted to the University of Bedfordshire in partial fulfilment of the requirements for the degree of Doctor of Philosophy. The importance of suprasegmental features of speech to pronunciation proficiency is well known, yet limited research has been undertaken to identify how raters attend to suprasegmental features in the English-language speaking test encounter. Currently, such features appear to be underrepresented in language learning frameworks and are not always satisfactorily incorporated into the analytical rating scales that are used by major language testing organisations. This thesis explores the influence of lexical stress, rhythm and intonation on rater decision making in order to provide insight into their proper place in rating scales and frameworks. Data were collected from 30 raters, half of whom were experienced professional raters and half of whom lacked rater training and a background in language learning or teaching. The raters were initially asked to score 12 test taker performances using a 9-point intelligibility scale. The performances were taken from the long turn of Cambridge English Main Suite exams and were selected on the basis of the inclusion of a range of notable suprasegmental features. Following scoring, the raters took part in a stimulated recall procedure to report the features that influenced their decisions. The resulting scores were quantitatively analysed using many-facet Rasch measurement analysis. Transcriptions of the verbal reports were analysed using qualitative methods. Finally, an integrated analysis of the quantitative and qualitative data was undertaken to develop a series of suprasegmental rating scale descriptors. The results showed that experienced raters do appear to attend to specific suprasegmental features in a reliable way, and that their decisions have a great deal in common with the way non-experienced raters regard such features. This indicates that stress, rhythm, and intonation may be somewhat underrepresented on current speaking proficiency scales and frameworks. The study concludes with the presentation of a series of suprasegmental rating scale descriptors.

    NON-LINEAR AND SPARSE REPRESENTATIONS FOR MULTI-MODAL RECOGNITION

    Get PDF
    In the first part of this dissertation, we address the problem of representing 2D and 3D shapes. In particular, we introduce a novel implicit shape representation based on Support Vector Machine (SVM) theory. Each shape is represented by an analytic decision function obtained by training an SVM, with a Radial Basis Function (RBF) kernel, so that the interior shape points are given higher values. This gives the support vector shape (SVS) several advantages. First, the representation uses a sparse subset of feature points determined by the support vectors, which significantly improves the discriminative power against noise, fragmentation and other artifacts that often come with the data. Second, the use of the RBF kernel provides scale, rotation, and translation invariant features, and allows a shape to be represented accurately regardless of its complexity. Finally, the decision function can be used to select reliable feature points. These features are described using gradients computed from highly consistent decision functions instead of conventional edges. Our experiments on 2D and 3D shapes demonstrate promising results. The availability of inexpensive 3D sensors like Kinect necessitates the design of new representations for this type of data. We present a 3D feature descriptor that represents local topologies within a set of folded concentric rings by distances from local points to a projection plane. This feature, called Concentric Ring Signature (CORS), possesses similar computational advantages to point signatures yet provides more accurate matches. CORS produces compact and discriminative descriptors, which makes it more robust to noise and occlusions. It is also well known to computer vision researchers that there is no universal representation that is optimal for all types of data or tasks. Sparsity has proved to be a good criterion for working with natural images. 
This motivates us to develop efficient sparse and non-linear learning techniques for automatically extracting useful information from visual data. Specifically, we present dictionary learning methods for sparse and redundant representations in a high-dimensional feature space. Using the kernel method, we describe how the well-known dictionary learning approaches such as the method of optimal directions and KSVD can be made non-linear. We analyse their kernel constructions and demonstrate their effectiveness through several experiments on classification problems. It is shown that non-linear dictionary learning approaches can provide significantly better discrimination compared to their linear counterparts and kernel PCA, especially when the data is corrupted by different types of degradations. Visual descriptors are often high dimensional. This results in high computational complexity for sparse learning algorithms. Motivated by this observation, we introduce a novel framework, called sparse embedding (SE), for simultaneous dimensionality reduction and dictionary learning. We formulate an optimization problem for learning a transformation from the original signal domain to a lower-dimensional one in a way that preserves the sparse structure of data. We propose an efficient optimization algorithm and present its non-linear extension based on the kernel methods. One of the key features of our method is that it is computationally efficient as the learning is done in the lower-dimensional space and it discards the irrelevant part of the signal that derails the dictionary learning process. Various experiments show that our method is able to capture the meaningful structure of data and can perform significantly better than many competitive algorithms on signal recovery and object classification tasks. 
In many practical applications, we are often confronted with the situation where the data that we use to train our models differ from those presented during testing. In the final part of this dissertation, we present a novel framework for domain adaptation using a sparse and hierarchical network (DASH-N), which makes use of the old data to improve the performance of a system operating on a new domain. Our network jointly learns a hierarchy of features together with transformations that rectify the mismatch between different domains. The building block of DASH-N is the latent sparse representation. It employs a dimensionality reduction step that can prevent the data dimension from increasing too fast as one traverses deeper into the hierarchy. Experimental results show that our method consistently outperforms the current state-of-the-art by a significant margin. Moreover, we found that a multi-layer DASH-N has an edge over the single-layer DASH-N.