    Multimodal perception of histological images for persons blind or visually impaired

    Currently there is no suitable substitute technology that enables blind or visually impaired (BVI) people to interpret, in real time, the visual scientific data commonly generated during lab experimentation, such as when performing light microscopy or spectrometry or observing chemical reactions. This reliance upon visual interpretation of scientific data impedes students and scientists who are BVI from advancing in careers in medicine, biology, chemistry, and other scientific fields. To address this challenge, a real-time multimodal image perception system is developed that transforms standard laboratory blood smear images so that persons with BVI can perceive them, employing a combination of auditory, haptic, and vibrotactile feedback. This sensory feedback conveys visual information through alternative perceptual channels, thus creating a palette of multimodal, sensorial information. A Bayesian network is developed to characterize images through two groups of features of interest: primary and peripheral features. Causal relation links were established between these two groups of features. Then, a method was conceived for optimal matching between primary features and sensory modalities. Experimental results confirmed that this real-time approach achieves higher accuracy in recognizing and analyzing objects within images compared to tactile images.

    Viseme-based Lip-Reading using Deep Learning

    Research in Automated Lip Reading is an incredibly rich discipline with many facets that have been the subject of investigation, including audio-visual data, feature extraction, classification networks and classification schemas. The most advanced and up-to-date lip-reading systems can predict entire sentences with thousands of different words, and the majority of them use ASCII characters as the classification schema. The classification performance of such systems, however, has been insufficient, and the need to cover an ever expanding range of vocabulary using as few classes as possible is a challenge. The work in this thesis contributes to the area concerning classification schemas by proposing an automated lip reading model that predicts sentences using visemes as a classification schema. This is an alternative schema to using ASCII characters, which is the conventional class system used to predict sentences. This thesis provides a review of the current trends in deep learning-based automated lip reading and analyses a gap in the research endeavours of automated lip-reading by contributing towards work done in the region of classification schema. A whole new line of research is opened up whereby an alternative way to do lip-reading is explored and, in doing so, lip-reading performance results for predicting sentences from a benchmark dataset are attained which improve upon the current state-of-the-art. In this thesis, a neural network-based lip reading system is proposed. The system is lexicon-free and uses purely visual cues. With only a limited number of visemes as classes to recognise, the system is designed to lip read sentences covering a wide range of vocabulary and to recognise words that may not be included in system training. The lip-reading system predicts sentences as a two-stage procedure with visemes being recognised as the first stage and words being classified as the second stage. 
This means that the second stage has to overcome both the one-to-many mapping problem posed in lip-reading, where one set of visemes can map to several words, and the problem of visemes being confused or misclassified to begin with. To develop the proposed lip-reading system, a number of tasks have been performed in this thesis. These include the classification of continuous sequences of visemes, and the proposal of viseme-to-word conversion models that are both effective in their conversion performance of predicting words and robust to the possibility of viseme confusion or misclassification. The initial system reported has been tested on the challenging BBC Lip Reading Sentences 2 (LRS2) benchmark dataset, attaining a word accuracy rate of 64.6%. Compared with the state-of-the-art works in lip reading sentences reported at the time, the system achieved significantly improved performance. The lip reading system is further improved by using a language model that has been demonstrated to be effective at discriminating between homopheme words and robust to incorrectly classified visemes. This yields improved performance in predicting spoken sentences from the LRS2 dataset, with a word accuracy rate of 79.6%. This is still better than another lip-reading system trained and evaluated on the same dataset, which attained a word accuracy rate of 77.4% and is, to the best of our knowledge, the next best observed result attained on LRS2.
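The one-to-many mapping problem mentioned in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `VISEME_OF` table is a hypothetical, heavily simplified mapping that uses letters as stand-ins for phonemes, and it is not the viseme set or the conversion model used in the thesis. It only shows why several words can collapse onto a single viseme sequence.

```python
from collections import defaultdict

# Hypothetical, simplified letter-to-viseme classes (not the thesis's mapping).
# Letters stand in for phonemes. Bilabials p, b, m look alike on the lips,
# so they share one class; likewise f/v and t/d/n.
VISEME_OF = {
    "p": "P", "b": "P", "m": "P",
    "f": "F", "v": "F",
    "t": "T", "d": "T", "n": "T",
    "a": "A",
}


def to_visemes(word):
    """Map a word to its tuple of viseme classes under the toy table."""
    return tuple(VISEME_OF[ch] for ch in word)


def group_homophemes(words):
    """Group words that share the same viseme sequence."""
    groups = defaultdict(list)
    for word in words:
        groups[to_visemes(word)].append(word)
    return dict(groups)


if __name__ == "__main__":
    vocabulary = ["pat", "bat", "mat", "pad", "fat", "vat"]
    for visemes, words in group_homophemes(vocabulary).items():
        print("-".join(visemes), "->", words)
```

Under this toy table, "pat", "bat", "mat" and "pad" all share the sequence P-A-T, so a second stage would need extra context, such as a language model, to choose among them.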

    NONLINEAR IDENTIFICATION AND CONTROL: A PRACTICAL SOLUTION AND ITS APPLICATION

    It is well known that typical welding processes such as laser welding are nonlinear, although they are mostly treated as linear systems. For the purpose of automatic control, identification of nonlinear systems, especially welding processes, is a necessary and fundamental problem. The purpose of this research is to develop a simple and practical identification and control method for welding processes. Many investigations have shown the possibility of representing physical processes by nonlinear models, such as the Hammerstein structure, consisting of a nonlinearity and linear dynamics in series with each other. Motivated by the fact that typical welding processes do not have non-zeroes, a novel two-step nonlinear Hammerstein identification method is proposed for laser welding processes. The method can be realized in both the continuous and discrete cases. To study the relation among parameters influencing laser processing, a standard diode laser processing system is built as a system prototype. Based on experimental study, SISO and 2ISO nonlinear Hammerstein model structures are developed to approximate the diode laser welding process. Specific persistent excitation signals such as PRTS (Pseudo-random-ternary-series) to Step signal are used for identification. The model takes welding speed as input and the top surface molten weld pool width as output. A vision-based sensor implemented with a pulse-controlled CCD camera is proposed and applied to acquire the images and the geometric data of the weld pool. The estimated model is then verified by comparing simulation with experimental measurement. The verification shows that the model is reasonably correct and can be used to model the nonlinear process for further study. The two-step nonlinear identification method is proved valid and applicable to traditional welding processes and similar manufacturing processes. Based on the identified model, nonlinear control algorithms are also studied. 
Algorithms including simple linearization and a backstepping-based robust adaptive control algorithm are proposed and simulated.
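The Hammerstein structure named in this abstract, a nonlinearity followed in series by linear dynamics, can be sketched as a short discrete-time simulation. This is a sketch under stated assumptions, not the identified welding model: the polynomial coefficients, the first-order linear block, and the function name `simulate_hammerstein` are all hypothetical choices made for illustration. In the abstract's setting, `u` would play the role of welding speed and `y` the weld pool width.

```python
def simulate_hammerstein(u, poly=(0.0, 1.0, 0.5), a=0.8, b=0.2):
    """Simulate a discrete-time SISO Hammerstein model.

    Assumed structure (illustrative only):
      nonlinearity:    v[k] = poly[0] + poly[1]*u[k] + poly[2]*u[k]**2 + ...
      linear dynamics: y[k] = a*y[k-1] + b*v[k-1], with y[0] = 0
    """
    if not u:
        return []
    y = [0.0]
    for k in range(1, len(u)):
        # Static nonlinearity applied to the previous input sample.
        v = sum(c * u[k - 1] ** i for i, c in enumerate(poly))
        # First-order linear block driven by the intermediate signal v.
        y.append(a * y[-1] + b * v)
    return y


if __name__ == "__main__":
    step = [2.0] * 200            # constant (step) input
    response = simulate_hammerstein(step)
    print(round(response[-1], 6))  # settles near b * f(u) / (1 - a)
```

For a constant input the output settles at `b * f(u) / (1 - a)`, which makes the static nonlinearity `f` visible in the steady-state gain. That separation of static and dynamic behaviour is one reason the structure is convenient for identification.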

    Robust Image Hashing Based Efficient Authentication for Smart Industrial Environment

    [EN] Due to the large number and high variability of editing tools, protecting multimedia content and ensuring its privacy and authenticity has become an increasingly important issue in the cyber-physical security of industrial environments, especially industrial surveillance. Approaches that authenticate images using their principal content have emerged as popular authentication techniques in industrial video surveillance applications. However, maintaining a good tradeoff between perceptual robustness and discrimination is the key research challenge in image hashing approaches. In this paper, a robust image hashing method is proposed for efficient authentication of keyframes extracted from surveillance video data. A novel feature extraction strategy is employed in the proposed image hashing approach for authentication by extracting two important features: the positions of rich and nonzero low edge blocks and the dominant discrete cosine transform (DCT) coefficients of the corresponding rich edge blocks, keeping the computational cost at a minimum. Extensive experiments conducted from different perspectives suggest that the proposed approach provides a trustworthy and secure way of transmitting multimedia data over surveillance networks. Further, the results vindicate the suitability of our proposal for real-time authentication and embedded security in smart industrial applications compared to state-of-the-art methods. This work was supported in part by the National Natural Science Foundation of China under Grant 61976120, in part by the Natural Science Foundation of Jiangsu Province under Grant BK20191445, in part by the Six Talent Peaks Project of Jiangsu Province under Grant XYDXXJS-048, and sponsored by Qing Lan Project of Jiangsu Province, China. Sajjad, M.; Ul Haq, I.; Lloret, J.; Ding, W.; Muhammad, K. (2019). Robust Image Hashing Based Efficient Authentication for Smart Industrial Environment. IEEE Transactions on Industrial Informatics. 15(12):6541-6550. 
https://doi.org/10.1109/TII.2019.2921652

    An Analysis of Perturbed Quantization Steganography in the Spatial Domain

    Steganography is a form of secret communication in which a message is hidden in a harmless cover object, concealing the actual existence of the message. Due to the potential abuse by criminals and terrorists, much research has also gone into the field of steganalysis, the art of detecting and deciphering a hidden message. As many novel steganographic hiding algorithms become publicly known, researchers exploit these methods by finding statistical irregularities between clean digital images and images containing hidden data. This creates an ongoing race between the two fields and requires constant countermeasures on the part of steganographers in order to maintain truly covert communication. This research effort extends previous work in perturbed quantization (PQ) steganography by examining its applicability to the spatial domain. Several different information-reducing transformations are implemented along with the PQ system to study their effect on the security of the system as well as their effect on the steganographic capacity of the system. Additionally, a new statistical attack is formulated for detecting ±1 embedding techniques in color images. Results from performing state-of-the-art steganalysis reveal that the system is less detectable than comparable hiding methods. Grayscale images embedded with message payloads of 0.4 bpp are detected only 9% more accurately than by random guessing, and color images embedded with payloads of 0.2 bpp are successfully detected only 6% more reliably than by random guessing.

    A group-theoretic approach to formalizing bootstrapping problems

    The bootstrapping problem consists in designing agents that learn a model of themselves and the world, and utilize it to achieve useful tasks. It is different from other learning problems as the agent starts with uninterpreted observations and commands, and with minimal prior information about the world. In this paper, we give a mathematical formalization of this aspect of the problem. We argue that the vague constraint of having "no prior information" can be recast as a precise algebraic condition on the agent: that its behavior is invariant to particular classes of nuisances on the world, which we show can be well represented by actions of groups (diffeomorphisms, permutations, linear transformations) on observations and commands. We then introduce the class of bilinear gradient dynamics sensors (BGDS) as a candidate for learning generic robotic sensorimotor cascades. We show how framing the problem as rejection of group nuisances allows a compact and modular analysis of typical preprocessing stages, such as learning the topology of the sensors. We demonstrate learning and using such models on real-world range-finder and camera data from publicly available datasets

    Image watermarking, steganography, and morphological processing

    With the fast development of computer technology, research in the fields of multimedia security, image processing, and robot vision has recently become popular. Image watermarking, steganographic systems, morphological processing and shortest path planning are important subjects among them. In this dissertation, the fundamental techniques are reviewed first, followed by the presentation of novel algorithms and theorems for these three subjects. The research on multimedia security consists of two parts, image watermarking and steganographic systems. In image watermarking, several algorithms are developed to achieve different goals as shown below. In order to embed more watermarks and to minimize distortion of watermarked images, a novel watermarking technique using combinational spatial and frequency domains is presented. In order to correct rounding errors, a novel technique based on the genetic algorithm (GA) is developed. By separating medical images into Region of Interest (ROI) and non-ROI parts, higher compression rates can be achieved where the ROI is compressed by lossless compression and the non-ROI by lossy compression. The GA-based watermarking technique can also be considered as a fundamental platform for other fragile watermarking techniques. In order to simplify the selection and integrate different watermarking techniques, a novel adjusted-purpose digital watermarking technique is developed. In order to enlarge the capacity of robust watermarking, a novel robust high-capacity watermarking technique is developed. In steganographic systems, a novel steganographic algorithm is developed by using GA to break the inspection of steganalytic systems. In morphological processing, GA-based techniques are developed to decompose arbitrary shapes of big binary structuring elements and arbitrary values of big grayscale structuring elements into small ones. The decomposition is suited for a parallel-pipelined architecture. 
The techniques can speed up the morphological processing and allow full freedom for users to design any type and any size of binary and grayscale structuring elements. In applications such as shortest path planning, a novel method is first presented to obtain the Euclidean distance transformation (EDT) in just two scans of the image. The shortest path can be extracted based on distance maps by tracking minimum values. In order to record the motion path, a new chain-code representation is developed to allow forward and backward movements. By placing the smooth turning-angle constraint, it is possible to mimic realistic motions of cars. By using dynamically rotational morphology, it is possible not only to guarantee a collision-free shortest path but also to reduce time complexity dramatically. As soon as the distance map of a destination and collision-free codes have been established off-line, shortest paths of cars from any starting location toward the destination can be promptly obtained on-line.
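The idea of extracting a shortest path from a distance map by tracking minimum values can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's method: it swaps in a breadth-first search with 4-connected unit steps in place of the two-scan Euclidean distance transformation, and it ignores turning-angle constraints and rotational morphology. The function names and the grid encoding (0 = free, 1 = obstacle) are hypothetical.

```python
from collections import deque

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connected moves


def distance_map(grid, goal):
    """Off-line step: BFS step count from the goal to every free cell.

    Unreachable cells and obstacles keep the value None.
    Assumes the goal cell itself is free.
    """
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in STEPS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and dist[nr][nc] is None):
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def extract_path(dist, start):
    """On-line step: follow the neighbour with the minimum distance value."""
    if dist[start[0]][start[1]] is None:
        return []  # start is an obstacle or cannot reach the goal
    rows, cols = len(dist), len(dist[0])
    path = [start]
    r, c = start
    while dist[r][c] > 0:
        candidates = [
            (r + dr, c + dc) for dr, dc in STEPS
            if 0 <= r + dr < rows and 0 <= c + dc < cols
            and dist[r + dr][c + dc] is not None
        ]
        r, c = min(candidates, key=lambda p: dist[p[0]][p[1]])
        path.append((r, c))
    return path


if __name__ == "__main__":
    grid = [
        [0, 0, 0],
        [1, 1, 0],
        [0, 0, 0],
    ]
    dist = distance_map(grid, goal=(2, 0))
    print(extract_path(dist, start=(0, 0)))
```

Because the distance map depends only on the destination, it can be computed once and reused for any starting cell, which mirrors the off-line and on-line split described in the abstract.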