486 research outputs found

    The robustness of animated text CAPTCHAs

    PhD Thesis. CAPTCHA is a standard security technology that uses AI techniques to tell computers and humans apart. The most widely used CAPTCHAs are text-based schemes. The robustness and usability of these CAPTCHAs rely mainly on the segmentation resistance mechanism, which provides robustness against individual character recognition attacks. However, many CAPTCHAs have been shown to have critical flaws caused by exploitable invariants in their design, leaving only a few CAPTCHA schemes resistant to attacks, including ReCAPTCHA and the Wikipedia CAPTCHA. Therefore, alternative approaches add motion to the CAPTCHA, giving character cracking algorithms another dimension to handle by animating the distorted characters and the background. These approaches are also supported by tracking resistance mechanisms that prevent attackers from identifying the main answer through frame-to-frame attacks. These technologies are used in many of the new CAPTCHA schemes, including the Yahoo CAPTCHA, CAPTCHANIM, KillBot CAPTCHAs, non-standard CAPTCHA and NuCAPTCHA. Our first question: can the animation techniques included in the new CAPTCHA schemes provide the required level of robustness against attacks? Our examination has shown that many of the CAPTCHA schemes that use animated features can be broken through tracking attacks, including schemes that use complicated tracking resistance mechanisms. The second question: can the segmentation resistance mechanism used in the latest standard text-based CAPTCHA schemes still provide the additional level of resistance against attacks that is not present in animated schemes? Our tests against the latest versions of ReCAPTCHA and the Wikipedia CAPTCHA exposed vulnerabilities to novel attack mechanisms, which achieved a high success rate against them.
The third question: how much space is available to design an animated text-based CAPTCHA scheme that could provide a good balance between security and usability? We designed a new animated text-based CAPTCHA using guidelines derived from the results of our attacks on standard and animated text-based CAPTCHAs, and we then tested its security and usability to answer this question. In this thesis, we put forward different approaches to examining the robustness of animated and standard text-based CAPTCHA schemes against segmentation and tracking attacks. Our attacks included several methodologies that required thinking skills in order to distinguish the animated text from other animated noise, including text distorted by strong tracking resistance mechanisms that display it only partially, as animated segments that look similar to the noise in other CAPTCHA schemes. These attacks also include novel attack mechanisms, as well as mechanisms that use a recognition engine supported by attacking methods that exploit the identified invariants to recognise the connected characters at once. Our attacks also yielded a guideline for animated text-based CAPTCHAs that could resist tracking and segmentation attacks, which we designed and tested in terms of security and usability, as mentioned above. Our research also contributes a toolbox for breaking CAPTCHAs, in addition to a list of robustness and usability issues in current CAPTCHA design that can be used to better understand how to design a more resistant CAPTCHA scheme.

    Information embedding and retrieval in 3D printed objects

    Deep learning and convolutional neural networks have become the main tools of computer vision. These techniques are good at using supervised learning to learn complex representations from data. In particular, under limited settings, image recognition models now perform better than the human baseline. However, computer vision science aims to build machines that can see, which requires a model to extract more valuable information from images and videos than recognition alone. Generally, it is much more challenging to apply these deep learning models to computer vision problems other than recognition. This thesis presents end-to-end deep learning architectures for a new computer vision field: watermark retrieval from 3D printed objects. As it is a new area, there is no established state of the art on challenging benchmarks. Hence, we first define the problems and introduce a traditional approach, the Local Binary Pattern method, to set our baseline for further study. Our neural networks are useful yet straightforward, and they outperform traditional approaches. What is more, these networks generalize well. However, because our research field is new, the problems we face include not only various unpredictable parameters but also limited and low-quality training data. To address this, we make two observations: (i) we do not need to learn everything from scratch, since we know a lot about the image segmentation area, and (ii) we cannot learn everything from data, so our models should be aware of which key features they should learn. This thesis explores these ideas and goes beyond them. First, we show how to use end-to-end deep learning models to learn to retrieve watermark bumps and tackle covariates from only a few training images. Secondly, we introduce ideas from synthetic image data and domain randomization to augment the training data and understand the various covariates that may affect the retrieval of real-world 3D watermark bumps.
We also show how the illumination in synthetic image data affects, and can even improve, retrieval accuracy for real-world recognition applications.
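The Local Binary Pattern baseline mentioned in this abstract can be illustrated with a minimal sketch. It assumes the basic 8-neighbour, radius-1 LBP operator on a grayscale image stored as a list of rows; the abstract does not say which LBP variant the thesis uses, and the function names `lbp_code` and `lbp_histogram` are hypothetical.

```python
def lbp_code(img, r, c):
    """Return the basic 8-neighbour LBP code of the interior pixel (r, c)."""
    center = img[r][c]
    # Neighbours are visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Histogram of LBP codes over all interior pixels (a texture descriptor)."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


# Illustrative patches (made-up values, not data from the thesis).
bump = [[10, 10, 10], [10, 50, 10], [10, 10, 10]]      # bright centre
top_lit = [[90, 90, 90], [10, 50, 10], [10, 10, 10]]   # bright top row
print(lbp_code(bump, 1, 1), lbp_code(top_lit, 1, 1))
```

A classifier would typically compare such histograms between image regions; that step is omitted here because the abstract does not describe it.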

    High Capacity Analog Channels for Smart Documents

    Widely used valuable hardcopy documents such as passports, visas, driving licenses, educational certificates and entrance passes for entertainment events are conventionally protected against counterfeiting and data tampering attacks by applying analog security technologies (e.g. KINEGRAMS®, holograms, micro-printing, UV/IR inks). However, easy access to high-quality, low-price modern desktop publishing technology has left most of these technologies ineffective, giving rise to high-quality false documents. Higher price and restricted usage are further drawbacks of analog document protection techniques. Digital watermarking and high-capacity storage media such as IC chips and optical data stripes are the modern technologies used in new machine-readable identity verification documents to ensure content integrity; however, these technologies are either expensive or do not satisfy application needs, which creates demand for more efficient document protection technologies. In this research, three different high-capacity analog channels are investigated for hidden communication, along with their applications in smart documents: the high density data stripe (HD-DataStripe), data hiding in printed halftone images (watermarking), and the superposed constant background grayscale image (CBGI). On the way to developing high-capacity analog channels, the noise encountered in the printing and scanning (PS) process is investigated with the objective of recovering digital information encoded at nearly maximum channel utilization. Based on the observed noise behaviour, corresponding countermeasures are taken in the data recovery process. HD-DataStripe is a printed binary image similar to conventional 2-D barcodes (e.g. PDF417), but it offers much higher data storage capacity and is intended for machine-readable identity verification documents.
The capacity offered by the HD-DataStripe is sufficient to store high-quality biometric characteristics rather than extracted templates, in addition to the conventional bearer-related data contained in a smart ID card. It also eliminates the need for a central database system (except for backup records) and for the other expensive storage media currently in use. In developing a novel data-reading technique for HD-DataStripe, the registration mark pattern is chosen to account for unavoidable geometrical distortions, so that it yields accurate sampling points (a necessary condition for reliable data recovery at higher data encoding rates). For the more sophisticated distortions caused by physical dot gain effects (intersymbol interference), countermeasures such as application of the sampling theorem, adaptive binarization and post-data processing are given, each of which provides only a necessary condition for reliable data recovery. Finally, by combining the various filters corresponding to these countermeasures, a novel data-reading technique for HD-DataStripe is obtained. The novel data-reading technique outperforms the existing techniques intended for data recovery from printed media. In another scenario, a small HD-DataStripe with maximum entropy is used as a copy detection pattern by utilizing the information loss encountered at nearly maximum channel capacity. When HD-DataStripe is applied to hardcopy documents (contracts, official letters etc.), unlike existing work [Zha04], it allows one-to-one content matching and does not depend on hash functions and OCR technology, constraints mainly imposed by the low data storage capacity of existing analog media.
For printed halftone images carrying hidden information, the higher capacity is mainly attributed to the data-reading technique for HD-DataStripe, which allows data recovery at higher printing resolution, a key requirement for a high-quality watermarking technique in the spatial domain. Digital halftoning and data encoding techniques are the other factors that contribute to the data hiding technique given in this research. Regarding security, the new technique allows content integrity and authenticity verification in the present scenario, in which a certain amount of errors is unavoidable, a condition that restricts the use of existing techniques designed for digital content. Finally, a superposed constant background grayscale image, obtained by the repeated application of a specially designed small binary pattern, is used as a channel for hidden communication, and it allows up to 33 pages of A4-size foreground text to be encoded in one CBGI. The higher capacity results from the data encoding symbols and the data-reading technique.

    Application and Theory of Multimedia Signal Processing Using Machine Learning or Advanced Methods

    This Special Issue is a book that collects peer-reviewed papers on research into various advanced technologies related to the applications and theory of signal processing for multimedia systems using ML or advanced methods. The multimedia topics addressed include image, video and audio signals, character recognition, and the optimization of communication channels for networks. The specific contents included in this book are data hiding, encryption, object detection, image classification, and character recognition. Academics and colleagues who are interested in these topics will find it interesting to read.

    Currency security and forensics: a survey

    By definition, the word currency refers to an agreed medium of exchange; a nation’s currency is the formal medium enforced by the elected governing entity. Throughout history, issuers have faced one common threat: counterfeiting. Despite technological advancements, eliminating counterfeit production remains a distant prospect. Scientific determination of authenticity requires a deep understanding of the raw materials and manufacturing processes involved. This survey synthesises the current literature to understand the technology and mechanics involved in currency manufacture and security, whilst identifying gaps in that literature. Ultimately, a robust currency is desired.

    Fourier-based automatic alignment for improved visual cryptography schemes

    In Visual Cryptography, several images, called "shadow images", that separately contain no information are overlapped to reveal a shared secret message. We develop a method to digitally register a printed shadow image acquired by a camera with a purely digital shadow image stored in memory. Using Fourier techniques derived from Fourier Optics concepts, the idea is to enhance and exploit the quasi-periodicity of the shadow images, which are composed of a random distribution of black and white patterns on a periodic sampling grid. The advantage is to speed up the security control or the access time to the message, in particular in the case of a small pixel size or a large number of pixels. Furthermore, the appeal of visual cryptography can be increased by embedding the initial message in two shadow images that do not have identical mathematical supports, making manual registration impractical. Experimental results demonstrate the successful operation of the method, including the possibility of directly projecting the result onto the printed shadow image.

    Computer vision in target pursuit using a UAV

    Research in target pursuit using Unmanned Aerial Vehicles (UAVs) has gained attention in recent years, primarily due to the decreasing cost of and increasing demand for small UAVs in many sectors. In computer vision, target pursuit is a complex problem, as it involves solving many sub-problems, typically concerned with the detection, tracking and following of the object of interest. At present, the majority of existing methods are developed using computer simulation under the assumption of ideal environmental factors, while the few remaining practical methods are mainly developed to track and follow simple objects with monochromatic colours and very little texture variance. Current research on this topic lacks practical vision-based approaches. The aim of this research is therefore to fill the gap by developing a real-time algorithm capable of following a person continuously given only a photo as input. As this research treats the whole procedure as an autonomous system, the drone is activated automatically upon receiving a photo of a person through Wi-Fi. This means that the whole system can be triggered by simply emailing a single photo from any device anywhere. This is done by first implementing image fetching, which automatically connects to Wi-Fi, downloads the image and decodes it. Then, human detection is performed to extract the template from the upper body of the person; the intended target is acquired using both human detection and template matching. Finally, target pursuit is achieved by tracking the template continuously while sending motion commands to the drone. In the target pursuit system, detection is mainly accomplished using a proposed human detection method that is capable of detecting, extracting and segmenting the human body figure robustly from the background without prior training. This involves detecting the face, head and shoulders separately, mainly using gradient maps.
Tracking is mainly accomplished using a proposed generic, non-learning template matching method, which combines intensity template matching with a colour histogram model and employs a three-tier system for template management. A flight controller is also developed; it supports three types of control: keyboard, mouse and text messages. Furthermore, the drone is programmed with three different modes: standby, sentry and search. To improve the detection and tracking of coloured objects, this research also proposes several colour-related methods. One of them is a colour model for colour detection which consists of three colour components: hue, purity and brightness. Hue represents the colour angle, purity represents the colourfulness and brightness represents the intensity. The model can be represented in three different geometric shapes: sphere, hemisphere and cylinder, each of which also has two variations. Experimental results have shown that the target pursuit algorithm is capable of identifying and following the target person robustly given only a photo as input. This is evidenced by the live tracking and mapping of intended targets with different clothing in both indoor and outdoor environments. Additionally, the various methods developed in this research could enhance the performance of practical vision-based applications, especially in the detection and tracking of objects.
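To make the hue, purity and brightness decomposition concrete, here is a rough sketch of colour detection by thresholding in a cylindrical colour space. It is not the thesis's colour model, whose formulas are not given in the abstract: the standard HSV conversion from Python's `colorsys` module is used as a stand-in, with saturation playing the role of purity and value the role of brightness. The function names and all thresholds are assumptions chosen for illustration.

```python
import colorsys


def hue_purity_brightness(r, g, b):
    """Map 8-bit RGB to (hue in degrees, purity, brightness) via standard HSV."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v


def matches_colour(rgb, target_hue_deg, hue_tol_deg=20.0,
                   min_purity=0.4, min_brightness=0.2):
    """True if the pixel is close to the target hue and colourful and bright enough."""
    hue, purity, brightness = hue_purity_brightness(*rgb)
    # Hue is an angle, so the difference must wrap around at 360 degrees.
    diff = abs(hue - target_hue_deg) % 360.0
    diff = min(diff, 360.0 - diff)
    return (diff <= hue_tol_deg
            and purity >= min_purity
            and brightness >= min_brightness)


# Illustrative pixels: a slightly purplish red, a mid grey, and pure blue.
for pixel in [(250, 10, 30), (128, 128, 128), (0, 0, 255)]:
    print(pixel, matches_colour(pixel, target_hue_deg=0.0))
```

Separating hue from purity and brightness lets a detector tolerate lighting changes by loosening only the brightness threshold, which is one common motivation for models of this kind.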

    Adaptive visual sampling

    PhD thesis. Various visual tasks may be analysed in the context of sampling from the visual field. In visual psychophysics, human visual sampling strategies have often been shown at a high level to be driven by various information- and resource-related factors such as the limited capacity of the human cognitive system, the quality of information gathered, its relevance in context and the associated efficiency of recovering it. At a lower level, we interpret many computer vision tasks to be rooted in similar notions of contextually relevant, dynamic sampling strategies which are geared towards the filtering of pixel samples to perform reliable object association. In the context of object tracking, the reliability of such endeavours is fundamentally rooted in the continuing relevance of object models used for such filtering, a requirement complicated by real-world conditions such as dynamic lighting that inconveniently and frequently cause their rapid obsolescence. In the context of recognition, performance can be hindered by the lack of learned context-dependent strategies that satisfactorily filter out samples that are irrelevant or blunt the potency of models used for discrimination. In this thesis we interpret the problems of visual tracking and recognition in terms of dynamic spatial and featural sampling strategies and, in this vein, present three frameworks that build on previous methods to provide a more flexible and effective approach. Firstly, we propose an adaptive spatial sampling strategy framework to maintain statistical object models for real-time robust tracking under changing lighting conditions. We employ colour features in experiments to demonstrate its effectiveness.
The framework consists of five parts: (a) Gaussian mixture models for semi-parametric modelling of the colour distributions of multicolour objects; (b) a constructive algorithm that uses cross-validation for automatically determining the number of components for a Gaussian mixture given a sample set of object colours; (c) a sampling strategy for performing fast tracking using colour models; (d) a Bayesian formulation enabling models of the object and the environment to be employed together in filtering samples by discrimination; and (e) a selectively-adaptive mechanism to enable colour models to cope with changing conditions and permit more robust tracking. Secondly, we extend the concept to an adaptive spatial and featural sampling strategy to deal with very difficult conditions such as small target objects in cluttered environments undergoing severe lighting fluctuations and extreme occlusions. This builds on previous work on dynamic feature selection during tracking by reducing redundancy in the features selected at each stage as well as more naturally balancing short-term and long-term evidence, the latter to facilitate model rigidity under sharp, temporary changes such as occlusion whilst permitting model flexibility under slower, long-term changes such as varying lighting conditions. This framework consists of two parts: (a) Attribute-based Feature Ranking (AFR), which combines two attribute measures: discriminability and independence from other features; and (b) Multiple Selectively-adaptive Feature Models (MSFM), which involves maintaining a dynamic feature reference of target object appearance. We call this framework Adaptive Multi-feature Association (AMA).
Finally, we present an adaptive spatial and featural sampling strategy that extends established Local Binary Pattern (LBP) methods and overcomes many severe limitations of the traditional approach such as limited spatial support, restricted sample sets and ad hoc joint and disjoint statistical distributions that may fail to capture important structure. Our framework enables more compact, descriptive LBP-type models to be constructed, which may be employed in conjunction with many existing LBP techniques to improve their performance without modification. The framework consists of two parts: (a) a new LBP-type model known as Multiscale Selected Local Binary Features (MSLBF); and (b) a novel binary feature selection algorithm called Binary Histogram Intersection Minimisation (BHIM), which is shown to be more powerful than established methods used for binary feature selection such as Conditional Mutual Information Maximisation (CMIM) and AdaBoost.
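The name Binary Histogram Intersection Minimisation suggests ranking binary features by how little their class-conditional histograms overlap. The sketch below illustrates only that underlying measure, scoring each feature independently; it is not the BHIM algorithm itself, whose selection procedure the abstract does not describe. The function names and the toy samples are hypothetical.

```python
def binary_histogram(values):
    """Normalised two-bin histogram [P(0), P(1)] of a binary feature."""
    ones = sum(values) / len(values)
    return [1.0 - ones, ones]


def histogram_intersection(p, q):
    """Overlap of two normalised histograms: 1.0 if identical, 0.0 if disjoint."""
    return sum(min(a, b) for a, b in zip(p, q))


def rank_features(pos, neg):
    """Return feature indices ordered from least to most class overlap.

    pos and neg are lists of samples; each sample is a list of 0/1 features.
    """
    scores = []
    for j in range(len(pos[0])):
        hp = binary_histogram([s[j] for s in pos])
        hn = binary_histogram([s[j] for s in neg])
        scores.append((histogram_intersection(hp, hn), j))
    return [j for _, j in sorted(scores)]


# Toy data: feature 0 separates the classes perfectly, feature 1 not at all.
pos = [[1, 1, 1], [1, 0, 1], [1, 1, 1], [1, 0, 0]]
neg = [[0, 1, 0], [0, 0, 0], [0, 1, 1], [0, 0, 0]]
print(rank_features(pos, neg))
```

Scoring features one at a time ignores redundancy between them, which is precisely the kind of limitation that joint criteria such as CMIM are designed to address.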

    Visual region understanding: unsupervised extraction and abstraction

    The ability to gain a conceptual understanding of the world in uncontrolled environments is the ultimate goal of vision-based computer systems. Technological societies today are heavily reliant on surveillance and security infrastructure, robotics, medical image analysis, visual data categorisation and search, and smart device user interaction, to name a few. Out of all the complex problems tackled by computer vision today in the context of these technologies, that which lies closest to the original goals of the field is the subarea of unsupervised scene analysis or scene modelling. However, its common use of low-level features does not provide a good balance between generality and discriminative ability, both a result and a symptom of the sensory and semantic gaps existing between low-level computer representations and high-level human descriptions. In this research we explore a general framework that addresses the fundamental problem of universal unsupervised extraction of semantically meaningful visual regions and their behaviours. For this purpose we address issues related to (i) spatial and spatiotemporal segmentation for region extraction, (ii) region shape modelling, and (iii) the online categorisation of visual object classes and the spatiotemporal analysis of their behaviours. Under this framework we propose (a) a unified region merging method and spatiotemporal region reduction, (b) shape representation by the optimisation and novel simplification of contour-based growing neural gases, and (c) a foundation for the analysis of visual object motion properties using a shape and appearance based nearest-centroid classification algorithm and trajectory plots for the obtained region classes. Specifically, we formulate a region merging spatial segmentation mechanism that combines and adapts features shown previously to be individually useful, namely parallel region growing, the best merge criterion, a time adaptive threshold, and region reduction techniques.
For spatiotemporal region refinement we consider both scalar intensity differences and vector optical flow. To model the shapes of the visual regions thus obtained, we adapt the growing neural gas for rapid region contour representation and propose a contour simplification technique. A fast unsupervised nearest-centroid online learning technique next groups observed region instances into classes, for which we are then able to analyse spatial presence and spatiotemporal trajectories. The analysis results show semantic correlations to real-world object behaviour. Performance evaluation of all steps across standard metrics and datasets validates the approach.
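The unsupervised nearest-centroid online learning step can be pictured with a small sketch under stated assumptions: each observed region is summarised by a numeric feature vector, an observation joins the nearest existing class if it lies within a fixed distance, and otherwise it starts a new class. The class name `OnlineNearestCentroid`, the Euclidean distance and the threshold rule are illustrative choices, not details taken from the thesis.

```python
import math


class OnlineNearestCentroid:
    """Groups feature vectors into classes one observation at a time."""

    def __init__(self, new_class_distance):
        # Hypothetical threshold: beyond this distance a new class is created.
        self.new_class_distance = new_class_distance
        self.centroids = []
        self.counts = []

    def observe(self, x):
        """Assign x to a class, update that class's centroid, return its index."""
        best, best_d = None, float("inf")
        for k, c in enumerate(self.centroids):
            d = math.dist(x, c)
            if d < best_d:
                best, best_d = k, d
        if best is None or best_d > self.new_class_distance:
            self.centroids.append(list(x))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Incremental mean update: no past observations need to be stored.
        self.counts[best] += 1
        n = self.counts[best]
        self.centroids[best] = [c + (xi - c) / n
                                for c, xi in zip(self.centroids[best], x)]
        return best


# Made-up 2-D feature vectors standing in for shape and appearance features.
clf = OnlineNearestCentroid(new_class_distance=2.0)
labels = [clf.observe(p) for p in [(0, 0), (1, 0), (10, 10), (0, 1), (11, 10)]]
print(labels, clf.centroids)
```

Because each centroid is a running mean, the memory cost stays proportional to the number of classes rather than the number of observed regions, which suits the online setting the abstract describes.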