176,952 research outputs found

    Extensions of algebraic image operators: An approach to model-based vision

    Get PDF
    Researchers extend their previous research on a highly structured and compact algebraic representation of grey-level images which can be viewed as fuzzy sets. Addition and multiplication are defined for the set of all grey-level images, which can then be described as polynomials of two variables. Utilizing this new algebraic structure, researchers devised an innovative, efficient edge detection scheme. An accurate method for deriving gradient component information from this edge detector is presented. Based upon this new edge detection system researchers developed a robust method for linear feature extraction by combining the techniques of a Hough transform and a line follower. The major advantage of this feature extractor is its general, object-independent nature. Target attributes, such as line segment lengths, intersections, angles of intersection, and endpoints are derived by the feature extraction algorithm and employed during model matching. The algebraic operators are global operations which are easily reconfigured to operate on any size or shape region. This provides a natural platform from which to pursue dynamic scene analysis. A method for optimizing the linear feature extractor which capitalizes on the spatially reconfiguration nature of the edge detector/gradient component operator is discussed

    Multimodal Biometric Systems for Personal Identification and Authentication using Machine and Deep Learning Classifiers

    Get PDF
    Multimodal biometrics, using machine and deep learning, has recently gained interest over single biometric modalities. This interest stems from the fact that this technique improves recognition and, thus, provides more security. In fact, by combining the abilities of single biometrics, the fusion of two or more biometric modalities creates a robust recognition system that is resistant to the flaws of individual modalities. However, the excellent recognition of multimodal systems depends on multiple factors, such as the fusion scheme, fusion technique, feature extraction techniques, and classification method. In machine learning, existing works generally use different algorithms for feature extraction of modalities, which makes the system more complex. On the other hand, deep learning, with its ability to extract features automatically, has made recognition more efficient and accurate. Studies deploying deep learning algorithms in multimodal biometric systems tried to find a good compromise between the false acceptance and the false rejection rates (FAR and FRR) to choose the threshold in the matching step. This manual choice is not optimal and depends on the expertise of the solution designer, hence the need to automatize this step. From this perspective, the second part of this thesis details an end-to-end CNN algorithm with an automatic matching mechanism. This thesis has conducted two studies on face and iris multimodal biometric recognition. The first study proposes a new feature extraction technique for biometric systems based on machine learning. The iris and facial features extraction is performed using the Discrete Wavelet Transform (DWT) combined with the Singular Value Decomposition (SVD). Merging the relevant characteristics of the two modalities is used to create a pattern for an individual in the dataset. The experimental results show the robustness of our proposed technique and the efficiency when using the same feature extraction technique for both modalities. The proposed method outperformed the state-of-the-art and gave an accuracy of 98.90%. The second study proposes a deep learning approach using DensNet121 and FaceNet for iris and faces multimodal recognition using feature-level fusion and a new automatic matching technique. The proposed automatic matching approach does not use the threshold to ensure a better compromise between performance and FAR and FRR errors. However, it uses a trained multilayer perceptron (MLP) model that allows people’s automatic classification into two classes: recognized and unrecognized. This platform ensures an accurate and fully automatic process of multimodal recognition. The results obtained by the DenseNet121-FaceNet model by adopting feature-level fusion and automatic matching are very satisfactory. The proposed deep learning models give 99.78% of accuracy, and 99.56% of precision, with 0.22% of FRR and without FAR errors. The proposed and developed platform solutions in this thesis were tested and vali- dated in two different case studies, the central pharmacy of Al-Asria Eye Clinic in Dubai and the Abu Dhabi Police General Headquarters (Police GHQ). The solution allows fast identification of the persons authorized to access the different rooms. It thus protects the pharmacy against any medication abuse and the red zone in the military zone against the unauthorized use of weapons

    Indexing and Retrieval of 3D Articulated Geometry Models

    Get PDF
    In this PhD research study, we focus on building a content-based search engine for 3D articulated geometry models. 3D models are essential components in nowadays graphic applications, and are widely used in the game, animation and movies production industry. With the increasing number of these models, a search engine not only provides an entrance to explore such a huge dataset, it also facilitates sharing and reusing among different users. In general, it reduces production costs and time to develop these 3D models. Though a lot of retrieval systems have been proposed in recent years, search engines for 3D articulated geometry models are still in their infancies. Among all the works that we have surveyed, reliability and efficiency are the two main issues that hinder the popularity of such systems. In this research, we have focused our attention mainly to address these two issues. We have discovered that most existing works design features and matching algorithms in order to reflect the intrinsic properties of these 3D models. For instance, to handle 3D articulated geometry models, it is common to extract skeletons and use graph matching algorithms to compute the similarity. However, since this kind of feature representation is complex, it leads to high complexity of the matching algorithms. As an example, sub-graph isomorphism can be NP-hard for model graph matching. Our solution is based on the understanding that skeletal matching seeks correspondences between the two comparing models. If we can define descriptive features, the correspondence problem can be solved by bag-based matching where fast algorithms are available. In the first part of the research, we propose a feature extraction algorithm to extract such descriptive features. We then convert the skeletal matching problems into bag-based matching. We further define metric similarity measure so as to support fast search. We demonstrate the advantages of this idea in our experiments. The improvement on precision is 12\% better at high recall. The indexing search of 3D model is 24 times faster than the state of the art if only the first relevant result is returned. However, improving the quality of descriptive features pays the price of high dimensionality. Curse of dimensionality is a notorious problem on large multimedia databases. The computation time scales exponentially as the dimension increases, and indexing techniques may not be useful in such situation. In the second part of the research, we focus ourselves on developing an embedding retrieval framework to solve the high dimensionality problem. We first argue that our proposed matching method projects 3D models on manifolds. We then use manifold learning technique to reduce dimensionality and maximize intra-class distances. We further propose a numerical method to sub-sample and fast search databases. To preserve retrieval accuracy using fewer landmark objects, we propose an alignment method which is also beneficial to existing works for fast search. The advantages of the retrieval framework are demonstrated in our experiments that it alleviates the problem of curse of dimensionality. It also improves the efficiency (3.4 times faster) and accuracy (30\% more accurate) of our matching algorithm proposed above. In the third part of the research, we also study a closely related area, 3D motions. 3D motions are captured by sticking sensor on human beings. These captured data are real human motions that are used to animate 3D articulated geometry models. Creating realistic 3D motions is an expensive and tedious task. Although 3D motions are very different from 3D articulated geometry models, we observe that existing works also suffer from the problem of temporal structure matching. This also leads to low efficiency in the matching algorithms. We apply the same idea of bag-based matching into the work of 3D motions. From our experiments, the proposed method has a 13\% improvement on precision at high recall and is 12 times faster than existing works. As a summary, we have developed algorithms for 3D articulated geometry models and 3D motions, covering feature extraction, feature matching, indexing and fast search methods. Through various experiments, our idea of converting restricted matching to bag-based matching improves matching efficiency and reliability. These have been shown in both 3D articulated geometry models and 3D motions. We have also connected 3D matching to the area of manifold learning. The embedding retrieval framework not only improves efficiency and accuracy, but has also opened a new area of research

    A visual transformer-based smart textual extraction method for financial invoices

    Get PDF
    In era of big data, the computer vision-assisted textual extraction techniques for financial invoices have been a major concern. Currently, such tasks are mainly implemented via traditional image processing techniques. However, they highly rely on manual feature extraction and are mainly developed for specific financial invoice scenes. The general applicability and robustness are the major challenges faced by them. As consequence, deep learning can adaptively learn feature representation for different scenes and be utilized to deal with the above issue. As a consequence, this work introduces a classic pre-training model named visual transformer to construct a lightweight recognition model for this purpose. First, we use image processing technology to preprocess the bill image. Then, we use a sequence transduction model to extract information. The sequence transduction model uses a visual transformer structure. In the stage target location, the horizontal-vertical projection method is used to segment the individual characters, and the template matching is used to normalize the characters. In the stage of feature extraction, the transformer structure is adopted to capture relationship among fine-grained features through multi-head attention mechanism. On this basis, a text classification procedure is designed to output detection results. Finally, experiments on a real-world dataset are carried out to evaluate performance of the proposal and the obtained results well show the superiority of it. Experimental results show that this method has high accuracy and robustness in extracting financial bill information

    Applying MDL to Learning Best Model Granularity

    Get PDF
    The Minimum Description Length (MDL) principle is solidly based on a provably ideal method of inference using Kolmogorov complexity. We test how the theory behaves in practice on a general problem in model selection: that of learning the best model granularity. The performance of a model depends critically on the granularity, for example the choice of precision of the parameters. Too high precision generally involves modeling of accidental noise and too low precision may lead to confusion of models that should be distinguished. This precision is often determined ad hoc. In MDL the best model is the one that most compresses a two-part code of the data set: this embodies ``Occam's Razor.'' In two quite different experimental settings the theoretical value determined using MDL coincides with the best value found experimentally. In the first experiment the task is to recognize isolated handwritten characters in one subject's handwriting, irrespective of size and orientation. Based on a new modification of elastic matching, using multiple prototypes per character, the optimal prediction rate is predicted for the learned parameter (length of sampling interval) considered most likely by MDL, which is shown to coincide with the best value found experimentally. In the second experiment the task is to model a robot arm with two degrees of freedom using a three layer feed-forward neural network where we need to determine the number of nodes in the hidden layer giving best modeling performance. The optimal model (the one that extrapolizes best on unseen examples) is predicted for the number of nodes in the hidden layer considered most likely by MDL, which again is found to coincide with the best value found experimentally.Comment: LaTeX, 32 pages, 5 figures. Artificial Intelligence journal, To appea
    corecore