2,838 research outputs found
Sub-Classifier Construction for Error Correcting Output Code Using Minimum Weight Perfect Matching
Multi-class classification is mandatory for real world problems and one of
promising techniques for multi-class classification is Error Correcting Output
Code. We propose a method for constructing the Error Correcting Output Code to
obtain the suitable combination of positive and negative classes encoded to
represent binary classifiers. The minimum weight perfect matching algorithm is
applied to find the optimal pairs of subset of classes by using the
generalization performance as a weighting criterion. Based on our method, each
subset of classes with positive and negative labels is appropriately combined
for learning the binary classifiers. Experimental results show that our
technique gives significantly higher performance compared to traditional
methods including the dense random code and the sparse random code both in
terms of accuracy and classification times. Moreover, our method requires
significantly smaller number of binary classifiers while maintaining accuracy
compared to the One-Versus-One.Comment: 7 pages, 3 figure
Iris Indexing and Ear Classification
To identify an individual using a biometric system, the input biometric data has to be typically compared against that of each and every identity in the existing database during the matching stage. The response time of the system increases with the increase in number of individuals (i.e., database size), which is not acceptable in real time monitoring or when working on large scale data. This thesis addresses the problem of reducing the number of database candidates to be considered during matching in the context of iris and ear recognition. In the case of iris, an indexing mechanism based on Burrows Wheeler Transform (BWT) is proposed. Experiments on the CASIA version 3 iris database show a significant reduction in both search time and search space, suggesting the potential of this scheme for indexing iris databases. The ear classification scheme proposed in the thesis is based on parameterizing the shape of the ear and assigning it to one of four classes: round, rectangle, oval and triangle. Experiments on the MAGNA database suggest the potential of this scheme for classifying ear databases
On the performance of helper data template protection schemes
The use of biometrics looks promising as it is already being applied in elec- tronic passports, ePassports, on a global scale. Because the biometric data has to be stored as a reference template on either a central or personal storage de- vice, its wide-spread use introduces new security and privacy risks such as (i) identity fraud, (ii) cross-matching, (iii) irrevocability and (iv) leaking sensitive medical information. Mitigating these risks is essential to obtain the accep- tance from the subjects of the biometric systems and therefore facilitating the successful implementation on a large-scale basis. A solution to mitigate these risks is to use template protection techniques. The required protection properties of the stored reference template according to ISO guidelines are (i) irreversibility, (ii) renewability and (iii) unlinkability. A known template protection scheme is the helper data system (HDS). The fun- damental principle of the HDS is to bind a key with the biometric sample with use of helper data and cryptography, as such that the key can be reproduced or released given another biometric sample of the same subject. The identity check is then performed in a secure way by comparing the hash of the key. Hence, the size of the key determines the amount of protection. This thesis extensively investigates the HDS system, namely (i) the the- oretical classication performance, (ii) the maximum key size, (iii) the irre- versibility and unlinkability properties, and (iv) the optimal multi-sample and multi-algorithm fusion method. The theoretical classication performance of the biometric system is deter- mined by assuming that the features extracted from the biometric sample are Gaussian distributed. With this assumption we investigate the in uence of the bit extraction scheme on the classication performance. With use of the the- oretical framework, the maximum size of the key is determined by assuming the error-correcting code to operate on Shannon's bound. We also show three vulnerabilities of HDS that aect the irreversibility and unlinkability property and propose solutions. Finally, we study the optimal level of applying multi- sample and multi-algorithm fusion with the HDS at either feature-, score-, or decision-level
Pattern Recognition
A wealth of advanced pattern recognition algorithms are emerging from the interdiscipline between technologies of effective visual features and the human-brain cognition process. Effective visual features are made possible through the rapid developments in appropriate sensor equipments, novel filter designs, and viable information processing architectures. While the understanding of human-brain cognition process broadens the way in which the computer can perform pattern recognition tasks. The present book is intended to collect representative researches around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters coved in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition
Information-Theoretic Multiclass Classification Based on Binary Classifiers: On Coding Matrix Design, Reliability and Maximum Number of Classes
In this paper, we consider the multiclass classification problem based on sets of independent binary classifiers. Each binary classifier represents the output of a quantized projection of training data onto a randomly generated orthonormal basis vector thus producing a binary label. The ensemble of all binary labels forms an analogue of a coding matrix. The properties of such kind of matrices and their impact on the maximum number of uniquely distinguishable classes are analyzed in this paper from an information-theoretic point of view. We also consider a concept of reliability for such kind of coding matrix generation that can be an alternative to other adaptive training techniques and investigate the impact on the bit error probability. We demonstrate that it is equivalent to the considered random coding matrix without any bit reliability information in terms of recognition rat
Methods of Disambiguating and De-anonymizing Authorship in Large Scale Operational Data
Operational data from software development, social networks and other domains are often contaminated with incorrect or missing values. Examples include misspelled or changed names, multiple emails belonging to the same person and user profiles that vary in different systems. Such digital traces are extensively used in research and practice to study collaborating communities of various kinds. To achieve a realistic representation of the networks that represent these communities, accurate identities are essential. In this work, we aim to identify, model, and correct identity errors in data from open-source software repositories, which include more than 23M developer IDs and nearly 1B Git commits (developer activity records). Our investigation into the nature and prevalence of identity errors in software activity data reveals that they are different and occur at much higher rates than other domains. Existing techniques relying on string comparisons can only disambiguate Synonyms, but not Homonyms, which are common in software activity traces. Therefore, we introduce measures of behavioral fingerprinting to improve the accuracy of Synonym resolution, and to disambiguate Homonyms. Fingerprints are constructed from the traces of developers’ activities, such as, the style of writing in commit messages, the patterns in files modified and projects participated in by developers, and the patterns related to the timing of the developers’ activity. Furthermore, to address the lack of training data necessary for the supervised learning approaches that are used in disambiguation, we design a specific active learning procedure that minimizes the manual effort necessary to create training data in the domain of developer identity matching. We extensively evaluate the proposed approach, using over 16,000 OpenStack developers in 1200 projects, against commercial and most recent research approaches, and further on recent research on a much larger sample of over 2,000,000 IDs. Results demonstrate that our method is significantly better than both the recent research and commercial methods. We also conduct experiments to demonstrate that such erroneous data have significant impact on developer networks. We hope that the proposed approach will expedite research progress in the domain of software engineering, especially in applications for which graphs of social networks are critical
Multi Criteria Mapping Based on SVM and Clustering Methods
There are many more ways to automate the application process like using some commercial software’s that are used in big organizations to scan bills and forms, but this application is only for the static frames or formats. In our application, we are trying to automate the non-static frames as the study certificate we get are from different counties with different universities. Each and every university have there one format of certificates, so we try developing a very new application that can commonly work for all the frames or formats. As we observe many applicants are from same university which have a common format of the certificate, if we implement this type of tools, then we can analyze this sort of certificates in a simple way within very less time. To make this process more accurate we try implementing SVM and Clustering methods. With these methods we can accurately map courses in certificates to ASE study path if not to exclude list. A grade calculation is done for courses which are mapped to an ASE list by separating the data for both labs and courses in it. At the end, we try to award some points, which includes points from ASE related courses, work experience, specialization certificates and German language skills. Finally, these points are provided to the chair to select the applicant for master course ASE
Decoding algorithms for surface codes
Quantum technologies have the potential to solve computationally hard
problems that are intractable via classical means. Unfortunately, the unstable
nature of quantum information makes it prone to errors. For this reason,
quantum error correction is an invaluable tool to make quantum information
reliable and enable the ultimate goal of fault-tolerant quantum computing.
Surface codes currently stand as the most promising candidates to build error
corrected qubits given their two-dimensional architecture, a requirement of
only local operations, and high tolerance to quantum noise. Decoding algorithms
are an integral component of any error correction scheme, as they are tasked
with producing accurate estimates of the errors that affect quantum
information, so that it can subsequently be corrected. A critical aspect of
decoding algorithms is their speed, since the quantum state will suffer
additional errors with the passage of time. This poses a connundrum-like
tradeoff, where decoding performance is improved at the expense of complexity
and viceversa. In this review, a thorough discussion of state-of-the-art
surface code decoding algorithms is provided. The core operation of these
methods is described along with existing variants that show promise for
improved results. In addition, both the decoding performance, in terms of error
correction capability, and decoding complexity, are compared. A review of the
existing software tools regarding surface code decoding is also provided.Comment: 54 pages, 31 figure
Recommended from our members
Robust Algorithms for Clustering with Applications to Data Integration
A growing number of data-based applications are used for decision-making that have far-reaching consequences and significant societal impact. Entity resolution, community detection and taxonomy construction are some of the building blocks of these applications and for these methods, clustering is the fundamental underlying concept. Therefore, the use of accurate, robust and scalable methods for clustering cannot be overstated. We tackle the various facets of clustering with a multi-pronged approach described below.
1. While identification of clusters that refer to different entities is challenging for automated strategies, it is relatively easy for humans. We study the robustness of clustering methods that leverage supervision through an oracle i.e an abstraction of crowdsourcing. Additionally, we focus on scalability to handle web-scale datasets.
2. In community detection applications, a common setback in evaluation of the quality of clustering techniques is the lack of ground truth data. We propose a generative model that considers dependent edge formation and devise techniques for efficient cluster recovery
- …