
    Multi-Character Field Recognition for Arabic and Chinese Handwriting

    Two methods, Symbolic Indirect Correlation (SIC) and Style Constrained Classification (SCC), are proposed for recognizing handwritten Arabic and Chinese words and phrases. SIC reassembles variable-length segments of an unknown query that match similar segments of labeled reference words. Recognition is based on the correspondence between the order of the feature vectors and of the lexical transcript in both the query and the references. SIC implicitly incorporates language context in the form of letter n-grams. SCC is based on the notion that the style (distortion or noise) of a character is a good predictor of the distortions arising in other characters, even of a different class, from the same source. It is adaptive in the sense that, with a long enough field, its accuracy converges to that of a style-specific classifier trained on the writer of the unknown query. Neither SIC nor SCC requires the query words to appear among the references.
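
    To make the style-constrained decision rule concrete, the sketch below is one minimal reading of it (assuming per-character log-likelihoods log p(x | class, style) from some previously trained model; the array layout and the name classify_field are illustrative, not the authors' implementation). Because every character in a field is assumed to share one unknown style, the joint maximization factorizes: for each candidate style, take the best class per character, then pick the style with the highest total.

        import numpy as np

        def classify_field(log_lik):
            """log_lik: array of shape (n_chars, n_classes, n_styles) holding
            log p(x_i | class, style) for each character image x_i in the field.
            Returns (labels, style)."""
            per_char_best = log_lik.max(axis=1)           # (n_chars, n_styles)
            # A longer field accumulates more evidence about the writer's
            # style, which is why accuracy approaches that of a
            # style-specific classifier.
            style = per_char_best.sum(axis=0).argmax()    # best shared style
            labels = log_lik[:, :, style].argmax(axis=1)  # classes under it
            return labels, style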

    Camera-Based Ballot Counter

    Portable ballot counters using camera technology and manual paper feed are potentially more reliable and less expensive than scanner-based systems. We show that the spatial sampling rate, geometric linearity, point spread function, and photometric transfer function of off-the-shelf consumer cameras are acceptable for ballot imaging. However, scanner illumination is much more uniform than can be economically achieved for variable-size ballots. Therefore, flat-field compensation must be designed into the image processing software. We illustrate the mechanical design of a prototype camera-based ballot reader based on our comparative observations.
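
    Flat-field compensation itself is a standard correction; below is a minimal sketch of how it could be folded into the reader's image-processing software. The correction model (dividing out a reference image of a blank, uniformly white sheet) and all names are illustrative assumptions, not the prototype's code.

        import numpy as np

        def flat_field_correct(raw, flat, dark=None, eps=1e-6):
            """Compensate for non-uniform illumination across the ballot.

            raw  : ballot image as a 2-D array
            flat : reference image of a blank white sheet under the same lighting
            dark : optional dark-frame (sensor offset) image
            """
            raw = raw.astype(np.float64)
            flat = flat.astype(np.float64)
            if dark is not None:
                raw, flat = raw - dark, flat - dark
            # Divide out the illumination pattern, rescaled so the corrected
            # image keeps the flat field's average brightness.
            gain = flat.mean() / np.maximum(flat, eps)
            return np.clip(raw * gain, 0, 255).astype(np.uint8)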

    Degradation Specific OCR

    Optical Character Recognition (OCR) is the mechanical or electronic translation of scanned images of handwritten, typewritten, or printed text into machine-encoded text. OCR has many applications, such as making a text document in physical form editable, or enabling computer search of text that was initially in printed form. OCR engines are widely used to digitize text documents so that they can be stored digitally for remote access, mainly on websites; this makes such invaluable resources available instantly, regardless of the end user's geographical location. Severe OCR misclassification errors can occur when an OCR engine is used to digitize a degraded document. The degradation may have varied causes, including aging of the paper, incompletely printed characters, and blots of ink on the original document. In this thesis, the degradation due to scanning text documents was considered. To improve OCR performance, it is vital to train the classifier on a large training set containing data points similar to the degraded real-life characters. Characters with varying degrees of blurring and binarization thresholds were generated and used to calculate edge-spread degradation parameters, which were then used to divide the OCR engine's training data set into more homogeneous sets; the classification accuracy obtained by training on these smaller sets was analyzed. The training data set consisted of 100,000 data points of 300 DPI, 12-point sans-serif lowercase characters ‘c’ and ‘e’, generated with random values of threshold and blur width and with random Gaussian noise added. To group similar degraded characters together, clustering was performed using the ISODATA algorithm. Two edge-spread parameters were estimated to fit the cluster boundaries: DC, calculated on isolated edges, and MDC, calculated on edges in close proximity to account for interference effects. These values were then used to divide the training data, and a Bayesian classifier was used for recognition. MDC was verified to be slightly better than DC as a division parameter, and either 2 or 3 partitions was found to be the best choice for dataset division. An experimental way to estimate the best division boundary was determined and verified by tests. Both crisp and fuzzy approaches to classifier training and testing were implemented in various combinations; crisp training with fuzzy testing proved best, giving a 98.08% classification rate with the data set divided into 2 partitions and 98.93% with 3 partitions, compared to 94.08% for the undivided data set.
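
    The degradation model described above (Gaussian blur of an ideal glyph, additive Gaussian noise, then global binarization) can be sketched as follows. The parameter ranges, the toy glyph, and the function names are illustrative assumptions, not the thesis's actual generator.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def degrade(glyph, blur_width, threshold, noise_sigma, rng):
            """Produce one degraded training sample from an ideal binary glyph.

            glyph      : 2-D float array in [0, 1] (ideal character bitmap)
            blur_width : std. dev. of the Gaussian point-spread function
            threshold  : global binarization threshold in (0, 1)
            """
            blurred = gaussian_filter(glyph.astype(np.float64), sigma=blur_width)
            noisy = blurred + rng.normal(0.0, noise_sigma, size=glyph.shape)
            return (noisy > threshold).astype(np.uint8)

        # One sample with a random (blur, threshold) pair, echoing the setup
        # of the 100,000-point training set.
        rng = np.random.default_rng(0)
        glyph = np.zeros((32, 32)); glyph[8:24, 14:18] = 1.0  # toy stroke
        sample = degrade(glyph, blur_width=rng.uniform(0.5, 2.0),
                         threshold=rng.uniform(0.3, 0.7),
                         noise_sigma=0.05, rng=rng)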

    Towards Post-Quantum Security for Signal's X3DH Handshake

    Modern key exchange protocols are usually based on the Diffie–Hellman (DH) primitive. The beauty of this primitive, among other things, is its potential reuse of key shares: DH shares can be used either a single time or across multiple runs. Since DH-based protocols are insecure against quantum adversaries, alternative solutions have to be found when moving to the post-quantum setting. However, most post-quantum candidates, including schemes based on lattices and even supersingular isogeny DH, are not known to be secure under key reuse. In particular, this means that they cannot necessarily be deployed as an immediate DH substitute in protocols. In this paper, we introduce the notion of a split key encapsulation mechanism (split KEM) to translate the desired key-reusability of a DH-based protocol to a KEM-based flow. We provide the relevant security notions of split KEMs and show how the formalism lends itself to lifting Signal’s X3DH handshake to the post-quantum KEM setting without additional message flows. Although the proposed framework conceptually solves the raised issues, instantiating it securely from post-quantum assumptions proved to be non-trivial. We give passively secure instantiations from (R)LWE, yet overcoming the above-mentioned insecurities under key reuse in the presence of active adversaries remains an open problem. Approaching one-sided key reuse, we provide a split KEM instantiation that allows such reuse based on the KEM introduced by Kiltz (PKC 2007), which may serve as a post-quantum blueprint if the underlying hardness assumption (gap hashed Diffie–Hellman) holds for the commutative group action of CSIDH (Asiacrypt 2018). The intention of this paper is hence to raise awareness of the challenges arising when moving to KEM-based key exchange protocols with key reusability, and to propose split KEMs as a specific target for instantiation in future research.
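
    The split-KEM syntax sketched in the abstract separates the two parties' key material: encapsulation takes the sender's secret key together with the receiver's public key, and decapsulation the mirror image. The interface below is an illustrative rendering of that flow, with a classical (deliberately NOT post-quantum) Diffie–Hellman toy instantiation only to show how DH-style key reuse maps onto it; method names and parameters are assumptions, not the paper's formalism.

        import hashlib
        import secrets
        from abc import ABC, abstractmethod

        class SplitKEM(ABC):
            @abstractmethod
            def keygen_sender(self):           # -> (pk_s, sk_s)
                ...

            @abstractmethod
            def keygen_receiver(self):         # -> (pk_r, sk_r)
                ...

            @abstractmethod
            def encaps(self, sk_s, pk_r):      # -> (ciphertext, shared key)
                ...

            @abstractmethod
            def decaps(self, sk_r, pk_s, ct):  # -> shared key
                ...

        class ToyDHSplitKEM(SplitKEM):
            """Classical DH toy with insecure parameters; shows only the data
            flow. A DH share g^x can act as pk_s or pk_r across many sessions,
            which is exactly the key-reusability the split-KEM notion makes
            explicit per direction."""
            p = 2**127 - 1  # Mersenne prime, toy-sized
            g = 3

            def keygen_sender(self):
                sk = secrets.randbelow(self.p - 2) + 1
                return pow(self.g, sk, self.p), sk

            keygen_receiver = keygen_sender

            def encaps(self, sk_s, pk_r):
                shared = pow(pk_r, sk_s, self.p)
                # Plain DH needs no ciphertext beyond the sender's public key.
                return b"", hashlib.sha256(str(shared).encode()).digest()

            def decaps(self, sk_r, pk_s, ct):
                shared = pow(pk_s, sk_r, self.p)
                return hashlib.sha256(str(shared).encode()).digest()

        kem = ToyDHSplitKEM()
        pk_s, sk_s = kem.keygen_sender()
        pk_r, sk_r = kem.keygen_receiver()
        ct, k1 = kem.encaps(sk_s, pk_r)
        assert k1 == kem.decaps(sk_r, pk_s, ct)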