Automatic Document Image Binarization using Bayesian Optimization
Document image binarization is often a challenging task due to various forms
of degradation. Although several binarization techniques exist in the
literature, the binarized image is typically sensitive to the control parameter
settings of the employed technique. This paper presents an automatic document
image binarization algorithm to segment the text from heavily degraded document
images. The proposed technique uses a two band-pass filtering approach for
background noise removal, and Bayesian optimization for automatic
hyperparameter selection for optimal results. The effectiveness of the proposed
binarization technique is empirically demonstrated on the Document Image
Binarization Competition (DIBCO) and the Handwritten Document Image
Binarization Competition (H-DIBCO) datasets.
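The core idea of selecting a binarization technique's control parameters automatically can be sketched in miniature. The snippet below is a simplified stand-in, not the paper's method: it exhaustively searches a single global threshold against Otsu's between-class-variance criterion, whereas the paper tunes the hyperparameters of a two band-pass filtering pipeline with Bayesian optimization. All names here are illustrative.

```python
import numpy as np

def binarize_auto(img, thresholds=range(1, 255)):
    """Pick the threshold maximizing between-class variance (Otsu's criterion).

    Simplified sketch: a 1-D exhaustive search stands in for Bayesian
    optimization, and the band-pass background-removal stage is omitted.
    Returns (binary image with background=1 / text=0, chosen threshold).
    """
    img = np.asarray(img, dtype=np.float64)
    best_t, best_score = 0, -1.0
    for t in thresholds:
        fg = img >= t                       # bright (background) pixels
        w1, w0 = fg.mean(), (~fg).mean()    # class weights
        if w0 == 0 or w1 == 0:
            continue                        # degenerate split, skip
        score = w0 * w1 * (img[fg].mean() - img[~fg].mean()) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return (img >= best_t).astype(np.uint8), best_t

# toy "document": dark text (~20) on a bright background (~200), with noise
rng = np.random.default_rng(0)
page = rng.normal(200, 10, (32, 32))
page[8:12, 4:28] = rng.normal(20, 10, (4, 24))
binary, t = binarize_auto(page)
```

A real system would replace the exhaustive loop with a Bayesian optimizer over several interacting filter parameters, where each evaluation is expensive.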
Paradigm of tunable clustering using binarization of consensus partition matrices (Bi-CoPaM) for gene discovery
Copyright @ 2013 Abu-Jamous et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant to a problem and should not be assigned to any cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. Moreover, these algorithms cannot generate tight clusters that focus on their cores or wide clusters that overlap and contain all possibly relevant genes. In this paper, a new clustering paradigm is proposed in which all three eventualities are possible: a gene may be exclusively assigned to a single cluster, assigned to multiple clusters, or not assigned to any cluster. These possibilities are realised through the primary novelty of the paradigm, the introduction of tunable binarization techniques. Results from multiple clustering experiments are aggregated to generate one fuzzy consensus partition matrix (CoPaM), which is then binarized to obtain the final binary partitions. This is referred to as Binarization of Consensus Partition Matrices (Bi-CoPaM). The method has been tested on a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide, and complementary clusters that can meet the requirements of different gene discovery studies.
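The "tunable binarization" step can be illustrated with one simple scheme (the paper defines a family of such techniques; this value-threshold variant is only an illustrative assumption). A gene is assigned to every cluster whose fuzzy consensus membership reaches a threshold delta, so a single tuning knob moves smoothly between wide, overlapping clusters and tight clusters that may leave genes unassigned.

```python
import numpy as np

def value_threshold_binarize(copam, delta):
    """Binarize a fuzzy consensus partition matrix (clusters x genes).

    Illustrative sketch of the Bi-CoPaM idea: a gene (column) is assigned
    to every cluster (row) whose membership is >= delta. Low delta yields
    wide, overlapping clusters; high delta yields tight clusters and may
    leave some genes assigned to no cluster at all.
    """
    copam = np.asarray(copam, dtype=float)
    return (copam >= delta).astype(int)

# 3 clusters x 4 genes; each gene's memberships sum to 1
copam = np.array([[0.90, 0.40, 0.34, 0.10],
                  [0.05, 0.40, 0.33, 0.20],
                  [0.05, 0.20, 0.33, 0.70]])

tight = value_threshold_binarize(copam, 0.8)  # only confident assignments
wide  = value_threshold_binarize(copam, 0.3)  # overlapping assignments
```

With delta = 0.8 only the first gene is assigned anywhere (the other three remain unassigned), while delta = 0.3 lets the second gene belong to two clusters and the third to all three, covering the single-, multiple-, and no-assignment cases the abstract describes.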
DCTNet: A Simple Learning-free Approach for Face Recognition
PCANet was proposed as a lightweight deep learning network that mainly
leverages Principal Component Analysis (PCA) to learn multistage filter banks,
followed by binarization and block-wise histogramming. PCANet was shown to work
surprisingly well in various image classification tasks. However, PCANet is
data-dependent and hence inflexible. In this paper, we propose a
data-independent network, dubbed DCTNet, for face recognition, in which we
adopt the Discrete Cosine Transform (DCT) as the filter bank in place of PCA.
This is motivated by the fact that the 2D DCT basis is a good approximation of
the high-ranked eigenvectors of PCA. Both the 2D DCT and PCA resemble modulated
sine-wave patterns, which can be perceived as a bandpass filter bank. DCTNet is
free from learning, as the 2D DCT bases can be computed in advance. Besides
that, we also propose an effective method to regulate the block-wise histogram
feature vector of DCTNet for robustness. It is shown to provide a surprising
performance boost when the probe image differs considerably in appearance from
the gallery image. We evaluate the performance of DCTNet extensively on a
number of benchmark face databases and achieve accuracy on par with, and often
better than, PCANet.

Comment: APSIPA ASC 201
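The learning-free filter stage is easy to reproduce: the 2D DCT basis exists in closed form, so no training data is needed. The sketch below builds an orthonormal k x k DCT-II basis and keeps the lowest-frequency non-DC bases as a filter bank; the frequency-sum ordering and parameter defaults are assumptions for illustration, not necessarily DCTNet's exact choices.

```python
import numpy as np

def dct2_filter_bank(k=8, num_filters=8):
    """Construct 2-D DCT-II basis patches to use as convolution filters.

    Sketch of a learning-free filter stage in the spirit of DCTNet: the
    separable k x k DCT basis is computed in closed form, and the lowest
    `num_filters` non-DC bases (ordered by frequency sum u + v, an
    illustrative assumption) form the filter bank.
    """
    n = np.arange(k)
    # 1-D orthonormal DCT-II basis matrix C[u, x]
    C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * k))
    C[0] *= np.sqrt(1.0 / k)
    C[1:] *= np.sqrt(2.0 / k)
    # Outer products of 1-D bases give the separable 2-D bases
    order = sorted(((u, v) for u in range(k) for v in range(k)),
                   key=lambda uv: (uv[0] + uv[1], uv))
    bases = [np.outer(C[u], C[v]) for (u, v) in order[1:num_filters + 1]]
    return np.stack(bases)  # shape: (num_filters, k, k)

bank = dct2_filter_bank()
```

Because the bases are orthonormal under the Frobenius inner product, convolving with them is a fixed, data-independent analogue of PCANet's learned PCA filters.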
Automatic Palaeographic Exploration of Genizah Manuscripts
The Cairo Genizah is a collection of hand-written documents containing approximately
350,000 fragments of mainly Jewish texts discovered in the late 19th
century. The
fragments are today spread out in some 75 libraries and private collections worldwide,
but there is an ongoing effort to document and catalogue all extant fragments.
Palaeographic information plays a key role in the study of the Genizah collection.
Script style, and, more specifically, handwriting, can be used to identify fragments that
might originate from the same original work. Such matched fragments, commonly
referred to as “joins”, are currently identified manually by experts, and presumably only
a small fraction of existing joins have been discovered to date. In this work, we show
that automatic handwriting matching functions, obtained from non-specific features
using a corpus of writing samples, can perform this task quite reliably. In addition, we
explore the problem of grouping various Genizah documents by script style, without
being provided any prior information about the relevant styles. The automatically
obtained grouping agrees, for the most part, with the palaeographic taxonomy. In cases
where the method fails, it is due to apparent similarities between related scripts.
Q-Learnheuristics: towards data-driven balanced metaheuristics
One of the central issues that must be resolved for a metaheuristic optimization process to work well is the dilemma of balancing exploration and exploitation. Metaheuristics (MH) that achieve this balance can be called balanced MH. A Q-Learning (QL) integration framework was previously proposed for selecting the metaheuristic operators conducive to this balance, in particular for selecting binarization schemes when a continuous metaheuristic solves binary combinatorial problems. In this work, the use of this framework is extended to other recent metaheuristics, demonstrating that integrating QL into operator selection improves the exploration-exploitation balance. Specifically, the Whale Optimization Algorithm and the Sine-Cosine Algorithm are tested by solving the Set Covering Problem, showing statistical improvements both in this balance and in the quality of the solutions.
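The selection mechanism can be sketched as a stateless QL (bandit) loop, which is a deliberate simplification: each "action" is one hypothetical binarization scheme, and the reward is a stand-in for the improvement it yields on the target problem. The real framework conditions the choice on the search state (e.g. whether the run is in an exploration or exploitation phase); this toy version does not, and all reward probabilities below are invented for illustration.

```python
import random

def ql_select_schemes(reward_prob, episodes=3000, alpha=0.1, eps=0.2, seed=1):
    """Epsilon-greedy, stateless Q-learning over a set of binarization schemes.

    Toy sketch: reward_prob[i] is the (hypothetical) chance that scheme i
    improves the incumbent solution. Returns the learned Q-value estimates,
    one per scheme.
    """
    rng = random.Random(seed)
    q = [0.0] * len(reward_prob)
    for _ in range(episodes):
        if rng.random() < eps:                       # explore a random scheme
            a = rng.randrange(len(q))
        else:                                        # exploit the best estimate
            a = max(range(len(q)), key=q.__getitem__)
        r = 1.0 if rng.random() < reward_prob[a] else 0.0
        q[a] += alpha * (r - q[a])                   # stateless Q-update
    return q

# three hypothetical schemes; the last one helps most often
q = ql_select_schemes([0.2, 0.5, 0.8])
best = max(range(3), key=q.__getitem__)
```

After enough episodes the Q-values track each scheme's reward rate, so the greedy choice concentrates on the scheme that most often improves the solution, which is the balance-steering effect the abstract describes.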