Maintenance of rock shed slope-retaining structures: landslide mitigation measures at Simpang Pulai - Blue Valley, Perak
The construction industry is highly challenging, not only in Malaysia but throughout the world, and is characterised by the "3D" label: dirty, difficult and dangerous. The industry is also among the largest contributors to GDP, at 7.4 percent in 2016, although it is likewise among the largest contributors to safety problems, namely accidents (CIDB, 2017). The responsible parties should therefore take the problems it faces seriously so that the industry can compete at the international level
Use of colour for hand-filled form analysis and recognition
Colour information in form analysis is currently underutilised. As technology has advanced and computing costs have fallen, processing forms in colour has become practicable. This paper describes a novel colour-based approach to the extraction of filled data from colour form images. Images are first quantised to reduce the colour complexity, and data is then extracted by examining the colour characteristics of the images. The improved performance of the proposed method has been verified by comparing its processing time, recognition rate, extraction precision and recall rate with those of an equivalent black and white system
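The quantise-then-extract idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's method: the uniform per-channel quantiser, the function names `quantise` and `extract_filled`, and the assumption that the blank form occupies the most frequent quantised colours (background plus pre-printed ink) are all hypothetical choices made for illustration.

```python
from collections import Counter

def quantise(pixel, levels=4):
    """Map each 8-bit channel to one of `levels` uniform bins (assumed quantiser)."""
    step = 256 // levels
    return tuple(min(c // step, levels - 1) for c in pixel)

def extract_filled(image, form_colours=2, levels=4):
    """Return (row, col) positions whose quantised colour is not a form colour.

    Assumption: the `form_colours` most frequent quantised colours belong to
    the blank form, so every other colour is treated as filled-in data.
    """
    quantised = [[quantise(p, levels) for p in row] for row in image]
    counts = Counter(q for row in quantised for q in row)
    form_palette = {colour for colour, _ in counts.most_common(form_colours)}
    return [
        (y, x)
        for y, row in enumerate(quantised)
        for x, q in enumerate(row)
        if q not in form_palette
    ]

# Toy image: white background, black printed line, blue handwriting.
W, K, B = (255, 255, 255), (0, 0, 0), (10, 20, 200)
image = [
    [W, W, W, W],
    [K, K, K, W],
    [W, B, B, W],
]
filled = extract_filled(image)
```

In this toy image the two blue pixels are the only ones outside the two dominant colours, so they are the ones returned as filled data.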
Anveshak - A Groundtruth Generation Tool for Foreground Regions of Document Images
We propose a graphical user interface based groundtruth generation tool in
this paper. Here, annotation of an input document image is done based on the
foreground pixels. Foreground pixels are grouped together with user interaction
to form labeling units. These units are then labeled by the user with the user
defined labels. The output produced by the tool is an image with an XML file
containing its metadata information. This annotated data can be further used in
different applications of document image analysis. Comment: Accepted in DAR 201
A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis
Automatic analysis of scanned historical documents comprises a wide range of
image analysis tasks, which are often challenging for machine learning due to a
lack of human-annotated learning samples. With the advent of deep neural
networks, a promising way to cope with the lack of training data is to
pre-train models on images from a different domain and then fine-tune them on
historical documents. In the current research, a typical example of such
cross-domain transfer learning is the use of neural networks that have been
pre-trained on the ImageNet database for object recognition. It remains a
mostly open question whether or not this pre-training helps to analyse
historical documents, which have fundamentally different image properties when
compared with ImageNet. In this paper, we present a comprehensive empirical
survey on the effect of ImageNet pre-training for diverse historical document
analysis tasks, including character recognition, style classification,
manuscript dating, semantic segmentation, and content-based retrieval. While we
obtain mixed results for semantic segmentation at pixel-level, we observe a
clear trend across different network architectures that ImageNet pre-training
has a positive effect on classification as well as content-based retrieval
Automated framework for robust content-based verification of print-scan degraded text documents
Fraudulent documents frequently cause severe financial damage and security breaches in civil and government organizations. The rapid advances in technology and the widespread availability of personal computers have not reduced the use of printed documents. While digital documents can be verified by many robust and secure methods such as digital signatures and digital watermarks, verification of printed documents still relies on manual inspection of embedded physical security mechanisms. The objective of this thesis is to propose an efficient automated framework for robust content-based verification of printed documents. The principal issue is to achieve robustness with respect to the degradations and increased levels of noise that result from multiple cycles of printing and scanning. It is shown that classic OCR systems fail under such conditions; moreover, OCR systems typically rely heavily on high-level linguistic structures to improve recognition rates. However, inferring knowledge about the contents of the document image from a priori statistics is contrary to the nature of document verification. Instead, a system is proposed that utilizes specific knowledge of the document to perform highly accurate content verification based on a Print-Scan degradation model and character shape recognition. Such specific knowledge of the document is a reasonable choice for the verification domain, since the document contents must already be known in order to verify them. The system analyses digital multi-font PDF documents to generate a descriptive summary of the document, referred to as the "Document Description Map" (DDM). The DDM is later used for verifying the content of printed and scanned copies of the original documents. The system utilizes 2-D Discrete Cosine Transform based features and an adaptive hierarchical classifier trained with synthetic data generated by a Print-Scan degradation model.
The system is tested with varying degrees of Print-Scan Channel corruption on a variety of documents, with the corruption produced by repeated printing and scanning of the test documents. Results show that the approach achieves excellent accuracy and robustness despite the high level of noise
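As a rough illustration of the "2-D Discrete Cosine Transform based features" mentioned in the abstract above, the sketch below computes an orthonormal 2-D DCT-II of a small glyph block and keeps the low-frequency corner as a feature vector. The block size, the number of coefficients kept, and the names `dct2` and `dct_features` are assumptions for illustration; the thesis's actual feature layout is not given in the abstract.

```python
import math

def dct2(block):
    """Orthonormal 2-D DCT-II of a rectangular block (list of equal-length rows)."""
    n, m = len(block), len(block[0])

    def alpha(k, size):
        # Normalisation factor that makes the transform orthonormal.
        return math.sqrt((1 if k == 0 else 2) / size)

    out = [[0.0] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            total = 0.0
            for x in range(n):
                for y in range(m):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * m))
                    )
            out[u][v] = alpha(u, n) * alpha(v, m) * total
    return out

def dct_features(block, keep=2):
    """Flatten the top-left keep x keep low-frequency coefficients (assumed layout)."""
    coeffs = dct2(block)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]

# A uniform 4x4 block has all of its energy in the DC coefficient.
flat_block = [[1.0] * 4 for _ in range(4)]
features = dct_features(flat_block)
```

Keeping only the low-frequency coefficients is one common way to obtain a compact shape descriptor that is less sensitive to high-frequency noise, which is plausibly why such features suit print-scan degraded characters.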