
    Content Recognition and Context Modeling for Document Analysis and Retrieval

    The nature and scope of available documents are changing significantly in many areas of document analysis and retrieval as complex, heterogeneous collections become accessible to virtually everyone via the web. This increasing diversity presents a great challenge for document image content categorization, indexing, and retrieval. Meanwhile, processing documents with unconstrained layouts and complex formatting often requires effective leveraging of broad contextual knowledge. In this dissertation, we first present a novel approach for document image content categorization using a lexicon of shape features. Each lexical word corresponds to a scale- and rotation-invariant local shape feature that is generic enough to be detected repeatably, and is segmentation-free. A concise, structurally indexed shape lexicon is learned by clustering and partitioning feature types through graph cuts. Our idea finds successful application in several challenging tasks, including content recognition of diverse web images and language identification on documents composed of mixed machine-printed text and handwriting. Second, we address two fundamental problems in signature-based document image retrieval. Facing continually increasing volumes of documents, detecting and recognizing unique, evidentiary visual entities (e.g., signatures and logos) provides a practical and reliable supplement to OCR of printed text. We propose a novel multi-scale framework to jointly detect and segment signatures from document images, based on structural saliency under a signature production model. We formulate the problem of signature retrieval in the unconstrained setting of geometry-invariant deformable shape matching and demonstrate state-of-the-art performance in signature matching and verification. Third, we present a model-based approach for extracting relevant named entities from unstructured documents. In a wide range of applications that require structured information from diverse, unstructured document images, processing OCR text alone does not give satisfactory results due to the absence of linguistic context. Our approach enables inference rules to be learned collectively from contextual information in both page layout and text features. Finally, we demonstrate the importance of mining general web user behavior data for improving document ranking and the wider web search experience. The context of web user activities reveals their preferences and intents, and we emphasize the analysis of individual user sessions when creating aggregate models. We introduce a novel algorithm for estimating web page and web site importance, and discuss its theoretical foundation based on an intentional surfer model. We demonstrate that our approach significantly improves large-scale document retrieval performance.
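    The abstract describes its importance estimator only as resting on an "intentional surfer model". Below is a minimal sketch of one plausible reading, in which a PageRank-style power iteration replaces uniform random-surfer transitions with probabilities estimated from observed user click behavior; the function name, inputs, and parameter values are illustrative assumptions, not the dissertation's actual algorithm.

```python
import numpy as np

def intentional_surfer_rank(click_counts, damping=0.85, tol=1e-9, max_iter=100):
    """PageRank-style importance with behavior-weighted transitions (sketch).

    click_counts[i, j] is the observed number of user transitions from page i
    to page j (hypothetical input derived from session logs).
    """
    n = click_counts.shape[0]
    # Row-normalize the observed transitions; pages with no recorded
    # outgoing clicks are treated as jumping uniformly at random.
    row_sums = click_counts.sum(axis=1, keepdims=True)
    P = np.where(row_sums > 0, click_counts / np.maximum(row_sums, 1), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = (1 - damping) / n + damping * (r @ P)
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r
```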

    Computer screenshot classification for boosting ADHD productivity in a VR environment

    Individuals with ADHD face significant challenges in their daily lives due to difficulties with attention, hyperactivity, and impulsivity. These challenges are especially pronounced in the workplace or educational settings, where the ability to sustain attention and manage time effectively is crucial for success. Virtual reality (VR) software has emerged as a promising tool for improving productivity in individuals with ADHD. However, the effectiveness of such software depends on the identification of potential distractions and timely intervention. The proposed computer screenshot classification approach addresses this need by providing a means for identifying and analyzing potential distractions within VR software. By integrating Convolutional Neural Networks (CNNs), Optical Character Recognition (OCR), and Natural Language Processing (NLP), the proposed approach can accurately classify screenshots and extract features, facilitating the identification of distractions and enabling timely intervention to minimize their impact on productivity. The implications of this research are significant, as ADHD affects a substantial portion of the population and has a significant impact on productivity and quality of life. By providing a novel approach for studying, detecting, and enhancing productivity, this research has the potential to improve outcomes for individuals with ADHD and increase the efficiency and effectiveness of workplaces and educational settings. Moreover, the proposed approach holds promise for wider applicability to other productivity studies involving computer users, where the classification of screenshots and feature extraction play a crucial role in discerning behavioral patterns.
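    As a concrete illustration of the CNN + OCR + NLP pipeline described above, the sketch below triages a single screenshot: a CNN assigns a coarse screen category, Tesseract OCR recovers visible text, and a simple lexical check stands in for the NLP stage. The model interface, preprocessing hook, and keyword list are placeholder assumptions, not the thesis's actual configuration.

```python
import pytesseract                     # wrapper around the Tesseract OCR engine
from PIL import Image

# Hypothetical distraction vocabulary; a trained text classifier would
# replace this lookup in a real system.
DISTRACTION_TERMS = {"feed", "trending", "watch next", "subscribe"}

def classify_screenshot(path, cnn_model, preprocess):
    img = Image.open(path).convert("RGB")
    # 1) CNN: coarse screen category (e.g., "browser", "IDE", "video player").
    category = cnn_model.predict(preprocess(img))
    # 2) OCR: recover the text visible in the screenshot.
    text = pytesseract.image_to_string(img).lower()
    # 3) NLP stand-in: flag distraction-related vocabulary in the OCR text.
    distracting = any(term in text for term in DISTRACTION_TERMS)
    return {"category": category, "distracting": distracting, "text": text}
```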

    Diagnosis of Parkinson’s Disease by Boosted Neural Networks

    A boosting-by-filtering technique for back-propagation neural network systems, combined with a majority voting scheme, is presented in this paper. Previous research on predicting the presence of Parkinson’s Disease has reported accuracy rates of up to 92.9% [1], but at the cost of reduced prediction accuracy on the minority class. The neural network system boosted by filtering designed in this article shows a significant increase in robustness, and majority voting over the parallel networks achieves recognition rates above 90% on a Parkinson’s Disease data set with an imbalanced 3:1 class distribution.
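    A minimal sketch of the majority-voting combination step, assuming the parallel networks have already been trained by boosting by filtering (the training loop itself is not reproduced here) and that predict() returns non-negative integer class labels:

```python
import numpy as np

def majority_vote(networks, X):
    """Combine parallel classifiers by majority vote (sketch).

    networks: trained classifiers exposing predict(); predictions must be
    non-negative integer class labels for np.bincount to apply.
    """
    votes = np.stack([net.predict(X) for net in networks])  # (n_nets, n_samples)
    # For each sample (column), return the most frequent predicted label.
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```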

    Recognition of handwritten Chinese characters by combining regularization, Fisher's discriminant and distorted sample generation

    Proceedings of the 10th International Conference on Document Analysis and Recognition, 2009, p. 1026–1030. The problem of offline handwritten Chinese character recognition has been extensively studied by many researchers and very high recognition rates have been reported. In this paper, we propose to further boost the recognition rate by incorporating a distortion model that artificially generates a huge number of virtual training samples from existing ones. We achieve a record-high recognition rate of 99.46% on the ETL-9B database. Traditionally, when the dimension of the feature vector is high and the number of training samples is not sufficient, the remedies are to (i) regularize the class covariance matrices in the discriminant functions, (ii) employ Fisher's dimension reduction technique to reduce the feature dimension, and (iii) generate a huge number of virtual training samples from existing ones. The second contribution of this paper is an investigation of the relative effectiveness of these three methods for boosting the recognition rate. © 2009 IEEE.
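    The distortion model is the paper's key ingredient, but the abstract does not give its parameterization; the sketch below generates virtual training samples with generic small affine jitter (rotation, shear, and scale) as a stand-in:

```python
import numpy as np
from scipy.ndimage import affine_transform

def virtual_samples(image, n=10, rng=None):
    """Generate n distorted copies of a 2-D character image (sketch)."""
    rng = np.random.default_rng(rng)
    h, w = image.shape
    out = []
    for _ in range(n):
        angle = rng.uniform(-0.1, 0.1)       # small rotation, in radians
        shear = rng.uniform(-0.05, 0.05)
        scale = rng.uniform(0.95, 1.05)
        A = scale * np.array([[np.cos(angle), -np.sin(angle) + shear],
                              [np.sin(angle),  np.cos(angle)]])
        center = np.array([h / 2.0, w / 2.0])
        offset = center - A @ center          # keep the distortion centered
        out.append(affine_transform(image, A, offset=offset))
    return out
```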

    Boosted ensemble algorithm strategically trained for the incremental learning of unbalanced data

    Many pattern classification problems require a solution that needs to be incrementally updated over time. Incremental learning problems are often complicated by the appearance of new concept classes and unbalanced cardinality in the training data. The purpose of this research is to develop an algorithm capable of incrementally learning from severely unbalanced data. This work introduces three novel ensemble-based algorithms derived from the incremental learning algorithm Learn++. Learn++.NC is designed specifically for incrementally learning New Classes by dynamically adjusting the combination weights of the classifiers' decisions. Learn++.UD handles Unbalanced Data through class-conditional voting weights that are proportional to the cardinality differences among training datasets. Finally, we introduce the Boosted Ensemble Algorithm Strategically Trained (BEAST) for incremental learning of unbalanced data. BEAST combines Learn++.NC and Learn++.UD with additional strategies that compensate for unbalanced data arising from cardinality differences among concept classes. These three algorithms are investigated both analytically and empirically through a series of simulations. The simulation results are presented, compared, and discussed. While Learn++.NC and Learn++.UD perform well on the specific problems they were designed for, BEAST provides stronger and more robust performance on a much broader spectrum of complex incremental learning and unbalanced data problems.
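    To make the class-conditional voting weights concrete, here is an illustrative reduction of the Learn++.UD idea: each classifier's vote for a class is scaled by how well-represented that class was in the classifier's own training data. Learn++ additionally folds error-based weights into the combination, which is omitted here; names and interfaces are assumptions.

```python
import numpy as np

def class_conditional_vote(classifiers, class_counts, X, n_classes):
    """Cardinality-weighted voting in the spirit of Learn++.UD (sketch).

    class_counts[k][c]: number of class-c training examples seen by
    classifier k. Votes are scaled by these per-class proportions.
    """
    scores = np.zeros((len(X), n_classes))
    for clf, counts in zip(classifiers, class_counts):
        total = max(int(np.sum(counts)), 1)
        preds = clf.predict(X)               # integer class labels
        for i, c in enumerate(preds):
            # A vote counts for more when the voting classifier saw
            # class c frequently during its own training increment.
            scores[i, c] += counts[c] / total
    return scores.argmax(axis=1)
```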

    Subjective and objective quality assessment of ancient degraded documents

    The archiving, restoration, and analysis of damaged manuscripts have increased considerably in recent decades. These documents are usually physically degraded because of aging and improper handling, and they cannot be processed manually because massive volumes of them exist in libraries and archives around the world. Automatic methodologies are therefore needed to preserve and process their content. These documents are usually processed through their images. Degraded document image processing is a difficult task, mainly because of the existing physical degradations. While it can be very difficult to accurately locate and remove such distortions, analyzing their severity and type(s) is feasible, and this analysis provides useful information with a number of applications. The main contributions of this thesis are models for objectively assessing the physical condition of document images and for classifying their degradations. In this thesis, three datasets of degraded document images along with subjective ratings for each image are developed. In addition, three no-reference document image quality assessment (NR-DIQA) metrics are proposed for historical and medieval document images. It should be mentioned that degraded medieval document images are a subset of historical document images and may contain both graphical and textual content. Finally, we propose a degradation classification model to identify common distortion types in old document images. Essentially, existing no-reference image quality assessment (NR-IQA) metrics are not designed to assess physical document distortions. In the first contribution, we propose the first dataset of degraded document images along with human opinion scores for each document image. This dataset is introduced to evaluate the quality of historical document images. We also propose an objective NR-DIQA metric based on the statistics of the mean-subtracted contrast-normalized (MSCN) coefficients computed from segmented layers of each document image. The segmentation into four layers of foreground and background is done based on an analysis of log-Gabor filters, under the assumption that the sensitivity of the human visual system (HVS) differs between text and non-text locations. Experimental results show that the proposed metric has comparable or better performance than the state-of-the-art metrics, with moderate complexity. Degradation identification and quality assessment can complement each other by providing information on both the type and the severity of degradations in document images. In the second contribution, we therefore introduce a multi-distortion historical document image database that can be used for research on quality assessment of degraded documents as well as degradation classification. The developed dataset contains historical document images classified into four categories based on their distortion types, namely paper translucency, stain, readers’ annotations, and worn holes. An efficient NR-DIQA metric is then proposed based on three sets of spatial and frequency image features extracted from two layers of text and non-text. In addition, these features are used to estimate the probability of the four aforementioned physical distortions for the first time in the literature. Both the proposed quality assessment and degradation classification models deliver very promising performance.
    Finally, in the third contribution we develop a dataset and a quality assessment metric for degraded medieval document (DMD) images. This type of degraded image contains both textual and pictorial information. The introduced DMD dataset is the first in its category that also provides human ratings. We also propose a new no-reference metric to evaluate the quality of the DMD images in the developed dataset. The proposed metric is based on the extraction of several statistical features from three layers of text, non-text, and graphics. The segmentation is based on color saliency, under the assumption that the pictorial parts are colorful, and it follows the HVS by giving different weights to each layer. The experimental results validate the effectiveness of the proposed NR-DIQA strategy for DMD images.
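    The MSCN coefficients at the core of the first proposed metric follow the standard local normalization used in natural-scene-statistics models such as BRISQUE; a minimal sketch is given below. The thesis computes statistics of these coefficients separately per segmented layer (text, non-text, graphics), a step not reproduced here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(image, sigma=7/6, eps=1e-8):
    """Mean-subtracted contrast-normalized (MSCN) coefficients (sketch)."""
    image = image.astype(np.float64)
    mu = gaussian_filter(image, sigma)                    # local mean
    var = gaussian_filter(image ** 2, sigma) - mu ** 2    # local variance
    sigma_map = np.sqrt(np.abs(var))                      # local contrast
    return (image - mu) / (sigma_map + eps)
```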

    Error-correcting codes and applications to large scale classification systems

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Includes bibliographical references (p. 37-39). In this thesis, we study the performance of distributed output coding (DOC) and error-correcting output coding (ECOC) as potential methods for expanding the class of tractable machine-learning problems. Using distributed output coding, we were able to scale a neural-network-based algorithm to handle nearly 10,000 output classes. In particular, we built a prototype OCR engine for Devanagari and Korean texts based upon distributed output coding. We found that the resulting classifiers performed better than existing algorithms while maintaining a small size. Error correction, however, was found to be ineffective at increasing the accuracy of the ensemble. For each language, we also tested the feasibility of automatically finding a good codebook. Unfortunately, the results in this direction were primarily negative. By Jeremy Scott Hurwitz, M.Eng.
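    For readers unfamiliar with ECOC, the decoding step works as follows: each class is assigned a binary codeword, one classifier is trained per bit, and a test sample receives the class whose codeword is nearest in Hamming distance to the predicted bit string, so up to floor((d_min - 1) / 2) bit errors are correctable. A minimal sketch follows; the thesis's actual codebooks and classifiers are not shown.

```python
import numpy as np

def ecoc_decode(bit_predictions, codebook):
    """Nearest-codeword ECOC decoding (sketch).

    codebook:        (n_classes, n_bits) matrix over {0, 1}
    bit_predictions: (n_samples, n_bits) outputs of the per-bit classifiers
    """
    # Hamming distance from every predicted bit string to every codeword.
    dists = (bit_predictions[:, None, :] != codebook[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)              # class with the closest codeword
```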