168 research outputs found

    Dissimilarity-based representation for radiomics applications

    Full text link
    Radiomics is a term which refers to the analysis of the large amount of quantitative tumor features extracted from medical images to find useful predictive, diagnostic or prognostic information. Many recent studies have proved that radiomics can offer a lot of useful information that physicians cannot extract from the medical images and can be associated with other information like gene or protein data. However, most of the classification studies in radiomics report the use of feature selection methods without identifying the machine learning challenges behind radiomics. In this paper, we first show that the radiomics problem should be viewed as an high dimensional, low sample size, multi view learning problem, then we compare different solutions proposed in multi view learning for classifying radiomics data. Our experiments, conducted on several real world multi view datasets, show that the intermediate integration methods work significantly better than filter and embedded feature selection methods commonly used in radiomics.Comment: conference, 6 pages, 2 figure

    Handwritten Document Analysis for Automatic Writer Recognition

    Get PDF
    In this paper, we show that both the writer identification and the writer verification tasks can be carried out using local features such as graphemes extracted from the segmentation of cursive handwriting. We thus enlarge the scope of the possible use of these two tasks which have been, up to now, mainly evaluated on script handwritings. A textual based Information Retrieval model is used for the writer identification stage. This allows the use of a particular feature space based on feature frequencies. Image queries are handwritten documents projected in this feature space. The approach achieves 95% correct identification on the PSI_DataBase and 86% on the IAM_DataBase. Then writer hypothesis retrieved are analysed during a verification phase. We call upon a mutual information criterion to verify that two documents may have been produced by the same writer or not. Hypothesis testing is used for this purpose. The proposed method is first scaled on the PSI_DataBase then evaluated on the IAM_DataBase. On both databases, similar performance of nearly 96% correct verification is reported, thus making the approach general and very promising for large scale applications in the domain of handwritten document querying and writer verification

    Dynamic voting in multi-view learning for radiomics applications

    Get PDF
    Cancer diagnosis and treatment often require a personalized analysis for each patient nowadays, due to the heterogeneity among the different types of tumor and among patients. Radiomics is a recent medical imaging field that has shown during the past few years to be promising for achieving this personalization. However, a recent study shows that most of the state-of-the-art works in Radiomics fail to identify this problem as a multi-view learning task and that multi-view learning techniques are generally more efficient. In this work, we propose to further investigate the potential of one family of multi-view learning methods based on Multiple Classifiers Systems where one classifier is learnt on each view and all classifiers are combined afterwards. In particular, we propose a random forest based dynamic weighted voting scheme, which personalizes the combination of views for each new patient for classification tasks. The proposed method is validated on several real-world Radiomics problems.Comment: 10 page

    Adaptation de modèles de Markov cachés - Application à la reconnaissance de caractères imprimés

    Get PDF
    International audienceWe present in this paper a new algorithm for the adaptation of hidden Markov models (HMM models). The principle of our iterative adaptive algorithm is to alternate an HMM structure adaptation stage with an HMM Gaussian MAP adaptation stage. This algorithm is applied to the recognition of printed characters to adapt the models learned by a polyfont character recognition engine to new forms of characters. Comparing the results with those of MAP and MLLR classic adaptations shows a slight increase in the performance of the recognition system

    Alpha-Numerical Sequences Extraction in Handwritten Documents

    Full text link
    International audienceIn this paper, we introduce an alpha-numerical sequences extraction system (keywords, numerical fields or alpha-numerical sequences) in unconstrained handwritten documents. Contrary to most of the approaches presented in the literature, our system relies on a global handwriting line model describing two kinds of information : i) the relevant information and ii) the irrelevant information represented by a shallow parsing model. The shallow parsing of isolated text lines allows quick information extraction in any document while rejecting at the same time irrelevant information. Results on a public french incoming mails database show the efficiency of the approach

    Détection de motifs graphiques dans des images de documents anciens

    Get PDF
    International audienceLa détection de motifs graphiques consiste à rechercher dans une collection d'images de documents, les occurences les plus similaires à une requête image. Dans cet article, nous proposons un système non supervisé pour la détection de motifs, sans besoin de segmentation préalable, en nous inspirant de techniques récentes en vision par ordinateur. Notre approche s'appuie sur une décomposition des documents en fenêtres de tailles variées et une description de ces fenêtres par sac de mots visuels, le tout hors-ligne afin de diminuer le temps de calcul. Une technique de compression des données, proposée tout récemment en recherche d'images, permet de maintenir une quantité de mémoire raisonnable, mais nécessite d'approximer le calcul de distance à la requête. De premiers résultats encourageants sont obtenus sur la base de documents DocExplore, une base de documents médiévaux. Abstract-Pattern spotting consists of retrieving the most similar graphical patterns from a collection of document images. Inspired by the recent advances in computer vision and word spotting techniques, we propose in this paper an unsupervised, segmentation-free pattern spotting system. Overall, the system includes a powerful patch-based framework, the bag of visual word model with an offline sliding window mechanism to avoid heavy computational burden during the retrieval process. Our system takes advantage of the most recent powerful compression and distance approximation techniques (product quantization and asymmetric distance computation) to efficiently index the great number of sub-windows produced by sliding windows and allows to retrieve small sized queries in a large indexed corpus

    A syntax directed method for numerical field extraction in incoming mail documents

    Get PDF
    In this article, we propose a generic method for the automatic localisation and recognition of numerical fields (phone number, ZIP code, etc.) in unconstrained handwritten incoming mail documents. The method exploits the syntax of a numerical field as an a priori knowledge to locate it in the document. A syntactical analysis based on Markov models filters the connected component sequences that respect a particular syntax known by the system. Once extracted, the fields are submitted to a numeral recognition process. Hence, we avoids an integral recognition of the document, which is a very tough and time consuming task. We show the efficiency of the method on a real incoming mail document database.Dans cet article, nous présentons une méthode générique d'extraction et de reconnaissance de champs numériques (numéro de téléphone, code postal, etc.) dans des courriers manuscrits non contraints. La méthode d'extraction exploite la syntaxe des champs comme information a priori pour les localiser. Un analyseur syntaxique à base de modèles de Markov filtre les séquences de composantes qui respectent la syntaxe d'un type de champ connu du système. Notre approche permet ainsi d'éviter la reconnaissance totale du document, opération délicate et coûteuse en temps de calcul, puisque seuls les champs localisés sont soumis à un système de reconnaissance. Nous montrons l'efficacité de la méthode sur une base de courriers manuscrits réels de type courrier entrant

    Pattern Spotting and Image Retrieval in Historical Documents using Deep Hashing

    Full text link
    This paper presents a deep learning approach for image retrieval and pattern spotting in digital collections of historical documents. First, a region proposal algorithm detects object candidates in the document page images. Next, deep learning models are used for feature extraction, considering two distinct variants, which provide either real-valued or binary code representations. Finally, candidate images are ranked by computing the feature similarity with a given input query. A robust experimental protocol evaluates the proposed approach considering each representation scheme (real-valued and binary code) on the DocExplore image database. The experimental results show that the proposed deep models compare favorably to the state-of-the-art image retrieval approaches for images of historical documents, outperforming other deep models by 2.56 percentage points using the same techniques for pattern spotting. Besides, the proposed approach also reduces the search time by up to 200x and the storage cost up to 6,000x when compared to related works based on real-valued representations.Comment: 7 page
    • …