    An Online Content Based Email Attachments Retrieval System

    E-mail is one of the most popular applications, used by most people today. As a result of continuous daily use, thousands of messages accumulate in most users' mailboxes, which makes it difficult to retrieve the attachments of these messages after some time. Most e-mail providers have steadily improved their search technology, but one thing still cannot be done well: searching inside attachments. Some e-mail providers, such as Gmail, have added searching for words inside attachments for some file types (.pdf files, .doc documents, .ppt presentations), but this feature is not yet supported for image files. Neither e-mail providers nor recent research have focused on retrieving image attachments from the mailbox. This paper introduces the novel idea of using Content-Based Image Retrieval (CBIR) in an e-mail application to retrieve images from e-mail attachments based on their entire contents. The work has two main phases: the first covers feature extraction based on color features and connecting to the e-mail server to read e-mails; the second retrieves similar image attachments. Tests carried out on an e-mail inbox containing 100 messages with 500 image attachments gave good precision and recall rates when the threshold value is less than or equal to 0.4.
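The retrieval phase described in this abstract can be illustrated with a minimal sketch. It assumes a normalized joint RGB histogram as the color feature and a halved L1 distance in the range 0 to 1; the abstract does not state which color feature or distance the authors used, so `color_histogram`, `histogram_distance`, `retrieve_similar` and the bin count are hypothetical names and choices. Only the 0.4 threshold comes from the abstract, and it may not transfer to this particular distance.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Normalized joint RGB histogram (assumed feature, not the paper's)."""
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in pixels:
        # Map each channel value 0..255 to a bin index 0..n-1.
        idx = ((r * n // 256) * n + (g * n // 256)) * n + (b * n // 256)
        hist[idx] += 1.0
    total = float(len(pixels)) or 1.0  # avoid division by zero for empty input
    return [h / total for h in hist]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Halved L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def retrieve_similar(
    query_pixels: Sequence[Pixel],
    attachments: Dict[str, Sequence[Pixel]],
    threshold: float = 0.4,
) -> List[Tuple[str, float]]:
    """Return (name, distance) for attachments within the threshold, nearest first."""
    query_hist = color_histogram(query_pixels)
    hits = []
    for name, pixels in attachments.items():
        distance = histogram_distance(query_hist, color_histogram(pixels))
        if distance <= threshold:
            hits.append((name, distance))
    return sorted(hits, key=lambda item: item[1])
```

In a real system the pixel lists would come from decoded image attachments fetched from the mail server; that step is left out here because the abstract gives no detail about it.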

    Content-based indexing of low resolution documents

    In multimedia presentations, it is increasingly common for attendees to photograph slides that interest them using capture devices. To make these images more useful, the captured images can be linked to an image or video database. The database can be used for file archiving, teaching and learning, research and knowledge management, all of which involve image search. However, the devices mentioned above, including cameras and mobile phones, produce low-resolution images affected by poor lighting and noise. Content-Based Image Retrieval (CBIR) is considered among the most interesting and promising fields as far as image search is concerned. Image search is concerned with finding images in a given image database that are similar to a known query image. This thesis is concerned with methods for identifying documents captured using image capture devices, and with a technique for retrieving images from an indexed image database. Both apply digital image processing techniques. To build an index structure for fast, high-quality content-based image retrieval, some existing representative signatures and the key indexes used have been revised. Retrieval performance relies heavily on how the indexing is done. Existing retrieval approaches make use of shape, colour and texture features. Considering these features relative to individual databases, the majority of retrieval approaches give poor results on low-resolution documents, consume a lot of time and, in some cases, return irrelevant images for a given query image. The identification and indexing method proposed in the thesis uses a Visual Signature (VS). The VS consists of the graphical information of the captured slide's textual layout, shape moments and the spatial distribution of colour. This signature-based approach is designed for fast and efficient matching to meet the needs of real-time applications. The approach can also overcome problems of low-resolution documents such as image noise, varying lighting conditions in the environment and complex backgrounds. We present hierarchical indexing techniques founded on trees and clustering. K-means clustering is used for visual features such as colour, since their spatial distribution gives good global information about an image. Tree indexing of the extracted layout and shape features is structured hierarchically, and Euclidean distance is used to measure image similarity for CBIR. The proposed indexing scheme is assessed using recall and precision, the standard measures of CBIR retrieval performance. We develop a CBIR system and conduct various retrieval experiments with the fundamental aim of comparing image retrieval accuracy. A new algorithm that can be used with integrated visual signatures, especially in late-fusion queries, is introduced. The algorithm can reduce shortcomings associated with normalisation in the initial fusion technique. Slides from conference, lecture and meeting presentations are used as real data to compare the performance of the proposed technique with that of existing approaches. The findings of the thesis present exciting possibilities, as the CBIR system is able to produce high-quality results even for a query that uses low-resolution documents. For future work, the use of multimodal signatures, relevance feedback and artificial intelligence techniques is recommended to further enhance the performance of CBIR systems.
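The Euclidean-distance matching and the precision and recall evaluation mentioned in this abstract can be sketched as follows. This is a generic illustration, not the thesis's indexing scheme: the feature vectors stand in for any visual signature, and `euclidean`, `rank_by_distance` and `precision_recall` are hypothetical helper names. The tree and k-means index is omitted, so the ranking here is a plain linear scan.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_distance(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    top_k: int,
) -> List[str]:
    """Names of the top_k database images closest to the query (linear scan)."""
    ranked = sorted(database, key=lambda name: euclidean(query, database[name]))
    return ranked[:top_k]


def precision_recall(retrieved: List[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = sum(1 for name in retrieved if name in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Precision drops when irrelevant images are returned, and recall drops when relevant images are missed, which is why the two are usually reported together.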

    Development of an Integrated Environment for the Analysis and Classification of Mammographic Images

    Mammography is an effective and safe method for diagnosing breast cancer. However, the interpretation of mammograms poses difficulties for radiologists. Hence, Computer-Aided Diagnosis (CAD) systems have been developed to provide radiologists with a second opinion for their final diagnosis. Microcalcifications (MCs) are findings related to breast cancer that can be benign or malignant. In this master's thesis we present the CAD system Hippocrates-mst, which aims at the analysis and evaluation of single microcalcifications and clusters. The implementation includes: a) a patient archive, b) mammographic image analysis, c) MC detection and analysis, and d) a final diagnosis based on the SVM algorithm. During this project, a combined classification scheme was developed to classify mammographic images. This scheme consists of an SVM classifier and a new classifier that we created. The SVM is trained with a small set of microcalcification features that were selected after calculations. The other classifier categorizes new mammograms based on their content and relies on the calculation of distances between the feature vector of the unknown image and those of the known images. The decision is based on majority voting among the nearest known images. The final prediction for the unknown image arises from combining the predictions of the two classifiers by applying a simple rule. In addition, we updated the online mammographic image database MIRACLE.
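The distance-based voting classifier and the combination step described in this abstract can be sketched roughly as below. The abstract does not specify the distance, the number of neighbours or the "simple rule", so everything here is an assumption: `knn_vote` uses Euclidean distance with k = 3, and `combine_either_malignant` is a hypothetical rule that reports malignant when either classifier does. The SVM itself is not implemented; its prediction is passed in as a label.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

LabeledImage = Tuple[Sequence[float], str]  # (feature vector, "benign" or "malignant")


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_vote(unknown: Sequence[float], known: List[LabeledImage], k: int = 3) -> str:
    """Label the unknown vector by majority vote of its k nearest known images."""
    nearest = sorted(known, key=lambda item: euclidean(unknown, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def combine_either_malignant(svm_label: str, vote_label: str) -> str:
    """Hypothetical combination rule (an assumption, not the thesis's rule)."""
    return "malignant" if "malignant" in (svm_label, vote_label) else "benign"
```

A rule of this kind favours sensitivity over specificity; other simple rules, such as requiring agreement between the two classifiers, would trade the two differently.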

    Indexing and Visual Documentation of Watermarks in Online Databases

    The study of watermarks is a standard method in many source-oriented disciplines such as musicology and medieval studies. The first online watermark databases emerged in the 1990s. They gave scholars access to extensive comparative material for dating, attribution and authentication. Indexing and visually documenting watermarks is nevertheless a challenge, because watermarks are complex non-textual objects. This thesis analyzes and evaluates current watermark databases and discusses concepts for optimizing indexing and information retrieval. It first considers the specific subject domain of watermarks. Building on this, it formulates content-related and information-science requirements for indexing languages in the field of watermark indexing. At the centre of the thesis is the analysis and evaluation of the database "Wasserzeicheninformationssystem Deutschland (WZIS)". The use of faceted indexing languages is discussed as an optimization strategy.

    Framework for Automatic Identification of Paper Watermarks with Chain Codes

    Title from PDF of title page viewed May 21, 2018. Dissertation advisor: Reza Derakhshani. Vita. Includes bibliographical references (pages 220-235). Thesis (Ph.D.)--School of Computing and Engineering. University of Missouri--Kansas City, 2017.

    In this dissertation, I present a new framework for automated description, archiving, and identification of paper watermarks found in historical documents and manuscripts. Early paper manufacturers introduced the embedding of identifying marks and patterns as a sign of a distinct origin and perhaps as a signature of quality. Thousands of watermarks have been studied, classified, and archived. Most of the classification categories are based on image similarity and are searchable based on a set of defined contextual descriptors. The novel method presented here is for automatic classification, identification (matching) and retrieval of watermark images based on chain code (CC) descriptors. The approach for generating unique CCs includes a novel image preprocessing method that provides a rotation- and scale-invariant representation of watermarks. The unique codes are truly reversible, providing high-ratio lossless compression, fast searching, and image matching. The development of a novel distance measure for CC comparison is also presented. Examples of the complete process are given using recently acquired watermarks, digitized with hyper-spectral imaging, from the Summa Theologica, the work of Antonino Pierozzi (1389-1459). The performance of the algorithm on large datasets is demonstrated using watermark datasets from well-known library catalogue collections.

    Introduction -- Paper and paper watermarks -- Automatic identification of paper watermarks -- Rotation, Scale and translation invariant chain code -- Comparison of RST_Invariant chain code -- Automatic identification of watermarks with chain codes -- Watermark composite feature vector -- Summary -- Appendix A. Watermarks from the Bernstein Collection used in this study -- Appendix B. The original and transformed images of watermarks -- Appendix C. The transformed and scaled images of watermarks -- Appendix D. Example of chain cod
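For readers unfamiliar with chain codes, the idea behind the descriptors named in this abstract can be illustrated with the standard Freeman 8-direction chain code. This is a textbook sketch, not the dissertation's RST-invariant code or its distance measure: `encode`, `decode` and `first_difference` are hypothetical helper names, the contour is assumed to be an ordered list of 8-connected integer points, and the first difference gives invariance only to rotations by multiples of 45 degrees.

```python
from typing import List, Tuple

Point = Tuple[int, int]

# Freeman codes: 0 = +x, then counter-clockwise in 45-degree steps (y axis up).
DIRECTIONS: List[Point] = [
    (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1),
]


def encode(points: List[Point]) -> List[int]:
    """Chain code of an ordered path; raises ValueError if a step is not 8-connected."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        codes.append(DIRECTIONS.index((x1 - x0, y1 - y0)))
    return codes


def decode(start: Point, codes: List[int]) -> List[Point]:
    """Rebuild the path from its start point and chain code (lossless)."""
    points = [start]
    x, y = start
    for code in codes:
        dx, dy = DIRECTIONS[code]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def first_difference(codes: List[int]) -> List[int]:
    """Direction changes modulo 8; unchanged when the path is rotated by k * 45 degrees."""
    return [(b - a) % 8 for a, b in zip(codes, codes[1:])]
```

Because the start point plus the code sequence reproduces the path exactly, the representation is reversible, which is the property the abstract relies on for lossless compression.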