
    Texture-based Video Indexing

    We present a method for indexing and searching video sequences based on textural information. Our method proceeds by first defining and computing texture descriptors relevant to the database at hand. Then a similarity distance is computed between a video sample given by the user and small space-time regions in the sequences. The system returns an ordered set of best matches. We show results on real sequences which indicate that this scheme has reasonable complexity and performs well in practice. We believe these good results stem mainly from a) the introduction of a new texture descriptor based on dynamic local Hölder exponents, and b) the automated, adapted choice of the relevant parameters.
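
    The dynamic local Hölder exponent descriptor is only named in the abstract. For intuition, a pointwise Hölder exponent can be estimated by regressing the log of the local signal oscillation against the log of the neighbourhood scale; the sketch below is a minimal illustration of that generic estimator (the function name, scale choices, and oscillation measure are assumptions, not the authors' method).

```python
import numpy as np

def local_holder_exponent(signal, t, scales=(2, 4, 8, 16)):
    """Estimate a pointwise Hoelder exponent at index t: the slope of
    log(oscillation) vs. log(scale), since oscillation ~ scale**h."""
    log_s, log_osc = [], []
    for s in scales:
        lo, hi = max(0, t - s), min(len(signal), t + s + 1)
        window = signal[lo:hi]
        osc = window.max() - window.min()      # local oscillation at scale s
        if osc > 0:
            log_s.append(np.log(s))
            log_osc.append(np.log(osc))
    if len(log_s) < 2:
        return np.nan                          # flat region: exponent undefined
    slope, _ = np.polyfit(log_s, log_osc, 1)
    return slope

# toy usage: a Brownian-like path should yield an exponent near 0.5
rng = np.random.default_rng(0)
sig = np.cumsum(rng.standard_normal(4096))
print(local_holder_exponent(sig, t=2048))
```

    In the paper's setting an estimate of this kind would be computed over small space-time regions of the video rather than a one-dimensional signal.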

    e-DOCPROS: extending TEXPROS into the e-business era

    Document processing is a critical element of office automation. TEXPROS (TEXt PROcessing System) is a knowledge-based system designed to manage personal documents. However, as the Internet and e-Business have changed the way offices operate, there is a need to re-envision document processing, storage, retrieval, and sharing. In the current environment, people must be able to access documents remotely and to share those documents with others. e-DOCPROS (e-DOCument PROcessing System) is a new document processing system that takes advantage of many of TEXPROS's structures but adapts the system to this new environment. The new system is built to serve e-businesses, takes advantage of Internet protocols, and provides remote access and document sharing. e-DOCPROS meets the challenge of providing wider usage and will eventually improve the efficiency and effectiveness of office automation. It allows end users to access their data through any Web browser with Internet access, even over a wireless network, which will change the way we manage information. The application of e-DOCPROS to e-Business is considered. Four types of business models are considered here. The first is the Business-to-Business (B2B) model, which performs business-to-business transactions through an Extranet; the Extranet consists of multiple Intranets connected via the Internet. The second is the Business-to-Consumer (B2C) model, which performs business-to-consumer transactions through the Internet. The third is the Intranet model, which performs transactions within an organization through the organization's network. The fourth is the Consumer-to-Consumer (C2C) model, which performs consumer-to-consumer transactions through the Internet. A triple model is proposed in this dissertation to integrate the organization type hierarchy and the document type hierarchy into the folder organization. e-DOCPROS introduces new features into TEXPROS to support these four business models and to accommodate the system requirements. Extensible Markup Language (XML), an industry-standard protocol for data exchange, is employed to achieve information exchange between e-DOCPROS and other systems, and also among the subsystems within e-DOCPROS. The Document Object Model (DOM) specification is followed throughout the implementation of e-DOCPROS to achieve portability. An agent-based Application Service Provider (ASP) implementation is employed in the e-DOCPROS system to achieve cost-effectiveness and accessibility.
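
    The abstract commits to XML and the DOM as the exchange and portability mechanisms but gives no schema. Below is a minimal sketch of how a frame-instance-like record could be serialized and re-read through Python's standard DOM implementation; the element and attribute names are hypothetical.

```python
from xml.dom.minidom import parseString

# hypothetical frame-instance record exchanged between e-DOCPROS subsystems
XML_DOC = """<frameInstance type="Invoice">
  <attribute name="sender">ACME Corp.</attribute>
  <attribute name="amount">1200.00</attribute>
</frameInstance>"""

dom = parseString(XML_DOC)            # DOM-level access keeps the code portable
root = dom.documentElement
print(root.getAttribute("type"))      # -> Invoice
for node in root.getElementsByTagName("attribute"):
    print(node.getAttribute("name"), node.firstChild.data)
```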

    Knowledge-based document retrieval with application to TEXPROS

    Document retrieval in an information system is most often accomplished through keyword search. The common technique behind keyword search is indexing. The major drawback of such a search technique is its lack of effectiveness and accuracy. It is very common for a typical keyword search over the Internet to identify hundreds or even thousands of records as potentially desired records, yet often few of them are relevant to the user's interests. This dissertation presents a knowledge-based document retrieval architecture with application to TEXPROS. The architecture is based on a dual document model that consists of a document type hierarchy and a folder organization. Using the knowledge collected during document filing, the search space can be narrowed down significantly. Combining classical text-based retrieval methods with knowledge-based retrieval can tremendously improve both search efficiency and effectiveness. With the proposed predicate-based query language, users can more precisely and accurately specify the search criteria and their knowledge about the documents to be retrieved. To assist users in formulating a query, a guided search is presented as part of an intelligent user interface. Supported by an intelligent question generator, an inference engine, a question base, and a predicate-based query composer, the guided search collects the most important information known to the user to retrieve the documents that satisfy the user's particular interests. A knowledge-based query processing and search engine is presented as the core component of this architecture. Algorithms are developed for the search engine to effectively and efficiently retrieve the documents that match the query. A cache is introduced to speed up the process of query refinement. Theoretical proofs and a performance analysis demonstrate the efficiency and effectiveness of this knowledge-based document retrieval approach.
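
    The predicate-based query language itself is not reproduced in the abstract. The sketch below only illustrates the general idea, filtering documents with a conjunction of predicates over type, folder, and attributes; all field names and the example query are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Document:
    doc_type: str
    folder: str
    attributes: dict

def query(docs, *predicates):
    """Return the documents satisfying every predicate (conjunctive query)."""
    return [d for d in docs if all(p(d) for p in predicates)]

docs = [
    Document("Memo", "/faculty/hiring", {"author": "Chen", "year": 1998}),
    Document("Invoice", "/purchasing", {"author": "Lee", "year": 1999}),
]

# "find memos filed under /faculty and written after 1997"
hits = query(
    docs,
    lambda d: d.doc_type == "Memo",
    lambda d: d.folder.startswith("/faculty"),
    lambda d: d.attributes["year"] > 1997,
)
print([d.folder for d in hits])       # -> ['/faculty/hiring']
```

    Narrowing the search space with filing knowledge, as the dissertation describes, would amount to restricting the candidate set to the folders implied by the query before evaluating the remaining predicates.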

    Automatic document classification and extraction system (ADoCES)

    Document processing is a critical element of office automation. Document image processing begins with the Optical Character Recognition (OCR) phase, followed by complex processing for document classification and extraction. Document classification is a process that classifies an incoming document into a particular predefined document type. Document extraction is a process that extracts information pertinent to the users from the content of a document and assigns the information as the values of the “logical structure” of the document type. Therefore, after document classification and extraction, a paper document is represented in a digital form, called a frame instance, instead of its original image file format. A frame instance is an operable and efficient form that can be processed and manipulated during document filing and retrieval. This dissertation describes a system to support a complete procedure, which begins with the scanning of the paper document into the system and ends with the output of an effective digital form of the original document. This is a general-purpose system with “learning” ability and can therefore be adapted easily to many application domains. In this dissertation, the “logical closeness” segmentation method is proposed. A novel representation of document layout structure, the Labeled Directed Weighted Graph (LDWG), and a methodology for transforming document segmentation into the LDWG representation are described. To find a match between two LDWGs, string representation matching is applied first instead of comparing the graphs directly, which reduces the time necessary to make the comparison. Applying artificial intelligence, the system is able to learn from experience and build samples of LDWGs to represent each document type. In addition, the concept of frame templates is used for the representation of the document logical structure. The concept of the Document Type Hierarchy (DTH) is also enhanced to express the hierarchical relation among the logical structures of the documents.
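
    Neither the LDWG encoding nor its string form is specified in the abstract; the sketch below only illustrates the two-stage idea, a cheap string comparison first and full graph matching only for the survivors, using an invented canonical serialization.

```python
from dataclasses import dataclass, field

@dataclass
class LDWG:
    """Labeled directed weighted graph of layout blocks (illustrative)."""
    nodes: dict = field(default_factory=dict)   # node id -> block label
    edges: list = field(default_factory=list)   # (src, dst, label, weight)

    def signature(self) -> str:
        # invented canonical string: sorted node labels, then edge labels
        n = ",".join(sorted(self.nodes.values()))
        e = ",".join(sorted(label for _, _, label, _ in self.edges))
        return n + "|" + e

def candidate_types(doc: LDWG, samples: dict) -> list:
    """Stage 1: compare string signatures; only the matching document types
    would go on to the expensive structural graph comparison."""
    return [t for t, g in samples.items() if g.signature() == doc.signature()]
```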

    The Application of Machine Learning to At-Risk Cultural Heritage Image Data

    This project investigates the application of Convolutional Neural Network (CNN) methods and technologies to problems related to the recognition of at-risk cultural heritage objects. The primary aim of this work is to develop software combining the disciplines of computer vision and artefact studies, with applications in the field of heritage protection, specifically related to the illegal antiquities market. To accomplish this, digital image data provided by the Durham University Oriental Museum were used in conjunction with several different implementations of pre-trained CNN models for the purposes of artefact classification and identification. Testing focused on data capture using a variety of digital recording devices, guided by the developmental needs of a heritage programme seeking to create software solutions to heritage threats in the Middle East and North Africa (MENA) region. Quantitative results using information retrieval metrics are reported for all models and test sets and have been used to evaluate the models' predictive performance.
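
    The specific pre-trained models are not listed in the abstract. A common recipe this kind of study resembles is transfer learning: reuse an ImageNet-trained backbone and retrain only a new classification head on the museum images. A minimal PyTorch/torchvision sketch follows; the backbone choice, class count, and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 12                       # assumed number of artefact categories

# load an ImageNet-pretrained backbone and freeze its feature extractor
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False

# replace the head so only it is trained on the artefact images
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# one illustrative training step on a dummy batch of 224x224 RGB images
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (4,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```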

    Content-based image retrieval: a small sample learning approach

    Tao Dacheng. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 70-75). Abstracts in English and Chinese.
    Contents:
    Chapter 1 --- Introduction --- p.1
        1.1 --- Content-based Image Retrieval --- p.1
        1.2 --- SVM based RF in CBIR --- p.3
        1.3 --- DA based RF in CBIR --- p.4
        1.4 --- Existing CBIR Engines --- p.5
        1.5 --- Practical Applications of CBIR --- p.10
        1.6 --- Organization of this thesis --- p.11
    Chapter 2 --- Statistical Learning Theory and Support Vector Machine --- p.12
        2.1 --- The Recognition Problem --- p.12
        2.2 --- Regularization --- p.14
        2.3 --- The VC Dimension --- p.14
        2.4 --- Structure Risk Minimization --- p.15
        2.5 --- Support Vector Machine --- p.15
        2.6 --- Kernel Space --- p.17
    Chapter 3 --- Discriminant Analysis --- p.18
        3.1 --- PCA --- p.18
        3.2 --- KPCA --- p.18
        3.3 --- LDA --- p.20
        3.4 --- BDA --- p.20
        3.5 --- KBDA --- p.21
    Chapter 4 --- Random Sampling Based SVM --- p.24
        4.1 --- Asymmetric Bagging SVM --- p.25
        4.2 --- Random Subspace Method SVM --- p.26
        4.3 --- Asymmetric Bagging RSM SVM --- p.26
        4.4 --- Aggregation Model --- p.30
        4.5 --- Dissimilarity Measure --- p.31
        4.6 --- Computational Complexity Analysis --- p.31
        4.7 --- QueryGo Image Retrieval System --- p.32
        4.8 --- Toy Experiments --- p.35
        4.9 --- Statistical Experimental Results --- p.36
    Chapter 5 --- SSS Problems in KBDA RF --- p.42
        5.1 --- DKBDA --- p.43
            5.1.1 --- DLDA --- p.43
            5.1.2 --- DKBDA --- p.43
        5.2 --- NKBDA --- p.48
            5.2.1 --- NLDA --- p.48
            5.2.2 --- NKBDA --- p.48
        5.3 --- FKBDA --- p.49
            5.3.1 --- FLDA --- p.49
            5.3.2 --- FKBDA --- p.49
        5.4 --- Experimental Results --- p.50
    Chapter 6 --- NDA based RF for CBIR --- p.52
        6.1 --- NDA --- p.52
        6.2 --- SSS Problem in NDA --- p.53
            6.2.1 --- Regularization method --- p.53
            6.2.2 --- Null-space method --- p.54
            6.2.3 --- Full-space method --- p.54
        6.3 --- Experimental results --- p.55
            6.3.1 --- K nearest neighbor evaluation for NDA --- p.55
            6.3.2 --- SSS problem --- p.56
            6.3.3 --- Evaluation experiments --- p.57
    Chapter 7 --- Medical Image Classification --- p.59
        7.1 --- Introduction --- p.59
        7.2 --- Region-based Co-occurrence Matrix Texture Feature --- p.60
        7.3 --- Multi-level Feature Selection --- p.62
        7.4 --- Experimental Results --- p.63
            7.4.1 --- Data Set --- p.64
            7.4.2 --- Classification Using Traditional Features --- p.65
            7.4.3 --- Classification Using the New Features --- p.66
    Chapter 8 --- Conclusion --- p.68
    Bibliography --- p.7
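
    The random-sampling chapter combines two ideas that are easy to state in code: train many SVMs, each on all of the (few) positive relevance-feedback examples plus an equal-sized random subsample of the negatives (asymmetric bagging) and a random subset of the features (random subspace method), then aggregate their outputs. Below is a rough scikit-learn sketch; the subsample sizes, kernel, and probability-averaging aggregation are illustrative, not the thesis's exact model.

```python
import numpy as np
from sklearn.svm import SVC

def asym_bagging_rsm_svm(X_pos, X_neg, n_models=10, n_feats=8, rng=None):
    """Train each SVM on all positives, an equal-sized negative subsample
    (asymmetric bagging) and a random feature subset (random subspace)."""
    if rng is None:
        rng = np.random.default_rng(0)
    ensemble = []
    for _ in range(n_models):
        neg_idx = rng.choice(len(X_neg), size=len(X_pos), replace=False)
        feats = rng.choice(X_pos.shape[1], size=n_feats, replace=False)
        X = np.vstack([X_pos[:, feats], X_neg[neg_idx][:, feats]])
        y = np.r_[np.ones(len(X_pos)), np.zeros(len(neg_idx))]
        ensemble.append((SVC(probability=True).fit(X, y), feats))
    return ensemble

def relevance_score(ensemble, X):
    """Aggregate the ensemble by averaging positive-class probabilities."""
    return np.mean([m.predict_proba(X[:, f])[:, 1] for m, f in ensemble],
                   axis=0)
```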

    New Techniques for Clustering Complex Objects

    The tremendous amount of data produced nowadays in various application domains, such as molecular biology or geography, can only be fully exploited by efficient and effective data mining tools. One of the primary data mining tasks is clustering: partitioning the points of a data set into distinct groups (clusters) such that two points from one cluster are similar to each other whereas two points from distinct clusters are not. Due to modern database technology, e.g. object-relational databases, huge numbers of complex objects from scientific, engineering or multimedia applications are stored in database systems. Modelling such complex data often results in very high-dimensional vector data ("feature vectors"). In the context of clustering, this causes fundamental problems, commonly subsumed under the term "Curse of Dimensionality". As a result, traditional clustering algorithms often fail to generate meaningful results, because in such high-dimensional feature spaces the data no longer cluster. Usually, however, there are clusters embedded in lower-dimensional subspaces, i.e. meaningful clusters can be found if only a certain subset of the features is considered for clustering, and this subset may differ from cluster to cluster. In this thesis, we present original extensions and enhancements of the density-based clustering notion to cope with high-dimensional data. In particular, we propose an algorithm called SUBCLU (density-connected SUBspace CLUstering) that extends DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to the problem of subspace clustering. SUBCLU efficiently computes all clusters of arbitrary shape and size that would have been found if DBSCAN were applied to all possible subspaces of the feature space. Two subspace selection techniques called RIS (Ranking Interesting Subspaces) and SURFING (SUbspaces Relevant For clusterING) are proposed. They do not compute the subspace clusters directly, but generate a list of subspaces ranked by their clustering characteristics; a hierarchical clustering algorithm can then be applied to these interesting subspaces in order to compute a hierarchical (subspace) clustering. In addition, we propose the algorithm 4C (Computing Correlation Connected Clusters), which extends the concepts of DBSCAN to compute density-based correlation clusters, i.e. groups of objects that exhibit an arbitrary but uniform correlation.
    Often, the traditional approach of modelling data as high-dimensional feature vectors is no longer able to capture the intuitive notion of similarity between complex objects. Thus, objects like chemical compounds, CAD drawings, XML data or color images are often modelled using more complex representations such as graphs or trees. If a metric distance function, like the edit distance for graphs and trees, is used as the similarity measure, traditional clustering approaches such as density-based clustering remain applicable to these data. However, a single distance calculation can be very expensive, and since clustering performs a great many distance calculations, techniques such as filter-and-refinement and metric indexing become important. The second part of this thesis deals with approaches for clustering in application domains with complex similarity models. We show how appropriate filters can be used to enhance the performance of query processing and, thus, the clustering of hierarchical objects. Furthermore, we describe how the two paradigms of filtering and metric indexing can be combined. Finally, as complex objects can often be represented using different similarity models, a new clustering approach is presented that is able to cluster objects given several different complex representations.
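
    SUBCLU's key observation is antimonotonicity: a density-connected cluster in a subspace S must also appear in every subset of S, which licenses a bottom-up, Apriori-style search. The sketch below is a heavily simplified illustration using scikit-learn's DBSCAN, brute-forcing all low-dimensional axis-parallel subspaces rather than generating and pruning candidates as SUBCLU does; parameter values are arbitrary.

```python
import numpy as np
from itertools import combinations
from sklearn.cluster import DBSCAN

def subspace_clusters(X, max_dim=2, eps=0.4, min_pts=10):
    """Run DBSCAN in every axis-parallel subspace up to max_dim dimensions
    and keep the subspaces that contain at least one cluster."""
    results = {}
    for d in range(1, max_dim + 1):
        for dims in combinations(range(X.shape[1]), d):
            labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X[:, dims])
            if labels.max() >= 0:              # label -1 marks noise only
                results[dims] = labels
    return results

# toy data: plant a dense blob that lives in dimensions 0 and 1
rng = np.random.default_rng(1)
X = rng.uniform(-5, 5, size=(200, 4))
X[:60, :2] = rng.normal(0, 0.2, size=(60, 2))
print(sorted(subspace_clusters(X).keys()))
```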