65,597 research outputs found

    Towards the Automatic Classification of Documents in User-generated Classifications

    Get PDF
    There is a huge amount of information scattered on the World Wide Web. As the information flow occurs at a high speed in the WWW, there is a need to organize it in the right manner so that a user can access it very easily. Previously the organization of information was generally done manually, by matching the document contents to some pre-defined categories. There are two approaches for this text-based categorization: manual and automatic. In the manual approach, a human expert performs the classification task, and in the second case supervised classifiers are used to automatically classify resources. In a supervised classification, manual interaction is required to create some training data before the automatic classification task takes place. In our new approach, we intend to propose automatic classification of documents through semantic keywords and building the formulas generation by these keywords. Thus we can reduce this human participation by combining the knowledge of a given classification and the knowledge extracted from the data. The main focus of this PhD thesis, supervised by Prof. Fausto Giunchiglia, is the automatic classification of documents into user-generated classifications. The key benefits foreseen from this automatic document classification is not only related to search engines, but also to many other fields like, document organization, text filtering, semantic index managing

    Automatic multi-label subject indexing in a multilingual environment

    Get PDF
    This paper presents an approach to automatically subject index fulltext documents with multiple labels based on binary support vector machines(SVM). The aim was to test the applicability of SVMs with a real world dataset. We have also explored the feasibility of incorporating multilingual background knowledge, as represented in thesauri or ontologies, into our text document representation for indexing purposes. The test set for our evaluations has been compiled from an extensive document base maintained by the Food and Agriculture Organization (FAO) of the United Nations (UN). Empirical results show that SVMs are a good method for automatic multi- label classification of documents in multiple languages

    An Experimental Digital Library Platform - A Demonstrator Prototype for the DigLib Project at SICS

    Get PDF
    Within the framework of the Digital Library project at SICS, this thesis describes the implementation of a demonstrator prototype of a digital library (DigLib); an experimental platform integrating several functions in one common interface. It includes descriptions of the structure and formats of the digital library collection, the tailoring of the search engine Dienst, the construction of a keyword extraction tool, and the design and development of the interface. The platform was realised through sicsDAIS, an agent interaction and presentation system, and is to be used for testing and evaluating various tools for information seeking. The platform supports various user interaction strategies by providing: search in bibliographic records (Dienst); an index of keywords (the Keyword Extraction Function (KEF)); and browsing through the hierarchical structure of the collection. KEF was developed for this thesis work, and extracts and presents keywords from Swedish documents. Although based on a comparatively simple algorithm, KEF contributes by supplying a long-felt want in the area of Information Retrieval. Evaluations of the tasks and the interface still remain to be done, but the digital library is very much up and running. By implementing the platform through sicsDAIS, DigLib can deploy additional tools and search engines without interfering with already running modules. If wanted, agents providing other services than SICS can supply, can be plugged in

    Context and Keyword Extraction in Plain Text Using a Graph Representation

    Full text link
    Document indexation is an essential task achieved by archivists or automatic indexing tools. To retrieve relevant documents to a query, keywords describing this document have to be carefully chosen. Archivists have to find out the right topic of a document before starting to extract the keywords. For an archivist indexing specialized documents, experience plays an important role. But indexing documents on different topics is much harder. This article proposes an innovative method for an indexing support system. This system takes as input an ontology and a plain text document and provides as output contextualized keywords of the document. The method has been evaluated by exploiting Wikipedia's category links as a termino-ontological resources

    Optical tomography: Image improvement using mixed projection of parallel and fan beam modes

    Get PDF
    Mixed parallel and fan beam projection is a technique used to increase the quality images. This research focuses on enhancing the image quality in optical tomography. Image quality can be defined by measuring the Peak Signal to Noise Ratio (PSNR) and Normalized Mean Square Error (NMSE) parameters. The findings of this research prove that by combining parallel and fan beam projection, the image quality can be increased by more than 10%in terms of its PSNR value and more than 100% in terms of its NMSE value compared to a single parallel beam

    Semantic Retrieval and Automatic Annotation: Linear Transformations, Correlation and Semantic Spaces

    No full text
    This paper proposes a new technique for auto-annotation and semantic retrieval based upon the idea of linearly mapping an image feature space to a keyword space. The new technique is compared to several related techniques, and a number of salient points about each of the techniques are discussed and contrasted. The paper also discusses how these techniques might actually scale to a real-world retrieval problem, and demonstrates this though a case study of a semantic retrieval technique being used on a real-world data-set (with a mix of annotated and unannotated images) from a picture library
    corecore