
    Sentence Complexity in French: a Corpus-Based Approach

    Language complexity is a notion widely used in a number of linguistic fields and language applications, and can be described by a number of linguistic features and practical measures. This work proposes a closer, data-oriented look at sentence complexity. Starting from a number of different studies, we selected and implemented 52 linguistic features and measured them on a corpus of varied French texts. Using statistical methods, we identify five underlying dimensions of sentence complexity. In addition to providing a better understanding of the phenomenon, these dimensions have been used in some information retrieval experiments.

    Text Augmentation: Inserting markup into natural language text with PPM Models

    This thesis describes a new optimisation and new heuristics for automatically marking up XML documents. These are implemented in CEM, using PPM models. CEM is significantly more general than previous systems, marking up large numbers of hierarchical tags, using n-gram models for large n and a variety of escape methods. Four corpora are discussed, including the bibliography corpus of 14682 bibliographies laid out in seven standard styles using the BIBTEX system and marked up in XML with every field from the original BIBTEX. Other corpora include the ROCLING Chinese text segmentation corpus, the Computists’ Communique corpus and the Reuters’ corpus. A detailed examination is presented of the methods of evaluating markup algorithms, including computational complexity measures and correctness measures from the fields of information retrieval, string processing, machine learning and information theory. A new taxonomy of markup complexities is established and the properties of each taxon are examined in relation to the complexity of marked-up documents. The performance of the new heuristics and optimisation is examined using the four corpora.

    Musical Deep Learning: Stylistic Melodic Generation with Complexity Based Similarity

    The wide-ranging impact of deep learning models implies significant application in music analysis, retrieval, and generation. Initial findings from the musical application of a conditional restricted Boltzmann machine (CRBM) show promise towards informing creative computation. Taking advantage of the CRBM’s ability to model temporal dependencies, full reconstructions of pieces are achievable given a few starting seed notes. The generation of new material using figuration from the training corpus requires restrictions on the size and memory space of the CRBM, forcing associative rather than perfect recall. Musical analysis and information complexity measures show the musical encoding to be the primary determinant of the nature of the generated results.

    Video matching using DC-image and local features

    This paper presents a suggested framework for video matching based on local features extracted from the DC-image of MPEG compressed videos, without decompression. The relevant arguments and supporting evidence are discussed for developing video similarity techniques that work directly on compressed videos, without decompression, and especially utilising small-size images. Two experiments are carried out to support the above. The first compares the DC-image and the I-frame in terms of matching performance and the corresponding computational complexity. The second compares local features and global features in video matching, especially in the compressed domain and with small-size images. The results confirm that the use of the DC-image, despite its highly reduced size, is promising, as it produces at least similar (if not better) matching precision compared to the full I-frame. Also, SIFT, as a local feature, outperforms most of the standard global features in precision. On the other hand, its computational complexity is relatively higher, but it is still within the real-time margin. There are also various optimisations that can be made to improve this computational complexity.
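A DC-image keeps one value per 8x8 block of a frame, and the DC coefficient of a block's DCT is proportional to that block's mean intensity. The sketch below only illustrates the resulting size reduction. It is not the paper's method: the paper reads the values from the compressed stream without decompression, whereas this example assumes an already decoded grayscale frame and averages its pixels. The function name `dc_image` and the list-of-rows frame layout are assumptions made for illustration.

```python
def dc_image(frame, block=8):
    """Return one mean value per block x block tile of a grayscale frame.

    `frame` is a list of equal-length rows of pixel intensities (an
    assumed layout). Partial tiles at the right and bottom edges are
    ignored.
    """
    rows = len(frame) // block
    cols = len(frame[0]) // block if frame else 0
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            total = 0
            # Sum every pixel inside the current tile.
            for y in range(r * block, (r + 1) * block):
                for x in range(c * block, (c + 1) * block):
                    total += frame[y][x]
            out_row.append(total / (block * block))
        out.append(out_row)
    return out
```

With the default block size, a 16x16 frame reduces to a 2x2 image, which shows why features extracted from a DC-image operate on far fewer pixels than features extracted from the full I-frame.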

    DC-image for real time compressed video matching

    This chapter presents a suggested framework for video matching based on local features extracted from the DC-image of MPEG compressed videos, without full decompression. In addition, the relevant arguments and supporting evidence are discussed. Several local feature detectors are examined to select the best for matching using the DC-image. Two experiments are carried out to support the above. The first compares the DC-image and the I-frame in terms of matching performance and computational complexity. The second compares local features and global features for compressed video matching with respect to the DC-image. The results confirm that the use of the DC-image, despite its highly reduced size, is promising, as it produces higher matching precision compared to the full I-frame. Also, SIFT, as a local feature, outperforms most of the standard global features. On the other hand, its computational complexity is relatively higher, but it is still within the real-time margin, which leaves room for further optimizations to improve this computational complexity.

    Ordinary Search Engine Users Carrying Out Complex Search Tasks

    Web search engines have become the dominant tools for finding information on the Internet. Due to their popularity, users apply them to a wide range of search needs, from simple look-ups to rather complex information tasks. This paper presents the results of a study to investigate the characteristics of these complex information needs in the context of Web search engines. The aim of the study is to find out more about (1) what makes complex search tasks distinct from simple tasks and if it is possible to find simple measures for describing their complexity, (2) if search success for a task can be predicted by means of unique measures, and (3) if successful searchers show a different behavior than unsuccessful ones. The study includes 60 people who carried out a set of 12 search tasks with current commercial search engines. Their behavior was logged with the Search-Logger tool. The results confirm that complex tasks show significantly different characteristics than simple tasks. Yet it seems to be difficult to distinguish successful from unsuccessful search behaviors. Good searchers can be differentiated from bad searchers by means of measurable parameters. The implications of these findings for search engine vendors are discussed. Comment: 60 page

    Stylistic Variation in an Information Retrieval Experiment

    Texts exhibit considerable stylistic variation. This paper reports an experiment where a corpus of documents (N = 75 000) is analyzed using various simple stylistic metrics. A subset (n = 1000) of the corpus has been previously assessed to be relevant for answering given information retrieval queries. The experiment shows that this subset differs significantly from the rest of the corpus in terms of the stylistic metrics studied. Comment: Proceedings of NEMLAP-
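The abstract does not say which stylistic metrics were used. As a rough illustration of what "simple stylistic metrics" can look like, the sketch below computes three common ones per document: mean word length, mean sentence length in words, and type-token ratio. The choice of metrics, the function name `stylistic_metrics`, and the regex-based tokenisation are all assumptions, not the paper's setup.

```python
import re

def stylistic_metrics(text):
    """Compute three illustrative per-document style metrics."""
    # Naive sentence split on runs of terminal punctuation (assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokenisation: ASCII letters and apostrophes only.
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words or not sentences:
        return {"mean_word_length": 0.0,
                "words_per_sentence": 0.0,
                "type_token_ratio": 0.0}
    return {
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "words_per_sentence": len(words) / len(sentences),
        "type_token_ratio": len(set(words)) / len(words),
    }
```

Computing such a dictionary for every document would allow the relevant subset and the rest of the corpus to be compared metric by metric with a standard significance test.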

    Sequential Complexity as a Descriptor for Musical Similarity

    We propose string compressibility as a descriptor of temporal structure in audio, for the purpose of determining musical similarity. Our descriptors are based on computing track-wise compression rates of quantised audio features, using multiple temporal resolutions and quantisation granularities. To verify that our descriptors capture musically relevant information, we incorporate our descriptors into similarity rating prediction and song year prediction tasks. We base our evaluation on a dataset of 15500 track excerpts of Western popular music, for which we obtain 7800 web-sourced pairwise similarity ratings. To assess the agreement among similarity ratings, we perform an evaluation under controlled conditions, obtaining a rank correlation of 0.33 between intersected sets of ratings. Combined with bag-of-features descriptors, we obtain performance gains of 31.1% and 10.9% for similarity rating prediction and song year prediction. For both tasks, analysis of selected descriptors reveals that representing features at multiple time scales benefits prediction accuracy. Comment: 13 pages, 9 figures, 8 tables. Accepted versio
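The idea of a compression rate over quantised features can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses zlib (DEFLATE) as the compressor, a uniform quantiser, and simple averaging to change temporal resolution, none of which the abstract specifies. The names `quantise`, `downsample`, and `compression_rate` are hypothetical.

```python
import zlib

def quantise(values, levels, lo, hi):
    """Map each value in [lo, hi] to an integer symbol in 0..levels-1."""
    span = hi - lo
    symbols = []
    for v in values:
        idx = int((v - lo) / span * levels)
        symbols.append(min(max(idx, 0), levels - 1))  # clip to valid range
    return symbols

def downsample(values, factor):
    """Average consecutive groups of `factor` values (coarser time scale)."""
    out = []
    for i in range(0, len(values), factor):
        group = values[i:i + factor]
        out.append(sum(group) / len(group))
    return out

def compression_rate(symbols):
    """Compressed size over raw size, one byte per symbol (levels <= 256)."""
    raw = bytes(symbols)
    return len(zlib.compress(raw, 9)) / len(raw)
```

A descriptor vector for one track could then be assembled by calling `compression_rate(quantise(downsample(feature, f), q, lo, hi))` for several values of `f` and `q`. Repetitive sequences yield lower rates than irregular ones, which is the temporal-structure signal the abstract refers to.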

    Affective feedback: an investigation into the role of emotions in the information seeking process

    User feedback is considered to be a critical element in the information seeking process, especially in relation to relevance assessment. Current feedback techniques determine content relevance with respect to the cognitive and situational levels of interaction that occur between the user and the retrieval system. However, apart from real-life problems and information objects, users interact with intentions, motivations and feelings, which can be seen as critical aspects of cognition and decision-making. The study presented in this paper serves as a starting point for exploring the role of emotions in the information seeking process. Results show that emotions not only interweave with different physiological, psychological and cognitive processes, but also form distinctive patterns that depend on the specific task and the specific user.

    A Semantic Similarity Measure for Expressive Description Logics

    A fully semantic measure is presented which is able to calculate a similarity value between concept descriptions, between a concept description and an individual, or between individuals expressed in an expressive description logic. It is applicable to symbolic descriptions although it uses a numeric approach for the calculation. Considering that Description Logics stand as the theoretical framework for ontological knowledge representation and reasoning, the proposed measure can be effectively used for agglomerative and divisive clustering tasks applied to the semantic web domain. Comment: 13 pages, Appeared at CILC 2005, Convegno Italiano di Logica Computazionale also available at http://www.disp.uniroma2.it/CILC2005/downloads/papers/15.dAmato_CILC05.pd