157,956 research outputs found
Sentence Complexity in French: a Corpus-Based Approach
International audienceLanguage complexity is a notion widely used in a number of linguistic fields and language applications, and can be described by a number of linguistic features and practical measures. This work proposes a closer, data-oriented look at sentence complexity. Starting from a number of different studies, we selected and implemented 52 linguistic features and measured them on a corpus of varied French texts. Using statistical methods, we identify five underlying dimensions of sentence complexity. In addition to providing a better understanding of the phenomenon, these dimensions have been used in some information retrieval experiments
Text Augmentation: Inserting markup into natural language text with PPM Models
This thesis describes a new optimisation and new heuristics for automatically marking up XML documents. These are implemented in CEM, using PPMmodels. CEM is significantly more general than previous systems, marking up large numbers of hierarchical tags, using n-gram models for large n and a variety of escape methods.
Four corpora are discussed, including the bibliography corpus of 14682 bibliographies laid out in seven standard styles using the BIBTEX system and markedup in XML with every field from the original BIBTEX. Other corpora include the ROCLING Chinese text segmentation corpus, the Computists’ Communique corpus and the Reuters’ corpus. A detailed examination is presented of the methods of evaluating mark up algorithms, including computation complexity measures and correctness measures from the fields of information retrieval, string processing, machine learning and information theory.
A new taxonomy of markup complexities is established and the properties of each taxon are examined in relation to the complexity of marked-up documents. The performance of the new heuristics and optimisation is examined using the four corpora
Musical Deep Learning: Stylistic Melodic Generation with Complexity Based Similarity
The wide-ranging impact of deep learning models implies significant application in music analysis, retrieval, and generation. Initial findings from musical application of a conditional restricted Boltzmann machine (CRBM) show promise towards informing creative computation. Taking advantage of the CRBM’s ability to model temporal dependencies full reconstructions of pieces are achievable given a few starting seed notes. The generation of new material using figuration from the training corpus requires restrictions on the size
and memory space of the CRBM, forcing associative rather than perfect recall. Musical analysis and information complexity measures show the musical encoding to be the primary determinant of the nature of the generated results
Video matching using DC-image and local features
This paper presents a suggested framework for video matching based on local features extracted from the DCimage of MPEG compressed videos, without decompression. The relevant arguments and supporting evidences are discussed for developing video similarity techniques that works directly on compressed videos, without decompression, and especially utilising small size images. Two experiments are carried to support the above. The first is comparing between the DC-image and I-frame, in terms of matching performance and the corresponding computation complexity. The second experiment compares between using local features and global features in video matching, especially in the compressed domain and with the small size images. The results confirmed that the use of DC-image, despite its highly reduced size, is promising as it produces at least similar (if not better) matching precision, compared to the full I-frame. Also, using SIFT, as a local feature, outperforms precision of most of the standard global features. On the other hand, its computation complexity is relatively higher, but it is still within the realtime margin. There are also various optimisations that can be done to improve this computation complexity
DC-image for real time compressed video matching
This chapter presents a suggested framework for video matching based on local features extracted from the DC-image of MPEG compressed videos, without full decompression. In addition, the relevant arguments and supporting evidences are discussed. Several local feature detectors will be examined to select the best for matching using the DC-image. Two experiments are carried to support the above. The first is comparing between the DC-image and I-frame, in terms of matching performance and computation complexity. The second experiment compares between using local features and global features regarding compressed video matching with respect to the DC-image. The results confirmed that the use of DC-image, despite its highly reduced size, it is promising as it produces higher matching precision, compared to the full I-frame. Also, SIFT, as a local feature, outperforms most of the standard global features. On the other hand, its computation complexity is relatively higher, but it is still within the real-time margin which leaves a space for further optimizations that can be done to improve this computation complexity
Ordinary Search Engine Users Carrying Out Complex Search Tasks
Web search engines have become the dominant tools for finding information on
the Internet. Due to their popularity, users apply them to a wide range of
search needs, from simple look-ups to rather complex information tasks. This
paper presents the results of a study to investigate the characteristics of
these complex information needs in the context of Web search engines. The aim
of the study is to find out more about (1) what makes complex search tasks
distinct from simple tasks and if it is possible to find simple measures for
describing their complexity, (2) if search success for a task can be predicted
by means of unique measures, and (3) if successful searchers show a different
behavior than unsuccessful ones. The study includes 60 people who carried out a
set of 12 search tasks with current commercial search engines. Their behavior
was logged with the Search-Logger tool. The results confirm that complex tasks
show significantly different characteristics than simple tasks. Yet it seems to
be difficult to distinguish successful from unsuccessful search behaviors. Good
searchers can be differentiated from bad searchers by means of measurable
parameters. The implications of these findings for search engine vendors are
discussed.Comment: 60 page
Stylistic Variation in an Information Retrieval Experiment
Texts exhibit considerable stylistic variation. This paper reports an
experiment where a corpus of documents (N= 75 000) is analyzed using various
simple stylistic metrics. A subset (n = 1000) of the corpus has been previously
assessed to be relevant for answering given information retrieval queries. The
experiment shows that this subset differs significantly from the rest of the
corpus in terms of the stylistic metrics studied.Comment: Proceedings of NEMLAP-
Sequential Complexity as a Descriptor for Musical Similarity
We propose string compressibility as a descriptor of temporal structure in
audio, for the purpose of determining musical similarity. Our descriptors are
based on computing track-wise compression rates of quantised audio features,
using multiple temporal resolutions and quantisation granularities. To verify
that our descriptors capture musically relevant information, we incorporate our
descriptors into similarity rating prediction and song year prediction tasks.
We base our evaluation on a dataset of 15500 track excerpts of Western popular
music, for which we obtain 7800 web-sourced pairwise similarity ratings. To
assess the agreement among similarity ratings, we perform an evaluation under
controlled conditions, obtaining a rank correlation of 0.33 between intersected
sets of ratings. Combined with bag-of-features descriptors, we obtain
performance gains of 31.1% and 10.9% for similarity rating prediction and song
year prediction. For both tasks, analysis of selected descriptors reveals that
representing features at multiple time scales benefits prediction accuracy.Comment: 13 pages, 9 figures, 8 tables. Accepted versio
Affective feedback: an investigation into the role of emotions in the information seeking process
User feedback is considered to be a critical element in the information seeking process, especially in relation to relevance assessment. Current feedback techniques determine content relevance with respect to the cognitive and situational levels of interaction that occurs between the user and the retrieval system. However, apart from real-life problems and information objects, users interact with intentions, motivations and feelings, which can be seen as critical aspects of cognition and decision-making. The study presented in this paper serves as a starting point to the exploration of the role of emotions in the information seeking process. Results show that the latter not only interweave with different physiological, psychological and cognitive processes, but also form distinctive patterns, according to specific task, and according to specific user
A Semantic Similarity Measure for Expressive Description Logics
A totally semantic measure is presented which is able to calculate a
similarity value between concept descriptions and also between concept
description and individual or between individuals expressed in an expressive
description logic. It is applicable on symbolic descriptions although it uses a
numeric approach for the calculus. Considering that Description Logics stand as
the theoretic framework for the ontological knowledge representation and
reasoning, the proposed measure can be effectively used for agglomerative and
divisional clustering task applied to the semantic web domain.Comment: 13 pages, Appeared at CILC 2005, Convegno Italiano di Logica
Computazionale also available at
http://www.disp.uniroma2.it/CILC2005/downloads/papers/15.dAmato_CILC05.pd
- …