Stylistic Experiments for Information Retrieval
Information retrieval systems are built to handle texts as topical items:
texts are tabulated by occurrence frequencies of content words in them,
under the assumption that text topic is reasonably well modeled by content
word occurrence. But texts have several interesting characteristics beyond
topic. The experiments described in this text investigate stylistic
variation. Roughly put, style is the difference between two ways of saying
the same thing, and systematic stylistic variation can be used to
characterize the genre of documents. These experiments investigate whether
stylistic information is distinguishable using simple language engineering
methods and, if so, whether this type of information can be used to
improve information retrieval systems.
A first set of experiments shows that simple measures of stylistic
variation can be used to distinguish genres from each other quite
adequately; how well depends on what the genres in question are.
A second set of experiments evaluates the utility of stylistic measures for
the purposes of information retrieval, to identify common characteristics
of relevant and non-relevant documents. The conclusion is that the requests
for information as typically expressed to retrieval systems are too terse
and unspecific for non-topical information to improve retrieval results.
Systems for information access need to be designed from the beginning to
handle richer information about the texts and documents at hand:
information about stylistic variation cannot easily be added to an existing
system.
A third set of experiments explores how an interactive system can be
designed to incorporate stylistic information in the interface between user
and system. These experiments resulted in the design of an interface for
categorizing retrieval results by genre, and displaying the retrieval
results using this categorization. This interface is integrated into a
prototype for retrieving information from the World Wide Web.
Stylistic Variation in an Information Retrieval Experiment
Texts exhibit considerable stylistic variation. This paper reports an
experiment where a corpus of documents (N = 75 000) is analyzed using various
simple stylistic metrics. A subset (n = 1000) of the corpus has been previously
assessed to be relevant for answering given information retrieval queries. The
experiment shows that this subset differs significantly from the rest of the
corpus in terms of the stylistic metrics studied.
Comment: Proceedings of NEMLAP-
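As a rough illustration of the kind of comparison described in this abstract, the sketch below computes a few surface-level style metrics per document and averages them over a group of documents. The abstract does not name the metrics used, so the three chosen here (average word length, average sentence length, type-token ratio), the tokenization rules, and the function names stylistic_metrics and mean_metrics are all assumptions made for illustration, not the authors' method.

```python
import re
from statistics import mean


def stylistic_metrics(text: str) -> dict[str, float]:
    """Compute a few illustrative style metrics for one document.

    The metric choice is an assumption; the abstract only says
    "various simple stylistic metrics".
    """
    # Naive sentence split on terminal punctuation (assumed rule).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokenization: runs of ASCII letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words or not sentences:
        return {
            "avg_word_length": 0.0,
            "avg_sentence_length": 0.0,
            "type_token_ratio": 0.0,
        }
    return {
        "avg_word_length": mean(len(w) for w in words),
        "avg_sentence_length": len(words) / len(sentences),
        "type_token_ratio": len(set(words)) / len(words),
    }


def mean_metrics(docs: list[str]) -> dict[str, float]:
    """Average each metric over a non-empty list of documents."""
    if not docs:
        raise ValueError("docs must not be empty")
    rows = [stylistic_metrics(d) for d in docs]
    return {key: mean(row[key] for row in rows) for key in rows[0]}
```

Calling mean_metrics once on the relevant subset and once on the rest of the corpus gives two metric profiles to compare. Deciding whether they differ significantly, as the abstract reports, would additionally require a statistical test, which this sketch leaves out.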
Feature Type Analysis in Automated Genre Classification
In this paper, we compare classifiers based on language model, image, and stylistic features for automated genre classification. The majority of previous studies in genre classification have created models based on an amalgamated representation of a document using a multitude of features. In these models, the inseparable roles of different features make it difficult to determine a means of improving the classifier when it exhibits poor performance in detecting selected genres. By independently modeling and comparing classifiers based on features belonging to three types, describing visual, stylistic, and topical properties, we demonstrate that different genres have distinctive feature strengths.
Towards a style-specific basis for computational beat tracking
Outlined in this paper are a number of sources of evidence, from psychological, ethnomusicological and engineering grounds, to suggest that current approaches to computational beat tracking are incomplete. It is contended that the degree to which cultural knowledge, that is, the specifics of style and associated learnt representational schema, underlies the human faculty of beat tracking has been severely underestimated. Difficulties in building general beat tracking solutions, which can provide both period and phase locking across a large corpus of styles, are highlighted. It is probable that no universal beat tracking model exists which does not utilise a switching model to recognise style and context prior to application.
Timbre-invariant Audio Features for Style Analysis of Classical Music
Copyright: (c) 2014 Christof Weiß et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
People on Drugs: Credibility of User Statements in Health Communities
Online health communities are a valuable source of information for patients
and physicians. However, such user-generated resources are often plagued by
inaccuracies and misinformation. In this work we propose a method for
automatically establishing the credibility of user-generated medical statements
and the trustworthiness of their authors by exploiting linguistic cues and
distant supervision from expert sources. To this end we introduce a
probabilistic graphical model that jointly learns user trustworthiness,
statement credibility, and language objectivity. We apply this methodology to
the task of extracting rare or unknown side-effects of medical drugs, this
being one of the problems where large scale non-expert data has the potential
to complement expert medical knowledge. We show that our method can reliably
extract side-effects and filter out false statements, while identifying
trustworthy users that are likely to contribute valuable medical information
- …