16 research outputs found
The impact on retrieval effectiveness of skewed frequency distributions
We present an analysis of word senses that provides a fresh insight
into the impact of word ambiguity on retrieval effectiveness with
potential broader implications for other processes of information
retrieval. Using a methodology of forming artificially ambiguous
words, known as pseudo-words, and through reference to other
researchers’ work, the analysis illustrates that the distribution of
the frequency of occurrence of the senses of a word plays a strong
role in ambiguity’s impact on effectiveness. Further investigation
shows that this analysis may also be applicable to other processes
of retrieval, such as Cross Language Information Retrieval, query
expansion, retrieval of OCR’ed texts, and stemming. The analysis
appears to provide a means of explaining, at least in part, reasons
for the processes’ impact (or lack of it) on effectiveness.
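The pseudo-word idea in the abstract above can be illustrated with a minimal sketch. It assumes a tokenized corpus held as a plain Python list; the function names `make_pseudo_word` and `sense_distribution` and the toy tokens are hypothetical and do not come from the paper. The sketch conflates several unrelated words into one artificially ambiguous token and reports how skewed the resulting "sense" frequencies are.

```python
from collections import Counter

def make_pseudo_word(tokens, components):
    """Replace every occurrence of a component word with one joined pseudo-word."""
    pseudo = "/".join(components)  # e.g. "banana/kalashnikov/anecdote"
    members = set(components)
    return [pseudo if t in members else t for t in tokens]

def sense_distribution(tokens, components):
    """Relative frequency of each component word, i.e. each pseudo-sense."""
    members = set(components)
    counts = Counter(t for t in tokens if t in members)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {w: counts[w] / total for w in components}

# Toy corpus (illustrative data only): one component dominates the others.
tokens = ["banana"] * 8 + ["kalashnikov"] + ["anecdote"] + ["the"] * 5
components = ["banana", "kalashnikov", "anecdote"]

dist = sense_distribution(tokens, components)
ambiguous = make_pseudo_word(tokens, components)
```

With a skew like this one, most occurrences of the pseudo-word carry the dominant sense, which is the kind of distribution the abstract links to ambiguity's impact on effectiveness.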
Mapping the World of Consumption: Computational Linguistics Analysis of the Google Text Corpus
This article describes a method that develops overviews to bring out the relationships between any loosely connected set of actors/objects. The study examines 37 principal actors involved in the processes of consumption (consumers, brands, ads, stores…), and how they are described on the internet in the Google corpus of linguistic data. The verbs used with each actor constitute a profile of the behaviors that people ascribe to that actor. The analysis synthesizes these profiles into pictures using multidimensional scaling. Separate analyses examine actors as (a) the subject of the verbs, and (b) the object of the verbs. This reliability check reveals highly congruent pictures of the relationship between actors. The paper subsequently examines the most distinctive behaviors of contrasting actors to further understand selected parts of the picture (e.g., how products differ from services). Web chatter, which is produced by people and for people, is unrestricted in topic. Therefore, the corpus is a rich source of data, not just for marketing research - as illustrated here - but for almost any branch of research into human affairs.
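The verb-profile step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's procedure: the (actor, verb) pairs are invented toy data, the helper names are hypothetical, and cosine dissimilarity is an assumed choice of measure. The sketch only builds the pairwise dissimilarity matrix that a multidimensional scaling routine would take as input; it does not perform the scaling itself.

```python
from collections import Counter
from math import sqrt

def verb_profiles(pairs):
    """Count how often each verb occurs with each actor."""
    profiles = {}
    for actor, verb in pairs:
        profiles.setdefault(actor, Counter())[verb] += 1
    return profiles

def cosine_dissimilarity(p, q):
    """1 minus the cosine similarity of two verb-count profiles."""
    dot = sum(p[v] * q[v] for v in p.keys() & q.keys())
    norm = sqrt(sum(c * c for c in p.values())) * sqrt(sum(c * c for c in q.values()))
    return 1.0 - dot / norm if norm else 1.0

def dissimilarity_matrix(profiles):
    """Square matrix of pairwise dissimilarities, actors in sorted order."""
    actors = sorted(profiles)
    matrix = [[cosine_dissimilarity(profiles[a], profiles[b]) for b in actors]
              for a in actors]
    return actors, matrix

# Toy (actor, verb) pairs with the actor as subject (illustrative data only).
pairs = [
    ("consumer", "buy"), ("consumer", "choose"),
    ("brand", "promise"), ("brand", "offer"),
    ("store", "offer"), ("store", "sell"),
]

actors, matrix = dissimilarity_matrix(verb_profiles(pairs))
```

Running the same steps on pairs in which the actor is the object of the verb would give a second matrix, which is how the subject/object comparison mentioned in the abstract could be set up.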