16 research outputs found

    The impact on retrieval effectiveness of skewed frequency distributions

    We present an analysis of word senses that provides fresh insight into the impact of word ambiguity on retrieval effectiveness, with potentially broader implications for other processes of information retrieval. Using a methodology of forming artificially ambiguous words, known as pseudo-words, and through reference to other researchers’ work, the analysis illustrates that the distribution of the frequency of occurrence of the senses of a word plays a strong role in ambiguity’s impact on effectiveness. Further investigation shows that this analysis may also be applicable to other processes of retrieval, such as Cross Language Information Retrieval, query expansion, retrieval of OCR’ed texts, and stemming. The analysis appears to provide a means of explaining, at least in part, the reasons for these processes’ impact (or lack of it) on effectiveness.
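
As a rough illustration only, and not the authors' actual procedure, the pseudo-word idea can be sketched as follows: two or more unrelated words are conflated into a single artificial token, and the skew of the resulting "sense" distribution is the share of occurrences contributed by the most frequent member. The function names, the example words, and the toy token list are all hypothetical.

```python
from collections import Counter


def make_pseudo_word(tokens, members):
    """Replace each occurrence of any member word with one joined pseudo-word."""
    pseudo = "/".join(members)
    member_set = set(members)
    return [pseudo if t in member_set else t for t in tokens]


def sense_skew(tokens, members):
    """Share of the pseudo-word's occurrences due to its most frequent member."""
    member_set = set(members)
    counts = Counter(t for t in tokens if t in member_set)
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


# Toy data (assumed): one member word is far more frequent than the other.
members = ["banana", "kalashnikov"]
tokens = ["banana"] * 9 + ["kalashnikov"] + ["door"]

ambiguous = make_pseudo_word(tokens, members)
skew = sense_skew(tokens, members)
```

In this toy case the dominant member accounts for 9 of the 10 occurrences, so treating every occurrence of the pseudo-word as its commonest "sense" would be right most of the time. That is the kind of skewed distribution the abstract refers to.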

    Mapping the World of Consumption: Computational Linguistics Analysis of the Google Text Corpus

    This article describes a method that develops overviews to bring out the relationships between any loosely connected set of actors/objects. The study examines 37 principal actors involved in the processes of consumption (consumers, brands, ads, stores…), and how they are described on the internet in the Google corpus of linguistic data. The verbs used with each actor constitute a profile of the behaviors that people ascribe to that actor. The analysis synthesizes these profiles into pictures using multidimensional scaling. Separate analyses examine actors as (a) the subject of the verbs, and (b) the object of the verbs. This reliability check reveals highly congruent pictures of the relationships between actors. The paper subsequently examines the most distinctive behaviors of contrasting actors to further understand selected parts of the picture (e.g., how products differ from services). Web chatter, which is produced by people and for people, is unrestricted in topic. The corpus is therefore a rich source of data, not just for marketing research, as illustrated here, but for almost any branch of research into human affairs.