
    Of course we share! Testing Assumptions about Social Tagging Systems

    Social tagging systems have established themselves as an important part of today's web and have attracted the interest of our research community in a variety of investigations. The overall vision of our community is that simply through interactions with the system, i.e., through tagging and sharing of resources, users would contribute to building useful semantic structures as well as resource indexes using uncontrolled vocabulary, not only due to the easy-to-use mechanics. Hence, a variety of assumptions about social tagging systems have emerged, yet testing them has been difficult due to the absence of suitable data. In this work we thoroughly investigate three available assumptions - e.g., is a tagging system really social? - by examining live log data gathered from the real-world public social tagging system BibSonomy. Our empirical results indicate that while some of these assumptions hold to a certain extent, other assumptions need to be reflected upon and viewed in a very critical light. Our observations have implications for the design of future search and other algorithms to better reflect the actual user behavior.

    Improving the evaluation of web search systems

    Linkage analysis as an aid to web search has been assumed to be of significant benefit and we know that it is being implemented by many major Search Engines. Why then have few TREC participants been able to scientifically prove the benefits of linkage analysis over the past three years? In this paper we put forward reasons why disappointing results have been found and we identify the linkage density requirements of a dataset to faithfully support experiments into linkage analysis. We also report a series of linkage-based retrieval experiments on a more densely linked dataset culled from the TREC web documents

    On the power laws of language: word frequency distributions

    About eight decades ago, Zipf postulated that the word frequency distribution of languages is a power law, i.e., it is a straight line on a log-log plot. Over the years, this phenomenon has been documented and studied extensively. For many corpora, however, the empirical distribution barely resembles a power law: when plotted on a log-log scale, the distribution is concave and appears to be composed of two differently sloped straight lines joined by a smooth curve. A simple generative model is proposed to capture this phenomenon. The word frequency distributions produced by this model are shown to match the observations both analytically and empirically. © 2017 Copyright held by the owner/author(s)
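
    The "straight line on a log-log plot" claim can be checked with a few lines of code. The sketch below is only an illustration of that check, not the generative model proposed in the paper: it builds a hypothetical synthetic corpus in which the word of rank i occurs about 1000 / i times, then fits a least-squares line to log(frequency) against log(rank). The function names, the corpus, and the constant 1000 are all assumptions made for the example.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, most frequent word first."""
    counts = Counter(tokens)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(pairs):
    """Ordinary least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical synthetic corpus (an assumption, not data from the paper):
# the word "w<i>" appears round(1000 / i) times, i.e. frequency ~ 1 / rank.
tokens = [f"w{i}" for i in range(1, 201) for _ in range(round(1000 / i))]

pairs = rank_frequency(tokens)
slope = loglog_slope(pairs)
print(f"fitted log-log slope: {slope:.2f}")
```

    For a pure power law the fitted slope is the negative of the exponent, so this synthetic corpus should give a slope close to -1. On a real corpus of the kind the abstract describes, a single fitted line would hide the concavity; fitting the low-rank and high-rank ranges separately is one simple way to expose the two different slopes.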

    Walking across Wikipedia: a scale-free network model of semantic memory retrieval.

    Semantic knowledge has been investigated using both online and offline methods. One common online method is category recall, in which members of a semantic category like "animals" are retrieved in a given period of time. The order, timing, and number of retrievals are used as assays of semantic memory processes. One common offline method is corpus analysis, in which the structure of semantic knowledge is extracted from texts using co-occurrence or encyclopedic methods. Online measures of semantic processing, as well as offline measures of semantic structure, have yielded data resembling inverse power law distributions. The aim of the present study is to investigate whether these patterns in data might be related. A semantic network model of animal knowledge is formulated on the basis of Wikipedia pages and their overlap in word probability distributions. The network is scale-free, in that node degree is related to node frequency as an inverse power law. A random walk over this network is shown to simulate a number of results from a category recall experiment, including power law-like distributions of inter-response intervals. Results are discussed in terms of theories of semantic structure and processing

    Errors and uncertainties in microwave link rainfall estimation explored using drop size measurements and high-resolution radar data

    Microwave links can be used for the estimation of path-averaged rainfall by using either the path-integrated attenuation or the difference in attenuation of two signals with different frequencies and/or polarizations. Link signals have been simulated using measured time series of raindrop size distributions (DSDs) over a period of nearly 2 yr, in combination with wind velocity data and Taylor’s hypothesis. For this purpose, Taylor’s hypothesis has been tested using more than 1.5 yr of high-resolution radar data. In terms of correlation between spatial and temporal profiles of rainfall intensities, the validity of Taylor’s hypothesis quickly decreases with distance. However, in terms of error statistics, the hypothesis is seen to hold up to distances of at least 10 km. Errors and uncertainties (mean bias error and root-mean-square error, respectively) in microwave link rainfall estimates due to spatial DSD variation are at a minimum at frequencies (and frequency combinations) where the power-law relation for the conversion to rainfall intensity is close to linear. Errors generally increase with link length, whereas uncertainties decrease, because averaging spatially variable DSDs over longer links reduces the scatter about the retrieval relations. The exponent of power-law rainfall retrieval relations can explain a large part of the variation in both bias and uncertainty, which means that the order of magnitude of these error statistics can be predicted from the value of this exponent, regardless of the link length.