85 research outputs found

    Analyzing Disproportionate Reaction via Comparative Multilingual Targeted Sentiment in Twitter

    Global events such as terrorist attacks are commented upon in social media, such as Twitter, in different languages and from different parts of the world. Most prior studies have focused on monolingual sentiment analysis, and therefore excluded an extensive proportion of the Twitter userbase. In this paper, we perform a multilingual comparative sentiment analysis study on the terrorist attack in Paris in November 2015. In particular, we look at targeted sentiment, investigating opinions on specific entities, not simply the general sentiment of each tweet. Given the potentially inflammatory and polarizing effect that these types of tweets may have on attitudes, we examine the sentiments expressed about different targets and explore whether disproportionate reaction was expressed about such targets across different languages. Specifically, we assess whether the sentiment of French-speaking Twitter users during the Paris attack differs from that of English-speaking ones. We identify disproportionately negative attitudes in the English dataset over the French one towards some entities and, via a crowdsourcing experiment, illustrate that this also extends to forming an annotator bias.

    Modelling Efficient Novelty-based Search Result Diversification in Metric Spaces

    Novelty-based diversification provides a way to tackle ambiguous queries by re-ranking a set of retrieved documents. Current approaches are typically greedy, requiring O(n^2) document–document comparisons in order to diversify a ranking of n documents. In this article, we introduce a new approach for novelty-based search result diversification to reduce the overhead incurred by document–document comparisons. To this end, we model novelty promotion as a similarity search in a metric space, exploiting the properties of this space to efficiently identify novel documents. We investigate three different approaches: pivoting-based, clustering-based, and permutation-based. In the first two, a novel document is one that lies outside the range of a pivot or outside a cluster. In the last, a novel document is one that has a different signature (i.e., the document's relative distance to a distinguished set of fixed objects called permutants) compared to previously selected documents. Thorough experiments using two TREC test collections for diversity evaluation, as well as a large sample of the query stream of a commercial search engine, show that our approaches perform at least as effectively as well-known novelty-based diversification approaches in the literature, while dramatically improving their efficiency.
    Authors: Gil Costa, Graciela Verónica (Yahoo, Mexico; Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico San Luis, Argentina); Santos, Rodrygo L. T. (University of Glasgow, United Kingdom); Macdonald, Craig (University of Glasgow, United Kingdom); Ounis, Iadh (University of Glasgow, United Kingdom)
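
    To make the quadratic cost mentioned in this abstract concrete, here is a minimal sketch of a greedy novelty-based re-ranker. It assumes an MMR-style scoring rule (relevance traded off against the maximum similarity to already selected documents), which is a common greedy baseline and not the metric-space method the article proposes. The function names, the `lam` trade-off parameter, and the toy vectors are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_diversify(relevance, vectors, k, lam=0.5):
    """Greedy MMR-style re-ranking (an assumed baseline, not the article's method).

    Returns the selected document indices and the number of
    document-document comparisons that were performed.
    """
    remaining = list(range(len(relevance)))
    selected = []
    # Highest similarity of each document to any already selected document.
    max_sim = [0.0] * len(relevance)
    comparisons = 0
    while remaining and len(selected) < k:
        # Pick the document with the best relevance/novelty trade-off.
        best = max(remaining,
                   key=lambda d: lam * relevance[d] - (1 - lam) * max_sim[d])
        selected.append(best)
        remaining.remove(best)
        # Compare the newly selected document with every unselected one.
        for d in remaining:
            comparisons += 1
            max_sim[d] = max(max_sim[d], cosine(vectors[d], vectors[best]))
    return selected, comparisons


# Toy example: documents 0 and 1 are near-duplicates, as are 2 and 3.
ranking, comparisons = greedy_diversify(
    relevance=[0.9, 0.8, 0.7, 0.1],
    vectors=[[1, 0], [1, 0], [0, 1], [0, 1]],
    k=4,
)
print(ranking, comparisons)
```

    Re-ranking all n documents this way performs n(n-1)/2 comparisons, which is the O(n^2) overhead the abstract refers to.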

    CEFLES2: the remote sensing component to quantify photosynthetic efficiency from the leaf to the region by measuring sun-induced fluorescence in the oxygen absorption bands

    The CEFLES2 campaign during the Carbo Europe Regional Experiment Strategy was designed to provide simultaneous airborne measurements of solar-induced fluorescence and CO2 fluxes. It was combined with extensive ground-based quantification of leaf- and canopy-level processes in support of ESA's Candidate Earth Explorer Mission of the "Fluorescence Explorer" (FLEX). The aim of this campaign was to test whether the fluorescence signal detected from an airborne platform can be used to improve estimates of plant-mediated exchange on the mesoscale. Canopy fluorescence was quantified from four airborne platforms using a combination of novel sensors: (i) the prototype airborne sensor AirFLEX quantified fluorescence in the oxygen A and B bands, (ii) a hyperspectral spectrometer (ASD) measured reflectance along transects during 12 day courses, (iii) spatially high-resolution georeferenced hyperspectral data cubes containing the whole optical spectrum and the thermal region were gathered with an AHS sensor, and (iv) the first employment of the high-performance imaging spectrometer HYPER delivered spatially explicit and multi-temporal transects across the whole region. During three measurement periods in April, June and September 2007, structural, functional and radiometric characteristics of more than 20 different vegetation types in the Les Landes region, Southwest France, were extensively characterized on the ground. The campaign concept focussed especially on quantifying plant-mediated exchange processes (photosynthetic electron transport, CO2 uptake, evapotranspiration) and fluorescence emission. The comparison between passive sun-induced fluorescence and active laser-induced fluorescence was performed on a corn canopy in the daily cycle and under desiccation stress. Both techniques show good agreement in detecting stress-induced fluorescence change at the 760 nm band. On the large scale, airborne and ground-level measurements of fluorescence were compared on several vegetation types, supporting the scaling of this novel remote sensing signal. The multi-scale design of the four airborne radiometric measurements, along with extensive ground activities, fosters a nested approach to quantify photosynthetic efficiency and gross primary productivity (GPP) from passive fluorescence.

    The whens and hows of learning to rank for web search

    Web search engines are increasingly deploying many features, combined using learning to rank techniques. However, various practical questions remain concerning the manner in which learning to rank should be deployed. For instance, a sample of documents with sufficient recall is used, such that re-ranking of the sample by the learned model brings the relevant documents to the top. However, the properties of the document sample, such as when to stop ranking (i.e. its minimum effective size), remain unstudied. Similarly, effective listwise learning to rank techniques minimise a loss function corresponding to a standard information retrieval evaluation measure. However, the appropriate choice of how to calculate the loss function (i.e. the choice of the learning evaluation measure and the rank depth at which this measure should be calculated) is as yet unclear. In this paper, we address all of these issues by formulating various hypotheses and research questions, before performing exhaustive experiments using multiple learning to rank techniques and different types of information needs on the ClueWeb09 and LETOR corpora. Among many conclusions, we find, for instance, that the smallest effective sample for a given query set is dependent on the type of information need of the queries, the document representation used during sampling and the test evaluation measure. As the sample size is varied, the selected features markedly change; for instance, we find that the link analysis features are favoured for smaller document samples. Moreover, despite reflecting a more realistic user model, the recently proposed ERR measure is not as effective as the traditional NDCG as a learning loss function. Overall, our comprehensive experiments provide the first empirical derivation of best practices for learning to rank deployments.
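
    For readers unfamiliar with the two measures compared in this abstract, the following sketch computes NDCG and ERR at a given rank depth `k`. It assumes the commonly used exponential gain (2^g - 1) for both measures and a known maximum relevance grade for ERR; the paper's exact formulation and parameters are not given in the abstract, so the function names and defaults here are illustrative only.

```python
from math import log2


def dcg_at_k(grades, k):
    """Discounted cumulative gain over the top-k graded results."""
    return sum((2 ** g - 1) / log2(rank + 1)
               for rank, g in enumerate(grades[:k], start=1))


def ndcg_at_k(grades, k):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0


def err_at_k(grades, k, max_grade):
    """Expected reciprocal rank under a cascade user model.

    A result with grade g satisfies the user with probability
    (2^g - 1) / 2^max_grade; the user stops at the first satisfying result.
    """
    err = 0.0
    p_continue = 1.0  # probability the user reaches the current rank
    for rank, g in enumerate(grades[:k], start=1):
        p_stop = (2 ** g - 1) / 2 ** max_grade
        err += p_continue * p_stop / rank
        p_continue *= 1 - p_stop
    return err


# Binary grades, one relevant document placed first versus second.
print(ndcg_at_k([1, 0], k=2), err_at_k([1, 0], k=2, max_grade=1))
print(ndcg_at_k([0, 1], k=2), err_at_k([0, 1], k=2, max_grade=1))
```

    The `k` argument corresponds to the rank depth at which the measure is calculated, one of the deployment choices the abstract says it investigates.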

    Search Result Diversification

    Ranking in information retrieval has been traditionally approached as a pursuit of relevant information, under the assumption that the users’ information needs are unambiguously conveyed by their submitted queries. Nevertheless, as an inherently limited representation of a more complex information need, every query can arguably be considered ambiguous to some extent. In order to tackle query ambiguity, search result diversification approaches have recently been proposed to produce rankings that aim to satisfy the multiple possible information needs underlying a query. In this survey, we review the published literature on search result diversification. In particular, we discuss the motivations for diversifying the search results for an ambiguous query and provide a formal definition of the search result diversification problem. In addition, we describe the most successful approaches in the literature for producing and evaluating diversity in multiple search domains. Finally, we also discuss recent advances as well as open research directions in the field of search result diversification.