Automatic abstracting: a review and an empirical evaluation
Abstracts are a fundamental tool in information retrieval. As condensed representations,
they conserve the increasingly precious search time and space of scholars, allowing them to manage an ever-growing deluge of documentation more effectively.
Abstracts have traditionally been the product of human intellectual effort, but attempts to
automate the abstracting process began in 1958. Two identifiable automatic abstracting techniques emerged which
reflect differing levels of ambition regarding simulation of the human abstracting process,
namely sentence extraction and text summarisation. This research paradigm has recently
diversified further, with a cross-fertilisation of methods. Commercial systems are beginning
to appear, but automatic abstracting is still mainly confined to an experimental arena.
The purpose of this study is firstly to chart the historical development and current state of
both manual and automatic abstracting; and secondly, to devise and implement an empirical
user-based evaluation to assess the adequacy of automatic abstracts derived from sentence
extraction techniques according to a set of utility criteria. [Continues.]
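The sentence-extraction technique evaluated above can be illustrated with a minimal, Luhn-style sketch: score each sentence by the corpus frequency of its content words and keep the top scorers in their original order. This is an illustrative baseline only, not the specific extraction system evaluated in the study; the stopword list and scoring rule are simplifying assumptions.

```python
import re
from collections import Counter

def extract_summary(text, num_sentences=2):
    """Luhn-style sentence extraction: score each sentence by the summed
    document frequency of its content words, then return the top-scoring
    sentences in their original order."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    words = re.findall(r'[a-z]+', text.lower())
    # Tiny illustrative stopword list; a real system would use a fuller one.
    stopwords = {'the', 'a', 'an', 'of', 'to', 'in', 'and', 'is', 'are'}
    freq = Counter(w for w in words if w not in stopwords)

    def score(sentence):
        return sum(freq[w] for w in re.findall(r'[a-z]+', sentence.lower()))

    ranked = sorted(sentences, key=score, reverse=True)[:num_sentences]
    # Preserve the sentences' original order in the output extract.
    return [s for s in sentences if s in ranked]
```

Extracts produced this way are exactly the kind evaluated in the study: grammatical at the sentence level, but with no guarantee of coherence between extracted sentences.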
Privacy-preserving data outsourcing in the cloud via semantic data splitting
Even though cloud computing provides many intrinsic benefits, privacy
concerns related to the lack of control over the storage and management of the
outsourced data still prevent many customers from migrating to the cloud.
Several privacy-protection mechanisms based on a prior encryption of the data
to be outsourced have been proposed. Data encryption offers robust security,
but at the cost of hampering the efficiency of the service and limiting the
functionalities that can be applied over the (encrypted) data stored on cloud
premises. Because both efficiency and functionality are crucial advantages of
cloud computing, in this paper we aim to retain them by proposing a
privacy-protection mechanism that relies on splitting (clear) data, and on the
distributed storage offered by the increasingly popular notion of multi-clouds.
We propose a semantically-grounded data splitting mechanism that is able to
automatically detect pieces of data that may cause privacy risks and split them
on local premises, so that each chunk does not incur those risks; then,
chunks of clear data are independently stored in the separate locations of a
multi-cloud, so that external entities cannot access the whole body of
confidential data. Because partial data are stored in the clear on cloud premises,
outsourced functionalities are seamlessly and efficiently supported by just
broadcasting queries to the different cloud locations. To enforce a robust
privacy notion, our proposal relies on a privacy model that offers a priori
privacy guarantees; to ensure its feasibility, we have designed heuristic
algorithms that minimize the number of cloud storage locations we need; to show
its potential and generality, we have applied it to the least structured and
most challenging data type: plain textual documents.
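The splitting idea can be sketched in a few lines: assign chunks of clear text greedily to storage locations so that no single location accumulates every term of any privacy rule (a set of terms deemed risky only in combination). This is a deliberately simplified illustration, not the paper's mechanism: here the risky term combinations are supplied explicitly, whereas the paper detects them semantically, and we assume no single chunk violates a rule on its own.

```python
def split_for_privacy(chunks, rules):
    """Greedily distribute text chunks across storage locations so that no
    location ends up holding all terms of any privacy rule.
    chunks: list of strings; rules: list of sets of risky term combinations.
    Assumes each chunk is individually safe (violates no rule by itself).
    Returns a list of locations, each a list of chunks."""
    risky = {t for rule in rules for t in rule}
    locations = []  # each entry: [chunk_list, risky_terms_present]
    for chunk in chunks:
        terms = set(chunk.lower().split()) & risky
        for loc in locations:
            # Place the chunk here only if the combination stays safe.
            if not any(rule <= (loc[1] | terms) for rule in rules):
                loc[0].append(chunk)
                loc[1] |= terms
                break
        else:
            # No existing location is safe: open a new storage location.
            locations.append([[chunk], set(terms)])
    return [loc[0] for loc in locations]
```

Because each location holds clear text, queries can simply be broadcast to all locations and the partial results merged, which is the efficiency argument made above; the greedy placement also mirrors the paper's goal of minimizing the number of cloud locations, though the heuristics in the paper are more elaborate.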
Supervised Learning with Similarity Functions
We address the problem of general supervised learning when data can only be
accessed through an (indefinite) similarity function between data points.
Existing work on learning with indefinite kernels has concentrated solely on
binary/multi-class classification problems. We propose a model that is generic
enough to handle any supervised learning task and also subsumes the model
previously proposed for classification. We give a "goodness" criterion for
similarity functions w.r.t. a given supervised learning task and then adapt a
well-known landmarking technique to provide efficient algorithms for supervised
learning using "good" similarity functions. We demonstrate the effectiveness of
our model on three important supervised learning problems: a) real-valued
regression, b) ordinal regression and c) ranking where we show that our method
guarantees bounded generalization error. Furthermore, for the case of
real-valued regression, we give a natural goodness definition that, when used
in conjunction with a recent result in sparse vector recovery, guarantees a
sparse predictor with bounded generalization error. Finally, we report results
of our learning algorithms on regression and ordinal regression tasks using
non-PSD similarity functions and demonstrate the effectiveness of our
algorithms, especially that of the sparse landmark selection algorithm that
achieves significantly higher accuracies than the baseline methods while
offering reduced computational costs.

Comment: To appear in the proceedings of NIPS 2012; 30 pages.
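The landmarking idea underlying the algorithms above can be sketched for the regression case: embed each point as its vector of similarities to a set of randomly chosen landmarks, then fit an ordinary linear predictor in that embedding. The similarity function never needs to be a PSD kernel. This is a minimal sketch of the generic technique, not the paper's exact algorithm; the ridge regularisation and random landmark choice are simplifying assumptions (the paper's sparse landmark selection is more refined).

```python
import numpy as np

def landmark_features(X, landmarks, sim):
    """Embed each point as its vector of similarities to the landmarks."""
    return np.array([[sim(x, l) for l in landmarks] for x in X])

def fit_landmark_regressor(X, y, sim, num_landmarks=20, ridge=1e-3, seed=0):
    """Landmarking for real-valued regression with an arbitrary (possibly
    indefinite, non-PSD) similarity function: pick random landmarks, map
    points to similarity space, fit a ridge-regularised linear predictor."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(num_landmarks, len(X)), replace=False)
    landmarks = [X[i] for i in idx]
    Phi = landmark_features(X, landmarks, sim)
    d = Phi.shape[1]
    # Closed-form ridge solution: (Phi^T Phi + ridge*I) w = Phi^T y.
    w = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(d), Phi.T @ y)
    return lambda X_new: landmark_features(X_new, landmarks, sim) @ w
```

For example, `sim = lambda a, b: -abs(a - b)` is indefinite yet works directly, which is exactly the freedom over kernel methods that the model provides.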
Memory Structure and Cognitive Maps
A common way to understand memory structures in the cognitive sciences is as a "cognitive map".
Cognitive maps are representational systems organized by dimensions shared with physical space. The
appeal to these maps begins literally: as an account of how spatial information is represented and used
to inform spatial navigation. Invocations of cognitive maps, however, are often more ambitious;
cognitive maps are meant to scale up and provide the basis for our more sophisticated memory
capacities. The extension is not meant to be metaphorical, but the way in which these richer mental
structures are supposed to remain map-like is rarely made explicit. Here we investigate this missing
link, asking: "How do cognitive maps represent non-spatial information?" We begin with a survey of
foundational work on spatial cognitive maps and then provide a comparative review of alternative,
non-spatial representational structures. We then turn to several cutting-edge projects that are engaged
in the task of scaling up cognitive maps so as to accommodate non-spatial information: first, on the
"spatial-isometric approach", encoding content that is non-spatial but in some sense isomorphic to
spatial content; second, on the "abstraction approach", encoding content that is an abstraction over
first-order spatial information; and third, on the "embedding approach", embedding non-spatial
information within a spatial context, a prominent example being the Method-of-Loci. Putting these
cases alongside one another reveals the variety of options available for building cognitive maps, and the
distinctive limitations of each. We conclude by reflecting on where these results take us in terms of
understanding the place of cognitive maps in memory.
Users' perception of relevance of spoken documents
We present the results of a study of users' perception of the relevance of documents. The aim is to study experimentally how users' perception varies depending on the form in which retrieved documents are presented. Documents retrieved in response to a query are presented to users in a variety of ways, from full text to a machine-spoken, query-biased, automatically generated summary, and the difference in users' perception of relevance is studied. The experimental results suggest that the effectiveness of advanced multimedia information retrieval applications may be affected by users' low perception of the relevance of retrieved documents.
Analysis of Crowdsourced Sampling Strategies for HodgeRank with Sparse Random Graphs
Crowdsourcing platforms are now extensively used for conducting subjective
pairwise comparison studies. In this setting, a pairwise comparison dataset is
typically gathered via random sampling, either \emph{with} or \emph{without}
replacement. In this paper, we use tools from random graph theory to analyze
these two random sampling methods for the HodgeRank estimator. Using the
Fiedler value of the graph as a measurement for estimator stability
(informativeness), we provide a new estimate of the Fiedler value for these two
random graph models. In the asymptotic limit as the number of vertices tends to
infinity, we prove the validity of the estimate. Based on our findings, for a
small number of items to be compared, we recommend a two-stage sampling
strategy where a greedy sampling method is used initially and random sampling
\emph{without} replacement is used in the second stage. When a large number of
items is to be compared, we recommend random sampling with replacement as this
is computationally inexpensive and trivially parallelizable. Experiments on
synthetic and real-world datasets support our analysis.
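The stability measurement used above is easy to reproduce: build the comparison graph from the sampled pairs and compute its Fiedler value, the second-smallest eigenvalue of the graph Laplacian. The sketch below is illustrative, not the authors' experimental code; the dense-Laplacian eigendecomposition is a simplifying assumption that only scales to small item sets.

```python
import itertools
import random
import numpy as np

def fiedler_value(n, edges):
    """Second-smallest eigenvalue of the (weighted) graph Laplacian on n
    vertices; repeated edges, as arise when sampling with replacement,
    simply increase the corresponding edge weights."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return np.sort(np.linalg.eigvalsh(L))[1]

def sample_pairs(n, m, replacement, seed=0):
    """Sample m comparison pairs from the n*(n-1)/2 possible pairs,
    either with or without replacement."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n), 2))
    if replacement:
        return [rng.choice(pairs) for _ in range(m)]
    return rng.sample(pairs, m)
```

A disconnected comparison graph has Fiedler value zero, in which case the HodgeRank global ranking is not identifiable; the larger the Fiedler value, the more stable (informative) the estimator, which is what makes it a useful yardstick for comparing the two sampling schemes.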