16,376 research outputs found
Theory and Practice of Data Citation
Citations are the cornerstone of knowledge propagation and the primary means
of assessing the quality of research, as well as directing investments in
science. Science is increasingly becoming "data-intensive", where large volumes
of data are collected and analyzed to discover complex patterns through
simulations and experiments, and most scientific reference works have been
replaced by online curated datasets. Yet, given a dataset, there is no
quantitative, consistent and established way of knowing how it has been used
over time, who contributed to its curation, what results have been yielded or
what value it has.
The development of a theory and practice of data citation is fundamental for
considering data as first-class research objects with the same relevance and
centrality of traditional scientific products. Many works in recent years have
discussed data citation from different viewpoints: illustrating why data
citation is needed, defining the principles and outlining recommendations for
data citation systems, and providing computational methods for addressing
specific issues of data citation.
The current panorama is many-faceted and an overall view that brings together
diverse aspects of this topic is still missing. Therefore, this paper aims to
describe the lay of the land for data citation, both from the theoretical (the
why and what) and the practical (the how) angle.Comment: 24 pages, 2 tables, pre-print accepted in Journal of the Association
for Information Science and Technology (JASIST), 201
Using treemaps for variable selection in spatio-temporal visualisation
We demonstrate and reflect upon the use of enhanced treemaps that incorporate spatial and temporal ordering for exploring a large multivariate spatio-temporal data set. The resulting data-dense views summarise and simultaneously present hundreds of space-, time-, and variable-constrained subsets of a large multivariate data set in a structure that facilitates their meaningful comparison and supports visual analysis. Interactive techniques allow localised patterns to be explored and subsets of interest selected and compared with the spatial aggregate. Spatial variation is considered through interactive raster maps and high-resolution local road maps. The techniques are developed in the context of 42.2 million records of vehicular activity in a 98 km(2) area of central London and informally evaluated through a design used in the exploratory visualisation of this data set. The main advantages of our technique are the means to simultaneously display hundreds of summaries of the data and to interactively browse hundreds of variable combinations with ordering and symbolism that are consistent and appropriate for space- and time- based variables. These capabilities are difficult to achieve in the case of spatio-temporal data with categorical attributes using existing geovisualisation methods. We acknowledge limitations in the treemap representation but enhance the cognitive plausibility of this popular layout through our two-dimensional ordering algorithm and interactions. Patterns that are expected (e.g. more traffic in central London), interesting (e.g. the spatial and temporal distribution of particular vehicle types) and anomalous (e.g. low speeds on particular road sections) are detected at various scales and locations using the approach. In many cases, anomalies identify biases that may have implications for future use of the data set for analyses and applications. Ordered treemaps appear to have potential as interactive interfaces for variable selection in spatio-temporal visualisation. Information Visualization (2008) 7, 210-224. doi: 10.1057/palgrave.ivs.950018
Context Models For Web Search Personalization
We present our solution to the Yandex Personalized Web Search Challenge. The
aim of this challenge was to use the historical search logs to personalize
top-N document rankings for a set of test users. We used over 100 features
extracted from user- and query-depended contexts to train neural net and
tree-based learning-to-rank and regression models. Our final submission, which
was a blend of several different models, achieved an NDCG@10 of 0.80476 and
placed 4'th amongst the 194 teams winning 3'rd prize
- …