Search CORE

13,790 research outputs found

Extracting Hierarchies of Search Tasks & Subtasks via a Bayesian Nonparametric Approach

Author: Awadallah Ahmed Hassan
Spink Amanda
Yang Hui
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/06/2017
Field of study

A significant amount of search queries originate from some real world information need or tasks. In order to improve the search experience of the end users, it is important to have accurate representations of tasks. As a result, significant amount of research has been devoted to extracting proper representations of tasks in order to enable search systems to help users complete their tasks, as well as providing the end user with better query suggestions, for better recommendations, for satisfaction prediction, and for improved personalization in terms of tasks. Most existing task extraction methodologies focus on representing tasks as flat structures. However, tasks often tend to have multiple subtasks associated with them and a more naturalistic representation of tasks would be in terms of a hierarchy, where each task can be composed of multiple (sub)tasks. To this end, we propose an efficient Bayesian nonparametric model for extracting hierarchies of such tasks \& subtasks. We evaluate our method based on real world query log data both through quantitative and crowdsourced experiments and highlight the importance of considering task/subtask hierarchies.Comment: 10 pages. Accepted at SIGIR 2017 as a full pape

arXiv.org e-Print Archive

Crossref

UCL Discovery

Detecting hierarchical and overlapping network communities using locally optimal modularity changes

Author: Barber Michael J.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/09/2013
Field of study

Agglomerative clustering is a well established strategy for identifying communities in networks. Communities are successively merged into larger communities, coarsening a network of actors into a more manageable network of communities. The order in which merges should occur is not in general clear, necessitating heuristics for selecting pairs of communities to merge. We describe a hierarchical clustering algorithm based on a local optimality property. For each edge in the network, we associate the modularity change for merging the communities it links. For each community vertex, we call the preferred edge that edge for which the modularity change is maximal. When an edge is preferred by both vertices that it links, it appears to be the optimal choice from the local viewpoint. We use the locally optimal edges to define the algorithm: simultaneously merge all pairs of communities that are connected by locally optimal edges that would increase the modularity, redetermining the locally optimal edges after each step and continuing so long as the modularity can be further increased. We apply the algorithm to model and empirical networks, demonstrating that it can efficiently produce high-quality community solutions. We relate the performance and implementation details to the structure of the resulting community hierarchies. We additionally consider a complementary local clustering algorithm, describing how to identify overlapping communities based on the local optimality condition.Comment: 10 pages; 4 tables, 3 figure

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

Bibliographic Analysis on Research Publications using Authors, Categorical Labels and the Citation Network

Author: Buntine Wray
Lim Kar Wai
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/09/2016
Field of study

Bibliographic analysis considers the author's research areas, the citation network and the paper content among other things. In this paper, we combine these three in a topic model that produces a bibliographic model of authors, topics and documents, using a nonparametric extension of a combination of the Poisson mixed-topic link model and the author-topic model. This gives rise to the Citation Network Topic Model (CNTM). We propose a novel and efficient inference algorithm for the CNTM to explore subsets of research publications from CiteSeerX. The publication datasets are organised into three corpora, totalling to about 168k publications with about 62k authors. The queried datasets are made available online. In three publicly available corpora in addition to the queried datasets, our proposed model demonstrates an improved performance in both model fitting and document clustering, compared to several baselines. Moreover, our model allows extraction of additional useful knowledge from the corpora, such as the visualisation of the author-topics network. Additionally, we propose a simple method to incorporate supervision into topic modelling to achieve further improvement on the clustering task.Comment: Preprint for Journal Machine Learnin

arXiv.org e-Print Archive

The Australian National University