13,790 research outputs found
Extracting Hierarchies of Search Tasks & Subtasks via a Bayesian Nonparametric Approach
A significant amount of search queries originate from some real world
information need or tasks. In order to improve the search experience of the end
users, it is important to have accurate representations of tasks. As a result,
significant amount of research has been devoted to extracting proper
representations of tasks in order to enable search systems to help users
complete their tasks, as well as providing the end user with better query
suggestions, for better recommendations, for satisfaction prediction, and for
improved personalization in terms of tasks. Most existing task extraction
methodologies focus on representing tasks as flat structures. However, tasks
often tend to have multiple subtasks associated with them and a more
naturalistic representation of tasks would be in terms of a hierarchy, where
each task can be composed of multiple (sub)tasks. To this end, we propose an
efficient Bayesian nonparametric model for extracting hierarchies of such tasks
\& subtasks. We evaluate our method based on real world query log data both
through quantitative and crowdsourced experiments and highlight the importance
of considering task/subtask hierarchies.Comment: 10 pages. Accepted at SIGIR 2017 as a full pape
Detecting hierarchical and overlapping network communities using locally optimal modularity changes
Agglomerative clustering is a well established strategy for identifying
communities in networks. Communities are successively merged into larger
communities, coarsening a network of actors into a more manageable network of
communities. The order in which merges should occur is not in general clear,
necessitating heuristics for selecting pairs of communities to merge. We
describe a hierarchical clustering algorithm based on a local optimality
property. For each edge in the network, we associate the modularity change for
merging the communities it links. For each community vertex, we call the
preferred edge that edge for which the modularity change is maximal. When an
edge is preferred by both vertices that it links, it appears to be the optimal
choice from the local viewpoint. We use the locally optimal edges to define the
algorithm: simultaneously merge all pairs of communities that are connected by
locally optimal edges that would increase the modularity, redetermining the
locally optimal edges after each step and continuing so long as the modularity
can be further increased. We apply the algorithm to model and empirical
networks, demonstrating that it can efficiently produce high-quality community
solutions. We relate the performance and implementation details to the
structure of the resulting community hierarchies. We additionally consider a
complementary local clustering algorithm, describing how to identify
overlapping communities based on the local optimality condition.Comment: 10 pages; 4 tables, 3 figure
Bibliographic Analysis on Research Publications using Authors, Categorical Labels and the Citation Network
Bibliographic analysis considers the author's research areas, the citation
network and the paper content among other things. In this paper, we combine
these three in a topic model that produces a bibliographic model of authors,
topics and documents, using a nonparametric extension of a combination of the
Poisson mixed-topic link model and the author-topic model. This gives rise to
the Citation Network Topic Model (CNTM). We propose a novel and efficient
inference algorithm for the CNTM to explore subsets of research publications
from CiteSeerX. The publication datasets are organised into three corpora,
totalling to about 168k publications with about 62k authors. The queried
datasets are made available online. In three publicly available corpora in
addition to the queried datasets, our proposed model demonstrates an improved
performance in both model fitting and document clustering, compared to several
baselines. Moreover, our model allows extraction of additional useful knowledge
from the corpora, such as the visualisation of the author-topics network.
Additionally, we propose a simple method to incorporate supervision into topic
modelling to achieve further improvement on the clustering task.Comment: Preprint for Journal Machine Learnin
- …