This paper describes the participation of the University of Twente in the Web track of TREC 2012. Our baseline approach uses the Mirex toolkit, an open source tool that sequentially scans all the documents. For result diversification, we experimented with improving the quality of clusters through ensemble clustering. We combined clusters obtained by different clustering methods (such as LDA and K-means) and clusters obtained by using different types of data (such as document text and anchor text). Our two-layer ensemble run performed better than the LDA based diversification and also better than a non-diversification run.

Hiemstra, Djoerd

Nguyen, Dong-Phuong

English

University of Twente Research Information

Ensemble Clustering for Result Diversification

Dong Nguyen
Human Media Interaction, University of Twente
d.nguyen@utwente.nl

Djoerd Hiemstra
Database Group, University of Twente
d.hiemstra@utwente.nl

ABSTRACT
This paper describes the participation of the University of Twente in the Web track of TREC 2012. Our baseline approach uses the Mirex toolkit, an open source tool that sequentially scans all the documents. For result diversification, we experimented with improving the quality of clusters through ensemble clustering. We combined clusters obtained by different clustering methods (such as LDA and K-means) and clusters obtained by using different types of data (such as document text and anchor text). Our two-layer ensemble run performed better than the LDA based diversification and also better than a non-diversification run.

1. INTRODUCTION
Web queries are often short and ambiguous. Result diversification, which aims to diversify the result list so that it covers the multiple facets or subtopics of a query, can improve the quality of the results for such queries. A common strategy is to estimate the aspects (subtopics) of the top-ranked documents and to rerank these documents based on the estimated subtopics. Usually the subtopics are discovered by clustering the top-ranked documents. Two well-known methods to rerank results are IA-select [1] and xQuAD [12].

Recently, researchers have explored combining multiple clusterings to improve result diversification. Dou et al. [6] used four methods to obtain subtopics: anchor texts, query logs, search result clusters and hosts. They proposed a reranking framework that incorporated the subtopics from these multiple dimensions. Unlike our work, they only experimented with clusterings obtained from different data sources, and not with different clustering methods for a particular data source. He et al. [8] proposed a framework that combines clusters of external resources to regularize implicit subtopics based on pLSA using random walks.

In this work, we explore the use of clustering ensembles to obtain better clusterings for result diversification. Clustering ensembles can combine arbitrary clusterings, for example clusterings based on different data sources (e.g. full document text, anchor text, URLs) or obtained with different clustering methods (such as K-means and LDA [2]). Experiments were done on Category B of ClueWeb09.

The remainder of this paper is organized as follows. We first describe the track in which we participated. We then describe our experimental setup and discuss the results. We conclude with a summary and suggestions for future work.

2. WEB TRACK
The Web track of TREC 2012 consists of an ad hoc task and a diversity task. In this paper we focus on the diversity task. Participants initially only have access to the plain queries; however, the runs are evaluated using the full topic descriptions.

Topics are classified as either ambiguous or faceted [5]. Ambiguous queries have several unrelated interpretations. For example, an ambiguous query in TREC 2010 was "the sun", which could refer to the newspaper or to the star at the center of the solar system. Faceted queries have a primary interpretation, and the subtopics reflect several aspects of this interpretation. For example, a faceted query was "Neil Young", with aspects such as Neil Young's albums, biographical information, lyrics and tour dates.

The ad hoc task is evaluated using Expected Reciprocal Rank (ERR) [3]. The diversity task is evaluated using an intent-aware version [1] of Expected Reciprocal Rank (ERR-IA), in which the scores for the different subtopics are weighted by the probability of each subtopic for the given query. In the Web track, these measures are calculated at rank 20. In this paper we also report nDCG@20 [10] and α-nDCG@20 [4].

3. AD HOC RETRIEVAL
In this section we describe our approach to obtain a baseline ranking.
In Section 4, we then rerank these results to improve result diversification.

We use Mirex [9],¹ a tool that sequentially scans the documents. Because Mirex is built on Hadoop, sequential scanning becomes a viable approach. In addition, the framework is easy to extend, which allows researchers to experiment easily with different retrieval models. Documents were scored using a language model with linear interpolation smoothing and a document length prior. We only used anchor text, since previous experiments indicated that this gave high precision and still enough recall for this task.

After experimenting with different smoothing parameters on data from the Web tracks of 2009, 2010 and 2011, we chose λ = 0.90 for the baseline ranking that is used for further reranking. The baseline run is referred to as utw2012lm09.

¹ http://mirex.sourceforge.net

4. RESULT DIVERSIFICATION
We make the simplifying assumption that a document belongs to only one topic. However, the methods we describe can easily be extended to documents that belong to multiple topics.

4.1 Clustering
We experiment with several methods to cluster the documents obtained from the baseline ranking.

Methods
I K-means. An iterative algorithm in which documents are assigned to the cluster with the nearest mean.
II Ward. A hierarchical clustering method in which clusters are merged so as to minimize the total within-cluster variance.
III LDA [2]. A generative model that aims to uncover latent topics.
IV LSA [11]. A method based on singular value decomposition that uncovers latent concepts.

We also vary the data source.

Data
I Full text. Cluster documents based on the full text as extracted from the HTML.
II Anchor. Cluster documents based on the anchor text.
III Host. Documents are assigned to the same cluster when they come from the same host.

In our experiments, we use the same number of clusters for all clustering methods (except for host clustering, for which the number of clusters depends on the results).
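Host clustering needs no clustering algorithm, and its number of clusters is simply the number of distinct hosts in the result list. A minimal Python sketch, assuming each retrieved document is represented by its URL; `host_clustering` is a hypothetical helper, not part of Mirex:

```python
from urllib.parse import urlparse


def host_clustering(urls):
    """Assign documents that come from the same host to the same cluster.

    Returns one integer cluster label per URL, numbered in order of first
    appearance; the number of clusters equals the number of distinct hosts.
    """
    label_of_host = {}
    labels = []
    for url in urls:
        # urllib lower-cases .hostname; it is None for URLs without a host
        host = urlparse(url).hostname or ""
        if host not in label_of_host:
            label_of_host[host] = len(label_of_host)
        labels.append(label_of_host[host])
    return labels


ranked_urls = [
    "http://www.example.com/figs/recipes",
    "http://WWW.EXAMPLE.com/figs/health",
    "http://example.org/growing-figs",
]
print(host_clustering(ranked_urls))  # [0, 0, 1]
```

Because the labels are plain integers per document, this clustering can be combined with the text and anchor clusterings in the same way as any other clustering.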
An optimized system that varies the number of clusters per clustering method or per query could potentially give better results.

4.2 Combining Multiple Clusterings
Clustering ensembles combine multiple clusterings into a single clustering. Advantages include more robustness, novelty (a combined solution that may not have been found by any of the individual clustering algorithms), more stability and confidence, and support for parallelization and scalability [7]. In this paper we cluster the documents using multiple methods and across several dimensions, and combine these clusterings into a single, more robust clustering using clustering ensembles.

We apply the simplest method for combining multiple clusterings, the cluster-based similarity partitioning algorithm (CSPA) [13]. Two documents have a similarity of 1 if they appear in the same cluster, and 0 otherwise. As a result, for each clustering we can create an n × n binary similarity matrix, where n is the number of documents. The similarity matrix of the combined clustering is then the average of the individual similarity matrices.

We experiment with assigning weights to the individual clusterings. For example, if a clustering has been assigned a weight of 0.8, its similarity matrix has a value of 0.8 for two documents that appear in the same cluster (and zero otherwise). The weights of the different clusterings are chosen such that they sum to 1, so the combined matrix is a weighted average. We then apply a clustering method to this induced similarity matrix to obtain the final clustering. In this paper we use hierarchical clustering with the centroid method, where the distance between two clusters is calculated from their centroids.

The advantage of this approach is that it is independent of the clusterings used. In addition, because multiple clusterings are combined into one new clustering, we are free to choose any cluster-based reranking algorithm.
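The weighted CSPA step and the final centroid clustering can be sketched in plain Python as follows. Each input clustering is a list of cluster labels (one per document) and adds its weight to every pair of documents it places in the same cluster; weights are normalized to sum to 1. The text does not say how documents are represented for the centroid method, so this sketch assumes each row of the induced similarity matrix serves as a document's feature vector and merges clusters by squared Euclidean distance between centroids until k clusters remain. The function names are hypothetical, and the naive merging loop is meant for small examples rather than for reranking 1000 documents.

```python
from itertools import combinations


def co_association(partitions, weights):
    """Weighted CSPA similarity: entry (i, j) is the (normalized) total weight
    of the clusterings that put documents i and j in the same cluster."""
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all clusterings must label the same documents")
    total = sum(weights)
    sim = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    sim[i][j] += w / total
    return sim


def centroid_clustering(vectors, k):
    """Agglomerative clustering with centroid linkage: repeatedly merge the
    two clusters whose centroids are closest (squared Euclidean distance)
    until k clusters remain. Returns one label per input vector."""
    clusters = [([i], list(v)) for i, v in enumerate(vectors)]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]][1], clusters[p[1]][1]))
        (members_a, cent_a), (members_b, cent_b) = clusters[a], clusters[b]
        merged = members_a + members_b
        # The centroid of the union is the size-weighted mean of both centroids
        centroid = [(x * len(members_a) + y * len(members_b)) / len(merged)
                    for x, y in zip(cent_a, cent_b)]
        clusters[a] = (merged, centroid)
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(vectors)
    for label, (members, _) in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy labels standing in for the anchor and text clusterings of utw2012c1
anchor = [0, 0, 1, 1]  # weight 0.8
text = [0, 0, 0, 1]    # weight 0.2
sim = co_association([anchor, text], [0.8, 0.2])
print(centroid_clustering(sim, k=2))  # [0, 0, 1, 1]
```

Documents 2 and 3 end up together because the heavily weighted anchor clustering groups them, even though the text clustering does not; this is how the weights steer the final clustering.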
Finding weights for the different clusterings also gives insight into which dimensions or clustering methods are effective for result diversification.

I Two-layer Ensemble Clustering
We experiment with an ensemble clustering over ensemble clusterings. The final clustering is an ensemble clustering over three clusterings:
1. Text clustering. An ensemble clustering based on the clusterings obtained with K-means, Ward, LDA and LSA on the full text.
2. Anchor clustering. An ensemble clustering based on the clusterings obtained with K-means, Ward, LDA and LSA on the anchor text.
3. Host clustering.

II Simple Ensemble Clustering
Preliminary experiments on previous TREC data found LDA to be the most effective of the clustering algorithms. Therefore, in this variant we only use LDA as the clustering method for the text and anchor data:
1. LDA text clustering.
2. LDA anchor clustering.
3. Host clustering.

III One-layer Ensemble Clustering
This ensemble clustering uses the same clusterings as the Two-layer Ensemble Clustering; however, the clusterings are combined directly into a new clustering instead of in two layers. Thus we create an ensemble clustering over the following:
1. Text - K-means.
. . .
4. Text - LSA.
5. Anchor - K-means.
. . .
8. Anchor - LSA.
9. Host clustering.

Table 1: Results
Run                                           nDCG@20  ERR@20  ERR-IA@20  α-nDCG@20
Language modeling baseline (utw2012lm09)      0.122    0.218   0.404      0.505
Diversification using LDA (utw2012lda)        0.111    0.215   0.402      0.499
Two-layer ensemble clustering (utw2012c1)     0.120    0.220   0.405      0.508
Simple ensemble clustering (utw2012sc1)       0.107    0.207   0.398      0.498
One-layer ensemble clustering (utw2012fc1)    0.113    0.219   0.400      0.497
Two-layer ensemble clustering (utw2012c2)     0.117    0.219   0.399      0.499

4.3 Result Reranking
We use the IA-select algorithm [1] to diversify the search results based on the clusters.
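IA-select [1] builds the reranked list greedily. The Python sketch below illustrates the loop under the single-cluster assumption of Section 4, so C(d) contains exactly one cluster per document. It estimates V(d|q,c) and P(c|q) from retrieval scores as described in this subsection, with two assumptions of this sketch: the "quality values" summed for P(c|q) are taken to be the retrieval scores (normalized V values would sum to 1 in every cluster), and scores are positive (e.g. probabilities rather than log-probabilities). The update U(c|q,S) ← (1 − V(d*|q,c)) U(c|q,S) for the selected document d* follows Agrawal et al. [1]; the function name `ia_select` is hypothetical.

```python
def ia_select(scores, clusters, k):
    """Greedy IA-select reranking with hard clusters.

    scores:   positive retrieval score per document (baseline ranking)
    clusters: cluster label per document (one cluster per document)
    k:        number of documents to return
    Returns document indices in the reranked order.
    """
    total = sum(scores)
    cluster_total = {}
    for s, c in zip(scores, clusters):
        cluster_total[c] = cluster_total.get(c, 0.0) + s
    # V(d|q,c): document score divided by the total score of its cluster
    value = [s / cluster_total[c] for s, c in zip(scores, clusters)]
    # U(c|q,S) starts at P(c|q): the share of the total score in cluster c
    utility = {c: t / total for c, t in cluster_total.items()}
    remaining = list(range(len(scores)))
    selected = []
    while remaining and len(selected) < k:
        # With one cluster per document, g(d|q,c,S) = U(c|q,S) * V(d|q,c)
        best = max(remaining, key=lambda d: utility[clusters[d]] * value[d])
        selected.append(best)
        remaining.remove(best)
        # Discount the covered subtopic: U <- (1 - V(d*|q,c)) * U
        utility[clusters[best]] *= 1.0 - value[best]
    return selected


# Four documents in baseline order; documents 0 and 1 share cluster "a"
print(ia_select([4.0, 3.0, 2.0, 1.0], ["a", "a", "b", "b"], k=4))  # [0, 2, 1, 3]
```

Document 2 is promoted above document 1 because selecting document 0 lowers the utility of cluster "a" from 0.7 to 0.3, while cluster "b" is still uncovered.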
The IA-select algorithm involves computing the conditional probability P(c|q) of a subtopic c given the query q, and the quality value V(d|q, c) of a document d given a query and a subtopic.

The algorithm then iteratively selects the document with the highest marginal utility:

$$g(d \mid q, c, S) = \sum_{c \in C(d)} U(c \mid q, S)\, V(d \mid q, c)$$

where C(d) is the set of subtopics (clusters) to which document d belongs, S is the set of documents selected so far, and U(c|q, S) is initially set to P(c|q) when no documents have been selected yet and is updated after every added document. After preliminary experiments, we decided to calculate V(d|q, c) as the score of the document for query q divided by the total score of all documents in cluster c. P(c|q) is the sum of the quality values of the documents in cluster c divided by the total sum.

4.4 Submitted Runs
For all runs, we rerank the top 1000 documents, with the number of clusters set to 25. Parameters were selected using a parameter sweep over data from 2009, 2010 and 2011. We submitted the following runs to the ad hoc (AH) and diversity (DIV) tasks:

I [AH] Baseline run (utw2012lm09). A run using language modeling with λ = 0.9 and the Mirex toolkit.
II [DIV] LDA (utw2012lda). A run using LDA clustering based on the document text.
III [DIV] Two-layer Ensemble Clustering (utw2012c1). Clustering based on anchor text (weight: 0.8; ensemble of K-means: 0.2, LDA: 0.6, LSA: 0.2) and text (weight: 0.2; ensemble of Ward: 0.2, LDA: 0.8).
IV [DIV] Simple Ensemble Clustering (utw2012sc1). Clustering based on host (weight: 0.8) and LDA on text (weight: 0.2).
V [AH] One-layer Ensemble Clustering (utw2012fc1). Due to time constraints, the weights for this method were not optimized. We used an ensemble clustering over text using LDA (0.4) and anchor text using K-means (0.2) and Ward (0.4).
VI [AH] Two-layer Ensemble Clustering (utw2012c2). The weights for this run were not optimized. Clustering based on host (weight: 0.4), anchor text (weight: 0.2; ensemble of K-means: 0.33, LDA: 0.5, LSA: 0.166) and text (weight: 0.4; ensemble of Ward: 0.2, LDA: 0.8).

5. RESULTS
The results are presented in Table 1.
We find that the baseline, without diversification, performs very well. We suspect that our reranking algorithm is not very effective, since diversification based on LDA clustering alone already performs worse than the non-diversification run on all measures. However, our two-layer ensemble clustering (utw2012c1) performs better than LDA on all measures, and also better than the non-diversification baseline on all measures except nDCG@20. In terms of ERR-IA@20, it performs at least as well as LDA on 32 of the 50 queries, and at least as well as the LM baseline on 29 of the 50 queries.

The one-layer ensemble clustering (utw2012fc1) does not perform as well as the two-layer ensemble clustering; however, since we only did a partial parameter sweep for it, it is hard to draw any conclusions from this. The simple ensemble clustering (utw2012sc1) performs the worst on three of the four measures. We would expect this method to perform better than LDA, since LDA is one of the clusterings used. However, we only did a coarse parameter sweep and may not have found the optimal weights yet. This also illustrates that the method is sensitive to the weights used. In addition, performance might be degraded by the clustering method used to derive the ensemble clustering from the similarity matrix.

We further analyze the performance of our best run (utw2012c1) by comparing it with the diversification run using LDA (utw2012lda). The per-query difference in ERR-IA@20 between LDA and the LM baseline (no diversification) is shown in Figure 1; a positive value means that the LDA run performed better. A similar graph comparing the two-layer ensemble model with the LM baseline is shown in Figure 2.

A query on which the two-layer ensemble run performed well in terms of ERR-IA@20 is query 154 'figs' (Find information on nutritional or health benefits of figs), with subtopics on nutritional/health benefits, recipes, varieties and growing figs.
The LDA run obtained an ERR-IA@20 of 0.384, the LM baseline a score of 0.402, and the utw2012c1 run a score of 0.430.

We expect that with a better reranking algorithm, the results can benefit more from improved clusterings. We also encountered some drawbacks of ensemble clustering. First, we found it to be sensitive to the weights that were used. Second, given a similarity matrix, we need to decide on a clustering method. More experiments are needed to assess which clustering method is the most suitable for this task.

Figure 1: Per-query difference in ERR-IA@20 between LDA (utw2012lda) and the LM baseline (x-axis: queries; y-axis: ΔERR-IA@20, from -0.06 to 0.06).

Figure 2: Per-query difference in ERR-IA@20 between two-layer ensemble clustering (utw2012c1) and the LM baseline (x-axis: queries; y-axis: ΔERR-IA@20, from -0.06 to 0.06).

6. CONCLUSION
In this paper we presented the participation of the University of Twente in the Web track of TREC 2012. This year, we focused on the diversity task. We used an ensemble clustering approach aimed at improving the quality of the document clusters. Our two-layer ensemble run performed better than the LDA based diversification and also better than a non-diversification run.

The main advantages of this approach are that it is simple, that it can be applied to any clustering algorithm, and that it can be used with any cluster-based reranking method. However, it introduces many more parameters, and during development we found the results to be sensitive to the specific parameter values used.

The results suggest that the reranking algorithm we used might not be effective enough, which limits the improvement that better clusters can bring. Future work should explore other reranking approaches. In addition, in our experiments we used the same weights for the different clustering methods and data sources across all queries. We expect that better results could be obtained by estimating the quality of the clusters at query time and adapting the weights per query.

7. ACKNOWLEDGEMENTS
This research was supported by the Netherlands Organization for Scientific Research, NWO, grants 640.005.002 and 639.022.809.

8. REFERENCES
[1] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 5–14. ACM, 2009.
[2] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, Mar. 2003.
[3] O. Chapelle, D. Metzler, Y. Zhang, and P. Grinspan. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, pages 621–630. ACM, 2009.
[4] C. L. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher, and I. MacKinnon. Novelty and diversity in information retrieval evaluation. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 659–666. ACM, 2008.
[5] C. L. A. Clarke, N. Craswell, and E. M. Voorhees. Overview of the TREC 2012 Web Track. In Proceedings of TREC, 2012.
[6] Z. Dou, S. Hu, K. Chen, R. Song, and J.-R. Wen. Multi-dimensional search result diversification. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pages 475–484. ACM, 2011.
[7] R. Ghaemi, M. N. Sulaiman, H. Ibrahim, and N. Mustapha. A survey: cluster ensemble techniques. In Proceedings of World Academy of Science, Engineering and Technology, 2009.
[8] J. He, V. Hollink, and A. P. de Vries. Combining implicit and explicit topic representations for result diversification. In Proceedings of SIGIR, 2012.
[9] D. Hiemstra and C. Hauff. MapReduce for information retrieval evaluation: "Let's quickly test this on 12 TB of data". In Multilingual and Multimodal Information Access Evaluation, Lecture Notes in Computer Science 6360, pages 64–69. Springer Verlag, 2010.
[10] K. Järvelin and J. Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), 20(4):422–446, Oct. 2002.
[11] T. K. Landauer, P. W. Foltz, and D. Laham. An introduction to latent semantic analysis. Discourse Processes, 25:259–284, 1998.
[12] R. L. Santos, C. Macdonald, and I. Ounis. Exploiting query reformulations for web search result diversification. In Proceedings of the 19th International Conference on World Wide Web, pages 881–890. ACM, 2010.
[13] A. Strehl and J. Ghosh. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583–617, Mar. 2003.

Ensemble clustering for result diversification


Nguyen, Dong

Edinburgh Research Explorer

     Edinburgh Research Explorer                                      Ensemble Clustering for Result DiversificationCitation for published version:Nguyen, D & Hiemstra, D 2012, Ensemble Clustering for Result Diversification. in Proceedings of TheTwenty-First Text REtrieval Conference, TREC 2012, Gaithersburg, Maryland, USA, November 6-9, 2012.Link:Link to publication record in Edinburgh Research ExplorerDocument Version:Publisher's PDF, also known as Version of recordPublished In:Proceedings of The Twenty-First Text REtrieval Conference, TREC 2012, Gaithersburg, Maryland, USA,November 6-9, 2012General rightsCopyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s)and / or other copyright owners and it is a condition of accessing these publications that users recognise andabide by the legal requirements associated with these rights.Take down policyThe University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorercontent complies with UK legislation. If you believe that the public display of this file breaches copyright pleasecontact openaccess@ed.ac.uk providing details, and we will remove access to the work immediately andinvestigate your claim.Download date: 05. Apr. 2019Ensemble Clustering for Result DiversificationDong NguyenHuman Media InteractionUniversity of Twented.nguyen@utwente.nlDjoerd HiemstraDatabase GroupUniversity of Twented.hiemstra@utwente.nlABSTRACTThis paper describes the participation of the University ofTwente in the Web track of TREC 2012. Our baseline ap-proach uses the Mirex toolkit, an open source tool that se-quantially scans all the documents. For result diversifica-tion, we experimented with improving the quality of clus-ters through ensemble clustering. We combined clusters ob-tained by different clustering methods (such as LDA andK-means) and clusters obtained by using different types ofdata (such as document text and anchor text). 
Our two-layer ensemble run performed better than the LDA based di-versification and also better than a non-diversification run.1. INTRODUCTIONWeb queries are often short and ambiguous. Result diversifi-cation, which aims to diversify queries to cover the multiplefacets or subtopics of a query, can improve the quality ofthese queries. A common strategy is to estimate the as-pects/subtopics of the top ranked documents, and rerankthese documents based on the estimated subtopics. Usuallythe subtopics are discovered by clustering the (top ranked)documents. Two well known methods to rerank results areIA-select [1] and xQuAD [12].Recently, researchers have explored combining multiple clus-terings to improve result diversification. Dou et al. [6] usedfour methods to obtain subtopics: anchor texts, query logs,search result clusters and hosts. They proposed a rerank-ing framework that incorporated the subtopics from thesemultiple dimensions. Contrary to our work, they only ex-perimented with clusterings obtained using different datasources, and not with different clustering methods for a par-ticular data source. He et al. [8] proposed a framework tocombine clusters of external resources to regularize implicitsubtopics based on pLSA using random walks.In this work, we explore the use of clustering ensembles toobtain better clusterings for result diversification. Cluster-ing ensembles can combine arbitrary clusterings, for exam-ple based on different data sources (e.g. full document text,anchor text, urls) or by using different clustering methods(such as k-means and LDA [2]). Experiments were done onCategory B of ClueWeb09.We first discuss related work and the track in which we par-ticipated. We then describe our experimental setup and dis-cuss the results. We conclude with a summary and suggestfuture work.2. WEB TRACKThe Web track of TREC 2012 consists of an adhoc and adiversity track. 
In this paper we focus on the diversity track.Participants initially only have access to plain queries. How-ever, the evaluation of the runs are evaluated using the fulltopic descriptions.Topics are classified either as ambiguous or faceted [5]. Am-biguous queries have several unrelated interpretations. Forexample, an ambiguous query in TREC 2010 was the sun,which could refer to the newspaper or the star in the solarsystem. Faceted queries have a primary interpretation. Thesubtopics then reflect several aspects related to this inter-pretation. For example, a faceted query was Neil Young,with aspects such as Neil Young’s albums, biographical in-formation, lyrics and tour dates.The adhoc task is evaluated using Expected Reciprocal Rank(ERR) [3]. The diversity track is evaluated using an In-tent Aware version [1] of Expected Reciprocal Rank (ERR-IA) where the score for the different subtopics are weightedby the probability of that specific subtopic for the givenquery. In the Web track, these measures are calculated atrank 20. In this paper we also report nDCG@20[10] andα-nDCG@20[4].3. AD HOC RETRIEVALIn this section we describe our approach to obtain a baselineranking. Next, we rerank these results to improve resultdiversification.We use Mirex [9],1 a tool that sequentially scans the doc-uments. Built on Hadoop, sequential scanning becomes aviable approach. In addition, it allows researchers to eas-ily experiment with different retrieval models, because theframework is easy to extend. Documents were scored usinga language model with linear interpolation smoothing and adocument length prior. We decided to only use anchor text,since previous experiments indicated that this gave high pre-cision and still enough recall for this task.We use λ = 0.90 as our baseline for further reranking, afterexperimenting with different smoothing parameters on datafrom the Web track of 2009, 2010 and 2011. The baselinerun is referred to as utw2012lm09.1http://mirex.sourceforge.net4. 
RESULT DIVERSIFICATIONWe make the simplifying assumption that a document onlybelongs to one topic. However, our described methods caneasily be extended to support methods where documentsbelong to multiple topics.4.1 ClusteringWe experiment with several methods to cluster the docu-ments obtained from the baseline ranking.MethodsI K-means. An iterative algorithm where documents areassigned to the cluster with the nearest mean.II Ward. A hierarchical clustering method, where clustersare merged to minimize the total within-cluster vari-ance.III LDA [2]. A generative model that aims to uncover la-tent topics.IV LSA [11]. A method based on singular value decompo-sition to uncover latent concepts.We also vary the data source.DataI Full text. Cluster documents based on the full text asextracted from the HTML.II Anchor. Cluster documents based on the anchor text.III Host. Documents are assigned to the same cluster whenthey come from the same host.In our experiments, we use the same number of clusters forall clustering methods (except for host clustering, for whichthe number of clusters is dependent on the results). Anoptimized system that would vary the number of clustersbased on the used clustering method or particular querycould potentially provide better results.4.2 Combining Multiple ClusteringsClustering ensembles combine multiple clusterings into a sin-gle clustering. Advantages include more robustness, novelty(a combined solution that may not have been found by theindividual clustering algorithms), more stability and confi-dence, and support of parallelization and scalability [7]. Inthis paper we cluster the documents using multiple meth-ods and across several dimensions, and combine these intoa single, more robust clustering using clustering ensembles.We apply the most simple method for combining multipleclusterings called the cluster based similarity partitioningalgorithm (CSPA) [13]. Two documents have a similarity of1 if they appear in the same cluster. 
As a result, for eachclustering we are able to create an n × n binary similaritymatrix (with n the number of documents). A similaritymatrix for a combined clustering is then just the average ofthe individual similarity matrices.We experiment with assigning weights to the specific clus-terings. For example, if a certain method has an assignedweight of 0.8, the similarity matrix will have a value of 0.8if the two documents appear in the same cluster (and zerootherwise). We set the weight such that the total weights forthe different clusterings add up to 1. We then apply a clus-tering method on this induced similarity matrix to make afinal clustering. In this paper we use hierarchical clusteringusing the centroid method, where distances are calculatedbased on the centroids of the clusters.The advantage of this approach is that it is independentof the clusterings used. In addition, by combining multipleclusterings into one new clustering, we are also free to chooseany reranking algorithm we like to use. And by findingweights for the different clusterings, we obtain insight intowhat dimension or which clustering methods are effective forresult diversification.I Two-layer Ensemble ClusteringWe experiment with an ensemble clustering over ensembleclusterings. The final clustering is an ensemble clusteringover three clusterings:1. Text clustering. Ensemble clustering based on cluster-ings obtained using K-means, Ward, LDA and LSA onthe full text.2. Anchor clustering. Ensemble clustering based on clus-terings obtained using K-means, Ward, LDA and LSAon the anchor text.3. Host clustering.II Simple Ensemble ClusteringPreliminary experiments on previous TREC data found LDAto be the most effective of the clustering algorithms. There-fore, in this variant we only use LDA as the clustering methodfor the text and anchor data:1. LDA text clustering.2. LDA anchor clustering.3. 
Host clustering.III One-layer Ensemble ClusteringThis ensemble clustering uses the same clusterings as theTwo-layer Ensemble Clustering, however the clusters are di-rectly combined into a new clustering, instead of applyingtwo layers. Thus we create an ensemble clustering over thefollowing:1. Text - K-means.. . .4. Text - LSA.5. Anchor - K-means.. . .8. Anchor - LSA.9. Host clustering.Run nDCG@20 ERR@20 ERR-IA@20 α-nDCG@20Language modeling baseline (utw2012lm09) 0.122 0.218 0.404 0.505Diversification using LDA (utw2012lda) 0.111 0.215 0.402 0.499Two-layer ensemble clustering (utw2012c1) 0.120 0.220 0.405 0.508Simple ensemble clustering (utw2012sc1) 0.107 0.207 0.398 0.498One-layer ensemble clustering (utw2012fc1) 0.113 0.219 0.400 0.497Two-layer ensemble clustering (utw2012c2) 0.117 0.219 0.399 0.499Table 1: Results4.3 Result RerankingWe use the IA-select algorithm to diversify search resultsbased on clusters [1]. The IA-select algorithm involves com-puting the conditional probability P (c|q) of a subtopic cgiven the query q and the quality value of a document dgiven a query and subtopic, V (d|q, c).The algorithm then selects documents based on the highestmarginal utility:g(d|q, c, S) =∑c∈C(d)U(c|q, S)V (d|q, c)Where U(c|q, S) is initially set to P (c|q) when no docu-ments are selected yet, and updated for every added doc-ument. After preliminary experiments, we decided to calcu-late V (d|q, c) by the score of the document for query q di-vided by the total score of all documents in cluster c. P (c|q)is the sum of the quality values of the documents in clusterc divided by the total sum.4.4 Submitted RunsFor all runs, we rerank the top 1000 documents using clusterswith 25 topics. Parameters were selected using a parametersweep over data from 2009, 2010 and 2011. 
We submittedthe following runs to the adhoc (AH) and diversity (DIV)task:I [AH] Baseline run (utw2012lm09) A run usinglanguage modeling with λ = 0.9 and the Mirex toolkit.II [DIV] LDA (utw2012lda) A run using LDA cluster-ing based on document text.III [DIV] Two-layer Ensemble Clustering (utw2012c1)Clustering based on anchor text (weight 0.8; ensemblecluster of k-means: 0.2, LDA: 0.6, LSA: 0.2) and text(weight: 0.2; ensemble cluster of Ward: 0.2, LDA: 0.8).IV [DIV] Simple Ensemble Clustering (utw2012sc1)Clustering based on host (weight 0.8) and LDA basedon text (weight 0.2).V [AH] One-layer Ensemble Clustering (utw2012fc1)Due to time constraints the weights for this methodwere not optimized. We used an ensemble clusteringover text using LDA (0.4) and anchor text using K-means (0.2) and Ward (0.4).VI [AH] Two-layer Ensemble Clustering (utw2012c2)The weights for this run were not optimized. Cluster-ing based on host (weight: 0.4), anchor text (weight0.2; ensemble cluster of k-means: 0.33, LDA: 0.5, LSA:0.166) and text (weight: 0.4; ensemble cluster of Ward:0.2, LDA: 0.8).5. RESULTSThe results are presented in Table 1. We find that the base-line, with no diversification, performs very well. We sus-pect that our reranking algorithm is not very effective, sinceonly clustering based on LDA performs worse than the non-diversification run. However, we do find that our two-layerensemble clustering (utw2012c1) performs better than LDAon all measures, and also better than the non-diversificationbaseline on all measures except nDCG@20. When compar-ing based on ERR-IA@20, it performs better than or equalto LDA for 32/50 queries, and for 29/50 queries when com-paring with the LM baseline.The one-layer ensemble clustering (utw2012fc1), performsnot as well as the two-layer ensemble clustering, however,since we only did a partial parameter sweep it is hard to drawany conclusions from this. The simple ensemble clustering(utw2012sc1) performs the worst. 
We would expect this method to perform better than LDA, since LDA is one of the clusterings it combines. However, we only did a coarse parameter sweep and may not yet have found the optimal weights. This also illustrates that the method is sensitive to the weights that are used. In addition, performance might be degraded by the clustering method used to obtain an ensemble clustering from the similarity matrix.

We further analyze the performance of our best run (utw2012c1) by comparing it with the diversification run using LDA (utw2012lda). The per-query difference in ERR-IA@20 between LDA and the LM baseline (no diversification) is shown in Figure 1; a positive value means that the LDA run performed better. A similar graph comparing the two-layer ensemble model with the LM baseline is shown in Figure 2.

A query that performed well in terms of ERR-IA@20 is query 154 'figs' (Find information on nutritional or health benefits of figs), with subtopics on nutritional/health benefits, recipes, varieties and growing figs. The LDA run obtained an ERR-IA@20 of 0.384, the LM baseline a score of 0.402, and the utw2012c1 run a score of 0.430.

We expect that with a better reranking algorithm, the results could benefit more from improved clusterings. We also encountered some drawbacks of ensemble clustering. First, we found it to be sensitive to the weights that were used. Second, given a similarity matrix, we need to decide on a clustering method. More experiments are needed to assess which clustering method is most suitable for this task.

[Figure 1: Difference in performance of LDA (utw2012lda) and the LM baseline. x-axis: queries; y-axis: ΔERR-IA@20, from -0.06 to 0.06.]

[Figure 2: Difference in performance of two-layer ensemble clustering (utw2012c1) and the LM baseline. x-axis: queries; y-axis: ΔERR-IA@20, from -0.06 to 0.06.]

6. CONCLUSION
In this paper we presented the participation of the University of Twente in the Web track of TREC 2012. This year, we focused on the diversity task.
We used an ensemble clustering approach aimed at improving the quality of the document clusters. Our ensemble run performed better than the LDA-based diversification and also better than a non-diversification run.

The main advantage of this approach is that it is simple: it can be applied to any clustering algorithm, and it is applicable to any reranking method based on clusters. However, it introduces many more parameters, and during development we found the results to be sensitive to the specific parameters used.

Results suggest that the reranking algorithm we used might not be effective enough, which limits the possible improvement when better clusters are obtained. Future work should explore other reranking approaches. In addition, in our experiments we used the same weights for the different clustering methods and data sources across all queries. We expect that better results could be obtained by estimating the quality of clusters at query time and adapting the weights per query.

7. ACKNOWLEDGEMENTS
This research was supported by the Netherlands Organization for Scientific Research, NWO, grants 640.005.002 and 639.022.809.

8. REFERENCES
[1] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 5–14. ACM, 2009.
[2] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, Mar. 2003.
[3] O. Chapelle, D. Metzler, Y. Zhang, and P. Grinspan. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, pages 621–630. ACM, 2009.
[4] C. L. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher, and I. MacKinnon. Novelty and diversity in information retrieval evaluation. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 659–666. ACM, 2008.
[5] C. L. A. Clarke, N. Craswell, and E. M. Voorhees. Overview of the TREC 2012 Web Track. In Proceedings of TREC, 2012.
[6] Z. Dou, S. Hu, K. Chen, R. Song, and J.-R. Wen. Multi-dimensional search result diversification. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pages 475–484. ACM, 2011.
[7] R. Ghaemi, M. N. Sulaiman, H. Ibrahim, and N. Mustapha. A survey: cluster ensemble techniques. In Proceedings of World Academy of Science, Engineering and Technology, 2009.
[8] J. He, V. Hollink, and A. P. de Vries. Combining implicit and explicit topic representations for result diversification. In Proceedings of SIGIR, 2012.
[9] D. Hiemstra and C. Hauff. MapReduce for information retrieval evaluation: 'Let's quickly test this on 12 TB of data'. In Multilingual and Multimodal Information Access Evaluation, Lecture Notes in Computer Science 6360, pages 64–69. Springer Verlag, 2010.
[10] K. Järvelin and J. Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), 20(4):422–446, Oct. 2002.
[11] T. K. Landauer, P. W. Foltz, and D. Laham. An introduction to latent semantic analysis. Discourse Processes, 25:259–284, 1998.
[12] R. L. Santos, C. Macdonald, and I. Ounis. Exploiting query reformulations for web search result diversification. In Proceedings of the 19th International Conference on World Wide Web, pages 881–890. ACM, 2010.
[13] A. Strehl and J. Ghosh. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583–617, Mar. 2003.
