WIDIT in TREC-2006 Blog track
The Web Information Discovery Integrated Tool (WIDIT) Laboratory at the Indiana University School of Library and Information Science participated in the Blog track's opinion task in TREC-2006. The goal of the opinion task is to "uncover the public sentiment towards a given entity/target", which involves not only retrieving topically relevant blogs but also identifying those that contain opinions about the target. To further complicate the matter, the blog test collection contains a considerable amount of noise, such as blogs with non-English content and non-blog content (e.g., advertisements, navigational text), which may misdirect retrieval systems.
Based on our hypothesis that noise reduction (e.g., exclusion of non-English blogs and navigational text) would improve both on-topic and opinion retrieval performance, we explored various noise reduction approaches that can effectively eliminate the noise in blog data without inadvertently excluding valid content. After creating two separate indexes (with and without noise) to assess the noise reduction effect, we tackled the opinion blog retrieval task by breaking it down into two sequential subtasks: on-topic retrieval followed by opinion classification. Our opinion retrieval approach was to first apply traditional IR methods to retrieve on-topic blogs, and then boost the ranks of opinionated blogs based on opinion scores generated by opinion assessment methods. Our opinion module consists of the Opinion Term Module, which identifies opinions based on the frequency of opinion terms (i.e., terms that occur frequently only in opinionated blogs); the Rare Term Module, which uses uncommon/rare terms (e.g., "sooo good") for opinion classification; the IU Module, which uses IU ("I" and "you") collocations; and the Adjective-Verb Module, which uses computational linguistics' distributional similarity approach to learn subjective language from training data.
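The two-stage pipeline described above (topical retrieval, then rank boosting by opinion score) can be sketched as a simple re-ranking step. This is a minimal illustration, not WIDIT's actual scoring formula: the linear interpolation and the weight alpha are assumptions.

```python
def boost_by_opinion(topical, opinion, alpha=0.8):
    """Re-rank retrieved blogs: interpolate a normalized topical
    retrieval score with an opinion score, then sort descending.
    topical, opinion: dicts mapping doc id -> score in [0, 1].
    alpha is a hypothetical mixing weight, not a tuned value."""
    combined = {
        doc: alpha * topical[doc] + (1 - alpha) * opinion.get(doc, 0.0)
        for doc in topical
    }
    return sorted(combined, key=combined.get, reverse=True)

# A highly opinionated but slightly less topical blog (d2) overtakes d1.
ranking = boost_by_opinion(
    {"d1": 0.9, "d2": 0.8, "d3": 0.5},
    {"d2": 0.9, "d3": 0.1},
)
```

Only documents already returned by the topical stage are re-ranked, mirroring the retrieve-then-boost order of the two subtasks.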
WIDIT in TREC-2007 Blog Track: Combining Lexicon-based Methods to Detect Opinionated Blogs
In TREC-2007, Indiana University's WIDIT Lab participated in the Blog track's opinion task and the polarity subtask. For the opinion task, whose goal is to "uncover the public sentiment towards a given entity/target", we focused on combining multiple sources of evidence to detect opinionated blog postings. Since detecting opinionated blogs on a given topic (i.e., entity/target) involves not only retrieving topically relevant blogs but also identifying those that contain opinions about the target, our approach to the opinion finding task consisted of first applying traditional IR methods to retrieve on-topic blogs and then boosting the ranks of opinionated blogs based on combined opinion scores generated by multiple opinion detection methods. The key idea underlying our opinion detection method is to rely on a variety of complementary sources of evidence rather than trying to optimize a single approach. This fusion approach to opinionated blog detection is motivated by our past experience, which suggested that no single approach, whether lexicon-based or classifier-driven, is well suited for the blog opinion retrieval task. To accomplish the polarity subtask, which requires classifying the retrieved blogs as positive or negative in orientation, our opinion detection module was extended to generate polarity scores to be used for polarity determination.
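The evidence-combination idea can be sketched as a weighted score fusion followed by a sign decision for the polarity subtask. The module names, weights, and normalized weighted sum below are illustrative assumptions, not the scores WIDIT actually computed.

```python
def fuse_opinion_scores(module_scores, weights):
    """Weighted linear fusion of per-module opinion scores for one blog.
    Weights are hypothetical; in practice they would be tuned on
    training topics rather than fixed by hand."""
    total = sum(weights.values())
    return sum(weights[m] * module_scores.get(m, 0.0) for m in weights) / total

def polarity(positive_score, negative_score):
    """Polarity subtask: pick the orientation with stronger evidence."""
    return "positive" if positive_score >= negative_score else "negative"

# Example fusion over four assumed evidence sources.
fused = fuse_opinion_scores(
    {"opinion_term": 0.7, "rare_term": 0.2, "iu": 0.5, "adj_verb": 0.6},
    {"opinion_term": 2.0, "rare_term": 1.0, "iu": 1.0, "adj_verb": 1.0},
)
```

Because the fused score is a convex combination of per-module scores in [0, 1], it stays in [0, 1] and can be used directly as a rank-boosting term.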
Fusion Approach to Finding Opinionated Blogs
In this paper, we describe a fusion approach to finding opinionated blog postings. Our approach to opinion blog retrieval consisted of first applying traditional IR methods to retrieve on-topic blogs and then boosting the ranks of opinionated blogs based on combined opinion scores generated by multiple assessment methods. Our opinion module is composed of the Opinion Term Module, which identifies opinions based on the frequency of opinion terms (i.e., terms that occur frequently in opinionated blogs); the Rare Term Module, which uses uncommon/rare terms (e.g., "sooo good") for opinion classification; the IU Module, which uses IU ("I" and "you") collocations; and the Adjective-Verb Module, which uses computational linguistics' distributional similarity approach to learn subjective language from training data.
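As a concrete (and deliberately simplified) illustration of one evidence source, a toy IU-style feature can count how often first- and second-person pronouns appear, since "I"/"you" collocations tend to mark conversational, opinionated writing. The pattern and normalization here are assumptions for illustration, not the IU Module's actual collocation list.

```python
import re

def iu_score(text):
    """Toy IU-style evidence: the fraction of tokens that are "I" or
    "you" and are followed by at least one more token (a crude stand-in
    for a real collocation pattern)."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    hits = sum(1 for t in tokens[:-1] if t in ("i", "you"))
    return hits / max(len(tokens), 1)
```

A real module would match specific collocations (e.g., a pronoun followed by a verb of opinion) rather than bare pronoun counts.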
Fusion Approach to Finding Opinions in Blogosphere
In this paper, we describe a fusion approach to finding opinions about a given target in blog postings. We tackled the opinion blog retrieval task by breaking it down into two sequential subtasks: on-topic retrieval followed by opinion classification. Our opinion retrieval approach was to first apply traditional IR methods to retrieve on-topic blogs, and then boost the ranks of opinionated blogs using combined opinion scores generated by four opinion assessment methods. Our opinion module consists of the Opinion Term Module, which identifies opinions based on the frequency of opinion terms (i.e., terms that occur frequently only in opinionated blogs); the Rare Term Module, which uses uncommon/rare terms (e.g., "sooo good") for opinion classification; the IU Module, which uses IU ("I" and "you") collocations; and the Adjective-Verb Module, which uses computational linguistics' distributional similarity approach to learn subjective language from training data.
This paper was presented by the author(s) at the International Conference on Weblogs and Social Media on March 27, 2007, in Boulder, Colorado, U.S.A. This paper has also been published as: Yang, K., Yu, N., Valerio, A., Zhang, H., & Ke, W. (2007). Fusion approach to finding opinionated blogs. Proceedings of the American Society for Information Science and Technology, 44(1), 1-14. doi: 10.1002/meet.1450440254
Keywords: Opinion Identification, Method Fusion, Rank-boosting, Dynamic Tuning
Internet multimedia information retrieval based on link analysis.
Chan Ka Yan. Thesis (M.Phil.), Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves i-iv (3rd gp.)). Abstracts in English and Chinese.
Table of contents:
Acknowledgement; Abstract (English); Abstract (Chinese); Table of Content; List of Figures; List of Tables
Chapter 1. Introduction: 1.1 Background; 1.2 Importance of hyperlink analysis
Chapter 2. Related Work: 2.1 Crawling (2.1.1 Crawling method for HITS Algorithm; 2.1.2 Crawling method for PageRank Algorithm); 2.2 Ranking (2.2.1 PageRank Algorithm; 2.2.2 HITS Algorithm; 2.2.3 PageRank-HITS Algorithm; 2.2.4 SALSA Algorithm; 2.2.5 Average and Sim; 2.2.6 Netscape Approach; 2.2.7 Cocitation Approach); 2.3 Multimedia Information Retrieval (2.3.1 Octopus)
Chapter 3. Research Methodology: 3.1 Research Objective; 3.2 Proposed Crawling Methodology (3.2.1 Collecting Media Objects; 3.2.2 Filtering the collection of links); 3.3 Proposed Ranking Methodology (3.3.1 Identifying the factors that affect ranking; 3.3.2 Modified Ranking Algorithms)
Chapter 4. Experimental Results and Discussions: 4.1 Experimental Setup (4.1.1 Assumptions for the Experiment); 4.2 Some Observations from the Experiment (4.2.1 Dangling links; 4.2.2 "Good Hub = bad Authority, Good Authority = bad Hub?"; 4.2.3 Setting of weights); 4.3 Discussion on Experimental Results (4.3.1 Relevance; 4.3.2 Precision and recall; 4.3.3 Significance testing; 4.3.4 Ranking); 4.4 Limitations and Difficulties (4.4.1 Small size of the base set; 4.4.2 Parameter settings; 4.4.3 Unable to remove all the meaningless links from the base set; 4.4.4 Resource and time consumption; 4.4.5 TKC Effect; 4.4.6 Continuously updated formats of HTML code and file types; 4.4.7 The object citation habit of authors)
Chapter 5. Conclusion: 5.1 Contribution of our Methodology; 5.2 Possible Improvement; 5.3 Conclusion
Bibliography; Appendix (A.1 One-tailed paired t-test results; A.2 ANOVA results)
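Since the ranking chapters build on PageRank and HITS, a compact reference implementation of the basic HITS update may help as context (this is the textbook algorithm, not the thesis's modified ranking algorithms): authorities accumulate the hub scores of pages linking to them, hubs accumulate the authority scores of pages they link to, with L2 normalization after each round.

```python
def hits(links, iterations=50):
    """Basic HITS over a link graph.
    links: dict mapping page -> list of pages it points to."""
    pages = set(links) | {q for out in links.values() for q in out}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum of hub scores of in-linking pages.
        auth = {p: sum(hub[q] for q in links if p in links.get(q, []))
                for p in pages}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub update: sum of authority scores of out-linked pages.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return auth, hub
```

On a toy graph where pages a and b both link to c, the update converges with c as the sole authority and a, b as equal hubs, which also illustrates the "good hub vs. good authority" tension discussed in Chapter 4.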
Combining text- and link-based retrieval methods for Web IR
uncvmss, uncvsmm, uncfsls, uncfslm – WT10g automatic topic relevance task runs
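Runs like these combine text-based and link-based evidence for each document. One common recipe, assumed here purely for illustration (it is not necessarily the combination formula behind these runs), is min-max normalization of each score list followed by linear interpolation with a tuned weight lam.

```python
def minmax(scores):
    """Min-max normalize a dict of doc -> raw score into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def combine(text_scores, link_scores, lam=0.7):
    """Interpolate normalized text-retrieval and link-analysis scores.
    lam is a hypothetical mixing weight, typically tuned on training
    topics; documents missing a link score default to 0."""
    t, l = minmax(text_scores), minmax(link_scores)
    return {d: lam * t[d] + (1 - lam) * l.get(d, 0.0) for d in t}

fused = combine({"d1": 10.0, "d2": 5.0}, {"d1": 0.2, "d2": 0.8}, lam=0.7)
```

Normalizing before interpolation matters because raw text-retrieval scores and link scores live on very different scales.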