WIDIT in TREC-2006 Blog track
The Web Information Discovery Integrated Tool (WIDIT) Laboratory at the Indiana University School of Library and Information Science participated in the Blog track's opinion task in TREC-2006. The goal of the opinion task is to "uncover the public sentiment towards a given entity/target", which involves not only retrieving topically relevant blogs but also identifying those that contain opinions about the target. To further complicate the matter, the blog test collection contains a considerable amount of noise, such as blogs with non-English content and non-blog content (e.g., advertisements, navigational text), which may misdirect retrieval systems.
Based on our hypothesis that noise reduction (e.g., exclusion of non-English blogs and navigational text) would improve both on-topic and opinion retrieval performance, we explored various noise reduction approaches that can effectively eliminate the noise in blog data without inadvertently excluding valid content. After creating two separate indexes (with and without noise) to assess the noise reduction effect, we tackled the opinion blog retrieval task by breaking it down into two sequential subtasks: on-topic retrieval followed by opinion classification. Our opinion retrieval approach was to first apply traditional IR methods to retrieve on-topic blogs, and then boost the ranks of opinionated blogs based on opinion scores generated by opinion assessment methods. Our opinion module consists of the Opinion Term Module, which identifies opinions based on the frequency of opinion terms (i.e., terms that occur frequently only in opinionated blogs); the Rare Term Module, which uses uncommon/rare terms (e.g., "sooo good") for opinion classification; the IU Module, which uses IU ("I" and "you") collocations; and the Adjective-Verb Module, which uses computational linguistics' distributional similarity approach to learn subjective language from training data.
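The two-stage pipeline described above (topical retrieval, then rank boosting by opinion score) can be sketched as a simple re-ranking step. This is a minimal illustration, not WIDIT's actual scoring formula: the linear interpolation and the weight alpha are assumptions.

```python
def boost_by_opinion(topical, opinion, alpha=0.8):
    """Re-rank retrieved blogs: interpolate a normalized topical
    retrieval score with an opinion score, then sort descending.
    topical, opinion: dicts mapping doc id -> score in [0, 1].
    alpha is a hypothetical mixing weight, not a tuned value."""
    combined = {
        doc: alpha * topical[doc] + (1 - alpha) * opinion.get(doc, 0.0)
        for doc in topical
    }
    return sorted(combined, key=combined.get, reverse=True)

# A highly opinionated but slightly less topical blog (d2) overtakes d1.
ranking = boost_by_opinion(
    {"d1": 0.9, "d2": 0.8, "d3": 0.5},
    {"d2": 0.9, "d3": 0.1},
)
```

Only documents already returned by the topical stage are re-ranked, mirroring the retrieve-then-boost order of the two subtasks.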
WIDIT in TREC-2007 Blog Track: Combining Lexicon-based Methods to Detect Opinionated Blogs
In TREC-2007, Indiana University's WIDIT Lab participated in the Blog track's opinion task and the polarity subtask. For the opinion task, whose goal is to "uncover the public sentiment towards a given entity/target", we focused on combining multiple sources of evidence to detect opinionated blog postings. Since detecting opinionated blogs on a given topic (i.e., entity/target) involves not only retrieving topically relevant blogs but also identifying those that contain opinions about the target, our approach to the opinion finding task consisted of first applying traditional IR methods to retrieve on-topic blogs and then boosting the ranks of opinionated blogs based on combined opinion scores generated by multiple opinion detection methods. The key idea underlying our opinion detection method is to rely on a variety of complementary sources of evidence rather than trying to optimize a single approach. This fusion approach to opinionated blog detection is motivated by our past experience, which suggested that no single approach, whether lexicon-based or classifier-driven, is well suited for the blog opinion retrieval task. To accomplish the polarity subtask, which requires classifying the retrieved blogs as positive or negative in orientation, our opinion detection module was extended to generate polarity scores to be used for polarity determination.
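The evidence-combination idea can be sketched as a weighted score fusion followed by a sign decision for the polarity subtask. The module names, weights, and normalized weighted sum below are illustrative assumptions, not the scores WIDIT actually computed.

```python
def fuse_opinion_scores(module_scores, weights):
    """Weighted linear fusion of per-module opinion scores for one blog.
    Weights are hypothetical; in practice they would be tuned on
    training topics rather than fixed by hand."""
    total = sum(weights.values())
    return sum(weights[m] * module_scores.get(m, 0.0) for m in weights) / total

def polarity(positive_score, negative_score):
    """Polarity subtask: pick the orientation with stronger evidence."""
    return "positive" if positive_score >= negative_score else "negative"

# Example fusion over four assumed evidence sources.
fused = fuse_opinion_scores(
    {"opinion_term": 0.7, "rare_term": 0.2, "iu": 0.5, "adj_verb": 0.6},
    {"opinion_term": 2.0, "rare_term": 1.0, "iu": 1.0, "adj_verb": 1.0},
)
```

Because the fused score is a convex combination of per-module scores in [0, 1], it stays in [0, 1] and can be used directly as a rank-boosting term.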
Fusion Approach to Finding Opinionated Blogs
In this paper, we describe a fusion approach to finding opinionated blog postings. Our approach to opinion blog retrieval consisted of first applying traditional IR methods to retrieve on-topic blogs and then boosting the ranks of opinionated blogs based on combined opinion scores generated by multiple assessment methods. Our opinion module is composed of the Opinion Term Module, which identifies opinions based on the frequency of opinion terms (i.e., terms that occur frequently in opinionated blogs); the Rare Term Module, which uses uncommon/rare terms (e.g., "sooo good") for opinion classification; the IU Module, which uses IU ("I" and "you") collocations; and the Adjective-Verb Module, which uses computational linguistics' distributional similarity approach to learn subjective language from training data.
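As a concrete (and deliberately simplified) illustration of one evidence source, a toy IU-style feature can count how often first- and second-person pronouns appear, since "I"/"you" collocations tend to mark conversational, opinionated writing. The pattern and normalization here are assumptions for illustration, not the IU Module's actual collocation list.

```python
import re

def iu_score(text):
    """Toy IU-style evidence: the fraction of tokens that are "I" or
    "you" and are followed by at least one more token (a crude stand-in
    for a real collocation pattern)."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    hits = sum(1 for t in tokens[:-1] if t in ("i", "you"))
    return hits / max(len(tokens), 1)
```

A real module would match specific collocations (e.g., a pronoun followed by a verb of opinion) rather than bare pronoun counts.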
Fusion Approach to Finding Opinions in Blogosphere
In this paper, we describe a fusion approach to finding opinions about a given target in blog postings. We tackled the opinion blog retrieval task by breaking it down into two sequential subtasks: on-topic retrieval followed by opinion classification. Our opinion retrieval approach was to first apply traditional IR methods to retrieve on-topic blogs, and then boost the ranks of opinionated blogs using combined opinion scores generated by four opinion assessment methods. Our opinion module consists of the Opinion Term Module, which identifies opinions based on the frequency of opinion terms (i.e., terms that occur frequently only in opinionated blogs); the Rare Term Module, which uses uncommon/rare terms (e.g., "sooo good") for opinion classification; the IU Module, which uses IU ("I" and "you") collocations; and the Adjective-Verb Module, which uses computational linguistics' distributional similarity approach to learn subjective language from training data.
This paper was presented by the author(s) at the International Conference on Weblogs and Social Media on March 27, 2007, in Boulder, Colorado, U.S.A. This paper has also been published as: Yang, K., Yu, N., Valerio, A., Zhang, H., & Ke, W. (2007). Fusion approach to finding opinionated blogs. Proceedings of the American Society for Information Science and Technology, 44(1), 1-14. doi: 10.1002/meet.1450440254
Keywords: Opinion Identification, Method Fusion, Rank-boosting, Dynamic Tuning
Internet multimedia information retrieval based on link analysis.
Chan Ka Yan. Thesis (M.Phil.), Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves i-iv (3rd gp.)). Abstracts in English and Chinese.
Table of contents:
Acknowledgement; Abstract (English); Abstract (Chinese); Table of Content; List of Figures; List of Tables
Chapter 1. Introduction: 1.1 Background; 1.2 Importance of hyperlink analysis
Chapter 2. Related Work: 2.1 Crawling (2.1.1 Crawling method for HITS Algorithm; 2.1.2 Crawling method for PageRank Algorithm); 2.2 Ranking (2.2.1 PageRank Algorithm; 2.2.2 HITS Algorithm; 2.2.3 PageRank-HITS Algorithm; 2.2.4 SALSA Algorithm; 2.2.5 Average and Sim; 2.2.6 Netscape Approach; 2.2.7 Cocitation Approach); 2.3 Multimedia Information Retrieval (2.3.1 Octopus)
Chapter 3. Research Methodology: 3.1 Research Objective; 3.2 Proposed Crawling Methodology (3.2.1 Collecting Media Objects; 3.2.2 Filtering the collection of links); 3.3 Proposed Ranking Methodology (3.3.1 Identifying the factors that affect ranking; 3.3.2 Modified Ranking Algorithms)
Chapter 4. Experimental Results and Discussions: 4.1 Experimental Setup (4.1.1 Assumptions for the Experiment); 4.2 Some Observations from the Experiment (4.2.1 Dangling links; 4.2.2 "Good Hub = bad Authority, Good Authority = bad Hub?"; 4.2.3 Setting of weights); 4.3 Discussion on Experimental Results (4.3.1 Relevance; 4.3.2 Precision and recall; 4.3.3 Significance testing; 4.3.4 Ranking); 4.4 Limitations and Difficulties (4.4.1 Small size of the base set; 4.4.2 Parameter settings; 4.4.3 Unable to remove all the meaningless links from the base set; 4.4.4 Resource and time consumption; 4.4.5 TKC Effect; 4.4.6 Continuously updated formats of HTML code and file types; 4.4.7 The object citation habit of authors)
Chapter 5. Conclusion: 5.1 Contribution of our Methodology; 5.2 Possible Improvement; 5.3 Conclusion
Bibliography; Appendix (A.1 One-tailed paired t-test results; A.2 ANOVA results)
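Since the ranking chapters build on PageRank and HITS, a compact reference implementation of the basic HITS update may help as context (this is the textbook algorithm, not the thesis's modified ranking algorithms): authorities accumulate the hub scores of pages linking to them, hubs accumulate the authority scores of pages they link to, with L2 normalization after each round.

```python
def hits(links, iterations=50):
    """Basic HITS over a link graph.
    links: dict mapping page -> list of pages it points to."""
    pages = set(links) | {q for out in links.values() for q in out}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum of hub scores of in-linking pages.
        auth = {p: sum(hub[q] for q in links if p in links.get(q, []))
                for p in pages}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub update: sum of authority scores of out-linked pages.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return auth, hub
```

On a toy graph where pages a and b both link to c, the update converges with c as the sole authority and a, b as equal hubs, which also illustrates the "good hub vs. good authority" tension discussed in Chapter 4.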
Combining text- and link-based retrieval methods for Web IR
uncvmss, uncvsmm, uncfsls, uncfslm – WT10g automatic topic relevance task runs
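Runs like these combine text-based and link-based evidence for each document. One common recipe, assumed here purely for illustration (it is not necessarily the combination formula behind these runs), is min-max normalization of each score list followed by linear interpolation with a tuned weight lam.

```python
def minmax(scores):
    """Min-max normalize a dict of doc -> raw score into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def combine(text_scores, link_scores, lam=0.7):
    """Interpolate normalized text-retrieval and link-analysis scores.
    lam is a hypothetical mixing weight, typically tuned on training
    topics; documents missing a link score default to 0."""
    t, l = minmax(text_scores), minmax(link_scores)
    return {d: lam * t[d] + (1 - lam) * l.get(d, 0.0) for d in t}

fused = combine({"d1": 10.0, "d2": 5.0}, {"d1": 0.2, "d2": 0.8}, lam=0.7)
```

Normalizing before interpolation matters because raw text-retrieval scores and link scores live on very different scales.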