6 research outputs found

    Proximity-based approaches to blog opinion retrieval

    Recent years have seen the rapid growth of social media platforms that enable people to express their thoughts and perceptions on the web and share them with other users. Many people write their opinions about products, movies, people or events on blogs, forums or review sites. This so-called user-generated content is a good source of users’ opinions, and mining it can be very useful for a wide variety of applications that require an understanding of public opinion about a concept. Blogs are one of the most popular and influential social media. The rapid growth in the popularity of blogs, the ability of bloggers to write about different topics and the possibility of getting feedback from other users make the blogosphere a valuable source of opinions on different topics. To facilitate access to such opinionated content, new retrieval models called opinion retrieval models are necessary. Opinion retrieval models aim at finding documents that are relevant to the topic of a query and express an opinion about it. However, opinion retrieval in blogs is challenging for a number of reasons. The first reason is that blogs are not limited to a single topic; they can be about anything that is of interest to an author. Therefore, a large number of blog posts may not be relevant to the topic of a query. The second reason is that a blog post relevant to a query can also be relevant to a number of other topics and express an opinion about one of the non-query topics. Therefore, an opinion retrieval system should first locate the documents relevant to a query and then score them based on the opinion that is targeted at the query in each relevant document. Finally, because blogs are not limited to a single domain, an opinion retrieval model should be general enough to retrieve posts related to different topics in different domains. In this thesis, we focus on the opinion retrieval task in blogs. Our aim is to propose methods that improve blog post opinion retrieval performance.
    To this end, we consider an opinion retrieval model to consist of three components: relevance scoring, opinion scoring and score combination. In this thesis we focus on the opinion scoring and combination components and propose methods for better handling these two important steps. We evaluate our proposed methods on the standard TREC collection and provide evidence that they are indeed helpful and improve on the performance of state-of-the-art techniques.

    Building queries for prior-art search

    Prior-art search is a critical step in the examination procedure of a patent application. This study explores automatic query generation from patent documents to facilitate the time-consuming and labor-intensive search for relevant patents. It is essential for this task to identify discriminative terms in different fields of a query patent, which enables us to distinguish relevant patents from non-relevant patents. To this end, we investigate the distribution of terms occurring in different fields of the query patent and compare these distributions with the rest of the collection using language modeling estimation techniques. We experiment with term weighting based on the Kullback-Leibler divergence between the query patent and the collection, and also with parsimonious language model estimation. Both of these techniques promote words that are common in the query patent and rare in the collection. We also incorporate the classification assigned to patent documents into our model, to exploit available human judgements in the form of a hierarchical classification. Experimental results show that retrieval using the generated queries is effective, particularly in terms of recall, and that the patent description is the most useful source for extracting query terms.
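
    The Kullback-Leibler term weighting mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `kl_term_weights`, the whitespace-tokenized inputs and the add-one smoothing of the collection model are all assumptions made for illustration. Each term is scored by its pointwise contribution to the divergence, P(t|q) * log(P(t|q) / P(t|C)), so terms frequent in the query patent and rare in the collection score highest.

```python
import math
from collections import Counter


def kl_term_weights(query_tokens, collection_tokens):
    """Score each query-patent term by its pointwise KL contribution.

    Hypothetical helper: the add-one smoothing of the collection model
    is an assumption, not something stated in the abstract.
    """
    q_counts = Counter(query_tokens)
    c_counts = Counter(collection_tokens)
    q_total = sum(q_counts.values())
    c_total = sum(c_counts.values())
    vocab_size = len(set(q_counts) | set(c_counts))

    weights = {}
    for term, count in q_counts.items():
        p_q = count / q_total  # maximum-likelihood P(t | query patent)
        # Add-one smoothing keeps P(t | collection) strictly positive.
        p_c = (c_counts[term] + 1) / (c_total + vocab_size)
        weights[term] = p_q * math.log(p_q / p_c)
    return weights


query = ["rotor", "rotor", "blade", "the"]
collection = ["the"] * 50 + ["a"] * 40 + ["rotor"] + ["blade"] * 5
weights = kl_term_weights(query, collection)
top_terms = sorted(weights, key=weights.get, reverse=True)
```

    Sorting by weight and keeping the top-ranked terms gives a candidate query; terms that are relatively more common in the collection than in the query patent receive negative weights and fall to the bottom.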

    Proximity-based opinion retrieval

    Blog post opinion retrieval aims at finding blog posts that are relevant and opinionated about a user’s query. In this paper we propose a simple probabilistic model for assigning relevant opinion scores to documents. The key problem is how to capture opinion expressions in the document that are related to the query topic. Current solutions enrich general opinion lexicons by finding query-specific opinion lexicons using pseudo-relevance feedback on external corpora or the collection itself. In this paper we use a general opinion lexicon and propose using proximity information in order to capture opinion term relatedness to the query. We propose a proximity-based opinion propagation method to calculate the opinion density at each point in a document. The opinion density at the position of a query term in the document can then be considered as the probability of opinion about the query term at that position. The effect of different kernels for capturing the proximity is also discussed. Experimental results on the BLOG06 dataset show that the proposed method provides significant improvement over standard TREC baselines and achieves a 2.5% increase in MAP over the best performing run in the TREC 2008 blog track.
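
    The opinion propagation idea described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's model: the Gaussian kernel, the `sigma` bandwidth, the lexicon format (term mapped to an opinion weight) and the averaging over query-term positions are all choices made here, and the resulting density is not normalized into a probability.

```python
import math


def gaussian_kernel(distance, sigma):
    """Unnormalized Gaussian kernel over token distance (assumed choice)."""
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def opinion_density(tokens, opinion_lexicon, position, sigma=5.0):
    """Kernel-weighted sum of opinion-term weights propagated to `position`."""
    total = 0.0
    for j, token in enumerate(tokens):
        weight = opinion_lexicon.get(token, 0.0)
        if weight:
            total += weight * gaussian_kernel(position - j, sigma)
    return total


def query_opinion_score(tokens, query_terms, opinion_lexicon, sigma=5.0):
    """Average opinion density over all positions where a query term occurs.

    Averaging is an assumption; the abstract does not say how positions
    are aggregated into a document score.
    """
    positions = [i for i, token in enumerate(tokens) if token in query_terms]
    if not positions:
        return 0.0
    densities = [opinion_density(tokens, opinion_lexicon, i, sigma) for i in positions]
    return sum(densities) / len(densities)


# Hypothetical document: "great" sits next to "camera" but far from "phone".
doc = ["camera", "great"] + ["filler"] * 20 + ["phone"]
lexicon = {"great": 1.0}
near = query_opinion_score(doc, {"camera"}, lexicon, sigma=2.0)
far = query_opinion_score(doc, {"phone"}, lexicon, sigma=2.0)
```

    With this setup the opinion term contributes strongly to the query term beside it and almost nothing to the distant one, which is the behaviour the proximity argument relies on. Swapping `gaussian_kernel` for another decay function is one way to explore the effect of different kernels.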

    Leveraging conceptual lexicon: query disambiguation using proximity information for patent retrieval

    Patent prior art search is a task in patent retrieval where the goal is to rank documents which describe prior art work related to a patent application. One of the main properties of patent retrieval is that the query topic is a full patent application and does not represent a focused information need. This query-by-document nature of patent retrieval introduces new challenges and requires new investigations specific to this problem. Researchers have addressed this problem by considering different information resources for query reduction and query disambiguation. However, previous work has not fully studied the effect of using proximity information and exploiting domain-specific resources for performing query disambiguation. In this paper, we first reduce the query document by taking the first claim of the document itself. We then build a query-specific patent lexicon based on definitions of the International Patent Classification (IPC). We study how to expand queries by selecting expansion terms from the lexicon that are focused on the query topic. The key problem is how to capture whether an expansion term is focused on the query topic or not. We address this problem by exploiting proximity information. We assign high weights to expansion terms appearing closer to query terms, based on the intuition that terms closer to query terms are more likely to be related to the query topic. Experimental results on two patent retrieval datasets show that the proposed method is effective and robust for query expansion, significantly outperforming standard pseudo-relevance feedback (PRF) and existing baselines in patent retrieval.
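
    The proximity weighting of expansion terms described in this abstract might look like the sketch below. It is only a sketch under assumptions: the function `rank_expansion_terms`, the triangular kernel, the `width` window and the idea of scanning a tokenized definition text are all hypothetical, since the abstract does not give the exact scoring formula.

```python
from collections import defaultdict


def triangle_kernel(distance, width):
    """Linear decay that reaches zero at `width` tokens (assumed kernel)."""
    return max(0.0, 1.0 - abs(distance) / width)


def rank_expansion_terms(lexicon_tokens, query_terms, width=5, top_k=3):
    """Rank candidate expansion terms by their proximity to query terms.

    `lexicon_tokens` stands in for a tokenized lexicon definition text;
    this input format is an assumption made for illustration.
    """
    query_positions = [i for i, t in enumerate(lexicon_tokens) if t in query_terms]
    scores = defaultdict(float)
    for j, token in enumerate(lexicon_tokens):
        if token in query_terms:
            continue  # query terms are not candidates for expansion
        for i in query_positions:
            scores[token] += triangle_kernel(i - j, width)
    # Keep only terms inside some kernel window; break ties alphabetically.
    ranked = sorted(
        ((t, s) for t, s in scores.items() if s > 0),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked[:top_k]


definition = "valve actuator hydraulic piston seal and other unrelated text".split()
ranked = rank_expansion_terms(definition, {"valve"}, width=3)
expansion_terms = [term for term, _ in ranked]
```

    Terms within the window around a query term receive a positive weight that shrinks with distance, and terms outside every window are dropped. A practical version would also need stopword filtering, which is omitted here.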