17 research outputs found
Dialog Navigator: A Navigation System from Vague Questions to Specific Answers Based on Real-World Text Collections (ダむアログナビ実䞖界テキスト集合に基づく挠然ずした質問から具䜓的な回答ぞのナビゲヌションシステム)
As computers and their networks continue to develop, our day-to-day lives are surrounded by increasingly complex instruments, and we often have questions about using them. At the same time, large collections of texts that answer such questions are being gathered, so potential answers to many of our questions already exist somewhere as texts. However, various gaps between our questions and the texts prevent us from reaching the appropriate texts; these gaps mainly consist of expression gaps and vagueness gaps. When we seek answer texts with conventional keyword-based text retrieval systems, we often have trouble locating them. In contrast, experts on the instruments and call-center operators can resolve the gaps by interpreting our questions flexibly and by asking us back, but they are not always available. Two approaches have been studied to resolve the gaps: extending keyword-based text retrieval systems, and applying artificial intelligence techniques. Both have limitations: the former uses texts or keywords as the medium for ask-back questions, which is not always suitable, while the latter requires a specialized knowledge base described in formal languages and therefore cannot be applied to existing large text collections.

This thesis targets the real-world large text collections provided by Microsoft Corporation and presents a novel methodology for resolving the gaps between various user questions and the texts. The methodology consists of two key solutions: precise and flexible methods of matching user questions with texts based on natural language processing (NLP) techniques, and ask-back methods built on those matching methods. First, the matching methods, including sentence structure analysis and expression-gap resolution, are described. These methods are then extended to matching through metonymy, which is frequently observed in natural language. After that, a solution for producing ask-backs based on the matching methods is proposed, using two kinds of ask-backs that complement each other; both navigate users from vague questions to specific answers. Finally, the methodology is evaluated through the real-world operation of Dialog Navigator, a dialog system in which all the proposed methods are implemented.

Chapter 1 discusses issues in information retrieval and identifies which of them are to be solved. It examines the question logs of a real-world natural-language-based text retrieval system and organizes the types and factors of the gaps. The examination indicates that some gaps between user questions and texts are not resolved well by the methods used in previous studies, and it suggests that both interaction with users and applicability to real-world text collections are needed. Based on this discussion, a solution is proposed that advances the approach employed in open-domain question-answering systems, i.e., the utilization of recent NLP techniques, toward resolving the various gaps.

Chapter 2 proposes several methods of matching user questions with texts based on these NLP techniques.
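The core matching idea described here, comparing sentence structures after absorbing surface-expression differences, can be pictured with a small, self-contained sketch. Everything in it (the toy synonym table, the Dependency type, and match_score) is a hypothetical illustration of structure-level matching with expression-gap resolution, not the thesis's implementation and not the interface of any particular parser.

```python
# Minimal, hypothetical sketch of structure-level matching with expression-gap
# resolution: dependency arcs are normalized with a toy synonym table, and the
# question is scored by how many of its arcs the candidate text covers.
# The names below (SYNONYMS, Dependency, match_score) are illustrative only.

from dataclasses import dataclass

# Toy synonym dictionary used to absorb expression gaps ("boot up" vs "start").
SYNONYMS = {"boot up": "start", "launch": "start", "e-mail": "mail"}


@dataclass(frozen=True)
class Dependency:
    head: str        # governing word, e.g. a predicate
    dependent: str   # dependent word, e.g. an argument


def normalize(word: str) -> str:
    """Map a word to its canonical form via the synonym dictionary."""
    return SYNONYMS.get(word, word)


def canonical(arcs):
    """Normalize both ends of every dependency arc."""
    return {Dependency(normalize(a.head), normalize(a.dependent)) for a in arcs}


def match_score(question_arcs, text_arcs) -> float:
    """Fraction of the question's arcs found in the text, so that structural
    overlap rather than bag-of-words overlap drives the ranking."""
    q, t = canonical(question_arcs), canonical(text_arcs)
    return len(q & t) / len(q) if q else 0.0


# "Outlook does not boot up" vs. a text sentence "Outlook fails to start":
question = [Dependency("boot up", "Outlook")]
text = [Dependency("start", "Outlook"), Dependency("start", "fails")]
print(match_score(question, text))  # 1.0 -- the "boot up"/"start" gap is bridged
```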
Of these techniques, sentence structure analysis through full parsing is essential for two reasons: first, it enables expression gaps to be resolved beyond the keyword level; second, it is indispensable for resolving vagueness gaps through ask-backs. The methods include sentence structure analysis using the Japanese parser KNP, expression-gap resolution based on two kinds of dictionaries, text-collection selection through question-type estimation, and score calculation based on sentence structures. An experimental evaluation on test sets shows that these methods significantly improve performance.

Chapter 3 proposes a novel method of processing metonymy as an extension of the matching methods of Chapter 2. Metonymy is a figure of speech in which the name of one thing is substituted for that of something related to it, and it occurs frequently in both user questions and texts. The chapter addresses the automatic acquisition of pairs of metonymic expressions and their interpretative expressions from large corpora, and applies the acquired pairs to resolving structural gaps caused by metonymy. Unlike previous studies on metonymy, the method targets both the recognition and the interpretation processes. It acquired 1,126 pairs from corpora, over 80% of which were correct as interpretations of metonymy, and an experimental evaluation on the test sets demonstrated that introducing the acquired pairs significantly improves matching.

Chapter 4 presents a strategy for navigating users from vague questions to specific texts based on the matching methods above. Ask-backs are necessary to achieve this, and the strategy combines two approaches: description extraction as a bottom-up approach, and dialog cards as a top-down approach. The former uses the matching methods to extract the neighborhood of the part of each text that matches the user question; such neighborhoods are mostly suitable as ask-backs that clarify vague user questions, but the approach often fails when a user's question is too vague. The latter covers such vague questions based on call-center know-how: dialog cards systematize ask-back procedures for clarifying frequently asked questions that are vague, and the matching methods are also applied to match user questions with the cards. Finally, a comparison with related work demonstrates the novelty of both approaches.

Chapter 5 describes the architecture of Dialog Navigator, the dialog system in which all the proposed methods are implemented. The system uses the real-world large text collections provided by Microsoft Corporation and has been open to the public on a website since April 2002. The methods were evaluated on the real-world operational results of the system, because the gaps to be resolved should reflect those of the real world. The evaluation proved the effectiveness of the methods: more than 70% of user questions were answered with relevant texts, the behavior of both users and the system was reasonable in most dialogs, and most of the descriptions extracted for ask-backs were matched suitably. Chapter 6 concludes the thesis.
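The acquisition step of Chapter 3, collecting pairs of metonymic expressions and their interpretative expressions from corpus statistics, might be sketched roughly as follows. The patterns, counters, and thresholds are assumptions made for illustration (a "V A" pattern standing in for the Japanese direct construction and "V A's B" for the bridged one), not the thesis's actual acquisition procedure.

```python
# Hypothetical sketch of corpus-based acquisition of (metonymic, interpretative)
# expression pairs: if "V A" (e.g. "read Proust") is frequent and the bridged
# form "V A's B" (e.g. "read Proust's novel") is also attested, propose that
# "A" may stand for "A's B" under verb V. All thresholds here are made up.

from collections import Counter

direct = Counter()    # (noun A, verb V)         -> count of "V A"
bridged = Counter()   # (noun A, noun B, verb V) -> count of "V A's B"


def observe_direct(a: str, v: str) -> None:
    direct[(a, v)] += 1


def observe_bridged(a: str, b: str, v: str) -> None:
    bridged[(a, b, v)] += 1


def acquire_pairs(min_direct: int = 5, min_bridged: int = 2):
    """Return (metonymic, interpretative, evidence) triples sorted by evidence."""
    pairs = []
    for (a, b, v), n_bridged in bridged.items():
        n_direct = direct[(a, v)]
        if n_direct >= min_direct and n_bridged >= min_bridged:
            pairs.append(((a, v), (a, b, v), n_direct + n_bridged))
    return sorted(pairs, key=lambda p: -p[2])


# Toy observations standing in for parses of a large corpus.
for _ in range(6):
    observe_direct("Proust", "read")              # metonymic: "read Proust"
for _ in range(3):
    observe_bridged("Proust", "novel", "read")    # interpretation: "read Proust's novel"
print(acquire_pairs())
# [(('Proust', 'read'), ('Proust', 'novel', 'read'), 9)]
```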
Kyoto University (0048); new-system course doctorate, Doctor of Informatics; degree report Kou No. 11209 (Jōhaku No. 135); call number 新制||情||31 (University Library); UT51-2004-T178. Graduate School of Informatics, Department of Intelligence Science and Technology, Kyoto University. Examiners: Professor Takashi Matsuyama (chief examiner), Professor Tatsuya Kawahara, Associate Professor Satoshi Sato. Conferred under Article 4, Paragraph 1 of the Degree Regulations. Doctor of Informatics, Kyoto University. DFA
Discovering Serendipitous Information from Wikipedia by Using Its Network Structure
Many researchers have conducted studies on extracting relevant information from web documents, but few have studied the extraction of serendipitous information. We propose methods for discovering unexpected information from Wikipedia by using its network structure, for example the distance between two categories. We evaluate two methods: a classification-based method using support vector machines (SVMs) and a ranking-based method using regression, and we demonstrate the advantages of regression over classification.
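A minimal sketch of the contrast between the two evaluated approaches, under synthetic data and made-up network features (a "category distance" and a "shared-link fraction" standing in for the network-structure features the abstract mentions): the SVM classifier yields only a yes/no label per candidate article, while the regression model yields a graded score that can order candidates. This illustrates the comparison, not the paper's actual experimental setup.

```python
# Illustrative comparison (assumed features and synthetic data, not the paper's
# setup): SVM classification of candidate Wikipedia articles as serendipitous
# vs. not, against regression-based scoring that supports ranking.

import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)

# Synthetic candidates: [category distance, fraction of shared in-links].
X = rng.random((200, 2))
# Synthetic "unexpected yet related" score: farther categories but some shared
# links give higher scores (purely for demonstration).
y_score = X[:, 0] * (0.3 + X[:, 1]) + rng.normal(0, 0.05, 200)
y_label = (y_score > np.median(y_score)).astype(int)  # binarized for the SVM

# Classification-based method: predicts only a yes/no label per candidate.
clf = SVC(kernel="rbf").fit(X[:150], y_label[:150])

# Ranking-based method: predicts a graded score, so candidates can be ordered.
reg = SVR(kernel="rbf").fit(X[:150], y_score[:150])

test = X[150:]
print("SVM labels  :", clf.predict(test)[:5])               # coarse yes/no
print("Ranked order:", np.argsort(-reg.predict(test))[:5])  # graded ranking
```

The graded output of the regression model is what makes a ranked presentation of candidates possible, which is consistent with the abstract's description of it as a ranking-based method.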