6 research outputs found

    Evaluating Information Retrieval and Access Tasks

    This open access book summarizes the first two decades of the NII Testbeds and Community for Information access Research (NTCIR). NTCIR is a series of evaluation forums run by a global team of researchers and hosted by the National Institute of Informatics (NII), Japan. The book is unique in that it discusses not just what was done at NTCIR, but also how it was done and the impact it has achieved. For example, in some chapters the reader sees the early seeds of what eventually grew to be the search engines that provide access to content on the World Wide Web, today’s smartphones that can tailor what they show to the needs of their owners, and the smart speakers that enrich our lives at home and on the move. We also get glimpses into how new search engines can be built for mathematical formulae, or for the digital record of a lived human life. Key to the success of the NTCIR endeavor was early recognition that information access research is an empirical discipline and that evaluation therefore lay at the core of the enterprise. Evaluation is thus at the heart of each chapter in this book. They show, for example, how the recognition that some documents are more important than others has shaped thinking about evaluation design. The thirty-three contributors to this volume speak for the many hundreds of researchers from dozens of countries around the world who together shaped NTCIR as organizers and participants. This book is suitable for researchers, practitioners, and students—anyone who wants to learn about past and present evaluation efforts in information retrieval, information access, and natural language processing, as well as those who want to participate in an evaluation task or even to design and organize one

    Building an Interactive Multilingual Information Retrieval Tool

    Growing demand on the Internet has led users to seek information expressed in languages other than their own, which gave rise to cross-lingual information retrieval (CLIR). CLIR is established as a major topic in information retrieval (IR). One approach to CLIR uses different translation methods to translate queries, documents, and indexes into other languages. Queries submitted to search engines suffer from untranslatable query keys (i.e., words missing from the dictionary) and from translation ambiguity, meaning the difficulty of choosing between translation alternatives. Our approach in this thesis is to build and develop a software tool (MORTAJA-IR-TOOL), a new information retrieval tool written in Java with JDK 1.6. The tool has several features: it provides a systematic multilingual system that serves as the basis for translation in CLIR, and it stems the words entered in the query as a stage preceding translation. The proposed systematic query translation was evaluated against a baseline translation that uses a machine-readable dictionary, in a user-centered experiment; the improvement was 8.96%. The effect of stemming the query words on the quality of retrieving matching data in the other language was also evaluated; the improvement was 4.14%. Finally, combining the proposed stemming with the systematic translation (MORTAJA-IR-TOOL) yielded an improvement of 15.86% in the rate of retrieved data. Keywords: Cross lingual information retrieval, CLIR, Information Retrieval, IR, Translation, stemming.

    Matching Meaning for Cross-Language Information Retrieval

    Cross-language information retrieval concerns the problem of finding information in one language in response to search requests expressed in another language. The explosive growth of the World Wide Web, with access to information in many languages, has provided a substantial impetus for research on this important problem. In recent years, significant advances in cross-language retrieval effectiveness have resulted from the application of statistical techniques to estimate accurate translation probabilities for individual terms from automated analysis of human-prepared translations. With few exceptions, however, those results have been obtained by applying evidence about the meaning of terms to translation in one direction at a time (e.g., by translating the queries into the document language). This dissertation introduces a more general framework for the use of translation probability in cross-language information retrieval based on the notion that information retrieval is dependent fundamentally upon matching what the searcher means with what the document author meant. The perspective yields a simple computational formulation that provides a natural way of combining what have been known traditionally as query and document translation. When combined with the use of synonym sets as a computational model of meaning, cross-language search results are obtained using English queries that approximate a strong monolingual baseline for both French and Chinese documents. Two well-known techniques (structured queries and probabilistic structured queries) are also shown to be a special case of this model under restrictive assumptions

    Classifying Attitude by Topic Aspect for English and Chinese Document Collections

    The goal of this dissertation is to explore the design of tools to help users make sense of subjective information in English and Chinese by comparing attitudes on aspects of a topic in English and Chinese document collections. This involves two coupled challenges: topic aspect focus and attitude characterization. The topic aspect focus is specified by using information retrieval techniques to obtain documents on a topic that are of interest to a user and then allowing the user to designate a few segments of those documents to serve as examples for aspects that she wishes to see characterized. A novel feature of this work is that the examples can be drawn from documents in two languages (English and Chinese). A bilingual aspect classifier which applies monolingual and cross-language classification techniques is used to assemble automatically a large set of document segments on those same aspects. A test collection was designed for aspect classification by annotating consecutive sentences in documents from the Topic Detection and Tracking collections as aspect instances. Experiments show that classification effectiveness can often be increased by using training examples from both languages. Attitude characterization is achieved by classifiers which determine the subjectivity and polarity of document segments. Sentence attitude classification is the focus of the experiments in the dissertation because the best presently available test collection for Chinese attitude classification (the NTCIR-6 Chinese Opinion Analysis Pilot Task) is focused on sentence-level classification. A large Chinese sentiment lexicon was constructed by leveraging existing Chinese and English lexical resources, and an existing character-based approach for estimating the semantic orientation of other Chinese words was extended. A shallow linguistic analysis approach was adopted to classify the subjectivity and polarity of a sentence. 
Using the large sentiment lexicon with appropriate handling of negation, and leveraging sentence subjectivity density, sentence positivity and negativity, the resulting sentence attitude classifier was more effective than the best previously reported systems

    A Corpus-based Approach to the Chinese Word Segmentation

    For a society based upon laws and reason, it has become too easy for us to believe that we live in a world without them. And given that our linguistic wisdom was originally motivated by the search for rules, it seems strange that we now consider these rules to be the exceptions and take exceptions as the norm. The current task of contemporary computational linguistics is to describe these exceptions. In particular, it suffices for most language processing needs to describe just the argument and predicate within an elementary sentence, under the framework of local grammar. Therefore, a corpus-based approach to the Chinese word segmentation problem is proposed, as the first step towards a local grammar for the Chinese language. The two main issues with existing lexicon-based approaches are (a) the classification of unknown character sequences, i.e. sequences that are not listed in the lexicon, and (b) the disambiguation of situations where two candidate words overlap. For (a), we propose an automatic method of enriching the lexicon by comparing candidate sequences to occurrences of the same strings in a manually segmented reference corpus, and using machine learning methods to select the optimal segmentation for them. These methods are developed in the course of the thesis specifically for this task. The possibility of applying these machine learning methods in the NP-extraction and alignment domains will also be discussed. Issue (b) is approached by designing a general processing framework for Chinese text, which will be called multi-level processing. Under this framework, sentences are recursively split into fragments according to language-specific but domain-independent heuristics. The resulting fragments then define the ultimate boundaries between candidate words and therefore resolve any segmentation ambiguity caused by overlapping sequences. A new shallow semantic annotation is also proposed under the framework of multi-level processing.
A word segmentation algorithm based on these principles has been implemented and tested; results of the evaluation are given and compared to the performance of previous approaches as reported in the literature. The first chapter of this thesis discusses the goals of segmentation and introduces some background concepts. The second chapter analyses the current state-of-the-art approach to Chinese language segmentation. Chapter 3 proposes a new corpus-based approach to the identification of unknown words. In Chapter 4, a new shallow semantic annotation is proposed under the framework of multi-level processing

    UA3/1/3 Education & Politics Scrapbook

    Scrapbook regarding education & politics for 1911-1912 created by the WKU President's Office during Henry Cherry's tenure. Page numbers are from the original scrapbook and may not match the PDF file. A Child's Value 139 A Cure for Lawlessness, Elizabethtown News 89 A 'Practical' Education 137 Address & Resolutions to be Presented to State-Wide Rural School Conference, March 29, 1911 18 All Round Efficiency 4 Beckman, F.W. A Cry from Macedonia for Teachers of Agriculture . . . 58 Behind the School 3 Beveridge's Eloquent Plea for Cause of Humanity 145 Campbell, James Jr. Commission Form of Government 85 Child Labor a National Crime 93 Child Labor Conference 96 Clay, C.M. The Initiative, Referendum and Recall of Officials, Jan. 26, 1912 108 Clay, Cassius. Initiative, Referendum and Recall 104 Conservation of the Soil, U.S. Dept. of Agriculture #38 10 Din'a Ye Hear the Slogan? re: J. McKenzie Todd 47 Dixon, S.V. Crime and Its Punishment 109 Dr. McCormack Submits Statement of Work of the Board of Health, Jan. 25 99 Editorial. Spokesman, Not the Ruler of the People 114 Education and Good Roads, Durham Co., NC 90 Enelow, H.G. The Religious Element of Education 17 Every Human Being on Earth Lives in a Cage 48 Farming Efficiency 107 Freedom & Opportunity 2 German Democracy 98 Good Teachers the Great Need 1 Gov. McCreary's Message to Legislature, Jan. 2 33 Governor's Speech Accepting the Statue of Lincoln, Nov. 8 20 Great Rural School Fair, advertisement 130 Helping the Boys to Make Good 93 Honesty in Politics 106 How Judson Harmon Looks as Presidential Timber 140 Interesting Event, Arbor Day, Nov. 18, 1911 87 It's What We Do with the Chance that Counts in Life 54 Kaufman, Herbert. You Must Stand the Gaff 124 Kentucky's Population by Color for All the Counties in the State, Dec. 5 94 Large and Small Farms 97 Law Indefensible Say Baptist Teachers 84 Lincoln's Great Patience One of Chief Virtues, Nov. 8 22 Mason, Walt. The Poet Philosopher on Lillian Russell's Engagement 124 McDermott, E.J. Imperative Law Reforms. 82 McFerran, John. A Greater Kentucky, Nov. 23 51 McLean County Teachers' Association Division No. 4, Oct. 1911 92 M'Ferran, John. Improved Educational Conditions from an Investment Standpoint 136 M'Ferran, John. Necessity for 'Best' Teacher in Each School District 131 M'Ferran, John. The Thirty and Nine; or Why Stop With One? 132 Mr. Watterson's Address Presenting Speed Statute, Nov. 8 23 Need of the Best 5 No Saved Soul in Lost Body, Declares Dr. E.L. Powell in Sermon on Social Reform 101 O'Rear, ? Compares the County Unit Planks 62 O'Rear, ? Speech Delivered at Elizabethtown, Aug. 14, 1911 65 O'Rear, ? Speech Delivered at Hartford re: Farmers & Laborers Organizations 73 O'Rear, ? Education in Kentucky. 56 Paint the Town Red re: Ohio Northern University Student 123 Poverty and the Public 138 Powell, E.L. The Realization of the Presence of God 88 Proposed Forestry Bill 96 Public Education 137 Republican Platform, July 12, 1911 55 Roosevelt, Theodore. Moose Rally at Madison Square, Oct. 30 127 Savoyard. Government by Commission 125 Sommers, H.A. Catechism on the Public Schools of Kentucky. 60 Spirit, the Endowment 9 Spiritual Equipment 3 Spokesman, Not the Ruler of the People re: Governor Wilson of NY 113 Taft, William Howard. Lincoln Farm Memorial Speech, Nov. 9 80 The Greater South 7 The Greatest Crime of this Age is War, etc. 52 The Ideal Teacher & Sunshine in Teaching 19 The Initiative, Referendum and Recall 29 The Method of Farmer Frank Black 94 The National Education Association Declaration, July 13, 1911 50 The Normal School Platform, Statement of Principles 130 The Right Training 139 To Raise the Standard of the County School, Nov. 21 64 White, L.R. Defects in Kentucky School Laws, Jan. 16, 1912 111 Wilson, Woodrow. Lop Off Patronage 133 Wilson, Woodrow. Message to the People, Oct. 19, 1912 115 Wilson, Woodrow. Pithy Paragraphs from Speech 121 Wilson, Woodrow. Reasons Why Commission Government Excels, June 1911 116 Wilson, Woodrow. Speech Delivered to Kentucky Legislature Feb. 10. 122 Winchester, Boyd. The People and Civic Duty 12