17 research outputs found

    Qualität in der Inhaltserschließung

    Get PDF
    This edited volume deals with issues relating to the quality of subject cataloging in the digital age, where heterogenous articles from different processes meet, and attempts to define important quality standards. Topics range from metadata and the cataloging policies of the German National Library, the GND, and the head offices of the German library association, to the presentation of a range of different projects, such as QURATOR and SoNAR

    بناء أداة تفاعلية متعددة اللغات لاسترجاع المعلومات

    Get PDF
    The growing requirement on the Internet have made users access to the information expressed in a language other than their own , which led to Cross lingual information retrieval (CLIR) .CLIR is established as a major topic in Information Retrieval (IR). One approach to CLIR uses different methods of translation to translate queries to documents and indexes in other languages. As queries submitted to search engines suffer lack of untranslatable query keys (i.e., words that the dictionary is missing) and translation ambiguity, which means difficulty in choosing between alternatives of translation. Our approach in this thesis is to build and develop the software tool (MORTAJA-IR-TOOL) , a new tool for retrieving information using programming JAVA language with JDK 1.6. This tool has many features, which is develop multiple systematic languages system to be use as a basis for translation when using CLIR, as well as the process of stemming the words entered in the query process as a stage preceding the translation process. The evaluation of the proposed methodology translator of the query comparing it with the basic translation that uses readable dictionary automatically the percentage of improvement is 8.96%. The evaluation of the impact of the process of stemming the words entered in the query on the quality of the output process in the retrieval of matched data in other process the rate of improvement is 4.14%. Finally the rated output of the merger between the use of stemming methodology proposed and translation process (MORTAJA-IR-TOOL) which concluded that the proportion of advanced in the process of improvement in data rate of retrieval is 15.86%. Keywords: Cross lingual information retrieval, CLIR, Information Retrieval, IR, Translation, stemming.الاحتياجات المتنامية على شبكة الإنترنت جعلت المستخدمين لهم حق الوصول إلى المعلومات بلغة غير لغتهم الاصلية، مما يقودنا الى مصطلح عبور اللغات لاسترجاع المعلومات (CLIR). CLIR أنشئت كموضوع رئيسي في "استرجاع المعلومات" (IR). نهج واحد ل CLIR يستخدم أساليب مختلفة للترجمة ومنها لترجمة الاستعلامات وترجمة الوثائق والفهارس في لغات أخرى. الاستفسارات والاستعلامات المقدمة لمحركات البحث تعاني من عدم وجود ترجمه لمفاتيح الاستعلام (أي أن العبارة مفقودة من القاموس) وايضا تعاني من غموض الترجمة، مما يعني صعوبة في الاختيار بين بدائل الترجمة. في نهجنا في هذه الاطروحة تم بناء وتطوير الأداة البرمجية (MORTAJA-IR-TOOL) أداة جديدة لاسترجاع المعلومات باستخدام لغة البرمجة JAVA مع JDK 1.6، وتمتلك هذه الأداة العديد من الميزات، حيث تم تطوير منظومة منهجية متعددة اللغات لاستخدامها كأساس للترجمة عند استخدام CLIR، وكذلك عملية تجذير للكلمات المدخلة في عملية الاستعلام كمرحلة تسبق عملية الترجمة. وتم تقييم الترجمة المنهجية المقترحة للاستعلام ومقارنتها مع الترجمة الأساسية التي تستخدم قاموس مقروء اليا كأساس للترجمة في تجربة تركز على المستخدم وكانت نسبة التحسين 8.96% , وكذلك يتم تقييم مدى تأثير عملية تجذير الكلمات المدخلة في عملية الاستعلام على جودة المخرجات في عملية استرجاع البيانات المتطابقة باللغة الاخرى وكانت نسبة التحسين 4.14% , وفي النهاية تم تقييم ناتج عملية الدمج بين استخدام التجذير والترجمة المنهجية المقترحة (MORTAJA-IR-TOOL) والتي خلصت الى نسبة متقدمة في عملية التحسين في نسبة البيانات المرجعة وكانت 15.86%

    Foundation, Implementation and Evaluation of the MorphoSaurus System: Subword Indexing, Lexical Learning and Word Sense Disambiguation for Medical Cross-Language Information Retrieval

    Get PDF
    Im medizinischen Alltag, zu welchem viel Dokumentations- und Recherchearbeit gehört, ist mittlerweile der überwiegende Teil textuell kodierter Information elektronisch verfügbar. Hiermit kommt der Entwicklung leistungsfähiger Methoden zur effizienten Recherche eine vorrangige Bedeutung zu. Bewertet man die Nützlichkeit gängiger Textretrievalsysteme aus dem Blickwinkel der medizinischen Fachsprache, dann mangelt es ihnen an morphologischer Funktionalität (Flexion, Derivation und Komposition), lexikalisch-semantischer Funktionalität und der Fähigkeit zu einer sprachübergreifenden Analyse großer Dokumentenbestände. In der vorliegenden Promotionsschrift werden die theoretischen Grundlagen des MorphoSaurus-Systems (ein Akronym für Morphem-Thesaurus) behandelt. Dessen methodischer Kern stellt ein um Morpheme der medizinischen Fach- und Laiensprache gruppierter Thesaurus dar, dessen Einträge mittels semantischer Relationen sprachübergreifend verknüpft sind. Darauf aufbauend wird ein Verfahren vorgestellt, welches (komplexe) Wörter in Morpheme segmentiert, die durch sprachunabhängige, konzeptklassenartige Symbole ersetzt werden. Die resultierende Repräsentation ist die Basis für das sprachübergreifende, morphemorientierte Textretrieval. Neben der Kerntechnologie wird eine Methode zur automatischen Akquise von Lexikoneinträgen vorgestellt, wodurch bestehende Morphemlexika um weitere Sprachen ergänzt werden. Die Berücksichtigung sprachübergreifender Phänomene führt im Anschluss zu einem neuartigen Verfahren zur Auflösung von semantischen Ambiguitäten. Die Leistungsfähigkeit des morphemorientierten Textretrievals wird im Rahmen umfangreicher, standardisierter Evaluationen empirisch getestet und gängigen Herangehensweisen gegenübergestellt

    The effects of separate and merged indexes and word normalization in multilingual CLIR

    Get PDF
    Multilingual IR may be performed in two environments: there may exist a separate index for each target language, or all the languages may be indexed in a merged index. In the first case, retrieval must be performed separately in each index, after which the result lists have to be merged. In the case of the merged index, there are two alternatives: either to perform retrieval with a merged query (all the languages in the same query), or to perform distinct retrievals in each language, and merge the result lists. Further, there are several indexing approaches concerning word normalization. The present paper examines the impact of stemming compared with inflected retrieval in multilingual IR when there are separate indexes / a merged index. Four different result list merging approaches are compared with each other. The best result was achieved when retrieval was performed in separate indexes and result lists were merged. Stemming seems to improve the results compared with inflected retrieval

    The effects of separate and merged indexes and word normalization in multilingual CLIR

    Get PDF
    Multilingual IR may be performed in two environments: there may exist a separate index for each target language, or all the languages may be indexed in a merged index. In the first case, retrieval must be performed separately in each index, after which the result lists have to be merged. In the case of the merged index, there are two alternatives: either to perform retrieval with a merged query (all the languages in the same query), or to perform distinct retrievals in each language, and merge the result lists. Further, there are several indexing approaches concerning word normalization. The present paper examines the impact of stemming compared with inflected retrieval in multilingual IR when there are separate indexes / a merged index. Four different result list merging approaches are compared with each other. The best result was achieved when retrieval was performed in separate indexes and result lists were merged. Stemming seems to improve the results compared with inflected retrieval
    corecore