19 research outputs found

    Einsatz neuronaler Netze als Transferkomponenten beim Retrieval in heterogenen Dokumentbeständen

    "Increasing worldwide networking and the development of digital libraries are creating new possibilities for searching across multiple data collections. This gives rise to the problem of semantic heterogeneity, since, for example, terms can have different meanings in different contexts. The transfer components required to address it pose a new challenge for which neural networks are well suited." (author's abstract)

    On-line learning for adaptive text filtering.

    Yu Kwok Leung. Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. Includes bibliographical references (leaves 91-96). Abstracts in English and Chinese.

    Contents:
    Chapter 1 --- Introduction
        Chapter 1.1 --- The Problem
        Chapter 1.2 --- Information Filtering
        Chapter 1.3 --- Contributions
        Chapter 1.4 --- Organization Of The Thesis
    Chapter 2 --- Related Work
    Chapter 3 --- Adaptive Text Filtering
        Chapter 3.1 --- Representation
            Chapter 3.1.1 --- Textual Document
            Chapter 3.1.2 --- Filtering Profile
        Chapter 3.2 --- On-line Learning Algorithms For Adaptive Text Filtering
            Chapter 3.2.1 --- The Sleeping Experts Algorithm
            Chapter 3.2.2 --- The EG-based Algorithms
    Chapter 4 --- The REPGER Algorithm
        Chapter 4.1 --- A New Approach
        Chapter 4.2 --- Relevance Prediction By RElevant feature Pool
        Chapter 4.3 --- Retrieving Good Training Examples
        Chapter 4.4 --- Learning Dissemination Threshold Dynamically
    Chapter 5 --- The Threshold Learning Algorithm
        Chapter 5.1 --- Learning Dissemination Threshold Dynamically
        Chapter 5.2 --- Existing Threshold Learning Techniques
        Chapter 5.3 --- A New Threshold Learning Algorithm
    Chapter 6 --- Empirical Evaluations
        Chapter 6.1 --- Experimental Methodology
        Chapter 6.2 --- Experimental Settings
        Chapter 6.3 --- Experimental Results
    Chapter 7 --- Integrating With Feature Clustering
        Chapter 7.1 --- Distributional Clustering Algorithm
        Chapter 7.2 --- Integrating With Our REPGER Algorithm
        Chapter 7.3 --- Empirical Evaluation
    Chapter 8 --- Conclusions
        Chapter 8.1 --- Summary
        Chapter 8.2 --- Future Work
    Bibliography
    Chapter A --- Experimental Results On The AP Corpus
        Chapter A.1 --- The EG Algorithm
        Chapter A.2 --- The EG-C Algorithm
        Chapter A.3 --- The REPGER Algorithm
    Chapter B --- Experimental Results On The FBIS Corpus
        Chapter B.1 --- The EG Algorithm
        Chapter B.2 --- The EG-C Algorithm
        Chapter B.3 --- The REPGER Algorithm
    Chapter C --- Experimental Results On The WSJ Corpus
        Chapter C.1 --- The EG Algorithm
        Chapter C.2 --- The EG-C Algorithm
        Chapter C.3 --- The REPGER Algorithm

    Inter-relação das técnicas Term Extration e Query Expansion aplicadas na recuperação de documentos textuais

    Thesis (doctorate) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-graduação em Engenharia e Gestão do Conhecimento. According to Sighal (2006), people recognize the importance of storing and searching for information and, with the advent of computers, it became possible to store large amounts of it in databases. Consequently, cataloguing the information in these databases became indispensable. In this context, the field of Information Retrieval emerged in the 1950s with the aim of promoting the construction of computational tools that allow users to make more efficient use of these databases. The main objective of this research is to develop a computational model that enables the retrieval of textual documents ranked by semantic similarity, based on the intersection of the Term Extration and Query Expansion techniques.

    Supporting the Chinese Language in Oracle Text

    The subject of this work is the problems of Chinese information retrieval (IR) and the factors that can influence the performance of a Chinese IR system. Experiments were carried out within the evaluation model of the "TREC-5 Chinese Track", using a large corpus of more than 160,000 Chinese news articles on an Oracle10g (beta version) database. Finally, the performance of Oracle® Text was compared against the results of the TREC-5 participants in a so-called "benchmarking" process. The main findings of this work are: (a) the effectiveness of a Chinese IR system is strongly influenced by the way a query is formulated. In particular, when formulating a query one should take into account the many abbreviations and regional differences in the Chinese language, as well as the various transcriptions of non-Chinese proper names; (b) stopwords have no influence on the performance of a Chinese IR system; (c) users tend to formulate shorter queries, and search results are particularly poor when feedback and query expansion are not used; (d) compared with the Chinese_Vgram_Lexer, the Chinese_Lexer has the advantage of producing real words and a smaller index, as well as achieving higher precision in the search results; and (e) the performance of Oracle® Text for Chinese IR is comparable to the results of TREC-5.

    Sparse multi-level representations for text retrieval

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. [153]-160). By Charles Lee Isbell, Junior.

    Selective web information retrieval

    This thesis proposes selective Web information retrieval, a framework formulated in terms of statistical decision theory that aims to apply an appropriate retrieval approach on a per-query basis. The main component of the framework is a decision mechanism that selects the retrieval approach for each query. The selection is based on the outcome of an experiment, which is performed before the final ranking of the retrieved documents. The experiment is a process that extracts features from a sample of the set of retrieved documents. This thesis investigates three broad types of experiments. The first counts the occurrences of query terms in the retrieved documents, indicating the extent to which the query topic is covered in the document collection. The second considers information from the distribution of retrieved documents in larger aggregates of related Web documents, such as whole Web sites or directories within Web sites. The third estimates the usefulness of the hyperlink structure among a sample of the set of retrieved Web documents. The proposed experiments are evaluated in the context of both informational and navigational search tasks with an optimal Bayesian decision mechanism, where it is assumed that relevance information exists. This thesis further investigates the implications of applying selective Web information retrieval in an operational setting, where the tuning of a decision mechanism is based on limited existing relevance information and the information retrieval system's input is a stream of queries related to mixed informational and navigational search tasks. First, the experiments are evaluated using different training and testing query sets, as well as a mixture of different types of queries. Second, query sampling is introduced in order to approximate the queries that a retrieval system receives, and to tune an ad-hoc decision mechanism with a broad set of automatically sampled queries.
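
The per-query selection idea in this abstract can be illustrated with a minimal sketch. This is not the thesis's decision mechanism: it is a simplified stand-in that assumes the experiment yields a single score (here, average query-term occurrences per sampled document), discretizes that score into bins, and picks the approach with the highest mean effectiveness observed for that bin on training queries. All function names, the bin edges, and the approach labels ("content", "links") are hypothetical.

```python
from collections import defaultdict


def term_occurrence_score(query_terms, sampled_docs):
    """Average number of query-term occurrences per sampled document.

    Each document is a list of tokens (an assumption made for this sketch).
    """
    if not sampled_docs:
        return 0.0
    total = sum(doc.count(term) for doc in sampled_docs for term in query_terms)
    return total / len(sampled_docs)


def to_bin(score, edges):
    """Index of the first edge the score falls below, or len(edges) if none."""
    for i, edge in enumerate(edges):
        if score < edge:
            return i
    return len(edges)


def train(training, edges):
    """Mean effectiveness per (bin, approach).

    training: iterable of (score, {approach_name: effectiveness}) pairs,
    i.e. queries for which relevance information is assumed to exist.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for score, effectiveness in training:
        b = to_bin(score, edges)
        for approach, value in effectiveness.items():
            sums[(b, approach)] += value
            counts[(b, approach)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def select_approach(score, model, edges, default):
    """Pick the approach with the best mean effectiveness for the score's bin."""
    b = to_bin(score, edges)
    candidates = {a: v for (bin_, a), v in model.items() if bin_ == b}
    return max(candidates, key=candidates.get) if candidates else default
```

A caller would compute the score for a new query from a sample of its retrieved documents, then call `select_approach` before producing the final ranking; the `default` argument covers bins that never appeared in training.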