Information dissemination based on semantic relations

Author: Katzagiannaki Irini - Ilektra G
Κατζαγιαννάκη Ειρήνη - Ηλέκτρα Γ
Publication date: 01/12/2002

In a selective information dissemination (SDI) system, users submit profiles consisting of a number of long-standing queries that represent their information needs. The system continuously collects new documents from the underlying information sources, filters them against the user profiles, and delivers relevant information to the corresponding users. SDI systems are important today because of the vast amount of information flowing through the World Wide Web: they inform users of relevant information without requiring them to spend time locating it.
The majority of SDI systems are based on lexical search. They represent documents and user profiles as sets of terms and check whether the terms in the two sets are identical in order to decide whether to send a document to a user. However, users often use different terms to express the same meaning (synonyms). Moreover, a user who seeks information about a term is also interested in information about its hyponyms (more specific terms). Such a system therefore needs mechanisms that take the semantic relationships between terms into account when matching user profiles against documents.
This thesis implements an SDI system that takes the semantic relationships between terms into account. A user profile is considered relevant to a document not only when its terms appear in the document, but also when their synonyms or hyponyms do. The system handles profiles expressed in two of the most popular information retrieval models, the Boolean model and the Vector Space model. To improve performance, it builds an index structure over the profiles rather than the documents, because the profiles are more numerous and more static. When a document arrives from the information sources, a matching algorithm that takes term semantics into account compares it with the profiles in the index. Documents that contain the terms of a profile, or terms semantically related to them, are then delivered to the corresponding user. The system's performance has been evaluated experimentally.
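
The profile index and semantic matching described in this abstract could be sketched roughly as follows for Boolean profiles. This is an illustration only, not the thesis's implementation: the toy `SYNONYMS` and `HYPONYMS` tables, the `expand` helper, the `ProfileIndex` class, the restriction to purely conjunctive profiles, and the choice to follow synonyms and hyponyms transitively are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy thesaurus; the abstract does not name the semantic resource used.
SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}
HYPONYMS = {"vehicle": {"car", "truck"}, "car": {"hatchback"}}


def expand(term):
    """Return the term plus every synonym and hyponym reachable from it.

    Following the relations transitively is an assumption of this sketch.
    """
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(SYNONYMS.get(current, ()))
        stack.extend(HYPONYMS.get(current, ()))
    return seen


class ProfileIndex:
    """Inverted index over profiles (not documents), conjunctive profiles only."""

    def __init__(self):
        # expanded term -> {(profile_id, profile term it satisfies)}
        self._postings = defaultdict(set)
        # profile_id -> number of terms that must all be satisfied
        self._sizes = {}

    def add_profile(self, profile_id, terms):
        terms = set(terms)
        self._sizes[profile_id] = len(terms)
        for term in terms:
            for related in expand(term):
                self._postings[related].add((profile_id, term))

    def match(self, document_terms):
        """Return the ids of profiles whose terms are all covered by the document."""
        satisfied = defaultdict(set)
        for word in set(document_terms):
            for profile_id, term in self._postings.get(word, ()):
                satisfied[profile_id].add(term)
        return {
            profile_id
            for profile_id, terms in satisfied.items()
            if len(terms) == self._sizes[profile_id]
        }
```

Expanding profile terms once, when a profile is added, means that each incoming document needs only one lookup per distinct term. Note that a document mentioning only the broader term ("vehicle") does not satisfy a profile that asks for the narrower one ("car"), since hyponymy is followed in one direction only.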

E-Locus

Information Dissemination based on Semantic Relations

Authors: Dimitris Plexousakis, Irini Electra Katzagiannaki

Abstract. In a selective information dissemination (SDI) system, users submit profiles consisting of a number of long-standing queries that represent their information needs. The system continuously collects new documents from the underlying information sources, filters them against the user profiles, and delivers relevant information to the corresponding users. SDI systems are important today because of the vast amount of information flowing through the World Wide Web: they inform users of relevant information without requiring them to spend time locating it. This paper presents an SDI system that takes into account the lexical as well as the semantic relationships between the terms of documents and user profiles. In particular, a user profile is considered relevant to a document if its terms, or their synonyms or hyponyms, appear in the document. The paper also presents a profile index structure supporting both the Boolean and Vector Space models.
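
For the Vector Space model, one way to picture semantically aware matching is to fold each document term onto the profile term it equals or is related to, and then compare the two vectors. The sketch below is a minimal illustration under assumptions that are not taken from the paper: the `RELATED` table, the `project`, `cosine` and `is_relevant` helpers, raw term counts as document weights, and a cosine threshold as the relevance test are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical table: profile term -> its synonyms and hyponyms.
RELATED = {"car": {"automobile", "hatchback"}}


def project(profile, tokens):
    """Count document tokens, crediting each to the profile terms it supports.

    Tokens unrelated to any profile term keep their own dimension, so they
    still lengthen the document vector and lower the similarity.
    """
    counts = Counter()
    for token in tokens:
        targets = [
            term for term in profile
            if token == term or token in RELATED.get(term, ())
        ]
        for dimension in targets or [token]:
            counts[dimension] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_relevant(profile, tokens, threshold):
    """Deliver the document if its similarity to the profile reaches the threshold."""
    return cosine(profile, project(profile, tokens)) >= threshold
```

With a profile such as `{"car": 1.0, "insurance": 1.0}`, a document containing "hatchback" and "insurance" scores as if it contained "car" itself, while a profile that asks for "hatchback" gains nothing from a document that only mentions "car".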

CiteSeerX