7 research outputs found

    Self-Organizing Word Map for Context-Based Document Classification

    Get PDF
    In this paper, a novel SOM-based system for document organization is presented. The purpose of the system is the classification of a document collection in terms of document content. The system possesses a two-level hybrid connectionist architecture that comprises (i) an automatically created word map using a SOM, which functions as a feature extraction module and (ii) a supervised MLP-based classifier, which provides the final classification result. The experiments, which have been performed on Modern Greek text documents, indicate that the proposed system separates effectively the different types of text

    Self-Organizing Word Map for Context-Based Document Classification

    Get PDF
    In this paper, a novel SOM-based system for document organization is presented. The purpose of the system is the classification of a document collection in terms of document content. The system possesses a two-level hybrid connectionist architecture that comprises (i) an automatically created word map using a SOM, which functions as a feature extraction module and (ii) a supervised MLP-based classifier, which provides the final classification result. The experiments, which have been performed on Modern Greek text documents, indicate that the proposed system separates effectively the different types of text

    D7.4 Third evaluation report. Evaluation of PANACEA v3 and produced resources

    Get PDF
    D7.4 reports on the evaluation of the different components integrated in the PANACEA third cycle of development as well as the final validation of the platform itself. All validation and evaluation experiments follow the evaluation criteria already described in D7.1. The main goal of WP7 tasks was to test the (technical) functionalities and capabilities of the middleware that allows the integration of the various resource-creation components into an interoperable distributed environment (WP3) and to evaluate the quality of the components developed in WP5 and WP6. The content of this deliverable is thus complementary to D8.2 and D8.3 that tackle advantages and usability in industrial scenarios. It has to be noted that the PANACEA third cycle of development addressed many components that are still under research. The main goal for this evaluation cycle thus is to assess the methods experimented with and their potentials for becoming actual production tools to be exploited outside research labs. For most of the technologies, an attempt was made to re-interpret standard evaluation measures, usually in terms of accuracy, precision and recall, as measures related to a reduction of costs (time and human resources) in the current practices based on the manual production of resources. In order to do so, the different tools had to be tuned and adapted to maximize precision and for some tools the possibility to offer confidence measures that could allow a separation of the resources that still needed manual revision has been attempted. Furthermore, the extension to other languages in addition to English, also a PANACEA objective, has been evaluated. The main facts about the evaluation results are now summarized

    Μηχανική Μάθηση στην Επεξεργασία Φυσικής Γλώσσας

    Get PDF
    Η διατριβή εξετάζει την χρήση τεχνικών μηχανικής μάθησης σε διάφορα στάδια της επεξεργασίας φυσικής γλώσσας, κυρίως για σκοπούς εξαγωγής πληροφορίας από κείμενα. Στόχος είναι τόσο η βελτίωση της προσαρμοστικότητας των συστημάτων εξαγωγής πληροφορίας σε νέες θεματικές περιοχές (ή ακόμα και γλώσσες), όσο και η επίτευξη καλύτερης απόδοσης χρησιμοποιώντας όσο το δυνατό λιγότερους πόρους (τόσο γλωσσικούς όσο και ανθρώπινους). Η διατριβή κινείται σε δύο κύριους άξονες: α) την έρευνα και αποτίμηση υπαρχόντων αλγορίθμων μηχανικής μάθησης κυρίως στα στάδια της προ-επεξεργασίας (όπως η αναγνώριση μερών του λόγου) και της αναγνώρισης ονομάτων οντοτήτων, και β) τη δημιουργία ενός νέου αλγορίθμου μηχανικής μάθησης και αποτίμησής του, τόσο σε συνθετικά δεδομένα, όσο και σε πραγματικά δεδομένα από το στάδιο της εξαγωγής σχέσεων μεταξύ ονομάτων οντοτήτων. Ο νέος αλγόριθμος μηχανικής μάθησης ανήκει στην κατηγορία της επαγωγικής εξαγωγής γραμματικών, και εξάγει γραμματικές ανεξάρτητες από τα συμφραζόμενα χρησιμοποιώντας μόνο θετικά παραδείγματα.This thesis examines the use of machine learning techniques in various tasks of natural language processing, mainly for the task of information extraction from texts. The objectives are the improvement of adaptability of information extraction systems to new thematic domains (or even languages), and the improvement of their performance using as fewer resources (either linguistic or human) as possible. This thesis has examined two main axes: a) the research and assessment of existing algorithms of machine learning mainly in the stages of linguistic pre-processing (such as part of speech tagging) and named-entity recognition, and b) the creation of a new machine learning algorithm and its assessment on synthetic data, as well as in real world data from the task of relation extraction between named entities. This new algorithm belongs to the category of inductive grammar learning, and can infer context free grammars from positive examples only

    Representation and parsing of multiword expressions

    Get PDF
    This book consists of contributions related to the definition, representation and parsing of MWEs. These reflect current trends in the representation and processing of MWEs. They cover various categories of MWEs such as verbal, adverbial and nominal MWEs, various linguistic frameworks (e.g. tree-based and unification-based grammars), various languages including English, French, Modern Greek, Hebrew, Norwegian), and various applications (namely MWE detection, parsing, automatic translation) using both symbolic and statistical approaches

    Current trends

    Get PDF
    Deep parsing is the fundamental process aiming at the representation of the syntactic structure of phrases and sentences. In the traditional methodology this process is based on lexicons and grammars representing roughly properties of words and interactions of words and structures in sentences. Several linguistic frameworks, such as Headdriven Phrase Structure Grammar (HPSG), Lexical Functional Grammar (LFG), Tree Adjoining Grammar (TAG), Combinatory Categorial Grammar (CCG), etc., offer different structures and combining operations for building grammar rules. These already contain mechanisms for expressing properties of Multiword Expressions (MWE), which, however, need improvement in how they account for idiosyncrasies of MWEs on the one hand and their similarities to regular structures on the other hand. This collaborative book constitutes a survey on various attempts at representing and parsing MWEs in the context of linguistic theories and applications
    corecore