5 research outputs found

    Terminography of verbal multi-word expressions: creation of a lexical resource using an electronic corpus to assist the translation process

    Multi-word expressions pose a number of problems, chiefly because of their idiosyncrasies. Creating annotated resources and identifying such expressions manually on the basis of specific linguistic criteria is therefore of paramount importance. This master's thesis studies verbal multi-word expressions of Modern Greek through manual annotation of corpus data. Its ultimate purpose is the design and development of a bilingual Greek-English lexical resource that comprises the annotated verbal multi-word expressions, their syntactic and lexical idiosyncrasies, and their translations into English. The lexical resource presented in the thesis could serve as a useful tool for assisting the translation process. The methodology adopted for collecting the corpus and for creating the lexical resource combines new technologies and is based on real data. The thesis not only used but also enriched existing research infrastructure that aims to annotate verbal multi-word expressions in running text and to create language resources for automatic text processing.

    Creation and Simulation of Methodologies for the Analysis, Classification, and Integration of New Requirements into Proprietary Software

    Prioritizing the new requirements to be implemented in proprietary software is fundamental to its maintenance, to preserving quality, and to observing the company's business rules and standards. Although prioritization tools based on proven and recognized techniques exist, they require each requirement to be rated beforehand. When a company receives requests from several customers of the same product, the number of factors affecting the company grows; the available tools do not account for these aspects, which makes the rating task much more complex. This research comprises a survey of the methods for prioritizing and selecting new requirements used by companies in the Rosario area, and the definition of a methodology for selecting a new requirement, which involves analyzing and evaluating all implications for the software product and the company while respecting its business rules. The resulting methodology leads to the definition of the processes for building a tool that rates and prioritizes new requirements in proprietary software receiving requests from several customers at once. The tool will offer rating instruments that consider all related aspects, will provide current prioritization techniques, and will issue customized reports from different company perspectives. Track: Software Engineering. Red de Universidades con Carreras en Informática (RedUNCI)

    Annotated corpora and tools of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions (edition 1.1)

    This multilingual resource contains corpora in which verbal MWEs (VMWEs) have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). VMWEs were annotated according to the universal guidelines in 19 languages. The corpora are provided in the cupt format, inspired by the CoNLL-U format. The corpora were used in the 1.1 edition of the PARSEME Shared Task (2018). For most languages, morphological and syntactic information (not necessarily using UD tagsets), including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.1 (2018). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.

    Annotated corpora and tools of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions (edition 1.1)

    This multilingual resource contains corpora in which verbal MWEs (VMWEs) have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). VMWEs were annotated according to the universal guidelines in 19 languages. The corpora are provided in the cupt format, inspired by the CoNLL-U format. The corpora were used in the 1.1 edition of the PARSEME Shared Task (2018). For most languages, morphological and syntactic information (not necessarily using UD tagsets), including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.1 (2018). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.

    PARSEME corpora annotated for verbal multiword expressions (version 1.3)

    This multilingual resource contains corpora in which verbal MWEs (VMWEs) have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). This is the first release of the corpora without an associated shared task. The previous version (1.2) was associated with the PARSEME Shared Task on Semi-Supervised Identification of Verbal MWEs (2020). The data covers 26 languages, combining the corpora from all three previous editions (1.0, 1.1 and 1.2). VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format. Morphological and syntactic information, including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). All corpora are split into training, development and test data, following the splitting strategy adopted for the PARSEME Shared Task 1.2. The annotation guidelines are available online: https://parsemefr.lis-lab.fr/parseme-st-guidelines/1.3 The .cupt format is detailed here: https://multiword.sourceforge.net/cupt-format
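
    As a rough illustration of how VMWE annotations might be read from a cupt file, the sketch below assumes the usual layout: the ten CoNLL-U columns followed by an eleventh PARSEME:MWE column, where `*` marks a token outside any VMWE, the first token of an expression carries `id:category`, and later tokens carry only the id. The sample sentence, the `VPC.full` label, and the function name `extract_vmwes` are illustrative assumptions, not part of the released data or tools.

```python
from collections import defaultdict

# Hypothetical one-sentence sample; real corpora may order columns differently.
SAMPLE = """\
# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC PARSEME:MWE
# text = She gave up .
1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_\t*
2\tgave\tgive\tVERB\t_\t_\t0\troot\t_\t_\t1:VPC.full
3\tup\tup\tADP\t_\t_\t2\tcompound:prt\t_\t_\t1
4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\t*
"""


def extract_vmwes(cupt_text):
    """Return, per sentence, a list of (category, [lemmas]) pairs."""
    results = []
    # Sentences are separated by blank lines, as in CoNLL-U.
    for block in cupt_text.strip().split("\n\n"):
        lemmas = defaultdict(list)   # VMWE id -> lemmas of its tokens
        categories = {}              # VMWE id -> category label
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue             # skip comments and metadata
            cols = line.split("\t")
            mwe_field = cols[10]     # assumed position of PARSEME:MWE
            if mwe_field in ("*", "_"):
                continue             # not part of a VMWE / unspecified
            # A token can belong to several VMWEs, separated by ';'.
            for code in mwe_field.split(";"):
                mwe_id, _, category = code.partition(":")
                lemmas[mwe_id].append(cols[2])
                if category:
                    categories[mwe_id] = category
        results.append([(categories.get(i), lemmas[i]) for i in lemmas])
    return results


print(extract_vmwes(SAMPLE))
```

    Grouping by the numeric id rather than by adjacency is deliberate: the tokens of a VMWE need not be contiguous, so the id is the only reliable link between them.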