9 research outputs found

    Diseño de una metodología para la extracción de funciones y mapas relacionales a partir de herramientas de minería de texto

    In organizations, one of the most common problems is the absence of a procedure to document, systematize, and gather information on the functions of specific positions. This situation arises especially in the case of knowledge workers, understood as those whose work within the organization is related to their ability to create, transform, and communicate knowledge. Given this, text mining becomes a tool for extracting a fraction of these activities and relationships, looking within e-mail for the answer to this untapped knowledge. Magíster en Ingeniería Industrial. Maestrí

    The Future of Information Sciences : INFuture2013 : Information Governance

    Natural language processing meets business: algorithms for mining meaning from corporate texts

    Terminology Integration in Statistical Machine Translation

    The electronic version does not contain the appendices. The doctoral thesis describes methods and tools researched and developed by the author for integrating bilingual terminology into statistical machine translation (SMT) systems. The author presents novel methods for terminology integration in SMT systems during training (through static integration) and during translation (through dynamic integration). The work focuses not only on SMT integration techniques, but also on methods for acquiring the linguistic resources needed for the different tasks involved in terminology integration workflows for SMT systems. The proposed methods have been evaluated in automatic and manual evaluation experiments. The results show that both static and dynamic integration methods substantially improve translation quality. The results described in the thesis have been validated in several research projects and deployed in practical solutions. Keywords: statistical machine translation, terminology, cross-lingual information extraction

    The Future of Information Sciences : INFuture2009 : Digital Resources and Knowledge Sharing

    The Prime Machine: a user-friendly corpus tool for English language teaching and self-tutoring based on the Lexical Priming theory of language

    This thesis presents the design and evaluation of a new concordancer called The Prime Machine which has been developed as an English language learning and teaching tool. The software has been designed to provide learners with a multitude of examples from corpus texts and additional information about the contextual environment in which words and combinations of words tend to occur. The prevailing view of how language operates has been that grammar and lexis are separate systems and sentences can be constructed merely by choosing any syntactic structure and slotting in vocabulary. Over the last few decades, however, corpus linguistics has presented challenges to this view of language, drawing on evidence which can be found in the patterning of language choices in texts. Nevertheless, despite some reports of success from researchers in this area, only a limited number of teachers and learners of a second language seem to make direct use of corpus software tools. The desire to develop a new corpus tool grew out of professional experience as an English language teacher and manager in China. This thesis begins by introducing some background information about the role of English in international higher education and the language learning context in China, and then goes on to describe the software architecture and the process by which corpus texts are transformed from their raw state into rows of data in a sophisticated database to be accessed by the concordancer. It then introduces innovations including several aspects of the search screen interface, the concordance line display and the use of collocation data. The software provides a rich learning platform for language learners to independently look up and compare similar words, different word forms, different collocations and the same words across two corpora. Underpinning the design is a view of language which draws on Michael Hoey's theory of Lexical Priming.
    The software is designed to make it possible to see tendencies of words and phrases which are not usually apparent in either dictionary examples or the output from other concordancing software. The design features are considered from a pedagogical perspective, focusing on English for Academic Purposes and including important software design principles from Computer Aided Language Learning. Through a small evaluation involving undergraduate students, the software has been shown to have great potential as a tool for the writing process. It is believed that The Prime Machine will be a very useful corpus tool which, while simple to operate, provides a wealth of information for English language teaching and self-tutoring.