1,332 research outputs found

    Corpus-Based Machine Translation: A Study Case for the e-Government of Costa Rica

    This research studies the state of the art in machine translation technologies. Following an overview of the architecture and mechanisms underpinning phrase-based statistical (PB-SMT) and neural (NMT) systems, we focus on a use case that tests the translator's ability to exploit the full potential of these technologies, particularly PB-SMT. The use case urges the translator to draw on his or her full professional toolbox to improve the translation output through data preparation, training, evaluation, and engine-tuning tasks.

    A Methodology for the Classification of Résumé Text Documents Based on Machine Learning

    The personnel-selection process is complex and requires a large amount of information and analysis to find suitable candidates for a position. It includes several stages, such as résumé screening, psychological testing, and reference checks. Résumé analysis is particularly challenging: it involves human intervention, and the sheer volume of information is difficult to process by computer. Companies may also face difficulties and high costs due to the complexity of the process and high demand in the labor market. To address this problem, we propose the CVNLP (Curriculum Natural Language Processing) methodology, which uses a set of 725 résumés in PDF, DOCX, and DOC formats to analyze résumés efficiently and effectively. The methodology is applied in a cross-cutting manner and has demonstrated its effectiveness in personnel selection. By reducing costs and improving the efficiency of the selection process, companies can focus on their core business and streamline hiring. In summary, the CVNLP methodology is a promising solution for improving the effectiveness and efficiency of personnel-selection processes, especially for SMEs with limited resources.
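    The pipeline the abstract describes (extract text from résumé files, then classify it by machine learning) can be sketched minimally. The snippet below is an illustrative sketch only, assuming scikit-learn and a tiny hypothetical labeled sample; the actual CVNLP methodology and its features are not detailed in the abstract.

    ```python
    # Minimal sketch of a resume text-classification step: TF-IDF
    # features plus logistic regression on hypothetical snippets
    # that would already have been extracted from PDF/DOCX files.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical labeled resume snippets (illustrative data).
    docs = [
        "five years of experience in Java backend development",
        "led accounting teams and prepared financial statements",
        "built REST services and microservices in Spring Boot",
        "managed budgets, audits, and payroll processes",
    ]
    labels = ["engineering", "finance", "engineering", "finance"]

    # Vectorize the text and fit a classifier in one pipeline.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(docs, labels)

    # Classify a new, unseen resume snippet.
    print(model.predict(["senior Java developer with experience"])[0])
    ```

    A production system would add document parsing (PDF/DOCX to text), a much larger labeled corpus, and proper evaluation, but the core classification step has this shape.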

    Data analytics 2016: proceedings of the fifth international conference on data analytics


    Transformer Models: From Model Inspection to Applications in Patents

    Natural Language Processing is used to address several tasks, both linguistic ones, such as part-of-speech tagging and dependency parsing, and downstream tasks, such as machine translation and sentiment analysis. Dedicated approaches have been developed over time to tackle these tasks. A methodology that increases performance on all of them in a unified manner is language modeling: a model is pre-trained to replace masked tokens in large amounts of text, either randomly within chunks of text or sequentially one after the other, so that it develops general-purpose representations that improve performance on many downstream tasks at once. The neural network architecture that currently performs this task best is the transformer; moreover, model size and data scale are essential to the development of information-rich representations. The availability of large-scale datasets and the use of models with billions of parameters are currently the most effective path toward better representations of text. However, large models make the outputs they provide harder to interpret, and several studies have therefore investigated the representations produced by transformer models trained on large-scale datasets. In this thesis I investigate these models from several perspectives. I study the linguistic properties of the representations provided by BERT, a language model trained mostly on the English Wikipedia, to understand whether the information it encodes is localized within specific entries of the vector representation. In doing so, I identify special weights that show high relevance to several distinct linguistic probing tasks; I then investigate the cause of these special weights and link them to token distribution and special tokens. To complement this general-purpose analysis and extend it to more specific use cases, given the wide range of applications for language models, I study their effectiveness on technical documentation, specifically patents. I use both general-purpose and dedicated models to identify domain-specific entities, such as users of the inventions and technologies, and to segment patent text. Throughout, I complement performance analysis with careful measurements of data and model properties to understand whether the conclusions drawn for general-purpose models also hold in this context.
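    The masked-token pre-training objective the abstract describes can be illustrated with a toy sketch: hide some tokens in a sequence and keep the originals as prediction targets. The masking rate, seed, and `[MASK]` placeholder below are illustrative choices, not BERT's exact recipe.

    ```python
    # Toy illustration of masked language modeling: randomly replace
    # tokens with a [MASK] placeholder and record the originals as
    # the targets a model would be trained to restore.
    import random

    def mask_tokens(tokens, rate=0.3, seed=0):
        """Return (masked sequence, {position: original token})."""
        rng = random.Random(seed)
        masked, targets = [], {}
        for i, tok in enumerate(tokens):
            if rng.random() < rate:
                masked.append("[MASK]")
                targets[i] = tok  # training target at this position
            else:
                masked.append(tok)
        return masked, targets

    sentence = "language models learn general purpose representations".split()
    masked, targets = mask_tokens(sentence)
    print(masked)
    print(targets)
    ```

    A real pre-training setup operates on subword tokens over billions of sentences and trains the transformer to predict every masked position from its bidirectional context, but the input/target construction has this form.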

    Human Rights in Patient Care: A Practitioner Guide - Macedonia

    Health systems can too often be places of punishment, coercion, and violations of basic rights, rather than places of treatment and care. In many cases, existing laws and tools that provide remedies are not adequately used to protect rights. This Practitioner Guide series presents practical how-to manuals for lawyers interested in taking cases around human rights in patient care. The manuals examine patient and provider rights and responsibilities, as well as procedures for protection through both the formal court system and alternative mechanisms, in 10 countries. Each Practitioner Guide is country-specific, supplementing coverage of the international and regional framework with national standards and procedures for the following: Armenia, Georgia, Kazakhstan, Kyrgyzstan, Macedonia, Moldova (forthcoming), Romania, Russia (forthcoming), Serbia, and Ukraine. This series is the first to systematically examine the application of constitutional, civil, and criminal laws; categorize them by right; and provide examples and practical tips. As such, the guides are useful for medical professionals, public health managers, Ministries of Health and Justice personnel, patient advocacy groups, and patients themselves. Advancing Human Rights in Patient Care: The Law in Seven Transitional Countries is a compendium that supplements the practitioner guides. It provides the first comparative overview of legal norms, practice canons, and procedures for addressing rights in health care in Armenia, Georgia, Kazakhstan, Kyrgyzstan, Macedonia, Russia, and Ukraine. A Legal Fellow in Human Rights in each country is updating each guide and building the field of human rights in patient care through trainings and the development of materials, networks, and jurisprudence. Fellows are recent law graduates based at a local organization with expertise and an interest in expanding work in law, human rights, and patient care. To learn more about the fellowships, please visit health-rights.org.

    Special Libraries, November 1965

    Volume 56, Issue 9