
    Using collocation segmentation to augment the phrase table

    This paper describes the 2010 phrase-based statistical machine translation system developed at the TALP Research Center of the UPC in cooperation with BMIC and VMU. In phrase-based SMT, the phrase table is the main tool in translation. It is created by extracting phrases from an aligned parallel corpus and then computing translation model scores with them. Performing a collocation segmentation over the source and target corpora before the alignment causes different and larger phrases to be extracted from the same original documents. We performed this segmentation and used the union of the resulting phrase set with the phrase set extracted from the non-segmented corpus to compute the phrase table. We present the configurations considered and report results obtained with internal and official test sets.
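    The union step described above can be sketched roughly as follows. This is a minimal illustration, not the actual TALP pipeline: real phrase extraction runs over word alignments and the phrase table carries several translation-model features, whereas here a single relative-frequency score is computed over the pooled phrase pairs.

```python
from collections import Counter

def build_phrase_table(plain_pairs, segmented_pairs):
    """Combine phrase pairs extracted from the non-segmented and the
    collocation-segmented corpus, then score by relative frequency."""
    counts = Counter()
    for pair in plain_pairs:       # phrases from the original corpus
        counts[pair] += 1
    for pair in segmented_pairs:   # different, larger phrases from the segmented corpus
        counts[pair] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}
```

    A pair extracted from both corpora is reinforced, while segmentation-only pairs enter the table as new translation units.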

    UPC-BMIC-VDU system description for the IWSLT 2010: testing several collocation segmentations in a phrase-based SMT system

    This paper describes the UPC-BMIC-VMU participation in the IWSLT 2010 evaluation campaign. The SMT system is a standard phrase-based system enriched with novel segmentations. These novel segmentations are computed using statistical measures such as Log-likelihood, T-score, Chi-squared, Dice, Mutual Information, or Gravity-Counts. The analysis of the translation results allows the measures to be divided into three groups. First, Log-likelihood, Chi-squared, and T-score tend to combine high-frequency words, and their collocation segments are very short; they improve the SMT system by adding new translation units. Second, Mutual Information and Dice tend to combine low-frequency words, and their collocation segments are short; they improve the SMT system by smoothing the translation units. Third, Gravity-Counts tends to combine both high- and low-frequency words, and its collocation segments are long; in this case, however, the SMT system is not improved. Thus, the road map for translation system improvement is to introduce new phrases containing either low-frequency or high-frequency words; it is hard to improve translation quality by introducing new phrases that mix low- and high-frequency words. Experimental results are reported for the French-to-English IWSLT 2010 evaluation, where our system was ranked 3rd out of nine systems.
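    Two of the measures mentioned above, Mutual Information and Dice, can be computed from raw corpus counts roughly as follows. This is a simplified sketch over adjacent word pairs only; the actual segmentation algorithm and the remaining measures (Log-likelihood, T-score, Chi-squared, Gravity-Counts) are not shown.

```python
import math
from collections import Counter

def collocation_scores(tokens):
    """Score each adjacent word pair with pointwise mutual information
    and the Dice coefficient."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), c12 in bigrams.items():
        p12 = c12 / (n - 1)                    # bigram probability
        p1, p2 = unigrams[w1] / n, unigrams[w2] / n
        pmi = math.log2(p12 / (p1 * p2))       # mutual information
        dice = 2 * c12 / (unigrams[w1] + unigrams[w2])
        scores[(w1, w2)] = (pmi, dice)
    return scores
```

    The contrast the paper draws falls out of the formulas: Dice normalizes only by the two word frequencies, so rare word pairs that always co-occur score highly, while count-based tests like Log-likelihood favor pairs that are frequent in absolute terms.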

    The Toxoplasma gondii plastid replication and repair enzyme complex, PREX

    A plastid-like organelle, the apicoplast, is essential to the majority of medically and veterinarily important apicomplexan protozoa, including Toxoplasma gondii and Plasmodium. The apicoplast contains multiple copies of a 35 kb genome, the replication of which is dependent upon nuclear-encoded proteins that are imported into the organelle. In P. falciparum, an unusual multi-functional gene, pfprex, was previously identified and inferred to encode a protein with DNA primase, DNA helicase, and DNA polymerase activities. Herein, we report the presence of a prex orthologue in T. gondii. The protein is predicted to have a bi-partite apicoplast targeting sequence, similar to that demonstrated on the PfPREX polypeptide, capable of delivering marker proteins to the apicoplast. Unlike the P. falciparum gene, which is devoid of introns, the T. gondii prex gene carries 19 introns, which are spliced to produce a contiguous mRNA. Bacterial expression of the polymerase domain reveals the protein to be active. Consistent with the reported absence of a plastid in Cryptosporidium species, in silico analysis of their genomes failed to demonstrate an orthologue of prex. These studies indicate that prex is conserved across the plastid-bearing apicomplexans and may play an important role in the replication of the plastid genome.

    How to Improve Visual Acuity in Keratoconic Cornea?

    Keratoconus is one of the most important corneal diseases causing preventable blindness. We therefore review the main techniques for improving visual acuity in patients with progressive and non-progressive keratoconus, in order to expand knowledge of the range of therapeutic possibilities available today and of the benefits and risks of each of these alternatives.

    Improving statistical machine translation through adaptation and learning

    With the arrival of free on-line machine translation (MT) systems came the possibility of improving automatic translations with the help of everyday users. One method of achieving such improvements is to ask the users themselves for a better translation. The system may have made a mistake, and if the user is able to detect it, letting the user teach the system where the mistake was made is valuable help, so that the system does not repeat it in a similar situation. Most of the translation systems available on-line provide a text area for users to suggest a better translation (like Google's translator) or a ranking system for them to use (like Microsoft's). In 2009, as part of the Seventh Framework Programme of the European Commission, the FAUST project started with the goal of developing "machine translation (MT) systems which respond rapidly and intelligently to user feedback". Specifically, one of the project objectives was to "develop mechanisms for instantaneously incorporating user feedback into the MT engines that are used in production environments, ...". As a member of the FAUST project, this thesis focused on developing one such mechanism. Formally, the general objective of this work was to design and implement a strategy to improve the translation quality of an already trained Statistical Machine Translation (SMT) system, using translations of input sentences that are corrections of the system's attempts to translate them. To address this problem, we divided it into three specific objectives: 1. Define a relation between the words of a correction sentence and the words of the system's translation, in order to detect the errors that the former aims to solve. 2. Include the error corrections in the original system, so that it learns how to solve them should a similar situation occur. 3. Test the strategy in different scenarios and with different data, in order to validate the applications of the proposed methodology.
The main contributions of this Ph.D. thesis to the SMT field are: - We defined a similarity function that compares an MT system output with a translation reference for that output and aligns the errors made by the system with the correct translations found in the reference. This information is then used to compute an alignment between the original input sentence and the reference. - We defined a method to perform domain adaptation based on the aforementioned alignment. Using this alignment with an in-domain parallel corpus, we extract new translation units corresponding both to units found in the system that were correctly chosen during translation and to new units that include the correct translations found in the reference. These new units are then scored and combined with the units of the original system in order to improve its quality in terms of both human and automatic metrics. - We successfully applied the method to a new task: improving an SMT system's translation quality using post-editions provided by real users of the system. In this case, the alignment was computed over a parallel corpus built from post-editions, extracting translation units corresponding both to units found in the system that were correctly chosen during translation and to new units that include the corrections found in the feedback provided. - The method proposed in this dissertation achieves significant improvements in translation quality with a small amount of learning material, corresponding to 0.5% of the training material used to build the original system. Results from our evaluations also indicate that the improvement achieved with the domain adaptation strategy is measurable by both automatic and human-based evaluation metrics.
This thesis proposes a new method for improving a Statistical Machine Translation (SMT) system using post-editions of its automatic translations. The strategy can be related to domain adaptation, considering the post-editions collected from real users of the translation system as the in-domain material to adapt to. The method compares the post-editions with the automatic translations in order to automatically detect the places where the translator made a mistake, so that it can learn from them. Once the errors have been detected, a word-level alignment is computed between the original sentences and the post-editions, in order to extract translation units that are then incorporated into the baseline system so that the errors are corrected in future translations. Our results show statistically significant improvements from a data set amounting to 0.5% of the material used during training. Together with automatic quality metrics, we also present a qualitative analysis of the system to validate the results. The translation improvements are observed mostly in the lexicon and in word reordering, followed by morphological corrections. The strategy, which introduces the concepts of augmented corpus, similarity function, and derived translation units, is tested with two SMT paradigms (N-gram-based and phrase-based translation), with two language pairs (Catalan-Spanish and English-Spanish), and in different domain adaptation scenarios, including an open domain in which the system was adapted using requests collected from real users over the Internet, obtaining similar results in all the tests. The results of this research form part of the FAUST project (Feedback Analysis for User adaptive Statistical Translation), a project of the Seventh Framework Programme of the European Commission.
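The word-level error detection at the core of this strategy can be illustrated with a minimal sketch. The function name and the plain Levenshtein backtrace below are illustrative assumptions, not the thesis's actual similarity function, which operates over full translation units and alignment scores.

```python
def align_errors(hypothesis, reference):
    """Locate the words a post-editor changed by aligning the system
    translation (hypothesis) with the post-edition (reference)."""
    h, r = hypothesis.split(), reference.split()
    # word-level edit-distance table
    d = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):
        d[i][0] = i
    for j in range(len(r) + 1):
        d[0][j] = j
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            cost = 0 if h[i - 1] == r[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    # backtrace, collecting (system_word, corrected_word) pairs
    errors, i, j = [], len(h), len(r)
    while i > 0 and j > 0:
        if h[i - 1] == r[j - 1] and d[i][j] == d[i - 1][j - 1]:
            i, j = i - 1, j - 1                  # word kept by the editor
        elif d[i][j] == d[i - 1][j - 1] + 1:
            errors.append((h[i - 1], r[j - 1]))  # substitution
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            errors.append((h[i - 1], None))      # system word deleted
            i -= 1
        else:
            errors.append((None, r[j - 1]))      # word added by the editor
            j -= 1
    while i > 0:
        errors.append((h[i - 1], None))
        i -= 1
    while j > 0:
        errors.append((None, r[j - 1]))
        j -= 1
    return list(reversed(errors))
```

The substitution pairs returned by such an alignment are the raw material for new translation units: each one pairs a system word that was wrong with the correction the user supplied.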

    Continuous improvement integrating technological tools to assertively accelerate decision-making of logistics. Case implemented in a construction materials supplier company

    Many logistics infrastructure designs around the world are supported by studies carried out with various computational tools, but most of these solutions are used in isolation and are poorly understood. We therefore propose to develop this research on the basis of a Logistics Reference Model, which makes it possible to visualize, manage, and analyze the different processes and logistical scenarios of the system, with the aim of executing the best cost-benefit strategy in a company dedicated to the distribution of construction materials. By implementing this methodology, the management of the company under study was able to make the best decision for structuring its picking and dispatch processes. The results showed a 50% reduction in inventory review time, together with a 7% increase in reliability that places the company at around 85.68%, and a decrease of between 20% and 40% in the cycle time of each order, which positively impacted the customer service level. In addition, lead times for the receipt of materials from suppliers were reduced by between 15% and 30%, and the number of warehouses went from 5 independent warehouses to a single distribution center.

    Mars Science Laboratory Workstation Test Set

    The Workstation Test Set (WSTS), developed by the Mars Science Laboratory, is a computer program that enables flight software development on virtual MSL avionics. The WSTS is a non-real-time flight avionics simulator designed to be completely software-based and to run on a workstation-class Linux PC.