    Enhancing scarce-resource language translation through pivot combinations

    Chinese and Spanish are two of the most widely spoken languages in the world. However, little machine translation research has addressed this language pair. We experiment with the Chinese-Spanish parallel corpus from the United Nations to explore SMT strategies that use a pivot language. In particular, we present two well-known pivoting approaches: the cascade system and the pseudo-corpus. As pivot languages we use English, Arabic and French. Results show that English is the best pivot language between Chinese and Spanish. As a new strategy, we propose combining the pivot strategies, a combination that substantially outperforms the direct translation strategy.
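
The two pivoting strategies named above can be sketched as follows. This is a minimal illustration, not the authors' systems: it assumes toy word-for-word lexicons (the `make_translator` helper and the pinyin, English and Spanish entries are hypothetical) in place of trained SMT models. The pseudo-corpus function shows one common formulation, translating the pivot side of a source-pivot corpus into the target language.

```python
from typing import Callable, Dict, List, Tuple

Translator = Callable[[str], str]

def make_translator(lexicon: Dict[str, str]) -> Translator:
    """Toy word-for-word translator (hypothetical stand-in for a trained SMT system)."""
    def translate(sentence: str) -> str:
        # Unknown tokens are passed through unchanged.
        return " ".join(lexicon.get(tok, tok) for tok in sentence.split())
    return translate

def cascade(src_to_pivot: Translator, pivot_to_tgt: Translator) -> Translator:
    """Cascade strategy: translate source -> pivot, then pivot -> target."""
    return lambda sentence: pivot_to_tgt(src_to_pivot(sentence))

def pseudo_corpus(src_pivot_corpus: List[Tuple[str, str]],
                  pivot_to_tgt: Translator) -> List[Tuple[str, str]]:
    """Pseudo-corpus strategy: translate the pivot side of a source-pivot
    corpus into the target language, giving synthetic source-target pairs."""
    return [(src, pivot_to_tgt(pivot)) for src, pivot in src_pivot_corpus]

if __name__ == "__main__":
    zh_en = make_translator({"ni": "you", "hao": "good"})   # pinyin -> English
    en_es = make_translator({"you": "tu", "good": "bien"})  # English -> Spanish
    print(cascade(zh_en, en_es)("ni hao"))  # prints "tu bien"
    print(pseudo_corpus([("ni hao", "you good")], en_es))
```

The cascade translates at decoding time through two systems, whereas the pseudo-corpus moves the pivot step to training time: a direct source-target system would then be trained on the synthetic pairs.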

    A Client mobile application for Chinese-Spanish statistical machine translation

    This show-and-tell paper describes a client mobile application for Chinese-Spanish machine translation. The system combines a standard server-based statistical machine translation (SMT) system, which requires online operation, with different input modalities including text, optical character recognition (OCR) and automatic speech recognition (ASR). It also includes an index-based search engine that supports offline translation.
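
An index-based offline fallback of this kind could look like the sketch below. It is a hypothetical illustration, not the application's actual search engine: it assumes the app stores previously translated (source, translation) pairs and returns the stored translation whose source sentence shares the most distinct tokens with the query. `OfflineIndex` and its ranking rule are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

class OfflineIndex:
    """Hypothetical inverted index over stored (source, translation) pairs."""

    def __init__(self, pairs: List[Tuple[str, str]]) -> None:
        self.pairs = pairs
        # token -> ids of stored source sentences containing that token
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        for i, (source, _) in enumerate(pairs):
            for tok in set(source.lower().split()):
                self.postings[tok].add(i)

    def lookup(self, query: str) -> Optional[str]:
        """Return the stored translation whose source shares the most
        distinct tokens with the query, or None if nothing matches."""
        scores: Dict[int, int] = defaultdict(int)
        for tok in set(query.lower().split()):
            for i in self.postings.get(tok, ()):
                scores[i] += 1
        if not scores:
            return None
        # Highest overlap wins; ties go to the earliest stored pair.
        best = max(scores, key=lambda i: (scores[i], -i))
        return self.pairs[best][1]

if __name__ == "__main__":
    index = OfflineIndex([("where is the station", "donde esta la estacion"),
                          ("thank you", "gracias")])
    print(index.lookup("where is the train station"))
```

An inverted index keeps lookups proportional to the query's postings rather than to the whole stored collection, which suits a device with no server connection.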

    Plagiarism detection using information retrieval and similarity measures based on image processing techniques

    This paper describes the participation of the Barcelona Media Innovation Center in the 2nd International Competition on Plagiarism Detection. In particular, our system focused on the external plagiarism detection task, which assumes that the source documents are available. We present a two-step approach. In the first step, we build an information retrieval system based on Solr/Lucene, segmenting both suspicious and source documents into smaller texts. A bag-of-words search provides a first selection of potentially plagiarized texts. In the second step, each promising pair is investigated further. We implemented a sliding-window approach that computes pairwise cosine distances between overlapping text segments from the source and suspicious documents. The result is a similarity matrix between text segments, which is smoothed by low-pass 2-D filtering. Plagiarized segments are then identified in the smoothed similarity matrix using image processing techniques. Our results placed in the middle of the official ranking, which considered two types of plagiarism together: intrinsic and external.
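
The second step (windowed cosine similarity, low-pass smoothing, detection) can be sketched in plain Python. This is a minimal sketch under stated assumptions: the window size and step, the 3x3 box filter used as the low-pass filter, and the simple threshold that stands in for the paper's image processing techniques are all illustrative choices, not the system's actual parameters.

```python
import math
from collections import Counter
from typing import List, Tuple

Matrix = List[List[float]]

def windows(tokens: List[str], size: int, step: int) -> List[Counter]:
    """Split a token list into overlapping bag-of-words windows."""
    last_start = max(len(tokens) - size, 0)
    return [Counter(tokens[i:i + size]) for i in range(0, last_start + 1, step)]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors (1 minus cosine distance)."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_matrix(source: List[str], suspicious: List[str],
                      size: int = 4, step: int = 2) -> Matrix:
    """Rows: source windows; columns: suspicious windows."""
    src_w = windows(source, size, step)
    sus_w = windows(suspicious, size, step)
    return [[cosine(a, b) for b in sus_w] for a in src_w]

def box_filter(m: Matrix) -> Matrix:
    """3x3 moving average, a simple 2-D low-pass filter.
    Border cells average only the neighbours that exist."""
    rows, cols = len(m), len(m[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            vals = [m[r][c]
                    for r in range(max(i - 1, 0), min(i + 2, rows))
                    for c in range(max(j - 1, 0), min(j + 2, cols))]
            out[i][j] = sum(vals) / len(vals)
    return out

def detect(m: Matrix, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """(source window, suspicious window) pairs at or above the threshold;
    a simple stand-in for the image-processing step."""
    return [(i, j) for i, row in enumerate(m)
            for j, v in enumerate(row) if v >= threshold]

if __name__ == "__main__":
    src = "the quick brown fox jumps over the lazy dog today".split()
    sus = "yesterday the quick brown fox jumps over a sleepy cat".split()
    print(detect(box_filter(similarity_matrix(src, sus))))
```

Copied passages appear as bright diagonal runs in the matrix; smoothing suppresses isolated high cells caused by common words while preserving those runs.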

    The sources of the teachings given to Pero Niño in El Victorial

    Sentence similarity-based source context modelling in PBSMT

    Target phrase selection, a crucial component of the state-of-the-art phrase-based statistical machine translation (PBSMT) model, plays a key role in generating accurate translation hypotheses. Inspired by context-rich word-sense disambiguation techniques, machine translation (MT) researchers have successfully integrated various types of source-language context into the PBSMT model to improve target phrase selection. Among the various types of lexical and syntactic features, lexical syntactic descriptions in the form of supertags, which preserve long-range word-to-word dependencies in a sentence, have proven to be effective. These rich contextual features can disambiguate a source phrase on the basis of that phrase's local syntactic behaviour. In addition to local contextual information, global contextual information such as the grammatical structure of a sentence, sentence length and n-gram word sequences could provide further important information for this phrase-sense disambiguation. In this work, we explore various sentence-similarity features that measure the similarity between a source sentence to be translated and the source side of the bilingual training sentences, and we integrate these features directly into the PBSMT model. We performed experiments on an English-to-Chinese translation task, applying sentence-similarity features both individually and in combination with supertag-based features. We evaluate the performance of our approach and report a statistically significant relative improvement of 5.25% in BLEU score when adding a sentence-similarity feature together with a supertag-based feature.
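
One way to turn sentence similarity into a phrase-level feature is sketched below. It is an illustration under assumptions, not the paper's feature definitions: it assumes, hypothetically, that each phrase pair records the indices of the training sentences it was extracted from, and it uses Jaccard overlap of unigram and bigram sets as the similarity measure. The paper explores several similarity features; this is only one illustrative choice.

```python
from typing import Dict, List, Sequence, Set, Tuple

NGram = Tuple[str, ...]
PhrasePair = Tuple[str, str]

def ngrams(tokens: Sequence[str], max_n: int = 2) -> Set[NGram]:
    """All n-grams of orders 1..max_n."""
    return {tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}

def sentence_similarity(a: str, b: str, max_n: int = 2) -> float:
    """Jaccard overlap of the two sentences' n-gram sets, in [0, 1]."""
    ga, gb = ngrams(a.split(), max_n), ngrams(b.split(), max_n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0

def similarity_feature(input_sentence: str, pair: PhrasePair,
                       extracted_from: Dict[PhrasePair, List[int]],
                       training_sources: List[str]) -> float:
    """Best similarity between the input and any training sentence the
    phrase pair was extracted from; 0.0 for pairs with no recorded origin."""
    ids = extracted_from.get(pair, [])
    return max((sentence_similarity(input_sentence, training_sources[i])
                for i in ids), default=0.0)

if __name__ == "__main__":
    sources = ["the bank of the river", "he went to the bank to deposit money"]
    origins = {("bank", "yinhang"): [0, 1]}  # hypothetical pinyin target
    print(similarity_feature("she went to the bank", ("bank", "yinhang"),
                             origins, sources))
```

Such a score would be added as one more feature in the log-linear model, favouring phrase pairs learned from training sentences that resemble the input.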

    Gender inequality and economic growth in Spain: an exploratory analysis.

    This paper compares the geometric variant of the Gender-Related Development Index with that of the Human Development Index for Spanish provinces (EUROSTAT Nomenclature of Territorial Units for Statistics-3, NUTS-3) in 1959, 1981, and 1999. The main objective is to carry out an exploratory analysis of the relationship between these indices and two alternative indices of gender inequality: the Relative Status of Women and the Gender Inequality Index. The relationship between these indices and economic growth at the provincial level is also analyzed.

    Advertising communication and public relations studies

    Advertising, in its broadest sense, has become a distinct subject of academic study in the Valencian Community since 2000, when degree programmes in the field became established and a group of new researchers completed their doctorates. The lines of research are now broad and diverse. At the Universitat d'Alacant, the Universitat Jaume I in Castelló and the Universidad Cardenal Herrera - CEU (UCH - CEU) in Valencia, research covers, besides the analysis of advertising texts, communication systems and policies and the teaching of persuasive communication, among other topics. However, the particular character of each institution has given each university its own focus: at Alacant, studies centred on social welfare stand out; at Castelló, studies oriented toward commercial and corporate communication; and at UCH - CEU, studies of advertising processes and techniques and of new technologies.

    Measuring better for sustainable development. The democratic dimension missing from the HDI

    Despite the United Nations' attempts to obtain an empirical and relative measure of human development that goes beyond simply considering economic progress in terms of growth and structural change, the weight of the income variable in the construction of the HDI is so large that this composite indicator becomes increasingly redundant, a consequence of the reductionist nature of its measurement. A redefinition of the HDI is therefore needed, one that emphasizes the political dimension that lies at the origin of the very definition of human development.

    UPC-BMIC-VDU system description for the IWSLT 2010: testing several collocation segmentations in a phrase-based SMT system

    This paper describes the UPC-BMIC-VMU participation in the IWSLT 2010 evaluation campaign. The SMT system is a standard phrase-based system enriched with novel segmentations. These segmentations are computed using statistical measures such as Log-likelihood, T-score, Chi-squared, Dice, Mutual Information and Gravity-Counts. Analysis of the translation results divides the measures into three groups. First, Log-likelihood, Chi-squared and T-score tend to combine high-frequency words, and their collocation segments are very short; they improve the SMT system by adding new translation units. Second, Mutual Information and Dice tend to combine low-frequency words, and their collocation segments are short; they improve the SMT system by smoothing the translation units. Third, Gravity-Counts tends to combine both high- and low-frequency words, and its collocation segments are long; in this case, however, the SMT system is not improved. Thus, the road map for improving the translation system is to introduce new phrases containing either low-frequency or high-frequency words: it is hard to improve translation quality by introducing new phrases that mix low- and high-frequency words. Experimental results are reported on the French-to-English IWSLT 2010 evaluation, where our system ranked 3rd out of nine systems.
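
Most of the association measures named above can be computed from a 2x2 contingency table of bigram counts. The sketch below uses standard textbook definitions (pointwise mutual information, Dice, t-score, Pearson's chi-squared and the log-likelihood ratio G²); it is not the segmentation tool used in the paper, it omits Gravity-Counts, and it assumes `n` is the total number of bigrams with 0 < c1 < n, 0 < c2 < n and c12 > 0.

```python
import math
from typing import Dict

def association(c12: int, c1: int, c2: int, n: int) -> Dict[str, float]:
    """Association scores for the bigram (w1, w2).

    c12: frequency of the bigram w1 w2
    c1:  frequency of w1 in first position
    c2:  frequency of w2 in second position
    n:   total number of bigrams
    """
    # Observed 2x2 contingency table: rows = w1 / not w1, cols = w2 / not w2.
    o = [[c12, c1 - c12], [c2 - c12, n - c1 - c2 + c12]]
    rows, cols = [c1, n - c1], [c2, n - c2]
    # Expected counts under independence of the two positions.
    e = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    cells = [(i, j) for i in range(2) for j in range(2)]
    chi2 = sum((o[i][j] - e[i][j]) ** 2 / e[i][j] for i, j in cells)
    # G2 = 2 * sum O ln(O / E), with 0 * ln 0 taken as 0.
    g2 = 2 * sum(o[i][j] * math.log(o[i][j] / e[i][j])
                 for i, j in cells if o[i][j] > 0)
    return {
        "pmi": math.log2(c12 / e[0][0]),
        "dice": 2 * c12 / (c1 + c2),
        "t_score": (c12 - e[0][0]) / math.sqrt(c12),
        "chi_squared": chi2,
        "log_likelihood": g2,
    }

if __name__ == "__main__":
    print(association(c12=10, c1=10, c2=10, n=100))
```

A simple segmenter could then join adjacent words whose score exceeds a threshold. The measures live on different scales, so a threshold chosen for one measure does not transfer to another.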

    Using collocation segmentation to augment the phrase table

    This paper describes the 2010 phrase-based statistical machine translation system developed at the TALP Research Center of the UPC in cooperation with BMIC and VMU. In phrase-based SMT, the phrase table is the main translation resource. It is created by extracting phrases from an aligned parallel corpus and then computing translation model scores for them. Performing a collocation segmentation over the source and target corpora before alignment causes different and longer phrases to be extracted from the same original documents. We performed this segmentation and used the union of the resulting phrase set with the phrase set extracted from the non-segmented corpus to compute the phrase table. We present the configurations considered and report results obtained with internal and official test sets.
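
Building the phrase table from the union of the two phrase sets can be sketched as follows. This is a hypothetical illustration, not the TALP pipeline: it assumes that collocation segments are written with `_` joining their words, that segmented phrases are mapped back to ordinary tokens before pooling, that occurrences from both extractions are simply added together, and that only the direct relative-frequency score phi(t|s) is estimated. A real phrase table typically carries further scores, such as the inverse probability and lexical weights.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

PhrasePair = Tuple[str, str]

def unsegment(phrase: str, joiner: str = "_") -> str:
    """Map a collocation-segmented phrase back to ordinary tokens
    (assumes the hypothetical '_' joiner never occurs in real words)."""
    return phrase.replace(joiner, " ")

def union_phrase_table(plain: Iterable[PhrasePair],
                       segmented: Iterable[PhrasePair]) -> Dict[PhrasePair, float]:
    """Pool phrase-pair occurrences from both extractions and estimate
    the relative frequency phi(t|s) = count(s, t) / count(s)."""
    counts: Counter = Counter(plain)
    counts.update((unsegment(s), unsegment(t)) for s, t in segmented)
    source_totals: Dict[str, int] = defaultdict(int)
    for (src, _), c in counts.items():
        source_totals[src] += c
    return {(src, tgt): c / source_totals[src]
            for (src, tgt), c in counts.items()}

if __name__ == "__main__":
    plain = [("la maison", "the house"), ("la maison", "the home")]
    segmented = [("la_maison", "the_house")]
    for pair, prob in union_phrase_table(plain, segmented).items():
        print(pair, round(prob, 3))
```

Pooling counts means a phrase pair found by both extractions gains probability mass; taking a set union and re-counting from the corpus would be an alternative design.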