
92 research outputs found

Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

Author: EHRMANN MAUD
TURCHI MARCO
Publication venue: Centro de Innovación y Desarrollo Tecnológico en Cómputo, Instituto Politécnico Nacional, Mexico
Publication date: 09/08/2011

The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on parallel data, and phrases that are not present in the training data are not correctly translated. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data, using external morphological resources instead. A set of new phrase associations is added to the translation and reordering models; each of them corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translations, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction of out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model.

JRC.G.2-Global security and crisis management

JRC Publications Repository

Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization

Author: Torres-Moreno Juan-Manuel
Publication date: 14/09/2012

In Automatic Text Summarization, preprocessing is an important phase to reduce the space of textual representation. Classically, stemming and lemmatization have been widely used for normalizing words. However, even when normalization is applied to large texts, the curse of dimensionality can disturb the performance of summarizers. This paper describes a new method for the normalization of words that further reduces the space of representation. We propose to reduce each word to its initial letters, as a form of Ultra-stemming. The results show that Ultra-stemming not only preserves the content of summaries produced by this representation, but can often dramatically improve the performance of the systems. Summaries on trilingual corpora were evaluated automatically with Fresa. Results confirm an increase in performance, regardless of the summarizer system used.

Comment: 22 pages, 12 figures, 9 tables
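The normalization described in this abstract (reducing each word to its initial letters) can be illustrated with a minimal sketch. The function names, the prefix length `n`, and the whitespace tokenization are assumptions made for illustration; the abstract does not specify them, and this is not the author's implementation.

```python
def ultra_stem(word, n=1):
    """Keep only the first n letters of a word, lowercased (n is an assumed parameter)."""
    return word.lower()[:n]


def normalize(text, n=1):
    """Split on whitespace (an assumed tokenization) and ultra-stem alphabetic tokens."""
    return [ultra_stem(token, n) for token in text.split() if token.isalpha()]


# Morphological variants collapse onto the same short prefix,
# which shrinks the vocabulary the summarizer has to represent.
print(normalize("Summaries summarize summarized texts", n=4))
# ['summ', 'summ', 'summ', 'text']
```

A smaller `n` gives a smaller representation space at the cost of merging unrelated words, so the prefix length is the main knob to tune in a sketch like this.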

arXiv.org e-Print Archive

CiteSeerX

Content Extraction based on Hierarchical Relations in DOM Structures

Author: Insa Cabrera David
López Romero Sergio
Silva Galiana Josep Francesc
Publication venue: IPN, Centro de Innovación y Desarrollo Tecnológico en Cómputo
Publication date: 01/06/2012

This article introduces a new approach for content extraction that exploits the hierarchical inter-relations of the elements in a webpage. Content extraction is a technique used to extract the main textual content from a webpage. This is useful for filtering out advertisements and all the additional information that is not part of the main content. The main idea behind our approach is to use the DOM tree as an explicit representation of the inter-relations of the elements in a webpage. Using the information contained in the DOM tree, we can identify blocks of content and easily determine which of the blocks contains the most text. Thanks to this information, the technique achieves considerable recall and precision. Using the DOM structure for content extraction gives us the benefits of other approaches based on the syntax of the webpage (such as characters, words and tags), but it also gives us very precise information about the related components in a block, thus producing very cohesive blocks.

López Romero, S.; Silva Galiana, JF.; Insa Cabrera, D. (2012). Content Extraction based on Hierarchical Relations in DOM Structures. Research and Development in Computer Science and Engineering. 45:5-12. http://hdl.handle.net/10251/47738
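The idea of using the DOM tree to find the block that holds most of the text can be sketched roughly as follows. This is a simplified heuristic, not the algorithm from the article: the descent rule, the `share` threshold, and all class and function names are assumptions, and the tree is built with Python's standard `html.parser` rather than a full browser DOM.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # elements with no closing tag
SKIPPED_TAGS = {"script", "style"}  # their text is not page content


class Node:
    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.chars = 0  # characters of text directly inside this element

    def total(self):
        """Characters of text in the whole subtree rooted at this node."""
        return self.chars + sum(child.total() for child in self.children)


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if self.current.tag not in SKIPPED_TAGS:
            self.current.chars += len(data.strip())


def extract_main_block(html, share=0.75):
    """Descend while a single child holds more than `share` of the node's text."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    node = builder.root
    while node.children:
        best = max(node.children, key=Node.total)
        if best.total() == 0 or best.total() <= share * node.total():
            break
        node = best
    return node


PAGE = """<html><body>
<div id="nav"><a>Home</a><a>About</a></div>
<div id="main"><p>First long paragraph of the article body text.</p><p>Second long paragraph with more text in it.</p></div>
<div id="ads">Buy now</div>
</body></html>"""

block = extract_main_block(PAGE)
print(block.tag, block.attrs.get("id"))  # div main
```

The descent stops at the smallest element that still dominates the text of its parent, which keeps sibling paragraphs together in one block instead of returning a single paragraph.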

RiuNet

Identification of Central Points in Road Networks using Betweenness Centrality Combined with Traffic Demand

Author: Ana Lucia Cetertich Bazzan
Rodrigo De Abreu Batista
Publication date: 05/03/2020

This paper aims to identify central points in road networks while considering traffic demand. This is done with a variation of betweenness centrality in which the graph that corresponds to the road network is weighted according to the number of routes generated by the traffic demand. To test the proposed approach, three networks were created: the cities of Porto Alegre and Sioux Falls, and a regular 10 × 10 grid. Trips were then microscopically simulated, and the results were compared with the proposed method.
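One way to picture a demand-aware centrality is to count, for each node, how many demanded trips pass through it on their shortest route. The sketch below does exactly that and is a simplified stand-in, not the weighting scheme of the paper, which the abstract does not spell out. The toy network, the edge costs, the demand table, and all function names are hypothetical.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns one shortest path as a list of nodes, or None."""
    dist = {source: 0}
    prev = {}
    done = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, cost in graph[u].items():
            nd = d + cost
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in done:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def demand_centrality(graph, demand):
    """Sum, per node, the trips whose shortest route passes through it (endpoints excluded)."""
    score = dict.fromkeys(graph, 0)
    for (origin, destination), trips in demand.items():
        path = shortest_path(graph, origin, destination)
        if path is None:
            continue
        for node in path[1:-1]:
            score[node] += trips
    return score


def undirected(edges):
    """Build an adjacency dict {u: {v: cost}} from (u, v, cost) triples."""
    graph = {}
    for u, v, cost in edges:
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost
    return graph


# Hypothetical road network and origin-destination demand (trips per pair).
ROADS = undirected([("A", "B", 1), ("B", "C", 1), ("A", "D", 1), ("D", "C", 2), ("C", "E", 1)])
DEMAND = {("A", "E"): 10, ("D", "E"): 5, ("B", "D"): 2}

scores = demand_centrality(ROADS, DEMAND)
print(max(scores, key=scores.get))  # C
```

Unlike plain betweenness centrality, which treats every pair of nodes equally, this count only considers the origin-destination pairs that actually appear in the demand and weights them by their number of trips.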

CiteSeerX

CHATBOT FOR KNOWLEDGE-BASED MUSEUM RECOMMENDER SYSTEM (CASE STUDY: MUSEUM IN JAKARTA)

Author: Baizal Abdurahman
Hakim M. Rayhan
Publication venue: STKIP PGRI Tulungagung
Publication date: 31/05/2022

The recommender systems commonly used to recommend museums are content-based filtering and collaborative filtering. However, these recommender systems suffer from problems such as cold start and data sparsity, because some museums still have few ratings and little feedback. To overcome these problems, a knowledge-based recommender system can be used to recommend museums based on user preferences, so that the system does not need ratings and feedback. User preferences can be obtained with a conversational recommender system that makes use of two-way conversation between the user and the system. A chatbot is one commonly used form of conversational recommender system. This research develops a chatbot that recommends museums in Jakarta using a knowledge-based recommender system. The system uses the Rasa framework to build a chatbot capable of holding a conversation with the user. A knowledge graph and k-nearest neighbor are used to recommend museums based on user preferences. Based on the evaluation carried out, the system can understand user messages and recommend museums based on user preferences. However, the system's performance can still be improved so that it can be relied on in real-world scenarios.

JIPI (Jurnal Ilmiah Penelitian dan Pembelajaran Informatika)

Jurnal Online STKIP PGRI Tulungagung (Sekolah Tinggi Keguruan Dan Ilmu Pendidikan Persatuan Guru Republik Indonesia)

Jurnal Online STKIP PGRI Tulungagung

What is SemEval evaluating?: A Systematic Analysis of Evaluation Campaigns in NLP

Author: Florea Malina
Freitas Andre
Wysocki Oskar
Publication date: 28/05/2020

SemEval is the primary venue in the NLP community for the proposal of new challenges and for the systematic empirical evaluation of NLP systems. This paper provides a systematic quantitative analysis of SemEval, aiming to reveal the patterns of the contributions behind SemEval. By understanding the distribution of task types, metrics, architectures, participation and citations over time, we aim to answer the question of what is being evaluated by SemEval.

Comment: 12 pages, 6 figures

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository

Similarity "Online Learning Content and Learning Management System for Early Detection of Cervical Cancer"

Author: Muljo Hery Harjono
Publication venue: Convergence Information Society (CIS)
Publication date: 01/02/2015

Binus University Repository

Cross-Lingual Zero Pronoun Resolution

Author: Aloraini A
Poesio M
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)
Publication venue: ELRA and the Association for Computational Linguistics
Publication date: 31/05/2020

In languages like Arabic, Chinese, Italian, Japanese, Korean, Portuguese, Spanish, and many others, predicate arguments in certain syntactic positions are not realized instead of being realized as overt pronouns, and are thus called zero- or null-pronouns. Identifying and resolving such omitted arguments is crucial to machine translation, information extraction and other NLP tasks, but depends heavily on semantic coherence and lexical relationships. We propose a BERT-based cross-lingual model for zero pronoun resolution, and evaluate it on the Arabic and Chinese portions of OntoNotes 5.0. As far as we know, ours is the first neural model of zero-pronoun resolution for Arabic; and our model also outperforms the state-of-the-art for Chinese. In the paper we also evaluate BERT feature extraction and fine-tune models on the task, and compare them with our model. We also report on an investigation of BERT layers indicating which layer encodes the most suitable representation for the task. Our code is available at https://github.com/amaloraini/cross-lingual-Z

Queen Mary Research Online