Search CORE

106 research outputs found

MIRACLE at GeoCLEF Query Parsing 2007: Extraction and Classification of Geographical Information

Author: Goñi Menoyo José Miguel
Lana Serrano Sara
Villena Román Julio
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2007
Field of study

This paper describes the participation of MIRACLE research consortium at the Query Parsing task of GeoCLEF 2007. Our system is composed of three main modules. First, the Named Geo-entity Identifier, whose objective is to perform the geo-entity identification and tagging, i.e., to extract the “where” component of the geographical query, should there be any. This module is based on a gazetteer built up from the Geonames geographical database and carries out a sequential process in three steps that consist on geo-entity recognition, geo-entity selection and query tagging. Then, the Query Analyzer parses this tagged query to identify the “what” and “geo-relation” components by means of a rule-based grammar. Finally, a two-level multiclassifier first decides whether the query is indeed a geographical query and, should it be positive, then determines the query type according to the type of information that the user is supposed to be looking for: map, yellow page or information. According to a strict evaluation criterion where a match should have all fields correct, our system reaches a precision value of 42.8% and a recall of 56.6% and our submission is ranked 1st out of 6 participants in the task. A detailed evaluation of the confusion matrixes reveal that some extra effort must be invested in “user-oriented” disambiguation techniques to improve the first level binary classifier for detecting geographical queries, as it is a key component to eliminate many false-positives

Archivo Digital UPM

Report of MIRACLE team for the Ad-Hoc track in CLEF 2006

Author: González Cristóbal José Carlos
Goñi Menoyo José Miguel
Villena Román Julio
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2006
Field of study

This paper presents the 2006 MIRACLE’s team approach to the AdHoc Information Retrieval track. The experiments for this campaign keep on testing our IR approach. First, a baseline set of runs is obtained, including standard components: stemming, transforming, filtering, entities detection and extracting, and others. Then, a extended set of runs is obtained using several types of combinations of these baseline runs. The improvements introduced for this campaign have been a few ones: we have used an entity recognition and indexing prototype tool into our tokenizing scheme, and we have run more combining experiments for the robust multilingual case than in previous campaigns. However, no significative improvements have been achieved. For the this campaign, runs were submitted for the following languages and tracks: - Monolingual: Bulgarian, French, Hungarian, and Portuguese. - Bilingual: English to Bulgarian, French, Hungarian, and Portuguese; Spanish to French and Portuguese; and French to Portuguese. - Robust monolingual: German, English, Spanish, French, Italian, and Dutch. - Robust bilingual: English to German, Italian to Spanish, and French to Dutch. - Robust multilingual: English to robust monolingual languages. We still need to work harder to improve some aspects of our processing scheme, being the most important, to our knowledge, the entities recognition and normalization

Archivo Digital UPM

Miracle’s 2005 Approach to Cross-lingual Information Retrieval

Author: González Cristóbal José Carlos
Goñi Menoyo José Miguel
Villena Román Julio
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2005
Field of study

This paper presents the 2005 Miracle’s team approach to Bilingual and Multilingual Information Retrieval. In the multilingual track, we have concentrated our work on the merging process of the results of monolingual runs to get the multilingual overall result, relying on available translations. In the bilingual and multilingual tracks, we have used available translation resources, and in some cases we have using a combining approach

Archivo Digital UPM

Miracle’s 2005 Approach to Monolingual Information Retrieval

Author: González Cristóbal José Carlos
Goñi Menoyo José Miguel
Villena Román Julio
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2005
Field of study

This paper presents the 2005 Miracle’s team approach to Monolingual Information Retrieval. The goal for the experiments in this year was twofold: continue testing the effect of combination approaches on information retrieval tasks, and improving our basic processing and indexing tools, adapting them to new languages with strange encoding schemes. The starting point was a set of basic components: stemming, transforming, filtering, proper nouns extracting, paragraph extracting, and pseudo-relevance feedback. Some of these basic components were used in different combinations and order of application for document indexing and for query processing. Second order combinations were also tested, by averaging or selective combination of the documents retrieved by different approaches for a particular query

Archivo Digital UPM

Revisión entre pares como instrumento de aprendizaje

Author: Crespo García Raquel
Villena Román Julio
Publication venue
Publication date: 01/03/2005
Field of study

Proyecto de Innovación Docente en las asignaturas de Organización de Contenidos Audiovisuales y e Inteligencia en Redes de ComunicacionesEste artículo describe la experiencia de innovación docente llevada a cabo este último curso basada en la aplicación de la metodología de revisión entre iguales como instrumento para el aprendizaje, desarrollada en las asignaturas Organización de Contenidos Audiovisuales (Ingeniería Técnica de Telecomunicación, especialidad Sonido e Imagen) durante el curso 2003-04 e Inteligencia en Redes de Comunicaciones (Ingeniería de Telecomunicación) durante el curso 2004-05. Los experimentos realizados muestran la utilidad de la metodología de revisión entre pares como instrumento para el aprendizaje, percibido de forma subjetiva por los alumnos participantes y demostrado empíricamente en un experimento de control mediante preguntas test, descrito en este documento.

Universidad Carlos III de Madrid e-Archivo

DAEDALUS at PAN 2014: Guessing tweet author's gender and age

Author: González Cristóbal José Carlos
Villena Román Julio
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2014
Field of study

This paper describes our participation at PAN 2014 author profiling task. Our idea was to define, develop and evaluate a simple machine learning classifier able to guess the gender and the age of a given user based on his/her texts, which could become part of the solution portfolio of the company. We were interested in finding not the best possible classifier that achieves the highest accuracy, but to find the optimum balance between performance and throughput using the most simple strategy and less dependent of external systems. Results show that our software using Naive Bayes Multinomial with a term vector model representation of the text is ranked quite well among the rest of participants in terms of accuracy

Archivo Digital UPM

MIRACLE at ImageCLEFanot 2007: Machine Learning Experiments on Medical Image Annotation

Author: González Cristóbal José Carlos
Goñi Menoyo José Miguel
Lana Serrano Sara
Villena Román Julio
Publication venue: E.U.I.T. Telecomunicación (UPM)
Publication date: 01/01/2007
Field of study

This paper describes the participation of MIRACLE research consortium at the ImageCLEF Medical Image Annotation task of ImageCLEF 2007. Our areas of expertise do not include image analysis, thus we approach this task as a machine-learning problem, regardless of the domain. FIRE is used as a black-box algorithm to extract different groups of image features that are later used for training different classifiers in order to predict the IRMA code. Three types of classifiers are built. The first type is a single classifier that predicts the complete IRMA code. The second type is a two level classifier composed of four classifiers that individually predict each axis of the IRMA code. The third type is similar to the second one but predicts a combined pair of axes. The main idea behind the definition of our experiments is to evaluate whether an axis-by-axis prediction is better than a prediction by pairs of axes or the complete code, or vice versa. We submitted 30 experiments to be evaluated and results are disappointing compared to other groups. However, the main conclusion that can be drawn from the experiments is that, irrespective of the selected image features, the axis-by-axis prediction achieves more accurate results not only than the prediction of a combined pair of axes but also, in turn, than the prediction of the complete IRMA code. In addition, data normalization seems to improve the predictions and vector-based features are preferred over histogram-based ones

Archivo Digital UPM

Report of MIRACLE team for the Ad-Hoc track in CLEF 2007

Author: González Cristóbal José Carlos
Goñi Menoyo José Miguel
Lana Serrano Sara
Villena Román Julio
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/09/2007
Field of study

This paper presents the 2007 MIRACLE’s team approach to the AdHoc Information Retrieval track. The work carried out for this campaign has been reduced to monolingual experiments, in the standard and in the robust tracks. No new approaches have been attempted in this campaign, following the procedures established in our participation in previous campaigns. For this campaign, runs were submitted for the following languages and tracks: - Monolingual: Bulgarian, Hungarian, and Czech. - Robust monolingual: French, English and Portuguese. There is still some room for improvement around multilingual named entities recognition

Archivo Digital UPM

MIRACLE at ImageCLEFannot 2008: Classification of Image Features for Medical Image Annotation

Author: González Cristóbal José Carlos
Goñi Menoyo José Miguel
Lana Serrano Sara
Villena Román Julio
Publication venue: E.U.I.T. Telecomunicación (UPM)
Publication date: 01/01/2008
Field of study

This paper describes the participation of MIRACLE research consortium at the ImageCLEF Medical Image Annotation task of ImageCLEF 2008. A lot of effort was invested this year to develop our own image analysis system, based on MATLAB, to be used in our experiments. This system extracts a variety of global and local features including histogram, image statistics, Gabor features, fractal dimension, DCT and DWT coefficients, Tamura features and coocurrency matrix statistics. Then a k-Nearest Neighbour algorithm analyzes the extracted image feature vectors to determine the IRMA code associated to a given image. The focus of our experiments is mainly to test and evaluate this system in-depth and to make a comparison among diverse configuration parameters such as number of images for the relevance feedback to use in the classification module

Archivo Digital UPM

MIRACLE’s Naive Approach to Medical Images Annotation

Author: González Cristóbal José Carlos
Goñi Menoyo José Miguel
Martínez Fernández José Luis
Villena Román Julio
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2005
Field of study

One of the proposed tasks of the ImageCLEF 2005 campaign has been an Automatic Annotation Task. The objective is to provide the classification of a given set of 1,000 previously unseen medical (radiological) images according to 57 predefined categories covering different medical pathologies. 9,000 classified training images are given which can be used in any way to train a classifier. The Automatic Annotation task uses no textual information, but image-content information only. This paper describes our participation in the automatic annotation task of ImageCLEF 2005

Archivo Digital UPM