    The Coruña Corpus Tool: Ten Years On

    In this paper we provide a brief introduction to a new version of the Coruña Corpus Tool. Currently available for Windows, macOS and Linux, the Coruña Corpus Tool is a corpus management tool that facilitates the retrieval of information from an indexed textual repository. Although it works like most concordance programs, its distinguishing feature is that it allows users to search for old or non-standard characters and tags in texts and metadata files, as well as to extract and export specific data for research purposes. With a new set of advanced search features and other recent improvements, researchers now have access to functionalities that significantly enhance the previous user experience.

    The research reported here has been funded by the Spanish Ministry of the Economy, Industry and Competitiveness (MINECO), grant number FFI2016-75599-P. This grant is hereby gratefully acknowledged. The second author also acknowledges the support of the Spanish Ministry of Science, Innovation and Universities, grant number FPU014/01724.

    Distributed multivariate regression with unknown noise covariance in the presence of outliers: an MDL approach

    We consider the problem of estimating the coefficients in a multivariable linear model by means of a wireless sensor network which may be affected by anomalous measurements. The noise covariance matrices at the different sensors are assumed unknown. Treating outlying samples, and their support, as additional nuisance parameters, the Maximum Likelihood estimate is investigated, with the number of outliers being estimated according to the Minimum Description Length principle. A distributed implementation based on iterative consensus techniques is then proposed, and it is shown to be effective for managing outliers in the data.
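As a rough illustration of the "iterative consensus" idea mentioned in the abstract, the sketch below runs plain average consensus on a small ring of sensors. It is not the authors' algorithm: the function name `average_consensus`, the ring topology, the step size, and the local values are all assumptions chosen for the example. Each node repeatedly moves its value toward those of its neighbors, and with a symmetric network and a small enough step all nodes converge to the network-wide mean.

```python
def average_consensus(values, neighbors, step, iterations):
    """Generic average consensus (illustrative, not the paper's method).

    values:     initial local value held by each node
    neighbors:  dict mapping node index -> list of neighbor indices (symmetric)
    step:       update weight; should be below 1 / (maximum node degree)
    iterations: number of synchronous update rounds
    """
    x = list(values)
    for _ in range(iterations):
        # Every node updates at once using only its neighbors' current values.
        x = [
            xi + step * sum(x[j] - xi for j in neighbors[i])
            for i, xi in enumerate(x)
        ]
    return x


# Hypothetical 4-node ring; the last node holds an unusually large value.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
local = [1.0, 2.0, 3.0, 10.0]
result = average_consensus(local, ring, step=0.25, iterations=50)
print(result)  # every entry approaches the mean of `local`
```

Because the update is symmetric, the sum of the node values is preserved at every round, which is why the common limit is the average of the initial values.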

    Information retrieval models for recommender systems

    Programa Oficial de Doutoramento en Computación. 5009V01

    [Abstract] Information retrieval addresses the information needs of users by delivering relevant pieces of information but requires users to convey their information needs explicitly. In contrast, recommender systems offer personalized suggestions of items automatically. Ultimately, both fields help users cope with information overload by providing them with relevant items of information. This thesis aims to explore the connections between information retrieval and recommender systems. Our objective is to devise recommendation models inspired by information retrieval techniques. We begin by borrowing ideas from the information retrieval evaluation literature to analyze evaluation metrics in recommender systems. Second, we study the applicability of pseudo-relevance feedback models to different recommendation tasks. We investigate the conventional top-N recommendation task, but we also explore the recently formulated user-item group formation problem and propose a novel task based on the liquidation of long tail items. Third, we exploit ad hoc retrieval models to compute neighborhoods in a collaborative filtering scenario. Fourth, we explore the opposite direction by adapting an effective recommendation framework to pseudo-relevance feedback. Finally, we discuss the results and present our conclusions. In summary, this doctoral thesis adapts a series of information retrieval models to recommender systems. Our investigation shows that many retrieval models can be adapted to deal with different recommendation tasks. Moreover, we find that taking the opposite path is also possible. Exhaustive experimentation confirms that the proposed models are competitive. Finally, we also perform a theoretical analysis of some models to explain their effectiveness.

    Priors for Diversity and Novelty on Neural Recommender Systems

    [Abstract] PRIN is a neural-based recommendation method that allows the incorporation of item prior information into the recommendation process. In this work we study how the system behaves in terms of novelty and diversity under different configurations of item prior probability estimates. Our results show the versatility of the framework and how its behavior can be adapted to the desired properties: favoring accuracy, favoring diversity and novelty, or striking a balance between them through a suitable selection of prior estimates.

    Funding: Ministerio de Ciencia, Innovación y Universidades; RTI2018-093336-B-C22. Xunta de Galicia; GPC ED431B 2019/03. Xunta de Galicia; ED431G/01. Ministerio de Ciencia, Innovación y Universidades; FPU17/03210. Ministerio de Ciencia, Innovación y Universidades; FPU014/0172

    Building High-Quality Datasets for Information Retrieval Evaluation at a Reduced Cost

    [Abstract] Information Retrieval is no longer exclusively about document ranking. New tasks are continuously proposed in this and sibling fields. With this proliferation of tasks, it becomes crucial to have a cheap way of constructing test collections to evaluate the new developments. Building test collections is time and resource consuming: it takes time to obtain the documents and to define the user needs, and it requires the assessors to judge a large number of documents. To reduce the latter, pooling strategies aim to decrease the assessment effort by presenting to the assessors a sample of documents from the corpus that contains as many relevant documents as possible. In this paper, we propose the preliminary design of different techniques to easily and cheaply build high-quality test collections without the need for participant systems.

    Funding: Ministerio de Ciencia, Innovación y Universidades; RTI2018-093336-B-C22. Xunta de Galicia; GPC ED431B 2019/03. Xunta de Galicia; ED431G/0
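For readers unfamiliar with pooling, the sketch below shows conventional depth-k pooling, in which the documents sent to assessors are the union of the top k results of each participant run. This is the standard baseline that depends on participant systems, not one of the techniques proposed in the paper; the function name `depth_k_pool`, the run names, and the document identifiers are hypothetical.

```python
def depth_k_pool(runs, k):
    """Return the set of documents to be judged under depth-k pooling.

    runs: dict mapping a run name -> ranked list of document ids (best first)
    k:    pool depth, i.e. how many top documents are taken from each run
    """
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:k])  # documents shared between runs are judged once
    return pool


# Two hypothetical participant runs for a single topic.
runs = {
    "run_a": ["d1", "d2", "d3"],
    "run_b": ["d2", "d4", "d5"],
}
pool = depth_k_pool(runs, k=2)
print(sorted(pool))  # ['d1', 'd2', 'd4']
```

The assessment effort is the size of the pool rather than k times the number of runs, since documents retrieved by several runs are only judged once.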

    Teaching in Information Access Systems: plagiarism detection, use of advanced technologies for software development, and bringing industry experience into the classroom

    [Abstract] This paper presents the activities carried out by the educational innovation group in Information Access Systems during the academic year 2017/2018. This group, which teaches at the Faculty of Informatics of the University of A Coruña, carried out actions along three different lines. The first action was designed to improve the quality of the evaluation methods, and consisted of following a protocol for detecting plagiarism in programming assignments. The second activity aimed to improve the employability of the students and consisted of using a project-based learning methodology along with a series of advanced tools for software development, recreating the work that the students will carry out when they enter the job market. Lastly, to increase the students' knowledge of the professional alternatives available to them, a series of seminars and talks was organized, given by professionals from an international company, a local multidisciplinary company, and a researcher from academia. The experience obtained from the different activities was satisfactory and enriching for both students and teachers, who are already considering improvements for the coming academic years.

    ArcDrain: A GIS Add-In for Automated Determination of Surface Runoff in Urban Catchments

    ABSTRACT: Surface runoff determination in urban areas is crucial to facilitate ex ante water planning, especially in the context of climate and land cover changes, which are increasing the frequency of floods due to a combination of violent storms and increased imperviousness. To this end, the spatial identification of urban areas prone to runoff accumulation is essential to guarantee effective water management in the future. Under these premises, this work sought to produce a tool for automated determination of urban surface runoff using a geographic information system (GIS). This tool, which was designed as an ArcGIS add-in called ArcDrain, consists of the discretization of urban areas into subcatchments and the subsequent application of the rational method for runoff depth estimation. The formulation of this method directly depends on land cover type and soil permeability, thereby enabling the identification of areas with a low infiltration capacity. ArcDrain was tested using the city of Santander (northern Spain) as a case study. The results achieved demonstrated the accuracy of the tool for detecting high runoff rates and how the inclusion of mitigation measures in the form of sustainable drainage systems (SuDS) and green infrastructure (GI) can help reduce flood hazards in critical zones.

    This research was funded by the Spanish Ministry of Science, Innovation, and Universities, with funds from the State General Budget (PGE) and the European Regional Development Fund (ERDF), grant number RTI2018-094217-B-C32 (MCIU/AEI/FEDER, UE).
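The rational method referred to above is commonly written as Q = C · i · A, where C is a dimensionless runoff coefficient, i the rainfall intensity and A the drained area. The sketch below applies the textbook form to a single subcatchment with an area-weighted coefficient. It is not ArcDrain's implementation: the function names, the land cover classes, and the coefficient values are illustrative assumptions, and real coefficients depend on land cover and soil permeability as the abstract notes.

```python
def composite_runoff_coefficient(cover_areas_ha, coefficients):
    """Area-weighted runoff coefficient for one subcatchment.

    cover_areas_ha: dict mapping land cover class -> area in hectares
    coefficients:   dict mapping land cover class -> runoff coefficient (0 to 1)
    """
    total_area = sum(cover_areas_ha.values())
    weighted = sum(coefficients[c] * a for c, a in cover_areas_ha.items())
    return weighted / total_area


def peak_discharge_m3s(c, intensity_mm_h, area_ha):
    """Rational method: Q = C * i * A / 360, with i in mm/h and A in hectares.

    The factor 360 converts (mm/h * ha) into cubic metres per second.
    """
    return c * intensity_mm_h * area_ha / 360.0


# Hypothetical coefficients and subcatchment, for illustration only.
coefficients = {"asphalt": 0.9, "lawn": 0.2}
subcatchment = {"asphalt": 3.0, "lawn": 1.0}  # hectares

c = composite_runoff_coefficient(subcatchment, coefficients)
q = peak_discharge_m3s(c, intensity_mm_h=36.0, area_ha=sum(subcatchment.values()))
print(round(c, 3), round(q, 3))  # 0.725 0.29
```

Replacing part of the asphalt area with a more permeable cover lowers C and therefore Q, which is the mechanism by which SuDS and green infrastructure reduce runoff in this formulation.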

    Spatial Statistical Modeling of Rockfall Hazard in a Mountainous Road in Cantabria (Spain)

    Rockfall events are one of the most frequent types of mass wasting in mountainous areas, causing service and traffic disruption, as well as damage to infrastructure and harm to people. Hence, having accurate tools to model these hazards becomes crucial to prevent fatalities, especially in a context of climate change whereby the effects of these phenomena might be exacerbated. Under this premise, this article developed a framework for assessing rockfall hazard in mountainous areas. First, a set of factors expected to favor rockfalls were processed and aggregated using spatial analysis tools, yielding a series of hazard maps with which to fit observed data through statistical modeling. The validation process was undertaken with the support of a database containing the number of rocks removed from a mountainous road section located in Cantabria, northern Spain. The results achieved, which demonstrated the accuracy of the proposed approach in reproducing rockfall hazard using frequency data, highlighted the primary role played by factors such as slope, runoff threshold, and precipitation in explaining the occurrence of these events. The effects of climate change were considerably influenced by the fluctuations in the projections of precipitation, which limited the variations in the spatial distribution and magnitude of rockfall hazard.

    This work was supported in part by the Spanish Ministry of Science, Innovation, and Universities, in part by the State General Budget (PGE), and in part by the European Regional Development Fund (ERDF) under Grant RTI2018-094217-B-C32 (MCIU/AEI/FEDER, UE). The work of Alejandro Roldan-Valcarce was supported by the Spanish Ministry of Science, Innovation and Universities through a Researcher Formation Fellowship under Grant PRE2019-08945