    Computational tools and spoken corpora design: an ongoing dialogue

    The design of an oral corpus, and the processes of recording, encoding and processing the materials to build a useful resource for linguistic analysis, prompt numerous theoretical and methodological decisions. This article focuses on those stages of corpus construction that are most clearly conditioned by the computational processing needed to make the corpus functional. To match initial expectations to the real possibilities of using the tool, each feature we intend to encode must be weighed against the workload and the means required to encode it. It is therefore essential to take into account the available processing and exploitation possibilities, as they have a crucial impact on decisions about the corpus's construction. Drawing on experience gained in building the ESLORA corpus, the article examines some of the problems that arise in designing an oral corpus, such as the delicacy (level of detail) with which oral phenomena are represented, the segmentation of discourse, the coexistence of several simultaneous tagging systems, and the particularities of annotation in a bilingual or multilingual context.

    This study was financed by the Agencia Estatal de Investigación (AEI, Spanish State Research Agency) and by the Fondo Europeo de Desarrollo Regional (FEDER, European Regional Development Fund) through the ESLORA+ project (FFI2017-86379-P). The authors are members of the research group Gramática del español (Spanish Grammar) at the University of Santiago de Compostela, which has been awarded a grant for the Strengthening and Organisation of Research Groups with Potential for Growth by the Regional Government's Education Department (ED431B 2017/39). The study has also benefited from the participation of the ESLORA project in the Red temática en estudios de Análisis del Discurso (FFI2017-90738-REDT).

    Large structured linguistic corpora: methodology and information retrieval systems

    Thesis defended on 12 February 2010 at the Faculty of Computer Science of the University of A Coruña. [Abstract] The recent evolution of the Internet has given access to an enormous volume of information, but that information is of little use without a precise way to find what is needed at a given moment. For this reason, almost in parallel with the growth of the Internet, information retrieval (IR) systems were developed to locate the relevant information in each case, giving rise to what we now know as search engines. One of the main problems of these systems, however, is that the information they use is generally very loosely structured, which limits what they can do: sections cannot be delimited within documents, search filters cannot be applied, and so on; in other words, the user can only enter a search expression, which the system tries to find across the whole document base. Because of these shortcomings, IR systems that require the information to be organised in some particular way were developed at the same time. These systems are not designed to search the Internet at large; they operate on a larger or smaller set of available information and offer more search options. These two lines of development, one using unstructured information and the other organised information, have led to the different search tools available today. On the one hand, there are Internet search engines, which locate documents that satisfy a specific query; on the other, there are systems that use structured information, which cover tasks such as retrieving customer data, invoicing, stock control, etc. Finally, there are even environments that combine these two approaches to varying degrees (data mining tools, prediction systems, etc.).
In this work we address a particular case of IR systems that use structured information: linguistic systems that work with large collections of documents (corpora), which places this doctoral thesis within computational linguistics and, more specifically, corpus linguistics. Although this field also offers a wide range of possibilities, we focus on systems in which the information that users, normally linguists, need is related to the frequency of occurrence of words or to the display of examples in context. The evolution of these systems has been practically simultaneous with the development of computing. From the first monolithic search tools, which used text collections now considered small, they have evolved, thanks to the growth in computing capacity, into today's network-based query systems that handle very large corpora. We focus on the latter, analysing the different possibilities and technologies currently available for developing them; in addition, we make a generic methodological proposal for the creation of corpora, which are the data foundation of these IR systems. We thus offer an overall view that covers both the construction of corpora and their subsequent exploitation, always bearing in mind the use of the most current standards. We also illustrate our generic proposals by applying them to the specific case of the Corpus de Referencia do Galego Actual (CORGA), developed at the Centro Ramón Piñeiro para a Investigación en Humanidades, which helps clarify how the abstract concepts take shape in a practical case.
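
As an illustration of the two kinds of query mentioned in the abstract above (word frequencies and examples shown in context), the following is a minimal Python sketch. It is not the CORGA system or the method proposed in the thesis; the function names `tokenize`, `word_frequencies` and `kwic`, the naive regular-expression tokenizer and the sample sentences are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (deliberately naive; an assumption)."""
    return re.findall(r"\w+", text.lower())


def word_frequencies(documents):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


def kwic(documents, keyword, window=3):
    """Return keyword-in-context hits as (doc_id, left, keyword, right) tuples.

    `window` is the number of tokens kept on each side of the keyword.
    """
    keyword = keyword.lower()
    hits = []
    for doc_id, doc in enumerate(documents):
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((doc_id, left, tok, right))
    return hits


if __name__ == "__main__":
    # Hypothetical two-document "corpus" used only for demonstration.
    sample = ["O corpus é grande", "Un corpus oral e un corpus escrito"]
    print(word_frequencies(sample).most_common(3))
    for doc_id, left, kw, right in kwic(sample, "corpus", window=2):
        print(f"{doc_id}: {left:>12} [{kw}] {right}")
```

A real corpus query system of the kind described would rely on precomputed indexes and structured metadata rather than scanning every document on each query; the linear scan here only keeps the sketch short.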

    Evaluation of a statistical automatic tagger for present-day Galician: Xiada

    In this paper we evaluate, from a linguistic point of view, a statistical automatic tagger developed jointly by the Centro Ramón Piñeiro para a Investigación en Humanidades and the COLE Group of the Universities of Vigo and A Coruña. The tagger is designed to tag the documents of the Corpus de Referencia do Galego Actual, with the aim of providing resources and tools for the computational linguistic analysis of present-day Galician.

    The risks in the practice of activities in nature: the accident rate in sports practices and preventive measures

    The natural environment that surrounds us is not only a source of life; through its resources it also provides leisure, recreation and education. The growing influx of different kinds of participants into natural environments in recent years has brought a proliferation of new trends, new sports and a diversity of activities. With it has come an increase in the risks and incidents that occur in these settings for new sports practices. In education, natural spaces are used for all kinds of sports activities, motivation being inherent in any physical or sporting practice carried out there. This work therefore aims to describe the objective risks present in the natural environment, what data exist on the accident rate in this environment, and what preventive measures can be taken to reduce those risks and make sports practice safer.

    Discovering Light: Fun Experiments with Optics

    Editor(s): Maria Viñas-Peña. Light is an element that draws together many areas of human knowledge: physics, chemistry, biology, astronomy, engineering, and art. Moreover, optical phenomena and the technologies based on them are widespread in our daily lives. However, it can be difficult to understand or explain these phenomena. What is light? Where are optics and photonics present in our lives and in nature? What lies behind different optical phenomena? What is an optical instrument? How does the eye resemble an optical instrument? How can we explain human vision? This book, written by a group of young scientists, answers these questions and many more to help you get to know the exciting world of optics and photonics. It is intended for the general public, with an emphasis on students at all levels of secondary education. A variety of easy-to-follow experiments related to different optical phenomena and technologies are presented. All of them are preceded by an explanation of the concepts and accompanied by numerous illustrations and curiosities. All of it is meant for you to have fun with optics and photonics! Peer reviewed.

    Multi-ancestry genome-wide association study of asthma exacerbations

    Other funding: European Regional Development Fund "ERDF A way of making Europe"; Allergopharma-EAACI award 2021; SysPharmPediA grant from the ERACoSysMed 1st Joint Transnational Call from the European Union under Horizon 2020; Sandler Family Foundation; American Asthma Foundation; RWJF Amos Medical Faculty Development Program; National Heart, Lung, and Blood Institute of the National Institutes of Health (R01HL117004, R01HL128439, R01HL135156, X01HL134589, R01HL141992, R01HL141845); National Institute of Health and Environmental Health Sciences (R01ES015794, R21ES24844); National Institute on Minority Health and Health Disparities (NIMHD) (P60MD006902, R01MD010443, R56MD013312); National Institute of General Medical Sciences (NIGMS) (RL5GM118984); Tobacco-Related Disease Research Program (24RT-0025, 27IR-0030); National Human Genome Research Institute (NHGRI) (U01HG009080); GlaxoSmithKline and Utrecht Institute for Pharmaceutical Sciences; Slovenian Research Agency (P3-0067); SysPharmPediA grant, co-financed by the Ministry of Education, Science and Sport Slovenia (MIZS) (C3330-16-500106); NHS Research Scotland; Wellcome Trust Biomedical Resource (099177/Z/12/Z); Genotyping National Centre (CeGEN) CeGen-PRB3-ISCIII (AC15/00015); UK Medical Research Council and Wellcome (102215/2/13/2); University of Bristol; Swedish Heart-Lung Foundation, Swedish Research Council; Region Stockholm (ALF project and database maintenance); NHS Chair of Pharmacogenetics via the UK Department of Health; Innovative Medicines Initiative (IMI) (115010); European Federation of Pharmaceutical Industries and Associations (EFPIA); Spanish National Cancer Research Centre; Fundación Canaria Instituto de Investigación Sanitaria de Canarias (PIFIISC19/17); Erasmus Medical Center; Erasmus University Rotterdam; Netherlands Organization for the Health Research and Development (ZonMw); the Research Institute for Diseases in the Elderly (RIDE); Ministry of Education, Culture and Science; Ministry for Health, Welfare and Sports; European Commission (DG XII); Municipality of Rotterdam; German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung, BMBF); U.S. National Institutes of Health (HL07966); European Social Fund "ESF Investing in your future"; Ministerio de Ciencia, Innovación y Universidades; Universidad de La Laguna (ULL); European Academy of Allergy and Clinical Immunology (EAACI); European Respiratory Society (ERS) (LTRF202101-00861); Ministry of Education, Science and Sport of the Republic of Slovenia (C3330-19-252012); Singapore Ministry of Education Academic Research Fund; Singapore Immunology Network (SIgN); National Medical Research Council (NMRC Singapore); Biomedical Research Council (BMRC Singapore); Agency for Science Technology and Research (A*STAR Singapore, N-154-000-038-001, R-154-000-191-112, R-154-000-404-112, R-154-000-553-112, R-154-000-565-112, R-154-000-630-112, R-154-000-A08-592, R-154-000-A27-597, R-154-000-A91-592, R-154-000-A95-592, R-154-000-B99-114, BMRC/01/1/21/18/077, BMRC/04/1/21/19/315, SIgN-06-006, SIgN-08-020, NMRC/1150/2008, H17/01/a0/008); Sime Darby Technology Centre; First Resources Ltd; Genting Plantation; Olam International; U.S. National Institutes of Health (HL138098).
Background: Asthma exacerbations are a serious public health concern due to high healthcare resource utilization, work/school productivity loss, impact on quality of life, and risk of mortality. The genetic basis of asthma exacerbations has been studied in several populations, but no prior study has performed a multi-ancestry meta-analysis of genome-wide association studies (meta-GWAS) for this trait. We aimed to identify common genetic loci associated with asthma exacerbations across diverse populations and to assess their functional role in regulating DNA methylation and gene expression.
Methods: A meta-GWAS of asthma exacerbations in 4989 Europeans, 2181 Hispanics/Latinos, 1250 Singaporean Chinese, and 972 African Americans analyzed 9.6 million genetic variants. Suggestively associated variants (p ≤ 5 × 10) were assessed for replication in 36,477 European and 1078 non-European asthma patients. Functional effects on DNA methylation were assessed in 595 Hispanic/Latino and African American asthma patients and in publicly available databases. The effect on gene expression was evaluated in silico. Results: One hundred and twenty-six independent variants were suggestively associated with asthma exacerbations in the discovery phase. Two variants independently replicated: rs12091010, located at vascular cell adhesion molecule-1/exostosin like glycosyltransferase-2 (VCAM1/EXTL2) (discovery: odds ratio (OR) = 0.82, p = 9.05 × 10 and replication: OR = 0.89, p = 5.35 × 10), and rs943126, from pantothenate kinase 1 (PANK1) (discovery: OR = 0.85, p = 3.10 × 10 and replication: OR = 0.89, p = 1.30 × 10). Both variants regulate the expression of the genes in which they are located and the DNA methylation levels of nearby genes in whole blood. Conclusions: This multi-ancestry study revealed novel suggestive regulatory loci for asthma exacerbations located in genomic regions participating in inflammation and host defense.

    Overview of recent TJ-II stellarator results

    The main results obtained in the TJ-II stellarator in the last two years are reported. The most important topics investigated have been modelling and validation of impurity transport, validation of gyrokinetic simulations, turbulence characterisation, effect of magnetic configuration on transport, fuelling with pellet injection, fast particles and liquid metal plasma facing components. As regards impurity transport research, a number of working lines exploring several recently discovered effects have been developed: the effect of tangential drifts on stellarator neoclassical transport, the impurity flux driven by electric fields tangent to magnetic surfaces, and attempts at experimental validation with Doppler reflectometry of the variation of the radial electric field on the flux surface. Concerning gyrokinetic simulations, two validation activities have been performed: the comparison with measurements of zonal flow relaxation in pellet-induced fast transients and the comparison with the experimental poloidal variation of fluctuation amplitude. The impact of radial electric fields on turbulence spreading in the edge and scrape-off layer has also been experimentally characterized using a 2D Langmuir probe array. Another remarkable piece of work has been the investigation of the radial propagation of small temperature perturbations using transfer entropy. Research on the physics and modelling of plasma core fuelling with pellet and tracer-encapsulated solid-pellet injection has also produced relevant results. Neutral beam injection driven Alfvénic activity and its possible control by electron cyclotron current drive have been examined as well in TJ-II. Finally, recent results on alternative plasma facing components based on liquid metals are also presented.
This work has been carried out within the framework of the EUROfusion Consortium and has received funding from the Euratom research and training programme 2014–2018 under Grant Agreement No. 633053. It has been partially funded by the Ministerio de Ciencia, Innovación y Universidades of Spain under projects ENE2013-48109-P, ENE2015-70142-P and FIS2017-88892-P. It has also received funds from the Spanish Government via mobility grant PRX17/00425. The authors thankfully acknowledge the computer resources at MareNostrum and the technical support provided by the Barcelona S.C. It has been supported as well by The Science and Technology Center in Ukraine (STCU), Project P-507F.