105 research outputs found

    Developing competitive HMM PoS taggers using small training corpora

    This paper presents a study aiming to find the best strategy for developing a fast and accurate HMM tagger when only a limited amount of training material is available. This is a crucial factor for languages for which annotated material is not easily available. First, we carry out experiments in English, using the WSJ corpus as a test bench to establish the differences caused by using a large or a small training set. Then, we port the results to develop an accurate Spanish PoS tagger using a limited amount of training data. Different configurations of an HMM tagger are studied: trigram and 4-gram models are tested, as well as different smoothing techniques. The performance of each configuration is measured as a function of the size of the training corpus in order to determine the most appropriate setting for developing HMM PoS taggers for languages with only a small corpus available. Postprint (published version)
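The abstract does not say which smoothing techniques were compared. As one illustration only, the sketch below estimates trigram tag-transition probabilities for an HMM tagger by linearly interpolating unigram, bigram and trigram relative frequencies, a common smoothing choice when training data are scarce. The function name, the `<s>` padding tag and the fixed λ weights are hypothetical; in practice the weights would be tuned on held-out data (for example by deleted interpolation).

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

START = "<s>"  # hypothetical padding tag for the two positions before a sentence


def train_interpolated_trigram(
    tag_sequences: Sequence[Sequence[str]],
    lambdas: Tuple[float, float, float] = (0.1, 0.3, 0.6),
) -> Tuple[Callable[[str, str, str], float], List[str]]:
    """Estimate p(t3 | t1, t2) by linearly interpolating unigram, bigram and
    trigram relative frequencies of tags (illustrative fixed weights)."""
    l1, l2, l3 = lambdas
    uni, bi, tri = Counter(), Counter(), Counter()
    bi_ctx, tri_ctx = Counter(), Counter()
    for tags in tag_sequences:
        padded = [START, START] + list(tags)
        for i in range(2, len(padded)):
            t1, t2, t3 = padded[i - 2], padded[i - 1], padded[i]
            uni[t3] += 1
            bi[(t2, t3)] += 1
            bi_ctx[t2] += 1
            tri[(t1, t2, t3)] += 1
            tri_ctx[(t1, t2)] += 1
    total = sum(uni.values())

    def prob(t1: str, t2: str, t3: str) -> float:
        # Only components whose conditioning context was seen in training
        # contribute; their weights are renormalised so the result stays a
        # probability distribution over tags.
        terms = [(l1, uni[t3] / total)]
        if bi_ctx[t2]:
            terms.append((l2, bi[(t2, t3)] / bi_ctx[t2]))
        if tri_ctx[(t1, t2)]:
            terms.append((l3, tri[(t1, t2, t3)] / tri_ctx[(t1, t2)]))
        weight = sum(w for w, _ in terms)
        return sum(w * p for w, p in terms) / weight

    return prob, sorted(uni)


# Toy tagged corpus (hypothetical); a real run would use e.g. WSJ tag sequences.
prob, tagset = train_interpolated_trigram([["DT", "NN", "VB"], ["DT", "JJ", "NN"]])
print({t: round(prob(START, "DT", t), 3) for t in tagset})
```

A 4-gram configuration would add a fourth term conditioned on the three previous tags. With a small training set many higher-order contexts are never observed, so the lower-order terms end up carrying most of the estimate.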

    CARPANTA eats words you don't need from e-mail

    We present CARPANTA, an e-mail summarization system that applies a knowledge-intensive approach to obtain highly coherent summaries. Robustness and portability are guaranteed by the use of general-purpose NLP tools, but the system also exploits language- and domain-dependent knowledge. CARPANTA is evaluated against a corpus of summaries produced by human judges, reaching satisfactory levels of performance. This research has been conducted thanks to a grant associated with the X-TRACT project, PB98-1226 of the Spanish Research Department. It has also been partially funded by projects HERMES (TIC2000-0335-C03-02), PETRA (TIC2000-1735-C02-02) and by CLiC (Centre de Llenguatge i Computació).

    miR-21, miR-99b and miR-375 combination as predictive response signature for preoperative chemoradiotherapy in rectal cancer.

    INTRODUCTION: Preoperative chemoradiotherapy (CRT) is a standard treatment for patients with locally advanced rectal cancer. Despite its benefits, CRT in non-responders can be associated with increased toxicity and delayed surgical resection. The identification of CRT response biomarkers, such as microRNAs, could improve the management of these patients. We studied microRNA expression in pretreatment endoscopy biopsies from rectal cancer patients treated with CRT to identify microRNAs able to predict CRT response and clinical outcome. MATERIAL AND METHODS: RNA from pretreatment endoscopy biopsies of 96 rectal cancer patients treated with preoperative CRT was studied. Pathological response was graded according to the Dworak tumor regression grade (TRG) classification. In the screening phase, 377 miRNAs were studied in 12 patients with extreme responses (TRG0-1 vs TRG4). The miRNAs identified in the screening phase were then validated in the whole cohort as potential predictive biomarkers of CRT response, disease-free survival (DFS) and overall survival (OS). RESULTS: In the screening phase, an 8-miRNA CRT-response signature was identified: let-7b, let-7e, miR-21, miR-99b, miR-183, miR-328, miR-375 and miR-483-5p. In the validation phase, miR-21, miR-99b and miR-375 emerged as CRT response-related miRNAs, while miR-328 and let-7e emerged as prognostic markers for DFS and OS. ROC curve analysis showed that the combination of miR-21, miR-99b and miR-375 had the best capacity to distinguish patients with maximum response (TRG4) from the others. CONCLUSIONS: miR-21, miR-99b and miR-375 could add valuable information for individualizing treatment in patients with locally advanced rectal cancer.

    Recommendation system for inclusive language use (Sistema de recomendación para un uso inclusivo del lenguaje)

    A system that processes text written in Spanish and detects non-inclusive uses of language. For each suspicious noun phrase, the system proposes a set of alternatives. The system also supports the automatic acquisition of positive examples from documents that use inclusive language; these examples are then presented, together with their context, as suggestions. Peer Reviewed. Postprint (published version)

    Short-term effects of air pollution on mortality: results of the EMECAM project in Cartagena, 1992-96

    Background: Air pollution problems have been noticeable in Cartagena since the 1970s, with occasional episodes of high SO2 and particle levels. Our aim is to assess, using the EMECAM methodology, the acute effects of SO2 and particle air pollution on daily mortality in the city of Cartagena from 1992 to 1996. Methods: Daily deaths from all causes except external causes, in the general population and in those aged 70 and over, and daily cardiovascular and respiratory deaths were related to sulfur dioxide and particle air pollution for the 1992-1996 period, using autoregressive Poisson models controlling for seasonality, weather, calendar effects, influenza, special events and time lags. Results: During the study period, SO2 pollution decreased compared with previous years, whereas no such decrease was evident for particles. The analyses reveal significant associations between total non-accidental deaths in those over 69 and the mean particle level, and between particles and cardiovascular deaths from May to October. In the cold half of the year, we found a statistically significant positive association between the daily maximum hourly particle value and deaths from cardiocirculatory and respiratory diseases. However, the associations are not consistent when the reliability of the models is assessed. This work was supported by a grant from the Fondo de Investigaciones Sanitarias (file no. 971005 l-09).

    Lincp21-RNA as Predictive Response Marker for Preoperative Chemoradiotherapy in Rectal Cancer

    Preoperative chemoradiotherapy (CRT) is a standard treatment for locally advanced rectal cancer (RC), but its use in non-responders can be associated with increased toxicity and delayed resection. LincRNA-p21 is a long non-coding RNA involved in the p53 pathway and in the regulation of angiogenesis. We aimed to study whether lincRNA-p21 expression levels can act as a predictive biomarker of response to neoadjuvant CRT. We analyzed RNA from pretreatment biopsies of 70 RC patients treated with preoperative CRT. Pathological response was classified according to the Dworak tumor regression grade (TRG) classification. LincRNA-p21 expression was determined by RT-qPCR. LincRNA-p21 was upregulated in stage III tumors (p = 0.007) and in tumors with the worst response in terms of TRG (p = 0.027) and downstaging (p = 0.016). ROC curve analysis showed that lincRNA-p21 expression could distinguish a complete response from the others (AUC = 0.696; p = 0.014). LincRNA-p21 was an independent marker of response to preoperative CRT (p = 0.047) and of time to relapse (TTR) (p = 0.048). In conclusion, lincRNA-p21 is a marker of advanced disease, worse response to neoadjuvant CRT, and shorter TTR in patients with locally advanced RC. Measuring lincRNA-p21 may be of value in individualizing preoperative CRT in RC.
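The reported AUC summarizes how well a single marker separates two groups. As a minimal sketch (not the authors' analysis code), the area under the ROC curve equals the Mann-Whitney probability that a randomly chosen case from the positive group scores higher than a randomly chosen case from the other group, with ties counting one half. The function name and the example values are hypothetical and are not data from the study.

```python
from typing import Sequence


def roc_auc(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic."""
    if not positives or not negatives:
        raise ValueError("both groups need at least one observation")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # pair ordered correctly
            elif p == n:
                wins += 0.5  # tie counts one half
    return wins / (len(positives) * len(negatives))


# Hypothetical marker values; not data from the study.
complete_response = [0.8, 1.1, 0.9]
other_responses = [1.4, 1.0, 2.3, 1.7]
# Lower expression is taken to point towards complete response here,
# so scores are negated before computing the AUC.
print(roc_auc([-x for x in complete_response], [-x for x in other_responses]))
# 11 of 12 pairs are ordered correctly, giving an AUC of about 0.917
```

Because lincRNA-p21 was higher in worse responders, negating the values (or swapping the two groups) makes higher scores point towards the group of interest. The pairwise loop is quadratic, which is unproblematic for cohorts of this size.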

    Pharmacogenetic clinical randomised phase II trial to evaluate the efficacy and safety of FOLFIRI with high-dose irinotecan (HD-FOLFIRI) in metastatic colorectal cancer patients according to their UGT1A1 genotype

    Patients harbouring the UGT1A1 *28/*28 genotype are at risk of severe toxicity with the standard irinotecan dose. However, this dose is considerably lower than the dose that can be tolerated by UGT1A1 *1/*1 and *1/*28 patients. This randomised phase II trial evaluated the efficacy and safety of the FOLFIRI regimen with high-dose irinotecan (HD-FOLFIRI) in metastatic colorectal cancer patients. Eighty-two patients with the UGT1A1 *1/*1 or *1/*28 genotype were randomised to receive HD-FOLFIRI or FOLFIRI; patients with the UGT1A1 *28/*28 genotype were excluded. In the experimental group, the irinotecan dose was 300 mg/m² for UGT1A1 *1/*1 patients and 260 mg/m² for *1/*28 patients. In the control group, the dose was 180 mg/m². We analysed the overall response rate (ORR), toxicity, and survival. The ORR was significantly higher in the HD-FOLFIRI group (67.5% versus 43.6%; p = 0.001; OR = 1.73 [95% CI: 1.03-2.93]). Neutropenia (17.7%), diarrhoea (5.1%), and asthenia (5.1%) were the most common grade 3-4 toxicities. No differences were observed in severe toxicity (22.5% versus 20.5%), dose reduction (22.5% versus 28.2%), or prophylactic G-CSF use (17.5% versus 12.8%). No difference in survival was found. Patients with the UGT1A1 *1/*1 and *1/*28 genotypes can receive high doses of irinotecan to achieve a more favourable ORR without significant adverse events.

    Advanced semantic textual processing for the detection of diagnostic codes, procedures, concepts and their relationships in health records

    The main aim of this project is to develop a set of processors for the automatic analysis of medical texts, making a wide and flexible set of tools and linguistic and semantic resources available to the scientific and business communities for the following tasks: morphological, syntactic and semantic analysis adapted to medical texts; assignment of diagnosis and procedure codes to medical reports following the ICD-10 standard; and detection of relationships between concepts. The project will develop tools for Spanish, given its wide use in the health systems of many countries, and will also tackle other languages with different characteristics, such as Catalan and Basque. This contribution was funded by MINECO (TIN2016-77820-C3-1-R, TIN2016-77820-C3-2-R, TIN2016-77820-C3-3-R and AEI/FEDER, UE).

    Trends in socioeconomic inequalities in mortality in small areas of 33 Spanish cities

    Background: In Spain, several ecological studies have analyzed trends over time in socioeconomic inequalities in all-cause mortality in urban areas. However, their results are quite heterogeneous, although in general they find that inequalities decreased or remained stable. The objectives of this study are therefore: (1) to identify trends in geographical inequalities in all-cause mortality in the census tracts of 33 Spanish cities between the periods 1996–1998 and 2005–2007; (2) to analyse trends in the relationship between these geographical inequalities and socioeconomic deprivation; and (3) to obtain an overall measure summarising the relationship found in each city and to analyse its variation over time. Methods: Ecological study of trends with two cross-sectional cuts, corresponding to two periods of analysis: 1996–1998 and 2005–2007. The units of analysis were the census tracts of the 33 Spanish cities. A deprivation index calculated for each census tract in all cities was included as a covariate. A Bayesian hierarchical model was used to estimate smoothed Standardized Mortality Ratios (sSMR) for each census tract and period. The geographical distribution of these sSMR was represented using septile maps. In addition, two different Bayesian hierarchical models were used to measure the association between all-cause mortality and the deprivation index in each city and period, and by sex: (1) including the association as a fixed effect for each city; (2) including the association as random effects. In both models, the spatial structure of the data can be controlled within each city. The association in each city was measured using relative risks (RR) and their 95% credible intervals (95% CI). Results: For most cities and in both sexes, mortality rates decline over time. For women, the mortality and deprivation patterns are similar in the first period, while in the second they differ for most cities.
For men, RRs remain stable over time in 29 cities, decrease in 3 and increase in 1. For women, a non-significant change in RR over time is observed in 30 cities, whereas in 4 cities the RR decreases. Overall, inequalities decrease (with a probability of 0.9) in both men (RR = 1.13, 95% CI = 1.12–1.15 in the 1st period; RR = 1.11, 95% CI = 1.09–1.13 in the 2nd period) and women (RR = 1.07, 95% CI = 1.05–1.08 in the 1st period; RR = 1.04, 95% CI = 1.02–1.06 in the 2nd period). Conclusions: Further trend studies are needed to monitor socioeconomic inequalities in mortality and to identify, among other things, temporal factors that may influence these inequalities. This article was partially funded by Plan Nacional de I + D + I 2008–2011 and Instituto de Salud Carlos III (ISCIII, Subdirección General de Evaluación y Fomento de la Investigación) (Award numbers: PI081488, PI081978, PI080367, PI08/1017, PI080330, P08/0142, PI081785, PI080662, PI081713, PI081058, PI081340, PI080803, PI126/08), Fundación Canaria de Investigación Sanitaria FUNCIS 84/07 and by CIBER Epidemiología y Salud Pública (CIBERESP).
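The smoothed SMRs in this study come from a Bayesian hierarchical model that is not reproduced here. The quantity being smoothed is the ordinary SMR: observed deaths in a census tract divided by the deaths expected if reference age-specific rates applied to that tract's population (indirect standardization). The sketch below computes only that raw ratio; the function names, age bands, rates and counts are hypothetical, and the abstract does not state which reference rates were used.

```python
from typing import Mapping


def expected_deaths(population: Mapping[str, float], reference_rates: Mapping[str, float]) -> float:
    """Deaths expected in a tract if the reference age-specific rates applied."""
    return sum(pop * reference_rates[stratum] for stratum, pop in population.items())


def smr(observed: int, population: Mapping[str, float], reference_rates: Mapping[str, float]) -> float:
    """Raw (unsmoothed) standardized mortality ratio: observed / expected."""
    expected = expected_deaths(population, reference_rates)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


# Hypothetical tract: person-years at risk by age band and reference death rates.
tract_population = {"0-44": 2000, "45-74": 1000, "75+": 200}
reference_rates = {"0-44": 0.001, "45-74": 0.01, "75+": 0.08}
print(round(smr(35, tract_population, reference_rates), 2))  # 35 / 28 = 1.25
```

An SMR above 1 means more deaths than the reference rates predict. With small tract populations these raw ratios are noisy, which is why the study smooths them with a hierarchical model before mapping.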

    A combined analysis of the short-term effects of photochemical air pollutants on mortality within the EMECAM project.

    In recent years, some epidemiologic studies have attributed adverse health effects of air pollution not only to particles and sulfur dioxide but also to photochemical air pollutants (nitrogen dioxide and ozone). The effects are usually small, which has led to some inconsistencies among study results. Furthermore, the different methodologic approaches used by the studies have made it difficult to draw general conclusions. We provide here a quantitative summary of the short-term effects of photochemical air pollutants on mortality in seven Spanish cities involved in the EMECAM project, using generalized additive models in single- and multipollutant analyses. Nitrogen dioxide and ozone data were provided by seven EMECAM cities (Barcelona, Gijón, Huelva, Madrid, Oviedo, Seville, and Valencia). Mortality indicators included daily total mortality from all causes excluding external causes, daily cardiovascular mortality, and daily respiratory mortality. Individual estimates, obtained from city-specific generalized additive Poisson autoregressive models, were combined by means of fixed-effects models and, if significant heterogeneity among local estimates was found, also by random-effects models. Significant positive associations were found between daily mortality (all causes and cardiovascular) and NO2 once the other air pollutants were taken into account. A 10 µg/m³ increase in the 24-hour average 1-day NO2 level was associated with an increase in the daily number of deaths of 0.43% [95% confidence interval (CI), -0.003 to 0.86%] for all causes excluding external. Where relationships were significant, relative risks for cause-specific mortality were nearly twice as large as those for total mortality for all the photochemical pollutants. Ozone was independently related only to daily cardiovascular mortality. No independent statistically significant relationship between photochemical air pollutants and respiratory mortality was found.
The results of this study suggest that, at present levels of photochemical pollutants, people living in Spanish cities are exposed to health risks derived from air pollution.
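The pooling of city-specific estimates described above can be sketched as follows. This is a minimal illustration, not the EMECAM code: each city contributes a Poisson regression coefficient (log relative risk per µg/m³) with its standard error, fixed-effects pooling weights the coefficients by inverse variance, and Cochran's Q measures heterogeneity. The abstract does not name a random-effects estimator; the DerSimonian-Laird method used here is an assumption. A coefficient β converts to a percentage change per 10 µg/m³ as 100·(exp(10β) - 1). All function names and numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pool_fixed_effects(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Inverse-variance (fixed-effects) pooling; returns (beta, se, Cochran's Q)."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, math.sqrt(1.0 / w_sum), q


def pool_random_effects(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """DerSimonian-Laird random-effects pooling (assumed); returns (beta, se, tau^2)."""
    _, _, q = pool_fixed_effects(betas, ses)
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    c = w_sum - sum(w ** 2 for w in weights) / w_sum
    # Between-city variance, truncated at zero when Q <= k - 1.
    tau2 = max(0.0, (q - (len(betas) - 1)) / c) if c > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    re_sum = sum(re_weights)
    beta = sum(w * b for w, b in zip(re_weights, betas)) / re_sum
    return beta, math.sqrt(1.0 / re_sum), tau2


def percent_change(beta: float, delta: float = 10.0) -> float:
    """Percent change in daily deaths for a `delta` increase in the pollutant."""
    return 100.0 * (math.exp(beta * delta) - 1.0)


# Hypothetical city-specific coefficients (per µg/m³) and standard errors.
betas = [0.0002, 0.0006, 0.0004]
ses = [0.0003, 0.0002, 0.0004]
beta, se, q = pool_fixed_effects(betas, ses)
low, high = percent_change(beta - 1.96 * se), percent_change(beta + 1.96 * se)
print(f"{percent_change(beta):.2f}% (95% CI {low:.2f} to {high:.2f}%), Q = {q:.2f}")
```

When Q does not exceed its degrees of freedom, τ² is truncated to zero and the random-effects estimate coincides with the fixed-effects one, which matches the strategy of reporting random-effects results only when heterogeneity is found.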