
    Domain-matched Pre-training Tasks for Dense Retrieval

    Pre-training on larger datasets with ever-increasing model size is now a proven recipe for improving performance across almost all NLP tasks. A notable exception is information retrieval, where additional pre-training has so far failed to produce convincing results. We show that, with the right pre-training setup, this barrier can be overcome. We demonstrate this by pre-training large bi-encoder models on 1) a recently released set of 65 million synthetically generated questions, and 2) 200 million post-comment pairs from a preexisting dataset of Reddit conversations. We evaluate on a set of information retrieval and dialogue retrieval benchmarks, showing substantial improvements over supervised baselines.
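    The abstract does not state the training objective. As a rough illustration of how a bi-encoder is typically used, the minimal sketch below scores passages by the dot product of independently computed query and passage embeddings, and computes an in-batch-negative contrastive loss of the kind commonly used to train such models. The vectors are hypothetical stand-ins for encoder outputs, and the loss is an assumption about the general technique, not the authors' stated setup.

```python
import math

def dot(u, v):
    """Inner product of two equal-length embedding vectors."""
    return sum(a * b for a, b in zip(u, v))

def retrieve(query_vec, passage_vecs, k=2):
    """Rank passages by dot-product similarity to the query; return the top-k indices."""
    scores = [dot(query_vec, p) for p in passage_vecs]
    ranked = sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def in_batch_contrastive_loss(query_vecs, passage_vecs):
    """Mean cross-entropy where passage i is the positive for query i and every
    other passage in the batch serves as a negative (assumed objective)."""
    total = 0.0
    for i, q in enumerate(query_vecs):
        logits = [dot(q, p) for p in passage_vecs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(query_vecs)

# Hypothetical toy embeddings standing in for encoder outputs.
queries = [[1.0, 0.0], [0.0, 1.0]]
passages = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
print(retrieve(queries[0], passages))  # [0, 2]
```

    Because queries and passages are encoded separately, passage embeddings can be precomputed and indexed once, which is what makes bi-encoders practical for large-scale retrieval.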

    Poor nutritional status of schoolchildren in urban and peri-urban areas of Ouagadougou (Burkina Faso)

    Background: Malnutrition is still highly prevalent in developing countries. Schoolchildren, and not only children under five, may also be at high nutritional risk. However, their nutritional status is poorly documented, particularly in urban areas, and this paucity of information hinders the development of relevant nutrition programs for schoolchildren. The aim of this study, carried out in Ouagadougou, was to assess the nutritional status of schoolchildren attending public and private schools. Methods: The study was carried out to provide baseline data for the implementation and evaluation of the WHO Nutrition Friendly School Initiative. Six intervention schools and six matched control schools were selected, and a sample of 649 schoolchildren (48% boys) aged 7-14 years from 8 public and 4 private schools was studied. Anthropometric and haemoglobin measurements, along with thyroid palpation, were performed. Serum retinol was measured in a random sub-sample of children (N = 173). WHO criteria were used to assess nutritional status. The chi-square test and the independent t-test were used to compare proportions and means between groups. Results: The mean age of the children was 11.5 ± 1.2 years. Micronutrient malnutrition was highly prevalent, with 38.7% low serum retinol and 40.4% anaemia. The prevalence of stunting was 8.8% and that of thinness 13.7%. The prevalence of anaemia (p = 0.001) and vitamin A deficiency (p < 0.001) was significantly higher in public than in private schools. Goitre was not detected. The prevalence of overweight/obesity was low (2.3%) and was significantly higher among children in private schools (p = 0.009) and among younger children aged 7-9 years (p < 0.05). Thinness and stunting were significantly more prevalent in peri-urban than in urban schools (p < 0.05 and p = 0.004, respectively). Almost 15% of the children had at least two nutritional deficiencies. Conclusion: This study shows that malnutrition and micronutrient deficiencies are also widespread among schoolchildren in cities, and it underlines the need for nutrition interventions targeting this group.

    Chinese Whispers: Cooperative Paraphrase Acquisition

    We present a framework for the acquisition of sentential paraphrases based on crowdsourcing. The proposed method maximizes the lexical divergence between an original sentence s and its valid paraphrases by running a sequence of paraphrasing jobs carried out by a crowd of non-expert workers. Instead of collecting direct paraphrases of s, at each step of the sequence workers manipulate semantically equivalent reformulations produced in the previous round. We applied this method to paraphrase English sentences extracted from Wikipedia. Our results show that, by keeping at each round n the most promising paraphrases (i.e. those most lexically dissimilar from the paraphrases acquired at round n-1), the monotonic increase in divergence makes it possible to collect good-quality paraphrases in a cost-effective manner.
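    The per-round selection step can be sketched as follows. This is a minimal sketch, assuming lexical divergence is measured as 1 minus the Jaccard similarity of lowercased word sets and averaged over the paraphrases kept at the previous round; the abstract does not specify the metric, and all function names here are hypothetical.

```python
def tokens(sentence):
    """Lowercased set of whitespace-separated words (deliberately naive tokenization)."""
    return set(sentence.lower().split())

def lexical_divergence(a, b):
    """1 - Jaccard similarity of word sets: 0.0 means same words, 1.0 means no overlap."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)

def keep_most_divergent(candidates, previous_round, k):
    """Keep the k candidates with the highest mean divergence from the
    paraphrases kept at round n-1 (assumed selection criterion)."""
    def mean_divergence(candidate):
        return sum(lexical_divergence(candidate, p) for p in previous_round) / len(previous_round)
    return sorted(candidates, key=mean_divergence, reverse=True)[:k]

# Hypothetical round: one paraphrase kept at round n-1, two new candidates at round n.
previous = ["the cat sat"]
candidates = ["the cat sat down", "a feline was seated"]
print(keep_most_divergent(candidates, previous, 1))  # ['a feline was seated']
```

    A word-set measure ignores word order, so a reordered sentence counts as identical; an edit-distance or n-gram measure would be a reasonable alternative under the same selection loop.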

    Semeval-2013 Task 8: Cross-lingual Textual Entailment for Content Synchronization

    This paper presents the second round of the task on Cross-lingual Textual Entailment for Content Synchronization, organized within SemEval-2013. The task was designed to promote research on semantic inference over texts written in different languages, while also targeting a real application scenario. Participants were presented with datasets for different language pairs, in which multi-directional entailment relations (“forward”, “backward”, “bidirectional”, “no entailment”) had to be identified. We report on the training and test data used for evaluation, the process of their creation, the participating systems (six teams, 61 runs), the approaches adopted and the results achieved.
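    The four multi-directional labels can be read as the combination of two unidirectional entailment decisions, one in each direction. The sketch below assumes that convention (forward: the first text entails the second; backward: the reverse) and takes a hypothetical `entails` predicate as input; in a real cross-lingual system that predicate would be a trained classifier, not the toy word-containment check used here for illustration.

```python
from typing import Callable

# Hypothetical predicate type: entails(a, b) is True if text a entails text b.
Entails = Callable[[str, str], bool]

def clte_label(t1: str, t2: str, entails: Entails) -> str:
    """Combine two directional judgments into one of the four task labels."""
    forward = entails(t1, t2)   # t1 -> t2
    backward = entails(t2, t1)  # t2 -> t1
    if forward and backward:
        return "bidirectional"
    if forward:
        return "forward"
    if backward:
        return "backward"
    return "no entailment"

def toy_entails(a: str, b: str) -> bool:
    """Hypothetical placeholder: a 'entails' b if every word of b appears in a."""
    return set(b.lower().split()) <= set(a.lower().split())

print(clte_label("the cat sat on the mat", "the cat sat", toy_entails))  # forward
```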

    Nutritional status and eating pattern in prostate cancer patients

    Background: Prostate cancer is the second most common cancer in men worldwide. Differences in prostate cancer incidence suggest a significant role of environmental factors in its aetiology: obesity, central adiposity and some dietary factors have been suggested as risk factors. This pilot study aimed to analyse the pattern of nutritional status, body fat and usual dietary intake among men diagnosed with prostate cancer and consecutively referred to the Radiotherapy Department of the University Hospital Santa Maria. Patients & methods: Throughout 2006, 87 men with prostate cancer were included. Assessments comprised weight and height to calculate body mass index (BMI), waist circumference, percentage body fat by bipolar hand-held bioimpedance analysis (BF-306®), and a Food Frequency Questionnaire validated for the Portuguese population to assess usual dietary intake. Frequency analysis and the Mann-Whitney U test were used to evaluate prevalence and associations. Results: Mean age was 69 ± 7 (46-85) years; 74 (84.1%) patients were in stage II, 5 (5.7%) in stage I and 9 (10.2%) in stage III; 39 (45%) patients had a Gleason score ≥ 7. Regarding nutritional status, 78 (89%) patients were overweight/obese, 84 (97%) had body fat above the maximum limit (> 25%) and 43 (49%) had a waist circumference > 102 cm (prevalence analysis: p < 0.05). Univariate analysis showed no association between Gleason score and BMI, percentage body fat or waist circumference; multivariate analysis showed an association between higher BMI, percentage body fat and poor Gleason scores (p < 0.002), and these variables worsened with age. Food frequency analysis showed a low intake of n-3 fatty acid sources, vegetables and whole grains, and a correlation was found between low intake of yogurt and vegetables and worse Gleason scores (p < 0.05). Conclusion: Our findings show a high prevalence of obesity, excess body and abdominal fat, and diets deficient in protective nutrients. Further research is needed, as cancer rates in Portugal continue to rise.

    CoSyne: a framework for multilingual content synchronization of wikis

    Wikis give a large base of contributors easy access to shared content and the freedom to edit it. One side effect of this freedom has been the emergence of parallel, independently evolving versions in a variety of languages, reflecting the multilingual background of the pool of contributors. For a wiki to properly represent user-added content, that content should be fully available in all of its languages. Working on parallel wikis in several European languages, we investigate the possibility of “synchronizing” different language versions of the same document by: i) pinpointing topically related pieces of information in the different languages, ii) identifying information that is missing or less detailed in one of the two versions, iii) translating it into the appropriate language, and iv) inserting it in the appropriate place. Progress in these directions will make it easier for users to share content across language boundaries.
    Authors: Christof Monz, Vivi Nastase, Matteo Negri, Angela Fahrni, Yashar Mehdad, Michael Strube

    Semeval-2012 Task 8: Cross-lingual Textual Entailment for Content Synchronization

    This paper presents the first round of the task on Cross-lingual Textual Entailment for Content Synchronization, organized within SemEval-2012. The task was designed to promote research on semantic inference over texts written in different languages, while also targeting a real application scenario. Participants were presented with datasets for different language pairs, in which multi-directional entailment relations (“forward”, “backward”, “bidirectional”, “no entailment”) had to be identified. We report on the training and test data used for evaluation, the process of their creation, the participating systems (10 teams, 92 runs), the approaches adopted and the results achieved.