    Investigating domain-independent NLP techniques for precise target selection in video hyperlinking

    (International audience) Automatic generation of hyperlinks in multimedia video data is a subject of growing interest, as demonstrated by recent work carried out within the Search and Hyperlinking task of the MediaEval benchmark initiative. In this paper, we compare NLP-based strategies for precise target selection in video hyperlinking that exploit speech material, with the goal of providing hyperlinks from a specified anchor to support information retrieval. We experimentally compare two approaches for selecting short portions of videos that are relevant to, and possibly complementary with, the anchor. The first approach exploits a bipartite graph relating utterances and words to find the most relevant utterances. The second uses explicit topic segmentation, either hierarchical or linear, to select the target segments. Experimental results are reported on the MediaEval 2013 Search and Hyperlinking dataset, which consists of BBC videos, and demonstrate the value of hierarchical topic segmentation for precise target selection.
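The abstract does not say how relevance propagates through the bipartite utterance–word graph. A common choice for such graphs is a mutual-reinforcement (HITS-style) iteration: utterances are scored by the words they contain, and words by the utterances they occur in. The sketch below assumes that formulation, personalized toward the anchor's words so the ranking stays tied to the anchor. `rank_utterances`, the prior values, and the mixing weight `alpha` are hypothetical choices, not the paper's method.

```python
import math
from collections import Counter

def rank_utterances(utterances, anchor_words, alpha=0.5, iterations=30):
    """Rank utterance indices by mutual reinforcement on a bipartite
    utterance-word graph, biased toward the anchor's words (hypothetical
    personalized HITS-style sketch)."""
    # Edge weight between utterance i and word w = count of w in utterance i.
    edges = [Counter(u.lower().split()) for u in utterances]
    vocab = sorted({w for e in edges for w in e})
    anchor = {w.lower() for w in anchor_words}

    def normalize(scores):
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {k: v / norm for k, v in scores.items()}

    # Prior: anchor words start high, every other word gets a small score.
    prior = normalize({w: (1.0 if w in anchor else 0.1) for w in vocab})
    word_score = dict(prior)
    utt_score = {i: 0.0 for i in range(len(utterances))}
    for _ in range(iterations):
        # An utterance is relevant if it contains relevant words ...
        utt_score = normalize({i: sum(c * word_score[w] for w, c in e.items())
                               for i, e in enumerate(edges)})
        # ... and a word is relevant if it occurs in relevant utterances.
        propagated = {w: 0.0 for w in vocab}
        for i, e in enumerate(edges):
            for w, c in e.items():
                propagated[w] += c * utt_score[i]
        propagated = normalize(propagated)
        # Mixing with the prior keeps the scores anchored to the query.
        word_score = {w: alpha * prior[w] + (1 - alpha) * propagated[w]
                      for w in vocab}
    return sorted(range(len(utterances)), key=lambda i: -utt_score[i])
```

Without the prior term, the iteration would converge to the graph's principal eigenvector regardless of the anchor; `alpha` trades anchor fidelity against graph-driven expansion to complementary vocabulary.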

    IRISA and KUL at MediaEval 2014: Search and Hyperlinking Task

    (International audience) This paper presents our approach and results in the hyperlinking sub-task at MediaEval 2014. A two-step approach is implemented: first, relying on a topic segmentation technique, potential target segments are generated; then, for each anchor, the 20 best target segments are selected according to two distinct strategies. The first focuses on identifying very similar targets using n-grams and named entities; the second makes use of an intermediate structure built from topic models, which makes it possible to control serendipity and to explain the links created.
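As an illustration of the first strategy only, the sketch below ranks candidate target segments by word n-gram overlap with the anchor transcript and keeps the top 20. The Jaccard measure, the n-gram orders, and the function names are assumptions; the named-entity component mentioned in the abstract is omitted.

```python
def word_ngrams(text, n):
    """Set of word n-grams of a transcript (lower-cased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def select_targets(anchor, segments, k=20, orders=(1, 2, 3)):
    """Indices of the k candidate segments whose n-gram sets overlap most
    with the anchor (sum of Jaccard similarities over n-gram orders)."""
    scored = []
    for idx, seg in enumerate(segments):
        score = 0.0
        for n in orders:
            a, s = word_ngrams(anchor, n), word_ngrams(seg, n)
            union = a | s
            if union:
                score += len(a & s) / len(union)
        # Negated score: ascending sort gives best first, ties broken by index.
        scored.append((-score, idx))
    scored.sort()
    return [idx for _, idx in scored[:k]]
```

Higher-order n-grams reward shared phrases rather than isolated shared words, which favours near-duplicate targets, the behaviour this strategy aims for.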

    Leveraging lexical cohesion and disruption for topic segmentation

    (International audience) Topic segmentation classically relies on one of two criteria: either finding areas with coherent vocabulary use or detecting discontinuities. In this paper, we propose a segmentation criterion combining both lexical cohesion and disruption, enabling a trade-off between the two. We provide the mathematical formulation of the criterion and an efficient graph-based decoding algorithm for topic segmentation. Experimental results on standard textual data sets and on a more challenging corpus of automatically transcribed broadcast news shows demonstrate the benefit of such a combination. Gains were observed in all conditions, with segments of either regular or varying length and abrupt or smooth topic shifts. Long segments benefit more than short segments; however, the algorithm proved robust on automatic transcripts with short segments and limited vocabulary repetition.
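A minimal sketch of a combined criterion, under stated assumptions. The cohesion term is assumed to be the Utiyama and Isahara (2001) segment likelihood with Laplace smoothing, plus its segment-count prior. The disruption term is assumed to be a penalty proportional to the cosine similarity of the word windows on either side of each boundary. `lam`, `window`, and the function names are hypothetical; the paper's exact formulation may differ. Decoding is a shortest path over candidate boundaries, computed by dynamic programming.

```python
import math
from collections import Counter

def cohesion_cost(segment, vocab_size):
    """-log P(words | segment), Laplace-smoothed over vocab_size word types
    (Utiyama-Isahara style cohesion term)."""
    counts = Counter(w for sent in segment for w in sent)
    n = sum(counts.values())
    return -sum(c * math.log((c + 1) / (n + vocab_size)) for c in counts.values())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a if w in b)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def segment_topics(sents, lam=1.0, window=2):
    """Segment start indices (excluding 0) minimizing cohesion cost
    + segment-count penalty + lam * similarity across each boundary."""
    n = len(sents)
    vocab_size = len({w for s in sents for w in s})
    # -log of a prior N^-m on the number m of segments (N = total words).
    seg_penalty = math.log(max(sum(len(s) for s in sents), 1))
    # Similarity across gap i (before sentence i): high = no disruption.
    sim = [0.0] * (n + 1)
    for i in range(1, n):
        left = Counter(w for s in sents[max(0, i - window):i] for w in s)
        right = Counter(w for s in sents[i:i + window] for w in s)
        sim[i] = cosine(left, right)
    # best[j] = cheapest segmentation of sents[:j]; back[j] = last segment start.
    best = [0.0] + [math.inf] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + cohesion_cost(sents[i:j], vocab_size) + seg_penalty
            if i > 0:
                cost += lam * sim[i]
            if cost < best[j]:
                best[j], back[j] = cost, i
    cuts, j = [], n
    while j > 0:
        j = back[j]
        if j > 0:
            cuts.append(j)
    return sorted(cuts)
```

Raising `lam` makes boundaries between lexically similar windows more expensive, shifting the trade-off from pure cohesion toward disruption.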

    Hierarchical Topic Models for Language-based Video Hyperlinking

    (International audience) We investigate video hyperlinking based on speech transcripts, leveraging a hierarchical topical structure to address two essential aspects of hyperlinking, namely serendipity control and link justification. We propose and compare different approaches that exploit a hierarchy of topic models as an intermediate representation to compare the transcripts of video segments. These hierarchical representations offer a basis for characterizing the hyperlinks, thanks to knowledge of the topics that contributed to the creation of the links, and for controlling serendipity by giving more weight to either general or specific topics. Experiments are performed on BBC videos from the Search and Hyperlinking task at MediaEval. Link precision similar to that of direct text comparison is achieved, while exhibiting different targets and offering potential control over serendipity.
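To make the two roles of the hierarchy concrete, the sketch below assumes each segment is represented by one topic-proportion vector per level of the hierarchy, ordered from general to specific. Link strength is a weighted average of per-level cosine similarities, so shifting weight toward general levels is one way to loosen links (more serendipity). The topics contributing most to a level's similarity serve as a justification. The representation, weighting scheme, and function names are assumptions, not the paper's exact approach.

```python
import math

def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def hierarchical_similarity(anchor_levels, target_levels, level_weights):
    """Weighted average of per-level cosine similarities. Levels run from
    general to specific; weighting general levels favours looser links."""
    total = sum(level_weights)
    return sum(w * cosine(a, t) for w, a, t in
               zip(level_weights, anchor_levels, target_levels)) / total

def explain_link(anchor_levels, target_levels, level, top=3):
    """Indices of the topics at one level that contribute most to the
    dot product, usable as a justification for the link."""
    contrib = sorted(((a * t, k) for k, (a, t) in
                      enumerate(zip(anchor_levels[level], target_levels[level]))),
                     reverse=True)
    return [k for c, k in contrib[:top] if c > 0]
```

For example, two segments that share a general topic but differ at the specific level score high with weights `[1, 0]` and low with `[0, 1]`.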

    IRISA at TrecVid2015: Leveraging Multimodal LDA for Video Hyperlinking

    (International audience) This paper presents the runs that we submitted to the TRECVid 2015 Video Hyperlinking task. The task aims at proposing a set of video segments, called targets, to complement a query video segment defined as the anchor. We used automatic transcripts and automatically extracted visual concepts as input data. Two of the four runs use cross-modal LDA to jointly exploit visual and audio information in the videos. By contrast, one run is based solely on visual information, and a combination of the cross-modal and visual runs is also considered. After presenting the approaches, we discuss the performance obtained by the respective runs, as well as some of the limitations of the evaluation process.

    Global burden and strength of evidence for 88 risk factors in 204 countries and 811 subnational locations, 1990–2021: a systematic analysis for the Global Burden of Disease Study 2021

    Background: Understanding the health consequences associated with exposure to risk factors is necessary to inform public health policy and practice. To systematically quantify the contributions of risk factor exposures to specific health outcomes, the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2021 aims to provide comprehensive estimates of exposure levels, relative health risks, and attributable burden of disease for 88 risk factors in 204 countries and territories and 811 subnational locations, from 1990 to 2021.
    Methods: The GBD 2021 risk factor analysis used data from 54 561 total distinct sources to produce epidemiological estimates for 88 risk factors and their associated health outcomes for a total of 631 risk–outcome pairs. Pairs were included on the basis of data-driven determination of a risk–outcome association. Age-sex-location-year-specific estimates were generated at global, regional, and national levels. Our approach followed the comparative risk assessment framework predicated on a causal web of hierarchically organised, potentially combinative, modifiable risks. Relative risks (RRs) of a given outcome occurring as a function of risk factor exposure were estimated separately for each risk–outcome pair, and summary exposure values (SEVs), representing risk-weighted exposure prevalence, and theoretical minimum risk exposure levels (TMRELs) were estimated for each risk factor. These estimates were used to calculate the population attributable fraction (PAF; ie, the proportional change in health risk that would occur if exposure to a risk factor were reduced to the TMREL). The product of PAFs and disease burden associated with a given outcome, measured in disability-adjusted life-years (DALYs), yielded measures of attributable burden (ie, the proportion of total disease burden attributable to a particular risk factor or combination of risk factors). Adjustments for mediation were applied to account for relationships involving risk factors that act indirectly on outcomes via intermediate risks. Attributable burden estimates were stratified by Socio-demographic Index (SDI) quintile and presented as counts, age-standardised rates, and rankings. To complement estimates of RR and attributable burden, newly developed burden of proof risk function (BPRF) methods were applied to yield supplementary, conservative interpretations of risk–outcome associations based on the consistency of underlying evidence, accounting for unexplained heterogeneity between input data from different studies. Estimates reported represent the mean value across 500 draws from the estimate's distribution, with 95% uncertainty intervals (UIs) calculated as the 2·5th and 97·5th percentile values across the draws.
    Findings: Among the specific risk factors analysed for this study, particulate matter air pollution was the leading contributor to the global disease burden in 2021, contributing 8·0% (95% UI 6·7–9·4) of total DALYs, followed by high systolic blood pressure (SBP; 7·8% [6·4–9·2]), smoking (5·7% [4·7–6·8]), low birthweight and short gestation (5·6% [4·8–6·3]), and high fasting plasma glucose (FPG; 5·4% [4·8–6·0]). For younger demographics (ie, those aged 0–4 years and 5–14 years), risks such as low birthweight and short gestation and unsafe water, sanitation, and handwashing (WaSH) were among the leading risk factors, while for older age groups, metabolic risks such as high SBP, high body-mass index (BMI), high FPG, and high LDL cholesterol had a greater impact.
    From 2000 to 2021, there was an observable shift in global health challenges, marked by a decline in the number of all-age DALYs broadly attributable to behavioural risks (decrease of 20·7% [13·9–27·7]) and environmental and occupational risks (decrease of 22·0% [15·5–28·8]), coupled with a 49·4% (42·3–56·9) increase in DALYs attributable to metabolic risks, all reflecting ageing populations and changing lifestyles on a global scale. Age-standardised global DALY rates attributable to high BMI and high FPG rose considerably (15·7% [9·9–21·7] for high BMI and 7·9% [3·3–12·9] for high FPG) over this period, with exposure to these risks increasing annually at rates of 1·8% (1·6–1·9) for high BMI and 1·3% (1·1–1·5) for high FPG. By contrast, the global risk-attributable burden and exposure to many other risk factors declined, notably for risks such as child growth failure and unsafe water source, with age-standardised attributable DALYs decreasing by 71·5% (64·4–78·8) for child growth failure and 66·3% (60·2–72·0) for unsafe water source. We separated risk factors into three groups according to trajectory over time: those with a decreasing attributable burden, due largely to declining risk exposure (eg, diet high in trans-fat and household air pollution) but also to proportionally smaller child and youth populations (eg, child and maternal malnutrition); those for which the burden increased moderately in spite of declining risk exposure, due largely to population ageing (eg, smoking); and those for which the burden increased considerably due to both increasing risk exposure and population ageing (eg, ambient particulate matter air pollution, high BMI, high FPG, and high SBP).
    Interpretation: Substantial progress has been made in reducing the global disease burden attributable to a range of risk factors, particularly those related to maternal and child health, WaSH, and household air pollution. Maintaining efforts to minimise the impact of these risk factors, especially in low SDI locations, is necessary to sustain progress. Successes in moderating the smoking-related burden by reducing risk exposure highlight the need to advance policies that reduce exposure to other leading risk factors such as ambient particulate matter air pollution and high SBP. Troubling increases in high FPG, high BMI, and other risk factors related to obesity and metabolic syndrome indicate an urgent need to identify and implement interventions.
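For a categorical exposure whose relative risks are expressed against the TMREL category (RR = 1), the comparative risk assessment framework gives PAF = (Σᵢ pᵢ·RRᵢ − 1) / Σᵢ pᵢ·RRᵢ. Attributable burden is then the PAF multiplied by the DALYs of the outcome. The sketch below works through that arithmetic with hypothetical inputs, not GBD estimates. It ignores continuous exposures, mediation adjustments, and the 500-draw uncertainty propagation described in the abstract.

```python
def paf_categorical(prevalence, relative_risk):
    """Population attributable fraction for a categorical exposure.

    prevalence[i]    : share of the population in exposure category i (sums to 1)
    relative_risk[i] : RR of the outcome in category i relative to the TMREL
                       category, whose RR is therefore 1
    PAF = (sum_i p_i * RR_i - 1) / sum_i p_i * RR_i
    """
    if len(prevalence) != len(relative_risk):
        raise ValueError("one relative risk per exposure category is required")
    if abs(sum(prevalence) - 1.0) > 1e-9:
        raise ValueError("prevalences must sum to 1")
    mean_rr = sum(p * rr for p, rr in zip(prevalence, relative_risk))
    return (mean_rr - 1.0) / mean_rr

def attributable_dalys(paf, outcome_dalys):
    """Attributable burden: PAF times the DALYs of the outcome."""
    return paf * outcome_dalys

# Hypothetical inputs: 70% of people at the TMREL, 30% exposed with RR = 2.
paf = paf_categorical([0.7, 0.3], [1.0, 2.0])
print(round(paf, 4), round(attributable_dalys(paf, 1000.0), 1))  # prints: 0.2308 230.8
```

Moving everyone to the TMREL would lower the average relative risk from 1.3 to 1, which is the 0.3/1.3 ≈ 23% reduction the PAF expresses.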

    Global incidence, prevalence, years lived with disability (YLDs), disability-adjusted life-years (DALYs), and healthy life expectancy (HALE) for 371 diseases and injuries in 204 countries and territories and 811 subnational locations, 1990–2021: a systematic analysis for the Global Burden of Disease Study 2021

    Background: Detailed, comprehensive, and timely reporting on population health by underlying causes of disability and premature death is crucial to understanding and responding to complex patterns of disease and injury burden over time and across age groups, sexes, and locations. The availability of disease burden estimates can promote evidence-based interventions that enable public health researchers, policy makers, and other professionals to implement strategies that can mitigate diseases. It can also facilitate more rigorous monitoring of progress towards national and international health targets, such as the Sustainable Development Goals. For three decades, the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) has filled that need. A global network of collaborators contributed to the production of GBD 2021 by providing, reviewing, and analysing all available data. GBD estimates are updated routinely with additional data and refined analytical methods. GBD 2021 presents, for the first time, estimates of health loss due to the COVID-19 pandemic.
    Methods: The GBD 2021 disease and injury burden analysis estimated years lived with disability (YLDs), years of life lost (YLLs), disability-adjusted life-years (DALYs), and healthy life expectancy (HALE) for 371 diseases and injuries using 100 983 data sources. Data were extracted from vital registration systems, verbal autopsies, censuses, household surveys, disease-specific registries, health service contact data, and other sources. YLDs were calculated by multiplying cause-age-sex-location-year-specific prevalence of sequelae by their respective disability weights, for each disease and injury. YLLs were calculated by multiplying cause-age-sex-location-year-specific deaths by the standard life expectancy at the age that death occurred. DALYs were calculated by summing YLDs and YLLs. HALE estimates were produced using YLDs per capita and age-specific mortality rates by location, age, sex, year, and cause. 95% uncertainty intervals (UIs) were generated for all final estimates as the 2·5th and 97·5th percentile values of 500 draws. Uncertainty was propagated at each step of the estimation process. Counts and age-standardised rates were calculated globally, for seven super-regions, 21 regions, 204 countries and territories (including 21 countries with subnational locations), and 811 subnational locations, from 1990 to 2021. Here we report data for 2010 to 2021 to highlight trends in disease burden over the past decade and through the first 2 years of the COVID-19 pandemic.
    Findings: Global DALYs increased from 2·63 billion (95% UI 2·44–2·85) in 2010 to 2·88 billion (2·64–3·15) in 2021 for all causes combined. Much of this increase in the number of DALYs was due to population growth and ageing, as indicated by a decrease in global age-standardised all-cause DALY rates of 14·2% (95% UI 10·7–17·3) between 2010 and 2019. Notably, however, this decrease in rates reversed during the first 2 years of the COVID-19 pandemic, with increases in global age-standardised all-cause DALY rates since 2019 of 4·1% (1·8–6·3) in 2020 and 7·2% (4·7–10·0) in 2021. In 2021, COVID-19 was the leading cause of DALYs globally (212·0 million [198·0–234·5] DALYs), followed by ischaemic heart disease (188·3 million [176·7–198·3]), neonatal disorders (186·3 million [162·3–214·9]), and stroke (160·4 million [148·0–171·7]). However, notable health gains were seen among other leading communicable, maternal, neonatal, and nutritional (CMNN) diseases. Globally between 2010 and 2021, the age-standardised DALY rates for HIV/AIDS decreased by 47·8% (43·3–51·7) and for diarrhoeal diseases decreased by 47·0% (39·9–52·9). Non-communicable diseases contributed 1·73 billion (95% UI 1·54–1·94) DALYs in 2021, with a decrease in age-standardised DALY rates since 2010 of 6·4% (95% UI 3·5–9·5).
    Between 2010 and 2021, among the 25 leading Level 3 causes, age-standardised DALY rates increased most substantially for anxiety disorders (16·7% [14·0–19·8]), depressive disorders (16·4% [11·9–21·3]), and diabetes (14·0% [10·0–17·4]). Age-standardised DALY rates due to injuries decreased globally by 24·0% (20·7–27·2) between 2010 and 2021, although improvements were not uniform across locations, ages, and sexes. Globally, HALE at birth improved slightly, from 61·3 years (58·6–63·6) in 2010 to 62·2 years (59·4–64·7) in 2021. However, despite this overall increase, HALE decreased by 2·2% (1·6–2·9) between 2019 and 2021.
    Interpretation: Putting the COVID-19 pandemic in the context of a mutually exclusive and collectively exhaustive list of causes of health loss is crucial to understanding its impact and ensuring that health funding and policy address needs at both local and global levels through cost-effective and evidence-based interventions. A global epidemiological transition remains underway. Our findings suggest that prioritising non-communicable disease prevention and treatment policies, as well as strengthening health systems, continues to be crucially important. The progress on reducing the burden of CMNN diseases must not stall; although global trends are improving, the burden of CMNN diseases remains unacceptably high. Evidence-based interventions will help save the lives of young children and mothers and improve the overall health and economic conditions of societies across the world. Governments and multilateral organisations should prioritise pandemic preparedness planning alongside efforts to reduce the burden of diseases and injuries that will strain resources in the coming decades.
    Funding: Bill & Melinda Gates Foundation
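The Methods section of the abstract above defines three quantities by simple arithmetic. YLDs are sequela prevalence times disability weight, YLLs are deaths times standard life expectancy at the age of death, and DALYs are their sum. The sketch below computes them for a single cause in one age-sex-location-year cell, using a hypothetical `CauseCell` type and made-up numbers. It omits the further adjustments and the 500-draw uncertainty propagation described in the abstract.

```python
from dataclasses import dataclass, field

@dataclass
class CauseCell:
    """Inputs for one cause in one age-sex-location-year cell (illustrative)."""
    sequelae: list = field(default_factory=list)  # (prevalent cases, disability weight) pairs
    deaths: float = 0.0
    life_expectancy_at_death: float = 0.0  # standard life expectancy at the age of death

def ylds(cell):
    # Prevalence of each sequela times its disability weight, summed over sequelae.
    return sum(prev * dw for prev, dw in cell.sequelae)

def ylls(cell):
    # Deaths times the standard life expectancy at the age that death occurred.
    return cell.deaths * cell.life_expectancy_at_death

def dalys(cell):
    # DALYs = YLDs + YLLs.
    return ylds(cell) + ylls(cell)

# Hypothetical cell: two sequelae, 10 deaths with 30 years of standard life expectancy.
cell = CauseCell(sequelae=[(1000.0, 0.05), (200.0, 0.3)], deaths=10.0,
                 life_expectancy_at_death=30.0)
print(ylds(cell), ylls(cell), dalys(cell))  # 110.0 300.0 410.0
```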

    Hierarchical Topic Segmentation of TV shows Automatic Transcripts

    The growth of multimedia document collections has made new data access and data structuring techniques a necessity. The work presented in this report focuses on structuring TV shows; among the different kinds of structuring, we address topic segmentation, with a particular interest in techniques able to provide hierarchical topic segmentation. This research is motivated by the potential impact of these techniques, which fit naturally with navigation and information retrieval. To provide generic automatic structuring of TV shows, we use the words spoken in them, as made available by automatic transcription from an ASR system. The proposed topic segmentation algorithm consists of the recursive application of a modified version of TextTiling. It exploits a technique called vectorization, which was recently introduced for linear segmentation and outperformed other existing techniques. Because vectorization is a powerful technique, we studied it in more depth and tested it for both linear and hierarchical segmentation. The results show that vectorization can improve segmentation and justify applying the technique further.
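For reference, a simplified TextTiling: adjacent blocks of sentences are compared with cosine similarity, each valley in the similarity curve gets a depth score, and gaps whose depth reaches the usual mean − σ/2 cutoff become boundaries. Recursing into each resulting segment yields a hierarchy. This is a sketch under stated assumptions. It omits smoothing, stop-word removal, and the vectorization technique described above. `hierarchical_tiling`, `block`, and `min_len` are hypothetical names, not the report's implementation.

```python
import math
import statistics
from collections import Counter

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a if w in b)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def texttiling(sents, block=2):
    """Boundary gaps (gap i lies before sentence i) found by block
    comparison and depth scoring, TextTiling style."""
    gaps = list(range(1, len(sents)))
    sims = []
    for i in gaps:
        left = Counter(w for s in sents[max(0, i - block):i] for w in s)
        right = Counter(w for s in sents[i:i + block] for w in s)
        # Rounded so that floating-point noise does not create spurious valleys.
        sims.append(round(cosine(left, right), 6))
    depths = {}
    for k, s in enumerate(sims):
        is_valley = ((k == 0 or s < sims[k - 1])
                     and (k == len(sims) - 1 or s < sims[k + 1]))
        if not is_valley:
            continue
        # Climb to the nearest peak on each side of the valley.
        l = k
        while l > 0 and sims[l - 1] >= sims[l]:
            l -= 1
        r = k
        while r < len(sims) - 1 and sims[r + 1] >= sims[r]:
            r += 1
        depths[k] = (sims[l] - s) + (sims[r] - s)
    if not depths:
        return []
    values = list(depths.values())
    cutoff = statistics.mean(values) - statistics.pstdev(values) / 2
    return [gaps[k] for k, d in sorted(depths.items()) if d > 0 and d >= cutoff]

def hierarchical_tiling(sents, offset=0, min_len=4, block=2):
    """Recursively re-apply texttiling inside each segment; returns a list of
    (start, end, children) tuples, or [] when no internal boundary is found."""
    cuts = [0] + texttiling(sents, block) + [len(sents)]
    if len(cuts) == 2:
        return []
    tree = []
    for a, b in zip(cuts, cuts[1:]):
        children = (hierarchical_tiling(sents[a:b], offset + a, min_len, block)
                    if b - a >= min_len else [])
        tree.append((offset + a, offset + b, children))
    return tree
```

`min_len` stops the recursion on segments too short to compare blocks meaningfully; on automatic transcripts a larger `block` would likely be needed.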

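    Un modèle segmental probabiliste combinant cohésion lexicale et rupture lexicale pour la segmentation thématique [A probabilistic segmental model combining lexical cohesion and lexical disruption for topic segmentation]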

    (International audience) Identifying topical structure in any text-like data is a challenging task. Most existing techniques rely either on maximizing a measure of lexical cohesion within a segment or on detecting lexical disruptions. This paper proposes a novel method that combines the two criteria so as to obtain the best trade-off between cohesion and disruption. A new statistical model is defined, based on the work of Utiyama and Isahara (2001), preserving that model's domain independence and its reliance on few a priori assumptions. Evaluations are performed both on written texts and on automatic transcripts of TV shows; the latter do not follow the conventions of written text, which increases the difficulty of the task. Experimental results demonstrate the relevance of combining lexical cohesion and disruption.