50 research outputs found

    Whom to Ask? Jury Selection for Decision Making Tasks on Micro-blog Services

    It is common to see people obtain knowledge on micro-blog services by asking others decision making questions. In this paper, we study the Jury Selection Problem (JSP) by utilizing crowdsourcing for decision making tasks on micro-blog services. Specifically, the problem is to enroll a subset of the crowd under a limited budget, whose aggregated wisdom via the Majority Voting scheme has the lowest probability of drawing a wrong answer (Jury Error Rate, JER). Due to the various individual error rates of the crowd, the calculation of JER is non-trivial. Firstly, we explicitly state that JER is the probability that the number of wrong jurors is larger than half of the size of a jury. To avoid the exponentially increasing calculation of JER, we propose two efficient algorithms and an effective bounding technique. Furthermore, we study the Jury Selection Problem on two crowdsourcing models: one for altruistic users (AltrM) and the other for incentive-requiring users (PayM), who require extra payment when enrolled into a task. For the AltrM model, we prove the monotonicity of JER on individual error rate and propose an efficient exact algorithm for JSP. For the PayM model, we prove the NP-hardness of JSP on PayM and propose an efficient greedy-based heuristic algorithm. Finally, we conduct a series of experiments to investigate the traits of JSP, and validate the efficiency and effectiveness of our proposed algorithms on both synthetic and real micro-blog data. Comment: VLDB201
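The JER definition in the abstract above (the probability that more than half of the jurors are wrong) can be illustrated with a minimal sketch. It assumes jurors err independently and uses a standard dynamic program over the count of wrong jurors; it is not claimed to be either of the paper's two algorithms, and the function name `jury_error_rate` is hypothetical.

```python
from typing import Sequence


def jury_error_rate(error_rates: Sequence[float]) -> float:
    """Probability that more than half of the jurors vote wrongly.

    Assumption (not from the abstract): jurors err independently, each
    with the individual error rate given in `error_rates`.
    """
    n = len(error_rates)
    # dist[k] = probability that exactly k of the jurors processed so far are wrong
    dist = [1.0] + [0.0] * n
    for seen, eps in enumerate(error_rates, start=1):
        # Iterate downward so dist[k - 1] still holds the value from before this juror.
        for k in range(seen, 0, -1):
            dist[k] = dist[k] * (1.0 - eps) + dist[k - 1] * eps
        dist[0] *= 1.0 - eps
    # "Larger than half of the jury size" means at least n // 2 + 1 wrong jurors.
    return sum(dist[k] for k in range(n // 2 + 1, n + 1))
```

This runs in O(n²) time for a jury of n members, instead of enumerating all 2ⁿ voting outcomes.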

    Model Debiasing via Gradient-based Explanation on Representation

    Machine learning systems produce biased results towards certain demographic groups, known as the fairness problem. Recent approaches to tackle this problem learn a latent code (i.e., representation) through disentangled representation learning and then discard the latent code dimensions correlated with sensitive attributes (e.g., gender). Nevertheless, these approaches may suffer from incomplete disentanglement and overlook proxy attributes (proxies for sensitive attributes) when processing real-world data, especially for unstructured data, causing performance degradation in fairness and loss of useful information for downstream tasks. In this paper, we propose a novel fairness framework that performs debiasing with regard to both sensitive attributes and proxy attributes, which boosts the prediction performance of downstream task models without complete disentanglement. The main idea is to, first, leverage gradient-based explanation to find two model focuses, 1) one focus for predicting sensitive attributes and 2) the other focus for predicting downstream task labels, and second, use them to perturb the latent code that guides the training of downstream task models towards fairness and utility goals. We show empirically that our framework works with both disentangled and non-disentangled representation learning methods and achieves better fairness-accuracy trade-off on unstructured and structured datasets than previous state-of-the-art approaches

    ViT-CX: Causal Explanation of Vision Transformers

    Despite the popularity of Vision Transformers (ViTs) and eXplainable AI (XAI), only a few explanation methods have been proposed for ViTs thus far. They use attention weights of the classification token on patch embeddings and often produce unsatisfactory saliency maps. In this paper, we propose a novel method for explaining ViTs called ViT-CX. It is based on patch embeddings, rather than attentions paid to them, and their causal impacts on the model output. ViT-CX can be used to explain different ViT models. Empirical results show that, in comparison with previous methods, ViT-CX produces more meaningful saliency maps and does a better job at revealing all the important evidence for prediction. It is also significantly more faithful to the model as measured by deletion AUC and insertion AUC

    Explanation Strategies for Image Classification in Humans vs. Current Explainable AI

    Explainable AI (XAI) methods provide explanations of AI models, but our understanding of how they compare with human explanations remains limited. In image classification, we found that humans adopted more explorative attention strategies for explanation than the classification task itself. Two representative explanation strategies were identified through clustering: One involved focused visual scanning on foreground objects with more conceptual explanations diagnostic for inferring class labels, whereas the other involved explorative scanning with more visual explanations rated higher for effectiveness. Interestingly, XAI saliency-map explanations had the highest similarity to the explorative attention strategy in humans, and explanations highlighting discriminative features from invoking observable causality through perturbation had higher similarity to human strategies than those highlighting internal features associated with higher class score. Thus, humans differ in information and strategy use for explanations, and XAI methods that highlight features informing observable causality match better with human explanations, potentially more accessible to users

    The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

    Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead

    Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019

    Background: In an era of shifting global agendas and expanded emphasis on non-communicable diseases and injuries along with communicable diseases, sound evidence on trends by cause at the national level is essential. The Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) provides a systematic scientific assessment of published, publicly available, and contributed data on incidence, prevalence, and mortality for a mutually exclusive and collectively exhaustive list of diseases and injuries. Methods: GBD estimates incidence, prevalence, mortality, years of life lost (YLLs), years lived with disability (YLDs), and disability-adjusted life-years (DALYs) due to 369 diseases and injuries, for two sexes, and for 204 countries and territories. Input data were extracted from censuses, household surveys, civil registration and vital statistics, disease registries, health service use, air pollution monitors, satellite imaging, disease notifications, and other sources. Cause-specific death rates and cause fractions were calculated using the Cause of Death Ensemble model and spatiotemporal Gaussian process regression. Cause-specific deaths were adjusted to match the total all-cause deaths calculated as part of the GBD population, fertility, and mortality estimates. Deaths were multiplied by standard life expectancy at each age to calculate YLLs. A Bayesian meta-regression modelling tool, DisMod-MR 2.1, was used to ensure consistency between incidence, prevalence, remission, excess mortality, and cause-specific mortality for most causes. Prevalence estimates were multiplied by disability weights for mutually exclusive sequelae of diseases and injuries to calculate YLDs. We considered results in the context of the Socio-demographic Index (SDI), a composite indicator of income per capita, years of schooling, and fertility rate in females younger than 25 years. 
Uncertainty intervals (UIs) were generated for every metric using the 25th and 975th ordered 1000 draw values of the posterior distribution. Findings: Global health has steadily improved over the past 30 years as measured by age-standardised DALY rates. After taking into account population growth and ageing, the absolute number of DALYs has remained stable. Since 2010, the pace of decline in global age-standardised DALY rates has accelerated in age groups younger than 50 years compared with the 1990–2010 time period, with the greatest annualised rate of decline occurring in the 0–9-year age group. Six infectious diseases were among the top ten causes of DALYs in children younger than 10 years in 2019: lower respiratory infections (ranked second), diarrhoeal diseases (third), malaria (fifth), meningitis (sixth), whooping cough (ninth), and sexually transmitted infections (which, in this age group, is fully accounted for by congenital syphilis; ranked tenth). In adolescents aged 10–24 years, three injury causes were among the top causes of DALYs: road injuries (ranked first), self-harm (third), and interpersonal violence (fifth). Five of the causes that were in the top ten for ages 10–24 years were also in the top ten in the 25–49-year age group: road injuries (ranked first), HIV/AIDS (second), low back pain (fourth), headache disorders (fifth), and depressive disorders (sixth). In 2019, ischaemic heart disease and stroke were the top-ranked causes of DALYs in both the 50–74-year and 75-years-and-older age groups. Since 1990, there has been a marked shift towards a greater proportion of burden due to YLDs from non-communicable diseases and injuries. In 2019, there were 11 countries where non-communicable disease and injury YLDs constituted more than half of all disease burden. 
Decreases in age-standardised DALY rates have accelerated over the past decade in countries at the lower end of the SDI range, while improvements have started to stagnate or even reverse in countries with higher SDI. Interpretation: As disability becomes an increasingly large component of disease burden and a larger component of health expenditure, greater research and development investment is needed to identify new, more effective intervention strategies. With a rapidly ageing global population, the demands on health services to deal with disabling outcomes, which increase with age, will require policy makers to anticipate these changes. The mix of universal and more geographically specific influences on health reinforces the need for regular reporting on population health in detail and by underlying cause to help decision makers to identify success stories of disease control to emulate, as well as opportunities to improve. Funding: Bill & Melinda Gates Foundation. © 2020 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license
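The arithmetic described in the Methods of the abstract above can be sketched in a few lines. This is only an illustration: the function names and all input numbers are hypothetical, and the real GBD pipeline works on modelled estimates per cause, age, sex, and location. The sketch also uses the standard relation that DALYs are the sum of YLLs and YLDs.

```python
from typing import Sequence, Tuple


def ylls(deaths: Sequence[float], std_life_expectancy: Sequence[float]) -> float:
    """Years of life lost: deaths at each age times standard life expectancy at that age."""
    return sum(d * e for d, e in zip(deaths, std_life_expectancy))


def ylds(prevalence: Sequence[float], disability_weights: Sequence[float]) -> float:
    """Years lived with disability: prevalence of each sequela times its disability weight."""
    return sum(p * w for p, w in zip(prevalence, disability_weights))


def dalys(yll: float, yld: float) -> float:
    """Disability-adjusted life-years as the sum of YLLs and YLDs."""
    return yll + yld


def uncertainty_interval(draws: Sequence[float]) -> Tuple[float, float]:
    """Return the 25th and 975th ordered values of exactly 1000 draws."""
    if len(draws) != 1000:
        raise ValueError("expected exactly 1000 draws")
    ordered = sorted(draws)
    # 1-based positions 25 and 975 correspond to 0-based indices 24 and 974.
    return ordered[24], ordered[974]
```

For example, `uncertainty_interval` applied to the integers 1 to 1000 in any order returns `(25, 975)`.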

    Global age-sex-specific fertility, mortality, healthy life expectancy (HALE), and population estimates in 204 countries and territories, 1950-2019 : a comprehensive demographic analysis for the Global Burden of Disease Study 2019

    Background: Accurate and up-to-date assessment of demographic metrics is crucial for understanding a wide range of social, economic, and public health issues that affect populations worldwide. The Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2019 produced updated and comprehensive demographic assessments of the key indicators of fertility, mortality, migration, and population for 204 countries and territories and selected subnational locations from 1950 to 2019. Methods: 8078 country-years of vital registration and sample registration data, 938 surveys, 349 censuses, and 238 other sources were identified and used to estimate age-specific fertility. Spatiotemporal Gaussian process regression (ST-GPR) was used to generate age-specific fertility rates for 5-year age groups between ages 15 and 49 years. With extensions to age groups 10–14 and 50–54 years, the total fertility rate (TFR) was then aggregated using the estimated age-specific fertility between ages 10 and 54 years. 7417 sources were used for under-5 mortality estimation and 7355 for adult mortality. ST-GPR was used to synthesise data sources after correction for known biases. Adult mortality was measured as the probability of death between ages 15 and 60 years based on vital registration, sample registration, and sibling histories, and was also estimated using ST-GPR. HIV-free life tables were then estimated using estimates of under-5 and adult mortality rates using a relational model life table system created for GBD, which closely tracks observed age-specific mortality rates from complete vital registration when available. Independent estimates of HIV-specific mortality generated by an epidemiological analysis of HIV prevalence surveys and antenatal clinic serosurveillance and other sources were incorporated into the estimates in countries with large epidemics. 
Annual and single-year age estimates of net migration and population for each country and territory were generated using a Bayesian hierarchical cohort component model that analysed estimated age-specific fertility and mortality rates along with 1250 censuses and 747 population registry years. We classified location-years into seven categories on the basis of the natural rate of increase in population (calculated by subtracting the crude death rate from the crude birth rate) and the net migration rate. We computed healthy life expectancy (HALE) using years lived with disability (YLDs) per capita, life tables, and standard demographic methods. Uncertainty was propagated throughout the demographic estimation process, including fertility, mortality, and population, with 1000 draw-level estimates produced for each metric. Findings: The global TFR decreased from 2·72 (95% uncertainty interval [UI] 2·66–2·79) in 2000 to 2·31 (2·17–2·46) in 2019. Global annual livebirths increased from 134·5 million (131·5–137·8) in 2000 to a peak of 139·6 million (133·0–146·9) in 2016. Global livebirths then declined to 135·3 million (127·2–144·1) in 2019. Of the 204 countries and territories included in this study, in 2019, 102 had a TFR lower than 2·1, which is considered a good approximation of replacement-level fertility. All countries in sub-Saharan Africa had TFRs above replacement level in 2019 and accounted for 27·1% (95% UI 26·4–27·8) of global livebirths. Global life expectancy at birth increased from 67·2 years (95% UI 66·8–67·6) in 2000 to 73·5 years (72·8–74·3) in 2019. The total number of deaths increased from 50·7 million (49·5–51·9) in 2000 to 56·5 million (53·7–59·2) in 2019. Under-5 deaths declined from 9·6 million (9·1–10·3) in 2000 to 5·0 million (4·3–6·0) in 2019. Global population increased by 25·7%, from 6·2 billion (6·0–6·3) in 2000 to 7·7 billion (7·5–8·0) in 2019. 
In 2019, 34 countries had negative natural rates of increase; in 17 of these, the population declined because immigration was not sufficient to counteract the negative rate of decline. Globally, HALE increased from 58·6 years (56·1–60·8) in 2000 to 63·5 years (60·8–66·1) in 2019. HALE increased in 202 of 204 countries and territories between 2000 and 2019
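Two of the demographic quantities in the abstract above lend themselves to a short worked sketch. It assumes the standard demographic convention that, for 5-year age groups, the TFR is five times the sum of the age-specific fertility rates (births per woman per year); the abstract says only that the TFR was "aggregated" from ages 10 to 54. The natural rate of increase follows the definition in the abstract. Function names and inputs are hypothetical.

```python
from typing import Mapping

# The nine 5-year age groups between ages 10 and 54 mentioned in the abstract.
AGE_GROUPS = ["10-14", "15-19", "20-24", "25-29", "30-34",
              "35-39", "40-44", "45-49", "50-54"]


def total_fertility_rate(asfr: Mapping[str, float]) -> float:
    """TFR from age-specific fertility rates (births per woman per year).

    Assumption: each rate applies for the 5 years a woman spends in the group.
    """
    return 5 * sum(asfr[group] for group in AGE_GROUPS)


def natural_rate_of_increase(crude_birth_rate: float, crude_death_rate: float) -> float:
    """Crude birth rate minus crude death rate, in the same units (e.g. per 1000)."""
    return crude_birth_rate - crude_death_rate
```

A negative result from `natural_rate_of_increase` corresponds to the situation the abstract reports for 34 countries in 2019, where deaths outnumber births.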

    Measurement of the charge asymmetry in top-quark pair production in the lepton-plus-jets final state in pp collision data at $\sqrt{s}=8\,\mathrm{TeV}$ with the ATLAS detector
