12 research outputs found

    Entities with quantities : extraction, search, and ranking

    Quantities are more than numeric values. They denote measures of the world’s entities such as heights of buildings, running times of athletes, energy efficiency of car models or energy production of power plants, all expressed in numbers with associated units. Entity-centric search and question answering (QA) are well supported by modern search engines. However, they do not work well when the queries involve quantity filters, such as searching for athletes who ran 200m under 20 seconds or companies with quarterly revenue above $2 billion. State-of-the-art systems fail to understand the quantities, including the condition (less than, above, etc.), the unit of interest (seconds, dollar, etc.), and the context of the quantity (200m race, quarterly revenue, etc.). QA systems based on structured knowledge bases (KBs) also fail, as quantities are poorly covered by state-of-the-art KBs. In this dissertation, we developed new methods to advance the state of the art in quantity knowledge extraction and search. Our main contributions are the following: • First, we present Qsearch [Ho et al., 2019, Ho et al., 2020], a system that can handle advanced queries with quantity filters by using cues present in both the query and the text sources. Qsearch comprises two main contributions. The first is a deep neural network model designed for extracting quantity-centric tuples from text sources. The second is a novel query-matching model for finding and ranking matching tuples. • Second, to incorporate heterogeneous tables into the process, we introduce QuTE [Ho et al., 2021a, Ho et al., 2021b], a system for extracting quantity information from web sources, in particular ad-hoc web tables in HTML pages. The contribution of QuTE includes a method for linking quantity and entity columns, for which external text sources are used. For question answering, we contextualize the extracted entity-quantity pairs with informative cues from the table and present a new method for consolidating and improving the ranking of answer candidates through inter-fact consistency. • Third, we present QL [Ho et al., 2022], a recall-oriented method for enriching knowledge bases (KBs) with quantitative facts. Modern KBs such as Wikidata or YAGO cover many entities and their relevant information, but often miss important quantitative properties. QL is query-driven and based on iterative learning, with two main contributions to improve KB coverage. The first is a method for query expansion that captures a larger pool of fact candidates. The second is a self-consistency technique that takes the value distributions of quantities into account.
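    The three parts of a quantity filter named in the abstract (condition, unit, context) can be illustrated with a minimal sketch. This is not the Qsearch model, which the abstract describes as neural extraction plus a query-matching model; the `QuantityFact` type, the `UNITS` table, the substring-based context match, and the placeholder entities below are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical unit table: unit -> (dimension, factor to the base unit).
UNITS = {
    "s": ("time", 1.0),
    "min": ("time", 60.0),
    "usd": ("money", 1.0),
    "billion usd": ("money", 1e9),
}


@dataclass(frozen=True)
class QuantityFact:
    entity: str
    value: float
    unit: str
    context: str  # free-text cue, e.g. "200m race"


def matches(fact, comparator, bound, unit, context):
    """Check the three parts of a quantity filter: context, unit, condition."""
    # Context: naive substring match (an assumption, not a learned matcher).
    if context.lower() not in fact.context.lower():
        return False
    fact_dim, fact_factor = UNITS[fact.unit]
    query_dim, query_factor = UNITS[unit]
    # Unit: values are only comparable within the same dimension.
    if fact_dim != query_dim:
        return False
    fact_value = fact.value * fact_factor
    bound_value = bound * query_factor
    # Condition: only strict "less than" and "above" are sketched here.
    if comparator == "<":
        return fact_value < bound_value
    if comparator == ">":
        return fact_value > bound_value
    raise ValueError(f"unsupported comparator: {comparator}")


def search(facts, comparator, bound, unit, context):
    """Return the entities whose facts satisfy the filter."""
    return [f.entity for f in facts if matches(f, comparator, bound, unit, context)]


# Placeholder facts, invented for the example (not real measurements).
facts = [
    QuantityFact("Athlete A", 19.8, "s", "200m race"),
    QuantityFact("Athlete B", 20.4, "s", "200m race"),
    QuantityFact("Company C", 2.5, "billion usd", "quarterly revenue"),
]

under_20s = search(facts, "<", 20, "s", "200m")
above_2bn = search(facts, ">", 2, "billion usd", "quarterly revenue")
```

    With these placeholder facts, `under_20s` contains only "Athlete A" and `above_2bn` contains only "Company C"; a query that ignored the unit or the context would mix up race times and revenues.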

    Generating Rules to Filter Candidate Triples for their Correctness Checking by Knowledge Graph Completion Techniques

    Knowledge Graphs (KGs) contain large amounts of structured information. Due to their inherent incompleteness, a process known as KG completion is often carried out to find the missing triples in a KG, usually by training a fact checking model that is able to discern between correct and incorrect knowledge. After the fact checking model has been trained and evaluated, it has to be applied to a set of candidate triples, and those that are considered correct are added to the KG as new knowledge. However, this process needs a set of candidate triples of a reasonable size that represents possible new knowledge, so that the fact checking model can evaluate them and those considered correct can be added to the KG, enriching it. Current approaches for selecting candidate triples for correctness checking either use the full set of possible missing candidate triples (and thus provide no filtering) or apply very basic rules to filter out unlikely candidates, which may have a negative effect on the completion performance as very few candidate triples are filtered out. In this paper we present CHAI, a method for producing more complex rules that are able to filter candidate triples by combining a set of criteria to optimize a fitness function. Our experiments show that CHAI is able to generate rules that, when applied, yield smaller candidate sets than similar proposals while still including promising candidate triples. Ministerio de Economía y Competitividad TIN2016-75394-
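    The idea of combining criteria into a filtering rule and scoring it with a fitness function can be sketched as follows. This is a toy illustration, not CHAI itself: the abstract does not specify the criteria, the fitness function, or the search procedure, so the three criteria, the `recall * reduction` fitness, the exhaustive search over criteria subsets, and the tiny knowledge graph are all assumptions.

```python
from itertools import combinations

# Toy knowledge graph and one held-out correct triple (both made up).
kg = {
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("carol", "works_at", "globex"),
    ("acme", "located_in", "paris"),
    ("globex", "located_in", "berlin"),
}
held_out = {("alice", "works_at", "globex")}

entities = sorted({e for s, _, o in kg for e in (s, o)})
relations = sorted({p for _, p, _ in kg})

# Unfiltered candidate set: every triple that is not already in the KG.
candidates = [
    (s, p, o)
    for s in entities
    for p in relations
    for o in entities
    if (s, p, o) not in kg
]

subjects_of = {(s, p) for s, p, _ in kg}
objects_of = {(p, o) for _, p, o in kg}

# Hypothetical criteria; a rule is a conjunction of some of them.
criteria = {
    "subject_known": lambda t: (t[0], t[1]) in subjects_of,
    "object_known": lambda t: (t[1], t[2]) in objects_of,
    "not_reflexive": lambda t: t[0] != t[2],
}


def apply_rule(names, triples):
    """Keep the triples that satisfy every criterion in the rule."""
    return [t for t in triples if all(criteria[n](t) for n in names)]


def fitness(names):
    """Assumed fitness: reward small candidate sets that keep held-out triples."""
    kept = apply_rule(names, candidates)
    recall = len(held_out & set(kept)) / len(held_out)
    reduction = 1 - len(kept) / len(candidates)
    return recall * reduction


# Exhaustive search over all non-empty criteria subsets (fine for three criteria).
rules = [c for r in range(1, len(criteria) + 1) for c in combinations(criteria, r)]
best_rule = max(rules, key=fitness)
kept = apply_rule(best_rule, candidates)
```

    On this toy graph the best rule shrinks the 93 unfiltered candidates to 5 while still keeping the held-out triple, which is the trade-off the abstract describes: a smaller candidate set that still includes promising triples.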

    Socializing One Health: an innovative strategy to investigate social and behavioral risks of emerging viral threats

    In an effort to strengthen global capacity to prevent, detect, and control infectious diseases in animals and people, the United States Agency for International Development’s (USAID) Emerging Pandemic Threats (EPT) PREDICT project funded development of regional, national, and local One Health capacities for early disease detection, rapid response, disease control, and risk reduction. From the outset, the EPT approach was inclusive of social science research methods designed to understand the contexts and behaviors of communities living and working at human-animal-environment interfaces considered high-risk for virus emergence. Using qualitative and quantitative approaches, PREDICT behavioral research aimed to identify and assess a range of socio-cultural behaviors that could be influential in zoonotic disease emergence, amplification, and transmission. This broad approach to behavioral risk characterization enabled us to identify and characterize human activities that could be linked to the transmission dynamics of new and emerging viruses. This paper provides a discussion of implementation of a social science approach within a zoonotic surveillance framework. We conducted in-depth ethnographic interviews and focus groups to better understand the individual- and community-level knowledge, attitudes, and practices that potentially put participants at risk for zoonotic disease transmission from the animals they live and work with, across 6 interface domains. When we asked highly exposed individuals (i.e., bushmeat hunters, wildlife or guano farmers) about the risk they perceived in their occupational activities, most did not perceive these activities to be risky, whether because they had been normalized by years (or generations) of practice, or due to lack of information about potential risks. Integrating the social sciences allows investigations of the specific human activities that are hypothesized to drive disease emergence, amplification, and transmission, in order to better substantiate behavioral disease drivers, along with the social dimensions of infection and transmission dynamics. Understanding these dynamics is critical to achieving health security, the protection from threats to health, which requires investments in both collective and individual health security. Integrating behavioral sciences into zoonotic disease surveillance allowed us to push toward fuller community integration and engagement and toward dialogue and implementation of recommendations for disease prevention and improved health security.

    Tracy: Tracing facts over knowledge graphs and text

    In order to accurately populate and curate Knowledge Graphs (KGs), it is important to distinguish ⟨s, p, o⟩ facts that can be traced back to sources from facts that cannot be verified. Manually validating each fact is time-consuming. Prior work on automating this task relied on numerical confidence scores, which might not be easily interpreted. To overcome this limitation, we present Tracy, a novel tool that generates human-comprehensible explanations for candidate facts. Our tool relies on background knowledge in the form of rules to rewrite the fact in question into other easier-to-spot facts. These rewritings are then used to reason over the candidate fact, creating semantic traces that can aid KG curators. The goal of our demonstration is to illustrate the main features of our system and to show how the semantic traces can be computed over both text and knowledge graphs with a simple and intuitive user interface.

    Previously Unrecorded Invasive Species and the Unsatisfying Knowledge of Turtle Communities in Northern Vietnam

    According to the IUCN, Southeast Asia is the area of the world with the highest number of threatened turtle species. The current status of chelonians is particularly catastrophic in Vietnam. However, there is still a lack of field data to unambiguously support this fact for a few species. To better understand the freshwater turtle diversity and eventually undertake efficient conservation actions, we conducted surveys with local fishers using standardized questionnaires in two independent river systems in northern Vietnam. A total of 112 questionnaires were administered to as many fishers in April and October 2022. We directly observed four sympatric freshwater species (Pelodiscus sinensis, Palea steindachneri, Mauremys sinensis and Sacalia quadriocellata) in Lao Cai and Yen Bai provinces, and two species (Pelodiscus sinensis and Palea steindachneri) in Bac Giang, Hai Duong, Thai Binh, and Hung Yen provinces. Based on the interviews, we recorded the possible presence of two other species (Rafetus swinhoei and Pelochelys cantorii) in each of the two study areas. Moreover, we recorded, for the first time in Vietnam, two wild individuals of an invasive alien species, the Common snapping turtle (Chelydra serpentina), confirming that the distribution and ecology of turtle species in Vietnam are poorly understood. Furthermore, recent photos (year 2019) of a 38 kg softshell turtle, possibly attributable to Rafetus swinhoei, were recorded from a restaurant in the area. In conclusion, interviews with local fishers were found to be useful for exploring the likely presence and the local distribution of the various turtle species.

    Characteristics of Hepatitis B Virus Genotype and Sub-Genotype in Hepatocellular Cancer Patients in Vietnam

    Untreated chronic hepatitis B virus (HBV) infection can lead to chronic liver disease and may progress to cirrhosis or hepatocellular carcinoma (HCC). HBV infection has been prevalent in Vietnam, but there is little information available on the genotypes, sub-genotypes, and mutations of HBV in patients with HBV-related HCC confirmed by histopathological diagnosis. We studied the molecular characteristics of HBV and its genetic variants in Vietnamese HCC patients after liver tumor resection. We conducted a descriptive cross-sectional study on 107 hospitalized patients with HBV-related HCC from October 2018 to April 2019. The specimens collected included EDTA-anticoagulated blood and liver tissues. Extracted HBV DNA was subjected to whole genome sequencing by the Sanger method. We found 62 patients (57.9%) with genotype B and 45 patients (42.1%) with genotype C, with only sub-genotypes B4 and C1. Among the mutations, the double mutation A1762T-G1764A had the highest frequency (73/107 samples; 68.2%) and was more frequent in genotype C than in genotype B (p < 0.001). The most common genotypes found in HCC patients in this investigation were B and C, with sub-genotypes B4 and C1, respectively. The prevalence of sub-genotype B4 was greater in HBV-infected Vietnamese HCC patients.

    Evaluation of GeneXpert MTB/RIF for Diagnosis of Tuberculous Meningitis

    Tuberculous meningitis (TBM) is the most severe form of tuberculosis. Microbiological confirmation is rare, and treatment is often delayed, increasing mortality and morbidity. The GeneXpert MTB/RIF test was evaluated in a large cohort of patients with suspected tuberculous meningitis. Three hundred seventy-nine patients presenting with suspected tuberculous meningitis to the Hospital for Tropical Diseases, Ho Chi Minh City, Vietnam, between 17 April 2011 and 31 December 2012 were included in the study. Cerebrospinal fluid samples were tested by Ziehl-Neelsen smear, mycobacterial growth indicator tube (MGIT) culture, and Xpert MTB/RIF. Rifampin (RIF) resistance results by Xpert were confirmed by an MTBDR-Plus line probe assay and all positive cultures were tested by phenotypic MGIT drug susceptibility testing. Overall, 182/379 included patients (48.0%) were diagnosed with tuberculous meningitis. Sensitivities of Xpert, smear, and MGIT culture among patients diagnosed with TBM were 59.3% (108/182 [95% confidence interval {CI}, 51.8 to 66.5%]), 78.6% (143/182 [95% CI, 71.9 to 84.3%]) and 66.5% (121/182 [95% CI, 59.1 to 73.3%]), respectively. There was one false-positive Xpert MTB/RIF test (99.5% specificity). Four cases of RIF resistance (4/109; 3.7%) were identified by Xpert, of which 3 were confirmed to be multidrug-resistant (MDR) TBM and one was culture negative. Xpert MTB/RIF is a rapid and specific test for the diagnosis of tuberculous meningitis. The addition of a vortexing step to sample processing increased sensitivity for confirmed TBM by 20% (P = 0.04). Meticulous examination of a smear from a large volume of cerebrospinal fluid (CSF) remains the most sensitive technique but is not practical in most laboratories. The Xpert MTB/RIF represents a significant advance in the early diagnosis of this devastating condition
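    The sensitivity and specificity figures in this abstract follow directly from the reported counts, which a few lines of arithmetic can confirm. The counts below are taken from the abstract; the one assumption is the specificity denominator, which is taken here to be the 379 − 182 = 197 patients not diagnosed with TBM, since the abstract does not state it explicitly.

```python
def percent(numerator, denominator):
    """Proportion expressed as a percentage, rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


total_patients = 379
tbm_cases = 182

# Positive results among patients diagnosed with TBM, per test (from the abstract).
positives = {"Xpert": 108, "smear": 143, "MGIT culture": 121}

# Sensitivity = positives among diagnosed cases / diagnosed cases.
sensitivity = {test: percent(n, tbm_cases) for test, n in positives.items()}

# Share of the cohort diagnosed with TBM.
tbm_share = percent(tbm_cases, total_patients)

# Specificity = true negatives / non-cases, with one false-positive Xpert result.
# Assumption: the denominator is every patient not diagnosed with TBM.
non_tbm = total_patients - tbm_cases
xpert_specificity = percent(non_tbm - 1, non_tbm)

# Rifampin resistance detected by Xpert: 4 of 109 results.
rif_resistance = percent(4, 109)
```

    Under that assumption the recomputed values agree with the abstract: 59.3%, 78.6%, and 66.5% sensitivity, 48.0% of the cohort diagnosed, 99.5% specificity, and 3.7% rifampin resistance.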