Automatic extension of corpora from the intelligent ensembling of eHealth knowledge discovery systems outputs
Corpora are one of the most valuable resources at present for building machine learning systems. However, building new corpora is an expensive task, which makes the automatic extension of corpora a highly attractive task to develop. Hence, finding new strategies that reduce the cost and effort involved in this task, while at the same time guaranteeing quality, remains an open and important challenge for the research community. In this paper, we present a set of ensembling strategies oriented toward entity and relation extraction tasks. The main goal is to combine several automatically annotated versions of corpora to produce a single version with improved quality. An ensembler is built by exploring a configuration space in search of the combination that maximizes the fitness of the ensembled collection according to a reference collection. The eHealth-KD 2019 challenge was chosen for the case study. The submitted systems’ outputs were ensembled, resulting in the construction of an automatically annotated collection of 8000 sentences. We show that using this collection as additional training input for a baseline algorithm has a positive impact on its performance. Additionally, the ensembling pipeline was used as a participant system in the 2020 edition of the challenge. The ensembled run achieved a slightly better performance than the individual runs. This research has been partially funded by the University of Alicante and the University of Havana, the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) and the Spanish Government through the projects LIVING-LANG (RTI2018-094653-B-C22) and SIIA (PROMETEO/2018/089, PROMETEU/2018/089). Moreover, it has been backed by the work of both COST Actions: CA19134 - “Distributed Knowledge Graphs” and CA19142 - “Leading Platform for European Citizens, Industries, Academia and Policymakers in Media Accessibility”.
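The ensembling idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not describe their configuration space, so this sketch assumes the simplest possible one, a single vote threshold, and uses F1 against a reference collection as the fitness. The function names (`ensemble`, `f1`, `best_threshold`) and the toy annotations are hypothetical.

```python
from collections import Counter

def ensemble(system_outputs, min_votes):
    """Keep every annotation proposed by at least `min_votes` systems."""
    votes = Counter(a for output in system_outputs for a in set(output))
    return {a for a, n in votes.items() if n >= min_votes}

def f1(predicted, reference):
    """F1 of a predicted annotation set against a reference set."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)

def best_threshold(system_outputs, reference):
    """Search the (assumed) configuration space: one vote threshold."""
    candidates = range(1, len(system_outputs) + 1)
    return max(candidates,
               key=lambda k: f1(ensemble(system_outputs, k), reference))

# Toy data: each system outputs a set of (text span, entity label) pairs.
systems = [
    {("asma", "Concept"), ("afecta", "Action"), ("pulmones", "Concept")},
    {("asma", "Concept"), ("afecta", "Action"), ("vías", "Concept")},
    {("asma", "Concept"), ("pulmones", "Concept")},
]
reference = {("asma", "Concept"), ("afecta", "Action"), ("pulmones", "Concept")}

k = best_threshold(systems, reference)
ensembled = ensemble(systems, k)
```

On this toy input the threshold that maximizes F1 is two votes, which drops the annotation proposed by only one system. A realistic configuration space would presumably also weight systems and treat entities and relations separately, but that goes beyond what the abstract states.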
Negative Statements Considered Useful
Knowledge bases (KBs), pragmatic collections of knowledge about notable entities, are an important asset in applications such as search, question answering and dialogue. Rooted in a long tradition in knowledge representation, all popular KBs only store positive information, while they abstain from taking any stance towards statements not contained in them. In this paper, we make the case for explicitly stating interesting statements which are not true. Negative statements would be important to overcome current limitations of question answering, yet due to their potential abundance, any effort towards compiling them needs a tight coupling with ranking. We introduce two approaches towards compiling negative statements. (i) In peer-based statistical inferences, we compare entities with highly related entities in order to derive potential negative statements, which we then rank using supervised and unsupervised features. (ii) In query-log-based text extraction, we use a pattern-based approach for harvesting search engine query logs. Experimental results show that both approaches hold promising and complementary potential. Along with this paper, we publish the first datasets on interesting negative information, containing over 1.1M statements for 100K popular Wikidata entities
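Approach (i) above, peer-based inference, can be sketched roughly as follows. This is an assumption-laden illustration rather than the paper's method: the abstract says candidates are ranked with supervised and unsupervised features, whereas this sketch ranks only by the fraction of peers that hold a statement. The function name `candidate_negatives` and the tiny knowledge base are hypothetical.

```python
from collections import Counter

def candidate_negatives(entity, peers, kb):
    """Statements held by peers but absent for `entity`, ranked by the
    fraction of peers holding them (an assumed stand-in ranking feature)."""
    if not peers:
        return []
    own = kb.get(entity, set())
    counts = Counter(
        stmt for peer in peers for stmt in kb.get(peer, set())
        if stmt not in own
    )
    # Sort by descending peer count, then by statement for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(stmt, n / len(peers)) for stmt, n in ranked]

# Toy knowledge base of (property, value) statements per entity.
kb = {
    "Stephen Hawking": {("occupation", "physicist"),
                        ("award", "Copley Medal")},
    "Albert Einstein": {("occupation", "physicist"),
                        ("award", "Nobel Prize in Physics"),
                        ("award", "Copley Medal")},
    "Max Planck": {("occupation", "physicist"),
                   ("award", "Nobel Prize in Physics"),
                   ("award", "Copley Medal")},
    "Richard Feynman": {("occupation", "physicist"),
                        ("award", "Nobel Prize in Physics")},
}

peers = ["Albert Einstein", "Max Planck", "Richard Feynman"]
negatives = candidate_negatives("Stephen Hawking", peers, kb)
```

Here the only candidate is the award that every peer holds and the target entity lacks, so it receives the top score. Because the knowledge base may simply be incomplete, such candidates are only potential negatives, which is why the abstract couples their compilation tightly with ranking.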
Protocol for a systematic scoping review of reasons given to justify the performance of randomised controlled trials.
Introduction: Randomised controlled trials (RCTs) are widely viewed to generate the most reliable medical knowledge. However, RCTs are not always scientifically necessary and therefore not always ethical. Unfortunately, it is not clear when an RCT is not necessary or how this should be established. This study seeks to systematically catalogue justifications offered throughout the medical and ethics literature for performing randomisation within clinical trials.
Methods and analysis: We will systematically search electronic databases of the medical literature including MEDLINE, EMBASE, Cochrane Database of Systematic Reviews, Cochrane Clinical Trials Register, Web of Science Proceedings, ClinicalTrials.gov; databases of philosophical literature including Philosopher's Index, Phil Papers, JSTOR, Periodicals Archive Online, Project MUSE, National Reference Centre for Bioethics; the library catalogue at the University of Ottawa; bibliographies of retrieved papers; and the grey literature. We will also pursue suggestions from experts in the fields of medical ethics, philosophy and clinical trial methodology. Article screening, selection and data extraction will be performed by two independent reviewers based on prespecified inclusion/exclusion criteria. A third reviewer will be consulted to resolve any discrepancies. We will then extract the reasons given to justify randomisation using methodology established to extract data in a defensible, systematic manner. We will track the reasons given, their frequency of use and changes over time. Finally, using grounded theory, we will combine the reasons into broader themes. These themes will form the foundation of our subsequent analysis from qualitative and quantitative perspectives. This review will map existing arguments that clinicians, ethicists and philosophers use to ethically justify randomisation in clinical trials.
Ethics and dissemination: No research ethics board approval is necessary because we are not examining patient-level data. This protocol complies with the reported guidance for conducting systematic scoping reviews. The findings of this paper will be disseminated via presentations and academic publication. In a subsequent phase of this research, we hope to engage with stakeholders and translate any recommendations derived from our findings into operational guidelines.
- …