High-Precision Biomedical Relation Extraction for Reducing Human Curation Efforts in Industrial Applications
The body of biomedical literature is growing at an unprecedented rate, exceeding researchers' ability to make effective use of this wealth of knowledge-rich information. This growth has created interest in biomedical relation extraction approaches that extract domain-specific knowledge for diverse applications. Despite great progress in these techniques, the retrieved evidence still needs to undergo a time-consuming manual curation process to be truly useful. Most relation extraction systems have been conceived in the context of shared tasks, with the goal of maximizing the F1 score on restricted, domain-specific test sets. In industrial applications, however, relations typically serve as input to a pipeline of biologically driven analyses; as a result, highly precise extractions are central to cutting down the manual curation effort and thus to translating research evidence into practice smoothly and reliably. In this paper, we present a highly precise relation extraction system designed to reduce human curation efforts. The engine is made up of sophisticated rules that leverage linguistic aspects of the texts rather than relying on application-specific training data; as a result, the system can be applied to diverse needs. Experiments on gold-standard corpora show that the system achieves the highest precision compared with previous rule-based, kernel-based, and neural approaches, while maintaining an F1 score comparable to or better than other methods. To show the usefulness of our approach in industrial scenarios, we finally present a case study on the mTOR pathway, showing how the system can be applied at a large scale.
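The abstract does not reproduce the rule engine itself. Purely as a hypothetical illustration of the general technique it names (high-precision pattern rules applied between recognized entities), a minimal Python sketch might look as follows; every entity name, trigger verb, and pattern below is invented for illustration and is not one of the authors' actual rules.

import re

# Toy gazetteer of protein/gene names (placeholder entities).
ENTITIES = {"mTOR", "AKT1", "RHEB", "TSC2"}

# A high-precision surface rule: ENTITY <trigger verb> ENTITY.
TRIGGERS = r"(activates|inhibits|phosphorylates|binds)"
PATTERN = re.compile(r"\b(?P<e1>\w+)\s+" + TRIGGERS + r"\s+(?P<e2>\w+)\b")

def extract_relations(sentence: str):
    """Return (entity1, trigger, entity2) triples matched by the rule."""
    triples = []
    for m in PATTERN.finditer(sentence):
        e1, trigger, e2 = m.group("e1"), m.group(2), m.group("e2")
        # Precision filter: both arguments must be known entities.
        if e1 in ENTITIES and e2 in ENTITIES:
            triples.append((e1, trigger, e2))
    return triples

print(extract_relations("RHEB activates mTOR in nutrient-rich conditions."))
# [('RHEB', 'activates', 'mTOR')]

Restricting matches to a curated entity list is one way such rules trade recall for the high precision the paper emphasizes.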
Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora
We address the creation of cross-lingual textual entailment corpora by means of crowdsourcing. Our goal is to define a cheap and replicable data collection methodology that minimizes the manual work done by expert annotators, without resorting to preprocessing tools or already annotated monolingual datasets. In line with recent work emphasizing the need for large-scale annotation efforts for textual entailment, our work aims to: i) tackle the scarcity of data available to train and evaluate systems, and ii) promote recourse to crowdsourcing as an effective way to reduce the costs of data collection without sacrificing quality. We show that a complex data creation task, on which even experts typically achieve low agreement scores, can be effectively decomposed into simple subtasks assigned to non-expert annotators. The resulting dataset, obtained from a pipeline of different jobs routed to Amazon Mechanical Turk, contains more than 1,600 aligned pairs for each text-hypothesis language combination in English, Italian, and German.
QSPcc reduces bottlenecks in computational model simulations
Mathematical models have grown in size and complexity, often becoming computationally intractable. During sensitivity analysis and optimization phases, which are critical for tuning, validation, and qualification, these models may be run thousands of times. Scientific programming languages popular for prototyping, such as MATLAB and R, can be a bottleneck in terms of performance. Here we present a compiler-based approach, designed to handle both engineering and life-sciences modeling styles, that automatically translates models into fast C code. We first demonstrate that QSPcc is crucial in enabling research on otherwise intractable Quantitative Systems Pharmacology models, such as those for rare Lysosomal Storage Disorders. To demonstrate its full value in seamlessly accelerating, or enabling, R&D efforts in the natural sciences, we then benchmark QSPcc against 8 solutions on 24 real-world projects from different scientific fields. With speed-ups of 22,000x at peak and 1,605x on arithmetic mean, our results show consistently superior performance.

Lombardo and colleagues present QSPcc, a compiler designed to convert code written in popular scientific programming languages, such as MATLAB or R, into fast-running C code. This reduces the computational load of complex modelling approaches and spares users the investment of learning additional complex languages.
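The abstract does not detail QSPcc's internals. As a rough, hypothetical sketch of the general idea of source-to-source translation, the Python snippet below emits a C loop for an elementwise update of the kind that dominates ODE solvers; the function name and the MATLAB statement it mimics are illustrative assumptions, not part of the actual compiler.

# Hypothetical sketch: an elementwise MATLAB statement such as
#     y = y + h * f(t, y)
# can be emitted as a plain C loop, avoiding per-statement
# interpreter overhead.
def emit_axpy_loop(n: str, y: str, h: str, dydt: str) -> str:
    """Emit C code for y[i] += h * dydt[i], i = 0..n-1."""
    return (
        f"for (int i = 0; i < {n}; ++i) {{\n"
        f"    {y}[i] += {h} * {dydt}[i];\n"
        f"}}\n"
    )

print(emit_axpy_loop("n", "y", "h", "dydt"))

Compiling such loops to C removes the interpreter from the innermost iterations, which is one plausible source of the large speed-ups the paper reports.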
Seismological constraints for the dyke emplacement of the July-August 2001 lateral eruption at Mt. Etna volcano, Italy
In this paper we report seismological evidence regarding the emplacement of the dike that fed the July 18 - August 9, 2001 lateral eruption at Mt. Etna volcano. The shallow intrusion and the opening of the eruptive fracture system, which mostly occurred between July 12 and July 18, were accompanied by one of the most intense seismic swarms of the last 20 years. A total of 2694 earthquakes (1 ≤ Md ≤ 3.9) were recorded from the beginning of the swarm (July 12) to the end of the eruption (August 9). The seismicity tracks the upward migration of the dike from the basement into the relatively thin volcanic pile: a clear hypocentral migration was observed, tightly constraining the upward propagation of a near-vertical dike, oriented roughly N-S and located a few kilometers south of the summit region. The earthquake distribution and the orientation of the P-axes from focal mechanisms indicate that the swarm was caused by the local stress source related to the dike intrusion.