108 research outputs found
Ontology Population for Open-Source Intelligence
We present an approach based on GATE (General Architecture for Text Engineering) for the automatic population of ontologies from text documents. We describe some experimental results, which are encouraging in terms of the number of correctly extracted ontology instances. We then focus on one phase of our pipeline and discuss a variant of it, which aims at reducing the manual effort needed to generate the predefined dictionaries used in document annotation. Our additional experiments show promising results in this case as well.
Ontology population for open-source intelligence: A GATE-based solution
Open-Source INTelligence is intelligence based on publicly available sources such as news sites, blogs, forums, etc. The Web is the primary source of information, but once data are crawled, they need to be interpreted and structured. Ontologies may play a crucial role in this process, but because of the vast amount of documents available, automatic mechanisms for their population are needed, starting from the crawled text. This paper presents an approach for the automatic population of predefined ontologies with data extracted from text and discusses the design and realization of a pipeline based on the General Architecture for Text Engineering system, which is interesting for both researchers and practitioners in the field. Some experimental results that are encouraging in terms of correctly extracted ontology instances are also reported. Furthermore, the paper describes an alternative approach and provides additional experiments for one of the phases of our pipeline, which requires the use of predefined dictionaries for relevant entities. Through this variant, the manual workload required in this phase was reduced while still obtaining promising results.
Ontology-based Document Spanning Systems for Information Extraction
Information Extraction (IE) is the task of automatically organizing data extracted from free-text documents into a structured form. In several contexts, it is often desirable that the extracted data are then organized according to an ontology, which provides a formal and conceptual representation of the domain of interest. Ontologies allow for better interpretation of the data, as well as for its semantic integration with other information, as in Ontology-based Data Access (OBDA), a popular declarative framework for data management in which an ontology is connected to a data layer through mappings. However, the data layer considered so far in OBDA has consisted essentially of relational databases, and how to declaratively couple an ontology with unstructured data sources is still unexplored.
By leveraging the recent study on document spanners for rule-based IE by Fagin et al., in this paper we propose a new framework that allows text documents to be mapped to ontologies, in the spirit of OBDA. We investigate the problem of answering conjunctive queries in this framework. For ontologies specified in the Description Logics DL-LiteR and DL-LiteF, we show that the problem is polynomial in the size of the underlying documents. We also provide algorithms that solve query answering by rewriting the input query on the basis of the ontology and its mapping towards the source documents. Through these techniques we pursue a virtual approach, similar to that typically adopted in OBDA, which allows us to answer a query without having to first populate the entire ontology. Interestingly, for DL-LiteR both the spanners used in the mapping and the one computed by the rewriting algorithm belong to the same expressiveness class. This also holds for DL-LiteF, modulo some limitations on the form of the mapping. These results show that in these cases our framework can be easily implemented by decoupling ontology management and document access, the latter of which can be delegated to an external IE system able to compute the extraction rules used in the mapping.
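The mapping idea described above can be illustrated with a minimal sketch (this is an illustration of the general spanner-to-ontology approach, not the authors' implementation; the document text, the extraction rule, and the relation name `worksFor` are invented for the example): a regex-based "spanner" extracts spans from a document, the mapping turns them into ontology assertions, and a conjunctive query is answered over the extracted facts.

```python
# Illustrative sketch: regex capture groups play the role of a document
# spanner, and the mapping populates a binary ontology relation.
import re

document = "Alice works for Acme. Bob works for Initech."

# Extraction rule: a regex "spanner" with two capture groups.
rule = re.compile(r"(\w+) works for (\w+)")

# Mapping: each match yields an assertion worksFor(person, org).
works_for = {(m.group(1), m.group(2)) for m in rule.finditer(document)}

# Conjunctive query q(x) :- worksFor(x, "Acme"), answered over the facts.
answers = {p for (p, org) in works_for if org == "Acme"}
print(sorted(answers))  # ['Alice']
```

In the virtual approach the paper pursues, such a query would instead be rewritten against the extraction rules so that only the relevant spans are materialized, rather than populating the whole ontology first.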
Coupling ontologies with document spanners
[No abstract available]
A Framework for Combining Entity Resolution and Query Answering in Knowledge Bases
We propose a new framework for combining entity resolution and query answering in knowledge bases (KBs) with tuple-generating dependencies (tgds) and equality-generating dependencies (egds) as rules. We define the semantics of the KB in terms of special instances that involve equivalence classes of entities and sets of values. Intuitively, the former collect all entities denoting the same real-world object, while the latter collect all alternative values for an attribute. This approach allows us to both resolve entities and bypass possible inconsistencies in the data. We then design a chase procedure that is tailored to this new framework and has the feature that it never fails; moreover, when the chase procedure terminates, it produces a universal solution, which in turn can be used to obtain the certain answers to conjunctive queries. We finally discuss challenges arising when the chase does not terminate.
Reproducibility, accuracy and concordance of Accutrend® Plus for measuring circulating lipid concentration in adults
Introduction: The determination of lipid biomarkers by capillary sampling may be useful in the screening, diagnosis and/or personal management of hyperlipidemia and cardiovascular risk. It remains unclear whether the use of the Accutrend® Plus system is appropriate. This study aimed to assess its reproducibility, accuracy and concordance for blood lipid profiling in adults.
Materials and methods: Fasting capillary total cholesterol (TC) and triglyceride (TG) concentrations on Accutrend® Plus were compared with their venous analogues obtained by a laboratory reference method in sixty-one adults (27 men and 34 women, aged 33.0 years). Supplementary capillary sampling was performed on two consecutive days, taking macronutrient intake into account.
Results: The day-to-day reproducibility of the Accutrend® Plus system proved to be high for TC (ICC = 0.85, P < 0.001), but moderate for TG (ICC = 0.68, P < 0.001). Strong correlations (r ≥ 0.80, P < 0.001) with the reference method were found for TC and TG. Mean differences (limits of agreement) were: 0.26 mmol/L (-0.95, 1.47) for TC, and -0.16 mmol/L (-1.29, 0.98) for TG. The concordance for subject classification according to the National Cholesterol Education Program (NCEP) guidelines was significant (P < 0.001), with substantial agreement for TC (κw = 0.67) and moderate agreement for TG (κw = 0.50).
Conclusions: The day-to-day reproducibility of the Accutrend® Plus device for TC and TG is not optimal, and the device lacks accuracy when compared with the reference laboratory method. The concordance between the two methods for classifying subjects according to the NCEP guidelines is inadequate. The Accutrend® Plus device should not be used interchangeably as a substitute for standard laboratory methods in the diagnosis of hyperlipidemia.
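The "mean difference (limits of agreement)" figures reported in the study follow the standard Bland-Altman construction, which can be sketched as follows (the paired measurements below are made-up illustrative numbers, not the study data): the limits of agreement are the mean of the paired differences plus or minus 1.96 times their standard deviation.

```python
# Bland-Altman limits of agreement: LoA = mean(d) +/- 1.96 * SD(d),
# where d are the differences between paired measurements.
import statistics

capillary = [4.9, 5.3, 6.1, 4.2, 5.9]  # invented example values, mmol/L
venous    = [4.6, 5.1, 5.7, 4.1, 5.5]

diffs = [c - v for c, v in zip(capillary, venous)]
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)          # sample standard deviation
loa = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
print(round(mean_diff, 2), tuple(round(x, 2) for x in loa))
```

Roughly 95% of paired differences are expected to fall within these limits, which is why wide limits (as reported here for TC and TG) indicate poor agreement between the two methods.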
Automatic Information Extraction from Investment Product Documents
In this paper we report on the activities carried out within a collaboration between Consob and Sapienza University. The project focuses on Information Extraction from documents describing financial investment products. We discuss how we automate this task via both rule-based and machine learning-based methods, and describe the performance of our approach.
The validity of ultrasound-derived equation models to predict whole-body muscle mass:A systematic review
Background & aims: Sarcopenia is defined as the age-related loss of muscle quantity and quality, which is associated with physical disability. The assessment of muscle quantity plays a role in the diagnosis of sarcopenia. However, the methods used for this assessment have many disadvantages in daily practice and research, such as high costs, exposure to radiation, lack of portability, or doubtful reliability. Ultrasound has been suggested for estimating muscle quantity by estimating muscle mass, using a prediction equation based on muscle thickness. In this systematic review, we aimed to summarize the available evidence on existing prediction equations to estimate muscle mass and to assess whether these are applicable in various adult populations. Methods: The databases PubMed, PsycINFO, and Web of Science were used to search for studies predicting total or appendicular muscle mass using ultrasound. The methodological quality of the included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies, version 2 (QUADAS-2) and the quality assessment checklist (QA) designed by Pretorius and Keating (2008). Results: Twelve studies were included in this systematic review. The participants were between 18 and 79 years old. Magnetic Resonance Imaging and dual-energy X-ray absorptiometry were used as reference methods. The studies generally had low risk of bias, and there were low concerns regarding applicability (QUADAS-2). Nine out of eleven studies reached high quality on the QA. All equations were developed in healthy adults. Conclusions: The ultrasound-derived equations in the included articles are valid and applicable in a healthy population. For a Caucasian population we recommend using the equation of Abe et al., 2015; for an Asian population, the equation of Abe et al., 2018; and for a South American population, the equation of Barbosa-Silva et al., 2021 is the most appropriate.