    Ontology Population for Open-Source Intelligence

    We present an approach based on GATE (General Architecture for Text Engineering) for the automatic population of ontologies from text documents. We describe experimental results, which are encouraging in terms of the number of correct ontology instances extracted. We then focus on one phase of our pipeline and discuss a variant of it that aims to reduce the manual effort needed to generate the pre-defined dictionaries used in document annotation. Our additional experiments show promising results in this case as well.
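
    As a rough illustration of the dictionary-based annotation phase mentioned above, the sketch below shows how a gazetteer lookup can produce typed annotations over raw text. The dictionary entries and entity types are invented, and GATE's actual gazetteer component is considerably richer; this is only a minimal analogue.

```python
# Minimal gazetteer-style annotator: dictionary lookup over raw text.
# Entries and entity types are hypothetical, for illustration only.
import re

GAZETTEER = {
    "Person": ["John Smith", "Jane Doe"],
    "Organisation": ["Acme Corp", "Interpol"],
}

def annotate(text):
    """Return (start, end, entity_type, surface_form) tuples for every
    dictionary entry found in the text."""
    annotations = []
    for entity_type, entries in GAZETTEER.items():
        for entry in entries:
            for m in re.finditer(re.escape(entry), text):
                annotations.append((m.start(), m.end(), entity_type, entry))
    return sorted(annotations)

print(annotate("Jane Doe contacted Acme Corp and Interpol."))
```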

    Ontology population for open-source intelligence: A GATE-based solution

    Open-Source INTelligence (OSINT) is intelligence based on publicly available sources such as news sites, blogs, and forums. The Web is the primary source of such information, but once data are crawled, they need to be interpreted and structured. Ontologies may play a crucial role in this process, but given the vast number of documents available, automatic mechanisms for populating them from the crawled text are needed. This paper presents an approach for the automatic population of predefined ontologies with data extracted from text and discusses the design and realization of a pipeline based on the General Architecture for Text Engineering (GATE) system, which is of interest to both researchers and practitioners in the field. Experimental results, encouraging in terms of the number of correct ontology instances extracted, are also reported. Furthermore, the paper describes an alternative approach, with additional experiments, for one phase of our pipeline that requires predefined dictionaries of relevant entities. Through this variant, the manual workload required in this phase was reduced while still obtaining promising results.
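
    The population step itself can be pictured as turning typed annotations into ontology individuals. The following sketch uses rdflib with a hypothetical namespace and class names; it is not the pipeline described in the paper, only a minimal analogue of its output stage.

```python
# Hypothetical population step: typed annotations become individuals of
# the corresponding ontology class. Namespace and classes are invented.
from rdflib import Graph, Namespace, RDF, RDFS, Literal, URIRef

EX = Namespace("http://example.org/osint#")

def populate(annotations):
    g = Graph()
    g.bind("ex", EX)
    for entity_type, surface_form in annotations:
        individual = URIRef(EX + surface_form.replace(" ", "_"))
        g.add((individual, RDF.type, EX[entity_type]))
        g.add((individual, RDFS.label, Literal(surface_form)))
    return g

g = populate([("Person", "Jane Doe"), ("Organisation", "Acme Corp")])
print(g.serialize(format="turtle"))
```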

    Ontology-based Document Spanning Systems for Information Extraction

    Information Extraction (IE) is the task of automatically organizing data extracted from free-text documents into a structured form. In several contexts, it is desirable that the extracted data then be organized according to an ontology, which provides a formal and conceptual representation of the domain of interest. Ontologies allow for better data interpretation, as well as for semantic integration with other information, as in Ontology-based Data Access (OBDA), a popular declarative framework for data management in which an ontology is connected to a data layer through mappings. However, the data layer considered so far in OBDA has consisted essentially of relational databases, and how to declaratively couple an ontology with unstructured data sources remains unexplored. Leveraging the recent study of document spanners for rule-based IE by Fagin et al., in this paper we propose a new framework that makes it possible to map text documents to ontologies, in the spirit of OBDA. We investigate the problem of answering conjunctive queries in this framework. For ontologies specified in the Description Logics DL-Lite_R and DL-Lite_F, we show that the problem is polynomial in the size of the underlying documents. We also provide algorithms that solve query answering by rewriting the input query on the basis of the ontology and its mapping towards the source documents. Through these techniques we pursue a virtual approach, similar to that typically adopted in OBDA, which allows us to answer a query without first populating the entire ontology. Interestingly, for DL-Lite_R both the spanners used in the mapping and the one computed by the rewriting algorithm belong to the same expressiveness class; this also holds for DL-Lite_F, modulo some limitations on the form of the mapping. These results show that in these cases our framework can be implemented by decoupling ontology management from document access, with the latter delegated to an external IE system able to compute the extraction rules we use in the mapping.
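
    To give a concrete feel for the spanner formalism the framework builds on, the toy example below evaluates a regex spanner with named capture variables and maps each extracted tuple to an ontology assertion. The pattern, document, and predicate name are all hypothetical, not taken from the paper.

```python
# A toy regex spanner in the spirit of Fagin et al.'s document spanners:
# a regex with named capture groups maps a document to a relation of
# (span, string) pairs, one column per capture variable.
import re

def spanner(pattern, doc):
    """Evaluate a regex spanner over a document."""
    rel = []
    for m in re.finditer(pattern, doc):
        rel.append({v: (m.span(v), m.group(v)) for v in m.re.groupindex})
    return rel

doc = "Alice works_for TechCorp. Bob works_for DataInc."
rows = spanner(r"(?P<emp>\w+) works_for (?P<org>\w+)", doc)

# A mapping in the spirit of OBDA: each tuple asserts worksFor(emp, org).
for row in rows:
    print(f"worksFor({row['emp'][1]}, {row['org'][1]})")
```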

    A Framework for Combining Entity Resolution and Query Answering in Knowledge Bases

    We propose a new framework for combining entity resolution and query answering in knowledge bases (KBs) with tuple-generating dependencies (tgds) and equality-generating dependencies (egds) as rules. We define the semantics of the KB in terms of special instances that involve equivalence classes of entities and sets of values. Intuitively, the former collect all entities denoting the same real-world object, while the latter collect all alternative values for an attribute. This approach allows us both to resolve entities and to bypass possible inconsistencies in the data. We then design a chase procedure that is tailored to this new framework and has the feature that it never fails; moreover, when the chase procedure terminates, it produces a universal solution, which can in turn be used to obtain the certain answers to conjunctive queries. We finally discuss the challenges that arise when the chase does not terminate.
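
    The "never fails" behaviour can be pictured with a minimal sketch: where a classical chase aborts when an egd equates two distinct constants, entities are instead merged into one equivalence class. The egd (equal emails imply the same person) and the data below are invented for illustration.

```python
# Union-find over entity ids: merging instead of failing when an egd
# equates two distinct constants. Facts and the egd are hypothetical.
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

# Facts: person(id, email). Egd: equal emails imply equal ids.
facts = [("p1", "a@x.org"), ("p2", "a@x.org"), ("p3", "b@x.org")]

uf, seen = UnionFind(), {}
for pid, email in facts:
    if email in seen:
        uf.union(pid, seen[email])  # merge the two entities, never fail
    else:
        seen[email] = pid

# p1 and p2 collapse into one class: the same real-world object.
print({pid: uf.find(pid) for pid, _ in facts})
```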

    Reproducibility, accuracy and concordance of Accutrend® Plus for measuring circulating lipid concentration in adults

    Introduction: The determination of lipid biomarkers by capillary sampling may be useful in the screening, diagnosis and/or personal management of hyperlipidemia and cardiovascular risk. It remains unclear whether the use of the Accutrend® Plus system is appropriate for this purpose. This study aimed to assess its reproducibility, accuracy and concordance for blood lipid profiling in adults.
    Materials and methods: Fasting capillary total cholesterol (TC) and triglyceride (TG) concentrations measured on the Accutrend® Plus were compared with their venous analogues obtained by a laboratory reference method in sixty-one adults (27 men and 34 women, mean age 33.0 years). Supplementary capillary sampling was performed on two consecutive days, taking macronutrient intake into account.
    Results: The day-to-day reproducibility of the Accutrend® Plus system proved to be high for TC (ICC = 0.85, P < 0.001) but moderate for TG (ICC = 0.68, P < 0.001). Strong correlations (r ≥ 0.80, P < 0.001) with the reference method were found for both TC and TG. Mean differences (limits of agreement) were 0.26 mmol/L (-0.95, 1.47) for TC and -0.16 mmol/L (-1.29, 0.98) for TG. The concordance for subject classification according to the National Cholesterol Education Program (NCEP) guidelines was significant (P < 0.001), with substantial agreement for TC (κ_w = 0.67) and moderate agreement for TG (κ_w = 0.50).
    Conclusions: The day-to-day reproducibility of the Accutrend® Plus device for TC and TG is not optimal, and the device lacks accuracy when compared to the reference laboratory method. The concordance between the two methods for classifying subjects according to the NCEP guidelines is inadequate. The Accutrend® Plus device should not be used interchangeably with standard laboratory methods in the diagnosis of hyperlipidemia.
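
    For readers unfamiliar with the agreement statistics quoted above, the sketch below computes a mean difference and Bland-Altman 95% limits of agreement from paired measurements. The sample values are invented; they do not reproduce the study's data.

```python
# Bland-Altman agreement: mean difference and 95% limits of agreement
# between two methods. The paired values below are invented.
import statistics

capillary = [4.8, 5.6, 6.1, 5.0, 6.8]  # hypothetical TC, mmol/L
venous    = [4.5, 5.3, 6.0, 4.7, 6.3]

diffs = [c - v for c, v in zip(capillary, venous)]
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)

# 95% limits of agreement: mean difference +/- 1.96 * SD of differences
lower, upper = mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff
print(f"mean difference {mean_diff:.2f} mmol/L, LoA ({lower:.2f}, {upper:.2f})")
```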

    Automatic Information Extraction from Investment Product Documents

    In this paper we report on the activities carried out within a collaboration between Consob and Sapienza University. The project focuses on Information Extraction from documents describing financial investment products. We discuss how we automate this task, via both rule-based and machine learning-based methods, and describe the performance of our approach.
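
    As an illustration of what a single rule in such a rule-based extractor might look like, the sketch below pulls ISIN identifiers (a field that investment product documents typically carry) out of free text. The regex is a simplification and is not taken from the paper; a real extractor would also verify the ISIN check digit.

```python
# Hypothetical rule-based extraction of one field: ISIN identifiers.
# Matches the ISIN shape (2 letters, 9 alphanumerics, 1 check digit)
# without validating the checksum.
import re

ISIN_PATTERN = re.compile(r"\b[A-Z]{2}[A-Z0-9]{9}[0-9]\b")

def extract_isins(text):
    """Return all ISIN-shaped tokens found in a document."""
    return ISIN_PATTERN.findall(text)

doc = "The fund (ISIN IT0003132476) invests in euro-area equities."
print(extract_isins(doc))  # ['IT0003132476']
```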

    The validity of ultrasound-derived equation models to predict whole-body muscle mass:A systematic review

    Background & aims: Sarcopenia is defined as the age-related loss of muscle quantity and quality, which is associated with physical disability. The assessment of muscle quantity plays a role in the diagnosis of sarcopenia. However, the methods used for this assessment have many disadvantages in daily practice and research, such as high cost, exposure to radiation, lack of portability, or doubtful reliability. Ultrasound has been suggested for estimating muscle quantity, using a prediction equation based on muscle thickness. In this systematic review, we aimed to summarize the available evidence on existing prediction equations for estimating muscle mass and to assess whether they are applicable in various adult populations.
    Methods: The databases PubMed, PsycINFO, and Web of Science were searched for studies predicting total or appendicular muscle mass using ultrasound. The methodological quality of the included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies, version 2 (QUADAS-2) and the quality assessment checklist (QA) designed by Pretorius and Keating (2008).
    Results: Twelve studies were included in this systematic review. The participants were between 18 and 79 years old. Magnetic resonance imaging and dual-energy X-ray absorptiometry were used as reference methods. The studies generally had a low risk of bias, and there were low concerns regarding applicability (QUADAS-2). Nine out of eleven studies reached high quality on the QA. All equations were developed in healthy adults.
    Conclusions: The ultrasound-derived equations in the included articles are valid and applicable in a healthy population. For a Caucasian population we recommend the equation of Abe et al. (2015); for an Asian population, the equation of Abe et al. (2018); and for a South American population, the equation of Barbosa-Silva et al. (2021).
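
    Such prediction equations generally share a simple linear form: estimated muscle mass as a function of ultrasound muscle thickness multiplied by height. The sketch below shows that general shape only; the coefficients are placeholders, not the published values of Abe et al. or Barbosa-Silva et al., and must not be used for any real estimate.

```python
# General shape of an ultrasound-derived prediction equation:
# muscle mass (kg) estimated linearly from muscle thickness x height.
# Coefficients a and b are PLACEHOLDERS, not published values.
def predict_muscle_mass(muscle_thickness_cm, height_m, a=0.5, b=1.0):
    """Estimated muscle mass (kg) = a * (MT x height) + b."""
    return a * (muscle_thickness_cm * height_m) + b

print(predict_muscle_mass(muscle_thickness_cm=3.2, height_m=1.75))
```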