Normalisation of imprecise temporal expressions extracted from text
Information extraction systems and techniques are widely used to deal with the increasing amount of unstructured data available nowadays. Time is among the different kinds of information that may be extracted from such unstructured data sources, including text documents. However, the inability to correctly identify and extract temporal information from text makes it difficult to understand how the extracted events are organised in chronological order. Furthermore, in many situations the meaning of temporal expressions (timexes) is imprecise, as in “less than 2 years” and “several weeks”, and cannot be accurately normalised, leading to interpretation errors. Although some approaches can represent imprecise timexes, they are designed for specific scenarios and are difficult to generalise. This paper presents a novel methodology for analysing and normalising imprecise temporal expressions by representing temporal imprecision in the form of membership functions, based on human interpretation of time in two different languages (Portuguese and English). Each resulting model is a generalisation of probability distributions in the form of trapezoidal and hexagonal fuzzy membership functions. We use an adapted F1-score to guide the choice of the best model for each kind of imprecise timex, and a weighted F1-score (F1-3D) as a complementary metric to identify relevant differences when comparing two normalisation models. We apply the proposed methodology to three distinct classes of imprecise timexes, and the resulting models give distinct insights into the way each kind of temporal expression is interpreted.
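To illustrate the idea of normalising an imprecise timex with a trapezoidal fuzzy membership function, here is a minimal sketch. The breakpoints used for "several weeks" are hypothetical, not the fitted values from the paper:

```python
# Illustrative sketch (not the paper's fitted models): a trapezoidal fuzzy
# membership function for an imprecise timex such as "several weeks".
# The breakpoints a, b, c, d below are hypothetical assumptions.

def trapezoidal(x, a, b, c, d):
    """Membership degree of x: rises on [a, b], is 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)  # c < x < d

# "several weeks" interpreted (hypothetically) as roughly 2-8 weeks,
# fully plausible between 3 and 6 weeks.
def several_weeks(w):
    return trapezoidal(w, 2, 3, 6, 8)

print(several_weeks(4))    # 1.0  (fully compatible)
print(several_weeks(2.5))  # 0.5  (partially compatible)
print(several_weeks(10))   # 0.0  (incompatible)
```

A hexagonal membership function, as in the paper, generalises this shape by allowing two intermediate plateau levels instead of one.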
On Range Searching with Semialgebraic Sets II
Let P be a set of n points in R^d. We present a linear-size data
structure for answering range queries on P with constant-complexity
semialgebraic sets as ranges, in time close to O(n^{1-1/d}). It essentially
matches the performance of similar structures for simplex range searching, and,
for d >= 5, significantly improves earlier solutions by the first two authors
obtained in 1994. This almost settles a long-standing open problem in range
searching.
The data structure is based on the polynomial-partitioning technique of Guth
and Katz [arXiv:1011.4105], which shows that for a parameter r, 1 < r <= n, there exists a d-variate polynomial f of degree O(r^{1/d}) such that
each connected component of R^d \ Z(f) contains at most n/r points
of P, where Z(f) is the zero set of f. We present an efficient randomized
algorithm for computing such a polynomial partition, which is of independent
interest and is likely to have additional applications.
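As a much simpler analogue of partition-based range searching, the sketch below builds a kd-tree over 2-D points and answers axis-aligned rectangle counting queries by pruning subtrees that cannot intersect the range. This is a drastic simplification (rectangles instead of semialgebraic ranges, median splits instead of polynomial partitions), intended only to convey the divide-and-prune idea:

```python
# Toy analogue of partition-based range searching: a kd-tree over 2-D points
# with axis-aligned rectangle ranges (rect = (xlo, xhi, ylo, yhi)).
# Not the paper's structure; median splits stand in for polynomial partitions.

def build(points, depth=0):
    """Recursively partition points by the median along alternating axes."""
    if len(points) <= 1:
        return {"leaf": points}
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"axis": axis, "split": points[mid][axis],
            "lo": build(points[:mid], depth + 1),   # coords <= split
            "hi": build(points[mid:], depth + 1)}   # coords >= split

def query(node, rect):
    """Count points inside rect, visiting only subtrees the range can reach."""
    if "leaf" in node:
        return sum(1 for (x, y) in node["leaf"]
                   if rect[0] <= x <= rect[1] and rect[2] <= y <= rect[3])
    axis, split = node["axis"], node["split"]
    total = 0
    if rect[2 * axis] <= split:       # range reaches the low side
        total += query(node["lo"], rect)
    if rect[2 * axis + 1] >= split:   # range reaches the high side
        total += query(node["hi"], rect)
    return total

points = [(1, 1), (2, 5), (3, 3), (5, 2), (4, 4), (6, 6)]
tree = build(points)
print(query(tree, (2, 5, 1, 4)))  # → 3
```

The actual data structure replaces the median hyperplanes with the zero set of a partitioning polynomial, so that any constant-complexity semialgebraic range crosses only a few cells.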
Data reliability assessment in a data warehouse opened on the Web
This paper presents an ontology-driven workflow that feeds and queries a data warehouse opened on the Web. Data are extracted from data tables in Web documents. As Web documents are highly heterogeneous in nature, a key issue in this workflow is the ability to assess the reliability of the retrieved data. We first recall the main steps of our method to annotate and query Web data tables driven by a domain ontology. Then we propose an original method to assess Web data table reliability from a set of criteria by means of evidence theory. Finally, we show how we extend the workflow to integrate the reliability assessment step.
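To make the evidence-theory step concrete, here is a minimal sketch of Dempster's rule of combination applied to two hypothetical reliability criteria. The frame of discernment and all mass values are illustrative assumptions, not figures from the paper:

```python
# Sketch: combining two pieces of evidence about a Web data table's
# reliability with Dempster's rule. Frame of discernment:
# {"R"} = reliable, {"N"} = not reliable, {"R","N"} = ignorance.
# All mass values below are hypothetical.
from itertools import product

def combine(m1, m2):
    """Dempster's rule: m(A) is proportional to the sum of m1(B)*m2(C)
    over all pairs with B ∩ C = A; conflicting mass is normalised away."""
    combined, conflict = {}, 0.0
    for (b, vb), (c, vc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + vb * vc
        else:
            conflict += vb * vc
    norm = 1.0 - conflict
    return {k: v / norm for k, v in combined.items()}

# Criterion 1 (e.g. source authority) and criterion 2 (e.g. table freshness).
m1 = {frozenset("R"): 0.6, frozenset("N"): 0.1, frozenset("RN"): 0.3}
m2 = {frozenset("R"): 0.5, frozenset("N"): 0.2, frozenset("RN"): 0.3}
m = combine(m1, m2)
print(m)  # combined belief; mass on {"R"} grows as the criteria agree
```

The combined mass on {"R"} exceeds either input mass because the two criteria corroborate each other, which is the behaviour a reliability score built on belief functions exploits.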
Ontology-based knowledge representation and semantic search information retrieval: case study of the underutilized crops domain
The aim of using semantic technologies in domain knowledge modeling is to introduce the semantic meaning of concepts into knowledge bases, such that they are both human-readable and machine-understandable. Due to their powerful knowledge representation formalism and associated inference mechanisms, ontology-based approaches have been increasingly adopted to formally represent domain knowledge. The primary objective of this thesis has been to use semantic technologies to advance knowledge sharing in the underutilized crops domain and to investigate the integration of the underlying ontologies, developed in OWL (Web Ontology Language), with augmented SWRL (Semantic Web Rule Language) rules for added expressiveness.
The work further investigated generating ontologies from existing data sources and proposed a reverse-engineering approach to generating domain-specific conceptualizations through competency questions posed by prospective ontology users and domain experts. For utilization, a semantic search engine (Onto-CropBase) has been developed to serve as a Web-based access point for the underutilized crops ontology model. Relevant linked data in RDF Schema (RDFS) were added for comprehensiveness in generating federated queries.
While the OWL/SWRL combination offers a highly expressive ontology language for modeling knowledge domains, it lacks supplementary descriptive constructs for modeling complex real-life scenarios, a necessary requirement for a successful Semantic Web application. To this end, the common logic-programming formalisms for extending Description Logic (DL)-based ontologies were explored, and the state of the art in SWRL expressiveness extensions was surveyed with a view to extending the SWRL formalism. Subsequently, a novel fuzzy temporal extension to the Semantic Web Rule Language (FT-SWRL), which combines SWRL with fuzzy logic theories based on the valid-time temporal model, has been proposed to allow the modeling of imprecise temporal expressions in domain ontologies.
Temporal detection and analysis of guideline interactions
Background
Clinical practice guidelines (CPGs) are assuming a major role in the medical area: they guarantee the quality of medical assistance by supporting physicians with evidence-based information about interventions in the treatment of single pathologies. The treatment of patients affected by multiple diseases (comorbid patients) is one of the main challenges for modern healthcare. It requires the development of new methodologies supporting physicians in the treatment of interactions between CPGs. Several approaches have started to address this challenging problem. However, they suffer from a substantial limitation: they do not take the temporal dimension into account. Indeed, practically speaking, interactions occur in time. For instance, the effects of two actions taken from different guidelines may potentially conflict, but a practical conflict arises only if the times of execution of these actions are such that their effects overlap in time.
Objectives
We aim to devise a methodology for detecting and analysing interactions between CPGs that takes the temporal dimension into account.
Methods
In this paper, we first extend our previous ontological model to account for the fact that actions, goals, effects and interactions occur in time, and to model both qualitative and quantitative temporal constraints between them. Then we identify different application scenarios and, for each of them, propose different types of facilities for physicians that support the temporal detection of interactions.
Results
We provide a modular approach in which different Artificial Intelligence temporal reasoning techniques, based on temporal constraint propagation, are widely exploited to provide users with such facilities. We applied our methodology to two cases of comorbidities, using simplified versions of CPGs.
Conclusion
We propose an innovative approach to the detection and analysis of interactions between CPGs that considers different sources of temporal information (CPGs, ontological knowledge and execution logs). It is the first approach in the literature to take temporal issues into account, and it accounts for different application scenarios.
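The kind of temporal constraint propagation the abstract refers to can be sketched with a Simple Temporal Network solved by all-pairs shortest paths. This is a generic textbook technique, not the authors' system, and the drug-effect intervals below are hypothetical:

```python
# Toy sketch of temporal constraint propagation: a Simple Temporal Network
# (STN) tightened with Floyd-Warshall. Nodes are time points; an edge
# (i, j, w) encodes the constraint t_j - t_i <= w.
INF = float("inf")

def propagate(n, constraints):
    """Return the minimal network d (d[i][j] = tightest bound on t_j - t_i),
    or None if the constraints are inconsistent (negative cycle)."""
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j, w in constraints:
        d[i][j] = min(d[i][j], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    if any(d[i][i] < 0 for i in range(n)):
        return None
    return d

# Hypothetical example (hours): node 0 = reference time,
# node 1 = end of drug A's effect, node 2 = start of drug B's effect.
cons = [(0, 1, 6), (1, 0, -4),   # A's effect ends 4 to 6 h after t0
        (0, 2, 5), (2, 0, -3)]   # B's effect starts 3 to 5 h after t0
d = propagate(3, cons)
# The effects can overlap iff B may start before A ends, i.e. the minimum
# of t2 - t1 (which equals -d[2][1]) is negative.
print("possible overlap:", -d[2][1] < 0)  # → possible overlap: True
```

Propagation derives t2 - t1 in [-3, 1] from the input intervals, so an overlap is temporally possible and the pair of actions would be flagged for interaction analysis.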
EquiX---A Search and Query Language for XML
EquiX is a search language for XML that combines the power of querying with
the simplicity of searching. Requirements for such languages are discussed and
it is shown that EquiX meets the necessary criteria. Both a graphical abstract
syntax and a formal concrete syntax are presented for EquiX queries. In
addition, the semantics is defined and an evaluation algorithm is presented.
The evaluation algorithm is polynomial under combined complexity.
EquiX combines pattern matching, quantification and logical expressions to
query both the data and meta-data of XML documents. The result of a query in
EquiX is a set of XML documents. A DTD describing the result documents is
derived automatically from the query.
Comment: technical report of the Hebrew University, Jerusalem, Israel
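A query combining tests on both data and meta-data, in the spirit of what EquiX offers (though not in EquiX syntax), can be sketched with the Python standard library:

```python
# Sketch in the spirit of EquiX-style queries (not EquiX syntax):
# match a pattern against both meta-data (attributes) and data (content)
# of an XML document. The sample document is invented for illustration.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<library>
  <book year="1999"><title>XML Basics</title></book>
  <book year="2001"><title>Query Languages</title></book>
</library>
""")

# "Find books published after 2000 whose title mentions 'Query'":
# an attribute (meta-data) test combined with a content (data) test.
hits = [b for b in doc.findall(".//book")
        if int(b.get("year")) > 2000
        and "Query" in b.findtext("title")]

for b in hits:
    print(b.findtext("title"))  # → Query Languages
```

In EquiX the result would itself be a set of XML documents with an automatically derived DTD; here the matching elements simply stand in for that result set.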
AAPOR Report on Big Data
In recent years we have seen an increase in the amount of statistics in society describing different phenomena based on so-called Big Data. The term Big Data covers a variety of data, as explained in the report, many of them characterized not just by their large volume, but also by their variety and velocity, the organic way in which they are created, and the new types of processes needed to analyze them and make inferences from them. The changes in the nature of these new types of data, in their availability, and in the way in which they are collected and disseminated are fundamental. Together they constitute a paradigm shift for survey research. There is great potential in Big Data, but there are some fundamental challenges that have to be resolved before that potential can be fully realized. In this report we give examples of different types of Big Data and their potential for survey research. We also describe the Big Data process and discuss its main challenges.