The Hidden Web, XML and Semantic Web: A Scientific Data Management Perspective
The World Wide Web no longer consists just of HTML pages. Our work sheds
light on a number of trends on the Internet that go beyond simple Web pages.
The hidden Web provides a wealth of data in semi-structured form, accessible
through Web forms and Web services. These services, as well as numerous other
applications on the Web, commonly use XML, the eXtensible Markup Language. XML
has become the lingua franca of the Internet that allows customized markups to
be defined for specific domains. On top of XML, the Semantic Web grows as a
common structured data source. In this work, we first explain each of these
developments in detail. Using real-world examples from scientific domains of
great interest today, we then demonstrate how these new developments can assist
the management, harvesting, and organization of data on the Web. Along the way, we
also illustrate the current research avenues in these domains. We believe that
this effort would help bridge multiple database tracks, thereby attracting
researchers with a view to extending database technology. Comment: EDBT - Tutorial (2011)
Emerging multidisciplinary research across database management systems
The database community is exploring more and more multidisciplinary avenues:
Data semantics overlaps with ontology management; reasoning tasks venture into
the domain of artificial intelligence; and data stream management and
information retrieval shake hands, e.g., when processing Web click-streams.
These new research avenues become evident, for example, in the topics that
doctoral students choose for their dissertations. This paper surveys the
emerging multidisciplinary research by doctoral students in database systems
and related areas. It is based on PIKM 2010, the 3rd Ph.D.
workshop at the International Conference on Information and Knowledge
Management (CIKM). The topics addressed include ontology development, data
streams, natural language processing, medical databases, green energy, cloud
computing, and exploratory search. In addition to core ideas from the workshop,
we list some open research questions in these multidisciplinary areas.
Open Digital Forms
The maintenance of digital libraries often passes through physical paper forms. Such forms are tedious to handle for both senders and receivers. Several commercial solutions exist for the digitization of forms. However, most of them are proprietary, expensive, centralized, or require software installation. With this demo, we propose a free, secure, and lightweight framework for digital forms. It is based on HTML documents with embedded JavaScript, it uses exclusively open standards, and it does not require a centralized architecture. Our forms can be digitally signed with the OpenPGP standard, and they contain machine-readable RDFa. Thus, they allow for the semantic analysis, sharing, re-use, or merger of documents across users or institutions.
Ontology Alignment at the Instance and Schema Level
We present PARIS, an approach for the automatic alignment of ontologies.
PARIS aligns not only instances, but also relations and classes. Alignments at
the instance-level cross-fertilize with alignments at the schema-level.
Thereby, our system provides a truly holistic solution to the problem of
ontology alignment. The heart of the approach is probabilistic. This allows
PARIS to run without any parameter tuning. We demonstrate the efficiency of the
algorithm and its precision through extensive experiments. In particular, we
obtain a precision of around 90% in experiments with two of the world's largest
ontologies. Comment: Technical Report at INRIA RT-040
The Locality and Symmetry of Positional Encodings
Positional Encodings (PEs) are used to inject word-order information into
transformer-based language models. While they can significantly enhance the
quality of sentence representations, their specific contribution to language
models is not fully understood, especially given recent findings that various
positional encodings are insensitive to word order. In this work, we conduct a
systematic study of positional encodings in Bidirectional Masked
Language Models (BERT-style), which complements existing work in three
aspects: (1) We uncover the core function of PEs by identifying two common
properties, Locality and Symmetry; (2) We show that the two properties are
closely correlated with performance on downstream tasks; (3) We quantify
the weakness of current PEs by introducing two new probing tasks, on which
current PEs perform poorly. We believe that these results are the basis for
developing better PEs for transformer-based language models. The code is
available at https://github.com/tigerchen52/locality_symmetry. Comment: Long Paper in Findings of EMNLP2
ESTER: efficient search on text, entities, and relations
We present ESTER, a modular and highly efficient system for combined full-text and ontology search. ESTER builds on a query engine that supports two basic operations: prefix search and join. Both of these can be implemented very efficiently with a compact index, yet in combination provide powerful querying capabilities. We show how ESTER can answer basic SPARQL graph-pattern queries on the ontology by reducing them to a small number of these two basic operations. ESTER further supports a natural blend of such semantic queries with ordinary full-text queries. Moreover, the prefix search operation allows for a fully interactive and proactive user interface, which after every keystroke suggests to the user possible semantic interpretations of his or her query, and speculatively executes the most likely of these interpretations. As a proof of concept, we applied ESTER to the English Wikipedia, which contains about 3 million documents, combined with the recent YAGO ontology, which contains about 2.5 million facts. For a variety of complex queries, ESTER achieves worst-case query processing times of a fraction of a second, on a single machine, with an index size of about 4 GB.
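The two basic operations named in the abstract, prefix search and join, can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not ESTER's actual index or query engine: the in-memory dictionary `INDEX`, the `entity:` naming convention, and the functions `prefix_search`, `docs_for_prefix`, and `join` are all hypothetical names introduced here for illustration.

```python
from bisect import bisect_left

# Hypothetical toy index (not ESTER's data structure):
# each indexed word maps to a sorted list of document ids.
INDEX = {
    "einstein": [1, 3],
    "entity:physicist": [1, 2, 3],
    "physics": [2, 3],
    "physiology": [4],
}
VOCAB = sorted(INDEX)  # sorted vocabulary enables binary search on prefixes


def prefix_search(prefix):
    """Return {word: doc_ids} for every indexed word starting with prefix."""
    start = bisect_left(VOCAB, prefix)  # first word >= prefix
    matches = {}
    for word in VOCAB[start:]:
        if not word.startswith(prefix):
            break  # sorted order: no later word can match
        matches[word] = INDEX[word]
    return matches


def docs_for_prefix(prefix):
    """Sorted union of document ids over all words matching the prefix."""
    ids = set()
    for doc_ids in prefix_search(prefix).values():
        ids.update(doc_ids)
    return sorted(ids)


def join(left, right):
    """Intersect two sorted id lists with a linear merge."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return out


# Documents containing a word starting with "phys" AND the word "einstein".
result = join(docs_for_prefix("phys"), INDEX["einstein"])
```

In this toy, a combined query is answered by one prefix search followed by one join, which mirrors the abstract's claim that richer queries reduce to a small number of these two operations.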
Knowledge harvesting from text and web sources
The proliferation of knowledge-sharing communities such as Wikipedia and the progress in scalable information extraction from Web and text sources have enabled the automatic construction of very large knowledge bases. Recent endeavors of this kind include academic research projects such as DBpedia, KnowItAll, Probase, ReadTheWeb, and YAGO, as well as industrial ones such as Freebase and Trueknowledge. These projects provide automatically constructed knowledge bases of facts about named entities, their semantic classes, and their mutual relationships. Such world knowledge in turn enables cognitive applications and knowledge-centric services like disambiguating natural-language text, deep question answering, and semantic search for entities and relations in Web and enterprise data. Prominent examples of how knowledge bases can be harnessed include the Google Knowledge Graph and the IBM Watson question answering system. This tutorial presents state-of-the-art methods, recent advances, research opportunities, and open challenges along this avenue of knowledge harvesting and its applications.
Knowledge Bases in the Age of Big Data Analytics
This tutorial gives an overview of state-of-the-art methods for the automatic construction of large knowledge bases and for harnessing them for data and text analytics. It covers both big-data methods for building knowledge bases and knowledge bases as assets for big-data applications. The tutorial also points out challenges and research opportunities.