Deployment of RDFa, Microdata, and Microformats on the Web – A Quantitative Analysis
More and more websites embed structured data describing for instance
products, reviews, blog posts, people, organizations, events, and cooking recipes
into their HTML pages using markup standards such as Microformats, Microdata
and RDFa. This development has accelerated in the last two years as major Web
companies, such as Google, Facebook, Yahoo!, and Microsoft, have started to
use the embedded data within their applications. In this paper, we analyze the
adoption of RDFa, Microdata, and Microformats across the Web. Our study is
based on a large public Web crawl dating from early 2012 and consisting of 3
billion HTML pages which originate from over 40 million websites. The analysis
reveals the deployment of the different markup standards, the main topical areas
of the published data as well as the different vocabularies that are used within each
topical area to represent data. What distinguishes our work from earlier studies,
published by the large Web companies, is that the analyzed crawl as well as the
extracted data are publicly available. This allows our findings to be verified and to
be used as starting points for further domain-specific investigations as well as for
focused information extraction endeavors.
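As a rough illustration of what detecting the three markup standards on a page can involve, the sketch below flags each format from the attributes and class names found in the HTML. The attribute and class lists are simplified assumptions chosen for this example, not the extraction rules used in the study, and `detect_formats` is a hypothetical helper name.

```python
from html.parser import HTMLParser

# Assumed, deliberately small set of Microformats root class names.
MICROFORMAT_CLASSES = {"vcard", "vevent", "hreview", "hrecipe", "hproduct"}


class MarkupDetector(HTMLParser):
    """Collects the names of the embedded-markup formats seen in a page."""

    def __init__(self):
        super().__init__()
        self.formats = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for
        # boolean attributes such as itemscope.
        names = {name for name, _ in attrs}
        if names & {"itemscope", "itemprop", "itemtype"}:
            self.formats.add("Microdata")
        if names & {"typeof", "property", "vocab"}:
            self.formats.add("RDFa")
        classes = set((dict(attrs).get("class") or "").split())
        if classes & MICROFORMAT_CLASSES:
            self.formats.add("Microformats")


def detect_formats(html):
    """Return the set of markup formats detected in an HTML string."""
    detector = MarkupDetector()
    detector.feed(html)
    detector.close()
    return detector.formats
```

Running such a detector over every page of a crawl and aggregating the results per website would give per-format deployment counts of the kind the abstract describes.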
A Quantitative Analysis of the Use of Microdata for Semantic Annotations on Educational Resources
A current trend in the semantic web is the use of embedded markup formats intended to semantically enrich web content by making it more understandable to search engines and other applications. The deployment of Microdata as a markup format has increased thanks to the widespread adoption of the controlled vocabulary provided by Schema.org. Recently, a set of properties from the Learning Resource Metadata Initiative (LRMI) specification, which describes educational resources, was adopted by Schema.org. These properties, in addition to those related to accessibility and the license of resources included in Schema.org, would enable search engines to provide more relevant results in searching for educational resources for all users, including users with disabilities. In order to obtain a reliable evaluation of the use of Microdata properties related to the LRMI specification, accessibility, and the license of resources, this research conducted a quantitative analysis of the deployment of these properties in large-scale web corpora covering two consecutive years. The corpora contain hundreds of millions of web pages. The results further our understanding of this deployment and highlight the pending issues and challenges concerning the use of such properties.
Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data
The current data deluge is flooding the web with large volumes of data represented in RDF, giving rise to the so-called 'Web of Data'. In this thesis we first propose an in-depth study of the texts that give us an overall understanding of the real structure of RDF datasets, and then HDT, which addresses the efficient representation of large volumes of RDF data through structures optimized for storage and network transmission. HDT efficiently represents an RDF dataset by dividing it into three components: the header (Header), the dictionary (Dictionary), and the structure of RDF statements (Triples). Next, we focus on providing efficient structures for these components, which occupy compressed space while allowing direct access to any data
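The Header/Dictionary/Triples split can be pictured with a toy sketch: terms are replaced by integer IDs held in a dictionary, and the triples are stored as ID tuples. This is only an illustration under simplifying assumptions (a single shared dictionary, no compression, hypothetical `encode` and `decode` helpers); it is not the actual HDT binary format or its compact structures.

```python
def encode(triples):
    """Split (s, p, o) string triples into header, dictionary and ID triples."""
    # Dictionary: every distinct term gets a small integer ID (1-based).
    terms = sorted({term for triple in triples for term in triple})
    dictionary = {term: i for i, term in enumerate(terms, start=1)}
    # Triples: the same statements, expressed with IDs and kept sorted.
    encoded = sorted(
        (dictionary[s], dictionary[p], dictionary[o]) for s, p, o in triples
    )
    # Header: simple metadata about the dataset (assumed fields).
    header = {"triples": len(encoded), "terms": len(terms)}
    return header, dictionary, encoded


def decode(dictionary, encoded):
    """Rebuild the string triples from the dictionary and the ID triples."""
    id_to_term = {i: term for term, i in dictionary.items()}
    return [(id_to_term[s], id_to_term[p], id_to_term[o]) for s, p, o in encoded]
```

Because long IRIs are stored once in the dictionary and referenced by short IDs, repeated terms cost little space, which is the intuition behind this kind of layout.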
Optimizing search user interfaces and interactions within professional social networks
Professional social networks (PSNs) play a key role in the online social media ecosystem, generate hundreds of terabytes of new data per day, and connect millions of people. To help users cope with the scale and influx of new information, PSNs provide search functionality. However, most of the search engines within PSNs today still provide only keyword queries, basic faceted search capabilities, and uninformative query-biased snippets, overlooking the structured and interlinked nature of PSN entities. This results in siloed information, inefficient results presentation, and suboptimal search user experience (UX). In this thesis, we reconsider and comprehensively study input, control, and presentation elements of the search user interface (SUI) to enable more effective and efficient search within PSNs. Specifically, we demonstrate that: (1) named entity queries (NEQs) and structured queries (SQs) complement each other, helping PSN users search for people and explore the PSN social graph beyond the first degree; (2) relevance-aware filtering saves users effort when they sort jobs, status updates, and people by an attribute value rather than by relevance; (3) extended informative structured snippets increase job search effectiveness and efficiency by leveraging human intelligence and exposing the most critical information about jobs right on a search engine result page (SERP); and (4) non-redundant delta snippets, which, unlike traditional query-biased snippets, show on a SERP information that is relevant but complementary to the query, are favored by users performing entity (e.g. people) search and lead to faster task completion times and better search outcomes. Thus, by modeling the structured and interlinked nature of PSN entities, we can optimize the query-refine-view interaction loop, facilitate serendipitous network exploration, and increase search utility.
We believe that the insights, algorithms, and recommendations presented in this thesis will serve the next generation of designers of SUIs within and beyond PSNs and shape the (structured) search landscape of the future.
Improving Search Effectiveness through Query Log and Entity Mining
The Web is the largest repository of knowledge in the world. Every day people contribute to making it bigger by generating new web data. Data never sleeps. Every minute someone writes a new blog post, uploads a video or comments on an article. Usually people rely on Web Search Engines to satisfy their information needs: they formulate their needs as text queries and they expect a list of highly relevant documents answering their requests. Being able to manage this massive volume of data, ensuring high quality and performance, is a challenging topic that we tackle in this thesis.
In this dissertation we focus on the Web of Data: a recent approach, originating from the Semantic Web community, consisting of a collective effort to augment the existing Web with semi-structured data. We propose to manage the data explosion by shifting from a retrieval model based on documents to a model enriched with entities, where an entity can describe a person, a product, a location, or a company through semi-structured information.
In our work, we combine the Web of Data with an important source of knowledge: query logs, which record the interactions between the Web Search Engine and the users. Query log mining aims at extracting valuable knowledge that can be exploited to enhance users’ search experience. According to this vision, this dissertation aims at improving Web Search Engines toward the mutual use of query logs and entities.
The contributions of this work are the following. First, we show how historical usage data can be exploited to improve performance during the snippet generation process. Second, we propose a query recommender system that, by combining entities with queries, leads to significant improvements in the quality of the suggestions. Furthermore, we develop a new technique for estimating the relatedness between two entities, i.e., their semantic similarity. Finally, we show that entities may be useful for automatically building explanatory statements that help users better understand whether, and why, a suggested item may be of interest to them.
Ontology-based semantic reminiscence support system
This thesis addresses the needs of people who find reminiscence helpful by focusing on
the development of a computerised reminiscence support system, which facilitates
access to and retrieval of stored memories used as the basis for positive interactions
between elderly and young people, and also between people with cognitive impairment and
members of their family or caregivers.
To model users’ background knowledge, this research defines a lightweight user-oriented
ontology and its building principles. The ontology is flexible, and has a
simplified knowledge structure populated with semantically homogeneous ontology
concepts. The user-oriented ontology is different from generic ontology models, as it
does not rely on knowledge experts. Its structure enables users to browse, edit and
create new entries on their own.
To solve the semantic gap problem in personal information retrieval, this thesis
proposes a semantic ontology-based feature matching method. It involves natural
language processing and semantic feature extraction/selection using the user-oriented
ontology. It comprises four stages: (i) user-oriented ontology building, (ii) semantic
feature extraction for building vectors representing information objects, (iii) semantic
feature selection using the user-oriented ontology, and (iv) measuring the similarity
between the information objects.
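Stages (ii) to (iv) above can be sketched as follows. The abstract does not give the thesis's actual extraction, selection, or similarity procedures, so everything here is an assumption made for illustration: a tiny hand-written `ONTOLOGY` mapping words to concepts, word counts as features, and cosine similarity as the measure.

```python
import math
from collections import Counter

# Hypothetical user-oriented ontology: surface word -> ontology concept.
ONTOLOGY = {
    "wedding": "event", "birthday": "event",
    "paris": "place", "london": "place",
    "mother": "person", "sister": "person",
}


def extract_features(text):
    """Stage (ii): build a bag-of-words vector for an information object."""
    return Counter(text.lower().split())


def select_features(features, ontology):
    """Stage (iii): keep only words known to the ontology, grouped by concept."""
    selected = Counter()
    for word, count in features.items():
        if word in ontology:
            selected[ontology[word]] += count
    return selected


def cosine(a, b):
    """Stage (iv): cosine similarity between two sparse vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(text_a, text_b, ontology=ONTOLOGY):
    """Compare two texts through their ontology-selected feature vectors."""
    vec_a = select_features(extract_features(text_a), ontology)
    vec_b = select_features(extract_features(text_b), ontology)
    return cosine(vec_a, vec_b)
```

In this toy setup, two memories that mention different words but the same concepts (for example a wedding in Paris and a birthday in London) come out as similar, which is one way an ontology can narrow a semantic gap that plain keyword matching leaves open.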
To facilitate personal information management and dynamic generation of content,
the system uses ontologies and advanced algorithms for semantic feature matching.
An algorithm named Onto-SVD is also proposed, which uses the user-oriented
ontology to automatically detect the semantic relations within the stored memories. It
combines semantic feature selection with matrix factorisation and k-means clustering
to achieve topic identification based on semantic relations.
The thesis further proposes an ontology-based personalised retrieval mechanism for
the system. It aims to assist people to recall, browse and re-discover events from their
lives by considering their profiles and background knowledge, and providing them
with customised retrieval results. Furthermore, a user profile space model is defined,
and its construction method is also described. The model combines multiple user-oriented
ontologies and has a self-organised structure based on relevance feedback.
The identification of a person’s search intentions in this mechanism is on the conceptual
level and involves the person’s background knowledge. Based on the identified search
intentions, knowledge spanning trees are automatically generated from the ontologies
or user profile spaces. The knowledge spanning trees are used to expand and reformulate
queries, which enhances the queries’ semantic representations by applying domain
knowledge.
The crowdsourcing-based system evaluation measures users’ satisfaction with the
generated content of Sem-LSB. It compares the advantages and disadvantages of three
types of content presentation (i.e. unstructured, LSB-based and semantic/knowledge-based).
Based on users’ feedback, the semantic/knowledge-based presentation is
considered to have higher overall satisfaction and stronger reminiscing support effects
than the others.
Enhanced results for web search
"Ten blue links" have defined web search results for the last fifteen years: snippets of text combined with document titles and URLs. In this paper, we establish the notion of enhanced search results that extend web search results to include multimedia objects such as images and video, intent-specific key value pairs, and elements that allow the user to interact with the contents of a web page directly from the search results page. We show that users express a preference for enhanced results both explicitly, and when observed in their search behavior. We also demonstrate the effectiveness of enhanced results in helping users to assess the relevance of search results. Lastly, we show that we can efficiently generate enhanced results to cover a significant fraction of search result pages.