9 research outputs found

    Deployment of RDFa, Microdata, and Microformats on the Web – A Quantitative Analysis

    More and more websites embed structured data describing, for instance, products, reviews, blog posts, people, organizations, events, and cooking recipes into their HTML pages using markup standards such as Microformats, Microdata, and RDFa. This development has accelerated in the last two years as major Web companies, such as Google, Facebook, Yahoo!, and Microsoft, have started to use the embedded data within their applications. In this paper, we analyze the adoption of RDFa, Microdata, and Microformats across the Web. Our study is based on a large public Web crawl dating from early 2012 and consisting of 3 billion HTML pages which originate from over 40 million websites. The analysis reveals the deployment of the different markup standards, the main topical areas of the published data, and the different vocabularies that are used within each topical area to represent data. What distinguishes our work from earlier studies, published by the large Web companies, is that the analyzed crawl as well as the extracted data are publicly available. This allows our findings to be verified and to be used as starting points for further domain-specific investigations as well as for focused information extraction endeavors.

    A Quantitative Analysis of the Use of Microdata for Semantic Annotations on Educational Resources

    A current trend in the semantic web is the use of embedded markup formats that aim to semantically enrich web content by making it more understandable to search engines and other applications. The deployment of Microdata as a markup format has increased thanks to the widespread adoption of the controlled vocabulary provided by Schema.org. Recently, a set of properties from the Learning Resource Metadata Initiative (LRMI) specification, which describes educational resources, was adopted by Schema.org. These properties, in addition to those related to accessibility and the license of resources included in Schema.org, would enable search engines to provide more relevant results in searching for educational resources for all users, including users with disabilities. In order to obtain a reliable evaluation of the use of Microdata properties related to the LRMI specification, accessibility, and the license of resources, this research conducted a quantitative analysis of the deployment of these properties in large-scale web corpora covering two consecutive years. The corpora contain hundreds of millions of web pages. The results further our understanding of this deployment and highlight the pending issues and challenges concerning the use of such properties.

    Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data

    The current data deluge is flooding the Web with large volumes of data represented in RDF, giving rise to the so-called 'Web of Data'. In this thesis we propose, first, an in-depth study of those texts that let us build a global understanding of the real structure of RDF datasets, and HDT, which addresses the efficient representation of large volumes of RDF data through structures optimized for storage and network transmission. HDT efficiently represents an RDF dataset by splitting it into three components: the Header, the Dictionary, and the structure of RDF statements (Triples). We then focus on providing efficient structures for these components, which occupy compressed space while allowing direct access to any datum

    Optimizing search user interfaces and interactions within professional social networks

    Professional social networks (PSNs) play a key role in the online social media ecosystem, generate hundreds of terabytes of new data per day, and connect millions of people. To help users cope with the scale and influx of new information, PSNs provide search functionality. However, most of the search engines within PSNs today still provide only keyword queries, basic faceted search capabilities, and uninformative query-biased snippets, overlooking the structured and interlinked nature of PSN entities. This results in siloed information, inefficient results presentation, and suboptimal search user experience (UX). In this thesis, we reconsider and comprehensively study input, control, and presentation elements of the search user interface (SUI) to enable more effective and efficient search within PSNs. Specifically, we demonstrate that: (1) named entity queries (NEQs) and structured queries (SQs) complement each other, helping PSN users search for people and explore the PSN social graph beyond the first degree; (2) relevance-aware filtering saves users' effort when they sort jobs, status updates, and people by an attribute value rather than by relevance; (3) extended informative structured snippets increase job search effectiveness and efficiency by leveraging human intelligence and exposing the most critical information about jobs right on a search engine result page (SERP); and (4) non-redundant delta snippets, which, unlike traditional query-biased snippets, show on a SERP information that is relevant but complementary to the query, are favored by users performing entity (e.g., people) search and lead to faster task completion times and better search outcomes. Thus, by modeling the structured and interlinked nature of PSN entities, we can optimize the query-refine-view interaction loop, facilitate serendipitous network exploration, and increase search utility.
We believe that the insights, algorithms, and recommendations presented in this thesis will serve the next generation of designers of SUIs within and beyond PSNs and shape the (structured) search landscape of the future

    Improving Search Effectiveness through Query Log and Entity Mining

    The Web is the largest repository of knowledge in the world. Every day, people make it bigger by generating new web data. Data never sleeps. Every minute someone writes a new blog post, uploads a video, or comments on an article. People usually rely on Web Search Engines to satisfy their information needs: they formulate their needs as text queries and expect a list of highly relevant documents answering their requests. Managing this massive volume of data while ensuring high quality and performance is a challenging topic that we tackle in this thesis. In this dissertation we focus on the Web of Data: a recent approach, originating from the Semantic Web community, consisting of a collective effort to augment the existing Web with semi-structured data. We propose to manage the data explosion by shifting from a retrieval model based on documents to a model enriched with entities, where an entity can describe a person, a product, a location, or a company through semi-structured information. In our work, we combine the Web of Data with an important source of knowledge: query logs, which record the interactions between the Web Search Engine and its users. Query log mining aims at extracting valuable knowledge that can be exploited to enhance users’ search experience. In line with this vision, this dissertation aims at improving Web Search Engines through the joint use of query logs and entities. The contributions of this work are the following. First, we show how historical usage data can be exploited to improve performance during the snippet generation process. Second, we propose a query recommender system that, by combining entities with queries, leads to significant improvements in the quality of the suggestions. Furthermore, we develop a new technique for estimating the relatedness between two entities, i.e., their semantic similarity.
Finally, we show that entities may be useful for automatically building explanatory statements that help users better understand whether, and why, a suggested item may be of interest to them

    Ontology-based semantic reminiscence support system

    This thesis addresses the needs of people who find reminiscence helpful by focusing on the development of a computerised reminiscence support system that facilitates access to and retrieval of stored memories, used as the basis for positive interactions between elderly and young people, and also between people with cognitive impairment and members of their family or caregivers. To model users’ background knowledge, this research defines a lightweight user-oriented ontology and its building principles. The ontology is flexible and has a simplified knowledge structure populated with semantically homogeneous ontology concepts. The user-oriented ontology differs from generic ontology models in that it does not rely on knowledge experts. Its structure enables users to browse, edit, and create new entries on their own. To solve the semantic gap problem in personal information retrieval, this thesis proposes a semantic ontology-based feature matching method. It involves natural language processing and semantic feature extraction/selection using the user-oriented ontology. It comprises four stages: (i) user-oriented ontology building, (ii) semantic feature extraction for building vectors representing information objects, (iii) semantic feature selection using the user-oriented ontology, and (iv) measuring the similarity between the information objects. To facilitate personal information management and dynamic generation of content, the system uses ontologies and advanced algorithms for semantic feature matching. An algorithm named Onto-SVD is also proposed, which uses the user-oriented ontology to automatically detect the semantic relations within the stored memories. It combines semantic feature selection with matrix factorisation and k-means clustering to achieve topic identification based on semantic relations. The thesis further proposes an ontology-based personalised retrieval mechanism for the system.
It aims to assist people to recall, browse, and re-discover events from their lives by considering their profiles and background knowledge, and providing them with customised retrieval results. Furthermore, a user profile space model is defined, and its construction method is also described. The model combines multiple user-oriented ontologies and has a self-organised structure based on relevance feedback. The identification of a person’s search intentions in this mechanism takes place at the conceptual level and involves the person’s background knowledge. Based on the identified search intentions, knowledge spanning trees are automatically generated from the ontologies or user profile spaces. The knowledge spanning trees are used to expand and reform queries, which enhances the queries’ semantic representations by applying domain knowledge. The crowdsourcing-based system evaluation measures users’ satisfaction with the generated content of Sem-LSB. It compares the advantages and disadvantages of three types of content presentation (i.e., unstructured, LSB-based, and semantic/knowledge-based). Based on users’ feedback, the semantic/knowledge-based presentation is considered to have higher overall satisfaction and stronger reminiscing support effects than the others

    Enhanced results for web search

    “Ten blue links” have defined web search results for the last fifteen years: snippets of text combined with document titles and URLs. In this paper, we establish the notion of enhanced search results that extend web search results to include multimedia objects such as images and video, intent-specific key-value pairs, and elements that allow the user to interact with the contents of a web page directly from the search results page. We show that users express a preference for enhanced results both explicitly and when observed in their search behavior. We also demonstrate the effectiveness of enhanced results in helping users to assess the relevance of search results. Lastly, we show that we can efficiently generate enhanced results to cover a significant fraction of search result pages
