
    Retrieval, crawling and fusion of entity-centric data on the web

    While the Web of (entity-centric) data has seen tremendous growth over the past years, take-up and re-use are still limited. Data vary heavily with respect to their scale, quality, coverage and dynamics, which poses challenges for tasks such as entity retrieval and search. This chapter provides an overview of approaches to deal with the increasing heterogeneity of Web data. On the one hand, recommendation, linking, profiling and retrieval can provide efficient means to enable discovery and search of entity-centric data, specifically when dealing with traditional knowledge graphs and linked data. On the other hand, embedded markup such as Microdata and RDFa has emerged as a novel, Web-scale source of entity-centric knowledge. As markup has seen increasing adoption over the last few years, driven by initiatives such as schema.org, it constitutes an increasingly important source of entity-centric data on the Web, being in the same order of magnitude as the Web itself with regard to dynamics and scale. To this end, markup data lends itself as a data source for aiding tasks such as knowledge base augmentation, where data fusion techniques are required to address the inherent characteristics of markup data, such as its redundancy, heterogeneity and lack of links. Future directions are concerned with the exploitation of the complementary nature of markup data and traditional knowledge graphs. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-53640-8_1.
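    The fusion problem the chapter raises can be made concrete with a minimal sketch, assuming Python with BeautifulSoup: it pulls schema.org Microdata items out of HTML and naively merges entity descriptions that share a name. The sample snippet and the merge-by-name heuristic are illustrative assumptions, not the chapter's method.

```python
# Minimal sketch: extract schema.org Microdata and fuse duplicate entities.
from collections import defaultdict
from bs4 import BeautifulSoup  # pip install beautifulsoup4

html = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Ada Lovelace</span>
  <span itemprop="birthDate">1815-12-10</span>
</div>
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Ada Lovelace</span>
  <span itemprop="jobTitle">Mathematician</span>
</div>
"""

def extract_items(markup):
    """Collect (itemtype, property map) pairs from itemscope blocks."""
    soup = BeautifulSoup(markup, "html.parser")
    items = []
    for scope in soup.find_all(attrs={"itemscope": True}):
        props = {prop["itemprop"]: prop.get_text(strip=True)
                 for prop in scope.find_all(attrs={"itemprop": True})}
        items.append((scope.get("itemtype"), props))
    return items

def fuse(items):
    """Naive fusion: merge property maps of items sharing type and name."""
    fused = defaultdict(dict)
    for itemtype, props in items:
        fused[(itemtype, props.get("name"))].update(props)
    return dict(fused)

print(fuse(extract_items(html)))
```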

    Automatic extraction of knowledge from web documents

    A large amount of the digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents, drawn from multiple sources, in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system, which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper.
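    A toy version of ontology-driven extraction in the spirit of Artequakt, assuming Python: a small "ontology" declares which relations to fill, and simple patterns pull candidate fillers from free text. The patterns and the example sentence are invented for illustration and stand in for the system's actual NLP pipeline.

```python
# Minimal sketch: an ontology of relations, each with a pattern that
# extracts a candidate filler from free text.
import re

ONTOLOGY = {  # relation -> regex whose group 1 is the filler
    "birth_year":  r"born\b.*?\bin (\d{4})",
    "birth_place": r"born in ([A-Z][a-z]+(?: [A-Z][a-z]+)*)",
    "style":       r"known for (?:his|her|their) ([a-z ]+) paintings",
}

def extract(text, subject):
    """Return (subject, relation, value) triples for each slot found."""
    triples = []
    for relation, pattern in ONTOLOGY.items():
        match = re.search(pattern, text)
        if match:
            triples.append((subject, relation, match.group(1)))
    return triples

doc = ("Rembrandt was born in Leiden in 1606 and is known for his "
       "chiaroscuro paintings.")
for triple in extract(doc, "Rembrandt"):
    print(triple)
```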

    Raising awareness of the accessibility challenges in mathematics MOOCs

    MOOCs provide learning environments that make it easier for learners to study from anywhere, at their own pace and with open access to content. This has revolutionised the field of eLearning, but accessibility continues to be a problem, all the more so given the complexity of the STEM disciplines, which have their own specific characteristics. This work presents an analysis of the accessibility of several MOOC platforms that provide courses in mathematics. We attempt to visualise the main web accessibility problems and challenges that disabled learners could face in taking these types of courses, both in general and specifically in the context of the subject of mathematics.
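    One concrete flavour of such an audit, sketched below assuming a Python/BeautifulSoup toolchain: flag images that lack alt text and pages that present formulas only as images with no MathML alternative. The sample HTML and the two checks are illustrative; real WCAG-style audits cover far more criteria.

```python
# Minimal sketch: two automated accessibility checks for math content.
from bs4 import BeautifulSoup  # pip install beautifulsoup4

page = """
<p>Solve <img src="eq1.png"> for x.</p>
<p>Or, accessibly: <math><mi>x</mi><mo>=</mo><mn>2</mn></math></p>
<img src="logo.png" alt="Course logo">
"""

def audit(html):
    soup = BeautifulSoup(html, "html.parser")
    problems = []
    for img in soup.find_all("img"):
        if not img.get("alt"):  # missing or empty alt text
            problems.append(f"<img src={img.get('src')!r}> lacks alt text")
    if soup.find("img") and not soup.find("math"):
        problems.append("formula images present but no MathML alternative")
    return problems

for issue in audit(page):
    print(issue)
```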

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
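    As an illustration of the RSS-plus-HTML pairing the report describes, here is a minimal sketch assuming Python with the feedparser and requests libraries; the feed URL is a placeholder and the pairing logic is deliberately naive.

```python
# Minimal sketch: use the RSS feed as structured metadata, then fetch
# each post's HTML page for richer content extraction.
import feedparser  # pip install feedparser
import requests    # pip install requests

FEED_URL = "https://example.org/blog/feed"  # placeholder feed

feed = feedparser.parse(FEED_URL)
for entry in feed.entries[:5]:
    # The feed supplies reliable title/date/link metadata ...
    print(entry.title, entry.get("published", "n.d."))
    # ... while the HTML page is fetched for full-content extraction,
    # e.g. locating the post body by comparing it with the feed summary.
    html = requests.get(entry.link, timeout=10).text
    print(len(html), "bytes of HTML to mine for content and microdata")
```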

    TechNews digests: Jan - Nov 2009

    TechNews is a technology news and analysis service aimed at anyone in the education sector keen to stay informed about technology developments, trends and issues. TechNews focuses on emerging technologies and other technology news. The service published digests from September 2004 until May 2010, combining analysis pieces and news items in issues released every two to three months.

    Tailored retrieval of health information from the web for facilitating communication and empowerment of elderly people

    A patient, nowadays, acquires health information from the Web mainly through a “human-to-machine” communication process with a generic search engine. This, in turn, affects, positively or negatively, his/her empowerment level and the “human-to-human” communication process that occurs between a patient and a healthcare professional such as a doctor. A generic communication process can be modelled by considering its syntactic-technical, semantic-meaning and pragmatic-effectiveness levels, and effective communication occurs when all three levels are fully addressed. In the case of retrieval of health information from the Web, although a generic search engine is able to work at the syntactic-technical level, the semantic and pragmatic aspects are left to the user, and this can be challenging, especially for elderly people. This work presents a custom search engine, FACILE, that works at all three communication levels and makes it possible to overcome the challenges encountered during the search process. A patient can specify his/her information requirements in a simple way, and FACILE will retrieve the “right” amount of Web content in a language that he/she can easily understand. This facilitates comprehension of the retrieved information and positively affects the empowerment process and communication with healthcare professionals.
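    To make the pragmatic level concrete, the sketch below filters candidate results by a readability score, assuming Python with the textstat library; the mocked results and the Flesch Reading Ease threshold are illustrative stand-ins, not FACILE's actual comprehensibility model.

```python
# Minimal sketch: keep only search results written in plain language.
import textstat  # pip install textstat

results = [  # (title, snippet) pairs, e.g. from a search backend
    ("Hypertension pathophysiology",
     "Elevated systemic arterial pressure involves complex neurohormonal "
     "dysregulation of the renin-angiotensin-aldosterone axis."),
    ("What is high blood pressure?",
     "High blood pressure means your heart works harder than it should. "
     "It often has no symptoms, so regular checks help."),
]

def easy_enough(text, threshold=60.0):
    """Flesch Reading Ease: higher scores mean easier text."""
    return textstat.flesch_reading_ease(text) >= threshold

for title, snippet in results:
    print("KEEP:" if easy_enough(snippet) else "DROP:", title)
```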

    Improving content authoring user experience in the lovelace learning environment

    Abstract. This thesis provides the analysis, planning, execution and evaluation of a new back- and frontend prototype of the online learning environment Lovelace, created by the project group. The prerequisite for the prototype was to use the same technologies as the current live version, and the project group's first major task was to investigate, understand and apply them. In addition, the project group was advised not to review the code of the live version, to ensure a fresh perspective on the new implementation. The design takes influence from other sources such as Moodle. This thesis covers the design process from the first sketches and the analysis of the set-out requirements. These requirements include moving the current editing functionality from a separate administrator page to an easily accessible editing widget on the lecture pages, caching of static website content, support for the existing Lovelace markup text, and many others. The implementation phase followed the plan created in the design part, which made the process more streamlined. Technical aspects of the development are handled in the implementation part of the thesis: polymorphism, the way content is rendered to the viewer, and an explanation and representation of how content forms and caching work. The evaluation of the finished prototype consisted of measuring website load times, together with an expert evaluation meeting with an experienced user of the live version of Lovelace. The meeting consisted of different test cases which the attendee had to complete on both the old and new versions. These tasks were all timed, and the results were markedly better with the project group's prototype: all of the tasks were completed in less time with the prototype, in some cases even twice as fast. Comparing the end result with the prerequisites, the requirements were met well and the improvements proved to be a success.

    Abstract. This thesis covers the design, implementation, analysis and evaluation phases of a new prototype of the online learning environment Lovelace. The prototype is a new front- and backend implementation produced by the project group, with the prerequisite of using the same technologies as the original version. Consequently, the project's central initial task was to study these technologies thoroughly. In addition, the project group was instructed not to look at the code and implementation of the original version, so that the project would be approached from a completely fresh perspective. In the design phase the project takes influence from other learning environments such as Moodle. This thesis covers the stages of the design process, including the user interface sketches and the analysis of the initial requirements. The requirements include moving the editing functionality from a separate administration site to an easily usable editing widget on the lecture page, implementing a cache to speed up load times, support for the Lovelace “markup” text, and many others. The technical side of the project, such as polymorphism, content rendering, and the detailed operation of the cache and content forms, is covered in the implementation section. The evaluation of the finished prototype was carried out through load-time measurements and an expert evaluation. The expert was an experienced user of Lovelace and was given three different tasks to complete on both the original version and the project group's version. These runs were timed with a stopwatch for later data analysis. The results with the prototype version were a success: the time spent on the tasks was shorter on every run with the prototype, in some cases by as much as half. Comparing the overall outcome against the initial requirements shows that the requirements were met and the new functionality demonstrably improved the user experience.
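    The caching requirement mentioned in both abstracts can be illustrated with a minimal sketch: render markup once, key the cache by a content hash so edits invalidate entries automatically. The toy renderer and keying scheme are assumptions for illustration, not Lovelace's implementation.

```python
# Minimal sketch: cache rendered markup, keyed by content hash.
import hashlib

_cache: dict[str, str] = {}

def render(markup: str) -> str:
    """Stand-in renderer; real Lovelace markup is far richer than this."""
    return "<p>" + markup.replace("**", "<b>", 1).replace("**", "</b>", 1) + "</p>"

def cached_render(markup: str) -> str:
    # Hash the source text so any edit produces a new key,
    # invalidating the stale cache entry automatically.
    key = hashlib.sha256(markup.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = render(markup)
    return _cache[key]

print(cached_render("A **bold** claim"))  # rendered, then cached
print(cached_render("A **bold** claim"))  # served from cache
```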

    The Semantic Grid: A future e-Science infrastructure

    e-Science offers a promising vision of how computer and communication technology can support and enhance the scientific process. It does this by enabling scientists to generate, analyse, share and discuss their insights, experiments and results in an effective manner. The underlying computer infrastructure that provides these facilities is commonly referred to as the Grid. At this time, there are a number of grid applications being developed, and there is a whole raft of computer technologies that provide fragments of the necessary functionality. However, there is currently a major gap between these endeavours and the vision of e-Science, in which there is a high degree of easy-to-use and seamless automation and in which there are flexible collaborations and computations on a global scale. To bridge this practice–aspiration divide, this paper presents a research agenda whose aim is to move from the current state of the art in e-Science infrastructure to the future infrastructure that is needed to support the full richness of the e-Science vision. Here the future e-Science research infrastructure is termed the Semantic Grid (“Semantic Grid” is to “Grid” as the Semantic Web is to the Web). In particular, we present a conceptual architecture for the Semantic Grid. This architecture adopts a service-oriented perspective in which distinct stakeholders in the scientific process, represented as software agents, provide services to one another, under various service level agreements, in various forms of marketplace. We then focus predominantly on the issues concerned with the way that knowledge is acquired and used in such environments, since we believe this is the key differentiator between current grid endeavours and those envisioned for the Semantic Grid.
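    The service-oriented, agent-based marketplace the abstract describes can be sketched minimally as follows, assuming Python; the class names, SLA fields and negotiation rule are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch: agents advertise services under SLAs in a marketplace,
# and consumers pick the cheapest offer that meets their constraints.
from dataclasses import dataclass

@dataclass
class SLA:
    max_latency_s: float  # promised response time
    cost: float           # price per invocation

@dataclass
class ServiceOffer:
    provider: str
    service: str
    sla: SLA

class Marketplace:
    """Providers advertise offers; consumers negotiate against SLA needs."""
    def __init__(self):
        self.offers: list[ServiceOffer] = []

    def advertise(self, offer: ServiceOffer) -> None:
        self.offers.append(offer)

    def negotiate(self, service: str, max_latency_s: float) -> ServiceOffer | None:
        # Cheapest offer for the service that satisfies the latency bound.
        candidates = [o for o in self.offers
                      if o.service == service and o.sla.max_latency_s <= max_latency_s]
        return min(candidates, key=lambda o: o.sla.cost, default=None)

market = Marketplace()
market.advertise(ServiceOffer("SimulationAgent", "protein-folding", SLA(60.0, 5.0)))
market.advertise(ServiceOffer("GridClusterAgent", "protein-folding", SLA(10.0, 9.0)))
print(market.negotiate("protein-folding", max_latency_s=30.0))
```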
    • 

    corecore