4,535 research outputs found
Retrieval, crawling and fusion of entity-centric data on the web
While the Web of (entity-centric) data has seen tremendous growth over the past years, take-up and re-use are still limited. Data vary heavily with respect to their scale, quality, coverage and dynamics, which poses challenges for tasks such as entity retrieval or search. This chapter provides an overview of approaches for dealing with the increasing heterogeneity of Web data. On the one hand, recommendation, linking, profiling and retrieval can provide efficient means to enable discovery and search of entity-centric data, specifically when dealing with traditional knowledge graphs and linked data. On the other hand, embedded markup such as Microdata and RDFa has emerged as a novel, Web-scale source of entity-centric knowledge. Markup has seen increasing adoption over the last few years, driven by initiatives such as schema.org, and now constitutes an increasingly important source of entity-centric data on the Web, being in the same order of magnitude as the Web itself with regard to dynamics and scale. Markup data therefore lends itself as a data source for aiding tasks such as knowledge base augmentation, where data fusion techniques are required to address its inherent characteristics, such as redundancy, heterogeneity and lack of links. Future directions concern the exploitation of the complementary nature of markup data and traditional knowledge graphs. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-53640-8_1
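To make the idea of embedded markup concrete, the following is a minimal sketch of harvesting schema.org Microdata properties from an HTML fragment using only Python's standard library. The `MicrodataParser` class, the sample fragment, and the flat property-to-text mapping are illustrative assumptions, not the chapter's actual pipeline (which would also handle nested items, RDFa, and fusion across sources).

```python
from html.parser import HTMLParser

class MicrodataParser(HTMLParser):
    """Collect itemprop name/value pairs from schema.org Microdata markup
    (flat sketch: nested itemscopes and repeated properties are ignored)."""
    def __init__(self):
        super().__init__()
        self._prop = None   # itemprop whose text value we are waiting for
        self.items = {}     # property name -> extracted text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            # Tags like <meta> carry the value in a 'content' attribute;
            # otherwise the value is the element's text content.
            if "content" in attrs:
                self.items[attrs["itemprop"]] = attrs["content"]
            else:
                self._prop = attrs["itemprop"]

    def handle_data(self, data):
        if self._prop and data.strip():
            self.items[self._prop] = data.strip()
            self._prop = None

html = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Ada Lovelace</span>
  <meta itemprop="birthDate" content="1815-12-10">
  <span itemprop="jobTitle">Mathematician</span>
</div>
"""

parser = MicrodataParser()
parser.feed(html)
print(parser.items)
# → {'name': 'Ada Lovelace', 'birthDate': '1815-12-10', 'jobTitle': 'Mathematician'}
```

Run at Web scale over many pages, outputs like this are exactly the redundant, heterogeneous per-source records that the fusion techniques mentioned above must reconcile.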
Automatic extraction of knowledge from web documents
A large amount of the digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents from multiple sources in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system, which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper.
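The core pattern of ontology-driven extraction can be sketched very simply: each slot of a predefined ontology is paired with a surface pattern that signals it in text, and matches fill the slots. The mini-ontology, the patterns, and the sample sentence below are invented for illustration; Artequakt itself uses full natural language tools rather than regular expressions.

```python
import re

# Hypothetical mini-ontology for artists: each slot is paired with a
# surface pattern that signals it in free text.
ONTOLOGY_PATTERNS = {
    "birth_year":  re.compile(r"born in (\d{4})"),
    "birth_place": re.compile(r"born in \d{4} in ([A-Z][a-z]+)"),
    "movement":    re.compile(r"associated with the ([\w-]+) movement"),
}

def extract_facts(text):
    """Fill ontology slots from free text; unmatched slots are omitted."""
    facts = {}
    for slot, pattern in ONTOLOGY_PATTERNS.items():
        m = pattern.search(text)
        if m:
            facts[slot] = m.group(1)
    return facts

doc = ("The painter was born in 1853 in Zundert and is often "
       "associated with the Post-Impressionist movement.")
print(extract_facts(doc))
# → {'birth_year': '1853', 'birth_place': 'Zundert', 'movement': 'Post-Impressionist'}
```

Slot-filled records of this kind are what a biography generator can then render into tailored prose, one paragraph per populated slot group.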
Raising awareness of the accessibility challenges in mathematics MOOCs
MOOCs provide learning environments that make it easier for learners to study from anywhere, at their own pace and with open access to content. This has revolutionised the field of eLearning, but accessibility continues to be a problem, even more so when we add the complexity of the STEM disciplines, which have their own specific characteristics. This work presents an analysis of the accessibility of several MOOC platforms which provide courses in mathematics. We attempt to visualise the main web accessibility problems and challenges that disabled learners could face in taking these types of courses, both in general and specifically in the context of the subject of mathematics.
A Survey on Retrieval of Mathematical Knowledge
We present a short survey of the literature on indexing and retrieval of mathematical knowledge, with pointers to 72 papers and tentative taxonomies of both retrieval problems and recurring techniques. Comment: CICM 2015, 20 pages
BlogForever D2.6: Data Extraction Methodology
This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform
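The report's key observation, that a blog's RSS feed provides a structured view of the same posts that appear in its HTML, can be sketched with the standard library alone. The feed content is inlined below instead of fetched, and the field names and URLs are illustrative; the BlogForever approach additionally learns HTML extraction rules by aligning these records with the rendered pages.

```python
import xml.etree.ElementTree as ET

# A minimal RSS 2.0 feed standing in for a real blog's feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.org/posts/1</link>
      <pubDate>Mon, 06 Sep 2010 00:01:00 +0000</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/posts/2</link>
      <pubDate>Tue, 07 Sep 2010 00:01:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""

def extract_posts(rss_text):
    """Return one record per <item>, keyed by the fields a preservation
    pipeline would align with the corresponding HTML pages."""
    root = ET.fromstring(rss_text)
    return [
        {child.tag: child.text for child in item}
        for item in root.iter("item")
    ]

posts = extract_posts(SAMPLE_RSS)
print(posts[0]["link"])
# → http://example.org/posts/1
```

Because each record carries the post URL and publication date, it can serve as unsupervised training data for locating the same title and body inside the blog's HTML templates.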
TechNews digests: Jan - Nov 2009
TechNews is a technology news and analysis service aimed at anyone in the education sector keen to stay informed about technology developments, trends and issues. TechNews focuses on emerging technologies and other technology news. The TechNews service published digests from September 2004 until May 2010; combined analysis pieces and news were published every 2 to 3 months.
Tailored retrieval of health information from the web for facilitating communication and empowerment of elderly people
A patient, nowadays, acquires health information from the Web mainly through a “human-to-machine” communication process with a generic search engine. This, in turn, affects, positively or negatively, his/her empowerment level and the “human-to-human” communication process that occurs between a patient and a healthcare professional such as a doctor. A generic communication process can be modelled by considering its syntactic-technical, semantic-meaning, and pragmatic-effectiveness levels; efficacious communication occurs when all the communication levels are fully addressed. In the case of retrieval of health information from the Web, although a generic search engine is able to work at the syntactic-technical level, the semantic and pragmatic aspects are left to the user, and this can be challenging, especially for elderly people. This work presents a custom search engine, FACILE, that works at the three communication levels and makes it possible to overcome the challenges confronted during the search process. A patient can specify his/her information requirements in a simple way and FACILE will retrieve the “right” amount of Web content in a language that he/she can easily understand. This facilitates the comprehension of the found information and positively affects the empowerment process and communication with healthcare professionals.
Improving content authoring user experience in the lovelace learning environment
Abstract. This thesis provides the analysis, planning, execution and evaluation of a new back- and frontend prototype of the online learning environment Lovelace, created by the project group. The prerequisite for the prototype was to utilize the same technologies as the current live version, and the project group's first major task was to investigate, comprehend and apply them. In addition, the project group was advised not to review the code of the live version, to ensure a fresh perspective on the execution of the new version.
The design takes influence from other sources such as Moodle. This thesis covers the design process from the first sketches and the analysis of the set-out requirements. These requirements include moving the current editing functionality from a separate administrator page to an easily accessible editing widget on the lecture pages, caching of static website content, support for the existing Lovelace markup text, and many others. The implementation phase starts by following the plan created in the design part, which made the process more streamlined. Technical aspects of the development are handled in the implementation part of the thesis. Polymorphism, the way the content is rendered to the viewer, and an explanation and representation of how content forms and caching work are explored here.
The evaluation of the finished prototype was executed in the form of measurements of website load times, together with an expert evaluation meeting with an experienced user of the live version of Lovelace. The meeting consisted of different test cases which the attendee had to complete on both the old and the new versions. These tasks were all timed, and the results were vastly better with the project group's prototype: all of the tasks were completed in less time with the prototype, and in some cases even twice as fast. Comparing the end result with the prerequisites, the requirements were met well, and the improvements were proven to be a success.
The Semantic Grid: A future e-Science infrastructure
e-Science offers a promising vision of how computer and communication technology can support and enhance the scientific process. It does this by enabling scientists to generate, analyse, share and discuss their insights, experiments and results in an effective manner. The underlying computer infrastructure that provides these facilities is commonly referred to as the Grid. At this time, there are a number of grid applications being developed and there is a whole raft of computer technologies that provide fragments of the necessary functionality. However, there is currently a major gap between these endeavours and the vision of e-Science in which there is a high degree of easy-to-use and seamless automation and in which there are flexible collaborations and computations on a global scale. To bridge this practice–aspiration divide, this paper presents a research agenda whose aim is to move from the current state of the art in e-Science infrastructure to the future infrastructure that is needed to support the full richness of the e-Science vision. Here the future e-Science research infrastructure is termed the Semantic Grid (Semantic Grid to Grid is meant to connote a similar relationship to the one that exists between the Semantic Web and the Web). In particular, we present a conceptual architecture for the Semantic Grid. This architecture adopts a service-oriented perspective in which distinct stakeholders in the scientific process, represented as software agents, provide services to one another, under various service level agreements, in various forms of marketplace. We then focus predominantly on the issues concerned with the way that knowledge is acquired and used in such environments, since we believe this is the key differentiator between current grid endeavours and those envisioned for the Semantic Grid.
- …