Identifying Web Tables - Supporting a Neglected Type of Content on the Web
The abundance of data on the Internet facilitates the improvement of
extraction and processing tools. The trend in open data publishing
encourages the adoption of structured formats such as CSV and RDF. However, there
is still a plethora of unstructured data on the Web which we assume carries
semantics. For this reason, we propose an approach to derive semantics from web
tables, which remain the most popular publishing tool on the Web. The paper
also discusses methods and services for unstructured data extraction and
processing, as well as machine learning techniques to enhance such a workflow.
The eventual result is a framework to process, publish, and visualize linked
open data. The software enables table extraction from various open data
sources in HTML format and automatic export to RDF, making the data linked.
The paper also evaluates machine learning techniques in conjunction with
string similarity functions applied to a table recognition task.
Comment: 9 pages, 4 figures
MEDQUAL: Improving Medical Web Search over Time with Dynamic Credibility Heuristics
Performing a search on the World Wide Web (WWW) and traversing the
resulting links is an adventure in which one encounters both credible
and incredible web pages. Search engines, such as Google, rely on
macroscopic Web topology patterns and even highly ranked 'authoritative'
web sites may be a mixture of informed and uninformed opinions. Without
credibility heuristics to guide the user in a maze of facts, assertions,
and inferences, the Web remains an ineffective knowledge delivery
platform. This report presents the design and implementation of a
modular extension to the popular Google search engine, MEDQUAL, which
provides both URL- and content-based heuristic credibility rules to
reorder raw Google rankings in the medical domain. MEDQUAL, a software
system written in Java, starts with a bootstrap configuration file which
loads in basic heuristics in XML format. It then provides a subscription
mechanism so users can join birds-of-a-feather specialty groups, for
example Pediatrics, in order to load specialized heuristics as well. The
platform features a coordination mechanism whereby information seekers
can effectively become secondary authors, contributing by consensus vote
additional credibility heuristics. MEDQUAL uses standard XML namespace
conventions to divide opinion groups so that competing groups can be
supported simultaneously. The net effect is a merger of basic and
supplied heuristics so that the system continues to adapt and improve
itself over time to changing web content, changing opinions, and new
opinion groups. The key goal of leveraging the intelligence of a
large-scale and diffuse WWW user community is met, and we conclude by
discussing our plans to develop MEDQUAL further and evaluate it.
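The re-ranking mechanism described above, XML-encoded rules that adjust raw search rankings, can be sketched briefly. The rule schema, host patterns, weights, and scoring formula below are assumptions for illustration, not MEDQUAL's actual format (and the sketch is in Python rather than MEDQUAL's Java).

```python
# Minimal sketch of URL-based credibility re-ranking: heuristic rules are
# loaded from XML, each assigning a weight to a host pattern, and raw
# results are re-sorted by rank position minus the matched weight.
# (Rule schema and weights are hypothetical.)
import xml.etree.ElementTree as ET

RULES_XML = """
<heuristics>
  <rule host="nih.gov" weight="3"/>
  <rule host="mayoclinic.org" weight="2"/>
  <rule host="example-forum.com" weight="-2"/>
</heuristics>
"""

def load_rules(xml_text):
    root = ET.fromstring(xml_text)
    return {r.get("host"): int(r.get("weight")) for r in root.findall("rule")}

def reorder(urls, rules):
    # Lower score is better: start from the raw rank position and subtract
    # any credibility weight whose host pattern appears in the URL.
    def score(item):
        rank, url = item
        boost = sum(w for host, w in rules.items() if host in url)
        return rank - boost
    return [url for rank, url in sorted(enumerate(urls), key=score)]

raw = ["http://example-forum.com/cure", "http://www.nih.gov/asthma",
       "http://blog.example.net/post"]
print(reorder(raw, load_rules(RULES_XML)))
```

The subscription mechanism would then amount to merging several such rule dictionaries, one per specialty group, before scoring.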
A Taxonomy of Workflow Management Systems for Grid Computing
With the advent of Grid and application technologies, scientists and
engineers are building more and more complex applications to manage and process
large data sets, and execute scientific experiments on distributed resources.
Such application scenarios require means for composing and executing complex
workflows. Therefore, many efforts have been made towards the development of
workflow management systems for Grid computing. In this paper, we propose a
taxonomy that characterizes and classifies various approaches for building and
executing workflows on Grids. We also survey several representative Grid
workflow systems developed by various projects world-wide to demonstrate the
comprehensiveness of the taxonomy. The taxonomy not only highlights the design
and engineering similarities and differences of state-of-the-art in Grid
workflow systems, but also identifies the areas that need further research.
Comment: 29 pages, 15 figures
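A taxonomy of this kind is, in effect, a set of classification axes applied uniformly to each surveyed system. A small sketch shows the idea; the dimension names and system entries below are illustrative placeholders, not the paper's actual categories or survey data.

```python
# Minimal sketch of taxonomy-style classification: each Grid workflow
# system is tagged along several design axes, making similarities and
# differences queryable. (Axes and entries are illustrative only.)
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkflowSystem:
    name: str
    structure: str      # e.g. "DAG" vs "non-DAG" workflow structure
    specification: str  # e.g. "abstract" vs "concrete" workflow model
    scheduling: str     # e.g. "centralized" vs "decentralized"

systems = [
    WorkflowSystem("SystemA", "DAG", "abstract", "centralized"),
    WorkflowSystem("SystemB", "non-DAG", "concrete", "decentralized"),
    WorkflowSystem("SystemC", "DAG", "concrete", "centralized"),
]

# Grouping by one axis surfaces the design similarities the survey discusses.
dag_systems = [s.name for s in systems if s.structure == "DAG"]
print(dag_systems)
```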
Representation and use of chemistry in the global electronic age.
We present an overview of the current state of public semantic chemistry and propose new approaches at both a strategic and a detailed level. We show by example how a model for a Chemical Semantic Web can be constructed using machine-processed data and information from journal articles. This manuscript addresses questions of robotic access to data and its automatic re-use, including the role of Open Access archival of data. This is a pre-refereed preprint allowed by the publisher's (Royal Soc. Chemistry) Green policy. The author's preferred manuscript is an HTML hyperdocument with ca. 20 links to images, some of which are JPEGs and some of which are SVG (scalable vector graphics), including animations. There are also links to molecules in CML, for which the Jmol viewer is recommended. We suggest that readers who wish to see the full glory of the manuscript download the zipped version and unpack it on their machine. We also supply PDF and DOC (Word) versions, which obviously cannot show the animations, but which may be the best place to start, particularly for those more interested in the text.
AXEL: A framework to deal with ambiguity in three-noun compounds
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University, 6/12/2010. Cognitive Linguistics has been widely used to deal with the ambiguity generated by words in combination. Although this domain offers many solutions to address this challenge, not all of them can be implemented in a computational environment. The Dynamic Construal of Meaning framework is argued to have this ability because it describes an intrinsic degree of association of meanings, which, in turn, can be translated into computational programs. A limitation towards a computational approach, however, has been the lack of syntactic parameters. This research argues that this limitation could be overcome with the aid of the Generative Lexicon Theory (GLT). Specifically, this dissertation formulated possible means to marry the GLT and Cognitive Linguistics in a novel rapprochement between the two.
This bond between opposing theories provided the means to design a computational template (the AXEL System) by realising syntax and semantics at software levels. An instance of the AXEL system was created using a Design Research approach. Planned iterations were involved in the development to improve artefact performance; these iterations boosted performance in accounting for the degree of association of meanings in three-noun compounds.
This dissertation delivered three major contributions on the brink of a so-called turning point in Computational Linguistics (CL). First, the AXEL system was used to disclose hidden lexical patterns of ambiguity. These patterns are difficult, if not impossible, to identify without automatic techniques. This research claimed that these patterns can assist linguists in reviewing lexical knowledge from a software-based viewpoint.
Following linguistic awareness, the second result advocated for the adoption of improved resources by decreasing the electronic space of Sense Enumerative Lexicons (SELs). The AXEL system generated “at the moment of use” interpretations, optimising the space needed for lexical storage.
Finally, this research introduced a subsystem of metrics to characterise the ambiguous degree of association of three-noun compounds, enabling ranking methods. Weighting methods delivered mechanisms for classifying meanings towards Word Sense Disambiguation (WSD). Overall, these results attempted to tackle difficulties in understanding studies of Lexical Semantics via software tools.
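The ranking idea in the final contribution, scoring the two possible bracketings of a three-noun compound by association strength, can be sketched compactly. The scores and the combination rule below are hypothetical placeholders, not AXEL's actual metrics.

```python
# Minimal sketch: rank the two bracketings of a three-noun compound by
# pairwise association strength. Left-branching ((n1 n2) n3) competes
# with right-branching (n1 (n2 n3)). Scores are invented for illustration.
assoc = {
    ("cotton", "candy"): 0.9,
    ("candy", "machine"): 0.6,
    ("cotton", "machine"): 0.1,
}

def rank_bracketings(n1, n2, n3):
    left = assoc.get((n1, n2), 0.0)    # (n1 n2) is the inner unit
    right = assoc.get((n2, n3), 0.0)   # (n2 n3) is the inner unit
    return sorted([("left", left), ("right", right)],
                  key=lambda p: p[1], reverse=True)

print(rank_bracketings("cotton", "candy", "machine"))
# → [('left', 0.9), ('right', 0.6)]
```

The ranked scores then feed naturally into WSD-style classification: the gap between the two bracketings' scores gives a graded measure of how ambiguous the compound is.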
Data Transformation and Semantic Log Purging for Process Mining
Existing process mining approaches are able to tolerate a certain degree of noise in the process log. However, processes that contain infrequent paths, multiple (nested) parallel branches, or have been changed in an ad-hoc manner still pose major challenges. For such cases, process mining typically returns "spaghetti models" that are hardly usable even as a starting point for process (re-)design. In this paper, we address these challenges by introducing data transformation and pre-processing steps that improve and ensure the quality of mined models for existing process mining approaches. We propose the concept of semantic log purging, the cleaning of logs based on domain-specific constraints utilizing semantic knowledge which typically complements processes. Furthermore, we demonstrate the feasibility and effectiveness of the approach based on a case study in the higher education domain. We believe that semantic log purging will enable process mining to yield better results, thus giving process (re-)designers a valuable tool.
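The purging step itself reduces to filtering traces against domain constraints before mining. A minimal sketch follows; the event names and the ordering constraint (fitting the paper's higher-education case study only loosely) are assumptions for illustration.

```python
# Minimal sketch of semantic log purging: traces violating a
# domain-specific ordering constraint are removed before mining, instead
# of being tolerated as noise. (Event names and the rule are assumed.)
log = [
    ["register", "attend_course", "take_exam"],
    ["register", "take_exam", "attend_course"],   # exam before course: invalid
    ["register", "attend_course", "take_exam"],
]

def satisfies(trace, before, after):
    """Domain rule: every `after` event must follow some `before` event."""
    seen_before = False
    for event in trace:
        if event == before:
            seen_before = True
        elif event == after and not seen_before:
            return False
    return True

purged = [t for t in log if satisfies(t, "attend_course", "take_exam")]
print(len(purged))  # → 2
```

Unlike frequency-based noise filtering, this keeps rare but semantically valid paths and drops frequent but impossible ones, which is exactly what the infrequent-path cases above require.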
A Techno-Social Approach for Achieving Online Readership Popularity
Understanding what drives readership popularity in online interactive media has important implications for individual practitioners and net-enabled organizations. For instance, it helps generate a success “formula” for designing potentially popular websites in the increasingly competitive online world. So far, research in this area lacks a unified approach to guiding the design of online interactive media, as well as to predicting their successful adoption and use, from both technological and social orientations. Drawing upon the media success literature and related social cognition theories, we establish a techno-social model for achieving online readership popularity, accounting for the impacts of technology-dependent and media-embedded characteristics. The proposed model and hypotheses will be tested by a content analysis of 100+ very popular weblogs and a survey of 2000+ active weblog readers. This research carries significant value for sustaining community- and firm-based user networks, which have been recognized as an important source of social and knowledge capital.
Lexically specific knowledge and individual differences in adult native speakers’ processing of the English passive
This article provides experimental evidence for the role of lexically specific representations in the processing of passive sentences and considerable education-related differences in comprehension of the passive construction. The experiment measured response time and decision accuracy of participants with high and low academic attainment using an online task that compared processing and comprehension of active and passive sentences containing verbs strongly associated with the passive and active constructions, as determined by collostructional analysis. As predicted by usage-based accounts, participants’ performance was influenced by frequency (both groups processed actives faster than passives; the low academic attainment participants also made significantly more errors on passive sentences) and lexical specificity (i.e., processing of passives was slower with verbs strongly associated with the active). Contrary to proposals made by Dąbrowska and Street (2006), the results suggest that all participants have verb-specific as well as verb-general representations, but that the latter are not as entrenched in the participants with low academic attainment, resulting in less reliable performance. The results also show no evidence of a speed–accuracy trade-off, making alternative accounts of the results (e.g., those of two-stage processing models, such as Townsend & Bever, 2001) problematic.
BlogForever D2.4: Weblog spider prototype and associated methodology
The purpose of this document is to present the evaluation of different solutions for capturing blogs, the established methodology, and the developed blog spider prototype.
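One concrete step any blog spider needs is feed autodiscovery: finding the RSS/Atom feed URL a blog declares in its page head. The sketch below shows that step using only the standard library; it is an illustration of the general technique, not the BlogForever prototype itself.

```python
# Minimal sketch of RSS/Atom feed autodiscovery: scan a blog page's
# <link rel="alternate"> tags for feed MIME types and collect the hrefs.
# (Illustrative; not the BlogForever spider implementation.)
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link" and a.get("rel") == "alternate"
                and a.get("type") in FEED_TYPES and "href" in a):
            self.feeds.append(a["href"])

page = ('<html><head><link rel="alternate" type="application/rss+xml" '
        'href="/feed.xml"></head><body>post</body></html>')
finder = FeedFinder()
finder.feed(page)
print(finder.feeds)  # → ['/feed.xml']
```

A full capture pipeline would then fetch the discovered feed, parse its entries, and schedule re-visits, which is where the methodology the deliverable describes comes in.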