Identifying Web Tables - Supporting a Neglected Type of Content on the Web
The abundance of data on the Internet facilitates the improvement of
extraction and processing tools. The trend in open data publishing
encourages the adoption of structured formats such as CSV and RDF. However, there
is still a plethora of unstructured data on the Web which we assume carries
semantics. For this reason, we propose an approach to derive semantics from web
tables, which remain the most popular publishing tool on the Web. The paper
also discusses methods and services for unstructured data extraction and
processing, as well as machine learning techniques to enhance such a workflow.
The eventual result is a framework to process, publish, and visualize linked
open data. The software enables table extraction from various open data
sources in HTML format and automatic export to RDF, making the data linked.
The paper also evaluates machine learning techniques in conjunction with
string similarity functions applied to a table recognition task.
Comment: 9 pages, 4 figures
MEDQUAL: Improving Medical Web Search over Time with Dynamic Credibility Heuristics
Performing a search on the World Wide Web (WWW) and traversing the
resulting links is an adventure in which one encounters both credible
and incredible web pages. Search engines, such as Google, rely on
macroscopic Web topology patterns and even highly ranked 'authoritative'
web sites may be a mixture of informed and uninformed opinions. Without
credibility heuristics to guide the user in a maze of facts, assertions,
and inferences, the Web remains an ineffective knowledge delivery
platform. This report presents the design and implementation of a
modular extension to the popular Google search engine, MEDQUAL, which
provides both URL- and content-based heuristic credibility rules to
reorder raw Google rankings in the medical domain. MEDQUAL, a software
system written in Java, starts with a bootstrap configuration file which
loads in basic heuristics in XML format. It then provides a subscription
mechanism so users can join birds-of-a-feather specialty groups, for
example Pediatrics, in order to load specialized heuristics as well. The
platform features a coordination mechanism whereby information seekers
can effectively become secondary authors, contributing by consensus vote
additional credibility heuristics. MEDQUAL uses standard XML namespace
conventions to divide opinion groups so that competing groups can be
supported simultaneously. The net effect is a merger of basic and
supplied heuristics so that the system continues to adapt and improve
itself over time to changing web content, changing opinions, and new
opinion groups. The key goal of leveraging the intelligence of a
large-scale and diffuse WWW user community is met, and we conclude by
discussing our plans to develop MEDQUAL further and evaluate it.
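The re-ranking mechanism described above, XML-encoded rules that adjust raw search rankings, can be sketched briefly. The rule schema, host patterns, weights, and scoring formula below are assumptions for illustration, not MEDQUAL's actual format (and the sketch is in Python rather than MEDQUAL's Java).

```python
# Minimal sketch of URL-based credibility re-ranking: heuristic rules are
# loaded from XML, each assigning a weight to a host pattern, and raw
# results are re-sorted by rank position minus the matched weight.
# (Rule schema and weights are hypothetical.)
import xml.etree.ElementTree as ET

RULES_XML = """
<heuristics>
  <rule host="nih.gov" weight="3"/>
  <rule host="mayoclinic.org" weight="2"/>
  <rule host="example-forum.com" weight="-2"/>
</heuristics>
"""

def load_rules(xml_text):
    root = ET.fromstring(xml_text)
    return {r.get("host"): int(r.get("weight")) for r in root.findall("rule")}

def reorder(urls, rules):
    # Lower score is better: start from the raw rank position and subtract
    # any credibility weight whose host pattern appears in the URL.
    def score(item):
        rank, url = item
        boost = sum(w for host, w in rules.items() if host in url)
        return rank - boost
    return [url for rank, url in sorted(enumerate(urls), key=score)]

raw = ["http://example-forum.com/cure", "http://www.nih.gov/asthma",
       "http://blog.example.net/post"]
print(reorder(raw, load_rules(RULES_XML)))
```

The subscription mechanism would then amount to merging several such rule dictionaries, one per specialty group, before scoring.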
A Taxonomy of Workflow Management Systems for Grid Computing
With the advent of Grid and application technologies, scientists and
engineers are building more and more complex applications to manage and process
large data sets, and execute scientific experiments on distributed resources.
Such application scenarios require means for composing and executing complex
workflows. Therefore, many efforts have been made towards the development of
workflow management systems for Grid computing. In this paper, we propose a
taxonomy that characterizes and classifies various approaches for building and
executing workflows on Grids. We also survey several representative Grid
workflow systems developed by various projects world-wide to demonstrate the
comprehensiveness of the taxonomy. The taxonomy not only highlights the design
and engineering similarities and differences of state-of-the-art in Grid
workflow systems, but also identifies the areas that need further research.
Comment: 29 pages, 15 figures
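A taxonomy of this kind is, in effect, a set of classification axes applied uniformly to each surveyed system. A small sketch shows the idea; the dimension names and system entries below are illustrative placeholders, not the paper's actual categories or survey data.

```python
# Minimal sketch of taxonomy-style classification: each Grid workflow
# system is tagged along several design axes, making similarities and
# differences queryable. (Axes and entries are illustrative only.)
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkflowSystem:
    name: str
    structure: str      # e.g. "DAG" vs "non-DAG" workflow structure
    specification: str  # e.g. "abstract" vs "concrete" workflow model
    scheduling: str     # e.g. "centralized" vs "decentralized"

systems = [
    WorkflowSystem("SystemA", "DAG", "abstract", "centralized"),
    WorkflowSystem("SystemB", "non-DAG", "concrete", "decentralized"),
    WorkflowSystem("SystemC", "DAG", "concrete", "centralized"),
]

# Grouping by one axis surfaces the design similarities the survey discusses.
dag_systems = [s.name for s in systems if s.structure == "DAG"]
print(dag_systems)
```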
Representation and use of chemistry in the global electronic age.
We present an overview of the current state of public semantic chemistry and propose new approaches at both a strategic and a detailed level. We show by example how a model for a Chemical Semantic Web can be constructed using machine-processed data and information from journal articles. This manuscript addresses questions of robotic access to data and its automatic re-use, including the role of Open Access archival of data. This is a pre-refereed preprint allowed by the publisher's (Royal Soc. Chemistry) Green policy. The author's preferred manuscript is an HTML hyperdocument with ca. 20 links to images, some of which are JPEGs and some of which are SVG (scalable vector graphics), including animations. There are also links to molecules in CML, for which the Jmol viewer is recommended. We suggest that readers who wish to see the full glory of the manuscript download the zipped version and unpack it on their machine. We also supply PDF and DOC (Word) versions, which obviously cannot show the animations, but which may be the best place to start, particularly for those more interested in the text.
AXEL: A framework to deal with ambiguity in three-noun compounds
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University, 6/12/2010. Cognitive Linguistics has been widely used to deal with the ambiguity generated by words in combination. Although this domain offers many solutions to address this challenge, not all of them can be implemented in a computational environment. The Dynamic Construal of Meaning framework is argued to have this ability because it describes an intrinsic degree of association of meanings, which, in turn, can be translated into computational programs. A limitation towards a computational approach, however, has been the lack of syntactic parameters. This research argues that this limitation could be overcome with the aid of the Generative Lexicon Theory (GLT). Specifically, this dissertation formulated possible means to marry the GLT and Cognitive Linguistics in a novel rapprochement between the two.
This bond between opposing theories provided the means to design a computational template (the AXEL System) by realising syntax and semantics at software levels. An instance of the AXEL system was created using a Design Research approach. Planned iterations were involved in the development to improve artefact performance; these iterations boosted performance in accounting for the degree of association of meanings in three-noun compounds.
This dissertation delivered three major contributions on the brink of a so-called turning point in Computational Linguistics (CL). First, the AXEL system was used to disclose hidden lexical patterns of ambiguity. These patterns are difficult, if not impossible, to identify without automatic techniques. This research claimed that these patterns can assist linguists in reviewing lexical knowledge from a software-based viewpoint.
Following linguistic awareness, the second result advocated for the adoption of improved resources by decreasing the electronic space of Sense Enumerative Lexicons (SELs). The AXEL system generated “at the moment of use” interpretations, optimising the space needed for lexical storage.
Finally, this research introduced a subsystem of metrics to characterise the ambiguous degree of association of three-noun compounds, enabling ranking methods. Weighting methods delivered mechanisms for classifying meanings towards Word Sense Disambiguation (WSD). Overall, these results attempted to tackle difficulties in understanding studies of Lexical Semantics via software tools.
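The ranking idea in the final contribution, scoring the two possible bracketings of a three-noun compound by association strength, can be sketched compactly. The scores and the combination rule below are hypothetical placeholders, not AXEL's actual metrics.

```python
# Minimal sketch: rank the two bracketings of a three-noun compound by
# pairwise association strength. Left-branching ((n1 n2) n3) competes
# with right-branching (n1 (n2 n3)). Scores are invented for illustration.
assoc = {
    ("cotton", "candy"): 0.9,
    ("candy", "machine"): 0.6,
    ("cotton", "machine"): 0.1,
}

def rank_bracketings(n1, n2, n3):
    left = assoc.get((n1, n2), 0.0)    # (n1 n2) is the inner unit
    right = assoc.get((n2, n3), 0.0)   # (n2 n3) is the inner unit
    return sorted([("left", left), ("right", right)],
                  key=lambda p: p[1], reverse=True)

print(rank_bracketings("cotton", "candy", "machine"))
# → [('left', 0.9), ('right', 0.6)]
```

The ranked scores then feed naturally into WSD-style classification: the gap between the two bracketings' scores gives a graded measure of how ambiguous the compound is.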
Data Transformation and Semantic Log Purging for Process Mining
Existing process mining approaches are able to tolerate a certain degree of noise in the process log. However, processes that contain infrequent paths, multiple (nested) parallel branches, or have been changed in an ad-hoc manner still pose major challenges. For such cases, process mining typically returns "spaghetti models" that are hardly usable even as a starting point for process (re-)design. In this paper, we address these challenges by introducing data transformation and pre-processing steps that improve and ensure the quality of mined models for existing process mining approaches. We propose the concept of semantic log purging, the cleaning of logs based on domain-specific constraints utilizing semantic knowledge which typically complements processes. Furthermore, we demonstrate the feasibility and effectiveness of the approach based on a case study in the higher education domain. We believe that semantic log purging will enable process mining to yield better results, thus giving process (re-)designers a valuable tool.
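The purging step itself reduces to filtering traces against domain constraints before mining. A minimal sketch follows; the event names and the ordering constraint (fitting the paper's higher-education case study only loosely) are assumptions for illustration.

```python
# Minimal sketch of semantic log purging: traces violating a
# domain-specific ordering constraint are removed before mining, instead
# of being tolerated as noise. (Event names and the rule are assumed.)
log = [
    ["register", "attend_course", "take_exam"],
    ["register", "take_exam", "attend_course"],   # exam before course: invalid
    ["register", "attend_course", "take_exam"],
]

def satisfies(trace, before, after):
    """Domain rule: every `after` event must follow some `before` event."""
    seen_before = False
    for event in trace:
        if event == before:
            seen_before = True
        elif event == after and not seen_before:
            return False
    return True

purged = [t for t in log if satisfies(t, "attend_course", "take_exam")]
print(len(purged))  # → 2
```

Unlike frequency-based noise filtering, this keeps rare but semantically valid paths and drops frequent but impossible ones, which is exactly what the infrequent-path cases above require.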
A Techno-Social Approach for Achieving Online Readership Popularity
Understanding what drives readership popularity in online interactive media has important implications for individual practitioners and net-enabled organizations. For instance, it helps generate a success “formula” for designing potentially popular websites in the increasingly competitive online world. So far, research in this area lacks a unified approach to guiding the design of online interactive media, as well as to predicting their successful adoption and use, from both technological and social orientations. Drawing upon the media success literature and related social cognition theories, we establish a techno-social model for achieving online readership popularity, accounting for the impacts of technology-dependent and media-embedded characteristics. The proposed model and hypotheses will be tested by a content analysis of 100+ very popular weblogs and a survey of 2000+ active weblog readers. This research carries significant value for sustaining community- and firm-based user networks, which have been recognized as an important source of social and knowledge capital.
Lexically specific knowledge and individual differences in adult native speakers’ processing of the English passive
This article provides experimental evidence for the role of lexically specific representations in the processing of passive sentences and considerable education-related differences in comprehension of the passive construction. The experiment measured response time and decision accuracy of participants with high and low academic attainment using an online task that compared processing and comprehension of active and passive sentences containing verbs strongly associated with the passive and active constructions, as determined by collostructional analysis. As predicted by usage-based accounts, participants’ performance was influenced by frequency (both groups processed actives faster than passives; the low academic attainment participants also made significantly more errors on passive sentences) and lexical specificity (i.e., processing of passives was slower with verbs strongly associated with the active). Contrary to proposals made by Dąbrowska and Street (2006), the results suggest that all participants have verb-specific as well as verb-general representations, but that the latter are not as entrenched in the participants with low academic attainment, resulting in less reliable performance. The results also show no evidence of a speed–accuracy trade-off, making alternative accounts of the results (e.g., those of two-stage processing models, such as Townsend & Bever, 2001) problematic.
BlogForever D2.4: Weblog spider prototype and associated methodology
The purpose of this document is to present the evaluation of different solutions for capturing blogs, the established methodology, and the developed blog spider prototype.
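One concrete step any blog spider needs is feed autodiscovery: finding the RSS/Atom feed URL a blog declares in its page head. The sketch below shows that step using only the standard library; it is an illustration of the general technique, not the BlogForever prototype itself.

```python
# Minimal sketch of RSS/Atom feed autodiscovery: scan a blog page's
# <link rel="alternate"> tags for feed MIME types and collect the hrefs.
# (Illustrative; not the BlogForever spider implementation.)
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link" and a.get("rel") == "alternate"
                and a.get("type") in FEED_TYPES and "href" in a):
            self.feeds.append(a["href"])

page = ('<html><head><link rel="alternate" type="application/rss+xml" '
        'href="/feed.xml"></head><body>post</body></html>')
finder = FeedFinder()
finder.feed(page)
print(finder.feeds)  # → ['/feed.xml']
```

A full capture pipeline would then fetch the discovered feed, parse its entries, and schedule re-visits, which is where the methodology the deliverable describes comes in.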