52 research outputs found
Using Ontologies for Extracting Product Features from Web Pages
Abstract. In this paper, we show how to use ontologies to bootstrap a knowledge acquisition process that extracts product information from tabular data on Web pages. Furthermore, we use logical rules to reason about product specific properties and to derive higher-order knowledge about product features. We will also explain the knowledge acquisition process, covering both ontological and procedural aspects. Finally, we will give an qualitative and quantitative evaluation of our results.
Learning from text-based close call data
A key feature of big data is the variety of data sources that are available; which include not just numerical data but also image or video data or even free text. The GB railways collects a large volume of free text data daily from railway workers describing close call hazard reports: instances where an accident could have – but did not – occur. These close call reports contain valuable safety information which could be useful in managing safety on the railway, but which can be lost in the very large volume of data – much larger than is viable for a human analyst to read. This paper describes the application of rudimentary natural language processing (NLP) techniques to uncover safety information from close calls. The analysis has proven that basic information extraction is possible using the rudimentary techniques, but has also identified some limitations that arise using only basic techniques. Using these findings further research in this area intends to look at how the techniques that have been proven to date can be improved with the use of more advanced NLP techniques coupled with machine-learning
Automatic location and separation of records: A case study in the genealogical domain
Abstract. Locating specific chunks (records) of information within documents on the web is an interesting and nontrivial problem. If the problem of locating and separating records can be solved well, the longstanding problem of grouping extracted values into appropriate relationships in a record structure can be more easily resolved. Our solution is a hybrid of two well established techniques: (1) ontology-based extraction [ECJ + 99] and (2) vector space modeling [SM83]. To show that the technique has merit, we apply it to the particularly challenging task of locating and separating records for genealogical web documents, which tend to vary considerably in layout and format. Experiments we have conducted show this technique yields an average of 92 % recall and 93 % precision for locating and separating genealogical records in web documents.
- …