Automatic extraction of knowledge from web documents
Much of the digital information available today is written as text documents, such as web pages, reports, papers and emails. Extracting the knowledge of interest from such documents, drawn from multiple sources, in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system, which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology specifies the type and form of knowledge to extract. The extracted knowledge is then used to generate tailored biographies. This paper details and evaluates Artequakt's information extraction process.
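To make ontology-driven extraction concrete, the sketch below fills the slots of a hypothetical ontology fragment for an artist from several documents. It is an illustration under stated assumptions, not Artequakt's pipeline: the slot names (`date_of_birth`, `place_of_birth`) and their regular-expression patterns are invented for this example, and regexes stand in for the natural language tools the system actually uses.

```python
import re
from dataclasses import dataclass, field

# Hypothetical ontology fragment: each slot of an assumed "Artist" concept is
# paired with a simple pattern. Artequakt uses natural language tools; these
# regular expressions are only a stand-in for illustration.
ARTIST_SLOTS = {
    "date_of_birth": re.compile(r"born (?:on )?(\d{1,2} \w+ \d{4})"),
    "place_of_birth": re.compile(r"born (?:on \d{1,2} \w+ \d{4} )?in ([A-Z][a-z]+)"),
}


@dataclass
class ArtistRecord:
    name: str
    facts: dict = field(default_factory=dict)  # slot -> list of extracted values


def extract(name: str, documents: list[str]) -> ArtistRecord:
    """Collect every value matched for every ontology slot across all documents."""
    record = ArtistRecord(name)
    for doc in documents:
        for slot, pattern in ARTIST_SLOTS.items():
            for match in pattern.finditer(doc):
                record.facts.setdefault(slot, []).append(match.group(1))
    return record


docs = [
    "Rembrandt was born on 15 July 1606 in Leiden.",
    "The painter was born in Leiden and later moved to Amsterdam.",
]
print(extract("Rembrandt", docs).facts)
```

Values are kept per source rather than merged, so the same fact found in two documents appears twice; reconciling such duplicates is a separate consolidation step.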
Methodological considerations concerning manual annotation of musical audio in function of algorithm development
Research on musical audio-mining needs annotated music databases for developing computational tools that extract, from the musical audio stream, the kind of high-level content that users can work with in Music Information Retrieval (MIR) contexts. However, the notion of musical content, and therefore the notion of annotation, is ill-defined in both the syntactic and the semantic sense. As a consequence, annotation has been approached from a variety of perspectives (mainly linguistic-symbolic ones), and a general methodology is lacking. This paper is a step towards defining a general framework for manual annotation of musical audio in support of a computational approach to musical audio-mining based on algorithms that learn from annotated data.
Web based knowledge extraction and consolidation for automatic ontology instantiation
The Web is probably the largest and richest information repository available today. Search engines are the common access routes to this valuable source. However, the role of these search engines is often limited to the retrieval of lists of potentially relevant documents. The burden of analysing the returned documents and identifying the knowledge of interest is therefore left to the user. The Artequakt system aims to deploy natural language tools to automatically extract and consolidate knowledge from web documents and instantiate a given ontology, which dictates the type and form of knowledge to extract. Artequakt focuses on the domain of artists, and uses the harvested knowledge to generate tailored biographies. This paper describes the latest developments of the system and discusses the problem of knowledge consolidation.
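Knowledge consolidation arises because different documents report the same fact in slightly different forms, or report conflicting values. As a minimal sketch, assuming a simple majority vote over normalised values (a swapped-in technique, not necessarily how Artequakt consolidates), duplicates can be merged per ontology slot like this. The `normalise` rule is hypothetical.

```python
from collections import Counter


def normalise(value: str) -> str:
    # Hypothetical normalisation: case-fold and collapse whitespace so that
    # "Leiden" and "leiden " are counted as the same fact.
    return " ".join(value.lower().split())


def consolidate(facts: dict[str, list[str]]) -> dict[str, str]:
    """For each slot, keep the (normalised) value reported by the most sources."""
    consolidated = {}
    for slot, values in facts.items():
        if not values:
            continue  # nothing extracted for this slot
        counts = Counter(normalise(v) for v in values)
        best, _ = counts.most_common(1)[0]
        consolidated[slot] = best
    return consolidated


facts = {
    "place_of_birth": ["Leiden", "leiden", "Amsterdam"],
    "date_of_birth": ["15 July 1606"],
}
print(consolidate(facts))
```

A vote is the simplest conflict-resolution rule; a real system would likely also weigh source reliability and handle values that are equivalent but not string-identical (for example, different date formats).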
Facilitating file retrieval on resource limited devices
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The rapid development of mobile technologies has enabled users to generate and store files on mobile devices. However, searching efficiently and effectively for files of interest has become a challenge in a mobile environment that involves a large number of mobile nodes. In this thesis, file management and retrieval alternatives are investigated in order to propose a feasible framework that can be deployed on resource-limited devices without altering their operating systems. The file annotation and retrieval framework (FARM) proposed in the thesis automatically annotates files with their basic attributes, extracting them from the device's underlying operating system. As a case study, the framework is implemented on the JME platform. It provides features for managing metadata and for searching files both on the device itself and on other devices in a networked environment. As the experimental analysis demonstrates, FARM not only automates the file-search process but also returns accurate results.
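The core idea of annotating files with attributes the operating system already stores can be sketched in a few lines. This is a hedged illustration only: FARM targets the JME platform, whereas this sketch uses Python's `pathlib` and `stat` results on a desktop OS, and the record fields and the `search` helper are hypothetical choices, not FARM's metadata schema.

```python
import tempfile
import time
from pathlib import Path


def annotate(path: Path) -> dict:
    """Build a metadata record from attributes the OS already keeps for a file."""
    st = path.stat()
    return {
        "name": path.name,
        "extension": path.suffix.lower().lstrip("."),
        "size_bytes": st.st_size,
        "modified": time.strftime("%Y-%m-%d", time.localtime(st.st_mtime)),
    }


def build_index(root: Path) -> list[dict]:
    """Annotate every regular file under root, recursively."""
    return [annotate(p) for p in sorted(root.rglob("*")) if p.is_file()]


def search(index: list[dict], **criteria) -> list[str]:
    """Return names of files whose metadata matches all given attribute values."""
    return [rec["name"] for rec in index
            if all(rec.get(k) == v for k, v in criteria.items())]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.txt").write_text("meeting notes")
    (root / "photo.JPG").write_bytes(b"\xff\xd8")
    index = build_index(root)

print(search(index, extension="jpg"))
```

Because the annotations come straight from file-system attributes, no user input or OS modification is needed, which is the property the abstract emphasises for resource-limited devices.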
To facilitate file search and take advantage of Semantic Web technologies, the thesis proposes the SemFARM framework, which uses the knowledge of a generic ontology. The generic ontology defines the most common keywords that can serve as metadata for stored files. This provides semantic file search on low-end devices, where search keywords are enriched with additional knowledge extracted from the ontology. Existing frameworks annotate only image files, whereas SemFARM can annotate files of any type.
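Keyword enrichment of this kind amounts to query expansion: a search term is widened with the terms the ontology relates to it before being matched against file metadata. The sketch below assumes a tiny hypothetical ontology expressed as a dictionary of concepts and related terms; it is not SemFARM's generic ontology, and a real implementation would query an ontology model rather than a dictionary.

```python
# Hypothetical generic ontology fragment: each concept maps to terms the
# ontology relates to it (synonyms or narrower terms). Not SemFARM's ontology.
ONTOLOGY = {
    "photo": {"image", "picture", "jpg", "png"},
    "music": {"audio", "song", "mp3"},
    "document": {"text", "report", "pdf", "doc"},
}


def expand(keywords: set[str]) -> set[str]:
    """Enrich search keywords with every term linked to a matching concept."""
    terms = {k.lower() for k in keywords}
    enriched = set(terms)
    for concept, related in ONTOLOGY.items():
        # Match either the concept itself or any of its related terms.
        if concept in terms or terms & related:
            enriched.add(concept)
            enriched |= related
    return enriched


def semantic_search(files: dict[str, set[str]], keywords: set[str]) -> list[str]:
    """Return files whose metadata keywords overlap the enriched query."""
    query = expand(keywords)
    return sorted(name for name, tags in files.items() if tags & query)


files = {
    "holiday.jpg": {"jpg", "holiday"},
    "thesis.pdf": {"pdf", "thesis"},
    "track01.mp3": {"mp3"},
}
print(semantic_search(files, {"photo"}))
```

A plain keyword search for "photo" would find nothing here, since no file is tagged with that word; the enriched query reaches `holiday.jpg` through the related term `jpg`.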
Semantic heterogeneity is a challenging issue that requires extensive research if the aim of the Semantic Web is to be achieved. For this reason, significant research effort has gone into ontology alignment in recent years, and a large number of alignment systems have been proposed to deal with ontology heterogeneity.
In the process of aligning different ontologies, mapping decisions need to take semantic, structural or other system-specific measures into account to produce more accurate alignments. The ontology alignment solution proposed in this thesis presents a structural matcher, which computes the similarity between the super-classes, sub-classes and properties of two entities from the ontologies being aligned. The proposed alignment system (OARS) uses Rough Sets to aggregate the results obtained from various matchers in order to deal with uncertainty when mapping entities. OARS takes a combined approach, using a string-based matcher and a linguistic-based matcher in addition to the structural matcher to compute the overall similarity between two entities. The performance of OARS is evaluated against existing state-of-the-art alignment systems in terms of precision and recall. The tests use benchmark ontologies, and the results show significant improvements, particularly in recall, on all groups of test ontologies. No existing framework uses alignments for file search on mobile devices.
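The matcher combination and its evaluation can be sketched as follows, under several loudly stated assumptions. Structural similarity is approximated here as the mean Jaccard overlap of the names of two entities' super-classes, sub-classes and properties; the string-based matcher uses the standard library's `difflib.SequenceMatcher`; the linguistic-based matcher is omitted. Most importantly, OARS aggregates matcher results with Rough Sets, which this sketch does not reproduce: a plain weighted average (`w_struct`, a hypothetical parameter) stands in for it. Precision and recall use their standard definitions against a reference alignment.

```python
from difflib import SequenceMatcher


def jaccard(a: set[str], b: set[str]) -> float:
    # Two empty sets give no evidence either way, so score 0.0.
    return len(a & b) / len(a | b) if (a or b) else 0.0


def structural_sim(e1: dict, e2: dict) -> float:
    """Mean overlap of super-classes, sub-classes and properties."""
    keys = ("supers", "subs", "props")
    return sum(jaccard(e1[k], e2[k]) for k in keys) / len(keys)


def string_sim(n1: str, n2: str) -> float:
    return SequenceMatcher(None, n1.lower(), n2.lower()).ratio()


def combined_sim(n1, e1, n2, e2, w_struct=0.5) -> float:
    # Weighted average: a simple stand-in for OARS's Rough Set aggregation.
    return w_struct * structural_sim(e1, e2) + (1 - w_struct) * string_sim(n1, n2)


def align(o1: dict, o2: dict, threshold=0.5) -> set:
    """Map each o1 entity to its best o2 match if the score reaches the threshold."""
    alignment = set()
    for n1, e1 in o1.items():
        score, best = max((combined_sim(n1, e1, n2, e2), n2) for n2, e2 in o2.items())
        if score >= threshold:
            alignment.add((n1, best))
    return alignment


def precision_recall(found: set, reference: set) -> tuple:
    tp = len(found & reference)  # correct correspondences
    precision = tp / len(found) if found else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall


o1 = {
    "Person": {"supers": {"Agent"}, "subs": {"Student"}, "props": {"name", "birthDate"}},
    "Student": {"supers": {"Person"}, "subs": set(), "props": {"name", "studentId"}},
}
o2 = {
    "Human": {"supers": {"Agent"}, "subs": {"Student"}, "props": {"name", "birthDate"}},
    "Student": {"supers": {"Human"}, "subs": set(), "props": {"name", "studentId", "school"}},
}
reference = {("Person", "Human"), ("Student", "Student")}
print(precision_recall(align(o1, o2), reference))
```

The example shows why structure matters: "Person" and "Human" share almost no characters, so a string matcher alone would miss them, but their identical super-class, sub-class and properties pull the combined score over the 0.5 threshold. Raising the threshold to 0.6 drops that pair, which lowers recall while precision stays at 1.0.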
The ontology alignment paradigm is integrated into SemFARM to further enhance the framework's file search features, as it uses the knowledge of more than one ontology to perform a search query. The experimental evaluations show that it performs better in terms of precision and recall when more than one ontology is available for searching for a required file.
Education Commission of Pakistan and the University of Engineering & Technology, Peshawar
Semantic Content Mediation and Acquisition: The Challenge for Semantic e-Business Solutions
A Top Quadrant report situates the Semantic Web within the current Innovation Wave of “Distributed Intelligence”. This is one of the main innovation waves of recent centuries, which include textiles, railways, automobiles, computers, distributed intelligence (1997-2061) and nanotechnology (2007-2081). The Distributed Intelligence wave started in the late 1990s and is expected to peak between 2010 and 2020. The report estimates the first returns on investment in 2006-7, growing to a market of $40-60 billion in 2010. Funding currently comes primarily from governments, venture capitalists and industry commercialization; over the next few years, this balance is expected to shift in favour of industry commercialization.
- …