3,551 research outputs found

    Advanced Knowledge Technologies at the Midterm: Tools and Methods for the Semantic Web

    Get PDF
    The University of Edinburgh and research sponsors are authorised to reproduce and distribute reprints and on-line copies for their purposes notwithstanding any copyright annotation hereon. The views and conclusions contained herein are the author’s and shouldn’t be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of other parties.In a celebrated essay on the new electronic media, Marshall McLuhan wrote in 1962:Our private senses are not closed systems but are endlessly translated into each other in that experience which we call consciousness. Our extended senses, tools, technologies, through the ages, have been closed systems incapable of interplay or collective awareness. Now, in the electric age, the very instantaneous nature of co-existence among our technological instruments has created a crisis quite new in human history. Our extended faculties and senses now constitute a single field of experience which demands that they become collectively conscious. Our technologies, like our private senses, now demand an interplay and ratio that makes rational co-existence possible. As long as our technologies were as slow as the wheel or the alphabet or money, the fact that they were separate, closed systems was socially and psychically supportable. This is not true now when sight and sound and movement are simultaneous and global in extent. (McLuhan 1962, p.5, emphasis in original)Over forty years later, the seamless interplay that McLuhan demanded between our technologies is still barely visible. McLuhan’s predictions of the spread, and increased importance, of electronic media have of course been borne out, and the worlds of business, science and knowledge storage and transfer have been revolutionised. Yet the integration of electronic systems as open systems remains in its infancy.Advanced Knowledge Technologies (AKT) aims to address this problem, to create a view of knowledge and its management across its lifecycle, to research and create the services and technologies that such unification will require. Half way through its sixyear span, the results are beginning to come through, and this paper will explore some of the services, technologies and methodologies that have been developed. We hope to give a sense in this paper of the potential for the next three years, to discuss the insights and lessons learnt in the first phase of the project, to articulate the challenges and issues that remain.The WWW provided the original context that made the AKT approach to knowledge management (KM) possible. AKT was initially proposed in 1999, it brought together an interdisciplinary consortium with the technological breadth and complementarity to create the conditions for a unified approach to knowledge across its lifecycle. The combination of this expertise, and the time and space afforded the consortium by the IRC structure, suggested the opportunity for a concerted effort to develop an approach to advanced knowledge technologies, based on the WWW as a basic infrastructure.The technological context of AKT altered for the better in the short period between the development of the proposal and the beginning of the project itself with the development of the semantic web (SW), which foresaw much more intelligent manipulation and querying of knowledge. The opportunities that the SW provided for e.g., more intelligent retrieval, put AKT in the centre of information technology innovation and knowledge management services; the AKT skill set would clearly be central for the exploitation of those opportunities.The SW, as an extension of the WWW, provides an interesting set of constraints to the knowledge management services AKT tries to provide. As a medium for the semantically-informed coordination of information, it has suggested a number of ways in which the objectives of AKT can be achieved, most obviously through the provision of knowledge management services delivered over the web as opposed to the creation and provision of technologies to manage knowledge.AKT is working on the assumption that many web services will be developed and provided for users. The KM problem in the near future will be one of deciding which services are needed and of coordinating them. Many of these services will be largely or entirely legacies of the WWW, and so the capabilities of the services will vary. As well as providing useful KM services in their own right, AKT will be aiming to exploit this opportunity, by reasoning over services, brokering between them, and providing essential meta-services for SW knowledge service management.Ontologies will be a crucial tool for the SW. The AKT consortium brings a lot of expertise on ontologies together, and ontologies were always going to be a key part of the strategy. All kinds of knowledge sharing and transfer activities will be mediated by ontologies, and ontology management will be an important enabling task. Different applications will need to cope with inconsistent ontologies, or with the problems that will follow the automatic creation of ontologies (e.g. merging of pre-existing ontologies to create a third). Ontology mapping, and the elimination of conflicts of reference, will be important tasks. All of these issues are discussed along with our proposed technologies.Similarly, specifications of tasks will be used for the deployment of knowledge services over the SW, but in general it cannot be expected that in the medium term there will be standards for task (or service) specifications. The brokering metaservices that are envisaged will have to deal with this heterogeneity.The emerging picture of the SW is one of great opportunity but it will not be a wellordered, certain or consistent environment. It will comprise many repositories of legacy data, outdated and inconsistent stores, and requirements for common understandings across divergent formalisms. There is clearly a role for standards to play to bring much of this context together; AKT is playing a significant role in these efforts. But standards take time to emerge, they take political power to enforce, and they have been known to stifle innovation (in the short term). AKT is keen to understand the balance between principled inference and statistical processing of web content. Logical inference on the Web is tough. Complex queries using traditional AI inference methods bring most distributed computer systems to their knees. Do we set up semantically well-behaved areas of the Web? Is any part of the Web in which semantic hygiene prevails interesting enough to reason in? These and many other questions need to be addressed if we are to provide effective knowledge technologies for our content on the web

    The Semantic Grid: A future e-Science infrastructure

    No full text
    e-Science offers a promising vision of how computer and communication technology can support and enhance the scientific process. It does this by enabling scientists to generate, analyse, share and discuss their insights, experiments and results in an effective manner. The underlying computer infrastructure that provides these facilities is commonly referred to as the Grid. At this time, there are a number of grid applications being developed and there is a whole raft of computer technologies that provide fragments of the necessary functionality. However there is currently a major gap between these endeavours and the vision of e-Science in which there is a high degree of easy-to-use and seamless automation and in which there are flexible collaborations and computations on a global scale. To bridge this practice–aspiration divide, this paper presents a research agenda whose aim is to move from the current state of the art in e-Science infrastructure, to the future infrastructure that is needed to support the full richness of the e-Science vision. Here the future e-Science research infrastructure is termed the Semantic Grid (Semantic Grid to Grid is meant to connote a similar relationship to the one that exists between the Semantic Web and the Web). In particular, we present a conceptual architecture for the Semantic Grid. This architecture adopts a service-oriented perspective in which distinct stakeholders in the scientific process, represented as software agents, provide services to one another, under various service level agreements, in various forms of marketplace. We then focus predominantly on the issues concerned with the way that knowledge is acquired and used in such environments since we believe this is the key differentiator between current grid endeavours and those envisioned for the Semantic Grid

    New Methods, Current Trends and Software Infrastructure for NLP

    Full text link
    The increasing use of `new methods' in NLP, which the NeMLaP conference series exemplifies, occurs in the context of a wider shift in the nature and concerns of the discipline. This paper begins with a short review of this context and significant trends in the field. The review motivates and leads to a set of requirements for support software of general utility for NLP research and development workers. A freely-available system designed to meet these requirements is described (called GATE - a General Architecture for Text Engineering). Information Extraction (IE), in the sense defined by the Message Understanding Conferences (ARPA \cite{Arp95}), is an NLP application in which many of the new methods have found a home (Hobbs \cite{Hob93}; Jacobs ed. \cite{Jac92}). An IE system based on GATE is also available for research purposes, and this is described. Lastly we review related work.Comment: 12 pages, LaTeX, uses nemlap.sty (included

    A Methodology for Simulated Experiments in Interactive Search

    Get PDF
    Interactive information retrieval has received much attention in recent years, e.g. [7]. Furthermore, increased activity in developing interactive features in search systems used across existing popular Web search engines suggests that interactive systems are being recognised as a promising next step in assisting information search. One of the most challenging problems with interactive systems however remains evaluation. We describe the general specifications of a methodology for conducting controlled and reproducible experiments in the context of interactive search. It was developed in the AutoAdapt project1 focusing on search in intranets, but the methodology is more generic than that and can be applied to interactive Web search as well. The goal of this methodology is to evaluate the ability of different algorithms to produce domain models that provide accurate suggestions for query modifications. The AutoAdapt project investigates the application of automatically constructed adaptive domain models for providing suggestions for query modifications to the users of an intranet search engine. This goes beyond static models such as the one employed to guide users who search the Web site of the University of Essex which is based on a domain model that has been built in advance using the documents’ markup structure

    Adaptive text mining: Inferring structure from sequences

    Get PDF
    Text mining is about inferring structure from sequences representing natural language text, and may be defined as the process of analyzing text to extract information that is useful for particular purposes. Although hand-crafted heuristics are a common practical approach for extracting information from text, a general, and generalizable, approach requires adaptive techniques. This paper studies the way in which the adaptive techniques used in text compression can be applied to text mining. It develops several examples: extraction of hierarchical phrase structures from text, identification of keyphrases in documents, locating proper names and quantities of interest in a piece of text, text categorization, word segmentation, acronym extraction, and structure recognition. We conclude that compression forms a sound unifying principle that allows many text mining problems to be tacked adaptively

    Utilising semantic technologies for intelligent indexing and retrieval of digital images

    Get PDF
    The proliferation of digital media has led to a huge interest in classifying and indexing media objects for generic search and usage. In particular, we are witnessing colossal growth in digital image repositories that are difficult to navigate using free-text search mechanisms, which often return inaccurate matches as they in principle rely on statistical analysis of query keyword recurrence in the image annotation or surrounding text. In this paper we present a semantically-enabled image annotation and retrieval engine that is designed to satisfy the requirements of the commercial image collections market in terms of both accuracy and efficiency of the retrieval process. Our search engine relies on methodically structured ontologies for image annotation, thus allowing for more intelligent reasoning about the image content and subsequently obtaining a more accurate set of results and a richer set of alternatives matchmaking the original query. We also show how our well-analysed and designed domain ontology contributes to the implicit expansion of user queries as well as the exploitation of lexical databases for explicit semantic-based query expansion

    Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context

    Full text link
    Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial to communicate information, e.g., in scientific papers, and to perform computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable formats that can represent both the presentation and content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires conversion methods for mathematical representation formats. We analyze how the semantic enrichment of formulae improves the format conversion process and show that considering the textual context of formulae reduces the error rate of such conversions. Our main contributions are: (1) providing an openly available benchmark dataset for the mathematical format conversion task consisting of a newly created test collection, an extensive, manually curated gold standard and task-specific evaluation metrics; (2) performing a quantitative evaluation of state-of-the-art tools for mathematical format conversions; (3) presenting a new approach that considers the textual context of formulae to reduce the error rate for mathematical format conversions. Our benchmark dataset facilitates future research on mathematical format conversions as well as research on many problems in mathematical information retrieval. Because we annotated and linked all components of formulae, e.g., identifiers, operators and other entities, to Wikidata entries, the gold standard can, for instance, be used to train methods for formula concept discovery and recognition. Such methods can then be applied to improve mathematical information retrieval systems, e.g., for semantic formula search, recommendation of mathematical content, or detection of mathematical plagiarism.Comment: 10 pages, 4 figure

    The Digital Puglia Project: An Active Digital Library of Remote Sensing Data

    Get PDF
    The growing need of software infrastructure able to create, maintain and ease the evolution of scientific data, promotes the development of digital libraries in order to provide the user with fast and reliable access to data. In a world that is rapidly changing, the standard view of a digital library as a data repository specialized to a community of users and provided with some search tools is no longer tenable. To be effective, a digital library should be an active digital library, meaning that users can process available data not just to retrieve a particular piece of information, but to infer new knowledge about the data at hand. Digital Puglia is a new project, conceived to emphasize not only retrieval of data to the client's workstation, but also customized processing of the data. Such processing tasks may include data mining, filtering and knowledge discovery in huge databases, compute-intensive image processing (such as principal component analysis, supervised classification, or pattern matching) and on demand computing sessions. We describe the issues, the requirements and the underlying technologies of the Digital Puglia Project, whose final goal is to build a high performance distributed and active digital library of remote sensing data
    • 

    corecore