22 research outputs found

    A semiotic approach to ad-hoc networked environments

    The aim of the work in this thesis is to develop a new approach to interacting with ad-hoc networked environments. These are networks where devices connect on demand, with no underlying network infrastructure. The intention of this work is to develop these environments so that devices and services on these networks can publish their services, query for other services and connect with each other when required. The devices need to be able to perform these actions without prior knowledge of each other, so a theory of communication, semiotics, is presented. Ad-hoc networks provide an appropriate test-bed for this application of semiotics as they allow services to 'know' about each other and communicate with one another. By using semiotics, we aim to create a representation of communication that allows a system to communicate within the networked environment, ask for services and connections, and interact with users and provide services to them. In this way a user can demand something from the surrounding environment, and the elements within this environment can communicate with each other to provide the service the user requires. To create an effective model for this representation, various research areas are discussed, such as smart environments, natural language processing, multicast environments and human-computer interaction. Principles from all these areas are used to implement an approach to interacting with smart environments. Different types of smart environments, such as smart homes and m-commerce environments, are used to observe how different contexts affect communication. A prototype system was realised as a proof of concept and evaluated by subjects. This work highlighted the feasibility of the approach and opened a new area worthy of further research. (EThOS - Electronic Theses Online Service, United Kingdom)
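
    The publish/query/connect behaviour described in this abstract can be pictured with a small multicast sketch (multicast environments are one of the research areas the thesis draws on). The code below is purely illustrative and is not the thesis design: the multicast group, port, message format and function names are all assumptions.

```python
# Illustrative sketch only, not the thesis design: devices on an ad-hoc network
# announce services over a multicast group and listen for announcements.
# The group address, port, message format and function names are assumptions.
import json
import socket
import struct

GROUP, PORT = "239.255.0.1", 5007  # placeholder multicast group and port

def announce(service_name: str, description: str) -> None:
    """Publish a service description to the multicast group."""
    message = json.dumps({"type": "announce",
                          "service": service_name,
                          "description": description}).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        sock.sendto(message, (GROUP, PORT))

def listen(timeout: float = 5.0) -> list[dict]:
    """Collect service announcements from other devices for `timeout` seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", PORT))
        membership = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)
        sock.settimeout(timeout)
        announcements = []
        try:
            while True:
                data, _ = sock.recvfrom(4096)
                announcements.append(json.loads(data))
        except socket.timeout:
            pass
        return announcements

# Example: one device could call announce("printer", "A4 laser printer, duplex"),
# while another calls listen() to discover it.
```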

    Recategorising research: Mapping from FoR 2008 to FoR 2020 in Dimensions

    In 2020 the Australia New Zealand Standard Research Classification Fields of Research codes (ANZSRC FoR codes) were updated by their owners. This has required the sector to update its systems of reference, and suppliers working in the research information sphere to update both systems and data. This paper describes the approach developed by Digital Science's Dimensions team to the creation of an improved machine learning training set, and the mapping of that set from FoR 2008 codes to FoR 2020 codes, so that Dimensions' classification approach for the ANZSRC codes could be improved and updated. Comment: 10 pages, 6 figures; v2 adds more information on the translation of the dataset to the production system, and an author was added to reflect these changes.
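
    The mapping step lends itself to a short sketch. The code below is not the Dimensions implementation; it only shows the general shape of relabelling a training set through a 2008-to-2020 crosswalk table, and the codes in the table are placeholders rather than official ANZSRC mappings.

```python
# Minimal sketch, not the paper's method: relabel a training set from FoR 2008
# to FoR 2020 codes via a crosswalk table. The table entries are placeholders.
CROSSWALK_2008_TO_2020 = {
    "0801": ["4602", "4611"],  # hypothetical one-to-many mapping
    "0803": ["4612"],
    "0806": ["4609"],
}

def remap_labels(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (remapped records, records needing manual review).

    A record looks like {"text": ..., "for2008": ["0801", ...]} (an assumed
    schema). Records whose codes have no crosswalk entry are set aside, since
    that is where human curation of the training set concentrates.
    """
    remapped, unmapped = [], []
    for record in records:
        new_codes = set()
        for code in record.get("for2008", []):
            new_codes.update(CROSSWALK_2008_TO_2020.get(code, []))
        if new_codes:
            remapped.append({**record, "for2020": sorted(new_codes)})
        else:
            unmapped.append(record)
    return remapped, unmapped

docs = [{"text": "A paper on distributed systems", "for2008": ["0803"]},
        {"text": "A paper on an unmapped topic", "for2008": ["9999"]}]
print(remap_labels(docs))
```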

    ChemicalTagger: A tool for semantic text-mining in chemistry.

    BACKGROUND: The primary method of scientific communication is the publication of scientific articles and theses, which use natural language combined with domain-specific terminology. As such, they contain free-flowing, unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. RESULTS: We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regex and English taggers to identify parts of speech. An ANTLR grammar is used to structure these into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreement of 88.9% for phrase recognition and 91.9% for phrase-type identification (Action names). CONCLUSIONS: It is possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed for over 10,000 patents and has identified solvents from their linguistic context with >99.5% precision.
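
    As a rough illustration of the kind of rule-based, regex-driven tagging described above, here is a toy Python sketch. It is not ChemicalTagger (which is a Java tool built on OSCAR and an ANTLR grammar); the tag names and patterns are invented for the example.

```python
# Toy sketch in the spirit of the pipeline above: chemistry-specific regex
# taggers mark quantities, temperatures and action verbs in experimental text.
# Tag names and patterns are assumptions, not ChemicalTagger's.
import re

PATTERNS = [
    ("QUANTITY", re.compile(r"\d+(\.\d+)?\s*(mg|g|ml|mL|mmol|mol)\b")),
    ("TEMP",     re.compile(r"\d+(\.\d+)?\s*(°C|degC)\b")),
    ("ACTION",   re.compile(r"\b(add(ed)?|stir(red)?|heat(ed)?|wash(ed)?|dissolv(ed|e))\b", re.I)),
]

def tag(sentence: str) -> list[tuple[str, str]]:
    """Return (matched text, tag) pairs, ordered by position in the sentence."""
    hits = []
    for name, pattern in PATTERNS:
        for match in pattern.finditer(sentence):
            hits.append((match.start(), match.group(0), name))
    return [(text, name) for _, text, name in sorted(hits)]

print(tag("The mixture was stirred at 80 °C and washed with 20 mL of water."))
```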

    OSCAR4: a flexible architecture for chemical text-mining

    The Open-Source Chemistry Analysis Routines (OSCAR) software, a toolkit for the recognition of named entities and data in chemistry publications, has been developed since 2002. Recent work has resulted in the separation of the core OSCAR functionality and its release as the OSCAR4 library. This library features a modular API (based on reduction of surface coupling) that permits client programmers to incorporate it easily into external applications. OSCAR4 offers a domain-independent architecture upon which chemistry-specific text-mining tools can be built, and its development and usage are discussed. (Peer reviewed)
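
    The "modular API based on reduction of surface coupling" amounts to exposing a narrow recogniser interface that client code depends on, so that implementations can be swapped without touching the callers. The Python sketch below illustrates that idea only; it is not the OSCAR4 API (OSCAR4 is a Java library), and all names here are assumptions.

```python
# Illustrative sketch of a low-surface-coupling recogniser interface.
# NOT the OSCAR4 API; class and method names are assumptions.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class NamedEntity:
    text: str
    start: int
    end: int
    label: str

class EntityRecogniser(Protocol):
    """The narrow interface clients depend on, so implementations can be swapped."""
    def find_entities(self, text: str) -> list[NamedEntity]: ...

class DictionaryRecogniser:
    """Trivial implementation: match a fixed list of chemical names."""
    def __init__(self, terms: dict[str, str]):
        self.terms = terms  # surface form -> entity label

    def find_entities(self, text: str) -> list[NamedEntity]:
        found = []
        for term, label in self.terms.items():
            start = text.find(term)
            if start != -1:
                found.append(NamedEntity(term, start, start + len(term), label))
        return found

recogniser: EntityRecogniser = DictionaryRecogniser({"acetone": "CM", "toluene": "CM"})
print(recogniser.find_entities("The residue was dissolved in acetone."))
```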

    Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry

    Chemistry text-mining tools should be interoperable and adaptable regardless of system-level implementation, installation or programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser in the chemistry domain, OSCAR, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text-mining framework, U-Compare. These workflows can be altered using the drag-and-drop mechanism of U-Compare's graphical user interface, and they provide a platform for studying the relationship between text-mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern-recognition-based classifiers). Results indicate that, for chemistry in particular, eliminating the noise generated by tokenisation leads to slightly better named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input for the classifier components, which in turn increases Type I and Type II errors and lowers the overall performance. On the Sciborg corpus, the workflow-based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84%, against 84.23% for OSCAR.
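
    The F-scores quoted above come from comparing predicted entity spans against a gold standard. The sketch below shows a standard precision/recall/F1 computation under an exact-span-match assumption; the papers' own scoring details may differ.

```python
# Standard precision/recall/F1 over NER spans, assuming exact-span matching.
def precision_recall_f1(gold: set[tuple[int, int, str]],
                        predicted: set[tuple[int, int, str]]) -> tuple[float, float, float]:
    """Spans are (start, end, label) triples; a prediction counts only on exact match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

gold = {(0, 7, "CM"), (20, 27, "CM"), (35, 41, "RN")}
pred = {(0, 7, "CM"), (20, 27, "CM"), (50, 55, "CM")}
print(precision_recall_f1(gold, pred))  # (0.666..., 0.666..., 0.666...)
```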

    Information extraction and linked open data in chemistry

    Chemists not only produce a significant amount of data-rich scholarly communication artifacts, but have also adopted a highly formulaic style of writing. The literature of this discipline is therefore an attractive target for automated data extraction. In previous work, we demonstrated the identification and extraction of chemical entities from scientific papers.[1][2] However, we had not addressed the extraction of the relationships linking the chemical entities to each other and to the document object from which they were extracted. Using chemical synthesis procedures as an exemplar, we present a methodology for extracting both chemical entities and the relationships between them. Chemical synthesis procedures are collected by data-mining the chemical literature. Natural language processing tools and entity recognisers are then used to analyse the individual elements within these procedures and to provide a grammatical structure, after which relationships between the individual entities are established. This structured information is stored in RDF[3] using domain-specific ontologies. Once the information is expressed in a semantic format, it can be searched and indexed using the RDF query language SPARQL[4] and used to generate visualisations such as visual document summaries. The ultimate goal of the work documented here is to make the data contained in publications available and re-usable by the scientific community.
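
    To make the RDF/SPARQL step concrete, the sketch below stores one extracted synthesis relationship as triples and queries it back. It uses rdflib with a placeholder namespace and vocabulary; the authors' domain-specific ontologies are not reproduced here.

```python
# Minimal sketch (assumed vocabulary, not the authors' ontology): store an
# extracted synthesis step as RDF with rdflib and query it with SPARQL.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/synthesis/")  # placeholder namespace

g = Graph()
g.bind("ex", EX)

# One extracted relationship: an "add" action involving a chemical and an amount.
step = EX["step1"]
g.add((step, RDF.type, EX.AddStep))
g.add((step, EX.chemical, Literal("sodium borohydride")))
g.add((step, EX.amount, Literal("0.5 g")))
g.add((step, EX.fromDocument, Literal("doi:10.0000/placeholder")))

# Find every chemical mentioned in an Add step.
results = g.query("""
    PREFIX ex: <http://example.org/synthesis/>
    SELECT ?chemical WHERE {
        ?s a ex:AddStep ;
           ex:chemical ?chemical .
    }
""")
for row in results:
    print(row.chemical)
```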

    Evaluating an NLG System using Post-Editing

    Computer-generated texts, whether from Natural Language Generation (NLG) or Machine Translation (MT) systems, are often post-edited by humans before being released to users. The frequency and type of post-edits are a measure of how well the system works and can be used for evaluation. We describe how we have used post-edit data to evaluate SUMTIME-MOUSAM, an NLG system that produces weather forecasts.
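
    One simple way to turn post-edit data into a score is to measure how much of the generated text survives editing. The sketch below is not the SUMTIME-MOUSAM evaluation itself; it just computes a word-level post-edit rate with difflib, and the forecast strings are invented examples.

```python
# Word-level post-edit rate: the fraction of generated words the editor changed.
# Illustrative only; the forecast strings are invented examples.
import difflib

def post_edit_rate(generated: str, post_edited: str) -> float:
    """Proportion of generated words that were deleted or replaced by the editor."""
    gen_words, edit_words = generated.split(), post_edited.split()
    matcher = difflib.SequenceMatcher(a=gen_words, b=edit_words)
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - unchanged / max(len(gen_words), 1)

generated = "SSW 10-15 backing SSE then increasing 15-20 by evening"
post_edited = "SSW 10-15 backing SSE 15-20 by mid evening"
print(f"{post_edit_rate(generated, post_edited):.2f}")
```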