64 research outputs found

    Using compression to identify acronyms in text

    Text mining is about looking for patterns in natural language text, and may be defined as the process of analyzing text to extract information from it for particular purposes. In previous work, we claimed that compression is a key technology for text mining, and backed this up with a study showing how particular kinds of lexical tokens (names, dates, locations, etc.) can be identified and located in running text, using compression models to provide the leverage necessary to distinguish different token types (Witten et al., 1999). Comment: 10 pages; a short form published in DCC200
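
    A minimal sketch of the compression-based idea described above, assuming nothing about the authors' actual PPM implementation: each token category gets a small training text, and a candidate token is assigned to the category whose model encodes it most cheaply. Here zlib (deflate) stands in for the PPM models, and the category names and training strings are made-up examples.

        # Sketch of compression-based token classification (zlib stands in
        # for the PPM models used in the paper; training text is made up).
        import zlib

        TRAINING = {
            "name": "alice smith bob jones carol white david brown emma green",
            "date": "12 march 1999 4 july 2001 28 february 1987 3 june 1995",
            "location": "hamilton new zealand calgary canada wellington sydney",
        }

        def code_length(text: str) -> int:
            """Compressed size in bytes, a crude proxy for model code length."""
            return len(zlib.compress(text.encode("utf-8"), 9))

        def classify(token: str) -> str:
            """Pick the category whose training text 'explains' the token best,
            i.e. where appending the token adds the fewest compressed bytes."""
            def extra_bytes(category: str) -> int:
                base = code_length(TRAINING[category])
                return code_length(TRAINING[category] + " " + token.lower()) - base
            return min(TRAINING, key=extra_bytes)

        print(classify("27 june 1998"))   # expected to lean towards "date"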

    Text Augmentation: Inserting markup into natural language text with PPM Models

    This thesis describes a new optimisation and new heuristics for automatically marking up XML documents. These are implemented in CEM, using PPM models. CEM is significantly more general than previous systems, marking up large numbers of hierarchical tags, using n-gram models for large n and a variety of escape methods. Four corpora are discussed, including the bibliography corpus of 14,682 bibliographies laid out in seven standard styles using the BibTeX system and marked up in XML with every field from the original BibTeX. Other corpora include the ROCLING Chinese text segmentation corpus, the Computists' Communique corpus and the Reuters corpus. A detailed examination is presented of the methods of evaluating markup algorithms, including computational complexity measures and correctness measures from the fields of information retrieval, string processing, machine learning and information theory. A new taxonomy of markup complexities is established and the properties of each taxon are examined in relation to the complexity of marked-up documents. The performance of the new heuristics and optimisation is examined using the four corpora.
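
    The correctness measures mentioned above include ones drawn from information retrieval. As a generic illustration (not code from the thesis), span-level precision, recall and F1 over inserted tags can be computed as follows, where a tag is assumed to be represented by its name and character offsets.

        # Generic span-level precision/recall/F1 for inserted markup.
        # A "span" is assumed to be (tag, start_offset, end_offset);
        # this is an illustration, not the thesis's own evaluation code.
        from typing import Iterable, Tuple

        Span = Tuple[str, int, int]  # (tag name, start offset, end offset)

        def prf(predicted: Iterable[Span], gold: Iterable[Span]):
            pred, ref = set(predicted), set(gold)
            true_pos = len(pred & ref)
            precision = true_pos / len(pred) if pred else 0.0
            recall = true_pos / len(ref) if ref else 0.0
            f1 = (2 * precision * recall / (precision + recall)
                  if precision + recall else 0.0)
            return precision, recall, f1

        # Example: two tags exactly right, one with a wrong boundary.
        gold = [("author", 0, 11), ("year", 13, 17), ("title", 19, 42)]
        pred = [("author", 0, 11), ("year", 13, 17), ("title", 20, 42)]
        print(prf(pred, gold))  # -> (0.666..., 0.666..., 0.666...)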

    Using video-based examiner score comparison and adjustment (VESCA) to compare the influence of examiners at different sites in a distributed objective structured clinical exam (OSCE)

    Purpose: Ensuring equivalence of examiners' judgements within distributed objective structured clinical exams (OSCEs) is key to both fairness and validity, but is hampered by the lack of cross-over in the performances which different groups of examiners observe. This study develops a novel method called Video-based Examiner Score Comparison and Adjustment (VESCA) and uses it to compare examiners' scoring from different OSCE sites for the first time. Materials/methods: Within a summative 16-station OSCE, volunteer students were videoed on each station and all examiners were invited to score station-specific comparator videos in addition to their usual student scoring. The linkage provided by the video scores enabled use of Many Facet Rasch Modelling (MFRM) to compare (1) examiner-cohort and (2) site effects on students' scores. Results: Examiner cohorts varied by 6.9% in the overall score allocated to students of the same ability. Whilst only a tiny difference was apparent between sites, examiner-cohort variability was greater in one site than the other. Adjusting student scores produced a median change in rank position of 6 places (0.48 deciles); however, 26.9% of students changed their rank position by at least 1 decile. By contrast, only 1 student's pass/fail classification was altered by score adjustment. Conclusions: Whilst comparatively limited examiner participation rates may limit interpretation of score adjustment in this instance, this study demonstrates the feasibility of using VESCA for quality assurance purposes in large-scale distributed OSCEs.
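
    For readers unfamiliar with Many Facet Rasch Modelling, one standard (Linacre-style) rating-scale formulation is sketched below; the facet labels are assumed from the study design described above and are not quoted from the paper.

        % One standard Many Facet Rasch Model formulation (rating-scale form).
        % Facet labels are assumed from the study design, not quoted from the paper.
        \[
        \log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right)
            = \theta_n - \delta_i - \lambda_j - \tau_k
        \]
        % \theta_n  : ability of student n
        % \delta_i  : difficulty of station i
        % \lambda_j : severity of examiner-cohort j
        % \tau_k    : threshold between rating categories k-1 and k

    Score adjustment then roughly amounts to re-estimating each student's ability with the examiner-cohort severities held at their estimated values, so that differences in cohort stringency no longer feed into the reported score.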

    Caenorhabditis elegans Genomic Response to Soil Bacteria Predicts Environment-Specific Genetic Effects on Life History Traits

    With the post-genomic era came a dramatic increase in high-throughput technologies, of which transcriptional profiling by microarrays was one of the most popular. One application of this technology is to identify genes that are differentially expressed in response to different environmental conditions. These experiments are constructed under the assumption that the differentially expressed genes are functionally important in the environment where they are induced. However, whether differential expression is predictive of functional importance has yet to be tested. Here we have addressed this expectation by employing Caenorhabditis elegans as a model for the interaction of native soil nematode taxa and soil bacteria. Using transcriptional profiling, we identified candidate genes regulated in response to different bacteria isolated in association with grassland nematodes or from grassland soils. Many of the regulated candidate genes are predicted to affect metabolism and innate immunity, suggesting that similar genes could influence nematode community dynamics in natural systems. Using mutations that inactivate 21 of the identified genes, we showed that most contribute to lifespan and/or fitness in a given bacterial environment. Although these bacteria may not be natural food sources for C. elegans, we show that changes in food source, as can occur in environmental disturbance, can have a large effect on gene expression, with important consequences for fitness. Moreover, we used regression analysis to demonstrate that, for many genes, the degree of differential gene expression between two bacterial environments predicted the magnitude of the effect of the loss of gene function on life-history traits in those environments.
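
    A minimal sketch of the kind of regression referred to in the final sentence, assuming that for each mutated gene one has a measure of differential expression between the two bacterial environments and a measured effect of the mutation on a life-history trait; the variable names and the use of scipy are assumptions for illustration, not the study's analysis code.

        # Illustrative regression of trait effect on differential expression.
        # Inputs are hypothetical arrays with one value per mutated gene:
        #   log2_fold_change: expression difference between the two bacterial environments
        #   trait_effect:     change in lifespan/fitness caused by loss of gene function
        import numpy as np
        from scipy import stats

        def expression_predicts_effect(log2_fold_change, trait_effect):
            """Ordinary least-squares fit of trait effect against |log2 fold change|."""
            x = np.abs(np.asarray(log2_fold_change, dtype=float))
            y = np.asarray(trait_effect, dtype=float)
            result = stats.linregress(x, y)
            return result.slope, result.rvalue ** 2, result.pvalue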

    Imperfection and radiation damage in protein crystals studied with coherent radiation

    Fringes and speckles occur within diffraction spots when a crystal is illuminated with coherent radiation during X-ray diffraction. The additional information in these features provides insight into the imperfections in the crystal at the sub-micrometre scale. In addition, these features can provide more accurate intensity measurements (e.g. by model-based profile fitting) and can assist detwinning (by distinguishing the various components), phasing (by exploiting sampling of the molecular transform) and refinement (by distinguishing regions with different unit-cell parameters). In order to exploit these potential benefits, the features due to coherent diffraction have to be recorded and any changes due to radiation damage properly modelled. Initial results from recording coherent diffraction at cryotemperatures from polyhedrin crystals approximately 2 µm in size are described. These measurements allowed information about the type of crystal imperfections to be obtained at the sub-micrometre level, together with the changes due to radiation damage.

    The Science Performance of JWST as Characterized in Commissioning

    This paper characterizes the actual science performance of the James Webb Space Telescope (JWST), as determined from the six-month commissioning period. We summarize the performance of the spacecraft, telescope, science instruments, and ground system, with an emphasis on differences from pre-launch expectations. Commissioning has made clear that JWST is fully capable of achieving the discoveries for which it was built. Moreover, almost across the board, the science performance of JWST is better than expected; in most cases, JWST will go deeper faster than expected. The telescope and instrument suite have demonstrated the sensitivity, stability, image quality, and spectral range that are necessary to transform our understanding of the cosmos through observations spanning from near-Earth asteroids to the most distant galaxies. Comment: 5th version as accepted to PASP; 31 pages, 18 figures; https://iopscience.iop.org/article/10.1088/1538-3873/acb29

    Design patterns in garbage collection

    This thesis presents an examination of design patterns within the context of garbage collection. Initially, I review garbage collection and design patterns. Four garbage collectors are then examined and the design patterns found in them are described. Both domain-specific and generic patterns are covered: the domain-specific patterns are TriColour and RootSet; the generic patterns are Adaptor, Facade, Iterator and Proxy. It is hoped that, by applying these patterns, systems designers will have access to a less efficient but simpler and more flexible way of implementing and reusing garbage collectors in programming languages. The requirements analysis for a garbage collector for a real-time object-oriented micro-kernel is then performed, and a design is prepared using the design patterns found in the other garbage collectors. The garbage collector is then implemented in Java using appropriate data structures. Because of timing difficulties in the runtime environment, timing was ruled out as a method of performance analysis. Algorithmic analysis is instead performed to evaluate the worst-case performance of the collector, which is found to be satisfactory in all but one method of the RootSet implementation. An approach to remedying this is suggested.
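
    As a rough illustration of the TriColour and RootSet patterns named above (a generic Python sketch; the thesis's own collector is written in Java), tri-colour marking colours objects white (unvisited), grey (reached but not yet scanned) and black (fully scanned), starting from the root set; whatever is still white when the grey set empties is garbage.

        # Generic tri-colour marking from a root set (illustrative sketch only;
        # unvisited objects are implicitly white).
        class Obj:
            def __init__(self, name):
                self.name = name
                self.refs = []          # outgoing references to other Obj instances

        def mark(root_set):
            """Return the black (reachable) set."""
            grey = list(root_set)       # reached, references not yet scanned
            black = set()               # reached, references fully scanned
            while grey:
                obj = grey.pop()
                if obj in black:
                    continue
                black.add(obj)
                grey.extend(ref for ref in obj.refs if ref not in black)
            return black

        # Example: c is unreachable from the root set and would be collected.
        a, b, c = Obj("a"), Obj("b"), Obj("c")
        a.refs.append(b)
        reachable = mark({a})
        assert b in reachable and c not in reachable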

    Automatic extraction of acronyms from text

    A brief introduction to acronyms is given and the motivation for extracting them in a digital library environment is discussed. A technique for extracting acronyms is presented, together with an analysis of the results. The technique is found to produce few false negatives but a high number of false positives.
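
    The abstract does not spell out the technique, so the sketch below shows one common heuristic for the task (accepting an all-capitals candidate when the initials of the words immediately before it spell it out). It is an illustration of acronym extraction in general, not a reproduction of the paper's method.

        # One common heuristic for acronym extraction (illustrative only; not
        # necessarily the technique evaluated in the paper).
        import re

        def extract_acronyms(text):
            words = re.findall(r"[A-Za-z][\w-]*", text)
            found = {}
            for i, w in enumerate(words):
                if w.isupper() and 2 <= len(w) <= 8 and i >= len(w):
                    preceding = words[i - len(w):i]
                    initials = "".join(p[0].upper() for p in preceding)
                    if initials == w:
                        found[w] = " ".join(preceding)
            return found

        print(extract_acronyms(
            "The World Health Organization (WHO) publishes digital library (DL) guidelines."))
        # expected: {'WHO': 'World Health Organization', 'DL': 'digital library'}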