2,515 research outputs found

    Transcription enhancement of a digitised multi-lingual pamphlet collection: a case study and guide for similar projects

    UCL Library Services holds an extensive collection of over 9,000 Jewish pamphlets, many of them extremely rare. Over the past five years, UCL has embarked on a project to widen access to this collection through an extensive programme of cataloguing, conservation and digitisation. With the cataloguing complete and the most fragile items conserved, the focus is now on making these texts available to global audiences via the UCL Digital Collections website. The pamphlets were ranked for rarity, significance and fragility, and the highest-scoring were selected for digitisation. Unique identifiers allocated at the point of cataloguing were used to track individual pamphlets through the stages of the project. This guide details the text-enhancement methods used, highlighting particular issues relating to Hebrew scripts and early-printed texts. Initial attempts to enable images of these pamphlets to be searched digitally relied on the Optical Character Recognition (OCR) embedded within the software used to create the PDF files. Whilst satisfactory for texts chiefly in Roman script, it provided no reliable means to search the extensive corpus of texts in Hebrew. Generous advice offered by the National Library of Israel led to our adoption of ABBYY FineReader software as a means of enhancing the transcriptions embedded within the PDF files. Following image capture, JPEG files were used to create multi-page PDF files of each pamphlet. Pre-processing in ABBYY FineReader consisted of: setting the language and colour mode; detecting page orientation; selecting and refining areas of the text to be read; reading the text to produce a transcription. The resultant files were stored in folders according to language of text. The software highlighted spelling errors and doubtful readings. A verification tool allowed transcribers to correct these as required. However, some erroneous or doubtful readings were nevertheless genuine words and not highlighted; it was therefore essential to proofread the text, particularly for early-printed scripts. Transcribers maintained logs of common errors; additionally, problems with Hebrew vocalisations, cursive and Gothic scripts were noted. During initial quality checks of the transcriptions, many text searches were unsuccessful due to previously unidentified spacings occurring within words. This was generally linked to the font size being too small. Maintaining logs of font sizes used led to the adoption of a minimum of Arial 8 or Times New Roman 10 in transcribed text. The methodology was revised to include the preliminary quality-checking of one page. We concluded that it was difficult to develop a standardised procedure applicable to all texts given the variance in language, script and typography. However, we concluded that the font Arial gave the most successful accuracy ratings for Hebrew script, minimum text size 17, minimum title size 25. ABBYY file preparation took a minimum of 1.5 hours per pamphlet; transcription correction took an average of 10.4 minutes per page; the final quality check took 30 minutes per pamphlet. On average, the work on each pamphlet took a minimum of 6 hours to complete. As a result of the project, average accuracy ratings improved from 60% to 89%, the greatest improvement being for pre-1800 and Hebrew script publications. We are therefore inclined to focus future transcription-enhancement activity on these types of publication for the remainder of our Jewish Pamphlet Collections.
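The per-pamphlet timings quoted in the abstract above can be combined into a rough effort estimate. The sketch below is only illustrative: it assumes the three figures simply add up, treats the stated minimum preparation time as if it were typical, and uses hypothetical function names that do not come from the project.

```python
# Figures quoted in the abstract, converted to minutes.
# Assumption: a stated minimum and stated averages are simply added together.
PREP_MINUTES = 90.0            # ABBYY file preparation: minimum of 1.5 hours per pamphlet
CORRECTION_PER_PAGE = 10.4     # transcription correction: average minutes per page
QUALITY_CHECK_MINUTES = 30.0   # final quality check per pamphlet


def estimate_minutes(pages: int) -> float:
    """Rough total minutes for one pamphlet with the given page count."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return PREP_MINUTES + CORRECTION_PER_PAGE * pages + QUALITY_CHECK_MINUTES


def pages_for_budget(total_minutes: float) -> float:
    """Page count whose estimate equals the given time budget."""
    fixed = PREP_MINUTES + QUALITY_CHECK_MINUTES
    return (total_minutes - fixed) / CORRECTION_PER_PAGE


if __name__ == "__main__":
    # How many pages correspond to the six-hour figure under these assumptions?
    print(f"{pages_for_budget(6 * 60):.1f} pages")
```

Under these assumptions, the six-hour figure corresponds to a pamphlet of roughly 23 pages; the abstract itself does not state a typical page count.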

    Optimal search strategies for identifying sound clinical prediction studies in EMBASE

    BACKGROUND: Clinical prediction guides assist clinicians by pointing to specific elements of the patient's clinical presentation that should be considered when forming a diagnosis, prognosis or judgment regarding treatment outcome. The numbers of validated clinical prediction guides are growing in the medical literature, but their retrieval from large biomedical databases remains problematic and this presents a barrier to their uptake in medical practice. We undertook the systematic development of search strategies ("hedges") for retrieval of empirically tested clinical prediction guides from EMBASE. METHODS: An analytic survey was conducted, testing the retrieval performance of search strategies run in EMBASE against the gold standard of hand searching, using a sample of all 27,769 articles identified in 55 journals for the 2000 publishing year. All articles were categorized as original studies, review articles, general papers, or case reports. The original and review articles were then tagged as 'pass' or 'fail' for methodologic rigor in the areas of clinical prediction guides and other clinical topics. Search terms that depicted clinical prediction guides were selected from a pool of index terms and text words gathered in house and through request to clinicians, librarians and professional searchers. A total of 36,232 search strategies composed of single and multiple term phrases were trialed for retrieval of clinical prediction studies. The sensitivity, specificity, precision, and accuracy of search strategies were calculated to identify which were the best. RESULTS: 163 clinical prediction studies were identified, of which 69 (42.3%) passed criteria for scientific merit. A 3-term strategy optimized sensitivity at 91.3% and specificity at 90.2%. Higher sensitivity (97.1%) was reached with a different 3-term strategy, but with a 16% drop in specificity. 
The best measure of specificity (98.8%) was found in a 2-term strategy, but with a considerable fall in sensitivity to 60.9%. All single term strategies performed less well than 2- and 3-term strategies. CONCLUSION: The retrieval of sound clinical prediction studies from EMBASE is supported by several search strategies.
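The four performance measures named in the abstract above follow from a 2x2 table comparing a search strategy with the hand-search gold standard. The sketch below uses the standard definitions; the function name and all counts are illustrative assumptions. In particular, 63 of 69 is merely a count consistent with the reported 91.3% sensitivity, and the false-positive and true-negative counts are invented to land near 90.2% specificity. None of these counts is taken from the study.

```python
from typing import Dict


def retrieval_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard diagnostic-test measures for a search strategy.

    tp: sound studies retrieved        fn: sound studies missed
    fp: other articles retrieved       tn: other articles not retrieved
    """
    return {
        "sensitivity": tp / (tp + fn),             # share of sound studies retrieved
        "specificity": tn / (tn + fp),             # share of other articles excluded
        "precision": tp / (tp + fp),               # share of retrieved articles that are sound
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the calculation.
    metrics = retrieval_metrics(tp=63, fp=2715, fn=6, tn=24985)
    for name, value in metrics.items():
        print(f"{name}: {value:.1%}")
```

With counts like these, precision comes out far lower than sensitivity or specificity, because sound studies form a very small fraction of all articles. This is why the abstract reports several measures rather than one.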

    Mapping an atlas of tissue-specific Drosophila melanogaster metabolomes by high resolution mass spectrometry

    Metabolomics can provide exciting insights into organismal function, but most work on simple models has focussed on the whole organism metabolome, thus missing the contributions of individual tissues. Comprehensive metabolite profiles for ten tissues from adult Drosophila melanogaster were obtained here by two chromatographic methods, a hydrophilic interaction (HILIC) method for polar metabolites and a lipid profiling method also based on HILIC, in combination with an Orbitrap Exactive instrument. Two hundred and forty-two polar metabolites were putatively identified in the various tissues, and 251 lipids were observed in positive ion mode and 61 in negative ion mode. Although many metabolites were detected in all tissues, every tissue showed characteristically abundant metabolites which could be rationalised against specific tissue functions. For example, the cuticle contained high levels of glutathione, reflecting a role in oxidative defence; the alimentary canal (like vertebrate gut) had high levels of acylcarnitines for fatty acid metabolism, and the head contained high levels of ether lipids. The male accessory gland uniquely contained decarboxylated S-adenosylmethionine. These data thus provide both valuable insights into tissue function and a reference baseline, compatible with the FlyAtlas.org transcriptomic resource, for further metabolomic analysis of this important model organism, for example in the modelling of human inborn errors of metabolism, aging or metabolic imbalances such as diabetes.

    SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods

    In the last few years thousands of scientific papers have investigated sentiment analysis, several startups that measure opinions on real data have emerged and a number of innovative products related to this theme have been developed. There are multiple methods for measuring sentiments, including lexical-based and supervised machine learning methods. Despite the vast interest in the theme and the wide popularity of some methods, it is unclear which one is better for identifying the polarity (i.e., positive or negative) of a message. Accordingly, there is a strong need to conduct a thorough apples-to-apples comparison of sentiment analysis methods, as they are used in practice, across multiple datasets originating from different data sources. Such a comparison is key for understanding the potential limitations, advantages, and disadvantages of popular methods. This article aims at filling this gap by presenting a benchmark comparison of twenty-four popular sentiment analysis methods (which we call the state-of-the-practice methods). Our evaluation is based on a benchmark of eighteen labeled datasets, covering messages posted on social networks, movie and product reviews, as well as opinions and comments in news articles. Our results highlight the extent to which the prediction performance of these methods varies considerably across datasets. Aiming at boosting the development of this research area, we release the code and datasets for the methods used in this article, deploying them in a benchmark system, which provides an open API for accessing and comparing sentence-level sentiment analysis methods.

    Dynamic Epitope Expression from Static Cytometry Data: Principles and Reproducibility

    Background: An imprecise quantitative sense for the oscillating levels of proteins and their modifications, interactions, and translocations as a function of the cell cycle is fundamentally important for a cartoon/narrative understanding of how the cell cycle works. Mathematical modeling of the same cartoon/narrative models would be greatly enhanced by an open-ended methodology providing precise quantification of many proteins and their modifications, etc. Here we present methodology that fulfills these features. Methodology: Multiparametric flow cytometry was performed on Molt4 cells to measure cyclins A2 and B1, phospho-S10 histone H3, DNA content, and light scatter (cell size). The resulting 5-dimensional data were analyzed as a series of bivariate plots to isolate the data as segments of an N-dimensional "worm" through the data space. Sequential, unidirectional regions of the data were used to assemble expression profiles for each parameter as a function of cell frequency. Results: Analysis of synthesized data in which the true values were known validated the approach. Triplicate experiments demonstrated exceptional reproducibility. Comparison of three triplicate experiments stained by two methods (single cyclin or dual cyclin measurements with common DNA and phospho-histone H3 measurements) supported the feasibility of combining an unlimited number of epitopes through this methodology. The sequential degradations of cyclin A2 followed by cyclin B1 followed by de-phosphorylation of histone H3 were precisely mapped. Finally, a two phase expression rat

    The Attitudes to Ageing Questionnaire: Mokken Scaling Analysis

    BACKGROUND: Hierarchical scales are useful in understanding the structure of underlying latent traits in many questionnaires. The Attitudes to Ageing Questionnaire (AAQ) explored the attitudes to ageing of older people themselves, and originally described three distinct subscales: (1) Psychosocial Loss (2) Physical Change and (3) Psychological Growth. This study aimed to use Mokken analysis, a method of Item Response Theory, to test for hierarchies within the AAQ and to explore how these relate to underlying latent traits. METHODS: Participants in a longitudinal cohort study, the Lothian Birth Cohort 1936, completed a cross-sectional postal survey. Data from 802 participants were analysed using Mokken Scaling analysis. These results were compared with factor analysis using exploratory structural equation modelling. RESULTS: Participants were 51.6% male, mean age 74.0 years (SD 0.28). Three scales were identified from 18 of the 24 items: two weak Mokken scales and one moderate Mokken scale. (1) 'Vitality' contained a combination of items from all three previously determined factors of the AAQ, with a hierarchy from physical to psychosocial; (2) 'Legacy' contained items exclusively from the Psychological Growth scale, with a hierarchy from individual contributions to passing things on; (3) 'Exclusion' contained items from the Psychosocial Loss scale, with a hierarchy from general to specific instances. All of the scales were reliable and statistically significant with 'Legacy' showing invariant item ordering. The scales correlate as expected with personality, anxiety and depression. Exploratory SEM mostly confirmed the original factor structure. CONCLUSIONS: The concurrent use of factor analysis and Mokken scaling provides additional information about the AAQ. The previously-described factor structure is mostly confirmed. Mokken scaling identifies a new factor relating to vitality, and a hierarchy of responses within three separate scales, referring to vitality, legacy and exclusion. This shows what older people themselves consider important regarding their own ageing.

    Characterisation of the pathogenic effects of the in vivo expression of an ALS-linked mutation in D-amino acid oxidase: Phenotype and loss of spinal cord motor neurons

    Amyotrophic lateral sclerosis (ALS) is the most common adult-onset neuromuscular disorder characterised by selective loss of motor neurons leading to fatal paralysis. Current therapeutic approaches are limited in their effectiveness. Substantial advances in understanding ALS disease mechanisms have come from the identification of pathogenic mutations in dominantly inherited familial ALS (FALS). We previously reported a coding mutation in D-amino acid oxidase (DAOR199W) associated with FALS. DAO metabolises D-serine, an essential co-agonist at the N-Methyl-D-aspartic acid glutamate receptor subtype (NMDAR). Using primary motor neuron cultures or motor neuron cell lines we demonstrated that expression of DAOR199W promoted the formation of ubiquitinated protein aggregates, activated autophagy and increased apoptosis. The aim of this study was to characterise the effects of DAOR199W in vivo, using transgenic mice overexpressing DAOR199W. Marked abnormal motor features, e.g. kyphosis, were evident in mice expressing DAOR199W, which were associated with a significant loss (19%) of lumbar spinal cord motor neurons, analysed at 14 months. When separated by gender, this effect was greater in females (26%; p < 0.0132). In addition, we crossed the DAOR199W transgenic mouse line with the SOD1G93A mouse model of ALS to determine whether the effects of SOD1G93A were potentiated in the double transgenic line (DAOR199W/SOD1G93A). Although overall survival was not affected, onset of neurological signs was significantly earlier in female double transgenic animals than their female SOD1G93A littermates (125 days vs 131 days, P = 0.0239). In summary, some significant in vivo effects of DAOR199W on motor neuron function (i.e. kyphosis and loss of motor neurons) were detected which were most marked in females and could contribute to the earlier onset of neurological signs in double transgenic females compared to SOD1G93A littermates, highlighting the importance of recognizing gender effects present in animal models of ALS.

    An extension of Laplace's method

    Asymptotic expansions are obtained for contour integrals of the form $\int_a^b \exp\left(-zp(t) + z^{\nu/\mu} r(t)\right) q(t)\,dt$, in which $z$ is a large real or complex parameter, $p(t)$, $q(t)$ and $r(t)$ are analytic functions of $t$, and the positive constants $\mu$ and $\nu$ are related to the local behaviour of the functions $p(t)$ and $r(t)$ near the endpoint $a$. Our main theorem includes as special cases several important asymptotic methods for integrals such as those of Laplace, Watson, Erdélyi and Olver. Asymptotic expansions similar to ours were derived earlier by Dingle using formal, non-rigorous methods. The results of the paper also serve to place Dingle's investigations on a rigorous mathematical foundation. The new results have potential applications in the asymptotic theory of special functions in transition regions, and we illustrate this by two examples. Comment: 19 pages, 2 figures, revised version, accepted for publication in Constructive Approximatio
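For orientation, the classical Laplace-type leading term that the abstract above lists among its special cases can be written out as follows. This is the standard textbook statement for real $z \to +\infty$, not the paper's theorem. The constants $P$, $Q$ and the exponent $\lambda$ are notation assumed here for illustration, and reading the classical case as the one with $r(t) \equiv 0$ is likewise an assumption.

```latex
% Assumptions (illustrative, not taken from the paper):
%   p(t) > p(a) for a < t < b, and near the endpoint t = a
%   p(t) - p(a) \sim P (t-a)^{\mu}, \quad q(t) \sim Q (t-a)^{\lambda-1},
%   with P > 0, \mu > 0, \lambda > 0.
\[
  \int_a^b e^{-z p(t)}\, q(t)\, dt
  \;\sim\;
  \frac{Q}{\mu}\,
  \Gamma\!\left(\frac{\lambda}{\mu}\right)
  \frac{e^{-z p(a)}}{(P z)^{\lambda/\mu}},
  \qquad z \to +\infty .
\]
```

The leading term follows from replacing $p$ and $q$ by their local forms, extending the range to $[a, \infty)$, and substituting $u = zP(t-a)^{\mu}$, which turns the integral into a Gamma function.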

    Otitis media in young Aboriginal children from remote communities in Northern and Central Australia: a cross-sectional survey

    BACKGROUND: Middle ear disease (otitis media) is common and frequently severe in Australian Aboriginal children. There have not been any recent large-scale surveys using clear definitions and a standardised middle ear assessment. The aim of the study was to determine the prevalence of middle ear disease (otitis media) in a high-risk population of young Aboriginal children from remote communities in Northern and Central Australia. METHODS: 709 Aboriginal children aged 6–30 months living in 29 communities from 4 health regions participated in the study between May and November 2001. Otitis media (OM) and perforation of the tympanic membrane (TM) were diagnosed by tympanometry, pneumatic otoscopy, and video-otoscopy. We used otoscopic criteria (bulging TM or recent perforation) to diagnose acute otitis media. RESULTS: 914 children were eligible to participate in the study and 709 were assessed (78%). Otitis media affected nearly all children (91%, 95%CI 88, 94). Overall prevalence estimates adjusted for clustering by community were: 10% (95%CI 8, 12) for unilateral otitis media with effusion (OME); 31% (95%CI 27, 34) for bilateral OME; 26% (95%CI 23, 30) for acute otitis media without perforation (AOM/woP); 7% (95%CI 4, 9) for AOM with perforation (AOM/wiP); 2% (95%CI 1, 3) for dry perforation; and 15% (95%CI 11, 19) for chronic suppurative otitis media (CSOM). The perforation prevalence ranged from 0–60% between communities and from 19–33% between regions. Perforations of the tympanic membrane affected 40% of children in their first 18 months of life. These were not always persistent. CONCLUSION: Overall, 1 in every 2 children examined had otoscopic signs consistent with suppurative ear disease and 1 in 4 children had a perforated tympanic membrane. Some of the children with intact tympanic membranes had experienced a perforation that healed before the survey. 
In this high-risk population, high rates of tympanic perforation were associated with high rates of bulging of the tympanic membrane

    Dimethyl sulfide production: what is the contribution of the coccolithophores?
