    Fractal-like Distributions over the Rational Numbers in High-throughput Biological and Clinical Data

    Recent developments in extracting and processing biological and clinical data are allowing quantitative approaches to studying living systems. High-throughput sequencing, expression profiles, proteomics, and electronic health records are some examples of such technologies. Extracting meaningful information from those technologies requires careful analysis of the large volumes of data they produce. In this note, we present a set of distributions that commonly appear in the analysis of such data. These distributions present some interesting features: they are discontinuous in the rational numbers, but continuous in the irrational numbers, and possess a certain self-similar (fractal-like) structure. The first set of examples presented here is drawn from a high-throughput sequencing experiment. Here, the self-similar distributions appear as part of the evaluation of the error rate of the sequencing technology and the identification of tumorigenic genomic alterations. The other examples are obtained from risk factor evaluation and analysis of relative disease prevalence and comorbidity as these appear in electronic clinical data. The distributions are also relevant to the identification of subclonal populations in tumors and the study of the evolution of infectious diseases, and more precisely the study of quasi-species and intrahost diversity of viral populations.
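
    One way such a distribution can arise is sketched below. This is an illustration only, not the authors' analysis: it assumes a ratio k/n (for example, k variant reads out of n total reads) in which every pair (k, n) with 1 <= n <= max_depth is equally likely. The function name and the uniform assumption are hypothetical. Under that assumption a reduced fraction p/q is hit once for every multiple of q up to max_depth, so the spike at p/q has height floor(max_depth / q): tall at 1/2, lower at 1/3 and 2/3, and so on.

```python
from collections import Counter
from fractions import Fraction


def ratio_histogram(max_depth):
    """Count how often each exact ratio k/n occurs over all pairs
    with 1 <= n <= max_depth and 0 <= k <= n (uniform toy assumption)."""
    counts = Counter()
    for n in range(1, max_depth + 1):
        for k in range(n + 1):
            # Fraction reduces k/n, so 2/4 and 1/2 land in the same bin.
            counts[Fraction(k, n)] += 1
    return counts


hist = ratio_histogram(30)

# List the tallest spikes: small denominators dominate.
for ratio, count in sorted(hist.items(), key=lambda kv: (-kv[1], kv[0]))[:7]:
    print(ratio, count)
```

    Because the spike height depends only on the denominator of the reduced fraction, the same pattern repeats at every scale between any two neighbouring spikes, which is the self-similar structure the abstract refers to.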

    Improved Vote Aggregation Techniques for the Geo-Wiki Cropland Capture Crowdsourcing Game

    Crowdsourcing is a new approach for solving data processing problems for which conventional methods appear to be inaccurate, expensive, or time-consuming. Nowadays, the development of new crowdsourcing techniques is mostly motivated by so-called Big Data problems, including problems of assessment and clustering for large datasets obtained in aerospace imaging, remote sensing, and even in social network analysis. By involving volunteers from all over the world, the Geo-Wiki project tackles problems of environmental monitoring with applications to flood resilience, biomass data analysis and classification of land cover. For example, the Cropland Capture Game, which is a gamified version of Geo-Wiki, was developed to aid in the mapping of cultivated land, and was used to gather 4.5 million image classifications from the Earth’s surface. More recently, the Picture Pile game, which is a more generalized version of Cropland Capture, aims to identify tree loss over time from pairs of very high resolution satellite images. Despite recent progress in image analysis, the solution to these problems is hard to automate, since human experts still outperform the majority of machine learning algorithms and artificial systems on certain image recognition tasks. The replacement of rare and expensive experts by a team of distributed volunteers seems to be promising, but this approach leads to challenging questions such as: how can individual opinions be aggregated optimally, how can confidence bounds be obtained, and how can the unreliability of volunteers be dealt with? In this paper, on the basis of several known machine learning techniques, we propose a technical approach to improve the overall performance of the majority voting decision rule used in the Cropland Capture Game. The proposed approach increases the estimated consistency with expert opinion from 77% to 86%.
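
    The baseline that the abstract sets out to improve, a plain majority vote scored against expert labels, can be sketched as follows. This is a minimal illustration and not the paper's improved method. The data layout (image, volunteer, label triples), the binary labels, the tie-breaking rule and the toy votes are all assumptions made for the example.

```python
from collections import defaultdict


def majority_vote(votes):
    """Aggregate binary votes per image.

    votes: iterable of (image_id, volunteer_id, label), label in {0, 1}.
    Ties are resolved to 0 (an arbitrary choice for this sketch).
    """
    tally = defaultdict(lambda: [0, 0])  # image_id -> [count of 0s, count of 1s]
    for image_id, _volunteer_id, label in votes:
        tally[image_id][label] += 1
    return {img: int(c[1] > c[0]) for img, c in tally.items()}


def agreement(decisions, expert):
    """Fraction of expert-labelled images on which the aggregate agrees."""
    shared = [img for img in expert if img in decisions]
    if not shared:
        return 0.0
    return sum(decisions[img] == expert[img] for img in shared) / len(shared)


# Hypothetical toy data: three images, up to three volunteers each.
votes = [
    ("a", "v1", 1), ("a", "v2", 1), ("a", "v3", 0),
    ("b", "v1", 0), ("b", "v2", 1),
    ("c", "v1", 0),
]
expert = {"a": 1, "b": 1, "c": 0}

decisions = majority_vote(votes)
score = agreement(decisions, expert)
print(decisions, score)
```

    An improved rule would typically replace the unweighted counts with weights that reflect each volunteer's reliability; the agreement function can stay the same for comparing the two.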

    How to Increase the Accuracy of Crowdsourcing Campaigns?

    Crowdsourcing is a new approach to performing tasks with a group of volunteers rather than experts. For example, the Geo-Wiki project [1] aims to improve the global land-cover map by crowdsourcing image recognition. Though crowdsourcing gives a simple way to perform tasks that are hard to automate, analysis of data received from non-experts is a challenging problem that requires a holistic approach. Here we study in detail the dataset of the Cropland Capture game (part of the Geo-Wiki project) to increase the accuracy of the campaign’s results. Using this analysis, we developed a methodology for a generic type of crowdsourcing campaign similar to the Cropland Capture game. The proposed methodology relies on computer vision and machine learning techniques. Using the Cropland Capture dataset we showed that our methodology increases agreement between aggregated volunteers’ votes and experts’ decisions from 77% to 86%. [1] Fritz, Steffen, et al. “Geo-Wiki.Org: The use of crowdsourcing to improve global land cover.” Remote Sensing 1.3 (2009): 345-354.

    Effects of Electron Correlations on Hofstadter Spectrum

    By allowing interactions between electrons, a new Harper's equation is derived to examine the effects of electron correlations on the Hofstadter energy spectra. It is shown that the structure of the Hofstadter butterfly for the system of correlated electrons is modified only in the band gaps and the band widths, but not in the characteristics of self-similarity and the Cantor set. Comment: 13 pages, 5 Postscript figures.

    T-cell cytotoxicity in the absence of viral protein synthesis in target cells

    Cytotoxic T cells lyse only those virus-infected target cells in vitro which express, in addition to the viral antigen(s), those K or D region products of the major histocompatibility complex (MHC) which were present during anti-viral sensitisation in vivo. This 'associative recognition' by cytotoxic T cells could reflect the interaction of two T-cell receptors with specificity for target K or D gene products and independently for the viral antigen, or one receptor with specificity for virally altered K or D region products (see ref. 1 and refs therein). There are various ways that the MHC antigens could be altered, including 'modification from within', where the virus modifies host protein synthesis by interfering with transcription (ref. 2), translation or post-translational glycosylation; or 'modification from without', where enzymic or chemical alteration of cell membrane proteins is induced by virus activity at the cell surface. In this report we show that inactivated Sendai virus or isolated Sendai virus envelopes can serve to modify a cell and make it a specific target for Sendai-immune T-cell killing, thus excluding the possibility of 'modification from within' in this system.

    Two-temperature LATE-PCR endpoint genotyping

    BACKGROUND: In conventional PCR, total amplicon yield becomes independent of starting template number as amplification reaches plateau and varies significantly among replicate reactions. This paper describes a strategy for reconfiguring PCR so that the signal intensity of a single fluorescent detection probe after PCR thermal cycling reflects genomic composition. The resulting method corrects for product yield variations among replicate amplification reactions, permits resolution of homozygous and heterozygous genotypes based on endpoint fluorescence signal intensities, and readily identifies imbalanced allele ratios equivalent to those arising from gene/chromosomal duplications. Furthermore, the use of only a single colored probe for genotyping enhances the multiplex detection capacity of the assay. RESULTS: Two-Temperature LATE-PCR endpoint genotyping combines Linear-After-The-Exponential (LATE)-PCR (an advanced form of asymmetric PCR that efficiently generates single-stranded DNA) and mismatch-tolerant probes capable of detecting allele-specific targets at high temperature and total single-stranded amplicons at a lower temperature in the same reaction. The method is demonstrated here for genotyping single-nucleotide alleles of the human HEXA gene responsible for Tay-Sachs disease and for genotyping SNP alleles near the human p53 tumor suppressor gene. In each case, the final probe signals were normalized against total single-stranded DNA generated in the same reaction. Normalization reduces the coefficient of variation among replicates from 17.22% to as little as 2.78% and permits endpoint genotyping with >99.7% accuracy. These assays are robust because they are consistent over a wide range of input DNA concentrations and give the same results regardless of how many cycles of linear amplification have elapsed. The method is also sufficiently powerful to distinguish samples with a 1:1 ratio of two alleles from samples composed of 2:1 and 1:2 ratios of the same alleles. CONCLUSION: SNP genotyping via Two-Temperature LATE-PCR takes place in a homogeneous closed-tube format and uses a single hybridization probe per SNP site. These assays are convenient, rely on endpoint analysis, improve the options for construction of multiplex assays, and are suitable for SNP genotyping, mutation scanning, and detection of DNA duplications or deletions.
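
    The effect of the normalization step on replicate variability can be illustrated with a small calculation. This is a sketch under stated assumptions, not the authors' protocol: it takes normalization to be the ratio of the high-temperature (allele-specific) signal to the low-temperature (total single-stranded amplicon) signal, and the replicate values below are invented so that yield varies between reactions while allele composition does not.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


def normalize(allele_signal, total_signal):
    """Divide each allele-specific reading by the total-amplicon reading
    taken in the same reaction (assumed form of the normalization)."""
    return [a / t for a, t in zip(allele_signal, total_signal)]


# Hypothetical replicate readings (arbitrary fluorescence units).
allele_signal = [50.0, 60.0, 70.0]    # probe signal at the high temperature
total_signal = [100.0, 120.0, 140.0]  # probe signal at the low temperature

raw_cv = coefficient_of_variation(allele_signal)
norm_cv = coefficient_of_variation(normalize(allele_signal, total_signal))
print(f"raw CV = {raw_cv:.2f}%, normalized CV = {norm_cv:.2f}%")
```

    In this idealized toy case the ratio is identical in every replicate, so the normalized coefficient of variation is zero; real data retain some residual variation, as the 2.78% figure in the abstract indicates.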

    LACO-Wiki: A New Online Land Cover Validation Tool Demonstrated Using GlobeLand30 for Kenya

    Accuracy assessment, also referred to as validation, is a key process in the workflow of developing a land cover map. To make this process open and transparent, we have developed a new online tool called LACO-Wiki, which encapsulates this process into a set of four simple steps: uploading a land cover map, creating a sample from the map, interpreting the sample with very high resolution satellite imagery, and generating a report with accuracy measures. The aim of this paper is to present the main features of this new tool, followed by an example of how it can be used for accuracy assessment of a land cover map. For the purpose of illustration, we have chosen GlobeLand30 for Kenya. Two different samples were interpreted by three individuals: one sample was provided by the GlobeLand30 team as part of their international efforts in validating GlobeLand30 with GEO (Group on Earth Observation) member states, while a second sample was generated using LACO-Wiki. Using satellite imagery from Google Maps, Bing and Google Earth, the results show overall accuracies between 53% and 61%, which are lower than the global accuracy assessment of GlobeLand30 but may be reasonable given the complex landscapes found in Kenya. Statistical models were then fit to the data to determine which factors affect the agreement between the three interpreters, such as the land cover class, the presence of very high resolution satellite imagery and the age of the image in relation to the baseline year for GlobeLand30 (2010). The results showed that all factors had a significant effect on the agreement.
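
    The accuracy measures mentioned above are usually derived from a confusion matrix of mapped versus interpreted classes. The sketch below computes the three standard ones (overall, user's and producer's accuracy). Which measures LACO-Wiki actually reports is not stated in the abstract, so this choice is an assumption, and the two-class matrix is invented for illustration.

```python
def accuracy_measures(matrix):
    """Compute accuracy measures from a square confusion matrix.

    matrix[i][j] = number of samples mapped as class i and interpreted
    (reference) as class j. Assumes every class has at least one mapped
    and one reference sample, so no division by zero occurs.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    overall = correct / total
    # User's accuracy: correct / all samples mapped as that class (row sum).
    users = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    # Producer's accuracy: correct / all reference samples of that class (column sum).
    producers = [matrix[i][i] / sum(matrix[r][i] for r in range(n)) for i in range(n)]
    return overall, users, producers


# Hypothetical two-class example (e.g. cropland vs. other).
confusion = [
    [30, 10],
    [20, 40],
]
overall, users, producers = accuracy_measures(confusion)
print(overall, users, producers)
```

    With several interpreters, as in the Kenya example, one such matrix can be built per interpreter and sample, which yields a range of overall accuracies rather than a single figure.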