
    Improving Transmission Efficiency of Large Sequence Alignment/Map (SAM) Files

    Research in bioinformatics primarily involves the collection and analysis of large volumes of genomic data. Naturally, this demands efficient storage and transfer of these data. In recent years, several studies have sought efficient compression algorithms to reduce the size of various types of sequencing data. One way to reduce the transmission time of large files is to compress them losslessly as far as possible. In this paper, we present SAMZIP, a specialized encoding scheme for sequence alignment data in SAM (Sequence Alignment/Map) format that improves the compression ratio achieved by existing compression tools. To achieve this, we exploit prior knowledge of the file format and its specification. Our experimental results show that our encoding scheme improves the compression ratio and thereby significantly reduces overall transmission time.
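
    The abstract does not spell out how SAMZIP uses its knowledge of the SAM format. One common format-aware idea is to split alignment records into per-field streams, so that similar values (read names, positions, CIGAR strings, qualities) sit next to each other, and to compress each stream separately. The sketch below shows only that general idea, using zlib as a stand-in back end. It is not SAMZIP's actual encoding. The function names `encode_columns` and `decode_columns` are hypothetical, and the code assumes every alignment line carries all 11 mandatory tab-separated fields.

```python
import zlib

# The 11 mandatory SAM alignment fields, in specification order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
COLUMNS = SAM_FIELDS + ["OPT"]  # "OPT" holds any optional TAG:TYPE:VALUE fields


def encode_columns(sam_text: str) -> tuple[int, dict[str, bytes]]:
    """Hypothetical helper: split SAM text into per-field streams, compress each."""
    header: list[str] = []
    cols: dict[str, list[str]] = {name: [] for name in COLUMNS}
    for line in sam_text.splitlines():
        if not line:
            continue
        if line.startswith("@"):  # header lines (@HD, @SQ, ...) form their own stream
            header.append(line)
            continue
        fields = line.split("\t")  # assumes all 11 mandatory fields are present
        for name, value in zip(SAM_FIELDS, fields[:11]):
            cols[name].append(value)
        cols["OPT"].append("\t".join(fields[11:]))
    streams = {"HEADER": "\n".join(header)}
    streams.update({name: "\n".join(values) for name, values in cols.items()})
    blobs = {name: zlib.compress(text.encode("utf-8"), 9) for name, text in streams.items()}
    return len(cols["QNAME"]), blobs


def decode_columns(n_records: int, blobs: dict[str, bytes]) -> str:
    """Hypothetical inverse: decompress each stream and re-interleave the records."""
    text = {name: zlib.decompress(blob).decode("utf-8") for name, blob in blobs.items()}
    lines = text["HEADER"].split("\n") if text["HEADER"] else []
    if n_records:
        cols = {name: text[name].split("\n") for name in COLUMNS}
        for i in range(n_records):
            fields = [cols[name][i] for name in SAM_FIELDS]
            if cols["OPT"][i]:
                fields.append(cols["OPT"][i])
            lines.append("\t".join(fields))
    return "\n".join(lines)


sample = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "r001\t99\tchr1\t7\t30\t8M\t=\t37\t39\tTTAGATAA\tIIIIIIII\tNM:i:0\n"
    "r002\t0\tchr1\t9\t30\t3S6M\t*\t0\t0\tAAAAGATAA\tIIIIIIIII\n"
)
n_records, blobs = encode_columns(sample)
assert decode_columns(n_records, blobs) == sample.rstrip("\n")  # lossless round trip
# Compare the total size of the per-field streams with whole-file zlib.
print(n_records, sum(len(b) for b in blobs.values()), len(zlib.compress(sample.encode("utf-8"), 9)))
```

    On a toy input like this, the per-stream overhead usually outweighs any gain. Column separation tends to pay off only on large files, where each stream is long enough for the compressor to exploit its internal redundancy.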

    The atm-1 gene is required for genome stability in Caenorhabditis elegans

    The Ataxia-telangiectasia mutated (ATM) gene in humans was identified as the basis of a rare autosomal disorder leading to cancer susceptibility and is now well known as an important signal transducer in the response to DNA damage. The model system Caenorhabditis elegans provides an approach to understanding the conserved functions of this gene. In this paper, we describe the structure and loss-of-function phenotype of the ortholog atm-1. Using bioinformatic and molecular analysis, we show that the atm-1 gene was previously misannotated. We find that the transcript is in fact a product of three gene predictions, Y48G1BL.2 (atm-1), K10E9.1 and F56C11.4, which together make up the complete coding region of ATM-1. We also characterize animals that are mutant for two available knockout alleles, gk186 and tm5027. As expected, atm-1 mutant animals are sensitive to ionizing radiation. However, atm-1 mutants also display phenotypes associated with genomic instability, including low brood size, reduced viability and sterility. We document several chromosomal fusions arising from atm-1 mutant animals. This is the first time a mutator phenotype has been described for atm-1 in C. elegans. Finally, we demonstrate the use of a balancer system to screen for and capture atm-1-derived mutational events. Our study establishes C. elegans as a model for the study of ATM as a mutator, potentially leading to the development of screens to identify therapeutic targets in humans.

    Is there a common water-activity limit for the three domains of life?

    Archaea and Bacteria constitute a majority of life systems on Earth but have long been considered inferior to Eukarya in terms of solute tolerance. Whereas the most halophilic prokaryotes are known for an ability to multiply at saturated NaCl (water activity (a_w) 0.755), some xerophilic fungi can germinate, usually at high-sugar concentrations, at values as low as 0.650-0.605 a_w. Here, we present evidence that halophilic prokaryotes can grow at water activities below 0.755: Halanaerobium lacusrosei (0.748), Halobacterium strain 004.1 (0.728), Halobacterium sp. NRC-1 and Halococcus morrhuae (0.717), Haloquadratum walsbyi (0.709), Halococcus salifodinae (0.693), Halobacterium noricense (0.687), Natrinema pallidum (0.681) and haloarchaeal strains GN-2 and GN-5 (0.635 a_w). Furthermore, extrapolation of growth curves (which is prone to giving conservative estimates) indicated theoretical minima down to 0.611 a_w for extreme, obligately halophilic Archaea and Bacteria. These were compared with minima for the most solute-tolerant Bacteria in high-sugar (or other non-saline) media (Mycobacterium spp., Tetragenococcus halophilus, Saccharibacter floricola, Staphylococcus aureus and so on) and for eukaryotic microbes in saline (Wallemia spp., Basipetospora halophila, Dunaliella spp. and so on) and high-sugar substrates (for example, Xeromyces bisporus, Zygosaccharomyces rouxii, Aspergillus and Eurotium spp.). We also manipulated the balance of chaotropic and kosmotropic stressors for the extreme, xerophilic fungi Aspergillus penicilloides and X. bisporus and, via this approach, reduced their established water-activity limits for mycelial growth (∼0.65) to 0.640. Furthermore, extrapolations indicated theoretical limits of 0.632 and 0.636 a_w for A. penicilloides and X. bisporus, respectively. Collectively, these findings suggest that there is a common water-activity limit, determined by physicochemical constraints, for the three domains of life.
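
    The abstract reports theoretical minima obtained by extrapolating growth curves, but it does not state the model used. As a minimal sketch, assume that growth rate falls roughly linearly as a_w decreases near the limit. A least-squares line of growth rate against a_w can then be solved for the a_w at which growth reaches zero. The helper names (`fit_line`, `extrapolated_aw_minimum`) are hypothetical. The data points are invented for illustration and are not measurements from the study.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolated_aw_minimum(aw: list[float], growth_rate: list[float]) -> float:
    """Water activity at which the fitted (assumed linear) growth-rate line hits zero."""
    slope, intercept = fit_line(aw, growth_rate)
    if slope <= 0:
        raise ValueError("expected growth rate to fall as water activity falls")
    return -intercept / slope


# Hypothetical measurements (not data from the study); growth rate in arbitrary units.
aw_values = [0.70, 0.72, 0.74, 0.76]
rates = [0.8, 1.0, 1.2, 1.4]
print(round(extrapolated_aw_minimum(aw_values, rates), 3))  # 0.62
```

    Real growth curves often bend near the limit, so a linear fit is only a simplification. Like the extrapolations in the abstract, any estimate obtained this way should be treated as a theoretical value rather than an observed growth limit.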

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
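
    The two headline figures can be combined into a back-of-envelope estimate of how many residual errors the sequence contains. Both inputs are approximate, so the result is only an order-of-magnitude figure. The conversion to the Phred scale (Q = -10 log10 p) is the standard sequencing-quality convention and is added here for comparison; the abstract itself does not use it.

```python
import math

GENOME_LENGTH = 2_850_000_000  # nucleotides in Build 35, as stated in the abstract
ERROR_RATE = 1 / 100_000       # ~1 error per 100,000 bases, as stated in the abstract

expected_errors = round(GENOME_LENGTH * ERROR_RATE)  # errors expected across the sequence
phred_quality = round(-10 * math.log10(ERROR_RATE))  # Phred scale: Q = -10 log10(p)
print(expected_errors, phred_quality)  # 28500 50
```

    In other words, the finished sequence still carries on the order of tens of thousands of single-base errors, which corresponds to an overall accuracy of roughly Q50.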

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that covers a variety of research fields, against which newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium, consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) articles. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performance. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The database server, located at https://relishdb.ict.griffith.edu.au, is freely available for downloading annotation data and for blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of powerful new techniques for title-based and title/abstract-based search engines for relevant articles in biomedical research.
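
    Okapi BM25 is one of the three baselines named above. The sketch below is the textbook BM25 scoring function. It uses the common defaults k1 = 1.2 and b = 0.75, plus the widely used "+ 1" variant that keeps the inverse document frequency positive. The abstract does not state the tokenization, parameters or fields the consortium actually used, so treat these as assumptions. The function name `bm25_scores` is hypothetical, and the toy "documents" are simply titles from this listing.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], documents: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every tokenized document for a tokenized query."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: number of documents containing each term at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue
            # "+ 1" inside the log keeps idf positive (a common BM25 variant).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Length normalization: b = 0 ignores length, b = 1 normalizes fully.
            denom = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores


documents = [
    "atm-1 gene genome stability caenorhabditis elegans".split(),
    "common water-activity limit three domains of life".split(),
    "finishing euchromatic sequence human genome".split(),
]
query = "human genome sequence".split()
scores = bm25_scores(query, documents)
ranking = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [2, 0, 1]
```

    The third document matches all three query terms and ranks first. The first document matches only "genome", and the second document matches nothing. A benchmark such as RELISH lets rankings like this be checked against expert relevance judgments rather than assumed to be correct.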

    Interaction Testing and Polygenic Risk Scoring to Estimate the Association of Common Genetic Variants with Treatment Resistance in Schizophrenia

    Importance: About 20% to 30% of people with schizophrenia have psychotic symptoms that do not respond adequately to first-line antipsychotic treatment. This clinical presentation, which is chronic and highly disabling, is known as treatment-resistant schizophrenia (TRS). The causes of treatment resistance, and their relationship to the causes of schizophrenia itself, are largely unknown. Adequately powered genetic studies of TRS are scarce because of the difficulty in collecting data from well-characterized TRS cohorts. Objective: To examine the genetic architecture of TRS by reassessing genetic data from schizophrenia studies and validating the results in carefully ascertained clinical samples. Design, Setting, and Participants: Two case-control genome-wide association studies (GWASs) of schizophrenia were performed, in which the case samples were defined as individuals with TRS (n = 10501) and individuals with non-TRS (n = 20325), respectively. The differences in effect sizes for allelic associations were then determined between the 2 studies, on the reasoning that such differences reflect treatment resistance rather than schizophrenia. Genotype data were retrieved from the CLOZUK and Psychiatric Genomics Consortium (PGC) schizophrenia studies. The output was validated using polygenic risk score (PRS) profiling of 2 independent schizophrenia cohorts with TRS and non-TRS: a prevalence sample with 817 individuals (Cardiff Cognition in Schizophrenia [CardiffCOGS]) and an incidence sample with 563 individuals (Genetics Workstream of the Schizophrenia Treatment Resistance and Therapeutic Advances [STRATA-G]). Main Outcomes and Measures: GWAS of treatment resistance in schizophrenia. The results of the GWAS were compared with complex polygenic traits through a genetic correlation approach and were used for PRS analysis in the independent validation cohorts, using the same TRS definition. Results: The study included a total of 85490 participants (48635 [56.9%] male) in its GWAS stage and 1380 participants (859 [62.2%] male) in its PRS validation stage. Treatment resistance in schizophrenia emerged as a polygenic trait with detectable heritability (1% to 4%), and several traits related to intelligence and cognition were genetically correlated with it (genetic correlation, 0.41-0.69). PRS analysis in the CardiffCOGS prevalence sample showed a positive association between TRS and a history of taking clozapine (r² = 2.03%; P = .001), which was replicated in the STRATA-G incidence sample (r² = 1.09%; P = .04). Conclusions and Relevance: In this GWAS, common genetic variants were differentially associated with TRS, and these associations may have been obscured in previous studies of broadly defined schizophrenia that amalgamated large GWAS samples. The findings of this study suggest that meta-analytic approaches are valid for studies of patient outcomes, including treatment resistance.
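
    The abstract names two computations without giving formulas. The first compares allelic effect sizes between the TRS and non-TRS analyses, and the second scores validation individuals with a polygenic risk score. The sketch below shows generic textbook versions of both, under stated assumptions. It is not the study's pipeline. Effect sizes are taken to be per-allele log odds ratios. The difference test assumes the two estimates are statistically independent; shared controls between analyses would violate this assumption, and the study's actual interaction test may differ. Real PRS pipelines also clump SNPs and apply P-value thresholds, and this sketch omits both steps. The function names are hypothetical, and all numbers are invented.

```python
import math


def polygenic_risk_score(dosages: list[int], effect_sizes: list[float]) -> float:
    """Sum of risk-allele dosages (0, 1 or 2) weighted by per-SNP effect sizes."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("need exactly one effect size per SNP")
    return sum(g * beta for g, beta in zip(dosages, effect_sizes))


def effect_difference_test(beta_1: float, se_1: float,
                           beta_2: float, se_2: float) -> tuple[float, float]:
    """z statistic and two-sided normal P value for beta_1 - beta_2,
    assuming the two estimates are statistically independent."""
    z = (beta_1 - beta_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P under a standard normal
    return z, p


# Hypothetical values for illustration only (not from the study).
score = polygenic_risk_score([0, 1, 2], [0.05, -0.02, 0.10])
z, p = effect_difference_test(0.12, 0.03, 0.04, 0.04)
print(round(score, 2), round(z, 2))  # 0.18 1.6
```

    In a validation cohort, scores like `score` would be computed for every individual and then related to TRS status, for example by regression, to obtain variance-explained figures such as the r² values reported above.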

    Immunologic difference between hypersensitivity to mosquito bite and hemophagocytic lymphohistiocytosis associated with Epstein-Barr virus infection

    Hemophagocytic lymphohistiocytosis (HLH) is a life-threatening, virus-triggered immune disease. Hypersensitivity to mosquito bite (HMB), a presentation of chronic active Epstein-Barr virus infection (CAEBV), may progress to HLH. This study aimed to investigate the immunologic differences between HMB episodes and HLH episodes associated with EBV infection. Immunoglobulins, lymphocyte subsets, cytotoxicity, intracellular perforin and granzyme expression, EBV load, and known candidate genes for hereditary HLH were evaluated and compared. In 12 HLH episodes (12 patients) and 14 HMB episodes (4 patients), there were decreased percentages of CD4+ and CD8+ lymphocytes and increased percentages of memory CD4+ and activated (CD2+HLADR+) lymphocytes. In contrast to the HMB episodes, which had higher IgE levels and EBV load predominantly in NK cells, the HLH episodes with EBV load predominantly in CD3+ lymphocytes had decreased perforin expression and cytotoxicity, both of which recovered during convalescence. However, there was no significant difference in total EBV load between these episodes, and no mutations were found in the candidate genes responsible for hereditary HLH. In conclusion, decreased perforin expression in the HLH episodes with predominantly CD3+ EBV load distinguishes them from the HMB episodes with predominantly NK-cell EBV load. Whether the presence of non-elevated memory CD4+ cells or activated lymphocytes (CD2+HLADR+) increases the mortality rate in HLH episodes remains to be determined in larger-scale studies.