10 research outputs found

    Improving the Caenorhabditis elegans Genome Annotation Using Machine Learning

    For modern biology, precise genome annotations are of prime importance, as they allow the accurate definition of genic regions. We employ state-of-the-art machine learning methods to assay and improve the accuracy of the genome annotation of the nematode Caenorhabditis elegans. The proposed machine learning system is trained to recognize exons and introns on the unspliced mRNA, utilizing recent advances in support vector machines and label sequence learning. In 87% (coding and untranslated regions) and 95% (coding regions only) of all genes tested in several out-of-sample evaluations, our method correctly identified all exons and introns. Notably, only 37% and 50%, respectively, of the presently unconfirmed genes in the C. elegans genome annotation agree with our predictions; we therefore hypothesize that a sizable fraction of those genes are not correctly annotated. A retrospective evaluation of the Wormbase WS120 annotation [1] of C. elegans reveals that splice form predictions on unconfirmed genes in WS120 are inaccurate in about 18% of the considered cases, while our predictions deviate from the truth in only 10%–13%. We experimentally analyzed 20 controversial genes on which our system and the annotation disagree, confirming the superiority of our predictions. While our method correctly predicted 75% of those cases, the standard annotation was never completely correct. The accuracy of our system is further corroborated by a comparison with two other recently proposed systems that can be used for splice form prediction: SNAP and ExonHunter. We conclude that the genome annotation of C. elegans and other organisms can be greatly enhanced using modern machine learning technology.

    Traveler's diarrhea: epidemiology and impact on visitors to Fortaleza, Brazil

    Objective. To assess the epidemiology and impact of traveler's diarrhea (TD) among visitors to the city of Fortaleza, Ceará, Brazil, as part of a global study on TD carried out in four countries. Methods. Within a cross-sectional survey, questionnaires were completed by departing travelers at the Fortaleza airport between March 1997 and February 1998. The questions inquired about demographics, duration of stay, reason for the visit, pretravel health advice received, risky food and beverage consumption while in Fortaleza, and quality of life during the visit in relation to having or not having contracted TD. Results. A total of 12 499 questionnaires were analyzed. The most common reason that the visitors gave for their travel to Fortaleza was a holiday (60.3%). The total diarrhea attack rate was 13.4%. Younger people (< 36 years) had significantly higher TD attack rates than did older persons. Using a logistic regression model, we investigated the visitors' risk factors, including age, gender, length of stay, and purpose of the trip. According to that analysis, characteristics that are slightly predictive of TD are gender, length of stay, and visiting as a tourist rather than for some other purpose. Characteristics that protect against contracting TD include being older and traveling for business rather than for some other reason. Of those who were incapacitated by TD, the mean duration of the impairment was 42 hours. Conclusions. TD affected the travel plans and activities of many of the visitors to Fortaleza. Further, although aware of the health risks, the majority of those travelers did not avoid all potentially contaminated food or beverage items. Given this pattern of behavior, future efforts to combat TD may have to depend on alternative strategies such as new vaccines.

    The State Model That Uses Open Reading Frame Information

    The sequences next to the state indicate which consensus has to appear at the transitions between intron (capital) and exon (bold). Here, we use the IUPAC code for ambiguous nucleotides (e.g., B = C/G/T, R = A/G, Y = C/T). The digit on the transition arrows is related to the reading frame and indicates the required frame shift to follow the transition (e.g., between state 1 and 2, one can only accept exons leading to a frame shift of 0). Also, it defines in which frame stop codons are allowed to occur: no stop codon should appear in-frame. Finally, the model is constructed such that in-frame stop codons cannot be assembled on the exon boundaries (this required the three additional state pairs 6/7, 10/11, and 12/13).
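The two checks this caption describes, matching a sequence against an IUPAC consensus and rejecting in-frame stop codons, can be illustrated with a minimal sketch. The function names `matches_consensus` and `has_in_frame_stop` are hypothetical, the stop codons are those of the standard genetic code, and the `frame` offset is a simplified stand-in that may not correspond to the frame-shift digits on the model's transition arrows.

```python
# Standard IUPAC nucleotide ambiguity codes; the caption itself lists only B, R, and Y.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Assumption: standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def matches_consensus(seq: str, consensus: str) -> bool:
    """Return True if every base of seq is allowed by the IUPAC consensus."""
    seq, consensus = seq.upper(), consensus.upper()
    if len(seq) != len(consensus):
        return False
    return all(base in IUPAC[code] for base, code in zip(seq, consensus))


def has_in_frame_stop(exon: str, frame: int = 0) -> bool:
    """Return True if a stop codon occurs when reading exon in steps of 3
    starting at offset `frame` (hypothetical convention: 0, 1, or 2)."""
    exon = exon.upper()
    return any(exon[i:i + 3] in STOP_CODONS
               for i in range(frame, len(exon) - 2, 3))
```

For instance, `matches_consensus("GT", "GY")` accepts the dinucleotide because Y stands for C or T, while `has_in_frame_stop("ATGTAAGG", 0)` flags the TAA codon that would rule out this exon in frame 0.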

    POIMs for Donor (Left) and Acceptor (Right) SVM Classifiers

    Shown are the color-coded importance scores of substring lengths for positions around the splice sites. Near the splice site, many important oligomers are identified. Particularly long substrings are important upstream of the donor and downstream of the acceptor site. See the main text for discussion.

    Probenahme und Bestimmung von Aerosolen und deren Inhaltsstoffen – Bestimmung von metallhaltigen Staubinhaltsstoffen : Air monitoring methods in German language, 2019

    In addition to the gravimetric determination of airborne particles (total concentration), it is often necessary to selectively determine metals and their compounds in particle fractions because of their toxicological relevance. Usually, the total metal concentration in a sample is determined independently of the type of binding or oxidation state. From an occupational medical and toxicological point of view, it makes sense to distinguish between different compounds of a metal, because the type and extent of the toxic effect of metals depend considerably on their binding type and their solubility in the human body. In addition to the limit values for the respirable and inhalable particle fractions that must be complied with, many metals have an OEL (occupational exposure limit) or MAK value that must also be checked and complied with. For carcinogenic compounds, the exposure‐risk relationship has to be considered. Analysis for metals and their compounds predominantly relies on methods that require the dust particle sample to be brought into solution. That means the metals and their compounds contained in the sample need to be extracted, dissolved or digested. The aim of sample preparation is the complete dissolution of all relevant substances to be analysed. Common digestion methods are, for example, acid digestion, which uses an acid mixture to digest the sample, and the suspension method, in which acetone is used to suspend the sample. An alternative sample preparation method is microwave‐assisted pressure digestion with an acid or acid mixture. In this chapter the different digestion methods are presented, discussed and compared, taking into account recent developments, in particular microwave‐assisted digestion.