444 research outputs found

    The accessibility and scalability of gene family analysis

    Gene family detection allows us to gain a better understanding of how different genomes are related. At UNH, we have a pipeline that computes these families using a variety of methods. However, the pipeline is inefficient and performs poorly on large numbers of genomes. The pipeline comprises many Perl scripts, which are complex to use and require specific organization of the data at each step. This means that all users of the pipeline must undergo training to understand each step of the pipeline and the intricacies of each script. The goal of my thesis is two-fold. First, I have optimized the scripts used in determining the gene families. This allows users to run gene family analysis on any number of genomes without using excessive amounts of memory. My second step was to create a web interface for the pipeline. Each user is given an account that they can use to create pipeline projects. Within a project, users can simply upload their data and create the jobs they wish to run, and the web interface takes care of all the details. The server structures their data in the correct form, and the pipeline scripts are run automatically. The results are produced in an easy-to-understand format and can be downloaded by the users. We have taken this interface, created a machine image containing all the tools needed to run the pipeline, and made it publicly available on the Amazon Elastic Compute Cloud.

    Statistical analysis of genomic protein family and domain controlled annotations for functional investigation of classified gene lists

    Background: The growing number of protein family and domain-based annotations provides important information for understanding protein functions and gaining insight into relations among their codifying genes. To enable analysis of gene proteomic annotations, we implemented novel modules within GFINDer, a Web system we previously developed that dynamically aggregates functional and phenotypic annotations of user-uploaded gene lists and supports their statistical analysis and mining. Results: Exploiting protein information in the Pfam and InterPro databanks, we developed and added to GFINDer original modules specifically devoted to the exploration and analysis of functional signatures of gene protein products. They allow annotating numerous user-classified nucleotide sequence identifiers with controlled information on related protein families, domains and functional sites, classifying them according to such protein annotation categories, and statistically analyzing the obtained classifications. In particular, when uploaded nucleotide sequence identifiers are subdivided into classes, the Statistics Protein Families&Domains module estimates the relevance of Pfam or InterPro controlled annotations for the uploaded genes by highlighting protein signatures that are significantly more represented within user-defined classes of genes. In addition, the Logistic Regression module identifies the protein functional signatures that best explain the considered gene classification. Conclusion: The novel GFINDer modules provide genomic protein family and domain analyses that support better functional interpretation of gene classes, for instance those defined through statistical and clustering analyses of gene expression results from microarray experiments. They can hence help in understanding fundamental biological processes and complex cellular mechanisms influenced by protein domain composition, and contribute to unveiling new biomedical knowledge about the codifying genes.
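
    The abstract above does not say which statistical test GFINDer uses to flag over-represented protein signatures. As a minimal sketch of the general idea, assuming a one-sided hypergeometric test, the hypothetical function below computes the probability of seeing at least the observed number of annotated genes in a class by chance. The function name, parameter names, and counts are illustrative and not taken from GFINDer.

```python
from math import comb


def enrichment_p_value(population, annotated, class_size, annotated_in_class):
    """Upper-tail hypergeometric probability P(X >= annotated_in_class).

    population         -- total number of uploaded genes
    annotated          -- genes carrying the protein signature of interest
    class_size         -- genes in the user-defined class
    annotated_in_class -- genes in the class that carry the signature
    """
    if not (0 <= annotated <= population and 0 <= class_size <= population):
        raise ValueError("counts must lie within the population size")
    lower = max(annotated_in_class, 0)
    upper = min(annotated, class_size)
    total = comb(population, class_size)
    # Sum the probabilities of every outcome at least as extreme as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, class_size - k)
        for k in range(lower, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 uploaded genes, 5 carry a given domain,
# 10 genes fall in the class of interest, 4 of those carry the domain.
p = enrichment_p_value(20, 5, 10, 4)
print(f"p = {p:.4f}")
```

    A small p-value would suggest that the signature is more represented in the class than expected by chance; testing many signatures at once would also call for a multiple-testing correction, which this sketch leaves out.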

    Morphometric analysis of ears in two families of pinnipeds

    Submitted in partial fulfillment of the requirements for the degree of Master of Science at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, August 2001. Pinniped (seal and sea lion) auditory systems operate in two acoustically distinct environments, air and water. Pinniped species differ in how much time they typically spend in water. They therefore offer an exceptional opportunity to investigate aquatic versus terrestrial hearing mechanisms. The Otariidae (sea lions and fur seals) generally divide their time evenly between land and water and have several adaptations, e.g. external pinnae, related to this lifestyle. Phocidae (true seals) spend the majority of their time in water; they lack external pinnae and have well developed ear canal valves. Differences in hearing ranges and sensitivities have been reported recently for members of both of these families (Kastak, D., Schusterman, R.J., 1998. Low frequency amphibious hearing in pinnipeds. J. Acoust. Soc. Am. 1303, 2216-2228; Moore, P.W.B., Schusterman, R.J., 1987. Audiometric assessment of northern fur seals, Callorhinus ursinus. Mar. Mamm. Sci. 3, 31-53). In this project, the ear anatomy of three species of pinnipeds: an otariid, the California sea lion (Zalophus californianus), and two phocids, the northern elephant seal (Mirounga angustirostris) and the harbor seal (Phoca vitulina), was examined using computerized tomography (CT scans) and gross dissection. Three-dimensional reconstructions of the heads and ears from CT data were used to determine interaural dimensions and ossicular chain morphometrics. Ossicular weights and densities were measured conventionally. Results strongly support a canalcentric system for pinniped sound reception and localization. Further, true seals show adaptations for aquatic high frequency specialization. I was supported by an NDSEG fellowship from ONR.

    Evidence of Gene Conversion in Genes Encoding the Gal/GalNac Lectin Complex of Entamoeba

    The human gut parasite Entamoeba histolytica uses a lectin complex on its cell surface to bind to mucin and to ligands on the intestinal epithelia. Binding to mucin is necessary for colonisation and binding to intestinal epithelia for invasion; therefore, blocking this binding may protect against amoebiasis. Acquired protective immunity raised against the lectin complex should create a selection pressure to change the amino acid sequence of lectin genes in order to avoid future detection. We present evidence that gene conversion has occurred in lineages leading to E. histolytica strain HM1:IMSS and E. dispar strain SAW760. This evolutionary mechanism generates diversity and could contribute to immune evasion by the parasites.

    NEOWISE: Observations of the Irregular Satellites of Jupiter and Saturn

    We present thermal model fits for 11 Jovian and 3 Saturnian irregular satellites based on measurements from the WISE/NEOWISE dataset. Our fits confirm spacecraft-measured diameters for the objects with in situ observations (Himalia and Phoebe) and provide diameters and albedos for 12 previously unmeasured objects, 10 Jovian and 2 Saturnian irregular satellites. The best-fit thermal model beaming parameters are comparable to what is observed for other small bodies in the outer Solar System, while the visible, W1, and W2 albedos trace the taxonomic classifications previously established in the literature. Reflectance properties for the irregular satellites measured are similar to those of the Jovian Trojan and Hilda populations, implying common origins. Comment: 17 pages, 3 figures, accepted for publication in Astrophysical Journal.

    SUBFAMILY CLUSTERING USING LABEL UNCERTAINTY (FOR TRANSPOSABLE ELEMENT FAMILIES)

    Biological sequence annotation is typically performed by aligning a sequence to a database of known sequence elements. For transposable elements, these known sequences represent subfamily consensus sequences. When many of the subfamily models in the database are highly similar to each other, a sequence belonging to one subfamily can easily be mistaken for a member of another, causing non-reproducible subfamily annotation. Because annotation with subfamilies is expected to give some insight into a sequence’s evolutionary history, it is important that such annotation be reproducible. Here, we present our software tool, SCULU, which builds upon our previously described methods for computing annotation confidence, and uses those confidence estimates to find and collapse pairs of subfamilies that have a high risk of annotation collision. The result is a reduced set of subfamilies, with increased expected subfamily annotation reliability.
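
    To illustrate the idea of collapsing subfamily pairs with a high collision risk, here is a minimal sketch. It is not SCULU's algorithm: it assumes the pairwise risks are already available as plain numbers and merges every pair at or above a cutoff with a union-find structure. The abstract does not describe how SCULU computes its confidence estimates or whether it recomputes them after each merge. The function name, subfamily names, risk values, and threshold are all hypothetical.

```python
def collapse_subfamilies(names, collision_risk, threshold):
    """Group subfamilies whose pairwise collision risk is >= threshold.

    names          -- list of subfamily identifiers
    collision_risk -- dict mapping (name_a, name_b) to a risk in [0, 1]
    threshold      -- pairs at or above this risk are merged
    Returns a sorted list of sorted groups (merging is transitive).
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Visit the riskiest pairs first and merge their groups.
    for (a, b), risk in sorted(collision_risk.items(), key=lambda kv: -kv[1]):
        if risk >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Made-up subfamilies and risks, for illustration only.
names = ["sub_A1", "sub_A2", "sub_A3", "sub_B1"]
risks = {
    ("sub_A1", "sub_A2"): 0.40,
    ("sub_A2", "sub_A3"): 0.05,
    ("sub_A1", "sub_B1"): 0.00,
}
print(collapse_subfamilies(names, risks, threshold=0.2))
```

    Because the merge is transitive, a chain of risky pairs ends up in a single group. A real tool would likely re-estimate the risks against the merged model before merging further.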

    Electrostatic and Functional Analysis of the Seven-Bladed WD β-Propellers

    β-propeller domains composed of WD repeats are ubiquitous and typically used as multi-site docking platforms to coordinate and integrate the activities of groups of proteins. Here, we have used extensive homology modelling of the WD40-repeat family of seven-bladed β-propellers, coupled with subsequent structural classification and clustering of these models, to define subfamilies of β-propellers with common structural, and probably functional, characteristics. We show that it is possible to assign seven-bladed WD β-propeller proteins to functionally different groups based on the information gained from homology modelling. We examine general structural diversity within the WD40-repeat family of seven-bladed β-propellers and demonstrate that seven-bladed β-propellers composed of WD repeats are structurally distinct from other seven-bladed β-propellers. We further provide some insights into the multifunctional diversity of the seven-bladed WD β-propeller surfaces. This report once again reinforces the importance of structural data and the usefulness of homology models in functional classification.

    The intrinsic dimension of protein sequence evolution

    It is well known that, in order to preserve its structure and function, a protein cannot change its sequence at random, but only by mutations occurring preferentially at specific locations. Here we investigate quantitatively the amount of variability that is allowed in protein sequence evolution, by computing the intrinsic dimension (ID) of the sequences belonging to a selection of protein families. The ID is a measure of the number of independent directions that evolution can take starting from a given sequence. We find that the ID is practically constant for sequences belonging to the same family, and moreover that it is very similar in different families, with values ranging between 6 and 12. These values are significantly smaller than the raw number of amino acids, confirming the importance of correlations between mutations in different sites. However, we demonstrate that correlations are not sufficient to explain the small value of the ID we observe in protein families. Indeed, we show that the ID of a set of protein sequences generated by maximum entropy models, an approach in which correlations are accounted for, is typically significantly larger than the value observed in natural protein families. We further prove that a critical factor in reproducing the natural ID is to take into consideration the phylogeny of sequences.
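
    The abstract does not name the estimator used to compute the ID. As a rough sketch of how an intrinsic dimension can be estimated from pairwise distances alone, the code below assumes a two-nearest-neighbour ratio estimator: for each point, the ratio of the distances to its second and first neighbours is taken, and the maximum-likelihood dimension is the number of points divided by the sum of the log ratios. The function names are hypothetical, and the Hamming distance between aligned sequences is only one possible choice of metric.

```python
import math
import random


def hamming(a, b):
    """Number of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def two_nn_dimension(points, dist):
    """Estimate the intrinsic dimension from nearest-neighbour distance ratios.

    points -- list of items (at least three)
    dist   -- function returning a distance between two items
    Points with a zero first-neighbour distance (duplicates) are skipped.
    """
    log_ratios = []
    for i, p in enumerate(points):
        distances = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = distances[0], distances[1]
        if r1 > 0:
            log_ratios.append(math.log(r2 / r1))
    total = sum(log_ratios)
    if total == 0:
        raise ValueError("all distance ratios equal 1; estimate undefined")
    # Maximum-likelihood estimate for ratios following a Pareto law.
    return len(log_ratios) / total


# Sanity check on synthetic data with a known answer: points drawn
# uniformly from a square should give an estimate close to 2.
random.seed(0)
square = [(random.random(), random.random()) for _ in range(400)]
print(two_nn_dimension(square, math.dist))
```

    For sequences, the same function could be called as `two_nn_dimension(seqs, hamming)`. Because Hamming distances are integers, ties between first and second neighbours are common and bias a naive estimate, so this sketch is only a starting point.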

    Virus Evolution: How Does an Enveloped Virus Make a Regular Structure?

    The evolution of viruses has been an exciting area of study, albeit an area that is fraught with difficulties because of the lack of a fossil record and because of the rapid sequence divergence exhibited by viruses. All viruses in collections available for study in the laboratory have been isolated within the last 70 years. Studies of the rate of sequence divergence in viruses over this period of time, all of which have focused on RNA viruses, have given estimates of 10^-2 to 10^-4 changes per nucleotide per year (Takeda et al., 1994; Weaver et al., 1997). Although these rates for the fixation of mutations of necessity assay changes in only the most variable positions in the viral genome, and there are clearly positions that change much more slowly, it is nonetheless clear that it is difficult to establish relationships between two viruses that last had a common ancestor, for example, a million years ago, based solely on sequence relationships. Furthermore, it has become increasingly clear in the last two decades that extensive recombination over the ages has complicated the evolutionary relationships among viruses belonging to different families (Strauss et al., 1996). To ascertain distant relationships among viruses, structural studies are of increasing importance, because the structure of a protein changes much less rapidly than does the amino acid sequence that forms the structure (Rossmann et al., 1974).