
    Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies

    The dramatic increase in heterogeneous types of biological data—in particular, the abundance of new protein sequences—requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity—GPCRs and kinases from humans, and the crotonase superfamily of enzymes—we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships
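
As a rough illustration of the network idea in this abstract, the sketch below builds a toy sequence similarity network. The abstract does not say how its networks are constructed, so everything here is an assumption: each node is a sequence, an edge joins two sequences whose pairwise score meets a chosen threshold, and the names `build_ssn`, `clusters`, and the `toy_scores` values are hypothetical.

```python
def build_ssn(pair_scores, threshold):
    """Return an adjacency map with an edge for every pair scoring >= threshold.

    pair_scores maps (seq_a, seq_b) tuples to a similarity score where a
    higher value means more similar (an assumption made for this sketch).
    """
    adjacency = {}
    for (a, b), score in pair_scores.items():
        # Register both sequences as nodes even if no edge is drawn.
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def clusters(adjacency):
    """Return the connected components of the network as sorted lists."""
    seen = set()
    result = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


# Made-up scores for five hypothetical sequences.
toy_scores = {
    ("seqA", "seqB"): 80.0,
    ("seqA", "seqC"): 75.0,
    ("seqB", "seqC"): 90.0,
    ("seqC", "seqD"): 20.0,
    ("seqD", "seqE"): 60.0,
}

loose = clusters(build_ssn(toy_scores, threshold=10.0))
strict = clusters(build_ssn(toy_scores, threshold=50.0))
```

With the permissive threshold all five toy sequences fall into one cluster; raising the threshold drops the weak `seqC`-`seqD` edge and splits the network in two. The choice of threshold therefore shapes which groupings are visible.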

    Predicting conserved protein motifs with Sub-HMMs

    Background: Profile HMMs (hidden Markov models) provide effective methods for modeling the conserved regions of protein families. A limitation of the resulting domain models is the difficulty of pinpointing their much shorter functional sub-features, such as catalytically relevant sequence motifs in enzymes or ligand-binding signatures of receptor proteins. Results: To identify these conserved motifs efficiently, we propose a method for extracting the most information-rich regions in protein families from their profile HMMs. The method was used here to predict a comprehensive set of sub-HMMs from the Pfam domain database. Cross-validations with the PROSITE and CSA databases confirmed the efficiency of the method in predicting most of the known functionally relevant motifs and residues. At the same time, 46,768 novel conserved regions could be predicted. The data set also allowed us to link at least 461 Pfam domains of known and unknown function by their common sub-HMMs. Finally, the sub-HMM method showed very promising results as an alternative search method for identifying proteins that share only short sequence similarities. Conclusions: Sub-HMMs extend the application spectrum of profile HMMs to motif discovery. Their most interesting utility is the identification of the functionally relevant residues in proteins of known and unknown function. Additionally, sub-HMMs can be used for highly localized sequence similarity searches that focus on shorter conserved features rather than entire domains or global similarities. The motif data generated by this study is a valuable knowledge resource for characterizing protein functions in the future.

    Visualizing information spaces to enhance social interaction

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000. Includes bibliographical references (p. 119-122). Human beings are social animals. In real life, we are constantly in contact with others, whose activities help us decide where to go, what to see, and who to talk to. In information spaces like the World Wide Web (WWW), however, people cannot easily gain a sense of being with others, determine patterns of activities, or find appropriate others to interact with. Our solution, LiveWeb, visualizes the underlying information structure and overlays the dynamic real-time presence of people. This live visualization serves as a basis for more effective interaction. LiveWeb has been deployed on a public Web site with over ten thousand hits a day. Experiments with real users of the system over several months have shown a strong preference for the presence display. By Rebecca Wen Fei Xiong. Ph.D.

    Evolution of protein domain architectures

    This chapter reviews current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this will directly impact which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that family sizes generally follow power law distributions, both within genomes and larger evolutionary groups. These findings were subsequently extended to multi-domain architectures. Genome evolution models that have been suggested to explain the shape of these distributions are reviewed, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions concerning domain architecture evolution mechanisms that can be drawn from these. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or have evolved convergently (polyphyly). We end with a discussion of some available tools for computational analysis or exploitation of protein domain architectures and their evolution.

    THE ANALYTICS QUOTIENT: RETOOLING CIVIL AFFAIRS FOR THE FUTURE OPERATING ENVIRONMENT

    Historically, military intelligence analysts and U.S. forces, frozen in their preferred strategy of attrition warfare, have undervalued civil information in conflicts against irregular threats. As operating environments grow more complex, uncertain, and population-centric, the roles of Civil Affairs Forces and civil information will become increasingly relevant. Unfortunately, the current analytical methods prescribed in Civil Affairs doctrine are inadequate for evaluating complex environments. They fail to provide supported commanders with the information required to make informed decisions. The purpose of this research is to determine how Civil Affairs Forces must retool their analytical capabilities to meet the demands of future operating environments. The answer lies in developing an organic Civil Affairs analytic capability suitable for employing data-driven approaches to gain actionable insights into uncertain operational environments, and subsequently, integrating those insights into sophisticated operational targeting frameworks and strategies designed to disrupt irregular threats. This research uses case studies of organizations, across a range of industries, that leveraged innovative data-driven approaches into disruptive competitive advantages. These organizations highlight the broad utility of the prescribed approaches and potential pathways for Civil Affairs Forces to pursue in creating an analytic capability that supports effective civil knowledge integration. http://archive.org/details/theanalyticsquot1094564891 Major, United States Army. Approved for public release; distribution is unlimited.

    Organizational Leadership in Academic Libraries : Identifying Culture Types and Leadership Roles

    The purpose of this study is to identify organizational culture types and leadership roles among research and non-research libraries in higher education institutions in the United States and to reveal trends that can assist in enacting needed organizational change. Organizational culture and leadership are two intertwined concepts that are strongly aligned with the human element of any supervisory experience. According to Crosby, they help “nurture effective and humane organizations” (Crosby, 2004). This research project sought to test the claims brought forth by library researchers such as Kaarst-Brown et al., who reported a tie between the library manager’s ability to shift leadership roles and the overall effectiveness of the organization’s culture (2004, p. 38). It also examined possible models to aid libraries in diagnosing and making change that can influence organizational culture in positive ways. Application of Cameron and Quinn’s Competing Values Framework (CVF) by use of the Organizational Culture Assessment Instrument (OCAI) provided a method for identifying culture and leadership roles among 625 academic library respondents. One hundred higher education libraries affiliated with the Association of Research Libraries (ARL) were compared to 123 similar-sized non-research oriented colleges and universities. The library literature stresses that budgetary constraints cause great difficulties among libraries of all types in this country. It also states that library science education does little to prepare its leaders to tackle this widespread crisis. This research project attempted to reveal the impact budget may have on culture, whether education has any bearing on leadership traits, and whether one library type displays cultures or leadership roles that are desirable. Significant differences were revealed for several of the variables studied.
Revealing culture types of library organizations and the leadership roles of their chief officers can aid in the diagnosis of effective or ineffective organizations. Once types and roles are identified, strategies can be suggested to meet institutional goals in spite of budget problems. With no state-supported economic relief anticipated for higher education in the near future, identifying creative strategies for library directors to employ may aid them in becoming more effective managers. Cameron and Quinn assert that effective managers beget effective leaders, who in turn can invoke positive change within their organizations (2006, p. 81).

    Finding related pages in the World Wide Web

    When using traditional search engines, users have to formulate queries to describe their information need. This paper discusses a different approach to Web searching where the input to the search process is not a set of query terms, but instead is the URL of a page, and the output is a set of related Web pages. A related Web page is one that addresses the same topic as the original page. For example, www.washingtonpost.com is a page related to www.nytimes.com, since both are online newspapers. We describe two algorithms to identify related Web pages. These algorithms use only the connectivity information in the Web (i.e., the links between pages) and not the content of pages or usage information. We have implemented both algorithms and measured their runtime performance. To evaluate the effectiveness of our algorithms, we performed a user study comparing our algorithms with Netscape's 'What's Related' service (http://home.netscape.com/escapes/related/). Our study showed that the precision at 10 for our two algorithms is 73% and 51% better, respectively, than that of Netscape, despite the fact that Netscape uses both content and usage pattern information in addition to connectivity information.
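
The abstract does not spell out its two algorithms, so the sketch below is only an illustration of what a connectivity-only approach can look like, not the authors' method. It uses simple co-citation counting: pages that are linked from the same parent pages as the query URL are treated as related. The function name `related_by_cocitation`, the `*.example` hosts, and the toy link list are all hypothetical.

```python
from collections import Counter


def related_by_cocitation(links, query_url, top_k=10):
    """Rank pages by how many parents they share with query_url.

    links is a collection of (source, target) hyperlink pairs. Only this
    link structure is used; page content and usage data are ignored.
    """
    unique_links = set(links)  # ignore duplicate links between the same pages
    # Parents are the pages that link to the query URL.
    parents = {src for src, dst in unique_links if dst == query_url}
    counts = Counter()
    for src, dst in unique_links:
        # Every other page linked from a parent is co-cited with the query.
        if src in parents and dst != query_url:
            counts[dst] += 1
    # Sort by descending co-citation count, then by URL for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _ in ranked[:top_k]]


# Toy link graph; the hub and news hosts are made up for illustration.
toy_links = [
    ("hub1.example", "www.nytimes.com"),
    ("hub1.example", "www.washingtonpost.com"),
    ("hub1.example", "news.example"),
    ("hub2.example", "www.nytimes.com"),
    ("hub2.example", "www.washingtonpost.com"),
    ("hub3.example", "www.washingtonpost.com"),
    ("hub3.example", "recipes.example"),
]

related = related_by_cocitation(toy_links, "www.nytimes.com")
```

In the toy graph, `www.washingtonpost.com` shares two parents with the query and ranks first, while `recipes.example` is never returned because its only parent does not link to the query page.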

    Functional discovery in the oxidative D-galacturonate assimilation pathway and development of the enzyme similarity web tool

    Sequencing technology has improved dramatically over the past few decades. Before the sequencing of complete genomes was possible, the sequencing of a gene was directly linked to the biochemical characterization of its product [1]; however, biochemical and genetic characterization has not benefited from being scaled up in the same way as sequencing has. Thus, the scientific community is confronted with exponentially growing sequence databases in which roughly half of the entries are either annotated incorrectly or not at all. Therefore, in order to realize the true potential of the data being generated by sequencing projects, something must be done about the way the functions of those sequences are being discovered and identified. One approach to addressing the problem of the growing number of sequences without a known function is that set forth by the Enzyme Function Initiative (EFI). The goal of the EFI is to develop tools and strategies to characterize enzymes discovered in genome projects, and the EFI uses an interdisciplinary approach to address the problem. EFI labs include those with expertise in bioinformatics, computational biology, structural biology, enzymology, and biology, which work together to develop a systematic approach that starts with using bioinformatics to select enzyme candidates for structural elucidation, then uses ligand docking to identify potential substrates, in vitro biochemistry to test those predictions, and microbiology to test for the physiological role of activities identified in vitro. The approach just described is the general approach taken, but other tools and approaches also have been tested and developed in each of the areas mentioned (e.g., bioinformatics, computational biology). Bioinformatics tools that have been further developed include sequence similarity networks (SSNs) and genomic context networks. SSNs have a long history and are useful in visualizing trends, notably in function, across groups of related protein sequences.
Before this work, access to SSNs by experimentalists with little bioinformatics training was limited. To give experimentalists the ability to generate an SSN for any protein family (~16,000 now in Pfam), we developed a web tool to generate SSNs quickly and easily. The networks can be viewed in Cytoscape and contain an aggregate of annotation data pulled from different sources (e.g., UniProt, GenomesOnline). The first part of this work (Chapter 2) describes the web tool and provides an example in which members of the enolase superfamily from Agrobacterium tumefaciens strain C58 are mined in a shotgun approach to discover novel enzymatic activities. In the second part of this work, combined bioinformatics and experimental approaches are used to identify two novel enzymes in the oxidative pathway to degrade pectin, the abundant plant cell wall polysaccharide. In the first example (Chapter 3), genomic context and pathway reconstruction combined with in vitro biochemistry and gene expression analysis reveal a novel enzymatic activity of isomerizing the 6-member ring lactone of D-galacturonate (D-galA) to its 5-member ring lactone counterpart. An enzyme to catalyze this reaction had not been identified before this work. In the second example (Chapter 4), a large-scale screening of transporters led us to microbial gene neighborhoods containing many enzymes in the known D-galA oxidative pathway, but we noticed that in a number of cases components of the known pathway were missing; in their place were candidate enzymes likely involved in an alternative pathway for metabolizing D-galA. This work led us to the discovery of an enzyme that hydrolyzed the 6-member ring lactone of D-galA to its acyclic diacid counterpart, meso-galactarate.

    Networks, Fields and Organizations: Micro-Dynamics, Scale and Cohesive Embeddings

    Social action is situated in fields that are simultaneously composed of interpersonal ties and relations among organizations, which are both usefully characterized as social networks. We introduce a novel approach to distinguishing different network macro-structures in terms of cohesive subsets and their overlaps. We develop a vocabulary that relates different forms of network cohesion to field properties as opposed to organizational constraints on ties and structures. We illustrate differences in probabilistic attachment processes in network evolution that link on the one hand to organizational constraints versus field properties and to cohesive network topologies on the other. This allows us to identify a set of important new micro-macro linkages between local behavior in networks and global network properties. The analytic strategy thus puts in place a methodology for Predictive Social Cohesion theory to be developed and tested in the context of informal and formal organizations and organizational fields. We also show how organizations and fields combine at different scales of cohesive depth and cohesive breadth. Operational measures and results are illustrated for three organizational examples, and analysis of these cases suggests that different structures of cohesive subsets and overlaps may be predictive in organizational contexts and similarly for the larger fields in which they are embedded. Useful predictions may also be based on feedback from level of cohesion in the larger field back to organizations, conditioned on the level of multiconnectivity to the field. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/44715/1/10588_2005_Article_5273175.pd