Algorithms to Explore the Structure and Evolution of Biological Networks
High-throughput experimental protocols have revealed thousands of relationships amongst genes and proteins under various conditions. These putative associations are being aggressively mined to decipher the structural and functional architecture of the cell. One useful tool for exploring this data has been computational network analysis. In this thesis, we propose a collection of novel algorithms to explore the structure and evolution of large, noisy, and sparsely annotated biological networks.
We first introduce two information-theoretic algorithms to extract interesting patterns and modules embedded in large graphs. The first, graph summarization, uses the minimum description length principle to find compressible parts of the graph. The second, VI-Cut, uses the variation of information to non-parametrically find groups of topologically cohesive and similarly annotated nodes in the network. We show that both algorithms find structure in biological data that is consistent with known biological processes, protein complexes, genetic diseases, and operational taxonomic units. We also propose several algorithms to systematically generate an ensemble of near-optimal network clusterings and show how these multiple views can be used together to identify clustering dynamics that any single solution approach would miss.
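VI-Cut is built on the variation of information (VI) between two partitions of the same nodes. The sketch below covers only that quantity, not the VI-Cut search procedure, which the abstract does not describe. It computes VI(A, B) = H(A) + H(B) - 2I(A; B) in bits for two clusterings given as per-node label lists. The function name is hypothetical.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """VI between two clusterings of the same nodes, in bits.

    labels_a[i] and labels_b[i] are the cluster IDs of node i.
    Uses VI = H(A|B) + H(B|A) = sum_ab p_ab * [log(p_a/p_ab) + log(p_b/p_ab)].
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same nodes")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        # p_a / p_ab == count_a[a] / n_ab, and likewise for b
        vi += p_ab * (log(count_a[a] / n_ab, 2) + log(count_b[b] / n_ab, 2))
    return vi
```

VI is zero exactly when the two partitions coincide up to relabeling, so `variation_of_information([0, 0, 1, 1], [5, 5, 7, 7])` gives 0.0. Two "crossed" bipartitions such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]` are maximally different here and give 2.0 bits.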
To facilitate the study of ancient networks, we introduce a framework called "network archaeology" for reconstructing the node-by-node and edge-by-edge arrival history of a network. Starting with a present-day network, we apply a probabilistic growth model backwards in time to find high-likelihood previous states of the graph. This allows us to explore how interactions and modules may have evolved over time. In experiments with real-world social and biological networks, we find that our algorithms can recover significant features of ancestral networks that have long since disappeared.
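The abstract does not say which growth model is run backwards. As a simplified illustration, assume a node-duplication model in which each new node copies some of the edges of an existing node. Under that assumption, the most recent arrival tends to be a node whose neighborhood closely resembles another node's, so a greedy backward pass can repeatedly remove such a node. The sketch uses Jaccard similarity of neighborhoods in place of a real likelihood computation. It is a heuristic stand-in, not the thesis's algorithm, and `reverse_history` is a hypothetical name.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two neighbor sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reverse_history(adj):
    """Greedy backward reconstruction under an assumed node-duplication model.

    adj maps each node to the set of its neighbors in an undirected graph
    (symmetric, no self-loops). Returns the nodes in inferred arrival order,
    oldest first. The input is not modified.
    """
    g = {u: set(nbrs) for u, nbrs in adj.items()}
    removed = []  # most recent arrival first
    while len(g) > 1:
        best_pair, best_score = None, -1.0
        for u, v in combinations(sorted(g), 2):
            # Compare neighborhoods, ignoring a direct u-v edge.
            score = jaccard(g[u] - {v}, g[v] - {u})
            if score > best_score:
                best_pair, best_score = (u, v), score
        u, v = best_pair
        # Heuristic: the node with fewer neighbors is taken to be the copy.
        newest = u if len(g[u]) <= len(g[v]) else v
        for w in g.pop(newest):
            g[w].discard(newest)
        removed.append(newest)
    removed.extend(g)  # the last surviving node is the inferred seed
    return removed[::-1]
```

In a triangle every node looks alike, so ties are broken by label order: `reverse_history({"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}})` returns `["c", "b", "a"]`. A likelihood-based version would replace the Jaccard score with the probability of each candidate removal under the chosen model.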
Our work is motivated by the need to understand large and complex biological systems that are being revealed to us by imperfect data. As data continues to pour in, we believe that computational network analysis will continue to be an essential tool toward this end.
Proceedings of the 97th Annual Virginia Academy of Science Meeting, 2019
Proceedings of the 97th Annual Virginia Academy of Science Meeting, May 22-24, 2019, at Old Dominion University, Norfolk, Virginia
Reconstruction and classification of unknown DNA sequences
The continuous advances in DNA sequencing technologies and metagenomic techniques require reliable reconstruction and accurate classification methodologies, both to expand the natural repository of known diversity and to contribute to the description and organization of organisms. However, after sequencing and de novo assembly, one of the most complex challenges comes from DNA sequences that do not match or resemble any biological sequence in the literature. Three main reasons contribute to this exception: the organism's sequence diverges strongly from those of the organisms known in the literature, an irregularity was introduced during the reconstruction process, or a new organism has been sequenced. The inability to classify these unknown sequences efficiently increases uncertainty about the sample's composition and wastes an opportunity to discover new species, since such sequences are often discarded.
In this context, the main objective of this thesis is the development and validation of a tool that provides an efficient computational solution to these three challenges, based on an ensemble of experts: compression-based predictors, the distribution of sequence content, and normalized sequence lengths. The method uses both DNA and amino acid sequences and provides efficient classification beyond standard referential comparisons. Unusually, it classifies DNA sequences without resorting directly to reference genomes, relying instead on features that the biological sequences of a species share. Specifically, it only uses features extracted individually from each genome, without sequence comparisons.
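The abstract lists three kinds of per-genome features: compression-based predictors, sequence-content distribution, and normalized length. A minimal sketch of computing such a vector for one DNA sequence, with no reference comparison, is shown below. The specific choices are assumptions, not RFSC's actual feature set: zlib stands in for whatever compressor the pipeline uses, content is summarized as k-mer frequencies, and length is normalized by a hypothetical `max_len`.

```python
import zlib
from itertools import product


def sequence_features(seq, k=2, max_len=10_000):
    """Alignment-free feature vector for a single DNA sequence.

    Nothing here is compared against a reference genome: every feature is
    computed from `seq` alone. The feature set, `k` and `max_len` are
    illustrative assumptions.
    """
    seq = seq.upper()
    # Compression-based feature: bits per base after zlib, a generic
    # stand-in for a dedicated DNA compressor.
    compressed = zlib.compress(seq.encode("ascii"), 9)
    bits_per_base = 8 * len(compressed) / max(len(seq), 1)
    # Sequence-content distribution: relative frequency of each k-mer over ACGT.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skips k-mers containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    composition = [counts[m] / total for m in kmers]
    # Normalized length, capped at 1.0.
    norm_len = min(len(seq) / max_len, 1.0)
    return [bits_per_base, *composition, norm_len]
```

Each sequence yields 1 + 4**k + 1 values (18 for k = 2). In a pipeline of this kind, such vectors would be the input to the ensemble of classifiers.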
RFSC was then created as a machine learning classification pipeline that relies on an ensemble of experts to provide efficient classification in metagenomic contexts. The pipeline is fully automatic and enables reference-free reconstruction of genomes from FASTQ reads, with the added guarantee of secure storage of sensitive information. It was tested on synthetic and real data, achieving in both cases precise and accurate results that, at the time this thesis was written, had not been reported in the state of the art. Specifically, it achieved an accuracy of approximately 97% in domain/type classification.
Master's in Computer and Telematics Engineering
Computationally Comparing Biological Networks and Reconstructing Their Evolution
Biological networks, such as protein-protein interaction, regulatory, or metabolic networks, provide information about biological function, beyond what can be gleaned from sequence alone. Unfortunately, most computational problems associated with these networks are NP-hard. In this dissertation, we develop algorithms to tackle numerous fundamental problems in the study of biological networks.
First, we present a system for classifying the binding affinity of peptides to a diverse array of immunoglobulin antibodies. Computational approaches to this problem are integral to virtual screening and modern drug discovery. Our system is based on an ensemble of support vector machines and exhibits state-of-the-art performance. It placed 1st in the 2010 DREAM5 competition.
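The abstract names an ensemble of support vector machines but not its features, kernels, or combination rule. The sketch below assumes linear SVMs trained with the Pegasos stochastic sub-gradient method, bagging over bootstrap resamples, and a majority vote. It omits any peptide encoding, so the toy usage runs on two-dimensional points, and all function names are hypothetical. It is not the DREAM5 entry itself.

```python
import random


def _augment(x):
    # Append a constant 1.0 so the bias is learned as an ordinary weight.
    return list(x) + [1.0]


def _dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM trained with Pegasos (SGD on the regularized hinge loss).

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    """
    rng = random.Random(seed)
    data = [_augment(x) for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * _dot(w, data[i]) < 1  # margin test uses the old w
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def train_svm_ensemble(X, y, n_models=5, seed=0):
    """Bagging: each SVM is trained on a bootstrap resample of the data."""
    rng = random.Random(seed)
    models = []
    for m in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(train_linear_svm([X[i] for i in idx], [y[i] for i in idx], seed=seed + m))
    return models


def predict(models, x):
    """Majority vote over the ensemble (use an odd n_models to avoid ties)."""
    xa = _augment(x)
    votes = sum(1 if _dot(w, xa) >= 0 else -1 for w in models)
    return 1 if votes > 0 else -1
```

On a separable toy set such as `X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]` with labels `[1, 1, 1, -1, -1, -1]`, the ensemble reproduces the training labels. A real binding-affinity classifier would need a peptide feature encoding and, likely, non-linear kernels.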
Second, we investigate the problem of biological network alignment. Aligning the biological networks of different species allows for the discovery of shared structures and conserved pathways. We introduce an original procedure for network alignment based on a novel topological node signature. The pairwise global alignments of biological networks produced by our procedure, when evaluated under multiple metrics, are both more accurate and more robust to noise than those of previous work.
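The abstract does not describe the topological node signature used in this work. To show the general shape of signature-based global alignment, the sketch makes three assumptions. It uses a much simpler hypothetical signature (a node's degree plus its largest neighbor degrees). It greedily pairs nodes with the most similar signatures into a one-to-one mapping. It scores the result with edge correctness, the fraction of edges in the first network whose endpoints map onto an edge in the second, which is one common alignment metric.

```python
def signature(adj, u, size=4):
    """Hypothetical signature: degree plus the `size` largest neighbor degrees."""
    nbr_degs = sorted((len(adj[v]) for v in adj[u]), reverse=True)[:size]
    nbr_degs += [0] * (size - len(nbr_degs))
    return [len(adj[u])] + nbr_degs


def distance(s1, s2):
    """L1 distance between two signatures."""
    return sum(abs(a - b) for a, b in zip(s1, s2))


def greedy_align(adj1, adj2):
    """Pairwise global alignment: a one-to-one node mapping from graph 1 to graph 2."""
    sig1 = {u: signature(adj1, u) for u in adj1}
    sig2 = {v: signature(adj2, v) for v in adj2}
    candidates = sorted((distance(sig1[u], sig2[v]), u, v) for u in adj1 for v in adj2)
    mapping, used = {}, set()
    for _, u, v in candidates:  # most similar pairs first
        if u not in mapping and v not in used:
            mapping[u] = v
            used.add(v)
    return mapping


def edge_correctness(adj1, adj2, mapping):
    """Fraction of graph-1 edges whose endpoints map onto a graph-2 edge."""
    edges = {(u, v) for u in adj1 for v in adj1[u] if u < v}
    if not edges:
        return 0.0
    kept = sum(1 for u, v in edges
               if u in mapping and v in mapping and mapping[v] in adj2[mapping[u]])
    return kept / len(edges)
```

Aligning a path `1-2-3` to a relabeled copy `a-b-c` gives the mapping `{1: "a", 2: "b", 3: "c"}` with edge correctness 1.0. Node labels must be comparable within each graph, because ties are broken by sorting.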
Next, we explore the problem of ancestral network reconstruction. Knowing the state of ancestral networks allows us to examine how biological pathways have evolved, and how pathways in extant species have diverged from those of their common ancestor. We describe a novel framework for representing the evolutionary histories of biological networks and present efficient algorithms for reconstructing either a single parsimonious evolutionary history or an ensemble of near-optimal histories. Under multiple models of network evolution, our approaches are effective at inferring the ancestral network interactions. Additionally, the ensemble approach is robust to noisy input and can be used to impute missing interactions in experimental data.
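The parsimony idea can be illustrated on a much smaller problem than the one the dissertation addresses. Suppose we want to decide whether a single interaction was present in ancestral species, given its presence or absence in extant species and a rooted binary species tree. Fitch's small-parsimony algorithm, sketched below as a stand-in, returns the candidate ancestral states at the root and the minimum number of gains or losses needed. It is not the dissertation's framework, which the abstract does not detail, and the species names are hypothetical.

```python
def fitch(tree, states):
    """Fitch small parsimony for one binary character on a rooted binary tree.

    tree: a leaf name (str) or a pair (left, right) of subtrees.
    states: leaf name -> 0/1 (interaction absent/present in that extant species).
    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    s_left, c_left = fitch(left, states)
    s_right, c_right = fitch(right, states)
    common = s_left & s_right
    if common:
        # Children agree on some state: no extra change is needed here.
        return common, c_left + c_right
    # Children disagree: one gain or loss must occur on a branch below.
    return s_left | s_right, c_left + c_right + 1
```

For the tree `(("human", "mouse"), ("fly", "worm"))` with the interaction present in every species except `"worm"`, the result is `({1}, 1)`. The interaction was most parsimoniously present at the root and lost once, on the worm lineage.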
Finally, we introduce a framework, GrowCode, for learning network growth models. While previous work focuses on developing growth models manually, or on procedures for learning parameters for existing models, GrowCode learns fundamentally new growth models that match target networks in a flexible and user-defined way. We show that models learned by GrowCode produce networks whose target properties match those of real-world networks more closely than existing models.
Program and Proceedings: The Nebraska Academy of Sciences 1880-2013
PROGRAM
FRIDAY, APRIL 19, 2013
REGISTRATION FOR ACADEMY, Lobby of Lecture wing, Olin Hall
Aeronautics and Space Science, Session A, Olin 249
Aeronautics and Space Science, Session B, Olin 224
Collegiate Academy, Biology Session A, Olin B
Biological and Medical Sciences, Session A, Olin 112
Biological and Medical Sciences, Session B, Smith Callen Conference Center
NE Chapter, Nat'l Council For Geographic Education, Olin 325
Junior Academy, Judges Check-In, Olin 219
Junior Academy, Senior High REGISTRATION, Olin Hall Lobby
Chemistry and Physics, Section A, Chemistry, Olin A
Chemistry and Physics, Section B, Physics, Planetarium
Collegiate Academy, Chemistry and Physics, Session A, Olin 324
Junior Academy, Senior High Competition, Olin 124, Olin 131
Aeronautics and Space Science, Poster Session, Olin 249
Anthropology, Olin 111
NWU Health and Sciences Graduate School Fair, Olin and Smith Curtiss Halls
Aeronautics and Space Science, Poster Session, Olin 249
MAIBEN MEMORIAL LECTURE, OLIN B
Bob Feurer, North Bend High School, Making People Smarter Using Habits of Mind
LUNCH, PATIO ROOM, STORY STUDENT CENTER
(pay and carry tray through cafeteria line, or pay at NAS registration desk)
Aeronautics Group, Sunflower Room
Biological and Medical Sciences, Session C, Olin 112
Biological and Medical Sciences, Session D, Smith Callen Conference Center
Chemistry and Physics, Section A, Chemistry, Olin A
Collegiate Academy, Biology Session A, Olin B
Collegiate Academy, Biology Session B, Olin 249
Collegiate Academy, Chemistry and Physics, Session B, Olin 324
Junior Academy, Judges Check-In, Olin 219
Junior Academy, Junior High REGISTRATION, Olin Hall Lobby
Junior Academy, Senior High Competition, (Final), Olin 110
Anthropology, Olin 111
Teaching of Science and Math, Olin 224
Applied Science and Technology, Olin 325
Junior Academy, Junior High Competition, Olin 124, Olin 131
NJAS Board/Teacher Meeting, Olin 219
BUSINESS MEETING, OLIN B
AWARDS RECEPTION for NJAS, Scholarships, Members, Spouses, and Guests
First United Methodist Church, 2723 N 50th Street, Lincoln, N
Integration of protein binding interfaces and abundance data reveals evolutionary pressures in protein networks
Networks of protein-protein interactions have received considerable interest in the past two decades for their insights about protein function and evolution. Traditionally, these networks only map the functional partners of proteins; they lack further levels of data such as binding affinity, allosteric regulation, competitive vs. noncompetitive binding, and protein abundance. Recent experiments have made such data available on a network-wide scale, and in this thesis I integrate two extra layers of data in particular: the binding sites that proteins use to interact with their partners, and the abundance or “copy numbers” of the proteins. By analyzing the networks for the clathrin-mediated endocytosis (CME) system in yeast and the ErbB signaling pathway in humans, I find that this extra data yields new insights about the evolution of protein networks. The structure of the binding site or interface interaction network (IIN) is optimized to allow higher binding specificity; that is, a large gap in strength between functional binding and nonfunctional mis-binding. This strongly implies that mis-binding is an evolutionary error-load constraint shaping protein network structure. Another way to limit mis-binding is to balance protein copy numbers so that there are no “leftover” proteins available for mis-binding. By developing a new method to quantify balance in IINs, I show that the CME network is significantly balanced when compared to randomly sampled sets of copy numbers. Furthermore, IINs with a biologically realistic structure produce less mis-binding under balanced concentrations, when compared to random networks, but more mis-binding under unbalanced concentrations. This implies strong pressure for copy-number balance, and suggests that any imbalance should occur for functional reasons. I thus explore some functional consequences of imbalance by constructing dynamic models of two poorly balanced subnetworks of the larger CME network.
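The abstract does not define its balance measure, so the sketch below uses a hypothetical stand-in. It scores each pair of interacting proteins by |log(n_i / n_j)|, which is zero when copy numbers match, and averages over the interface network. It then compares the observed score with scores obtained after randomly shuffling copy numbers among the same proteins. The returned value is an empirical p-value: small values mean that random copy-number assignments are rarely as balanced as the observed one.

```python
import math
import random


def imbalance(edges, copies):
    """Hypothetical score: mean |log(n_i / n_j)| over interacting pairs (0 = balanced)."""
    return sum(abs(math.log(copies[i] / copies[j])) for i, j in edges) / len(edges)


def balance_p_value(edges, copies, n_samples=10_000, seed=0):
    """Fraction of shuffled copy-number assignments at least as balanced as observed."""
    rng = random.Random(seed)
    proteins = list(copies)
    values = [copies[p] for p in proteins]
    observed = imbalance(edges, copies)
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(values)  # random reassignment of the same copy numbers
        # Small tolerance so floating-point ties count as "at least as balanced".
        if imbalance(edges, dict(zip(proteins, values))) <= observed + 1e-12:
            hits += 1
    # +1 correction so the estimate is never exactly zero.
    return (hits + 1) / (n_samples + 1)
```

With three interacting pairs whose partners have matching copy numbers (1000/1000, 100/100, 10/10), only about 1 in 15 random shuffles is equally balanced, so the p-value is roughly 0.07. A deliberately mismatched assignment gives a p-value near 1.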
In general, I find that balanced copy numbers provide higher protein complex yield (number of complete complexes), but imbalance may allow cells to “bottleneck” a functional process, effectively turning complex formation on or off via spatial localization of subunits. Finally, I find that strongly binding proteins are more likely to be balanced, as these “sticky” proteins would otherwise be more likely to engage in mis-binding.
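The yield argument can be made concrete with a back-of-the-envelope sketch. It assumes complexes of fixed stoichiometry that assemble to completion with no mis-binding. Under those assumptions, the number of complete complexes is limited by the scarcest subunit, and every copy beyond that is left over and free to mis-bind. The function names are hypothetical.

```python
def complex_yield(copies, stoichiometry):
    """Maximum number of complete complexes, assuming assembly goes to completion."""
    return min(copies[p] // k for p, k in stoichiometry.items())


def leftovers(copies, stoichiometry):
    """Free copies left after assembly: candidates for mis-binding."""
    n = complex_yield(copies, stoichiometry)
    return {p: copies[p] - n * k for p, k in stoichiometry.items()}


trimer = {"A": 1, "B": 1, "C": 1}
balanced = {"A": 100, "B": 100, "C": 100}
unbalanced = {"A": 140, "B": 140, "C": 20}  # same 300 total copies
```

With the same 300 total copies, the balanced case yields 100 trimers and leaves nothing over. The unbalanced case yields only 20 and leaves 240 free copies of A and B. In this toy model, the scarce subunit C acts as the kind of bottleneck the abstract describes.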