42 research outputs found

    Pairwise Sequence Alignment between HBV and HCC Using Modified Needleman Wunsch Algorithm

    Get PDF
    This paper aims to find the similarity of Hepatitis B virus (HBV) and Hepatocellular Carcinoma (HCC) DNA sequences. Similarity between aligned sequences indicates similarity of their chemical and physical properties. Mutation of the viral DNA in the X region has a potential role in HCC; it is observed using pairwise sequence alignment of HBV genotype A. This paper proposes a modified Needleman-Wunsch algorithm for optimal global DNA sequence alignment. The main idea is to optimize the matrix-filling and backtracking steps over the DNA components, reducing both computational time and space complexity. The method is applied to DNA sequences from 858 hepatitis B viruses and 12 carcinoma patients, yielding 10,296 pairwise DNA sequence alignments computed globally with the modified method. As a result, a high similarity of 96.547% and a validity of 99.854% are achieved, with reductions of 34.6% in computational time and 42.52% in space complexity.
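    For reference, the following is a minimal sketch of the standard Needleman-Wunsch matrix fill and backtracking that the paper's method modifies; the scoring values and the implementation itself are illustrative assumptions, not the authors' optimized variant.

        # Minimal standard Needleman-Wunsch global alignment (illustrative only;
        # the paper's modified filling/backtracking optimizations are not shown).
        def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
            n, m = len(a), len(b)
            # Fill the (n+1) x (m+1) score matrix F.
            F = [[0] * (m + 1) for _ in range(n + 1)]
            for i in range(1, n + 1):
                F[i][0] = i * gap
            for j in range(1, m + 1):
                F[0][j] = j * gap
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    s = match if a[i - 1] == b[j - 1] else mismatch
                    F[i][j] = max(F[i - 1][j - 1] + s,  # substitution
                                  F[i - 1][j] + gap,    # gap in b
                                  F[i][j - 1] + gap)    # gap in a
            # Backtrack from the bottom-right corner to recover one optimal alignment.
            out_a, out_b, i, j = [], [], n, m
            while i > 0 or j > 0:
                if (i > 0 and j > 0 and
                        F[i][j] == F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)):
                    out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
                elif i > 0 and F[i][j] == F[i - 1][j] + gap:
                    out_a.append(a[i - 1]); out_b.append('-'); i -= 1
                else:
                    out_a.append('-'); out_b.append(b[j - 1]); j -= 1
            return ''.join(reversed(out_a)), ''.join(reversed(out_b)), F[n][m]

        print(needleman_wunsch("GATTACA", "GCATGCT"))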

    Data Quality & Machine Learning: A New Approach to Population and Housing Censuses

    Get PDF
    This project consists of the process of collecting and preparing handwritten paper data from the Population and Housing Census survey, applied to a population of over twenty million people. This type of survey is administered to the population of a country with the purpose of drawing conclusions, at a geographic level, about the population and its living conditions. Censuses are carried out at regular intervals, which allows comparisons over time and reveals how a society and a country change over the years. In order to turn more than twenty million handwritten questionnaires into useful, high-quality information about a country and its population, the work was divided into three phases: the collection of the data and its conversion from scanned images into a digital format where the text is editable; the cleansing and transformation of the data; and finally the analysis and classification of the data. For each phase, corresponding methodologies and technologies were used, namely OCR (Optical Character Recognition), NLP (Natural Language Processing), and Machine Learning, respectively. This approach led to a better, quicker, and more reliable analysis of the data.
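    As a rough illustration of the three phases, the sketch below wires OCR, text cleaning, and a text classifier together; the libraries (pytesseract, scikit-learn), the file name, and the toy occupation codes are assumptions for illustration, not the stack or data actually used in the project.

        # Sketch of the three-phase pipeline described above (hypothetical file
        # and field names; pytesseract/scikit-learn stand in for the OCR/ML stack).
        import re
        import pytesseract
        from PIL import Image
        from sklearn.pipeline import make_pipeline
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression

        # Phase 1: convert a scanned questionnaire image to editable text (OCR).
        def digitize(path):
            return pytesseract.image_to_string(Image.open(path), lang="por")

        # Phase 2: clean and normalize the extracted text (simple NLP step).
        def clean(text):
            text = text.lower()
            return re.sub(r"[^a-z0-9áéíóúâêôãõç\s]", " ", text)

        # Phase 3: classify a free-text answer (e.g. occupation) into census codes.
        train_answers = ["professor de matematica", "motorista de camiao"]  # toy data
        train_codes = ["2330", "8332"]                                      # toy codes
        clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
        clf.fit([clean(t) for t in train_answers], train_codes)

        answer = clean(digitize("survey_page.png"))  # hypothetical scanned page
        print(clf.predict([answer]))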

    Sequence and structural analysis of antibodies

    Get PDF
    The work presented in this thesis focusses on the sequence and structural analysis of antibodies and has fallen into three main areas. First, I developed a method to assess how typical an antibody sequence is of the expressed human antibody repertoire. My hypothesis was that the more "human-like" an antibody sequence is (in other words, how typical it is of the expressed human repertoire), the less likely it is to elicit an immune response when used in vivo in humans. In practice, I found that, while the most- and least-human sequences generated the lowest and highest anti-antibody responses in the small available dataset, there was little correlation in between these extremes. Second, I examined the distribution of the packing angles between VH and VL domains of antibodies and whether residues in the interface influence the packing angle. This is an important factor which has essentially been ignored in modelling antibody structures, since the packing angle can have a significant effect on the topography of the combining site. Finding out which interface residues have the greatest influence is also important in protocols for 'humanizing' mouse antibodies to make them more suitable for use in therapy in humans. Third, I developed a method to apply the standard Kabat or Chothia numbering schemes to an antibody sequence automatically. In brief, the method uses profiles to identify the ends of the framework regions and then fills in the numbers for each section. Benchmarking the performance of this algorithm against annotations in the Kabat database highlighted several errors in the manual annotations in that database. Based on structural analysis of insertions and deletions in the framework regions of antibodies, I have extended the Chothia numbering scheme to identify the structurally correct positions of insertions and deletions in the framework regions.
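    The profile-anchored numbering idea can be illustrated with a toy sketch: slide a short position-specific profile along the sequence to locate the end of a framework region, then number residues from that anchor. The profile columns, the example sequence, and the purely sequential numbering are hypothetical simplifications; the published method and real Kabat/Chothia numbering (with lettered insertions) are considerably more involved.

        # Toy illustration of profile-anchored numbering (not the published method):
        # score a sliding window against a position-specific profile to find the end
        # of a framework region, then number residues sequentially from that anchor.
        def best_anchor(seq, profile):
            # profile: one dict per column, mapping residue -> score
            w = len(profile)
            best, best_i = float("-inf"), 0
            for i in range(len(seq) - w + 1):
                s = sum(col.get(seq[i + k], 0.0) for k, col in enumerate(profile))
                if s > best:
                    best, best_i = s, i
            return best_i

        # Hypothetical 3-column profile marking the end of framework region 1.
        fr1_end_profile = [{"W": 2.0}, {"V": 1.0, "I": 0.8}, {"R": 1.0, "K": 0.8}]
        seq = "EVQLVESGGGLVQPGGSLRLSCAASWVR"
        anchor = best_anchor(seq, fr1_end_profile)
        # Assign simple sequential numbers up to the anchored motif (real Kabat/
        # Chothia numbering inserts lettered positions, omitted here).
        numbering = {i + 1: seq[i] for i in range(anchor + 3)}
        print(anchor, numbering)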

    Statistical Analysis of Sequence Populations in Virology and Immunology

    Get PDF
    In this thesis I have examined various topics regarding the relationship between viruses and the human immune system. I expanded and refined a tool (now available as the R package SeqFeatR on CRAN) for the analysis of sequence data and features of these sequences, such as HLA type or tropism (see chapter 4), and used this tool to check whether there are differences between several multiple-testing correction approaches for sequence data, and how Bayesian inference could be used in this context (see chapter 5). It could be shown that Bayesian inference is superior to the frequentist methods for this kind of problem, because multiple-testing correction approaches ignore the fact that different positions in a sequence alignment may be connected in the protein product of that sequence and are therefore not independent. Furthermore, I examined HCV sequences with a form of bootstrap algorithm to find sequence regions which can be used in unresolved transmission cases in court. Two regions were found, one in the hypervariable region and the other at the end of the non-structural protein NS5B (see chapter 9). Proteasomal cleavage of foreign amino acid sequences inside human cells leads to the presentation of fragments of these sequences on the surface of the cell as epitopes. To be presented, a fragment must not only bind to the MHC but also be of the correct length; viral evolution should therefore favor viruses which cannot be cut into presentable epitopes. Using epitope data from the IEDB and predicted viral sequences which bind the MHC, I searched for amino acids inside the flanking regions around the epitope that may indicate a possible escape mutation against the proteasomal cleavage process. Fourteen such amino acids and positions were found (see chapter 7). I created a model of the HBV reverse transcriptase to check whether mutations at certain positions could influence binding of the nucleotide-analogue reverse-transcriptase inhibitor Tenofovir. Mutations inside the binding pocket for Tenofovir showed, in an experimental study by the group of Mengji Lu, a decreased affinity towards the drug (see chapter 10). Together with Ralf Küppers' group, I examined NGS data from different types of B cells to search for nearly identical sequences between them. We found similar to identical sequences shared between two, three, and even four kinds of cells in the blood samples of both donors (see chapter 6).
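    The multiple-testing setting discussed above can be sketched as below: a per-column association test between alignment residues and an HLA indicator, followed by standard frequentist corrections. This is a generic illustration with simulated data, not the SeqFeatR implementation, and the Bayesian alternative argued for in the thesis is not reproduced here.

        # Sketch of per-column association testing with multiple-testing correction
        # (illustrative; not the SeqFeatR implementation).
        import numpy as np
        from scipy.stats import fisher_exact
        from statsmodels.stats.multitest import multipletests

        rng = np.random.default_rng(0)
        n_seq, n_pos = 40, 50
        hla = rng.integers(0, 2, n_seq)                      # toy HLA indicator
        aln = rng.choice(list("ACDE"), size=(n_seq, n_pos))  # toy alignment

        pvals = []
        for j in range(n_pos):
            col = aln[:, j]
            a = np.sum((col == "A") & (hla == 1))  # reference residue, HLA+
            b = np.sum((col == "A") & (hla == 0))
            c = np.sum((col != "A") & (hla == 1))
            d = np.sum((col != "A") & (hla == 0))
            pvals.append(fisher_exact([[a, b], [c, d]])[1])

        # Frequentist corrections treat columns as independent tests, which is
        # the assumption the thesis argues is violated for linked positions.
        reject_bonf, _, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
        reject_bh, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
        print(reject_bonf.sum(), reject_bh.sum())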

    Mechanistic Elucidation of Protease–Substrate and Protein–Protein Interactions for Targeting Viral Infections

    Get PDF
    Viral infections represent an old threat to global health, with multiple epidemics and pandemics in the history of mankind. Despite several advances in the development of antiviral substances and vaccines, many viral species are still not targeted. Additionally, new viral species emerge, posing an unprecedented menace to humans and animals and causing fatalities, disabilities, environmental harm, and economic losses. In this thesis, we present rational modeling approaches for targeting specific protease-substrate and protein-protein interactions pivotal for the viral replication cycle. Over the course of this work, antiviral research is supported beginning with the development of small-molecule antiviral substances, proceeding through the modeling of a potential immunogenic epitope for vaccine development, towards the establishment of descriptors for the susceptibility of animals to a viral infection. Notably, all the research was done under scarce data availability, highlighting the predictive power of computational methods and the complementarity between in-silico and in-vitro or in-vivo methods.

    Comprehensive analysis of lectin-glycan interactions reveals determinants of lectin specificity

    Get PDF
    Lectin-glycan interactions facilitate inter- and intracellular communication in many processes including protein trafficking, host-pathogen recognition, and tumorigenesis promotion. Specific recognition of glycans by lectins is also the basis for a wide range of applications in areas including glycobiology research, cancer screening, and antiviral therapeutics. To provide a better understanding of the determinants of lectin-glycan interaction specificity and support such applications, this study comprehensively investigates specificity-conferring features of all available lectin-glycan complex structures. Systematic characterization, comparison, and predictive modeling of a set of 221 complementary physicochemical and geometric features representing these interactions highlighted specificity-conferring features with potential mechanistic insight. Univariable comparative analyses with weighted Wilcoxon-Mann-Whitney tests revealed strong statistical associations between binding site features and specificity that are conserved across unrelated lectin binding sites. Multivariable modeling with random forests demonstrated the utility of these features for predicting the identity of bound glycans based on generalized patterns learned from non-homologous lectins. These analyses revealed global determinants of lectin specificity, such as sialic acid glycan recognition in deep, concave binding sites enriched for positively charged residues, in contrast to high mannose glycan recognition in fairly shallow but well-defined pockets enriched for non-polar residues. Focused fine specificity analysis of hemagglutinin interactions with human-like and avian-like glycans uncovered features representing both known and novel mutations related to shifts in influenza tropism from avian to human tissues. As the approach presented here relies on co-crystallized lectin-glycan pairs for studying specificity, it is limited in its inferences by the quantity, quality, and diversity of the structural data available. Regardless, the systematic characterization of lectin binding sites presented here provides a novel approach to studying lectin specificity and is a step towards confidently predicting new lectin-glycan interactions.
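    As a loose illustration of the two analysis layers, the sketch below runs a univariable rank test on one descriptor and a multivariable random forest over all of them. It uses SciPy's unweighted Mann-Whitney test and two synthetic features, whereas the study used weighted Wilcoxon-Mann-Whitney tests over 221 real physicochemical and geometric descriptors.

        # Sketch of the univariable and multivariable layers (toy features;
        # the study's 221 descriptors and weighted tests are not reproduced).
        import numpy as np
        from scipy.stats import mannwhitneyu
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(1)
        n = 200
        # Toy descriptors: binding-site depth and net positive charge.
        X = rng.normal(size=(n, 2))
        # Toy labels: "sialic acid" sites tend to be deeper and more positive.
        y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

        # Univariable layer: test one feature's association with the bound glycan.
        u, p = mannwhitneyu(X[y == 1, 0], X[y == 0, 0])
        print(f"depth vs. glycan class: U={u:.0f}, p={p:.2e}")

        # Multivariable layer: predict the bound glycan from all features jointly.
        rf = RandomForestClassifier(n_estimators=200, random_state=0)
        print("CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())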

    Structural Analysis of the Glycosylated Intact HIV-1 gp120-b12 Antibody Complex Using Hydroxyl Radical Protein Footprinting

    Get PDF
    Glycoprotein gp120 is a surface antigen and virulence factor of human immunodeficiency virus 1. Broadly neutralizing antibodies (bNAbs) that react to gp120 from a variety of HIV isolates offer hope for the development of broadly effective immunogens for vaccination purposes, if the interactions between gp120 and bNAbs can be understood. From a structural perspective, gp120 is a particularly difficult system because of its size, the presence of multiple flexible regions, and the large amount of glycosylation, all of which are important in gp120-bNAb interactions. Here, the interaction of full-length, glycosylated gp120 with bNAb b12 is probed using high-resolution hydroxyl radical protein footprinting (HR-HRPF) by fast photochemical oxidation of proteins. HR-HRPF allows for the measurement of changes in the average solvent accessible surface area of multiple amino acids without the need for measures that might alter the protein conformation, such as mutagenesis. HR-HRPF of the gp120-b12 complex coupled with computational modeling shows a novel extensive interaction of the V1/V2 domain, probably with the light chain of b12. Our data also reveal HR-HRPF protection in the C3 domain caused by interaction of the N330 glycan with the b12 light chain. In addition to providing information about the interactions of full-length, glycosylated gp120 with b12, this work serves as a template for the structural interrogation of full-length glycosylated gp120 with other bNAbs to better characterize the interactions that drive the broad specificity of the bNAb.
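    A common way to quantify HR-HRPF protection is to compare per-residue modification extents between the free and antibody-bound states; the sketch below shows this bookkeeping with invented numbers and hypothetical residue labels, not measurements from this study.

        # Sketch of HR-HRPF protection bookkeeping (illustrative numbers;
        # residue labels are hypothetical, not data from this study).
        import math

        # Fraction of each residue oxidized in the free vs. antibody-bound state;
        # a drop on binding indicates reduced solvent accessibility.
        mod_free  = {"V1/V2:T135": 0.42, "C3:N330": 0.31, "C5:K490": 0.18}
        mod_bound = {"V1/V2:T135": 0.11, "C3:N330": 0.09, "C5:K490": 0.17}

        for res in mod_free:
            protection = math.log(mod_free[res] / mod_bound[res])
            flag = "protected" if protection > 0.5 else "unchanged"
            print(f"{res}: ln(free/bound) = {protection:+.2f} ({flag})")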

    Gene Transcription and Splicing of T-Type Channels Are Evolutionarily-Conserved Strategies for Regulating Channel Expression and Gating

    Get PDF
    T-type calcium channels operate within tightly regulated biophysical constraints for supporting rhythmic firing in the brain, heart and secretory organs of invertebrates and vertebrates. The snail T-type gene, LCav3 from Lymnaea stagnalis, possesses alternative, tandem donor splice sites enabling a choice of a large exon 8b (201 aa) or a short exon 25c (9 aa) in cytoplasmic linkers, similar to mammalian homologs. Inclusion of optional 25c exons in the III–IV linker of T-type channels speeds up kinetics and causes hyperpolarizing shifts in both activation and steady-state inactivation of macroscopic currents. The abundant variant lacking exon 25c is the workhorse of embryonic Cav3 channels, whose high density and right-shifted activation and availability curves are expected to increase pace-making and allow the channels to contribute more significantly to cellular excitation in prenatal tissue. Presence of brain-enriched, optional exon 8b conserved with mammalian Cav3.1 and encompassing the proximal half of the I–II linker, imparts a ∼50% reduction in total and surface-expressed LCav3 channel protein, which accounts for reduced whole-cell calcium currents of +8b variants in HEK cells. Evolutionarily conserved optional exons in cytoplasmic linkers of Cav3 channels regulate expression (exon 8b) and a battery of biophysical properties (exon 25c) for tuning specialized firing patterns in different tissues and throughout development.
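    Shifts in activation and steady-state inactivation such as those attributed to exon 25c are conventionally quantified by fitting Boltzmann functions to normalized current-voltage data. The sketch below fits synthetic data; the half-activation voltage and slope are illustrative values, not measurements from this study.

        # Sketch of the Boltzmann fits used to quantify splice-variant shifts in
        # activation/inactivation (synthetic data; parameters are illustrative).
        import numpy as np
        from scipy.optimize import curve_fit

        def boltzmann(v, v_half, k):
            # Fraction of channels activated/available at membrane potential v (mV).
            return 1.0 / (1.0 + np.exp((v_half - v) / k))

        v = np.linspace(-90, 10, 21)
        rng = np.random.default_rng(2)
        # Synthetic +25c-like variant: hyperpolarized half-activation (~-55 mV).
        g = boltzmann(v, -55.0, 6.0) + rng.normal(scale=0.02, size=v.size)

        (v_half, k), _ = curve_fit(boltzmann, v, g, p0=(-40.0, 5.0))
        print(f"fitted V1/2 = {v_half:.1f} mV, slope k = {k:.1f} mV")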

    Evaluating Design Decay during Software Evolution

    Full text link
    Software systems evolve, requiring continuous maintenance and development. They undergo changes throughout their lifetimes as new features are added and bugs are fixed. As these systems evolve, their designs tend to decay with time and become less adaptable to changing users' requirements. Consequently, software designs become more complex over time and harder to maintain; in some not-so-rare cases, developers prefer redesigning from scratch rather than prolonging the life of existing designs, which causes development and maintenance costs to rise. Therefore, developers must understand the factors that drive the decay of their designs and take proactive steps that facilitate future changes and slow down decay. Design decay occurs when changes are made to a software system by developers who do not understand its original design. On the one hand, making software changes without understanding their effects may lead to the introduction of bugs and the premature retirement of the system. On the other hand, when developers lack knowledge and/or experience in solving a design problem, they may introduce design defects, which are conjectured to have a negative impact on the evolution of systems and thus lead to design decay. Thus, developers need mechanisms to understand how a change to a system will impact the rest of the system, and tools to detect design defects. In this dissertation, we propose three principal contributions. The first contribution aims to evaluate design decay. Measuring design decay consists of using a diagram-matching technique to identify structural changes among versions of a design, such as a class diagram. Finding structural changes occurring in long-lived, evolving designs requires the identification of class renamings; thus, the first step of our approach concerns the identification of class renamings in evolving designs. The second step matches several versions of an evolving design to identify its decaying and stable parts, for which we propose bit-vector and incremental clustering algorithms. The third step consists of measuring design decay, for which we propose a set of metrics. The second contribution is related to change impact analysis. We present a new metaphor inspired by seismology to identify change impact: our approach considers a change to a class as an earthquake that propagates through a long chain of intermediary classes. It combines static dependencies between classes and historical co-change relations to measure the scope of change propagation in a system, i.e., how far a change will propagate from a "changed class" to other classes. The third contribution concerns design defect detection. We propose a metaphor inspired by the natural immune system: like any living creature, designs are subject to diseases, which are design defects, and detection approaches are defense mechanisms for designs. A natural immune system can detect similar pathogens with good precision; this precision has inspired a family of classification algorithms, Artificial Immune Systems (AIS), which we use to detect design defects. The three contributions are evaluated on open-source object-oriented systems, and the results enable us to draw the following conclusions:
    • The design decay metrics Tunnel Triplets Metric (TTM) and Common Triplets Metric (CTM) provide developers with useful insights into design decay. If TTM decreases, the original design is decaying; if TTM is stable, the original design is stable, meaning the system remains adapted to new, changing requirements.
    • Seismology provides an interesting metaphor for change impact analysis: changes propagate in systems like earthquakes, with impact most severe near the changed class and dropping off away from it. Using external information, we show that our approach helps developers locate the change impact easily.
    • The immune system provides an interesting metaphor for detecting design defects. Experimental results show that the precision and recall of our approach are comparable or superior to those of previous approaches.
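    The earthquake metaphor for change impact can be sketched as breadth-first propagation over a dependency graph with per-hop attenuation; the classes, the attenuation factor, and the omission of co-change weighting are all simplifying assumptions for illustration, not the dissertation's actual model.

        # Sketch of the seismology metaphor for change impact (hypothetical graph;
        # the thesis additionally weights edges with historical co-change data).
        from collections import deque

        # Static dependency graph: class -> classes that depend on it.
        deps = {
            "Parser": ["Compiler", "Linter"],
            "Compiler": ["IDE"],
            "Linter": ["IDE"],
            "IDE": [],
        }

        def impact(changed, attenuation=0.5):
            # Impact is strongest at the changed class and decays with each hop,
            # like seismic intensity decaying with distance from the epicenter.
            scores, frontier = {changed: 1.0}, deque([changed])
            while frontier:
                c = frontier.popleft()
                for nxt in deps.get(c, []):
                    s = scores[c] * attenuation
                    if s > scores.get(nxt, 0.0):
                        scores[nxt] = s
                        frontier.append(nxt)
            return scores

        # {'Parser': 1.0, 'Compiler': 0.5, 'Linter': 0.5, 'IDE': 0.25}
        print(impact("Parser"))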