272 research outputs found

    Phylogenetic analysis and molecular characterization of human immunodeficiency virus in Serbia

    Human immunodeficiency virus (HIV) is a retrovirus and the causative agent of acquired immunodeficiency syndrome (AIDS). Since the beginning of the epidemic more than 35 years ago, more than 78 million people have been infected and over 30 million have died. The high genetic variability and rapid evolution of HIV have been critical to its persistence and global spread. HIV is classified into two distinct types, HIV-1 and HIV-2. HIV-1 has diversified extensively into numerous genetic forms, including four groups (M, N, O and P), of which group M is responsible for the HIV/AIDS pandemic. Group M viruses are further classified into multiple phylogenetically distinct subtypes (A-D, F, G, H, J and K), sub-subtypes (A1, A2, F1 and F2) and numerous recombinant forms. The global distribution of HIV-1 is complex and dynamic, with regional epidemics representing only a subset of the global diversity. Molecular phylogenetic analysis, a method of reconstructing evolutionary relationships between nucleotide sequences, is one of the strategies for studying viral diversity and transmission dynamics within regional populations. It is estimated that around half of HIV-infected people are undiagnosed, which makes identifying transmission networks important for targeted public health interventions. In this study, modern phylogenetic methods were applied to HIV-1 sequences from isolates from Serbia in order to characterize the molecular epidemiology and transmission dynamics of the virus, which is key to better understanding the current HIV-1 epidemic in Serbia.
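Distance-based phylogenetic and transmission-cluster analyses commonly start from pairwise genetic distances between aligned sequences. The following is a minimal illustrative sketch, not the method used in this study: it computes the uncorrected p-distance, applies the Jukes-Cantor (JC69) correction, and links pairs whose distance falls at or below a threshold. The function names, the toy sequences and the 0.015 threshold are hypothetical choices for illustration.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

NUCLEOTIDES = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: share of differing sites among sites where both
    aligned sequences carry an unambiguous nucleotide (gaps/ambiguities skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in NUCLEOTIDES and y in NUCLEOTIDES]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def jc69_distance(p: float) -> float:
    """Jukes-Cantor correction for multiple substitutions at the same site."""
    if p >= 0.75:
        return math.inf  # correction undefined: sequences are saturated
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def linked_pairs(seqs: Dict[str, str], threshold: float) -> List[Tuple[str, str, float]]:
    """Hypothetical clustering step: link every pair whose JC69 distance is
    at or below `threshold` (a putative transmission link)."""
    links = []
    for i, j in combinations(sorted(seqs), 2):
        d = jc69_distance(p_distance(seqs[i], seqs[j]))
        if d <= threshold:
            links.append((i, j, d))
    return links


# Usage with toy 100-site sequences: s2 differs from s1 at 1 site, s3 at 5 sites.
base = "ACGT" * 25
seqs = {
    "s1": base,
    "s2": "T" + base[1:],
    "s3": "".join("T" if c == "A" else c for c in base[:20]) + base[20:],
}
print(linked_pairs(seqs, threshold=0.015))
```

Real analyses use model-based distances (or full likelihood trees) on curated alignments; the fixed threshold here only shows the shape of the linking step.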

    Technical debt-aware and evolutionary adaptation for service composition in SaaS clouds

    Composing and delivering software applications in the Cloud-based Software as a Service (SaaS) model offers cost-effective solutions with minimal resource management. However, several functionally equivalent web services with diverse Quality of Service (QoS) values have emerged in the SaaS cloud, and tenant-specific requirements make it difficult to select suitable web services for composing a software application. Moreover, given the changing workload from tenants, it is not uncommon for a service composition running in a multi-tenant SaaS cloud to encounter under-utilisation of its component services, which affects service revenue, and over-utilisation, which violates the service level agreement. All of this leads to challenging decision-making tasks: (i) when should the composite service be recomposed? and (ii) how should new component services be selected so as to maximise service utility over time? At the same time, a low operating cost for the service composition is desirable in the SaaS cloud. In this context, this thesis contributes an economics-driven service composition framework to address these challenges. The framework combines the principle of technical debt, a well-known software engineering concept, with an evolutionary algorithm and a time-series forecasting method to predictively handle service provider constraints and SaaS dynamics and thereby create added value in the service composition. We emulate a SaaS environment to conduct several experiments using an e-commerce system, realistic datasets and a workload trace. Further, we evaluate the framework by comparing it with other state-of-the-art approaches on diverse quality metrics.

    Elastic Dataflow Processing on the Cloud

    Clouds have become an attractive platform for the large-scale processing of modern Big Data applications, especially because of the elasticity that characterizes them: resources can be leased on demand and used for as long as needed, making it possible to create virtual infrastructures that change dynamically over time. Such applications often require processing complex queries that are expressed in a high-level language and are typically transformed into data processing flows (dataflows). A natural question is whether, and in what way, elasticity affects dataflow execution. It seems reasonable that execution is faster when more resources are used, but the monetary cost is then higher.
This gives rise to the concept of eco-elasticity, an additional kind of elasticity borrowed from economics, which captures the trade-off between the response time of the system and the amount of money paid for it, as influenced by the use of different amounts of resources. In this thesis, we approach the elasticity of clouds in a unified way that combines both the traditional notion and eco-elasticity. This unified concept of elasticity is essential for developing auto-tuned systems in cloud environments. First, we demonstrate that eco-elasticity exists in several common tasks that appear in practice and that it can be discovered using a simple, yet highly scalable and efficient algorithm. Next, we present two auto-tuned algorithms that use the unified model of elasticity to adapt to the query workload: 1) processing analytical queries in the form of tree execution plans in order to maximize profit, and 2) automated index management that takes into account the monetary cost of compute and storage resources. Finally, we describe EXAREME, a system for elastic data processing on the cloud that has been used and extended in this work. The system offers declarative languages based on SQL with user-defined functions (UDFs), and its syntax has been extended with parallelism primitives. EXAREME exploits both kinds of cloud elasticity by dynamically allocating and deallocating compute resources to adapt to the query workload.

    Coping with new Challenges in Clustering and Biomedical Imaging

    Recent years have seen a tremendous increase in data acquisition in scientific fields such as molecular biology, bioinformatics and biomedicine. Novel methods are therefore needed for the automatic processing and analysis of these large amounts of data. Data mining is the process of applying methods such as clustering or classification to large databases in order to uncover hidden patterns. Clustering is the task of partitioning the points of a data set into distinct groups so as to maximize the similarity within clusters and minimize the similarity between clusters. In contrast to unsupervised learning such as clustering, classification is a supervised learning problem: it aims to predict the group membership of data objects on the basis of rules learned from a training set in which group membership is known. Specialized methods have been proposed for hierarchical and partitioning clustering; however, these methods suffer from several drawbacks. In the first part of this work, new clustering methods are proposed that address problems of conventional clustering algorithms. ITCH (Information-Theoretic Cluster Hierarchies) is a hierarchical clustering method based on a hierarchical variant of the Minimum Description Length (MDL) principle, which finds hierarchies of clusters without requiring input parameters. As ITCH may converge only to a local optimum, we propose GACH (Genetic Algorithm for Finding Cluster Hierarchies), which combines the benefits of genetic algorithms with information theory. In this way the search space is explored more effectively. Furthermore, we propose INTEGRATE, a novel clustering method for data with mixed numerical and categorical attributes. Supported by the MDL principle, our method integrates the information provided by heterogeneous numerical and categorical attributes and thus naturally balances the influence of both sources of information.
A competitive evaluation illustrates that INTEGRATE is more effective than existing clustering methods for mixed-type data. Besides clustering methods for single data objects, we provide a solution for clustering different data sets that are represented by their skylines. The skyline operator is a well-established database primitive for finding the database objects that minimize two or more attributes with an unknown weighting between these attributes. In this thesis, we define a similarity measure, called SkyDist, for comparing the skylines of different data sets, which can be integrated directly into data mining tasks such as clustering or classification. The experiments show that SkyDist, in combination with different clustering algorithms, can give useful insights in many applications. In the second part, we focus on the analysis of high-resolution magnetic resonance images (MRI), which are clinically relevant and may allow early detection and diagnosis of several diseases. In particular, we propose a framework for the classification of Alzheimer's disease in MR images that combines the data mining steps of feature selection, clustering and classification. As a result, a set of highly selective features discriminating between patients with Alzheimer's disease and healthy people has been identified. However, the analysis of high-dimensional MR images is extremely time-consuming. We therefore developed JGrid, a scalable distributed computing solution designed to allow large-scale analysis of MRI and thus optimized prediction of diagnosis. In another study, we apply efficient algorithms for motif discovery to task-fMRI scans in order to identify patterns in the brain that are characteristic of patients with somatoform pain disorder. We find groups of brain compartments that occur frequently within the brain networks and discriminate well between healthy and diseased people.
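The skyline operator described above returns the objects that are not dominated: no other object is at least as good in every attribute and strictly better in at least one. A minimal, naive sketch, assuming every attribute is to be minimised; the function names and the hotel data are hypothetical, and SkyDist itself (the measure for comparing two skylines) is not reproduced here.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` dominates `b` when every attribute is to be minimised:
    `a` is no worse in all attributes and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Naive O(n^2) skyline: keep every point that no other point dominates.
    A point never dominates itself, so no self-exclusion is needed."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Usage: hotels described by (price, distance to the beach), both minimised.
hotels = [(50, 8), (80, 2), (60, 5), (90, 6), (55, 9)]
print(skyline(hotels))  # [(50, 8), (80, 2), (60, 5)]
```

The quadratic scan is the simplest correct formulation; block-nested-loop or sort-first variants avoid comparing every pair on large tables. Because no weighting is fixed, every point kept is optimal for some trade-off between the attributes, which is what makes the skyline useful when the weighting is unknown.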

    Advances in knowledge discovery and data mining Part II

    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II

    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    The study of low-dimensional, noisy manifolds embedded in a higher-dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped describe their essential properties and how they vary in space. However, when the manifold evolves through time, joint spatio-temporal modelling is needed to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at a fixed time to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a supernova.

    Systems epidemiology to devise new interventions for multi-host tuberculosis

    Animal tuberculosis (TB) is an infectious disease of livestock and wildlife caused mainly by Mycobacterium bovis and, to a lesser extent, Mycobacterium caprae. In Portugal, animal TB is maintained in a multi-host system involving livestock and wild ungulates. Understanding the processes that drive transmission at this interface is key to informing control. In this work, M. caprae isolates (n=55) from Portugal were characterized by spoligotyping and MIRU-VNTR, supporting the clonal structure, co-infection and in vivo microevolution of this ecotype. M. bovis isolates (n=948) from cattle, red deer and wild boar from TB hotspot regions were genotyped. Bayesian inference identified five ancestral populations and associated the most probable ancient M. bovis subpopulation with cattle and Beja, providing clues on the origin of the epidemic. A multinomial spatiotemporal probability model identified two significant TB clusters: one that persisted in 2004-2010, centred on Barrancos (Beja), with a significantly higher risk associated with cattle; and a second cluster, predominant in 2012-2016 and centred on the county of Rosmaninhal (Castelo Branco), for which wild boar contributed most to the relative risk. Whole-genome sequences (WGS) of 44 representative M. bovis isolates distinguished five genetic clades and supported sustained transmission and multiple introductions in this multi-host system. Exploratory evolutionary analysis gave further support to pathogen transition between different hosts. Comparative genomics applied to M. bovis (n=70) representing the global diversity of clonal complexes predicted an open pan-genome and showed diversification of discrete subpopulations through the core and accessory genomes. Consistent non-synonymous SNPs illustrated clade-specific virulence landscapes correlating with disease severity. Positive selection and weaker effects of recombination compared with mutation were evidenced as the predominant evolutionary forces.
Altogether, our results provide novel evidence on the population structure and evolution of M. caprae and M. bovis, delivering insights that could be used to inform adaptive TB control choices in different hosts and regions.

    Value- and debt-aware selection and composition in cloud-based service-oriented architectures using real options

    This thesis presents a novel model for service selection and composition in Cloud-based Service-Oriented Architectures (CB-SOA), called CloudMTD, which uses real options, a Dependency Structure Matrix (DSM) and propagation-cost metrics. CB-SOA architectures are composed of web services that are leased or bought from the cloud marketplace. A CB-SOA can improve its utility and add value to its composition by substituting its constituent services. Substitution decisions may introduce technical debt, which needs to be managed. The thesis defines the concept of technical debt for CB-SOA and reviews the technical debt definitions and approaches available in the literature. The formulation of the service substitution problem and its technical debt valuation is based on options and uses Binomial Options Analysis. The thesis examines different option types under uncertainty and considers scenarios that may lead to technical debt in web service selection and composition driven by either a technical or a business objective. In each scenario, we are interested in three decisions: (1) keep, (2) substitute or (3) abandon the current service. Each scenario takes into consideration one or more QoS attribute dimensions (e.g., availability). We address these scenarios from an option-based perspective, linking each scenario to a suitable option type; the specific option type depends on the nature of the application, the problem to be investigated and the decision to be taken. In addition, we use a DSM to represent dependencies among web services in CB-SOA, and we introduce time- and complexity-sensitive propagation-cost metrics to the DSM to solve the problem. Finally, the CloudMTD model informs the time-value of decisions under uncertainty based on the behavioral and structural aspects of CB-SOA.
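Binomial Options Analysis values an option by rolling payoffs back through a recombining lattice of possible values of an underlying asset. As a hedged illustration only, here is the textbook Cox-Ross-Rubinstein lattice with ordinary financial inputs; how CloudMTD maps service utility, QoS or technical debt onto the underlying value and strike is not described in this abstract and is not modelled here. The function and parameter names are hypothetical.

```python
import math


def binomial_option_value(s0: float, strike: float, rate: float, sigma: float,
                          maturity: float, steps: int, kind: str = "call",
                          american: bool = False) -> float:
    """Value an option on a Cox-Ross-Rubinstein recombining binomial lattice.

    s0: current value of the underlying; strike: exercise price;
    rate: continuously compounded risk-free rate; sigma: volatility;
    maturity: time to expiry in years; steps: number of lattice periods.
    """
    if kind not in ("call", "put"):
        raise ValueError("kind must be 'call' or 'put'")
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    growth = math.exp(rate * dt)
    p = (growth - down) / (up - down)  # risk-neutral probability of an up move
    if not 0.0 <= p <= 1.0:
        raise ValueError("no valid risk-neutral probability: use more steps")

    def payoff(s: float) -> float:
        return max(s - strike, 0.0) if kind == "call" else max(strike - s, 0.0)

    # Option values at expiry; index j counts the number of up moves.
    values = [payoff(s0 * up**j * down**(steps - j)) for j in range(steps + 1)]
    # Roll back through the lattice, discounting the risk-neutral expectation.
    for i in range(steps - 1, -1, -1):
        for j in range(i + 1):
            hold = (p * values[j + 1] + (1.0 - p) * values[j]) / growth
            # An American option is exercised early whenever that is worth more.
            values[j] = max(hold, payoff(s0 * up**j * down**(i - j))) if american else hold
    return values[0]


call = binomial_option_value(100, 100, 0.05, 0.2, 1.0, steps=500)
put = binomial_option_value(100, 100, 0.05, 0.2, 1.0, steps=500, kind="put")
print(round(call, 2), round(put, 2))
```

In the real-options literature an option to abandon is commonly modelled as an American put whose strike is the salvage value, so `kind="put", american=True` is the closest analogue to the "abandon" decision above; treat that mapping as an assumption rather than the thesis's formulation.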