
    Methods for Investigation of Dependencies between Attributes in Databases

    This paper surveys data mining research on discovering dependencies between attributes in databases. We consider several approaches: finding the distribution intervals of association rules, discovering branching dependencies between a given set of attributes and a given attribute in a database relation, finding fractional dependencies between a given set of attributes and a given attribute in a database relation, and collaborative filtering.

    An Integer Programming approach to Bayesian Network Structure Learning

    We study the problem of learning a Bayesian network structure from data using an Integer Programming approach. We review the existing approaches, and in particular some recent works that formulate the problem as an Integer Programming model. After discussing some weaknesses of these approaches, we propose an alternative solution based on a statistical sparsification of the search space. The results show that our approach is promising, especially for large networks.

    Computational approaches for metagenomic analysis of high-throughput sequencing data

    High-throughput DNA sequencing has revolutionised microbiology and is the foundation on which the nascent field of metagenomics has been built. The ability to cheaply sample billions of DNA reads directly from environments has democratised sequencing and allowed researchers to gain unprecedented insights into diverse microbial communities. These technologies, however, are not without their limitations: the short length of the reads requires vast amounts of data to be produced to ensure that all information is captured. This "data deluge" has been a major bottleneck and has necessitated the development of new analysis algorithms. Sequence alignment methods provide the most information about the composition of a sample, as they allow both taxonomic and functional classification, but alignment algorithms are prohibitively slow. This inefficiency has led to a reliance on faster algorithms that only produce simple taxonomic classification or abundance estimates, losing the valuable information given by full alignments against annotated genomes. This thesis describes k-SLAM, a novel ultra-fast method for the alignment and taxonomic classification of metagenomic data. Using a k-mer based method, k-SLAM achieves speeds three orders of magnitude faster than current alignment-based approaches, making full taxonomic classification and gene identification tractable on modern large datasets. The alignments found by k-SLAM can also be used to find variants and identify genes, along with their nearest taxonomic origins. A novel pseudo-assembly method produces more specific taxonomic classifications for species that have high sequence identity within their genus, giving a significant (up to 40%) increase in accuracy on these species. Also described is a re-analysis of a Shiga-toxin-producing E. coli O104:H4 isolate via alignment against bacterial and viral species to find antibiotic resistance and toxin-producing genes.
k-SLAM has been used by a range of research projects, including FLORINASH, and is currently being used by a number of groups.
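To make the phrase "k-mer based method" concrete, here is a minimal sketch of the general idea such methods share: cut reads and reference sequences into overlapping substrings of length k, index the reference k-mers, and use exact k-mer matches to find candidate references for a read. This is a generic illustration under that assumption, not k-SLAM's actual algorithm; the function names (`kmers`, `build_index`, `candidate_hits`) and the toy sequences are hypothetical.

```python
from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references: dict, k: int) -> dict:
    """Map each k-mer to the set of reference names that contain it (hypothetical helper)."""
    index = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return dict(index)


def candidate_hits(read: str, index: dict, k: int) -> dict:
    """Count, per reference, how many of the read's k-mers occur in it.

    References with many shared k-mers are candidates for a full alignment;
    this sketch stops at the counting step.
    """
    counts = defaultdict(int)
    for kmer in kmers(read, k):
        for name in index.get(kmer, ()):
            counts[name] += 1
    return dict(counts)


if __name__ == "__main__":
    refs = {"refA": "ACGTACGTGGCA", "refB": "TTGACCGTAACG"}
    idx = build_index(refs, k=4)
    print(candidate_hits("ACGTGGC", idx, k=4))  # {'refA': 4}
```

All four 4-mers of the toy read (`ACGT`, `CGTG`, `GTGG`, `TGGC`) occur in `refA` and none occur in `refB`, so only `refA` is reported. A hash-based index keeps each lookup roughly constant time, which is why exact k-mer matching is much cheaper than aligning every read against every reference.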

    On Subprocesses, Process Variations and Their Interplay: An Integrated “Divide and Conquer” Method for Modeling Business Processes and Their Variations

    Every organization can be conceived as a system in which value is created by means of business processes. In large organizations, business processes are commonly represented as process models, which serve a range of purposes such as internal communication, training, process improvement and information systems development. Given this multifunctional character, process models need to be captured in a way that facilitates understanding and maintenance by a variety of stakeholders. This thesis proposes an integrated decomposition-driven method for modeling business processes with variants. The core idea of the method is to incrementally decompose a business process and its variants into subprocesses. At each level of the decomposition and for each subprocess, we determine whether the subprocess should be modeled in a consolidated manner (one subprocess model for all variants or for several variants) or in a fragmented manner (one subprocess model per variant). In this way, a top-down approach of slicing and dicing a business process is taken: the process model is first sliced according to its variants and then diced (decomposed). This decision is based on two parameters. The first is the business drivers for the existence of the variants. Every variant of a business process has a root cause, i.e. a reason stemming from the business that causes the processes to differ in how they are executed. These root causes fall into five categories: drivers stemming from the customer, the product, operational reasons, the market, and time. The second parameter is the degree of difference in the way the variants produce their outcomes (such as the order of activities, the values of outcomes, and so on). Hence, how business process variants are modeled depends on how similar they are in the way they produce value.
The method presented in this thesis is validated by two real-life case studies. The first concerns the consolidation of existing process models; the second deals with green-field process discovery. The method is thus applied in two different contexts (consolidation and discovery) to two distinct cases. In both cases, the method produced sets of process models whose duplication rate was up to 50% lower than with conventional methods, while keeping the complexity of the models relatively stable.