12 research outputs found

    Bioprospecting for Genes Encoding Hydrocarbon-Degrading Enzymes from Metagenomic Samples Isolated from Northern Adriatic Sea Sediments

    Three metagenomic libraries were constructed from surface sediment samples from the northern Adriatic Sea. Two of the samples were taken from a highly polluted site and an unpolluted site, respectively. The third sample, from a polluted site, had been enriched using crude oil. The results of the metagenome analyses were incorporated in the REDPET relational database (http://redpet.bioinfo.pbf.hr/REDPET), which was generated using the previously developed MEGGASENSE platform. The database includes taxonomic data to allow the assessment of the biodiversity of metagenomic libraries and a general functional analysis of genes using hidden Markov model (HMM) profiles based on the KEGG database. A set of 22 specialised HMM profiles was developed to detect putative genes for hydrocarbon-degrading enzymes. Use of these profiles showed that the metagenomic library generated after selection on crude oil was enriched in genes for aerobic n-alkane degradation. The use of this system for bioprospecting was exemplified using potential alkB and almA genes from this library.

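    Optimiranje metoda i reprezentacija za prediktivno modeliranje mehanizama djelovanja i afiniteta vezanja biološki aktivnih molekula (Optimisation of methods and representations for predictive modelling of mechanisms of action and binding affinities of biologically active molecules)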

    The chemical space of compound scaffolds is vast and offers a large playground for lead drug discovery or repurposing. With the accumulation of experimental data over the years, together with the development of more complex statistical frameworks, screening such large chemical spaces has become feasible. There are several well-defined problem areas for drug screening efforts, the most popular being inhibition of a multitude of protein targets in human cells that are related to common diseases. Examples of heavily targeted protein spaces include protein kinases, G-protein-coupled receptors, and the targets of (non)selective serotonin re-uptake inhibitors. Mutation and dysregulation in any of these three protein groups can result in hereditary disorders, tumors, and mental disorders. In contrast to the machine learning frameworks available for predicting direct physical interactions between compounds and protein targets, certain chemical activity predictions, e.g. phytotoxic activity, are not well represented or defined in the literature. In this work, publicly available data on experimentally measured binding affinities of diverse compounds were collected for one of the most popular target protein families, protein kinases. This protein superfamily is one of the most important enzyme groups and regulates most major cellular processes, including cell metabolism, growth, and division. Protein kinases regulate biochemical cycles by transferring a high-energy phosphoryl group from adenosine triphosphate (ATP) to specific amino acid residues of target protein substrates. All members of this enzyme family are characterised by the highly conserved protein kinase (PK) domain, but depending on the phosphorylation site and the activation mechanisms of individual members, the superfamily can be divided into several kinase groups.
Due to the specific characteristics of this protein group and of kinase inhibitors, it is important to investigate how each of these chemical and biological spaces affects model performance and how predictive performance can be improved. In addition, we examine a different subspace of biological activity, focusing mostly on synthetic compounds with determined phytotoxic or herbicidal activity. We define this as a multiclass classification problem using two predefined classification systems: the main one by the Herbicide Resistance Action Committee (HRAC) and the second by the Weed Science Society of America (WSSA). Because no defined machine learning framework for modeling and predicting herbicidal activity was publicly available, an effort was made to collect a representative data set and define a computational approach that maximizes the accuracy of mode of action (MoA) prediction. Because phytotoxic compounds have mostly been classified by visual inspection of phenotypic changes in the affected weeds, there is a great need for an automated, systematic approach. Due to the limited size of the collected data, which consist of molecular structures of known activity labeled with a MoA group, we tested several "shallow" learners. The panel of tested algorithms includes naive Bayes (NB), support vector machines (SVM), extreme gradient boosting (XGBoost), and random forest (RF). All of these were trained with ten-times-repeated ten-fold (10x10-fold) cross-validation. The trained models were compared over all one hundred resamples using a non-frequentist approach, Bayesian analysis. For the first time in herbicide activity modeling, we have implemented a computational framework that spans feature processing and selection, the training of several learners, and, ultimately, a statistical comparison of their performance.
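The 10x10-fold resampling scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say whether the folds were stratified, so stratification by class is an assumption here, and the function name, the class labels and the class sizes are all hypothetical.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated stratified k-fold CV."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        offset = 0
        for label in sorted(by_class):
            members = by_class[label][:]
            rng.shuffle(members)  # a fresh shuffle for every repeat
            # Deal class members round-robin so each fold keeps the class proportions.
            for pos, idx in enumerate(members):
                folds[(pos + offset) % n_splits].append(idx)
            offset += len(members)
        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            yield repeat, fold, train_idx, sorted(test_idx)


# Hypothetical MoA group labels for 100 compounds (assumption, not the study data).
labels = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
resamples = list(repeated_stratified_kfold(labels))
print(len(resamples))  # 10 repeats x 10 folds = 100 resamples
```

In a full workflow, each learner would be fitted on `train_idx` and scored on `test_idx` for every resample, giving one hundred paired scores per model for the subsequent statistical comparison.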
For protein kinase inhibitors, by contrast, the sheer size of the publicly available experimental data allowed more complex techniques for modeling physical interactions between small compounds and the human kinome, but also made it more challenging to define and engineer the feature space for over 8000 compounds and the nuances of the protein kinase family. Both of the aforementioned approaches are founded on quantitative structure-activity relationship (QSAR) modeling principles. Defining the applicability domain (AD) for a specified problem is one of the pillars of QSAR modeling. However, defining the boundaries of the chemical space within which the model can make accurate predictions is not simple and depends on the nature of the trained model. When predicting general biological activity in the form of a phenotypic signal, as with herbicidal activity, the applicability domain can be defined simply in two-dimensional space by considering the structural similarity of available molecules and a model output, such as the probability of belonging to a particular class. Predicting the physical interaction between two entities, such as compounds and protein targets, adds complexity that the conventional applicability domain cannot accommodate. In this case, we extend the standard applicability domain to include both entities and generate a quantitative estimate of prediction confidence using the conformal prediction framework. Conformal predictors can reliably estimate a prediction region based on the computed nonconformity of test samples. The disadvantage of this method is that nonconformity is defined in the label space of predefined calibration samples, resulting in estimates that work well in general but are not specific to any tested compound-target pair and thus fail for samples that are not already represented in the training set.
Combining concepts from both frameworks, we dynamically define similarity-based applicability domains, or conformity regions, for each new sample and then calculate nonconformity scores; we refer to this approach as the dynamic applicability domain (dAD). The dAD approach was shown to produce tighter prediction regions than the original conformal predictors algorithm. More importantly, and complementary to the tighter prediction regions, dAD achieves lower error rates at any confidence level, which matters most in the harder, more realistic test scenarios such as discovery (S2) and repurposing (S3). Furthermore, merging the concept of the applicability domain with conformal predictors corrects for existing bottlenecks in the traditional applicability domain definition and allows model behavior to be evaluated in an abstract interaction space between any number of interacting entities. It is therefore a valuable and informative approach for validating data quality in subregions of interaction space specific to biomolecular complexes.
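The contrast between a single global calibration set and a similarity-based region defined per sample can be sketched as below. This is a simplified illustration, not the dAD algorithm itself. It assumes a regression setting, absolute residuals as the nonconformity score, Tanimoto similarity on feature sets, and a fixed similarity threshold. All names and all numbers are hypothetical.

```python
import math


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def conformal_quantile(scores, confidence):
    """Score at rank ceil((n + 1) * confidence); infinite if n is too small."""
    n = len(scores)
    rank = math.ceil((n + 1) * confidence)
    if rank > n:
        return math.inf  # not enough calibration samples for this confidence level
    return sorted(scores)[rank - 1]


def interval(prediction, calibration, confidence):
    """Split-conformal interval from (features, y_true, y_pred) calibration triples."""
    scores = [abs(y_true - y_pred) for _, y_true, y_pred in calibration]
    q = conformal_quantile(scores, confidence)
    return prediction - q, prediction + q


def local_interval(x_new, prediction, calibration, confidence, threshold):
    """Same interval, calibrated only on samples similar to x_new."""
    region = [c for c in calibration if tanimoto(x_new, c[0]) >= threshold]
    return interval(prediction, region, confidence)


# Invented calibration triples: (feature set, measured value, model prediction).
calibration = [
    ({1, 2, 3}, 6.0, 6.1), ({1, 2, 4}, 5.5, 5.3),
    ({1, 3, 4, 5}, 7.0, 6.9), ({2, 3, 4}, 6.2, 6.5),
    ({7, 8, 9}, 4.0, 5.5), ({7, 8, 10}, 8.0, 6.0),
    ({8, 9, 10}, 5.0, 6.2), ({4, 7, 9}, 6.5, 5.0),
]
x_new, y_hat = {1, 2, 3, 4}, 6.0

global_lo, global_hi = interval(y_hat, calibration, confidence=0.75)
local_lo, local_hi = local_interval(x_new, y_hat, calibration, 0.75, threshold=0.5)
print(global_hi - global_lo, local_hi - local_lo)
```

In this toy data the query resembles only the well-predicted calibration samples, so the local interval is narrower than the global one. A query with no sufficiently similar calibration samples gets an infinite interval, which flags it as lying outside the region where the model can be trusted.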

    In silico characterisation of metagenomic alkane 1-monooxygenases

    Phylogenetic analysis, functional characterisation, and three-dimensional structural modelling were performed on putative alkane monooxygenases from a metagenomic library constructed from a moderately polluted sediment sample collected at a tanker berth station in Pula. Phylogenetic analysis with the MEGA software produced a phylogenetic tree representing a hypothesis of the evolutionary relationships between the proteins. Functional characterisation using the UniProt, InterPro, and CDD databases confirmed the similarity of the putative proteins to alkane 1-monooxygenases, while structural modelling with the SWISS-MODEL and I-TASSER web servers did not give significant results for any of the proteins. The failure of structural modelling was attributed to the absence of homologous proteins with an experimentally determined three-dimensional structure.

    cfDNA methylation in liquid biopsies as potential testicular seminoma biomarker

    No full text
    Background: Seminoma is a testicular tumor type, routinely diagnosed after orchidectomy. As cfDNA offers a minimally invasive route to seminoma patient management, this study aimed to investigate whether cfDNA methylation of six genes from liquid biopsies has potential as a novel seminoma biomarker. Materials & methods: cfDNA methylation from liquid biopsies was assessed by pyrosequencing and compared with healthy volunteers' samples. Results: Detailed analysis revealed specific CpGs as possible seminoma biomarkers, but receiver operating characteristic curve analysis showed modest diagnostic performance. In an analysis of panels of statistically significant CpGs, two DNA methylation panels emerged as potential seminoma screening panels, one in blood, CpG8/CpG9/CpG10 (KITLG), and the other in seminal plasma, CpG1 (MAGEC2)/CpG1 (OCT3/4). Conclusion: The presented data promote the development of liquid biopsy epigenetic biomarkers in the screening of seminoma patients.

    Bioprospecting for Genes Encoding Hydrocarbon-Degrading Enzymes from Metagenomic Samples Isolated from Northern Adriatic Sea Sediments

    No full text
    Three metagenomic libraries were constructed using surface sediment samples from the northern Adriatic Sea. Two of the samples were taken from a highly polluted and an unpolluted site, respectively. The third sample, from a polluted site, had been enriched using crude oil. The results of the metagenome analyses were incorporated in the REDPET relational database (http://redpet.bioinfo.pbf.hr/REDPET), which was generated using the previously developed MEGGASENSE platform. The database includes taxonomic data to allow the assessment of the biodiversity of metagenomic libraries and a general functional analysis of genes using hidden Markov model (HMM) profiles based on the KEGG database. A set of 22 specialised HMM profiles was developed to detect putative genes for hydrocarbon-degrading enzymes. Use of these profiles showed that the metagenomic library generated after selection on crude oil was enriched in genes for aerobic n-alkane degradation. The use of this system for bioprospecting was exemplified using potential alkB and almA genes from this library.

    TLR5 Variants Are Associated with the Risk for COPD and NSCLC Development, Better Overall Survival of the NSCLC Patients and Increased Chemosensitivity in the H1299 Cell Line

    No full text
    Chronic obstructive pulmonary disease (COPD) is considered the strongest independent risk factor for lung cancer (LC) development, suggesting an overlapping genetic background in both diseases. A common feature of both diseases is aberrant immunity in respiratory epithelia that is mainly regulated by Toll-like receptors (TLRs), key regulators of innate immunity. The function of the flagellin-sensing TLR5 in airway epithelia and in the pathophysiology of COPD and LC has remained elusive. We performed case–control genetic association and functional studies on the importance of TLR5 in COPD and LC development, comparing Caucasian COPD/LC patients (n = 974) and healthy donors (n = 1283). Association analysis of three single nucleotide polymorphisms (SNPs) (rs725084, rs2072493_N592S, and rs5744174_F616L) indicated that the minor allele of rs2072493_N592S is associated with increased risk for COPD (OR = 4.41, p < 0.0001) and NSCLC (OR = 5.17, p < 0.0001) development and with non-small cell LC risk in the presence of COPD (OR = 1.75, p = 0.0031). The presence of minor alleles (rs5744174 and rs725084) in a co-dominant model was associated with overall survival in squamous cell LC patients. Functional analysis indicated that overexpression of the rs2072493_N592S allele affected the activation of NF-κB and AP-1, which could be attributed to impaired phosphorylation of p38 and ERK. Overexpression of TLR5N592S was associated with increased chemosensitivity in the H1299 cell line. Finally, genome-wide transcriptomic analysis on WI-38 and H1299 cells overexpressing TLR5WT or TLR5N592S, respectively, indicated the existence of different transcription profiles affecting several cellular pathways potentially associated with a dysregulated immune response. Our results suggest that TLR5 could be recognized as a potential biomarker for COPD and LC development with functional relevance.