Bioprospecting for Genes Encoding Hydrocarbon-Degrading Enzymes from Metagenomic Samples Isolated from Northern Adriatic Sea Sediments
Three metagenomic libraries were constructed from surface sediment samples from the northern Adriatic Sea. Two of the samples were taken from a highly polluted site and an unpolluted site, respectively. The third sample, from a polluted site, had been enriched using crude oil. The results of the metagenome analyses were incorporated into the REDPET relational database (http://redpet.bioinfo.pbf.hr/REDPET), which was generated using the previously developed MEGGASENSE platform. The database includes taxonomic data, which allow the biodiversity of the metagenomic libraries to be assessed, and a general functional analysis of genes using hidden Markov model (HMM) profiles based on the KEGG database. A set of 22 specialised HMM profiles was developed to detect putative genes for hydrocarbon-degrading enzymes. Use of these profiles showed that the metagenomic library generated after selection on crude oil was enriched in genes for aerobic n-alkane degradation. The use of this system for bioprospecting was exemplified using potential alkB and almA genes from this library.
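Optimisation of methods and representations for predictive modelling of mechanisms of action and binding affinities of biologically active molecules (Optimiranje metoda i reprezentacija za prediktivno modeliranje mehanizama djelovanja i afiniteta vezanja biološki aktivnih molekula)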
The chemical space of compound scaffolds is vast and offers a large playground for lead drug discovery and repurposing. With the accumulation of experimental data over the years, together with the development of more complex statistical frameworks, screening of such elaborate chemical spaces is finally possible. There are several well-defined problem areas for drug screening efforts, the most popular being inhibition of the many protein targets in human cells that are associated with common diseases. Frequently targeted protein spaces include protein kinases, G protein-coupled receptors, and the targets of (non)selective serotonin reuptake inhibitors. Mutation and dysregulation in any of these three protein groups can result in hereditary disorders, tumors, and mental disorders. In contrast to the machine learning frameworks available for predicting direct physical interactions between compounds and protein targets, some kinds of chemical activity prediction, such as phytotoxic activity, are poorly represented or defined in the literature. In this work, publicly available data were collected on experimentally measured binding affinities of diverse compounds against one of the most popular target protein families, the protein kinases. This protein superfamily is one of the most important enzyme groups and regulates most key cellular processes, including cell metabolism, growth, and division. Protein kinases regulate biochemical cycles by transferring a high-energy phosphoryl group from adenosine triphosphate (ATP) to specific amino acid residues of target protein substrates. All members of this enzyme family share the highly conserved protein kinase (PK) domain, but the superfamily can be divided into several kinase groups according to the phosphorylation site and the activation mechanisms of individual members.
Given the specific characteristics of this protein group and of kinase inhibitors, it is important to investigate how each of these chemical and biological spaces affects model performance and how predictive performance can be improved. Separately, we examine a different subspace of biological activity, focusing mostly on synthetic compounds with determined phytotoxic or herbicidal activity. We define this as a multiclass classification problem using two predefined classification systems: the main one from the Herbicide Resistance Action Committee (HRAC) and a second one from the Weed Science Society of America (WSSA). Because no defined machine learning framework for modeling and predicting herbicidal activity was publicly available, we collected a representative data set and sought the computational approach that maximizes the accuracy of mode of action (MoA) prediction. Because phytotoxic compounds have mostly been classified by visual inspection of phenotypic changes in the affected weeds, there is a great need for an automated, systematic approach. Owing to the limited size of the collected data, which consist of molecular structures of known activity labeled with a MoA group, we tested several "shallow" learners. The panel of tested algorithms includes naive Bayes (NB), support vector machines (SVM), extreme gradient boosting (XGBoost), and random forest (RF). All approaches were trained with ten-times-repeated ten-fold (10x10-fold) cross-validation. The trained models were compared over all one hundred resamples using a non-frequentist approach, Bayesian analysis. For the first time in herbicide activity modeling, we have implemented a computational framework that runs from feature processing and selection through the training of several learners to a statistical comparison of their performance.
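The ten-times-repeated ten-fold resampling scheme can be sketched in a few lines. This is a minimal illustration in plain Python, not the code used in the work: the function name `repeated_kfold`, the seed, and the sample count of 250 are hypothetical, and the folds are not stratified by MoA class, which a real multiclass setup might want.

```python
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        # A fresh shuffle per repeat gives a different partition each time.
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(n_splits):
            # Every n_splits-th shuffled index goes to the held-out fold.
            test_idx = order[fold::n_splits]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield repeat, fold, train_idx, test_idx


# Hypothetical data set size; 10 repeats x 10 folds gives 100 resamples.
resamples = list(repeated_kfold(n_samples=250))
print(len(resamples))
```

Each learner would be fitted on `train_idx` and scored on `test_idx` for all one hundred resamples, so that every model is compared on the same set of splits.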
In contrast, the sheer size of the publicly available experimental data for protein kinase inhibitors allowed more complex techniques for modeling physical interactions between small-compound spaces and the human kinome, but it also made it harder to define and engineer the feature space for over 8000 compounds and the nuances of the protein kinase family. Both of the aforementioned methods are founded on quantitative structure-activity relationship (QSAR) modeling principles. Defining the applicability domain (AD) for a given problem is one of the pillars of QSAR modeling. However, defining the boundaries of the chemical space within which the model can make accurate predictions is not simple and depends on the nature of the trained model. When predicting general biological activity in the form of a phenotypic signal, as with herbicidal activity, the applicability domain can be defined simply in two-dimensional space from the structural similarity of the available molecules and a model output, such as the probability of belonging to a particular class. Predicting the physical interaction between two entities, such as compounds and protein targets, adds complexity that the conventional applicability domain cannot accommodate. In this case, we extend the standard applicability domain to include both entities and generate a quantitative estimate of prediction confidence using the conformal prediction framework. Conformal predictors can reliably estimate a prediction region based on the computed nonconformity of test samples. The disadvantage of this method is that nonconformity is defined in the label space of predefined calibration samples, so the estimates work well in general but are not specific to any tested compound-target pair and therefore fail for samples that are not already represented in the training set.
Combining concepts from both frameworks, we dynamically define similarity-based applicability domains, or conformity regions, for each new sample and then calculate nonconformity scores; we refer to this approach as the dynamic applicability domain (dAD). The dAD approach was shown to produce tighter prediction regions than the original conformal predictor algorithm. More importantly, and complementary to the prediction regions, dAD achieves lower error rates at any confidence level in the harder, more realistic test scenarios of discovery (S2) and repurposing (S3). Furthermore, merging the concept of the applicability domain with conformal predictors corrects for existing bottlenecks in the traditional applicability domain definition and allows model behavior to be evaluated in an abstract interaction space between any number of interacting entities. It is therefore a valuable and informative approach for validating data quality in subregions of interaction space specific to biomolecular complexes.
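The difference between calibrating nonconformity on all calibration samples and on a neighbourhood of each new sample can be illustrated with a small split-conformal regression sketch. It is an illustration under stated assumptions, not the dAD algorithm itself: the nonconformity score is taken to be the absolute residual, the neighbourhood is simply the `k` most similar calibration samples, and the function names, similarity values, and residuals are all hypothetical. Whether the usual coverage guarantee survives such a neighbourhood choice is not addressed here.

```python
import math


def conformal_halfwidth(scores, confidence=0.9):
    """Half-width of a split-conformal interval from calibration scores."""
    ordered = sorted(scores)
    n = len(ordered)
    # Finite-sample rank used in split conformal prediction.
    rank = math.ceil((n + 1) * confidence)
    if rank > n:
        return math.inf  # too few calibration samples for this confidence
    return ordered[rank - 1]


def local_halfwidth(similarities, scores, k, confidence=0.9):
    """Calibrate only on the k calibration samples most similar to the query."""
    nearest = sorted(range(len(scores)),
                     key=lambda i: similarities[i], reverse=True)[:k]
    return conformal_halfwidth([scores[i] for i in nearest], confidence)


# Hypothetical absolute residuals |y - y_hat| on ten calibration samples,
# and the similarity of one new compound-target pair to each of them.
residuals = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
similarity = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]

global_width = conformal_halfwidth(residuals, confidence=0.8)
local_width = local_halfwidth(similarity, residuals, k=5, confidence=0.8)
print(global_width, local_width)
```

In this toy data the most similar calibration samples happen to have the smallest residuals, so the locally calibrated interval is narrower than the global one; with other data it could equally come out wider.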
In silico characterisation of metagenomic alkane 1-monooxygenases
Phylogenetic analysis, functional characterization, and three-dimensional structural modeling were performed on putative alkane monooxygenases from a metagenomic library constructed from a moderately polluted sediment sample collected at a tanker berth in Pula. Phylogenetic analysis with the MEGA software produced a phylogenetic tree representing a hypothesis of the evolutionary relationships between the proteins. Functional characterization using the UniProt, InterPro, and CDD search engines confirmed that the putative proteins are similar to alkane 1-monooxygenases, while structural modeling with the SWISS-MODEL and I-TASSER servers did not give significant results for any of the proteins. The failure of structural modeling was attributed to the absence of homologous proteins with experimentally determined three-dimensional structures.
cfDNA methylation in liquid biopsies as potential testicular seminoma biomarker
Background: Seminoma is a testicular tumor type that is routinely diagnosed after orchidectomy. As cfDNA offers a minimally invasive approach to seminoma patient management, this study aimed to investigate whether cfDNA methylation of six genes in liquid biopsies has potential as a novel seminoma biomarker.
Materials & methods: cfDNA methylation in liquid biopsies was assessed by pyrosequencing and compared with samples from healthy volunteers.
Results: Detailed analysis revealed specific CpGs as possible seminoma biomarkers, but receiver operating characteristic curve analysis showed modest diagnostic performance. In an analysis of panels of statistically significant CpGs, two DNA methylation panels emerged as potential seminoma screening panels: one in blood, CpG8/CpG9/CpG10 (KITLG), and the other in seminal plasma, CpG1 (MAGEC2)/CpG1 (OCT3/4).
Conclusion: The presented data promote the development of liquid biopsy epigenetic biomarkers in the screening of seminoma patients.
Bioprospecting for Genes Encoding Hydrocarbon-Degrading Enzymes from Metagenomic Samples Isolated from Northern Adriatic Sea Sediments
Three metagenomic libraries were constructed using surface sediment samples from the northern Adriatic Sea. Two of the samples were taken from a highly polluted and an unpolluted site, respectively. The third sample, from a polluted site, had been enriched using crude oil. The results of the metagenome analyses were incorporated in the REDPET relational database (http://redpet.bioinfo.pbf.hr/REDPET), which was generated using the previously developed MEGGASENSE platform. The database includes taxonomic data to allow the assessment of the biodiversity of metagenomic libraries and a general functional analysis of genes using hidden Markov model (HMM) profiles based on the KEGG database. A set of 22 specialised HMM profiles was developed to detect putative genes for hydrocarbon-degrading enzymes. Use of these profiles showed that the metagenomic library generated after selection on crude oil was enriched in genes for aerobic n-alkane degradation. The use of this system for bioprospecting was exemplified using potential alkB and almA genes from this library.
TLR5 Variants Are Associated with the Risk for COPD and NSCLC Development, Better Overall Survival of the NSCLC Patients and Increased Chemosensitivity in the H1299 Cell Line
Chronic obstructive pulmonary disease (COPD) is considered the strongest independent risk factor for lung cancer (LC) development, suggesting an overlapping genetic background in both diseases. A common feature of both diseases is aberrant immunity in respiratory epithelia that is mainly regulated by Toll-like receptors (TLRs), key regulators of innate immunity. The function of the flagellin-sensing TLR5 in airway epithelia and in the pathophysiology of COPD and LC has remained elusive. We performed case–control genetic association and functional studies on the importance of TLR5 in COPD and LC development, comparing Caucasian COPD/LC patients (n = 974) and healthy donors (n = 1283). Association analysis of three single nucleotide polymorphisms (SNPs) (rs725084, rs2072493_N592S, and rs5744174_F616L) indicated that the minor allele of rs2072493_N592S is associated with increased risk for COPD (OR = 4.41, p < 0.0001) and non-small cell LC (NSCLC) (OR = 5.17, p < 0.0001) development, and with NSCLC risk in the presence of COPD (OR = 1.75, p = 0.0031). The presence of minor alleles (rs5744174 and rs725084) in a co-dominant model was associated with overall survival in squamous cell LC patients. Functional analysis indicated that overexpression of the rs2072493_N592S allele affected the activation of NF-κB and AP-1, which could be attributed to impaired phosphorylation of p38 and ERK. Overexpression of TLR5N592S was associated with increased chemosensitivity in the H1299 cell line. Finally, genome-wide transcriptomic analysis on WI-38 and H1299 cells overexpressing TLR5WT or TLR5N592S, respectively, indicated the existence of different transcription profiles affecting several cellular pathways potentially associated with a dysregulated immune response. Our results suggest that TLR5 could be recognized as a potential biomarker for COPD and LC development with functional relevance.