21 research outputs found

    Computational Comparative Study of Tuberculosis Proteomes Using a Model Learned from Signal Peptide Structures

    Secretome analysis is important in pathogen studies. A fundamental and convenient way to identify secreted proteins is to first predict signal peptides, which are essential for protein secretion. However, signal peptides are highly complex functional sequences that are easily confused with transmembrane domains. Such confusion would obviously affect the discovery of secreted proteins. Transmembrane proteins are important drug targets, but very few transmembrane protein structures have been determined experimentally; hence, prediction of the structures is essential. In the field of structure prediction, researchers do not make assumptions about organisms, so there is a need for a general signal peptide predictor

    Protein structural phylogeny, a missing chapter in molecular evolutionary biology

    No full text

    Lipid exposure prediction enhances the inference of rotational preferences of transmembrane helices

    No full text
    Sponsorship: Institute of Information Science. Note: submitted (in press); [SCI]; peer-reviewed; representative work. Note: http://gateway.isiknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=Drexel&SrcApp=hagerty_opac&KeyRecord=0887-3585&DestApp=JCR&RQ=IF_CAT_BOXPLO

    Computational comparative study of tuberculosis proteomes using a model learned from signal peptide structures

    No full text
    Sponsorship: Institute of Information Science. Note: published; [SCI]; peer-reviewed; representative work. Note: http://gateway.isiknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=Drexel&SrcApp=hagerty_opac&KeyRecord=1932-6203&DestApp=JCR&RQ=IF_CAT_BOXPLO

    Graph-Based Clustering Approaches for Gene Network Reconstruction

    No full text
    To understand the regulatory relationships between genes, biologists often use RNA interference (RNAi) or gene knockout to observe how a biological system responds. Informaticians, in turn, try to reconstruct regulatory relationships between genes from time-course mRNA expression profiles using algorithms or mathematical models. Gene regulation involves several phases, including transcription, post-transcriptional modification, translation, mRNA degradation, and post-translational modification. Each phase takes time to complete, so many studies analyze regulation on the basis of time-delay features. In this study, we use two methods to reconstruct gene networks. One is Normalized Cuts, a graph-partitioning algorithm that separates genes into functional regulatory networks. The other is PARE (Pattern Recognition Approach), an algorithm that infers regulation between genes from time-lagged, non-linear features of the expression profiles. We use yeast time-course mRNA expression data to reconstruct gene regulatory networks and check the results against the KEGG pathway database, the BIOGRID interaction database, and the MIPS database. In terms of F score, our method performs better than the dynamic Bayesian network developed by Kim et al. Finally, we apply our method to a real case: microarray data from yeast in which both yox1 and yhp1 are deleted, analyzing its time-course mRNA expression. Although the mechanisms governing transitions between cell cycle phases are not fully understood, yox1 and yhp1 are known to control the duration of G1 phase through negative feedback. We successfully find regulatory networks associated with the cell cycle, one of which is associated with cell division. With these results, we hope to decipher more of the regulatory mechanisms behind phase transitions in the cell cycle.

    Contents
    Front matter: Oral examination committee certification; Acknowledgements; Abstract (Chinese); Abstract (English); Table of contents; List of figures; List of tables
    Chapter 1 Introduction
    1.1 Research background
    1.2 Research motivation
    1.3 Research objectives
    Chapter 2 Literature review
    2.1 Algorithms and models for gene network reconstruction
    2.2 Literature on time delay
    Chapter 3 Methods and materials
    3.1 Network reconstruction using graph theory
    3.2 Normalized Cuts
    3.3 PARE (Pattern Recognition Approach)
    3.4 Algorithm workflow
    3.5 Biological validation and application materials
    3.5.1 Biological validation comparison data
    3.5.2 Biological application data
    Chapter 4 Results and discussion
    4.1 Results on the biological validation comparison data
    4.1.1 Dynamic Bayesian reconstruction of the cell metabolism network compared against BIOGRID
    4.1.2 Dynamic Bayesian reconstruction of the cell cycle network compared against BIOGRID
    4.1.3 Graph clustering reconstruction of the cell metabolism network compared against BIOGRID
    4.1.4 Graph clustering reconstruction of the cell cycle network compared against BIOGRID
    4.1.5 Dynamic Bayesian reconstruction of the cell metabolism network compared against MIPS
    4.1.6 Dynamic Bayesian reconstruction of the cell cycle network compared against MIPS
    4.1.7 Graph clustering reconstruction of the cell metabolism network compared against MIPS
    4.1.8 Graph clustering reconstruction of the cell cycle network compared against MIPS
    4.1.9 Both methods' reconstructions of the cell metabolism network compared against KEGG
    4.1.10 Both methods' reconstructions of the cell cycle network compared against KEGG
    4.2 Results on the biological application data
    4.3 Discussion of the biological application data
    Chapter 5 Conclusions and outlook
    References
    Appendix
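The abstract of this entry compares methods by F score against reference databases. As a rough illustration only, the sketch below scores a set of predicted regulatory edges against a reference edge set, assuming the usual F score (harmonic mean of precision and recall); the thesis may define or weight it differently. The function name `f_score` and the toy edges are hypothetical and not taken from the thesis.

```python
def f_score(predicted, reference):
    """Harmonic mean of precision and recall for predicted edges.

    Assumption: edges are hashable (regulator, target) pairs and the
    reference set plays the role of a database such as KEGG or BIOGRID.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: gene names are placeholders, not real results.
predicted_edges = {("A", "B"), ("B", "C"), ("C", "D")}
reference_edges = {("A", "B"), ("C", "D"), ("D", "E"), ("E", "A")}
score = f_score(predicted_edges, reference_edges)
```

Here two of the three predicted edges appear in the reference set of four, so precision is 2/3 and recall is 1/2.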

    Evolutionary model of protein secondary structure capable of revealing new biological relationships

    No full text
    Ancestral sequence reconstruction has had recent success in decoding the origins and the determinants of complex protein functions. However, phylogenetic analyses of remote homologues must handle extreme amino-acid sequence diversity resulting from extended periods of evolutionary change. We exploited the wealth of protein structures to develop an evolutionary model based on protein secondary structure. The approach follows the differences between discrete secondary structure states observed in modern proteins and those hypothesised in their immediate ancestors. We implemented maximum likelihood-based phylogenetic inference to reconstruct ancestral secondary structure. The predictive accuracy from the use of the evolutionary model surpasses that of comparative modelling and sequence-based prediction; the reconstruction extracts information not available from modern structures or the ancestral sequences alone. Based on a phylogenetic analysis of a sequence-diverse protein family, we showed that the model can highlight relationships that are evolutionarily rooted in structure and not evident in amino acid-based analysis

    Lipid exposure prediction enhances the inference of rotational angles of transmembrane helices

    No full text
    Background: Since membrane protein structures are challenging to crystallize, computational approaches are essential for elucidating the sequence-to-structure relationships. Structural modeling of membrane proteins requires a multidimensional approach, and one critical geometric parameter is the rotational angle of transmembrane helices. Rotational angles of transmembrane helices are characterized by their folded structures and could be inferred by the hydrophobic moment; however, the folding mechanism of membrane proteins is not yet fully understood. The rotational angle of a transmembrane helix is related to the exposed surface of a transmembrane helix, since lipid exposure gives the degree of accessibility of each residue in the lipid environment. To the best of our knowledge, there have been few advances in investigating whether an environment descriptor of lipid exposure could infer a geometric parameter of rotational angle. Results: Here, we present an analysis of the relationship between rotational angles and lipid exposure and a support-vector-machine method, called TMexpo, for predicting both structural features from sequences. First, we observed from the development set of 89 protein chains that the lipid exposure, i.e., the relative accessible surface area (rASA) of residues in the lipid environment, generated from high-resolution protein structures could infer the rotational angles with a mean absolute angular error (MAAE) of 46.32°. More importantly, the predicted rASA from TMexpo achieved an MAAE of 51.05°, which is better than the 71.47° obtained by the best of the compared hydrophobicity scales. Lastly, TMexpo outperformed the compared methods in rASA prediction on the independent test set of 21 protein chains and achieved an overall Matthews correlation coefficient, accuracy, sensitivity, specificity, and precision of 0.51, 75.26%, 81.30%, 69.15%, and 72.73%, respectively. TMexpo is publicly available at http://bio-cluster.iis.sinica.edu.tw/TMexpo. Conclusions: TMexpo can better predict rASA and rotational angles than the compared methods. When rotational angles can be accurately predicted, free modeling of transmembrane protein structures in turn may benefit from a reduced complexity in ensembles with significantly fewer packing arrangements. Furthermore, sequence-based prediction of both rotational angle and lipid exposure can provide essential information when high-resolution structures are unavailable and contribute to experimental design to elucidate transmembrane protein functions
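The evaluation measures named in this abstract can be sketched as follows. This is a minimal illustration, not the TMexpo implementation: it assumes angular errors wrap around at 360° (so 350° versus 10° counts as a 20° error) and uses the standard confusion-matrix definitions of the binary metrics; the paper's exact conventions may differ. All function names and the toy inputs are hypothetical.

```python
import math


def mean_absolute_angular_error(predicted_deg, observed_deg):
    """Mean absolute angular error in degrees, assuming 360-degree wrap-around."""
    errors = []
    for p, o in zip(predicted_deg, observed_deg):
        diff = abs(p - o) % 360.0
        errors.append(min(diff, 360.0 - diff))  # shorter way around the circle
    return sum(errors) / len(errors)


def binary_metrics(tp, tn, fp, fn):
    """Standard metrics from confusion-matrix counts (all denominators assumed nonzero)."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Hypothetical toy inputs, unrelated to the reported results.
maae = mean_absolute_angular_error([350.0, 10.0, 90.0], [10.0, 350.0, 45.0])
metrics = binary_metrics(tp=40, tn=30, fp=10, fn=20)
```

The wrap-around step matters because a naive absolute difference would score 350° against 10° as a 340° error.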

    Specificities (%) of various predictors on the non-signal peptide protein benchmark datasets.

    No full text

    Sensitivities (%) of various predictors on the signal peptide protein benchmark datasets.

    No full text

    List of unique proteins and their similar non-signal peptide proteins.

    No full text