Search CORE

464 research outputs found

Utilizing Protein Structure to Identify Non-Random Somatic Mutations

Author: Cheng Yuwei
Cheung Kei-Hoi
Modis Yorgo
Ryslik Gregory
Zhao Hongyu
Publication venue: Springer Science and Business Media LLC
Publication date: 27/02/2013

Motivation: Human cancer is caused by the accumulation of somatic mutations in tumor suppressors and oncogenes within the genome. In the case of oncogenes, recent theory suggests that only a few key "driver" mutations are responsible for tumorigenesis. Because drugs that treat cancers carrying these driver mutations have had significant pharmacological success, several methods that rely on mutational clustering have been developed to identify them. However, these methods treat proteins as a single linear strand and do not take their spatial structures into account. We propose a new methodology that incorporates protein tertiary structure in order to increase statistical power when identifying mutational clustering. Results: We have developed a novel algorithm, iPAC (identification of Protein Amino acid Clustering), which identifies non-random somatic mutations in proteins while taking the three-dimensional protein structure into account. By using the tertiary information, we detect both novel clusters in proteins already known to exhibit mutational clustering and clusters in proteins that show no evidence of clustering under existing methods. For example, by combining data from the Protein Data Bank (PDB) and the Catalogue of Somatic Mutations in Cancer (COSMIC), our algorithm identifies new mutational clusters in well-known cancer proteins such as KRAS and PIK3CA. Further, by utilizing the tertiary structure, our algorithm also identifies clusters in EGFR, EIF2AK2, and other proteins that are not identified by current methodology.

arXiv.org e-Print Archive

Springer - Publisher Connector
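
The iPAC abstract describes bringing tertiary structure into clustering methods that otherwise treat a protein as a linear strand. One way to do that is to project residue coordinates onto a single axis and reorder the residues along it before running a one-dimensional clustering test. The sketch below assumes classical multidimensional scaling (MDS) for the projection and uses hypothetical coordinates in place of Cα positions read from a PDB file; it illustrates the idea and is not iPAC's implementation.

```python
import numpy as np

def mds_1d_order(coords: np.ndarray) -> np.ndarray:
    """Return residue indices ordered along a 1D classical-MDS embedding.

    coords: (n, 3) array of hypothetical C-alpha coordinates, one row per residue.
    Sketch only: the choice of classical MDS is an assumption for illustration.
    """
    # Pairwise squared Euclidean distances between residues.
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    n = d2.shape[0]
    # Double-centre the squared-distance matrix: B = -1/2 * J D^2 J.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ d2 @ j
    # eigh returns eigenvalues in ascending order; the last eigenvector
    # gives the best one-dimensional embedding.
    eigvals, eigvecs = np.linalg.eigh(b)
    axis = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
    # New residue order: sort residues by their 1D coordinate
    # (the overall direction is arbitrary, so the order may come out reversed).
    return np.argsort(axis)
```

For example, with residues at x = 0, 4, 8, 12 and 1 Å, residue 4 lands next to residue 0 in the new ordering even though they are far apart in sequence, so a linear clustering statistic applied to the reordered residues would treat mutations at those two positions as neighbours.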

Semantic Web for data harmonization in Chinese medicine

Author: Chen Huajun
Cheung Kei-Hoi
Publication venue: BioMed Central
Publication date: 01/01/2010

Scientific studies investigating Chinese medicine alongside Western medicine have generated a large amount of data, which should preferably be shared under a global data standard. This article provides an overview of the Semantic Web and identifies some representative Semantic Web applications in Chinese medicine. The Semantic Web is proposed as a standard for representing Chinese medicine data and facilitating their integration with Western medicine data.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

On the Stability of the Implicit Prices of Housing Attributes: A Dynamic Theory and Some Evidence

Author: Charles Ka Yui Leung
Kelvin Siu Kei Wong
Patrick Wai Yin Cheung

Given the dramatic fluctuations in aggregate housing prices, this paper attempts to examine whether the implicit prices of different housing attributes are “stable.” Theoretically, this paper provides perhaps the first dynamic, general equilibrium model in which housing attributes’ implicit prices fluctuate. Empirically, this paper models the time paths of different implicit prices as auto-regressive processes by employing a hedonic pricing model on a large set of housing transaction data over a relatively long period of time. An endogenous structural break test is then performed. Except for a few attributes, structural breaks are not detected. Directions for future research are discussed.

Keywords: hedonic pricing; structural break; evolution of valuation; housing attributes

Research Papers in Economics
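
The housing paper models implicit prices as auto-regressive processes and then runs an endogenous structural break test, meaning the break date is estimated from the data rather than fixed in advance. A minimal sketch of the break-search step follows: it fits an AR(1) by least squares on each side of every admissible break date and keeps the date that minimizes the combined sum of squared residuals. The AR(1) form and the 15% trimming fraction are assumptions for illustration; this is not the specific test used in the paper, and a real test would compare the resulting statistic against appropriate critical values.

```python
import numpy as np

def ar1_ssr(y: np.ndarray) -> float:
    """Sum of squared residuals from an OLS fit of y_t = c + phi * y_{t-1} + e_t."""
    x = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    beta, *_ = np.linalg.lstsq(x, y[1:], rcond=None)
    resid = y[1:] - x @ beta
    return float(resid @ resid)

def find_break(y: np.ndarray, trim: float = 0.15) -> int:
    """Return the break index k minimising SSR(y[:k]) + SSR(y[k:]).

    Sketch only: `trim` (an assumed value) keeps candidate breaks away from the
    ends of the sample so each segment has enough observations to fit.
    """
    n = len(y)
    lo, hi = int(n * trim), int(n * (1 - trim))
    best_k, best_ssr = lo, np.inf
    for k in range(max(lo, 3), min(hi, n - 3) + 1):
        ssr = ar1_ssr(y[:k]) + ar1_ssr(y[k:])
        if ssr < best_ssr:
            best_k, best_ssr = k, ssr
    return best_k
```

Because each segment's first observation has no lag inside that segment, the estimated date for a level shift at period 20 can come out as 19 or 20.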

A Spatial Simulation Approach to Account for Protein Structure When Identifying Non-Random Somatic Mutations

Author: Bjornson Robert
Cheng Yuwei
Cheung Kei-Hoi
Modis Yorgo
Ryslik Gregory
Zelterman Daniel
Zhao Hongyu
Publication date: 28/10/2013

Background: Current research suggests that a small set of "driver" mutations is responsible for tumorigenesis, while a larger body of "passenger" mutations occurs in the tumor but does not drive disease progression. Due to recent pharmacological successes in treating cancers caused by driver mutations, a variety of methodologies that attempt to identify such mutations have been developed. Based on the hypothesis that driver mutations tend to cluster in key regions of the protein, the development of cluster identification algorithms has become critical. Results: We have developed a novel methodology, SpacePAC (Spatial Protein Amino acid Clustering), which identifies mutational clustering by considering the protein tertiary structure directly in 3D space. By combining the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC) and the spatial information in the Protein Data Bank (PDB), SpacePAC identifies novel mutation clusters in many proteins, such as FGFR3 and CHRM2. In addition, SpacePAC better localizes the most significant mutational hotspots, as demonstrated in the cases of BRAF and ALK. The R package is available on Bioconductor at: http://www.bioconductor.org/packages/release/bioc/html/SpacePAC.html Conclusion: SpacePAC adds a valuable tool for identifying mutational clusters while considering protein tertiary structure.

Comment: 16 pages, 8 figures, 4 tables

arXiv.org e-Print Archive

Springer - Publisher Connector
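
SpacePAC is described as finding clusters directly in 3D space by combining COSMIC mutation positions with PDB coordinates. The sketch below shows the general shape of a simulation-based spatial test under our own assumptions: the statistic is the largest number of mutations inside a sphere of a chosen radius centred on any residue, and the null distribution comes from scattering the same number of mutations uniformly at random over the residues. The statistic, radius and data are hypothetical; this is not SpacePAC's statistic or its R interface.

```python
import numpy as np

def neighbourhood_mask(coords: np.ndarray, radius: float) -> np.ndarray:
    """(n, n) 0/1 matrix: entry [i, j] is 1 if residue j lies within `radius` of residue i."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return (d <= radius).astype(int)

def max_sphere_count(mask: np.ndarray, counts: np.ndarray) -> int:
    """Largest number of mutations inside any residue-centred sphere."""
    return int((mask @ counts).max())

def spatial_cluster_pvalue(coords, counts, radius, n_sim=2000, seed=0):
    """Observed statistic and Monte Carlo p-value (sketch; statistic is an assumption)."""
    rng = np.random.default_rng(seed)
    mask = neighbourhood_mask(coords, radius)
    observed = max_sphere_count(mask, counts)
    n_res, n_mut = len(coords), int(counts.sum())
    exceed = 0
    for _ in range(n_sim):
        # Null model: drop the same number of mutations on residues uniformly at random.
        sim = np.bincount(rng.integers(0, n_res, n_mut), minlength=n_res)
        if max_sphere_count(mask, sim) >= observed:
            exceed += 1
    # The +1 terms count the observed data as one draw, so p is never exactly 0.
    return observed, (exceed + 1) / (n_sim + 1)
```

With 20 hypothetical residues spaced 3.8 Å apart and 10 mutations split between two adjacent residues, a 4 Å sphere captures all 10, which random placement essentially never reproduces.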

Handling multiple testing while interpreting microarrays with the Gene Ontology Database

Author: Cheung Kei-Hoi
Osier Michael V
Zhao Hongyu
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: The development of software tools that analyze microarray data in the context of genetic knowledge bases is being pursued by multiple research groups using different methods. A common problem for many of these tools is how to correct for multiple statistical testing, since simple corrections are overly conservative and more sophisticated corrections are currently impractical. A careful study of the distribution one would expect by chance, for example through simulation, may guide the development of an appropriate correction that is not overly time-consuming to compute. RESULTS: We present the results of a preliminary study of the distribution one would expect when analyzing sets of genes extracted from the Drosophila, S. cerevisiae, Wormbase, and Gramene databases using the Gene Ontology Database. CONCLUSIONS: We found that the estimated distribution is not regular and is not predictable outside of a particular set of genes. Permutation-based simulations may be necessary to determine the confidence in the results of such analyses.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
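
The GO study concludes that permutation-based simulations may be needed to judge how significant an enrichment result really is. A minimal sketch of that idea: score each GO term with a one-sided hypergeometric p-value, then build the null distribution of the smallest p-value across all terms by repeatedly drawing random gene sets of the same size, and use its lower quantile as a family-wise threshold. The min-p construction, the gene names and the term identifiers ("GO:A", "GO:B") are hypothetical illustrations, not the procedure or data used in the paper.

```python
import math
import random

def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = math.comb(population, draws)
    top = min(successes, draws)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    return sum(math.comb(successes, i) * math.comb(population - successes, draws - i)
               for i in range(k, top + 1)) / total

def term_pvalues(gene_set: set, annotations: dict, universe: set) -> dict:
    """One-sided enrichment p-value for every (hypothetical) GO term in `annotations`."""
    pvals = {}
    for term, genes in annotations.items():
        hits = len(gene_set & genes)
        pvals[term] = hypergeom_sf(hits, len(universe), len(genes), len(gene_set))
    return pvals

def permutation_threshold(annotations: dict, universe: set, set_size: int,
                          n_perm: int = 1000, alpha: float = 0.05, seed: int = 0) -> float:
    """Approximate alpha-quantile of the minimum p-value over terms under random gene sets."""
    rng = random.Random(seed)
    pool = sorted(universe)
    min_ps = []
    for _ in range(n_perm):
        random_set = set(rng.sample(pool, set_size))
        min_ps.append(min(term_pvalues(random_set, annotations, universe).values()))
    min_ps.sort()
    return min_ps[int(alpha * n_perm)]
```

A term would then be reported only if its observed p-value falls below the permutation threshold, which adapts to the actual gene set sizes and annotation overlap instead of assuming a regular distribution.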

Analysis of Cancer Omics Data In A Semantic Web Framework

Author: James P. McCusker
Kei Cheung
Matthew Holford
Michael Krauthammer
Publication date: 07/12/2010

Our work concerns the elucidation of the cancer (epi)genome, transcriptome and proteome to better understand the complex interplay between a cancer cell's molecular state and its response to anti-cancer therapy. To study the problem, we have previously focused on data warehousing technologies and statistical data integration. In this paper, we present recent work on extending our analytical capabilities using Semantic Web technology. A key new component presented here is a SPARQL endpoint to our existing data warehouse. This endpoint allows observed quantitative data to be merged with existing data from semantic knowledge sources such as the Gene Ontology (GO). We show how such variegated quantitative and functional data can be integrated and accessed in a universal manner using Semantic Web tools. We also demonstrate how Description Logic (DL) reasoning can be used to infer previously unstated conclusions from existing knowledge bases. As a proof of concept, we illustrate the ability of our setup to answer complex queries on the resistance of cancer cells to Decitabine, a demethylating agent.

arXiv.org e-Print Archive

Crossref

Nature Precedings
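
The abstract mentions using Description Logic (DL) reasoning to infer conclusions that are not stated explicitly in the knowledge base. Full DL reasoners support far richer constructs, but the simplest instance of the idea is RDFS-style subclass inference: if an entity is typed with a specific class and that class is a subclass of a broader one, the broader type follows. The sketch below applies those two rules to a fixed point over plain Python triples; the rule set is a deliberate simplification and the class and gene names are hypothetical, not the reasoner or ontology used in the paper.

```python
Triple = tuple[str, str, str]

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def infer(triples: set[Triple]) -> set[Triple]:
    """Apply subclass transitivity and type inheritance until nothing new is derived.

    Sketch only: a tiny RDFS-style subset, not a Description Logic reasoner.
    """
    closed = set(triples)
    while True:
        new = set()
        for (a, p1, b) in closed:
            for (c, p2, d) in closed:
                if b != c or p2 != SUBCLASS:
                    continue
                if p1 == SUBCLASS:
                    # A subClassOf B, B subClassOf D  =>  A subClassOf D
                    new.add((a, SUBCLASS, d))
                elif p1 == TYPE:
                    # x type B, B subClassOf D  =>  x type D
                    new.add((a, TYPE, d))
        if new <= closed:
            return closed
        closed |= new
```

Given that a hypothetical `ex:geneX` is an `ex:ApoptosisEffector`, and that class sits under `ex:ApoptosisRelatedGene`, which sits under `ex:Gene`, the closure adds the unstated fact that `ex:geneX` is an `ex:Gene`.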

An urban heat island study for building and urban design

Author: Cheung Kei Wang
Wang Yong
Publication date: 01/01/2011

A great deal of research has been conducted over the past decades on the urban heat island (UHI) effect around the world. Nevertheless, the UHI effect has not been included in the weather data that building services engineers use to design buildings and size their heating and cooling plants. This research investigated the UHI effect in Greater Manchester by setting up fixed-point monitoring stations across the city, with the Woodford Met Office ground observation station selected as the rural reference point. A multiple regression model was developed to incorporate the heat island effect into the Manchester weather data for engineering use. The urban heat island intensity (the difference between urban and rural area temperatures) was found to reach as high as 8°C in summer and 10°C in winter in Manchester. Temperature data from clear, calm nights (when the heat island effect is at its maximum) were used to find the relationship between UHI intensity and sky view factor (SVF), distance from the city centre, evapotranspiration fraction (EF), wind speed, cloud cover and rural reference temperature. Results indicate that all of these factors have a negative linear relationship with UHI intensity. All measured data were fed into a statistical software package to create general linear regression models. Validation showed that these models predict the average UHI effect with good accuracy, but are less accurate for the maximum heat island peaks. However, an analytical model based on energy balance equations was developed to predict the maximum heat island effect. Validation shows good predictions for summer but poorer ones for winter, probably because the average UHI intensity is lower in winter than in summer.

EThOS - Electronic Theses Online Service, GB, United Kingdom

OpenGrey Repository
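
The thesis builds linear regression models that relate UHI intensity to sky view factor, distance from the city centre, evapotranspiration fraction, wind speed, cloud cover and rural reference temperature. A minimal ordinary-least-squares sketch of that kind of model follows. The predictor names and column order are our own labels, and the data in the usage check are synthetic; none of this reproduces the thesis's measurements or fitted coefficients.

```python
import numpy as np

# Hypothetical column order for the predictor matrix.
PREDICTORS = ["svf", "distance_km", "evap_fraction", "wind_speed", "cloud_cover", "rural_temp"]

def fit_uhi_model(x: np.ndarray, uhi: np.ndarray) -> np.ndarray:
    """OLS fit of UHI intensity on the predictors; returns [intercept, b1, ..., b6]."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, uhi, rcond=None)
    return coef

def predict_uhi(coef: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Predicted UHI intensity (°C) for new rows of predictors."""
    return coef[0] + x @ coef[1:]
```

Fitting noise-free synthetic data generated with known negative slopes recovers those slopes exactly; with real monitoring data, the residuals are what a validation step would examine, and the sign of each fitted slope is what a statement such as "a negative linear relationship" refers to.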

A semantic web framework to integrate cancer omics data with biological knowledge

Author: James P McCusker
Kei-Hoi Cheung
Matthew E Holford
Michael Krauthammer
Publication venue: Springer Nature
Publication date: 25/01/2012

BACKGROUND: The RDF triple provides a simple linguistic means of describing limitless types of information. Triples can be flexibly combined into a unified data source we call a semantic model. Semantic models open new possibilities for the integration of variegated biological data. We use Semantic Web technology to explicate high-throughput clinical data in the context of fundamental biological knowledge. We have extended Corvus, a data warehouse which provides a uniform interface to various forms of Omics data, by providing a SPARQL endpoint. With the querying and reasoning tools made possible by the Semantic Web, we were able to explore quantitative semantic models retrieved from Corvus in the light of systematic biological knowledge. RESULTS: For this paper, we merged semantic models containing genomic, transcriptomic and epigenomic data from melanoma samples with two semantic models of functional data: one containing Gene Ontology (GO) data, the other containing regulatory networks constructed from transcription factor binding information. These two semantic models were created in an ad hoc manner but support a common interface for integration with the quantitative semantic models. Such combined semantic models allow us to pose significant translational medicine questions. Here, we study the interplay between a cell's molecular state and its response to anti-cancer therapy by exploring the resistance of cancer cells to Decitabine, a demethylating agent. CONCLUSIONS: We were able to generate a testable hypothesis to explain how Decitabine fights cancer, namely that it targets apoptosis-related gene promoters predominantly in Decitabine-sensitive cell lines, thereby exerting its cytotoxic effect by activating the apoptosis pathway. Our research provides a framework whereby similar hypotheses can be developed easily.

Springer - Publisher Connector

PubMed Central
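
The abstract describes merging semantic models (sets of RDF triples) from different sources and then querying the combined model. The sketch below imitates that with plain Python tuples: a hypothetical quantitative model (promoter methylation per cell line) and a hypothetical functional model (regulation and GO-style annotation) are merged by set union and queried with a small basic-graph-pattern matcher standing in for a SPARQL SELECT. All resource and predicate names are made up; a real system would use an RDF store behind a SPARQL endpoint rather than this matcher.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]

def match(pattern: list[Triple], triples: set[Triple],
          binding: Optional[dict[str, str]] = None) -> Iterator[dict[str, str]]:
    """Yield variable bindings (terms starting with '?') that satisfy every pattern triple.

    Sketch only: a naive join over a set of triples, not a SPARQL engine.
    """
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind a fresh variable, or check consistency with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)
```

Merging the two hypothetical models with `quant | func` and matching three patterns (promoter hypomethylated in a cell line, promoter regulates gene, gene annotated with apoptosis) returns only the promoters whose target genes carry the apoptosis annotation, which is the shape of question the abstract poses about Decitabine.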

A Graph Theoretic Approach to Utilizing Protein Structure to Identify Non-Random Somatic Mutations

Author: Cheng Yuwei
Cheung Kei-Hoi
Modis Yorgo
Ryslik Gregory
Zhao Hongyu
Publication date: 12/07/2013

Background: It is well known that the development of cancer is caused by the accumulation of somatic mutations within the genome. For oncogenes specifically, current research suggests that there is a small set of "driver" mutations that are primarily responsible for tumorigenesis. Further, due to some recent pharmacological successes in targeting these driver mutations and treating the resulting tumors, a variety of methods have been developed to identify potential driver mutations using approaches such as machine learning and mutational clustering. We propose a novel methodology that increases our power to identify mutational clusters by taking into account protein tertiary structure via a graph-theoretic approach. Results: We have designed and implemented GraphPAC (Graph Protein Amino Acid Clustering) to identify mutational clustering while considering protein spatial structure. Using GraphPAC, we are able to detect novel clusters in proteins that are known to exhibit mutation clustering as well as identify clusters in proteins without evidence of prior clustering based on current methods. Specifically, by utilizing the spatial information available in the Protein Data Bank (PDB) along with the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC), GraphPAC identifies new mutational clusters in well-known oncogenes such as EGFR and KRAS. Further, by utilizing graph theory to account for the tertiary structure, GraphPAC identifies clusters in DPP4, NRP1 and other proteins not identified by existing methods. The R package is available at: http://bioconductor.org/packages/release/bioc/html/GraphPAC.html Conclusion: GraphPAC provides an alternative to iPAC and an extension to current methodology when identifying potential activating driver mutations by utilizing a graph-theoretic approach when considering protein tertiary structure.

Comment: 25 pages, 8 figures, 3 tables

arXiv.org e-Print Archive

Springer - Publisher Connector
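
GraphPAC is described as accounting for tertiary structure through a graph-theoretic approach. One such approach, assumed here for illustration, treats residues as vertices of a complete graph weighted by 3D distance and looks for a short path that visits every residue once, so that residues close in space end up close in the resulting one-dimensional order. The sketch uses the greedy nearest-neighbour heuristic for that travelling-salesman-style path with hypothetical coordinates; it is not GraphPAC's implementation, which may use a different solver or starting residue.

```python
import math

Point = tuple[float, float, float]

def nearest_neighbour_path(coords: list[Point], start: int = 0) -> list[int]:
    """Greedy Hamiltonian path: repeatedly hop to the closest unvisited residue.

    Sketch only: a heuristic, so the path is short but not guaranteed optimal.
    """
    unvisited = set(range(len(coords))) - {start}
    path = [start]
    while unvisited:
        here = coords[path[-1]]
        # Ties are broken by the lower residue index for reproducibility.
        nxt = min(unvisited, key=lambda i: (math.dist(here, coords[i]), i))
        path.append(nxt)
        unvisited.remove(nxt)
    return path

def path_length(coords: list[Point], path: list[int]) -> float:
    """Total 3D length of the path, the quantity a TSP-style solver tries to minimise."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))
```

The visiting order then serves as the remapped linear sequence on which a one-dimensional clustering test can run; a better solver would shorten the path further but does not change the overall idea.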

Web GIS in practice VI: a demo playlist of geo-mashups for public health neogeographers

Author: Boulos Maged N Kamel
Burden David
Cheung Kei-Hoi
Scotch Matthew
Publication venue: BioMed Central
Publication date: 01/01/2008

'Mashup' was originally used to describe the mixing together of musical tracks to create a new piece of music. The term now refers to Web sites or services that weave data from different sources into a new data source or service. Using a musical metaphor that builds on the origin of the word 'mashup', this paper presents a demonstration "playlist" of four geo-mashup vignettes that make use of a range of Web 2.0, Semantic Web, and 3-D Internet methods, with outputs/end-user interfaces spanning the flat Web (two-dimensional – 2-D maps), a three-dimensional – 3-D mirror world (Google Earth) and a 3-D virtual world (Second Life®). The four geo-mashup "songs" in this "playlist" are: 'Web 2.0 and GIS (Geographic Information Systems) for infectious disease surveillance', 'Web 2.0 and GIS for molecular epidemiology', 'Semantic Web for GIS mashup', and 'From Yahoo! Pipes to 3-D, avatar-inhabited geo-mashups'. It is hoped that this showcase of examples and ideas, and the pointers we provide to the many freely available online tools for creating, sharing and reusing geo-mashups with minimal or no coding, will spark the imagination of many public health practitioners and encourage them to start exploring these methods and tools in their day-to-day practice. The paper also discusses how today's Web is rapidly evolving into a much more intensely immersive, mixed-reality and ubiquitous socio-experiential Metaverse that is heavily interconnected through various kinds of user-created mashups.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
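
The geo-mashups in this paper weave data from different sources into a new layer that can be viewed in tools such as Google Earth. A small illustration of that pattern: join hypothetical case counts with hypothetical site coordinates and emit a KML document of placemarks, a format Google Earth can open. The data, the site names and the choice of hand-built KML are assumptions for this example; the paper itself points readers to online tools that need minimal or no coding.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def build_kml(cases: dict[str, int], sites: dict[str, tuple[float, float]]) -> str:
    """Join hypothetical case counts with (lon, lat) site coordinates; return a KML string."""
    ET.register_namespace("", KML_NS)  # serialise KML elements without an ns0: prefix
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    # Keep only sites present in both sources (an inner join on site name).
    for name in sorted(cases.keys() & sites.keys()):
        lon, lat = sites[name]
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = f"{cases[name]} reported cases"
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML orders coordinates as longitude,latitude[,altitude].
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")
```

Sites that appear in only one source are dropped by the join, so the resulting layer shows only locations for which both a count and a position are known.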