Search CORE

19 research outputs found

Characterisation and Classification of Protein Sequences by Using Enhanced Amino Acid Indices and Signal Processing-Based Methods

Author: Chrysostomou Charalambos
Publication venue: Centre for Computational Intelligence
Publication date: 01/01/2013

For copyright reasons, the author's published papers have been removed from this copy of the thesis. Protein sequencing has produced an overwhelming number of protein sequences, especially in the last decade. Nevertheless, the functional and structural classes of most of these proteins are still unknown, and the experimental methods currently used to determine these properties are very expensive, laborious and time-consuming. Automated computational methods are therefore urgently required to predict the functional and structural classes of proteins accurately and reliably. Several bioinformatics methods have been developed to determine such properties directly from sequence information. Methods based on signal processing have recently become popular in bioinformatics. They have been investigated for the analysis of DNA and protein sequences and shown to be useful, generally helping to characterise the sequences better. However, various technical issues must be addressed to overcome the problems associated with applying signal processing methods to protein sequences. Amino acid indices, which are used to transform protein sequences into signals, have various applications and can represent diverse features of protein sequences and amino acids. As the majority of indices have similar features, this project proposes a new set of computationally derived indices that better represent the original group of indices. A study is also carried out that identifies a unique and universal set of best-discriminating amino acid indices for the characterisation of allergenic proteins. This analysis extracts features directly from the protein sequences using the Discrete Fourier Transform (DFT) to build a Support Vector Machine (SVM) classification model for allergenic proteins. The proposed predictive model yields higher and more reliable accuracy than existing methods.
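A minimal sketch of the DFT feature-extraction step described above, assuming a single amino acid index (the Kyte-Doolittle hydropathy scale, AAindex KYTJ820101) and a fixed padded signal length. Neither the index choice nor the helper names (`encode`, `dft`, `spectrum_features`) come from the thesis; the SVM stage that would consume these features is omitted here.

```python
import cmath

# Kyte-Doolittle hydropathy scale (AAindex KYTJ820101), used here only as an
# example index; it is not claimed to be one of the indices selected in the thesis.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, index=KYTE_DOOLITTLE):
    """Map each residue to its index value; unknown residues map to 0.0."""
    return [index.get(residue, 0.0) for residue in sequence.upper()]


def dft(signal):
    """Naive O(N^2) discrete Fourier transform of a real-valued signal."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def spectrum_features(sequence, length=64, index=KYTE_DOOLITTLE):
    """Fixed-size feature vector: DFT magnitudes of the encoded sequence.

    The signal is truncated or zero-padded to `length` so that every sequence
    yields the same number of features (length // 2 + 1, the non-redundant
    half of the spectrum of a real signal).
    """
    signal = encode(sequence, index)[:length]
    signal += [0.0] * (length - len(signal))
    return [abs(c) for c in dft(signal)[: length // 2 + 1]]


features = spectrum_features("MKTAYIAKQRQISFVKSHFSRQ")
print(len(features))  # 33 magnitudes for the default length of 64
```

Padding to a common length is one simple way to give every protein the same feature dimension; interpolating spectra to a common frequency grid would be an alternative.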
A new method is proposed for performing multiple sequence alignment. In this method, a DFT-based approach is used to construct a new distance matrix, in combination with multiple amino acid indices that encode protein sequences as numerical sequences. Additionally, a new type of substitution matrix is proposed in which the physicochemical similarity between any two amino acids is calculated. These similarities are calculated from 25 selected amino acid indices, each of which represents a unique biological protein feature. The proposed multiple sequence alignment method yields better and more reliable alignments than existing methods. To evaluate the complex information generated by the DFT, Complex Informational Spectrum Analysis (CISA) is developed and presented. The results show that when protein classes show similarities or differences in their Common Frequency Peak (CFP) for specific amino acid indices, these classes are probably related to the protein feature that the corresponding amino acid index represents. Using only the absolute spectrum in the informational spectrum analysis of protein sequences proves insufficient, as biologically related features can appear individually in either the real or the imaginary spectrum. This is successfully demonstrated through the analysis of influenza neuraminidase protein sequences. When a new protein is identified, it is important to single out the amino acids responsible for its structural and functional classification, as well as those contributing to its specific biological characterisation. In this work, a novel approach is presented to identify and quantify the relationship between individual amino acids and the protein. This approach is also demonstrated through the analysis of influenza neuraminidase protein sequences.
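The point that peaks can appear in the real or imaginary spectrum alone can be illustrated with a small sketch. As an assumption, it follows the general informational-spectrum idea of multiplying the spectra of several sequences element-wise (a cross-spectrum) and reading off the largest peak as the common frequency, and simply does this three times: on the absolute, real and imaginary parts. The exact CISA definition in the thesis may differ; the index (Kyte-Doolittle, KYTJ820101) and the function names are illustrative choices.

```python
import cmath

# Kyte-Doolittle hydropathy (AAindex KYTJ820101), an example index only.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def centred_dft(sequence, length, index=KYTE_DOOLITTLE):
    """DFT of the mean-centred, zero-padded numerical encoding of `sequence`."""
    values = [index.get(residue, 0.0) for residue in sequence.upper()][:length]
    mean = sum(values) / len(values)
    signal = [v - mean for v in values] + [0.0] * (length - len(values))
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / length) for t, x in enumerate(signal))
        for k in range(length)
    ]


def common_frequency_peaks(sequences, length=64, index=KYTE_DOOLITTLE):
    """Frequency (cycles per residue) of the largest cross-spectral peak,
    computed separately for the absolute, real and imaginary spectra."""
    spectra = [centred_dft(s, length, index) for s in sequences]
    bins = range(1, length // 2 + 1)  # skip the DC term at k = 0
    parts = {
        "absolute": abs,
        "real": lambda c: abs(c.real),
        "imaginary": lambda c: abs(c.imag),
    }
    peaks = {}
    for name, part in parts.items():
        # Cross-spectrum: element-wise product of the chosen part over all sequences.
        cross = {k: 1.0 for k in bins}
        for spectrum in spectra:
            for k in bins:
                cross[k] *= part(spectrum[k])
        peaks[name] = max(bins, key=cross.__getitem__) / length
    return peaks


peaks = common_frequency_peaks(["IL" * 8, "VA" * 8], length=16)
print(peaks["absolute"], peaks["real"])  # 0.5 0.5 for these period-2 toy sequences
```

In this toy case both sequences alternate between two residues, so all their energy sits in the real part at frequency 0.5; a real CISA study would compare where the real and imaginary peaks fall for each index and protein class.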
The characterisation and identification of influenza A virus protein sequences is tackled through a Subgroup Discovery (SD) algorithm, which can provide ancillary knowledge to experts. The main objective of this case study was to derive interpretable knowledge about the influenza A virus and, consequently, to better describe the relationships between its subtypes. Finally, a Support Vector Machine (SVM) classification model was built and tested using DFT-based, sequence-driven features; it yields higher predictive accuracy than SD. The methods developed and presented in this study yield promising results and can easily be applied in proteomics.
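The abstract does not state the SVM kernel or solver. As an assumption, the sketch below trains a linear SVM with a Pegasos-style stochastic subgradient method in plain Python, the kind of model that could sit on top of DFT feature vectors; `train_linear_svm` and `predict` are hypothetical names. In practice an established solver such as LIBSVM would be the more likely choice.

```python
def train_linear_svm(features, labels, lam=0.01, epochs=200):
    """Pegasos-style subgradient training of a linear SVM.

    Minimises lam/2 * ||w||^2 + mean hinge loss. Samples are visited in a fixed
    order (rather than sampled at random) so results are reproducible. A constant
    1.0 feature is appended so the bias is learned (and regularised) as part of w.
    `labels` must be +1 or -1.
    """
    data = [list(x) + [1.0] for x in features]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink w (gradient step for the L2 regulariser) ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... then add the hinge-loss subgradient if the margin is violated.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of the linear score (bias feature appended)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]  # toy, separable data
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, -1, -1]
```

Here each row of `X` stands in for a spectrum feature vector; with real data the features would normally be standardised first.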

De Montfort University Open Research Archive

CISAPS: Complex Informational Spectrum for the Analysis of Protein Sequences

Author: Charalambos Chrysostomou
Huseyin Seker
Nizamettin Aydin
Publication venue: 'Hindawi Limited'

Crossref

Light source detection for digital images in noisy scenes: a neural network approach

Author: Andries Dam van
AP Pentland
C-S Bouganis
CH Lee
Charalambos Chrysostomou
Chow Chi Kin
D Rumelhart
David A. Elizondo
H Martínez
K Hara
Q Zheng
R Klette
R Ramamoorthi
R Zhang
Shang-Ming Zhou
SM Zhou
T Masters
UM Ascher
W Chojnacki
W Grimson
Zheng Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Crossref

Cronfa at Swansea University

The BioMart community portal: an innovative alternative to large, centralized data repositories.

Author: Allen James
Arnaiz Olivier
Awedh Mohammad
Baldock Richard
Barbiera Giulia
Bardou Philippe
Beck Tim
Blake Andrew
Bonierbale Meredith
Brookes Anthony
Bucci Gabrielle
Bueti Iwan
Burge Sarah
Cabua Cédric
Carlson Joseph
Chelala Claude
Chrysostomou Charalambos
Citaro Davide
Collin Olivier
Cordova Raul
Cutts Rosalind
Dassi Erik
Di Genova Alex
Djari Anis
Durinck Steffen
Esposito Anthony
Estrella Heather
Eyras Eduardo
Fernandez-Banet Julia
Forbes Simon
Free Robert
Fujisawa Takamoto
Gadaleta Emanuela
Garcia-Manteiga Jose
Goodstein David
Gray Kristian
Guerra-Assunção José
Haggarty Bernard
Haider Syed
Han Byung
Han Dong-Jin
Harris Todd
Harshbarger Jayson
Hastings Robert
Hayes Richard
Hoede Claire
Hu Shen
Hu Zhi-Liang
Hutchins Lucie
Kan Zhengyan
Kasprzyk Arek
Kawaji Hideya
Keliet Aminah
Kerhornou Arnaud
Kim Sunghoon
Kinsella Rhoda
Klopp Christophe
Kong Lei
Lawson Daniel
Lazarevic Dejan
Lee Ji-Hyun
Letellier Thomas
Li Chuan-Yun
Lio Pietro
Liu Chu-Jun
Luo Jie
Maass Alejandro
Mariette Jerome
Maurel Thomas
Merella Stefania
Mohamed Azza
Moreews Francois
Nabihoudine Ibounyamine
Ndegwa Nelson
Noirot Céline
Pandini Luca
Perez-Llamas Cristian
Primig Michael
Provero Paolo
Quattrone Alessandro
Quesneville Hasi
Rambaldi Davide
Reecy James
Riba Michela
Rosanoff Steven
Saddiq Amna
Salas Elise
Sallou Olivier
Shepherd Rebecca
Simon Reinhard
Smedley Damian
Sperling Linda
Spooner William
Staines Daniel
Steinbach Delphine
Stone Kevin
Stupka Elia
Teague Jon
Ullah Abu
Wang Jun
Ware Doreen
Wong-Erasmus Marie
Youens-Clark Ken
Zadissa Amonida
Zhang Shi-Jian
Publication venue: Nucleic Acids Res
Publication date: 01/01/2015

The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases created by our ever-growing community. It also offers better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool, is now accessible through graphical and web service interfaces. The BioMart Community Portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations.
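To make the web service access described above concrete, here is a sketch that builds a BioMart XML query and the corresponding GET URL. It assumes the XML query format used by the Ensembl BioMart `martservice` endpoint; the community portal's own endpoint is not assumed, other BioMart deployments may differ, and the dataset, attribute and filter names are Ensembl examples to be checked against the target mart. `build_query` and `martservice_url` are hypothetical helpers; the request itself is not sent.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def build_query(dataset, attributes, filters=None, formatter="TSV"):
    """Return a BioMart XML query string (Ensembl martservice style, assumed)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,
        "header": "0",
        "uniqueRows": "0",
        "count": "",
        "datasetConfigVersion": "0.6",
    })
    dataset_el = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in (filters or {}).items():
        ET.SubElement(dataset_el, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(dataset_el, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def martservice_url(base_url, query_xml):
    """Encode the XML query as the `query` parameter of a GET request."""
    return base_url + "?" + urllib.parse.urlencode({"query": query_xml})


xml = build_query(
    "hsapiens_gene_ensembl",
    ["ensembl_gene_id", "external_gene_name"],
    {"chromosome_name": "21"},
)
url = martservice_url("https://www.ensembl.org/biomart/martservice", xml)
print(url[:60])
```

The resulting URL could then be fetched with `urllib.request.urlopen(url)` to receive tab-separated rows, network access permitting.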

HAL-CentraleSupelec

The Jackson Laboratory: The Mouseion at the JAXlibrary

Cold Spring Harbor Laboratory Institutional Repository

HAL Descartes

UPF Digital Repository

ProdInra

Hal-Diderot

Digital Repository @ Iowa State University (ISU)

Crossref

INRIA a CCSD electronic archive server

HAL-Inserm

PubMed Central

eScholarship - University of California

CGSpace

Apollo (Cambridge)

Institutional Research Information System University of Turin

HAL-Rennes 1

Leicester Research Archive

Structural classification of protein sequences based on signal processing and support vector machines

Author: Chrysostomou Charalambos
Seker Huseyin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 18/10/2016

The function of any protein depends directly on its secondary and tertiary structure. Proteins fold into a three-dimensional shape that depends primarily on the arrangement of amino acids in the primary structure. With the explosive growth in protein sequencing in recent years, it has become infeasible to perform detailed experimental studies, as these methodologies are very expensive and time-consuming. This leaves the structure of the majority of currently available protein sequences unknown. In this paper, a predictive model is therefore presented for classifying protein sequences by secondary structure, namely alpha helix and beta sheet. The proteins used throughout this study were collected from the Structural Classification of Proteins-extended (SCOPe) database, which contains manually curated information on proteins of known structure. Two sets of all-alpha and all-beta protein sequences are used: the first comprises sequences with less than 40% identity, and the second comprises sequences with less than 95% identity. The analysis shows a strong connection between the amino acid indices used to convert protein sequences into numerical sequences and the proteins' secondary structures. For protein sequences with less than 40% identity, the total classification accuracy of the proposed classifier is 78.49% with amino acid index BIOV880101 and 76.40% with BIOV880102. For the set with less than 95% identity, the corresponding accuracies are 88.01% and 85.17%.
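The abstract reports a "total classification accuracy" for the two-class all-alpha versus all-beta problem. Assuming this means the fraction of sequences classified correctly across both classes, the sketch below computes it together with per-class sensitivity and specificity; the function name and label strings are hypothetical.

```python
def classification_report(true_labels, predicted_labels, positive="all-alpha"):
    """Accuracy, sensitivity and specificity for a two-class problem."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        # Total accuracy: correct predictions over all sequences, both classes.
        "accuracy": (tp + tn) / len(pairs),
        # Fraction of all-alpha sequences recognised as all-alpha.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Fraction of all-beta sequences recognised as all-beta.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


truth = ["all-alpha", "all-alpha", "all-beta", "all-beta"]
preds = ["all-alpha", "all-beta", "all-beta", "all-beta"]
print(classification_report(truth, preds))
# {'accuracy': 0.75, 'sensitivity': 0.5, 'specificity': 1.0}
```

Reporting sensitivity and specificity alongside total accuracy helps when the two classes are unbalanced, which is common once sequence sets are filtered by identity.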

Northumbria Research Link

Crossref

Novel protein weight matrix generated from amino acid indices

Author: Chrysostomou Charalambos
Seker Huseyin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2015

In recent years, numerous protein weight matrices have been developed that incorporate physical characteristics of proteins, such as local sequence-structure information, alpha-helix information, secondary structure information and solvent accessibility states. These matrices have generally been shown to improve protein sequence alignments over classical protein weight matrices such as Point Accepted Mutation (PAM), Blocks Substitution Matrix (BLOSUM) and Gonnet matrices, for which recent work has observed important limitations. In this paper, a novel protein weight matrix is constructed and presented. Unlike PAM or BLOSUM, this matrix is based not on mutation rates but on the physicochemical properties of each amino acid. Over 500 amino acid indices exist in the literature, each representing a unique biological protein feature. For this study, 25 amino acid indices were selected that represent general and widely accepted features of the amino acids. The proposed protein weight matrix offers the following advantages over classical protein weight matrices. It is not biased towards specific groups of protein sequences, as its values are calculated from the amino acid indices rather than from protein sequences. The same matrix can be used regardless of the homology of the sequences to be aligned or the mutation rate they present. Its scores can be related back to the physical characteristics of the amino acids from which the matrix was derived. Finally, different similarity matrices can be generated by considering different physical characteristics of the amino acids.
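The abstract does not give the similarity formula or list the 25 indices. As an assumption, the sketch below z-scores each index, measures the Euclidean distance between the amino acids' index profiles, and rescales it so that identical residues score `scale` and the most dissimilar pair scores 0. Only two example properties are used: Kyte-Doolittle hydropathy (KYTJ820101) and approximate average residue masses in daltons. The function names are hypothetical.

```python
import math

# Example properties only (not the 25 indices used in the paper).
HYDROPATHY = {  # Kyte-Doolittle, AAindex KYTJ820101
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
RESIDUE_MASS = {  # approximate average residue masses (Da)
    "A": 71.08, "R": 156.19, "N": 114.10, "D": 115.09, "C": 103.14, "Q": 128.13,
    "E": 129.12, "G": 57.05, "H": 137.14, "I": 113.16, "L": 113.16, "K": 128.17,
    "M": 131.19, "F": 147.18, "P": 97.12, "S": 87.08, "T": 101.10, "W": 186.21,
    "Y": 163.18, "V": 99.13,
}


def zscore(index):
    """Standardise an index so that properties on different scales are comparable."""
    values = list(index.values())
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {aa: (v - mean) / sd for aa, v in index.items()}


def weight_matrix(indices, scale=10.0):
    """Integer similarity score for every ordered pair of amino acids."""
    normalised = [zscore(ix) for ix in indices]
    amino_acids = sorted(indices[0])
    dist = {
        (a, b): math.sqrt(sum((ix[a] - ix[b]) ** 2 for ix in normalised))
        for a in amino_acids
        for b in amino_acids
    }
    max_dist = max(dist.values())
    # Closer physicochemical profiles give higher scores.
    return {pair: round(scale * (1 - d / max_dist)) for pair, d in dist.items()}


matrix = weight_matrix([HYDROPATHY, RESIDUE_MASS])
print(matrix[("I", "L")], matrix[("I", "D")])  # isoleucine scores closer to leucine
```

Because the scores come only from the indices, swapping in a different set of properties yields a different matrix, which mirrors the last advantage listed in the abstract.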

Northumbria Research Link

Crossref