Search CORE

219 research outputs found

Evaluation of GO-based functional similarity measures using S. cerevisiae protein interaction and expression profile data

Author: Du LinFang
Xu Tao
Zhou Yan
Publication venue: BioMed Central
Publication date: 01/11/2008
Field of study

Abstract Background Researchers interested in analysing the expression patterns of functionally related genes usually hope to improve the accuracy of their results beyond the boundaries of currently available experimental data. Gene ontology (GO) data provides a novel way to measure the functional relationship between gene products. Many approaches have been reported for calculating the similarities between two GO terms, known as semantic similarities. However, biologists are more interested in the relationship between gene products than in the scores linking the GO terms. To highlight the relationships among genes, recent studies have focused on functional similarities. Results In this study, we evaluated five functional similarity methods using both protein-protein interaction (PPI) and expression data of <it>S. cerevisiae</it>. The receiver operating characteristics (ROC) and correlation coefficient analysis of these methods showed that the maximum method outperformed the other methods. Statistical comparison of multiple- and single-term annotated proteins in biological process ontology indicated that genes with multiple GO terms may be more reliable for separating true positives from noise. Conclusion This study demonstrated the reliability of current approaches that elevate the similarity of GO terms to the similarity of proteins. Suggestions for further improvements in functional similarity analysis are also provided.</p

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Visualisation of heterogeneous data with simultaneous feature saliency using Generalised Generative Topographic Mapping

Author: Bassi Gurjinder
Mumtaz Shahzad
Nabney Ian T.
Randrianandrasana Michel F.
Publication venue: Universität Bielefeld
Publication date: 01/10/2015
Field of study

Most machine-learning algorithms are designed for datasets with features of a single type whereas very little attention has been given to datasets with mixed-type features. We recently proposed a model to handle mixed types with a probabilistic latent variable formalism. This proposed model describes the data by type-specific distributions that are conditionally independent given the latent space and is called generalised generative topographic mapping (GGTM). It has often been observed that visualisations of high-dimensional datasets can be poor in the presence of noisy features. In this paper we therefore propose to extend the GGTM to estimate feature saliency values (GGTMFS) as an integrated part of the parameter learning process with an expectation-maximisation (EM) algorithm. The efficacy of the proposed GGTMFS model is demonstrated both for synthetic and real datasets

Aston Publications Explorer

Explore Bristol Research

A systematic review of data quality issues in knowledge discovery tasks

Author: Corrales David Camilo
Corrales Juan Carlos
Ledezma Agapito Ismael
Publication venue: 'Universidad de Medellin'
Publication date: 07/11/2015
Field of study

Hay un gran crecimiento en el volumen de datos porque las organizaciones capturan permanentemente la cantidad colectiva de datos para lograr un mejor proceso de toma de decisiones. El desafío mas fundamental es la exploración de los grandes volúmenes de datos y la extracción de conocimiento útil para futuras acciones por medio de tareas para el descubrimiento del conocimiento; sin embargo, muchos datos presentan mala calidad. Presentamos una revisión sistemática de los asuntos de calidad de datos en las áreas del descubrimiento de conocimiento y un estudio de caso aplicado a la enfermedad agrícola conocida como la roya del café.Large volume of data is growing because the organizations are continuously capturing the collective amount of data for better decision-making process. The most fundamental challenge is to explore the large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks, nevertheless many data has poor quality. We presented a systematic review of the data quality issues in knowledge discovery tasks and a case study applied to agricultural disease named coffee rust

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Universidad de Medellín: Revistas Científicas

Repositorio Institucional Universidad de Medellín

DIALNET

Recommended from our members

Tissue-specific and interpretable sub-segmentation of whole tumour burden on CT images by unsupervised fuzzy clustering.

Author: Beer Lucian
Brenton James D
Crispin-Ortuzar Mireia
Markowetz Florian
Martin-Gonzalez Paula
Rundo Leonardo
Sala Evis
Ursprung Stephan
Woitek Ramona
Publication venue: Comput Biol Med
Publication date: 01/05/2020
Field of study

BACKGROUND: Cancer typically exhibits genotypic and phenotypic heterogeneity, which can have prognostic significance and influence therapy response. Computed Tomography (CT)-based radiomic approaches calculate quantitative features of tumour heterogeneity at a mesoscopic level, regardless of macroscopic areas of hypo-dense (i.e., cystic/necrotic), hyper-dense (i.e., calcified), or intermediately dense (i.e., soft tissue) portions. METHOD: With the goal of achieving the automated sub-segmentation of these three tissue types, we present here a two-stage computational framework based on unsupervised Fuzzy C-Means Clustering (FCM) techniques. No existing approach has specifically addressed this task so far. Our tissue-specific image sub-segmentation was tested on ovarian cancer (pelvic/ovarian and omental disease) and renal cell carcinoma CT datasets using both overlap-based and distance-based metrics for evaluation. RESULTS: On all tested sub-segmentation tasks, our two-stage segmentation approach outperformed conventional segmentation techniques: fixed multi-thresholding, the Otsu method, and automatic cluster number selection heuristics for the K-means clustering algorithm. In addition, experiments showed that the integration of the spatial information into the FCM algorithm generally achieves more accurate segmentation results, whilst the kernelised FCM versions are not beneficial. The best spatial FCM configuration achieved average Dice similarity coefficient values starting from 81.94±4.76 and 83.43±3.81 for hyper-dense and hypo-dense components, respectively, for the investigated sub-segmentation tasks. CONCLUSIONS: The proposed intelligent framework could be readily integrated into clinical research environments and provides robust tools for future radiomic biomarker validation

Apollo (Cambridge)

Protein complex detection with semi-supervised learning in protein interaction networks

Author: Lei Xiujuan
Shi Lei
Zhang Aidong
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Abstract Background Protein-protein interactions (PPIs) play fundamental roles in nearly all biological processes. The systematic analysis of PPI networks can enable a great understanding of cellular organization, processes and function. In this paper, we investigate the problem of protein complex detection from noisy protein interaction data, i.e., finding the subsets of proteins that are closely coupled via protein interactions. However, protein complexes are likely to overlap and the interaction data are very noisy. It is a great challenge to effectively analyze the massive data for biologically meaningful protein complex detection. Results Many people try to solve the problem by using the traditional unsupervised graph clustering methods. Here, we stand from a different point of view, redefining the properties and features for protein complexes and designing a “semi-supervised” method to analyze the problem. In this paper, we utilize the neural network with the “semi-supervised” mechanism to detect the protein complexes. By retraining the neural network model recursively, we could find the optimized parameters for the model, in such a way we can successfully detect the protein complexes. The comparison results show that our algorithm could identify protein complexes that are missed by other methods. We also have shown that our method achieve better precision and recall rates for the identified protein complexes than other existing methods. In addition, the framework we proposed is easy to be extended in the future. Conclusions Using a weighted network to represent the protein interaction network is more appropriate than using a traditional unweighted network. In addition, integrating biological features and topological features to represent protein complexes is more meaningful than using dense subgraphs. Last, the “semi-supervised” learning model is a promising model to detect protein complexes with more biological and topological features available.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A Multi-Criterion Evolutionary Approach Applied to Phylogenetic Reconstruction

Author: Cancino W.
Delbem A.C.B.
Publication venue: 'IntechOpen'
Publication date: 01/02/2010
Field of study

IntechOpen

Crossref

A cDNA Microarray Gene Expression Data Classifier for Clinical Diagnostics Based on Graph Theory

Author: Benso Alfredo
Di Carlo Stefano
Politano Gianfranco Michele Maria
Publication venue: IEEE Computer Society
Publication date: 01/01/2011
Field of study

Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarrays' data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performances, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, are unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithm

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

GPU Accelerated Browser for Neuroimaging Genomics

Author: Alzheimer’s Disease Neuroimaging Initiative
Fang Shiaofen
Hasan Mohammad Al
Li Huang
Moore Jason H.
Saykin Andrew J.
Shen Li
Yan Jingwen
Yao Xiaohui
Zigon Bob
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2018
Field of study

Neuroimaging genomics is an emerging field that provides exciting opportunities to understand the genetic basis of brain structure and function. The unprecedented scale and complexity of the imaging and genomics data, however, have presented critical computational bottlenecks. In this work we present our initial efforts towards building an interactive visual exploratory system for mining big data in neuroimaging genomics. A GPU accelerated browsing tool for neuroimaging genomics is created that implements the ANOVA algorithm for single nucleotide polymorphism (SNP) based analysis and the VEGAS algorithm for gene-based analysis, and executes them at interactive rates. The ANOVA algorithm is 110 times faster than the 4-core OpenMP version, while the VEGAS algorithm is 375 times faster than its 4-core OpenMP counter part. This approach lays a solid foundation for researchers to address the challenges of mining large-scale imaging genomics datasets via interactive visual exploration

IUPUIScholarWorks