Dimensionality reduction and hierarchical clustering in framework for hyperspectral image segmentation

Author: Harikiran J.
Mallikharjuna Rao K.
Sai Chandana B.
Srinivasa Rao B.
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/09/2019

Hyperspectral data contain hundreds of narrow bands representing the same scene on Earth, with each pixel having a continuous reflectance spectrum. The first attempts to analyse hyperspectral images were based on techniques developed for multispectral images, randomly selecting a few spectral channels, usually fewer than seven. This random selection of bands degrades the accuracy of segmentation algorithms on hyperspectral data. In this paper, a new framework is designed for the analysis of hyperspectral images that takes information from all the data channels, using a dimensionality reduction method based on subset selection and hierarchical clustering. A methodology based on subset construction is used for selecting k informative bands from a d-band dataset. In this selection, similarity metrics such as Average Pixel Intensity [API], Histogram Similarity [HS], Mutual Information [MI] and Correlation Similarity [CS] are used to create k distinct subsets, and from each subset a single band is selected. The selected informative bands are merged into a single image using a hierarchical fusion technique. The fused image is then segmented using a hierarchical clustering algorithm. Qualitative and quantitative analysis shows that the CS similarity metric in the dimensionality reduction algorithm yields a high-quality segmented image

Bulletin of Electrical Engineering and Informatics
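
The abstract above describes grouping bands into subsets by a similarity metric and keeping one band per subset, but it does not give the procedure. The following is only a minimal sketch of that idea using correlation similarity. The greedy threshold grouping, the rule for picking a representative, and the names `pearson`, `group_bands`, and `pick_representatives` are all assumptions made for illustration, not the authors' method. Note that a threshold yields a data-dependent number of subsets rather than a fixed k.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two bands given as flat lists of pixel values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant band correlates with nothing
    return cov / sqrt(vx * vy)

def group_bands(bands, threshold):
    """Greedy grouping (an assumption): a band joins the first subset whose
    first member correlates with it at or above the threshold."""
    subsets = []
    for idx, band in enumerate(bands):
        for subset in subsets:
            if pearson(bands[subset[0]], band) >= threshold:
                subset.append(idx)
                break
        else:
            subsets.append([idx])
    return subsets

def pick_representatives(bands, subsets):
    """Pick, per subset, the band with the highest mean correlation to the
    other members (a hypothetical selection rule)."""
    reps = []
    for subset in subsets:
        def mean_corr(i, subset=subset):
            others = [j for j in subset if j != i]
            if not others:
                return 1.0
            return sum(pearson(bands[i], bands[j]) for j in others) / len(others)
        reps.append(max(subset, key=mean_corr))
    return reps

# Toy data: four "bands" of four pixels each.
bands = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1], [8, 6, 4, 2.5]]
subsets = group_bands(bands, threshold=0.9)
reps = pick_representatives(bands, subsets)
```

In this toy input the first two bands rise together and the last two fall together, so the sketch forms two subsets and keeps one band from each.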

Integrative analysis of the inter-tumoral heterogeneity of triple-negative breast cancer.

Author: Boymoushakian Lari
Chiu Alec M
Coller Hilary A
Mitra Mithun
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

Triple-negative breast cancers (TNBC) lack estrogen and progesterone receptors and HER2 amplification, and are resistant to therapies that target these receptors. Tumors from TNBC patients are heterogeneous based on genetic variations, tumor histology, and clinical outcomes. We used high throughput genomic data for TNBC patients (n = 137) from TCGA to characterize inter-tumor heterogeneity. Similarity network fusion (SNF)-based integrative clustering combining gene expression, miRNA expression, and copy number variation, revealed three distinct patient clusters. Integrating multiple types of data resulted in more distinct clusters than analyses with a single datatype. Whereas most TNBCs are classified by PAM50 as basal subtype, one of the clusters was enriched in the non-basal PAM50 subtypes, exhibited more aggressive clinical features and had a distinctive signature of oncogenic mutations, miRNAs and expressed genes. Our analyses provide a new classification scheme for TNBC based on multiple omics datasets and provide insight into molecular features that underlie TNBC heterogeneity

eScholarship - University of California

Similarity-based virtual screening using 2D fingerprints

Author: Peter Willett
Publication venue: 'Elsevier BV'
Publication date: 01/12/2006

This paper summarises recent work at the University of Sheffield on virtual screening methods that use 2D fingerprint measures of structural similarity. A detailed comparison of a large number of similarity coefficients demonstrates that the well-known Tanimoto coefficient remains the method of choice for the computation of fingerprint-based similarity, despite possessing some inherent biases related to the sizes of the molecules that are being sought. Group fusion involves combining the results of similarity searches based on multiple reference structures and a single similarity measure. We demonstrate the effectiveness of this approach to screening, and also describe an approximate form of group fusion, turbo similarity searching, that can be used when just a single reference structure is available

Crossref

White Rose Research Online
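
For readers unfamiliar with the Tanimoto coefficient and group fusion mentioned in the abstract above, here is a minimal sketch. It assumes fingerprints are represented as sets of "on" bit positions and that group fusion scores a molecule by its maximum similarity over the reference structures. The MAX rule and the names `tanimoto` and `group_fusion_max` are illustrative assumptions, not details taken from the abstract.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions:
    |a AND b| / |a OR b|."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints score 0
    return len(a & b) / union

def group_fusion_max(references, database):
    """Rank database molecules by their best Tanimoto score against any
    reference structure (MAX fusion rule, assumed here for illustration)."""
    scores = {
        name: max(tanimoto(fp, ref) for ref in references)
        for name, fp in database.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy fingerprints: two reference structures and three database molecules.
references = [frozenset({1, 2}), frozenset({5, 6})]
database = {
    "m1": frozenset({1, 2}),
    "m2": frozenset({5, 7}),
    "m3": frozenset({9}),
}
ranking = group_fusion_max(references, database)
```

Using several references lets a molecule rank highly if it resembles any one of them, which is the intuition behind combining searches from multiple reference structures.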

An integrated clustering analysis framework for heterogeneous data

Author: Mojahed Aalaa
Publication venue
Publication date: 01/08/2016

Big data is a growing area of research with some important research challenges that motivate our work. We focus on one such challenge, the variety aspect. First, we introduce our problem by defining heterogeneous data as data about objects that are described by different data types, e.g., structured data, text, time-series, images, etc. Throughout our work we use five datasets for experimentation: a real dataset of prostate cancer data and four synthetic datasets that we have created and made publicly available. Each dataset covers different combinations of data types that are used to describe objects. Our strategy for clustering is based on fusion approaches. We compare intermediate and late fusion schemes. We propose an intermediate fusion approach, Similarity Matrix Fusion (SMF), where the integration process takes place at the level of calculating similarities. SMF produces a single distance fusion matrix and two uncertainty expression matrices. We then propose a clustering algorithm, Hk-medoids, a modified version of the standard k-medoids algorithm that utilises uncertainty calculations to improve clustering performance. We evaluate our results by comparing them to clusterings produced using individual elements and show that the fusion approach produces equal or significantly better results. We also show that there are advantages in utilising the uncertainty information as Hk-medoids does. In addition, from a theoretical point of view, our proposed Hk-medoids algorithm has lower computational complexity than the popular PAM implementation of the k-medoids algorithm. We then employ late fusion, which aggregates the results of clustering by individual elements by combining cluster labels using an object co-occurrence matrix technique. The final clustering is then derived by a hierarchical clustering algorithm. We show that intermediate fusion for clustering of heterogeneous data is a feasible and efficient approach using our proposed Hk-medoids algorithm

University of East Anglia digital repository
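
The late-fusion step described in the abstract above (combining cluster labels through an object co-occurrence matrix, then applying hierarchical clustering) can be sketched roughly as follows. This is an illustration under stated assumptions: the co-occurrence entry is taken to be the fraction of input clusterings that place two objects together, and average linkage is an arbitrary choice. The names `co_occurrence` and `agglomerate` are hypothetical.

```python
def co_occurrence(labelings):
    """Entry (i, j) is the fraction of input clusterings in which objects
    i and j share a cluster label."""
    n = len(labelings[0])
    m = len(labelings)
    return [
        [sum(lab[i] == lab[j] for lab in labelings) / m for j in range(n)]
        for i in range(n)
    ]

def agglomerate(sim, k):
    """Average-linkage agglomerative clustering on a similarity matrix,
    merging the most similar pair of clusters until k clusters remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                avg = sum(sim[i][j] for i, j in pairs) / len(pairs)
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Three clusterings of four objects, e.g. one per data type.
labelings = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
co = co_occurrence(labelings)
consensus = agglomerate(co, k=2)
```

Because only label agreement is compared, the label values themselves need not match across clusterings: the third labeling above swaps 0 and 1 yet still agrees with the first.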

Grouping of coefficients for the calculation of inter-molecular similarity and dissimilarity using 2D fragment bit-strings

Author: Holliday J.D.
Hu C.-Y.
Willett P.
Publication venue: Bentham Science Publishers
Publication date: 01/03/2002

This paper compares 22 different similarity coefficients when they are used for searching databases of 2D fragment bit-strings. Experiments with the National Cancer Institute's AIDS and IDAlert databases show that the coefficients fall into several well-marked clusters, in which the members of a cluster will produce comparable rankings of a set of molecules. These clusters provide a basis for selecting combinations of coefficients for use in data fusion experiments. The results of these experiments provide a simple way of increasing the effectiveness of fragment-based similarity searching systems

MIT Libraries Dome

White Rose Research Online

Patient-specific data fusion for cancer stratification and personalised treatment

Author: Gligorijević V
Malod-Dognin N
Pržulj N
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 04/01/2016

According to Cancer Research UK, cancer is a leading cause of death, accounting for more than one in four of all deaths in 2011. Recent advances in experimental technologies in cancer research have resulted in the accumulation of large amounts of patient-specific datasets, which provide complementary information on the same cancer type. We introduce a versatile data fusion (integration) framework that can effectively integrate somatic mutation data, molecular interactions and drug chemical data to address three key challenges in cancer research: stratification of patients into groups having different clinical outcomes, prediction of driver genes whose mutations trigger the onset and development of cancers, and repurposing of drugs treating particular cancer patient groups. Our new framework is based on graph-regularised non-negative matrix tri-factorization, a machine learning technique for co-clustering heterogeneous datasets. We apply our framework to ovarian cancer data to simultaneously cluster patients, genes and drugs by utilising all datasets. We demonstrate superior performance of our method over the state-of-the-art method, Network-based Stratification, in identifying three patient subgroups that have significant differences in survival outcomes and that are in good agreement with other clinical data. We also identify potential new driver genes by analysing the gene clusters enriched in known drivers of ovarian cancer progression. We validated the top-scoring genes identified as new drivers through database search and biomedical literature curation. Finally, we identify potential candidate drugs for repurposing that could be used in treatment of the identified patient subgroups by targeting their mutated gene products. We validated a large percentage of our drug-target predictions by using other databases and through literature curation

Spiral - Imperial College Digital Repository

Bayesian correlated clustering to integrate multiple datasets

Author: Paul Kirk
Jim E. Griffin
Richard S. Savage
Zoubin Ghahramani
David L. Wild
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012

Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct – but often complementary – information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured via parameters that describe the agreement among the datasets. Results: Using a set of 6 artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real S. cerevisiae datasets. In the 2-dataset case, we show that MDI’s performance is comparable to the present state of the art. We then move beyond the capabilities of current approaches and integrate gene expression, ChIP-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques – as well as to non-integrative approaches – demonstrate that MDI is very competitive, while also providing information that would be difficult or impossible to extract using other methods

CiteSeerX

Crossref

PubMed Central

Warwick Research Archives Portal Repository

Kent Academic Repository