Identification of disease-causing genes using microarray data mining and gene ontology

Author: A Mohammadi
A Zhang
AA Alizadeh
Azadeh Mohammadi
B Duval
BF Souza
C Ambroise
C Ding
C Tago
D Lin
D Singh
E Martinez
FM Couto
I Guyon
I Inza
J Jaeger
JJ Jiang
L Li
L Yu
L Ziaei
Mansoor Salehi
Mohammad H Saraee
N Cristianini
P Pavlidis
P Resnik
PA Mundra
PJ Park
R Genuer
RF Weaver
S Li
TM Huang
TR Golub
TS Furey
U Alon
W Xu
Y Ding
Y Saeys
Y Wang
YL Chin
Z Xie
Z Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

Background: One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small number of samples relative to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes.
Methods: We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. Specifically, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology, which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results.
Results: The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth.
Conclusions: The proposed method addresses the weakness of conventional methods by adding a redundancy reduction stage and utilizing Gene Ontology information. It predicts marker genes for colon, DLBCL and prostate cancer with a high accuracy. The predictions made in this study can serve as a list of candidates for subsequent wet-lab verification and might help in the search for a cure for cancers.
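
The filtering and redundancy-reduction stages mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the two-class Fisher score used here, the Pearson-correlation redundancy test, the `max_corr` threshold, and the toy expression values are all assumptions made for illustration; the SVMRFE and Gene Ontology stages are left out entirely.

```python
from statistics import mean, pvariance

def fisher_score(values, labels):
    """Two-class Fisher score of one gene: (mean1 - mean0)^2 / (var1 + var0)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        return float("inf") if gap > 0 else 0.0
    return gap / spread

def pearson(a, b):
    """Pearson correlation of two equally long expression profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    na = sum((x - ma) ** 2 for x in a) ** 0.5
    nb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (na * nb) if na > 0 and nb > 0 else 0.0

def select_genes(expr, labels, k, max_corr=0.9):
    """Rank genes by Fisher score, then greedily skip genes that are
    highly correlated with one already kept (assumed redundancy rule)."""
    ranked = sorted(expr, key=lambda g: fisher_score(expr[g], labels), reverse=True)
    kept = []
    for gene in ranked:
        if all(abs(pearson(expr[gene], expr[other])) <= max_corr for other in kept):
            kept.append(gene)
        if len(kept) == k:
            break
    return kept

# Hypothetical toy data: 4 genes, 3 normal samples (0) and 3 tumour samples (1).
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g1": [1.05, 1.2, 0.95, 3.0, 3.15, 2.9],  # strongly discriminative
    "g2": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],     # near-copy of g1 (redundant)
    "g3": [5.0, 4.0, 5.5, 1.0, 2.0, 1.5],     # discriminative, opposite direction
    "g4": [1.0, 3.0, 2.0, 2.0, 1.0, 3.0],     # uninformative
}

print(select_genes(expr, labels, k=2, max_corr=0.98))  # ['g1', 'g3']
```

In this toy run g2 scores almost as well as g1 but is dropped because its profile is nearly identical, which is the kind of redundancy a score-only ranking would not remove.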

University of Salford Institutional Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Difference of Normals as a Multi-Scale Operator in Unorganized Point Clouds

Author: Greenspan MA
Harrap R
Ioannou Y
Taati B
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2012

A novel multi-scale operator for unorganized 3D point clouds is introduced. The Difference of Normals (DoN) provides a computationally efficient, multi-scale approach to processing large unorganized 3D point clouds. The application of DoN in the multi-scale filtering of two different real-world outdoor urban LIDAR scene datasets is quantitatively and qualitatively demonstrated. In both datasets the DoN operator is shown to segment large 3D point clouds into scale-salient clusters, such as cars, people, and lamp posts, towards applications in semi-automatic annotation, and as a pre-processing step in automatic object recognition. The application of the operator to segmentation is evaluated on a large public dataset of outdoor LIDAR scenes with ground truth annotations. Comment: To be published in proceedings of 3DIMPVT 201
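
As a rough illustration of the operator, the sketch below assumes the usual formulation in which DoN at a point is half the difference between unit normals estimated with a small and a large support radius. The abstract does not spell this out, so the formula, the PCA-based normal estimate, the brute-force neighbour search, the "up" orientation rule, and the synthetic surface are all assumptions rather than the authors' implementation.

```python
import math

def estimate_normal(points, p, r, up=(0.0, 0.0, 1.0), iters=100):
    """Unit normal at p from neighbours within radius r (PCA plane fit).
    Assumes at least three non-collinear neighbours."""
    nbrs = [q for q in points if math.dist(p, q) <= r]
    n = len(nbrs)
    c = [sum(q[k] for q in nbrs) / n for k in range(3)]
    cov = [[sum((q[i] - c[i]) * (q[j] - c[j]) for q in nbrs) / n
            for j in range(3)] for i in range(3)]
    trace = cov[0][0] + cov[1][1] + cov[2][2]
    # The smallest eigenvector of cov is the largest of (trace * I - cov),
    # which plain power iteration can find.
    m = [[(trace if i == j else 0.0) - cov[i][j] for j in range(3)]
         for i in range(3)]
    v = [1.0, 1.0, 1.0]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Resolve the sign ambiguity by pointing the normal towards "up".
    if sum(a * b for a, b in zip(v, up)) < 0:
        v = [-x for x in v]
    return v

def difference_of_normals(points, p, r_small, r_large):
    """DoN vector: (n(p, r_small) - n(p, r_large)) / 2."""
    n1 = estimate_normal(points, p, r_small)
    n2 = estimate_normal(points, p, r_large)
    return [(a - b) / 2 for a, b in zip(n1, n2)]

def magnitude(v):
    return math.sqrt(sum(x * x for x in v))

# Synthetic surface: flat for x <= 0, rising with slope 1 for x > 0.
cloud = [(i * 0.5, j * 0.5, max(0.0, i * 0.5))
         for i in range(-12, 13) for j in range(-12, 13)]

flat = magnitude(difference_of_normals(cloud, (-4.0, 0.0, 0.0), 0.9, 4.0))
near_bend = magnitude(difference_of_normals(cloud, (-1.0, 0.0, 0.0), 0.9, 4.0))
print(flat, near_bend)
```

On the flat region both radii see the same plane, so the magnitude is close to zero; near the bend the large radius picks up the slope while the small one does not, so the magnitude is clearly larger. Thresholding that magnitude is one way scale-salient points could be separated.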

arXiv.org e-Print Archive

CiteSeerX

Crossref

CUED - Cambridge University Engineering Department

Using online linear classifiers to filter spam Emails

Author: Jones Gareth J.F.
Wang Bin
Wenfeng Pan
Publication venue: Springer Verlag
Publication date: 01/11/2007

The performance of two online linear classifiers, the Perceptron and Littlestone’s Winnow, is explored for two anti-spam filtering benchmark corpora, PU1 and Ling-Spam. We study the performance for varying numbers of features, along with three different feature selection methods: Information Gain (IG), Document Frequency (DF) and Odds Ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better when using IG or DF than when using Odds Ratio. It is further demonstrated that when using IG or DF, the classifiers are insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of the training set. Winnow is shown to slightly outperform the Perceptron. It is also demonstrated that both of these online classifiers perform much better than a standard Naïve Bayes method. The theoretical and implementation computational complexity of these two classifiers is very low, and they are easily updated adaptively. They outperform most of the published results, while being significantly easier to train and adapt. The analysis and promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering.
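
To make the two update rules concrete, here is a minimal sketch of mistake-driven training for both classifiers on binary bag-of-words features. The abstract does not say which Winnow variant or parameter settings were used, so the multiplicative promote/demote variant, `alpha = 2`, the threshold of half the feature count, and the five-word toy vocabulary are assumptions.

```python
def train_perceptron(docs, labels, n_features, epochs=10):
    """Additive updates: on a mistake, add the label to each active weight."""
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for active, y in zip(docs, labels):  # y is +1 (spam) or -1 (ham)
            score = b + sum(w[i] for i in active)
            if y * score <= 0:
                for i in active:
                    w[i] += y
                b += y
    return w, b

def perceptron_predict(model, active):
    w, b = model
    return 1 if b + sum(w[i] for i in active) > 0 else -1

def train_winnow(docs, labels, n_features, alpha=2.0, epochs=10):
    """Multiplicative updates: promote active weights on a missed spam,
    demote them on a false alarm."""
    w = [1.0] * n_features
    theta = n_features / 2  # assumed threshold choice
    for _ in range(epochs):
        for active, y in zip(docs, labels):
            predicted = 1 if sum(w[i] for i in active) >= theta else -1
            if predicted != y:
                factor = alpha if y == 1 else 1.0 / alpha
                for i in active:
                    w[i] *= factor
    return w, theta

def winnow_predict(model, active):
    w, theta = model
    return 1 if sum(w[i] for i in active) >= theta else -1

# Hypothetical vocabulary: 0=free, 1=money, 2=meeting, 3=report, 4=win.
# Each document is the list of feature indices that occur in it.
docs = [[0, 1], [0, 4], [1, 4], [2, 3], [2], [3]]
labels = [1, 1, 1, -1, -1, -1]

p_model = train_perceptron(docs, labels, n_features=5)
w_model = train_winnow(docs, labels, n_features=5)
print([perceptron_predict(p_model, d) for d in docs])
print([winnow_predict(w_model, d) for d in docs])
```

Both learners touch only the weights of features present in the current message and only when they make a mistake, which is why training and incremental updating are cheap.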

Irish Universities

DCU Online Research Access Service

Automatic Document Image Binarization using Bayesian Optimization

Author: Badekas E
Bernsen John
Gatos Basilis
Nafchi Hossein Ziaei
Ntirogiannis Konstantinos
Pratikakis Ioannis
Su Bolan
Publication venue: Association for Computing Machinery (ACM)
Publication date: 21/10/2017

Document image binarization is often a challenging task due to various forms of degradation. Although several binarization techniques exist in the literature, the binarized image is typically sensitive to the control parameter settings of the employed technique. This paper presents an automatic document image binarization algorithm to segment the text from heavily degraded document images. The proposed technique uses a two band-pass filtering approach for background noise removal, and Bayesian optimization for automatic hyperparameter selection for optimal results. The effectiveness of the proposed binarization technique is empirically demonstrated on the Document Image Binarization Competition (DIBCO) and the Handwritten Document Image Binarization Competition (H-DIBCO) datasets.

arXiv.org e-Print Archive

Crossref

Retinex filtering of foggy images: generation of a bulk set with selection and ranking

Author: Marazzato Roberto
Sparavigna Amelia Carolina
Publication venue
Publication date: 01/01/2015

In this paper we propose the use of GIMP Retinex, a filter of the GNU Image Manipulation Program, for enhancing foggy images. This filter involves adjusting four different parameters to find the output image that is preferred for a specific purpose. Aiming for a process that can automatically choose the best image from a given set, we propose a method for the generation of a bulk set of GIMP Retinex filtered images and a preliminary approach for selecting and ranking them. Comment: Keywords: GIMP Retinex, GIMP, Image processing, Bulk generation of images, Bulk manipulation of image

arXiv.org e-Print Archive

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Color image segmentation using a self-initializing EM algorithm

Author: Ilea Dana E.
Whelan Paul F.
Publication venue: IASTED
Publication date: 01/01/2006

This paper presents a new method based on the Expectation-Maximization (EM) algorithm that we apply to color image segmentation. Since this algorithm partitions the data based on an initial set of mixtures, the color segmentation provided by the EM algorithm is highly dependent on the starting condition (initialization stage). Usually the initialization procedure selects the color seeds randomly, and this often forces the EM algorithm to converge to one of numerous local minima and produce inappropriate results. In this paper we propose a simple yet effective solution to initialize the EM algorithm with relevant color seeds. The resulting self-initialised EM algorithm has been included in the development of an adaptive image segmentation scheme that has been applied to a large number of color images. The experimental data indicates that the refined initialization procedure leads to improved color segmentation.
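
The dependence on initialization can be seen in a small sketch of EM for a mixture of spherical Gaussians over RGB pixels. The abstract does not describe how the seeds are chosen, so the coarse-histogram seeding below is a stand-in assumption, as are the spherical covariance model, the starting variance, and the toy pixels; it is not the authors' procedure.

```python
import math

def histogram_seeds(pixels, k, bins=4):
    """Deterministic seeds: mean colours of the k fullest coarse RGB bins
    (an assumed replacement for random seed selection)."""
    step = 256 // bins
    groups = {}
    for p in pixels:
        groups.setdefault(tuple(c // step for c in p), []).append(p)
    top = sorted(groups.values(), key=len, reverse=True)[:k]
    return [tuple(sum(p[d] for p in g) / len(g) for d in range(3)) for g in top]

def em_segment(pixels, k, iters=20):
    """EM for a spherical Gaussian mixture; returns (means, hard labels)."""
    means = histogram_seeds(pixels, k)
    k = len(means)
    variances = [400.0] * k  # assumed starting variance
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibilities from log densities (stabilised by the max).
        resp = []
        for p in pixels:
            logs = []
            for j in range(k):
                d2 = sum((p[d] - means[j][d]) ** 2 for d in range(3))
                logs.append(math.log(weights[j]) - 1.5 * math.log(variances[j])
                            - d2 / (2 * variances[j]))
            top = max(logs)
            e = [math.exp(v - top) for v in logs]
            total = sum(e)
            resp.append([x / total for x in e])
        # M-step: re-estimate mean, variance and weight of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-9)
            means[j] = tuple(sum(r[j] * p[d] for r, p in zip(resp, pixels)) / nj
                             for d in range(3))
            d2 = sum(r[j] * sum((p[d] - means[j][d]) ** 2 for d in range(3))
                     for r, p in zip(resp, pixels))
            variances[j] = max(d2 / (3 * nj), 1e-3)
            weights[j] = nj / len(pixels)
    labels = [r.index(max(r)) for r in resp]
    return means, labels

# Hypothetical toy image: four reddish and four bluish pixels.
pixels = [(200, 30, 40), (210, 25, 35), (205, 35, 45), (198, 28, 38),
          (20, 40, 220), (25, 45, 210), (30, 35, 215), (22, 38, 225)]
means, labels = em_segment(pixels, k=2)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Because the seeds come from the data rather than a random draw, repeated runs give the same segmentation, which is the practical benefit a self-initializing scheme aims for.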

Irish Universities

DCU Online Research Access Service