25 research outputs found

Chi-square-based scoring function for categorization of MEDLINE citations

Author: Hristovski Dimitar
Kastrin Andrej
Peterlin Borut
Publication venue: 'Georg Thieme Verlag KG'
Publication date: 01/01/2010

Objectives: Text categorization has been used in biomedical informatics to identify documents containing relevant topics of interest. We developed a simple method that uses a chi-square-based scoring function to determine the likelihood that a MEDLINE citation contains a genetically relevant topic. Methods: Our procedure requires the construction of a genetic and a nongenetic domain document corpus. We used the MeSH descriptors assigned to MEDLINE citations for this categorization task. We compared the frequencies of MeSH descriptors between the two corpora by applying the chi-square test. A MeSH descriptor was considered a positive indicator if its relative observed frequency in the genetic domain corpus was greater than its relative observed frequency in the nongenetic domain corpus. The output of the proposed method is a list of scores for all the citations, with the highest scores given to citations containing MeSH descriptors typical of the genetic domain. Results: Validation was done on a set of 734 manually annotated MEDLINE citations. The method achieved a predictive accuracy of 0.87 with 0.69 recall and 0.64 precision. We evaluated the method by comparing it to three machine learning algorithms (support vector machines, decision trees, naïve Bayes). Although the differences were not statistically significant, the results showed that our chi-square scoring performs as well as the compared machine learning algorithms. Conclusions: We suggest that chi-square scoring is an effective solution to help categorize MEDLINE citations. The algorithm is implemented in the BITOLA literature-based discovery support system as a preprocessor for the gene symbol disambiguation process. Comment: 34 pages, 2 figures
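The scoring idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so the sketch assumes that a citation's score is the sum of signed chi-square statistics of its descriptors, with a positive sign when the descriptor is relatively more frequent in the genetic corpus. The function names (`chi_square_2x2`, `descriptor_weights`, `score_citation`) are hypothetical and are not taken from the paper or from BITOLA.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def descriptor_weights(genetic_docs, nongenetic_docs):
    """Signed chi-square weight per descriptor.

    Each document is a list of descriptor strings. A weight is positive
    when the descriptor's relative frequency is higher in the genetic
    corpus (assumed sign convention), and negative otherwise.
    """
    n_gen, n_non = len(genetic_docs), len(nongenetic_docs)
    gen_counts = Counter(d for doc in genetic_docs for d in set(doc))
    non_counts = Counter(d for doc in nongenetic_docs for d in set(doc))
    weights = {}
    for desc in set(gen_counts) | set(non_counts):
        a, c = gen_counts[desc], non_counts[desc]
        chi2 = chi_square_2x2(a, n_gen - a, c, n_non - c)
        sign = 1.0 if a / n_gen > c / n_non else -1.0
        weights[desc] = sign * chi2
    return weights


def score_citation(descriptors, weights):
    """Sum the weights of a citation's descriptors (unknown ones count as 0)."""
    return sum(weights.get(d, 0.0) for d in set(descriptors))
```

Citations would then be ranked by `score_citation`, and a threshold on the score would separate genetic from nongenetic citations; how that threshold is chosen is not stated in the abstract.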

arXiv.org e-Print Archive

Crossref

Minería de Datos: Conceptos y Tendencias

Author: Gilbert Karina
Riquelme Santos José Cristóbal
Ruiz Roberto
Publication venue: IBERAMIA : Sociedad Iberoamericana de Inteligencia Artificial
Publication date: 01/01/2006

Nowadays, data mining (DM) is increasingly capturing the attention of companies. It is still uncommon to hear phrases such as “we should segment our customers using DM tools”, “DM will increase customer satisfaction”, or “the competition is using DM to gain market share”. However, everything suggests that sooner rather than later data mining will be used by society, at least with the same weight that Statistics currently has. So what is data mining, and what benefits does it bring? How can this technology help solve the everyday problems of companies and of society in general? What technologies lie behind data mining? What is the life cycle of a typical data mining project? This article attempts to answer these questions through an introduction to data mining: its definition, examples of problems that can be solved with data mining, data mining tasks, the techniques used, and finally challenges and trends in data mining

Secretaría de Estado de Cultura

idUS. Depósito de Investigación Universidad de Sevilla

MÉTODOS ESTIMADORES DE ERROR

Author: TRUEBA ESPINOSA ADRIAN
Publication date: 28/09/2018

Projectable view only

Repositorio Institucional de la Universidad Autónoma del Estado de México

Using 3D information for classification of non-melanoma skin lesions

Author: Fisher Robert
McDonagh Steven
Rees Jonathan
Publication date: 01/07/2008

Edinburgh Research Explorer

Comparing predictions made by a prediction model, clinical score, and physicians: Pediatric asthma exacerbations in the emergency department

Author: D. O’Sullivan
Dexheimer
Gorelick
J. Sayyad-Shirabad
K.J. Farion
Perlich
S. Wilk
Sefion
W. Michalowski
Wilk
Publication venue: 'Georg Thieme Verlag KG'
Publication date: 01/01/2013

Background: Asthma exacerbations are one of the most common medical reasons for children to be brought to the hospital emergency department (ED). Various prediction models have been proposed to support diagnosis of exacerbations and evaluation of their severity. Objectives: First, to evaluate prediction models constructed from data using machine learning techniques and to select the best performing model. Second, to compare predictions from the selected model with predictions from the Pediatric Respiratory Assessment Measure (PRAM) score and with predictions made by ED physicians. Design: A two-phase study conducted in the ED of an academic pediatric hospital. In phase 1, data collected prospectively using paper forms were used to construct and evaluate five prediction models, and the best performing model was selected. In phase 2, data collected prospectively using a mobile system were used to compare the predictions of the selected prediction model with those from PRAM and ED physicians. Measurements: Area under the receiver operating characteristic curve and accuracy in phase 1; accuracy, sensitivity, specificity, positive and negative predictive values in phase 2. Results: In phase 1, prediction models were derived from a data set of 240 patients and evaluated using 10-fold cross validation. A naive Bayes (NB) model demonstrated the best performance and was selected for phase 2. Evaluation in phase 2 was conducted on data from 82 patients. Predictions made by the NB model were less accurate than those of the PRAM score and physicians (accuracy of 70.7%, 73.2% and 78.0%, respectively); however, according to McNemar’s test, it is not possible to conclude that the differences between predictions are statistically significant. Conclusion: Both the PRAM score and the NB model were less accurate than physicians. The NB model can handle incomplete patient data and as such may complement the PRAM score. However, further research is required to improve its accuracy
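The phase 2 measurements named in this abstract can be sketched as follows. This is a minimal illustration, not the study's analysis code: it assumes the exact (binomial) form of McNemar's test, since the abstract does not say which variant was used, and the function names and any counts passed to them are hypothetical. Here `b` and `c` are the discordant pair counts, that is, the number of patients for whom only the first predictor was correct and the number for whom only the second was correct.

```python
from math import comb


def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, PPV and NPV from a 2x2 confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c.

    Under the null hypothesis the smaller discordant count follows a
    Binomial(b + c, 0.5) distribution; the two-sided p-value doubles the
    lower tail and is capped at 1.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A p-value above the chosen significance level corresponds to the abstract's conclusion that a difference between two sets of predictions cannot be called statistically significant.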

Aplicação de Multiclassificadores Heterogêneos no Reconhecimento de Classes Estruturais de Proteínas

Author: Bittencourt Valnaide G.
Canuto Anne M.P.
Costa José Alfredo F.
Da Costa Abreu Marjory
Souto Marcílio C.P. de
Publication venue: 'Associacao Brasileira de Inteligencia Computacional - ABRICOM'
Publication date: 29/08/2016

Protein fold recognition is one of the main open problems in molecular biology and an important approach to discovering protein structures without relying on sequence similarity. In this context, computational tools, especially Machine Learning (ML) techniques, have become essential alternatives for tackling this problem, given the large volume of data involved. This work presents the results obtained by applying different heterogeneous multi-classifier systems (Stacking, StackingC and Vote), using distinct types of base classifiers (Decision Trees, K-Nearest Neighbors, Naive Bayes, Support Vector Machines and Neural Networks), to the task of predicting protein structural classes

Crossref

Sheffield Hallam University Research Archive

Hierarchical cost-sensitive algorithms for genome-wide gene function prediction

Author: G. Valentini
N. Cesa Bianchi
Publication venue: Helsinki
Publication date: 01/01/2009

In this work we propose new ensemble methods for the hierarchical classification of gene functions. Our methods exploit the hierarchical relationships between the classes in different ways: each ensemble node is trained “locally”, according to its position in the hierarchy; moreover, in the evaluation phase the set of predicted annotations is built so as to minimize a global loss function defined over the hierarchy. We also address the problem of sparsity of annotations by introducing a cost-sensitive parameter that allows control of the precision-recall trade-off. Experiments with the model organism S. cerevisiae, using the FunCat taxonomy and 7 biomolecular data sets, reveal a significant advantage of our techniques over “flat” and cost-insensitive hierarchical ensembles

AIR Universita degli studi di Milano

Random subspace ensembles for the bio-molecular diagnosis of tumors.

Author: A. Bertoni
R. Folgieri
G. Valentini
Publication date: 01/01/2004

The bio-molecular diagnosis of malignancies, based on DNA microarray biotechnologies, is a difficult learning task because of the high dimensionality and low cardinality of the data. Many supervised learning techniques, among them support vector machines (SVMs), have been tried, often together with feature selection methods to reduce the dimensionality of the data. In this paper we investigate an alternative approach based on random subspace ensemble methods. The high dimensionality of the data is reduced by randomly sampling subsets of features (gene expression levels), and accuracy is improved by aggregating the resulting base classifiers. Our experiments, in the area of the diagnosis of malignancies at the bio-molecular level, show the effectiveness of the proposed approach
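The random subspace idea described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the base learner here is a simple nearest-centroid classifier chosen to keep the example dependency-free (the abstract does not name the base learner used in the ensemble), aggregation is by majority vote, and all function names and default parameters are hypothetical.

```python
import random
from collections import Counter


def fit_centroids(X, y, features):
    """Per-class mean vector, computed only over the selected feature indices."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(centroids, row, features):
    """Label of the closest class centroid (squared Euclidean distance)."""
    def dist(centroid):
        return sum((row[f] - centroid[j]) ** 2 for j, f in enumerate(features))
    return min(centroids, key=lambda label: dist(centroids[label]))


def fit_random_subspace(X, y, n_models=25, subspace_size=2, seed=0):
    """Train one base classifier per randomly sampled feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_models):
        features = rng.sample(range(n_features), subspace_size)
        ensemble.append((features, fit_centroids(X, y, features)))
    return ensemble


def predict_ensemble(ensemble, row):
    """Aggregate the base classifiers' predictions by majority vote."""
    votes = Counter(predict_centroid(c, row, f) for f, c in ensemble)
    return votes.most_common(1)[0][0]
```

Each base classifier sees only `subspace_size` of the original features, which is how the approach reduces dimensionality without a separate feature selection step.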

AIR Universita degli studi di Milano

OpenEdition

Weighted True Path Rule: a multilabel hierarchical algorithm for gene function prediction

Author: G. Valentini
M. Re
Publication date: 01/01/2009

The genome-wide hierarchical classification of gene functions, using biomolecular data from high-throughput biotechnologies, is one of the central topics in bioinformatics and functional genomics. In this paper we present a multilabel hierarchical algorithm inspired by the “true path rule” that governs both the Gene Ontology and the Functional Catalogue (FunCat). In particular, we propose an enhanced version of the True Path Rule (TPR) algorithm, by which we can control the flow of information between the classifiers of the hierarchical ensemble, making it possible to tune the precision/recall characteristics of the overall hierarchical classification system. Results with the model organism S. cerevisiae show that the proposed method significantly improves on the basic version of the TPR algorithm, as well as on the Hierarchical Top-down and Flat ensembles

AIR Universita degli studi di Milano