25 research outputs found

Selected proceedings of the First Summit on Translational Bioinformatics 2008

Author: A Malovini
Atul J Butte
BJ Keller
DL Rubin
HJ Tipney
I Kunz
Indra Neil Sarkar
J Yang
LT Sam
Marco Ramoni
NH Shah
Olga Troyanskaya
P Mirhaji
PL Elkin
V Sintchenko
Y Garten
Y Liu
YI Liu
Yves Lussier
Publication venue: BioMed Central
Publication date: 01/01/2009

Crossref

Springer - Publisher Connector

PubMed Central

On Regularization Parameter Estimation under Covariate Shift

Author: Kouw Wouter M.
Loog Marco
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 31/07/2016

This paper identifies a problem with the usual procedure for L2-regularization parameter estimation in a domain adaptation setting. In such a setting, there are differences between the distributions generating the training data (source domain) and the test data (target domain). The usual cross-validation procedure requires validation data, which cannot be obtained from the unlabeled target data. The problem is that if one decides to use source validation data, the regularization parameter is underestimated. One possible solution is to scale the source validation data through importance weighting, but we show that this correction is not sufficient. We conclude the paper with an empirical analysis of the effect of several importance weight estimators on the estimation of the regularization parameter.

Comment: 6 pages, 2 figures, 2 tables. Accepted to ICPR 201

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

Getting Started in Text Mining: Part Two

Author: Andrey Rzhetsky
CE Crangle
DR Swanson
I Spasic
JD Kim
JW Huss III
KB Cohen
L Hirschman
M Fleischman
Mark B. Gerstein
Michael Seringhaus
MV Blagosklonny
NH Shah
Olga G. Troyanskaya
R Kanagasabai
R Mitkov
S Aerts
SM Douglas
W Hersh
Y Sasaki
Publication venue: Public Library of Science
Publication date: 01/07/2009

Crossref

Directory of Open Access Journals

PubMed Central

BioPortal: ontologies and integrated data resources at the click of a mouse

Author: B. Dai
C. G. Chute
C. Jonquet
Cote
D. L. Rubin
M. A. Musen
M. Dorf
M.-A. Storey
N. F. Noy
N. Griffith
N. H. Shah
P. L. Whetzel
Smith
Publication venue: Oxford University Press
Publication date: 01/01/2009

Biomedical ontologies provide essential domain knowledge to drive data integration, information retrieval, data annotation, natural-language processing and decision support. BioPortal (http://bioportal.bioontology.org) is an open repository of biomedical ontologies that provides access via Web services and Web browsers to ontologies developed in OWL, RDF, OBO format and Protégé frames. BioPortal functionality includes the ability to browse, search and visualize ontologies. The Web interface also facilitates community-based participation in the evaluation and evolution of ontology content by providing features to add notes to ontology terms, mappings between terms and ontology reviews based on criteria such as usability, domain coverage, quality of content, and documentation and support. BioPortal also enables integrated search of biomedical data resources such as the Gene Expression Omnibus (GEO), ClinicalTrials.gov, and ArrayExpress, through the annotation and indexing of these resources with ontologies in BioPortal. Thus, BioPortal not only provides investigators, clinicians, and developers ‘one-stop shopping’ to programmatically access biomedical ontologies, but also provides support to integrate data from a variety of biomedical resources

CiteSeerX

Crossref

PubMed Central

Improving gene expression similarity measurement using pathway-based analytic dimension

Author: Keum Changwon
No Kyoung Tai
Oh Won Seok
Park Sue-Nie
Woo Jung Hoon
Publication venue: BioMed Central
Publication date: 01/01/2009

Crossref

Springer - Publisher Connector

PubMed Central

Comparison of automated and human assignment of MeSH terms on publicly-available molecular datasets

Author: Butte Atul J.
Dudley Joel T.
Krishnan Vijay
Mbagwu Michael
Ruau David
Publication venue: Elsevier Inc.
Publication date: 01/12/2011

Publicly available molecular datasets can be used for independent verification or investigative repurposing, but this depends on the presence, consistency and quality of descriptive annotations. Annotation and indexing of molecular datasets using well-defined controlled vocabularies or ontologies enables accurate and systematic data discovery, yet the majority of molecular datasets available through public data repositories lack such annotations. A number of automated annotation methods have been developed; however, few systematic evaluations of the quality of annotations supplied by application of these methods have been performed using annotations from standing public data repositories. Here, we compared manually-assigned Medical Subject Heading (MeSH) annotations associated with experiments by data submitters in the PRoteomics IDEntification (PRIDE) proteomics data repository to automated MeSH annotations derived through the National Center for Biomedical Ontology Annotator and National Library of Medicine MetaMap programs. These programs were applied to free-text annotations for experiments in PRIDE. As many submitted datasets were referenced in publications, we used the manually curated MeSH annotations of those linked publications in MEDLINE as the “gold standard”. Annotator and MetaMap exhibited recall performance 3-fold greater than that of the manual annotations. We connected PRIDE experiments in a network topology according to shared MeSH annotations and found 373 distinct clusters, many of which were found to be biologically coherent by network analysis. The results of this study suggest that both Annotator and MetaMap are capable of annotating public molecular datasets with a quality comparable to, and often exceeding, that of the actual data submitters, highlighting a continuous need to improve and apply automated methods to molecular datasets in public data repositories to maximize their value and utility.

Elsevier - Publisher Connector

PubMed Central

Annotation analysis for testing drug safety signals using unstructured clinical notes

Author: A Bate
C Friedman
D Classen
D Dore
D Graham
DW Bates
G Alterovitz
GK Savova
H Cao
KD Shetty
L Ohno-Machado
L Tari
MJ Goldacre
N Tatonetti
NF Noy
NH Shah
O Bodenreider
P Khatri
P LePendu
P Stang
PM Coloma
PM Nadkarni
R Harpaz
RP Radecki
S Paumier
S Schneeweiss
S Weiss-Smith
SJ Reisinger
W Chapman
WW Chapman
X Wang
Y Liu
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: The electronic surveillance for adverse drug events is largely based upon the analysis of coded data from reporting systems. Yet, the vast majority of electronic health data lies embedded within the free text of clinical notes and is not gathered into centralized repositories. With the increasing access to large volumes of electronic medical data, in particular the clinical notes, it may be possible to computationally encode and to test drug safety signals in an active manner. Results: We describe the application of simple annotation tools on clinical text and the mining of the resulting annotations to compute the risk of getting a myocardial infarction for patients with rheumatoid arthritis that take Vioxx. Our analysis clearly reveals elevated risks for myocardial infarction in rheumatoid arthritis patients taking Vioxx (odds ratio 2.06) before 2005. Conclusions: Our results show that it is possible to apply annotation analysis methods for testing hypotheses about drug safety using electronic medical records.

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

DIAL UCLouvain