
    Semi-supervised prediction of protein interaction sentences exploiting semantically encoded metrics

    Protein-protein interaction (PPI) identification is an integral component of many biomedical research and database curation tools. Automation of this task through classification is one of the key goals of text mining (TM). However, the labelled PPI corpora required to train classifiers are generally small. In order to overcome this sparsity in the training data, we propose a novel method of integrating corpora that do not contain relevance judgements. Our approach uses a semantic language model to gather word similarity from a large unlabelled corpus. This additional information is integrated into the sentence classification process using kernel transformations and has a re-weighting effect on the training features that leads to an 8% improvement in F-score over the baseline results. Furthermore, we discover that some words which are generally considered indicative of interactions are actually neutralised by this process.
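    The abstract does not specify the semantic language model or the exact kernel transformation, so the following is only a minimal sketch: word similarity is estimated from sentence co-occurrence in a (hypothetical) unlabelled corpus and used to smooth bag-of-words features before training an SVM, which has the described re-weighting effect on training features.

```python
# Illustrative sketch only, not the authors' pipeline: semantic smoothing of
# bag-of-words features with a word-similarity matrix learned from unlabelled text.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import normalize
from sklearn.svm import SVC

# Hypothetical data: a tiny labelled PPI corpus and a larger unlabelled one.
labelled_sentences = ["protein alpha binds protein beta",
                      "gene xyz is expressed in liver"]
labels = [1, 0]
unlabelled_sentences = ["kinase cdk2 phosphorylates receptor egfr",
                        "enzyme trypsin interacts with its substrate",
                        "the sample was stored at low temperature"]

vec = CountVectorizer()
vec.fit(labelled_sentences + unlabelled_sentences)
X_lab = vec.transform(labelled_sentences).toarray().astype(float)
X_unlab = vec.transform(unlabelled_sentences).toarray().astype(float)

# Word-word similarity from the unlabelled corpus: cosine similarity of each
# word's sentence-occurrence profile stands in for the semantic language model.
word_profiles = normalize(X_unlab.T)          # one row per vocabulary word
S = word_profiles @ word_profiles.T           # |V| x |V| similarity matrix
np.fill_diagonal(S, 1.0)

# Kernel transformation: multiplying the features by S re-weights each training
# feature according to how strongly it is tied to semantically similar words.
X_lab_smoothed = X_lab @ S

clf = SVC(kernel="linear").fit(X_lab_smoothed, labels)
query = vec.transform(["protein gamma associates with protein delta"])
print(clf.predict(query.toarray().astype(float) @ S))
```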

    Deep MMT Transit Survey of the Open Cluster M37 IV: Limit on the Fraction of Stars With Planets as Small as 0.3 R_J

    We present the results of a deep (15 ≲ r ≲ 23), 20-night survey for transiting planets in the intermediate-age open cluster M37 (NGC 2099) using the Megacam wide-field mosaic CCD camera on the 6.5 m MMT. We do not detect any transiting planets among the ~1450 observed cluster members. We do, however, identify a ~1 R_J candidate planet transiting a ~0.8 M_sun Galactic field star with a period of 0.77 days. The source is faint (V = 19.85 mag) and has an expected velocity semi-amplitude of K ~ 220 m/s (M/M_J). We conduct Monte Carlo transit injection and recovery simulations to calculate the 95% confidence upper limit on the fraction of cluster members and field stars with planets as a function of planetary radius and orbital period. Assuming a uniform logarithmic distribution in orbital period, we find that < 1.1%, < 2.7%, and < 8.3% of cluster members have 1.0 R_J planets within the Extremely Hot Jupiter (EHJ, 0.4 < T < 1.0 day), Very Hot Jupiter (VHJ, 1.0 < T < 3.0 days), and Hot Jupiter (HJ, 3.0 < T < 5.0 days) period ranges, respectively. For 0.5 R_J planets the limits are < 3.2% and < 21% for the EHJ and VHJ period ranges, while for 0.35 R_J planets we can only place an upper limit of < 25% on the EHJ period range. For a sample of 7814 Galactic field stars, consisting primarily of FGKM dwarfs, we place 95% upper limits of < 0.3%, < 0.8%, and < 2.7% on the fraction of stars with 1.0 R_J EHJ, VHJ, and HJ planets, assuming the candidate planet is not genuine. If the candidate is genuine, the frequency of ~1.0 R_J planets in the EHJ period range is 0.002% < f_EHJ < 0.5% with 95% confidence. We place limits of < 1.4%, < 8.8%, and < 47% for 0.5 R_J planets, and a limit of < 16% on 0.3 R_J planets in the EHJ period range. This is the first transit survey to place limits on the fraction of stars with planets as small as Neptune.
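    The abstract does not spell out how a null detection is converted into a 95% upper limit, so here is a hedged sketch of the arithmetic under a simple binomial assumption: the injection-recovery simulations give an average detection efficiency per period/radius bin, and the largest planet fraction consistent with zero detections is the one for which a null result is still 5% probable. The star count and efficiency below are placeholders, not the survey's actual values.

```python
# Minimal sketch of a 95% upper limit from a null transit-survey result,
# assuming binomial statistics and one average recovery efficiency per bin.
def upper_limit(n_stars, recovery_eff, confidence=0.95):
    """Largest planet fraction f such that seeing zero detections among
    n_stars is still probable at the (1 - confidence) level:
    (1 - f * recovery_eff) ** n_stars >= 1 - confidence."""
    return (1.0 - (1.0 - confidence) ** (1.0 / n_stars)) / recovery_eff

# Hypothetical inputs: ~1450 cluster members and an assumed 20% chance of
# recovering a 1.0 R_J Extremely Hot Jupiter if present (geometric transit
# probability times pipeline recovery rate, taken from the injections).
print(f"f < {upper_limit(1450, 0.20):.3%}")
```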

    Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS

    Background: Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking them according to a learned relevance function. However, the learning and ranking are usually done offline, without being integrated with the keyword queries, and users have to provide a large number of training documents to reach a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad hoc keyword queries and multi-level relevance feedback in real time on PubMed. Results: RefMed supports multi-level relevance feedback by using RankSVM as the learning method, and thus achieves higher accuracy with less feedback. RefMed "tightly" integrates RankSVM into the RDBMS to support both keyword queries and multi-level relevance feedback in real time; the tight coupling of RankSVM and the DBMS substantially improves the processing time. An efficient parameter selection method for RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves high learning accuracy in real time without a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. Conclusions: RefMed is the first multi-level relevance feedback system for PubMed, and it achieves high accuracy with less feedback. It effectively learns an accurate relevance function from the user's feedback and efficiently processes that function to return relevant articles in real time.
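    The abstract names RankSVM but not its mechanics, so the sketch below shows only the standard pairwise RankSVM idea behind multi-level feedback: graded relevance judgements are converted into pairwise preferences and a linear SVM is trained on the difference vectors. It is not RefMed's in-DBMS implementation, and the feature vectors and grades are made up.

```python
# Illustrative pairwise RankSVM from graded (multi-level) relevance feedback.
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

# Hypothetical feedback: article feature vectors and user grades
# (3 = very relevant, 2 = somewhat, 1 = not relevant).
X = np.array([[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]])
grades = np.array([3, 2, 1, 1])

# Every pair of articles with different grades yields two training examples:
# the difference vector labelled by which article should rank higher.
pairs, prefs = [], []
for i, j in combinations(range(len(X)), 2):
    if grades[i] != grades[j]:
        hi, lo = (i, j) if grades[i] > grades[j] else (j, i)
        pairs.append(X[hi] - X[lo]); prefs.append(1)
        pairs.append(X[lo] - X[hi]); prefs.append(-1)

# No intercept: the ranking depends only on the weight vector.
rank_svm = LinearSVC(C=1.0, fit_intercept=False).fit(np.array(pairs), np.array(prefs))

# The learned weight vector is the relevance function: score new results by X @ w.
w = rank_svm.coef_.ravel()
print(np.argsort(-(X @ w)))   # article indices, most relevant first
```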

    Web Mining for Web Personalization

    Web personalization is the process of customizing a Web site to the needs of specific users, taking advantage of the knowledge acquired from the analysis of the user's navigational behavior (usage data) in correlation with other information collected in the Web context, namely structure, content, and user profile data. Due to the explosive growth of the Web, the domain of Web personalization has gained great momentum in both research and commercial areas. In this article we present a survey of the use of Web mining for Web personalization. More specifically, we introduce the modules that comprise a Web personalization system, emphasizing the Web usage mining module. A review of the most common methods that are used, as well as the technical issues that arise, is given, along with a brief overview of the most popular tools and applications available from software vendors. Moreover, the most important research initiatives in the Web usage mining and personalization areas are presented.
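    As a concrete illustration of the usage-mining module the survey emphasizes, here is a minimal, hypothetical sketch: pages are recommended according to how often they co-occur with the current page in past user sessions. The log data and scoring rule are placeholders, not taken from the article.

```python
# Toy usage-mining recommender: recommend pages that co-occur with the
# current page in past sessions (hypothetical clickstream data).
from collections import defaultdict
from itertools import combinations

sessions = [
    ["/home", "/products", "/products/laptops"],
    ["/home", "/products", "/support"],
    ["/products", "/products/laptops", "/checkout"],
]

# Count how often each pair of pages appears in the same session.
cooccur = defaultdict(int)
for session in sessions:
    for a, b in combinations(set(session), 2):
        cooccur[frozenset((a, b))] += 1

def recommend(current_page, k=2):
    """Pages most often visited in the same session as current_page."""
    scores = {}
    for pair, count in cooccur.items():
        if current_page in pair:
            (other,) = pair - {current_page}
            scores[other] = scores.get(other, 0) + count
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("/products"))
```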

    Notch signaling during human T cell development

    Notch signaling is critical during multiple stages of T cell development in both mouse and human. Evidence has emerged in recent years that this pathway may regulate T-lineage differentiation differently between the two species. Here, we review our current understanding of how Notch signaling is activated and used during human T cell development. First, we set the stage by describing the developmental steps that make up human T cell development, before describing the expression profiles of Notch receptors, ligands, and target genes during this process. To delineate stage-specific roles for Notch signaling during human T cell development, we then interpret the functional Notch studies that have been performed in light of these expression profiles and compare the findings with the pathway's suggested role in the mouse.

    Automated Home-Cage Behavioural Phenotyping of Mice

    Neurobehavioral analysis of mouse phenotypes requires the monitoring of mouse behavior over long periods of time. Here, we describe a trainable computer vision system enabling the automated analysis of complex mouse behaviors. We provide software and an extensive manually annotated video database used for training and testing the system. Our system performs on par with human scoring, as measured against ground-truth manual annotations of thousands of clips of freely behaving mice. As a validation of the system, we characterized the home-cage behaviors of two standard inbred and two non-standard mouse strains. From these data we were able to predict, in a blind test, the strain identity of individual animals with high accuracy. Our video-based software will complement existing sensor-based automated approaches and enable an adaptable, comprehensive, high-throughput, fine-grained, automated analysis of mouse behavior.
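    The abstract does not say how strain identity is predicted from the scored behaviors, so the following is only a plausible sketch: each animal is summarized by the fraction of time spent in each automatically scored behavior, and a standard classifier predicts the strain from that profile. The behavior categories, frequency values, strain names, and choice of classifier are all hypothetical.

```python
# Hypothetical strain prediction from per-animal behaviour frequency profiles.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

behaviours = ["eat", "drink", "groom", "hang", "rear", "rest", "walk"]

# Made-up per-animal behaviour frequencies (rows roughly sum to 1) and strains.
X = np.array([
    [0.10, 0.02, 0.15, 0.05, 0.10, 0.45, 0.13],
    [0.12, 0.03, 0.14, 0.04, 0.11, 0.43, 0.13],
    [0.06, 0.02, 0.25, 0.10, 0.08, 0.35, 0.14],
    [0.05, 0.02, 0.27, 0.11, 0.07, 0.34, 0.14],
])
strains = np.array(["strain_A", "strain_A", "strain_B", "strain_B"])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, strains)

# "Blind test": predict the strain of a held-out animal from its profile alone.
print(clf.predict([[0.07, 0.02, 0.24, 0.10, 0.09, 0.34, 0.14]]))
```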

    Incorporating rich background knowledge for gene named entity classification and recognition

    Background: Gene named entity classification and recognition are crucial preliminary steps of text mining in the biomedical literature. Machine learning based methods have been used in this area with great success. In most state-of-the-art systems, elaborately designed lexical features, such as words, n-grams, and morphology patterns, have played a central part. However, this type of feature tends to cause extreme sparseness in the feature space. As a result, out-of-vocabulary (OOV) terms in the training data are not modeled well due to lack of information. Results: We propose a general framework for gene named entity representation, called feature coupling generalization (FCG). The basic idea is to generate higher-level features using term frequency and co-occurrence information of highly indicative features in a huge amount of unlabeled data. We examine its performance in a named entity classification task designed to remove non-gene entries from a large dictionary derived from online resources. The results show that the new features generated by FCG outperform lexical features by 5.97 F-score and by 10.85 for OOV terms. Within this framework each extension yields significant improvements, and the sparse lexical features can be transformed into a lower-dimensional and more informative representation. A forward maximum match method based on the refined dictionary produces an F-score of 86.2 on the BioCreative 2 GM test set. We then combined the dictionary with a conditional random field (CRF) based gene mention tagger, achieving an F-score of 89.05, which improves the performance of the CRF-based tagger by 4.46 with little impact on the efficiency of the recognition system. A demo of the NER system is available at http://202.118.75.18:8080/bioner.
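    The abstract describes the FCG idea only at a high level, so the sketch below is a rough, hedged illustration of it: a candidate term's sparse lexical features are replaced by its co-occurrence statistics against a few highly indicative context words, estimated from unlabeled text. The indicator words, sentences, and normalization are invented for illustration.

```python
# Rough sketch of feature coupling generalization (FCG): dense features from
# co-occurrence of a candidate term with indicative words in unlabeled text.
from collections import Counter

indicative = ["gene", "expression", "protein", "buffer"]   # hypothetical indicators

unlabelled_sentences = [
    "BRCA1 gene expression was reduced",
    "expression of the p53 protein increased",
    "samples were washed in PBS buffer",
    "the TP53 gene encodes a tumour suppressor protein",
]

def fcg_features(candidate):
    """Normalized co-occurrence counts of `candidate` with each indicator word."""
    counts = Counter()
    total = 0
    for sent in unlabelled_sentences:
        tokens = sent.lower().split()
        if candidate.lower() in tokens:
            total += 1
            for ind in indicative:
                if ind in tokens:
                    counts[ind] += 1
    return [counts[ind] / total if total else 0.0 for ind in indicative]

# These dense, low-dimensional vectors can then feed any classifier that decides
# whether a dictionary entry is a genuine gene name.
print(fcg_features("TP53"))   # co-occurs with "gene" and "protein"
print(fcg_features("PBS"))    # co-occurs only with "buffer"
```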

    Predicting mostly disordered proteins by using structure-unknown protein data

    BACKGROUND: Predicting intrinsically disordered proteins is important in structural biology because they are thought to carry out various cellular functions even though they have no stable three-dimensional structure. We know the structures of far more ordered proteins than disordered proteins, so the structural distribution of proteins in nature can be inferred to differ from that of proteins whose structures have been determined experimentally. We also know many more protein sequences than protein structures, and many of the known sequences can be expected to be those of disordered proteins. It would therefore be efficient to use the information in structure-unknown proteins in order to avoid training data sparseness. We propose a novel method for predicting which proteins are mostly disordered by using a spectral graph transducer (SGT) and training with a huge amount of structure-unknown sequences as well as structure-known sequences. RESULTS: When the proposed method was evaluated on data that included 82 disordered proteins and 526 ordered proteins, its sensitivity was 0.723 and its specificity was 0.977. It resulted in a Matthews correlation coefficient 0.202 points higher than that obtained using FoldIndex, 0.221 points higher than that obtained using the method based on plotting hydrophobicity against the number of contacts, and 0.07 points higher than that obtained using support vector machines (SVMs). To examine robustness against training data sparseness, we investigated the correlation between two results obtained when the method was trained on different datasets and tested on the same dataset. The correlation coefficient for the proposed method is 0.14 higher than that for the method using SVMs. When the proposed SGT-based method was compared with four per-residue predictors (VL3, GlobPlot, DISOPRED2, and IUPred (long)), its sensitivity was 0.834 for disordered proteins, which is 0.052–0.523 higher than that of the per-residue predictors, and its specificity was 0.991 for ordered proteins, which is 0.036–0.153 higher than that of the per-residue predictors. The proposed method was also evaluated on data that included 417 partially disordered proteins. It predicted the frequency of disordered proteins to be 1.95% for proteins with 5%–10% disordered sequences, 1.46% for proteins with 10%–20% disordered sequences, and 16.57% for proteins with 20%–40% disordered sequences. CONCLUSION: The proposed method, which utilizes the information in structure-unknown data, predicts disordered proteins more accurately than the other methods and is less affected by training data sparseness.
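    To make the transductive setup concrete, here is a hedged sketch in which scikit-learn's LabelSpreading stands in for the spectral graph transducer: structure-unknown sequences enter training as unlabelled graph nodes, and crude amino-acid composition vectors stand in for the paper's (unspecified here) features. Sequences and labels are toy examples.

```python
# Transductive sketch with LabelSpreading as a stand-in for the SGT:
# structure-unknown sequences participate in training as unlabelled nodes.
import numpy as np
from sklearn.semi_supervised import LabelSpreading

AA = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Amino-acid composition vector, a crude hypothetical feature set."""
    seq = seq.upper()
    return [seq.count(a) / len(seq) for a in AA]

# Toy data: 1 = mostly disordered, 0 = ordered, -1 = structure unknown.
sequences = ["MSEEDDFGGK", "ACDEFGHIKLMNPQRSTVWY", "SSEEDSGSKE", "MKVLAAGIVL"]
labels    = [1,            0,                      -1,           -1]

X = np.array([composition(s) for s in sequences])
model = LabelSpreading(kernel="knn", n_neighbors=2).fit(X, labels)

# Predicted labels for the structure-unknown sequences.
print(model.transduction_[2:])
```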

    Automated Retraining Methods for Document Classification and Their Parameter Tuning

    This paper addresses the problem of semi-supervised classification of document collections using retraining (also called self-training). A possible application is focused Web crawling, which may start with very few manually selected training documents but can be enhanced by automatically adding initially unlabeled, positively classified Web pages for retraining. Such an approach is not robust by itself and faces tuning problems regarding parameters such as the number of selected documents, the number of retraining iterations, and the ratio of positively and negatively classified samples used for retraining. The paper develops methods for automatically tuning these parameters, based on predicting the leave-one-out error for a retrained classifier and preventing the classifier from being diluted by selecting too many or too weakly classified documents for retraining. Our experiments with three different datasets confirm the practical viability of the approach.
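    The abstract names the parameters but not the loop they control, so here is a minimal self-training sketch showing where the number of retraining iterations and the number of documents added per iteration enter; the paper's leave-one-out-based tuning itself is not reproduced, and all data below are made up.

```python
# Minimal self-training (retraining) loop with the tuning-sensitive parameters
# made explicit: iterations and documents added per iteration.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labelled = ["machine learning for text", "deep neural networks",
            "recipe for apple pie", "football match results"]
y = np.array([1, 1, 0, 0])                       # 1 = on-topic, 0 = off-topic
unlabelled = ["support vector machines tutorial", "cooking pasta at home",
              "transformer models for NLP", "latest basketball scores"]

vec = TfidfVectorizer().fit(labelled + unlabelled)
X_lab = vec.transform(labelled).toarray()
X_unlab = vec.transform(unlabelled).toarray()
pool = list(range(len(unlabelled)))

n_iterations, per_iteration = 2, 1               # the parameters to be tuned
for _ in range(n_iterations):
    clf = LogisticRegression().fit(X_lab, y)
    if not pool:
        break
    probs = clf.predict_proba(X_unlab[pool])
    # Add only the most confidently classified documents, limiting dilution of
    # the classifier by weak retraining examples.
    order = np.argsort(-probs.max(axis=1))[:per_iteration]
    chosen = [pool[i] for i in order]
    X_lab = np.vstack([X_lab, X_unlab[chosen]])
    y = np.concatenate([y, probs[order].argmax(axis=1)])
    pool = [p for p in pool if p not in chosen]

final_clf = LogisticRegression().fit(X_lab, y)
print(final_clf.predict(vec.transform(["neural network classifier"]).toarray()))
```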