353 research outputs found

Learning Multi-label Alternating Decision Trees from Texts and Data

Author: J.R. Quinlan
R. E. Schapire
T.G. Dietterich
Y. Freund
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2003

Multi-label decision procedures are the target of the supervised learning algorithm we propose in this paper. Multi-label decision procedures map examples to a finite set of labels. Our learning algorithm extends Schapire and Singer's AdaBoost.MH and produces sets of rules that can be viewed as trees like Alternating Decision Trees (invented by Freund and Mason). Experiments show that we take advantage of both performance and readability using boosting techniques as well as tree representations of large sets of rules. Moreover, a key feature of our algorithm is the ability to handle heterogeneous input data: discrete and continuous values and text data.
Keywords: boosting, alternating decision trees, text mining, multi-label problem

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

PAC-Bayesian Bounds for Randomized Empirical Risk Minimizers

Author: A. Tsybakov
C. Cortes
D. A. McAllester
E. Mammen
J. H. Friedman
J. Rissanen
J.-Y. Audibert
L. Devroye
P. Alquier
R. Schapire
S. Boucheron
T. Zhang
W. Hoeffding
Publication venue: 'Allerton Press'
Publication date: 01/01/2008

The aim of this paper is to generalize the PAC-Bayesian theorems proved by Catoni in the classification setting to more general problems of statistical inference. We show how to control the deviations of the risk of randomized estimators. Particular attention is paid to randomized estimators drawn in a small neighborhood of classical estimators, whose study leads to control of the risk of the latter. These results make it possible to bound the risk of very general estimation procedures, as well as to perform model selection.

arXiv.org e-Print Archive

Crossref

HAL: Hyper Article en Ligne

Hal-Diderot

HAL-Polytechnique

A survey of cost-sensitive decision tree induction algorithms

Author: Bradford J. P.
Elkan C.
Esmeir S.
Estruch V.
Fan W.
Ferri C.
Freund Y.
Hart A. E.
Knoll U.
Li J.
Lin F. Y.
Liu X.
Mease D.
Murthy S.
Ni A.
Norton S. W.
Pazzani M.
Quinlan J. R.
Schapire R. E.
Sunil Vadera
Susan Lomax
Swets J.
Tan M.
Ting K.
Ting K. M.
von Neumann J.
Zadrozny B.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/02/2013

The past decade has seen significant interest in the problem of inducing decision trees that take account of the costs of misclassification and the costs of acquiring the features used for decision making. This survey identifies over 50 algorithms, including approaches that are direct adaptations of accuracy-based methods, use genetic algorithms, use anytime methods, and utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, and provides a useful taxonomy, a historical timeline of how the field has developed, and a reference point for future research in this field.

University of Salford Institutional Repository

Crossref

Learning to Order Things

Author: Cohen W. W.
Schapire R. E.
Singer Y.
Publication venue: 'AI Access Foundation'
Publication date: 26/05/2011

There are many applications in which it is desirable to order rather than classify instances. Here we consider the problem of learning how to order instances given feedback in the form of preference judgments, i.e., statements to the effect that one instance should be ranked ahead of another. We outline a two-stage approach in which one first learns by conventional means a binary preference function indicating whether it is advisable to rank one instance before another. Here we consider an on-line algorithm for learning preference functions that is based on Freund and Schapire's 'Hedge' algorithm. In the second stage, new instances are ordered so as to maximize agreement with the learned preference function. We show that the problem of finding the ordering that agrees best with a learned preference function is NP-complete. Nevertheless, we describe simple greedy algorithms that are guaranteed to find a good approximation. Finally, we show how metasearch can be formulated as an ordering problem, and present experimental results on learning a combination of 'search experts', each of which is a domain-specific query expansion strategy for a web search engine
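The second stage described in this abstract, ordering instances to agree with a learned preference function, can be illustrated with a small greedy sketch. This is not the authors' code: the names `greedy_order`, `agreement`, and `pref` are hypothetical, and `pref(u, v)` is assumed to return a number in [0, 1] expressing how strongly `u` should be ranked ahead of `v`. The sketch repeatedly picks the item whose outgoing preferences most outweigh its incoming ones, which is one simple greedy scheme of the kind the abstract mentions.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def greedy_order(items: Sequence[Hashable],
                 pref: Callable[[Hashable, Hashable], float]) -> List[Hashable]:
    """Greedily order unique, hashable items using a pairwise preference function.

    pref(u, v) is assumed to lie in [0, 1]; larger means "rank u before v".
    """
    remaining = list(items)
    # Potential of v: preference mass for v over the others minus the mass against v.
    potential: Dict[Hashable, float] = {
        v: sum(pref(v, u) - pref(u, v) for u in remaining if u != v)
        for v in remaining
    }
    ordering: List[Hashable] = []
    while remaining:
        # Take the item that currently "wins" by the largest margin.
        top = max(remaining, key=lambda v: potential[v])
        ordering.append(top)
        remaining.remove(top)
        # Drop the removed item's contribution from every remaining potential.
        for v in remaining:
            potential[v] += pref(top, v) - pref(v, top)
    return ordering


def agreement(ordering: Sequence[Hashable],
              pref: Callable[[Hashable, Hashable], float]) -> float:
    """Total preference weight of the pairs that the ordering ranks consistently."""
    return sum(pref(ordering[i], ordering[j])
               for i in range(len(ordering))
               for j in range(i + 1, len(ordering)))


if __name__ == "__main__":
    # Toy preference function derived from made-up relevance scores.
    scores = {"a": 3, "b": 2, "c": 1}
    toy_pref = lambda u, v: 1.0 if scores[u] > scores[v] else 0.0
    order = greedy_order(["c", "a", "b"], toy_pref)
    print(order, agreement(order, toy_pref))
```

The greedy pass needs only a quadratic number of calls to `pref`, which is the point of using an approximation when finding the best-agreeing ordering is NP-complete.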

arXiv.org e-Print Archive

Crossref

Study of $B^0_{(s)} \to K^0_S h^+ h^{\prime -}$ decays with first observation of $B^0_s \to K^0_S K^\pm \pi^\mp$ and $B^0_s \to K^0_S \pi^+ \pi^-$

Author: A Garmash
A. A. Alves
A. Affolder
A. Artamonov
A. Bay
A. Berezhnoy
A. Bien
A. Bizzeti
A. Bondar
A. Borgia
A. Bursche
A. C. dos Reis
A. Camboni
A. Carbone
A. Cardini
A. Comerma-Montells
A. Contu
A. Cook
A. D. Nguyen
A. D. Webber
A. Davis
A. Di Canto
A. Dosil Suárez
A. Dovbnya
A. Dziurda
A. Dzyuba
A. Falabella
A. Gallas Torreira
A. Golutvin
A. Gomes
A. Grecu
A. Hicheur
A. Jaeger
A. Jawahery
A. Keune
A. Kozlinskiy
A. Lai
A. Leflat
A. Martens
A. Martynov
A. Martín Sánchez
A. Massafferri
A. Mazurov
A. McNab
A. Mordà
A. Nomerotski
A. Novoselov
A. Oblakowska-Mucha
A. Oyanguren
A. Palano
A. Papanestis
A. Pazos Alvarez
A. Pellegrino
A. Petrolini
A. Phan
A. Poluektov
A. Popov
A. Powell
A. Pritchard
A. Puig Navarro
A. Pérez-Calero Yzquierdo
A. Richards
A. Romero Vidal
A. Sarti
A. Satta
A. Schopper
A. Sciubba
A. Semennikov
A. Shires
A. Sparkes
A. Tsaregorodtsev
A. Ukleja
A. Ustyuzhanin
A. Vallier
A. Vollhardt
A. Vorobyev
A. Zhelezov
A. Zhokhov
A. Zvyagin
B Aubert
B. Adeva
B. Couturier
B. Gui
B. Hamilton
B. Jost
B. K. Pal
B. Khanji
B. Leverington
B. Liu
B. McSkelly
B. Meadows
B. Muryn
B. Muster
B. Pietrzyk
B. Popovici
B. Rakotomiaramanana
B. Saitta
B. Sanmartin Sedes
B. Schmidt
B. Sciascia
B. Souza De Paula
B. Spaan
B. Storaci
B. Viaud
C. Adrover
C. Baesso
C. Barschel
C. Bozzi
C. Coca
C. D’Ambrosio
C. Farinelli
C. Fitzpatrick
C. Frei
C. Färber
C. Gaspar
C. Gotti
C. Göbel
C. Hadjivasiliou
C. Haen
C. Hombach
C. J. Parkinson
C. Joram
C. Langenbruch
C. Lazzeroni
C. Linn
C. Matteuzzi
C. Nguyen-Mau
C. Parkes
C. Patrignani
C. Pavel-Nicorescu
C. Potterat
C. Prouve
C. R. Jones
C. Santamarina Rios
C. Satriano
C. Thomas
C. Voß
C. Vázquez Sierra
C. Wallace
Ch. Cauet
Ch. Elsasser
D Lange
D London
D. A. Milanes
D. A. Roa Romero
D. A. Roberts
D. Brett
D. C. Craik
D. Campora Perez
D. Decamp
D. Derkach
D. Dossett
D. Ferguson
D. Galli
D. Golubkov
D. Hill
D. Hutchcroft
D. Hynds
D. Johnson
D. Lacarrere
D. Lambert
D. Lucchesi
D. Martinez Santos
D. Martins Tostes
D. Moran
D. Pinci
D. Popov
D. R. Ward
D. Savrina
D. Souza
D. Tonelli
D. Urner
D. van Eijk
D. Vieira
D. Volyanskyy
D. Voong
D. Websdale
D. Wiedner
E. Aslanides
E. Ben-Haim
E. Bowen
E. Cogneras
E. Cowie
E. Furfaro
E. Gersabeck
E. Graugés
E. Greening
E. Gushchin
E. Hicks
E. Jans
E. Lanciotti
E. Maurice
E. Perez Trigo
E. Pesen
E. Picatoste Olloqui
E. Polycarpo
E. Rodrigues
E. Santovetti
E. Smith
E. Teodorescu
E. Thomas
E. Tournefier
E. van Herwijnen
F. Alessio
F. Archilli
F. Bedeschi
F. Blanc
F. Dettori
F. Dordei
F. Dupertuis
F. F. Wilson
F. Ferreira Rodrigues
F. Fontanelli
F. J. P. Soler
F. Jing
F. Kruse
F. Machefert
F. Maciuc
F. Meier
F. Muheim
F. Polci
F. Ruffini
F. Soomro
F. Stagni
F. Teubert
F. Zhang
G Buchalla
G. A. Cowan
G. Alkhazov
G. Auriemma
G. Bencivenni
G. Busetto
G. Carboni
G. Casse
G. Ciezarek
G. Corti
G. D. Patel
G. Fardell
G. Graziani
G. Haefeli
G. Krocker
G. Lafferty
G. Lanfranchi
G. Liu
G. Manca
G. Mancinelli
G. Martellotti
G. N. Patrick
G. Passaleva
G. Penso
G. Polok
G. Punzi
G. Raven
G. Sabatino
G. Valenti
G. Veneziano
G. Wilkinson
GJ Feldman
H Albrecht
H. Brown
H. Carranza-Mejia
H. Dijkstra
H. Gordon
H. Lu
H. Luo
H. Ruiz
H. Schindler
H. V. Cliff
H. Voss
I. Bediaga
I. Belyaev
I. Burducea
I. De Bonis
I. El Rifai
I. Komarov
I. Longstaff
I. Mous
I. Nasteva
I. R. Kenyon
I. Raniuk
I. Sepp
I. Shapoval
I. V. Machikhiliyan
J Allison
J Beringer
J Dalseno
J Lees
J. A. Hernando Morata
J. Albrecht
J. Anderson
J. Beddow
J. Benton
J. Blouw
J. Bressieux
J. Buytaert
J. Closier
J. Cogan
J. E. Andrews
J. Garofoli
J. Garra Tico
J. H. Lopes
J. H. Rademacker
J. Harrison
J. He
J. J. Back
J. J. Saborido Silva
J. J. Velthuis
J. Lefrançois
J. Luisier
J. M. De Miranda
J. M. Otalora Goicochea
J. Maratas
J. Marks
J. McCarthy
J. Molina Rodriguez
J. Panman
J. Prisciandaro
J. Rouvinet
J. Serrano
J. Smith
J. van den Brand
J. van Leerdam
J. van Tilburg
J. Wang
J. Wicht
J. Wiechczynski
J. Wimberley
J. Wishahi
J.-P. Lees
K Bruyn De
K. Belous
K. Carvalho Akiba
K. Ciba
K. De Bruyn
K. Hennessy
K. Kreplin
K. Kurek
K. Müller
K. Petridis
K. Rinnert
K. Senderowska
K. Wyllie
L Silvestrini
L. A. Granado Cardoso
L. Anderlini
L. Carson
L. Castillo Garcia
L. De Paula
L. Del Buono
L. Eklund
L. Garrido
L. Giubega
L. Kravchuk
L. Li Gioi
L. Pescatore
L. Shekhtman
L. Sun
L. Wiggers
L. Zhang
L. Zhong
LHCb collaboration
M Ciuchini
M Gronau
M Kobayashi
M Pivk
M. Adinolfi
M. Alexander
M. Artuso
M. Baalouch
M. Britsch
M. Calvi
M. Calvo Gomez
M. Cattaneo
M. Charles
M. Chrzaszcz
M. Clemencic
M. Coombes
M. D. Sokoloff
M. De Cian
M. Deckenhoff
M. Dogaru
M. Ferro-Luzzi
M. Fiore
M. Fontana
M. Frank
M. Frosini
M. Gandelman
M. Gersabeck
M. Grabalosa Gándara
M. Hess
M. Hoballah
M. Idzik
M. J. Morello
M. John
M. Kaballo
M. Karacson
M. Korolev
M. Kreps
M. Kucharczyk
M. Liles
M. M. Reid
M. Martinelli
M. Meissner
M. Merk
M. Needham
M. Nicol
M. Orlandea
M. P. Williams
M. Palutan
M. Pappagallo
M. Patel
M. Pepe Altarelli
M. Perrin-Terrin
M. Plo Casasus
M. S. Rangel
M. Sannino
M. Sapunov
M. Savrie
M. Schiller
M. Schlupp
M. Schmelling
M. Seco
M. Shapkin
M. Sirendi
M. Smith
M. Straticiuc
M. Szczekowski
M. T. Tran
M. Teklishyn
M. Tobin
M. Tresch
M. Ubeda Garcia
M. van Beuzekom
M. Van Dijk
M. Veltri
M. Vesterinen
M. Whitehead
M. Williams
M. Witek
M. Zangoli
M. Zavertyaev
M.-H. Schune
M.-N. Minard
M.-O. Bettler
N Cabibbo
N. A. Smith
N. Bondar
N. Chiapolini
N. Déléage
N. H. Brook
N. Harnew
N. Hussain
N. K. Watson
N. Lopez-March
N. Neufeld
N. Nikitin
N. Rauschmayr
N. Sagidova
N. Serra
N. Skidmore
N. Torr
N. Tuning
O. Aquines Gutierrez
O. Callot
O. Deschamps
O. Francisco
O. Grünberg
O. Kochebina
O. Leroy
O. Maev
O. Okhrimenko
O. Schneider
O. Shevchenko
O. Steinkamp
O. Yushchenko
P Amo Sanchez del
P Golonka
P. Alvarez Cartelle
P. Campana
P. Chen
P. Collins
P. David
P. De Simone
P. Durante
P. E. L. Clarke
P. Gandini
P. Garosi
P. Gorbounov
P. Griffith
P. Henrard
P. Hopchev
P. Hunt
P. Ilten
P. Jaton
P. Koppenburg
P. Krokovny
P. M. Bjørnstad
P. Marino
P. Morawski
P. N. Y. David
P. Naik
P. Owen
P. Perret
P. Robbe
P. Rodriguez Perez
P. Ruiz Valls
P. Sail
P. Schaack
P. Seyfert
P. Shatalov
P. Spradlin
P. Szczypka
P. Tsopelas
P. Vazquez Regueiro
Ph. Charpentier
Ph. Ghez
R. Aaij
R. Andreassen
R. B. Appleby
R. Bernet
R. Cardinale
R. Cenci
R. Currie
R. Dzhelyadin
R. Ekelhof
R. F. Koopman
R. Forty
R. Gauld
R. Graciani Diaz
R. J. Barlow
R. Jacobsson
R. Le Gac
R. Lefèvre
R. Lindner
R. Matev
R. McNulty
R. Mountain
R. Muresan
R. Märki
R. Nandakumar
R. Niet
R. Oldeman
R. Santacesaria
R. Schwemmer
R. Silva Coutinho
R. Vazquez Gomez
R. W. Lambert
R. Waldi
R. Wallace
R. Young
RE Schapire
RH Dalitz
S Agostinelli
S. A. Wotton
S. Ali
S. Amato
S. Amerio
S. Bachmann
S. Barsuk
S. Belogurov
S. Benson
S. Bifani
S. Blusk
S. Borghi
S. C. Haines
S. Cadeddu
S. Coquereau
S. Cunliffe
S. De Capua
S. Donleavy
S. Easo
S. Eidelman
S. Eisenhardt
S. Farry
S. Filippov
S. Furcas
S. Gregson
S. Hall
S. Hansmann-Menzemer
S. Kandybei
S. Leo
S. Lohn
S. Malde
S. Monteil
S. Neubert
S. Oggero
S. Ogilvy
S. Perazzini
S. Playfer
S. Redford
S. Ricciardi
S. Roiser
S. Stahl
S. Stevenson
S. Stoica
S. Stone
S. Swientek
S. T. Harnew
S. Tolk
S. Topp-Joergensen
S. Tourneur
S. T’Jampens
S. Vecchi
S. Wandernoth
S. Wright
S. Wu
T Sjöstrand
T. Bird
T. Blake
T. Brambach
T. Britton
T. D. Nguyen
T. Gershon
T. Gys
T. Hampson
T. Hartmann
T. Head
T. Huse
T. J. V. Bowcock
T. Ketel
T. Kvaratskheliya
T. Latham
T. Lesiak
T. M. Karbach
T. Nakada
T. Nikodem
T. Palczewski
T. Pilař
T. Ruf
T. Shears
T. Skwarnicki
T. Szumlak
Th. Bauer
U. Egede
U. Eitschberger
U. Marconi
U. Straumann
U. Uwer
V. Balagura
V. Bocci
V. Coco
V. Egorychev
V. Fernandez Albor
V. Gibson
V. Heijne
V. Iakovenko
V. K. Subbiah
V. Kudryavtsev
V. N. La Thi
V. Niess
V. Obraztsov
V. Pugatch
V. Rives Molina
V. Romanovsky
V. Salustino Guimaraes
V. Shevchenko
V. Syropoulos
V. Tisserand
V. V. Gligorov
V. Vagnoni
V. Vorobyev
W. Baldini
W. Barter
W. Bonivento
W. C. Zhang
W. De Silva
W. Hulsbergen
W. Kanso
W. Qian
W. Wislicki
X. Cid Vidal
X. Vilasis-Cardona
X. Yuan
Y Grossman
Y Nakahama
Y. Amhis
Y. Gao
Y. Li
Y. Shcheglov
Y. Xie
Y. Zhang
Yu. Guz
Z. Ajaltouni
Z. Mathe
Z. Xing
Z. Yang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

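A search for charmless three-body decays of $B^0$ and $B^0_s$ mesons with a $K^0_S$ meson in the final state is performed using the $pp$ collision data, corresponding to an integrated luminosity of 1.0 fb$^{-1}$, collected at a centre-of-mass energy of 7 TeV recorded by the LHCb experiment. Branching fractions of the $B^0_{(s)} \to K^0_S h^+ h^{\prime -}$ decay modes ($h^{(\prime)} = \pi, K$), relative to the well measured $B^0 \to K^0_S \pi^+ \pi^-$ decay, are obtained. First observation of the decay modes $B^0_s \to K^0_S K^\pm \pi^\mp$ and $B^0_s \to K^0_S \pi^+ \pi^-$ and confirmation of the decay $B^0 \to K^0_S K^\pm \pi^\mp$ are reported. The following relative branching fraction measurements or limits are obtained:

$$\frac{\mathcal{B}(B^0 \to K^0_S K^\pm \pi^\mp)}{\mathcal{B}(B^0 \to K^0_S \pi^+ \pi^-)} = 0.128 \pm 0.017\,(\text{stat.}) \pm 0.009\,(\text{syst.}),$$

$$\frac{\mathcal{B}(B^0 \to K^0_S K^+ K^-)}{\mathcal{B}(B^0 \to K^0_S \pi^+ \pi^-)} = 0.385 \pm 0.031\,(\text{stat.}) \pm 0.023\,(\text{syst.}),$$

$$\frac{\mathcal{B}(B^0_s \to K^0_S \pi^+ \pi^-)}{\mathcal{B}(B^0 \to K^0_S \pi^+ \pi^-)} = 0.29 \pm 0.06\,(\text{stat.}) \pm 0.03\,(\text{syst.}) \pm 0.02\,(f_s/f_d),$$

$$\frac{\mathcal{B}(B^0_s \to K^0_S K^\pm \pi^\mp)}{\mathcal{B}(B^0 \to K^0_S \pi^+ \pi^-)} = 1.48 \pm 0.12\,(\text{stat.}) \pm 0.08\,(\text{syst.}) \pm 0.12\,(f_s/f_d),$$

$$\frac{\mathcal{B}(B^0_s \to K^0_S K^+ K^-)}{\mathcal{B}(B^0 \to K^0_S \pi^+ \pi^-)} \in [0.004;\ 0.068] \text{ at 90\% CL}.$$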

VU Research Portal

Archivio istituzionale della ricerca - Università di Bari

Warwick Research Archives Portal Repository

Archivio istituzionale della ricerca - Università di Cagliari

Archivio istituzionale della ricerca - Università di Ferrara

Archivio istituzionale della ricerca - Università di Genova

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Archivio istituzionale della ricerca - Università di Padova

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Urbino

Archivio istituzionale della Ricerca - Scuola Normale Superiore

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

ZORA

Diposit Digital de la Universitat de Barcelona

Hal - Université Grenoble Alpes

HAL AMU

HAL Clermont Université

HAL Université de Savoie

Enlighten

The University of Manchester - Institutional Repository

ePubs: the open archive for STFC research publications

Manisa Celal Bayar Üniversitesi Akademik Arşiv Sistemi

ART

Hal-Diderot

MPG.PuRe

Infoscience - École polytechnique fédérale de Lausanne

Springer - Publisher Connector

Archivio della Ricerca - Università della Basilicata

UCL Discovery

Archivio della ricerca- Università di Roma La Sapienza

BoostingTree: parallel selection of weak learners in boosting, with application to ranking

Author: A. György
A. Matanović
Andrea N. Bán
András György
C. J. C. Burges
C. Tamon
D. D. Margineantu
D. R. Jones
G. Escudero
G. Martínez-Muñoz
G. Tsoumakas
H. Valizadegan
K. Järvelin
L. Kocsis
Levente Kocsis
M. J. Streeter
M. Luby
O. Chapelle
P. Auer
P. Donmez
Q. Wu
R. Busa-Fekete
R. E. Schapire
R. M. Hyatt
R. Munos
S. Gelly
T. Cazenave
Y. Freund
Y. T. Xi
Z. J. Xiang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Boosting algorithms have been found successful in many areas of machine learning and, in particular, in ranking. For typical classes of weak learners used in boosting (such as decision stumps or trees), a large feature space can slow down the training, while a long sequence of weak hypotheses combined by boosting can result in a computationally expensive model. In this paper we propose a strategy that builds several sequences of weak hypotheses in parallel, and extends the ones that are likely to yield a good model. The weak hypothesis sequences are arranged in a boosting tree, and new weak hypotheses are added to promising nodes (both leaves and inner nodes) of the tree using some randomized method. Theoretical results show that the proposed algorithm asymptotically achieves the performance of the base boosting algorithm applied. Experiments are provided in ranking web documents and move ordering in chess, and the results indicate that the new strategy yields better performance when the length of the sequence is limited, and converges to similar performance as the original boosting algorithms otherwise.

Crossref

SZTAKI Publication Repository

Machine Learning in Automated Text Categorization

Author: ANDROUTSOPOULOS I.
ATTARDI G.
BAKER L.D.
BIEBRICHER P.
CAROPRESO M.F.
CAVNAR W.B.
CHAKRABARTI S.
CLACK C.
CLEVERDON C.
COHEN W. W.
COHEN W.W.
DAGAN I.
DEERWESTER S.
DENOYER L.
DIAZ ESTEBAN A.
DRUCKER H.
DUMAIS S.T.
ESCUDERO G.
Fabrizio Sebastiani
FIELD B.
FORSYTH R. S.
FUHR N.
FURNKRANZ J.
GALAVOTTI L.
GALE W. A.
GOVERT N.
GRAY W.A.
GUTHRIE L.
HAYES P.J.
HEAPS H.
HERSH W.
HULL D. A.
ITTNER D.J.
IWAYAMA M.
IYER R.D.
JOACHIMS T.
JOHN G. H.
JUNKER M.
KESSLER B.
KIM Y.-H.
KLINKENBERG R.
KNORZ G.
KOLLER D.
LAM S.L.
LAM W.
LANG K.
LARKEY L. S.
LARKEY L.S.
LEWIS D. D.
LEWIS D.D.
LI H.
LI Y.H.
LIERE R.
LIM J. H.
MASAND B.
MCCALLUM A. K.
MCCALLUM A.K.
MLADENIC D.
MOULINIER I.
MYERS K.
NG H.T.
OH H.-J.
PAZIENZA M. T.
RILOFF E.
ROBERTSON S.E.
ROTH D.
RUIZ M.E.
SABLE C.L.
SARACEVIC T.
SCHAPIRE R. E.
SCHUTZE H.
SCOTT S.
SEBASTIANI F.
SINGHAL A.
SLONIM N.
TAIRA H.
TUMER K.
TZERAS K.
VAN RIJSBERGEN C. J.
WIENER E.D.
YANG Y.
YU K.L.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2001

The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.
Comment: Accepted for publication in ACM Computing Surveys

arXiv.org e-Print Archive

CiteSeerX

Crossref

Information retrieval and text mining technologies for chemistry

Author: Abacha A. B.
Alberts D.
Alfonso Valencia
American Chemical Society
Anália Lourenço
Aphinyanaphongs Y.
Appelt D. E.
Aramaki E.
Aronson A. R.
Asahara M.
Babych B.
Baeza-Yates R.
Bambenek J.
Barnard J. M.
Bast H.
Batista-Navarro R.
Batista-Navarro R. T.
Bian J.
Bies A.
Bikel D. M.
Blaschke C.
Brecher J. S.
Brill E.
Bunescu R.
Bunescu R. C.
Califf M. E.
Carpenter B.
Caruana R.
Chee B. W.
Chhieng D.
Chinchor N.
Chiticariu L.
Chowdhury M. F. M.
Ciravegna F.
Cleverdon C. W.
Coden A.
Cohen R.
Collier N.
Corbett P.
Cover T. M.
Craven M.
Cummings M. D.
Currano J. N.
Cutting D. R.
Davis C. H.
Dieb T. M.
Dogan R. I.
Downs G. M.
Dunikowski L. G.
Embarek M.
Eom J.-H.
Faber J.
Fall C. J.
Fattore M.
Fennell R. W.
Freund Y.
Fujiyoshi A.
Fukuda K.
Gale W. A.
Garcelon N.
Garnier J.-P.
Garten Y.
Ginn R.
Giuliano C.
Gold S.
Grefenstette G.
Grishman R.
Gurulingappa H.
Gusfield D.
He Y.
Hearst M. A.
Hersh W.
Hirschman L.
Hobbs J. R.
Hodge G. M.
Holzinger A.
Hsueh P.-Y.
Huber T.
Iyer S. V
Jackson P.
Joachims T.
Johnson D.
Jonnalagadda S.
Julen Oyarzabal
Jurafsky D.
Kaewphan S.
Karkaletsis V.
Katragadda S.
Kazama J.
Kazawa H.
Kelly L.
Kenny P. W.
Kim J.-D.
Kim Y.
Kleene S. C.
Kolárik C.
Kongburan W.
Kornai A.
Kraaij W.
Krallinger M.
Kremer G.
Kreuzthaler M.
Kucera H.
Lai H.
Lawson A. J.
Leaman R.
Lee C.-H.
Levenshtein V. I.
Levin M. A.
Li J.
Li N.
Li Y.
Liu X.
Locke W. N.
Lovins J. B.
Lowe D. M.
Lupu M.
Mackenzie C. E.
Manning C. D.
Mansouri A.
Martin E.
Martin Krallinger
Mattmann C.
Maynard D.
McCallum A.
McEwen L.
McKnight L.
McNaught A.
Meystre S. M.
Michalski S. R.
Michie D.
Mihalcea R.
Mitton R.
Miwa M.
Mollá D.
Murray-Rust P.
Müller B.
Nebel A.
Nikfarjam A.
Névéol A.
Obdulia Rabal
Pang B.
Panico R.
Perez-Iratxeta C.
Ponomareva N.
Ratinov L.
Ratnaparkhi A.
Read J.
Rebholz-Schuhmann D.
Reeker L. H.
Rocchio J. J.
Rohbeck H.-G.
Rosario B.
Roth D. L.
Rupp C. J.
Sagae K.
Salim N.
Salton G.
Sanchez-Cisneros D.
Saracevic T.
Sasaki Y.
Schapire R. E.
Schenck R.
Schenck R. J.
Schlaf A.
Schuemie M. J.
Segura Bedmar I.
Segura-Bedmar I.
Sekine S.
Sequeira E.
Settles B.
Sewell W.
Shen D.
Shidha M. V
Singhal A.
Smith E. G.
Stamatatos E.
Sutton C.
Sætre R.
Taylor K. T.
Tharatipyakul A.
Tomanek K.
Tsuruoka Y.
Täger W.
Urbain J.
van Rijsbergen C. J.
Vapnik V. N.
Vasserman A.
Visweswaran S.
Voorhees E. M.
Wang W.
Wang Y.
Wei C.-H.
Wermter J.
Wilbur W. J.
Willett P.
Williams A. J.
Witten I. H.
Workman M. L.
Wrublewski D. T.
Xu R.
Xue N.
Yan S.
Yang C.
Yang C. C.
Yang Y.
Zass E.
Zipf G. K.
Zitnik S.
Publication venue: 'American Chemical Society (ACS)'
Publication date: 01/01/2017

Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly the CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation, together with text mining applications for linking chemistry with biological information, are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.

A.V. and M.K. acknowledge funding from the European Community's Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.

Universidade do Minho: RepositoriUM

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

UPCommons (Universitat Politècnica de Catalunya)

Aneuploidy prediction and tumor classification with heterogeneous hidden conditional random fields

Author: Albertson
Beitzinger
Brown
Chin
E. M. Airoldi
Han
Heim
Huang
Jonsson
Nag
O. G. Troyanskaya
R. E. Schapire
Rocke
Rueda
Shah
Snijders
V. Dumeaux
van Beers
Wessels
Yi
Z. Barutcuoglu
Publication venue: 'Oxford University Press'
Publication date: 01/01/2008

Motivation: The heterogeneity of cancer cannot always be recognized by tumor morphology, but may be reflected by the underlying genetic aberrations. Array comparative genome hybridization (array-CGH) methods provide high-throughput data on genetic copy numbers, but determining the clinically relevant copy number changes remains a challenge. Conventional classification methods for linking recurrent alterations to clinical outcome ignore sequential correlations in selecting relevant features. Conversely, existing sequence classification methods can only model overall copy number instability, without regard to any particular position in the genome

Crossref

Harvard University - DASH

TUScholarShare (Temple University)

PubMed Central

Munin - Open Research Archive

NORA - Norwegian Open Research Archives