Search CORE

2,052 research outputs found

Optimism in Active Learning with Gaussian Processes

Author: A Kapoor
B Settles
DD Lewis
Publication venue: HAL CCSD
Publication date: 09/11/2015
Field of study

In the context of active learning for classification, the classification error depends on the joint distribution of samples and their labels, which is initially unknown. Minimizing this error requires estimating this distribution, and estimating it online involves a trade-off between exploration and exploitation. This is a common problem in machine learning, for which multi-armed bandit theory, building upon Optimism in the Face of Uncertainty, has proven very effective in recent years. We introduce two novel algorithms that use Optimism in the Face of Uncertainty along with Gaussian Processes for the active learning problem. An evaluation on real-world datasets shows that these new algorithms compare favorably with state-of-the-art methods.
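The abstract does not spell out the two algorithms, so the following is only a generic illustration of Optimism in the Face of Uncertainty with a Gaussian Process: a GP-UCB-style rule that scores each unlabeled candidate by its posterior mean plus `beta` times its posterior standard deviation and queries the highest score. It is a plain-Python sketch assuming 1-D inputs, a squared-exponential kernel and a zero-mean prior. The helper names (`rbf`, `solve`, `gp_posterior`, `select_optimistic`) and the defaults `beta=2.0`, `noise=1e-2`, `length_scale=1.0` are hypothetical, and the authors' classification-specific scores will differ.

```python
import math
from typing import List, Sequence, Tuple


def rbf(a: float, b: float, length_scale: float = 1.0) -> float:
    """Squared-exponential kernel with unit signal variance (an assumption)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix: List[List[float]], rhs: Sequence[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gp_posterior(xs: Sequence[float], ys: Sequence[float], x_star: float,
                 noise: float = 1e-2, length_scale: float = 1.0) -> Tuple[float, float]:
    """Posterior mean and variance of a zero-mean GP at x_star."""
    gram = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(x, x_star, length_scale) for x in xs]
    alpha = solve(gram, ys)                     # (K + noise*I)^-1 y
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(gram, k_star)                     # (K + noise*I)^-1 k_*
    var = rbf(x_star, x_star, length_scale) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)                  # clip tiny negative round-off


def select_optimistic(xs: Sequence[float], ys: Sequence[float],
                      candidates: Sequence[float], beta: float = 2.0,
                      **gp_kwargs: float) -> float:
    """Pick the candidate with the largest upper confidence bound mean + beta * std."""
    def ucb(c: float) -> float:
        mean, var = gp_posterior(xs, ys, c, **gp_kwargs)
        return mean + beta * math.sqrt(var)
    return max(candidates, key=ucb)
```

For example, with a single observation at 0.0, `select_optimistic([0.0], [0.0], [0.0, 5.0])` returns `5.0`: the distant candidate keeps almost its prior variance, so its optimistic bound wins. A larger `beta` shifts the balance further toward exploration.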

HAL-CentraleSupelec

CiteSeerX

HAL - Université de Franche-Comté

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

HAL-Rennes 1

Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity

Author: A Solow
C Rao
CD Manning
DD Lewis
DM Blei
DQ Nguyen
H Azarbonyad
H Soleimani
M Dehghani
Publication venue
Publication date: 01/01/2017
Field of study

A high degree of topical diversity is often considered an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three elements for assessing diversity: words, topics, and documents as collections of words. Topic models play a central role in this approach. Using standard topic models to measure the diversity of documents is suboptimal due to generality and impurity. General topics only include common information from a background corpus and are assigned to most of the documents in the collection. Impure topics contain words that are not related to the topic; impurity lowers the interpretability of topic models, and impure topics are likely to be assigned to documents erroneously. We propose a hierarchical re-estimation approach for topic models to combat generality and impurity; the proposed approach operates at three levels: words, topics, and documents. Our re-estimation approach for measuring documents' topical diversity outperforms the state of the art on the PubMed dataset, which is commonly used for diversity experiments. Comment: Proceedings of the 39th European Conference on Information Retrieval (ECIR 2017)
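The abstract does not give the diversity formula. Assuming a Rao-style measure (an assumption here, not stated in the abstract), a document's diversity is the sum over topic pairs (i, j) of p_i * p_j * d(i, j), where p_i are the document's topic proportions and d(i, j) is a distance between topics. The minimal sketch below uses cosine distance between topic-word vectors purely for illustration; both function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity between two topic-word weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rao_diversity(topic_probs: Sequence[float],
                  topic_dist: Sequence[Sequence[float]]) -> float:
    """Rao's quadratic entropy: sum_i sum_j p_i * p_j * d(i, j).

    topic_probs: a document's topic proportions (assumed to sum to 1).
    topic_dist:  symmetric topic-to-topic distances, zero on the diagonal.
    """
    n = len(topic_probs)
    return sum(topic_probs[i] * topic_probs[j] * topic_dist[i][j]
               for i in range(n) for j in range(n))
```

A document concentrated on a single topic scores 0, while one split evenly between two orthogonal topics (distance 1) scores 0.5.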

arXiv.org e-Print Archive

Crossref

UvA-DARE

International Migration, Integration and Social Cohesion online publications

T$^2$K$^2$: The Twitter Top-K Keywords Benchmark

Author: A Guille
AE Gattiker
CD Manning
D Kılınç
DD Lewis
F Ravat
J Darmont
J Ferrarons
J Gray
J O’Shea
JD Cooper
K Spärck Jones
L Wang
S Bringay
Publication venue: Springer Science and Business Media LLC
Publication date: 14/09/2017
Field of study

Information retrieval from textual data focuses on the construction of vocabularies that contain weighted term tuples. Such vocabularies can then be exploited by various text analysis algorithms to extract new knowledge, e.g., top-k keywords, top-k documents, etc. Top-k keywords are commonly used for various purposes, are often computed on the fly, and thus must be computed efficiently. To compare competing weighting schemes and database implementations, benchmarking is customary. To the best of our knowledge, no benchmark currently addresses these problems. Hence, in this paper, we present a top-k keywords benchmark, T$^2$K$^2$, which features a real tweet dataset and queries with various complexities and selectivities. T$^2$K$^2$ helps evaluate weighting schemes and database implementations in terms of computing performance. To illustrate T$^2$K$^2$'s relevance and genericity, we successfully performed tests on the TF-IDF and Okapi BM25 weighting schemes, on the one hand, and on different relational (Oracle, PostgreSQL) and document-oriented (MongoDB) database implementations, on the other hand.
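To make the comparison between weighting schemes concrete, here is a small in-memory sketch that ranks keywords across a set of tweets with TF-IDF and with Okapi BM25. It is not T$^2$K$^2$ itself, which runs its queries against Oracle, PostgreSQL and MongoDB. The whitespace tokenizer, summing term weights over documents, the BM25 defaults `k1=1.2` and `b=0.75`, the non-negative BM25 IDF variant and all function names are assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lower-case whitespace tokenizer (a simplifying assumption for tweets)."""
    return text.lower().split()


def tfidf_weights(docs: List[str]) -> Dict[str, float]:
    """Sum over documents of tf(t, d) * log(N / df(t)), with raw counts as tf."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    scores: Dict[str, float] = defaultdict(float)
    for toks in tokenized:
        for term, tf in Counter(toks).items():
            scores[term] += tf * math.log(n / df[term])
    return dict(scores)


def bm25_weights(docs: List[str], k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    """Sum over documents of Okapi BM25 term weights (non-negative IDF variant)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    df = Counter(t for toks in tokenized for t in set(toks))
    scores: Dict[str, float] = defaultdict(float)
    for toks in tokenized:
        length_norm = 1 - b + b * len(toks) / avgdl  # longer docs are discounted
        for term, tf in Counter(toks).items():
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            scores[term] += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
    return dict(scores)


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Top-k keywords by score; ties are broken alphabetically for determinism."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

On the toy corpus `["a b", "a c", "a b d"]`, TF-IDF ranks the top three keywords as `c, d, b`, while BM25 ranks them `c, b, d`, partly because BM25's length normalization discounts `d`, which appears only in the longest tweet. BM25's IDF also stays positive for `a`, which occurs everywhere, whereas TF-IDF gives it zero weight.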

arXiv.org e-Print Archive

Good Statistical Practice—development of tailored Good Clinical Practice training for statisticians

Author: Armstrong E
Brown J
Dutton SJ
Gamble C
Lewis S
Mossop H
Peckitt C
Stocken DD
Publication venue: Springer Science and Business Media LLC
Publication date
Field of study

© The Author(s) 2024.
Background: Statisticians are fundamental in ensuring that clinical research, including clinical trials, is conducted with quality, transparency, reproducibility and integrity. Good Clinical Practice (GCP) is an international quality standard for the conduct of clinical trials research. Statisticians are required to undertake training on GCP, but existing training is generic and, crucially, does not cover statistical activities. As a result, statisticians undertake training that is mostly unrelated to their role, and awareness and implementation of relevant regulatory requirements with regard to statistical conduct vary. The need for role-relevant training is recognised by the UK NHS Health Research Authority and the Medicines and Healthcare products Regulatory Agency (MHRA).
Methods: The Good Statistical Practice (GCP for Statisticians) project was instigated by the UK Clinical Research Collaboration (UKCRC) Registered Clinical Trials Unit (CTU) Statisticians Operational Group and funded by the National Institute for Health and Care Research (NIHR) to develop materials enabling role-specific GCP training tailored to statisticians. Current GCP training was reviewed by survey. Development of the training materials was based on MHRA GCP. Critical review and piloting were conducted with UKCRC CTU and NIHR researchers, with comment from the MHRA. Final review was conducted through the UKCRC CTU Statistics group.
Results: The survey confirmed the need and desire for dedicated GCP training for statisticians. An accessible, comprehensive, piloted training package was developed, tailored to statisticians working in clinical research, particularly the clinical trials arena. The training materials cover legislation and guidance for best practice across all clinical trial processes with statistical involvement, and include exercises and real-life scenarios to bridge the gap between theory and practice. Comprehensive feedback was incorporated. The training materials are freely available for national and international adoption.
Conclusion: All research staff should have training in GCP, yet the training undertaken by most academic statisticians does not cover activities related to their role. The Good Statistical Practice (GCP for Statisticians) project has developed and extensively piloted new, role-specific, comprehensive, accessible GCP training tailored to statisticians working in clinical research, particularly the clinical trials arena. This role-specific training will encourage best practice, leading to transparent and reproducible statistical activity, as required by regulatory authorities and funders.

Newcastle University E-Prints

TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning

Author: A Achille
B Leskes
C Busso
DD Lewis
J Ashburner
M Studenỳ
S Chandar
SL Huang
T Miyato
X Nguyen
Y Cheng
Publication venue
Publication date: 13/07/2020
Field of study

Fusing data from multiple modalities provides more information to train machine learning systems. However, it is prohibitively expensive and time-consuming to label a large amount of data for each modality, which leads to the crucial problem of semi-supervised multi-modal learning. Existing methods suffer from either ineffective fusion across modalities or a lack of theoretical guarantees under proper assumptions. In this paper, we propose a novel information-theoretic approach, namely Total Correlation Gain Maximization (TCGM), for semi-supervised multi-modal learning, which is endowed with promising properties: (i) it can effectively use the information across different modalities of unlabeled data points to facilitate training the classifier of each modality; and (ii) it has a theoretical guarantee of identifying Bayesian classifiers, i.e., the ground-truth posteriors of all modalities. Specifically, by maximizing a TC-induced loss (namely, the TC gain) over the classifiers of all modalities, these classifiers can cooperatively discover the equivalence class of ground-truth classifiers, and identify the unique ones by leveraging a limited percentage of labeled data. We apply our method to various tasks, including news classification, emotion recognition and disease prediction, and achieve state-of-the-art results. Comment: ECCV 2020 (oral)
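TCGM's objective is built on total correlation (TC), the multivariate generalization of mutual information: TC(X_1, ..., X_m) = sum_i H(X_i) - H(X_1, ..., X_m). The sketch below only computes empirical TC, in nats, from discrete samples (for example, the labels predicted by each modality's classifier on the same unlabeled points). It is not the paper's TC-gain loss, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def entropy(samples: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy in nats."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def total_correlation(rows: Sequence[Tuple[Hashable, ...]]) -> float:
    """Sum of marginal entropies minus the joint entropy, from empirical frequencies.

    Each row holds one sample's values for the m variables (e.g. one predicted
    label per modality); for m = 2 this equals the mutual information.
    """
    m = len(rows[0])
    marginals = sum(entropy([row[i] for row in rows]) for i in range(m))
    joint = entropy([tuple(row) for row in rows])
    return marginals - joint
```

Two perfectly agreeing binary predictors, `[(0, 0), (1, 1)]`, give a TC of ln 2 (about 0.693), while independent ones, `[(0, 0), (0, 1), (1, 0), (1, 1)]`, give 0. Agreement across modalities is what a TC-based objective rewards.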

arXiv.org e-Print Archive

Crossref

Comparison of transverse wires and half pins in Taylor Spatial Frame: A biomechanical study

Author: A Podolsky
AR Cross
Ashish Khurana
B Gasser
Carlton Byrne
CS Roberts
DD Lewis
E Yilmaz
GA Ilizarov
H Windhagen
Hiro Tanaka
JK Oh
Kartik Haraharan
L Claes
L Orienti
L Yang
LP Kristiansen
MJ Al-Sayyad
PJ Hillard
R Rödl
SA Green
Sam Evans
T Yamaji
V Antoci
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Crossref

Online Research @ Cardiff

Springer - Publisher Connector

PubMed Central

A Compromise between Neutrino Masses and Collider Signatures in the Type-II Seesaw Model

Author: A Bookstein
BK Ghosh
CJ van Rijsbergen
D Angluin
D Cohn
DD Lewis
DJC MacKay
DT Davis
G Salton
HS Seung
J Hwang
M Plutowski
ME Maron
N Fuhr
P Biebricher
P McCullagh
PE Hart
PE Utgoff
PJ Hayes
RO Duda
S Robertson
TM Mitchell
WA Gale
WB Croft
WG Cochran
WS Cooper
Y Freund
Publication venue
Publication date: 01/01/1994
Field of study

A natural extension of the standard $SU(2)_{\rm L} \times U(1)_{\rm Y}$ gauge model to accommodate massive neutrinos is to introduce one Higgs triplet and three right-handed Majorana neutrinos, leading to a $6\times 6$ neutrino mass matrix which contains three $3\times 3$ sub-matrices $M_{\rm L}$, $M_{\rm D}$ and $M_{\rm R}$. We show that three light Majorana neutrinos (i.e., the mass eigenstates of $\nu_e$, $\nu_\mu$ and $\nu_\tau$) are exactly massless in this model, if and only if $M_{\rm L} = M_{\rm D} M_{\rm R}^{-1} M_{\rm D}^T$ exactly holds. This no-go theorem implies that small but non-vanishing neutrino masses may result from a significant but incomplete cancellation between the $M_{\rm L}$ and $M_{\rm D} M_{\rm R}^{-1} M_{\rm D}^T$ terms in the Type-II seesaw formula, provided three right-handed Majorana neutrinos are of ${\cal O}(1)$ TeV and experimentally detectable at the LHC. We propose three simple Type-II seesaw scenarios with the $A_4 \times U(1)_{\rm X}$ flavor symmetry to interpret the observed neutrino mass spectrum and neutrino mixing pattern. Such a TeV-scale neutrino model can be tested in two complementary ways: (1) searching for possible collider signatures of lepton number violation induced by the right-handed Majorana neutrinos and doubly-charged Higgs particles; and (2) searching for possible consequences of unitarity violation of the $3\times 3$ neutrino mixing matrix in the future long-baseline neutrino oscillation experiments. Comment: RevTeX 19 pages, no figure
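A short worked derivation of the "if and only if" statement, sketched under assumptions not spelled out in the abstract: the basis ordering $(\nu_{\rm L}, N_{\rm R}^c)$, the usual symmetric block form of the $6\times 6$ mass matrix, and an invertible $M_{\rm R}$. It factors the matrix through the Schur complement of $M_{\rm R}$, which is also the leading-order Type-II seesaw formula for the light-neutrino mass matrix $M_\nu$:

```latex
\mathcal{M} =
\begin{pmatrix} M_{\rm L} & M_{\rm D} \\ M_{\rm D}^T & M_{\rm R} \end{pmatrix}
=
\begin{pmatrix} \mathbf{1} & M_{\rm D} M_{\rm R}^{-1} \\ 0 & \mathbf{1} \end{pmatrix}
\begin{pmatrix} M_{\rm L} - M_{\rm D} M_{\rm R}^{-1} M_{\rm D}^T & 0 \\ 0 & M_{\rm R} \end{pmatrix}
\begin{pmatrix} \mathbf{1} & 0 \\ M_{\rm R}^{-1} M_{\rm D}^T & \mathbf{1} \end{pmatrix},
\qquad
M_\nu \simeq M_{\rm L} - M_{\rm D} M_{\rm R}^{-1} M_{\rm D}^T .
```

The outer factors are invertible, so ${\rm rank}\,\mathcal{M} = {\rm rank}\,(M_{\rm L} - M_{\rm D} M_{\rm R}^{-1} M_{\rm D}^T) + 3$. Three of the six physical masses (the singular values of $\mathcal{M}$) therefore vanish exactly when the rank is 3, i.e. exactly when $M_{\rm L} = M_{\rm D} M_{\rm R}^{-1} M_{\rm D}^T$. At leading order, a small nonzero difference between the two terms gives small but nonzero light masses, which is the incomplete cancellation described above.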

arXiv.org e-Print Archive

Crossref

Accurate and Fast Retrieval for Complex Non-metric Data via Neighborhood Graphs

Author: B Naidan
DD Lewis
DM Blei
DW Jacobs
E Chávez
G Chechik
GR Hjaltason
GT Toussaint
H Samet
L Boytsov
M Aumüller
S Kullback
S Robertson
T Skopal
Y Malkov
Publication venue: Springer Science and Business Media LLC
Publication date: 08/10/2019
Field of study

We demonstrate that a graph-based search algorithm, relying on the construction of an approximate neighborhood graph, can work directly with challenging non-metric and/or non-symmetric distances without resorting to metric-space mapping and/or distance symmetrization, which, in turn, leads to substantial performance degradation. Although straightforward metrization and symmetrization are usually ineffective, we find that constructing an index using a modified (e.g., symmetrized) distance can improve performance. This observation paves the way for a new line of research: designing index-specific graph-construction distance functions.
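The idea of building the index with one distance and searching with another can be sketched in plain Python. The following choices are assumptions, not the paper's method: a brute-force k-NN graph stands in for an approximate neighborhood graph, KL divergence serves as the non-symmetric search distance, the construction distance is its arithmetic-mean symmetrization, and search is a single greedy walk. All names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def kl_divergence(p: Vector, q: Vector) -> float:
    """KL(p || q) for strictly positive discrete distributions; not symmetric."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetrized(dist: Distance) -> Distance:
    """Average of both argument orders: one possible symmetrization."""
    return lambda a, b: 0.5 * (dist(a, b) + dist(b, a))


def build_knn_graph(points: List[Vector], k: int, dist: Distance) -> List[List[int]]:
    """Brute-force k-NN graph: node i links to its k closest other points."""
    graph = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        graph.append(others[:k])
    return graph


def greedy_search(graph: List[List[int]], points: List[Vector], query: Vector,
                  dist: Distance, start: int = 0) -> int:
    """Move to the neighbor closest to the query until no neighbor improves."""
    current = start
    best = dist(query, points[current])
    while True:
        candidate = min(graph[current], key=lambda j: dist(query, points[j]))
        cand_dist = dist(query, points[candidate])
        if cand_dist >= best:
            return current
        current, best = candidate, cand_dist
```

With nine two-outcome distributions `[p, 1 - p]` for p = 0.1, ..., 0.9, a graph built with `k=2` under the symmetrized distance, and the query `[0.42, 0.58]`, the greedy walk from node 0 stops at index 3 (p = 0.4), the true nearest point under KL(query || point). A practical index would use a beam search with several entry points, since a single greedy walk can stop at a local minimum.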

arXiv.org e-Print Archive

Crossref