
    Decoding billions of integers per second through vectorization

    In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time. Therefore, substantial effort has been made to reduce costs associated with compression and decompression. In particular, researchers have exploited the superscalar nature of modern processors and SIMD instructions. Nevertheless, we introduce a novel vectorized scheme called SIMD-BP128 that improves over previously proposed vectorized approaches. It is nearly twice as fast as the previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the same time, SIMD-BP128 saves up to 2 bits per integer. For even better compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while being two times faster during decoding.
    Comment: For software, see https://github.com/lemire/FastPFor; for data, see http://boytsov.info/datasets/clueweb09gap
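    The core of binary packing schemes such as SIMD-BP128 is storing each integer of a block with just enough bits for the block's largest value; the vectorized scheme does this for blocks of 128 integers using SIMD registers. A minimal scalar sketch of the underlying bit packing (`pack`/`unpack` are illustrative names, not the FastPFor library's API):

```python
def pack(values, b):
    """Pack integers into b bits each, within a stream of 32-bit words."""
    out, acc, nbits = [], 0, 0
    for v in values:
        assert 0 <= v < (1 << b), "value does not fit in b bits"
        acc |= v << nbits      # append b bits to the accumulator
        nbits += b
        while nbits >= 32:     # flush full 32-bit words
            out.append(acc & 0xFFFFFFFF)
            acc >>= 32
            nbits -= 32
    if nbits:
        out.append(acc)        # flush the partial last word
    return out

def unpack(words, b, n):
    """Decode n integers of b bits each from a stream of 32-bit words."""
    vals, acc, nbits, i = [], 0, 0, 0
    mask = (1 << b) - 1
    for _ in range(n):
        while nbits < b:       # refill the accumulator
            acc |= words[i] << nbits
            i += 1
            nbits += 32
        vals.append(acc & mask)
        acc >>= b
        nbits -= b
    return vals
```

    In practice such schemes are applied to deltas (gaps) between sorted integers; the library linked above provides the actual vectorized implementations.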

    Fast Hands-free Writing by Gaze Direction

    We describe a method for text entry based on inverse arithmetic coding that relies on gaze direction and is faster and more accurate than using an on-screen keyboard. These benefits derive from two innovations: the writing task is matched to the capabilities of the eye, and a language model is used to make predictable words and phrases easier to write.
    Comment: 3 pages. Final version.
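    The idea behind writing by inverse arithmetic coding can be sketched as follows: the language model partitions an interval in proportion to character probabilities, each gaze selection zooms into one sub-interval, and steering through nested intervals spells out text. A toy sketch with a fixed, context-free character model (function names are hypothetical; the real system conditions probabilities on context):

```python
def intervals(probs):
    """Partition [0, 1) proportionally to character probabilities."""
    spans, lo = {}, 0.0
    for ch, p in sorted(probs.items()):
        spans[ch] = (lo, lo + p)
        lo += p
    return spans

def write(selections, probs):
    """Each 'gaze selection' picks one character's sub-interval;
    zooming into nested intervals accumulates the written text."""
    lo, hi, text = 0.0, 1.0, ""
    for ch in selections:
        a, b = intervals(probs)[ch]
        lo, hi = lo + (hi - lo) * a, lo + (hi - lo) * b
        text += ch
    return text, (lo, hi)
```

    Probable characters get wide intervals, so predictable text requires less precise eye movement, which is where the speed and accuracy gains come from.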

    Investigating five key predictive text entry with combined distance and keystroke modelling

    This paper investigates text entry on mobile devices using only five keys. Although intended primarily to support text entry on devices smaller than mobile phones, this method can also be used to maximise screen space on mobile phones. Combined Fitts' law and keystroke modelling reported here predicts that bigram prediction on a five-key keypad achieves performance similar to what standard mobile phones currently achieve with unigram prediction. User studies reported here show user performance on five-key pads similar to that found elsewhere for novice nine-key pad users.
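    The distance component of such models is Fitts' law, which predicts the time to move to a key from its distance and width. A sketch of the common Shannon formulation, with illustrative constants rather than the paper's fitted parameters:

```python
import math

def fitts_time(distance, width, a=0.0, b=0.2):
    """Fitts' law (Shannon form): MT = a + b * log2(D/W + 1).
    a and b are empirically fitted device constants; the defaults
    here are illustrative assumptions, not fitted values."""
    return a + b * math.log2(distance / width + 1)
```

    A keystroke-level model then sums such movement times (plus key-press times) over the key sequence a prediction method requires for each word, which is how five-key and nine-key layouts can be compared analytically.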

    Bekenstein entropy bound for weakly-coupled field theories on a 3-sphere

    We calculate the high-temperature partition functions for SU(Nc) or U(Nc) gauge theories in the deconfined phase on S^1 x S^3, with scalars, vectors, and/or fermions in an arbitrary representation, at zero 't Hooft coupling and large Nc, using analytical methods. We compare these with numerical results, which are also valid in the low-temperature limit, and show that the Bekenstein entropy bound resulting from the partition functions for theories with any amount of massless scalar, fermionic, and/or vector matter is always satisfied when the zero-point contribution is included, provided the theory is sufficiently far from a phase transition. We further consider the effect of adding massive scalar or fermionic matter and show that the Bekenstein bound is satisfied when the Casimir energy is regularized under the constraint that it vanishes in the large-mass limit. These calculations can be generalized straightforwardly to the case of a different number of spatial dimensions.
    Comment: 32 pages, 12 figures. v2: Clarifications added. JHEP version.
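    For reference, the bound being tested is the standard Bekenstein bound: in natural units, a system of energy E (here including the zero-point contribution) confined within a sphere of radius R must satisfy

```latex
S \;\le\; 2\pi E R .
```

    Here R is the radius of the spatial S^3, and the abstract's claim is that the entropy computed from the partition function respects this inequality for the matter content described.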

    Holographic Approach to Regge Trajectory and Rotating D5 brane

    We study the Regge trajectories of holographic mesons and baryons by considering rotating strings and a D5 brane, which is introduced as the baryon vertex. Our model is based on type IIB superstring theory with an asymptotically $AdS_5\times S^5$ background. This background is dual to a confining supersymmetric Yang-Mills theory (SYM) with a gauge condensate, which determines the tension of the linear potential between the quark and anti-quark. The slope of the meson trajectory ($\alpha'_{M}$) is then given by this condensate as $\alpha'_{M}=1/\sqrt{\pi}$ at large spin $J$. This relation is compatible with other theoretical results and with experiments. For the baryon, we show the importance of a spinning baryon vertex for obtaining a Regge slope compatible with that of the $N$ and $\Delta$ series. In both cases, mesons and baryons, the trajectories are shifted to the large-mass side with the same slope as the current quark mass increases.
    Comment: 28 pages, 7 figures.
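    For context, a linear Regge trajectory relates the spin and squared mass of a family of states; at large $J$ the meson trajectory discussed above takes the standard linear form

```latex
J \;\simeq\; \alpha'_{M}\, M^{2} + \mathrm{const.}
```

    so the slope $\alpha'_{M}$ fixed by the gauge condensate is directly comparable to the slopes extracted from experimental meson and baryon spectra.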

    Natural language analysis of online health forums

    Despite advances in concept extraction from free text, finding meaningful health-related information in online patient forums still poses a significant challenge. Here we demonstrate how structured information can be extracted from posts in such online health forums by forming relationships between a drug/treatment and a symptom or side effect, including the polarity/sentiment of the patient. In particular, a rule-based natural language processing (NLP) system is deployed, in which information in sentences is linked together through anaphora resolution. Our NLP relationship extraction system provides a strong baseline, achieving an F1 score of over 80% in discovering the said relationships present in the posts we analysed.
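    To give a flavour of rule-based relationship extraction, here is a deliberately minimal sketch: tiny hand-made lexicons (purely illustrative, not the paper's grammar or vocabularies) link a drug mention and a symptom mention within one sentence and attach a crude polarity cue. The actual system additionally resolves anaphora across sentences.

```python
import re

# Hypothetical toy lexicons; a real system would use curated vocabularies.
DRUGS = {"ibuprofen", "metformin"}
SYMPTOMS = {"nausea", "headache"}
NEGATIONS = {"no", "not", "without", "stopped"}

def extract(sentence):
    """Return (drug, symptom, polarity) triples found in one sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    drugs = [t for t in tokens if t in DRUGS]
    symptoms = [t for t in tokens if t in SYMPTOMS]
    polarity = "negative" if any(t in NEGATIONS for t in tokens) else "positive"
    return [(d, s, polarity) for d in drugs for s in symptoms]
```

    Rules of this shape are transparent and easy to audit, which is why rule-based systems remain strong baselines on specialised forum text.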

    Identify error-sensitive patterns by decision tree

    © Springer International Publishing Switzerland 2015. When errors are inevitable during data classification, finding the particular parts of a classification model that are more susceptible to error than others, rather than searching casually for an Achilles’ heel of the model, may help uncover specific error-sensitive value patterns and lead to additional error-reduction measures. As an initial phase of the investigation, this study narrows the scope of the problem by focusing on decision trees as a pilot model. It develops a simple and effective tagging method that digitizes the individual nodes of a binary decision tree for node-level analysis, linking and tracking classification statistics for each node in a transparent way. This facilitates the identification and examination of the potentially “weakest” nodes and error-sensitive value patterns in decision trees, and assists cause analysis and enhancement development. The digitization method is not an attempt to re-develop or transform the existing decision tree model; rather, it is a pragmatic node-ID formulation that crafts numeric values to reflect the tree structure and decision-making paths, extending post-classification analysis to the detailed node level. Initial experiments have successfully located potentially high-risk attribute and value patterns, an encouraging sign that this study is worth further exploration.
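    One simple way to digitize binary-tree nodes so that the numeric ID itself encodes the decision path is heap-style numbering: the root is 1, a left branch maps id to 2*id, and a right branch to 2*id + 1. This is an illustrative formulation in the spirit of the tagging method described, not necessarily the paper's exact scheme:

```python
def node_id(path):
    """Digitize a root-to-node path: each branch appends one bit,
    0 for left and 1 for right, so the id encodes the full path."""
    nid = 1
    for branch in path:  # branch is 'L' or 'R'
        nid = 2 * nid + (1 if branch == "R" else 0)
    return nid

def path_from_id(nid):
    """Recover the decision path from a node id (inverse of node_id)."""
    path = []
    while nid > 1:
        path.append("R" if nid & 1 else "L")
        nid //= 2
    return path[::-1]
```

    With such IDs, per-node error statistics can be stored in a flat table keyed by integer, and the "weakest" nodes can be traced back to the exact attribute tests along their paths.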

    Decision level ensemble method for classifying multi-media data

    In the digital era, the data for a given analytical task can be collected in different formats, such as text, images and audio. Data with multiple formats are called multimedia data, and integrating and fusing multimedia datasets has become a challenging task in machine learning and data mining. In this paper, we present a heterogeneous ensemble method that combines multimedia datasets at the decision level. Our method consists of several components: extracting features from the multimedia datasets that are not already represented by features, modelling independently on each of the multimedia datasets, selecting models based on their accuracy and diversity, and building the ensemble at the decision level. Hence our method is called the decision-level ensemble method (DLEM). The method is tested on multimedia data and compared with other heterogeneous ensemble-based methods. The results show that the DLEM significantly outperformed these methods.
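    At the decision level, fusion reduces to combining the labels predicted by the per-modality models, for example by weighted majority vote. A minimal sketch of that final step (the DLEM's selection of models by accuracy and diversity is not shown):

```python
from collections import Counter

def decision_level_ensemble(predictions, weights=None):
    """Fuse per-model predictions by (weighted) majority vote.
    predictions: one list of labels per model, aligned by instance;
    returns one fused label per instance."""
    weights = weights or [1.0] * len(predictions)
    fused = []
    for instance in zip(*predictions):  # labels for one instance
        votes = Counter()
        for label, w in zip(instance, weights):
            votes[label] += w
        fused.append(votes.most_common(1)[0][0])
    return fused
```

    Decision-level fusion only needs each model's output labels, which is what makes it attractive when the underlying modalities (text, image, audio) have incompatible feature spaces.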

    Using data mining for prediction of hospital length of stay: an application of the CRISP-DM Methodology

    Hospitals nowadays collect vast amounts of data related to patient records. All these data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at the extraction of useful knowledge from raw data. This work describes the implementation of a medical data mining project based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, were collected from a Portuguese hospital and relate to inpatient hospitalizations. The goal was to predict generic hospital Length of Stay based on indicators that are commonly available at the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted in which six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best model was obtained by the Random Forest method, which presents a high coefficient of determination (0.81). This model was then opened up using a sensitivity analysis procedure, which revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized, and the associated medical specialty. Such extracted knowledge confirms that the obtained predictive model is credible and has potential value for supporting the decisions of hospital managers.
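    As a small illustration of the evaluation, the reported quality metric is the coefficient of determination (R²), and "Average Prediction" is the naive baseline that predicts the mean training length of stay for every patient. A self-contained sketch of both (illustrative, not the paper's pipeline):

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def average_prediction(train_y, test_X):
    """Naive baseline: predict the mean training length of stay
    for every test instance, regardless of its attributes."""
    mean = sum(train_y) / len(train_y)
    return [mean for _ in test_X]
```

    By construction the mean baseline scores R² = 0 on its own training data, so a Random Forest reaching 0.81 captures most of the explainable variance relative to that baseline.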