Decoding billions of integers per second through vectorization
In many important applications -- such as search engines and relational
database systems -- data is stored in the form of arrays of integers. Encoding
and, most importantly, decoding of these arrays consumes considerable CPU time.
Therefore, substantial effort has been made to reduce costs associated with
compression and decompression. In particular, researchers have exploited the
superscalar nature of modern processors and SIMD instructions. Nevertheless, we
introduce a novel vectorized scheme called SIMD-BP128 that improves over
previously proposed vectorized approaches. It is nearly twice as fast as the
previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the
same time, SIMD-BP128 saves up to 2 bits per integer. For even better
compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has
a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while
being two times faster during decoding.
Comment: For software, see https://github.com/lemire/FastPFor; for data, see
http://boytsov.info/datasets/clueweb09gap
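To make the bit-packing idea concrete, the following is a minimal scalar sketch in Python of the delta-plus-binary-packing layout that schemes such as SIMD-BP128 accelerate with SIMD instructions; it illustrates the concept only and is not the FastPFor implementation.

```python
# Minimal scalar sketch of the binary-packing idea behind schemes such as
# SIMD-BP128: store the gaps (deltas) of a sorted integer array using the
# smallest bit width that fits every gap in the block. Real implementations
# pack the same layout in 128-integer blocks using SIMD instructions.

def encode_block(block):
    """Delta-encode a sorted block and pack the gaps into one big integer."""
    gaps = [block[0]] + [b - a for a, b in zip(block, block[1:])]
    width = max(g.bit_length() for g in gaps) or 1   # bits needed per gap
    packed = 0
    for i, g in enumerate(gaps):
        packed |= g << (i * width)
    return width, len(gaps), packed

def decode_block(width, count, packed):
    """Unpack the gaps and rebuild the original values by prefix sum."""
    mask = (1 << width) - 1
    values, running = [], 0
    for i in range(count):
        running += (packed >> (i * width)) & mask
        values.append(running)
    return values

if __name__ == "__main__":
    docids = sorted({7 * i + (i % 5) for i in range(128)})
    meta = encode_block(docids)
    assert decode_block(*meta) == docids
    print(f"{meta[0]} bits per integer instead of 32")
```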
Fast Hands-free Writing by Gaze Direction
We describe a method for text entry based on inverse arithmetic coding that
relies on gaze direction and which is faster and more accurate than using an
on-screen keyboard.
These benefits are derived from two innovations: the writing task is matched
to the capabilities of the eye, and a language model is used to make
predictable words and phrases easier to write.
Comment: 3 pages. Final version
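As a rough illustration of the arithmetic-coding view of writing that such systems build on, the sketch below shows how a toy language model maps next-character probabilities to interval widths, so that predictable continuations occupy larger, easier-to-select regions; the probabilities are invented for illustration and this is not the authors' system.

```python
# Illustrative sketch (not the authors' system): in the arithmetic-coding view
# of writing, a language model assigns each possible next character an interval
# whose width is proportional to its probability, so predictable continuations
# occupy large, easy-to-select regions of the display.

def intervals(probs):
    """Map {char: probability} to {char: (low, high)} sub-intervals of [0, 1)."""
    low, out = 0.0, {}
    for ch, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        out[ch] = (low, low + p)
        low += p
    return out

# Toy model: after the prefix "hell", 'o' is far more likely than other letters.
model = {"o": 0.85, "a": 0.05, "e": 0.05, " ": 0.05}
for ch, (lo, hi) in intervals(model).items():
    print(f"{ch!r}: width {hi - lo:.2f}")
```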
Investigating five key predictive text entry with combined distance and keystroke modelling
This paper investigates text entry on mobile devices using only five keys. Primarily intended to support text entry on devices smaller than mobile phones, the method can also be used to maximise screen space on mobile phones. The combined Fitts' law and keystroke modelling reported here predicts similar performance with bigram prediction on a five-key keypad as is currently achieved on standard mobile phones using unigram prediction. User studies reported here show similar user performance on five-key pads as found elsewhere for novice nine-key pad users.
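For readers unfamiliar with the modelling ingredients, the sketch below combines Fitts' law movement times with a keystroke-level sum to estimate word entry time; the coefficients and key distances are illustrative assumptions, not the fitted values from this study.

```python
# Hedged illustration of the kind of combined model described above: Fitts' law
# gives the time to move to a key, and a keystroke-level sum gives the time to
# enter a word. The coefficients and key geometry are illustrative assumptions.
import math

A, B = 0.10, 0.15          # Fitts' law intercept/slope in seconds (assumed)

def fitts_time(distance, width):
    """Movement time to reach a key of the given width at the given distance."""
    return A + B * math.log2(distance / width + 1)

def word_entry_time(distances, key_width=1.0):
    """Sum Fitts' movement times over a sequence of inter-key distances."""
    return sum(fitts_time(d, key_width) for d in distances)

# Example: four keystrokes on a five-key pad, distances measured in key widths.
print(f"{word_entry_time([2.0, 1.0, 3.0, 1.0]):.2f} s")
```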
Bekenstein entropy bound for weakly-coupled field theories on a 3-sphere
We calculate the high temperature partition functions for SU(Nc) or U(Nc)
gauge theories in the deconfined phase on S^1 x S^3, with scalars, vectors,
and/or fermions in an arbitrary representation, at zero 't Hooft coupling and
large Nc, using analytical methods. We compare these with numerical results
which are also valid in the low temperature limit and show that the Bekenstein
entropy bound resulting from the partition functions for theories with any
amount of massless scalar, fermionic, and/or vector matter is always satisfied
when the zero-point contribution is included, while the theory is sufficiently
far from a phase transition. We further consider the effect of adding massive
scalar or fermionic matter and show that the Bekenstein bound is satisfied when
the Casimir energy is regularized under the constraint that it vanishes in the
large mass limit. These calculations can be generalized straightforwardly for
the case of a different number of spatial dimensions.
Comment: 32 pages, 12 figures. v2: Clarifications added. JHEP version
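For reference, the bound being tested is the standard Bekenstein bound, which in natural units reads as follows, with E the total energy of the matter on the sphere (here including the zero-point contribution) and R the radius of the bounding sphere:

```latex
% Standard Bekenstein bound in natural units (\hbar = c = k_B = 1):
% the entropy S of matter with energy E contained in a sphere of radius R obeys
\[
  S \;\le\; 2\pi E R .
\]
```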
Holographic Approach to Regge Trajectory and Rotating D5 brane
We study the Regge trajectories of holographic mesons and baryons by
considering rotating strings and a rotating D5 brane, which is introduced as
the baryon vertex. Our model is based on the type IIB superstring theory with a
background that is asymptotically AdS_5 x S^5. This background is dual to a
confining supersymmetric Yang-Mills theory (SYM) with a gauge condensate,
<F^2>, which determines the tension of the linear potential between the quark
and anti-quark. The slope of the meson trajectory is then set by this
condensate at large spin. This relation is compatible with other theoretical
results and with experiments. For the baryon, we show the importance of a
spinning baryon vertex for obtaining a Regge slope compatible with that of the
observed baryon series. In both cases, mesons and baryons, the trajectories are
shifted to the large-mass side, with the same slope, as the current quark mass
increases.
Comment: 28 pages, 7 figures
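For context, the Regge slope discussed above is defined by the standard linear trajectory relating a hadron's spin and mass squared:

```latex
% Standard linear Regge trajectory: J is the hadron spin, M its mass,
% \alpha' the Regge slope and \alpha_0 the intercept. This generic relation
% defines the slope that the paper ties to the gauge condensate.
\[
  J \;=\; \alpha_0 + \alpha' M^2 .
\]
```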
An Overview of the Use of Neural Networks for Data Mining Tasks
In recent years the area of data mining has experienced considerable demand for technologies that extract knowledge from large and complex data sources. There is substantial commercial interest, as well as research activity, in developing new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular biologically inspired intelligent methodologies whose classification, prediction and pattern recognition capabilities have been utilised successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights, from a data mining perspective, the implementation of NN, using supervised and unsupervised learning, for pattern recognition, classification, prediction and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks.
Natural language analysis of online health forums
Despite advances in concept extraction from free text, finding
meaningful health-related information in online patient forums
still poses a significant challenge. Here we demonstrate how structured
information can be extracted from posts in such online health
forums by forming relationships between a drug/treatment and a
symptom or side effect, including the polarity/sentiment of the patient.
In particular, a rule-based natural language processing (NLP) system
is deployed, where information in sentences is linked together through
anaphora resolution. Our NLP relationship extraction system provides
a strong baseline, achieving an F1 score of over 80% in discovering the
said relationships present in the posts we analysed.
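As a minimal sketch of the kind of rule-based relation extraction described above, the snippet below pairs drug and symptom mentions within a sentence and attaches a crude polarity; the lexicons and the single rule are illustrative assumptions, not the authors' system, which additionally resolves anaphora across sentences.

```python
# Minimal rule-based drug/side-effect relation extraction sketch. The lexicons
# and the single negation rule are illustrative assumptions only; a real system
# would use richer rules plus anaphora resolution across sentences.
import re

DRUGS = {"ibuprofen", "metformin", "sertraline"}
SYMPTOMS = {"nausea", "headache", "dizziness"}
NEGATIONS = {"no", "not", "never", "without"}

def extract_relations(post):
    """Return (drug, symptom, polarity) triples found within single sentences."""
    relations = []
    for sentence in re.split(r"[.!?]", post.lower()):
        tokens = sentence.split()
        drugs = [t for t in tokens if t in DRUGS]
        symptoms = [t for t in tokens if t in SYMPTOMS]
        polarity = "negated" if any(t in NEGATIONS for t in tokens) else "asserted"
        relations += [(d, s, polarity) for d in drugs for s in symptoms]
    return relations

print(extract_relations("Metformin gave me nausea. I had no dizziness on sertraline"))
```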
Identify error-sensitive patterns by decision tree
When errors are inevitable during data classification, finding the particular parts of the classification model that are more susceptible to error than others, rather than looking for an Achilles' heel of the model in a casual way, may help uncover specific error-sensitive value patterns and lead to additional error-reduction measures. As an initial phase of the investigation, this study narrows the scope of the problem by focusing on decision trees as a pilot model. It develops a simple and effective tagging method that digitizes the individual nodes of a binary decision tree for node-level analysis, links and tracks classification statistics for each node in a transparent way, facilitates the identification and examination of the potentially "weakest" nodes and error-sensitive value patterns in decision trees, and assists cause analysis and enhancement development. This digitization method is not an attempt to re-develop or transform the existing decision tree model; rather, it is a pragmatic node-ID formulation that crafts numeric values to reflect the tree structure and decision-making paths, extending post-classification analysis to the detailed node level. Initial experiments have shown successful results in locating potentially high-risk attribute and value patterns, which is an encouraging sign that this study is worth further exploration.
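One concrete way to realise such node-level tagging is sketched below: a numeric ID that encodes each node's path from the root, plus per-node error counts; the particular numbering rule is an assumption chosen for illustration rather than the paper's exact formulation.

```python
# Illustrative node-ID scheme for a binary decision tree: each node gets a
# numeric ID that encodes its path from the root (left child = 2*id,
# right child = 2*id + 1), so per-node error statistics can be linked back to a
# decision path. The numbering rule is an assumption, not the paper's method.
from collections import defaultdict

def node_id(path, root_id=1):
    """Path is a sequence of 0 (left) / 1 (right) branch decisions."""
    nid = root_id
    for branch in path:
        nid = 2 * nid + branch
    return nid

# Accumulate classification outcomes per node ID, then rank nodes by error rate.
stats = defaultdict(lambda: [0, 0])            # node_id -> [errors, total]
observations = [((0, 1), True), ((0, 1), False), ((1,), True), ((0, 1), False)]
for path, correct in observations:
    nid = node_id(path)
    stats[nid][1] += 1
    stats[nid][0] += 0 if correct else 1

for nid, (err, total) in sorted(stats.items(), key=lambda kv: -kv[1][0] / kv[1][1]):
    print(f"node {nid}: error rate {err / total:.2f} over {total} samples")
```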
Decision level ensemble method for classifying multi-media data
In the digital era, the data for a given analytical task can be collected in different formats, such as text, images and audio. Data with multiple formats are called multimedia data, and integrating and fusing multimedia datasets has become a challenging task in machine learning and data mining. In this paper, we present a heterogeneous ensemble method that combines multimedia datasets at the decision level. Our method consists of several components: extracting features from the multimedia datasets that are not already represented by features, modelling each multimedia dataset independently, selecting models based on their accuracy and diversity, and building the ensemble at the decision level. Hence our method is called the decision level ensemble method (DLEM). The method is tested on multimedia data and compared with other heterogeneous ensemble based methods; the results show that the DLEM outperforms these methods significantly.
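The decision-level fusion step can be pictured with the minimal sketch below, where modality-specific predictions are combined by an accuracy-weighted vote; the models and weights are illustrative stand-ins, and the sketch omits the DLEM's model selection by accuracy and diversity.

```python
# Minimal sketch of decision-level fusion: each modality-specific model casts a
# vote on the class label, weighted by that model's validation accuracy. The
# model names and weights are illustrative stand-ins, not the DLEM components.
from collections import defaultdict

def fuse_decisions(predictions, weights):
    """predictions: {model_name: label}; weights: {model_name: accuracy}."""
    scores = defaultdict(float)
    for model, label in predictions.items():
        scores[label] += weights.get(model, 1.0)
    return max(scores, key=scores.get)

preds = {"text_model": "spam", "image_model": "ham", "audio_model": "spam"}
accs = {"text_model": 0.92, "image_model": 0.81, "audio_model": 0.66}
print(fuse_decisions(preds, accs))   # -> 'spam' (weighted vote 1.58 vs 0.81)
```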
Using data mining for prediction of hospital length of stay: an application of the CRISP-DM Methodology
Hospitals nowadays collect vast amounts of data related to patient records. These data hold valuable knowledge that can be used to improve hospital decision making, and data mining techniques aim precisely at the extraction of useful knowledge from raw data. This work describes the implementation of a medical data mining project based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, related to inpatient hospitalization were collected from a Portuguese hospital. The goal was to predict generic hospital Length Of Stay based on indicators that are commonly available at the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted, in which six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best model was obtained by the Random Forest method, which presents a high coefficient of determination (0.81). This model was then opened using a sensitivity analysis procedure, which revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized, and the associated medical specialty. Such extracted knowledge confirms that the predictive model is credible and has potential value for supporting the decisions of hospital managers.
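A hedged sketch of the modelling stage, using scikit-learn and synthetic data in place of the (non-public) hospital records, shows how such a random forest regressor would be fitted and scored with the coefficient of determination:

```python
# Sketch of the modelling stage: fit a random forest regressor and score it
# with the coefficient of determination (R^2). The synthetic data and the
# 14-column feature matrix stand in for the hospital records, which are not
# public; scikit-learn is assumed to be available.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 14))                                     # 14 inputs
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=2000)    # synthetic LOS proxy

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"R^2 on held-out data: {r2_score(y_te, model.predict(X_te)):.2f}")
```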
