
    Decoding billions of integers per second through vectorization

    In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time. Therefore, substantial effort has been made to reduce costs associated with compression and decompression. In particular, researchers have exploited the superscalar nature of modern processors and SIMD instructions. Nevertheless, we introduce a novel vectorized scheme called SIMD-BP128 that improves over previously proposed vectorized approaches. It is nearly twice as fast as the previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the same time, SIMD-BP128 saves up to 2 bits per integer. For even better compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while being two times faster during decoding.
    Comment: For software, see https://github.com/lemire/FastPFor; for data, see http://boytsov.info/datasets/clueweb09gap
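    The core of binary packing schemes such as SIMD-BP128 is storing each integer of a block with just enough bits for the block's largest value; the vectorized scheme does this for blocks of 128 integers using SIMD registers. A minimal scalar sketch of the underlying bit packing (`pack`/`unpack` are illustrative names, not the FastPFor library's API):

```python
def pack(values, b):
    """Pack integers into b bits each, within a stream of 32-bit words."""
    out, acc, nbits = [], 0, 0
    for v in values:
        assert 0 <= v < (1 << b), "value does not fit in b bits"
        acc |= v << nbits      # append b bits to the accumulator
        nbits += b
        while nbits >= 32:     # flush full 32-bit words
            out.append(acc & 0xFFFFFFFF)
            acc >>= 32
            nbits -= 32
    if nbits:
        out.append(acc)        # flush the partial last word
    return out

def unpack(words, b, n):
    """Decode n integers of b bits each from a stream of 32-bit words."""
    vals, acc, nbits, i = [], 0, 0, 0
    mask = (1 << b) - 1
    for _ in range(n):
        while nbits < b:       # refill the accumulator
            acc |= words[i] << nbits
            i += 1
            nbits += 32
        vals.append(acc & mask)
        acc >>= b
        nbits -= b
    return vals
```

    In practice such schemes are applied to deltas (gaps) between sorted integers; the library linked above provides the actual vectorized implementations.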

    Fast Hands-free Writing by Gaze Direction

    We describe a method for text entry based on inverse arithmetic coding that relies on gaze direction and is faster and more accurate than using an on-screen keyboard. These benefits derive from two innovations: the writing task is matched to the capabilities of the eye, and a language model is used to make predictable words and phrases easier to write.
    Comment: 3 pages. Final version.
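    The idea behind writing by inverse arithmetic coding can be sketched as follows: the language model partitions an interval in proportion to character probabilities, each gaze selection zooms into one sub-interval, and steering through nested intervals spells out text. A toy sketch with a fixed, context-free character model (function names are hypothetical; the real system conditions probabilities on context):

```python
def intervals(probs):
    """Partition [0, 1) proportionally to character probabilities."""
    spans, lo = {}, 0.0
    for ch, p in sorted(probs.items()):
        spans[ch] = (lo, lo + p)
        lo += p
    return spans

def write(selections, probs):
    """Each 'gaze selection' picks one character's sub-interval;
    zooming into nested intervals accumulates the written text."""
    lo, hi, text = 0.0, 1.0, ""
    for ch in selections:
        a, b = intervals(probs)[ch]
        lo, hi = lo + (hi - lo) * a, lo + (hi - lo) * b
        text += ch
    return text, (lo, hi)
```

    Probable characters get wide intervals, so predictable text requires less precise eye movement, which is where the speed and accuracy gains come from.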

    Investigating five key predictive text entry with combined distance and keystroke modelling

    This paper investigates text entry on mobile devices using only five keys. Although intended primarily to support text entry on devices smaller than mobile phones, this method can also be used to maximise screen space on mobile phones. Combined Fitts' law and keystroke modelling reported here predicts that bigram prediction on a five-key keypad achieves performance similar to what standard mobile phones currently achieve with unigram prediction. User studies reported here show user performance on five-key pads similar to that found elsewhere for novice nine-key pad users.
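    The distance component of such models is Fitts' law, which predicts the time to move to a key from its distance and width. A sketch of the common Shannon formulation, with illustrative constants rather than the paper's fitted parameters:

```python
import math

def fitts_time(distance, width, a=0.0, b=0.2):
    """Fitts' law (Shannon form): MT = a + b * log2(D/W + 1).
    a and b are empirically fitted device constants; the defaults
    here are illustrative assumptions, not fitted values."""
    return a + b * math.log2(distance / width + 1)
```

    A keystroke-level model then sums such movement times (plus key-press times) over the key sequence a prediction method requires for each word, which is how five-key and nine-key layouts can be compared analytically.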

    Bekenstein entropy bound for weakly-coupled field theories on a 3-sphere

    We calculate the high-temperature partition functions for SU(Nc) or U(Nc) gauge theories in the deconfined phase on S^1 x S^3, with scalars, vectors, and/or fermions in an arbitrary representation, at zero 't Hooft coupling and large Nc, using analytical methods. We compare these with numerical results, which are also valid in the low-temperature limit, and show that the Bekenstein entropy bound resulting from the partition functions for theories with any amount of massless scalar, fermionic, and/or vector matter is always satisfied when the zero-point contribution is included, provided the theory is sufficiently far from a phase transition. We further consider the effect of adding massive scalar or fermionic matter and show that the Bekenstein bound is satisfied when the Casimir energy is regularized under the constraint that it vanishes in the large-mass limit. These calculations can be generalized straightforwardly to the case of a different number of spatial dimensions.
    Comment: 32 pages, 12 figures. v2: Clarifications added. JHEP version.
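    For reference, the bound being tested is the standard Bekenstein bound: in natural units, a system of energy E (here including the zero-point contribution) confined within a sphere of radius R must satisfy

```latex
S \;\le\; 2\pi E R .
```

    Here R is the radius of the spatial S^3, and the abstract's claim is that the entropy computed from the partition function respects this inequality for the matter content described.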

    Holographic Approach to Regge Trajectory and Rotating D5 brane

    We study the Regge trajectories of holographic mesons and baryons by considering rotating strings and a D5 brane, which is introduced as the baryon vertex. Our model is based on type IIB superstring theory with an asymptotically $AdS_5\times S^5$ background. This background is dual to a confining supersymmetric Yang-Mills theory (SYM) with a gauge condensate, which determines the tension of the linear potential between the quark and anti-quark. The slope of the meson trajectory ($\alpha'_{M}$) is then given by this condensate as $\alpha'_{M}=1/\sqrt{\pi}$ at large spin $J$. This relation is compatible with other theoretical results and with experiments. For the baryon, we show the importance of a spinning baryon vertex for obtaining a Regge slope compatible with that of the $N$ and $\Delta$ series. In both cases, mesons and baryons, the trajectories are shifted to the large-mass side with the same slope as the current quark mass increases.
    Comment: 28 pages, 7 figures.
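    For context, a linear Regge trajectory relates the spin and squared mass of a family of states; at large $J$ the meson trajectory discussed above takes the standard linear form

```latex
J \;\simeq\; \alpha'_{M}\, M^{2} + \mathrm{const.}
```

    so the slope $\alpha'_{M}$ fixed by the gauge condensate is directly comparable to the slopes extracted from experimental meson and baryon spectra.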

    Natural language analysis of online health forums

    Despite advances in concept extraction from free text, finding meaningful health-related information in online patient forums still poses a significant challenge. Here we demonstrate how structured information can be extracted from posts in such online health forums by forming relationships between a drug/treatment and a symptom or side effect, including the polarity/sentiment of the patient. In particular, a rule-based natural language processing (NLP) system is deployed, in which information in sentences is linked together through anaphora resolution. Our NLP relationship extraction system provides a strong baseline, achieving an F1 score of over 80% in discovering the said relationships present in the posts we analysed.
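    To give a flavour of rule-based relationship extraction, here is a deliberately minimal sketch: tiny hand-made lexicons (purely illustrative, not the paper's grammar or vocabularies) link a drug mention and a symptom mention within one sentence and attach a crude polarity cue. The actual system additionally resolves anaphora across sentences.

```python
import re

# Hypothetical toy lexicons; a real system would use curated vocabularies.
DRUGS = {"ibuprofen", "metformin"}
SYMPTOMS = {"nausea", "headache"}
NEGATIONS = {"no", "not", "without", "stopped"}

def extract(sentence):
    """Return (drug, symptom, polarity) triples found in one sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    drugs = [t for t in tokens if t in DRUGS]
    symptoms = [t for t in tokens if t in SYMPTOMS]
    polarity = "negative" if any(t in NEGATIONS for t in tokens) else "positive"
    return [(d, s, polarity) for d in drugs for s in symptoms]
```

    Rules of this shape are transparent and easy to audit, which is why rule-based systems remain strong baselines on specialised forum text.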

    Identify error-sensitive patterns by decision tree

    © Springer International Publishing Switzerland 2015. When errors are inevitable during data classification, finding the particular parts of a classification model that are more susceptible to error than others, rather than searching casually for an Achilles’ heel of the model, may help uncover specific error-sensitive value patterns and lead to additional error-reduction measures. As an initial phase of the investigation, this study narrows the scope of the problem by focusing on decision trees as a pilot model. It develops a simple and effective tagging method that digitizes the individual nodes of a binary decision tree for node-level analysis, linking and tracking classification statistics for each node in a transparent way. This facilitates the identification and examination of the potentially “weakest” nodes and error-sensitive value patterns in decision trees, and assists cause analysis and enhancement development. The digitization method is not an attempt to re-develop or transform the existing decision tree model; rather, it is a pragmatic node-ID formulation that crafts numeric values to reflect the tree structure and decision-making paths, extending post-classification analysis to the detailed node level. Initial experiments have successfully located potentially high-risk attribute and value patterns, an encouraging sign that this study is worth further exploration.
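    One simple way to digitize binary-tree nodes so that the numeric ID itself encodes the decision path is heap-style numbering: the root is 1, a left branch maps id to 2*id, and a right branch to 2*id + 1. This is an illustrative formulation in the spirit of the tagging method described, not necessarily the paper's exact scheme:

```python
def node_id(path):
    """Digitize a root-to-node path: each branch appends one bit,
    0 for left and 1 for right, so the id encodes the full path."""
    nid = 1
    for branch in path:  # branch is 'L' or 'R'
        nid = 2 * nid + (1 if branch == "R" else 0)
    return nid

def path_from_id(nid):
    """Recover the decision path from a node id (inverse of node_id)."""
    path = []
    while nid > 1:
        path.append("R" if nid & 1 else "L")
        nid //= 2
    return path[::-1]
```

    With such IDs, per-node error statistics can be stored in a flat table keyed by integer, and the "weakest" nodes can be traced back to the exact attribute tests along their paths.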

    Decision level ensemble method for classifying multi-media data

    In the digital era, the data for a given analytical task can be collected in different formats, such as text, images and audio. Data with multiple formats are called multimedia data, and integrating and fusing multimedia datasets has become a challenging task in machine learning and data mining. In this paper, we present a heterogeneous ensemble method that combines multimedia datasets at the decision level. Our method consists of several components: extracting features from the multimedia datasets that are not already represented by features, modelling independently on each of the multimedia datasets, selecting models based on their accuracy and diversity, and building the ensemble at the decision level. Hence our method is called the decision-level ensemble method (DLEM). The method is tested on multimedia data and compared with other heterogeneous ensemble-based methods. The results show that the DLEM significantly outperformed these methods.
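    At the decision level, fusion reduces to combining the labels predicted by the per-modality models, for example by weighted majority vote. A minimal sketch of that final step (the DLEM's selection of models by accuracy and diversity is not shown):

```python
from collections import Counter

def decision_level_ensemble(predictions, weights=None):
    """Fuse per-model predictions by (weighted) majority vote.
    predictions: one list of labels per model, aligned by instance;
    returns one fused label per instance."""
    weights = weights or [1.0] * len(predictions)
    fused = []
    for instance in zip(*predictions):  # labels for one instance
        votes = Counter()
        for label, w in zip(instance, weights):
            votes[label] += w
        fused.append(votes.most_common(1)[0][0])
    return fused
```

    Decision-level fusion only needs each model's output labels, which is what makes it attractive when the underlying modalities (text, image, audio) have incompatible feature spaces.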

    Using data mining for prediction of hospital length of stay: an application of the CRISP-DM Methodology

    Hospitals nowadays collect vast amounts of data related to patient records. All these data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at the extraction of useful knowledge from raw data. This work describes the implementation of a medical data mining project based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, were collected from a Portuguese hospital and relate to inpatient hospitalizations. The goal was to predict generic hospital Length of Stay based on indicators that are commonly available at the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted in which six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best model was obtained by the Random Forest method, which presents a high coefficient of determination (0.81). This model was then opened up using a sensitivity analysis procedure, which revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized, and the associated medical specialty. Such extracted knowledge confirms that the obtained predictive model is credible and has potential value for supporting the decisions of hospital managers.
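    As a small illustration of the evaluation, the reported quality metric is the coefficient of determination (R²), and "Average Prediction" is the naive baseline that predicts the mean training length of stay for every patient. A self-contained sketch of both (illustrative, not the paper's pipeline):

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def average_prediction(train_y, test_X):
    """Naive baseline: predict the mean training length of stay
    for every test instance, regardless of its attributes."""
    mean = sum(train_y) / len(train_y)
    return [mean for _ in test_X]
```

    By construction the mean baseline scores R² = 0 on its own training data, so a Random Forest reaching 0.81 captures most of the explainable variance relative to that baseline.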