
    Hash Embeddings for Efficient Word Representations

    We present hash embeddings, an efficient method for representing words in a continuous vector form. A hash embedding may be seen as an interpolation between a standard word embedding and a word embedding created using a random hash function (the hashing trick). In hash embeddings each token is represented by k d-dimensional embedding vectors and one k-dimensional weight vector. The final d-dimensional representation of the token is the product of the two. Rather than fitting the embedding vectors for each token, these are selected by the hashing trick from a shared pool of B embedding vectors. Our experiments show that hash embeddings can easily deal with huge vocabularies consisting of millions of tokens. When using a hash embedding there is no need to create a dictionary before training nor to perform any kind of vocabulary pruning after training. We show that models trained using hash embeddings exhibit at least the same level of performance as models trained using regular embeddings across a wide range of tasks. Furthermore, the number of parameters needed by such an embedding is only a fraction of what is required by a regular embedding. Since standard embeddings and embeddings constructed using the hashing trick are actually just special cases of a hash embedding, hash embeddings can be considered an extension and improvement over the existing regular embedding types.
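The lookup described in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the class name `HashEmbedding`, the use of `zlib.crc32` with a salt as the family of hash functions, the hashed table of weight vectors (`num_weight_rows`), and the random initialisation are all assumptions made for illustration. Only the overall shape follows the abstract: k vectors picked by hashing from a shared pool of B vectors of dimension d, combined using a k-dimensional weight vector.

```python
import random
import zlib


class HashEmbedding:
    """Toy hash embedding: k hashed pool vectors combined by k weights."""

    def __init__(self, num_buckets, dim, k=2, num_weight_rows=1000, seed=0):
        rng = random.Random(seed)
        self.num_buckets = num_buckets          # B: size of the shared pool
        self.dim = dim                          # d: embedding dimension
        self.k = k                              # number of hash functions
        self.num_weight_rows = num_weight_rows  # assumption: weights are hashed too
        # Shared pool of B vectors, each d-dimensional.
        self.pool = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                     for _ in range(num_buckets)]
        # One k-dimensional weight vector per weight row.
        self.weights = [[rng.gauss(0.0, 0.5) for _ in range(k)]
                        for _ in range(num_weight_rows)]

    @staticmethod
    def _hash(token, salt, modulus):
        # Salted CRC32 stands in for a family of independent hash functions.
        return zlib.crc32(f"{salt}:{token}".encode("utf-8")) % modulus

    def embed(self, token):
        """Return the d-dimensional vector for a token (no dictionary needed)."""
        w = self.weights[self._hash(token, "w", self.num_weight_rows)]
        out = [0.0] * self.dim
        for i in range(self.k):
            vec = self.pool[self._hash(token, i, self.num_buckets)]
            for j in range(self.dim):
                out[j] += w[i] * vec[j]  # weighted sum of the k pool vectors
        return out

    def num_parameters(self):
        return self.num_buckets * self.dim + self.num_weight_rows * self.k


emb = HashEmbedding(num_buckets=50, dim=4, k=2, num_weight_rows=100)
vector = emb.embed("zebra")
```

In this sketch the parameter count depends on B, d, k and the number of weight rows rather than on the vocabulary size, which is the source of the savings the abstract mentions. In a trained model both tables would be learned rather than left at their random initial values.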

    Rare disease diagnosis: A review of web search, social media and large-scale data-mining approaches

    Physicians and the general public are increasingly using web-based tools to find answers to medical questions. The field of rare diseases is especially challenging and important, as shown by the long delay and many mistakes associated with diagnoses. In this paper we review recent initiatives on the use of web search, social media and data mining in data repositories for medical diagnosis. We compare the retrieval accuracy on 56 rare disease cases with known diagnosis for the web search tools google.com, pubmed.gov, omim.org and our own search tool findzebra.com. We give a detailed description of IBM's Watson system and make a rough comparison between findzebra.com and Watson on subsets of the Doctor's dilemma dataset. The recall@10 and recall@20 (fraction of cases where the correct result appears in the top 10 and top 20) for the 56 cases are found to be 29%, 16%, 27% and 59% and 32%, 18%, 34% and 64%, respectively. Thus, FindZebra has a significantly (p < 0.01) higher recall than the other three search engines. When tested under the same conditions, Watson and FindZebra showed similar recall@10 accuracy. However, the tests were performed on different subsets of Doctor's dilemma questions. Advances in technology and access to high quality data have opened new possibilities for aiding the diagnostic process. Specialized search engines, data mining tools and social media are some of the areas that hold promise.
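The recall@k measure used in this abstract (fraction of cases where the correct result appears in the top k) can be computed as in the sketch below. The function name `recall_at_k` and the toy queries are hypothetical and are not data from the paper.

```python
def recall_at_k(ranked_results, correct_answers, k):
    """Fraction of cases whose correct answer is among the top-k results.

    ranked_results: one ranked list of results per case (best first).
    correct_answers: the known diagnosis for each case, in the same order.
    """
    if len(ranked_results) != len(correct_answers):
        raise ValueError("need exactly one answer per case")
    hits = sum(
        1
        for results, answer in zip(ranked_results, correct_answers)
        if answer in results[:k]
    )
    return hits / len(correct_answers)


# Hypothetical toy data: two cases, each with a ranked result list.
toy_results = [["disease A", "disease B"], ["disease C", "disease D"]]
toy_answers = ["disease B", "disease X"]
recall_top1 = recall_at_k(toy_results, toy_answers, 1)
recall_top2 = recall_at_k(toy_results, toy_answers, 2)
```

With this toy data no case has the correct answer in first place, and one of the two cases has it within the top two.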

    A prospective survey in European Society of Cardiology member countries of atrial fibrillation management: baseline results of EURObservational Research Programme Atrial Fibrillation (EORP-AF) Pilot General Registry

    Aims: Given the advances in atrial fibrillation (AF) management and the availability of new European Society of Cardiology (ESC) guidelines, there is a need for the systematic collection of contemporary data regarding the management and treatment of AF in ESC member countries. Methods and results: We conducted a registry of consecutive in- and outpatients with AF presenting to cardiologists in nine participating ESC countries. All patients with an ECG-documented diagnosis of AF confirmed in the year prior to enrolment were eligible. We enrolled a total of 3119 patients from February 2012 to March 2013, with full data on clinical subtype available for 3049 patients (40.4% female; mean age 68.8 years). Common comorbidities were hypertension, coronary disease, and heart failure. Lone AF was present in only 3.9% (122 patients). Asymptomatic AF was common, particularly among those with permanent AF. Amiodarone was the most common antiarrhythmic agent used (~20%), while beta-blockers and digoxin were the most used rate control drugs. Oral anticoagulants (OACs) were used in 80% overall, most often vitamin K antagonists (71.6%), with novel OACs being used in 8.4%. Other antithrombotics (mostly antiplatelet therapy, especially aspirin) were still used in one-third of the patients, and no antithrombotic treatment in only 4.8%. Oral anticoagulants were used in 56.4% of patients with CHA2DS2-VASc = 0, with 26.3% having no antithrombotic therapy. A high HAS-BLED score was not used to exclude OAC use, but there was a trend towards more aspirin use in the presence of a high HAS-BLED score. Conclusion: The EURObservational Research Programme Atrial Fibrillation (EORP-AF) Pilot Registry has provided systematic collection of contemporary data regarding the management and treatment of AF by cardiologists in ESC member countries. Oral anticoagulant use has increased, but novel OAC use was still low. Compliance with the treatment guidelines for patients with the lowest and higher stroke risk scores remains suboptimal. © The Author 2013