
    Deepr: A Convolutional Net for Medical Records

    Feature engineering remains a major bottleneck when creating predictive systems from electronic medical records. At present, an important missing element is detecting predictive regular clinical motifs from irregular episodic records. We present Deepr (short for Deep record), a new end-to-end deep learning system that learns to extract features from medical records and predict future risk automatically. Deepr transforms a record into a sequence of discrete elements separated by coded time gaps and hospital transfers. On top of the sequence, a convolutional neural net detects and combines predictive local clinical motifs to stratify the risk. Deepr permits transparent inspection and visualization of its inner workings. We validate Deepr on hospital data to predict unplanned readmission after discharge. Deepr achieves superior accuracy compared to traditional techniques, detects meaningful clinical motifs, and uncovers the underlying structure of the disease and intervention space.
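
    The architecture described here maps naturally onto a few lines of code. Below is a minimal, illustrative PyTorch sketch of a Deepr-style model, not the paper's implementation: the vocabulary size, embedding width, filter count, and kernel size are assumptions.

    # Minimal sketch of a Deepr-style model: an embedding over discrete medical
    # codes (including special symbols for time gaps and transfers), a 1-D
    # convolution whose filters act as local "motif" detectors, max-pooling
    # over time, and a linear layer producing a risk score. All hyperparameters
    # are illustrative assumptions, not values from the paper.
    import torch
    import torch.nn as nn

    class DeeprSketch(nn.Module):
        def __init__(self, vocab_size, embed_dim=100, n_filters=100, kernel_size=3):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size, padding=1)
            self.out = nn.Linear(n_filters, 1)

        def forward(self, codes):                  # codes: (batch, seq_len) ints
            x = self.embed(codes).transpose(1, 2)  # (batch, embed_dim, seq_len)
            h = torch.relu(self.conv(x))           # local motif responses
            pooled, _ = h.max(dim=2)               # strongest motif per filter
            return self.out(pooled).squeeze(-1)    # unnormalised risk score

    # Usage: indices 1..vocab_size-1 stand for diagnoses, procedures, and coded
    # time-gap/transfer symbols; 0 is padding. The logits would be fed to
    # BCEWithLogitsLoss against the readmission label.
    model = DeeprSketch(vocab_size=5000)
    risk_logits = model(torch.randint(1, 5000, (8, 120)))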

    Automatic detection of Estonian particle verbs using linguistic and statistical methods (Eesti keele ĂŒhendverbide automaattuvastus lingvistiliste ja statistiliste meetoditega)

    Nowadays, applications that process human languages (including Estonian) are part of everyday life, yet computers are far from understanding every nuance of language. Machine translation is probably the most widely used application of natural language processing. Time and again, the worst failures of machine translation systems (e.g. Google Translate) are shared on social media, and most such cases involve phrases or sentences rather than single words. For example, translation systems cannot catch the correct meaning of the particle verb alt minema (literally 'to go from under', idiomatically 'to get deceived') in the sentence "Ta lĂ€ks lepinguga alt", because a literal, word-by-word translation of the expression's components does not convey the intended meaning. To improve the quality of machine translation systems and of other useful applications such as fake news detection or question answering, such (idiomatic) multi-word expressions and their meanings must be detected reliably, something humans manage quite easily from context. The detection of multi-word expressions and their meanings is important in all languages and has therefore received much attention in computational linguistics, with a whole range of methods proposed, especially for English. However, these methods had not previously been applied to the detection of Estonian multi-word expressions. The dissertation fills that gap and applies machine learning methods that have proved successful for other languages to the automatic detection of one type of Estonian multi-word expression: the particle verb. Based on large textual data, the thesis demonstrates that the traditional binary division of Estonian particle verbs into non-compositional (ainukordne, the meaning is not predictable from the meanings of the components) and compositional (korrapĂ€rane, the meaning is the sum of the meanings of the components) is not comprehensive enough. The research confirms the view widely adopted in computational linguistics that multi-word expressions (including particle verbs) form a continuum, with clearly compositional units at one end and units that acquire a new meaning at the other. Moreover, it shows that in addition to context, several linguistic features, e.g. the animacy and case of the subject and object, help computers predict whether the meaning of a particle verb in a sentence is compositional or non-compositional. The research also introduces new public resources for Estonian: the embeddings and compositionality datasets created for the thesis are available for future research. https://www.ester.ee/record=b5252157~S
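
    A minimal sketch of the kind of compositionality classifier the thesis describes: the sentence context is represented by averaged word embeddings and concatenated with hand-coded linguistic features such as the animacy and case of the subject and object. The field names, feature encoding, and choice of classifier below are illustrative assumptions, not the thesis setup.

    # Sketch: classify a particle-verb occurrence as compositional (0) or
    # non-compositional (1) from context embeddings plus linguistic features.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def sentence_vector(tokens, embeddings, dim=100):
        """Average the embeddings of the tokens around the particle verb."""
        vecs = [embeddings[t] for t in tokens if t in embeddings]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    def featurize(sample, embeddings):
        # sample is assumed to look like: {"context": [...tokens...],
        #   "subj_animate": 0/1, "obj_animate": 0/1,
        #   "subj_case": int, "obj_case": int}
        ling = np.array([sample["subj_animate"], sample["obj_animate"],
                         sample["subj_case"], sample["obj_case"]], dtype=float)
        return np.concatenate([sentence_vector(sample["context"], embeddings),
                               ling])

    # With X built by stacking featurize(...) over annotated sentences and
    # y holding the compositionality labels:
    # clf = LogisticRegression(max_iter=1000).fit(X, y)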

    Can human association norm evaluate latent semantic analysis?

    This paper compares word association norms created in a psycholinguistic experiment with association lists generated by algorithms operating on text corpora. We compare lists generated by the Church and Hanks algorithm with lists generated by the LSA algorithm, and present an argument on how well these automatically generated lists reflect real semantic relations.
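
    Both corpus-based methods in the comparison can be sketched compactly. The code below is illustrative, not the paper's implementation: a pointwise-mutual-information ranking in the spirit of Church and Hanks, and LSA neighbours obtained from a truncated SVD of a term-document matrix; the window size and dimensionality are assumptions.

    import math
    from collections import Counter
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import TruncatedSVD

    def pmi_associates(sentences, cue, window=5, min_count=5):
        """Rank words co-occurring with `cue` by pointwise mutual information."""
        word_freq, pair_freq = Counter(), Counter()
        n_words = n_pairs = 0
        for sent in sentences:
            word_freq.update(sent)
            n_words += len(sent)
            for i, w in enumerate(sent):
                for v in sent[i + 1:i + 1 + window]:
                    pair_freq[tuple(sorted((w, v)))] += 1
                    n_pairs += 1
        scores = {}
        for (w, v), n in pair_freq.items():
            if cue not in (w, v) or n < min_count:
                continue
            other = v if w == cue else w
            p_pair = n / n_pairs
            p_w, p_v = word_freq[w] / n_words, word_freq[v] / n_words
            scores[other] = math.log2(p_pair / (p_w * p_v))
        return sorted(scores.items(), key=lambda kv: -kv[1])

    def lsa_associates(docs, cue, k=100, topn=20):
        """Rank cosine neighbours of `cue` in a truncated-SVD word space."""
        vec = CountVectorizer()
        X = vec.fit_transform(docs)                           # docs x terms
        W = TruncatedSVD(n_components=k).fit_transform(X.T)   # terms x k
        vocab = vec.get_feature_names_out()
        idx = {w: i for i, w in enumerate(vocab)}
        c = W[idx[cue]]
        sims = W @ c / (np.linalg.norm(W, axis=1) * np.linalg.norm(c) + 1e-12)
        order = np.argsort(-sims)
        return [(vocab[i], float(sims[i])) for i in order[:topn + 1]
                if vocab[i] != cue][:topn]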

    Using attention methods to predict judicial outcomes

    Legal Judgment Prediction is one of the most prominent applications at the intersection of NLP, AI, and law. By legal prediction we mean intelligent systems capable of predicting specific judicial characteristics, such as the outcome or the judicial class of a specific case. In this research, we used AI classifiers to predict judicial outcomes in the Brazilian legal system. For this purpose, we developed a text crawler to extract data from the official Brazilian electronic legal systems; these texts formed a dataset of second-degree murder and active corruption cases. We applied different classifiers, such as Support Vector Machines and neural networks, to predict judicial outcomes by analyzing textual features of the dataset. Our research showed that Regression Trees, Gated Recurrent Units, and Hierarchical Attention Networks achieved the highest metrics on different subsets. As a final step, we explored the weights of one of the algorithms, the Hierarchical Attention Network, to find a sample of the words most strongly associated with decisions to absolve or convict defendants.
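
    As a point of reference for the models named above, the classical baseline can be sketched in a few lines: TF-IDF features over the decision texts feeding a linear Support Vector Machine. The labels, feature limits, and data split are illustrative assumptions; the attention-based models in the study would replace the classifier in this pipeline.

    # Sketch: TF-IDF + linear SVM baseline for predicting judicial outcomes
    # (e.g. "absolve" vs. "convict") from decision texts.
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    def train_outcome_classifier(texts, labels):
        X_tr, X_te, y_tr, y_te = train_test_split(
            texts, labels, test_size=0.2, random_state=0, stratify=labels)
        clf = make_pipeline(
            TfidfVectorizer(max_features=50000, ngram_range=(1, 2)),
            LinearSVC())
        clf.fit(X_tr, y_tr)
        print(classification_report(y_te, clf.predict(X_te)))
        return clf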

    Computational approaches to semantic change (Volume 6)

    Semantic change, that is, how the meanings of words change over time, has preoccupied scholars since well before modern linguistics emerged in the late 19th and early 20th centuries, ushering in a new methodological turn in the study of language change. Compared to changes in sound and grammar, semantic change is the least understood. Since then, the study of semantic change has progressed steadily, accumulating a vast store of knowledge over more than a century, encompassing many languages and language families. Historical linguists also realized the potential of computers as research tools early on, with papers at the very first international conferences on computational linguistics in the 1960s. Such computational studies, however, long remained small-scale, method-oriented, and qualitative. Recent years have witnessed a sea change in this regard: big-data, empirical, quantitative investigations are now coming to the forefront, enabled by enormous advances in storage capacity and processing power. Diachronic corpora have grown beyond imagination, defying exploration by traditional manual qualitative methods, and language technology has become increasingly data-driven and semantics-oriented. These developments present a golden opportunity for the empirical study of semantic change over both long and short time spans.
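
    One widely used quantitative technique behind such big-data investigations can be sketched as follows: train embeddings on two diachronic corpus slices, align the spaces with an orthogonal Procrustes rotation, and rank the shared vocabulary by how far each word's vector moved. This is an illustrative sketch of a common method, not one tied to this particular volume; the corpora and hyperparameters are assumptions.

    # Sketch: measure semantic change between two time slices of a corpus.
    import numpy as np
    from gensim.models import Word2Vec

    def change_scores(sents_old, sents_new, dim=100):
        wv_old = Word2Vec(sents_old, vector_size=dim, min_count=5).wv
        wv_new = Word2Vec(sents_new, vector_size=dim, min_count=5).wv
        shared = [w for w in wv_old.index_to_key if w in wv_new]
        A = np.stack([wv_old[w] for w in shared])   # old space
        B = np.stack([wv_new[w] for w in shared])   # new space
        u, _, vt = np.linalg.svd(B.T @ A)           # orthogonal Procrustes
        B_aligned = B @ (u @ vt)                    # rotate new space onto old
        cos = (A * B_aligned).sum(1) / (np.linalg.norm(A, axis=1)
                                        * np.linalg.norm(B_aligned, axis=1))
        order = np.argsort(cos)                     # low cosine = most change
        return [(shared[i], float(1 - cos[i])) for i in order]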