729 research outputs found

    Event detection in high throughput social media


    Linking social media, medical literature, and clinical notes using deep learning.

    Researchers analyze data, information, and knowledge through many sources, formats, and methods. The dominant data formats are text and images. In the healthcare industry, professionals generate a large quantity of unstructured data. The complexity of this data and the lack of computational power cause delays in analysis. However, with emerging deep learning algorithms and access to computational resources such as graphics processing units (GPUs) and tensor processing units (TPUs), processing text and images is becoming more accessible. Deep learning algorithms achieve remarkable results in natural language processing (NLP) and computer vision. In this study, we focus on NLP in the healthcare industry and collect data not only from electronic medical records (EMRs) but also from medical literature and social media. We propose a framework for linking social media, medical literature, and EMR clinical notes using deep learning algorithms. Connecting data sources requires defining a link between them, and our key is finding concepts in the medical text. The National Library of Medicine (NLM) introduced the Unified Medical Language System (UMLS), and we use this system as the foundation of our own system. We recognize social media’s dynamic nature and apply supervised and semi-supervised methodologies to generate concepts. Named entity recognition (NER) allows efficient extraction of information, or entities, from medical literature, and we extend the model to process the EMRs’ clinical notes via transfer learning. The results include an integrated, end-to-end, web-based system solution that unifies social media, literature, and clinical notes, and improves access to medical knowledge for the public and experts.

    Detection and relocation of earthquakes in the sparsely instrumented Mackenzie Mountains region, Yukon and Northwest Territories, Canada

    2020 Spring. Includes bibliographical references. The Mackenzie Mountains are an actively uplifting and seismogenic arcuate thrust belt lying within the Northwest Territories and Yukon, Canada. Seismic activity in the region is poorly constrained due to a historically sparse seismograph distribution. In this study, new data are analyzed from the 40-station, ~875 km-long Mackenzie Mountains temporary network (Baker et al., 2020) crossing the Cordillera-Craton region adjacent to and within the Mackenzie Mountains, in conjunction with Transportable Array and other sparsely distributed arrays in the region. Data from approximately August 2016 – August 2018 are processed and compared to the sparse-network earthquake catalog records maintained by the USGS and Natural Resources Canada. Using algorithms developed by Kushnir et al. (1990), Rawles and Thurber (2015), and Roecker et al. (2006), signals are identified and subsequently associated across the network to note potential events, estimate phase onsets, and resolve hypocenter locations. This study improves the regional earthquake catalog by detecting smaller-magnitude earthquakes and lowering the regional magnitude of completeness from Mc = 2.5 to 1.9. Within the Mackenzie Mountains and immediately surrounding areas we find 524 new events and additionally recommend an updated location for 185 previously cataloged events. Our b-value computation for the updated catalog (0.916 ± 0.08) likely indicates a relatively high level of regional differential stress. We identify the spatial distribution of earthquakes in the Mackenzie Mountains as diffuse, and offer far-field stress transfer as a mechanism for producing widespread reverse faulting observed in the region. Further, we associate regional seismicity with tectonic activity in the context of known faults and orogenic provinces such as the Richardson Mountains.
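
    The abstract above reports a b-value and a magnitude of completeness but does not say how the b-value was computed. As a minimal sketch, assuming the commonly used Aki-Utsu maximum-likelihood estimator (an assumption, not necessarily the study's method), the calculation could look like the following. The function name `b_value_aki`, the 0.1 magnitude bin width, and the sample magnitudes are all hypothetical.

```python
import math


def b_value_aki(magnitudes, mc, bin_width=0.1):
    """Estimate the Gutenberg-Richter b-value with the Aki-Utsu formula.

    Assumption: this is an illustrative estimator, not the one confirmed
    by the abstract. Only events with magnitude >= mc are used.
    Returns (b, sigma, n), where sigma = b / sqrt(n) is Aki's
    first-order uncertainty and n is the number of events used.
    """
    complete = [m for m in magnitudes if m >= mc]
    n = len(complete)
    if n < 2:
        raise ValueError("need at least two events at or above mc")

    mean_mag = sum(complete) / n
    # Utsu's correction shifts mc by half a bin to account for binned magnitudes.
    denom = mean_mag - (mc - bin_width / 2.0)
    if denom <= 0:
        raise ValueError("mean magnitude must exceed the corrected mc")

    b = math.log10(math.e) / denom
    sigma = b / math.sqrt(n)
    return b, sigma, n


if __name__ == "__main__":
    # Made-up magnitudes for illustration only; 1.5 falls below mc and is dropped.
    sample = [1.9, 2.0, 2.1, 2.4, 2.9, 1.5]
    b, sigma, n = b_value_aki(sample, mc=1.9)
    print(f"b = {b:.3f} +/- {sigma:.3f} from {n} events")
```

    Dropping events below the completeness magnitude before averaging matters here: including them would lower the mean magnitude and bias the b-value estimate upward.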


    Big Data Now, 2015 Edition

    Now in its fifth year, O’Reilly’s annual Big Data Now report recaps the trends, tools, applications, and forecasts we’ve talked about over the past year. For 2015, we’ve included a collection of blog posts, authored by leading thinkers and experts in the field, that reflect a unique set of themes we’ve identified as gaining significant attention and traction. Our list of 2015 topics includes: data-driven cultures; data science; data pipelines; big data architecture and infrastructure; the Internet of Things and real time; applications of big data; and security, ethics, and governance. Is your organization on the right track? Get a hold of this free report now and stay in tune with the latest significant developments in big data.

    Using LLMs to discover emerging coded antisemitic hate-speech in extremist social media

    Online hate speech proliferation has created a difficult problem for social media platforms. A particular challenge relates to the use of coded language by groups interested in both creating a sense of belonging for their users and evading detection. Coded language evolves quickly and its use varies over time. This paper proposes a methodology for detecting emerging coded hate-laden terminology. The methodology is tested in the context of online antisemitic discourse. The approach considers posts scraped from social media platforms often used by extremist users. The posts are scraped using seed expressions related to previously known discourse of hatred towards Jews. The method begins by identifying the expressions most representative of each post and calculating their frequency in the whole corpus. It filters out grammatically incoherent expressions as well as previously encountered ones so as to focus on emergent well-formed terminology. This is followed by an assessment of semantic similarity to known antisemitic terminology using a fine-tuned large language model, and subsequent filtering out of the expressions that are too distant from known expressions of hatred. Emergent antisemitic expressions containing terms clearly relating to Jewish topics are then removed to return only coded expressions of hatred. Comment: 9 pages, 4 figures, 2 algorithms, 3 tables

    Incorporating neighbourhood features in RNNs for popularity forecasting for emerging research fields

    The accurate modelling and forecasting of the popularity of emerging fields can help researchers allocate resources and effort to promising research directions. While existing forecasting approaches enjoy various levels of success, most suffer from at least one of the following three challenges: a limited scope due to mining topic terms from only a few documents; low generalizability due to arbitrarily classifying topics as either emerging or non-emerging; or using an emerging topic or field of study’s historical features as inputs to forecast its future popularity while disregarding the effect of a “cold start”. In this thesis, we propose a framework comprising two algorithms. The first algorithm is a customised ontology extraction algorithm that can generate a field of study taxonomy from a scholarly database when none exists. Using the field of study taxonomy, the second forecasting algorithm addresses the three challenges in three steps. Firstly, we leverage the field of study taxonomy present in most academic databases to obtain a neighbourhood of trending fields within the discipline of the field of study of interest. Then, dynamic time warping is used to measure the similarity of each neighbour’s trending pattern to the trending pattern of the field of study of interest. Lastly, we conduct multivariate forecasting using an RNN model such as long short-term memory (LSTM) or dual attention recurrent neural networks (DA-RNN) while utilizing the historical popularity scores of similar trending neighbours as input. Experimental results on 10 emerging and non-emerging fields of study showcase the existence and various dynamics of “cold start”. In addition, the proposed algorithm is also shown to greatly reduce the RMSE, MAE, and MAPE of forecasts against traditional methods for emerging fields while retaining similar performance for non-emerging fields. This validates the significance of these challenges against existing methods and provides insight on the dependency structure of emerging topics with their historical features.
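
    The dynamic time warping step described in this abstract can be illustrated with a minimal sketch. It assumes the textbook DTW recurrence with an absolute-difference cost and no warping window, since the abstract does not specify the variant used. The function names `dtw_distance` and `rank_neighbours`, and the popularity series in the usage example, are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric series.

    Assumption: absolute difference as the local cost and no window
    constraint; the thesis may use a different configuration.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")

    inf = float("inf")
    # prev[j] holds the best cumulative cost of aligning a[:i-1] with b[:j].
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def rank_neighbours(target, neighbours):
    """Sort neighbouring fields by DTW distance to the target series.

    `neighbours` maps a (hypothetical) field name to its popularity series.
    Returns a list of (name, distance) pairs, most similar first.
    """
    scored = [(name, dtw_distance(target, series))
              for name, series in neighbours.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Made-up popularity scores for illustration only.
    target = [1, 2, 3]
    neighbours = {"field_a": [1, 2, 2, 3], "field_b": [5, 5, 5]}
    for name, dist in rank_neighbours(target, neighbours):
        print(name, dist)
```

    Because DTW allows one point to align with several points in the other series, a neighbour whose trend has the same shape but a different pace still scores as similar, which a point-by-point distance would miss.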

    Interactions in information spread: quantification and interpretation using stochastic block models

    In most real-world applications, it is seldom the case that a given observable evolves independently of its environment. In social networks, users' behavior results from the people they interact with, news in their feed, or trending topics. In natural language, the meaning of phrases emerges from the combination of words. In general medicine, a diagnosis is established on the basis of the interaction of symptoms. Here, we propose a new model, the Interactive Mixed Membership Stochastic Block Model (IMMSBM), which investigates the role of interactions between entities (hashtags, words, memes, etc.) and quantifies their importance within the aforementioned corpora. We find that interactions play an important role in those corpora. In inference tasks, taking them into account leads to average relative changes with respect to non-interactive models of up to 150% in the probability of an outcome. Furthermore, their role greatly improves the predictive power of the model. Our findings suggest that neglecting interactions when modeling real-world phenomena might lead to incorrect conclusions being drawn. Comment: 17 pages, 3 figures, submitted to ECML-PKDD 202

    Trending Topics in Multiple Sclerosis

    Multiple sclerosis (MS) is a chronic inflammatory disease characterized by progressive demyelination and neurodegeneration of the central nervous system (CNS), constituting the most common demyelinating disease of the CNS in humans. Although intensive research over many decades has unveiled many pathophysiological mechanisms in the development of MS, the cause is still unknown. Nevertheless, it does seem clear that genetic susceptibility and environmental factors play crucial roles. Trending Topics in Multiple Sclerosis is a book that provides an insight into some of the main problems currently debated in this area of research, focusing on topics that deal with genetic and environmental risk factors, pathophysiological mechanisms, neurocognitive findings, and neuroprotective strategies.