10 research outputs found

    Combining Supervised and Unsupervised Learning to Detect and Semantically Aggregate Crisis-Related Twitter Content

    Get PDF
    The Twitter Stream API offers the possibility to develop (near) real-time methods and applications to detect and monitor the impacts of crisis events and their changes over time. As demonstrated by various related research, the content of individual tweets or even entire thematic trends can be utilized to support disaster management, fill information gaps, and augment the results of satellite-based workflows, as well as to extend and improve disaster management databases. Considering the sheer volume of incoming tweets, it is necessary to automatically identify the small number of crisis-relevant tweets and present them in a manageable way. Current approaches for identifying crisis-related content focus on supervised models that decide on the relevance of each tweet individually. Although supervised models can efficiently process the high number of incoming tweets, they have to be extensively pre-trained. Furthermore, these models do not capture the history of already processed messages. During a crisis, diverse and unique sub-events can occur that are likely not covered by the respective supervised model and its training data. Unsupervised learning offers both the ability to take past tweets into account and a higher adaptive capability, which in turn allows customization to the specific needs of different disasters. From a practical point of view, the drawbacks of unsupervised methods are higher computational costs and the potential need for user interaction to interpret results. In order to enhance the limited generalization capabilities of pre-trained models, as well as to speed up and guide unsupervised learning, we propose a combination of both concepts. A successive clustering of incoming tweets semantically aggregates the stream data, whereas pre-trained models identify potentially crisis-relevant clusters.
Besides the identification of potentially crisis-related content based on semantically aggregated clusters, this approach offers a sound foundation for visualizations and further related tasks, such as event detection and the extraction of detailed information about the temporal or spatial development of events. Our work focuses on analyzing the entire freely available Twitter stream by combining an interval-based semantic clustering with a supervised machine learning model for identifying crisis-related messages. The stream is divided into intervals, e.g. of one hour, and each tweet is projected into a numerical vector using state-of-the-art sentence embeddings. The embeddings are then grouped by a parametric Chinese Restaurant Process clustering. With a further developed concept of cluster chains and central centroids, crisis-related clusters of different intervals can be linked in a topic- and even subtopic-related manner. At the end of each interval, a pre-trained feed-forward neural network decides whether a cluster contains crisis-related tweets. Initial results show that the hybrid approach can significantly improve the results of pre-trained supervised methods. This is especially true for categories in which the supervised model could not be sufficiently pre-trained due to missing labels. In addition, the semantic clustering of tweets offers a flexible and customizable procedure, resulting in a practical summary of topic-specific stream content.
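The abstract does not give implementation details for the interval-wise clustering step; as an illustration, it could be sketched roughly as follows. This is a greedy, similarity-threshold approximation of a Chinese Restaurant Process over sentence embeddings; the `threshold` parameter and the running-mean centroid update are assumptions for the sketch, not the authors' exact method.

```python
import numpy as np

def crp_cluster(embeddings, threshold=0.6):
    """Greedy, CRP-inspired single-pass clustering of tweet embeddings.

    Each embedding joins the most similar existing cluster centroid if the
    cosine similarity exceeds `threshold`; otherwise it opens a new cluster
    (the "new table" case of a Chinese Restaurant Process).
    """
    centroids, members = [], []
    for i, e in enumerate(embeddings):
        e = e / np.linalg.norm(e)  # unit-normalize so dot product = cosine
        if centroids:
            sims = [float(c @ e) for c in centroids]
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                members[best].append(i)
                # running mean keeps the centroid representative of its cluster
                n = len(members[best])
                centroids[best] = centroids[best] + (e - centroids[best]) / n
                continue
        centroids.append(e)
        members.append([i])
    return centroids, members
```

In a full pipeline of the kind described, a pre-trained classifier would then be applied to each resulting cluster at the end of the interval to decide whether it contains crisis-related tweets.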

    Gaussian Processes for One-class and Binary Classification of Crisis-related Tweets

    Get PDF

    Combining Supervised and Unsupervised Learning to Detect and Semantically Aggregate Crisis-Related Twitter Content

    Get PDF
    Twitter is an immediate and almost ubiquitous platform and can therefore be a valuable source of information during disasters. Current methods for identifying and classifying crisis-related content are often based on single tweets, i.e., already known information from the past is neglected. In this paper, the combination of tweet-wise pre-trained neural networks and unsupervised semantic clustering is proposed and investigated. The intention is (1) to enhance the generalization capability of pre-trained models, (2) to handle massive amounts of stream data, (3) to reduce information overload by identifying potentially crisis-related content, and (4) to obtain a semantically aggregated data representation that allows for further automated, manual, and visual analyses. Latent representations of each tweet based on pre-trained sentence embedding models are used for both clustering and tweet classification. For fast, robust, and time-continuous processing, subsequent time periods are clustered individually according to a Chinese restaurant process. Clusters without any tweet classified as crisis-related are pruned. Data aggregation over time is ensured by merging semantically similar clusters. A comparison of our hybrid method to a similar clustering approach, as well as first quantitative and qualitative results from experiments with two different labeled data sets, demonstrates the great potential for crisis-related Twitter stream analyses.
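The cross-interval aggregation described above (linking the clusters of one time period to semantically similar clusters of the previous one) could be sketched as follows; the cosine-similarity criterion and the `min_sim` threshold are illustrative assumptions, not the exact procedure of the paper.

```python
import numpy as np

def link_clusters(prev_centroids, new_centroids, min_sim=0.8):
    """Link each cluster of the new interval to the most similar cluster of
    the previous interval, forming topic chains across time.

    Returns a list of (new_idx, prev_idx or None) pairs; a None entry means
    the new cluster starts a fresh chain (a new topic).
    """
    links = []
    for i, c in enumerate(new_centroids):
        c = c / np.linalg.norm(c)
        best, best_sim = None, min_sim
        for j, p in enumerate(prev_centroids):
            sim = float((p / np.linalg.norm(p)) @ c)
            if sim >= best_sim:
                best, best_sim = j, sim
        links.append((i, best))
    return links
```

Chains built this way give the semantically aggregated, time-continuous view of the stream that the abstract describes; pruning would drop chains whose clusters contain no tweet classified as crisis-related.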

    Combining Remote Sensing with Webdata and Machine Learning to Support Humanitarian Relief Work

    Get PDF
    Gathering, analyzing and disseminating up-to-date information related to incidents and disasters is key to disaster management and relief. Satellite imagery, geo-information, and in-situ data are the mainly used information sources to support decision making. However, limitations in data timeliness as well as in spatial and temporal resolution lead to systematic information gaps in current well-established satellite-based workflows. Citizen observations spread through social media channels, like Twitter, as well as freely available webdata, like WikiData or the GDELT database, are promising complementary sources of relevant information that might be utilized to fill these information gaps and to support in-situ data acquisition. Practical examples for this are impact assessments based on social media eyewitness reports, and the utilization of this information for the early tasking of satellite- or drone-based image acquisitions. The great potential of, for instance, social media data analysis in crisis response was investigated and demonstrated in various related research works. However, the barriers to utilizing webdata and appropriate information extraction methods for decision support in real-world scenarios are still high, for instance due to information overload, varying surrounding conditions, or issues related to limited field-work infrastructure, trustworthiness, and legal aspects. Within the current DLR research project "Data4Human", demand-driven data services for humanitarian aid are developed. Among others, one goal of "Data4Human" is to investigate the practical benefit of augmenting the existing workflows of the involved partners (German Red Cross, World Food Programme, and Humanitarian OpenStreetMap Team) with social media (Twitter) and real-time global event database (GDELT) data. In this contribution, the general concepts, ideas, and corresponding methods for webdata analysis are presented.
State-of-the-art deep learning models are utilized to filter, classify, and cluster the data in order to automatically identify potentially crisis-relevant data, to assess impacts, and to summarize and characterize the course of events. We present first practical findings and analysis results for the 2019 cyclones Idai and Kenneth.

    Comparison of proteomic responses as global approach to antibiotic mechanism of action elucidation

    Get PDF
    This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license. New antibiotics are urgently needed to address the mounting resistance challenge. In early drug discovery, one of the bottlenecks is the elucidation of targets and mechanisms. To accelerate antibiotic research, we provide a proteomic approach for the rapid classification of compounds into those with precedented and unprecedented modes of action. We established a proteomic response library of Bacillus subtilis covering 91 antibiotics and comparator compounds, and a mathematical approach was developed to aid data analysis. Comparison of proteomic responses (CoPR) allows the rapid identification of antibiotics with dual mechanisms of action, as shown for atypical tetracyclines. It also aids in generating hypotheses on mechanisms of action, as presented for salvarsan (arsphenamine) and the antirheumatic agent auranofin, which is under consideration for repurposing. Proteomic profiling also provides insights into the impact of antibiotics on bacterial physiology through analysis of marker proteins indicative of the impairment of cellular processes and structures. As demonstrated for trans-translation, a promising target not yet exploited clinically, proteomic profiling supports chemical biology approaches to investigating bacterial physiology.
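The abstract does not describe the mathematical approach in detail; below is a minimal sketch of the comparison-of-proteomic-responses idea, assuming response profiles are vectors of protein-level changes and using Pearson correlation as the similarity measure. The compound names, profile values, and `min_corr` threshold are hypothetical illustrations, not data from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation between two proteomic response profiles
    (e.g. per-protein abundance changes under compound treatment)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def classify_mode_of_action(profile, library, min_corr=0.7):
    """Assign a query compound to the best-correlating reference class in the
    response library, or flag it as potentially unprecedented when no
    reference profile correlates well."""
    best_name, best_corr = None, min_corr
    for name, ref in library.items():
        r = pearson(profile, ref)
        if r >= best_corr:
            best_name, best_corr = name, r
    return best_name or "unprecedented"
```

A compound whose response resembles no profile in the reference library would be flagged for follow-up as a candidate with an unprecedented mode of action, which is the screening logic the abstract outlines.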

    Searching and Structuring the Twitter Stream for Crisis Response: A Flexible Concept to Support Research and Practice

    No full text
    In the context of crisis situations, the high information value of Twitter has already been demonstrated in several studies. However, various issues prevent users from deploying information from social media in practice. This paper outlines the experiences and results of the Data4Human project. In collaboration with the World Food Programme, the German Red Cross, and the Humanitarian OpenStreetMap Team, requirements were defined that have to be fulfilled for practical use of the data. On this basis, a modular processing system was developed which, in combination with a dashboard, enables practitioners to evaluate the data and analysis results in a structured and interactive manner. Pre-trained and adjustable machine learning methods, combined with search and unsupervised aggregation capabilities, make it possible to derive different thematic views on the data. Two use cases, Cyclones Idai and Kenneth in Southern Africa in 2019 and the current war in Ukraine, make it apparent that interactive and customizable methods are of great importance, as is an effective data representation that enables the identification of relevant content. Based on the results obtained and the user feedback received, future research guidelines are defined in order to close the evident gap between research and practice.

    Identification of Noncatalytic Lysine Residues from Allosteric Circuits via Covalent Probes

    No full text
    Covalent modification of nonactive-site lysine residues by small-molecule probes has recently evolved into an important strategy for interrogating biological systems. Here, we report the discovery of a class of bioreactive compounds that covalently modify lysine residues in DegS, the rate-limiting protease of the essential bacterial outer membrane stress response pathway. These modifications lead to allosteric activation and allow the identification of novel residues involved in the allosteric activation circuit. These findings were validated by structural analyses via X-ray crystallography and by cell-based reporter systems. We anticipate that our findings are not only relevant for a deeper understanding of the structural basis of allosteric activation in DegS and other HtrA serine proteases but also pinpoint an alternative use of covalent small molecules for probing essential biochemical mechanisms.
