395 research outputs found

    Spatio-temporal framework for integrative analysis of zebrafish development studies

    Get PDF
    Bioinformatics can be described as the application of algorithms to extract added value from data originating from biomedical and/or biological research. Bioinformatics research works with large data collections derived from biomedical and/or biological experiments, with the goal of arriving at new insights from those collections. Such insights result from the proper organization of the data, from linking to and integrating with complementary data collections, and from developing and applying analytical methods. As a bioinformatics group, we investigate the design and development of a 3D spatio-temporal data environment for developmental studies of the zebrafish model organism. The expression of genes in spatio-temporal patterns forms the basis of the developmental process. For researchers, an understanding of these patterns in relation to anatomical development is important: how do the patterns underlie changes in shape, and which genes may be involved in such changing patterns? In this context, we have developed an environment for spatio-temporal data from embryonic studies of the zebrafish model system. (LEI Universiteit Leiden)

    Evolving Neural Networks through a Reverse Encoding Tree

    Full text link
    NeuroEvolution is one of the most competitive evolutionary learning frameworks for designing novel neural networks for use in specific tasks, such as logic circuit design and digital gaming. However, the application of benchmark methods such as the NeuroEvolution of Augmenting Topologies (NEAT) remains a challenge in terms of their computational cost and search time inefficiency. This paper advances a method which incorporates a type of topological edge coding, named Reverse Encoding Tree (RET), for evolving scalable neural networks efficiently. Using RET, two types of approaches -- NEAT with Binary search encoding (Bi-NEAT) and NEAT with Golden-Section search encoding (GS-NEAT) -- have been designed to solve problems in benchmark continuous learning environments such as logic gates, Cartpole, and Lunar Lander, and tested against classical NEAT and FS-NEAT as baselines. Additionally, we conduct a robustness test to evaluate the resilience of the proposed NEAT algorithms. The results show that the two proposed strategies deliver improved performance, characterized by (1) a higher accumulated reward within a finite number of time steps; (2) fewer episodes needed to solve problems in the targeted environments; and (3) adaptive robustness under noisy perturbations, outperforming the baselines in all tested cases. Our analysis also demonstrates that RET opens up potential future research directions in dynamic environments. Code is available from https://github.com/HaolingZHANG/ReverseEncodingTree. (Accepted to IEEE Congress on Evolutionary Computation (IEEE CEC) 2020, lecture presentation.)
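
As a concrete illustration of the search strategy that GS-NEAT is named after, the sketch below runs plain golden-section search on a one-dimensional function. It is not the paper's Reverse Encoding Tree implementation; the objective function, bounds, and tolerance are illustrative assumptions.

```python
import math

def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Narrow [lo, hi] around a minimum of a unimodal function f."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # ~0.618, the golden ratio conjugate
    a, b = lo, hi
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)  # interior probe closer to a
        d = a + inv_phi * (b - a)  # interior probe closer to b
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2.0

# Example: the minimum of (x - 2)^2 on [0, 5] is at x = 2.
print(golden_section_minimize(lambda x: (x - 2.0) ** 2, 0.0, 5.0))
```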

    MANTRA: A Topic Modeling-Based Tool to Support Automated Trend Analysis on Unstructured Social Media Data

    Get PDF
    The early identification of new and auspicious ideas leads to competitive advantages for companies. Topic modeling can serve as an effective analytical approach for the automated investigation of trends from unstructured social media data. However, existing trend analysis tools do not meet the requirements regarding (a) Product Development, (b) Customer Behavior Analysis, and (c) Market-/Brand-Monitoring as reflected in the extant literature. Thus, based on the requirements for each of these common marketing-related use cases, we derived design principles following design science research and instantiated the artifact “MANTRA” (MArketiNg TRend Analysis). We demonstrated MANTRA on a real-world data set (~1.03 million Yelp reviews) and were thereby able to confirm notable trends toward vegan and global cuisine. The demonstration illustrates, in particular, the importance of meeting all specific requirements of the respective use cases and especially of flexibly incorporating several external parameters into the trend analysis.
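
To indicate how topic modeling can surface such trends, here is a minimal sketch assuming scikit-learn: it fits a small LDA model on toy review texts and prints the top words per topic. MANTRA's actual pipeline, parameters, and data are not reproduced here; the documents and topic count are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = [
    "great vegan burger and vegan dessert options",
    "authentic thai curry, wonderful global flavors",
    "the vegan menu keeps growing every month",
    "amazing sushi and other global cuisine choices",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(reviews)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # per-review topic proportions

# Top words per topic; tracking these proportions over the reviews' timestamps
# would give a simple trend signal (e.g. rising interest in vegan cuisine).
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-3:][::-1]]
    print(f"topic {k}: {top}")
```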

    Explainable temporal data mining techniques to support the prediction task in Medicine

    Get PDF
    In the last decades, the increasing amount of data available in all fields raises the necessity to discover new knowledge and explain the hidden information found. On one hand, the rapid increase of interest in, and use of, artificial intelligence (AI) in computer applications has raised a parallel concern about its ability (or lack thereof) to provide understandable, or explainable, results to users. In the biomedical informatics and computer science communities, there is considerable discussion about the "un-explainable" nature of artificial intelligence, where algorithms and systems often leave users, and even developers, in the dark with respect to how results were obtained. Especially in the biomedical context, the necessity to explain the result of an artificial intelligence system is justified by the importance of patient safety. On the other hand, current database systems enable us to store huge quantities of data. Their analysis through data mining techniques provides the possibility to extract relevant knowledge and useful hidden information. Relationships and patterns within these data could provide new medical knowledge. The analysis of such healthcare/medical data collections could greatly help to observe the health conditions of the population and extract useful information that can be exploited in the assessment of healthcare/medical processes. In particular, the prediction of medical events is essential for preventing disease, understanding disease mechanisms, and increasing patient quality of care. In this context, an important aspect is to verify whether the database content supports the capability of predicting future events. In this thesis, we start by addressing the problem of explainability, discussing some of the most significant challenges that need to be addressed with scientific and engineering rigor in a variety of biomedical domains. We analyze the "temporal component" of explainability, detailing different perspectives such as the use of temporal data, the temporal task, temporal reasoning, and the dynamics of explainability with respect to the user perspective and to knowledge. Starting from this panorama, we focus our attention on two different temporal data mining techniques. The first, based on trend abstractions, starts from the concept of Trend-Event Pattern and, moving through the concept of prediction, proposes a new kind of predictive temporal pattern, namely Predictive Trend-Event Patterns (PTE-Ps). The framework aims to combine complex temporal features to extract a compact and non-redundant predictive set of patterns composed of such temporal features. The second, based on functional dependencies, proposes a methodology for deriving a new kind of approximate temporal functional dependency, called Approximate Predictive Functional Dependencies (APFDs), based on a three-window framework. We then discuss the concept of approximation, the data complexity of deriving an APFD, the introduction of two new error measures, and finally the quality of APFDs in terms of coverage and reliability. Exploiting these methodologies, we analyze intensive care unit data from the MIMIC dataset.
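
As a toy illustration of the trend-abstraction step that such patterns build on, the sketch below labels a numeric time series with increasing / decreasing / steady intervals. It is an assumption-laden simplification, not the thesis' PTE-P or APFD machinery; the threshold and sample values are made up.

```python
def trend_abstraction(values, eps=0.1):
    """Return a list of (label, start_index, end_index) trend intervals."""
    def label(delta):
        if delta > eps:
            return "increasing"
        if delta < -eps:
            return "decreasing"
        return "steady"

    intervals = []
    for i in range(1, len(values)):
        lab = label(values[i] - values[i - 1])
        if intervals and intervals[-1][0] == lab:
            # Extend the current interval when the trend label is unchanged.
            intervals[-1] = (lab, intervals[-1][1], i)
        else:
            intervals.append((lab, i - 1, i))
    return intervals

if __name__ == "__main__":
    heart_rate = [72, 74, 80, 85, 85, 84, 78, 70]  # illustrative sample series
    print(trend_abstraction(heart_rate, eps=1.0))
```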

    Distributional Semantic Models for Clinical Text Applied to Health Record Summarization

    Get PDF
    As information systems in the health sector are becoming increasingly computerized, large amounts of care-related information are being stored electronically. In hospitals, clinicians continuously document treatment and care given to patients in electronic health record (EHR) systems. Much of the information being documented is in the form of clinical notes, or narratives, containing primarily unstructured free-text information. For each care episode, clinical notes are written on a regular basis, ending with a discharge summary that basically summarizes the care episode. Although EHR systems are helpful for storing and managing such information, there is an unrealized potential in utilizing this information for smarter care assistance, as well as for secondary purposes such as research and education. Advances in clinical language processing are enabling computers to assist clinicians in their interaction with the free-text information documented in EHR systems. This includes assisting in tasks like query-based search, terminology development, knowledge extraction, translation, and summarization. This thesis explores various computerized approaches and methods aimed at enabling automated semantic textual similarity assessment and information extraction based on the free-text information in EHR systems. The focus is placed on the task of (semi-)automated summarization of the clinical notes written during individual care episodes. The overall theme of the presented work is to utilize resource-light approaches and methods, circumventing the need to manually develop knowledge resources or training data. Thus, to enable computational semantic textual similarity assessment, word distribution statistics are derived from large training corpora of clinical free text and stored as vector-based representations referred to as distributional semantic models. Resource-light methods are also explored in the task of performing automatic summarization of clinical free-text information, relying on semantic textual similarity assessment. Novel and experimental methods are presented and evaluated that focus on: a) distributional semantic models trained in an unsupervised manner from statistical information derived from large unannotated clinical free-text corpora; b) representing and computing semantic similarities between linguistic items of different granularity, primarily words, sentences and clinical notes; and c) summarizing clinical free-text information from individual care episodes. Results are evaluated against gold standards that reflect human judgements. The results indicate that the use of distributional semantics is promising as a resource-light approach to automated capturing of semantic textual similarity relations from unannotated clinical text corpora. Here it is important that the semantics correlate with the clinical terminology and with the various semantic similarity assessment tasks. Improvements over classical approaches are achieved when the underlying vector-based representations allow for a broader range of semantic features to be captured and represented. These are either distributed over multiple semantic models trained with different features and training corpora, or use models that store multiple sense-vectors per word. Further, the use of structured meta-level information accompanying care episodes is explored as training features for distributional semantic models, with the aim of capturing semantic relations suitable for care episode-level information retrieval.
Results indicate that such models perform well in clinical information retrieval. It is shown that a method called Random Indexing can be modified to construct distributional semantic models that capture multiple sense-vectors for each word in the training corpus. This is done in a way that retains the original training properties of the Random Indexing method, by being incremental, scalable and distributional. Distributional semantic models trained with a framework called Word2vec, which relies on the use of neural networks, outperform those trained using the classic Random Indexing method in several semantic similarity assessment tasks, when training is done using comparable parameters and the same training corpora. Finally, several statistical features in clinical text are explored in terms of their ability to indicate sentence significance in a text summary generated from the clinical notes. This includes the use of distributional semantics to enable case-based similarity assessment, where cases are other care episodes and their “solutions”, i.e., discharge summaries. A type of manual evaluation is performed, where human experts rate the different aspects of the summaries using an evaluation scheme/tool. In addition, the original clinician-written discharge summaries are explored as a gold standard for the purpose of automated evaluation. Evaluation shows a high correlation between manual and automated evaluation, suggesting that such a gold standard can function as a proxy for human evaluations. --- This thesis has been published jointly with Norwegian University of Science and Technology, Norway, and University of Turku, Finland.
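
For readers unfamiliar with the underlying idea, the sketch below trains word vectors on a toy corpus and scores sentence similarity by comparing averaged vectors, assuming gensim (>= 4.0) and NumPy are available. The corpus, parameters, and similarity measure are illustrative; they are not the thesis' clinical corpora, Random Indexing variant, or evaluation setup.

```python
import numpy as np
from gensim.models import Word2Vec

# Toy stand-in for a clinical free-text corpus (already tokenized).
corpus = [
    ["patient", "admitted", "with", "chest", "pain"],
    ["patient", "discharged", "after", "treatment", "for", "chest", "pain"],
    ["no", "sign", "of", "infection", "at", "discharge"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, seed=1)

def sentence_vector(tokens):
    """Average the word vectors of the tokens present in the model."""
    vectors = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vectors, axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(sentence_vector(corpus[0]), sentence_vector(corpus[1])))
```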

    Visual analytics of location-based social networks for decision support

    Get PDF
    Recent advances in technology have enabled people to add location information to social networks, called Location-Based Social Networks (LBSNs), where people share their communication and whereabouts not only in their daily lives, but also during abnormal situations, such as crisis events. However, since the volume of the data exceeds the boundaries of human analytical capabilities, it is almost impossible to perform a straightforward qualitative analysis of the data. The emerging field of visual analytics has been introduced to tackle such challenges by integrating approaches from statistical data analysis and human-computer interaction into highly interactive visual environments. Based on the idea of visual analytics, this research contributes techniques for knowledge discovery in social media data to provide comprehensive situational awareness. We extract valuable hidden information from the huge volume of unstructured social media data and model the extracted information for visualizing meaningful information through user-centered interactive interfaces. We develop visual analytics techniques and systems for spatial decision support by coupling the modeling of spatiotemporal social media data with scalable and interactive visual environments. These systems allow analysts to detect and examine abnormal events within social media data by integrating automated analytical techniques and visual methods. We provide a comprehensive analysis of public behavior response in disaster events by exploring and examining the spatial and temporal distribution of LBSNs. We also propose a trajectory-based visual analytics of LBSNs for anomalous human movement analysis during crises by incorporating a novel classification technique. Finally, we introduce a visual analytics approach for forecasting the overall flow of human crowds.
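
A minimal sketch of the kind of spatio-temporal aggregation such systems start from: geotagged posts are bucketed into (grid cell, hour) bins and unusually busy bins are flagged as candidate abnormal events. The grid size, threshold, and sample records are assumptions, not the dissertation's actual detection pipeline.

```python
from collections import Counter
from statistics import mean, stdev

posts = [  # (latitude, longitude, hour-of-day), illustrative records
    (40.71, -74.00, 14), (40.71, -74.01, 14), (40.72, -74.00, 14),
    (40.71, -74.00, 14), (40.80, -73.95, 9), (40.60, -74.10, 20),
]

def bin_key(lat, lon, hour, cell=0.05):
    """Snap a post to a coarse spatial grid cell and its hour of day."""
    return (round(lat / cell) * cell, round(lon / cell) * cell, hour)

counts = Counter(bin_key(lat, lon, hour) for lat, lon, hour in posts)
values = list(counts.values())
threshold = mean(values) + (stdev(values) if len(values) > 1 else 0.0)

for key, count in counts.items():
    if count > threshold:
        print("candidate event bin:", key, "posts:", count)
```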

    Modeling Complex High Level Interactions in the Process of Visual Mining

    Get PDF
    Visual Mining refers to the human analytical process that uses visual representations of raw data and makes suitable inferences. During this analytical process, users are engaged in complex cognitive activities such as decision making, problem solving, analytical reasoning and learning. Nowadays, users typically use interactive visualization tools, which we call visual mining support tools (VMSTs), to mediate their interactions with the information present in visual representations of raw data and also to support their complex cognitive activities when performing visual mining. VMSTs have two main components: visual representation and interaction. Even though these two components are both fundamental aspects of VMSTs, research on visual representation has received the most attention. It is still unclear how to design interactions which can properly support users in performing complex cognitive activities during the visual mining process. Although some fundamental concepts and techniques regarding interaction design have been in place for a while, many established researchers are of the opinion that we do not yet have a generalized, principled, and systematic understanding of the interaction components of these VMSTs, and of how interactions should be analyzed, designed, and integrated to support complex cognitive activities. Many researchers have recommended that one way to address this problem is through appropriate characterization of interactions in the visual mining process. Models that provide classifications of interactions have indeed been proposed in the visualization research community. While these models are important contributions to the visualization research community, they often characterize interactions at lower levels of human information interaction, and high-level interactions are not well addressed. In addition, some of these models are not designed to model user activity; rather, they are most applicable for representing a system’s response to user activity and not the user activity itself. In this thesis, we address this problem through characterization of the interaction space of visual mining at the appropriate level. Our main contribution in this research is the discovery of a small set of classification criteria which can comprehensively characterize the interaction space of visual mining involving interactions with VMSTs for performing complex cognitive activities. These complex cognitive activities are modeled through visual mining episodes, coherent sets of activities consisting of visual mining strategies (VMSs). Using the classification criteria, VMSs are simply described as combinations of different values of these criteria. By considering all combinations, we can comprehensively cover the interaction space of visual mining. Our VMS interaction space model is unique in identifying the activity tier, a granularity of interactions (high level) which supports the performance of complex cognitive activities through interactions with visual information using VMSTs. As a further demonstration of the utility of this VMS interaction space model, we describe the formulation of an inspection framework which can provide quantitative measures for the support provided by VMSTs for complex cognitive activities in visual mining. This inspection framework, which has enabled us to produce a new, simpler evaluation method for VMSTs compared to existing evaluation methods, is based soundly on existing theories and models.
Both the VMS interaction space model and the inspection framework present many interesting avenues for further research.

    R-CAD: Rare Cyber Alert Signature Relationship Extraction Through Temporal Based Learning

    Get PDF
    The large number of streaming intrusion alerts makes it challenging for security analysts to quickly identify attack patterns. This is especially difficult since critical alerts often occur too rarely for traditional pattern mining algorithms to be effective. Recognizing attack speed as an inherent indicator of differing cyber attacks, this work aggregates alerts into attack episodes that have distinct attack speeds, and finds attack actions regularly co-occurring within the same episode. This enables a novel use of the constrained SPADE temporal pattern mining algorithm to extract consistent co-occurrences of alert signatures that are indicative of attack actions that follow each other. The proposed Rare yet Co-occurring Attack action Discovery (R-CAD) system extracts not only the co-occurring patterns but also the temporal characteristics of the co-occurrences, giving the 'strong rules' indicative of critical and repeated attack behaviors. Through the use of a real-world dataset, we demonstrate that R-CAD helps reduce the overwhelming volume and variety of intrusion alerts to a manageable set of co-occurring strong rules. We show specific rules that reveal how critical attack actions follow one another and at what attack speed.
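
To make the episode idea concrete, the sketch below splits a stream of (timestamp, signature) alerts into episodes wherever the gap between alerts exceeds a cutoff, then counts signature pairs that co-occur within an episode. The cutoff, sample alerts, and plain pair counting are illustrative stand-ins for R-CAD's speed-based episode aggregation and constrained SPADE mining.

```python
from itertools import combinations
from collections import Counter

alerts = [  # (timestamp in seconds, alert signature), illustrative stream
    (0, "scan"), (5, "brute_force"), (9, "scan"),
    (300, "scan"), (304, "exploit"), (307, "exfiltration"),
]

def split_episodes(alerts, max_gap=60):
    """Group time-ordered alerts into episodes separated by quiet periods."""
    episodes, current = [], [alerts[0]]
    for prev, cur in zip(alerts, alerts[1:]):
        if cur[0] - prev[0] > max_gap:
            episodes.append(current)
            current = []
        current.append(cur)
    episodes.append(current)
    return episodes

pair_counts = Counter()
for episode in split_episodes(alerts):
    signatures = sorted({sig for _, sig in episode})
    pair_counts.update(combinations(signatures, 2))

print(pair_counts.most_common())
```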