    Extracting Flow Features Using Bag-of-Features and Supervised Learning Techniques

    Measuring the similarity between two streamlines is fundamental to many important flow data analysis and visualization tasks such as feature detection, pattern querying and streamline clustering. This dissertation presents a novel streamline similarity measure inspired by the bag-of-features concept from computer vision. Unlike other streamline similarity measures, the proposed one considers both the distribution of and the distances among features along a streamline. The proposed measure is tested in two common tasks in vector field exploration: streamline similarity query and streamline clustering. Compared with a recent streamline similarity measure, the proposed measure allows users to see the interesting features more clearly in a complicated vector field. In addition to focusing on similar streamlines through streamline similarity query or clustering, users sometimes want to group and see similar features from different streamlines. For example, it is useful to find all the spirals contained in different streamlines and present them to users. To this end, this dissertation proposes to segment each streamline into different features. This problem has not been studied extensively in flow visualization. For instance, many flow feature extraction techniques segment streamlines based on simple heuristics such as accumulative curvature or arc length, and, as a result, the segments they find usually do not directly correspond to complete flow features. This dissertation proposes a machine learning-based streamline segmentation algorithm to segment each streamline into distinct features. It is shown that the proposed method can locate interesting features (e.g., a spiral in a streamline) more accurately than some other flow feature extraction methods. Since streamlines are space curves, the proposed method also serves as a general curve segmentation method and may be applied in other fields such as computer vision.
Besides flow visualization, a pedagogical visualization tool DTEvisual for teaching access control is also discussed in this dissertation. Domain Type Enforcement (DTE) is a powerful abstraction for teaching students about modern models of access control in operating systems. With DTEvisual, students have an environment for visualizing a DTE-based policy using graphs, visually modifying the policy, and animating the common DTE queries in real time. A user study of DTEvisual suggests that the tool is helpful for students to understand DTE
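
The bag-of-features idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the dissertation's measure: it covers only the distribution of features along a streamline (not the distances among them), and the scalar curvature samples, the three-word codebook, and the chi-squared distance are all assumptions chosen for illustration.

```python
from collections import Counter

def nearest_word(value, codebook):
    """Index of the codebook entry closest to a scalar feature value."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - value))

def bag_of_features(features, codebook):
    """Normalized histogram of codebook words for one streamline."""
    if not features:
        raise ValueError("a streamline needs at least one feature sample")
    counts = Counter(nearest_word(f, codebook) for f in features)
    return [counts.get(i, 0) / len(features) for i in range(len(codebook))]

def histogram_distance(h1, h2):
    """Chi-squared distance between two normalized histograms (0 = identical)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

# Hypothetical codebook of curvature "words": straight, bent, tightly curled.
codebook = [0.0, 0.5, 1.0]

# Hypothetical per-point curvature samples along three streamlines.
straight = [0.0, 0.1, 0.0, 0.05]
spiral_a = [0.9, 1.0, 1.1, 0.95]
spiral_b = [1.0, 0.9, 1.0, 0.6]

h_straight = bag_of_features(straight, codebook)
h_a = bag_of_features(spiral_a, codebook)
h_b = bag_of_features(spiral_b, codebook)

print(histogram_distance(h_a, h_b))         # small: both are mostly "curled"
print(histogram_distance(h_a, h_straight))  # large: no words in common
```

In a similarity query, the streamlines with the smallest distance to the query histogram would be returned first.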

    Evaluating 35 Methods to Generate Structural Connectomes Using Pairwise Classification

    There is no consensus on how to construct structural brain networks from diffusion MRI. How variations in pre-processing steps affect network reliability and its ability to distinguish subjects remains opaque. In this work, we address this issue by comparing 35 structural connectome-building pipelines. We vary diffusion reconstruction models, tractography algorithms and parcellations. Next, we classify structural connectome pairs as either belonging to the same individual or not. Connectome weights and eight topological derivative measures form our feature set. For experiments, we use three test-retest datasets from the Consortium for Reliability and Reproducibility (CoRR) comprising a total of 105 individuals. We also compare pairwise classification results to a commonly used parametric test-retest measure, Intraclass Correlation Coefficient (ICC). Comment: Accepted for MICCAI 2017, 8 pages, 3 figure
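
The pairwise classification setup described above can be sketched as follows. The abstract does not say how a pair of connectomes is turned into one feature vector, so the element-wise absolute difference used here is an assumption, and the subject IDs and feature values are hypothetical.

```python
from itertools import combinations

def pair_samples(scans):
    """Build one labeled sample per pair of scans.

    scans: list of (subject_id, feature_vector) tuples.
    Returns a list of (pair_features, label), where label is 1 when both
    scans come from the same subject and 0 otherwise.
    """
    samples = []
    for (sid_a, vec_a), (sid_b, vec_b) in combinations(scans, 2):
        # Assumed pair representation: element-wise absolute difference.
        diff = [abs(a - b) for a, b in zip(vec_a, vec_b)]
        samples.append((diff, 1 if sid_a == sid_b else 0))
    return samples

# Hypothetical test-retest data: two subjects, two sessions each.
scans = [
    ("s1", [0.80, 0.10, 0.30]),
    ("s1", [0.82, 0.12, 0.29]),
    ("s2", [0.20, 0.70, 0.50]),
    ("s2", [0.22, 0.68, 0.55]),
]

samples = pair_samples(scans)
labels = [label for _, label in samples]
print(len(samples), sum(labels))
```

Any binary classifier can then be trained on these samples; a pipeline that produces more reliable connectomes should make same-subject pairs easier to separate from different-subject pairs.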

    Web Content Extraction - a Meta-Analysis of its Past and Thoughts on its Future

    In this paper, we present a meta-analysis of several Web content extraction algorithms, and make recommendations for the future of content extraction on the Web. First, we find that nearly all Web content extractors do not consider a very large, and growing, portion of modern Web pages. Second, it is well understood that wrapper induction extractors tend to break as the Web changes; heuristic/feature engineering extractors were thought to be immune to a Web site's evolution, but we find that this is not the case: heuristic content extractor performance also tends to degrade over time due to the evolution of Web site forms and practices. We conclude with recommendations for future work that address these and other findings. Comment: Accepted for publication in SIGKDD Exploration

    Structure Selection from Streaming Relational Data

    Statistical relational learning techniques have been successfully applied in a wide range of relational domains. In most of these applications, the human designers capitalized on their background knowledge by following a trial-and-error trajectory, where relational features are manually defined by a human engineer, parameters are learned for those features on the training data, the resulting model is validated, and the cycle repeats as the engineer adjusts the set of features. This paper seeks to streamline application development in large relational domains by introducing a light-weight approach that efficiently evaluates relational features on pieces of the relational graph that are streamed to it one at a time. We evaluate our approach on two social media tasks and demonstrate that it leads to more accurate models that are learned faster

    Patient triage by topic modelling of referral letters: Feasibility study

    Background: Musculoskeletal conditions are managed within primary care but patients can be referred to secondary care if a specialist opinion is required. The ever-increasing demand for healthcare resources emphasizes the need to streamline care pathways with the ultimate aim of ensuring that patients receive timely and optimal care. Information contained in referral letters underpins the referral decision-making process but is yet to be explored systematically for the purposes of treatment prioritization for musculoskeletal conditions. Objective: This study aims to explore the feasibility of using natural language processing and machine learning to automate triage of patients with musculoskeletal conditions by analyzing information from referral letters. Specifically, we aim to determine whether referral letters can be automatically sorted into latent topics that are clinically relevant, i.e. considered relevant when prescribing treatments. Here, clinical relevance is assessed by posing two research questions. Can latent topics be used to automatically predict the treatment? Can clinicians interpret latent topics as cohorts of patients who share common characteristics or experience such as medical history, demographics and possible treatments? Methods: We used latent Dirichlet allocation to model each referral letter as a finite mixture over an underlying set of topics and model each topic as an infinite mixture over an underlying set of topic probabilities. The topic model was evaluated in the context of automating patient triage. Given a set of treatment outcomes, a binary classifier was trained for each outcome using previously extracted topics as the input features of the machine learning algorithm. In addition, qualitative evaluation was performed to assess human interpretability of topics.
Results: The prediction accuracy of binary classifiers outperformed the stratified random classifier by a large margin, indicating that topic modelling could be used to predict the treatment and thus effectively support patient triage. Qualitative evaluation confirmed high clinical interpretability of the topic model. Conclusions: The results established the feasibility of using natural language processing and machine learning to automate triage of patients with knee and/or hip pain by analyzing information from their referral letters
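
To make the baseline in the Results concrete, here is a minimal sketch of the expected accuracy of a stratified random classifier. It assumes "stratified random" means drawing each prediction at random according to the observed class frequencies; the abstract does not define the term, and the label counts below are hypothetical.

```python
from collections import Counter

def stratified_baseline_accuracy(labels):
    """Expected accuracy when predictions are drawn from the class frequencies.

    If class k has frequency p_k, a random prediction matches a true label of
    class k with probability p_k, so the expected accuracy is sum(p_k ** 2).
    """
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    return sum((count / n) ** 2 for count in Counter(labels).values())

# Hypothetical binary outcome: 3 of 10 patients received the treatment.
labels = [1] * 3 + [0] * 7
baseline = stratified_baseline_accuracy(labels)
print(round(baseline, 2))  # 0.3**2 + 0.7**2 = 0.58
```

A trained classifier is only informative for triage if its accuracy clearly exceeds this value for the same class balance.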

    Computational support for academic peer review: a perspective from artificial intelligence

    Combination of Domain Knowledge and Deep Learning for Sentiment Analysis of Short and Informal Messages on Social Media

    Sentiment analysis has recently emerged as one of the major natural language processing (NLP) tasks in many applications. The task is increasingly crucial because social media channels (e.g. social networks or forums) have become significant sources for brands to observe user opinions about their products. However, real data obtained from social media contain a high volume of short and informal messages posted by users on those channels. Existing methods, especially those based on deep learning, struggle to handle this kind of data. In this paper, we propose an approach to handle this problem. This work extends our previous work, in which we proposed to combine the typical deep learning technique of Convolutional Neural Networks with domain knowledge. The combination is used to obtain additional training data through augmentation and a more reasonable loss function. In this work, we further improve our architecture with several substantial enhancements, including negation-based data augmentation, transfer learning for word embeddings, the combination of word-level embeddings and character-level embeddings, and a multitask learning technique for attaching domain knowledge rules to the learning process. Those enhancements, specifically aimed at handling short and informal messages, yield a significant improvement in performance in experiments on real datasets. Comment: A Preprint of an article accepted for publication by Inderscience in IJCVR on September 201