
    ARIANA: Adaptive Robust and Integrative Analysis for finding Novel Associations

    The effective mining of biological literature can provide services such as hypothesis generation, semantics-sensitive information retrieval, and knowledge discovery, which are important for understanding the confluence of different diseases, genes, and risk factors; integrating different tools at specific levels can add further value. The main focus of this dissertation is developing and integrating tools for finding networks of semantically related entities. The key contribution is the design and implementation of an Adaptive Robust and Integrative Analysis for finding Novel Associations. ARIANA is a software architecture and a web-based system for efficient and scalable knowledge discovery. It integrates semantic-sensitive analysis of text data through ontology mapping with database search technology to ensure the required specificity. ARIANA was prototyped using the Medical Subject Headings ontology and the PubMed database and has demonstrated great success as a dynamic-data-driven system. ARIANA has five main components: (i) Data Stratification, (ii) Ontology-Mapping, (iii) Parameter Optimized Latent Semantic Analysis, (iv) Relevance Model, and (v) Interface and Visualization. A further contribution is the integration of ARIANA with the Online Mendelian Inheritance in Man database and the Medical Subject Headings ontology to provide gene-disease associations. Empirical studies produced some exciting knowledge-discovery instances. Among them was the connection between hexamethonium and pulmonary inflammation and fibrosis. In 2001, a research study at Johns Hopkins used the drug hexamethonium on a healthy volunteer, which ended in a tragic death due to pulmonary inflammation and fibrosis. This accident might have been prevented had the researcher known of a published case report. Since the original case report in 1955, there have been no publications regarding that association.
ARIANA extracted this knowledge even though its database contains publications only from 1960 to 2012. Out of 2,545 concepts, ARIANA ranked “Scleroderma, Systemic”, “Neoplasms, Fibrous Tissue”, “Pneumonia”, “Fibroma”, and “Pulmonary Fibrosis” as the 13th, 16th, 38th, 174th, and 257th concepts, respectively. Had the researcher had access to such knowledge, this drug would likely not have been used on healthy subjects. In today's world, where data and knowledge are moving away from each other, semantic-sensitive tools such as ARIANA can bridge that gap and advance the dissemination of knowledge.
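The ranking above comes from ARIANA's latent semantic analysis component, which embeds concepts and queries in a low-rank space derived from a term-document matrix and ranks concepts by similarity. The thesis's parameter-optimized variant is not reproduced here; the sketch below shows only plain LSA with a truncated SVD and query folding, and the matrix, query, and function name are illustrative, not taken from ARIANA.

```python
import numpy as np

def lsa_rank(term_doc, query_vec, k=2):
    """Rank documents/concepts by cosine similarity to a query in a
    k-dimensional latent semantic space (plain LSA, no optimization)."""
    # Truncated SVD of the term-document matrix: M ~= U_k S_k V_k^T
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T   # Vk rows: doc coordinates
    # Fold the query into the latent space: q_hat = S_k^{-1} U_k^T q
    q_hat = query_vec @ Uk / sk
    # Cosine similarity between the folded query and every document
    sims = Vk @ q_hat / (np.linalg.norm(Vk, axis=1)
                         * np.linalg.norm(q_hat) + 1e-12)
    return np.argsort(-sims)  # document indices, most related first
```

A real pipeline would build `term_doc` from weighted, MeSH-mapped text and tune `k`; here `k` is fixed purely for illustration.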

    Understanding Mobility and Transport Modal Disparities Using Emerging Data Sources: Modelling Potentials and Limitations

    Transportation presents a major challenge for curbing climate change, due in part to its ever-increasing travel demand. Better-informed policy-making requires up-to-date empirical mobility data to model viable mitigation options for reducing emissions from the transport sector. On the one hand, the prevalence of digital technologies enables large-scale collection of human mobility traces, offering great potential for improving the understanding of mobility patterns and transport modal disparities. On the other hand, advances in data science allow us to keep pushing the boundaries of the potentials and limitations of new uses of big data in transport. This thesis uses emerging data sources, including Twitter data, traffic data, OpenStreetMap (OSM), and trip data from new transport modes, to enhance the understanding of mobility and transport modal disparities, e.g., how car and public transit support mobility differently. Specifically, this thesis aims to answer two research questions: (1) What are the potentials and limitations of using these emerging data sources for modelling mobility? (2) How can these new data sources be properly modelled for characterising transport modal disparities? Papers I-III model mobility mainly using geotagged social media data and reveal the potentials and limitations of this data source by validating it against established sources (Q1). Papers IV-V combine multiple data sources to characterise transport modal disparities (Q2), which further demonstrates the modelling potentials of the emerging data sources (Q1). Despite a biased population representation and low, irregular sampling of actual mobility, the geolocations of Twitter data can be used in models that agree well with other data sources on the fundamental characteristics of individual and population mobility. However, its feasibility for estimating travel demand depends on spatial scale, sparsity, sampling method, and sample size.
To extend the use of social media data, this thesis develops two novel approaches to address the sparsity issue: (1) an individual-based mobility model that fills the gaps in sparse mobility traces to generate synthetic travel demand; (2) a population-based model that uses Twitter geolocations as attractions instead of trips for estimating the flows of people between regions. This thesis also presents two reproducible data fusion frameworks for characterising transport modal disparities. They demonstrate the power of combining different data sources to gain new insights into the spatiotemporal patterns of travel-time disparities between car and public transit, and into the competition between ride-sourcing and public transport.
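The second, population-based approach treats geolocation counts as a measure of how strongly each region attracts trips. The thesis's actual model is not reproduced here; the sketch below is a generic gravity model in that spirit, where `attraction` would hold the number of Twitter geolocations per region, and the names and the distance-decay form are assumptions.

```python
import numpy as np

def gravity_flows(population, attraction, dist, beta=2.0):
    """Estimate origin-destination flows with a simple gravity model:
    T_ij proportional to P_i * A_j / d_ij^beta, with each origin row
    normalised so its outgoing flows sum to that origin's population."""
    dist = np.array(dist, float)
    np.fill_diagonal(dist, np.inf)            # exclude self-flows
    raw = population[:, None] * attraction[None, :] / dist ** beta
    return raw / raw.sum(axis=1, keepdims=True) * population[:, None]
```

In practice the decay exponent `beta` would be calibrated against observed trips rather than fixed.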

    Constructing hypergraphs from temporal data

    A wide range of systems across the social and natural sciences produce temporal data consisting of interaction events among nodes in disjoint sets. Online shopping, for example, generates purchasing events of the form (user, product, time of purchase), and mutualistic interactions in plant-pollinator systems generate pollination events of the form (insect, plant, time of pollination). These data sets can be meaningfully modeled as temporal hypergraph snapshots in which multiple nodes within one set (e.g., online shoppers) share a hyperedge if they interacted with a common node in the opposite set (e.g., purchased the same product) within a given time window, allowing for the application of a range of hypergraph analysis techniques. However, it is often unclear how to choose the number and duration of these temporal snapshots, which have a strong influence on the final hypergraph representations. Here we propose a principled, efficient, nonparametric solution to this longstanding problem by extracting temporal hypergraph snapshots that optimally capture structural regularities in temporal event data according to the minimum description length principle. We demonstrate our methods on real and synthetic datasets, finding that they can recover planted artificial hypergraph structure in the presence of considerable noise and reveal meaningful activity fluctuations in human mobility data.
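With a window duration fixed in advance, the snapshot construction described above is straightforward; a minimal sketch (names and event format assumed for illustration) follows. The paper's contribution is choosing the number and duration of the windows automatically via the minimum description length principle, which this fixed-window sketch deliberately omits.

```python
from collections import defaultdict

def hypergraph_snapshots(events, t0, window):
    """Build hypergraph snapshots from (node, anchor, time) events:
    within each fixed-duration window, all nodes that interacted with
    the same anchor (e.g. bought the same product) share a hyperedge."""
    buckets = defaultdict(lambda: defaultdict(set))
    for node, anchor, t in events:
        w = int((t - t0) // window)        # index of the snapshot
        buckets[w][anchor].add(node)
    # keep only genuine hyperedges (two or more nodes per anchor)
    return {w: [frozenset(m) for m in groups.values() if len(m) >= 2]
            for w, groups in buckets.items()}
```

For instance, two users buying the same product within one window end up in a shared hyperedge, while a product bought by a single user contributes no hyperedge at all.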

    Self-building Artificial Intelligence and machine learning to empower big data analytics in smart cities

    The emerging information revolution makes it necessary to manage vast amounts of unstructured data rapidly. As the world is increasingly populated by IoT devices and sensors that can sense their surroundings and communicate with each other, a digital environment has been created with vast volumes of volatile and diverse data. Traditional AI and machine learning techniques designed for deterministic situations are not suitable for such environments. With the large number of parameters required by each device in this digital environment, it is desirable that the AI be able to adapt and self-build (i.e. self-structure, self-configure, self-learn) rather than be structurally and parameter-wise pre-defined. This study explores the benefits of self-building AI and machine learning with unsupervised learning for empowering big data analytics in smart city environments. Using the growing self-organizing map, a new suite of self-building AI is proposed. The self-building AI overcomes the limitations of traditional AI and enables data processing in dynamic smart city environments. With cloud computing platforms, the self-building AI can integrate the data analytics applications that currently work in silos. The new paradigm of the self-building AI and its value are demonstrated using IoT, video surveillance, and action recognition applications. Supported by the Data to Decisions Cooperative Research Centre (D2D CRC) as part of their analytics and decision support program and a La Trobe University Postgraduate Research Scholarship.
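The growing self-organizing map underlying this work extends the classic SOM by inserting new nodes where a node's accumulated quantization error becomes too large. That growth logic is not reproduced here; the sketch below shows only the base SOM update step on which it builds, with the names and grid shape chosen for illustration.

```python
import numpy as np

def som_step(weights, x, lr=0.1, sigma=1.0):
    """One self-organizing-map update: find the best-matching unit
    (BMU) for input x on a grid of weight vectors, then pull the BMU
    and its grid neighbours toward x with a Gaussian neighbourhood."""
    rows, cols, _ = weights.shape
    # BMU = grid node whose weight vector is closest to x
    d = np.linalg.norm(weights - x, axis=2)
    bi, bj = np.unravel_index(np.argmin(d), d.shape)
    # neighbourhood strength decays with grid distance from the BMU
    ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    h = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * sigma ** 2))
    weights += lr * h[:, :, None] * (x - weights)  # in-place update
    return bi, bj
```

A growing variant would additionally track per-node error and add nodes at the map boundary when that error exceeds a threshold, which is what makes the map self-structuring.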

    Fine-Grained Image Analysis with Deep Learning: A Survey

    Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer vision and pattern recognition, and it underpins a diverse set of real-world applications. FGIA targets the analysis of visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class and large intra-class variation inherent to fine-grained imagery makes it a challenging problem. Capitalizing on advances in deep learning, recent years have witnessed remarkable progress in deep-learning-powered FGIA. In this paper we present a systematic survey of these advances, in which we attempt to re-define and broaden the field of FGIA by consolidating two fundamental fine-grained research areas: fine-grained image recognition and fine-grained image retrieval. We also review other key issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications. We conclude by highlighting several research directions and open problems that need further exploration by the community. Comment: Accepted by IEEE TPAMI.

    End-to-end anomaly detection in stream data

    Nowadays, huge volumes of data are generated with increasing velocity by various systems, applications, and activities. This increases the demand for stream and time series analysis that reacts to changing conditions in real time, for enhanced efficiency and quality of service delivery as well as upgraded safety and security in the private and public sectors. Despite its very rich history, time series anomaly detection is still one of the vital topics in machine learning research and is receiving increasing attention. Identifying hidden patterns and selecting an appropriate model that fits the observed data well, and also carries over to unobserved data, is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic is loaded with challenges such as complex latent patterns, concept drift, and overfitting, which may mislead a model and cause a high false alarm rate. Handling these challenges has led advanced anomaly detection methods to develop sophisticated decision logic, which turns them into opaque and inexplicable black boxes. Contrary to this trend, end-users expect transparency and verifiability in order to trust a model and the outcomes it produces. Pointing users to the most anomalous or malicious regions of a time series, and to the causal features, could also save them time, energy, and money. For these reasons, this thesis addresses the crucial challenges in an end-to-end pipeline of stream-based anomaly detection through three essential phases: behavior prediction, inference, and interpretation. The first step is devising a time series model that yields high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that use the related contexts to reclassify observations and post-prune unjustified events.
Last but not least, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its generated results, based on concepts understandable to a human. The provided insight can pinpoint the anomalous regions of a time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation supporting our economy, security, and even the safety and health of societies worldwide. We believe our proposed analysis techniques can contribute to building situational awareness platforms and open new perspectives in a variety of domains such as cybersecurity and health.
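The scoring stage of such a pipeline assigns each observation a degree of abnormality before any contextual reclassification. The thesis's own techniques are more sophisticated; the sketch below is only a common rolling z-score baseline, with the window size, threshold, and names chosen for illustration.

```python
import numpy as np

def anomaly_scores(series, window=20, threshold=3.0):
    """Score each point by its deviation from a rolling mean/std
    baseline; flag points whose |z|-score exceeds the threshold."""
    series = np.asarray(series, float)
    scores = np.zeros(len(series))
    flags = np.zeros(len(series), dtype=bool)
    for i in range(window, len(series)):
        hist = series[i - window:i]          # trailing baseline window
        z = (series[i] - hist.mean()) / (hist.std() + 1e-9)
        scores[i] = abs(z)
        flags[i] = scores[i] > threshold
    return scores, flags
```

A post-pruning step, as described above, would then re-examine flagged events in their surrounding context and discard those the context justifies.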