10 research outputs found

    Doctor of Philosophy

    Get PDF
    Data-driven analytics has been successfully utilized in many experience-oriented areas, such as education, business, and medicine. With the profusion of traffic-related data from the Internet of Things and the development of data mining techniques, data-driven analytics is becoming increasingly popular in the transportation industry. The objective of this research is to explore the application of data-driven analytics in transportation research to improve traffic management and operations. Three problems in the respective areas of transportation planning, traffic operation, and maintenance management are addressed: exploring the impact of a dynamic ridesharing system in a multimodal network, quantifying the impact of non-recurrent congestion on freeway corridors, and developing an infrastructure sampling method for efficient maintenance activities. First, the impact of dynamic ridesharing in a multimodal network is studied with agent-based modeling. The competing mechanism between the dynamic ridesharing system and public transit is analyzed. The model simulates the interaction between travelers and the environment, emulates travelers' decision-making process in the presence of competing modes, and is applicable to networks with varying demographics. Second, a systematic approach is proposed to quantify incident-induced delay on freeway corridors. The study of non-recurrent congestion quantification has two particular highlights: secondary incident identification and K-Nearest Neighbor pattern matching. The proposed methodology is easily transferable to any traffic operations system that has access to sensor data at the corridor level. Lastly, a high-dimensional clustering-based stratified sampling method is developed for infrastructure sampling. The stratification process consists of two components: current condition estimation and high-dimensional cluster analysis. The high-dimensional cluster analysis employs a Locality-Sensitive Hashing algorithm and spectral sampling. The proposed method is a potentially useful tool for agencies to conduct infrastructure inspection effectively and can be easily adopted for choosing samples containing multiple features. These three examples showcase the application of data-driven analytics in transportation research, which can potentially transform the traffic management mindset into a model of data-driven, sensing, and smart urban systems.
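    The K-Nearest Neighbor pattern matching mentioned above lends itself to a compact illustration. The sketch below is a simplified, hypothetical rendition (the variable names and the Euclidean matching criterion are illustrative assumptions, not the dissertation's implementation): the current day's sensor speed profile is matched against archived days, and the average of the k closest days serves as the recurrent-traffic baseline against which non-recurrent, incident-induced delay can be measured.

        # A hedged, simplified sketch of KNN pattern matching for estimating a
        # "normal" (incident-free) traffic profile. All names and the distance
        # choice are illustrative assumptions, not the dissertation's method.
        import numpy as np

        def knn_baseline(history, observed, k=5):
            """Estimate the recurrent speed profile of the current day.

            history  : (n_days, n_intervals) archived sensor speed profiles
            observed : (n_intervals,) speed profile of the current day
            Returns the mean profile of the k most similar historical days.
            """
            dists = np.linalg.norm(history - observed, axis=1)  # Euclidean match
            nearest = np.argsort(dists)[:k]
            return history[nearest].mean(axis=0)

        # Toy usage: the gap between the matched baseline and the observed
        # profile is one plausible proxy for non-recurrent (incident) delay.
        rng = np.random.default_rng(1)
        history = 60 + rng.normal(0, 3, (200, 96))   # 200 days, 15-min bins
        observed = 60 + rng.normal(0, 3, 96)
        observed[40:52] -= 25                        # simulated incident dip
        baseline = knn_baseline(history, observed)
        excess = np.clip(baseline - observed, 0, None).sum()
        print(f"excess slowdown attributable to the incident: {excess:.1f}")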

    Re-designing distance functions and distance-based applications for high dimensional data

    No full text

    Data and knowledge engineering for medical image and sensor data

    Get PDF

    Image Analysis Applications of the Maximum Mean Discrepancy Distance Measure

    Get PDF
    The need to quantify distance between two groups of objects is prevalent throughout the signal processing world. The difference of group means computed using the Euclidean, or L2, distance is one of the predominant distance measures used to compare feature vectors and groups of vectors, but many problems arise with it when high data dimensionality is present. Maximum mean discrepancy (MMD) is a recent unsupervised kernel-based pattern recognition method which, when paired with the proper feature representations and kernels, may improve differentiation between two distinct populations over many commonly used methods such as the difference of means. MMD-based distance computation combines many powerful concepts from the machine learning literature, such as similarity measures that leverage the data distribution, and kernel methods. Due to this heritage, we posit that dissimilarity-based classification and changepoint detection using MMD can lead to enhanced separation between different populations. To test this hypothesis, we conduct studies comparing MMD and the difference of means in two subareas of image analysis and understanding: first, to detect scene changes in video in an unsupervised manner, and second, in the biomedical imaging field, using clinical ultrasound to assess tumor response to treatment. We leverage effective computer vision data descriptors, such as the bag-of-visual-words and sparse combinations of SIFT descriptors, and choose from an assessment of several similarity kernels (e.g. Histogram Intersection, Radial Basis Function) in order to engineer useful systems using MMD. Promising improvements over the difference of means, measured primarily using precision/recall for scene change detection and k-nearest neighbour classification accuracy for tumor response assessment, are obtained in both applications.
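    As a concrete reference, the standard biased estimator of squared MMD between samples X and Y is mean(k(x, x')) + mean(k(y, y')) - 2 * mean(k(x, y)), taken over kernel evaluations within and across the two samples. The sketch below computes it with an RBF kernel using NumPy; it illustrates the measure itself, not the thesis's full bag-of-visual-words pipeline, and all names are illustrative.

        # A minimal sketch of the (biased) MMD^2 estimate with an RBF kernel.
        # Feature extraction is assumed to have already produced X and Y.
        import numpy as np

        def rbf_kernel(A, B, sigma=1.0):
            """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
            sq_dists = (
                np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T
            )
            return np.exp(-sq_dists / (2.0 * sigma**2))

        def mmd2(X, Y, sigma=1.0):
            """Biased estimate of squared maximum mean discrepancy."""
            Kxx = rbf_kernel(X, X, sigma)
            Kyy = rbf_kernel(Y, Y, sigma)
            Kxy = rbf_kernel(X, Y, sigma)
            return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

        # Toy usage: two populations with different means should give a larger
        # MMD^2 than two draws from the same population.
        rng = np.random.default_rng(0)
        same = mmd2(rng.normal(0, 1, (100, 8)), rng.normal(0, 1, (100, 8)))
        diff = mmd2(rng.normal(0, 1, (100, 8)), rng.normal(2, 1, (100, 8)))
        print(f"same-distribution MMD^2: {same:.4f}, shifted MMD^2: {diff:.4f}")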

    Building efficient wireless infrastructures for pervasive computing environments

    Get PDF
    Pervasive computing is an emerging concept that thoroughly brings computing devices and the consequent technology into people's daily life and activities. Most of these computing devices are very small, sometimes even invisible, and often embedded into the objects surrounding people. In addition, these devices usually are not isolated, but networked with each other through wireless channels so that people can easily control and access them. In the architecture of pervasive computing systems, these small and networked computing devices form a wireless infrastructure layer to support various functionalities in the upper application layer.

    In practical applications, the wireless infrastructure often plays the role of a data provider in a query/reply model, i.e., applications issue a query requesting certain data and the underlying wireless infrastructure is responsible for replying to the query. This dissertation has focused on the most critical issue in designing such a wireless infrastructure: efficiency. In particular, our problem resides in two domains depending on the definition of efficiency. The first is time efficiency, i.e., how quickly a query can be answered. Many applications, especially real-time applications, require a prompt response to a query, as subsequent operations may be affected by the prior delay. The second is energy efficiency, which is extremely important for pervasive computing devices powered by batteries. Above all, our design goal is to reply to a query from applications quickly and with low energy cost.

    This dissertation has investigated two representative wireless infrastructures, sensor networks and RFID systems, both of which can serve applications with useful information about the environment. We have comprehensively explored various important and representative problems from both algorithmic and experimental perspectives, including efficient network architecture design and efficient protocols for basic queries and complicated data mining queries. The major design challenges in achieving efficiency are the massive amount of data involved in a query and the extremely limited resources and capability each small device possesses. We have proposed novel and efficient solutions with intensive evaluation. Compared to prior work, this dissertation has identified a few important new problems, and the proposed solutions significantly improve performance in terms of time efficiency and energy efficiency. Our work also provides referable insights and an appropriate methodology for other similar problems in the research community.
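    To make the efficiency argument concrete, the toy sketch below (an illustration of the general idea, not a protocol from the dissertation) shows why in-network aggregation is attractive in the query/reply model: a MAX query answered along a routing tree costs one radio message per link, rather than forwarding every raw reading hop by hop to the sink.

        # Illustrative sketch only: in-network aggregation of a MAX query over
        # a routing tree. Node names and topology are invented for the example.
        from dataclasses import dataclass, field

        @dataclass
        class Node:
            reading: float
            children: list = field(default_factory=list)

        def aggregate_max(node, counter):
            """Return the subtree maximum; count one message per child link."""
            best = node.reading
            for child in node.children:
                counter[0] += 1              # one radio message per tree link
                best = max(best, aggregate_max(child, counter))
            return best

        # Toy topology: a sink with two relay nodes, each with two leaf sensors.
        leaves = [Node(r) for r in (21.0, 23.5, 19.8, 25.1)]
        sink = Node(20.0, [Node(22.0, leaves[:2]), Node(18.5, leaves[2:])])
        msgs = [0]
        print(f"network max = {aggregate_max(sink, msgs)}, messages = {msgs[0]}")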

    Physically inspired methods and development of data-driven predictive systems.

    Get PDF
    Traditionally, the building of predictive models is perceived as a combination of both science and art. Although the designer of a predictive system effectively follows a prescribed procedure, their domain knowledge as well as expertise and intuition in the field of machine learning are often irreplaceable. However, in many practical situations it is possible to build well-performing predictive systems by following a rigorous methodology and offsetting not only the lack of domain knowledge but also a partial lack of expertise and intuition with computational power. The generalised predictive model development cycle discussed in this thesis is an example of such a methodology which, despite being computationally expensive, has been successfully applied to real-world problems.

    The proposed predictive system design cycle is a purely data-driven approach. The quality of the data used to build the system is thus of crucial importance. In practice, however, the data is rarely perfect. Common problems include missing values, high dimensionality, or a very limited amount of labelled exemplars. In order to address these issues, this work investigated and exploited inspirations coming from physics. The novel use of well-established physical models in the form of potential fields has resulted in the derivation of a comprehensive Electrostatic Field Classification Framework for supervised and semi-supervised learning from incomplete data.

    Although computational power constantly becomes cheaper and more accessible, it is not infinite. Therefore, efficient techniques able to exploit the finite predictive information content of the data and limit the computational requirements of the resource-hungry predictive system design procedure are very desirable. In designing such techniques, this work once again investigated and exploited inspirations coming from physics. By using an analogy with a set of interacting particles and the resulting Information Theoretic Learning framework, the Density Preserving Sampling technique has been derived. This technique acts as a computationally efficient alternative to cross-validation and fits well within the proposed methodology.

    All methods derived in this thesis have been thoroughly tested on a number of benchmark datasets. The proposed generalised predictive model design cycle has been successfully applied to two real-world environmental problems, in which a comparative study of Density Preserving Sampling and cross-validation was also performed, confirming the great potential of the proposed methods.
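    The idea behind Density Preserving Sampling can be illustrated with a deliberately simplified sketch: split a dataset into two halves by greedily pairing nearest neighbours and sending one member of each pair to each half, so both halves approximate the same underlying density and can stand in for cross-validation folds. This is a toy rendition of the concept under stated assumptions, not the algorithm actually derived in the thesis.

        # A hedged, simplified illustration of density-preserving splitting via
        # greedy nearest-neighbour pairing. Names and logic are illustrative.
        import numpy as np

        def density_preserving_split(X, rng):
            """Split row indices of X into two density-matched halves."""
            idx = list(rng.permutation(len(X)))
            half_a, half_b = [], []
            while len(idx) >= 2:
                i = idx.pop(0)
                # nearest remaining neighbour of point i
                dists = [np.linalg.norm(X[i] - X[j]) for j in idx]
                j = idx.pop(int(np.argmin(dists)))
                half_a.append(i)
                half_b.append(j)
            if idx:                          # odd-sized data: leftover to A
                half_a.append(idx.pop())
            return np.array(half_a), np.array(half_b)

        # Toy usage: both halves should show similar means and spreads, which
        # is what makes them usable as cheap stand-ins for validation folds.
        rng = np.random.default_rng(2)
        X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
        a, b = density_preserving_split(X, rng)
        print("half A mean:", X[a].mean(axis=0), "half B mean:", X[b].mean(axis=0))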
