188 research outputs found

    Word embedding-based techniques for text clustering and topic modelling with application in the healthcare domain

    University of Technology Sydney. Faculty of Engineering and Information Technology. In the field of text analytics, document clustering and topic modelling are two widely used tools for many applications. Document clustering aims to automatically organize similar documents into groups, which is crucial for document organization, browsing, summarization, classification and retrieval. Topic modelling refers to unsupervised models that automatically discover the main topics of a collection of documents. In topic modelling, the topics are represented as probability distributions over the words in the collection (the different probabilities reveal what topic is at stake). In turn, each document is represented as a distribution over the topics. Such distributions can also be seen as low-dimensional representations of the documents that can be used for information retrieval, document summarization and classification. Document clustering and topic modelling are highly correlated and can mutually benefit from each other. Many document clustering algorithms exist, including the classic k-means. In this thesis, we have developed three new algorithms: 1) a maximum-margin clustering approach which was originally proposed for general data, but can also suit text clustering, 2) a modified global k-means algorithm for text clustering which is able to improve on local minima and find a deeper local solution for clustering document collections in a limited amount of time, and 3) a taxonomy-augmented algorithm which addresses two main drawbacks of the so-called “bag-of-words” (BoW) models, namely, the curse of dimensionality and the dismissal of word ordering. Our main emphasis is on high accuracy and effectiveness within the bounds of limited memory consumption. Although great effort has been devoted to topic modelling to date, a limitation of many topic models such as latent Dirichlet allocation is that they do not take the words’ relations explicitly into account. Our contribution has been two-fold. First, we have developed a topic model which captures how words are topically related. The model is presented as a semi-supervised Markov chain topic model in which topics are assigned to individual words based on how each word is topically connected to the previous one in the collection. Second, we have combined topic modelling and clustering to propose a new algorithm that benefits from both. This research was industry-driven, focusing on projects from the Transport Accident Commission (TAC), a major accident compensation agency of the Victorian Government in Australia. It has received full ethics approval from the UTS Human Research Ethics Committee. The results presented in this thesis do not allow reidentifying any person involved in the services

    A simulated annealing-based maximum-margin clustering algorithm

    © 2018 Wiley Periodicals, Inc. Maximum-margin clustering is an extension of the support vector machine (SVM) to clustering. It partitions a set of unlabeled data into multiple groups by finding hyperplanes with the largest margins. Although existing algorithms have shown promising results, there is no guarantee that these algorithms converge to global solutions, because the optimization problem is nonconvex. In this paper, we propose a simulated annealing-based algorithm that is able to mitigate the issue of local minima in the maximum-margin clustering problem. The novelty of our algorithm is twofold, i.e., (i) it comprises a comprehensive cluster modification scheme based on simulated annealing, and (ii) it introduces a new approach based on the combination of k-means++ and SVM at each step of the annealing process. More precisely, k-means++ is initially applied to extract subsets of the data points. Then, an unsupervised SVM is applied to improve the clustering results. Experimental results on various benchmark data sets (of up to over a million points) give evidence that the proposed algorithm is more effective at solving the clustering problem than a number of popular clustering algorithms
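The annealing idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the k-means++ and SVM steps are replaced here by a plain within-cluster sum-of-squares cost, and the names `wcss` and `anneal_clusters`, the cooling schedule and all parameter values are assumptions made for illustration only. What it shows is the acceptance rule that lets a search leave a local minimum: a worse assignment is accepted with probability exp(-delta / T).

```python
import math
import random

def wcss(points, labels, k):
    """Within-cluster sum of squared distances to each cluster mean.
    Stand-in cost; the paper's objective is margin-based, not this one."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(sum((p[d] - mean[d]) ** 2 for d in range(dim)) for p in members)
    return total

def anneal_clusters(points, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over cluster labels (hypothetical helper)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    cost = wcss(points, labels, k)
    best_labels, best_cost = labels[:], cost
    t = t0
    for _ in range(steps):
        # Proposal: move one randomly chosen point to a random cluster.
        i = rng.randrange(len(points))
        old = labels[i]
        labels[i] = rng.randrange(k)
        new_cost = wcss(points, labels, k)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with prob. exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best_labels, best_cost = labels[:], cost
        else:
            labels[i] = old  # reject: undo the move
        t *= cooling  # geometric cooling schedule (an assumption)
    return best_labels, best_cost
```

A call such as `anneal_clusters([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)` returns the best labelling seen and its cost. Recomputing the full cost at every step keeps the sketch short but would be too slow for data sets of the size mentioned in the abstract.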

    Evaluation of Mesh Characteristics for Scale-Resolving Simulation of Incompressible Flows

    SUMMARY The main objective of this research is the simulation of flows similar to those encountered inside the draft tubes of hydraulic turbines operating away from their nominal conditions. This application matters because turbines must often be operated over a wide range of operating conditions, including conditions away from the best efficiency point. The reason is that hydroelectricity plays an important role as a flexible source of energy supply for the electrical grid. Hydroelectric power is particularly important now that intermittent energy sources such as solar and wind power are part of the market. However, widening the range of operating conditions makes the analysis of fluctuating stresses more crucial. These stresses can lead to instabilities, mechanical failures of the system and spontaneous power oscillations on the grid. Understanding and mitigating the unsteady behaviour of hydraulic turbines is therefore central. SRS (Scale-Resolving Simulation) approaches such as LES and DES have attracted considerable interest over the past decade as a route to a more complete understanding of the unsteady operational behaviour of hydraulic turbines. This interest stems from their ability to resolve part of the turbulent flow. However, for some industrial flows, such as those at part load, deep part load or speed no-load, for which experimental data are insufficient for a thorough understanding of the phenomena, the reliability of numerical simulations in terms of mesh dependency remains an open problem. Verification studies in LES are also very difficult, because numerical discretization errors and scale-modelling errors are both influenced by mesh resolution. A thorough review of the literature shows that SRS results for the different operating conditions of hydraulic turbines are still quite limited and that there is no consensus on the resolution requirement for these studies. The goal of this research is therefore to develop a reliable framework for the validation and verification of SRS studies, and LES studies in particular, so that they can be used to analyse flow phenomena in the draft tubes and runners of hydraulic turbines at off-design operating conditions. Several resolution criteria for LES analysis have been identified in the literature, and their applicability and sensitivity are examined. Two main test cases are considered in this research: turbulent channel flow and a sudden-expansion case. This study does not go further into hydraulic-turbine applications, but these will eventually benefit from the results of the current research. The results show that two-point autocorrelation is more sensitive to mesh resolution than the energy spectrum. In addition, for the sudden-expansion case, mesh resolution has a very large effect on the results, and so far we have not captured asymptotic convergence in the RMS of velocity fluctuations or in the two-point autocorrelation. This case, which represents complex flow behaviour, requires further mesh resolution studies.
    ABSTRACT The central aim of this research is the simulation of flows similar to those that occur inside hydraulic turbine draft tubes at off-design operating conditions. 
The importance of this application is due to the fact that hydroturbines often need to be operated over an extended range of operating conditions, including off-design conditions, since hydropower plays a significant role as a flexible source of energy supply to the electric network. This significance is due to the integration of non-dispatchable sources of energy such as solar and wind power. This range of operating conditions, however, makes the investigation of fluctuating stresses more crucial. Load fluctuations lead to instability, mechanical failure of the system and spontaneous power swings on the grid. Consequently, understanding and mitigating the unsteady operational behavior of hydro turbines is crucial. SRS approaches such as LES and DES have attracted growing interest over the past decade for understanding and mitigating the unsteady operational behavior of hydro turbines. This interest is due to the ability of these methods to resolve part of the turbulent flow. However, for some industrial flows where there is no adequate experimental data for a deep understanding of the flow physics, such as those at part load, deep part load and speed no-load operation of hydraulic turbines, the reliability of numerical simulations in terms of their grid dependency is still an open question. Verification studies in LES are also very challenging, since errors in the numerical discretization and in the subgrid-scale model are both influenced by grid resolution. A comprehensive examination of the literature shows that SRS studies of the different operating conditions of hydraulic turbines are still quite limited and that there is no consensus on the resolution requirement of SRS studies. Therefore, the goal of this research is to develop a reliable framework for validation and verification of SRS, especially LES, so that it can be applied to the investigation of flow phenomena in hydraulic turbine draft tubes and runners at their off-design operating conditions. 
Several resolution criteria for LES analysis have been identified in the literature, and their applicability and the level of insight they bring to our analysis are scrutinized. Two main test cases are considered in this research: turbulent channel flow and a case of sudden expansion. This study does not proceed to real applications and simulations in hydraulic turbines, although hydraulic turbines will eventually benefit from the results of the current research. The results show that two-point autocorrelation is more sensitive to mesh resolution than energy spectra. In addition, for the case of sudden expansion, the mesh resolution has a tremendous effect on the results and, so far, we have not captured asymptotic converging behaviour in the RMS of velocity fluctuations or in the two-point autocorrelation. This case, which represents complex flow behaviour, needs further mesh resolution studies
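For readers unfamiliar with the two-point autocorrelation used above as a resolution indicator, the following is a minimal sketch of how it might be computed. It assumes a one-dimensional, periodic line of velocity samples on a uniform grid; the thesis's actual post-processing is not described in the abstract, and the function name `two_point_autocorrelation` is hypothetical. The quantity is R(r) = <u'(x) u'(x + r)> / <u'^2>, where u' is the fluctuation about the mean.

```python
import math

def two_point_autocorrelation(u, max_lag):
    """Normalised two-point autocorrelation of velocity fluctuations.

    Assumes `u` holds samples on a uniform, periodic line of grid points
    (an assumption of this sketch); lags are measured in grid spacings.
    """
    n = len(u)
    mean = sum(u) / n
    fluct = [v - mean for v in u]          # u' = u - <u>
    var = sum(f * f for f in fluct) / n    # <u'^2>
    return [
        sum(fluct[i] * fluct[(i + r) % n] for i in range(n)) / (n * var)
        for r in range(max_lag + 1)
    ]

# Usage: a single sine wave over 64 periodic points.
signal = [math.sin(2 * math.pi * i / 64) for i in range(64)]
R = two_point_autocorrelation(signal, 32)
```

For this sine wave the correlation follows cos(2πr/64), so it starts at 1 at zero lag and passes through zero at a quarter of the wavelength. In a mesh study, one would compare such curves across successively refined grids.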

    Urbanisation-driven land degradation and socioeconomic challenges in peri-urban areas: Insights from Southern Europe

    Climate change and landscape transformation have led to rapid expansion of peri-urban areas globally, representing new ‘laboratories’ for the study of human–nature relationships aiming at land degradation management. This paper contributes to the debate on human-driven land degradation processes by highlighting how natural and socioeconomic forces trigger soil depletion and environmental degradation in peri-urban areas. The aim was to classify and synthesise the interactions of urbanisation-driven factors with direct or indirect, on-site or off-site, and short-term or century-scale impacts on land degradation, focussing on Southern Europe as a paradigmatic case to address this issue. Assuming complex and multifaceted interactions among influencing factors, a relevant contribution to land degradation was shown to derive from socioeconomic drivers, the most important of which were population growth and urban sprawl. Viewing peri-urban areas as socio-environmental systems adapting to intense socioeconomic transformations, these factors were identified as forming complex environmental ‘syndromes’ driven by urbanisation. Based on this classification, we suggested three key measures to support future land management in Southern European peri-urban areas

    The effects of solution treatment on the microstructure of the cast Ni-based IN100 superalloy

    In this research, the effects of partial, full and partial + full solution heat treatments, each followed by aging at 900 °C for 10 h, on the microstructure of the cast Ni-based IN100 superalloy were assessed. It was found that the alloy in the partial + full solution treated condition had the optimal combination of γ’ morphology, volume fraction and size. In this condition, the alloy possesses a cubic primary γ’ with an average size of 470 ± 10 nm and a 45% volume fraction. Discrete M23C6 and M6C carbides were formed at the grain boundaries, and the morphology of the MC carbide changed from cubic to spherical. In addition, the volume fraction of the γ’/γ eutectic phase dropped to half of its value in the as-cast alloy. During partial solution treatment followed by aging, discrete carbides were formed at the grain boundaries. This treatment without full solutioning was not an effective method of providing an optimal volume fraction and arrangement of γ’ and MC carbide morphology. Full solutioning alone changed the cubic morphology of the primary γ’ and the blocky MC carbides to a spherical shape

    Flow Control of Transonic Airfoils using Optimum Suction and Injection Parameters

    This paper presents the application of surface mass transfer optimization to the control of shock wave-boundary layer interaction at off-design conditions of a transonic aircraft wing. The suction or injection parameters include, for example, the position on the airfoil, the angle, the length of the hole and the rate of the injected or sucked flow. The optimization process is carried out using an efficient Genetic Algorithm (GA) method. The compressible viscous flow equations in Reynolds-averaged form are solved together with a two-equation k-epsilon turbulence model to accurately compute the objective function. Four different objective functions are introduced: maximum lift-to-drag ratio, minimum drag coefficient, maximum lift-to-drag ratio with no drag increment and minimum drag coefficient with no lift decrement. The effectiveness of each objective function is examined by comparing the optimum results in terms of the flow control parameters and flow characteristics

    An enhanced stochastic optimization in fracture network modelling conditional on seismic events

    This paper presents an approach to modelling fracture networks in hot dry rock geothermal reservoirs. A detailed understanding of the fracture network within a geothermal reservoir is critically important for assessments of reservoir potential and optimal production design. One important step in fracture network modelling is to estimate the fracture density and the fracture geometries, particularly the size and orientation of fractures. As fracture networks in these reservoirs can never be directly observed there is significant uncertainty about their true nature and the only feasible approach to modelling is a stochastic one. We propose a global optimization approach using simulated annealing which is an extension of our previous work. The fracture model consists of a number of individual fractures represented by ellipses passing through the micro-seismic points detected during the fracture stimulation process, i.e. the fracture model is conditioned on the seismic points. The distances of the seismic points from fitted fracture planes (ellipses) are, therefore, important in assessing the goodness-of-fit of the model. Our aims in the proposed approach are to formulate an appropriate objective function for the optimal fitting of a set of fracture planes to the micro-seismic data and to derive an efficient modification scheme to update the model parameters. The proposed objective function consists of three components: orthogonal projection distances of the seismic points from the nearest fitted fractures, the amount of fracturing (fitted fracture areas) and the volumes of the convex hull of the associated points of fitted fractures. The functions used in the model update scheme allow the model to achieve an acceptable fit to the points and to converge to acceptable fitted fracture sizes. These functions include two groups of proposals: one for updating fracture parameters and the other for determining the size of the fracture network. 
To increase the efficiency of the optimization, a spatial clustering approach, the Distance-Directional Transform, was developed to generate parameters for newly proposed fractures. A simulated dataset was used as an example to evaluate our approach and we compared the results to those derived using our previously published algorithm on a real dataset from the Habanero geothermal field in the Cooper Basin, South Australia. In a real application, such as the Habanero dataset, it is difficult to determine definitively which algorithm performs better due to the many uncertainties, but the number of association points, the number of final fractures and the error are three important factors that quantify the effectiveness of our algorithm. © 2014 Elsevier Ltd. S. Seifollahi, P.A. Dowd, C. X

    A stochastic model for the fracture network in the Habanero enhanced geothermal system

    GeoCat; 74874. Fracture Network Modelling (FNM) plays an important role in many areas where the characterization of discontinuities deep underground is required. Applications of FNM include, but are not limited to, hydrocarbon reservoir production, mineral extraction, tunnelling, underground storage or disposal of hazardous wastes and geothermal systems. One important step in FNM is to estimate the density of fractures and the geometries and properties of individual fractures, such as their size and orientation. Due to the lack of data, the tortuous nature of fractures and the great uncertainty involved in practice, the only feasible approach is stochastic modelling. This paper describes a general optimization approach to modelling the fracture network in a geothermal reservoir, conditioned on the seismic events detected several kilometres beneath the surface during the fracture stimulation process. Two key aspects of our method are the construction of an appropriate objective function and the derivation of an efficient updating scheme, which remain two challenging issues in most global optimization techniques. In our application, the objective function consists of two important components: the minimisation of the squared distances of the seismic points to the fracture model and the minimisation of the number of fractures or the amount of fracturing, which corresponds to the least consumption of fracturing energy. The model updating process includes several proposals for perturbing the parameters of individual fractures and for altering the size of the fracture network in order to reach a globally optimal solution. As a case study, the model is applied to Geodynamics’ Habanero reservoir in the Cooper Basin of South Australia. Seifollahi, S., Dowd, P-A and Xu,
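The two-component objective described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' formulation: fractures are simplified to infinite planes given by a centre point and a normal vector, each seismic point is associated with its nearest plane, and the trade-off weight `penalty` and the function names are hypothetical.

```python
import math

def point_plane_distance(p, centre, normal):
    """Orthogonal distance from point p to the plane through `centre`
    with normal vector `normal` (which need not be unit length)."""
    norm = math.sqrt(sum(n * n for n in normal))
    return abs(sum((pi - ci) * ni for pi, ci, ni in zip(p, centre, normal))) / norm

def objective(points, fractures, penalty=1.0):
    """Sum of squared distances to the nearest fracture plane, plus a
    penalty per fracture (the weight is an assumption of this sketch)."""
    fit = sum(
        min(point_plane_distance(p, c, n) for c, n in fractures) ** 2
        for p in points
    )
    return fit + penalty * len(fractures)

# Usage: three points on the plane z = 0 and two on the plane z = 1.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
one_plane = [((0, 0, 0), (0, 0, 1))]
two_planes = one_plane + [((0, 0, 1), (0, 0, 1))]
print(objective(points, one_plane, penalty=0.5))   # misfit 2.0 + 0.5
print(objective(points, two_planes, penalty=0.5))  # misfit 0.0 + 1.0
```

The penalty term is what stops the optimizer from fitting one fracture per point: adding a plane is only worthwhile when it reduces the misfit by more than its cost.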

    Globally convergent algorithms for solving unconstrained optimization problems

    New algorithms for solving unconstrained optimization problems are presented, based on the idea of combining two types of descent directions: the anti-gradient direction and either the Newton or quasi-Newton direction. Using the latter directions improves the convergence rate. Global and superlinear convergence properties of these algorithms are established. Numerical experiments on some unconstrained test problems are reported. The proposed algorithms are also compared experimentally with some existing similar methods. This comparison demonstrates the efficiency of the proposed combined methods
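One simple way to combine the two kinds of direction mentioned in this abstract is sketched below for a one-dimensional problem. The switching rule used here (take the Newton step when the second derivative is positive, otherwise fall back to the anti-gradient) and the Armijo backtracking line search are assumptions chosen for illustration; the abstract does not state how the paper combines the directions, and `combined_descent` is a hypothetical name.

```python
def combined_descent(f, df, d2f, x0, tol=1e-8, max_iter=100):
    """Minimise a 1-D function using Newton steps where curvature is
    positive and anti-gradient steps elsewhere (illustrative rule only)."""
    x = x0
    for _ in range(max_iter):
        g = df(x)
        if abs(g) < tol:
            break
        h = d2f(x)
        # Newton direction is a descent direction only when h > 0.
        d = -g / h if h > 0 else -g
        # Armijo backtracking: halve the step until sufficient decrease.
        t = 1.0
        for _ in range(50):
            if f(x + t * d) <= f(x) + 1e-4 * t * g * d:
                break
            t *= 0.5
        x = x + t * d
    return x

# Usage: f(x) = x^4 - 2x^2 has negative curvature near x = 0,
# so the first steps from x0 = 0.1 use the anti-gradient direction.
x_min = combined_descent(
    lambda x: x**4 - 2 * x**2,
    lambda x: 4 * x**3 - 4 * x,
    lambda x: 12 * x**2 - 4,
    x0=0.1,
)
```

Starting from 0.1, the iterates move away from the local maximum at 0 with anti-gradient steps and then switch to Newton steps once the curvature turns positive, settling at the minimiser x = 1.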