1,816 research outputs found

    Text mining and natural language processing for the early stages of space mission design

    Final thesis submitted December 2021; degree awarded in 2022. A considerable amount of data related to space mission design has accumulated since artificial satellites first ventured into space in the 1950s. This data has become an overwhelming volume of information, creating a significant knowledge-reuse bottleneck at the early stages of space mission design. Meanwhile, virtual assistants, text mining and Natural Language Processing (NLP) techniques have become pervasive in our daily lives. The work presented in this thesis is one of the first attempts to bridge the gap between the worlds of space systems engineering and text mining. Several novel models are developed and implemented here, targeting the structuring of accumulated data through an ontology, as well as tasks commonly performed by systems engineers such as requirement management and heritage analysis. A first collection of documents related to space systems is gathered for the training of these methods. Ultimately, this work aims to pave the way towards the development of a Design Engineering Assistant (DEA) for the early stages of space mission design. It is also hoped that this work will actively contribute to the integration of text mining and NLP methods in the field of space mission design, enhancing current design processes.
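
    As a hedged illustration of the heritage-analysis and requirement-management tasks mentioned above, the sketch below retrieves past design documents relevant to a new requirement via TF-IDF cosine similarity; the corpus, query, and use of scikit-learn are illustrative assumptions, not the thesis's actual pipeline.

```python
# Minimal sketch (not the thesis's pipeline): rank past mission documents
# by relevance to a new requirement using TF-IDF cosine similarity.
# The corpus and query strings below are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "The thermal control subsystem shall maintain battery temperature between 0 and 40 C.",
    "Star tracker heritage from the Mars Express attitude determination subsystem.",
    "The downlink budget assumes an X-band transmitter with 65 W RF output.",
]
query = ["battery thermal limits for LEO operations"]

vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(corpus)
query_vector = vectorizer.transform(query)

# Rank past documents by similarity to the new requirement.
scores = cosine_similarity(query_vector, doc_vectors).ravel()
for score, text in sorted(zip(scores, corpus), reverse=True):
    print(f"{score:.3f}  {text}")
```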

    A novel combination of Case-Based Reasoning and Multi-Criteria Decision Making approaches to radiotherapy dose planning

    In this thesis, a set of novel approaches has been developed by integrating Case-Based Reasoning (CBR) and Multi-Criteria Decision Making (MCDM) techniques. Its purpose is to design a support system that assists oncologists with decision making about dose planning for radiotherapy treatment, with a focus on radiotherapy for prostate cancer. CBR, an artificial intelligence approach, is a general paradigm for reasoning from past experience: it retrieves previous cases similar to a new case and exploits the successful past solutions to suggest a solution for the new case. The case pool used in this research is a dataset of features and details of successfully treated patients at Nottingham University Hospital. In a typical run of simple CBR for prostate cancer radiotherapy, a new case is selected, the case most similar to it (based on the features available in the dataset) is retrieved, and that case's solution is prescribed to the new case. However, this approach has a number of deficiencies. First, in a real-life scenario the medical team considers multiple factors rather than just the similarity between two cases, and the most similar case does not always provide the most appropriate solution. Thus, in this thesis, the cases with high similarity to a new case are evaluated with the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), which takes multiple criteria besides similarity into account when prescribing a final solution. Moreover, the obtained dose plans are optimised through a Goal Programming mathematical model to improve the results. By incorporating oncologists' experience of violating the conventionally available dose limits, a system was devised to manage the trade-off between treatment risk for sensitive organs and the actions necessary to effectively eradicate cancer cells. Additionally, the success rate of the treatment, namely the probability of being cancer-free after two years, plays a vital role in the efficiency of the prescribed solutions. To consider the success rate, as well as the uncertainty of human judgment about the values of different radiotherapy features, Data Envelopment Analysis (DEA) based on grey numbers was used to assess the efficiency of different treatment plans in an input-output framework. To deal with DEA's limitations on the number of inputs and outputs, an approach for Factor Analysis based on Principal Components was presented to utilise the grey numbers. To improve the CBR base of the system, Grey Relational Analysis and Gaussian-distance-based CBR were applied, along with feature weighting through a Genetic Algorithm, to better handle the non-linearity in the problem features and their high number. Finally, the efficiency of each system was validated through a leave-one-out strategy on the real dataset. The results demonstrated the efficiency of the proposed approaches and the capability of the system to assist the medical planning team. Furthermore, the integrated approaches developed within this thesis can also be applied to real-life problems in domains other than healthcare, such as supply chain management, manufacturing, business success prediction and performance evaluation.
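
    A minimal sketch of the TOPSIS step may help: candidate retrieved cases are ranked by closeness to an ideal solution across several criteria. The criteria (similarity, past success rate, organ-at-risk dose), weights, and values below are hypothetical, not taken from the Nottingham dataset.

```python
# Minimal TOPSIS sketch over hypothetical retrieved cases.
import numpy as np

# Rows: candidate past cases; columns: similarity, success rate, OAR dose (Gy).
matrix = np.array([
    [0.95, 0.80, 18.0],
    [0.90, 0.92, 15.0],
    [0.85, 0.88, 22.0],
])
weights = np.array([0.4, 0.4, 0.2])
benefit = np.array([True, True, False])  # lower organ-at-risk dose is better

# Vector-normalise, weight, then measure distances to ideal/anti-ideal points.
norm = matrix / np.linalg.norm(matrix, axis=0)
weighted = norm * weights
ideal = np.where(benefit, weighted.max(axis=0), weighted.min(axis=0))
anti = np.where(benefit, weighted.min(axis=0), weighted.max(axis=0))
d_ideal = np.linalg.norm(weighted - ideal, axis=1)
d_anti = np.linalg.norm(weighted - anti, axis=1)
closeness = d_anti / (d_ideal + d_anti)
print("TOPSIS closeness:", closeness)  # highest value = recommended case
```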

    A Knowledge Enriched Computational Model to Support Lifecycle Activities of Computational Models in Smart Manufacturing

    Due to the need to support lifecycle activities of computational models in Smart Manufacturing (SM), a Knowledge Enriched Computational Model (KECM) is proposed in this dissertation to capture and integrate domain knowledge with standardized computational models. The KECM captures domain knowledge in information models, physics-based models, and rationales. To support model development in a distributed environment, the KECM can be used as the medium for formal information sharing between model developers; a case study demonstrates its use in supporting the construction of a Bayesian Network model. To support the deployment of computational models in SM systems, the KECM can be used for data integration between computational models and SM systems; a case study shows the deployment of a Constraint Programming optimization model into a Business To Manufacturing Markup Language (B2MML)-based system. Where multiple computational models need to be deployed together, the KECM can support their combination; a case study shows the combination of an Agent-based model and a Decision Tree model using the KECM. To support model retrieval, a semantics-based method is suggested in this dissertation. As an example, a dispatching-rule model retrieval problem has been addressed with a semantics-based approach, which has been verified and demonstrates good capability in using the KECM to retrieve computational models.
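
    The semantics-based model retrieval mentioned above can be pictured with a toy sketch: each model is annotated with concept tags (a stand-in for the KECM's much richer information models), and a query is matched by tag overlap. The registry, tags, and Jaccard scoring below are illustrative assumptions, not the dissertation's actual method.

```python
# Toy semantics-based retrieval: rank registered models by Jaccard overlap
# between their concept tags and a query's tags. All names are hypothetical.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

registry = {
    "edd_dispatching_rule": {"scheduling", "dispatching", "due-date", "job-shop"},
    "cp_lot_sizing":        {"optimization", "constraint-programming", "lot-sizing"},
    "bn_quality_model":     {"bayesian-network", "quality", "root-cause"},
}

query = {"dispatching", "job-shop", "scheduling"}
ranked = sorted(registry, key=lambda m: jaccard(registry[m], query), reverse=True)
for name in ranked:
    print(f"{jaccard(registry[name], query):.2f}  {name}")
```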

    Adapting image processing and clustering methods to productive efficiency analysis and benchmarking: A cross-disciplinary approach

    This dissertation explores interdisciplinary applications of computational methods in quantitative economics. In particular, it focuses on problems in productive efficiency analysis and benchmarking that are hardly approachable or solvable with conventional methods. In productive efficiency analysis, null or zero efficiency estimates are often produced because the skewness or kurtosis of the estimated inefficiency distribution contradicts the distributional assumption on the inefficiency term. This thesis uses the deconvolution technique, traditionally employed in image processing for noise removal, to develop a fully non-parametric method for efficiency estimation. Publications 1 and 2 are devoted to this topic, focusing on the cross-sectional and panel cases, respectively. Monte Carlo simulations and empirical applications to Finnish electricity distribution network data and Finnish banking data show that the Richardson-Lucy blind deconvolution method is insensitive to distributional assumptions and robust to data noise levels and heteroscedasticity in efficiency estimation. In benchmarking, which can be the next step after productive efficiency analysis, the 'best practice' target may not operate under the same operational environment as the DMU under study. This renders the benchmarks impractical to follow and hinders managers from making correct decisions about the performance improvement of a DMU. This dissertation proposes a clustering-based benchmarking framework in Publication 3. The empirical study on the Finnish electricity distribution network reveals that the novelty of the proposed framework lies not only in its consideration of differences in the operational environment among DMUs, but also in its extreme flexibility. A comparison of different combinations of clustering and efficiency estimation techniques was conducted using computational simulations and empirical applications to Finnish electricity distribution network data, based on which Publication 4 specifies an efficient combination for benchmarking in energy regulation. This dissertation endeavours to solve problems in quantitative economics using interdisciplinary approaches; the methods developed benefit the field, and the way the problems are approached opens a new perspective.
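
    To make the borrowed image-processing idea concrete, here is a minimal 1-D Richardson-Lucy sketch: recovering an underlying density from a noise-blurred observation. It is a non-blind variant for brevity (the thesis uses blind deconvolution), and the synthetic exponential-style density and Gaussian kernel are illustrative assumptions only.

```python
# Minimal 1-D Richardson-Lucy deconvolution sketch (non-blind variant).
import numpy as np

def richardson_lucy_1d(observed, psf, n_iter=100, eps=1e-12):
    """Iteratively deconvolve `observed` with known kernel `psf` (both 1-D)."""
    estimate = np.full_like(observed, observed.mean())
    psf_mirror = psf[::-1]
    for _ in range(n_iter):
        blurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / (blurred + eps)
        estimate *= np.convolve(ratio, psf_mirror, mode="same")
    return estimate

# Synthetic example: an exponential "inefficiency" density blurred by noise.
x = np.linspace(0, 5, 200)
true_density = np.exp(-x)                       # stand-in inefficiency density
kernel = np.exp(-0.5 * ((x - 2.5) / 0.3) ** 2)  # centred Gaussian noise kernel
kernel /= kernel.sum()
observed = np.convolve(true_density, kernel, mode="same")

recovered = richardson_lucy_1d(observed, kernel)
```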

    Management of data quality when integrating data with known provenance

    Abstract unavailable; please refer to the PDF.

    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its large public popularity establish a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. Beyond helping readers understand each chapter deeply, the two books present useful hints and strategies for solving the problems discussed. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining.

    Text Clumping for Technical Intelligence


    Methodological review of multicriteria optimization techniques: applications in water resources

    Multi-criteria decision analysis (MCDA) is an umbrella approach that has been applied to a wide range of natural resource management situations. This report has two purposes. First, it aims to provide an overview of advanced multicriteria approaches, methods and tools. The review seeks to lay out the nature of the models and their inherent strengths and limitations. Their applicability in supporting real-life decision-making processes is analysed in relation to requirements imposed by organizationally decentralized and economically specific spatial and temporal frameworks. Models are categorized according to different classification schemes and are reviewed by describing their general characteristics, approaches, and fundamental properties. The necessity of carefully structuring decision problems is discussed with regard to planning, staging and control aspects within the broader agricultural context, and in water management in particular. Special emphasis is given to the importance of manipulating decision elements by means of hierarchical structuring and clustering. The review goes beyond traditional MCDA techniques and describes new modelling approaches. The second purpose is to describe new MCDA paradigms aimed at addressing the inherent complexity of managing water ecosystems, particularly with respect to multiple criteria integrated with biophysical models, multiple stakeholders, and lack of information. Comments on, and critical analysis of, the limitations of traditional models point out the need for, and propose a call to, a new way of thinking about MCDA as applied to water and natural resources management planning. These new perspectives do not undermine the value of traditional methods; rather, they point to a shift in emphasis from methods for problem solving to methods for problem structuring. The literature review shows successful integrations of watershed management optimization models that efficiently screen a broad range of technical, economic, and policy management options within a watershed system framework and select the optimal combination of management strategies and associated water allocations for designing a sustainable watershed management plan at least cost. Papers show applications of watershed management models that integrate both natural and human elements of a watershed system, including the management of ground and surface water sources, water treatment and distribution systems, human demands, wastewater treatment and collection systems, water reuse facilities, non-potable water distribution infrastructure, aquifer storage and recharge facilities, storm water, and land use.
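
    As a pocket illustration of the MCDA family the report surveys, the sketch below ranks hypothetical watershed management options with a simple weighted-sum model; the options, criteria, weights, and scores are invented for illustration and do not come from the report.

```python
# Minimal weighted-sum MCDA sketch over hypothetical watershed options.
import numpy as np

options = ["water reuse facility", "aquifer recharge", "demand management"]
# Columns: cost, reliability, ecological impact; all scored in [0, 1].
scores = np.array([
    [0.7, 0.8, 0.4],
    [0.5, 0.6, 0.3],
    [0.3, 0.5, 0.2],
])
weights = np.array([0.5, 0.3, 0.2])
cost_like = np.array([True, False, True])  # cost and impact: lower is better

# Flip cost-like criteria so every column reads "higher is better", then sum.
normalised = np.where(cost_like, 1 - scores, scores)
totals = normalised @ weights
for name, total in sorted(zip(options, totals), key=lambda p: -p[1]):
    print(f"{total:.3f}  {name}")
```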

    Challenging the exploration and exploitation dichotomy: towards theory building in innovation management

    The conceptual dichotomy between exploration and exploitation, importantly highlighted in March's (1991) seminal paper, has been widely employed to study innovation management processes and resource allocation decisions in organisations. Despite its extensive usage, the validity of this dichotomy has not been subjected to adequate theoretical scrutiny and empirical support. This thesis therefore provides a critical examination of the origins and consequences of exploration and exploitation, and questions the dichotomy especially as it pertains to innovation management. It challenges the taken-for-granted assumption that these two concepts refer to distinct and observable decision-making processes and concludes that this assumption is largely unwarranted. A systematic literature review of the use of this dichotomy in the context of innovation management confirmed that, although studies have proposed related notions such as ambidexterity as a way to overcome the supposed trade-off between exploration and exploitation, there has been no attempt hitherto to question the validity of the dichotomy itself. Also, little empirical evidence was found to suggest that the understanding of managing innovation can be enhanced through reliance on this dichotomy. Thus, it is argued that the employment of this dichotomy in practices for managing innovation has not been justified and should be investigated directly through empirical evidence. To investigate exploration and exploitation both as performance criteria and as internal processes, a mixed-method design was adopted, using data envelopment analysis (DEA) as the quantitative method and a focus group supplemented by interviews as the qualitative method. Findings from DEA indicated that exploration and exploitation can be used as criteria for performance evaluation in innovation. However, findings from the qualitative part of the study suggested that, in innovation management practice, exploration and exploitation are not viewed as separate internal processes; hence, this distinction does not feature in decision-making during innovation processes. This means that a classification based on exploration and exploitation is not used for the appraisal of activities or projects in managing innovation. It is therefore concluded that the exploration-exploitation dichotomy is not valid in innovation management practice and its application in theorising innovation should be reconsidered; studies of innovation management should not unquestioningly rely on this dichotomy, because it does not reflect organisational reality. Consequently, this study contributes to the innovation management literature by pointing to alternative directions, such as 'problem-solving', for theorising the processes of innovation management in future studies.
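
    Since DEA serves as the quantitative method here, a minimal input-oriented CCR sketch may be useful. The DMU data (stand-ins for innovation inputs and exploration/exploitation outputs) are hypothetical, and the thesis's actual DEA formulation may differ.

```python
# Minimal input-oriented CCR DEA sketch solved as a linear program.
import numpy as np
from scipy.optimize import linprog

# Rows = DMUs; X = inputs (e.g. R&D spend, headcount), Y = outputs
# (e.g. an "exploration" measure and an "exploitation" measure).
X = np.array([[4.0, 3.0], [7.0, 5.0], [5.0, 6.0]])
Y = np.array([[2.0, 5.0], [3.0, 7.0], [4.0, 6.0]])
n = X.shape[0]

def ccr_efficiency(k: int) -> float:
    # Decision vector: [theta, lambda_1..lambda_n]; minimise theta.
    c = np.r_[1.0, np.zeros(n)]
    # Inputs:  sum_j lambda_j * x_ij <= theta * x_ik
    A_in = np.hstack([-X[k][:, None], X.T])
    b_in = np.zeros(X.shape[1])
    # Outputs: sum_j lambda_j * y_rj >= y_rk
    A_out = np.hstack([np.zeros((Y.shape[1], 1)), -Y.T])
    b_out = -Y[k]
    res = linprog(c, A_ub=np.vstack([A_in, A_out]), b_ub=np.r_[b_in, b_out],
                  bounds=[(None, None)] + [(0, None)] * n, method="highs")
    return res.fun  # theta = 1.0 means DMU k lies on the efficient frontier

for k in range(n):
    print(f"DMU {k}: efficiency = {ccr_efficiency(k):.3f}")
```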

    Developing techniques for enhancing comprehensibility of controlled medical terminologies

    A controlled medical terminology (CMT) is a collection of concepts (or terms) that are used in the medical domain. Typically, a CMT also contains attributes of those concepts and/or relationships between those concepts. Electronic CMTs are extremely useful and important for communication between and integration of independent information systems in healthcare, because data in this area is highly fragmented. A single query in this area might involve several databases, e.g., a clinical database, a pharmacy database, a radiology database, and a lab test database. Unfortunately, the extensive sizes of CMTs, often containing tens of thousands of concepts and hundreds of thousands of relationships between pairs of those concepts, impose steep learning curves for new users of such CMTs. In this dissertation, we address the problem of helping a user to orient himself in an existing large CMT. In order to help a user comprehend a large, complex CMT, we need to provide abstract views of the CMT. However, at this time, no tools exist for providing a user with such abstract views. One reason for the lack of tools is the absence of a good theory on how to partition an overwhelming CMT into manageable pieces. In this dissertation, we try to overcome the described problem by using a three-pronged approach. (1) We use the power of Object-Oriented Databases to design a schema extraction process for large, complex CMTs. The schema resulting from this process provides an excellent, compact representation of the CMT. (2) We develop a theory and a methodology for partitioning a large OODB schema, modeled as a graph, into small meaningful units. The methodology relies on the interaction between a human and a computer, making optimal use of the human's semantic knowledge and the computer's speed. Furthermore, the theory and methodology developed for the schema-level partitioning are also adapted to the object level of a CMT. (3) We use purely structural similarities for partitioning CMTs, eliminating the need for a human expert in the partitioning methodology mentioned above. Two large medical terminologies are used as our test beds: the Medical Entities Dictionary (MED) and the Unified Medical Language System (UMLS), which itself contains a number of terminologies.
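
    Approach (3), partitioning by purely structural similarity, can be sketched in miniature: concepts in a toy terminology are grouped by their exact set of attribute names. The concepts and attributes below are hypothetical and far simpler than the MED or UMLS.

```python
# Toy structural partitioning: group concepts sharing identical attribute sets.
from collections import defaultdict

concepts = {
    "Penicillin":   frozenset({"dose-form", "allergy-class"}),
    "Amoxicillin":  frozenset({"dose-form", "allergy-class"}),
    "Chest X-ray":  frozenset({"body-site", "modality"}),
    "CT Abdomen":   frozenset({"body-site", "modality"}),
    "Serum Sodium": frozenset({"specimen", "units"}),
}

partitions = defaultdict(list)
for name, attrs in concepts.items():
    partitions[attrs].append(name)  # same attribute set -> same partition

for attrs, members in partitions.items():
    print(sorted(attrs), "->", members)
```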