
    Constructive Approximation and Learning by Greedy Algorithms

    This thesis develops several kernel-based greedy algorithms for different machine learning problems and analyzes their theoretical and empirical properties. Greedy approaches have been used extensively in the past for tackling problems in combinatorial optimization where finding even a feasible solution can be computationally hard (i.e., not known to be solvable in polynomial time). A key feature of greedy algorithms is that a solution is constructed recursively from the smallest constituent parts. In each step of the constructive process, a component is added to the partial solution from the previous step and, thus, the size of the optimization problem is reduced. The selected components are given by optimization problems that are simpler and easier to solve than the original problem. As such schemes are typically fast at constructing a solution, they can be very effective on complex optimization problems where finding an optimal or good solution has a high computational cost. Moreover, greedy solutions are rather intuitive, and the schemes themselves are simple to design and easy to implement. There is a large class of problems for which greedy schemes generate an optimal solution or a good approximation of the optimum. In the first part of the thesis, we develop two deterministic greedy algorithms for optimization problems in which a solution is given by a set of functions mapping an instance space to the space of reals. The first of the two approaches facilitates data understanding through interactive visualization by providing means for experts to incorporate their domain knowledge into otherwise static kernel principal component analysis. This is achieved by greedily constructing embedding directions that maximize the variance at data points left unexplained by the previously constructed embedding directions, while adhering to specified domain-knowledge constraints.
The second deterministic greedy approach is a supervised feature construction method capable of addressing the problem of kernel choice. The goal of the approach is to construct a feature representation for which a set of linear hypotheses is of sufficient capacity: large enough to contain a satisfactory solution to the considered problem and small enough to allow good generalization from a small number of training examples. The approach mimics functional gradient descent and constructs features by fitting squared-error residuals. We show that the constructive process is consistent and provide conditions under which it converges to the optimal solution. In the second part of the thesis, we investigate two problems for which deterministic greedy schemes can fail to find an optimal solution or a good approximation of the optimum. This happens as a result of making a sequence of choices that take into account only the immediate reward without considering the consequences for future decisions. To address this shortcoming of deterministic greedy schemes, we propose two efficient randomized greedy algorithms that are guaranteed to find effective solutions to the corresponding problems. In the first of the two approaches, we provide a means to scale kernel methods to problems with millions of instances. An approach frequently used in practice for this type of problem is the Nyström method for low-rank approximation of kernel matrices. A crucial step in this method is the choice of landmarks, which determine the quality of the approximation. We tackle this problem with a randomized greedy algorithm based on the K-means++ cluster seeding scheme and provide a theoretical and empirical study of its effectiveness. In the second problem for which a deterministic strategy can fail to find a good solution, the goal is to find a set of objects from a structured space that are likely to exhibit an unknown target property.
This discrete optimization problem is of significant interest to cyclic discovery processes such as de novo drug design. We propose to address it with an adaptive Metropolis–Hastings approach that samples candidates from the posterior distribution of structures conditioned on them having the target property. The proposed constructive scheme defines a consistent random process, and our empirical evaluation demonstrates its effectiveness across several different application domains.
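The landmark-selection idea described above can be illustrated with a minimal NumPy sketch: landmarks are drawn with K-means++-style probabilities (each point picked proportionally to its squared distance from the nearest landmark chosen so far) and then used to build a Nyström low-rank factor of an RBF kernel matrix. This is a generic textbook illustration, not the thesis's exact algorithm; the function names, the RBF kernel choice, and the eigenvalue floor are assumptions made for the sketch.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel values between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kmeanspp_landmarks(X, m, rng=None):
    # K-means++-style seeding: each new landmark is sampled with
    # probability proportional to its squared distance to the
    # nearest already-chosen landmark.
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    idx = [rng.integers(n)]
    d2 = ((X - X[idx[0]]) ** 2).sum(1)
    for _ in range(m - 1):
        j = rng.choice(n, p=d2 / d2.sum())
        idx.append(j)
        d2 = np.minimum(d2, ((X - X[j]) ** 2).sum(1))
    return X[idx]

def nystrom(X, landmarks, gamma=1.0):
    # Low-rank factor G with K ~= G @ G.T via the Nystrom method.
    C = rbf_kernel(X, landmarks, gamma)          # n x m cross-kernel
    W = rbf_kernel(landmarks, landmarks, gamma)  # m x m landmark kernel
    vals, vecs = np.linalg.eigh(W)
    vals = np.maximum(vals, 1e-12)               # guard tiny eigenvalues
    W_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return C @ W_inv_sqrt                        # n x m factor
```

When the landmarks are all n points, the Nyström factor reproduces the kernel matrix exactly; with m « n landmarks it yields a rank-m approximation whose quality depends on how well the landmarks cover the data, which is exactly what the distance-weighted seeding targets.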

    Advances in knowledge discovery and data mining Part II

    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II

    Semi-supervised and unsupervised kernel-based novelty detection with application to remote sensing images

    The main challenge of new information technologies is to retrieve intelligible information from the large volume of digital data gathered every day. Among the variety of existing data sources, the satellites continuously observing the surface of the Earth are key to the monitoring of our environment. The new generation of satellite sensors is tremendously increasing the possibilities for applications, but also the need for efficient processing methodologies that extract information relevant to users' needs in an automatic or semi-automatic way. This is where machine learning comes into play, transforming complex data into simplified products such as maps of land-cover changes or classes by learning from data examples annotated by experts. These annotations, also called labels, may be difficult or costly to obtain since they are established on the basis of ground surveys. As an example, it is extremely difficult to access a region recently flooded or affected by wildfires. In such situations, the detection of changes has to be done with annotations from unaffected regions only. In a similar way, it is difficult to have information on all the land-cover classes present in an image while being interested in the detection of a single class of interest. These challenging situations are called novelty detection or one-class classification in machine learning. Here, the learning phase has to rely on a very limited set of annotations, but can exploit the large set of unlabeled pixels available in the images. This setting, called semi-supervised learning, allows the detection to be significantly improved. In this thesis we address the development of methods for novelty detection and one-class classification with little or no labeled information.
The proposed methodologies build upon kernel methods, which provide a principled yet flexible framework for learning from data with potentially non-linear feature relations. The thesis is divided into two parts, each making a different assumption on the data structure and both addressing unsupervised (automatic) and semi-supervised (semi-automatic) learning settings. The first part assumes the data to be formed by arbitrarily shaped and overlapping clusters and studies the use of kernel machines, such as Support Vector Machines or Gaussian Processes. An emphasis is put on robustness to noise and outliers and on the automatic retrieval of parameters. Experiments on multi-temporal multispectral images for change detection are carried out using only information from unchanged regions, or none at all. The second part assumes high-dimensional data to lie on multiple low-dimensional structures, called manifolds. We propose a method seeking a sparse and low-rank representation of the data mapped into a non-linear feature space. This representation allows us to build a graph, which is cut into several groups using spectral clustering. For the semi-supervised case, where a few labels of one class of interest are available, we study several approaches incorporating the graph information. The class labels can be propagated on the graph, used to constrain spectral clustering, or used to train a one-class classifier regularized by the given graph. Experiments on the unsupervised and one-class classification of hyperspectral images demonstrate the effectiveness of the proposed approaches.
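As a rough illustration of the graph-cutting step, the sketch below builds an RBF affinity graph and splits it into two groups using the sign of the second eigenvector (the Fiedler vector) of the normalized Laplacian. This is a textbook spectral-clustering sketch, not the thesis's sparse and low-rank representation method; the function names and parameter values are assumptions.

```python
import numpy as np

def rbf_affinity(X, gamma=1.0):
    # Pairwise RBF affinities; zero the diagonal to remove self-loops.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-gamma * d2)
    np.fill_diagonal(A, 0.0)
    return A

def spectral_bipartition(X, gamma=1.0):
    # Cut the affinity graph into two groups using the sign of the
    # Fiedler vector of the symmetric normalized Laplacian
    # L = I - D^{-1/2} A D^{-1/2}.
    A = rbf_affinity(X, gamma)
    d = A.sum(1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(X)) - D_inv_sqrt @ A @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    return (vecs[:, 1] >= 0).astype(int) # 0/1 cluster labels
```

For more than two groups one would keep the first k eigenvectors and cluster their rows (e.g., with k-means), and the semi-supervised variants described above would additionally inject the available class labels into this graph structure.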

    Advanced Computational Methods for Oncological Image Analysis

    Cancer is the second most common cause of death worldwide and encompasses highly variable clinical and biological scenarios. Some of the current clinical challenges are (i) early diagnosis of the disease and (ii) precision medicine, which allows for treatments targeted to specific clinical cases. The ultimate goal is to optimize the clinical workflow by combining accurate diagnosis with the most suitable therapies. Toward this, large-scale machine learning research can define associations among clinical, imaging, and multi-omics studies, making it possible to provide reliable diagnostic and prognostic biomarkers for precision oncology. Such reliable computer-assisted methods (i.e., artificial intelligence), together with clinicians' unique knowledge, can be used to properly handle typical issues in evaluation/quantification procedures (i.e., operator dependence and time-consuming tasks). These technical advances can significantly improve result repeatability in disease diagnosis and guide toward appropriate cancer care. Indeed, the need to apply machine learning and computational intelligence techniques has steadily increased to effectively perform image processing operations, such as segmentation, co-registration, classification, and dimensionality reduction, and multi-omics data integration.

    Radioactive Waste

    The safe management of nuclear and radioactive wastes is a subject that has recently received considerable recognition due to the huge volume of accumulated wastes and the increased public awareness of the hazards of these wastes. This book aims to cover the practice and research efforts that are currently conducted to deal with the technical difficulties in different radioactive waste management activities, and to introduce the non-technical factors that can affect management practice. The collective contribution of esteemed international experts covers the science and technology of different management activities. The authors introduce the management system, illustrate how old management practices and radioactive accidents can affect the environment, and summarize the knowledge gained from current management practice and the results of research efforts on using innovative technologies in both pre-disposal and disposal activities.

    Markov models of biomolecular systems


    Evolution of microstructure and nanoscale chemistry of Zircaloy-2-type alloys during nuclear reactor operation

    Zirconium alloys are used as fuel cladding tubes in nuclear reactors. During reactor operation, these alloys are degraded by corrosion, hydrogen pickup (HPU), and radiation-induced growth, processes influenced by the alloying elements. The alloy Zircaloy-2, which contains Sn, Fe, Cr, Ni, and O as alloying elements, is commonly used in boiling water reactors (BWRs). This thesis deals with atom probe tomography (APT) investigations of Zircaloy-2 and a similar model alloy, Alloy 2, before and after up to nine years of BWR operation. Alloy 2 contains more Fe and Cr and exhibits lower corrosion and HPU. Less than 10 wt ppm each of Fe, Cr, and Ni was observed in the matrix of as-produced Zircaloy-2 and Alloy 2 of commercial heat treatment, a consequence of very low solubility and the formation of second-phase particles (SPPs). After reactor exposure, these elements were found in nanoscale clusters located at radiation-induced 〈a〉-type dislocation loops. The amount of Fe, Cr, and Ni in clusters increased with increasing fluence. There were two main types of clusters: spheroidal Fe–Cr clusters and disc-shaped Fe–Ni clusters. On average, there were no large differences in clusters before and after the acceleration in degradation, only small increases in cluster number density, cluster size, and cluster Cr content. 〈c〉-component loops decorated with Sn, Fe, and Ni were observed after, but not before, the acceleration in degradation. Sn formed a network-like structure. No differences in cluster and matrix chemistry between Zircaloy-2 and Alloy 2 were observed after reactor exposure, indicating that the improved properties of Alloy 2 are related to the additional Fe and Cr being located in SPPs. It was possible to analyse the materials using voltage-pulsed APT. Voltage pulsing was needed to reliably determine Fe–Ni cluster composition and shape. Fe–Cr clusters were also observed using laser-pulsed APT.
Focused-ion-beam (FIB) preparation of APT specimens at room temperature resulted in a phase transformation from α-Zr to γ-hydride, whereas cryo-FIB preparation did not. The average number of ions detected before specimen fracture was higher for γ-hydride specimens. There were no significant differences in the clustering of Fe, Cr, and Ni between α-Zr and γ-hydride specimens.

    Proceedings of the 1st WSEAS International Conference on "Environmental and Geological Science and Engineering (EG'08)"

    This book contains the proceedings of the 1st WSEAS International Conference on Environmental and Geological Science and Engineering (EG'08), which was held in Malta, September 11-13, 2008. The conference aims to disseminate the latest research and applications in Renewable Energy, Mineral Resources, Natural Hazards and Risks, Environmental Impact Assessment, Urban and Regional Planning Issues, Remote Sensing and GIS, and other relevant topics and applications. The friendliness and openness of the WSEAS conferences adds to their ability to grow by constantly attracting young researchers. The WSEAS conferences attract a large number of well-established and leading researchers in various areas of Science and Engineering, as you can see from http://www.wseas.org/reports. Your feedback encourages the society to go ahead, as you can see at http://www.worldses.org/feedback.htm. The contents of this book are also published in the CD-ROM proceedings of the conference. Both will be sent to the WSEAS collaborating indices after the conference: www.worldses.org/indexes. In addition, the papers of this book are permanently available to the whole scientific community via the WSEAS E-Library. Expanded and enhanced versions of papers published in these conference proceedings will also be considered for possible publication in one of the WSEAS journals that participate in the major international scientific indices (Elsevier, Scopus, EI, ACM, Compendex, INSPEC, CSA, ...; see: www.worldses.org/indexes); these papers must be of high quality (breakthrough work), and a new round of very strict review will follow. (No additional fee will be required for the publication of the extended version in a journal.) WSEAS also collaborates with several other international publishers, and the papers of this volume could be further improved, extended, and enhanced for possible additional evaluation in one of the editions of these international publishers.
Finally, we cordially thank all the people of WSEAS for their efforts to maintain the high scientific level of the conferences, proceedings, and journals.