
    Peak annotation and data analysis software tools for mass spectrometry imaging

    Spatial metabolomics is the discipline that studies images of the distributions of low-molecular-weight chemical compounds (metabolites) on the surface of biological tissues to unveil interactions between molecules. Mass spectrometry imaging (MSI) is currently the principal technique for obtaining molecular imaging information for spatial metabolomics. MSI is a label-free molecular imaging technology that produces mass spectra preserving the spatial structures of tissue samples. This is achieved by ionizing small portions of a sample (a pixel) in a defined raster across its entire surface, which results in a collection of ion distribution images (registered as mass-to-charge ratios (m/z)) over the sample. This thesis aims to develop computational tools for peak annotation in MSI and to design workflows for the statistical and multivariate analysis of MSI data, including spatial segmentation. The work carried out in this thesis can be clearly separated into two parts. Firstly, the development of an isotope and adduct peak annotation tool suited to facilitate the identification of low-mass-range compounds: monoisotopic ions can now easily be found in MSI datasets thanks to the rMSIannotation software package. Secondly, the development of software tools for data analysis and spatial segmentation based on soft clustering for MSI data: in this thesis, we have developed tools and methodologies to search for significant ions (the rMSIKeyIon software package) and for the soft clustering of tissues (the fuzzy c-means algorithm).
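
    The soft-clustering side of this work rests on fuzzy c-means, which, unlike k-means, assigns each pixel a graded membership in every cluster. The thesis tooling is in R (rMSIannotation, rMSIKeyIon); the following is only a minimal NumPy sketch of the standard fuzzy c-means updates on a pixels-by-m/z intensity matrix, with all names and data illustrative:

        import numpy as np

        def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=100, tol=1e-5, seed=0):
            """Soft-cluster the rows of X. Returns cluster centroids and a
            membership matrix U whose entry [i, k] is the degree in [0, 1]
            to which sample i belongs to cluster k (rows sum to 1)."""
            rng = np.random.default_rng(seed)
            U = rng.random((X.shape[0], n_clusters))
            U /= U.sum(axis=1, keepdims=True)           # normalise memberships
            for _ in range(max_iter):
                Um = U ** m                             # "fuzzified" memberships
                centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]
                # distance of every sample to every centroid
                d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
                d = np.fmax(d, 1e-12)                   # guard against zero distance
                U_new = d ** (-2.0 / (m - 1.0))
                U_new /= U_new.sum(axis=1, keepdims=True)
                if np.abs(U_new - U).max() < tol:
                    return centroids, U_new
                U = U_new
            return centroids, U

        # Toy usage: 500 "pixels", 40 m/z intensity channels, 3 tissue regions.
        X = np.random.default_rng(1).random((500, 40))
        centroids, memberships = fuzzy_c_means(X, n_clusters=3)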

    Computational methods to predict and enhance decision-making with biomedical data.

    The proposed research applies machine learning techniques to healthcare applications. The core idea is to use intelligent techniques to find automatic methods for analyzing healthcare data. Different classification and feature extraction techniques are applied to various clinical datasets. The datasets include brain MR images, breathing curves from vessels around tumor cells over time, breathing curves extracted from patients with successful or rejected lung transplants, and records of lung cancer patients diagnosed in the US in 2004-2009 extracted from the SEER database. The novel idea in brain MR image segmentation is to develop a multi-scale technique to separate blood vessel tissue from similar tissues in the brain. By analyzing the vascularization of the cancer tissue over time and the behavior of the vessels (arteries and veins), a new feature extraction technique was developed and classification techniques were used to rank the vascularization of each tumor type. Lung transplantation is a critical surgery for which predicting the acceptance or rejection of the transplant would be very important. A review of classification techniques on the SEER database was conducted to analyze the survival rates of lung cancer patients, and the best feature vector for identifying the most similar patients was analyzed.

    Fuzzy Logic

    Fuzzy logic is becoming an essential method for solving problems in all domains. It has a tremendous impact on the design of autonomous intelligent systems. The purpose of this book is to introduce hybrid algorithms, techniques, and implementations of fuzzy logic. The book consists of thirteen chapters highlighting models and principles of fuzzy logic and issues concerning its techniques and implementations. The intended readers are engineers, researchers, and graduate students interested in fuzzy logic systems.

    Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection

    Software quality ensures that applications that are developed are failure free. Some modern systems are intricate due to the complexity of their information processes. Software fault prediction is an important quality assurance activity, since it is a mechanism that correctly predicts the defect proneness of modules and classifies modules, which saves resources, time and developers' effort. In this study, a model that selects relevant features that can be used in defect prediction was proposed. The literature was reviewed, and it revealed that process metrics are better predictors of defects in version control systems and are based on historical source code changes over time. These metrics are extracted from the source-code module and include, for example, the number of additions and deletions from the source code, the number of distinct committers and the number of modified lines. In this research, defect prediction was conducted using open source software (OSS) of software product lines (SPL), hence process metrics were chosen. Datasets that are used in defect prediction may contain non-significant and redundant attributes that may affect the accuracy of machine-learning algorithms. In order to improve the prediction accuracy of classification models, features that are significant in the defect prediction process are utilised. In machine learning, feature selection techniques are applied in the identification of the relevant data; feature selection is a pre-processing step that helps to reduce the dimensionality of data. Feature selection techniques include information-theoretic methods that are based on the entropy concept. This study experimented with the efficiency of feature selection techniques, and it was realised that software defect prediction using significant attributes improves the prediction accuracy. A novel MICFastCR model was developed, which uses the Maximal Information Coefficient (MIC) to select significant attributes and the Fast Correlation Based Filter (FCBF) to eliminate redundant attributes. Machine learning algorithms were then run to predict software defects. The MICFastCR model achieved the highest prediction accuracy as reported by various performance measures.
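
    The MICFastCR model itself combines MIC for relevance with FCBF for redundancy elimination. As a hedged illustration of the two-stage idea only, the sketch below substitutes symmetrical uncertainty (the measure FCBF is usually defined with) for MIC in the relevance step; features are assumed already discretised (e.g. binned process metrics), and all names are hypothetical:

        import numpy as np

        def entropy(x):
            """Shannon entropy of a discrete-valued array, in bits."""
            _, counts = np.unique(x, return_counts=True)
            p = counts / counts.sum()
            return -np.sum(p * np.log2(p))

        def symmetrical_uncertainty(x, y):
            """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), in [0, 1]."""
            hx, hy = entropy(x), entropy(y)
            xy = np.stack([np.asarray(x), np.asarray(y)], axis=1)
            _, counts = np.unique(xy, axis=0, return_counts=True)
            p = counts / counts.sum()
            hxy = -np.sum(p * np.log2(p))               # joint entropy H(x, y)
            mi = hx + hy - hxy                          # mutual information
            return 0.0 if hx + hy == 0 else 2.0 * mi / (hx + hy)

        def fcbf(X, y, delta=0.0):
            """Fast Correlation-Based Filter: rank features by SU with the
            class, then drop any feature that is more correlated with an
            already-selected feature than with the class (redundancy)."""
            su_class = np.array([symmetrical_uncertainty(X[:, j], y)
                                 for j in range(X.shape[1])])
            order = [j for j in np.argsort(-su_class) if su_class[j] > delta]
            selected = []
            while order:
                p = order.pop(0)                        # most relevant remaining
                selected.append(p)
                order = [q for q in order
                         if symmetrical_uncertainty(X[:, p], X[:, q]) < su_class[q]]
            return selected

        # Toy usage: 200 modules, 6 discretised process metrics, binary label.
        rng = np.random.default_rng(0)
        X = rng.integers(0, 4, size=(200, 6))
        y = (X[:, 0] + rng.integers(0, 2, size=200) > 2).astype(int)
        print(fcbf(X, y, delta=0.01))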

    Multimodel Approaches for Plasma Glucose Estimation in Continuous Glucose Monitoring. Development of New Calibration Algorithms

    Diabetes Mellitus (DM) embraces a group of metabolic diseases whose main characteristic is the presence of high glucose levels in blood. It is one of the diseases with the greatest social and health impact, both for its prevalence and for the consequences of the chronic complications it implies. One of the research lines to improve the quality of life of people with diabetes has a technical focus. It involves several lines of research, including the development and improvement of devices to estimate plasma glucose online: continuous glucose monitoring systems (CGMS), both invasive and non-invasive. These devices estimate plasma glucose from sensor measurements in compartments alternative to blood. Currently available commercial CGMS are minimally invasive and offer an estimation of plasma glucose from measurements in the interstitial fluid. CGMS are a key component of the technical approach to building the artificial pancreas, which aims to close the loop in combination with an insulin pump. Yet the accuracy of current CGMS is still poor, and this may partly depend on the low performance of the implemented Calibration Algorithm (CA). In addition, the sensor-to-patient sensitivity differs between patients and also over time for the same patient. It is clear, then, that the development of new efficient calibration algorithms for CGMS is an interesting and challenging problem. The indirect measurement of plasma glucose through interstitial glucose is a main confounder of CGMS accuracy. Many components take part in the glucose transport dynamics; indeed, physiology suggests the existence of different local behaviours in the glucose transport process. For this reason, local modelling techniques may be the best option for the structure of the desired CA: similar input samples are represented by the same local model, and the integration of all the local models, considering the input regions where they are valid, yields the final model of the whole data set.
    Barceló Rico, F. (2012). Multimodel Approaches for Plasma Glucose Estimation in Continuous Glucose Monitoring. Development of New Calibration Algorithms [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/17173
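
    The abstract argues for local modelling: similar inputs are captured by the same local model, and the local models are blended according to the input regions where they are valid. Below is a minimal sketch of that structure in the spirit of Takagi-Sugeno fuzzy models; it is not the thesis's actual calibration algorithm, and all parameters and names are invented for illustration:

        import numpy as np

        class LocalLinearBlend:
            """Blend of local linear models, each valid in a region of input
            space described by a Gaussian membership around a cluster centre.
            A minimal stand-in for a multimodel calibration structure."""

            def __init__(self, centres, coefs, intercepts, width=1.0):
                self.centres = np.asarray(centres)        # (k, d) cluster centres
                self.coefs = np.asarray(coefs)            # (k, d) local slopes
                self.intercepts = np.asarray(intercepts)  # (k,) local offsets
                self.width = width

            def predict(self, x):
                x = np.asarray(x, dtype=float)
                # membership of x in each local region (normalised Gaussian)
                d2 = ((self.centres - x) ** 2).sum(axis=1)
                w = np.exp(-d2 / (2 * self.width ** 2))
                w /= w.sum()
                # each local model's estimate, combined by validity weights
                local = self.coefs @ x + self.intercepts
                return float(w @ local)

        # Toy usage: two local models over a 1-D interstitial sensor signal.
        model = LocalLinearBlend(centres=[[80.0], [180.0]],
                                 coefs=[[1.1], [0.9]],
                                 intercepts=[5.0, 20.0], width=40.0)
        plasma_estimate = model.predict([120.0])  # sensor reading -> plasma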

    Big Earth Data and Machine Learning for Sustainable and Resilient Agriculture

    Big streams of Earth images from satellites or other platforms (e.g., drones and mobile phones) are becoming increasingly available at low or no cost and with enhanced spatial and temporal resolution. This thesis recognizes the unprecedented opportunities offered by the high quality and open access Earth observation data of our times and introduces novel machine learning and big data methods to properly exploit them towards developing applications for sustainable and resilient agriculture. The thesis addresses three distinct thematic areas: the monitoring of the Common Agricultural Policy (CAP), the monitoring of food security, and applications for smart and resilient agriculture. The methodological innovations across the three thematic areas address the following issues: i) the processing of big Earth Observation (EO) data, ii) the scarcity of annotated data for machine learning model training, and iii) the gap between machine learning outputs and actionable advice. This thesis demonstrates how big data technologies such as data cubes, distributed learning, linked open data and semantic enrichment can be used to exploit the data deluge and extract knowledge to address real user needs. Furthermore, this thesis argues for the importance of semi-supervised and unsupervised machine learning models that circumvent the ever-present challenge of scarce annotations and thus allow for model generalization in space and time. Specifically, it is shown how merely a few ground truth data are needed to generate high quality crop type maps and crop phenology estimations. Finally, this thesis argues that there is considerable distance in value between model inferences and decision making in real-world scenarios, and thereby showcases the power of causal and interpretable machine learning in bridging this gap.

    Human Factors in Agile Software Development

    Through our four years of experiments on students' Scrum-based agile software development (ASD) processes, we have gained a deep understanding of the human factors of agile methodology. We designed an agile project management tool, the HASE collaboration development platform, to support more than 400 students self-organized into 80 teams practicing ASD. In this thesis, based on our experiments, simulations and analysis, we contribute a series of solutions and insights, including 1) a Goal Net based method to enhance goal and requirement management for the ASD process, 2) a novel Simple Multi-Agent Real-Time (SMART) approach to enhance intelligent task allocation for the ASD process, 3) a Fuzzy Cognitive Maps (FCMs) based method to enhance emotion and morale management for the ASD process, 4) the first large-scale, in-depth empirical insights on human factors in the ASD process, which have not yet been well studied by existing research, and 5) the first identification of the ASD process as a human-computation system that exploits human effort to perform tasks that computers are not good at solving, while computers in turn assist human decision making in the ASD process.
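
    Of the contributions above, the Fuzzy Cognitive Maps method lends itself to a compact illustration. A standard FCM iterates concept activations through a signed, weighted causal graph with a squashing function; the sketch below shows only that generic update, with a hypothetical three-concept map, not the thesis's actual model:

        import numpy as np

        def fcm_infer(W, state, n_steps=20):
            """Iterate a Fuzzy Cognitive Map: concept activations are
            repeatedly pushed through the weighted causal graph W and a
            sigmoid squashing function until they (typically) settle."""
            state = np.asarray(state, dtype=float)
            for _ in range(n_steps):
                state = 1.0 / (1.0 + np.exp(-(W @ state)))  # sigmoid update
            return state

        # Hypothetical 3-concept map: workload, stress, morale.
        #            workload stress  morale
        W = np.array([[ 0.0,   0.0,   0.0 ],   # nothing feeds workload here
                      [ 0.8,   0.0,  -0.2 ],   # workload raises stress; morale relieves it
                      [-0.5,  -0.7,   0.0 ]])  # workload and stress lower morale
        print(fcm_infer(W, state=[1.0, 0.2, 0.8]))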

    Demand Response in Smart Grids

    The Special Issue “Demand Response in Smart Grids” includes 11 papers on a variety of topics. The success of this Special Issue demonstrates the relevance of demand response programs and events in the operation of power and energy systems, at both the distribution level and the wider power system level. This reprint addresses the design, implementation, and operation of demand response programs, with a focus on methods and techniques to achieve an optimized operation, as well as on the electricity consumer.

    Geographical information modelling for land resource survey

    The increasing popularity of geographical information systems (GIS) has at least three major implications for land resource survey. Firstly, GIS allows alternative and richer representations of spatial phenomena than is possible with the traditional paper map. Secondly, digital technology has improved the accessibility of ancillary data, such as digital elevation models and remotely sensed imagery, and the possibilities of incorporating these into target database production. Thirdly, owing to the greater distance between data producers and consumers, there is a greater need for uncertainty analysis. However, partly due to disciplinary gaps, the introduction of GIS has not resulted in a thorough adjustment of traditional survey methods. Against this background, the overall objective of this study was to explore and demonstrate the utility of new concepts and tools within the context of pedological and agronomical land surveys. To this end, research was conducted on the interface between five fields of study: geographic information theory, land resource survey, remote sensing, statistics and fuzzy set theory. A demonstration site was chosen around the village of Alora in southern Spain.

    Fuzzy set theory provides a formalism to deal with classes that are partly indistinct as a result of vague class intensions. Fuzzy sets are characterised by membership functions that assign real numbers from the interval [0, 1] to elements, thereby indicating the grade of membership in that set. When fuzzy membership functions are used to classify attribute data linked to geometrical elements, the presence of spatial dependence among these elements ensures that they form spatially contiguous regions. These can be interpreted as objects with indeterminate boundaries, or fuzzy objects. Fuzzy set theory thus adds to the conventional conceptual data models that assume either discrete spatial objects or continuous fields.

    This thesis includes two case studies that demonstrate the use of fuzzy set theory in the acquisition and querying of geographical information. The first study explored the use of fuzzy c-means clustering of attribute data derived from a digital elevation model to represent transition zones in a soil-landscape model. Validity evaluation of the resulting terrain descriptions was based on the coefficient of determination of regressing topsoil clay data on membership grades. Vaguely bounded regions were more closely related to the observed variation of clay content than the crisply bounded units used in a conventional soil survey.

    The second case study involved the use of fuzzy set theory in querying uncertain geographical data. It explains the differences between fuzziness and stochastic uncertainty on the basis of an example query concerning loss of forest and ease of access. Relationships between probabilities and fuzzy set memberships were established using a linguistic probability qualifier (high probability) and the expectation of a membership function defined on a stochastic travel time. Fuzzy query processing was compared with crisp processing. The fuzzy query response contained more information because, unlike the crisp response, it indicated the degree to which individual locations matched the vague selection criteria.
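
    A minimal sketch of the difference between crisp and fuzzy query processing described above, with invented membership grades for four locations and the minimum operator as fuzzy AND:

        import numpy as np

        # Membership grades per location for two vague criteria (illustrative
        # values): "high probability of forest loss" and "easy access".
        forest_loss = np.array([0.9, 0.4, 0.7, 0.1])
        easy_access = np.array([0.6, 0.8, 0.3, 0.9])

        # Crisp processing forces a cut-off and returns a yes/no per location...
        crisp = (forest_loss >= 0.5) & (easy_access >= 0.5)

        # ...whereas fuzzy AND (here the minimum operator) keeps the degree to
        # which each location matches the vague selection criteria.
        fuzzy = np.minimum(forest_loss, easy_access)

        print(crisp)   # [ True False False False]
        print(fuzzy)   # [0.6 0.4 0.3 0.1]
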
    In a land resource survey, data acquisition typically involves collecting a small sample of precisely measured primary data as well as a larger or even exhaustive sample of related secondary data. Soil surveyors often rely on soil-landscape relationships and image interpretation to enable efficient mapping of soil properties. Yet, they generally fail to communicate the knowledge and methods employed in deriving map units and statements about their content. In this thesis, a methodological framework is formulated and demonstrated that takes advantage of GIS to interactively formalise soil-landscape knowledge using stepwise image interpretation and inductive learning of soil-landscape relationships. It uses topology to record potential part-of links between hierarchically nested terrain objects corresponding to distinct soil formation regimes. These relationships can be applied in similar areas to facilitate image interpretation by restricting the possible lower-level objects. GIS visualisation tools can be used to create images (e.g. perspective views) illustrating the landscape configuration of interpreted terrain objects. The framework is expected to support different methods for analysing and describing soil variation in relation to a terrain description, including those requiring alternative conceptual data models. In this thesis, though, it is only demonstrated with the discrete object model.

    Satellite remote sensing has become an important tool in land cover mapping, providing an attractive supplement to relatively inefficient ground surveys. A common approach to extracting land cover data from remotely sensed imagery is probabilistic classification of multispectral data. Additional information can be incorporated into such a classification, for example by translating it into Bayesian prior probabilities for each land cover type. This is particularly advantageous in the case of spectral overlap among target classes, i.e. when unequivocal class assignment based on spectral data alone is impossible.

    This thesis demonstrates a procedure to iteratively estimate regional prior class probabilities pertaining to the areas resulting from image stratification. The method thus allows the incorporation of additional information into the classification process without requiring known prior class probabilities. The demonstration project involved Landsat TM imagery from 1984 and 1995, with image stratification based on a geological map of the study area. Overall classification accuracy improved from 76% to 90% (1984) and from 64% to 69% (1995) when employing iteratively estimated prior probabilities.
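
    The iterative estimation of priors can be sketched as an EM-style fixed point: compute per-pixel posteriors under the current priors, then reset the priors to the mean posterior over the stratum. The following is a generic illustration under the assumption that class-conditional likelihoods are available, not the thesis's exact procedure:

        import numpy as np

        def iterate_priors(likelihoods, n_iter=50, tol=1e-6):
            """Iteratively estimate prior class probabilities for one stratum.

            likelihoods: (n_pixels, n_classes) array of p(x | class) values.
            Starts from uniform priors; each round computes the per-pixel
            posteriors p(class | x) under the current priors, then resets
            the priors to the mean posterior over the stratum."""
            n_classes = likelihoods.shape[1]
            priors = np.full(n_classes, 1.0 / n_classes)
            for _ in range(n_iter):
                joint = likelihoods * priors                 # p(x | c) * pi_c
                posteriors = joint / joint.sum(axis=1, keepdims=True)
                new_priors = posteriors.mean(axis=0)
                if np.abs(new_priors - priors).max() < tol:
                    return new_priors
                priors = new_priors
            return priors

        # Toy usage: 1000 pixels, 3 classes, random likelihood values.
        L = np.random.default_rng(0).random((1000, 3))
        print(iterate_priors(L))
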
    The fact that any landscape description is a model based on a limited sample of measured target attribute data implies that it is never completely certain. The presence of error or inaccuracy in the data contributes significantly to such uncertainty. Usually, the accuracy of land survey datasets is indicated using global indices (e.g. the overall accuracies given above). Error modelling, on the other hand, allows an indication of the spatial distribution of possible map inaccuracies to be given. This study explored two approaches to error modelling, demonstrated within the context of land cover analysis using remotely sensed imagery.

    The first approach involves the use of local class probabilities conditional on the pixels' spectral data. These probabilities are intermediate results of probabilistic image classification and indicate the magnitude and distribution of classification uncertainty. A case study demonstrated the implications of such uncertainty for change detection by comparing independently classified images. A major shortcoming of this approach is that it implicitly assumes data in neighbouring pixels to be independent. Moreover, it does not make full use of the available reference data, as it ignores their spatial component: it considers neither the data locations nor the spatial dependence models that can be derived from the reference data.

    The assumption of independent pixels obviously impedes proper assessment of spatial uncertainty, such as joint uncertainty about the land cover class at several pixels taken together. Therefore, the second approach was based on geostatistical methods, which exploit spatial dependence rather than ignoring it. It is demonstrated how the above conditional probabilities can be updated by conditioning on sampled reference data at their locations. Stochastic simulation was used to generate a set of 500 equally probable maps, from which uncertainties regarding the spatial extent of contiguous olive orchards could be inferred.
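
    Once a set of equally probable maps is available, joint uncertainty can be read off the ensemble. The sketch below shows only that inference step, with randomly generated label maps standing in for the output of a conditional geostatistical simulation:

        import numpy as np

        # realisations: 500 equally probable class-label maps, as produced by
        # a conditional stochastic simulation (faked at random here purely to
        # illustrate the shapes involved); 1 marks "olive orchard".
        rng = np.random.default_rng(0)
        realisations = rng.integers(0, 2, size=(500, 50, 50))

        # Per-pixel probability of olive orchard across the ensemble.
        p_olive = (realisations == 1).mean(axis=0)

        # Joint, map-wide uncertainty: the distribution of the total orchard
        # area, which per-pixel probabilities alone cannot provide.
        areas = (realisations == 1).sum(axis=(1, 2))
        print(np.percentile(areas, [5, 50, 95]))
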
    Future challenges include studies on other quality aspects of land survey datasets. The present research was limited to uncertainty analysis, so that, for example, data precision and fitness for use were not addressed. Other potential extensions to this work concern full inclusion of the third spatial dimension and modelling of temporal aspects.