8,346 research outputs found
Data mining as a tool for environmental scientists
Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and more recently Bayesian Decision Networks have found application in environmental modelling while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogenous
A review of multi-instance learning assumptions
Multi-instance (MI) learning is a variant of inductive machine learning, where each learning example contains a bag of instances instead of a single feature vector. The term commonly refers to the supervised setting, where each bag is associated with a label. This type of representation is a natural fit for a number of real-world learning scenarios, including drug activity prediction and image classification, hence many MI learning algorithms have been proposed. Any MI learning method must relate instances to bag-level class labels, but many types of relationships between instances and class labels are possible. Although all early work in MI learning assumes a specific MI concept class known to be appropriate for a drug activity prediction domain; this ‘standard MI assumption’ is not guaranteed to hold in other domains. Much of the recent work in MI learning has concentrated on a relaxed view of the MI problem, where the standard MI assumption is dropped, and alternative assumptions are considered instead. However, often it is not clearly stated what particular assumption is used and how it relates to other assumptions that have been proposed. In this paper, we aim to clarify the use of alternative MI assumptions by reviewing the work done in this area
Model-Based Environmental Visual Perception for Humanoid Robots
The visual perception of a robot should answer two fundamental questions: What? and Where? In order to properly and efficiently reply to these questions, it is essential to establish a bidirectional coupling between the external stimuli and the internal representations. This coupling links the physical world with the inner abstraction models by sensor transformation, recognition, matching and optimization algorithms. The objective of this PhD is to establish this sensor-model coupling
Forecasting loss given default with the nearest neighbor algorithm
Mestrado em Matemática FinanceiraNos últimos anos, a previsão do Loss Given Default (LGD) tem sido um dos principais desafios no âmbito da gestão do risco de crédito. Investigadores académicos e profissionais da indústria bancária têm-se dedicado ao estudo deste parâmetro de risco em particular. Apesar de todas as diferentes abordagens já desenvolvidas e publicadas até hoje, a previsão do LGD continua a ser um tema de estudo académico intenso e sobre o qual ainda não existe um "consenso" metodológico na banca. Este trabalho apresenta uma abordagem alternativa para a previsão do LGD baseada na utilização de um simples, mas intuitivo, algoritmo de Machine Learning: o algoritmo nearest neighbor. De forma a avaliar a perfomance desta técnica não paramétrica na previsão do LGD, são utilizadas determinadas métricas de avaliação que permitem a comparação com um modelo paramétrico mais convencional e com a utilização do LGD médio histórico.In recent years, forecasting Loss Given Default (LGD) has been a major challenge in the field of credit risk management. Practitioners and academic researchers have focused on the study of this particular risk dimension. Despite all different approaches that have been developed and published so far, it remains an area of intense academic study and with lack of consensual solutions in the banking industry. This paper presents an LGD forecasting approach based on a simple and intuitive Machine Learning algorithm: the nearest neighbor algorithm. In order to evaluate the performance of this non parametric technique, some proper evaluation metrics are used to compare it to a more ?classical? parametric model and to the use of historical recovery rates to predict LGD
A survey on pre-processing techniques: relevant issues in the context of environmental data mining
One of the important issues related with all types of data analysis, either statistical data analysis, machine learning, data mining, data science or whatever form of data-driven modeling, is data quality. The more complex the reality to be analyzed is, the higher the risk of getting low quality data. Unfortunately real data often contain noise, uncertainty, errors, redundancies or even irrelevant information. Useless models will be obtained when built over incorrect or incomplete data. As a consequence, the quality of decisions made over these models, also depends on data quality. This is why pre-processing is one of the most critical steps of data analysis in any of its forms. However, pre-processing has not been properly systematized yet, and little research is focused on this. In this paper a survey on most popular pre-processing steps required in environmental data analysis is presented, together with a proposal to systematize it. Rather than providing technical details on specific pre-processing techniques, the paper focus on providing general ideas to a non-expert user, who, after reading them, can decide which one is the more suitable technique required to solve his/her problem.Peer ReviewedPostprint (author's final draft
- …