1,633 research outputs found

    Artificial neural networks for diagnosis and survival prediction in colon cancer

    ANNs are nonlinear regression computational devices that have been used for over 45 years for classification and survival prediction in several biomedical systems, including colon cancer. This article describes the theory behind the three-layer feed-forward artificial neural network with error backpropagation, which is widely used in biomedical fields, and a methodological approach to its application in cancer research, as exemplified by colon cancer. A review of the literature shows that applications of these networks have improved the accuracy of colon cancer classification and survival prediction compared to other statistical or clinicopathological methods. Care, however, must be exercised when designing, using and publishing biomedical results employing machine-learning devices such as ANNs in the worldwide literature, in order to enhance confidence in the quality and reliability of reported data.
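    The three-layer feed-forward architecture with error backpropagation mentioned in the abstract can be illustrated with a short NumPy sketch. This is not the authors' implementation; the layer sizes, learning rate, sigmoid activations and synthetic data are illustrative assumptions for a generic binary outcome.

```python
# Minimal sketch: a three-layer feed-forward network trained with error
# backpropagation for a binary outcome. All sizes and data are assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                       # 8 hypothetical clinicopathological inputs
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float).reshape(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hidden, n_out = 8, 5, 1
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_out)); b2 = np.zeros(n_out)
lr = 0.1

for epoch in range(2000):
    # forward pass through the hidden and output layers
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # backpropagation of the output error (cross-entropy with sigmoid output)
    delta2 = p - y                                  # error at the output layer
    delta1 = (delta2 @ W2.T) * h * (1 - h)          # error propagated to the hidden layer
    W2 -= lr * h.T @ delta2 / len(X)
    b2 -= lr * delta2.mean(axis=0)
    W1 -= lr * X.T @ delta1 / len(X)
    b1 -= lr * delta1.mean(axis=0)

print("training accuracy:", ((p > 0.5) == y).mean())
```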

    A new toolbox to distinguish the sources of spatial memory error


    Estimating an NBA player's impact on his team's chances of winning

    Traditional NBA player evaluation metrics are based on scoring differential or some pace-adjusted linear combination of box score statistics like points, rebounds, assists, etc. These measures treat performances with the outcome of the game still in question (e.g. tie score with five minutes left) in exactly the same way as they treat performances with the outcome virtually decided (e.g. when one team leads by 30 points with one minute left). Because they ignore the context in which players perform, these measures can result in misleading estimates of how players help their teams win. We instead use a win probability framework for evaluating the impact NBA players have on their teams' chances of winning. We propose a Bayesian linear regression model to estimate an individual player's impact, after controlling for the other players on the court. We introduce several posterior summaries to derive rank-orderings of players within their team and across the league. This allows us to identify highly paid players with low impact relative to their teammates, as well as players whose high impact is not captured by existing metrics. (Comment: To appear in the Journal of Quantitative Analysis of Sports.)
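    The general idea of regressing a win-probability outcome on the players who are on the court can be sketched as follows. This is a simplified conjugate Gaussian version of a Bayesian linear regression, not the paper's exact model; the stints, players and hyperparameters are simulated for illustration.

```python
# Sketch: Bayesian linear regression of the change in home win probability
# during a stint on signed on-court indicators (+1 home player, -1 away player).
import numpy as np

rng = np.random.default_rng(1)
n_players, n_stints = 50, 2000

# Design matrix: each row is a stint; +1 for the 5 home players, -1 for the 5 away players.
X = np.zeros((n_stints, n_players))
for i in range(n_stints):
    home = rng.choice(n_players, size=5, replace=False)
    away = rng.choice(np.setdiff1d(np.arange(n_players), home), size=5, replace=False)
    X[i, home] = 1.0
    X[i, away] = -1.0

true_impact = rng.normal(scale=0.02, size=n_players)          # latent per-player effect
y = X @ true_impact + rng.normal(scale=0.05, size=n_stints)   # change in win probability per stint

# Conjugate posterior for beta ~ N(0, tau^2 I), y | beta ~ N(X beta, sigma^2 I)
tau2, sigma2 = 0.02**2, 0.05**2
post_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(n_players) / tau2)
post_mean = post_cov @ X.T @ y / sigma2

# Posterior summary: rank players by estimated impact on win probability.
ranking = np.argsort(post_mean)[::-1]
print("top five players by posterior mean impact:", ranking[:5])
```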

    Uncertainty modelling in power spectrum estimation of environmental processes

    For efficient reliability analysis of buildings and structures, robust load models are required in stochastic dynamics, which can be estimated in particular from environmental processes such as earthquakes or wind loads. To determine the response behaviour of a dynamic system under such loads, the power spectral density (PSD) function is a widely used tool for identifying the frequency components and corresponding amplitudes of environmental processes. Since the real data records required for this purpose are often subject to aleatory and epistemic uncertainties, and the PSD estimation process itself can induce further uncertainties, a rigorous quantification of these is essential; otherwise a highly inaccurate load model could be generated which may yield misleading simulation results. A system behaviour that is actually catastrophic can thus be shifted into an acceptable range, classifying the system as safe even though it is exposed to a high risk of damage or collapse. To address these issues, alternative load models are proposed using probabilistic and non-deterministic models that are able to efficiently account for these uncertainties and to model the loadings accordingly. Various methods are used in the generation of these load models, selected in particular according to the characteristics of the data and the number of available records.

    If multiple data records are available, reliable statistical information can be extracted from a set of similar PSD functions that differ, for instance, only slightly in shape and peak frequency. Based on these statistics, a PSD function model is derived utilising subjective probabilities to capture the epistemic uncertainties and represent this information effectively. The spectral densities are characterised as random variables instead of discrete values, and thus the PSD function itself represents a non-stationary random process comprising a range of possible valid PSD functions for a given data set.

    If only a limited number of data records is available, such reliable statistical information cannot be derived. Therefore, an interval-based approach is proposed that determines only an upper and a lower bound and does not rely on any distribution within these bounds. A set of discrete-valued PSD functions is transformed into an interval-valued PSD function by optimising the weights of pre-derived basis functions from a Radial Basis Function Network such that they compose an upper and a lower bound that encompasses the data set. In this way, a range of possible values and system responses is identified rather than discrete values, which is able to quantify the epistemic uncertainties.

    When generating such a load model from real data records, the problem can arise that the individual records exhibit a high spectral variance in the frequency domain and therefore differ too much from each other, although they appear similar in the time domain. A load model derived from these data may not cover the entire spectral range and is therefore not representative. The data are therefore grouped according to their similarity using the Bhattacharyya distance and the k-means algorithm, which may generate two or more load models from the entire data set. These can be applied separately to the structure under investigation, leading to more accurate simulation results.
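    As an illustration of the grouping step, the following sketch estimates each record's PSD with Welch's method and clusters the normalised spectra; using the square root of each normalised PSD makes Euclidean k-means consistent with the Bhattacharyya coefficient (via the Hellinger distance). The record length, Welch parameters and number of clusters are assumptions for illustration only, not the thesis' settings.

```python
# Sketch: group records by spectral similarity of their Welch PSD estimates.
import numpy as np
from scipy.signal import welch
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
fs, n_samples, n_records = 100.0, 4096, 20

# Synthetic records: two families of narrow-band processes with different peak frequencies.
t = np.arange(n_samples) / fs
records = []
for i in range(n_records):
    f_peak = 5.0 if i < n_records // 2 else 12.0
    records.append(np.sin(2 * np.pi * f_peak * t + rng.uniform(0, 2 * np.pi))
                   + 0.5 * rng.normal(size=n_samples))

# Welch PSD estimates, normalised so each spectrum sums to one.
psds = np.array([welch(x, fs=fs, nperseg=512)[1] for x in records])
p = psds / psds.sum(axis=1, keepdims=True)

def bhattacharyya(pi, pj):
    # Bhattacharyya distance between two normalised spectra
    return -np.log(np.sum(np.sqrt(pi * pj)))

# k-means on sqrt(p): Euclidean distance here equals the Hellinger distance,
# a monotone function of the Bhattacharyya coefficient.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(np.sqrt(p))
print("cluster labels:", labels)
print("example Bhattacharyya distance:", bhattacharyya(p[0], p[-1]))
```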
    This approach can also be used to estimate the spectral similarity of individual data sets in the frequency domain, which is particularly relevant for the load models mentioned above. If the uncertainties are modelled directly in the time signal, it can be challenging to transform them efficiently into the frequency domain. Such a signal may consist only of reliable bounds within which the actual signal lies. A method is presented that automatically propagates this interval uncertainty through the discrete Fourier transform, obtaining the exact bounds on the Fourier amplitude and an estimate of the PSD function. The method allows such an interval signal to be propagated without making assumptions about the dependence and distribution of the error over the time steps.

    These novel representations of load models are able to quantify epistemic uncertainties inherent in real data records and induced by the PSD estimation process. The strengths and advantages of these approaches in practice are demonstrated by means of several numerical examples from the field of stochastic dynamics.
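    The interval-signal idea can be approximated with a short sketch: because the real and imaginary parts of each DFT coefficient are linear in the samples, their exact interval bounds follow from the signs of the cosine and sine weights, and combining them yields a conservative outer bound on the Fourier amplitude. This is a simplified illustration, not the thesis' exact-bound algorithm.

```python
# Sketch: propagate an interval signal [x_lo, x_hi] through the DFT and
# bound the Fourier amplitude (outer bound only).
import numpy as np

def interval_dft_amplitude_bounds(x_lo, x_hi):
    n = len(x_lo)
    k = np.arange(n).reshape(-1, 1)
    t = np.arange(n).reshape(1, -1)
    c = np.cos(2 * np.pi * k * t / n)      # weights of the real part
    s = -np.sin(2 * np.pi * k * t / n)     # weights of the imaginary part

    def linear_bounds(w):
        # exact bounds of sum_j w_j * x_j for x_j in [x_lo_j, x_hi_j]
        lo = np.where(w >= 0, w * x_lo, w * x_hi).sum(axis=1)
        hi = np.where(w >= 0, w * x_hi, w * x_lo).sum(axis=1)
        return lo, hi

    re_lo, re_hi = linear_bounds(c)
    im_lo, im_hi = linear_bounds(s)
    # outer bound on |X_k| from the rectangle [re_lo, re_hi] x [im_lo, im_hi]
    amp_hi = np.hypot(np.maximum(np.abs(re_lo), np.abs(re_hi)),
                      np.maximum(np.abs(im_lo), np.abs(im_hi)))
    re_min = np.where((re_lo <= 0) & (re_hi >= 0), 0.0, np.minimum(np.abs(re_lo), np.abs(re_hi)))
    im_min = np.where((im_lo <= 0) & (im_hi >= 0), 0.0, np.minimum(np.abs(im_lo), np.abs(im_hi)))
    return np.hypot(re_min, im_min), amp_hi

t = np.linspace(0, 1, 64, endpoint=False)
signal = np.sin(2 * np.pi * 4 * t)
lo, hi = interval_dft_amplitude_bounds(signal - 0.1, signal + 0.1)  # +/- 0.1 measurement interval
print(hi[:8] - lo[:8])
```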

    ISIPTA'07: Proceedings of the Fifth International Symposium on Imprecise Probability: Theories and Applications


    Cost-sensitive ensemble learning: a unifying framework

    Over the years, a plethora of cost-sensitive methods have been proposed for learning on data when different types of misclassification errors incur different costs. Our contribution is a unifying framework that provides a comprehensive and insightful overview of cost-sensitive ensemble methods, pinpointing their differences and similarities via a fine-grained categorization. Our framework contains natural extensions and generalisations of ideas across methods, be it AdaBoost, Bagging or Random Forest, and as a result not only yields all methods known to date but also some not previously considered.
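    One common cost-sensitive construction covered by such frameworks can be sketched as follows: fit a probabilistic ensemble, then predict the class that minimises the expected misclassification cost. The cost matrix and data below are illustrative assumptions, and this is only one instance of the families of methods such a taxonomy covers, not a method taken from the paper.

```python
# Sketch: minimum-expected-cost decisions on top of a probabilistic ensemble.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# cost[i, j] = cost of predicting class j when the true class is i
# (here a false negative is assumed ten times as costly as a false positive)
cost = np.array([[0.0, 1.0],
                 [10.0, 0.0]])

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)                   # shape (n_samples, n_classes)

expected_cost = proba @ cost                      # expected cost of each candidate prediction
y_pred = expected_cost.argmin(axis=1)             # minimum-expected-cost decision

print("accuracy-driven class counts: ", np.bincount(clf.predict(X_te)))
print("cost-sensitive class counts:  ", np.bincount(y_pred))
```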

    A Risk-Based IoT Decision-Making Framework Based on Literature Review with Human Activity Recognition Case Studies

    The Internet of Things (IoT) is a key and growing technology for many critical real-life applications, where it can be used to improve decision making. The existence of several sources of uncertainty in the IoT infrastructure, however, can lead decision makers into taking inappropriate actions. The present work focuses on proposing a risk-based IoT decision-making framework in order to effectively manage uncertainties in addition to integrating domain knowledge in the decision-making process. A structured literature review of the risks and sources of uncertainty in IoT decision-making systems is the basis for the development of the framework and the Human Activity Recognition (HAR) case studies. More specifically, as one of the main targeted challenges, the potential sources of uncertainty in an IoT framework, at different levels of abstraction, are first reviewed and then summarized. The modules included in the framework are detailed, with the main focus given to a novel risk-based analytics module, where an ensemble-based data analytic approach, called Calibrated Random Forest (CRF), is proposed to extract useful information while quantifying and managing the uncertainty associated with predictions by using confidence scores. Its output is subsequently integrated with domain knowledge-based action rules to perform decision making in a cost-sensitive and rational manner. The proposed CRF method is first evaluated and demonstrated on a HAR scenario in a Smart Home environment in case study I, and is further evaluated and illustrated with a remote health monitoring scenario for a diabetes use case in case study II. The experimental results indicate that, using the framework, raw sensor data can be converted into meaningful actions despite several sources of uncertainty. The comparison of the proposed framework to existing approaches highlights the key metrics that make decision making more rational and transparent.
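    In the spirit of the framework (though not the paper's Calibrated Random Forest itself), the following sketch calibrates a random forest's probability outputs and maps the resulting confidence scores to domain actions through simple threshold rules; the HAR task, thresholds and actions are invented for illustration.

```python
# Sketch: calibrated confidence scores feeding knowledge-based action rules.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Pretend these are windowed sensor features for a "critical event vs. normal activity" HAR task.
X, y = make_classification(n_samples=2000, n_features=12, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = RandomForestClassifier(n_estimators=300, random_state=0)
model = CalibratedClassifierCV(base, method="isotonic", cv=5).fit(X_tr, y_tr)
confidence = model.predict_proba(X_te)[:, 1]       # calibrated probability of the critical class

def action_rule(p):
    # Hypothetical domain-knowledge action rules keyed on the confidence score.
    if p >= 0.8:
        return "raise alert"
    if p >= 0.4:
        return "request confirmation / more sensing"
    return "no action"

actions = [action_rule(p) for p in confidence[:10]]
print(list(zip(np.round(confidence[:10], 2), actions)))
```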