
    Neural Gas based classification of Globular Clusters

    In both scientific and real-life problems, classification is a typical case of an extremely complex task in data-driven scenarios, especially if approached with traditional techniques. Supervised and unsupervised Machine Learning paradigms, providing self-adaptive and semi-automatic methods, are able to navigate large volumes of data characterized by a multi-dimensional parameter space, and thus represent an ideal way to disentangle classes of objects reliably and efficiently. In Astrophysics, the identification of candidate Globular Clusters in deep, wide-field, single-band images is one such case where self-adaptive methods have demonstrated high performance and reliability. Here we experimented with some variants of the known Neural Gas model, exploring both supervised and unsupervised Machine Learning paradigms for the classification of Globular Clusters. The main scope of this work was to verify the possibility of improving the computational efficiency of the methods used to solve complex data-driven problems by exploiting parallel programming on the GPU framework. Using this astrophysical playground, the goal was to scientifically validate such models for further applications extended to other contexts.
    Comment: 15 pages, 3 figures, to appear in the Volume of Springer Communications in Computer and Information Science (CCIS). arXiv admin note: substantial text overlap with arXiv:1710.0390
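    The paper itself lists no code; purely as an illustration of the core (unsupervised) Neural Gas update that such variants build on, a minimal NumPy sketch is given below. The function name, annealing schedule and parameter values are assumptions for illustration and are not taken from the paper, which also explores supervised variants and GPU-parallel implementations.

        import numpy as np

        def neural_gas_fit(X, n_prototypes=20, n_epochs=50,
                           eps_start=0.5, eps_end=0.01,
                           lam_start=10.0, lam_end=0.1, seed=0):
            """Minimal Neural Gas: every prototype is ranked by its distance to the
            current sample and pulled towards it with a weight that decays
            exponentially with that rank."""
            rng = np.random.default_rng(seed)
            W = X[rng.choice(len(X), n_prototypes, replace=False)].astype(float)
            T = n_epochs * len(X)
            t = 0
            for _ in range(n_epochs):
                for x in X[rng.permutation(len(X))]:
                    # annealed learning rate and neighbourhood range
                    eps = eps_start * (eps_end / eps_start) ** (t / T)
                    lam = lam_start * (lam_end / lam_start) ** (t / T)
                    ranks = np.argsort(np.argsort(np.linalg.norm(W - x, axis=1)))
                    W += eps * np.exp(-ranks / lam)[:, None] * (x - W)
                    t += 1
            return W

    For classification, one common option (again only an assumption here, not the paper's specific variant) is to label each learned prototype by a majority vote of its nearest training objects and to assign new sources to the class of their nearest prototype.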

    The Role of Cognitive Disposition in Re-examining the Privacy Paradox: A Neuroscience Study

    The privacy paradox is a phenomenon whereby individuals continue to disclose their personal information, contrary to their claimed concerns for the privacy of that information. This study investigated the privacy paradox to better understand individuals' decisions to disclose or withhold their personal information. The study argued that individuals' decisions are based on a cognitive disposition, which involves both rational and emotional mental processes. While the extended privacy calculus model was used as the theoretical basis for the study, findings from cognitive neuroscience were applied to it to address its limitation of assuming that individuals are purely rational decision-makers. Three within-subjects experiments were conducted, with each subject participating in all three experiments as if they were one. Experiment 1 captured the neural correlates of the mental processes involved in privacy-related decisions, while experiments 2 and 3 were factorial-design experiments used to test how well those neural correlates predict privacy concerns and personal information disclosure. The findings indicated that at least one neural correlate of every mental process involved in privacy-related decisions significantly influenced personal information disclosure, except for uncertainty. However, there were no significant relationships between mental processes and privacy concerns, except for Brodmann's Area 13, a neural correlate of distrust. This relationship, however, was positive, opposite to what was hypothesized. Furthermore, interaction effects indicated that individuals put more emphasis on negative perceptions in privacy-related situations. This study contributed to the information privacy field by supporting the argument that individuals' privacy-related decisions are both rational and emotional. Specifically, the privacy paradox cannot be explained through a solely rational cost-benefit analysis or through an examination of individuals' emotions alone.

    Nichtlineare Merkmalsselektion mit der generalisierten Transinformation (Nonlinear Feature Selection with Generalized Mutual Information)

    In the context of information theory, the term Mutual Information was first formulated by Claude Elwood Shannon. Information theory is the consistent mathematical description of technical communication systems; to this day, it is the basis of numerous applications in modern communications engineering and has become indispensable in this field. This work is concerned with the development of a concept for nonlinear feature selection from scalar, multivariate data on the basis of the mutual information. From the viewpoint of modelling, the successful construction of a realistic model depends highly on the quality of the employed data. In the ideal case, high-quality data consists solely of the features relevant for deriving the model. In this context, it is important to possess a suitable method for measuring the degree of the mostly nonlinear dependencies between input and output variables. By means of such a measure, the relevant features can be selected specifically. During the course of this work, it will become evident that the mutual information is a valuable and feasible measure for this task and hence the method of choice for practical applications. Basically, and without any claim of being exhaustive, there are two constellations that recommend the application of feature selection. On the one hand, feature selection plays an important role if the computability of a derived system model cannot be guaranteed due to a multitude of available features. On the other hand, the existence of very few data points with a significant number of features also recommends the employment of feature selection. The latter constellation is closely related to the so-called "Curse of Dimensionality". The statement behind this is the necessity of reducing the dimensionality in order to obtain adequate coverage of the data space. In other words, it is important to reduce the dimensionality of the data, since, for a constant number of data points, the coverage of the data space decreases exponentially with the dimensionality of the available data. In the context of mapping between input and output space, this goal is ideally reached by selecting only the relevant features from the available data set. The basic idea for this work has its origin in the rather practical field of automotive engineering. It was motivated by the goals of a complex research project in which the nonlinear, dynamic dependencies among a multitude of sensor signals were to be identified. The final goal of these activities was to derive so-called virtual sensors from the identified dependencies among the installed automotive sensors. This enables the real-time computation of the required variable without the expense of additional hardware. The prospect of doing without additional computing hardware is a strong motive force, particularly in automotive engineering. In this context, the major problem was to find a feasible method to capture linear as well as nonlinear dependencies. As mentioned before, the goal of this work is the development of a flexibly applicable system for nonlinear feature selection. The important point here is to guarantee the practical computability of the developed method even for high-dimensional data spaces, which are rather realistic in technical environments. The measure employed for the feature selection process is based on the sophisticated concept of mutual information.
    The high sensitivity and specificity of the mutual information to linear and nonlinear statistical dependencies make it the method of choice for the development of a highly flexible, nonlinear feature selection framework. In addition to the mere selection of relevant features, the developed framework is also applicable to the nonlinear analysis of the temporal influences of the selected features. Hence, a subsequent dynamic modelling can be performed more efficiently, since the proposed feature selection algorithm additionally provides information about the temporal dependencies between input and output variables. In contrast to feature extraction techniques, the feature selection algorithm developed in this work has another considerable advantage. In the case of cost-intensive measurements, the variables with the highest information content can be selected in a prior feasibility study. Hence, the developed method can also be employed to avoid redundancy in the acquired data and thus prevent additional costs.
    The term mutual information (Transinformation) was first coined by Claude Elwood Shannon in the context of information theory, a unified mathematical description of technical communication systems. Against this background, the present work is concerned with the development of a practically applicable methodology for nonlinear feature selection from quantitative, multivariate data on the basis of the aforementioned information-theoretic concept of mutual information. The success of the transition from real measurement data to a suitable model description is largely determined by the quality of the data sets used. In the ideal case, a high-quality data set consists exclusively of the data relevant to a successful model formulation. In this context, the question immediately arises whether a suitable measure exists for correctly quantifying the degree of the generally nonlinear functional dependence between inputs and outputs. With the help of such a quantity, the relevant features can be selected specifically and thus separated from the redundant ones. In the course of this work, it will become clear that the mutual information mentioned at the outset is a suitable measure for this purpose and holds up very well in practical use. The original motivation for this work has a thoroughly practical background in automotive engineering. It arose within a complex research project for identifying nonlinear, dynamic dependencies among a multitude of measured sensor signals. The goal of these activities was to derive so-called virtual sensors by identifying nonlinear, dynamic dependencies among the sensors installed in the vehicle. The concrete task was to make the determination of a central engine variable so efficient that it can be computed without additional hardware under hard real-time constraints. Being able to do without additional hardware and to make do with the computing power already available is, owing to the enormous costs otherwise incurred, an extremely strong motivation, particularly in automotive engineering.
    In this context, the major difficulty repeatedly came to light of finding a practically computable method that can reliably and quantitatively capture both linear and nonlinear dependencies. In the course of this work, different selection strategies are combined with the mutual information and their properties are compared. In this context, the combination of the mutual information with the so-called forward selection strategy proves particularly interesting. It is shown that, compared with other approaches, this combination is what makes practical computability for high-dimensional data spaces possible in the first place. Subsequently, the convergence of this new feature selection procedure is proven. We will further see that the results obtained lie remarkably close to the optimal solution and are clearly superior in comparison with an alternative selection strategy. In parallel with the actual selection of the relevant features, the method developed in this work also makes it straightforward to perform a nonlinear analysis of the temporal dependencies of the selected features. A subsequent dynamic modelling can thus be carried out much more efficiently, since the developed feature selection provides additional information about the dynamic relationship between input and output data. With the method developed in this work, what was previously not possible has finally been achieved: the quantitative capture of the nonlinear dependencies between dedicated sensor signals, so that they can feed into an efficient feature selection. In contrast to feature extraction, the nonlinear feature selection method developed in this work has a further decisive advantage. Particularly in the case of very cost-intensive measurements, those variables can be selected that carry the highest information content with respect to the mapping onto an output quantity. Beyond the purely technical aspect of basing the selection decision directly on the information content of the available data, the developed method can also be used ahead of cost-relevant decisions to specifically avoid redundancy and the higher costs associated with it.
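    The thesis does not include code here; as a rough sketch of the general idea of mutual-information-driven forward selection described above, the following Python fragment greedily adds, at each step, the feature whose inclusion most increases an estimated mutual information between the selected feature set and the target. The simple histogram (binning) estimator, the bin count and the function names are illustrative assumptions only, not the generalized mutual information estimator developed in the thesis; histogram estimates also degrade quickly as the number of selected features grows.

        import numpy as np

        def mi_discrete(labels_x, labels_y):
            """Mutual information (in nats) between two discrete label sequences."""
            n = len(labels_x)
            joint, px, py = {}, {}, {}
            for a, b in zip(labels_x, labels_y):
                joint[(a, b)] = joint.get((a, b), 0) + 1
                px[a] = px.get(a, 0) + 1
                py[b] = py.get(b, 0) + 1
            return sum((c / n) * np.log(c * n / (px[a] * py[b]))
                       for (a, b), c in joint.items())

        def forward_select(X, y, k=5, n_bins=8):
            """Greedy forward selection scored by the histogram-estimated MI between
            the jointly binned selected features and the binned target."""
            def binned(v):
                edges = np.unique(np.quantile(v, np.linspace(0, 1, n_bins + 1)[1:-1]))
                return np.digitize(v, edges)
            Xb = [binned(X[:, j]) for j in range(X.shape[1])]
            yb = binned(y)
            selected, labels = [], [()] * len(y)
            for _ in range(min(k, X.shape[1])):
                best = None
                for j in range(X.shape[1]):
                    if j in selected:
                        continue
                    cand = [lab + (b,) for lab, b in zip(labels, Xb[j])]
                    score = mi_discrete(cand, yb)
                    if best is None or score > best[0]:
                        best = (score, j, cand)
                selected.append(best[1])
                labels = best[2]
            return selected

    A temporal analysis along the lines described in the abstract could then be approximated, under the same caveats, by including lagged copies of each signal as additional candidate features.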

    Induction, complexity, and economic methodology

    This paper focuses on induction, because the supposed weaknesses of that process are the main reason for favouring falsificationism, which plays an important part in scientific methodology generally; the paper is part of a wider study of economic methodology. The standard objections to, and paradoxes of, induction are reviewed, and this leads to the conclusion that the supposed 'problem' or 'riddle' of induction is a false one. It is an artefact of two assumptions: that the classic two-valued logic (CL) is appropriate for the contexts in which induction is relevant; and that it is the touchstone of rational thought. The status accorded to CL is the result of historical and cultural factors. The material we need to reason about falls into four distinct domains; these are explored in turn, while progressively relaxing the restrictions that are essential to the valid application of CL. The restrictions include the requirement for a pre-existing, independently guaranteed classification, into which we can fit all new cases with certainty, and non-ambiguous relationships between antecedents and consequents. Natural kinds, determined by the existence of complex entities whose characteristics cannot be unbundled and altered in a piecemeal, arbitrary fashion, play an important part in the review; so also does fuzzy logic (FL). These are used to resolve two famous paradoxes about induction (the grue and raven paradoxes), and the case for believing that conventional logic is a subset of fuzzy logic is outlined. The latter disposes of all questions of justifying induction deductively. The concept of problem structure is used as the basis for a structured concept of rationality that is appropriate to all four of the domains mentioned above. The rehabilitation of induction supports an alternative definition of science: that it is the business of developing networks of contrastive, constitutive explanations of reproducible, inter-subjective ('objective') data. Social and psychological obstacles ensure that the progress of science is slow and convoluted; however, the relativist arguments against such a project are rejected.
    Keywords: induction; economics; methodology; complexity
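    As a small, purely illustrative aside on the claim that conventional two-valued logic is a subset of fuzzy logic: with the common min/max/complement fuzzy connectives (the paper does not specify which connectives it adopts), restricting the truth values to {0, 1} reproduces the classical truth tables, while intermediate values express graded membership.

        def f_and(a, b):   # fuzzy conjunction (minimum)
            return min(a, b)

        def f_or(a, b):    # fuzzy disjunction (maximum)
            return max(a, b)

        def f_not(a):      # fuzzy negation (complement)
            return 1 - a

        # Crisp truth values {0, 1} recover classical logic ...
        assert f_and(1, 0) == 0 and f_or(1, 0) == 1 and f_not(1) == 0
        # ... while graded memberships handle borderline cases, e.g. something
        # "green" to degree 0.7 and "blue" to degree 0.4:
        # f_and(0.7, 0.4) == 0.4 and f_or(0.7, 0.4) == 0.7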

    The synthesis of artificial neural networks using single string evolutionary techniques.

    The research presented in this thesis is concerned with optimising the structure of Artificial Neural Networks. The techniques investigated are based on computer modelling of biological evolution or foetal development, and are known as Evolutionary, Genetic or Embryological methods. Specifically, Embryological techniques are used to grow Artificial Neural Network topologies. The Embryological Algorithm is an alternative to the popular Genetic Algorithm, which is widely used to achieve similar results. The algorithm "grows" in the sense that the network structure is extended incrementally and thus changes from a simple form to a more complex form. This is unlike the Genetic Algorithm, which causes the structure of the network to evolve in an unstructured or random way. The thesis outlines the following original work: the operation of the Embryological Algorithm is described and compared with the Genetic Algorithm; the results of an exhaustive literature search in the subject area are reported; the growth strategies which may be used to evolve Artificial Neural Network structure are listed; and these growth strategies are integrated into an algorithm for network growth. Experimental results obtained from using such a system are described, and there is a discussion of the applications of the approach. Consideration is given to the advantages and disadvantages of this technique, and suggestions are made for future work in the area. A new learning algorithm based on Taguchi methods is also described. The report concludes that the method of incremental growth is a useful and powerful technique for defining neural network structures and is more efficient than its alternatives. Recommendations are also made with regard to the types of network to which this approach is best suited. Finally, the report contains a discussion of two important aspects of Genetic or Evolutionary techniques related to the above: Modular networks (and their synthesis) and the functionality of the network itself.
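    The abstract does not give the Embryological Algorithm itself; the following sketch only illustrates the general contrast it draws, namely growing a network incrementally from a simple form and keeping the extra structure only while it pays off. The use of scikit-learn's MLPRegressor, the stopping rule and all parameter values are illustrative assumptions, not the single-string developmental encoding described in the thesis.

        import numpy as np
        from sklearn.neural_network import MLPRegressor
        from sklearn.model_selection import train_test_split

        def grow_hidden_layer(X, y, max_units=20, patience=3, seed=0):
            """Generic constructive loop: add hidden units one at a time and stop
            growing once held-out error has not improved for `patience` steps."""
            X_tr, X_val, y_tr, y_val = train_test_split(
                X, y, test_size=0.25, random_state=seed)
            best_err, best_model, stale = np.inf, None, 0
            for n_units in range(1, max_units + 1):
                model = MLPRegressor(hidden_layer_sizes=(n_units,),
                                     max_iter=2000, random_state=seed)
                model.fit(X_tr, y_tr)
                err = np.mean((model.predict(X_val) - y_val) ** 2)
                if err < best_err:
                    best_err, best_model, stale = err, model, 0
                else:
                    stale += 1
                    if stale >= patience:
                        break
            return best_model, best_err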