37 research outputs found

    Clustering of nonstationary data streams: a survey of fuzzy partitional methods

    Get PDF
    Data streams have arisen as a relevant research topic during the past decade. They are real-time, incremental in nature, temporally ordered, and massive; they contain outliers, and the objects in a data stream may evolve over time (concept drift). Clustering is often one of the earliest and most important steps in the streaming data analysis workflow. A comprehensive literature is available on data stream clustering; however, less attention has been devoted to the fuzzy clustering approach, even though the nonstationary nature of many data streams makes it especially appealing. This survey discusses relevant data stream clustering algorithms, focusing mainly on fuzzy methods, including their treatment of outliers and of concept drift and shift. (Funding: Ministero dell'Istruzione, dell'Università e della Ricerca)
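    The survey above centers on fuzzy partitional methods, whose classical building block is the fuzzy c-means update in which every point belongs to every cluster with a degree in [0, 1]. The following is a minimal illustrative sketch of one fuzzy c-means iteration, not code from any of the surveyed streaming algorithms; the fuzzifier value and the synthetic data are assumptions.

```python
import numpy as np

def fuzzy_cmeans_step(X, centers, m=2.0):
    """One fuzzy c-means iteration: update memberships, then centers.

    X: (n, d) data matrix; centers: (c, d); m: fuzzifier > 1.
    Returns (new_centers, memberships), memberships of shape (n, c).
    """
    # Distance from every point to every center, floored to avoid division by zero.
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    dist = np.fmax(dist, 1e-12)
    # Membership u_ik proportional to dist_ik^(-2/(m-1)), normalized over clusters.
    inv = dist ** (-2.0 / (m - 1.0))
    u = inv / inv.sum(axis=1, keepdims=True)
    # Centers are membership-weighted means with weights u^m.
    w = u ** m
    new_centers = (w.T @ X) / w.sum(axis=0)[:, None]
    return new_centers, u

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
    centers = X[rng.choice(len(X), size=2, replace=False)]
    for _ in range(20):
        centers, u = fuzzy_cmeans_step(X, centers)
    print(centers)
```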

    On the semantics of fuzzy logic

    Get PDF
    This paper presents a formal characterization of the major concepts and constructs of fuzzy logic in terms of notions of distance, closeness, and similarity between pairs of possible worlds. The formalism is a direct extension (by recognition of multiple degrees of accessibility, conceivability, or reachability) of the major modal logic concepts of possible and necessary truth. Given a function that maps pairs of possible worlds into a number between 0 and 1, generalizing the conventional concept of an equivalence relation, the major constructs of fuzzy logic (conditional and unconditioned possibility distributions) are defined in terms of this similarity relation using familiar concepts from the mathematical theory of metric spaces. This interpretation is different in nature and character from the typical, chance-oriented meanings associated with probabilistic concepts, which are grounded on the mathematical notion of set measure. The similarity structure defines a topological notion of continuity in the space of possible worlds (and in that of its subsets, i.e., propositions) that allows a form of logical "extrapolation" between possible worlds. This logical extrapolation operation corresponds to the major deductive rule of fuzzy logic, the compositional rule of inference or generalized modus ponens of Zadeh, an inferential operation that generalizes its classical counterpart by virtue of its ability to be utilized when propositions representing available evidence match only approximately the antecedents of conditional propositions. The relations between the similarity-based interpretation of the role of conditional possibility distributions and the approximate inferential procedures of Baldwin are also discussed. A straightforward extension of the theory to the case where the similarity scale is symbolic rather than numeric is described. The problem of generating similarity functions from a given set of possibility distributions, with the latter interpreted as defining a number of (graded) discernibility relations and the former as the result of combining them into a joint measure of distinguishability between possible worlds, is briefly discussed.
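    For reference, the compositional rule of inference (Zadeh's generalized modus ponens) mentioned in this abstract is usually stated in the sup-min form below; this is the standard textbook formulation of the construct the paper reinterprets through similarity, not a formula quoted from the paper itself.

```latex
% Generalized modus ponens: from a fact A' on X and a rule "if A then B"
% represented by a fuzzy relation R on X x Y, infer B' on Y.
\[
  \mu_{B'}(y) \;=\; \sup_{x \in X} \min\bigl(\mu_{A'}(x),\, \mu_{R}(x, y)\bigr),
  \qquad y \in Y .
\]
```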

    Representing archaeological uncertainty in cultural informatics

    Get PDF
    This thesis sets out to explore, describe, quantify, and visualise uncertainty in a cultural informatics context, with a focus on archaeological reconstructions. For quite some time, archaeologists and heritage experts have been criticising the often too realistic appearance of three-dimensional reconstructions. They have been highlighting one of the unique features of archaeology: the information we have on our heritage will always be incomplete. This incompleteness should be reflected in digitised reconstructions of the past. This criticism is the driving force behind this thesis. The research examines archaeological theory and inferential process and provides insight into computer visualisation. It describes how these two areas, archaeology and computer graphics, have formed a useful, but often tumultuous, relationship through the years. By examining the uncertainty background of disciplines such as GIS, medicine, and law, the thesis postulates that archaeological visualisation, in order to mature, must move towards archaeological knowledge visualisation. Three sequential areas are proposed through this thesis for the initial exploration of archaeological uncertainty: identification, quantification, and modelling. The main contributions of the thesis lie in those three areas. Firstly, through the innovative design, distribution, and analysis of a questionnaire, the thesis identifies the importance of uncertainty in archaeological interpretation and discovers potential preferences among different evidence types. Secondly, the thesis uniquely analyses and evaluates, in relation to archaeological uncertainty, three different belief quantification models. The varying ways in which these mathematical models work are also evaluated through simulated experiments; comparison of results indicates significant convergence between the models. Thirdly, a novel approach to visualising archaeological uncertainty and evidence conflict is presented, influenced by information visualisation schemes. Lastly, suggestions for future semantic extensions to this research are presented through the design and development of new plugins for a search engine.

    Efficient Maximum A-Posteriori Inference in Markov Logic and Application in Description Logics

    Full text link
    Maximum a-posteriori (MAP) query in statistical relational models computes the most probable world given evidence and further knowledge about the domain. It is arguably one of the most important types of computational problems, since it is also used as a subroutine in weight learning algorithms. In this thesis, we discuss an improved inference algorithm and an application for MAP queries. We focus on Markov logic (ML) as the statistical relational formalism. Markov logic combines Markov networks with first-order logic by attaching weights to first-order formulas. For inference, we improve on existing work that translates MAP queries into integer linear programs (ILPs). The motivation is that existing ILP solvers are very stable and fast and are able to precisely estimate the quality of an intermediate solution. In our work, we focus on improving the translation process so that the resulting ILPs have fewer variables and fewer constraints. Our main contribution is the Cutting Plane Aggregation (CPA) approach, which leverages symmetries in ML networks and parallelizes MAP inference. Additionally, we integrate the cutting plane inference algorithm (Riedel 2008), which significantly reduces the number of groundings by solving multiple smaller ILPs instead of one large ILP. We present the new Markov logic engine RockIt, which outperforms state-of-the-art engines on standard Markov logic benchmarks. Afterwards, we apply the MAP query to description logics. Description logics (DL) are knowledge representation formalisms whose expressivity is higher than propositional logic but lower than first-order logic. The most popular DLs have been standardized in the ontology language OWL and are an elementary component of the Semantic Web. We combine Markov logic, which essentially follows the semantics of a log-linear model, with description logics into log-linear description logics, in which weights can be attached to any description logic axiom. Furthermore, we introduce a new query type which computes the most probable 'coherent' world. Possible applications of log-linear description logics lie mainly in the areas of ontology learning and data integration. With our novel log-linear description logic reasoner ELog, we show experimentally that greater expressivity increases quality and that optimal solving strategies yield higher-quality solutions than approximate solving strategies.
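    The central device in this thesis is the translation of MAP queries into integer linear programs. The sketch below illustrates the commonly used encoding (one binary variable per ground atom, one auxiliary variable per positive-weight ground clause, and a constraint that forces the auxiliary variable to 0 when the clause is unsatisfied). It assumes the PuLP library and a hand-coded toy network; it is not RockIt's translation and omits cutting plane inference and CPA entirely.

```python
# pip install pulp  -- illustrative sketch only; the clause set and names are assumptions.
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

# Toy ground Markov logic network: (weight, positive literals, negative literals).
clauses = [
    (2.0, ["smokes_A"], []),              # 2.0  smokes(A)
    (1.5, ["cancer_A"], ["smokes_A"]),    # 1.5  smokes(A) => cancer(A)
    (1.5, ["cancer_B"], ["smokes_B"]),    # 1.5  smokes(B) => cancer(B)
    (1.1, ["smokes_B"], ["smokes_A"]),    # 1.1  smokes(A) => smokes(B)
]
atoms = sorted({a for _, pos, neg in clauses for a in pos + neg})

prob = LpProblem("map_mln", LpMaximize)
x = {a: LpVariable(f"x_{a}", cat=LpBinary) for a in atoms}             # ground atoms
z = [LpVariable(f"z_{i}", cat=LpBinary) for i in range(len(clauses))]  # clause satisfaction

# Objective: total weight of satisfied (positive-weight) ground clauses.
prob += lpSum(w * z[i] for i, (w, _, _) in enumerate(clauses))

# z_i may be 1 only if some positive literal is true or some negative literal is false.
for i, (_, pos, neg) in enumerate(clauses):
    prob += z[i] <= lpSum(x[a] for a in pos) + lpSum(1 - x[a] for a in neg)

prob.solve()
print({a: int(x[a].varValue) for a in atoms})
```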

    Pertanika Journal of Science & Technology

    Get PDF

    Robust learning to rank models and their biomedical applications

    Get PDF
    There exist many real-world applications, such as recommendation systems, document retrieval, and computational biology, where the correct ordering of instances is of equal or greater importance than predicting the exact value of some discrete or continuous outcome. Learning-to-Rank (LTR) refers to a group of algorithms that apply machine learning techniques to tackle these ranking problems. Despite their empirical success, most existing LTR models are not built to be robust to errors in labeling or annotation, distributional data shift, or adversarial data perturbations. To fill this gap, we develop four LTR frameworks that are robust to various types of perturbations. First, Pairwise Elastic Net Regression Ranking (PENRR) is an elastic-net-based regression method for drug sensitivity prediction. PENRR infers robust predictors of drug responses from patient genomic information. The special design of this model (comparing each drug with other drugs in the same cell line and comparing that drug with itself in other cell lines) significantly enhances the accuracy of the drug prediction model under limited data. This approach is also able to solve the problem of fitting on insensitive drugs that is commonly encountered in regression-based models. Second, Regression-based Ranking by Pairwise Cluster Comparisons (RRPCC) is a ridge-regression-based method for ranking clusters of similar protein complex conformations generated by an underlying docking program (i.e., ClusPro). Rather than using regression to predict scores, which would equally penalize deviations for both low-quality and high-quality clusters, we seek to predict the difference of scores for any pair of clusters corresponding to the same complex. RRPCC combines these pairwise assessments to form a ranked list of clusters, from higher to lower quality. We apply RRPCC to clusters produced by the automated docking server ClusPro and, depending on the training/validation strategy, we show improvement of 24%–100% in ranking acceptable or better quality clusters first, and of 15%–100% in ranking medium or better quality clusters first. Third, Distributionally Robust Multi-Output Regression Ranking (DRMRR) is a listwise LTR model that induces robustness into LTR problems using the Distributionally Robust Optimization (DRO) framework. In contrast to existing methods, the scoring function of DRMRR is designed as a multivariate mapping from a feature vector to a vector of deviation scores, which captures local context information and cross-document interactions. DRMRR employs ranking metrics (i.e., NDCG) in its output; in particular, we use the notion of position deviation to define a vector of relevance scores instead of a scalar one. We then adopt the DRO framework to minimize a worst-case expected multi-output loss function over a probabilistic ambiguity set defined by the Wasserstein metric. We also present an equivalent convex reformulation of the DRO problem, which is shown to be tighter than those proposed in previous studies. Fourth, Inversion Transformer-based Neural Ranking (ITNR) is a Transformer-based model that predicts drug responses using RNAseq gene expression profiles, drug descriptors, and drug fingerprints. It utilizes a Context-Aware-Transformer architecture as its scoring function, which ensures the modeling of inter-item dependencies. We also introduce a new loss function using the concept of inversion and approximate permutation matrices. The accuracy and robustness of these LTR models are verified through three medical applications, namely cluster ranking in protein-protein docking, medical document retrieval, and drug response prediction.
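    As a rough illustration of the pairwise-difference idea underlying PENRR and RRPCC (regressing on score differences between paired items rather than on absolute scores), here is a minimal sketch using scikit-learn's ElasticNet; the synthetic data, the pairing scheme, and the hyperparameters are assumptions and do not reproduce the thesis's models.

```python
# Minimal pairwise-difference ranking sketch (not the PENRR/RRPCC implementation).
import numpy as np
from itertools import combinations
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n, d = 60, 10
X = rng.normal(size=(n, d))                  # item features (hypothetical descriptors)
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)    # latent relevance scores

# Build the pairwise training set: features and targets are within-pair differences.
pairs = list(combinations(range(n), 2))
X_pair = np.array([X[i] - X[j] for i, j in pairs])
y_pair = np.array([y[i] - y[j] for i, j in pairs])

model = ElasticNet(alpha=0.01, l1_ratio=0.5)
model.fit(X_pair, y_pair)

# A linear pairwise model reduces to a linear scorer, so new items can be ranked directly.
scores = X @ model.coef_
print("top-5 items:", np.argsort(-scores)[:5])
```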

    Beurteilung der Resttragfähigkeit von Bauwerken mit Hilfe der Fuzzy-Logik und Entscheidungstheorie (Assessment of the residual load-bearing capacity of structures using fuzzy logic and decision theory)

    Get PDF
    Whereas the design of new structures is almost completely regulated by codes, there are no objective procedures for the evaluation of existing facilities. Experts are often not familiar with the new tasks in system identification and try to retrieve at least some information from available documents; they therefore make compromises which many stakeholders find unsatisfactory. Consequently, this publication presents a more objective and more realistic method for condition assessment. The necessary basics for this task are fracture mechanics combined with computational analysis, methods and techniques for geometry recording and material investigation, ductility and energy dissipation, risk analysis, and the treatment of uncertainty. Current evaluation tools investigate how to conceptualize a structure analytically from given loads and the measured response. Since defects are not necessarily visible or directly detectable, several damage indices are combined and integrated into a model of the real system. Fuzzy sets are well suited to represent parametric (data) uncertainty as well as system or model uncertainty, and trapezoidal membership functions can represent the condition state of structural components as a function of damage extent or performance. The residual load-bearing capacity can be determined by performing analyses in three successive steps. The "screening assessment" shall eliminate the large majority of structures from detailed consideration and advise on immediate precautions to protect lives and high economic values; here, the defects have to be explicitly defined and located. If this is impossible, an "approximate evaluation" should follow, describing system geometry, material properties, and failure modes in detail. Here, a fault tree helps investigate defects systematically, avoiding random search or the neglect of important features or damage indices. Because it informs about the structural system, this step is deemed essential not only for its conceptual clarity but also for its simplicity of application, and it therefore represents an important prerequisite in condition assessment, although special circumstances might require "further investigations" to account for the actual material parameters and for unaccounted reserves due to spatial or other secondary contributions. Here, uncertainties with respect to geometry, material, loading, or modeling should in no case be neglected but should be explicitly quantified. Postulating a limited set of expected failure modes is not always sufficient, since detectable signature changes are seldom directly attributable and every defect might, together with other unforeseen situations, become decisive; a determination of all possible scenarios would be required to consider every imaginable influence. Risk arises from a combination of various, ill-defined failure modes, and due to the interaction of many variables there is no simple and reliable way to predict which failure mode is dominant. Risk evaluation therefore comprises the estimation of the prognostic factor with respect to undesirable events, component importance, and the expected damage extent.
    While the design of structures is generally regulated by codes, there are as yet no objective guidelines for the condition assessment of existing structures. Many experts are not yet familiar with this new problem (system identification from loading and the resulting structural response) and therefore settle for compromise solutions. For many owners this is unsatisfactory, which is why a more objective and more realistic condition assessment is presented here. Important foundations for this are the theoretical basics of damage analysis, methods and techniques for geometry and material investigation, ductility and energy absorption, risk analysis, and the description of uncertainties. Since not all damage is obvious, current practice combines several condition indicators, processes the recorded data specifically, and integrates them into a validated model before a final assessment. If deterministic verification methods are combined with probabilistic ones, only random errors can be minimized without difficulty; systematic errors due to imprecise modelling or vague knowledge remain. It is therefore unavoidable that decision makers judge subjectively on the basis of uncertain, often even contradictory, information. This work shows how structural members can be assigned to quality classes by means of a three-step assessment procedure. Their failure risk follows from their mean damage extent, their structural importance I (which in turn depends on their relevance and on the consequences of their damage), and their prognostic factor L. The failure risk of the overall structure is determined from its topology. If the mean damage extent cannot be determined unambiguously, or if the material, geometry, or load data are vague, a mathematical procedure based on fuzzy logic is proposed within the scope of "further investigations". Even for complex cause-effect relationships it filters out the dominant damage cause and prevents parameters afflicted with uncertainty from being taken for reliable absolute values. To compute the mean damage index, and from it the risk, the individual damage indices (depending on the failure mode) are weighted according to their importance and are additionally divided by Gamma according to the type, importance, and reliability of the information obtained. This constitutes a new procedure for the analysis of complex failure mechanisms that yields traceable conclusions.
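    The trapezoidal membership functions mentioned above for representing component condition states can be written down directly. The sketch below is a generic fuzzy trapezoid; the breakpoints and the "moderate damage" label are chosen arbitrarily for illustration and are not taken from the thesis.

```python
import numpy as np

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 below a, rising to 1 on [b, c], falling to 0 at d."""
    x = np.asarray(x, dtype=float)
    rising = (x - a) / max(b - a, 1e-12)
    falling = (d - x) / max(d - c, 1e-12)
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)

# Example: degree to which a normalized damage extent belongs to a hypothetical
# "moderate damage" state; the breakpoints 0.2/0.35/0.55/0.7 are purely illustrative.
damage = np.array([0.1, 0.3, 0.45, 0.6, 0.8])
print(trapezoid(damage, 0.2, 0.35, 0.55, 0.7))
```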

    INVESTIGATIONS ON COGNITIVE COMPUTATION AND COMPUTATIONAL COGNITION

    Get PDF
    This Thesis describes our work at the boundary between Computer Science and Cognitive (Neuro)Science. In particular, (1) we have worked on methodological improvements to clustering-based meta-analysis of neuroimaging data, a technique that allows activation peaks from several functional imaging studies to be assessed collectively and quantitatively, in order to extract the most robust results in the cognitive domain of interest. Hierarchical clustering is often used in this context, yet it is prone to the problem of non-uniqueness of the solution: a different permutation of the same input data might result in a different clustering result. In this Thesis, we propose a new version of hierarchical clustering that solves this problem. We also show the results of a meta-analysis, carried out using this algorithm, aimed at identifying specific cerebral circuits involved in single word reading. Moreover, (2) we describe preliminary work on a new connectionist model of single word reading, named the two-component model because it postulates a cascaded information flow from a more cognitive component, which computes a distributed internal representation for the input word, to an articulatory component, which translates this code into the corresponding sequence of phonemes. Output production starts when the internal code, which evolves in time, reaches a sufficient degree of clarity; this mechanism has been advanced as a possible explanation for behavioral effects consistently reported in the literature on reading, with a specific focus on the so-called serial effects. This model is discussed here in terms of its strengths and weaknesses. Finally, (3) we consider how features that are typical of human cognition can inform the design of improved artificial agents; here, we have focused on modelling concepts inspired by emotion theory. A model of emotional interaction between artificial agents, based on probabilistic finite state automata, is presented: in this model, agents have personalities and attitudes that can change through the course of interaction (e.g., by reinforcement learning) to achieve autonomous adaptation to the interaction partner. Markov chain properties are then applied to derive reliable predictions of the outcome of an interaction. Taken together, these works show how the interplay between Cognitive Science and Computer Science can be fruitful, both for advancing our knowledge of the human brain and for designing increasingly intelligent artificial systems.
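    For the emotional-interaction model in point (3), where Markov chain properties are used to predict interaction outcomes, one minimal form of such a prediction is the stationary distribution of a transition matrix. The two-state "mood" chain below is an invented toy example under that assumption, not the thesis's automaton.

```python
# Toy long-run prediction of an interaction outcome via the stationary distribution
# of a Markov chain; the states and probabilities are illustrative assumptions.
import numpy as np

# Transition matrix over agent moods: rows = current state, columns = next state.
# States: 0 = "friendly", 1 = "hostile".
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# The stationary distribution pi solves pi = pi P with sum(pi) = 1:
# take the left eigenvector of P associated with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()

print("long-run fraction of time friendly/hostile:", pi)
```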