
    Uncertainty and indistinguishability. Application to modelling with words.

    The concept of equality is fundamental in any theory, since it is essential for discerning the objects the theory concerns, an ability which in turn is a requirement for any classification mechanism that might be defined. When all the properties involved are entirely precise (no uncertainty), what we obtain is classical equality, where two individuals are considered equal if and only if they share the same set of properties.
What happens, however, when imprecision arises, as in the case of properties which are fulfilled only up to a degree? Then, because certain individuals will be more similar to each other than others, the need for a gradual notion of equality arises. These considerations show that contexts pervaded with uncertainty require a more flexible concept of equality, one that goes beyond the rigidity of the classical notion. T-indistinguishability operators seem to be good candidates for this more flexible and general version of equality that we are searching for. On the other hand, the Dempster-Shafer Theory of Evidence, as a framework for representing and managing general bodies of evidence, implicitly conveys a notion of indistinguishability between the elements of the domain of discourse based on their relative compatibility with the evidence at hand. Chapter two is concerned with methods for defining the T-indistinguishability operator associated with a given body of evidence. In chapter three, after providing a comprehensive summary of the state of the art in measures of uncertainty, we tackle the problem of computing entropy when an indistinguishability relation has been defined over the elements of the domain. Entropy should then be measured not according to the occurrence of different events, but according to the variability perceived by an observer equipped with the discernment abilities defined by the indistinguishability relation considered. This "observer paradigm" naturally leads to the introduction of the concept of observational entropy. Real data is often pervaded with uncertainty, so devising techniques to induce knowledge in its presence is entirely advisable. The paradigm of computing with words follows this line, providing a computation formalism based on linguistic labels, in contrast to traditional numerical methods. The use of linguistic labels enriches the understandability of the representation language, although it also requires adapting classical inductive learning procedures to cope with such labels. In chapter four, a novel approach to building decision trees is introduced, addressing the case where uncertainty arises from a more realistic setting in which the decision maker's discernment abilities are taken into account when computing node impurity measures. We have called the resulting trees "observational decision trees", since the main idea stems from the notion of observational entropy as a way of incorporating indistinguishability concerns. In addition, we present an algorithm that induces linguistic rules from data by properly managing the uncertainty present either in the set of describing labels or in the data itself. A formal comparison with standard algorithms is also provided.
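    For reference, this is the standard textbook definition of a T-indistinguishability operator on which the thesis builds (stated here in generic notation, not quoted from the thesis):

```latex
% A T-indistinguishability operator on a universe X, for a t-norm T, is a
% fuzzy relation E : X \times X \to [0,1] such that, for all x, y, z in X:
\begin{align*}
E(x,x) &= 1 && \text{(reflexivity)}\\
E(x,y) &= E(y,x) && \text{(symmetry)}\\
T\bigl(E(x,y),\, E(y,z)\bigr) &\le E(x,z) && \text{(T-transitivity)}
\end{align*}
% Classical equality is recovered when E takes values only in \{0,1\}.
```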

    Mining for User-Defined Categorizations as an Approach for Process Simplification in Business Process Discovery

    Business process discovery approaches analyse event logs to create process models describing the as-is state of the underlying processes. Because this type of process mining is especially relevant for low-structured processes, some approaches are designed to deal with such processes by simplifying the resulting model. Such simplifications are primarily applied using metrics based on the frequency of observed behaviour. However, a high frequency of certain behaviour is not synonymous with high relevance to the user. Consequently, this paper applies a design science research approach to design and implement a business process discovery approach based on user-defined categories, guaranteeing relevance to the respective user. During one design science research cycle, a design theory consisting of design requirements and design principles is constructed, a method called the categorization approach is created, and this method, implemented in a software artefact, is evaluated with regard to perceived usefulness in an expert survey.
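    As a loose illustration of the contrast drawn above, the sketch below rebuilds a toy directly-follows model once by frequency filtering and once over user-defined activity categories; all activity names, categories, and thresholds are invented for the example and are not the paper's implementation:

```python
# Contrast frequency-based simplification with user-defined categorization
# of an event log (toy data, illustrative only).
from collections import Counter

event_log = [
    ["register", "check_credit", "approve", "notify"],
    ["register", "check_credit", "reject", "notify"],
    ["register", "approve", "notify"],
]

# Frequency-based simplification: keep only transitions observed often enough.
transitions = Counter(
    (a, b) for trace in event_log for a, b in zip(trace, trace[1:])
)
frequent = {t for t, n in transitions.items() if n >= 2}

# Category-based simplification: the user maps activities to categories and
# the model is rebuilt over those categories, guaranteeing user relevance.
categories = {"register": "intake", "check_credit": "assessment",
              "approve": "decision", "reject": "decision", "notify": "intake"}
cat_transitions = Counter(
    (categories[a], categories[b])
    for trace in event_log for a, b in zip(trace, trace[1:])
)

print(frequent)          # surviving transitions under frequency filtering
print(cat_transitions)   # simplified model over user-defined categories
```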

    Reducing the number of membership functions in linguistic variables

    Dissertation presented at Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia in fulfilment of the requirements for the Master's degree in Mathematics and Applications, specialization in Actuarial Sciences, Statistics and Operations Research. The purpose of this thesis was to develop algorithms to reduce the number of membership functions in a fuzzy linguistic variable. Groups of similar membership functions to be merged were found using clustering algorithms. By “summarizing” the information given by a group of similar membership functions into a new membership function, we obtain a smaller set of membership functions representing the same concept as the initial linguistic variable. The complexity of clustering problems makes it difficult for exact methods to solve them in practical time, so heuristic methods were used to find good-quality solutions. A Scatter Search clustering algorithm was implemented in Matlab and compared to a variation of the K-Means algorithm, and computational results on two data sets are discussed. A case study is also presented involving the linguistic variables of fuzzy inference systems automatically constructed from data collected by sensors while drilling in different scenarios. With these systems already constructed, the task was to reduce the number of membership functions in their linguistic variables without losing performance. A hierarchical clustering algorithm relying on performance measures for the inference system was implemented in Matlab. It was possible not only to simplify the inference system by reducing the number of membership functions in each linguistic variable but also to improve its performance.
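    A minimal sketch of the merging idea, assuming triangular membership functions, a Jaccard-style similarity between fuzzy sets, and parameter averaging as the merge rule (one simple choice, not necessarily the thesis's exact method):

```python
# Greedily merge the most similar pair of membership functions until no pair
# is similar enough, yielding a smaller set over the same concept.
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def similarity(p, q, xs):
    """Jaccard-style similarity between two fuzzy sets sampled on xs."""
    f, g = triangular(xs, *p), triangular(xs, *q)
    return np.sum(np.minimum(f, g)) / np.sum(np.maximum(f, g))

xs = np.linspace(0.0, 10.0, 1001)
mfs = [(0.0, 2.0, 4.0), (0.5, 2.5, 4.5), (5.0, 7.0, 9.0)]  # (a, b, c) triples

while len(mfs) > 1:
    pairs = [(similarity(mfs[i], mfs[j], xs), i, j)
             for i in range(len(mfs)) for j in range(i + 1, len(mfs))]
    s, i, j = max(pairs)
    if s < 0.5:               # stop when no pair is similar enough to merge
        break
    merged = tuple((u + v) / 2 for u, v in zip(mfs[i], mfs[j]))
    mfs = [m for k, m in enumerate(mfs) if k not in (i, j)] + [merged]

print(mfs)  # fewer membership functions representing the same concept
```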

    Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis

    Background and Objectives: This paper examines the accuracy and efficiency (time complexity) of high-performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The need for this research derives from the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, so it is vitally important that cancer tissues be expertly identified and classified in a rapid and timely manner, both to assure fast detection of the disease and to expedite the drug discovery process. Methods: In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection and classification algorithms separately, and Phase Three examined the performance of their combination. Results: Phase One found that the Particle Swarm Optimization (PSO) algorithm performed best on the colon dataset for feature selection (29 genes selected), and Phase Two found that the Support Vector Machine (SVM) algorithm outperformed the other classifiers, with an accuracy of almost 86%. Phase Three found that the combined use of PSO and SVM surpassed the other algorithms in accuracy and performance, and was faster in terms of time analysis (94%). Conclusions: Applying feature selection algorithms prior to classification algorithms yields better accuracy than applying classification algorithms alone. This conclusion is important and significant to industry and society.
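    A hedged sketch of the combined pipeline described above: a simple binary PSO selects a gene subset, with cross-validated SVM accuracy as the fitness. The swarm parameters and the synthetic stand-in dataset are illustrative, not the paper's colon-cancer microarray data:

```python
# Binary PSO feature selection wrapped around an SVM classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=62, n_features=200, n_informative=10,
                           random_state=0)  # synthetic stand-in data

def fitness(mask):
    """Cross-validated SVM accuracy on the selected feature subset."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="linear"), X[:, mask.astype(bool)], y,
                           cv=3).mean()

n_particles, n_iter = 10, 15
pos = (rng.random((n_particles, X.shape[1])) < 0.1).astype(float)  # 0/1 masks
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    # Binary PSO: a velocity's sigmoid gives the probability of selecting 1.
    pos = (rng.random(pos.shape) < 1.0 / (1.0 + np.exp(-vel))).astype(float)
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("genes selected:", int(gbest.sum()),
      "CV accuracy:", round(fitness(gbest), 3))
```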

    A knowledge based system for scientific data visualization

    A knowledge-based system, called the visualization tool assistant (VISTA), which was developed to assist scientists in the design of scientific data visualization techniques, is described. The system derives its knowledge from several sources which provide information about data characteristics, visualization primitives, and effective visual perception. The design methodology employed by the system is based on a sequence of transformations which decomposes a data set into a set of data partitions, maps this set of partitions to visualization primitives, and combines these primitives into a composite visualization technique design. Although the primary function of the system is to generate an effective visualization technique design for a given data set using principles of visual perception, the system also allows users to interactively modify the design and renders the resulting image using a variety of rendering algorithms. The current version of the system primarily supports visualization techniques applicable to the earth and space sciences, although it may easily be extended to include techniques useful in other disciplines such as computational fluid dynamics, finite-element analysis, and medical imaging.
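    The transformation sequence described above can be pictured with a small sketch; all names and mapping rules here are hypothetical stand-ins, not VISTA's actual code:

```python
# Decompose a data set into partitions, map each partition to a visualization
# primitive via perception-based rules, combine into a composite design.
from dataclasses import dataclass

@dataclass
class Partition:
    name: str
    dimensionality: int   # e.g. a scalar field = 1, a 3-D vector field = 3
    data_type: str        # "scalar", "vector", ...

# Hypothetical perception-based mapping rules: one primitive per partition kind.
RULES = {("scalar", 1): "color_map",
         ("scalar", 2): "height_field",
         ("vector", 3): "streamlines"}

def decompose(dataset):
    """Split a multi-variate data set into per-variable partitions."""
    return [Partition(v["name"], v["dims"], v["type"]) for v in dataset]

def design(dataset):
    """Combine per-partition primitives into a composite technique design."""
    primitives = [RULES[(p.data_type, p.dimensionality)]
                  for p in decompose(dataset)]
    return {"composite": primitives}

print(design([{"name": "temperature", "dims": 1, "type": "scalar"},
              {"name": "wind", "dims": 3, "type": "vector"}]))
```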

    Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009)


    Bayesian stochastic blockmodeling

    This chapter provides a self-contained introduction to the use of Bayesian inference to extract large-scale modular structures from network data, based on the stochastic blockmodel (SBM), as well as its degree-corrected and overlapping generalizations. We focus on nonparametric formulations that allow their inference in a manner that prevents overfitting and enables model selection. We discuss aspects of the choice of priors, in particular how to avoid underfitting via increased Bayesian hierarchies, and we contrast the task of sampling network partitions from the posterior distribution with finding the single point estimate that maximizes it, while describing efficient algorithms to perform either one. We also show how inferring the SBM can be used to predict missing and spurious links, and shed light on the fundamental limitations of the detectability of modular structures in networks.
    Comment: 44 pages, 16 figures. Code is freely available as part of graph-tool at https://graph-tool.skewed.de . See also the HOWTO at https://graph-tool.skewed.de/static/doc/demos/inference/inference.htm
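    Since the chapter points to graph-tool, a minimal usage sketch might look like the following; the calls reflect graph-tool's documented inference API as best understood, and version details may differ:

```python
# Fit a nonparametric SBM to a small example network with graph-tool.
import graph_tool.all as gt

g = gt.collection.data["football"]     # example network shipped with graph-tool
state = gt.minimize_blockmodel_dl(g)   # fit an SBM by minimizing the
                                       # description length of the model
print(state.entropy())                 # description length of the fit
b = state.get_blocks()                 # inferred partition of nodes into blocks
print(b.a[:10])                        # block labels of the first ten nodes
state.draw(output="football-sbm.png")  # draw the network colored by block
```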

    Fault diagnosis for PV arrays considering dust impact based on transformed graphical feature of characteristic curves and convolutional neural network with CBAM modules

    Various faults can occur during the operation of PV arrays, and both dust-affected operating conditions and various diode configurations make the faults more complicated. However, current methods for fault diagnosis based on I-V characteristic curves utilize only partial feature information and often rely on calibrating the field characteristic curves to standard test conditions (STC), which is difficult to apply in practice and makes it hard to accurately identify multiple complex faults that appear similar across different blocking-diode configurations of PV arrays under the influence of dust. Therefore, a novel fault diagnosis method for PV arrays considering dust impact is proposed. In the preprocessing stage, the Isc-Voc normalized Gramian angular difference field (GADF) method is presented, which normalizes and transforms the resampled PV array characteristic curves from the field, including I-V and P-V, to obtain transformed graphical feature matrices. Then, in the fault diagnosis stage, a convolutional neural network (CNN) with convolutional block attention modules (CBAM) is designed to extract fault-differentiation information from the transformed graphical matrices, which contain the full feature information, and to classify faults. Different graphical feature transformation methods are compared through simulation cases, and different CNN-based classification methods are also analyzed. The results indicate that the developed method achieves high fault diagnosis accuracy and reliability for PV arrays with different blocking-diode configurations under various operating conditions.
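    The GADF transform itself is standard and easy to sketch; the min-max normalization below is a generic stand-in for the Isc-Voc normalization proposed in the paper:

```python
# Gramian angular difference field (GADF) of a resampled 1-D curve.
import numpy as np

def gadf(series):
    """GADF of a 1-D series: GADF[i, j] = sin(phi_i - phi_j)."""
    x = np.asarray(series, dtype=float)
    # Rescale to [-1, 1] so the values can be read as cosines of angles.
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    return np.sin(phi[:, None] - phi[None, :])

# Example: turn a toy I-V-like curve into a graphical feature matrix that a
# CNN (e.g. with CBAM attention) could consume as an image channel.
current = np.linspace(8.0, 0.0, 64) ** 1.2
print(gadf(current).shape)  # (64, 64)
```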

    Personal Biometric Identification Based on Photoplethysmography Signals from the Heartbeat

    Biometric systems are useful because they can distinguish the unique characteristics of individuals. The most popular identification systems are based on fingerprints, face detection, iris patterns, or hand geometry. This study attempts to improve biometric identification using the photoplethysmography (PPG) signal of the heartbeat. The proposed algorithm uses the contribution of all extracted features for biometric recognition. Its efficiency is demonstrated by experimental results obtained from Multilayer Perceptron, Naïve Bayes, and Random Forest classifiers applied to the features extracted in the signal processing stage. Fifty-one subjects participated in the experiments; the PPG signal of each person was recorded over two different time spans, and 30 characteristic features were extracted from each period and used for classification. The true positive rates of the Multilayer Perceptron, Naïve Bayes, and Random Forest models were 94.6078 %, 92.1569 %, and 90.3922 %, respectively. The results show that both the proposed algorithm and the biometric identification model based on the PPG signal are very promising for contactless human recognition systems.
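    A hedged sketch of the evaluation setup described above, comparing the three classifiers with scikit-learn; the random features below are stand-ins for the 30 characteristic features the study extracts per signal period:

```python
# Compare MLP, Naive Bayes, and Random Forest on per-subject PPG features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_subjects, samples_per_subject, n_features = 51, 10, 30
X = rng.normal(size=(n_subjects * samples_per_subject, n_features))
y = np.repeat(np.arange(n_subjects), samples_per_subject)  # subject identities

for name, clf in [("MLP", MLPClassifier(max_iter=1000)),
                  ("Naive Bayes", GaussianNB()),
                  ("Random Forest", RandomForestClassifier())]:
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")  # near chance here, since features are random
```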