
    On Information Granulation via Data Filtering for Granular Computing-Based Pattern Recognition: A Graph Embedding Case Study

    Granular Computing is a powerful information processing paradigm, particularly useful for the synthesis of pattern recognition systems in structured domains (e.g., graphs or sequences). According to this paradigm, granules of information play the pivotal role of describing the underlying (possibly complex) process, starting from the available data. From a pattern recognition viewpoint, granules of information can be exploited for the synthesis of semantically sound embedding spaces, where common supervised or unsupervised problems can be solved via standard machine learning algorithms. This companion paper follows our previous work (Martino et al. in Algorithms 15(5):148, 2022) in comparing different strategies for the automatic synthesis of information granules in the context of graph classification. These strategies mainly differ in the specific topology adopted for the subgraphs considered as candidate information granules and in whether the ground-truth class labels are used or neglected in the granulation process; in contrast to our previous work, we employ a filtering-based approach for the synthesis of information granules instead of a clustering-based one. Computational results on six open-access data sets corroborate the robustness of our filtering-based approach with respect to data stratification when compared to a clustering-based granulation stage.
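
    As a toy illustration of the embedding idea described above, the sketch below builds a "symbolic histogram" for a graph by counting subgraph-isomorphism matches of a few candidate granules. It is a minimal sketch assuming networkx is available; the toy graph, the candidate granules and the matching criterion are placeholders and do not reproduce the filtering-based granulation stage studied in the paper.

```python
# Hedged sketch: symbolic-histogram embedding of a graph over candidate
# information granules (small subgraphs). Not the authors' pipeline.
import networkx as nx
from networkx.algorithms import isomorphism

def symbolic_histogram(graph, granules):
    """For each granule, count its subgraph-isomorphism matches in `graph`."""
    embedding = []
    for granule in granules:
        matcher = isomorphism.GraphMatcher(graph, granule)
        # Number of distinct mappings is used as a simple proxy for the
        # granule's presence in the graph.
        embedding.append(sum(1 for _ in matcher.subgraph_isomorphisms_iter()))
    return embedding

if __name__ == "__main__":
    G = nx.cycle_graph(6)                             # toy input graph
    granules = [nx.path_graph(3), nx.path_graph(2)]   # toy candidate granules
    print(symbolic_histogram(G, granules))            # granule counts for G
```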

    Human versus Machine Intelligence: Assessing Natural Language Generation Models through Complex Systems Theory

    The introduction of Transformer architectures, with the self-attention mechanism, in automatic Natural Language Generation (NLG) is a breakthrough in solving general task-oriented problems, such as the simple production of long text excerpts that resemble ones written by humans. While the performance of GPT-X architectures is there for all to see, many efforts are underway to penetrate the secrets of these black boxes in terms of intelligent information processing, whose output statistical distributions resemble those of natural language. In this work, through the complexity science framework, a comparative study of the stochastic processes underlying the texts produced by the English version of GPT-2 with respect to texts produced by human beings, notably novels in English and programming code, is offered. The investigation, of a methodological nature, consists first of all of an analysis phase in which Multifractal Detrended Fluctuation Analysis and Recurrence Quantification Analysis, together with Zipf's law and approximate entropy, are adopted to characterize long-term correlations, regularities and recurrences in human- and machine-produced texts. Results show several peculiarities and trends in terms of long-range correlations and recurrences in the latter case. The synthesis phase, on the other hand, uses the complexity measures to build synthetic text descriptors, hence a suitable text embedding, which serve as the features for feeding a machine learning system designed to perform feature selection through an evolutionary technique. Multivariate analysis then shows the grouping tendency of the three analyzed text types, allowing GPT-2 texts to be placed in between natural language texts and computer code. Similarly, the classification task demonstrates that, given the high accuracy obtained in the automatic discrimination of text classes, the proposed set of complexity measures is highly informative. These results add another piece to the theoretical understanding of the surprising results obtained by NLG systems based on deep learning and let us improve the design of new informetrics or text mining systems for text classification, fake news detection, or even plagiarism detection.
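
    For concreteness, the sketch below computes two of the descriptors mentioned above, approximate entropy and a Zipf exponent, on a toy word sequence. The parameter choices (m = 2, r = 0.2 times the standard deviation) are common defaults rather than the paper's settings, and using the word-length series as the input signal is an illustrative assumption.

```python
# Hedged sketch of two text descriptors: approximate entropy on a numeric
# series derived from a text, and a Zipf exponent from word frequencies.
import numpy as np
from collections import Counter

def approximate_entropy(series, m=2, r_factor=0.2):
    x = np.asarray(series, dtype=float)
    n, r = len(x), r_factor * np.std(x)

    def phi(m):
        # All overlapping templates of length m.
        templates = np.array([x[i:i + m] for i in range(n - m + 1)])
        # Chebyshev distance between every pair of templates.
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        counts = np.mean(dist <= r, axis=1)   # fraction of matching templates
        return np.mean(np.log(counts))

    return phi(m) - phi(m + 1)

def zipf_exponent(words):
    freqs = np.array(sorted(Counter(words).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return -slope  # Zipf's law predicts an exponent close to 1

if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog the fox".split()
    print(approximate_entropy([len(w) for w in text]))
    print(zipf_exponent(text))
```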

    Sources of possible artefacts in the contrast evaluation for the backscattering polarimetric images of different targets in turbid medium

    It is known that polarization-sensitive backscattering images of different objects in turbid media may show better contrasts than usual intensity images. Polarimetric image contrast depends on both target and background polarization properties and typically involves averaging over groups of pixels, corresponding to given areas of the image. By means of numerical modelling we show that the experimental arrangement, namely the shape of the turbid medium container, the optical properties of the container walls, the relative positioning of the absorbing, scattering and reflecting targets with respect to each other and to the container walls, as well as the choice of the image areas for the contrast calculations, can strongly affect the final results for both linearly and circularly polarized light.
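
    The sketch below illustrates, under assumed definitions, how the choice of pixel areas enters a region-based contrast evaluation: a Michelson-type contrast between averaged target and background regions of a synthetic image. The masks and the contrast formula are illustrative assumptions, not the exact polarimetric quantities used in the paper.

```python
# Illustrative sketch only: region-averaged contrast between a target area
# and a background area of an image.
import numpy as np

def region_contrast(image, target_mask, background_mask):
    """Michelson-type contrast between two averaged pixel regions."""
    i_t = image[target_mask].mean()
    i_b = image[background_mask].mean()
    return (i_t - i_b) / (i_t + i_b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(1.0, 0.1, size=(64, 64))
    img[20:40, 20:40] += 0.5                    # brighter "target" patch
    target = np.zeros_like(img, dtype=bool)
    target[20:40, 20:40] = True
    background = ~target                        # everything else as background
    print(region_contrast(img, target, background))
```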

    Modelling and recognition of protein contact networks by multiple kernel learning and dissimilarity representations

    Multiple kernel learning is a paradigm which employs a properly constructed chain of kernel functions able to simultaneously analyse different data or different representations of the same data. In this paper, we propose a hybrid classification system based on a linear combination of multiple kernels defined over multiple dissimilarity spaces. The core of the training procedure is the joint optimisation of kernel weights and representative selection in the dissimilarity spaces. This equips the system with a two-fold knowledge discovery phase: by analysing the weights, it is possible to check which representations are more suitable for solving the classification problem, whereas the pivotal patterns selected as representatives can give further insights on the modelled system, possibly with the help of field experts. The proposed classification system is tested on real proteomic data in order to predict proteins' functional role starting from their folded structure: specifically, a set of eight representations is drawn from the graph-based description of the protein fold. The proposed multiple kernel-based system has also been benchmarked against a clustering-based classification system that is likewise able to exploit multiple dissimilarities simultaneously. Computational results show remarkable classification capabilities, and the knowledge discovery analysis is in line with current biological knowledge, suggesting the reliability of the proposed system.
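
    A minimal sketch of the kernel-combination idea is given below: RBF kernels computed on two toy dissimilarity-space representations are summed with fixed weights and passed to a precomputed-kernel SVM. The fixed weights and the synthetic data are assumptions; the joint optimisation of kernel weights and representative selection described above is not reproduced.

```python
# Hedged sketch: weighted combination of kernels over several representations,
# fed to an SVM with a precomputed Gram matrix.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def combined_kernel(representations, weights, gamma=0.5):
    """Weighted sum of RBF kernels, one per dissimilarity representation."""
    return sum(w * rbf_kernel(X, gamma=gamma)
               for w, X in zip(weights, representations))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 40
    y = np.repeat([0, 1], n // 2)
    # Two toy "dissimilarity space" representations of the same objects.
    reps = [rng.normal(y[:, None], 1.0, size=(n, 5)) for _ in range(2)]
    weights = [0.7, 0.3]                      # illustrative fixed kernel weights
    K = combined_kernel(reps, weights)
    clf = SVC(kernel="precomputed").fit(K, y)
    print(clf.score(K, y))                    # training accuracy on toy data
```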

    Calibration techniques for binary classification problems: A comparative analysis

    Calibrating a classification system consists of transforming the output scores, which somehow state the confidence of the classifier in the predicted output, into proper probability estimates. Having a well-calibrated classifier has a non-negligible impact on many real-world applications, for example the synthesis of decision-making systems for anomaly detection and fault prediction. In such industrial scenarios, risk assessment is closely related to the costs which must be covered. In this paper we review three state-of-the-art calibration techniques (Platt’s Scaling, Isotonic Regression and SplineCalib) and we propose three lightweight procedures based on a plain fitting of the reliability diagram. Computational results show that the three proposed techniques perform comparably to the three state-of-the-art approaches.
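
    The sketch below shows, on synthetic scores, two of the reviewed calibration techniques: Platt scaling as a logistic fit on the raw scores and isotonic regression as a monotone mapping. SplineCalib and the lightweight reliability-diagram fits proposed in the paper are not reproduced, and the toy scores are assumptions for illustration only.

```python
# Hedged sketch of Platt scaling and isotonic regression on synthetic,
# deliberately miscalibrated classifier scores.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
# Informative but overconfident scores in [0, 1].
scores = np.clip(0.5 + 0.6 * (y - 0.5) + rng.normal(0, 0.25, size=500), 0, 1)

# Platt scaling: sigmoid of an affine transform of the score.
platt = LogisticRegression().fit(scores.reshape(-1, 1), y)
platt_probs = platt.predict_proba(scores.reshape(-1, 1))[:, 1]

# Isotonic regression: monotone, piecewise-constant mapping of the score.
iso = IsotonicRegression(out_of_bounds="clip").fit(scores, y)
iso_probs = iso.predict(scores)

print(platt_probs[:5])
print(iso_probs[:5])
```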

    Accidental impacts on historical and architectural heritage in port areas: the case of Brindisi

    Most port areas can produce impacts on the historical and architectural heritage, leading to rapid pathological effects and generating high risks in terms of damage and loss of historical, artistic, and cultural values. Indeed, in addition to stationary actions (air pollution, waste, water discharge), port activities can generate exceptional impacts: the so-called “major accidents”, such as fires, explosions and chemical releases. The present contribution analyses and discusses a given case, the port of Brindisi, suggesting a methodology for the assessment of exceptional impacts in ports, in order to identify potential accidents and their effects on the historical landscape. It points out that, as often occurs in ports, the most frequent major accidents are caused by activities involving hazardous materials. The methodology proposed for this case aims to demonstrate that in historical port areas, such as those in the Mediterranean Sea, development and management should be accompanied by, or even oriented toward, the protection of the historical and cultural landscape.

    Effect of speckle on APSCI method and Mueller Imaging

    The principle of the polarimetric imaging method called APSCI (Adapted Polarization State Contrast Imaging) is to maximize the polarimetric contrast between an object and its background using specific polarization states of illumination and detection. We perform here a comparative study of the APSCI method against existing Classical Mueller Imaging (CMI) associated with polar decomposition, in the presence of fully and partially polarized circular Gaussian speckle. The results show a noticeable increase of the Bhattacharyya distance, used as our contrast parameter, for the APSCI method, especially when the object and background exhibit several polarimetric properties simultaneously.
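
    As a generic illustration of the contrast parameter mentioned above, the sketch below estimates the Bhattacharyya distance between two intensity samples via normalised histograms. The histogram-based estimator and the toy Gaussian samples are assumptions; the actual APSCI pipeline operates on polarimetric quantities and is not reproduced here.

```python
# Hedged sketch: histogram-based Bhattacharyya distance between the intensity
# distributions of an "object" sample and a "background" sample.
import numpy as np

def bhattacharyya_distance(samples_a, samples_b, bins=64):
    lo = min(samples_a.min(), samples_b.min())
    hi = max(samples_a.max(), samples_b.max())
    p, _ = np.histogram(samples_a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(samples_b, bins=bins, range=(lo, hi))
    p, q = p / p.sum(), q / q.sum()            # normalise to probability mass
    bc = np.sum(np.sqrt(p * q))                # Bhattacharyya coefficient
    return -np.log(bc)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obj = rng.normal(1.0, 0.2, 10_000)         # toy "object" intensities
    bkg = rng.normal(0.7, 0.2, 10_000)         # toy "background" intensities
    print(bhattacharyya_distance(obj, bkg))
```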

    From Bag-of-Words to Transformers: A Comparative Study for Text Classification in Healthcare Discussions in Social Media

    One notable paradigm shift in Natural Language Processing has been the introduction of Transformers, revolutionizing language modeling as Convolutional Neural Networks did for Computer Vision. The power of Transformers, along with many other innovative features, also lies in the integration of word embedding techniques, traditionally used to represent words in a text and to build classification systems directly. This study delves into the comparison of text representation techniques for classifying users who generate medical-topic posts in Facebook discussion groups. Short and noisy social media texts in Italian pose challenges for user categorization. The study employs two datasets, one for word embedding model estimation and another comprising discussions from users. The main objective is to achieve optimal user categorization through different pre-processing and embedding techniques, aiming at high generalization performance despite class imbalance. The paper has a dual purpose: to build an effective classifier, ensuring accurate information dissemination in medical discussions and combating fake news, and to explore the representational capabilities of various LLMs, especially BERT, Mistral and GPT-4, the latter investigated through the in-context learning approach. Finally, data visualization tools are used to evaluate the semantic embeddings with respect to the achieved performance. This investigation, focusing on classification performance, compares the classic BERT and several hybrid versions (i.e., employing different training strategies and approximate Support Vector Machines in the classification layer) against LLMs and several Bag-of-Words-based embeddings (notably, one of the earliest approaches to text classification). This research offers insights into the latest developments in language modeling, advancing the field of text representation and its practical application to user classification within medical discussions.
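
    As a minimal sketch of the Bag-of-Words end of the comparison, the snippet below builds a TF-IDF representation feeding a linear Support Vector Machine. The two toy Italian posts and their labels are placeholders; the BERT, Mistral and GPT-4 pipelines discussed above are not reproduced.

```python
# Hedged sketch: TF-IDF Bag-of-Words features with a linear SVM classifier,
# the classical baseline end of the comparison.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

corpus = [
    "il farmaco mi ha dato forti effetti collaterali",   # toy medical post
    "oggi bella giornata al mare con gli amici",          # toy off-topic post
]
labels = [1, 0]  # 1 = medical-topic user, 0 = other (illustrative)

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(corpus, labels)
print(clf.predict(["ho una domanda sul dosaggio del farmaco"]))
```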

    Polarimetric imaging for cancer diagnosis and staging

    A medical imaging technique that relies on light polarization could become a fast and accurate optical method for detecting cancer and determining the stage of the disease.