361 research outputs found

    High-Dimensional Non-Gaussian Data Clustering using Variational Learning of Mixture Models

    Get PDF
    Clustering has been the topic of extensive research in the past. The main concern is to automatically divide a given data set into different clusters such that vectors of the same cluster are as similar as possible and vectors of different clusters are as different as possible. Finite mixture models have been widely used for clustering since they have the advantages of being able to integrate prior knowledge about the data and to address the problem of unsupervised learning in a formal way. A crucial starting point when adopting mixture models is the choice of the component densities. In this context, the well-known Gaussian distribution has been widely used. However, deploying the Gaussian mixture implicitly bases the clustering on the minimization of Euclidean distortions, which may yield poor results in several real applications where the per-component densities are not Gaussian. Recent works have shown that other models such as the Dirichlet, generalized Dirichlet and Beta-Liouville mixtures may provide better clustering results in applications containing non-Gaussian data, especially those involving proportional data (or normalized histograms), which are naturally generated by many applications. Two other challenging aspects that should also be addressed when considering mixture models are how to determine the model's complexity (i.e. the number of mixture components) and how to estimate the model's parameters. Fortunately, both problems can be tackled simultaneously within a principled, elegant learning framework, namely variational inference. The main idea of variational inference is to approximate the model posterior distribution by minimizing the Kullback-Leibler divergence between the exact (or true) posterior and an approximating distribution. Recently, variational inference has provided good generalization performance and computational tractability in many applications, including learning mixture models.

    In this thesis, we propose several approaches for high-dimensional non-Gaussian data clustering based on various mixture models such as Dirichlet, generalized Dirichlet and Beta-Liouville. These mixture models are learned using variational inference, whose main advantages are computational efficiency and guaranteed convergence. More specifically, our contributions are four-fold. Firstly, we develop a variational inference algorithm for learning the finite Dirichlet mixture model, where the model parameters and the model complexity are determined automatically and simultaneously as part of the Bayesian inference procedure. Secondly, an unsupervised feature selection scheme is integrated with the finite generalized Dirichlet mixture model for clustering high-dimensional non-Gaussian data. Thirdly, we extend the proposed finite generalized Dirichlet mixture model to the infinite case using a nonparametric Bayesian framework known as the Dirichlet process, so that the difficulty of choosing the appropriate number of clusters is sidestepped by assuming an infinite number of mixture components. Finally, we propose an online learning framework for a Dirichlet process mixture of Beta-Liouville distributions (i.e. an infinite Beta-Liouville mixture model), which is better suited to sequential or large-scale data than batch learning algorithms.

    The effectiveness of our approaches is evaluated on both synthetic data and challenging real-life applications such as image database categorization, anomaly intrusion detection, human action video categorization, image annotation, facial expression recognition, behavior recognition, and dynamic texture clustering.
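    As a concrete illustration of how variational inference can determine model complexity automatically, the following sketch fits a variational (Dirichlet-process) Gaussian mixture with scikit-learn's BayesianGaussianMixture. This is a stand-in, not the thesis's method: the Dirichlet, generalized Dirichlet and Beta-Liouville mixtures proposed here are not available in standard libraries, so the Gaussian case is used only to show the shared mechanism of surplus components being pruned as their weights collapse toward zero.

```python
# Illustrative only: scikit-learn ships variational inference for a
# *Gaussian* mixture; the thesis's non-Gaussian mixtures are not in
# standard libraries. The mechanism shown -- surplus components being
# pruned automatically during variational learning -- is the same.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic data drawn from three clusters.
X = np.vstack([
    rng.normal(loc, 0.05, size=(200, 2))
    for loc in ([0.2, 0.7], [0.5, 0.3], [0.8, 0.6])
])

# Start with deliberately too many components (10); the Dirichlet-process
# prior lets variational inference drive unneeded mixture weights to ~0.
model = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
).fit(X)

effective = np.sum(model.weights_ > 1e-2)
print(f"effective components: {effective}")  # expected: about 3
```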

    SYNERGY OF BUILDING CYBERSECURITY SYSTEMS

    Get PDF
    The development of the modern world community is closely related to advances in computing resources and cyberspace. The formation and expansion of the range of services is based on the achievements of mankind in the field of high technologies. However, the rapid growth of computing resources and the emergence of a full-scale quantum computer tighten the requirements for security systems, not only for information and communication systems but also for cyber-physical systems and technologies. The first chapter discusses the methodological foundations of building security systems for critical infrastructure facilities based on modeling the behavior of antagonistic agents in security systems. The second chapter proposes a concept of information security in social networks based on mathematical models of data protection that take into account the influence of specific parameters of the social network and the effects on the network. The nonlinear relationships among the parameters of the defense system, attacks, and social networks, as well as the influence of individual characteristics of users and the nature of the relationships between them, are taken into account. The third chapter considers practical aspects of the methodology for constructing post-quantum algorithms for the asymmetric McEliece and Niederreiter cryptosystems on algebraic codes (elliptic and modified elliptic codes), their mathematical models, and practical algorithms. Hybrid crypto-code constructions of McEliece and Niederreiter on defective codes are proposed; they can significantly reduce the energy costs of implementation while ensuring the required level of cryptographic strength of the system as a whole. A concept of security for corporate information and educational systems based on the construction of an adaptive information security system is proposed.

    ISBN 978-617-7319-31-2 (on-line), ISBN 978-617-7319-32-9 (print)

    How to Cite: Yevseiev, S., Ponomarenko, V., Laptiev, O., Milov, O., Korol, O., Milevskyi, S. et al.; Yevseiev, S., Ponomarenko, V., Laptiev, O., Milov, O. (Eds.) (2021). Synergy of building cybersecurity systems. Kharkiv: PC TECHNOLOGY CENTER, 188. doi: http://doi.org/10.15587/978-617-7319-31-2

    Contributions to outlier detection and recommendation systems

    Get PDF
    Data mining, also known as "Knowledge Discovery in Databases", is a young interdisciplinary research field. Data mining studies processes for analyzing large data sets in order to extract knowledge from them, and processes for transforming that knowledge into structures that are easy for humans to understand and use. This thesis studies two important data mining tasks: outlier detection and product recommendation. Outlier detection is the identification of data that do not conform to normal observations. Product recommendation is the prediction of a customer's level of interest in products based on past purchase data and socio-economic data. More precisely, this thesis addresses 1) outlier detection in large categorical data sets; and 2) recommendation techniques for skewed rating data. Outlier detection in large-scale categorical data is an important problem that is far from solved. Existing methods in this area suffer from low efficiency and effectiveness due to the high dimensionality of the data, the large size of the databases, the high complexity of the statistical tests, and inadequate proximity measures. This thesis proposes a formal definition of outliers in categorical data as well as two efficient and effective algorithms for outlier detection in large data sets. These algorithms require a single parameter: the number of outliers. To determine the value of this parameter, we developed a criterion based on a new concept, holo-entropy. Much previous research on recommender systems has neglected a type of rating data that is widespread in Web applications such as e-commerce (e.g. Amazon, Taobao) and content providers (e.g. YouTube). The rating data collected by these sites differ from movie and music ratings in their highly skewed distribution. This thesis proposes a framework better suited to estimating ratings and higher-order quantitative preferences for skewed rating data. This framework makes it possible to build new recommendation models based on matrix factorization or on neighborhood estimation. Experimental results on skewed data sets indicate that the models built with this framework outperform conventional models not only for rating prediction but also for Top-N product recommendation.
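    The following is a minimal sketch of the baseline the proposed framework builds upon: plain matrix factorization for rating prediction, trained with stochastic gradient descent. It is not the thesis's skewed-rating framework; the toy data, hyperparameters, and the factorize helper are all illustrative assumptions.

```python
import numpy as np

def factorize(ratings, n_factors=8, lr=0.01, reg=0.05, epochs=50, seed=0):
    """Plain SGD matrix factorization: r_ui ~ p_u . q_i.
    `ratings` is a list of (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    n_users = 1 + max(u for u, _, _ in ratings)
    n_items = 1 + max(i for _, i, _ in ratings)
    P = rng.normal(0, 0.1, (n_users, n_factors))
    Q = rng.normal(0, 0.1, (n_items, n_factors))
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            pu = P[u].copy()  # update both factors from the same state
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

# Toy skewed ratings: most observed ratings are high, as on e-commerce sites.
data = [(0, 0, 5), (0, 1, 5), (1, 1, 4), (1, 2, 5), (2, 0, 5), (2, 2, 1)]
P, Q = factorize(data)
print(round(float(P[0] @ Q[2]), 2))  # predicted rating of user 0 for item 2
```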

    Efficient Learning Machines

    Get PDF
    Computer science

    An Explainable Machine Learning Model for Early Prediction of Sepsis Using ICU Data

    Get PDF
    Early identification of individuals with sepsis is very useful in assisting clinical triage and decision-making, resulting in early intervention and improved outcomes. This study aims to develop an explainable machine learning model with clinical interpretability that predicts sepsis onset up to 6 hours in advance, and to validate its improved prediction risk power for every time interval since admission to the ICU. This retrospective observational cohort study uses PhysioNet Challenge 2019 ICU data from three distinct hospital systems, viz. A, B, and C. Data from A and B were shared publicly for training and validation, while sequestered data from all three cohorts were used for scoring; this study is limited to the publicly available training data. The training data contain 1,552,210 records of 40,336 ICU patients with up to 40 clinical variables (sourced for each hour of their ICU stay), divided into two datasets based on hospital systems A and B. Clinical feature exploration and interpretation for early prediction of sepsis are achieved using the proposed framework, viz. the explainable Machine Learning model for Early Prediction of Sepsis (xMLEPS). A total of 85 features, comprising the given 40 clinical variables augmented with 10 derived physiological features and 35 time-lag difference features, are fed to xMLEPS for the prediction of sepsis onset. A ten-fold cross-validation scheme is employed, wherein an optimal prediction risk threshold is searched for each of the 10 LightGBM models; these optimal thresholds are then used by the corresponding models to refine predictive power, in terms of utility score, for label prediction in each fold. The entire framework is tuned via Bayesian optimization and trained on the resultant set of 85 features, yielding an average normalized utility score of 0.4214 and an area under the receiver operating characteristic curve of 0.8591 on the publicly available training data. This study establishes a practical and explainable sepsis onset prediction model for ICU data using an applied ML approach, mainly gradient boosting, and highlights the clinical significance of physiological inter-relations among the given and proposed clinical signs via feature importance and SHapley Additive exPlanations (SHAP) plots for visualized interpretation.
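    The following sketch shows the per-fold pattern described above, using only public APIs: a LightGBM classifier is trained in each of ten stratified folds, and a probability threshold is searched on the validation split. The F1 score stands in for the challenge's clinical utility score, and the synthetic class-imbalanced data stands in for the PhysioNet features; both are illustrative assumptions, not the study's actual pipeline.

```python
# Per-fold LightGBM training with a validation-split threshold search.
# F1 is a stand-in for the challenge utility score; data is synthetic.
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

# Imbalanced toy data: 85 features, ~7% positives, mimicking rare sepsis labels.
X, y = make_classification(n_samples=2000, n_features=85,
                           weights=[0.93], random_state=0)

thresholds = np.linspace(0.05, 0.95, 19)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for fold, (tr, va) in enumerate(cv.split(X, y)):
    clf = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05,
                             random_state=0)
    clf.fit(X[tr], y[tr])
    proba = clf.predict_proba(X[va])[:, 1]
    # Search the probability cutoff that maximizes the fold's score.
    scores = [f1_score(y[va], (proba >= t).astype(int), zero_division=0)
              for t in thresholds]
    best = thresholds[int(np.argmax(scores))]
    print(f"fold {fold}: best threshold {best:.2f}, F1 {max(scores):.3f}")
```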

    Monte Carlo Method with Heuristic Adjustment for Irregularly Shaped Food Product Volume Measurement

    Get PDF
    Volume measurement plays an important role in the production and processing of food products. Various methods have been proposed to measure the volume of irregularly shaped food products based on 3D reconstruction. However, 3D reconstruction carries a high computational cost, and some volume measurement methods based on it have low accuracy. Another approach measures the volume of objects with the Monte Carlo method, which performs volume measurement using random points: it only requires knowing whether random points fall inside or outside the object, and does not require a 3D reconstruction. This paper proposes volume measurement of irregularly shaped food products using a computer vision system, without 3D reconstruction, based on the Monte Carlo method with heuristic adjustment. Five images of each food product were captured using five cameras and processed to produce binary images. Monte Carlo integration with heuristic adjustment was performed to measure the volume based on the information extracted from the binary images. The experimental results show that the proposed method provides high accuracy and precision compared with the water displacement method. In addition, the proposed method is more accurate and faster than the space carving method.
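    The core idea can be sketched in a few lines, assuming a synthetic object in place of the paper's five-camera setup: sample uniform random points in a bounding box, test whether each point lies inside the object (in the paper this test uses the binary images; here an analytic sphere is used so the estimate can be checked), and scale the inside fraction by the box volume. The heuristic adjustment itself is not reproduced here.

```python
# Minimal Monte Carlo volume estimation (no heuristic adjustment, no
# camera pipeline): a sphere stands in for the food product so the
# estimate can be verified against the exact value 4/3*pi.
import numpy as np

rng = np.random.default_rng(0)
radius, box_side = 1.0, 2.0          # unit sphere inside a 2x2x2 box
n = 1_000_000

points = rng.uniform(-1.0, 1.0, size=(n, 3))
inside = np.sum(points**2, axis=1) <= radius**2  # membership test only:
                                                 # no 3D reconstruction needed
volume = inside.mean() * box_side**3
print(f"estimate {volume:.4f} vs exact {4/3*np.pi:.4f}")
```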

    Intelligent Sensor Networks

    Get PDF
    In the last decade, wireless or wired sensor networks have attracted much attention. However, most designs target general sensor network issues including protocol stack (routing, MAC, etc.) and security issues. This book focuses on the close integration of sensing, networking, and smart signal processing via machine learning. Based on their world-class research, the authors present the fundamentals of intelligent sensor networks. They cover sensing and sampling, distributed signal processing, and intelligent signal learning. In addition, they present cutting-edge research results from leading experts

    Mining a Small Medical Data Set by Integrating the Decision Tree and t-test

    Get PDF
    Although several researchers have used statistical methods to prove that aspiration followed by the injection of 95% ethanol left in situ (retention) is an effective treatment for ovarian endometriomas, very few discuss the different conditions that could generate different recovery rates for the patients. Therefore, this study adopts statistical methods and decision tree techniques together to analyze the postoperative status of ovarian endometriosis patients under different conditions. Since our collected data set is small, containing only 212 records, we use all of these data as the training data. Therefore, instead of using the resultant tree to generate rules directly, we first use the value of each node as a cut point to generate all possible rules from the tree. Then, using the t-test, we verify these rules to discover useful descriptive rules. Experimental results show that our approach can find new and interesting knowledge about recurrent ovarian endometriomas under different conditions.
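    A minimal sketch of this pattern, on synthetic data rather than the clinical records: fit a decision tree, take each internal node's value as a candidate cut point, and use a t-test to check whether outcomes differ significantly across the split. The data, feature indices, and significance level are illustrative assumptions, not the study's variables.

```python
# Fit a decision tree, treat every internal-node threshold as a cut
# point, and keep the cut points a Welch t-test supports. Synthetic data.
import numpy as np
from scipy.stats import ttest_ind
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(212, 4))                     # 212 records, 4 variables
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(0, 1, 212) > 0).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
t = tree.tree_
for node in range(t.node_count):
    if t.children_left[node] == -1:               # skip leaf nodes
        continue
    feat, cut = t.feature[node], t.threshold[node]
    left, right = y[X[:, feat] <= cut], y[X[:, feat] > cut]
    stat, p = ttest_ind(left, right, equal_var=False)
    if p < 0.05:                                  # keep t-test-verified rules
        print(f"x{feat} <= {cut:.2f}: p = {p:.4f}")
```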

    Towards Artificial General Intelligence (AGI) in the Internet of Things (IoT): Opportunities and Challenges

    Full text link
    Artificial General Intelligence (AGI), possessing the capacity to comprehend, learn, and execute tasks with human cognitive abilities, engenders significant anticipation and intrigue across scientific, commercial, and societal arenas. This fascination extends particularly to the Internet of Things (IoT), a landscape characterized by the interconnection of countless devices, sensors, and systems, collectively gathering and sharing data to enable intelligent decision-making and automation. This research explores the opportunities and challenges of achieving AGI in the context of the IoT. Specifically, it starts by outlining the fundamental principles of IoT and the critical role of Artificial Intelligence (AI) in IoT systems. Subsequently, it delves into AGI fundamentals, culminating in the formulation of a conceptual framework for AGI's seamless integration within IoT. The application spectrum for AGI-infused IoT is broad, encompassing domains ranging from smart grids, residential environments, manufacturing, and transportation to environmental monitoring, agriculture, healthcare, and education. However, adapting AGI to resource-constrained IoT settings necessitates dedicated research efforts. Furthermore, the paper addresses constraints imposed by limited computing resources, intricacies associated with large-scale IoT communication, as well as the critical concerns pertaining to security and privacy.