974 research outputs found

    Learning Collective Behavior in Multi-relational Networks

    With the rapid expansion of the Internet and the WWW, the problem of analyzing social media data has received an increasing amount of attention in the past decade. The boom in social media platforms offers many possibilities to study human collective behavior and interactions on an unprecedented scale. In the past, much work has been done on the problem of learning from networked data with homogeneous topologies, where instances are explicitly or implicitly interconnected by a single type of relationship. In contrast to traditional content-only classification methods, relational learning improves classification performance by leveraging the correlation of the labels between linked instances. However, networked data extracted from social media, web pages, and bibliographic databases can contain entities of multiple classes that are linked for various causal reasons, so treating all links in a homogeneous way can limit the performance of relational classifiers. Learning the collective behavior and interactions in heterogeneous networks is much more complex. The contributions of this dissertation include 1) two classification frameworks for identifying human collective behavior in multi-relational social networks; 2) unsupervised and supervised learning models for relationship prediction in multi-relational collaborative networks. Our methods improve the performance of homogeneous predictive models by differentiating heterogeneous relations and capturing the prominent interaction patterns underlying the network structure. The work has been evaluated on various real-world social networks. We believe that this study will be useful for analyzing human collective behavior and interactions, specifically when the heterogeneous relationships in the network arise from various causal reasons.

    A framework for AI-driven neurorehabilitation training: the profiling challenge

    Cognitive decline is a common sign that a person is ageing. However, abnormal cases can lead to dementia, affecting daily living activities and independent functioning. Dementia is a leading cause of disability and death, and its prevention is a global health priority. One way to address cognitive decline is to undergo cognitive rehabilitation. Cognitive rehabilitation aims to restore or mitigate the symptoms of a cognitive disability, increasing the quality of life for the patient. However, cognitive rehabilitation is tied to clinical environments and logistics, leading to a suboptimal set of expensive tools that are hard to adapt to every patient's needs. The BRaNT project aims to create a tool that mitigates this problem. NeuroAIreh@b is a rehabilitation tool developed within a framework that combines neuropsychological assessments, neurorehabilitation procedures, artificial intelligence and game design, composing a tool that is easy to set up in a clinical environment and easy to adapt to every patient's needs. Among all the challenges within NeuroAIreh@b, one focuses on representing a cognitive profile through the aggregation of multiple neuropsychological assessments. Testing this possibility requires patient data that are currently unavailable. In the first part of this master's project, we study the possibility of aggregating neuropsychological assessments for the case of Alzheimer's disease using the Alzheimer's Disease Neuroimaging Initiative database. This database contains a vast collection of images and neuropsychological assessments that will serve as a baseline for NeuroAIreh@b when the time comes. In the second part of this project, we set up a computational system to run all the artificial intelligence models and simulations required for the BRaNT project. The system also allocates a database and a web server to serve all the pages required by the project.

    End-to-End Entity Resolution for Big Data: A Survey

    One of the most important tasks for improving data quality and the reliability of data analytics results is Entity Resolution (ER). ER aims to identify different descriptions that refer to the same real-world entity, and remains a challenging problem. While previous works have studied specific aspects of ER (mostly in traditional settings), in this survey we provide, for the first time, an end-to-end view of modern ER workflows and of the novel aspects of entity indexing and matching methods needed to cope with more than one of the Big Data characteristics simultaneously. We present the basic concepts, processing steps and execution strategies that have been proposed by different communities, i.e., database, semantic Web and machine learning, in order to cope with the loose structuredness, extreme diversity, high speed and large scale of entity descriptions used by real-world applications. Finally, we provide a synthetic discussion of the existing approaches, and conclude with a detailed presentation of open research directions.

    LearnFCA: A Fuzzy FCA and Probability Based Approach for Learning and Classification

    Formal Concept Analysis (FCA) is a mathematical theory based on lattice and order theory used for data analysis and knowledge representation. Over the past several years, many extensions of it have been proposed and applied in several domains, including data mining, machine learning, knowledge management, semantic web, software development, chemistry, biology, medicine, data analytics and ontology engineering. This thesis reviews the state of the art of the theory of FCA and its various extensions that have been developed and well studied in the past several years. We discuss their historical roots and reproduce the original definitions and derivations with illustrative examples. Further, we provide a literature review of its applications and the various approaches adopted by researchers in the areas of data analysis and knowledge management, with emphasis on data-learning and classification problems. We propose LearnFCA, a novel approach based on FuzzyFCA and probability theory for learning and classification problems. LearnFCA uses an enhanced version of FuzzyLattice, which has been developed to store class labels and probability vectors and can be used for classifying instances with encoded and unlabelled features. We evaluate LearnFCA on encodings from three datasets (MNIST, Omniglot and cancer images) with interesting results and varying degrees of success. Adviser: Dr Jitender Deogu

    On relational learning and discovery in social networks: a survey

    The social networking scene has evolved tremendously over the years. It has grown in relational complexity and now has a vast presence on popular social media platforms on the internet. With the advance of sentimental computing and social complexity, relationships that were once thought to be simple have become multi-dimensional and widespread in the online scene. This explosion in the online social scene has attracted much research attention. The main aims of this work revolve around the knowledge discovery and data mining processes for these feature-rich relations. In this paper, we provide a survey of relational learning and discovery through popular social analysis of different structure types, which are integral to applications within the emerging field of sentimental and affective computing. It is hoped that this contribution will clarify how social networks are analyzed with the latest methods and provide directions for future improvements.

    Towards outlier detection for high-dimensional data streams using projected outlier analysis strategy

    Outlier detection is an important research problem in data mining that aims to discover useful abnormal and irregular patterns hidden in large data sets. Most existing outlier detection methods only deal with static data of relatively low dimensionality. Recently, outlier detection for high-dimensional stream data has become an emerging research problem. A key observation that motivates this research is that outliers in high-dimensional data are projected outliers, i.e., they are embedded in lower-dimensional subspaces. Detecting projected outliers from high-dimensional stream data is a very challenging task for several reasons. First, detecting projected outliers is difficult even for high-dimensional static data. The exhaustive search for the outlying subspaces where projected outliers are embedded is an NP problem. Second, the algorithms for handling data streams are constrained to take only one pass over the streaming data under conditions of space limitation and time criticality. The existing methods for outlier detection are found to be ineffective for detecting projected outliers in high-dimensional data streams. In this thesis, we present a new technique, called the Stream Project Outlier deTector (SPOT), which attempts to detect projected outliers in high-dimensional data streams. SPOT employs an innovative window-based time model to capture dynamic statistics from stream data, and a novel data structure containing a set of top sparse subspaces to detect projected outliers effectively. SPOT also employs a multi-objective genetic algorithm as an effective search method for finding the outlying subspaces where most projected outliers are embedded. The experimental results demonstrate that SPOT is efficient and effective in detecting projected outliers in high-dimensional data streams. The main contribution of this thesis is that it provides a backbone for tackling the challenging problem of outlier detection for high-dimensional data streams. SPOT can facilitate the discovery of useful abnormal patterns and can potentially be applied to a variety of high-demand applications, such as sensor network data monitoring and online transaction protection.

    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its wide public popularity establish a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. In addition to covering each section in depth, the two books present useful hints and strategies for solving the problems in the chapters that follow. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining.

    PANDA: prioritization of autism-genes using network-based deep-learning approach

    Autism is a neuropsychiatric disorder characterized by impairments in reciprocal social interaction and communication, and by the presence of restricted and repetitive behaviours. Autism is predominantly heritable, but the underlying genetic associations are still largely unknown. Understanding the genetic background of complex diseases such as autism plays an essential role in the promising field of precision medicine. The evaluation of candidate genes, however, requires time-consuming and expensive experiments given the large number of possibilities. Thus, computational methods have seen increasing application in predicting gene-disease associations. In this thesis, we propose a bioinformatics framework, Prioritization of Autism-genes using Network-based Deep-learning Approach (PANDA). Our approach aims to identify autism genes across the human genome based on patterns of gene-gene interactions and the topological similarity of genes in the interaction network. PANDA trains a graph deep learning classifier using the human molecular interaction network (HMIN) as input, and predicts and ranks the probability of autism association for every node (gene) in the network. PANDA achieved a classification accuracy of 89%, outperforming three other commonly used machine learning algorithms. Moreover, the gene prioritization ranking produced by PANDA was evaluated and validated using a large-scale independent exome-sequencing study. The top decile (top 10%) of PANDA-ranked genes was found to be significantly enriched for autism association.