Learning Collective Behavior in Multi-relational Networks
With the rapid expansion of the Internet and the WWW, the problem of analyzing social media data has received an increasing amount of attention over the past decade. The boom in social media platforms offers many possibilities to study human collective behavior and interactions on an unprecedented scale. In the past, much work has been done on the problem of learning from networked data with homogeneous topologies, where instances are explicitly or implicitly inter-connected by a single type of relationship. In contrast to traditional content-only classification methods, relational learning succeeds in improving classification performance by leveraging the correlation of labels between linked instances. However, networked data extracted from social media, web pages, and bibliographic databases can contain entities of multiple classes, linked for various causal reasons; treating all links in a homogeneous way can therefore limit the performance of relational classifiers. Learning collective behavior and interactions in heterogeneous networks becomes much more complex. The contributions of this dissertation include 1) two classification frameworks for identifying human collective behavior in multi-relational social networks; and 2) unsupervised and supervised learning models for relationship prediction in multi-relational collaborative networks. Our methods improve on the performance of homogeneous predictive models by differentiating heterogeneous relations and capturing the prominent interaction patterns underlying the network structure. The work has been evaluated on various real-world social networks. We believe that this study will be useful for analyzing human collective behavior and interactions, particularly when the heterogeneous relationships in a network arise from various causal reasons.
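The core idea of differentiating relation types can be illustrated with a relation-weighted neighbour vote. This is a generic sketch of the intuition, not the dissertation's actual models; the function name, toy network and weights below are all hypothetical:

```python
from collections import defaultdict

def relational_vote(node, edges, labels, relation_weights):
    """Predict a node's label from its neighbours' known labels,
    weighting each vote by the importance of the relation type
    (a homogeneous classifier would weight all relations equally)."""
    scores = defaultdict(float)
    for u, v, rel in edges:
        if node in (u, v):
            neighbour = v if node == u else u
            if neighbour in labels:
                scores[labels[neighbour]] += relation_weights.get(rel, 1.0)
    return max(scores, key=scores.get) if scores else None

# Toy multi-relational network: "follows" ties weighted above "comments"
edges = [("a", "x", "follows"), ("a", "y", "comments"), ("a", "z", "comments")]
labels = {"x": "active", "y": "inactive", "z": "inactive"}
weights = {"follows": 3.0, "comments": 1.0}
print(relational_vote("a", edges, labels, weights))  # -> active
```

With equal weights the two "inactive" neighbours would outvote the single "active" one; the per-relation weights flip the prediction, which is the effect heterogeneous relational models aim to learn.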
A framework for AI-driven neurorehabilitation training: the profiling challenge
Cognitive decline is a common sign that a person is ageing. However, abnormal
cases can lead to dementia, affecting activities of daily living and independent
functioning. Dementia is a leading cause of disability and death, making its
prevention a global health priority.
One way to address cognitive decline is to undergo cognitive rehabilitation.
Cognitive rehabilitation aims to restore or mitigate the symptoms of a cognitive
disability, increasing the patient's quality of life. However, cognitive
rehabilitation remains tied to clinical environments and their logistics, resulting
in a suboptimal set of expensive tools that struggle to accommodate every
patient's needs.
The BRaNT project aims to create a tool that mitigates this problem.
NeuroAIreh@b is a rehabilitation tool developed within a framework
that combines neuropsychological assessment, neurorehabilitation procedures,
artificial intelligence and game design, composing a tool that is easy to set up
in a clinical environment and flexible enough to adapt to every patient's needs.
Among the challenges within NeuroAIreh@b, one is representing
a cognitive profile through the aggregation of multiple neuropsychological
assessments. Testing this possibility requires patient data that is currently
unavailable.
In the first part of this master's project, we study the feasibility of aggregating
neuropsychological assessments for the case of Alzheimer's disease using the
Alzheimer's Disease Neuroimaging Initiative database. This database contains a
vast collection of images and neuropsychological assessments that will serve as
a baseline for NeuroAIreh@b when the time comes.
In the second part of this project, we set up a computational system to run all
the artificial intelligence models and simulations required for the BRaNT project.
The system hosts a database and a web server that serves all the pages required
for the project.
End-to-End Entity Resolution for Big Data: A Survey
One of the most important tasks for improving data quality and the
reliability of data analytics results is Entity Resolution (ER). ER aims to
identify different descriptions that refer to the same real-world entity, and
remains a challenging problem. While previous works have studied specific
aspects of ER (and mostly in traditional settings), in this survey, we provide
for the first time an end-to-end view of modern ER workflows, and of the novel
aspects of entity indexing and matching methods in order to cope with more than
one of the Big Data characteristics simultaneously. We present the basic
concepts, processing steps and execution strategies that have been proposed by
different communities, i.e., database, semantic Web and machine learning, in
order to cope with the loose structuredness, extreme diversity, high speed and
large scale of entity descriptions used by real-world applications. Finally, we
provide a synthetic discussion of the existing approaches, and conclude with a
detailed presentation of open research directions.
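The two central workflow steps the survey covers, entity indexing (blocking) and matching, can be made concrete with a minimal sketch. This is a hypothetical illustration, not a method from the survey: the blocking key (name prefix) and the similarity test (equal zip code) are placeholder assumptions.

```python
import itertools

def blocking_key(record):
    # Hypothetical blocking key: first three letters of the name, lowercased
    return record["name"][:3].lower()

def similar(r1, r2):
    # Placeholder matcher; real ER uses string similarity, ML models, etc.
    return r1["zip"] == r2["zip"]

def entity_resolution(records):
    """Blocking-then-matching sketch: only records sharing a blocking key
    are compared, avoiding the quadratic cost of all-pairs comparison."""
    blocks = {}
    for r in records:
        blocks.setdefault(blocking_key(r), []).append(r)
    matches = []
    for block in blocks.values():
        for r1, r2 in itertools.combinations(block, 2):
            if similar(r1, r2):
                matches.append((r1["id"], r2["id"]))
    return matches

records = [
    {"id": 1, "name": "Jonathan Smith", "zip": "02139"},
    {"id": 2, "name": "Jon Smith", "zip": "02139"},
    {"id": 3, "name": "Mary Jones", "zip": "10001"},
]
print(entity_resolution(records))  # -> [(1, 2)]
```

Records 1 and 2 land in the same block and match; record 3 is never compared against them, which is exactly how blocking trades a small risk of missed matches for scalability on Big Data volumes.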
LearnFCA: A Fuzzy FCA and Probability Based Approach for Learning and Classification
Formal concept analysis (FCA) is a mathematical theory based on lattice and order theory, used for data analysis and knowledge representation. Over the past several years, many extensions of FCA have been proposed and applied in several domains, including data mining, machine learning, knowledge management, the semantic web, software development, chemistry, biology, medicine, data analytics and ontology engineering.
This thesis reviews the state of the art of Formal Concept Analysis (FCA) and its various extensions that have been developed and well studied over the past several years. We discuss their historical roots and reproduce the original definitions and derivations with illustrative examples. Further, we provide a literature review of its applications and the various approaches adopted by researchers in the areas of data analysis and knowledge management, with emphasis on data-learning and classification problems.
We propose LearnFCA, a novel approach based on FuzzyFCA and probability theory for learning and classification problems. LearnFCA uses an enhanced version of FuzzyLattice, developed to store class labels and probability vectors, which can be used to classify instances with encoded and unlabelled features. We evaluate LearnFCA on encodings from three datasets - MNIST, Omniglot and cancer images - with interesting results and varying degrees of success.
Adviser: Dr Jitender Deogu
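The derivation operators at the heart of classical (crisp) FCA are easy to sketch. The toy context and all names below are illustrative, not taken from the thesis; the snippet enumerates the formal concepts of a small context by closing every attribute subset:

```python
from itertools import combinations

# Toy formal context: objects mapped to the attributes they possess
incidence = {
    "duck":  {"flies", "swims"},
    "eagle": {"flies", "hunts"},
    "shark": {"swims", "hunts"},
}
all_attrs = set().union(*incidence.values())

def extent(attrs):
    """Objects possessing every attribute in attrs (derivation operator ')."""
    return {o for o, a in incidence.items() if attrs <= a}

def intent(objs):
    """Attributes shared by every object in objs (derivation operator ')."""
    return {a for a in all_attrs if all(a in incidence[o] for o in objs)}

# A formal concept is a pair (A, B) with intent(A) == B and extent(B) == A.
concepts = set()
for r in range(len(all_attrs) + 1):
    for attrs in combinations(sorted(all_attrs), r):
        B = frozenset(intent(extent(set(attrs))))  # closure of the attribute set
        A = frozenset(extent(B))
        concepts.add((A, B))

print(len(concepts))  # -> 8
```

The eight concepts form the concept lattice of this context; fuzzy extensions such as the FuzzyLattice used by LearnFCA replace the crisp incidence relation with graded membership.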
On relational learning and discovery in social networks: a survey
The social networking scene has evolved tremendously over the years. It has grown in relational complexity, extending a vast presence onto popular social media platforms on the internet. With the advance of sentimental computing and social complexity, relationships once thought to be simple have become multi-dimensional and widespread in the online scene. This explosion in the online social scene has attracted much research attention. The main aims of this work revolve around the knowledge discovery and data mining processes applied to these feature-rich relations. In this paper, we provide a survey of relational learning and discovery through popular social analysis of different structure types, which are integral to applications within the emerging field of sentimental and affective computing. It is hoped that this contribution will add to the clarity of how social networks are analyzed with the latest groundbreaking methods and provide certain directions for future improvements.
Towards outlier detection for high-dimensional data streams using projected outlier analysis strategy
Outlier detection is an important research problem in data mining that aims to discover useful abnormal and irregular patterns hidden in large data sets. Most existing outlier detection methods deal only with static data of relatively low dimensionality.
Recently, outlier detection for high-dimensional stream data has emerged as a new research problem. A key observation motivating this research is that outliers in high-dimensional data are projected outliers, i.e., they are embedded in lower-dimensional subspaces. Detecting projected outliers from high-dimensional stream data is a very challenging task for several reasons. First, detecting projected outliers is difficult even for high-dimensional static data: the exhaustive search for the outlying subspaces in which projected outliers are embedded is an NP-hard problem. Second, algorithms for handling data streams are constrained to a single pass over the streaming data, under conditions of limited space and time criticality. Existing outlier detection methods prove ineffective for detecting projected outliers in high-dimensional data streams.
In this thesis, we present a new technique, called the Stream Projected Outlier deTector (SPOT), which detects projected outliers in high-dimensional data streams. SPOT employs an innovative window-based time model to capture dynamic statistics from stream data, and a novel data structure containing a set of top sparse subspaces to detect projected outliers effectively. SPOT also employs a multi-objective genetic algorithm as an effective search method for finding the outlying subspaces in which most projected outliers are embedded. Experimental results demonstrate that SPOT is efficient and effective in detecting projected outliers in high-dimensional data streams. The main contribution of this thesis is that it provides a backbone for tackling the challenging problem of outlier detection in high-dimensional data streams. SPOT can facilitate the discovery of useful abnormal patterns and can potentially be applied to a variety of high-demand applications, such as sensor network data monitoring and online transaction protection.
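As a rough illustration of the windowed, subspace-projected detection idea (not SPOT's actual algorithm, which maintains base-cell summaries and runs a multi-objective genetic subspace search), consider a simplified z-score check over a sliding window. All names, the monitored subspace list and the threshold are hypothetical:

```python
from collections import deque

def stream_projected_outliers(stream, subspaces, window=100, threshold=3.0):
    """Single-pass sketch: keep a bounded sliding window of recent points
    and flag a new point if, in any monitored low-dimensional subspace,
    one of its coordinates is far from the window mean (z-score style)."""
    buf = deque(maxlen=window)  # bounded memory, as streams require
    for point in stream:
        flagged = []
        for dims in subspaces:
            if len(buf) >= 10:  # need some history before testing
                for d in dims:
                    vals = [p[d] for p in buf]
                    mean = sum(vals) / len(vals)
                    var = sum((v - mean) ** 2 for v in vals) / len(vals)
                    std = var ** 0.5 or 1.0  # avoid division by zero
                    if abs(point[d] - mean) / std > threshold:
                        flagged.append(dims)
                        break
        if flagged:
            yield point, flagged
        buf.append(point)

# Fifty ordinary points, then one extreme value in dimension 1
stream = [(0.0, 1.0, 0.0)] * 50 + [(0.0, 50.0, 0.0)]
hits = list(stream_projected_outliers(stream, subspaces=[(1, 2)], window=30))
print(hits)  # only the extreme point is flagged, with its subspace
```

The point looks normal in the full space's other coordinates; it is only extreme when projected onto the monitored subspace, which is the phenomenon SPOT's top-sparse-subspace structure is designed to expose.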
New Fundamental Technologies in Data Mining
The progress of data mining technology and its broad public popularity establish a need for a comprehensive text on the subject. The book series entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. Beyond helping readers understand each section deeply, the two books offer useful hints and strategies for solving the problems discussed in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining.
PANDA: prioritization of autism-genes using network-based deep-learning approach
Autism is a neuropsychiatric disorder characterized by impairments in reciprocal
social interaction and communication, and the presence of restricted and repetitive
behaviours. Autism is predominantly heritable, but the underlying genetic associations
are still largely unknown. Understanding the genetic background of complex
diseases, such as autism, plays an essential role in the promise of precision medicine.
The evaluation of candidate genes, however, requires time-consuming and expensive
experiments given the large number of possibilities. Thus, computational methods
have seen increasing application in predicting gene-disease associations. In this thesis,
we propose a bioinformatics framework, Prioritization of Autism-genes using
Network-based Deep-learning Approach (PANDA). Our approach aims to identify
autism-genes across the human genome based on patterns of gene-gene interactions
and topological similarity of genes in the interaction network. PANDA trains a graph
deep learning classifier using the input of the human molecular interaction network
(HMIN) and predicts and ranks the probability of autism association of every node
(gene) in the network. PANDA was able to achieve a high classification accuracy of
89%, outperforming three other commonly used machine learning algorithms. Moreover,
the gene prioritization ranking list produced by PANDA was evaluated and
validated using a large-scale independent exome-sequencing study. The top decile
(top 10%) of PANDA-ranked genes was found to be significantly enriched for autism
association.
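The guilt-by-association intuition behind network-based gene prioritization can be sketched with a simple score-propagation loop. This is not PANDA's graph deep-learning classifier; the toy network, gene names and parameters below are hypothetical stand-ins used only to illustrate that a gene's score depends on its network neighbourhood:

```python
def propagate_scores(adj, seed_genes, iters=20, alpha=0.5):
    """Iteratively spread association scores from known disease genes
    over an interaction network: each non-seed gene's score is a damped
    average of its neighbours' scores; seeds stay clamped at 1.0.

    adj: dict gene -> set of interacting genes (toy stand-in for HMIN)
    """
    scores = {g: (1.0 if g in seed_genes else 0.0) for g in adj}
    for _ in range(iters):
        new = {}
        for g, nbrs in adj.items():
            nbr_avg = sum(scores[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
            new[g] = 1.0 if g in seed_genes else alpha * nbr_avg
        scores = new
    return scores

adj = {
    "SEED_A": {"SEED_B", "GENE_X"},
    "SEED_B": {"SEED_A", "GENE_X"},
    "GENE_X": {"SEED_A", "SEED_B"},  # candidate tied to both seeds
    "GENE_Y": {"GENE_Z"},            # isolated from the seeds
    "GENE_Z": {"GENE_Y"},
}
ranked = propagate_scores(adj, seed_genes={"SEED_A", "SEED_B"})
print(ranked["GENE_X"] > ranked["GENE_Y"])  # -> True
```

GENE_X, wired to both seed genes, ends up ranked above the disconnected GENE_Y; PANDA's graph deep-learning classifier learns a far richer version of this topological signal from the human molecular interaction network.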