144 research outputs found

    Predicting Anchor Links between Heterogeneous Social Networks

    People usually join multiple social networks to enjoy new services or to fulfill their needs. Many new social networks try to attract users of existing networks to increase their user base. Once a user (called the source user) of a social network (called the source network) joins a new social network (called the target network), a new inter-network link (called an anchor link) is formed between the source and target networks. In this paper, we concentrate on predicting the formation of such anchor links between heterogeneous social networks. Unlike conventional link prediction problems, in which the formation of a link between two existing users within a single network is predicted, in anchor link prediction the target user is missing and will be added to the target network once the anchor link is created. To solve this problem, we use meta-paths as a powerful tool for utilizing heterogeneous information in both the source and target networks. To this end, we propose an effective general meta-path-based approach called Connector and Recursive Meta-Paths (CRMP). By using these two categories of meta-paths, we model different aspects of the social factors that may lead a source user to join the target network, resulting in the formation of a new anchor link. Extensive experiments on real-world heterogeneous social networks demonstrate the effectiveness of the proposed method compared with recent methods. Comment: To be published in "Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)".

    Three Essays on Friend Recommendation Systems for Online Social Networks

    Social networking sites (SNSs) first appeared in the mid-90s. In recent years, however, Web 2.0 technologies have made modern SNSs increasingly popular and easier to use, and social networking has expanded explosively across the web, bringing a massive number of new users. Two of the most popular SNSs, Facebook and Twitter, have reached one billion users and exceeded half a billion users, respectively. Such an influx of new users can cause the cold start problem: users sign up on an SNS and discover that they do not have any friends. Normally, SNSs solve this problem by recommending potential friends. The current major methods for friend recommendation are profile matching and “friends-of-friends.” The profile matching method compares two users’ profiles. This is relatively inflexible because it ignores the changing nature of users, and it also requires complete profiles. The friends-of-friends method can only find people who are likely to already know each other and neglects many users who share the same interests. To the best of my knowledge, existing research has not proposed guidelines for building a better recommendation system based on context information (location information) and user-generated content (UGC). This dissertation consists of three essays. The first essay focuses on location information and develops a framework for using location to recommend friends, a framework that is not limited to recommending known people but also adds stranger recommendations. The second essay employs UGC by developing a text analytic framework that discovers users’ interests and personalities and uses this information to recommend friends. The third essay discusses friend recommendations in a certain type of online community, health and fitness social networking sites, where physical activities and health status become more important factors.
Essay 1: Location-sensitive Friend Recommendations in Online Social Networks. The use of GPS-embedded smart devices and wearable devices, such as smart phones, tablets, and smart watches, has increased significantly in recent years. With them, users can record their location at any time and in any place. SNSs such as Foursquare, Facebook, and Twitter have all developed their own location-based services to collect users’ location check-in data and provide location-sensitive services such as location-based promotions. None of these sites, however, has used location information to make friend recommendations. In this essay, we investigate a new model for making friend recommendations. This model includes location check-in data as predictors and uses users’ check-in histories (users’ life patterns) to make friend recommendations. The results of our experiment show that this novel model provides better performance in making friend recommendations.
Essay 2: Novel Friend Recommendations Based on User-generated Contents. More and more users have joined and contributed to SNSs. Users share stories of their daily life (such as having delicious food, enjoying shopping, traveling, or hanging out) and leave comments. This huge amount of UGC could provide rich data for building an accurate, adaptable, effective, and extensible user model that reflects users’ interests, their sentiments about different types of locations, and their personalities. In the computer-supported social matching process, these attributes could influence friend matches. Unfortunately, none of the previous studies in this area has focused on using these extracted meta-text features for friend recommendation systems. In this study, we develop a text analytic framework and apply it to UGC on SNSs. By extracting interest and personality features from UGC, we can make text-based friend recommendations. The results of our experiment show that text features could further improve recommendation performance.
Essay 3: Friend Recommendations in Health/Fitness Social Networking Sites. Thanks to the growing number of wearable devices, online health/fitness communities are becoming more and more popular. This type of social networking site offers individuals the opportunity to monitor their diet process and motivates them to change their lifestyles. Users can improve their physical activity level and health status by receiving information, advice, and support from their friends in the social network. Many studies have confirmed that social network structure and the degree of homophily in a network affect how health behaviors and innovations spread. However, very few studies have focused on the opposite direction: the impact of users’ daily activities on building friendships in a health/fitness social networking site. In this study, we track and collect users’ daily activities from Record, a well-known online fitness social networking site. By building an analytic framework, we test and evaluate how people’s daily activities could help friend recommendations. The results of our experiment show that, with the help of this information, friend recommendation systems become more accurate and more precise.

    Causal Discovery from Temporal Data: An Overview and New Perspectives

    Temporal data, representing chronological observations of complex systems, has always been a typical data structure that is widely generated in many domains, such as industry, medicine, and finance. Analyzing this type of data is extremely valuable for various applications. Thus, different temporal data analysis tasks, e.g., classification, clustering, and prediction, have been proposed in the past decades. Among them, causal discovery, learning the causal relations from temporal data, is considered an interesting yet critical task and has attracted much research attention. Existing causal discovery works can be divided into two highly correlated categories according to whether the temporal data is calibrated, i.e., multivariate time series causal discovery and event sequence causal discovery. However, most previous surveys focus only on time series causal discovery and ignore the second category. In this paper, we specify the correlation between the two categories and provide a systematic overview of existing solutions. Furthermore, we provide public datasets, evaluation metrics, and new perspectives for temporal data causal discovery. Comment: 52 pages, 6 figures

    A data-driven approach to modelling structures

    This thesis is focussed on machine-learning approaches to defining accurate models for structural dynamics. The work is motivated by the concept of the 'digital twin' and is an attempt to build tools that could be included in a modelling campaign for structures or within the context of a digital twin used for structural health monitoring (or more broadly, asset management). In recent years, machine learning has provided solutions to many modelling problems, offering solutions that do not require exact knowledge of the physics of the phenomena that are modelled. For structural dynamics this approach can be quite useful, since accurate mathematical representations of the physics of several structures are often not available. Moreover, structural health monitoring (SHM) of structures relies on data, making machine learning a straightforward way to deal with such problems. The thesis attempts to exploit powerful machine learning algorithms to perform inference for structures in situations in which traditional methodologies might fail. The attempts concern several fields of structural dynamics, such as population-based structural health monitoring, modelling under uncertainty and with a combination of known and unknown environmental conditions, performing modal decomposition for structures with nonlinear elements, and defining the remaining useful life of a structure within a population of similar structures. The methodologies presented yield very promising results and reinforce the idea that machine learning, in some cases combined with physics, can be used as a tool to define accurate models of structures. As described in the first chapters of the thesis, an efficient modelling strategy for structures is to use various different models in order to model different parts, substructures or functionalities of a structure. Therefore, an organising technique for all the available data and models that are used is required. 
For this reason, an ontological approach is proposed herein to include all the aforementioned elements and to facilitate knowledge sharing. After defining an organising technique for such a project, some data-driven schemes using novel machine-learning algorithms are presented. Initially, a method to define nonlinear normal modes of oscillation of structures is presented. The method is based on a variation of a generative adversarial network (GAN), and proves to provide quite efficient modal decomposition under specific assumptions. The generative adversarial network algorithm is further explored and an algorithm is developed to define generative mirror models of structures. The algorithm is developed to perform in an environment where both known/measured and unknown variables affect a structure. The algorithm, being a generative algorithm, provides a probability distribution of potential outcomes rather than single-point predictions, allowing probabilistic assessment and planning about a structure to be undertaken. Moreover, population-based structural health monitoring (PBSHM) is addressed using machine-learning algorithms. Performing inference in heterogeneous populations can be complicated because of the large differences between structures within such populations. In the current thesis, a graph neural network (GNN) approach is combined with the transformation of structures into graphs to perform inference in such a population. The novel GNN algorithm proves able to learn efficiently the interaction physics between structural members and their environment. Finally, a generative model is used to deal with the problem of estimating the remaining useful life of structures within a population. This algorithm is also a variation of the GAN and is built to generate time series. 
Using this method, a probability density is defined over the remaining lifetime of a structure, exploiting information available from other structures of the population for which data are available and which have reached the end of their lifetime. The new contribution of this research is the use of current state-of-the-art machine learning models for the purposes of structural dynamics. GANs are used for purposes other than their original purpose (artificial data generation), i.e. to perform nonlinear modal analysis and to define generative digital twins of structures. Such models are also used with a view to defining a generative time-series model, which is exploited to estimate the remaining useful lifetime of structures within a population. A second novel type of model exploited in the current thesis for the purposes of structural dynamics is the graph neural network, which is used to infer the normal-condition characteristics of structures within a population.

    Exploiting Latent Features of Text and Graphs

    As the size and scope of online data continue to grow, new machine learning techniques become necessary to best capitalize on the wealth of available information. However, the models that help convert data into knowledge require nontrivial processes to make sense of large collections of text and massive online graphs. In both scenarios, modern machine learning pipelines produce embeddings (semantically rich vectors of latent features) to convert human constructs for machine understanding. In this dissertation we focus on information available within biomedical science, including human-written abstracts of scientific papers as well as machine-generated graphs of biomedical entity relationships. We present the Moliere system and our method for identifying new discoveries through the use of natural language processing and graph mining algorithms. We propose heuristically-based ranking criteria to augment Moliere, and leverage this ranking to identify a new gene-treatment target for HIV-associated Neurodegenerative Disorders. We additionally focus on the latent features of graphs and propose a new bipartite graph embedding technique. Using our graph embedding, we advance the state of the art in hypergraph partitioning quality. Having newfound intuition of graph embeddings, we present Agatha, a deep-learning approach to hypothesis generation. This system learns a data-driven ranking criterion derived from the embeddings of our large proposed biomedical semantic graph. To produce human-readable results, we additionally propose CBAG, a technique for conditional biomedical abstract generation.

    Model based forecasting for demand response strategies

    The incremental deployment of decentralized renewable energy sources in the distribution grid is triggering a paradigm change for the power sector. This shift from a centralized structure with big power plants to a decentralized scenario of distributed energy resources, such as solar and wind, calls for a more active management of the distribution grid. Conventional distribution grids were passive systems, in which power flowed unidirectionally from upstream to downstream. Nowadays, and increasingly in the future, the penetration of distributed generation (DG), with its stochastic nature and lack of controllability, represents a major challenge for the stability of the network, especially at the distribution level. In particular, the power flow reversals produced by DG cause voltage excursions, which must be compensated. This poses an obstacle to the energy transition towards a more sustainable energy mix, which can however be mitigated by using a more active approach towards the control of the distribution networks. Demand side management (DSM) offers a possible solution to the problem, allowing active control of the balance between generation, consumption and storage, close to the point of generation. Active energy management implies not only the capability to react promptly in case of disturbances, but also the ability to anticipate future events and take control actions accordingly. This is usually achieved through model predictive control (MPC), which requires a prediction of the future disturbances acting on the system. This thesis treats challenges of distributed DSM, with a particular focus on the case of a high penetration of PV power plants. The first subject of the thesis is the evaluation of the performance of models for forecasting and control, with low computational requirements, of distributed electrical batteries. The proposed methods are compared by means of closed-loop deterministic and stochastic MPC performance. 
The second subject of the thesis is the development of model-based forecasting for PV power plants, and methods to estimate these models without the use of dedicated sensors. The third subject of the thesis concerns strategies for increasing forecasting accuracy when dealing with multiple signals linked by hierarchical relations. Hierarchical forecasting methods are introduced and a distributed algorithm for reconciling base forecasters is presented. At the same time, a new methodology for generating aggregate-consistent probabilistic forecasts is proposed. This method can be applied to distributed stochastic DSM in the presence of a high penetration of rooftop-installed PV systems. In this case, the forecast errors become mutually dependent, raising difficulties in the control problem due to the nontrivial summation of dependent random variables. The benefits of treating forecasting errors as dependent, rather than as independent and uncorrelated, are investigated. The last part of the thesis concerns models for distributed energy markets, relying on hierarchical aggregators. To be effective, DSM requires a considerable amount of flexible load and storage to be controllable. This generates the need to be able to pool and coordinate several units in order to reach a critical mass. In a real-case scenario, flexible units will have different owners, who will have different and possibly conflicting interests. In order to recruit as much flexibility as possible, it is therefore importan