
    Uncovering New Links Through Interaction Duration

    Link Prediction is the problem of inferring new relationships among nodes in a network that are likely to form in the near future. Classical approaches mainly consider neighborhood structure similarity when linking nodes. However, one may also want to consider whether the two nodes being linked will benefit from the link by interacting actively over time. For instance, it is better to link two nodes that we know will interact in the social network in the future than to suggest a third node that may never interact with the first. Thus, the longer an interaction is estimated to last (i.e., the more persistent it is), the higher the priority for connecting the two nodes. This thesis focuses on predicting how long two nodes will interact in a network by identifying potential pairs of nodes that are not connected yet show some Indirect Interaction. "Indirect Interaction" means a particular action involving both nodes, whose form depends on the type of network. For example, in social networks such as Facebook, there are users who are not friends but interact with each other's wall posts. In the Wikipedia hyperlink network, it happens when readers navigate from one page to another through the search box (in the top right corner of the source page) although the source page has no explicit link to the target page. This research explores cases involving multiple interactions between the two nodes during an observational time interval. Two supervised learning approaches are proposed for the problem. Given a set of network-based predictors, the basic approach learns a binary classifier to predict whether an observed Indirect Interaction will last in the future. The second, more fine-grained approach estimates how long the interaction will last by modeling the problem via Survival Analysis or as a Regression task.
Once the duration is estimated, this information is leveraged for the Link Prediction task. Experiments were performed on the longitudinal Facebook network and wall-interactions dataset and on the Wikipedia Clickstream dataset to test this approach to predicting the Duration of Interaction and to Link Prediction. In these experiments, the fine-grained approach performs best for Link Prediction, with an AUROC of 0.854 on Facebook and 0.77 on Wikipedia. Moreover, it outperforms a Link Prediction model that ignores the Duration of Interaction and relies only on network properties, which achieves an AUROC of 0.80 on Facebook and 0.68 on Wikipedia.
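The abstract mentions modeling interaction duration via Survival Analysis. As a minimal sketch, assuming each observed Indirect Interaction is reduced to a duration plus a flag saying whether its end was observed (interactions still active when the observation window closes are right-censored), the standard Kaplan-Meier estimator gives the probability that an interaction lasts beyond a given time. The function name and toy data are hypothetical; this is a generic survival-analysis illustration, not the thesis's model, which also uses network-based predictors.

```python
from collections import Counter
from typing import List, Tuple


def kaplan_meier(durations: List[float], observed: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, S(time)) at each distinct time where at least one interaction ended.

    durations: how long each interaction was observed to last (hypothetical input format).
    observed:  True if the interaction's end was seen, False if it was right-censored.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")
    events = Counter(t for t, e in zip(durations, observed) if e)  # ended interactions per time
    exits = Counter(durations)  # ended + censored interactions per time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d > 0:
            # Censored cases tied at t still count as at risk at t (usual convention).
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # everyone who ended or was censored at t leaves the risk set
    return curve


if __name__ == "__main__":
    # Toy data: five interactions, two of them still active (censored) at the window's end.
    curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
    print([(t, round(s, 3)) for t, s in curve])  # [(2, 0.8), (3, 0.6), (5, 0.3)]
```

A fitted curve like this only summarizes durations across all pairs; predicting a specific pair's duration from its network features would require a covariate model (for example a Cox model or a regression), as the abstract describes.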

    Predicting Online Invitation Responses with a Competing Risk Model Using Privacy-Friendly Social Event Data

    Predicting people's responses to invitations is an important issue for social event management, as the decision-making process behind members' responses to invitations is complicated. The purpose of this paper is to suggest a privacy-friendly method to predict whether and when people will respond to open invitations. We apply a competing risk model to predict member responses. The predictive model uses past social event participation data to infer a network structure among people who accept or reject invitations. The inferred networks collectively show the extent to which people are likely to accept or reject invitations. Validated using real datasets including 31,230 people and 8,885 events, the proposed method identifies not only the variables that predict attendance (such as past attendance and social network) but also those that predict faster responses. This approach is privacy-friendly, as it requires no personal information about people or social events (such as name, age, gender, or event content). This work contributes to the predictive modeling literature as the first study of a competing risk model developed for replies to a social invitation. Our findings will help event organizers predict how many people will attend events, allowing them to organize effectively.
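A competing risk model treats "accept" and "reject" as two mutually exclusive ways an invitation can be resolved, with invitations still unanswered at the end of observation treated as censored. A minimal sketch, assuming only response times and outcome labels are available (no covariates, so this is not the paper's predictive model), is the nonparametric cumulative incidence function: the probability of having responded with a given outcome by time t. The function name and label strings are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def cumulative_incidence(
    times: List[float], causes: List[Optional[str]]
) -> Dict[str, List[Tuple[float, float]]]:
    """Nonparametric cumulative incidence per cause; a cause of None marks a censored case.

    Returns, for each cause, (time, CIF(time)) at every time where some response occurred.
    """
    exits = Counter(times)  # responses + censored cases per time
    events: Dict[float, Counter] = defaultdict(Counter)  # time -> responses per cause
    for t, c in zip(times, causes):
        if c is not None:
            events[t][c] += 1
    all_causes = sorted({c for c in causes if c is not None})
    at_risk = len(times)
    no_response = 1.0  # overall probability of no response of any kind just before t
    cif = {c: 0.0 for c in all_causes}
    curves: Dict[str, List[Tuple[float, float]]] = {c: [] for c in all_causes}
    for t in sorted(exits):
        d_by_cause = events.get(t, Counter())
        d_total = sum(d_by_cause.values())
        if d_total > 0:
            for c in all_causes:
                # Cause-specific hazard at t, weighted by the chance of still being unanswered.
                cif[c] += no_response * d_by_cause[c] / at_risk
                curves[c].append((t, cif[c]))
            no_response *= 1.0 - d_total / at_risk
        at_risk -= exits[t]
    return curves


if __name__ == "__main__":
    # Hypothetical invitations: response day and outcome (None = still unanswered).
    curves = cumulative_incidence([1, 1, 2, 3, 4], ["accept", "reject", "accept", None, "reject"])
    print({c: round(v[-1][1], 3) for c, v in curves.items()})  # {'accept': 0.4, 'reject': 0.6}
```

Treating rejections as censoring and running Kaplan-Meier on acceptances alone would overstate the probability of acceptance; the cumulative incidence function avoids this by weighting each cause-specific hazard by the probability that no response of any kind has yet occurred.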

    Algorithms in E-recruitment Systems


    Discourse network analysis: policy debates as dynamic networks

    Political discourse is the verbal interaction between political actors. Political actors make normative claims about policies conditional on each other's claims. This renders discourse a dynamic network phenomenon. Accordingly, the structure and dynamics of policy debates can be analyzed with a combination of content analysis and dynamic network analysis. After annotating actors' statements in text sources, networks can be created from these structured data, such as congruence or conflict networks at the actor or concept level, affiliation networks of actors and concept stances, and longitudinal versions of these networks. The resulting network data reveal important properties of a debate, such as the structure of advocacy coalitions or discourse coalitions, polarization and consensus formation, and underlying endogenous processes like popularity, reciprocity, or social balance. The added value of discourse network analysis over survey-based policy network research is that policy processes can be analyzed from a longitudinal perspective. Inferential techniques for understanding the micro-level processes governing political discourse are being developed.
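To illustrate how an affiliation network of actors and concept stances can be projected onto an actor-level congruence or conflict network, here is a minimal sketch, assuming each annotated statement is reduced to an (actor, concept, agrees) triple. Two actors are tied with a weight equal to the number of concepts on which they share a stance (congruence) or take opposite stances (conflict). The actor and concept names are hypothetical, and this simple count is only one of several possible weighting schemes.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Statement = Tuple[str, str, bool]  # (actor, concept, agrees) -- assumed coding format


def actor_network(
    statements: Iterable[Statement], congruence: bool = True
) -> Dict[Tuple[str, str], int]:
    """Project actor-concept stances onto an undirected, weighted actor network.

    congruence=True  -> count concepts on which both actors expressed the same stance.
    congruence=False -> count concepts on which they expressed opposite stances.
    An actor who both agreed and disagreed with a concept can count toward both.
    """
    # concept -> actor -> set of stances that actor expressed on the concept
    stances: Dict[str, Dict[str, Set[bool]]] = defaultdict(lambda: defaultdict(set))
    for actor, concept, agrees in statements:
        stances[concept][actor].add(agrees)
    weights: Dict[Tuple[str, str], int] = defaultdict(int)
    for by_actor in stances.values():
        for a, b in combinations(sorted(by_actor), 2):
            same = bool(by_actor[a] & by_actor[b])
            opposite = any(x != y for x in by_actor[a] for y in by_actor[b])
            if (congruence and same) or (not congruence and opposite):
                weights[(a, b)] += 1
    return dict(weights)


if __name__ == "__main__":
    coded = [("A", "tax", True), ("B", "tax", True), ("C", "tax", False),
             ("A", "subsidy", False), ("B", "subsidy", False)]
    print(actor_network(coded))                    # {('A', 'B'): 2}
    print(actor_network(coded, congruence=False))  # {('A', 'C'): 1, ('B', 'C'): 1}
```

Building one such network per time window would give the longitudinal versions the abstract mentions.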

    Is Oprah Contagious? Identifying Demand Spillovers in Product Networks

    We study the online contagion of exogenous demand shocks generated by book reviews featured on the Oprah Winfrey TV show and published in the New York Times, through the co-purchase recommendation network on Amazon.com. These exogenous events may ripple through and affect the demand for a 'network' of related books that were not explicitly mentioned in a review but were located 'close' to reviewed books in this network. Using a difference-in-differences matched-sample approach, we identify the extent of the variation caused by the visibility of the online network and distinguish this effect from variation caused by hidden product complementarities. Our results show that the demand shock diffuses to books that are up to five links away from the reviewed book, and that this diffused shock persists for a substantial number of days, although the depth and magnitude of diffusion vary widely across books at the same network distance from the focal product. We then analyze how product characteristics, assortative mixing, and local network structure help explain this variation in the depth and persistence of the contagion. Specifically, more clustered local networks 'trap' the diffused demand shocks, making them more intense and longer-lasting but restricting the distance of their spread, while less clustered networks lead to wider contagion of lower magnitude and duration. Our results provide new evidence of the interplay between a firm's online and offline media strategies, and we contribute methods for modeling and analyzing contagion in networks.
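The difference-in-differences logic can be illustrated in its simplest two-group, two-period form: the change in demand for books near a reviewed book (treated) minus the change for matched books that are not (control) estimates the spillover, under the usual parallel-trends assumption. This sketch assumes a per-book demand measure before and after the event is already available; the study's matched-sample construction and regression specification are not reproduced, and the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def diff_in_diff(
    treated_pre: Sequence[float],
    treated_post: Sequence[float],
    control_pre: Sequence[float],
    control_post: Sequence[float],
) -> float:
    """Two-group, two-period difference-in-differences estimate of a treatment effect."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # Subtracting the control group's change removes trends shared by both groups.
    return treated_change - control_change


if __name__ == "__main__":
    # Hypothetical demand values for two treated and two matched control books.
    print(diff_in_diff([10, 12], [20, 22], [10, 14], [13, 17]))  # 7
```

In this made-up example, treated books' average demand rises from 11 to 21 while the controls rise from 12 to 15, so the estimated spillover is 10 - 3 = 7.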

    Mapping (Dis-)Information Flow about the MH17 Plane Crash

    Digital media enable fast sharing not only of information but also of disinformation. One prominent case of an event leading to the circulation of disinformation on social media is the MH17 plane crash. Studies analysing the spread of information about this event on Twitter have focused on small, manually annotated datasets, or used proxies for data annotation. In this work, we examine to what extent text classifiers can be used to label data for subsequent content analysis; in particular, we focus on predicting pro-Russian and pro-Ukrainian Twitter content related to the MH17 plane crash. Even though we find that a neural classifier improves over a hashtag-based baseline, labeling pro-Russian and pro-Ukrainian content with high precision remains a challenging problem. We provide an error analysis underlining the difficulty of the task and identify factors that might help improve classification in future work. Finally, we show how the classifier can facilitate the annotation task for human annotators.
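A hashtag-based baseline of the kind the abstract compares against can be sketched as a rule that labels a tweet by which of two hashtag lists it matches, abstaining when it matches neither or both; per-class precision is then computed over the tweets it does label. The hashtag lists below are placeholders, not the study's actual lists, and the function names are hypothetical.

```python
import re
from typing import List, Optional, Set

# Placeholder hashtag lists (hypothetical); a real baseline would use curated lists.
PRO_RUSSIAN_TAGS: Set[str] = {"#tag_r1", "#tag_r2"}
PRO_UKRAINIAN_TAGS: Set[str] = {"#tag_u1", "#tag_u2"}

HASHTAG = re.compile(r"#\w+")


def hashtag_label(text: str) -> Optional[str]:
    """Label a tweet by hashtag overlap; return None if it matches neither list or both."""
    tags = {t.lower() for t in HASHTAG.findall(text)}
    hit_r = bool(tags & PRO_RUSSIAN_TAGS)
    hit_u = bool(tags & PRO_UKRAINIAN_TAGS)
    if hit_r == hit_u:
        return None
    return "pro-Russian" if hit_r else "pro-Ukrainian"


def precision(predicted: List[Optional[str]], gold: List[str], label: str) -> float:
    """Share of tweets predicted as `label` whose gold label is also `label`."""
    hits = [g == label for p, g in zip(predicted, gold) if p == label]
    return sum(hits) / len(hits) if hits else 0.0


if __name__ == "__main__":
    tweets = ["news #tag_r1", "#TAG_U2 update", "#tag_r1 #tag_u1", "no tags here"]
    preds = [hashtag_label(t) for t in tweets]
    print(preds)  # ['pro-Russian', 'pro-Ukrainian', None, None]
```

Because the rule abstains on many tweets, its precision on the tweets it labels says nothing about coverage, which is one reason a trained classifier can be more useful for pre-labeling data for human annotators.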

    Novel methods based on regression techniques to analyze multistate models and high-dimensional omics data.

    The dissertation is based on four distinct research projects that are loosely connected by a common regression framework. Chapter 1 provides an introductory outline of the problems addressed in the projects, along with a detailed review of previous work on them and a brief discussion of our newly developed methodologies. Chapter 2 describes the first project, which is concerned with identifying hidden subject-specific sources of heterogeneity in gene expression profiling analyses and adjusting for them with a technique based on Partial Least Squares (PLS) regression, in order to ensure more accurate inference on the expression patterns of genes across two different types of samples. Chapter 3 focuses on the development of an R package based on Project 1 and on evaluating its performance against other popular software for differential gene expression analysis. Chapter 4 covers the third project, which proposes a non-parametric regression method for estimating stage occupation probabilities at different time points in right-censored multistate model data, using a version of the backfitting principle (Hastie and Tibshirani, 1992) based on Inverse Probability of Censoring Weighting (IPCW) (Datta and Satten, 2001). Chapter 5 describes the fourth project, which deals with testing the equality of the residual distributions of the right-censored waiting times of two groups of subjects, after adjusting for available covariate information, by using an Inverse Probability of Censoring Weighted (IPCW) version of the Mann-Whitney U test.
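The IPCW idea behind Chapters 4 and 5 can be shown in its simplest form: each uncensored observation is weighted by the inverse of the estimated probability of remaining uncensored up to just before its time, G(T-), with G estimated by Kaplan-Meier on the censoring indicators, and censored observations get weight zero. This sketch then plugs the weights into a Mann-Whitney-type estimate of P(X1 < X2) between two groups. It is a simplified illustration assuming censoring is independent of the waiting times; it omits the covariate adjustment and variance estimation of the dissertation's test, and all function names are hypothetical.

```python
from typing import List, Sequence


def censoring_survival_left(times: Sequence[float], observed: Sequence[bool]) -> List[float]:
    """G(T_i-) per subject: Kaplan-Meier estimate of P(censoring time >= T_i)."""
    censor_times = sorted({t for t, e in zip(times, observed) if not e})
    result = []
    for ti in times:
        g = 1.0
        for s in censor_times:
            if s >= ti:  # only censoring strictly before T_i counts (left limit)
                break
            n_at_risk = sum(1 for t in times if t >= s)
            n_censored = sum(1 for t, e in zip(times, observed) if t == s and not e)
            g *= 1.0 - n_censored / n_at_risk
        result.append(g)
    return result


def ipcw_weights(times: Sequence[float], observed: Sequence[bool]) -> List[float]:
    """Weight 1 / G(T_i-) for uncensored subjects, 0 for censored ones."""
    g = censoring_survival_left(times, observed)
    return [1.0 / gi if e else 0.0 for gi, e in zip(g, observed)]


def ipcw_prob_less(
    times1: Sequence[float], obs1: Sequence[bool],
    times2: Sequence[float], obs2: Sequence[bool],
) -> float:
    """IPCW estimate of P(X1 < X2), counting ties as 1/2, from right-censored samples."""
    w1, w2 = ipcw_weights(times1, obs1), ipcw_weights(times2, obs2)
    total = sum(w1) * sum(w2)
    if total == 0:
        raise ValueError("each group needs at least one uncensored observation")
    num = 0.0
    for t1, a in zip(times1, w1):
        for t2, b in zip(times2, w2):
            if t1 < t2:
                num += a * b
            elif t1 == t2:
                num += 0.5 * a * b
    return num / total
```

Without censoring every weight is 1, and the estimate reduces to the usual Mann-Whitney proportion U / (n1 * n2).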

    Unsupervised multiple kernel learning approaches for integrating molecular cancer patient data

    Cancer is the second leading cause of death worldwide. A characteristic of this disease is its complexity, which leads to a wide variety of genetic and molecular aberrations in the tumors. This heterogeneity necessitates personalized therapies for the patients. However, currently defined cancer subtypes used in clinical practice for treatment decision-making are based on relatively few selected markers and thus provide only a coarse classification of tumors. The increased availability of multi-omics data measured for cancer patients now offers the possibility of defining more informed cancer subtypes. Such a finer-grained characterization of cancer subtypes has the potential to substantially expand treatment options in personalized cancer therapy. In this thesis, we identify comprehensive cancer subtypes using multidimensional data. For this purpose, we apply and extend unsupervised multiple kernel learning methods. Three challenges of unsupervised multiple kernel learning are addressed: robustness, applicability, and interpretability. First, we show that regularization of the multiple kernel graph embedding framework, which enables the implementation of dimensionality reduction techniques, can increase the stability of the resulting patient subgroups. This improvement is especially beneficial for data sets with a small number of samples. Second, we adapt the objective function of kernel principal component analysis to enable the application of multiple kernel learning in combination with this widely used dimensionality reduction technique. Third, we improve the interpretability of kernel learning procedures by performing feature clustering prior to integrating the data via multiple kernel learning. On the basis of these clusters, we derive a score indicating the impact of a feature cluster on a patient cluster, thereby facilitating further analysis of the cluster-specific biological properties.
All three procedures were successfully tested on real-world cancer data. Comparing our newly derived methodologies to established methods provides evidence that our work offers novel and beneficial ways of identifying patient subgroups and gaining insights into medically relevant characteristics of cancer subtypes.
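To make kernel-based data integration concrete, here is a minimal sketch, assuming each omics data type is a NumPy matrix with patients as rows in the same order: compute one RBF kernel per data type, combine the kernels with fixed weights (an unweighted average by default), and run kernel PCA on the combined kernel to obtain a low-dimensional patient embedding that could then be clustered. Multiple kernel learning would estimate the kernel weights from the data; this fixed-weight combination is a simplification and does not reproduce the thesis's regularized multiple kernel graph-embedding framework or its adapted kernel PCA objective. Function names and the `gamma` value are assumptions.

```python
from typing import List, Optional

import numpy as np


def rbf_kernel(X: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # clip rounding noise
    return np.exp(-gamma * d2)


def combined_kernel_pca(
    views: List[np.ndarray],
    weights: Optional[np.ndarray] = None,
    gamma: float = 0.1,
    n_components: int = 2,
) -> np.ndarray:
    """Kernel PCA embedding of patients from a fixed weighted sum of per-view RBF kernels."""
    n = views[0].shape[0]
    if weights is None:
        weights = np.full(len(views), 1.0 / len(views))  # unweighted average (assumption)
    K = sum(w * rbf_kernel(V, gamma) for w, V in zip(weights, views))
    # Center the combined kernel in feature space: Kc = H K H with H = I - 11^T / n.
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[idx], 0.0, None)
    # Projection of the training patients onto component k equals sqrt(lam_k) * alpha_k.
    return eigvecs[:, idx] * np.sqrt(lam)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical omics views for the same 10 patients.
    views = [rng.normal(size=(10, 5)), rng.normal(size=(10, 3))]
    print(combined_kernel_pca(views).shape)  # (10, 2)
```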