126 research outputs found

    CoCon: A Data Set on Combined Contextualized Research Artifact Use

    In the wake of information overload in academia, methodologies and systems for search, recommendation, and prediction to aid researchers in identifying relevant research are actively studied and developed. Existing work, however, is limited in terms of granularity, focusing only on the level of papers or a single type of artifact, such as data sets. To enable more holistic analyses and systems dealing with academic publications and their content, we propose CoCon, a large scholarly data set reflecting the combined use of research artifacts, contextualized in the full text of academic publications. Our data set comprises 35 k artifacts (data sets, methods, models, and tasks) and 340 k publications. We additionally formalize a link prediction task for "combined research artifact use prediction" and provide code to support analyses of our data and the development of ML applications on it. All data and code are publicly available at https://github.com/IllDepence/contextgraph. Comment: submitted to JCDL202

    Relational Boosted Bandits

    Contextual bandit algorithms have become essential in real-world user interaction problems in recent years. However, these algorithms rely on context as an attribute-value representation, which makes them infeasible for real-world domains, such as social networks, that are inherently relational. We propose Relational Boosted Bandits (RB2), a contextual bandit algorithm for relational domains based on (relational) boosted trees. RB2 enables us to learn interpretable and explainable models due to the more descriptive nature of the relational representation. We empirically demonstrate the effectiveness and interpretability of RB2 on tasks such as link prediction, relational classification, and recommendations. Comment: 8 pages, 3 figures

    Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits

    Contextual bandits find important use cases in various real-life scenarios such as online advertising, recommendation systems, and healthcare. However, most algorithms use flat feature vectors to represent context, whereas in the real world the context contains a varying number of objects and relations among them. For example, in a music recommendation system, the user context contains what music they listen to, which artists create this music, the artists' albums, etc. Richer relational context representations also introduce a much larger context space, making exploration-exploitation harder. To improve the efficiency of exploration-exploitation, knowledge about the context can be infused to guide the exploration-exploitation strategy. Relational context representations give humans a natural way to specify knowledge, owing to their descriptive nature. We propose an adaptation of Knowledge Infused Policy Gradients to the contextual bandit setting and a novel Knowledge Infused Policy Gradients Upper Confidence Bound algorithm, and perform an experimental analysis on a simulated music recommendation dataset and various real-life datasets, showing where expert knowledge can drastically reduce the total regret and where it cannot. Comment: Accepted for publication in the research track at ECML-PKDD 202

    Hierarchical Multiple-Instance Data Classification with Costly Features

    We extend the framework of Classification with Costly Features (CwCF), which works with samples of fixed dimensions, to trees of varying depth and breadth (similar to a JSON/XML file). In this setting, the sample is a tree: sets of sets of features. Individually for each sample, the task is to sequentially select informative features that help the classification. Each feature has a real-valued cost, and the objective is to maximize accuracy while minimizing the total cost. The process is modeled as an MDP where the states represent the acquired features, and the actions select unknown features. We present a specialized neural network architecture trained through deep reinforcement learning that naturally fits the data and directly selects features in the tree. We demonstrate our method on seven datasets and compare it to two baselines. Comment: RL4RealLife @ ICML2021; code available at https://github.com/jaromiru/rcwc
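    The MDP described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `Episode`, the `trade_off` parameter, the reward values (a cost-scaled penalty per acquired feature, 0 for a correct and -1 for a wrong classification), and the simplification that every leaf of the tree can be acquired directly are all assumptions made for illustration.

```python
def leaf_paths(tree, prefix=()):
    """Yield (path, value) for every leaf of a nested dict (a JSON-like tree)."""
    for key, child in tree.items():
        path = prefix + (key,)
        if isinstance(child, dict):
            yield from leaf_paths(child, path)
        else:
            yield path, child


class Episode:
    """One sample treated as an MDP episode (hypothetical, simplified)."""

    def __init__(self, sample, costs, label, trade_off=0.1):
        self.hidden = dict(leaf_paths(sample))  # true feature values, not yet seen
        self.costs = costs                      # real-valued cost per feature path
        self.label = label
        self.trade_off = trade_off              # assumed accuracy/cost weighting
        self.acquired = {}                      # the state: features revealed so far
        self.done = False

    def unknown_features(self):
        """Actions available besides classifying: features not yet acquired."""
        return [p for p in self.hidden if p not in self.acquired]

    def acquire(self, path):
        """Reveal one feature and pay its (scaled) cost as negative reward."""
        if self.done or path in self.acquired:
            raise ValueError("feature is not available for acquisition")
        self.acquired[path] = self.hidden[path]
        return -self.trade_off * self.costs[path]

    def classify(self, predicted):
        """Terminal action: assumed reward 0 if correct, -1 otherwise."""
        self.done = True
        return 0.0 if predicted == self.label else -1.0
```

    An agent would alternate calls to `acquire` with a final `classify`, so the total return trades classification accuracy against the summed feature costs.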

    MEBN-RM: A Mapping between Multi-Entity Bayesian Network and Relational Model

    Multi-Entity Bayesian Network (MEBN) is a knowledge representation formalism combining Bayesian Networks (BN) with First-Order Logic (FOL). MEBN has sufficient expressive power for general-purpose knowledge representation and reasoning. Developing a MEBN model to support a given application is a challenge, requiring definition of entities, relationships, random variables, conditional dependence relationships, and probability distributions. When available, data can be invaluable both to improve performance and to streamline development. By far the most common format for available data is the relational database (RDB). Relational databases describe and organize data according to the Relational Model (RM). Developing a MEBN model from data stored in an RDB therefore requires mapping between the two formalisms. This paper presents MEBN-RM, a set of mapping rules between key elements of MEBN and RM. We identify links between the two languages (RM and MEBN) and define four levels of mapping from elements of RM to elements of MEBN. These definitions are implemented in the MEBN-RM algorithm, which converts a relational schema in RM to a partial MEBN model. Through this research, the software has been released as a MEBN-RM open-source software tool. The method is illustrated through two example use cases using MEBN-RM to develop MEBN models: a Critical Infrastructure Defense System and a Smart Manufacturing System

    Beyond the paywall

    In this dissertation I examine the pathways of information exploration and discovery of six scientists working in different research disciplines and affiliated with several academic institutions in the United States and in the Czech Republic. To do so, I use multi-sited ethnographic methodological strategies (i.e., strategies developed by anthropologists to compare cultures across two or more geographic locations) to examine the information-related behaviors of these scholars within the global networked academic environment (GNAE), a term which specifically refers to the complex bricolage of network infrastructures, online information resources, and tools scholars use to perform their research today (i.e., the worldwide academic e-IS, or academic infrastructure [Edwards et al. 2013]). The central research question (RQ1) of this dissertation is: according to the multi-sited ethnographic analysis of the scientists participating in this study (individuals conducting research in various disciplines at different institutions in several geographical locations), is there evidence indicating a significant share of non-institutional/informal information-related exploration and discovery occurring beyond official library-supported mechanisms in the GNAE? Part two of the central research question (RQ2) asks: what (if any) patterns are exhibited, and how do these patterns relate to information science (IS) and other social science theories? Both RQ1 and RQ2 are exploratory. I additionally ask (RQ3) what all this might mean in the applied sense, by showing examples of services piloted during the research process in response to my observations in the field.
Multi-sited ethnographic strategies have not yet been employed in IS, as of the date of publication of this thesis, to examine such questions. Results indicate informal information exploration occurring only with two scientists who use open data and tools on a distributed computing infrastructure

    Depth-based Hypergraph Complexity Traces from Directed Line Graphs

    In this paper, we aim to characterize the structure of hypergraphs in terms of a structural complexity measure. Measuring the complexity of a hypergraph in a straightforward way tends to be elusive since the hyperedges of a hypergraph may exhibit varying relational orders. We thus transform a hypergraph into a line graph which not only accurately reflects the multiple relationships exhibited by the hyperedges but is also easier to manipulate for complexity analysis. To locate the dominant substructure within a line graph, we identify a centroid vertex by computing the minimum variance of its shortest path lengths. A family of centroid expansion subgraphs of the line graph is then derived from the centroid vertex. We compute the depth-based complexity traces for the hypergraph by measuring either the directed or undirected entropies of its centroid expansion subgraphs. The resulting complexity traces provide a flexible framework that can be applied to both hypergraphs and graphs. We perform (hyper)graph classification in the principal component space of the complexity trace vectors. Experiments on (hyper)graph datasets abstracted from bioinformatics and computer vision data demonstrate the effectiveness and efficiency of the complexity traces. This work is supported by National Natural Science Foundation of China (Grant no. 61503422). This work is supported by the Open Projects Program of National Laboratory of Pattern Recognition. Francisco Escolano is supported by the project TIN2012-32839 of the Spanish Government. Edwin R. Hancock is supported by a Royal Society Wolfson Research Merit Award
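    The centroid selection step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes a connected, undirected, unweighted graph given as an adjacency dict, uses the population variance, and breaks ties by iteration order. The helper `expansion_vertex_sets` further assumes that the K-th expansion subgraph consists of all vertices within K hops of the centroid, which the abstract itself does not spell out.

```python
from collections import deque
from statistics import pvariance


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def centroid_vertex(adj):
    """Vertex whose shortest-path lengths to all other vertices vary the least."""
    best, best_var = None, float("inf")
    for u in adj:
        dist = shortest_path_lengths(adj, u)
        lengths = [d for v, d in dist.items() if v != u]
        var = pvariance(lengths) if lengths else 0.0
        if var < best_var:  # strict '<' keeps the first vertex on ties
            best, best_var = u, var
    return best


def expansion_vertex_sets(adj, centroid):
    """Assumed definition: K-th set = vertices within K hops of the centroid."""
    dist = shortest_path_lengths(adj, centroid)
    max_depth = max(dist.values())
    return [{v for v, d in dist.items() if d <= k} for k in range(1, max_depth + 1)]
```

    For a five-vertex path graph `{0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}`, `centroid_vertex` picks the middle vertex 2, and the expansion sets grow outward from it one hop at a time.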