    Towards Unbiased BFS Sampling

    Breadth First Search (BFS) is a widely used approach for sampling large unknown Internet topologies. Its main advantage over random walks and other exploration techniques is that a BFS sample is a plausible graph on its own, and therefore we can study its topological characteristics. However, it has been empirically observed that incomplete BFS is biased toward high-degree nodes, which may strongly affect the measurements. In this paper, we first analytically quantify the degree bias of BFS sampling. In particular, we calculate the node degree distribution expected to be observed by BFS as a function of the fraction f of covered nodes, in a random graph RG(p_k) with an arbitrary degree distribution p_k. We also show that, for RG(p_k), all commonly used graph traversal techniques (BFS, DFS, Forest Fire, Snowball Sampling, RDS) suffer from exactly the same bias. Next, based on our theoretical analysis, we propose a practical BFS-bias correction procedure. It takes as input a collected BFS sample together with its fraction f. Even though RG(p_k) does not capture many graph properties common in real-life graphs (such as assortativity), our RG(p_k)-based correction technique performs well on a broad range of Internet topologies and on two large BFS samples of Facebook and Orkut networks. Finally, we consider and evaluate a family of alternative correction procedures, and demonstrate that, although they are unbiased for an arbitrary topology, their large variance makes them far less effective than the RG(p_k)-based technique. Comment: BFS, RDS, graph traversal, sampling bias correction
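
    The degree bias described in this abstract can be observed on a toy graph. The sketch below is not the paper's estimator or correction procedure. It only builds a synthetic graph with a skewed degree distribution (a preferential-attachment construction chosen here as an assumption), runs an incomplete BFS that stops once a fraction f of the nodes is covered, and compares the mean degree of the sample with the mean degree of the whole graph. All function names and parameter values are illustrative.

```python
import random
from collections import deque


def preferential_attachment_graph(n, m, rng):
    """Toy undirected graph with a skewed degree distribution (assumed model).

    Starts from a clique on m + 1 nodes; every later node links to m distinct
    existing nodes chosen with probability proportional to their degree.
    """
    adj = {i: set() for i in range(n)}
    endpoints = []  # each node appears once per incident edge
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            endpoints += [u, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return adj


def bfs_sample(adj, start, f):
    """Return nodes in BFS visiting order until a fraction f of nodes is covered."""
    target = max(1, round(f * len(adj)))
    seen = {start}
    order = []
    queue = deque([start])
    while queue and len(order) < target:
        u = queue.popleft()
        order.append(u)
        for v in sorted(adj[u]):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order


def mean_degree(adj, nodes):
    return sum(len(adj[u]) for u in nodes) / len(nodes)


rng = random.Random(0)  # fixed seed so the run is repeatable
graph = preferential_attachment_graph(2000, 2, rng)
true_mean = mean_degree(graph, list(graph))

# Average the sampled mean degree over several random starting nodes (f = 0.1).
starts = [rng.randrange(len(graph)) for _ in range(20)]
sampled_mean = sum(
    mean_degree(graph, bfs_sample(graph, s, 0.1)) for s in starts
) / len(starts)

# With f = 1.0 the BFS covers the whole (connected) graph, so the bias vanishes.
full_mean = mean_degree(graph, bfs_sample(graph, 0, 1.0))

print(f"mean degree, whole graph:        {true_mean:.2f}")
print(f"mean degree, BFS sample (f=0.1): {sampled_mean:.2f}")
print(f"mean degree, BFS sample (f=1.0): {full_mean:.2f}")
```

    On this kind of graph the incomplete sample should show a noticeably higher mean degree than the full graph, while the complete traversal reproduces the true mean. This matches the abstract's point that the bias depends on the covered fraction f.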

    Exploring chemical compound space with a graph-based recommender system

    With the availability of extensive databases of inorganic materials, data-driven approaches leveraging machine learning have gained prominence in materials science research. In this study, we propose an innovative adaptation of data-driven concepts to the mapping and exploration of chemical compound space. Recommender systems, widely utilized for suggesting items to users, employ techniques such as collaborative filtering, which rely on bipartite graphs composed of users, items, and their interactions. Building upon the Open Quantum Materials Database (OQMD), we constructed a bipartite graph where elements from the periodic table and sites within crystal structures are treated as separate entities. The relationships between them, defined by the presence of ions at specific sites and weighted according to the thermodynamic stability of the respective compounds, allowed us to generate an embedding space that contains vector representations for each ion and each site. Through the correlation of ion-site occupancy with their respective distances within the embedding space, we explored new ion-site occupancies, facilitating the discovery of novel stable compounds. Moreover, the graph's embedding space enabled a comprehensive examination of chemical similarities among elements, and a detailed analysis of local geometries of sites. To demonstrate the effectiveness and robustness of our method, we conducted a historical evaluation using different versions of the OQMD and recommended new compounds with Kagome lattices, showcasing the applicability of our approach to practical materials design
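
    The bipartite ion-site idea can be illustrated with a deliberately simpler technique than the one in the abstract. The sketch below swaps the learned embedding space for plain neighbourhood-based collaborative filtering with cosine similarity. The ion and site labels and the stability weights are invented toy values, not OQMD data, and the scoring rule is an assumption made for illustration.

```python
import math

# Toy weighted bipartite graph: ion -> {site: stability weight}.
# Labels and weights are hypothetical placeholders, not real data.
occupancy = {
    "ion_A": {"site_1": 1.0, "site_2": 0.8},
    "ion_B": {"site_1": 0.9, "site_2": 0.7, "site_3": 0.6},
    "ion_C": {"site_3": 1.0},
}
sites = sorted({s for weights in occupancy.values() for s in weights})


def cosine(u, v):
    """Cosine similarity between two sparse site-weight vectors."""
    dot = sum(w * v.get(s, 0.0) for s, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def score(ion, site):
    """Score an unobserved ion-site pair from similar ions that occupy the site."""
    return sum(
        cosine(occupancy[ion], occupancy[other]) * weights.get(site, 0.0)
        for other, weights in occupancy.items()
        if other != ion
    )


# Rank every ion-site pair that is absent from the toy graph.
recommendations = sorted(
    (
        (ion, site, score(ion, site))
        for ion in occupancy
        for site in sites
        if site not in occupancy[ion]
    ),
    key=lambda item: item[2],
    reverse=True,
)

for ion, site, value in recommendations:
    print(f"{ion} on {site}: {value:.3f}")
```

    Here ion_A and ion_B share two sites, so the site that only ion_B occupies among them ranks first as a candidate for ion_A. A learned embedding, as described in the abstract, would replace the explicit cosine step with distances between ion and site vectors.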

    Striking a Balance Between Physical and Digital Resources

    In its various configurations, be they academic, archival, county, juvenile, monastic, national, personal, public, reference, or research, the library has long been a fixture in human affairs. Digital, meaning content or communication delivered through the internet, is 20 years old (but younger in parts). Both approaches to organizing serve to structure information for access. However, digital content is multiplying very fast and libraries everywhere contemplate an existential crisis; the more hopeful librarians fret about physical and digital space. Yet the crux of the matter is not physical vs. digital: without doubt, the digital space of content or communication transmogrifies all walks of life and cannot be wished away, but the physical space of libraries is time-tested, extremely valuable, and can surely offer more than currently meets the eye. Except for entirely virtual libraries, the symbiotic relationship between the physical and the digital is innately powerful: for superior outcomes, it must be recognized, nurtured, and leveraged; striking a balance between physical and digital resources can be accomplished. This paper examines the subject of delivering digital from macro, meso, and micro perspectives: it looks into complexity theory, digital strategy, and digitization.

    Personalization Techniques and Recommender Systems
