15 research outputs found

    A Graph-structured Dataset for Wikipedia Research

    Wikipedia is a rich and invaluable source of information. Its central place on the Web makes it a particularly interesting object of study for scientists. Researchers from different domains have used various complex datasets related to Wikipedia to study language, social behavior, knowledge organization, and network theory. While Wikipedia is a scientific treasure, its large size hinders pre-processing and may be a challenging obstacle for new studies. This issue is particularly acute in scientific domains where researchers may lack technical and data-processing expertise. On one hand, Wikipedia dumps are large, which makes parsing and extracting relevant information cumbersome. On the other hand, the API is straightforward to use but restricted to a relatively small number of requests. The middle ground is the mesoscopic scale, where researchers need a subset of Wikipedia ranging from thousands to hundreds of thousands of pages, yet no efficient solution exists at this scale. In this work, we propose an efficient data structure for requesting and accessing subnetworks of Wikipedia pages and categories. We provide convenient tools for accessing and filtering viewership statistics, or "pagecounts", of Wikipedia web pages. The dataset organization leverages principles of graph databases that allow rapid and intuitive access to subgraphs of Wikipedia articles and categories. The dataset and deployment guidelines are available on the LTS2 website \url{https://lts2.epfl.ch/Datasets/Wikipedia/}
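As a minimal sketch of the kind of mesoscopic access the dataset is meant to enable, the snippet below builds a toy page graph with networkx and extracts the subgraph induced by one category. The node attributes (`categories`, `pagecounts`) and page titles are illustrative assumptions, not the actual dataset schema.

```python
import networkx as nx

def category_subgraph(g, category):
    """Induced subgraph of the pages tagged with `category`."""
    nodes = [n for n, d in g.nodes(data=True) if category in d.get("categories", ())]
    return g.subgraph(nodes).copy()

# Toy page graph; attributes mimic (but are not) the dataset schema.
g = nx.DiGraph()
g.add_node("Graph theory", categories=["Mathematics"], pagecounts=1200)
g.add_node("Ring (mathematics)", categories=["Mathematics"], pagecounts=800)
g.add_node("Jazz", categories=["Music"], pagecounts=3000)
g.add_edge("Graph theory", "Ring (mathematics)")
g.add_edge("Graph theory", "Jazz")

sub = category_subgraph(g, "Mathematics")
```

A graph database provides the same operation (filtering nodes by label, then taking the induced subgraph) without loading the full dump into memory.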

    Non-linear subdivision of univariate signals and discrete surfaces

    During the last 20 years, the joint expansion of computing power, computer graphics, networking capabilities, and multiresolution analysis has stimulated several research domains and created the need for new types of data such as 3D models, i.e. discrete surfaces. At the intersection of multiresolution analysis and computer graphics, subdivision methods, i.e. iterative refinement procedures for curves or surfaces, occupy a non-negligible place, since they are a basic component needed to adapt existing multiresolution techniques dedicated to signals and images to more complicated data such as discrete surfaces represented by polygonal meshes. Such representations are of great interest since they make polygonal meshes nearly as flexible as higher-level 3D model representations, such as piecewise-polynomial surfaces (e.g. NURBS, B-splines). The generalization of subdivision methods from univariate data to polygonal meshes is relatively simple for a regular mesh but becomes less straightforward when handling irregularities. Moreover, in the linear univariate case, a smoother limit curve is obtained by increasing the size of the support of the subdivision scheme, which is not a trivial operation for a surface subdivision scheme without a priori assumptions on the mesh. While many linear subdivision methods are available, studies of more general non-linear methods are relatively sparse, although such techniques could achieve better results without increasing the support size. The goal of this study is to propose and analyze a binary non-linear interpolatory subdivision method. The proposed technique uses local polar coordinates to compute the positions of the newly inserted points. It is shown that the method converges toward continuous limit functions. The proposed univariate scheme is extended to triangular meshes, possibly with boundaries.
In order to evaluate characteristics of the proposed scheme that are not proved analytically, numerical estimates of convergence, regularity of the limit function, and approximation order are computed and validated against known linear schemes of identical support. The convergence criterion is adapted to surface subdivision via a Hausdorff distance-based metric. The evolution of the Gaussian and mean curvature of limit surfaces is also studied and compared against theoretical values when available. An application of surface subdivision to building a multiresolution representation of 3D models is also studied. In particular, the efficiency of such a representation for compression, in terms of rate-distortion, is demonstrated. An alternative to the initial SPIHT-based encoding, based on the JPEG 2000 image compression standard, is also proposed. This method makes partial decoding of the compressed model possible in both SNR-progressive and level-progressive ways, while adding only minimal overhead compared to SPIHT.
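The exact polar-coordinate insertion rule is developed in the work itself; as an illustration of the general idea only, the sketch below refines a closed plane curve by averaging radii and angles in polar coordinates about a local origin (here the stencil centroid, an assumption made for simplicity, not the method's actual rule). On a square inscribed in the unit circle, every inserted point again lies on the circle, which a linear midpoint rule would not achieve.

```python
import numpy as np

def polar_insert(p1, p2, origin):
    """Insert a point between p1 and p2 by averaging radii and angles
    in polar coordinates about a local origin (illustrative rule)."""
    o = complex(*origin)
    z1, z2 = complex(*p1) - o, complex(*p2) - o
    r = 0.5 * (abs(z1) + abs(z2))
    a1, a2 = np.angle(z1), np.angle(z2)
    da = (a2 - a1 + np.pi) % (2.0 * np.pi) - np.pi  # shorter-arc angle difference
    z = o + r * np.exp(1j * (a1 + 0.5 * da))
    return (z.real, z.imag)

def subdivide(points):
    """One interpolatory refinement step of a closed polygon."""
    origin = tuple(np.mean(points, axis=0))  # local origin: centroid (assumption)
    out = []
    for i, p in enumerate(points):
        out.append(tuple(p))
        out.append(polar_insert(p, points[(i + 1) % len(points)], origin))
    return out

# A square inscribed in the unit circle refines to an octagon on the circle.
square = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
refined = subdivide(square)
```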

    Non-linear subdivision using local spherical coordinates

    In this paper, we present an original non-linear subdivision scheme suitable for univariate data, plane curves, and discrete triangulated surfaces, while keeping the complexity acceptable. The proposed technique is compared to linear subdivision methods with identical support. Numerical criteria are proposed to verify basic properties, such as the convergence of the scheme and the regularity of the limit function.
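The kind of numerical convergence criterion mentioned above can be sketched as follows, using the linear 4-point Dubuc-Deslauriers scheme as a stand-in reference of identical support (an assumption for illustration): if the sup distance between successive piecewise-linear refinements decays geometrically, the scheme converges to a continuous limit function.

```python
import numpy as np

def dd4_step(f):
    """One refinement step of the linear 4-point (Dubuc-Deslauriers)
    interpolatory scheme on periodic data."""
    out = np.empty(2 * len(f))
    out[0::2] = f  # interpolatory: old samples are kept
    out[1::2] = (-np.roll(f, 1) + 9 * f + 9 * np.roll(f, -1) - np.roll(f, -2)) / 16.0
    return out

def convergence_ratios(f, steps=6):
    """d_k = sup distance between the piecewise-linear interpolants of
    levels k and k+1; ratios d_{k+1}/d_k bounded below 1 indicate
    geometric decay, hence a continuous limit function."""
    ds = []
    for _ in range(steps):
        g = dd4_step(f)
        ds.append(np.max(np.abs(g[1::2] - 0.5 * (f + np.roll(f, -1)))))
        f = g
    return [ds[i + 1] / ds[i] for i in range(len(ds) - 1)]
```

For smooth periodic data the ratios settle near 1/4 for this scheme; the same estimator can be run on a non-linear scheme, where no analytic proof is available.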

    Spikyball Sampling: Exploring Large Networks via an Inhomogeneous Filtered Diffusion

    Studying real-world networks such as social networks or web networks is a challenge. These networks often combine a complex, highly connected structure with a large size. We propose a new approach for large-scale networks that is able to automatically sample user-defined relevant parts of a network. Starting from a few selected places in the network and a reduced set of expansion rules, the method adopts a filtered breadth-first search that expands through edges and nodes matching these rules. Moreover, the expansion is performed over a random subset of neighbors at each step to further mitigate the overwhelming number of connections that may exist in large graphs; hence the image of a "spiky" expansion. We show that this approach generalizes previous exploration sampling methods, such as Snowball or Forest Fire, and extends them. We demonstrate its ability to capture groups of nodes with strong interactions while discarding weakly connected nodes, which are often numerous in social networks and may hide important structures.
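A minimal sketch of this filtered, randomly thinned breadth-first expansion, using networkx with a placeholder node filter (the actual expansion rules of the method are richer than this):

```python
import random
import networkx as nx

def spiky_sample(g, seeds, steps=3, keep=0.5, node_ok=lambda n: True):
    """Filtered diffusion sketch: expand layer by layer from the seeds,
    keeping only a random fraction `keep` of the filtered neighbors."""
    sampled, frontier = set(seeds), set(seeds)
    for _ in range(steps):
        # neighbors of the current frontier that pass the user-defined filter
        nbrs = {m for n in frontier for m in g.neighbors(n)
                if m not in sampled and node_ok(m)}
        k = max(1, int(keep * len(nbrs))) if nbrs else 0
        # random thinning of the frontier: the "spiky" part of the expansion
        frontier = set(random.sample(sorted(nbrs), k)) if nbrs else set()
        sampled |= frontier
    return g.subgraph(sampled)
```

With `keep=1.0` and a permissive filter this reduces to plain snowball sampling, which is the sense in which the method generalizes earlier exploration schemes.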

    What is Trending on Wikipedia? Capturing Trends and Language Biases Across Wikipedia Editions

    In this work, we propose an automatic evaluation and comparison of the browsing behavior of Wikipedia readers that can be applied to any language edition of Wikipedia. As an example, we focus on the English, French, and Russian editions during the last four months of 2018. The proposed method has three steps. First, it extracts the most trending articles over a chosen period of time. Second, it performs a semi-supervised topic extraction, and third, it compares topics across languages. The automated processing works with data that combine Wikipedia's graph of hyperlinks, pageview statistics, and page summaries. The results show that people share a common interest in entertainment, e.g. movies, music, and sports, independently of their language. Differences appear in topics related to local events or cultural particularities. Interactive visualizations showing clusters of trending pages in each language edition are available online at https://wiki-insights.epfl.ch/wikitrend
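The first step, extracting trending articles from pageview statistics, could be sketched as below. The spike-ratio criterion (recent mean views over baseline mean) is a hypothetical simplification for illustration, not necessarily the paper's exact rule.

```python
import numpy as np

def trending_score(views, window=28):
    """Recent mean views divided by baseline mean (hypothetical criterion)."""
    views = np.asarray(views, dtype=float)
    baseline = views[:-window].mean() + 1.0  # +1 avoids division by zero
    return views[-window:].mean() / baseline

def most_trending(pageviews, top=10, window=28):
    """pageviews: dict mapping article title -> daily view counts."""
    scores = {t: trending_score(v, window) for t, v in pageviews.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

Running this per language edition yields the per-edition trending lists that the topic-extraction and cross-language comparison steps then operate on.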

    Navigating digital traces on Twitter: reflections on the design of a data-mapping tool for journalists

    This article offers practical feedback from a multidisciplinary applied research project aimed at designing a tool for exploring and mapping Twitter data for journalists. The approach emphasizes observing and understanding journalists' practices, and anticipating how the software would integrate into their daily routines. Three usage scenarios are developed, reflecting the diversity of ways the tool can be appropriated. They highlight the value of designing an interface that helps users understand the operations performed on the data, but also the need for users to have a minimal level of digital literacy in order to produce editorial content based on the exploitation of digital traces.

    Total aberrations compensation in digital holographic microscopy with a reference conjugated hologram

    In this paper we present a new method to achieve quantitative phase contrast imaging in Digital Holographic Microscopy (DHM) that compensates for phase aberrations and image distortion by recording a single reference hologram. We demonstrate that, in particular cases in which the studied specimen does not have abrupt edges, the specimen's hologram itself can be used as the reference hologram. We show that image distortion and phase aberrations introduced by a ball lens used as a microscope objective are completely suppressed with our method. Finally, the concept of a self-conjugated reference hologram is applied to a biological sample (Trypanosoma brucei) to maintain a spatial phase noise level below 3 degrees.
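The compensation principle, multiplying the recorded complex field by the conjugated reference hologram so that the aberration phase cancels, can be sketched on synthetic data. The quadratic aberration and Gaussian specimen phase below are illustrative assumptions, not experimental values.

```python
import numpy as np

def compensate(phase_meas, phase_ref):
    """Multiply the complex field by the conjugated reference hologram,
    i.e. subtract the reference phase, then rewrap to (-pi, pi]."""
    return np.angle(np.exp(1j * phase_meas) * np.exp(-1j * phase_ref))

# Synthetic demo (illustrative values, not experimental data).
x = np.linspace(-1.0, 1.0, 64)
X, Y = np.meshgrid(x, x)
aberration = 3.0 * (X**2 + Y**2)               # stands in for lens aberrations
specimen = 0.4 * np.exp(-(X**2 + Y**2) / 0.1)  # smooth phase, no abrupt edges
flat = compensate(specimen + aberration, aberration)
```

After compensation only the specimen's phase remains, which is why a single recorded reference hologram suffices to flatten the whole field of view.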