
    HIERARCHICAL CLUSTERING USING LEVEL SETS

    Over the past several decades, clustering algorithms have earned their place as a go-to solution for database mining. This paper introduces Level-Set Clustering (LSC), a new recursive version of DBSCAN that can perform hierarchical clustering. A level-set is the subset of points of a data-set whose densities are greater than some threshold 't'. By graphing the size of each level-set against its respective 't', indents are produced in the line graph which correspond to clusters in the data-set, as the points in a cluster have very similar densities. The new algorithm produces its clustering result with the same O(n log n) time complexity as DBSCAN and OPTICS, while catching clusters that the others miss.
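
    A minimal sketch of the level-set size curve idea, under the assumption of a simple neighbor-count density estimate (the paper's exact recursive DBSCAN variant and its O(n log n) implementation are not reproduced here; the pairwise version below is O(n^2) and purely illustrative):

    # Sketch: estimate a density per point, then record how many points survive
    # each density threshold 't'.  Flat stretches ("indents") in the resulting
    # curve correspond to groups of points with similar density.
    import numpy as np

    def neighbor_count_density(points, eps=0.5):
        """Crude density estimate: number of neighbours within radius eps."""
        diffs = points[:, None, :] - points[None, :, :]
        dists = np.sqrt((diffs ** 2).sum(axis=-1))
        return (dists <= eps).sum(axis=1) - 1  # exclude the point itself

    def level_set_curve(points, thresholds, eps=0.5):
        """Return the size of each level-set {p : density(p) >= t}."""
        density = neighbor_count_density(points, eps)
        return np.array([(density >= t).sum() for t in thresholds])

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Two dense blobs plus sparse background noise.
        data = np.vstack([
            rng.normal(0.0, 0.3, size=(100, 2)),
            rng.normal(5.0, 0.3, size=(100, 2)),
            rng.uniform(-2, 7, size=(40, 2)),
        ])
        ts = np.arange(0, 60)
        sizes = level_set_curve(data, ts)
        for t, s in zip(ts[::10], sizes[::10]):
            print(f"t={t:2d}  |level-set|={s}")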

    On the performance of a cavity method based algorithm for the Prize-Collecting Steiner Tree Problem on graphs

    We study the behavior of an algorithm derived from the cavity method for the Prize-Collecting Steiner Tree (PCST) problem on graphs. The algorithm is based on the zero-temperature limit of the cavity equations and as such is formally simple (a fixed-point equation resolved by iteration) and distributed (parallelizable). We provide a detailed comparison with state-of-the-art algorithms on a wide range of existing benchmark networks and random graphs. Specifically, we consider an enhanced derivative of the Goemans-Williamson heuristic and the DHEA solver, a Branch-and-Cut Linear/Integer Programming based approach. The comparison shows that the cavity algorithm outperforms both on most large instances, in both running time and solution quality. Finally, we prove a few optimality properties of the solutions provided by our algorithm, including optimality under the two post-processing procedures defined in the Goemans-Williamson derivative and global optimality in some limit cases.
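
    The abstract characterizes the algorithm only as "a fixed-point equation resolved by iteration"; the following is a generic, hypothetical sketch of that computational pattern (damped synchronous updates until convergence), not the actual zero-temperature cavity equations for PCST:

    import numpy as np

    def iterate_fixed_point(update, x0, damping=0.5, tol=1e-8, max_iter=10_000):
        """Iterate x <- (1 - damping) * update(x) + damping * x until convergence."""
        x = x0
        for it in range(max_iter):
            x_new = (1 - damping) * update(x) + damping * x
            if np.max(np.abs(x_new - x)) < tol:
                return x_new, it
            x = x_new
        return x, max_iter

    if __name__ == "__main__":
        # Toy contractive update rule standing in for the cavity equations.
        A = np.array([[0.2, 0.1], [0.0, 0.3]])
        b = np.array([1.0, -0.5])
        update = lambda x: A @ x + b
        x_star, iters = iterate_fixed_point(update, np.zeros(2))
        print("fixed point:", x_star, "after", iters, "iterations")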

    General scores for accessibility and inequality measures in urban areas

    In recent decades, the acceleration of urban growth has led to an unprecedented level of urban interactions and interdependence. This situation calls for a significant effort by the scientific community to come up with engaging and meaningful visualizations and accessible scenario-simulation engines. The present paper contributes in this direction by providing general methods to evaluate accessibility in cities based on public transportation data. Through the notion of isochrones, the proposed accessibility quantities measure the performance of transport systems at connecting places and people in urban systems. We then introduce scores that rank cities according to their overall accessibility. We highlight significant inequalities in the distribution of these measures across the population, which are found to be strikingly similar across various urban environments. Our results are released through the interactive platform www.citychrone.org, aimed at providing the community at large with a useful tool for awareness and decision-making.
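
    A hypothetical sketch of an isochrone-style accessibility score, assuming a precomputed travel-time matrix and a population per location (the paper's actual scores, data, and normalizations are not reproduced here):

    import numpy as np

    def isochrone_accessibility(travel_time, population, budget_minutes=30.0):
        """For each origin, sum the population of destinations reachable within the budget."""
        reachable = travel_time <= budget_minutes        # boolean isochrone mask
        return reachable.astype(float) @ population

    if __name__ == "__main__":
        # Toy 4-location city: symmetric travel times in minutes.
        tt = np.array([
            [0, 10, 25, 50],
            [10, 0, 20, 45],
            [25, 20, 0, 30],
            [50, 45, 30, 0],
        ], dtype=float)
        pop = np.array([1000, 4000, 2500, 500], dtype=float)
        scores = isochrone_accessibility(tt, pop)
        print("accessibility (population reachable in 30 min):", scores)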

    From Twitter to GDP: Estimating Economic Activity From Social Media

    This paper shows how data derived from Twitter can be used as a proxy for measuring GDP at the country level. Using a dataset of 270 million geo-located image tweets shared on Twitter in 2012 and 2013, I find that: (i) Twitter data can be used as a proxy for estimating GDP at the country level and can explain 94 percent of the variation in GDP; and (ii) the residuals from my preferred model are negatively correlated with a data quality index which assesses the capacity of a country's statistical system. This suggests that my estimates of GDP are more accurate for countries which are considered to have more reliable GDP data. Taken together, these findings show that institutions and individuals could use social media data to corroborate official GDP estimates, or, alternatively, that government statistical agencies could incorporate social media data to complement official figures and further reduce measurement errors.
    Indaco, A. (2018). From Twitter to GDP: Estimating Economic Activity From Social Media. In 2nd International Conference on Advanced Research Methods and Analytics (CARMA 2018). Editorial Universitat Politècnica de València. 87-96. https://doi.org/10.4995/CARMA2018.2018.8316
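
    A hedged sketch of the kind of cross-country regression described above, on synthetic data (the paper's actual specification, variables, and dataset are not reproduced; the slope, intercept, and R^2 below are illustrative only):

    import numpy as np

    rng = np.random.default_rng(42)
    n_countries = 50
    log_tweets = rng.normal(12.0, 2.0, n_countries)                   # hypothetical log tweet counts
    log_gdp = 0.9 * log_tweets + rng.normal(0.0, 0.5, n_countries)    # synthetic "true" relation

    # Ordinary least squares of log GDP on log tweet volume.
    X = np.column_stack([np.ones(n_countries), log_tweets])
    beta, *_ = np.linalg.lstsq(X, log_gdp, rcond=None)
    fitted = X @ beta
    r2 = 1 - np.sum((log_gdp - fitted) ** 2) / np.sum((log_gdp - log_gdp.mean()) ** 2)
    print(f"slope={beta[1]:.3f}  intercept={beta[0]:.3f}  R^2={r2:.3f}")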

    The autoregressive neural network architecture of the Boltzmann distribution of pairwise interacting spins systems

    Generative Autoregressive Neural Networks (ARNNs) have recently demonstrated exceptional results in image and language generation tasks, contributing to the growing popularity of generative models in both scientific and commercial applications. This work presents a physical interpretation of ARNNs by reformulating the Boltzmann distribution of binary pairwise interacting systems into autoregressive form. In the resulting ARNN architecture, the weights and biases of the first layer correspond to the Hamiltonian's couplings and external fields, and it features widely used structures, such as residual connections and a recurrent architecture, with clear physical meanings. However, the exponential growth with system size of the number of parameters in the hidden layers makes its direct application unfeasible. Nevertheless, the explicit formulation of the architecture allows the use of statistical physics techniques to derive new ARNNs for specific systems. As examples, new effective ARNN architectures are derived from two well-known mean-field systems, the Curie-Weiss and Sherrington-Kirkpatrick models, showing superior performance in approximating the Boltzmann distributions of the corresponding physical models compared with other commonly used ARNN architectures. The connection established between the physics of the system and the ARNN architecture provides a way to derive new neural network architectures for different interacting systems and to interpret existing ones from a physical perspective.
    Comment: 10 pages, 6 figures plus the Supplementary Information
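
    A minimal sketch of the autoregressive reformulation the abstract refers to: for a small pairwise spin system, the Boltzmann distribution P(s) ∝ exp(-beta E(s)) can be written exactly as a product of conditionals P(s_i | s_1..s_{i-1}). The brute-force enumeration below only illustrates that factorization; it is not the ARNN architecture derived in the paper.

    import itertools
    import numpy as np

    def boltzmann(J, h, beta=1.0):
        """Exact Boltzmann probabilities over all 2^N spin configurations (s_i = +/-1)."""
        N = len(h)
        configs = np.array(list(itertools.product([-1, 1], repeat=N)))
        energies = -0.5 * np.einsum('ci,ij,cj->c', configs, J, configs) - configs @ h
        weights = np.exp(-beta * energies)
        return configs, weights / weights.sum()

    def autoregressive_conditional(configs, probs, prefix):
        """P(s_k = +1 | s_1..s_{k-1} = prefix), obtained by marginalisation."""
        k = len(prefix)
        mask = np.all(configs[:, :k] == prefix, axis=1)
        p_prefix = probs[mask].sum()
        p_up = probs[mask & (configs[:, k] == 1)].sum()
        return p_up / p_prefix

    if __name__ == "__main__":
        N = 3
        rng = np.random.default_rng(1)
        J = rng.normal(0, 1, (N, N)); J = (J + J.T) / 2; np.fill_diagonal(J, 0)
        h = rng.normal(0, 0.5, N)
        configs, probs = boltzmann(J, h)
        print("P(s_1=+1)                 =", autoregressive_conditional(configs, probs, []))
        print("P(s_2=+1 | s_1=+1)        =", autoregressive_conditional(configs, probs, [1]))
        print("P(s_3=+1 | s_1=+1,s_2=-1) =", autoregressive_conditional(configs, probs, [1, -1]))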

    MarciaTesta: An Automatic Generator of Test Programs for Microprocessors' Data Caches

    SBST (Software-Based Self-Testing) is an effective solution for in-system testing of SoCs without any additional hardware requirement. SBST is particularly suited for embedded blocks with limited accessibility, such as cache memories. Several methodologies have been proposed to properly adapt existing March algorithms to test cache memories. Unfortunately, they all leave to test engineers the task of manually coding them into the specific Instruction Set Architecture (ISA) of the target microprocessor. We propose an EDA tool for the automatic generation of assembly cache test programs for a specific architecture.
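
    A hedged sketch of what a March algorithm does, run here on a simulated word-addressed memory rather than a real cache: each March element walks the address space in a given order, reading an expected value and/or writing a new one. MarciaTesta itself emits ISA-specific assembly; the example below merely illustrates the underlying test pattern, using March C- as a well-known instance.

    def march_element(memory, order, ops, log):
        """Apply a sequence of (op, value) pairs to every address in the given order."""
        addresses = range(len(memory)) if order == 'up' else reversed(range(len(memory)))
        for addr in addresses:
            for op, value in ops:
                if op == 'w':
                    memory[addr] = value
                elif op == 'r':
                    ok = (memory[addr] == value)
                    log.append((addr, value, memory[addr], ok))

    def march_c_minus(memory):
        """March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}."""
        log = []
        march_element(memory, 'up',   [('w', 0)], log)
        march_element(memory, 'up',   [('r', 0), ('w', 1)], log)
        march_element(memory, 'up',   [('r', 1), ('w', 0)], log)
        march_element(memory, 'down', [('r', 0), ('w', 1)], log)
        march_element(memory, 'down', [('r', 1), ('w', 0)], log)
        march_element(memory, 'up',   [('r', 0)], log)
        return all(ok for *_, ok in log)

    if __name__ == "__main__":
        fault_free = [0] * 16
        print("fault-free memory passes March C-:", march_c_minus(fault_free))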

    Validation & Verification of an EDA automated synthesis tool

    Reliability and correctness are two mandatory features for automated synthesis tools. To reach these goals, several campaigns of Validation and Verification (V&V) are needed. The paper presents the extensive efforts set up to prove the correctness of a newly developed EDA automated synthesis tool. The target tool, MarciaTesta, is a multi-platform automatic generator of test programs for microprocessors' caches. Taking as input the selected March Test and some architectural details about the target cache memory, the tool automatically generates the assembly-level program to be run as Software-Based Self-Testing (SBST). The equivalence between the original March Test, the automatically generated assembly program, and the intermediate C/C++ program has been proven by resorting to sophisticated logging mechanisms. A set of proven libraries has been generated and extensively used during the tool development. A detailed analysis of the lessons learned is reported.
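
    A hypothetical sketch of a logging-based equivalence check of the kind described above: the operation trace produced by the reference March Test is compared, element by element, with the trace logged while executing the generated program. The real V&V flow compares the March Test, C/C++, and assembly levels; here both traces are simply Python lists of (operation, address, value) tuples.

    def traces_equivalent(reference_trace, generated_trace):
        """Return (True, None) if traces match, else (False, index of first mismatch)."""
        if len(reference_trace) != len(generated_trace):
            return False, min(len(reference_trace), len(generated_trace))
        for i, (ref_op, gen_op) in enumerate(zip(reference_trace, generated_trace)):
            if ref_op != gen_op:
                return False, i
        return True, None

    if __name__ == "__main__":
        reference = [('w', 0, 0), ('r', 0, 0), ('w', 0, 1)]   # (op, address, value)
        generated = [('w', 0, 0), ('r', 0, 0), ('w', 0, 1)]
        print(traces_equivalent(reference, generated))   # (True, None)
        generated[2] = ('w', 0, 0)
        print(traces_equivalent(reference, generated))   # (False, 2)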