194 research outputs found
Hierarchical Clustering Using Level Sets
Over the past several decades, clustering algorithms have earned their place as a go-to solution for database mining. This paper introduces a new concept, used to develop a recursive version of DBSCAN that can perform hierarchical clustering, called Level-Set Clustering (LSC). A level-set is the subset of points of a data-set whose densities are greater than some threshold 't'. Graphing the size of each level-set against its respective 't' produces indents in the line graph that correspond to clusters in the data-set, since the points in a cluster have very similar densities. The new algorithm produces its clustering result with the same O(n log n) time complexity as DBSCAN and OPTICS, while catching clusters the others missed.
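A minimal sketch of the level-set idea described above (illustrative only; the paper's actual algorithm is a recursive variant of DBSCAN, and the names `knn_density` and `level_set_sizes` are mine). A crude 1-D k-NN density estimate stands in for whatever density measure a DBSCAN-style method would use; sweeping the threshold and counting surviving points reproduces the "size of level-set vs. t" curve:

```python
def knn_density(points, k=3):
    """Crude density estimate: inverse distance to the k-th nearest neighbour."""
    dens = []
    for i, p in enumerate(points):
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        dens.append(1.0 / (dists[k - 1] + 1e-12))
    return dens

def level_set_sizes(densities, thresholds):
    """Size of the level-set {x : density(x) >= t} for each threshold t."""
    return [sum(d >= t for d in densities) for t in thresholds]

# Two tight 1-D clusters plus one outlier: the level-set size drops from 7
# to 6 once the threshold exceeds the outlier's low density.
pts = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 50.0]
sizes = level_set_sizes(knn_density(pts, k=2), [0.01, 1.0])
```

The flat stretches of such a curve (where the size stays constant over a range of thresholds) are the "indents" the abstract refers to.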
On the performance of a cavity method based algorithm for the Prize-Collecting Steiner Tree Problem on graphs
We study the behavior of an algorithm derived from the cavity method for the
Prize-Collecting Steiner Tree (PCST) problem on graphs. The algorithm is based
on the zero temperature limit of the cavity equations and as such is formally
simple (a fixed point equation resolved by iteration) and distributed
(parallelizable). We provide a detailed comparison with state-of-the-art
algorithms on a wide range of existing benchmark networks and random graphs.
Specifically, we consider an enhanced derivative of the Goemans-Williamson
heuristic and the DHEA solver, a Branch-and-Cut Linear/Integer Programming
based approach. The comparison shows that the cavity algorithm outperforms the
two algorithms in most large instances both in running time and quality of the
solution. Finally we prove a few optimality properties of the solutions
provided by our algorithm, including optimality under the two post-processing
procedures defined in the Goemans-Williamson derivative and global optimality
in some limit cases.
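The computational core described above is "a fixed point equation resolved by iteration". This generic sketch shows that pattern on the scalar equation x = cos(x); the real zero-temperature cavity messages for PCST are vector-valued and live on the edges of the graph, but the update loop has the same shape:

```python
import math

def fixed_point(f, x0, tol=1e-10, max_iter=1000):
    """Iterate x <- f(x) until successive values differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_new = f(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

root = fixed_point(math.cos, 1.0)  # converges to the Dottie number, ~0.739085
```

Because each update depends only on the current values of neighbouring messages, the same loop parallelizes naturally, which is what makes the cavity approach distributed.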
General scores for accessibility and inequality measures in urban areas
In recent decades, the acceleration of urban growth has led to an
unprecedented level of urban interactions and interdependence. This situation
calls for a significant effort among the scientific community to come up with
engaging and meaningful visualizations and accessible scenario simulation
engines. The present paper gives a contribution in this direction by providing
general methods to evaluate accessibility in cities based on public
transportation data. Through the notion of isochrones, the accessibility
quantities proposed measure the performance of transport systems at connecting
places and people in urban systems. We then introduce scores that rank cities
according to their overall accessibility. We highlight significant inequalities
in the distribution of these measures across the population, which are found to
be strikingly similar across various urban environments. Our results are
released through the interactive platform: www.citychrone.org, aimed at
providing the community at large with a useful tool for awareness and
decision-making.
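A toy version of an isochrone-based accessibility score (my own minimal formulation; the paper's scores are built from real public-transport data, and the zones, times, and populations below are hypothetical). Accessibility of a zone is the population reachable within a time budget, and a Gini coefficient then summarises how unequally such scores are distributed:

```python
travel_time = {            # minutes between hypothetical zones
    "A": {"A": 0, "B": 10, "C": 40},
    "B": {"A": 10, "B": 0, "C": 25},
    "C": {"A": 40, "B": 25, "C": 0},
}
population = {"A": 100, "B": 200, "C": 300}

def accessibility(origin, budget):
    """Total population reachable from `origin` within `budget` minutes."""
    return sum(pop for dest, pop in population.items()
               if travel_time[origin][dest] <= budget)

def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    vals = sorted(values)
    n = len(vals)
    return sum((2 * i - n + 1) * v for i, v in enumerate(vals)) / (n * sum(vals))

scores = [accessibility(z, 30) for z in ("A", "B", "C")]
```

The set of destinations within the budget is exactly the 30-minute isochrone of each origin; replacing the static matrix with timetable-based shortest paths gives the public-transport version.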
From Twitter to GDP: Estimating Economic Activity From Social Media
This paper shows how data derived from Twitter can be used as a proxy for measuring GDP at the country level. Using a dataset of 270 million geo-located image tweets shared on Twitter in 2012 and 2013, I find that: (i) Twitter data can be used as a proxy for estimating GDP at the country level and can explain 94 percent of the variation in GDP; and (ii) the residuals from my preferred model are negatively correlated with a data quality index which assesses the capacity of a country’s statistical system. This suggests that my estimates of GDP are more accurate for countries which are considered to have more reliable GDP data. Taken together, these findings show that institutions and individuals could use social media data to corroborate official GDP estimates, and that government statistical agencies could incorporate social media data to complement official sources and further reduce measurement errors.
Indaco, A. (2018). From Twitter to GDP: Estimating Economic Activity From Social Media. In 2nd International Conference on Advanced Research Methods and Analytics (CARMA 2018). Editorial Universitat Politècnica de València. 87-96. https://doi.org/10.4995/CARMA2018.2018.8316
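The proxy relationship described above can be sketched as a simple least-squares fit of log GDP on log tweet volume (all numbers below are made up for illustration; the paper fits real geo-located tweet counts against official GDP):

```python
import math

def ols(x, y):
    """Slope and intercept of a simple least-squares fit y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / \
        sum((u - mx) ** 2 for u in x)
    return b, my - b * mx

def r_squared(x, y):
    """Share of variance in y explained by the linear fit."""
    b, a = ols(x, y)
    ss_res = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))
    ss_tot = sum((v - sum(y) / len(y)) ** 2 for v in y)
    return 1 - ss_res / ss_tot

# Hypothetical log tweet counts vs. log GDP for five countries.
log_tweets = [math.log(t) for t in (1e3, 1e4, 1e5, 1e6, 1e7)]
log_gdp = [math.log(g) for g in (2e9, 1.5e10, 2.5e11, 1.8e12, 3e13)]
fit_r2 = r_squared(log_tweets, log_gdp)
```

The residuals of such a fit, per country, are what the paper correlates with the statistical-capacity index in finding (ii).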
The autoregressive neural network architecture of the Boltzmann distribution of pairwise interacting spins systems
Generative Autoregressive Neural Networks (ARNN) have recently demonstrated
exceptional results in image and language generation tasks, contributing to the
growing popularity of generative models in both scientific and commercial
applications. This work presents a physical interpretation of the ARNNs by
reformulating the Boltzmann distribution of binary pairwise interacting systems
into autoregressive form. The resulting ARNN architecture has weights and
biases of its first layer corresponding to the Hamiltonian's couplings and
external fields, featuring widely used structures like the residual connections
and a recurrent architecture with clear physical meanings. However, the
exponential growth, with system size, of the number of parameters of the hidden
layers makes its direct application unfeasible. Nevertheless, its
architecture's explicit formulation allows using statistical physics techniques
to derive new ARNNs for specific systems. As examples, new effective ARNN
architectures are derived from two well-known mean-field systems, the
Curie-Weiss and Sherrington-Kirkpatrick models, showing superior performance
in approximating the Boltzmann distributions of the corresponding physical
models compared to other commonly used ARNN architectures. The connection established
between the physics of the system and the ARNN architecture provides a way to
derive new neural network architectures for different interacting systems and
interpret existing ones from a physical perspective.
Comment: 10 pages, 6 figures plus Supplementary Information.
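The autoregressive reformulation rests on the chain rule of probability: any distribution over binary spins, including a Boltzmann distribution, factorizes as p(s) = Π_i p(s_i | s_1..s_{i-1}). The sketch below verifies this exactly for a 3-spin pairwise system with arbitrarily chosen couplings (the paper's contribution is to parametrize these conditionals with a neural network whose first-layer weights are the couplings; this is only the exact brute-force version):

```python
import itertools
import math

J = {(0, 1): 1.0, (1, 2): -0.5, (0, 2): 0.3}  # arbitrary pairwise couplings

def energy(s):
    """Pairwise Ising energy E(s) = -sum_{i<j} J_ij s_i s_j."""
    return -sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())

spins = list(itertools.product([-1, 1], repeat=3))
Z = sum(math.exp(-energy(s)) for s in spins)          # partition function
p = {s: math.exp(-energy(s)) / Z for s in spins}      # Boltzmann distribution

def conditional(i, prefix):
    """p(s_i = +1 | s_1..s_{i-1} = prefix), by exact marginalisation."""
    num = sum(pr for s, pr in p.items() if s[:i] == prefix and s[i] == 1)
    den = sum(pr for s, pr in p.items() if s[:i] == prefix)
    return num / den

def autoregressive_prob(s):
    """Rebuild p(s) from the chain of conditionals."""
    prob = 1.0
    for i in range(3):
        c = conditional(i, s[:i])
        prob *= c if s[i] == 1 else 1 - c
    return prob
```

Since exact marginalisation scales exponentially, this only works for tiny systems; the ARNN replaces each conditional with a tractable parametrized function.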
MarciaTesta: An Automatic Generator of Test Programs for Microprocessors' Data Caches
SBST (Software-Based Self-Testing) is an effective solution for in-system testing of SoCs without any additional hardware requirement. SBST is particularly suited for embedded blocks with limited accessibility, such as cache memories. Several methodologies have been proposed to properly adapt existing March algorithms to test cache memories. Unfortunately, they all leave to test engineers the task of manually coding them into the specific Instruction Set Architecture (ISA) of the target microprocessor. We propose an EDA tool for the automatic generation of assembly cache test programs for a specific architecture.
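To illustrate the kind of March algorithm such a tool translates into target-ISA assembly, here is one of the simplest, MATS+ ({⇕(w0); ⇑(r0,w1); ⇓(r1,w0)}), run on a simulated memory array (the `Memory` class and stuck-at fault injection are my own scaffolding, not part of the tool):

```python
class Memory:
    """Simulated memory array with an optional stuck-at-0 faulty cell."""
    def __init__(self, size, stuck_at_zero=None):
        self.cells = [0] * size
        self.fault = stuck_at_zero  # address whose cell always reads 0

    def write(self, addr, val):
        self.cells[addr] = val

    def read(self, addr):
        return 0 if addr == self.fault else self.cells[addr]

def mats_plus(mem, size):
    """MATS+ March test; returns the list of addresses that misbehaved."""
    errors = []
    for a in range(size):                 # element 1: any order, w0
        mem.write(a, 0)
    for a in range(size):                 # element 2: ascending, r0 then w1
        if mem.read(a) != 0:
            errors.append(a)
        mem.write(a, 1)
    for a in reversed(range(size)):       # element 3: descending, r1 then w0
        if mem.read(a) != 1:
            errors.append(a)
        mem.write(a, 0)
    return errors
```

In the SBST setting, each read/write above becomes a load/store instruction sequence targeting cache lines, which is precisely the manual coding step the proposed generator automates.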
Validation & Verification of an EDA automated synthesis tool
Reliability and correctness are two mandatory features for automated synthesis tools. To reach these goals, several campaigns of Validation and Verification (V&V) are needed. The paper presents the extensive efforts set up to prove the correctness of a newly developed EDA automated synthesis tool. The target tool, MarciaTesta, is a multi-platform automatic generator of test programs for microprocessors' caches. Taking as input the selected March Test and some architectural details about the target cache memory, the tool automatically generates the assembly-level program to be run as Software-Based Self-Testing (SBST). The equivalence between the original March Test, the automatically generated assembly program, and the intermediate C/C++ program has been proven by resorting to sophisticated logging mechanisms. A set of proven libraries has been generated and extensively used during the tool development. A detailed analysis of the lessons learned is reported.