
    Graph Clustering in All Parameter Regimes

    Resolution parameters in graph clustering control the size and structure of clusters formed by solving a parametric objective function. Typically there is more than one meaningful way to cluster a graph, and solving the same objective function for different resolution parameters produces clusterings at different levels of granularity, each of which can be meaningful depending on the application. In this paper, we address the task of efficiently solving a parameterized graph clustering objective for all values of a resolution parameter. Specifically, we consider a new analysis-friendly objective we call LambdaPrime, involving a parameter λ ∈ (0,1). LambdaPrime is an adaptation of LambdaCC, a significant family of instances of the Correlation Clustering (minimization) problem. LambdaPrime and LambdaCC are closely related to other parameterized clustering problems, such as parametric generalizations of modularity, and they capture a number of specific clustering problems as special cases, including sparsest cut and cluster deletion. While previous work provides approximation results for a single value of the resolution parameter, we seek a set of approximately optimal clusterings for all values of λ in polynomial time. More specifically, we show that when a graph has m edges and n nodes, there exists a set of at most m clusterings such that, for every λ ∈ (0,1), the family contains an optimal solution to the LambdaPrime objective. This bound is tight on star graphs. We obtain a family of O(log n) clusterings by solving the parametric linear programming (LP) relaxation of LambdaPrime at O(log n) values of λ and rounding each LP solution using existing approximation algorithms. We prove that this is asymptotically tight: for a certain class of ring graphs and all values of λ, Ω(log n) feasible solutions are required to provide a constant-factor approximation for the LambdaPrime LP relaxation. To minimize the size of the clustering family, we further propose an algorithm that yields a family of solutions at most twice the size of the minimum LP-approximating family.
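The O(log n) family arises because geometrically spaced values of λ suffice to cover (0,1) up to a constant factor. The sketch below is purely illustrative: the paper's actual breakpoints come from the parametric LP, and the helper `geometric_lambda_grid`, its starting scale, and its ratio are assumptions made here to show why the grid size scales logarithmically in n.

```python
import math

def geometric_lambda_grid(n, ratio=2.0):
    """Generate geometrically spaced resolution parameters in (0, 1).

    Illustrative only: the paper computes breakpoints from the parametric
    LP relaxation, but a geometric grid shows the O(log n) scaling.
    """
    lambdas = []
    lam = 1.0 / (n * n)   # smallest scale distinguished here (an assumption)
    while lam < 1.0:
        lambdas.append(lam)
        lam *= ratio
    return lambdas

grid = geometric_lambda_grid(1024)
print(len(grid))  # 20 for n = 1024: the grid grows like log(n), not like n
```

Doubling from 1/n² up to 1 takes about 2 log₂(n) steps, so solving the LP once per grid point costs only a logarithmic number of LP solves.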

    Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks

    Despite the rapid advancement of unsupervised learning in visual representation, it requires training on large-scale datasets that demand costly data collection and pose additional challenges due to concerns regarding data privacy. Recently, synthetic images generated by text-to-image diffusion models have shown great potential for benefiting image recognition. Although promising, there has been inadequate exploration dedicated to unsupervised learning on diffusion-generated images. To address this, we start by uncovering that diffusion models' cross-attention layers inherently provide annotation-free attention masks aligned with corresponding text inputs on generated images. We then investigate the problems of three prevalent unsupervised learning techniques (i.e., contrastive learning, masked modeling, and vision-language pretraining) and introduce customized solutions by fully exploiting the aforementioned free attention masks. Our approach is validated through extensive experiments that show consistent improvements in baseline models across various downstream tasks, including image classification, detection, segmentation, and image-text retrieval. By utilizing our method, it is possible to close the performance gap between unsupervised pretraining on synthetic data and real-world scenarios.
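The key observation is that a cross-attention map for a text token can be binarized into a free segmentation mask. The following is a minimal sketch of that idea, assuming the attention map has already been averaged over heads and diffusion timesteps; the min-max normalization and fixed threshold are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def attention_to_mask(attn, threshold=0.5):
    """Binarize a cross-attention map for one text token into a mask.

    attn: (H, W) array of non-negative attention weights, assumed already
    averaged over heads and timesteps. Thresholding here is illustrative.
    """
    a = attn - attn.min()
    rng = a.max()
    if rng > 0:
        a = a / rng                       # min-max normalize to [0, 1]
    return (a >= threshold).astype(np.uint8)

attn = np.array([[0.1, 0.9],
                 [0.2, 0.8]])
print(attention_to_mask(attn))            # high-attention pixels become 1
```

Such masks come for free with each generated image, which is what makes them usable as annotation-free supervision for the three pretraining techniques discussed.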

    Dataset Condensation via Generative Model

    Dataset condensation aims to condense a large dataset with many training samples into a small set. Previous methods usually condense the dataset into pixel format. However, this suffers from slow optimization and a large number of parameters to be optimized. As image resolution and the number of classes increase, the number of learnable parameters grows accordingly, preventing condensation methods from scaling up to large datasets with diverse classes. Moreover, the relations among condensed samples have been neglected, so the feature distribution of condensed samples is often not diverse. To solve these problems, we propose to condense the dataset into another format: a generative model. Such a novel format allows for the condensation of large datasets because the size of the generative model remains relatively stable as the number of classes or the image resolution increases. Furthermore, an intra-class and an inter-class loss are proposed to model the relations among condensed samples. The intra-class loss aims to create more diverse samples for each class by pushing each sample away from the others of the same class. Meanwhile, the inter-class loss increases the discriminability of samples by widening the gap between the centers of different classes. Extensive comparisons with state-of-the-art methods and our ablation studies confirm the effectiveness of our method and of its individual components. To the best of our knowledge, we are the first to successfully conduct condensation on ImageNet-1k.
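The two relation losses described above can be sketched in a few lines. This is a hedged illustration of the stated intent (push same-class samples apart, push class centers apart) using cosine similarity; the paper's actual loss formulation, margins, and weighting are not given in the abstract, so everything below is an assumption.

```python
import numpy as np

def intra_class_loss(feats):
    """Diversity term: penalize similarity among samples of one class.

    feats: (k, d) features of k condensed samples from the same class.
    Cosine-similarity form chosen for illustration only.
    """
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T
    k = len(feats)
    return sim[~np.eye(k, dtype=bool)].mean()   # lower = more diverse

def inter_class_loss(centers):
    """Discriminability term: penalize closeness of class centers.

    centers: (m, d) per-class feature centers.
    """
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    m = len(centers)
    sim = c @ c.T
    return sim[~np.eye(m, dtype=bool)].mean()   # lower = wider class gaps
```

Minimizing both terms simultaneously drives samples apart within a class while spreading the class centers, matching the abstract's description of the intended effect.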

    Too Large; Data Reduction for Vision-Language Pre-Training

    This paper examines the problems of severe image-text misalignment and high redundancy in the widely used large-scale Vision-Language Pre-Training (VLP) datasets. To address these issues, we propose an efficient and straightforward Vision-Language learning algorithm called TL;DR, which aims to compress the existing large VLP data into a small, high-quality set. Our approach consists of two major steps. First, a codebook-based encoder-decoder captioner is developed to select representative samples. Second, a new caption is generated to complement the original captions for selected samples, mitigating the text-image misalignment problem while maintaining uniqueness. As a result, TL;DR enables us to reduce the large dataset into a small set of high-quality data, which can serve as an alternative pre-training dataset. This algorithm significantly speeds up the time-consuming pretraining process. Specifically, TL;DR can compress the mainstream VLP datasets at a high ratio, e.g., reducing the well-cleaned CC3M dataset from 2.82M to 0.67M (~24%) and the noisy YFCC15M from 15M to 2.5M (~16.7%). Extensive experiments with three popular VLP models over seven downstream tasks show that a VLP model trained on the compressed dataset provided by TL;DR can achieve similar or even better results compared with training on the full-scale dataset. The code will be made available at https://github.com/showlab/data-centric.vlp.
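The reported compression ratios follow directly from the dataset sizes quoted in the abstract; a quick arithmetic check:

```python
# Retention ratios from the sizes quoted in the abstract.
cc3m_kept = 0.67 / 2.82    # CC3M: 2.82M -> 0.67M
yfcc_kept = 2.5 / 15.0     # YFCC15M: 15M -> 2.5M

print(round(cc3m_kept * 100, 1))   # ~24% of CC3M retained
print(round(yfcc_kept * 100, 1))   # ~16.7% of YFCC15M retained
```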

    Proto-Tethys magmatic evolution along northern Gondwana: Insights from Late Silurianā€“Middle Devonian A-type magmatism, East Kunlun Orogen, Northern Tibetan Plateau, China

    The East Kunlun Orogen records the geological evolution of the Neoproterozoic – Early Paleozoic Proto-Tethyan Ocean and the Late Paleozoic–Mesozoic Paleo-Tethys Ocean along northern Gondwana. However, the late-stage evolution of the Proto-Tethyan Ocean and the configuration of peri-Gondwana microcontinents during the Silurian – Devonian remain under debate. Here we report new geochronological and geochemical data from A-type granites in the western Wulonggou and eastern Gouli areas of the East Kunlun Orogen to deepen our understanding of these problems. Zircon LA-ICP-MS U–Pb data reveal that the Danshuigou monzogranite and Shenshuitan syenogranite from the western Wulonggou area were emplaced simultaneously at 418 ± 3 Ma, while the Niantang syenogranite from the eastern Gouli area was emplaced at 403 ± 2 Ma. All these rocks display high-K calc-alkaline to shoshonitic and metaluminous to slightly peraluminous signatures, with relatively low CaO, Al2O3, MgO and Sr, and high FeOt/MgO, Ga/Al, Zr, and Nb, indicating their A-type affinity. Their moderate whole-rock εNd(t) (−5.3 to −0.6) and zircon εHf(t) (−6.3 to 6.4) values differ from those of depleted mantle and old basement rocks, but are similar to those of the Ordovician–Silurian granitoids in the East Kunlun Orogen. These chemical signatures, together with the anhydrous, low-pressure and high-temperature characteristics of the magmas, indicate that partial melting of the Ordovician–Silurian granitoids generated these A-type granites. Regionally, these A-type granites and previously reported A-type granites in the East Kunlun Orogen compose a Late Silurian – Middle Devonian A-type granite belt. This belt, together with the regionally coeval molasse formation and mafic-ultramafic rocks, indicates a post-collisional extensional regime for the East Kunlun Orogen during the Late Silurian – Middle Devonian. Given that extensive contemporaneous post-collision-related magmatic rocks have also been revealed in the neighboring West Kunlun, Altyn, Qilian and Qinling blocks/terranes, we contend that the Neoproterozoic – Early Paleozoic Proto-Tethyan Ocean that separated these blocks/terranes from Gondwana had closed by the Late Silurian – Middle Devonian, which resulted in the re-welding of the above blocks/terranes to northern Gondwana or Gondwana-derived microcontinents.

    Cross-National Differences in Victimization: Disentangling the Impact of Composition and Context

    Varying rates of criminal victimization across countries are assumed to be the outcome of country-level structural constraints that determine the supply of motivated offenders, as well as the differential composition within countries of suitable targets and capable guardianship. However, previous empirical tests of these 'compositional' and 'contextual' explanations of cross-national differences have been performed upon macro-level crime data due to the unavailability of comparable individual-level data across countries. This limitation has had two important consequences for cross-national crime research. First, micro-/meso-level mechanisms underlying cross-national differences cannot be truly inferred from macro-level data. Secondly, the effects of contextual measures (e.g. income inequality) on crime are uncontrolled for compositional heterogeneity. In this paper, these limitations are overcome by analysing individual-level victimization data across 18 countries from the International Crime Victims Survey. Results from multi-level analyses on theft and violent victimization indicate that the national level of income inequality is positively related to risk, independent of compositional (i.e. micro- and meso-level) differences. Furthermore, cross-national variation in victimization rates is shaped not only by differences in national context, but also by varying composition. More specifically, countries had higher crime rates the more they consisted of urban residents and regions with low average social cohesion.

    The JCMT BISTRO Survey: A Spiral Magnetic Field in a Hub-filament Structure, Monoceros R2

    We present and analyze observations of polarized dust emission at 850 μm toward the central ~1 pc hub-filament structure of Monoceros R2 (Mon R2). The data were obtained with SCUBA-2/POL-2 on the James Clerk Maxwell Telescope (JCMT) as part of the B-fields in Star-forming Region Observations (BISTRO) survey. The orientations of the magnetic field follow the spiral structure of Mon R2 and are well described by an axisymmetric magnetic field model. We estimate the turbulent component of the magnetic field using the angle difference between our observations and the best-fit model of the underlying large-scale mean magnetic field. This estimate is used to calculate the magnetic field strength using the Davis–Chandrasekhar–Fermi method, for which we also obtain the distribution of volume density and velocity dispersion using a column density map derived from Herschel data and the C18O (J = 3 − 2) data taken with HARP on the JCMT, respectively. We make maps of magnetic field strengths and mass-to-flux ratios, finding that magnetic field strengths vary from 0.02 to 3.64 mG with a mean value of 1.0 ± 0.06 mG, and the mean critical mass-to-flux ratio is 0.47 ± 0.02. Additionally, the mean Alfvén Mach number is 0.35 ± 0.01. This suggests that, in Mon R2, the magnetic fields provide resistance against large-scale gravitational collapse, and the magnetic pressure exceeds the turbulent pressure. We also investigate the properties of each filament in Mon R2. Most of the filaments are aligned along the magnetic field direction and are magnetically subcritical.
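The Davis–Chandrasekhar–Fermi (DCF) estimate mentioned above combines gas density, velocity dispersion, and polarization-angle dispersion into a plane-of-sky field strength. A minimal sketch, using the common practical normalization B[μG] ≈ 9.3 √(n(H2)[cm⁻³]) ΔV[km/s] / σθ[deg] (which folds in the usual correction factor Q ≈ 0.5); the input values below are illustrative and are not the survey's measurements.

```python
import math

def dcf_field_strength(n_h2, dv_fwhm, sigma_theta_deg):
    """Plane-of-sky magnetic field strength via the DCF method, in microgauss.

    n_h2            : H2 volume density in cm^-3
    dv_fwhm         : FWHM velocity linewidth in km/s
    sigma_theta_deg : dispersion of polarization angles in degrees

    Practical normalization (includes Q ~ 0.5); illustrative only.
    """
    return 9.3 * math.sqrt(n_h2) * dv_fwhm / sigma_theta_deg

# Hypothetical dense-gas values, not taken from the paper:
b_pos = dcf_field_strength(n_h2=1e5, dv_fwhm=1.0, sigma_theta_deg=10.0)
print(b_pos)  # a few hundred microgauss, i.e. a few tenths of a mG
```

The dependence is easy to read off: denser gas and broader lines imply a stronger field, while a larger scatter of polarization angles (more turbulent distortion of the field) implies a weaker one.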

    The JCMT BISTRO Survey: Studying the Complex Magnetic Field of L43

    We present observations of polarized dust emission at 850 μm from the L43 molecular cloud, which sits in the Ophiuchus cloud complex. The data were taken using SCUBA-2/POL-2 on the James Clerk Maxwell Telescope as a part of the BISTRO large program. L43 is a dense (N(H₂) ~ 10²²–10²³ cm⁻²) complex molecular cloud with a submillimeter-bright starless core and two protostellar sources. There appears to be an evolutionary gradient along the isolated filament that L43 is embedded within, with the most evolved source closest to the Sco OB2 association. One of the protostars drives a CO outflow that has created a cavity to the southeast. We see a magnetic field that appears to be aligned with the cavity walls of the outflow, suggesting interaction with the outflow. We also find a magnetic field strength of up to ~160 ± 30 μG in the main starless core and up to ~90 ± 40 μG in the more diffuse, extended region. These field strengths give magnetically super- and subcritical values, respectively, and both are found to be roughly trans-Alfvénic. We also present a new method of data reduction for these denser but fainter objects such as starless cores.
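The super-/subcritical distinction above compares the observed mass-to-flux ratio with its critical value. A minimal sketch using the common observational convention λ = 7.6 × 10⁻²¹ N(H₂)[cm⁻²] / B[μG], where λ > 1 is supercritical (gravity can overcome magnetic support) and λ < 1 is subcritical; the column density below is a hypothetical illustration, not a value from the paper.

```python
def mass_to_flux_ratio(n_h2_column, b_uG):
    """Mass-to-flux ratio in units of the critical value.

    n_h2_column : H2 column density in cm^-2
    b_uG        : magnetic field strength in microgauss

    Uses the common observational convention lambda = 7.6e-21 * N(H2) / B;
    illustrative only.
    """
    return 7.6e-21 * n_h2_column / b_uG

# Hypothetical dense-core column with the quoted ~160 uG field:
lam = mass_to_flux_ratio(n_h2_column=1e23, b_uG=160.0)
print(lam > 1)  # supercritical in this illustrative case
```

At fixed field strength, higher column density raises λ, which is why a dense core can be supercritical while the surrounding diffuse material, with a similar or weaker field but much lower column, remains subcritical.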