CEIL: A General Classification-Enhanced Iterative Learning Framework for Text Clustering
Text clustering, as one of the most fundamental challenges in unsupervised
learning, aims at grouping semantically similar text segments without relying
on human annotations. With the rapid development of deep learning, deep
clustering has achieved significant advantages over traditional clustering
methods. Despite their effectiveness, most existing deep text clustering methods
rely heavily on representations pre-trained in general domains, which may not
be the most suitable solution for clustering in specific target domains. To
address this issue, we propose CEIL, a novel Classification-Enhanced Iterative
Learning framework for short text clustering, which aims at generally promoting
the clustering performance by introducing a classification objective to
iteratively improve feature representations. In each iteration, we first adopt
a language model to retrieve the initial text representations, from which the
clustering results are collected using our proposed Category Disentangled
Contrastive Clustering (CDCC) algorithm. After strict data filtering and
aggregation processes, samples with clean category labels are retrieved, which
serve as supervision information to update the language model with the
classification objective via a prompt learning approach. Finally, the updated
language model with improved representation ability is used to enhance
clustering in the next iteration. Extensive experiments demonstrate that the
CEIL framework significantly improves the clustering performance over
iterations, and is generally effective on various clustering algorithms.
Moreover, by incorporating CEIL on CDCC, we achieve state-of-the-art
clustering performance on a wide range of short text clustering benchmarks,
outperforming other strong baseline methods. Comment: The Web Conference 202
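The iteration described in this abstract (embed, cluster, filter clean samples, update the model, repeat) can be sketched as a plain control loop. This is only an illustration of the data flow: the function `iterative_refinement` and its arguments `embed`, `cluster`, `select_clean`, and `fine_tune` are hypothetical placeholders, not names from the paper, and the toy stand-ins in the demo are not CDCC or prompt learning.

```python
from typing import Any, Callable, List, Sequence, Tuple


def iterative_refinement(
    texts: Sequence[str],
    encoder: Any,
    embed: Callable[[Any, Sequence[str]], List[Any]],
    cluster: Callable[[List[Any]], List[int]],
    select_clean: Callable[[List[Any], List[int]], List[Tuple[int, int]]],
    fine_tune: Callable[[Any, List[Tuple[str, int]]], Any],
    n_iterations: int = 3,
) -> Tuple[Any, List[List[int]]]:
    """Alternate clustering and classification-style updates (hypothetical sketch)."""
    history: List[List[int]] = []
    for _ in range(n_iterations):
        # 1. Get text representations from the current model.
        features = embed(encoder, texts)
        # 2. Cluster the representations (the paper uses CDCC; any algorithm fits here).
        labels = cluster(features)
        # 3. Keep only samples whose cluster label is trusted: (index, label) pairs.
        clean = select_clean(features, labels)
        # 4. Update the model using the trusted labels as supervision.
        encoder = fine_tune(encoder, [(texts[i], y) for i, y in clean])
        history.append(list(labels))
    return encoder, history


if __name__ == "__main__":
    # Toy stand-ins, purely to show that the loop runs end to end.
    final_encoder, label_history = iterative_refinement(
        ["a", "bb", "cccc"],
        encoder=0,
        embed=lambda enc, ts: [len(t) + enc for t in ts],
        cluster=lambda fs: [0 if f < 3 else 1 for f in fs],
        select_clean=lambda fs, ls: list(enumerate(ls)),
        fine_tune=lambda enc, pairs: enc + 1,
    )
    print(final_encoder, label_history)
```

Passing each stage in as a callable reflects the abstract's claim that the framework applies to various clustering algorithms: only the `cluster` argument would change.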
Genetic and Molecular Characterization of Flagellar Assembly in Shewanella oneidensis
Shewanella oneidensis is a highly motile organism by virtue of a polar flagellum. Unlike most flagellated bacteria, it contains only one major chromosome segment encoding the components of the flagellum with the exception of the motor proteins. In this region, three genes encode flagellins according to the original genome annotation. However, we find that only flaA and flaB encode functional filament subunits. Although these two genes are under the control of different promoters, they are actively transcribed and subsequently translated, producing a considerable number of flagellin proteins. Additionally, both flagellins are able to interact with their chaperone FliS and are subjected to feedback regulation. Furthermore, FlaA and FlaB are glycosylated by a pathway involving a major glycosylating enzyme, PseB, in spite of the lack of the majority of the consensus glycosylation sites. In conclusion, flagellar assembly in S. oneidensis has novel features despite the conservation of homologous genes across taxa
EACOFT: an energy-aware correlation filter for visual tracking.
Owing to their computation in the frequency domain, correlation-filter-based trackers can locate targets efficiently and at relatively high speed. This characteristic, however, also limits their generalization in some specific scenarios. That they still fail to outperform state-of-the-art (SOTA) trackers is possibly due to two main factors. First, when tracking objects whose energy is lower than that of the background, the tracker may drift or even lose the target. Second, biased samples may inevitably be selected for model training, which can easily lead to inaccurate tracking. To tackle these shortcomings, a novel energy-aware correlation filter (EACOFT) based tracking method is proposed. In our approach, the energy between the foreground and the background is adaptively balanced, which ensures that the target of interest always has a higher energy than its background. Sample quality is also evaluated in real time, which ensures that the samples used for template training are always helpful for tracking. In addition, we propose an optimal combined bottom-up and top-down strategy for template training, which plays an important role in improving both the effectiveness and robustness of tracking. As a result, our approach achieves a substantial improvement over the baseline tracker, especially under the background clutter and fast motion challenges. Extensive experiments over multiple tracking benchmarks demonstrate the superior performance of our proposed methodology in comparison to a number of the SOTA trackers
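For readers unfamiliar with why frequency-domain computation makes these trackers fast, the following is a minimal 1-D sketch of a generic regularized correlation filter (in the spirit of MOSSE-style filters), not the EACOFT method itself. The function names, the regularizer `lam`, and the naive DFT (used only to avoid external dependencies) are assumptions made for illustration.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform; a real tracker would use an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def train_filter(patch: Sequence[float], desired: Sequence[float],
                 lam: float = 1e-3) -> List[complex]:
    """Closed-form filter in the frequency domain: H = G * conj(F) / (|F|^2 + lam)."""
    F = dft(patch)
    G = dft(desired)
    return [g * f.conjugate() / (f * f.conjugate() + lam) for f, g in zip(F, G)]


def respond(filt: Sequence[complex], patch: Sequence[float]) -> List[float]:
    """Apply the filter to a new patch; the response peak marks the target position."""
    Z = dft(patch)
    response = dft([h * z for h, z in zip(filt, Z)], inverse=True)
    return [v.real for v in response]


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    patch = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]         # target centred at index 3
    desired = [math.exp(-0.5 * (t - 3) ** 2) for t in range(8)]  # Gaussian peak at index 3
    filt = train_filter(patch, desired)
    shifted = patch[-2:] + patch[:-2]                          # target moved 2 samples right
    print(argmax(respond(filt, patch)), argmax(respond(filt, shifted)))
```

Training and detection reduce to element-wise products and divisions between transforms, which is the source of the speed advantage; the same circular-shift assumption is also one reason such filters can struggle in cluttered scenes.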
Resampling methods to reduce the selection bias in genetic effect estimation in genome-wide scans
Using the simulated data of Problem 2 for Genetic Analysis Workshop 14 (GAW14), we investigated the ability of three bootstrap-based resampling estimators (a shrinkage, an out-of-sample, and a weighted estimator) to reduce the selection bias for genetic effect estimation in genome-wide linkage scans. For the given marker density in the preliminary genome scans (7 cM for microsatellite and 3 cM for SNP), we found that the two sets of markers produce comparable results in terms of power to detect linkage, localization accuracy, and magnitude of test statistic at the peak location. At the locations detected in the scan, application of the three bootstrap-based estimators substantially reduced the upward selection bias in genetic effect estimation for both true and false positives. The relative effectiveness of the estimators depended on the true genetic effect size and the inherent power to detect it. The shrinkage estimator is recommended when the power to detect the disease locus is low. Otherwise, the weighted estimator is recommended
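The idea behind the three bootstrap-based estimators can be sketched generically: select the top marker in each bootstrap sample, then re-estimate its effect on the individuals left out of that sample, which did not take part in the selection. The code below is an illustrative sketch under stated assumptions, not the estimators as defined in the paper: the effect estimate is a simple column mean, the function names are hypothetical, and the default `weight=0.632` is borrowed from the general bootstrap literature rather than taken from this abstract.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence


def marker_effects(rows: Sequence[Sequence[float]]) -> List[float]:
    """Per-marker effect estimate; here simply the column mean (a stand-in statistic)."""
    return [mean(r[j] for r in rows) for j in range(len(rows[0]))]


def bootstrap_bias_reduction(rows: Sequence[Sequence[float]], n_boot: int = 200,
                             weight: float = 0.632, seed: int = 0) -> Dict[str, float]:
    rng = random.Random(seed)
    n = len(rows)

    # Naive estimate: the effect at the marker that "won" the scan (biased upward).
    effects = marker_effects(rows)
    selected = max(range(len(effects)), key=effects.__getitem__)
    naive = effects[selected]

    in_sample, out_sample = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        drawn = set(idx)
        held_out = [rows[i] for i in range(n) if i not in drawn]
        if not held_out:          # no out-of-bag individuals in this replicate
            continue
        boot_effects = marker_effects([rows[i] for i in idx])
        j = max(range(len(boot_effects)), key=boot_effects.__getitem__)
        in_sample.append(boot_effects[j])                   # selection and estimation together
        out_sample.append(mean(r[j] for r in held_out))     # estimation on unseen individuals

    out_of_sample = mean(out_sample)
    # Shrinkage: subtract the average in-sample vs out-of-sample gap from the naive estimate.
    shrinkage = naive - (mean(in_sample) - out_of_sample)
    # Weighted: blend of the out-of-sample and naive estimates (weight is an assumption).
    weighted = weight * out_of_sample + (1 - weight) * naive
    return {"selected": selected, "naive": naive, "out_of_sample": out_of_sample,
            "shrinkage": shrinkage, "weighted": weighted}


if __name__ == "__main__":
    data_rng = random.Random(1)
    null_data = [[data_rng.gauss(0, 1) for _ in range(20)] for _ in range(40)]
    print(bootstrap_bias_reduction(null_data, n_boot=100))
```

On data with no true effect, the naive estimate at the selected marker is typically well above zero, while the resampling-based estimates are typically pulled back toward it; this is the upward selection bias the abstract refers to.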
Fusion of infrared and visible images for remote detection of low-altitude slow-speed small targets.
Detection of low-altitude, slow-speed, small (LSS) targets is one of the most popular research topics in remote sensing. Despite a few existing approaches, there is still a gap between their accuracy and practical needs. As LSS targets are too small for useful features to be extracted, deep-learning-based algorithms can hardly be used. To this end, we propose in this article an effective strategy for determining the region of interest, using a multiscale layered image fusion method to extract the most representative information for LSS-target detection. In addition, an improved self-balanced sensitivity segment model is proposed to detect the fused LSS target, which can further improve both the detection accuracy and the computational efficiency. We conduct extensive ablation studies to validate the efficacy of the proposed LSS-target detection method on three public datasets and three self-collected datasets. The superior performance over the state of the art demonstrates the efficacy of the proposed approach
Comparison of family-based association tests in chromosome regions selected by linkage-based confidence intervals
We use the Genetic Analysis Workshop 14 simulated data to explore the effectiveness of a two-stage strategy for mapping complex disease loci consisting of an initial genome scan with confidence interval construction for gene location, followed by fine mapping with family-based tests of association on a dense set of single-nucleotide polymorphisms. We considered four types of intervals: the 1-LOD interval, a basic percentile bootstrap confidence interval based on the position of the maximum Zlr score, and asymptotic and bootstrap confidence intervals based on a generalized estimating equations method. For fine mapping we considered two family-based tests of association: a test based on a likelihood ratio statistic and a transmission-disequilibrium-type test implemented in the software FBAT. In two of the simulation replicates, we found that the bootstrap confidence intervals based on the peak Zlr and the 1-LOD support interval always contained the true disease loci and that the likelihood ratio test provided further strong confirmatory evidence of the presence of disease loci in these regions
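Of the four interval types mentioned, the 1-LOD support interval is the simplest to illustrate: it covers the positions around the linkage peak whose LOD score is within one unit of the maximum. The sketch below assumes scores on a discrete marker grid and takes the contiguous run around the peak without interpolating between markers; `one_lod_interval` and its arguments are hypothetical names, and the numbers are made up for illustration.

```python
from typing import Sequence, Tuple


def one_lod_interval(positions: Sequence[float], lod_scores: Sequence[float],
                     drop: float = 1.0) -> Tuple[float, float]:
    """Return the (left, right) positions of the contiguous region around the peak
    where the LOD score stays within `drop` units of the maximum."""
    if not positions or len(positions) != len(lod_scores):
        raise ValueError("positions and lod_scores must be non-empty and equal in length")

    peak = max(range(len(lod_scores)), key=lod_scores.__getitem__)
    threshold = lod_scores[peak] - drop

    # Extend left and right from the peak while the score stays above the threshold.
    left = peak
    while left > 0 and lod_scores[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lod_scores) - 1 and lod_scores[right + 1] >= threshold:
        right += 1
    return positions[left], positions[right]


if __name__ == "__main__":
    positions_cm = [0, 5, 10, 15, 20, 25]          # marker positions in cM (made up)
    lod = [0.2, 1.1, 2.6, 3.0, 2.3, 0.9]           # LOD scores (made up)
    print(one_lod_interval(positions_cm, lod))     # (10, 20)
```

The fine-mapping stage described in the abstract would then restrict the family-based association tests to markers falling inside the returned interval.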
Number of Courses, Content of Coursework, and Prior Achievement as Related to Ethnic Achievement Gaps in Mathematics
This study utilized base-year and second follow-up data from the National Educational Longitudinal Study of 1988 to investigate the relationship between eighth-grade math achievement, mathematics course-taking in high school, and twelfth-grade math achievement. Results suggested the following: 1) Type of coursework can be quantified. 2) Type of coursework was more predictive of achievement than amount. 3) There were substantial ethnic achievement differences prior to high school. 4) Number of courses, type of courses, and prior achievement were not equally predictive of twelfth-grade mathematics achievement across ethnic groups. 5) Prior achievement did not equally predict course-taking over ethnic groups in amount or type. 6) Closing ethnic achievement gaps will be a function of efforts taken before high school as well as high school coursework
- …