Computational Approaches to Measuring the Similarity of Short Contexts: A Review of Applications and Methods
Measuring the similarity of short written contexts is a fundamental problem
in Natural Language Processing. This article provides a unifying framework by
which short context problems can be categorized both by their intended
application and proposed solution. The goal is to show that various problems
and methodologies that appear quite different on the surface are in fact very
closely related. The axes by which these categorizations are made include the
format of the contexts (headed versus headless), the way in which the contexts
are to be measured (first-order versus second-order similarity), and the
information used to represent the features in the contexts (micro versus macro
views). The unifying thread that binds together many short context applications
and methods is the fact that similarity decisions must be made between contexts
that share few (if any) words in common.
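To make the distinction between first-order and second-order similarity concrete, here is a minimal sketch, not taken from the article: first-order similarity compares two contexts by the words they themselves contain, while second-order similarity compares them through co-occurrence vectors of their words, so contexts sharing no words can still score as similar. The toy co-occurrence counts and example contexts are illustrative assumptions.

```python
from collections import Counter
import math

def cosine(a, b):
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def first_order(ctx_a, ctx_b):
    # Contexts are compared directly by the words they contain.
    return cosine(Counter(ctx_a.split()), Counter(ctx_b.split()))

def second_order(ctx, cooc):
    # Each context is replaced by the sum of its words' co-occurrence
    # vectors (normalization is irrelevant for cosine similarity).
    vec = Counter()
    for w in ctx.split():
        vec.update(cooc.get(w, {}))
    return vec

# Toy co-occurrence counts harvested from some background corpus (assumed).
cooc = {
    "physician": {"hospital": 3, "patient": 5},
    "doctor":    {"hospital": 4, "patient": 6},
    "saw":       {"patient": 1, "film": 1},
}
a, b = "physician saw", "doctor saw"
print(first_order(a, b))                                      # moderate: one shared word
print(cosine(second_order(a, cooc), second_order(b, cooc)))   # high: similar co-occurrence profiles
```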
AMLP: Adaptive Masking Lesion Patches for Self-supervised Medical Image Segmentation
Self-supervised masked image modeling has shown promising results on natural
images. However, directly applying such methods to medical images remains
challenging. This difficulty stems from the complexity and distinct
characteristics of lesions compared to natural images, which impedes effective
representation learning. Additionally, the high fixed masking ratios used by
conventional methods restrict the reconstruction of fine lesion details,
limiting the scope of learnable
information. To tackle these limitations, we propose a novel self-supervised
medical image segmentation framework, Adaptive Masking Lesion Patches (AMLP).
Specifically, we design a Masked Patch Selection (MPS) strategy to identify and
focus learning on patches containing lesions. Lesion regions are scarce yet
critical, making their precise reconstruction vital. To reduce
misclassification of lesion and background patches caused by unsupervised
clustering in MPS, we introduce an Attention Reconstruction Loss (ARL) to focus
on hard-to-reconstruct patches likely depicting lesions. We further propose a
Category Consistency Loss (CCL) to refine patch categorization based on
reconstruction difficulty, strengthening distinction between lesions and
background. Moreover, we develop an Adaptive Masking Ratio (AMR) strategy that
gradually increases the masking ratio to expand reconstructible information and
improve learning. Extensive experiments on two medical segmentation datasets
demonstrate AMLP's superior performance compared to existing self-supervised
approaches. The proposed strategies effectively address the limitations of
applying masked modeling to medical images and are tailored to capturing the
fine lesion details vital for segmentation tasks.
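The following is a minimal PyTorch-style sketch of two of the ideas named in the abstract: an Adaptive Masking Ratio (AMR) schedule that grows over training, and an Attention Reconstruction Loss (ARL) that up-weights hard-to-reconstruct patches. The linear schedule, its bounds, and the error-based weighting are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def adaptive_masking_ratio(epoch, total_epochs, start=0.3, end=0.75):
    # Gradually increase the fraction of masked patches as training proceeds.
    t = epoch / max(total_epochs - 1, 1)
    return start + (end - start) * t

def attention_reconstruction_loss(pred, target):
    # Per-patch L2 error; patches with larger error (likely lesions)
    # receive proportionally larger weights.
    per_patch = ((pred - target) ** 2).mean(dim=-1)                 # (B, N)
    err = per_patch.detach()
    weights = err / (err.sum(dim=1, keepdim=True) + 1e-8)
    return (weights * per_patch).sum(dim=1).mean()

pred = torch.randn(2, 16, 64)     # (batch, patches, patch_dim)
target = torch.randn(2, 16, 64)
print(adaptive_masking_ratio(epoch=5, total_epochs=100))
print(attention_reconstruction_loss(pred, target))
```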
Empirical studies on word representations
One of the most fundamental tasks in natural language processing is representing words with mathematical objects (such as vectors). These word representations, which are most often estimated from data, capture the meaning of words. They enable comparing words according to their semantic similarity, and have been shown to work extremely well when included in complex real-world applications. A large part of our work deals with ways of estimating word representations directly from large quantities of text. Our methods exploit the idea that words which occur in similar contexts have a similar meaning. How we define the context is an important focus of our thesis. The context can consist of a number of words to the left and to the right of the word in question, but, as we show, obtaining context words via syntactic links (such as the link between a verb and its subject) often works better. We furthermore investigate word representations that accurately capture multiple meanings of a single word. We show that the translation of a word in context contains information that can be used to disambiguate the meaning of that word.
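A minimal sketch of the distributional idea described in the abstract: words that occur in similar contexts receive similar vectors. The toy corpus, the window size of 2, and the use of raw counts (no weighting or syntactic links) are illustrative assumptions.

```python
from collections import Counter, defaultdict
import math

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]

window = 2
vectors = defaultdict(Counter)
for sentence in corpus:
    for i, word in enumerate(sentence):
        # Context = up to `window` words on each side of the target word.
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if j != i:
                vectors[word][sentence[j]] += 1

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "cat" and "dog" share contexts ("sat", "on", "the"), so they come out similar.
print(cosine(vectors["cat"], vectors["dog"]))
```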
Web Spam Detection Using Fuzzy Clustering
The Internet is the most widespread medium for expressing views and ideas, and a lucrative platform for delivering products. For this purpose, search engines play a key role. Information about web pages is stored in a search engine's index database for use in later queries. Web spam refers to a host of techniques that challenge the ranking algorithms of web search engines and cause them to rank certain web pages higher, or serve some other purpose beneficial to the spammer. Web spam irritates web surfers, causes disruption, and degrades the quality of web search results. In this paper, we present an efficient clustering method to detect spam web pages effectively and accurately. We also employ various validation measures to validate our work with the clustering methods. The comparisons between the obtained charts and the validation results clearly show that the presented approach produces better results.
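As a minimal NumPy sketch of the kind of fuzzy clustering the abstract refers to, here is a plain fuzzy c-means implementation; whether the paper uses exactly this algorithm, and the toy per-page features, number of clusters c=2, and fuzzifier m=2, are all assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)            # membership rows sum to 1
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-9
        # Standard FCM membership update from inverse distances.
        U = 1.0 / (dist ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Toy per-page features: [outlink ratio, keyword density]; spammy pages score high on both.
X = np.array([[0.10, 0.20], [0.15, 0.25], [0.90, 0.80], [0.85, 0.95]])
centers, U = fuzzy_c_means(X)
print(np.round(U, 2))   # soft memberships; a page may partly belong to both clusters
```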