QUERY-SPECIFIC SUBTOPIC CLUSTERING IN RESPONSE TO BROAD QUERIES
Information Retrieval (IR) refers to obtaining valuable and relevant information from various sources in response to a specific information need. For the textual domain, the most common form of information source is a collection of textual documents, or text corpus. Depending on the scope of the information need, also referred to as the query, the relevant information can span a wide range of topical themes. Hence, the relevant information may often be scattered across multiple documents in the corpus, each of which satisfies the information need to a different degree. Traditional IR systems present the relevant set of documents in the form of a ranking where the rank of a particular document corresponds to its degree of relevance to the query.
If the query is sufficiently specific, the set of relevant documents will be more or less about similar topics. However, they will be much more topically diverse when the query is vague or about a generalized topic, e.g., "Computer Science". In such cases, multiple documents may be of equal importance as each represents a specific facet of the broad topic of the query. Consider, for example, documents related to information retrieval and machine learning for the query "Computer Science". In this case, the decision of how to rank documents from these two subtopics relative to each other would be ambiguous. Instead, presenting the retrieved results as clusters of documents, where each cluster represents one subtopic, would be more appropriate. Subtopic clustering of search results has been explored in the domain of Web search, where users receive relevant clusters of search results in response to their query.
This thesis explores query-specific subtopic clustering that incorporates queries into the clustering framework. We develop a query-specific similarity metric that governs a hierarchical clustering algorithm. The similarity metric is trained to predict whether a pair of relevant documents should also share the same subtopic cluster in the context of the query. Our empirical study shows that direct involvement of the query in the clustering model significantly improves the clustering performance over a state-of-the-art neural approach on two publicly available datasets. Further qualitative studies provide insights into the strengths and limitations of our proposed approach.
In addition to query-specific similarity metrics, this thesis also explores a new supervised clustering paradigm that directly optimizes for a clustering metric. Because clustering metrics are discrete functions, existing approaches for supervised clustering find it difficult to use them as optimization objectives. We propose a scalable training strategy for document embedding models that directly optimizes for the Rand index, a clustering quality metric. Our method outperforms a strong neural approach and other unsupervised baselines on two publicly available datasets. This suggests that optimizing directly for the clustering outcome indeed yields better document representations suitable for clustering.
This thesis also studies the generalizability of our findings by incorporating the query-specific clustering approach and our clustering metric-based optimization technique into a single end-to-end supervised clustering model. We also extend our methods to different clustering algorithms to show that our approaches do not depend on any specific clustering algorithm. Such a generalized query-specific clustering model could help to revolutionize the way digital information is organized, archived, and presented to the user in a context-aware manner.
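The Rand index mentioned in the abstract above is a pair-counting measure: it is the fraction of item pairs on which two clusterings agree (both place the pair together, or both place it apart). The following is a minimal sketch of the standard definition only, not of the thesis's training strategy; the function name `rand_index` and the toy labels are illustrative assumptions. It also shows why the metric is discrete: its value changes only when a pair flips between "same cluster" and "different cluster".

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Return the fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two items
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        # Fewer than two items: no pair can disagree.
        return 1.0
    agree = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agree += 1
    return agree / len(pairs)


# Hypothetical toy example: four documents, two gold subtopics.
gold = [0, 0, 1, 1]
predicted = [0, 0, 1, 2]  # the last document is split off into its own cluster
score = rand_index(gold, predicted)
```

In this toy case five of the six document pairs are treated consistently, so the score is 5/6. Cluster identifiers themselves do not matter, only which items are grouped together.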
Energy-Efficient Resource Allocation Optimization for Multimedia Heterogeneous Cloud Radio Access Networks
The heterogeneous cloud radio access network (H-CRAN) is a promising paradigm
which incorporates cloud computing into heterogeneous networks (HetNets),
thereby taking full advantage of cloud radio access networks (C-RANs) and
HetNets. Characterizing the cooperative beamforming with fronthaul capacity and
queue stability constraints is critical for multimedia applications to
improve energy efficiency (EE) in H-CRANs. An energy-efficient optimization
objective function with individual fronthaul capacity and inter-tier
interference constraints is presented in this paper for queue-aware multimedia
H-CRANs. To solve this non-convex objective function, a stochastic optimization
problem is reformulated by introducing the general Lyapunov optimization
framework. Under the Lyapunov framework, this optimization problem is
equivalent to an optimal network-wide cooperative beamformer design algorithm
with instantaneous power, average power and inter-tier interference
constraints, which can be regarded as the weighted sum EE maximization problem
and solved by a generalized weighted minimum mean square error approach. The
mathematical analysis and simulation results demonstrate that a tradeoff
between EE and queuing delay can be achieved, and this tradeoff strictly
depends on the fronthaul constraint.
Is Evolution an Algorithm? Effects of local entropy in unsupervised learning and protein evolution
The abstract is in the attachment.
Novel Class Discovery for Long-tailed Recognition
While novel class discovery has recently made great progress, existing
methods typically focus on improving algorithms on class-balanced benchmarks.
However, in real-world recognition tasks, the class distributions of their
corresponding datasets are often imbalanced, which leads to serious performance
degeneration of those methods. In this paper, we consider a more realistic
setting for novel class discovery where the distributions of novel and known
classes are long-tailed. One main challenge of this new problem is to discover
imbalanced novel classes with the help of long-tailed known classes. To tackle
this problem, we propose an adaptive self-labeling strategy based on an
equiangular prototype representation of classes. Our method infers high-quality
pseudo-labels for the novel classes by solving a relaxed optimal transport
problem and effectively mitigates the class biases in learning the known and
novel classes. We perform extensive experiments on CIFAR100, ImageNet100,
Herbarium19 and large-scale iNaturalist18 datasets, and the results demonstrate
the superiority of our method. Our code is available at
https://github.com/kleinzcy/NCDLR. Comment: TMLR 2023, final version
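The abstract above describes inferring pseudo-labels by solving a relaxed optimal transport problem. As a rough illustration of the general idea only, the sketch below uses the standard balanced, entropy-regularized formulation solved with Sinkhorn iterations, which is a simpler stand-in and not the paper's relaxed formulation. The function name, the toy scores, the class marginals, and the values of `epsilon` and `n_iters` are all assumptions made for this example.

```python
import math


def sinkhorn_pseudo_labels(scores, class_marginals, epsilon=0.1, n_iters=200):
    """Soft-assign n samples to k classes under a fixed class distribution.

    scores[i][j] is a similarity between sample i and class j. The returned
    plan has total mass 1, columns that sum to class_marginals, and rows
    that approach 1/n as the iterations converge.
    """
    n, k = len(scores), len(scores[0])
    # Gibbs kernel: higher similarity gives a larger entry.
    kernel = [[math.exp(s / epsilon) for s in row] for row in scores]
    row_target = 1.0 / n
    v = [1.0] * k
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [row_target / sum(kernel[i][j] * v[j] for j in range(k))
             for i in range(n)]
        v = [class_marginals[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(k)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]


# Hypothetical toy input: four samples, two classes, uniform class marginals.
toy_scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8]]
plan = sinkhorn_pseudo_labels(toy_scores, class_marginals=[0.5, 0.5])
# Normalizing each row turns the plan into per-sample pseudo-label distributions.
pseudo_labels = [[p / sum(row) for p in row] for row in plan]
```

The column constraint is what keeps the assignment from collapsing onto the majority class: even though three of the four toy samples prefer the first class, half of the total mass must go to the second. In a long-tailed setting the class marginals are not known in advance, which is presumably one motivation for relaxing this constraint.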
Partitioning predictors in multivariate regression models
A Multivariate Regression Model Based on the Optimal Partition of Predictors (MRBOP), useful in applications with strongly correlated predictors, is presented. Classes of strongly correlated predictors are synthesized by latent factors, which are obtained through an appropriate linear combination of the original variables and are forced to be weakly correlated. Specifically, the proposed model assumes that each latent factor is determined by a subset of predictors that characterizes only that factor. MRBOP is formalized in a least squares framework, optimizing a penalized quadratic objective function through an alternating least-squares (ALS) algorithm. The performance of the methodology is evaluated on simulated and real data sets. © 2013 Springer Science+Business Media New York
You Only Condense Once: Two Rules for Pruning Condensed Datasets
Dataset condensation is a crucial tool for enhancing training efficiency by
reducing the size of the training dataset, particularly in on-device scenarios.
However, these scenarios have two significant challenges: 1) the varying
computational resources available on the devices require a dataset size
different from the pre-defined condensed dataset, and 2) the limited
computational resources often preclude the possibility of conducting additional
condensation processes. We introduce You Only Condense Once (YOCO) to overcome
these limitations. On top of one condensed dataset, YOCO produces smaller
condensed datasets with two embarrassingly simple dataset pruning rules: Low
LBPE Score and Balanced Construction. YOCO offers two key advantages: 1) it can
flexibly resize the dataset to fit varying computational constraints, and 2) it
eliminates the need for extra condensation processes, which can be
computationally prohibitive. Experiments validate our findings on networks
including ConvNet, ResNet and DenseNet, and datasets including CIFAR-10,
CIFAR-100 and ImageNet. For example, our YOCO surpassed various dataset
condensation and dataset pruning methods on CIFAR-10 with ten Images Per Class
(IPC), achieving 6.98-8.89% and 6.31-23.92% accuracy gains, respectively. The
code is available at: https://github.com/he-y/you-only-condense-once. Comment: Accepted by NeurIPS 202
Resource Allocation in Uplink NOMA-IoT Networks: A Reinforcement-Learning Approach
Non-orthogonal multiple access (NOMA) exploits the potential of the power domain to enhance the connectivity for the Internet of Things (IoT). Due to time-varying communication channels, dynamic user clustering is a promising method to increase the throughput of NOMA-IoT networks. This paper develops an intelligent resource allocation scheme for uplink NOMA-IoT communications. To maximise the average performance of sum rates, this work designs an efficient optimization approach based on two reinforcement learning algorithms, namely deep reinforcement learning (DRL) and SARSA-learning. For light traffic, SARSA-learning is used to explore the safest resource allocation policy with low cost. For heavy traffic, DRL is used to handle the large number of variables introduced by the traffic. With the aid of the considered approach, this work addresses two main problems of fair resource allocation in NOMA techniques: 1) allocating users dynamically and 2) balancing resource blocks and network traffic. We analytically demonstrate that the rate of convergence is inversely proportional to network size. Numerical results show that: 1) compared with the optimal benchmark scheme, the proposed DRL and SARSA-learning algorithms have lower complexity with acceptable accuracy and 2) NOMA-enabled IoT networks outperform conventional orthogonal multiple access based IoT networks in terms of system throughput.
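For readers unfamiliar with SARSA-learning, the sketch below shows the textbook tabular SARSA update, in which the value of a state-action pair is moved toward the observed reward plus the discounted value of the action actually taken next. It is a generic illustration, not the paper's scheme: the abstract does not specify how states, actions, or rewards are defined, so the string labels, the function name `sarsa_update`, and the values of `alpha` and `gamma` are assumptions.

```python
def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """Apply one on-policy SARSA update to the Q-table `q` in place.

    q maps (state, action) pairs to value estimates; unseen pairs count as 0.
    Returns the updated estimate for (state, action).
    """
    current = q.get((state, action), 0.0)
    # On-policy target: uses the action the agent actually takes next.
    target = reward + gamma * q.get((next_state, next_action), 0.0)
    q[(state, action)] = current + alpha * (target - current)
    return q[(state, action)]


# Hypothetical transition: the labels stand in for a channel/traffic state
# and a resource-block assignment; the reward stands in for an achieved rate.
q_table = {}
first = sarsa_update(q_table, "s0", "a0", reward=1.0,
                     next_state="s1", next_action="a1", alpha=0.5)
```

Because the target uses the next action chosen by the current policy rather than the greedy maximum, SARSA tends to learn more conservative policies than Q-learning, which is consistent with the abstract's description of it exploring the safest allocation policy.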