Neural Ranking Models with Weak Supervision
Despite the impressive improvements achieved by unsupervised deep neural
networks in computer vision and NLP tasks, such improvements have not yet been
observed in ranking for information retrieval. The reason may be the complexity
of the ranking problem, as it is not obvious how to learn from queries and
documents when no supervised signal is available. Hence, in this paper, we
propose to train a neural ranking model using weak supervision, where labels
are obtained automatically without human annotators or any external resources
(e.g., click data). To this aim, we use the output of an unsupervised ranking
model, such as BM25, as a weak supervision signal. We further train a set of
simple yet effective ranking models based on feed-forward neural networks. We
study their effectiveness under various learning scenarios (point-wise and
pair-wise models) and using different input representations (i.e., from
encoding query-document pairs into dense/sparse vectors to using word embedding
representation). We train our networks using tens of millions of training
instances and evaluate them on two standard collections: a homogeneous news
collection (Robust) and a heterogeneous large-scale web collection (ClueWeb).
Our experiments indicate that employing proper objective functions and letting
the networks learn the input representation based on weakly supervised data
leads to impressive performance, with over 13% and 35% MAP improvements over
the BM25 model on the Robust and the ClueWeb collections. Our findings also
suggest that supervised neural ranking models can greatly benefit from
pre-training on large amounts of weakly labeled data that can be easily
obtained from unsupervised IR models.
Comment: In proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017).
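The core idea of the abstract — training a ranker pair-wise against the ordering induced by an unsupervised scorer such as BM25 — can be sketched in a few lines. This is a minimal illustration, not the paper's actual model: the data, feature dimensions, and the linear scoring function are all stand-ins, and the weak labels are simulated rather than produced by a real BM25 implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each query-document pair is a dense feature vector, and the weak
# label is the score an unsupervised ranker (e.g. BM25) assigned to that pair.
# All names and dimensions here are illustrative.
X = rng.normal(size=(1000, 16))                 # query-document representations
true_w = rng.normal(size=16)
bm25_scores = X @ true_w + 0.1 * rng.normal(size=1000)  # stand-in weak labels

def pairwise_hinge_loss(w, xi, xj, margin=1.0):
    """Hinge loss for a pair where xi should outrank xj under the weak signal."""
    return max(0.0, margin - (xi - xj) @ w)

# Pair-wise training: sample pairs, order them by the weak label, and push the
# model score of the weakly-preferred document above the other by a margin.
w = np.zeros(16)
lr = 0.01
for step in range(2000):
    i, j = rng.integers(0, 1000, size=2)
    if bm25_scores[i] == bm25_scores[j]:
        continue
    if bm25_scores[i] < bm25_scores[j]:
        i, j = j, i                              # i becomes the preferred doc
    if pairwise_hinge_loss(w, X[i], X[j]) > 0.0:
        w += lr * (X[i] - X[j])                  # subgradient step on the hinge

# The learned ranker should largely agree with the weak supervision signal.
agree = np.mean((X[:200] @ w > X[200:400] @ w)
                == (bm25_scores[:200] > bm25_scores[200:400]))
print(round(agree, 2))
```

The point of the pair-wise objective, as the abstract notes, is that the model only needs the *relative* ordering from the weak labeler, not its absolute scores, which makes noisy unsupervised signals usable as training data.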
Optimizing Vision Transformers for Medical Image Segmentation
For medical image semantic segmentation (MISS), Vision Transformers have
emerged as strong alternatives to convolutional neural networks thanks to their
inherent ability to capture long-range correlations. However, existing research
uses off-the-shelf vision Transformer blocks based on linear projections and
feature processing which lack spatial and local context to refine organ
boundaries. Furthermore, Transformers do not generalize well on small medical
imaging datasets and rely on large-scale pre-training due to limited inductive
biases. To address these problems, we present the design of a compact and
accurate Transformer network for MISS, CS-Unet, which introduces convolutions
in a multi-stage design to hierarchically enhance the spatial and local
modeling ability of Transformers. This is mainly achieved by our well-designed
Convolutional Swin Transformer (CST) block which merges convolutions with
Multi-Head Self-Attention and Feed-Forward Networks to provide inherent
localized spatial context and inductive biases. Experiments demonstrate that,
without pre-training, CS-Unet outperforms its counterparts by large margins on
multi-organ and cardiac datasets with fewer parameters, achieving
state-of-the-art performance. Our code is available on GitHub.
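The structural idea described here — a block that combines convolution (local context) with self-attention (long-range correlations) — can be sketched in plain NumPy. This is a rough illustration of the pattern, not the actual CST block: the real CS-Unet uses windowed (Swin-style) multi-head attention, learned normalization, and trained weights, whereas everything below is single-head, random-weight, and unwindowed.

```python
import numpy as np

rng = np.random.default_rng(1)

def depthwise_conv3x3(x):
    """Depth-wise 3x3 convolution (zero padding) over an H x W x C feature map."""
    H, W, C = x.shape
    k = rng.normal(scale=0.1, size=(3, 3, C))   # one 3x3 kernel per channel
    pad = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(pad[i:i + 3, j:j + 3] * k, axis=(0, 1))
    return out

def self_attention(x):
    """Single-head self-attention over the flattened spatial positions."""
    H, W, C = x.shape
    tokens = x.reshape(H * W, C)
    Wq, Wk, Wv = (rng.normal(scale=0.1, size=(C, C)) for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(C)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)    # row-wise softmax
    return (attn @ v).reshape(H, W, C)

def cst_like_block(x):
    # Convolution first injects local spatial context and an inductive bias,
    # then attention models long-range correlations; residuals keep it stable.
    x = x + depthwise_conv3x3(x)
    x = x + self_attention(x)
    return x

feat = rng.normal(size=(8, 8, 16))              # a small feature map
out = cst_like_block(feat)
print(out.shape)
```

The design choice the abstract argues for is visible even at this scale: the convolutional path operates on spatial neighborhoods directly, which pure linear-projection Transformer blocks lack.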
Distributed Algorithms in Large-scaled Empirical Risk Minimization: Non-convexity, Adaptive-sampling, and Matrix-free Second-order Methods
The rising amount of data has changed classical approaches to statistical modeling significantly. Special methods have been designed to infer meaningful relationships and hidden patterns from these large datasets, forming the foundation of the field of Machine Learning (ML). Such ML techniques have already been applied widely in various areas and have achieved compelling success. At the same time, the huge amount of data also demands a deep revolution of current techniques, including the availability of advanced data storage, new efficient large-scale algorithms, and their distributed/parallelized implementation.

A broad class of ML methods can be interpreted as Empirical Risk Minimization (ERM) problems. By choosing appropriate loss functions and, where necessary, regularization terms, one can pursue specific ML goals by solving ERMs as separable finite-sum optimization problems. In some settings a nonconvex component enters the ERM, which usually makes the problem hard to optimize. In particular, in recent years neural networks, a popular branch of ML, have drawn considerable attention from the community. Neural networks are powerful and highly flexible models inspired by the structured functionality of the brain, and they can typically be treated as large-scale, highly nonconvex ERMs.

As nonconvex ERMs become more complex and larger in scale, optimization with stochastic gradient descent (SGD) type methods proceeds slowly, both in its convergence rate and in its limited ability to be distributed efficiently. This motivates the exploration of more advanced local optimization methods such as approximate-Newton/second-order methods.

In this dissertation, first-order stochastic optimization for regularized ERMs is studied in Chapter 1. Building on the development of the stochastic dual coordinate ascent (SDCA) method, a dual-free SDCA with a non-uniform mini-batch sampling strategy is investigated [30, 29].
We also introduce several efficient algorithms for training ERMs, including neural networks, using second-order optimization methods in a distributed environment. In Chapter 2, we propose a practical distributed implementation of Newton-CG methods, which makes training neural networks with second-order methods feasible in a distributed environment [28]. In Chapter 3, we take further steps toward training feed-forward neural networks with second-order methods, exploiting negative curvature directions and momentum acceleration; in this chapter, we also report numerical experiments comparing second-order and first-order methods for training neural networks. Chapter 4 proposes a distributed accumulative sample-size second-order method for solving large-scale convex ERMs and nonconvex neural networks [35]. In Chapter 5, a Python library named UCLibrary for solving unconstrained optimization problems is briefly introduced. Chapter 6 concludes the dissertation.
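The matrix-free second-order machinery the dissertation revolves around — Newton-CG, where the Newton system is solved by conjugate gradients using only Hessian-vector products — can be sketched on a small convex ERM. This is a generic single-machine sketch of the Newton-CG idea on regularized logistic regression, not the dissertation's distributed algorithms; the problem, dimensions, and iteration counts are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy ERM: l2-regularized logistic regression on synthetic data.
n, d, lam = 500, 10, 1e-2
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) > 0).astype(float)

def loss(w):
    z = X @ w
    return np.mean(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * (w @ w)

def grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))          # sigmoid predictions
    return X.T @ (p - y) / n + lam * w

def hess_vec(w, v):
    """Matrix-free Hessian-vector product: never forms the d x d Hessian."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p * (1 - p) * (X @ v)) / n + lam * v

def cg(hv, b, iters=50, tol=1e-10):
    """Conjugate gradients for H x = b, using only the product x -> H x."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = hv(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Newton-CG outer loop: each step solves H s = -g approximately via CG.
w = np.zeros(d)
for _ in range(10):
    w += cg(lambda v: hess_vec(w, v), -grad(w))
print(round(loss(w), 4))
```

Because only `hess_vec` touches second-order information, the per-iteration cost stays at a few matrix-vector products — which is also what makes this family of methods amenable to the distributed implementations the chapters describe, since those products decompose over data partitions.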
Design and implementation of image based object recognition
The aim of this paper is to design and implement image-based object recognition. This remains a challenge even for advanced object recognition systems. A practical example of this issue is the recognition of objects in images: a task that humans perform very well, but that machine systems have historically struggled with. A pre-trained AlexNet model was used for training because of its straightforward architecture, applied to the large-scale "Cifar-10" dataset using MATLAB R2019a. The dataset was split 70% for training and 30% for testing. This setting has prompted experimentation with convolutional neural network architectures as well as new algorithms to train them. This paper presents an approach to training networks so as to improve their robustness in recognizing object images in MATLAB R2019a. The training strategy is then evaluated on the designed AlexNet network architecture. The result of the study was that the training algorithm could improve robustness to different images at the expense of some performance on object images (e.g., dog, frog, deer, automobile, airplane), while maintaining high recognition accuracy. When the advantages of the different architectures were evaluated, the accuracy of object recognition was found to be around 98% across classes. This is consistent with findings from classical object recognition that feed-forward neural networks can perform well, with high recognition accuracy.
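The evaluation protocol the abstract describes — take representations from a pre-trained backbone, split the data 70/30, train a simple classifier, and report test accuracy — can be sketched independently of MATLAB or AlexNet. This is a schematic stand-in only: the "features" below are simulated class-clustered vectors rather than real AlexNet activations on CIFAR-10, and the classifier is a nearest-class-mean rule chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for features extracted by a pre-trained backbone: 10 classes,
# each a Gaussian cluster in feature space. Dimensions are illustrative.
n_per_class, d, n_classes = 100, 32, 10
centers = rng.normal(scale=2.0, size=(n_classes, d))
X = np.vstack([c + rng.normal(size=(n_per_class, d)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

# 70/30 train/test split, as described in the abstract.
idx = rng.permutation(len(X))
cut = int(0.7 * len(X))
train, test = idx[:cut], idx[cut:]

# Nearest-class-mean classifier fitted on the training portion.
means = np.stack([X[train][y[train] == c].mean(axis=0)
                  for c in range(n_classes)])
pred = np.argmin(((X[test][:, None] - means) ** 2).sum(-1), axis=1)
acc = (pred == y[test]).mean()
print(round(acc, 2))
```

The general lesson mirrors the abstract's: once a pre-trained network supplies well-separated features, even a very simple classifier on a held-out 30% split reaches high accuracy; the hard part is the representation, not the final decision rule.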