795 research outputs found
Lung_PAYNet: a pyramidal attention based deep learning network for lung nodule segmentation
Accurate and reliable lung nodule segmentation in computed tomography (CT) images is required for early diagnosis of lung cancer. Some of the difficulties in detecting lung nodules include the various types and shapes of lung nodules, lung nodules near other lung structures, and similar visual aspects. This study proposes a new model named Lung_PAYNet, a pyramidal attention-based architecture, for improved lung nodule segmentation in low-dose CT images. In this architecture, the encoder and decoder are designed using an inverted residual block and swish activation function. It also employs a feature pyramid attention network between the encoder and decoder to extract exact dense features for pixel classification. The proposed architecture was compared to the existing UNet architecture, and the proposed methodology yielded significant results. The proposed model was comprehensively trained and validated using the LIDC-IDRI dataset available in the public domain. The experimental results revealed that the Lung_PAYNet delivered remarkable segmentation with a Dice similarity coefficient of 95.7%, mIOU of 91.75%, sensitivity of 92.57%, and precision of 96.75%
A practical vision system for the detection of moving objects
The main goal of this thesis is to review and offer robust and efficient algorithms for the detection (or the segmentation) of foreground objects in indoor and outdoor scenes using colour image sequences captured by a stationary camera. For this purpose, the block diagram of a simple vision system is offered in Chapter 2. First this block diagram gives the idea of a precise order of blocks and their tasks, which should be performed to detect moving foreground objects. Second, a check mark () on the top right corner of a block indicates that this thesis contains a review of the most recent algorithms and/or some relevant research about it.
In many computer vision applications, segmenting and extraction of moving objects in video sequences is an essential task. Background subtraction has been widely used for this purpose as the first step.
In this work, a review of the efficiency of a number of important background subtraction and modelling algorithms, along with their major features, are presented. In addition, two background approaches are offered. The first approach is a Pixel-based technique whereas the second one works at object level. For each approach, three algorithms are presented. They are called Selective Update Using Non-Foreground Pixels of the Input Image , Selective Update Using Temporal Averaging and Selective Update Using Temporal Median , respectively in this thesis. The first approach has some deficiencies, which makes it incapable to produce a correct dynamic background. Three methods of the second approach use an invariant colour filter and a suitable motion tracking technique, which selectively exclude foreground objects (or blobs) from the background frames. The difference between the three algorithms of the second approach is in updating process of the background pixels. It is shown that the Selective Update Using Temporal Median method produces the correct background image for each input frame.
Representing foreground regions using their boundaries is also an important task. Thus, an appropriate RLE contour tracing algorithm has been implemented for this purpose. However, after the thresholding process, the boundaries of foreground regions often have jagged appearances. Thus, foreground regions may not correctly be recognised reliably due to their corrupted boundaries. A very efficient boundary smoothing method based on the RLE data is proposed in Chapter 7. It just smoothes the external and internal boundaries of foreground objects and does not distort the silhouettes of foreground objects. As a result, it is very fast and does not blur the image.
Finally, the goal of this thesis has been presenting simple, practical and efficient algorithms with little constraints which can run in real time
Automatic interpretation of clock drawings for computerised assessment of dementia
The clock drawing test (CDT) is a standard neurological test for detection of cognitive impairment. A computerised version of the test has potential to improve test accessibility and accuracy. CDT sketch interpretation is one of the first stages in the analysis of the computerised test. It produces a set of recognised digits and symbols together with their positions on the clock face. Subsequently, these are used in the test scoring. This is a challenging problem because the average CDT taker has a high likelihood of cognitive impairment, and writing is one of the first functional activities to be affected. Current interpretation systems perform less well on this kind of data due to its unintelligibility. In this thesis, a novel automatic interpretation system for CDT sketch is proposed and developed. The proposed interpretation system and all the related algorithms developed in this thesis are evaluated using a CDT data set collected for this study. This data consist of two sets, the first set consisting of 65 drawings made by healthy people, and the second consisting of 100 drawings reproduced from drawings of dementia patients.
This thesis has four main contributions. The first is a conceptual model of the proposed CDT sketch interpretation system based on integrating prior knowledge of the expected CDT sketch structure and human reasoning into the drawing interpretation system. The second is a novel CDT sketch segmentation algorithm based on supervised machine learning and a new set of temporal and spatial features automatically extracted from the CDT data. The evaluation of the proposed method shows that it outperforms the current state-of-the-art method for CDT drawing segmentation. The third contribution is a new
v
handwritten digit recognition algorithm based on a set of static and dynamic features extracted from handwritten data. The algorithm combines two classifiers, fuzzy k-nearest neighbour’s classifier with a Convolutional Neural Network (CNN), which take advantage both of static and dynamic data representation. The proposed digit recognition algorithm is shown to outperform each classifier individually in terms of recognition accuracy.
The final contribution of this study is the probabilistic Situational Bayesian Network (SBN), which is a new hierarchical probabilistic model for addressing the problem of fusing diverse data sources, such as CDT sketches created by healthy volunteers and dementia patients, in a probabilistic Bayesian network. The evaluation of the proposed SBN-based CDT sketch interpretation system on CDT data shows highly promising results, with 100% recognition accuracy for heathy CDT drawings and 97.15% for dementia data.
To conclude, the proposed automatic CDT sketch interpretation system shows high accuracy in terms of recognising different sketch objects and thus paves the way for further research in dementia and clinical computer-assisted diagnosis of dementia
Exploiting primitive grouping constraints for noise robust automatic speech recognition : studies with simultaneous speech.
Significant strides have been made in the field of automatic speech recognition over the past three decades. However, the systems are not robust; their performance degrades in the presence of even moderate amounts of noise. This thesis presents an approach to developing a speech recognition system that takes inspiration firom the approach of human speech recognition
Recommended from our members
Word based off-line handwritten Arabic classification and recognition. Design of automatic recognition system for large vocabulary offline handwritten Arabic words using machine learning approaches.
The design of a machine which reads unconstrained words still remains an unsolved problem. For example, automatic interpretation of handwritten documents by a computer is still under research. Most systems attempt to segment words into letters and read words one character at a time. However, segmenting handwritten words is very difficult. So to avoid this words are treated as a whole. This research investigates a number of features computed from whole words for the recognition of handwritten words in particular. Arabic text classification and recognition is a complicated process compared to Latin and Chinese text recognition systems. This is due to the nature cursiveness of Arabic text.
The work presented in this thesis is proposed for word based recognition of handwritten Arabic scripts. This work is divided into three main stages to provide a recognition system. The first stage is the pre-processing, which applies efficient pre-processing methods which are essential for automatic recognition of handwritten documents. In this stage, techniques for detecting baseline and segmenting words in handwritten Arabic text are presented. Then connected components are extracted, and distances between different components are analyzed. The statistical distribution of these distances is then obtained to determine an optimal threshold for word segmentation. The second stage is feature extraction. This stage makes use of the normalized images to extract features that are essential in recognizing the images. Various method of feature extraction are implemented and examined. The third and final stage is the classification. Various classifiers are used for classification such as K nearest neighbour classifier (k-NN), neural network classifier (NN), Hidden Markov models (HMMs), and the Dynamic Bayesian Network (DBN). To test this concept, the particular pattern recognition problem studied is the classification of 32492 words using
ii
the IFN/ENIT database. The results were promising and very encouraging in terms of improved baseline detection and word segmentation for further recognition. Moreover, several feature subsets were examined and a best recognition performance of 81.5% is achieved
Recommended from our members
A digital neural network approach to speech recognition
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.This thesis presents two novel methods for isolated word speech recognition based on sub-word components. A digital neural network is the fundamental processing strategy in both methods. The first design is based on the 'Separate Segmentation &
Labelling' (SS&L) approach. The spectral data of the input utterance is first segmented into phoneme-like units which are then time normalised by linear time normalisation. The neural network labels the
time-normalised phoneme-like segments 78.36% recognition accuracy is achieved for the phoneme-like unit. In the second design, no time normalisation is required. After segmentation, recognition is performed by classifying the data in a window as it is slid one frame at a time, from the start to the end of of each phoneme-like segment in the utterance. 73.97% recognition accuracy for the phoneme-like unit is achieved in this application. The parameters of the neural net have been optimised for
maximum recognition performance. A segmentation strategy using the sum of the difference in filterbank channel energy over successive spectra produced 80.27% correct segmentation of isolated utterances into phoneme-like units. A linguistic processor based on that of Kashyap & Mittal [84] enables 93.11% and 93.49% word recognition accuracy to be achieved for the SS&L and 'Sliding Window' recognisers respectively. The linguistic processor has been redesigned to make it portable so that it can be easily applied to any phoneme based isolated word speech recogniser.This work is funded by the Ministry of Science & Technology, Government of Pakistan
- …