Support Vector Machines (SVM) in Text Extraction
Text categorization is the process of grouping documents or words into predefined
categories, where each category consists of documents or words with similar
attributes. Numerous algorithms address text categorization, including Naive
Bayes, the k-nearest-neighbor classifier, and decision trees. In this project,
Support Vector Machines (SVM) are studied and evaluated through the
implementation of a textual extractor. The algorithm extracts the important
points of a lengthy document by classifying each word in the document under its
relevant category and constructing the summary with reference to the categorized
words. The extractor's performance is evaluated on a common corpus against an
existing summarizer that uses a different approach. Summarization, a branch of
text categorization, is considered an essential part of today's information-led
society and has been a growing area of research for over 40 years. The project's
objective is to create a summarizer, or extractor, based on two machine learning
algorithms: SVM and K-Means. Each word in a document is processed by both
algorithms to determine its actual occurrence in the document: it is first
clustered into categories based on part of speech (verb, noun, adjective) by
K-Means, and then processed by SVM to determine its actual occurrence within
each cluster, taking into account whether words have similar meanings to other
words in the same cluster. The corpus chosen to evaluate the application is the
Reuters-21578 dataset, comprising newspaper articles. The application is
evaluated against an existing system-generated extract already on the market, by
measuring how many sentences overlap between the tested applications: the Text
Extractor and the Microsoft Word AutoSummarizer. Results show that the Text
Extractor performs best at compression rates of 10-20% and 35-45%.
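The two-stage pipeline described above could be sketched roughly as follows. This is an illustrative toy, not the thesis implementation: the word features, labels, and cluster count are synthetic stand-ins, with K-Means standing in for the part-of-speech grouping and a per-cluster linear SVM scoring words for inclusion in the extract.

```python
# Sketch only: K-Means groups word feature vectors into coarse categories,
# then a per-cluster linear SVM scores words for the extract.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                  # hypothetical word features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # toy "important word" labels

# Stage 1 (K-Means): group words into 3 coarse clusters, a stand-in for
# the part-of-speech grouping (verb / noun / adjective).
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Stage 2 (SVM): within each cluster, a linear SVM scores how strongly
# each word belongs to the "include in summary" class.
scores = np.zeros(len(X))
for c in range(3):
    idx = clusters == c
    if len(np.unique(y[idx])) > 1:             # SVM needs both classes
        svm = SVC(kernel="linear").fit(X[idx], y[idx])
        scores[idx] = svm.decision_function(X[idx])

# Keep the top-scoring 20% of words, i.e. a 20% compression rate.
k = int(0.2 * len(X))
extract = np.argsort(scores)[-k:]
print(len(extract))
```

A real extractor would replace the random features with frequency, position, and part-of-speech features, and select sentences rather than isolated words.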
A machine learning approach to the unsupervised segmentation of mitochondria in subcellular electron microscopy data
Recent advances in cellular and subcellular microscopy have demonstrated its potential for unravelling the mechanisms of various diseases at the molecular level. The biggest challenge in both human- and computer-based visual analysis of micrographs is the variety of nanostructures and mitochondrial morphologies. The state of the art is, however, dominated by supervised manual data annotation, and early attempts to automate the segmentation process were based on supervised machine learning techniques, which require large datasets for training. Given a minimal number of training sequences, or none at all, unsupervised machine learning formulations such as spectral dimensionality reduction are known to be superior at detecting salient image structures.
This thesis presents three major contributions developed around the spectral clustering framework, which is proven to capture perceptual organization features. First, we approach the problem of mitochondria localization. We propose a novel grouping method for extracted line segments that describes normal mitochondrial morphology. Experimental findings show that the resulting clusters successfully model the inner mitochondrial membrane folding and can therefore be used as markers for subsequent segmentation approaches. Second, we develop an unsupervised mitochondria segmentation framework. This method mimics the ability of human vision to extrapolate salient membrane structures in a micrograph. Furthermore, we design robust non-parametric similarity models according to the Gestalt laws of visual segregation. Experiments demonstrate that such models automatically adapt to the statistical structure of the biological domain and deliver optimal performance in pixel classification tasks under a wide variety of distributional assumptions. The last major contribution addresses the computational complexity of spectral clustering. Here, we introduce a new anticorrelation-based spectral clustering formulation with the objective of improving both the speed and the quality of segmentation. The experimental findings show that our dimensionality reduction algorithm is applicable to very large-scale problems as well as to asymmetric, dense, and non-Euclidean datasets.
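The spectral clustering framework that these contributions build on can be sketched in a few lines. This is a textbook toy, not the thesis code: the Gaussian similarity below is a simple parametric stand-in for the non-parametric, Gestalt-based similarity models described above, and the two point clouds stand in for pixel or line-segment features.

```python
# Toy spectral clustering via the normalized graph Laplacian
# L = I - D^{-1/2} W D^{-1/2}, followed by K-Means in the embedding.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two well-separated point clouds standing in for micrograph features.
pts = np.vstack([rng.normal(0.0, 0.3, (30, 2)),
                 rng.normal(3.0, 0.3, (30, 2))])

# Pairwise Gaussian similarity (bandwidth 0.5) and normalized Laplacian.
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / (2 * 0.5 ** 2))
dinv = np.diag(1.0 / np.sqrt(W.sum(1)))
L = np.eye(len(pts)) - dinv @ W @ dinv

# Spectral embedding: eigenvectors of the two smallest eigenvalues,
# then ordinary K-Means separates the clusters in embedded space.
_, vecs = np.linalg.eigh(L)
emb = vecs[:, :2]
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
print(labels)
```

The eigendecomposition is the expensive step for large affinity matrices, which is exactly the computational bottleneck the anticorrelation-based formulation above targets.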
Learning based biological image analysis
The fate of contemporary scientific research in biology and medicine is bound to advancements in computational methods. The unprecedented data explosion in microscopy, and the growing interest of life scientists in studying more complex and more subtle interactions, stimulate research into innovative computational solutions for challenging real-world applications. Extensions and novel formulations of generic, flexible methods based on learning and inference are necessary to cope with the large variety of produced data and to avoid continuous reimplementation and heavy parameter tuning. This thesis exploits cutting-edge machine learning methods based on structured probabilistic models and weakly supervised learning to provide four novel solutions in the areas of large-scale microscopic imaging and multiple object tracking.
Chapter 2 introduces a weakly supervised learning framework to tackle the problem of detecting defect images while mining massive microscopic imagery databases; the thesis demonstrates accurate prediction with low user annotation effort. Chapter 3 presents a learning approach for counting overlapping objects in images based on local structured predictors. This problem has numerous applications in high-throughput microscopy screening, such as cell counting for drug toxicity assays. Chapter 4 develops a deterministic graphical model to impose temporal consistency on object counts when dealing with a video sequence; this chapter shows that global (temporal and spatial) structural inference consistently improves over local (only spatial) predictions. The method developed in Chapter 4 is used in a novel downstream tracking algorithm, introduced in Chapter 5. This chapter tackles, for the first time, the difficult problem of tracking heavily overlapping, translucent, and indistinguishable objects. The mutual occlusion of such objects is handled as a novel structured inference problem based on the minimization of a convex multi-commodity flow energy. The optimal weights of the energy terms are learned with partial user supervision using structured learning with latent variables. To support behavioral biologists, we apply this method to the problem of tracking a community of interacting Drosophila larvae.
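For intuition about the tracking step, a much simpler common baseline than the convex multi-commodity flow model above is frame-to-frame data association by minimum-cost assignment. The sketch below uses that swapped-in technique with made-up coordinates; it does not implement the thesis's occlusion-aware energy.

```python
# Simplified stand-in for flow-based tracking: link tracks at frame t-1
# to detections at frame t by minimum-cost (Hungarian) assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

prev = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])  # tracks at t-1
curr = np.array([[5.2, 4.9], [0.1, 0.2], [9.8, 0.3]])   # detections at t

# Cost = Euclidean distance between every track and every detection.
cost = np.linalg.norm(prev[:, None, :] - curr[None, :, :], axis=-1)
rows, cols = linear_sum_assignment(cost)
links = dict(zip(rows.tolist(), cols.tolist()))
print(links)
```

A flow-based formulation generalizes this per-frame assignment to a globally optimal solution over the whole sequence, which is what makes it possible to reason about occlusion events involving indistinguishable objects.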
Visual Clutter Study for Pedestrian Using Large Scale Naturalistic Driving Data
Some pedestrian crashes are due to a driver's late or difficult perception of a pedestrian's appearance. Recognizing pedestrians while driving is a complex cognitive activity. Visual clutter analysis can be used to study the factors that affect human visual search efficiency and to help design advanced driver assistance systems for better decision making and user experience. In this thesis, we propose a pedestrian perception evaluation model that quantitatively analyzes pedestrian perception difficulty using naturalistic driving data. An efficient detection framework was developed to locate pedestrians within large-scale naturalistic driving data, and visual clutter analysis was used to study the factors that may affect a driver's ability to perceive a pedestrian's appearance. Candidate factors were explored in a designed exploratory study using naturalistic driving data, and a bottom-up, image-based pedestrian clutter metric was proposed to quantify pedestrian perception difficulty. Based on the proposed bottom-up clutter metric and a top-down pedestrian-appearance-based estimator, a Bayesian probabilistic pedestrian perception evaluation model was constructed to simulate the pedestrian perception process.
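The general idea of combining a bottom-up clutter score with a top-down appearance cue in a Bayesian update could look roughly like the sketch below. Everything here is a hedged illustration: the gradient-based clutter metric and the logistic likelihoods are invented stand-ins, not the metric or model proposed in the thesis.

```python
# Toy Bayesian combination of bottom-up clutter and top-down salience.
import numpy as np

def edge_density_clutter(patch):
    """Toy bottom-up clutter metric: mean gradient magnitude of a
    grayscale patch (a crude stand-in for the proposed metric)."""
    gy, gx = np.gradient(patch.astype(float))
    return float(np.hypot(gx, gy).mean())

def perception_difficulty(clutter, salience, prior_hard=0.5):
    """P(hard | evidence) with hand-tuned toy likelihoods: high clutter
    and low pedestrian salience make perception harder."""
    l_hard = 1.0 / (1.0 + np.exp(-(2.0 * clutter - salience)))
    l_easy = 1.0 - l_hard
    z = prior_hard * l_hard + (1.0 - prior_hard) * l_easy
    return float(prior_hard * l_hard / z)

rng = np.random.default_rng(2)
noisy = rng.uniform(0.0, 1.0, (32, 32))  # high-clutter background patch
flat = np.full((32, 32), 0.5)            # low-clutter background patch
p_noisy = perception_difficulty(edge_density_clutter(noisy), 0.3)
p_flat = perception_difficulty(edge_density_clutter(flat), 0.3)
print(p_noisy, p_flat)
```

With a fixed pedestrian salience, the cluttered patch yields a higher posterior difficulty than the flat one, which matches the qualitative behavior the model above is meant to capture.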