
    Hierarchical Classification of Scientific Taxonomies with Autonomous Underwater Vehicles

    Autonomous Underwater Vehicles (AUVs) have catalysed a significant shift in the way marine habitats are studied. It is now possible to deploy an AUV from a ship and capture tens of thousands of georeferenced images in a matter of hours. There is a growing body of research investigating ways to automatically apply semantic labels to these data, with two goals: first, to relieve the time-consuming and error-prone task of manually labelling a large number of images; and second, to change AUV surveys from being geographically defined (based on a pre-planned route) to permitting the AUV to adapt its mission plan in response to semantic observations. This thesis focusses on frameworks that permit a unified machine learning approach with applicability to a wide range of geographic areas and diverse areas of interest for marine scientists. This can be addressed through hierarchical classification, in which machine learning algorithms are trained to predict not just a binary or multi-class outcome, but a hierarchy of related output labels which are not mutually exclusive, such as a scientific taxonomy. In order to investigate classification on larger hierarchies with greater geographic diversity, the BENTHOZ-2015 data set was assembled as part of a collaboration between five Australian research groups. Existing labelled data were re-mapped to the CATAMI hierarchy: in total, more than 400,000 point labels conforming to a hierarchy of around 150 classes. The common hierarchical classification approach of building a network of binary classifiers was applied to the BENTHOZ-2015 data set, and a novel application of Bayesian Network theory and probability calibration was used as a theoretical foundation for the approach, resulting in improved classifier performance. This was extended to a more complex hidden-node Bayesian Network structure, which permits the inclusion of additional sensor modalities and tuning for better performance in particular geographic regions.
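The core mechanism above, a network of per-node binary classifiers whose calibrated outputs are chained down the taxonomy, can be sketched as follows. The hierarchy, class names, and probabilities here are illustrative stand-ins, not values from BENTHOZ-2015 or the CATAMI scheme:

```python
# Toy sketch: hierarchical classification via one binary classifier per
# node, chaining conditional probabilities top-down through a taxonomy.
# All class names and probabilities below are invented for illustration.

HIERARCHY = {
    "Biota": ["Cnidaria", "Porifera"],
    "Cnidaria": ["Corals"],
}

def local_prob(node, features):
    """Stand-in for a calibrated binary classifier: P(node | parent, x)."""
    table = {"Biota": 0.9, "Cnidaria": 0.7, "Porifera": 0.2, "Corals": 0.8}
    return table[node]

def marginal_probs(features, root="Biota"):
    """Walk the hierarchy top-down: P(node) = P(node | parent) * P(parent)."""
    probs = {root: local_prob(root, features)}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in HIERARCHY.get(parent, []):
            probs[child] = local_prob(child, features) * probs[parent]
            stack.append(child)
    return probs

probs = marginal_probs(features=None)
```

Chaining in this way guarantees the hierarchy constraint that a child label is never more probable than its parent, which is one motivation for the Bayesian Network framing.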

    Automatic synthesis of fuzzy systems: An evolutionary overview with a genetic programming perspective

    Studies in Evolutionary Fuzzy Systems (EFSs) began in the 1990s and have experienced rapid development since then, with applications to areas such as pattern recognition, curve-fitting and regression, forecasting, and control. An EFS results from the combination of a Fuzzy Inference System (FIS) with an Evolutionary Algorithm (EA). This relationship can be established for multiple purposes: fine-tuning of a FIS's parameters, selection of fuzzy rules, learning a rule base or membership functions from scratch, and so forth. Each facet of this relationship creates a strand in the literature, such as membership function fine-tuning or fuzzy rule-based learning, and the purpose here is to outline some of what has been done in each aspect. Special focus is given to Genetic Programming-based EFSs by providing a taxonomy of the main architectures available, as well as by pointing out the gaps that still prevail in the literature. The concluding remarks address further topics of current research and trends, such as interpretability analysis, multiobjective optimization, and synthesis of a FIS through evolving methods.
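As a toy illustration of the fine-tuning facet (not an example from the survey itself), the following evolutionary loop adjusts the peak of a triangular membership function to fit target membership values; all data, parameters, and the fuzzy set "warm" are invented:

```python
# Toy EFS sketch: an elitist evolutionary loop fine-tunes the peak `b`
# of a triangular membership function to match target memberships.
import random

def triangular(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Invented targets: the fuzzy set "warm" should peak near 22 degrees.
samples = [(18, 0.2), (20, 0.6), (22, 1.0), (24, 0.6), (26, 0.2)]

def fitness(b):
    # Negative squared error between the candidate set and the targets.
    return -sum((triangular(x, 15, b, 29) - mu) ** 2 for x, mu in samples)

random.seed(0)
population = [random.uniform(16, 28) for _ in range(20)]
for _ in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                      # elitist selection
    population = parents + [p + random.gauss(0, 0.5)
                            for p in parents for _ in range(3)]

best = max(population, key=fitness)
```

Real EFSs evolve whole rule bases or GP trees rather than a single scalar, but the select-mutate-evaluate loop is the same.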

    Automatic detection of constraints in software documentation

    2021 Fall. Includes bibliographical references. Software documentation is an important resource when maintaining and evolving software, as it supports developers in program understanding. To keep it up to date, developers need to verify that all the constraints affected by a change in source code are consistently described in the documentation. The process of detecting all the constraints in the documentation and cross-checking them against the source code is time-consuming. An approach capable of automatically identifying software constraints in documentation could facilitate this process. In this thesis, we explore different machine learning algorithms to build binary classification models that assign sentences extracted from software documentation to one of two categories: constraints and non-constraints. The models are trained on a data set of 368 manually tagged sentences from four open-source software systems. We evaluate the performance of the different models (Decision Tree, Naive Bayes, Support Vector Machine, fine-tuned BERT) based on precision, recall, and F1-score. Our best model (a decision tree featuring bigrams) achieved 74.0% precision, 83.8% recall, and an F1-score of 0.79. This suggests that our results are promising and that it is possible to build machine-learning-based models for the automatic detection of constraints in software documentation.
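A minimal sketch of the bigram-based idea: the single modal-verb split below is a hand-written stand-in for the learned decision tree, and the sentences and word lists are invented, not taken from the thesis's data set:

```python
# Toy constraint detector: bigram features over documentation sentences,
# with constraint sentences flagged by modal-verb bigrams ("must be").
# The rule stands in for a learned decision-tree split; it is invented.

def bigrams(sentence):
    """Lowercased adjacent word pairs, the features of the best model."""
    words = sentence.lower().split()
    return [(words[i], words[i + 1]) for i in range(len(words) - 1)]

MODAL_FIRST = {"must", "shall", "should", "cannot"}

def is_constraint(sentence):
    # One split: does any bigram start with a constraint-signalling modal?
    return any(w1 in MODAL_FIRST for w1, _ in bigrams(sentence))

docs = [
    ("The buffer size must be a power of two.", True),
    ("This module parses configuration files.", False),
]
predictions = [is_constraint(s) for s, _ in docs]
```

In the thesis the split points are learned from the 368 tagged sentences rather than written by hand; this only shows why bigrams carry signal for the task.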

    Learning Ontology Relations by Combining Corpus-Based Techniques and Reasoning on Data from Semantic Web Sources

    The manual construction of formal domain conceptualizations (ontologies) is labor-intensive. Ontology learning, by contrast, provides (semi-)automatic ontology generation from input data such as domain text. This thesis proposes a novel approach for learning labels of non-taxonomic ontology relations. It combines corpus-based techniques with reasoning on Semantic Web data. Corpus-based methods apply vector space similarity of verbs co-occurring with labeled and unlabeled relations to calculate relation label suggestions from a set of candidates. A meta ontology in combination with Semantic Web sources such as DBpedia and OpenCyc allows reasoning to improve the suggested labels. An extensive formal evaluation demonstrates the superior accuracy of the presented hybrid approach.
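The corpus-based step can be sketched as follows, with invented verb co-occurrence counts and relation names: a label is suggested for an unlabeled relation by cosine similarity between verb count vectors:

```python
# Sketch of the corpus-based step: each relation is represented by
# counts of verbs co-occurring with it; the unlabeled relation borrows
# the label of the most cosine-similar labeled one. Data is invented.
import math

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

labelled = {
    "hasLocation": {"located": 5, "situated": 3, "found": 2},
    "hasFounder":  {"founded": 6, "established": 4, "created": 1},
}
unlabelled = {"located": 2, "found": 4, "situated": 1}

suggestion = max(labelled, key=lambda r: cosine(labelled[r], unlabelled))
```

The thesis then refines such suggestions by reasoning over a meta ontology and sources like DBpedia, which is not shown here.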

    Towards the detection and analysis of performance regression introducing code changes

    In contemporary software development, developers commonly conduct regression testing to ensure that code changes do not affect software quality. Performance regression testing is an emerging research area within the regression testing domain in software engineering. It aims to maintain the system's performance. Conducting performance regression testing is known to be expensive. It is also complex, considering the increasing volume of committed code and the number of team members developing simultaneously. Many automated regression testing techniques have been proposed in prior research. However, challenges in locating and resolving performance regression in practice still exist. Directing regression testing to the commit level provides a way to locate the root cause, yet it hinders the development process. This thesis outlines motivations and solutions for locating the root causes of performance regression. First, we challenge a deterministic state-of-the-art approach by expanding the testing data to find areas for improvement. The deterministic approach was found to be limited in searching for the best regression-locating rule. Thus, we present two stochastic approaches to develop models that can learn from historical commits. The goal of the first stochastic approach is to view the research problem as a search-based optimization problem seeking to reach the highest detection rate. We apply different multi-objective evolutionary algorithms and conduct a comparison between them. This thesis also investigates whether simplifying the search space by combining objectives would achieve comparable results. The second stochastic approach addresses the severe class imbalance any system could have, since code changes introducing regression are rare but costly. We formulate the identification of problematic commits that introduce performance regression as a binary classification problem that handles class imbalance.
Further, the thesis provides an exploratory study on the challenges developers face in resolving performance regression. The study is based on questions posted on a technical forum and directed at performance regression. We collected around 2k questions discussing the regression of software execution time, all of which were manually analyzed. The study resulted in a categorization of the challenges. We also discuss the difficulty level of performance regression issues within the development community. This study provides insights to help developers avoid regression causes during software design and implementation.
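One standard way to handle such class imbalance, shown here as a hedged sketch with invented commit data rather than the thesis's actual technique, is to randomly oversample the rare regression-introducing commits before training:

```python
# Illustrative imbalance handling: random oversampling of the minority
# class (commits that introduce a performance regression, label 1).
# Commit IDs and the 95/5 split are invented for the example.
import random

commits = [("c%d" % i, 0) for i in range(95)] + \
          [("r%d" % i, 1) for i in range(5)]

random.seed(1)
majority = [c for c in commits if c[1] == 0]
minority = [c for c in commits if c[1] == 1]

# Resample the minority with replacement until the classes are balanced.
balanced = majority + [random.choice(minority) for _ in range(len(majority))]
```

Alternatives such as undersampling the majority class or cost-sensitive learning address the same problem; which of these the thesis adopts is not stated in the abstract.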

    Time-slice analysis of dyadic human activity

    Recognizing human activities from video data is routinely leveraged for surveillance and human-computer interaction applications. The main focus has been classifying videos into one of k action classes from fully observed videos. However, intelligent systems must make decisions under uncertainty and based on incomplete information. This need motivates us to introduce the problem of analysing the uncertainty associated with human activities and to move to a new level of generality in the action analysis problem. We also present the problem of time-slice activity recognition, which aims to explore human activity at a small temporal granularity. Time-slice recognition is able to infer human behaviours from a short temporal window. It has been shown that temporal slice analysis is helpful for motion characterization and for video content representation in general. These studies motivate us to consider time-slices for analysing the uncertainty associated with human activities. We report to what degree of certainty each activity is occurring throughout the video, from definitely not occurring to definitely occurring. In this research, we propose three frameworks for time-slice analysis of dyadic human activity under uncertainty. i) We present a new family of spatio-temporal descriptors which are optimized for early prediction with time-slice action annotations. Our predictive spatio-temporal interest point (Predict-STIP) representation is based on the intuition of temporal contingency between time-slices. ii) We exploit state-of-the-art techniques to extract interest points in order to represent time-slices. We also present an accumulative uncertainty measure to depict the uncertainty associated with partially observed videos for the task of early activity recognition. iii) We use Convolutional Neural Network-based unary and pairwise relations between human body joints in each time-slice. The unary term captures the local appearance of the joints while the pairwise term captures the local contextual relations between the parts. We extract these features from each frame in a time-slice and examine different temporal aggregations to generate a descriptor for the whole time-slice. Furthermore, we create a novel dataset which is annotated at multiple short temporal windows, allowing the modelling of the inherent uncertainty in time-slice activity recognition. All three methods have been evaluated on the TAP dataset. Experimental results demonstrate the effectiveness of our framework in the analysis of dyadic activities under uncertainty.
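The time-slice splitting and temporal aggregation steps can be sketched as follows, with invented scalar per-frame features standing in for the CNN-based unary and pairwise descriptors:

```python
# Toy sketch: split a video's frame features into fixed-length
# time-slices, then mean-pool each slice into a single descriptor.
# The scalar per-frame features are invented stand-ins for CNN features.

def time_slices(frames, slice_len):
    """Chop the frame sequence into consecutive windows of slice_len."""
    return [frames[i:i + slice_len] for i in range(0, len(frames), slice_len)]

def aggregate(slice_frames):
    """One temporal aggregation choice: mean-pool over the time-slice."""
    return sum(slice_frames) / len(slice_frames)

frames = [0.1, 0.3, 0.2, 0.8, 0.9, 0.7]   # one feature value per frame
descriptors = [aggregate(s) for s in time_slices(frames, 3)]
```

Mean pooling is just one of the temporal aggregations the thesis compares; max pooling or concatenation would slot into `aggregate` the same way.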

    Intelligent data leak detection through behavioural analysis

    In this paper we discuss a solution to detect data leaks in an intelligent and unobtrusive way through real-time analysis of the user's behaviour while handling classified information. The work is based on experiences with real-world use cases, and a variety of data preparation and data analysis techniques have been tried. Results show the feasibility of the approach, but also the necessity of correlating with other security events to improve precision.
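As a minimal illustration of behaviour-based leak detection (the paper's actual features and models are not specified in the abstract; the baseline, events, and threshold below are invented), a session can be flagged when the user's classified-file access count is a statistical outlier against their own history:

```python
# Hypothetical behavioural baseline check: flag a session whose
# classified-file access count deviates strongly (z-score) from the
# user's own historical sessions. All numbers are invented.
import statistics

baseline = [3, 4, 2, 5, 3, 4, 3]   # files accessed in past sessions
current = 15                        # files accessed in this session

mean = statistics.mean(baseline)
stdev = statistics.stdev(baseline)
z = (current - mean) / stdev        # how unusual is this session?
suspicious = z > 3.0
```

In practice such a per-user score would be correlated with other security events, as the paper notes, to keep the precision acceptable.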