
    Constructive neural networks : generalisation, convergence and architectures

    Feedforward neural networks trained via supervised learning have proven successful in the field of pattern recognition. The most important feature of a pattern recognition technique is its ability to classify future data successfully, a property known as generalisation. A more practical aspect of pattern recognition methods is how quickly they can be trained and how reliably a good solution is found. Feedforward neural networks have been shown to provide good generalisation on a variety of problems, and a number of training techniques exist that provide fast convergence. Two problems often addressed within the field are therefore how to improve the generalisation and the convergence of these pattern recognition techniques. This thesis addresses both problems through the framework of constructive neural network algorithms. Constructive neural networks are a type of feedforward neural network in which the network architecture is built during the training process; the type of architecture built can affect both generalisation and convergence speed.

    Convergence speed and reliability are important properties of feedforward neural networks. These properties are studied by examining different training algorithms and the effect of using a constructive process. A new gradient-based training algorithm, SARPROP, is introduced. It addresses the problems of poor convergence speed and reliability in gradient-based training, and is shown to increase both the convergence speed and the chance of convergence to a good solution. This is achieved by combining gradient-based and simulated annealing methods.

    The convergence properties of various constructive algorithms are examined through a series of empirical studies. The results demonstrate that the cascade architecture allows faster, more reliable convergence with a gradient-based method than a single-layer architecture with a comparable number of weights. Constructive algorithms that bias the search direction of the gradient-based training algorithm for the newly added hidden neurons are shown to produce smaller networks and more rapid convergence. A constructive algorithm using search-direction biasing is shown to converge to solutions with networks that are unreliable and inefficient to train using a non-constructive gradient-based algorithm. The technique of weight freezing is shown to result in larger architectures than those obtained from training the whole network.

    Improving the generalisation ability of constructive neural networks is an important area of investigation. A series of empirical studies examines the effect of regularisation on generalisation in constructive cascade algorithms. The combination of early stopping and regularisation is found to give better generalisation than early stopping alone. A cubic regularisation term that heavily penalises large weights is shown to be beneficial for generalisation in cascade networks. An adaptive method of setting the regularisation magnitude in constructive networks is introduced and is shown to produce generalisation results similar to those obtained with a fixed, user-optimised regularisation setting; this adaptive method also often results in the construction of smaller networks for more complex problems.

    The insights obtained from the SARPROP algorithm and from the convergence and generalisation studies are used to create a new constructive cascade algorithm, acasper. This algorithm is extensively benchmarked and is shown to obtain good generalisation results in comparison with a number of well-respected and successful neural network algorithms. A technique of incorporating the validation data into the training set after network construction is introduced and is shown generally to result in similar or improved generalisation. The difficulties of implementing a cascade architecture in VLSI are described, and results are given on the effect of the cascade architecture on attributes such as weight growth, fan-in, network depth, and propagation delay. Two variants of the cascade architecture are proposed; these new architectures are shown to produce generalisation results similar to those of the cascade architecture, while also addressing the problems of VLSI implementation of cascade networks.
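    As a rough illustration of the SARPROP idea described above (a sign-based gradient scheme combined with simulated annealing), the following is a minimal sketch in Python. The RPROP-style step adaptation is standard, but the noise schedule, constants, and function names here are illustrative assumptions, not the thesis's actual formulation.

```python
import numpy as np

def sarprop_update(w, grad, prev_grad, step, epoch,
                   eta_plus=1.2, eta_minus=0.5,
                   step_min=1e-6, step_max=50.0,
                   temperature=0.01, rng=np.random.default_rng(0)):
    """One SARPROP-style weight update (hedged sketch, not the thesis code).

    RPROP adapts each weight's step size from the sign of its gradient;
    the simulated-annealing twist adds a noise term, decaying with the
    epoch count, when the gradient changes sign.
    """
    anneal = 2.0 ** (-temperature * epoch)          # annealing schedule (assumed form)
    for i in range(len(w)):
        if grad[i] * prev_grad[i] > 0:              # same direction: accelerate
            step[i] = min(step[i] * eta_plus, step_max)
        elif grad[i] * prev_grad[i] < 0:            # sign flip: back off, add noise
            step[i] = max(step[i] * eta_minus + rng.random() * anneal, step_min)
            grad[i] = 0.0                           # RPROP convention: skip this update
        w[i] -= np.sign(grad[i]) * step[i]
    return w, grad, step
```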

    Online adaptation strategies for statistical machine translation in post-editing scenarios

    One of the most promising approaches to machine translation consists in formulating the problem as a pattern recognition task. In doing so, there are tasks in which online adaptation is needed in order to adapt the system to changing scenarios. In the present work, we perform an exhaustive comparison of four online learning algorithms combined with two adaptation strategies for the task of online adaptation in statistical machine translation. Two of these algorithms, the perceptron and passive-aggressive algorithms, are already well known in the pattern recognition community, but here they are thoroughly analysed for their applicability to the statistical machine translation task. In addition, we compare them with two novel methods: Bayesian predictive adaptation and discriminative ridge regression. In statistical machine translation, the most successful approach is based on a log-linear approximation to the a posteriori distribution. According to the experimental results, adapting the scaling factors of this log-linear combination of models using discriminative ridge regression or Bayesian predictive adaptation yields the best performance.

    This paper is based upon work supported by the EC (FP7) under the CasMaCat (287576) project and the EC (FEDER/FSE), and by the Spanish MICINN under projects MIPRCV "Consolider Ingenio 2010" (CSD2007-00018) and iTrans2 (TIN2009-14511). This work is also supported by the Spanish MITyC under the erudito.com (TSI-020110-2009-439) project, by the Generalitat Valenciana under Grant Prometeo/2009/014, and by the UPV under Grant 20091027. The authors would like to thank the anonymous reviewers for their useful and constructive comments.

    Martínez Gómez, P.; Sanchis Trilles, G.; Casacuberta Nolla, F. (2012). Online adaptation strategies for statistical machine translation in post-editing scenarios. Pattern Recognition. 45(9):3193-3203. https://doi.org/10.1016/j.patcog.2012.01.011
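    The abstract names the perceptron among the online learners used to adapt the scaling factors of the log-linear model. Below is a minimal sketch, under the assumption of a standard perceptron-style update applied after each post-edited sentence; the feature vectors and learning rate are hypothetical, and the paper's actual algorithms (including discriminative ridge regression and Bayesian predictive adaptation) differ in detail.

```python
import numpy as np

def perceptron_adapt(lam, h_ref, h_hyp, lr=0.01):
    """Perceptron-style online update of the log-linear scaling factors
    (hedged sketch; the feature functions h_k are assumed given).

    lam   : current scaling factors lambda_k
    h_ref : feature vector of the post-edited (reference) translation
    h_hyp : feature vector of the current system hypothesis
    Moves the model toward scoring the reference above the hypothesis.
    """
    return lam + lr * (h_ref - h_hyp)

# Toy usage: after each post-edited sentence, nudge the weights.
lam = np.zeros(4)                          # e.g. LM, TM, reordering, length features
h_ref = np.array([0.2, -1.0, 0.5, 3.0])    # hypothetical feature values
h_hyp = np.array([0.1, -1.2, 0.8, 2.0])
lam = perceptron_adapt(lam, h_ref, h_hyp)
```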

    Incremental learning with respect to new incoming input attributes

    Neural networks are generally exposed to a dynamic environment in which training patterns or input attributes (features) are likely to be introduced into the current domain incrementally. This paper considers the situation where a new set of input attributes must be considered and added to an existing neural network. The conventional method is to discard the existing network and redesign one from scratch; this approach wastes the old knowledge and the previous effort. In order to reduce computational time, improve generalization accuracy, and enhance the intelligence of the learned models, we present the ILIA algorithms (namely ILIA1, ILIA2, ILIA3, ILIA4 and ILIA5), capable of Incremental Learning in terms of Input Attributes. Using the ILIA algorithms, when new input attributes are introduced into the original problem, the existing neural network can be retained while a new sub-network is constructed and trained incrementally; the new sub-network and the old one are later merged to form a new network for the changed problem. In addition, the ILIA algorithms can decide whether the new incoming input attributes are relevant to the output and consistent with the existing input attributes, and suggest accepting or rejecting them accordingly. Experimental results show that the ILIA algorithms are efficient and effective for both classification and regression problems.
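    To make the retain-and-merge idea concrete, here is a minimal sketch of one way to realise it with off-the-shelf components; the synthetic data, the residual-fitting step, and the linear combiner are assumptions for illustration, not the ILIA1–ILIA5 algorithms themselves.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression

# Hedged sketch of the retain-and-merge idea (not the ILIA algorithms
# themselves): the network trained on the old attributes is kept, a new
# sub-network is trained only on the newly arrived attributes, and a
# small combiner merges the two outputs instead of retraining from scratch.
rng = np.random.default_rng(0)
X_old = rng.normal(size=(200, 5))                # original input attributes
X_new = rng.normal(size=(200, 2))                # newly introduced attributes
y = X_old @ rng.normal(size=5) + 0.5 * X_new.sum(axis=1)

old_net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=3000,
                       random_state=0).fit(X_old, y)         # existing network, retained
residual = y - old_net.predict(X_old)
sub_net = MLPRegressor(hidden_layer_sizes=(5,), max_iter=3000,
                       random_state=0).fit(X_new, residual)  # learns what the old net misses

# Merge: combine both predictions to form the network for the changed problem.
stacked = np.column_stack([old_net.predict(X_old), sub_net.predict(X_new)])
combiner = LinearRegression().fit(stacked, y)
```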

    Multi-learner based recursive supervised training

    In this paper, we propose the Multi-Learner Based Recursive Supervised Training (MLRT) algorithm, which uses the existing framework of recursive task decomposition: the entire dataset is trained, the best-learnt patterns are picked out, and the process is repeated with the remaining patterns. Instead of having a single learner classify all the data during each recursion, an appropriate learner is chosen from a set of three learners based on the subset of data being trained, thereby avoiding the time overhead associated with the genetic-algorithm learner used in previous approaches. In this way MLRT seeks to identify the inherent characteristics of the dataset and to exploit them to train on the data accurately and efficiently. Empirically, MLRT performs considerably well compared with RPHP and other systems on benchmark data, with an 11% improvement in accuracy on the SPAM dataset and comparable performance on the VOWEL and TWO-SPIRAL problems. In addition, for most datasets the time taken by MLRT is considerably lower than that of the other systems with comparable accuracy. Two heuristic versions, MLRT-2 and MLRT-3, are also introduced to improve the efficiency of the system and make it more scalable for future updates; their performance is similar to that of the original MLRT system.
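    A hedged sketch of the recursion follows. The pool of three learners shown here (decision tree, logistic regression, k-nearest neighbours) is an assumption, since the abstract does not name the learners, and the selection and stopping criteria are simplified.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def mlrt_sketch(X, y, max_rounds=5):
    """Recursive decomposition in the MLRT spirit: each round, pick the
    pool learner that fits the remaining patterns best, set aside the
    patterns it classifies correctly, and recurse on the rest."""
    pool = [DecisionTreeClassifier(random_state=0),
            LogisticRegression(max_iter=1000),
            KNeighborsClassifier()]
    stages, remaining = [], np.arange(len(y))
    for _ in range(max_rounds):
        Xr, yr = X[remaining], y[remaining]
        if len(np.unique(yr)) < 2 or remaining.size < 15:
            break                                # too few patterns left to split/CV
        best = max(pool, key=lambda m: cross_val_score(m, Xr, yr, cv=3).mean())
        fitted = clone(best).fit(Xr, yr)
        stages.append(fitted)
        correct = fitted.predict(Xr) == yr
        if correct.all():
            break
        remaining = remaining[~correct]          # the not-yet-learnt patterns
    return stages
```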

    SCANN: Synthesis of Compact and Accurate Neural Networks

    Deep neural networks (DNNs) have become the driving force behind recent artificial intelligence (AI) research. An important problem in implementing a neural network is the design of its architecture. Typically, such an architecture is obtained manually by exploring its hyperparameter space and is kept fixed during training. This approach is time-consuming and inefficient. Another issue is that modern neural networks often contain millions of parameters, whereas many applications and devices require small inference models. However, efforts to migrate DNNs to such devices typically entail a significant loss of classification accuracy. To address these challenges, we propose a two-step neural network synthesis methodology, called DR+SCANN, that combines two complementary approaches to design compact and accurate DNNs. At the core of our framework is the SCANN methodology, which uses three basic architecture-changing operations, namely connection growth, neuron growth, and connection pruning, to synthesize feed-forward architectures with arbitrary structure. SCANN encapsulates three synthesis methodologies that apply a repeated grow-and-prune paradigm to three architectural starting points. DR+SCANN combines the SCANN methodology with dataset dimensionality reduction to alleviate the curse of dimensionality. We demonstrate the efficacy of SCANN and DR+SCANN on various image and non-image datasets. We evaluate SCANN on the MNIST and ImageNet benchmarks, and we evaluate the efficacy of using dimensionality reduction alongside SCANN (DR+SCANN) on nine small to medium-size datasets. We also show that our synthesis methodology yields neural networks that are much better at navigating the accuracy vs. energy-efficiency space. This would enable neural-network-based inference even on Internet-of-Things sensors.

    Comment: 13 pages, 8 figures
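    The grow-and-prune operations can be pictured as edits to a binary connection mask over a weight matrix. The sketch below is illustrative only: the selection fractions and the gradient-magnitude growth criterion are assumptions, not SCANN's published rules.

```python
import numpy as np

def prune_connections(weights, mask, frac=0.1):
    """Connection pruning: deactivate the smallest-magnitude fraction
    of the currently active connections (illustrative criterion)."""
    active = np.flatnonzero(mask)
    k = max(1, int(frac * active.size))
    drop = active[np.argsort(np.abs(weights.ravel()[active]))[:k]]
    mask.ravel()[drop] = 0
    weights.ravel()[drop] = 0.0
    return weights, mask

def grow_connections(grad, mask, frac=0.1):
    """Connection growth: activate the inactive connections whose loss
    gradient has the largest magnitude (illustrative criterion)."""
    inactive = np.flatnonzero(mask == 0)
    k = max(1, int(frac * inactive.size))
    add = inactive[np.argsort(-np.abs(grad.ravel()[inactive]))[:k]]
    mask.ravel()[add] = 1
    return mask
```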

    Image segmentation with adaptive region growing based on a polynomial surface model

    A new method for segmenting intensity images into smooth surface segments is presented. The main idea is to divide the image into flat, planar, convex, concave, and saddle patches that coincide as well as possible with meaningful object features in the image. To this end, we propose an adaptive region growing algorithm based on low-degree polynomial fitting. The algorithm uses a new adaptive thresholding technique with the L∞ fitting cost as the segmentation criterion. The polynomial degree and the fitting error are automatically adapted during the region growing process. The main contribution is that the algorithm detects outliers and edges, distinguishes between strong and smooth intensity transitions, and finds surface segments that are bent in a certain way. As a result, the surface segments corresponding to meaningful object features, and the contours separating them, coincide with real image object edges. Moreover, the curvature-based surface shape information facilitates many tasks in image analysis, such as object recognition performed on the polynomial representation. The polynomial representation provides a good image approximation while preserving all the necessary details of the objects in the reconstructed images. The method outperforms existing techniques when segmenting images of objects with diffuse reflecting surfaces.
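    To illustrate the core test in such a region growing step, the sketch below fits a low-degree 2D polynomial to the pixels of a candidate region and reports the L∞ fitting cost; the paper's adaptive thresholding and degree-selection logic are omitted, and the function names are hypothetical.

```python
import numpy as np

def fit_poly_linf(coords, values, degree=2):
    """Least-squares fit of a low-degree 2D polynomial surface and its
    L-infinity (maximum absolute) residual over a pixel set."""
    x, y = coords[:, 0].astype(float), coords[:, 1].astype(float)
    cols = [x**i * y**j for i in range(degree + 1)
            for j in range(degree + 1 - i)]      # all monomials x^i y^j, i+j <= degree
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, values, rcond=None)
    return coef, np.max(np.abs(A @ coef - values))

def try_grow(region, candidate, img, threshold, degree=2):
    """Accept a neighbouring pixel only if refitting keeps the L-infinity
    cost below the (here fixed, in the paper adaptive) threshold."""
    coords = np.array(region + [candidate])
    vals = np.array([float(img[r, c]) for r, c in coords])
    _, linf = fit_poly_linf(coords, vals, degree)
    return linf <= threshold
```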

    For the Jubilee of Vladimir Mikhailovich Chernov

    On April 25, 2019, Vladimir Chernov, Doctor of Physics and Mathematics, celebrated his 70th birthday. He is Chief Researcher at the Laboratory of Mathematical Methods of Image Processing of the Image Processing Systems Institute of the Russian Academy of Sciences (IPSI RAS), a branch of the Federal Scientific Research Centre "Crystallography and Photonics" of the RAS, and a part-time professor at the Department of Geoinformatics and Information Security of the Samara National Research University named after academician S.P. Korolev (Samara University). The article briefly describes the scientific and pedagogical achievements of the hero of the day.