
    On the complexity of curve fitting algorithms

    We study a popular algorithm for fitting polynomial curves to scattered data based on least squares with gradient weights. We show that this algorithm sometimes admits a substantial reduction of complexity and, furthermore, find precise conditions under which this is possible. It turns out that this is indeed possible when one fits circles, but not ellipses or hyperbolas. (Comment: 8 pages, no figures)
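The special status of circles can be illustrated with the classic algebraic (Kåsa) fit: a circle's equation is linear in a suitable reparametrization, so a single linear least-squares solve suffices. This is a hedged, generic sketch, not the gradient-weighted algorithm the paper analyzes:

```python
import numpy as np

def fit_circle_kasa(x, y):
    """Algebraic (Kasa) circle fit: minimize sum((x-a)^2 + (y-b)^2 - r^2)^2,
    which is linear in the parameters (2a, 2b, r^2 - a^2 - b^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([x, y, np.ones_like(x)])   # design matrix
    rhs = x**2 + y**2
    (c1, c2, c3), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    a, b = c1 / 2.0, c2 / 2.0
    r = np.sqrt(c3 + a**2 + b**2)
    return a, b, r

# points sampled from the unit circle centered at (1, 2)
t = np.linspace(0, 2 * np.pi, 50, endpoint=False)
cx, cy, r = fit_circle_kasa(1 + np.cos(t), 2 + np.sin(t))
```

For ellipses or hyperbolas no such linearizing reparametrization of the geometric distance exists, which is consistent with the paper's finding that the complexity reduction applies to circles only.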

    A Gaussian-mixed Fuzzy Clustering Model on Valence-Arousal-related fMRI Data-Set

    Previous medical experiments showed that Valence and Arousal correspond strongly to brain responses in the amygdala and orbital frontal cortex, as observed through functional magnetic resonance imaging (fMRI). In this paper, a Valence-Arousal-related fMRI data-set was acquired from picture-stimulus experiments, and the relative Valence-Arousal feature values for a given word corresponding to a given picture stimulus were calculated. A Gaussian bilateral filter and an independent component analysis (ICA)-based Gaussian component method were applied for image denoising and segmentation; to construct the timing signals of Valence and Arousal from the fMRI data-set separately, expectation maximization of a Gaussian mixture model was used to calculate the histogram, and an Otsu curve fitting algorithm was introduced to scale the computational complexity; time-series-based Valence-Arousal curves were finally generated. In Valence-Arousal space, a fuzzy c-means method was applied to obtain a typical point representing the word relative to the picture. The results showed the effectiveness of the proposed methods compared with other feature-extraction algorithms on fMRI data-sets, including power spectral density (PSD), spline, shape-preserving, and cubic fitting methods.
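The fuzzy c-means step mentioned above can be sketched generically: memberships and cluster centers are updated in alternation until the centers settle on the typical points. The 2-D data here are synthetic stand-ins for Valence-Arousal features, not the paper's data or code:

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means: alternate membership and center updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                # memberships sum to 1
    for _ in range(n_iter):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]     # fuzzy-weighted centers
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        U = d ** (-2.0 / (m - 1.0))                  # standard FCM update
        U /= U.sum(axis=1, keepdims=True)
    return U, V

# two well-separated synthetic 2-D clusters (Valence-Arousal stand-ins)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(3.0, 0.1, (20, 2))])
U, V = fuzzy_c_means(X, c=2)
```

Each row of `V` is a candidate "typical point"; the row of `U` with the highest membership for a cluster identifies its most representative sample.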

    Multi-learner based recursive supervised training

    In this paper, we propose the Multi-Learner Based Recursive Supervised Training (MLRT) algorithm, which uses the existing framework of recursive task decomposition: training on the entire dataset, picking out the best-learnt patterns, and then repeating the process with the remaining patterns. Instead of having a single learner classify all the data during each recursion, an appropriate learner is chosen from a set of three learners based on the subset of data being trained, thereby avoiding the time overhead associated with the genetic-algorithm learner used in previous approaches. In this way MLRT seeks to identify the inherent characteristics of the dataset and use them to train the data accurately and efficiently. Empirically, MLRT performs considerably well compared to RPHP and other systems on benchmark data, with an 11% improvement in accuracy on the SPAM dataset and comparable performance on the VOWEL and TWO-SPIRAL problems. In addition, for most datasets, the time taken by MLRT is considerably lower than that of the other systems with comparable accuracy. Two heuristic versions, MLRT-2 and MLRT-3, are also introduced to improve the efficiency of the system and make it more scalable for future updates. The performance of these versions is similar to that of the original MLRT system.
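The recursive decomposition described above can be written as a small skeleton. The learners, selection rule, and "well-learnt" test below are hypothetical placeholders illustrating the control flow, not MLRT's actual components:

```python
def recursive_supervised_training(data, learners, select_learner, well_learnt,
                                  max_rounds=10):
    """Skeleton of recursive task decomposition: train on the remaining
    patterns, set aside those learnt confidently, recurse on the rest."""
    stages = []
    remaining = list(data)
    for _ in range(max_rounds):
        if not remaining:
            break
        learner = select_learner(remaining, learners)  # per-subset learner
        model = learner(remaining)
        learnt = [p for p in remaining if well_learnt(model, p)]
        if not learnt:                                 # no progress: stop
            break
        stages.append((model, learnt))
        remaining = [p for p in remaining if p not in learnt]
    return stages, remaining

# toy demo: a pattern counts as "learnt" if it lies below the stage mean
data = [1, 2, 3, 10, 11, 12]
learners = [lambda pats: sum(pats) / len(pats)]        # "model" = mean
stages, leftover = recursive_supervised_training(
    data, learners,
    select_learner=lambda pats, ls: ls[0],
    well_learnt=lambda model, p: p < model)
```

Classification of a new pattern would then cascade through `stages`, using the first stage whose model claims it confidently.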

    Evaluating Overfit and Underfit in Models of Network Community Structure

    A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network's connectivity. Although many methods exist, the No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will over- or underfit on different inputs, finding more, fewer, or simply different communities than is optimal, and evaluation methods that use a metadata partition as ground truth will produce misleading conclusions about general accuracy. Here, we present a broad evaluation of overfitting and underfitting in community detection, comparing the behavior of 16 state-of-the-art community detection algorithms on a novel and structurally diverse corpus of 406 real-world networks. We find that (i) algorithms vary widely both in the number of communities they find and in their corresponding composition, given the same input, (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks, and (iii) these differences induce wide variation in accuracy on link prediction and link description tasks. We introduce a new diagnostic for evaluating overfitting and underfitting in practice, and use it to roughly divide community detection methods into general and specialized learning algorithms. Across methods and inputs, Bayesian techniques based on the stochastic block model and a minimum description length approach to regularization represent the best general learning approach, but can be outperformed under specific circumstances. These results introduce both a theoretically principled approach to evaluating over- and underfitting in models of network community structure and a realistic benchmark by which new methods may be evaluated and compared. (Comment: 22 pages, 13 figures, 3 tables)
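The link-prediction style of evaluation can be illustrated with a toy sketch: hide some edges, score node pairs from a community partition, and compute AUC over held-out edges versus non-edges. Real evaluations score pairs with model-derived probabilities; the same-community indicator used here is a deliberate simplification:

```python
import itertools
import random

def link_prediction_auc(nodes, edges, partition, holdout_frac=0.2, seed=0):
    """Toy link-prediction AUC: probability that a held-out edge outscores
    a random non-edge (ties count 1/2), scoring pairs by co-membership."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    rng.shuffle(edges)
    k = max(1, int(holdout_frac * len(edges)))
    held = edges[:k]
    kept = {frozenset(e) for e in edges[k:]}
    held_set = {frozenset(e) for e in held}
    non_edges = [p for p in itertools.combinations(sorted(nodes), 2)
                 if frozenset(p) not in kept and frozenset(p) not in held_set]
    score = lambda u, v: 1.0 if partition[u] == partition[v] else 0.0
    wins = ties = 0
    for u, v in held:
        for a, b in non_edges:
            s_edge, s_non = score(u, v), score(a, b)
            wins += s_edge > s_non
            ties += s_edge == s_non
    return (wins + 0.5 * ties) / (len(held) * len(non_edges))

# two 3-cliques: a partition matching them recovers held-out edges perfectly
nodes = range(6)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
auc = link_prediction_auc(nodes, edges, partition)
```

An overfitting method would split the cliques further and miss held-out within-clique edges, pushing the AUC below this ideal.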

    Using Band Ratio, Semi-Empirical, Curve Fitting, and Partial Least Squares (PLS) Models to Estimate Cyanobacterial Pigment Concentration from Hyperspectral Reflectance

    Indiana University-Purdue University Indianapolis (IUPUI). This thesis applies several different remote sensing techniques to data collected from 2005 to 2007 on central Indiana reservoirs to determine the best-performing algorithms for estimating the cyanobacterial pigments chlorophyll a and phycocyanin. The thesis comprises three scientific papers, each in press or under review at the time of publication. The first paper describes using a curve fitting model as a novel approach to estimating cyanobacterial pigments from field spectra. The second paper compares the previous method with additional methods, band ratio and semi-empirical algorithms, commonly used in remote sensing. The third paper describes using a partial least squares (PLS) method as a novel approach to estimating cyanobacterial pigments from field spectra. While the three papers had different methodologies and cannot be directly compared, the results from all three studies suggest that no type of algorithm greatly outperformed another in estimating chlorophyll a on central Indiana reservoirs. However, algorithms that account for increased complexity, such as the stepwise-regression band ratio (also known as 3-band tuning), curve fitting, and PLS, were able to predict phycocyanin with greater confidence.
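A band-ratio algorithm of the kind compared above amounts to regressing pigment concentration on the ratio of two reflectance bands. The band choices (~665 and ~708 nm are common for chlorophyll a) and the synthetic data below are illustrative assumptions, not the thesis's calibration:

```python
import numpy as np

def band_ratio_fit(bands, reflectance, num_wl, den_wl, pigment):
    """Fit pigment ~ slope * (R_num / R_den) + intercept, picking the
    spectral bands nearest the requested wavelengths."""
    i = int(np.argmin(np.abs(bands - num_wl)))
    j = int(np.argmin(np.abs(bands - den_wl)))
    ratio = reflectance[:, i] / reflectance[:, j]
    slope, intercept = np.polyfit(ratio, pigment, 1)   # linear regression
    return slope, intercept

# synthetic spectra constructed so that pigment = 3 * (R708 / R665) + 1
bands = np.array([665.0, 708.0])
r665 = np.linspace(0.02, 0.05, 10)
r708 = np.linspace(0.03, 0.04, 10)
reflectance = np.column_stack([r665, r708])
pigment = 3 * (r708 / r665) + 1
slope, intercept = band_ratio_fit(bands, reflectance, 708, 665, pigment)
```

The stepwise "3-band tuning" and PLS variants mentioned in the abstract generalize this by searching over band positions or regressing on many bands at once.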

    Sampling-based Prediction of Algorithm Runtime

    The ability to handle and analyse massive amounts of data has progressively improved during the last decade with the growth of computing power and the opening up of the Internet era. Nowadays, machine learning algorithms are widely applied in various fields of engineering science and in real-world applications. However, users of machine learning algorithms do not usually receive feedback on when a given algorithm will have finished building a model for a particular data set. While in theory such an estimate can be obtained by asymptotic performance analysis, the complexity of machine learning algorithms makes theoretical asymptotic analysis a very difficult task. This work has two goals. The first is to investigate how to use sampling-based techniques to predict the running time of a machine learning algorithm training on a particular data set. The second is to empirically evaluate a set of sampling-based running time prediction methods. Experimental results show that, with some care in the sampling stage, applying appropriate transformations to the running time observations and then using suitable curve fitting algorithms makes it possible to obtain useful average-case running time predictions and an approximate time function for a given machine learning algorithm building a model on a particular data set. Forty-one WEKA (Witten & Frank, 2005) machine learning algorithms are used in the experiments.
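The transform-then-fit idea can be sketched as a log-log power-law fit to timings on growing subsamples, then extrapolated to the full data-set size. This is a minimal illustration with synthetic timings, not the specific transformations or fitting methods evaluated in the work:

```python
import numpy as np

def predict_runtime(sizes, times, full_size):
    """Fit t = a * n^b by linear least squares in log-log space, then
    extrapolate to the full data-set size."""
    b, log_a = np.polyfit(np.log(sizes), np.log(times), 1)
    return np.exp(log_a) * full_size ** b

# synthetic observations consistent with quadratic scaling, t = 1e-6 * n^2
sizes = np.array([100, 200, 400, 800])
times = 1e-6 * sizes.astype(float) ** 2
est = predict_runtime(sizes, times, 10_000)   # predicted seconds at n=10,000
```

In practice the measured timings are noisy, so repeated runs per sample size and a choice among candidate model families (linear, n log n, polynomial) replace this single exact fit.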

    Time-Domain Macromodeling of High Speed Distributed Networks

    Get PDF
    With the rapid growth in density, operating speed, and complexity of modern very-large-scale integration (VLSI) circuits, there is a growing demand for efficient and accurate modeling and simulation of high speed interconnects and packages in order to ensure the signal integrity, reliability, and performance of electronic systems. Such models can be derived from knowledge of the physical characteristics of the structure or from the measured port-to-port response.

    In the first part of this thesis, a passive macromodeling technique based on the Method of Characteristics (referred to as the Passive Method of Characteristics, or PMoC) is described, which is applicable to modeling electrically long high-speed interconnect networks. The algorithm is based on extracting the propagation delay of the interconnect, followed by a low-order rational approximation to capture the attenuation effects. Its key advantage is that the curve fitting used to realize the macromodel depends only on the per-unit-length (p.u.l.) parameters and not on the length of the transmission line. In this work, the PMoC is extended to model multiconductor transmission lines.

    Next, an efficient approach for time-domain sensitivity analysis of lossy high speed interconnects in the presence of nonlinear terminations is presented, based on the PMoC. An important feature of the proposed method is that the sensitivities are obtained from the solution of the original network, leading to significant computational advantages. The sensitivity analysis is also used to optimize the physical parameters of the network to satisfy the required design constraints. A time-domain macromodel for lossy multiconductor transmission lines exposed to electromagnetic interference is also described, based on the PMoC; the algorithm provides an efficient mechanism to ensure the passivity of the macromodel for different line lengths. Numerical examples illustrate that, compared to other passive incident-field coupling algorithms, the proposed method is efficient in modeling electrically long interconnects, since delay extraction without segmentation is used to capture the frequency response.

    In addition, this thesis discusses macromodeling techniques for complex packaging structures based on the frequency-domain behavior of the system obtained from measurements or electromagnetic simulators. Such techniques approximate the transfer function of the interconnect network as a rational function, which can be embedded in modern circuit simulators such as SPICE (Simulation Program with Integrated Circuit Emphasis). One of the most popular tools for rational approximation of measured or simulated data is the vector fitting (VF) algorithm. Nonetheless, vector fitting algorithms often suffer from convergence issues and a lack of accuracy when dealing with noisy measured data. As part of this thesis, a methodology is presented to improve the convergence and accuracy of the vector fitting algorithm based on the instrumental variable technique. The methodology obtains the "instruments" in an iterative manner and does not increase the complexity of vector fitting, while capturing the frequency response and minimizing the bias.
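The kind of rational approximation that vector fitting performs can be illustrated with the simpler linearized (Levy) fit below. VF itself works with pole relocation and iterates, and the instrumental-variable refinement targets exactly the bias this one-shot linearization exhibits; this sketch only shows the underlying least-squares structure:

```python
import numpy as np

def rational_fit_levy(s, H, n_num, n_den):
    """Linearized rational fit: solve N(s) - H(s)*D(s) ~ 0 in least squares,
    with the denominator's constant term fixed to 1. The implicit weighting
    by D(s) biases the fit, which iterative schemes (VF, instrumental
    variables) are designed to remove."""
    cols = [s**k for k in range(n_num + 1)]                  # numerator terms
    cols += [-H * s**k for k in range(1, n_den + 1)]         # denominator terms
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, H, rcond=None)
    a = coef[:n_num + 1]
    b = np.concatenate(([1.0], coef[n_num + 1:]))
    return a, b   # H(s) ~ (sum a_k s^k) / (sum b_k s^k)

# noise-free samples of H(s) = (1 + 2s) / (1 + 3s + s^2): fit recovers it
s = np.linspace(0.1, 5.0, 40)
H = (1 + 2 * s) / (1 + 3 * s + s**2)
a, b = rational_fit_levy(s, H, n_num=1, n_den=2)
```

With noisy data the recovered coefficients drift systematically, which is the biasing behavior the thesis's instrumental-variable methodology addresses.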