
    A flexible Tool for Model Building: the Relevant Transformation of the Inputs Network Approach (RETINA)

    A new method, called the relevant transformation of the inputs network approach (RETINA), is proposed as a tool for model building and selection. It is designed to address some of the shortcomings of neural networks. It has the flexibility of neural network models, the concavity of the likelihood in the weights of the usual likelihood models, and the ability to identify a parsimonious set of attributes that are likely to be relevant for predicting out-of-sample outcomes. RETINA expands the range of models by considering transformations of the original inputs; splits the sample into three disjoint subsamples; sorts the candidate regressors by a saliency feature; chooses the models in subsample 1; and uses subsample 2 for parameter estimation and subsample 3 for cross-validation. It is modular, can be used as a data exploratory tool, and is computationally feasible on personal computers. In tests on simulated data, it achieves high success rates when the sample size or the R² is large enough. As our experiments show, it is superior to alternative procedures such as the nonnegative garrote and forward and backward stepwise regression.
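The first steps described in this abstract (expanding the inputs with transformations, splitting the sample into three disjoint subsamples, and sorting candidates by saliency on subsample 1) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the equal-thirds split, the choice of product and ratio transforms, and the use of absolute Pearson correlation as the saliency score are all assumptions made for illustration.

```python
import random


def three_way_split(n, seed=0):
    """Return three disjoint index lists that together cover range(n).

    Equal thirds are an assumption; the abstract does not give proportions.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    a, b = n // 3, 2 * n // 3
    return idx[:a], idx[a:b], idx[b:]


def abs_corr(x, y):
    """Absolute Pearson correlation, a stand-in for the saliency feature."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def rank_candidates(candidates, y, rows):
    """Sort candidate regressor names by saliency computed on `rows` only."""
    ys = [y[i] for i in rows]
    scores = {
        name: abs_corr([col[i] for i in rows], ys)
        for name, col in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Simulated data: the outcome depends on the product of the two inputs.
rng = random.Random(1)
x1 = [rng.uniform(1, 2) for _ in range(300)]
x2 = [rng.uniform(1, 2) for _ in range(300)]
y = [a * b + rng.gauss(0, 0.01) for a, b in zip(x1, x2)]

# Candidate regressors: original inputs plus two hypothetical transformations.
candidates = {
    "x1": x1,
    "x2": x2,
    "x1*x2": [a * b for a, b in zip(x1, x2)],
    "x1/x2": [a / b for a, b in zip(x1, x2)],
}

sub1, sub2, sub3 = three_way_split(len(y))
ranking = rank_candidates(candidates, y, sub1)
print(ranking)
```

In a fuller procedure, subsample 2 would then be used to estimate parameters for the chosen regressors and subsample 3 to cross-validate the result; those steps are omitted here to keep the sketch short.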

    What Advertisers Want: A Hedonic Analysis of Advertising Rates in South African Consumer Magazines

    This article explores the role of circulation, readership and reader demographics in the determination of advertising rates in South African consumer magazines. The study uses panel data collected between 2000 and 2003 to quantify the relationships by assigning implicit prices to various magazine characteristics. Furthermore, a synopsis of the structure of the magazine industry in South Africa is developed using cluster-analytic techniques. The analysis lends statistical credence to some widely held beliefs in the publishing industry, namely that advertisers value the young, the educated and the affluent as audiences. The role of race and gender in the determination of magazine advertising rates is also explored.

    Machine learning applied to enzyme turnover numbers reveals protein structural correlates and improves metabolic models.

    Knowing the catalytic turnover numbers of enzymes is essential for understanding the growth rate, proteome composition, and physiology of organisms, but experimental data on enzyme turnover numbers are sparse and noisy. Here, we demonstrate that machine learning can successfully predict catalytic turnover numbers in Escherichia coli based on integrated data on enzyme biochemistry, protein structure, and network context. We identify a diverse set of features that are consistently predictive for both in vivo and in vitro enzyme turnover rates, revealing novel protein structural correlates of catalytic turnover. We use our predictions to parameterize two mechanistic genome-scale modelling frameworks for proteome-limited metabolism, leading to significantly higher accuracy in the prediction of quantitative proteome data than previous approaches. The presented machine learning models thus provide a valuable tool for understanding metabolism and the proteome at the genome scale, and elucidate structural, biochemical, and network properties that underlie enzyme kinetics.

    Interjurisdictional Housing Prices and Spatial Amenities: Which Measures of Housing Prices Reflect Local Public Goods?

    Understanding the spatial variation in housing prices plays a crucial role in topics ranging from the cost of living to quality-of-life indices to studies of public goods and household mobility. Yet analysts have not reached a consensus on the best source of such data, variously using self-reported values from the census, transactions values, tax assessments, and rental values. Additionally, while most studies use micro-level data, some have used summary statistics such as the median housing value. Assessing neighborhood price indices in Los Angeles, we find that indices based on transactions prices are highly correlated with indices based on self-reported values, but the former are better correlated with public goods. Moreover, rental values have a higher correlation with public goods and income levels than either asset-value measure. Finally, indices based on median values are poorly correlated with the other indices, public goods, and income.

    Online Tool Condition Monitoring Based on Parsimonious Ensemble+

    Accurate diagnosis of tool wear in the metal turning process remains an open challenge for both scientists and industrial practitioners because of inhomogeneities in workpiece material, nonstationary machining settings to suit production requirements, and nonlinear relations between measured variables and tool wear. Common methodologies for tool condition monitoring still rely on batch approaches, which cannot cope with the fast sampling rate of the metal cutting process. Furthermore, they require a retraining process to be completed from scratch when dealing with a new set of machining parameters. This paper presents an online tool condition monitoring approach based on Parsimonious Ensemble+, pENsemble+. The unique feature of pENsemble+ lies in its highly flexible principle where both ensemble structure and base-classifier structure can automatically grow and shrink on the fly based on the characteristics of data streams. Moreover, an online feature selection scenario is integrated to actively sample relevant input attributes. The paper presents an advancement of a newly developed ensemble learning algorithm, pENsemble+, where an online active learning scenario is incorporated to reduce operator labelling effort. An ensemble merging scenario is proposed which allows reduction of ensemble complexity while retaining its diversity. Experimental studies utilising real-world manufacturing data streams and comparisons with well-known algorithms were carried out. Furthermore, the efficacy of pENsemble was examined using benchmark concept drift data streams. It has been found that pENsemble+ incurs low structural complexity and results in a significant reduction of operator labelling effort.
    Comment: this paper has been published in IEEE Transactions on Cybernetics

    United Kingdom and United States Tourism Demand for Malaysia: A Cointegration Analysis

    The tourism industry has been an important contributor to the Malaysian economy. In this paper we inspect variations in the long-run demand for tourism from the United Kingdom and the United States to Malaysia. The demand for tourism is explained by macroeconomic variables, including income in the origin countries, tourism prices in Malaysia, and travel cost between the two countries. Annual data from 1972 to 2006 are used for the analysis. Augmented Dickey-Fuller and Johansen's maximum likelihood tests are used to test for unit roots and cointegration. An error correction model (ECM) is estimated to explain United Kingdom and United States demand for tourism to Malaysia. The results show that a long-run equilibrium exists among the variables, and that United Kingdom and United States tourists seem to be highly sensitive to the price variable.
    Keywords: tourism demand, cointegration analysis, error correction model