Automatic supervised information extraction of structured web data

Author: Pérez Ruiz David
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/06/2019

The overall purpose of this project is, in short, to create a system able to extract key information from product web pages just as a human would: the name of the product, its description, price tag, the company that produces it, and so on. At first glance this may not seem extraordinary or technically difficult, since web scraping techniques have existed for a long time (the Python library Beautiful Soup, an HTML parser, was released in 2004). But consider what it actually means to extract the desired information from any given web source: the way information is displayed can vary enormously, not only visually but also semantically. For instance, some hotel booking web pages display the prices for all room types at once, while pages for medium-sized consumer products on websites like Amazon present the main product in detail and then smaller product recommendations further down the page, the latter being the layout preferred by most retail companies. Each site also has its own styling and search engine. Seen this way, mining valuable data from the web no longer sounds as easy as it first seemed. Hence the purpose of this project is to shed some light on the problem of Automatic Supervised Information Extraction of Structured Web Data. It is important to ask whether developing such a solution is valuable at all. Such an investment of time and computing resources should lead to a useful end result, at least on paper, to justify it. The opinion of this author is that it does lead to a potentially valuable result. The targeted extraction of publicly available consumer-oriented content at large scale, in an accurate, reliable and future-proof manner, could provide a large amount of very useful data.
This data, if kept updated, could create endless opportunities for Business Intelligence, although exactly which ones is beyond the scope of this work. A simple metaphor explains the potential value of this work: if an oil company were told where all the oil reserves on the planet are, it would still need to invest in machinery, workers and time to exploit them, but half of the job would already have been done. As the reader will see in this work, the problem is tackled by building a somewhat complex architecture that ends in an Artificial Neural Network (NN). A quick overview of this architecture is as follows: first, find the URLs within a given site that lead to the product pages containing the data to be extracted (such as URLs that lead to "action figure" products on ebay.com); second, for each URL, extract its HTML, take a screenshot of the page, and store this data in a suitable and scalable fashion; third, label the data that will be fed to the NN; fourth, prepare the labelled data as input for the NN; fifth, train the NN; and sixth, deploy the NN to make (hopefully accurate) predictions.
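The abstract's point that display conventions vary from site to site can be illustrated with a small sketch. This is not the author's method (the thesis uses a neural network); it is a hand-written rule built on Python's standard `html.parser`, and the two page snippets, the class names (`price`, `cost`) and the helper `extract` are hypothetical. It only shows why a rule tuned to one layout fails on another.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of the first element carrying a given CSS class.

    Simplification: void tags written without a slash (e.g. <br>) inside the
    matched element would confuse the depth counter; this is only a sketch.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self._depth = 0      # > 0 while we are inside the matched element
        self.text = None     # stays None if no element matches

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1
        elif self.text is None:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self._depth = 1
                self.text = ""

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.text += data


def extract(html, css_class):
    """Return the stripped text of the first element with css_class, or None."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.text.strip() if parser.text is not None else None


# Two hypothetical product pages that show the same facts with different markup.
page_a = '<div><h1 class="product-title">Action figure</h1><span class="price">19.99 EUR</span></div>'
page_b = '<div><h2 class="item-name">Action figure</h2><p class="cost"><b>19.99</b> EUR</p></div>'

print(extract(page_a, "price"))  # rule written for site A
print(extract(page_b, "price"))  # same rule applied to site B finds nothing
print(extract(page_b, "cost"))   # site B needs its own rule
```

Every new site needs its own rule in this approach, which is the maintenance burden the abstract argues a learned extractor could avoid.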

UPCommons. Portal del coneixement obert de la UPC

Asymmetric long memory GARCH: a reply to Hwang's model

Authors: Pérez Ana; Ruiz Esther
Publication date: 01/11/2001

Hwang (2001) proposes the FIFGARCH model to represent long memory asymmetric conditional variance. Although he claims that this model nests many previous models, we show that it does not and that the model is badly specified. We propose an alternative specification.

e-Archivo (Univ. Carlos III de Madrid e-Archivo)

Asymmetric long memory GARCH: a reply to Hwang's model

Authors: Pérez Ana; Ruiz Esther
Publication venue: Elsevier BV
Publication date: 01/01/2003

Hwang (Econom. Lett. 71 (2001) 1) proposes the FIFGARCH model to represent long memory asymmetric conditional variances. However, the model is badly specified and does not nest some fractionally integrated heteroskedastic models previously proposed. We suggest an alternative specification and illustrate the results with simulated data.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

e-Archivo (Univ. Carlos III de Madrid e-Archivo)

Properties of the sample autocorrelations of non-linear transformations in long memory stochastic volatility models

Authors: Pérez A.; Ruiz Esther
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2003

The autocorrelations of log-squared, squared, and absolute financial returns are often used to infer the dynamic properties of the underlying volatility. This article shows that, in the context of long-memory stochastic volatility models, these autocorrelations are smaller than the autocorrelations of the log volatility and so is the rate of decay for squared and absolute returns. Furthermore, the corresponding sample autocorrelations could have severe negative biases, making the identification of conditional heteroscedasticity and long memory a difficult task. Finally, we show that the power of some popular tests for homoscedasticity is larger when they are applied to absolute returns.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

e-Archivo (Univ. Carlos III de Madrid e-Archivo)

Finite sample properties of a QML estimator of Stochastic Volatility models with long memory

Authors: Pérez Ana; Ruiz Esther
Publication venue: Elsevier BV
Publication date: 01/01/2001

We analyse the finite sample properties of a QML estimator of LMSV models. We show its poor performance for realistic parameter values. We discuss an identification problem when the volatility has a unit root. An empirical analysis illustrates our findings.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

e-Archivo (Univ. Carlos III de Madrid e-Archivo)

Properties of the sample autocorrelations in autoregressive stochastic volatility models

Authors: Ana Pérez; Esther Ruiz

Time series generated by Stochastic Volatility (SV) processes are uncorrelated although not independent. This has consequences for the properties of the sample autocorrelations. In this paper, we analyse the asymptotic and finite sample properties of the correlogram of series generated by SV processes. It is shown that the usual uncorrelatedness tests could be misleading. The properties of the correlogram of the log-squared series, often used as a diagnostic of conditional heteroscedasticity, are also analysed. It is proven that the more persistent and the larger the variance of volatility, the larger the negative bias of the sample autocorrelations of that series.

Research Papers in Economics

Stochastic volatility models and the Taylor effect

Authors: Mora Galán Alberto; Pérez Ana; Ruiz Esther
Publication date: 01/11/2004

It has often been observed empirically that the sample autocorrelations of absolute financial returns are larger than those of squared returns. This property, known as the Taylor effect, is analysed in this paper in the Stochastic Volatility (SV) model framework. We show that the stationary autoregressive SV model is able to generate this property for realistic parameter specifications. On the other hand, the Taylor effect is shown not to be a sampling phenomenon due to estimation biases of the sample autocorrelations. Therefore, financial models that aim to explain the behaviour of financial returns should take this property into account.
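The comparison described in this abstract can be tried on simulated data. The following is a minimal sketch, not the authors' code: it assumes the common autoregressive SV parameterisation y_t = exp(h_t / 2) * eps_t with h_t = phi * h_{t-1} + sigma_eta * eta_t and Gaussian noise, and the parameter values (phi = 0.98, sigma_eta = 0.2), the seed and the function names are illustrative assumptions.

```python
import math
import random


def simulate_arsv(n, phi, sigma_eta, seed=0):
    """Simulate n returns from a stationary ARSV(1) model (assumed form, see lead-in)."""
    rng = random.Random(seed)
    # Draw the initial log-volatility from its stationary distribution (needs |phi| < 1).
    h = rng.gauss(0.0, sigma_eta / math.sqrt(1.0 - phi * phi))
    returns = []
    for _ in range(n):
        h = phi * h + sigma_eta * rng.gauss(0.0, 1.0)
        returns.append(math.exp(h / 2.0) * rng.gauss(0.0, 1.0))
    return returns


def sample_acf(x, lag):
    """Sample autocorrelation of x at the given lag (mean-corrected, divided by lag-0 sum)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return numer / denom


y = simulate_arsv(5000, phi=0.98, sigma_eta=0.2, seed=1)
abs_y = [abs(v) for v in y]
sq_y = [v * v for v in y]

# Print the two correlograms side by side for a few lags.
for lag in (1, 5, 10):
    print(lag, round(sample_acf(abs_y, lag), 3), round(sample_acf(sq_y, lag), 3))
```

Comparing the two printed columns shows whether this particular simulated path exhibits the Taylor effect. A single path is only an illustration; the paper's claims rest on its own analysis, not on a run like this.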

e-Archivo (Univ. Carlos III de Madrid e-Archivo)

Estimating US persistent and transitory monetary shocks: implications for monetary policy

Authors: Jesús Ruiz; Juan Angel Lafuente; Rafaela Pérez

This paper proposes an estimation method for persistent and transitory monetary shocks using the monetary policy modeling proposed in Andolfatto et al. [Journal of Monetary Economics, 55 (2008), pp. 406-422]. The contribution of the paper is threefold: a) to deal with non-Gaussian innovations, we consider a convenient reformulation of the state-space representation that enables us to use the Kalman filter as an optimal estimation algorithm. Now the state equation allows expectations to play a significant role in explaining the future time evolution of monetary shocks; b) it offers the possibility to perform maximum likelihood estimation for all the parameters involved in the monetary policy, and c) as a consequence, we can estimate the conditional probability that a regime change has occurred in the current period given an observed monetary shock. Empirical evidence on US monetary policy making is provided through the lens of a Taylor rule, suggesting that the Fed's policy was implemented in accordance with the macroeconomic conditions after the Great Moderation. The use of the particle filter produces similar quantitative and qualitative findings. However, our procedure has a much lower computational cost.
Keywords: Kalman filter, Non-normality, Particle filter, Monetary policy
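For readers unfamiliar with the Kalman filter named in the abstract, here is a minimal sketch of the textbook scalar filter for a linear Gaussian state-space model. It is not the paper's reformulated representation, which is not given in the abstract; the model form (state x_t = a * x_{t-1} + w_t with variance q, observation y_t = x_t + v_t with variance r) and all names are assumptions made for illustration.

```python
def kalman_filter_scalar(observations, a, q, r, x0, p0):
    """Run a scalar Kalman filter and return the filtered (mean, variance) pairs.

    Assumed model (illustrative only):
        x_t = a * x_{t-1} + w_t,  Var(w_t) = q
        y_t = x_t + v_t,          Var(v_t) = r
    """
    x, p = x0, p0
    filtered = []
    for y in observations:
        # Prediction step: propagate mean and variance through the state equation.
        x_pred = a * x
        p_pred = a * a * p + q
        # Update step: weight the prediction error by the Kalman gain.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        filtered.append((x, p))
    return filtered


# One observation with equal prior and observation variance: the filtered mean
# lands halfway between the prior mean (0.0) and the observation (2.0).
print(kalman_filter_scalar([2.0], a=1.0, q=0.0, r=1.0, x0=0.0, p0=1.0))
```

Each step costs a fixed handful of arithmetic operations, whereas a particle filter propagates many simulated particles per step, which is consistent with the abstract's remark about computational cost.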

Research Papers in Economics

Tax Reforms in an Endogenous Growth Model with Pollution

Authors: Esther Fernández; Jesús Ruiz; Rafaela Pérez Sánchez

This paper discusses the effects of a green tax reform in an AK growth model without abatement activities and with a negative environmental externality in the utility function. There is also a non-optimal level of public spending. The results depend on the financing source of public spending. When there is no public debt, a revenue-neutral green tax reform has no effect on pollution, growth or welfare. By contrast, when short-run deficits are financed by issuing debt, a variety of green tax reforms increase welfare. Nevertheless, in this framework, non-green tax reforms are also welfare improving.
Keywords: Environmental externalities, Economic growth, Pollution taxes, Laffer Curve.

Research Papers in Economics