128 research outputs found

    Estimating Classification Uncertainty of Bayesian Decision Tree Technique on Financial Data

    Copyright © 2007 Springer. The final publication is available at link.springer.com. Book title: Perception-based Data Mining and Decision Making in Economics and Finance. Summary: Bayesian averaging over classification models allows the uncertainty of classification outcomes to be evaluated, which is of crucial importance for making reliable decisions in applications such as finance, in which risks have to be estimated. The uncertainty of classification is determined by a trade-off between the amount of data available for training, the diversity of a classifier ensemble and the required performance. The interpretability of classification models can also give useful information to experts responsible for making reliable classifications. For this reason Decision Trees (DTs) are attractive classification models. The required diversity of the DT ensemble can be achieved by Bayesian model averaging over all possible DTs. In practice, the Bayesian approach can be implemented on the basis of a Markov Chain Monte Carlo (MCMC) technique of random sampling from the posterior distribution. For sampling large DTs, the MCMC method is extended by the Reversible Jump technique, which allows DTs to be induced under given priors. When prior information on the DT size is unavailable, the sweeping technique, which defines the prior implicitly, performs better. Within this chapter we explore the classification uncertainty of the Bayesian MCMC techniques on some datasets from the StatLog Repository and on real financial data. The classification uncertainty is compared within an Uncertainty Envelope technique, which deals with the class posterior distribution and a given confidence probability. This technique provides realistic estimates of the classification uncertainty that can be easily interpreted in statistical terms for the purpose of risk evaluation.
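
The idea of judging an ensemble's output against a class posterior and a confidence probability can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names, the vote-fraction estimate of the posterior, the three outcome labels and the default threshold `gamma` are all assumptions made for illustration, and the votes would in practice come from DTs sampled by MCMC.

```python
from collections import Counter


def class_posterior(votes):
    """Estimate the class posterior as the fraction of ensemble votes per class."""
    counts = Counter(votes)
    total = len(votes)
    return {label: n / total for label, n in counts.items()}


def envelope_label(votes, true_label, gamma=0.99):
    """Label one test example given the ensemble votes (hypothetical helper).

    gamma is the assumed confidence probability: the ensemble's decision counts
    as confident only if at least this fraction of members agree on it.
    """
    posterior = class_posterior(votes)
    predicted, consistency = max(posterior.items(), key=lambda kv: kv[1])
    if consistency < gamma:
        return "uncertain"
    if predicted == true_label:
        return "confident-correct"
    return "confident-incorrect"


if __name__ == "__main__":
    # Ten hypothetical ensemble members voting on one example.
    print(envelope_label(["up"] * 10, "up", gamma=0.9))
    print(envelope_label(["up"] * 6 + ["down"] * 4, "up", gamma=0.9))
```

Under these assumptions, the share of "uncertain" and "confident-incorrect" examples on a test set gives a rough, statistically interpretable picture of classification risk.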

    Regional population expenditure for foodstuffs in the Russian Federation: componential and cluster analyses

    The article addresses the problem of conducting component and cluster analyses of population expenditure on food, one of the most important components of the standard of living. The purpose of the analysis is to identify regional clusters of the Russian Federation that differ in the structure of household expenditure on foodstuffs. The foodstuffs are presented in absolute units, taking into account the integral standard of living index. Data mining methods, namely component and cluster analyses, are applied as the research methods. A data mining procedure based on the interconnected use of component and cluster analyses is proposed. The procedure considers the interrelation between the results obtained by the different methods, and also allows a return to the previous method to repeat the analysis and progressively refine the composition of the clusters. The analysis reveals a few clusters of wealthy regions, characterized by high and average levels of expenditure on foodstuffs, and considerably more clusters of less wealthy and poor regions, characterized by a low level of expenditure on foodstuffs. It is shown that growth in the standard of living, measured by gross regional product per capita, is accompanied by growth of the Gini coefficient, which indicates both inequality of income distribution and a reduction in expenditure on low-value foodstuffs. The results of the analysis can be applied to the development of a decision support system for analyzing scenarios of macroeconomic regulation in the field of income policy, with the purpose of raising the population's standard of living.
    The analysis of population expenditure on foodstuffs reveals the cluster structure of the regions of the Russian Federation, describes it in terms of generalized indicators, and formulates the specific characteristics of the regional clusters and important management decisions.

    A Semantic-Based Knowledge Management Platform

    We describe the development of a semantic-based knowledge management platform (SKMP) for web-enabled environments featuring intelligence and insight capabilities. The main objective of the platform is to semantically search, analyze and present information retrieved from the web (or any other type of document), and to allow domain ontologies to evolve periodically. This is achieved through the use of Multi-Agent Systems and ontologies, the former for building distributed systems and the latter for knowledge representation. The most important feature of the SKMP is that the information retrieved from the web is the source of ontology evolution, while periodic ontology evolution enriches the domain ontologies with more semantics and in return significantly improves the efficiency of semantic retrieval; that is, the two are mutually reinforcing. We test and verify this feature on three domain ontologies from different domains.

    Higher education decision making and decision support systems

    The authors illustrate several issues in decision support and decision support systems (DSS), state-of-the-art research in these fields, and also their own studies in designing a higher education DSS. The final section contains our contribution in outlining the modules of the DSS, involving the present systems and databases of FSEGA and UBB, and the results and activities of FSEGA students and teaching and research staff, to assist decisions for all the actors involved in the processes, in various specific situations. Keywords: decision support, decision support systems (DSS), higher education institutions, Information and Communication Technologies (ICT)

    Are foreign currency markets interdependent? evidence from data mining technologies

    This study uses two data mining methodologies, Classification and Regression Trees (C&RT) and Generalized Rule Induction (GRI), to uncover patterns among daily cash closing prices of eight currency markets. Data from 2000 through 2009 is used, with the last year held out to test the robustness of the rules found in the previous nine years. Results from the two methodologies are contrasted. A number of rules which perform well in both the training and testing years are discussed as empirical evidence of interdependence among foreign currency markets. The mechanical rules identified in this paper can usefully supplement other types of financial modeling of foreign currencies. Keywords: Foreign Currency Markets

    Constructing Time Series Shape Association Measures: Minkowski Distance and Data Standardization

    It is surprising that, over the last two decades, many works in time series data mining and clustering have been concerned with measures of similarity of time series but not with measures of association that can be used for measuring possible direct and inverse relationships between time series. Inverse relationships can exist between the dynamics of prices and sales volumes, between growth patterns of competing companies, between well production data in oilfields, between wind velocity and air pollution concentration, etc. The paper develops a theoretical basis for the analysis and construction of time series shape association measures. Starting from the axioms of time series shape association measures, it studies methods of constructing measures that satisfy these axioms. Several general methods of constructing such measures, suitable for measuring time series shape similarity and shape association, are proposed. Time series shape association measures based on Minkowski distance and data standardization methods are considered. The cosine similarity and Pearson's correlation coefficient are obtained as particular cases of the proposed general methods, which can also be used to construct new association measures in data analysis. Comment: Presented at BRICS CCI 2013, Porto de Galinhas, Brasil, 8-11 September 2013. A reference to the Proceedings of BRICS CCI 2013 is added.
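
How a distance plus a standardization step can yield a familiar association measure can be sketched as follows. This is an illustration rather than the paper's general construction: it uses only the Euclidean case (Minkowski distance with p = 2) and the standard identity that two unit-norm vectors with squared distance d² have inner product 1 - d²/2. The function names are hypothetical.

```python
import math


def center(x):
    """Subtract the mean so the series has zero mean."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def unit_norm(x):
    """Scale the series to unit Euclidean norm."""
    n = math.sqrt(sum(v * v for v in x))
    if n == 0:
        raise ValueError("series with zero norm cannot be standardized")
    return [v / n for v in x]


def minkowski(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length series."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def association(x, y, standardize):
    """Map the Euclidean distance between standardized series into [-1, 1]."""
    d = minkowski(standardize(x), standardize(y), p=2.0)
    return 1.0 - d * d / 2.0


def cosine_from_distance(x, y):
    # Unit-norm scaling only: the result equals the cosine similarity.
    return association(x, y, unit_norm)


def pearson_from_distance(x, y):
    # Centering followed by unit-norm scaling: the result equals Pearson's r.
    return association(x, y, lambda s: unit_norm(center(s)))


if __name__ == "__main__":
    prices = [1.0, 2.0, 3.0, 4.0, 5.0]
    volumes = [10.0, 8.0, 6.0, 4.0, 2.0]  # moves exactly opposite to prices
    print(pearson_from_distance(prices, volumes))  # close to -1: inverse relationship
```

A value near -1 flags an inverse relationship of the kind the abstract mentions (for example prices against sales volumes), which a plain similarity measure would only report as "dissimilar".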

    The Applicability Of Data Mining Techniques In Eurostat Databases: An Example Of The Decision Tree

    As a result of the rapid development of technology, the infrastructure used for obtaining and storing data has also developed continuously. In addition, the importance of knowledge as an indispensable element for individuals and institutions has increased with each passing day. However, former data management techniques have become insufficient for the rapidly growing mass of data, so new methods were needed. Data mining is a field that offers a variety of data extraction techniques to meet these requirements. Eurostat is the statistical office of the European Union. Its purpose is to provide objective and accurate data to decision makers. These statistics are open for everyone to use. Although Eurostat databases are very comprehensive and useful, it is particularly difficult to find academic publications that apply data mining to them. This study aims to carry out a data mining study using the statistics provided by Eurostat. For the analysis, the “Information Society” field was selected and the data was analyzed using a decision tree algorithm. Finally, the results of the analysis are presented. The study is also intended to shed some light on future studies.

    Identification of Patterns of Consumption through the Daily Mean Outdoor Temperature

    The identification and recognition of patterns in the context of buildings is necessary feedback for creating intelligent buildings. In this context, the key is to equip the systems with learning elements so that they can make decisions. The challenge is to detect the elements that predict human behavior in the building. Daily mean outdoor temperature is one of the variables that affects human comfort, because users adapt to the weather. This paper analyzes the consumption in an office with respect to the internal temperature and the daily mean temperature using clustering techniques. The clusters can be used to forecast consumption.