48 research outputs found

    Research on the SGL-SVM Method and Its Application to Financial Distress Prediction

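    For classification problems, this paper proposes the sparse group lasso support vector machine (Sparse group lasso SVM, SGL-SVM), which adds the SGL penalty to the SVM loss function so that variables can be selected between groups and within groups at the same time. Because the SGL-SVM objective function is difficult to solve, the paper also proposes a fast two-level coordinate descent algorithm. Simulation experiments show that SGL-SVM outperforms the other methods in both prediction and variable selection; for data whose variables have a natural group structure and are sparse within groups, the method improves variable selection and prediction accuracy together. Finally, SGL-SVM is applied to predicting financial distress among listed manufacturing companies in China. Funded by the National Natural Science Foundation of China project "Group variable selection in generalized linear models and its application to credit scoring" (71471152); the National Statistical Science Research Key Project "Credit scoring research under big data" (2015629); and the Fundamental Research Funds for the Central Universities project "Integrative analysis of multi-source heterogeneous big data" (20720181003, 20720171095).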
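
    The abstract above says only that an SGL penalty is added to the SVM loss function. As a rough sketch of what such an objective commonly looks like (an assumption built from the standard hinge loss and the standard sparse group lasso penalty, not the paper's stated formulation), with hypothetical tuning parameters $\lambda \ge 0$ and $\alpha \in [0,1]$, and coefficient groups $\beta^{(g)}$ of size $p_g$ for $g = 1,\dots,G$:

    ```latex
    \min_{\beta_0,\,\beta}\;
    \frac{1}{n}\sum_{i=1}^{n}\bigl[\,1 - y_i\,(\beta_0 + x_i^{\top}\beta)\,\bigr]_{+}
    \;+\; \lambda\Bigl(\alpha\,\lVert\beta\rVert_{1}
    \;+\; (1-\alpha)\sum_{g=1}^{G}\sqrt{p_g}\,\lVert\beta^{(g)}\rVert_{2}\Bigr)
    ```

    In this form the $\ell_1$ term can set individual coefficients inside a group to zero, while the group-wise $\ell_2$ term can set whole groups to zero, which corresponds to the within-group and between-group selection the abstract describes.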

    The Research of Random Forest Theory and its Application in Economics and Finance

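    In recent years, disciplines have been steadily merging, and the cross-fertilization of research methods has become a major trend in modern science. The convergence of financial theory, mathematical statistics, econometrics, computer technology, data mining, and machine learning has provided new methods and ideas for research in economics and finance. The nonparametric random forest method, which originated in data mining, builds on nonparametric decision trees, draws on the ensemble prediction idea from machine learning, and relies on computing technology; it handles nonlinear and non-Gaussian problems well and achieves high prediction accuracy. Extensions built on it, such as quantile regression forests and random survival forests, have found many applications in medicine, marketing, physics, archaeology, and other fields. In China, however, research on nonparametric methods lags far behind and is almost entirely limited to applying simple nonparametric methods; research on nonparametric random forests is currently... Degree: Doctor of Economics. Department and major: School of Economics, Department of Planning and Statistics, Statistics. Student ID: 1542007015367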

    Statistics Should Embrace Data Science in the Big Data Era

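    The 21st century is an age of information explosion. The rapid development of computer technology has made data collection and storage far easier, and every sector accumulates large volumes of data each day, such as commercial bank transaction records, supermarket sales records, and the financial statements of small and medium-sized enterprises in government statistics. The dimensionality of these data is also rising: studies of the relationship between genes and cancer involve tens of thousands of genes, and credit scoring can involve thousands of explanatory variables. Data sources are diverse, including business records, sensor data, third-party data, and even data crawled from the web. Data formats are increasingly varied as well, including structured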

    Research on Personal Credit Scoring Based on Multi-Source Data Fusion

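    With the development of information technology, data sources keep multiplying. Although this allows an individual's credit standing to be characterized more precisely and scientifically, the number of sources and the complexity of their structure pose challenges for traditional credit reporting techniques. This paper proposes a personal credit model based on multi-source data fusion that performs modeling and variable selection on several data sets simultaneously while accounting for both the similarity and the heterogeneity among data sets. Simulation experiments show that the proposed integrative model has clear advantages in variable selection and classification performance. Applied to personal credit scoring on two data sets, one urban and one rural, the integrative model also performs well in practice. Funded by the National Natural Science Foundation of China General Program project "Group variable selection in generalized linear models and its application to credit scoring" (71471152); the National Statistical Science Research Key Project "Credit scoring research under big data" (2015629); and the Fundamental Research Funds for the Central Universities project "Integrative analysis of multi-source heterogeneous big data" (20720171095).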

    Forecasting of Enterprise's Credit Risk Based on Network-logistic Model

    With the rapid development of computers and the Internet, especially in the era of big data, enterprises have accumulated large amounts of operational and financial data. The variables are numerous and their relationships intricate, so an enterprise credit risk early-warning model built with traditional logistic regression often performs poorly. Taking full account of the network structure among variables, this paper proposes a network-structured logistic model (network-logistic model) that performs variable selection and parameter estimation simultaneously through a penalized method. Monte Carlo simulations show that the network-logistic model outperforms the other methods compared. Finally, the model is applied to enterprise credit risk early warning in China: by taking full account of the network relationships among financial indicators, it selects evaluation indicators in a principled way and yields a credit risk forecasting approach better suited to Chinese enterprises. Funded by the National Natural Science Foundation of China General Program project "Group variable selection in generalized linear models and its application to credit scoring" (71471152); the National Social Science Fund of China Major Project "Research on big data and the development of statistical theory" (13&ZD148); and the National Social Science Fund of China Youth Project "High-dimensional variable selection methods for big data and their applications" (13CTJ001).

    Research on Dealing with Missing Data Based on Clustering and Association Rule

    This paper proposes a new method for handling missing data based on clustering and association rules. A clustering method first groups similar records of the data set containing missing values into the same class; an improved association rule method then mines the associations among variables within each sub-data set, and these associations are used to fill in the missing values. A case study shows that the method handles missing data well, particularly for massive data sets for which prior auxiliary information is unavailable. A stage result of the National Social Science Fund of China Key Project "Research on quality management of national statistical data" (09AZD0345).

    Ordinal rank cluster and analysis of active period of earthquakes

    Building on an analysis of Fisher's optimal-solution method for ordered clustering and the ordered nearest-neighbor clustering method, this paper proposes an ordinal rank clustering method for ordered samples and compares the computational efficiency of the three methods. The results show that ordinal rank clustering has a clear advantage in handling massive data. Finally, the method is applied to analyze the active periods of earthquakes in China's north-south seismic belt, with good results. Funded by the Ministry of Education Social Science Research Planning Project (06JA910003).

    Study on Effects of Social Security on Household Consumption

    Using the 2006 household micro-survey data from the CGSS ("Comprehensive Survey of Chinese Urban and Rural Residents' Lives"), this paper analyzes the distribution of consumption expenditure among urban and rural households in China. Urban and rural households are each divided into two groups, with and without social security. Quantile regression is used to study the effect of social security on household consumption at different consumption levels, and counterfactual analysis and quantile decomposition are used to study the consumption differences between the two groups. The main conclusions are: per capita consumption is higher for households with social security than for households without it; the income elasticity of consumption follows a pattern shaped like the character "几"; and the consumption gap between households with and without social security is driven mainly by differences in income, real-estate wealth, and similar factors. Finally, targeted policy recommendations are proposed. Funded by the Fundamental Research Funds for the Central Universities (2010221040); the National Natural Science Foundation of China (71201139); and the National Bureau of Statistics Statistical Research Program (71201139).

    Default Forecasting on Housing Mortgage and Interest Rate Policy Simulation

    This paper is the first to build a housing loan default risk assessment model based on the nonparametric random forest (RF). Using personal housing loan data from a large bank in China, it studies how borrower characteristics, loan characteristics, property characteristics, and local economic and cultural characteristics affect loan default. The empirical study finds that the proportion already repaid, the interest rate, the loan-to-income ratio, and the loan amount are the most important factors in default, and that the prediction accuracy of RF is clearly higher than that of other methods such as the logistic model. The paper also studies the effect of interest rate adjustments on default and finds that the interest rate has a negative effect on the default rate, and that this effect is asymmetric and nonlinear.

    Research on Price Discovery Function of Stock Index Futures in Chinese Emerging Market

    Using 5-minute high-frequency data on CSI 300 (Shanghai-Shenzhen 300) stock index futures, this paper applies cointegration tests, an error correction model, and impulse response functions to study the long-run and short-run price discovery mechanisms of stock index futures in China, and uses the information share model and the common factor model to measure the contribution of the stock index futures market to price discovery. Quantile regression is then introduced to examine the futures-spot relationship at different magnitudes of price rises and falls. The empirical results show that index futures and spot prices in China lead each other; at the current stage the spot market responds faster to shocks from the whole market and plays the relatively larger role in price discovery. As the magnitude of rises and falls changes, the impact of spot on futures follows a U-shaped pattern, while the impact of futures on spot rises steadily in one direction. Funded by the Fundamental Research Funds for the Central Universities (2010221040); the National Social Science Fund of China Key Project (11BTJ001); and the Fujian Provincial Social Science Fund (2011C042).