On the efficiency of estimating penetrating rank on large graphs
P-Rank (Penetrating Rank) has been suggested as a useful measure of structural similarity that takes into account both incoming and outgoing edges in ubiquitous networks. Existing work often utilizes memoization to compute P-Rank similarity in an iterative fashion, which requires cubic time in the worst case. Moreover, previous methods focus mainly on the deterministic computation of P-Rank and lack a probabilistic framework that scales well to large graphs. In this paper, we propose two efficient algorithms for computing P-Rank on large graphs. The first observation is that many objects in a real graph share similar neighborhood structures. By merging such objects with an explicit low-rank factorization, we devise a deterministic algorithm that computes P-Rank in quadratic time. The second observation is that by converting the iterative form of P-Rank into a matrix power series, we can leverage random sampling to probabilistically compute P-Rank in linear time with provable accuracy guarantees. Empirical results on both real and synthetic datasets show that our approaches achieve high time efficiency with controlled error and outperform the baseline algorithms by at least one order of magnitude.
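For concreteness, the sketch below implements the standard P-Rank fixed-point recurrence (similarity propagated over both in- and out-neighbours, balanced by a weight lambda and damped by a decay C), i.e. the naive iteration that the paper's quadratic- and linear-time algorithms improve upon. The parameter values and the toy graph are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the standard P-Rank recurrence (naive per-pair
# iteration, not the paper's optimized algorithms).  Similarity is
# propagated over both in- and out-neighbours, balanced by lambda_
# and damped by the decay factor C.
from collections import defaultdict

def p_rank(edges, nodes, lambda_=0.5, C=0.8, iters=10):
    in_nb, out_nb = defaultdict(list), defaultdict(list)
    for u, v in edges:
        out_nb[u].append(v)
        in_nb[v].append(u)

    # Initialise: each node is only similar to itself.
    s = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}

    for _ in range(iters):
        nxt = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    nxt[(a, b)] = 1.0
                    continue
                in_part = out_part = 0.0
                if in_nb[a] and in_nb[b]:
                    in_part = sum(s[(i, j)] for i in in_nb[a] for j in in_nb[b])
                    in_part *= C / (len(in_nb[a]) * len(in_nb[b]))
                if out_nb[a] and out_nb[b]:
                    out_part = sum(s[(i, j)] for i in out_nb[a] for j in out_nb[b])
                    out_part *= C / (len(out_nb[a]) * len(out_nb[b]))
                nxt[(a, b)] = lambda_ * in_part + (1 - lambda_) * out_part
        s = nxt
    return s

# Toy usage on a small directed graph.
nodes = ["a", "b", "c", "d"]
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d")]
sim = p_rank(edges, nodes)
print(round(sim[("a", "b")], 4))
```

Each iteration of this naive form touches every node pair and every neighbour pair, which is exactly the cost profile motivating the low-rank and sampling-based alternatives described in the abstract.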
Word Sense Disambiguation for Exploiting Hierarchical Thesauri in Text Classification
The introduction of hierarchical thesauri (HT) that contain significant semantic information has led researchers to investigate their potential for improving performance on the text classification task, extending the traditional “bag of words” representation by incorporating syntactic and semantic relationships among words. In this paper we address this problem by proposing a Word Sense Disambiguation (WSD) approach based on the intuition that word proximity in the document implies proximity in the HT graph as well. We argue that the high precision exhibited by our WSD algorithm on various manually disambiguated benchmark datasets makes it appropriate for the classification task. Moreover, we define a semantic kernel, based on the general concept of GVSM kernels, that captures the semantic relations contained in the hierarchical thesaurus. Finally, we conduct experiments on various corpora, achieving a systematic improvement in classification accuracy with the SVM algorithm, especially when the training set is small.
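As a rough illustration of a GVSM-style semantic kernel (not the paper's exact construction), the sketch below projects bag-of-words vectors through an assumed term-to-concept matrix so that synonymous terms contribute to document similarity; the matrix P, the vocabulary, and the toy vectors are all hypothetical.

```python
# Generic GVSM-style semantic kernel: documents are BOW vectors, and a
# term-to-concept matrix P (a toy stand-in for thesaurus-derived
# relations) projects them into a shared concept space before the dot
# product.  Illustrative sketch only.
import numpy as np

def semantic_kernel(d1, d2, P):
    # K(d1, d2) = d1 P P^T d2 : similarity measured in concept space.
    return d1 @ P @ P.T @ d2

# Toy data: 5 terms, 3 thesaurus concepts.
P = np.array([[1, 0, 0],   # term 0 maps to concept 0
              [1, 0, 0],   # term 1 is a synonym of term 0
              [0, 1, 0],
              [0, 1, 1],   # term 3 relates to two concepts
              [0, 0, 1]], dtype=float)

d1 = np.array([1, 0, 0, 0, 0], dtype=float)  # document uses term 0
d2 = np.array([0, 1, 0, 0, 0], dtype=float)  # document uses its synonym

print(semantic_kernel(d1, d2, P))  # > 0 although the BOW overlap is zero
```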
ESG Reporting Quality Assessment in Listed Companies of Maritime Sector
Regulatory obligations and market trends connected to environmental sustainability have lately intensified their effect on the shipping industry. New standards are continuously being established, such as the IMO's 2050 aim of lowering greenhouse gas emissions by 50% compared to 2008 levels. These rules have an impact on capital markets and investor decisions about how to fund the maritime transport sector. The standards now include Environmental, Social, and Governance (ESG) components. These components are concerned not only with the environmental impact of shipping, but also with the social and governance dimensions of those firms that are typically associated with maritime transport risks, such as accidents, ship reservations, pollution issues, and so on. Considering the peculiarities of the maritime transport sector, our previous research resulted in the development of a unified ESG reporting framework customized to shipping. To do this, the authors evaluated shipping-related ESG reports and extracted essential ESG variables and methodological frameworks from them. The present study conducts a quality assessment of existing ESG reporting across various sectors of maritime transport companies on a large sample of firms listed on major stock exchanges, while also identifying the level of compliance and areas for improvement. Based on a comprehensive methodological framework for reporting and assessing ESG for shipping, the research delivers relevant and robust information to aid management decision making and to give stakeholders and debtholders insight into firms' sustainability.
Supply driven mortgage choice
Variable mortgage contracts dominate the UK mortgage market (Miles, 2004). The dominance of variable rate mortgage contracts has important consequences for the transmission mechanism of monetary policy decisions and for systemic risk (Khandani et al., 2012; Fuster and Vickery, 2013). This raises an obvious concern that a mortgage market such as that in the UK, where the major proportion of mortgage debt is either at a variable rate or fixed for less than two years (Badarinza et al., 2013; CML, 2012), is vulnerable to changes in the interest rate regime. Theoretically, mortgage choice is determined by demand and supply factors. So far, most of the existing literature has focused on the demand side, and consideration of supply side factors in empirical investigations of mortgage choice decisions remains limited. This paper uniquely explores whether supply side factors may partially explain observed/ex-post mortgage type decisions. Empirical results indicate that lenders' profit motives and mortgage funding/pricing issues may have contributed to the preference for variable rate contracts. Securitisation is found to impact positively upon gross mortgage lending volumes while impacting negatively upon the share of variable lending flows. This shows that an increase in securitisation not only improves liquidity in the supply of mortgage funds, but also has the potential to shift mortgage choices toward fixed mortgage debt. The policy implications may involve a number of measures, including reconsidering the capital requirements for fixed as opposed to variable rate mortgage debt, growing securitisation, and optimising mortgage pricing policies.
A Knowledge-Based Semantic Kernel for Text Classification
Typically, in textual document classification the documents are represented in the vector space using the “Bag of Words” (BOW) approach. Despite its ease of use, the BOW representation cannot handle word synonymy and polysemy and does not consider semantic relatedness between words. In this paper, we overcome the shortcomings of the BOW approach by embedding a known WordNet-based semantic relatedness measure for pairs of words, namely Omiotis, into a semantic kernel. The suggested measure incorporates the TF-IDF weighting scheme, thus creating a semantic kernel which combines both semantic and statistical information from text. Empirical evaluation on real data sets demonstrates that our approach achieves improved classification accuracy with respect to the standard BOW representation when Omiotis is embedded in four different classifiers.
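The sketch below shows, under stated assumptions, how a relatedness-based kernel of this general shape can be wired up: TF-IDF vectors compared through a pairwise word-relatedness matrix. The relatedness scores are hand-set placeholders standing in for Omiotis values (which this sketch does not compute), and scikit-learn's TfidfVectorizer is assumed to be available.

```python
# Illustrative relatedness-based semantic kernel: TF-IDF document
# vectors are compared through a pairwise word-relatedness matrix R.
# K(d1, d2) = d1^T R d2.  R is filled with hand-set placeholder scores,
# not real Omiotis values.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the car drives on the road", "the automobile moves on the street"]
vec = TfidfVectorizer()
X = vec.fit_transform(docs).toarray()
vocab = list(vec.get_feature_names_out())
n = len(vocab)

# Hypothetical relatedness matrix: identity plus a few hand-set scores
# (a real system would fill R from WordNet-based Omiotis scores).
R = np.eye(n)
def relate(w1, w2, score):
    i, j = vocab.index(w1), vocab.index(w2)
    R[i, j] = R[j, i] = score

relate("car", "automobile", 0.9)
relate("road", "street", 0.8)

K = X[0] @ R @ X[1]          # semantic kernel value
print(round(float(K), 4))    # larger than the plain dot product X[0] @ X[1]
```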
Results of the seventh edition of the BioASQ Challenge
The results of the seventh edition of the BioASQ challenge are presented in
this paper. The aim of the BioASQ challenge is the promotion of systems and
methodologies through the organization of a challenge on the tasks of
large-scale biomedical semantic indexing and question answering. In total, 30
teams with more than 100 systems participated in the challenge this year. As in
previous years, the best systems were able to outperform the strong baselines.
This suggests that state-of-the-art systems are continuously improving, pushing
the frontier of research.Comment: 17 pages, 2 figure
MeSHLabeler and DeepMeSH: Recent Progress in Large-Scale MeSH Indexing
The US National Library of Medicine (NLM) uses the Medical Subject Headings (MeSH) to index almost all 24 million citations in MEDLINE, which greatly facilitates biomedical information retrieval and text mining. Large-scale automatic MeSH indexing has two challenging aspects: the MeSH side and the citation side. On the MeSH side, each citation is annotated with only 12 (on average) out of all 28,000 MeSH terms. On the citation side, all existing methods, including the Medical Text Indexer (MTI) by NLM, treat text as a bag of words, which cannot capture semantic and context-dependent information well. To address these two challenges, we developed MeSHLabeler and DeepMeSH. By utilizing the “learning to rank” (LTR) framework, MeSHLabeler integrates multiple types of information to address the challenge on the MeSH side, while DeepMeSH integrates deep semantic representations to address the challenge on the citation side. MeSHLabeler achieved first place in both the BioASQ2 and BioASQ3 challenges, and DeepMeSH achieved first place in both the BioASQ4 and BioASQ5 challenges. DeepMeSH is available at http://datamining-iip.fudan.edu.cn/deepmesh.
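To make the learning-to-rank idea concrete, the sketch below is a generic pointwise LTR combination of several hypothetical evidence scores per (citation, candidate MeSH term) pair, ranked by a scikit-learn logistic regression model; it is not MeSHLabeler itself, and all feature names, scores, and candidate terms are illustrative.

```python
# Generic pointwise "learning to rank" sketch (not MeSHLabeler): each
# (citation, candidate MeSH term) pair gets a feature vector of evidence
# scores from several predictors, a model learns to score the pair, and
# candidate terms are ranked per citation by predicted relevance.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical evidence features per candidate term, e.g.
# [k-NN neighbour score, string-match score, binary-classifier score].
X_train = np.array([
    [0.9, 0.8, 0.7],   # candidate the indexers did assign
    [0.1, 0.2, 0.0],   # candidate they did not
    [0.8, 0.1, 0.9],
    [0.2, 0.3, 0.1],
])
y_train = np.array([1, 0, 1, 0])   # 1 = term was assigned

ranker = LogisticRegression().fit(X_train, y_train)

# Rank candidate terms for one new citation.
candidates = ["Humans", "Neoplasms", "Algorithms"]
X_new = np.array([[0.7, 0.6, 0.8],
                  [0.2, 0.1, 0.3],
                  [0.6, 0.9, 0.4]])
scores = ranker.predict_proba(X_new)[:, 1]
print(sorted(zip(candidates, scores), key=lambda t: -t[1]))
```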