6 research outputs found

    LeMe-PT: A Medical Package Leaflet Corpus for Portuguese

    The current trend in natural language processing is the use of machine learning. This is being done in every field, from summarization to machine translation. For these techniques to be applied, resources are needed, namely quality corpora. While there are large quantities of corpora for the Portuguese language, there is a lack of technical and focused corpora. Therefore, in this article we present a new corpus, built from drug package leaflets. We describe its structure and contents, and discuss possible exploration directions.

    The RareDis corpus: A corpus annotated with rare diseases, their signs and symptoms

    Rare diseases affect a small number of people compared to the general population. However, more than 6,000 different rare diseases exist and, in total, they affect more than 300 million people worldwide. A central problem shared by rare diseases is the delay in diagnosis and the sparse information available for researchers, clinicians, and patients. Finding a diagnosis can be a very long and frustrating experience for patients and their families. The average diagnostic delay is between 6 and 8 years. Many of these diseases manifest differently among patients, which further hampers their detection and the choice of the correct treatment. Therefore, there is an urgent need to increase the scientific and medical knowledge about rare diseases. Natural Language Processing (NLP) can help to extract relevant information about rare diseases to facilitate their diagnosis and treatment, but most NLP techniques require manually annotated corpora. Therefore, our goal is to create a gold standard corpus annotated with rare diseases and their clinical manifestations. It could be used to train and test NLP approaches, and the information extracted through NLP could enrich the knowledge of rare diseases and thereby help to reduce the diagnostic delay and improve the treatment of rare diseases. The paper describes the selection of 1,041 texts to be included in the corpus, the annotation process and the annotation guidelines. The entities (disease, rare disease, symptom, sign and anaphor) and the relationships (produces, is a, is acron, is synon, increases risk of, anaphora) were annotated. The RareDis corpus contains annotations of more than 5,000 rare diseases and almost 6,000 clinical manifestations. Moreover, the Inter Annotator Agreement evaluation shows a relatively high agreement (F1-measure equal to 83.5% under exact match criteria for the entities and equal to 81.3% for the relations). 
Based on these results, this corpus is of high quality and represents a significant step for the field, since there is a scarcity of available corpora annotated with rare diseases. This could open the door to further NLP applications, which would facilitate the diagnosis and treatment of these rare diseases and, therefore, would dramatically improve the quality of life of these patients. This work was supported by the Madrid Government (Comunidad de Madrid) under the Multiannual Agreement with UC3M in the line of "Fostering Young Doctors Research" (NLP4RARE-CM-UC3M) and in the context of the V PRICIT (Regional Programme of Research and Technological Innovation); the Multiannual Agreement with UC3M in the line of "Excellence of University Professors (EPUC3M17)"; and a grant from the Spanish Ministry of Economy and Competitiveness (SAF2017-86810-R).

    Computational Advances in Drug Safety: Systematic and Mapping Review of Knowledge Engineering Based Approaches

    Drug Safety (DS) is a domain with significant public health and social impact. Knowledge Engineering (KE) is the Computer Science discipline elaborating on methods and tools for developing “knowledge-intensive” systems, depending on a conceptual “knowledge” schema and some kind of “reasoning” process. The present systematic and mapping review aims to investigate KE-based approaches employed for DS and highlight the introduced added value as well as trends and possible gaps in the domain. Journal articles published between 2006 and 2017 were retrieved from PubMed/MEDLINE and Web of Science (873 in total) and filtered based on a comprehensive set of inclusion/exclusion criteria. The 80 finally selected articles were reviewed on full-text, while the mapping process relied on a set of concrete criteria (concerning specific KE and DS core activities, special DS topics, employed data sources, reference ontologies/terminologies, and computational methods, etc.). The analysis results are publicly available as online interactive analytics graphs. The review clearly depicted increased use of KE approaches for DS. The collected data illustrate the use of KE for various DS aspects, such as Adverse Drug Event (ADE) information collection, detection, and assessment. Moreover, the quantified analysis of using KE for the respective DS core activities highlighted room for intensifying research on KE for ADE monitoring, prevention and reporting. Finally, the assessed use of the various data sources for DS special topics demonstrated extensive use of dominant data sources for DS surveillance, i.e., Spontaneous Reporting Systems, but also increasing interest in the use of emerging data sources, e.g., observational healthcare databases, biochemical/genetic databases, and social media. 
Various exemplar applications were identified with promising results, e.g., improvement in Adverse Drug Reaction (ADR) prediction, detection of drug interactions, and novel ADE profiles related to specific mechanisms of action. Nevertheless, since the reviewed studies mostly concerned proof-of-concept implementations, more intense research is required to increase the maturity level that is necessary for KE approaches to reach routine DS practice. In conclusion, we argue that efficiently addressing DS data analytics and management challenges requires the introduction of high-throughput KE-based methods for effective knowledge discovery and management, ultimately resulting in the establishment of a continuous learning DS system.

    Essays in Mutual Fund Research

    In my doctoral thesis, I demonstrate i) how the demand and supply side respond to the (first time) availability of product information for mutual funds and ii) how actions and personal characteristics of portfolio managers impact investors and fund management. Essays (1) and (2) extend the scarce evidence on the utility of investor information disclosure by means of a comprehensive investigation into the disclosure practices of the mutual fund industry. Using product information with different degrees of salience and obligation, ranging from comprehensive mandatory pre-contractual product information to complementary fund characteristics only disclosed by selective players, the essays document the importance of thoroughly written and designed information. Specifically, on the demand side, I analyze i) whether retail investors can understand mutual fund product information and ii) if investors are able to benefit from novel disclosure initiatives. Moreover, on the supply side, I show if and to what extent mutual fund companies react to novel disclosure regulations. Essays (3) and (4) shift the focus towards the individuals in charge of managing retail investors’ money, i.e. the portfolio managers, analyzing the impact of incentive mechanisms and personality traits on fund management and investor behavior. The overarching contribution of my research is threefold. First, by addressing information salience and understandability, I shed light on retail investor limitations not explained by the classical efficient market framework assuming investors to be fully rational utility-maximizing decision-makers (e.g., Fama 1970). Thus, my research adds to the rich behavioral finance literature dealing with cognitive capacity and information processing constraints (e.g., Kozup et al. 2012, Agnew and Szykman 2005). 
Second, by analyzing investor behavior from an objective point of view, I contribute to the understanding of determinants which affect flows of mutual fund investors (e.g., Sirri and Tufano 1998, Barber et al. 2005). Third, methodologically, my research adds to the quantification of qualitative data in the finance domain (e.g., Loughran and McDonald 2016, 2019) by applying advanced textual analytics (essays (1), (3) and (4)), which allows large samples of written (verbal) information to be investigated. How do policy makers help consumers make sound investment decisions? Regulations which require disclosure of information are among the most ubiquitous interventions in investor protection. The popularity of mandatory information disclosure follows standard economic theory, which suggests that disclosure can help avoid instances of market failure in situations characterized by asymmetric information and a risk of misaligned incentives (e.g., Akerlof 1970, Ross 1973). However, although mandatory information disclosure is broadly advocated as an appropriate policy measure, there is a paucity of data supporting its efficiency. For example, individuals’ information processing abilities have been shown to be limited and, thus, the increasing extent of mandatory information likely leads to an ‘information overload’, where the marginal utility of information for the decision-maker becomes negative (e.g., Eppler and Mengis 2004). In my dissertation, I focus on investor information disclosed by actively managed equity mutual funds, since holdings in this asset class represent by far the largest fraction of household investments: in 2017, worldwide retail assets under management by equity mutual funds totaled $21.8 trillion, with the large majority being actively managed (Investment Company Institute 2018). 
Moreover, disclosure requirements are pervasive for fund companies and the market is a prime candidate for unintended consequences of mandatory disclosure such as information overload: investors face a dizzying number of product options and each product carries a host of characteristics, which should be considered in order to make an informed decision, especially when investing in an actively managed mutual fund, which is tantamount to delegating the management of a securities portfolio. I investigate four types of investor information which regulatory authorities have qualified as decision-relevant when it comes to this delegation task. First and foremost, investors should understand the fund’s key features. For this to be the case, mandatory product information has to be easy to understand for the average investor (essay 1). The introduction of Key Investor Information Documents (KIIDs) for mutual funds in the European Union is the regulator’s response to the quest for a more comprehensible description of the essential product features, and we examine whether these documents live up to their purpose. Following Loughran and McDonald (2014), we assess the comprehensibility and regulatory compliance of KIIDs and thereby extend the scarce academic evidence on the importance of product information documents (e.g., Habschick et al. 2012, Oehler et al. 2014, Walther 2015). We use a comprehensive sample of roughly 38,000 product information documents for mutual funds from before and after the introduction of KIIDs to capture the regulation’s impact on the comprehensibility of fund information. We find that while mutual fund product information remains difficult to read, requiring on average 13 years of formal education from readers, textual readability significantly improved with the introduction of KIIDs. Furthermore, we show that the introduction of KIIDs translated into a ‘clearer’ writing style. 
By contrast, we detect that the relative usage of financial jargon increased in the new short-form disclosure document. Moreover, the improvement in readability and the significant reduction in length seem to be achieved at the expense of an appealing font. Only half of the KIIDs comply with regulators’ guidelines on font type and size. Taken together, we document mixed results on the regulations’ effectiveness in creating clear and comprehensible pre-contractual information that enables retail investors to read and understand those documents. Second, unlike index funds, actively managed funds sell the potential to beat their benchmark (usually a market index) and investors who select this type of mutual fund are typically looking for an opportunity to outperform the market index. However, actively managed funds usually charge significantly higher fees than passive funds (e.g., Morningstar 2018). This cost difference may be justified by the fund manager’s effort to manage the portfolio in a way which creates an opportunity to generate excess returns. Thus, assessing the fees charged by an actively managed fund in light of the actual level of activeness is a worthwhile screening exercise for investors: prior literature documents substantial underperformance for funds with low levels of activeness (e.g., Petajisto 2013, Cremers et al. 2016, Cremers and Pareek 2016). However, even though fund companies employ Active Share (AS), a metric that captures the degree to which a fund deviates from its benchmark, for a variety of purposes and provide AS information to institutional investors, they did not disclose it to retail investors and were not required to do so by regulators. The lack of equal access to AS information can be regarded as an information asymmetry, which prevents retail investors from fully evaluating the potential value proposition of an actively managed equity fund. 
Consequently, the New York Attorney General (NYOAG) revealed dubious index-hugging practices and unequal access to AS information for several of the largest US mutual funds and subsequently imposed disclosure of AS on them (NYOAG 2018). We make use of this unique intervention and thereby extend the few existing studies on funds’ activeness (essay 2). In particular, we are the first to demonstrate if and how individual investors react to AS information once they (can) learn about it. We find that retail investors strongly respond to the NYOAG intervention, but not in the way intended by the regulators. We document a significant increase in investor flows into funds of fund companies affected by the intervention. The effect is most pronounced in the days after the intervention became public. However, rather than ‘rationally’ re-allocating assets away from ‘high fee/low activeness’ and into truly actively managed funds, investors are subject to a media attention bias. Fund companies that are prominently covered in the press following the disclosure intervention experience high net inflows, irrespective of the degree of AS. These findings are hard to square with the notion that retail investors have understood the concept behind AS and rationally traded on this newly available information. On the supply side, we do not observe a change in portfolio management habits following the intervention. Even for funds with the lowest AS levels (i.e. arguably those funds with the highest pressure to act in an attempt to legitimate ‘active’ fees), we do not observe any measurable effort to increase AS post-intervention. In sum, our evaluation of the NYOAG intervention documents a number of unintended consequences and reveals substantial limits to the effectiveness of this disclosure initiative. Third, investors face ongoing uncertainty about the standard of care fund managers exercise when managing their savings and whether they act in their best interest. 
Following the rationale "(...) that a portfolio manager's ownership of a fund provides a direct indication of his or her alignment with the interests of shareholders in that fund" (SEC 2004, section II, part D), managers of US mutual funds are required to disclose the amount of their private investments in all funds they manage. However, information about the beneficial holdings of portfolio managers (their skin-in-the-game) is far from readily accessible for the average retail investor. Instead, managers’ private investments are disclosed in a supplementary fund information document that is only provided upon request and, at best, can be considered a secondary source for the average investor. Yet, interestingly, fund managers regularly use another medium to voluntarily disclose skin-in-the-game to their investors: the Letter to the Shareholder (LS). The LS is a non-mandatory, though commonly enclosed, component of the mutual fund's semi-annual or annual report. It is typically authored by the fund management, addresses the fund shareholders directly and thus constitutes a key element in communication with their shareholders (e.g., Hillert et al. 2016, Chu and Kim 2019). Unlike prior studies (e.g., Khorana et al. 2007, Ma et al. 2019, Evans 2008, Ibert 2018), which find that funds with managerial ownership yield higher risk-adjusted returns, I exploit verbal signaling of the managers in the LS to analyze aggregate investor fund flows applying advanced textual analytics (essay 3). With this, I contribute to prior research on the effects of fund manager skin-in-the-game by observing how retail investors respond to their managers’ signaling activities. I find that signaling of skin-in-the-game in the LS triggers substantial net inflows from retail investors. The effect is most sizeable in the days after investors receive the LS and persists over time. 
On the other hand, I show that retail investors’ asset allocation is unaltered by the actual amount invested by fund managers, information that the average retail investor is most probably unable (or unwilling) to find. Finally, I document that signaling of fund managers in the LS affects only retail investors. Professional investors, by contrast, regularly have access to licensed fund data providers and can potentially obtain valuable information on fund manager investments with ease. Fourth and lastly, we explore the consequences of a well-researched personality trait, narcissism, on fund managers’ portfolio management. In contrast to the ‘hard facts’ of a fund, such as past performance, cost or investment style, investors know little about their fund managers’ personality. Yet the literature on corporate managers (e.g., Chatterjee and Hambrick 2007, Kumar and Goyal 2015, Aktas et al. 2016) suggests that personality traits might also affect the job of fund managers. Applying text-mining techniques to verbatim fund manager interviews retrieved from The Wall Street Transcript, we find that narcissism is even more severe among professional fund managers than in the corporate context. We show that narcissistic fund managers are significantly more likely to deviate from their advertised investment style. Moreover, we document that while the realized performance of narcissistic fund managers is virtually identical to that of their non-narcissistic counterparts, they exhibit a worse risk-return profile. Furthermore, we identify that large funds, i.e. those associated with higher compensation and prestige in the business, are more often managed by narcissistic managers, which is in line with prior literature documenting ‘empire-building’ behavior of narcissists. Given our evidence pointing to a rather negative effect of narcissism on portfolio management, we would expect investors to refrain from investing with a narcissistic manager. 
However, we find that this is not the case. Most probably, investors do not know about the personal traits of their fund managers and consequently are unable to act upon this information. Taken together, the findings of my essays stress the importance of salient information disclosure in order for retail investors to arrive at a wise investment decision. The empirical evidence provided highlights certain shortcomings in current disclosure practices and regulations. Essay (1) indicates that summary product information accompanied by formatting and language guidelines is a first step in the right direction to ensure that product information for mutual funds is comprehensible to investors. However, we still detect linguistic barriers that potentially prevent investors from reading and understanding relevant product characteristics. Essay (2) provides insights on the effect of a non-standardized information disclosure intervention. As can be inferred from investors’ (non-) response to the availability of information on funds’ activeness, local interventions that address information asymmetries, and therefore should benefit retail investors’ decision making, prove almost inefficient when they do not require a standardized, comparable and well-thought-through information layout. Essay (3) supports this notion in documenting a prevalent mismatch between information availability and information usage. Finally, essay (4) points to the importance of personality traits. For retail investors it might be important to know more about the character of their fund managers given the evidence that personality traits, such as narcissism, affect day-to-day portfolio management. 
In sum, decision-relevant information for investors, whether the explanation of funds’ investment style in the prospectus (essay 1), funds’ ‘true’ degree of activeness (essay 2), an indication of managers’ private wealth investment (essay 3) or hints about the managers’ personality (essay 4), remains useless as long as the understandability, salience and transparency of disclosure stay low.

    Simplifying drug package leaflets written in Spanish by using word embedding

    Background: Drug Package Leaflets (DPLs) provide information for patients on how to safely use medicines. Pharmaceutical companies are responsible for producing these documents. However, several studies have shown that patients usually have problems in understanding sections describing posology (dosage quantity and prescription), contraindications and adverse drug reactions. The ultimate goal of this work is to provide an automatic approach that helps these companies to write drug package leaflets in easy-to-understand language. Natural language processing has become a powerful tool for improving patient care and advancing medicine because it makes it possible to automatically process the large amount of unstructured information needed for patient care. However, to the best of our knowledge, no research has been done on the automatic simplification of drug package leaflets. In previous work, we proposed using domain terminological resources to gather a set of synonyms for a given target term. A potential drawback of this approach is that it depends heavily on the existence of dictionaries; however, these are not available for every domain and language or, if they exist, their coverage is very limited. This work was supported by the Research Program of the Ministry of Economy and Competitiveness - Government of Spain (eGovernAbility-Access project TIN2014-52665-C2-2-R).