
    Essays on Corporate Disclosure of Value Creation

    Information on a firm’s business model helps investors understand an entity’s resource requirements, priorities for action, and prospects (FASB, 2001, pp. 14-15; IASB, 2010, p. 12). Disclosures of strategy and business model (SBM) are therefore considered a central element of effective annual report commentary (Guillaume, 2018; IIRC, 2011). Applying natural language processing techniques, I explore what SBM disclosures look like when management are pressed to say something, analyse the determinants of cross-sectional variation in SBM reporting properties, and assess whether and how managers respond to regulatory interventions seeking to promote SBM annual report commentary. This dissertation contains three main chapters. Chapter 2 presents a systematic review of the academic literature on non-financial reporting and the emerging literature on SBM reporting; here, I also introduce my institutional setting. Chapters 3 and 4 form the empirical sections of the thesis. In Chapter 3, I construct the first large-sample corpus of SBM annual report commentary and provide the first systematic analysis of the properties of such disclosures. My topic modelling analysis rejects the hypothesis that such disclosure is merely padding: themes align with popular strategy frameworks, and management tailor the mix of SBM topics to reflect their unique approach to value creation. However, SBM commentary is less specific, less precise about time horizon (short- and long-term), and less balanced (more positive) in tone than general management commentary. My findings suggest that symbolic compliance and legitimisation characterise the typical annual report discussion of SBM. Further analysis identifies proprietary cost considerations and obfuscation incentives as key determinants of symbolic reporting. In Chapter 4, I seek evidence on how managers adapt the properties of disclosure in response to regulatory mandates and investigate whether the form of the mandate matters. Using a difference-in-differences research design, my results suggest a modest incremental response by treatment firms to the introduction of a comply-or-explain provision to provide disclosure on strategy and business model. In contrast, I find a substantial response when the same requirements are enacted in law. My analysis provides clear and consistent evidence that treatment firms incrementally increase the volume of SBM disclosure, improve coverage across a broad range of topics, and provide commentary with a greater focus on the long term. My results point to substantial changes in SBM reporting properties following regulatory mandates, but the form of the mandate does matter. Overall, this dissertation contributes to the accounting literature by examining how firms discuss, in their annual reports, a topic central to economic decision making, and how firms respond to different forms of disclosure mandate. The results are also likely to be of value to regulators and policymakers currently reviewing or considering mandatory disclosure requirements. By examining how companies adapt their reporting to different types of regulation, this study provides an empirical basis for recalibrating SBM disclosure mandates, thereby enhancing the information set of capital market participants and promoting stakeholder engagement in a landscape increasingly shaped by non-financial information.
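    As a rough illustration of the kind of topic-modelling analysis the thesis describes (not the author's actual pipeline), the sketch below fits a small LDA model over stand-in disclosure snippets; the documents, topic count, and preprocessing choices are all hypothetical assumptions.

```python
# Hypothetical sketch: LDA topic modelling over strategy/business-model (SBM)
# disclosure text. Documents and parameters are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "our strategy focuses on long term value creation for shareholders",
    "the business model relies on a diversified supply chain and brand strength",
]  # in practice: one string per firm-year SBM disclosure

# Bag-of-words representation of the disclosure snippets
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(documents)

# Fit an LDA model; the number of topics would normally be tuned
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(dtm)  # per-document topic mixture

# Top words per topic, to label themes against strategy frameworks
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {top}")
```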

    Fuzzy Natural Logic in IFSA-EUSFLAT 2021

    The present book contains five papers accepted and published in the Special Issue “Fuzzy Natural Logic in IFSA-EUSFLAT 2021” of the journal Mathematics (MDPI). These papers are extended versions of contributions presented at “The 19th World Congress of the International Fuzzy Systems Association and the 12th Conference of the European Society for Fuzzy Logic and Technology, held jointly with the AGOP, IJCRS, and FQAS conferences”, which took place in Bratislava (Slovakia) from September 19 to September 24, 2021. Fuzzy Natural Logic (FNL) is a system of mathematical fuzzy logic theories that enables us to model natural language terms and rules while accounting for their inherent vagueness, and to reason and argue using the tools developed within these theories. FNL includes, among others, the theory of evaluative linguistic expressions (e.g., small, very large), the theory of fuzzy and intermediate quantifiers (e.g., most, few, many), and the theory of fuzzy/linguistic IF–THEN rules and logical inference. The papers in this Special Issue use the various aspects and concepts of FNL mentioned above and apply them to a wide range of problems, both theoretical and practical. This book will be of interest to researchers working in the areas of fuzzy logic, applied linguistics, generalized quantifiers, and their applications.
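    The formal FNL theories are considerably more involved; purely as an informal illustration of evaluative expressions and fuzzy IF–THEN rules, the toy sketch below uses simple trapezoidal membership functions on a normalised scale. The shapes, parameters, and the min-based rule evaluation are arbitrary assumptions, not FNL's definitions.

```python
# Toy sketch of evaluative expressions "small", "medium", "big" over [0, 1]
# and a simple fuzzy IF-THEN rule; parameters are illustrative only.
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], plateaus on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

small  = lambda x: trapezoid(x, -0.01, 0.0, 0.2, 0.4)
medium = lambda x: trapezoid(x, 0.2, 0.4, 0.6, 0.8)
big    = lambda x: trapezoid(x, 0.6, 0.8, 1.0, 1.01)

# Rule "IF x is small THEN y is big", evaluated with a min conjunction
x_val, y_val = 0.1, 0.9
rule_degree = min(small(x_val), big(y_val))
print(small(x_val), medium(x_val), big(x_val), rule_degree)
```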

    Using Mahalanobis Distance to Filter Erroneous Vowel Features in Less-Resourced Languages: Application to Quebec French

    A major challenge in building shareable datasets for phonetic studies is maximising data collection while minimising the errors involved, particularly when automatic processes come into play (e.g. labeling or formant detection). Automatically extracting formants from large amounts of speech frequently produces artifacts due to formant jumps, alignment problems, or noise. One solution is to use formant range filters; however, this requires prior knowledge about the vowels, such as formant range information. Here we propose to use the Mahalanobis distance to remove erroneous values, relying only on the labeled speech data. Our study is conducted on a Quebec French corpus including more than 170,000 tokens of 16 vowel types. Results show that the proposed method can complement the threshold-based filter approach. Furthermore, it can be used autonomously for undocumented languages to eliminate erroneous values. The approach also makes it easy to adjust the degree of filtering.
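    A minimal sketch of the general idea (not the authors' implementation): for each vowel category, compute the Mahalanobis distance of every token's formant measurements from that category's own distribution and flag extreme tokens. The chi-square cutoff and the synthetic data below are illustrative assumptions.

```python
# Illustrative per-vowel Mahalanobis outlier filter for F1/F2 measurements.
import numpy as np
from scipy.spatial.distance import mahalanobis
from scipy.stats import chi2

def filter_vowel_tokens(formants: np.ndarray, alpha: float = 0.01) -> np.ndarray:
    """formants: (n_tokens, 2) array of F1/F2 values for ONE vowel type.
    Returns a boolean mask: True = kept, False = flagged as erroneous."""
    mu = formants.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(formants, rowvar=False))
    d2 = np.array([mahalanobis(x, mu, cov_inv) ** 2 for x in formants])
    # Under approximate normality, squared distances follow a chi-square
    # distribution with 2 degrees of freedom (F1, F2).
    threshold = chi2.ppf(1 - alpha, df=2)
    return d2 <= threshold

# Example with synthetic /a/-like tokens
tokens = np.random.default_rng(0).normal([700, 1300], [80, 120], size=(200, 2))
mask = filter_vowel_tokens(tokens)
print(f"kept {mask.sum()} of {len(mask)} tokens")
```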

    Discourse Production in Czech Speakers with Aphasia: An Exploration Using Usage-Based Linguistics

    Research in linguistic aphasiology has long been dominated by structuralist, rule-based approaches to the study of language. However, recent work has shown that analyses based in constructivist, usage-based frameworks can explain patterns of language processing in aphasia that are difficult to accommodate in structuralist models. The present work follows up on these findings and aims to provide additional evidence for the benefits of the usage-based model using data from Czech speakers with aphasia, Czech being considerably underrepresented in aphasia research. The aims of the study were threefold: to create a collection of samples of aphasic connected speech available to other researchers, to describe the patterns of aphasic discourse production in Czech, and, most importantly, to show the potential benefits of usage-based construction grammar for aphasia research. A corpus of the speech of eleven persons with fluent and non-fluent aphasia of varying degrees of severity was created. The corpus consists of more than 23,000 word positions produced by speakers with aphasia in tasks used to elicit conversational, narrative, descriptive, and procedural discourse. The corpus is lemmatized and morphologically tagged, and the transcripts are aligned with audio recordings. A smaller sample of the speech of three neurotypical speakers with comparable...

    LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

    Self-supervised learning (SSL) is at the origin of unprecedented improvements in many domains, including computer vision and natural language processing. Speech processing has benefited drastically from SSL, as most current domain-related tasks are now approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale corpora with up to 14,000 hours of heterogeneous speech; ten pre-trained SSL wav2vec 2.0 models, containing from 26 million to one billion learnable parameters, shared with the community; and an evaluation protocol made of six downstream tasks to complement existing benchmarks. LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech, investigating frozen versus fine-tuned downstream models and task-agnostic versus task-specific pre-trained models, as well as discussing the carbon footprint of large-scale model training. Comment: Under submission at Computer Science and Language. Preprint allowed.
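    For orientation, a hedged sketch of how frozen representations might be extracted from one of the released French wav2vec 2.0 checkpoints via the Hugging Face transformers library; the model identifier below is an assumption, so the LeBenchmark project page should be checked for the exact released names.

```python
# Sketch: frozen SSL feature extraction with a (assumed) LeBenchmark checkpoint.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

model_id = "LeBenchmark/wav2vec2-FR-7K-large"  # assumed checkpoint name
extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2Model.from_pretrained(model_id).eval()

# One second of dummy 16 kHz audio; replace with real French speech
waveform = torch.zeros(16000)
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():  # frozen setting: no fine-tuning of the SSL encoder
    hidden = model(**inputs).last_hidden_state  # (1, frames, hidden_size)
print(hidden.shape)
```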

    A Tale of Two Approaches: Comparing Top-Down and Bottom-Up Strategies for Analyzing and Visualizing High-Dimensional Data

    The proliferation of high-throughput and sensory technologies in various fields has led to a considerable increase in data volume, complexity, and diversity. Traditional data storage, analysis, and visualization methods are struggling to keep pace with the growth of modern data sets, necessitating innovative approaches to managing, analyzing, and visualizing data across various disciplines. One such approach is utilizing novel storage media, such as deoxyribonucleic acid (DNA), which presents an efficient, stable, compact, and energy-saving storage option. Researchers are exploring the potential use of DNA as a medium for long-term storage of significant cultural and scientific materials. In addition to novel storage media, scientists are also focusing on developing new techniques that can integrate multiple data modalities and leverage machine learning algorithms to identify complex relationships and patterns in vast data sets. These newly developed data management and analysis approaches have the potential to unlock previously unknown insights into various phenomena and to facilitate more effective translation of basic research findings to practical and clinical applications. Addressing these challenges requires different problem-solving viewpoints. Top-down and bottom-up approaches are essential techniques that offer valuable perspectives for managing, analyzing, and visualizing complex, high-dimensional, multi-modal data sets. This cumulative dissertation explores the challenges associated with handling such data and highlights top-down, bottom-up, and integrated approaches being developed to manage, analyze, and visualize it. The work is organized in two parts, each reflecting one of the two problem-solving approaches and its use in the published studies. The work showcases the importance of understanding both approaches, of reasoning about a problem within each of them, and of their concretization and application in various domains.

    Clinical text classification in Cancer Real-World Data in Spanish

    Healthcare systems currently store large amounts of clinical data, mostly unstructured textual information such as electronic health records (EHRs). Manually extracting valuable information from these documents is costly for healthcare professionals. For example, when a patient first arrives at an oncology clinical analysis unit, clinical staff must extract information about the type of neoplasm in order to assign the appropriate clinical specialist. Automating this task is equivalent to text classification in natural language processing (NLP). In this study, we attempt to extract the neoplasm type by processing Spanish clinical documents. A private corpus of 23,704 real clinical cases was processed to extract the three most common types of neoplasms in Spain: breast, lung, and colorectal neoplasms. We developed methodologies based on state-of-the-art text classification: strategies based on machine learning with bag-of-words features, on embedding models in a supervised task, and on bidirectional recurrent neural networks with convolutional layers (C-BiRNN). The results show that NLP methods are extremely helpful for the task of neoplasm type extraction. In particular, the 2-BiGRU model with a convolutional layer and pre-trained fastText embeddings obtained the best performance, with a macro-average (more representative than the micro-average given the unbalanced data) of 0.981 for precision, 0.984 for recall, and 0.982 for F1-score. The authors acknowledge support from the Ministerio de Ciencia e Innovación (MICINN) under project PID2020-116898RB-I00, from Universidad de Málaga and Junta de Andalucía through grants UMA20-FEDERJA-045 and PYC20-046-UMA (all including FEDER funds), and from the Malaga-Pfizer consortium for AI research in Cancer (MAPIC).
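    The study's corpus is private and its models are not reproduced here; as an illustrative stand-in, the sketch below shows a simple bag-of-words baseline for neoplasm-type classification with macro-averaged metrics, which weight the three classes equally despite class imbalance. The toy texts and labels are invented.

```python
# Toy bag-of-words baseline with macro-averaged evaluation (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.pipeline import make_pipeline

# Stand-ins for Spanish clinical notes; real documents are EHR texts
texts = ["carcinoma ductal de mama izquierda", "adenocarcinoma de pulmón",
         "neoplasia de colon con metástasis", "tumor de mama con receptores"]
labels = ["breast", "lung", "colorectal", "breast"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

pred = clf.predict(texts)  # in practice: predict on a held-out test split
p, r, f1, _ = precision_recall_fscore_support(labels, pred, average="macro")
print(f"macro precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```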

    PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts

    The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptBench, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. This study uses a wide range of adversarial textual attacks targeting prompts at multiple levels: character, word, sentence, and semantic. The attacked prompts are then employed in diverse tasks such as sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. Our study generates 4,032 adversarial prompts, meticulously evaluated over 8 tasks and 13 datasets, with 567,084 test samples in total. Our findings demonstrate that contemporary LLMs are vulnerable to adversarial prompts. Furthermore, we present a comprehensive analysis to understand the mystery behind prompt robustness and its transferability, and offer robustness analysis and pragmatic recommendations for prompt composition, beneficial to both researchers and everyday users. We make our code, prompts, and methodology for generating adversarial prompts publicly accessible, thereby enabling and encouraging collaborative exploration in this pivotal field: https://github.com/microsoft/promptbench. Comment: Technical report; 23 pages; code is at https://github.com/microsoft/promptbench.
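    This is not the PromptBench implementation (its actual attacks live in the linked repository); purely to illustrate what a character-level adversarial prompt looks like, the toy sketch below swaps adjacent characters inside a few words of a clean task prompt.

```python
# Toy character-level prompt perturbation (not the PromptBench attack suite).
import random

def perturb_prompt(prompt: str, n_swaps: int = 2, seed: int = 0) -> str:
    """Swap adjacent characters inside a few randomly chosen words."""
    rng = random.Random(seed)
    words = prompt.split()
    for idx in rng.sample(range(len(words)), k=min(n_swaps, len(words))):
        w = words[idx]
        if len(w) > 3:
            i = rng.randrange(1, len(w) - 2)
            w = w[:i] + w[i + 1] + w[i] + w[i + 2:]
        words[idx] = w
    return " ".join(words)

clean = "Classify the sentiment of the following review as positive or negative."
print(perturb_prompt(clean))
# The perturbed prompt is fed to the LLM and downstream-task accuracy
# (e.g. sentiment analysis) is compared against the clean prompt.
```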

    Unsupervised Semantic Variation Prediction using the Distribution of Sibling Embeddings

    Languages are dynamic entities, and the meanings associated with words constantly change over time. Detecting the semantic variation of words is an important task for NLP applications that must make time-sensitive predictions. Existing work on semantic variation prediction has predominantly focused on comparing some form of averaged contextualised representation of a target word computed from a given corpus. However, some of the previously associated meanings of a target word can become obsolete over time (e.g. the meaning of gay as happy), while novel usages of existing words emerge (e.g. the meaning of cell as a mobile phone). We argue that mean representations alone cannot accurately capture such semantic variations and propose a method that uses the entire cohort of contextualised embeddings of the target word, which we refer to as the sibling distribution. Experimental results on the SemEval-2020 Task 1 benchmark dataset for semantic variation prediction show that our method outperforms prior work that considers only the mean embeddings, and is comparable to the current state of the art. Moreover, a qualitative analysis shows that our method detects important semantic changes in words that are not captured by existing methods.
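    A hedged sketch of the underlying idea (not the authors' exact method or model choice): collect the "sibling" contextualised embeddings of a target word from two corpora and compare properties of the two distributions rather than only their means. The model, example sentences, and comparison statistics below are assumptions for illustration.

```python
# Collect per-occurrence contextual embeddings of a target word and compare
# distribution-level statistics across two corpora (illustrative only).
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def sibling_embeddings(sentences, target):
    """Return one contextual vector per occurrence of `target`."""
    vecs = []
    tgt_id = tokenizer.convert_tokens_to_ids(target)
    for s in sentences:
        enc = tokenizer(s, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]  # (tokens, dim)
        for pos, tok in enumerate(enc["input_ids"][0].tolist()):
            if tok == tgt_id:
                vecs.append(hidden[pos].numpy())
    return np.stack(vecs)

old = sibling_embeddings(["the cell divides under the microscope"], "cell")
new = sibling_embeddings(["she answered her cell on the train"], "cell")

# A distribution-aware comparison can contrast means AND spreads, e.g.:
mean_shift = np.linalg.norm(old.mean(0) - new.mean(0))
spread_shift = abs(old.std() - new.std())
print(mean_shift, spread_shift)
```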