Essays on Corporate Disclosure of Value Creation
Information on a firm’s business model helps investors understand an entity’s resource requirements, priorities for action, and prospects (FASB, 2001, pp. 14-15; IASB, 2010, p. 12). Disclosures of strategy and business model (SBM) are therefore considered a central element of effective annual report commentary (Guillaume, 2018; IIRC, 2011). Applying natural language processing techniques, I explore what SBM disclosures look like when management are pressed to say something, analyse the determinants of cross-sectional variation in SBM reporting properties, and assess whether and how managers respond to regulatory interventions seeking to promote SBM annual report commentary. This dissertation contains three main chapters. Chapter 2 presents a systematic review of the academic literature on non-financial reporting and the emerging literature on SBM reporting; here, I also introduce my institutional setting. Chapters 3 and 4 form the empirical sections of this thesis. In Chapter 3, I construct the first large-sample corpus of SBM annual report commentary and provide the first systematic analysis of the properties of such disclosures. My topic modelling analysis rejects the hypothesis that such disclosure is merely padding: the themes identified align with popular strategy frameworks, and management tailor the mix of SBM topics to reflect their unique approach to value creation. However, SBM commentary is less specific, less precise about time horizon (short- and long-term), and less balanced (more positive) in tone than general management commentary. My findings suggest that symbolic compliance and legitimisation characterise the typical annual report discussion of SBM. Further analysis identifies proprietary cost considerations and obfuscation incentives as key determinants of symbolic reporting.
In Chapter 4, I seek evidence on how managers respond to regulatory mandates by adapting the properties of disclosure, and I investigate whether the form of the mandate matters. Using a difference-in-differences research design, my results suggest a modest incremental response by treatment firms to the introduction of a comply-or-explain provision requiring disclosure on strategy and business model. In contrast, I find a substantial response when the same requirements are enacted in law. My analysis provides clear and consistent evidence that treatment firms incrementally increase the volume of SBM disclosure, improve coverage across a broad range of topics, and provide commentary with a greater focus on the long term. My results point to substantial changes in SBM reporting properties following regulatory mandates, but the form of the mandate does matter. Overall, this dissertation contributes to the accounting literature by examining how firms discuss a topic central to economic decision making in annual reports and how they respond to different forms of disclosure mandate. Furthermore, the results of my analysis are likely to be of value to regulators and policymakers currently reviewing or considering disclosure mandates. By examining how companies adapt their reporting to different types of regulation, this study provides an empirical basis for recalibrating SBM disclosure mandates, thereby enhancing the information set of capital market participants and promoting stakeholder engagement in a landscape increasingly shaped by non-financial information.
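The 2x2 logic of a difference-in-differences design can be made concrete with a small numeric sketch; all figures below are invented for illustration, with the outcome imagined as SBM disclosure volume in thousands of words.

```python
# A toy 2x2 difference-in-differences sketch: the estimate is the treated
# group's pre-to-post change minus the control group's change, which nets
# out common time trends. All numbers are hypothetical.

def did_estimate(treat_pre, treat_post, control_pre, control_post):
    """Treated group's pre-to-post change minus the control group's change."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(treat_post) - mean(treat_pre)) - (mean(control_post) - mean(control_pre))

effect = did_estimate([2.0, 2.2, 1.8],   # treated firms, pre-mandate
                      [3.1, 3.3, 2.9],   # treated firms, post-mandate
                      [2.1, 1.9, 2.0],   # control firms, pre-mandate
                      [2.3, 2.1, 2.2])   # control firms, post-mandate
print(round(effect, 2))  # 0.9: the mandate's estimated incremental effect
```

Here the treated firms' outcome rises by 1.1 while the controls' rises by only 0.2, so 0.9 of the change is attributed to the mandate.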
Fuzzy Natural Logic in IFSA-EUSFLAT 2021
The present book contains five papers accepted and published in the Special Issue “Fuzzy Natural Logic in IFSA-EUSFLAT 2021” of the journal Mathematics (MDPI). These papers are extended versions of contributions presented at “The 19th World Congress of the International Fuzzy Systems Association and the 12th Conference of the European Society for Fuzzy Logic and Technology jointly with the AGOP, IJCRS, and FQAS conferences”, which took place in Bratislava (Slovakia) from September 19 to September 24, 2021. Fuzzy Natural Logic (FNL) is a system of mathematical fuzzy logic theories that enables us to model natural language terms and rules while accounting for their inherent vagueness, and allows us to reason and argue using the tools developed within them. FNL includes, among others, the theory of evaluative linguistic expressions (e.g., small, very large), the theory of fuzzy and intermediate quantifiers (e.g., most, few, many), and the theory of fuzzy/linguistic IF–THEN rules and logical inference. The papers in this Special Issue use the various aspects and concepts of FNL mentioned above and apply them to a wide range of problems, both theoretically and practically oriented. This book will be of interest to researchers working in the areas of fuzzy logic, applied linguistics, generalized quantifiers, and their applications.
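As a loose illustration of the graded truth values underlying evaluative expressions, the sketch below implements trapezoidal membership functions for two hypothetical predicates, "small" and "big", on a normalized scale; FNL's actual theory (horizon-based semantics, linguistic hedges, intermediate quantifiers) is far richer than this.

```python
# Illustrative only: trapezoidal membership functions for two hypothetical
# evaluative expressions ("small", "big") on a normalized [0, 1] scale,
# plus a graded conjunction via min. This is a generic fuzzy-set sketch,
# not FNL's own formalism.

def trapezoid(x, a, b, c, d):
    """Membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

small = lambda x: trapezoid(x, -0.125, 0.0, 0.25, 0.5)
big = lambda x: trapezoid(x, 0.5, 0.75, 1.0, 1.25)

print(small(0.1))    # 1.0: fully "small"
print(big(0.625))    # 0.5: halfway toward "big"
# Truth degree of the compound "x is small AND y is big" via min:
print(min(small(0.1), big(0.625)))  # 0.5
```

The point of the sketch is that truth comes in degrees: 0.625 is "big" only to degree 0.5, and a conjunction is only as true as its weakest conjunct.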
Using Mahalanobis Distance to Filter Erroneous Vowel Features in Less-Resourced Languages: Application to Quebec French
A major challenge in building shareable datasets for phonetic studies is maximising data collection while minimising the errors involved, particularly when automatic processes come into play (e.g., labeling, formant detection). Automatically extracting formants from large amounts of speech frequently produces artifacts due to formant jumps, alignment problems, or noise. One solution is to use formant range filters; however, this requires prior knowledge about the vowels, such as formant range information. Here we propose using the Mahalanobis distance to remove erroneous values, relying only on the labeled speech data. Our study is conducted on a Quebec French corpus of more than 170,000 tokens of 16 vowel types. Results show that the proposed method can complement the threshold-based filter approach. Furthermore, it can be used autonomously for undocumented languages to eliminate erroneous values, and the degree of filtering is easy to adjust.
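A minimal sketch of the idea (not the authors' exact pipeline): estimate a mean vector and covariance matrix from one vowel category's (F1, F2) tokens, then drop tokens whose Mahalanobis distance from the category mean exceeds a cutoff. The tokens below are synthetic.

```python
# Sketch of per-category outlier filtering with the Mahalanobis distance,
# using only the labeled data itself (no external formant-range table).
import numpy as np

def mahalanobis_filter(formants, threshold=3.0):
    """Keep rows of `formants` (n_tokens x n_features) whose Mahalanobis
    distance to the category mean is at most `threshold`; also return
    all distances."""
    X = np.asarray(formants, dtype=float)
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    d = np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))
    return X[d <= threshold], d

# Hypothetical /i/ tokens: (F1, F2) in Hz, plus one formant-jump artifact.
rng = np.random.default_rng(0)
tokens = np.vstack([rng.normal([300.0, 2300.0], [15.0, 60.0], size=(40, 2)),
                    [[1200.0, 900.0]]])
kept, dists = mahalanobis_filter(tokens, threshold=3.0)
print(int(np.argmax(dists)))    # 40: the artifact is the most distant token
print(len(kept) < len(tokens))  # True: it falls outside the cutoff
```

Raising or lowering `threshold` adjusts the degree of filtering, which is the tunability the abstract refers to.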
Discourse Production of Czech Speakers with Aphasia: An Exploration Using Usage-Based Linguistics
Research in linguistic aphasiology has been dominated by structuralist, rule-based approaches to the study of language. However, recent work has shown that analyses grounded in constructivist, usage-based frameworks can explain patterns of language processing in aphasia that are difficult to accommodate in structuralist models. The present work follows up on these findings and aims to provide additional evidence for the benefits of the usage-based model using data from Czech speakers with aphasia, a language understudied in this context. The aims of the study were threefold: to create a collection of samples of aphasic connected speech available to other researchers, to provide a description of the patterns of aphasic discourse production in Czech, and, most importantly, to show the potential benefits of usage-based construction grammar for aphasia research. A corpus of the speech of eleven persons with fluent and non-fluent aphasia of varying degrees of severity was created. The corpus consists of more than 23,000 word positions produced by speakers with aphasia in tasks used to elicit conversational, narrative, descriptive, and procedural discourse. The corpus is lemmatised and morphologically tagged, and the transcripts are aligned with audio recordings. The corpus also includes a smaller sample of the speech of three neurotypical speakers with comparable...
LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech
Self-supervised learning (SSL) is at the origin of unprecedented improvements in many domains, including computer vision and natural language processing. Speech processing has benefited drastically from SSL, as most current tasks in the field are now approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale, heterogeneous corpora with up to 14,000 hours of speech; ten pre-trained SSL wav2vec 2.0 models, containing from 26 million to one billion learnable parameters, shared with the community; and an evaluation protocol of six downstream tasks that complements existing benchmarks. LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech, with an investigation of frozen versus fine-tuned downstream models and task-agnostic versus task-specific pre-trained models, as well as a discussion of the carbon footprint of large-scale model training.

Comment: Under submission at Computer Science and Language. Preprint allowed.
A Tale of Two Approaches: Comparing Top-Down and Bottom-Up Strategies for Analyzing and Visualizing High-Dimensional Data
The proliferation of high-throughput and sensory technologies in various fields has led to a considerable increase in data volume, complexity, and diversity. Traditional data storage, analysis, and visualization methods are struggling to keep pace with the growth of modern data sets, necessitating innovative approaches to overcome the challenges of managing, analyzing, and visualizing data across various disciplines.
One such approach is utilizing novel storage media, such as deoxyribonucleic acid (DNA), which presents an efficient, stable, compact, and energy-saving storage option. Researchers are exploring the potential use of DNA as a medium for long-term storage of significant cultural and scientific materials.
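The core idea of DNA as a digital medium can be conveyed with a toy codec that maps two bits to each nucleotide; real DNA storage systems add biochemical constraints (GC content, homopolymer limits) and error correction that this sketch omits entirely.

```python
# Toy illustration: two bits per nucleotide lets arbitrary bytes be
# written as a DNA base string and read back. Not a realistic codec.

BASES = "ACGT"  # A=00, C=01, G=10, T=11

def encode(data: bytes) -> str:
    """Each byte becomes four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11]
                   for byte in data
                   for shift in (6, 4, 2, 0))

def decode(strand: str) -> bytes:
    """Inverse of encode: group bases in fours and reassemble bytes."""
    bits = [BASES.index(b) for b in strand]
    return bytes((bits[i] << 6) | (bits[i + 1] << 4) | (bits[i + 2] << 2) | bits[i + 3]
                 for i in range(0, len(bits), 4))

strand = encode(b"DNA")
print(strand)          # "CACACATGCAAC": four bases per byte
print(decode(strand))  # b'DNA'
```

At two bits per base, a single gram of DNA can in principle hold on the order of hundreds of petabytes, which is why the medium is attractive for archival storage.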
In addition to novel storage media, scientists are also focusing on developing new techniques that integrate multiple data modalities and leverage machine learning algorithms to identify complex relationships and patterns in vast data sets. These newly developed data management and analysis approaches have the potential to unlock previously unknown insights into various phenomena and to facilitate more effective translation of basic research findings to practical and clinical applications.
Addressing these challenges requires different problem-solving perspectives. Top-down and bottom-up approaches are complementary techniques that offer valuable perspectives for managing, analyzing, and visualizing complex high-dimensional multi-modal data sets. This cumulative dissertation explores the challenges associated with handling such data and highlights the top-down, bottom-up, and integrated approaches being developed to manage, analyze, and visualize it. The work is organized in two parts, each reflecting one of the two problem-solving approaches and its use in published studies. The dissertation demonstrates the importance of understanding both approaches, of reasoning about a problem within each of them, and of their concretization and application in various domains.
Clinical text classification in Cancer Real-World Data in Spanish
Healthcare systems currently store a large amount of clinical data, mostly unstructured textual information such as electronic health records (EHRs). Manually extracting valuable information from these documents is costly for healthcare professionals. For example, when a patient first arrives at an oncology clinical analysis unit, clinical staff must extract information about the type of neoplasm in order to assign the appropriate clinical specialist. Automating this task is equivalent to text classification in natural language processing (NLP). In this study, we have attempted to extract the neoplasm type by processing Spanish clinical documents. A private corpus of 23,704 real clinical cases has been processed to extract the three most common types of neoplasms in the Spanish territory: breast, lung, and colorectal neoplasms. We developed methodologies based on the state of the art in text classification: strategies using machine learning with bag-of-words representations, supervised approaches using embedding models, and bidirectional recurrent neural networks with convolutional layers (C-BiRNN). The results obtained show that NLP methods are extremely helpful in performing neoplasm type extraction. In particular, the 2-BiGRU model with a convolutional layer and pre-trained fastText embeddings obtained the best performance, with macro-averaged scores (more representative than micro-averaged scores given the unbalanced data) of 0.981 for precision, 0.984 for recall, and 0.982 for F1-score. The authors acknowledge support from the Ministerio de Ciencia e Innovación (MICINN) under project PID2020-116898RB-I00, from the Universidad de Málaga and Junta de Andalucía through grants UMA20-FEDERJA-045 and PYC20-046-UMA (all including FEDER funds), and from the Malaga-Pfizer consortium for AI research in Cancer (MAPIC).
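As a rough, self-contained illustration of the bag-of-words end of the methodological spectrum described above (the study's actual pipelines and the C-BiRNN model are far more elaborate, and all example texts here are invented), a clinical note can be represented as term counts and assigned the label of the nearest class centroid by cosine similarity:

```python
# Minimal bag-of-words nearest-centroid sketch for neoplasm-type
# classification. Toy data only; not the authors' pipeline.
from collections import Counter
import math

def bow(text):
    """Bag-of-words: lowercase term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroids(labeled_docs):
    """Sum the term counts of each class's documents into one centroid."""
    cents = {}
    for label, text in labeled_docs:
        cents.setdefault(label, Counter()).update(bow(text))
    return cents

train = [  # invented snippets in the style of Spanish clinical text
    ("breast", "carcinoma ductal de mama izquierda"),
    ("lung", "adenocarcinoma de pulmon lobulo superior"),
    ("colorectal", "neoplasia de colon sigmoide"),
]
cents = centroids(train)
query = "nodulo en mama derecha compatible con carcinoma"
pred = max(cents, key=lambda lab: cosine(bow(query), cents[lab]))
print(pred)  # "breast": shares "mama" and "carcinoma" with that centroid
```

Embedding-based and recurrent models like the study's 2-BiGRU replace these sparse count vectors with dense, context-sensitive representations, which is where the reported gains come from.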
PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptBench, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. This study uses a plethora of adversarial textual attacks targeting prompts at multiple levels: character, word, sentence, and semantic. These prompts are then employed in diverse tasks such as sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. Our study generates 4,032 adversarial prompts, meticulously evaluated over 8 tasks and 13 datasets, with 567,084 test samples in total. Our findings demonstrate that contemporary LLMs are vulnerable to adversarial prompts. Furthermore, we present a comprehensive analysis to understand the mystery behind prompt robustness and its transferability, and we offer insightful robustness analysis and pragmatic recommendations for prompt composition, beneficial to both researchers and everyday users. We make our code, prompts, and methodologies for generating adversarial prompts publicly accessible, thereby enabling and encouraging collaborative exploration of this pivotal field: https://github.com/microsoft/promptbench.

Comment: Technical report; 23 pages; code is at: https://github.com/microsoft/promptbench.
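For intuition, a character-level perturbation in the spirit of such attacks can be sketched as below; PromptBench's actual attacks (e.g., TextBugger- or DeepWordBug-style edits) are guided by model feedback rather than applied at random, so this unguided swap is only an illustration.

```python
# Illustrative character-level adversarial perturbation: swap adjacent
# characters inside randomly chosen words while keeping word boundaries
# and first/last characters intact. Not PromptBench's implementation.
import random

def char_swap_attack(prompt: str, n_swaps: int = 2, seed: int = 0) -> str:
    """Return a lightly perturbed copy of `prompt` (deterministic per seed)."""
    rng = random.Random(seed)
    words = prompt.split()
    for _ in range(n_swaps):
        i = rng.randrange(len(words))
        w = words[i]
        if len(w) > 3:
            j = rng.randrange(1, len(w) - 2)  # keep first/last chars fixed
            words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)

prompt = "Classify the sentiment of the following review as positive or negative."
print(char_swap_attack(prompt))
```

A robustness benchmark then measures how much task accuracy drops when such perturbed prompts replace the clean ones.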
Unsupervised Semantic Variation Prediction using the Distribution of Sibling Embeddings
Languages are dynamic entities, and the meanings associated with words constantly change over time. Detecting the semantic variation of words is an important task for NLP applications that must make time-sensitive predictions. Existing work on semantic variation prediction has predominantly focused on comparing some form of averaged contextualised representation of a target word computed from a given corpus. However, some of the previously associated meanings of a target word can become obsolete over time (e.g. gay meaning happy), while novel usages of existing words emerge (e.g. cell meaning a mobile phone). We argue that mean representations alone cannot accurately capture such semantic variations, and we propose a method that uses the entire cohort of contextualised embeddings of the target word, which we refer to as the sibling distribution. Experimental results on the SemEval-2020 Task 1 benchmark dataset for semantic variation prediction show that our method outperforms prior work that considers only mean embeddings and is comparable to the current state-of-the-art. Moreover, a qualitative analysis shows that our method detects important semantic changes in words that are not captured by existing methods.
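A small numeric illustration (with made-up two-dimensional "embeddings") of why a mean vector can hide the kind of change the sibling distribution captures: below, the word's usages split into two senses at time 2, yet the means of the two embedding sets stay perfectly aligned.

```python
# Mean embeddings vs. the full sibling distribution. Real sibling
# embeddings are high-dimensional vectors from a masked language model;
# these 2-D vectors are invented for illustration.
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Time 1: every usage of the word expresses one blended sense.
T1 = np.array([[0.7, 0.7]] * 4, dtype=float)
# Time 2: usages split into two distinct senses (e.g. "cell": prison vs phone).
T2 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

# Mean-based comparison: the two prototype vectors point the same way.
mean_score = cos(T1.mean(axis=0), T2.mean(axis=0))
print(round(mean_score, 3))  # 1.0 -> "no change" under mean embeddings

# Distribution-based comparison: average cosine distance over cross pairs.
cross = [1 - cos(u, v) for u in T1 for v in T2]
print(round(sum(cross) / len(cross), 3))  # 0.293 -> the change is visible
```

The mean collapses the two new senses back onto the old blended direction, while pairwise comparisons over the sibling cohorts expose the spread that signals semantic change.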