407,028 research outputs found
Structure-semantics interplay in complex networks and its effects on the predictability of similarity in texts
There are different ways to define similarity for grouping similar texts into
clusters, as the concept of similarity may depend on the purpose of the task.
For instance, in topic extraction similar texts mean those within the same
semantic field, whereas in author recognition stylistic features should be
considered. In this study, we introduce ways to classify texts employing
concepts of complex networks, which may be able to capture syntactic, semantic
and even pragmatic features. The interplay between the various metrics of the
complex networks is analyzed with three applications, namely identification of
machine translation (MT) systems, evaluation of quality of machine translated
texts and authorship recognition. We shall show that topological features of
the networks representing texts can enhance the ability to identify MT systems
in particular cases. For evaluating the quality of MT texts, on the other hand,
high correlation was obtained with methods capable of capturing the semantics.
This was expected because the gold standards used are themselves based on
word co-occurrence. Notwithstanding, the Katz similarity, which involves
semantics and structure in the comparison of texts, achieved the highest
correlation with the NIST measurement, indicating that in some cases the
combination of both approaches can improve the ability to quantify quality in
MT. In authorship recognition, again the topological features were relevant in
some contexts, though for the books and authors analyzed good results were
obtained with semantic features as well. Because hybrid approaches encompassing
semantic and topological features have not been extensively used, we believe
that the methodology proposed here may be useful to enhance text classification
considerably, as it combines well-established strategies.
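As a rough illustration of how a Katz-style measure mixes word co-occurrence with network structure, the sketch below builds a word adjacency network from a token list and computes the standard Katz index between its nodes. It assumes the common textbook definition S = (I - beta*A)^-1 - I; the abstract does not say how the authors build their networks or compare two texts, so the function names, the adjacency rule and the choice of beta are all illustrative assumptions.

```python
import numpy as np

def cooccurrence_adjacency(tokens):
    """Undirected adjacency matrix linking words that appear next to each other.

    Assumption: plain adjacent-word linking; the abstract does not specify
    the network construction used in the study.
    """
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    A = np.zeros((len(vocab), len(vocab)))
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # skip self-loops from repeated words
            A[index[a], index[b]] = 1.0
            A[index[b], index[a]] = 1.0
    return vocab, A

def katz_index(A, beta=None):
    """Katz index: walks of every length k between two nodes, weighted by beta**k.

    The series converges only if beta < 1 / (largest eigenvalue of A).
    """
    lam_max = float(np.max(np.abs(np.linalg.eigvalsh(A))))
    if lam_max == 0.0:  # no edges, so no walks
        return np.zeros_like(A)
    if beta is None:
        beta = 0.5 / lam_max  # illustrative default, safely inside the bound
    elif beta * lam_max >= 1.0:
        raise ValueError("beta must be smaller than 1 / largest eigenvalue")
    identity = np.eye(len(A))
    return np.linalg.inv(identity - beta * A) - identity

tokens = "the cat sat on the mat".split()
vocab, A = cooccurrence_adjacency(tokens)
S = katz_index(A)
cat, the, mat = (vocab.index(w) for w in ("cat", "the", "mat"))
print(f"Katz(cat, the) = {S[cat, the]:.4f}, Katz(cat, mat) = {S[cat, mat]:.4f}")
```

In this toy network "cat" and "the" are directly linked while "cat" and "mat" are connected only through "the", so the first pair receives the higher score.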
The Will of Others
Scholarly reflections on the concept of the will as it is articulated in late ancient texts have centered on the male individual and the difficulties he faces as he tries to train or direct his intentions. By contrast, in this article we seek to explore late ancient concepts and negotiations of the will by considering a cluster of ancient Jewish and Christian narrative scenarios in which women are under the threat of sexual assault. Rather than a split between warring parts of one person, these narratives treat moments when the will of one actor is in conflict with the will of another. Thus, these scenarios raise questions that cannot otherwise be accessed about human intention, agency, and subjectivity, and their limitations by social and cultural realities. We argue that these cases should be viewed not as the marginal troubles that sometimes happen to women, but as expressions of the fundamental problems at the heart of the theories of the will embraced within late ancient Judaism and Christianity.
Types and degrees of interpretive resemblance in translation
This article explores one of the types of interpretive resemblance found in translation, namely, resemblance between concepts. These are cases where the concept encoded involves a resemblance relation between its literal import and the meaning it communicates, i.e. cases in which words do not literally communicate the concepts they encode. It is argued that translations are often carried out not on the basis of the concept encoded in the original text but on the basis of the actual concept communicated. This constitutes one of the sources of discrepancy found between original and target texts. In these cases, the translation encodes not what was encoded originally but (the translator's interpretation of what) the source concept was intended to communicate. There are three ways in which what is communicated by a concept may depart from what it encodes: concept narrowing, concept loosening, and echoic uses of concepts. In addition to discussing these processes in relation to translation, arguments are put forward for the existence of a further resemblance possibility: concept widening.
AN ANALYSIS OF ALTERNATIVE NET PRESENT VALUE CAPITAL INVESTMENT DECISION MODELS
We have found that the disagreement between Returns-to-Assets (RTA) and Returns-to-Equity (RTE) proponents is not confined to agricultural economics. Depending on the course they are taking and the accompanying text, students are likely to learn that there is a "right" way to calculate Net Present Values (NPVs), either by the RTA method or the RTE method. In most cases, only one of the two methods is discussed and illustrated with numerical examples. Less common are texts that compare the two methods, discuss their underlying assumptions, or show how the NPVs from the two methods can be reconciled. The paper is organized as follows. The first section of the main body of the paper provides a comparative overview of the RTA and RTE methods; the second section discusses our textbook survey; the final section offers our conclusions. Appendix A contains a brief history of the theoretical development of discounted cash flow (DCF) concepts. Appendix B contains additional details on defining components of NPV models. Finally, Appendix C is a listing of some additional references.
Keywords: Research Methods/Statistical Methods
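To make the RTA/RTE distinction concrete, the toy calculation below discounts the same one-year project in both ways. It is only a sketch under stated assumptions: every figure is hypothetical, taxes are ignored, RTA is taken to mean discounting cash flows to assets at a weighted cost of capital, and RTE to mean discounting cash flows to equity (after debt service) at the cost of equity. The paper's own model definitions may differ.

```python
def npv(rate, cash_flows):
    """Net present value of cash_flows, where cash_flows[t] arrives at end of year t."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

# Hypothetical project: invest 100 today, receive 120 in one year.
investment = 100.0
asset_cash_flow = 120.0

# Hypothetical financing: half debt at 6%, half equity with a 14% required return.
debt = 50.0
equity = investment - debt
cost_of_debt = 0.06
cost_of_equity = 0.14

# RTA view: discount the flows to the whole asset at the weighted cost of capital
# (book-value weights, no taxes; both are simplifying assumptions).
wacc = (debt / investment) * cost_of_debt + (equity / investment) * cost_of_equity
npv_rta = npv(wacc, [-investment, asset_cash_flow])

# RTE view: discount what is left for equity holders after repaying debt with interest.
equity_cash_flow = asset_cash_flow - debt * (1.0 + cost_of_debt)
npv_rte = npv(cost_of_equity, [-equity, equity_cash_flow])

print(f"RTA NPV: {npv_rta:.2f}")
print(f"RTE NPV: {npv_rte:.2f}")
```

With these inputs the two figures come out close but not equal (about 9.09 against 8.77), which illustrates why a reconciliation of the two methods depends on making their underlying assumptions consistent.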
Analyzing transfer learning impact in biomedical cross lingual named entity recognition and normalization
Background
The volume of biomedical literature and clinical data is growing at an exponential rate. Therefore, efficient access to data described in unstructured biomedical texts is a crucial task for the biomedical industry and research. Named Entity Recognition (NER) is the first step for information and knowledge acquisition when we deal with unstructured texts. Recent NER approaches use contextualized word representations as input for a downstream classification task. However, distributed word vectors (embeddings) are very limited in Spanish and even more for the biomedical domain.
Methods
In this work, we develop several biomedical Spanish word representations, and we introduce two Deep Learning approaches for the recognition of pharmaceutical, chemical, and other biomedical entities in Spanish clinical case texts and biomedical texts, one based on a Bi-LSTM-CRF model and the other on a BERT-based architecture.
Results
Several Spanish biomedical embeddings together with the two deep learning models were evaluated on the PharmaCoNER and CORD-19 datasets. The PharmaCoNER dataset is composed of a set of Spanish clinical cases annotated with drugs, chemical compounds and pharmacological substances; our extended Bi-LSTM-CRF model obtains an F-score of 85.24% on entity identification and classification and the BERT model obtains an F-score of 88.80%. For the entity normalization task, the extended Bi-LSTM-CRF model achieves an F-score of 72.85% and the BERT model achieves 79.97%. The CORD-19 dataset consists of scholarly articles written in English annotated with biomedical concepts such as disorder, species, chemical or drugs, gene and protein, enzyme and anatomy. On the CORD-19 dataset, the Bi-LSTM-CRF and BERT models obtain F-measures of 78.23% and 78.86%, respectively, on entity identification and classification.
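For readers unfamiliar with the metric, the sketch below shows how an entity-level F-score is commonly computed from exact matches between gold and predicted annotations. It assumes the usual definition F1 = 2PR / (P + R) with strict matching on span and label; the official evaluation scripts for these datasets may apply different matching rules, and the spans and labels shown are hypothetical.

```python
def entity_f1(gold, predicted):
    """Precision, recall and F1 over exact (start, end, label) matches."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical character spans and labels, for illustration only.
gold = [(0, 9, "CHEMICAL"), (15, 22, "PROTEIN"), (30, 41, "CHEMICAL"), (50, 58, "CHEMICAL")]
predicted = [(0, 9, "CHEMICAL"), (15, 22, "PROTEIN"), (30, 41, "PROTEIN")]

precision, recall, f1 = entity_f1(gold, predicted)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Here two of three predictions are correct and two of four gold entities are found; the third prediction has the right span but the wrong label, so it counts as an error under strict matching.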
Conclusion
These results prove that deep learning models with in-domain knowledge learned from large-scale datasets highly improve named entity recognition performance. Moreover, contextualized representations help to understand complexities and ambiguity inherent to biomedical texts. Embeddings based on word, concepts, senses, etc. other than those for English are required to improve NER tasks in other languages.
This work was partially supported by the Research Program of the Ministry of Economy and Competitiveness - Government of Spain (DeepEMR project TIN2017-87548-C2-1-R).
Critical legal reading: The elements, strategies and dispositions needed to master this essential skill
Alex Steel, Kate Galloway, Mary Heath, Natalie Skead, Mark Israel, Anne Hewit
Summer Season in the Working Settlement: From Mushrooms to People and Back. Uralmash in Fiction and Non-fiction Literature About 1930s
The article analyzes several key concepts in texts about the construction and first years of operation of Uralmashzavod (1930s, 1960s-1970s). The concepts are traced through fiction and diary prose, memoirs and records, in which similar motifs are found. Most of the texts are held in the Museum archive; they are either introduced into scholarly discourse for the first time or become the object of philological and anthropological study for the first time. The “excluded” loci of Ovalov’s novel are “filled” with memories of Uralmash, but the main memory of the “first builders” involves a surprisingly large number of concepts that are in most cases represented only in fiction. The article attempts to trace how the chronotope and topography of the working settlement in fiction are reflected in the identity of the “old-timers”, corresponding to the deep needs and expectations of the factory workers. The reconstruction of the “anthropogenesis” of a “new worker” emerging from the meta-text about the giant plant is presented alongside the paradox of the adoption of “wild space”, which, after being cleared of forest for the construction of the plant, turned over the decades into one of the greenest areas of Sverdlovsk-Ekaterinburg.
Keywords: Uralmash, socialist city, first builders, Prishvin’s diaries, A. Bannikov, E. Bannikova, L. Ovalov, V. Anfimov, “Zina Demina”, Jiang Tsinggu