2,711 research outputs found

    Low- and high-resource opinion summarization

    Customer reviews play a vital role in the online purchasing decisions we make. The reviews express user opinions that are useful for setting realistic expectations and uncovering important details about products. However, some products receive hundreds or even thousands of reviews, making them time-consuming to read. Moreover, many reviews contain uninformative content, such as irrelevant personal experiences. Automatic summarization offers an alternative – short text summaries capturing the essential information expressed in reviews. Automatically produced summaries can reflect overall or particular opinions and be tailored to user preferences. Besides being presented on major e-commerce platforms, home assistants can also vocalize them. This approach can improve user satisfaction by assisting in making faster and better decisions. Modern summarization approaches are based on neural networks, often requiring thousands of annotated samples for training. However, human-written summaries for products are expensive to produce because annotators need to read many reviews. This has led to annotated data scarcity where only a few datasets are available. Data scarcity is the central theme of our works, and we propose a number of approaches to alleviate the problem. The thesis consists of two parts where we discuss low- and high-resource data settings. In the first part, we propose self-supervised learning methods applied to customer reviews and few-shot methods for learning from small annotated datasets. Customer reviews without summaries are available in large quantities, contain a breadth of in-domain specifics, and provide a powerful training signal. We show that reviews can be used for learning summarizers via a self-supervised objective. Further, we address two main challenges associated with learning from small annotated datasets. First, large models rapidly overfit on small datasets leading to poor generalization. 
Second, it is not possible to learn a wide range of in-domain specifics (e.g., product aspects and usage) from a handful of gold samples. This leads to subtle semantic mistakes in generated summaries, such as ‘great dead on arrival battery.’ We address the first challenge by explicitly modeling summary properties (e.g., content coverage and sentiment alignment). Furthermore, we leverage small modules – adapters – that are more robust to overfitting. As we show, despite their size, these modules can be used to store in-domain knowledge to reduce semantic mistakes. Lastly, we propose a simple method for learning personalized summarizers based on aspects, such as ‘price,’ ‘battery life,’ and ‘resolution.’ This task is harder to learn, and we present a few-shot method for training a query-based summarizer on small annotated datasets. In the second part, we focus on the high-resource setting and present a large dataset with summaries collected from various online resources. The dataset has more than 33,000 human-written summaries, each linked to up to thousands of reviews. This, however, makes it challenging to apply an ‘expensive’ deep encoder due to memory and computational costs. To address this problem, we propose selecting small subsets of informative reviews. Only these subsets are encoded by the deep encoder and subsequently summarized. We show that the selector and summarizer can be trained end-to-end via amortized inference and policy gradient methods.

    Archaeological palaeoenvironmental archives: challenges and potential

    This Arts and Humanities Research Council (AHRC) sponsored collaborative doctoral project represents one of the most significant efforts to collate quantitative and qualitative data that can elucidate practices related to archaeological palaeoenvironmental archiving in England. The research has revealed that archived palaeoenvironmental remains are valuable resources for archaeological research and can clarify subjects that include the adoption and importation of exotic species, plant and insect invasion, human health and diet, and plant and animal husbandry practices. In addition to scientific research, archived palaeoenvironmental remains can provide evidence-based narratives of human resilience and climate change and offer evidence of the scientific process, making them ideal resources for public science engagement. These areas of potential have been realised at an imperative time, given that waterlogged palaeoenvironmental remains at significant sites such as Star Carr, Must Farm, and Flag Fen, as well as archaeological deposits in towns and cities, are at risk of decay due to climate change-related factors and unsustainable agricultural practices. Innovative approaches to collecting and archiving palaeoenvironmental remains and maintaining existing archives will permit the creation of an accessible and thorough national resource that can service archaeologists and researchers in the related fields of biology and natural history. Furthermore, a concerted effort to recognise absences in archaeological archives, matched by an effort to supply these deficiencies, can produce a resource that can contribute to an enduring geographical and temporal record of England's biodiversity, which can be used in perpetuity in the face of diminishing archaeological and contemporary natural resources. To realise these opportunities, particular challenges must be overcome. 
The most prominent of these include inconsistent collection policies resulting from pressures associated with shortages in storage capacity and declining specialist knowledge in museums and repositories combined with variable curation practices. Many of these challenges can be resolved by developing a dedicated storage facility that can focus on the ongoing conservation and curation of palaeoenvironmental remains. Combined with an OASIS + module designed to handle and disseminate data pertaining to palaeoenvironmental archives, remains would be findable, accessible, and interoperable with biological archives and collections worldwide. Providing a national centre for curating palaeoenvironmental remains and a dedicated digital repository will require significant funding. Funding sources could be identified through collaboration with other disciplines. If sufficient funding cannot be identified, options that would require less financial investment, such as high-level archive audits and the production of guidance documents, will be able to assist all stakeholders with the improved curation, management, and promotion of the archived resource.

    The Social Evolution of World Politics

    How can we understand long-term change in world politics better? Based on readings of thinkers as diverse as Habermas, Foucault and Luhmann, the authors of this book propose a framework for understanding such change in terms of social evolution. They show that processes of social learning and unlearning are key to understanding the long-term historical evolution of complex societies, and propose to approach these with the core concepts of autonomization, hierarchical complexity, and co-evolution. Three case studies illustrate this social evolutionary perspective on the study of world politics, examining the evolution of forms of organizing political authority, of conflicts, of diplomacy, and of law as a boundary condition.

    Full Issue, Volume 2

    Full issue of volume 2

    The Influence of Greek on the Gothic New Testament in relation to Proto-Germanic: Corpus-driven Evidence on Written Contact

    The present study investigates verbal constructions in the Gothic New Testament text, Epistles to the Corinthians II:12. The corpus compiled is compared to the Greek source text and diachronic retranslations in English. Gothic is the only surviving language of the East Germanic branch, and the oldest within the Germanic family, as it dates back to the 4th century AD. Although such an old language can provide insight into the typology of Germanic, the available manuscripts are limited; Biblical translations of the New Testament and fragmentary Old Testament excerpts are the only existing texts. The results show slight similarities between the Gothic and English corpora under examination, concerning the expression of voice/diathesis and the typological preferences of the verbal constructions, as well as significant deviations from the equivalent Greek constructions. The prevalent typological patterns show the preference of Gothic for periphrastic translation of the Greek passive constructions, which is less often observed in the equivalent English translations. The findings suggest that such deviations can contribute to evidence of the parallel grammar systems that are diachronically manifested by written contact and can bear diachronic implications for the evolution of the languages involved. Statistical analysis within a contrastive approach, along with the examination of quantified data, can highlight typological patterns that were previously underexplored. Future studies could extend the scope of research to other intralingual Germanic translations in comparison to Gothic, in order to establish evidence on their typological relations.

    Improving Science That Uses Code

    As code is now an inextricable part of science it should be supported by competent Software Engineering, analogously to statistical claims being properly supported by competent statistics. If and when code avoids adequate scrutiny, science becomes unreliable and unverifiable because results (text, data, graphs, images, etc.) depend on untrustworthy code. Currently, scientists rarely assure the quality of the code they rely on, and rarely make it accessible for scrutiny. Even when code is available, scientists rarely provide adequate documentation to understand or use it reliably. This paper proposes and justifies ways to improve science using code: 1. Professional Software Engineers can help, particularly in critical fields such as public health, climate change and energy. 2. ‘Software Engineering Boards,’ analogous to Ethics or Institutional Review Boards, should be instigated and used. 3. The Reproducible Analytic Pipeline (RAP) methodology can be generalized to cover code and Software Engineering methodologies, in a generalization this paper introduces called RAP+. RAP+ (or comparable interventions) could be supported and/or even required in journal, conference and funding body policies. The paper’s Supplemental Material provides a summary of Software Engineering best practice relevant to scientific research, including further suggestions for RAP+ workflows. ‘Science is what we understand well enough to explain to a computer.’ Donald E. Knuth in A=B [1] ‘I have to write to discover what I am doing.’ Flannery O’Connor, quoted in Write for your life [2] ‘Criticism is the mother of methodology.’ Robert P. Abelson in Statistics as Principled Argument [3] ‘From its earliest times, science has operated by being open and transparent about methods and evidence, regardless of which technology has been in vogue.’ Editorial in Nature [4]

    Accustomed to Obedience?

    Many histories of Ancient Greece center their stories on Athens, but what would that history look like if they didn’t? There is another way to tell this story, one that situates Greek history in terms of the relationships between smaller Greek cities and in contact with the wider Mediterranean. In this book, author Joshua P. Nudell offers a new history of the period from the Persian Wars to the wars that followed the death of Alexander the Great, from the perspective of Ionia. While recent scholarship has increasingly treated Greece through the lenses of regional, polis, and local interaction, there has not yet been a dedicated study of Classical Ionia. This book fills this clear gap in the literature while offering Ionia as a prism through which to better understand Classical Greece. This book offers a clear and accessible narrative of the period between the Persian Wars and the wars of the early Hellenistic period, two nominal liberations of the region. The volume complements existing histories of Classical Greece. Close inspection reveals that the Ionians were active partners in the imperial endeavor, even as imperial competition constrained local decision-making and exacerbated local and regional tensions. At the same time, the book offers interventions on critical issues related to Ionia such as the Athenian conquest of Samos, rhetoric about the freedom of the Greeks, the relationship between Ionian temple construction and economic activity, the status of the Panionion, Ionian poleis and their relationship with local communities beyond the circle of the dodecapolis, and the importance of historical memory to our understanding of ancient Greece. The result is a picture of an Aegean world that is more complex and less beholden to narratives that give primacy to the imperial actors at the expense of local developments.

    Volume 45: Full Issue

    Humboldt Journal of Social Relations 50th Anniversary Edition: Becoming a Polytechni

    Applying machine learning: a multi-role perspective

    Machine (and deep) learning technologies are increasingly present in several fields. It is undeniable that many aspects of our society are empowered by such technologies: web searches, content filtering on social networks, recommendations on e-commerce websites, mobile applications, etc., in addition to academic research. Moreover, mobile devices and internet sites, e.g., social networks, support the collection and sharing of information in real time. The pervasive deployment of the aforementioned technological instruments, both hardware and software, has led to the production of huge amounts of data. Such data has become more and more unmanageable, posing challenges to conventional computing platforms, and paving the way to the development and widespread use of machine and deep learning. Nevertheless, machine learning is not only a technology. Given a task, machine learning is a way of proceeding (a way of thinking), and as such can be approached from different perspectives (points of view). This, in particular, will be the focus of this research. The entire work concentrates on machine learning, starting from different sources of data, e.g., signals and images, applied to different domains, e.g., Sport Science and Social History, and analyzed from different perspectives: from a non-data scientist point of view through tools and platforms; setting a problem stage from scratch; implementing an effective application for classification tasks; improving user interface experience through Data Visualization and eXtended Reality. In essence, not only in a quantitative task, not only in a scientific environment, and not only from a data-scientist perspective, machine (and deep) learning can make the difference.