628 research outputs found
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
The transformer is a deep neural network that employs a self-attention
mechanism to model contextual relationships within sequential data. Unlike
conventional neural networks or variants of Recurrent Neural Networks (RNNs)
such as Long Short-Term Memory (LSTM), transformer models excel at capturing
long-range dependencies between input sequence elements and enable parallel
processing. As a result, transformer-based models have attracted substantial
interest among researchers in the field of artificial intelligence. This can be
attributed to their immense potential and remarkable achievements, not only in
Natural Language Processing (NLP) tasks but also in a wide range of domains,
including computer vision, audio and speech processing, healthcare, and the
Internet of Things (IoT). Although several survey papers have been published
highlighting the transformer's contributions in specific fields, architectural
differences, or performance evaluations, there is still no comprehensive
survey encompassing its major applications across
various domains. Therefore, we undertook the task of filling this gap by
conducting an extensive survey of proposed transformer models from 2017 to
2022. Our survey identifies the top five application
domains for transformer-based models, namely: NLP, Computer Vision,
Multi-Modality, Audio and Speech Processing, and Signal Processing. We analyze
the impact of highly influential transformer-based models in these domains and
subsequently classify them based on their respective tasks using a proposed
taxonomy. Our aim is to shed light on the existing potential and future
possibilities of transformers for enthusiastic researchers, thus contributing
to the broader understanding of this groundbreaking technology.
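The survey itself contains no code, but the mechanism the abstract describes is easy to make concrete. The sketch below implements standard scaled dot-product self-attention (Vaswani et al., 2017) in NumPy; the weight matrices Wq, Wk, and Wv and the toy dimensions are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (n, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                # project tokens to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # pairwise similarities, scaled by sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax: attention weights
    return weights @ V                              # each output mixes information from all positions

rng = np.random.default_rng(0)
n, d = 4, 8                                         # 4 tokens, model width 8 (toy sizes)
X = rng.normal(size=(n, d))
Wq, Wk, Wv = [rng.normal(size=(d, d)) for _ in range(3)]
out = self_attention(X, Wq, Wk, Wv)                 # shape (4, 8)
```

Because every output row is a weighted sum over all positions computed with a few matrix products, a dependency of arbitrary distance costs a single step and the whole sequence is processed at once, which is exactly the contrast with RNNs that the abstract draws.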
Mixed Reality Interfaces for Augmented Text and Speech
While technology plays a vital role in human communication, many significant challenges remain when using it in everyday life. Modern computing technologies, such as smartphones, offer convenient and swift access to information, facilitating tasks like reading documents or communicating with friends. However, these tools frequently lack adaptability, become distracting, consume excessive time, and impede interactions with people and contextual information. Furthermore, they often require numerous steps and significant time investment to gather pertinent information. We want to explore an efficient process of contextual information gathering for mixed reality (MR) interfaces that provide information directly in the user’s view. This approach allows for a seamless and flexible transition between language and subsequent contextual references, without disrupting the flow of communication. ‘Augmented Language’ can be defined as the integration of language and communication with mixed reality to enhance, transform, or manipulate language-related aspects through various forms of linguistic augmentation (such as annotation/referencing, aiding social interactions, translation, localization, etc.). In this thesis, our broad objective is to explore mixed reality interfaces and their potential to enhance augmented language, particularly in the domains of speech and text. Our aim is to create interfaces that offer a more natural, generalizable, on-demand, and real-time experience of accessing contextually relevant information and providing adaptive interactions. To address this broader objective, we systematically break it down into two instances of augmented language: first, enhancing augmented conversation to support on-the-fly, co-located in-person conversations using embedded references; and second, enhancing digital and physical documents using MR to provide on-demand reading support in the form of different summarization techniques. To examine the effectiveness of these speech and text interfaces, we conducted two studies in which we asked participants to evaluate our system prototypes in different use cases. The exploratory usability study for the first exploration confirms that our system decreases distraction and friction in conversation compared to smartphone search while providing highly useful and relevant information. For the second project, we conducted an exploratory design workshop to identify categories of document enhancements. We then conducted a user study with a mixed-reality prototype and highlight five broad themes to discuss the benefits of MR document enhancement.
Computer Vision and Architectural History at Eye Level: Mixed Methods for Linking Research in the Humanities and in Information Technology
Information on the history of architecture is embedded in our daily surroundings: in vernacular and heritage buildings and in physical objects, photographs, and plans. Historians study these tangible and intangible artefacts and the communities that built and used them. Valuable insights are thus gained into the past and the present, which also provide a foundation for designing the future. Given that our understanding of the past is limited by the inadequate availability of data, the article demonstrates that advanced computer tools can help gain more, and better-linked, data from the past. Computer vision can make a decisive contribution to the identification of image content in historical photographs. This application is particularly interesting for architectural history, where visual sources play an essential role in understanding the built environment of the past, yet a lack of reliable metadata often hinders the use of these materials. Automated recognition contributes to making a variety of image sources usable for research.
Northeastern Illinois University, Academic Catalog 2023-2024
A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4
Large language models (LLMs) are a special class of pretrained language
models obtained by scaling model size, pretraining corpus and computation.
LLMs, because of their large size and pretraining on large volumes of text
data, exhibit special abilities that allow them to achieve remarkable
performance on many natural language processing tasks without any
task-specific training. The era of LLMs started with the OpenAI GPT-3 model,
and the popularity of LLMs has grown rapidly since the introduction of models
like ChatGPT and GPT-4. We refer to GPT-3 and its successor OpenAI models,
including ChatGPT and GPT-4, as GPT-3 family large language models (GLLMs). With
the ever-rising popularity of GLLMs, especially in the research community,
there is a strong need for a comprehensive survey which summarizes the recent
research progress in multiple dimensions and can guide the research community
with insightful future research directions. We start the survey paper with
foundational concepts such as transformers, transfer learning, self-supervised
learning, pretrained language models and large language models. We then present
a brief overview of GLLMs and discuss the performances of GLLMs in various
downstream tasks, specific domains and multiple languages. We also discuss the
data labelling and data augmentation abilities of GLLMs, the robustness of
GLLMs, the effectiveness of GLLMs as evaluators, and finally, conclude with
multiple insightful future research directions. To summarize, this
comprehensive survey paper will serve as a good resource for both academic and
industry practitioners to stay updated with the latest research related to GPT-3
family large language models.
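Among the abilities the survey covers, data labelling is the easiest to make concrete. The sketch below shows what zero-shot labelling with a GLLM can look like in practice; it is not code from the paper, and the model name, prompt, and openai v1-style client usage are assumptions (it also presumes an OPENAI_API_KEY in the environment).

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def label_sentiment(text: str) -> str:
    """Ask the model to label text as positive/negative/neutral, with no task-specific training."""
    response = client.chat.completions.create(
        model="gpt-4",  # any GLLM chat model; the name here is an assumption
        messages=[
            {"role": "system",
             "content": "Label the sentiment of the user's text as exactly one of: "
                        "positive, negative, neutral."},
            {"role": "user", "content": text},
        ],
        temperature=0,  # keep labels as deterministic as possible for annotation use
    )
    return response.choices[0].message.content.strip().lower()

print(label_sentiment("The plot dragged, but the acting was superb."))
```

Because the label comes from the pretrained model alone, no labelled training data is needed, which is the zero-shot property the abstract highlights when discussing GLLMs as data labellers.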
Each book its own Babel: Conceptual unity and disunity in early modern natural philosophy
Natural philosophy changed quickly during the early modern period (1600-1800). Aristotelian philosophy was combated by Cartesian mechanicism, which was soon itself ousted by the Newtonian school. The development of new ideas within a scientific discipline is partially an issue of doing empirical research, in order to rule out positions and advance the field. However, it is also an issue of developing new concepts and a fitting language, in order to be able to express all the new positions being investigated. This second development also implies that the differences between thinkers might grow too large: the languages in which they express their philosophy can become too different for them to have a meaningful discussion. In this dissertation I investigate a few hundred texts from these three different schools, using algorithms that extract the meaning of words from texts. I do this in order to see how the schools differ from each other conceptually, how the meaning of words can travel through lines of influence from author to author, and how guarding the boundaries of a school relates to guarding the language it uses.
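The abstract does not name the word-meaning algorithms used; a common choice for this kind of conceptual analysis is word2vec, sketched below with gensim. The toy corpus and all identifiers are illustrative assumptions, not material from the dissertation.

```python
from gensim.models import Word2Vec

# Toy stand-ins for tokenized early modern texts (a real corpus would be a few
# hundred full works per school, lemmatized and spelling-normalized).
corpus = [
    ["matter", "is", "extension", "and", "motion"],            # "Cartesian" text
    ["force", "acts", "between", "bodies", "at", "distance"],  # "Newtonian" text
    ["substance", "has", "form", "and", "final", "cause"],     # "Aristotelian" text
]

model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, seed=1)

# Words used in similar contexts get similar vectors, so comparing a term's
# nearest neighbours across school-specific models exposes conceptual drift.
print(model.wv.most_similar("matter", topn=3))
```

Training one such model per school and comparing a shared term's neighbourhoods is one plausible way to measure how far apart the schools' "languages" have grown.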
LIPIcs, Volume 261, ICALP 2023, Complete Volume
Evaluating Copyright Protection in the Data-Driven Era: Centering on Motion Picture's Past and Future
Since the 1910s, Hollywood has measured audience preferences with rough industry-created methods. In the 1940s, scientific audience research led by George Gallup started to conduct film audience surveys with traditional statistical and psychological methods, but their quantity, quality, and speed were limited. Things changed dramatically in the internet age. The prevalence of digital data increases the speed, convenience, breadth, and depth of collecting audience and content data. Advanced data and AI technologies have also allowed machines to provide filmmakers with ideas or even make human-like expressions. This brings new copyright challenges in the data-driven era.
Massive amounts of text and data are the premise of text and data mining (TDM), as well as the admission ticket to machine learning technologies. Given the high and uncertain risks of copyright violation in the data-driven creation process, whoever controls the copyrighted film materials can monopolize the data and AI technologies used to create motion pictures in the data-driven era. Considering that copyright should not be the gatekeeper to new technological uses that do not impair the original uses of copyrighted works in existing markets, this study proposes creating TDM and model-training limitations or exceptions to copyright and recommends the Singapore legislative model.
Motion pictures, as public entertainment media, have inherently limited creative choices. Identifying the human original expression components of data-driven works is also challenging. This study proposes establishing a voluntarily negotiated license institution, backed up by a compulsory license, to enable other filmmakers to reuse film materials in new motion pictures. The degree of human original authorship in the film material, certified by film artists’ guilds, should be a crucial factor in deciding the compulsory license’s royalty rate and terms, in order to encourage retaining human artists. This study argues that international and domestic policymakers should enjoy broad discretion to qualify copyright protection for data-driven works because data-driven work is a new category of work. It would be too late to wait until ubiquitous data-driven works block human creative freedom and floods of data-driven copyright litigation overwhelm the judicial systems.