1,110 research outputs found
LIS Journals' Lack of Participation in Wikidata Item Creation
There are many items in Wikidata representing scholarly articles. However, these items have been created mostly by volunteer Wikidata editors and not systematically by journal publishers or editors, which can lead to gaps and inconsistencies in the datasets. This article presents findings from a survey investigating practices of library and information studies (LIS) journals in Wikidata item creation. Believing that a significant number of LIS journal editors would be aware of Wikidata and some would be creating Wikidata items for their publications, the authors sent a survey asking 138 English-language LIS journal editors if they created Wikidata items for materials published in their journal and follow-up questions. With a response rate of 41 percent, respondents overwhelmingly indicated that they did not create Wikidata items for materials published in their journal and were completely unaware of or only somewhat familiar with Wikidata. Respondents indicated that more familiarity with Wikidata and its benefits for scholarly journals as well as institutional support for the creation of Wikidata items could lead to greater participation; however, a campaign of education about Wikidata, documentation of benefits, and support for creation would be a necessary first step. The article presents and discusses the results of the survey, but the conclusions that can be drawn are minimal; therefore, the authors also discuss the benefits of creating Wikidata items for LIS journals as a first step in this educational campaign for editors and publishers
Universal Self-adaptive Prompting
A hallmark of modern large language models (LLMs) is their impressive general
zero-shot and few-shot abilities, often elicited through prompt-based and/or
in-context learning. However, while highly coveted and being the most general,
zero-shot performances in LLMs are still typically weaker due to the lack of
guidance and the difficulty of applying existing automatic prompt design
methods in general tasks when ground-truth labels are unavailable. In this
study, we address this by presenting Universal Self-adaptive Prompting (USP),
an automatic prompt design approach specifically tailored for zero-shot
learning (while compatible with few-shot). Requiring only a small amount of
unlabeled data & an inference-only LLM, USP is highly versatile: to achieve
universal prompting, USP categorizes a possible NLP task into one of the three
possible task types, and then uses a corresponding selector to select the most
suitable queries & zero-shot model-generated responses as
pseudo-demonstrations, thereby generalizing ICL to the zero-shot setup in a
fully automated way. We evaluate zero-shot USP with two PaLM models, and
demonstrate performances that are considerably stronger than standard zero-shot
baselines and are comparable to or even superior to few-shot baselines across
more than 20 natural language understanding (NLU) and natural language
generation (NLG) tasks.
Comment: 10 pages, 3 figures, 4 tables (19 pages, 5 figures and 9 tables
including references and appendices)
The Life Cycle of Knowledge in Big Language Models: A Survey
Knowledge plays a critical role in artificial intelligence. Recently, the
extensive success of pre-trained language models (PLMs) has drawn significant
attention to how knowledge can be acquired, maintained, updated and used by
language models. Despite the large number of related studies, a unified view
of how knowledge circulates within language models throughout the learning,
tuning, and application processes is still lacking, which may prevent
us from further understanding the connections between current progress or
realizing existing limitations. In this survey, we revisit PLMs as
knowledge-based systems by dividing the life cycle of knowledge in PLMs into
five critical periods, and investigating how knowledge circulates when it is
built, maintained and used. To this end, we systematically review existing
studies of each period of the knowledge life cycle, summarize the main
challenges and current limitations, and discuss future directions.
Comment: paperlist: https://github.com/c-box/KnowledgeLifecycl
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models
The VNHSGE (VietNamese High School Graduation Examination) dataset, developed
exclusively for evaluating large language models (LLMs), is introduced in this
article. The dataset, which covers nine subjects, was generated from the
Vietnamese National High School Graduation Examination and comparable tests.
300 literary essays have been included, and there are over 19,000
multiple-choice questions on a range of topics. The dataset assesses LLMs in
multitasking situations such as question answering, text generation, reading
comprehension, visual question answering, and more by including both textual
data and accompanying images. Using ChatGPT and BingChat, we evaluated LLMs on
the VNHSGE dataset and contrasted their performance with that of Vietnamese
students to see how well they performed. The results show that ChatGPT and
BingChat both perform at a human level in a number of areas, including
literature, English, history, geography, and civics education. They still have
room to improve, though, especially in the areas of mathematics, physics,
chemistry, and biology. The VNHSGE dataset seeks to provide an adequate
benchmark for assessing the abilities of LLMs with its wide-ranging coverage
and variety of activities. We intend to promote future developments in the
creation of LLMs by making this dataset available to the scientific community,
especially in resolving LLMs' limits in disciplines involving mathematics and
the natural sciences.
Comment: 74 pages, 44 figures
The Knowledge Graph Construction in the Educational Domain: Take an Australian School Science Course as an Example
The evolution of Internet technology and artificial intelligence has changed the ways we gain knowledge, and this change has expanded to every aspect of our lives. In recent years, knowledge graph technology, as one of the artificial intelligence techniques, has been widely used in the educational domain. However, few studies are dedicated to the construction of knowledge graphs for K-10 education in Australia; most existing studies focus only on the theory level, and little research shows practical pipeline steps to complete the complex flow of constructing an educational knowledge graph. In addition, most studies have focused on concept entities and their relations but ignored the features of concept entities and the relations between learning knowledge points and required learning outcomes. To overcome these shortcomings and provide the data foundation for downstream research and applications in this educational domain, this research analyzed the process of building a knowledge graph for Australian K-10 education at the theory level and implemented it in practice. We took the Year 9 science course as a typical data source and fed it to the proposed method, called K10EDU-RCF-KG, to construct this educational knowledge graph and to enrich the features of entities in the knowledge graph. In the construction pipeline, a variety of techniques were employed to complete the building process. Firstly, the POI and OCR techniques were applied to convert Word and PDF format files into text, followed by the development of an educational resources management platform where the machine-readable text could be stored in a relational database management system. Secondly, we designed an architecture framework to guide the construction pipeline. According to this architecture, the educational ontology was initially designed, and a backend microservice was developed to perform entity extraction and relation extraction using NLP-NER and probabilistic association rule mining algorithms, respectively. We also adopted the NLP-POS technique to find neighboring adjectives related to entities in order to enrich the features of these concept entities. In addition, a subject dictionary was introduced during the refinement of the knowledge graph, which reduced the data noise rate of the knowledge graph entities. Furthermore, learning outcome entities and topic knowledge point entities were directly connected, which provides a clear and efficient way to identify which learning objectives are related to a learning unit. Finally, a set of REST APIs for querying this educational knowledge graph was developed.
Workshop Proceedings of the 12th edition of the KONVENS conference
The 2014 issue of KONVENS is even more a forum for exchange: its main topic is the interaction between Computational Linguistics and Information Science, and the synergies such interaction, cooperation and integrated views can produce. This topic at the crossroads of different research traditions which deal with natural language as a container of knowledge, and with methods to extract and manage knowledge that is linguistically represented is close to the heart of many researchers at the Institut für Informationswissenschaft und Sprachtechnologie of Universität Hildesheim: it has long been one of the institute’s research topics, and it has received even more attention over the last few years
The Open Linguistics Working Group: developing the Linguistic Linked Open Data cloud
The Open Linguistics Working Group (OWLG) brings together researchers from various fields of linguistics, natural language processing, and information technology to present and discuss principles, case studies, and best practices for representing, publishing and linking linguistic data collections. A major outcome of our work is the Linguistic Linked Open Data (LLOD) cloud, an LOD (sub-)cloud of linguistic resources, which covers various linguistic databases, lexicons, corpora, terminologies, and metadata repositories. We present and summarize five years of progress on the development of the cloud and of advancements in open data in linguistics, and we describe recent community activities. The paper aims to serve as a guideline to orient and involve researchers with the community and/or Linguistic Linked Open Data
DataPerf: Benchmarks for Data-Centric AI Development
Machine learning research has long focused on models rather than datasets,
and prominent datasets are used for common ML tasks without regard to the
breadth, difficulty, and faithfulness of the underlying problems. Neglecting
the fundamental importance of data has given rise to inaccuracy, bias, and
fragility in real-world applications, and research is hindered by saturation
across existing dataset benchmarks. In response, we present DataPerf, a
community-led benchmark suite for evaluating ML datasets and data-centric
algorithms. We aim to foster innovation in data-centric AI through competition,
comparability, and reproducibility. We enable the ML community to iterate on
datasets, instead of just architectures, and we provide an open, online
platform with multiple rounds of challenges to support this iterative
development. The first iteration of DataPerf contains five benchmarks covering
a wide spectrum of data-centric techniques, tasks, and modalities in vision,
speech, acquisition, debugging, and diffusion prompting, and we support hosting
new contributed benchmarks from the community. The benchmarks, online
evaluation platform, and baseline implementations are open source, and the
MLCommons Association will maintain DataPerf to ensure long-term benefits to
academia and industry.
Comment: NeurIPS 2023 Datasets and Benchmarks Track