Search CORE

1,035 research outputs found

Generating Concise and Readable Summaries of XML Documents

Author: Ifrim Georgiana
Kumar Kondreddi Sarath
Ramanath Maya
Publication venue
Publication date: 01/01/2009
Field of study

XML has become the de facto standard for data representation and exchange, resulting in large-scale repositories and warehouses of XML data. For users to understand and explore these large collections, a summarized, bird's-eye view of the available data is a necessity. In this paper, we are interested in semantic XML document summaries, which present the "important" information available in an XML document to the user. In the best case, such a summary is a concise replacement for the original document itself. At the other extreme, it should at least help the user make an informed choice about the relevance of the document to their needs. We address the two main issues that arise in producing such meaningful and concise summaries: (i) which tags or text units are important and should be included in the summary, and (ii) how to generate summaries of different sizes. We conduct user studies with different real-life datasets and show that our methods are useful and effective in practice.

arXiv.org e-Print Archive

MPG.PuRe

Entity Summarisation with Limited Edge Budget on Undirected and Directed Knowledge Graphs

Author: Pikuła Mariusz
Schenkel Ralf
Sydow Marcin
Publication venue: 'Adam Mickiewicz University Poznan'
Publication date: 15/09/2010
Field of study

The paper concerns the novel problem of summarising entities under a limited presentation budget on entity-relationship knowledge graphs and proposes an efficient algorithm for solving it. The algorithm has been implemented in two variants, undirected and directed, together with a visualisation tool. An experimental user evaluation of the algorithm was conducted on large, real semantic knowledge graphs extracted from the web. The reported results are promising and encourage further work on improving the algorithm.

Biblioteka Nauki - repozytorium artykułów

Investigationes Linguisticae

Why data citation isn't working, and what to do about it

Author: Buneman Peter
Christie Greig
Davies Jamie A.
Dimitrellou Roza
Harding Simon D.
Pawson Adam J.
Sharman Joanna L.
Wu Yinjun
Publication venue: 'Oxford University Press (OUP)'
Publication date: 02/05/2020
Field of study

Edinburgh Research Explorer

Interactively learning to summarise timelines by reinforcement learning

Author: Ye Yuxuan
Publication venue
Publication date: 22/03/2022
Field of study

Explore Bristol Research

Automatically Documenting Software Artifacts

Author: Li Boyang
Publication venue: W&M ScholarWorks
Publication date: 13/09/2017
Field of study

Software artifacts, such as database schemas and unit test cases, constantly change during the evolution and maintenance of software systems. Co-evolution of code and DB schemas in Database-Centric Applications (DCAs) often leads to two types of challenging scenarios for developers, where (i) changes to the DB schema need to be incorporated in the source code, and (ii) maintenance of a DCA's code requires understanding how features are implemented through DB operations and the corresponding schema constraints. In addition, the number of unit test cases often grows as new functionality is introduced into the system, and maintaining these unit tests is important to avoid regression bugs caused by outdated tests. Therefore, one critical artifact that developers need to maintain during the evolution of software systems is up-to-date and complete documentation. To understand developer practices regarding documenting and maintaining these software artifacts, we designed two empirical studies, each composed of (i) an online survey of contributors to open source projects and (ii) a mining-based analysis of method comments in these projects. We observed that documenting methods with database accesses and unit test cases is not a common practice. Motivated by these findings, we proposed three novel approaches: (i) DBScribe, which automatically documents database usages and schema constraints; (ii) UnitTestScribe, which automatically documents test cases; and (iii) TeStereo, which tags stereotypes for unit tests and generates HTML reports to improve the comprehension and browsing of unit tests in a large test suite. We evaluated our tools in case studies with industrial developers and graduate students. In general, developers indicated that the descriptions generated by the tools are complete, concise, and easy to read. The reports are useful for source code comprehension tasks as well as other tasks, such as code smell detection and source code navigation.

College of William & Mary: W&M Publish

Specifications of standards in systems and synthetic biology: Status and developments in 2020

Author: Czauderna Tobias
Golebiewski Martin
Gorochowski Thomas E
Hucka Michael
Keating Sarah M
König Matthias
Myers Chris
Nickerson David
Schreiber Falk
Sommer Bjorn
Waltemath Dagmar
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2020
Field of study

This special issue of the Journal of Integrative Bioinformatics presents papers related to the 10th COMBINE meeting, together with the annual update of COMBINE standards in systems and synthetic biology.

Crossref

Directory of Open Access Journals

Caltech Authors

Royal College of Art Research Repository

Monash University Research Portal

Explore Bristol Research

Domain-specific ChatBots for Science using Embeddings

Author: Yager Kevin G.
Publication venue
Publication date: 15/06/2023
Field of study

Large language models (LLMs) have emerged as powerful machine-learning systems capable of handling a myriad of tasks. Tuned versions of these systems have been turned into chatbots that can respond to user queries on a vast diversity of topics, providing informative and creative replies. However, their application to physical science research remains limited owing to their incomplete knowledge in these areas, which contrasts with the need for rigor and sourcing in scientific domains. Here, we demonstrate how existing methods and software tools can be easily combined to yield a domain-specific chatbot. The system ingests scientific documents in existing formats and uses text-embedding lookup to provide the LLM with domain-specific contextual information when composing its reply. We similarly demonstrate that existing image-embedding methods can be used for search and retrieval across publication figures. These results confirm that LLMs are already suitable for use by physical scientists in accelerating their research efforts. Comment: 12 pages, 5 figures

arXiv.org e-Print Archive
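The abstract above describes a retrieval step: document chunks are embedded, and the chunks closest to the user's query are handed to the LLM as context. Below is a minimal sketch of that lookup. It assumes a toy bag-of-words vector in place of a real text-embedding model, and the names `embed`, `cosine`, `retrieve` and `build_prompt`, along with the prompt format, are hypothetical illustrations rather than the author's code.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a text-embedding model (hypothetical): a bag-of-words
    count vector. A real system would call a learned embedding model instead."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k document chunks whose embeddings are most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Prepend the retrieved chunks to the question (hypothetical prompt format)."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "X-ray scattering measures nanoscale structure",
        "The beamline schedule is posted weekly",
        "Scattering patterns reveal nanoscale order",
    ]
    print(build_prompt("what does x-ray scattering measure", docs, k=1))
```

Ranking by cosine similarity is one common choice for embedding lookup; the string returned by `build_prompt` would then be sent to the LLM, a step not shown here.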