1,035 research outputs found
Generating Concise and Readable Summaries of XML Documents
XML has become the de-facto standard for data representation and exchange,
resulting in large scale repositories and warehouses of XML data. In order for
users to understand and explore these large collections, a summarized, bird's
eye view of the available data is a necessity. In this paper, we are interested
in semantic XML document summaries which present the "important" information
available in an XML document to the user. In the best case, such a summary is a
concise replacement for the original document itself. At the other extreme, it
should at least help the user make an informed choice as to the relevance of
the document to his needs. In this paper, we address the two main issues which
arise in producing such meaningful and concise summaries: i) which tags or text
units are important and should be included in the summary, ii) how to generate
summaries of different sizes.%for different memory budgets. We conduct user
studies with different real-life datasets and show that our methods are useful
and effective in practice
Entity Summarisation with Limited Edge Budget on Undirected and Directed Knowledge Graphs
The paper concerns a novel problem of summarising entities with limited presentation budget on entity-relationship knowledge graphs and propose an efficient algorithm for solving this problem. The algorithm has been implemented in two variants: undirected and directed, together with a visualisation tool. Experimental user evaluation of the algorithm was conducted on real large semantic knowledge graphs extracted from the web. The reported results of experimental user evaluation are promising and encourage to continue the work on improving the algorithm.
Automatically Documenting Software Artifacts
Software artifacts, such as database schema and unit test cases, constantly change during evolution and maintenance of software systems. Co-evolution of code and DB schemas in Database-Centric Applications (DCAs) often leads to two types of challenging scenarios for developers, where (i) changes to the DB schema need to be incorporated in the source code, and (ii) maintenance of a DCAs code requires understanding of how the features are implemented by relying on DB operations and corresponding schema constraints. On the other hand, the number of unit test cases often grows as new functionality is introduced into the system, and maintaining these unit tests is important to reduce the introduction of regression bugs due to outdated unit tests. Therefore, one critical artifact that developers need to be able to maintain during evolution and maintenance of software systems is up-to-date and complete documentation. In order to understand developer practices regarding documenting and maintaining these software artifacts, we designed two empirical studies both composed of (i) an online survey of contributors of open source projects and (ii) a mining-based analysis of method comments in these projects. We observed that documenting methods with database accesses and unit test cases is not a common practice. Further, motivated by the findings of the studies, we proposed three novel approaches: (i) DBScribe is an approach for automatically documenting database usages and schema constraints, (ii) UnitTestScribe is an approach for automatically documenting test cases, and (iii) TeStereo tags stereotypes for unit tests and generates html reports to improve the comprehension and browsing of unit tests in a large test suite. We evaluated our tools in the case studies with industrial developers and graduate students. In general, developers indicated that descriptions generated by the tools are complete, concise, and easy to read. The reports are useful for source code comprehension tasks as well as other tasks, such as code smell detection and source code navigation
Specifications of standards in systems and synthetic biology: Status and developments in 2020
This special issue of the Journal of Integrative Bioinformatics presents papers related to the 10th COMBINE meeting together with the annual update of COMBINE standards in systems and synthetic biology
Domain-specific ChatBots for Science using Embeddings
Large language models (LLMs) have emerged as powerful machine-learning
systems capable of handling a myriad of tasks. Tuned versions of these systems
have been turned into chatbots that can respond to user queries on a vast
diversity of topics, providing informative and creative replies. However, their
application to physical science research remains limited owing to their
incomplete knowledge in these areas, contrasted with the needs of rigor and
sourcing in science domains. Here, we demonstrate how existing methods and
software tools can be easily combined to yield a domain-specific chatbot. The
system ingests scientific documents in existing formats, and uses text
embedding lookup to provide the LLM with domain-specific contextual information
when composing its reply. We similarly demonstrate that existing image
embedding methods can be used for search and retrieval across publication
figures. These results confirm that LLMs are already suitable for use by
physical scientists in accelerating their research efforts.Comment: 12 pages, 5 figure
- …