Towards a Holistic Integration of Spreadsheets with Databases: A Scalable Storage Engine for Presentational Data Management
Spreadsheet software is the tool of choice for interactive ad-hoc data
management, with adoption by billions of users. However, spreadsheets are not
scalable, unlike database systems. On the other hand, database systems, while
highly scalable, do not support interactivity as a first-class primitive. We
are developing DataSpread, to holistically integrate spreadsheets as a
front-end interface with databases as a back-end datastore, providing
scalability to spreadsheets, and interactivity to databases, an integration we
term presentational data management (PDM). In this paper, we make a first step
towards this vision: developing a storage engine for PDM, studying how to
flexibly represent spreadsheet data within a database and how to support and
maintain access by position. We first conduct an extensive survey of
spreadsheet use to motivate our functional requirements for a storage engine
for PDM. We develop a natural set of mechanisms for flexibly representing
spreadsheet data and demonstrate that identifying the optimal representation is
NP-Hard; however, we develop an efficient approach to identify the optimal
representation from an important and intuitive subclass of representations. We
extend our mechanisms with positional access mechanisms that don't suffer from
cascading update issues, leading to constant time access and modification
performance. We evaluate these representations on a workload of typical
spreadsheets and spreadsheet operations; they provide up to a 20% reduction in
storage and up to a 50% reduction in formula evaluation time.
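The cascading-update problem mentioned in the abstract above can be illustrated with a small sketch. It is not DataSpread's mechanism, which the abstract does not describe in detail; it is a simplified blocked index, with the hypothetical names `PositionalIndex`, `insert`, and `key_at`, that maps a row position to a stable row key. Inserting a row changes one block only, so the keys of later rows are never renumbered. Lookups here walk the block list, so this sketch does not reach the constant-time bound the abstract reports.

```python
import itertools


class PositionalIndex:
    """Maps row positions to stable row keys (illustrative, hypothetical API)."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = [[]]  # each block holds stable keys in row order
        self._next_key = itertools.count(1)

    def _locate(self, pos):
        # Walk the blocks, subtracting their sizes, until `pos` falls inside one.
        for i, block in enumerate(self.blocks):
            if pos < len(block):
                return i, pos
            pos -= len(block)
        # Past the end: point just after the last key (used for appends).
        return len(self.blocks) - 1, len(self.blocks[-1])

    def insert(self, pos):
        """Insert a new row at position `pos` and return its stable key."""
        key = next(self._next_key)
        i, offset = self._locate(pos)
        block = self.blocks[i]
        block.insert(offset, key)
        if len(block) > self.block_size:
            # Split an overfull block in two; other blocks are untouched.
            mid = len(block) // 2
            self.blocks[i:i + 1] = [block[:mid], block[mid:]]
        return key

    def key_at(self, pos):
        """Return the stable key of the row currently at position `pos`."""
        i, offset = self._locate(pos)
        return self.blocks[i][offset]


index = PositionalIndex()
first = index.insert(0)
last = index.insert(1)
middle = index.insert(1)  # pushes `last` from position 1 to position 2
```

After the third insert, `last` sits at position 2 but keeps the key it was given, so any cell data stored under that key needs no update.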
Sparse Attention-Based Neural Networks for Code Classification
Categorizing source code accurately and efficiently is a challenging problem
in real-world programming education platform management. In recent years,
model-based approaches utilizing abstract syntax trees (ASTs) have been widely
applied to code classification tasks. We introduce an approach named the Sparse
Attention-based neural network for Code Classification (SACC) in this paper.
The approach involves two main steps: In the first step, source code undergoes
syntax parsing and preprocessing. The generated abstract syntax tree is split
into sequences of subtrees and then encoded using a recursive neural network to
obtain a high-dimensional representation. This step simultaneously considers
both the logical structure and lexical level information contained within the
code. In the second step, the encoded sequences of subtrees are fed into a
Transformer model that incorporates sparse attention mechanisms for the purpose
of classification. This method efficiently reduces the computational cost of
the self-attention mechanisms, thus improving the training speed while
preserving effectiveness. Our work introduces a sparse attention pattern
tailored to the needs of code classification tasks. This design helps reduce
the influence of redundant information and enhances the overall performance of
the model. Finally, we also address problems in previous related research,
including incomplete classification labels and small dataset sizes. We
annotated the CodeNet dataset, which contains a large amount of data, with
algorithm-related labeling categories. Extensive comparative experimental
results demonstrate the effectiveness and efficiency of SACC for code
classification tasks.
Comment: 2023 3rd International Conference on Digital Society and Intelligent
Systems (DSInS 2023)
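To make the cost argument in the SACC abstract concrete, here is a minimal sketch of one common sparse attention pattern: a sliding window plus optional global positions. The abstract does not specify SACC's actual pattern, so this is a generic stand-in, and the names `sliding_window_mask` and `density` are hypothetical. The mask marks which query-key pairs are scored; the fewer entries are set, the less work self-attention does.

```python
def sliding_window_mask(seq_len, window, global_tokens=()):
    """Boolean attention mask: mask[i][j] is True if position i may attend to j.

    Each position attends to neighbours within `window` steps. Positions listed
    in `global_tokens` attend to, and are attended by, every position.
    """
    mask = [[abs(i - j) <= window for j in range(seq_len)]
            for i in range(seq_len)]
    for g in global_tokens:
        for k in range(seq_len):
            mask[g][k] = True  # the global token sees every position
            mask[k][g] = True  # every position sees the global token
    return mask


def density(mask):
    """Fraction of query-key pairs that are actually computed."""
    n = len(mask)
    return sum(sum(row) for row in mask) / (n * n)


local_only = sliding_window_mask(seq_len=6, window=1)
with_global = sliding_window_mask(seq_len=6, window=1, global_tokens=(0,))
```

For a fixed window, the number of allowed pairs grows linearly with sequence length rather than quadratically, which is where the savings over dense attention come from.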
PERICLES Deliverable 4.3: Content Semantics and Use Context Analysis Techniques
The current deliverable summarises the work conducted within task T4.3 of WP4, focusing on the extraction and the subsequent analysis of semantic information from digital content, which is imperative for its preservability. More specifically, the deliverable defines content semantic information from a visual and textual perspective, explains how this information can be exploited in long-term digital preservation and proposes novel approaches for extracting this information in a scalable manner. Additionally, the deliverable discusses novel techniques for retrieving and analysing the context of use of digital objects. Although this topic has not been extensively studied in the existing literature, we believe use context is vital in augmenting the semantic information and maintaining the usability and preservability of the digital objects, as well as their ability to be accurately interpreted as initially intended.
Unlocking Fine-Grained Details with Wavelet-based High-Frequency Enhancement in Transformers
Medical image segmentation is a critical task that plays a vital role in
diagnosis, treatment planning, and disease monitoring. Accurate segmentation of
anatomical structures and abnormalities from medical images can aid in the
early detection and treatment of various diseases. In this paper, we address
the local feature deficiency of the Transformer model by carefully re-designing
the self-attention map to produce accurate dense prediction in medical images.
To this end, we first apply the wavelet transformation to decompose the input
feature map into low-frequency (LF) and high-frequency (HF) subbands. The LF
segment is associated with coarse-grained features while the HF components
preserve fine-grained features such as texture and edge information. Next, we
reformulate the self-attention operation using the efficient Transformer to
perform both spatial and context attention on top of the frequency
representation. Furthermore, to intensify the importance of the boundary
information, we impose an additional attention map by creating a Gaussian
pyramid on top of the HF components. Moreover, we propose a multi-scale context
enhancement block within skip connections to adaptively model inter-scale
dependencies to overcome the semantic gap among stages of the encoder and
decoder modules. Through comprehensive experiments, we demonstrate the
effectiveness of our strategy on multi-organ and skin lesion segmentation
benchmarks. The implementation code will be available upon acceptance.
GitHub: https://github.com/mindflow-institue/WaveFormer
Comment: Accepted in MICCAI 2023 workshop MLM
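The split into low-frequency and high-frequency subbands described in the abstract above can be sketched with a one-level 2D Haar transform. The abstract does not say which wavelet the authors use, so Haar is an assumption chosen for brevity, and the function name `haar_dwt2` is hypothetical. Subband naming conventions differ between libraries, so the three detail bands are named here by the direction of the difference they take.

```python
def haar_dwt2(x):
    """One-level 2D Haar transform of a list of rows with even height and width.

    Returns (low, (diff_across_columns, diff_across_rows, diff_diagonal)),
    each half the height and half the width of the input.
    """
    low, d_cols, d_rows, d_diag = [], [], [], []
    for i in range(0, len(x), 2):
        row_low, row_cols, row_rows, row_diag = [], [], [], []
        for j in range(0, len(x[0]), 2):
            a, b = x[i][j], x[i][j + 1]          # top-left, top-right
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom-left, bottom-right
            row_low.append((a + b + c + d) / 2)   # local average: coarse content
            row_cols.append((a - b + c - d) / 2)  # change from left to right
            row_rows.append((a + b - c - d) / 2)  # change from top to bottom
            row_diag.append((a - b - c + d) / 2)  # diagonal change
        low.append(row_low)
        d_cols.append(row_cols)
        d_rows.append(row_rows)
        d_diag.append(row_diag)
    return low, (d_cols, d_rows, d_diag)


# A 2x2 patch with a left-to-right edge: bright left column, dark right column.
low, (d_cols, d_rows, d_diag) = haar_dwt2([[1, 0], [1, 0]])
```

For this patch only the low band and the left-to-right detail band are non-zero, matching the abstract's point that edges and texture live in the high-frequency components.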
A Survey of Quantum Theory Inspired Approaches to Information Retrieval
Since 2004, researchers have been using the mathematical framework of Quantum Theory (QT) in Information Retrieval (IR). QT offers a generalized probability and logic framework. Such a framework has been shown capable of unifying the representation, ranking and user cognitive aspects of IR, and helpful in developing more dynamic, adaptive and context-aware IR systems. Although Quantum-inspired IR is still a growing area, a wide array of work on different aspects of IR has been done and has produced promising results. This paper presents a survey of the research done in this area, aiming to show the landscape of the field and draw a road-map of future directions.
UTILISING NETWORKED WORKSTATIONS TO ACCELERATE DATABASE QUERIES
The rapid growth in the size of databases and the advances made in query languages have increased the complexity of the SQL queries that users submit, which in turn slows down information retrieval from the database. The future of high-performance database systems lies in parallelism. Commercial vendors' database systems have introduced solutions, but these have proved to be extremely expensive. This paper investigates how networked resources such as workstations can be utilised, using Parallel Virtual Machine (PVM), to optimise database query execution. An investigation of, and experiments on, the scalability of PVM are conducted. PVM is used to implement parallelism in two separate ways:
(i) It removes the workload of deriving and maintaining rules for Semantic Query Optimisation (SQO) from the data server, thereby clearing the way for more widespread use of SQO in databases [16], [5].
(ii) It answers users' queries with a proposed Parallel Query Algorithm (PQA), which works over a network of workstations coupled with a sequential Database Management System (DBMS), PostgreSQL, on a prototype called Expandable Server Architecture (ESA) [11], [12], [21], [13].
Experiments have been conducted to tackle the problems of parallel and distributed systems, such as task scheduling, load balancing and fault tolerance.