gMark: Schema-Driven Generation of Graphs and Queries
Massive graph data sets are pervasive in contemporary application domains.
Hence, graph database systems are becoming increasingly important. In the
experimental study of these systems, it is vital that the research community
has shared solutions for the generation of database instances and query
workloads having predictable and controllable properties. In this paper, we
present the design and engineering principles of gMark, a domain- and query
language-independent graph instance and query workload generator. A core
contribution of gMark is its ability to target and control the diversity of
properties of both the generated instances and the generated workloads coupled
to these instances. Further novelties include support for regular path queries,
a fundamental graph query paradigm, and schema-driven selectivity estimation of
queries, a key feature in controlling workload chokepoints. We illustrate the
flexibility and practical usability of gMark by showcasing the framework's
capabilities in generating high quality graphs and workloads, and its ability
to encode user-defined schemas across a variety of application domains.
Comment: Accepted in November 2016. URL:
http://ieeexplore.ieee.org/document/7762945/. In IEEE Transactions on
Knowledge and Data Engineering 201
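The abstract names regular path queries as a fundamental graph query paradigm. As a rough illustration only, and not gMark's generator or any of its output formats, the sketch below evaluates one such query over a tiny edge-labeled graph by exploring the product of the graph and a hand-written automaton. The function name `eval_rpq`, the example graph, and the query `knows+ . worksAt` are all assumptions made for this example.

```python
from collections import deque


def eval_rpq(edges, transitions, start_state, accept_states, source):
    """Return nodes reachable from `source` along a path whose sequence of
    edge labels is accepted by the automaton given in `transitions`."""
    # Build an adjacency list: node -> [(label, target), ...]
    adjacency = {}
    for src, label, dst in edges:
        adjacency.setdefault(src, []).append((label, dst))

    # Breadth-first search over (graph node, automaton state) pairs.
    seen = {(source, start_state)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accept_states:
            answers.add(node)
        for label, target in adjacency.get(node, []):
            for next_state in transitions.get((state, label), ()):
                pair = (target, next_state)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return answers


# Hypothetical automaton for the path expression: knows+ . worksAt
TRANSITIONS = {
    (0, "knows"): {1},
    (1, "knows"): {1},
    (1, "worksAt"): {2},
}

# Hypothetical example graph as (source, label, target) triples.
EDGES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
    ("alice", "worksAt", "initech"),
    ("bob", "worksAt", "acme"),
]

result = eval_rpq(EDGES, TRANSITIONS, 0, {2}, "alice")
print(result)
```

Starting from `alice`, the query reaches `acme` but not `initech`, because the expression requires at least one `knows` edge before `worksAt`.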
The Linked Data Benchmark Council (LDBC): Driving competition and collaboration in the graph data management space
Graph data management is instrumental for several use cases such as
recommendation, root cause analysis, financial fraud detection, and enterprise
knowledge representation. Efficiently supporting these use cases yields a
number of unique requirements, including the need for a concise query language
and graph-aware query optimization techniques. The goal of the Linked Data
Benchmark Council (LDBC) is to design a set of standard benchmarks that capture
representative categories of graph data management problems, making the
performance of systems comparable and facilitating competition among vendors.
LDBC also conducts research on graph schemas and graph query languages. This
paper introduces the LDBC organization and its work over the last decade.
1st INCF Workshop on Sustainability of Neuroscience Databases
The goal of the workshop was to discuss issues related to the sustainability of neuroscience databases, identify problems and propose solutions, and formulate recommendations to the INCF. The report summarizes the discussions of invited participants from the neuroinformatics community as well as from other disciplines where sustainability issues have already been addressed. The recommendations for the INCF involve rating, ranking, and supporting database sustainability.
AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving
Unlike humans, who can effortlessly estimate the entirety of objects even
when partially occluded, modern computer vision algorithms still find this
aspect extremely challenging. The potential of amodal perception for autonomous
driving remains largely untapped due to the lack of suitable datasets. The
curation of these datasets is primarily hindered by significant annotation
costs and the difficulty of mitigating annotator subjectivity when labeling
occluded regions. To address these limitations, we introduce AmodalSynthDrive, a
synthetic multi-task multi-modal amodal perception dataset. The dataset
provides multi-view camera images, 3D bounding boxes, LiDAR data, and odometry
for 150 driving sequences with over 1M object annotations in diverse traffic,
weather, and lighting conditions. AmodalSynthDrive supports multiple amodal
scene understanding tasks including the introduced amodal depth estimation for
enhanced spatial understanding. We evaluate several baselines for each of these
tasks to illustrate the challenges and set up public benchmarking servers. The
dataset is available at http://amodalsynthdrive.cs.uni-freiburg.de
Multi-dimensional data refining strategy for effective fine-tuning LLMs
Data is a cornerstone for fine-tuning large language models, yet acquiring
suitable data remains challenging. Challenges include data scarcity,
linguistic diversity, and domain-specific content. This paper presents lessons
learned while crawling and refining data tailored for fine-tuning Vietnamese
language models. Crafting such a dataset, while accounting for linguistic
intricacies and striking a balance between inclusivity and accuracy, demands
meticulous planning. Our paper presents a multidimensional strategy that includes
leveraging existing datasets in the English language and developing customized
data-crawling scripts with the assistance of generative AI tools. An LLM
fine-tuned for the Vietnamese language on the resulting datasets
demonstrated good performance when generating Vietnamese news
articles from prompts. The study offers practical solutions and guidance for
fine-tuning future models in languages like Vietnamese.
The LDBC social network benchmark: Business intelligence workload
The Social Network Benchmark’s Business Intelligence workload (SNB BI) is a comprehensive graph OLAP benchmark targeting analytical data systems capable of supporting graph workloads. This paper marks the finalization of almost a decade of research in academia and industry via the Linked Data Benchmark Council (LDBC). SNB BI advances the state of the art in synthetic and scalable analytical database benchmarks in many aspects. Its base is a sophisticated data generator, implemented on a scalable distributed infrastructure, that produces a social graph with small-world phenomena, whose value properties follow skewed and correlated distributions and where values correlate with structure. This is a temporal graph in which all nodes and edges follow lifespan-based rules with temporal skew, enabling realistic and consistent temporal inserts and (recursive) deletes. The query workload exploiting this skew and correlation is based on LDBC’s “choke point”-driven design methodology and is intended to encourage technical and scientific improvements in future (graph) database systems. SNB BI includes the first adoption of “parameter curation” in an analytical benchmark, a technique that ensures stable runtimes of query variants across different parameter values. Two performance metrics characterize peak single-query performance (power) and sustained concurrent query throughput. To demonstrate the portability of the benchmark, we present experimental results on a relational and a graph DBMS. Note that these do not constitute an official LDBC Benchmark Result; only audited results can use this trademarked term.
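The abstract describes parameter curation only by its goal: stable runtimes across parameter values. One plausible way to approach that goal, shown here as a minimal sketch and not as LDBC's actual procedure, is to estimate a cost for each candidate parameter and keep the group of candidates whose estimated costs lie closest together. The function name `curate_parameters`, the cost estimates, and the selection rule are all assumptions made for illustration.

```python
def curate_parameters(estimated_costs, k):
    """Pick k parameter values whose estimated costs span the narrowest range.

    estimated_costs: dict mapping a candidate parameter to a cost estimate
    (for example, an intermediate-result count). Hypothetical input format.
    """
    ranked = sorted(estimated_costs.items(), key=lambda item: item[1])
    if k <= 0 or k > len(ranked):
        raise ValueError("k must be between 1 and the number of candidates")

    # Slide a window of size k over the cost-sorted candidates and keep
    # the window with the smallest difference between max and min cost.
    best_start = min(
        range(len(ranked) - k + 1),
        key=lambda i: ranked[i + k - 1][1] - ranked[i][1],
    )
    return [name for name, _ in ranked[best_start:best_start + k]]


# Hypothetical cost estimates for six candidate parameter values.
costs = {"p1": 3, "p2": 950, "p3": 41, "p4": 45, "p5": 39, "p6": 400}
selected = curate_parameters(costs, 3)
print(selected)
```

With these made-up estimates the sketch selects `p5`, `p3`, and `p4`, whose costs differ by at most 6, and discards the outliers that would make query runtimes vary widely.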