1,042 research outputs found

    Parameter curation and data generation for benchmarking multimodel queries

    Get PDF
    Peer reviewe

    gMark: Schema-Driven Generation of Graphs and Queries

    Full text link
    Massive graph data sets are pervasive in contemporary application domains. Hence, graph database systems are becoming increasingly important. In the experimental study of these systems, it is vital that the research community has shared solutions for the generation of database instances and query workloads having predictable and controllable properties. In this paper, we present the design and engineering principles of gMark, a domain- and query language-independent graph instance and query workload generator. A core contribution of gMark is its ability to target and control the diversity of properties of both the generated instances and the generated workloads coupled to these instances. Further novelties include support for regular path queries, a fundamental graph query paradigm, and schema-driven selectivity estimation of queries, a key feature in controlling workload chokepoints. We illustrate the flexibility and practical usability of gMark by showcasing the framework's capabilities in generating high quality graphs and workloads, and its ability to encode user-defined schemas across a variety of application domains.Comment: Accepted in November 2016. URL: http://ieeexplore.ieee.org/document/7762945/. in IEEE Transactions on Knowledge and Data Engineering 201

    The Linked Data Benchmark Council (LDBC): Driving competition and collaboration in the graph data management space

    Full text link
    Graph data management is instrumental for several use cases such as recommendation, root cause analysis, financial fraud detection, and enterprise knowledge representation. Efficiently supporting these use cases yields a number of unique requirements, including the need for a concise query language and graph-aware query optimization techniques. The goal of the Linked Data Benchmark Council (LDBC) is to design a set of standard benchmarks that capture representative categories of graph data management problems, making the performance of systems comparable and facilitating competition among vendors. LDBC also conducts research on graph schemas and graph query languages. This paper introduces the LDBC organization and its work over the last decade

    1st INCF Workshop on Sustainability of Neuroscience Databases

    Get PDF
    The goal of the workshop was to discuss issues related to the sustainability of neuroscience databases, identify problems and propose solutions, and formulate recommendations to the INCF. The report summarizes the discussions of invited participants from the neuroinformatics community as well as from other disciplines where sustainability issues have already been approached. The recommendations for the INCF involve rating, ranking, and supporting database sustainability

    AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving

    Full text link
    Unlike humans, who can effortlessly estimate the entirety of objects even when partially occluded, modern computer vision algorithms still find this aspect extremely challenging. Leveraging this amodal perception for autonomous driving remains largely untapped due to the lack of suitable datasets. The curation of these datasets is primarily hindered by significant annotation costs and mitigating annotator subjectivity in accurately labeling occluded regions. To address these limitations, we introduce AmodalSynthDrive, a synthetic multi-task multi-modal amodal perception dataset. The dataset provides multi-view camera images, 3D bounding boxes, LiDAR data, and odometry for 150 driving sequences with over 1M object annotations in diverse traffic, weather, and lighting conditions. AmodalSynthDrive supports multiple amodal scene understanding tasks including the introduced amodal depth estimation for enhanced spatial understanding. We evaluate several baselines for each of these tasks to illustrate the challenges and set up public benchmarking servers. The dataset is available at http://amodalsynthdrive.cs.uni-freiburg.de

    Multi-dimensional data refining strategy for effective fine-tuning LLMs

    Full text link
    Data is a cornerstone for fine-tuning large language models, yet acquiring suitable data remains challenging. Challenges encompassed data scarcity, linguistic diversity, and domain-specific content. This paper presents lessons learned while crawling and refining data tailored for fine-tuning Vietnamese language models. Crafting such a dataset, while accounting for linguistic intricacies and striking a balance between inclusivity and accuracy, demands meticulous planning. Our paper presents a multidimensional strategy including leveraging existing datasets in the English language and developing customized data-crawling scripts with the assistance of generative AI tools. A fine-tuned LLM model for the Vietnamese language, which was produced using resultant datasets, demonstrated good performance while generating Vietnamese news articles from prompts. The study offers practical solutions and guidance for future fine-tuning models in languages like Vietnamese

    The LDBC social network benchmark: Business intelligence workload

    Get PDF
    The Social Network Benchmark’s Business Intelligence workload (SNB BI) is a comprehensive graph OLAP benchmark targeting analytical data systems capable of supporting graph workloads. This paper marks the finalization of almost a decade of research in academia and industry via the Linked Data Benchmark Council (LDBC). SNB BI advances the state-of-the art in synthetic and scalable analytical database benchmarks in many aspects. Its base is a sophisticated data generator, implemented on a scalable distributed infrastructure, that produces a social graph with small-world phenomena, whose value properties follow skewed and correlated distributions and where values correlate with structure. This is a temporal graph where all nodes and edges follow lifespan-based rules with temporal skew enabling realistic and consistent temporal inserts and (recursive) deletes. The query workload exploiting this skew and correlation is based on LDBC’s “choke point”-driven design methodology and will entice technical and scientific improvements in future (graph) database systems. SNB BI includes the first adoption of “parameter curation” in an analytical benchmark, a technique that ensures stable runtimes of query variants across different parameter values. Two performance metrics characterize peak single-query performance (power) and sustained concurrent query throughput. To demonstrate the portability of the benchmark, we present experimental results on a relational and a graph DBMS. Note that these do not constitute an official LDBC Benchmark Result – only audited results can use this trademarked term
    • …
    corecore