
    Development of a large-scale neuroimages and clinical variables data atlas in the neuGRID4You (N4U) project

    © 2015 Elsevier Inc. Exceptional growth in the availability of large-scale clinical imaging datasets has led to the development of computational infrastructures that offer scientists access to image repositories and associated clinical variables data. The EU FP7 neuGRID project and its follow-on, neuGRID4You (N4U), provide a leading e-Infrastructure where neuroscientists can find core services and resources for brain image analysis. The core component of this e-Infrastructure is the N4U Virtual Laboratory, which offers neuroscientists easy access to a wide range of datasets, algorithms, pipelines, computational resources, and associated support services. The foundation of this virtual laboratory is a massive data store plus a set of Information Services, collectively called the 'Data Atlas'. This data atlas stores datasets, clinical study data, data dictionaries, and algorithm/pipeline definitions, and provides interfaces for parameterised querying so that neuroscientists can perform analyses on the datasets they require. This paper presents the overall design and development of the Data Atlas and its associated dataset indexing and retrieval services, which originated from the development of the N4U Virtual Laboratory in the EU FP7 N4U project in the light of detailed user requirements.
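    The 'parameterised querying' interface mentioned above can be illustrated with a minimal sketch. All names below (the AtlasIndex class, its schema and fields) are hypothetical stand-ins, since the abstract does not specify the actual N4U interfaces:

        # Minimal sketch of parameterised dataset querying against a data-atlas index.
        # AtlasIndex, its schema, and its fields are hypothetical; the abstract does
        # not describe the real N4U Data Atlas interfaces.
        import sqlite3

        class AtlasIndex:
            """A toy index over dataset metadata, standing in for the Data Atlas."""

            def __init__(self, path=":memory:"):
                self.db = sqlite3.connect(path)
                self.db.execute(
                    "CREATE TABLE IF NOT EXISTS scans "
                    "(subject_id TEXT, modality TEXT, age INTEGER, diagnosis TEXT, uri TEXT)"
                )

            def query(self, modality, min_age, max_age, diagnosis):
                # Parameterised query: user values are bound, never string-interpolated.
                cur = self.db.execute(
                    "SELECT subject_id, uri FROM scans "
                    "WHERE modality = ? AND age BETWEEN ? AND ? AND diagnosis = ?",
                    (modality, min_age, max_age, diagnosis),
                )
                return cur.fetchall()

        index = AtlasIndex()
        index.db.execute("INSERT INTO scans VALUES ('s001', 'MRI', 72, 'AD', '/atlas/s001_t1.nii')")
        print(index.query("MRI", 65, 80, "AD"))  # scans matching the study criteria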

    Agriculture series in the database of the Statistical Office of the Republic of Serbia

    The objectives of this paper are to examine which data on agriculture can be found in the Statistical Office of the Republic of Serbia Database, and what the possibilities are for using the Database in research on and analysis of agriculture. Physically, the Database is a normalized database implemented in the SQL Server DBMS. The methodological approach of the paper primarily concerns the modelling and the manner of use of the Database. The options for accessing, filtering, and downloading data from the Database are explained, its technical characteristics are described, its indicators of agriculture are listed, and the possibilities of using it are analysed. We examined whether these possibilities could be improved through better ways of storing and accessing the data. It was concluded that improvements are possible: first, by enriching the Database with agricultural data that are currently available only in the printed publications of the Office, and then, through methodological and technical improvements, by redesigning the Database along the lines of cloud-based databases. In addition, applying results from the new multidisciplinary scientific field of Visual Analytics would improve visualization, interactive data analysis, and data management.
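    The access / filter / download workflow the paper describes can be sketched generically. The table name, column names, and file names below are hypothetical stand-ins, since the abstract does not expose the actual schema of the Office's SQL Server database:

        # Sketch of the access / filter / download pattern: select one indicator
        # for one region and a range of years, then export the series to CSV.
        # Table and column names are hypothetical stand-ins for the real schema.
        import csv
        import sqlite3

        conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server source
        conn.execute(
            "CREATE TABLE agri_series (indicator TEXT, region TEXT, year INTEGER, value REAL)"
        )
        conn.execute("INSERT INTO agri_series VALUES ('wheat_yield_t_ha', 'Vojvodina', 2014, 4.1)")

        # Filter: one indicator, one region, a bounded range of years.
        rows = conn.execute(
            "SELECT year, value FROM agri_series "
            "WHERE indicator = ? AND region = ? AND year BETWEEN ? AND ? ORDER BY year",
            ("wheat_yield_t_ha", "Vojvodina", 2010, 2015),
        ).fetchall()

        # Download: export the filtered series for analysis outside the database.
        with open("wheat_yield_vojvodina.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["year", "value"])
            writer.writerows(rows)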

    Just-In-Time Data Virtualization: Lightweight Data Management with ViDa

    As the size and heterogeneity of data increase, traditional database system architecture becomes an obstacle to data analysis. Integrating and ingesting (loading) data into databases is quickly becoming a bottleneck in the face of massive data volumes and increasingly heterogeneous data formats. Still, state-of-the-art approaches typically rely on copying and transforming data into one (or a few) repositories. Queries, on the other hand, are often ad hoc and supported by pre-cooked operators that are not adaptive enough to optimize access to data. As data formats and queries increasingly vary, there is a need to depart from the current status quo of static query processing primitives and build dynamic, fully adaptive architectures. We build ViDa, a system that reads data in its raw format and processes queries using adaptive, just-in-time operators. Our key insight is the use of virtualization, i.e., abstracting data and manipulating it regardless of its original format, and the dynamic generation of operators. ViDa's query engine is generated just in time; its caches and query operators adapt to the current query and workload, while treating raw datasets as its native storage structures. Finally, ViDa features a language expressive enough to support heterogeneous data models, and one to which existing languages can be translated. Users therefore have the power to choose the language best suited for an analysis.
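    The core idea of just-in-time operators over raw data can be shown in a few lines: the engine compiles a predicate for the current query and streams the raw file through it, with no loading step. The sketch below is an illustration of the concept only, assuming a toy CSV input; it is not ViDa's actual code-generation machinery:

        # Illustrative just-in-time filter operator over a raw CSV file: the
        # predicate is compiled per query and the raw file is the storage.
        # This mimics the concept only, not ViDa's actual code generation.
        import csv

        def compile_filter(predicate_src):
            """JIT-compile a row predicate, e.g. "float(row['temp']) > 30"."""
            code = compile(predicate_src, "<jit-operator>", "eval")
            return lambda row: eval(code, {"float": float}, {"row": row})

        def scan_raw(path, predicate):
            """Scan the raw file in place -- no loading step -- yielding matches."""
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    if predicate(row):
                        yield row

        # The operator below exists only for this query, then is discarded.
        hot = list(scan_raw("readings.csv", compile_filter("float(row['temp']) > 30")))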

    Toward timely, predictable and cost-effective data analytics

    Modern industrial, government, and academic organizations are collecting massive amounts of data at an unprecedented scale and pace. The ability to perform timely, predictable and cost-effective analytical processing of such large data sets in order to extract deep insights is now a key ingredient for success. Traditional database management systems (DBMS) are, however, not the first choice for servicing these modern applications, despite 40 years of database research. This is because modern applications exhibit behavior different from that assumed by DBMS: a) timely data exploration as a new trend is characterized by ad-hoc queries and short user interaction periods, leaving little time for the DBMS to do good performance tuning; b) accurate statistics representing relevant summary information about the distributions of ever-increasing data are frequently missing, resulting in suboptimal plan decisions and consequently poor and unpredictable query execution performance; and c) cloud service providers - a major winner in the data analytics game due to the low cost of (shared) storage - have shifted control over data storage from the DBMS to the cloud providers, making it harder for the DBMS to optimize data access.

    This thesis demonstrates that database systems can still provide timely, predictable and cost-effective analytical processing if they use an agile and adaptive approach. In particular, DBMS need to adapt at three levels (to workload, data and hardware characteristics) in order to stabilize and optimize performance and cost when faced with the requirements posed by modern data analytics applications. Workload-driven data ingestion, introduced with NoDB, enables efficient data exploration and reduces the data-to-insight time (i.e., the time to load the data and tune the system) by performing these steps lazily and incrementally as a side-effect of posed queries rather than as mandatory first steps. Data-driven runtime access path decision making, introduced with Smooth Scan, alleviates suboptimal query execution by postponing the decision on access paths from query optimization, where statistics are heavily exploited, to query execution, where the system can obtain more details about data distributions. Smooth Scan morphs an access path from one physical alternative to another to fit the observed data distributions, which removes the need for a priori access path decisions and substantially improves the predictability of the DBMS. Hardware-driven query execution, introduced with Skipper, enables the use of cold storage devices (CSD) as a cost-effective solution for storing ever-increasing customer data. Skipper uses an out-of-order CSD-driven query execution model based on multi-way joins, coupled with efficient cache and I/O scheduling policies, to hide the non-uniform access latencies of CSD.

    This thesis advocates runtime adaptivity as the key to dealing with the rising uncertainty about workload characteristics that modern data analytics applications exhibit. Overall, the techniques introduced in this thesis through the three levels of adaptivity (workload-, data- and hardware-driven) increase the usability of database systems and user satisfaction in big data exploration, making low-cost data analytics a reality.
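    Of the three techniques, Smooth Scan's access-path morphing lends itself to a small sketch. The code below is a schematic under toy assumptions (an in-memory table, a fixed switch-over threshold, and a one-shot switch instead of the operator's gradual morphing); it is not the thesis's actual implementation:

        # Schematic of runtime access-path morphing in the spirit of Smooth Scan.
        # Toy assumptions: in-memory table, fixed threshold, one-shot switch
        # (the real operator morphs gradually inside a DBMS executor).
        import bisect

        def smooth_scan(sorted_keys, rows_by_key, lo, hi, morph_at=0.2):
            """Range query [lo, hi]: start as an index scan, switch to a full
            sequential scan once observed selectivity exceeds morph_at."""
            results = []
            for i in range(bisect.bisect_left(sorted_keys, lo), len(sorted_keys)):
                key = sorted_keys[i]
                if key > hi:
                    return results  # low selectivity: the index scan was cheap
                results.append(rows_by_key[key])
                if len(results) / len(sorted_keys) > morph_at:
                    # High selectivity observed mid-flight: random index probes
                    # would now cost more than one sequential pass over the data.
                    return [rows_by_key[k] for k in sorted_keys if lo <= k <= hi]
            return results

        keys = list(range(100))
        rows = {k: ("row", k) for k in keys}
        print(len(smooth_scan(keys, rows, 10, 90)))  # 81: morphed to a full scan

    The key point the sketch preserves is that no a priori access-path choice is needed: the decision is revised during execution, as the actual data distribution is observed.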

    Challenges and Opportunities in Self-Managing Scientific Databases

    Advances in observation instruments and an abundance of computational power for simulations encourage scientists to gather and produce unprecedented amounts of increasingly complex data. Organizing data automatically to enable efficient and unobstructed access is pivotal for the scientists. Organizing these vast amounts of complex data, however, is particularly difficult for scientists who have little experience in data management; hence they spend considerable amounts of time dealing with data analysis and computing problems rather than answering scientific questions or developing new hypotheses. Scientific experiments are therefore in many ways ideal targets for research in self-managing database systems. In this paper, we describe challenges and opportunities for research in automating scientific data management. We first discuss the problems faced in particular scientific domains, using concrete examples of large-scale applications from neuroscience and high-energy physics. As we show, scientific questions are evolving ever more rapidly while dataset size and complexity increase. Scientists struggle to organize and reorganize the data whenever their hypotheses change, and therefore their queries and data change as well. We identify research challenges in large-scale scientific data management related to self-management. By addressing these research challenges we can take the burden of organizing the data off the scientists, ensuring that they can access it in the most efficient way and ultimately enabling them to focus on their science.