A Call to Arms: Revisiting Database Design
Good database design is crucial for obtaining a sound, consistent database, and -
in turn - good database design methodologies are the best way to achieve the
right design. These methodologies are taught to most Computer Science
undergraduates as part of any introductory database class. They can be
considered part of the "canon", and indeed the overall approach to database
design has remained unchanged for years. Moreover, none of the major database
research assessments identifies database design as a strategic research
direction.
Should we conclude that database design is a solved problem?
Our thesis is that database design remains a critical unsolved problem.
Hence, it should be the subject of more research. Our starting point is the
observation that traditional database design is not used in practice, and that,
even if it were, it would result in designs poorly adapted to current
environments. In short, database design has failed to keep up with the times.
In this paper, we put forth arguments to support our viewpoint, analyze the
root causes of this situation, and suggest some avenues of research.
Research on conceptual modeling: Themes, topics, and introduction to the special issue
Conceptual modeling continues to evolve as researchers and practitioners reflect on the challenges of modeling and implementing data-intensive problems that appear in business and in science. These challenges of data modeling and representation are well recognized in contemporary applications of big data, ontologies, and semantics, along with traditional efforts associated with methodologies, tools, and theory development. This introduction contains a review of some current research in conceptual modeling and identifies emerging themes. It also introduces the articles that comprise this special issue of papers from the 32nd International Conference on Conceptual Modeling (ER 2013). This article was supported, in part, by the J. Mack Robinson College of Business at Georgia State University, the Marriott School of Management at Brigham Young University (EB-201313), and by the GEODAS-BI (TIN2012-37493-C03-03) project from the Spanish Ministry of Economy and Competitiveness.
Community detection applied on big linked data
The Linked Open Data (LOD) Cloud has more than tripled its number of sources in just six years (from 295 sources in 2011 to 1,163 datasets in 2017), and the Web of Data now contains more than 150 billion triples. We are witnessing a staggering growth in the production and consumption of LOD and the generation of increasingly large datasets. In this scenario, providing researchers, domain experts, business people, and citizens with visual representations and intuitive interactions can significantly aid the exploration and understanding of the domains and knowledge represented by Linked Data. Various tools and web applications have been developed to enable the navigation and browsing of the Web of Data. However, these tools fall short at producing high-level representations of large datasets and at supporting users in the exploration and querying of these big sources. Following this trend, we devised a new method and a tool called H-BOLD (High level visualizations on Big Open Linked Data). H-BOLD enables the exploratory search and multilevel analysis of Linked Open Data, offering different levels of abstraction over Big Linked Data. Through user interaction and the dynamic adaptation of the graph representing the dataset, users can effectively explore a dataset, starting from a small set of classes and progressively adding new ones. The performance and portability of H-BOLD have been evaluated on the SPARQL endpoints listed on SPARQL ENDPOINT STATUS. The effectiveness of H-BOLD as a visualization tool is demonstrated through a user study.
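For readers unfamiliar with how such high-level class abstractions are obtained, the sketch below shows one way to compute instance counts per class from a public SPARQL endpoint. It is a minimal illustration of the underlying idea only, not code from H-BOLD; the endpoint URL and the result limit are arbitrary choices.

```python
# Sketch: build the kind of class-level summary a tool like H-BOLD
# visualizes, by asking a SPARQL endpoint how many instances each
# class has. Uses the SPARQLWrapper library; the endpoint and LIMIT
# are illustrative, not taken from the paper.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://dbpedia.org/sparql"  # any public endpoint works

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery("""
    SELECT ?class (COUNT(?s) AS ?instances)
    WHERE { ?s a ?class }
    GROUP BY ?class
    ORDER BY DESC(?instances)
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["class"]["value"], row["instances"]["value"])
```

A summary like this, a handful of classes with their cardinalities, is exactly the kind of starting point from which an exploratory tool can let users expand the graph class by class.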
Data Abstraction for Visualizing Large Time Series
Numeric time series are a class of data consisting of chronologically ordered observations represented by numeric values. Much of the data in domains such as finance, medicine, and science is represented in the form of time series. To cope with the increasing sizes of datasets, numerous approaches for abstracting large temporal data have been developed in the area of data mining, and many of them have proved useful for time series visualization. However, despite the existence of numerous surveys on time series mining and visualization, there is no comprehensive classification of the existing methods based on the needs of visualization designers. We propose a classification framework that defines essential criteria for selecting an abstraction method with an eye to subsequent visualization and support of users' analysis tasks. We show that approaches developed in the data mining field are capable of creating representations that are useful for visualizing time series data. We evaluate these methods in terms of the defined criteria and provide a summary table that can be easily used to select suitable abstraction methods depending on data properties, the desirable form of representation, the behaviour features to be studied, the required accuracy and level of detail, and the necessity of efficient search and querying. We also indicate directions for possible extension of the proposed classification framework.
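As a concrete illustration of the kind of abstraction method such a framework classifies, the sketch below implements Piecewise Aggregate Approximation (PAA), a classic data-mining reduction that replaces each fixed-length window of a series with its mean. PAA is one well-known example chosen here for brevity; the abstract itself does not single out any particular method.

```python
import numpy as np

def paa(series: np.ndarray, n_segments: int) -> np.ndarray:
    """Piecewise Aggregate Approximation: reduce a numeric time
    series to n_segments values, each the mean of one segment."""
    # np.array_split copes with lengths not divisible by n_segments
    return np.array([seg.mean() for seg in np.array_split(series, n_segments)])

# Example: abstract 10,000 noisy points down to 50 values for plotting
rng = np.random.default_rng(0)
raw = np.sin(np.linspace(0, 20, 10_000)) + 0.1 * rng.standard_normal(10_000)
abstracted = paa(raw, 50)
print(abstracted.shape)  # (50,)
```

Reductions of this sort trade fine-grained detail for a representation small enough to draw, which is precisely the trade-off the proposed classification criteria (accuracy, level of detail, form of representation) are meant to capture.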
Flexibility in Data Management
With the ongoing expansion of information technology, new fields of application requiring data management emerge virtually every day. In our knowledge culture, increasing amounts of data and a workforce organized in more creativity-oriented ways also radically change traditional fields of application and call established assumptions about data management into question. For instance, investigative analytics and agile software development move towards a very agile and flexible handling of data. As the primary facilitators of data management, database systems have to reflect and support these developments. However, traditional database management technology, in particular relational database systems, is built on the assumption of relatively stable application domains. The need to model all data up front in a prescriptive database schema has earned relational database management systems a reputation among developers for being inflexible, dated, and cumbersome to work with. Nevertheless, relational systems still dominate the database market: they are a proven, standardized, and interoperable technology, well known in IT departments with a workforce of experienced and trained developers and administrators.
This thesis aims at resolving the growing contradiction between the popularity and omnipresence of relational systems in companies and their increasingly bad reputation among developers. It adapts relational database technology towards more agility and flexibility. We envision a descriptive, schema-comes-second relational database system, which is entity-oriented instead of schema-oriented; descriptive rather than prescriptive. The thesis provides four main contributions: (1) a flexible relational data model, which frees relational data management from having a prescriptive schema; (2) autonomous physical entity domains, which partition self-descriptive data according to their schema properties for better query performance; (3) a freely adjustable storage engine, which allows adapting the physical data layout to the properties of the data and of the workload; and (4) a self-managed indexing infrastructure, which autonomously collects and adapts index information in the presence of dynamic workloads and evolving schemas. The flexible relational data model is the thesis's central contribution: it describes the functional appearance of the descriptive schema-comes-second relational database system. The other three contributions improve components in the architecture of database management systems to increase the query performance and manageability of descriptive schema-comes-second relational database systems. We are confident that these four contributions can help pave the way to a more flexible future for relational database management technology.
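The following sketch illustrates the general schema-comes-second idea in miniature, not the thesis's actual engine or data model: entities are stored self-descriptively as attribute/value pairs, and a descriptive schema is derived from the data afterwards rather than prescribed up front. All table and column names here are hypothetical.

```python
import sqlite3

# Hypothetical illustration: entities stored as self-descriptive
# attribute/value rows; the schema is *described* from the data
# afterwards instead of being *prescribed* up front.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entity (id INTEGER, attribute TEXT, value TEXT)")

db.executemany(
    "INSERT INTO entity VALUES (?, ?, ?)",
    [
        (1, "name", "Alice"), (1, "team", "storage"),
        (2, "name", "Bob"),   (2, "team", "query"), (2, "office", "B12"),
    ],
)

# Derive a descriptive schema: which attributes exist, and how common
# each one is across entities (a "schema comes second" view).
for attr, count in db.execute(
    "SELECT attribute, COUNT(DISTINCT id) FROM entity "
    "GROUP BY attribute ORDER BY 2 DESC"
):
    print(f"{attr}: present in {count} entities")
```

A naive attribute/value layout like this pays a heavy price at query time, which is exactly why the thesis pairs the flexible data model with contributions on physical entity domains, adjustable storage layout, and self-managed indexing.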
The nature of the course team approach at the UK Open University
This thesis explores the nature of the course team approach at the UK Open University (UKOU) by investigating three issues: the formation of course teams, the process of working together in teams, and the development of courses by teams. Adopting the naturalist paradigm, data were collected from three course teams of the UKOU using observations, interviews, and documents. Altogether, 42 hours of observations were carried out over six months by observing 14 course team meetings. There were 28 hours of interview data from 21 interviews of 17 interviewees, and a range of documents was collected. The study found that the formation of course teams is regulated by course approval protocol and derives from the efforts of individual members. The responsibility of core academic course team members is vaguely demarcated. Academics' personal attributes are key to team organisation, and previous experience of working together influences the members' current work in teams. Regarding the process of working together as a team in meetings, the study shows that the agendas of course team meetings often include practical issues; the meetings are flooded with practical concerns, with pedagogical concerns remaining in the background. The development of courses by course teams, as this study shows, is framed by the system for course construction established by the University. An awareness of the changing external environment contributes to the development of courses, and there are differing views on the academic autonomy of academic course team members. Theorising the major findings leads to the conclusion that both course teams and their work are contextualised, because they interact with systemic, interpersonal, personal, and historical contexts. Therefore, the suggestions for the successful adoption of the course team approach emphasise academics' attributes, teamwork, and the system for course construction set up by the institution.
A Family of Joint Sparse PCA Algorithms for Anomaly Localization in Network Data Streams
Determining anomalies in data streams that are collected and transformed from various types of networks has recently attracted significant research interest. Principal Component Analysis (PCA) is arguably the most widely applied unsupervised anomaly detection technique for networked data streams, owing to its simplicity and efficiency. However, none of the existing PCA-based approaches addresses the problem of identifying the sources that contribute most to an observed anomaly, i.e., anomaly localization. In this paper, we first propose a novel joint sparse PCA method to perform anomaly detection and localization for network data streams. Our key observation is that we can detect anomalies and localize anomalous sources by identifying a low-dimensional abnormal subspace that captures the abnormal behavior of the data. To better capture the sources of anomalies, we incorporate the structure of the network stream data into our anomaly localization framework. In addition, an extended version of PCA, multidimensional KLE, is introduced to stabilize the localization performance. We performed comprehensive experimental studies on four real-world data sets from different application domains and compared our proposed techniques with several state-of-the-art methods. These studies demonstrate the utility of the proposed methods.
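As background for the baseline this abstract builds on, the sketch below shows the classic PCA residual-subspace detector, not the paper's joint sparse PCA: each observation is projected onto the subspace orthogonal to the top principal components, and a large residual norm signals an anomaly. The dimensions, the choice of k, and the injected anomaly are illustrative.

```python
import numpy as np

def pca_residual_scores(X: np.ndarray, k: int) -> np.ndarray:
    """Classic PCA anomaly score: squared norm of each row's
    projection onto the residual (abnormal) subspace, i.e. the
    orthogonal complement of the top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T @ Vt[:k]        # projector onto the normal subspace
    residual = Xc - Xc @ P       # component left in the abnormal subspace
    return (residual ** 2).sum(axis=1)

# Example: 500 correlated observations plus one injected anomaly
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 20)) @ rng.standard_normal((20, 20))
X[250] += 15.0                   # anomalous sample
scores = pca_residual_scores(X, k=3)
print(scores.argmax())           # likely 250
```

Note what this baseline cannot do: the score is a single number per observation, so it flags *when* something is anomalous but not *which* of the 20 sources caused it. That gap between detection and localization is the problem the proposed joint sparse PCA addresses.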