188 research outputs found
Study and Performance Analysis of Different Techniques for Computing Data Cubes
Data is an integrated form of observable and recordable facts in operational or transactional systems in the data warehouse. Usually, data warehouse stores aggregated and historical data in multi-dimensional schemas. Data only have value to end-users when it is formulated and represented as information. And Information is a composed collection of facts for decision making. Cube computation is the most efficient way for answering this decision making queries and retrieve information from data. Online Analytical Process (OLAP) used in this purpose of the cube computation. There are two types of OLAP: Relational Online Analytical Processing (ROLAP) and Multidimensional Online Analytical Processing (MOLAP). This research worked on ROLAP and MOLAP and then compare both methods to find out the computation times by the data volume. Generally, a large data warehouse produces an extensive output, and it takes a larger space with a huge amount of empty data cells. To solve this problem, data compression is inevitable. Therefore, Compressed Row Storage (CRS) is applied to reduce empty cell overhead
Spatial Data Quality in the IoT Era:Management and Exploitation
Within the rapidly expanding Internet of Things (IoT), growing amounts of spatially referenced data are being generated. Due to the dynamic, decentralized, and heterogeneous nature of the IoT, spatial IoT data (SID) quality has attracted considerable attention in academia and industry. How to invent and use technologies for managing spatial data quality and exploiting low-quality spatial data are key challenges in the IoT. In this tutorial, we highlight the SID consumption requirements in applications and offer an overview of spatial data quality in the IoT setting. In addition, we review pertinent technologies for quality management and low-quality data exploitation, and we identify trends and future directions for quality-aware SID management and utilization. The tutorial aims to not only help researchers and practitioners to better comprehend SID quality challenges and solutions, but also offer insights that may enable innovative research and applications
When and where do you want to hide? Recommendation of location privacy preferences with local differential privacy
In recent years, it has become easy to obtain location information quite
precisely. However, the acquisition of such information has risks such as
individual identification and leakage of sensitive information, so it is
necessary to protect the privacy of location information. For this purpose,
people should know their location privacy preferences, that is, whether or not
he/she can release location information at each place and time. However, it is
not easy for each user to make such decisions and it is troublesome to set the
privacy preference at each time. Therefore, we propose a method to recommend
location privacy preferences for decision making. Comparing to existing method,
our method can improve the accuracy of recommendation by using matrix
factorization and preserve privacy strictly by local differential privacy,
whereas the existing method does not achieve formal privacy guarantee. In
addition, we found the best granularity of a location privacy preference, that
is, how to express the information in location privacy protection. To evaluate
and verify the utility of our method, we have integrated two existing datasets
to create a rich information in term of user number. From the results of the
evaluation using this dataset, we confirmed that our method can predict
location privacy preferences accurately and that it provides a suitable method
to define the location privacy preference
A Survey of Imbalanced Learning on Graphs: Problems, Techniques, and Future Directions
Graphs represent interconnected structures prevalent in a myriad of
real-world scenarios. Effective graph analytics, such as graph learning
methods, enables users to gain profound insights from graph data, underpinning
various tasks including node classification and link prediction. However, these
methods often suffer from data imbalance, a common issue in graph data where
certain segments possess abundant data while others are scarce, thereby leading
to biased learning outcomes. This necessitates the emerging field of imbalanced
learning on graphs, which aims to correct these data distribution skews for
more accurate and representative learning outcomes. In this survey, we embark
on a comprehensive review of the literature on imbalanced learning on graphs.
We begin by providing a definitive understanding of the concept and related
terminologies, establishing a strong foundational understanding for readers.
Following this, we propose two comprehensive taxonomies: (1) the problem
taxonomy, which describes the forms of imbalance we consider, the associated
tasks, and potential solutions; (2) the technique taxonomy, which details key
strategies for addressing these imbalances, and aids readers in their method
selection process. Finally, we suggest prospective future directions for both
problems and techniques within the sphere of imbalanced learning on graphs,
fostering further innovation in this critical area.Comment: The collection of awesome literature on imbalanced learning on
graphs: https://github.com/Xtra-Computing/Awesome-Literature-ILoG
Detection of Communities within the Multibody System Dynamics Network and Analysis of Their Relations
Multibody system dynamics is already a well developed branch of theoretical, computational and applied mechanics. Thousands of documents can be found in any of the well-known scientific databases. In this work it is demonstrated that multibody system dynamics is built of many thematic communities. Using the Elsevier’s abstract and citation database SCOPUS, a massive amount of data is collected and analyzed with the use of the open source visualization tool Gephi. The information is represented as a large set of nodes with connections to study their graphical distribution and explore geometry and symmetries. A randomized radial symmetry is found in the graphical representation of the collected information. Furthermore, the concept of modularity is used to demonstrate that community structures are present in the field of multibody system dynamics. In particular, twenty-four different thematic communities have been identified. The scientific production of each community is analyzed, which allows to predict its growing rate in the next years. The journals and conference proceedings mainly used by the authors belonging to the community as well as the cooperation between them by country are also analyzed
Study and Performance Analysis of Different Techniques for Computing Data Cubes
Data is an integrated form of observable and recordable facts in operational or transactional systems in the data warehouse. Usually, data warehouse stores aggregated and historical data in multi-dimensional schemas. Data only have value to end-users when it is formulated and represented as information. And Information is a composed collection of facts for decision making. Cube computation is the most efficient way for answering this decision making queries and retrieve information from data. Online Analytical Process (OLAP) used in this purpose of the cube computation. There are two types of OLAP: Relational Online Analytical Processing (ROLAP) and Multidimensional Online Analytical Processing (MOLAP). This research worked on ROLAP and MOLAP and then compare both methods to find out the computation times by the data volume. Generally, a large data warehouse produces an extensive output, and it takes a larger space with a huge amount of empty data cells. To solve this problem, data compression is inevitable. Therefore, Compressed Row Storage (CRS) is applied to reduce empty cell overhead
- …