Multi-graph-view subgraph mining for graph classification
© 2015, Springer-Verlag London. In this paper, we formulate a new multi-graph-view learning task, where each object to be classified contains graphs from multiple graph-views. This problem setting is essentially different from traditional single-graph-view graph classification, where graphs are collected from one single-feature view. To solve the problem, we propose a cross graph-view subgraph feature-based learning algorithm that explores an optimal set of subgraphs, across multiple graph-views, as features to represent graphs. Specifically, we derive an evaluation criterion to estimate the discriminative power and redundancy of subgraph features across all views, and propose a branch-and-bound algorithm to prune the subgraph search space. Because graph-views may complement each other and play different roles in a learning task, we assign each view a weight value indicating its importance to the learning task and further use an optimization process to find optimal weight values for each graph-view. The iteration between cross graph-view subgraph scoring and graph-view weight updating forms a closed loop to find optimal subgraphs to represent graphs for multi-graph-view learning. Experiments and comparisons on real-world tasks demonstrate the algorithm’s superior performance.
From Large Language Models to Databases and Back: A discussion on research and education
This discussion was conducted at a recent panel at the 28th International
Conference on Database Systems for Advanced Applications (DASFAA 2023), held
April 17-20, 2023 in Tianjin, China. The title of the panel was "What does LLM
(ChatGPT) Bring to Data Science Research and Education? Pros and Cons". It was
moderated by Lei Chen and Xiaochun Yang. The discussion raised several
questions on how large language models (LLMs) and database research and
education can help each other and the potential risks of LLMs.
Comment: 7 pages, 2 figures, the Panel at the 28th International Conference on
Database Systems for Advanced Applications (DASFAA 2023)
A Survey on Cross-domain Recommendation: Taxonomies, Methods, and Future Directions
Traditional recommendation systems face two long-standing
obstacles, namely data sparsity and cold-start problems, which have motivated the
emergence and development of Cross-Domain Recommendation (CDR). The core idea
of CDR is to leverage information collected from other domains to alleviate these
two problems in one domain. Over the last decade, much effort has been
devoted to cross-domain recommendation. Recently, with the development of deep
learning and neural networks, a large number of methods have emerged. However,
there is a limited number of systematic surveys on CDR, especially regarding
the latest proposed methods as well as the recommendation scenarios and
recommendation tasks they address. In this survey paper, we first propose a
two-level taxonomy of cross-domain recommendation which classifies different
recommendation scenarios and recommendation tasks. We then introduce and
summarize existing cross-domain recommendation approaches under different
recommendation scenarios in a structured manner. We also organize the commonly
used datasets. We conclude this survey by providing several potential research
directions in this field.
Whom to Ask? Jury Selection for Decision Making Tasks on Micro-blog Services
It is common to see people obtain knowledge on micro-blog services by
asking others decision-making questions. In this paper, we study the Jury
Selection Problem (JSP) by utilizing crowdsourcing for decision-making tasks on
micro-blog services. Specifically, the problem is to enroll a subset of the crowd
under a limited budget, whose aggregated wisdom via a Majority Voting scheme has
the lowest probability of drawing a wrong answer (Jury Error Rate, JER). Due to
the various individual error rates of the crowd, the calculation of JER is
non-trivial. First, we explicitly state that JER is the probability that the
number of wrong jurors is larger than half of the size of a jury. To avoid the
exponentially increasing cost of calculating JER, we propose two efficient
algorithms and an effective bounding technique. Furthermore, we study the Jury
Selection Problem on two crowdsourcing models: one for altruistic
users (AltrM) and the other for incentive-requiring users (PayM), who require
extra payment when enrolled in a task. For the AltrM model, we prove the
monotonicity of JER on individual error rate and propose an efficient exact
algorithm for JSP. For the PayM model, we prove the NP-hardness of JSP on PayM
and propose an efficient greedy-based heuristic algorithm. Finally, we conduct
a series of experiments to investigate the traits of JSP, and validate the
efficiency and effectiveness of our proposed algorithms on both synthetic and
real micro-blog data.
Comment: VLDB201
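The definition of JER stated in the abstract (the probability that more than half of the jurors are wrong) can be illustrated with a small sketch. This is not one of the paper's algorithms: it is a standard dynamic program over the number of wrong jurors, and it assumes that jurors err independently, which the abstract does not state. The function name `jury_error_rate` is hypothetical.

```python
from typing import Sequence


def jury_error_rate(error_rates: Sequence[float]) -> float:
    """Probability that more than half of the jurors vote wrongly.

    Assumption of this sketch: jurors err independently, each with
    their own individual error rate.
    """
    n = len(error_rates)
    # dist[k] = probability that exactly k of the jurors processed so far are wrong
    dist = [1.0] + [0.0] * n
    for i, eps in enumerate(error_rates, start=1):
        # Update from high k to low k so dist[k - 1] still holds the
        # value from before this juror was added.
        for k in range(i, 0, -1):
            dist[k] = dist[k] * (1.0 - eps) + dist[k - 1] * eps
        dist[0] *= 1.0 - eps
    # "Larger than half of the size of a jury" means k >= n // 2 + 1.
    return sum(dist[k] for k in range(n // 2 + 1, n + 1))


if __name__ == "__main__":
    print(jury_error_rate([0.1, 0.2, 0.3]))
```

The dynamic program takes time quadratic in the jury size, whereas enumerating every combination of right and wrong jurors grows exponentially.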
IJA: An Efficient Algorithm for Query Processing in Sensor Networks
One of the main features of sensor networks is the processing of real-time state information after gathering the needed data from many domains. The component technologies that make up each sensor node, including physical sensors, processors, actuators, and power supplies, have advanced significantly over the last decade. Thanks to these advances, sensor networks have over time been adopted across industries for sensing physical phenomena. However, sensor nodes are considerably constrained: with their limited energy and memory resources, they have a very limited ability to process information compared to conventional computer systems. Query processing over the nodes must therefore respect these limitations. Because of these constraints, join operations in sensor networks are typically processed in a distributed manner over a set of nodes, and such processing has been studied. However, while simple queries, such as select and aggregate queries, have been addressed in the literature, the processing of join queries in sensor networks remains to be investigated. Therefore, in this paper, we propose and describe an Incremental Join Algorithm (IJA) for sensor networks to reduce the overhead caused by moving join pairs to the final join node, or to minimize the communication cost, which is the main consumer of battery power, when processing distributed queries in sensor network environments. Simulation results show that the proposed IJA significantly reduces the number of bytes to be moved to join nodes compared to the popular synopsis join algorithm.