34,115 research outputs found
Active Learning for Hidden Attributes in Networks
In many networks, vertices have hidden attributes, or types, that are
correlated with the networks topology. If the topology is known but these
attributes are not, and if learning the attributes is costly, we need a method
for choosing which vertex to query in order to learn as much as possible about
the attributes of the other vertices. We assume the network is generated by a
stochastic block model, but we make no assumptions about its assortativity or
disassortativity. We choose which vertex to query using two methods: 1)
maximizing the mutual information between its attributes and those of the
others (a well-known approach in active learning) and 2) maximizing the average
agreement between two independent samples of the conditional Gibbs
distribution. Experimental results show that both these methods do much better
than simple heuristics. They also consistently identify certain vertices as
important by querying them early on
Search Efficient Binary Network Embedding
Traditional network embedding primarily focuses on learning a dense vector
representation for each node, which encodes network structure and/or node
content information, such that off-the-shelf machine learning algorithms can be
easily applied to the vector-format node representations for network analysis.
However, the learned dense vector representations are inefficient for
large-scale similarity search, which requires to find the nearest neighbor
measured by Euclidean distance in a continuous vector space. In this paper, we
propose a search efficient binary network embedding algorithm called BinaryNE
to learn a sparse binary code for each node, by simultaneously modeling node
context relations and node attribute relations through a three-layer neural
network. BinaryNE learns binary node representations efficiently through a
stochastic gradient descent based online learning algorithm. The learned binary
encoding not only reduces memory usage to represent each node, but also allows
fast bit-wise comparisons to support much quicker network node search compared
to Euclidean distance or other distance measures. Our experiments and
comparisons show that BinaryNE not only delivers more than 23 times faster
search speed, but also provides comparable or better search quality than
traditional continuous vector based network embedding methods
XML content warehousing: Improving sociological studies of mailing lists and web data
In this paper, we present the guidelines for an XML-based approach for the
sociological study of Web data such as the analysis of mailing lists or
databases available online. The use of an XML warehouse is a flexible solution
for storing and processing this kind of data. We propose an implemented
solution and show possible applications with our case study of profiles of
experts involved in W3C standard-setting activity. We illustrate the
sociological use of semi-structured databases by presenting our XML Schema for
mailing-list warehousing. An XML Schema allows many adjunctions or crossings of
data sources, without modifying existing data sets, while allowing possible
structural evolution. We also show that the existence of hidden data implies
increased complexity for traditional SQL users. XML content warehousing allows
altogether exhaustive warehousing and recursive queries through contents, with
far less dependence on the initial storage. We finally present the possibility
of exporting the data stored in the warehouse to commonly-used advanced
software devoted to sociological analysis
- …