26,306 research outputs found
Adaptive imputation of missing values for incomplete pattern classification
In classification of incomplete pattern, the missing values can either play a
crucial role in the class determination, or have only little influence (or
eventually none) on the classification results according to the context. We
propose a credal classification method for incomplete pattern with adaptive
imputation of missing values based on belief function theory. At first, we try
to classify the object (incomplete pattern) based only on the available
attribute values. As underlying principle, we assume that the missing
information is not crucial for the classification if a specific class for the
object can be found using only the available information. In this case, the
object is committed to this particular class. However, if the object cannot be
classified without ambiguity, it means that the missing values play a main role
for achieving an accurate classification. In this case, the missing values will
be imputed based on the K-nearest neighbor (K-NN) and self-organizing map (SOM)
techniques, and the edited pattern with the imputation is then classified. The
(original or edited) pattern is respectively classified according to each
training class, and the classification results represented by basic belief
assignments are fused with proper combination rules for making the credal
classification. The object is allowed to belong with different masses of belief
to the specific classes and meta-classes (which are particular disjunctions of
several single classes). The credal classification captures well the
uncertainty and imprecision of classification, and reduces effectively the rate
of misclassifications thanks to the introduction of meta-classes. The
effectiveness of the proposed method with respect to other classical methods is
demonstrated based on several experiments using artificial and real data sets
From Data Fusion to Knowledge Fusion
The task of {\em data fusion} is to identify the true values of data items
(eg, the true date of birth for {\em Tom Cruise}) among multiple observed
values drawn from different sources (eg, Web sites) of varying (and unknown)
reliability. A recent survey\cite{LDL+12} has provided a detailed comparison of
various fusion methods on Deep Web data. In this paper, we study the
applicability and limitations of different fusion techniques on a more
challenging problem: {\em knowledge fusion}. Knowledge fusion identifies true
subject-predicate-object triples extracted by multiple information extractors
from multiple information sources. These extractors perform the tasks of entity
linkage and schema alignment, thus introducing an additional source of noise
that is quite different from that traditionally considered in the data fusion
literature, which only focuses on factual errors in the original sources. We
adapt state-of-the-art data fusion techniques and apply them to a knowledge
base with 1.6B unique knowledge triples extracted by 12 extractors from over 1B
Web pages, which is three orders of magnitude larger than the data sets used in
previous data fusion papers. We show great promise of the data fusion
approaches in solving the knowledge fusion problem, and suggest interesting
research directions through a detailed error analysis of the methods.Comment: VLDB'201
Finding Academic Experts on a MultiSensor Approach using Shannon's Entropy
Expert finding is an information retrieval task concerned with the search for
the most knowledgeable people, in some topic, with basis on documents
describing peoples activities. The task involves taking a user query as input
and returning a list of people sorted by their level of expertise regarding the
user query. This paper introduces a novel approach for combining multiple
estimators of expertise based on a multisensor data fusion framework together
with the Dempster-Shafer theory of evidence and Shannon's entropy. More
specifically, we defined three sensors which detect heterogeneous information
derived from the textual contents, from the graph structure of the citation
patterns for the community of experts, and from profile information about the
academic experts. Given the evidences collected, each sensor may define
different candidates as experts and consequently do not agree in a final
ranking decision. To deal with these conflicts, we applied the Dempster-Shafer
theory of evidence combined with Shannon's Entropy formula to fuse this
information and come up with a more accurate and reliable final ranking list.
Experiments made over two datasets of academic publications from the Computer
Science domain attest for the adequacy of the proposed approach over the
traditional state of the art approaches. We also made experiments against
representative supervised state of the art algorithms. Results revealed that
the proposed method achieved a similar performance when compared to these
supervised techniques, confirming the capabilities of the proposed framework
Self-Organizing Hierarchical Knowledge Discovery by an Artmap Information Fusion System
Classifying terrain or objects may require the resolution of conflicting information from sensors working at different times, locations, and scales, and from users with different goals and situations. Current fusion methods can help resolve such inconsistencies, as when evidence variously suggests that an object is a car, a truck, or an airplane. The methods described here define a complementary approach to the information fusion problem, considering the case where sensors and sources arc both nominally inconsistent and reliable, as when evidence suggests that an object is a car, a vehicle, and man-made. Underlying relationships among classes are assumed to be unknown to the automated system or the human user. The ARTMAP self-organizing rule discovery procedure is illustrated with an image example, but is not limited to the image domain.Air Force Office of Scientific Research (F49620-0 1-1-0423); National Geospatial-Intelligence Agency (NMA 201-01-1-2016, NMA 501-03-1-2030); National Science Foundation (SBE-0354378, DGE-0221680); Office of Naval Research (N00014-01-1-0624
Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art
Information Extraction, data Integration, and uncertain data management are different areas of research that got vast focus in the last two decades. Many researches tackled those areas of research individually. However, information extraction systems should have integrated with data integration methods to make use of the extracted information. Handling uncertainty in extraction and integration process is an important issue to enhance the quality of the data in such integrated systems. This article presents the state of the art of the mentioned areas of research and shows the common grounds and how to integrate information extraction and data integration under uncertainty management cover
- …