Efficient Ranked Keyword Using AML
Entity recognition is the process of identifying predefined entities, such as person names, products, or locations, in a given document. This is done by finding all substrings of a document that match any reference in a given entity dictionary. The Approximate Membership Extraction (AME) method finds all substrings in a document that approximately match any clean reference, but because the matching is approximate it generates many redundant matched substrings, which makes AME unsuitable for real-world entity extraction tasks. We propose a web-based join framework that combines web search with approximate membership localization (AML). Our process first retrieves the top n documents from a general web search for the given query; AML is then applied to these documents using the clean reference table, and the entities extracted from the documents form the intermediate reference table using Edit distance Vector, Score Correlation
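To make the difference between AME and AML concrete, here is a minimal sketch, assuming that similarity is normalized Levenshtein distance, that a match needs a score of at least 0.8, and that localization means greedily keeping the best-scoring matches that do not overlap. The function names `localize` and `similarity`, the threshold, and the greedy rule are all hypothetical; the abstract does not describe how its Edit distance Vector or Score Correlation steps work, so this is not the authors' method.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def localize(tokens, references, threshold=0.8, max_len=4):
    """Hypothetical AML-style localization over a tokenized document.

    Returns (start_token, surface_text, reference, score) tuples in document order.
    """
    if not references:
        return []
    candidates = []
    # Step 1: score every window of up to max_len tokens against every reference.
    # Reporting every window above the threshold is roughly what AME does, and it
    # yields many overlapping, redundant matches.
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            surface = " ".join(tokens[start:end])
            ref, score = max(((r, similarity(surface.lower(), r.lower())) for r in references),
                             key=lambda pair: pair[1])
            if score >= threshold:
                candidates.append((score, end - start, start, end, surface, ref))
    # Step 2: localize by greedily keeping the best-scoring windows (longer first
    # on ties) that do not overlap, so each token belongs to at most one entity.
    candidates.sort(key=lambda c: (-c[0], -c[1]))
    taken, result = set(), []
    for score, _, start, end, surface, ref in candidates:
        if taken.isdisjoint(range(start, end)):
            taken.update(range(start, end))
            result.append((start, surface, ref, round(score, 2)))
    return sorted(result)


doc = "I bought an Apple iPhon yesterday in New York".split()
print(localize(doc, ["Apple iPhone", "New York"]))
# [(3, 'Apple iPhon', 'Apple iPhone', 0.92), (7, 'New York', 'New York', 1.0)]
```

The misspelled "Apple iPhon" still matches its clean reference, while overlapping near-misses such as "in New York" fall below the threshold or are discarded because their tokens are already claimed.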
Name Disambiguation from link data in a collaboration graph using temporal and topological features
In a social community, multiple persons may share the same name, phone number,
or other identifying attributes. This, along with phenomena such as name
abbreviation, name misspelling, and human error, leads to the erroneous
aggregation of records of multiple persons under a single reference. Such
mistakes degrade the performance of document retrieval, web search, and database
integration and, more importantly, cause improper attribution of credit (or
blame). The task of entity disambiguation partitions the records belonging to
multiple persons so that each resulting partition contains the records of a
single person. Existing solutions to this task use either biographical
attributes or auxiliary features collected from external sources, such as
Wikipedia. However, in many scenarios such auxiliary features are unavailable or
costly to obtain. Moreover, collecting biographical or external data carries a
risk of privacy violation. In this work, we propose a method for solving the
entity disambiguation task from link information obtained from a collaboration
network. Our method does not intrude on privacy, as it uses only the
time-stamped graph topology of an anonymized network. Experimental results on
two real-life academic collaboration networks show that the proposed method has
satisfactory performance.
Comment: The short version of this paper has been accepted to ASONAM 201
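As an illustration of how temporal and topological cues alone can split an anonymized node, here is a minimal rule-based sketch. It assumes that two collaborators of the ambiguous node `u` belong to the same real person when they are linked to each other (directly or through a shared collaborator other than `u`) and their collaborations with `u` lie within `max_gap` years. The function `split_ambiguous_node`, the `max_gap` parameter, and this merging rule are hypothetical stand-ins; the abstract does not specify the features or the clustering step the authors use.

```python
from collections import defaultdict


def split_ambiguous_node(u, edges, max_gap=3):
    """Hypothetical sketch: split anonymized node u into inferred persons.

    edges: iterable of (a, b, year) undirected, time-stamped collaborations.
    Returns a sorted list of groups; each group holds the collaborators of u
    attributed to one inferred person.
    """
    adj = defaultdict(set)            # topology of the whole anonymized graph
    years_with_u = defaultdict(list)  # when each neighbor collaborated with u
    for a, b, year in edges:
        adj[a].add(b)
        adj[b].add(a)
        if a == u:
            years_with_u[b].append(year)
        elif b == u:
            years_with_u[a].append(year)

    neighbors = sorted(years_with_u)
    parent = {n: n for n in neighbors}  # union-find over u's collaborators

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, x in enumerate(neighbors):
        for y in neighbors[i + 1:]:
            # Topological cue: x and y collaborate directly or share a
            # collaborator other than u.
            topological = y in adj[x] or bool((adj[x] & adj[y]) - {u})
            # Temporal cue: their collaborations with u are close in time.
            gap = min(abs(p - q) for p in years_with_u[x] for q in years_with_u[y])
            if topological and gap <= max_gap:
                parent[find(x)] = find(y)

    groups = defaultdict(list)
    for n in neighbors:
        groups[find(n)].append(n)
    return sorted(groups.values())


edges = [(0, 1, 2010), (0, 2, 2011), (1, 2, 2011),
         (0, 3, 2019), (0, 4, 2020), (3, 4, 2020)]
print(split_ambiguous_node(0, edges))  # [[1, 2], [3, 4]]
```

Node 0 is split into two inferred persons because its two collaborator clusters neither share links nor overlap in time; a learned model would replace the fixed rule with weighted features.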
Entity Identity Reconciliation based Big Data Federation: A MDE approach
“Information is power” is a sentence attributed to Francis Bacon that has acquired great importance in the current information era. However, too much information can also be a drawback. The term “Infoxication” refers to the difficulty a person may have in understanding an issue and making decisions when faced with too much information. With the growing relevance of open data and big databases, mechanisms and solutions for managing information have become critical. This paper introduces the problem of unique identification and data reconciliation and discusses how to solve it in big and open data environments. Reconciling data across multiple databases and uniquely identifying entities is not a new problem, but how effective are classical mechanisms in the new internet environment? In this paper, a solution based on model-driven engineering and a virtual graph is presented to improve the processing of information in big open repositories. The paper illustrates the idea with a real example: the proper exploitation of heritage information in the south of Spain.
Ministerio de Ciencia e Innovación TIN2013-46928-C3-3-
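The core reconciliation problem, giving one global identifier to records that describe the same entity in different databases, can be sketched as follows. This is a minimal illustration, assuming that a few normalized fields (here a name and a city) are enough to identify an entity exactly; the functions `reconcile` and `normalize`, the `E1`-style identifiers, and the sample heritage records are all hypothetical. It does not model the paper's model-driven engineering approach or its virtual graph.

```python
import unicodedata


def normalize(value: str) -> str:
    """Lower-case, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def reconcile(sources, key_fields):
    """Assign a global entity id to every (source, record id) pair.

    sources: dict mapping a source name to a list of record dicts, each with an "id".
    key_fields: fields whose normalized values are assumed to identify an entity.
    """
    key_to_gid = {}
    mapping = {}
    for source in sorted(sources):  # deterministic id assignment
        for record in sources[source]:
            key = tuple(normalize(record[f]) for f in key_fields)
            # Reuse the id of an already-seen key, otherwise mint the next one.
            gid = key_to_gid.setdefault(key, f"E{len(key_to_gid) + 1}")
            mapping[(source, record["id"])] = gid
    return mapping


sources = {
    "catalogue": [{"id": "c1", "name": "Alcázar de Sevilla", "city": "Sevilla"}],
    "tourism": [{"id": "t7", "name": "ALCAZAR DE SEVILLA ", "city": "sevilla"},
                {"id": "t8", "name": "Mezquita de Córdoba", "city": "Córdoba"}],
}
print(reconcile(sources, ["name", "city"]))
```

Exact matching on normalized keys is the simplest classical mechanism; it breaks down on abbreviations or typos, which is one reason the question of how well classical mechanisms scale to open, heterogeneous repositories is worth asking.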
Handling instance coreferencing in the KnoFuss architecture
Finding RDF individuals that refer to the same real-world entities but have different URIs is necessary for the efficient use of data across sources. The requirements for such instance-level integration of RDF data differ from those of both database record linkage and ontology schema matching. Flexible configuration and reuse of different methods are needed to achieve good performance. Our data integration architecture, called KnoFuss, implements a component-based approach, which allows flexible selection and tuning of methods and takes the ontological schemata into account to improve the reusability of methods.
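A component-based, schema-aware coreferencing step might look roughly like the sketch below: each ontology class is mapped to its own pluggable similarity method and threshold, and only individuals of the same class are compared. Everything here is hypothetical, including `Individual`, `MatcherConfig`, `find_coreferences`, the class names, and the example URIs; `difflib.SequenceMatcher` is simply a stand-in similarity measure. This is not the KnoFuss API.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Individual:
    uri: str
    rdf_type: str  # class of the individual in the ontology
    label: str


@dataclass(frozen=True)
class MatcherConfig:
    similarity: Callable[[Individual, Individual], float]  # pluggable method
    threshold: float                                        # tunable per class


def label_similarity(a: Individual, b: Individual) -> float:
    """Case-insensitive label similarity (difflib ratio in [0, 1])."""
    return SequenceMatcher(None, a.label.lower(), b.label.lower()).ratio()


def find_coreferences(source: List[Individual], target: List[Individual],
                      config: Dict[str, MatcherConfig]) -> List[Tuple[str, str, float]]:
    """Return candidate (source_uri, target_uri, score) coreference pairs."""
    pairs = []
    for a in source:
        component = config.get(a.rdf_type)
        if component is None:
            continue  # no method configured for this class
        for b in target:
            if b.rdf_type != a.rdf_type:
                continue  # schema-aware: only compare individuals of the same class
            score = component.similarity(a, b)
            if score >= component.threshold:
                pairs.append((a.uri, b.uri, round(score, 2)))
    return pairs


config = {"foaf:Person": MatcherConfig(label_similarity, 0.9),
          "ex:Organisation": MatcherConfig(label_similarity, 0.85)}
source = [Individual("http://a.example/p1", "foaf:Person", "Jane Doe"),
          Individual("http://a.example/o1", "ex:Organisation", "Example University")]
target = [Individual("http://b.example/x1", "foaf:Person", "jane doe"),
          Individual("http://b.example/x2", "ex:Organisation", "EXAMPLE UNIVERSITY")]
print(find_coreferences(source, target, config))
```

Swapping in a different similarity function or threshold for one class changes only its `MatcherConfig` entry, which is the kind of per-class selection and tuning the abstract describes.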
- …