24 research outputs found
Information Extraction from Social Media
With the proliferation of social media sites such as Twitter, Facebook, and LinkedIn, social streams have proven to contain the most up-to-date information on current events. Extracting activities or events from social streams such as tweets is therefore crucial and has become an ongoing research trend. Most approaches that aim to extract event information from Twitter rely on the context of messages. However, exploiting the location information of geo-referenced messages and the profile data is also important, because tweets are short, fragmented, and noisy, and therefore do not include complete information about events. This paper proposes a framework for event extraction and categorization from Twitter. To extract localized related activities, several mining mechanisms and cleaning techniques are applied to a real-time Twitter corpus, and various language processing approaches are applied to categorize the events. The system then displays the valuable information for the targeted domain.
Conversion of XML schema to data warehouse schema
Data warehousing technology aims to provide decision-making support over operational data. Defining a data warehouse for data stored in XML (Extensible Markup Language) format needs to be addressed, since various organizations use XML to promote their businesses. XML is a worldwide standard for representing data in web-based systems, and a number of organizations use it for e-commerce and Internet-based applications. This paper describes a system that converts an XML schema into a data warehouse schema. An XML schema written in the Russian Doll approach is used as input and converted into a schema graph. Fact and dimension tables are then identified from the schema graph, and the data warehouse schema is extracted from these fact and dimension tables and their relationships.
Enhanced Matrix based Frequent Accessed Pages Mining Algorithm
Web mining, an interdisciplinary subfield of computer science, is the computational process of discovering patterns in web log files. The overall goal of the web mining process is to extract information from web log files and transform it into an understandable structure for further use. In Web mining, Apriori is a classic algorithm for learning association rules. The classical algorithm is inefficient because it scans the database many times, which takes too much time. In this paper, we build a method to obtain the frequent page-item sets using an approach that is based on the classical Apriori algorithm but differs from it by applying the concept of transaction reduction and a new matrix method.
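As a rough illustration of the ideas named in this abstract, the sketch below builds a Boolean session-by-page matrix in one pass over the log and then drops rows that are too short to support the next candidate size (transaction reduction). It is not the paper's algorithm: the function name `frequent_page_sets`, the session format, and the candidate-generation details are assumptions made for this example.

```python
from itertools import combinations


def frequent_page_sets(sessions, min_support):
    """Return {frozenset(pages): support count} for every frequent page set.

    Hypothetical sketch: `sessions` is a list of page lists, one per visit.
    """
    pages = sorted({p for s in sessions for p in s})
    col = {p: i for i, p in enumerate(pages)}
    # Boolean session-by-page matrix, built in a single pass over the log.
    matrix = [[p in s for p in pages] for s in map(set, sessions)]

    frequent = {}
    k = 1
    candidates = [frozenset([p]) for p in pages]
    while candidates and matrix:
        # Count support by checking matrix columns instead of rescanning the log.
        counts = {c: 0 for c in candidates}
        for row in matrix:
            for c in candidates:
                if all(row[col[p]] for p in c):
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)

        k += 1
        # Transaction reduction: a session with fewer than k pages
        # cannot contain any page set of size k, so drop its row.
        matrix = [row for row in matrix if sum(row) >= k]
        # Apriori pruning: keep a size-k candidate only if all of its
        # (k-1)-subsets were frequent at the previous level.
        items = sorted({p for c in level for p in c})
        candidates = [
            frozenset(c)
            for c in combinations(items, k)
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent
```

Calling `frequent_page_sets(sessions, min_support=3)` on a list of page lists returns the support count of each frequent page set; the matrix shrinks at every level, which is where the saving over repeated full scans would come from.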
Clustering XML Document Based On Path Similarities Using Structure Only
We propose a methodology for clustering XML documents on the basis of their structural similarities. This research combines common XPaths and K-means clustering to improve efficiency for XML documents with many different structures. The common XPaths are used to search for similarities among the paths of a huge number of XML documents, and the K-means clustering algorithm is used to produce accurate clusters. We describe the step-by-step method for clustering the documents’ paths. The first step is frequent structure mining, which searches for similarities among the structures of a huge number of XML documents using the FP-growth method. The second step builds a dimensional feature vector matrix from the extracted paths. Based on the set of common path vectors collected, we compute the structural similarity between the XML documents. The last step uses the K-means clustering algorithm to create accurate clusters based on the idea of path-based clustering, which groups the documents according to their common XPaths, i.e., their frequent structures. The quality of the clustering can be measured by the dissimilarity of document structures. Experimental evaluation on both synthetic and real data shows the effectiveness of our approach.
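The path-vector steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: plain document-frequency counting stands in for FP-growth, cosine similarity is one possible structural similarity measure, and the names `element_paths`, `path_vectors`, and `cosine` are hypothetical. The resulting vectors could be passed to any K-means implementation.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(xml_text):
    """Collect the root-to-element tag paths of one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def path_vectors(docs, min_docs=2):
    """Build one binary feature vector per document over the common paths.

    A path counts as "common" if it occurs in at least `min_docs` documents
    (an assumed stand-in for frequent structure mining).
    """
    per_doc = [element_paths(d) for d in docs]
    freq = Counter(p for paths in per_doc for p in paths)
    common = sorted(p for p, n in freq.items() if n >= min_docs)
    vectors = [[1 if p in paths else 0 for p in common] for paths in per_doc]
    return common, vectors


def cosine(u, v):
    """Cosine similarity between two path vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0
```

Documents that share the same frequent paths get identical vectors and a similarity of 1.0, while documents with no common paths get 0.0, which is the separation a clustering step would then exploit.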
Clustering Homogeneous and Heterogeneous XML Documents by Summarizing Edge
Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The large number of XML documents on the web requires well-developed clustering techniques to group them. In this system, the XEdge clustering algorithm is applied to cluster homogeneous and heterogeneous XML documents for use in a search engine. LevelStructure and LevelEdge are generated from the document tree, and similarity and distance metrics are then calculated. XEdge provides a structural representation of XML documents based on edge summaries. Finally, the outputs of the system are clusters of homogeneous and heterogeneous XML documents. The advantage of this system is that the output clusters can be applied in a search engine.
Query Processing for RDF data Using XML Repository
The Semantic Web, which represents a web of knowledge, offers new opportunities to search for knowledge and information. Harvesting such search power requires robust and scalable data repositories that can store RDF data. Most existing RDF storage techniques rely on the relational model and relational database technologies for these tasks. The mismatch between the graph model of RDF data and the rigid 2D tables of the relational model jeopardizes the scalability of such repositories and frequently renders a repository inefficient for some types of data and queries. In this paper, we propose a system that stores RDF data in an XML repository. The system serializes RDF data into RDF/XML and then maps it into an XML document. We discuss the basic idea of serializing RDF data into RDF/XML and then mapping RDF/XML to an XML document.
Using Case-Based Reasoning to Hotel Recommendation System
Web-based applications are currently a hot research and development area. Thousands of applications have been made available on the Web, but most of them are nothing more than a network of static hypertext pages. This system aims to support a leisure traveler in comfortably selecting a hotel for a stay, a wedding, or a meeting or conference. The system enables users to identify their own destination and to personalize the hotel by aggregating elementary items (additional locations to visit, services, and activities). It reduces the time needed to find a hotel and gives the user an immediate answer by exploiting the cumulative experience from previous users’ answers for the benefit of new users. The system is a hotel recommendation system based on case-based reasoning, an artificial intelligence approach to learning and problem solving based on experience. Cases that are similar to the user’s choices are retrieved from the case base using the nearest neighbor algorithm. With the system, users can choose a relevant hotel according to their needs, interests, and activities in less time.
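The retrieval step can be illustrated with a minimal weighted nearest-neighbor sketch. The attribute names, weights, and the exact-match similarity below are assumptions chosen for the example; the abstract does not specify how the system represents or scores cases.

```python
def similarity(query, case, weights):
    """Weighted share of attributes on which the case matches the query (0..1)."""
    total = sum(weights.values())
    matched = sum(w for attr, w in weights.items() if query.get(attr) == case.get(attr))
    return matched / total


def retrieve(query, case_base, weights, k=1):
    """Return the k stored cases most similar to the query, best first."""
    scored = [(similarity(query, case, weights), case) for case in case_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical case base: each case records the choices of a previous user.
case_base = [
    {"name": "Hotel A", "city": "City X", "purpose": "wedding", "pool": True},
    {"name": "Hotel B", "city": "City X", "purpose": "conference", "pool": False},
    {"name": "Hotel C", "city": "City Y", "purpose": "leisure", "pool": True},
]
weights = {"city": 3, "purpose": 2, "pool": 1}
query = {"city": "City X", "purpose": "conference", "pool": True}

best_score, best_case = retrieve(query, case_base, weights)[0]
print(best_case["name"], round(best_score, 2))
```

Here the weights express that matching the destination matters more than matching an amenity; a real system would likely also need distance functions for numeric attributes such as price.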
Evaluation of Binarization Methods for Aged Printed Myanmar Documents
Binarization is one of the sub-phases of the preprocessing step of optical character recognition (OCR). It separates the foreground text from the background of a document image. The accuracy of OCR relies mainly on the result of binarization. This paper compares several alternative binarization algorithms for aged printed Myanmar documents. The algorithms evaluated are global thresholding (Otsu) and local thresholding (Niblack, Sauvola, Wolf, Feng, and Nick). The binarized images are found to be more stable if filters (Wiener and Gaussian) are applied before the binarization algorithms. Another finding is that local thresholding is suitable for aged Myanmar documents. Among the local thresholding methods, Niblack, Sauvola, and Wolf are the more suitable algorithms according to the experimental results. The quality of the binarized images is verified using different assessment parameters such as mean square error (MSE), signal-to-noise ratio (SNR), and peak signal-to-noise ratio (PSNR). This work aims to achieve high accuracy in the recognition steps, with the main objective of developing OCR for aged printed Myanmar documents.
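For reference, two of the assessment parameters mentioned above can be computed as in the sketch below, which uses the standard definitions MSE = mean of squared pixel differences and PSNR = 10 log10(MAX^2 / MSE). Representing images as nested lists of 8-bit gray values and comparing against a ground-truth binarization are assumptions of this example; the abstract does not state how the reference images were obtained.

```python
import math


def mse(reference, test):
    """Mean square error between two equally sized grayscale images."""
    n = sum(len(row) for row in reference)
    total = sum(
        (a - b) ** 2
        for ref_row, test_row in zip(reference, test)
        for a, b in zip(ref_row, test_row)
    )
    return total / n


def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(reference, test)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)
```

A lower MSE and a higher PSNR both indicate that the binarized output is closer to the reference image.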