Search CORE

24 research outputs found

Information Extraction from Social Media

Author: Khin Nwe Ni Tun
San San Nwe
Publication date: 17/02/2017

With the proliferation of social media sites such as Twitter, Facebook, and LinkedIn, social streams have proven to contain the most up-to-date information on current events. It is therefore important to extract activities and events from social streams such as tweets, and this has become an active research area. Most approaches to extracting event information from Twitter rely on the context of the messages. However, exploiting the location information of geo-referenced messages and users' profile data is also important, because tweets are short, fragmented, and noisy, and therefore often lack complete information about the events. To address this, this paper proposes a framework for event extraction and categorization from Twitter. To extract localized activities, several mining mechanisms and cleaning techniques are applied to a real-time Twitter corpus, various language processing approaches are used to categorize the events, and the system then displays the relevant information for the targeted domain.
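
The cleaning, location filtering, and categorization stages described above can be illustrated with a minimal sketch. It is not the authors' framework: the category lexicon, the bounding-box filter `in_region`, and the keyword-overlap rule in `categorize` are hypothetical stand-ins chosen for illustration, and the cleaning step assumes English-language tweets (it drops non-ASCII characters).

```python
import re

# Hypothetical category lexicon for illustration; not taken from the paper.
CATEGORY_KEYWORDS = {
    "traffic": {"traffic", "jam", "accident", "road"},
    "weather": {"rain", "flood", "storm", "heat"},
    "sports": {"match", "football", "goal", "stadium"},
}


def in_region(lat, lon, bbox):
    """Keep only geo-referenced tweets inside bbox = (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def clean_tweet(text):
    """Normalize a noisy tweet into lowercase word tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = text.replace("#", "")               # keep hashtag words, drop the '#'
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # drop punctuation, emoji, non-ASCII
    return text.split()


def categorize(text):
    """Assign the category whose keywords overlap most with the tweet, else 'other'."""
    tokens = set(clean_tweet(text))
    scores = {cat: len(tokens & words) for cat, words in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"
```

For example, `categorize("Huge traffic jam near the bridge #accident")` matches three traffic keywords. A real system would replace the keyword lexicon with a trained classifier, but the pipeline shape (clean, filter by location, categorize) stays the same.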

MERAL Portal

Information Extraction from Social Media

Author: Nwe San San
Tun Khin Nwe Ni
Publication date: 16/02/2017

MERAL Portal

Conversion of XML schema to data warehouse schema

Author: Tun Khin Nwe Ni
Wai Khin Htar
Publication date: 27/12/2017

Data warehousing technology aims at providing decision-making support over operational data. Defining a data warehouse for data stored in XML (Extensible Markup Language) format is an important problem, because many organizations use XML to support their businesses. XML is a worldwide standard for representing data in web-based systems, and many organizations use it for e-commerce and Internet-based applications. This paper describes a system that converts an XML schema into a data warehouse schema. An XML schema written in the Russian Doll style is used as input and converted into a schema graph. Fact and dimension tables are then identified from the schema graph, and the data warehouse schema is extracted from these fact and dimension tables and their relationships.
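
As a rough illustration of the schema-graph step, the sketch below parses a Russian Doll XML Schema (child elements declared inline under `xs:complexType/xs:sequence`) into parent-to-children edges, then applies a hypothetical heuristic to pick out facts and dimensions. The heuristic (a node with numeric leaf children is a fact; its complex children are dimensions), the `xs:` type prefix, and the unique-element-name assumption are ours, not the paper's rules.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
# Assumes the schema uses the "xs:" prefix for built-in types.
NUMERIC_TYPES = {"xs:integer", "xs:int", "xs:decimal", "xs:double", "xs:float"}


def schema_graph(xsd_text):
    """Build a parent -> children schema graph from nested (Russian Doll) declarations."""
    root = ET.fromstring(xsd_text.strip())
    edges, leaf_types = {}, {}

    def walk(elem, parent):
        name = elem.get("name")  # assumes element names are unique in the schema
        if parent is not None:
            edges.setdefault(parent, []).append(name)
        if elem.get("type"):  # simple-typed leaf element
            leaf_types[name] = elem.get("type")
        # In Russian Doll style, children are declared inline under complexType/sequence.
        for child in elem.findall(f"{XS}complexType/{XS}sequence/{XS}element"):
            walk(child, name)

    for top in root.findall(f"{XS}element"):
        walk(top, None)
    return edges, leaf_types


def classify(edges, leaf_types):
    """Hypothetical heuristic: a complex node with numeric leaves is a fact,
    and its complex children are its dimensions."""
    facts, dimensions = [], []
    for node, children in edges.items():
        if any(leaf_types.get(c) in NUMERIC_TYPES for c in children):
            facts.append(node)
            dimensions.extend(c for c in children if c in edges)
    return facts, dimensions


# Hypothetical sales schema used only to exercise the sketch.
EXAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="sales"><xs:complexType><xs:sequence>
    <xs:element name="sale"><xs:complexType><xs:sequence>
      <xs:element name="amount" type="xs:decimal"/>
      <xs:element name="quantity" type="xs:integer"/>
      <xs:element name="product"><xs:complexType><xs:sequence>
        <xs:element name="productName" type="xs:string"/>
      </xs:sequence></xs:complexType></xs:element>
      <xs:element name="store"><xs:complexType><xs:sequence>
        <xs:element name="city" type="xs:string"/>
      </xs:sequence></xs:complexType></xs:element>
    </xs:sequence></xs:complexType></xs:element>
  </xs:sequence></xs:complexType></xs:element>
</xs:schema>
"""
```

On `EXAMPLE_XSD`, `classify(*schema_graph(EXAMPLE_XSD))` yields `sale` as the fact and `product` and `store` as its dimensions, i.e. a small star schema.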

MERAL Portal

Enhanced Matrix based Frequent Accessed Pages Mining Algorithm

Author: Aung Tharyar
Tun Khin Nwe Ni
Publication date: 27/12/2017

Web mining, an interdisciplinary subfield of computer science, is the computational process of discovering patterns in web log files. The overall goal of the web mining process is to extract information from web log files and transform it into an understandable structure for further use. In web mining, Apriori is a classic algorithm for learning association rules. This classical algorithm is inefficient because it scans the database many times, which takes considerable time. In this paper, we build a method that obtains the frequent page itemsets using an approach that differs from, but is based on, the classical Apriori algorithm, applying the concept of transaction reduction and a new matrix method.
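
The general idea of combining a Boolean page-by-session matrix with transaction reduction can be sketched as follows. This is a generic Apriori variant written for illustration, not the paper's exact matrix method: the log is scanned once to build the matrix, support is counted from matrix rows, and any session that contains no frequent k-itemset is dropped from later passes, since it cannot contain a frequent (k+1)-itemset.

```python
from itertools import combinations


def next_candidates(frequent, k):
    """Join frequent k-itemsets sharing a (k-1)-prefix; prune by the Apriori property."""
    freq_set = set(frequent)
    candidates = []
    for a, b in combinations(sorted(frequent), 2):
        if a[:k - 1] == b[:k - 1]:
            cand = a + (b[-1],)
            if all(sub in freq_set for sub in combinations(cand, k)):
                candidates.append(cand)
    return candidates


def frequent_pagesets(sessions, min_support):
    pages = sorted({p for s in sessions for p in s})
    n = len(sessions)
    # Single scan of the log: one Boolean row per page, one column per session.
    matrix = {p: [p in s for s in sessions] for p in pages}
    active = [True] * n  # sessions still considered in later passes
    result = {}
    candidates = [(p,) for p in pages]
    k = 1
    while candidates:
        frequent = []
        for itemset in candidates:
            # Support = active columns where every page of the itemset is present.
            support = sum(1 for j in range(n)
                          if active[j] and all(matrix[p][j] for p in itemset))
            if support >= min_support:
                result[itemset] = support
                frequent.append(itemset)
        # Transaction reduction: a session with no frequent k-itemset is skipped from now on.
        for j in range(n):
            if active[j] and not any(all(matrix[p][j] for p in fs) for fs in frequent):
                active[j] = False
        candidates = next_candidates(frequent, k)
        k += 1
    return result
```

For instance, with five sessions over the pages `home`, `news`, and `sport` and `min_support=3`, the function returns every frequent single page and pair together with its support count.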

MERAL Portal

Clustering XML Document Based On Path Similarities Using Structure Only

Author: Mon Ei Ei
Tun Khin Nwe Ni
Publication date: 30/12/2009

We propose a methodology for clustering XML documents on the basis of their structural similarities. This research combines common XPath and K-means clustering to improve efficiency for collections of XML documents with many different structures. Common XPaths are used to find similarities among the paths of large numbers of XML documents, and the K-means clustering algorithm is used to produce accurate clusters. The documents' paths are clustered in three steps. The first step performs frequent structure mining with the FP-growth method to find similarities among the structures of a large number of XML documents. The second step builds a feature vector matrix from the extracted paths; based on the resulting set of common path vectors, the structural similarity between the XML documents is computed. The last step uses the K-means clustering algorithm to create accurate clusters through path-based clustering, which groups the documents according to their common XPaths, i.e., their frequent structures. The quality of the clustering can be measured by the dissimilarity of document structures. Experimental evaluation on both synthetic and real data shows the effectiveness of our approach.
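
The path-vector and K-means steps can be sketched as below. To keep it short, the sketch skips the FP-growth frequent-structure mining and uses every root-to-node label path as a binary feature; the first `k` documents serve as initial centroids so the result is deterministic. Both choices are simplifying assumptions, not the paper's settings.

```python
import xml.etree.ElementTree as ET


def paths(xml_text):
    """Collect the set of root-to-node label paths, e.g. '/book/title'."""
    out = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        out.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return out


def kmeans(vectors, k, iters=20):
    # Deterministic initialization: the first k vectors become the centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            labels[i] = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(v, centroids[c])))
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_by_paths(docs, k):
    doc_paths = [paths(d) for d in docs]
    vocab = sorted(set().union(*doc_paths))
    # Binary path-feature matrix: one row per document, one column per path.
    vectors = [[1.0 if p in dp else 0.0 for p in vocab] for dp in doc_paths]
    return kmeans(vectors, k)
```

Documents sharing most of their paths (for example several `<book>` records versus several `<article>` records) end up with the same label.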

MERAL Portal

Clustering Homogeneous and Heterogeneous XML Documents by Summarizing Edge

Author: Thu May Myat
Tun Khin Nwe Ni
Publication date: 27/12/2017

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The large number of XML documents on the web requires effective clustering techniques to group them. In this system, the XEdge clustering algorithm is applied to cluster homogeneous and heterogeneous XML documents for use in a search engine. LevelStructure and LevelEdge representations are generated from the document tree, and similarity and distance metrics are then calculated. XEdge provides a structural representation of XML documents based on edge summaries. Finally, the system outputs clusters of homogeneous and heterogeneous XML documents. The advantage of this system is that the output clusters can be applied in a search engine.
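
A simplified reading of the edge-summary idea is sketched below: each document is summarized as the set of `(level, parent_tag, child_tag)` edges in its tree, and two documents are compared with a Jaccard distance over those sets. This is an illustrative stand-in, not the exact LevelEdge similarity used by XEdge, which weights common edges by level.

```python
import xml.etree.ElementTree as ET


def level_edges(xml_text):
    """Summarize a document as the set of (level, parent_tag, child_tag) edges."""
    root = ET.fromstring(xml_text)
    edges = set()
    stack = [(root, 0)]
    while stack:
        node, level = stack.pop()
        for child in node:
            edges.add((level, node.tag, child.tag))
            stack.append((child, level + 1))
    return edges


def edge_distance(doc_a, doc_b):
    """Jaccard distance between level-edge summaries (0 = identical structure)."""
    a, b = level_edges(doc_a), level_edges(doc_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)
```

The resulting pairwise distances can feed any distance-based clustering method to separate homogeneous from heterogeneous document groups.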

MERAL Portal

Query Processing for RDF data Using XML Repository

Author: Hnin Win Lai
Tun Khin Nwe Ni
Publication date: 05/05/2011

The Semantic Web, which represents a web of knowledge, offers new opportunities to search for knowledge and information. Harnessing this search power requires robust and scalable data repositories that can store RDF data. Most existing RDF storage techniques rely on the relational model and relational database technologies for these tasks. The mismatch between the graph model of RDF data and the rigid two-dimensional tables of the relational model jeopardizes the scalability of such repositories and frequently makes them inefficient for some types of data and queries. In this paper, we propose a system that stores RDF data in an XML repository. The system serializes RDF data into RDF/XML and then maps the RDF/XML into an XML document. We discuss the basic idea of serializing RDF data into RDF/XML and then mapping RDF/XML to an XML document.
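
The serialization step (RDF triples to RDF/XML) can be sketched with the standard library alone. Triples are grouped by subject into `rdf:Description` elements; resource objects become `rdf:resource` attributes and literals become element text. The tuple layout `(subject, (predicate_ns, predicate_local), object, is_resource)` and the example URIs are assumptions for illustration, and the subsequent mapping into the repository's own XML layout is not shown.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def triples_to_rdfxml(triples):
    """Group triples by subject into rdf:Description elements (RDF/XML)."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    descriptions = {}
    for subject, (pred_ns, pred_local), obj, is_resource in triples:
        desc = descriptions.get(subject)
        if desc is None:
            desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                                 {f"{{{RDF_NS}}}about": subject})
            descriptions[subject] = desc
        prop = ET.SubElement(desc, f"{{{pred_ns}}}{pred_local}")
        if is_resource:
            prop.set(f"{{{RDF_NS}}}resource", obj)  # object is another resource
        else:
            prop.text = obj  # object is a literal value
    return ET.tostring(root, encoding="unicode")


# Hypothetical example data.
EXAMPLE_TRIPLES = [
    ("http://example.org/book/1", (DC_NS, "title"), "Myanmar Grammar", False),
    ("http://example.org/book/1", (DC_NS, "creator"), "http://example.org/person/7", True),
    ("http://example.org/book/2", (DC_NS, "title"), "XML Basics", False),
]
```

`triples_to_rdfxml(EXAMPLE_TRIPLES)` produces one `rdf:RDF` root with two `rdf:Description` children, which can then be stored or mapped like any other XML document.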

MERAL Portal

Using Case-Based Reasoning to Hotel Recommendation System

Author: Lynn War War
Tun Khin Nwe Ni
Publication date: 03/08/2009

Web-based applications are currently a hot research and development area. Thousands of applications have been made available on the Web, but most of them are nothing more than networks of static hypertext pages. This system aims to help a leisure traveler comfortably select a hotel, whether for a stay, a wedding, or a meeting or conference. The system enables users to identify their own destination and to personalize the hotel choice by aggregating elementary items (additional locations to visit, services, and activities). To reduce the time needed to find a hotel and to give the user an immediate answer, the system exploits the cumulative experience of previous users' answers for the benefit of new users. It is a hotel recommendation system based on case-based reasoning, an artificial intelligence approach to learning and problem solving based on experience. Cases in the case base that are similar to the user's choices are retrieved using the nearest-neighbor algorithm. With the system, users can choose a relevant hotel according to their needs and activities of interest while spending less time searching.
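
Nearest-neighbor case retrieval of the kind described above can be sketched as a weighted similarity over case attributes. The attributes (`price`, `stars`, `purpose`), their weights, and the normalization ranges are hypothetical choices for illustration; numeric attributes score `1 - |q - c| / range` and categorical attributes score 1 on an exact match.

```python
# Hypothetical case base, weights, and attribute ranges for illustration.
CASE_BASE = [
    {"name": "Hotel A", "price": 50, "stars": 3, "purpose": "leisure"},
    {"name": "Hotel B", "price": 200, "stars": 5, "purpose": "wedding"},
    {"name": "Hotel C", "price": 120, "stars": 4, "purpose": "conference"},
]
WEIGHTS = {"price": 1.0, "stars": 1.0, "purpose": 2.0}
RANGES = {"price": 200, "stars": 4}  # numeric attributes and their value spans


def similarity(query, case, weights, ranges):
    """Weighted global similarity in [0, 1] between a query and a stored case."""
    total = 0.0
    for attr, w in weights.items():
        q, c = query[attr], case[attr]
        if attr in ranges:  # numeric attribute: 1 - normalized distance
            local = 1.0 - abs(q - c) / ranges[attr]
        else:  # categorical attribute: exact match or nothing
            local = 1.0 if q == c else 0.0
        total += w * local
    return total / sum(weights.values())


def retrieve(query, case_base, weights, ranges, k=1):
    """Return the k most similar cases (k-nearest-neighbor retrieval)."""
    return sorted(case_base,
                  key=lambda case: similarity(query, case, weights, ranges),
                  reverse=True)[:k]
```

For a query such as a 180-unit, five-star wedding venue, `retrieve(query, CASE_BASE, WEIGHTS, RANGES)` returns Hotel B (similarity 0.975). Raising the `purpose` weight makes the trip's purpose dominate price and rating.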

MERAL Portal

Evaluation of Binarization Methods for Aged Printed Myanmar Documents

Author: Phyo Aye Su
Tun Khin Nwe Ni
Publication date: 25/02/2016

Binarization is one of the sub-phases of the preprocessing step of optical character recognition (OCR). Binarization separates the foreground text from the background of a document image, and the accuracy of OCR relies mainly on its result. This paper compares several binarization algorithms for aged printed Myanmar documents. The algorithms evaluated are global thresholding (Otsu) and local thresholding (Niblack, Sauvola, Wolf, Feng, and Nick). The binarized images were found to be more stable when filters (Wiener and Gaussian) are applied before the binarization algorithms. Local thresholding was also found to suit aged Myanmar documents; among the local methods, Niblack, Sauvola, and Wolf are the most suitable according to the experimental results. The quality of the binarized images is assessed with measures such as mean square error (MSE), signal-to-noise ratio (SNR), and peak signal-to-noise ratio (PSNR). This work aims to achieve high recognition accuracy, with the main objective of developing OCR for aged printed Myanmar documents.
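
To make the global-versus-local distinction concrete, here is a minimal pure-Python sketch of Otsu's global threshold, Sauvola's local threshold $T = m\,(1 + k\,(s/R - 1))$ computed over a sliding window, and PSNR. The window size of 15 is an assumption; $k = 0.5$ and $R = 128$ follow Sauvola's commonly cited defaults. The paper's own parameter settings are not stated here. Images are lists of rows of 8-bit gray values, with dark text mapped to 0.

```python
import math


def otsu_threshold(pixels):
    """Global threshold maximizing between-class variance over a 256-bin histogram."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total, sum_all = len(pixels), sum(i * h for i, h in enumerate(hist))
    sum_b, w_b, best_t, best_var = 0.0, 0, 0, -1.0
    for t in range(256):
        w_b += hist[t]
        sum_b += t * hist[t]
        w_f = total - w_b
        if w_b == 0:
            continue
        if w_f == 0:
            break
        m_b, m_f = sum_b / w_b, (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t  # pixels <= best_t are treated as text


def sauvola_binarize(image, window=15, k=0.5, r=128.0):
    """Local threshold T = m * (1 + k * (s / r - 1)) from each pixel's window."""
    h, w = len(image), len(image[0])
    half = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [image[j][i]
                    for j in range(max(0, y - half), min(h, y + half + 1))
                    for i in range(max(0, x - half), min(w, x + half + 1))]
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            threshold = mean * (1 + k * (std / r - 1))
            out[y][x] = 255 if image[y][x] > threshold else 0
    return out


def psnr(reference, result, max_value=255.0):
    """Peak signal-to-noise ratio in dB, derived from the mean square error."""
    diffs = [(a - b) ** 2 for ra, rb in zip(reference, result) for a, b in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)
```

The naive window scan is quadratic in the window size. Production implementations use integral images to compute the local mean and standard deviation in constant time per pixel.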

MERAL Portal