4 research outputs found

    A Comparative Study of Dimensionality Reduction Techniques to Enhance Trace Clustering Performances

    Technology Management / Information System / Entrepreneurship
    Process mining aims at extracting useful information from event logs. Recently, several organizations, such as high-tech companies, hospitals and municipalities, have used process mining techniques to improve their processes. Real-life process logs from such organizations are usually very large and complicated, since they generally contain numerous activities executed by many employees. Furthermore, many real-life process logs yield spaghetti-like process models because of the complexity of the underlying processes. Traditional process mining techniques have problems discovering and analyzing real-life process logs that come from less structured processes. To overcome these weaknesses, trace clustering has been developed. Trace clustering splits an event log into several subsets, each containing homogeneous cases. Although trace clustering is useful for handling complex process logs, it is time-consuming and computationally expensive because of the large number of features generated from complex logs. In this thesis, we applied dimensionality reduction (preprocessing) techniques to trace clustering in order to reduce the number of features. To validate our approach, we conducted experiments to discover relationships between dimensionality reduction techniques and clustering algorithms, and we performed a case study involving patient treatment processes in a hospital. Among the many dimensionality reduction techniques available, we used three: singular value decomposition (SVD), random projection and principal component analysis (PCA). The results show that trace clustering with dimensionality reduction produces higher average fitness values. Furthermore, the processing time of trace clustering is effectively reduced by dimensionality reduction.
Moreover, we measured the similarity between clustering results to observe how much the results change when dimensionality reduction techniques are applied. The similarity differs depending on the clustering algorithm used.
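To make the pipeline concrete, here is a minimal Python/NumPy sketch built on assumptions that are not taken from the abstract: each trace is represented by a bag-of-activities count profile, plain k-means stands in for the clustering algorithms compared in the thesis, and the Rand index serves as the similarity measure between two clusterings. All function names and the toy log are hypothetical.

```python
from itertools import combinations

import numpy as np


def activity_profiles(traces, activities):
    """Bag-of-activities profile: one row per trace, one count column per activity."""
    index = {a: j for j, a in enumerate(activities)}
    X = np.zeros((len(traces), len(activities)))
    for i, trace in enumerate(traces):
        for act in trace:
            X[i, index[act]] += 1
    return X


def svd_reduce(X, k):
    """Truncated SVD: keep the k strongest singular directions (no centring)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]


def pca_reduce(X, k):
    """PCA: centre the profiles, then project onto the top-k principal axes."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def random_projection(X, k, seed=0):
    """Gaussian random projection to k dimensions, entries scaled by 1/sqrt(k)."""
    rng = np.random.default_rng(seed)
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(X.shape[1], k))
    return X @ R


def kmeans(Z, k, iters=100):
    """Plain k-means seeded with the first k rows (deterministic, for illustration only)."""
    centroids = Z[:k].astype(float)
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(iters):
        dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            Z[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels


def rand_index(a, b):
    """Fraction of trace pairs on which two clusterings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical toy event log: two families of traces over disjoint activities.
traces = [list("abc"), list("xyz"), list("abcc"), list("xyzz"), list("abc"), list("xyz")]
activities = sorted({a for t in traces for a in t})
X = activity_profiles(traces, activities)

labels_raw = kmeans(X, 2)
labels_pca = kmeans(pca_reduce(X, 2), 2)
print(labels_raw, labels_pca, rand_index(labels_raw, labels_pca))
```

Swapping `pca_reduce` for `svd_reduce` or `random_projection` gives the other two reduction variants named in the abstract. On real logs the profiles would be far wider than six columns, which is where reducing them before clustering would be expected to save time.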

    Ontology development from the encyclopaedic organization of knowledge

    The global network, the Internet, is rapidly transforming into a semantic web, moving from linking documents to linking data; that is, today's web portals with classic information and knowledge bases are becoming linked data in a global cloud (cloud computing). Comparing the organisational structure of traditional paper encyclopedias with that of encyclopedias in the web environment reveals certain differences, arising from the different types of media, that enable new search functionalities. The changes the encyclopedic work is undergoing require a new way of modelling the organisation of encyclopedic knowledge in the web environment, grounded in an analysis of the specific structure of the encyclopedic article and respecting the basic tenets of the Semantic Web, the principles of ontology design and the needs of users, so as to ensure its accessibility and better usability. To preserve the usefulness of the encyclopedia amid today's online information explosion, the ability to represent its content in a meaningful (semantic) way in the web environment must be improved. The aim of this doctoral dissertation is to investigate which elements of the encyclopedic organisation of knowledge can support ontology development, and to develop a method for generating an ontology from encyclopedically organised knowledge. Through a literature review, an analysis of similar ontological models from selected international projects, and an analysis of the knowledge stored in biographical articles of the Croatian Encyclopedia (HE) in the field of Croatian literature, an ontological model was proposed that effectively describes the encyclopedic knowledge of this field. Content analysis of 1,170 selected HE articles and the METHONTOLOGY method were applied. The Protégé software was used to develop the ontology.
The class hierarchy of the ontology is developed with the FCA approach, and the LSA method is applied to determine a concept as a set of related terms and the membership of individual documents (articles) in that concept, which enables the automatic classification of articles into individual ontology classes. The developed ontology will serve for organising, searching and browsing the knowledge of the online HE in the selected field of Croatian literature, as well as for obtaining precise answers to complex questions. It covers the large number of relations needed to describe the output of the writers of individual national literatures, their biographies, their mutual relations, the relations between individual literary works, and all the knowledge contained in biographical encyclopedic articles in the field of literature, which makes it possible to describe the literature of any nation. The resulting ontology enables interoperability and linking with other structured sources of encyclopedic knowledge on the Semantic Web (e.g. DBpedia), which will make it possible to connect the relevant and rich knowledge of the HE to the "global network of knowledge" that is emerging and developing through Semantic Web projects.
The global network, the Internet, is rapidly transforming into a semantic network, moving from connecting documents to connecting data; that is, today's Web portals with classic information databases are becoming linked data of global (cloud) computing. By comparing the organisational structures of traditional paper encyclopedias with those of electronic encyclopedias in the Web environment, one can notice certain differences, arising from the different types of media, that enable new search functionalities.
The above-mentioned changes that an encyclopedic work is undergoing demand a new way of modelling the organisation of encyclopedic knowledge in the Web environment, founded on an analysis of the specific structure of the encyclopedic article and respecting the basic tenets of the Semantic Web, the principles of ontology design and the needs of users, so that availability and usability are ensured. To preserve the usefulness of encyclopedias in today's Web information explosion, their content needs to be presented in a meaningful (semantic) way in the Web environment. Semantic interoperability requires an infrastructure that enables machine interpretation of, and reasoning about, content on the Web. The key concept of the Semantic Web is therefore the ontology, the basic component for enabling semantic interoperability. The aim of this doctoral thesis is to find out which elements of encyclopedic knowledge organisation can support the development of an ontology, and to develop a method by which an ontology can be generated from encyclopedically organised knowledge. The developed ontology will be used for organising, searching and browsing the data of the Web edition of the Croatian Encyclopedia in the selected field of Croatian literature, as well as for obtaining precise answers to queries. The introductory part of this work presents its starting points, goals and methods, as well as the structure of the entire work. The second chapter gives a theoretical presentation and clarification of the Semantic Web, with the aim of providing a full understanding of it.
The goal of this chapter is to present the basic theoretical and technical background of the Semantic Web. It explains the meaning of the term Semantic Web, the basic difference between the Web we know today and its development toward the Semantic Web, the disadvantages of today's Web and the advantages of the Semantic Web, and the basic terms and architecture of the Semantic Web, and it reviews the basic definitions of ontology together with its main goal and role. The chapter gives a detailed review of the basic languages used on the Semantic Web, with actual examples (RDF, RDFS, OWL and SKOS). Some of the more important Semantic Web projects are presented in the third chapter. In selecting them, the aim was to choose projects significant for a better understanding of the Semantic Web in the field of encyclopedics. Therefore, one part of this chapter presents ontologies that are indispensable for understanding the Semantic Web, and the other part reviews ontological projects built entirely on encyclopedic knowledge. This analysis of existing encyclopedic ontological projects shows that no previous project has attempted to link the development of an ontology and its constructive elements to the structural organisation of the encyclopedic article by examining how significant the article's individual structural elements are for ontology development. The fourth chapter is an introduction to the development of the ontological model of literature and to the basic settings of the Protégé software. Elements of standards and ontology languages (i.e. the RDFS, OWL and SKOS vocabularies) are presented and applied in developing the ontology of the Croatian Encyclopedia in the field of Croatian literature.
The chapter points out the possibility of achieving interoperability, both for individual ontological resources and for the ontology as a whole, with existing ontological projects on the Semantic Web, which will allow the relevant and rich knowledge of the Croatian Encyclopedia to be connected to the "global network of knowledge" that is emerging and developing through Semantic Web projects. The fifth chapter gives an insight into the historical development of encyclopedias worldwide, so that readers can fully grasp the context through which the encyclopedia had to pass in order to gain the familiar features of a modern encyclopedic work. The chapter gives basic information about the development of the central Croatian lexicographic institution, the Miroslav Krleža Institute of Lexicography (LZMK), which carries out lexicographic and encyclopedic work of particular interest to the Republic of Croatia. It is shown that this doctoral thesis can contribute, through some particular elements, to realising the mission and vision of LZMK. The publishing of LZMK's general and specialised encyclopedic editions is presented, covering both print editions and those in the Web environment. An analysis of the Web editions found that they were mostly built by adapting the traditional organisation of encyclopedic knowledge from paper media to the Web environment. That is why this doctoral thesis investigates what benefits LZMK, as well as the users of these valuable knowledge sources, would gain from applying the ontological principles of the Semantic Web. The sixth chapter presents the structural organisation of encyclopedic articles, reviewing the basic types of encyclopedic articles and especially the features of the biographical encyclopedic article, which is the basis of the research in this work. An analysis was made of the types of data that encyclopedic biographical articles contain. This allowed the basic ontological layers, i.e. facets, to be established, by which the ontological relations are classified. The seventh chapter identifies the constructive ontological elements of the encyclopedic biographical article and shows the methodology used in developing the ontology, as well as the resulting conceptual taxonomy and ontological relations. The chapter shows the role of the structural elements of the encyclopedic biographical article in ontology development and connects them to the corresponding constructive ontological elements. The results of the research are presented through ontological modules and their associated ontological relations, which can be used to describe a given term. The structure and sequence of the elements of the encyclopedic biographical article are shown, together with the developed ontological features by which all types of information in the article can be stored; from these, constant elements were determined that can be used to develop article infoboxes, which are suitable for a quick overview of the most important information in individual articles. The final results are shown through the application of the resulting ontology to the description of an individual, through the possibility of posing complex semantic queries over the unstructured data of encyclopedic biographical articles, and through the possibility of organising the browsing of encyclopedic data, which has not been possible until now. The eighth chapter explains the FCA approach applied while building the ontology, by which a set of conceptual features was established that define the terms in the ontology, so that the terms could be classified into a hierarchy. Important definitions are emphasised to clarify the place of formal concept analysis in the methodology of ontology creation. An actual example of applying the FCA approach is shown on 37 articles on Croatian literature in the Croatian Encyclopedia, together with the transformation of the resulting grid into the formal language of first-order logic.
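Formal concept analysis derives a concept hierarchy from a cross-table of objects and attributes. The following Python sketch enumerates all formal concepts of a tiny context by brute force, closing every attribute subset. The writers and attributes are hypothetical placeholders, not data from the Croatian Encyclopedia, and practical FCA tools use more efficient algorithms (for example NextClosure) rather than this exponential enumeration.

```python
from itertools import combinations


def common_attributes(objects, context, attributes):
    """Derivation A': attributes shared by every object in `objects`."""
    return frozenset(a for a in attributes if all(a in context[o] for o in objects))


def common_objects(attrs, context):
    """Derivation B': objects that have every attribute in `attrs`."""
    return frozenset(o for o, has in context.items() if attrs <= has)


def formal_concepts(context):
    """All (extent, intent) pairs, found by closing every attribute subset (exponential)."""
    attributes = sorted(set().union(*context.values()))
    concepts = set()
    for r in range(len(attributes) + 1):
        for combo in combinations(attributes, r):
            extent = common_objects(frozenset(combo), context)
            concepts.add((extent, common_attributes(extent, context, attributes)))
    return concepts


def is_subconcept(c1, c2):
    """Hierarchy order: c1 <= c2 iff the extent of c1 is contained in the extent of c2."""
    return c1[0] <= c2[0]


# Hypothetical context: which (made-up) writer has which attribute.
context = {
    "writer_A": {"poet", "novelist"},
    "writer_B": {"poet", "dramatist"},
    "writer_C": {"novelist"},
}

for extent, intent in sorted(formal_concepts(context), key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

Ordering the concepts with `is_subconcept` yields the concept lattice, which is the kind of class hierarchy the chapter builds; in this toy context the concept ({writer_A}, {novelist, poet}) sits below both the "novelist" and the "poet" concepts.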
The eighth chapter also shows the advantages of applying FCA: it generates new and unfamiliar terms that could hardly be established by manual ontology building alone, because domain-specific texts contain no noun phrases that label these new terms. The ninth chapter addresses the problems of automated indexing and information retrieval. It presents the LSA method theoretically and applies it to encyclopedic articles of the Croatian Encyclopedia, with the goal of assessing its efficiency and purpose in building the ontology of a given domain. Applying LSA to the chosen articles shows its usefulness in determining a concept as a set of related terms and the membership of individual articles (documents) in that concept; based on the word forms that the selected articles contain, this allows automatic classification of articles into individual ontological classes. The tenth and final chapter is a conclusion that combines the theoretical and practical parts of the work, giving a short review of the research results and showing the possibility of establishing interoperability and connecting the Linked Data of Croatian encyclopedics with other structured sources of encyclopedic knowledge on the Web (e.g. DBpedia, Freebase).
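The LSA-based classification step can be illustrated with a small Python/NumPy sketch under assumptions of our own: raw term counts (no tf-idf or other weighting), a rank-k truncated SVD, and classification by cosine similarity between each article and a short seed text per ontology class, folded into the latent space. The articles, class names and seed texts are hypothetical.

```python
import numpy as np


def term_document_matrix(docs):
    """Raw counts: rows = terms, columns = documents."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc.split():
            A[index[w], j] += 1
    return A, index


def fold_in(text, index, U_k, s_k):
    """Map a new text into the k-dimensional latent space: q^T U_k S_k^{-1}."""
    q = np.zeros(U_k.shape[0])
    for w in text.split():
        if w in index:
            q[index[w]] += 1
    return (q @ U_k) / s_k


def cosine(a, b):
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / norm) if norm > 0 else 0.0


def lsa_classify(docs, class_seeds, k=2):
    """Assign each document to the class whose folded-in seed is most similar."""
    A, index = term_document_matrix(docs)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k].T  # row j of V_k = latent coordinates of doc j
    seeds = {c: fold_in(t, index, U_k, s_k) for c, t in class_seeds.items()}
    return [max(seeds, key=lambda c: cosine(V_k[j], seeds[c])) for j in range(len(docs))]


# Hypothetical mini-articles and class seeds.
articles = [
    "poet poem verse",
    "poem verse sonnet",
    "novel prose chapter",
    "prose novel fiction",
]
print(lsa_classify(articles, {"poetry": "poem verse", "prose": "novel prose"}))
```

Because the rows of `V_k` and the folded-in seeds live in the same latent coordinates, "sonnet" and "fiction", which never appear in a seed, still pull their articles toward the right class through co-occurring terms.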

    Spotting Topics with the Singular Value Decomposition

    The singular value decomposition (SVD) has been studied in the past as a tool for detecting and understanding patterns in a collection of documents. We show how the matrices produced by the SVD calculation can be interpreted, allowing us to spot patterns of characters that indicate particular topics in a corpus. A test collection consisting of two days of AP newswire traffic is used as a running example.
    1 Introduction
    We address the question of how to analyze a large collection of documents to see what topics are discussed (or perhaps just mentioned) in that collection. We assume that the collection is large, perhaps on the order of hundreds of megabytes of text or more. The documents may have been written by many people, perhaps in different languages. New documents may be added to the collection at any time, and we will not know in advance exactly which topics occur in the collection. We assume that a "topic" can be characterized by some set of words or phrases, which by th..
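A minimal Python/NumPy sketch of the general idea, assuming character trigrams taken within words as features and reading each right singular vector as a candidate topic through its largest-magnitude entries. The excerpt does not state the paper's exact features or weighting, so this is an illustration of the technique, not the authors' procedure; the documents are made up.

```python
import numpy as np


def char_ngrams(text, n=3):
    """Character n-grams taken within each whitespace-separated word."""
    grams = []
    for word in text.lower().split():
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def ngram_document_matrix(docs, n=3):
    """Counts: rows = documents, columns = character n-grams."""
    vocab = sorted({g for d in docs for g in char_ngrams(d, n)})
    index = {g: i for i, g in enumerate(vocab)}
    A = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for g in char_ngrams(d, n):
            A[i, index[g]] += 1
    return A, vocab


def spot_topics(docs, k=2, top=3, n=3):
    """For each of the k strongest singular vectors, list its top-weighted n-grams."""
    A, vocab = ngram_document_matrix(docs, n)
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    topics = []
    for row in Vt[:k]:
        order = np.argsort(-np.abs(row))[:top]  # sign of a singular vector is arbitrary
        topics.append([vocab[j] for j in order])
    return topics, s


# Hypothetical mini-corpus with two topics.
docs = ["election vote", "vote count election", "election vote", "storm rain"]
topics, singular_values = spot_topics(docs)
print(topics, singular_values)
```

The larger singular value belongs to the more heavily represented topic, so its trigrams ("ele", "vot", ...) surface first; the smaller "storm rain" topic appears in the second component.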