
4 research outputs found

DCC Digital Curation Manual: Instalment on Ontologies

Author: Doerr Martin
Publication venue: HATII, University of Glasgow; University of Edinburgh; UKOLN, University of Bath; Council for the Central Laboratory of the Research Councils.
Publication date: 28/01/2008

Instalment on the role of ontologies within the digital curation life-cycle. It describes the increasingly important role of ontologies in digital curation, some practical applications, the topic's place within the OAIS reference model, and advice on developing institution-specific selection frameworks.

Edinburgh Research Archive

Information Needs for Decision-making in Crises

Author: Yliniemi Terhi
Publication date: 11/08/2004

The aim of this study is to describe the information needs of decision-makers in the Finnish Defence Forces. The research problem is studied from the perspective of an international crisis in which the Finnish Defence Forces have to co-operate with other authorities in order to secure the functions vital to society. The results describe the information needed in decision-making. This is a preliminary study for the decision support system (DSS) project of the Finnish Defence Forces, so part of the results consists of preliminary requirements for the DSS. Today the threat scenarios against the vital functions of the state are diverse and have expanded beyond classical military threats. Operating in such situations requires the Finnish Defence Forces to co-operate closely with other authorities, and decisions often have to be made jointly. The problem is that a huge amount of information is often available to decision-makers, so obtaining only the essential information is critical. The main research problem of this study is to define the information that decision-makers on the strategic level of the Finnish Defence Forces need in crises. The study is hermeneutical and increases understanding of the research topic. It was conducted using an action-oriented research approach. The data was collected by interviewing fifteen decision-makers of the Finnish Defence Forces. The interviews were semi-structured and the data was qualitative. The analysis was done by induction, so that the results can be generalized to all decision-making situations in national crises. Decision-making on the strategic level of the Finnish Defence Forces is forward-looking. The main result of this study is that decision-makers mostly need information related to the future, for example estimates of future events. They also need information about past and current situations. The DSS should support decision-making by providing relevant information, and it should enable proactive decision-making by giving time to prepare for decisions. Getting relevant information is not enough; it is important to achieve an understanding of the situation as a whole. The social structures for communication should be created in advance. A good way to create these structures, and ways of co-operating, is for the Finnish Defence Forces to arrange exercises with other authorities. Other Finnish authorities can benefit from the results of this study as well.

The aim of this study is to increase understanding of the information needs of decision-makers at the strategic-operational level of the Defence Forces. The research problem is examined in national crisis situations in which the Defence Forces co-operate with other authorities to secure the vital functions of society and the living conditions of citizens. The results describe the information that a Defence Forces decision-maker must have in order to act as the situation requires. Because the study serves as a preliminary study for the Defence Forces' decision support system project, the results also present preliminary customer requirements for the decision support system. The crisis situations that threaten society today are diverse. Preparing for crises and protecting the vital functions of society require seamless co-operation and joint decision-making between the Defence Forces and other authorities. Obtaining the essential information to support decision-making is critical to operations. The main research problem is to determine what information decision-makers at the strategic-operational level of the Defence Forces need in crisis situations. In its conception of science the study is hermeneutic, aiming to increase understanding of its subject. The study was carried out mainly using the action-analytical research approach. The empirical data was collected by interviewing fifteen people who work or have worked at the strategic-operational level of the Defence Forces. The interviews were conducted as thematic interviews and the data was analyzed by induction. The data is qualitative. Through induction the data was organized into results under six themes, which moved the results from individual cases to a more generalizable level. The result of the study is that decision-makers at the strategic-operational level of the Defence Forces need above all future-oriented information, such as estimates, to support their decisions. In addition to information about the future, decision-makers also need information about the past and the present. The decision support system should enable proactive leadership by giving the decision-maker time to prepare for decisions. In a decision-making situation, achieving situational understanding is essential. In situations that require co-operation, communication structures should be established in advance through exercises so that the co-operation has the conditions to succeed. Besides the Defence Forces, the authorities that co-operate with them are likely to benefit from the study.

Trepo - Institutional Repository of Tampere University

Text classification

Author: Mähönen Mika
Publication date: 05/06/2013

The purpose of this master's thesis was to study text classification and keyword extraction. The most important single factor in this examination is the structure of the data, which the thesis uses to justify the need for classification. Two main methods are available for finding information: extracting information from unstructured data, and adopting structured data, that is, metadata. The thesis presents these methods carefully and discusses their strengths and weaknesses. The conclusion was that both methods are needed as part of a comprehensive content management solution. The keywords recorded from a piece of content, and its classification, can be thought of as observations available about that content. The purpose of these observations is to condense the text so that a document is easier to find. Classification and keyword collection have traditionally required human work, because text requires interpretation. This is also why people still perform keyword extraction and classification. Automating this process can improve the efficiency of many information systems and save time when large numbers of text documents are processed. The thesis studies the topic by presenting the operations needed for text classification and keyword extraction. The study is divided into natural language processing (NLP) methods and classification algorithms. The task of NLP techniques is to remove challenges related to comparing character strings in computer memory. Of these techniques, the thesis presents language identification, splitting text into tokens, reducing words to their base form, concept modeling and feature selection. Of the classification algorithms, the thesis examines naive Bayes classification and decision trees. A practical example of these algorithms is also given, confirming the presented theory in practice. During the study a few limitations were observed in classification systems. The first is that a classification system lacks the levels of abstraction characteristic of humans, so a computer cannot, for example, connect the words car and vehicle. The second is that the position of words in the text is not taken into account. Despite these limitations, many algorithms work quite well in practice, which has also been verified in several scientific publications. Classification and keyword collection were also studied in a practical setting at a banking and insurance company operating in Finland. The project used IBM's Content Classification Module, which was deployed in the customer's environment. For this project, the thesis presents the experience gained and a few suggestions for improving the current system. Based on that experience, the product was found to be usable for text classification and keyword extraction.

The purpose of this master's thesis was to study text classification and keyword extraction methods. Data structure is the most important factor when one considers how important information can be located in a vast amount of data. There are two ways to approach locating relevant information: the first relies on unstructured data and the second on structured information, which is known as metadata. These methods are introduced carefully, with their advantages and disadvantages, to argue why classification and keywords are needed in data warehouses. The conclusion of this study was that both approaches are required as part of a comprehensive content management solution. Keywords and text classification can be seen as a limited number of observations of the text content. In fact, the purpose of keywords and text classification is to provide all the necessary information. This information can then be used to locate documents that satisfy our information needs. Classification and keyword extraction have traditionally required human interpretation, known as cognition, which computers do not have. Cognition has been the main reason why humans are still required in this process. Automating the process could enhance the functionality of many computer systems and save time when processing large amounts of data. This matter is studied by introducing the operations that are required to classify a text document and extract its keywords. The subject is divided into natural language processing and text classification algorithms. The aim of natural language processing is to remove challenges that arise from comparing character strings in the memory of a computer. The following natural language techniques were studied: language recognition, text tokenization, lemmatization, stemming, concept modeling and feature selection algorithms. The thesis introduces two classification algorithms, naive Bayes and decision trees, and gives an example of each to demonstrate the theory in practice. The conclusion of this study was that the studied text algorithms have a few limitations. The first limitation is that computers do not have a human-like understanding of the words that occur in a text. For example, humans automatically connect the word car to vehicle, while computers do not. The second limitation is that word position in the text is not taken into account. Despite these limitations, classification algorithms work relatively well, as many scientific studies and publications have shown. Keyword extraction and text classification were also studied in practice. This part of the study was carried out for a company that operates in the insurance and banking sector in Finland. During the project IBM's Content Classification Module was taken into use. The conclusion of the project was that the studied product works very well in practice. Based on this project, a few improvements were found, and they are being introduced to the customer.
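As an illustration of the naive Bayes classification mentioned in this abstract, the following is a minimal sketch and not the thesis's own implementation. It assumes a multinomial model with add-one (Laplace) smoothing, whitespace tokenization as a stand-in for the tokenization and lemmatization steps, and a tiny hypothetical training set; the names `tokenize`, `train`, `classify` and `TRAINING` are invented for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real NLP preprocessing)."""
    return text.lower().split()


def train(labelled_docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_docs:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log posterior under naive Bayes."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen class/word pairs from zeroing the score.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data, invented for this sketch.
TRAINING = [
    ("car insurance claim after accident", "insurance"),
    ("home insurance policy renewal", "insurance"),
    ("loan application and interest rate", "banking"),
    ("savings account interest statement", "banking"),
]

model = train(TRAINING)
print(classify("insurance claim for a car", model))
print(classify("interest rate on a loan", model))
```

The sketch also shows both limitations named in the abstract: a query containing only the word vehicle matches nothing learned from car, and because the model treats a document as a bag of words, word order has no effect on the result.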

Trepo - Institutional Repository of Tampere University

Formal ontology for subject

Author: Christopher A. Welty, Jessica Jenkins
Publication date: 01/01/1999

Subject-based classification is an important part of information retrieval, and has a long history in libraries, where a subject taxonomy was used to determine the location of books on the shelves. We have been studying the notion of subject itself, in order to determine a formal ontology of subjects for a large-scale digital library card catalog system. Deep analysis reveals considerable ambiguity in how subjects are used in existing systems and terminology, and we attempt to formalize these notions into a single framework for representing them.

CiteSeerX