    Distinguishing Noise and Main Text Content from Web-Sourced Plain Text Documents Using Sequential Neural Networks

    Boilerplate removal and the identification of the actual textual content is a crucial step in web corpus creation. However, existing methods do not always filter out the noise perfectly and are often not applicable to plain text corpora. In this thesis, I will develop machine learning methods to identify the main textual content in plain text documents. I will utilize transfer learning and pretrained language models as a base for training monolingual models with French and Swedish data as well as a multilingual model with French, Swedish, English, Finnish, German and Spanish data. I will compare two machine learning architectures based on the XLM-RoBERTa language model: first, a classification model built on top of the pretrained XLM-RoBERTa model, and second, a model using an additional Long Short-Term Memory (LSTM) network layer. I will show that the LSTM layer improves the classification of the XLM-RoBERTa model and that the resulting multilingual model performs well even with data in unseen languages. I will perform a further analysis of the results and show that the results of boilerplate detection with the trained models differ between text varieties. Certain types of text documents, such as lyrical texts or discussion forum texts, pose challenges in boilerplate detection, and it would be beneficial for future research to focus on gathering data that has been difficult to clean.

    The spindle assembly checkpoint as a drug target - Novel small-molecule inhibitors of Aurora kinases

    Cell division (mitosis) is a fundamental process in the life cycle of a cell. Equal distribution of chromosomes between the daughter cells is essential for the viability and well-being of an organism: loss of fidelity of cell division is a contributing factor in human cancer and also gives rise to miscarriages and genetic birth defects. For maintaining the proper chromosome number, a cell must carefully monitor cell division in order to detect and correct mistakes before they are translated into chromosomal imbalance. For this purpose an evolutionarily conserved mechanism termed the spindle assembly checkpoint (SAC) has evolved. The SAC comprises a complex network of proteins that relay and amplify mitosis-regulating signals created by assemblages called kinetochores (KTs). Importantly, minor defects in SAC signaling can cause loss or gain of individual chromosomes (aneuploidy), which promotes tumorigenesis, while complete failure of the SAC results in cell death. The latter event has raised interest in the discovery of low molecular weight (LMW) compounds targeting the SAC that could be developed into new anti-cancer therapeutics. In this study, we performed a cell-based, phenotypic high-throughput screen (HTS) to identify novel LMW compounds that inhibit SAC function and result in loss of cancer cell viability. Altogether, we screened 65 000 compounds and identified eight that forced the cells prematurely out of mitosis. The flavonoids fisetin and eupatorin, as well as the synthetic compounds termed SACi2 and SACi4, were characterized in more detail utilizing versatile cell-based and biochemical assays. To identify the molecular targets of these SAC-suppressing compounds, we investigated the conditions in which SAC activity became abrogated. Eupatorin, SACi2 and SACi4 preferentially abolished the tension-sensitive arm of the SAC, whereas fisetin also lowered the SAC activity evoked by lack of attachments between microtubules (MTs) and KTs.
Consistent with the abrogation of SAC in response to low tension, our data indicate that all four compounds inhibited the activity of Aurora B kinase. This essential mitotic protein is required for correction of erratic MT-KT attachments, normal SAC signaling and execution of cytokinesis. Furthermore, eupatorin, SACi2 and SACi4 also inhibited Aurora A kinase that controls the centrosome maturation and separation and formation of the mitotic spindle apparatus. In line with the established profound mitotic roles of Aurora kinases, these small compounds perturbed SAC function, caused spindle abnormalities, such as multi- and monopolarity and fragmentation of centrosomes, and resulted in polyploidy due to defects in cytokinesis. Moreover, the compounds dramatically reduced viability of cancer cells. Taken together, using a cell-based HTS we were able to identify new LMW compounds targeting the SAC. We demonstrated for the first time a novel function for flavonoids as cellular inhibitors of Aurora kinases. Collectively, our data support the concept that loss of mitotic fidelity due to a non-functional SAC can reduce the viability of cancer cells, a phenomenon that may possess therapeutic value and fuel development of new anti-cancer drugs.

    ANCA-associated vasculitis : studies on clinical presentation and factors involving disease activity and outcome

    Aims. The aim of this study was to investigate the factors associated with long-term prognosis and disease activity in patients with anti-neutrophil cytoplasm autoantibodies (ANCA)-associated vasculitis (AAV). An additional aim was to define the coagulation and fibrinolysis profile of renal AAV patients. Methods. Four cohorts including patients with granulomatosis with polyangiitis (GPA), microscopic polyangiitis (MPA) and renal-limited AAV were investigated to achieve these aims. Long-term prognosis and relapses were assessed retrospectively in a Finnish cohort of 85 patients with renal biopsy-proven AAV from a single centre (Study I). The associations between chronic nasal Staphylococcus aureus carriage (CNSAC) and proteinase 3 (PR3) ANCA with relapse were studied in AAV patients who participated in two randomised controlled trials in Europe. To define nasal CNSAC status, monthly nasal swabs were obtained from 200 patients with early systemic or generalised disease during the 18-month trials. The patient was defined as a chronic carrier of Staphylococcus aureus (S. aureus) when ≥ 75% of at least four nasal cultures were positive for S. aureus (Study II). PR3-ANCA levels, which were examined via nine different enzyme-linked immunosorbent assays (ELISAs), were obtained monthly during the 18-month trial in 28 patients with early systemic GPA. PR3-ANCA peaks were identified by the highest sum of logarithmic transformation values from all assays (Study III). The coagulation profile was assessed prospectively in 21 Finnish patients with renal AAV in the active versus the remission phase of disease and further compared with that of 40 patients with other renal diseases. 
The laboratory analysis consisted of platelet count, thrombin time, antithrombin activities, fibrinogen, factor VIII activity (FVIIIC), von Willebrand factor antigen (VWF:Ag) and ristocetin cofactor activity (VWF:RCo), prothrombin fragments (F 1 + 2), D-dimer and antiphospholipid antibodies (Study IV). Results. The 5-year and 20-year patient survival rates were 88% and 45%, respectively. Older age and presence of myeloperoxidase (MPO) ANCA were significantly associated with worsened survival. The 5-year and 20-year renal survival rates were 79% and 68%, respectively. Renal survival was best in the focal class and worst in the sclerotic class of AAV glomerulonephritis. Female sex was significantly associated with better renal survival, while a glomerular filtration rate < 30 ml/min and MPO-ANCA predicted worse renal survival. Relapse-free survival at 5 years was 47% while at 20 years it was only 10%. Patients with GPA had a higher relapse risk than MPA patients (Study I). The frequency of CNSAC was 12% in the whole cohort. CNSAC was almost exclusively seen in GPA patients. In patients with generalised GPA, an association between CNSAC and relapse was observed. A similar trend towards a significant association was found in patients with early systemic GPA who were under immunosuppressive treatment (Study II). A PR3-ANCA peak corresponded to relapse. However, the PR3-ANCA peak could also be identified in non-relapsing patients, and large overlaps in PR3-ANCA values prevented drawing a distinction between relapsing patients and non-relapsing patients. The alterations of immunosuppression were reflected in PR3-ANCA levels (Study III). F 1 + 2 and D-dimer were substantially elevated during active disease. During remission, their levels decreased considerably, even though D-dimer levels remained above the reference value. FVIIIC, VWF:Ag and VWF:RCo levels were high during active AAV and remained elevated during remission.
The load of coagulopathies during remission was comparable to that of patients with other renal diseases involving at least moderate renal impairment. No antiphospholipid antibodies were found. Among AAV patients, two thromboembolic complications were observed (Study IV). Conclusions. In a long-term follow-up cohort, patient and renal survival were comparable with recent studies showing improved prognosis as compared to earlier reports. Both patient and renal survival were negatively predicted by the presence of MPO-ANCA. The development of end-stage renal disease was more common in men. In the long run, relapses were common, especially in patients with GPA. One special subgroup of individuals who were more prone to relapse among GPA patients were those with CNSAC. PR3-ANCA levels were not only affected by disease activity but also reflected the level of immunosuppressive treatment. Active renal AAV was characterised by enhanced coagulation and fibrinolysis, which failed to normalise completely during remission.
Background and aims. ANCA-associated vasculitides are a group of relatively rare diseases characterised by inflammation and necrosis of the walls of small blood vessels. They can be divided into subgroups on the basis of clinical presentation. Granulomatosis with polyangiitis (GPA) and microscopic polyangiitis (MPA) account for the majority of cases. Fluctuation between disease phases, that is, relapse of disease that has already gone into remission, is characteristic of these diseases. Patients with ANCA-associated vasculitis have an increased risk of thrombosis. The primary aim of this doctoral study was to identify factors affecting prognosis and disease activity in GPA and MPA patients. The study also examined the blood coagulation profile of patients with ANCA-associated vasculitis in relation to disease activity. Methods.
Long-term prognosis, that is, patient survival, the need for renal replacement therapy and relapse, was studied in a Finnish cohort of ANCA-associated vasculitis patients with renal disease (N=85) (Study I). Chronic nasal Staphylococcus aureus carriage (CNSAC) and its association with the risk of relapse were studied in ANCA-associated vasculitis patients (N=200) who had participated in two European treatment trials. The definition of CNSAC required that at least 75% of the bacterial cultures taken monthly from the nasal mucosa were positive for Staphylococcus aureus during the 18-month trial (Study II). The association between disease activity and PR3-ANCA antibody levels was likewise studied in GPA patients (N=28) who had participated in a European treatment trial. Nine enzyme immunoassays were used, and the highest PR3-ANCA level was determined for each assay (Study III). Blood coagulation activity was studied in a Finnish cohort consisting of ANCA-associated vasculitis patients with renal disease (N=21) and control patients with other renal diseases (N=40). The measurements comprised platelet count, plasma thrombin time, antithrombin activity, fibrinogen, factor VIII activity, von Willebrand factor antigen and ristocetin cofactor activity, prothrombin fragments, D-dimer and antiphospholipid antibodies (Study IV). Results. Patient survival was 88% at five years and 45% at twenty years. At twenty years, 32% of patients were on permanent renal replacement therapy. Patients with MPO-ANCA antibodies at diagnosis had a worse prognosis for both survival and renal function. Men ended up on renal replacement therapy more often than women. Only 10% of patients avoided relapse during the twenty-year follow-up, and relapse was more common in GPA patients (Study I).
Chronic nasal Staphylococcus aureus carriage was found in 12% of patients, most of whom had GPA. Nasal carriers had a higher risk of relapse (Study II). In relapsing patients (N=16), the peak PR3-ANCA antibody level coincided with the time of relapse. However, a similar rise in PR3 antibody levels was also seen in patients whose vasculitis remained quiescent when immunosuppressive medication was reduced (Study III). Patients with ANCA-associated vasculitis at diagnosis were found to be in a prothrombotic state. Thrombin generation and elevated D-dimer in particular predominated in active disease. Once the disease settled, the tendency to clot decreased, but factor VIII activity in particular remained elevated. During follow-up, thrombosis was diagnosed in two vasculitis patients (Study IV). Conclusions. In the long-term follow-up of Finnish patients with ANCA-associated vasculitis, an improved prognosis comparable with recent international results was observed for both mortality and renal function. Patients with MPO-ANCA antibodies at diagnosis had a worse prognosis for both survival and renal function. Men ended up on renal replacement therapy more often than women. Relapse was common in long-term follow-up, especially in GPA patients. GPA patients with chronic nasal Staphylococcus aureus carriage were particularly prone to relapse. PR3-ANCA antibody levels reflected not only disease activity but also the level of immunosuppressive medication. In active ANCA-associated vasculitis at diagnosis, blood coagulation activity was heightened and did not normalise completely once disease activity had settled.
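The chronic carrier definition used in Study II (at least 75% of at least four nasal cultures positive for S. aureus) can be illustrated with a minimal Python sketch. The function name, the boolean-list input and the choice to return None for patients with fewer than four cultures are assumptions made here for illustration; they are not taken from the study.

```python
from typing import Optional, Sequence


def is_chronic_carrier(
    cultures: Sequence[bool],
    min_cultures: int = 4,      # definition requires at least four cultures
    threshold: float = 0.75,    # definition requires >= 75% positive
) -> Optional[bool]:
    """Classify one patient from their nasal culture results.

    `cultures` holds one boolean per swab (True = positive for S. aureus).
    Returns None when there are too few cultures to classify; how the
    study handled such patients is not stated, so this is an assumption.
    """
    if len(cultures) < min_cultures:
        return None
    positives = sum(1 for result in cultures if result)
    # Compare counts rather than a rounded percentage.
    return positives >= threshold * len(cultures)
```

For example, a patient with three positive swabs out of four meets the threshold, whereas two out of four does not.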

    Changes in agriculture and their environmental impacts in the Archipelago Sea catchment area

    The report examines changes in agriculture and the associated environmental impacts and problems, together with possible solutions, in different parts of the Archipelago Sea catchment area. The study is part of the MUUSA cooperation project ("Muuttuvan maatalouden ympäristönsuojelu Saaristomeren valuma-alueella", environmental protection in changing agriculture in the Archipelago Sea catchment area), which is administered by the Southwest Finland Regional Environment Centre and was carried out with ELMA funding from the Varsinais-Suomi TE Centre. The study includes a summary of forecasts of agricultural change compiled from Finnish research reports published in the 2000s and from the results of a farmer survey published in 2006. In these sources, however, regional forecasts and results are presented only by support area (area A or B) or by TE Centre. The views of farmers in the Archipelago Sea catchment area, and any differences between its sub-areas, were investigated with a farmer survey conducted in winter 2006–2007. Ten per cent of the farmers in the area responded. The survey was carried out at environmental support training events, ship seminars and meetings of sugar beet growers. It used a form with 51 questions that asked farmers to assess the change expected within five years both on their own farm and in their home municipality. The responses were analysed by sub-area and by farmer age group and farm size. The survey results agreed well with forecasts made in the 2000s, and the responses showed no major differences between sub-areas. Clear development trends can be seen in agriculture in the Archipelago Sea catchment area. Production is concentrating regionally, and unit sizes are growing on both crop and livestock farms, as is the size of machinery. Farmers seek to reduce costs with cost-effective and labour-saving techniques. Nutrient loading and erosion are expected to decrease as the use of mineral fertilisers declines and becomes more precise, and as new cultivation techniques and increased plant cover are adopted.
The cultivation of energy and oil crops is forecast to increase. The concentration of livestock production and the high phosphorus status of fields, and the resulting shortage of land for spreading manure, cause problems especially in Vakka-Suomi. Elsewhere too, manure is forecast to be spread over ever larger areas, which lengthens transport distances and may increase, for example, odour nuisance unless the manure is treated to reduce its odour. Farmers are clearly interested in promoting environmental protection through, among other things, buffer zones, wetlands and landscape management. In general, however, there is a call for individual, farm-specific planning of suitable measures. The survey results were discussed at regional meetings and various events during 2007. On the basis of these views, proposals were made for the priorities of agricultural environmental protection in five sub-areas: 1) the archipelago, 2) Vakka-Suomi, 3) the Aurajoki area and its surroundings, 4) the Paimionjoki area and 5) the Salo and Kiskonjoki-Perniönjoki areas.

    New and poorly known Holarctic species of Boletina Staeger, 1840 (Diptera, Mycetophilidae)

    Background. The genus Boletina is a species-rich group of fungus gnats. Members of the genus are mainly known from temperate, boreal and arctic biomes. Phylogeny of the genus is still poorly resolved, dozens of species are insufficiently described and undescribed species are often discovered, especially from samples taken from the boreal zone. New information. Four new species are described: Boletina valteri Salmela sp.n. (Finland), Boletina kullervoi Salmela sp.n. (Finland), B. hyperborea Salmela sp.n. (Finland, Norway, Sweden, Canada) and B. nuortti Salmela sp.n. (Finland). Boletina arctica Holmgren is redescribed and reported for the first time from the Canadian high arctic zone. Boletina borealis Zetterstedt and B. birulai Lundström are reported for the first time from Canada. Boletina subnitidula Sasakawa (syn. n.) is proposed as a junior synonym of B. pallidula Edwards.

    Urban environment predisposes dogs and their owners to allergic symptoms

    Our companion animals, dogs, suffer increasingly from non-communicable diseases analogous to those common in humans, such as allergic manifestations. In humans, living in rural environments is associated with lower risk of allergic diseases. Our aim was to explore whether a similar pattern can be found in dogs, using a nation-wide survey in Finland (n = 5722). We characterised the land-use around the dog's home at the time of birth as well as around its current home, and described several lifestyle factors. The severity of owner-reported allergic symptoms in dogs was estimated with a comprehensive set of questions developed by experts in canine dermatology. Also, the prevalence of diagnosed allergies in dog owners was recorded. The results indicate that allergic symptoms are more prevalent in urban environments both in dog owners and in dogs (accounting for the effect of dog breed). Several factors related to rural living, such as bigger family size and regular contact with farm animals and other pets, were also protective against allergic symptoms in dogs. Interestingly, allergic dogs were more likely to have allergic owners than healthy dogs were. Therefore, we suggest that the mutual presence of allergic symptoms in both species indicates common underlying causal factors of allergic diseases.

    The Fish Value Network and Business Ecosystem in Finland

    Finland’s geography largely consists of coastline, seas and lakes, and fish has traditionally been a natural part of the local economy. A versatile ecosystem has developed around fish in Finland. Sustainable fishing, fish farming, and fish processing and trade provide not only a climate-friendly means of food production but also economic growth for various business actors. There is demand for innovative fish-based products. Finland has global expertise in operating in cold climate conditions, which makes Finnish industries and ecosystems natural partners in development cooperation aimed at adding value to Arctic fish. There are many aspects to raising the value-added of fish: developing fishing and fish handling processes, developing new user-friendly products for customers and looking for new markets for fish as a raw material. This report maps the business and innovation ecosystem of fish in Finland and illustrates the opportunities linked with raising the value-added of fish.

    Analysing intervention programmes: barriers and success factors. A systematic review

    The analyses of research into media literacy and digital skills (ML&DS) interventions presented in this report offer valuable insights into the characteristics, challenges, and factors of success for these interventions in diverse contexts and for a range of outcomes.