74 research outputs found

    Automated Identification of National Implementations of European Union Directives With Multilingual Information Retrieval Based On Semantic Textual Similarity

    The effective transposition of European Union (EU) directives into Member States is important to achieve the policy goals defined in the Treaties and secondary legislation. National Implementing Measures (NIMs) are the legal texts officially adopted by the Member States to transpose the provisions of an EU directive. The measures undertaken by the Commission to monitor NIMs are time-consuming and expensive, as they rely on manual conformity-checking studies and legal analysis. In this thesis, we developed a legal information retrieval system using semantic textual similarity techniques to automatically identify the transposition of EU directives into national law at a fine-grained provision level. We modeled and developed various text similarity approaches, including lexical, semantic, knowledge-based, embeddings-based and concept-based methods. The text similarity systems utilized both textual features (tokens, N-grams, topic models, word and paragraph embeddings) and semantic knowledge from external knowledge bases (EuroVoc, IATE and Babelfy) to identify transpositions. This thesis work also involved the development of a multilingual corpus of 43 directives and their corresponding NIMs from Ireland (English legislation), Italy (Italian legislation) and Luxembourg (French legislation) to validate the text similarity based information retrieval system. A gold standard mapping between directive articles and NIM provisions, prepared by two legal researchers, was used to evaluate the various text similarity models. The results show that the lexical and semantic text similarity techniques were more effective in identifying transpositions than the embeddings-based techniques. We also observed that the unsupervised text similarity techniques performed best on the Luxembourg Directive-NIM corpus.
We also developed a concept recognition system based on conditional random fields (CRFs) to identify concepts in European directives and national legislation. The results indicate that the concept recognition system improved over the dictionary lookup program by tagging concepts that the dictionary lookup missed. The concept recognition system was extended to develop a concept-based text similarity system using word-sense disambiguation and dictionary concepts. The performance of the concept-based text similarity measure was competitive with the best performing text similarity measure. The labeled corpus of 43 directives and their corresponding NIMs was utilized to develop supervised text similarity systems by using machine learning classifiers. We modeled three machine learning classifiers with different textual features to identify transpositions. The results show that support vector machines (SVMs) with term frequency-inverse document frequency (TF-IDF) features had the best overall performance over the multilingual corpus. Among the unsupervised models, the best performance was achieved by the TF-IDF cosine similarity model, with macro-average F-scores of 0.8817, 0.7771 and 0.6997 for the Luxembourg, Italian and Irish corpora respectively. These results demonstrate that the system was able to identify transpositions in different national jurisdictions with good performance. Thus, it has the potential to be useful as a support tool for legal practitioners and Commission officials involved in the transposition monitoring process.
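
The abstract names TF-IDF cosine similarity as the best unsupervised model but gives no implementation details. The following is a minimal sketch of the general technique only, not the thesis code: the tokenizer, the smoothed IDF formula, the function names and the sample texts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on word characters (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption): terms found in every document keep a small weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_provisions(article, provisions):
    """Rank candidate national provisions by similarity to one directive article."""
    vectors = tfidf_vectors([article] + provisions)
    scores = [cosine(vectors[0], v) for v in vectors[1:]]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical texts, invented for this example (not taken from the corpus).
ARTICLE = ("Member States shall ensure that consumers receive clear "
           "information before the contract is concluded")
PROVISIONS = [
    "The trader shall give the consumer clear information before the contract is concluded",
    "This Act may be cited as the Road Traffic Act",
    "The Minister may make regulations prescribing fees",
]
RANKING = rank_provisions(ARTICLE, PROVISIONS)
```

`RANKING` is a list of `(provision index, score)` pairs, highest score first. A retrieval system of this kind would typically keep the provisions scoring above some threshold as candidate transpositions; how that threshold was chosen is not stated in the abstract.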

    Unsupervised and supervised text similarity systems for automated identification of national implementing measures of European directives

    The automated identification of national implementations (NIMs) of European directives by text similarity techniques has shown promising preliminary results. Previous works have proposed and utilized unsupervised lexical and semantic similarity techniques based on vector space models, latent semantic analysis and topic models. However, these techniques were evaluated on a small multilingual corpus of directives and NIMs. In this paper, we utilize word and paragraph embedding models learned by shallow neural networks from a multilingual legal corpus of European directives and national legislation (from Ireland, Luxembourg and Italy) to develop unsupervised semantic similarity systems to identify transpositions. We evaluate these models and compare their results with the previous unsupervised methods on a multilingual test corpus of 43 Directives and their corresponding NIMs. We also develop supervised machine learning models to identify transpositions and compare their performance with different feature sets.
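
One common way to turn word embeddings into a text similarity score is to average the word vectors of each text and compare the averages by cosine similarity. The sketch below shows that idea only; the abstract does not say how the embeddings were combined, and the three-dimensional vectors, the vocabulary and the function names are all hypothetical.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# Real models learn hundreds of dimensions from a large corpus.
TOY_EMBEDDINGS = {
    "consumer": [0.9, 0.1, 0.0],
    "buyer": [0.8, 0.2, 0.1],
    "contract": [0.1, 0.9, 0.0],
    "agreement": [0.2, 0.8, 0.1],
    "vehicle": [0.0, 0.1, 0.9],
    "tax": [0.1, 0.0, 0.8],
}


def text_vector(text, embeddings):
    """Average the vectors of the known words; return None if no word is known."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two dense vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def embedding_similarity(text_a, text_b, embeddings=TOY_EMBEDDINGS):
    """Similarity of two texts based on their averaged word vectors."""
    u = text_vector(text_a, embeddings)
    v = text_vector(text_b, embeddings)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)


RELATED = embedding_similarity("consumer contract", "buyer agreement")
UNRELATED = embedding_similarity("consumer contract", "vehicle tax")
```

The two related phrases share no tokens, so a purely lexical measure would score them at zero, while the averaged vectors place them close together. That is the motivation for embedding-based similarity; it does not by itself imply better retrieval results on a given corpus.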

    Canadian oncogenic human papillomavirus cervical infection prevalence: Systematic review and meta-analysis

    Background: Oncogenic human papillomavirus (HPV) infection prevalence is required to determine optimal vaccination strategies. We systematically reviewed the prevalence of oncogenic cervical HPV infection among Canadian females prior to immunization.
Methods: We included studies reporting DNA-confirmed oncogenic HPV prevalence estimates among Canadian females identified through searching electronic databases (e.g., MEDLINE) and public health websites. Two independent reviewers screened literature results, abstracted data and appraised study quality. Prevalence estimates were meta-analyzed among routine screening populations, HPV-positive populations, and by cytology/histology results.
Results: Thirty studies plus 21 companion reports were included after screening 837 citations and 120 full-text articles. Many of the studies did not address non-response bias (74%) or use a representative sampling strategy (53%). Age-specific prevalence was highest among females aged < 20 years and slowly declined with increasing age. Across all populations, the highest prevalence estimates from the meta-analyses were observed for HPV types 16 (routine screening populations, 8 studies: 8.6% [95% confidence interval 6.5-10.7%]; HPV-infected, 9 studies: 43.5% [28.7-58.2%]; confirmed cervical cancer, 3 studies: 48.8% [34.0-63.6%]) and 18 (routine screening populations, 8 studies: 3.3% [1.5-5.1%]; HPV-infected, 9 studies: 13.6% [6.1-21.1%]; confirmed cervical cancer, 4 studies: 17.1% [6.4-27.9%]).
Conclusion: Our results support vaccinating females < 20 years of age, along with targeted vaccination of some groups (e.g., under-screened populations). The highest prevalence occurred among HPV types 16 and 18, contributing a combined cervical cancer prevalence of 65.9%. Further cancer protection is expected from cross-protection of non-vaccine HPV types. Poor study quality and heterogeneity suggest that high-quality studies are needed.
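
The combined figure of 65.9% is the sum of the two pooled cervical cancer estimates reported above (48.8% for HPV 16 and 17.1% for HPV 18). As a small worked check, the sketch below reproduces that sum and recovers an approximate standard error from each 95% confidence interval. It assumes the intervals are symmetric normal-approximation intervals, which the abstract does not state; the names are illustrative.

```python
Z_95 = 1.96  # normal quantile for a two-sided 95% interval (assumed interval type)

# Pooled estimates in percent, copied from the abstract: (point, lower, upper).
CERVICAL_CANCER = {
    "HPV16": (48.8, 34.0, 63.6),
    "HPV18": (17.1, 6.4, 27.9),
}


def standard_error(lower, upper, z=Z_95):
    """Approximate standard error implied by a symmetric confidence interval."""
    return (upper - lower) / (2 * z)


# Combined prevalence as reported: the simple sum of the two type-specific estimates.
COMBINED = round(sum(point for point, _, _ in CERVICAL_CANCER.values()), 1)

STANDARD_ERRORS = {
    hpv_type: standard_error(lower, upper)
    for hpv_type, (_, lower, upper) in CERVICAL_CANCER.items()
}
```

The implied standard errors are wide (roughly 7.6 and 5.5 percentage points), which is consistent with the small number of studies behind each cervical cancer estimate (3 and 4 respectively).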

    A Bayesian Approach for Forecasting Heat Load in a District Heating System

    The growing population in cities increases the energy demand and affects the environment by increasing carbon emissions. Information and communications technology solutions which enable energy optimization are needed to address this growing energy demand in cities and to reduce carbon emissions. District heating systems optimize the energy production by reusing waste energy with combined heat and power plants. Forecasting the heat load demand in residential buildings assists in optimizing energy production and consumption in a district heating system. However, the presence of a large number of factors, such as weather forecast, district heating operational parameters and user behavioural parameters, makes heat load forecasting a challenging task. This thesis proposes a probabilistic machine learning model using a Naive Bayes classifier to forecast the hourly heat load demand for three residential buildings in the city of Skellefteå, Sweden, over the winter and spring seasons. The district heating data collected from sensors installed at the residential buildings in Skellefteå are utilized to build the Bayesian network to forecast the heat load demand for horizons of 1, 2, 3, 6 and 24 hours. The proposed model is validated by using four cases to study the influence of various parameters on the heat load forecast by carrying out trace-driven analysis in Weka and GeNIe. Results show that current heat load consumption and outdoor temperature forecast are the two parameters with the most influence on the heat load forecast. The proposed model achieves average accuracies of 81.23 % and 76.74 % for a forecast horizon of 1 hour in the three buildings for winter and spring seasons respectively. The model also achieves an average accuracy of 77.97 % for three buildings across both seasons for the forecast horizon of 1 hour by utilizing only 10 % of the training data.
The results indicate that even a simple model like the Naive Bayes classifier can forecast the heat load demand with less training data.
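
To make the modelling idea concrete, here is a minimal sketch of a categorical Naive Bayes classifier that predicts a discretized next-hour heat load from the two parameters the abstract identifies as most influential: current heat load and outdoor temperature forecast. It is not the thesis model, which was built in Weka and GeNIe. The class name, the discretization into "high"/"low" and "cold"/"mild", the Laplace smoothing and the training rows are all assumptions invented for illustration.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # feature_counts[(feature index, class label)][value] -> count
        self.feature_counts = defaultdict(Counter)
        # Distinct values seen per feature, needed for the smoothing denominator.
        self.values = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.feature_counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        """Return the class with the highest log posterior for one feature row."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            for i, value in enumerate(row):
                numerator = self.feature_counts[(i, label)][value] + self.alpha
                denominator = count + self.alpha * len(self.values[i])
                score += math.log(numerator / denominator)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training rows: (current load level, outdoor temperature forecast).
ROWS = [
    ("high", "cold"), ("high", "cold"), ("high", "mild"),
    ("low", "mild"), ("low", "mild"), ("low", "cold"),
]
# Hypothetical labels: discretized heat load one hour ahead.
LABELS = ["high", "high", "high", "low", "low", "low"]

MODEL = CategoricalNaiveBayes().fit(ROWS, LABELS)
```

Calling `MODEL.predict(("high", "cold"))` returns the most probable load class for the next hour. A real forecaster would need a documented discretization of the sensor readings and one model (or one target variable) per forecast horizon; neither is specified in the abstract.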