196 research outputs found

    Microdata Deduplication with Spark

    The web is transforming from the traditional web into a web of data, where information is published in a form readable by both humans and machines. Because search engine providers have defined standards for presenting structured content, more and more websites embed structured data (descriptions of products, persons, organizations, places, and so on) in their HTML pages, using formats such as RDFa, microdata, and microformats. Microdata is the most recent of these embedding standards, yet it has gained popularity in a relatively short time. Extraction of structured data from web pages has progressed in parallel, resulting in open-source tools such as Apache Any23 and in the non-profit Common Crawl project. Any23 extracts microdata from web pages and makes it available as linked data, whereas Common Crawl extracts data from websites and provides it publicly for download. Extraction is therefore no longer the main technical challenge: these tools cover only the parsing and data transformation steps of data cleansing. Before the extracted data can be used in applications, duplicates must be removed, inconsistencies resolved, and vague values handled. Since microdata originates from arbitrary web resources, its quality is equally arbitrary and it should be treated accordingly.
    The main purpose of this master's thesis is to develop an efficient mechanism for deduplicating microdata at web scale. Although deduplication algorithms have reached relative maturity, they still need to be fine-tuned for specific datasets. In particular, the most suitable length of the sorting key must be identified for the sort-based deduplication approach. The present work identifies the optimal sorting key length in the context of deduplicating extracted product microdata. Because large volumes of data must be processed continuously, Apache Spark is used to implement the distributed deduplication algorithms.
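The role of the sorting key length can be illustrated with a small sketch. The abstract does not spell out the algorithm, so the following assumes a simplified sorted-neighbourhood variant in plain Python (not Spark): records are sorted by a key built from the first `key_length` characters of the normalised product name, and each record is compared only with the last few records already kept. The function names, the `name` field, the similarity threshold, and the sample products are all hypothetical.

```python
from difflib import SequenceMatcher


def sorting_key(record, key_length):
    """Key = first `key_length` alphanumeric characters of the lower-cased name."""
    normalised = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    return normalised[:key_length]


def similar(a, b, threshold=0.85):
    """Assumed similarity test: character-level ratio of the two names."""
    ratio = SequenceMatcher(None, a["name"].lower().strip(),
                            b["name"].lower().strip()).ratio()
    return ratio >= threshold


def deduplicate(records, key_length=5, window=3):
    """Sort by key, then compare each record with the last (window - 1) kept ones."""
    ordered = sorted(records, key=lambda r: sorting_key(r, key_length))
    kept = []
    for record in ordered:
        neighbours = kept[-(window - 1):] if window > 1 else []
        if not any(similar(record, other) for other in neighbours):
            kept.append(record)
    return kept


# Hypothetical product records, as they might look after microdata extraction.
products = [
    {"name": "Apple iPhone 6 16GB"},
    {"name": "apple iphone 6 16 gb"},
    {"name": "Samsung Galaxy S5"},
    {"name": "Samsung Galaxy S5 "},
    {"name": "Nokia Lumia 520"},
]
unique_products = deduplicate(products, key_length=5, window=3)
```

A key that is too short sorts unrelated records next to each other, while a key that is too long can separate near-duplicates that differ early in the name. That trade-off is why the key length has to be tuned per dataset.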

    Remark On Optimal Homotopy Method: Application Towards Nano-Fluid Flow Narrating Differential Equations

    This short communication is devoted to validating the reliability and convergence of the Optimal Homotopy Analysis Method (O-HAM). Given this validation, the method can be applied on a larger scale to the differential equations that describe nanofluid flow. Specifically, two problems are taken into account: the fractional-order differential equation for a non-spherical nanoparticle moving vertically in a purely viscous liquid, and an advection PDE. The corresponding homotopies are constructed for both cases, and solutions are proposed by means of O-HAM. The obtained values are compared with numerical benchmarks. We observed an excellent match, which confirms the O-HAM conjecture. It can therefore be inferred that applying O-HAM to the nanofluid flow regime may help with some problems that have not yet been attempted.

    Genetic Diversity and Traits Association in Tetraploid and Hexaploid Wheat Genotypes in Khyber Pakhtunkhwa Province of Pakistan

    Information on the magnitude of variability and on the correlation among agronomically important traits provides the basis for a successful crop improvement program. An experiment with 16 wheat genotypes (8 durum and 8 spring wheat) was conducted in the 2015-2016 crop season at The University of Agriculture, Peshawar. The experimental design was a Randomized Complete Block Design (RCBD) with three replications. The parameters studied were days to heading, days to maturity, flag leaf area, plant height, tillers m⁻², spikes m⁻², spikelets spike⁻¹, 1000-grain weight, grain yield, and harvest index. Statistically significant differences among the genotypes were observed for various traits. The durum vs. spring wheat contrast was significant for all studied parameters except flag leaf area and days to maturity. Durum wheat had a larger flag leaf area than spring wheat and took fewer days to initiate heading. In contrast, spring wheat genotypes had greater average plant height, tillers m⁻², spikes m⁻², 1000-grain weight, grain yield, and harvest index than durum wheat genotypes. Correlation analysis revealed that tillers m⁻² and spikes m⁻² had a significant positive association with grain yield, while grain yield had a significant, strong positive association with harvest index in the tested germplasm. Durum genotypes DWE3 and DWE7 performed best for yield-contributing traits, while the spring wheat varieties Janbaz, Barsat, and Shahkar outperformed the others in terms of 1000-grain weight, grain yield, and harvest index, respectively. These genotypes are recommended for further multi-location testing to check for wider adaptability and possible use in future wheat breeding programs in the area. Peer reviewed

    The importance of a medium-term budgetary framework in enhancing the sustainability of public finances

    Globalization has rapidly changed traditional systems into scientific systems, and this reform has reached every organizational process. Fiscal reforms are among the major reforms in organizations. The objective of fiscal reform is to enhance the quality of fiscal governance and public finance. Fiscal reform also consolidates budgetary resources through more disciplined fiscal policies. These disciplined fiscal policies can be supported by organizational and legislative measures, which leads to the introduction and establishment of new and independent fiscal rules. Fiscal strategies are now moving towards a multi-annual budget system, and the implementation of a multi-annual budget system has introduced the medium-term budgetary system. The basic aim of this study is to present the importance of implementing a medium-term budget system and the role of this budget system in public organizations. Fiscal management policies and tools play an important role in the sustainability of public finances in nations.

    Uncovering the Relationship of Supply Chain Management and Firm Performance: Evidence from Textile Sector of Pakistan

    The purpose of this research paper was to find the impact of supply chain management (SCM) on firm performance in textile firms in Pakistan. Data were collected through questionnaires in March 2018. Approximately 30 questionnaires were distributed among the managers of ten textile organizations in Faisalabad, who were expected to have the best knowledge of supply chain operations and their impact on organizational performance, and all of them responded. The study identifies dimensions associated with SCM practices and explains the connection among SCM practices, competitive advantage, and firm performance. It focuses on the causal relationships among SCM practices, competitive advantage, and firm performance, and ignores possible recursive relationships.