250 research outputs found

    Microdata Deduplication with Spark

    The web is transforming from the traditional web into a web of data, where information is presented so that it is readable by machines as well as humans. As part of this transformation, more and more websites embed structured data, e.g. descriptions of products, persons, organizations, and places, into their HTML pages, encouraged by search engine providers who have defined standards for presenting structured content. Different encoding formats, such as RDFa, microdata, and microformats, are used for this purpose. Microdata is the most recent addition to these standards, yet it has become popular in a relatively short time. Similar progress has been made in extracting structured data from web pages, which has resulted in open source tools such as Apache Any23 and the non-profit Common Crawl project. Any23 extracts microdata from web pages and makes it available as linked data, whereas Common Crawl extracts data from websites and provides it publicly for download. However, these extraction tools cover only the parsing and data transformation steps of data cleansing. Extraction is therefore no longer the biggest technical challenge: before the extracted data can be used in applications, duplicates must be removed, inconsistencies resolved, and the data unified. Since microdata originates from arbitrary web resources, its quality is arbitrary as well and should be treated accordingly.
    The main purpose of this thesis is to develop an effective mechanism for deduplicating microdata at web scale. Although deduplication algorithms have reached relative maturity, they still need to be fine-tuned for specific datasets. In particular, the most suitable length of the sorting key has to be identified in the sort-based deduplication approach. The present work identifies the optimum length of the sorting key in the context of deduplicating extracted product microdata. Because of the large volumes of data to be processed continuously, Apache Spark is used to implement the deduplication procedures as distributed algorithms.
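
The role of the sorting key length in sort-based deduplication can be illustrated with a minimal single-machine sketch. It is not the thesis implementation and does not use Spark: the function names, the window size, the similarity threshold, and the sample product names are all assumptions chosen for illustration. Records are sorted by the first `key_length` characters of a normalized name, and only records that end up within a small sliding window of each other are compared.

```python
from difflib import SequenceMatcher


def sorting_key(name, key_length):
    """Hypothetical key: first `key_length` alphanumeric characters, lower-cased."""
    normalized = "".join(ch for ch in name.lower() if ch.isalnum())
    return normalized[:key_length]


def sorted_neighbourhood(names, key_length=4, window=3, threshold=0.85):
    """Return index pairs of likely duplicates (illustrative parameters).

    Records are sorted by their key, then each record is compared only
    with the next `window - 1` records in sorted order.
    """
    order = sorted(range(len(names)),
                   key=lambda i: sorting_key(names[i], key_length))
    pairs = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1:pos + window]:
            similarity = SequenceMatcher(
                None, names[i].lower(), names[j].lower()).ratio()
            if similarity >= threshold:
                pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


if __name__ == "__main__":
    # Made-up product names, not data from the thesis.
    products = [
        "Apple iPhone 6 16GB",
        "apple iphone 6 16 GB",
        "Samsung Galaxy S5",
        "Samsung Galaxy S 5",
        "Nokia Lumia 520",
    ]
    print(sorted_neighbourhood(products))
```

A shorter key places more records next to each other under the same key, so a fixed window may miss some duplicates, while a longer key is more sensitive to spelling differences near the start of a name. That trade-off is the reason the key length has to be tuned for a given dataset.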

    The impact of tourism on the real estate market: the case of the world’s leading tourism destination

    Mestrado APNOR. The housing market is an important segment of the real estate market, and the evolution of house prices is related not only to macroeconomic and construction industry factors but also to tourism industry factors. The purpose of this master's dissertation is to give an overview of the Portuguese housing market and determine the fundamental factors that influence housing prices. Considering the recent high growth of tourism activity in Portugal, the World's Leading Tourism Destination, this study analyzes whether tourism has had an impact on housing prices. We focus on answering the following question: What is the impact of tourism on the real estate market? This research seeks to understand the main drivers of house prices in a European country using Engle-Granger cointegration analysis. Using a quarterly database for Portugal between 1998 and 2019, the cointegration test shows a positive long-run relationship between house prices and the inflation rate, housing permits, construction costs, loans, and tourism, while the unemployment rate has a negative long-run impact on house prices. In addition, based on an error correction model, we find that in the short run housing prices in Portugal are determined by the unemployment rate, housing permits, construction costs, and loans. We thus show that tourism affects house prices in Portugal only in the long run and has no effect on the housing market in the short run.

    Remark On Optimal Homotopy Method: Application Towards Nano-Fluid Flow Narrating Differential Equations

    This short communication validates the reliability and convergence of the Optimal Homotopy Analysis Method (O-HAM). Given this validation, the method can be applied on a larger scale to the differential equations that describe nanofluid flow. Specifically, a fractional-order differential equation for a vertically moving non-spherical nanoparticle in a purely viscous liquid and an advection PDE are considered. The corresponding homotopies for both cases are constructed, and solutions are obtained by means of O-HAM. The obtained values are compared with numerical benchmarks, and the excellent match confirms the O-HAM conjecture. Applying O-HAM to the nanofluid flow regime may therefore help with some problems that have not yet been attempted.

    Factors Effecting Corporate Cash Holding of Non-Financial Firms in Pakistan

    Previous research has explored why firms hold cash, but few studies have been conducted in developing countries such as Pakistan. A firm's need for cash is shaped by its policies on capital structure, working capital requirements, cash flow management, dividend payments, and asset management. In this paper, the impact of these factors is analyzed under the framework of Tradeoff Theory, Pecking Order Theory, and Free Cash Flow Theory. The paper focuses on determining the level of corporate cash holdings of non-financial Pakistani firms and the cash holding requirements of different industries. The data cover the period 2008-2012 and comprise 40 companies across 6 industries. The findings support the theories: firm size, net working capital, leverage, capital expenditure, and dividends significantly affect the cash holdings of non-financial firms in Pakistan.

    Analysis of Male-female Enrollment Trends in Oklahoma Vocational Education Programs During the Period 1972 - 1979

    Occupational and Adult Education

    Cloud-based Architecture of Raspberry Pi: Personal Cloud Storage

    This research explains why personal cloud storage is needed. It shows the steps to build personal cloud storage using a credit-card-sized Raspberry Pi (minicomputer), which lets users enable a cloud storage mode on their external hard drive. Cloud storage services such as Dropbox, Google Drive, and iCloud offer only a limited amount of storage, whereas this approach lets users take an external hard drive of 1 TB or more and access it from anywhere, on any device, over the internet. The second part of the research focuses on replacing the laptops that lecturers use in university classrooms to play PowerPoint slides and videos with Raspberry Pi devices. The purchase and maintenance of these laptops cost the university a lot, and just to play slides there is no need to buy a laptop that costs $300. Lecturers also have to carry the laptops from the faculty to their classes every time, and most of the time the laptops are not available. To overcome these problems, all the laptops can be replaced with a "Raspberry Pi", which costs $35 and does not need any maintenance.

    Exploring Association of Economic Ties and Social Interaction Between Minorities and Muslims in Pakistan

    The socio-political adjustment and economic well-being of minority groups have been reported as important issues in recent times in Pakistan. This study was conducted to understand the adjustment issues of the Christian and Sikh minorities in the Peshawar and Swat districts of Khyber Pakhtunkhwa, Pakistan. More specifically, it aimed to explore how the Sikh and Christian minorities are adjusted in the social fabric of Muslim-majority areas and to what extent their economic ties with Muslims strengthen their overall social interaction in the area. A total of 372 respondents from the targeted locales were selected through proportional allocation using the Sekaran table, and data were collected on a five-point Likert scale. The economic activities of the minorities were treated as independent variables and the level of social interaction as the dependent variable. The collected data were analysed at the univariate level through frequency distributions, and a chi-square test was performed at the bivariate level. It was found that minorities participate fully in economic activities, which results in strong social ties and daily interaction with the Muslim majority in the study area. However, it was also found that the minorities are still underprivileged in economic and business activities in Pakistan.

    Univariate Modelling and Forecasting of Energy Consumption: The Case Study of Electricity in Pakistan.

    Demand-side and supply-side assessment are the two most important components of energy management and planning. Unfortunately, for the past twenty years Pakistan has faced extremely serious energy management problems, particularly in electricity, compounded by institutional incompetence and a lack of policy response. This is because the country has neither long-term energy plans nor short-term solutions for dealing with the energy crisis. This study outlines overall electricity consumption and forecasts its various components. The interminable electricity crisis affects all sectors of the economy. The study addresses this aspect and applies Holt-Winters and ARIMA models for forecasting. The outcomes of both models suggest that the ARIMA model is more reliable for forecasting than the Holt-Winters model. The estimated results confirm a tendency of increasing demand in all the indices, which points to an alarming situation in the future. The household sector will have the highest energy demand in 2030, followed by the industrial sector. Thus, given the ever-increasing demand for electricity, the government should develop renewable sources of power production, such as hydel and solar energy, to overcome the electricity shortfall and support economic sustainability.

    Genetic Diversity and Traits Association in Tetraploid and Hexaploid Wheat Genotypes in Khyber Pakhtunkhwa Province of Pakistan

    Information regarding the magnitude of variability and the correlation among agronomically important traits provides the basis for developing a successful crop improvement program. An experiment containing 16 wheat genotypes (8 durum and 8 spring wheat) was conducted in the 2015-2016 crop season at The University of Agriculture, Peshawar. The experimental design was a Randomized Complete Block Design (RCBD) with three replications. The parameters under study were days to heading, days to maturity, flag leaf area, plant height, tillers m⁻², spikes m⁻², spikelets spike⁻¹, 1000-grain weight, grain yield, and harvest index. Statistically significant differences were observed among the genotypes for the studied traits. Except for flag leaf area and days to maturity, the durum vs. spring wheat contrast was significant for all other studied parameters. Durum wheat had a larger flag leaf area than spring wheat and also took fewer days to initiate heading. In contrast, spring wheat genotypes had greater average plant height, tillers m⁻², spikes m⁻², 1000-grain weight, grain yield, and harvest index than durum wheat genotypes. Correlation analysis revealed that tillers m⁻² and spikes m⁻² had a significant positive association with grain yield, while grain yield had a significant, strong positive association with harvest index in the tested germplasm. Durum genotypes DWE3 and DWE7 performed best for yield-contributing traits, while the spring wheat varieties Janbaz, Barsat, and Shahkar outperformed the others in terms of 1000-grain weight, grain yield, and harvest index, respectively. These genotypes are recommended for further testing at multiple locations to check for wider adaptability and possible use in future wheat breeding programs in the area. Peer reviewed.