58 research outputs found
Detecting vandalism on Wikipedia across multiple languages
Vandalism, the malicious modification or editing of articles, is a serious problem
for free and open access online encyclopedias such as Wikipedia. Over the 13-year
lifetime of Wikipedia, editors have identified and repaired vandalism in 1.6% of more
than 500 million revisions of over 9 million English articles, but smaller manually
inspected sets of revisions for research show vandalism may appear in 7% to 11% of
all revisions of English Wikipedia articles. The persistent threat of vandalism has led
to the development of automated programs (bots) and editing assistance programs
to help editors detect and repair vandalism. Research into improving vandalism
detection through application of machine learning techniques has shown significant
improvements to detection rates of a wider variety of vandalism. However, the focus
of research is often only on the English Wikipedia, which has led us to develop a
novel research area of cross-language vandalism detection (CLVD).
CLVD provides a solution to detecting vandalism across several languages through
the development of language-independent machine learning models. These models
can identify undetected vandalism cases across languages that may have insufficient
identified cases to build learning models. The two main challenges of CLVD are (1)
identifying language-independent features of vandalism that are common to multiple
languages, and (2) extensibility of vandalism detection models trained in one
language to other languages without significant loss in detection rate. In addition,
other important challenges of vandalism detection are (3) high detection rate of a variety
of known vandalism types, (4) scalability to the size of Wikipedia in the number
of revisions, and (5) ability to incorporate and generate multiple types of data that
characterise vandalism.
In this thesis, we present our research into CLVD on Wikipedia, where we identify
gaps and problems in existing vandalism detection techniques. To begin our thesis,
we introduce the problem of vandalism on Wikipedia with motivating examples, and
then present a review of the literature. From this review, we identify and address the
following research gaps. First, we propose techniques for summarising the user activity
of articles and comparing the knowledge coverage of articles across languages.
Second, we investigate CLVD using the metadata of article revisions together with
article views to learn vandalism models and classify incoming revisions. Third, we
propose new text features that are more suitable for CLVD than text features from
the literature. Fourth, we propose a novel context-aware vandalism detection technique
for sneaky types of vandalism that may not be detectable through constructing
features. Finally, to show that our techniques of detecting malicious activities are not
limited to Wikipedia, we apply our feature sets to detecting malicious attachments
and URLs in spam emails. Overall, our ultimate aim is to build the next generation
of vandalism detection bots that can learn and detect vandalism from multiple
languages and extend their usefulness to other language editions of Wikipedia.
The impact of recentralization reform on corruption: evidence from a quasi-natural experiment
How does government recentralization reform affect corruption? We use the pilot recentralization reform in Vietnam, which transfers the legislative function, power, and responsibility of district-level authorities to a higher level of government, as a quasi-natural experiment to address this question. We find strong evidence that recentralization reform leads to lower corruption. Among the firms with the highest probability of making a bribe payment, those incorporated in jurisdictions experiencing the recentralization reform are 4.3% less likely to pay a bribe. In addition, the perception that bribery is a common and necessary practice is significantly lower in the post-recentralization period. We further show that the impact of recentralization is stronger for firms that lack a political connection. Overall, these results shed light on the real impact of government recentralization reform and on the determinants of corruption, thereby providing important implications for policymakers seeking to create a more conducive business environment.
Sustainable technology in developed countries: waste municipal management
As more studies are conducted and global events unfold, greater emphasis is being placed on the importance of preserving the Earth's natural resources and cycles before we face a catastrophic climate crisis. Developed countries are therefore constantly adapting their policies and legislation to promote green development for the sake of sustainable development, which benefits both the environment and the socioeconomic sector. As populations grow and living standards improve, more waste is generated. Appropriate municipal waste management is necessary to avoid harm to the environment, wildlife, and human health. Sustainable municipal solid waste management is even included in the United Nations' (UN) Sustainable Development Goals, which aim to improve the world's environment and economy. The waste management systems of the European Union (EU) member states can be considered exemplary. In some countries, landfills have been prohibited, promoting the use of more sustainable technologies such as organic waste incineration, recycling, and composting. However, a divide exists between member countries, with some lagging behind in their waste management strategies. This paper therefore examines the current state of municipal waste in EU member states and then reviews the various disposal technologies implemented. The difficulties and environmental concerns that must be overcome are discussed, as are recommendations and possible future directions.
Impact of climate change on meteorological, hydrological and agricultural droughts in the Lower Mekong River Basin: a case study of the Srepok Basin, Vietnam
Peer reviewed. The objective of this study is to assess future changes in meteorological, hydrological and agricultural droughts under a changing climate in the Srepok River Basin, a subbasin of the Lower Mekong River Basin (LMB), using three drought indices: the standardized precipitation index (SPI), the standardized runoff index (SRI) and the standardized soil moisture index (SSWI). The well-calibrated Soil and Water Assessment Tool (SWAT) is used as a simulation tool to estimate the features of meteorological, hydrological and agricultural droughts. The climate data for the 2016–2040 period are obtained from four regional climate models: HadGEM3-RA, SNU-MM5, RegCM4 and YSU-RSM, which are downscaled from the HadGEM2-AO GCM. The results show that the severity, duration and frequency of droughts are predicted to increase in the near future for this region. Moreover, meteorological drought is less sensitive to climate change than hydrological and agricultural droughts; however, its correlation with hydrological and agricultural droughts strengthens as the accumulation period increases. These findings may be useful for water resources management and for planning mitigation of and adaptation to climate change impacts in the Srepok River Basin.
SIP-MBA: A secure IoT platform with brokerless and micro-service architecture
The Internet of Things (IoT) is one of the most interesting technology trends today. Devices in an IoT network are often designed for mobility and compact size, and therefore have rather weak hardware. There are many lightweight protocols tailored to limited processing power and low energy consumption, of which MQTT is a typical example. The current MQTT protocol supports three quality-of-service (QoS) levels, and the user has to trade off the security of packet transmission against transmission rate, bandwidth and energy consumption. The MQTT protocol, however, does not support packet storage mechanisms, which means that when the receiver is interrupted, the packet cannot be retrieved. In this paper, we present the brokerless SIP-MBA Platform, which is designed around micro-services and uses the gRPC protocol to transmit and receive messages. This design optimizes transmission rate, power consumption and transmission bandwidth while still ensuring reliable communication. In addition, we implement user and thing management mechanisms with the aim of improving security. Finally, we present test results obtained by implementing a data collection service over the gRPC protocol and comparing it with streaming data over the MQTT protocol.
IoHT-MBA: An Internet of Healthcare Things (IoHT) platform based on microservice and brokerless architecture
The Internet of Things (IoT) is currently one of the technology trends attracting the most interest. IoT can be divided into five main areas: healthcare, environmental, smart city, commercial and industrial. The IoHT-MBA Platform is considered the backbone of every IoT architecture, so its optimal design is an essential issue that should be carefully considered from different aspects. Although IoT is applied in multiple domains, three main features remain challenging to improve: i) data collection, ii) user and device management, and iii) remote device control. Today's medical IoT systems are often too focused on big data or on access control for participants, and not focused enough on collecting data accurately, quickly, and efficiently, or on power redundancy and system expansion. This is very important for the medical sector, which always prioritizes the availability of data for therapeutic purposes over other aspects. In this paper, we introduce the IoHT Platform for the healthcare environment, which is designed with a microservice and brokerless architecture and focuses strongly on the three aforementioned characteristics. In addition, our IoHT Platform considers five other issues: (1) the limited processing capacity of the devices, (2) energy saving for the devices, (3) the speed and accuracy of data collection, (4) security mechanisms and (5) the scalability of the system. To make the IoHT Platform suitable for health monitoring, we also add real-time alerts for the medical team. In the evaluation section, we describe the evaluation carried out to demonstrate the effectiveness of the proposed IoHT Platform (i.e. the proof of concept) in terms of performance, freedom from errors, and insensitivity to geographical distance. Finally, a complete code solution is published in the authors' GitHub repository to encourage reproducibility and further improvement.
Factors associated with 90-day mortality in Vietnamese stroke patients: Prospective findings compared with explainable machine learning, multicenter study
The prevalence and predictors of mortality following an ischemic stroke or intracerebral hemorrhage have not been well established among patients in Vietnam. This prospective study involved 2885 consecutive patients diagnosed with ischemic stroke or intracerebral hemorrhage at ten stroke centres across Vietnam. Post hoc analyses were performed in 2209 subjects (age 65.4 ± 13.7 years, 61.4% male) to explore the clinical characteristics and prognostic factors associated with 90-day mortality following treatment. An explainable machine learning model using extreme gradient boosting and SHapley Additive exPlanations revealed the correlation between original clinical research and advanced machine learning methods in stroke care. In the 90 days following treatment, the mortality rate for ischemic stroke was 8.2%, while for intracerebral hemorrhage it was higher, at 20.5%. Atrial fibrillation was associated with an elevated risk of 90-day mortality in ischemic stroke patients (OR 3.09; 95% CI 1.90–5.02, p < 0.05). The baseline NIHSS score was a significant predictor of 90-day mortality in both patient groups. The machine learning model predicted 90-day mortality with an accuracy of 0.91. Age and NIHSS score ranked among the top risk factors, along with other features such as consciousness, heart rate, and white blood cells. Stroke severity, as measured by the NIHSS, was identified as a predictor of mortality at discharge and at the 90-day mark in both patient groups.
The challenge of unprecedented floods and droughts in risk management
Risk management has reduced vulnerability to floods and droughts globally [1,2], yet their impacts are still increasing [3]. An improved understanding of the causes of changing impacts is therefore needed, but has been hampered by a lack of empirical data [4,5]. On the basis of a global dataset of 45 pairs of events that occurred within the same area, we show that risk management generally reduces the impacts of floods and droughts but faces difficulties in reducing the impacts of unprecedented events of a magnitude not previously experienced. If the second event was much more hazardous than the first, its impact was almost always higher. This is because management was not designed to deal with such extreme events: for example, they exceeded the design levels of levees and reservoirs. In two success stories, the impact of the second, more hazardous, event was lower, as a result of improved risk management governance and high investment in integrated management. The observed difficulty of managing unprecedented events is alarming, given that more extreme hydrological events are projected owing to climate change [3].
Panta Rhei benchmark dataset: socio-hydrological data of paired events of floods and droughts
As the adverse impacts of hydrological extremes increase in many regions of the world, a better understanding of the drivers of changes in risk and impacts is essential for effective flood and drought risk management and climate adaptation. However, there is currently a lack of comprehensive, empirical data about the processes, interactions and feedbacks in complex human-water systems leading to flood and drought impacts. Here we present a benchmark dataset containing socio-hydrological data of paired events, i.e., two floods or two droughts that occurred in the same area. The 45 paired events occurred in 42 different study areas and cover a wide range of socio-economic and hydro-climatic conditions. The dataset is unique in covering both floods and droughts, in the number of cases assessed, and in the quantity of socio-hydrological data. The benchmark dataset comprises: 1) detailed review-style reports about the events and key processes between the two events of a pair; 2) the key data table containing variables that assess the indicators which characterise management shortcomings, hazard, exposure, vulnerability and impacts of all events; 3) a table of the indicators-of-change that indicate the differences between the first and second event of a pair. The advantages of the dataset are that it enables comparative analyses across all the paired events based on the indicators-of-change and allows for detailed context- and location-specific assessments based on the extensive data and reports of the individual study areas. The dataset can be used by the scientific community for exploratory data analyses, e.g. focused on causal links between risk management, changes in hazard, exposure and vulnerability, and flood or drought impacts. The data can also be used for the development, calibration and validation of socio-hydrological models. The dataset is available to the public through the GFZ Data Services (Kreibich et al. 2023, link for review: https://dataservices.gfz-potsdam.de/panmetaworks/review/923c14519deb04f83815ce108b48dd2581d57b90ce069bec9c948361028b8c85/).
Global age-sex-specific mortality, life expectancy, and population estimates in 204 countries and territories and 811 subnational locations, 1950–2021, and the impact of the COVID-19 pandemic: a comprehensive demographic analysis for the Global Burden of Disease Study 2021
Background: Estimates of demographic metrics are crucial to assess levels and trends of population health outcomes. The profound impact of the COVID-19 pandemic on populations worldwide has underscored the need for timely estimates to understand this unprecedented event within the context of long-term population health trends. The Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2021 provides new demographic estimates for 204 countries and territories and 811 additional subnational locations from 1950 to 2021, with a particular emphasis on changes in mortality and life expectancy that occurred during the 2020–21 COVID-19 pandemic period. Methods: 22 223 data sources from vital registration, sample registration, surveys, censuses, and other sources were used to estimate mortality, with a subset of these sources used exclusively to estimate excess mortality due to the COVID-19 pandemic. 2026 data sources were used for population estimation. Additional sources were used to estimate migration; the effects of the HIV epidemic; and demographic discontinuities due to conflicts, famines, natural disasters, and pandemics, which are used as inputs for estimating mortality and population. Spatiotemporal Gaussian process regression (ST-GPR) was used to generate under-5 mortality rates, which synthesised 30 763 location-years of vital registration and sample registration data, 1365 surveys and censuses, and 80 other sources. ST-GPR was also used to estimate adult mortality (between ages 15 and 59 years) based on information from 31 642 location-years of vital registration and sample registration data, 355 surveys and censuses, and 24 other sources. Estimates of child and adult mortality rates were then used to generate life tables with a relational model life table system. 
For countries with large HIV epidemics, life tables were adjusted using independent estimates of HIV-specific mortality generated via an epidemiological analysis of HIV prevalence surveys, antenatal clinic serosurveillance, and other data sources. Excess mortality due to the COVID-19 pandemic in 2020 and 2021 was determined by subtracting observed all-cause mortality (adjusted for late registration and mortality anomalies) from the mortality expected in the absence of the pandemic. Expected mortality was calculated based on historical trends using an ensemble of models. In location-years where all-cause mortality data were unavailable, we estimated excess mortality rates using a regression model with covariates pertaining to the pandemic. Population size was computed using a Bayesian hierarchical cohort component model. Life expectancy was calculated using age-specific mortality rates and standard demographic methods. Uncertainty intervals (UIs) were calculated for every metric using the 25th and 975th ordered values from a 1000-draw posterior distribution. Findings: Global all-cause mortality followed two distinct patterns over the study period: age-standardised mortality rates declined between 1950 and 2019 (a 62·8% [95% UI 60·5–65·1] decline), and increased during the COVID-19 pandemic period (2020–21; 5·1% [0·9–9·6] increase). In contrast with the overall reverse in mortality trends during the pandemic period, child mortality continued to decline, with 4·66 million (3·98–5·50) global deaths in children younger than 5 years in 2021 compared with 5·21 million (4·50–6·01) in 2019. An estimated 131 million (126–137) people died globally from all causes in 2020 and 2021 combined, of which 15·9 million (14·7–17·2) were due to the COVID-19 pandemic (measured by excess mortality, which includes deaths directly due to SARS-CoV-2 infection and those indirectly due to other social, economic, or behavioural changes associated with the pandemic). 
Excess mortality rates exceeded 150 deaths per 100 000 population during at least one year of the pandemic in 80 countries and territories, whereas 20 nations had a negative excess mortality rate in 2020 or 2021, indicating that all-cause mortality in these countries was lower during the pandemic than expected based on historical trends. Between 1950 and 2021, global life expectancy at birth increased by 22·7 years (20·8–24·8), from 49·0 years (46·7–51·3) to 71·7 years (70·9–72·5). Global life expectancy at birth declined by 1·6 years (1·0–2·2) between 2019 and 2021, reversing historical trends. An increase in life expectancy was only observed in 32 (15·7%) of 204 countries and territories between 2019 and 2021. The global population reached 7·89 billion (7·67–8·13) people in 2021, by which time 56 of 204 countries and territories had peaked and subsequently populations have declined. The largest proportion of population growth between 2020 and 2021 was in sub-Saharan Africa (39·5% [28·4–52·7]) and south Asia (26·3% [9·0–44·7]). From 2000 to 2021, the ratio of the population aged 65 years and older to the population aged younger than 15 years increased in 188 (92·2%) of 204 nations. Interpretation: Global adult mortality rates markedly increased during the COVID-19 pandemic in 2020 and 2021, reversing past decreasing trends, while child mortality rates continued to decline, albeit more slowly than in earlier years. Although COVID-19 had a substantial impact on many demographic indicators during the first 2 years of the pandemic, overall global health progress over the 72 years evaluated has been profound, with considerable improvements in mortality and life expectancy. Additionally, we observed a deceleration of global population growth since 2017, despite steady or increasing growth in lower-income countries, combined with a continued global shift of population age structures towards older ages. 
These demographic changes will likely present future challenges to health systems, economies, and societies. The comprehensive demographic estimates reported here will enable researchers, policy makers, health practitioners, and other key stakeholders to better understand and address the profound changes that have occurred in the global health landscape following the first 2 years of the COVID-19 pandemic, and longer-term trends beyond the pandemic.
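The Methods of the abstract above state that uncertainty intervals were taken as the 25th and 975th ordered values of a 1000-draw posterior distribution. A minimal sketch of that one step is given below. The function name `uncertainty_interval` and the simulated draws are illustrative assumptions, not part of the GBD code or data.

```python
import random


def uncertainty_interval(draws):
    """Return the 25th and 975th ordered values of exactly 1000 draws."""
    if len(draws) != 1000:
        raise ValueError("expected exactly 1000 draws")
    ordered = sorted(draws)
    # "25th" and "975th" are 1-based positions, so subtract 1 for list indexing.
    return ordered[25 - 1], ordered[975 - 1]


if __name__ == "__main__":
    # Hypothetical draws for some metric; the mean and spread are made up
    # purely to exercise the function and are not GBD estimates.
    rng = random.Random(0)
    draws = [rng.gauss(70.0, 0.5) for _ in range(1000)]
    lower, upper = uncertainty_interval(draws)
    print(f"95% UI: {lower:.2f} to {upper:.2f}")
```

Taking fixed order statistics rather than interpolated percentiles keeps the interval tied to actual draws, which matches the wording of the abstract. Whether the study's own pipeline handles ties or interpolation differently is not stated there.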