106 research outputs found
Addressing the Impact of Localized Training Data in Graph Neural Networks
Graph Neural Networks (GNNs) have achieved notable success in learning from
graph-structured data, owing to their ability to capture intricate dependencies
and relationships between nodes. They excel in various applications, including
semi-supervised node classification, link prediction, and graph generation.
However, it is important to acknowledge that the majority of state-of-the-art
GNN models are built upon the assumption of an in-distribution setting, which
hinders their performance on real-world graphs with dynamic structures. In this
article, we aim to assess the impact of training GNNs on localized subsets of
the graph. Such restricted training data may lead to a model that performs well
in the specific region it was trained on but fails to generalize and make
accurate predictions for the entire graph. In the context of graph-based
semi-supervised learning (SSL), resource constraints often lead to scenarios
where the dataset is large, but only a portion of it can be labeled, affecting
the model's performance. This limitation affects tasks like anomaly detection
or spam detection when labeling processes are biased or influenced by human
subjectivity. To tackle the challenges posed by localized training data, we
approach the problem as an out-of-distribution (OOD) data issue by aligning
the distributions between the training data, which represents a small portion
of labeled data, and the graph inference process that involves making
predictions for the entire graph. We propose a regularization method to
minimize distributional discrepancies between localized training data and graph
inference, improving model performance on OOD data. Extensive tests on popular
GNN models show significant performance improvement on three citation GNN
benchmark datasets. The regularization approach effectively enhances model
adaptation and generalization, overcoming challenges posed by OOD data. Comment: 6 pages, 4 figures
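The abstract above does not say which discrepancy measure the regularizer uses. As a minimal sketch, assuming a simple mean-embedding distance (a linear-kernel MMD estimate) stands in for it, the idea of penalizing the gap between labeled-node embeddings and whole-graph embeddings could look like this. The function names, the weight `lam`, and the toy embeddings are all hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean_embedding(embeddings: Sequence[Vector]) -> List[float]:
    """Average equal-length embedding vectors, dimension by dimension."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def discrepancy_penalty(train_emb: Sequence[Vector], all_emb: Sequence[Vector]) -> float:
    """Squared Euclidean distance between the two mean embeddings.

    Assumption: this linear-kernel MMD estimate is only a stand-in; the
    abstract does not name the discrepancy measure actually used.
    """
    mu_train = mean_embedding(train_emb)
    mu_all = mean_embedding(all_emb)
    return sum((a - b) ** 2 for a, b in zip(mu_train, mu_all))


def regularized_loss(task_loss: float, train_emb: Sequence[Vector],
                     all_emb: Sequence[Vector], lam: float = 0.1) -> float:
    """Supervised loss plus the weighted penalty (lam is a hypothetical weight)."""
    return task_loss + lam * discrepancy_penalty(train_emb, all_emb)


# Toy embeddings (invented): the labeled nodes cover only one region of the graph.
labeled = [[1.0, 0.0], [1.0, 2.0]]
whole_graph = [[1.0, 0.0], [1.0, 2.0], [3.0, 4.0], [3.0, 2.0]]

penalty = discrepancy_penalty(labeled, whole_graph)
total = regularized_loss(0.5, labeled, whole_graph, lam=0.1)
```

In a real GNN the embeddings would come from a hidden layer and the penalty would be differentiated together with the task loss; the sketch only shows how the two terms combine.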
Over-Squashing in Graph Neural Networks: A Comprehensive Survey
Graph Neural Networks (GNNs) have emerged as a revolutionary paradigm in the
realm of machine learning, offering a transformative approach to dissect
intricate relationships inherent in graph-structured data. The foundational
architecture of most GNNs involves the dissemination of information through
message aggregation and transformation among interconnected nodes, a mechanism
that has demonstrated remarkable efficacy across diverse applications
encompassing node classification, link prediction, and recommendation systems.
Nonetheless, their potential prowess encounters a restraint intrinsic to
scenarios necessitating extensive contextual insights. In certain contexts,
accurate predictions hinge not only upon a node's immediate local surroundings
but also on interactions spanning far-reaching domains. This intricate demand
for long-range information dissemination exposes a pivotal challenge recognized
as "over-squashing," wherein the fidelity of information flow from distant
nodes becomes distorted. This phenomenon significantly curtails the efficiency
of message-passing mechanisms, particularly for tasks reliant on intricate
long-distance interactions. In this comprehensive article, we illuminate the
prevalent constraint of over-squashing pervading GNNs. Our exploration entails
a meticulous exposition of the ongoing efforts by researchers to counter the
ramifications posed by this limitation. Through systematic elucidation, we
delve into strategies, methodologies, and innovations proposed thus far, all
aimed at mitigating the detriments of over-squashing. By shedding light on this
intricately woven issue, we aim to contribute to a nuanced understanding of the
challenges within the GNN landscape and the evolving solutions designed to
surmount them. Comment: 8 pages
Data-Driven Control, Modeling, and Forecasting for Residential Solar Power
Distributed solar generation is rising rapidly due to a continuing decline in the cost of solar modules. Most residential solar deployments today are grid-tied, enabling them to draw power from the grid when their local demand exceeds solar generation and feed power into the grid when their local solar generation exceeds demand. The electric grid was not designed to support such decentralized and intermittent energy generation by millions of individual users. This dramatic increase in solar power is placing increasing stress on the grid, which must continue to balance its supply and demand despite the potential for large solar fluctuations. To address the problem, this thesis proposes new data-driven techniques for better controlling, modeling, and forecasting residential solar power.
The grid currently exercises no direct control over its connected solar capacity, but instead indirectly controls it by regulating new solar connections. This approach is highly inefficient and wastes much of the grid's potential to transmit solar. Instead, we propose Software-defined Solar-powered (SDS) systems that dynamically regulate solar flow rates into the grid and design an SDS prototype, called SunShade. Specifically, we introduce a new class of Weighted Power Point Tracking (WPPT) algorithms, inspired by Maximum Power Point Tracking (MPPT), capable of dynamically enforcing both hard and relative caps on solar power, which enables the grid to decouple rate control from admission control. In contrast, to avoid grid regulations entirely, homes can also partially or entirely defect from the grid to fully utilize their solar power without restrictions. We present a switching architecture that enables homes to dynamically switch between a local generator, battery, and solar to co-optimize their cost, carbon footprint, switching frequency, and reliability. We introduce switching policies that reveal a tradeoff between solar utilization and reliability, such that higher solar utilization requires more switching, which can lead to lower reliability.
Enabling better control of intermittent solar also requires improving solar performance models, which infer solar output based on current environmental conditions. Recent solar models primarily leverage data from ground-based weather stations, which may be far from solar sites and thus inaccurate. In addition, these weather stations report cloud cover, the most important metric for solar modeling, in coarse units of oktas. Instead, we propose developing solar models based on data from a new generation of Geostationary Operational Environmental Satellites (GOES-16 and GOES-17) that began launching in late 2017. We develop physical and machine learning (ML) models for solar performance modeling using both derived data products released by the National Oceanic and Atmospheric Administration (NOAA), as well as the satellites' raw multispectral data. We find that ML-based models using the raw multispectral data are significantly more accurate than both physical models using derived data products, such as Downward Shortwave Radiation (DSR), and prior okta-based solar models. The raw multispectral data is also beneficial since it is available at much higher spatial and temporal resolutions (1 km^2 and every 5 minutes) than oktas (25 km^2 and every hour). The accuracy of our ML-based models on multispectral data is also better regardless of whether they are locally trained using data only from a particular solar site or globally trained using data from many solar sites. Since global models can be trained once but used anywhere, they can also enable accurate modeling for sites with limited data, e.g., newly installed solar sites.
Solar forecasting models, which predict future solar output based on environmental conditions, also help in better solar control. Accurate near-term solar forecasts on the order of minutes to an hour are particularly important because homes and the grid must be able to adapt to large sudden changes in solar output. Current solar forecasting techniques, which primarily use Numerical Weather Prediction (NWP) algorithms, mostly leverage physics-based modeling. These physics-based models are most appropriate for forecast horizons on the order of hours to days and not near-term forecasts on the order of minutes to an hour. While there is some recent work on analyzing images from ground-based sky cameras for accurate near-term solar forecasting, it requires installing additional infrastructure. We instead propose a general model for solar nowcasting from abundant and readily available multispectral satellite data using self-supervised learning. Specifically, we develop deep auto-regressive models using convolutional neural networks (CNN) and long short-term memory networks (LSTM) that are globally trained across multiple locations to predict raw future observations of the spatio-temporal data collected by the recently launched GOES-R series of satellites. Our model estimates a location's future solar irradiance based on satellite observations, which we feed to a regression model trained on smaller site-specific solar data to provide near-term solar photovoltaic (PV) forecasts that account for site-specific characteristics.
Polymorphism in Bi-based perovskite oxides: a first-principles study
Under normal conditions, bulk crystals of BiScO3, BiCrO3, BiMnO3,
BiFeO3, and BiCoO3 present three very different variations of the
perovskite structure: an antipolar phase, a rhombohedral phase with a large
polarization along the space diagonal of the pseudocubic unit cell, and a
supertetragonal phase with even larger polarization. With the aim of
understanding the causes for this variety, we have used a genetic algorithm to
search for minima in the energy surface of these materials. Our results show
that the number of these minima is very large when compared to that of typical
ferroelectric perovskites like BaTiO3 and PbTiO3, and that a fine energy
balance between them results in the large structural differences seen. As
byproducts of our search we have identified charge-ordering structures with low
energy in BiMnO3, and several phases with energies that are similar to that
of the ground state of BiCrO3. We have also found that an inverse
supertetragonal phase exists in bulk, likely to be favored in films epitaxially
grown at large values of tensile misfit strain.
Continuities and changes in spatial patterns of under-five mortality at the district level in India (1991–2011)
Background: India has the largest number of under-five deaths globally, and large variations in under-five mortality persist between states and districts. Relationships between under-five mortality and numerous socioeconomic, development and environmental health factors have been explored at the national and state levels, but the possible spatial heterogeneity in these relationships has seldom been investigated at the district level. This study seeks to unravel local variation in key determinants of under-five mortality based on the 1991 and 2011 censuses. Methods: Using geocoded district-level data from the last two census rounds (1991 and 2011) and ordinary least squares and geographically weighted regressions, we identify district-specific relationships between under-five mortality rate and a series of determinants for two periods separated by 20 years (1986–1987 and 2006–2007). To identify spatial groupings of coefficients, we perform a cluster analysis based on t-values of the geographically weighted regression. Results: The geographically weighted regression analysis shows that relationships between the under-five mortality rate and socioeconomic, development, and environmental health factors vary spatially in terms of direction, strength, and extent when considering: female literacy and labor force participation; share of scheduled castes and scheduled tribes; access to electricity; safe water and sanitation; road infrastructure; and medical facilities. This spatial heterogeneity is accompanied by significant changes over time in the roles that these factors play in under-five mortality. Important local determinants of under-five mortality in 2011 were female literacy, female labor force participation, access to sanitation facilities and electricity; while the key local determinants in 1991 were road infrastructure, safe water, and medical facilities.
We identify six different clusters based on geographically weighted regression coefficients that broadly encompass the same districts in both periods; but these clusters do not follow the regional boundaries suggested by previous studies. In particular, the high mortality states of India that are typically classified as high focus states were classified into three different clusters based on the relationship of the factors associated with under-five mortality. Conclusion: This study demonstrates the utility of combining geographically weighted regression and cluster analyses as a methodological approach to study local-level variation in public health indicators, and it could be applied in any country using aggregate-level information from census or survey data. Identifying local predictors of under-five mortality is important for designing interventions in specific districts. Additional reduction in under-five mortality will only be possible with intervention programs designed at the local level, which take into consideration local level determinants of under-five mortality.
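To illustrate how a geographically weighted regression lets coefficients vary by location, here is a minimal sketch that fits a separate weighted least-squares line around each target district. It assumes a Gaussian distance kernel with a fixed bandwidth and a single predictor; the study itself uses several determinants and does not state its kernel in the abstract. The coordinates and values are invented toy data, and the cluster analysis on t-values is not shown.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def gaussian_weight(distance: float, bandwidth: float) -> float:
    """Kernel weight that decays smoothly with distance from the target district."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(target: Point, coords: Sequence[Point], x: Sequence[float],
              y: Sequence[float], bandwidth: float) -> Tuple[float, float]:
    """Weighted least squares for y = intercept + slope * x around one location."""
    w = [gaussian_weight(math.dist(target, c), bandwidth) for c in coords]
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Toy data (invented): six districts in two distant groups with opposite trends.
coords = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 0.0), (10.0, 1.0), (11.0, 0.0)]
predictor = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
outcome = [9.0, 8.0, 7.0, 3.0, 4.0, 5.0]

west_intercept, west_slope = local_fit(coords[0], coords, predictor, outcome, bandwidth=1.0)
east_intercept, east_slope = local_fit(coords[3], coords, predictor, outcome, bandwidth=1.0)
```

With these toy numbers the local slope is negative around the first group and positive around the second, which a single global regression would average away.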
Identification of group specific motifs in Beta-lactamase family of proteins
Background: Beta-lactamases are one of the most serious threats to public health. In order to combat this threat we need to study the molecular and functional diversity of these enzymes and identify signatures specific to these enzymes. These signatures will enable us to develop inhibitors and diagnostic probes specific to lactamases. The existing classification of beta-lactamases was developed nearly 30 years ago when few lactamases were available. The DLact database contains more than 2000 beta-lactamases, which can be used to study the molecular diversity and to identify signatures specific to this family. Methods: A set of 2020 beta-lactamase proteins available in the DLact database (http://59.160.102.202/DLact) was classified using graph-based clustering of Best Bi-Directional Hits. Non-redundant (> 90 percent identical) protein sequences from each group were aligned using T-Coffee and annotated using information available in the literature. Motifs specific to each group were predicted using the PRATT program. Results: The graph-based classification of beta-lactamase proteins resulted in the formation of six groups (four major groups containing 191, 726, 774 and 73 proteins, and two minor groups containing 50 and 8 proteins). Based on the information available in the literature, we found that each of the four major groups corresponds to one of the four classes proposed by Ambler. The two minor groups were novel and do not contain molecular signatures of beta-lactamase proteins reported in the literature. The group-specific motifs showed high sensitivity (> 70%) and very high specificity (> 90%).
The motifs from three groups (corresponding to class A, C and D) had a high level of conservation at the DNA as well as the protein level, whereas the motifs from the fourth group (corresponding to class B) showed conservation at only the protein level. Conclusion: The graph-based classification of beta-lactamase proteins corresponds with the classification proposed by Ambler, thus there is no need for formulating a new classification. However, further characterization of the two small groups may require updating the existing classification scheme. The better sensitivity and specificity of the group-specific motifs identified in this study, as compared to PROSITE motifs, and their proximity to the active site indicate that these motifs represent group-specific signatures of beta-lactamases and can be further developed into diagnostics and therapeutics.
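The abstract names graph-based clustering of Best Bi-Directional Hits but gives no algorithmic detail. A rough sketch under stated assumptions: a protein's best hit is taken per source organism, two proteins are linked when each is the other's best hit, and groups are the connected components of the resulting graph. The per-organism definition, the use of connected components, the protein names, and the similarity scores are all hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple


def symmetric_scores(pairs: Dict[Tuple[str, str], float]) -> Dict[str, Dict[str, float]]:
    """Expand pairwise scores into a lookup table usable from either protein."""
    table: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (p, q), score in pairs.items():
        table[p][q] = score
        table[q][p] = score
    return table


def best_hits(scores, organism) -> Dict[Tuple[str, str], str]:
    """best[(query, other_organism)] = highest-scoring protein from that organism."""
    best: Dict[Tuple[str, str], str] = {}
    for query, hits in scores.items():
        for subject, score in hits.items():
            if organism[subject] == organism[query]:
                continue  # only compare proteins across organisms
            key = (query, organism[subject])
            if key not in best or score > hits[best[key]]:
                best[key] = subject
    return best


def bidirectional_edges(best, organism) -> Set[FrozenSet[str]]:
    """Keep a pair only when each protein is the other's best hit."""
    return {frozenset((q, s)) for (q, _), s in best.items()
            if best.get((s, organism[q])) == q}


def connected_components(nodes, edges) -> List[Set[str]]:
    """Group proteins that are linked directly or through a chain of edges."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for node in nodes:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            stack.extend(adjacency[current] - seen)
        groups.append(component)
    return groups


# Toy input (invented): five proteins from three organisms A, B, C.
organism = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
scores = symmetric_scores({
    ("a1", "b1"): 90, ("a1", "b2"): 20, ("a1", "c1"): 85,
    ("a2", "b1"): 15, ("a2", "b2"): 80, ("a2", "c1"): 30,
    ("b1", "c1"): 88, ("b2", "c1"): 25,
})
edges = bidirectional_edges(best_hits(scores, organism), organism)
groups = connected_components(sorted(organism), edges)
```

Here `a2` and `b2` choose `c1` as their best hit in organism C, but `c1` does not reciprocate, so they form their own group.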
Data Visualization and Techniques
Data visualization is the graphical representation of information. Bar charts, scatter graphs, and maps are examples of simple data visualizations that have been used for decades. Information technology combines the principles of visualization with powerful applications and large data sets to create sophisticated images and animations. A tag cloud, for instance, uses text size to indicate the relative frequency of use of a set of terms. In many cases, the data that feed a tag cloud come from thousands of Web pages, representing perhaps millions of users. All of this information is contained in a simple image that you can understand quickly and easily. More complex visualizations sometimes generate animations that demonstrate how data change over time. In an application called Gapminder, bubbles represent the countries of the world, with each nation's population reflected in the size of its bubble. You can set the x and y axes to compare life expectancy with per capita income, for example, and the tool will show how each nation's bubble moves on the graph over time. You can see that higher income generally correlates with longer life expectancy, but the visualization also clearly shows that China doesn't follow this trend: in 1975, the country had one of the lowest per capita incomes but one of the longer life expectancies. The animation also shows the steep drop in life expectancy in many sub-Saharan African countries starting in the early 1990s (corresponding to the AIDS epidemic in that part of the world) and the plummeting of life expectancy in Rwanda at the time of that nation's genocide.
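As a small illustration of the tag cloud idea described above, the sketch below maps term frequencies to font sizes by linear scaling. The point-size range, the function name, and the sample terms are assumptions made for the example; real tag clouds often use logarithmic or other scalings.

```python
from collections import Counter
from typing import Dict, Iterable


def tag_sizes(terms: Iterable[str], min_pt: float = 10, max_pt: float = 40) -> Dict[str, float]:
    """Map each term to a font size that grows linearly with its frequency."""
    counts = Counter(terms)
    lowest, highest = min(counts.values()), max(counts.values())
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts match
    return {term: min_pt + (count - lowest) * (max_pt - min_pt) / span
            for term, count in counts.items()}


# Toy term list (invented).
terms = ["graph"] * 5 + ["solar"] * 3 + ["census"]
sizes = tag_sizes(terms)
```

The most frequent term gets the largest size and the rarest the smallest, so relative frequency can be read directly from the text size.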
A REVIEW ON CARISSA CARANDAS-PHYTOCHEMISTRY, ETHNO-PHARMACOLOGY, AND MICROPROPAGATION AS CONSERVATION STRATEGY
Carissa carandas is a useful food and medicinal plant of India, found to be widely distributed throughout subtropical and tropical regions. The plant has been used as a traditional medicinal plant over thousands of years in the Ayurvedic, Unani, and Homoeopathic systems of medicine. Traditionally, the whole plant and its parts were used in the treatment of various ailments. The major bioactive constituents, which impart medicinal value to the herb, are alkaloids, flavonoids, saponins and large amounts of cardiac glycosides, triterpenoids, phenolic compounds and tannins. Roots were reported to contain volatile principles including 2-acetyl phenol, lignan, carinol, sesquiterpenes (carissone, carindone), lupeol, β-sitosterol, 16β-hydroxybetulinic acid, α-amyrin, β-sitosterol glycoside, and des-N-methylnoracronycine, whereas leaves were reported to contain triterpenoid constituents as well as tannins. Fruits have been reported to contain carisol, epimer of α-amyrin, linalool, β-caryophyllene, carissone, carissic acid, carindone, ursolic acid, carinol, ascorbic acid, lupeol, and β-sitosterol. The ethnopharmacological significance of the plant has been ascribed to its anti-cancer, anti-convulsant, anti-oxidant, analgesic, anti-inflammatory, anti-ulcer, anthelmintic, cardiovascular, anti-nociceptive, anti-diabetic, antipyretic, hepatoprotective, neuropharmacological, diuretic, and antimicrobial activities, its cytotoxic potential, its in-vitro anti-oxidant and DNA damage inhibition activity, and its constipation and diarrheal activities. The review also describes micropropagation strategies for effective conservation of this important food and medicinal plant. The review has been written with the aim of providing a direction for further clinical research to promote safe and effective herbal treatments to cure a number of diseases.
Age- and Sex-Specific Burden of Morbidity and Disability in India: A Current Scenario
India is the second most populous country in the world with a population of 1.3 billion; any change in its morbidity and disability pattern is bound to bring change at the level of Asia as a whole, which is a matter of concern for the developing countries. Disability-free life expectancy (DFLE) and disability-adjusted life years (DALYs) provide summary measures of health across characteristics. Assessments of the epidemiological patterns and health system performance of any place and time period show its progress towards the Sustainable Development Goals (SDGs). The main aim of this study is to assess the age and sex pattern of the burden of diseases (mortality and morbidity) and disability in India. The information on disease and deaths was extracted from the 71st round of the National Sample Surveys (NSS) conducted in 2014 (NSS 2014) and the Causes of Deaths Study conducted in 2010–2013 (RGI 2010–2013), and on disability from the Census of India 2011 (ORG 2011).
Periodicals and Nation-Building: The Public Sphere, Modernity, and Modernism in Modern Review and Visva Bharati Quarterly
The paper analyzes selections from Modern Review and Visva Bharati Quarterly to study the complex act of nation-building taking place in India during the first half of the twentieth century. Through these periodicals, it discusses three interconnected occurrences that contributed to the envisioning of new India: firstly, the construction of a politically aware public sphere through nationalistic sentiments and anti-imperial internationalism; secondly, India’s localization of modernity as oscillating between the colonial subjects’ reactionary modernity and the colonially administered modernity of domination; and thirdly, the emergence of a modernism that was more immersed in restructuring social and political systems of power than restricted to formal and aesthetic novelty. Thus, drawing on writings published in Modern Review and Visva Bharati Quarterly, the paper assesses the degree to which the two periodicals realized the identity of new India.