57 research outputs found
Nationality Classification Using Name Embeddings
Nationality identification unlocks important demographic information, with
many applications in biomedical and sociological research. Existing name-based
nationality classifiers use name substrings as features and are trained on
small, unrepresentative sets of labeled names, typically extracted from
Wikipedia. As a result, these methods achieve limited performance and cannot
support fine-grained classification.
We exploit the phenomenon of homophily in communication patterns to learn name
embeddings, a new representation that encodes gender, ethnicity, and
nationality and is readily applicable to building classifiers and other
systems. Through our analysis of 57M contact lists from a major Internet
company, we are able to design a fine-grained nationality classifier covering
39 groups representing over 90% of the world population. In an evaluation
against other published systems over 13 common classes, our F1 score (0.795) is
substantially better than that of our closest competitor, Ethnea (0.580). To the best of
our knowledge, this is the most accurate, fine-grained nationality classifier
available.
As a social media application, we apply our classifiers to the followers of
major Twitter celebrities over six different domains. We demonstrate stark
differences in the ethnicities of the followers of Trump and Obama, and in the
sports and entertainment favored by different groups. Finally, we identify an
anomalous political figure whose presumably inflated following appears largely
incapable of reading the language he posts in.
Comment: 10 pages, 9 figures, 4 tables, accepted by CIKM 2017, Demo and free
API: www.name-prism.co
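The homophily idea in the abstract above can be illustrated with a minimal sketch. It assumes toy contact lists and uses raw co-occurrence counts with cosine similarity as a stand-in for learned embeddings; the names, the data, and the helper functions are hypothetical and are not the authors' method or data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy contact lists (illustration only, not data from the paper).
contact_lists = [
    ["Wei", "Li", "Chen"],
    ["Li", "Chen", "Zhang"],
    ["Wei", "Zhang", "Chen"],
    ["Giulia", "Marco", "Luca"],
    ["Marco", "Luca", "Paolo"],
    ["Giulia", "Paolo", "Luca"],
]


def cooccurrence_vectors(lists):
    """Represent each name by counts of the names it shares a contact list with."""
    vectors = defaultdict(Counter)
    for names in lists:
        for a in names:
            for b in names:
                if a != b:
                    vectors[a][b] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = cooccurrence_vectors(contact_lists)
same_group = cosine(vectors["Wei"], vectors["Li"])
cross_group = cosine(vectors["Wei"], vectors["Marco"])
```

In this toy setting, names that appear in the same contact lists end up with similar vectors, so `same_group` exceeds `cross_group`. A classifier could then be trained on such vectors, though the paper's embeddings are learned from far larger data.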
Tetragonal Mexican-Hat Dispersion and Switchable Half-Metal State with Multiple Anisotropic Weyl Fermions in Penta-Graphene
In past decades, the ever-expanding library of 2D carbon allotropes has
yielded a broad range of exotic properties for the future carbon-based
electronics. However, the known allotropes are all intrinsically nonmagnetic due to
their paired valence-electron configuration. Based on the reported 2D carbon
structure database and first-principles calculations, herein we demonstrate
that inherent ferromagnetism can be obtained in the prominent allotrope,
penta-graphene, which has a unique Mexican-hat valence band edge, giving rise
to van Hove singularities and electronic instability. Under modest
hole doping, achievable with an electrolyte gate, the semiconducting
penta-graphene can transform into different ferromagnetic half-metals with
room-temperature stability and switchable spin directions. In particular, multiple
anisotropic Weyl states, including type-I and type-II Weyl cones and a hybrid
quasi-Weyl nodal loop, can be found in a sizable energy window of the spin-down
half-metal under proper strains. These findings not only identify a promising
carbon allotrope for obtaining inherent magnetism in carbon-based spintronic
devices, but also highlight the possibility of realizing different Weyl states by
combining electronic and mechanical means.
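The link between a Mexican-hat band edge and a van Hove singularity can be made concrete with a short worked equation. This is a sketch under a simplifying assumption: an isotropic toy dispersion with hypothetical parameters \alpha and k_0, whereas the band described in the abstract is tetragonal and therefore anisotropic.

```latex
% Toy isotropic Mexican-hat valence band (assumed form, not fitted to the paper):
E(k) = -\alpha \left(k^{2} - k_{0}^{2}\right)^{2}, \qquad \alpha > 0 .

% Density of states per spin and unit area, with \varepsilon = -E
% measured downward from the band edge:
g(\varepsilon)
  = \int \frac{d^{2}k}{(2\pi)^{2}}\,
    \delta\!\left(\varepsilon - \alpha \left(k^{2} - k_{0}^{2}\right)^{2}\right)
  = \frac{1}{4\pi \sqrt{\alpha\,\varepsilon}},
  \qquad 0 < \varepsilon < \alpha k_{0}^{4} .
```

In this toy model the density of states diverges as \varepsilon^{-1/2} at the band edge, because the band maximum lies on a ring of radius k_0 rather than at a single point. A large density of states of this kind near the valence band edge is what makes a hole-doped system prone to electronic instabilities.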
JourneyDB: A Benchmark for Generative Image Understanding
While recent advancements in vision-language models have had a transformative
impact on multi-modal comprehension, the extent to which these models possess
the ability to comprehend generated images remains uncertain. Synthetic images,
in comparison to real data, encompass a higher level of diversity in terms of
both content and style, thereby presenting significant challenges for the
models to fully grasp. In light of this challenge, we introduce a comprehensive
dataset, referred to as JourneyDB, that caters to the domain of generative
images within the context of multi-modal visual understanding. Our meticulously
curated dataset comprises 4 million distinct and high-quality generated images,
each paired with the corresponding text prompts that were employed in their
creation. We additionally introduce an external subset with
results of another 22 text-to-image generative models, which makes JourneyDB a
comprehensive benchmark for evaluating the comprehension of generated images.
On our dataset, we have devised four benchmarks to assess the performance of
generated image comprehension in relation to both content and style
interpretation. These benchmarks encompass prompt inversion, style retrieval,
image captioning, and visual question answering. Lastly, we evaluate the
performance of state-of-the-art multi-modal models when applied to the
JourneyDB dataset, providing a comprehensive analysis of their strengths and
limitations in comprehending generated content. We anticipate that the proposed
dataset and benchmarks will facilitate further research in the field of
generative content understanding. The dataset is publicly available at
https://journeydb.github.io.
Comment: Accepted to the Thirty-seventh Conference on Neural Information
Processing Systems (NeurIPS 2023)
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
Large Language Models (LLMs) exhibit impressive reasoning and data
augmentation capabilities in various NLP tasks. However, what about small
models? In this work, we propose TeacherLM-7.1B, capable of annotating relevant
fundamentals, chain of thought, and common mistakes for most NLP samples, which
makes annotation more than just an answer, thus allowing other models to learn
"why" instead of just "what". The TeacherLM-7.1B model achieved a zero-shot
score of 52.3 on MMLU, surpassing most models with over 100B parameters. Even
more remarkable is its data augmentation ability. Based on TeacherLM-7.1B, we
augmented 58 NLP datasets and taught various student models with different
parameters from OPT and BLOOM series in a multi-task setting. The experimental
results indicate that the data augmentation provided by TeacherLM has brought
significant benefits. We will release the TeacherLM series of models and
augmented datasets as open source.
Comment: 5 figures, 15 pages
Real-time Monitoring for the Next Core-Collapse Supernova in JUNO
The core-collapse supernova (CCSN) is one of the most energetic astrophysical
events in the Universe. The early and prompt detection of neutrinos before
(pre-SN) and during the SN burst is a unique opportunity to realize the
multi-messenger observation of the CCSN events. In this work, we describe the
monitoring concept and present the sensitivity of the system to the pre-SN and
SN neutrinos at the Jiangmen Underground Neutrino Observatory (JUNO), which is
a 20 kton liquid scintillator detector under construction in South China. The
real-time monitoring system is designed with both the prompt monitors on the
electronic board and online monitors at the data acquisition stage, in order to
ensure both the alert speed and alert coverage of progenitor stars. By assuming
a false alert rate of 1 per year, this monitoring system can be sensitive to
the pre-SN neutrinos up to the distance of about 1.6 (0.9) kpc and SN neutrinos
up to about 370 (360) kpc for a progenitor mass of 30 solar masses for the case
of normal (inverted) mass ordering. The pointing ability of the CCSN is
evaluated by using the accumulated event anisotropy of the inverse beta decay
interactions from pre-SN or SN neutrinos, which, along with the early alert,
can play important roles for the follow-up multi-messenger observations of the
next Galactic or nearby extragalactic CCSN.
Comment: 24 pages, 9 figures
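The role of the false alert rate in the abstract above can be sketched with a simple counting model. The sketch assumes non-overlapping time windows and a constant Poisson background; the function names, background rate, and window length are hypothetical, and this is not JUNO's actual trigger logic.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def poisson_tail(k, mu):
    """Return P(N >= k) for N ~ Poisson(mu), computed as 1 - P(N <= k - 1)."""
    term = math.exp(-mu)  # pmf at n = 0
    cdf = 0.0
    for n in range(k):
        cdf += term
        term *= mu / (n + 1)  # pmf at n + 1 from pmf at n
    return max(0.0, 1.0 - cdf)


def alert_threshold(background_rate_hz, window_s, max_false_alerts_per_year=1.0):
    """Smallest event count per window whose background-only rate of being
    reached stays at or below the allowed number of false alerts per year."""
    mu = background_rate_hz * window_s           # expected background per window
    windows_per_year = SECONDS_PER_YEAR / window_s
    k = 0
    while windows_per_year * poisson_tail(k, mu) > max_false_alerts_per_year:
        k += 1
    return k


# Hypothetical numbers: 0.01 background events per second, 10 s windows.
threshold = alert_threshold(background_rate_hz=0.01, window_s=10.0)
```

An alert would be raised whenever a window contains at least `threshold` candidate events. A lower background rate allows a lower threshold, which is one way a detector's sensitivity can extend to more distant progenitors.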
Characteristics of annular surface dielectric barrier discharge with microsecond pulse under water-covered condition
Surface dielectric barrier discharge (SDBD) has wide applications in flow control, wastewater treatment, and biomedicine. In these applications, water droplets generally adhere to the dielectric surface of an SDBD actuator. Thus far, only a few studies have examined how water covering the dielectric surface affects the discharge characteristics of SDBD. Therefore, the effects of water droplets on the discharge of an SDBD actuator driven by a repetitive microsecond pulse power supply were investigated in this study. The results show that a filament micro-discharge channel forms between the light and dark regions at the internal edge of the SDBD high-voltage electrode and develops toward the center of the dielectric surface in the region without water droplet coverage. SDBD in the water-covered region can be divided into two stages. This paper compares the electrical characteristics of SDBD with and without water droplets, and explores the electric field distortion effect of water droplet endpoints through 3D simulation. Based on the theories of water droplet polarization and gas discharge, the effects of water droplets on plasma development and surface charge accumulation under water-covered conditions were analyzed. The water droplet plays a role similar to that of a "secondary electrode" during the discharge process.
Corporate Performance, Market-Industry Competition and Enterprise Environmental-Protection Investment
Worldwide, many countries regard green as a keyword related to development, and investments in environmental protection are an important way for enterprises to achieve green development. Therefore, clarifying which factors influence enterprises to invest in environmental protection is very important. Starting from micro-enterprises and using data from companies listed in China’s A-share manufacturing industry from 2008 to 2019, we empirically analyze the relationship between corporate performance (CP) and the scale of enterprises' investments in environmental protection (EI), and analyze the moderating effect of industry competition on the relationship between CP and EI. The results show that (1) a positive correlation can be found between CP and EI; (2) fierce industry competition can increase the positive impact of CP on EI; and (3) compared with industries with non-heavy pollution, fierce industry competition increases the positive impact of CP on EI in industries with heavy pollution. The research results show that performance is a key factor influencing enterprises’ decisions about investments in environmental protection, and that industry competition can stimulate enterprises to invest in environmental protection. This study explores the internal and external factors that lead an organization to invest actively in environmental protection, provides a reference for enterprises to explore “win–win” paths, and provides a theoretical basis for the government to improve relevant regulations.
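A moderating effect of the kind described above is commonly tested with an interaction term. The following is a generic sketch of such a specification, not the authors' exact model: the competition measure COMP, the control vector X, and the coefficient names are assumed for illustration.

```latex
% Generic moderation regression for firm i in year t (assumed specification):
EI_{i,t} = \beta_{0} + \beta_{1}\, CP_{i,t} + \beta_{2}\, COMP_{i,t}
         + \beta_{3}\, \left( CP_{i,t} \times COMP_{i,t} \right)
         + \gamma^{\top} X_{i,t} + \varepsilon_{i,t}

% Marginal effect of performance on environmental-protection investment:
\frac{\partial EI_{i,t}}{\partial CP_{i,t}} = \beta_{1} + \beta_{3}\, COMP_{i,t}
```

Under this sketch, finding (1) would correspond to \beta_{1} > 0. Finding (2) would correspond to \beta_{3} > 0 when larger values of COMP denote fiercer competition; the sign flips if competition is measured by a concentration index for which lower values mean fiercer competition. Finding (3) would amount to comparing \beta_{3} across the heavy-pollution and non-heavy-pollution subsamples.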
Fig. 5. Gray value change of micro-discharge channels in a single-pulse discharge image: (a) grayscale curve of the micro-discharge channels; (b) mean gray value of the micro-discharge channels.