Spatial-temporal data mining procedure: LASR
This paper concerns the statistical development of our
spatial-temporal data mining procedure, LASR (pronounced "laser"). LASR is
the abbreviation for Longitudinal Analysis with Self-Registration of
large-p-small-n data. It was motivated by a study of Neuromuscular
Electrical Stimulation (NMES) experiments, where the data are noisy and
heterogeneous, might not align from one session to another, and involve a large
number of multiple comparisons. The three main components of LASR are: (1) data
segmentation for separating heterogeneous data and for distinguishing outliers,
(2) automatic approaches for spatial and temporal data registration, and (3)
statistical smoothing mapping for identifying "activated" regions based on
false-discovery-rate-controlled p-maps and movies. Each of the components is
of interest in its own right. As a statistical ensemble, the idea of LASR is
applicable to other types of spatial-temporal data sets beyond those from the
NMES experiments.
Comment: Published at http://dx.doi.org/10.1214/074921706000000707 in the IMS
Lecture Notes–Monograph Series
(http://www.imstat.org/publications/lecnotes.htm) by the Institute of
Mathematical Statistics (http://www.imstat.org).
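Component (3) can be illustrated by thresholding a map of per-pixel p-values so that the false discovery rate is controlled. The abstract does not say which FDR procedure LASR uses; this minimal sketch assumes the standard Benjamini-Hochberg step-up rule, does not show the smoothing step that would produce the p-values, and uses a hypothetical function name (`benjamini_hochberg`), level `q`, and toy p-value map.

```python
import numpy as np


def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of 'activated' pixels under BH FDR control at level q.

    Assumption: p_values holds one p-value per pixel (any array shape);
    the LASR smoothing step that would produce them is not shown here.
    """
    p = np.asarray(p_values, dtype=float)
    flat = p.ravel()
    m = flat.size
    order = np.argsort(flat)                    # pixel indices, smallest p first
    thresholds = q * np.arange(1, m + 1) / m    # BH critical values q * i / m
    below = flat[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()         # largest rank i with p_(i) <= q * i / m
        reject[order[: k + 1]] = True           # step-up: also reject every smaller p-value
    return reject.reshape(p.shape)


# Hypothetical 2x3 p-value map.
p_map = np.array([[0.001, 0.20, 0.03],
                  [0.008, 0.50, 0.02]])
activated = benjamini_hochberg(p_map, q=0.05)
print(activated)
```

Because the i-th smallest p-value is compared with q·i/m, the cut-off adapts to how many pixels carry signal, which is one reason FDR control is attractive when the number of multiple comparisons is large.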
Wardrop Equilibrium Can Be Boundedly Rational: A New Behavioral Theory of Route Choice
As one of the most fundamental concepts in transportation science, Wardrop
equilibrium (WE) has always had a relatively weak behavioral underpinning. To
strengthen this foundation, one must reckon with bounded rationality in human
decision-making processes, such as the lack of accurate information, limited
computing power, and sub-optimal choices. This retreat from behavioral
perfectionism in the literature, however, was typically accompanied by a
conceptual modification of WE. Here we show that giving up perfect rationality
need not force a departure from WE. On the contrary, WE can be reached with
global stability in a routing game played by boundedly rational travelers. We
achieve this result by developing a day-to-day (DTD) dynamical model that
mimics how travelers gradually adjust their route valuations, and hence their choice
probabilities, based on past experience. Our model, called cumulative logit
(CULO), resembles the classical DTD models but makes a crucial change: whereas
the classical models assume routes are valued based on the cost averaged over
historical data, ours values routes based on the accumulated cost. To
describe route choice behaviors, the CULO model uses only two parameters: one
accounting for the rate at which future route costs are discounted in the
valuation relative to past ones, and the other describing the sensitivity of
route choice probabilities to valuation differences. We prove that the CULO
model always converges to WE, regardless of the initial point, as long as the
behavioral parameters satisfy certain mild conditions. Our theory thus upholds
WE's role as a benchmark in transportation systems analysis. It also resolves
the theoretical challenge posed by Harsanyi's instability problem by explaining
why equally good routes at WE are selected with different probabilities.
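The idea of valuing routes by accumulated rather than averaged cost can be illustrated with a toy day-to-day simulation. This is not the authors' CULO specification: the two-route network with linear costs c1 = 1 + 2·f1 and c2 = 2 + f2, the decaying weight 1/sqrt(t + 1) placed on each new day's cost (standing in for the discount-rate parameter), the logit sensitivity `beta`, and all function names are assumptions made for illustration. Each day, travelers split over the routes with logit probabilities of their accumulated valuations, experience the resulting costs, and add the weighted costs to their valuations.

```python
import math


def logit_probs(values, beta):
    """Choice probabilities p_k proportional to exp(-beta * V_k)."""
    m = min(values)  # shift by the minimum for numerical stability
    weights = [math.exp(-beta * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def route_costs(flows):
    """Hypothetical two-route network with linear, flow-dependent costs."""
    f1, f2 = flows
    return [1.0 + 2.0 * f1, 2.0 + f2]


def cumulative_logit(beta=1.0, steps=2000, demand=1.0):
    """Toy day-to-day dynamics: valuations accumulate discounted daily costs."""
    values = [0.0, 0.0]  # accumulated (weighted) cost of each route
    for t in range(steps):
        flows = [demand * p for p in logit_probs(values, beta)]
        costs = route_costs(flows)
        weight = 1.0 / math.sqrt(t + 1)  # assumed: newer costs get smaller weight
        values = [v + weight * c for v, c in zip(values, costs)]
    flows = [demand * p for p in logit_probs(values, beta)]
    return flows, route_costs(flows)


flows, costs = cumulative_logit()
print(flows, costs)
```

For this network, WE requires 1 + 2·f1 = 2 + (1 - f1), i.e. f1 = 2/3 and f2 = 1/3, with both routes costing 7/3. The simulation settles at that point, where the two equally costly routes are nevertheless chosen with probabilities 2/3 and 1/3, echoing the point about Harsanyi's instability problem.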
Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models
The recent performance leap of Large Language Models (LLMs) opens up new
opportunities across numerous industrial applications and domains. However,
erroneous generations, such as false predictions, misinformation, and
hallucination made by LLMs, have also raised severe concerns for the
trustworthiness of LLMs, especially in safety-, security-, and
reliability-sensitive scenarios, potentially hindering real-world adoption.
While uncertainty estimation has shown its potential for interpreting the
prediction risks made by general machine learning (ML) models, little is known
about whether and to what extent it can help explore an LLM's capabilities and
counteract its undesired behavior. To bridge the gap, in this paper, we
initiate an exploratory study on the risk assessment of LLMs from the lens of
uncertainty. In particular, we experiment with twelve uncertainty estimation
methods and four LLMs on four prominent natural language processing (NLP) tasks
to investigate to what extent uncertainty estimation techniques could help
characterize the prediction risks of LLMs. Our findings validate the
effectiveness of uncertainty estimation for revealing LLMs'
uncertain/non-factual predictions. In addition to general NLP tasks, we
conduct extensive experiments with four LLMs for code generation on two
datasets. We find that uncertainty estimation can potentially uncover buggy
programs generated by LLMs. Insights from our study shed light on future design
and development for reliable LLMs, facilitating further research toward
enhancing the trustworthiness of LLMs.
Comment: 20 pages, 4 figures
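As a concrete example of the kind of score such a study evaluates, the sketch below computes two simple sequence-level uncertainty scores from per-step token probabilities: mean predictive entropy and mean negative log-likelihood of the generated tokens. The abstract does not name its twelve methods, so these two scores, the function names, and the toy distributions are assumptions; in practice the probabilities would come from an LLM's output logits.

```python
import math


def token_entropy(dist):
    """Shannon entropy (nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def mean_entropy(step_dists):
    """Average entropy over all generation steps; higher means less certain."""
    return sum(token_entropy(d) for d in step_dists) / len(step_dists)


def mean_nll(chosen_probs):
    """Average negative log-probability of the tokens actually generated."""
    return -sum(math.log(p) for p in chosen_probs) / len(chosen_probs)


# Hypothetical distributions over a 4-token vocabulary for a 3-step generation.
confident = [[0.97, 0.01, 0.01, 0.01]] * 3
uncertain = [[0.25, 0.25, 0.25, 0.25]] * 3

print(mean_entropy(confident), mean_entropy(uncertain))
print(mean_nll([0.97, 0.97, 0.97]), mean_nll([0.25, 0.25, 0.25]))
```

A generation whose score exceeds a threshold tuned on held-out data could then be flagged as potentially non-factual or, for code generation, potentially buggy.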
Ammonia Nitrogen Pollution Characteristics of Natural Rainfall in an Urban Business District in Southern China: A Case Study of Chengdu City
Chengdu was chosen in this work as a representative city of southern China, and the characteristics of ammonia nitrogen (NH3-N) pollution in natural rainfall were analyzed by measuring NH3-N concentrations in 15 natural rainfall events from April to September 2017. The influence of ammonia emitted from building toilet vents on NH3-N pollution in rainfall was investigated, and the variation in the total NH3-N pollutant load and its influencing factors are discussed. The results showed that the average NH3-N concentration was highest in the first rainfall, reaching 18.2 mg/L; the average concentrations in the subsequent 14 rainfalls ranged from 2.0 to 5.0 mg/L, exceeding the Grade V limit (≤2 mg/L) of the Environmental Quality Standards for Surface Water (GB 3838-2002). Rainfall is thus an important source of NH3-N pollution in water. The NH3-N concentration in natural rainfall decreased as the distance between the sampling point and the toilet vent increased, indicating that ammonia discharged from toilet exhaust is a major source of NH3-N pollution in the urban atmosphere. The main factors affecting the total NH3-N pollutant load in natural precipitation include rainfall intensity, rainfall duration, and the number of antecedent dry (drought) days. The total amount of NH3-N pollutants in surface runoff is less than that in natural rainfall.
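The link between rainfall intensity, duration, and total NH3-N load can be made concrete with simple arithmetic: 1 mm of rain over 1 m² is 1 L of water, so the wet-deposition load per square metre (mg) equals concentration (mg/L) times rainfall depth (mm), and depth equals intensity times duration. This sketch assumes a constant concentration over the whole event, which is a simplification; the 18.2 mg/L value and the 2 mg/L Grade V limit come from the text above, while the 5 mm/h intensity, 2 h duration, and function names are hypothetical.

```python
GRADE_V_LIMIT_MG_PER_L = 2.0  # NH3-N limit for Grade V surface water, GB 3838-2002


def rainfall_depth_mm(intensity_mm_per_h, duration_h):
    """Rainfall depth (mm) = intensity (mm/h) x duration (h)."""
    return intensity_mm_per_h * duration_h


def nh3n_load_mg(conc_mg_per_l, intensity_mm_per_h, duration_h, area_m2=1.0):
    """NH3-N mass deposited (mg), assuming a constant event concentration.

    1 mm of rain over 1 m^2 is 1 L, so volume (L) = depth (mm) * area (m^2).
    """
    volume_l = rainfall_depth_mm(intensity_mm_per_h, duration_h) * area_m2
    return conc_mg_per_l * volume_l


def exceedance_ratio(conc_mg_per_l):
    """How many times the concentration exceeds the Grade V limit."""
    return conc_mg_per_l / GRADE_V_LIMIT_MG_PER_L


load = nh3n_load_mg(18.2, intensity_mm_per_h=5.0, duration_h=2.0)
print(f"{load:.1f} mg/m^2, {exceedance_ratio(18.2):.1f}x Grade V")
# prints "182.0 mg/m^2, 9.1x Grade V"
```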
Modeling lightcurves for improved classification of astronomical objects
Many synoptic surveys are observing large parts of the sky multiple times. The resulting time series of light measurements, called lightcurves, provide a wonderful window into the dynamic nature of the Universe. However, analyzing these lightcurves poses many significant challenges. We describe a modeling-based approach that uses Gaussian process regression to generate measures critical for classifying such lightcurves. This method has key advantages over other popular nonparametric regression methods in its ability to deal with censoring, a mixture of sparsely and densely sampled curves, annual gaps caused by objects not being visible throughout the year from a given position on Earth, and known but variable measurement errors. We show that our approach performs better: it achieves a higher correct classification rate than methods previously popular in astronomy. Finally, we outline future directions for its use in sky surveys that are growing larger by the day.
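The core of Gaussian process regression with known but variable measurement errors can be sketched in a few lines of NumPy: each observation's error variance is added to the diagonal of the kernel matrix, and the posterior variance grows inside observing gaps. This is a minimal illustration, not the authors' model: the squared-exponential kernel, its fixed hyperparameters (`amplitude`, `length_scale`), the zero prior mean, and the toy lightcurve are assumptions, and censoring is not handled.

```python
import numpy as np


def rbf_kernel(a, b, amplitude=1.0, length_scale=10.0):
    """Squared-exponential covariance between time points a and b."""
    diff = a[:, None] - b[None, :]
    return amplitude**2 * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(t_obs, y_obs, sigma_obs, t_new, amplitude=1.0, length_scale=10.0):
    """Zero-mean GP posterior mean and variance with per-point known noise."""
    K = rbf_kernel(t_obs, t_obs, amplitude, length_scale) + np.diag(sigma_obs**2)
    K_s = rbf_kernel(t_new, t_obs, amplitude, length_scale)
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = amplitude**2 - np.sum(v**2, axis=0)       # diagonal of posterior covariance
    return mean, var


t_obs = np.array([0.0, 5.0, 10.0, 15.0, 60.0, 65.0])        # days; gap from 15 to 60
y_obs = np.sin(t_obs / 10.0)                                 # toy magnitudes
sigma_obs = np.array([0.05, 0.05, 0.2, 0.2, 0.05, 0.05])     # known, unequal errors
t_new = np.array([7.5, 37.5])
mean, var = gp_posterior(t_obs, y_obs, sigma_obs, t_new)
print(mean, var)
```

The posterior mean and variance evaluated on a time grid could then supply summary features to a classifier; fitting the hyperparameters, for example by maximizing the marginal likelihood, is omitted here.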