Homophily and Contagion Are Generically Confounded in Observational Social Network Studies
We consider processes on social networks that can potentially involve three
factors: homophily, or the formation of social ties due to matching individual
traits; social contagion, also known as social influence; and the causal effect
of an individual's covariates on their behavior or other measurable responses.
We show that, generically, all of these are confounded with each other.
Distinguishing them from one another requires strong assumptions on the
parametrization of the social process or on the adequacy of the covariates used
(or both). In particular we demonstrate, with simple examples, that asymmetries
in regression coefficients cannot identify causal effects, and that very simple
models of imitation (a form of social contagion) can produce substantial
correlations between an individual's enduring traits and their choices, even
when there is no intrinsic affinity between them. We also suggest some possible
constructive responses to these results.
Comment: 27 pages, 9 figures. V2: Revised in response to referees. V3: Ditt
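The imitation claim in this abstract can be illustrated with a small simulation. The sketch below is not the authors' model: it assumes a binary trait, a network whose ties form mostly within trait groups (homophily), initial choices drawn independently of the trait, and voter-style copying of a random neighbour. All names and parameter values (`simulate`, `p_in`, `p_out`, `steps`) are hypothetical. Under these assumptions the trait never enters the choice rule, yet the trait-choice correlation at the end of a run can be far from zero.

```python
import random


def simulate(n=200, p_in=0.1, p_out=0.005, steps=20000, seed=0):
    """Homophilous network plus pure imitation; the trait never affects choice directly."""
    rng = random.Random(seed)
    trait = [i % 2 for i in range(n)]  # enduring binary trait
    nbrs = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Homophily (assumed form): ties are likelier between equal traits.
            p = p_in if trait[i] == trait[j] else p_out
            if rng.random() < p:
                nbrs[i].append(j)
                nbrs[j].append(i)
    # Initial choices are independent of the trait.
    choice = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(steps):
        i = rng.randrange(n)
        if nbrs[i]:
            # Imitation: copy the current choice of a random neighbour.
            choice[i] = choice[rng.choice(nbrs[i])]
    return trait, choice


def correlation(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5
```

Calling `correlation(*simulate(seed=s))` for several seeds gives a feel for how widely the trait-choice correlation varies across runs; its sign is arbitrary because neither choice is favoured by either trait.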
Data based identification and prediction of nonlinear and complex dynamical systems
We thank Dr. R. Yang (formerly at ASU), Dr. R.-Q. Su (formerly at ASU), and Mr. Zhesi Shen for their contributions to a number of original papers on which this Review is partly based. This work was supported by ARO under Grant No. W911NF-14-1-0504. W.-X. Wang was also supported by NSFC under Grants No. 61573064 and No. 61074116, as well as by the Fundamental Research Funds for the Central Universities, Beijing Nova Programme.
Peer reviewed. Postprin
Breadth analysis of Online Social Networks
This thesis is mainly motivated by the analysis, understanding, and prediction of human behaviour
by means of the study of people's digital fingerprints. Unlike a classical PhD thesis, which
picks a single topic and analyses it in depth, we carried out a breadth
analysis of complex networks, such as those that humans create themselves
with their relationships and interactions. These kinds of digital communities where humans interact
and create relationships are commonly called Online Social Networks. First, (i) we
collected their interactions, as text messages they share among each other, in order to analyze the
sentiment and topic of such messages. We applied state-of-the-art techniques
for Natural Language Processing, widely developed and tested on English texts, to a collection
of Spanish Tweets and compared the results. Next, (ii) we focused on Topic Detection, creating
our own classifier and applying it to the same Tweets dataset. The contributions are twofold:
our classifier relies on text-graphs built from the input text, and we achieved 70% accuracy,
outperforming previous results. After that, (iii) we moved on to analyze the network structure (or
topology) and their data values to detect outliers. We hypothesize that in social networks there
is a large mass of users that behave similarly, while a small set of them behave in a different
way. Within this last group, we try to separate those with high activity, or
low activity, or any other parameter/feature that makes them belong to different kinds of outliers.
We aim to detect influential users in one of these outlier sets. We propose a new unsupervised
method, Massive Unsupervised Outlier Detection (MUOD), which labels the detected outliers as
shape, magnitude, or amplitude outliers, or a combination of those. We applied this method to a subset of
roughly 400 million Google+ users, automatically identifying and discriminating sets of outlier
users. Finally, (iv) we found it interesting to address the monitoring of real complex networks.
We created a framework to dynamically adapt the temporality of large-scale dynamic networks,
reducing compute overhead by at least 76%, data volume by 60% and overall cloud costs by at
least 54%, while always maintaining accuracy above 88%.
Published. Doctoral Programme in Mathematical Engineering, Universidad Carlos III de Madrid. President: Rosa María Benito Zafrilla. Secretary: Ángel Cuevas Rumín. Member: José Ernesto Jiménez Merin
Forecasting: theory and practice
Forecasting has always been in the forefront of decision making and planning.
The uncertainty that surrounds the future is both exciting and challenging,
with individuals and organisations seeking to minimise risks and maximise
utilities. The lack of a free-lunch theorem implies the need for a diverse set
of forecasting methods to tackle an array of applications. This unique article
provides a non-systematic review of the theory and the practice of forecasting.
We offer a wide range of theoretical, state-of-the-art models, methods,
principles, and approaches to prepare, produce, organise, and evaluate
forecasts. We then demonstrate how such theoretical concepts are applied in a
variety of real-life contexts, including operations, economics, finance,
energy, environment, and social good. We do not claim that this review is an
exhaustive list of methods and applications. The list was compiled based on the
expertise and interests of the authors. However, we wish that our encyclopedic
presentation will offer a point of reference for the rich work that has been
undertaken over the last decades, with some key insights for the future of the
forecasting theory and practice.
Tumour growth: An approach to calibrate parameters of a multiphase porous media model based on in vitro observations of Neuroblastoma spheroid growth in a hydrogel microenvironment
To unravel processes that lead to the growth of solid tumours, it is necessary to link knowledge of cancer biology with the physical properties of the tumour and its interaction with the surrounding microenvironment. Our understanding of the underlying mechanisms is, however, still imprecise. We therefore developed computational physics-based models, which incorporate the interaction of the tumour with its surroundings based on the theory of porous media. However, the experimental validation of such models represents a challenge to their clinical use as prognostic tools. This study combines a physics-based model with in vitro experiments based on microfluidic devices used to mimic a three-dimensional tumour microenvironment. By conducting a global sensitivity analysis, we identify the most influential input parameters and infer their posterior distribution based on Bayesian calibration. The resulting probability density is in agreement with the scattering of the experimental data and thus validates the proposed workflow. This study demonstrates the huge challenges associated with determining precise parameters with usually only limited data for such complex processes and models, but also demonstrates in general how to indirectly characterise the mechanical properties of neuroblastoma spheroids that cannot feasibly be measured experimentally.
Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain
The present paper explores the technical efficiency of four hotels from Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking is established for these four hotel units located in Portugal using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiencies in the estimation process, enabling an investigation of the main causes of inefficiency. Several suggestions concerning efficiency improvement are made for each hotel studied.
info:eu-repo/semantics/publishedVersio
Statistical Tools for Network Data: Prediction and Resampling
Advances in data collection and social media have led to
more and more network data appearing in diverse areas, such
as social sciences, internet, transportation and biology.
This thesis develops new principled statistical tools for network analysis,
with emphasis on both appealing statistical properties and
computational efficiency.
Our first project focuses on building prediction models for
network-linked data. Prediction algorithms typically assume the
training data are independent samples, but in many modern applications
samples come from individuals connected by a network. For example, in
adolescent health studies of risk-taking behaviors, information on the
subjects' social network is often available and plays an important
role through network cohesion, the empirically observed phenomenon of
friends behaving similarly. Taking cohesion into account in
prediction models should allow us to improve their performance. We propose a network-based penalty on individual node effects to encourage similarity between predictions for linked nodes, and show that incorporating it into prediction leads to improvement over
traditional models both theoretically and empirically when network
cohesion is present. The penalty can be used with many loss-based
prediction methods, such as regression, generalized linear models, and
Cox's proportional hazard model. Applications to predicting levels of
recreational activity and marijuana usage among teenagers from the
AddHealth study based on both demographic covariates and friendship
networks are discussed in detail. Our approach to taking
friendships into account can significantly improve predictions of
behavior while providing interpretable estimates of covariate effects.
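A minimal sketch of a network cohesion penalty, under simplifying assumptions that are not taken from the thesis: covariates are dropped, so each node i only has an individual effect a_i, and the objective is assumed to be sum_i (y_i - a_i)^2 + lam * sum over edges (i, j) of (a_i - a_j)^2. The function name `cohesion_smooth` and the coordinate-wise solver are illustrative choices. Setting the derivative with respect to a_i to zero gives a_i = (y_i + lam * sum of neighbouring a_j) / (1 + lam * degree_i), which the loop applies repeatedly.

```python
def cohesion_smooth(y, edges, lam, n_iter=500):
    """Minimise sum_i (y_i - a_i)^2 + lam * sum_{(i,j) in edges} (a_i - a_j)^2.

    Solved by repeated coordinate-wise updates (Gauss-Seidel sweeps).
    """
    n = len(y)
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    a = list(y)  # start from the unpenalised fit
    for _ in range(n_iter):
        for i in range(n):
            # Exact minimiser in a_i with all other effects held fixed.
            total = y[i] + lam * sum(a[j] for j in neighbours[i])
            a[i] = total / (1 + lam * len(neighbours[i]))
    return a
```

With `lam = 0` the responses are returned unchanged; larger values pull the fitted effects of linked nodes towards each other while leaving their overall sum equal to the sum of the responses.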
Resampling, data splitting, and cross-validation are powerful general strategies in statistical inference, but resampling from a network remains
a challenging problem. Many statistical models and methods for networks need model selection and tuning parameters, which could be done by cross-validation if we had a good method for splitting network data; however, splitting
network nodes into groups requires deleting edges and destroys some of
the structure. Here we propose a new network cross-validation
strategy based on splitting edges rather than nodes, which avoids
losing information and is applicable to a wide range of network
models. We provide a theoretical justification for our method in a
general setting and demonstrate how our method can be used in a
number of specific model selection and parameter tuning tasks, with extensive
numerical results on simulated networks. We also apply the method to analysis of a citation
network of statisticians and obtain meaningful research communities.
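The idea of splitting node pairs rather than nodes can be sketched as follows. This is a simplified illustration, not the procedure from the thesis: the fitting step here is assumed to be a plain block-wise edge frequency computed on the training pairs for a given candidate labelling, and the function names (`split_pairs`, `block_probs`, `heldout_loss`) are hypothetical. Candidate models are compared by their squared error on the held-out pairs.

```python
import random
from itertools import combinations


def split_pairs(n, holdout_frac, rng):
    """Randomly split all unordered node pairs into (train, held-out) lists."""
    pairs = list(combinations(range(n), 2))
    rng.shuffle(pairs)
    k = int(len(pairs) * holdout_frac)
    return pairs[k:], pairs[:k]


def block_probs(adj, labels, pairs):
    """Edge frequency for each pair of blocks, using only the given pairs."""
    counts, edges = {}, {}
    for i, j in pairs:
        key = tuple(sorted((labels[i], labels[j])))
        counts[key] = counts.get(key, 0) + 1
        edges[key] = edges.get(key, 0) + adj[i][j]
    return {key: edges[key] / counts[key] for key in counts}


def heldout_loss(adj, labels, train, held_out):
    """Mean squared error on held-out pairs of a model fitted on train pairs."""
    probs = block_probs(adj, labels, train)
    # Fallback for block pairs never seen in training: overall edge density.
    overall = sum(adj[i][j] for i, j in train) / len(train)
    loss = 0.0
    for i, j in held_out:
        key = tuple(sorted((labels[i], labels[j])))
        loss += (adj[i][j] - probs.get(key, overall)) ** 2
    return loss / len(held_out)
```

A typical use would call `split_pairs` once per fold, evaluate `heldout_loss` for each candidate labelling (for example, labellings with different numbers of communities), and keep the candidate with the smallest average loss. No node is ever removed, so every node contributes training pairs in every fold.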
Finally, we consider the problem of community detection on partially
observed networks. In
practice, network data are often collected through sampling
mechanisms, such as survey questionnaires, instead of direct
observation. The noise and bias introduced by such sampling mechanisms can obscure the community structure and invalidate the assumptions of standard community detection
methods. We propose a model to
incorporate neighborhood sampling, through a model reflective of survey designs, into community detection for directed networks, since friendship networks obtained from surveys are naturally directed. We model the edge sampling probabilities as a function of both individual preferences and community parameters, and fit the model by a combination of spectral clustering and the method of
moments. The algorithm is computationally efficient and comes with a theoretical guarantee of consistency. We evaluate the proposed
model in extensive simulation studies and apply it to a
faculty hiring dataset, discovering a meaningful hierarchy of communities among US business schools.
PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/145894/1/tianxili_1.pd
Scalable Machine Learning Methods for Massive Biomedical Data Analysis.
Modern data acquisition techniques have enabled biomedical researchers to collect and analyze datasets of substantial size and complexity. The massive size of these datasets allows us to comprehensively study the biological system of interest at an unprecedented level of detail, which may lead to the discovery of clinically relevant biomarkers. Nonetheless, the dimensionality of these datasets presents critical computational and statistical challenges, as traditional statistical methods break down when the number of predictors dominates the number of observations, a setting frequently encountered in biomedical data analysis. This difficulty is compounded by the fact that biological data tend to be noisy and often possess complex correlation patterns among the predictors. The central goal of this dissertation is to develop a computationally tractable machine learning framework that allows us to extract scientifically meaningful information from these massive and highly complex biomedical datasets. We motivate the scope of our study by considering two important problems with clinical relevance: (1) uncertainty analysis for biomedical image registration, and (2) psychiatric disease prediction based on functional connectomes, which are high dimensional correlation maps generated from resting state functional MRI.
PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/111354/1/takanori_1.pd
Towards Explainable Artificial Intelligence (XAI): A Data Mining Perspective
Given the complexity and lack of transparency in deep neural networks (DNNs),
extensive efforts have been made to make these systems more interpretable or
explain their behaviors in accessible terms. Unlike most reviews, which focus
on algorithmic and model-centric perspectives, this work takes a "data-centric"
view, examining how data collection, processing, and analysis contribute to
explainable AI (XAI). We categorize existing work into three categories subject
to their purposes: interpretations of deep models, referring to feature
attributions and reasoning processes that correlate data points with model
outputs; influences of training data, examining the impact of training data
nuances, such as data valuation and sample anomalies, on decision-making
processes; and insights of domain knowledge, discovering latent patterns and
fostering new knowledge from data and models to advance social values and
scientific discovery. Specifically, we distill XAI methodologies into data
mining operations on training and testing data across modalities, such as
images, text, and tabular data, as well as on training logs, checkpoints,
models and other DNN behavior descriptors. In this way, our study offers a
comprehensive, data-centric examination of XAI from a lens of data mining
methods and applications.
Analysis of ethnographic, anthropological and archaeological data: an approach from the digital humanities and complex systems
The arrival of Computer Science, Big Data, Data Analysis, Machine Learning and Data Mining has changed the way science is done in every scientific field, giving rise, in turn, to new disciplines such as Computational Mechanics, Bioinformatics, Health Engineering, Computational Social Science, Computational Economics, Computational Archaeology and the Digital Humanities, among others. It should be noted that all these new disciplines are still very young and continuously growing, so contributing to their advancement and consolidation has great scientific value.
In this doctoral thesis we contribute to the development of a new line of research devoted to the use of formal models, analytical methods and computational approaches for the study of human societies, both present and past.
Ministerio de Ciencia e Innovación (Spanish Ministry of Science and Innovation)
• SimulPast project – “Social and environmental transitions: simulating the past to
understand human behaviour” (CSD2010-00034 CONSOLIDER-INGENIO
2010).
• CULM project – “Modelling cultivation in prehistory” (HAR2016-77672-P).
• SimPastNet Network of Excellence – “Simulating the past to understand
human behaviour” (HAR2017-90883-REDC).
• SocioComplex Network of Excellence – “Socio-Technological Complex Systems”
(RED2018-102518-T).
Consejería de Educación de la Junta de Castilla y León (Regional Ministry of Education of Castilla y León)
• Grant for the research line “Understanding human behaviour,
an approach from complex systems and the digital humanities” within
the support programme for recognised research groups (GIR) at the
public universities of Castilla y León (BDNS 425389
- …