Online Deception Detection Refueled by Real World Data Collection
The lack of large realistic datasets presents a bottleneck in online
deception detection studies. In this paper, we apply a data collection method
based on social network analysis to quickly identify high-quality deceptive and
truthful online reviews from Amazon. The dataset contains more than 10,000
deceptive reviews and is diverse in product domains and reviewers. Using this
dataset, we explore effective general features for online deception detection
that perform well across domains. We demonstrate that with generalized features
- advertising speak and writing complexity scores - deception detection
performance can be further improved by adding deceptive reviews from
assorted domains in training. Finally, reviewer level evaluation gives an
interesting insight into different deceptive reviewers' writing styles.
Comment: 10 pages, Accepted to Recent Advances in Natural Language Processing
(RANLP) 201
Optimal-k difference sequence in nonparametric regression
Difference-based methods have been attracting increasing attention in
nonparametric regression, in particular for estimating the residual variance. To
implement the estimation, one needs to choose an appropriate difference
sequence, mainly between the optimal difference sequence and the
ordinary difference sequence. The difference sequence selection is a
fundamental problem in nonparametric regression, and it has remained a
controversial issue for over three decades. In this paper, we propose to tackle
this challenging issue from a new perspective, namely by introducing a new
difference sequence called the optimal-k difference sequence. The new
difference sequence not only provides a better balance between bias and
variance, but also dramatically enlarges the existing family of
difference sequences that includes the optimal and ordinary difference
sequences as two important special cases. We further demonstrate, by both
theoretical and numerical studies, that the optimal-k difference sequence has
been pushing the boundaries of our knowledge in difference-based methods in
nonparametric regression, and it always performs the best in practical
situations.
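As a rough illustration of what a difference-based variance estimator computes, the sketch below applies a generic difference sequence to ordered observations and averages the squared differences. It uses only the textbook first-order sequence (1/sqrt(2), -1/sqrt(2)), not the sequence proposed in the paper; the function names, the sine mean function and the simulation settings are assumptions made for this example.

```python
import math
import random


def difference_variance(y, d):
    """Difference-based estimate of the residual variance.

    y: observations ordered along the design (e.g. an equidistant grid).
    d: difference sequence (d_0, ..., d_r) with sum(d) = 0 and sum(d^2) = 1.
    """
    r = len(d) - 1
    n = len(y)
    if r < 1 or n <= r:
        raise ValueError("need len(d) >= 2 and len(y) > len(d) - 1")
    if not math.isclose(sum(d), 0.0, abs_tol=1e-9):
        raise ValueError("difference sequence must sum to zero")
    if not math.isclose(sum(c * c for c in d), 1.0, abs_tol=1e-9):
        raise ValueError("difference sequence must have unit squared norm")
    total = 0.0
    for i in range(n - r):
        # Differencing neighbouring points removes most of a smooth mean.
        diff = sum(d[j] * y[i + j] for j in range(r + 1))
        total += diff * diff
    return total / (n - r)


# Textbook first-order sequence (an illustrative choice, not the paper's).
FIRST_ORDER = (1 / math.sqrt(2), -1 / math.sqrt(2))


def simulate(n=2000, sigma=0.5, seed=0):
    """Hypothetical check: smooth mean plus Gaussian noise of variance sigma^2."""
    rng = random.Random(seed)
    y = [math.sin(2 * math.pi * i / n) + rng.gauss(0.0, sigma) for i in range(n)]
    return difference_variance(y, FIRST_ORDER)
```

Calling `simulate()` should return a value close to `sigma ** 2`, because the smooth mean contributes very little to neighbouring differences. Longer sequences change how bias and variance trade off, which is the selection problem the abstract describes.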
Global Depths for Irregularly Observed Multivariate Functional Data
Two frameworks for multivariate functional depth based on multivariate depths
are introduced in this paper. The first framework is multivariate functional
integrated depth, and the second framework involves multivariate functional
extremal depth, which is an extension of the extremal depth for univariate
functional data. In each framework, global and local multivariate functional
depths are proposed. The properties of population multivariate functional
depths and consistency of finite sample depths to their population versions are
established. In addition, finite sample depths under irregularly observed time
grids are estimated. As a by-product, the simplified sparse functional boxplot
and simplified intensity sparse functional boxplot are proposed for
visualization without data reconstruction. A simulation study demonstrates the
advantages of global multivariate functional depths over local multivariate
functional depths in outlier detection and running time for big functional
data. An application of our frameworks to cyclone tracks data demonstrates the
excellent performance of our global multivariate functional depths.
Comment: 29 pages, 6 figures
A New Functional Clustering Method with Combined Dissimilarity Sources and Graphical Interpretation
Clustering is an essential task in functional data analysis. In this study, we propose a framework for a clustering procedure based on functional rankings or depth. Our methods naturally combine various types of between-cluster variation equally, which caters to various discriminative sources of functional data; for example, they combine raw data with transformed data or various components of multivariate functional data with their covariance. Our methods also enhance the clustering results with a visualization tool that allows intrinsic graphical interpretation. Finally, our methods are model-free and nonparametric and hence are robust to heavy-tailed distributions or potential outliers. The implementation and performance of the proposed methods are illustrated with a simulation study and applied to three real-world applications.
Time Reversal Enabled Fiber-Optic Time Synchronization
Over the past few decades, fiber-optic time synchronization (FOTS) has
provided fundamental support for the efficient operation of modern society.
Looking toward the future beyond fifth-generation/sixth-generation (B5G/6G)
scenarios and very large radio telescope arrays, developing high-precision,
low-complexity and scalable FOTS technology is crucial for building a
large-scale time synchronization network. However, the traditional two-way FOTS
method needs a data layer to exchange time delay information. This increases
the complexity of the system and makes it impossible to realize multiple-access
time synchronization. In this paper, a time reversal enabled FOTS method is
proposed. It measures the clock difference between two locations without
involving a data layer, which can reduce the complexity of the system.
Moreover, it can also achieve multiple-access time synchronization along the
fiber link. Tests over a 230 km fiber link have been carried out to demonstrate
the high performance of the proposed method.
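To see why the traditional two-way approach needs a data exchange, the sketch below works through the standard two-way arithmetic. It assumes a symmetric link delay, and it assumes that each site measures the interval from its own clock tick to the arrival of the other site's pulse. The function and variable names are hypothetical, and this is the conventional scheme, not the time reversal method proposed in the paper.

```python
def two_way_offset(m_a, m_b):
    """Recover clock difference and one-way delay from two local measurements.

    m_a: interval measured at site A, from A's tick to arrival of B's pulse.
    m_b: interval measured at site B, from B's tick to arrival of A's pulse.
    Assumes the fiber delay is the same in both directions, so that
    m_a = delay - diff and m_b = delay + diff, where diff is how much later
    A's tick occurs than B's tick.
    """
    clock_diff = (m_b - m_a) / 2.0
    delay = (m_a + m_b) / 2.0
    return clock_diff, delay


# Illustrative numbers only: about 1.15 ms of delay and a 3 ns clock difference.
true_delay = 1.15e-3
true_diff = 3e-9
diff, delay = two_way_offset(true_delay - true_diff, true_delay + true_diff)
```

Neither site can compute `clock_diff` from its own measurement alone: `m_a` and `m_b` must be brought together, which is the role of the data layer mentioned in the abstract.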
BPTF promotes tumor growth and predicts poor prognosis in lung adenocarcinomas.
BPTF, a subunit of NURF, is well known to be involved in the development of eukaryotic cells, but little is known about its roles in cancers, especially in non-small-cell lung cancer (NSCLC). Here we showed that BPTF was specifically overexpressed in NSCLC cell lines and lung adenocarcinoma tissues. Knockdown of BPTF by siRNA significantly inhibited cell proliferation, induced cell apoptosis and arrested cell cycle progression from G1 to S phase. We also found that BPTF knockdown downregulated the expression of the phosphorylated Erk1/2, PI3K and Akt proteins and induced the cleavage of caspase-8, caspase-7 and PARP proteins, thereby inhibiting MAPK and PI3K/AKT signaling and activating the apoptotic pathway. BPTF knockdown by siRNA also upregulated cell cycle inhibitors such as p21 and p18 but inhibited the expression of cyclin D, phospho-Rb and phospho-cdc2 in lung cancer cells. Moreover, BPTF knockdown by its specific shRNA inhibited lung cancer growth in vivo in xenografts of A549 cells, accompanied by the suppression of VEGF, p-Erk and p-Akt expression. Immunohistochemical assay of lung tumor tissue microarrays showed that BPTF overexpression predicted a poor prognosis in patients with lung adenocarcinomas. Therefore, our data indicate that BPTF plays an essential role in cell growth and survival by targeting multiple signaling pathways in human lung cancers.
Optimal Estimation of Derivatives in Nonparametric Regression
We propose a simple framework for estimating derivatives without fitting the regression function in nonparametric regression. Unlike most existing methods that use the symmetric difference quotients, our method is constructed as a linear combination of observations. It is hence very flexible and applicable to both interior and boundary points, including most existing methods as special cases of ours. Within this framework, we define the variance-minimizing estimators for any order derivative of the regression function with a fixed bias-reduction level. For the equidistant design, we derive the asymptotic variance and bias of these estimators. We also show that our new method will, for the first time, achieve the asymptotically optimal convergence rate for difference-based estimators. Finally, we provide an effective criterion for selection of tuning parameters and demonstrate the usefulness of the proposed method through extensive simulation studies of the first- and second-order derivative estimators.
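For reference, the symmetric difference quotient that the abstract contrasts against can be sketched as follows. This is the baseline estimator on an equidistant design, not the proposed method; the function name and the lag parameter `k` are assumptions made for this example.

```python
def symmetric_difference_quotient(y, i, h, k=1):
    """First-derivative estimate at grid point i on an equidistant design.

    y: observations y_0, ..., y_{n-1} at x_j = x_0 + j * h.
    h: grid spacing.
    k: lag, i.e. how many grid steps to go on each side of point i.
    """
    if k < 1 or i - k < 0 or i + k >= len(y):
        # The quotient needs k neighbours on both sides, so it is
        # undefined at points too close to the boundary.
        raise ValueError("index i must have k neighbours on each side")
    return (y[i + k] - y[i - k]) / (2 * k * h)


# Noise-free quadratic on a grid of spacing 0.1; the true derivative at x = 0.5 is 1.
h = 0.1
y = [(j * h) ** 2 for j in range(11)]
slope = symmetric_difference_quotient(y, 5, h)
```

The quotient is itself a linear combination of two observations with weights of plus and minus 1/(2kh), and the `ValueError` branch shows the boundary limitation that a more general linear combination of observations can avoid.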