Optimal experiment design in a filtering context with application to sampled network data
We examine the problem of optimal design in the context of filtering multiple
random walks. Specifically, we define the steady state E-optimal design
criterion and show that the underlying optimization problem leads to a second
order cone program. The developed methodology is applied to tracking network
flow volumes using sampled data, where the design variable corresponds to
controlling the sampling rate. The optimal design is numerically compared to a
myopic and a naive strategy. Finally, we relate our work to the general problem
of steady state optimal design for state space models.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS283 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
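
To make the steady state criterion in the abstract above concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the paper: each flow is an independent scalar random walk with its own sampling rate, and the observation noise variance is assumed to scale as c / p for sampling rate p. With independent flows the error covariance is diagonal, so its largest eigenvalue (the E-optimal objective) is just the largest per-flow steady state variance. A plain grid search stands in for the second order cone program, and all function and variable names are hypothetical.

```python
import math


def steady_state_variance(q, r):
    """Steady state filtered variance of a scalar random walk.

    Model: x_t = x_{t-1} + w_t, y_t = x_t + v_t, Var(w_t) = q, Var(v_t) = r.
    The predicted variance M solves M^2 - q*M - q*r = 0, and the filtered
    variance is P = M - q.
    """
    m = (q + math.sqrt(q * q + 4.0 * q * r)) / 2.0
    return m - q


def worst_case_variance(rates, qs, cs):
    """E-optimal objective for independent flows: the largest variance.

    Assumption (illustrative only): noise variance is c / p for rate p.
    """
    return max(steady_state_variance(q, c / p) for p, q, c in zip(rates, qs, cs))


def grid_search_two_flows(qs, cs, budget=1.0, steps=999):
    """Split a sampling budget between two flows to minimise the objective."""
    best = None
    for k in range(1, steps + 1):
        p1 = budget * k / (steps + 1)
        rates = (p1, budget - p1)
        value = worst_case_variance(rates, qs, cs)
        if best is None or value < best[1]:
            best = (rates, value)
    return best


qs = (1.0, 4.0)  # random walk innovation variances (flow 2 is more volatile)
cs = (1.0, 1.0)  # hypothetical noise scaling constants
naive_value = worst_case_variance((0.5, 0.5), qs, cs)  # equal split
best_rates, best_value = grid_search_two_flows(qs, cs)
print("naive:", naive_value, "grid search:", best_rates, best_value)
```

In this toy setting the search shifts sampling toward the more volatile flow, which is the qualitative behaviour one would expect from a min-max criterion; it says nothing about the paper's actual numerical comparison.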
Identifiability of flow distributions from link measurements with applications to computer networks
We study the problem of identifiability of distributions of flows on a graph from aggregate measurements collected on its edges. This is a canonical example of a statistical inverse problem motivated by recent developments in computer networks. In this paper we (i) introduce a number of models for multi-modal data that capture their spatio-temporal correlation and (ii) provide sufficient conditions for the identifiability of nth order cumulants and also for a special class of heavy-tailed distributions. Further, we investigate conditions on network routing for the flows that prove sufficient for identifiability of their distributions (up to mean). Finally, we extend our results to directed acyclic graphs and discuss some open problems.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/58107/2/ip7_5_004.pd
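
A toy example may help with the phrase "up to mean" in the abstract above. The sketch assumes a hypothetical two-link, three-flow network with mutually independent flows, which is far simpler than the dependent models the paper studies, and it only looks at second-order moments. Link measurements are Y = A X for a 0/1 routing matrix A, so Cov(Y) = A diag(v) A^T. The flow variances v can be read off the link covariance, while two different flow mean vectors produce identical link means.

```python
def link_covariance(routing, flow_vars):
    """Cov(Y) = A diag(v) A^T for Y = A X with independent flows."""
    n_links = len(routing)
    n_flows = len(flow_vars)
    return [
        [
            sum(routing[i][k] * routing[j][k] * flow_vars[k] for k in range(n_flows))
            for j in range(n_links)
        ]
        for i in range(n_links)
    ]


def link_means(routing, flow_means):
    """E[Y] = A E[X]."""
    return [sum(a * m for a, m in zip(row, flow_means)) for row in routing]


# Hypothetical routing: link 1 carries flows 1 and 2, link 2 carries flows 2 and 3.
A = [[1, 1, 0],
     [0, 1, 1]]

true_vars = [2.0, 0.5, 3.0]
S = link_covariance(A, true_vars)

# Three distinct second moments give three equations for three variances:
# Var(Y1) = v1 + v2, Var(Y2) = v2 + v3, Cov(Y1, Y2) = v2.
recovered_vars = [S[0][0] - S[0][1], S[0][1], S[1][1] - S[0][1]]

# The means only give two equations for three unknowns, so they are not
# pinned down: these two mean vectors are indistinguishable on the links.
means_a = [5.0, 1.0, 4.0]
means_b = [4.0, 2.0, 3.0]

print(recovered_vars, link_means(A, means_a), link_means(A, means_b))
```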
A new framework of optimizing keyword weights in text categorization and record querying
In text mining research, the Vector Space Model (VSM) has been commonly used to represent each text document as a vector in which each component is associated with a particular word in the documents. Assigning appropriate keyword weights in VSM is critical in Information Retrieval (IR) and Text Categorization (TC).
Traditionally, keyword weighting is unsupervised; that is, knowledge of a document's category is not used when assigning the weights. Typically, each keyword weight is assigned using the term frequency -- inverse document frequency (TFIDF) measure. Although the TFIDF measure has proven effective in several text mining problems, it might not give the optimal classification power for IR and TC. In this thesis, we propose a new optimization framework to find the best keyword weights based on the proposed inter-class and intra-class similarity concept.
The optimal keyword weights can be viewed as a feature space projection in which documents from the same category are best clustered together and separated from other categories. Subsequently, category average (centroid) classification is employed to categorize text documents. The proposed approach is tested on two practical applications: record query and text categorization. The record query application is slightly different from traditional IR problems, as the goal is to find correlated (duplicate and master) text records. This problem was raised by a telecommunication company whose service engineers look for previously recorded problems in the database that are associated with the current defect. Extensive experiments demonstrate that the proposed framework significantly improves classification accuracy and provides balanced performance across all text categories when compared to the standard TFIDF search. The text categorization application is tested on the Reuters news data set, which is a gold-standard benchmark data set. The results show that our framework improves performance for the two applications considered, namely Information Retrieval and Text Categorization.
M.S.
Includes bibliographical references (p. 80-83)
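
For readers unfamiliar with the baseline that the thesis abstract above compares against, the following is a minimal sketch of standard TFIDF weighting followed by centroid (category average) classification with cosine similarity. It does not implement the thesis's optimized keyword weights. The tiny corpus, the category names, and every function name are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def build_idf(docs):
    """Inverse document frequency: log(N / df) for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def vectorize(doc, idf):
    """Sparse TFIDF vector; terms unseen in training are ignored."""
    tf = Counter(doc)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def build_centroids(vectors, labels):
    """Average the TFIDF vectors of each category."""
    counts = Counter(labels)
    sums = {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, {})
        for t, w in vec.items():
            acc[t] = acc.get(t, 0.0) + w
    return {
        label: {t: w / counts[label] for t, w in acc.items()}
        for label, acc in sums.items()
    }


def classify(doc, idf, centroids):
    """Assign the category whose centroid is most similar to the document."""
    vec = vectorize(doc, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical training records, already tokenised.
train_docs = [
    ["link", "outage", "router"],
    ["router", "reboot", "link"],
    ["invoice", "charge", "refund"],
    ["refund", "invoice", "late"],
]
train_labels = ["network", "network", "billing", "billing"]

idf = build_idf(train_docs)
centroids = build_centroids([vectorize(d, idf) for d in train_docs], train_labels)
print(classify(["router", "outage"], idf, centroids))
```

A supervised weighting scheme of the kind the abstract proposes would replace `build_idf` with weights chosen using the category labels, leaving the centroid step unchanged.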
Statistical Inverse Problems on Graphs with Application to Flow Volume Estimation in Computer Networks.
Estimation of flow volumes in computer networks involves the use of data that are either highly aggregated or fairly noisy. In this work we address several conceptual and practical aspects of using such data for flow volume estimation. The results presented are often of general statistical interest in addition to their application in the computer networks context.
First, we study the problem of identifiability of the joint distribution of flow volumes in a computer network from aggregate (lower dimensional) measurements collected on its edges. Conceptually, this is a canonical example of a statistical inverse problem. In a significant departure from previous approaches, we investigate settings where flow volumes exhibit dependence. We introduce a number of models that capture spatial, temporal and inter-modal (i.e. between packets and bytes) dependence between flow volumes. We provide sufficient, sometimes necessary, conditions for the identifiability of the flow volume distribution (up to mean) under these models. Next, we use these results and models to perform computer network tomography using joint modeling for packet and byte volumes. We highlight various technical challenges, propose different estimating procedures and investigate their properties. Finally, we examine the problem of optimal design in the context of filtering multiple random walks. Specifically, we define the steady state E-optimal design criterion and show that the underlying optimization problem is convex. The developed methodology is applied to tracking network flow volumes using sampled data, where the design variable corresponds to controlling the sampling rate.
Ph.D.
Statistics
University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/62268/1/singhal_1.pd
IDENTIFYING AND DETECTING REAL-TIME OBJECTS USING DRONE CAMERA
In keeping with current needs, the use of Unmanned Aerial Vehicles (UAVs), popularly known as drones, has increased tremendously. In this paper we present a system that detects and counts people in real time using a drone camera. There is also growing interest in video-based solutions for detecting, monitoring and counting in business and security applications. Compared with the classic sensor-based solutions used earlier, video-based ones allow more versatile functionality and improved performance at lower cost. In this paper, we present a real-time system for people counting based on a single low-end non-calibrated video camera. The main challenges of aerial image analysis include: 1] the size of an object/human in an aerial image can be very small, 2] objects in aerial images are tilted outward due to perspective projection deformation, which makes humans hard to recognize in aerial images, and 3] errors are likely to occur whenever multiple persons move closely, e.g. in shopping centres