Community Structure in the United States House of Representatives
We investigate the networks of committee and subcommittee assignments in the
United States House of Representatives from the 101st--108th Congresses, with
the committees connected by ``interlocks'' or common membership. We examine the
community structure in these networks using several methods, revealing strong
links between certain committees as well as an intrinsic hierarchical structure
in the House as a whole. We identify structural changes, including additional
hierarchical levels and higher modularity, resulting from the 1994 election, in
which the Republican party earned majority status in the House for the first
time in more than forty years. We also combine our network approach with
analysis of roll call votes using singular value decomposition to uncover
correlations between the political and organizational structure of House
committees.
Comment: 44 pages, 13 figures (some with multiple parts and most in color), 9 tables; to appear in Physica A; new figures and revised discussion (including extra introductory material) for this version.
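The roll-call analysis mentioned in the abstract can be illustrated with a minimal sketch on synthetic data (not the actual House votes): each legislator is a row, each vote a column (+1 yea, -1 nay), and the leading singular vector of the matrix recovers the dominant partisan axis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic roll-call matrix: 40 legislators x 60 votes, entries +1 (yea) / -1 (nay).
# Two blocs vote with opposite tendencies, mimicking a partisan split.
bloc = np.repeat([1.0, -1.0], 20)                  # bloc membership, +1 or -1
votes = np.sign(bloc[:, None] + 0.5 * rng.standard_normal((40, 60)))

# SVD: the leading left singular vector is the dominant voting dimension,
# commonly interpreted as the partisan axis.
U, s, Vt = np.linalg.svd(votes, full_matrices=False)
partisan_axis = U[:, 0]

# Legislators from the same bloc should land on the same side of the axis
# (up to the overall sign ambiguity of singular vectors).
side = np.sign(partisan_axis)
agreement = max(np.mean(side == bloc), np.mean(side == -bloc))
print(round(float(agreement), 2))
```

The sign ambiguity of singular vectors is handled by checking both orientations; in the paper's setting the same one-dimensional projection separates the two parties.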
Nonparametric (distribution-free) control charts: an updated overview and some results
Control charts that are based on assumption(s) of a specific form for the underlying process distribution are referred to as parametric control charts. There are many applications where there is insufficient information to justify such assumptions, and consequently control charting techniques with a minimal set of distributional requirements are in high demand. To this end, nonparametric or distribution-free control charts have been proposed in recent years. These charts have stable in-control properties, are robust against outliers, and can be surprisingly efficient in comparison with their parametric counterparts. Chakraborti and some of his colleagues provided review papers on nonparametric control charts in 2001, 2007 and 2011. These papers were received with considerable interest and attention by the community. However, the literature on nonparametric statistical process/quality control/monitoring has grown exponentially, and because of this rapid growth an update is deemed necessary. In this article, we bring these reviews forward to 2017, discussing some of the latest developments in the area. Moreover, unlike the past reviews, which did not include multivariate charts, here we review both univariate and multivariate nonparametric control charts. We end with some concluding remarks.
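As a concrete illustration of the distribution-free idea (a generic textbook sign chart, not any specific chart from the reviews): within each subgroup, count the observations above the known in-control median; that count follows Binomial(n, 1/2) regardless of the process distribution, so Shewhart-type limits need no distributional assumption beyond a known median.

```python
import numpy as np

def sign_chart(subgroups, median0, k=3.0):
    """Shewhart-type sign chart: the plotted statistic is the count of
    observations above the in-control median; control limits are k-sigma
    limits under Binomial(n, 1/2)."""
    subgroups = np.asarray(subgroups, dtype=float)
    n = subgroups.shape[1]
    counts = (subgroups > median0).sum(axis=1)   # plotted statistic per subgroup
    center = n / 2.0
    sigma = np.sqrt(n * 0.25)                    # binomial sd when p = 1/2
    ucl, lcl = center + k * sigma, center - k * sigma
    return counts, (counts > ucl) | (counts < lcl)

rng = np.random.default_rng(1)
in_control = rng.standard_cauchy((20, 25))       # heavy-tailed process, median 0
shifted = rng.standard_cauchy((5, 25)) + 5.0     # median shifted upward
counts, signals = sign_chart(np.vstack([in_control, shifted]), median0=0.0)
print(int(signals.sum()))
```

The heavy-tailed Cauchy data would wreck a mean-based parametric chart, but the sign statistic stays in control until the median actually shifts.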
Towards an Architecture for Efficient Distributed Search of Multimodal Information
The creation of very large-scale multimedia search engines, with more than one billion
images and videos, is a pressing need of digital societies where data is generated by multiple connected devices. Distributing search indexes in cloud environments is the inevitable solution to deal with the increasing scale of image and video collections. The distribution of such indexes in this setting raises multiple challenges, such as the even partitioning of the data space, load balancing across index nodes, and the fusion of results computed over multiple nodes. The main question behind this thesis is how to reduce and distribute the computational complexity of multimedia retrieval.
This thesis studies the extension of sparse hash inverted indexing to distributed settings.
The main goal is to ensure that indexes are uniformly distributed across computing nodes while keeping similar documents on the same nodes. Load balancing is performed at both node and index level, to guarantee that the retrieval process is not delayed by nodes that have to inspect larger subsets of the index.
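Node-level load balancing of the kind described above can be sketched with a standard greedy heuristic (an illustration, not the thesis's actual algorithm): given hypothetical posting-list sizes per index partition, assign each partition to the currently least-loaded node, largest partitions first.

```python
import heapq

def balance_partitions(partition_sizes, num_nodes):
    """Greedy longest-processing-time assignment: sort partitions by size,
    then always give the next one to the least-loaded node."""
    heap = [(0, node) for node in range(num_nodes)]   # (current load, node id)
    heapq.heapify(heap)
    assignment = {}
    for pid, size in sorted(partition_sizes.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)              # least-loaded node
        assignment[pid] = node
        heapq.heappush(heap, (load + size, node))
    return assignment

# Hypothetical partition (hash-bucket) sizes in documents.
sizes = {f"bucket{i}": s for i, s in enumerate([90, 70, 50, 40, 30, 20, 10, 10])}
assignment = balance_partitions(sizes, num_nodes=3)
loads = [sum(sizes[p] for p, n in assignment.items() if n == node)
         for node in range(3)]
print(sorted(loads))
```

Keeping per-node loads close together is what prevents the retrieval process from being delayed by a node that has to inspect a much larger subset of the index.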
Multimodal search requires the combination of the search results from individual modalities and document features. This thesis studies rank fusion techniques focused on reducing complexity by automatically selecting only the features that improve retrieval effectiveness.
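Rank fusion itself can be illustrated with reciprocal rank fusion (RRF), a standard baseline rather than necessarily the method developed in the thesis: each modality's ranked list contributes 1/(k + rank) per document, and the summed scores define the fused ranking.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-modality result lists (e.g. visual, text, audio features).
visual = ["d3", "d1", "d7"]
text = ["d1", "d3", "d9"]
audio = ["d1", "d8", "d3"]
fused = reciprocal_rank_fusion([visual, text, audio])
print(fused[0])  # d1 ranks first: top in two of the three modalities
```

Dropping a feature index from the input lists is exactly the complexity-reduction lever the thesis studies: fuse only the modalities that actually improve effectiveness.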
The achievements of this thesis span both distributed indexing and rank fusion research.
Experiments across multiple datasets show that sparse hashes can be used to distribute documents and queries across index entries and nodes in a balanced and redundant manner. Rank fusion results show that it is possible to reduce retrieval complexity and improve efficiency by searching only a subset of the feature indexes.
Heterogeneous Treatment Effect Estimation Using Machine Learning
With the rise of large and fine-grained data sets, researchers, physicians, businesses, and policymakers want to estimate treatment effect heterogeneity across individuals and contexts with ever-greater precision, in order to allocate resources effectively, assign treatments adequately, and understand the underlying causal mechanism. In this thesis, we provide tools for estimating and understanding treatment heterogeneity.

Chapter 1 introduces a unifying framework for many estimators of the Conditional Average Treatment Effect (CATE), a function that describes the treatment heterogeneity. We introduce meta-learners as algorithms that can be combined with any machine learning/regression method to estimate the CATE. We also propose a new meta-learner, the X-learner, that can adapt to structural properties such as the smoothness and sparsity of the underlying treatment effect. We then present its desirable properties through simulations and theory and apply it to two field experiments.

As part of this thesis, we created an R package, causalToolbox, that implements eight CATE estimators and several tools useful for estimating the CATE and understanding the underlying causal mechanism. Chapter 2 focuses on the causalToolbox package and explains how it is structured and implemented. The package uses the same syntax for all implemented CATE estimators, which makes it easy for practitioners to switch between estimators and compare them on a given data set. We give examples of how it can be used to find a well-performing estimator for a given data set, how confidence intervals for the CATE can be computed, and how estimating the CATE for a unit with many estimators simultaneously can give practitioners a sense of which estimates are unstable and depend heavily on the chosen estimator. Chapter 3 is an application of the causalToolbox package.
It shows how useful the package is in a simulation study set up for the Empirical Investigation of Methods for Heterogeneity Workshop at the 2018 Atlantic Causal Inference Conference by Carlos Carvalho, Jennifer Hill, Jared Murray, and Avi Feller, based on the National Study of Learning Mindsets.

When implementing the CATE estimators, we noticed that there was a need for a variation of the Random Forests (RF) algorithm that works particularly well for statistical inference. We designed an R package, forestry, that implements a new version of the RF algorithm and several tools for statistical inference with it. In Chapter 4, we describe the problem that confidence interval estimation with RF can perform poorly in areas where RF is biased or outside the support of the training data. We then introduce a new method that allows us to screen for points at which our confidence interval methods should not be used.

CATE estimates can be used to assign treatments to subjects, but in many studies, estimating the CATE is not the ultimate goal. Researchers often want to understand the underlying causal mechanisms. In Chapter 5, we discuss a modification of the RF algorithm that is particularly interpretable and allows practitioners to understand the underlying mechanism better. Usually, RF is based on deep regression trees that are difficult to understand. In this new version, we use linear response functions and very shallow trees to make the results more easily understandable. The algorithm finds splits in quasi-linear time and locally adapts to the smoothness of the underlying response functions. In an experimental study, we show that it leads to shallow and interpretable trees that compare favorably to other regression estimators on a broad range of real-world data sets.
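The X-learner logic sketched in Chapter 1 can be written down compactly. Below is a minimal numpy-only illustration with least-squares base learners standing in for arbitrary ML methods, and a fixed propensity of 0.5 (a randomized experiment); the actual package uses far richer learners and estimated propensities.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares base learner; returns a prediction function."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def x_learner(X, y, w, propensity=0.5):
    X0, y0 = X[w == 0], y[w == 0]
    X1, y1 = X[w == 1], y[w == 1]
    mu0, mu1 = fit_linear(X0, y0), fit_linear(X1, y1)    # stage 1: outcome models
    d1 = y1 - mu0(X1)                                    # imputed effects, treated
    d0 = mu1(X0) - y0                                    # imputed effects, control
    tau1, tau0 = fit_linear(X1, d1), fit_linear(X0, d0)  # stage 2: effect models
    g = propensity                                       # stage 3: propensity blend
    return lambda Xn: g * tau0(Xn) + (1 - g) * tau1(Xn)

# Synthetic randomized experiment with a heterogeneous treatment effect.
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(400, 2))
w = rng.integers(0, 2, size=400)
tau_true = 1.0 + X[:, 0]                                 # CATE varies with x0
y = X @ np.array([0.5, -0.3]) + w * tau_true + 0.1 * rng.standard_normal(400)

cate = x_learner(X, y, w)(X)
print(round(float(np.mean(np.abs(cate - tau_true))), 2))
```

Because the two effect models are fit on the (smoother, sparser) imputed treatment effects rather than on the raw outcomes, the X-learner can adapt to structure in the CATE itself.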
Spam Detection Using Machine Learning and Deep Learning
Text messages are essential these days; however, spam texts have contributed negatively to the success of this communication mode. The compromised authenticity of such messages has given rise to several security breaches. Spam messages have been used to send malicious links that either harm the system or obtain information detrimental to the user. Spam SMS messages as well as emails have served as media for attacks such as masquerading and smishing (a phishing attack through text messaging), threatening both users and service providers. Given these waves of attacks, the need to identify and remove spam messages is pressing.
This dissertation explores the process of text classification, from data input, to the embedded representation of words in vector form, to the final classification step. To this end, we applied different embedding methods to capture both the linguistic and semantic meanings of words. The static embedding methods used include Word to Vector (Word2Vec) and Global Vectors (GloVe), while for dynamic embedding, transfer learning with Bidirectional Encoder Representations from Transformers (BERT) was employed. For classification, both machine learning and deep learning techniques were used to build an efficient and sensitive classification model with good accuracy and a low false positive rate. Our results established that the combination of BERT for embedding and machine learning for classification produced better classification results than other combinations.
With these results, we developed models that combine the self-feature-extraction advantage of deep learning with the effective classification of machine learning. These models were tested on four different datasets: the SMS Spam, Ling, Spam Assassin, and Enron datasets. The BERT+SVC hybrid model produced the highest accuracy and the lowest false positive rate.
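The two evaluation criteria emphasized above can be computed directly from a confusion matrix. A minimal sketch with made-up labels (not the dissertation's data), where a false positive is a legitimate message flagged as spam:

```python
import numpy as np

def spam_metrics(y_true, y_pred):
    """Accuracy and false positive rate for a binary spam classifier
    (label 1 = spam, 0 = ham; a false positive is ham flagged as spam)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))
    fp = np.sum((y_true == 0) & (y_pred == 1))   # ham wrongly flagged
    tn = np.sum((y_true == 0) & (y_pred == 0))   # ham correctly passed
    return accuracy, float(fp / (fp + tn))

y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
acc, fpr = spam_metrics(y_true, y_pred)
print(acc, fpr)  # 0.75 accuracy, 0.2 false positive rate
```

For spam filtering, the false positive rate matters as much as accuracy, since a flagged legitimate message is usually costlier to the user than a missed spam.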
Quantitative Techniques in Participatory Forest Management
Forest management has evolved from a mercantilist view to a multi-functional one that integrates economic, social, and ecological aspects. However, the issue of sustainability is not yet resolved. Quantitative Techniques in Participatory Forest Management brings together global research in three areas of application: inventory of the forest variables that determine the main environmental indices, description and design of new environmental indices, and the application of sustainability indices for regional implementations. All these quantitative techniques create the basis for the development of scientific methodologies of participatory sustainable forest management.
Artificial Intelligence-based Smarter Accessibility Evaluations for Comprehensive and Personalized Assessment
The research focuses on utilizing artificial intelligence (AI) and machine learning (ML) algorithms to enhance accessibility for people with disabilities (PwD) in three areas: public buildings, homes, and medical devices. The overarching goal is to improve the accuracy, reliability, and effectiveness of accessibility evaluation systems by leveraging smarter technologies. For public buildings, the challenge lies in developing an accurate and reliable accessibility evaluation system. AI can play a crucial role by analyzing data, identifying potential barriers, and assessing the accessibility of various features within buildings. By training ML algorithms on relevant data, the system can learn to make accurate predictions about the accessibility of different spaces and help policymakers and architects design more inclusive environments. For private places such as homes, it is essential to have a person-focused accessibility evaluation system. By utilizing machine learning-based intelligent systems, it becomes possible to assess the accessibility of individual homes based on specific needs and requirements. This personalized approach can help identify barriers and recommend modifications or assistive technologies that can enhance accessibility and independence for PwD within their own living spaces. The research also addresses the intelligent evaluation of healthcare devices in the home. Many PwD rely on medical devices for their daily living, and ensuring the accessibility and usability of these devices is crucial. AI can be employed to evaluate the accessibility features of medical devices, provide recommendations for improvement, and even measure their effectiveness in supporting the needs of PwD. Overall, this research aims to enhance the accuracy and reliability of accessibility evaluation systems by leveraging AI and ML technologies. 
By doing so, it seeks to improve the quality of life for individuals with disabilities by enabling increased independence, fostering social inclusion, and promoting better accessibility in public buildings, private homes, and medical devices.