Fine-Grained Car Detection for Visual Census Estimation
Targeted socioeconomic policies require an accurate understanding of a
country's demographic makeup. To that end, the United States spends more than 1
billion dollars a year gathering census data such as race, gender, education,
occupation, and unemployment rates. Compared to the traditional method of
collecting surveys over many years, which is costly and labor intensive,
data-driven, machine-learning approaches are cheaper and faster, with the
potential to detect trends in close to real time. In this work, we
leverage the ubiquity of Google Street View images and develop a computer
vision pipeline to predict income, per capita carbon emission, crime rates and
other city attributes from a single source of publicly available visual data.
We first detect cars in 50 million images across 200 of the largest US cities
and train a model to predict demographic attributes using the detected cars. To
facilitate our work, we have collected the largest and most challenging
fine-grained dataset reported to date, consisting of over 2,600 classes of cars
comprising images from Google Street View and other web sources, classified by
car experts to account for even the most subtle visual differences. We
use this data to construct the largest scale fine-grained detection system
reported to date. Our prediction results correlate well with ground truth
income data (r=0.82), Massachusetts department of vehicle registration, and
sources investigating crime rates, income segregation, per capita carbon
emission, and other market research. Finally, we learn interesting
relationships between cars and neighborhoods allowing us to perform the first
large scale sociological analysis of cities using computer vision techniques.
Comment: AAAI 201
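The pipeline summarized above (per-city car-class counts regressed against socioeconomic ground truth, with agreement measured by Pearson correlation) can be sketched minimally as follows. This is an illustrative assumption of how such a prediction step might look, not the authors' implementation; the function names and the least-squares form are hypothetical.

```python
import numpy as np

def predict_city_attribute(car_counts, targets, new_counts):
    """Fit a least-squares map from per-city car-class frequencies to an
    attribute (e.g. median income) and apply it to new cities.

    car_counts: (cities, classes) array of detected-car counts per class.
    targets:    (cities,) ground-truth attribute values.
    new_counts: (m, classes) counts for cities to predict.
    """
    # Normalize counts to per-city class frequencies, then add a bias column.
    X = car_counts / car_counts.sum(axis=1, keepdims=True)
    X = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(X, targets, rcond=None)
    Xn = new_counts / new_counts.sum(axis=1, keepdims=True)
    Xn = np.hstack([Xn, np.ones((Xn.shape[0], 1))])
    return Xn @ w

def pearson_r(a, b):
    """Pearson correlation coefficient, as used to report agreement (r=0.82)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    num = ((a - a.mean()) * (b - b.mean())).sum()
    den = np.sqrt(((a - a.mean()) ** 2).sum() * ((b - b.mean()) ** 2).sum())
    return float(num / den)
```

In practice the paper's model would be trained on the detector's output over 50 million images; the sketch only shows the shape of the regression-and-correlation evaluation.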
Towards an Intelligent Database System Founded on the SP Theory of Computing and Cognition
The SP theory of computing and cognition, described in previous publications,
is an attractive model for intelligent databases because it provides a simple
but versatile format for different kinds of knowledge, it has capabilities in
artificial intelligence, and it can also function like established database
models when that is required.
This paper describes how the SP model can emulate other models used in
database applications and compares the SP model with those other models. The
artificial intelligence capabilities of the SP model are reviewed and its
relationship with other artificial intelligence systems is described. Also
considered are ways in which current prototypes may be translated into an
'industrial strength' working system.
Executive Orders: Promoting Democracy and Openness in New York State Government
This joint report outlines 11 executive actions Gov. Andrew Cuomo can take to open up New York State government, increase the accountability of state agencies and reduce barriers to voting. The orders are centered on the basic goal of empowering the citizenry with more and better information about what its government is doing, and how it is spending taxpayer dollars.
Scholarly communication: The quest for Pasteur's Quadrant
The scholarly communication system is sustained by its functions of a) registration, b) certification or legitimization, c) dissemination and awareness, d) archiving or curation, and e) reward. These functions have remained stable during the development of scholarly communication, but the means through which they are achieved have not. It has been a long journey from the days when scientists communicated primarily through correspondence. The
impact of modern-day technological change is significant and has destabilized the scholarly communication system to some extent, because many more options have become available for communicating scholarly information. Pasteur's Quadrant was articulated by Donald E. Stokes in his book Pasteur's Quadrant: Basic Science and Technological Innovation. It is the
idea that basic science (as practiced by Niels Bohr) and applied science (as exemplified by Thomas Edison) can be brought together to create a synergy that produces results of significant benefit, as Louis Pasteur did. Given the theory (fundamental understanding) we have of scholarly communication, and given how modern-day technological advances can be applied, a case can be made that use-inspired basic research (Pasteur's Quadrant) should be the focus of current research in scholarly communication. In doing so, the different types of digital scholarly resources and their characteristics must be investigated to determine how the
fundamentals of scholarly communication are being supported. How libraries could advocate for and contribute to the improvement of scholarly communication is also noted. These resources could include: e-journals, repositories, reviews, annotated content, data, preprint and working-paper servers, blogs, discussion forums, and professional and academic hubs.
The Cure: Making a game of gene selection for breast cancer survival prediction
Motivation: Molecular signatures for predicting breast cancer prognosis could
greatly improve care through personalization of treatment. Computational
analyses of genome-wide expression datasets have identified such signatures,
but these signatures leave much to be desired in terms of accuracy,
reproducibility and biological interpretability. Methods that take advantage of
structured prior knowledge (e.g. protein interaction networks) show promise in
helping to define better signatures but most knowledge remains unstructured.
Crowdsourcing via scientific discovery games is an emerging methodology that
has the potential to tap into human intelligence at scales and in modes
previously unheard of. Here, we developed and evaluated a game called The Cure
on the task of gene selection for breast cancer survival prediction. Our
central hypothesis was that knowledge linking expression patterns of specific
genes to breast cancer outcomes could be captured from game players. We
envisioned capturing knowledge both from the players' prior experience and from
their ability to interpret text related to candidate genes presented to them in
the context of the game.
Results: Between its launch in Sept. 2012 and Sept. 2013, The Cure attracted
more than 1,000 registered players who collectively played nearly 10,000 games.
Gene sets assembled through aggregation of the collected data clearly
demonstrated the accumulation of relevant expert knowledge. In terms of
predictive accuracy, these gene sets provided comparable performance to gene
sets generated using other methods including those used in commercial tests.
The Cure is available at http://genegames.org/cure
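One simple way to aggregate individual game plays into a consensus gene set, sketched here purely as an illustration (the function name and the plain vote-counting scheme are assumptions, not necessarily The Cure's actual aggregation method), is to rank genes by how often players retained them across games:

```python
from collections import Counter

def aggregate_gene_set(game_boards, k):
    """Rank genes by how many games' final selections include them,
    and return the top-k as the aggregated gene set.

    game_boards: iterable of sets, each the genes a player kept in one game.
    """
    votes = Counter()
    for selected in game_boards:
        votes.update(selected)  # one vote per gene per game
    return [gene for gene, _ in votes.most_common(k)]
```

Gene sets produced by such an aggregation could then be evaluated for survival-prediction accuracy against sets from other selection methods, as the abstract describes.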
Investigating people: a qualitative analysis of the search behaviours of open-source intelligence analysts
The Internet and the World Wide Web have become integral parts of the lives of many modern individuals, enabling almost instantaneous communication, sharing and broadcasting of thoughts, feelings and opinions. Much of this information is publicly facing, and as such, it can be utilised in a multitude of online investigations, ranging from employee vetting and credit checking to counter-terrorism and fraud prevention/detection. However, the search needs and behaviours of these investigators are not well documented in the literature. In order to address this gap, an in-depth qualitative study was carried out in cooperation with a leading investigation company. The research contribution is an initial identification of Open-Source Intelligence investigator search behaviours, the procedures and practices that they undertake, along with an overview of the difficulties and challenges that they encounter as part of their domain. This lays the foundation for future research into the varied domain of Open-Source Intelligence gathering.