Bounding the Probability of Error for High Precision Recognition
We consider models for which it is important, early in processing, to
estimate some variables with high precision, but perhaps at relatively low
rates of recall. If some variables can be identified with near certainty, then
they can be conditioned upon, allowing further inference to be done
efficiently. Specifically, we consider optical character recognition (OCR)
systems that can be bootstrapped by identifying a subset of correctly
translated document words with very high precision. This "clean set" is
subsequently used as document-specific training data. While many current OCR
systems produce measures of confidence for the identity of each letter or word,
thresholding these confidence values, even at very high values, still produces
some errors.
We introduce a novel technique for identifying a set of correct words with
very high precision. Rather than estimating posterior probabilities, we bound
the probability that any given word is incorrect under very general
assumptions, using an approximate worst case analysis. As a result, the
parameters of the model are nearly irrelevant, and we are able to identify a
subset of words, even in noisy documents, of which we are highly confident. On
our set of 10 documents, we are able to identify about 6% of the words on
average without making a single error. This ability to produce word lists with
very high precision allows us to use a family of models which depends upon such
clean word lists.
Learned versus Hand-Designed Feature Representations for 3d Agglomeration
For image recognition and labeling tasks, recent results suggest that machine
learning methods that rely on manually specified feature representations may be
outperformed by methods that automatically derive feature representations based
on the data. Yet for problems that involve analysis of 3d objects, such as mesh
segmentation, shape retrieval, or neuron fragment agglomeration, there remains
a strong reliance on hand-designed feature descriptors. In this paper, we
evaluate a large set of hand-designed 3d feature descriptors alongside features
learned from the raw data using both end-to-end and unsupervised learning
techniques, in the context of agglomeration of 3d neuron fragments. By
combining unsupervised learning techniques with a novel dynamic pooling scheme,
we show how pure learning-based methods are for the first time competitive with
hand-designed 3d shape descriptors. We investigate data augmentation strategies
for dramatically increasing the size of the training set, and show how
combining both learned and hand-designed features leads to the highest
accuracy.
Annotating Synapses in Large EM Datasets
Reconstructing neuronal circuits at the level of synapses is a central
problem in neuroscience and is becoming a focus of the emerging field of
connectomics. To date, electron microscopy (EM) is the most proven technique
for identifying and quantifying synaptic connections. As advances in EM make
acquiring larger datasets possible, subsequent manual synapse identification
(i.e., proofreading) for deciphering a connectome becomes a major time
bottleneck. Here we introduce a large-scale, high-throughput, and
semi-automated methodology to efficiently identify synapses. We successfully
applied our methodology to the Drosophila medulla optic lobe, annotating many
more synapses than previous connectome efforts. Our approaches are extensible
and will make the often complicated process of synapse identification
accessible to a wider community of potential proofreaders.
The Limitations of Stock Market Efficiency: Price Informativeness and CEO Turnover
Stock prices are more informative when the information has less social value. Speculators with limited resources making costly (private) information production decisions must decide to produce information about some firms and not others. We show that producing and trading on private information is most profitable in the stocks of firms with poor corporate governance -- precisely because it will not be acted upon -- and less profitable at firms with better corporate governance. To the extent that the information in the stock price is used for disciplining the CEO by the board of directors, the informed trader has a reduced incentive to produce the information in the first place. We test our model using the probability of informed trading (PIN) and the probability of forced CEO turnover in a simultaneous-equation system. The empirical results support the model predictions. Stock prices are efficient, but there is a limit to the disciplining role they can fulfill. We apply the model to evaluate the effects of the Sarbanes-Oxley Act of 2002.
A genetic screen for regulators of muscle morphogenesis in Drosophila
The mechanisms that determine the final topology of skeletal muscles remain largely unknown. We have been developing Drosophila body wall musculature as a model to identify and characterize the pathways that control muscle size, shape, and orientation during embryogenesis (Johnson et al., 2013; Williams et al., 2015; Yang et al., 2020a; Yang et al., 2020b). Our working model argues that muscle morphogenesis is regulated by (1) extracellular guidance cues that direct muscle cells toward muscle attachment sites, and (2) contact-dependent interactions between muscles and tendon cells. While we have identified several pathways that regulate muscle morphogenesis, our understanding is far from complete. Here we report the results of a recent EMS-based forward genetic screen that identified many loci not previously associated with muscle morphogenesis. We recovered new alleles of known muscle morphogenesis genes, including back seat driver, kon-tiki, thisbe, and tumbleweed, arguing that our screen had the depth and precision to uncover myogenic genes. We also identified new alleles of spalt-major, barren, and patched that presumably disrupt independent muscle morphogenesis pathways. Equally important, our screen shows that at least 11 morphogenetic loci remain to be mapped and characterized. Our screen has developed new tools to study muscle morphogenesis, which may provide future insights into the mechanisms that regulate skeletal muscle topology.