1,316 research outputs found
Distributed computing and data storage in proteomics: many hands make light work, and a stronger memory
Modern-day proteomics generates ever more complex data, pushing the storage and processing requirements for such data beyond the capacity of most desktop computers. To cope with the increased computational demands, distributed architectures have gained substantial popularity in recent years. In this review, we provide an overview of current techniques for distributed computing, along with examples of how these techniques are being employed in the field of proteomics. We thus underline the benefits of distributed computing in proteomics, while also pointing out the potential issues and pitfalls involved.
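A minimal sketch of the kind of distributed, per-spectrum processing such architectures enable, using Python's standard multiprocessing pool as a local stand-in for a cluster; the toy spectra and the scoring function are hypothetical placeholders, not from the review:

```python
# Map-style work distribution: each worker scores spectra independently
# ("many hands make light work"). Pool(4) is a local stand-in for a cluster.
from multiprocessing import Pool

def score_spectrum(spectrum):
    """Placeholder for a CPU-bound scoring step, e.g. database matching."""
    return sum(intensity for _, intensity in spectrum)

if __name__ == "__main__":
    # Toy spectra: lists of (m/z, intensity) pairs.
    spectra = [[(100.0 + i, 1.0 + i)] for i in range(1000)]
    with Pool(processes=4) as pool:
        scores = pool.map(score_spectrum, spectra)
    print(f"Scored {len(scores)} spectra in parallel")
```

The same map step scales from a local process pool to a cluster scheduler; only the pool implementation changes, not the per-spectrum code.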
Crowdsourcing in Computer Vision
Computer vision systems require large amounts of manually annotated data to
properly learn challenging visual concepts. Crowdsourcing platforms offer an
inexpensive method to capture human knowledge and understanding, for a vast
number of visual perception tasks. In this survey, we describe the types of
annotations computer vision researchers have collected using crowdsourcing, and
how they have ensured that this data is of high quality while annotation effort
is minimized. We begin by discussing data collection on both classic (e.g.,
object recognition) and recent (e.g., visual story-telling) vision tasks. We
then summarize key design decisions for creating effective data collection
interfaces and workflows, and present strategies for intelligently selecting
the most important data instances to annotate. Finally, we conclude with some
thoughts on the future of crowdsourcing in computer vision.
Comment: A 69-page meta-review of the field; Foundations and Trends in Computer Graphics and Vision, 2016.
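One quality-control strategy in scope for this survey is aggregating redundant crowd labels by majority vote, so that disagreement flags unreliable annotations. A minimal sketch; the worker labels below are illustrative, not from the survey:

```python
# Majority-vote aggregation over redundant crowd annotations.
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label and its share of the votes."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)

# Three workers annotate each image; high agreement signals reliability.
worker_labels = {"img_001": ["cat", "cat", "dog"],
                 "img_002": ["car", "car", "car"]}
for image_id, labels in worker_labels.items():
    label, agreement = majority_vote(labels)
    print(f"{image_id}: {label} (agreement {agreement:.0%})")
```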
A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality
Microtask crowdsourcing is increasingly critical to the creation of extremely
large datasets. As a result, crowd workers spend weeks or months repeating the
exact same tasks, making it necessary to understand their behavior over these
long periods of time. We utilize three large, longitudinal datasets of nine
million annotations collected from Amazon Mechanical Turk to examine claims
that workers fatigue or satisfice over these long periods, producing lower
quality work. We find that, contrary to these claims, workers are extremely
stable in their quality over the entire period. To understand whether workers
set their quality based on the task's requirements for acceptance, we then
perform an experiment where we vary the required quality for a large
crowdsourcing task. Workers did not adjust their quality based on the
acceptance threshold: workers who were above the threshold continued working at
their usual quality level, and workers below the threshold self-selected
themselves out of the task. Capitalizing on this consistency, we demonstrate
that it is possible to predict workers' long-term quality using just a glimpse
of their quality on the first five tasks.
Comment: 10 pages, 11 figures; accepted to CSCW 2017.
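A minimal sketch of how such a "glimpse" predictor could work in practice: score a worker's first five responses against known gold answers and treat that accuracy as the long-term quality estimate. The data and scoring below are illustrative assumptions, not the paper's exact method:

```python
# Estimate long-term quality from a glimpse of the first five tasks.
def glimpse_quality(first_responses, gold_answers):
    """Accuracy on the first few gold tasks, used as a long-term estimate."""
    correct = sum(r == g for r, g in zip(first_responses, gold_answers))
    return correct / len(gold_answers)

gold = ["A", "B", "A", "C", "B"]    # known answers for 5 seed tasks
worker = ["A", "B", "A", "C", "A"]  # one worker's first 5 responses

quality = glimpse_quality(worker, gold)
# Because the paper finds worker quality is stable over time, this early
# estimate can drive admission decisions for the rest of the task.
print(f"Estimated long-term quality: {quality:.0%}")
```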
ProcessGPT: Transforming Business Process Management with Generative Artificial Intelligence
Generative Pre-trained Transformer (GPT) is a state-of-the-art machine
learning model capable of generating human-like text through natural language
processing (NLP). GPT is trained on massive amounts of text data and uses deep
learning techniques to learn patterns and relationships within the data,
enabling it to generate coherent and contextually appropriate text. This
position paper proposes using GPT technology to generate new process models
when needed. We introduce ProcessGPT as a new technology that has the
potential to enhance decision-making in data-centric and knowledge-intensive
processes. ProcessGPT can be designed by training a generative pre-trained
transformer model on a large dataset of business process data. This model can
then be fine-tuned on specific process domains and trained to generate process
flows and make decisions based on context and user input. The model can be
integrated with NLP and machine learning techniques to provide insights and
recommendations for process improvement. Furthermore, the model can automate
repetitive tasks and improve process efficiency while enabling knowledge
workers to communicate analysis findings and supporting evidence, and to make
decisions. ProcessGPT can revolutionize business process management (BPM) by
offering a powerful tool for process augmentation, automation and improvement.
Finally, we demonstrate how ProcessGPT can be a powerful tool for augmenting
data engineers in maintaining data ecosystem processes within large banking
organizations. Our scenario highlights the potential of this approach to
improve efficiency, reduce costs, and enhance the quality of business
operations through the automation of data-centric and knowledge-intensive
processes. These results underscore the promise of ProcessGPT as a
transformative technology for organizations looking to improve their process
workflows.
Comment: Accepted in the 2023 IEEE International Conference on Web Services (ICWS); corresponding author: Prof. Amin Beheshti ([email protected]).
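A minimal sketch of the training step the abstract describes, i.e. fine-tuning a pre-trained causal language model on serialized process traces, using the Hugging Face transformers and datasets libraries. The base model, file name, trace format, and hyperparameters are illustrative assumptions, not the paper's setup:

```python
# Fine-tune a causal LM on textual process traces (ProcessGPT-style sketch).
# Requires: pip install transformers datasets
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical corpus: one serialized trace per line, e.g.
# "receive_order -> check_credit -> approve -> ship".
dataset = load_dataset("text", data_files={"train": "process_traces.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="processgpt", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the fine-tuned model can then generate candidate flows
```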
Report on the 2015 NSF Workshop on Unified Annotation Tooling
On March 30 & 31, 2015, an international group of twenty-three researchers with expertise in linguistic annotation convened in Sunny Isles Beach, Florida to discuss problems with, and potential solutions for, the state of linguistic annotation tooling. The participants comprised 14 researchers from the U.S. and 9 from outside the U.S., with 7 countries and 4 continents represented, and hailed from fields and specialties including computational linguistics, artificial intelligence, speech processing, multi-modal data processing, clinical & medical natural language processing, linguistics, documentary linguistics, sign-language linguistics, corpus linguistics, and the digital humanities. The motivating problem of the workshop was the balkanization of annotation tooling: even though linguistic annotation requires sophisticated tool support to efficiently generate high-quality data, the landscape of tools for the field is fractured, incompatible, inconsistent, and lacking key capabilities. The overall goal of the workshop was to chart the way forward, centering on five key questions: (1) What are the problems with the current tool landscape? (2) What are the possible benefits of solving some or all of these problems? (3) What capabilities are most needed? (4) How should we go about implementing these capabilities? And (5) how should we ensure the longevity and sustainability of the solution? I surveyed the participants before their arrival, which provided significant raw material for ideas, and the workshop discussion itself resulted in the identification of ten specific classes of problems and five sets of most-needed capabilities. Importantly, we identified annotation project managers in computational linguistics as the key recipients and users of any solution, thereby succinctly addressing questions about the scope and audience of potential solutions. We discussed the management and sustainability of potential solutions at length. The participants agreed on sixteen recommendations for future work. This technical report contains a detailed discussion of all these topics, a point-by-point review of the workshop discussion as it unfolded, detailed information on the participants and their expertise, and the summarized data from the surveys.