A Novel Approach to the Common Due-Date Problem on Single and Parallel Machines
This paper presents a novel idea for the general case of the Common Due-Date
(CDD) scheduling problem. The problem is about scheduling a certain number of
jobs on a single machine or on parallel machines, where all the jobs have different
processing times but a common due-date. The objective of the problem is to
minimize the total penalty incurred due to earliness or tardiness of the job
completions. This work presents exact polynomial algorithms for optimizing a
given job sequence for single and identical parallel machines with the run-time
complexities of for both cases, where is the number of jobs.
Besides, we show that our approach for the parallel machine case is also
suitable for non-identical parallel machines. We prove the optimality for the
single machine case and the runtime complexities of both. We then extend
our approach to one particular dynamic case of the CDD and conclude the chapter
with our results for the benchmark instances provided in the OR-library.
Comment: Book chapter, 22 pages
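To make the objective concrete, here is a minimal Python sketch. It is not the chapter's algorithm. It assumes linear per-job earliness and tardiness weights (`alpha` and `beta`, hypothetical names), a single machine, and jobs processed back to back in the given order. Under those assumptions the total penalty is a convex piecewise-linear function of the start time, so it is enough to check a start at time 0 and every start time at which some job completes exactly on the due date.

```python
def cdd_penalty(proc_times, alpha, beta, due_date, start=0):
    """Total earliness/tardiness penalty for jobs run back to back from `start`.

    alpha[i] and beta[i] are assumed per-unit-time earliness and tardiness
    weights of job i (illustrative names, not taken from the source).
    """
    total = 0
    completion = start
    for p, a, b in zip(proc_times, alpha, beta):
        completion += p
        total += a * max(0, due_date - completion)  # earliness part
        total += b * max(0, completion - due_date)  # tardiness part
    return total


def best_start(proc_times, alpha, beta, due_date):
    """Brute-force the best start time for a fixed sequence without idle time.

    Candidates are start = 0 and every shift that makes one job finish
    exactly at the due date (the breakpoints of the convex penalty function).
    """
    candidates = {0}
    completion = 0
    for p in proc_times:
        completion += p
        if completion < due_date:
            candidates.add(due_date - completion)
    penalty, start = min(
        (cdd_penalty(proc_times, alpha, beta, due_date, s), s) for s in candidates
    )
    return start, penalty
```

For processing times `[2, 3, 4]`, unit weights, and due date 10, `best_start` returns `(5, 7)`: starting at time 5 makes the second job finish exactly on the due date.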
Common Due-Date Problem: Exact Polynomial Algorithms for a Given Job Sequence
This paper considers the problem of scheduling jobs on single and parallel
machines where all the jobs possess different processing times but a common due
date. There is a penalty involved with each job if it is processed earlier or
later than the due date. The objective of the problem is to find the assignment
of jobs to machines, the processing sequence of jobs and the time at which they
are processed, which minimizes the total penalty incurred due to tardiness or
earliness of the jobs. This work presents exact polynomial algorithms for
optimizing a given job sequence for single and parallel machines with the
run-time complexities of and respectively, where
is the number of jobs and the number of machines. The algorithms take a
sequence consisting of all the jobs as input and
distribute the jobs to machines (for ) along with their best completion
times so as to get the least possible total penalty for this sequence. We prove
the optimality for the single machine case and the runtime complexities of
both. We then present the results for the benchmark instances and
compare with previous work for single and parallel machine cases, up to
jobs.
Comment: 15th International Symposium on Symbolic and Numeric Algorithms for
Scientific Computing
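One simple way to distribute a fixed job sequence over machines is list scheduling: walk through the sequence and give each job to the machine that becomes free first. The sketch below shows this generic heuristic for illustration only. The abstract does not say how the authors' algorithm assigns jobs, and the function name `assign_in_sequence` is hypothetical.

```python
import heapq


def assign_in_sequence(proc_times, num_machines):
    """Assign jobs, in the given order, to the currently least-loaded machine.

    Returns one list of job indices per machine. Ties are broken by the
    lower machine index. This is a generic list-scheduling sketch, not the
    method of the paper.
    """
    # Heap of (current load, machine index) pairs.
    heap = [(0, k) for k in range(num_machines)]
    heapq.heapify(heap)
    schedule = [[] for _ in range(num_machines)]
    for job, p in enumerate(proc_times):
        load, k = heapq.heappop(heap)
        schedule[k].append(job)
        heapq.heappush(heap, (load + p, k))
    return schedule
```

With processing times `[4, 2, 3, 1]` on two machines this yields `[[0, 3], [1, 2]]`. The completion times on each machine could then be tuned separately against the common due date.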
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data, promising great scientific advance. However, if misused, it can
be little more than the black-box application of complex computing algorithms
that may give little physical insight, and provide questionable results. Here,
we give an overview of the entire data mining process, from data collection
through to the interpretation of results. We cover common machine learning
algorithms, such as artificial neural networks and support vector machines,
applications from a broad range of astronomy, emphasizing those where data
mining techniques directly resulted in improved science, and important current
and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm, and is guided by the
astronomical problem at hand, data mining can be very much the powerful tool,
and not the questionable black box.
Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra
figures, some minor additions to the text
Petuum: A New Platform for Distributed Machine Learning on Big Data
What is a systematic way to efficiently apply a wide spectrum of advanced ML
programs to industrial scale problems, using Big Models (up to 100s of billions
of parameters) on Big Data (up to terabytes or petabytes)? Modern
parallelization strategies employ fine-grained operations and scheduling beyond
the classic bulk-synchronous processing paradigm popularized by MapReduce, or
even specialized graph-based execution that relies on graph representations of
ML programs. The variety of approaches tends to pull systems and algorithms
design in different directions, and it remains difficult to find a universal
platform applicable to a wide range of ML programs at scale. We propose a
general-purpose framework that systematically addresses data- and
model-parallel challenges in large-scale ML, by observing that many ML programs
are fundamentally optimization-centric and admit error-tolerant,
iterative-convergent algorithmic solutions. This presents unique opportunities
for an integrative system design, such as bounded-error network synchronization
and dynamic scheduling based on ML program structure. We demonstrate the
efficacy of these system designs versus well-known implementations of modern ML
algorithms, allowing ML programs to run in much less time and at considerably
larger model sizes, even on modestly-sized compute clusters.
Comment: 15 pages, 10 figures, final version in KDD 2015 under the same title
HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges
High Performance Computing (HPC) clouds are becoming an alternative to
on-premise clusters for executing scientific applications and business
analytics services. Most research efforts in HPC cloud aim to understand the
cost-benefit of moving resource-intensive applications from on-premise
environments to public cloud platforms. Industry trends show hybrid
environments are the natural path to get the best of the on-premise and cloud
resources---steady (and sensitive) workloads can run on on-premise resources
and peak demand can leverage remote resources in a pay-as-you-go manner.
Nevertheless, there are plenty of questions to be answered in HPC cloud, which
range from how to extract the best performance from an unknown underlying
platform to what services are essential to make its usage easier. Moreover, the
discussion on the right pricing and contractual models to fit small and large
users is relevant for the sustainability of HPC clouds. This paper presents a
survey and taxonomy of efforts in HPC cloud and a vision of what we believe
lies ahead, including a set of research challenges that, once tackled, can
help advance businesses and scientific discoveries. This becomes particularly
relevant due to the rapidly growing wave of new HPC applications coming from
big data and artificial intelligence.
Comment: 29 pages, 5 figures, Published in ACM Computing Surveys (CSUR)
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.