The onset of parallel computers has crea,ted a large demand for parallelizing many of the
common sequential algorithms used in software applications today. With parallel computers
we are allowed to implement many independent operations simultaneously. The number of
operations which can be performed simultaneously is, of course, dependent on the number of
processors in our machine as well as its architecture.
The inherent problem for parallel algorithm designers is how to parallelize an algorithm
allowing for a given number of processors while minimizing the size or the depth of the
algorithm. The size of the algorithm relates to the number of operations which must be
performed for a desired outcome. The depth of an algorithm is the number of operations in
the longest chain of operations from any input to any output. If we are given an unlimited
number of processors to perform our algorithms, then to minimize computation time, we
would look for an algorithm which minimzizes the depth. Conversely, if we have a limited
number of processors, then we need to keep the size small and we would use an algorithm
which minimizes the size. The evaluation of these algorithms is centered upon speedup and
efficiency. Speedup measures the increase or decrease of computational time of the
parallel algorithm compared to a known sequential algorithm. Efficiency measures the
increase or decrease of computational time per processor.
One technique which is used in constructing parallel algorithms as well as sequential
algorithms is "Divide and Conquer". Divide and Conquer is a technique which
splits a problem into two smaller problems whose solutions can be simply combined to solve
the larger problem. This splitting is continued recursively until problems are so small
that they are easy to solve. A general problem which is solved using this technique is the
summation of N numbers. The parallel prefix algorithm of Upper/Lower construction solves
this problem by continued breaking of the inputs into boxes. The boxes are formed by
splitting N into two halves which we will call the parents, where parentL denotes the box
with smaller indices and parentR with larger indices. If each half has greater than two
inputs then these halves are split again into two more boxes we will call the children of
each respective parent using the same format above: childL, childR. This is done until the
final split creates boxes with only two elements. These two elements in each two-element
box are then summed. The larger index number of the childL box is then added to the
numbers in its parent which have larger indices than its index number. This is continued
for every childL and parentL until all numbers have been summed.
Compared to other algorithms which perform summation, the Upper/Lower construction
minimizes depth. The implementation complexity of this algorithm is fairly simple and the
largest number of processors needed is N/2. Its size = N/2 * log N base 2, and its depth =
log N base 2. The speedup = N/2 and the efficiency = 2/log N base 2. This shows that the
efficiency of this algorithm decreases as the problem size grows. This is certainly a very
efficient algorithm, but in most cases it is unrealistic since it assumes an unlimited
number processors. An alternative to this algorithm is the Odd/Even construction for
summation.
The Odd/Even construction solves the summation problem by breaking the indexed numbers
into odd and even indices. Once the inputs are divided into sets with odd and even index
values, the odds are combined with the next higher evens at input. The parallel prefix is
performed on the new combined evens and then the evens are combined with the next higher
odds at output. This algorithm is a little more complex than Upper/Lower but it allows for
limited processors. The depth = 2log N-2 base 2, is larger than Upper/Lower but the size =
2N - log N base 2 - 2 is smaller. This is certainly a more realistic algorithm, although
the time required to perform this algorithm is longer than the Upper/Lower algorithm.