Class Notes Class Photo Data Flow Fast Links
Glossary MIMD Architectures Networks Parallel Algorithms
References Seminar Presentations SIMD Architectures Speedup and Efficiency
  Math 4674/CSC 6551 Home Page  

PARALLEL ALGORITHMS

Introduction to Parallel Algorithms
Problems in Designing Parallel Algorithms
Divide & Conquer Technique
Upper/Lower Algorithm Construction
Odd/Even Algorithm Construction

Introduction to Parallel Algorithms

The onset of parallel computers has crea,ted a large demand for parallelizing many of the common sequential algorithms used in software applications today. With parallel computers we are allowed to implement many independent operations simultaneously. The number of operations which can be performed simultaneously is, of course, dependent on the number of processors in our machine as well as its architecture.


Problems in Designing Parallel Algorithms

The inherent problem for parallel algorithm designers is how to parallelize an algorithm allowing for a given number of processors while minimizing the size or the depth of the algorithm. The size of the algorithm relates to the number of operations which must be performed for a desired outcome. The depth of an algorithm is the number of operations in the longest chain of operations from any input to any output. If we are given an unlimited number of processors to perform our algorithms, then to minimize computation time, we would look for an algorithm which minimzizes the depth. Conversely, if we have a limited number of processors, then we need to keep the size small and we would use an algorithm which minimizes the size. The evaluation of these algorithms is centered upon speedup and efficiency. Speedup measures the increase or decrease of computational time of the parallel algorithm compared to a known sequential algorithm. Efficiency measures the increase or decrease of computational time per processor.


Divide & Conquer Technique

One technique which is used in constructing parallel algorithms as well as sequential algorithms is "Divide and Conquer". Divide and Conquer is a technique which splits a problem into two smaller problems whose solutions can be simply combined to solve the larger problem. This splitting is continued recursively until problems are so small that they are easy to solve. A general problem which is solved using this technique is the summation of N numbers. The parallel prefix algorithm of Upper/Lower construction solves this problem by continued breaking of the inputs into boxes. The boxes are formed by splitting N into two halves which we will call the parents, where parentL denotes the box with smaller indices and parentR with larger indices. If each half has greater than two inputs then these halves are split again into two more boxes we will call the children of each respective parent using the same format above: childL, childR. This is done until the final split creates boxes with only two elements. These two elements in each two-element box are then summed. The larger index number of the childL box is then added to the numbers in its parent which have larger indices than its index number. This is continued for every childL and parentL until all numbers have been summed.


Upper/Lower Algorithm Construction

Compared to other algorithms which perform summation, the Upper/Lower construction minimizes depth. The implementation complexity of this algorithm is fairly simple and the largest number of processors needed is N/2. Its size = N/2 * log N base 2, and its depth = log N base 2. The speedup = N/2 and the efficiency = 2/log N base 2. This shows that the efficiency of this algorithm decreases as the problem size grows. This is certainly a very efficient algorithm, but in most cases it is unrealistic since it assumes an unlimited number processors. An alternative to this algorithm is the Odd/Even construction for summation.


Odd/Even Algorithm Construction

The Odd/Even construction solves the summation problem by breaking the indexed numbers into odd and even indices. Once the inputs are divided into sets with odd and even index values, the odds are combined with the next higher evens at input. The parallel prefix is performed on the new combined evens and then the evens are combined with the next higher odds at output. This algorithm is a little more complex than Upper/Lower but it allows for limited processors. The depth = 2log N-2 base 2, is larger than Upper/Lower but the size = 2N - log N base 2 - 2 is smaller. This is certainly a more realistic algorithm, although the time required to perform this algorithm is longer than the Upper/Lower algorithm.