The conjugate gradient algorithm suffers from communication bottlenecks on
parallel architectures due to the two global reductions required each
iteration. In this paper, we introduce a new communication hiding conjugate
gradient variant, which requires only one global reduction per iteration. We
then introduce pipelined versions of this variant and of a variant due to
Meraunt, to fully overlap the matrix vector product with all inner products.
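To illustrate the mechanism of overlap (not the paper's specific pipelined recurrences), the following is a minimal sketch using a non-blocking all-reduce in mpi4py; all function and variable names here are our own assumptions.

```python
# Hedged sketch: overlapping a global reduction with the local
# matrix-vector product via a non-blocking all-reduce. Names and
# structure are illustrative, not the paper's pipelined variant.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def overlapped_step(A_local, p_local, r_local):
    # Begin the global reduction for the inner product (r, r) ...
    local_dot = np.array([r_local @ r_local])
    global_dot = np.empty_like(local_dot)
    req = comm.Iallreduce(local_dot, global_dot, op=MPI.SUM)
    # ... and compute the local rows of the matrix-vector product
    # while the reduction is in flight on the network.
    s_local = A_local @ p_local
    req.Wait()  # reduction completes; global_dot now holds (r, r)
    return s_local, float(global_dot[0])
```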
These variants exhibit rates of convergence and final accuracies better than
those of previously known pipelined variants. The convergence of these variants is
improved using a predict-and-recompute scheme, whereby recursively updated
quantities are first used as a predictor for the true value, and then
recomputed exactly later in the iteration. Numerical tests indicate that, on
difficult problems, applying this technique to our pipelined variants often
results in both a rate of convergence and an ultimately attainable accuracy far
better than those of previously studied pipelined conjugate gradient variants.
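To make the predict-and-recompute scheme concrete, the following serial NumPy sketch (our own illustration with assumed names, not pseudocode from the paper) uses the recursively updated residual as a predictor and then recomputes the residual and inner product exactly within the same iteration; in a parallel implementation, the predicted quantities would allow the corresponding global reductions to begin early.

```python
import numpy as np

def predict_and_recompute_cg(A, b, x0, tol=1e-10, max_iter=500):
    """Illustrative predict-and-recompute CG; A must be symmetric positive definite."""
    x = x0.copy()
    r = b - A @ x                  # exact initial residual
    p = r.copy()
    nu = r @ r                     # nu = (r, r)
    for k in range(max_iter):
        s = A @ p                  # matrix-vector product
        alpha = nu / (p @ s)
        x = x + alpha * p
        r_pred = r - alpha * s     # predictor: recursively updated residual
        nu_pred = r_pred @ r_pred  # predicted (r, r), available early
        # In parallel, nu_pred would seed the next global reduction here,
        # before the exact values below become available.
        r = b - A @ x              # recompute the true residual ...
        nu_old, nu = nu, r @ r     # ... and the inner product, exactly
        if np.sqrt(nu) < tol:
            return x, k + 1
        beta = nu / nu_old
        p = r + beta * p
    return x, max_iter

# Small usage example on a random SPD system.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)    # symmetric positive definite
    b = rng.standard_normal(n)
    x, iters = predict_and_recompute_cg(A, b, np.zeros(n))
    print(iters, np.linalg.norm(b - A @ x))
```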