Tuesday, February 24, 2009

Web search ranking and many-core GPU computation with NVIDIA CUDA

Many search ranking algorithms involve multiplying a [very large and sparse] matrix by a dense vector (e.g. PageRank with the power method), or can be recast as this type of problem (e.g. HITS, some of the newer learning-to-rank methodologies, etc.).
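As a concrete illustration of the PageRank case mentioned above, here is a minimal power-method sketch on a tiny link graph. The graph, damping factor, and iteration count are illustrative assumptions, not taken from the post; each iteration is essentially one sparse matrix-vector product, which is exactly the kernel the GPU would accelerate.

```python
def pagerank(out_links, d=0.85, iters=50):
    """Power method for PageRank. out_links[i] = pages that page i links to."""
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(iters):
        # One iteration = one (sparse matrix) x (dense vector) product
        new = [(1.0 - d) / n] * n
        for i, targets in enumerate(out_links):
            if targets:
                share = d * rank[i] / len(targets)
                for j in targets:
                    new[j] += share
            else:  # dangling node: spread its rank uniformly
                for j in range(n):
                    new[j] += d * rank[i] / n
        rank = new
    return rank

# Tiny example graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}
ranks = pagerank([[1, 2], [2], [0]])
```

On the web graph the matrix has billions of rows but only a handful of non-zeros per row, which is why the sparse formats discussed below matter.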

On the GPU, the multiplication is carried out in parallel as a grid of thread blocks, with each block running on a separate core. My Sony notebook has 18 cores (16 on the GPU and 2 on the CPU), and that is a lot of power!
  1. GPGPU: CUDPP provides a library for parallel multiplication based on the classical parallel-prefix (scan) approach. I tested some code (which I adapted to run on Windows Vista myself), and this approach looks quite promising;
  2. IBM proposes a closed-source solution for sparse matrix-vector multiplication. The declared performance is interesting, but I don't know the internals of their approach;
  3. This technical report from NVIDIA tackles the problem with an interesting mix of algorithmic and engineering solutions: N. Bell and M. Garland, "Efficient Sparse Matrix-Vector Multiplication on CUDA", NVIDIA Technical Report NVR-2008-004, December 2008. [PDF] [Online].
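To make the sparse matrix-vector product itself concrete, here is a CPU sketch in CSR (compressed sparse row) format, one of the layouts studied in the Bell and Garland report. In their "scalar" CUDA kernel each GPU thread computes one row's dot product; the sequential loop below mirrors that per-row work (the example matrix is my own, purely illustrative).

```python
def csr_matvec(row_ptr, col_idx, vals, x):
    """y = A*x with A stored in CSR: row_ptr (n+1 offsets), col_idx, vals."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for row in range(n):  # on the GPU: one thread per row
        s = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            s += vals[k] * x[col_idx[k]]
        y[row] = s
    return y

# 3x3 sparse matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals    = [4.0, 1.0, 2.0, 3.0, 5.0]
y = csr_matvec(row_ptr, col_idx, vals, [1.0, 1.0, 1.0])
# y == [5.0, 2.0, 8.0]
```

CSR stores only the non-zeros plus per-row offsets, so memory traffic scales with the number of links rather than with n squared, which is what makes web-scale matrices tractable at all.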
In short, I believe that low-cost GPUs open up an exciting new scenario for linear algebra computations. This model could save search engines a lot of money.
