SVD and Tucker Decomposition with Low RAM Requirements

Recently I’ve been experimenting with algorithms for the Singular Value Decomposition and the Tucker Decomposition, with the goal of processing large matrices (more than 105 rows and columns) and large tensors (more than 104 rows, columns, and tubes) that are relatively sparse (about 10% density). The problem with matrices and tensors of this size is that most algorithms use a lot of RAM. My desktop computer has 8 GiB of RAM, but that’s not enough. However, I have found some algorithms that have low RAM requirements and yet are as fast as the standard algorithms.

Until recently, I have been using Doug Rohde’s SVD implementation and Brett Bader and Tamara Kolda’s Tucker decomposition. Both of these use sparse representations and carefully designed algorithms. I recommend both highly, but I am now reaching their limits, given the hardware that is currently available to me.

For low RAM SVD and eigen decomposition, I have been experimenting with various stochastic approximations, such as the Generalized Hebbian Algorithm (GHA), Subspace Networks (SN), and Stochastic Gradient Ascent (SGA). The stochastic approach has performed well in the Netflix Prize competition, but I find that it is sensitive to parameter tuning and somewhat slow. I am now using Matthew Brand’s Thin Singular Value Decomposition, which has been implemented in Matlab by David Wingate. This algorithm uses little RAM, runs quickly, has no parameters to tune, and allows incremental updating of the matrix.

For low RAM Tucker decomposition, I was experimenting with stochastic SVD and eigen decomposition embedded in Bader and Kolda’s Alternating Least Squares (ALS) algorithm. This approach uses little RAM but runs either very slowly (with Brand’s SVD) or very, very slowly (with GHA, SN, or SGA). Then I discovered Hongcheng Wang and Narenda Ahuja’s algorithm for Tucker decomposition, which uses little RAM and runs at least as quickly as Bader and Kolda’s algorithm. The fit of Wang and Ahuja’s decomposition is slightly worse than the fit of Bader and Kolda’s decomposition, but the difference might not matter for my purposes, and I have found a few small tweaks that improve Wang and Ahuja’s fit.

It’s interesting that Brand, Wang, and Ahuja work in computational vision. The demands of visual data have motivated them to develop algorithms for SVD and Tucker decomposition that can handle large matrices and tensors. My intuitions about computational linguistics and the demands of linguistic data are leading me along the same path. I was once attracted to the view of evolutionary psychology, that the mind consists of many different modules, but now I am returning to the older view of the mind as more unified.

Update: My algorithm and source code for big tensors.

3 Responses to “SVD and Tucker Decomposition with Low RAM Requirements”

  1. Some good links there, thanks. I haven’t even heard of some of those algos, time to get reading. About the modularity of mind: I like the idea of a unified mind, but I’m almost certainly convinced that more complex sub-structures exist , despite some (Landauer I think) saying that non-linear structures are too hard for our brains to make. Something that has grabbed my attention in this regard lately is spectral geometry. I spoke to K v Rijsbergen about this recently, and he said it had sparked his interest too. Have you looked into it at all?

    On another note, i’ve actually just finished writing an SVD for sparse data in python. It’s slow, probably buggy but has worked on the data I’ve sent it. I wanted something I could send large sparse data, and SVDLIBC was not coping with it too well. I haven’t load tested it with highly dimensional data yet as the targets in my current research have changed. (355×355 has run fine, albeit slowly). note: it’s a WIP, so go easy on me :) /shameless plug

    http://dpn.name/index.php/2007/09/19/sparse-svd-in-pure-python-is-done/

  2. One thing that is of particular interest to many applications is the incremental addition of previously missing data. In other words, running these algorithms on a sparse data set is one thing, but what happens when you observe the actual vallue of data previously imputed? How much reevaluation is really necessary?

  3. For SVD, Brand’s algorithm handles missing data, updating, and downdating very efficiently. See these three papers and try the Matlab code:

    http://www.merl.com/reports/docs/TR2002-024.pdf
    http://www.merl.com/reports/docs/TR2003-014.pdf
    http://www.merl.com/publications/TR2006-059/
    http://www.eecs.umich.edu/~wingated/resources.html

    If you don’t have Matlab, I expect the code will run in Octave:

    http://www.gnu.org/software/octave/

    For the Tucker decomposition, it should be possible to use Brand’s ideas, but it hasn’t been done yet, as far as I know.

Leave a Reply