Recommender systems are prevalent across the internet. Many services rely on accurate recommendations, and deep learning recommenders have become popular because their accuracy generalizes well. However, like many deep learning models, these recommenders have very large numbers of parameters and are therefore expensive to compute. We seek to alleviate this cost by applying common matrix approximations, such as low-rank approximation, random Fourier features, and PCA, and comparing the speedups they yield on Google's TPU architecture against traditional GPU and CPU setups. We focus on a case study of training a DCN (Deep & Cross Network) model on the MovieLens dataset, where we apply the matrix approximations to the cross-layer interactions. While low-rank approximation is often the most broadly applicable approach on GPUs and CPUs, combining a large reduction in complexity with preserved accuracy, our results suggest that random Fourier features may scale better for large-batch training on TPUs.
Advised by Rashmi Vinayak
Milestone 1, Milestone 2, Milestone 3, Milestone 4, Milestone 5, Final Paper
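
To make the low-rank idea in the abstract concrete, here is a minimal sketch of how a cross layer's weight matrix could be replaced by a product of two thin matrices. It assumes a DCN-v2-style cross layer of the form x_{l+1} = x_0 * (W x_l + b) + x_l, which the abstract does not spell out. The function names, dimensions, and random initialization are hypothetical and chosen only for illustration; this is not the project's implementation.

```python
import numpy as np

def cross_layer_full(x0, xl, W, b):
    """Full-rank cross layer (assumed form): x0 * (W xl + b) + xl."""
    return x0 * (xl @ W.T + b) + xl

def cross_layer_low_rank(x0, xl, U, V, b):
    """Same layer with W replaced by U @ V.T, where the rank r is much smaller than d."""
    # Project down to r dimensions first, then back up to d,
    # so the d x d matrix is never formed.
    return x0 * ((xl @ V) @ U.T + b) + xl

# Hypothetical sizes: batch of 4, feature dimension 64, rank 8.
rng = np.random.default_rng(0)
batch, d, r = 4, 64, 8
x0 = rng.normal(size=(batch, d))
U = rng.normal(size=(d, r)) / np.sqrt(d)
V = rng.normal(size=(d, r)) / np.sqrt(d)
b = np.zeros(d)

# When W is exactly U @ V.T, both layers compute the same output.
W = U @ V.T
full_out = cross_layer_full(x0, x0, W, b)
low_rank_out = cross_layer_low_rank(x0, x0, U, V, b)

# Parameter counts per cross layer (weights plus bias).
full_params = d * d + d
low_rank_params = 2 * d * r + d
print(full_params, low_rank_params)
```

Under these assumptions, the per-layer multiply cost drops from O(d^2) to O(dr) per example, which is the kind of complexity reduction the abstract refers to. Whether it turns into a wall-clock speedup depends on the hardware and batch size.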