Clusters of GPUs have rapidly emerged as a means of achieving extreme-scale, cost-effective, and power-efficient high-performance computing. At the same time, high-level APIs such as map-reduce are being used to develop many types of high-end and/or data-intensive applications. Map-reduce, originally developed for data processing, has been used successfully for many classes of applications that involve a significant amount of computation, such as machine learning, image processing, and data mining. Because such applications can be accelerated using GPUs (and other accelerators), there has been interest in supporting map-reduce-like APIs on GPUs. However, while the use of map-reduce on a single GPU has been studied, developing map-reduce-like models for programming a heterogeneous CPU-GPU cluster remains an open challenge. This paper presents the MATE-CG system, a map-reduce-like framework based on the generalized reduction API. We develop support for scalable and efficient implementation of data-intensive applications on a heterogeneous cluster of multi-core CPUs and many-core GPUs. Our contributions are threefold: 1) we port the generalized reduction model to clusters of modern GPUs with a map-reduce-like API, dealing with very large datasets; 2) we propose three schemes to better utilize the computing power of CPUs and/or GPUs and develop an auto-tuning strategy to achieve the best-possible heterogeneous configuration for iterative applications; and 3) we show how analytical models can be used to optimize important parameters in our system. We evaluate our system using three representative data-intensive applications and report results on a heterogeneous cluster of 128 CPU cores and 16 GPUs (7168 GPU cores). We show an average speedup of 87x on this cluster over execution with 2 CPU cores. Our applications also achieve an average improvement of 25% by using CPU cores and GPUs simultaneously, over the best performance achieved using only one type of resource in the cluster.
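To make the generalized reduction style concrete, the following is a minimal sketch in plain Python, not the MATE-CG API. It assumes the usual reading of generalized reduction, in which each input element is folded directly into a per-worker reduction object and the objects are merged afterwards, rather than emitting intermediate key-value pairs. The function names, the `cpu_fraction` parameter, and the word-count workload are hypothetical and chosen only for illustration; both partitions run on the CPU here, standing in for the CPU and GPU shares of a node.

```python
from collections import Counter
from functools import reduce


def local_reduction(chunk):
    """Fold every element of one chunk straight into a private reduction object."""
    robj = Counter()  # the reduction object (hypothetical stand-in)
    for item in chunk:
        robj[item] += 1  # process and reduce in a single step
    return robj


def global_combination(robjs):
    """Merge the per-worker reduction objects into the final result."""
    return reduce(lambda a, b: a + b, robjs, Counter())


def split_for_devices(data, cpu_fraction):
    """Split the input between a 'CPU' share and a 'GPU' share (assumed static split)."""
    cut = int(len(data) * cpu_fraction)
    return data[:cut], data[cut:]


data = ["a", "b", "a", "c", "b", "a"]
cpu_part, gpu_part = split_for_devices(data, cpu_fraction=0.5)
result = global_combination([local_reduction(cpu_part), local_reduction(gpu_part)])
```

In this sketch the split ratio is fixed by hand; choosing it well for a given application is the kind of parameter the auto-tuning and analytical models mentioned above are meant to address.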