Sharing GPUs in the cloud is cost-effective and can facilitate the adoption of hardware-accelerator-enabled clouds. But sharing causes interference between co-located VMs and leads to performance degradation. In this paper, we propose an interference-aware VM scheduler at the cluster level with the goal of minimizing interference. NVIDIA vGPU provides sharing capability and high performance, but it has unique performance characteristics that have not been studied thoroughly before. Our study reveals several key observations. We leverage these observations to construct models based on machine learning techniques to predict interference between co-located VMs on the same GPU. We propose a system architecture that leverages our models to schedule VMs so as to minimize interference. The experiments show that our observations improve model accuracy (by 15% to 40%) and that the scheduler reduces application run-time overhead by 24.2% in simulated scenarios.