To help compute shortest path distances over large graphs efficiently, 2-hop labeling has emerged as a major tool, with Pruned Landmark Labeling (PPL) as a popular algorithm. This paper demonstrates the first scalable parallel implementation of the PPL algorithm that produces the same results as the sequential algorithm. Based on theoretical analysis, we show how computations on each vertex can be performed in parallel while maintaining correctness, resulting in the Vertex-Centrix PLL (VC-PLL) algorithm. We also show a formulation of this algorithm based on linear algebra and argue why the use of a library based on linear algebra operations will not produce an efficient implementation. Next, we introduce a batched VC-PLL (BVC-PLL) algorithm to reduce the computational inefficiency in VC-PLL. We have carried out a parallel implementation of this method for modern clusters, combining shared memory and distributed memory parallelism, that can efficiently execute on graphs with more than a billion edges. We also demonstrate how BVC-PLL algorithm can be extended to handle directed graphs and weighted graphs and how the version for weighted graphs can benefit from SIMD parallelization.