Load balancing is the key to many parallel and distributed applications. We consider the following load balancing problem: given any undirected connected graph and an initial weight distribution on the nodes, determine a schedule to move weights across edges so as to have (almost) equal weights on the nodes. Weights are moved across edges in rounds, and, in a round, weights are moved between the adjacent nodes exactly once. We study this problem in both static and dynamic networks. Previously studied diffusion and dimension exchange algorithms are slow in practice in the sense that they require many rounds of weight exchanges. In this paper, we present a class of algorithms that are work-efficient, i.e., they reduce the number of rounds (i.e., iterations) of weight exchanges needed to balance the load. In our algorithms, a node exchanges load with its neighbors sequentially (one neighbor at a time) and the load at that node is updated before subsequent exchanges with other neighbors. Simulation results on six network topologies show that our algorithms balance the load quite work-efficiently compared to previous algorithms.