TY - GEN
T1 - A Study of Long-Tail Latency in n-Tier Systems
T2 - 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017
AU - Wang, Qingyang
AU - Lai, Chien An
AU - Kanemasa, Yasuhiko
AU - Zhang, Shungeng
AU - Pu, Calton
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/7/13
Y1 - 2017/7/13
N2 - Long-tail latency of web-facing applications continues to be a serious problem. Most of the previously published research addresses two classes of long latency problems: uneven workloads such as web search, and resource saturation in single nodes. We describe an experimental study of a third class of long-tail latency problems that are specific to distributed systems: Cross-Tier Queue Overflow (CTQO) due to a combination of millibottlenecks (with sub-second duration) and tightly-coupled servers in n-tier systems (e.g., Apache, Tomcat, and MySQL) using RPC-style request-response communications. Our experiments show that the appearance of millibottlenecks (e.g., created by short workload bursts) in one server often causes another server (which has no saturated resources) in the synchronous invocation chain to fill up its queues (CTQO) and drop packets, creating very long response time queries. CTQO can be reduced or avoided by replacing the server dropping packets with an asynchronous server. In synchronous n-tier system experiments, long-tail latency due to CTQO can be reproduced consistently at utilization as low as 43%. In contrast, when all n-tier servers are replaced by asynchronous versions, CTQO and consequent dropped packets remain absent at utilization levels as high as 83%, despite the same millibottlenecks.
AB - Long-tail latency of web-facing applications continues to be a serious problem. Most of the previously published research addresses two classes of long latency problems: uneven workloads such as web search, and resource saturation in single nodes. We describe an experimental study of a third class of long-tail latency problems that are specific to distributed systems: Cross-Tier Queue Overflow (CTQO) due to a combination of millibottlenecks (with sub-second duration) and tightly-coupled servers in n-tier systems (e.g., Apache, Tomcat, and MySQL) using RPC-style request-response communications. Our experiments show that the appearance of millibottlenecks (e.g., created by short workload bursts) in one server often causes another server (which has no saturated resources) in the synchronous invocation chain to fill up its queues (CTQO) and drop packets, creating very long response time queries. CTQO can be reduced or avoided by replacing the server dropping packets with an asynchronous server. In synchronous n-tier system experiments, long-tail latency due to CTQO can be reproduced consistently at utilization as low as 43%. In contrast, when all n-tier servers are replaced by asynchronous versions, CTQO and consequent dropped packets remain absent at utilization levels as high as 83%, despite the same millibottlenecks.
UR - http://www.scopus.com/inward/record.url?scp=85027270420&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027270420&partnerID=8YFLogxK
U2 - 10.1109/ICDCS.2017.32
DO - 10.1109/ICDCS.2017.32
M3 - Conference contribution
AN - SCOPUS:85027270420
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 207
EP - 217
BT - Proceedings - IEEE 37th International Conference on Distributed Computing Systems, ICDCS 2017
A2 - Lee, Kisung
A2 - Liu, Ling
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 5 June 2017 through 8 June 2017
ER -