Hi,
I've recently been asked to help with an installation that is suffering performance problems. After comparing many attributes of this system with another that runs the same software and performs much better, we discovered that the network between the HTB appserver and the HTB database had a latency between 10 and 100 times greater in the poorly performing system. (The acceptable system showed a latency of up to 0.16ms, compared with a range from 1.1ms up to hundreds of ms on the poorly performing system.)
We used the tc[1] command on Linux to add latency to the acceptable system and saw a corresponding drop in the performance of HTB, so we are fairly confident that we have identified the problem.
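For anyone wanting to repeat the experiment, here is a minimal sketch of adding a fixed delay with netem. The interface name and the 1ms value are assumptions for illustration only (the values actually used are not given here), and the RUN variable is a hypothetical safety switch: by default the commands are only printed, not executed.

```shell
#!/bin/sh
# Dry run by default: commands are printed instead of executed.
# Set RUN to an empty string (RUN= ./script.sh) and run as root to apply them.
RUN=${RUN-echo}

# usage: add_delay <interface> <delay>
# Adds a fixed delay to every packet leaving <interface>.
add_delay() {
    $RUN tc qdisc add dev "$1" root netem delay "$2"
}

# usage: show_qdisc <interface>
# Lists the queueing disciplines on <interface> to confirm netem is active.
show_qdisc() {
    $RUN tc qdisc show dev "$1"
}

add_delay eth0 1ms    # eth0 and 1ms are assumed example values
show_qdisc eth0
```

netem delays outgoing packets only, so a delay added on one host increases the round trip by that amount. The command for removing the delay again is in note 1 below.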
I'm interested in getting input from others about this issue. So, if you are willing, it would be great if you could give the version, latency and an approximate description of performance for the systems that you are involved with. Obviously this is wildly unscientific, as each system is very different, but it may give some indication of the network performance required in general.
Here are our numbers.
System 1:
HTB Version: 5.3
Network Latency: 0.13-0.16ms
Performance: HTB calls are generally sub-second.
System 2:
HTB Version: 5.3
Network Latency: 1.1ms-100ms and some even higher. Most measurements fall toward the lower end of that range.
Performance: HTB calls generally take more than 10 seconds, and up to over a minute.
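As a rough illustration of why latency could matter this much: if a single call makes many sequential round trips to the database, the network wait is roughly the number of round trips multiplied by the latency. The figure of 5000 round trips below is purely a hypothetical number chosen for the arithmetic, not something measured on HTB.

```shell
#!/bin/sh
# usage: estimate_wait <round_trips> <latency_ms>
# Prints the estimated network wait in seconds, assuming the round
# trips happen strictly one after another.
estimate_wait() {
    awk -v n="$1" -v rtt_ms="$2" 'BEGIN { printf "%.2f\n", n * rtt_ms / 1000 }'
}

# 5000 round trips per call is an assumed, illustrative figure.
estimate_wait 5000 0.15   # latency similar to System 1
estimate_wait 5000 1.1    # low end of System 2
estimate_wait 5000 100    # high end of System 2
```

Under that assumption the same call goes from under a second of network wait to several seconds at 1.1ms, and to minutes at 100ms.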
We used tcptraceroute[2] to capture the latency, as most of the hardware on the network was set up to drop UDP and ICMP packets.
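If you want to take a comparable measurement, a sketch of a tcptraceroute run from the appserver to the database port looks like this. The hostname and port are hypothetical placeholders, and RUN is again a hypothetical dry-run switch, so the command is printed unless RUN is set to an empty string. tcptraceroute normally needs root privileges.

```shell
#!/bin/sh
# Dry run by default: the command is printed instead of executed.
RUN=${RUN-echo}

# usage: trace_tcp <host> <port>
# -n skips reverse DNS lookups; -q 5 sends five probes per hop.
trace_tcp() {
    $RUN tcptraceroute -n -q 5 "$1" "$2"
}

trace_tcp db.example.com 5432   # placeholder host and port
```

The times reported for the final hop are the figures to compare with the latency numbers above.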
One last note. I'm not trying to imply that this is a problem with HTB. 0.1ms latency should be easily achievable in a data center environment. I do, however, feel that the speed of the network between the HTB appserver and its database should be a documented requirement when designing the network infrastructure for your installation.
thanks
Hamish
1:
http://devresources.linux-foundation.org/shemminger/netem/example.html
Note - you can remove the delay on eth0 using the command:
tc qdisc del dev eth0 root
2:
http://michael.toren.net/code/tcptraceroute/