[p4] Server performance questions.

Chuck Karish karish at well.com
Sat Oct 20 10:45:41 PDT 2001


At 02:42 AM 10/19/2001 -0700, Russell C. Jackson wrote:
>Users: 750+ with more than 2/3 of the users working over the WAN. Most users
>are Windows users.
>Files: 1.5 Million+
>Actual space used for depot: 120+ Gigs
>db.have - 1.7 Gigs minimum (I remove old clients regularly to keep this as
>small as possible.)
>db.rev - 650 Megs

Another useful metric:  request rate (lines per hour added to the journal).
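
If you want a rough number, something as simple as this is enough (the
journal path is whatever yours is; I'm assuming Unix-style tools):

    wc -l /p4root/journal        (note the count)
    ... wait an hour ...
    wc -l /p4root/journal        (the difference is your hourly request rate)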

>Our server is currently a dual 600 Mhz machine running Windows NT, SP6a with
>4 Gigs of RAM, a RAID 5 array for storage and a Gigabit NIC.
>
>During normal work hours, the server utilization is usually between 70% and
>100%.

Is that CPU utilization, from Task Manager?

>Based on Perforce's numbers in the Performance tuning section of the
>tech notes, this seems really high since they indicate that a 140 Mhz single
>processor Sparc machine should be able to handle 700 - 800 users.

Mileage varies widely.  I worked in a shop where a three-processor, 450 MHz
SPARC system was just barely keeping up with 150 prolific users, two thirds
of them using the CLI.  CPU usage was never the limiting factor.

>I suppose
>that might be true if all of the users were Unix users that didn't use the
>GUI, and didn't run any server intensive commands, but that isn't reality.

Make sure that everyone knows how to turn on compression for
slow data links.  It cuts traffic by roughly 75%, a big win on slow
transmission lines, though it does add load to the server.
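
On the Perforce side, compression is a client-spec option.  Something like
this, with a made-up client name:

    p4 client wan-client
    (in the spec form, change "nocompress" to "compress" on the Options: line)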

The p4d process stays active while it prints its report to stdout.  Users
who are doing big operations should redirect the output to a file so
the server doesn't wait on a terminal display.  Remote power users
and admins should get in the habit of running big jobs from logins on
machines local to the server, and they should learn tricks like 'p4 integ -v'.
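
For instance (the paths and branch name are made up):

    p4 sync //depot/bigtree/... > sync.out 2>&1
    p4 integ -v -b release-branch     (-v schedules the integration without
                                       copying the branched files into the
                                       workspace)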

"Reality" includes process management.  That might involve splitting up
the p4 service into separate depots or separate server instances so,
for example, doc writers and managers who use only a small part of
the code base don't make huge requests when their GUI clients update
the full depot view.
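
For example, a doc writer's client view can map just the slice of the tree
they actually touch (paths are hypothetical):

    View:
        //depot/projectA/doc/...    //doc-writer-client/projectA/doc/...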

>Main problems we are seeing when everyone is working:
>
>1. Sync of single small file can take several minutes to complete, and it is
>the server side that is waiting because it sends the file over, and then
>sits there. I timed one sync today, and it took 7 minutes to complete on a
>file that normally syncs out in 2 seconds.
>
>2. Submit can show the same problem. It will send the files over, but then
>the server will sit there for minutes before it completes the command.
>
>3. Similar problems with other commands that would normally complete very
>fast.

These are all the same problem: excessive latency.

>Note: This doesn't happen with every single request, but it happens
>frequently enough to be causing the users a lot of problems.
>
>Questions:
>
>1. Has anyone else seen similar problems? If so, did you find a solution, or
>at least a cause?

Yes.  When the rate of requests is greater than what the server can handle,
requests are queued.  The queue grows as long as the server is overloaded.

As Stephen Vance pointed out, some requests cause regions of the metadata
to be locked while a snapshot is recorded.  This blocks other requests that
try to access the same region.  Perforce has been working to minimize the
use of these locks, but they're necessary if the client is to receive a correct
response.  Lesson:  Try to limit requests to as small a scope as possible.
Users who keep GUI clients open all the time should choose client views that
are restricted to the files they're actually interested in.
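
In practice that means preferring a scoped command over a bare one (the path
is only illustrative):

    p4 sync //depot/projectA/src/...     (touches only one subtree's metadata)
    p4 sync                              (walks the whole client view)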

Use netstat to keep an eye on the queue.  Netstat reports the client system
names and process IDs associated with open sockets, and p4d reports the
process IDs associated with requests.  The numbers of sockets in the
TIME_WAIT and CLOSE_WAIT states are also indicators of server load
and system network efficiency.
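
For example, assuming the server listens on the default port 1666:

    netstat -an | find "1666"              (NT: sockets talking to the Perforce port)
    netstat -an | find /c "CLOSE_WAIT"     (NT: count of sockets in CLOSE_WAIT)
    netstat -an | grep -c CLOSE_WAIT       (the Unix equivalent)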

>2. If you have a similar, or larger size server:
>        A. What is your configuration?
>        B. How many users do you have remote?
>        C. How many of your users use the GUI?
>
>3. Do any of you have any recommendations about whether faster hardware is
>the way to go, or should we be looking into splitting this server into
>multiple smaller servers? Multiple smaller servers is a pain in terms of
>code management, server maintenance, and costs is why we have avoided going
>that route so far.

The people who are scoffing at NT aren't just expressing their religious beliefs.
There are two places where many Unix systems work better than an untuned
NT system: the network stack and the kernel task scheduler.  Those CPUs
have to be used efficiently to get the best performance; 100% on the Task
Manager meter doesn't necessarily mean the machine is doing useful work.

If I were running a server on Wintel hardware, I'd be thinking hard about using
FreeBSD.  If I were committed to NT, I'd be doing everything I could to tune it to
handle a lot of network connections and a lot of simultaneous tasks.
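
The tuning I have in mind is the TCP parameters under
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters.  These two are
real registry values, but the numbers are only illustrative, so test before
trusting them on a production box:

    MaxUserPort        REG_DWORD   65534    (more ephemeral ports for connections)
    TcpTimedWaitDelay  REG_DWORD   30       (recycle TIME_WAIT sockets sooner)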

Sometimes individual fast disks work better than a RAID array.  p4d has three
separate uses of the disks that mustn't wait on each other: retrieving source
files, reading and writing the metadata, and kernel paging (pagefile.sys).  These
should be on separate spindles.
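
p4d makes that split easy to express; the drive letters here are made up:

    p4d -r D:\p4meta -J E:\p4journal\journal -p 1666

    (db.* lands under D:\p4meta; use 'p4 depot' to point each depot's Map:
    field at a drive of its own for the archive files, and keep pagefile.sys
    on a disk neither of them uses)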

p4d never suffers from having too much memory.  Do what you can to see that
its metadata don't get paged out.
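
A quick sanity check is to compare the total size of the db.* files against
physical RAM (the metadata path is whatever your server root is):

    dir D:\p4meta\db.*        (NT)
    ls -l $P4ROOT/db.*        (Unix)

If the db.* total is much larger than RAM, the OS can't keep it all cached.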

Clients that make repeated requests (review daemon, GUI clients) should be set
up not to ask again while a previous request is still queued.  This is something that
Perforce could make easier by adding a quality-of-service attribute to requests:
a client could specify "drop this request if it won't complete within one minute".
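
Until something like that exists, a crude lock file around the polling script
at least keeps it from piling up requests.  A sketch, assuming a Unix-ish
shell and a hypothetical review script:

    #!/bin/sh
    # skip this run if the previous poll is still in flight
    test -f /tmp/p4review.lock && exit 0
    touch /tmp/p4review.lock
    /usr/local/bin/p4review.sh        # whatever your daemon normally runs
    rm -f /tmp/p4review.lock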


Chuck Karish            karish at well.com           (415) 317-0182



