[p4] Perforce server lockup problem

Erik Johnson erik at valvesoftware.com
Fri Mar 18 13:52:28 PST 2005


A followup to our problem.
 
Perforce technical support pointed us to the undocumented -1 flag to
use when doing integrates. Here is the description from p4 help undoc:
 
p4 integrate -1
    The flag '-1' can be supplied to the 'p4 integrate' command to
    force consideration of direct integration history only.
 
This seems to essentially mimic the previous integration behavior. In
our test, our integration time went from approximately 10 minutes
(during most of which no users were able to complete any operation)
to around 10 seconds with no interruption. This was on an integrate
where a total of 14 source code files had changed.
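
For readers who want to try the flag, here is a minimal sketch of how the
command line could be assembled. It only builds the argument list and does
not run p4. The depot paths are hypothetical placeholders, not the ones
used in our test, and the helper name build_integrate_command is made up
for illustration. The -n option is the standard preview switch for
p4 integrate.

```python
def build_integrate_command(source, target, direct_only=True, preview=False):
    """Return the argv list for a 'p4 integrate' call without running it.

    source and target are depot paths supplied by the caller (hypothetical
    in the example below). direct_only adds the undocumented -1 flag quoted
    above; preview adds -n so nothing is actually opened for integrate.
    """
    cmd = ["p4", "integrate"]
    if direct_only:
        cmd.append("-1")
    if preview:
        cmd.append("-n")
    cmd.extend([source, target])
    return cmd


if __name__ == "__main__":
    # Hypothetical personal-branch and main-codeline paths.
    argv = build_integrate_command("//depot/dev/erik/...", "//depot/main/...")
    print(" ".join(argv))
```

The resulting list can be handed to subprocess.run on a machine with the
p4 client installed. Trying it with preview=True first is a cautious way
to compare results with and without -1.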
 
It's unclear to me why this isn't the default behavior, but maybe our
workflow differs from others in some meaningful way.
 
Thanks to everyone on the list for pointing in the right direction.
 
Erik

________________________________

From: perforce-user-bounces at perforce.com
[mailto:perforce-user-bounces at perforce.com] On Behalf Of Erik Johnson
Sent: Wednesday, March 16, 2005 10:10 AM
To: Erik Johnson; perforce-user at perforce.com
Subject: RE: [p4] Perforce server lockup problem


Another option for us would be to get someone here onsite that could
help us diagnose this problem. Does anyone on the list know if there is
anyone out there that provides this kind of service?
 
Erik

________________________________

From: perforce-user-bounces at perforce.com
[mailto:perforce-user-bounces at perforce.com] On Behalf Of Erik Johnson
Sent: Tuesday, March 15, 2005 1:27 PM
To: perforce-user at perforce.com
Subject: RE: [p4] Perforce server lockup problem


We built a test case to reproduce the problem and set up Filemon to
watch the disk.
 
We created a brand new branch from scratch from our main codeline, and
then integrated it back in. This operation took about 10 minutes, and
during that time all other clients on the network were essentially
locked out of Perforce.
 
Scanning through the filemon logs, approximately 7 minutes of time is
spent doing this:
 
314822 11:21:43 AM p4s.exe:3736 READ U:\p4root\db.integed SUCCESS Offset: 2195996672 Length: 8192
314823 11:21:43 AM p4s.exe:3736 READ U:\p4root\db.integed SUCCESS Offset: 1690583040 Length: 8192
314824 11:21:43 AM p4s.exe:3736 READ U:\p4root\db.integed SUCCESS Offset: 1690583040 Length: 8192

From reading through the release notes, it looks like there were some
changes to how integrations are handled between our original server
version (2001.1) and the version we're currently running. I don't know
enough about the inner workings of the database to have a clear idea as
to what the excessive reads of this file actually mean.
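
One way to quantify reads like these is to tally a saved Filemon log per
file and per offset. The sketch below is only an illustration. It assumes
each log entry sits on a single line with the fields in the order shown
above (sequence number, time, process, operation, path, result,
"Offset: N Length: N") and that paths contain no spaces. The function name
summarize_reads is made up for this example.

```python
import re
from collections import Counter

# Assumed single-line layout, based on the three entries quoted above.
LINE_RE = re.compile(
    r"^(?P<seq>\d+)\s+"
    r"(?P<time>\d{1,2}:\d{2}:\d{2}\s+[AP]M)\s+"
    r"(?P<process>\S+)\s+"
    r"(?P<op>\S+)\s+"
    r"(?P<path>\S+)\s+"
    r"(?P<result>\S+)\s+"
    r"Offset:\s+(?P<offset>\d+)\s+Length:\s+(?P<length>\d+)$"
)

SAMPLE = [
    r"314822 11:21:43 AM p4s.exe:3736 READ U:\p4root\db.integed SUCCESS Offset: 2195996672 Length: 8192",
    r"314823 11:21:43 AM p4s.exe:3736 READ  U:\p4root\db.integed SUCCESS Offset: 1690583040 Length: 8192",
    r"314824 11:21:43 AM p4s.exe:3736 READ U:\p4root\db.integed SUCCESS Offset: 1690583040 Length: 8192",
]


def summarize_reads(lines):
    """Count READ operations, bytes read, and repeated offsets per file."""
    reads = Counter()        # path -> number of READ entries
    bytes_read = Counter()   # path -> total bytes requested
    offsets = {}             # path -> Counter of offset -> hit count
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or match.group("op") != "READ":
            continue  # skip lines that do not fit the assumed layout
        path = match.group("path")
        reads[path] += 1
        bytes_read[path] += int(match.group("length"))
        offsets.setdefault(path, Counter())[int(match.group("offset"))] += 1
    return reads, bytes_read, offsets


if __name__ == "__main__":
    reads, bytes_read, offsets = summarize_reads(SAMPLE)
    for path, count in reads.most_common():
        repeated = sum(1 for hits in offsets[path].values() if hits > 1)
        print(path, count, bytes_read[path], repeated)
```

A high count of repeated offsets for one table would suggest the same
pages are being re-read rather than the file being scanned once, which
may be useful detail to pass along to support.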
 
Any help much appreciated.
 
Erik

________________________________

From: perforce-user-bounces at perforce.com
[mailto:perforce-user-bounces at perforce.com] On Behalf Of Erik Johnson
Sent: Tuesday, March 08, 2005 12:11 PM
To: Bruce McPeek; perforce-user at perforce.com
Subject: RE: [p4] Perforce server lockup problem


We've been able to reproduce this problem with anti-virus running, and
without (running Etrust).
 
I'm fairly sure it's not a bottleneck on the drive array, since the
read/write throughput we see under normal operation is much higher
than this. For what it's worth, they are four 250GB Western Digital
7200RPM 8MB cache SATA drives, running in a RAID 10 array (mirrored
and striped). We've benchmarked this setup for some of our other
business activities on identical hardware, and they don't exhibit
this behavior.
 
I'll take a look at Sysinternals Filemon and see what data it provides,
just to make sure it's Perforce creating the problem.
 
Sounds like it also makes sense to upgrade to the very latest server
software.
 
I'll let the list know what I find out.
 
Thanks for the info,
 
Erik

________________________________

From: perforce-user-bounces at perforce.com
[mailto:perforce-user-bounces at perforce.com] On Behalf Of Bruce McPeek
Sent: Tuesday, March 08, 2005 9:04 AM
To: perforce-user at perforce.com
Subject: RE: [p4] Perforce server lockup problem


Erik,
 
To me, this sounds more like a hardware issue under load. I am
especially suspicious of the SATA RAID 5.
 
Could you describe the hardware upgrades you mentioned? How is your SATA
RAID 5 configured? Hardware RAID or software RAID? How many drives of
what size? Even better which models. How are you doing your SATA? On
motherboard or add-on card? Are your SATA drivers native windows? Third
party?
 
I need to look at how SATA interfaces with the rest of a system's I/O
again but I'm wondering if this may be your bottleneck.
 
I agree with the other posters about the anti-virus. If it is installed,
how is it configured with respect to what is scanned?
 
I just noticed you are at Valve Software. What are the typical sizes of
the files you are working with? Large binaries for games?
 
 
Bruce
 

________________________________

From: Erik Johnson [mailto:erik at valvesoftware.com] 
Sent: Monday, March 07, 2005 11:22 AM
To: perforce-user at perforce.com
Subject: [p4] Perforce server lockup problem


I'll try to give as much data as I can on our setup, along with the
problem we've been having. I haven't gotten any really crisp leads from
Perforce support on this problem. Maybe someone else has already solved
it.
 
We're running our Perforce server on Windows Server 2003 on a machine
with 2GB RAM, a RAID 5 disk subsystem with 7200RPM SATA drives, and a
single 3.GHz HT processor. We're running server version
P4D/NTX86/2004.2/73359 (2004/12/27).
 
Our general workflow (and the one that tends to generate the problem) is
that we have a main branch that few people directly work on, with
personal branches off of it that individual developers integrate into.
There are roughly 20 developers integrating roughly 5,000 lines of code
a day. Unfortunately, we changed a couple of variables at once when the
problem started happening (upgraded server software, server hardware),
so I can't reasonably point to a specific root cause.
 
When a developer is merging from their personal branch into our main
codeline, we're seeing total database lockup, and constant reads on the
server (viewed via Windows perfmon). This means that all other users on
the system cannot sync, integrate, checkout, checkin, etc. The condition
generally takes around 15 minutes to clear, and then things all go back
to normal. CPU and RAM usage on the system is all nominal, and while the
reads on the disk subsystem are constant, they are not within 1/3 of the
observed peak read throughput.
 
Has anyone else seen behavior like this? It appears that the database is
being read table by table for some reason.
 
 