[p4] p4 verify question
sweeney at addr.com
Mon Oct 6 15:34:11 PDT 2008
Matt Janulewicz wrote:
> Hello -
> I have inherited a nightly/weekly maintenance script that runs a verify
> on a large set of data over the weekend. This sometimes takes more than
> 12 hours (per day! It's split in half!). Upon actually reading the script
> I see that it runs two commands one after the other:
> p4 verify -q //depot/...
> p4 verify -u -q //depot/...
> Is that first verify command really necessary? The output of it gets
> dumped to a log file (which we eventually throw away after not looking
> at it for a couple weeks) but other than that if we run the '-u -q' part
> immediately after that, is it redundant? At other jobs I've only ever
> run that second command on a weekly basis, I just want to be sure I'm
> not missing something.
The reason for the two passes here is that the first will do a verify of
only those files with checksums, and the second will store checksums
only for those without. However, verify -u is largely redundant in
recent releases. Newer servers calculate the checksum (2003.2) and
length (2005.1) automatically on submit, so it should only be needed to
force the checksum in the event of an error or behind-the-scenes depot
manipulation. If your depot was upgraded from an older version that
didn't store checksums (or lengths) and you weren't running periodic
verify -u operations, then a single one-off verify -u will generate and
store any missing checksums and lengths and you should never need to run
it again, since it should no longer be possible to store a new file
revision without this data.
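So on a 2005.1+ server the nightly job can usually drop the '-u' pass
entirely. Here's a minimal sketch of what the streamlined script might
look like -- the wrapper function, variable names and log path are my
own invention, not the original script, and P4 is overridable so the
logic can be exercised without a live server:

```shell
#!/bin/sh
# Sketch of a streamlined nightly verify, assuming a 2005.1+ server
# that stores checksums and lengths on submit (so no routine 'verify -u').
# P4 is a parameter so the function can be tested with a stub command.
P4="${P4:-p4}"

# run_verify DEPOT LOGFILE
# Runs the single quick verify pass; returns nonzero if anything was
# flagged.  With -q, verify prints only problem files, so a non-empty
# log means trouble.
run_verify() {
    depot="$1"
    log="$2"
    "$P4" verify -q "$depot" >"$log" 2>&1
    if [ -s "$log" ]; then
        echo "p4 verify reported problems; see $log" >&2
        return 1
    fi
    return 0
}
```

Run it as e.g. run_verify "//depot/..." /var/log/p4verify.$(date +%Y%m%d).log
and alert only when it returns nonzero, instead of writing logs nobody reads.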
> In the 'p4 help' for verify it also mentions:
> p4 verify -q #1,#1
> p4 verify -q #head,#head
> ... to verify the first and head revisions of all files. This leads me
> to believe that it's a good idea to run a plain p4 verify -q after the
> -u -q, but is this (only #1 and #head) significantly faster than running
> it on //...?
I benchmarked this while working at a large computer games company and
it turned out to be significantly slower to run the two separate #1,#1
#head,#head passes versus a single pass against //... -- approximately
two 3-hour passes vs. a single pass of roughly 4 hours, from what I
recall. However, this may be
a pathological case due to the nature of the files stored in Perforce;
almost exclusively large binary audio, video and 3d game assets and
rendering textures, along with a relatively insignificant volume of code
and other text files. I would log start and stop timestamps for your
data and do whichever is quicker.
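Something as simple as the wrapper below is enough for that comparison.
It's just a sketch; the commented p4 invocations at the bottom are the
two strategies under discussion, with //depot/... standing in for your
own path:

```shell
#!/bin/sh
# Timing wrapper for comparing the two verify strategies.
# Usage: timed LABEL COMMAND [ARGS...]
timed() {
    label="$1"; shift
    start=$(date +%s)     # wall-clock start, in seconds
    "$@"
    status=$?
    end=$(date +%s)
    echo "$label: $((end - start))s (exit $status)"
    return $status
}

# Example comparison against a real server:
#   timed two-pass sh -c \
#     'p4 verify -q "//depot/...#1,#1"; p4 verify -q "//depot/...#head,#head"'
#   timed one-pass p4 verify -q "//depot/..."
```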
> I suspect it is but wonder if I'll be missing anything, or
> leave my backups at risk, if I don't run it on all revisions ...?
The reason that '#1,#1' and '#head,#head' is considered 'good enough' for
text files is down to the way that RCS ,v files store the file
revisions. The head revision of the file is stored inline in its
entirety, along with a series of deltas that can be used to 'patch' the
head backwards in time to get to the older revisions. To get from #head
back to #1, all intervening deltas must be generated and applied. If
any of them is in error, this is guaranteed to show up in #1, as the
error will cascade through into all the older revisions. For a given
//.../badfile flagged by this, you will still need to run 'p4 verify
//.../badfile#1,#head' to see which revision introduced the error. And
of course it's only a win if it _is_ actually faster than checking
everything, which it wasn't in my case. Of course, for binary files,
which are stored as separate gzip files in the server, checking first
and head only will silently miss corruption in any of the intervening
versions. Perforce could usefully rewrite the various verify
documentation so that it is complete, correct and up to date.
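To pin down the offending revisions of a flagged file, one ranged
verify over #1,#head plus a little filtering does the job. A minimal
sketch -- note the 'BAD!'/'MISSING!' line format here is from memory
and the depot path is purely illustrative, so check against your own
server's output:

```shell
#!/bin/sh
# Reads 'p4 verify' output on stdin and prints just the depot file
# revisions flagged as BAD! or MISSING! (first field of each error line,
# e.g. '//depot/path/file#7').
list_bad_revs() {
    grep -E 'BAD!|MISSING!' | awk '{print $1}'
}

# Intended usage (against a real server):
#   p4 verify //depot/path/badfile#1,#head | list_bad_revs
```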
Hope this helps.