[p4] binary files?

Weintraub, David david.weintraub at bofasecurities.com
Mon Jun 12 06:06:29 PDT 2006


Oh, CVS and RCS handle binary files with ease. The big problem is that
these version control systems automatically expand their keywords in any
file unless you specifically mark the file as binary (or, in RCS's case,
tell it not to expand keywords). I found this out when I had a word
processing document in which I was discussing RCS keywords.
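
To make the keyword problem concrete, here is a minimal Python sketch. It is
not how RCS or CVS are implemented; the pattern, the function name
expand_keywords, and the sample bytes are all made up for illustration. It
only shows that rewriting a $Keyword$ marker changes the bytes and the
length of a file that was never meant to be edited.

```python
import re

# Simplified stand-in for RCS/CVS keyword expansion. The real tools know
# more keywords and fill in real revision data; this is an assumption-laden toy.
KEYWORD = re.compile(rb"\$(Revision|Date|Author|Id)(?::[^$\n]*)?\$")

def expand_keywords(data: bytes, revision: str = "1.2") -> bytes:
    """Rewrite every $Keyword$ marker, as a text-mode checkout would."""
    def fill(match):
        # Every keyword gets the same value here, purely for illustration.
        return b"$" + match.group(1) + b": " + revision.encode() + b" $"
    return KEYWORD.sub(fill, data)

# A made-up "binary" payload that happens to contain the bytes $Revision$.
original = b"\x00\x01DOC\x00 see $Revision$ for details \xff\xfe"
expanded = expand_keywords(original)
print(len(original), len(expanded))
```

Any format that records offsets or lengths internally will no longer agree
with its own contents after a change like this.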

CVS, since it works in both the Unix and Windows worlds, will also convert
end-of-line characters if you're working in a multi-platform
environment. For example, it will change 0x0D 0x0A (CR/LF) to 0x0A (LF) if
it believes that the document you're working on is a text document.
That's probably the biggest source of binary corruption. That's why it
is very important to mark files as binary using the "-kb" option
when you add a binary file to the CVS repository. Doing that allows
CVS to handle the binary file with no problem. Unfortunately, people
forget to do that, and don't realize there's a problem until much later,
after CVS has corrupted a file because it dutifully converted the EOL
sequences for them.
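
A small Python sketch of why EOL conversion is fatal for binary data. The
function crlf_to_lf is my own stand-in for text-mode conversion, not CVS
code. The sample is the 8-byte PNG signature, which contains a CR/LF pair
so that this kind of damage can be detected.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Text-mode style conversion: every CR/LF pair becomes a bare LF."""
    return data.replace(b"\r\n", b"\n")

# The 8-byte PNG file signature: 89 50 4E 47 0D 0A 1A 0A.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

converted = crlf_to_lf(PNG_SIGNATURE)
print(PNG_SIGNATURE.hex(), "->", converted.hex())
print(len(PNG_SIGNATURE), "bytes ->", len(converted), "bytes")
```

One byte goes missing, every later byte shifts position, and the file no
longer starts with a valid signature.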

Perforce does a lot better than CVS on two fronts. First of all, it
doesn't expand keywords unless you explicitly tell it to, and you
probably wouldn't do that if you knew the file was binary to begin with.
(Yes, people do mark binary files for keyword expansion. I knew someone
who set all files in their archive to automatically expand RCS
keywords, and then realized he had destroyed a few binary files in the
process. This wasn't on a Perforce archive, but I can imagine someone
using triggers or the filetype table to do this without realizing the
damage they could cause.)

The other thing Perforce does is examine the first block of characters
of a file, and if it finds any non-text characters, it assumes the file
is binary rather than text. Perforce will then use gzip storage for
the file instead of the RCS file format and will not do EOL conversion.
Your MS Word and Excel files are extremely safe under this mechanism since
they are bound to contain non-text characters in that first block.
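
The first-block check can be sketched in Python as follows. This is only my
guess at the general idea, not Perforce's actual algorithm: the block size,
the set of bytes treated as "text", and the name looks_binary are all
assumptions. For simplicity it also treats any byte above 0x7E as
non-text, which would misjudge UTF-8 text.

```python
BLOCK_SIZE = 8192  # assumed sample size, not a documented Perforce value

# Printable ASCII plus tab, LF, form feed, and CR count as "text" here.
TEXT_BYTES = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0C, 0x0D}

def looks_binary(data: bytes, block_size: int = BLOCK_SIZE) -> bool:
    """Guess 'binary' if the first block holds a byte unusual in plain text."""
    return any(byte not in TEXT_BYTES for byte in data[:block_size])

# Old-style .doc/.xls files start with the OLE2 compound-file signature.
ole2_header = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"
plain_text = b"Dear all,\r\nthis is plain text.\n"
# Contrived sample (not a valid PDF): ASCII first block, binary bytes later.
pdf_like = b"%PDF-1.4\n" + b"x" * BLOCK_SIZE + b"\x00\xff"

for label, sample in [("ole2", ole2_header), ("text", plain_text), ("pdf-like", pdf_like)]:
    print(label, looks_binary(sample))
```

The Word-style header is caught on its first byte, while a file whose
binary content starts after the sampled block slips through as text.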

Unfortunately, certain file formats like PDF can fool this check, so it
is better to help Perforce by determining the file type from the suffix.
Fortunately, Perforce has already put the most common suffix mappings
into its file type table (the typemap). Do a "p4 typemap -o" to
see what is already in Perforce. You'll notice that *.doc, *.xls, and
the problematic *.pdf are already defined as binary files.
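
Suffix-based typing amounts to a pattern lookup that takes precedence over
the content guess. A hedged Python sketch follows; the table SUFFIX_TYPES
and the function filetype_for are hypothetical names, and the entries are
just the three suffixes mentioned above, not a dump of any real server's
table.

```python
from fnmatch import fnmatchcase

# Hypothetical suffix table in the spirit of the file type table.
SUFFIX_TYPES = [
    ("*.doc", "binary"),
    ("*.xls", "binary"),
    ("*.pdf", "binary"),
]

def filetype_for(path: str, sniffed: str = "text") -> str:
    """Prefer an explicit suffix rule; fall back to the content-based guess."""
    name = path.lower()  # match suffixes case-insensitively
    for pattern, filetype in SUFFIX_TYPES:
        if fnmatchcase(name, pattern):
            return filetype
    return sniffed

print(filetype_for("//depot/docs/Manual.PDF"))  # suffix rule applies
print(filetype_for("//depot/src/main.c"))       # falls back to the guess
```

With a rule like this in place, a PDF is stored as binary even when its
first block looks like plain text.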

Bottom line: you can store your Microsoft Word and Excel files in
Perforce without any worry. Just remember that each version of a binary
file takes up a lot of room, since each version must be stored in full on
the server. For text files, only the differences between versions are
stored. Changing a single character in a 1 megabyte Word
document means storing an extra megabyte on your server. Changing a
single character in a 1 megabyte text file means storing only around 100
extra bytes on your server.
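
To put rough numbers on that, here is a back-of-the-envelope Python sketch
using the figures above. The function names are mine, the 100-byte delta is
the estimate from this message, and compression of the binary revisions is
ignored to keep the arithmetic simple.

```python
MEGABYTE = 1_000_000  # "1 megabyte" taken as 10**6 bytes for this estimate

def full_copy_storage(file_size: int, revisions: int) -> int:
    """Every revision stored whole (compression ignored for simplicity)."""
    return file_size * revisions

def delta_storage(file_size: int, revisions: int, delta_size: int = 100) -> int:
    """First revision stored whole, each later one as a small delta."""
    return file_size + (revisions - 1) * delta_size

revisions = 50
print("binary:", full_copy_storage(MEGABYTE, revisions), "bytes")
print("text:  ", delta_storage(MEGABYTE, revisions), "bytes")
```

Under these assumptions, fifty one-character edits cost about 50 megabytes
for the Word document and barely more than 1 megabyte for the text file.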


-----Original Message-----
From: perforce-user-bounces at perforce.com
[mailto:perforce-user-bounces at perforce.com] On Behalf Of Mike
Sent: Saturday, June 10, 2006 7:48 PM
To: perforce-user at perforce.com
Subject: [p4] binary files?

Can Perforce deal properly with (not corrupt) binary files such as an MS
Word document that contains the interface specifications for a program, an
MS Excel spreadsheet that has the database size calculations, or an MS
PowerPoint file that has a presentation of the program for users?

I have used RCS and CVS a lot and know they don't deal nicely with
binary files.

Mike