[p4] Performing difference and merge operations on Unicode (UTF-16) files

Patrick Bennett patrick.bennett at inin.com
Thu Jun 22 22:48:42 PDT 2006


Perforce's Unicode support is pretty much broken IMO.  When you set 
P4CHARSET you're basically telling Perforce that's the format that ALL 
of your Unicode files are in.
It will expect files you're checking in to be in that format [it doesn't 
check], it always converts to UTF-8 on the server [this isn't a bad 
thing], and will always convert from UTF-8 on the server to whatever you 
specify in P4CHARSET.  Realistically, the encoding should be part of 
the filetype and assignable per file like any other filetype.  The 
filetype shouldn't be something generic like 'unicode' (that could stay 
as an option, if Perforce paid attention to the BOM [if present] at the 
beginning of the file to determine the encoding), but should probably 
be the specific encoding to use (just like P4CHARSET), whether that be 
utf8, utf16, shiftjis, etc.
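
To illustrate the BOM idea: a minimal Python sketch of how an encoding 
could be guessed from the first bytes of a file.  This is not something 
Perforce does; sniff_bom is a hypothetical helper, and the names it 
returns are Python codec names, not P4CHARSET values.

```python
import codecs
from typing import Optional

# Longest BOMs first: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so testing UTF-16 first would misreport UTF-32.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

A file with no BOM yields None, so a caller would still need a fallback 
(such as an explicit per-file encoding).  A UTF-16 LE file whose first 
character is NUL is indistinguishable from UTF-32 LE by BOM alone.
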
We have a variety of different Unicode files in our system, many in UTF8 
and a fair number in UTF16 (the application that has the UTF16 files 
always resaves them as UTF16 no matter what  :<).  With Perforce there's 
no easy way for a user to work with these different files.  If 
everything isn't in your one P4CHARSET encoding, you're basically out of luck.

As for your corruption, my guess is the files may have been ruined 
before you switched your environment over.  With the unicode option not 
set on the server, the unicode filetype just maps to 'text', so utf16 
files would most likely be corrupted.  utf8 files would probably 
survive ok.  Make sure the filetype is set correctly and that your 
P4CHARSET matches the files' actual encoding, then submit corrected 
versions of the files.
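
One way to sanity-check that a file matches the charset you have set, 
before submitting it, is to try decoding it.  A rough Python sketch 
follows; the P4CHARSET_TO_CODEC table and matches_charset are 
hypothetical names, and the pairing of P4CHARSET values with Python 
codecs is an assumption made for illustration.

```python
# Hypothetical mapping from a few P4CHARSET values to Python codec names.
# The pairing is an assumption for illustration, not taken from Perforce.
P4CHARSET_TO_CODEC = {
    "utf8": "utf-8",
    "utf16": "utf-16",
    "shiftjis": "shift_jis",
}


def matches_charset(data: bytes, p4charset: str) -> bool:
    """Return True if data decodes without error under the mapped codec."""
    codec = P4CHARSET_TO_CODEC[p4charset]
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True
```

A clean decode is only a necessary condition: some byte sequences 
decode without error under more than one encoding, so a True result 
does not prove the file really is in that charset.
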

For now, until Perforce fixes up their unicode support, we've resorted 
to storing all utf16 files as 'binary'.  This of course has huge 
downsides (no Perforce controlled merges), but we don't have too much of 
a choice.

When I spoke with someone in tech support about it, he seemed to 
indicate that if we want Unicode support to be fixed, we (the customers) 
will probably need to be a bit more vocal about it.  I think (and it's 
kind of obvious IMO) Unicode isn't too much of a concern at Perforce at 
this point.

T L Holaday wrote:
> Is anyone successfully  performing diff and merge operations on Unicode
> (UTF-16) files?
>
> My host and client are both Windows Server 2003 R2, I have switched the
> server with -ix, and my environment variables are P4CHARSET=utf16 and
> P4COMMANDCHARSET=winansi.
>
> I would like to be able to use the Unicode filetype, but when I do, the
> files get corrupted upon retrieval.  Any suggestions?  I've read the tech
> note, but there appears to be something missing.
>
>   


-- 

*Patrick Bennett* | Software Engineer
phone & fax +1.317.715.8302 | patrick.bennett at inin.com
 
*Interactive Intelligence Inc.*
Deliberately Innovative
www.inin.com <http://www.inin.com/>


