[p4] Performing difference and merge operations on Unicode (UTF-16) files
Patrick Bennett
patrick.bennett at inin.com
Thu Jun 22 22:48:42 PDT 2006
Perforce's Unicode support is pretty much broken IMO. When you set
P4CHARSET you're basically telling Perforce that's the format that ALL
of your Unicode files are in.
It will expect files you're checking in to be in that format [it doesn't
check], it always converts to UTF-8 on the server [this isn't a bad
thing], and will always convert from UTF-8 on the server to whatever you
specify in P4CHARSET. Realistically, the filetype should be encoding
specific and assignable per file like any other filetype. The filetype
shouldn't be something like 'unicode' (that could be optional, if Perforce
paid attention to the BOM [if present] at the beginning of the file
to determine the encoding), but should probably be the specific
encoding to use (just like P4CHARSET), whether that be utf8, utf16,
shiftjis, etc.
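
To illustrate the BOM idea, here is a minimal Python sketch of how a tool
could guess a file's encoding from its first bytes. This is not how
Perforce works; the helper name sniff_bom is hypothetical, and a file
without a BOM simply yields None.

```python
import codecs

# Order matters: the UTF-32 LE BOM starts with the same two bytes as the
# UTF-16 LE BOM, so the longer marks have to be checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

Only the first four bytes of a file are needed, so something like
sniff_bom(open(path, "rb").read(4)) is enough. A per-file filetype would
still be required for files saved without a BOM.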
We have a variety of different Unicode files in our system, many in UTF8
and a fair number in UTF16 (the application that has the UTF16 files
always resaves them as UTF16 no matter what :<). With Perforce there's
no easy way for a user to work with these different files. If
everything isn't in your one P4CHARSET format, you're basically out of luck.
As for your corruption, my guess is the files may have been ruined
before you switched your environment over. With the unicode option not
set on the server, the unicode filetype just maps to 'text', so utf16
files would most likely be corrupted. utf8 files would probably
survive ok. Make sure the filetype is set correctly and that your
charset mapping matches the file, then submit a corrected version of
the files.
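
As a rough illustration of why utf16 would break where utf8 survives,
here is a small Python sketch. It assumes the damage comes from byte-wise
line-ending translation (LF to CRLF) applied to a file treated as plain
single-byte text; that mechanism is my assumption, not something I've
confirmed about Perforce internals. The function names are hypothetical.

```python
def naive_lf_to_crlf(data: bytes) -> bytes:
    """Byte-wise LF -> CRLF, as a tool unaware of the encoding might do."""
    return data.replace(b"\n", b"\r\n")


def survives(text: str, encoding: str) -> bool:
    """True if the text is still readable after byte-wise translation."""
    mangled = naive_lf_to_crlf(text.encode(encoding))
    try:
        return mangled.decode(encoding) == text.replace("\n", "\r\n")
    except UnicodeDecodeError:
        # The inserted byte shifted the 2-byte code units out of alignment.
        return False


utf8_ok = survives("a\nb", "utf-8")
utf16_ok = survives("a\nb", "utf-16-le")
```

In UTF-8 a newline is the single byte 0x0A, so inserting 0x0D in front of
it still gives valid text. In UTF-16 the newline is two bytes, and a lone
inserted 0x0D knocks every following code unit out of alignment.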
For now, until Perforce fixes up their Unicode support, we've resorted
to storing all utf16 files as 'binary'. This of course has huge
downsides (no Perforce-controlled merges), but we don't have much of
a choice.
When I spoke with someone in tech support about it he seemed to
indicate that if we want Unicode support to be fixed, we (the customers)
will probably need to be a bit more vocal about it. I think (and it's
kind of obvious IMO) Unicode isn't too much of a concern at Perforce at
this point.
T L Holaday wrote:
> Is anyone successfully performing diff and merge operations on Unicode
> (UTF-16) files?
>
> My host and client are both Windows Server 2003 R2, I have switched the
> server with -ix, and my environment variables are P4CHARSET=utf16 and
> P4COMMANDCHARSET=winansi.
>
> I would like to be able to use the Unicode filetype, but when I do, the
> files get corrupted upon retrieval. Any suggestions? I've read the tech
> note, but there appears to be something missing.
>
>
--
*Patrick Bennett* | Software Engineer
phone & fax +1.317.715.8302 | patrick.bennett at inin.com
*Interactive Intelligence Inc.*
Deliberately Innovative
www.inin.com <http://www.inin.com/>