[p4] Performing difference and merge operations on Unicode(UTF-16) files
Robert Cowham
robert at vaccaperna.co.uk
Fri Jun 23 04:55:03 PDT 2006
I would agree with Patrick that it is not handled very well.
It was done originally to support the Japanese market (hence the missing 5.1
release, which was just Unicode), which is why Shift-JIS etc. are supported.
Translation is a global setting on a per-connection basis, as shown by the
API function:
    class ClientApi : public StrDict {
        public:
            void SetTrans( int output, int content = -2,
                           int fnames = -2, int dialog = -2 );
This function is unfortunately not documented in the API docs!!
However, I use it in p4com to support Unicode filenames for things like Word
documents. The base files are stored as binaries, and I turn off translation
of contents, but it then handles the filenames happily. Note that in p4com I
use standard functions to convert between the Windows Unicode format (UTF-16
BSTRs) and UTF-8 for filenames.
    m_client.SetTrans(CharSetApi::UTF_8,    // output
                      CharSetApi::NOCONV,   // content: leave file contents untranslated
                      CharSetApi::UTF_8,    // fnames
                      CharSetApi::UTF_8);   // dialog

    LPCSTR p4ClientUser::TranslateFromBSTR(BSTR bs)
    {
        USES_CONVERSION;
        if (!TranslateCharset())
        {
            // No charset translation: convert via the ANSI code page
            m_buf.Set(W2CA(bs));
            return m_buf.Text();
        }
        else
        {
            // Find out how big a buffer we need (in bytes, no terminator)
            int buflen = WideCharToMultiByte(CP_UTF8, 0, bs,
                                             SysStringLen(bs),
                                             NULL, 0, NULL, NULL);
            // Allocated with new[], so release with delete[] (safe when NULL)
            delete[] m_tranbuf;
            m_tranbuf = new char[buflen + 1];
            int copied = WideCharToMultiByte(CP_UTF8, 0, bs,
                                             SysStringLen(bs),
                                             m_tranbuf, buflen + 1,
                                             NULL, NULL);
            // Note: an empty BSTR also yields 0 here and is treated as an error
            if (0 == copied) throw DISP_E_EXCEPTION;
            // An explicit input length means the output is not terminated,
            // so add the terminating NUL ourselves
            m_tranbuf[copied] = '\0';
            return m_tranbuf;
        }
    }
Behind the scenes Perforce has their own converters doing something similar.
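
For anyone not on Windows, the same UTF-16 to UTF-8 step can be sketched in
portable C++. This is only an illustration of the conversion, not Perforce's
converter and not code from p4com; the name Utf16ToUtf8 is made up for this
example, and the choice to throw on an unpaired surrogate is an assumption of
the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-16 code units to a UTF-8 byte string.
// Throws std::invalid_argument on an unpaired surrogate.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }

        if (cp < 0x80) {                       // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Returning a std::string avoids the manual buffer management and explicit
termination that the Windows version above has to do.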
Having worked with a few clients who are having problems, I would certainly
like to add my support to requests for improvement in this area.
Robert
> -----Original Message-----
> From: perforce-user-bounces at perforce.com
> [mailto:perforce-user-bounces at perforce.com] On Behalf Of
> Patrick Bennett
> Sent: 23 June 2006 06:49
> To: T L Holaday
> Cc: perforce-user at perforce.com
> Subject: Re: [p4] Performing difference and merge operations
> on Unicode(UTF-16) files
>
> Perforce's Unicode support is pretty much broken IMO. When
> you set P4CHARSET you're basically telling Perforce that's
> the format that ALL of your Unicode files are in.
> It will expect files you're checking in to be in that format
> [it doesn't check], it always converts to UTF-8 on the server
> [this isn't a bad thing], and will always convert from UTF-8
> on the server to whatever you specify in P4CHARSET.
> Realistically, the filetype should be encoding specific and
> assignable per file like any other filetype. The filetype
> shouldn't be something like 'unicode' (it could be optional,
> if Perforce paid attention to the BOMs [if present] at the
> beginning of the file to determine encoding type), but should
> probably be the specific encoding to use (just like
> p4charset). Whether that be utf8, utf16, shiftjis, etc.
> We have a variety of different unicode files in our system,
> many in UTF8 and a fair number in UTF16 (the application that
> has the UTF16 files always resaves them as UTF16 no matter
> what :<). With Perforce there's no easy way for a user to
> work with these different files. If everything isn't in your
> one p4charset format, you're basically out of luck.
>
> As for your corruption, my guess is the files may have been
> ruined before you switched your environment over. With the
> unicode option not set on the server, the unicode filetype
> just maps to 'text', so utf16
> files would most likely be corrupted. utf8 files would probably
> survive ok. Make sure the filetype is correctly set and
> submit a corrected version of the files (and that your
> charset mapping matches the file).
>
> For now, until Perforce fixes up their unicode support, we've
> resorted to storing all utf16 files as 'binary'. This of
> course has huge downsides (no Perforce controlled merges),
> but we don't have too much of a choice.
>
> When I spoke with someone in tech. support about it he seemed
> to indicate that if we want Unicode support to be fixed, we (the
> customers) will probably need to be a bit more vocal about
> it. I think (and it's kind of obvious IMO) Unicode isn't too
> much of a concern at Perforce at this point.
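
Patrick's suggestion above of using the BOM (when present) to pick the
encoding per file can be sketched roughly as follows. This is an illustration
only and not anything Perforce does; the names BomEncoding and DetectBom are
made up for this example. A file without a BOM simply reports None, so a real
tool would still need a fallback such as an explicit per-file setting.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
enum class BomEncoding { None, Utf8, Utf16LE, Utf16BE };

// Hypothetical helper: inspects the first bytes of a file's contents.
// Caveat: a UTF-32LE BOM (FF FE 00 00) also starts with FF FE and would
// be reported as Utf16LE by this simple check.
BomEncoding DetectBom(const std::string& bytes)
{
    auto b = [&bytes](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF)
        return BomEncoding::Utf8;
    if (bytes.size() >= 2 && b(0) == 0xFF && b(1) == 0xFE)
        return BomEncoding::Utf16LE;
    if (bytes.size() >= 2 && b(0) == 0xFE && b(1) == 0xFF)
        return BomEncoding::Utf16BE;
    return BomEncoding::None;
}
```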