[revml] Conversion of Perl Perforce repository to Subversion - Part 1
Barrie Slaymaker
barries@slaysys.com
Thu, 20 May 2004 22:31:50 -0400
On Wed, May 19, 2004 at 12:26:30PM -0400, John Peacock wrote:
> I have [stupidly] agreed to test the feasibility of converting the main
> Perl repository from Perforce to Subversion. Initially, this would be to
> provide a readonly public repository; eventually, it might lead to
> development being moved permanently from P4 to SVN. I have two questions,
> the first more of a possible design issue with VCP and the second more of a
> practical question based on my incomplete understanding of VCP, so I'll
> leave the second question for another message.
>
> I am using CLKao's svk to mirror the Perforce repository, which ultimately
> uses VCP to do the heavy lifting. I've attempted the conversion twice and
> both times, the server eventually swapped itself almost to death due to the
> huge RAM requirements (first 512MB then 2GB actual memory installed).
> Based on my readings of the LIMITATIONS in VCP::Dest::revml, the odds are
> good that the basic design is flawed for such a large conversion (64k
> revisions).
I hope that you're not trying to use the VCP::Dest::revml driver for
serious conversions. Even if it didn't hog a lot of disk space,
going to RevML and then away from RevML is going to be terribly slow.
The VCP::Dest::revml driver is definitely not meant to convert huge
repositories. It's a research and testing driver until someone comes up
with a good use case for RevML. We originally set out to develop RevML
with VCP's precursor being a desktop extractor/inserter to/from RevML,
but there seems to be no constituency for RevML the language, and
converting by extracting from the source to RevML and then from RevML
into the destination is going to be much less efficient than going
directly from one repository to another.
That being said, should a need for production support for RevML arise,
VCP's RevML drivers could be optimized to only cache a few files and
refresh the cache from the source repository, but only if the source
repository is also not RevML.
The RAM limitation should not apply to other drivers, though I can't
speak for the svn drivers. If you're seeing massive RAM use when using
VCP::Source::{p4,cvs,vss} and VCP::Dest::p4, then I need to get to the
bottom of it. But I don't think that's what you're doing.
If you want to send me a copy of the perl repository, I can work with it
here to narrow in on the problem; the core VCP filters and {p4,vss,cvs}
drivers need to be RAM friendly.
> I don't know where to start looking; I assume if I could find out what hash
> is being used to store the metadata, I could convert that to a tied hash
> and trade performance for being able to actually finish the conversion.
> I'm not even sure if this is a flaw in VCP::Dest::svk or if it is in one of
> the other modules that makes up VCP.
>
> Any hints and directions to start my hunt would be appreciated.
To isolate the RAM usage, you can try using the null: destination, first
with no filters and then (on later runs) with the filters VCP reports
using in its log file on the p4->svn conversion.
By far the most common data structure is the VCP::Rev object, so tracing
the lifecycle of VCP::Rev instances is likely to turn up some
information. In order to conserve memory, however, this is a packed
data structure in memory and a lot of the standard strings are stored in
tied hashes so that VCP::Rev instances can contain ints. Forcing a
core dump and looking at it with the strings command might be informative
(in case I forgot to tie a hash).
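For readers unfamiliar with the idea of keeping ints in each revision
record instead of repeated strings, here is a rough sketch. It is written
in Python rather than Perl, it is not VCP's code, and the names
(StringTable, Rev, the name/user fields) are hypothetical and chosen only
for illustration. It keeps the lookup tables in memory, whereas a tied
hash could push them out to disk.

```python
class StringTable:
    """Maps each distinct string to a small integer ID and back (hypothetical helper)."""

    def __init__(self):
        self._ids = {}      # string -> int
        self._strings = []  # int -> string

    def intern(self, s):
        """Return the ID for s, assigning a new one the first time s is seen."""
        idx = self._ids.get(s)
        if idx is None:
            idx = len(self._strings)
            self._ids[s] = idx
            self._strings.append(s)
        return idx

    def lookup(self, idx):
        """Return the string that was assigned the given ID."""
        return self._strings[idx]

    def __len__(self):
        return len(self._strings)


# Shared tables: one copy of each distinct string for the whole run.
NAMES = StringTable()
USERS = StringTable()


class Rev:
    """A revision record that stores only ints (illustrative, not VCP::Rev)."""

    __slots__ = ("name_id", "user_id", "rev_num")  # no per-instance dict

    def __init__(self, name, user, rev_num):
        self.name_id = NAMES.intern(name)
        self.user_id = USERS.intern(user)
        self.rev_num = rev_num

    @property
    def name(self):
        return NAMES.lookup(self.name_id)

    @property
    def user(self):
        return USERS.lookup(self.user_id)
```

With tens of thousands of revisions that share a few hundred file names
and user names, each record then costs a few ints rather than its own
copies of those strings. A string that shows up many times in a core dump
would suggest it is not going through such a table.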
- Barrie