[p4] cvs2p4 version 3.0b1 released...

rmg at perfortify.com
Mon Jul 24 13:16:51 PDT 2006

This is to announce that a new release of cvs2p4, the CVS -> Perforce
repository conversion tool, is now available at the Perforce Public


I felt obliged to include the "b" (for "beta"), so that users will be
aware that this release introduces some major changes.



Here's the summary of what's new, in no particular order, from the
NEWS file:

=== Release 3.0b1, July 24, 2006

This release represents a major rework, driven by the needs of the
most ambitious conversion I have ever performed. The demands of this
conversion have required that the tools both perform better, and
support additional new features, especially with respect to
heuristically determining correct CVS release tags to Perforce

At least one external "alpha" test user has performed a conversion
using this code. I've decided to call this a "beta" release simply to
alert users that there may be some rough edges. However, given the
degree of testing it has undergone, I would encourage potential users
to use this release rather than the final one of the 2.n series.

Here's a laundry list of the changes since 2.5.5. I have decided to
leave the list in time order (versus by significance of the change)
for now, at least. But one change in particular bears special mention:

  ** This version now requires a specially patched version of the
  ** "rlog" command (from RCS) in order to work. The required patch is
  ** supplied in the release; you will need to apply the patch, and
  ** compile a new version of rlog. See the section
  ** "src/rcs-5.7/src/rlog.c.patch" in the README file for further
  ** information.

Changes since 2.5.5:

- Tweak the test/runtest script to sleep for 2 seconds after issuing
  the "p4 admin stop" command, to give the server time to shut down.
  The lack of a delay was causing problems on some systems.

- Yowsa! Eating my own dogfood at a real conversion just now, of a
  moderately large 5-year-old CVS repository, I feel your pain, oh ye
  who have suffered... shall we say: "label bloat"? Both the dblabels
  (journal-format) file AND the final resultant db.label file get very
  large very quickly. In part, this is because the "extra" label
  records that get written for branches consume space, I'm
  guessing. BUT the fact that we write a temp file (dblabels), and
  then replay it into the Perforce server, makes the conversion take
  up much more disk space.

  As a simple hack to provide immediate relief, I have rejiggered
  bin/dolabels such that the "dblabels" data stream is written directly
  into an instance of "p4 -jr -". This saves the time and disk space
  involved in writing the temp file, which is normally not reused
  after the conversion. Waste not, want not!
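
  The general idea of feeding records straight into a consumer's
  standard input, rather than through a temp file, can be sketched as
  follows. This is an illustration in Python, not the bin/dolabels code
  (which is Perl); the function name stream_to_command and the stand-in
  consumer command are hypothetical.

```python
import subprocess
import sys


def stream_to_command(records, command):
    """Feed each record to the child's stdin; return the child's exit status."""
    child = subprocess.Popen(command, stdin=subprocess.PIPE, text=True)
    try:
        for record in records:
            # Records go directly down the pipe; nothing is staged on disk.
            child.stdin.write(record + "\n")
    finally:
        # Closing stdin signals end-of-input to the child.
        child.stdin.close()
    return child.wait()


if __name__ == "__main__":
    # Stand-in consumer that just counts the lines it receives. A real
    # conversion would name its journal-replay command here instead.
    consumer = [sys.executable, "-c",
                "import sys; print(sum(1 for _ in sys.stdin), 'records')"]
    stream_to_command((f"record {i}" for i in range(3)), consumer)
```

  Because the records are produced by a generator and consumed as they
  are written, neither side needs to hold the whole data stream.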

- Achh! I have encountered a repository with "."s in the author:
  names. These are supposed to be illegal (as I read rcsfile(5)).
  But cvs2p4 now tolerates them.

- bin/dolabels has been tweaked so as to allow a user-supplied
  "labelmap" file to give hints as to which labels really correspond
  to which branches.

- Up till now, bin/genmetadata has used the "sort +1" form of
  specifying sort keys.  This seems to be obsolescent; the sort in
  Fedora Core 3 seems to still support it, but the one in FC5 no
  longer does. I have switched to the "-k 2" form for this, and it
  seems to work fine on FC3 and FC5. [Seems OK on FreeBSD 4, too].

- bin/genmetadata now leaves two files, tags.txt and brtags.txt, in
  the conversion directory. These are, simply, sorted lists of the
  "plain" tags and branch tags encountered during the scan of the CVS
  repository. "brtags.txt" contains one line per branch tag
  encountered in the conversion. "tags.txt" is one line per tag,
  being the tag name, whitespace, and then a '\001'-separated list
  of branches in which the revision is present.

  These can be useful when you are faced with the task of
  building a mapping function between CVS tags and CVS branches, for
  use with the branch_for_tag() function in bin/dolabels.
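
  For anyone post-processing tags.txt with their own scripts, here is a
  minimal Python sketch of reading the line format described above (tag
  name, whitespace, then a '\001'-separated branch list). It is based
  only on that description; the function name parse_tags_line and the
  sample tag and branch names are hypothetical.

```python
def parse_tags_line(line):
    """Split one tags.txt-style line into (tag, [branches])."""
    # The tag name ends at the first run of whitespace. '\x01' is not
    # whitespace, so the branch list survives as a single field.
    fields = line.rstrip("\n").split(None, 1)
    tag = fields[0]
    branches = fields[1].split("\x01") if len(fields) > 1 else []
    return tag, branches


if __name__ == "__main__":
    # Made-up sample line: one tag present in two branches.
    print(parse_tags_line("REL_1_0 main\x01rel1_maint\n"))
```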

- bin/genmetadata can now infer tag->branch mappings heuristically in
  many cases; the mappings thus determined are recorded in the
  <convdir>/tags.txt file, which bin/dolabels now knows how to load
  and use. The overall effect is that now, by default, Perforce labels
  will be created for those CVS release tags for which a branch
  mapping was discovered heuristically. (And, by default, ONLY these!)
  Please see the new "RCS/CVS Tag to Perforce Label Conversion"
  section in the README file for more details.

- $P4ROOT now defaults to being placed _inside_ the "conversion dir",
  making it easier to have multiple conversions around without

- I believe that cvs2p4 now runs in a fixed memory footprint; all
  large data sets are now kept in DBM databases. If you see evidence
  otherwise please let me know!

- In order to have a way of gauging bin/dolabels progress, it now
  prints out an input file line count for every 10,000 lines of the
  "labels" file as it is processed.

- In previous editions, with $COPYIMPORT=1, the copy of the CVS
  repository tree left under $P4ROOT had directory write permission
  turned off on directories in the copy. This prevented running the
  automated tests repeatedly, since the attempt to remove the existing
  $P4ROOT at the start of the second and subsequent runs would
  fail. Write permission is now left on in the newly copied

- Change from the not-so apropos Artistic license to the lithe MIT

- Improved handling of vendor branch files, and the intricacies of
  whether vendor-dropped files have been modified locally or not.
  (See the notes labelled "VENDOR-DROP BRANCHES" and in the README

- Added $Depotmap{} feature for mapping top-level subdirs of
  CVS_MODULE into different depots.

- OK, this is what I call NEWS!: Yesterday I noticed genmetadata - the
  one I sweated to make go fast on huge binaries with a few large
  revisions - going real SLOW on a file with thousands of small text
  revisions. I gave up, patched rlog to have the one missing shred of
  info I needed, and reimplemented the whole thing to just use rlog.
  Call this bullet bitten!

  I've verified that this version produces IDENTICAL results to
  the older Perl-based parsing... in about half the aggregate time.

- There is now a facility allowing you to map top-level directories in
  the CVS content being converted into multiple different Perforce
  depots.  Basically, just add a line in the following format to the
  config file for each such mapping you want to establish:

    $Depotmap{"<topdir>"} = "<depot>";

  All files with no such mapping in effect will be put into the
  default depot path defined by $DEPOT in the config file.
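
  The lookup rule just described (use the mapped depot if the file's
  top-level directory has a $Depotmap entry, otherwise fall back to
  $DEPOT) can be sketched in Python as below. This only illustrates the
  rule and is not the cvs2p4 code; the function name depot_for and the
  sample directory and depot names are hypothetical, and how the rest
  of the path is laid out inside the chosen depot is deliberately left
  out.

```python
def depot_for(cvs_path, depot_map, default_depot):
    """Pick the depot for a path given relative to the CVS module root."""
    # Only the first path component takes part in the lookup.
    topdir = cvs_path.split("/", 1)[0]
    return depot_map.get(topdir, default_depot)


if __name__ == "__main__":
    # Made-up mapping: the "docs" top-level directory goes to its own depot.
    depot_map = {"docs": "//docs"}
    for path in ("docs/guide.txt", "src/main.c"):
        print(path, "->", depot_for(path, depot_map, "//depot"))
```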

- No question, this gets to be 3.0! :-). It now does special spoofing
  treatment for cvs import-ed files, to match the cvs behavior when
  multiple vendor drops have been taken, with no local modifications
  yet applied. Label generation has been fixed for these so that
  labels applied to, e.g., revisions before any 1.2 is checked in
  will appear to be in *both* the "import" branch and the "main" branch,
  just the way they behave in CVS. This required very significant mucking
  in all stages of the pipeline! Much testing will ensue...

- bin/srcdiff copes with relative paths now, like a reasonable person
  might assume it would.

- The "vendor branch" names found in the RCS archives of files created
  by "cvs import" (i.e., the names of tags on "1.1.1") are used as
  the vendor branch name, rather than "import". (The old behavior, of
  flattening them out into a single "import" branch, may be added back
  as an option in the future if it's missed at all.)

- There is now a special list you can use to declare files which
  should explicitly be treated as binaries. Please see the
  "BIN_PATHNAME" section at the end of the config file.

- The test/config file has been split-cloned into


  The former has the required defaults for running the test,
  while the latter can provide a good place for users to start.

- Added the $USE_IMPORT_DEPOT config option; if enabled (by assigning
  a depot name to it), revisions from the 1.1.1 branch (aka the cvs
  import "vendor" branch) will be placed into the depot named by
  $USE_IMPORT_DEPOT. If used, the name must include the leading "//", e.g.:

    $USE_IMPORT_DEPOT = "//import";
