[revml] Adding Branching to vcp and RevML

Barrie Slaymaker barries@slaysys.com
Fri, 1 Feb 2002 12:01:43 -0500


Here's a first cut; comments please!

Thanks,

Barrie

-------------------------------------------------------------------------

vcp Branching Design
====================

Version:    0.1
Author:     Barrie Slaymaker <barries@slaysys.com>
Discuss at: revml@perforce.com

I'm trying to figure out the most appropriate way to imbue RevML and vcp
with branch and merge tracking and replicating abilities.  This
document is cvs and perforce specific at the moment, but I've tried to
represent concepts and not product features so hopefully it won't be too
far from here to a general solution.

Please think about all the SCMs you know about and see what we'd need to
alter to accommodate them, thanks.  VSS is especially crucial in this
regard as we have a project underway to support it.


Definitions
===========

Every product and engineer seems to use unique terminology for branching
and repositories, some of which conflict with terminology that vcp
already uses (source and destination, for instance).  Here is a list of
terms and what they mean in the context of vcp and this document that
may help head off some confusion:

    version - A specific version of a file.  This is easy to confuse
    with "revision", (defined just below); I do it all the time, though
    I've tried to keep this document consistent.

    delta - the changes applied to one version to produce another.

    revision - A description of the change from one version to another,
    usually including a delta, but sometimes (esp. with binary files)
    including the entire new version of a file.

    source - the repository being read from (possibly a RevML file).

    destination - the repository being written to (possibly a RevML
    file).

    transfer - the act of extracting revisions from the source and
    applying them to the destination.

    metadata - all data not in the actual files/revisions being moved.

    base version - the version before the current version.  In the
    context of a branch, this is the file on the "main" line that the
    branched revision was created from.

    target version - the version created by branching from a base
    version.


Goals
=====

1. Record the origin (location and version) for each file version
created by branching.  We'll call this a "branch record", akin to
Perforce's "integration record".

2. Branch records must be able to apply to groups of files or single
files, depending on the source repository's branching methodology.

3. Revisions affected by branching should not need to refer to the
branch record.

4. Branch records need to be able to capture generic and
product-specific metadata.  Generic metadata includes base and
target versions.  Product specific metadata includes Perforce
branch views.

5. Handle merges as best as possible.  Merges don't really affect cvs,
but when doing p4->p4, it would be nice to simulate merges; not sure
how to do this easily.  I'd like to do the equivalent of a "p4 resolve"
that would let me alter the file on disk instead of through the
interactive resolve, then have "p4 submit" get the file like the p4 edit
does.  Something like the "am" action in the interactive mode, for
example.  Is there a p4 incantation for that (I've never had to do it and
haven't been able to concoct such an incantation here)?  As a fall-back,
we could use P4MERGE and supply "m\n" to an interactive "p4 resolve"
session.

7. A branch map is a mapping of source repository branches to
destination repository branches.  A branch map will contain the branch
records.  Branch maps should be extractable, probably using a command
like:

    vcp cvs:/module branchmap:foo.bmap

The resulting files would be XML and perhaps also YAML
(http://yaml.org/).  The motivation for optionally supporting YAML is
that it is a less cluttered file format that administrators may find
easier to read and alter.  It may not be enough easier to warrant the
extra effort involved, however, especially in the first implementation.
The XML format is required because branch maps must be able to exist
"naturally" in RevML and so that additional tools (generic XML tools,
textual interfaces and GUIs) can be brought to bear on branch maps
without implementing a new format with an uncertain future (YAML).

8. Branch mapping files should be editable by hand and then (optionally)
usable when doing a transfer:

    vcp cvs:/module/... --branchmap=foo.bmap p4://depot

This will allow multiple transfers to take place with the same branch
map.

9. If an external branchmap is specified for a transfer, it should be an
error if a new branch has appeared that is not in the branchmap.  This
error should occur before any changes are made to the destination
repository and (possibly) a new branchmap file could be created
(foo.bmap.001 or such)
that contains the contents of foo.bmap with the missing branches added
in.  The error should be suppressible with an option.  This goal is
intended to prevent accidentally missing a branch or using the wrong
branch map, while making it easy (by copying a file) to add new branch
mappings when new branches occur.
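
The check described in goal 9 could be sketched roughly as below.  This
is only an illustration in Python, not vcp code: the function name, the
exception class and the allow_unmapped flag are all made up here, and
how the source branch list gets extracted is left out.

```python
class UnmappedBranchError(Exception):
    """Raised when the source has branches the branch map does not cover."""


def check_branch_map(source_branches, mapped_branches, allow_unmapped=False):
    """Return the sorted list of source branches missing from the map.

    Raises UnmappedBranchError unless allow_unmapped is true, so the
    caller can stop before touching the destination repository.
    (Hypothetical helper, names are assumptions.)
    """
    missing = sorted(set(source_branches) - set(mapped_branches))
    if missing and not allow_unmapped:
        raise UnmappedBranchError(
            "branches not found in branch map: " + ", ".join(missing)
        )
    return missing
```

Running this before the first write means a failure leaves the
destination untouched, and the returned list is what a regenerated
foo.bmap.001 would need to add.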

10. Branch maps should not be required when doing a transfer if it is
possible to make intelligent guesses about branch names.  An inability
to make intelligent guesses should cause an error message and exit
without altering the destination repository.

11. Both "external wrapper" and integrated text and GUI clients should
be supported by the branch map concepts and implementation.


Design
======

It is useful to describe the information that defines a branch using two
categories: the "branch" metadata, which is associated with the branch
itself and not with individual files affected by the branch, and
"per-file branch metadata", which is largely a mapping of what files
were branched from what versions (and perhaps by whom and when).

Examples of branch metadata are:
    - branch name/tag/label
    - location in repository
    - whether or not to transfer the branch
    - product-specific data
        - like Perforce's branch view.
        - cvs's branch (and "magic branch") number
    - vcp/RevML's assigned branch id.
    - Perhaps a branch comment

Examples of per-file branch metadata are:
    - base version (what Perforce calls "source" or "theirs")
        - location in repository
        - version id (<rev_id> in RevML terms)

This metadata is all distinct from the actual file metadata, which
for the first version in the branch would contain such data as
    - user performing the branch
    - when the branch was performed
    - the comment entered while branching.
    - the branch id (if necessary; with cvs the <rev_id> contains
    a branch number and with Perforce the file's location should be
    enough to identify the branch in most cases if not all; we need to
    identify any counter examples to this assumption).
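
To illustrate the cvs case: a branch revision id such as 1.23.2.1
carries its branch number (1.23.2) and the revision the branch sprouted
from (1.23).  A rough Python sketch, with a hypothetical function name;
it only handles plain revision ids, not the "magic branch" numbers
(like 1.23.0.2) that cvs stores for branch tags.

```python
def cvs_branch_of(rev_id):
    """Return (branch_number, branch_point) for a CVS revision id.

    Trunk revisions such as "1.23" have two components and yield
    (None, None).  Branch revisions such as "1.23.2.1" have an even
    number of components greater than two: dropping the last component
    gives the branch number, dropping the last two gives the revision
    the branch was created from.
    """
    parts = rev_id.split(".")
    if len(parts) < 2 or len(parts) % 2 != 0:
        raise ValueError("not a CVS revision id: %r" % rev_id)
    if len(parts) == 2:
        return None, None
    return ".".join(parts[:-1]), ".".join(parts[:-2])
```

Note that the branch point is only the base version for the first
revision on the branch; later revisions on the same branch are based on
their predecessor.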

This distinction between branch, per-file branch, and file metadata
is made for several reasons:

    1. file metadata is the minimal subset needed to move revisions; if
    the branches to operate on in the source and destination
    repositories are fully specified by the user, no branch or per-file
    branch metadata are needed.

    2. It is a goal to be able to store branch metadata and perhaps
    per-file branch metadata externally in branch mapping files to allow
    them to be altered to control the transfer process in a reusable
    manner.

    3. Branch metadata can be used to contain information that is
    common to the branched files, and is often the only thing a user may
    need to alter.  A side effect of this is that branch metadata must
    occur before the per-file branch metadata in the information stream
    (whether it be RevML or vcp's internal transfer process).

    4. It is far more likely that a user will want to review and alter
    the branch metadata than the per-file branch metadata to control
    whether and how a branch is transferred.

    5. branch metadata may exist before the branch is actually made (to
    wit, Perforce's branch views may exist before the branch is
    performed).

    6. In RevML and in the inner workings of vcp, branch metadata will
    need to come before revision records (<rev> elements) so that the
    receiving processor can store them in a lookup table to be consulted
    when the file revisions are processed.

    7. Some errors should be detected (nonexistent branches and branches
    that do exist but haven't been configured in the branch mapping,
    for instance) using branch maps before a transfer begins.
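
Reasons 3 and 6 amount to "branch records first, then revisions looked
up against them".  A minimal Python sketch of that ordering follows; the
record shapes (tuples of a kind and a dict) and the field names
"branch_id" and "dest_root" are assumptions made for the example, not
the actual RevML or vcp structures.

```python
def process_stream(records):
    """Consume (kind, data) records where kind is "branch" or "rev".

    Branch records fill a lookup table keyed by branch id.  Each rev
    record is resolved against that table, so a rev naming a branch
    that has not been seen yet is reported as an error.
    """
    branches = {}
    placed = []
    for kind, data in records:
        if kind == "branch":
            branches[data["id"]] = data
        elif kind == "rev":
            branch = branches.get(data["branch_id"])
            if branch is None:
                raise LookupError(
                    "rev %s names unknown branch %s"
                    % (data["name"], data["branch_id"])
                )
            # Destination path = branch's destination root + file name.
            placed.append(branch["dest_root"] + "/" + data["name"])
        else:
            raise ValueError("unknown record kind: %r" % (kind,))
    return placed
```

Because the table is complete before the first revision arrives, the
receiver never has to buffer revisions while waiting for branch data.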

It is likely that the per-file branch metadata will be packaged with the
metadata for the revision that creates a branched file, though only
when necessary.

The branch maps will be representable using a subset of RevML that can
occur within a "normal" RevML file (within the <revml> element) or as a
separate document.  Only one branchmap may occur in any file.

As with RevML, a branch map may contain elements describing the source
repository for auditing purposes (i.e. reading the file to see just what
it contains some months after you created it :), but this will not be
used in processing except possibly to give more informative errors:

    branch r1_0 in source repository not found in branch map foo.bmap.
    NOTE: foo.bmap does not appear to be for this repository
    repository details: 
        ...  <=== extracted from repository
    branch map details:
        ...  <==== extracted from branch map file

Here's an initial cut at a branch map describing a CVS repository in
XML.  Sorry for the wide-screen effect, I can reformat if it drives
people's email clients bonkers.  I'm picking on cvs here because I'm
mostly concerned with cvs->foo transfers, though the other way is
necessary for testing purposes.

  <branch_map>

    <!-- metadata that applies to the entire branch map
    -->
    <source_root>/foo/bar</source_root>     <!-- all source paths are relative to this -->
    <dest_root>//depot/foo/bar</dest_root>  <!-- user supplied, all destination paths are relative to this -->
    <time>2000-01-01 00:00:00Z</time>       <!-- when this map was created -->
    <rep_type>cvs</rep_type>
    <rep_desc>Concurrent Versions System (CVS) 1.10.7 (client/server)</rep_desc>

    <!-- branch metadata -->

    <branches>
      <branch id="mainline">
        <dest_root>//depot/main</dest_root> <!-- where to put mainline files -->
      </branch>
      <branch id="branch-1">
        <source_id>release_1</source_id>
        <dest_id>release_1</dest_id>        <!-- edit this to change in a transfer -->
        <dest_root>//depot/release_1</dest_root> <!-- only needed to override the <branch_map>-level <dest_root> -->
      </branch>
      ...more branches...
    </branches>
  </branch_map>
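
As a sanity check that a map shaped like this is easy to consume, here
is a rough Python sketch that reads it into a lookup table.  The
function name is hypothetical, and the sample omits the per-branch
<dest_root> on branch-1 so the fallback to the map-level <dest_root>
(which is how I read the "only needed to override" comment) shows up.

```python
import xml.etree.ElementTree as ET

BRANCH_MAP = """\
<branch_map>
  <source_root>/foo/bar</source_root>
  <dest_root>//depot/foo/bar</dest_root>
  <rep_type>cvs</rep_type>
  <branches>
    <branch id="mainline">
      <dest_root>//depot/main</dest_root>
    </branch>
    <branch id="branch-1">
      <source_id>release_1</source_id>
      <dest_id>release_1</dest_id>
    </branch>
  </branches>
</branch_map>
"""


def load_branch_map(xml_text):
    """Return {branch id: settings} with the map-level dest_root as default."""
    root = ET.fromstring(xml_text)
    default_dest_root = root.findtext("dest_root")
    table = {}
    for branch in root.findall("branches/branch"):
        table[branch.get("id")] = {
            "source_id": branch.findtext("source_id"),
            "dest_id": branch.findtext("dest_id"),
            # A per-branch <dest_root> overrides the map-level one.
            "dest_root": branch.findtext("dest_root", default_dest_root),
        }
    return table


table = load_branch_map(BRANCH_MAP)
```

Editing <dest_id> or <dest_root> in the file and reloading is then all
it takes to redirect a branch on the next transfer.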

It is likely that the best place to put per-file branch metadata is with
the existing per-revision information for the first revision of a file
in the branch, like the **'ed items here:

  <rev>
    <name>a/deeply/buried/file</name>
    <type>text</type>
    <cvs_info>Some info cvs might emit about this file</cvs_info>
    <rev_id>1.23.2.1</rev_id>
    <time>2000-01-01 12:00:08Z</time>
    <base_rev_id>1.23</base_rev_id>    <!-- ** -->
    <base_rev_name>foo</base_rev_name> <!-- ** -->
    <user_id>cvs_t_user</user_id>
    <label>achoo08</label>
    <label>blessyou08</label>
    <comment>comment 2
</comment>
    <base_rev_id>1.1</base_rev_id>
    <delta type="diff-u" encoding="none">@@ -1 +1 @@
-a/deeply/buried/file, revision 1, char 0x01="<char code="0x01" />"
+a/deeply/buried/file, revision 2, char 0x09="  "
</delta>
    <digest type="MD5" encoding="base64">Dint+VF10zKgeQcxVRuU9g</digest>
  </rev>

The <base_rev_id> tag already exists and is ideal for use in identifying
the version number of the base file.  The <base_rev_name> tag is new and
only necessary when the location or name of the file in the repository
has changed as part of the branch process.  It is relative to the
<dest_root>, like the existing <name> is ("<dest_root>" is spelled
"<rev_root>" in the current RevML version 0.28; that needs to change to
be clearer and to support branch maps more effectively).
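
Resolving the base version from a <rev> could then look something like
this Python sketch.  The function name is made up, and falling back to
<name> when <base_rev_name> is absent is my reading of "only necessary
when the location or name of the file has changed".

```python
import xml.etree.ElementTree as ET


def base_version(rev_xml):
    """Return (base file name, base rev id) for a <rev>, or None.

    Uses <base_rev_name> when present, otherwise the rev's own <name>.
    Returns None when the rev has no <base_rev_id> at all.
    """
    rev = ET.fromstring(rev_xml)
    base_id = rev.findtext("base_rev_id")
    if base_id is None:
        return None
    name = rev.findtext("base_rev_name") or rev.findtext("name")
    return name, base_id


moved = base_version(
    "<rev><name>a/deeply/buried/file</name><rev_id>1.23.2.1</rev_id>"
    "<base_rev_id>1.23</base_rev_id><base_rev_name>foo</base_rev_name></rev>"
)
in_place = base_version(
    "<rev><name>a/deeply/buried/file</name><rev_id>1.23.2.1</rev_id>"
    "<base_rev_id>1.23</base_rev_id></rev>"
)
```

Both names come back relative to the root, so the caller still has to
prepend the appropriate root before looking the base file up.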