
Rebase v Merge in Git

February 3rd, 2009

If you haven’t worked with a version control tool that allows for easy branching, you’re probably wondering what the difference is between rebase and merge and why you’d choose one over the other.

From the standpoint of the end result, a merge and a rebase in Git appear to do the same thing:

A + B + A’ (merge) = A + A’ + B (rebase)

Wouldn’t it be simpler to just choose one operation and stick with it?

The answer of course is no.  Otherwise you wouldn’t have the option.  (If you feel completely contrary, best of luck.  And choose merge.)

The difference really only becomes apparent once your development effort becomes hierarchical, either in terms of application lifecycle (such as dev, test and release versions) or in terms of team structure.  You’re now dealing with multiple branches, each with non-trivial changes that can and will occur independently.

Rebases are how changes should pass from the top of the hierarchy downwards, and merges are how they flow back upwards.

Let’s take a look in more detail at what is actually taking place in each operation for a parent branch A and a child branch B:

Merge

A + B + A’ + dA’B

where A’ is the set of changes being merged in and dA’B is the resolution of merge conflicts between A’ and the set of commits B on the current branch.

The changes introduced by A’ and dA’B are grouped together into one merge commit A” = A’ + dA’B that is added on top of the existing set of commits (A + B) and becomes the head of the current branch:

A + B + A”
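To see this shape in a real repository, here is a minimal sketch that builds a throwaway repo and merges the parent branch into the child.  The branch names A and B follow the notation above; the file names, the commit messages and the small `commit` helper are made up for illustration, and the changes are chosen so that no conflicts arise (so dA’B is empty here).

```shell
#!/bin/sh
set -e

# Throwaway repository; every name below is illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/A        # call the initial branch A
git config user.name  "Example"
git config user.email "example@example.com"

commit() {   # hypothetical helper: commit <file> <message>
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit base.txt "A"                       # shared history A
git checkout -q -b B
commit feature.txt "B"                    # commits B on the child branch
git checkout -q A
commit update.txt "A'"                    # new parent changes A'

git checkout -q B
old_b=$(git rev-parse B)                  # tip of B before the merge
git merge -q --no-edit A                  # adds one merge commit on top of A + B

echo "first parent:  $(git rev-parse 'HEAD^1') (old tip of B: $old_b)"
echo "second parent: $(git rev-parse 'HEAD^2') (tip of A)"
git log --graph --oneline
```

The new head of B is a single commit with two parents: the previous tip of B and the tip of A.  Any conflict resolution would have been recorded in that one commit.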

Rebase

A + A’ + B + dA’B

The rebase resets the starting point of the branch and reapplies the set of commits B.  The conflict resolutions dA’B are folded into these commits, so they become grouped together as B’ = B + dA’B:

A + A’ + B’

If you stare at it, you’ll realize that the rebase guarantees that the changes being brought in from the other branch come in exactly as-is: A + A’.  Any conflicts are resolved and contained within the associated commits B’ on the current branch.
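The same kind of throwaway repository can be used to check this.  The sketch below is self-contained; the file names, commit messages and the `commit` helper are assumptions for illustration, and the changes do not conflict, so B’ differs from B only in its starting point.

```shell
#!/bin/sh
set -e

# Throwaway repository; every name below is illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/A        # call the initial branch A
git config user.name  "Example"
git config user.email "example@example.com"

commit() {   # hypothetical helper: commit <file> <message>
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit base.txt "A"                       # shared history A
git checkout -q -b B
commit feature.txt "B"                    # commits B on the child branch
git checkout -q A
commit update.txt "A'"                    # new parent changes A'

git checkout -q B
old_b=$(git rev-parse B)                  # B before the rebase
git rebase -q A                           # replay B on top of A + A'
new_b=$(git rev-parse B)                  # B' after the rebase

echo "B  was: $old_b"
echo "B' is:  $new_b"
git log --oneline                         # B' sits directly on top of A' and A
```

After the rebase, branch B starts from the current tip of A, so A + A’ is present unchanged, and the replayed commit has a new id because it is now B’ rather than B.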

Using merge to pull in changes from the higher-level branch mixes those changes with the resolved merge conflicts.  This means that the current branch won’t necessarily have the same state as the one it was based on (from the hierarchical structure).  And since the resolved conflicts are grouped all together in one merge commit, you’ve also made it harder to cleanly cherry-pick individual changes.

Having each development lifecycle track and each local developer branch start from a known, consistent state is critical to reducing and resolving code issues.  Where a change occurs (or suddenly goes missing) and who is responsible become easier to determine.  Using rebase to pull in changes gives you that.

When you’re submitting changes back up the chain, you only want to add your changes on top of the existing commits of the higher-level branch.  Merge is clearly the operation for that.

Even if you’re a single developer with only a few branches, it’s worth it to get in the habit of using rebase and merge properly.  The basic work pattern will look like:

  1. Create new branch B from existing branch A
  2. Add/commit changes on branch B
  3. Rebase branch B onto the updated branch A
  4. Merge branch B into branch A
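
As a rough sketch, the four steps above might look like this on the command line.  It runs in a throwaway repository; the file names, commit messages and the `commit` helper are hypothetical, and the extra commit on A stands in for whatever updates arrive while you work on B.

```shell
#!/bin/sh
set -e

# Throwaway repository; every name below is illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/A        # call the initial branch A
git config user.name  "Example"
git config user.email "example@example.com"

commit() {   # hypothetical helper: commit <file> <message>
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit base.txt "A"                       # existing branch A

git checkout -q -b B                      # 1. create new branch B from A
commit feature.txt "B"                    # 2. add/commit changes on B

git checkout -q A
commit update.txt "A'"                    #    meanwhile, A receives updates

git checkout -q B
git rebase -q A                           # 3. rebase B onto the updated A

git checkout -q A
git merge -q B                            # 4. merge B into A

git log --oneline
```

Because B was rebased first, it already contains everything on A, so the final merge in this sketch is a plain fast-forward and the two branches end up at the same commit.  If you would rather record an explicit merge commit, `git merge --no-ff B` does that.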