We recently migrated from a self-managed Gitlab instance to gitlab.com. The system administrator of the self-managed instance said this would simply entail a git pull && git push and that the migration would be done quickly. Depending on your usage of Gitlab, this is either a naïve oversimplification or downright dangerous. The truth is more nuanced and entails quite a bit more work. Since the migration took us a couple of full working days and we wrote some reusable checklists and code in the process, we have jotted them down here. Maybe somebody else can make use of them, too.

When moving off a self-managed instance, there are a couple of high-level planning questions to consider:

  • [ ] Are there multiple entities using Gitlab?

    • In our case, three companies used the instance, each with multiple groups. Additionally, a couple of people had projects in their private accounts.
    • When moving, they might follow different strategies; at the very least, you'll need different API keys. In our case, the move even entailed moving to different Forges.
  • [ ] How to coordinate the switch between different teams and people?

    • Moving takes time and is not an atomic task. If you have multiple projects and teams, there's communication effort on top of planning and the actual work.
  • [ ] Inform your teams with enough time to plan ahead.

    • A week is not enough unless there's nothing else on the table for that week.
    • A team might have used a Gitlab feature that's not easy to migrate or you didn't think of migrating yourself.
    • In any case, inform them with a strategic plan, otherwise coordination can become more difficult.
    • Maybe you can use this blog post as a blueprint for your migration plans, at least.

After these, there are high-level technical tasks:

  • [ ] Choose a new Forge.

    • Moving off Gitlab to a different Forge will most likely lose a lot of metadata, make your automation with bots and Gitlab CI unusable, etc.
    • Beware, however, that even moving to another Gitlab instance like gitlab.com will lose metadata and entail manual work.
    • We chose to stick with Gitlab, because we have no reason to move off Gitlab. We're happy users and have been for years.
  • [ ] On gitlab.com, create new groups and user accounts.

    • Make sure these have the correct rights set up.
    • Maybe you can use this as a chance to do some cleanup.
  • [ ] Invite your teams to the new groups.

Now, to the actual migration tasks. Gitlab can export and import projects. This is possible in the web application, but depending on the number of projects, it will be quite tedious and error-prone. We first tried various Gitlab CLI projects, but that didn't pan out. Using the Gitlab API directly, on the other hand, is well documented and straightforward.

Now, on a high level, we'll do the following:

  1. Get the current project list together with id and name.
  2. Create a mapping between old and new project names.
  3. Trigger a job to export each project.
  4. Poll the self-managed Gitlab for the export until it can be downloaded.
  5. Import each project into gitlab.com.

Here are the details:

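  1. Get the project list:
curl "https://gitlab.200ok.ch/api/v4/projects?private_token=$GITLAB_TOKEN&per_page=100" \
  | jq '[.[] | { id: .id, from: .path_with_namespace, to: "" }]'

This will yield a structure like the following (per_page is capped at 100, so if you have more projects, fetch further pages by adding &page=2 and so on):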

[
  { "id": 1, "from": "200ok/project-name", "to": "" }
]
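
If the instance holds more than 100 projects, the single curl call above only returns the first page. The following is a minimal sketch of collecting all pages in Ruby. The function names (fetch_all_projects, mapping_template, gitlab_page_fetcher) are made up for this illustration; the host and the token are the ones from the curl call, and the sketch assumes that the API returns an empty array once you request a page past the last one.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Collect every page of a paginated listing. `fetch_page` is any callable
# that takes a page number and returns an array of project hashes; an
# empty page marks the end.
def fetch_all_projects(fetch_page)
  projects = []
  page = 1
  loop do
    batch = fetch_page.call(page)
    break if batch.empty?
    projects.concat(batch)
    page += 1
  end
  projects
end

# Turn API project hashes into the id/from/to template used for the mapping.
def mapping_template(projects)
  projects.map { |p| { id: p['id'], from: p['path_with_namespace'], to: '' } }
end

# Hypothetical fetcher that talks to the real API (not called on load).
def gitlab_page_fetcher(host, token)
  lambda do |page|
    uri = URI("https://#{host}/api/v4/projects?per_page=100&page=#{page}")
    request = Net::HTTP::Get.new(uri)
    request['PRIVATE-TOKEN'] = token
    response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
      http.request(request)
    end
    JSON.parse(response.body)
  end
end
```

To print the template, something like puts JSON.pretty_generate(mapping_template(fetch_all_projects(gitlab_page_fetcher('gitlab.200ok.ch', ENV['GITLAB_TOKEN'])))) should do. Passing the fetcher in as a callable keeps the paging logic testable without network access.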

For automating tasks 2-5, we have written this Ruby script:

require 'json'

projects = [
  # This project will keep its namespace and project name when
  # imported.
  { "id": 1, "from": '200ok/project-name', "to": '' },
  # This project will only be downloaded for archiving, but not
  # imported to gitlab.com
  { "id": 2, "from": '200ok/project-name', "to": nil },
  # This project will be imported to a different namespace and project
  # name.
  { "id": 3, "from": '200ok/project-name', "to": "ns2/project-name-2" },
]

# prepare projects
projects.each_with_index do |project, index|
  to = project[:to]
  to = project[:from] if to and to.empty?
  project[:to] = to
  projects[index] = project
end

BASE_CMD = 'curl -s --header "PRIVATE-TOKEN: %s" '
EXPORT_CMD   = BASE_CMD + '--request POST "https://gitlab.200ok.ch/api/v4/projects/%s/export"'
STATUS_CMD   = BASE_CMD + '"https://gitlab.200ok.ch/api/v4/projects/%s/export"'
DOWNLOAD_CMD = BASE_CMD + ' --remote-header-name --remote-name "https://gitlab.200ok.ch/api/v4/projects/%s/export/download"'
IMPORT_CMD   = BASE_CMD + '--request POST --form "namespace=%s" --form "path=%s" --form "file=@%s" "https://gitlab.com/api/v4/projects/import"'

# Symbol keys must not contain hyphens, hence the underscores.
tokens = {
  old_gitlab_admin_token: 'token1',
  new_gitlab_user_1: 'token2',
  new_gitlab_user_2: 'token3'
}

# schedule exports from gitlab.200ok.ch
projects.each do |project|
  puts "Requesting export for #{project[:from]}..."
  cmd = EXPORT_CMD % [tokens[:old_gitlab_admin_token], project[:id]]
  system(cmd)
  sleep 0.25
end

# loop to find finished exports and import to gitlab.com
remaining = projects
# 0 is truthy in Ruby, so test for emptiness instead of the count
until remaining.empty?
  projects.each_with_index do |project, index|
    file = project[:from].tr('/', '_') + '.tar.gz'
    # a project counts as done once its archive exists locally
    projects[index][:done] = done = File.exist?(file)
    next if done
    print "Checking #{project[:from]}..."
    cmd = STATUS_CMD % [tokens[:old_gitlab_admin_token], project[:id]]
    result = JSON.parse(%x[#{cmd}])
    puts status = result['export_status']
    if status == 'finished'
      puts "Downloading #{project[:from]}..."
      cmd = DOWNLOAD_CMD % [tokens[:old_gitlab_admin_token], project[:id]]
      system(cmd)
      system("mv *_export.tar.gz #{file}")
      # projects with a nil target are only archived, not imported
      if to = project[:to]
        # pick the token of the user who may write to the target namespace
        token = to.start_with?('username1') ? tokens[:new_gitlab_user_2] : tokens[:new_gitlab_user_1]
        puts "Uploading #{to}..."
        ns, path = to.split('/')
        cmd = IMPORT_CMD % [token, ns, path, file]
        system(cmd)
      end
    end
    # do not inundate gitlab
    # 5 request per minute per user
    sleep 15
  end
  remaining = projects.select { |project| !project[:done] }
  puts "Remaining #{remaining.count}/#{projects.count}"
end
puts "All done."
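
One caveat: the script splits the target on '/', which assumes a single-level namespace such as ns2/project-name-2. If you import into subgroups, the target has more than two parts. Here is a small sketch of a helper that handles both cases. The name split_target is made up for this example, and it assumes that the import API accepts a full namespace path such as group/subgroup in its namespace parameter.

```ruby
# Split a target such as "group/subgroup/project" into the namespace
# ("group/subgroup") and the project path ("project").
def split_target(to)
  namespace, _slash, path = to.rpartition('/')
  if namespace.empty? || path.empty?
    raise ArgumentError, "expected namespace/project, got #{to.inspect}"
  end
  [namespace, path]
end

# Derive the local archive name the same way the migration script does.
def archive_name(from)
  from.tr('/', '_') + '.tar.gz'
end
```

In the script, ns, path = split_target(to) would then replace ns, path = to.split('/').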

Now your projects are on gitlab.com - you're done, right? Not quite:

  • All merge requests, comments, assignments, etc. will belong to the user who owns the API key used for the import. Gitlab has no way of knowing which users to map these things to.

    • [ ] Manually go over all projects and re-assign the relevant items.
  • [ ] Webhooks are not included in the export/import process. If you're using Webhooks, you will have to reconfigure those for each project. You can probably use this API for it, but we did it by hand, because we took the chance to rewire some notifications.

  • [ ] CI/CD Variables are not included in the export/import process. If you're using CI/CD Variables, you will have to reconfigure those for each project. We did this by hand, but we wrote a script to make it more visible where CI/CD Variables are used:

require 'json'

projects = [
  { "id": 1, "from": '200ok/project-name', "to": '' }
]

projects.each do |project|
  url = "https://gitlab.200ok.ch/api/v4/projects/#{project[:id]}/variables"
  variables = `curl --silent --header "Private-Token: #{ENV['GITLAB_API_TOKEN']}" "#{url}"`
  variables = JSON.parse(variables)
  unless variables.empty?
    # one Org mode TODO heading per project that has CI/CD variables
    puts "* TODO #{project[:from]}"
    puts "#+begin_src json"
    puts JSON.pretty_generate(variables)
    # puts `echo '#{variables.to_json}' | jq '.'`
    puts "#+end_src json"
    puts
  end
end

This will yield an Org mode document like:

* TODO 200ok/200ok.ch
#+begin_src json
[
  {
    "variable_type": "env_var",
    "key": "FTP_HOST",
    "value": "your-ftp-host",
    "protected": false,
    "masked": false,
    "environment_scope": "*"
  }
]
#+end_src json

Then, make the adjustments in the relevant Gitlab projects.
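
We made these adjustments by hand, but they could probably be scripted as well. The following is only a sketch, not something we ran during the migration: it builds a POST request against the project variables API on gitlab.com from one entry of the JSON above. The function names build_variable_request and send_request are made up for this example, and the target project path is an assumption taken from the mapping.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a request that creates one CI/CD variable on the
# target project. `variable` is one entry of the exported JSON list.
def build_variable_request(project_path, variable, token, host: 'gitlab.com')
  # the API accepts the URL-encoded project path in place of the numeric id
  project = URI.encode_www_form_component(project_path)
  uri = URI("https://#{host}/api/v4/projects/#{project}/variables")
  request = Net::HTTP::Post.new(uri)
  request['PRIVATE-TOKEN'] = token
  request.set_form_data(
    'key' => variable['key'],
    'value' => variable['value'],
    'variable_type' => variable['variable_type'],
    'protected' => variable['protected'].to_s,
    'masked' => variable['masked'].to_s,
    'environment_scope' => variable['environment_scope']
  )
  request
end

# Send a prepared request over HTTPS (not called on load).
def send_request(request)
  uri = request.uri
  Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
end
```

Separating building from sending makes it easy to print and review the requests before anything is written to the new instance.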

  • [ ] Migrate Gitlab container registry.

    • If you're using the Gitlab container registry, maybe even for CI runners, you need to migrate those images and all projects using them.
  • [ ] If you've used bot users that act on commit/push, you'll need to migrate those, along with their Docker images and config.
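
For the container registry item above, moving an image boils down to pulling it from the old registry, re-tagging it and pushing it to the new one. Here is a minimal sketch that only generates the docker commands so they can be reviewed first. The function name registry_copy_commands and the old registry host registry.200ok.ch are assumptions for this example; registry.gitlab.com is the registry host of gitlab.com.

```ruby
# Return the docker commands that copy one image tag from the old
# registry to the new one. Nothing is executed here.
def registry_copy_commands(old_registry, new_registry, from, to, tag)
  source = "#{old_registry}/#{from}:#{tag}"
  target = "#{new_registry}/#{to}:#{tag}"
  [
    "docker pull #{source}",
    "docker tag #{source} #{target}",
    "docker push #{target}"
  ]
end

# Example with the hypothetical old registry host.
puts registry_copy_commands(
  'registry.200ok.ch', 'registry.gitlab.com',
  '200ok/project-name', 'ns2/project-name-2', 'latest'
)
```

Remember that references to the old image names in .gitlab-ci.yml files and Dockerfiles need to be updated, too.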

Now, you're done with the migration of projects from your self-managed Gitlab instance to gitlab.com. However, the work is not done yet. You'll need to:

  • [ ] Inform your team members to now update their git repository remotes and discontinue their use of the prior Forge.
    • If they use tooling on top of git (like Magit Forge), they will have to update their remotes there, too.

Since it's time-critical and error-prone to make these adjustments by hand on many projects in a diverse team, we've written a script to automate that process as well:

STDOUT.sync = true

require 'colorize'
require 'yaml'
require 'securerandom'

system 'stty cbreak'

base = ARGV.first

custom_file = File.expand_path('.fix_git_config.yml', ENV['HOME'])
custom = File.exist?(custom_file) ? YAML.load(File.read(custom_file)) : []
mapping = YAML.load(DATA.read).concat(custom)

repos = Dir.glob('**/.git/config', base: base)
repos.each_with_index do |config, index|
  config = File.expand_path(config, base)
  repo = config.sub('/.git/config', '')
  puts ('-' * 60).colorize(:yellow)
  puts "Repo #{index+1}/#{repos.count}: #{repo}".colorize(:yellow)
  ini = File.read(config)
  replacements = {}
  ini_viz = mapping.reduce(ini) do |r, h|
    uuid = SecureRandom.uuid
    replacements[uuid] = h.keys.first.colorize(:red)
    r.gsub(h.keys.first, uuid + h.values.first.colorize(:green))
  end
  ini_viz = replacements.reduce(ini_viz) { |r, kv| r.gsub(*kv) }
  if ini != ini_viz
    puts
    puts ini_viz
    puts
    puts 'Type y to apply, n to skip, anything else to abort.'.colorize(:yellow)
    q = $stdin.sysread 1
    if q == 'y'
      ini = mapping.reduce(ini) { |r, h| r.gsub(h.keys.first, h.values.first) }
      File.open(config, 'w') { |f| f.write(ini) }
      puts
      puts "Updated #{config}"
    elsif q == 'n'
      puts
      puts 'Skipped.'
    else
      puts
      puts 'Abort.'
      system 'stty cooked'
      exit 0
    end
  else
    puts 'No changes required.'
  end
end

system 'stty cooked'

__END__
- gitlab.200ok.ch: gitlab.com
- 200ok/old-project-name: ns2/project-name-2

At the end of the script, there's a mapping between old namespaced project names and new ones. So, if you did this kind of cleanup with the first script, you can do it here, too.
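
To see what the mapping does to a single remote, here is the replacement step of the script in isolation. The function name apply_mapping and the sample .git/config contents are made up for this illustration. The mapping has the same shape that YAML.load produces from the __END__ section: a list of single-pair hashes, applied in order. A custom ~/.fix_git_config.yml needs to follow the same format.

```ruby
# Apply every old => new pair, in order, to the contents of a .git/config.
def apply_mapping(ini, mapping)
  mapping.reduce(ini) do |text, pair|
    text.gsub(pair.keys.first, pair.values.first)
  end
end

mapping = [
  { 'gitlab.200ok.ch' => 'gitlab.com' },
  { '200ok/old-project-name' => 'ns2/project-name-2' }
]

# Hypothetical config of a repository cloned from the old instance.
ini = "[remote \"origin\"]\n\turl = git@gitlab.200ok.ch:200ok/old-project-name.git\n"

puts apply_mapping(ini, mapping)
```

The host is replaced first and the project path second, so the remote ends up pointing at git@gitlab.com:ns2/project-name-2.git.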

The usage of this script is: ./fix_git_origin.rb path. Running it looks like this:

[asciicast: terminal recording of the script running]

Now, you're all set!

It took us close to four working days to migrate 75 projects across different teams. With this writeup, we hope you will get it done faster!


If you liked this post and want to say 'thanks', please head over to our free/libre and open source software page - and if you like one of them, give it a star on Github or Gitlab.