Sitecore content migration

As the Sitecore platform keeps evolving, solutions built on it will sooner or later need an upgrade, especially if companies want to use Sitecore as more than just a content management platform.

Unfortunately, most Sitecore implementations require a bigger upgrade path than just moving to a new Sitecore version. There can be many reasons for that:

  • Heavily customized code that makes the solution impossible to upgrade
  • A wrong configuration strategy (changing the original Sitecore config files instead of using patch files)
  • A functionality implementation that prevents the use of personalization capabilities
  • Simply poor system design
  • Outdated technologies used in the solution, such as ASP.NET Web Forms (which Microsoft no longer actively develops and which is not part of modern .NET)

The reasons above lead to an obvious conclusion: it’s not only a Sitecore version upgrade that is required, but changing, or even re-implementing, the initial solution.

Sitecore content migration – The challenge

Recently we re-implemented one of our clients’ Sitecore solutions. That meant starting over: building the new solution on the latest Sitecore version and implementing the same project, using the old codebase as inspiration only.

The new solution follows current best practices: it is Helix compliant, up to date with industry standards, and supports future Sitecore upgrades. However, a new problem appeared…
On one side we had the old implementation with about 6000 pages (an old codebase and content structure, mixed data and presentation layers, a customized rendering engine, etc.). On the other side we had a brand new Helix-compliant solution with about 100 individual components available but 0 pages.

The new solution was implemented and ready to go live, but what about the existing content?
Obviously, manually re-creating all 6000 pages was not an option, as it would take a lifetime. We had to find a way to migrate all the content while re-creating it in a whole new way: new content structure, new component templates, new renderings, etc.

In addition, the old solution had a lot of outdated content, including pages and media items. In total, the Sitecore master database was 7 GB. It would be very beneficial for our client to use this opportunity to get rid of unused content and, most importantly, of old media items that were no longer used and only cluttered the database.

Requirements

This turned out not to be an easy task. To do it right, we structured the requirements as follows:

  • URL paths of migrated pages must be kept as is, so that all links in search engines remain valid
  • Where the URL of a page does change, a redirect from the old path to the new one must be set up
  • Pages normally refer to other pages and media items. Those links must be kept in the new solution as well
  • Old pages may have many old and outdated versions, so only the publishable versions are migrated
  • The content must be extracted from various places in the old solution and placed into new components, datasources and pages in the new one
  • Some pages should not be migrated to the new solution because they are outdated. The client has to identify all of these pages
  • Pages that should not be migrated may still refer to media items that are no longer needed, so those media items should not be migrated either

The approach

In general, the process of going live with the new solution included several stages:

  • We keep the old solution live without changes
  • We develop the new implementation
  • We set up the new production environment, which will replace the old one when it’s ready
  • We implement the migration tool, which automates extracting the existing content and creating it in the new solution
  • We set up a short ‘go-live’ interval in which we stop content editing on the old environment, run the content migration and switch the public DNS to point to the new server

The process is more or less straightforward except for the main part: how do we properly implement the content migration, taking all the requirements above into consideration?

Attaching old master database to the new solution

Thankfully, the basic Sitecore API is, in principle, solidly backwards compatible. With a little configuration it’s possible to attach the master database of an old Sitecore version to the new solution and still read items and field values.
The old solution was still live in production at this point, and editors were making content changes every day.
We couldn’t afford to pause the old solution, as stopping live content editing would impact the business side of the project. So we decided to attach the live master database of the old solution to the new one.

Cleaning up pages that were no longer needed

Only the client had true knowledge of the content, so they were responsible for marking which pages should be migrated and which should not.

We therefore implemented a simple administration page where the client could make that decision. For this we added two checkbox fields to the page templates in the old solution, recording whether a page should or should not be migrated, and read those fields during the migration process.
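The decision encoded by those two checkboxes can be sketched roughly as below. This is a language-neutral illustration in Python, not the actual tool: the field names `migrate` and `do_not_migrate`, and the rule that a page with both or neither box ticked is treated as undecided, are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    MIGRATE = "migrate"
    SKIP = "skip"
    UNDECIDED = "undecided"


@dataclass(frozen=True)
class PageFlags:
    path: str
    migrate: bool         # hypothetical checkbox: "should be migrated"
    do_not_migrate: bool  # hypothetical checkbox: "should not be migrated"


def decide(flags: PageFlags) -> Decision:
    """Turn the two checkbox values into a single migration decision."""
    if flags.migrate and not flags.do_not_migrate:
        return Decision.MIGRATE
    if flags.do_not_migrate and not flags.migrate:
        return Decision.SKIP
    # Both or neither ticked (assumed rule): the client still has to decide.
    return Decision.UNDECIDED


pages = [
    PageFlags("/products", migrate=True, do_not_migrate=False),
    PageFlags("/campaign-2012", migrate=False, do_not_migrate=True),
    PageFlags("/about", migrate=False, do_not_migrate=False),
]

to_migrate = [p.path for p in pages if decide(p) is Decision.MIGRATE]
needs_review = [p.path for p in pages if decide(p) is Decision.UNDECIDED]
```

Keeping an explicit "undecided" state makes it possible to list pages the client has not reviewed yet instead of silently migrating or dropping them.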

Keeping item IDs

Using the Sitecore API, which has basically not changed in the last 10 or even 15 years, we could read the data from the attached database and, from that, create new pages in the new solution.

However, the first challenge we faced was that pages and their components contain Rich Text fields that refer to other pages in the solution.

A link to a page includes the ID of the page item in Sitecore. If we simply created a new page, its ID would be lost, and links to that page would break in the new solution. So the migration tool used the API in such a way that new pages got the same item IDs as the old pages.
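Why reusing the IDs matters can be shown with a small in-memory sketch. It is written in Python purely as an illustration and does not use the Sitecore API. The `~/link.aspx?_id=…` pattern is assumed here as the way Rich Text fields store internal links, and `create_page` and `broken_links` are hypothetical helpers.

```python
import re
import uuid

# Assumed storage format of an internal link inside a Rich Text field.
LINK_ID = re.compile(r"~/link\.aspx\?_id=([0-9A-Fa-f]{32})")


def create_page(store, path, body, item_id=None):
    """Create a page; reuse item_id when given, otherwise generate a new one."""
    item_id = item_id or uuid.uuid4().hex.upper()
    store[item_id] = {"path": path, "body": body}
    return item_id


def broken_links(store):
    """Return (page path, target ID) for every link whose target is missing."""
    return [
        (page["path"], target.upper())
        for page in store.values()
        for target in LINK_ID.findall(page["body"])
        if target.upper() not in store
    ]


HOME_ID, ABOUT_ID = "A" * 32, "B" * 32
old = {
    HOME_ID: {
        "path": "/home",
        "body": '<a href="~/link.aspx?_id=' + ABOUT_ID + '&amp;_z=z">About</a>',
    },
    ABOUT_ID: {"path": "/about", "body": "<p>About us</p>"},
}

# Variant 1: new pages get fresh IDs, so the link on /home points nowhere.
fresh_ids = {}
for page in old.values():
    create_page(fresh_ids, page["path"], page["body"])

# Variant 2: new pages reuse the old IDs, so the link still resolves.
kept_ids = {}
for old_id, page in old.items():
    create_page(kept_ids, page["path"], page["body"], item_id=old_id)

print(broken_links(fresh_ids))
print(broken_links(kept_ids))
```

With fresh IDs the link on `/home` is reported as broken; with the old IDs kept, no link needs to be rewritten at all.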

Migrate only used media items

As discussed before, unused media items should not be migrated, so the migration would double as a cleanup of the old 7 GB database.

We managed to implement a tool that migrates only those media items that are referred to by existing pages.

Unfortunately, we did not find a way to keep media item IDs during the migration process, as the Sitecore API does not provide that functionality.

Also, one media item can be referred to by multiple pages, and it was crucial not to migrate a media item more than once. We decided to keep the old-to-new media item ID map in a simple SQLite database, which ensured that each media item was migrated only once.
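A minimal sketch of such an old-to-new ID map, using Python and its built-in `sqlite3` module for illustration. The table name `media_map`, the function names and the `copy_media` callback (standing in for whatever actually creates the media item in the new solution) are assumptions, not the code of the real tool.

```python
import sqlite3


def open_map(path=":memory:"):
    """Open (or create) the old-to-new media ID map."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS media_map ("
        "old_id TEXT PRIMARY KEY, new_id TEXT NOT NULL)"
    )
    return conn


def migrate_media_once(conn, old_id, copy_media):
    """Return the new ID for old_id, copying the media item only the first time."""
    row = conn.execute(
        "SELECT new_id FROM media_map WHERE old_id = ?", (old_id,)
    ).fetchone()
    if row:
        return row[0]  # already migrated: reuse the stored new ID
    new_id = copy_media(old_id)  # hypothetical callback doing the real work
    with conn:  # commits the insert as one transaction
        conn.execute(
            "INSERT INTO media_map (old_id, new_id) VALUES (?, ?)",
            (old_id, new_id),
        )
    return new_id


# Usage: two pages refer to the same image, but it is copied only once.
copied = []


def fake_copy(old_id):
    copied.append(old_id)
    return "NEW-" + old_id


conn = open_map()
first = migrate_media_once(conn, "IMG-1", fake_copy)
second = migrate_media_once(conn, "IMG-1", fake_copy)
```

Because the map lives in a file rather than in memory (when a real path is passed to `open_map`), it survives between migration runs, which matters when pages are migrated in several passes.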

Media references

Along with links to pages, Rich Text fields can refer to media items. Media links also contain the media item ID.

As we said before, there was no way to keep the media IDs. That turned out to be a really tough challenge, as about 1000 media items referred to from different places all over the website were missing.

We had the SQLite map of old to new IDs, which gave us the data and options for further decisions.

As an extra step of the migration process, we implemented a parser based on a regular expression engine.

This parser would find media references, resolve the old IDs using the map and try to get the new media item. If the new media item didn’t exist, we would migrate it, since we still had the old media item ID. Eureka!
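The parsing step could look roughly like the following Python sketch. It assumes media references are stored in the form `-/media/<32 hex characters>.ashx` (or with a `~` prefix); a plain dictionary stands in for the SQLite map, and `migrate_media` is a hypothetical callback that migrates a missing media item and returns its new ID.

```python
import re

# Assumed format of a media reference inside Rich Text markup.
MEDIA_REF = re.compile(
    r"(?P<prefix>[-~]/media/)(?P<id>[0-9A-Fa-f]{32})(?P<suffix>\.ashx)"
)


def rewrite_media_refs(html, id_map, migrate_media):
    """Replace old media IDs with new ones, migrating missing items on demand."""

    def replace(match):
        old_id = match.group("id").upper()
        new_id = id_map.get(old_id)
        if new_id is None:
            # Not migrated yet: migrate it now and remember the new ID.
            new_id = migrate_media(old_id)
            id_map[old_id] = new_id
        return match.group("prefix") + new_id + match.group("suffix")

    return MEDIA_REF.sub(replace, html)


# Usage: the first image is already in the map, the second one is not.
known_old, unknown_old = "A" * 32, "B" * 32
id_map = {known_old: "1" * 32}
migrated_late = []


def fake_migrate(old_id):
    migrated_late.append(old_id)
    return "2" * 32


html = (
    '<img src="-/media/' + known_old.lower() + '.ashx" />'
    '<img src="-/media/' + unknown_old + '.ashx" />'
)
result = rewrite_media_refs(html, id_map, fake_migrate)
```

Normalizing the matched ID to upper case before the lookup avoids missing a mapping just because the same ID was written in a different case in the markup.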

Sitecore migration tool

The new solution had 7 websites on the same instance, with nearly 100 individual components and a few page templates for each of the websites. From the very first look we could see that a simple script couldn’t perform the migration.

Luckily, the Sitecore platform is well-designed software, and it provided us with a list of options we could use in our own implementation. One of those is Sitecore pipelines.

The migration (or creation) of a page consists of a few logical steps, and they are the same whether we do it manually or with a little automation:

  • Create page in the right place with required template
  • Fill general page-related data and content
  • Set up specific data relevant to the page type (if applicable)
  • Create and set up page components with content

It seemed logical to do the same thing on the backend, so we came up with dedicated pipelines that cover the whole process of migrating a single page.

More importantly, this approach forced us to split the migration process into concrete steps and keep them configurable.

We therefore ended up with 2 custom pipelines, “migratePage” and “migratePageComponents”, and lots of dedicated processors for each step:

<pipelines>
  <migratePage>
    <processor type="Pintle.Scripting.Pipelines.MigratePage.CreatePageFromCorrectTemplate, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePage.CreatePageModulesFolder, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePage.StartEditingContext, Pintle.Scripting"/>
    <!--start editing new item-->
    <processor type="Pintle.Scripting.Pipelines.MigratePage.MigrateGeneralPage, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePage.MigrateMySitePage, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePage.MigrateSignupPage, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePage.MigrateLandingPage, Pintle.Scripting"/>
    <!-- ... -->
    <processor type="Pintle.Scripting.Pipelines.MigratePage.MigrateComponents, Pintle.Scripting"/>
    <!--end editing new item-->
    <processor type="Pintle.Scripting.Pipelines.MigratePage.EndEditingContext, Pintle.Scripting"/>
  </migratePage>
  <migratePageComponents>
    <processor type="Pintle.Scripting.Pipelines.MigratePageComponents.MigrateStandardDeck, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePageComponents.MigrateProductDeck, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePageComponents.MigrateAccordionDeck, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePageComponents.MigrateTabsDeck, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePageComponents.MigrateTextDeck, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePageComponents.MigrateFactDeck, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePageComponents.MigrateQuoteDeck, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePageComponents.MigrateProgramDeck, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePageComponents.MigrateSelfServiceDeck, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePageComponents.MigrateCalculatorDecks, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePageComponents.MigrateTaxonomyDeck, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePageComponents.MigrateImageDeck, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePageComponents.MigrateWriteToUsDeck, Pintle.Scripting"/>
    <processor type="Pintle.Scripting.Pipelines.MigratePageComponents.MigrateSearchDeck, Pintle.Scripting"/>
    <!-- ... -->
  </migratePageComponents>
</pipelines>
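The idea behind such a pipeline, an ordered list of small processors that all work on one shared arguments object, can be sketched in a few lines. The example below is a Python illustration, not the Sitecore implementation: the `MigratePageArgs` class, the template names in `TEMPLATE_MAP` and the three processors are hypothetical stand-ins for the processors listed in the configuration.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from old page templates to new ones.
TEMPLATE_MAP = {"Old Landing": "Landing Page", "Old Article": "General Page"}


@dataclass
class MigratePageArgs:
    old_page: dict
    new_page: dict = field(default_factory=dict)
    aborted: bool = False


def create_page_from_correct_template(args):
    """Pick the new template; abort the pipeline if there is no mapping."""
    template = TEMPLATE_MAP.get(args.old_page["template"])
    if template is None:
        args.aborted = True
        return
    args.new_page["template"] = template


def migrate_general_page(args):
    """Copy the data every page type has."""
    args.new_page["title"] = args.old_page["title"]


def migrate_landing_page(args):
    """Copy landing-page data; do nothing for other page types."""
    if args.new_page.get("template") != "Landing Page":
        return
    args.new_page["hero"] = args.old_page.get("hero", "")


def run_pipeline(processors, args):
    """Run the processors in order until done or until one of them aborts."""
    for process in processors:
        if args.aborted:
            break
        process(args)
    return args


MIGRATE_PAGE = [
    create_page_from_correct_template,
    migrate_general_page,
    migrate_landing_page,
]

result = run_pipeline(
    MIGRATE_PAGE,
    MigratePageArgs({"template": "Old Landing", "title": "Spring", "hero": "banner"}),
)
```

Each processor decides for itself whether it applies to the current page, so adding support for a new page type or component means adding one processor to the list rather than changing the existing ones.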

Alright, this gives us the brains behind the scenes. However, these pipelines need to be executed from somewhere.

As the amount of content was quite large, the migration process needed to be performed in chunks. We therefore also implemented an administration page where we could execute migration tasks. This gave us options to migrate pages from old destinations to new ones, either for a root item with all its children and components or for a single page item with or without components.
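The two modes described above, a whole subtree or a single page, each with or without components, can be sketched like this. The Python code is an illustration only: pages are plain dictionaries, and `migrate_page` is a hypothetical callback that would run the page migration for one item.

```python
def walk(item):
    """Yield the item and all its descendants, parents before children."""
    yield item
    for child in item.get("children", []):
        yield from walk(child)


def migrate_selection(root, migrate_page, include_descendants=True,
                      with_components=True):
    """Migrate one page or a whole subtree; return the number of pages handled."""
    pages = walk(root) if include_descendants else [root]
    count = 0
    for page in pages:
        migrate_page(page, with_components)
        count += 1
    return count


# Usage with a small content tree and a callback that only records calls.
tree = {
    "name": "Home",
    "children": [
        {"name": "About", "children": [{"name": "Team"}]},
        {"name": "News"},
    ],
}
migrated = []


def fake_migrate(page, with_components):
    migrated.append((page["name"], with_components))


whole_tree = migrate_selection(tree, fake_migrate)
single_page = migrate_selection(
    tree, fake_migrate, include_descendants=False, with_components=False
)
```

Visiting parents before children means a parent page already exists in the new solution by the time its child pages are created under it.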

Conclusions: Sitecore content migration

This was the result after the migration:

  • Migrated about 6000 pages
  • 7 individual websites on the same Sitecore instance
  • Migrated only required media
  • About 5 hours of execution time
  • The database size went from 7 GB down to 1.7 GB

The content migration task is not simple, but it is absolutely doable. In solutions with huge amounts of content it is a must-have option that solves a list of problems:

  • Almost no manual page re-creation or editing is needed
  • The content-freeze period and go-live time are kept short
  • Migration gives the option to clean up the content and even change dedicated parts of pages, components, etc.
  • If something goes wrong, there is always the option to fix it and start over

Make sure to make content migration a vital part of your content strategy when you are moving to a new solution. Without a clear view of your content and proper planning, a successful migration will be impossible, no matter how good your migration tool is.
