xDB: Sitecore DMS Moves to MongoDB
I recently had the chance to attend a preview of Sitecore 7.5, code-named “Andes”. Here are my thoughts.
A Big Deal
When Sitecore feels that they have moved the ball significantly, they tend to mark the occasion with a name change or two. So when batched writes came to the Online Marketing System (OMS), it became the Digital Marketing Suite (DMS), and when personalization was fully integrated into the Page Editor, it became, officially at least, the Experience Editor, and the CMS + DMS combination became the Customer Experience Platform.
With 7.5, Sitecore has embraced the term “Experience” across the board. We have the Sitecore Experience Platform, the Experience Database, even the Print Experience Studio. By renaming their entire product line, Sitecore is telegraphing that something significant has happened. In my view, this sentiment is justified. With 7.5, Sitecore firmly plants its flag in the land of Big Data.
It’s not what you make, it’s what you keep
At the heart of DMS was a cruel paradox. Every visitor interaction was tracked: every goal scored, every page browsed, every profile matched. But you couldn’t keep the data. It was too much, and SQL Server just couldn’t handle the scale. By moving the data store to MongoDB, a Sitecore instance will be able to gather and retain data in perpetuity. Sitecore’s draft documentation states that this data store “provides storage that incrementally scales to terabytes or even petabytes of data.” A petabyte is a lot.
You are not your browser
Sitecore 7.5 also takes aim at another liability of the old DMS: its equating of Visitor and Browser. DMS provides the ability to view a visitor as a coherent entity over many visits, progressing from casual anonymity to fully engaged community member, but the seamless story breaks down as soon as a mobile phone enters the scene. The DMS concept of a visitor was anchored on a persistent cookie, and since cookies are tied to the browser, they cannot follow the visitor from one device to another. There were ways around this, involving redirects and deleting and recreating cookies, but they were complex and fragile, and not supported in the API. This changes with the “Contact”, which replaces the Visitor in 7.5. Now there are API calls for merging contacts and visits across devices, and for sharing session information across devices.
Room to Grow
With Sitecore 7.5, the DMS is renamed the “Experience Database,” or xDB. Whereas the DMS stored visitor activity in relational tables, this data is now stored in JSON objects. Here, for example, is an excerpt of how xDB stores a visit, now termed an Interaction. Note that Pages is now a nested array of subdocuments:
This new structure is visible in the query definitions that define DMS reports. Compare, for example, the Recent Visitors definition in Sitecore 7.2 and 7.5:
The change to NoSQL architecture provides two key advantages: flexibility and scalability. If you have worked with RESTful services, then you are familiar with the flexibility JSON offers, since adding a field to the returned results will not break existing consuming code. In a similar way, this schema will allow external data to be pushed into a single data store. One scenario might be to add segmentation data to customer contact records, to facilitate both personalization and reporting.
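To make the flexibility point concrete, here is a minimal Python sketch. The document shape and the field names (ContactId, Pages, Url, Duration, Segment) are assumptions made up for illustration, not the actual xDB schema. It shows a consumer written against the original fields continuing to work after an extra field is added to the record.

```python
# Hypothetical interaction document; field names are illustrative only,
# not the real xDB schema.
interaction = {
    "ContactId": "c-001",
    "Pages": [  # nested array of subdocuments
        {"Url": "/", "Duration": 12},
        {"Url": "/products", "Duration": 45},
    ],
}


def total_duration(doc):
    """Sum time on page, reading only the fields this consumer knows about."""
    return sum(page["Duration"] for page in doc.get("Pages", []))


before = total_duration(interaction)

# Push external data into the same record (assumed field name "Segment").
interaction["Segment"] = "newsletter-subscriber"
after = total_duration(interaction)

# The consumer never looked at "Segment", so its result is unchanged.
print(before, after)
```

In a relational design, the same change would typically mean altering a table or adding a join, and every query that touches that table would need a second look.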
The second, and far more significant, advantage is that NoSQL allows for horizontal scaling. Vertical scaling means getting ever more powerful hardware, with rapidly escalating costs and diminishing returns. Horizontal scaling means dividing the load among many parallel machines. To use a real-world analogy: imagine you have children, and the only option to house them as they grow into adulthood is to buy a bigger house. Your housing costs are going to get pretty steep, especially when you have grown grandchildren. If you can put each grown child in a separate house, then your cost per square foot will stay reasonable, and everyone will be a lot more comfortable. NoSQL systems make this possible by a technique called “sharding,” in which a gatekeeper routes each query to the machine that holds the relevant data, based on the value (or a hash of the value) of a key field. Externally, this looks like one system, but each individual machine is working with a manageable amount of data. Sitecore has a tradition of code-naming their releases after mountains. This release is named after the Andes mountain range, to underscore this idea of horizontal scalability.
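The routing idea can be sketched in a few lines of Python. This is a deliberately simplified hash-modulo router, not MongoDB's actual mechanism (which assigns ranges of key values to shards and rebalances them). The shard names and the use of a contact ID as the key field are assumptions for illustration.

```python
import hashlib

# Hypothetical shard names; a real cluster would hold connection details.
SHARDS = ["shard-a", "shard-b", "shard-c"]


def route(shard_key, shards=SHARDS):
    """Pick a shard from a stable hash of the key field."""
    digest = hashlib.md5(shard_key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def store(cluster, contact_id, doc):
    """Write the document to whichever shard owns this contact ID."""
    cluster.setdefault(route(contact_id), {})[contact_id] = doc


def fetch(cluster, contact_id):
    """Read from the owning shard only; the other shards are never touched."""
    return cluster.get(route(contact_id), {}).get(contact_id)


# The cluster is modeled as a dict of shard name -> that shard's documents.
cluster = {}
for i in range(1000):
    store(cluster, f"contact-{i}", {"visits": i})

# Each shard ends up holding only a portion of the 1000 documents.
for name, docs in sorted(cluster.items()):
    print(name, len(docs))
```

The caller only ever calls store and fetch, so the cluster looks like a single system from the outside. One weakness of plain modulo routing is that adding a shard remaps most keys, which is one reason production systems use more elaborate schemes.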
xDB Cloud
Sitecore plans to offer the Experience Database as a cloud service, with a consumption-based pricing model tied to the number of contacts stored. This will be hosted on Microsoft Azure, and may be an appealing alternative for smaller organizations that do not want to invest up front in maintaining a production MongoDB data center.
The Experience Profile
More and more information will now be associated with the individual contact, such as pages viewed, automation plan states, profiles scored, and engagement value triggered. Sitecore 7.5 will provide a new SPEAK-based dashboard, called the Experience Profile or the xFile, to give marketers a centralized view. Since this is a SPEAK interface, organizations will be able to extend it to better fit their business needs.
Aggregation and Reporting
Sitecore 7.5 does not altogether do away with SQL Server. Instead, it is being repurposed as a reporting store, with Fact and Dimension tables to support the Executive Dashboard and other strategic reporting needs. Sitecore will provide reporting and processing APIs to allow developers to work with this data.
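As a rough picture of what aggregating into a reporting store involves, the Python sketch below rolls raw interaction records up into fact rows keyed by date and channel. The records, field names, and the choice of dimensions are all hypothetical; this is not Sitecore's processing API or its actual table layout.

```python
from collections import Counter

# Hypothetical raw interactions, as they might sit in the collection store.
interactions = [
    {"date": "2014-05-01", "channel": "search", "value": 10},
    {"date": "2014-05-01", "channel": "search", "value": 5},
    {"date": "2014-05-01", "channel": "email", "value": 20},
    {"date": "2014-05-02", "channel": "search", "value": 0},
]


def aggregate(rows):
    """Roll up raw interactions into fact rows keyed by (date, channel)."""
    visits, value = Counter(), Counter()
    for row in rows:
        key = (row["date"], row["channel"])
        visits[key] += 1
        value[key] += row["value"]
    return {key: {"visits": visits[key], "value": value[key]} for key in visits}


facts = aggregate(interactions)
for key in sorted(facts):
    print(key, facts[key])
```

The reporting side then queries these small, pre-summarized rows rather than scanning every raw interaction.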
Migration
Sitecore will provide command-line utilities to export data from an existing DMS database. However, if your solution has customized functionality, and especially if it queries the DMS database directly, reworking this functionality to work with MongoDB will require development.
A Big Step to Big Data
This release is the next logical step in Sitecore’s march towards Big Data. Just as Sitecore 7 addressed scalability issues on the content side, “Andes” promises to allow organizations to harness vast amounts of user data to focus the customer experience both on and off the web.
A few links
- Sitecore’s press releases on the Experience Platform and their partnership with MongoDB
- Sitecore’s Experience Database product page.
- VentureBeat article on xDB. The craft brew scenario illustrates the power of retaining granular visit data.
- An overview of MongoDB sharding, which allows for almost limitless growth.
- A few examples of what you can store in a petabyte.
- Have 15 minutes to learn MongoDB? Here’s an online tutorial: http://try.mongodb.org