I wanted to share a handy SQL query that can be useful in some key situations. This query gives you information about all content that has been updated within a certain date/time range.
First, why might we want to do this? In our case, we were deploying a newly rebuilt site on Sitecore 8.2 and we needed to sync from the legacy site just the content that had been modified since the last full content migration. About 30 days in this case. We had the migration scripts ready and we just wanted to point them at only the content that had changed.
But wait? Why go deep to the SQL layer, aren’t there other ways to get this information? Well, yes, but sometimes the SQL layer can be the best option. You could:
- Use the package manager and set the created and modified filters. However, this can run slow, really slow, depending on how much content you have.
- Write a script in c# or Powershell. Yep, you can do that.
- Run a tool like Razl.
- Or, you can use this script which takes like less than 10 seconds to run on very large content databases.
And why is this SQL query particularly blog-worthy? Can't we just query the Items table and sort by created or modified date? Well, no. Unfortunately, the Items table, while having created and modified dates, only has the dates of the initial version of the item. In almost all cases, that’s not what we need since new versions can be created regularly.
So how do we do it then? The answer is the VersionedFields table, but it’s not quite that easy. This table has created and modified dates for every single version of every single field in Sitecore. This is a little more granular than we were looking for. However, we can link the field back to the item with a join on the Items table. And we can get the most recently created or modified date of the item by using the Max() aggregate.
This gives us the core information we need from the query. There’s a couple other nice-to-haves. The first is getting the Template Name which may or may not be important to you but was helpful for us. This involved joining the Items table back to the join we already did on the Items table. A little bit of a mind-bender, but of course in Sitecore, templates are items too. So we need to grab the template id of the item in the first join and join that back to the Items table to get the template’s name, sigh.
The second nice-to-have is the item’s path. To do this requires a Stored Procedure to recursively loop back up to the top of the tree, building the path along the way. There may be several versions of this Stored Procedure out there, this is the one we used
CREATE FUNCTION [dbo].[GetPathForItemId] ( @itemId uniqueidentifier ) RETURNS varchar(1024) AS BEGIN DECLARE @path varchar(1024) SET @path = '' DECLARE @name varchar(255) DECLARE @id uniqueidentifier SET @name = '' SET @id = @itemId WHILE @name <> 'sitecore' BEGIN SELECT @name = name, @id = parentid FROM Items where id = @id SET @path = '/' + @name + @path END RETURN @path END GO
Finally, here is the query in all its glory. I hope you enjoy it as much as I did!
SELECT ItemId, Items.TemplateID, ItemsTemplate.Name as TemplateName, Max(Version) as Version, Max(VF.Created) as Created, Max(VF.Updated) as Updated,[legacy_db].[dbo].GetPathForItemId(ItemId) as Path FROM [legacy_db].[dbo].[VersionedFields] as VF join [legacy_db].[dbo].[Items] as Items on Items.ID = VF.ItemId join [legacy_db].[dbo].[Items] as ItemsTemplate on Items.TemplateID = ItemsTemplate.ID where (VF.Updated >= '2018-04-13' and VF.Updated <= '2018-05-01 23:59:59.000') and [legacy_db].[dbo].GetPathForItemId(ItemId) like '/sitecore/content/%' group by ItemId, Items.TemplateID, ItemsTemplate.Name order by Updated