How To Query For Updated Items In Sitecore Using SQL

I wanted to share a handy sql query that can be useful in some key situations.  This query gives you information about all content that has been updated within a certain date/time range.  

First, why might we want to do this?  In our case we were deploying a newly rebuilt site on Sitecore 8.2 and we needed to sync from the legacy site just the content that had been modified since the last full content migration.  About 30 days in this case.  We had the migration scripts ready and we just wanted to point them at only the content that had changed.

But wait?  Why go deep to the sql layer, aren’t there other ways to get this information?  Well, yes, but sometimes the sql layer can be the best option.  You could:

  1. Use the package manager and set the created and modified filters.  However this can run slow, really slow, depending on how much content you have.
  2. Write a script in c# or powershell.  Yep, you can do that.
  3. Run a tool like Razl.
  4. Or, you can use this script which takes like less than 10 seconds to run on very large content databases.

And why is this sql query particularly blog worthy?  Cannot we just query the Items table and sort by created or modified date.  Well, no.  Unfortunately the Items table, while having created and modified dates, only has the dates of the initial version of the item.  In almost all cases, that’s not what we need since new versions can be created regularly.

So how do we do it then?  The answer is the VersionedFields table, but it’s not quite that easy.  This table has created and modified dates for every single version of every single field in Sitecore.  This is a little more granular than we were looking for.  However, we can link the field back to the item with a join on the Items table.  And we can get the most recent created or modified date of the item by using the Max() aggregate.

This gives us the core information we need from the query.  There’s a couple other nice-to-haves.  The first is getting the Template Name which may or may not be important to you, but was helpful for us.  This involved joining the Items table back to the join we already did on the Items table.  A little bit of a mind bender, but of course in Sitecore, templates are items too.  So we need to grab the template id of the item in the first join, and join that back to the Items table to get the template’s name, sigh.

The second nice-to-have is the item’s path.  To do this requires a Stored Procedure to recursively loop back up to the top of the tree, building the path along the way.  There may be several versions of this Stored Procedure out there, this is the one we used
 

CREATE FUNCTION [dbo].[GetPathForItemId] 
(
       @itemId uniqueidentifier
)
RETURNS varchar(1024)
AS
BEGIN
       
       DECLARE @path varchar(1024)
       SET @path = ''

       DECLARE @name varchar(255)
       DECLARE @id uniqueidentifier

       SET @name = ''
       SET @id = @itemId

       WHILE @name <> 'sitecore'
       BEGIN
       
       SELECT @name = name, @id = parentid FROM Items where id = @id

       SET @path = '/' + @name + @path

       END 

       RETURN @path

END

GO

Finally, here is the query in all its glory.  I hope you enjoy it as much as I did!

 SELECT ItemId, Items.TemplateID, ItemsTemplate.Name as TemplateName, Max(Version) as Version, Max(VF.Created) as Created, Max(VF.Updated) as Updated,[legacy_db].[dbo].GetPathForItemId(ItemId) as Path
  FROM  [legacy_db].[dbo].[VersionedFields] as VF
  join [legacy_db].[dbo].[Items] as Items on Items.ID = VF.ItemId
  join [legacy_db].[dbo].[Items] as ItemsTemplate on Items.TemplateID = ItemsTemplate.ID
  where (VF.Updated >= '2018-04-13' and VF.Updated <= '2018-05-01 23:59:59.000')
  and [legacy_db].[dbo].GetPathForItemId(ItemId) like  '/sitecore/content/%'
  group by ItemId, Items.TemplateID, ItemsTemplate.Name
  order by Updated