
A week ago today I was on an Amtrak train with fellow Sitecore MVP Dan Solovay, on my way to meet the Sitecore 7 development team at the Huge Inc. headquarters in Brooklyn. We don't generally travel 4 hours through 3 states to attend a meetup, but this was too good to miss: a true deep dive into the Sitecore 7 search features, straight from the core team responsible for the latest major release.

So, what did we learn from this unique opportunity?

Search First Content Approach

Stephen Pope set up the presentation by sharing the primary motivation behind Sitecore 7. This release was intended to improve the scalability of the platform and to shift from a strictly hierarchy-based content model to a "Search First" content approach.

The Search First model applies to both content managers and developers. Content managers can take advantage of new search UIs, a welcome update for any organization that maintains thousands or hundreds of thousands of pieces of content, while developers will now use the search APIs as the default means of retrieving and working with sets of content items.

We learned that the test benchmark for development was 100 million content items.  That kind of scalability does not exist in older versions of Sitecore, and breaking through those limitations is a necessary step as the CMS provider continues to reach for larger enterprise customers.

Extensible Search Provider Architecture

The team provided a quick conceptual overview of the Sitecore 7 architecture, which consists of search crawlers, search providers, and a rich LINQ to search API implementation.



Stephen drove home the message that the team has built a very flexible foundation for the search architecture. This includes strong default implementations for Lucene.NET and SOLR, and the ability to hook in or customize search as you see fit.

The provider model delivers a welcome level of abstraction from any given search engine. Whether you choose Lucene.NET or SOLR, the majority of your code can be insulated from those specific implementations. This gives organizations the ability to switch providers in the future with much less pain, a decision that will become more relevant as new providers are developed. As we recently discovered at the New England Sitecore User Group, Coveo is currently working on a Sitecore 7 integration. We also heard rumors that a provider for Elasticsearch, a highly scalable open source search engine used by the likes of GitHub and Stack Overflow, is in the works.

LINQ integration

The developers I have spoken with are most excited about the LINQ integration. As the team walked through various examples of querying, filtering, and sorting, it was clear that LINQ provides a very expressive and concise syntax for working with Sitecore data, unquestionably a step above the old Search APIs. Here are a few examples taken from AutoHaus, an excellent Sitecore 7 demo site and development sandbox built by Alex Shyba.

Search for Small Engine Cars:

// CarSearchResultItem: hypothetical name for the demo's result type
using (var context = Sitecore.ContentSearch.ContentSearchManager.GetIndex("sitecore_web_index").CreateSearchContext()) {
    return context.GetQueryable<CarSearchResultItem>().Where(i => i.EngineCC < 400).ToList();
}

Search for the Fastest Cars:

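// CarSearchResultItem: hypothetical name for the demo's result type
using (var context = Sitecore.ContentSearch.ContentSearchManager.GetIndex("sitecore_web_index").CreateSearchContext()) {
    return context.GetQueryable<CarSearchResultItem>()
                  .Where(i => i.TopSpeed >= 350)
                  .Where(i => i.ZeroToHundred < 3.5f)
                  .ToList();
}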

Search for American Muscle Cars:

// CarSearchResultItem: hypothetical name for the demo's result type
using (var context = Sitecore.ContentSearch.ContentSearchManager.GetIndex("sitecore_web_index").CreateSearchContext()) {
    return context.GetQueryable<CarSearchResultItem>()
                  .Where(i => i.NumCylinders == 8)
                  .Where(i => i.Make == "Ford" || i.Make == "Chevrolet" || i.Make == "Dodge" ||
                              i.Make == "Pontiac" || i.Make == "Plymouth" || i.Make == "Oldsmobile" ||
                              i.Make == "AMC" || i.Make == "Buick" || i.Make == "Mercury")
                  .Where(i => i.Doors.Contains(2))
                  .Where(i => i.EngineType == "v")
                  .Where(i => i.BodyType != "suv")
                  .Where(i => i.BodyType != "van")
                  .Where(i => i.BodyType != "pickup")
                  .Where(i => i.EngineCC >= 4900)
                  .ToList();
}

Another worthwhile note: the team stressed that because LINQ is implemented using an abstracted provider layer, LINQ queries will continue to work without modification even if you switch search providers.
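Conceptually, a LINQ provider never runs the lambdas in a `Where` call itself. It receives them as expression trees and translates them into its own engine's query language, which is why the calling code can stay the same when the engine changes. The toy translator below is a minimal sketch of that idea, not Sitecore's implementation: `Car`, `ToyLuceneTranslator`, the lowercase field naming, and the exact Lucene-style output are all assumptions, and it only handles `&&`, `||`, and comparisons of a field against a value.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical document type standing in for an indexed car item.
public class Car
{
    public int EngineCC { get; set; }
    public int TopSpeed { get; set; }
    public string Make { get; set; } = "";
}

// Toy translator (not Sitecore code): walks a predicate's expression tree and
// emits a Lucene-style query string, the way a search provider maps LINQ
// calls onto its own engine's syntax.
public static class ToyLuceneTranslator
{
    public static string Translate<T>(Expression<Func<T, bool>> predicate) =>
        Visit(predicate.Body);

    private static string Visit(Expression node)
    {
        if (node is not BinaryExpression b)
            throw new NotSupportedException($"Unsupported node: {node.NodeType}");

        // Logical operators recurse into both sides.
        if (b.NodeType == ExpressionType.AndAlso)
            return $"({Visit(b.Left)} AND {Visit(b.Right)})";
        if (b.NodeType == ExpressionType.OrElse)
            return $"({Visit(b.Left)} OR {Visit(b.Right)})";

        // Comparisons must have the shape "field op value", e.g. i.EngineCC < 400.
        if (b.Left is not MemberExpression member)
            throw new NotSupportedException("Left operand must be a field.");
        string field = member.Member.Name.ToLowerInvariant();
        object value = Evaluate(b.Right);

        return b.NodeType switch
        {
            ExpressionType.Equal => $"{field}:{value}",
            ExpressionType.LessThan => $"{field}:{{* TO {value}}}",
            ExpressionType.LessThanOrEqual => $"{field}:[* TO {value}]",
            ExpressionType.GreaterThan => $"{field}:{{{value} TO *}}",
            ExpressionType.GreaterThanOrEqual => $"{field}:[{value} TO *]",
            _ => throw new NotSupportedException($"Unsupported operator: {b.NodeType}")
        };
    }

    // Literals arrive as ConstantExpression; captured variables are evaluated
    // by compiling the subtree into a delegate and invoking it.
    private static object Evaluate(Expression e) =>
        e is ConstantExpression c ? c.Value : Expression.Lambda(e).Compile().DynamicInvoke();
}
```

For example, `ToyLuceneTranslator.Translate<Car>(i => i.EngineCC < 400)` yields `enginecc:{* TO 400}`. A real provider also has to map property names to index fields, encode numeric values, and support many more expression shapes; a SOLR provider would walk the same tree and emit SOLR's query syntax instead.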

Crawling External Data Sources

The Lucene and SOLR crawlers shipped with Sitecore are capable of crawling all Sitecore content.  It will also be possible to develop custom crawlers that index other data sources including XML, RSS feeds, and any data exposed via a Web API.

A common use case is indexing Twitter feeds that are relevant to your website. Rather than creating a custom database to store this information or shoehorning it into Sitecore content items, you can store this external data directly in your search index. A developer can then retrieve this content using the same APIs used to retrieve Sitecore content from the search index. This is a good fit for any external content that does not need to be actively managed or edited.
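To make the idea concrete, here is a minimal sketch, assuming a simplified crawler contract: a crawler only maps its source's records into a common document shape, and the index stores those documents next to everything else, so one query API serves both. `IndexDocument`, `ICrawler`, `Tweet`, `TweetCrawler`, and `InMemoryIndex` are hypothetical names invented for illustration; Sitecore 7's actual crawler classes and configuration differ, and a real index would be Lucene.NET or SOLR rather than a list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Flat document shape that Sitecore items and external data are both mapped into.
public class IndexDocument
{
    public string Id { get; set; } = "";
    public string Source { get; set; } = "";
    public string Text { get; set; } = "";
}

// A crawler only has to turn its source's records into IndexDocuments.
public interface ICrawler
{
    IEnumerable<IndexDocument> Crawl();
}

// Hypothetical tweet record, e.g. as deserialized from a Web API response.
public record Tweet(long Id, string Text);

public class TweetCrawler : ICrawler
{
    private readonly IEnumerable<Tweet> _tweets;

    public TweetCrawler(IEnumerable<Tweet> tweets) => _tweets = tweets;

    public IEnumerable<IndexDocument> Crawl() =>
        _tweets.Select(t => new IndexDocument
        {
            Id = "tweet:" + t.Id,
            Source = "twitter",
            Text = t.Text
        });
}

// In-memory stand-in for a search index fed by any number of crawlers.
public class InMemoryIndex
{
    private readonly List<IndexDocument> _documents = new();

    public void Rebuild(IEnumerable<ICrawler> crawlers)
    {
        _documents.Clear();
        foreach (var crawler in crawlers)
            _documents.AddRange(crawler.Crawl());
    }

    // The same queryable serves Sitecore-derived and external documents alike.
    public IQueryable<IndexDocument> GetQueryable() => _documents.AsQueryable();
}
```

Because the tweets never become Sitecore items, nothing has to be created, versioned, or published in the content tree; they simply live alongside the other documents in the index.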

Sitecore - Now With More Testing

Stephen Pope made clear that one of the team’s principal design goals was to provide developers with an API that is completely testable using free tools such as NUnit. In short, if you build logic that interacts with Sitecore using the new Search APIs, your code will be fully unit testable. Stephen has posted a sample project on GitHub that shows how to go about building Sitecore unit tests.
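One common way to get that testability is to keep query logic in methods that accept an `IQueryable<T>`. In production the queryable comes from the search context's `GetQueryable` call; in a test, an in-memory `List<T>.AsQueryable()` can stand in for the index. This is a minimal sketch of that pattern, assuming hypothetical `CarResult`, `CarQueries`, and `CarQueriesTests` types; Stephen's sample project may structure its tests differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type; in Sitecore it would be mapped from index fields.
public class CarResult
{
    public string Model { get; set; } = "";
    public int EngineCC { get; set; }
}

// The query logic depends only on IQueryable<T>, not on a live index.
// In production the queryable would come from the search context.
public static class CarQueries
{
    public static List<CarResult> SmallEngineCars(IQueryable<CarResult> source, int maxCC) =>
        source.Where(c => c.EngineCC < maxCC)
              .OrderBy(c => c.EngineCC)
              .ToList();
}

// A framework-free test; under NUnit this body would live in a [Test] method
// and the final check would be an Assert call.
public static class CarQueriesTests
{
    public static void SmallEngineCars_ReturnsOnlyCarsUnderLimit()
    {
        // An in-memory list stands in for the search index.
        IQueryable<CarResult> fakeIndex = new List<CarResult>
        {
            new CarResult { Model = "Hatchback", EngineCC = 999 },
            new CarResult { Model = "Microcar", EngineCC = 350 },
            new CarResult { Model = "Muscle", EngineCC = 6400 }
        }.AsQueryable();

        var result = CarQueries.SmallEngineCars(fakeIndex, 1000);

        if (result.Count != 2 || result[0].Model != "Microcar" || result[1].Model != "Hatchback")
            throw new Exception("SmallEngineCars returned unexpected results.");
    }
}
```

Keep in mind that such a test exercises the query logic through LINQ to Objects, not the provider's translation, so behavior that depends on the engine (analyzers, case sensitivity, tokenization) still needs integration testing against a real index.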

Upgrade Path

The team presented a strong case for upgrading your solutions to Sitecore 7 sooner rather than later. The Sitecore kernel was not refactored for this release, and almost all of the new features are layered on top of Sitecore without affecting the existing functionality of Sitecore 6.6.

The team also updated the Sitecore Advanced Database Crawler, previously the de facto library for working with Lucene.NET. The module was rewritten to take advantage of the improved search performance and functionality, so if you upgrade to Sitecore 7 and use the new Advanced Database Crawler, you gain the performance benefits with minimal refactoring.

Sitecore has also unveiled a multi-provider search model that will be available in Update 2. This will prove very useful for upgrades, since it allows you to run multiple search providers, such as Lucene.NET and SOLR, side by side. With this update, implementers will be able to replace search functionality over time, dipping their toes in the water with SOLR while still relying on Lucene.NET.

What’s Next?

The Sitecore 7 team is moving rapidly through updates based on community feedback. We learned that Update 2 is code complete and moving into QA, so we can expect that release shortly. Here are some of the most exciting items on the roadmap.

Update 1

  • Expanded Search Results Items -  URI, Paths, and Datasources
  • SOLR Switch On Rebuild - Ensures the system always has an active index during rebuild operations
  • New Japanese Analyzer -  Will likely be committed back to Lucene.NET Apache project
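The idea behind the SOLR Switch On Rebuild item is a form of double buffering: a rebuild writes into a second copy of the index, and the live copy is replaced only once the new one is complete, so searches never hit a half-built index. The in-memory sketch below shows only that general pattern: `SwitchOnRebuildIndex` is a hypothetical class, the "index" is just a list of strings, and it says nothing about how Sitecore's SOLR provider actually manages cores or configuration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical in-memory index illustrating "switch on rebuild": queries keep
// reading the active copy while a rebuild fills a separate one.
public class SwitchOnRebuildIndex
{
    private IReadOnlyList<string> _active;

    public SwitchOnRebuildIndex(IReadOnlyList<string> initialDocuments) =>
        _active = initialDocuments;

    // Readers always get a complete copy: either the old one or the new one.
    public IReadOnlyList<string> Active => Volatile.Read(ref _active);

    public void Rebuild(Func<IEnumerable<string>> crawl)
    {
        // Build the replacement off to the side. If the crawl throws,
        // the old copy simply stays active.
        IReadOnlyList<string> rebuilt = new List<string>(crawl());

        // Publish the finished copy with a single reference write.
        Volatile.Write(ref _active, rebuilt);
    }
}
```

Swapping a single reference keeps the switch cheap and atomic from a reader's point of view; the trade-off is that two full copies of the index exist while a rebuild is in progress.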

Update 2

  • Document Mapper Factory - Support for interfaces in LINQ to Provider model. Object Factory and Rules for creating objects.  Support for working with multiple types in a single query result.
  • Multi-Provider Support - Useful for upgrades and for providing fallback providers, for example, falling back to a local Lucene index if the remote SOLR index is not available.
  • DMS Enabled Search Queries - Use search queries in Datasources and return items from an index based on DMS attributes (location, profiles, etc.)
  • Dynamic Crawler Manager - Use different crawlers for different sources.
  • Updated Legacy Search APIs - Making use of the new search functionality, so that you gain the benefits even before refactoring your code to use the new APIs.
  • All Sitecore Search UI running on new API
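The fallback scenario in the Multi-Provider Support item can be pictured as a chain of providers tried in order. Here is a minimal sketch, assuming a hypothetical `ISearchProvider` contract (Sitecore's actual multi-provider configuration will differ) and treating any exception thrown by a provider as "unavailable". `FallbackSearchProvider`, `DownProvider`, and `ListProvider` are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical provider contract; Sitecore 7's real index interfaces differ.
public interface ISearchProvider
{
    string Name { get; }
    IReadOnlyList<string> Search(string term);
}

// Tries each provider in order and returns the first successful result,
// e.g. a remote SOLR provider first, then a local Lucene one.
public class FallbackSearchProvider : ISearchProvider
{
    private readonly IReadOnlyList<ISearchProvider> _providers;

    public FallbackSearchProvider(params ISearchProvider[] providers) => _providers = providers;

    public string Name => "fallback(" + string.Join(",", _providers.Select(p => p.Name)) + ")";

    // Records which provider answered the most recent query.
    public string LastUsed { get; private set; } = "";

    public IReadOnlyList<string> Search(string term)
    {
        var errors = new List<Exception>();
        foreach (var provider in _providers)
        {
            try
            {
                var results = provider.Search(term);
                LastUsed = provider.Name;
                return results;
            }
            catch (Exception ex) // e.g. the remote index is unreachable
            {
                errors.Add(ex);
            }
        }
        throw new AggregateException("All search providers failed.", errors);
    }
}

// Stand-in for a remote provider that is currently down.
public class DownProvider : ISearchProvider
{
    public string Name => "remote-solr";
    public IReadOnlyList<string> Search(string term) =>
        throw new TimeoutException("SOLR server not reachable");
}

// Stand-in for a local provider backed by a simple list of documents.
public class ListProvider : ISearchProvider
{
    private readonly string[] _docs;
    public ListProvider(string name, params string[] docs) { Name = name; _docs = docs; }
    public string Name { get; }
    public IReadOnlyList<string> Search(string term) =>
        _docs.Where(d => d.Contains(term, StringComparison.OrdinalIgnoreCase)).ToList();
}
```

Catching every exception keeps the sketch short; a production version would more likely catch only connectivity errors, log each failure, and avoid retrying a provider that is known to be down on every query.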

Update 3 possibilities

  • Spatial support
  • Synonym and highlight support
  • Elasticsearch Provider
  • Membership Table Crawler
  • Boosting Manager


Final Thoughts

This release is an answer to the well-known scaling issues that Sitecore could hit in larger deployments. Breaking away from a strictly hierarchy-based content model was a must, and the Search First approach is a welcome and sensible direction.

For me, this presentation also reinforced an emerging theme: an engaged Sitecore 7 development team reaching out to the community to educate and to request feedback on the product. Stephen Pope, Tim Ward, Martin Hyldahl, Alex Shyba, and Kieran Marron have been busy blogging, hosting Google Hangouts, and presenting over the past three months. This kind of outreach and direct communication from product developers and engineers highlights that Sitecore is very much a technology-first company, which bodes well for its ability to respond to and execute on the needs of its customers and implementers.

