Searching a decoupled Umbraco site

Simon Miller | Team: Web Development | Tags: Web Development, Search, Umbraco, MVC

Last month my colleague Jason Deacon wrote a well-received blog about decoupling Umbraco from a front-end website, and Peter Nguyen expanded upon the pros and cons of this approach. My current project puts this example into practice and we are seeing, as Jason originally stated, some excellent load times.

As Peter discusses, now that we are in MVC-land rather than Umbraco-land, some of the built-in functions we are used to having from Umbraco are no longer available. One of the key omissions is Examine-based search. Traditionally with Umbraco, when a node is published it is simultaneously added to the Examine search index, and from the front end it is then very simple to perform a fast search using the ExamineIndexer. With Jason’s approach, all we have for data is the Umbraco.config XML file.

There were a few ways to approach this:

  • Install Lucene via NuGet (Examine is essentially a wrapper around Lucene) and implement a custom index; however, this would require an offline process to push the data from the XML into the indexes. Too much trouble for too little gain.
  • XPath – personally I am no fan of this. Aesthetically the methods are not pleasing to the eye, and if you are going down this path you may as well use…
  • LINQ to XML – bingo.

As Jason had already proved the speed of LINQ to XML for traversing the Umbraco.config, I had no concerns about performance. And consistent with his findings I am seeing searches on the front end taking ~18ms.
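
For reference, the Context used in the repository below is just the root of the Umbraco.config XML loaded via LINQ to XML, and AttributeAsString / ElementAsString / AttributeAsInt are thin extension methods over XElement. A rough sketch of what that plumbing might look like (the class and member names here are assumptions, not the project’s actual code):

// A minimal sketch (with assumed names) of how the umbraco.config cache could be
// exposed via LINQ to XML, along with the small helper extensions used below.
// The real project may load, cache and refresh this differently.
using System.Xml.Linq;

public static class UmbracoXmlCache
{
    // Loaded once and reused; a production version would watch the file for changes.
    private static readonly XElement Root =
        XDocument.Load(@"App_Data\umbraco.config").Root;

    public static XElement Context
    {
        get { return Root; }
    }
}

public static class XmlHelperExtensions
{
    public static string AttributeAsString(this XElement element, string name)
    {
        // Casting a missing attribute returns null rather than throwing.
        return (string)element.Attribute(name);
    }

    public static int AttributeAsInt(this XElement element, string name)
    {
        return (int?)element.Attribute(name) ?? 0;
    }

    public static string ElementAsString(this XElement element, string name)
    {
        return (string)element.Element(name);
    }
}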

The search requirements for the site were not that complex: a user can search the site content by keyword and get back a paginated list of results ordered by relevance, as well as use a type-ahead quick search. For the server-side searching I wrote a simple method in our search repository:

public static SearchResults Search(string keyword, int pageNo, int itemsPerPage)
{
    var searchResults = new SearchResults();
    searchResults.Results = new List<KeyValuePair<int, SearchResult>>();

    if (!string.IsNullOrEmpty(keyword))
    {
        keyword = keyword.ToLower();

        // Skip excluded node types, then match the keyword against the node
        // name, description and body taken straight from Umbraco.config.
        var results = Context.Descendants().Where(e =>
            !Node.ExcludedNodeTypes.Contains(e.AttributeAsString("nodeTypeAlias")) &&
            (
                e.AttributeAsString("nodeName").ToLower().Contains(keyword) ||
                (e.ElementAsString("description") != null && e.ElementAsString("description").ToLower().Contains(keyword)) ||
                (e.ElementAsString("body") != null && e.ElementAsString("body").ToLower().Contains(keyword))
            )).AsEnumerable();

        if (results.Any())
        {
            foreach (var item in results)
            {
                // Null-coalesce the optional elements: a node can match on its
                // name or body alone and have no description at all.
                var result = new SearchResult()
                {
                    Title = item.AttributeAsString("nodeName"),
                    Summary = (item.ElementAsString("description") ?? string.Empty).StripHtml(200),
                    FullUrl = UrlRepository.GetUrlForNode(item.AttributeAsInt("id"))
                };

                // Crude relevance score: how many words across the searchable
                // fields start with the keyword.
                var keywordCounts = string.Join(" ",
                    item.AttributeAsString("nodeName"),
                    (item.ElementAsString("description") ?? string.Empty).StripHtml(),
                    (item.ElementAsString("body") ?? string.Empty).StripHtml()).ToLower().Split(' ');
                var count = keywordCounts.Where(e => e.StartsWith(keyword)).Count();

                searchResults.Results.Add(new KeyValuePair<int, SearchResult>(count, result));
            }

            // Order by relevance, record the total, then page the results.
            var allResults = searchResults.Results.OrderByDescending(e => e.Key);
            searchResults.Total = allResults.Count();
            searchResults.Results = allResults.Skip((pageNo - 1) * itemsPerPage).Take(itemsPerPage).ToList();
        }
    }

    return searchResults;
}

The search itself is nothing more than a simple Contains() on predetermined searchable fields. All nodes in the XML have ‘nodeName’ attributes, and all content nodes that we wish to search on were given ‘description’ elements. Once the subset of results is retrieved based on the input keyword, I count the occurrences of the keyword in each result and create a KeyValuePair of the count and the result model. Ordering this list by the greatest count first is a good proxy for how relevant a result is. Finally, I return 10 items per page and hand it back to the controller for passing to the view.
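
For completeness, the models referenced above are trivial; sketched from how they are used, they look something like this (any extra properties on the real classes are not shown):

// The result models, sketched from their usage above; the real classes
// may carry additional properties.
using System.Collections.Generic;

public class SearchResult
{
    public string Title { get; set; }
    public string Summary { get; set; }
    public string FullUrl { get; set; }
}

public class SearchResults
{
    // Key = keyword occurrence count (relevance), Value = the result itself.
    public List<KeyValuePair<int, SearchResult>> Results { get; set; }
    public int Total { get; set; }
}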

You will note the ExcludedNodeTypes list; this is a string array of node type aliases we wish to exclude from search results, such as folders and content blocks. Should the search need to be expanded to cover content from blocks related to a page, I would extend it to parse the node IDs from each page’s ‘contentBlocks’ property (a multi-node picker), find those nodes in the XML and run the keyword search over them as well, as sketched below. Each ‘hit’ would increase the relevancy of the page itself, boosting it in the results.
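
Were that extension needed, the gist would be something like the sketch below. It assumes the ‘contentBlocks’ picker stores a comma-separated list of node IDs, reuses the hypothetical helpers sketched earlier, and is illustrative rather than production code:

// Hypothetical extension: score a page's related content blocks as well.
// Parse the multi-node picker value, find each block in the XML cache and
// add any keyword hits to the page's relevance count.
// Requires System.Linq and System.Xml.Linq; keyword is assumed to already be
// lowercased by the caller, as in Search above.
private static int CountBlockHits(XElement page, string keyword)
{
    var pickerValue = page.ElementAsString("contentBlocks");
    if (string.IsNullOrEmpty(pickerValue))
        return 0;

    var hits = 0;
    foreach (var idText in pickerValue.Split(','))
    {
        int id;
        if (!int.TryParse(idText.Trim(), out id))
            continue;

        // Locate the block node by its id attribute in the same XML cache.
        var block = UmbracoXmlCache.Context.Descendants()
            .FirstOrDefault(e => e.AttributeAsInt("id") == id);
        if (block == null)
            continue;

        var text = string.Join(" ",
            block.AttributeAsString("nodeName"),
            (block.ElementAsString("body") ?? string.Empty).StripHtml()).ToLower();

        hits += text.Split(' ').Count(w => w.StartsWith(keyword));
    }

    return hits;
}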

The second part of the search is the type-ahead. In the past I have used jQuery UI’s Autocomplete with much success, but decided to try something new. typeahead.js comes from the team at Twitter, so I knew it would perform well. It has a slightly steeper ‘getting started’ curve, but I think it will ultimately prove more flexible should I need to expand upon the search.

    // Bloodhound is the suggestion engine that ships with typeahead.js; each
    // suggestion comes back from the quick-search endpoint as a { value, url } object.
    var searchEngine = new Bloodhound({
        datumTokenizer: function (d) {
            // Tokenise the suggestion's display text so Bloodhound can match on it.
            return Bloodhound.tokenizers.whitespace(d.value);
        },
        queryTokenizer: Bloodhound.tokenizers.whitespace,
        remote: '/quicksearch/search/?keyword=%QUERY'
    });

    searchEngine.initialize();

    $('.typeahead').typeahead(null, {
        displayKey: 'value',              // the text shown in the dropdown
        source: searchEngine.ttAdapter()
    })
    .bind('typeahead:selected', function (obj, datum, name) {
        // Jump straight to the selected page.
        document.location.href = datum.url;
    });

The MVC path /quicksearch/search/ takes us to our search controller action, which returns JSON formatted from the initial search results into a list of ‘value’ and ‘url’ fields (value being the string shown in the type-ahead dropdown). Binding to ‘typeahead:selected’ simply lets the user click a result (or press enter) and go directly to that page’s ‘url’. The search also falls back to a regular plain-text search that takes the user to the search results page, ordered by relevancy.
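
The controller end of that quick search is small; roughly along these lines (the controller name and the SearchRepository class name are illustrative, not taken from the project):

// Rough sketch of the quick-search action: reuse the repository search and
// project the top hits into the { value, url } shape typeahead.js expects.
using System.Linq;
using System.Web.Mvc;

public class QuickSearchController : Controller
{
    public ActionResult Search(string keyword)
    {
        // First page, ten suggestions is plenty for a type-ahead dropdown.
        var results = SearchRepository.Search(keyword, 1, 10);

        var suggestions = results.Results.Select(r => new
        {
            value = r.Value.Title,
            url = r.Value.FullUrl
        });

        return Json(suggestions, JsonRequestBehavior.AllowGet);
    }
}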

I am very happy with how easy this approach to searching content is to work with, and the ~18ms page loads don’t lie!