ElasticSearch-Part2-Populate and Search

In the last post we discussed how we would configure the indices for ChanderYoga. This post will describe the process of actually creating the indices, populating the indices, and creating ElasticSearch requests that fulfill the feature requirements.

Creating an index in ElasticSearch is a simple process. If the feature requirements are satisfied by the default configuration, then you simply add a document and the index will be automatically created. 
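As a rough illustration of that default behaviour, here is a minimal sketch that builds (but does not send) a request to index a single document. The class name, the "post" type, and the document id are hypothetical and only used for illustration; the URL shape assumes the typed /{index}/{type}/{id} document endpoint used throughout this series.

```csharp
using System;
using System.Net.Http;
using System.Text;

public static class AutoCreateIndexExample
{
    // Builds a PUT request that indexes one document. If the target index does not
    // exist yet, ElasticSearch creates it with the default configuration.
    // The index/type/id values passed in are illustrative assumptions.
    public static HttpRequestMessage CreateIndexDocumentRequest(
        Uri clusterUri, string index, string type, string id, string documentJson)
    {
        if (clusterUri == null) throw new ArgumentNullException(nameof(clusterUri));
        if (string.IsNullOrWhiteSpace(documentJson)) throw new ArgumentNullException(nameof(documentJson));

        // Escape each path segment so unusual ids cannot break the URL.
        string path = string.Join("/",
            Uri.EscapeDataString(index),
            Uri.EscapeDataString(type),
            Uri.EscapeDataString(id));

        var request = new HttpRequestMessage(HttpMethod.Put, new Uri(clusterUri, path));
        request.Content = new StringContent(documentJson, Encoding.UTF8, "application/json");
        return request;
    }
}
```

Sending the result with HttpClient.SendAsync against a local cluster, for example with index "blog", type "post", and id "1", should be enough to bring a default-configured index into existence.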

For ChanderYoga I will be creating an index per resource. This will allow me to change the analysis and settings for each index individually and reindex the data if necessary without affecting the other resources.

In the downloadable code base, the indices are created and populated at Application_Start() if they don’t already exist.

First, let’s create the blog index using the configuration we created in Part 1. For the sake of brevity I have removed the body of the analysis and the mappings in the example below.

POST http://localhost:9200/blog
{
    "settings": {
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 0
        },
        "analysis": { ... }
    },
    "mappings": {...}
}

The next step is populating the indices using the bulk API. The bulk API accepts multiple types of actions; the most useful for the initial population of indices is the “index” action. The body of the request has specific format requirements. Each action starts with a line that declares the action type, the index, the document type, and the document id (optional, except in update/delete actions). The line after it (not required for a delete action) holds the actual contents of the action, such as the document to index. Every line, including the last one, must end in a newline character.

{ "index": { "_index": "asanas", "_type": "asana", "_id": "1" }}
{ "field1": "value1", "field2": "value2" }

I have constructed some C# classes to handle creating a bulk api request. Below are the “BulkRequest”, “BulkActionBase”, and “IndexBulkAction” classes.

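using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Newtonsoft.Json;

public class BulkRequest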
{
    private readonly string _actionsJson;

    public readonly IElasticSearchUriProvider UriProvider;
    public readonly IEnumerable<BulkActionBase> Actions;

    private BulkRequest(IElasticSearchUriProvider uriProvider)
    {
        if (uriProvider == null)
        {
            throw new ArgumentNullException("uriProvider");
        }

        UriProvider = uriProvider;
    }

    public BulkRequest(IElasticSearchUriProvider uriProvider, IEnumerable<BulkActionBase> actions)
        : this(uriProvider)
    { 
        if(actions == null || !actions.Any())
        {
            throw new ArgumentNullException("actions");
        }

        Actions = actions;
    }

    /// <summary>
    /// Only use this constructor if you are sure that you have the correctly formatted request in string form already.
    /// </summary>
    /// <param name="actionsJson"></param>
    public BulkRequest(IElasticSearchUriProvider uriProvider, string actionsJson)
        : this(uriProvider)
    {
        if (string.IsNullOrWhiteSpace(actionsJson))
        {
            throw new ArgumentNullException("actionsJson");
        }

        // Trim and add a newline character at the end of the string just in case the user forgot.
        _actionsJson = actionsJson.Trim() + "\r\n";
    }

    public override string ToString()
    {
        StringBuilder builder = new StringBuilder(_actionsJson);
        if (builder.Length == 0)
        {
            foreach (BulkActionBase action in Actions)
            {
                builder.AppendLine(action.ToString());
            }
        }

        return builder.ToString();
    }
}

public abstract class BulkActionBase
{
    protected abstract string Action { get; }

    public readonly string Index;
    public readonly string Type;
    public readonly string DocumentId;

    protected BulkActionBase(string index, string type) 
    {
        if (string.IsNullOrWhiteSpace(index))
        {
            throw new ArgumentNullException("index");
        }

        if (string.IsNullOrWhiteSpace(type))
        {
            throw new ArgumentNullException("type");
        }

        Index = index;
        Type = type;
    }

    protected BulkActionBase(string index, string type, string documentId)
        : this(index, type)
    {
        if (string.IsNullOrWhiteSpace(documentId))
        {
            throw new ArgumentNullException("documentId");
        }

        DocumentId = documentId;
    }

    public override string ToString()
    {
        string result = null;
        if(string.IsNullOrWhiteSpace(DocumentId))
        {
            result = string.Format("{{\"{0}\":{{\"_index\":\"{1}\",\"_type\":\"{2}\"}}}}", Action, Index, Type);
        }
        else
        {
            result = string.Format("{{\"{0}\":{{\"_index\":\"{1}\",\"_type\":\"{2}\",\"_id\":\"{3}\"}}}}", Action, Index, Type, DocumentId);
        }

        return result;
    }
}

public class IndexBulkAction<T> : BulkActionBase where T : class
{
    protected override string Action
    {
        get { return "index"; }
    }

    public readonly T Document;

    public IndexBulkAction(string index, string type, T document)
        : base(index, type)
    {
        // Validate the constructor argument, not the still-unassigned Document field.
        if (document == null)
        {
            throw new ArgumentNullException("document");
        }

        Document = document;
    }

    public IndexBulkAction(string index, string type, string documentId, T document)
        : base(index, type, documentId)
    {
        // Validate the constructor argument, not the still-unassigned Document field.
        if (document == null)
        {
            throw new ArgumentNullException("document");
        }

        Document = document;
    }

    public override string ToString()
    {
        StringBuilder builder = new StringBuilder();
        builder.AppendLine(base.ToString());
        builder.Append(JsonConvert.SerializeObject(Document));

        return builder.ToString();
    }
}

The override of ToString() in each of these classes is an easy way to create the necessary JSON for a bulk API request.

After constructing the body of the request we are ready to perform a POST request against the cluster, which will look something like this:

POST http://localhost:9200/_bulk
{ "index": { "_index": "asanas", "_type": "asana", "_id": "1" }}
{ "field1": "value1", "field2": "value2" }
{ "index": { "_index": "asanas", "_type": "asana", "_id": "2" }}
{ "field1": "value1", "field2": "value2" }
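One way to issue that POST from C# is sketched below. This is only an illustration, not the code from the downloadable project: the BulkHttp class and CreateBulkRequest method are hypothetical names, and the "application/x-ndjson" content type is an assumption that matters mainly on newer ElasticSearch versions, which check the Content-Type header.

```csharp
using System;
using System.Net.Http;
using System.Text;

public static class BulkHttp
{
    // Builds (but does not send) the POST request for the _bulk endpoint.
    // bulkBody is the newline-delimited action/document text, for example
    // the output of BulkRequest.ToString().
    public static HttpRequestMessage CreateBulkRequest(Uri clusterUri, string bulkBody)
    {
        if (clusterUri == null) throw new ArgumentNullException(nameof(clusterUri));
        if (string.IsNullOrWhiteSpace(bulkBody)) throw new ArgumentNullException(nameof(bulkBody));

        // The bulk API expects the final line to end with a newline as well.
        if (!bulkBody.EndsWith("\n", StringComparison.Ordinal))
        {
            bulkBody += "\n";
        }

        var request = new HttpRequestMessage(HttpMethod.Post, new Uri(clusterUri, "_bulk"));
        // Assumption: newline-delimited JSON content type.
        request.Content = new StringContent(bulkBody, Encoding.UTF8, "application/x-ndjson");
        return request;
    }
}
```

The request can then be passed to HttpClient.SendAsync. Keeping the construction separate from the sending makes it easy to inspect the URL and body without a running cluster.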

It is finally time to craft search requests to find our data. Let’s review our requirements to make sure we satisfy them with the search requests.

Feature: A single text box through which a user can search over asanas, mudras, pranayamas, and blog posts.

  • All data visible to a user should be searchable.
  • The user should be able to see the type of resource for each search result. 
  • Fields like Name and Title should be more important to the search results than fields like Tags or Categories, and the least important fields should be large text fields.

To simplify the process of creating and maintaining search requests, as well as handling the response from the server, I will be making a search request per resource.

First, I will tackle the most complex resource, the Asana resource. 

Model Property      JSON Field       Analyzed Search Field
Id                  id               N/A
Author              author           author
SanskritName        sanskrit_name    name
EnglishName         english_name     name
Description         description      text
Sequence            sequence         text
Benefits            benefits         text
Breathing           breathing        text
Chakra              chakra           text
Categories          categories       tags
AnatomyFocuses      anatomy_focus    tags
Contraindications   contras          tags
Therapeutics        therapeutics     tags

Many of the fields use “copy_to” to have their values analyzed and searchable through the name, text, or tags fields. This leaves us with three categories of searchable content in the Asana index, ranked by importance: name > tags/author > text. Using the multi_match query we can search against multiple fields at once, each using its own configured analysis, and apply a “boost” value per field by appending “^n” to the field name, like “name^10”. Boosting makes any match on that field worth approximately n times more than a match on a field without a specified boost value.

{
  "from":0,
  "size":5,
  "query":{
    "multi_match":{
      "query":"chander",
      "fields":[
        "name^10",
        "name.starts_with^7",
        "name.contains_shingle^3",
        "tags^8",
        "tags.starts_with^6",
        "tags.contains_shingle^2",
        "text^7",
        "text.starts_with^4",
        "text.contains_shingle",
        "author^8",
        "author.starts_with^5",
        "author.contains_shingle^2"
      ]
    }
  }
}

This query searches for full, starts-with, and contains matches, with each kind of match worth more than the next. So if the user has searched for “chander” and “Chander Dhall” is the author of an asana, then it is very likely that document will receive a high relevancy score in relation to other asanas. If the word “chander” also exists in the name of the asana, that match would be worth even more.
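Since the same request shape is reused for every resource, it can help to generate it from a list of fields and boosts. The following is a minimal sketch under a few assumptions: the MultiMatchSearch class and its method names are hypothetical, and it uses System.Text.Json rather than the Json.NET serializer used elsewhere in this series, purely to keep the example free of external packages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class MultiMatchSearch
{
    // Formats one entry of the "fields" array: "name^10" when a boost is given,
    // plain "name" when it is not.
    public static string BoostedField(string field, int? boost)
    {
        return boost.HasValue ? field + "^" + boost.Value : field;
    }

    // Serializes a multi_match search request with paging, mirroring the JSON above.
    public static string BuildRequest(
        string queryText,
        IEnumerable<(string Field, int? Boost)> fields,
        int offset = 0,
        int pageSize = 5)
    {
        if (string.IsNullOrWhiteSpace(queryText)) throw new ArgumentNullException(nameof(queryText));
        if (fields == null) throw new ArgumentNullException(nameof(fields));

        var body = new
        {
            // "@from" escapes the contextual keyword; the JSON property is still "from".
            @from = offset,
            size = pageSize,
            query = new
            {
                multi_match = new
                {
                    query = queryText,
                    fields = fields.Select(f => BoostedField(f.Field, f.Boost)).ToArray()
                }
            }
        };

        return JsonSerializer.Serialize(body);
    }
}
```

Calling BuildRequest("chander", ...) with the twelve field/boost pairs listed above should produce an equivalent request body, which could then be posted to the index's _search endpoint, for example http://localhost:9200/asanas/_search.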

Let’s apply this to the other resources:

Mudras:

Model Property      JSON Field       Analyzed Search Field
Id                  id               N/A
Author              author           author
Name                name             name
Description         description      text
Sequence            sequence         text
Benefits            benefits         text

{
  "from":0,
  "size":5,
  "query":{
    "multi_match":{
      "query":"chander",
      "fields":[
        "name^10",
        "name.starts_with^7",
        "name.contains_shingle^3",
        "text^7",
        "text.starts_with^4",
        "text.contains_shingle",
        "author^8",
        "author.starts_with^5",
        "author.contains_shingle^2"
      ]
    }
  }
}

Pranayamas:

Model Property      JSON Field       Analyzed Search Field
Id                  id               N/A
Author              author           author
Name                name             name
Description         description      text
Sequence            sequence         text
Benefits            benefits         text

{
  "from":0,
  "size":5,
  "query":{
    "multi_match":{
      "query":"chander",
      "fields":[
        "name^10",
        "name.starts_with^7",
        "name.contains_shingle^3",
        "text^7",
        "text.starts_with^4",
        "text.contains_shingle",
        "author^8",
        "author.starts_with^5",
        "author.contains_shingle^2"
      ]
    }
  }
}

Blog Posts:

Model Property      JSON Field       Analyzed Search Field
Id                  id               N/A
Author              author           author
Title               title            title
Text                text             text
Tags                tags             tags
Benefits            benefits         text

{
  "from":0,
  "size":5,
  "query":{
    "multi_match":{
      "query":"chander",
      "fields":[
        "title^10",
        "title.starts_with^7",
        "title.contains_shingle^3",
        "tags^8",
        "tags.starts_with^6",
        "tags.contains_shingle^2",
        "text^7",
        "text.starts_with^4",
        "text.contains_shingle",
        "author^8",
        "author.starts_with^5",
        "author.contains_shingle^2"
      ]
    }
  }
}

In Part 3 we will cover how to use these search requests and handle the ElasticSearch responses in a C# .NET environment.

Copyright © 2017 Chander Dhall, Inc. • All Rights Reserved