Whitelist / Blacklist searching in Elasticsearch

May 2018

How do we match a large number of documents against a dynamic whitelist/blacklist in Elasticsearch?

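Use the terms query with a terms lookup. Instead of listing the values inline, the query fetches them from a field of another document at search time, so the list can change without rewriting the query.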

Suppose we have an index of page access logs like so:

PUT /mybeat-2018/_doc/1
{
    "host" : "elastic.co",
    "ttl" : 40
}

PUT /mybeat-2018/_doc/2
{
    "host" : "elastic.co",
    "ttl" : 666
}

PUT /mybeat-2018/_doc/3
{
    "host" : "google.com",
    "ttl" : 55
}

and a separate whitelist index holding a document that lists the allowed hosts. This list can grow or shrink independently of the logs:

PUT /whitelist/_doc/1
{
    "hosts" : [
        {
            "name" : "elastic.co"
        },
        {
            "name" : "twitter.com"
        }
    ]
}
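Growing or shrinking the whitelist only means updating that one document. The requests below are a sketch of one way to do it with a Painless update script, not the only approach: the first appends a host, the second removes every entry with a given name. The host github.com is a hypothetical example, and the URL form assumes the same 6.x-era API as the rest of this post (on 7.x and later the endpoint is POST /whitelist/_update/1).

# Append a host (hypothetical value) to the hosts array of whitelist document 1
POST /whitelist/_doc/1/_update
{
    "script" : {
        "source" : "ctx._source.hosts.add(params.host)",
        "params" : {
            "host" : { "name" : "github.com" }
        }
    }
}

# Remove every entry whose name matches the given host
POST /whitelist/_doc/1/_update
{
    "script" : {
        "source" : "ctx._source.hosts.removeIf(h -> h.name == params.name)",
        "params" : {
            "name" : "github.com"
        }
    }
}

Because the search below reads the whitelist document when it runs, neither request requires any change to the query.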

To search mybeat-* for documents whose host is in the whitelist, point a terms lookup at the whitelist document (here, the document with id 1) and give the path of the field that holds the values:

GET /mybeat-*/_search
{
    "query" : {
        "terms" : {
            "host" : {
                "index" : "whitelist",
                "type" : "_doc",
                "id" : "1",
                "path" : "hosts.name"
            }
        }
    }
}
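Against the data above, this query should match documents 1 and 2 (elastic.co) but not document 3 (google.com). The post only walks through the whitelist case, and a blacklist can be sketched as the same lookup wrapped in the must_not clause of a bool query. A bool query with only a must_not clause matches every document except the excluded ones. The blacklist index is hypothetical and is assumed to have the same hosts.name layout as the whitelist. As in the whitelist query, "type" : "_doc" follows the 6.x-era API. It is deprecated in 7.x and must be omitted in 8.x.

GET /mybeat-*/_search
{
    "query" : {
        "bool" : {
            "must_not" : {
                "terms" : {
                    "host" : {
                        "index" : "blacklist",
                        "type" : "_doc",
                        "id" : "1",
                        "path" : "hosts.name"
                    }
                }
            }
        }
    }
}

If, for example, blacklist document 1 listed only google.com, this search would return documents 1 and 2 and drop document 3.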