This post does not cover every aspect of Elasticsearch; it is a short introduction to geospatial features in the engine.

I. A few words about Elasticsearch

Elasticsearch, like Apache Solr, is a Lucene-based search engine. It tends to be more flexible, more modern, and easier to get started with than Solr. For a feature-by-feature comparison, see Solr vs Elasticsearch.

Some strengths of Elasticsearch:

  • Schemaless
    • You avoid heavy upfront schema work: Elasticsearch infers basic field types from the documents you send, so you can start indexing soon after install. For non-basic types such as geo_point and geo_shape, you still need explicit mapping.
  • RESTful API
    • Create, update, and delete indices over HTTP (GET, POST, DELETE, PUT), with JSON bodies instead of query-string-only GET parameters.
  • Distributed (without extra cluster software such as Apache ZooKeeper)
  • Near real-time search.

To revisit how indexing works in search engines, see the earlier posts on the inverted index and on the vector space model (Vietnamese). For installing Elasticsearch, see How To Install Elasticsearch on an Ubuntu VPS. For a broader tour, see Elasticsearch – Awesome search and index engine.

II. Location search in Elasticsearch

Elasticsearch supports field types such as geo_point and geo_shape, plus filters and aggregations for problems like “nearest points” or “how many points fall in this region.”

Example index mapping:

curl -XPUT http://localhost:9200/business -d '
{
    "mappings": {
        "restaurant": {
            "properties": {
                "name": {
                    "type": "string"
                },
                "location": {
                    "type": "geo_point",
                    "geohash": true,
                    "geohash_prefix": true
                },
                "address": {
                    "type": "string"
                }
            }
        }
    }
}'

Sample locations:

{:.table.table-bordered}

| name | lat | lon | geohash | address |
|---|---|---|---|---|
| Beafsteak Nam Sơn | 10.775365 | 106.690952 | w3gv7dv8xfep | 200 Bis Nguyễn Thị Minh Khai, P. 6, Quận 3 |
| Đo Đo Quán | 10.768050 | 106.688704 | w3gv7b227jbp | 10/14 Lương Hữu Khánh, P. Phạm Ngũ Lão, Quận 1 |
| Chè Hà Ký | 10.754105 | 106.658514 | w3gv5jdr5qxb | 138 Châu Văn Liêm, P. 11, Quận 5 |
| Cơm Gà Đông Nguyên | 10.755465 | 106.652302 | w3gv5j4tmxxu | 89-91 Châu Văn Liêm, P. 14, Quận 5 |
| Nhà Hàng Sân Vườn Bên Sông | 10.831478 | 106.724668 | w3gvsef9bvzc | 7/3 Kha Vạn Cân, P. Hiệp Bình Chánh, Quận Thủ Đức |
| Lẩu Dê Bình Điền | 10.869835 | 106.763260 | w3gvv6y9kk0e | 1296C Kha Vạn Cân, Quận Thủ Đức |

You only need to send lat and lon; Elasticsearch derives geohash for you.
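
To make that concrete, here is a minimal Python sketch that turns a few of the sample venues into a newline-delimited body for the _bulk endpoint. It assumes the index and type names from the mapping above (business, restaurant); the VENUES list, the bulk_body helper, and the sequential document ids are hypothetical names chosen for illustration. Note that no geohash field is sent.

```python
import json

# A few venues from the sample table; only name, lat/lon, and address are sent.
VENUES = [
    {"name": "Beafsteak Nam Sơn", "lat": 10.775365, "lon": 106.690952,
     "address": "200 Bis Nguyễn Thị Minh Khai, P. 6, Quận 3"},
    {"name": "Chè Hà Ký", "lat": 10.754105, "lon": 106.658514,
     "address": "138 Châu Văn Liêm, P. 11, Quận 5"},
    {"name": "Lẩu Dê Bình Điền", "lat": 10.869835, "lon": 106.763260,
     "address": "1296C Kha Vạn Cân, Quận Thủ Đức"},
]


def bulk_body(venues, index="business", doc_type="restaurant"):
    """Build a _bulk body: one action line followed by one source line per venue."""
    lines = []
    for doc_id, venue in enumerate(venues, start=1):
        # Sequential ids are an assumption of this sketch, not a requirement.
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        source = {
            "name": venue["name"],
            "location": {"lat": venue["lat"], "lon": venue["lon"]},
            "address": venue["address"],
        }
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk format expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(bulk_body(VENUES), end="")
```

One way to use it would be to save the output to a file and send it with curl -XPOST localhost:9200/_bulk --data-binary @venues.json, where venues.json is a hypothetical file name.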

Geo sort

Sort venues by distance from a known latitude/longitude (nearest first):

curl -XPOST "http://localhost:9200/business/restaurant/_search?pretty=1" -d'
{
   "query" : {
        "match_all" : {}
    },
    "sort" : [
        {
            "_geo_distance" : {
                "location" : {
                    "lat" : 10.776945451753402,
                    "lon" : 106.69494867324829
                },
                "order" : "asc",
                "unit" : "km",
                "distance_type" : "arc"
            }
        }
    ]
}'

Geo filter

Standing at Independence Palace (10.776945451753402, 106.69494867324829), we want venues within 4 km (the example uses 4 km so the circle does not reach District 5; 5 km would):

curl -XGET "http://localhost:9200/business/restaurant/_search?pretty=1" -d'
{
    "filter" : {
        "geo_distance" : {
            "location" : {
                "lat" : 10.776945451753402,
                "lon" : 106.69494867324829
            },
            "distance": "4km",
            "distance_type": "arc"
        }
    }
}'

Elasticsearch returns hits inside that 4 km radius from the given point.
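
Both the geo sort and the geo filter rest on great-circle ("arc") distance. The Python sketch below reproduces the idea outside Elasticsearch with the haversine formula, which is a stand-in chosen for this illustration and not necessarily the formula Elasticsearch uses internally. The Earth radius constant and the helper names (haversine_km, nearest_first, within) are assumptions of the sketch; the coordinates come from the sample table.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

PALACE = (10.776945451753402, 106.69494867324829)  # Independence Palace

# (lat, lon) pairs from the sample table.
VENUES = {
    "Beafsteak Nam Sơn": (10.775365, 106.690952),
    "Đo Đo Quán": (10.768050, 106.688704),
    "Chè Hà Ký": (10.754105, 106.658514),
    "Cơm Gà Đông Nguyên": (10.755465, 106.652302),
    "Nhà Hàng Sân Vườn Bên Sông": (10.831478, 106.724668),
    "Lẩu Dê Bình Điền": (10.869835, 106.763260),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearest_first(origin, venues):
    """Venue names ordered by distance from origin, like the geo sort."""
    return sorted(venues, key=lambda name: haversine_km(origin, venues[name]))


def within(origin, venues, radius_km):
    """Venue names inside the radius, nearest first, like the geo filter."""
    return [name for name in nearest_first(origin, venues)
            if haversine_km(origin, venues[name]) <= radius_km]


if __name__ == "__main__":
    for name in nearest_first(PALACE, VENUES):
        print(f"{haversine_km(PALACE, VENUES[name]):6.2f} km  {name}")
    print(within(PALACE, VENUES, 4))
```

With these coordinates, only the two venues in Districts 3 and 1 fall inside 4 km; Chè Hà Ký in District 5 sits at roughly 4.7 km, which is why a 5 km radius would include it.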

Geo aggregation

Note: aggregation APIs require Elasticsearch 1.0.0 or newer.

Example: bucket documents by geohash cells that share the same first five characters, a coarse “same neighborhood” bucket (roughly 4.9 km x 4.9 km at that precision near the equator; exact cell size depends on latitude).

curl -XGET "http://localhost:9200/business/restaurant/_search?pretty=1" -d'
{
    "size": 0,
    "aggregations" : {
        "restaurant-geohash" : {
            "geohash_grid" : {
                "field" : "location",
                "precision" : 5
            }
        }
    }
}'

Sample response:

{
  ...
  "aggregations" : {
    "restaurant-geohash" : {
      "buckets" : [ {
        "key" : "w3gv7",
        "doc_count" : 2
      }, {
        "key" : "w3gv5",
        "doc_count" : 2
      }, {
        "key" : "w3gvv",
        "doc_count" : 1
      }, {
        "key" : "w3gvs",
        "doc_count" : 1
      } ]
    }
  }
}
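
Conceptually, this aggregation groups documents by a geohash prefix. The small Python sketch below mimics it by truncating the geohash strings listed in the sample table to the requested precision and counting them. The geohash_grid function here is a hypothetical local helper, not the Elasticsearch implementation, and the order of buckets with equal counts is not meant to match the server's.

```python
from collections import Counter

# Geohash values as listed in the sample table.
GEOHASHES = [
    "w3gv7dv8xfep",  # Beafsteak Nam Sơn
    "w3gv7b227jbp",  # Đo Đo Quán
    "w3gv5jdr5qxb",  # Chè Hà Ký
    "w3gv5j4tmxxu",  # Cơm Gà Đông Nguyên
    "w3gvsef9bvzc",  # Nhà Hàng Sân Vườn Bên Sông
    "w3gvv6y9kk0e",  # Lẩu Dê Bình Điền
]


def geohash_grid(geohashes, precision):
    """Count documents per geohash prefix of the given length, largest buckets first."""
    counts = Counter(g[:precision] for g in geohashes)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common()]


if __name__ == "__main__":
    for bucket in geohash_grid(GEOHASHES, 5):
        print(bucket)
```

Lowering the precision to 4 collapses all six venues into a single w3gv bucket, which shows how the precision parameter trades detail for coarser grouping.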

You can combine geo features with text queries, from simple match queries to fuzzier ones:

Full-text match (the match query analyzes the text and matches on its terms, not on the exact string):

curl -XGET 'localhost:9200/business/restaurant/_search?size=50&pretty=1' -d '
{
    "query": {
        "match": {"name": "Lẩu Dê Bình Điền"}
    }
}'

Approximate match:

curl -XGET 'localhost:9200/business/restaurant/_search?size=50&pretty=1' -d '
{
    "query": {
        "fuzzy_like_this" : {
            "fields" : ["address", "name"],
            "like_text" : "De Thu Duc",
            "max_query_terms" : 12
        }
    }
}'
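
Fuzzy matching of this kind is generally based on edit distance between terms. As a rough illustration of why the unaccented input "De Thu Duc" can still land near "Dê" and "Thủ Đức", the Python sketch below computes the Levenshtein distance between the typed and indexed words. This is only a sketch of the underlying idea: how Elasticsearch selects, expands, and scores terms is more involved, and the edit_distance helper is a hypothetical name.

```python
import unicodedata


def edit_distance(a, b):
    """Levenshtein distance between two words, compared case-insensitively."""
    # NFC keeps each accented Vietnamese letter as a single code point.
    a = unicodedata.normalize("NFC", a.lower())
    b = unicodedata.normalize("NFC", b.lower())
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


if __name__ == "__main__":
    for typed, indexed in [("De", "Dê"), ("Thu", "Thủ"), ("Duc", "Đức")]:
        print(typed, indexed, edit_distance(typed, indexed))
```

Each typed word is only one or two single-character substitutions away from its accented form, which is the kind of gap a fuzzy query is designed to bridge.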

III. What is geohash?

Normally you locate a point with longitude and latitude. Geohash is a base-32 encoding that represents the same information as a compact alphanumeric string instead of two decimal numbers. The world is subdivided into labeled cells, using the digits 0–9 and the lowercase letters except a, i, l, and o. For example, Independence Palace is w3gv7cvnryzz at (10.776945451753402, 106.69494867324829).

Precision matters: truncating the coordinates to 10.77 and 106.69 shifts the point by roughly 0.9 km in a straight line, for example to alley 150 Nguyen Trai instead of 8 Huyen Tran Cong Chua. You can verify distances in Google Maps.

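Places that fall in the same length-4 cell as Independence Palace (roughly 39 km x 20 km) share the prefix w3gv, which makes geohash attractive for “near this point” queries backed by an inverted index. Two nearby points on opposite sides of a cell boundary can still have different prefixes, so prefix matching is a coarse first pass rather than an exact distance test. Like raw coordinates, longer geohashes mean finer precision.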
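
The encoding itself is short enough to sketch. The Python function below follows the standard geohash scheme: it alternately halves the longitude and latitude ranges (longitude first), records one bit per halving, and emits a base-32 character for every five bits. The function name geohash_encode is a hypothetical choice for this sketch; it is not taken from Elasticsearch or any particular library.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet: no a, i, l, o


def geohash_encode(lat, lon, length=12):
    """Encode a latitude/longitude pair as a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < length:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # upper half: emit 1 and raise the lower bound
            rng[0] = mid
        else:
            bits <<= 1              # lower half: emit 0 and drop the upper bound
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:          # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    palace = geohash_encode(10.776945451753402, 106.69494867324829)
    venue = geohash_encode(10.775365, 106.690952)  # Beafsteak Nam Sơn
    print(palace, venue)
```

Both points, which are a few hundred metres apart, come out with the shared prefix w3gv7, and shortening the length argument simply truncates the string to a coarser cell.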

{:.table.table-bordered}

| GeoHash length | Cell size (width x height) |
|---|---|
| 1 | 5,009.4km x 4,992.6km |
| 2 | 1,252.3km x 624.1km |
| 3 | 156.5km x 156km |
| 4 | 39.1km x 19.5km |
| 5 | 4.9km x 4.9km |
| 6 | 1.2km x 609.4m |
| 7 | 152.9m x 152.4m |
| 8 | 38.2m x 19m |
| 9 | 4.8m x 4.8m |
| 10 | 1.2m x 59.5cm |
| 11 | 14.9cm x 14.9cm |
| 12 | 3.7cm x 1.9cm |
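
These figures follow from how the bits are split. Each geohash character carries five bits, shared alternately between longitude and latitude with longitude going first, so a cell's east-west width and north-south height can be derived directly. The Python sketch below does that arithmetic; the kilometres-per-degree constant is a rough equatorial approximation assumed for this sketch, so the results only approximate the table, and the function names are hypothetical.

```python
KM_PER_DEGREE = 111.32  # approximate length of one degree at the equator (assumption)


def cell_size_degrees(length):
    """Return (width, height) in degrees of a geohash cell of the given length."""
    total_bits = 5 * length
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when the total is odd
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


def cell_size_km(length):
    """Approximate (width, height) in km of a geohash cell near the equator."""
    width_deg, height_deg = cell_size_degrees(length)
    return width_deg * KM_PER_DEGREE, height_deg * KM_PER_DEGREE


if __name__ == "__main__":
    for length in range(1, 13):
        width, height = cell_size_km(length)
        print(f"{length:2d}  {width:.4f} km x {height:.4f} km")
```

Odd lengths give cells that are square in degrees, while even lengths give cells twice as wide as they are tall. Away from the equator the width in kilometres shrinks with the cosine of the latitude.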

IV. Conclusion

Elasticsearch is a practical, powerful search stack—not only for classic full-text search but also for spatial problems. It is quick to prototype yet solid enough for long-running location-based services. Foursquare was an early mover, migrating from Solr to Elasticsearch in August 2012. Other teams such as GitHub and SoundCloud also rely on Elasticsearch for search.

References

Elasticsearch

  1. Geo distance filter — Distance filters on geo_point fields (legacy guide URL; see current Elasticsearch documentation for newer releases).

  2. Geohash grid aggregation — Bucketing documents by geohash cells.

Tools

  1. geohash.gofreerange.com — Interactive geohash explorer.

  2. geohash-js — JavaScript geohash encoder/decoder.

Articles

  1. Gauth (2012). Find closest subway station with Elasticsearch.

  2. Florian Hopf (2014). Use cases for Elasticsearch: Geospatial search.

  3. DigitalOcean Community. How To Install Elasticsearch on an Ubuntu VPS.

  4. Foursquare Engineering (2012). Foursquare now uses Elasticsearch.