Renaming an index in Elastic (slight hacks)

We've all been there .. the pet project has grown, the index is too large and you want to use fancy features like aliases with write-active indices, but you cannot reuse the name of the index for your new alias - so you'll have to modify your application. That sucks. The internet tells you it can't be done - this blog post describes some metadata manipulation that can achieve a rename, at a slight risk of losing everything.

Alternatives & risk

First things first - always consider the official methods before anything else: reindexing into a new index, snapshotting and restoring, or splitting the index into more shards.

If all else fails, read on - but any advice here is anecdotal at best.

.. something something backups .. .. seriously though. If it would hurt to lose, it's worth backing up (snapshots).

Method

The Elasticsearch data store uses a number of files to store both metadata and data. We're specifically interested in the metadata files, as these contain all the settings that make up an index. The state files have the extension `.st`. Each state file is additionally protected by a checksum, which we'll have to reproduce once the file has been edited.

  1. Shut down the Elasticsearch node.
  2. In the data directory, identify all files that contain the index name in question.
    • We can do this with a grep: grep -rF --include '*.st' 'my-old-index'
    • From testing - the state-*.st file within an index folder will always match.
    • Sometimes, the index name also appears within a global-*.st state file.
  3. Open each file using a hex editor
    • Preferably one capable of calculating CRC32 checksums. I use HxD.
  4. Substitute the my-old-index name with the new my-new-index name in each file.
    • Note that the two names are of equal length. This is so we don't upset any length or offset identifiers in the file.
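  5. Recalculate the CRC32 checksum and write it back into the end of the file.
    • Calculate it over all but the last 8 bytes of the file.
    • The result goes into the last 4 bytes; the 4 bytes before it stay null.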
  6. Once done - rerun the grep to ensure that all files have been edited - this grep should not find anything.
  7. Restart the Elasticsearch node.
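
Steps 4 and 5 can also be scripted instead of done by hand in a hex editor. The following is a minimal sketch, not a tested tool: the function name `rename_in_state_bytes` is made up for this post, and it assumes the layout described under Sources (checksum in the last 8 bytes, stored as 4 null bytes followed by the CRC32, computed over everything before it). Work on a copy of the file, with the node shut down.

```python
import struct
import zlib

CHECKSUM_LEN = 8  # assumption from the post: the last 8 bytes hold the checksum


def rename_in_state_bytes(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace `old` with `new` in a state file's bytes and rebuild the checksum.

    Hypothetical helper for illustration; it does not validate the file format.
    """
    if len(old) != len(new):
        # Equal length keeps any length or offset fields in the file valid.
        raise ValueError("old and new index names must have the same length")

    # Everything except the trailing checksum is what gets checksummed.
    body = data[:-CHECKSUM_LEN].replace(old, new)

    # zlib.crc32 is the plain CRC32; mask to an unsigned 32-bit value.
    crc = zlib.crc32(body) & 0xFFFFFFFF

    # Big-endian 64-bit value: four null bytes followed by the CRC32.
    return body + struct.pack(">Q", crc)
```

To use it on a file, read the bytes of each state file found by the grep, pass them through the function with `b"my-old-index"` and `b"my-new-index"`, and write the result back. The output has exactly the same length as the input, which is a quick sanity check before restarting the node.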

Sources

  • Lucene uses CodecUtil to perform its checksum checks - they are documented here. Specifically, we see that the last 8 bytes (64 bits) are the checksum, and from testing I find that zlib-crc32 (aka CRC32) is algorithm 0 - and is what's currently used. Of the 8 checksum bytes, only the last 4 are used - so all checksums begin with 4 null bytes. From testing, the checksum is calculated over the entire file, minus the 8 bytes at the end.
  • An old blog post on elastic.co dives into the layout of files within the data folders. Worth a look if you're interested in what the state files or the other files mean.
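
The footer layout described above can be checked with a few lines of Python before and after editing a file. This is a rough sketch under assumptions: `check_footer` is a name invented here, the 16-byte footer layout (magic, algorithm id, checksum) and the magic value are my reading of Lucene's CodecUtil rather than something verified in this post, and only the checksum part was actually tested above.

```python
import struct
import zlib

FOOTER_LEN = 16            # assumed: magic (4) + algorithm id (4) + checksum (8)
CHECKSUM_LEN = 8           # from the post: the last 8 bytes are the checksum
FOOTER_MAGIC = 0xC02893E8  # assumed: bitwise complement of Lucene's header magic 0x3FD76C17


def check_footer(data: bytes) -> dict:
    """Parse the trailing footer and compare the stored checksum with a fresh CRC32."""
    # All three fields are read as big-endian unsigned integers.
    magic, algorithm, stored = struct.unpack(">IIQ", data[-FOOTER_LEN:])

    # The checksum covers everything before it, including magic and algorithm id.
    computed = zlib.crc32(data[:-CHECKSUM_LEN]) & 0xFFFFFFFF

    return {
        "magic_ok": magic == FOOTER_MAGIC,
        "algorithm": algorithm,  # 0 is zlib-crc32 according to the post
        "stored": stored,
        "computed": computed,
        "checksum_ok": stored == computed,
    }
```

Running this on an untouched state file first is a cheap way to confirm the assumptions hold for your Elasticsearch version: if `checksum_ok` is false on a file the node was happy with, the layout differs and the manual edit should not be attempted this way.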

Tested on ..

I have successfully renamed an index so that I could create a new alias with the old name, on a single-node Elasticsearch 6.8.12 server running in Docker with a mapped volume. The new index name had the same length as the old one (to not break any formats in the state files). The new alias is happily inserting into a new index, while providing access to both the old (full) index and the new index.