Despite Elasticsearch's overall stability, it is still possible for a cluster to get into a "red" state. One reason for that to happen is an index becoming corrupt, which can be caused by an abrupt loss of power, hardware failure, or, more commonly, running out of disk space. In this post we’ll discuss how to bring the cluster back to a healthy state with minimal or no data loss in such a situation.
In our example scenario we have an Elasticsearch 5.6 cluster running on AWS. The steps described below will also work just fine for 6.x versions, but probably not for 2.x or earlier. The deployment on AWS was done with terraform using our open source elasticsearch modules. That said, regardless of your particular setup, the Elasticsearch recovery steps will be very similar, so read on.
If there is something wrong with Elasticsearch, the first thing to do is to check cluster health:
$ curl -s https://elasticsearch.example.com:9200/_cluster/health?pretty | jq '.status'
"red"
If the cause of the cluster degradation is in fact index corruption, it is very easy to identify which indices are at fault, since those will have the "red" status themselves:
$ curl -s "https://elasticsearch.example.com:9200/_cluster/health?level=indices" |
jq '.indices | map_values(.status)'
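The output maps each index name to its health status, so for a degraded cluster it might look something like this (the index names here are just illustrative):
{
  ".kibana": "green",
  "elk-2018.02.06": "yellow",
  "elk-2018.02.07": "red",
  "elk-2018.02.08": "red"
}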
After much googling with no success, the tempting way to recover might be to simply remove the folder with the Elasticsearch data and start from scratch. But even in a development environment, where data loss might be acceptable, that is a terrible solution. Most likely only some of the indices are at fault, so there is definitely a way to recover with far less damage.
The next section describes what to do if your Elasticsearch cluster was deployed on AWS and the EBS volume with the data ran out of space. If, on the other hand, the file system has enough space and something else caused the corruption, you can skip ahead to Recover the indices.
The first logical thing to do is to free up some space; in our case that means growing the EBS volume.
We’ll need to SSH into each data node, whether through a bastion host, via a VPN, or by some other means. In case our terraform modules were used to deploy Elasticsearch, here is how to get a list of IP addresses for all the data nodes in the cluster:
$ aws ec2 describe-instances --filters \
    'Name=tag:cluster,Values=elk-dev-elasticsearch-cluster' \
    'Name=tag:Name,Values=*data-node*' \
    'Name=instance-state-name,Values=running' \
  | jq '.Reservations[].Instances[]
        | { PublicIp: .PublicIpAddress, PrivateIp: .PrivateIpAddress }'
We need to log in to a data node and check its storage situation. Assuming Elasticsearch stores data on the drive /dev/xvdf mounted at /mnt/elasticsearch:
$ df -h | grep /dev/xvdf
/dev/xvdf 7.8G 7.2G 276M 97% /mnt/elasticsearch
Although usage is not yet at a full 100%, the cluster may already be only semi-functional, and its status is likely red. Once there is no space left at all, it is almost certain that some indices are corrupt and API requests to store, or even retrieve, data will result in an error.
$ lsblk | grep xvdf
xvdf 202:80 0 8G 0 disk /mnt/elasticsearch
$ df -h | grep /dev/xvdf
/dev/xvdf 7.8G 7.8G 0 100% /mnt/elasticsearch
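As a cross-check that doesn’t require SSH access, the _cat/allocation API reports disk usage per data node as Elasticsearch itself sees it:
$ curl -s "https://elasticsearch.example.com:9200/_cat/allocation?v"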
In any case, we need to give it some space. As you might suspect from the output above, the current EBS volume size is 8 GB; for the sake of the example we will double it. How that gets done depends on how the cluster was deployed. In our case it was done with terraform, so resizing the EBS volumes is just a matter of changing a variable and running the usual terraform apply.
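If your cluster was not deployed with terraform, the volume can also be grown directly with the AWS CLI; the volume ID below is just a placeholder, and EBS volumes can be modified while attached and in use:
$ aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 16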
Resizing an EBS volume does not resize the file system on it, so we have to grow the file system manually.
Important: The steps below will have to be done on each of the data nodes:
$ lsblk | grep xvdf
xvdf 202:80 0 16G 0 disk /mnt/elasticsearch
$ df -h | grep /dev/xvdf
/dev/xvdf 7.8G 7.8G 0 100% /mnt/elasticsearch
$ sudo resize2fs /dev/xvdf
resize2fs 1.42.13 (17-May-2015)
Filesystem at /dev/xvdf is mounted on /mnt/elasticsearch; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
The filesystem on /dev/xvdf is now 4194304 (4k) blocks long.
$ df -h | grep /dev/xvdf
/dev/xvdf 16G 7.8G 7.2G 52% /mnt/elasticsearch
Great, we’ve made some space. Beware that AWS limits how often you can resize the same EBS volume (you have to wait several hours between modifications), so this is not something you can do many times a day. Also, once you’re done with recovery, you might want to configure curator to run recurring maintenance in order to prevent running out of space again in the future.
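As a rough sketch of what such maintenance could look like (the host, index prefix, and retention period below are assumptions for illustration), curator’s singleton CLI can delete indices older than 30 days; adding the global --dry-run flag first will show what would be removed without touching anything:
$ curator_cli --host elasticsearch.example.com --port 9200 --use_ssl delete_indices \
    --filter_list '[{"filtertype":"pattern","kind":"prefix","value":"elk-"},
                    {"filtertype":"age","source":"name","direction":"older",
                     "timestring":"%Y.%m.%d","unit":"days","unit_count":30}]'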
We’ve already seen how to identify the indices at fault. At this point we could fix our problem by deleting the indices with red status. But we can do better than that. In particular, indices with red status most likely have their primary shards unassigned, so we can try reassigning the shards and possibly deleting only the ones that couldn’t be recovered.
We can inspect the state of our shards with this API call:
$ curl -s https://elasticsearch.example.com:9200/_cat/shards?v
Note: If you only have one data node and are using the default "number_of_replicas": 1, then for all indices in the yellow state you will see 50% of your shards in the UNASSIGNED state. This is expected, since there is no other data node available to hold the replica shards. To fix that, you can change the number of replicas to 0 (see the example below) or add at least one more data node to the cluster.
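For a single data node, dropping the replica count can be done through the index settings API, for example across all indices at once (adjust the index pattern to your needs):
$ curl -s -XPUT "https://elasticsearch.example.com:9200/_all/_settings" \
    -H 'Content-Type: application/json' \
    -d '{"index": {"number_of_replicas": 0}}'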
It will be easy to spot the malfunctioning indices, since some or all of their primary shards will be UNASSIGNED.
What we need to do is tell Elasticsearch to try to reassign the failed shards. Those that do not change their state to STARTED after the attempt are probably corrupt and can be deleted.
Let’s look at one of our red indices as an example:
$ curl -s "https://elasticsearch.example.com:9200/_cluster/health?level=indices" |
jq '.indices."elk-2018.02.07"'
{
"status": "red",
"number_of_shards": 5,
"number_of_replicas": 1,
"active_primary_shards": 0,
"active_shards": 0,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 10
}
$ curl -s "https://elasticsearch.example.com:9200/_cat/shards?v" | grep elk-2018.02.07
elk-2018.02.07 1 p UNASSIGNED
elk-2018.02.07 1 r UNASSIGNED
elk-2018.02.07 2 p UNASSIGNED
elk-2018.02.07 2 r UNASSIGNED
elk-2018.02.07 3 p UNASSIGNED
elk-2018.02.07 3 r UNASSIGNED
elk-2018.02.07 4 p UNASSIGNED
elk-2018.02.07 4 r UNASSIGNED
elk-2018.02.07 0 p UNASSIGNED
elk-2018.02.07 0 r UNASSIGNED
Here we can see that all of the shards for the index elk-2018.02.07 are UNASSIGNED. It is possible that some of them will be in the STARTED state, but unless all of the primary shards (p) are started, the whole index, and with it the cluster, will be red.
Furthermore, we can inspect the exact reason why the shards of the above index are UNASSIGNED:
$ curl -s "https://elasticsearch.example.com:9200/_cluster/state/routing_table" | jq '
.routing_table.indices
| .[] | .shards | .[] | .[]
| select(.index == "elk-2018.02.07")
| select(.unassigned_info.reason == "ALLOCATION_FAILED")
'
The filtered response will contain a list of all the shards for index elk-2018.02.07 that couldn’t be allocated, along with an explanation of why that happened. When the cause is a lack of disk space, possible reasons include:
"shard failure, reason [lucene commit failed], failure
IOException[No space left on device]"
"failed to create shard, failure IOException[No space
left on device]"
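Another way to dig into a particular shard is the cluster allocation explain API, which returns a detailed explanation for a single shard (the shard number here is just an example):
$ curl -s -XGET "https://elasticsearch.example.com:9200/_cluster/allocation/explain" \
    -H 'Content-Type: application/json' \
    -d '{"index": "elk-2018.02.07", "shard": 0, "primary": true}'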
In order to fix as much as possible, we issue a retry_failed command by making a POST request with an empty body:
$ curl -s -XPOST "https://elasticsearch.example.com:9200/_cluster/reroute?retry_failed" | jq '
.state.routing_table.indices
| .[] | .shards | .[] | .[]
| select(.unassigned_info.reason=="ALLOCATION_FAILED")
'
Check the cluster status after issuing the reroute call, and you should see the number of unassigned_shards go down. The documentation on rerouting can be useful for understanding what is actually going on.
Most likely, the above call will need to be issued multiple times, until all of the failed shards are reassigned. After enough tries you should get your cluster back to "green" status. It is possible, though, that after enough shards are reassigned the cluster will only reach "yellow" status, and no matter how many further retry_failed commands you issue, the unassigned_shards number will not go down:
$ curl https://elasticsearch.example.com:9200/_cluster/health?pretty
{
"cluster_name" : "elasticsearch",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 2,
"active_primary_shards" : 21,
"active_shards" : 34,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 8,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 80.95238095238095
}
One way or another, we have already successfully restored our cluster and it is now fully functional. We can additionally let Elasticsearch try to heal itself by restarting the service on all data nodes:
$ sudo service elasticsearch restart
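Recovery after a restart can take a while; one simple way to keep an eye on it is to poll the health endpoint, for example:
$ watch -n 10 "curl -s https://elasticsearch.example.com:9200/_cluster/health | jq '{status, unassigned_shards}'"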
If at this point some of your indices are still red, and there are shards that are corrupt and cannot be reassigned, it may be time to send those leftover UNASSIGNED shards into the abyss.
There is no “delete shard” API call in Elasticsearch, but there is a command to allocate an empty primary shard on a particular data node, which is effectively the same thing, except that you need to tell Elasticsearch which node the new shard should be assigned to. Any node can be chosen for that purpose, since Elasticsearch will rebalance the shards later anyway; in this example we’ll use the elk-dev-data-node-00-us-east-1a node. Be aware that this will result in data loss!
$ RESP=$(curl -s "https://elasticsearch.example.com:9200/_cluster/state/routing_table" | jq '
.routing_table.indices
| .[] | .shards | .[] | .[]
| select(.unassigned_info.reason == "ALLOCATION_FAILED")
')
$ REQ=$(echo "$RESP" | jq 'select (.primary)
| { allocate_empty_primary: {
index: .index,
shard: .shard,
node: "elk-dev-data-node-00-us-east-1a",
accept_data_loss: true
}
}' | jq --slurp '{commands: .}')
$ curl -s -XPOST "https://elasticsearch.example.com:9200/_cluster/reroute" -d "$REQ" -H 'Content-Type: application/json'
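For reference, the request body produced by the jq pipeline above will look roughly like this, with one allocate_empty_primary command per failed primary shard (values taken from the earlier example):
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "elk-2018.02.07",
        "shard": 0,
        "node": "elk-dev-data-node-00-us-east-1a",
        "accept_data_loss": true
      }
    }
  ]
}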
Overwriting bad shards is guaranteed to fix the problem with "red" indices, and it results in much less data loss than deleting the whole index.
If you still want to just go ahead and delete all indices with "red" status, here is a very dangerous script that will do so. Use it only as a very last resort, unless you really don’t care about the data:
$ RED_INDICES=$(curl -s "https://elasticsearch.example.com:9200/_cluster/health?level=indices" |
jq -r '[.indices | to_entries[] | select(.value.status == "red") | .key] | join(",")')
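$ echo "$RED_INDICES"   # sanity check: make sure the list only contains indices you are willing to lose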
$ curl -s -XDELETE "https://elasticsearch.example.com:9200/$RED_INDICES"
We’ve deployed the Elasticsearch/Logstash/Kibana (ELK) stack on numerous occasions, and it has proved itself an amazing log aggregation and analysis solution. It can just as well be used for other purposes such as monitoring, structured data ingestion, or simply document storage. Whatever your use case, if Elasticsearch is at its center, maintenance has to be thought out properly, and curator is a must.
As mentioned before, Elasticsearch is pretty good at staying healthy. Disasters do happen though, and everyone’s situation is different, so if the above guide didn’t solve your problem, hopefully it at least helped narrow down the necessary solution. Please share your experience with us by commenting in the form below. If you need help deploying ELK, or you are stuck trying to bring your Elasticsearch cluster back to life, feel free to contact us.