NoSQL – john's blog

I recently came across a scenario where both of our Couchbase servers failed because of major outages at our hosts’ data centers. One server eventually came back up, but its state was set to “pending” and our app could not connect to it. We had replication enabled, but when we went to click the “Fail Over” button on the bad node, the scary data loss warnings frightened us away from attempting the fail over. Eventually, the second server came back on its own and the state of both Couchbase nodes changed to “up”.

This exercise is a test to see just how easy it is to recover from a single-node failure and from an all-node failure (assuming the nodes’ hard drives are still intact).

While the Couchbase documentation does explain all of this, I found this experiment most helpful to properly understand exactly what happens when nodes go down.

Set up a test two-node Couchbase environment

If you are using CentOS 6 or Red Hat, these steps should work. Otherwise, just follow the instructions on couchbase.com.

sudo yum update -y
sudo wget http://packages.couchbase.com/releases/2.2.0/couchbase-server-community_2.2.0_x86_64.rpm
sudo rpm --install couchbase-server-community_2.2.0_x86_64.rpm

Make sure the server’s firewall has these TCP ports open:

11209-11211, 4369, 8091-8092, 21100-21299
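As a rough sketch of opening those ports, assuming the server uses the stock iptables firewall (the usual case on CentOS 6, but an assumption here), the helper below only prints one rule per port range so you can review the rules before applying them. The function name `couchbase_iptables_rules` is made up for this example.

```shell
#!/bin/sh
# Print (do not run) one iptables rule per Couchbase port range listed above.
# Assumption: iptables is the firewall in use on this server.
# iptables writes a port range as first:last, so 8091-8092 becomes 8091:8092.
couchbase_iptables_rules() {
    for range in 11209-11211 4369 8091-8092 21100-21299; do
        dport=$(printf '%s' "$range" | sed 's/-/:/')
        printf 'iptables -I INPUT -p tcp --dport %s -j ACCEPT\n' "$dport"
    done
}
```

If the printed rules look right, something like `couchbase_iptables_rules | sudo sh` followed by `sudo service iptables save` should apply and persist them on CentOS 6.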

Once Couchbase is installed, you can access the Couchbase admin console from your browser:

http://your-couchbase-server-1:8091

Since this is the first node we will start a new cluster:

Default settings are fine for our test.

Select the beer-sample bucket so we can have some data to check when the nodes recover. You can use your own bucket too, just make sure replication is enabled.

We don’t care about Couchbase notifications for our test servers.

Set up a Couchbase administrator account.

First node setup is complete:

Now we need to set up the second node.

Repeat the steps above to install Couchbase.

Once Couchbase is installed on the second server visit that Couchbase server’s administration console in your browser.

http://your-couchbase-server-2:8091

This time we will be joining an existing cluster. Enter the IP address of the first node and the administrator username and password you set during the setup of the first node.

The server should now be associated with your Couchbase cluster.

In order to actually use the new node with your cluster, the cluster needs to be rebalanced. Click “Server Nodes” from the top nav and then click the “Pending Rebalance” tab. Then click “Rebalance” to the right.

Wait for the nodes to rebalance before proceeding.

When rebalancing is complete your nodes should look similar to this:

Now it’s time to fail some nodes.

Single-node failure

First have a look at the buckets in your cluster. Note the number of items in the beer-sample bucket. You should see 7303 items (unless the sample bucket has changed since this post).

The item count is an easy way to see how much data is potentially available.
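If you would rather check the item count from a terminal than from the web console, a minimal sketch follows. It assumes the bucket details returned by the Couchbase REST API on port 8091 (`/pools/default/buckets/beer-sample`) include an `itemCount` field under `basicStats`; verify that against your version. The helper name `item_count_from_json` is invented for this example.

```shell
#!/bin/sh
# Read bucket JSON on stdin and print the first "itemCount" value found.
# Assumption: the REST response contains "itemCount":<number> (under basicStats).
item_count_from_json() {
    grep -o '"itemCount"[[:space:]]*:[[:space:]]*[0-9][0-9]*' |
        head -n 1 |
        grep -o '[0-9][0-9]*$'
}
```

Usage would look something like `curl -s -u Administrator:password http://your-couchbase-server-1:8091/pools/default/buckets/beer-sample | item_count_from_json`, where the username and password are placeholders for the administrator account you created during setup.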

Ok, now it’s time to kill a node. Choose one of your Couchbase nodes (it doesn’t matter which one) and either shut it down or just stop the Couchbase server service.

sudo service couchbase-server stop

If you were viewing the “failed” node’s web administration console, you will be disconnected and should log in to the other node’s web console.

You should see one node up and one down.

Now have a look at your buckets. Note that the item count is now reduced by 50%. The data is still safe because it was replicated and evenly distributed across both nodes. We are seeing a reduced item count because half of the active data lived on the node that is now down.

To get back access to all of our data we need to make the replica data (on our remaining node) active. This is actually really easy. Just click “Fail Over” on the down node.

You will be presented with the very scary data loss warning. I’m sure in some circumstances you will lose data but not with this simple scenario.

The “down” server will be added to the “pending rebalance” tab. If you rebalance now, any data not replicated across the cluster on the “down” server will be lost. If the “down” server comes back online while it is pending rebalance you will be prompted to add the server back. If you did rebalance, the server will have to be reconfigured manually to join the cluster again.
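The same fail over can also be triggered from the command line, which is handy when the web console is awkward to reach during an outage. This is only a sketch: it assumes the `couchbase-cli` tool installed by the RPM lives at `/opt/couchbase/bin/couchbase-cli` and that its `failover` command takes `--server-failover`, as it does in the 2.x releases I have seen. The function name and the `CB_USER`/`CB_PASS` variables are invented for this example.

```shell
#!/bin/sh
# Assumed install path of the CLI; override COUCHBASE_CLI if yours differs.
COUCHBASE_CLI="${COUCHBASE_CLI:-/opt/couchbase/bin/couchbase-cli}"

# cb_failover SURVIVING_NODE DOWN_NODE
# Asks the surviving node to fail over the down node, which promotes the
# replica data on the surviving node to active.
# CB_USER and CB_PASS are hypothetical variables holding the admin login.
cb_failover() {
    "$COUCHBASE_CLI" failover \
        -c "$1:8091" \
        -u "${CB_USER:-Administrator}" -p "${CB_PASS:-password}" \
        --server-failover="$2:8091"
}
```

For the test cluster in this post that would be roughly `cb_failover your-couchbase-server-1 your-couchbase-server-2`, run against whichever node is still up. The same data loss caveat applies as with the button.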

Have a look at your buckets now. Item count should be 7303 again and it should look the same as before, except you now only have 1 node.

Your Couchbase cluster should now be working (but slower and without replication).

Restart the “down” node so we can do the next test.
Couchbase should automatically detect that the previously “down” server is back and it will prompt you to add it.

Add the node back and rebalance. Once complete your cluster should be up and running with 2 nodes.
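For completeness, here is a hedged command-line sketch of the add back and rebalance step. It assumes `couchbase-cli` provides `server-readd` (with `--server-add`) and `rebalance` commands, as the 2.x tool does to my knowledge; check `couchbase-cli --help` on your version before relying on it. The function name and the `CB_USER`/`CB_PASS` variables are invented for this example.

```shell
#!/bin/sh
# Assumed install path of the CLI; override COUCHBASE_CLI if yours differs.
COUCHBASE_CLI="${COUCHBASE_CLI:-/opt/couchbase/bin/couchbase-cli}"

# cb_readd_and_rebalance CLUSTER_NODE RETURNING_NODE
# Re-adds a node that was failed over, then rebalances the cluster.
# The rebalance only runs if the re-add succeeded.
# CB_USER and CB_PASS are hypothetical variables holding the admin login.
cb_readd_and_rebalance() {
    "$COUCHBASE_CLI" server-readd -c "$1:8091" \
        -u "${CB_USER:-Administrator}" -p "${CB_PASS:-password}" \
        --server-add="$2:8091" &&
    "$COUCHBASE_CLI" rebalance -c "$1:8091" \
        -u "${CB_USER:-Administrator}" -p "${CB_PASS:-password}"
}
```

Something like `cb_readd_and_rebalance your-couchbase-server-1 your-couchbase-server-2` would then stand in for the two clicks in the console.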

Two-node failure

This is the actual situation we found ourselves in last week. Both of our nodes went down at the same time. To replicate this, stop the Couchbase service on both nodes.

Node 1:

sudo service couchbase-server stop

Node 2:

sudo service couchbase-server stop

Now start the Couchbase service on one of the nodes.

sudo service couchbase-server start

Now look at the buckets. Yikes! Item count is 0 on beer-sample.

To resolve this, it’s actually the same procedure as a single-node failure. The only difference is that this time no nodes are up, which means none of the Couchbase data is in an active state.
Click “Fail over” on the “down” node and confirm the fail over.

The node that was “pending” should now be “up”.

Have a look at the buckets which should show 7303 items.

The cluster should now be running, just without replication and slower since we only have 1 node.

Now restart the Couchbase service on the “down” node.

sudo service couchbase-server start

Add it back to the cluster and rebalance.

Your cluster should now be fully restored.

Download ElasticSearch

wget http://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.20.6.tar.gz

Extract archive
```
tar -xzf elasticsearch-0.20.6.tar.gz
```
Move extracted folder to /opt/elasticsearch
```
mv elasticsearch-0.20.6 /opt/elasticsearch
```
Set permissions
```
chown -R root:root /opt/elasticsearch
```
Change to elasticsearch directory
```
cd /opt/elasticsearch
```

Install Web GUI plugin

bin/plugin -install mobz/elasticsearch-head

Install Couchbase Transport plugin

bin/plugin -install transport-couchbase -url http://packages.couchbase.com.s3.amazonaws.com/releases/elastic-search-adapter/1.0.0/elasticsearch-transport-couchbase-1.0.0.zip

Set up a username and password for Couchbase Replication to connect to your ElasticSearch server. Change “abc123” to your desired password.
```
echo "couchbase.password: abc123" >> config/elasticsearch.yml
echo "couchbase.username: admin" >> config/elasticsearch.yml
```

Edit the ElasticSearch configuration file (config/elasticsearch.yml) and set the following parameters:

cluster.name: NameOfYourCluster

network.host: local ip address of this node

node.name: "name of this node"
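If you prefer to script this step in the same style as the echo commands used for the Couchbase credentials, a minimal sketch is below. It appends the three settings to the file rather than editing existing lines, which assumes those keys are still commented out in your elasticsearch.yml (they are in the default file as far as I know). The function name and the example values are made up.

```shell
#!/bin/sh
# append_es_settings FILE CLUSTER_NAME LOCAL_IP NODE_NAME
# Appends the three settings to FILE.
# Assumption: FILE does not already set these keys, otherwise they would be
# defined twice.
append_es_settings() {
    cat >> "$1" <<EOF
cluster.name: $2
network.host: $3
node.name: "$4"
EOF
}
```

From /opt/elasticsearch this could be called as `append_es_settings config/elasticsearch.yml NameOfYourCluster 10.0.0.5 es-node-1`, substituting your own cluster name, local IP address and node name.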

Download a script that will allow you to run ElasticSearch as a service

curl -L http://github.com/elasticsearch/elasticsearch-servicewrapper/tarball/master | tar -xz

We only need the service directory from it, so move that into bin/
```
mv *servicewrapper*/service bin/
```
Cleanup
```
rm -Rf *servicewrapper*
```
Install ElasticSearch as service with the new script.
```
bin/service/elasticsearch install
```

Create a symbolic link

ln -s `readlink -f bin/service/elasticsearch` /usr/local/bin/rcelasticsearch

Start the service
```
service elasticsearch start
```
Make ElasticSearch start on boot
```
chkconfig elasticsearch on
```
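ElasticSearch can take a few seconds to start answering HTTP requests after the service starts, so any curl command run straight away may fail. A small wait loop like the sketch below can help. It assumes ElasticSearch listens on its default HTTP port 9200; the function name `wait_for_es` and the `CURL` override variable are invented for this example.

```shell
#!/bin/sh
# Command used for the HTTP check; override CURL to substitute another tool.
CURL="${CURL:-curl}"

# wait_for_es URL [TRIES] [DELAY_SECONDS]
# Polls URL until it answers successfully, or gives up after TRIES attempts.
# Returns 0 once the URL responds, 1 if it never does.
wait_for_es() {
    url=$1
    tries=${2:-30}
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -s: no progress output, -f: non-zero exit status on HTTP errors
        if "$CURL" -sf "$url" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `wait_for_es http://localhost:9200/ && echo "ElasticSearch is up"` waits up to roughly 30 seconds before giving up.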

Set the default template for Couchbase Transport

curl -XPUT http://localhost:9200/_template/couchbase -d @plugins/transport-couchbase/couchbase_template.json

That’s it. Your ElasticSearch server is now ready to be set up as a replication endpoint for Couchbase. For instructions on how to set up the replication on your Couchbase server, visit: http://blog.couchbase.com/couchbase-and-full-text-search-couchbase-transport-elastic-search

Category: NoSQL

Simple recovery of a Couchbase cluster when one or all nodes fail

Configure ElasticSearch for use with Couchbase replication.