Two Elastic Search tasks recently came up: moving our Elastic Search data to a different server, and upgrading our Elastic Search instance from ES 0.18.6 to ES 0.19.11. All the while, new data was pouring in; we didn’t want to lose it, and we didn’t want to cause much downtime. The motivation for doing all of this was that our statistics queries were coming back too slowly, bogging down the system, and would sometimes crash our DSpace server, requiring the SysAdmin or DevOps to come to the rescue and restart Tomcat. The users were not happy with that arrangement.

We had a food party at work where Travis, the SysAdmin, and I, the DSpace Developer, found that we would have more fun drawing on the whiteboard about how to move to the new Elastic Search architecture than just talking about food and how someone will be dearly missed.

So our Before State was that we had two applications, DSpace-Production and DSpace-Staging, that both produced and consumed Elastic Search data. The Elastic Search master instance lived on DSpace-Staging, and thus our Production application was very dependent on our Staging instance. At one time, the staging server also housed all of our Dark-Archive files, which required tons of Disk I/O, especially during disk backup periods, which could end up halting our production system. Not good.

Everything was running ElasticSearch 0.18.6. Some more technical details of the Before-State setup:

  • DSpace-Production runs Tomcat, in which a node-local client (local=true, data=false, master=false) lives. That client finds an Elastic-Search instance running locally on DSpace-Production (data=false, master=false). The DSpace-Production ElasticSearch instance has to use the ES discovery process to discover the master, located on a server in the same LAN.
  • DSpace-Staging is set up just like production, except that its ElasticSearch instance is set to (data=true, master=true); thus it is the master and workhorse of the entire operation.
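
To make the dependency easier to see, here is a small sketch of the Before State written out as node settings. The flags (local, data, master) are the ones listed above, spelled with the flat setting names node.local, node.data and node.master; the node labels and helper functions are made up for illustration and are not taken from our actual configuration files.

```python
# Before-State node roles. The flag values come from the list above;
# the node labels (dictionary keys) are hypothetical names for illustration.
BEFORE_STATE = {
    "production-tomcat-client": {"node.local": True, "node.data": False, "node.master": False},
    "production-es": {"node.data": False, "node.master": False},
    "staging-tomcat-client": {"node.local": True, "node.data": False, "node.master": False},
    "staging-es": {"node.data": True, "node.master": True},
}


def data_masters(topology):
    """Return the names of nodes that both hold data and are master-eligible."""
    return [
        name
        for name, settings in topology.items()
        if settings.get("node.data") and settings.get("node.master")
    ]


def to_yaml_lines(settings):
    """Render one node's settings as flat 'key: value' lines, sorted by key."""
    return [f"{key}: {str(value).lower()}" for key, value in sorted(settings.items())]


if __name__ == "__main__":
    # Only one node carries the data and the master role: the one on staging.
    print(data_masters(BEFORE_STATE))
    print("\n".join(to_yaml_lines(BEFORE_STATE["staging-es"])))
```

Only the staging instance ends up in the list returned by data_masters, which is exactly why production could not run without staging.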

Our process to upgrade to 0.19.11 and migrate to the new server was to disconnect ElasticSearch from real-time events, and to instead log all of them for later processing. We did that. We then shut down Elastic Search, installed Elastic Search 0.19.11 in place on our staging server (our master), and copied the old 0.18.6 data directory into 0.19.11’s data directory. Upon starting, ES 0.19.11 converted the data in place to the new format. We then configured the staging instance to point to a new discovery master, which was to be our dedicated Elastic Search server; we call it PELA (Production ELAstic search). A magical feature in Elastic Search 0.19.11 allowed these two clusters to join, bringing the old indexes onto the new server.
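
The "log now, process later" step is the part that kept us from losing data, so here is a minimal sketch of the idea. This is not the code we ran; the class name, the JSON-lines file format, and the index_fn callback are all assumptions made for illustration. Events are appended to a file while Elastic Search is offline, and replayed in arrival order once the new server is up.

```python
import json
from pathlib import Path


class DeferredEventLog:
    """Hypothetical helper: buffer events on disk while the index is offline."""

    def __init__(self, path):
        self.path = Path(path)

    def record(self, event):
        """Append one event as a single JSON line."""
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")

    def replay(self, index_fn):
        """Pass every logged event to index_fn in arrival order; return the count."""
        if not self.path.exists():
            return 0
        count = 0
        with self.path.open("r", encoding="utf-8") as fh:
            for line in fh:
                if line.strip():  # skip blank lines
                    index_fn(json.loads(line))
                    count += 1
        return count
```

During the migration the application would call record() instead of indexing directly; afterwards, replay() would be given whatever function sends a document to the new server.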

After that, we had to configure our application to talk to Elastic Search 0.19.11 and to our new Elastic Search server, and then things continued. We had some performance woes, but after we bumped the memory to 3GB, it ran fine.

Elastic Search server with DSpace