CouchDB and Lucene integration

August 2018: Please note that this post was written for old versions of Lucene and CouchDB. It is left here for historical purposes only.

Last week, I had to integrate the Lucene full-text search engine with CouchDB. Here are some quick notes in case you ever have to set up the same integration.

Lucene benefits:

  • ranked searching
  • powerful query types: phrase queries, wildcard queries, proximity queries, range queries, etc.
  • fielded searching
  • boolean operators
  • sorting by any field
  • simultaneous updating and searching
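
To make the query types in the list above more concrete, here is a small sketch of what each one looks like in Lucene's classic query parser syntax. The field names (title, body, date) and the sample values are assumptions chosen for illustration, and the encode_query helper is hypothetical; it only percent-encodes a query so it can be passed safely in a URL.

```python
from urllib.parse import quote

# Example query strings in Lucene's classic query parser syntax.
# Field names and values are made up for illustration.
LUCENE_QUERY_EXAMPLES = {
    "phrase": '"hello world"',
    "wildcard": "hel*",
    "proximity": '"hello blog"~5',
    "range": "date:[20090101 TO 20091231]",
    "fielded": "title:hello",
    "boolean": "title:hello AND NOT body:spam",
}


def encode_query(query: str) -> str:
    """Percent-encode a query string so it is safe inside a URL."""
    return quote(query, safe="")


for kind, query in LUCENE_QUERY_EXAMPLES.items():
    print(f"{kind:10s} {query:32s} q={encode_query(query)}")
```

Whether a range query behaves as expected depends on how the field was indexed, so treat the date example as a syntax illustration only.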


Followed instructions from

Install Git and Maven 2:

sudo apt-get install git-core maven2

Download the couchdb-lucene source:

git clone git://

Build everything:

cd couchdb-lucene

Copy the assembled jar file into a suitable directory and give it the appropriate permissions:

mkdir /var/lib/couchdb/1.0.1/lucene/
cp /var/lib/couchdb/1.0.1/lucene/
cd /var/lib/couchdb/1.0.1/lucene/
cd ..
chown -R couchdb lucene/

Setting up the CouchDB-Lucene integration

Configure the required options in the /etc/couchdb/local.ini file. Add the following parameters at the end of the file:

[couchdb]
os_process_timeout=60000 ; increase the timeout from 5 seconds.

[external]
fti=/usr/bin/python /var/lib/couchdb/1.0.1/lucene/couchdb-lucene-0.7-SNAPSHOT/tools/

[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}

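
Since a typo in local.ini only shows up later as a failing search request, it can help to check the file before restarting anything. The following is a minimal sketch using Python's configparser. It assumes the three settings live under the [couchdb], [external] and [httpd_db_handlers] sections, as in the couchdb-lucene README of that era; the hook script path in the sample is a placeholder, and missing_settings is a hypothetical helper name.

```python
import configparser

# Sample fragment mirroring the settings above. The section names and the
# hook script path are assumptions for illustration.
SAMPLE_INI = """
[couchdb]
os_process_timeout=60000 ; increase the timeout from 5 seconds.

[external]
fti=/usr/bin/python /path/to/couchdb-lucene/tools/hook-script

[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}
"""

# Settings the integration needs, grouped by (assumed) section.
REQUIRED = {
    "couchdb": ["os_process_timeout"],
    "external": ["fti"],
    "httpd_db_handlers": ["_fti"],
}


def missing_settings(ini_text: str) -> list:
    """Return 'section/key' strings for every required setting not found."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=(";",), interpolation=None
    )
    parser.read_string(ini_text)
    missing = []
    for section, keys in REQUIRED.items():
        for key in keys:
            if not parser.has_option(section, key):
                missing.append(f"{section}/{key}")
    return missing


print(missing_settings(SAMPLE_INI))
```

To check a real installation, read the contents of /etc/couchdb/local.ini and pass the text to missing_settings; an empty list means all three entries were found.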
Install the init script so that couchdb-lucene can be started as a service:

cd /var/lib/couchdb/1.0.1/lucene/couchdb-lucene-0.7-SNAPSHOT/tools/etc/init.d/couchdb-lucene
cp couchdb-lucene /etc/init.d/

Edit the couchdb-lucene file and set the correct location of the run script.


Now you can start the service using the usual service syntax:

service couchdb-lucene start

Restart the couchdb service to apply the configuration changes:

service couchdb restart


I used this design document in the martintest2 database. It indexes the ‘title’ attribute, which is the field I want to search:

{
   "_id": "_design/foo",
   "_rev": "1-166900c56b2e87d91bb48dcf890c84ed",
   "fulltext": {
       "by_title": {
           "index": "function(doc) { var ret=new Document(); ret.add(doc.title); return ret }"
       }
   }
}

I then ran this query against the martintest2 database:

curl -X GET http://localhost:5984/martintest2/_fti/_design/foo/by_title?q=hello

The query matches this document:

{
   "_id": "1679b0952323a672a5a84d76dc002077",
   "_rev": "1-97dd85b06c25328a300f3f4041def370",
   "title": "Hello World",
   "body": "Well hello and welcome to my new blog...",
   "date": "2009/01/15 15:52:20"
}

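The design document and the search URL follow a regular pattern, so they can be generated rather than typed by hand. Here is a minimal sketch; make_design_doc and search_url are hypothetical helper names, and the _rev field is left out because CouchDB assigns it when the document is saved.

```python
import json
from urllib.parse import quote, urlencode


def make_design_doc(name: str, index_name: str, field: str) -> dict:
    """Build a design document with one full-text index over a single field."""
    index_fn = (
        "function(doc) { var ret=new Document(); "
        f"ret.add(doc.{field}); return ret }}"
    )
    return {
        "_id": f"_design/{name}",
        "fulltext": {index_name: {"index": index_fn}},
    }


def search_url(base: str, db: str, design: str, index_name: str, query: str) -> str:
    """Build the _fti search URL for a database, design document and index."""
    parts = (db, "_fti", "_design", design, index_name)
    path = "/".join(quote(part, safe="") for part in parts)
    return f"{base}/{path}?{urlencode({'q': query})}"


doc = make_design_doc("foo", "by_title", "title")
print(json.dumps(doc, indent=3))
print(search_url("http://localhost:5984", "martintest2", "foo", "by_title", "hello"))
```

The second print produces the same URL used in the curl command above, and urlencode takes care of escaping queries that contain spaces or quotes.
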
To make a fuzzy query, append the ~ operator to the search term:

curl -X GET http://localhost:5984/martintest2/_fti/_design/foo/by_title?q=hello~
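
Fuzzy matching in Lucene is based on the edit distance between terms: a term matches if it can be turned into the query term with only a few single-character changes. As a rough illustration of the idea (a textbook Levenshtein distance, not Lucene's actual implementation), the sketch below shows why a misspelled term such as "helo" is close to "hello".

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


for term in ("hello", "helo", "hallo", "world"):
    print(term, edit_distance(term, "hello"))
```

Terms one edit away from "hello" are the kind of near misses a fuzzy query is meant to catch, while "world" is several edits away and would not be expected to match.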
