
CouchDB and Lucene integration

August 2018: Please note that this post was written for old versions of Lucene and CouchDB. It is left here for historical purposes only.

Last week, I had to integrate the Lucene full-text search engine with CouchDB. Here are some quick notes in case you have to do the same integration.

Lucene benefits:

  • ranked searching
  • powerful query types: phrase queries, wildcard queries, proximity queries, range queries, etc.
  • fielded searching
  • boolean operators
  • sorting by any field
  • simultaneous updating and searching
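
To make the query types in this list more concrete, here is a small sketch of how each one is written in Lucene's classic query parser syntax and percent-encoded for use in a URL. The field names (title, body) are borrowed from the sample document used later in this post, and the helper name encode_q is made up for illustration. Whether a given query returns anything depends on how the field was indexed.

```python
from urllib.parse import quote

# Example query strings in Lucene's classic query parser syntax.
# Field names are assumptions taken from this post's sample document.
EXAMPLES = {
    "phrase": 'title:"hello world"',
    "wildcard": "title:hel*",
    "proximity": 'body:"hello blog"~5',
    "range": "title:[a TO m]",
    "boolean": "title:hello AND NOT body:goodbye",
}


def encode_q(query):
    """Percent-encode a query string so it can be passed as the q parameter."""
    return quote(query, safe="")


for kind, query in EXAMPLES.items():
    print(f"{kind:10s} q={encode_q(query)}")
```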

Install

I followed the instructions at http://wiki.fluidproject.org/display/fluid/Setting+Up+CouchDB+and+Lucene

Install Git and Maven 2:

sudo apt-get install git-core maven2

Download the couchdb-lucene source:

git clone git://github.com/rnewson/couchdb-lucene.git

Build everything:

cd couchdb-lucene
mvn

Copy the assembled distribution zip into a proper directory, unpack it, and give it the appropriate permissions:

mkdir -p /var/lib/couchdb/1.0.1/lucene/
cp target/couchdb-lucene-0.7-SNAPSHOT-dist.zip /var/lib/couchdb/1.0.1/lucene/
cd /var/lib/couchdb/1.0.1/lucene/
unzip couchdb-lucene-0.7-SNAPSHOT-dist.zip
cd ..
chown -R couchdb lucene/

Setting up the CouchDB-Lucene integration

Set the integration options in the /etc/couchdb/local.ini file by adding the following parameters at the end of the file:

[couchdb]
os_process_timeout=60000 ; increase the timeout from 5 seconds.

[external]
fti=/usr/bin/python /var/lib/couchdb/1.0.1/lucene/couchdb-lucene-0.7-SNAPSHOT/tools/couchdb-external-hook.py

[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}
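
A quick way to double-check the edit is to parse the file and confirm that the three settings are present. The following is only a sketch: the helper name missing_settings is hypothetical, the sample text uses a placeholder path for the hook script, and it assumes the file is plain enough for Python's configparser to read.

```python
import configparser

# Section -> option pairs that the integration described in this post relies on.
REQUIRED = {
    "couchdb": "os_process_timeout",
    "external": "fti",
    "httpd_db_handlers": "_fti",
}


def missing_settings(ini_text):
    """Return a list of '[section] option' entries absent from ini_text."""
    parser = configparser.ConfigParser(
        interpolation=None,             # values may contain '%' characters
        strict=False,                   # tolerate repeated sections or options
        inline_comment_prefixes=(";",)  # strip trailing '; ...' comments
    )
    parser.read_string(ini_text)
    return [
        f"[{section}] {option}"
        for section, option in REQUIRED.items()
        if not parser.has_option(section, option)
    ]


# Sample text mirroring the snippet above; the hook path is a placeholder.
SAMPLE = """\
[couchdb]
os_process_timeout=60000 ; increase the timeout from 5 seconds.

[external]
fti=/usr/bin/python /path/to/couchdb-external-hook.py

[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}
"""

print(missing_settings(SAMPLE))  # an empty list means nothing is missing
```

To check the real file, pass in its contents, for example missing_settings(open("/etc/couchdb/local.ini").read()).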

Install the init script so that couchdb-lucene can be started as a service:

cd /var/lib/couchdb/1.0.1/lucene/couchdb-lucene-0.7-SNAPSHOT/tools/etc/init.d/couchdb-lucene
cp couchdb-lucene /etc/init.d/

Edit the couchdb-lucene file and set DAEMON to the location of the run script in your installation, for example:

DAEMON=/usr/local/couchdb-lucene-0.5.5/bin/run

Now you can start the service using the usual service syntax:

service couchdb-lucene start

Restart the couchdb service to apply the configuration changes:

service couchdb restart

Firing-up

I used this design document in the martintest2 database. The documents there have a ‘title’ attribute, which is the field I want to search:

{
   "_id": "_design/foo",
   "_rev": "1-166900c56b2e87d91bb48dcf890c84ed",
   "fulltext": {
       "by_title": {
           "index": "function(doc) { var ret=new Document(); ret.add(doc.title); return ret }"
       }
   }
}
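
If you need more than one index, it can be handy to generate this kind of design document rather than write it by hand. The sketch below builds the same structure for an arbitrary field. The function name fulltext_design_doc and its parameters are made up for illustration, and _rev is left out because CouchDB assigns it when the document is created.

```python
import json


def fulltext_design_doc(name, index_name, field):
    """Build a design document with one full-text index over a single field."""
    # The index function is JavaScript source stored as a JSON string,
    # following the same pattern as the by_title index in this post.
    index_fn = (
        "function(doc) { var ret=new Document(); "
        f"ret.add(doc.{field}); "
        "return ret }"
    )
    return {
        "_id": f"_design/{name}",
        "fulltext": {index_name: {"index": index_fn}},
    }


doc = fulltext_design_doc("foo", "by_title", "title")
print(json.dumps(doc, indent=3))
```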

I then ran this query against the martintest2 database:

curl -X GET http://localhost:5984/martintest2/_fti/_design/foo/by_title?q=hello
{"limit":25,"etag":"1ffee9136a5b","fetch_duration":0,"q":"default:hello","search_duration":0,"total_rows":1,"skip":0,"rows":[{"id":"1679b0952323a672a5a84d76dc002077","score":0.8784157037734985}]}

The single hit corresponds to this document:

{
   "_id": "1679b0952323a672a5a84d76dc002077",
   "_rev": "1-97dd85b06c25328a300f3f4041def370",
   "title": "Hello World",
   "body": "Well hello and welcome to my new blog...",
   "date": "2009/01/15 15:52:20"
}
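
The same request can be assembled and its response unpacked from a script. This is a minimal sketch: the helper names fti_url and hits are hypothetical, the URL layout simply mirrors the curl command above, and the sample response is the one shown in this post, so no server is needed to try it.

```python
import json
from urllib.parse import quote, urlencode


def fti_url(base, db, design, index, query):
    """Build a search URL shaped like the curl example in this post."""
    parts = (db, "_fti", "_design", design, index)
    path = "/".join(quote(part, safe="") for part in parts)
    return f"{base}/{path}?{urlencode({'q': query})}"


def hits(response_text):
    """Return (document id, score) pairs from a search response body."""
    body = json.loads(response_text)
    return [(row["id"], row["score"]) for row in body["rows"]]


# Response copied from the query above.
SAMPLE = (
    '{"limit":25,"etag":"1ffee9136a5b","fetch_duration":0,'
    '"q":"default:hello","search_duration":0,"total_rows":1,"skip":0,'
    '"rows":[{"id":"1679b0952323a672a5a84d76dc002077",'
    '"score":0.8784157037734985}]}'
)

print(fti_url("http://localhost:5984", "martintest2", "foo", "by_title", "hello"))
for doc_id, score in hits(SAMPLE):
    print(doc_id, score)
```

Against a live server, the response text could be fetched with urllib.request.urlopen(url).read() and passed to hits.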

Making a fuzzy query: append the ~ operator to the search term

curl -X GET 'http://localhost:5984/martintest2/_fti/_design/foo/by_title?q=hello~'

The response has the same JSON structure as the one shown above.
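
In Lucene's query syntax, a fuzzy search is written by appending ~ to a single term, optionally followed by a minimum similarity. The sketch below builds such terms. It assumes the Lucene versions of that era, where the similarity is a value from 0 up to (but not including) 1, and the helper name fuzzy is made up for illustration.

```python
from urllib.parse import urlencode


def fuzzy(term, similarity=None):
    """Turn a single term into a fuzzy term, e.g. 'helo' -> 'helo~'."""
    if similarity is None:
        return f"{term}~"
    # Assumption: older Lucene versions expect 0 <= similarity < 1.
    if not 0.0 <= similarity < 1.0:
        raise ValueError("similarity must be in the range [0, 1)")
    return f"{term}~{similarity}"


base = "http://localhost:5984/martintest2/_fti/_design/foo/by_title"
for q in (fuzzy("helo"), fuzzy("helo", 0.8)):
    print(f"{base}?{urlencode({'q': q})}")
```

A misspelled term such as helo is the typical use case: an exact search would miss "Hello World", while the fuzzy form can still match it.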