Apache CouchDB

25. 04. 2016

Josef Ludvíček

Jakub Knetl

What is CouchDB?

  • document database
  • embraces web approach
  • simple to use
  • written in Erlang
  • well documented

Apache CouchDB™ is a database that uses JSON for documents, JavaScript for MapReduce indexes, and regular HTTP for its API

Basic features

  • JSON data
  • data accessible through
    • REST API
    • web interface (Futon)
  • data presented through views
  • queries written in JavaScript
  • everything is a document

Documents

  • A database is a flat collection of documents
  • A document
    • is self-describing (no schema)
    • has a unique id (key)
    • has no fixed limit on field size or element count
    • may have attached metadata
    • is versioned
    • may be validated
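
The "may be validated" point can be illustrated with a validation function. The sketch below follows the shape of CouchDB's validate_doc_update function, which lives in a design document and rejects a write by throwing an object with a forbidden field. The rule itself (a required company field) and the isAccepted helper are assumptions made up for this example, and the function is exercised locally rather than inside CouchDB.

```javascript
// Validation function in the shape CouchDB expects under the
// "validate_doc_update" key of a design document.
// The rule below (non-empty "company") is a hypothetical example.
function validateDocUpdate(newDoc, oldDoc, userCtx, secObj) {
  if (newDoc._deleted) {
    return; // always allow deletions
  }
  if (typeof newDoc.company !== "string" || newDoc.company === "") {
    throw { forbidden: "Document must have a non-empty 'company' field" };
  }
}

// Hypothetical local helper: runs the validator outside CouchDB.
function isAccepted(doc) {
  try {
    validateDocUpdate(doc, null, {}, {});
    return true;
  } catch (e) {
    return false;
  }
}

console.log(isAccepted({ _id: "acme", company: "Example, Inc." })); // true
console.log(isAccepted({ _id: "broken" })); // false
```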

REST API

  • calls return JSON objects
# create database
curl -X PUT http://127.0.0.1:5984/demo

# get information about database
curl -X GET http://127.0.0.1:5984/demo

# create document
curl -H 'Content-Type: application/json' \
     -X POST http://127.0.0.1:5984/demo \
     -d '{"company": "Example, Inc."}'

# get document with id 8843faaf0b831d364278331bc3001bd8
curl -X GET http://127.0.0.1:5984/demo/8843faaf0b831d364278331bc3001bd8

Views

  • add structure to semi-structured data
  • with views you can
    • filter documents
    • extract data
    • build efficient indices
  • views are built on-demand
  • multiple views may cover the same document
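
As a rough illustration of filtering and extracting with a view, here is a minimal sketch of a map function run locally. The emit function is a stand-in for the one CouchDB provides, and the document fields (type, name, company) and the sample data are assumptions for this example only.

```javascript
// Stand-in for the emit() function that CouchDB provides to map functions.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function: runs once per document and emits zero or more rows.
// Documents without the expected fields are filtered out.
function mapByCompany(doc) {
  if (doc.type === "employee" && doc.company) {
    emit(doc.company, doc.name); // extract only the data the view needs
  }
}

// Hypothetical sample documents.
const docs = [
  { _id: "1", type: "employee", name: "Alice", company: "Example, Inc." },
  { _id: "2", type: "invoice", total: 120 },
  { _id: "3", type: "employee", name: "Bob", company: "Acme" },
];

docs.forEach(mapByCompany);

// View rows are kept sorted by key. A plain string comparison is used here;
// CouchDB applies its own collation rules.
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
console.log(rows);
```

The invoice document produces no row, so the view contains only the two employees, ordered by company name.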

View model

  • defined using JavaScript MapReduce
    • a view may perform arbitrary computation
    • only the map function is required; reduce is optional
  • stored separately from the data
    • inside special design documents (ids starting with _design/)
    • => a view definition is also a document
    • => it can be replicated like any other document
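
To make "a view is also a document" concrete, the sketch below builds a design document as a plain object. The names employees and by_company and the database name demo are hypothetical; the _design/ id prefix, the views field, and the built-in _count reduce are standard CouchDB conventions.

```javascript
// Hypothetical design document; the names "employees" and "by_company"
// are made up for illustration.
const designDoc = {
  _id: "_design/employees",
  language: "javascript",
  views: {
    by_company: {
      // Map and reduce are stored as strings of source code.
      map: "function (doc) { if (doc.company) { emit(doc.company, 1); } }",
      // "_count" is one of CouchDB's built-in reduce functions.
      reduce: "_count",
    },
  },
};

// The design document is plain JSON, so it can be stored and replicated
// like any other document.
const body = JSON.stringify(designDoc, null, 2);
console.log(body);

// Path under which the view would be queried in a database named "demo".
const viewPath = "/demo/" + designDoc._id + "/_view/by_company";
console.log(viewPath);
```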

View Indexes

  • view indexes are incrementally updated
    • => no need to do full re-indexing
  • the MapReduce model allows a view to
    • be updated incrementally
    • be computed in parallel
  • B-tree storage engine
    • internal data storage
    • documents
    • views
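
Why MapReduce permits incremental and parallel computation can be sketched with a reduce function that accepts its own earlier output. The signature (keys, values, rereduce) matches CouchDB's reduce functions; the numbers and the way they are split into chunks are assumptions for this example.

```javascript
// Reduce function in CouchDB's shape. When rereduce is true, "values"
// holds results of earlier reduce calls instead of emitted values.
// For a sum, both cases are handled by the same addition.
function sumReduce(keys, values, rereduce) {
  return values.reduce((total, v) => total + v, 0);
}

const emitted = [3, 5, 8, 13]; // hypothetical values emitted by a map function

// Reducing everything at once...
const whole = sumReduce(null, emitted, false);

// ...gives the same answer as reducing two chunks independently
// (possibly in parallel, or at different times) and combining them.
const left = sumReduce(null, emitted.slice(0, 2), false);
const right = sumReduce(null, emitted.slice(2), false);
const combined = sumReduce(null, [left, right], true);

console.log(whole, combined); // 29 29
```

Because partial results can be combined, only the part of the index affected by a changed document needs to be recomputed.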

Eventual Consistency

  • Consistency: All database clients see the same data, even with concurrent updates.
  • Availability: All database clients are able to access some version of the data.
  • Partition tolerance: The database can be split over multiple servers.
  • CAP theorem: a distributed system can guarantee at most two of these three at once
  • CouchDB favours availability and partition tolerance over consistency

Local consistency

  • on a single node CouchDB prevents conflicting writes
    • every update (PUT) must carry the document's current revision
    • if the revision is stale, CouchDB returns a 409 Conflict error
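
The revision check can be imitated with a small in-memory store. This is only a sketch of the idea, not CouchDB's implementation: the TinyStore class is hypothetical, and the revision format is simplified to a generation counter with a fixed "-sim" suffix, whereas real revisions carry a hash.

```javascript
// Hypothetical in-memory store that mimics the revision check on update.
class TinyStore {
  constructor() {
    this.docs = new Map(); // _id -> current document
  }

  put(doc) {
    const current = this.docs.get(doc._id);
    // Reject the write when the caller's revision is not the current one.
    if (current && current._rev !== doc._rev) {
      return { status: 409, error: "conflict" };
    }
    // Simplified revision: a generation counter plus a fixed suffix.
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const rev = generation + "-sim";
    this.docs.set(doc._id, { ...doc, _rev: rev });
    return { status: 201, rev };
  }

  get(id) {
    return { ...this.docs.get(id) };
  }
}

const store = new TinyStore();
store.put({ _id: "user", name: "Ann" }); // creates revision 1-sim

// Two clients read the same revision.
const readByA = store.get("user");
const readByB = store.get("user");

const writeA = store.put({ ...readByA, name: "Anna" }); // accepted, now 2-sim
const writeB = store.put({ ...readByB, name: "Anne" }); // stale revision

console.log(writeA.status, writeB.status); // 201 409
```

Client B has to fetch the document again, reapply its change on top of the new revision, and retry.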

No Data locking

  • problems of locking
    • locks make access sequential
    • locks add overhead
  • MVCC (Multi-Version Concurrency Control)
    • keeps multiple versions of a document instead of locking a single version to keep it consistent

Replication

  • peer-to-peer
    • clients can update the DB on an arbitrary replica
  • each CouchDB host has an independent replica
    • accessible even when the host is partitioned
  • incremental replication
  • filtered replication
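
Filtered replication can be sketched with a filter function and a replication description. The (doc, req) signature and the source, target, continuous, and filter fields are standard CouchDB; the names onlyEmployees, employees/only_employees, demo_backup, and the sample documents are assumptions for this example. The filter is applied locally here instead of by a running replicator.

```javascript
// Filter function in the shape CouchDB uses: return true to replicate a document.
function onlyEmployees(doc, req) {
  return doc.type === "employee";
}

// Hypothetical replication description. The filter is referenced as
// "<design document name>/<filter name>".
const replicationDoc = {
  _id: "demo-to-backup",
  source: "http://127.0.0.1:5984/demo",
  target: "http://127.0.0.1:5984/demo_backup",
  continuous: true,
  filter: "employees/only_employees",
};

// Hypothetical documents in the source database.
const sourceDocs = [
  { _id: "1", type: "employee", name: "Alice" },
  { _id: "2", type: "invoice", total: 120 },
  { _id: "3", type: "employee", name: "Bob" },
];

// Local illustration of which documents the filter would let through.
const replicatedIds = sourceDocs
  .filter((doc) => onlyEmployees(doc, {}))
  .map((doc) => doc._id);

console.log(JSON.stringify(replicationDoc));
console.log(replicatedIds); // ids "1" and "3"
```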

Replication II

  • only the latest revision is replicated to other nodes
    • previous versions stay on the local host only
  • each replication process is described by a document
    • stored in the _replicator database
  • What if two conflicting updates occur?

Conflict resolution

  • what if two conflicting versions are present?
    • both versions are stored and replicated
  • CouchDB picks a winner automatically
    • the winning version is computed by a deterministic algorithm, so every replica picks the same one
    • the losing version is kept as a previous version
  • the winning version is what the REST endpoint returns to clients
  • the losing version remains accessible and may be merged
  • merge is left up to the application (because the merging strategy may differ based on the data)
  • the HTTP API can show a document together with its conflicting revisions
curl -X GET 'http://localhost:5984/users/354822?conflicts=true'
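
The slides do not spell out the deterministic algorithm. The sketch below assumes a simplified version of the commonly documented rule: the revision with the higher generation number wins, and ties are broken by comparing the revision hashes as strings. It ignores deleted revisions and is not CouchDB's actual code. The mergeTags function is a hypothetical application-level merge for documents that carry a tags array.

```javascript
// Revisions look like "<generation>-<hash>", e.g. "2-bbb".
function parseRev(rev) {
  const dash = rev.indexOf("-");
  return {
    generation: parseInt(rev.slice(0, dash), 10),
    hash: rev.slice(dash + 1),
  };
}

// Simplified rule (an assumption, see above): higher generation wins,
// ties go to the larger hash string. The input order does not matter,
// so every replica arrives at the same winner.
function pickWinner(revs) {
  return revs.slice().sort((a, b) => {
    const ra = parseRev(a);
    const rb = parseRev(b);
    if (ra.generation !== rb.generation) {
      return rb.generation - ra.generation;
    }
    return ra.hash < rb.hash ? 1 : ra.hash > rb.hash ? -1 : 0;
  })[0];
}

// Hypothetical application-level merge: keep the winner's fields and
// take the union of the tags from both versions.
function mergeTags(winner, loser) {
  const tags = [...new Set([...winner.tags, ...loser.tags])].sort();
  return { ...winner, tags };
}

console.log(pickWinner(["2-aaa", "2-bbb"])); // 2-bbb
console.log(pickWinner(["2-zzz", "3-aaa"])); // 3-aaa
console.log(mergeTags({ name: "Ann", tags: ["admin"] }, { name: "Ann", tags: ["dev"] }));
```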

Compaction

  • recovers "wasted space"
  • reduces disk space usage
    • removes old document revisions
    • cleans up old view indices
  • runs on a schedule or on an event (e.g. exceeding a wasted-space limit)
  • copies active data to a new file and deletes the old file
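
The copy-and-replace idea behind compaction can be shown on a toy append-only log. This is a sketch under the assumption that every update appends a new entry and that only the newest entry per document is live; the arrays stand in for database files, and the compact function is hypothetical.

```javascript
// Toy append-only "file": each update appends an entry, so older
// entries for the same document become wasted space.
const oldFile = [
  { _id: "a", _rev: "1-x", value: 1 },
  { _id: "b", _rev: "1-y", value: 10 },
  { _id: "a", _rev: "2-z", value: 2 }, // supersedes the first entry
];

// Copy only the newest entry of each document into a new "file".
function compact(file) {
  const latest = new Map();
  for (const entry of file) {
    latest.set(entry._id, entry); // later entries overwrite earlier ones
  }
  return [...latest.values()];
}

const newFile = compact(oldFile);
console.log(oldFile.length, newFile.length); // 3 2
```

Once the new file is complete it replaces the old one, which is when the disk space is actually freed.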

CouchApp

  • CouchApp is a set of tools to simplify development
  • CouchDB can host a web application
    • the application is usually HTML + JavaScript
    • the application is stored as a document
  • advantages over standard web server
    • scalability
    • flexibility
    • versioning

Demo time

Questions?

Thanks for your attention...