My knowledge of Rust has surpassed my knowledge of CouchDB. I now think less about how to abstract the CouchDB API and more about what to abstract. Additionally, I believe my original strategy for the couchdb crate is a bad idea.
That strategy is to provide a thin abstraction whereby the application has fine-grained control over each HTTP request it sends to the CouchDB server. Here’s an example request with one query parameter:
// GET /stuff/my_doc?rev=<some_revision>
client.get_document("/stuff/my_doc")
    .rev(&some_revision)
    .run();
There’s not much abstraction here. The application explicitly sets the URI path and rev query parameter. This level of abstraction leads to two problems:
The CouchDB API provides many ways of doing the same thing, so the library would also provide many ways of doing the same thing—i.e., bloat.
The library is less useful to most applications.
Let’s look at the first problem: bloat.
Here are some examples of how CouchDB offers two ways of doing the same thing:
Are you creating a document? You can PUT the document or POST it to the database.
Are you deleting a document? You can specify its revision via the If-Match header or the rev query parameter.
Are you uploading an attachment? You can embed the attachment content as JSON using base64-encoding or use a multipart message to separate the attachment from its document and avoid the overhead of base64.
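To make the duplication concrete, here is a minimal sketch of the document-deletion case. The names Request, RevisionStyle, and delete_document are invented for this illustration and are not part of the couchdb crate; the sketch only describes the requests and does not send anything or percent-encode the path.

```rust
use std::collections::BTreeMap;

/// A minimal, library-agnostic description of an HTTP request (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Request {
    pub method: &'static str,
    pub path: String,
    pub query: BTreeMap<String, String>,
    pub headers: BTreeMap<String, String>,
}

/// The two interchangeable ways to tell the server which revision to delete.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RevisionStyle {
    QueryParameter,
    IfMatchHeader,
}

/// Describes a DELETE request for `/{db}/{doc_id}` at revision `rev`.
/// Assumption: the revision is passed through verbatim in either style.
pub fn delete_document(db: &str, doc_id: &str, rev: &str, style: RevisionStyle) -> Request {
    let mut request = Request {
        method: "DELETE",
        path: format!("/{}/{}", db, doc_id),
        query: BTreeMap::new(),
        headers: BTreeMap::new(),
    };
    match style {
        // DELETE /db/doc?rev=<rev>
        RevisionStyle::QueryParameter => {
            request.query.insert("rev".to_owned(), rev.to_owned());
        }
        // DELETE /db/doc with the header `If-Match: <rev>`
        RevisionStyle::IfMatchHeader => {
            request.headers.insert("If-Match".to_owned(), rev.to_owned());
        }
    }
    request
}
```

Both variants describe the same operation, so a thin library ends up exposing a choice like RevisionStyle that the application has no real reason to care about.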
A good CouchDB library will hide meaningless choices and use a reasonable default. For example, the library should use multipart to upload attachment content because multipart uses significantly less bandwidth than base64 in real-world cases. Application programmers shouldn’t be bothered about this detail.
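For a rough sense of the bandwidth difference, the arithmetic can be sketched as follows. Standard padded base64 turns every group of up to 3 bytes into 4 characters; the function names are made up for this example, and the sketch ignores the smaller, fixed cost of multipart boundaries and part headers.

```rust
/// Length of the standard, padded base64 encoding of `raw_len` bytes:
/// every group of up to 3 input bytes becomes 4 output characters.
pub fn base64_encoded_len(raw_len: usize) -> usize {
    (raw_len + 2) / 3 * 4
}

/// Extra bytes that base64 adds on top of the raw attachment size.
pub fn base64_overhead(raw_len: usize) -> usize {
    base64_encoded_len(raw_len) - raw_len
}
```

Under these assumptions, a 3,000,000-byte attachment grows to 4,000,000 bytes when embedded as base64, roughly a one-third increase that multipart avoids.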
There’s even more to be said about attachments, and that brings us to the second point: being useful. But before I explain this, you need a basic, two-minute understanding of CouchDB attachments.
A CouchDB attachment is a MIME-typed blob added to a document (think email attachment). But, unlike an email, a CouchDB document is revision-controlled, hence each attachment has a history. Imagine the following sequence of events:
You create a document with an attachment containing the content Hello at document revision 1.
You update the attachment with the content Goodbye at document revision 2.
After updating the attachment, you can retrieve the original Hello content by explicitly requesting revision 1.
But what happens to an attachment when you update the document itself? That depends on how much info you send in your update:
If you send the full attachment, including content, then the server will overwrite the existing attachment with the new content.
Or, if you send an attachment stub containing only the attachment’s name, then the server will make no changes to the existing attachment.
Or, if you send no attachment info at all, the server will delete the existing attachment.
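The three cases above can be modeled with a small, purely in-memory sketch. This is only a model of the behavior just described, not server code; AttachmentUpdate and apply_update are hypothetical names, and how a real server reacts to a stub for a nonexistent attachment is outside the sketch.

```rust
use std::collections::HashMap;

/// What an update says about one attachment (hypothetical model type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AttachmentUpdate {
    /// Full attachment, including content: overwrite what is stored.
    Full(Vec<u8>),
    /// Stub naming the attachment only: keep what is stored.
    Stub,
}

/// Applies an update to the attachments currently stored with a document
/// and returns the attachments stored afterwards.
pub fn apply_update(
    existing: &HashMap<String, Vec<u8>>,
    update: &HashMap<String, AttachmentUpdate>,
) -> HashMap<String, Vec<u8>> {
    let mut result = HashMap::new();
    for (name, change) in update {
        match change {
            AttachmentUpdate::Full(content) => {
                result.insert(name.clone(), content.clone());
            }
            AttachmentUpdate::Stub => {
                // Carry the stored content over unchanged. This model simply
                // skips a stub that names an attachment it does not know.
                if let Some(content) = existing.get(name) {
                    result.insert(name.clone(), content.clone());
                }
            }
        }
    }
    // An existing attachment that the update does not mention never reaches
    // `result`, which models the deletion case.
    result
}
```

In this model, forgetting to mention an attachment is indistinguishable from asking to delete it, and that is what makes deletion the accidental default.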
If the library requires the application to explicitly provide attachment info—as the couchdb crate does—then deletion is the default. But deletion is a bad default. A better default would be to send a stub and make no changes.
However, a CouchDB library that does automatic stub-sending would take more control over the outgoing HTTP request. Such a library would also need to know about all attachments without the application telling the library about them. Is this even possible? Yes.
Imagine the following pseudocode:
struct Meta {
    id: DocumentId,
    revision: Revision,
    attachments: HashMap<String, Attachment>,
}

struct Speech {
    transcript: String,
}

let doc1 = db.read_document("gettysburg");
let (meta, mut content): (Meta, Speech) = doc1.into_content();
if content.transcript == "Four score and *eight* years ago…" {
    // Oops! Need to correct Lincoln's speech.
    content.transcript = "Four score and *seven* years ago…".to_owned();
    let doc2 = Document::from_content(meta, content);
    db.write_document(doc2); // sends a stub for any existing attachment
}
A key fact is that the CouchDB server sends attachment info as part of any document. Hence, in the code above, the doc1 variable holds all attachment info, and it transfers the info to meta, with meta later transferring the info to doc2. When the application sends doc2 to the server, the library knows enough to send a stub for any existing attachment.
Suppose instead the application adds (or modifies) an attachment:
let doc3 = db.read_document("washington_farewell");
let (mut meta, content): (Meta, Speech) = doc3.into_content();
// The map is keyed by String, so the attachment name must be owned.
meta.attachments.insert("manuscript.png".to_owned(),
                        Attachment::new("image/png", load_image()));
let doc4 = Document::from_content(meta, content);
db.write_document(doc4); // sends a stub for any preexisting attachment
In this case, the doc4 variable has enough info to send the full content of the new manuscript.png attachment and a stub for any other, preexisting attachment. Waste not, want not.
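One way a library could make that decision is to track, per attachment, whether it holds new content or only the info the server reported. The sketch below is a guess at such a design, not Chill's actual implementation; Attachment, Outgoing, and plan_upload are hypothetical names.

```rust
use std::collections::BTreeMap;

/// An attachment as the library might track it (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Attachment {
    /// Known only from the server's response; no content held locally.
    Saved { content_type: String },
    /// Added or replaced by the application; content must be uploaded.
    Unsaved { content_type: String, content: Vec<u8> },
}

/// What the library would send for one attachment.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Outgoing<'a> {
    Stub,
    Full { content_type: &'a str, content: &'a [u8] },
}

/// Decides, for every attachment of a document, whether to send a stub or
/// the full content. Every attachment appears in the plan, so none is
/// dropped by omission.
pub fn plan_upload(attachments: &BTreeMap<String, Attachment>) -> BTreeMap<&str, Outgoing<'_>> {
    attachments
        .iter()
        .map(|(name, attachment)| {
            let outgoing = match attachment {
                Attachment::Saved { .. } => Outgoing::Stub,
                Attachment::Unsaved { content_type, content } => Outgoing::Full {
                    content_type: content_type.as_str(),
                    content: content.as_slice(),
                },
            };
            (name.as_str(), outgoing)
        })
        .collect()
}
```

With this shape, deleting an attachment becomes an explicit act (removing it from the map) rather than the result of forgetting to mention it.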
I’ve been exploring this and other ideas in my new project, Chill.
Chill hides more of the HTTP headers, URI query parameters, and JSON content of the HTTP messages. With Chill, the application declares what to do and Chill figures out how to do it.
My thanks go to Jeremy Wright for editing early drafts of this article and making it better.
Got feedback? Email me at c.m.brandenburg@gmail.com.