NoSQL with XML

I’m always interested in the idea of NoSQL, alternative databases and document storage because I think someday this approach may become standard. If there is one constant pain in any CMS, it’s the challenge of syncing different versions, updating code. It’s great we have GIT to keep our code in sync, too bad our content and the site configuration is rolling out of date every single second.

On FWW (http://freelancerworldwide.com) which we have hosted on WP Engine we’re constantly faced with this hurdle when deploying. We use WP Engine’s GIT Push feature to update code on a staging server, then when it’s ready to deploy we do a manual migration with WP Engine’s migrate code-only feature. The database obviously cannot be migrated from testing sites or else we’d lost live data. Sure we can roll forward our staging sites using the latest DB from live, but even that comes with it’s own risks. We had to a “cleaner” script to eliminate all the sensitive user data from the live DB otherwise aside from privacy and security risks, we’d be possibly sending email notifications from testing servers out to the live users.

Imagine a system without a DB, with a form of version-controlled content and configuration. I tend to view this as a way to solve all these migration issues. And I like XML for it’s relatively low learning curve. I envision users, even non-devs, being able to manually edit XML files to change config settings, or even to write their own content directly into a file. And I also like XML because it’s tangible, or real in a way that some of the other NoSQL DB’s are not. If the data disappears, then requires code to retrieve, to me that means the objective is lost. I want the absolutely simplicity and flexibility of being able to open a document on a web server and see, there is a the configuration, or there is the content in “plain sight”.

On two GoldHat projects we’ve used some form of raw XML storage. Certainly storing data in XML is easy enough. It’s searching and filtering and other matters that become interesting, and sometimes messy. But nothing so far has really stopped us from doing all the normal operations we’d usually see in a DB solution. Below are a few of the tactics we’ve used to make an XML-based data storage system with version control:

  1. Create a separate directory from the rest of the project, we’ve been using “/data” but it could be “/content” or other.
  2. Version control the content directory separately from the source code of your project, which means adding it to your .gitignore file so it doesn’t end up in the source code.
  3. Create a method that automates indexing of records. When an XML record, say “ticket” is created then the “ticket index” file is updated merging in the reference to the ID of the given ticket.
  4. Create random alpha-numeric ID’s for records, possibly prefixed by a code for the record type such as “TKT-8779XC81729”. Then store the files in a directory based on type such as “data/ticket/TKT-8779XC81729.xml”.
  5. Write loading methods in a class such as “Records” to load a single record such as $record->getByID( $id ).
  6. Write a class specifically for loading record lists such as “RecordList” and provide methods to set filters such as $record_list->addFilter( $field, $value, $operator ). Filtering is then handled by XPath queries.
  7. Use a front-end solution such as List.js or jQuery Datatables to limit your need for filtering and sorting on the server side. Build your system to load list data unfiltered and unsorted, but dynamically apply front-end sorting/filtering.