But first, let’s talk about what makes Riak so cool. Even if you don’t know how to use it, you should know its features; with 25% of the Fortune 50 using it, you never know when you might be asked to support it.
Riak the RESTful
Riak has a fully HTTP 1.1-compliant API. This means you can perform read, write, delete, and query operations using standard PUT (or POST), GET, and DELETE requests. As such, curl is a great tool for working from the command line, but any tool that can make HTTP calls will work.
Riak the Bountiful
On top of the standard HTTP API, Riak also has client libraries for Java, PHP, Python, Ruby, jQuery, Node.js, and C/C++. And of course there is Erlang, the language Riak itself is written in. Between the RESTful API and the client libraries, Riak can be used from nearly anywhere.
Riak the Scalable
A huge selling point for Riak is its scalability. Adding a node to a cluster is as simple as your first install, and doing so gives near-linear performance gains, even on complex functions and under heavy concurrency. Data replication provides robust fault tolerance (many failed nodes will still allow full access), and its distribution algorithms spread data evenly across nodes, yielding predictable performance.
Riak the Free
Riak is open source, though the parent company Basho does offer paid options. Riak Enterprise is offered for multi-datacenter replication, monitoring, and support. For storage, Riak Cloud Storage offers an S3 API compatible storage engine with replication capabilities. If you don’t need these services, you can scale Riak to your heart’s content for nothing.
Installing Riak
Now that I’ve sung its praises, let’s get down to the install. I have a simple Oracle Enterprise Linux 6.3 x86_64 VM I spun up for the purpose using a standard server install. In order to install Riak, we must:
- Install the expect library
- Download Riak 1.3 from AWS
- Install Riak via rpm
- Set the open file limit to 4096
- Start Riak
```
[root@orahost ~]# yum install expect
...
Installed:
  expect.x86_64 0:5.44.1.15-4.el6

[root@orahost ~]# wget http://s3.amazonaws.com/downloads.basho.com/riak/1.3/1.3.0/rhel/6/riak-1.3.0-1.el6.x86_64.rpm
--2013-03-03 19:01:08--  http://s3.amazonaws.com/downloads.basho.com/riak/1.3/1.3.0/rhel/6/riak-1.3.0-1.el6.x86_64.rpm
Resolving s3.amazonaws.com... 72.21.195.97
Connecting to s3.amazonaws.com|72.21.195.97|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 23875120 (23M) [application/x-rpm]
Saving to: “riak-1.3.0-1.el6.x86_64.rpm”

100%[==========================================>] 23,875,120  6.64M/s  in 4.0s

2013-03-03 19:01:12 (5.64 MB/s) - “riak-1.3.0-1.el6.x86_64.rpm” saved [23875120/23875120]

[root@orahost ~]# rpm -Uvh riak-1.3.0-1.el6.x86_64.rpm
Preparing...                ########################################### [100%]
   1:riak                   ########################################### [100%]

[root@orahost ~]# ulimit -n 4096
[root@orahost ~]# riak start
Attempting to restart script through sudo -H -u riak
```
And we are done and installed. At this point Riak is fully usable for reads and writes.
How Riak Storage Works
Riak data is stored inside of buckets, a flat namespace you can think of like a table. Buckets are unique in a Riak installation and can store oodles of key/value pairs. Keys are like primary keys, unique within a bucket. Values are where the magic happens: they can be anything. Strings, XML, JSON, binary, you name it. You can use it to store session data (login, expiry, preferences, etc), log files, backups, large XML files, whatever. The power to quickly store and retrieve data of any kind is where Riak shines.
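Conceptually, a Riak installation behaves like a two-level map: bucket, then key, then an opaque value. A quick Python sketch of the model (purely illustrative; no Riak involved, and the bucket/key names are made up):

```python
# Illustrative model only: Riak's flat bucket namespace acts like a
# dictionary of dictionaries, where values are opaque payloads.
store = {}

def put(bucket, key, value):
    # Buckets spring into existence on first write.
    store.setdefault(bucket, {})[key] = value

def get(bucket, key):
    # Missing buckets or keys simply return nothing.
    return store.get(bucket, {}).get(key)

# Values can be any payload: strings, JSON, XML, binary...
put("sessions", "user42", '{"login": "jsmith", "expiry": 3600}')
put("logdata", "1", "<log><entry>startup</entry></log>")

print(get("sessions", "user42"))
```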
Now, when an Oracle or other RDBMS person looks at this they may wonder why it’s so appealing; after all, you can make a table with a KEY, VALUE column combination and use that for the same purpose. But the strength of Riak is in its scalability and flexibility. It’s not meant to take the place of a relational database but it can be a great tool alongside one (or on its own).
Let’s PUT in Some Data!
In order to write to Riak, we write to a URL of the form http://hostname:8098/buckets/bucketname/keys/key (8098 is the default port). The body of the HTTP PUT request will be the data to store in the bucket under the key you specified. For instance, to write into a bucket called ‘logdata’ on localhost with a numeric key, you could use: http://localhost:8098/buckets/logdata/keys/1.
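Outside of Oracle, the same PUT can be issued from any HTTP client. A small Python sketch of the URL scheme and request (the host, port, and bucket names are just examples, and `put_value` of course needs a running Riak node):

```python
import urllib.request

def riak_key_url(bucket, key, host="localhost", port=8098):
    """Build the Riak URL for a given bucket and key (8098 is the default port)."""
    return f"http://{host}:{port}/buckets/{bucket}/keys/{key}"

def put_value(bucket, key, data, content_type="text/plain"):
    """PUT a value into Riak; returns the HTTP status code (204 on success)."""
    req = urllib.request.Request(
        riak_key_url(bucket, key),
        data=data.encode(),
        headers={"Content-Type": content_type},
        method="PUT",
    )
    with urllib.request.urlopen(req) as rsp:
        return rsp.status

# Only URL construction is exercised here; put_value() needs a live node.
print(riak_key_url("logdata", "1"))  # http://localhost:8098/buckets/logdata/keys/1
```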
There are a few required/recommended headers, but the ones we’ll concern ourselves with here are Content-Type and Content-Length. To make the call, we will use the UTL_HTTP package built into Oracle.
Before we can do that, we have to connect to Oracle as SYS and set up an Access Control List (ACL):
```sql
begin
  dbms_network_acl_admin.create_acl (
    acl         => 'riak.xml',
    description => 'Permissions to access riak datastore',
    principal   => 'RIAK_USR',
    is_grant    => TRUE,
    privilege   => 'connect'
  );
end;
/

begin
  dbms_network_acl_admin.assign_acl (
    acl  => 'riak.xml',
    host => '127.0.0.1'
  );
end;
/
```
In the two PL/SQL blocks above, we are creating an ACL called riak.xml which gives the RIAK_USR user the ability to connect to host 127.0.0.1. Now that it’s done, we can run UTL_HTTP operations against localhost.
Here is the function I’ve created to do this:
```sql
create or replace function testput(pv_bucket in varchar2,
                                   pv_key    in varchar2,
                                   pv_data   in varchar2)
  return varchar2
is
  lv_req utl_http.req;
  lv_rsp utl_http.resp;
  lv_out varchar2(32767);
begin
  lv_req := utl_http.begin_request(
              url    => 'http://127.0.0.1:8098/buckets/' || pv_bucket ||
                        '/keys/' || pv_key,
              method => 'PUT');
  utl_http.set_header(lv_req, 'Content-Type', 'text/plain');
  utl_http.set_header(lv_req, 'Content-Length', length(pv_data));
  utl_http.write_text(lv_req, pv_data);
  lv_rsp := utl_http.get_response(lv_req);
  lv_out := lv_rsp.status_code;
  utl_http.end_response(lv_rsp);
  return lv_out;
exception
  when utl_http.end_of_body then
    utl_http.end_response(lv_rsp);
    return lv_out;
end;
/
```
In this function, you can pass in a bucket name, key, and value. The function opens a PUT request, sets the headers, and writes the body with the utl_http.write_text procedure. The value returned is the HTTP status code from Riak (204 is normal for successful writes).
```
SQL> select testput('newbucket', '1', 'My test document') from dual;

TESTPUT('NEWBUCKET','1','MYTESTDOCUMENT')
--------------------------------------------------------------------------------
204
```
GET It?
Using a GET operation, we can pull data back out by key with a function like this:
```sql
create or replace function testget(pv_bucket in varchar2,
                                   pv_key    in varchar2)
  return varchar2
is
  lv_req utl_http.req;
  lv_rsp utl_http.resp;
  lv_out varchar2(32767);
begin
  lv_req := utl_http.begin_request(
              url    => 'http://127.0.0.1:8098/buckets/' || pv_bucket ||
                        '/keys/' || pv_key,
              method => 'GET');
  lv_rsp := utl_http.get_response(lv_req);
  utl_http.read_text(lv_rsp, lv_out);
  utl_http.end_response(lv_rsp);
  return lv_out;
exception
  when utl_http.end_of_body then
    utl_http.end_response(lv_rsp);
    return lv_out;
end;
/
```
This function looks an awful lot like the PUT function. It retrieves data from the Riak database using a bucket and key with a GET operation. The body of the returned page is returned from the function.
```
SQL> select testget('newbucket', '1') from dual;

TESTGET('NEWBUCKET','1')
--------------------------------------------------------------------------------
My test document
```
And to make sure we cover it, let’s delete that row (and verify).
```
SQL> select testdelete('newbucket', '1') from dual;

TESTDELETE('NEWBUCKET','1')
--------------------------------------------------------------------------------
204

SQL> select testget('newbucket', '1') from dual;

TESTGET('NEWBUCKET','1')
--------------------------------------------------------------------------------
not found
```
If you guessed we did this with an HTTP DELETE operation, you’d be right:
```sql
create or replace function testdelete(pv_bucket in varchar2,
                                      pv_key    in varchar2)
  return varchar2
is
  lv_req utl_http.req;
  lv_rsp utl_http.resp;
  lv_out varchar2(4000);
begin
  lv_req := utl_http.begin_request(
              url    => 'http://127.0.0.1:8098/buckets/' || pv_bucket ||
                        '/keys/' || pv_key,
              method => 'DELETE');
  lv_rsp := utl_http.get_response(lv_req);
  lv_out := lv_rsp.status_code;
  utl_http.end_response(lv_rsp);
  return lv_out;
exception
  when utl_http.end_of_body then
    utl_http.end_response(lv_rsp);
    return lv_out;
end;
/
```
With those three functions, we can store (and update), retrieve, and delete any data we want. You can store VARCHAR2 and CLOB data, BLOBs as binary data, XML from DBMS_XMLGEN, JSON using the Alexandria Library, and really anything else you can think of. This data can be used for programming logic (like session data storage) or for later analysis with MapReduce.
And because they look so much alike, it would probably be really easy to put them in a nice Riak helper package. A project for another day, perhaps.
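To give a rough flavor of what such a consolidation looks like, here is the same put/get/delete trio wrapped up in Python instead of PL/SQL (class and method names are invented for the example; the methods need a live Riak node to actually do anything):

```python
import urllib.request

class RiakClient:
    """Minimal sketch of a Riak HTTP helper: put, get, delete by bucket/key."""

    def __init__(self, host="127.0.0.1", port=8098):
        self.base = f"http://{host}:{port}/buckets"

    def _url(self, bucket, key):
        return f"{self.base}/{bucket}/keys/{key}"

    def _request(self, method, bucket, key, data=None, content_type="text/plain"):
        # One shared request path, mirroring how alike the three PL/SQL
        # functions are.
        headers = {"Content-Type": content_type} if data is not None else {}
        req = urllib.request.Request(
            self._url(bucket, key),
            data=data.encode() if data is not None else None,
            headers=headers,
            method=method,
        )
        with urllib.request.urlopen(req) as rsp:
            return rsp.status, rsp.read().decode()

    def put(self, bucket, key, value):
        return self._request("PUT", bucket, key, data=value)[0]  # 204 on success

    def get(self, bucket, key):
        return self._request("GET", bucket, key)[1]

    def delete(self, bucket, key):
        return self._request("DELETE", bucket, key)[0]

# Only URL construction is exercised here; no network calls are made.
print(RiakClient()._url("newbucket", "1"))
```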
Conclusion
Even plain key/value storage in a free, highly distributed, and redundant database is useful and impressive. But as I’ve said, there’s a lot more you can do with Riak. For instance:
- Riak Search allows you to perform full text searches across the values of a bucket. This is an indexed search which is not quite as powerful as a MapReduce, but performs better and more predictably.
- Secondary Indexes (2i) allow you to pick and choose what will be indexed and how. You can define tags for your data which allow you to search it on your terms; for instance, for a bucket full of emails, you can add tags for twitter and blog addresses, then search on those.
- Links allow one-way relationships between objects that can be walked to form loose graph-like relationships.
- MapReduce is the big kahuna. As with most systems, you can do a Map to gather a single set of data or use Reduce functions to further aggregate the info. Riak supports MapReduce functions in JavaScript and Erlang.
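To give a flavor of the secondary index (2i) interface mentioned above: tags are attached at write time via x-riak-index-* HTTP headers and queried through the /index endpoint. A Python sketch of the header and URL shapes (the bucket, field, and address below are made-up examples):

```python
# Sketch of Riak 2i over HTTP, with hypothetical bucket/field names.
# Indexes are attached as x-riak-index-* headers when writing a value,
# then queried via the /index endpoint on the bucket.

def index_header(field, value):
    """Header used to tag a value at write time (e.g. a '_bin' string index)."""
    return {f"x-riak-index-{field}": value}

def index_query_url(bucket, field, value, host="localhost", port=8098):
    """URL that returns the keys in `bucket` tagged with field=value."""
    return f"http://{host}:{port}/buckets/{bucket}/index/{field}/{value}"

print(index_header("email_bin", "user@example.com"))
print(index_query_url("emails", "email_bin", "user@example.com"))
```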
I plan on publishing some MapReduce examples in the near future, but for now I value my sanity and wish to go on pretending GROUP BY CUBE is the best aggregating mechanism since they fired up the hazelnut chocolate mixer at the Nutella plant.