Dynamic RADOS object interfaces with Lua
In this post I’m going to demonstrate how to dynamically extend the interface of objects in RADOS using the Lua scripting language, and then build an example service for image thumbnail generation and storage that performs remote image processing inside a target object storage device (OSD). We’re gonna have a lot of fun.
Note that this is a re-post of the article appearing at https://ceph.com/rados/dynamic-object-interfaces-with-lua/ which was published on October 29, 2013.
Update 27 March 2017: The Lua object class handlers are now merged into upstream Ceph as of the Kraken version.
RADOS Object Classes #
One of the less publicized features of the RADOS object store is the ability to extend the object interface by writing C/C++ plugins that add new remote execution targets that may perform arbitrary operations on object data. The ability to add user-defined functionality to the OSD is a very powerful feature allowing applications to reduce network round-trips and data movement, exploit remote resources, and simplify otherwise complex interfaces by taking advantage of the transactional context within which remote operations execute. But that’s enough marketing—here is a very simple example that computes the MD5 hash of an object without transferring the object payload over the network.
Example: MD5 Hash of Object #
The straightforward method for a client to compute the MD5 hash of an object is to first retrieve the entire object and then apply the MD5 hash function to the data locally. Using librados and the cryptopp library, this might look something like the following:
// read the entire object over the network into local memory
bufferlist data;
ioctx.read("my_obj", data, 0, 0);
// hash the object data on the client
byte digest[MD5::DIGESTSIZE];
MD5().CalculateDigest(digest, (byte*)data.c_str(), data.length());
Here the client first reads the entire object over the network, and then computes the MD5 hash of the object data. However, transferring the entire object to the client can be avoided by introducing a custom object interface for computing the MD5 hash within the storage system. The following code snippet illustrates the basics of how an MD5 hash could be computed using the object class facility. Note that the following code would in practice be compiled into a shared library and loaded dynamically into a running OSD process, but we have omitted the deployment details to keep things simple (there are links at the end of this section to more information on getting started with object classes).
int compute_md5(cls_method_context_t hctx, bufferlist *in, bufferlist *out)
{
  // find out how large the object is
  uint64_t size;
  int ret = cls_cxx_stat(hctx, &size, NULL);
  if (ret < 0)
    return ret;
  // read the full payload; this happens locally inside the OSD
  bufferlist data;
  ret = cls_cxx_read(hctx, 0, size, &data);
  if (ret < 0)
    return ret;
  // hash the payload and hand the digest back to the client
  byte digest[MD5::DIGESTSIZE];
  MD5().CalculateDigest(digest, (byte*)data.c_str(), data.length());
  out->append((const char*)digest, sizeof(digest));
  return 0;
}
Before explaining the function compute_md5, let’s see how a client would remotely invoke compute_md5 to calculate the hash:
bufferlist input, output;
ioctx.exec("my_obj", "my_hash_class", "compute_md5", input, output);
Here the client runs the librados exec method to invoke the compute_md5 function remotely on the object named “my_obj”. Note that “my_hash_class” is a name that identifies the plugin (not shown in this tutorial), which may contain many functions that can be invoked remotely. Now, through the power of networking, and lots of hand waving, a client can invoke the compute_md5 function above, which will run remotely on the OSD storing the target object (there are lots of gory details about how this actually happens that are beyond the scope of this document). When the remote method is executed, it performs a transaction that atomically reads the object payload and computes the MD5 hash, all within the OSD process, avoiding any network transfer of object data. At the end of the compute_md5 function the digest is written into the out parameter, which is marshaled back to the client.
Now that is some pretty magical stuff right there. But there are situations where the overhead of compiling C/C++ into a shared library, potentially for multiple target architectures, is too heavyweight. It’d be nice if we could inject and alter object interfaces on the fly. To address this need, we have created a mechanism for defining new object classes using the Lua scripting language, which I’ll describe next.
Additional Resources: Object Class Development #
While it was necessary to introduce the concept of object classes, a full tutorial on the subject is unfortunately beyond the scope of this post. A “Hello, World” example object class with extensive documentation is available on GitHub. It is a good starting point, and if you have questions, please do not hesitate to ask on the Ceph mailing lists or IRC channels.
Dynamic Object Classes With Lua #
In order to support dynamic generation of object interfaces, we have embedded the LuaJIT VM inside the OSD process. Why Lua, you may ask? Lua and its run-time are designed specifically for embedding, and when coupled with the LuaJIT virtual machine, near-native performance can be achieved. Briefly, the current implementation expects each client request to carry a Lua script defining any number of functions, along with the name of the function in that script to execute. Now let’s dig into the details.
A Lua object class is an arbitrary Lua script containing at least one exported function handler that a client may invoke remotely. By building up a collection of handlers, new and interesting interfaces to objects can be constructed and dynamically loaded into a running RADOS cluster. The basic structure of a Lua object class is shown in the following code snippet:
-- helper modules
-- helper functions
-- etc...
function helper()
end

function handler1(input, output)
  helper()
end

function handler2(input, output)
end

objclass.register(handler1)
objclass.register(handler2)
In the above Lua script any number of functions and modules can be used to support the behavior exported by the functions handler1 and handler2. A client can remotely execute any registered function, provide an arbitrary input, and receive an arbitrary output.
Handler Registration #
Object classes written in Lua may have many functions, only a subset of which are handlers available to be directly invoked by a client. In order to make a Lua function available, the function must be exported by registering it. This is done using the objclass.register function. The following code snippet illustrates how this works.
function helper()
  -- help out with stuff
end

function thehandler(input, output)
  helper()
end

objclass.register(thehandler)
In the above example objclass.register(thehandler) exports the function thehandler, making it available for clients to call. A client that attempts to call the helper function (an unregistered function) will receive a return value of -ENOTSUPP.
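One way to picture the registration check is as a lookup consulted before dispatch. The following is a minimal sketch in plain Lua that runs under a stock interpreter; it is not the OSD implementation. The names `register`, `dispatch`, and the `script` table, as well as the numeric value given to `ENOTSUPP`, are stand-ins invented for illustration.

```lua
-- Toy model of handler registration (NOT the real objclass run-time).
local ENOTSUPP = 95            -- placeholder value, for illustration only
local registered = {}          -- set of functions exported to clients

local function register(fn)
  registered[fn] = true
end

-- Look up `name` among the script's functions and call it only if it was
-- registered; otherwise report -ENOTSUPP, as described above.
local function dispatch(script, name, input)
  local fn = script[name]
  if type(fn) ~= "function" or not registered[fn] then
    return -ENOTSUPP
  end
  return fn(input) or 0
end

-- A script with one helper and one exported handler.
local script = {}
function script.helper() return 0 end
function script.thehandler(input) script.helper() end
register(script.thehandler)

print(dispatch(script, "thehandler", ""))  -- registered: the handler runs
print(dispatch(script, "helper", ""))      -- unregistered: rejected
```

Keying the set by function value rather than by name mirrors the fact that objclass.register is handed the function itself, while the client refers to it by name.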
Error Handling Semantics #
In the previous section we presented an example object class method written in C++ that calculated the MD5 hash of an object. Returning to this example, notice that each operation on the object is carefully checked for failure, and an error code is returned if any operation fails. When a negative value is returned from an object class handler the current transaction will be aborted, and the return value is passed back to the client. When the handler has completed successfully a return value of zero will commit the transaction. While in C++ we must perform these checks explicitly, in Lua this common error-handling pattern is taken care of automatically by the run-time. Take as an example the following C++ object class handler:
int handle1(cls_method_context_t hctx, bufferlist *in, bufferlist *out)
{
  int ret = cls_cxx_create(hctx, true);
  if (ret < 0)
    return ret;
  ...
  return 0;
}
The handler handle1 will return -EEXIST if the object already exists (or any other error encountered when running cls_cxx_create), and return zero if the handler completes successfully. The same functionality can be constructed in Lua, but when error handling fits this common pattern of aborting automatically, the Lua object class run-time will automagically select the correct return value. For instance, in the following example, handle2 and handle3 have identical semantics to handle1 defined above in C++.
function handle2(input, output)
  objclass.create(true)
  return 0
end

function handle3(input, output)
  objclass.create(true)
end

objclass.register(handle2)
objclass.register(handle3)
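The automatic behavior can be modeled as the run-time calling the handler in protected mode and translating the outcome into a return code. The sketch below is plain Lua built on stand-ins: `create`, `run_handler`, the `exists` flag, and the numeric values given to `EEXIST` and `EIO` are assumptions made for illustration, and the real run-time may well work differently.

```lua
-- Toy model of automatic return-value selection (NOT the real run-time).
local EEXIST, EIO = 17, 5      -- placeholder values, for illustration only
local exists = false           -- pretend state of the target object

-- Stand-in for objclass.create(exclusive): raises the error code on failure.
local function create(exclusive)
  if exists and exclusive then
    error(-EEXIST, 0)
  end
  exists = true
end

-- Run a handler the way the text describes: an uncaught error aborts with
-- its code, and a missing return value counts as success (zero).
local function run_handler(handler)
  local ok, ret = pcall(handler)
  if not ok then
    return type(ret) == "number" and ret or -EIO
  end
  return ret or 0
end

local function handle2() create(true); return 0 end
local function handle3() create(true) end

print(run_handler(handle2))  -- object did not exist yet: success
print(run_handler(handle3))  -- object now exists: aborted with -EEXIST
```

Under this model an explicit `return 0` and an implicit return are indistinguishable to the client, which is why handle2 and handle3 behave the same.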
Some operations return error codes that we may want to handle directly. For example, when retrieving a value from the object map, -ENOENT is used to indicate that the given key was not found. If the handler code can deal with this case (e.g. creating and initializing a new key), then it is simple enough to just return all other error codes. This exact scenario is shown in the following C++ handler, in which we abort on any error code that is not -ENOENT.
int handle(cls_method_context_t hctx, bufferlist *in, bufferlist *out)
{
  string key;
  ::decode(key, *in);
  // look up the key; bl receives the value if the key exists
  bufferlist bl;
  int ret = cls_cxx_map_get_val(hctx, key, &bl);
  if (ret < 0 && ret != -ENOENT)
    return ret;
  if (ret == -ENOENT) {
    /* initialize new key */
  }
  ...
  return 0;
}
The same handler can be constructed in Lua as follows:
function handle(input, output)
  local key = input:str()
  -- protected call: a failure is returned to us instead of aborting
  local ok, ret_or_val = pcall(objclass.map_get_val, key)
  if not ok then
    if ret_or_val ~= -objclass.ENOENT then
      return ret_or_val
    else
      -- initialize new key
    end
  end
  local val = ret_or_val
  ...
  return 0
end
The trick here is to call objclass.map_get_val in protected mode via the Lua pcall function, which prevents any errors from being automatically propagated to the caller, allowing our handler to examine the return value.
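To see the protected-call pattern in isolation, here is a small sketch that runs under any stock Lua interpreter. The `map_get_val` stand-in, the in-memory `index` table, the `get_or_init` helper, and the value given to `ENOENT` are all assumptions for illustration. In particular, the sketch assumes that a failing operation raises its negative error code as the Lua error value.

```lua
-- Toy model of handling one specific error code (NOT the real objclass API).
local ENOENT = 2                 -- placeholder value, for illustration only
local index = { color = "blue" } -- pretend object map

-- Stand-in for objclass.map_get_val: raises -ENOENT when the key is missing.
local function map_get_val(key)
  local val = index[key]
  if val == nil then
    error(-ENOENT, 0)
  end
  return val
end

-- Return the stored value, creating the key with `default` if it is missing.
-- Any error other than -ENOENT is re-raised so the caller still aborts.
local function get_or_init(key, default)
  local ok, ret_or_val = pcall(map_get_val, key)
  if ok then
    return ret_or_val
  end
  if ret_or_val ~= -ENOENT then
    error(ret_or_val, 0)
  end
  index[key] = default
  return default
end

print(get_or_init("color", "red"))    -- key exists: stored value wins
print(get_or_init("shape", "round"))  -- key missing: initialized to default
```

Because the error value is a number rather than a string, Lua does not prepend position information to it, so the code compares cleanly against `-ENOENT`.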
Logging #
An object class can write into the OSD log (e.g. /var/log/ceph/osd-0.log) to record debugging information using the objclass.log function. The function takes any number of arguments which are converted into strings and separated by spaces in the final output. If the first argument is numeric then it is interpreted as a log-level. If no log-level is specified a default log-level is used.
objclass.log('hi') -- will log 'hi'
objclass.log(0, 'ouch') -- log 'ouch' at log-level = 0
objclass.log('foo', 'bar') -- log 'foo bar'
objclass.log(1) -- will log '1' at default log-level
Logging is useful in debugging script execution and can also be used to provide more detailed error information.
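The argument rules can be restated as a few lines of plain Lua. This is only a model inferred from the four calls shown above, not the actual objclass.log code: the `format_log` name and the `DEFAULT_LEVEL` value are hypothetical, and the rule that a lone number is treated as the message is read off the `objclass.log(1)` example.

```lua
-- Toy model of the argument handling described above (NOT the real objclass.log).
local DEFAULT_LEVEL = 10  -- hypothetical default log-level

-- Returns the log-level and the message that would be written.
local function format_log(...)
  local n = select("#", ...)
  local args = { ... }
  local level, first = DEFAULT_LEVEL, 1
  -- A leading number is a log-level only when a message follows it.
  if n > 1 and type(args[1]) == "number" then
    level, first = args[1], 2
  end
  local parts = {}
  for i = first, n do
    parts[#parts + 1] = tostring(args[i])
  end
  return level, table.concat(parts, " ")
end

print(format_log("hi"))          -- default level, message 'hi'
print(format_log(0, "ouch"))     -- level 0, message 'ouch'
print(format_log("foo", "bar"))  -- default level, message 'foo bar'
print(format_log(1))             -- default level, message '1'
```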
Object Payload I/O #
The payload data of an object can be read from and written to using the objclass.read and objclass.write functions. Each function takes an offset and length parameter.
size, mtime = objclass.stat()
data = objclass.read(0, size) -- size bytes from offset 0
objclass.write(0, data:length(), data) -- length of data at offset 0
Index Access #
A key/value store supporting range queries (based on Google’s LevelDB) can be accessed using the objclass.map_set_val and objclass.map_get_val functions. A key can be any string and a value is a standard blob of any size.
function handler(input, output)
  objclass.map_set_val("foo", input)
  data = objclass.map_get_val("foo")
  assert(data == input)
end
Additional Resources #
At the time of the original post, the Lua object class facility was not yet in the mainline Ceph tree (see the update at the top of this post). The feature lived in the cls-lua branch, which could be checked out from GitHub:
git clone --branch cls-lua git://github.com/ceph/ceph.git
The normal procedures for building and installing Ceph from source apply; the only additional dependency is the LuaJIT development libraries, which are available as packages on Ubuntu. In addition, more functionality than is listed in this post has been implemented, and a set of unit tests in the source tree demonstrates the full range of features.
Lua Client Libraries #
Before we jump into the sample application, I’ll introduce two additional components that will make our life easier. The first is Lua bindings for librados, and the second is a Lua library that hides the details of serializing Lua scripts for execution within the OSD.
Lua-RADOS #
Lua bindings for the librados client library are available on GitHub in the lua-rados project. Here we will provide a brief overview for context. Please consult the full documentation for additional information. Ok, let’s jump right in. The following code snippet shows how to connect to a RADOS cluster:
local rados = require "rados"
local cluster = rados.create()
cluster:conf_read_file()
cluster:connect()
Next, open a client I/O context for a particular pool:
local ioctx = cluster:open_ioctx('data')
Now the Lua client can interact with objects, such as setting an extended attribute:
local name = 'xattr key'
local data = 'i am some important data'
ioctx:setxattr('my_obj', name, data, #data)
Those are the basics of writing RADOS clients in Lua. Now, let’s run some remote scripts from a Lua client.
Cls-Lua Client #
The protocol for sending a script to an OSD is fairly simple, but is easily wrapped up in a convenience library. The cls-lua-client library does just that, building on top of the lua-rados library described in the previous section. Assuming that we have connected to a RADOS cluster and constructed an I/O context object, a remote Lua script can be executed as in the following example. First, let’s create a Lua string containing the script we want to execute.
local script = [[
function say_hello(input, output)
  output:append("Hello, ")
  if #input == 0 then
    output:append("world")
  else
    output:append(input:str())
  end
  output:append("!")
end
objclass.register(say_hello)
]]
The script above will send to its output the string “Hello, world!” if the input is zero-length. Otherwise, it will reply with “Hello, input!”, where input is substituted with the input sent from the client. This can be remotely executed using the cls-lua-client library as follows:
local ret, outdata = clslua.exec(ioctx, "oid", script, "say_hello", "")
print(outdata)
local ret, outdata = clslua.exec(ioctx, "oid", script, "say_hello", "John")
print(outdata)
Executing this would produce the output:
Hello, world!
Hello, John!
Great, now we have all the pieces to start building a sample application!
Example Application: Image Thumbnail Service #
As a driving example we will construct a service on top of RADOS that stores and generates image thumbnails. The service is very simple, and has the following properties.
- Writing an image into an object sets the “base” or “original” image data.
- A thumbnail computed from the base image can be generated remotely inside the OSD.
- The original image and any generated thumbnail can be retrieved.
In the following examples I’ll demonstrate the core of the service. In practice these routines would be added to a larger project or executable, and of course made more robust against errors and edge cases. A fully functional example of this can be found in the cls-lua-client project on GitHub.
Storing an Image #
To store an image in RADOS we first read it from a local file, and then write it to the object. In order to support storage and retrieval of different thumbnails, we record the location and size of an image blob in the object index under a key describing it. In this simple example writing an image sets its base image, so we store it under the key “original”.
function put(object, filename)
  -- read in image blob from file
  local file = assert(io.open(filename, "rb"))
  local img = file:read("*all")
  file:close()
  -- write the blob into the object
  local size, offset = #img, 0
  ioctx:write(object, img, size, offset)
  -- record size/offset in the object index
  local loc_spec = size .. "@" .. offset
  ioctx:omapset(object, {
    original = loc_spec,
  })
end
Reducing Round-trips #
In the previous example two round-trips were required to 1) set the object data and 2) update the index. These can be done atomically in a single round-trip by using a co-designed interface, demonstrated in the following script:
function put_smart(object, filename)
  -- define the script to execute remotely
  local script = [[
    function put(input, output)
      -- write the input blob
      local size, offset = #input, 0
      objclass.write(offset, size, input)
      -- update the leveldb index
      local loc_spec_bl = bufferlist.new()
      local loc_spec = size .. "@" .. offset
      loc_spec_bl:append(loc_spec)
      objclass.map_set_val("original", loc_spec_bl)
    end
    objclass.register(put)
  ]]
  -- read the input image blob from the file
  local file = assert(io.open(filename, "rb"))
  local img = file:read("*all")
  file:close()
  -- remotely execute script with image as input
  clslua.exec(ioctx, object, script, "put", img)
end
The script reads the image from the file and sends the image as the input to a script which executes on the OSD, taking care of the write and index update at the same time. Neat!
Retrieving an Image #
To read a particular version of an image we need to look-up the offset and length for the target image blob stored in the object index. In the following example the index look-up and object read are performed remotely, and the image is returned to the client if it exists. In the next section I’ll show how the spec string is stored, but for context it describes the specification for creating a thumbnail (e.g. 500×400 pixels).
function get(object, filename, spec)
  local script = [[
    function get(input, output)
      -- lookup the location of the image given the spec
      local loc_spec_bl = objclass.map_get_val(input:str())
      local size, offset = string.match(loc_spec_bl:str(), "(%d+)@(%d+)")
      -- read and return the image blob from the object
      local out_bl = objclass.read(offset, size)
      output:append(out_bl:str())
    end
    objclass.register(get)
  ]]
  -- execute script remotely
  local ret, img = clslua.exec(ioctx, object, script, "get", spec)
  -- write image to output file
  local file = assert(io.open(filename, "wb"))
  file:write(img)
  file:close()
end
The image returned from the script is then written to the output file.
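The index entries used here are plain “size@offset” strings, so a handler that wants to fail cleanly on a missing or damaged entry can validate the string before reading. The following is a minimal sketch in plain Lua; the `parse_loc_spec` helper and its error messages are hypothetical and not part of cls-lua.

```lua
-- Parse a "size@offset" location string into numbers.
-- Returns a table on success, or nil plus a message on failure.
local function parse_loc_spec(loc_spec)
  if type(loc_spec) ~= "string" then
    return nil, "no location recorded"
  end
  -- Anchor the pattern so trailing or leading junk is rejected.
  local size, offset = string.match(loc_spec, "^(%d+)@(%d+)$")
  if not size then
    return nil, "malformed location: " .. loc_spec
  end
  return { size = tonumber(size), offset = tonumber(offset) }
end

local loc = assert(parse_loc_spec("1024@0"))
print(loc.size, loc.offset)

print(parse_loc_spec("1024@"))  -- rejected: offset is missing
```

Converting the captures with `tonumber` also avoids relying on implicit string-to-number coercion when the values are later passed to a read call.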
Generating Thumbnails #
Thumbnails are generated using Lua wrappers to ImageMagick available on GitHub at https://github.com/leafo/magick. A thumbnail is generated using the magick.thumb function, passing in an image blob and a thumbnail specification string (e.g. 500×300 pixels). The script that runs remotely first reads the original image, computes the thumbnail, appends the thumbnail to the object payload, and then records the offset and size of the thumbnail in the object index under a key equal to the specification string.
function thumb(object, spec_string)
  local script = [[
    local magick = require "magick"

    function get_orig_img()
      -- lookup the location of the original image
      local loc_spec_bl = objclass.map_get_val("original")
      local size, offset = string.match(loc_spec_bl:str(), "(%d+)@(%d+)")
      -- read image into memory
      return objclass.read(offset, size)
    end

    function thumb(input, output)
      -- apply thumbnail spec to original image
      local spec_string = input:str()
      local blob = get_orig_img()
      local img = assert(magick.load_image_from_blob(blob:str()))
      img = magick.thumb(img, spec_string)
      -- append thumbnail to object
      local obj_size = objclass.stat()
      local img_bl = bufferlist.new()
      img_bl:append(img)
      objclass.write(obj_size, #img_bl, img_bl)
      -- save location in leveldb
      local loc_spec = #img_bl .. "@" .. obj_size
      local loc_spec_bl = bufferlist.new()
      loc_spec_bl:append(loc_spec)
      objclass.map_set_val(spec_string, loc_spec_bl)
    end

    objclass.register(thumb)
  ]]
  clslua.exec(ioctx, object, script, "thumb", spec_string)
end
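To make the resulting object layout concrete, the sketch below models a single object as a byte string plus an index table and replays the store, thumbnail, and retrieve steps against it. It is plain Lua with no RADOS or ImageMagick involved: `obj`, `append_blob`, and `read_blob` are hypothetical names, and the byte strings are placeholders for real image data.

```lua
-- In-memory model of one object: a byte payload plus a key/value index.
local obj = { payload = "", index = {} }

-- Append a blob to the payload and record "size@offset" under `key`.
local function append_blob(key, blob)
  local offset = #obj.payload
  obj.payload = obj.payload .. blob
  obj.index[key] = #blob .. "@" .. offset
end

-- Look up `key` in the index and slice the matching bytes out of the payload.
local function read_blob(key)
  local loc = obj.index[key]
  if not loc then
    return nil
  end
  local size, offset = string.match(loc, "(%d+)@(%d+)")
  size, offset = tonumber(size), tonumber(offset)
  -- Lua strings are 1-indexed, object offsets are 0-indexed.
  return string.sub(obj.payload, offset + 1, offset + size)
end

append_blob("original", "ORIGINAL-IMAGE-BYTES")  -- the base image
append_blob("500x300", "thumb-bytes")            -- a generated thumbnail

print(obj.index["original"], obj.index["500x300"])
print(read_blob("500x300"))
```

Each new thumbnail lands at the current end of the payload, so existing entries in the index never need to be rewritten.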
And that’s it folks… on-the-fly custom RADOS object interfaces! Want to contribute? We are continually improving the Lua bindings and the internal Lua object class API and are always looking for feedback. Thanks for stopping by!