Hadoop on Ceph: diving in

It has been possible for several years now to run Hadoop on top of the Ceph file system using a shim layer that maps between the HDFS abstraction and the underlying Ceph file interface. Since then bug fixes and performance enhancements have found their way into the shim, but usability has remained a sore area primarily due to the lack of documentation, and low-level setup required in many instances. This post marks the beginning of a series of posts on using Hadoop on top of Ceph.

These posts are not meant to be a substitute for the official documentation. But official documentation doesn’t exist, you say? Well, that’s the journey I’m starting now, the culmination of which will hopefully be a set of easy to use tools and documentation.

It’s been a while since I setup Hadoop on top of Ceph. So let’s jump into the deep end and see what happens. In these posts I won’t dive much (if at all) into the installation details of Ceph. For that I’ll refer the reader to https://docs.ceph.com/ which contains excellent documentation on getting started with the Ceph storage system. The Ceph file system is still in development, and its documentation is sparse and in flux, but the official Ceph documentation has enough information to get you up and running with the file system. I’ll generally just post the version of software and configuration of the Ceph file system, and highlight anything of specific importance.

Ceph Installation #

Here’s the basic setup. Everything is done with ceph-deploy.

Fedora Server 22
Ceph development installation
- ceph version 9.0.1-1455-g6612058 (661205899be3e3f7b53682a3b87c1340e395b62e)
Single host, 1 monitor, 1 OSD

Ceph is installed, and ceph status reports:

[nwatkins@localhost cluster]$ sudo ceph status
    cluster e3f30405-0b81-4c1c-a811-91b68b076644
     health HEALTH_OK
     monmap e1: 1 mons at {localhost=[::1]:6789/0}
            election epoch 2, quorum 0 localhost
     osdmap e5: 1 osds: 1 up, 1 in
      pgmap v7: 64 pgs, 1 pools, 0 bytes data, 0 objects
            6715 MB used, 31669 MB / 38385 MB avail
                  64 active+clean

The Ceph file system consists of one metadata pool and one data pool:

[nwatkins@localhost cluster]$ ceph mds stat
e5: 1/1/1 up {0=localhost=up:active}

[nwatkins@localhost cluster]$ ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]

The current Ceph installation has a single administrative user. We’ll focus on this particular configuration and examine using custom users later, including the implications on Hadoop configuration.

Hadoop Setup #

I’m following the basic single-node Hadoop setup instructions.

java-1.8.0-openjdk
Hadoop 2.7.1 binary
set JAVA HOME to /usr/lib/jvm/jre

Right out of the box Hadoop should has some basic functionality. fs -ls lists the current directory on the local file system. The goal of this post is to setup Hadoop so that this command will list directory contents in the Ceph file system.

[nwatkins@localhost hadoop-2.7.1]$ bin/hadoop fs -ls
Found 10 items
-rw-r--r--   1 nwatkins nwatkins      15429 2015-06-29 00:15 LICENSE.txt
-rw-r--r--   1 nwatkins nwatkins        101 2015-06-29 00:15 NOTICE.txt
-rw-r--r--   1 nwatkins nwatkins       1366 2015-06-29 00:15 README.txt
drwxr-xr-x   - nwatkins nwatkins       4096 2015-06-29 00:15 bin
drwxr-xr-x   - nwatkins nwatkins         19 2015-06-29 00:15 etc
drwxr-xr-x   - nwatkins nwatkins        101 2015-06-29 00:15 include
drwxr-xr-x   - nwatkins nwatkins         19 2015-06-29 00:15 lib
drwxr-xr-x   - nwatkins nwatkins       4096 2015-06-29 00:15 libexec
drwxr-xr-x   - nwatkins nwatkins       4096 2015-06-29 00:15 sbin
drwxr-xr-x   - nwatkins nwatkins         29 2015-06-29 00:15 share

In a Hadoop setup that runs on top of HDFS this would be the point at which HDFS is setup, and Hadoop is configured to run on top of HDFS. For instance, a basic Hadoop tutorial will use a configuration similar to this:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Instead of point Hadoop at HDFS, we want to point it at Ceph.

Hadoop/Ceph Setup #

Add the following to the core-site.xml Hadoop configuration file. The fs.defaultFS generally should point at a Ceph monitor with the default Ceph port. There are a variety of configuration options, but this is common. The other configuration properties are used to tell Hadoop the classpath of the shim layer.

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>ceph://localhost:6789/</value>
  </property>

  <!-- default 1.x implementation -->
  <property>
    <name>fs.ceph.impl</name>
    <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
  </property>

  <!-- default implementation -->
  <property>
    <name>fs.AbstractFileSystem.ceph.impl</name>
    <value>org.apache.hadoop.fs.ceph.CephFs</value>
  </property>
</configuration>

If we repeat the fs -ls command Hadoop complains that it cannot find the Ceph file system shim.

[nwatkins@localhost hadoop-2.7.1]$ bin/hadoop fs -ls
-ls: Fatal internal error
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.ceph.CephFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2638)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)

The shim layer can be obtained from https://github.com/ceph/cephfs-hadoop.git. Once cloned, run mvn package. This hits us with a new error which basically says that the Java bindings for libcephfs cannot be found.

Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.552 sec <<< FAILURE!
org.apache.hadoop.fs.test.unit.HcfsUmaskTest  Time elapsed: 0.551 sec  <<< ERROR!
java.lang.UnsatisfiedLinkError: no cephfs_jni in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1865)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at com.ceph.fs.CephNativeLoader.<clinit>(CephNativeLoader.java:28)
        at com.ceph.fs.CephMount.<clinit>(CephMount.java:88)

That’s an easy fix. Install with ceph-deploy:

ceph-deploy pkg --install cephfs-java localhost

That will put some bits in the local file system:

/usr/lib64/libcephfs_jni.so.1
/usr/lib64/libcephfs_jni.so.1.0.0
/usr/share/java/libcephfs-test.jar
/usr/share/java/libcephfs.jar

Unfortunately Maven will not look here. So copy the installed file libcephfs.jar into src/main/resources in the cephfs-hadoop tree. Run the Maven package command again, but skip the tests (this is additional setup we don’t want to deal with right now):

mvn -Dmaven.test.skip=true package

The build should now succeed:

[nwatkins@localhost cephfs-hadoop]$ ls -l target/
total 120
drwxrwxr-x. 3 nwatkins nwatkins  4096 Jul 12 20:32 apidocs
-rw-rw-r--. 1 nwatkins nwatkins 30369 Jul 12 20:31 cephfs-hadoop-0.80.6.jar
-rw-rw-r--. 1 nwatkins nwatkins 53428 Jul 12 20:32 cephfs-hadoop-0.80.6-javadoc.jar
-rw-rw-r--. 1 nwatkins nwatkins 26455 Jul 12 20:31 cephfs-hadoop-0.80.6-sources.jar
drwxrwxr-x. 3 nwatkins nwatkins    36 Jul 12 20:31 classes
drwxrwxr-x. 2 nwatkins nwatkins    69 Jul 12 20:32 javadoc-bundle-options
drwxrwxr-x. 2 nwatkins nwatkins    27 Jul 12 20:31 maven-archiver
drwxrwxr-x. 3 nwatkins nwatkins    34 Jul 12 20:31 maven-status

Now that we have the shim layer, we need to add it to the Hadoop CLASSPATH. Copy over target/libcephfs-hadoop-0.80.6.jar from cephfs-hadoop into hadoop/lib, and do the same for the installed libcephfs.jar file located at /usr/share/java/libcephfs.jar. Now setup the environment so Hadoop will pick up these dependencies.

export HADOOP_CLASSPATH=$HOME/hadoop-2.7.1/lib/cephfs-hadoop-0.80.6.jar
export HADOOP_CLASSPATH=$HOME/hadoop-2.7.1/lib/libcephfs.jar:$HADOOP_CLASSPATH

Unfortunately Hadoop doesn’t always look at the system directories for these dependencies. Running again you may see something like this which is saying that the native (i.e. C++) bits can’t be found.

[nwatkins@localhost hadoop-2.7.1]$ bin/hadoop fs -ls
Exception in thread "main" java.lang.UnsatisfiedLinkError: no cephfs_jni in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1865)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at com.ceph.fs.CephNativeLoader.<clinit>(CephNativeLoader.java:28)
        at com.ceph.fs.CephMount.<clinit>(CephMount.java:97)

To resolve this, copy the /usr/lib64/libcephfs_jni.* into hadoop/lib/native.

natkins@localhost hadoop-2.7.1]$ ls -l /usr/lib64/libcephfs_jni*
lrwxrwxrwx. 1 root root     22 Jul 10 16:20 /usr/lib64/libcephfs_jni.so.1 -> libcephfs_jni.so.1.0.0
-rwxr-xr-x. 1 root root 106544 Jul 10 16:39 /usr/lib64/libcephfs_jni.so.1.0.0

The Java bindings for libcephfs are setup to look for libcephfs_jni.so, so add a symbolic link:

[nwatkins@localhost native]$ ln -s libcephfs_jni.so.1.0.0 libcephfs_jni.so
[nwatkins@localhost native]$ ls -l
total 5144
lrwxrwxrwx. 1 nwatkins nwatkins      22 Jul 12 21:01 libcephfs_jni.so -> libcephfs_jni.so.1.0.0
-rwxr-xr-x. 1 nwatkins nwatkins  106544 Jul 12 21:01 libcephfs_jni.so.1
-rwxr-xr-x. 1 nwatkins nwatkins  106544 Jul 12 21:01 libcephfs_jni.so.1.0.0

Now we should be all set. Run fs -ls again:

[nwatkins@localhost hadoop-2.7.1]$ bin/hadoop fs -ls
ls: `.': No such file or directory

Hadoop will default to looking at the user’s home directory, but this doesn’t yet exist. We can stick some stuff in the root directory to verify:

[nwatkins@localhost hadoop-2.7.1]$ bin/hadoop fs -ls /
[nwatkins@localhost hadoop-2.7.1]$ bin/hadoop fs -put etc/hadoop/core-site.xml /
15/07/12 21:03:01 INFO ceph.CephFileSystem: selectDataPool path=ceph://localhost:6789/core-site.xml._COPYING_ pool:repl=cephfs_data:1 wanted=3
[nwatkins@localhost hadoop-2.7.1]$ bin/hadoop fs -ls /
Found 1 items
-rw-r--r--   1 nwatkins       4269 2015-07-12 21:03 /core-site.xml

Success. Now to make that a hell of a lot easier :)