OpRequest flow in RADOS OSD server
This post is a quick tour of the life cycle of an OpRequest in the
Ceph/RADOS storage server. We’ll follow the request from the time the generic
message arrives off the network to the point at which the resulting
transaction for an object operation reaches the low-level object store layer.
The Messenger handles connections and generic messages. A message is
dispatched to the registered dispatchers via the ms_dispatch virtual method on
the Dispatcher interface, which the OSD class implements. Two high-level
asynchronous traces are described below. The first covers receiving,
preparing, and queueing a request. The second follows the separate worker
threads that dequeue requests and process them.
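The dispatch relationship described above can be sketched in a few lines. This is a toy model, not Ceph's code: ToyMessage, ToyDispatcher, ToyOSD, ToyMessenger, and kToyOpType are hypothetical names, and the choice to stop at the first dispatcher that accepts a message is an assumption made for the sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not Ceph's real classes.
constexpr int kToyOpType = 42;  // made-up message type

struct ToyMessage {
    int type;
    std::string payload;
};

// Mirrors the idea of the Dispatcher interface: return true if the
// message was handled.
struct ToyDispatcher {
    virtual ~ToyDispatcher() = default;
    virtual bool ms_dispatch(ToyMessage *m) = 0;
};

// Toy OSD that only claims messages of one type.
struct ToyOSD : ToyDispatcher {
    std::vector<std::string> received;
    bool ms_dispatch(ToyMessage *m) override {
        if (m->type != kToyOpType) return false;
        received.push_back(m->payload);
        return true;
    }
};

// Toy messenger: offers the message to each registered dispatcher in
// turn and stops at the first one that accepts it (an assumption).
struct ToyMessenger {
    std::vector<ToyDispatcher *> dispatchers;
    void add_dispatcher(ToyDispatcher *d) { dispatchers.push_back(d); }
    bool deliver(ToyMessage *m) {
        for (ToyDispatcher *d : dispatchers)
            if (d->ms_dispatch(m)) return true;
        return false;
    }
};
```

The point of the indirection is that the messenger knows nothing about OSD logic; it only sees the virtual ms_dispatch entry point.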
Message Dispatch and Request Queuing
The trace begins when a message is dispatched to the OSD:
- bool OSD::ms_dispatch(Message *m)
  - src/osd/OSD.cc:4720
Two paths can be taken, both of which arrive at OSD::dispatch_op.
- void OSD::_dispatch(Message *m)
  - Construct a new OpRequest
  - src/osd/OSD.cc:4937
- void OSD::do_waiters()
  - Grab an existing OpRequest
  - src/osd/OSD.cc:4840
Both _dispatch and do_waiters will then process a request:
- void OSD::dispatch_op(OpRequestRef op)
  - src/osd/OSD.cc:4857
- void OSD::handle_op(OpRequestRef op)
  - src/osd/OSD.cc:7352
- void OSD::enqueue_op(PG *pg, OpRequestRef op)
  - src/osd/OSD.cc:7546
- void PG::queue_op(OpRequestRef op)
  - src/osd/PG.cc:1707
The request now sits on a queue, waiting to be picked up by a worker.
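As a rough illustration of how the two entry points converge, here is a minimal sketch. Everything in it is hypothetical: SketchOSD, SketchMessage, SketchOpRequest, and in particular the ready flag, which stands in for whatever condition makes the real OSD park a request instead of dispatching it immediately.

```cpp
#include <deque>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real Message and OpRequest carry much more.
struct SketchMessage { std::string body; };
struct SketchOpRequest { std::string body; };
using SketchOpRequestRef = std::shared_ptr<SketchOpRequest>;

class SketchOSD {
    std::deque<SketchOpRequestRef> waiters_;  // requests parked for later
public:
    bool ready = false;               // made-up "can dispatch now" condition
    std::vector<std::string> queued;  // stands in for the PG's queue

    // Path 1: wrap a fresh message in a new request.
    void dispatch(const SketchMessage &m) {
        auto op = std::make_shared<SketchOpRequest>(SketchOpRequest{m.body});
        // Park the request if we cannot dispatch yet, or if older
        // requests are still waiting (this keeps arrival order).
        if (!ready || !waiters_.empty()) {
            waiters_.push_back(op);
            return;
        }
        dispatch_op(op);
    }

    // Path 2: pick up requests that were parked earlier.
    void do_waiters() {
        while (ready && !waiters_.empty()) {
            SketchOpRequestRef op = waiters_.front();
            waiters_.pop_front();
            dispatch_op(op);
        }
    }

    // Common tail, collapsing dispatch_op -> handle_op -> enqueue_op
    // -> queue_op into a single step.
    void dispatch_op(const SketchOpRequestRef &op) {
        queued.push_back(op->body);
    }
};
```

Both paths end in the same place, which is why the rest of the trace does not care whether a request was freshly constructed or resumed.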
Request Processing
The rough flow:
- struct OpWQ: public ThreadPool::WorkQueueVal<pair<PGRef, OpRequestRef>, PGRef >
  - src/osd/OSD.h:1101
- void OSD::OpWQ::_process(PGRef pg, ThreadPool::TPHandle &handle)
  - src/osd/OSD.cc:7604
- void OSD::dequeue_op(PGRef pg, OpRequestRef op, ThreadPool::TPHandle &handle)
  - src/osd/OSD.cc:7643
- void ReplicatedPG::do_request(OpRequestRef op, ThreadPool::TPHandle &handle)
  - src/osd/ReplicatedPG.cc:1080
- void ReplicatedPG::do_op(OpRequestRef op)
  - src/osd/ReplicatedPG.cc:1191
- void ReplicatedPG::execute_ctx(OpContext *ctx)
  - src/osd/ReplicatedPG.cc:1706
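The worker side of the flow above can be modeled as a queue of (pg, op) pairs that is drained one item at a time. The sketch below is single-threaded for brevity and uses hypothetical names (ToyOp, ToyPG, ToyOpWQ, process_one); in the real OSD the equivalent of process_one is run by thread pool workers.

```cpp
#include <deque>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real PG and OpRequest are far richer.
struct ToyOp { std::string name; };
using ToyOpRef = std::shared_ptr<ToyOp>;

struct ToyPG {
    std::vector<std::string> handled;
    // Plays the role of do_request -> do_op -> execute_ctx.
    void do_request(const ToyOpRef &op) { handled.push_back(op->name); }
};
using ToyPGRef = std::shared_ptr<ToyPG>;

// Toy work queue of (pg, op) pairs, loosely modeled on OpWQ.
class ToyOpWQ {
    std::deque<std::pair<ToyPGRef, ToyOpRef>> items_;
public:
    void enqueue(ToyPGRef pg, ToyOpRef op) {
        items_.emplace_back(std::move(pg), std::move(op));
    }

    // What a worker would run per item: dequeue the pair, then hand
    // the op to its PG. Returns false when the queue is empty.
    bool process_one() {
        if (items_.empty()) return false;
        std::pair<ToyPGRef, ToyOpRef> item = std::move(items_.front());
        items_.pop_front();
        item.first->do_request(item.second);
        return true;
    }
};
```

Holding the PG by reference-counted pointer in the queue entry mirrors the PGRef in the OpWQ declaration: the PG stays alive for as long as work for it is queued.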
The following sub-trace shows the path taken to the actual logic behind a
RADOS client write operation. All other client operations can be found down
this path as well: CEPH_OSD_OP_WRITE is one case among all the other client
operations in a large switch statement in do_osd_ops.
- int ReplicatedPG::prepare_transaction(OpContext *ctx)
  - src/osd/ReplicatedPG.cc:5055
- int ReplicatedPG::do_osd_ops(OpContext *ctx, vector<OSDOp>& ops)
  - src/osd/ReplicatedPG.cc:2921
- case CEPH_OSD_OP_WRITE
  - src/osd/ReplicatedPG.cc:3650
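To make the shape of that switch statement concrete, here is a minimal sketch of a loop that walks a vector of ops and accumulates mutations into a transaction. The opcodes, ToyOsdOp, ToyTransaction, and do_toy_osd_ops are all invented for illustration; the real CEPH_OSD_OP_* handling is far more involved.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Made-up opcodes; the real CEPH_OSD_OP_* values live in the Ceph headers.
enum class ToyOpCode { Write, Truncate, Read };

struct ToyOsdOp {
    ToyOpCode code;
    std::uint64_t offset;
    std::uint64_t length;
};

// A transaction here is just an ordered list of textual mutations.
struct ToyTransaction { std::vector<std::string> steps; };

// Walks the ops in order and appends one mutation per mutating op.
// Returns 0 on success, -1 if an unknown opcode is encountered.
int do_toy_osd_ops(const std::vector<ToyOsdOp> &ops, ToyTransaction &txn) {
    for (const ToyOsdOp &op : ops) {
        switch (op.code) {
        case ToyOpCode::Write:
            txn.steps.push_back("write " + std::to_string(op.offset) +
                                "~" + std::to_string(op.length));
            break;
        case ToyOpCode::Truncate:
            txn.steps.push_back("truncate " + std::to_string(op.offset));
            break;
        case ToyOpCode::Read:
            break;  // reads do not add anything to the transaction
        default:
            return -1;
        }
    }
    return 0;
}
```

Nothing touches storage at this stage; the function only builds up the list of changes that a later step will submit.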
The accumulated transaction is submitted in issue_repop, which calls
submit_transaction on the configured PGBackend (e.g. replication or erasure
coding). The backend communicates with the replicas and runs the transaction
against the local object store.
- void ReplicatedPG::issue_repop(RepGather *repop, utime_t now)
  - src/osd/ReplicatedPG.cc:6660
- virtual void submit_transaction(
  - src/osd/PGBackend.h:490
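The backend hand-off can be pictured as a virtual call on an abstract interface. The sketch below assumes a much simpler signature than the real submit_transaction, models replicas as in-memory stores rather than network peers, and uses hypothetical names throughout (ToyTxn, ToyStore, ToyBackend, ToyReplicatedBackend, issue_toy_repop).

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct ToyTxn { std::vector<std::string> steps; };

// Stand-in for an object store: it just records what was applied.
struct ToyStore {
    std::vector<std::string> applied;
    void apply(const ToyTxn &t) {
        applied.insert(applied.end(), t.steps.begin(), t.steps.end());
    }
};

// Abstract backend; the caller does not know which strategy is in use.
struct ToyBackend {
    virtual ~ToyBackend() = default;
    virtual void submit_transaction(const ToyTxn &t) = 0;
};

// Replication-style backend: every replica and the local store get
// the full transaction. Replicas are modeled as local objects here.
struct ToyReplicatedBackend : ToyBackend {
    ToyStore local;
    std::vector<ToyStore> replicas;
    explicit ToyReplicatedBackend(std::size_t n) : replicas(n) {}
    void submit_transaction(const ToyTxn &t) override {
        for (ToyStore &r : replicas) r.apply(t);
        local.apply(t);
    }
};

// Plays the role of issue_repop: hand the transaction to the backend.
void issue_toy_repop(ToyBackend &backend, const ToyTxn &t) {
    backend.submit_transaction(t);
}
```

An erasure-coding backend would implement the same interface but split the data differently, which is the reason for routing the submission through a virtual method.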
The local object store (e.g. FileStore or BlueStore) is what manages the
underlying storage hardware.