This post is a quick tour of the life cycle of an OpRequest in the
Ceph/RADOS storage server. We’ll follow a request from the moment the generic
message arrives off the network to the point where the resulting transaction
for an object operation reaches the low-level object store layer.
The Messenger handles connections and generic messages. A message will be
dispatched to any registered dispatchers via the ms_dispatch virtual method
on the Dispatcher interface. The OSD class implements the Dispatcher
interface. There are two high-level asynchronous traces described below. The
first is the process of receiving, preparing, and queueing a request. The
second is from the perspective of separate worker threads that dequeue
requests to be processed.
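To make the dispatch mechanism concrete, here is a minimal sketch of the pattern. It is not Ceph code: Message is reduced to a single field, and ToyMessenger, ToyDispatchOSD, and kToyOpType are hypothetical names invented for illustration. The sketch also assumes that delivery stops at the first dispatcher that reports the message as handled.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for a generic network message (not the real Ceph type).
struct Message {
  int type;
};

// Hypothetical message type that the toy OSD is interested in.
constexpr int kToyOpType = 42;

// Interface implemented by any component that wants to receive messages.
struct Dispatcher {
  virtual ~Dispatcher() = default;
  // Returns true if this dispatcher consumed the message.
  virtual bool ms_dispatch(Message *m) = 0;
};

// Toy messenger: offers each message to the registered dispatchers in
// registration order until one of them consumes it (an assumption of
// this sketch).
class ToyMessenger {
public:
  void add_dispatcher(Dispatcher *d) { dispatchers_.push_back(d); }

  bool deliver(Message *m) {
    for (Dispatcher *d : dispatchers_) {
      if (d->ms_dispatch(m)) {
        return true;
      }
    }
    return false;  // nobody was interested
  }

private:
  std::vector<Dispatcher *> dispatchers_;
};

// Toy OSD: implements the Dispatcher interface and counts the messages
// it accepted.
struct ToyDispatchOSD : Dispatcher {
  int handled = 0;

  bool ms_dispatch(Message *m) override {
    if (m->type != kToyOpType) {
      return false;  // not ours, let another dispatcher try
    }
    ++handled;
    return true;
  }
};
```

Registering a ToyDispatchOSD with a ToyMessenger and delivering a message of type kToyOpType increments handled, while any other message type is left unconsumed.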
Message Dispatch and Request Queuing
The trace begins when a message is dispatched to the OSD:
- bool OSD::ms_dispatch(Message *m)
- src/osd/OSD.cc:4720
There are two paths that can be taken, both of which will arrive at
OSD::dispatch_op.
- void OSD::_dispatch(Message *m)
- Construct a new OpRequest
- src/osd/OSD.cc:4937
- void OSD::do_waiters()
- Grab an existing OpRequest
- src/osd/OSD.cc:4840
Both _dispatch and do_waiters will then process a request:
- void OSD::dispatch_op(OpRequestRef op)
- src/osd/OSD.cc:4857
- void OSD::handle_op(OpRequestRef op)
- src/osd/OSD.cc:7352
- void OSD::enqueue_op(PG *pg, OpRequestRef op)
- src/osd/OSD.cc:7546
- void PG::queue_op(OpRequestRef op)
- src/osd/PG.cc:1707
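The shape of this first trace, two entry points that converge on one dispatch function and end on a queue, can be sketched as follows. This is a simplified model under stated assumptions, not the OSD's implementation: ToyOp, ToyQueueingOSD, and the ready_ flag are hypothetical, and the reason a request might be parked on the waiting list is deliberately left abstract.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <string>

// Simplified stand-in for an OpRequest (hypothetical).
struct ToyOp {
  std::string name;
};
using ToyOpRef = std::shared_ptr<ToyOp>;

class ToyQueueingOSD {
public:
  // Hypothetical switch that decides whether new requests can proceed.
  void set_ready(bool r) { ready_ = r; }

  // Path 1: a new message arrives, so construct a new request.
  void dispatch_new(const std::string &name) {
    ToyOpRef op = std::make_shared<ToyOp>(ToyOp{name});
    if (!ready_) {
      waiters_.push_back(op);  // park it for a later retry
      return;
    }
    dispatch_op(op);
  }

  // Path 2: grab existing requests that were parked earlier.
  void do_waiters() {
    while (ready_ && !waiters_.empty()) {
      ToyOpRef op = waiters_.front();
      waiters_.pop_front();
      dispatch_op(op);
    }
  }

  std::size_t queued() const { return pg_queue_.size(); }
  std::size_t waiting() const { return waiters_.size(); }
  std::string front_name() const {
    return pg_queue_.empty() ? std::string() : pg_queue_.front()->name;
  }

private:
  // Both paths converge here; the request ends up on the work queue.
  void dispatch_op(const ToyOpRef &op) { enqueue_op(op); }
  void enqueue_op(const ToyOpRef &op) { pg_queue_.push_back(op); }

  bool ready_ = true;
  std::deque<ToyOpRef> waiters_;   // requests waiting to be retried
  std::deque<ToyOpRef> pg_queue_;  // requests waiting for a worker
};
```

Nothing in this trace executes the operation itself. The only outcome is that the request sits on a queue.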
The request now sits on a queue, waiting to be picked up by a worker.
Request Processing
The rough flow:
- struct OpWQ: public ThreadPool::WorkQueueVal<pair<PGRef, OpRequestRef>, PGRef >
- src/osd/OSD.h:1101
- void OSD::OpWQ::_process(PGRef pg, ThreadPool::TPHandle &handle)
- src/osd/OSD.cc:7604
- void OSD::dequeue_op(PGRef pg, OpRequestRef op, ThreadPool::TPHandle &handle)
- src/osd/OSD.cc:7643
- void ReplicatedPG::do_request(OpRequestRef op, ThreadPool::TPHandle &handle)
- src/osd/ReplicatedPG.cc:1080
- void ReplicatedPG::do_op(OpRequestRef op)
- src/osd/ReplicatedPG.cc:1191
- void ReplicatedPG::execute_ctx(OpContext *ctx)
- src/osd/ReplicatedPG.cc:1706
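The worker side of the flow above can be sketched in the same spirit. The queue holds (PG, request) pairs, mirroring the pair<PGRef, OpRequestRef> in the OpWQ declaration. Everything else is an assumption made for illustration: SketchOp, SketchPG, and SketchOpWQ are hypothetical types, and the sketch is single-threaded, with process_one standing in for what a worker thread would do for each item.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for an OpRequest (hypothetical).
struct SketchOp {
  std::string name;
};
using SketchOpRef = std::shared_ptr<SketchOp>;

// Simplified stand-in for a placement group (hypothetical).
struct SketchPG {
  std::vector<std::string> executed;

  // Entry point called by the worker; forwards to the op handler.
  void do_request(const SketchOpRef &op) { do_op(op); }

  // Here the real code would build a context and execute it; the sketch
  // only records that the op ran.
  void do_op(const SketchOpRef &op) { executed.push_back(op->name); }
};
using SketchPGRef = std::shared_ptr<SketchPG>;

// Work queue of (PG, request) pairs.
class SketchOpWQ {
public:
  void enqueue(SketchPGRef pg, SketchOpRef op) {
    items_.emplace_back(std::move(pg), std::move(op));
  }

  // What a worker does for one item: dequeue it and hand the request to
  // its PG. Returns false when the queue is empty.
  bool process_one() {
    if (items_.empty()) {
      return false;
    }
    std::pair<SketchPGRef, SketchOpRef> item = std::move(items_.front());
    items_.pop_front();
    item.first->do_request(item.second);
    return true;
  }

private:
  std::deque<std::pair<SketchPGRef, SketchOpRef>> items_;
};
```

A real work queue also needs locking and a pool of threads calling the equivalent of process_one. Those are omitted here to keep the control flow visible.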
The following sub-trace shows the path taken to the actual logic behind a
RADOS client write operation. All other client operations are reached by the
same path: CEPH_OSD_OP_WRITE is one case among the other client operations in
a large switch statement in do_osd_ops.
- int ReplicatedPG::prepare_transaction(OpContext *ctx)
- src/osd/ReplicatedPG.cc:5055
- int ReplicatedPG::do_osd_ops(OpContext *ctx, vector<OSDOp>& ops)
- src/osd/ReplicatedPG.cc:2921
- case CEPH_OSD_OP_WRITE
- src/osd/ReplicatedPG.cc:3650
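The switch-based structure described above can be illustrated with a small sketch: a loop walks the operations in a request, and each case appends its effect to a transaction that is submitted later. The names DemoOpCode, DemoOp, DemoTransaction, and demo_do_osd_ops are hypothetical, as are the three example operations and the -1 error value. The real function handles many more cases and far more state.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical operation codes; the real code switches on constants such
// as CEPH_OSD_OP_WRITE.
enum class DemoOpCode { Read, Write, Truncate };

struct DemoOp {
  DemoOpCode code;
  std::uint64_t offset;
  std::string data;
};

// The transaction is modeled as a list of human-readable steps.
struct DemoTransaction {
  std::vector<std::string> steps;
};

// Walk the ops and accumulate their effects into the transaction.
// Returns 0 on success, -1 on an unknown op (error value is an assumption).
int demo_do_osd_ops(DemoTransaction &t, const std::vector<DemoOp> &ops) {
  for (const DemoOp &op : ops) {
    switch (op.code) {
    case DemoOpCode::Read:
      // Reads do not modify the object, so nothing is added.
      break;
    case DemoOpCode::Write:
      t.steps.push_back("write off=" + std::to_string(op.offset) +
                        " len=" + std::to_string(op.data.size()));
      break;
    case DemoOpCode::Truncate:
      t.steps.push_back("truncate size=" + std::to_string(op.offset));
      break;
    default:
      return -1;
    }
  }
  return 0;
}
```

In this model, nothing is written while the ops are being walked. The function only describes the changes, and applying them is a separate step.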
The accumulated transaction is submitted in issue_repop, which then calls
submit_transaction on the configured PGBackend (e.g. replication or erasure
coding). The backend communicates with the replicas and also runs the
transaction against the local object store.
- void ReplicatedPG::issue_repop(RepGather *repop, utime_t now)
- src/osd/ReplicatedPG.cc:6660
- virtual void submit_transaction(
- src/osd/PGBackend.h:490
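The role of the backend as a pluggable strategy can be sketched as an abstract interface with one concrete implementation. This is only an illustration under simplifying assumptions: MiniTxn, MiniStore, MiniBackend, and MiniReplicatedBackend are hypothetical, the submit signature is invented, and replicas are modeled as directly reachable stores rather than remote peers that receive messages.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified transaction: just a description of the change (hypothetical).
struct MiniTxn {
  std::string desc;
};

// Simplified object store that records what was applied to it.
struct MiniStore {
  std::vector<std::string> applied;
  void apply(const MiniTxn &t) { applied.push_back(t.desc); }
};

// Abstract backend: the caller does not care how the transaction is made
// durable or redundant.
struct MiniBackend {
  virtual ~MiniBackend() = default;
  virtual void submit(const MiniTxn &t) = 0;
};

// Replication strategy: forward the same transaction to every replica and
// also apply it to the local store.
struct MiniReplicatedBackend : MiniBackend {
  MiniStore *local;
  std::vector<MiniStore *> replicas;

  MiniReplicatedBackend(MiniStore *l, std::vector<MiniStore *> r)
      : local(l), replicas(std::move(r)) {}

  void submit(const MiniTxn &t) override {
    for (MiniStore *r : replicas) {
      r->apply(t);  // stands in for sending the change to a replica
    }
    local->apply(t);
  }
};
```

An erasure-coded backend would implement the same interface but send each peer a different piece of the data. That is why the caller only talks to the abstract type.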
The local object store (e.g. FileStore or BlueStore) is what manages the
underlying storage hardware.