Date: January 19, 2025

Topic: RPC and Client-Server Systems

Recall

Performing an RPC is a lot more costly than a normal procedure call because of overheads like traps and context switches, which require the kernel’s involvement.

The work of an RPC also happens at runtime, whereas a normal procedure call is set up at compile time.

Notes

RPC and Client-Server Systems

Safety: We want clients and servers to be in different address spaces / different protection domains
- This hurts performance, since every RPC has to cross address spaces (client in one space and server in another)
Performance: We want to make RPC calls as efficient as a normal procedure call

RPC vs Simple Procedure Call

Normal Procedure Call

A caller calls a callee within the same address space
- Arguments are already set up on the stack and the kernel doesn’t have to be involved
Everything is set up at compile time

Remote Procedure Call

Making a call from the client:
- When the caller makes a call, it actually traps into the kernel (call trap)
- The kernel validates the call and copies arguments of the call into the kernel buffers from the client address space
- The kernel then locates the server procedure that needs to be executed and copies the arguments from the kernel buffer into the server address space
- Then, the kernel schedules the server to run the procedure
When the server is done, it needs to return results back to the client:
- The server traps into the kernel (return trap)
- The kernel copies the results from the server’s address space into kernel buffers
- This is then copied out to the client’s address space
Now we have completed the entire loop and the kernel can reschedule the client, which can then receive the results and continue executing
Everything is being done at runtime, which is the source of the performance hit
This adds a lot of runtime overhead on top of the server procedure itself:
- Two traps (call and return)
  - System needs to validate the call and copy arguments from client’s address space into kernel buffers
- Two context switches (switch from client to server to run server procedure, then server to client after server procedure is done)
  - Each context switch carries both explicit and implicit costs
- Need to schedule the server in order to run server code
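
The call and return path above can be traced with a toy model. This is only a sketch for counting events: `ToyKernel`, `Counters`, and the `upper` procedure are hypothetical names made up for illustration, and the "copies" are modelled by a counter rather than by real address-space crossings.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    traps: int = 0
    kernel_copies: int = 0
    context_switches: int = 0


class ToyKernel:
    """Hypothetical kernel model: it only moves bytes and counts events."""

    def __init__(self, server_procedures):
        self.server_procedures = server_procedures  # name -> callable
        self.counters = Counters()

    def _copy(self, data: bytes) -> bytes:
        # Models one copy across the user/kernel boundary.
        # (Python may not physically duplicate immutable bytes; only the count matters here.)
        self.counters.kernel_copies += 1
        return bytes(data)

    def rpc(self, name: str, args: bytes) -> bytes:
        c = self.counters
        # Call trap: validate the call, then copy client -> kernel buffer.
        c.traps += 1
        if name not in self.server_procedures:
            raise KeyError(f"unknown procedure: {name}")
        kernel_buf = self._copy(args)
        # Copy kernel buffer -> server address space, then schedule the server.
        server_args = self._copy(kernel_buf)
        c.context_switches += 1  # client -> server
        result = self.server_procedures[name](server_args)
        # Return trap: copy server -> kernel buffer, then kernel buffer -> client.
        c.traps += 1
        kernel_buf = self._copy(result)
        client_result = self._copy(kernel_buf)
        c.context_switches += 1  # server -> client
        return client_result


kernel = ToyKernel({"upper": lambda payload: payload.upper()})
reply = kernel.rpc("upper", b"hello")
print(reply, kernel.counters)
```

One call leaves the counters at two traps, four kernel copies, and two context switches, matching the list above.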

Lightweight RPC: Kernel Copies Quiz

In an RPC (client call - server execution - return results to the client), how many times does the kernel copy "stuff" from user address spaces into the kernel and vice-versa?

4 times (client → kernel, kernel → server, server → kernel, kernel → client)

Counting the user-level stub copies as well, a complete RPC call takes 8 copies (4 by the kernel, 4 by the client and server stubs) plus repeated crossings between the user level and kernel level, resulting in a lot of overhead compared to a normal procedure call.

Copying Overhead

This copying happens on every call and every return between the client and the server
The kernel has no idea of the semantics of the arguments passed between the client and server, but it has to act as their intermediary
First Copy (User Level): When the client makes a call, we use a client stub
- Takes the arguments of the client and makes an RPC packet out of it
- The RPC packet serializes the data structures being passed as arguments into a contiguous sequence of bytes
Second Copy (Kernel Level): Client then traps into the kernel
- Kernel copies the message from user address space (client) into the kernel buffer
Third Copy (Kernel Level): Kernel schedules the server in the server domain
- Kernel then copies its buffer into the server domain
Fourth Copy (User Level): Deserialize packet into server’s expected arguments using server stub
The return path repeats the same four copies in the opposite direction, so going client-server-client requires 8 copies in total, much more than a normal procedure call
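
The four copies in one direction can be sketched as below. The packet layout (a 4-byte integer, a 2-byte length, then the string bytes) and the function names are assumptions for illustration only; the notes do not specify a wire format.

```python
import struct


def client_stub_marshal(x: int, name: str) -> bytes:
    """Copy 1 (user level): flatten the arguments into one contiguous packet."""
    encoded = name.encode("utf-8")
    # Assumed layout: 4-byte int, 2-byte string length, then the string bytes.
    return struct.pack(f"!iH{len(encoded)}s", x, len(encoded), encoded)


def server_stub_unmarshal(packet: bytes):
    """Copy 4 (user level): rebuild the arguments the server procedure expects."""
    x, length = struct.unpack_from("!iH", packet, 0)
    header_size = struct.calcsize("!iH")
    encoded = packet[header_size:header_size + length]
    return x, encoded.decode("utf-8")


def send_one_way(x: int, name: str):
    copies = []
    packet = client_stub_marshal(x, name)
    copies.append("client stub: arguments -> RPC packet")
    kernel_buffer = bytearray(packet)  # stands in for the kernel buffer
    copies.append("kernel: client address space -> kernel buffer")
    server_message = bytes(kernel_buffer)  # stands in for server memory
    copies.append("kernel: kernel buffer -> server address space")
    args = server_stub_unmarshal(server_message)
    copies.append("server stub: RPC packet -> arguments")
    return args, copies


args, copies = send_one_way(7, "read")
for step in copies:
    print(step)
```

The kernel steps treat the packet as opaque bytes, which is why the stubs have to do the serializing and deserializing at user level.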

<aside> 📌 SUMMARY: By using LRPC, we can eliminate some of the overheads in the original RPC implementation. This is done with a one-time setup phase, which requires user-kernel crossings to set up the PD (procedure descriptor), A-stack (argument stack) and BO (binding object). After this is done, the kernel validates each later call from the client using the BO, but the data itself can pass directly between the client and server. This reduces the copies from client to server from 4 to 2. On an SMP, LRPC can be extended further by dedicating certain CPUs to just run servers. This allows their caches to stay warm and thus reduces the impact of lost locality.

</aside>
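
A minimal sketch of the idea in the summary, assuming the A-stack is a buffer shared by client and server and the BO is an opaque token that the kernel checks on each call. `ToyLrpcKernel`, `Binding`, and the `add` procedure are hypothetical names, and a Python `bytearray` only stands in for real shared memory.

```python
import struct


class Binding:
    """Hypothetical result of the one-time setup phase."""

    def __init__(self, procedure, a_stack_size=64):
        self.procedure = procedure              # stands in for the entry point in the PD
        self.a_stack = bytearray(a_stack_size)  # shared by client and server
        self.binding_object = object()          # opaque token, stands in for the BO


class ToyLrpcKernel:
    def __init__(self):
        self._bindings = {}
        self.kernel_copies = 0  # never incremented: the kernel does not copy arguments

    def bind(self, procedure):
        # Setup phase: done once, this is where the user-kernel crossings go.
        binding = Binding(procedure)
        self._bindings[binding.binding_object] = binding
        return binding

    def call(self, binding_object):
        # Per call: validate the BO, then run the server on the shared A-stack.
        binding = self._bindings.get(binding_object)
        if binding is None:
            raise PermissionError("invalid binding object")
        binding.procedure(binding.a_stack)


def add_server_stub(a_stack):
    a, b = struct.unpack_from("!ii", a_stack, 0)  # copy 2: A-stack -> server locals
    struct.pack_into("!i", a_stack, 0, a + b)     # result goes back via the A-stack


def lrpc_add(kernel, binding, a, b):
    struct.pack_into("!ii", binding.a_stack, 0, a, b)  # copy 1: arguments -> A-stack
    kernel.call(binding.binding_object)
    (result,) = struct.unpack_from("!i", binding.a_stack, 0)
    return result


kernel = ToyLrpcKernel()
binding = kernel.bind(add_server_stub)
total = lrpc_add(kernel, binding, 2, 3)
print(total, kernel.kernel_copies)
```

Both copies on the way to the server happen at user level in the stubs, and the kernel only checks the token.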