[lrpc.pdf]
The DASH system [18] eliminates an intermediate kernel copy by allocating messages out of a region specially mapped into both kernel and user domains.
"Multiple processors are used to reduce LRPC latency by caching domain
contexts on idle processors. As we show in Section 4, the context switch that
occurs during an LRPC is responsible for a large part of the transfer time. This
time is due partly to the code required to update the hardware’s virtual memory
registers and partly to the extra memory fetches that occur as a result of
invalidating the translation lookaside buffer (TLB).
LRPC reduces context-switch overhead by caching domains on idle processors.
When a call is made, the kernel checks for a processor idling in the context of
the server domain. If one is found, the kernel exchanges the processors of the
calling and idling threads, placing the calling thread on a processor where the
context of the server domain is already loaded; the called server procedure can
then execute on that processor without requiring a context switch. The idling
thread continues to idle, but on the client’s original processor in the context of
the client domain. On return from the server, a check is also made. If a processor
is idling in the client domain (likely for calls that return quickly), then the
processor exchange can be done again."
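The processor exchange described in the quoted passage can be sketched as a small simulation. This is only an illustrative model, not the LRPC kernel code: the `Processor` record and the `find_idle_in` and `transfer` helpers are hypothetical names, and a "context switch" is modeled simply as overwriting the domain loaded on a processor.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Processor:
    cpu_id: int
    domain: str            # domain whose context is currently loaded
    thread: Optional[str]  # name of the running thread, or None when idling


def find_idle_in(processors: List[Processor], domain: str) -> Optional[Processor]:
    """Return a processor idling in the context of `domain`, if any."""
    for p in processors:
        if p.thread is None and p.domain == domain:
            return p
    return None


def transfer(processors: List[Processor], current: Processor,
             target_domain: str) -> Tuple[Processor, bool]:
    """Move the thread running on `current` into `target_domain`.

    Returns the processor the thread ends up on and whether a context
    switch was needed. The same routine models both call and return.
    """
    idle = find_idle_in(processors, target_domain)
    if idle is not None:
        # Processor exchange: the thread moves to the processor that already
        # has the target context loaded; the original processor is left
        # idling in its old domain, so that context stays cached.
        idle.thread, current.thread = current.thread, None
        return idle, False
    # No cached context available: switch domains on the same processor.
    current.domain = target_domain
    return current, True
```

With one processor running a client thread and another idling in the server domain, calling `transfer` toward the server and then back toward the client performs two exchanges and no context switches, which matches the "calls that return quickly" case in the passage.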
3. CLARK, D. D. The structuring of systems using upcalls. In Proceedings of the 10th ACM Symposium on Operating Systems Principles (Orcas Is., Wash., Dec. 1-4, 1985), ACM, New York, 1985, pp. 171-180. [Cla85.pdf]
18. TZOU, S.-Y., AND ANDERSON, D. P. A performance evaluation of the DASH message-passing system. Tech. Rep. UCB/CSD 88/452, Computer Science Division, Univ. of California, Berkeley, Oct. 1988.
