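The vDSO (virtual dynamic shared object) is a small shared library that the kernel automatically maps into the address space of every user-space process. It provides user-space implementations of a few frequently used system calls, e.g. gettimeofday and clock_gettime, so that they can be called without switching into the kernel.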

The man page for this, vdso(7), is very well written.

Why does the vDSO exist at all?

There are some system calls the kernel provides that user-space code ends up using frequently, to the point that such calls can dominate overall performance. This is due both to the frequency of the call as well as the context-switch overhead that results from exiting user space and entering the kernel.

Example

Making system calls can be slow. In x86 32-bit systems, you can trigger a software interrupt (int $0x80) to tell the kernel you wish to make a system call. However, this instruction is expensive: it goes through the full interrupt-handling paths in the processor’s microcode as well as in the kernel. Newer processors have faster (but backward incompatible) instructions to initiate system calls. Rather than require the C library to figure out if this functionality is available at run time, the C library can use functions provided by the kernel in the vDSO and let the kernel handle the details of which instruction to use.

One frequently used system call is gettimeofday(2). This system call is called both directly by user-space applications as well as indirectly by the C library. Think timestamps or timing loops or polling — all of which frequently need to know what time it is right now. This information is also not secret — any application in any privilege mode (root or any unprivileged user) will get the same answer. Thus the kernel arranges for the information required to answer this question to be placed in memory the process can access. Now a call to gettimeofday(2) changes from a system call to a normal function call and a few memory accesses.

So does every user-space program have the vDSO? What’s the cost?

Yes, the kernel maps it into every user-space process at runtime. The cost of this mapping, in both memory and performance, is negligible, so that’s not a concern. Moreover, the physical pages behind the mapping are not duplicated per process.

Similar to shared libraries, the code (text) sections are mapped read-only and shared, so there is only one set of physical pages, regardless of how many processes use them. Even though each process may see the vDSO at a different virtual address, the physical memory that holds the vDSO code is the same for all processes, and the kernel ensures this.