Buffer Pools & Memory
Buffer Pool Manager
The buffer pool is a memory region organized as an array of fixed-size pages. An array entry is called a frame.
When a DBMS requests a page, an exact copy is placed into one of these frames.
The page table keeps track of pages that are currently in memory. It also maintains additional metadata per page:
- Dirty Flag
- Pin/Reference Counter
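A minimal sketch of the page table and its per-page metadata may make this concrete. The names `Frame`, `PageTable`, `fetch`, `unpin`, and the `read_from_disk` callback are hypothetical and chosen for illustration only; a real buffer pool would also bound the number of frames and latch the table.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    page_id: int
    data: bytearray
    dirty: bool = False  # set once the in-memory copy has been modified
    pin_count: int = 0   # number of threads/queries currently using the page


class PageTable:
    """Maps page ids to the frames that currently hold them (in memory only)."""

    def __init__(self):
        self.frames = {}

    def fetch(self, page_id, read_from_disk):
        frame = self.frames.get(page_id)
        if frame is None:
            # Page is not in memory: copy it from disk into a new frame.
            frame = Frame(page_id, bytearray(read_from_disk(page_id)))
            self.frames[page_id] = frame
        frame.pin_count += 1  # pinned pages must not be evicted
        return frame

    def unpin(self, page_id, modified=False):
        frame = self.frames[page_id]
        frame.pin_count -= 1
        frame.dirty = frame.dirty or modified
```

A caller pins a page with `fetch`, works on `frame.data`, and then calls `unpin(page_id, modified=True)` if it changed the contents.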
Page Table vs Page Directory
The page directory is the mapping from page ids to page locations in the database file. All changes must be recorded on disk to allow the DBMS to find the data on restart.
The page table is the mapping from page ids to a copy of the page in the buffer pool frames. This is an in-memory data structure that does not need to be stored on disk.
Allocation Policies
Global Policies make decisions for all active transactions.
Local Policies allocate frames to specific transactions without considering the behaviour of concurrent transactions. They still need to support sharing pages.
Buffer Pool Optimizations
Multiple Buffer Pools
The DBMS does not always have a single buffer pool for the entire system:
- multiple buffer pool instances
- per-database buffer pool
- per-page type buffer pool
Multiple buffer pools help to reduce latch contention and improve locality.
There are multiple approaches to manage multiple buffer pools:
- Object Id: embed an object id in record ids and then maintain a mapping from objects to specific buffer pools.
- Hashing: hash the page id to select which buffer pool to access.
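Both approaches can be sketched in a few lines. The function names, the `(object_id, page_id, slot)` record id layout, and the `object_to_pool` dictionary are assumptions made for this example, not a description of any particular system.

```python
def pool_by_hash(page_id, num_pools):
    """Hashing: the page id alone decides which buffer pool is used."""
    return page_id % num_pools


def pool_by_object(record_id, object_to_pool):
    """Object id: look up the pool assigned to the object that owns the record.

    record_id is assumed to be a tuple (object_id, page_id, slot).
    """
    object_id, _page_id, _slot = record_id
    return object_to_pool[object_id]
```

For example, with four pools `pool_by_hash(10, 4)` selects pool 2, while `pool_by_object((3, 10, 0), {3: 1})` selects pool 1 because object 3 is mapped to it.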
Pre-Fetching
The DBMS can also prefetch pages based on a query plan in order to reduce the wait between the request for a page and its load into memory.
Scan Sharing
Queries can reuse data retrieved from storage or operator computations. This is done by allowing multiple queries to attach to a single cursor that scans a table. It's also possible to share intermediate results.
If a query starts a scan while another query is already scanning the same table, the DBMS attaches the second query's cursor to the existing cursor.
The DBMS keeps track of where the second query joined the first so that, once the shared scan reaches the end, the second query can go back and read the pages it missed.
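The bookkeeping can be illustrated with a toy shared cursor. This is a simplified sketch under assumptions: the class `SharedScan` and its methods are hypothetical, and the cursor simply wraps around to the start of the table so that a late joiner receives the pages it missed.

```python
class SharedScan:
    """One cursor over a table of num_pages pages, shared by several queries."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.position = 0   # next page the shared cursor will read
        self.attached = {}  # query name -> {"start": join position, "seen": pages}

    def attach(self, query):
        # Remember where the query joined so we know when it has seen everything.
        self.attached[query] = {"start": self.position, "seen": []}

    def step(self):
        page = self.position
        for state in self.attached.values():
            # A query is done once it has received every page exactly once.
            if len(state["seen"]) < self.num_pages:
                state["seen"].append(page)
        self.position = (self.position + 1) % self.num_pages
        return page
```

If Q1 attaches at page 0 and Q2 attaches after three pages have been read, Q1 receives pages 0 to 4 in order, while Q2 receives 3, 4 and then wraps around to 0, 1, 2.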
Note: Scan Sharing is different from Output Caching.
Buffer Pool Bypass
A sequential scan operator does not store the pages it fetches in the buffer pool. Instead, it uses memory local to the query thread, which is discarded when the scan finishes. This avoids the overhead of going through the page table and of evicting pages when the buffer pool is full.
OS Page Cache
Most disk operations go through the OS API. Unless the DBMS tells it not to, the OS maintains its own filesystem cache.
Most DBMSs use direct I/O (O_DIRECT) to bypass the OS's cache to avoid redundant copies of pages and the loss of control over I/O, since the DBMS has different eviction policies.
Page Replacement Policies
When the DBMS needs to free up a frame to make space for a new page, it must decide which page to evict from the buffer pool.
Goals:
- Correctness: do not evict pinned data.
- Accuracy: evict rarely used pages.
- Speed
- Metadata Overhead
LRU (Least Recently Used)
Maintain a timestamp of when each page was last accessed. When the DBMS needs to evict a page, select the one with the oldest timestamp.
Note: Keep the pages in sorted order to reduce search time on eviction.
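One way to keep pages sorted by last access is an ordered map, which replaces explicit timestamps with position in the map. The sketch below assumes a hypothetical `LRUReplacer` class and a `pinned` argument supplied by the caller; it is an illustration, not a production replacer.

```python
from collections import OrderedDict


class LRUReplacer:
    def __init__(self):
        # Keys are page ids ordered from least to most recently accessed.
        self.order = OrderedDict()

    def record_access(self, page_id):
        # Re-inserting moves the page to the most recently used end.
        self.order.pop(page_id, None)
        self.order[page_id] = True

    def evict(self, pinned=()):
        # Walk from the oldest access forward and skip pinned pages.
        for page_id in self.order:
            if page_id not in pinned:
                del self.order[page_id]
                return page_id
        return None  # every tracked page is pinned (or none are tracked)
```

Because the map is already in access order, eviction inspects only as many entries as there are pinned pages ahead of the victim.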
CLOCK
Approximation of LRU without needing a separate timestamp per page.
Each page has a reference bit. When the page is accessed the bit is set to 1.
The pages are organised in a circular buffer with a "clock hand". Upon sweeping, check whether a page's bit is set to 1: if it is, reset it to 0; if it is 0, evict the page.
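The sweep can be sketched as follows. The `ClockReplacer` class is hypothetical, pinned pages are ignored for brevity, and setting the reference bit when a page is first loaded is a choice made for this example (loading is treated as an access).

```python
class ClockReplacer:
    def __init__(self, num_frames):
        self.pages = [None] * num_frames  # page id held by each frame
        self.ref = [0] * num_frames       # reference bit of each frame
        self.hand = 0                     # index the clock hand points at

    def access(self, page_id):
        """Return True on a hit; on a miss, load the page and return False."""
        if page_id in self.pages:
            self.ref[self.pages.index(page_id)] = 1
            return True
        while True:
            if self.pages[self.hand] is None or self.ref[self.hand] == 0:
                # Empty frame or bit already 0: (re)use this frame.
                self.pages[self.hand] = page_id
                self.ref[self.hand] = 1
                self.hand = (self.hand + 1) % len(self.pages)
                return False
            # Bit is 1: give the page a second chance and move on.
            self.ref[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.pages)
```

With three frames holding pages 1, 2, 3, loading page 4 clears all bits during the sweep and replaces page 1. If page 2 is then accessed again, loading page 5 skips page 2 and evicts page 3 instead.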
Note: LRU and CLOCK replacement policies are susceptible to sequential flooding. A sequential scan that reads every page pollutes the buffer pool with pages that are read only once. In this case the most recently used page is actually the most unneeded one.
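Sequential flooding is easy to reproduce with a small simulation. The helper `lru_hits` below is a hypothetical function written for this example; it counts cache hits for a sequence of page accesses under plain LRU.

```python
from collections import OrderedDict


def lru_hits(capacity, accesses):
    """Count how many accesses are served from an LRU cache of the given size."""
    cache = OrderedDict()
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used page
            cache[page] = True
    return hits


# Repeatedly scanning 4 pages with only 3 frames: every access misses,
# because each page is evicted just before it is needed again.
scan = [0, 1, 2, 3] * 3
```

Here `lru_hits(3, scan)` returns 0, whereas the same number of accesses over a working set that fits, such as `[0, 1, 2] * 4`, yields 9 hits.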
LRU-K
Track the history of the last K references to each page as timestamps and compute the interval between subsequent accesses.
The DBMS then uses this history to estimate the next time that page is going to be accessed.
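A common way to turn this history into an eviction decision is to evict the page whose K-th most recent access is oldest, treating pages with fewer than K accesses as the best victims. The sketch below uses that formulation as an assumption; the `LRUKReplacer` class and its logical clock are hypothetical and not taken from the notes above.

```python
import math
from collections import defaultdict, deque


class LRUKReplacer:
    def __init__(self, k):
        self.k = k
        self.clock = 0  # logical timestamp, incremented on every access
        # Per page: timestamps of its last k accesses, oldest first.
        self.history = defaultdict(lambda: deque(maxlen=k))

    def record_access(self, page_id):
        self.clock += 1
        self.history[page_id].append(self.clock)

    def _kth_recent(self, page_id):
        h = self.history[page_id]
        # Fewer than k accesses: treat the k-th access as infinitely old.
        return h[0] if len(h) == self.k else -math.inf

    def evict(self):
        if not self.history:
            return None
        # Oldest k-th access first; ties broken by oldest recorded access.
        victim = min(
            self.history,
            key=lambda p: (self._kth_recent(p), self.history[p][0]),
        )
        del self.history[victim]
        return victim
```

With K = 2 and the access sequence 1, 1, 2, 3, 3, page 2 is evicted first because it was touched only once, even though plain LRU would have evicted page 1.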
Localization
The DBMS chooses which page to evict on a transaction/query basis. This minimizes the pollution of the pool from each query.
Priority Hints
The DBMS knows the context of each page during query execution. It can provide hints to the buffer pool on whether a page is important or not.
Dirty Pages
- Fast: if a page in the buffer pool is not dirty, then the DBMS can simply drop it.
- Slow: if a page is dirty, then the DBMS must write it back to disk to ensure that its changes are persisted.
It's a tradeoff between evicting quickly and writing back dirty pages that will not be read again in the future.
Note: a page is dirty if a query has modified it while it's in the buffer pool (not yet written to disk).
Background Writing
The DBMS can periodically walk through the page table and write dirty pages to disk. When a dirty page is safely written, the DBMS can either evict it or just unset the dirty flag.
Note: the DBMS needs to be careful not to write dirty pages before their log records have been written.
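A background writer that respects this ordering might look like the sketch below. It assumes each frame records the log sequence number (`lsn`) of its latest change and that the log manager exposes the highest LSN already on disk (`flushed_lsn`); these names, the dictionary layout, and the `write_page` callback are all hypothetical.

```python
def background_flush(frames, flushed_lsn, write_page):
    """Write out dirty pages whose log records are already durable.

    frames: dict mapping page_id -> {"dirty": bool, "lsn": int, "data": bytes}
    flushed_lsn: highest log sequence number known to be on disk
    write_page: callback (page_id, data) that persists the page
    """
    written = []
    for page_id, frame in frames.items():
        # Skip clean pages, and dirty pages whose log record is not yet on disk.
        if frame["dirty"] and frame["lsn"] <= flushed_lsn:
            write_page(page_id, frame["data"])
            frame["dirty"] = False  # keep the page cached, just mark it clean
            written.append(page_id)
    return written
```

This version unsets the dirty flag and leaves the page in the pool; evicting the page after the write is the other option mentioned above.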