Implement Linux memory policy
Add support for Linux memory policy.
set_mempolicy(2)
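$ man 2 set_mempolicy
long set_mempolicy(int mode, const unsigned long *nodemask, unsigned long maxnode);
This system call sets the NUMA memory policy of the calling thread, which controls which NUMA node(s) its memory is allocated from. The policy is preserved across execve(2) and inherited by children.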
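The nodemask argument is a bit mask in which bit N stands for node N. A minimal sketch of building such a mask, assuming the nodes string uses a cpuset-style list format ("0,1" or "0-3,7"); parse_nodes and to_nodemask are hypothetical helper names for illustration, not part of this PR:

```python
from __future__ import annotations


def parse_nodes(spec: str) -> set[int]:
    """Parse a node list such as "0,1" or "0-3,7" into a set of node IDs."""
    nodes: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate an empty spec or a trailing comma
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"invalid range: {part!r}")
            nodes.update(range(lo, hi + 1))
        else:
            nodes.add(int(part))
    return nodes


def to_nodemask(nodes: set[int]) -> tuple[int, int]:
    """Return (mask, bits_needed); bit N of mask is set for node N."""
    mask = 0
    for node in nodes:
        mask |= 1 << node
    return mask, (max(nodes) + 1 if nodes else 0)


mask, bits = to_nodemask(parse_nodes("0,1"))
print(bin(mask), bits)  # 0b11 2
```

A real implementation would also have to pack the mask into an array of unsigned long and choose maxnode accordingly; that part is left out here.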
NUMA
+-------------------------------------------------------------+
|                     Multi-Socket Server                     |
+-----------------------------+-------------------------------+
|           Node 0            |            Node 1             |
|  +---------+  +---------+   |   +---------+  +---------+    |
|  |  CPU 0  |  |  CPU 1  |   |   |  CPU 2  |  |  CPU 3  |    |
|  +----+----+  +----+----+   |   +----+----+  +----+----+    |
|       |            |        |        |            |         |
|       +------+-----+        |        +------+-----+         |
|              |              |               |               |
|       +------+------+       |        +------+------+        |
|       |  Memory 0   |<------+------->|  Memory 1   |        |
|       |   (local)   | slow  |        |   (local)   |        |
|       +-------------+       |        +-------------+        |
+-----------------------------+-------------------------------+
Modes
| Mode | Behavior |
|---|---|
| MPOL_DEFAULT | System default: removes any non-default policy; allocations prefer the local node |
| MPOL_LOCAL | Allocate on the node local to the CPU that triggers the allocation |
| MPOL_PREFERRED | Prefer the specified node; use other nodes if it is unavailable |
| MPOL_BIND | Use only the specified nodes |
| MPOL_INTERLEAVE | Alternate page allocations across the specified nodes |
| MPOL_PREFERRED_MANY | Prefer a set of nodes; use other nodes if they are unavailable |
| MPOL_WEIGHTED_INTERLEAVE | Interleave allocations across the specified nodes in proportion to per-node weights |
MPOL_BIND
{
"memoryPolicy": {
"mode": "MPOL_BIND",
"nodes": "0,1"
}
}
Memory allocation behavior:
- Node 0: Available
- Node 1: Available
- Node 2: Unavailable → allocation fails (OOM) once Nodes 0 and 1 are exhausted, even if Node 2 has free memory
MPOL_INTERLEAVE
{
"memoryPolicy": {
"mode": "MPOL_INTERLEAVE",
"nodes": "0,1"
}
}
Memory allocation behavior:
- Page 1 → Node 0
- Page 2 → Node 1
- Page 3 → Node 0
- Page 4 → Node 1
- ... (alternating allocation)
Use case: Spreads large datasets across multiple nodes so that the memory bandwidth of all of them is used.
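The alternating pattern above can be illustrated with a toy round-robin model. This is only a sketch: the interleave function is a hypothetical name, and the kernel's actual page placement depends on more than a simple counter.

```python
from __future__ import annotations

from itertools import cycle, islice


def interleave(nodes: list[int], num_pages: int) -> list[int]:
    """Return the node picked for each page under plain round-robin."""
    return list(islice(cycle(nodes), num_pages))


# Reproduce the listing for "nodes": "0,1".
for page, node in enumerate(interleave([0, 1], 4), start=1):
    print(f"Page {page} -> Node {node}")
```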
MPOL_PREFERRED
{
"memoryPolicy": {
"mode": "MPOL_PREFERRED",
"nodes": "0"
}
}
Memory allocation behavior:
- Allocation is first attempted on Node 0
- If Node 0 is full, allocation falls back to another node (here, Node 1)
- Unlike BIND, exhausting the preferred node does not result in an error
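The difference between BIND and PREFERRED can be shown with a toy allocator. This is a simplified model under stated assumptions, not how the kernel is implemented: allocate and free_pages are hypothetical names, and the fallback order here is plain node-ID order, whereas the kernel falls back by node distance.

```python
from __future__ import annotations


def allocate(mode: str, nodes: list[int], free_pages: dict[int, int]) -> int:
    """Pick a node for one page and decrement its free-page count.

    mode is "MPOL_BIND" or "MPOL_PREFERRED". Raises MemoryError where the
    real system would run into an out-of-memory condition.
    """
    candidates = list(nodes)
    if mode == "MPOL_PREFERRED":
        # PREFERRED may spill to any other node; BIND may not.
        candidates += [n for n in sorted(free_pages) if n not in nodes]
    for node in candidates:
        if free_pages.get(node, 0) > 0:
            free_pages[node] -= 1
            return node
    raise MemoryError(f"no free page on nodes {candidates}")


free = {0: 0, 1: 2}  # Node 0 is full, Node 1 still has room
print(allocate("MPOL_PREFERRED", [0], dict(free)))  # 1
try:
    allocate("MPOL_BIND", [0], dict(free))
except MemoryError as exc:
    print("BIND failed:", exc)
```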
Flags
| Flag | Effect |
|---|---|
| MPOL_F_STATIC_NODES | Keeps the specified node IDs unchanged (no remapping) when the cpuset's allowed nodes change |
| MPOL_F_RELATIVE_NODES | Interprets node IDs as positions relative to the set of nodes allowed by the cpuset |
| MPOL_F_NUMA_BALANCING | Applicable only to MPOL_BIND. The kernel monitors access patterns and automatically migrates pages |
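Mode flags are OR-ed into the mode argument of set_mempolicy(2). The sketch below shows that composition; the numeric values are the ones I understand linux/mempolicy.h to define and should be checked against your kernel headers, and Mode, ModeFlag and mode_argument are hypothetical names.

```python
from enum import IntEnum, IntFlag


class Mode(IntEnum):
    MPOL_DEFAULT = 0
    MPOL_PREFERRED = 1
    MPOL_BIND = 2
    MPOL_INTERLEAVE = 3
    MPOL_LOCAL = 4
    MPOL_PREFERRED_MANY = 5
    MPOL_WEIGHTED_INTERLEAVE = 6


class ModeFlag(IntFlag):
    MPOL_F_NUMA_BALANCING = 1 << 13
    MPOL_F_RELATIVE_NODES = 1 << 14
    MPOL_F_STATIC_NODES = 1 << 15


def mode_argument(mode: Mode, flags: ModeFlag = ModeFlag(0)) -> int:
    """Combine a mode and optional flags into the integer passed as `mode`."""
    return int(mode) | int(flags)


print(hex(mode_argument(Mode.MPOL_BIND, ModeFlag.MPOL_F_STATIC_NODES)))  # 0x8002
```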
Validation
Validation is quite challenging :P
Flag-related notes
- MPOL_F_NUMA_BALANCING can only be used with MPOL_BIND
- MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES cannot be used simultaneously
Mode-related notes
- DEFAULT: nodes must be empty, flags must be empty
- LOCAL: nodes must be empty, flags must be empty
- PREFERRED + empty nodes: STATIC/RELATIVE flags are prohibited
- BIND: nodes must have one or more entries
- INTERLEAVE: nodes must have one or more entries
- PREFERRED_MANY: nodes must have one or more entries
- WEIGHTED_INTERLEAVE: nodes must have one or more entries
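The flag and mode rules listed above could be checked roughly like this. It is a sketch of the rules as written in this description, not the code in this PR; validate, NODE_FLAGS and NEEDS_NODES are hypothetical names, and modes and flags are passed as plain strings for brevity.

```python
from __future__ import annotations

NODE_FLAGS = {"MPOL_F_STATIC_NODES", "MPOL_F_RELATIVE_NODES"}
NEEDS_NODES = {
    "MPOL_BIND",
    "MPOL_INTERLEAVE",
    "MPOL_PREFERRED_MANY",
    "MPOL_WEIGHTED_INTERLEAVE",
}


def validate(mode: str, nodes: set[int], flags: set[str]) -> list[str]:
    """Return the list of violated rules (empty if the policy is acceptable)."""
    errors: list[str] = []

    # Flag-related rules
    if "MPOL_F_NUMA_BALANCING" in flags and mode != "MPOL_BIND":
        errors.append("MPOL_F_NUMA_BALANCING requires MPOL_BIND")
    if NODE_FLAGS <= flags:
        errors.append("STATIC_NODES and RELATIVE_NODES are mutually exclusive")

    # Mode-related rules
    if mode in ("MPOL_DEFAULT", "MPOL_LOCAL"):
        if nodes:
            errors.append(f"{mode} requires empty nodes")
        if flags:
            errors.append(f"{mode} requires empty flags")
    elif mode == "MPOL_PREFERRED":
        if not nodes and flags & NODE_FLAGS:
            errors.append("MPOL_PREFERRED with empty nodes forbids STATIC/RELATIVE flags")
    elif mode in NEEDS_NODES:
        if not nodes:
            errors.append(f"{mode} requires at least one node")
    else:
        errors.append(f"unknown mode: {mode}")
    return errors


print(validate("MPOL_BIND", {0, 1}, set()))  # []
print(validate("MPOL_PREFERRED", set(), {"MPOL_F_STATIC_NODES"}))
```

Returning every violation instead of stopping at the first one makes the error message more useful when a config breaks several rules at once.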