What is NUMA and how can I configure my cluster to use it?
What is NUMA?
From the hardware perspective, a NUMA system is a computer platform that comprises multiple components or assemblies, each of which may contain zero or more CPUs, local memory, and/or IO buses. For brevity, and to disambiguate the hardware view of these physical components/assemblies from the software abstraction thereof, we'll call the components/assemblies "nodes".
Memory access time and effective memory bandwidth vary depending on how far the node containing the CPU or IO bus making the memory access is from the node containing the target memory. For example, CPUs accessing memory attached to their own node will experience faster access times and higher bandwidths than they would accessing memory on other, remote nodes. NUMA platforms can have nodes at multiple remote distances from any given node.
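As an illustration, numactl --hardware reports these relative distances in a matrix. The output below is what a typical two-socket host might show (the exact values vary by platform; by convention 10 means local access, and larger numbers mean proportionally slower remote access):

node distances:
node   0   1
  0:  10  21
  1:  21  10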
Generally, SingleStore is deployed on NUMA systems by deploying one SingleStore leaf node per NUMA socket, with each SingleStore node numa-bound to a single socket with numactl. This is very important for performance - running a SingleStore leaf node across multiple NUMA sockets will greatly increase memory latency and hurt performance. Aggregator nodes generally are not and do not need to be NUMA-bound.
NUMA support consists of numa-binding each memsqld process with numactl. The engine has no special awareness of NUMA - it simply uses the CPU and memory resources assigned to it. This numactl configuration is done outside the engine - usually by Ops or sdb-admin.
SingleStore Support strongly recommends running one SingleStore leaf per NUMA socket - if you have a leaf running across multiple NUMA sockets, you will usually suffer significant performance penalties. The SingleStore nodes on each socket operate separately, just like SingleStore nodes on different hosts - there are no special optimizations for sharing storage or query execution cross-NUMA-socket on a host.
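To verify how a running process is actually bound, you can inspect its CPU and memory affinity in /proc. This is a quick check, assuming a Linux host; substitute the memsqld PID for "self" to check a SingleStore leaf:

```shell
# Show which CPUs and NUMA memory nodes this process is allowed to use.
# A correctly bound leaf should list only the CPUs and the single memory
# node of its socket. Replace "self" with the memsqld PID to check a leaf.
grep -E 'Cpus_allowed_list|Mems_allowed_list' /proc/self/status
```

An unbound process will typically show the full CPU range and all memory nodes of the host.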
When deploying SingleStore via Ops or sdb-admin, these tools will automatically configure SingleStore for NUMA if numactl is installed on the system. If numactl is not installed, no NUMA-aware configuration is done (i.e., Ops will just deploy one leaf per host).
Note that there are no warnings about missing or improper NUMA configuration from the engine, Ops, or sdb-admin.
Getting NUMA information from a machine:
numactl --hardware provides the needed information, but in a format that is not easily machine-readable. You can also check /sys/devices/system/node/online; this is a special file created by the kernel that lists the presently-online NUMA nodes. In one sandbox testing environment, our support team found its contents set to "0-1", indicating that there were two NUMA nodes on that machine.
In the following example, there is only one node on this host:
master-agg-and-leaf-ip-10-0-3-76 /home/admin $ numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 64550 MB
node 0 free: 2693 MB
master-agg-and-leaf-ip-10-0-3-76 /home/admin $ free -m
             total       used       free     shared    buffers     cached
Mem:         64550      61857       2692         16        173      21875
-/+ buffers/cache:      39808      24741
Swap:            0          0          0
master-agg-and-leaf-ip-10-0-3-76 /home/admin $ cat /sys/devices/system/node/online
0
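The kernel's range-list format takes a little parsing: a single node reads "0", two nodes read "0-1", and discontiguous sets can read like "0-3,8". The following sketch turns that list into a node count (count_numa_nodes is a hypothetical helper name, not part of any SingleStore tooling):

```shell
# count_numa_nodes: count the nodes in a kernel range list
# such as "0" (1 node), "0-1" (2 nodes), or "0-3,8" (5 nodes).
count_numa_nodes() {
  count=0
  for range in $(echo "$1" | tr ',' ' '); do
    case "$range" in
      # "first-last": add the width of the range, inclusive
      *-*) count=$((count + ${range#*-} - ${range%-*} + 1)) ;;
      # bare node number: add one
      *)   count=$((count + 1)) ;;
    esac
  done
  echo "$count"
}

# Read the live value from the kernel (fall back to "0" if the file is absent)
online=$(cat /sys/devices/system/node/online 2>/dev/null || echo 0)
echo "NUMA nodes online: $(count_numa_nodes "$online")"
```

On the single-node host from the transcript above, this would report one online NUMA node.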
Configuration of NUMA:
To simplify the configuration process, the SingleStore DB management tools (sdb-admin, memsqlctl, etc.) can automatically detect if a host machine has multiple NUMA nodes and then configure SingleStore DB with numactl to bind individual SingleStore DB nodes to NUMA nodes.
Customized NUMA configuration:
If necessary, you can always customize NUMA setups by creating or editing <SingleStore install dir>/data/numactl.cnf. That file contains the arguments that are passed to numactl when the node starts.
If you edit that file and restart the SingleStore node, it will start up with the specified numactl settings. (numactl must be installed).
For example, you could numa-bind aggregator nodes this way if needed.
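As a concrete sketch: on a two-socket host, the numactl.cnf for a leaf bound to socket 0 might contain the following (these are standard numactl options; the socket numbers are illustrative):

--cpunodebind=0 --membind=0

A second leaf on the same host would use --cpunodebind=1 --membind=1, mirroring the one-leaf-per-socket layout described above.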