What is OOM?
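Linux has a feature called the Out Of Memory (OOM) Killer. When the system has used up both its memory and its swap, the kernel, as a last resort for securing memory, sends a signal (SIGKILL) to a process and forcibly terminates it.
Thanks to this feature, the system avoids stalling in a loop where memory allocation is retried over and over even though nothing can be freed. It also helps reveal processes that consume an excessive amount of memory.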
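Triggering an OOM
The following code triggers an OOM. What it does is simple: on a system with 1 GiB of physical memory, it allocates about 2 GiB (2000 MiB) of virtual memory and then gradually touches that region.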
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BUFFER_SIZE ((size_t)2000 * 1024 * 1024) /* 2000 MiB of virtual memory */
#define NCYCLE      30                           /* number of progress reports */
#define PAGE_SIZE   4096                         /* assumed page size in bytes */
#define MIB         ((size_t)1024 * 1024)

/* Print the current time, dropping the trailing newline that ctime() adds. */
static void print_timestamp(void) {
    time_t t = time(NULL);
    char *s = ctime(&t);
    printf("%.*s: ", (int)(strlen(s) - 1), s);
}

int main(void) {
    char *p;
    size_t i;
    const size_t step = BUFFER_SIZE / NCYCLE;
    size_t next_report = step;

    print_timestamp();
    printf("before memory allocation. please press Enter\n");
    getchar();

    /* This only reserves virtual address space; no physical pages are used yet. */
    p = malloc(BUFFER_SIZE);
    if (p == NULL)
        err(EXIT_FAILURE, "malloc() failed.");

    print_timestamp();
    printf("allocated %zu MB. please press Enter\n", BUFFER_SIZE / MIB);
    getchar();

    /* Touch one byte per page so the kernel must back it with physical memory. */
    for (i = 0; i < BUFFER_SIZE; i += PAGE_SIZE) {
        p[i] = 0;
        /* BUFFER_SIZE / NCYCLE is not a multiple of PAGE_SIZE, so i % step == 0
           would never be true; compare against a moving threshold instead. */
        if (i >= next_report) {
            print_timestamp();
            printf("touched %zu MB.\n", i / MIB);
            next_report += step;
            sleep(1);
        }
    }

    print_timestamp();
    printf("touched %zu MB. please press Enter\n", BUFFER_SIZE / MIB);
    getchar();
    exit(EXIT_SUCCESS);
}
First, so that a sufficiently large virtual address space can be allocated, set the overcommit policy to OVERCOMMIT_ALWAYS.
$ sudo bash -c "echo 1 > /proc/sys/vm/overcommit_memory"
$ cat /proc/sys/vm/overcommit_memory
1
Then compile and run it.
$ gcc -o oom oom.c
$ ./oom
Wed Feb 5 00:25:32 2020: before memory allocation. please press Enter
Wed Feb 5 00:27:21 2020: allocated 2000 MB. please press Enter
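At this stage, protect ./oom itself from being killed by the OOM Killer. Writing -17 to oom_adj excludes the process from OOM killing. As the kernel log further down notes, oom_adj is deprecated in favor of oom_score_adj; the process table in that log shows the resulting oom_score_adj of -1000 for ./oom.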
$ ps aux | grep oom
root 160 0.0 0.0 0 0 ? S 00:15 0:00 [oom_reaper]
ec2-user 3111 0.0 0.0 4360 1004 pts/0 S+ 00:25 0:00 ./oom
ec2-user 3113 0.0 0.1 110492 1800 pts/1 D+ 00:25 0:00 grep --color=auto oom
$ sudo bash -c "echo -17 > /proc/3111/oom_adj"
$ cat /proc/3111/oom_adj
-17
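After pressing Enter in the ./oom terminal so that it starts touching the allocated memory, a log like the following appears in /var/log/messages, showing that the OOM Killer killed amazon-ssm-agent.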
Feb 5 00:27:00 ip-10-3-0-32 kernel: [ 662.633469] bash (3116): /proc/3111/oom_adj is deprecated, please use /proc/3111/oom_score_adj instead.
Feb 5 00:27:25 ip-10-3-0-32 kernel: [ 686.245536] oom invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=-1000
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.255460] oom cpuset=/ mems_allowed=0
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.258634] CPU: 0 PID: 3111 Comm: oom Not tainted 4.14.77-70.59.amzn1.x86_64 #1
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.264211] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.268687] Call Trace:
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.271047] dump_stack+0x5c/0x82
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.273943] dump_header+0x94/0x21c
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.276828] ? get_page_from_freelist+0x525/0xba0
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.280290] oom_kill_process+0x213/0x410
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.283462] out_of_memory+0x296/0x4c0
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.286475] __alloc_pages_slowpath+0x9ef/0xdd0
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.290108] __alloc_pages_nodemask+0x207/0x220
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.293489] alloc_pages_vma+0x7c/0x1e0
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.296816] __handle_mm_fault+0x8b8/0x1460
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.300158] ? __switch_to_asm+0x24/0x60
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.303931] handle_mm_fault+0xaa/0x1e0
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.307616] __do_page_fault+0x22e/0x4c0
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.312343] ? page_fault+0x2f/0x50
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.315857] page_fault+0x45/0x50
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.319198] RIP: a93ed0e0: (null)
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.324143] RSP: 0000:00007fffa93ed000 EFLAGS: 00400650
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.483396] 9687 pages reserved
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.486108] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.492291] [ 1569] 0 1569 2857 215 13 3 0 -1000 udevd
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.498712] [ 1879] 0 1879 27285 58 21 3 0 0 lvmetad
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.505112] [ 1888] 0 1888 6799 48 16 3 0 0 lvmpolld
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.513038] [ 2095] 0 2095 2353 122 9 3 0 0 dhclient
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.519666] [ 2183] 0 2183 2353 119 10 3 0 0 dhclient
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.526256] [ 2225] 0 2225 75962 2313 33 5 0 0 amazon-ssm-agen
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.533098] [ 2233] 0 2233 13251 97 26 3 0 -1000 auditd
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.539231] [ 2258] 0 2258 61858 132 23 3 0 0 rsyslogd
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.546069] [ 2280] 0 2280 1630 24 9 3 0 0 rngd
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.553607] [ 2298] 32 2298 8841 99 22 3 0 0 rpcbind
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.562336] [ 2319] 29 2319 9983 203 24 4 0 0 rpc.statd
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.570198] [ 2353] 81 2353 5461 58 15 3 0 0 dbus-daemon
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.577966] [ 2756] 0 2756 1099 37 8 3 0 0 acpid
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.585345] [ 2791] 38 2791 29129 202 30 3 0 0 ntpd
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.591737] [ 2812] 0 2812 22408 431 42 4 0 0 sendmail
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.598350] [ 2821] 51 2821 20272 370 41 4 0 0 sendmail
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.604926] [ 2833] 0 2833 30411 149 17 3 0 0 crond
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.611236] [ 2847] 0 2847 4797 42 14 3 0 0 atd
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.618104] [ 2877] 0 2877 1627 31 9 3 0 0 agetty
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.625219] [ 2878] 0 2878 1090 25 8 3 0 0 mingetty
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.633028] [ 2880] 0 2880 1090 24 8 3 0 0 mingetty
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.640639] [ 2882] 0 2882 1090 24 8 3 0 0 mingetty
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.648232] [ 2884] 0 2884 1090 24 8 3 0 0 mingetty
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.655781] [ 2886] 0 2886 1090 24 8 3 0 0 mingetty
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.663595] [ 2888] 0 2888 1090 25 8 3 0 0 mingetty
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.670351] [ 2892] 0 2892 2731 93 12 3 0 -1000 udevd
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.677755] [ 2893] 0 2893 2731 93 12 3 0 -1000 udevd
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.684895] [ 2901] 0 2901 30011 270 65 3 0 0 sshd
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.692963] [ 2903] 501 2903 30011 275 63 3 0 0 sshd
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.700402] [ 2904] 501 2904 28851 98 15 3 0 0 bash
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.709589] [ 3019] 0 3019 20148 206 41 3 0 -1000 sshd
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.716951] [ 3081] 0 3081 30012 269 62 3 0 0 sshd
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.725010] [ 3083] 501 3083 30012 269 60 3 0 0 sshd
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.732538] [ 3084] 501 3084 28853 98 15 3 0 0 bash
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.740800] [ 3111] 501 3111 513091 224852 448 5 0 -1000 oom
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.748446] Out of memory: Kill process 2225 (amazon-ssm-agen) score 9 or sacrifice child
Feb 5 00:27:26 ip-10-3-0-32 kernel: [ 686.756243] Killed process 2225 (amazon-ssm-agen) total-vm:303848kB, anon-rss:9252kB, file-rss:0kB, shmem-rss:0kB