OOM Killer (1)

February 05, 2020

What is OOM?

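On Linux, a feature called the Out Of Memory (OOM) Killer kicks in once the system has used up both its memory and its swap: as a last resort for securing memory, it sends a signal to a process and tries to forcibly terminate it.
Thanks to this feature, the system avoids freezing because allocation attempts keep being retried even though no memory can be freed. It also makes it possible to detect processes that consume an excessive amount of memory.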

Triggering an OOM

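The code below triggers an OOM. What it does is simple: on a system with 1 GiB of physical memory, it allocates about 2 GiB (2000 MiB) of virtual memory and then gradually accesses that memory region, one page at a time.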

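#include <unistd.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>

#define BUFFER_SIZE (2000*1024*1024)  /* 2000 MiB; still fits in a 32-bit int */
#define NCYCLE 30                     /* number of progress reports */
#define PAGE_SIZE 4096

int main(void) {
  char *p;
  time_t t;
  char *s;

  t = time(NULL);
  s = ctime(&t);
  /* ctime() ends with '\n'; print everything except that last character */
  printf("%.*s: before memory allocation. please press Enter\n", (int)(strlen(s) - 1), s);
  getchar();

  /* This only reserves virtual address space; pages are not backed by
     physical memory until they are written to */
  p = malloc(BUFFER_SIZE);
  if (p == NULL)
    err(EXIT_FAILURE, "malloc() failed.");

  t = time(NULL);
  s = ctime(&t);
  printf("%.*s: allocated %d MB. please press Enter\n", (int)(strlen(s) - 1), s, BUFFER_SIZE/(1024*1024));
  getchar();

  /* Report interval, rounded down to a whole number of pages. Without the
     rounding, BUFFER_SIZE/NCYCLE is not a multiple of PAGE_SIZE, so the
     page-aligned index i would never land on it and nothing would be printed */
  size_t step = (size_t)BUFFER_SIZE / NCYCLE / PAGE_SIZE * PAGE_SIZE;
  size_t i;
  for (i = 0; i < BUFFER_SIZE; i += PAGE_SIZE) {
    p[i] = 0;  /* the first write to a page makes the kernel allocate it */
    if (i != 0 && i % step == 0) {
      t = time(NULL);
      s = ctime(&t);
      printf("%.*s: touched %zu MB.\n", (int)(strlen(s) - 1), s, i / (1024*1024));
      sleep(1);
    }
  }
  t = time(NULL);
  s = ctime(&t);
  printf("%.*s: touched %d MB. please press Enter\n", (int)(strlen(s) - 1), s, BUFFER_SIZE / (1024*1024));
  getchar();

  exit(EXIT_SUCCESS);
}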
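
The gap between reserving virtual memory and actually touching it can be observed from inside a process through the VmSize and VmRSS fields of /proc/self/status. The following is only a minimal sketch and is not part of the program above; the helper names status_field_kb and self_status_kb are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Look up a "Key:   12345 kB" line in text formatted like /proc/<pid>/status.
 * Returns the numeric value, or -1 if the key is missing or malformed. */
long status_field_kb(const char *text, const char *key)
{
    size_t keylen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* A match is the key at the start of a line, directly followed by ':' */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            long kb;
            if (sscanf(line + keylen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;  /* step past the newline to the next line */
    }
    return -1;
}

/* Read /proc/self/status and return one field in kB, or -1 on any failure. */
long self_status_kb(const char *key)
{
    char buf[8192];
    FILE *fp = fopen("/proc/self/status", "r");
    if (fp == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, fp);
    fclose(fp);
    buf[n] = '\0';
    return status_field_kb(buf, key);
}
```

Calling self_status_kb("VmSize") and self_status_kb("VmRSS") right after malloc() and again after the touch loop should show VmSize jumping at allocation time, while VmRSS only grows as pages are written.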

First, so that enough virtual address space can be reserved, set the overcommit policy to OVERCOMMIT_ALWAYS.

$ sudo bash -c "echo 1 > /proc/sys/vm/overcommit_memory"  
$ cat /proc/sys/vm/overcommit_memory   
1  
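
The value written above selects one of three overcommit modes. As a rough sketch, the helpers below read the current setting and translate it into the kernel's macro name. The function names are hypothetical, and the 0/1/2 mapping is taken from the kernel's description of vm.overcommit_memory.

```c
#include <stdio.h>

/* Translate the numeric vm.overcommit_memory value into the kernel's name. */
const char *overcommit_policy_name(int mode)
{
    switch (mode) {
    case 0:  return "OVERCOMMIT_GUESS";   /* heuristic overcommit (default) */
    case 1:  return "OVERCOMMIT_ALWAYS";  /* never refuse an allocation */
    case 2:  return "OVERCOMMIT_NEVER";   /* strict accounting */
    default: return "unknown";
    }
}

/* Read the current mode from procfs; returns -1 if it cannot be read. */
int read_overcommit_mode(void)
{
    int mode;
    FILE *fp = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%d", &mode) != 1)
        mode = -1;
    fclose(fp);
    return mode;
}
```

After the echo above, overcommit_policy_name(read_overcommit_mode()) would be expected to give "OVERCOMMIT_ALWAYS", which is the mode that lets the 2000 MiB malloc() succeed on a 1 GiB machine.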

Then compile and run it.

$ gcc -o oom oom.c   
$ ./oom  
Wed Feb  5 00:25:32 2020: before memory allocation. please press Enter  
  
Wed Feb  5 00:27:21 2020: allocated 2000 MB. please press Enter  

At this stage, make sure that ./oom itself will not be killed by the OOM Killer.

$ ps aux | grep oom  
root       160  0.0  0.0      0     0 ?        S    00:15   0:00 [oom_reaper]  
ec2-user  3111  0.0  0.0   4360  1004 pts/0    S+   00:25   0:00 ./oom  
ec2-user  3113  0.0  0.1 110492  1800 pts/1    D+   00:25   0:00 grep --color=auto oom  
$ sudo bash -c "echo -17 > /proc/3111/oom_adj"  
$ cat /proc/3111/oom_adj  
-17  
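
The oom_adj file used here is the legacy interface, and the kernel log further down reports the process with oom_score_adj=-1000 rather than -17. The sketch below shows how that conversion appears to work, modeled on the kernel's oom_adj write handler as far as I understand it. The constants mirror those in the kernel's include/uapi/linux/oom.h; the function name is made up for illustration.

```c
/* Limits of the legacy oom_adj scale and of the newer oom_score_adj scale */
#define OOM_DISABLE        (-17)
#define OOM_ADJUST_MAX     15
#define OOM_SCORE_ADJ_MAX  1000

/* Convert a legacy oom_adj value (-17..15) to the oom_score_adj scale
 * (-1000..1000). The maximum is special-cased so that 15 maps to 1000
 * exactly; everything else is scaled linearly by 1000/17. */
int oom_adj_to_score_adj(int oom_adj)
{
    if (oom_adj == OOM_ADJUST_MAX)
        return OOM_SCORE_ADJ_MAX;
    return (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;
}
```

With this mapping, oom_adj_to_score_adj(-17) gives -1000, the value that exempts a process from the OOM Killer. Writing -1000 to /proc/3111/oom_score_adj directly would be the non-deprecated way to get the same effect.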

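Then logs like the following appear in /var/log/messages, showing that the OOM Killer killed amazon-ssm-agent. The first line is the kernel's notice that oom_adj is deprecated in favor of oom_score_adj; the -17 written above shows up as oom_score_adj=-1000 in the lines that follow.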

Feb  5 00:27:00 ip-10-3-0-32 kernel: [  662.633469] bash (3116): /proc/3111/oom_adj is deprecated, please use /proc/3111/oom_score_adj instead.  
Feb  5 00:27:25 ip-10-3-0-32 kernel: [  686.245536] oom invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null),  order=0, oom_score_adj=-1000  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.255460] oom cpuset=/ mems_allowed=0  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.258634] CPU: 0 PID: 3111 Comm: oom Not tainted 4.14.77-70.59.amzn1.x86_64 #1  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.264211] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.268687] Call Trace:  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.271047]  dump_stack+0x5c/0x82  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.273943]  dump_header+0x94/0x21c  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.276828]  ? get_page_from_freelist+0x525/0xba0  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.280290]  oom_kill_process+0x213/0x410  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.283462]  out_of_memory+0x296/0x4c0  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.286475]  __alloc_pages_slowpath+0x9ef/0xdd0  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.290108]  __alloc_pages_nodemask+0x207/0x220  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.293489]  alloc_pages_vma+0x7c/0x1e0  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.296816]  __handle_mm_fault+0x8b8/0x1460  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.300158]  ? __switch_to_asm+0x24/0x60  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.303931]  handle_mm_fault+0xaa/0x1e0  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.307616]  __do_page_fault+0x22e/0x4c0  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.312343]  ? page_fault+0x2f/0x50  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.315857]  page_fault+0x45/0x50  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.319198] RIP: a93ed0e0:          (null)  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.324143] RSP: 0000:00007fffa93ed000 EFLAGS: 00400650  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.483396] 9687 pages reserved  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.486108] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.492291] [ 1569]     0  1569     2857      215      13       3        0         -1000 udevd  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.498712] [ 1879]     0  1879    27285       58      21       3        0             0 lvmetad  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.505112] [ 1888]     0  1888     6799       48      16       3        0             0 lvmpolld  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.513038] [ 2095]     0  2095     2353      122       9       3        0             0 dhclient  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.519666] [ 2183]     0  2183     2353      119      10       3        0             0 dhclient  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.526256] [ 2225]     0  2225    75962     2313      33       5        0             0 amazon-ssm-agen  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.533098] [ 2233]     0  2233    13251       97      26       3        0         -1000 auditd  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.539231] [ 2258]     0  2258    61858      132      23       3        0             0 rsyslogd  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.546069] [ 2280]     0  2280     1630       24       9       3        0             0 rngd  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.553607] [ 2298]    32  2298     8841       99      22       3        0             0 rpcbind  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.562336] [ 2319]    29  2319     9983      203      24       4        0             0 rpc.statd  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.570198] [ 2353]    81  2353     5461       58      15       3        0             0 dbus-daemon  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.577966] [ 2756]     0  2756     1099       37       8       3        0             0 acpid  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.585345] [ 2791]    38  2791    29129      202      30       3        0             0 ntpd  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.591737] [ 2812]     0  2812    22408      431      42       4        0             0 sendmail  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.598350] [ 2821]    51  2821    20272      370      41       4        0             0 sendmail  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.604926] [ 2833]     0  2833    30411      149      17       3        0             0 crond  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.611236] [ 2847]     0  2847     4797       42      14       3        0             0 atd  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.618104] [ 2877]     0  2877     1627       31       9       3        0             0 agetty  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.625219] [ 2878]     0  2878     1090       25       8       3        0             0 mingetty  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.633028] [ 2880]     0  2880     1090       24       8       3        0             0 mingetty  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.640639] [ 2882]     0  2882     1090       24       8       3        0             0 mingetty  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.648232] [ 2884]     0  2884     1090       24       8       3        0             0 mingetty  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.655781] [ 2886]     0  2886     1090       24       8       3        0             0 mingetty  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.663595] [ 2888]     0  2888     1090       25       8       3        0             0 mingetty  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.670351] [ 2892]     0  2892     2731       93      12       3        0         -1000 udevd  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.677755] [ 2893]     0  2893     2731       93      12       3        0         -1000 udevd  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.684895] [ 2901]     0  2901    30011      270      65       3        0             0 sshd  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.692963] [ 2903]   501  2903    30011      275      63       3        0             0 sshd  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.700402] [ 2904]   501  2904    28851       98      15       3        0             0 bash  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.709589] [ 3019]     0  3019    20148      206      41       3        0         -1000 sshd  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.716951] [ 3081]     0  3081    30012      269      62       3        0             0 sshd  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.725010] [ 3083]   501  3083    30012      269      60       3        0             0 sshd  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.732538] [ 3084]   501  3084    28853       98      15       3        0             0 bash  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.740800] [ 3111]   501  3111   513091   224852     448       5        0         -1000 oom  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.748446] Out of memory: Kill process 2225 (amazon-ssm-agen) score 9 or sacrifice child  
Feb  5 00:27:26 ip-10-3-0-32 kernel: [  686.756243] Killed process 2225 (amazon-ssm-agen) total-vm:303848kB, anon-rss:9252kB, file-rss:0kB, shmem-rss:0kB  
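
The total_vm and rss columns in the process table are counted in pages, while the final "Killed process" line reports kB. A small sketch of the conversion, assuming 4 KiB pages (the same PAGE_SIZE of 4096 used in the test program; the helper names are hypothetical):

```c
/* Assumption: 4 KiB pages, as on the x86_64 instance used here */
#define PAGE_SIZE_KB 4UL

/* Convert a page count from the OOM process table to kB. */
unsigned long pages_to_kb(unsigned long pages)
{
    return pages * PAGE_SIZE_KB;
}

/* Convert a page count to whole MiB (rounded down). */
unsigned long pages_to_mib(unsigned long pages)
{
    return pages * PAGE_SIZE_KB / 1024;
}
```

For amazon-ssm-agent, pages_to_kb(75962) is 303848 and pages_to_kb(2313) is 9252, matching total-vm:303848kB and anon-rss:9252kB in the last line. For oom itself, pages_to_mib(513091) is 2004 and pages_to_mib(224852) is 878: the process had reserved the full 2000 MiB but had touched only about 878 MiB when the OOM Killer was invoked.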

 © 2023, Dealing with Ambiguity