cloud-init collect-logs can use too much memory
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
cloud-init | Fix Released | Medium | Unassigned |
Bug Description
If the journal is large, or the machine doesn't have much memory, cloud-init collect-logs can cause an OOM.
The problem is that we are reading the entire journal into memory and then writing it out:
https:/
We should not buffer the entire journal in memory. I think redirecting it to an output file would not cause a memory spike.
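A minimal sketch of that approach, assuming journalctl is the command being captured (stream_journal_to_file is a hypothetical helper for illustration, not cloud-init's actual API):

import subprocess

def stream_journal_to_file(output_path):
    # Hand the child process the open file descriptor so the journal
    # contents flow straight to disk instead of through Python's memory.
    with open(output_path, "wb") as f:
        subprocess.check_call(["journalctl", "-o", "short-precise"], stdout=f)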
Thanks,
Pradip Dhara
1. Cloud provider: Azure
2. I don't think the cloud-init configuration is relevant here, but I can provide it if needed.
3. Can't do, because cloud-init collect-logs is crashing.
4. I can get dmesg logs if you like, but I don't think they are relevant to this bug.
I can confirm with memory_profiler: if collect-logs tries to _write_command_output_to_file(['cat', 'some-multi-gig-file'], "/tmp/filecopy", verbosity=0), we can get into incredibly high memory consumption and resulting OOM issues.
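For illustration, the buffering pattern that produces this spike looks roughly like the following (a simplified sketch, not cloud-init's exact implementation; buffered_copy is a hypothetical name):

import subprocess

def buffered_copy(cmd, output_path):
    # check_output() realizes the child's entire stdout in memory
    # before anything is written, so a multi-GiB command output
    # means a multi-GiB Python process.
    out = subprocess.check_output(cmd)
    with open(output_path, "wb") as f:
        f.write(out)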
Simply inverting this call to something like the following Python avoids realizing the entire stream in memory:
with open("/ tmp/filecopy" , "w") as f: s.call( ['cat', 'some-multi- gig-file' ], stdout=f)
subproces
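The reason the inverted form stays flat is that stdout=f hands the open file descriptor directly to the child process, so the kernel moves the data from cat into the file without it ever being materialized inside the Python process.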
In profiling a 4 GiB file "copy", I can see our memory footprint is capped at about 30 MiB instead of 4 GiB for file creation.
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     6     32.4 MiB     32.4 MiB           1   @memory_profiler.profile
     7                                         def doit():
     8     32.4 MiB      0.0 MiB           1       with open("/tmp/filecopy", "w") as f:
     9     32.6 MiB      0.2 MiB           1           subprocess.call(["cat", "/some-multi-gig-file"], stdout=f)
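For reference, a self-contained script that should reproduce a measurement like the one above (the input path is a placeholder):

import subprocess

import memory_profiler

@memory_profiler.profile
def doit():
    # Redirect the child's stdout to the file; Python never buffers the data.
    with open("/tmp/filecopy", "w") as f:
        subprocess.call(["cat", "/some-multi-gig-file"], stdout=f)

if __name__ == "__main__":
    doit()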