I can confirm with memory_profiler that if collect logs calls _write_command_output_to_file(['cat', 'some-multi-gig-file'], "/tmp/filecopy", verbosity=0), we can run into incredibly high memory consumption and resulting OOM issues.
Simply inverting this call to something like the following Python avoids realizing the entire stream in memory:

import subprocess

with open("/tmp/filecopy", "w") as f:
    subprocess.call(['cat', 'some-multi-gig-file'], stdout=f)
In profiling a 4 GiB file "copy" I can see that our memory footprint is capped at about 30 MiB instead of 4 GiB for file creation:
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     6     32.4 MiB     32.4 MiB           1   @memory_profiler.profile
     7                                         def doit():
     8     32.4 MiB      0.0 MiB           1       with open("/tmp/filecopy", "w") as f:
     9     32.6 MiB      0.2 MiB           1           subprocess.call(["cat", "/some-multi-gig-file"], stdout=f)