How to process a file in PowerShell line-by-line as a stream
I'm working with multi-gigabyte text files and want to do stream processing on them using PowerShell. It's simple stuff: parsing each line, pulling out some data, and storing it in a database.
Unfortunately,
get-content | %{ whatever($_) }
appears to keep the entire set of lines in memory at that stage of the pipe. It's also surprisingly slow, taking a very long time to actually read it all in.
So my question is in two parts:
- How can I make it process the stream line by line and not keep the entire thing buffered in memory? I would like to avoid using several gigs of RAM for this.
- How can I make it run faster? PowerShell iterating over get-content appears to be 100x slower than a C# script.
I'm hoping there's something dumb I'm doing here, like missing a -LineBufferSize parameter or something...
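(For reference: there is no -LineBufferSize, but Get-Content does have a -ReadCount parameter that controls how many lines are sent down the pipe per pipeline object, so $_ becomes an array of lines rather than a single string. A minimal sketch is below; ProcessLine is a hypothetical placeholder for the actual parsing. Batching like this cuts pipeline overhead, but it does not change the per-line cost of the script block itself.)

get-content my.log -ReadCount 1000 | %{
    # with -ReadCount 1000, $_ is an array of up to 1000 lines
    foreach ($line in $_) {
        ProcessLine $line   # hypothetical per-line parsing
    }
}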
If you really work on multi-gigabyte text files then do not use PowerShell. Even if you find a way to read the files faster, processing a huge number of lines will be slow in PowerShell anyway, and you cannot avoid this. Even simple loops are expensive; for 10 million iterations (quite realistic in your case) we have:
# "empty" loop: takes 10 seconds
measure-command { for($i=0; $i -lt 10000000; ++$i) {} }

# "simple" job, output only: takes 20 seconds
measure-command { for($i=0; $i -lt 10000000; ++$i) { $i } }

# "more real" job: takes 107 seconds
measure-command { for($i=0; $i -lt 10000000; ++$i) { $i.ToString() -match '1' } }
UPDATE: If you are still not scared then try to use the .NET reader:
$reader = [System.IO.File]::OpenText("my.log")
try {
    for() {
        $line = $reader.ReadLine()
        if ($line -eq $null) { break }
        # process the line
        $line
    }
}
finally {
    $reader.Close()
}
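An alternative in the same spirit, assuming .NET 4 or later is available, is [System.IO.File]::ReadLines, which returns a lazily evaluated enumerable of lines, so foreach can stream the file without managing the reader explicitly:

foreach ($line in [System.IO.File]::ReadLines("my.log")) {
    # process the line
    $line
}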
UPDATE 2: There are comments about possibly better / shorter code. There is nothing wrong with the original code using for, and it is not pseudo-code. But a shorter (shortest?) variant of the reading loop is:
$reader = [System.IO.File]::OpenText("my.log")
while($null -ne ($line = $reader.ReadLine())) {
    $line
}
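As a usage sketch of the same loop with some per-line work (the regex and the output format are hypothetical stand-ins for whatever parsing and storage the real job does):

$reader = [System.IO.File]::OpenText("my.log")
try {
    while($null -ne ($line = $reader.ReadLine())) {
        # example only: pull two fields out of lines that match a pattern
        if ($line -match '^(?<id>\d+)\s+(?<value>\S+)') {
            # hand the extracted values to whatever stores them, e.g. batched database inserts
            "{0}`t{1}" -f $matches['id'], $matches['value']
        }
    }
}
finally {
    $reader.Close()
}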