How to process a file in PowerShell line-by-line as a stream


I'm working with some multi-gigabyte text files and want to do some stream processing on them using PowerShell. It's simple stuff: parsing each line, pulling out some data, and storing it in a database.

Unfortunately, Get-Content | %{ whatever($_) } appears to keep the entire set of lines at this stage of the pipe in memory. It's also surprisingly slow, taking a very long time to actually read everything in.

So my question has two parts:

  1. How can I make it process the stream line by line and not keep the entire thing buffered in memory? I would like to avoid using several gigs of RAM for this purpose.
  2. How can I make it run faster? PowerShell iterating over Get-Content appears to be 100x slower than a C# script.

I'm hoping there's something dumb I'm doing here, like missing a -LineBufferSize parameter or something...

If you are really about to work on multi-gigabyte text files, then do not use PowerShell. Even if you find a way to read the file faster, processing a huge number of lines will be slow in PowerShell anyway, and you cannot avoid this. Even simple loops are expensive. For 10 million iterations (quite realistic in your case) we have:

    # "empty" loop: takes 10 seconds
    Measure-Command { for($i=0; $i -lt 10000000; ++$i) {} }

    # "simple" job, just output: takes 20 seconds
    Measure-Command { for($i=0; $i -lt 10000000; ++$i) { $i } }

    # "more real job": 107 seconds
    Measure-Command { for($i=0; $i -lt 10000000; ++$i) { $i.ToString() -match '1' } }

UPDATE: If you are still not scared, then try the .NET reader:

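    $reader = [System.IO.File]::OpenText("my.log")
    try {
        for(;;) {
            $line = $reader.ReadLine()
            # ReadLine returns $null at end of file
            if ($null -eq $line) { break }
            # process the line
            $line
        }
    }
    finally {
        # always release the file handle, even if processing throws
        $reader.Close()
    }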
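As a rough sketch of how that loop might look on the kind of job described in the question, the snippet below counts lines matching a pattern while holding only the current line in memory. The temp file, its contents, and the status=fail pattern are made-up placeholders, not part of the original post. A full path is passed because .NET methods resolve relative paths against the process working directory, which can differ from the current PowerShell location.

```powershell
# Hypothetical sample data so the sketch runs on its own.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'stream-demo.log'
Set-Content -Path $path -Value 'id=1 status=ok', 'id=2 status=fail', 'id=3 status=ok'

# Pass a full path: .NET does not follow the PowerShell location.
$reader = [System.IO.File]::OpenText($path)
$failed = 0
try {
    while ($null -ne ($line = $reader.ReadLine())) {
        # Only the current line is held in memory here.
        if ($line -match 'status=fail') { $failed++ }
    }
}
finally {
    # Release the file handle even if processing throws.
    $reader.Dispose()
}

# Emit the number of matching lines.
$failed
```

For a real log you would replace the -match test with your own parsing and database call; the reading loop itself stays the same.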

UPDATE 2

There are comments about possibly better / shorter code. There is nothing wrong with the original code using for, and it is not pseudo-code. But the shorter (shortest?) variant of the reading loop is:

    $reader = [System.IO.File]::OpenText("my.log")
    while($null -ne ($line = $reader.ReadLine())) {
        $line
    }
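Two other built-in ways to read a file one line at a time may be worth measuring against the reader loop above. This is only a sketch, not a benchmark: it assumes .NET Framework 4 or later for [System.IO.File]::ReadLines, and the temp file and its "name number" format are placeholders invented for the example.

```powershell
# Hypothetical sample data so the sketch runs on its own.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'readlines-demo.log'
Set-Content -Path $path -Value 'alpha 1', 'beta 2', 'gamma 3'

# File.ReadLines returns a lazy enumerable; the foreach statement
# pulls one line at a time instead of loading the whole file.
$total = 0
foreach ($line in [System.IO.File]::ReadLines($path)) {
    $fields = $line -split ' '
    $total += [int]::Parse($fields[1])
}

# switch -File also walks the file line by line; with -Regex each
# line is matched against the pattern and $Matches holds the groups.
$names = switch -Regex -File $path {
    '^(\w+) \d+$' { $Matches[1] }
}

$total
$names -join ','
```

Both forms avoid the pipeline, which is where much of the per-line overhead of Get-Content | ForEach-Object comes from, but how much they help on a given file is something to time yourself.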
