Sunday, December 28, 2008

some good sed regex

I found a fantastic tutorial on sed that's worth checking out. I had never used sed before; I guess I was too intimidated to learn it. But the truth is, it's a LOT easier than trying to do search and replace in emacs or vim.

Anyway, here are some notes:
  1. sed works line by line through the file you send it. By default a substitution replaces only the first match on each line; if you want it to replace every match on the line, you have to add the g flag (s/one/two/g)
  2. you must have all three separators no matter what, even when the replacement is empty.
    sed 's/one/two/'
  3. However, you can use a different character as the separator, which is handy when the pattern itself contains slashes.
    sed 's_one_two_'
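  4. one of the things I didn't see as useful at first is the number flag: s///5 replaces only the 5th match on the line. But what if you want to add a colon (:) after the 80th character on each line?
    sed 's/./&:/80' <file >new
  5. if you want to run sed as a script, put #!/bin/sed -f at the top of a file (the -f matters), followed by a separate command on each line. Make the file executable and just run it as something.sed <old >new
  6. if you want to use a tab in the regex, you have to type Ctrl-V Ctrl-I at the shell prompt to get a literal tab. I'm not sure if you have to do this when putting it into a script; in an editor you can probably just type the tab. GNU sed also accepts \t, but that is not portable.
  7. by the way, that <oldfile >newfile thing is really just shell redirection, not piping: < feeds the old file to sed's standard input and > writes sed's output to the new file.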
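
Notes 1, 3 and 4 are easy to try at the prompt. This is only a small sketch: the sample strings and the variable names (first_only, all, path, third) are made up for illustration, and a 3 stands in for the 80 so the result fits on one line.

```shell
# Made-up sample input for the first two commands.
line='one one one'

# Without g, only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/one/two/')

# With g, every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/one/two/g')

# Using _ as the separator avoids escaping the slashes in a path.
path=$(printf '%s\n' '/usr/local/bin' | sed 's_/usr/local_/opt_')

# A numeric flag touches only the Nth match: here, a colon after the 3rd character.
third=$(printf '%s\n' 'abcdef' | sed 's/./&:/3')

printf '%s\n' "$first_only" "$all" "$path" "$third"
```

Printing the four variables side by side makes it easy to compare what each flag does to the same kind of input.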
Some sed regexes I'm scraping from that tutorial follow. Another good link is this sed one-liners page.

removing duplicate words:
  • note the g flag, for global replacement
  • and the space inside the parentheses; it separates the words
sed 's/\([a-z]* \)\1/\1/g' <old >new
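
Here is a quick way to try the duplicate-word command on a made-up line (the inputs and the names deduped and overeager are mine, not the tutorial's). The second call shows a caveat worth knowing: the pattern is not tied to word boundaries, so the tail of one word can pair up with the next word.

```shell
# Two doubled words in a made-up sentence.
deduped=$(printf '%s\n' 'this is is a test of of sed' |
  sed 's/\([a-z]* \)\1/\1/g')

# Caveat: "the " at the end of "bathe " is followed by "the ", so it counts as a repeat.
overeager=$(printf '%s\n' 'bathe the cat' |
  sed 's/\([a-z]* \)\1/\1/g')

printf '%s\n' "$deduped" "$overeager"
```

If the second result is a problem for your data, the pattern needs some notion of where a word starts; how to express that depends on the sed you have, so I'm leaving it out of this sketch.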
to put parentheses around the first word of every line, no matter what the word is:
  • the & in the replacement stands for whatever the pattern matched
  • if you want it on every word in the line, add the g flag
sed 's/[^ ]*/(&)/' <old >new
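
A small sketch of the same command with and without the g flag, run on a made-up three-word line (the variable names are just for the demo):

```shell
input='first second third'

# No flag: only the first run of non-space characters is wrapped.
first_word=$(printf '%s\n' "$input" | sed 's/[^ ]*/(&)/')

# g flag: every run of non-space characters on the line is wrapped.
every_word=$(printf '%s\n' "$input" | sed 's/[^ ]*/(&)/g')

printf '%s\n' "$first_word" "$every_word"
```

Because [^ ]* can also match nothing at all, lines that start with a space or contain double spaces may pick up empty () pairs; the sample input avoids that on purpose.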
to read a few files, blank out every line that begins with "#:", pipe the result through grep to drop the blank lines, and then count what is left:
  • note the ".*" after "#:"; this is the equivalent of saying "select EVERYTHING after the #:", so the whole line is replaced with nothing
  • the f1 f2 and f3 are just names of files, presumably some of the lines in those files start with #:
  • grep -v means "the opposite": it prints the lines that do NOT match the expression I pass it
  • wc -l just counts lines
sed 's/^#:.*//' f1 f2 f3 | grep -v '^$' | wc -l
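
To see the pipeline count something, here is a sketch that builds two throwaway files first. The file contents and the temporary directory are assumptions for the demo; the sed, grep and wc stages are the same as above.

```shell
# Create two hypothetical input files in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' '#: a header' 'keep me' '' 'keep me too' > "$dir/f1"
printf '%s\n' '#: another header' 'third kept line' > "$dir/f2"

# Blank the "#:" lines, drop blank lines, count the rest.
# tr strips the padding some versions of wc put around the number.
count=$(sed 's/^#:.*//' "$dir/f1" "$dir/f2" | grep -v '^$' | wc -l | tr -d ' ')

printf '%s\n' "$count"
rm -rf "$dir"
```

Note that lines which were already blank in the input are dropped by grep as well, so the count covers only non-blank lines that don't start with "#:".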
to get rid of commented lines:
  • first command takes out the # and everything following it
  • second command removes trailing tabs and spaces (the ^I stands for a literal tab, typed with Ctrl-V Ctrl-I)
  • final command deletes each empty line
sed -e 's/#.*//' -e 's/[ ^I]*$//' -e '/^$/ d'
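
One way to avoid typing the tab by hand is to let printf produce it and splice it into the expression. This is a sketch with made-up input; the variable names (tab, sample, cleaned, script, from_file) are mine. The second half puts the same three commands into a file and runs them with sed -f, which is what a #!/bin/sed -f script does under the hood.

```shell
# A literal tab, standing in for the ^I in the command above.
tab=$(printf '\t')

# Made-up input: a full-line comment, a trailing comment, a blank line, a plain line.
sample() {
  printf '%s\n' '# full comment' "x = 1${tab} # trailing" '' 'y = 2'
}

# Same three commands, with the tab spliced in via double quotes.
cleaned=$(sample | sed -e 's/#.*//' -e "s/[ ${tab}]*\$//" -e '/^$/ d')

# The same commands, one per line, in a temporary script file run with -f.
script=$(mktemp)
printf '%s\n' 's/#.*//' "s/[ ${tab}]*\$//" '/^$/ d' > "$script"
from_file=$(sample | sed -f "$script")
rm -f "$script"

printf '%s\n' "$cleaned"
```

Both versions should produce the same output; the script file is just easier to keep around once the list of commands grows.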
to double every empty line (a plain sed 'p' would double every line):
  • p prints the matching line an extra time, on top of sed's normal output
sed '/^$/ p'
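
The /^$/ address in front of p limits it to empty lines. A small sketch that compares line counts makes the difference visible; the three-line input (a, an empty line, b) and the variable names are made up for the demo.

```shell
# Only the empty line is printed twice.
doubled_blank=$(printf 'a\n\nb\n' | sed '/^$/ p' | wc -l | tr -d ' ')

# With no address, every line is printed twice.
doubled_all=$(printf 'a\n\nb\n' | sed 'p' | wc -l | tr -d ' ')

# -n turns off the automatic printing, so p prints each line exactly once.
printed_once=$(printf 'a\n\nb\n' | sed -n 'p' | wc -l | tr -d ' ')

printf '%s\n' "$doubled_blank" "$doubled_all" "$printed_once"
```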
