Regular expressions are a powerful tool for text processing. Apparently. For some reason I just can’t ever get my head round them, and thus, given my preferred method for learning things, I can never remember any of the syntax for them. So I’m going to try to do something about it by making notes on useful things which can be done with them.
The first question with regexps is which regexps. Regular expressions define a language for dealing with text, but different programming languages implement them differently. Since Python is the language I use for everything lightweight I do, let’s try its re module:
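    import re

    # Normalise each line: trim the ends, lower-case it and collapse runs of whitespace.
    data = ""
    with open(filename) as file_handle:
        for line in file_handle:
            data += " ".join(line.strip().lower().split()) + "\n"

    # MULTILINE makes ^ match at the start of every line, not just the start of the string.
    module_name = re.search(r"(?<=^module )\w+", data, re.MULTILINE).group(0)
    modules_used = re.findall(r"(?<=^use )\w+", data, re.MULTILINE)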
This code snippet takes a filename for a modern Fortran file, preprocesses it into a normalised form (lower case, with leading and trailing whitespace stripped from each line), then uses regexps to parse the name of the module and the names of all modules included directly from this file. Here the caret matches the start of a line rather than only the start of the whole string (thanks to the MULTILINE flag), the pattern in brackets “(?<=blah)” is a lookbehind, meaning a match must immediately follow the text blah without blah being part of the match, and \w+ matches the word that comes next (a run of letters, digits and underscores). There’s probably a better way of doing most of this.
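To check my own understanding, here is a minimal sketch that runs the same pattern over a small in-memory string instead of a file. The Fortran source in SAMPLE is made up for illustration, and the capture-group version is just one possible "better way" rather than anything definitive.

```python
import re

# Hypothetical, already-normalised Fortran source (lower case, trimmed lines).
SAMPLE = "\n".join([
    "module solver",
    "use mesh",
    "use fields, only: velocity",
    "implicit none",
    "! use of a comment should not match",
    "end module solver",
])

# Lookbehind form: the match itself is only the name after "use ".
lookbehind_names = re.findall(r"(?<=^use )\w+", SAMPLE, re.MULTILINE)

# Capture-group form: the keyword is matched too, but findall returns only the group.
group_names = re.findall(r"^use (\w+)", SAMPLE, re.MULTILINE)

# Without MULTILINE, ^ only matches at the very start of the whole string.
no_flag_names = re.findall(r"(?<=^use )\w+", SAMPLE)

module_name = re.search(r"(?<=^module )\w+", SAMPLE, re.MULTILINE).group(0)

print(module_name)
print(lookbehind_names)
print(group_names)
print(no_flag_names)
```

This prints solver, then ['mesh', 'fields'] twice, then an empty list. The empty list is the case to remember: drop the flag and the caret stops matching anywhere except the very first character of the string.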