James Percival

My Github Pages material

Last updated : 30th August 2018

Regexps - another thing I can't do. Yet.


Regular expressions are a powerful tool for text processing. Apparently. For some reason I just can’t ever get my head round them, and thus, given my preferred method for learning things, I can never remember any of the syntax for them. So I’m going to try and do something about it by making notes on useful things which can be done with them.

Python regexps to parse a string

The first question with regexps is which regexps. Regular expressions define a language for dealing with text, but different programming languages implement them differently. Since Python is the language I use for everything lightweight I do, let’s try its re module:

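import re

filename = "my_module.f90"  # placeholder path; point this at your own Fortran file

# Normalise each line: trim it, lowercase it, and collapse runs of whitespace
data = ""
with open(filename) as file_handle:
    for line in file_handle:
        data += " ".join(line.strip().lower().split()) + '\n'

# Raw strings keep \w intact; MULTILINE lets ^ match at the start of every line
module_name = re.search(r'(?<=^module )\w+', data, re.MULTILINE).group(0)
modules_used = re.findall(r'(?<=^use )\w+', data, re.MULTILINE)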

This code snippet takes a filename for a modern Fortran file, normalises each line (lowercased, with surrounding whitespace stripped), then uses regexps to pull out the name of the module and the names of all modules used directly by this file. Here the caret (^) matches the start of each line in the string (thanks to the MULTILINE flag; without it, ^ only matches at the very start of the string). The pattern in brackets “(?<=blah)” is a lookbehind: matches must follow the text blah, but blah itself is not included in the result. Finally, \w+ matches the next word, i.e. a run of letters, digits and underscores. There’s probably a better way of doing most of this.
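To see what each piece does, here is a small sketch that runs the same patterns on an in-memory string rather than a file. The Fortran source and the names in it (fluid_solver, mesh_tools, io_utils) are made up for illustration. The last few lines try one possible alternative, a capture group with the IGNORECASE flag, which seems to cope with indentation and mixed case without any preprocessing; it is an assumption on my part that this is a "better way", not a tested Fortran parser.

```python
import re

# Hypothetical Fortran source, already normalised as in the snippet above
data = (
    "module fluid_solver\n"
    "use mesh_tools\n"
    "use io_utils, only: write_field\n"
    "end module fluid_solver\n"
)

# Without MULTILINE, ^ matches only at the very start of the string,
# so "use" lines further down are never found.
without_flag = re.findall(r'(?<=^use )\w+', data)
print(without_flag)  # → []

# With MULTILINE, ^ matches at the start of every line.
with_flag = re.findall(r'(?<=^use )\w+', data, re.MULTILINE)
print(with_flag)  # → ['mesh_tools', 'io_utils']

# The lookbehind text must be present but is not part of the match,
# so group(0) is just the module name.
module_name = re.search(r'(?<=^module )\w+', data, re.MULTILINE).group(0)
print(module_name)  # → fluid_solver

# Possible alternative: \s* allows indentation, \s+ allows any run of
# spaces, and the parentheses capture only the name after the keyword.
USE_RE = re.compile(r'^\s*use\s+(\w+)', re.MULTILINE | re.IGNORECASE)
raw = "MODULE  fluid_solver\n  USE mesh_tools\n  use   io_utils\nEND MODULE fluid_solver\n"
used_raw = [name.lower() for name in USE_RE.findall(raw)]
print(used_raw)  # → ['mesh_tools', 'io_utils']
```

Note that findall returns the captured group rather than the whole match when the pattern contains exactly one group, which is why the last list holds only the module names.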