Regular Expressions
A regular expression is a notation for specifying a (possibly infinite) set of strings. Our topic for this lecture is the famous grep algorithm, which determines whether a given text contains any substring from that set. We examine an efficient implementation that makes use of our digraph reachability implementation from Week 1.
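To make the grep question concrete, here is a minimal sketch that asks whether a text contains any substring matching a pattern. It uses the standard java.util.regex library, not the digraph-based implementation developed in this lecture; the class name GrepSketch, the method containsMatch, and the sample pattern and texts are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

class GrepSketch {
    // Returns true if some substring of text matches the regular expression.
    static boolean containsMatch(String regex, String text) {
        // find() looks for a match anywhere in the text, which is the grep question;
        // matches() would instead require the entire text to match.
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // The pattern (A*B|AC)D describes the set { ACD, BD, ABD, AABD, ... }.
        System.out.println(containsMatch("(A*B|AC)D", "XXAABDYY")); // true: contains AABD
        System.out.println(containsMatch("(A*B|AC)D", "XXAACYY"));  // false: no D follows
    }
}
```

The distinction between find() and matches() mirrors the difference between searching a text for a substring in the set and testing whether a whole string belongs to the set.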
Applications
- Pattern matching in genomic data
- Syntax highlighting
- Scan for virus signatures
- Process natural language
- Specify a programming language
- Access information in digital libraries
- Search genome using PROSITE patterns
- Filter text (spam, NetNanny, Carnivore, malware)
- Validate data-entry fields (dates, email, URL, credit card)
Parse text files
- Compile a Java program
- Crawl and index the Web
- Read in data stored in ad hoc input file format
- Create Java documentation from Javadoc comments
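As an illustration of reading data stored in an ad hoc input format, the sketch below uses a regular expression with capture groups to split one record into fields. The record layout (name, date, amount separated by commas), the class name AdHocParseSketch, and the method parse are all hypothetical; only the java.util.regex calls are part of the standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AdHocParseSketch {
    // Hypothetical record format: <name>,<yyyy>-<mm>-<dd>,<amount with two decimals>
    private static final Pattern RECORD =
        Pattern.compile("([A-Za-z ]+),(\\d{4})-(\\d{2})-(\\d{2}),(\\d+\\.\\d{2})");

    // Returns the five captured fields, or null if the line does not fit the format.
    static String[] parse(String line) {
        Matcher m = RECORD.matcher(line);
        if (!m.matches()) {
            return null; // the whole line must match, not just a substring
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // group 0 is the entire match, so start at 1
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] fields = parse("Ada Lovelace,1843-09-05,12.50");
        System.out.println(String.join(" | ", fields));
    }
}
```

Here matches() is the appropriate call because a record is valid only if the entire line conforms to the format.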