
6. Text Processing Services

6.1. string - Common string operations

6.2. re - Regular expression operations

6.3. difflib - Helpers for computing deltas

6.4. textwrap - Text wrapping and filling

6.5. unicodedata - Unicode Database

6.6. stringprep - Internet String Preparation

6.7. readline - GNU readline interface

6.8. rlcompleter - Completion function for GNU readline

6.3. difflib

Helpers for computing deltas

  • class difflib.SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
  • class difflib.Differ
  • class difflib.HtmlDiff
  • difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6)
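A minimal sketch of three of the APIs listed above, using made-up sample strings. SequenceMatcher.ratio() scores similarity between 0 and 1, get_close_matches() runs with its default n=3 and cutoff=0.6, and Differ.compare() yields a line-by-line delta in which equal lines are prefixed with two spaces, removed lines with "- " and added lines with "+ ".

```python
import difflib

# SequenceMatcher compares two sequences; ratio() returns 2*M/T,
# where M is the number of matched elements and T the total length of both
matcher = difflib.SequenceMatcher(None, "abcd", "bcde")
score = matcher.ratio()
print(score)  # 0.75

# get_close_matches returns up to n candidates whose ratio is at least
# cutoff, best match first
matches = difflib.get_close_matches("appel", ["ape", "apple", "peach", "puppy"])
print(matches)  # ['apple', 'ape']

# Differ produces a human-readable delta between two lists of lines
before = ["one\n", "two\n", "three\n"]
after = ["one\n", "tree\n", "three\n"]
delta = "".join(difflib.Differ().compare(before, after))
print(delta)
```

Passing None as isjunk means no element is treated as junk; for long sequences, the autojunk=True default may additionally ignore very frequent elements.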

6.5. unicodedata

Using the unicodedata Python module, it's easy to normalize any Unicode string (for example, to remove accents):

import unicodedata

data = 'ïnvéntìvé'
# Decompose accented letters with NFKD, then drop the non-ASCII combining marks
normal = unicodedata.normalize('NFKD', data).encode('ASCII', 'ignore')
print(normal)

The output will be:

b'inventive'

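NFKD stands for Normalization Form Compatibility Decomposition: characters are decomposed by compatibility (so an accented letter becomes its base letter followed by a combining accent mark), and multiple combining characters are arranged in a specific canonical order. Because the combining marks are not ASCII, encode('ASCII', 'ignore') drops them, which leaves b'inventive'.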
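To illustrate the decomposition described above, here is a minimal sketch that compares NFC and NFKD on the same string and shows a compatibility character (the "fi" ligature U+FB01) being split into plain letters. The strip_accents helper is a hypothetical name used only for illustration; it removes marks with unicodedata.combining() instead of an ASCII round-trip.

```python
import unicodedata

# Hypothetical helper for illustration: decompose with NFKD, then drop
# every character that unicodedata.combining() reports as a combining mark
def strip_accents(text: str) -> str:
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

word = "ïnvéntìvé"
nfc = unicodedata.normalize("NFC", word)    # composed: one code point per letter
nfkd = unicodedata.normalize("NFKD", word)  # each accented letter becomes 2 code points
print(len(nfc), len(nfkd))  # 9 13

# Compatibility decomposition also rewrites compatibility characters:
# the ligature U+FB01 becomes the two letters "f" and "i"
print(unicodedata.normalize("NFKD", "\ufb01le"))  # file

print(strip_accents("Crème brûlée"))  # Creme brulee
```

One design difference from the encode('ASCII', 'ignore') approach: letters that have no decomposition, such as 'ø', survive strip_accents unchanged instead of being silently dropped.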