Friday, 13 April 2018

RDKit Reaction SMARTS

There have been some papers using the RDKit for synthesis planning. If you're writing a paper and use the term "Reaction SMARTS", make sure you mean what everyone else thinks it means.

The SMILES, SMARTS, and SMIRKS line notations were created* by Daylight for storing, matching, and transforming connection tables.

  • SMILES describes a connection table to store molecules and reactions
  • SMARTS describes a pattern (or query) to match molecules and reactions
  • SMIRKS describes a transform (or "reaction") to modify molecules

RDKit uses the term "Reaction SMARTS" to mean "transform" (see RDKit Book). Unfortunately, in Daylight's terminology, Reaction SMARTS is a pattern, not a transform.

Screenshot from the Daylight SMARTS theory manual.

Reaction SMARTS is primarily useful for searching reaction databases. For example, this Reaction SMILES:


is matched by this Reaction SMARTS:

[*:1][Nh2:2]>>[*:1][Nh0:2](~[OD1])~[OD1] amino to nitro
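
Reaction SMILES, Reaction SMARTS and SMIRKS all share the same reactants>agents>products layout, so the pattern above has an empty agents part and the same atom maps on both sides. A minimal plain-Python sketch of pulling those parts apart; the helper names are hypothetical, and it assumes '>' only ever appears as the part separator and that any title follows the first whitespace. It is string handling only, not a SMARTS parser.

```python
import re


def split_reaction(text):
    """Return (reactants, agents, products) from a reaction SMILES/SMARTS/SMIRKS."""
    notation = text.split()[0]  # drop a trailing title such as "amino to nitro"
    parts = notation.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected reactants>agents>products, got {notation!r}")
    return tuple(parts)


# Atom-map numbers appear as ':n' at the end of a bracket atom, e.g. [*:1] or [Nh2:2].
MAP_RE = re.compile(r":(\d+)\]")


def atom_maps(side):
    """Return the set of atom-map numbers used on one side of a reaction."""
    return {int(n) for n in MAP_RE.findall(side)}


rxn = "[*:1][Nh2:2]>>[*:1][Nh0:2](~[OD1])~[OD1] amino to nitro"
reactants, agents, products = split_reaction(rxn)
print(reactants)                                    # [*:1][Nh2:2]
print(repr(agents))                                 # ''
print(products)                                     # [*:1][Nh0:2](~[OD1])~[OD1]
print(atom_maps(reactants) == atom_maps(products))  # True
```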

You can highlight the substructure:

Highlighting the SMARTS in the SMILES using CDK Depict

But that's a transform!

Yes, but it's matching a transform (SMARTS), not applying one (SMIRKS). Some may think you could read this pattern unmodified as SMIRKS, but this is not the case: SMIRKS needs "real" parts after the second angle bracket, since these are the parts created by the transform. Note that '*' is valid SMILES, and in SMIRKS it roughly means "unmodified". This gives us two nice invariants:

All SMILES are valid SMARTS but not all SMARTS are valid SMILES
All SMIRKS are valid SMARTS but not all SMARTS are valid SMIRKS
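
The "real parts" rule can be illustrated with a rough check on the product side. This is a hypothetical plain-Python heuristic, not Daylight's actual validation: it treats a side as "real" only if every bracket atom parses as an ordinary SMILES bracket atom ('*' and atom maps allowed) and no SMARTS-only bond or logic symbols (~ @ ! , ; &) appear outside brackets. It ignores corner cases such as extended chirality and does not check full SMILES syntax.

```python
import re

# Heuristic only (hypothetical helper, not Daylight's rules): a plain SMILES
# bracket atom is [isotope][symbol][chirality][H count][charge][:map].
SMILES_BRACKET_ATOM = re.compile(
    r"\d*"                                  # optional isotope
    r"(?:\*|[A-Z][a-z]?|se|as|[bcnops])"    # '*', element, or aromatic symbol
    r"(?:@@?)?"                             # optional @ / @@ chirality
    r"(?:H\d*)?"                            # optional hydrogen count
    r"(?:[+-]\d*|[+-]+)?"                   # optional charge
    r"(?::\d+)?"                            # optional atom map
)
QUERY_SYMBOLS = set("~@!,;&")  # SMARTS-only bond/logic symbols outside brackets


def is_concrete(side):
    """True if a reaction side uses only 'real' SMILES atoms and bonds."""
    i = 0
    while i < len(side):
        if side[i] == "[":
            end = side.index("]", i)  # raises ValueError on an unclosed bracket
            if not SMILES_BRACKET_ATOM.fullmatch(side[i + 1:end]):
                return False          # e.g. [OD1], [Nh0:2]: query primitives
            i = end + 1
        elif side[i] in QUERY_SYMBOLS:
            return False              # e.g. '~' (any bond)
        else:
            i += 1
    return True


def products_are_concrete(reaction):
    """Check the part after the second '>', which a SMIRKS must be able to create."""
    notation = reaction.split()[0]    # drop a trailing title
    return is_concrete(notation.split(">")[2])


print(products_are_concrete("[*:1][ND3:2]([H])([H])>>[*:1][N:2](=O)=O"))  # True
print(products_are_concrete("[*:1][Nh2:2]>>[*:1][Nh0:2](~[OD1])~[OD1]"))  # False
```

The first is the SMIRKS transform; the second is the Reaction SMARTS pattern, whose product side is still a query.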

Here is the SMIRKS transform for amino to nitro:

[*:1][ND3:2]([H])([H])>>[*:1][N:2](=O)=O amino to nitro

I can apply this SMIRKS to "molecules" and it will create "reactions". Note that these molecules do not need to have atom maps, but they will come out with atom maps (see dt_transform)!


The output is


And another thing...

In general, you can't run SMIRKS backwards. Because the atoms and bonds being added need to be "real", if I want to go from nitro to amino, the reverse transform has to be encoded separately!

[*:1][ND3:2]([H])([H])>>[*:1][N:2](=O)=O amino to nitro
[*:1][ND3:2](~[OD1])(~[OD1])>>[*:1][N:2]([H])[H] nitro to amino
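
To see why simply swapping the sides doesn't work, here is a naive reversal of the amino to nitro SMIRKS: the degree query D3 ends up on the product side, where it describes a match condition rather than something the transform can create. A hypothetical plain-Python heuristic (function names assumed), not a real SMIRKS validator.

```python
import re

# Rough heuristic: what a plain SMILES bracket atom looks like
# ([isotope][symbol][chirality][H count][charge][:map]).
PLAIN_ATOM = re.compile(
    r"\d*(?:\*|[A-Z][a-z]?|se|as|[bcnops])(?:@@?)?(?:H\d*)?(?:[+-]\d*|[+-]+)?(?::\d+)?"
)


def swap_sides(smirks):
    """Naively reverse a transform by swapping reactants and products."""
    reactants, agents, products = smirks.split()[0].split(">")
    return f"{products}>{agents}>{reactants}"


def query_atoms(side):
    """Return bracket atoms on one side that are not plain SMILES atoms."""
    return [a for a in re.findall(r"\[([^\]]*)\]", side) if not PLAIN_ATOM.fullmatch(a)]


forward = "[*:1][ND3:2]([H])([H])>>[*:1][N:2](=O)=O"
backward = swap_sides(forward)
print(backward)                             # [*:1][N:2](=O)=O>>[*:1][ND3:2]([H])([H])
print(query_atoms(backward.split(">")[2]))  # ['ND3:2']
print(query_atoms("[*:1][N:2]([H])[H]"))    # [] (products of the hand-written reverse)
```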

Although dt_transform specifies a direction, this only controls whether the input molecules appear on the left or right of the output reaction.

*SMILES was created by Dave Weininger whilst at the EPA.