AtomSets

From Jmol

Revision as of 01:20, 9 October 2006

Predefined Atom Sets

Jmol recognizes keywords (tokens) for several purposes, such as commands in the scripting language and colors. Among them are keywords for predefined atom sets:

By element names:

carbon, oxygen, hydrogen, sulphur, etc.

In Jmol 11, the element symbol preceded by an underscore can also be used: _C, _O, _H, _S, etc. Deuterium and tritium are also recognized, as deuterium or _D and tritium or _T.
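To illustrate the two keyword forms, the Python sketch below maps a keyword such as `carbon` or `_C` to an element symbol. The table `ELEMENT_KEYWORDS` and the function `element_for_keyword` are hypothetical and are not part of Jmol. The table lists only the element names mentioned in this section.

```python
from typing import Optional

# Hypothetical lookup table; only the element keywords named above.
ELEMENT_KEYWORDS = {
    "carbon": "C",
    "oxygen": "O",
    "hydrogen": "H",
    "sulphur": "S",
    "deuterium": "D",
    "tritium": "T",
}


def element_for_keyword(token: str) -> Optional[str]:
    """Return the element symbol named by a keyword such as 'carbon' or '_C'."""
    if token.startswith("_") and len(token) > 1:
        # Jmol 11 style: an underscore followed by the element symbol.
        return token[1:].capitalize()
    # Otherwise look the full element name up (None if unknown).
    return ELEMENT_KEYWORDS.get(token.lower())
```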

In PDB format, Jmol identifies the element from columns 77-78 (element symbol, right-justified). If this field is absent, it interprets the "atom name" field (columns 13-14) to deduce the element identity.
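The lookup order described above can be sketched in Python as follows. This is a simplified illustration and not Jmol's actual PDB reader. The functions `element_from_pdb_record` and `pdb_record` are hypothetical helpers. The example shows why justification matters: an alpha carbon is written " CA " (symbol "C" right-justified in columns 13-14), while calcium is written "CA  ".

```python
def element_from_pdb_record(record: str) -> str:
    """Simplified guess of the element for a PDB ATOM/HETATM record."""
    # Columns 77-78 (1-based) hold the element symbol, right-justified.
    symbol = record[76:78].strip()
    if not symbol:
        # Fallback: columns 13-14 of the atom name; drop digits such as
        # the leading "1" of a hydrogen name like "1HA ".
        symbol = "".join(ch for ch in record[12:14] if ch.isalpha())
    return symbol.capitalize()


def pdb_record(atom_name: str, element: str = "") -> str:
    """Build a blank 80-column ATOM record with only the name and element set."""
    cols = [" "] * 80
    cols[0:6] = "ATOM  "
    cols[12:16] = atom_name[:4].ljust(4)
    cols[76:78] = element[:2].rjust(2)
    return "".join(cols)


# Alpha carbon: "C" sits right-justified in columns 13-14.
print(element_from_pdb_record(pdb_record(" CA ")))  # C
# Calcium: "CA" fills columns 13-14.
print(element_from_pdb_record(pdb_record("CA  ")))  # Ca
```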

Note: Jmol 10.2 has a bug that may cause it to read a calcium atom as an alpha carbon based on its atom id (CA), although identification by element name works properly.

Parts in proteins:

Backbone

Inclusion in this set is determined by atom id*, as follows:

  • Peptide bond: N, H (bound to N), CA (alpha carbon), HA (bound to CA), C (carbonyl carbon), O or O1 (bound to C)
  • In glycine, the two equivalent hydrogens are both in the backbone set: either H1 and 1HA or 1HA and 2HA.
  • Termini:
    • second carbonyl oxygen on C-terminus: OXT
    • terminal amino hydrogens: 1H, 2H, 3H

(*) Note: in PDB format, the atom id is called the atom name, and it must occupy these columns:

  • 13-14 : Chemical symbol, right justified, except for hydrogen atoms
  • 15 : Remoteness indicator (alphabetic); e.g., in amino acid residues, alpha = A, beta = B, gamma = G, delta = D, epsilon = E.
  • 16 : Branch designator (numeric).
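As a rough illustration of this column layout, the Python sketch below splits a 4-character atom name field into its three parts. `AtomName` and `split_atom_name` are hypothetical names. The handling of hydrogens, where a digit may occupy column 13 (e.g. "1HD2"), is a simplifying assumption.

```python
from typing import NamedTuple


class AtomName(NamedTuple):
    symbol: str      # columns 13-14
    remoteness: str  # column 15 (A, B, G, D, E, ...)
    branch: str      # column 16


def split_atom_name(field: str) -> AtomName:
    """Split the 4-character PDB atom name field (columns 13-16)."""
    field = field.ljust(4)
    # Hydrogens may carry a digit in column 13 (e.g. "1HD2"); drop it.
    symbol = field[0:2].strip().lstrip("0123456789")
    return AtomName(symbol, field[2].strip(), field[3].strip())


print(split_atom_name(" OD1"))  # AtomName(symbol='O', remoteness='D', branch='1')
```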

Sidechain

Defined as (not backbone).

Alpha

A set defined by atom id CA.
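The three protein sets above can be sketched in Python as simple name lookups. The set `PROTEIN_BACKBONE` is built only from the atom ids listed under Backbone, including both glycine naming variants. The function names are hypothetical. Jmol's own implementation may check more than the atom id.

```python
# Atom ids listed above for the protein backbone.
PROTEIN_BACKBONE = {
    "N", "H", "CA", "HA", "C", "O", "O1",  # peptide bond
    "H1", "1HA", "2HA",                    # glycine hydrogen variants
    "OXT",                                 # C-terminal oxygen
    "1H", "2H", "3H",                      # N-terminal hydrogens
}


def protein_part(atom_id: str) -> str:
    """Sidechain is defined as (not backbone)."""
    return "backbone" if atom_id in PROTEIN_BACKBONE else "sidechain"


def is_alpha(atom_id: str) -> bool:
    """The 'alpha' set: atoms whose id is CA."""
    return atom_id == "CA"


for name in ("N", "CA", "CB", "OXT", "OG1"):
    print(name, protein_part(name), is_alpha(name))
```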

Parts in nucleic acids:

Backbone

Inclusion in this set is determined by atom id*, as follows:

  • Phosphate groups:
    • phosphorus: P
    • oxygens bound to phosphorus: O1P, O2P
  • Atoms in pentose:
    • carbon ring: C1', C2', C3', C4', C5'
    • hydrogens attached to carbon ring: H1', 1H2', 2H2' (only DNA), H3', H4', 1H5' and 2H5'
    • oxygens: O2', O3', O4', O5', and 2HO' (H on the 2'-hydroxyl, only RNA); the ring oxygen is denoted O4', not O1'.

Note: PDB files label pentose atoms with an asterisk instead of a prime sign. Jmol copes with this easily: since the asterisk is a wildcard, "select C3*" selects the pentose carbon whether it is labeled with a prime or an asterisk.

  • Termini:
    • 5'-terminus oxygen (no phosphate): O5T
    • 5'-terminus hydrogen (attached to O5T or O5'): H5T
    • 3'-terminus hydrogen (on 3'-hydroxyl): H3T
  • Atoms in bases:
    • ring, both purines and pyrimidines: N1, C2, N3, C4, C5, C6
    • ring, purines: N7, C8, N9
    • ring, pyrimidines: O2
    • substituents on ring:
      • in cytosine: N4
      • in guanine: N2
      • in adenine: N6
      • in thymine: C5M
      • in guanine and hypoxanthine: O6
      • in thymine and uracil: O4
      • in thiouracil: S4

(*) Note: in PDB format, the atom id is called the atom name, and it must occupy these columns:

  • 13-14 : Chemical symbol, right justified, except for hydrogen atoms
  • 15 : Remoteness indicator (alphabetic).
  • 16 : Branch designator (numeric).

Sidechain

Defined as (not backbone).

Bases

Synonym of (nucleic and sidechain).
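The nucleic-acid parts can be sketched in Python. This sketch makes an assumption: the phosphate, pentose and terminal atom ids listed above form the backbone, and every other atom of a nucleic group belongs to the bases (sidechain = not backbone). Instead of relying on Jmol's wildcard matching, it normalizes the asterisk to a prime before the lookup. `NUCLEIC_BACKBONE`, `normalize_prime` and `nucleic_part` are hypothetical names.

```python
# Nucleic-acid backbone atom ids listed above, written with primes.
NUCLEIC_BACKBONE = {
    "P", "O1P", "O2P",                                   # phosphate
    "C1'", "C2'", "C3'", "C4'", "C5'",                   # pentose ring carbons
    "H1'", "1H2'", "2H2'", "H3'", "H4'", "1H5'", "2H5'",  # pentose hydrogens
    "O2'", "O3'", "O4'", "O5'", "2HO'",                  # pentose oxygens, 2'-OH hydrogen
    "O5T", "H5T", "H3T",                                 # termini
}


def normalize_prime(atom_id: str) -> str:
    """Treat the PDB asterisk convention (C3*) as a prime (C3')."""
    return atom_id.replace("*", "'")


def nucleic_part(atom_id: str) -> str:
    """'backbone' for the ids above; any other atom of a nucleic group is in 'bases'."""
    return "backbone" if normalize_prime(atom_id) in NUCLEIC_BACKBONE else "bases"
```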

By type of molecule:

  • amino:

This is based only upon group name and has nothing to do with the actual atoms that make up the group.

This set is composed of all groups with one of these names: ALA, ARG, ASN, ASP, CYS, GLN, GLU, GLY, HIS, ILE, LEU, LYS, MET, PHE, PRO, SER, THR, TRP, TYR, VAL, ASX, GLX, UNK.

This will usually be the 20 canonical proteinogenic amino acids, plus ambiguous Asp/Asn (ASX), ambiguous Glu/Gln (GLX) and 'unknown' (UNK).
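Because 'amino' depends only on the group name, a membership test reduces to a name lookup. In the Python sketch below, `AMINO_GROUP_NAMES` and `is_amino` are hypothetical names. A modified residue such as selenomethionine (MSE) is not in the list, so it is not 'amino' even if its atoms would qualify it as 'protein'.

```python
AMINO_GROUP_NAMES = frozenset(
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL "
    "ASX GLX UNK".split()
)


def is_amino(group_name: str) -> bool:
    """Membership in 'amino' depends only on the group name, not on its atoms."""
    return group_name.strip().upper() in AMINO_GROUP_NAMES
```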

  • protein:

The 'protein' set in Jmol is based solely on the atoms that make up the group. It is independent of the group name and independent of whether the atoms are defined in ATOM records or HETATM records.

This has the advantage that modified groups, and other groups that are not standard amino acids, can still be identified as part of the protein.

Membership is decided by the names of the atoms in the group, as follows:

Case 1: If the group has 4 atoms named N, CA, C and (O or O1) and they are bonded in the correct order, then the group is considered protein.

Case 2: If the group has exactly 1 atom whose name is CA then it is considered protein. The purpose of this is to pick up alpha-carbon-only models.
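The two cases can be sketched in Python. This sketch rests on two assumptions. Bonds are given as unordered name pairs within the group. Case 2 is read as "the group's only atom is named CA", which fits the stated purpose of picking up alpha-carbon-only models. `is_protein_group` is a hypothetical name. Note that the group name is never consulted, in line with the description above.

```python
from typing import FrozenSet, Sequence, Set


def is_protein_group(atom_names: Sequence[str], bonds: Set[FrozenSet[str]]) -> bool:
    """atom_names: names of all atoms in the group.
    bonds: bonded pairs within the group, each as frozenset({name1, name2}).
    """
    names = set(atom_names)

    def bonded(a: str, b: str) -> bool:
        return frozenset((a, b)) in bonds

    # Case 1: N-CA-C-(O or O1), bonded in that order.
    if {"N", "CA", "C"} <= names and bonded("N", "CA") and bonded("CA", "C"):
        if any(o in names and bonded("C", o) for o in ("O", "O1")):
            return True
    # Case 2: the group consists of a single atom named CA.
    return len(atom_names) == 1 and atom_names[0] == "CA"
```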

  • nucleic:

A group is considered 'nucleic' if it contains atoms with all of the following names: C5, C6, N1, C2, N3, C4, O5*, O3*, C3* (asterisk may be substituted by prime/apostrophe).

  • dna:

A group is 'dna' if it is 'nucleic' and it does not contain an atom named O2* (or O2').

  • rna:

A group is 'rna' if it is 'nucleic' and it also contains an atom named O2* (or O2').
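The 'nucleic', 'dna' and 'rna' rules can be sketched in Python. The sketch accepts both the asterisk and the prime spelling by converting "*" to "'". `NUCLEIC_REQUIRED` and `nucleic_type` are hypothetical names. The base oxygen O2 of pyrimidines is a different name from the sugar oxygen O2', so it does not make a DNA group count as RNA.

```python
from typing import Iterable, Optional

# Atom names required for 'nucleic', written with primes.
NUCLEIC_REQUIRED = {"C5", "C6", "N1", "C2", "N3", "C4", "O5'", "O3'", "C3'"}


def nucleic_type(atom_names: Iterable[str]) -> Optional[str]:
    """Return 'dna', 'rna', or None if the group is not 'nucleic'."""
    # Accept the asterisk form (O2*) as well as the prime form (O2').
    names = {name.replace("*", "'") for name in atom_names}
    if not NUCLEIC_REQUIRED <= names:
        return None
    return "rna" if "O2'" in names else "dna"
```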

  • water:

This set is based only on group name, which must be one of the following: HOH, DOD, or WAT.

  • ions:

Based only on group name: PO4 or SO4 (phosphate and sulphate).

  • solvent:

water or ions.

  • ligand:

hetero and not solvent (needs confirmation for Jmol; this is the definition in RasMol).
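The name-based sets above can be sketched in Python as follows. Two assumptions apply here. `hetero` is taken to mean the group's atoms come from HETATM records. The ligand rule follows the RasMol-style definition, which is still unconfirmed for Jmol. All function names are hypothetical.

```python
WATER_NAMES = {"HOH", "DOD", "WAT"}
ION_NAMES = {"PO4", "SO4"}  # phosphate and sulphate


def is_water(group_name: str) -> bool:
    return group_name in WATER_NAMES


def is_ions(group_name: str) -> bool:
    return group_name in ION_NAMES


def is_solvent(group_name: str) -> bool:
    # solvent = water or ions
    return is_water(group_name) or is_ions(group_name)


def is_ligand(group_name: str, hetero: bool) -> bool:
    # RasMol-style definition (unconfirmed for Jmol): hetero and not solvent.
    return hetero and not is_solvent(group_name)
```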

By type of residue:

Inclusion in these sets is determined by the residue id, exactly as written in the corresponding field of the molecular coordinate file (usually PDB format).

Residue IDs:

  • Nucleotides: A, G, C, T, U
  • Amino acids: the 3-letter standard abbreviation

Residue sets:

  • Nucleotides: purine, pyrimidine, at, cg
  • Amino acids:
    • acyclic, cyclic, aliphatic, aromatic
    • large, medium, small
    • polar, nonpolar, hydrophobic, neutral, charged, acidic, negative, basic, positive, ...
    • buried, surface
  • hetero, ions, ligand, water, solvent
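The nucleotide residue sets can be sketched in Python as lookups on the one-letter residue id. The sketch makes two assumptions. Purines are A and G, and pyrimidines are C, T and U. The sets "at" and "cg" are read as the A/T and C/G pairs. Files that use other residue names (for example DA or DT) would need their own mapping. `NUCLEOTIDE_SETS` and `nucleotide_sets_for` are hypothetical names.

```python
from typing import List

# Residue sets for nucleotides, keyed by the residue id written in the file.
NUCLEOTIDE_SETS = {
    "purine": {"A", "G"},
    "pyrimidine": {"C", "T", "U"},
    "at": {"A", "T"},
    "cg": {"C", "G"},
}


def nucleotide_sets_for(residue_id: str) -> List[str]:
    """Names of the nucleotide residue sets that contain residue_id."""
    rid = residue_id.strip()
    return sorted(name for name, ids in NUCLEOTIDE_SETS.items() if rid in ids)
```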

By structure of the polymer:

  • amino, protein, nucleic
  • helix, sheet, turn
  • bonded