Difference between revisions of "AtomSets"

From Jmol
Revision as of 09:08, 6 June 2006

Predefined Atom Sets

Jmol recognizes a number of keywords (tokens) that serve different purposes: scripting commands, colors, and so on. Some of these keywords name predefined atom sets:

By element names:

carbon, oxygen, hydrogen, sulphur, etc.

In PDB format, Jmol identifies the element from columns 77-78 (element symbol, right-justified). If that field is empty, it falls back to the "atom name" field (columns 13-14).

Note: currently (June 2006, Jmol 10.2) a bug may cause Jmol to treat calcium as an alpha carbon based on its atom id, although selection by element name works properly.
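
The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Jmol's actual code: the function names, the stripped-down demo record, and the handling of a leading digit in hydrogen names are all assumptions made for the example.

```python
def element_from_pdb_line(line: str) -> str:
    """Guess the element of one ATOM/HETATM record.

    PDB columns are 1-based, so columns 77-78 are line[76:78]
    and columns 13-14 are line[12:14].
    """
    padded = line.rstrip("\n").ljust(80)
    symbol = padded[76:78].strip()
    if not symbol:
        # Element field is empty: fall back to the start of the atom name.
        symbol = padded[12:14].strip()
        # Assumption: hydrogen names may start with a digit (e.g. "1HA"),
        # so leading digits are dropped before reading the symbol.
        symbol = symbol.lstrip("0123456789")
    return symbol.capitalize()


def demo_line(atom_name: str, element: str = "") -> str:
    """Hypothetical minimal record holding only the columns of interest."""
    return ("ATOM".ljust(12) + atom_name.ljust(4)).ljust(76) + element.rjust(2)


if __name__ == "__main__":
    print(element_from_pdb_line(demo_line(" CA ", "C")))  # alpha carbon
    print(element_from_pdb_line(demo_line("CA  ")))       # calcium, via fallback
    print(element_from_pdb_line(demo_line("1HA ")))       # hydrogen, via fallback
```

Note that the alpha carbon and calcium differ only in where "CA" sits within the atom name field (" CA " versus "CA  "), which is why a comparison on the trimmed id alone cannot tell them apart.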

Parts in proteins:

Backbone

Inclusion in this set is determined by atom id*, as follows:

  • Peptide bond: N, H (bound to N), CA (alpha carbon), HA (bound to CA), C (carbonyl carbon), O or O1 (bound to C)
  • In glycine, the two equivalent hydrogens are both in the backbone set: either H1 and 1HA or 1HA and 2HA.
  • Termini:
    • second carbonyl oxygen on C-terminus: OXT
    • terminal amino hydrogens: 1H, 2H, 3H

(*) Note: in PDB format, the atom id is called the atom name and must occupy these columns:

  • 13-14 : Chemical symbol, right justified, except for hydrogen atoms
  • 15 : Remoteness indicator (alphabetic); e.g., in amino acid residues, alpha = A, beta = B, gamma = G, delta = D, epsilon = E.
  • 16 : Branch designator (numeric).
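
As a rough illustration of this column layout, the following Python sketch splits the four-character atom name field (columns 13-16) into its three parts. The names `AtomName`, `split_atom_name` and `REMOTENESS` are made up for the example, and hydrogen names, which the layout above excludes, are not handled.

```python
from typing import NamedTuple

# Remoteness letters listed above, mapped to their Greek names.
REMOTENESS = {"A": "alpha", "B": "beta", "G": "gamma", "D": "delta", "E": "epsilon"}


class AtomName(NamedTuple):
    symbol: str      # columns 13-14, right-justified chemical symbol
    remoteness: str  # column 15, alphabetic
    branch: str      # column 16, numeric


def split_atom_name(field: str) -> AtomName:
    """Split the atom name field (columns 13-16 of a PDB record)."""
    field = field.ljust(4)[:4]
    return AtomName(field[0:2].strip(), field[2].strip(), field[3].strip())


if __name__ == "__main__":
    for raw in (" N  ", " CA ", " CG1", "CA  "):
        name = split_atom_name(raw)
        print(repr(raw), name, REMOTENESS.get(name.remoteness, ""))
```

To apply it to a full record, pass `line[12:16]`, since PDB columns are counted from 1.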

Sidechain

Defined as (not backbone).

Alpha

A set defined by atom id CA.
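
The three protein sets above amount to simple membership tests on the atom id. The Python sketch below restates them under that reading; the constant and function names are invented for the example, and treating the glycine hydrogens as glycine-only is an assumption about how the rule is applied.

```python
# Atom ids listed above for the protein backbone (peptide bond and termini).
PROTEIN_BACKBONE_IDS = frozenset(
    {"N", "H", "CA", "HA", "C", "O", "O1", "OXT", "1H", "2H", "3H"}
)
# Extra hydrogens counted as backbone in glycine, per the text above.
GLYCINE_BACKBONE_IDS = frozenset({"H1", "1HA", "2HA"})


def is_backbone(atom_id: str, residue_id: str = "") -> bool:
    """True if the atom id belongs to the protein backbone set."""
    atom_id = atom_id.strip().upper()
    if atom_id in PROTEIN_BACKBONE_IDS:
        return True
    return residue_id.strip().upper() == "GLY" and atom_id in GLYCINE_BACKBONE_IDS


def is_sidechain(atom_id: str, residue_id: str = "") -> bool:
    """Sidechain is defined as (not backbone)."""
    return not is_backbone(atom_id, residue_id)


def is_alpha(atom_id: str) -> bool:
    """The alpha set contains only atoms whose id is CA."""
    return atom_id.strip().upper() == "CA"


if __name__ == "__main__":
    for atom_id, residue_id in (("CA", "ALA"), ("CB", "ALA"), ("2HA", "GLY"), ("OXT", "LYS")):
        print(atom_id, residue_id, is_backbone(atom_id, residue_id), is_alpha(atom_id))
```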

Parts in nucleic acids:

Backbone

Inclusion in this set is determined by atom id*, as follows:

  • Phosphate groups:
    • phosphorus: P
    • oxygens bound to phosphorus: O1P, O2P
  • Atoms in pentose:
    • carbon ring: C1', C2', C3', C4', C5'
    • hydrogens attached to carbon ring: H1', 1H2', 2H2' (only DNA), H3', H4', 1H5' and 2H5'
    • hydroxyls: O2', O3', O4', O5', 2HO' (H on 2'-hydroxyl, only RNA). The ring oxygen is denoted O4', not O1'.

Note: PDB files label pentose atoms with an asterisk instead of a prime sign. Jmol copes with this easily: because the asterisk is a wildcard, "select C3*" selects the pentose C3 carbons whether they are labeled with a prime or with an asterisk.

  • Termini:
    • 5'-terminus oxygen (no phosphate): O5T
    • 5'-terminus hydrogen (attached to O5T or O5'): H5T
    • 3'-terminus hydrogen (on 3'-hydroxyl): H3T
  • Atoms in bases:
    • ring, both purines and pyrimidines: N1, C2, N3, C4, C5, C6
    • ring, purines: N7, C8, N9
    • ring, pyrimidines: O2
    • substituents on ring:
      • in cytosine: N4
      • in guanine: N2
      • in adenine: N6
      • in thymine: C5M
      • in guanine and hypoxanthine: O6
      • in thymine and uracil: O4
      • in thiouracil: S4
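
The note about asterisks and primes in pentose atom names can be illustrated with ordinary shell-style wildcards. The sketch below uses Python's `fnmatch` module as a stand-in; it is an assumption that this behaves like Jmol's own matching for this simple pattern, and the list of names is invented for the example.

```python
from fnmatch import fnmatchcase

# The same pentose carbon written with a prime and with an asterisk,
# plus a few names that should not match.
atom_names = ["C3'", "C3*", "C4'", "O3'", "C2"]

# In the pattern "C3*" the asterisk matches any run of characters, so it
# covers the prime as well as a literal asterisk in the atom name.
matches = [name for name in atom_names if fnmatchcase(name, "C3*")]

if __name__ == "__main__":
    print(matches)
```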

(*) Note: in PDB format, the atom id is called the atom name and must occupy these columns:

  • 13-14 : Chemical symbol, right justified, except for hydrogen atoms
  • 15 : Remoteness indicator (alphabetic).
  • 16 : Branch designator (numeric).

Sidechain

Defined as (not backbone).

Bases

Synonym for sidechain.


By type of molecule:

protein, nucleic, dna, rna, water, solvent, ligand...

By type of residue:

Inclusion in these sets is determined by residue id (only insofar as it is written in the appropriate field of the molecular coordinate file, usually in PDB format).

Residue IDs:

  • Nucleotides: A, G, C, T, U
  • Amino acids: the 3-letter standard abbreviation

Residue sets:

  • Nucleotides: purine, pyrimidine, at, cg
  • Amino acids:
    • acyclic, cyclic, aliphatic, aromatic
    • large, medium, small
    • polar, nonpolar, hydrophobic, neutral, charged, acidic, negative, basic, positive, ...
    • buried, surface
  • hetero, ions, ligand, water, solvent
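
Because these sets are keyed on the residue id, they can be pictured as lookup tables. The Python sketch below covers only the nucleotide sets; their membership is assumed from standard base chemistry (purines A and G, pyrimidines C, T and U, and the at and cg sets named after their members), not taken from Jmol's source, and the names are invented for the example.

```python
# Assumed membership of the nucleotide residue sets, keyed by residue id.
NUCLEOTIDE_SETS = {
    "purine": frozenset({"A", "G"}),
    "pyrimidine": frozenset({"C", "T", "U"}),
    "at": frozenset({"A", "T"}),
    "cg": frozenset({"C", "G"}),
}


def residue_sets(residue_id: str) -> list[str]:
    """Return the names of the sets that contain this residue id, sorted."""
    residue_id = residue_id.strip().upper()
    return sorted(
        name for name, members in NUCLEOTIDE_SETS.items() if residue_id in members
    )


if __name__ == "__main__":
    for residue_id in ("A", "G", "C", "T", "U"):
        print(residue_id, residue_sets(residue_id))
```

A residue whose id is missing or nonstandard in the coordinate file falls into none of these sets, which matches the caveat above about the residue id field.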

By structure of the polymer:

  • amino, protein, nucleic
  • helix, sheet, turn
  • bonded