Difference between revisions of "AtomSets"
AngelHerraez (talk | contribs) (warning about ligand set) |
AngelHerraez (talk | contribs) m (removing colons) |
Revision as of 12:20, 31 October 2014
Predefined Atom Sets
Jmol recognizes and uses several keywords or tokens for several purposes: commands in the scripting language, colors, etc. Among them, there are keywords for predefined atom sets.
Please note that many of these keywords only apply to file formats that specify a residue ID or group ID as part of the information for each atom (most typically, pdb and mmcif formats, designed for macromolecules).
- Technical note: most of these are set in src/org/jmol/viewer/JmolConstants.java
By element name
carbon, oxygen, hydrogen, sulphur, etc.
In Jmol 11, an element symbol preceded by an underscore can also be used: _C, _O, _H, _S, etc. Also, deuterium or _D, and tritium or _T.
In PDB format, Jmol will identify the element from columns 77-78 (element symbol, right-justified). If this is absent, it will interpret the "atom name" field (columns 13-14) to deduce the element identity.
Note: Jmol 10.2 has a bug by which it may read calcium as alpha carbon based on its ID, although identification by element name works properly. This has been fixed in Jmol 11.
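The column rule above can be sketched in Python. This is only an illustration of the lookup order described in the text, not Jmol's actual code: the function name `element_from_pdb_line` and the helper `demo_record` are hypothetical, and the fallback is deliberately naive (it strips leading digits such as the one in `1HA`, but would still misread other hydrogen names like `HG11`).

```python
def element_from_pdb_line(line: str) -> str:
    """Guess the element symbol for one ATOM/HETATM record (simplified)."""
    padded = line.ljust(80)
    explicit = padded[76:78].strip()        # columns 77-78, right-justified
    if explicit:
        return explicit.capitalize()
    # Fallback: columns 13-14 of the atom name field.
    # Leading digits (as in "1HA ") are dropped; this is an assumption
    # made for the sketch, not a documented Jmol rule.
    guess = padded[12:14].strip().lstrip("0123456789")
    return guess.capitalize()


def demo_record(atom_name: str, element: str = "") -> str:
    """Build a skeletal record holding only the fields this sketch reads."""
    return ("ATOM".ljust(12) + atom_name.ljust(4)).ljust(76) + element.rjust(2)


alpha_carbon = demo_record(" CA ", "C")   # element given explicitly
calcium = demo_record("CA  ")             # element column left blank

print(element_from_pdb_line(alpha_carbon))
print(element_from_pdb_line(calcium))
```

The two demo records differ only in where `CA` sits inside the four-character atom name field, which is what lets the fallback tell an alpha carbon (` CA `) from calcium (`CA  `) when columns 77-78 are empty.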
By type of molecule
- amino
- This is based only upon group name and has nothing to do with the actual atoms that make up the group.
- This set is composed of all groups with one of these names: ALA, ARG, ASN, ASP, CYS, GLN, GLU, GLY, HIS, ILE, LEU, LYS, MET, PHE, PRO, SER, THR, TRP, TYR, VAL, ASX, GLX, UNK.
- This will usually be the 20 canonical proteinogenic amino acids, plus ambiguous Asp/Asn (ASX), ambiguous Glu/Gln (GLX) and 'unknown' (UNK).
- protein
- The 'protein' set in Jmol is based solely on the atoms that make up the group. It is independent of the group name and independent of whether the atoms are defined in ATOM records or HETATM records.
- This has the advantage that modified groups and interesting things that are not amino acids can still be identified as part of the protein.
- The atoms that make up the set are recognized by the names of the atoms.
- Case 1:
- If the group has 4 atoms named N, CA, C and (O or O1) and they are bonded in the correct order, then the group is considered protein.
- Case 2:
- If the group has exactly 1 atom whose name is CA then it is considered protein. The purpose of this is to pick up alpha-carbon-only models.
- nucleic
- A group is considered 'nucleic' if it contains atoms with all of these names: C3*, O3*, O5*, N1, C2, N3, C4, C5, C6 (asterisk may be substituted by prime/apostrophe). It is independent of the group name and independent of whether the atoms are defined in ATOM records or HETATM records.
- rna
- A group is 'rna' if it is nucleic and it also contains an atom named O2* (or O2').
- dna
- A group is 'dna' if it is nucleic and it does not contain an atom named O2* (or O2').
- carbohydrate
- This is based only upon group name.
- This set is composed of all groups whose name is in a (not comprehensive) list corresponding to common mono-, di- and trisaccharides (full list and identities).
- water
- Any molecule chemically interpreted as water (an oxygen atom connected to two hydrogen, deuterium or tritium atoms), plus groups named HOH, DOD, or WAT.
- The original implementation of this set was only the named groups.
- ions
- This set is not what it might seem: it is composed only of groups named PO4 or SO4.
- Explanation: Both are common ions in protein crystals for X-ray diffraction, and this special set (which is RasMol syntax) is kept for backward compatibility.
- solvent
- This includes water (in its extended definition) and also UREA groups.
- The original implementation of this was water or ions.
- ligand
- Note: ligand will, on some occasions, not match what you expect.
- The new definition (Jmol 12.2) includes atoms that do not belong to protein, nucleic or solvent. As a consequence,
- water, other solvent, ions and carbohydrate are considered ligands.
- Nonstandard amino acids are not considered ligands.
- Nonstandard nucleotides are not considered ligands.
- Isolated nucleotides (e.g. ATP, GTP, AMP...) are not considered ligands (despite being understood by us as ligands; this is due to them being part of the nucleic atom set).
- Ligand was classically defined as hetero and not solvent (and so not always what you would expect from the word ligand, either).
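As a rough illustration of how the name-based and atom-based rules above differ, here is a minimal Python sketch covering amino, protein, nucleic, rna and dna. It is not Jmol's implementation: `classify_group` is a hypothetical name, the "bonded in the correct order" check of Case 1 is skipped, and Case 2 is read here as "the group consists of a single atom, named CA", which is an assumption about the intended meaning.

```python
AMINO_NAMES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
    "ASX", "GLX", "UNK",
}
NUCLEIC_ATOMS = {"C3'", "O3'", "O5'", "N1", "C2", "N3", "C4", "C5", "C6"}


def classify_group(group_name, atom_names):
    """Return the labels (amino, protein, nucleic, rna, dna) for one group."""
    # Old files use an asterisk where newer ones use a prime (C3* vs C3').
    names = [name.replace("*", "'") for name in atom_names]
    present = set(names)
    labels = set()

    # amino: decided by group name only.
    if group_name in AMINO_NAMES:
        labels.add("amino")

    # protein, case 1: N, CA, C and (O or O1); bond order is NOT checked here.
    case1 = {"N", "CA", "C"} <= present and bool({"O", "O1"} & present)
    # protein, case 2: alpha-carbon-only model (assumed: a lone CA atom).
    case2 = names == ["CA"]
    if case1 or case2:
        labels.add("protein")

    # nucleic: all nine marker atoms present; rna/dna split on O2'.
    if NUCLEIC_ATOMS <= present:
        labels.add("nucleic")
        labels.add("rna" if "O2'" in present else "dna")

    return labels


# Selenomethionine: not in the amino name list, still protein by its atoms.
print(classify_group("MSE", ["N", "CA", "C", "O", "CB", "CG", "SE", "CE"]))
```

The selenomethionine example shows the point made in the text: a modified residue is missed by the name-based amino set but is still picked up by the atom-based protein set.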
By type of residue
Inclusion in these sets is mostly determined by residue id (only as far as it is written in the appropriate field of the molecular coordinate file, usually in PDB format). A few cases are also included based on their chemical structure and the presence of a distinguishing atom name.
Residue IDs:
- Nucleotides: A, G, C, T, U, DA, DG, DC, DT but also based on their chemical nucleic structure and certain atom names. tu gets thiouridine.
- Amino acids: the 3-letter standard abbreviation
Residue sets:
- Nucleotides: purine, pyrimidine, at, cg
- Amino acids:
- acyclic, cyclic, aliphatic, aromatic
- large (Arg,Glu,Gln,His,Ile,Leu,Lys,Met,Phe,Trp,Tyr), medium (Asn,Asp,Cys,Pro,Thr,Val), small (Ala,Gly,Ser)
- polar, nonpolar, hydrophobic (Ala,Gly,Ile,Leu,Met,Phe,Pro,Trp,Tyr,Val), neutral, charged, acidic, negative, basic, positive, ...
- buried (Ala,Cys,Ile,Leu,Met,Phe,Trp,Val), surface
- hetero, carbohydrate, ions, ligand, water, solvent (see above)
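For the residue sets whose members are spelled out above, membership is a plain lookup by residue name. The following Python sketch only transcribes those listed members; the dictionary `RESIDUE_SETS` and the function `sets_for` are hypothetical names, and sets whose members are not listed here (polar, charged, surface, etc.) are left out rather than guessed.

```python
RESIDUE_SETS = {
    "large": {"ARG", "GLU", "GLN", "HIS", "ILE", "LEU", "LYS", "MET",
              "PHE", "TRP", "TYR"},
    "medium": {"ASN", "ASP", "CYS", "PRO", "THR", "VAL"},
    "small": {"ALA", "GLY", "SER"},
    "hydrophobic": {"ALA", "GLY", "ILE", "LEU", "MET", "PHE", "PRO",
                    "TRP", "TYR", "VAL"},
    "buried": {"ALA", "CYS", "ILE", "LEU", "MET", "PHE", "TRP", "VAL"},
}


def sets_for(residue: str):
    """List, alphabetically, the sets above that contain this residue."""
    key = residue.upper()
    return sorted(label for label, members in RESIDUE_SETS.items()
                  if key in members)


print(sets_for("Leu"))
print(sets_for("Gly"))
```

Note that the three size sets are disjoint and together cover the 20 standard residues, whereas hydrophobic and buried overlap with them and with each other.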
By structure of the polymer
- helix, sheet, turn
- bonded
Parts in proteins
Backbone or Mainchain
Inclusion in this set is determined by atom id*, as follows:
- Peptide bond: N, H (bound to N), CA (alpha carbon), HA (bound to CA), C (carbonyl carbon), O or O1 (bound to C)
- In glycine, the two equivalent hydrogens are both in the backbone set: either H1 and 1HA or 1HA and 2HA.
- Termini:
- second carbonyl oxygen on C-terminus: OXT
- terminal amino hydrogens: 1H, 2H, 3H
(*) Note: in PDB format, atom id is called atom name, and must be in these positions/columns:
- 13-14 : Chemical symbol, right justified, except for hydrogen atoms
- 15 : Remoteness indicator (alphabetic); e.g., in amino acid residues, alpha = A, beta = B, gamma = G, delta = D, epsilon = E.
- 16 : Branch designator (numeric).
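The column layout of the atom name field can be made concrete with a small Python sketch. It assumes the four characters of columns 13-16 have already been cut out of the record (for example with `line[12:16]`); `split_atom_name` is a hypothetical helper, and it does not handle the hydrogen exception mentioned above.

```python
def split_atom_name(field: str):
    """Split a 4-character PDB atom name into (symbol, remoteness, branch)."""
    field = field.ljust(4)[:4]
    symbol = field[0:2].strip()      # columns 13-14, right-justified symbol
    remoteness = field[2].strip()    # column 15, e.g. A, B, G, D, E
    branch = field[3].strip()        # column 16, numeric branch designator
    return symbol, remoteness, branch


print(split_atom_name(" CG1"))   # gamma carbon, branch 1 (as in Ile or Val)
print(split_atom_name(" CA "))   # alpha carbon
print(split_atom_name("FE  "))   # two-letter element, no remoteness
```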
Sidechain
Defined as (not backbone).
Alpha
A set defined by atom id CA.
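The three protein sets in this section reduce to simple tests on the atom id. Below is a minimal Python sketch built from the names listed above; `PROTEIN_BACKBONE` and `split_backbone` are hypothetical names, the glycine entries are limited to `1HA` and `2HA`, and the check that `H` is bound to `N` (or `HA` to `CA`) is not modelled.

```python
PROTEIN_BACKBONE = {
    "N", "H", "CA", "HA", "C", "O", "O1",   # peptide unit
    "1HA", "2HA",                           # glycine alpha hydrogens
    "OXT",                                  # C-terminal second oxygen
    "1H", "2H", "3H",                       # N-terminal amino hydrogens
}


def split_backbone(atom_names):
    """Partition atom ids into (backbone, sidechain), keeping input order."""
    backbone = [n for n in atom_names if n in PROTEIN_BACKBONE]
    sidechain = [n for n in atom_names if n not in PROTEIN_BACKBONE]
    return backbone, sidechain


def is_alpha(atom_name: str) -> bool:
    """The alpha set is defined by the single atom id CA."""
    return atom_name == "CA"


serine = ["N", "CA", "C", "O", "CB", "OG"]
print(split_backbone(serine))
```

Because sidechain is defined purely as "not backbone", any atom id missing from the backbone list ends up in the sidechain, including unusual names in modified residues.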
Parts in nucleic acids
Backbone or Mainchain
Inclusion in this set is determined by atom id*, as follows:
- Phosphate groups:
- phosphorus: P
- oxygens bound to phosphorus: O1P, O2P
- Atoms in pentose:
- carbon ring: C1', C2', C3', C4', C5'
- hydrogens attached to carbon ring: H1', 1H2', 2H2' (only DNA), H3', H4', 1H5' and 2H5'
- hydroxyls: O2', O3', O4', O5', 2HO' (H on 2'-hydroxyl, only RNA) (the ring oxygen is denoted O4', not O1').
- Note: old PDB files label pentose atoms with an asterisk instead of a prime sign. How does Jmol cope with this? It causes little trouble: since the asterisk is a wildcard, "select C3*" will match pentose carbons labeled with either a prime or an asterisk.
- Termini:
- 5'-terminus oxygen (no phosphate): O5T
- 5'-terminus hydrogen (attached to O5T or O5'): H5T
- 3'-terminus hydrogen (on 3'-hydroxyl): H3T
- Atoms in bases (excluded from backbone):
- ring, both purines and pyrimidines: N1, C2, N3, C4, C5, C6
- ring, purines: N7, C8, N9
- ring, pyrimidines: O2
- substituents on ring:
- in cytosine: N4
- in guanine: N2
- in adenine: N6
- in thymine: C5M
- in guanine and hypoxanthine: O6
- in thymine and uracil: O4
- in thiouracil: S4
(*) Note: in PDB format, atom id is called atom name, and must be in these positions/columns:
- 13-14 : Chemical symbol, right justified, except for hydrogen atoms
- 15 : Remoteness indicator (alphabetic).
- 16 : Branch designator (numeric).
Sidechain
Defined as (not backbone).
Bases
Synonym of (nucleic and sidechain).
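The nucleic acid backbone, sidechain and bases sets can be sketched the same way as for proteins. This Python fragment only transcribes the atom ids listed above; `NUCLEIC_BACKBONE` and `base_atoms` are hypothetical names, and old-style asterisks are simply rewritten as primes before the lookup, which is a simplification chosen for this sketch rather than a description of what Jmol does internally.

```python
NUCLEIC_BACKBONE = {
    "P", "O1P", "O2P",                                   # phosphate
    "C1'", "C2'", "C3'", "C4'", "C5'",                   # pentose carbons
    "H1'", "1H2'", "2H2'", "H3'", "H4'", "1H5'", "2H5'",  # pentose hydrogens
    "O2'", "O3'", "O4'", "O5'", "2HO'",                  # pentose oxygens, 2'-OH
    "O5T", "H5T", "H3T",                                 # termini
}


def is_backbone(atom_name: str) -> bool:
    """True if the atom id, with * read as ', is in the backbone list."""
    return atom_name.replace("*", "'") in NUCLEIC_BACKBONE


def base_atoms(atom_names):
    """Atoms of a nucleic group that are not backbone, i.e. the base."""
    return [n for n in atom_names if not is_backbone(n)]


# A deoxyadenosine residue written with old-style asterisks.
adenine_nucleotide = [
    "P", "O1P", "O2P", "O5*", "C5*", "C4*", "O4*", "C3*", "O3*", "C2*", "C1*",
    "N9", "C8", "N7", "C5", "C6", "N6", "N1", "C2", "N3", "C4",
]
print(base_atoms(adenine_nucleotide))
```

This sketch assumes the group has already been identified as nucleic, since bases is defined as (nucleic and sidechain).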
User-defined Atom Sets
A set can be created, and assigned any group of atoms, by giving it a name, using
define whatever_name atom_expression
Later, that set's name can be used like the predefined ones, in commands such as select, restrict, display, hide, etc.
You can also use JmolScript variables for this purpose.
See AtomSets/Popup Menu.