Database Connection

Latest revision as of 09:52, 19 August 2023

Summary

Shorthand notations (read below for details and explanation):

Database             Shorthand   Prefix form
RCSB PDB             =           =pdb/
RCSB Ligands         ==          =ligand/
EBI PDB Europe       *           =pdbe/
NCI (Cactus)         $           =nci/
PubChem              :           =pubchem/
CrystOD              (none)      =cod/
AMCSD                (none)      =ams/
Materials Project    (none)      =mp/
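
The correspondence between the one-character shorthands and the explicit prefix forms can be sketched in a few lines of Python. This is only an illustration of the summary above, not Jmol's own parser; the function name to_prefix_form and the table SHORTHAND_TO_PREFIX are hypothetical.

```python
# Illustrative only: mirrors the summary table, not Jmol's own parser.
SHORTHAND_TO_PREFIX = [
    ("==", "=ligand/"),  # must be tested before the single "="
    ("*", "=pdbe/"),
    ("$", "=nci/"),
    (":", "=pubchem/"),
    ("=", "=pdb/"),
]


def to_prefix_form(target: str) -> str:
    """Rewrite a shorthand load target into its explicit =database/ form."""
    # Already explicit, e.g. "=cod/1000373": leave it untouched.
    if target.startswith("=") and not target.startswith("==") and "/" in target:
        return target
    for shorthand, prefix in SHORTHAND_TO_PREFIX:
        if target.startswith(shorthand):
            return prefix + target[len(shorthand):]
    raise ValueError(f"no database shorthand in {target!r}")


for example in ("=1CRN", "==ETB", "*1CRN", "$aspirin", ":aspirin"):
    print(example, "->", to_prefix_form(example))
```

The two-character shorthand is listed first because a plain prefix test would otherwise read ==ETB as a PDB ID that starts with an equal sign.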

Connection of Jmol to databases

Jmol can connect to certain databases in order to directly retrieve structures. This applies to the Jmol application, to the JSmol HTML5 object and to the Jmol signed Java applet. (The unsigned applet is not allowed connection to external servers and so does not support this method.)

Note: Some databases changed the way they allow access in 2016, so old versions of Jmol fail to retrieve structures from them. Updating your Jmol will fix this problem. Example of this failure:
ERROR in script: unrecognized file format for file
http://cactus.nci.nih.gov/chemical/structure/tylenol/file?format=sdf&get3d=True
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>

PDB

(The RCSB Protein Data Bank, also Worldwide PDB, wwPDB)

Macromolecular structures may be retrieved from the PDB database:

  • Using the Jmol application, top menu bar File > Get PDB. A dialog is displayed where you can type a 4-character PDB ID.
  • Using the script language:
    • use an equal sign followed by the 4-character PDB ID (no spaces). Example: load =1CRN. This will retrieve the file in pdb format.
    • use an equal sign followed by pdb, a forward slash, and the 4-character PDB ID. Example: load =pdb/1CRN for crambin.
    • use an equal sign followed by the 4-character PDB ID (no spaces) and .cif to retrieve the file in mmCIF format. Example: load =1CRN.cif. This format may be faster than pdb for large files.
    • use an equal sign followed by the 4-character PDB ID (no spaces) and .mmtf to retrieve the file in MMTF format. Example: load =1CRN.mmtf. This binary format is faster than pdb and cif for large files.

For additional options (nucleic acid secondary structure, validation or domain annotations), see the load command in the scripting documentation.

Ligand structures can also be retrieved (in CIF format) from the PDB:

  • Using the Jmol application, top menu bar File > Get PDB. You must type an equal sign before the 3-character PDB ID of the ligand.
  • Using the script language:
    • use two equal signs. Example: load ==ETB for ethyl-coenzyme A.
    • use an equal sign followed by ligand, a forward slash, and the 3-character ID. Example: load =ligand/ETB

Database location

By default, Jmol connects to the PDB server at https://files.rcsb.org/download/ for macromolecules (when using the '=id' option) and at https://files.rcsb.org/ligands/download/ for ligands (when using the '==id' option). To force the use of another server, set:

set loadFormat = "  "
set loadLigandFormat = "  "

putting between the quotes the URL in the proper request format, with %FILE at the position where the PDB ID should be inserted. More details are in the scripting documentation, under set loadFormat.

Examples:

set loadFormat = "https://files.rcsb.org/download/%FILE.pdb"
set loadFormat = "http://www.ebi.ac.uk/msd-srv/oca/oca-bin/save-pdb?id=%FILE"
set loadLigandFormat = "https://files.rcsb.org/ligands/download/%FILE.cif"
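
To make the role of %FILE concrete, here is a minimal Python sketch of the substitution. It only illustrates how such a template turns into a request URL; it is not Jmol code, and the function name expand_load_format is hypothetical.

```python
def expand_load_format(template: str, identifier: str) -> str:
    """Insert an identifier at the %FILE position of a loadFormat-style URL."""
    if "%FILE" not in template:
        raise ValueError("template has no %FILE placeholder")
    return template.replace("%FILE", identifier)


# Templates copied from the examples above.
LOAD_FORMAT = "https://files.rcsb.org/download/%FILE.pdb"
LOAD_LIGAND_FORMAT = "https://files.rcsb.org/ligands/download/%FILE.cif"

print(expand_load_format(LOAD_FORMAT, "1CRN"))
# https://files.rcsb.org/download/1CRN.pdb
print(expand_load_format(LOAD_LIGAND_FORMAT, "ETB"))
# https://files.rcsb.org/ligands/download/ETB.cif
```

A template without %FILE is rejected here because every request would otherwise go to the same URL regardless of the ID.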

PDB Europe

EMBL-EBI's Protein Data Bank in Europe (PDBe) keeps a mirror of the PDB data.

To retrieve models from them, in mmCIF format:

  • Using the script language:
    • use an asterisk sign followed by the 4-character PDB ID (no spaces). Example: load *1CRN for crambin.
    • use an equal sign followed by pdbe, a forward slash, and the 4-character PDB ID (no spaces). Example: load =pdbe/1CRN

For additional options (nucleic acid secondary structure, validation or domain annotations), see the load command in the scripting documentation.

NCI/NIH

National Cancer Institute, CADD Group's Chemoinformatics Tools, Chemical Identifier Resolver (CACTUS server).

Chemical structures may be retrieved from this database by using a common name, an IUPAC name, a SMILES, an InChI, an InChIKey, a Chemical Abstracts registry number...

Note: the SMILES will be converted to a reasonable 3D model on the server.

  • Using the Jmol application, top menu bar File > Get MOL. A dialog is displayed where you can type the name or identifier.
  • Using the script language:
    • use a dollar sign followed by the name or identifier (no spaces). If the name contains spaces, enclose it in quotes. Examples: load $aspirin; load "$acetylsalicylic acid"
    • use an equal sign followed by nci, a forward slash, and the name or identifier. Example: load =nci/aspirin

Database location

By default, Jmol will connect to the CACTUS server at https://cactus.nci.nih.gov/chemical/structure/

To force the use of another server, set:

set nihResolverFormat = "  "
set smilesURLformat = "  "

putting between the quotes the URL in the proper request format, with %FILE at the position where the name or ID should be inserted. More details are in the scripting documentation, under set loadFormat.

Examples:

set nihResolverFormat = "https://cactus.nci.nih.gov/chemical/structure/%FILE"
set smilesURLformat = "https://cactus.nci.nih.gov/chemical/structure/%FILE/file?format=sdf&get3d=true"


PubChem

National Center for Biotechnology Information, PubChem.

Chemical structures may be retrieved from this database by name or identifier:

  • Using the Jmol application, top menu bar File > Get MOL. A dialog is displayed where you can type the name or identifier, prefixed with a colon (:).
  • Using the script language:
    • use a colon sign followed by the name or identifier (no spaces). If the name contains spaces, enclose it in quotes.
    • use an equal sign followed by pubchem, a forward slash, and the name or identifier.

Examples:

  • load :aspirin
  • load =pubchem/aspirin
  • load ":acetylsalicylic acid"
  • load ":103-90-2" (a Chemical Abstracts Service registry number)

To indicate explicitly the kind of identifier being provided, include a tag and an extra colon:

  • load :name:tylenol
  • load :cid:1983
  • load :smiles:CC(=O)Nc1ccc(cc1)O (here the :smiles: tag is required)

Database location

By default, Jmol will connect to the PubChem server at https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/

To force the use of another server, set:

set pubChemFormat = "  "

putting between the quotes the URL in the proper request format, with %FILE at the position where the name or ID should be inserted.

Example:

set pubChemFormat = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/%FILE/SDF?record_type=3d"
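
As a rough illustration of how a tagged identifier could end up in such a URL, the Python sketch below fills %FILE with a "namespace/identifier" pair, which is the path layout PubChem's PUG REST service uses. How Jmol itself fills %FILE is an assumption here, as is treating every untagged identifier as a name; the function pubchem_url is hypothetical.

```python
from urllib.parse import quote

PUBCHEM_FORMAT = (
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/%FILE/SDF?record_type=3d"
)
KNOWN_TAGS = ("name", "cid", "smiles")


def pubchem_url(target: str) -> str:
    """Build a request URL from a ':tag:value' or ':value' load target."""
    if not target.startswith(":"):
        raise ValueError("PubChem targets start with a colon")
    body = target[1:]
    tag, separator, value = body.partition(":")
    if not separator or tag not in KNOWN_TAGS:
        # Assumption: anything without a recognised tag is looked up by name.
        tag, value = "name", body
    # Percent-encode the value so spaces and SMILES punctuation survive.
    return PUBCHEM_FORMAT.replace("%FILE", f"{tag}/{quote(value, safe='')}")


print(pubchem_url(":cid:1983"))
print(pubchem_url(":smiles:CC(=O)Nc1ccc(cc1)O"))
```

Only the first colon after the tag is treated as a separator, so a SMILES string that itself contains colons is passed through whole.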

Crystallographic Open Database

The COD website holds an open-access collection of crystal structures of organic, inorganic, metal-organic compounds and minerals, excluding biopolymers.

Chemical structures (in CIF format) may be retrieved from this database by numeric code:

  • Using the script language: use an equal sign followed by cod, a forward slash and the numeric code of the compound (no spaces). Examples:
    • load =cod/1000373 for sodium vanadium dioxide difluoride (symmetry space group P 1 21 1)
    • load =cod/1000373 {444 666 1} to display the crystalline network of the same compound


American Mineralogist Crystal Structure Database

The AMCSD website (hosted at the University of Arizona, USA) includes every structure published in the American Mineralogist, The Canadian Mineralogist, European Journal of Mineralogy and Physics and Chemistry of Minerals, as well as selected datasets from other journals.

Chemical structures (in CIF format) may be retrieved from this database by name, by 5-digit ID or by 7-digit ID. See the scripting documentation for more details (search for ams/ to reach the entry).

Note: calling by structure name will retrieve a multi-model file with all the crystal structures that match the search.

The Materials Project

The Materials Project provides open web-based access to computed information on known and predicted materials.


Connection of Jmol to resources by specifying a DOI

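Jmol can retrieve a model from a resource that meets these conditions:

  • the file to be loaded has a registered DOI
  • the DOI metadata defining the path to the resource has a METS or ORE resource map.
    • You can find out if such a resource map exists by invoking e.g. https://data.datacite.org/application/vnd.datacite.datacite+xml/10.14469/hpc/4310 where the DOI is appended. The XML metadata file as downloaded should have a string e.g. <relatedIdentifier relatedIdentifierType="URL" relationType="HasMetadata" relatedMetadataScheme="ORE" schemeURI="http://www.openarchives.org/ore/">https://data.hpc.imperial.ac.uk/resolve/?ore=4310</relatedIdentifier>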
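
Whether a DOI's metadata record advertises a resource map can be checked by scanning the DataCite XML for a relatedIdentifier element whose relationType is HasMetadata. The Python sketch below does this on a trimmed, hypothetical record; the wrapper elements and the namespace are assumptions, the value "METS" for relatedMetadataScheme is also an assumption, and the function find_resource_maps is not part of Jmol or of resolve-doi.js.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical metadata record. Only the relatedIdentifier element
# follows the documented example for DOI 10.14469/hpc/4310.
SAMPLE_METADATA = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="URL" relationType="HasMetadata" relatedMetadataScheme="ORE" schemeURI="http://www.openarchives.org/ore/">https://data.hpc.imperial.ac.uk/resolve/?ore=4310</relatedIdentifier>
  </relatedIdentifiers>
</resource>"""


def find_resource_maps(xml_text, schemes=("ORE", "METS")):
    """Return (scheme, url) pairs for HasMetadata links using the given schemes."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter():
        # Compare local names so the check works whatever the namespace is.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name != "relatedIdentifier":
            continue
        scheme = element.get("relatedMetadataScheme")
        if element.get("relationType") == "HasMetadata" and scheme in schemes:
            found.append((scheme, (element.text or "").strip()))
    return found


print(find_resource_maps(SAMPLE_METADATA))
```

An empty list means the record carries no resource map of the requested kinds, in which case the DOI route described here cannot locate the file.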

If the resource map exists, a mechanism may be implemented to retrieve the model from the DOI value into JSmol. There is an example here, including links of the form:

<a href="javascript:handle_jmol('10.14469/hpc/4310',%20';frame 1;spin 3;')">Load using a DOI</a>

where 10.14469/hpc/4310 is the DOI. The DOI itself only points to the landing page; the deposition metadata points to the resource map, which in turn defines the path from the landing page to the required file. In this instance, the request retrieves a Gaussian log file by default, but the default can be reconfigured in the resolve-doi.js file.

The function handle_jmol() is processed in a custom JavaScript called resolve-doi.js (included in that example page). This works at least for a DSpace repository (which uses METS resource maps) and the Imperial repository (which uses ORE).

Note: If anyone knows of molecular files in other METS or ORE compliant data repositories, please share with Henry Rzepa

Contributors

AngelHerraez, Rzepa