News
July 31st, 2014
The first articles appeared in the proceedings. Feb 3rd, 2014
The organising committee has announced the winners. Jan 15th, 2014
The automatic evaluation results are out. Jan 15th, 2014
Check the author information on the proceedings page. Jan 1st, 2014
The solutions are online. Dec. 3, 2013
The submission deadline is approaching fast! Please email submissions to casmi2013@massbank.jp. Oct. 1, 2013
Details about the CASMI 2013 Special Issue and dates are now available! Sept. 24, 2013
The rules and challenge data pages have been updated. Sept. 2, 2013
The CASMI 2013 Challenges have been officially released! August 29, 2013
The challenges for CASMI 2013 will be released on Monday, September 2nd!
Results in Category 1
Summary of Rank by Challenge and Participant
For each challenge, the rank of the winner(s) is highlighted in bold. If a submission did not contain the correct candidate, this is denoted as "-". If a participant did not take part in a challenge, the cell is left empty. The molecular formula was provided with the challenge data for challenges 7, 8, 13 and 14; these have been excluded from the table here.

challenge | an | es | fa | kd | lr |
---|---|---|---|---|---|
challenge1 | **1** | 2 | **1** | **1** | **1** |
challenge2 | **1** | **1** | - | **1** | **1** |
challenge3 | **1** | **1** | - | **1** | **1** |
challenge4 | **1** | **1** | **1** | **1** | **1** |
challenge5 | **1** | **1** | 3 | **1** | **1** |
challenge6 | **1** | 8 | | **1** | **1** |
challenge9 | **1** | **1** | **1** | **1** | **1** |
challenge10 | **1** | 45 | **1** | **1** | **1** |
challenge11 | **1** | 4 | | **1** | |
challenge12 | **1** | 9 | | - | |
challenge15 | **1** | - | | - | |
challenge16 | **1** | - | 2 | **1** | |
Participant information and abstracts
ParticipantID: AN
Category: category1
Authors: Andrew Newsome and Dejan Nikolic
Affiliations: University of Illinois at Chicago, College of Pharmacy, Department of Medicinal Chemistry & Pharmacognosy, Chicago Mass Spectrometry Laboratory
Automatic methods: no
Abstract: Formula candidates were determined on a case-by-case basis using manual methods. The manual methods used to arrive at structural candidates typically involved a combination of monoisotopic fragment ion and neutral loss formula analysis, using a formula calculator and Excel spreadsheet, as well as database searching on the accurate masses of molecular ions, fragment ions, neutral losses, and potential formulas thereof. The databases most often employed were ChemSpider, SciFinder Scholar, Reaxys, and Google Scholar. Literature consultation, deductive reasoning, and tacit knowledge and experience were also used. In many cases, the formula could be determined strictly from the accurate mass and fragment ion analysis. In some cases, a formula candidate was not decided upon until after the category 2 structure candidates were determined. For ranking candidate structures, a subjective confidence scale from 0.60 to 1.00 was used. Structures were placed on the scale based on how "confident" we felt about the proposed structure from our overall assessment of the fit of the candidates to the challenge data. The confidence scale brackets are defined as follows:
- 1.00: full confidence that the single candidate is the correct formula
- 0.90 to 0.99: high confidence that the candidate is the correct formula
- 0.80 to 0.89: good confidence that the candidate is the correct formula
- 0.70 to 0.79: fair confidence that the candidate is the correct formula
- 0.60 to 0.69: poor confidence that the candidate is the correct formula
Formula candidates were submitted for all of the challenges. Adduct formulas were provided for challenges 7, 8, 13, and 14. In no case was more than one formula submitted, but some formula submissions were ranked at a higher level of confidence than others.
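As an illustration, the confidence scale above can be written as a simple lookup. This is a sketch of the scale as described in the abstract, not code the participants used; the queried score is made up.

```python
# Sketch of the subjective confidence scale described above.
# Brackets are taken from the abstract; the example score is made up.

CONFIDENCE_BRACKETS = [
    (1.00, 1.00, "full confidence, single candidate"),
    (0.90, 0.99, "high confidence"),
    (0.80, 0.89, "good confidence"),
    (0.70, 0.79, "fair confidence"),
    (0.60, 0.69, "poor confidence"),
]

def bracket(score):
    """Return the confidence label for a submitted formula score."""
    for low, high, label in CONFIDENCE_BRACKETS:
        if low <= score <= high:
            return label
    raise ValueError("score outside the 0.60-1.00 scale")

print(bracket(0.85))  # -> 'good confidence'
```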
ParticipantID: ES
Category: category1
Authors: Schymanski, Emma
Affiliations: (1) Eawag: Swiss Federal Institute of Aquatic Science and Technology, Überlandstrasse 133, CH-8600 Dübendorf, Switzerland
Automatic pipeline: no
Abstract: Category 1 challenges were processed with MOLGEN-MS/MS using the elements CHNOPS where no evidence of halogens was present. Additional parameters were adjusted according to the AnalyticalMethods files. The mode was either [M+H]+ or [M-H]-, depending on whether positive or negative mode was quoted in the files. As most measurements used ESI, this is a reasonable (but not foolproof) assumption. Some challenges were filtered by ring and double-bond counts where "clues" were given (aromatic structure present, amide bonds). Default parameters were MS accuracy 5 ppm and MS/MS accuracy 10 ppm, using the existence filter and allowing "OEI" ions to explain MS/MS peaks. Generally the combined match value was used as the score, except for challenge 15, where no MS isotope pattern was present; in this case the MSMSMV scaled by ppm was used. Where multiple MS/MS files were available, these were scaled to relative intensities and merged, taking the peak of highest relative intensity where the same peak occurred more than once. Isotope patterns in the MS/MS were removed where the accuracy in the MS/MS was sufficient to unequivocally identify the peak as an isotope and not a fragment. The results were cross-checked with the category 2 submissions by Schymanski et al.
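The merging of multiple MS/MS files described above can be sketched as follows. This is a minimal illustration, not the code used for the submission: it assumes peaks are (m/z, intensity) pairs and that two peaks count as the same within a fixed m/z tolerance (the 0.001 tolerance and the example spectra are assumptions).

```python
# Minimal sketch of the described merging step (not the submitter's code).
# Peaks are (mz, intensity) tuples; peaks within mz_tol are "the same".

def merge_msms(peak_lists, mz_tol=0.001):
    """Scale each spectrum to relative intensities, then merge,
    keeping the highest relative intensity for duplicate peaks."""
    merged = []  # list of [mz, relative_intensity]
    for peaks in peak_lists:
        base = max(i for _, i in peaks)  # base peak intensity
        for mz, inten in peaks:
            rel = inten / base
            for entry in merged:
                if abs(entry[0] - mz) <= mz_tol:   # same peak seen before
                    entry[1] = max(entry[1], rel)  # keep the larger one
                    break
            else:
                merged.append([mz, rel])
    return sorted(merged)

# Example: two spectra sharing the m/z 91.054 peak
spec1 = [(91.054, 1000.0), (119.049, 250.0)]
spec2 = [(91.054, 400.0), (65.039, 800.0)]
print(merge_msms([spec1, spec2]))
```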
ParticipantID: KD
Category: category1
Authors: Kai Dührkop, Sebastian Böcker
Affiliations: Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany
Automatic methods: yes
Abstract: The spectral data (MS and MS/MS) were analyzed using the newest (not yet published and still in progress) version of the SIRIUS command line tool. The isotope pattern analysis limits itself to [M+H]+ and [M+Na]+ adducts in positive mode and [M-H]- adducts in negative mode. The allowed mass deviation depends on the instrument: Orbitrap 5 ppm, ToF (positive) 10 ppm, ToF (negative) 20 ppm, FTICR 2 ppm. We used the alphabet CHNOPSClBrIF, but set upper bounds for certain elements to speed up computations: F, I and S are restricted to 6 occurrences per molecule, Cl and P to 3 occurrences, and Br to one occurrence. For molecules with mass greater than 900 Da we used only the alphabet CHNOPS. The molecular formula identification of SIRIUS is an automatic method. It is completely de novo and does not perform any database search, neither in compound databases nor in spectral databases. The output of SIRIUS is a list of all possible molecular formulas within the allowed mass range together with their scores. We (automatically) transformed this output into a representation more suitable for this contest: the best formula candidate gets score 1.0, and the following formulas get a logarithmically decreasing score. Formulas whose SIRIUS score differs by more than 10% from the SIRIUS score of the best candidate are excluded. Challenges 7, 8, 13 and 14 are excluded, as the correct molecular formula is given in the challenge description. For challenges 15 and 16 we ignored the MS/MS spectra with unit mass resolution.
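The score transformation can be sketched as below. The abstract does not specify the exact logarithmic decay, so the 1/(1 + ln(rank)) form is a placeholder assumption, as are the example formulas and scores.

```python
import math

# Sketch of the described score transformation. The logarithmic decay
# 1/(1 + ln(rank)) is an assumed placeholder, not the authors' formula,
# and sirius_scores below is made-up example data.

def transform_scores(sirius_ranked, rel_cutoff=0.10):
    """sirius_ranked: list of (formula, sirius_score), best first.
    Drops formulas whose SIRIUS score differs from the best by more
    than rel_cutoff, then assigns 1.0 to the best candidate and a
    logarithmically decreasing score to the rest."""
    best = sirius_ranked[0][1]
    kept = [(f, s) for f, s in sirius_ranked
            if abs(best - s) <= rel_cutoff * abs(best)]
    return [(f, 1.0 / (1.0 + math.log(rank)))
            for rank, (f, _) in enumerate(kept, start=1)]

sirius_scores = [("C9H11NO2", 152.3), ("C6H7N5O", 149.8), ("C13H13", 120.0)]
print(transform_scores(sirius_scores))
# C13H13 is dropped (>10% below best); remaining scores: 1.0, 0.59...
```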
ParticipantID: LR
Category: category1
Authors: Ridder, Lars (1) and Hooft, Justin J.J. van der (2)
Affiliations: (1) Wageningen University, Laboratory of Biochemistry, Wageningen, The Netherlands (2) University of Glasgow, College of Medical, Veterinary, and Life Sciences, United Kingdom
Automatic methods: yes
Spectral libraries: no
Abstract: The challenge peak lists were converted to MAGMa input files and processed with MAGMa using candidate molecules from PubChem, as described in the metadata file for category 2. The submissions for category 1 consist of the lists of unique molecular formulas obtained in category 2. The provided scores correspond to the highest-scoring candidate (in category 2) with the given molecular formula. This submission is supported by the observation in Ridder et al. (2012) and Ridder et al. (2013) that, even if the top-scoring candidate structure in MAGMa is not correct, the molecular formula often is.
References:
Ridder, L.; van der Hooft, J. J. J.; Verhoeven, S.; de Vos, R. C. H.; van Schaik, R.; Vervoort, J. (2012) Rapid Commun. Mass Spectrom. 26, 2461-2471.
Ridder, L.; van der Hooft, J. J. J.; Verhoeven, S.; de Vos, R. C. H.; Bino, R. J.; Vervoort, J. (2013) Anal. Chem. 85, 6033-6040.
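The step from the ranked category 2 structures to the category 1 formula list amounts to deduplicating by molecular formula while keeping the score of the highest-scoring candidate. A minimal sketch with made-up candidate data (not the authors' MAGMa code):

```python
# Sketch of deriving the category 1 submission from ranked category 2
# candidates, as described above (illustrative data only).

def formulas_from_structures(ranked_candidates):
    """ranked_candidates: list of (formula, score), best first.
    Returns unique formulas in order of first appearance, each scored
    by its highest-scoring (i.e. first-seen) candidate."""
    seen = {}
    for formula, score in ranked_candidates:
        if formula not in seen:       # keep only the first (best) hit
            seen[formula] = score
    return list(seen.items())

candidates = [("C16H13ClN2O", 0.92), ("C16H13ClN2O", 0.85), ("C17H17NO2", 0.80)]
print(formulas_from_structures(candidates))
# [('C16H13ClN2O', 0.92), ('C17H17NO2', 0.8)]
```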
ParticipantID: FA
Category: category1 and category2
Authors: Allen, Felicity and Greiner, Russ
Affiliations: Department of Computing Science, University of Alberta, Canada
Automatic methods: yes
Abstract: A list of candidate structures was obtained by querying PubChem for all structures within 10 ppm of the precursor mass (or with the given molecular formula if this was provided). For cases where the precursor mass was not provided, this value was deduced manually by considering the MS2 and MS1 data. Where further specific information was provided, the candidate lists were filtered using that information, e.g. aromaticity or amide bonds. The candidate lists were then processed with the input spectra by the program cfm-id (http://sourceforge.net/projects/cfm-id/) to produce a ranked list of structures for category 2. A Single-Energy CFM model was used, for which parameters were trained using non-peptide metabolite data from Metlin, as described in http://arxiv.org/abs/1312.0264 and stored in the supplementary data section of the above SourceForge project. Since the model expects a low, medium and high energy spectrum, whereas the challenge data (except challenge 16) has only one spectrum, we repeated the provided spectrum for all three energy levels. For challenge 16, we repeated the CE20 spectrum for low and medium and used the CE40 spectrum for high. All spectra were pre-processed: peaks below 1% intensity relative to the highest peak were removed. For category 1, the molecular formula was computed for each structure from category 2 and kept in the same order. The list was then processed to remove duplicate entries, keeping only the highest-ranked listing for each unique molecular formula. Submissions were made only for positive ion mode, since cfm-id does not currently support negative mode.
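The 1% intensity filter and the handling of the three energy levels can be sketched briefly. This is an illustration under assumed data structures, not the actual cfm-id input handling:

```python
# Sketch of the pre-processing described above (illustrative only;
# the peak representation and example spectrum are assumptions).

def preprocess(peaks, threshold=0.01):
    """Drop peaks below `threshold` of the base peak intensity."""
    base = max(intensity for _, intensity in peaks)
    return [(mz, i) for mz, i in peaks if i >= threshold * base]

spectrum = [(91.054, 10000.0), (120.081, 50.0), (65.039, 3000.0)]
filtered = preprocess(spectrum)  # drops the 0.5% peak at m/z 120.081

# The model expects three energy levels; with a single challenge
# spectrum, the same filtered peak list stands in for all three.
low, medium, high = filtered, filtered, filtered
```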
Details per Challenge and Participant. See the legend at the bottom for details.

The table is also available as CSV download. The individual submissions are also available for download.

participant | challenge | rank | tc | bc | wc | ec | rrp | p | wbc | wwc | wec | wrrp |
---|---|---|---|---|---|---|---|---|---|---|---|---|
an | challenge1 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
es | challenge1 | 2 | 11 | 0 | 9 | 2 | 0.95 | 0.14 | 0.00 | 0.72 | 0.14 | 0.86 |
fa | challenge1 | 1 | 34 | 0 | 33 | 1 | 1.00 | 0.46 | 0.00 | 0.54 | 0.00 | 1.00 |
kd | challenge1 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
lr | challenge1 | 1 | 5 | 0 | 4 | 1 | 1.00 | 0.36 | 0.00 | 0.64 | 0.00 | 1.00 |
an | challenge2 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
es | challenge2 | 1 | 2 | 0 | 1 | 1 | 1.00 | 0.55 | 0.00 | 0.45 | 0.00 | 1.00 |
fa | challenge2 | - | 5 | - | - | - | - | - | - | - | - | - |
kd | challenge2 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
lr | challenge2 | 1 | 2 | 0 | 1 | 1 | 1.00 | 0.53 | 0.00 | 0.47 | 0.00 | 1.00 |
an | challenge3 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
es | challenge3 | 1 | 3 | 0 | 2 | 1 | 1.00 | 0.60 | 0.00 | 0.40 | 0.00 | 1.00 |
fa | challenge3 | - | 4 | - | - | - | - | - | - | - | - | - |
kd | challenge3 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
lr | challenge3 | 1 | 5 | 0 | 4 | 1 | 1.00 | 0.62 | 0.00 | 0.38 | 0.00 | 1.00 |
an | challenge4 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
es | challenge4 | 1 | 4 | 0 | 3 | 1 | 1.00 | 0.47 | 0.00 | 0.53 | 0.00 | 1.00 |
fa | challenge4 | 1 | 13 | 0 | 12 | 1 | 1.00 | 0.44 | 0.00 | 0.56 | 0.00 | 1.00 |
kd | challenge4 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
lr | challenge4 | 1 | 2 | 0 | 1 | 1 | 1.00 | 0.72 | 0.00 | 0.28 | 0.00 | 1.00 |
an | challenge5 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
es | challenge5 | 1 | 9 | 0 | 8 | 1 | 1.00 | 0.18 | 0.00 | 0.82 | 0.00 | 1.00 |
fa | challenge5 | 3 | 37 | 2 | 34 | 1 | 0.94 | 0.11 | 0.23 | 0.66 | 0.00 | 0.77 |
kd | challenge5 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
lr | challenge5 | 1 | 4 | 0 | 3 | 1 | 1.00 | 0.33 | 0.00 | 0.67 | 0.00 | 1.00 |
an | challenge6 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
es | challenge6 | 8 | 144 | 7 | 136 | 1 | 0.95 | 0.01 | 0.06 | 0.94 | 0.00 | 0.94 |
kd | challenge6 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
lr | challenge6 | 1 | 2 | 0 | 1 | 1 | 1.00 | 0.67 | 0.00 | 0.33 | 0.00 | 1.00 |
an | challenge9 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
es | challenge9 | 1 | 6 | 0 | 5 | 1 | 1.00 | 0.19 | 0.00 | 0.81 | 0.00 | 1.00 |
fa | challenge9 | 1 | 21 | 0 | 20 | 1 | 1.00 | 0.16 | 0.00 | 0.84 | 0.00 | 1.00 |
kd | challenge9 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
lr | challenge9 | 1 | 15 | 0 | 14 | 1 | 1.00 | 0.11 | 0.00 | 0.89 | 0.00 | 1.00 |
an | challenge10 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
es | challenge10 | 45 | 283 | 44 | 238 | 1 | 0.84 | 0.00 | 0.17 | 0.83 | 0.00 | 0.83 |
fa | challenge10 | 1 | 10 | 0 | 9 | 1 | 1.00 | 0.57 | 0.00 | 0.43 | 0.00 | 1.00 |
kd | challenge10 | 1 | 18 | 0 | 17 | 1 | 1.00 | 0.50 | 0.00 | 0.50 | 0.00 | 1.00 |
lr | challenge10 | 1 | 7 | 0 | 6 | 1 | 1.00 | 0.27 | 0.00 | 0.73 | 0.00 | 1.00 |
an | challenge11 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
es | challenge11 | 4 | 41 | 3 | 37 | 1 | 0.93 | 0.03 | 0.11 | 0.86 | 0.00 | 0.89 |
kd | challenge11 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
an | challenge12 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
es | challenge12 | 9 | 27 | 8 | 18 | 1 | 0.69 | 0.05 | 0.45 | 0.50 | 0.00 | 0.55 |
kd | challenge12 | - | 1 | - | - | - | - | - | - | - | - | - |
an | challenge15 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
es | challenge15 | - | 177 | - | - | - | - | - | - | - | - | - |
kd | challenge15 | - | 11 | - | - | - | - | - | - | - | - | - |
an | challenge16 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
es | challenge16 | - | 6 | - | - | - | - | - | - | - | - | - |
fa | challenge16 | 2 | 51 | 1 | 49 | 1 | 0.98 | 0.04 | 0.04 | 0.92 | 0.00 | 0.96 |
kd | challenge16 | 1 | 1 | 0 | 0 | 1 | - | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |
Table legend:
- rank: Absolute rank of the correct solution
- tc: Total number of candidates
- bc: Number of candidates with a score better than the correct solution
- wc: Number of candidates with a score worse than the correct solution
- ec: Number of candidates with the same score as the correct solution
- rrp: Relative ranking position (1.0 is good, 0.0 is not)
- p: Score of the correct solution
- wbc: Sum of scores better than the correct solution
- wwc: Sum of scores worse than the correct solution
- wec: Sum of scores equal to the correct solution
- wrrp: RRP weighted by the scores (1 is good)
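The rrp values in the table are consistent with the relative ranking position rrp = 0.5 * (1 + (wc - bc) / (tc - 1)), which is undefined (shown as "-") when there is only one candidate. The sketch below reproduces two rows of the table; the formula is inferred from the table values rather than quoted from the evaluation code.

```python
# Relative ranking position, consistent with the rrp column above:
# rrp = 0.5 * (1 + (wc - bc) / (tc - 1)).
# Inferred from the table values, not quoted from the evaluation code.

def rrp(bc, wc, tc):
    """1.0 when all other candidates score worse than the correct
    solution, 0.0 when all score better; undefined for tc == 1."""
    if tc < 2:
        raise ValueError("undefined for a single candidate (shown as '-')")
    return 0.5 * (1 + (wc - bc) / (tc - 1))

print(round(rrp(bc=0, wc=9, tc=11), 2))      # 0.95 (es, challenge1)
print(round(rrp(bc=44, wc=238, tc=283), 2))  # 0.84 (es, challenge10)
```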