Critical Assessment of Small Molecule Identification

For each challenge, the rank of the winner(s) is highlighted in bold. If the submission did not contain the correct candidate this is denoted as "-". If someone did not participate in a challenge, nothing is shown.

	an	ds	es	fa	lr	tm
challenge1	1	1	9	12	1	-
challenge2	1	1	44	-	3	-
challenge3	-		21	-	17	-
challenge4	1	-	238	18	78	-
challenge5	1	1	4	9	2	-
challenge6	1	1	1		1	-
challenge7	1	1	17	23	1	-
challenge8	1	2	1	1	1	-
challenge9	1	1	1	2	1	-
challenge10	1	1	1	1	1	-
challenge11	2	6	21
challenge11(tautomer1)	3	5	1
challenge11(tautomer2)	1	-	22
challenge12	1	3	35
challenge13			12	24	42
challenge14	1	2	1	761	5
challenge15	1	1	-
challenge16	1	1	-	100

Participant information and abstracts

ParticipantID:        AN
Category:             category2
Authors:              Andrew Newsome and Dejan Nikolic
Affiliations:         University of Illinois at Chicago, College of Pharmacy, 
		      Department of Medicinal Chemistry & Pharmacognosy, 
                      Chicago Mass Spectrometry Laboratory
Automatic methods:    No

Abstract
Structure candidates were determined on a case by case basis using
manual methods. The manual methods used to arrive at structural candidates
typically involved a combination of monoisotopic fragment ion and neutral loss
formula analysis, database searching on molecular ion and fragment formulas and
monoisotopic masses, literature consultation, deductive reasoning, and tacit
knowledge and experience. The search databases most often employed were
Chemspider, SciFinder Scholar, Reaxys, and Google Scholar.

For ranking candidate structures, a subjective confidence scale from 0.60 to 1.00
was used. Structures were placed on the scale based upon how "confident" we felt
about the proposed structure from our overall assessment of the fit of the
candidates to the challenge data. The confidence scale ranking brackets are
defined as follows:

1.00: Full confidence that the single candidate is the correct structure. 
0.90 to 0.99: High confidence that candidate is the correct structure. 
0.80 to 0.89: Good confidence that candidate is the correct structure. 
0.70 to 0.79: Fair confidence that candidate is the correct structure. 
0.60 to 0.69: Poor confidence that candidate is the correct structure. 

Where several possible structural isomers existed that matched the challenge
data, isomers that were thought to be more likely were placed in a higher ranking
bracket. In cases where many other possible structures existed that could
potentially match the challenge data, we noted this in the respective abstract
and lowered the confidence score for the submission accordingly. Structures
placed in the same ranking bracket were regarded as equally likely. Structure
candidates were submitted for all challenges except for challenge 13.

ParticipantID:        FA
Category:             category1 and category2
Authors:              Allen, Felicity and Greiner, Russ
Affiliations:         Department of Computing Science
		      University of Alberta, Canada
Automatic methods:   yes

Abstract
A list of candidate structures was obtained by querying PubChem
for all structures within 10 ppm of the precursor mass (or with
the given molecular formula if this was provided). For cases where 
the precursor mass was not provided, this value was deduced manually 
by considering the MS2 and MS1 data. Where further specific 
information was provided, the candidate lists were filtered using 
that information (e.g. aromaticity, amide bonds).
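
A minimal sketch of the 10 ppm mass window described above. The function
names are hypothetical, and it assumes the candidate mass and the precursor
mass are already expressed on the same footing (the abstract does not say how
adducts were handled).

```python
from typing import Tuple


def ppm_window(precursor_mass: float, ppm: float = 10.0) -> Tuple[float, float]:
    """Return the (low, high) mass bounds for a +/- ppm tolerance."""
    delta = precursor_mass * ppm / 1e6
    return precursor_mass - delta, precursor_mass + delta


def within_ppm(candidate_mass: float, precursor_mass: float, ppm: float = 10.0) -> bool:
    """True if the candidate mass lies inside the ppm window."""
    low, high = ppm_window(precursor_mass, ppm)
    return low <= candidate_mass <= high
```

For a precursor mass of 200.0 the window is roughly 199.998 to 200.002, so a
candidate at 200.001 would be kept and one at 200.01 would be dropped.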

The candidate lists were then processed with the input spectra by
the program cfm-id (http://sourceforge.net/projects/cfm-id/) to
produce a ranked list of structures for Category 2. A Single-Energy 
CFM model was used, for which parameters were trained using 
non-peptide metabolite data from Metlin, as described in
http://arxiv.org/abs/1312.0264 and stored in the supplementary
data section of the above sourceforge project. Since the model
expects a low, medium and high energy spectrum, whereas the
challenge data (except 16) only has one spectrum, we repeated 
the provided spectrum for all three energy levels. For Challenge 16,
we repeated the CE20 spectrum for low and medium and used the
CE40 spectrum for high. All spectra were pre-processed: peaks below 
1% intensity relative to the highest peak were removed.
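
The pre-processing and the repetition of spectra across energy levels could
look roughly like the sketch below. It is not taken from cfm-id; the function
names and the peak representation as (m/z, intensity) pairs are assumptions
made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def filter_peaks(peaks: List[Peak], min_rel_intensity: float = 0.01) -> List[Peak]:
    """Drop peaks below a fraction of the most intense peak (1% by default)."""
    if not peaks:
        return []
    base = max(intensity for _, intensity in peaks)
    return [(mz, i) for mz, i in peaks if i >= min_rel_intensity * base]


def build_energy_levels(
    low: List[Peak],
    medium: Optional[List[Peak]] = None,
    high: Optional[List[Peak]] = None,
) -> Dict[str, List[Peak]]:
    """Fill the three expected energy levels, reusing `low` where none is given."""
    return {
        "low": low,
        "medium": medium if medium is not None else low,
        "high": high if high is not None else low,
    }
```

With a single spectrum, `build_energy_levels(spectrum)` repeats it three
times. For Challenge 16, `build_energy_levels(ce20, high=ce40)` reuses the
CE20 spectrum for low and medium and takes CE40 for high.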

For Category 1, the molecular formula was computed for each structure
from Category 2 and kept in the same order. The list was then processed
to remove duplicate entries, keeping only the highest ranked listing 
for each unique molecular formula.
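
The de-duplication step for Category 1 amounts to an order-preserving
"keep first occurrence" filter. A short sketch, with a hypothetical function
name and formulas represented as plain strings:

```python
from typing import List


def unique_formulas(ranked_formulas: List[str]) -> List[str]:
    """Keep only the first (highest ranked) occurrence of each formula."""
    seen = set()
    result = []
    for formula in ranked_formulas:
        if formula not in seen:
            seen.add(formula)
            result.append(formula)
    return result
```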

Submission is only made for positive ion mode, since cfm-id does not
currently support negative mode.

ParticipantID:		ES
Category:		category2
Authors: 		Emma Schymanski(1), Michael Gerlich(2), Christoph Ruttkies(2), 
                        Steffen Neumann(2)
Affiliations: 		(1) Eawag: Swiss Federal Institute of Aquatic Science and 
			Technology, Überlandstrasse 133, CH-8600 Dübendorf, Switzerland 
			(2) IPB: Leibniz Institute of Plant Biochemistry, 
			Department of Stress and Development Biology, Weinberg 3, 
			DE-06120 Halle (Saale), Germany.
Automatic pipeline:	yes
Abstract:

Category 2 challenges were processed using MetFrag and MetFusion using compound
database queries. Three databases (PubChem, ChemSpider and KEGG) were queried and
the results were merged to create one candidate list, taking the maximum score
for entries in more than one database. MetFusion also used massbank.jp to
retrieve spectral information, with default parameters. Information from the
Analytical Methods files and Category 1 results (ES) were used to adjust
input parameters for the automatic pipeline.
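
Merging the per-database results by maximum score can be sketched as follows.
This is an illustration only, not the MetFrag or MetFusion code. It assumes
each database result is a mapping from some shared structure identifier to a
score; the identifier scheme is not stated in the abstract.

```python
from typing import Dict, List, Tuple


def merge_candidates(*database_results: Dict[str, float]) -> List[Tuple[str, float]]:
    """Merge candidate lists, keeping the maximum score for repeated entries."""
    merged: Dict[str, float] = {}
    for results in database_results:
        for candidate_id, score in results.items():
            if candidate_id not in merged or score > merged[candidate_id]:
                merged[candidate_id] = score
    # Highest score first
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)
```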

The formula was used for candidate retrieval where this was given or clear from
Category 1, otherwise the exact mass was used. We enabled the element filter CHNOPS 
unless we obtained high scoring non-CHNOPS candidates without it. The exact mass 
database retrieval and fragmentation parameters were adjusted according to the expected 
or quoted instrument accuracy. Manual checking of the automatic calculations 
was performed to detect any anomalies.

Where the MetFusion scores were poor (few or no matching spectra from MassBank),
MetFrag results were submitted.

ParticipantID:		DS
Category:		2
Author:			Daniel L. Sweeney
Affiliation: 	       	MathSpec, Inc., Arlington Heights, IL
Automated Process:     	yes
Spectral Libraries:    	No

Abstract:

Used Rational Numbers Search software and searched the mass spectral
peak lists against a database of approximately 200000 molecules that
had been partitioned for rapid mass spectral searching.  The InChI
structures were copied from the PubChem entry for the corresponding
compound.

In challenges where the precursor ion was not present in the MS/MS
data file, the precursor ion was copied from the MS data file.  All
ions greater in mass than the precursor ion, if present in the MS/MS
data file, were removed prior to the analysis.  The isotope data was
not used.
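
The peak list preparation described above might be sketched like this. It is
not the Rational Numbers Search code; the function name and the matching
tolerance `tol` are assumptions, since the abstract gives no tolerance.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def prepare_msms(
    msms_peaks: List[Peak],
    precursor_mz: float,
    precursor_intensity: float,
    tol: float = 0.01,  # assumed m/z tolerance, not given in the abstract
) -> List[Peak]:
    """Remove ions above the precursor and add the precursor if it is missing."""
    kept = [(mz, i) for mz, i in msms_peaks if mz <= precursor_mz + tol]
    has_precursor = any(abs(mz - precursor_mz) <= tol for mz, _ in kept)
    if not has_precursor:
        # Precursor taken from the MS (not MS/MS) data file
        kept.append((precursor_mz, precursor_intensity))
    return sorted(kept)
```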

Challenges 7 and 8 were done manually.  Challenge 10 was done manually
with the aid of an Excel Add-In.

Attempts were made to identify all sixteen compounds, but no
possibilities were found for challenges 3 or 13.

ParticipantID:	TM
Category:	category2
Authors:	Miyazaki, Tsubasa and Hisayuki, Horai
Affiliations:	Ibaraki National College of Technology, 
                Dept. of Electronic and Computer Engineering, Hitachinaka, Japan
Automatic methods:	no

Abstract
CH1:
1. We mapped spectra in MassBank and the challenge spectrum data SP to a vector
space and found a compound C whose spectrum SP1 is the most similar to SP.
2. Because the distance between spectra of SP and SP1 is relatively small and the
highest m/z values in SP1 and SP are equivalent, C was decided to be the result of
the challenge.

CH5:
1. We mapped spectra in MassBank and the challenge spectrum data SP to a vector
space and found a compound C whose spectrum SP1 is the most similar to SP.
2. Because the distance between spectra of SP and SP1 is relatively small and the
highest m/z value in SP is less than that of SP1, the result of the
challenge is decided to be a substructure of C.

CH3, CH6:
1. We mapped spectra in MassBank and the challenge spectrum data SP to a vector
space and found a compound C whose spectrum SP1 is the most similar to SP.
2. Because the distance between spectra of SP and SP1 is relatively small and the
highest m/z value in SP is greater than that of SP1, C is decided to be a
substructure of the result of the challenge.

CH2, CH4, CH7, CH8, CH9, CH10:
1. We mapped spectra in MassBank and the challenge spectrum data SP to a vector
space and found a compound whose spectrum SP1 is the most similar to SP.
2. Because the distance between spectra of SP and SP1 is quite large, we chose
important peaks P and searched for spectra which have peaks whose m/z values are
equal to those of P.  The choice of P is based on the intuition of the analyst.
3. The score indicates the confidence of the analyst.
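
The decision procedure above can be sketched as follows. The abstract does not
say how spectra were mapped to vectors, which distance was used, or what
counts as "relatively small", so the unit-width m/z binning, the Euclidean
distance, the `threshold` and `mz_tol` values, and all function names are
assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def to_vector(peaks: List[Peak], max_mz: int = 1000) -> List[float]:
    """Map a peak list onto unit-width m/z bins and scale to unit length."""
    vec = [0.0] * (max_mz + 1)
    for mz, intensity in peaks:
        bin_index = int(round(mz))
        if 0 <= bin_index <= max_mz:
            vec[bin_index] += intensity
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two spectrum vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query: List[Peak], library: Dict[str, List[Peak]]) -> Tuple[str, float]:
    """Return the library compound whose spectrum is closest to the query."""
    q = to_vector(query)
    name = min(library, key=lambda n: distance(q, to_vector(library[n])))
    return name, distance(q, to_vector(library[name]))


def interpret(dist: float, max_mz_query: float, max_mz_hit: float,
              threshold: float = 0.5, mz_tol: float = 0.5) -> str:
    """Apply the four cases of the abstract (threshold values are assumed)."""
    if dist > threshold:
        return "manual peak search"          # CH2, CH4, CH7, CH8, CH9, CH10
    if abs(max_mz_query - max_mz_hit) <= mz_tol:
        return "same compound"               # CH1
    if max_mz_query < max_mz_hit:
        return "substructure of hit"         # CH5
    return "hit is a substructure"           # CH3, CH6
```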

ParticipantID:        LR
Category:             category2
Authors:              Ridder, Lars(1) and Hooft, Justin J.J. van der(2)
Affiliations:         (1) Wageningen University, Laboratory of Biochemistry,
                      Wageningen, The Netherlands
                      (2) University of Glasgow, College of Medical, 
                      Veterinary, and Life Sciences, United Kingdom
Automatic methods:    yes

Abstract
The challenge peak lists were converted to MAGMa input files, and processed
with MAGMa using candidate molecules from PubChem (Ridder et al. 2012,
Ridder et al. 2013).
This method is available here: http://www.emetabolomics.org/magma. It does
not make use of searches in spectral libraries.
By default MAGMa is restricted to candidate molecules from PubChem <1200 Da
and consisting of the elements C,H,N,O,P and S. 
For challenges 7, 8 and 9 candidate molecules were retrieved from PubChem 
outside the default restrictions. Candidates for 7 and 8 were >1200 Da,
and challenge 9 was recognised to contain chlorine atoms, based on the isotope
pattern, so candidate molecules with halogens were included.
The reported score represents the "refined ranking" as described in Ridder
et al. (2013). For challenges 1, 2 and 14 the large numbers of PubChem candidates
obtained initially were reduced based on a threshold of 5 on the number of 
related PubChem references. No submissions are made for challenges 11, 12, 15 and 
16, for which none of the retrieved PubChem candidates (based on default restrictions) 
provided a satisfactory match in MAGMa between fragment ions and in silico 
substructures.
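
The candidate restrictions described in this abstract can be illustrated with
the sketch below. It is not MAGMa code. The function names, the candidate
record layout, and the reading of the reference threshold as "at least 5" are
assumptions.

```python
import re
from typing import Dict, FrozenSet, List, Set

DEFAULT_ELEMENTS: FrozenSet[str] = frozenset({"C", "H", "N", "O", "P", "S"})


def elements(formula: str) -> Set[str]:
    """Extract the element symbols from a simple molecular formula string."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def passes_restrictions(formula: str, mass: float, max_mass: float = 1200.0,
                        allowed: FrozenSet[str] = DEFAULT_ELEMENTS) -> bool:
    """Default restriction: below 1200 Da and only C, H, N, O, P, S."""
    return mass < max_mass and elements(formula) <= allowed


def filter_by_references(candidates: List[Dict], min_refs: int = 5) -> List[Dict]:
    """Keep candidates with at least `min_refs` related references (assumed >=)."""
    return [c for c in candidates if c["refs"] >= min_refs]
```

Relaxing the restrictions, as was done for challenge 9, corresponds to passing
a larger `allowed` set, for example `DEFAULT_ELEMENTS | {"Cl"}`.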

References:
Ridder, L.; van der Hooft, J. J. J.; Verhoeven, S.; de Vos, R. C. H.; van Schaik, R.;
Vervoort, J. (2012) Rapid Commun. Mass Spectrom. 26, 2461-2471.
Ridder, L.; van der Hooft, J. J. J.; Verhoeven, S.; de Vos, R. C. H.; Bino, R. J.;
Vervoort, J. (2013) Anal. Chem. 85, 6033-6040.

Details per Challenge and Participant. See legend at bottom for more details

The details table is also available as CSV download. The individual submissions are also available for download.

participant	challenge	rank	tc	bc	wc	ec	rrp	p	wbc	wwc	wec	wrrp
an	challenge1	1	2	0	1	1	1.00	0.60	0.00	0.40	0.00	1.00
ds	challenge1	1	1	0	0	1	-	1.00	0.00	0.00	0.00	1.00
es	challenge1	9	5631	8	5622	1	1.00	0.00	0.00	1.00	0.00	1.00
fa	challenge1	12	6767	9	6755	3	1.00	0.01	0.06	0.92	0.01	0.92
lr	challenge1	1	1084	0	1083	1	1.00	0.00	0.00	1.00	0.00	1.00
tm	challenge1	-	1	-	-	-	-	-	-	-	-	-
an	challenge2	1	2	0	1	1	1.00	0.60	0.00	0.40	0.00	1.00
ds	challenge2	1	1	0	0	1	-	1.00	0.00	0.00	0.00	1.00
es	challenge2	44	12702	43	12658	1	1.00	0.00	0.00	1.00	0.00	1.00
fa	challenge2	-	131	-	-	-	-	-	-	-	-	-
lr	challenge2	3	631	2	628	1	1.00	0.00	0.01	0.99	0.00	0.99
tm	challenge2	-	1	-	-	-	-	-	-	-	-	-
an	challenge3	-	1	-	-	-	-	-	-	-	-	-
es	challenge3	21	335	0	314	21	0.97	0.00	0.00	0.90	0.10	0.90
fa	challenge3	-	18	-	-	-	-	-	-	-	-	-
lr	challenge3	17	370	2	353	15	0.98	0.01	0.01	0.91	0.07	0.92
tm	challenge3	-	1	-	-	-	-	-	-	-	-	-
an	challenge4	1	1	0	0	1	-	1.00	0.00	0.00	0.00	1.00
ds	challenge4	-	10	-	-	-	-	-	-	-	-	-
es	challenge4	238	721	236	483	2	0.67	0.00	0.37	0.63	0.00	0.63
fa	challenge4	18	1622	16	1604	2	0.99	0.00	0.04	0.95	0.00	0.96
lr	challenge4	78	825	77	747	1	0.91	0.00	0.18	0.82	0.00	0.82
tm	challenge4	-	1	-	-	-	-	-	-	-	-	-
an	challenge5	1	3	0	2	1	1.00	0.42	0.00	0.58	0.00	1.00
ds	challenge5	1	2	0	1	1	1.00	0.60	0.00	0.40	0.00	1.00
es	challenge5	4	366	3	362	1	0.99	0.00	0.01	0.98	0.00	0.99
fa	challenge5	9	2725	8	2716	1	1.00	0.00	0.03	0.97	0.00	0.97
lr	challenge5	2	350	1	348	1	1.00	0.01	0.01	0.99	0.00	0.99
tm	challenge5	-	1	-	-	-	-	-	-	-	-	-
an	challenge6	1	1	0	0	1	-	1.00	0.00	0.00	0.00	1.00
ds	challenge6	1	1	0	0	1	-	1.00	0.00	0.00	0.00	1.00
es	challenge6	1	6	0	5	1	1.00	0.72	0.00	0.28	0.00	1.00
lr	challenge6	1	2	0	1	1	1.00	0.67	0.00	0.33	0.00	1.00
tm	challenge6	-	1	-	-	-	-	-	-	-	-	-
an	challenge7	1	7	0	6	1	1.00	0.15	0.00	0.85	0.00	1.00
ds	challenge7	1	1	0	0	1	-	1.00	0.00	0.00	0.00	1.00
es	challenge7	17	17	0	0	17	0.50	0.06	0.00	0.00	0.94	0.06
fa	challenge7	23	24	14	1	9	0.22	0.04	0.62	0.01	0.33	0.05
lr	challenge7	1	17	0	16	1	1.00	0.11	0.00	0.89	0.00	1.00
tm	challenge7	-	1	-	-	-	-	-	-	-	-	-
an	challenge8	1	3	0	2	1	1.00	0.42	0.00	0.58	0.00	1.00
ds	challenge8	2	2	0	0	2	0.50	0.50	0.00	0.00	0.50	0.50
es	challenge8	1	1	0	0	1	-	1.00	0.00	0.00	0.00	1.00
fa	challenge8	1	1	0	0	1	-	1.00	0.00	0.00	0.00	1.00
lr	challenge8	1	1	0	0	1	-	1.00	0.00	0.00	0.00	1.00
tm	challenge8	-	2	-	-	-	-	-	-	-	-	-
an	challenge9	1	6	0	5	1	1.00	0.23	0.00	0.77	0.00	1.00
ds	challenge9	1	1	0	0	1	-	1.00	0.00	0.00	0.00	1.00
es	challenge9	1	4	0	3	1	1.00	0.43	0.00	0.57	0.00	1.00
fa	challenge9	2	150	1	148	1	0.99	0.02	0.03	0.95	0.00	0.97
lr	challenge9	1	113	0	112	1	1.00	0.02	0.00	0.98	0.00	1.00
tm	challenge9	-	1	-	-	-	-	-	-	-	-	-
an	challenge10	1	2	0	1	1	1.00	0.56	0.00	0.44	0.00	1.00
ds	challenge10	1	1	0	0	1	-	1.00	0.00	0.00	0.00	1.00
es	challenge10	1	9	0	8	1	1.00	0.13	0.00	0.87	0.00	1.00
fa	challenge10	1	20	0	19	1	1.00	0.33	0.00	0.67	0.00	1.00
lr	challenge10	1	20	0	19	1	1.00	0.10	0.00	0.90	0.00	1.00
tm	challenge10	-	3	-	-	-	-	-	-	-	-	-
an	challenge11	2	3	1	1	1	0.50	0.36	0.36	0.28	0.00	0.64
ds	challenge11	6	17	5	11	1	0.69	0.06	0.31	0.63	0.00	0.69
es	challenge11	21	2392	20	2371	1	0.99	0.00	0.01	0.98	0.00	0.99
an	challenge11(tautomer1)	3	3	2	0	1	0.00	0.28	0.72	0.00	0.00	0.28
ds	challenge11(tautomer1)	5	17	4	12	1	0.75	0.06	0.25	0.69	0.00	0.75
es	challenge11(tautomer1)	1	2392	0	2391	1	1.00	0.00	0.00	1.00	0.00	1.00
an	challenge11(tautomer2)	1	3	0	2	1	1.00	0.36	0.00	0.64	0.00	1.00
ds	challenge11(tautomer2)	-	17	-	-	-	-	-	-	-	-	-
es	challenge11(tautomer2)	22	2392	21	2370	1	0.99	0.00	0.02	0.98	0.00	0.98
an	challenge12	1	3	0	2	1	1.00	0.38	0.00	0.62	0.00	1.00
ds	challenge12	3	21	2	18	1	0.90	0.05	0.10	0.85	0.00	0.90
es	challenge12	35	902	34	867	1	0.96	0.00	0.06	0.93	0.00	0.94
es	challenge13	12	227	11	215	1	0.95	0.01	0.07	0.93	0.00	0.93
fa	challenge13	24	284	18	260	6	0.93	0.01	0.17	0.80	0.03	0.80
lr	challenge13	42	206	41	164	1	0.80	0.01	0.36	0.63	0.00	0.64
an	challenge14	1	1	0	0	1	-	1.00	0.00	0.00	0.00	1.00
ds	challenge14	2	5	1	3	1	0.75	0.21	0.21	0.58	0.00	0.79
es	challenge14	1	8219	0	8218	1	1.00	0.00	0.00	1.00	0.00	1.00
fa	challenge14	761	9708	732	8947	29	0.92	0.00	0.16	0.83	0.01	0.83
lr	challenge14	5	1583	4	1578	1	1.00	0.00	0.01	0.99	0.00	0.99
an	challenge15	1	3	0	2	1	1.00	0.43	0.00	0.57	0.00	1.00
ds	challenge15	1	4	0	3	1	1.00	0.33	0.00	0.67	0.00	1.00
es	challenge15	-	6	-	-	-	-	-	-	-	-	-
an	challenge16	1	1	0	0	1	-	1.00	0.00	0.00	0.00	1.00
ds	challenge16	1	4	0	3	1	1.00	0.28	0.00	0.72	0.00	1.00
es	challenge16	-	3976	-	-	-	-	-	-	-	-	-
fa	challenge16	100	10637	97	10537	3	0.99	0.00	0.02	0.98	0.00	0.98

Table legend:

rank: Absolute rank of correct solution
tc: Total number of candidates
bc: Number of candidates with a score better than correct solution
wc: Number of candidates with a score worse than correct solution
ec: Number of candidates with same score as the correct solution
rrp: Relative ranking position (1.0 is good, 0.0 is not)
p: Score of correct solution
wbc: Sum of scores better than correct solution
wwc: Sum of scores worse than correct solution
wec: Sum of scores equal to correct solution
wrrp: RRP weighted by the scores (1 is good)
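
The relationships between these columns can be sketched as below. The
formulas are inferred from the rows of the details table, not taken from the
organisers' evaluation code, and the function names are hypothetical. In the
table, wec appears to exclude the correct solution itself (it is 0.00 whenever
ec is 1).

```python
from typing import Optional


def rank(bc: int, ec: int) -> int:
    """Rank of the correct solution, counting ties against it."""
    return bc + ec


def rrp(bc: int, wc: int, tc: int) -> Optional[float]:
    """Relative ranking position; undefined (shown as "-") for one candidate."""
    if tc <= 1:
        return None
    return 0.5 * (1 + (wc - bc) / (tc - 1))


def wrrp(wbc: float, wec: float) -> float:
    """Score-weighted RRP, assuming the scores are normalised to sum to 1."""
    return 1 - wbc - wec
```

For example, the row "es challenge4" has bc=236, wc=483 and tc=721, which
gives an RRP of about 0.67, and the row "fa challenge7" has bc=14 and ec=9,
which gives rank 23, both matching the table.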