Critical Assessment of Small Molecule Identification

News

March 29th, 2017
The CASMI 2016 Cat 2+3 paper is out!

Jan 20th, 2017
Organisation of CASMI 2017 is underway, stay tuned!

Dec 4th, 2016
The MS1 peak lists for Category 2+3 have been added for completeness.

May 6th, 2016
The winners and full results are available.

April 25th, 2016
The solutions are public now.

April 18th, 2016
The contest is now closed. The results are fantastic and will be made public soon!

April 9th, 2016
All teams who submit before the April 11th deadline will be allowed to update their submission until Friday, April 15th.

February 12th, 2016
New categories 2 and 3 and data for automatic methods released. 10 new challenges in category 1.

January 25th, 2016
E. Schymanski and S. Neumann joined the organising team, additional contest data coming soon.

January 11th, 2016
New CASMI 2016 raw data files are available.

Results in Category 2

Summary of Challenge wins

	Vaniya	Duehrkop	Verdegem	Allen	Brouard
Gold	70	82	44	63	86
Silver	26	21	53	71	50
Bronze	35	11	65	40	31
Gold (neg)	33	0	24	26	20
Gold (pos)	37	82	20	37	66

Summary statistics per participant

	Mean rank	Median rank	Top	Top3	Top10	Mean RRP	Median RRP
Vaniya	19.75	3.0	46	79	101	0.804	0.922
Duehrkop	25.17	1.0	70	90	100	0.945	1.000
Verdegem	70.79	9.8	24	59	105	0.880	0.972
Allen	47.98	6.0	39	77	123	0.906	0.987
Brouard	127.34	5.2	62	93	118	0.874	0.988

Summary of Rank by Challenge and Participant

For each challenge, the rank of the winner(s) is highlighted in bold. If a submission did not contain the correct candidate, this is denoted as "-". If a participant did not take part in a challenge, the cell is left empty.

This summary is also available as CSV download.

	Vaniya	Duehrkop	Verdegem	Allen	Brouard
challenge-001	29.5		353.0	27.5	21.5
challenge-002	-		5.0	5.0	5.5
challenge-003	7.5		27.0	7.0	4.5
challenge-004	21.5		8.5	8.0	7.0
challenge-005	2.0		3.5	112.0	383.0
challenge-006	40.5		86.0	63.0	75.0
challenge-007	2.0		4.0	3.0	266.0
challenge-008	1.5		2.5	2.0	2.0
challenge-009	-		1.0	1.0	55.0
challenge-010	4.5		3.5	6.0	7.0
challenge-011	1.0		14.5	8.0	3753.0
challenge-012	15.0		28.5	47.5	2530.0
challenge-013			1.0	1.0	40.0
challenge-014	-		35.5	19.5	22.0
challenge-015	3.0		73.0	146.0	39.0
challenge-016	101.0		1.5	2.0	72.0
challenge-017	-		95.5	82.0	58.0
challenge-018	1.0		3.0	1.0	1.0
challenge-019	-		21.5	3.0	341.0
challenge-020	71.0		10.5	70.0	1.0
challenge-021	-		2.0	32.0	1217.0
challenge-022	4.5		8.0	4.5	1.0
challenge-023	2.0		6.0	7.5	917.0
challenge-024	2.5		70.5	27.0	183.0
challenge-025	8.5		5.0	7.0	65.0
challenge-026	2.5		75.5	1.5	1.0
challenge-027	-		109.5	81.5	31.0
challenge-028	26.5		14.0	14.0	1.0
challenge-029	4.0		3.5	9.0	3.0
challenge-030	19.0		139.5	2.0	81.0
challenge-031	-		9.5	6.5	3.0
challenge-032	68.5		3.0	42.0	78.0
challenge-033	-		6.0	49.5	1.0
challenge-034	1.0		1.5	1.0	6.0
challenge-035	23.5		14.5	12.5	5.0
challenge-036	8.0		1.0	1170.5	972.0
challenge-037	6.5		6.5	64.0	68.0
challenge-038	3.5		2.5	3.5	29.0
challenge-039	-		240.5	205.0	8.0
challenge-040	-		6.5	33.5	39.0
challenge-041	1.0		139.0	424.0	1.0
challenge-042	6.5		5.0	6.5	1.0
challenge-043	-		188.5	12.0	20.0
challenge-044	2.5		1.5	3.0	19.0
challenge-045	-		74.5	14.0	16.0
challenge-046	1.5		62.0	29.0	44.0
challenge-047	1.0		3.5	136.0	216.0
challenge-048	2.0		2.0	3.0	5.0
challenge-049	12.5		13.5	11.5	129.0
challenge-050	-		3.5	3.0	234.0
challenge-051	1.0		79.0	159.5	36.0
challenge-052	-		48.5	103.5	160.0
challenge-053	1.0		61.0	308.5	2014.0
challenge-054	3.0		50.0	17.0	17.0
challenge-055	1.0		11.5	4.0	21.0
challenge-056	-		84.0	5.0	14.0
challenge-057	22.5		1.5	1.0	81.0
challenge-058	1.0		1.0	11.0	5.5
challenge-059	-		2.0	2.0	4.0
challenge-060	3.0		44.5	69.0	95.0
challenge-061			2.0	21.0	319.0
challenge-062	1.0		66.5	76.0	605.0
challenge-063	1.0		1.0	1.0	20.0
challenge-064	2.0		3.0	23.0	12.0
challenge-065	-		3.5	3.5	134.0
challenge-066	17.0		23.5	4.5	14.0
challenge-067	-		17.0	1.0	5.0
challenge-068	2.0		1.5	1.0	3.0
challenge-069	5.0		84.5	21.5	101.0
challenge-070	-		3.0	2.5	367.0
challenge-071	2.0		3.0	3.0	2.0
challenge-072			1.0	1.0	70.0
challenge-073	1.0		1.0	1.0	1.0
challenge-074	1.0		1.0	1.0	90.0
challenge-075	4.5		9.0	4.0	3.0
challenge-076	1.5		17.0	4.0	57.0
challenge-077	4.5		39.0	63.0	36.0
challenge-078			16.0	7.0	112.0
challenge-079	1.0		7.5	1.0	6.0
challenge-080			1.5	2.0	28.0
challenge-081	4.0		5.5	8.5	6.0
challenge-082	17.0	1.0	4.0	1.0	1.0
challenge-083	147.0	3.0	3.5	16.0	33.0
challenge-084	11.0	14.0	48.0	17.0	63.0
challenge-085	49.0	1.0	53.0	89.0	16.0
challenge-086	76.5	1.0	53.0	72.0	1.0
challenge-087	34.5	10.0	87.0	35.5	1.0
challenge-088	41.0	1.0	50.0	65.0	1.0
challenge-089	131.5	1.0	28.0	68.0	1.0
challenge-090	12.5	3.0	12.5	38.5	6.0
challenge-091	10.0	11.0	89.5	6.5	1.0
challenge-092	-	1.0	629.0	2.0	1.0
challenge-093	79.0	1.0	13.5	22.0	26.0
challenge-094	-	81.0	1.0	1.0	85.0
challenge-095	106.0	1.0	4.0	1.0	76.0
challenge-096	2.5	1.0	2.0	2.0	1.0
challenge-097	1.0	11.0	32.0	257.5	71.0
challenge-098	1.0	1.5	48.0	2.5	1.5
challenge-099	1.0	1.0	138.5	15.0	1.0
challenge-100	-	1.0	8.5	15.5	1.0
challenge-101	9.0	5.0	14.0	2.0	4.0
challenge-102	184.0	22.0	116.0	212.0	31.0
challenge-103	238.5	1.0	158.0	5.0	1.0
challenge-104	4.0	1.0	7.0	6.0	1.0
challenge-105	1.0	1.5	7.5	3.5	128.5
challenge-106		7.0	1.0	1.0	3.0
challenge-107	44.5	2.0	41.5	2.0	2.0
challenge-108	27.0	1.0	1.0	2.0	1.0
challenge-109	3.0	1.5	9.0	3.0	1.5
challenge-110	1.0	1.0	1281.0	124.5	1.0
challenge-111	1.0	1.0	1.0	2.0	1.0
challenge-112	-	2.0	2.0	6.0	4.0
challenge-113	35.5	1.0	3.5	35.0	1.0
challenge-114	11.0	1.0	9.0	20.0	1.0
challenge-115	1.0	1.0	5.5	3.0	1.0
challenge-116	49.5	2.0	31.0	1.5	2.0
challenge-117	1.0	1.0	1.5	40.0	1.0
challenge-118	-	1.0	11.0	5.0	3.0
challenge-119	-	94.0	134.5	125.0	131.0
challenge-120		77.0	66.0	614.0	9.0
challenge-121	46.0	34.0	3.0	6.0	136.0
challenge-122	2.5	1.0	4.0	12.0	46.0
challenge-123	1.5	1.0	1.5	1.0	1.0
challenge-124	3.0	1.0	6.5	2.0	1.0
challenge-125	117.0	24.0	156.0	123.5	4.0
challenge-126	9.0	195.0	87.0	18.0	2.0
challenge-127	21.0	4.0	43.0	65.0	1.0
challenge-128	20.0	1.0	66.0	6.0	1.0
challenge-129	139.0	3.0	13.5	6.0	2.0
challenge-130	1.0	1.0	6.5	52.5	1.0
challenge-131	151.5	966.0	64.0	39.5	990.0
challenge-132	1.0	1.0	3.5	1.0	1.0
challenge-133		1.0	1.0	1.0	1.0
challenge-134	6.5	4.0	2.5	3.0	30.0
challenge-135	-	17.0	31.0	1.0	3.0
challenge-136	15.5	9.0	3.5	3.0	2.0
challenge-137	1.0	1.0	2.0	177.5	1.0
challenge-138	1.0	1.0	1.0	1.0	15.0
challenge-139	1.0	1.0	1.0	1.0	66.0
challenge-140	-	1.0	8.5	6.0	1.0
challenge-141	-	1.0	14.0	186.0	2.0
challenge-142	1.0	1.0	65.0	2.0	2.0
challenge-143	1.0	1.0	525.0	13.0	1.0
challenge-144	1.5	1.0	144.0	88.0	230.0
challenge-145	1.0	1.0	15.0	1.0	3.0
challenge-146	-	1.0	3.0	2.0	77.0
challenge-147	-	2.0	3.5	4.0	1.0
challenge-148	1.0	1.0	3.0	2.0	1.0
challenge-149	1.0	6.0	2.5	5.0	96.0
challenge-150	1.0	1.0	2.0	3.0	1.0
challenge-151	1.0	1.5	25.5	40.0	1.5
challenge-152	-	1.0	265.0	173.0	2075.0
challenge-153	-	1.0	9.0	2.0	1.0
challenge-154	-	11.0	12.0	3.0	54.0
challenge-155	9.0	1.0	252.0	27.0	1.0
challenge-156	-	1.0	1.0	1.0	1.0
challenge-157	36.0	268.0	8.5	143.5	32.0
challenge-158	2.0	1.0	1.0	1.0	1.0
challenge-159	2.0	506.0	16.0	2.0	61.0
challenge-160	33.0	1.0	68.0	121.0	2.0
challenge-161	-	1.0	193.0	21.0	1.0
challenge-162	12.0	11.0	208.0	53.0	14.0
challenge-163	6.0	55.0	227.0	135.0	26.0
challenge-164	2.0	1.0	1.0	1.0	1.0
challenge-165	1.0	1.0	168.0	29.0	1.0
challenge-166	-	1.0	102.0	72.5	1.0
challenge-167	-	1.0	205.0	1.0	3.0
challenge-168	13.5	2.0	335.5	120.0	3.0
challenge-169	1.0	3.0	1.0	1.0	3.0
challenge-170	-	3.0	33.0	4.5	1.0
challenge-171	2.0	7.0	8.5	24.0	7.0
challenge-172	11.0	1.0	186.0	64.0	1.0
challenge-173	40.0	1.0	20.5	88.0	4.0
challenge-174	-	3.0	244.0	10.0	2.0
challenge-175	15.5	44.0	136.0	5.5	8.0
challenge-176	1.0	1.0	1.5	1.0	1.0
challenge-177	1.0	1.0	28.0	213.5	24.0
challenge-178	72.5	1.0	1809.5	615.5	3101.0
challenge-179	3.0	20.0	22.5	1.0	14.0
challenge-180	19.5	44.0	186.5	4.5	6.0
challenge-181	1.0	41.0	7.0	6.0	11.0
challenge-182	-	1.5	2.0	9.0	1.0
challenge-183	6.0	33.0	217.0	9.0	40.0
challenge-184	1.0	1.0	270.0	32.0	1.0
challenge-185	-	1.0	11.5	4.0	1.0
challenge-186	1.0	1.0	2.0	1.0	3.0
challenge-187	1.0	1.0	1.0	1.0	23.0
challenge-188	2.0	1.0	81.0	1.0	1.0
challenge-189	-	-	1.0	10.0	682.0
challenge-190	1.0	1.0	3.0	2.0	1.0
challenge-191	-	2.0	103.5	4.0	2.0
challenge-192	1.0	1.0	5.5	1.0	1.0
challenge-193	3.0	3.0	6.0	1.0	2.0
challenge-194	1.5	1.0	2.5	3.0	3.0
challenge-195	1.0	1.0	1.0	1.0	1.0
challenge-196	4.5	297.0	3.5	3.0	300.0
challenge-197	-	34.0	845.5	13.5	8.0
challenge-198	1.5	1.0	9.5	6.0	4.0
challenge-199	94.5	9.0	280.5	1.0	131.0
challenge-200	-	56.0	21.5	7.0	73.0
challenge-201	-	1.0	2.5	2.5	1.0
challenge-202	-	1.0	505.0	1090.0	758.0
challenge-203		1.0	1.0	1.0	1.0
challenge-204	-	6.0	233.5	6.5	5.0
challenge-205	2.0	1.0	10.0	10.0	3.0
challenge-206	1.0	2.0	1.0	1.0	1.0
challenge-207	88.0	25.0	146.0	39.0	25.0
challenge-208	2.0	1.5	2.0	2.0	1.5
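
As a rough illustration of how the rank table above relates to the per-participant summary statistics, the following Python sketch parses one column and derives the mean rank, median rank and top-k counts. It assumes that "-" and empty cells are both excluded from the statistics and that a challenge counts towards "Top k" when its rank is at most k. Neither rule is stated on this page, so the published figures may have been computed differently, for example for tied ranks such as 1.5. The function names are illustrative only.

```python
from statistics import mean, median

def parse_rank(cell):
    """Return the rank as a float, or None for '-' (correct candidate
    missing) and '' (no submission for this challenge)."""
    cell = cell.strip()
    if cell in ("", "-"):
        return None
    return float(cell)

def summarise(ranks, ks=(1, 3, 10)):
    """Mean rank, median rank and top-k counts over the valid ranks."""
    valid = [r for r in ranks if r is not None]
    stats = {"mean_rank": mean(valid), "median_rank": median(valid)}
    for k in ks:
        # Assumption: a challenge counts towards "Top k" when rank <= k.
        stats[f"top{k}"] = sum(1 for r in valid if r <= k)
    return stats

# Vaniya's cells for challenge-001 to challenge-005, copied from the table.
vaniya = [parse_rank(c) for c in ["29.5", "-", "7.5", "21.5", "2.0"]]
summary = summarise(vaniya)
print(summary)
```

Applied to a full column of the CSV download, the same function would give values to compare against the summary table, subject to the assumptions named above.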

Participant information and abstracts

Participant:	Vaniya
Authors:	Vaniya, Arpana [1], Stephanie N. Samra [1], Sajjan S. Mehta [1], 
		Diego Pedrosa [1], Hiroshi Tsugawa [2], and Oliver Fiehn [1]
Affiliations: 	[1] Genome Center, University of California, Davis 
		[2] RIKEN Center for Sustainable Resource Science (CSRS), Wako, Japan

ParticipantID:	avaniya003
Category:	Category 2
Automatic methods: Yes

Abstract: 

MS-FINDER, developed by H. Tsugawa et al., was used as the in silico software for
unknown compound identification in Category 2. MS-FINDER version 1.62 was used. MS/MS
spectra were uploaded to MS-FINDER in msp format. Precursor m/z, ion mode, mass
accuracy of the instrument, and precursor type were used as metadata. Each candidate
file was converted to a structure database file that can be read by MS-FINDER. Each
file was saved in the software folder so that it could be called by MS-FINDER. This
file was changed after each calculation in order to match the challenge data. A search
of the challenge msp against the challenge candidate list was performed on the top 500
candidates. Up to 500 top candidate structures were exported as a text file from
MS-FINDER. Final scores and SMILES were reported for submission to CASMI 2016.
Multiple candidates were submitted for each challenge.

Participant:	        Duehrkop
Authors: 		Dührkop, Kai (1) and Shen, Huibin (2) and Meusel, Marvin (1)
			and Rousu, Juho (2) and Böcker, Sebastian (1)
Affiliations:         	(1) Chair of Bioinformatics, Friedrich-Schiller University, Jena
			(2) Department of Computer Science, Aalto University

ParticipantID:	      csifingerid
Category:	      category2
Automatic pipeline:   yes
Spectral libraries:   yes (for training)

Abstract

We processed the peak lists in MGF format using a command line version
of CSI:FingerID 1.0.1. Fragmentation trees were computed with Sirius 3.1.4
using the Q-TOF instrument settings. We computed trees for all
candidate formulas in the given structure candidate list. Only the
top scoring trees were selected for further processing: Trees with a
score smaller than 80% of the score of the optimal tree were
discarded. Each of these trees was processed with CSI:FingerID as
described in [1]. We predicted for each tree a molecular fingerprint
(with Platt probability estimates) and compared them against the
fingerprints of all structure candidates with the same molecular
formula. The resulting hits were merged together in one list and were
sorted by score. A constant value of 10000 was added to all scores to
make them positive (as stated in the CASMI rules). Ties of compounds
with same score (and sometimes also with same 2D structure) were
ordered randomly.
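
The post-processing steps described above (80% tree filter, constant score offset, random tie order) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CSI:FingerID implementation: the function names, the formulas and the score values are hypothetical, and the filter assumes that the best tree score is positive.

```python
import random

def keep_top_trees(tree_scores, fraction=0.8):
    """Keep trees scoring at least `fraction` of the best tree's score.
    Assumes the best score is positive."""
    best = max(tree_scores.values())
    return {f: s for f, s in tree_scores.items() if s >= fraction * best}

def rank_candidates(hits, offset=10000, seed=None):
    """Add a constant to every score, sort by descending score and
    leave candidates with identical scores in random order."""
    rng = random.Random(seed)
    shifted = [(name, score + offset) for name, score in hits]
    rng.shuffle(shifted)  # randomise the order first ...
    # ... then rely on the stable sort to keep that order within ties.
    shifted.sort(key=lambda hit: hit[1], reverse=True)
    return shifted

# Hypothetical tree scores per candidate formula.
trees = {"C6H12O6": 50.0, "C7H8N4O2": 42.0, "C5H10N2O3": 30.0}
kept = keep_top_trees(trees)

# Hypothetical merged hit list with one tie.
hits = [("cand_a", -120.5), ("cand_b", -80.0), ("cand_c", -120.5)]
ranked = rank_candidates(hits, seed=1)
print(kept, ranked)
```

Shuffling before a stable sort is one simple way to randomise ties without touching the order of candidates that have different scores.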

The machine learning method was trained on 7352 spectra (4564
compounds) downloaded from GNPS [2] and MassBank [3]. As our training
dataset contains only spectra in positive ion mode (there are too few
spectra in negative ion mode in GNPS), we omitted all challenges
in negative ion mode. As long as there are not enough publicly
available reference spectra measured in negative ion mode, our method
will only be able to process positive ion mode spectra.

We observed for 67 challenges that the top scoring structure candidate
was a compound which is also contained in our training set. If we
evaluate our method on spectra from compounds we have already trained
on we usually reach a performance comparable to spectral library
search. To avoid an overestimation of the performance of our method,
we removed all of these top scoring candidates from our training set
and retrained our classifiers. To compensate for the removed spectra, we
added the training spectra that are provided by CASMI. The submission
with the ParticipantID csifingerid_leaveout contains the search
results of these newly trained classifiers.

[1] Kai Dührkop, Huibin Shen, Marvin Meusel, Juho Rousu and Sebastian
    Böcker Searching molecular structure databases with tandem mass
    spectra using CSI:FingerID.  Proc Natl Acad Sci U S A,
    112(41):12580-12585, 2015.

[2] https://gnps.ucsd.edu

[3] Horai H, et al. MassBank: a public repository for sharing mass 
    spectral data for life sciences.  J Mass Spectrom 45(7):703–714, 2010.

Participant:		Verdegem
Authors:		Verdegem, Dries and Ghesquière, Bart
Affiliation:		Vesalius Research Center, VIB/KULeuven, Leuven, Belgium

ParticipantID:		dverdegem
Category:		category2
Automatic method:	yes

Abstract
For all assignments, we used the MAGMa+ software [1].

MAGMa+ uses MAGMa [2] under the hood. It runs MAGMa twice with two
different sets of fine-tuned parameters, the values of which depend on
the ionization mode. MAGMa+ then determines the molecular class of the
top ranked metabolites returned by both MAGMa runs. This latent
molecular class is determined by a trained two-class random forest
classifier. Depending on the most prevalent molecular class, one of
the two MAGMa outcomes (the one from the run with the parameters
corresponding to the most prevalent class) is returned to the user.
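
The selection step described above can be pictured as a majority vote over the top ranked candidates of both runs. The Python sketch below is a loose illustration and not the MAGMa+ code: the class labels "A" and "B", the `top_n` cut-off, the tie rule and the lookup table standing in for the random forest classifier are all assumptions.

```python
from collections import Counter

def choose_run(run_a, run_b, classify, top_n=5):
    """Return the ranked list of the run whose parameters match the
    prevalent class among the top candidates of both runs.

    run_a / run_b: ranked candidate lists from the runs tuned for the
    placeholder classes "A" and "B"; classify: candidate -> "A" or "B".
    """
    top = run_a[:top_n] + run_b[:top_n]
    votes = Counter(classify(candidate) for candidate in top)
    # Assumption: ties fall back to run A; the abstract does not say
    # how ties are resolved.
    return run_a if votes["A"] >= votes["B"] else run_b

# Toy data: a lookup table replaces the trained classifier.
labels = {"m1": "B", "m2": "B", "m3": "A", "m4": "B", "m5": "A"}
run_a = ["m1", "m2", "m3"]
run_b = ["m2", "m4", "m5"]
chosen = choose_run(run_a, run_b, labels.get, top_n=2)
print(chosen)
```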

As structure database, the possible solution list provided in the
contest was used. We did not perform any prefiltering.

[1] Verdegem, Dries, et al. "Improved metabolite identification with
    MIDAS and MAGMa through MS/MS spectral dataset-driven parameter
    optimization." accepted for publication in Metabolomics
[2] Ridder, Lars, et al. "Substructure‐based annotation of
    high‐resolution multistage MSn spectral trees." Rapid
    Communications in Mass Spectrometry 26.20 (2012): 2461-2471.

Participant:          Allen
Authors:              Felicity Allen, Tanvir Sajed, Russ Greiner, David Wishart
Affiliations:         Department of Computing Science
		      University of Alberta, Canada

ParticipantID:        FelicityAllenCFMOrig
Category:             category2
Automatic pipeline:   yes
Spectral libraries:   no

Abstract

We processed the list of molecules and provided candidates using cfm-id.
The original CFM positive and negative models were used, which were trained
on data from the Metlin database. Mass tolerances of 10 ppm were used
and the Jaccard score was applied for spectral comparisons. The input spectrum
was repeated for the low, medium and high energies.
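
To make the comparison step concrete, here is a minimal Python sketch of a Jaccard score between a measured and a predicted peak list with a 10 ppm mass tolerance. It is not taken from cfm-id: the greedy one-to-one peak matching, the decision to ignore intensities, the choice to take the tolerance relative to the measured m/z, and the example m/z values are all assumptions made for illustration.

```python
def within_ppm(mz_measured, mz_predicted, ppm=10.0):
    """True if the two m/z values differ by at most `ppm` parts per
    million of the measured value."""
    return abs(mz_measured - mz_predicted) <= mz_measured * ppm * 1e-6

def jaccard_score(measured, predicted, ppm=10.0):
    """Jaccard index of two m/z lists: matched peaks / peaks in the union.
    Each predicted peak can be matched at most once (greedy matching)."""
    unused = list(predicted)
    matched = 0
    for mz in measured:
        hit = next((p for p in unused if within_ppm(mz, p, ppm)), None)
        if hit is not None:
            unused.remove(hit)
            matched += 1
    union = len(measured) + len(predicted) - matched
    return matched / union if union else 0.0

# Hypothetical peak lists (m/z only).
measured = [100.0000, 150.0500, 200.1000]
predicted = [100.0005, 150.0600, 250.0000]
score = jaccard_score(measured, predicted)
print(score)
```

In this toy case only the first pair lies within 10 ppm, so one of the five distinct peaks is shared.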

Participant:	      Brouard
Authors:              Céline Brouard(1,2), Huibin Shen(1,2), Kai Dührkop(3), 
		      Sebastian Böcker(3) and Juho Rousu(1,2)
Affiliations:         (1) Department of Computer Science, Aalto University, Espoo, Finland
                      (2) Helsinki Institute for Information Technology, Espoo, Finland
                      (3) Chair for Bioinformatics, Friedrich-Schiller University, 
		      Jena, Germany

ParticipantID:        IOKRAlignf
Category:	      category2
Automatic pipeline:   yes
Spectral libraries:   no

Abstract

We used a recent machine learning approach, called Input Output Kernel
Regression, for predicting the candidate scores. In this method, the
similarities between the MS/MS spectra and the molecular similarities
are encoded using two kernel functions. On the input side, we computed
different kernels based on MS/MS spectra and on fragmentation
trees. On the output side, we built a Gaussian kernel based on molecular
fingerprints. We used approximately 6000 molecular fingerprints from
OpenBabel. We combined the different input kernels using the Alignf
algorithm, which seeks to maximize the alignment between the
combined kernel and the output kernel.
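
Two quantities mentioned above, a Gaussian kernel on fingerprints and the alignment between two kernel matrices, can be written down in plain Python as follows. This is a sketch under assumptions and not the authors' code: the bandwidth `gamma`, the toy fingerprints and the use of centered alignment are illustrative choices, and the Alignf optimisation over kernel weights is not shown.

```python
import math

def gaussian_kernel(fp_a, fp_b, gamma=0.1):
    """k(a, b) = exp(-gamma * ||a - b||^2) for equal-length fingerprints."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(fp_a, fp_b))
    return math.exp(-gamma * sq_dist)

def gram(items, kernel):
    """Kernel (Gram) matrix of all pairs of items."""
    return [[kernel(a, b) for b in items] for a in items]

def center(K):
    """Center a Gram matrix: subtract row and column means, add grand mean."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)]
            for i in range(n)]

def frobenius(A, B):
    """Frobenius inner product of two matrices of equal shape."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def alignment(K1, K2):
    """Centered alignment: <K1c, K2c>_F / (||K1c||_F * ||K2c||_F)."""
    A, B = center(K1), center(K2)
    return frobenius(A, B) / math.sqrt(frobenius(A, A) * frobenius(B, B))

# Toy binary fingerprints for three molecules.
fps = [[1, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 1]]
K_out = gram(fps, gaussian_kernel)
self_alignment = alignment(K_out, K_out)
print(self_alignment)
```

A kernel is perfectly aligned with itself, so the value printed here is 1 up to rounding. An input kernel that orders the molecules differently from the output kernel would score lower.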

We trained separate models for the MS/MS spectra in positive mode and
the MS/MS spectra in negative mode.  We considered additional MS/MS
spectra from GNPS and MassBank for training the models.

Details per Challenge and Participant. See legend at bottom for more details

The details table is also available as HTML and as CSV download. The individual submissions are also available for download.