Critical Assessment of Small Molecule Identification

News

March 29th, 2017
The CASMI 2016 Cat 2+3 paper is out!

Jan 20th, 2017
Organisation of CASMI 2017 is underway, stay tuned!

Dec 4th, 2016
The MS1 peak lists for Category 2+3 have been added for completeness.

May 6th, 2016
The winners and full results are available.

April 25th, 2016
The solutions are public now.

April 18th, 2016
The contest is closed now, the results are fantastic and will be opened soon!

April 9th, 2016
All teams who submit before the deadline April 11th will be allowed to update the submission until Friday 15th.

February 12th, 2016
New categories 2 and 3 and data for automatic methods released. 10 new challenges in category 1.

January 25th, 2016
E. Schymanski and S. Neumann joined the organising team, additional contest data coming soon.

January 11th, 2016
New CASMI 2016 raw data files are available.

Challenges: Categories 2 and 3

The data for Categories 2 and 3 consist of training sets and challenge sets. The data are identical for both categories; the categories differ only in what may be used to find the correct answer. In Category 2 only in silico fragmentation is allowed, whereas in Category 3 any type of additional information can be used (see the rules).

Spectral Acquisition

All MS/MS spectra were obtained on a Q Exactive Plus Orbitrap from Thermo Scientific, with <5 ppm mass accuracy and MS/MS resolution of 35,000, using electrospray ionisation and stepped 20/35/50 HCD nominal collision energies. The spectra were obtained by measuring several mixes in the same LC-MS run, in data-dependent acquisition mode using inclusion lists with the [M+H]+ (positive) and [M-H]- (negative) ion masses. A reversed phase C18 column was used (2.6 µm, 2.1x50 mm with a 2.1x5 mm precolumn) with a gradient of (A/B): 95/5 at 0 min, 95/5 at 1 min, 0/100 at 13 min, 0/100 at 24 min (A = water, B = methanol, both with 0.1% formic acid) at a flow of 300 µL/min.
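The gradient table above can be turned into a small lookup, for example to estimate the mobile phase composition at a given retention time. This is a minimal sketch that assumes the composition changes linearly between the listed time points (the text gives only the breakpoints); the names `GRADIENT` and `percent_b` are illustrative and not part of any CASMI tooling.

```python
from bisect import bisect_right

# (time in min, percent B) breakpoints copied from the gradient description.
# B = methanol with 0.1% formic acid; percent A is simply 100 - percent B.
GRADIENT = [(0.0, 5.0), (1.0, 5.0), (13.0, 100.0), (24.0, 100.0)]


def percent_b(t_min: float) -> float:
    """Return %B at time t_min, assuming linear ramps between breakpoints."""
    # Clamp times outside the programmed gradient to the first/last value.
    if t_min <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    if t_min >= GRADIENT[-1][0]:
        return GRADIENT[-1][1]
    times = [t for t, _ in GRADIENT]
    # Index of the breakpoint at or just before t_min.
    i = bisect_right(times, t_min) - 1
    (t0, b0), (t1, b1) = GRADIENT[i], GRADIENT[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
```

Under the linearity assumption, `percent_b(7.0)` falls halfway along the 1 to 13 min ramp and evaluates to 52.5.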

The MS/MS peak lists were extracted with RMassBank using the ion mass and a retention time window of 0.3 min around the expected retention time. The data was cleaned and recalibrated to within 5 ppm using known formulas. All peaks in the MS/MS that did not have a valid subformula within 5 ppm of the recalibrated data were removed. All substances with double chromatographic peaks, different substances with identical spectra (detected via the SPLASH), MS/MS with only one peak and MS/MS with a maximum intensity below 1x10⁵ were excluded from the datasets. Substances that were measured multiple times in the same ionisation mode were only included once. MS/MS from positive and negative mode were included if the substance ionised in both modes.
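The two numeric criteria in this paragraph (the 5 ppm tolerance and the exclusion of sparse or weak spectra) can be sketched as follows. This is not the RMassBank implementation, only an illustration of the stated thresholds. The function names are hypothetical, and the list of subformula m/z values is assumed to come from a separate formula enumeration step that is not shown here.

```python
PPM_TOLERANCE = 5.0        # tolerance stated in the text
MIN_MAX_INTENSITY = 1e5    # spectra with a lower maximum intensity were excluded


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def peak_matches(mz: float, subformula_mzs, tol_ppm: float = PPM_TOLERANCE) -> bool:
    """True if any candidate subformula m/z lies within tol_ppm of the peak.

    subformula_mzs is a hypothetical, precomputed list of theoretical m/z values.
    """
    return any(abs(ppm_error(mz, ref)) <= tol_ppm for ref in subformula_mzs)


def keep_spectrum(peaks) -> bool:
    """Apply the two spectrum-level exclusion rules to a list of (mz, intensity).

    Spectra with a single peak are rejected as described in the text; empty
    spectra are rejected too, which is an assumption made for this sketch.
    """
    if len(peaks) <= 1:
        return False
    return max(intensity for _, intensity in peaks) >= MIN_MAX_INTENSITY
```

The other exclusion rules (double chromatographic peaks, identical spectra detected via the SPLASH) need chromatographic and cross-spectrum information and are left out of this sketch.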

Training Data

The training dataset consists of 312 peak lists (from 285 substances), 254 MS/MS obtained in positive mode (all [M+H]+) and 58 MS/MS in negative mode (all [M-H]-). The identities of the substances in the training dataset, plus retention times, are provided in the summary CSV file. Candidates are provided for each challenge as SDF and CSV files; the latter contain the candidates as SMILES, InChI and InChIKey. The files are connected via their number (e.g. Training-023.SDF are the candidates for Training-023.txt).
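The naming convention that links a peak list to its candidate files can be expressed as a small helper. This is a sketch based only on the examples given in the text (Training-023.txt, Challenge-123.SDF); the function name is illustrative, and the lower-case `.csv` extension for the CSV candidates is an assumption.

```python
import re

# Matches peak list names such as "Training-023.txt" or "Challenge-123.txt".
_PEAKLIST_NAME = re.compile(r"^(Training|Challenge)-(\d+)\.txt$")


def candidate_names(peaklist_name: str) -> dict:
    """Return the candidate file names that share the peak list's number."""
    match = _PEAKLIST_NAME.match(peaklist_name)
    if match is None:
        raise ValueError(f"unexpected peak list name: {peaklist_name!r}")
    stem = f"{match.group(1)}-{match.group(2)}"
    # ".SDF" follows the example in the text; ".csv" is assumed.
    return {"sdf": stem + ".SDF", "csv": stem + ".csv"}


# candidate_names("Training-023.txt")
# -> {'sdf': 'Training-023.SDF', 'csv': 'Training-023.csv'}
```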

Training_negative_mgf.zip and Training_negative_peaklist.zip

Training_positive_mgf.zip and Training_positive_peaklist.zip

Summary with IDs, structures, RT etc. in training summary CSV

Training_Candidates.zip contains candidates as CSV.

Please note that one of the substances in the training data was measured twice in different standard mixes, so that Training-003 and Training-114 (negative mode) are effectively repeats of Training-002 and Training-116 (positive mode).

Challenge Data

The challenge dataset consists of 208 peak lists from 188 substances, 127 obtained in positive mode ([M+H]+) and 81 in negative mode ([M-H]-). The retention times for these substances are provided in the summary CSV file. Candidates are provided for each challenge as SDF and CSV files; the latter contain the candidates as SMILES, InChI and InChIKey. The files are connected via the Challenge number (e.g. Challenge-123.SDF are the candidates for Challenge-123.txt).

Challenge_negative_mgf.zip and Challenge_negative_peaklist.zip

Challenge_positive_mgf.zip and Challenge_positive_peaklist.zip

Summary with retention time, polarity etc. in the challenge summary CSV

Challenge_Candidates.zip contains candidates as CSV.
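For participants who prefer the MGF archives, the sketch below reads the generic MGF layout (blocks delimited by `BEGIN IONS` and `END IONS`, `KEY=VALUE` header lines, then one `m/z intensity` pair per line). The header keys actually present in the CASMI files are not documented here, so the keys and numbers in the example text are purely illustrative and not taken from the contest data.

```python
def parse_mgf(lines):
    """Parse MGF text lines into a list of {'params': dict, 'peaks': list}."""
    spectra = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:
                # Header line; split only on the first "=".
                key, value = line.split("=", 1)
                current["params"][key.strip().upper()] = value.strip()
            else:
                fields = line.split()
                if len(fields) >= 2:
                    current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra


# Illustrative input only; not a real CASMI spectrum.
EXAMPLE = """
BEGIN IONS
TITLE=Example
PEPMASS=180.0655
91.0542 1200.0
180.0655 250000.0
END IONS
"""

spectra = parse_mgf(EXAMPLE.splitlines())
```

For a real file, the same function can be given an open text file handle instead of `EXAMPLE.splitlines()`.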

New (2016-12-04) MS1 peak lists extracted with RMassBank:

CASMI2016_Cat2+3_MS1-neg_peaklists.tar
CASMI2016_Cat2+3_MS1-neg_peaklists.txt (alternative single-file format used in [1]).
CASMI2016_Cat2+3_MS1-pos_peaklists.tar
CASMI2016_Cat2+3_MS1-pos_peaklists.txt (alternative single-file format used in [1]).

Candidate lists

The candidates were retrieved from ChemSpider as SMILES strings on 14 February 2016 and converted to standard InChI and InChIKeys with OpenBabel. The candidate lists are saved as CSV files, with the double quote (") as quoting character and the comma (,) as field separator. Candidates where the OpenBabel conversion from SMILES to InChI failed were removed. The presence of the correct solution in the candidates was checked.
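The quoting matters because InChI strings contain commas themselves. The sketch below reads a candidate list with Python's standard `csv` module, using the quoting character and separator described above. The column headers in the sample are an assumption (the text says only that SMILES, InChI and InChIKey are included), and the single sample row is ethanol, chosen for illustration rather than taken from a CASMI candidate file.

```python
import csv
import io


def read_candidates(handle):
    """Read candidate rows as dictionaries keyed by the CSV header."""
    reader = csv.DictReader(handle, delimiter=",", quotechar='"')
    return list(reader)


# Illustrative content with assumed column names.
SAMPLE = (
    '"SMILES","InChI","InChIKey"\n'
    '"CCO","InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3","LFQSCWFLJHTTHZ-UHFFFAOYSA-N"\n'
)

rows = read_candidates(io.StringIO(SAMPLE))
# The commas inside the quoted InChI stay within a single field.
```

When reading from disk, open the file with `newline=""` as recommended by the `csv` module documentation, e.g. `open(path, newline="", encoding="utf-8")`.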

If you have any issues with the file formats please ask on the casmi-discuss@lists.sf.net mailing list, or contact the CASMI team. Your participation should not be prevented by formatting issues.