News
March 29th, 2017: The CASMI 2016 Cat 2+3 paper is out!
Jan 20th, 2017: Organisation of CASMI 2017 is underway, stay tuned!
Dec 4th, 2016: The MS1 peak lists for Category 2+3 have been added for completeness.
May 6th, 2016: The winners and full results are available.
April 25th, 2016: The solutions are public now.
April 18th, 2016: The contest is closed now, the results are fantastic and will be opened soon!
April 9th, 2016: All teams who submit before the deadline April 11th will be allowed to update the submission until Friday 15th.
February 12th, 2016: New categories 2 and 3 and data for automatic methods released. 10 new challenges in category 1.
January 25th, 2016: E. Schymanski and S. Neumann joined the organising team, additional contest data coming soon.
January 11th, 2016: New CASMI 2016 raw data files are available.
Challenges: Categories 2 and 3
The data for Categories 2 and 3 consist of training sets
and challenge sets. The data are identical for both
categories. For Category 2, only in silico fragmentation
may be used to find the correct answer; for Category 3, any
type of additional information can be used (see the rules).
Spectral Acquisition
All MS/MS spectra were obtained on a Q Exactive Plus Orbitrap from Thermo Scientific, with <5 ppm mass accuracy and MS/MS resolution of 35,000, using electrospray ionisation and stepped 20/35/50 HCD nominal collision energies. The spectra were obtained by measuring several mixes in the same LC-MS run, in data-dependent acquisition mode using inclusion lists with the [M+H]+ (positive) and [M-H]- (negative) ion masses. A reversed phase C18 column was used (2.6 µm, 2.1x50 mm with a 2.1x5 mm precolumn) with a gradient of (A/B): 95/5 at 0 min, 95/5 at 1 min, 0/100 at 13 min, 0/100 at 24 min (A = water, B = methanol, both with 0.1% formic acid) at a flow of 300 µL/min.
The MS/MS peak lists were extracted with RMassBank using the ion mass and a retention time window of 0.3 min around the expected retention time. The data was cleaned and recalibrated to within 5 ppm using known formulas. All peaks in the MS/MS that did not have a valid subformula within 5 ppm of the recalibrated data were removed. All substances with double chromatographic peaks, different substances with identical spectra (detected via the SPLASH), MS/MS with only one peak and MS/MS with a maximum intensity below 1x10^5 were excluded from the datasets. Substances that were measured multiple times in the same ionisation mode were only included once. MS/MS from positive and negative mode were included if the substance ionised in both modes.
Training Data
The training dataset consists of 312 peak lists (from 285 substances), 234 MS/MS obtained in positive mode (all [M+H]+) and 58 MS/MS in negative mode (all [M-H]-). The identities of the substances in the training dataset, plus retention times, are provided in the summary CSV file. Candidates are provided for each challenge as SDF and CSV files; the latter contain the candidates as SMILES, InChI and InChIKey. The files are connected via their number (e.g. Training-023.SDF are the candidates for Training-023.txt).
- Training_negative_mgf.zip and Training_negative_peaklist.zip
- Training_positive_mgf.zip and Training_positive_peaklist.zip
- Summary with IDs, structures, RT etc. in training summary CSV
- Training_Candidates.zip contains candidates as CSV.
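When working with these peak lists, the 5 ppm criterion mentioned under Spectral Acquisition can be checked with a few lines of code. The following is a minimal sketch, not the RMassBank procedure used to prepare the data; the function names and the example masses are hypothetical and chosen only for illustration.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of a measured m/z in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz: float, theoretical_mz: float,
                     tol_ppm: float = 5.0) -> bool:
    """True if the measured m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm


def filter_peaks(peaks, theoretical_mzs, tol_ppm: float = 5.0):
    """Keep (m/z, intensity) pairs that match at least one theoretical m/z."""
    return [
        (mz, intensity)
        for mz, intensity in peaks
        if any(within_tolerance(mz, t, tol_ppm) for t in theoretical_mzs)
    ]


# Hypothetical peaks: the first is 4 ppm off its target, the second about 67 ppm off.
example_peaks = [(100.0004, 2.0e5), (150.01, 3.0e5)]
kept = filter_peaks(example_peaks, theoretical_mzs=[100.0, 150.0])
```

Because the tolerance is relative, the same 5 ppm window is narrower in absolute terms for low-mass fragments than for the precursor ion.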
Challenge Data
The challenge dataset consists of 208 peak lists from 188 substances, 127 obtained in positive mode ([M+H]+) and 81 in negative mode ([M-H]-). The retention times for these substances are provided in the summary CSV file. Candidates are provided for each challenge as SDF and CSV files; the latter contain the candidates as SMILES, InChI and InChIKey. The files are connected via the Challenge number (e.g. Challenge-123.SDF are the candidates for Challenge-123.txt).
- Challenge_negative_mgf.zip and Challenge_negative_peaklist.zip
- Challenge_positive_mgf.zip and Challenge_positive_peaklist.zip
- Summary with retention time, polarity etc. in challenge summary CSV
- Challenge_Candidates.zip contains candidates as CSV.
- CASMI2016_Cat2+3_MS1-neg_peaklists.tar
- CASMI2016_Cat2+3_MS1-neg_peaklists.txt (alternative single-file format used in [1]).
- CASMI2016_Cat2+3_MS1-pos_peaklists.tar
- CASMI2016_Cat2+3_MS1-pos_peaklists.txt (alternative single-file format used in [1]).
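The number-based naming that links peak lists to candidate files (e.g. Challenge-123.SDF for Challenge-123.txt, Training-023.SDF for Training-023.txt) can be handled programmatically. The sketch below assumes that file names follow exactly the pattern shown in those examples; the helper name and the regular expression are hypothetical, and the extension of the CSV variant should be checked against the unpacked archives.

```python
import re
from pathlib import Path

# Assumed pattern, derived from the examples "Training-023" and "Challenge-123".
_NAME_PATTERN = re.compile(r"^(Training|Challenge)-(\d+)$")


def candidate_file_for(peaklist_name: str, extension: str = ".SDF") -> str:
    """Return the candidate file name that belongs to a peak list file name."""
    stem = Path(peaklist_name).stem
    if _NAME_PATTERN.match(stem) is None:
        raise ValueError(f"unexpected peak list name: {peaklist_name!r}")
    return stem + extension


sdf_name = candidate_file_for("Challenge-123.txt")
csv_name = candidate_file_for("Training-023.txt", extension=".csv")
```

Validating the stem before building the new name avoids silently pairing an unrelated file with a candidate list.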
Candidate lists
The candidates were retrieved from ChemSpider as SMILES strings on 14/02/2016 and converted to standard InChI and InChIKeys with OpenBabel. The candidate lists are saved as CSV files, with the double quote (") as quoting character and the comma (,) as field separator. Candidates where the OpenBabel conversion from SMILES to InChI failed were removed. The presence of the correct solution in the candidates was checked.
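The quoting matters because InChI strings themselves contain commas. A minimal sketch of reading such a file with Python's standard csv module follows; the column names and the single ethanol row are assumptions made for illustration, so check the header of the actual candidate files before relying on them.

```python
import csv
import io

# Hypothetical file content; real candidate files may use other column names.
SAMPLE = (
    '"Identifier","SMILES","InChI","InChIKey"\n'
    '"1","CCO","InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3","LFQSCWFLJHTTHZ-UHFFFAOYSA-N"\n'
)


def read_candidates(handle):
    """Read a candidate list with ',' as separator and '"' as quote character."""
    reader = csv.DictReader(handle, delimiter=",", quotechar='"')
    return list(reader)


rows = read_candidates(io.StringIO(SAMPLE))
first_inchi = rows[0]["InChI"]  # commas inside the quoted field are preserved
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the `io.StringIO` object.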
If you have any issues with the file formats please ask on the casmi-discuss@lists.sf.net mailing list, or contact the CASMI team. Your participation should not be prevented by formatting issues.