News
March 29th, 2017: The CASMI 2016 Cat 2+3 paper is out!
Jan 20th, 2017: Organisation of CASMI 2017 is underway, stay tuned!
Dec 4th, 2016: The MS1 peak lists for Category 2+3 have been added for completeness.
May 6th, 2016: The winners and full results are available.
April 25th, 2016: The solutions are public now.
April 18th, 2016: The contest is closed now, the results are fantastic and will be opened soon!
April 9th, 2016: All teams who submit before the deadline April 11th will be allowed to update the submission until Friday 15th.
February 12th, 2016: New categories 2 and 3 and data for automatic methods released. 10 new challenges in category 1.
January 25th, 2016: E. Schymanski and S. Neumann joined the organising team, additional contest data coming soon.
January 11th, 2016: New CASMI 2016 raw data files are available.
Extra results in Category 1
The "extra" evaluations include all submissions that were submitted
after the contest deadline had passed, as well as results by Christoph Ruttkies,
who is considered an internal participant.
We also offer to run future submissions through the
evaluation pipeline and post the results here. Please
note that such future submissions will have been prepared
after the release of the solutions, unlike the contest entries.
Summary of Challenge wins
Medal | Vaniya (in silico) | Vaniya | Allen | Allen (retrained) | Nothias (CFM) | Nothias-Scaglia | Nothias (ISDB UNPD) | Nikolic | Allard | Allard (ISDB DNP) | Allard (ISDB UNPD) | Ruttkies (MetFrag+CFM) | Bertrand | Bertrand (manual) | Kind |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Gold | 9 | 14 | 7 | 7 | 7 | 11 | 8 | 15 | 2 | 4 | 2 | 2 | 6 | 5 | 12 |
Silver | 2 | 1 | 5 | 5 | 1 | 0 | 1 | 3 | 2 | 1 | 2 | 4 | 2 | 4 | 1 |
Bronze | 4 | 0 | 2 | 2 | 3 | 1 | 3 | 0 | 2 | 3 | 3 | 0 | 1 | 0 | 0 |
Summary statistics per participant
Participant | Mean rank | Median rank | Top | Top3 | Top10 | Mean RRP | Median RRP |
---|---|---|---|---|---|---|---|
Vaniya (in silico) | 8.38 | 1.0 | 9 | 13 | 14 | 0.952 | 1.000 |
Vaniya | 5.25 | 1.0 | 14 | 15 | 15 | 0.989 | 1.000 |
Allen | 3.47 | 2.0 | 7 | 12 | 16 | 0.971 | 0.993 |
Allen (retrained) | 3.47 | 2.0 | 7 | 12 | 16 | 0.971 | 0.993 |
Nothias (CFM) | 4.81 | 1.0 | 7 | 9 | 12 | 0.790 | 0.929 |
Nothias-Scaglia | 1.25 | 1.0 | 11 | 11 | 12 | 0.994 | 1.000 |
Nothias (ISDB UNPD) | 2.71 | 1.0 | 8 | 10 | 14 | 0.816 | 1.000 |
Nikolic | 1.22 | 1.0 | 14 | 18 | 18 | 0.785 | 1.000 |
Allard | 3.40 | 2.5 | 2 | 6 | 10 | 0.661 | 0.727 |
Allard (ISDB DNP) | 2.33 | 2.0 | 4 | 8 | 9 | 0.693 | 0.857 |
Allard (ISDB UNPD) | 2.89 | 2.0 | 2 | 7 | 9 | 0.690 | 0.786 |
Ruttkies (MetFrag+CFM) | 112.53 | 21.5 | 2 | 5 | 6 | 0.870 | 0.926 |
Bertrand | 5.29 | 2.0 | 6 | 8 | 12 | 0.781 | 0.933 |
Bertrand (manual) | 4.77 | 2.0 | 5 | 9 | 11 | 0.811 | 0.929 |
Kind | 19.62 | 1.0 | 12 | 14 | 15 | 0.875 | 1.000 |
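The rank-based columns of this table can be recomputed from the per-challenge rank table. The sketch below does so for the "Vaniya (in silico)" column. It assumes that challenges marked "-" (correct candidate not in the submission) and challenges without a submission are simply left out of the statistics; this is an assumption, not a documented rule of the evaluation pipeline. The function name `summarize` is hypothetical, and the RRP columns are not covered because their definition is not given here.

```python
from statistics import mean, median

# Ranks of "Vaniya (in silico)" transcribed from the per-challenge table
# (challenges 001-019, there is no 003). None stands for "-", i.e. the
# submission did not contain the correct candidate.
VANIYA_IN_SILICO = [1.0, 1.0, 1.0, None, None, 1.0, 1.0, 9.0, 1.0, 1.0,
                    2.0, 40.0, 67.0, 1.0, 2.5, 1.0, 1.5, 3.0]


def summarize(ranks):
    """Summary statistics over the challenges where a rank is available.

    Assumption: entries without a rank are skipped rather than penalised.
    """
    found = [r for r in ranks if r is not None]
    return {
        "mean_rank": mean(found),
        "median_rank": median(found),
        "top": sum(1 for r in found if r == 1.0),     # strictly rank 1
        "top3": sum(1 for r in found if r <= 3.0),
        "top10": sum(1 for r in found if r <= 10.0),
    }


if __name__ == "__main__":
    for name, value in summarize(VANIYA_IN_SILICO).items():
        print(f"{name}: {value}")
```

Under that assumption the column yields a mean rank of 8.375 (8.38 in the table), a median rank of 1.0, and Top, Top3 and Top10 counts of 9, 13 and 14, which agrees with the published row. A tied rank such as 1.5 is not counted as "Top" in this reading.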
Summary of Rank by Challenge and Participant
For each challenge, the rank of the winner(s) is highlighted in bold. If the submission did not contain the correct candidate, this is denoted as "-". If someone did not participate in a challenge, nothing is shown. The tables are sortable if you click on a column header. This summary is also available as a CSV download.

Challenge | Vaniya (in silico) | Vaniya | Allen | Allen (retrained) | Nothias (CFM) | Nothias-Scaglia | Nothias (ISDB UNPD) | Nikolic | Allard | Allard (ISDB DNP) | Allard (ISDB UNPD) | Ruttkies (MetFrag+CFM) | Bertrand | Bertrand (manual) | Kind |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
challenge-001 | 1.0 | 1.0 | 4.5 | 4.5 | 3.0 | - | 7.0 | 2.0 | 3.0 | 3.0 | 3.0 | 4.0 | 7.0 | 7.0 | 5.0 |
challenge-002 | 1.0 | 1.0 | 1.0 | 1.0 | - | 1.0 | 2.0 | 2.0 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 |
challenge-004 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 | 2.0 | 34.0 | 6.0 | 2.0 | 1.0 |
challenge-005 | - | - | - | - | - | - | - | 1.0 | - | - | - | - | - | - | - |
challenge-006 | - | - | 2.5 | 2.5 | 4.5 | 6.0 | 1.0 | 7.0 | 7.0 | 6.0 | - | 6.0 | 6.0 | - | |
challenge-007 | 1.0 | 1.0 | 1.0 | 1.0 | - | 1.0 | 2.0 | 1.0 | 2.0 | 1.0 | 2.0 | 1.5 | 1.0 | 1.0 | 3.0 |
challenge-008 | 1.0 | 1.0 | 4.0 | 4.0 | 4.5 | 1.0 | 1.0 | 2.0 | 4.0 | 3.0 | 6.0 | 46.0 | 2.0 | ||
challenge-009 | 9.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1206.5 | 2.0 | 2.0 | 1.0 |
challenge-010 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 15.0 | 1.0 | 1.0 | 1.0 | |
challenge-011 | 1.0 | 1.0 | 19.0 | 19.0 | 1.5 | - | - | 1.0 | 8.0 | 2.0 | - | 175.0 | 4.0 | - | 1.0 |
challenge-012 | 2.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 4.0 | 2.0 | 3.0 | 88.0 | 1.0 | 1.0 | 1.0 |
challenge-013 | 40.0 | 1.0 | 3.0 | 3.0 | - | 1.0 | - | 1.0 | - | - | - | 146.0 | - | - | 1.0 |
challenge-014 | 67.0 | 68.0 | 8.0 | 8.0 | 38.0 | - | 9.0 | 2.0 | - | - | - | 18.0 | 29.0 | 23.0 | 292.0 |
challenge-015 | 1.0 | 1.0 | 1.0 | 1.0 | 4.0 | 4.0 | 4.0 | 1.0 | 1.0 | 1.0 | 3.0 | 1.0 | |||
challenge-016 | 2.5 | 2.0 | 2.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 25.0 | 2.0 | 2.0 | 1.0 | |||
challenge-017 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | - | - | 1.0 | |||
challenge-018 | 1.5 | 1.0 | 4.0 | 4.0 | - | - | - | 1.0 | 34.5 | 12.0 | 12.0 | 1.0 | |||
challenge-019 | 3.0 | 1.0 | 3.0 | 3.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 | 1.0 | 1.0 |
Participant information and abstracts
Participant: Allard
Authors: Allard, Pierre-Marie (1) and Houriet, Joëlle (1)
Affiliations: (1) Laboratory of Phytochemistry and Bioactive Natural Products, School of Pharmaceutical Sciences, University of Geneva, Quai-Ernest Ansermet 30, 1211 Geneva, Switzerland
ParticipantID: pma
Category: category1
Automatic pipeline: yes
Spectral libraries: yes
Abstract: We processed only the data of category 1 in positive mode (challenges 1 to 14).
Data conversion: Data of challenges 1 to 9 were converted to .mzXML format using ProteoWizard. Fragmentation spectra of the ions of interest were extracted and saved as .mgf files. The parent ion mass in the .mgf files was corrected to fit the exact mass of the ion of interest when necessary.
Molecular network generation: A molecular network was generated to assess possible structural relationships between metabolites and to generate a common .mgf file. All .mgf files (challenges 1 to 14) were uploaded to the GNPS servers (http://gnps.ucsd.edu) and processed in the data treatment workflow using the following parameters. The data were clustered with MS-Cluster with a parent mass tolerance of 0.8 Da and an MS/MS fragment ion tolerance of 0.5 Da to create consensus spectra. A network was then created in which edges were filtered to have a cosine score above 0.7 and more than 6 matched peaks. Further, edges between two nodes were kept in the network if, and only if, each of the nodes appeared in the other's top 10 most similar nodes. The spectra in the network were then searched against all available GNPS spectral libraries. A GNPS library hit was taken into account for challenge 3, since it was a permanently charged compound which was not included in the ISDBs. Its score was set as the highest.
In-silico database (ISDB) spectral match: Two in-silico MS/MS fragmentation databases were queried: an ISDB created from data of the Dictionary of Natural Products and an ISDB created from data of the UNPD database (http://pkuxxj.pku.edu.cn/UNPD/). The ISDBs were generated using cfm-id (https://sourceforge.net/projects/cfm-id/) as described in: Allard, P.-M.; Péresse, T.; Bisson, J.; Gindro, K.; Marcourt, L.; Pham, V. C.; Roussi, F.; Litaudon, M.; Wolfender, J.-L. Anal. Chem. 2016, 88, 3317–3323. The clustered .mgf file obtained was searched against both ISDBs using tremolo (http://proteomics.ucsd.edu/Software/Tremolo/) for the spectral match and an in-house script for annotation of the hits. The spectral search was made using the following parameters: tolerance.PM_tolerance=0.01, SCORE_THRESHOLD=0.1, TOP_K_RESULTS=10. The detailed workflow to perform the spectral match, the scripts and the UNPD-ISDB are available here: http://oolonek.github.io/ISDB/
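The edge filtering described for the molecular network (cosine score above 0.7, more than 6 matched peaks, and mutual membership in each other's top 10 neighbours) can be sketched as follows. This is an illustration only, not the GNPS implementation: the function name `filter_edges` and the input layout are hypothetical, and the sketch assumes that the top-10 lists are built from the pairs that already passed the cosine and matched-peak thresholds.

```python
from collections import defaultdict


def filter_edges(pairs, min_cosine=0.7, min_matched=6, top_k=10):
    """Keep network edges that satisfy all three criteria.

    pairs: iterable of (node_a, node_b, cosine, matched_peaks) tuples,
           one per unordered pair of consensus spectra.
    Returns a list of (node_a, node_b, cosine) tuples.
    """
    # 1. Threshold on cosine score and number of matched peaks.
    passing = [(a, b, cos) for a, b, cos, matched in pairs
               if cos > min_cosine and matched > min_matched]

    # 2. Collect, for every node, its neighbours with their scores.
    neighbours = defaultdict(list)
    for a, b, cos in passing:
        neighbours[a].append((cos, b))
        neighbours[b].append((cos, a))

    # 3. Reduce each neighbour list to the top_k most similar nodes.
    top = {node: {n for _, n in sorted(scored, reverse=True)[:top_k]}
           for node, scored in neighbours.items()}

    # 4. Keep an edge only if each node is in the other's top_k set.
    return [(a, b, cos) for a, b, cos in passing
            if b in top[a] and a in top[b]]


if __name__ == "__main__":
    example = [
        ("A", "B", 0.90, 8),
        ("A", "C", 0.80, 7),
        ("B", "C", 0.95, 9),
        ("A", "D", 0.75, 5),   # too few matched peaks
        ("C", "D", 0.60, 10),  # cosine too low
    ]
    print(filter_edges(example))
    print(filter_edges(example, top_k=1))
```

With the default `top_k=10` the three pairs that pass the thresholds are all kept; with `top_k=1` only the B-C edge survives, because A is not B's most similar node.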
Participant: Vaniya (in silico)
Authors: Vaniya, Arpana [1], Stephanie N. Samra [1], Mine Palazoglu [1], Hiroshi Tsugawa [2], and Oliver Fiehn [1]
Affiliations: [1] Genome Center, University of California, Davis [2] RIKEN Center for Sustainable Resource Science (CSRS), Wako, Japan
ParticipantID: avaniya001
Category: Category 1
Automatic methods: Yes
Abstract: MS-FINDER, developed by H. Tsugawa et al., was used as in silico software for unknown compound identification in Category 1. MS-FINDER version 1.62 was used. MS1 and MS/MS spectra were uploaded to MS-FINDER in text format. Precursor m/z, ion mode, mass accuracy of the instrument, and precursor type were used as metadata. Seven Golden Rules and SIRIUS 3.1.3 were first used to identify the molecular formula. The MS1 spectrum and isotopic abundances were used for Seven Golden Rules. An isotopic abundance error of either 3% or 5% was used, depending on the mass accuracy of the instrument. The MS1 and MS/MS spectra were used for SIRIUS 3.1.3. MS-FINDER was also used to calculate molecular formulas. Formulas found in the Dictionary of Natural Products in Seven Golden Rules were ranked higher, regardless of the score, than hits from PubChem or SIRIUS, because the challenges were natural products. The formulas generated by Seven Golden Rules, SIRIUS, and MS-FINDER were used to validate and confirm the top candidate molecular formulas. HMDB, SMPDB, PlantCyc, FooDB, YMDB, UNPD, BMDB, ECMDB, PubChem, ChEBI, KNApSAcK, DrugBank, and T3DB were used as compound databases in MS-FINDER to find structural candidates. The experimental MS/MS spectrum was compared to the in silico MS/MS spectrum of each candidate structure for a given molecular formula that was generated with Seven Golden Rules, SIRIUS, and MS-FINDER. The top candidate structures were exported as a text file from MS-FINDER. For challenges with no result from MS-FINDER, or challenges having the same candidate structures, MetFrag was used as additional in silico software.
In MetFrag, PubChem was the only compound database used in the search. The following data and metadata were used for the calculation: MS/MS spectrum, parent ion m/z value, mass accuracy of the instrument, ion mode, and adduct type. The number of structures to limit the processing was set to 100, and the setting for only biological compounds was left unchecked. Final scores and SMILES were reported for submission to CASMI 2016. For this submission, candidates from MS-FINDER were weighted more heavily than those from MetFrag, except in cases where there were no results from MS-FINDER. Multiple candidates were submitted for each challenge.
Participant: Vaniya
Authors: Vaniya, Arpana [1], Stephanie N. Samra [1], Mine Palazoglu [1], Hiroshi Tsugawa [2], and Oliver Fiehn [1]
Affiliations: [1] Genome Center, University of California, Davis [2] RIKEN Center for Sustainable Resource Science (CSRS), Wako, Japan
ParticipantID: avaniya002
Category: Category 1
Automatic methods: Yes
Abstract: The challenges were first searched against multiple mass spectral libraries to find the best match. The MS/MS data were converted to msp format to be searched against NIST14, METLIN, MassBank, ReSpect, and LipidBlast using NIST MS Search 2.0. Candidates with a reverse dot product score of 500 were confirmed by examining the match of the experimental MS/MS spectrum to the reference MS/MS spectrum. Top candidates from the MS library search were used to validate candidates in the MS-FINDER and MetFrag results. MS-FINDER, Seven Golden Rules, SIRIUS 3.1.3 and MetFrag were used with the same method as for the submission titled avaniya001-category1. For this submission, candidates that were found in both the MS library search and MS-FINDER or MetFrag were weighted more heavily. Candidates for challenges with no hits from the MS library search remained unchanged from the submission titled avaniya001-category1. Final scores and SMILES were reported for submission to CASMI 2016. Multiple candidates were submitted for each challenge.
Participant: Allen
Authors: Felicity Allen, Russ Greiner, David Wishart
Affiliations: Department of Computing Science, University of Alberta, Canada
ParticipantID: felicityallen
Category: category1 and category2
Automatic pipeline: yes
Spectral libraries: no
Abstract: A list of candidate structures was obtained by querying all of the following databases for all candidates within the required mass ranges (determined as above):
HMDB http://www.hmdb.ca/
ChEBI http://www.ebi.ac.uk/chebi/
ChEMBL https://www.ebi.ac.uk/chembl/
Metlin http://metlin.scripps.edu/
FOODB http://foodb.ca/
T3DB http://www.t3db.ca/
DrugBank http://www.drugbank.ca/
ECMDB http://www.ecmdb.ca/
YMDB http://www.ymdb.ca/
PlantDB Privately held list of 200,000 plant and plant-derived compounds.
The MS1 spectra were then predicted for each candidate molecular formula using the emass program by A. Rockwood and P. Haimi [1]. These predicted spectra were compared to the provided MS1 spectra (restricted to within 10 Da of the monoisotopic mass of the molecular formula), and an MS1_SCORE was produced for each molecular formula based on the closeness of this match. The scoring metric used was:
MS1_SCORE = ( (WP + WR + DP)_5ppm + (WP + WR + DP)_10ppm + (WP + WR + DP)_50ppm ) / 10
where
WP = intensity-weighted precision (0-100)
WR = intensity-weighted recall (0-100)
DP = dot product (0-1) x 100
For all candidate structures, CFM [2, 3] was used to produce a score for the MS2 spectra. The original CFM positive and negative models were used, which were trained on data from the Metlin database. Mass tolerances of 10 ppm were used and the Jaccard score was applied for spectral comparisons. The input spectrum was repeated for the low, medium and high energies. The Jaccard score was summed across the three energies and multiplied by 300.
For all candidates, a DB_SCORE was produced according to which of the above databases the candidate was found in, adding 50 for each database, except ChEMBL, which added only 10. The results were ranked according to the sum of the above three scores:
TOTAL_SCORE = MS2_SCORE + DB_SCORE + MS1_SCORE
[1] Rockwood A. and Haimi P., "Efficient calculation of accurate masses of isotopic peaks", Journal of the American Society for Mass Spectrometry, 17:3 p415-9 2006.
[2] Allen F., Pon A., Wilson M., Greiner R., Wishart D., "CFM-ID: A web server for annotation, spectrum prediction and metabolite identification from tandem mass spectra", Nucleic Acids Research, Web Server Edition 2014.
[3] Allen F., Greiner R., Wishart D., "Competitive Fragmentation Modeling of ESI-MS/MS spectra for putative metabolite identification", Metabolomics, 11:1, p98-110, 2015.
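The three-part score in the Allen abstract can be written out as a small sketch. It is a literal reading of the abstract, not the authors' code: all function names are hypothetical, the weighted precision, weighted recall, dot product and Jaccard values are taken as given inputs because their computation is not specified here, and the MS2 score is implemented literally as 300 times the sum of the three Jaccard scores.

```python
def ms1_score(matches):
    """MS1_SCORE from per-tolerance match statistics.

    matches: dict mapping a ppm tolerance (5, 10, 50) to a tuple
             (wp, wr, dp) with wp and wr on a 0-100 scale and the
             dot product dp on a 0-1 scale.
    """
    # DP enters the formula as dot product x 100.
    total = sum(wp + wr + 100.0 * dp for wp, wr, dp in matches.values())
    return total / 10.0


def ms2_score(jaccard_per_energy):
    """Sum of the Jaccard scores for low, medium and high energy, times 300."""
    return 300.0 * sum(jaccard_per_energy)


def db_score(databases):
    """50 points per database containing the candidate, 10 for ChEMBL."""
    return float(sum(10 if name.lower() == "chembl" else 50
                     for name in databases))


def total_score(matches, jaccard_per_energy, databases):
    """TOTAL_SCORE = MS2_SCORE + DB_SCORE + MS1_SCORE."""
    return (ms2_score(jaccard_per_energy)
            + db_score(databases)
            + ms1_score(matches))


if __name__ == "__main__":
    perfect_ms1 = {5: (100.0, 100.0, 1.0),
                   10: (100.0, 100.0, 1.0),
                   50: (100.0, 100.0, 1.0)}
    print(total_score(perfect_ms1, [0.5, 0.25, 0.25], ["HMDB", "ChEMBL", "FooDB"]))
```

Under this reading a perfect MS1 match contributes at most 90 points, so the database and MS2 terms can dominate the ranking.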
Participant: Nothias (CFM)
ParticipantID: GNPS with MS in silico tools
Category: category1
Authors: Louis-Felix Nothias (1), Ricardo Silva (1), Florent Olivon (2), Alex Melnik (1), Marc Litaudon (2)
Affiliations: (1) Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92037, USA (2) Institut de Chimie des Substances Naturelles, CNRS-ICSN, University of Paris-Saclay, 1 avenue de la terrasse, 91190, Gif-sur-Yvette, France
Automatic pipeline: partial
Spectral libraries: yes
Abstract: In the frame of CASMI 2016, we used the GNPS (Global Natural Products Social molecular networking) dereplication workflow [1] and tested different combinations of in silico tools for mass spectrometry with three different proposals. The proposal "GnpsCfmIDDnp" was prepared by using: (A) Sirius3 for molecular formula calculation [2]; (B) GNPS for MS/MS spectral matching; (C) CFM-ID for in silico MS/MS spectral matching for challenges 1-19 [3] with a candidate list retrieved from the Dictionary of Natural Products or SciFinder (challenges 5 and 18).
(A) Candidate molecular formulas of challenges 1-19 were calculated using Sirius 3.1 with the provided MS1 and MS/MS peak lists (atoms C, H, N, O, S, P and halogens; 20 ppm max error). Candidate molecular formulas were manually curated based on natural product likeness.
(B) Both MS/MS peak lists and raw MS/MS spectra were converted to .mgf format and uploaded to the GNPS web platform (http://gnps.ucsd.edu). A spectral library search was conducted via a GNPS dereplication workflow (with all spectral libraries available in March 2016). Annotations were confirmed based on the fitting score, inspection of the MS/MS spectral match with a mirror plot, and consistency with the molecular formula from Sirius3. Additionally, searches were conducted with the METLIN [4] and NIST spectral libraries [5]. Then, in silico tools for tandem mass spectrometry were used to establish a list of candidates for each challenge.
(C) CFM-ID was used for challenges 1-14 (positive ion mode) and 15-19 (negative ion mode). A list of candidates was retrieved from the Dictionary of Natural Products or SciFinder (challenge 18) by searching the hypothetical molecular formula(s). Parameters were set as follows: ppm_mass_tol = 20, prob_thresh = 0.001, param_file = metab_se_cfm or negative_metab_se_cfm, score_type = Jaccard, apply_postprocessing = 1. The output score of CFM-ID was used to rank candidates.
Finally, the candidate list for each challenge of the proposal was made by ranking the spectral library hit in first position, followed by the candidates from in silico tools. The candidates for challenges 3, 10 and 18 were found to be non-natural products. Thus, these challenges should be regarded as unannotated.
[1] GNPS - Global Natural Products Social molecular networking, http://gnps.ucsd.edu
[2] Böcker, S.; Dührkop, K. Fragmentation Trees Reloaded. J Cheminform 2016, 8 (1), 1–26.
[3] Allen, F.; Greiner, R.; Wishart, D. Competitive Fragmentation Modeling of ESI-MS/MS Spectra for Putative Metabolite Identification. Metabolomics 2014.
[4] Smith, C. A.; O’Maille, G.; Want, E. J.; Qin, C.; Trauger, S. A.; Brandon, T. R.; Custodio, D. E.; Abagyan, R.; Siuzdak, G. METLIN: A Metabolite Mass Spectral Database. Therapeutic drug monitoring 2005, 27 (6), 747–751.
[5] NIST Mass spectrometry datacenter, http://chemdata.nist.gov
Participant: Nothias-Scaglia
Authors: Louis-Felix Nothias (1), Ricardo Silva (1), Florent Olivon (2), Alex Melnik (1), Marc Litaudon (2)
Affiliations: (1) Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92037, USA (2) Institut de Chimie des Substances Naturelles, CNRS-ICSN, University of Paris-Saclay, 1 avenue de la terrasse, 91190, Gif-sur-Yvette, France
ParticipantID: GNPS with MS in silico tools
Category: category1
Automatic pipeline: partial
Spectral libraries: yes
Abstract: In the frame of CASMI 2016, we used the GNPS (Global Natural Products Social molecular networking) dereplication workflow [1] and tested different combinations of in silico tools for mass spectrometry with three different proposals. The proposal "GnpsCSIFingerID" was prepared by using: (A) Sirius3 for molecular formula calculation [2]; (B) GNPS for MS/MS spectral matching; (C) CSI:FingerID for in silico MS/MS spectral matching for challenges 1-14 [3]; (D) CFM-ID [4] with a candidate list retrieved from the Dictionary of Natural Products or SciFinder for challenges 15-19.
(A) Candidate molecular formulas of challenges 1-19 were calculated using Sirius 3.1 with the provided MS1 and MS/MS peak lists (atoms C, H, N, O, S, P and halogens; 20 ppm max error). Candidate molecular formulas were manually curated based on natural product likeness.
(B) Both MS/MS peak lists and raw MS/MS spectra were converted to .mgf format and uploaded to the GNPS web platform (http://gnps.ucsd.edu). A spectral library search was conducted via a GNPS dereplication workflow (with all spectral libraries available in March 2016). Annotations were confirmed based on the fitting score, inspection of the MS/MS spectral match with a mirror plot, and consistency with the molecular formula from Sirius3. Additionally, searches were conducted with the METLIN [5] and NIST spectral libraries [6]. Then, in silico tools for tandem mass spectrometry were used to establish a list of candidates for each challenge.
(C) CSI:FingerID was used for challenges 1-14 (positive ion mode). The top 10 candidates were considered for the putative molecular formula. The "biological database" filter was not used, and the same candidate rank order was kept (not the match score).
(D) Because negative ion mode is not available in CSI:FingerID, CFM-ID was used for challenges 15-19. A list of candidates was retrieved from the Dictionary of Natural Products or SciFinder (challenge 18) by searching the hypothetical molecular formula(s). The output score of CFM-ID was used to rank candidates.
Finally, the candidate list for each challenge was made by ranking the spectral library hit in first position, followed by the candidates from in silico tools. The candidates for challenges 3, 10 and 18 were found to be non-natural products. Thus, these challenges should be regarded as unannotated. Furthermore, no candidates are proposed for challenge 6, because the hypothetical molecular formula was not available in CSI:FingerID (the mass deviation of the monoisotopic parent ion was above 15 ppm).
[1] GNPS - Global Natural Products Social molecular networking, http://gnps.ucsd.edu
[2] Böcker, S.; Dührkop, K. Fragmentation Trees Reloaded. J Cheminform 2016, 8 (1), 1–26.
[3] Dührkop, K.; Shen, H.; Meusel, M.; Rousu, J.; Böcker, S. Searching Molecular Structure Databases with Tandem Mass Spectra Using CSI:FingerID. PNAS 2015, 112 (41), 12580–12585.
[4] Allen, F.; Greiner, R.; Wishart, D. Competitive Fragmentation Modeling of ESI-MS/MS Spectra for Putative Metabolite Identification. Metabolomics 2014.
[5] Smith, C. A.; O’Maille, G.; Want, E. J.; Qin, C.; Trauger, S. A.; Brandon, T. R.; Custodio, D. E.; Abagyan, R.; Siuzdak, G. METLIN: A Metabolite Mass Spectral Database. Therapeutic drug monitoring 2005, 27 (6), 747–751.
[6] NIST Mass spectrometry datacenter, http://chemdata.nist.gov
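Several of the abstracts state mass tolerances in ppm (for example a 20 ppm maximum error for formula calculation and a 15 ppm deviation that excluded challenge 6). As a reminder of what such a tolerance means, the sketch below computes the relative mass deviation in parts per million. The function names and the example m/z values are illustrative only and are not taken from the contest data.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass deviation of an observed m/z from its theoretical value, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm):
    """True if the absolute deviation does not exceed tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


if __name__ == "__main__":
    # A 0.01 Da offset at m/z 500 corresponds to roughly 20 ppm.
    print(ppm_error(500.01, 500.0))
    print(within_tolerance(500.01, 500.0, 15))   # deviation larger than 15 ppm
    print(within_tolerance(500.005, 500.0, 15))  # roughly 10 ppm
```

Because the tolerance is relative, the same ppm window is narrower in absolute terms for light ions than for heavy ones.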
Participant: Nothias (ISDB UNPD)
ParticipantID: GNPS with MS in silico tools
Category: category1
Authors: Louis-Felix Nothias (1), Ricardo Silva (1), Florent Olivon (2), Alex Melnik (1), Marc Litaudon (2)
Affiliations: (1) Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92037, USA (2) Institut de Chimie des Substances Naturelles, CNRS-ICSN, University of Paris-Saclay, 1 avenue de la terrasse, 91190, Gif-sur-Yvette, France
Automatic pipeline: partial
Spectral libraries: yes
Abstract: In the frame of CASMI 2016, we used the GNPS (Global Natural Products Social molecular networking) dereplication workflow [1] and tested different combinations of in silico tools for mass spectrometry with three different proposals. The proposal "GnpsIsdbUNPD" was prepared by using: (A) Sirius3 for molecular formula calculation [2]; (B) GNPS for MS/MS spectral matching; (C) ISDB-UNPD for in silico MS/MS spectral matching for challenges 1-14 [3]; and (D) CFM-ID [4] with a candidate list retrieved from the Dictionary of Natural Products or SciFinder for challenges 15-19.
(A) Candidate molecular formulas of challenges 1-19 were calculated using Sirius 3.1 with the provided MS1 and MS/MS peak lists (atoms C, H, N, O, S, P and halogens; 20 ppm max error). Candidate molecular formulas were manually curated based on natural product likeness.
(B) Both MS/MS peak lists and raw MS/MS spectra were converted to .mgf format and uploaded to the GNPS web platform (http://gnps.ucsd.edu). A spectral library search was conducted via a GNPS dereplication workflow (with all spectral libraries available in March 2016). Annotations were confirmed based on the fitting score, inspection of the MS/MS spectral match with a mirror plot, and consistency with the molecular formula from Sirius3. Additionally, searches were conducted with the METLIN [5] and NIST spectral libraries [6]. Then, in silico tools for tandem mass spectrometry were used to establish a list of candidates for each challenge.
(C) ISDB-UNPD was used for challenges 1-14 (positive ion mode). All candidates were considered, but those with the putative molecular formula were manually ranked first. Parameters were set as follows: tolerance = 0.05, score threshold = 0.05, Top K results = 100.
(D) Because negative ion mode is not available in ISDB-UNPD, CFM-ID was used for challenges 15-19. A list of candidates was retrieved from the Dictionary of Natural Products or SciFinder (challenge 18) by searching the hypothetical molecular formula(s). The output score of CFM-ID was used to rank candidates.
Finally, the candidate list for each challenge of the proposal was made by ranking the GNPS spectral database hit in first position, followed by the candidates from in silico tools. The candidates for challenges 3, 10 and 18 were found to be non-natural products. Thus, these challenges should be regarded as unannotated.
[1] GNPS - Global Natural Products Social molecular networking, http://gnps.ucsd.edu
[2] Böcker, S.; Dührkop, K. Fragmentation Trees Reloaded. J Cheminform 2016, 8 (1), 1–26.
[3] Allard, P.-M.; Péresse, T.; Bisson, J.; Gindro, K.; Marcourt, L.; Pham, V. C.; Roussi, F.; Litaudon, M.; Wolfender, J.-L. Integration of Molecular Networking and In-Silico MS/MS Fragmentation for Natural Products Dereplication. Anal. Chem. 2016.
[4] Allen, F.; Greiner, R.; Wishart, D. Competitive Fragmentation Modeling of ESI-MS/MS Spectra for Putative Metabolite Identification. Metabolomics 2014.
[5] Smith, C. A.; O’Maille, G.; Want, E. J.; Qin, C.; Trauger, S. A.; Brandon, T. R.; Custodio, D. E.; Abagyan, R.; Siuzdak, G. METLIN: A Metabolite Mass Spectral Database. Therapeutic drug monitoring 2005, 27 (6), 747–751.
[6] NIST Mass spectrometry datacenter, http://chemdata.nist.gov
Participant: Nikolic
Author: Dejan Nikolic
Affiliations: UIC/NIH Center for Botanical Dietary Supplements Research, Department of Medicinal Chemistry & Pharmacognosy, College of Pharmacy, University of Illinois at Chicago
ParticipantID: Nikolic
Category: Category1
Automatic methods: No
Abstract: Structure candidates were determined on a case-by-case basis using a manual method outlined in the previous publication from the CASMI 2013 contest (1). The method involves searching for the elemental composition in the SciFinder and Reaxys databases, restricting the hits to naturally occurring compounds. Publicly available spectral libraries such as MassBank, METLIN and ReSpect were also consulted. Hits returned from the searches were manually scrutinized by attempting to rationalize the experimental spectrum with the candidate structures. For ranking candidate structures, a subjective confidence scale from 0.60 to 1.00 was used. The overall confidence in the assignment was assessed based on several factors, including the spectral library match (if applicable), the ability to rationalize as many fragment ions as possible, and the overall experience in working with a particular class of compounds. The confidence scale ranking brackets are defined as follows:
1.00: Full confidence that the single candidate is the correct structure.
0.90 to 0.99: High confidence that the candidate is the correct structure.
0.80 to 0.89: Good confidence that the candidate is the correct structure.
0.70 to 0.79: Fair confidence that the candidate is the correct structure.
0.60 to 0.69: Poor confidence that the candidate is the correct structure.
For some challenges (e.g. Ch 4, 6, 8, 14) the data could fit several structural isomers equally well, which reduces the overall confidence that the highest-ranking candidate is the correct structure.
It was noted that for some of the originally posted challenges (1-9) there is a discrepancy between the raw data in the original manufacturer's format and the peak list provided. In those cases the original file was used for evaluation.
Reference: (1) Newsome, A. and Nikolic D. CASMI 2013: Identification of small molecules by tandem mass spectrometry combined with database and literature mining; Mass Spectrometry 3, S0034 (2014)
Participant: Bertrand
Authors: Bertrand, Samuel (1)
Affiliations: (1) Groupe Mer, Molécules, Santé-EA 2160, UFR des Sciences Pharmaceutiques et Biologiques, Université de Nantes, France
ParticipantID: SamuelBLCMS
Category: category1
Automatic methods: yes
Spectral libraries: no
Abstract: The challenge data were automatically processed using R, XCMS [1], IPO [2], CAMERA [3], SIRIUS3 [4], MeHaloCoA [5], RMassBank [6] and CFM-ID [7] as follows, and stored in a MySQL database throughout the process:
1- LC-MS data were converted to centroid mode using ProteoWizard if necessary.
2- LC peak detection was achieved using XCMS after peak detection optimisation with IPO. For some challenges, peak detection was optimised manually.
3- For the challenge-related peak, neutral losses and adducts were searched within PCgroups using CAMERA.
4- MS2 spectra of the ions related to each challenge were retrieved using RMassBank.
5- For each challenge, molecular formulas were obtained using SIRIUS3 and discriminated based on isotopic distribution, MS2 fragmentation (calculated by SIRIUS) and adduct redundancy (number of occurrences of the MF among all adducts over the maximum number of occurrences of a MF among all proposed MF). The presence of S, Cl and Br atoms was automatically detected using MeHaloCoA.
6- Molecular formulas of compounds (corrected for adduct information) were searched in various databases, looking for CAS number, InChI, InChIKey, SMILES and Mol: AntiBase, ChEBI, DNP, DMNP, KNAPSACK, UNPD, KEGG, LipidMaps. For each compound found in the databases, missing data were completed (as much as possible) using OpenBabel [8], CTS [9], CACTUS [10] and ChemSpider [11].
7- The similarity between simulated and measured MS2 spectra was evaluated and scored using CFM-ID (when possible).
8- The final score (SF) was calculated from the MF score (SMF) and the MS2 similarity score (SMS2) as follows: SF=SMF+SMS2.
Note: when peak detection was not successful (Challenges 1-2, 5-9 and 16), the raw MS spectra (available on the CASMI website) were manually introduced for calculation. No structures were submitted for Challenges 3 and 8 due to the absence of structures in the databases.
Bibliography: [1] R. Tautenhahn, et al., BMC Bioinf., 2008, 9, 504. [2] G. Libiseller, et al., BMC Bioinf., 2015, 16, 118. [3] C. Kuhl, et al., Anal. Chem., 2012, 84, 283. [4] S. Böcker, et al., Bioinformatics, 2009, 25, 218. [5] http://yguitton.github.io/MeHaloCoA/ [6] http://bioconductor.org/packages/RMassBank/ [7] F. Allen, et al., Metabolomics, 2014, 11, 98. [8] N. O'Boyle, et al., J. Cheminformatics, 2011, 3, 33. [9] G. Wohlgemuth, et al., Bioinformatics, 2010, 26, 2647. [10] http://cactus.nci.nih.gov/chemical/structure [11] H.E. Pence, et al., Journal of Chemical Education, 2010, 87, 1123.
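The "adduct redundancy" term and the final score SF=SMF+SMS2 from the Bertrand abstract can be illustrated with a short sketch. It is written in Python for illustration (the original pipeline was run in R), the function names and the example formulas are hypothetical, and it assumes that a formula is counted at most once per adduct. How the isotopic, MS2 and redundancy terms are combined into SMF is not specified in the abstract, so SMF is taken as an input.

```python
from collections import Counter


def adduct_redundancy(formulas_per_adduct):
    """Occurrences of each molecular formula (MF) among all adducts,
    divided by the maximum number of occurrences of any proposed MF.

    formulas_per_adduct: dict mapping an adduct label to the candidate
    neutral formulas proposed for that adduct.
    """
    # Assumption: a formula counts once per adduct, even if listed twice.
    counts = Counter(mf for formulas in formulas_per_adduct.values()
                     for mf in set(formulas))
    if not counts:
        return {}
    most = max(counts.values())
    return {mf: n / most for mf, n in counts.items()}


def final_score(smf, sms2):
    """SF = SMF + SMS2, as stated in the abstract."""
    return smf + sms2


if __name__ == "__main__":
    proposals = {
        "[M+H]+": ["C6H12O6", "C7H8N2"],
        "[M+Na]+": ["C6H12O6"],
        "[M+NH4]+": ["C6H12O6", "C5H4"],
    }
    for mf, value in sorted(adduct_redundancy(proposals).items()):
        print(mf, round(value, 3))
```

A formula supported by every detected adduct gets a redundancy of 1, while one proposed for a single adduct out of three gets 1/3.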
Participant: Bertrand (manual)
Authors: Bertrand, Samuel(1)
Affiliations: (1) Groupe Mer, Molécules, Santé-EA 2160, UFR des Sciences Pharmaceutiques et Biologiques, Université de Nantes, France
ParticipantID: SamuelBMS
Category: category1
Automatic methods: yes
Spectral libraries: no
Abstract
The challenge data were processed automatically using R, CAMERA [1], SIRIUS3 [2], MeHaloCoA [3], RMassBank [4] and CFM-ID [5] as follows, and stored in a MySQL database throughout the analysis:
1- MS data were manually entered into the DB.
2- Neutral losses and adducts were searched within MS1 spectra using CAMERA.
3- For each challenge, molecular formulas (MF) were obtained using SIRIUS3 and discriminated based on isotopic distribution, MS2 fragmentation (calculated by SIRIUS) and adduct redundancy (number of occurrences of the MF among all adducts divided by the maximum number of occurrences of any MF among all proposed MFs). The presence of S, Cl and Br atoms was automatically detected using MeHaloCoA.
4- The molecular formulas of the compounds (corrected for adduct information) were searched in the following databases to retrieve CAS number, InChI, InChIKey, SMILES and Mol data: AntiBase, ChEBI, DNP, DMNP, KNAPSACK, UNPD, KEGG, LipidMaps. For each compound found in the databases, missing data were completed (as far as possible) using OpenBabel [6], CTS [7], CACTUS [8] and ChemSpider [9].
5- The similarity between simulated and measured MS2 spectra was evaluated and scored using CFM-ID (when possible).
6- The final score (SF) was calculated from the MF score (SMF) and the MS2 similarity score (SMS2) as follows: SF=SMF+SMS2.
Note: No structures were submitted for Challenges 3 and 8 due to the absence of structures in the DB.
Bibliography:
[1] C. Kuhl, et al., Anal. Chem., 2012, 84, 283.
[2] S. Böcker, et al., Bioinformatics, 2009, 25, 218.
[3] http://yguitton.github.io/MeHaloCoA/
[4] http://bioconductor.org/packages/RMassBank/
[5] F. Allen, et al., Metabolomics, 2014, 11, 98.
[6] N. O'Boyle, et al., J. Cheminformatics, 2011, 3, 33.
[7] G. Wohlgemuth, et al., Bioinformatics, 2010, 26, 2647.
[8] http://cactus.nci.nih.gov/chemical/structure
[9] H.E. Pence, et al., Journal of Chemical Education, 2010, 87, 1123.
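The adduct redundancy and the final score described in the Bertrand abstract above can be illustrated with a minimal Python sketch. It is not the participant's code (the submission used R and a MySQL database). The function names, the input layout (one list of candidate neutral formulas per adduct) and the example formulas are assumptions made for illustration. How the isotopic, MS2 and redundancy terms are combined into SMF is not stated in the abstract, so SMF is taken here as a given number.

```python
from collections import Counter


def adduct_redundancy(formulas_per_adduct):
    """Return, for each molecular formula (MF), its number of occurrences
    among all adducts divided by the maximum number of occurrences of any MF.

    formulas_per_adduct: dict mapping an adduct label to the candidate
    neutral formulas proposed for it (hypothetical input layout).
    """
    counts = Counter()
    for formulas in formulas_per_adduct.values():
        # Count each formula at most once per adduct.
        counts.update(set(formulas))
    if not counts:
        return {}
    max_count = max(counts.values())
    return {mf: n / max_count for mf, n in counts.items()}


def final_score(s_mf, s_ms2=0.0):
    """SF = SMF + SMS2; SMS2 defaults to 0 when no MS2 score is available
    (the default is an assumption, the abstract only says 'when possible')."""
    return s_mf + s_ms2


if __name__ == "__main__":
    # Hypothetical candidates for one challenge.
    candidates = {
        "[M+H]+": ["C10H12O3", "C9H8N2O3"],
        "[M+Na]+": ["C10H12O3"],
        "[M+NH4]+": ["C10H12O3", "C9H8N2O3"],
    }
    print(adduct_redundancy(candidates))
    print(final_score(0.5, 0.25))
```

A formula supported by every observed adduct gets a redundancy of 1, and formulas supported by fewer adducts are scaled down proportionally.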
Participant: Kind
Authors: Tobias Kind
Affiliations: UC Davis Genome Center - Metabolomics
ParticipantID: tkind
Category: category1
Automatic methods: no
Abstract
This is a submission for the http://www.casmi-contest.org/2016/ Category 1: Best Structure Identification on Natural Products.
The challenges for Category 1 are natural products from several organisms of different possible origin (plants, fungi, marine sponges, algae or micro-algae), acquired on QToF instruments from Waters and Agilent. Based on the MS, MS/MS and other data, the goal is to determine the correct molecular structure at the given retention time using the spectral data and the additional information provided.
(1) Molecular formulas were determined with the Seven Golden Rules [http://fiehnlab.ucdavis.edu/projects/Seven_Golden_Rules] and Sirius [https://bio.informatik.uni-jena.de/software/sirius/]. In some cases the provided data were not sufficient and additional data were extracted from the raw files using ProteoWizard and MZMine.
(2) Formulae were then queried in the Dictionary of Natural Products [http://dnp.chemnetbase.com/] and UNPD [http://pkuxxj.pku.edu.cn/UNPD/] as well as ChemSpider [http://www.chemspider.com/] and REAXYS [https://www.reaxys.com] to obtain molecular structures.
(3) Candidate molecules obtained from the natural product databases were downloaded as SMILES or InChI and InChIKey and then submitted to different programs to rank them. CFM-ID was used to generate MS/MS spectra [https://sourceforge.net/projects/cfm-id/]. Additionally, the MS-Finder software [http://prime.psc.riken.jp/Metabolomics_Software/] and CSI-FingerID [http://www.csi-fingerid.org/] were used for compound ranking. Subsequently, all compound data were converted into MGF format, and the MS/MS spectra were submitted to a NIST14 GUI MS/MS database search and manual peak inspection. In some cases, additional neutral losses and characteristic product ion peaks were investigated with the MS-Finder GUI.
This manual process of compound annotation is highly unsustainable, error-prone, frustrating and time-consuming. Fully automated processes have to be developed. More importantly, completely unknown compounds cannot be elucidated with this workflow, because MS/MS data and retention time are not sufficient for complete structure elucidation.
Participant: Allard (ISDB UNPD)
Authors: Allard, Pierre-Marie(1) and Houriet, Joëlle(1)
Affiliations: (1) Laboratory of Phytochemistry and Bioactive Natural Products, School of Pharmaceutical Sciences, University of Geneva, Quai-Ernest Ansermet 30, 1211 Geneva, Switzerland
ParticipantID: pmaUNPDISDB
Category: category1
Automatic pipeline: yes
Spectral libraries: yes
Abstract:
We processed only data of category 1, in positive mode (challenges 1 to 14).
Data conversion: Data of challenges 1 to 9 were converted to .mzXML format using ProteoWizard. Fragmentation spectra of the ions of interest were extracted and saved as .mgf files. The parent ion mass in the .mgf files was corrected to fit the exact mass of the ion of interest when necessary.
Molecular network generation: (A molecular network was generated to assess possible structural relationships between metabolites and to generate a common .mgf file.) All .mgf files (challenges 1 to 14) were uploaded to the GNPS servers (http://gnps.ucsd.edu) and processed in the data treatment workflow using the following parameters: the data were clustered with MS-Cluster with a parent mass tolerance of 0.8 Da and an MS/MS fragment ion tolerance of 0.5 Da to create consensus spectra. A network was then created in which edges were filtered to have a cosine score above 0.7 and more than 6 matched peaks. Further, edges between two nodes were kept in the network if, and only if, each of the nodes appeared in the other's respective top 10 most similar nodes.
In-Silico Database (ISDB) spectral match: The UNPD-ISDB was created from data of the UNPD database (http://pkuxxj.pku.edu.cn/UNPD/). The ISDB was generated using cfm-id (https://sourceforge.net/projects/cfm-id/) as described in: Allard, P.-M.; Péresse, T.; Bisson, J.; Gindro, K.; Marcourt, L.; Pham, V. C.; Roussi, F.; Litaudon, M.; Wolfender, J.-L. Anal. Chem. 2016, 88, 3317–3323.
The clustered .mgf file obtained was searched against both ISDBs using tremolo (http://proteomics.ucsd.edu/Software/Tremolo/) for the spectral match, and an in-house script was used for annotation of the hits. The spectral search was made using the following parameters:
tolerance.PM_tolerance=0.005 (0.001 for Q-Exactive acquired spectra)
SCORE_THRESHOLD=0.1
TOP_K_RESULTS=10
The detailed workflow to perform the spectral match, the scripts and the UNPD-ISDB are available here: http://oolonek.github.io/ISDB/
Filtering of hits: Hits were reported as InChI codes. Stereochemistry was then cleared (using JChem Standardizer from ChemAxon) and duplicate entries were removed.
Note: no molecular formula calculation is done in the process.
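The edge-filtering rules quoted in the molecular network description above (cosine score above 0.7, more than 6 matched peaks, and mutual membership in each other's top 10 most similar nodes) can be sketched as follows. This is only an illustration under stated assumptions, not GNPS code: the edge tuple layout `(node_a, node_b, cosine, matched_peaks)` and the function name are hypothetical, and the top-10 ranking is assumed to be computed over the edges that already passed the cosine and matched-peak thresholds.

```python
from collections import defaultdict


def filter_network_edges(edges, min_cosine=0.7, min_matched=6, top_k=10):
    """Keep edges that pass the score thresholds and the mutual top-k rule.

    edges: iterable of (node_a, node_b, cosine, matched_peaks) tuples
    (hypothetical layout chosen for this sketch).
    """
    # Step 1: cosine score above the threshold and more than min_matched peaks.
    kept = [e for e in edges if e[2] > min_cosine and e[3] > min_matched]

    # Step 2: collect each node's neighbours with their cosine scores.
    neighbours = defaultdict(list)
    for a, b, cosine, _ in kept:
        neighbours[a].append((cosine, b))
        neighbours[b].append((cosine, a))

    # Step 3: the top-k most similar neighbours of every node.
    top = {
        node: {
            other
            for _, other in sorted(pairs, key=lambda p: p[0], reverse=True)[:top_k]
        }
        for node, pairs in neighbours.items()
    }

    # Step 4: an edge survives only if each node is in the other's top-k set.
    return [e for e in kept if e[1] in top[e[0]] and e[0] in top[e[1]]]
```

With only 14 challenge spectra, the top-10 rule rarely removes anything; it matters mainly for larger networks where hub nodes would otherwise connect to many weakly related spectra.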
Participant: Allard (ISDB DNP)
Authors: Allard, Pierre-Marie(1) and Houriet, Joëlle(1)
Affiliations: (1) Laboratory of Phytochemistry and Bioactive Natural Products, School of Pharmaceutical Sciences, University of Geneva, Quai-Ernest Ansermet 30, 1211 Geneva, Switzerland
ParticipantID: pmaDNPISDB
Category: category1
Automatic pipeline: yes
Spectral libraries: yes
Abstract:
We processed only data of category 1, in positive mode (challenges 1 to 14).
Data conversion: Data of challenges 1 to 9 were converted to .mzXML format using ProteoWizard. Fragmentation spectra of the ions of interest were extracted and saved as .mgf files. The parent ion mass in the .mgf files was corrected to fit the exact mass of the ion of interest when necessary.
Molecular network generation: (A molecular network was generated to assess possible structural relationships between metabolites and to generate a common .mgf file.) All .mgf files (challenges 1 to 14) were uploaded to the GNPS servers (http://gnps.ucsd.edu) and processed in the data treatment workflow using the following parameters: the data were clustered with MS-Cluster with a parent mass tolerance of 0.8 Da and an MS/MS fragment ion tolerance of 0.5 Da to create consensus spectra. A network was then created in which edges were filtered to have a cosine score above 0.7 and more than 6 matched peaks. Further, edges between two nodes were kept in the network if, and only if, each of the nodes appeared in the other's respective top 10 most similar nodes.
In-Silico Database (ISDB) spectral match: This ISDB was created from data of the Dictionary of Natural Products. The DNP-ISDB was generated using cfm-id (https://sourceforge.net/projects/cfm-id/) as described in: Allard, P.-M.; Péresse, T.; Bisson, J.; Gindro, K.; Marcourt, L.; Pham, V. C.; Roussi, F.; Litaudon, M.; Wolfender, J.-L. Anal. Chem. 2016, 88, 3317–3323.
The clustered .mgf file obtained was searched against both ISDBs using tremolo (http://proteomics.ucsd.edu/Software/Tremolo/) for the spectral match, and an in-house script was used for annotation of the hits. The spectral search was made using the following parameters:
tolerance.PM_tolerance=0.005 (0.001 for Q-Exactive acquired spectra)
SCORE_THRESHOLD=0.1
TOP_K_RESULTS=10
The detailed workflow to perform the spectral match, the scripts and the UNPD-ISDB are available here: http://oolonek.github.io/ISDB/
Filtering of hits: Hits were reported as InChI codes. Stereochemistry was then cleared (using JChem Standardizer from ChemAxon) and duplicate entries were removed.
Note: no molecular formula calculation is done in the process.
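To make the role of the three search parameters and the duplicate removal in the Allard abstract above concrete, here is a small Python sketch of one plausible reading. It is not Tremolo or the authors' in-house script. The candidate layout `(identifier, precursor_mz, score)`, the function names, and the inclusive comparisons are assumptions. Stereochemistry clearing is assumed to have happened upstream, so identifiers are treated as already stereo-free strings.

```python
def select_hits(query_mz, candidates, pm_tolerance=0.005,
                score_threshold=0.1, top_k=10):
    """Return up to top_k candidates whose precursor m/z lies within
    pm_tolerance of the query and whose score reaches score_threshold,
    sorted by decreasing score.

    candidates: iterable of (identifier, precursor_mz, score) tuples
    (hypothetical layout chosen for this sketch).
    """
    passing = [
        c for c in candidates
        if abs(c[1] - query_mz) <= pm_tolerance and c[2] >= score_threshold
    ]
    passing.sort(key=lambda c: c[2], reverse=True)
    return passing[:top_k]


def remove_duplicates(hits):
    """Keep the first (best-scoring) hit for each identifier, preserving order."""
    seen = set()
    unique = []
    for hit in hits:
        if hit[0] not in seen:
            seen.add(hit[0])
            unique.append(hit)
    return unique
```

For spectra acquired on a Q-Exactive, the abstract's tighter setting would correspond to calling `select_hits(..., pm_tolerance=0.001)`.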
Participant: Ruttkies (MetFrag+CFM)