News

March 29th, 2017
The CASMI 2016 Cat 2+3 paper is out!

Jan 20th, 2017
Organisation of CASMI 2017 is underway; stay tuned!

Dec 4th, 2016
The MS1 peak lists for Category 2+3 have been added for completeness.

May 6th, 2016
The winners and full results are available.

April 25th, 2016
The solutions are public now.

April 18th, 2016
The contest is now closed; the results are fantastic and will be published soon!

April 9th, 2016
All teams who submit before the April 11th deadline will be allowed to update their submissions until Friday the 15th.

February 12th, 2016
New categories 2 and 3 and data for automatic methods released. 10 new challenges in category 1.

January 25th, 2016
E. Schymanski and S. Neumann joined the organising team, additional contest data coming soon.

January 11th, 2016
New CASMI 2016 raw data files are available.


Extra results in Category 3
The "extra" evaluations include all submissions received after the contest deadline had passed, as well as the results of Christoph Ruttkies, who is considered an internal participant.

We also offer to run future submissions through the evaluation pipeline and post the results here. Please note that, unlike the contest entries, such future submissions will have been performed after the release of the solutions.

Summary of Challenge wins

                               Gold  Silver  Bronze  Gold (neg)  Gold (pos)
Allen                           124      47      22          53          71
Allen (retrained)               128      45      22          53          75
Ruttkies (MF+RT+Ref)            168      16      13          68         100
Ruttkies (MF+CFM+RT+Ref)        174      14      10          70         104
Ruttkies (MF+CFM+RT+Ref+MoNA)   167      23      11          64         103
Kind                            148      22      11          59          89

Summary statistics per participant

Participant                    Mean rank  Median rank  Top  Top3  Top10  Mean RRP  Median RRP
Allen                              14.00          1.0  117   159    182     0.969       1.000
Allen (retrained)                  13.62          1.0  120   160    182     0.971       1.000
Ruttkies (MF+RT+Ref)                7.04          1.0  162   183    191     0.987       1.000
Ruttkies (MF+CFM+RT+Ref)            5.39          1.0  163   180    199     0.989       1.000
Ruttkies (MF+CFM+RT+Ref+MoNA)       4.25          1.0  155   182    194     0.990       1.000
Kind                                6.40          1.0  146   162    174     0.904       1.000
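
The Top / Top3 / Top10 counts can be recomputed from the per-challenge ranks listed further down. A minimal sketch, assuming (as a simplification) that challenges where the correct candidate was absent ("-") are excluded from the mean and median; the RRP columns require the full candidate list lengths and are omitted:

```python
from statistics import mean, median

def summarise(ranks):
    """Summarise a participant's per-challenge ranks; None marks a challenge
    whose candidate list did not contain the correct structure ("-")."""
    found = [r for r in ranks if r is not None]
    return {
        "mean_rank": round(mean(found), 2),
        "median_rank": median(found),
        "top1": sum(r <= 1 for r in found),   # rank 1 (ties like 1.5 do not count)
        "top3": sum(r <= 3 for r in found),
        "top10": sum(r <= 10 for r in found),
    }
```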

Summary of Rank by Challenge and Participant

For each challenge, the rank of the winner(s) is highlighted in bold. If a submission did not contain the correct candidate, this is denoted by "-". If a participant did not take part in a challenge, the cell is left empty. The tables can be sorted by clicking on the column headers.

This summary is also available as a CSV download.

Allen Allen (retrained) Ruttkies (MF+RT+Ref) Ruttkies (MF+CFM+RT+Ref) Ruttkies (MF+CFM+RT+Ref+MoNA) Kind
challenge-001 1.5 1.5 3.5 6.0 38.0 2.0
challenge-002 2.0 2.0 3.0 3.0 2.0 4.0
challenge-003 7.5 7.5 3.0 3.0 5.0 16.5
challenge-004 1.5 1.5 3.0 3.0 9.0 2.0
challenge-005 1.0 1.0 1.0 1.0 1.0 1.0
challenge-006 14.0 14.0 10.0 9.0 15.0 18.0
challenge-007 1.0 1.0 1.0 1.0 1.0 1.0
challenge-008 1.0 1.0 1.0 1.0 1.0 1.0
challenge-009 1.0 1.0 1.0 1.0 1.0 1.0
challenge-010 6.0 6.0 2.0 2.0 1.0 27.5
challenge-011 1.0 1.0 1.0 1.0 1.0 1.0
challenge-012 1.0 1.0 1.0 1.0 1.0 1.0
challenge-013 2.0 2.0 1.0 1.0 1.0 1.0
challenge-014 19.5 19.5 1.0 1.0 1.0 1.0
challenge-015 1.0 1.0 1.0 1.0 1.0 1.0
challenge-016 1.0 1.0 1.0 1.0 1.0 1.0
challenge-017 1.0 1.0 1.0 1.0 1.0 4.5
challenge-018 1.0 1.0 1.0 1.0 1.0 1.0
challenge-019 1.0 1.0 1.0 1.0 1.0 1.0
challenge-020 3.0 3.0 2.0 2.0 1.0 2.0
challenge-021 1.0 1.0 1.0 1.0 1.0 1.0
challenge-022 7.0 7.0 2.0 2.0 3.0 4.0
challenge-023 1.5 1.5 2.0 2.0 3.0 2.0
challenge-024 1.0 1.0 1.0 1.0 1.0 1.0
challenge-025 1.0 1.0 1.0 1.0 1.0 1.0
challenge-026 1.5 1.5 1.0 1.0 1.0 44.0
challenge-027 94.5 94.5 1.0 1.0 27.0 94.5
challenge-028 7.0 7.0 1.0 1.0 1.0 1.0
challenge-029 1.0 1.0 1.0 1.0 2.0 1.0
challenge-030 1.0 1.0 1.0 1.0 1.0 1.0
challenge-031 1.0 1.0 1.0 1.0 1.0 1.0
challenge-032 42.0 42.0 1.0 1.0 3.0 58.0
challenge-033 4.0 4.0 1.0 1.0 2.0 1.0
challenge-034 3.0 3.0 1.0 1.0 1.0 1.0
challenge-035 1.0 1.0 1.0 1.0 1.0 6.5
challenge-036 1170.5 1170.5 1.0 1.0 1.0 1.0
challenge-037 1.0 1.0 1.0 1.0 1.0 1.0
challenge-038 1.0 1.0 1.0 1.0 1.0 1.0
challenge-039 5.0 5.0 5.0 9.0 2.0 9.5
challenge-040 2.0 2.0 1.0 1.0 1.0 1.0
challenge-041 437.0 437.0 1.0 1.0 1.0 65.5
challenge-042 1.5 1.5 2.0 2.0 4.0 2.0
challenge-043 1.5 1.5 5.0 6.0 35.0 3.0
challenge-044 1.0 1.0 1.0 1.0 1.0 1.0
challenge-045 3.0 3.0 1.0 1.0 14.0 1.0
challenge-046 8.5 8.5 2.0 2.0 2.0 94.0
challenge-047 136.0 136.0 1.0 1.0 1.0 3.0
challenge-048 1.0 1.0 1.0 1.0 1.0 1.0
challenge-049 1.5 1.5 1.0 1.0 1.0 1.0
challenge-050 1.0 1.0 1.0 1.0 1.0 1.0
challenge-051 1.0 1.0 1.0 1.0 1.0 1.0
challenge-052 1.0 1.0 1.0 1.0 2.0 18.0
challenge-053 1.0 1.0 1.0 1.0 4.0 1.0
challenge-054 3.0 3.0 1.0 1.0 11.0 1.0
challenge-055 1.0 1.0 1.0 1.0 1.0 1.0
challenge-056 1.0 1.0 1.0 1.0 1.0 1.0
challenge-057 1.0 1.0 1.0 1.0 1.0 1.0
challenge-058 1.0 1.0 2.0 1.0 1.0 1.0
challenge-059 1.0 1.0 1.0 1.0 1.0 1.0
challenge-060 2.0 2.0 1.0 1.0 1.0 1.0
challenge-061 1.0 1.0 1.0 1.0 1.0 1.0
challenge-062 1.0 1.0 1.0 1.0 1.0 1.0
challenge-063 1.0 1.0 1.0 1.0 1.0 1.0
challenge-064 1.0 1.0 1.0 1.0 1.0 1.0
challenge-065 2.0 2.0 1.0 1.0 1.0 1.0
challenge-066 4.5 4.5 1.0 1.0 1.0 1.0
challenge-067 1.0 1.0 1.0 1.0 1.0 1.0
challenge-068 1.0 1.0 1.0 1.0 1.0 1.0
challenge-069 1.0 1.0 1.0 1.0 1.0 1.0
challenge-070 1.0 1.0 1.0 1.0 1.0 1.0
challenge-071 1.0 1.0 1.0 1.0 1.0 1.0
challenge-072 1.0 1.0 1.0 1.0 1.0 1.0
challenge-073 1.0 1.0 1.0 1.0 1.0 1.0
challenge-074 1.0 1.0 1.0 1.0 1.0 1.0
challenge-075 1.0 1.0 1.0 1.0 1.0 1.0
challenge-076 1.0 1.0 1.0 1.0 1.0 2.0
challenge-077 2.0 2.0 55.0 51.0 5.0 1.0
challenge-078 7.0 7.0 1.0 1.0 1.0 1.0
challenge-079 1.0 1.0 1.0 1.0 1.0 1.0
challenge-080 1.0 1.0 1.0 1.0 1.0 1.0
challenge-081 2.0 2.0 2.0 2.0 1.0 1.0
challenge-082 1.0 1.0 1.0 1.0 1.0 1.0
challenge-083 27.0 19.0 1.0 1.0 1.0 50.0
challenge-084 1.0 1.0 1.0 1.0 1.0 1.0
challenge-085 7.0 4.0 1.0 1.0 1.0 22.0
challenge-086 1.0 1.0 1.0 1.0 1.0 1.0
challenge-087 35.5 85.0 12.0 4.0 3.0 -
challenge-088 1.0 1.0 1.0 1.0 1.0 1.0
challenge-089 77.0 123.0 17.0 7.0 8.0 -
challenge-090 38.5 13.0 1.0 1.0 1.0 1.0
challenge-091 13.0 14.0 13.0 10.0 12.0 4.5
challenge-092 23.0 23.0 120.0 109.0 22.0 13.0
challenge-093 1.0 1.0 1.0 1.0 1.0 1.0
challenge-094 1.0 1.0 1.0 1.0 1.0 1.0
challenge-095 2.0 3.0 1.0 1.0 1.0 1.0
challenge-096 1.0 1.0 1.0 1.0 1.0 1.0
challenge-097 1.0 1.0 1.0 1.0 1.0 1.0
challenge-098 1.0 1.0 1.0 1.0 1.0 1.0
challenge-099 1.0 1.0 1.0 1.0 1.0 1.0
challenge-100 21.5 7.0 56.0 7.0 2.0 14.0
challenge-101 11.0 10.0 75.0 4.0 7.0 -
challenge-102 2.0 2.0 1.0 1.0 1.0 1.0
challenge-103 6.0 6.0 3.0 5.0 2.0 13.0
challenge-104 6.0 3.0 2.0 2.0 1.0 1.0
challenge-105 1.0 1.0 1.0 1.0 1.0 1.0
challenge-106 2.0 2.0 1.0 1.0 1.0 1.0
challenge-107 1.0 1.0 1.0 1.0 1.0 77.0
challenge-108 1.0 1.0 1.0 1.0 1.0 1.0
challenge-109 1.0 1.0 1.0 1.0 1.0 1.0
challenge-110 1.0 1.0 1.0 1.0 1.0 1.0
challenge-111 1.0 1.0 1.0 1.0 1.0 1.0
challenge-112 2.0 1.0 1.0 1.0 1.0 1.0
challenge-113 3.0 4.0 1.0 1.0 1.0 1.5
challenge-114 1.0 1.0 1.0 1.0 1.0 1.0
challenge-115 1.0 1.0 1.0 1.0 1.0 1.0
challenge-116 1.5 2.0 1.0 1.0 2.0 35.0
challenge-117 1.0 1.0 1.0 1.0 1.0 1.0
challenge-118 1.0 1.0 1.0 1.0 1.0 1.0
challenge-119 125.0 45.0 14.0 15.0 9.0 -
challenge-120 2.0 2.0 3.0 9.0 3.0 1.0
challenge-121 3.0 3.0 1.0 1.0 1.0 1.0
challenge-122 1.0 1.0 1.0 1.0 1.0 1.0
challenge-123 1.0 1.0 1.0 1.0 1.0 1.0
challenge-124 1.0 1.0 1.0 1.0 1.0 1.0
challenge-125 2.0 1.0 1.0 1.0 1.0 25.0
challenge-126 20.0 6.0 16.0 6.0 7.0 -
challenge-127 65.0 1.0 1.0 1.0 1.0 -
challenge-128 1.0 1.0 1.0 1.0 1.0 3.0
challenge-129 8.0 28.0 2.0 2.0 4.0 3.5
challenge-130 1.0 1.0 1.0 1.0 1.0 1.0
challenge-131 51.5 28.0 113.0 5.0 20.0 -
challenge-132 1.0 1.0 1.0 1.0 1.0 1.0
challenge-133 1.0 1.0 1.0 1.0 1.0 1.0
challenge-134 1.0 1.0 1.0 1.0 2.0 2.0
challenge-135 2.0 2.0 3.0 2.0 2.0 22.0
challenge-136 4.0 21.0 1.0 1.0 1.0 47.0
challenge-137 1.0 1.0 1.0 1.0 1.0 1.0
challenge-138 2.0 2.0 1.0 1.0 1.0 1.0
challenge-139 1.0 1.0 1.0 1.0 1.0 1.0
challenge-140 9.0 8.0 1.0 1.0 1.0 1.0
challenge-141 6.0 6.0 1.0 1.0 1.0 1.0
challenge-142 1.0 1.0 1.0 1.0 1.0 1.0
challenge-143 3.0 3.0 1.0 1.0 1.0 1.0
challenge-144 1.0 2.0 1.0 1.0 1.0 1.0
challenge-145 1.0 1.0 1.0 1.0 1.0 1.0
challenge-146 2.0 2.0 10.0 3.0 3.0 1.0
challenge-147 1.0 1.0 1.0 1.0 1.0 1.0
challenge-148 1.0 2.0 1.0 1.0 1.0 1.0
challenge-149 1.0 1.0 1.0 1.0 1.0 1.0
challenge-150 1.0 1.0 1.0 1.0 1.0 1.0
challenge-151 1.0 1.0 1.0 1.0 1.0 1.0
challenge-152 1.0 1.0 1.0 1.0 1.0 1.0
challenge-153 2.0 3.0 1.0 1.0 1.0 6.0
challenge-154 6.0 19.0 1.0 1.0 1.0 23.0
challenge-155 1.0 1.0 1.0 1.0 1.0 1.0
challenge-156 1.0 1.0 1.0 1.0 1.0 1.0
challenge-157 15.0 15.0 5.0 4.0 3.0 8.0
challenge-158 1.0 1.0 1.0 1.0 1.0 1.0
challenge-159 30.0 32.0 1.0 1.0 1.0 6.0
challenge-160 10.0 4.0 2.0 3.0 2.0 1.0
challenge-161 1.0 1.0 3.0 4.0 3.0 3.0
challenge-162 9.0 7.0 1.0 1.0 1.0 3.0
challenge-163 3.0 3.0 1.0 1.0 1.0 2.0
challenge-164 1.0 1.0 1.0 1.0 1.0 1.0
challenge-165 1.0 1.0 1.0 1.0 1.0 1.0
challenge-166 1.0 1.0 4.0 8.0 2.0 1.0
challenge-167 1.0 1.0 3.0 2.0 2.0 1.0
challenge-168 2.0 2.0 214.0 368.0 266.0 1.0
challenge-169 1.0 1.0 1.0 1.0 1.0 1.0
challenge-170 1.0 1.0 1.0 1.0 1.0 1.0
challenge-171 26.0 34.0 20.0 30.0 33.0 -
challenge-172 3.0 1.0 3.0 4.0 2.0 1.0
challenge-173 1.0 1.0 1.0 1.0 1.0 1.0
challenge-174 1.0 1.0 1.0 1.0 1.0 1.0
challenge-175 13.5 13.0 1.0 1.0 1.0 66.0
challenge-176 1.0 1.0 1.0 1.0 1.0 1.0
challenge-177 1.0 1.0 1.0 1.0 1.0 1.0
challenge-178 3.0 2.0 1.0 1.0 1.0 1.0
challenge-179 1.0 1.0 1.0 1.0 2.0 1.0
challenge-180 12.5 22.0 4.0 4.0 4.0 39.0
challenge-181 1.0 1.0 1.0 1.0 1.0 1.0
challenge-182 1.0 1.0 1.0 1.0 1.0 1.0
challenge-183 2.0 1.0 2.0 2.0 2.0 9.0
challenge-184 3.0 1.0 37.0 87.0 4.0 1.0
challenge-185 1.0 1.0 1.0 1.0 1.0 1.0
challenge-186 1.0 2.0 1.0 1.0 1.0 1.0
challenge-187 6.0 6.0 1.0 1.0 1.0 1.0
challenge-188 13.0 14.0 1.0 1.0 1.0 1.0
challenge-189 1.0 1.0 1.0 1.0 1.0 1.0
challenge-190 1.0 1.0 1.0 1.0 1.0 1.0
challenge-191 1.0 1.0 1.0 1.0 1.0 1.0
challenge-192 2.0 2.0 1.0 1.0 1.0 1.0
challenge-193 1.0 1.0 1.0 1.0 1.0 7.0
challenge-194 1.0 1.0 1.0 1.0 1.0 1.0
challenge-195 1.0 1.0 1.0 1.0 1.0 1.0
challenge-196 1.0 1.0 1.0 1.0 1.0 2.0
challenge-197 2.5 2.0 148.0 76.0 57.0 1.0
challenge-198 8.0 13.0 12.0 8.0 2.0 -
challenge-199 1.0 1.0 138.0 43.0 36.0 -
challenge-200 2.0 2.0 1.0 1.0 2.0 3.0
challenge-201 1.0 1.0 1.0 1.0 1.0 1.0
challenge-202 2.0 2.0 1.0 1.0 1.0 1.0
challenge-203 1.0 1.0 1.0 1.0 1.0 1.0
challenge-204 6.5 5.0 1.0 1.0 1.0 -
challenge-205 4.0 4.0 1.0 1.0 1.0 1.0
challenge-206 1.0 1.0 1.0 1.0 1.0 1.0
challenge-207 19.0 19.0 144.0 21.0 12.0 123.0
challenge-208 1.0 1.0 1.0 1.0 1.0 1.0


Participant information and abstracts

ParticipantID:        FelicityAllenCFMOrig
Category:             category3
Authors:              Felicity Allen, Tanvir Sajed, Russ Greiner, David Wishart
Affiliations:         Department of Computing Science
		      University of Alberta, Canada
Automatic pipeline:   yes
Spectral libraries:   no

Abstract

We processed the list of molecules and provided candidates using cfm-id.

We combined two scores:  CFM_SCORE + DB_SCORE

CFM SCORE

The original CFM positive and negative models, trained on data from the
Metlin database, were used. A mass tolerance of 10 ppm was used, and the
Jaccard score was applied for spectral comparisons. The input spectrum
was repeated for the low, medium and high energies.
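
The Jaccard comparison mentioned above can be sketched as follows. This is a minimal illustration, not the actual CFM-ID code: spectra are reduced to bare m/z lists and peaks are matched greedily within the 10 ppm tolerance.

```python
def match_peaks(spec_a, spec_b, ppm=10.0):
    """Greedily pair peaks of two spectra (given as m/z lists) within a ppm tolerance."""
    matched, used = 0, set()
    for mz_a in spec_a:
        for i, mz_b in enumerate(spec_b):
            if i not in used and abs(mz_a - mz_b) / mz_a * 1e6 <= ppm:
                matched += 1
                used.add(i)
                break
    return matched

def jaccard_score(spec_a, spec_b, ppm=10.0):
    """Jaccard similarity: matched peaks over the size of the peak union."""
    m = match_peaks(spec_a, spec_b, ppm)
    return m / (len(spec_a) + len(spec_b) - m)
```

For example, two spectra sharing one of three distinct peaks score 1/3.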

DB_SCORE

We checked for membership of each candidate in HMDB, ChEBI, a local
database of plant-derived compounds, FOODB and DRUGBANK, and assigned
+10 to the score for each database in which the compound was found.
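
The DB_SCORE bonus can be sketched as below; the membership sets and the example InChIKey are illustrative placeholders, not the actual databases:

```python
# Hypothetical membership sets standing in for HMDB, ChEBI, the local
# plant-compound database, FOODB and DRUGBANK; keys are InChIKeys.
DATABASES = {
    "HMDB":     {"BSYNRYMUTXBXSQ-UHFFFAOYSA-N"},
    "ChEBI":    {"BSYNRYMUTXBXSQ-UHFFFAOYSA-N"},
    "PlantDB":  set(),
    "FOODB":    set(),
    "DRUGBANK": {"BSYNRYMUTXBXSQ-UHFFFAOYSA-N"},
}

def db_score(candidate_key):
    """+10 for each database the candidate is a member of."""
    return 10 * sum(candidate_key in members for members in DATABASES.values())

def total_score(cfm_score, candidate_key):
    """Combined score as described above: CFM_SCORE + DB_SCORE."""
    return cfm_score + db_score(candidate_key)
```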

 
Participant:	      Allen (retrained)
Authors:              Felicity Allen, Tanvir Sajed, Russ Greiner, David Wishart
Affiliations:         Department of Computing Science
		      University of Alberta, Canada

ParticipantID:        FelicityAllenCFMRetrained
Category:             category3
Automatic pipeline:   yes
Spectral libraries:   no

Abstract

We processed the list of molecules and provided candidates using cfm-id.

We combined two scores:  CFM_SCORE + DB_SCORE

CFM SCORE

A new CFM model trained on data from Metlin and NIST MS/MS was used
for the positive mode spectra.  This new model also incorporated
altered chemical features and a neural network within the transition
function. A mass tolerance of 10 ppm was used, and the DotProduct
score was applied for spectral comparisons. This model combined the
spectra across energies before training, so only one energy exists
in the output. For negative mode we had not yet trained a new model,
so the original negative CFM model was used, as for the CFMOrig
entry.
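
The DotProduct comparison can be sketched as a normalised dot product (cosine) over binned peak intensities; the dictionary representation and the binning are simplifying assumptions, not the actual CFM implementation:

```python
import math

def dot_product_score(spec_a, spec_b):
    """Normalised dot product between two spectra, each a dict that maps a
    binned (e.g. rounded) m/z value to its intensity."""
    shared = set(spec_a) & set(spec_b)
    num = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return num / (norm_a * norm_b) if norm_a and norm_b else 0.0
```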

DB_SCORE

We checked for membership of each candidate in HMDB, ChEBI, a local
database of plant-derived compounds, FOODB and DRUGBANK, and assigned
+10 to the score for each database in which the compound was found.

ParticipantID:       ruttkies
Authors:             Christoph Ruttkies (1), Emma Schymanski (2) 
		     and Steffen Neumann (1)
Affiliations:        (1) Leibniz Institute of Plant Biochemistry, Germany
		     (2) Eawag: Swiss Federal Institute for Aquatic Science 
		     and Technology, Dübendorf, Switzerland

Automatic methods:   yes

Abstracts

### Category 3:
## ruttkies_metfrag_rt_refs:

MetFragCL 2.3
(http://msbi.ipb-halle.de/~cruttkie/metfrag/MetFrag2.3-CL.jar) 
(former version 2.2 published in [1]) was used to process the given MS/MS
peaklists. The CSIDs from the provided candidate lists were used to
select candidates from the online ChemSpider database. Parameters for
fragmentation were set with mzppm equal to 5, mzabs equal to 0.001 and
tree depth equal to 2. The adduct type of the precursor was set to
[M+H]+ for positive ionization and [M-H]- for negative ionization
mode. Candidates consisting of non-covalently bound substructures
(e.g. salts) or containing non-standard isotopes were filtered
out. As additional scoring terms, the retention time score and the
number of references retrieved from the online ChemSpider database
(ChemSpiderReferenceCount), described in [1], were used. For the linear
retention time model, retention times from the negative and positive
training sets were used together with the CDK-calculated logP values. The best
weight combination, out of 1000 randomly drawn from the simplex,
giving the highest number of correctly Top1 ranked candidates in the
training set was chosen for w_MetFrag, weighting MetFrag's Fragmenter
score, w_RT, weighting the retention time score, and w_Refs, weighting
the Reference score. Positive and negative mode were optimized
separately. The weighted sum of the scores was used to create the
final candidate list for the positive and negative ionization mode,
respectively. The used weights for positive ionization mode were:
w_MetFrag = 0.4260182, w_RT = 0.2206725, w_Refs = 0.3533094. The used
weights for negative ionization mode were: w_MetFrag = 0.3982628, 
w_RT = 0.2321251, w_Refs = 0.3696120.
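
The weight selection described above (1000 random draws from the simplex, keeping the draw that ranks the most training candidates Top1) can be sketched as follows; the data layout and function names are illustrative assumptions, not the actual pipeline:

```python
import random

def sample_simplex(k):
    """Draw one weight vector uniformly from the k-simplex (non-negative, sums to 1)."""
    cuts = sorted(random.random() for _ in range(k - 1))
    points = [0.0] + cuts + [1.0]
    return [points[i + 1] - points[i] for i in range(k)]

def combined_score(scores, weights):
    """Weighted sum of the individual (normalised) scores, e.g. MetFrag, RT, Refs."""
    return sum(w * s for w, s in zip(weights, scores))

def count_top1(challenges, weights):
    """Count training challenges whose correct candidate ranks first under these weights."""
    top1 = 0
    for candidates in challenges:  # each entry: list of (is_correct, [score_1, ..., score_k])
        best = max(candidates, key=lambda c: combined_score(c[1], weights))
        top1 += best[0]
    return top1

def optimise_weights(challenges, k, n_draws=1000):
    """Return the best of n_draws randomly drawn weight vectors."""
    return max((sample_simplex(k) for _ in range(n_draws)),
               key=lambda w: count_top1(challenges, w))
```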

## ruttkies_metfrag_rt_refs_cfmid:

MetFragCL 2.3 (http://msbi.ipb-halle.de/~cruttkie/metfrag/MetFrag2.3-CL.jar) 
(former version 2.2 published in [1]) was used to process the given MS/MS
peaklists. The CSIDs from the provided candidate lists were used to
select candidates from the online ChemSpider database. Parameters for
fragmentation were set with mzppm equal to 5, mzabs equal to 0.001 and
tree depth equal to 2. The adduct type of the precursor was set to
[M+H]+ for positive ionization and [M-H]- for negative ionization
mode. Candidates consisting of non-covalently bound substructures
(e.g. salts) or containing non-standard isotopes were filtered
out. The resulting candidate lists were used as input for CFM-ID [2]
version 2 to retrieve an additional scoring term that was used to
calculate the final score as described in [1]. Further scoring terms
included were the retention time score and the number of references
retrieved from the online ChemSpider database
(ChemSpiderReferenceCount) as described in [1]. For the linear
retention time model, retention times from the negative and positive
training sets were used together with the CDK-calculated logP
values. The best weight combination, out of 1000 randomly drawn from
the simplex, giving the highest number of correctly Top1 ranked
candidates in the training set was chosen for w_MetFrag, weighting
MetFrag's Fragmenter score, w_RT, weighting the retention time score,
w_Refs, weighting the Reference score, and w_CFM-ID, weighting the
CFM-ID score. Positive and negative mode were optimized
separately. The weighted sum of the scores was used to create the
final candidate list for the positive and negative ionization mode,
respectively. The used weights for positive ionization mode were:
w_MetFrag = 0.43807140,
w_RT = 0.09885304, w_Refs = 0.33431292, w_CFM-ID = 0.12876264. The
used weights for negative ionization mode were: w_MetFrag =
0.38728278, w_RT = 0.19584541, w_Refs = 0.32712506, 
w_CFM-ID = 0.08974675.

## ruttkies_metfrag_rt_refs_cfmid_mona:

MetFragCL 2.3
(http://msbi.ipb-halle.de/~cruttkie/metfrag/MetFrag2.3-CL.jar) 
(former version 2.2 published in [1]) was used to process the given MS/MS
peaklists. The CSIDs from the provided candidate lists were used to
select candidates from the online ChemSpider database. Parameters for
fragmentation were set with mzppm equal to 5, mzabs equal to 0.001 and
tree depth equal to 2. The adduct type of the precursor was set to
[M+H]+ for positive ionization and [M-H]- for negative ionization
mode. Candidates consisting of non-covalently bound substructures
(e.g. salts) or containing non-standard isotopes were filtered
out. The resulting candidate lists were used as input for CFM-ID [2]
version 2 to retrieve an additional scoring term that was used to
calculate the final score as described in [1]. Further scoring terms
included were the retention time score, the number of references
retrieved from the online ChemSpider database
(ChemSpiderReferenceCount) as described in [1], and the spectral match
based on cosine similarity against the LC-MS/MS standards library
provided by MassBank of North America (MoNA)
(http://mona.fiehnlab.ucdavis.edu/spectra/querytree, accessed January
2016). The approach is known from MetFusion [3]. For the linear
retention time model, retention times from the negative and positive
training sets were used together with the CDK-calculated logP
values. The best weight combination, out of 1000 randomly drawn from
the simplex, giving the highest number of correctly Top1 ranked
candidates in the training set was chosen for w_MetFrag, weighting
MetFrag's Fragmenter score, w_RT, weighting the retention time score,
w_Refs, weighting the Reference score, w_CFM-ID, weighting the CFM-ID
score, and w_MoNA, weighting the spectral MetFusion similarity
score. Positive and negative mode were optimized separately. The
weighted sum of the scores was used to create the final candidate list
for the positive and negative ionization mode, respectively. The used
weights for positive ionization mode were: w_MetFrag = 0.16212070,
w_RT = 0.08104633, w_Refs = 0.25308415, w_CFM-ID = 0.06701364, w_MoNA
= 0.43673519. The used weights for negative ionization mode were:
w_MetFrag = 0.13587813, w_RT = 0.09295245, w_Refs = 0.09457464,
w_CFM-ID = 0.17781439, w_MoNA = 0.49878039.

[1] - Ruttkies C*, Schymanski E L*, Wolf S, Hollender J, Neumann S (2016)
MetFrag relaunched: incorporating strategies beyond in silico fragmentation.
Journal of Cheminformatics 8(3). doi:10.1186/s13321-016-0115-9
[2] - Allen F, Greiner R, Wishart D (2015) Competitive fragmentation modeling
of ESI-MS/MS spectra for putative metabolite identification. Metabolomics
11(1):98–110. doi:10.1007/s11306-014-0676-4
[3] - Gerlich M, Neumann S (2013) MetFusion: integration of compound
identification strategies. J Mass Spectrom 48(3):291–298. doi:10.1002/jms.3123

Participant:	      Kind
Authors:              Tobias Kind(1), Hiroshi Tsugawa(2)
Affiliations:         (1) UC Davis Genome Center - Metabolomics, Davis CA, USA 
		      (2) RIKEN Center for Sustainable Resource Science (CSRS), 
		      Wako, Japan

ParticipantID:        tkind
Category:             category3
Automatic methods:    semi-automatic

Abstract
This is a submission for the http://www.casmi-contest.org/2016/
Category 3: Best Automatic Structural Identification - Full Information

This third category uses MS/MS spectra of 208 unknown compounds (validation set).
All MS/MS spectra were obtained on a Q Exactive Plus Orbitrap from Thermo Scientific, 
with <5 ppm mass accuracy and MS/MS resolution of 35,000 using electrospray ionization
and stepped 20/35/50 HCD nominal collision energies. The [M+H]+ (positive) and 
[M-H]- ion masses were recorded. 

A reversed-phase C18 column was used (2.6 µm particle size, 2.1x50 mm with
a 2.1x5 mm precolumn) with a gradient of (A/B): 95/5 at 0 min, 95/5 at 1 min,
0/100 at 13 min, 0/100 at 24 min (A = water, B = methanol, both with 0.1% formic acid)
at a flow rate of 300 µL/min.

In Category 3, any form of additional information can be used
(retention time information, mass spectral libraries, patents, reference counts).
This allows demonstrating whether, and how much, additional information can
improve the results of unknown annotation.

Approach:
Here we used a two-step procedure: first an MS-Finder search, and then an
MS/MS search for confirmation whenever possible.

(1) Molecular formulas and structures were determined with the MS-Finder
software [http://prime.psc.riken.jp/Metabolomics_Software/] by querying
an internal structure database and all CASMI-provided structures.
The result data were exported as text files and formatted for CASMI submission.

First, the molecular formulas were determined using the Lewis and Senior
checks, a 97% element ratio check, a 20% isotopic abundance ratio, and
mass accuracies of 5 ppm for MS1 and 20 ppm for MS2. The elements
CHNOPSFClBrI were included (Si was excluded). The top 5 formulas were
considered for structure queries. However, no MS1 information was provided
for this contest, only precursor mass and product ion information.

Each formula was queried against an internal structure database and the
CASMI compounds for the Category 3 validation set. A tree depth
of 2 and a relative abundance cutoff of 1% were used, and
up to 100 possible structures were reported by MS-Finder.

The score was calculated by the in-silico fragmenter, which simulates
the alpha-cleavage of linear chains up to three chemical bonds,
taking the bond-dissociation energy into account. Multiple bonds
(double or triple bonds, as well as rings) are modeled as penalized
single bonds in which hydrogens are lost. The final score also
includes mass accuracy, isotopic ratio, product ion assignment, 
neutral loss assignment and existence of the compound in the
internal MS-Finder structure databases.

(2) MS/MS search was used for further confirmation; the
NIST MS Search GUI [http://chemdata.nist.gov/] was utilized together with
major MS/MS databases such as NIST, MoNA, ReSpect and MassBank.

The precursor tolerance was set to 5 ppm and the product ion search tolerance to 200 ppm.
For some searches that gave no MS/MS results, a simple
similarity search without precursor information was also used.
Results with overall low hit scores were additionally cross-referenced
with the STOFF-IDENT database of environmentally relevant substances
to obtain information on potential hit candidates.



Details per Challenge and Participant. See the legend at the bottom for more details.

The details table is also available as HTML and as a CSV download. The individual submissions are also available for download.