News
- Oct 31st, 2017: The results are now available.
- Oct 30th, 2017: The solutions are now available.
- Sept 8th, 2017: Update for Challenge 15 available, but will not count in evaluation.
- Sept 4th, 2017: Updated mailing list and submission information.
- Aug 23rd, 2017: The preliminary results have been sent out to participants, and are now available.
- July 09th, 2017: We fixed the intensities in the TSV archive for challenges 046-243.
- June 22nd, 2017: We added the Category 4 on a subset of the data files.
- May 22nd, 2017: We have improved challenges 29, 42, 71, 89, 105, 106 and 144.
- April 26th, 2017: The rules and challenges of CASMI 2017 are public now!
- Jan 20th, 2017: Organisation of CASMI 2017 is underway, stay tuned!
Results in Category 4
Summary of participant performance
Category 4 extra contains the scores from two methods that were submitted on 19.10.2017, after the contest deadline. The initial submissions before the deadline were unfortunately identical to IOKR_TanimotoGaussian. With these updated files, IOKR_TanimotoGaussian_AvgScore would have achieved the 3rd rank in Category 4.

Participant | F1 score | Mean rank | Median rank | Top | Top3 | Top10 | Misses | TopPos | TopNeg | Mean RRP | Median RRP | N |
---|---|---|---|---|---|---|---|---|---|---|---|---|
IOKR_TanimotoGaussian_AvgScore | 1446 | 2721.84 | 201.0 | 38 | 60 | 73 | 0 | 22 | 16 | 0.683 | 0.957 | 198 |
MPIOKR_GaussianRFF | 1066 | 3862.30 | 582.5 | 33 | 43 | 52 | 0 | 22 | 11 | 0.587 | 0.726 | 198 |
Table legend:
- F1 score: The Formula 1 score awards points similar to the scheme in F1 racing for each challenge, based on the rank of the correct solution. In the participant table, these are summed over all challenges. Please note that the F1 score is thus not necessarily comparable across categories.
- Mean/Median rank: Mean and median rank of the correct solution. For tied ranks with other candidates, the average rank of the ties is used.
- Top, Top3, Top10: Number of challenges where the correct solution is ranked first, among the top 3, or among the top 10.
- Misses: Number of challenges where the correct solution is missing.
- TopPos, TopNeg: Number of top-ranked (Top1) solutions in positive or negative ionization mode.
- Mean/Median RRP: The relative ranking position, which also takes the length of the candidate list into account.
- N: Number of submissions that have passed the evaluation scripts.
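The rank-based columns in the legend can be illustrated with a short Python sketch. It is not the contest's evaluation script. The function names are hypothetical, the point values are those of Formula 1 racing (25, 18, 15, 12, 10, 8, 6, 4, 2, 1 for positions 1 to 10), and the treatment of fractional tied ranks (interpolating between the two neighbouring positions) is an assumption, since the legend only says the scheme is "similar" to F1 racing.

```python
import math
from statistics import mean, median

# Points for positions 1..10 in Formula 1 racing. That the contest uses
# exactly these values is an assumption.
F1_POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def tied_rank(scores, correct_index):
    """Rank of the correct candidate (1 = best, higher score = better).

    Candidates with the same score share the average of the ranks they occupy.
    """
    target = scores[correct_index]
    better = sum(1 for s in scores if s > target)
    tied = sum(1 for s in scores if s == target)  # includes the correct one
    # The tied group occupies ranks better+1 .. better+tied.
    return better + (tied + 1) / 2


def f1_points(rank):
    """Points for one challenge; fractional ranks are interpolated (assumption)."""
    def at(position):
        return F1_POINTS[position - 1] if 1 <= position <= len(F1_POINTS) else 0

    lo, hi = math.floor(rank), math.ceil(rank)
    frac = rank - lo
    return (1 - frac) * at(lo) + frac * at(hi)


def summarize(ranks):
    """Aggregate per-challenge ranks into the columns of the participant table."""
    return {
        "f1_score": sum(f1_points(r) for r in ranks),
        "mean_rank": mean(ranks),
        "median_rank": median(ranks),
        "top": sum(1 for r in ranks if r <= 1),
        "top3": sum(1 for r in ranks if r <= 3),
        "top10": sum(1 for r in ranks if r <= 10),
    }
```

For example, `tied_rank([0.9, 0.5, 0.5, 0.1], 1)` gives 2.5, because the two tied candidates share ranks 2 and 3, and under the interpolation assumption a rank of 1.5 earns 21.5 points.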
Summary of Rank by Challenge
For each challenge, the lowest rank among participants is highlighted in bold. If the submission did not contain the correct candidate, this is denoted as "-". If someone did not participate in a challenge, the table cell is empty.

Category4:

Challenge | IOKR_TanimotoGaussian_AvgScore | MPIOKR_GaussianRFF |
---|---|---|
challenge-046 | 1647.0 | 1313.0 |
challenge-047 | 384.0 | 408.0 |
challenge-048 | 662.0 | 6981.0 |
challenge-049 | 469.5 | 1636.5 |
challenge-050 | 14.0 | 2.0 |
challenge-051 | 1.0 | 14.0 |
challenge-052 | 1477.0 | 315.0 |
challenge-053 | 2.0 | 2.0 |
challenge-054 | 11295.0 | 2985.0 |
challenge-055 | 1.0 | 29.0 |
challenge-056 | 1.0 | 1.0 |
challenge-057 | 15753.0 | 15693.0 |
challenge-058 | 533.0 | 10.0 |
challenge-059 | 1.0 | 1.0 |
challenge-060 | 8941.0 | 11264.0 |
challenge-061 | 3117.0 | 3304.0 |
challenge-062 | 1.0 | 216.0 |
challenge-063 | 1.0 | 1.0 |
challenge-064 | 11410.0 | 11188.0 |
challenge-065 | 1886.0 | 2004.0 |
challenge-066 | 11104.0 | 10588.0 |
challenge-067 | 1260.0 | 2918.0 |
challenge-068 | 8.5 | 7.5 |
challenge-069 | 307.0 | 51.0 |
challenge-070 | 11294.0 | 11323.0 |
challenge-071 | 107.0 | 9970.0 |
challenge-072 | 15577.0 | 14528.0 |
challenge-073 | 1.0 | 1.0 |
challenge-074 | 2.0 | 6.0 |
challenge-075 | 1.0 | 587.0 |
challenge-076 | 15949.0 | 15260.0 |
challenge-077 | 7099.0 | 7590.0 |
challenge-078 | 8399.0 | 12007.0 |
challenge-079 | 664.0 | 1030.0 |
challenge-080 | 5495.0 | 1.0 |
challenge-081 | 1.0 | 1.0 |
challenge-082 | 19.0 | 147.0 |
challenge-083 | 1.5 | 43.5 |
challenge-084 | 344.0 | 280.0 |
challenge-085 | 9093.0 | 9193.0 |
challenge-086 | 268.0 | 26388.0 |
challenge-087 | 1.0 | 1.0 |
challenge-088 | 3524.5 | 3503.5 |
challenge-089 | 4132.0 | 4492.0 |
challenge-090 | 209.0 | 67.0 |
challenge-091 | 704.0 | 640.0 |
challenge-092 | 32.0 | 344.0 |
challenge-093 | 3704.0 | 3578.0 |
challenge-094 | 8.0 | 16242.0 |
challenge-095 | 9242.0 | 9355.0 |
challenge-096 | 7928.0 | 247.0 |
challenge-097 | 1.0 | 1.0 |
challenge-098 | 3007.0 | 3090.0 |
challenge-099 | 10038.0 | 10059.0 |
challenge-100 | 7538.0 | 7101.0 |
challenge-101 | 12311.0 | 13007.0 |
challenge-102 | 4935.0 | 4425.0 |
challenge-103 | 1.0 | 1.0 |
challenge-104 | 115.0 | 1897.0 |
challenge-105 | 38.0 | 43.0 |
challenge-106 | 8520.0 | 3868.0 |
challenge-107 | 5550.0 | 5559.0 |
challenge-108 | 4.0 | 3721.0 |
challenge-109 | 9176.0 | 8997.0 |
challenge-110 | 6637.0 | 6611.0 |
challenge-111 | 2.0 | 6.0 |
challenge-112 | 1222.0 | 6219.0 |
challenge-113 | 380.0 | 399.0 |
challenge-114 | 7775.0 | 9881.0 |
challenge-115 | 1.0 | 1.0 |
challenge-116 | 1.0 | 39.0 |
challenge-117 | 2629.0 | 3755.0 |
challenge-118 | 6421.0 | 6842.0 |
challenge-119 | 9798.0 | 8871.0 |
challenge-120 | 1.0 | 10219.0 |
challenge-121 | 336.0 | 46.0 |
challenge-122 | 477.0 | 5358.0 |
challenge-123 | 11390.0 | 13377.0 |
challenge-124 | 1366.0 | 1349.0 |
challenge-125 | 2163.5 | 2277.5 |
challenge-126 | 32.0 | 39.0 |
challenge-127 | 562.0 | 350.0 |
challenge-128 | 7700.0 | 1.0 |
challenge-129 | 1.0 | 1879.0 |
challenge-130 | 72.0 | 110.0 |
challenge-131 | 379.0 | 384.0 |
challenge-132 | 1119.0 | 2367.0 |
challenge-133 | 4.0 | 11.0 |
challenge-134 | 1881.0 | 11909.0 |
challenge-135 | 567.5 | 596.5 |
challenge-136 | 1.0 | 1.0 |
challenge-137 | 6.5 | 1.5 |
challenge-138 | 22.0 | 3.0 |
challenge-139 | 2.0 | 185.0 |
challenge-140 | 1.0 | 11612.0 |
challenge-141 | 1.0 | 1.0 |
challenge-142 | 320.0 | 5299.0 |
challenge-143 | 50.0 | 10.0 |
challenge-144 | 25.0 | 4.0 |
challenge-145 | 25.0 | 1.0 |
challenge-146 | 11720.0 | 11091.0 |
challenge-147 | 1.0 | 1293.0 |
challenge-148 | 2.0 | 1.0 |
challenge-149 | 1.0 | 1.0 |
challenge-150 | 2.0 | 1722.0 |
challenge-151 | 10338.0 | 7983.0 |
challenge-152 | 55.0 | 80.0 |
challenge-153 | 23.0 | 34.0 |
challenge-154 | 10102.0 | 7472.0 |
challenge-155 | 2.0 | 3.0 |
challenge-156 | 15.0 | 20.0 |
challenge-157 | 25.0 | 13.0 |
challenge-158 | 3.0 | 9544.0 |
challenge-159 | 1.0 | 2299.0 |
challenge-160 | 12176.0 | 2629.0 |
challenge-161 | 261.0 | 87.0 |
challenge-162 | 1.0 | 17958.0 |
challenge-163 | 1.0 | 1.0 |
challenge-164 | 4.0 | 1.0 |
challenge-165 | 1988.0 | 2004.0 |
challenge-166 | 3641.0 | 3407.0 |
challenge-167 | 14821.0 | 15347.0 |
challenge-168 | 5.0 | 2212.0 |
challenge-169 | 27.0 | 578.0 |
challenge-170 | 1.0 | 1.0 |
challenge-171 | 2.0 | 1.0 |
challenge-172 | 204.0 | 444.0 |
challenge-173 | 1353.0 | 16181.0 |
challenge-174 | 6409.0 | 6948.0 |
challenge-175 | 1396.0 | 14.0 |
challenge-176 | 14.0 | 283.0 |
challenge-177 | 1.0 | 414.0 |
challenge-178 | 14960.0 | 1385.0 |
challenge-179 | 1732.0 | 2422.0 |
challenge-180 | 277.0 | 289.0 |
challenge-181 | 2364.0 | 16151.0 |
challenge-182 | 27.0 | 39.0 |
challenge-183 | 1.5 | 1.5 |
challenge-184 | 1.0 | 1.0 |
challenge-185 | 3729.0 | 11832.0 |
challenge-186 | 2.0 | 14.0 |
challenge-187 | 133.0 | 243.0 |
challenge-188 | 2.0 | 1.0 |
challenge-189 | 1138.0 | 47.0 |
challenge-190 | 2349.0 | 3666.0 |
challenge-191 | 974.0 | 976.0 |
challenge-192 | 36.0 | 5022.0 |
challenge-193 | 1.0 | 1.0 |
challenge-194 | 65.0 | 8323.0 |
challenge-195 | 1.5 | 3.5 |
challenge-196 | 2855.0 | 2.0 |
challenge-197 | 34.0 | 34.0 |
challenge-198 | 1.0 | 1.0 |
challenge-199 | 2745.0 | 2795.0 |
challenge-200 | 1.0 | 1.0 |
challenge-201 | 15085.0 | 26330.0 |
challenge-202 | 2337.0 | 18616.0 |
challenge-203 | 1.0 | 1.0 |
challenge-204 | 6392.0 | 6410.0 |
challenge-205 | 3.0 | 16.0 |
challenge-206 | 2329.0 | 2407.0 |
challenge-207 | 10269.0 | 9551.0 |
challenge-208 | 33.0 | 139.0 |
challenge-209 | 198.0 | 31.0 |
challenge-210 | 32.0 | 4671.0 |
challenge-211 | 1.0 | 1.0 |
challenge-212 | 15505.0 | 15390.0 |
challenge-213 | 1.0 | 1.0 |
challenge-214 | 9752.0 | 10013.0 |
challenge-215 | 9.0 | 3860.0 |
challenge-216 | 8.0 | 13.0 |
challenge-217 | 2.0 | 9.0 |
challenge-218 | 2.0 | 2.0 |
challenge-219 | 1.0 | 26.0 |
challenge-220 | 1.0 | 15801.0 |
challenge-221 | 18460.0 | 19164.0 |
challenge-222 | 585.0 | 12334.0 |
challenge-223 | 1.0 | 62.0 |
challenge-224 | 2.0 | 2.0 |
challenge-225 | 1.5 | 5.5 |
challenge-226 | 858.0 | 227.0 |
challenge-227 | 2908.0 | 2873.0 |
challenge-228 | 5.0 | 4252.0 |
challenge-229 | 18334.0 | 18559.0 |
challenge-230 | 15108.0 | 18378.0 |
challenge-231 | 1373.0 | 1369.0 |
challenge-232 | 1.0 | 1.0 |
challenge-233 | 283.0 | 2848.0 |
challenge-234 | 11448.0 | 11475.0 |
challenge-235 | 3.0 | 1536.0 |
challenge-236 | 7.0 | 1.0 |
challenge-237 | 3.0 | 21.0 |
challenge-238 | 237.0 | 23.0 |
challenge-239 | 8.0 | 1.0 |
challenge-240 | 628.0 | 5456.0 |
challenge-241 | 4.0 | 1.0 |
challenge-242 | 1.0 | 1.0 |
challenge-243 | 2.0 | 2.0 |
Participant information and abstracts
Participant: Bach
ParticipantID: IOKR_TanimotoGaussian
Category: 4
Authors: Eric Bach(1), Céline Brouard(1,2), Kai Dührkop(3), Sebastian Böcker(3) and Juho Rousu(1,2)
Affiliations: (1) Department of Computer Science, Aalto University, Espoo, Finland (2) Helsinki Institute for Information Technology, Espoo, Finland (3) Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
Automatic pipeline: yes
Spectral libraries: no

Abstract

We used a recent machine learning approach, called Input Output Kernel Regression (IOKR), for predicting the candidate scores. IOKR has been successfully applied to metabolite identification [1]. In this method, kernel functions are used to measure the similarity between MS/MS spectra (input kernel) and between molecular structures (output kernel). On the input side, we use several kernels defined on MS/MS spectra and fragmentation trees, and combine them uniformly, i.e. we sum up the kernels with equal weights. On the output side, we use a Gaussian kernel on Tanimoto features calculated from binary fingerprints representing the molecular structures in the candidate sets.

We train two separate IOKR models, one for each ionization mode (positive and negative). For the positive model we use ~14000 identified MS/MS spectra and for the negative model ~5800. These spectra are mainly extracted from the GNPS and MassBank databases. We represent the candidate molecular structures using ~7600 binary molecular fingerprints. For each challenge spectrum we predict the molecular formula using Sirius [2], taking into account the possible molecular formulas based on the candidate sets. The score we submitted for each candidate is the one corresponding to the most likely molecular formula.

[1] Brouard, C.; Shen, H.; Dührkop, K.; d'Alché Buc, F.; Böcker, S. & Rousu, J. Fast metabolite identification with Input Output Kernel Regression. Bioinformatics, 2016.
[2] https://bio.informatik.uni-jena.de/software/sirius/
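The kernel construction described in the IOKR_TanimotoGaussian abstract can be sketched as follows. This is an illustration only, not the participants' code. The function names and the `gamma` parameter are hypothetical, and reading "Gaussian kernel on Tanimoto features" as a Gaussian kernel on the distance induced by the Tanimoto kernel (squared distance 2 - 2·Tanimoto) is an assumption.

```python
import math


def tanimoto(a, b):
    """Tanimoto similarity of two binary fingerprints of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention (assumption): two all-zero fingerprints count as identical.
    return both / either if either else 1.0


def gaussian_tanimoto(a, b, gamma=1.0):
    """Gaussian kernel on the distance induced by the Tanimoto kernel.

    The squared distance in the Tanimoto feature space is
    k(a, a) + k(b, b) - 2 k(a, b) = 2 - 2 k(a, b).
    """
    sq_dist = 2.0 - 2.0 * tanimoto(a, b)
    return math.exp(-gamma * sq_dist)


def combine_uniform(kernel_matrices):
    """Sum several kernel matrices of the same shape with equal weights."""
    n = len(kernel_matrices[0])
    return [
        [sum(K[i][j] for K in kernel_matrices) for j in range(n)]
        for i in range(n)
    ]
```

Summing kernels with equal weights keeps the result a valid kernel and avoids having to learn one weight per input kernel.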
Participant: Bach
ParticipantID: MPIOKR_GaussianRFF
Category: 4
Authors: Eric Bach(1), Céline Brouard(1,2), Kai Dührkop(3), Sebastian Böcker(3) and Juho Rousu(1,2)
Affiliations: (1) Department of Computer Science, Aalto University, Espoo, Finland (2) Helsinki Institute for Information Technology, Espoo, Finland (3) Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
Automatic pipeline: yes
Spectral libraries: no

Abstract

Magnitude-preserving Input Output Kernel Regression (MP-IOKR) is an extension of the Input Output Kernel Regression (IOKR) method [1], which has been successfully applied to metabolite identification [2]. Magnitude-preserving IOKR uses a modified objective function, which can exploit the knowledge about the molecular candidates for a set of training MS/MS spectra.

IOKR objective function for the regression function $h$ (prediction of the feature vector representing a molecular structure):

$$h = \arg\min_h \sum_{i=1}^{l} \|h(x_i) - \psi(y_i)\|^2 + \lambda \|h\|^2 \qquad (1)$$

Magnitude-preserving modification of the objective function:

$$h = \arg\min_h \sum_{i=1}^{l} \frac{1}{n_i} \sum_{j=1}^{n_i} \|(h(x_i) - h(x_j)) - (\psi(y_i) - \psi(y_j))\|^2 + \lambda \|h\|^2 \qquad (2)$$

Here $i \in \{1, \dots, l\}$ indexes the training examples and $j \in \{1, \dots, n_i\}$ indexes the molecular candidates of training example $i$. The $x_i$ and $y_i$ are the training MS/MS spectra and the training molecular structures, respectively. The $x_j$ and $y_j$ are the candidates' MS/MS spectra and the candidates' molecular structures, respectively.

Equation (1) (IOKR) minimizes the prediction error between the training MS/MS spectra and the training molecular structures. In contrast, Equation (2) (MP-IOKR) learns a function $h$ that also preserves the magnitudes (differences) between the training molecular structure ($y_i$) and all its candidate molecular structures ($y_j$). In that way we consider how the true candidate relates to all the remaining candidates and include this knowledge in our learning problem. It is important to note that we do not need the MS/MS spectrum $x_j$ of each candidate, as we approximate the corresponding input feature vectors for each candidate using the molecular structure $y_j$.

MP-IOKR is a kernel method. Kernels measure the similarity between structured objects, e.g. MS/MS spectra (input kernel) or molecular structures (output kernel). On the input side, we use several kernels defined on MS/MS spectra and fragmentation trees, and combine them uniformly, i.e. we sum up the kernels with equal weights. On the output side, we use a Gaussian kernel calculated from binary fingerprints representing the molecular structures in the candidate sets. As we cannot deal with possibly millions of candidates (and O(million^2) kernel matrices), we approximate the Gaussian features and use those in our framework.

We train two separate MP-IOKR models, one for each ionization mode (positive and negative). For the positive model we use ~14000 identified MS/MS spectra and ~4 million candidates with their molecular structures. The MS/MS spectra are mainly extracted from the GNPS and MassBank databases. For the negative model we use ~5800 identified MS/MS spectra and ~1.5 million candidates. We represent the candidate molecular structures using ~7600 binary molecular fingerprints. For each challenge spectrum we predict the molecular formula using Sirius [3], taking into account the possible molecular formulas based on the candidate sets. The score we submitted for each candidate is the one corresponding to the most likely molecular formula.

[1] Brouard, C.; Shen, H.; Dührkop, K.; d'Alché Buc, F.; Böcker, S. & Rousu, J. Fast metabolite identification with Input Output Kernel Regression. Bioinformatics, 2016.
[2] Brouard, C.; Bach, E.; Böcker, S. & Rousu, J. Magnitude-Preserving Ranking for Structured Outputs (submitted), 2017.
[3] https://bio.informatik.uni-jena.de/software/sirius/
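The MPIOKR_GaussianRFF abstract states that the Gaussian features are approximated so that no candidate-by-candidate kernel matrix is needed. The participant ID suggests random Fourier features, so the sketch below shows that general technique for the Gaussian kernel exp(-gamma·||x - y||²). It is an assumption that this matches the participants' approach in detail. The function names, `gamma`, the seed, and the number of features are hypothetical choices for illustration.

```python
import math
import random


def make_rff(dim, n_features, gamma, seed=0):
    """Return a map x -> z(x) with z(x)·z(y) ≈ exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    # The Gaussian kernel's spectral density is normal with variance 2 * gamma.
    sigma = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, sigma) for _ in range(dim)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [
            scale * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return features


def gaussian_kernel(x, y, gamma):
    """Exact Gaussian kernel, for comparison with the approximation."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, y)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

With explicit feature vectors, each candidate needs only one vector of length `n_features`, so memory grows linearly with the number of candidates instead of quadratically as with a full kernel matrix.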