News

Oct 31st, 2017
The results are now available.

Oct 30th, 2017
The solutions are now available.

Sept 8th, 2017
An update for Challenge 15 is available, but it will not count in the evaluation.

Sept 4th, 2017
Updated mailing list and submission information.

Aug 23rd, 2017
The preliminary results have been sent out to participants, and are now available.

July 9th, 2017
We fixed the intensities in the TSV archive for challenges 046-243.

June 22nd, 2017
We added Category 4 for a subset of the data files.

May 22nd, 2017
We have improved challenges 29, 42, 71, 89, 105, 106 and 144.

April 26th, 2017
The rules and challenges of CASMI 2017 are now public!

Jan 20th, 2017
Organisation of CASMI 2017 is underway, stay tuned!


Results in Category 4

Summary of participant performance

Category 4 extra contains the scores from two methods that were submitted on 19.10.2017 after the contest deadline. The initial submissions before the deadline were unfortunately identical to IOKR_TanimotoGaussian. With these updated files, IOKR_TanimotoGaussian_AvgScore would have achieved the 3rd rank in Category 4.
Participant                     F1 score  Mean rank  Median rank  Top  Top3  Top10  Misses  TopPos  TopNeg  Mean RRP  Median RRP    N
IOKR_TanimotoGaussian_AvgScore      1446    2721.84        201.0   38    60     73       0      22      16     0.683       0.957  198
MPIOKR_GaussianRFF                  1066    3862.30        582.5   33    43     52       0      22      11     0.587       0.726  198
This summary is also available as CSV download.

Table legend:

F1 score
The Formula 1 score awards points for each challenge based on the rank of the correct solution, similar to the scoring scheme in F1 racing. In the participant table, these points are summed over all challenges. Please note that the F1 score is therefore not necessarily comparable across categories.
Mean/Median rank
Mean and median rank of the correct solution. For tied ranks with other candidates, the average rank of the ties is used.
Top, Top3, Top10
Number of challenges where the correct solution is ranked first, in the top 3, and in the top 10, respectively.
Misses
Number of challenges where the correct solution is missing.
TopPos, TopNeg
Number of challenges where the correct solution is ranked first, in positive or negative ionization mode, respectively.
Mean/Median RRP
The relative ranking position, which also takes the length of the candidate list into account.
N
Number of submissions that have passed the evaluation scripts.
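The summary metrics in the legend can be sketched in Python as follows. This is a minimal illustration, not the organisers' evaluation script, and several details are assumptions: the point table 25, 18, 15, 12, 10, 8, 6, 4, 2, 1 for ranks 1 to 10, linear interpolation of points for tied (fractional) ranks such as 1.5, the exclusion of misses from the rank statistics, and the RRP formula 1 - (rank - 1)/(n - 1), where 1 means the correct candidate is ranked first and 0 means it is ranked last. The helper names are hypothetical.

```python
from statistics import mean, median

# Assumed points for ranks 1..10, taken from the Formula 1 racing scheme.
F1_POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def f1_points(rank):
    """Points for one challenge; a fractional (tied) rank such as 1.5 is
    interpolated linearly between the neighbouring positions (assumption)."""
    if rank is None or rank >= len(F1_POINTS) + 1:
        return 0.0
    pos = int(rank)                                   # integer part of the rank
    frac = rank - pos
    p_lo = F1_POINTS[pos - 1]
    p_hi = F1_POINTS[pos] if pos < len(F1_POINTS) else 0
    return p_lo + frac * (p_hi - p_lo)


def rrp(rank, n_candidates):
    """Assumed relative ranking position: 1.0 = first, 0.0 = last."""
    return 1.0 - (rank - 1) / (n_candidates - 1)


def summarize(results):
    """results: one (rank, n_candidates, mode) tuple per challenge; rank is
    None when the correct solution is missing, mode is "pos" or "neg"."""
    found = [(r, n, m) for r, n, m in results if r is not None]
    ranks = [r for r, _, _ in found]
    rrps = [rrp(r, n) for r, n, _ in found]
    return {
        "F1": sum(f1_points(r) for r, _, _ in results),
        "MeanRank": mean(ranks), "MedianRank": median(ranks),
        "Top": sum(r == 1 for r in ranks),            # exact rank 1 only
        "Top3": sum(r <= 3 for r in ranks),
        "Top10": sum(r <= 10 for r in ranks),
        "Misses": len(results) - len(found),
        "TopPos": sum(r == 1 and m == "pos" for r, _, m in found),
        "TopNeg": sum(r == 1 and m == "neg" for r, _, m in found),
        "MeanRRP": mean(rrps), "MedianRRP": median(rrps),
        "N": len(results),
    }


if __name__ == "__main__":
    demo = [(1.0, 100, "pos"), (1.5, 50, "neg"), (4.0, 11, "neg"), (None, 20, "pos")]
    print(summarize(demo))
```

Applied to the per-challenge ranks listed below, this point scheme appears to reproduce the F1 totals in the summary table: 1446 for IOKR_TanimotoGaussian_AvgScore and 1065.5 (shown as 1066) for MPIOKR_GaussianRFF. Counting only an exact rank of 1 as a Top hit (a tie at 1.5 counts towards Top3 but not Top) also matches the Top column.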

Summary of Rank by Challenge

For each challenge, the lowest rank among participants is highlighted in bold. If the submission did not contain the correct candidate this is denoted as "-". If someone did not participate in a challenge, the table cell is empty.

Category 4:

Challenge IOKR_TanimotoGaussian_AvgScore MPIOKR_GaussianRFF
challenge-046 1647.0 1313.0
challenge-047 384.0 408.0
challenge-048 662.0 6981.0
challenge-049 469.5 1636.5
challenge-050 14.0 2.0
challenge-051 1.0 14.0
challenge-052 1477.0 315.0
challenge-053 2.0 2.0
challenge-054 11295.0 2985.0
challenge-055 1.0 29.0
challenge-056 1.0 1.0
challenge-057 15753.0 15693.0
challenge-058 533.0 10.0
challenge-059 1.0 1.0
challenge-060 8941.0 11264.0
challenge-061 3117.0 3304.0
challenge-062 1.0 216.0
challenge-063 1.0 1.0
challenge-064 11410.0 11188.0
challenge-065 1886.0 2004.0
challenge-066 11104.0 10588.0
challenge-067 1260.0 2918.0
challenge-068 8.5 7.5
challenge-069 307.0 51.0
challenge-070 11294.0 11323.0
challenge-071 107.0 9970.0
challenge-072 15577.0 14528.0
challenge-073 1.0 1.0
challenge-074 2.0 6.0
challenge-075 1.0 587.0
challenge-076 15949.0 15260.0
challenge-077 7099.0 7590.0
challenge-078 8399.0 12007.0
challenge-079 664.0 1030.0
challenge-080 5495.0 1.0
challenge-081 1.0 1.0
challenge-082 19.0 147.0
challenge-083 1.5 43.5
challenge-084 344.0 280.0
challenge-085 9093.0 9193.0
challenge-086 268.0 26388.0
challenge-087 1.0 1.0
challenge-088 3524.5 3503.5
challenge-089 4132.0 4492.0
challenge-090 209.0 67.0
challenge-091 704.0 640.0
challenge-092 32.0 344.0
challenge-093 3704.0 3578.0
challenge-094 8.0 16242.0
challenge-095 9242.0 9355.0
challenge-096 7928.0 247.0
challenge-097 1.0 1.0
challenge-098 3007.0 3090.0
challenge-099 10038.0 10059.0
challenge-100 7538.0 7101.0
challenge-101 12311.0 13007.0
challenge-102 4935.0 4425.0
challenge-103 1.0 1.0
challenge-104 115.0 1897.0
challenge-105 38.0 43.0
challenge-106 8520.0 3868.0
challenge-107 5550.0 5559.0
challenge-108 4.0 3721.0
challenge-109 9176.0 8997.0
challenge-110 6637.0 6611.0
challenge-111 2.0 6.0
challenge-112 1222.0 6219.0
challenge-113 380.0 399.0
challenge-114 7775.0 9881.0
challenge-115 1.0 1.0
challenge-116 1.0 39.0
challenge-117 2629.0 3755.0
challenge-118 6421.0 6842.0
challenge-119 9798.0 8871.0
challenge-120 1.0 10219.0
challenge-121 336.0 46.0
challenge-122 477.0 5358.0
challenge-123 11390.0 13377.0
challenge-124 1366.0 1349.0
challenge-125 2163.5 2277.5
challenge-126 32.0 39.0
challenge-127 562.0 350.0
challenge-128 7700.0 1.0
challenge-129 1.0 1879.0
challenge-130 72.0 110.0
challenge-131 379.0 384.0
challenge-132 1119.0 2367.0
challenge-133 4.0 11.0
challenge-134 1881.0 11909.0
challenge-135 567.5 596.5
challenge-136 1.0 1.0
challenge-137 6.5 1.5
challenge-138 22.0 3.0
challenge-139 2.0 185.0
challenge-140 1.0 11612.0
challenge-141 1.0 1.0
challenge-142 320.0 5299.0
challenge-143 50.0 10.0
challenge-144 25.0 4.0
challenge-145 25.0 1.0
challenge-146 11720.0 11091.0
challenge-147 1.0 1293.0
challenge-148 2.0 1.0
challenge-149 1.0 1.0
challenge-150 2.0 1722.0
challenge-151 10338.0 7983.0
challenge-152 55.0 80.0
challenge-153 23.0 34.0
challenge-154 10102.0 7472.0
challenge-155 2.0 3.0
challenge-156 15.0 20.0
challenge-157 25.0 13.0
challenge-158 3.0 9544.0
challenge-159 1.0 2299.0
challenge-160 12176.0 2629.0
challenge-161 261.0 87.0
challenge-162 1.0 17958.0
challenge-163 1.0 1.0
challenge-164 4.0 1.0
challenge-165 1988.0 2004.0
challenge-166 3641.0 3407.0
challenge-167 14821.0 15347.0
challenge-168 5.0 2212.0
challenge-169 27.0 578.0
challenge-170 1.0 1.0
challenge-171 2.0 1.0
challenge-172 204.0 444.0
challenge-173 1353.0 16181.0
challenge-174 6409.0 6948.0
challenge-175 1396.0 14.0
challenge-176 14.0 283.0
challenge-177 1.0 414.0
challenge-178 14960.0 1385.0
challenge-179 1732.0 2422.0
challenge-180 277.0 289.0
challenge-181 2364.0 16151.0
challenge-182 27.0 39.0
challenge-183 1.5 1.5
challenge-184 1.0 1.0
challenge-185 3729.0 11832.0
challenge-186 2.0 14.0
challenge-187 133.0 243.0
challenge-188 2.0 1.0
challenge-189 1138.0 47.0
challenge-190 2349.0 3666.0
challenge-191 974.0 976.0
challenge-192 36.0 5022.0
challenge-193 1.0 1.0
challenge-194 65.0 8323.0
challenge-195 1.5 3.5
challenge-196 2855.0 2.0
challenge-197 34.0 34.0
challenge-198 1.0 1.0
challenge-199 2745.0 2795.0
challenge-200 1.0 1.0
challenge-201 15085.0 26330.0
challenge-202 2337.0 18616.0
challenge-203 1.0 1.0
challenge-204 6392.0 6410.0
challenge-205 3.0 16.0
challenge-206 2329.0 2407.0
challenge-207 10269.0 9551.0
challenge-208 33.0 139.0
challenge-209 198.0 31.0
challenge-210 32.0 4671.0
challenge-211 1.0 1.0
challenge-212 15505.0 15390.0
challenge-213 1.0 1.0
challenge-214 9752.0 10013.0
challenge-215 9.0 3860.0
challenge-216 8.0 13.0
challenge-217 2.0 9.0
challenge-218 2.0 2.0
challenge-219 1.0 26.0
challenge-220 1.0 15801.0
challenge-221 18460.0 19164.0
challenge-222 585.0 12334.0
challenge-223 1.0 62.0
challenge-224 2.0 2.0
challenge-225 1.5 5.5
challenge-226 858.0 227.0
challenge-227 2908.0 2873.0
challenge-228 5.0 4252.0
challenge-229 18334.0 18559.0
challenge-230 15108.0 18378.0
challenge-231 1373.0 1369.0
challenge-232 1.0 1.0
challenge-233 283.0 2848.0
challenge-234 11448.0 11475.0
challenge-235 3.0 1536.0
challenge-236 7.0 1.0
challenge-237 3.0 21.0
challenge-238 237.0 23.0
challenge-239 8.0 1.0
challenge-240 628.0 5456.0
challenge-241 4.0 1.0
challenge-242 1.0 1.0
challenge-243 2.0 2.0
This summary is also available as CSV download.
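The table conventions described above ("-" when the correct candidate is missing, an empty cell when a participant did not take part, the lowest rank per challenge in bold) can be applied to a CSV export of the table with a sketch like the following. The CSV layout (challenge ID in the first column, one column per participant) and the function names are assumptions; the actual download may be laid out differently.

```python
import csv
import io

MISSING = float("inf")   # "-": correct candidate not in the submission


def parse_rank_table(csv_text):
    """Parse a per-challenge rank table; assumed CSV layout: first column is
    the challenge ID, then one column per participant."""
    reader = csv.reader(io.StringIO(csv_text))
    participants = next(reader)[1:]
    table = {}
    for row in reader:
        if not row:
            continue
        ranks = {}
        for name, cell in zip(participants, row[1:]):
            cell = cell.strip()
            if cell == "":
                continue                       # empty: did not participate
            ranks[name] = MISSING if cell == "-" else float(cell)
        table[row[0]] = ranks
    return participants, table


def best_per_challenge(table):
    """Participants sharing the lowest rank per challenge (shown in bold)."""
    best = {}
    for challenge, ranks in table.items():
        found = {p: r for p, r in ranks.items() if r != MISSING}
        low = min(found.values(), default=None)
        best[challenge] = sorted(p for p, r in found.items() if r == low)
    return best


if __name__ == "__main__":
    sample = (
        "challenge,IOKR_TanimotoGaussian_AvgScore,MPIOKR_GaussianRFF\n"
        "challenge-050,14.0,2.0\n"
        "challenge-053,2.0,2.0\n"
        "example-x,-,\n"                       # hypothetical miss / no entry
    )
    _, table = parse_rank_table(sample)
    print(best_per_challenge(table))
```

Mapping a miss to infinity keeps it out of the bold highlighting while still distinguishing it from a challenge the participant skipped.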


Participant information and abstracts

Participant:          Bach
ParticipantID:        IOKR_TanimotoGaussian
Category:             4
Authors:              Eric Bach(1), Céline Brouard(1,2), Kai Dührkop(3),
                      Sebastian Böcker(3) and Juho Rousu(1,2)
Affiliations:         (1) Department of Computer Science, Aalto University,
                          Espoo, Finland
                      (2) Helsinki Institute for Information Technology, Espoo,
                          Finland
                      (3) Chair for Bioinformatics, Friedrich-Schiller University,
                          Jena, Germany
Automatic pipeline:   yes
Spectral libraries:   no

Abstract
We used a recent machine learning approach, called Input Output Kernel Regression 
(IOKR), for predicting the candidate scores. IOKR has been successfully applied 
to metabolite identification [1]. 

In this method, kernel functions are used to measure the similarity between MS/MS
spectra (input kernel) and between molecular structures (output kernel).
On the input side, we use several kernels defined on MS/MS spectra and fragmentation
trees, and combine them uniformly, i.e. we sum up the kernels with equal weights.
On the output side, we use a Gaussian kernel on Tanimoto features calculated 
from binary fingerprints representing the molecular structures in the candidate 
sets.
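The kernel setup described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, "Gaussian kernel on Tanimoto features" is read here as a Gaussian kernel on the distance in the Tanimoto kernel's feature space (squared distance 2 - 2T for non-empty fingerprints), two random linear kernels stand in for the spectrum and fragmentation-tree kernels, and candidates are scored with the kernel ridge form of IOKR, score(y) = k_y(y)^T (K_x + lam I)^{-1} k_x(x).

```python
import numpy as np


def tanimoto_kernel(A, B):
    """Tanimoto similarity between the rows of binary fingerprint matrices."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    inter = A @ B.T                                     # shared on-bits
    union = A.sum(axis=1)[:, None] + B.sum(axis=1)[None, :] - inter
    # Two empty fingerprints are treated as identical (similarity 1).
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 1.0)


def gaussian_tanimoto_kernel(A, B, gamma=1.0):
    """Gaussian kernel on the Tanimoto feature-space distance (assumption):
    ||phi(a) - phi(b)||^2 = 2 - 2 * T(a, b) when T(a, a) = T(b, b) = 1."""
    return np.exp(-gamma * (2.0 - 2.0 * tanimoto_kernel(A, B)))


def uniform_combination(kernels):
    """Combine input kernels with equal weights (here: their mean)."""
    return np.mean(np.stack(kernels), axis=0)


def iokr_scores(Kx_train, kx_test, Ky_cand_train, lam=1.0):
    """Kernel-ridge IOKR scores for one test spectrum.
    Kx_train: (l, l) combined input kernel on the training spectra.
    kx_test: (l,) input kernel between training spectra and the test spectrum.
    Ky_cand_train: (n, l) output kernel between candidates and training
    structures. Returns one score per candidate (higher is better)."""
    l = Kx_train.shape[0]
    alpha = np.linalg.solve(Kx_train + lam * np.eye(l), kx_test)
    return Ky_cand_train @ alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y_train = rng.integers(0, 2, size=(20, 64))        # toy training fingerprints
    X1, X2 = rng.normal(size=(20, 50)), rng.normal(size=(20, 50))  # toy input views
    Kx = uniform_combination([X1 @ X1.T, X2 @ X2.T])
    # Test spectrum equals training spectrum 0; its structure is candidate 0.
    candidates = np.vstack([Y_train[0], rng.integers(0, 2, size=(5, 64))])
    Ky = gaussian_tanimoto_kernel(candidates, Y_train)
    scores = iokr_scores(Kx, Kx[:, 0], Ky, lam=1e-8)
    print(int(np.argmax(scores)))
```

With a tiny regularization and a test spectrum identical to a training spectrum, the scores reduce to the output-kernel similarity to that training structure, so the matching candidate is ranked first; this is a sanity check of the algebra, not an evaluation.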

We train two separate IOKR models, one for each ionization mode (positive and
negative). For the positive model we use ~14000 identified MS/MS spectra and
for the negative model ~5800. These spectra are mainly extracted from the GNPS
and MassBank databases. We represent the candidate molecular structures using
~7600 binary molecular fingerprints.

For each challenge spectrum, we predict the molecular formula using Sirius [2],
taking into account the possible molecular formulas based on the candidate sets.
The score we submitted for each candidate is the one corresponding to the most
likely molecular formula.

[1] Brouard, C.; Shen, H.; Dührkop, K.; d'Alché-Buc, F.; Böcker, S. & Rousu, J.
    Fast metabolite identification with Input Output Kernel Regression.
    Bioinformatics, 2016.
[2] https://bio.informatik.uni-jena.de/software/sirius/

Participant:          Bach
ParticipantID:        MPIOKR_GaussianRFF
Category:             4
Authors:              Eric Bach(1), Céline Brouard(1,2), Kai Dührkop(3),
                      Sebastian Böcker(3) and Juho Rousu(1,2)
Affiliations:         (1) Department of Computer Science, Aalto University,
                          Espoo, Finland
                      (2) Helsinki Institute for Information Technology,
                          Espoo, Finland
                      (3) Chair for Bioinformatics, Friedrich-Schiller University,
                          Jena, Germany
Automatic pipeline:   yes
Spectral libraries:   no

Abstract

Magnitude-preserving Input Output Kernel Regression (MP-IOKR) is an
extension of the Input Output Kernel Regression (IOKR) method [1],
which has been successfully applied to metabolite identification
[2]. Magnitude-preserving IOKR uses a modified objective function,
which can exploit the knowledge about the molecular candidates for a
set of training MS/MS spectra.

IOKR objective function for the regression function h (prediction of
the feature vector representing a molecular structure):

(1) h = argmin_h sum_i ||h(x_i) - psi(y_i)||^2 + lambda ||h||^2
        
Magnitude-preserving modification of the objective function:

(2) h = argmin_h sum_i 1/n_i sum_j ||(h(x_i)-h(x_j))-(psi(y_i)-psi(y_j))||^2
                    + lambda ||h||^2
                    
where i in {1,...,l} indexes the l training examples, j in {1,...,n_i} indexes the
n_i molecular candidates of training example i, psi(y) is the feature vector
representing molecular structure y, and lambda > 0 is a regularization parameter.
x_i and y_i are the training MS/MS spectra and the corresponding training molecular
structures; x_j and y_j are the candidates' MS/MS spectra and molecular structures.
Equation (1) (IOKR) minimizes the error between the predicted feature vectors
h(x_i) of the training MS/MS spectra and the feature vectors psi(y_i) of the
training molecular structures. In contrast, Equation (2) (MP-IOKR) learns a
function h that also preserves the magnitudes (differences) between the training
molecular structure (y_i) and all of its candidate molecular structures (y_j). In
this way we take into account how the true candidate relates to all the remaining
candidates and include this knowledge in our learning problem. Note that we do not
need the MS/MS spectrum x_j of each candidate, as we approximate the corresponding input feature
vectors for each candidate using the molecular structure y_j.
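To make the difference between Equations (1) and (2) concrete, the sketch below evaluates both objectives for a hypothetical linear model h(x) = W phi(x) with explicit, finite-dimensional input features phi and output features psi, so that ||h||^2 becomes the squared Frobenius norm of W. It illustrates the two loss functions only and is not the authors' solver; the array layout and function names are assumptions.

```python
import numpy as np


def iokr_objective(W, Phi, Psi, lam):
    """Equation (1) for a linear model h(x) = W @ phi(x).
    Phi: (l, p) input features, Psi: (l, q) output features, W: (q, p)."""
    residuals = Phi @ W.T - Psi                  # row i: h(x_i) - psi(y_i)
    return float(np.sum(residuals ** 2) + lam * np.sum(W ** 2))


def mp_iokr_objective(W, Phi, Psi, cand_Phi, cand_Psi, lam):
    """Equation (2) for the same linear model.
    cand_Phi[i]: (n_i, p) (approximated) input features of the candidates of
    training example i; cand_Psi[i]: (n_i, q) their output features."""
    total = 0.0
    for i in range(Phi.shape[0]):
        h_i = W @ Phi[i]                         # h(x_i), shape (q,)
        h_j = cand_Phi[i] @ W.T                  # h(x_j) for every candidate j
        diff = (h_i - h_j) - (Psi[i] - cand_Psi[i])
        total += np.sum(diff ** 2) / cand_Phi[i].shape[0]   # 1/n_i weighting
    return float(total + lam * np.sum(W ** 2))


if __name__ == "__main__":
    W = np.array([[2.0]])
    Phi, Psi = np.array([[1.0]]), np.array([[1.0]])
    cand_Phi = [np.array([[0.0], [1.0]])]
    cand_Psi = [np.array([[0.0], [3.0]])]
    print(iokr_objective(W, Phi, Psi, lam=0.5))
    print(mp_iokr_objective(W, Phi, Psi, cand_Phi, cand_Psi, lam=0.5))
```

In this scalar toy case (W = 2, phi(x_1) = psi(y_1) = 1, candidates (phi, psi) = (0, 0) and (1, 3)), the data term of Equation (1) is 1, while Equation (2) gives (1 + 4)/2 = 2.5, because the second candidate's difference to the true structure is not preserved by h. Since only differences enter the data term of Equation (2), adding the same constant vector to every prediction h(x) would leave that term unchanged.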

MP-IOKR is a kernel method. Kernels measure the similarity between structured
objects, e.g. MS/MS spectra (input kernel) or molecular structures (output kernel).
On the input side, we use several kernels defined on MS/MS spectra and fragmentation
trees, and combine them uniformly, i.e. we sum up the kernels with equal weights.
On the output side, we use a Gaussian kernel calculated from binary fingerprints
representing the molecular structures in the candidate sets. As we cannot deal
with possibly millions of candidates (and kernel matrices of size O(million^2)), we
approximate the Gaussian features and use those in our framework.
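The participant ID MPIOKR_GaussianRFF suggests that this approximation uses random Fourier features (Rahimi and Recht), although the abstract does not name the technique. Assuming that, a minimal NumPy sketch: for the Gaussian kernel exp(-gamma ||x - y||^2), draw frequencies w ~ N(0, 2 gamma I) and phases b ~ U[0, 2 pi]; the explicit features z(x) = sqrt(2/D) cos(W^T x + b) then satisfy z(x)^T z(y) ≈ k(x, y). The feature count D, the value of gamma, and the function name are illustrative choices.

```python
import numpy as np


def make_rff_map(dim, n_features, gamma, seed=0):
    """Return a function mapping (m, dim) arrays to (m, n_features) random
    Fourier features with z(x) @ z(y) ~= exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    # The Fourier transform of exp(-gamma ||d||^2) is Gaussian: w ~ N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(dim, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def transform(X):
        X = np.asarray(X, dtype=float)
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    return transform


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    F = rng.integers(0, 2, size=(5, 64))               # toy binary fingerprints
    gamma = 1.0 / 32.0
    sq_dist = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=-1)
    exact = np.exp(-gamma * sq_dist)
    z = make_rff_map(64, 20000, gamma)                  # one shared random draw
    approx = z(F) @ z(F).T
    print(np.max(np.abs(approx - exact)))
```

Because each molecule then has an explicit D-dimensional feature vector, memory grows linearly with the number of candidates instead of quadratically as with a full kernel matrix; the price is a Monte Carlo approximation error that shrinks roughly as 1/sqrt(D). The same random draw (W, b) must be shared by all molecules, which is why the map is created once and reused.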

We train two separate MP-IOKR models, one for each ionization mode (positive and
negative). For the positive model we use ~14000 identified MS/MS spectra plus
~4 million candidates and their molecular structures. The MS/MS spectra are mainly
extracted from the GNPS and MassBank databases. For the negative model we
use ~5800 identified MS/MS spectra and ~1.5 million candidates. We represent the
candidate molecular structures using ~7600 binary molecular fingerprints.

For each challenge spectrum, we predict the molecular formula using Sirius [3],
taking into account the possible molecular formulas based on the candidate sets.
The score we submitted for each candidate is the one corresponding to the most
likely molecular formula.

[1] Brouard, C.; Shen, H.; Dührkop, K.; d'Alché-Buc, F.; Böcker, S. & Rousu, J.
    Fast metabolite identification with Input Output Kernel Regression.
    Bioinformatics, 2016.
[2] Brouard, C.; Bach, E.; Böcker, S. & Rousu, J.
    Magnitude-Preserving Ranking for Structured Outputs.
    Submitted, 2017.
[3] https://bio.informatik.uni-jena.de/software/sirius/

Details per Challenge and Participant. See legend at bottom for more details

The details table is also available as HTML and as CSV download.