News

Oct 31st, 2017
The results are now available.

Oct 30th, 2017
The solutions are now available.

Sept 8th, 2017
An update for Challenge 15 is available, but it will not count in the evaluation.

Sept 4th, 2017
Updated mailing list and submission information.

Aug 23rd, 2017
The preliminary results have been sent out to participants, and are now available.

July 9th, 2017
We fixed the intensities in the TSV archive for challenges 046-243.

June 22nd, 2017
We added Category 4 for a subset of the data files.

May 22nd, 2017
We have improved challenges 29, 42, 71, 89, 105, 106 and 144.

April 26th, 2017
The rules and challenges of CASMI 2017 are now public!

Jan 20th, 2017
Organisation of CASMI 2017 is underway, stay tuned!


Results in Category 1

Summary of participant performance

Participant  F1 score  Mean rank  Median rank  Top  Top3  Top10  Misses  TopPos  TopNeg  Mean RRP  Median RRP   N
kai               469     235.02          4.0   11    19     28       4       9       2     0.965       0.999  45
kai112            350     266.00          3.0   12    14     16      17       9       3     0.985       1.000  45
CStacey           330       7.74          3.0    7    13     17      22       7       0     0.751       0.875  45
SamuelB2          319      12.12          4.0    6    12     21      16       4       2     0.929       0.982  45
SamuelB3          318      12.74          4.0    6    12     20      16       4       2     0.921       0.982  45
seeslab           237      18.29         13.0    4    10     15      12       3       1     0.926       0.957  45
SamuelB           229      15.43          9.0    4     8     17      16       2       2     0.904       0.931  45
Rakesh            192       1.89          1.0    6     8      9      36       4       2     0.998       1.000  45
This summary is also available as a CSV download.

Table legend:

F1 score
The Formula 1 score awards points for each challenge based on the rank of the correct solution, similar to the points scheme in F1 racing. In the participant table, these points are summed over all challenges. Please note that the F1 score is therefore not necessarily comparable across categories.
Mean/Median rank
Mean and median rank of the correct solution. For tied ranks with other candidates, the average rank of the ties is used.
Top, Top3, Top10
Number of challenges where the correct solution is ranked first, among the top 3, or among the top 10.
Misses
Number of challenges where the correct solution is missing from the submission.
TopPos, TopNeg
Number of challenges in positive or negative ionization mode, respectively, where the correct solution is ranked first.
Mean/Median RRP
The relative ranking position (RRP), which also takes the length of the candidate list into account.
N
Number of submissions that passed the evaluation scripts.
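The legend does not spell out the RRP formula. The minimal sketch below assumes the form RRP = 1/2 (1 - (BC - WC) / (TC - 1)), where BC and WC are the numbers of candidates scored better and worse than the correct solution and TC is the length of the candidate list. With this orientation a first-ranked solution scores 1 and a last-ranked one 0, in line with the values close to 1 reported above. Whether the CASMI 2017 evaluation scripts use exactly this form, and how they treat missed solutions, is an assumption here.

```python
def relative_ranking_position(better: int, worse: int, total: int) -> float:
    """RRP of the correct candidate under the assumed form
    0.5 * (1 - (BC - WC) / (TC - 1)): 1.0 if ranked first, 0.0 if ranked last.

    better: candidates scored higher than the correct one (BC)
    worse:  candidates scored lower than the correct one (WC)
    total:  length of the candidate list (TC); tied candidates count in
            neither BC nor WC
    """
    if total < 2:
        # A one-candidate list carries no ranking information; returning 1.0
        # here is a choice made for this sketch.
        return 1.0
    return 0.5 * (1 - (better - worse) / (total - 1))


# Rank 1 of 100, rank 100 of 100, and rank 2 in lists of 3 and of 1000.
print(relative_ranking_position(0, 99, 100))              # 1.0
print(relative_ranking_position(99, 0, 100))              # 0.0
print(relative_ranking_position(1, 1, 3))                 # 0.5
print(round(relative_ranking_position(1, 998, 1000), 3))  # 0.999
```

The same rank of 2 yields 0.5 in a list of 3 candidates but about 0.999 in a list of 1000, which is how RRP takes the candidate list length into account, unlike the plain rank.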

Summary of Rank by Challenge

If a submission did not contain the correct candidate, this is denoted by "-". If a participant did not take part in a challenge, the table cell is empty.

Category 1:

Challenge CStacey kai kai112 Rakesh SamuelB SamuelB2 SamuelB3 seeslab
challenge-001 - 31.0 9.0 - - - - 1.0
challenge-002 3.5 11.0 - - 4.0 4.0 4.0 21.0
challenge-003 - 1.0 47.0 - 1.0 1.0 1.0 7.0
challenge-004 - 1.0 1.0 2.0 - - - 3.0
challenge-005 - 95.0 101.0 - 22.0 3.0 3.0 31.0
challenge-006 12.5 1.0 1.0 - 18.0 1.0 1.0 16.0
challenge-007 13.0 1.0 1.0 1.0 17.0 1.0 1.0 2.0
challenge-008 1.5 1.0 1.0 - - - - 9.0
challenge-009 3.5 1.0 1.0 - 10.0 10.0 10.0 24.0
challenge-010 1.0 8584.0 - - 4.0 4.0 4.0 -
challenge-011 1.0 2.0 2.0 - 1.0 1.0 1.0 -
challenge-012 - 4.0 2556.0 - 9.0 9.0 9.0 14.0
challenge-013 1.0 6.0 1.0 - 33.0 3.0 3.0 27.0
challenge-014 - 180.0 118.0 - 8.0 2.0 2.0 38.0
challenge-015 - 14.0 21.0 - 9.0 10.0 28.0 4.5
challenge-016 - 1.0 1.0 - 59.0 59.0 59.0 25.0
challenge-017 2.5 1.0 19.0 - 1.0 1.0 1.0 13.0
challenge-018 - 92.0 - - 13.0 13.0 13.0 13.0
challenge-019 1.0 2.0 - - - - - -
challenge-020 - 42.0 - - 3.0 3.0 3.0 7.0
challenge-021 4.0 372.0 3805.0 - - - - -
challenge-022 1.5 3.0 - - 4.0 4.0 4.0 2.0
challenge-023 2.5 25.0 252.0 - 15.0 15.0 15.0 -
challenge-024 1.0 7.0 258.0 - 1.5 1.5 1.5 1.0
challenge-025 12.0 10.0 - - - - - 23.0
challenge-026 32.0 2.0 1.0 - - - - 27.0
challenge-027 1.0 - 122.0 - 8.0 8.0 8.0 -
challenge-028 - 10.0 34.0 - - - - 106.0
challenge-029 1.0 7.0 - 1.0 3.0 3.0 3.0 1.0
challenge-030 - 3.0 4.0 - 3.0 4.0 4.0 -
challenge-031 - 4.0 2.0 1.0 22.0 22.0 22.0 2.0
challenge-032 - 2.0 - 7.0 - - - 14.0
challenge-033 - 59.0 - - - - - -
challenge-034 - - - - - - - 66.0
challenge-035 - 1.0 - - 31.0 31.0 31.0 60.0
challenge-036 3.0 2.0 1.0 1.0 94.5 84.5 84.5 5.0
challenge-037 - 14.0 1.0 1.0 20.0 20.0 20.0 2.0
challenge-038 22.5 25.0 1.0 2.0 29.0 29.0 29.0 1.0
challenge-039 3.0 10.0 - - 1.0 1.0 1.0 12.0
challenge-040 7.0 3.0 - 1.0 3.5 3.5 3.5 3.0
challenge-041 47.0 1.0 1.0 - - - - -
challenge-042 - 1.0 - - - - - 23.0
challenge-043 - - - - - - - -
challenge-044 - 4.0 86.0 - - - - -
challenge-045 - - - - - - - -
This summary is also available as a CSV download.
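Most columns of the participant summary follow directly from a participant's column in the per-challenge table. The sketch below assumes the standard Formula 1 points table (25, 18, 15, 12, 10, 8, 6, 4, 2 and 1 points for ranks 1 to 10) and takes mean and median over found solutions only. Under these assumptions, kai's integer ranks reproduce the reported F1 score of 469, mean rank of 235.02, median rank of 4.0 and the Top, Top3, Top10 and Misses counts. The page does not say how tied ranks such as 3.5 are converted to F1 points, so the illustrative `f1_points` helper rejects them rather than guess; TopPos and TopNeg would additionally need the ionization mode of each challenge.

```python
from statistics import mean, median
from typing import Optional, Sequence

# Assumed Formula 1 points for ranks 1..10 (the racing scheme used since 2010).
F1_POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def f1_points(rank: float) -> int:
    """Points for one challenge; tie handling is unspecified, so only integer ranks are scored."""
    if rank != int(rank):
        raise ValueError(f"conversion of tied rank {rank} to points is not specified")
    r = int(rank)
    return F1_POINTS[r - 1] if 1 <= r <= len(F1_POINTS) else 0


def summarize(ranks: Sequence[Optional[float]]) -> dict:
    """Summary row for one participant; None marks a missed challenge ('-')."""
    found = [r for r in ranks if r is not None]
    return {
        "F1 score": sum(f1_points(r) for r in found),
        "Mean rank": round(mean(found), 2),
        "Median rank": float(median(found)),
        "Top": sum(r == 1 for r in found),
        "Top3": sum(r <= 3 for r in found),
        "Top10": sum(r <= 10 for r in found),
        "Misses": len(ranks) - len(found),
    }


# kai's column from the Category 1 table (challenge-001 to challenge-045).
kai = [31, 11, 1, 1, 95, 1, 1, 1, 1, 8584, 2, 4, 6, 180, 14, 1, 1, 92, 2, 42,
       372, 3, 25, 7, 10, 2, None, 10, 7, 3, 4, 2, 59, None, 1, 2, 14, 25, 10,
       3, 1, 1, None, 4, None]
print(summarize(kai))
```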


Participant information and abstracts

ParticipantID:        CatherineStacey
Category:             category1
Authors:              Catherine Stacey
Affiliations:         retired

Automatic methods:    yes

Abstract

The challenge data were automatically treated using software written
in C#, using the OBDotNet library, to do the following:

1- retrieve PubChem compounds which matched the mass and error range
   of the parent ion (assuming either pre-charged ions or proton
   adduct/loss)
2- find allowable neutral losses in the MS/MS spectrum, compute the
   elemental formula of the loss, then reject formulas where the loss
   formula is inconsistent with the parent formula
3- score the remaining formulas by isotopic fit and mass accuracy,
   reject low-scoring formulas, and filter the candidates to those
   with the remaining formulas
4- for remaining compounds, convert their SMILES string to Mol format
   and search for allowed neutral loss and diagnostic low mass SMARTS
   motifs in the molecule
5- for candidates which pass this pre-filter, apply full fragmentation
   rules, developed with OpenBabel
6- score each filtered candidate by matches between predicted and
   found fragments, with 1 indicating 100% of fragments are matched
7- for challenges with no PubChem matches, candidates were predicted
   from manual searches of KnapSack, or derived from known molecules
   (e.g. a glucuronide derived from its aglycone), and the OpenBabel
   prediction software was used for confirmation and scoring
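Step 2 rests on checking that a neutral loss is consistent with a candidate parent formula. The minimal sketch below, written in Python rather than the participant's C#/OBDotNet code, assumes singly charged ions, an absolute mass tolerance of 0.005 u and a hypothetical table of three allowed losses (H2O, NH3, CO2); the participant's actual loss list and tolerances are not given, and all function names are illustrative.

```python
import re
from collections import Counter

# Monoisotopic masses (u) of the elements used below.
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Hypothetical table of allowed neutral losses; the participant's list is not given.
ALLOWED_LOSSES = ["H2O", "NH3", "CO2"]


def parse_formula(formula: str) -> Counter:
    """Element counts of a simple formula such as 'C6H12O6' (no brackets)."""
    counts: Counter = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def formula_mass(formula: str) -> float:
    return sum(MASS[el] * n for el, n in parse_formula(formula).items())


def is_subformula(loss: str, parent: str) -> bool:
    """True if the parent contains at least as many atoms of each element as the loss."""
    parent_counts = parse_formula(parent)
    return all(parent_counts[el] >= n for el, n in parse_formula(loss).items())


def consistent_losses(precursor_mz: float, fragment_mz: float,
                      parent_formula: str, tol: float = 0.005) -> list:
    """Allowed losses matching the precursor-fragment mass difference (singly
    charged ions assumed) whose formula fits inside the candidate parent formula."""
    delta = precursor_mz - fragment_mz
    return [loss for loss in ALLOWED_LOSSES
            if abs(formula_mass(loss) - delta) <= tol
            and is_subformula(loss, parent_formula)]


# An 18.0106 u loss observed from a precursor at m/z 181.0707.
print(consistent_losses(181.0707, 163.0601, "C6H12O6"))  # ['H2O']
print(consistent_losses(181.0707, 163.0601, "C6H13N"))   # [] (no oxygen, so no water loss)
```

A candidate formula for which an observed, mass-matched loss is not a subformula is the kind of formula step 2 rejects.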


ParticipantID:        
Category:             Category 1
Authors:              Rakesh Kumar [1], Nilesh Kumar [1], Ranjan Nanda [1],
                      Dinesh Gupta [1]
Affiliations:         [1] International Centre for Genetic Engineering
                      and Biotechnology (ICGEB), New Delhi
Automatic pipeline:   Semi-automated
Spectral libraries:   no

Abstract:

1) The PubChem molecular formulas were stored in a local library.
   The MGF-formatted MS/MS spectra were used directly for the analysis.

2) First, all possible candidates for each principal peak were
   fetched and ranked by natural-product likeness, using our own
   algorithm and an in-house script (Python 2.7). Formulas for the
   remaining sub-peaks were also fetched, and for each explained peak
   an explanation score was appended to the candidate formula. The
   top-ranked formulas were then searched, mainly in KEGG, the
   Dictionary of Natural Products, HMDB, ChemSpider, PubChem and
   other databases, for compounds resembling the given description.
   Natural-product likeness was scored using the metadata available
   for the respective challenges.

3) Finally, the compounds in InChI format and the corresponding
   scores were submitted to CASMI 2017 for all challenges, with
   multiple compounds submitted per challenge.

ParticipantID:        SamuelBMS
Category:             category1
Authors:              Bertrand, Samuel(1)
Affiliations:         (1) Groupe Mer, Molécules, Santé-EA 2160, UFR des Sciences
                      Pharmaceutiques et Biologiques, Université de Nantes, France 
Automatic methods:    yes
Spectral libraries:   no

Abstract

The challenge data were automatically processed using R, CAMERA [1],
SIRIUS3 [2], MeHaloCoA [3], RMassBank [4], Taxize [5], CFM-ID [6] and
Tremolo [7] as follows, with all data stored in a MySQL database
throughout the process:

1- MS data were manually entered into the database.

2- Neutral losses and adducts were searched for in the MS1 spectra using CAMERA.

3- For each challenge, molecular formulas (MF) were obtained using
   SIRIUS3 and discriminated based on isotopic distribution, MS2
   fragmentation (calculated by SIRIUS) and adduct redundancy (the
   number of occurrences of the MF among all adducts divided by the
   maximum number of occurrences of any MF among all proposed MFs).
   The presence of S, Cl and Br atoms was automatically detected using
   MeHaloCoA. The SIRIUS score (Ssirius) was kept for further
   discrimination of the structure.
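The adduct redundancy criterion of step 3 can be sketched as follows. This is a minimal illustration, assuming the SIRIUS proposals have already been corrected to neutral formulas per adduct hypothesis; the adduct labels and formulas are hypothetical, and every occurrence of a formula is counted, as the description says.

```python
from collections import Counter
from typing import Dict, List


def adduct_redundancy(formulas_per_adduct: Dict[str, List[str]]) -> Dict[str, float]:
    """Adduct redundancy of each proposed molecular formula (MF).

    `formulas_per_adduct` maps an adduct hypothesis to the neutral MFs proposed
    for it. Redundancy = occurrences of an MF across all adducts divided by the
    largest number of occurrences of any MF.
    """
    counts = Counter(mf for mfs in formulas_per_adduct.values() for mf in mfs)
    top = max(counts.values())
    return {mf: n / top for mf, n in counts.items()}


# Hypothetical proposals for three adduct hypotheses of one challenge.
proposals = {
    "[M+H]+": ["C15H10O5", "C16H14O4"],
    "[M+Na]+": ["C15H10O5"],
    "[M+NH4]+": ["C15H10O5", "C12H14O7"],
}
for mf, score in adduct_redundancy(proposals).items():
    print(mf, round(score, 2))
# C15H10O5 1.0
# C16H14O4 0.33
# C12H14O7 0.33
```

A formula proposed under every adduct hypothesis gets the maximum value of 1, which is why the ratio favours formulas that are consistent across adducts.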

4- The molecular formulas of the compounds (corrected using the
   adduct information) were searched in various databases (AntiBase,
   ChEBI, DNP, DMNP, GNPS, HMDB, KEGG, KNAPSACK, LipidMaps, MassBank,
   MassBankEU, MONA, OldCASMI, ReSpect, SupernaturalII, UNPD) for
   InChI, SMILES, Mol and biological source data. For each compound
   found in the databases, missing InChI and SMILES data were
   completed (as far as possible) using OpenBabel [6], CTS [7],
   CACTUS [8] or ChemSpider [9].

5- When available, the biological origin of the compound was used for
   scoring: the phylogeny of the challenge compound was compared with
   that of the proposed structures using Taxize. The corresponding
   Sphylo score ranges from -10 to +10 according to phylogenetic
   similarity. A Sphylo of 0 corresponds to no data or a similar
   kingdom.

6- For all structures, in silico MS2 spectra were calculated using
   CFM-ID (where possible). The in silico MS2 spectra were compared
   with the acquired MS2 spectrum of the challenge using Tremolo, and
   the MQScore (cosine index) given by Tremolo was used for further
   discrimination. (No real spectra present in the searched databases
   were used.)

7- The final score (SF) was calculated from the three previously
   described scores as SF = Ssirius + Sphylo + MQScore. When negative
   scores were generated, all scores of the challenge were adjusted
   so that only positive scores remained.
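Step 7 can be sketched as a sum of the three component scores followed by a common shift. This is a minimal illustration under stated assumptions: the abstract does not say how the scores were made positive, so the sketch adds the same constant to every SF of the challenge so that the lowest becomes a hypothetical `offset` of 1.0, which keeps the ranking unchanged; the candidate names and component values are made up.

```python
from typing import Dict, Tuple


def final_scores(components: Dict[str, Tuple[float, float, float]],
                 offset: float = 1.0) -> Dict[str, float]:
    """SF = Ssirius + Sphylo + MQScore for every candidate of one challenge.

    If any SF is negative, the same constant is added to every SF of the
    challenge so that the lowest becomes `offset` (an assumed value); this
    keeps all scores positive without changing the ranking.
    """
    sf = {cand: s_sirius + s_phylo + mq
          for cand, (s_sirius, s_phylo, mq) in components.items()}
    lowest = min(sf.values())
    if lowest < 0:
        shift = offset - lowest
        sf = {cand: score + shift for cand, score in sf.items()}
    return sf


# Hypothetical (Ssirius, Sphylo, MQScore) triples for three candidates.
scores = final_scores({
    "candidate_A": (12.4, 5.0, 0.81),
    "candidate_B": (2.1, -10.0, 0.42),
    "candidate_C": (3.1, 0.0, 0.15),
})
print(sorted(scores, key=scores.get, reverse=True))  # ['candidate_A', 'candidate_C', 'candidate_B']
```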

Bibliography:
[1] C. Kuhl, et al., Anal. Chem., 2012, 84, 283.
[2] S. Böcker, et al., Bioinformatics, 2009, 25, 218.
[3] C. Roullier, et al., Anal. Chem., 2016, 88(18), 9143.
[4] M. A. Stravs, et al., J. Mass Spectrom., 2013, 48(1), 89.
[5] S. Chamberlain, et al., F1000Research, 2013, 2, 191.
[5] F. Allen, et al., Metabolomics, 2014, 11, 98.
[6] N. O'Boyle, et al., J. Cheminformatics, 2011, 3, 33.
[7] G. Wohlgemuth, et al., Bioinformatics, 2010, 26, 2647.
[8] http://cactus.nci.nih.gov/chemical/structure
[9] H. E. Pence, et al., J. Chem. Educ., 2010, 87, 1123.

ParticipantID:        SamuelBMS2
Category:             category1
Authors:              Bertrand, Samuel(1)
Affiliations:         (1) Groupe Mer, Molécules, Santé-EA 2160, UFR des Sciences
                      Pharmaceutiques et Biologiques, Université de Nantes, France
Automatic methods:    yes
Spectral libraries:   no

Abstract

The challenge data were automatically processed using R, CAMERA [1],
SIRIUS3 [2], MeHaloCoA [3], RMassBank [4], Taxize [5], CFM-ID [6] and
Tremolo [7] as follows, with all data stored in a MySQL database
throughout the process:

1- MS data were manually entered into the database.

2- Neutral losses and adducts were searched for in the MS1 spectra
   using CAMERA.

3- For each challenge, molecular formulas (MF) were obtained using
   SIRIUS3 and discriminated based on isotopic distribution, MS2
   fragmentation (calculated by SIRIUS) and adduct redundancy (the
   number of occurrences of the MF among all adducts divided by the
   maximum number of occurrences of any MF among all proposed MFs).
   The presence of S, Cl and Br atoms was automatically detected using
   MeHaloCoA. The SIRIUS score (Ssirius) was kept for further
   discrimination of the structure.

4- The molecular formulas of the compounds (corrected using the
   adduct information) were searched in various databases (AntiBase,
   ChEBI, DNP, DMNP, GNPS, HMDB, KEGG, KNAPSACK, LipidMaps, MassBank,
   MassBankEU, MONA, OldCASMI, ReSpect, SupernaturalII, UNPD) for
   InChI, SMILES, Mol and biological source data. For each compound
   found in the databases, missing InChI and SMILES data were
   completed (as far as possible) using OpenBabel [6], CTS [7],
   CACTUS [8] or ChemSpider [9].

5- When available, the biological origin of the compound was used for
   scoring: the phylogeny of the challenge compound was compared with
   that of the proposed structures using Taxize. The corresponding
   Sphylo score ranges from -10 to +10 according to phylogenetic
   similarity. A Sphylo of 0 corresponds to no data or a similar
   kingdom.

6- For all structures, in silico MS2 spectra were calculated using
   CFM-ID (where possible). The MS2 spectra from databases (GNPS, HMDB,
   MassBank, MassBankEU, MONA, OldCASMI, ReSpect) as well as the in
   silico MS2 spectra were compared with the acquired MS2 spectrum of
   the challenge using Tremolo. The MQScore (cosine index) given by
   Tremolo was used for further discrimination. (No real spectra
   present in the searched databases were used.)
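The spectrum comparison in step 6 can be illustrated with a plain binned cosine similarity. This is a stand-in, not Tremolo's MQScore implementation: it assumes peak lists of (m/z, intensity) pairs, a hypothetical 0.01 m/z bin width, and that a candidate's score is its best cosine over all of its database and in silico reference spectra; the example spectra are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple

Spectrum = List[Tuple[float, float]]  # (m/z, intensity) pairs


def binned(spectrum: Spectrum, bin_width: float = 0.01) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    bins: Dict[int, float] = {}
    for mz, intensity in spectrum:
        key = round(mz / bin_width)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins


def cosine(a: Spectrum, b: Spectrum, bin_width: float = 0.01) -> float:
    """Cosine similarity of two binned spectra: 1 for identical peak patterns,
    0 when no bins are shared."""
    va, vb = binned(a, bin_width), binned(b, bin_width)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(x * x for x in va.values()))
    norm_b = math.sqrt(sum(x * x for x in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_cosine(query: Spectrum, references: Sequence[Spectrum]) -> float:
    """Highest cosine between the challenge spectrum and any reference spectrum
    (database or in silico) of one candidate structure."""
    return max((cosine(query, ref) for ref in references), default=0.0)


query = [(85.03, 40.0), (127.04, 100.0), (145.05, 60.0)]
database_ms2 = [(85.03, 35.0), (127.04, 90.0), (163.06, 20.0)]
in_silico_ms2 = [(127.04, 100.0), (145.05, 50.0)]
print(round(best_cosine(query, [database_ms2, in_silico_ms2]), 3))
```

Fixed-width bins are the simplest choice; peaks that fall on either side of a bin edge are not matched, which tolerance-window matching would avoid.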

7- The final score (SF) was calculated from the three previously
   described scores as SF = Ssirius + Sphylo + MQScore. When negative
   scores were generated, all scores of the challenge were adjusted
   so that only positive scores remained.

Bibliography:
[1] C. Kuhl, et al., Anal. Chem., 2012, 84, 283.
[2] S. Böcker, et al., Bioinformatics, 2009, 25, 218.
[3] C. Roullier, et al., Anal. Chem., 2016, 88(18), 9143.
[4] M. A. Stravs, et al., J. Mass Spectrom., 2013, 48(1), 89.
[5] S. Chamberlain, et al., F1000Research, 2013, 2, 191.
[5] F. Allen, et al., Metabolomics, 2014, 11, 98.
[6] N. O'Boyle, et al., J. Cheminformatics, 2011, 3, 33.
[7] G. Wohlgemuth, et al., Bioinformatics, 2010, 26, 2647.
[8] http://cactus.nci.nih.gov/chemical/structure
[9] H. E. Pence, et al., J. Chem. Educ., 2010, 87, 1123.

ParticipantID:        SamuelBMS3
Category:             category1
Authors:              Bertrand, Samuel(1)
Affiliations:         (1) Groupe Mer, Molécules, Santé-EA 2160, UFR des Sciences
                      Pharmaceutiques et Biologiques, Université de Nantes, France 
Automatic methods:    yes
Spectral libraries:   no

Abstract

The challenge data were automatically processed using R, CAMERA [1],
SIRIUS3 [2], MeHaloCoA [3], RMassBank [4], Taxize [5], CFM-ID [6] and
Tremolo [7] as follows, with all data stored in a MySQL database
throughout the process:

1- MS data were manually entered into the database.

2- Neutral losses and adducts were searched for in the MS1 spectra
   using CAMERA.

3- For each challenge, molecular formulas (MF) were obtained using
   SIRIUS3 and discriminated based on isotopic distribution, MS2
   fragmentation (calculated by SIRIUS) and adduct redundancy (the
   number of occurrences of the MF among all adducts divided by the
   maximum number of occurrences of any MF among all proposed MFs).
   The presence of S, Cl and Br atoms was automatically detected using
   MeHaloCoA. The SIRIUS score (Ssirius) was kept for further
   discrimination of the structure.

4- The molecular formulas of the compounds (corrected using the
   adduct information) were searched in various databases (AntiBase,
   ChEBI, DNP, DMNP, GNPS, HMDB, KEGG, KNAPSACK, LipidMaps, MassBank,
   MassBankEU, MONA, OldCASMI, ReSpect, SupernaturalII, UNPD) for
   InChI, SMILES, Mol and biological source data. For each compound
   found in the databases, missing InChI and SMILES data were
   completed (as far as possible) using OpenBabel [6], CTS [7],
   CACTUS [8] or ChemSpider [9].

5- When available, the biological origin of the compound was used for
   scoring: the phylogeny of the challenge compound was compared with
   that of the proposed structures using Taxize. The corresponding
   Sphylo score ranges from -10 to +10 according to phylogenetic
   similarity. A Sphylo of 0 corresponds to no data or a similar
   kingdom.

6- For all structures, in silico MS2 spectra were calculated using
   CFM-ID (where possible). The MS2 spectra from databases (GNPS, HMDB,
   MassBank, MassBankEU, MONA, OldCASMI, ReSpect) were compared with
   the acquired MS2 spectrum of the challenge using Tremolo; when no
   database MS2 spectra existed, the in silico MS2 spectra were used
   instead. The MQScore (cosine index) given by Tremolo was used for
   further discrimination. (No real spectra present in the searched
   databases were used.)
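What distinguishes this variant is the fallback in step 6: database spectra first, in silico spectra only when none exist. A minimal sketch of that selection, assuming the spectral similarity (Tremolo's MQScore in the abstract) is passed in as a function and that the best match over the chosen references is kept; `shared_peak_fraction` is a hypothetical toy similarity used only to make the example runnable.

```python
from typing import Callable, List, Sequence, Tuple

Spectrum = List[Tuple[float, float]]  # (m/z, intensity) pairs
Similarity = Callable[[Spectrum, Spectrum], float]


def mq_score(query: Spectrum,
             database_spectra: Sequence[Spectrum],
             in_silico_spectra: Sequence[Spectrum],
             similarity: Similarity) -> float:
    """Best similarity between the challenge MS2 and a candidate's reference spectra.

    Database MS2 spectra are used when the candidate has any; otherwise the
    in silico (e.g. CFM-ID) spectra are used. Taking the maximum over the
    references is an assumption of this sketch.
    """
    references = database_spectra if database_spectra else in_silico_spectra
    return max((similarity(query, ref) for ref in references), default=0.0)


def shared_peak_fraction(a: Spectrum, b: Spectrum) -> float:
    """Hypothetical toy similarity: fraction of peaks in `a` whose m/z
    (rounded to 0.01) also occurs in `b`."""
    mz_b = {round(mz, 2) for mz, _ in b}
    return sum(round(mz, 2) in mz_b for mz, _ in a) / len(a) if a else 0.0


query = [(85.03, 40.0), (127.04, 100.0), (145.05, 60.0)]
database = [[(85.03, 10.0), (300.10, 5.0)]]  # one library spectrum
in_silico = [[(85.03, 1.0), (127.04, 1.0), (145.05, 1.0)]]
print(mq_score(query, database, in_silico, shared_peak_fraction))  # database used: 1 of 3 peaks shared
print(mq_score(query, [], in_silico, shared_peak_fraction))        # falls back to in silico: 1.0
```

Note that the database spectrum is preferred even when the in silico spectrum would match better; that follows the description of this submission rather than taking the best of both, as the previous variant does.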

7- The final score (SF) was calculated from the three previously
   described scores as SF = Ssirius + Sphylo + MQScore. When negative
   scores were generated, all scores of the challenge were adjusted
   so that only positive scores remained.

Bibliography:
[1] C. Kuhl, et al., Anal. Chem., 2012, 84, 283.
[2] S. Böcker, et al., Bioinformatics, 2009, 25, 218.
[3] C. Roullier, et al., Anal. Chem., 2016, 88(18), 9143.
[4] M. A. Stravs, et al., J. Mass Spectrom., 2013, 48(1), 89.
[5] S. Chamberlain, et al., F1000Research, 2013, 2, 191.
[5] F. Allen, et al., Metabolomics, 2014, 11, 98.
[6] N. O'Boyle, et al., J. Cheminformatics, 2011, 3, 33.
[7] G. Wohlgemuth, et al., Bioinformatics, 2010, 26, 2647.
[8] http://cactus.nci.nih.gov/chemical/structure
[9] H. E. Pence, et al., J. Chem. Educ., 2010, 87, 1123.

ParticipantID:        seeslab
Category:             category1
Authors:              Ruiz-Botella Manuel (1), Senan Oriol (1), Yanes Oscar (1, 2),
                      Sales-Pardo Marta (1) and Guimerà Roger (1, 3)
Affiliations:         (1) Universitat Rovira i Virgili, Av. Països Catalans 26,
                      Tarragona 43007, Catalonia, Spain
                      (2) Metabolomics Platform, Spanish Biomedical Research
                      Center in Diabetes and Associated Metabolic Disorders
                      (CIBERDEM), Monforte de Lemos 35, 28029 Madrid, Spain
                      (3) Institució Catalana de Recerca i Estudis Avançats
                      (ICREA), Lluís Companys 23, Barcelona 08010, Catalonia,
                      Spain

Automatic pipeline:   yes
Spectral libraries:   yes

Abstract

We used four different methods to obtain candidate structures for
the 45 challenges. The first method is iMet
(http://imet.seeslab.net/), which takes MS/MS data, the MS1 parent
ion and isotopic patterns as input. With iMet we obtained a list of
candidate neighbors and putative molecular formulas for each
challenge. Candidate neighbors have two characteristics: first, they
are metabolites in the iMet database that are spectrally similar to
the query molecule; second, a single chemical transformation from the
iMet biotransformation database (for example, CH3 addition) converts
them into a metabolite whose molecular formula has the mass of the
query metabolite. iMet has spectral information from METLIN, HMDB and
MassBank. It takes the collision energy as a parameter and accepts
values of 0, 10, 20 and 40. For challenges acquired at a collision
energy that iMet does not accept, we used a combination of the two
closest accepted energy values.
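The second neighbor criterion amounts to a mass check: a library metabolite plus one transformation should match the query mass. A minimal sketch, assuming neutral monoisotopic masses, a 10 ppm tolerance, a hypothetical three-entry transformation table (methylation written as its net formula change, +CH2) and a hypothetical two-compound library; iMet's actual database, tolerance and spectral-similarity scoring are not described here.

```python
from typing import Dict, List, Tuple

# Hypothetical biotransformations and their monoisotopic mass shifts (u);
# iMet's real transformation database is much larger.
TRANSFORMATIONS: Dict[str, float] = {
    "methylation (+CH2)": 14.01565006,
    "hydroxylation (+O)": 15.99491462,
    "hydrogenation (+H2)": 2.01565006,
}


def ppm_error(observed: float, expected: float) -> float:
    return (observed - expected) / expected * 1e6


def candidate_neighbors(query_mass: float, library: Dict[str, float],
                        tol_ppm: float = 10.0) -> List[Tuple[str, str]]:
    """(metabolite, transformation) pairs for which one transformation applied
    to a library metabolite gives the neutral mass of the query."""
    hits = []
    for name, mass in library.items():
        for label, shift in TRANSFORMATIONS.items():
            if abs(ppm_error(query_mass, mass + shift)) <= tol_ppm:
                hits.append((name, label))
    return hits


# Hypothetical library of neutral monoisotopic masses.
library = {"glucose": 180.06339, "caffeine": 194.08038}
print(candidate_neighbors(196.05830, library))  # [('glucose', 'hydroxylation (+O)')]
```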

The second method is MSFINDER
(http://prime.psc.riken.jp/Metabolomics_Software/MS-FINDER/index.html).
MSFINDER uses MS/MS data and isotopic patterns and outputs a list of
putative structures for the query molecule. It relies on spectral
databases and on molecular databases such as PubChem or ChEBI. We
used the following databases (the selection changed depending on the
challenge):

BMDB, ChEBI, DrugBank, FooDB, HMDB, KNApSAcK, PlantCyc, PubChem, T3DB, UNPD, MINE, STOFF

The third method to obtain candidate structures was MetFusion
(https://msbi.ipb-halle.de/MetFusion/), with METLIN as the spectral
database and ChemSpider as the molecular database. MetFusion combines
both results based on the Tanimoto coefficient.
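For reference, the Tanimoto coefficient of two binary structural fingerprints is the number of shared on-bits divided by the number of bits set in either. The sketch below shows only the coefficient; how MetFusion weights these similarities when combining the spectral and molecular database results is not described here, and the fingerprint bits are hypothetical.

```python
from typing import Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    if not a and not b:
        return 0.0  # convention chosen for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Hypothetical on-bit positions of two structural fingerprints.
fp_candidate = {1, 4, 9, 16, 25}
fp_reference = {1, 4, 9, 36}
print(tanimoto(fp_candidate, fp_reference))  # 0.5
```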

The last method is CFM-ID (http://cfmid.wishartlab.com/).
Specifically, we used the Compound Identification tool, which ranks a
list of candidate structures according to how well they match the
input spectra. The input is a peak list at three energy levels
(10 V, 20 V and 40 V).

Once we had the candidate structures (for MSFINDER, MetFusion and
CFM-ID) and the candidate neighbors (for iMet), we used a trained
Naive Bayes model that combines the solutions of the four methods and
outputs a list of compound structures sorted by score. The reported
score reflects how confident the meta-model is about that particular
structure. The meta-model was trained on spectra of known compounds.
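The abstract does not list the meta-model's features or parameters, so the following is only a minimal Bernoulli naive Bayes sketch under assumed inputs: each feature records whether one of the four methods proposed a candidate, and the per-method likelihoods and the prior are hypothetical numbers standing in for trained values. Features are treated as conditionally independent given correctness, which is the naive Bayes assumption.

```python
import math
from typing import Dict, Sequence

METHODS = ["iMet", "MSFINDER", "MetFusion", "CFM-ID"]

# Hypothetical "trained" parameters: probability that a method proposes a
# candidate, given that the candidate is correct or incorrect.
P_PROPOSED_IF_CORRECT = {"iMet": 0.6, "MSFINDER": 0.7, "MetFusion": 0.5, "CFM-ID": 0.6}
P_PROPOSED_IF_WRONG = {"iMet": 0.1, "MSFINDER": 0.2, "MetFusion": 0.2, "CFM-ID": 0.3}
PRIOR_CORRECT = 0.05


def posterior_correct(proposed_by: Sequence[str]) -> float:
    """Posterior probability that a candidate is correct, given which methods
    proposed it, accumulated as log-odds over conditionally independent features."""
    log_odds = math.log(PRIOR_CORRECT / (1 - PRIOR_CORRECT))
    for m in METHODS:
        p1, p0 = P_PROPOSED_IF_CORRECT[m], P_PROPOSED_IF_WRONG[m]
        if m in proposed_by:
            log_odds += math.log(p1 / p0)
        else:
            log_odds += math.log((1 - p1) / (1 - p0))
    return 1 / (1 + math.exp(-log_odds))


candidates: Dict[str, Sequence[str]] = {
    "cand_1": ["iMet", "MSFINDER", "CFM-ID"],
    "cand_2": ["MetFusion"],
    "cand_3": [],
}
ranking = sorted(candidates, key=lambda c: posterior_correct(candidates[c]), reverse=True)
print(ranking)  # ['cand_1', 'cand_2', 'cand_3']
```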


ParticipantID:        kai_iso
Category:             category1
Authors:              Dührkop, Kai (1) and Ludwig, Marcus (1) and Böcker, Sebastian (1)
                      and Bach, Eric (2) and Brouard, Céline (2) and Rousu, Juho (2)
Affiliations:         (1) Chair of Bioinformatics, Friedrich-Schiller University, Jena
                      (2) Department of Computer Science, Aalto University
                      Developmental Biology, Halle, Germany
Automatic pipeline:   yes
Spectral libraries:   no

Abstract
We processed the peak lists in MGF format using an in-house version of CSI:FingerID.
Fragmentation trees were computed with SIRIUS 3.1.5 using the Q-TOF instrument settings.

As the spectra were measured in MSe mode, we expected to see isotope
peaks in the MS/MS spectra. We used an experimental feature in SIRIUS
that detects isotope patterns in MS/MS spectra and incorporates them
into the fragmentation tree scoring.

We used the standard workflow of the SIRIUS+CSI:FingerID (version 3.5) software:
We computed trees for all candidate formulas obtained by searching the
given precursor mass (with the list of common adduct types from
category 4) in PubChem. Only the top-scoring trees were selected for
further processing: trees with a score smaller than 75% of the score
of the optimal tree were discarded. Each remaining tree was processed
with CSI:FingerID as described in [1]. For each tree, we predicted a
molecular fingerprint (with Platt probability estimates) and compared
it against the fingerprints of all structure candidates with the same
molecular formula. For comparing fingerprints, we used the new maximum
likelihood scoring function implemented since SIRIUS 3.5. The
resulting hits were merged into one list and sorted by score. A
constant value was added to all scores to make them positive (as
stated in the CASMI rules). Ties between compounds with the same score
were ordered randomly. If a compound could not be processed (e.g.
because of multiple charges), its score was set to zero.
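The selection and merging steps around the CSI:FingerID search can be sketched as below. This is a minimal illustration, not the SIRIUS implementation: it assumes a positive optimal tree score (so the 75% cut-off is meaningful), picks a hypothetical constant that lifts the lowest merged score to 1.0, and breaks ties with a seeded random number so the example is reproducible; the tree scores and hits are made up.

```python
import random
from typing import Dict, List, Tuple


def select_trees(tree_scores: Dict[str, float], fraction: float = 0.75) -> List[str]:
    """Keep molecular formulas whose fragmentation tree score is at least
    `fraction` of the optimal tree score (assumes the optimal score is positive)."""
    best = max(tree_scores.values())
    return [mf for mf, score in tree_scores.items() if score >= fraction * best]


def merge_hits(hits: List[Tuple[str, float]], seed: int = 0) -> List[Tuple[str, float]]:
    """Merge (structure, score) hits of all selected formulas into one ranked list.

    A constant is added so that all scores are positive, hits are sorted by
    descending score, and ties are broken randomly (seeded for reproducibility).
    """
    lowest = min(score for _, score in hits)
    shift = 1.0 - lowest if lowest <= 0 else 0.0  # assumed choice of constant
    rng = random.Random(seed)
    keyed = [(score + shift, rng.random(), name) for name, score in hits]
    keyed.sort(key=lambda t: (-t[0], t[1]))
    return [(name, score) for score, _, name in keyed]


# Hypothetical tree scores and CSI:FingerID hits for one challenge.
trees = {"C15H10O5": 40.0, "C16H14O4": 32.0, "C12H14O7": 20.0}
print(select_trees(trees))  # ['C15H10O5', 'C16H14O4']
hits = [("cand_A", -12.5), ("cand_B", -30.0), ("cand_C", -12.5)]
print(merge_hits(hits))     # cand_A and cand_C tie at 18.5 (random order), cand_B last at 1.0
```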

[1] Kai Dührkop, Huibin Shen, Marvin Meusel, Juho Rousu and Sebastian
    Böcker. Searching molecular structure databases with tandem mass
    spectra using CSI:FingerID. Proc Natl Acad Sci U S A,
    112(41):12580-12585, 2015.

ParticipantID:        kai112
Category:             category1
Authors:              Dührkop, Kai (1) and Ludwig, Marcus (1) and Böcker, Sebastian (1)
                      and Bach, Eric (2) and Brouard, Céline (2) and Rousu, Juho (2)
Affiliations:         (1) Chair of Bioinformatics, Friedrich-Schiller University, Jena
                      (2) Department of Computer Science, Aalto University
                      Developmental Biology, Halle, Germany
Automatic pipeline:   yes
Spectral libraries:   no

Abstract
We processed the peak lists in MGF format using an in-house version of CSI:FingerID.
Fragmentation trees were computed with SIRIUS 3.1.5 using the Q-TOF instrument settings.

The preliminary results showed that we missed many compounds because
we were not always able to rank the correct molecular formula near the
top. This might be because no isotope patterns were given for the
precursor. We therefore prepared a second submission, kai112, which no
longer uses a hard threshold but instead considers all molecular
formulas for the CSI:FingerID search and adds the SIRIUS score to the
CSI:FingerID score. To prevent empty trees (which the hard threshold
would have discarded) from receiving high scores by chance, we apply a
penalty of 1000 if a tree does not explain a single fragment peak.
Furthermore, for the kai112 submission we trained CSI:FingerID on a
larger dataset that also contains spectra from NIST.

Besides removing the hard threshold, the kai112 submission follows the
standard SIRIUS+CSI:FingerID protocol: We computed trees for all
candidate formulas obtained by searching the given precursor mass
(with the list of common adduct types from category 4) in PubChem.
Each of these trees was processed with CSI:FingerID as described in
[1]. For each tree, we predicted a molecular fingerprint (with Platt
probability estimates) and compared it against the fingerprints of
all structure candidates with the same molecular formula. For
comparing fingerprints, we used the new maximum likelihood scoring
function implemented since SIRIUS 3.5. Trees with only one node
received a penalty of 1000. For all other trees, the SIRIUS score was
added to the CSI:FingerID score. The resulting hits were merged into
one list and sorted by score. A constant value was added to all scores
to make them positive (as stated in the CASMI rules). Ties between
compounds with the same score were ordered randomly. If a compound
could not be processed (e.g. because of multiple charges), its score
was set to zero.
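The kai112 scoring rule can be written as a small function. A minimal sketch, assuming the penalty of 1000 is subtracted from the CSI:FingerID score of a single-node tree (the text says only that such trees "receive a penalty"), and that, as stated, the SIRIUS score is added only for the other trees; the hits and their scores are hypothetical.

```python
from typing import List, Tuple

EMPTY_TREE_PENALTY = 1000.0


def combined_score(sirius_score: float, fingerid_score: float, tree_nodes: int) -> float:
    """Score of one structure candidate in a kai112-style search.

    Single-node trees (no fragment peak explained) get the penalty instead of
    the SIRIUS score; subtracting it from the CSI:FingerID score is an assumption.
    """
    if tree_nodes <= 1:
        return fingerid_score - EMPTY_TREE_PENALTY
    return sirius_score + fingerid_score


# Hypothetical hits: (candidate, SIRIUS tree score, CSI:FingerID score, tree nodes).
hits: List[Tuple[str, float, float, int]] = [
    ("cand_1", 25.0, -40.0, 12),
    ("cand_2", 5.0, -10.0, 1),
    ("cand_3", 18.0, -35.0, 7),
]
ranked = sorted(hits, key=lambda h: combined_score(h[1], h[2], h[3]), reverse=True)
print([h[0] for h in ranked])  # ['cand_1', 'cand_3', 'cand_2']
```

In this example cand_2 has the best CSI:FingerID score, but its empty tree pushes it to the bottom, which is the effect the penalty is meant to have without a hard threshold.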

[1] Kai Dührkop, Huibin Shen, Marvin Meusel, Juho Rousu and Sebastian
    Böcker. Searching molecular structure databases with tandem mass
    spectra using CSI:FingerID. Proc Natl Acad Sci U S A,
    112(41):12580-12585, 2015.

Details per challenge and participant. See the legend at the bottom for more details.

The details table is also available as HTML and as a CSV download.