News
Oct 31st, 2017: The results are now available.
Oct 30th, 2017: The solutions are now available.
Sept 8th, 2017: An update for Challenge 15 is available, but it will not count in the evaluation.
Sept 4th, 2017: Updated mailing list and submission information.
Aug 23rd, 2017: The preliminary results have been sent out to participants and are now available.
July 9th, 2017: We fixed the intensities in the TSV archive for challenges 046-243.
June 22nd, 2017: We added Category 4 for a subset of the data files.
May 22nd, 2017: We have improved challenges 29, 42, 71, 89, 105, 106 and 144.
April 26th, 2017: The rules and challenges of CASMI 2017 are now public!
Jan 20th, 2017: Organisation of CASMI 2017 is underway, stay tuned!
Results in Category 1
Summary of participant performance
Participant | F1 score | Mean rank | Median rank | Top | Top3 | Top10 | Misses | TopPos | TopNeg | Mean RRP | Median RRP | N |
---|---|---|---|---|---|---|---|---|---|---|---|---|
kai | 469 | 235.02 | 4.0 | 11 | 19 | 28 | 4 | 9 | 2 | 0.965 | 0.999 | 45 |
kai112 | 350 | 266.00 | 3.0 | 12 | 14 | 16 | 17 | 9 | 3 | 0.985 | 1.000 | 45 |
CStacey | 330 | 7.74 | 3.0 | 7 | 13 | 17 | 22 | 7 | 0 | 0.751 | 0.875 | 45 |
SamuelB2 | 319 | 12.12 | 4.0 | 6 | 12 | 21 | 16 | 4 | 2 | 0.929 | 0.982 | 45 |
SamuelB3 | 318 | 12.74 | 4.0 | 6 | 12 | 20 | 16 | 4 | 2 | 0.921 | 0.982 | 45 |
seeslab | 237 | 18.29 | 13.0 | 4 | 10 | 15 | 12 | 3 | 1 | 0.926 | 0.957 | 45 |
SamuelB | 229 | 15.43 | 9.0 | 4 | 8 | 17 | 16 | 2 | 2 | 0.904 | 0.931 | 45 |
Rakesh | 192 | 1.89 | 1.0 | 6 | 8 | 9 | 36 | 4 | 2 | 0.998 | 1.000 | 45 |
Table legend:
- F1 score: The Formula 1 score awards points for each challenge based on the rank of the correct solution, similar to the points scheme in Formula 1 racing. In the participant table, these points are summed over all challenges. Please note that the F1 score is therefore not necessarily comparable across categories.
- Mean/Median rank: Mean and median rank of the correct solution, computed over the challenges where the correct solution was found. If the correct solution is tied with other candidates, the average rank of the tied candidates is used.
- Top, Top3, Top10: Number of challenges where the correct solution is ranked first, among the top 3, and among the top 10, respectively.
- Misses: Number of challenges where the correct solution is missing from the submitted candidate list.
- TopPos, TopNeg: Number of challenges where the correct solution is ranked first, in positive and negative ionization mode, respectively.
- Mean/Median RRP: Mean and median relative ranking position (RRP), which also takes the length of the candidate list into account.
- N: Number of submissions that passed the evaluation scripts.
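The legend leaves the underlying formulas implicit. The sketch below shows one way to compute the tie-averaged rank, the RRP and per-challenge F1 points from a scored candidate list. Several parts are assumptions, not the official CASMI evaluation code: the RRP formula 0.5 * (1 - (BC - WC) / (TC - 1)) (BC, WC, TC = better, worse and total candidates, giving 1.0 for the top of the list, which matches the table where higher RRP is better), the 2010+ Formula 1 points table, and averaging points over tied positions.

```python
from typing import Sequence, Tuple

# Formula 1 racing points for finishing positions 1-10 (the scheme used in
# F1 racing since 2010). ASSUMPTION: CASMI's exact table may differ.
F1_POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def _counts(scores: Sequence[float], correct_idx: int) -> Tuple[int, int, int]:
    """Count candidates scored better than, equal to, and worse than the
    correct one (higher score = better; `tied` includes the correct one)."""
    s = scores[correct_idx]
    better = sum(1 for x in scores if x > s)
    worse = sum(1 for x in scores if x < s)
    tied = len(scores) - better - worse
    return better, tied, worse


def tied_rank(scores: Sequence[float], correct_idx: int) -> float:
    """Rank of the correct solution; tied candidates share their mean rank."""
    better, tied, _ = _counts(scores, correct_idx)
    # The tie block covers ranks better+1 .. better+tied; return its mean.
    return better + (tied + 1) / 2


def rrp(scores: Sequence[float], correct_idx: int) -> float:
    """Relative ranking position: 1.0 = top of the list, 0.0 = bottom."""
    better, _, worse = _counts(scores, correct_idx)
    total = len(scores)
    if total == 1:
        return 1.0
    return 0.5 * (1 - (better - worse) / (total - 1))


def f1_points(scores: Sequence[float], correct_idx: int) -> float:
    """Points for one challenge, averaged over the positions a tie spans.
    ASSUMPTION: this tie handling mirrors the tie-averaged rank above."""
    better, tied, _ = _counts(scores, correct_idx)
    positions = range(better, better + tied)  # 0-based finishing positions
    pts = [F1_POINTS[p] if p < len(F1_POINTS) else 0 for p in positions]
    return sum(pts) / len(pts)


scores = [0.9, 0.7, 0.7, 0.2]  # correct candidate at index 1, tied with index 2
print(tied_rank(scores, 1), rrp(scores, 1), f1_points(scores, 1))  # 2.5 0.5 16.5
```

A challenge where the correct solution is missing would simply contribute zero points and no rank.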
Summary of Rank by Challenge
For each challenge, the lowest (best) rank among participants is highlighted in bold. If a submission did not contain the correct candidate, this is denoted as "-". If a participant did not take part in a challenge, the table cell is empty.
Challenge (Category 1) | CStacey | kai | kai112 | Rakesh | SamuelB | SamuelB2 | SamuelB3 | seeslab |
---|---|---|---|---|---|---|---|---|
challenge-001 | - | 31.0 | 9.0 | - | - | - | - | **1.0** |
challenge-002 | **3.5** | 11.0 | - | - | 4.0 | 4.0 | 4.0 | 21.0 |
challenge-003 | - | **1.0** | 47.0 | - | **1.0** | **1.0** | **1.0** | 7.0 |
challenge-004 | - | **1.0** | **1.0** | 2.0 | - | - | - | 3.0 |
challenge-005 | - | 95.0 | 101.0 | - | 22.0 | **3.0** | **3.0** | 31.0 |
challenge-006 | 12.5 | **1.0** | **1.0** | - | 18.0 | **1.0** | **1.0** | 16.0 |
challenge-007 | 13.0 | **1.0** | **1.0** | **1.0** | 17.0 | **1.0** | **1.0** | 2.0 |
challenge-008 | 1.5 | **1.0** | **1.0** | - | - | - | - | 9.0 |
challenge-009 | 3.5 | **1.0** | **1.0** | - | 10.0 | 10.0 | 10.0 | 24.0 |
challenge-010 | **1.0** | 8584.0 | - | - | 4.0 | 4.0 | 4.0 | - |
challenge-011 | **1.0** | 2.0 | 2.0 | - | **1.0** | **1.0** | **1.0** | - |
challenge-012 | - | **4.0** | 2556.0 | - | 9.0 | 9.0 | 9.0 | 14.0 |
challenge-013 | **1.0** | 6.0 | **1.0** | - | 33.0 | 3.0 | 3.0 | 27.0 |
challenge-014 | - | 180.0 | 118.0 | - | 8.0 | **2.0** | **2.0** | 38.0 |
challenge-015 | - | 14.0 | 21.0 | - | 9.0 | 10.0 | 28.0 | **4.5** |
challenge-016 | - | **1.0** | **1.0** | - | 59.0 | 59.0 | 59.0 | 25.0 |
challenge-017 | 2.5 | **1.0** | 19.0 | - | **1.0** | **1.0** | **1.0** | 13.0 |
challenge-018 | - | 92.0 | - | - | **13.0** | **13.0** | **13.0** | **13.0** |
challenge-019 | **1.0** | 2.0 | - | - | - | - | - | - |
challenge-020 | - | 42.0 | - | - | **3.0** | **3.0** | **3.0** | 7.0 |
challenge-021 | **4.0** | 372.0 | 3805.0 | - | - | - | - | - |
challenge-022 | **1.5** | 3.0 | - | - | 4.0 | 4.0 | 4.0 | 2.0 |
challenge-023 | **2.5** | 25.0 | 252.0 | - | 15.0 | 15.0 | 15.0 | - |
challenge-024 | **1.0** | 7.0 | 258.0 | - | 1.5 | 1.5 | 1.5 | **1.0** |
challenge-025 | 12.0 | **10.0** | - | - | - | - | - | 23.0 |
challenge-026 | 32.0 | 2.0 | **1.0** | - | - | - | - | 27.0 |
challenge-027 | **1.0** | - | 122.0 | - | 8.0 | 8.0 | 8.0 | - |
challenge-028 | - | **10.0** | 34.0 | - | - | - | - | 106.0 |
challenge-029 | **1.0** | 7.0 | - | **1.0** | 3.0 | 3.0 | 3.0 | **1.0** |
challenge-030 | - | **3.0** | 4.0 | - | **3.0** | 4.0 | 4.0 | - |
challenge-031 | - | 4.0 | 2.0 | **1.0** | 22.0 | 22.0 | 22.0 | 2.0 |
challenge-032 | - | **2.0** | - | 7.0 | - | - | - | 14.0 |
challenge-033 | - | **59.0** | - | - | - | - | - | - |
challenge-034 | - | - | - | - | - | - | - | **66.0** |
challenge-035 | - | **1.0** | - | - | 31.0 | 31.0 | 31.0 | 60.0 |
challenge-036 | 3.0 | 2.0 | **1.0** | **1.0** | 94.5 | 84.5 | 84.5 | 5.0 |
challenge-037 | - | 14.0 | **1.0** | **1.0** | 20.0 | 20.0 | 20.0 | 2.0 |
challenge-038 | 22.5 | 25.0 | **1.0** | 2.0 | 29.0 | 29.0 | 29.0 | **1.0** |
challenge-039 | 3.0 | 10.0 | - | - | **1.0** | **1.0** | **1.0** | 12.0 |
challenge-040 | 7.0 | 3.0 | - | **1.0** | 3.5 | 3.5 | 3.5 | 3.0 |
challenge-041 | 47.0 | **1.0** | **1.0** | - | - | - | - | - |
challenge-042 | - | **1.0** | - | - | - | - | - | 23.0 |
challenge-043 | - | - | - | - | - | - | - | - |
challenge-044 | - | **4.0** | 86.0 | - | - | - | - | - |
challenge-045 | - | - | - | - | - | - | - | - |
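The summary statistics can be recomputed from a participant's column of this table. The sketch below reproduces Rakesh's summary row under the conventions that match it: mean and median are taken over the challenges where the correct solution was found, Top counts ranks of exactly 1.0, and Top3/Top10 count ranks of at most 3 and 10. The function name is illustrative.

```python
from statistics import mean, median
from typing import Dict, Optional, Sequence


def summarize(ranks: Sequence[Optional[float]]) -> Dict[str, float]:
    """Summary statistics for one participant.

    `ranks` holds one entry per challenge: the (tie-averaged) rank of the
    correct solution, or None where the table shows "-".
    """
    found = [r for r in ranks if r is not None]
    return {
        "mean_rank": round(mean(found), 2) if found else float("nan"),
        "median_rank": median(found) if found else float("nan"),
        "top": sum(1 for r in found if r == 1.0),
        "top3": sum(1 for r in found if r <= 3),
        "top10": sum(1 for r in found if r <= 10),
        "misses": len(ranks) - len(found),
    }


# Rakesh's non-missing ranks (challenges 004, 007, 029, 031, 032, 036, 037,
# 038, 040); the other 36 of the 45 challenges are "-".
rakesh = [2.0, 1.0, 1.0, 1.0, 7.0, 1.0, 1.0, 2.0, 1.0] + [None] * 36
print(summarize(rakesh))
# {'mean_rank': 1.89, 'median_rank': 1.0, 'top': 6, 'top3': 8, 'top10': 9, 'misses': 36}
```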
Participant information and abstracts
ParticipantID: CatherineStacey
Category: category1
Authors: Catherine Stacey
Affiliations: retired
Automatic methods: yes
Abstract: The challenge data were processed automatically using software written in C# with the OBDotNet library, in the following steps:
1- retrieve PubChem compounds matching the mass and error range of the parent ion (assuming either pre-charged ions or proton adduct/loss)
2- find allowable neutral losses in the MS/MS spectrum, compute the elemental formula of each loss, then reject formulas where the loss formula is inconsistent with the parent formula
3- score the remaining formulas by isotopic fit and mass accuracy, reject low-scoring formulas, and filter the candidates to those with the remaining formulas
4- for the remaining compounds, convert their SMILES strings to Mol format and search the molecule for allowed neutral-loss and diagnostic low-mass SMARTS motifs
5- for candidates that pass this pre-filter, apply the full fragmentation rules, developed with OpenBabel
6- score each filtered candidate by the matches between predicted and found fragments, where 1 indicates that 100% of the fragments are matched
7- for challenges with no PubChem matches, predict candidates from manual searches of KNApSAcK, or derive them from known molecules (e.g. a glucuronide from its aglycone), and use the OpenBabel prediction software for confirmation and scoring
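Steps 1, 2 and 6 of this pipeline rest on simple arithmetic. The sketch below illustrates them in Python rather than the participant's C#/OBDotNet code; the [M+H]+ assumption, the 0.005 Da fragment tolerance, taking the match fraction over the observed peaks, and all function names are illustrative assumptions, not details from the abstract.

```python
import re
from collections import Counter
from typing import Sequence

PROTON_MASS = 1.007276  # Da


def neutral_mass_from_protonated(mz: float) -> float:
    """Step 1: neutral monoisotopic mass, assuming the precursor is [M+H]+."""
    return mz - PROTON_MASS


def within_ppm(observed: float, theoretical: float, ppm: float) -> bool:
    """True if `observed` lies within `ppm` parts per million of `theoretical`."""
    return abs(observed - theoretical) / theoretical * 1e6 <= ppm


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula such as 'C6H12O6' into element counts."""
    counts: Counter = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def loss_consistent(loss: str, parent: str) -> bool:
    """Step 2: a neutral loss cannot contain more atoms of any element than
    the parent formula does."""
    lost, par = parse_formula(loss), parse_formula(parent)
    return all(par[el] >= n for el, n in lost.items())


def fragment_match_score(predicted: Sequence[float], observed: Sequence[float],
                         tol: float = 0.005) -> float:
    """Step 6: fraction of observed fragment m/z values matched by a predicted
    fragment within `tol` Da (1.0 = all fragments matched).
    ASSUMPTION: the fraction is taken over the observed peaks."""
    if not observed:
        return 0.0
    hits = sum(1 for o in observed if any(abs(o - p) <= tol for p in predicted))
    return hits / len(observed)


print(loss_consistent("H2O", "C6H12O6"), loss_consistent("C7H2", "C6H12O6"))  # True False
```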
ParticipantID:
Category: Category 1
Authors: Rakesh Kumar [1], Nilesh Kumar [1], Ranjan Nanda [1], Dinesh Gupta [1]
Affiliations: [1] International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi
Automatic pipeline: Semi-automated
Spectral libraries: no
Abstract:
1) The PubChem molecular formulas were stored in a local library. The MS/MS spectra in MGF format were used directly for the analysis.
2) First, all possible candidates for each principal peak were fetched and ranked by natural-product likeness, using our own algorithm and in-house scripts (Python 2.7). Formulas for the remaining sub-peaks were also fetched, and for each explained peak an explanation score was appended to the candidate formula. The top-ranked formulas were then searched mainly in KEGG, the Dictionary of Natural Products, HMDB, ChemSpider, PubChem, etc., for compounds matching the given description. Natural-product-likeness scores were assigned using the metadata available for the respective challenges.
3) Finally, the compounds in InChI format and their scores were submitted to CASMI 2017 for all challenges, with multiple compounds submitted per challenge.
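Fetching candidates for each peak from a local formula library amounts to a mass-window query. The abstract does not describe the in-house scripts, so the sketch below is a hypothetical stand-in: a mass-sorted list searched with binary search, assuming the query is already a neutral monoisotopic mass, a 5 ppm window, and a toy library of four formulas.

```python
import bisect
import re
from typing import List, Tuple

# Monoisotopic masses (Da) of the elements used in this toy example.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def mono_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C7H14O6'."""
    return sum(MONO[el] * (int(n) if n else 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))


class FormulaLibrary:
    """Local formula library sorted by mass, queried with a ppm window.
    HYPOTHETICAL: a stand-in for the participant's local PubChem formula store."""

    def __init__(self, formulas: List[str]) -> None:
        pairs: List[Tuple[float, str]] = sorted((mono_mass(f), f) for f in formulas)
        self.masses = [m for m, _ in pairs]
        self.formulas = [f for _, f in pairs]

    def lookup(self, mass: float, ppm: float = 5.0) -> List[str]:
        """All formulas whose mass lies within +/- ppm of `mass`."""
        tol = mass * ppm * 1e-6
        lo = bisect.bisect_left(self.masses, mass - tol)
        hi = bisect.bisect_right(self.masses, mass + tol)
        return self.formulas[lo:hi]


lib = FormulaLibrary(["C6H12O6", "C10H10O4", "C7H14O6", "C11H14O3"])
print(lib.lookup(194.0790))  # ['C7H14O6']
```

Sorting once and bisecting keeps each query logarithmic in the library size, which matters when the library holds all PubChem formulas.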
ParticipantID: SamuelBMS
Category: category1
Authors: Bertrand, Samuel (1)
Affiliations: (1) Groupe Mer, Molécules, Santé-EA 2160, UFR des Sciences Pharmaceutiques et Biologiques, Université de Nantes, France
Automatic methods: yes
Spectral libraries: no
Abstract: The challenge data were processed automatically using R, CAMERA [1], SIRIUS3 [2], MeHaloCoA [3], RMassBank [4], Taxize [5], CFM-ID [6] and Tremolo as follows, with the data stored in a MySQL database throughout the analysis:
1- MS data were entered manually into the database.
2- Neutral losses and adducts were searched within the MS1 spectra using CAMERA.
3- For each challenge, molecular formulas (MF) were obtained using SIRIUS3 and discriminated based on isotopic distribution, MS2 fragmentation (calculated by SIRIUS) and adduct redundancy (the number of occurrences of the MF among all adducts divided by the maximum number of occurrences of any MF among all proposed MF). The presence of S, Cl and Br atoms was detected automatically using MeHaloCoA. The SIRIUS score (Ssirius) was kept for further discrimination of the structure.
4- The molecular formulas (corrected using the adduct information) were searched in various databases (AntiBase, ChEBI, DNP, DMNP, GNPS, HMDB, KEGG, KNAPSACK, LipidMaps, MassBank, MassBankEU, MONA, OldCASMI, ReSpect, SupernaturalII, UNPD) for InChI, SMILES, Mol and biological source data. For each compound found in the databases, missing InChI and SMILES data were completed (as far as possible) using OpenBabel [7], CTS [8], CACTUS [9] or ChemSpider [10].
5- When available, the biological origin of the compound was used for scoring. The phylogeny information of the challenge compound was compared with that of the proposed structures using Taxize. The corresponding Sphylo score ranges from -10 to +10 according to phylogenetic similarity; a Sphylo of 0 corresponds to no data or to a similar kingdom.
6- For all structures, in silico MS2 spectra were calculated using CFM-ID (when possible). The in silico MS2 spectra were compared with the acquired MS2 spectrum of the challenge using Tremolo, and the MQScore (cosine index) given by Tremolo was used for further discrimination. (No real spectra present in the searched databases were used.)
7- The final score (SF) was calculated from the three scores above as SF = Ssirius + Sphylo + MQScore (when negative scores were generated, all scores of the challenge were adjusted so that only positive scores remained).
Bibliography:
[1] C. Kuhl, et al., Anal. Chem., 2012, 84, 283.
[2] S. Böcker, et al., Bioinformatics, 2009, 25, 218.
[3] C. Roullier, et al., Anal. Chem., 2016, 88(18), 9143.
[4] M. A. Stravs, et al., J. Mass Spectrom., 2013, 48(1), 89.
[5] S. Chamberlain, et al., F1000Research, 2013, 2, 191.
[6] F. Allen, et al., Metabolomics, 2014, 11, 98.
[7] N. O'Boyle, et al., J. Cheminformatics, 2011, 3, 33.
[8] G. Wohlgemuth, et al., Bioinformatics, 2010, 26, 2647.
[9] http://cactus.nci.nih.gov/chemical/structure
[10] H. E. Pence, et al., J. Chem. Educ., 2010, 87, 1123.
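Three numerical pieces of the SamuelBMS workflow are stated explicitly: the adduct-redundancy ratio (step 3), a spectral similarity (step 6) and SF = Ssirius + Sphylo + MQScore with a shift to positive values (step 7). The sketch below illustrates them; the binned cosine is a plain stand-in, not Tremolo's actual MQScore, and the shift constant (lowest score becomes 1.0) is an assumption, since the abstract only says scores were made positive.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Spectrum = Sequence[Tuple[float, float]]  # (m/z, intensity) pairs


def adduct_redundancy(mf_per_adduct: List[str]) -> Dict[str, float]:
    """Step 3: occurrences of each MF across adduct hypotheses, divided by the
    largest occurrence count of any proposed MF."""
    counts = Counter(mf_per_adduct)
    top = max(counts.values())
    return {mf: n / top for mf, n in counts.items()}


def cosine(a: Spectrum, b: Spectrum, bin_width: float = 0.01) -> float:
    """Plain binned cosine similarity (a stand-in for Tremolo's MQScore)."""
    def binned(spec: Spectrum) -> Counter:
        v: Counter = Counter()
        for mz, intensity in spec:
            v[round(mz / bin_width)] += intensity
        return v

    va, vb = binned(a), binned(b)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm = math.sqrt(sum(x * x for x in va.values())) * math.sqrt(
        sum(x * x for x in vb.values()))
    return dot / norm if norm else 0.0


def final_scores(parts: Dict[str, Tuple[float, float, float]]) -> Dict[str, float]:
    """Step 7: SF = Ssirius + Sphylo + MQScore per candidate; if any SF is not
    positive, shift all of them by one constant (ASSUMPTION: lowest -> 1.0)."""
    sf = {c: s + p + m for c, (s, p, m) in parts.items()}
    lowest = min(sf.values())
    if lowest <= 0:
        shift = 1.0 - lowest
        sf = {c: v + shift for c, v in sf.items()}
    return sf


print(adduct_redundancy(["C10H12O4", "C10H12O4", "C9H8O4"]))  # {'C10H12O4': 1.0, 'C9H8O4': 0.5}
```

Because Sphylo spans -10 to +10 while a cosine lies in [0, 1], the phylogeny term can dominate SF whenever taxonomy data are available.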
ParticipantID: SamuelBMS2
Category: category1
Authors: Bertrand, Samuel (1)
Affiliations: (1) Groupe Mer, Molécules, Santé-EA 2160, UFR des Sciences Pharmaceutiques et Biologiques, Université de Nantes, France
Automatic methods: yes
Spectral libraries: no
Abstract: The challenge data were processed automatically using R, CAMERA [1], SIRIUS3 [2], MeHaloCoA [3], RMassBank [4], Taxize [5], CFM-ID [6] and Tremolo as follows, with the data stored in a MySQL database throughout the analysis:
1- MS data were entered manually into the database.
2- Neutral losses and adducts were searched within the MS1 spectra using CAMERA.
3- For each challenge, molecular formulas (MF) were obtained using SIRIUS3 and discriminated based on isotopic distribution, MS2 fragmentation (calculated by SIRIUS) and adduct redundancy (the number of occurrences of the MF among all adducts divided by the maximum number of occurrences of any MF among all proposed MF). The presence of S, Cl and Br atoms was detected automatically using MeHaloCoA. The SIRIUS score (Ssirius) was kept for further discrimination of the structure.
4- The molecular formulas (corrected using the adduct information) were searched in various databases (AntiBase, ChEBI, DNP, DMNP, GNPS, HMDB, KEGG, KNAPSACK, LipidMaps, MassBank, MassBankEU, MONA, OldCASMI, ReSpect, SupernaturalII, UNPD) for InChI, SMILES, Mol and biological source data. For each compound found in the databases, missing InChI and SMILES data were completed (as far as possible) using OpenBabel [7], CTS [8], CACTUS [9] or ChemSpider [10].
5- When available, the biological origin of the compound was used for scoring. The phylogeny information of the challenge compound was compared with that of the proposed structures using Taxize. The corresponding Sphylo score ranges from -10 to +10 according to phylogenetic similarity; a Sphylo of 0 corresponds to no data or to a similar kingdom.
6- For all structures, in silico MS2 spectra were calculated using CFM-ID (when possible). The MS2 spectra from databases (GNPS, HMDB, MassBank, MassBankEU, MONA, OldCASMI, ReSpect) as well as the in silico MS2 spectra were compared with the acquired MS2 spectrum of the challenge using Tremolo. The MQScore (cosine index) given by Tremolo was used for further discrimination. (No real spectra present in the searched databases were used.)
7- The final score (SF) was calculated from the three scores above as SF = Ssirius + Sphylo + MQScore (when negative scores were generated, all scores of the challenge were adjusted so that only positive scores remained).
Bibliography:
[1] C. Kuhl, et al., Anal. Chem., 2012, 84, 283.
[2] S. Böcker, et al., Bioinformatics, 2009, 25, 218.
[3] C. Roullier, et al., Anal. Chem., 2016, 88(18), 9143.
[4] M. A. Stravs, et al., J. Mass Spectrom., 2013, 48(1), 89.
[5] S. Chamberlain, et al., F1000Research, 2013, 2, 191.
[6] F. Allen, et al., Metabolomics, 2014, 11, 98.
[7] N. O'Boyle, et al., J. Cheminformatics, 2011, 3, 33.
[8] G. Wohlgemuth, et al., Bioinformatics, 2010, 26, 2647.
[9] http://cactus.nci.nih.gov/chemical/structure
[10] H. E. Pence, et al., J. Chem. Educ., 2010, 87, 1123.
ParticipantID: SamuelBMS2
Category: category1
Authors: Bertrand, Samuel (1)
Affiliations: (1) Groupe Mer, Molécules, Santé-EA 2160, UFR des Sciences Pharmaceutiques et Biologiques, Université de Nantes, France
Automatic methods: yes
Spectral libraries: no
Abstract: The challenge data were processed automatically using R, CAMERA [1], SIRIUS3 [2], MeHaloCoA [3], RMassBank [4], Taxize [5], CFM-ID [6] and Tremolo as follows, with the data stored in a MySQL database throughout the analysis:
1- MS data were entered manually into the database.
2- Neutral losses and adducts were searched within the MS1 spectra using CAMERA.
3- For each challenge, molecular formulas (MF) were obtained using SIRIUS3 and discriminated based on isotopic distribution, MS2 fragmentation (calculated by SIRIUS) and adduct redundancy (the number of occurrences of the MF among all adducts divided by the maximum number of occurrences of any MF among all proposed MF). The presence of S, Cl and Br atoms was detected automatically using MeHaloCoA. The SIRIUS score (Ssirius) was kept for further discrimination of the structure.
4- The molecular formulas (corrected using the adduct information) were searched in various databases (AntiBase, ChEBI, DNP, DMNP, GNPS, HMDB, KEGG, KNAPSACK, LipidMaps, MassBank, MassBankEU, MONA, OldCASMI, ReSpect, SupernaturalII, UNPD) for InChI, SMILES, Mol and biological source data. For each compound found in the databases, missing InChI and SMILES data were completed (as far as possible) using OpenBabel [7], CTS [8], CACTUS [9] or ChemSpider [10].
5- When available, the biological origin of the compound was used for scoring. The phylogeny information of the challenge compound was compared with that of the proposed structures using Taxize. The corresponding Sphylo score ranges from -10 to +10 according to phylogenetic similarity; a Sphylo of 0 corresponds to no data or to a similar kingdom.
6- For all structures, in silico MS2 spectra were calculated using CFM-ID (when possible). The MS2 spectra from databases (GNPS, HMDB, MassBank, MassBankEU, MONA, OldCASMI, ReSpect) were compared with the acquired MS2 spectrum of the challenge using Tremolo; when no database MS2 spectra existed, the in silico MS2 spectra were used instead. The MQScore (cosine index) given by Tremolo was used for further discrimination. (No real spectra present in the searched databases were used.)
7- The final score (SF) was calculated from the three scores above as SF = Ssirius + Sphylo + MQScore (when negative scores were generated, all scores of the challenge were adjusted so that only positive scores remained).
Bibliography:
[1] C. Kuhl, et al., Anal. Chem., 2012, 84, 283.
[2] S. Böcker, et al., Bioinformatics, 2009, 25, 218.
[3] C. Roullier, et al., Anal. Chem., 2016, 88(18), 9143.
[4] M. A. Stravs, et al., J. Mass Spectrom., 2013, 48(1), 89.
[5] S. Chamberlain, et al., F1000Research, 2013, 2, 191.
[6] F. Allen, et al., Metabolomics, 2014, 11, 98.
[7] N. O'Boyle, et al., J. Cheminformatics, 2011, 3, 33.
[8] G. Wohlgemuth, et al., Bioinformatics, 2010, 26, 2647.
[9] http://cactus.nci.nih.gov/chemical/structure
[10] H. E. Pence, et al., J. Chem. Educ., 2010, 87, 1123.
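The two SamuelBMS2 abstracts differ only in how database and in silico spectra feed the MQScore in step 6: the first compares both, the second uses in silico spectra only as a fallback. A minimal sketch of the two strategies, assuming the similarity values are already computed per spectrum and that the best match is kept (the abstracts do not say how several database spectra are combined):

```python
from typing import Optional, Sequence


def mq_both(ref_sims: Sequence[float], insilico_sim: Optional[float]) -> float:
    """First SamuelBMS2 variant: database and in silico spectra are both
    compared; ASSUMPTION: the best similarity is kept."""
    pool = list(ref_sims)
    if insilico_sim is not None:
        pool.append(insilico_sim)
    return max(pool, default=0.0)


def mq_fallback(ref_sims: Sequence[float], insilico_sim: Optional[float]) -> float:
    """Second SamuelBMS2 variant: database spectra first; the in silico
    spectrum is used only when no database MS2 spectrum exists."""
    if ref_sims:
        return max(ref_sims)
    return insilico_sim if insilico_sim is not None else 0.0


print(mq_both([0.4, 0.6], 0.9), mq_fallback([0.4, 0.6], 0.9))  # 0.9 0.6
```

The fallback variant trusts measured reference spectra over predictions whenever both exist, even if the prediction happens to match better.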
ParticipantID: seeslab
Category: category1
Authors: Ruiz-Botella Manuel (1), Senan Oriol (1), Yanes Oscar (1, 2), Sales-Pardo Marta (1) and Guimerà Roger (1, 3)
Affiliations: (1) Universitat Rovira i Virgili, Av. Països Catalans 26, Tarragona 43007, Catalonia, Spain; (2) Metabolomics Platform, Spanish Biomedical Research Center in Diabetes and Associated Metabolic Disorders (CIBERDEM), Monforte de Lemos 35, 28029 Madrid, Spain; (3) Institució Catalana de Recerca i Estudis Avançats (ICREA), Lluís Companys 23, Barcelona 08010, Catalonia, Spain
Automatic pipeline: yes
Spectral libraries: yes
Abstract: We used four different methods to obtain candidate structures for the 45 challenges.
The first method is iMet (http://imet.seeslab.net/). It uses the MS/MS data, the MS1 parent ion and isotopic patterns as input. With iMet we obtained a list of candidate neighbors and putative molecular formulas for each challenge. Candidate neighbors have two characteristics: first, they are metabolites in the iMet database that are spectrally similar to the query molecule; second, applying one chemical transformation from the iMet biotransformation database (for example, a CH3 addition) to them yields a metabolite whose molecular formula has the mass of the query metabolite. iMet has spectral information from METLIN, HMDB and MassBank. It uses three collision energy values as a parameter and accepts values of 0, 10, 20 and 40. For challenges acquired at a collision energy different from the values accepted by iMet, we used a combination of the two closest energy values.
The second method is MSFINDER (http://prime.psc.riken.jp/Metabolomics_Software/MS-FINDER/index.html). MSFINDER uses MS/MS data and the isotopic pattern, and outputs a list of putative structures for the query molecule. It uses spectral databases and molecular databases such as PubChem or ChEBI. We used the following databases (they change depending on the challenge): BMDB, ChEBI, DrugBank, FooDB, HMDB, KNApSAcK, PlantCyc, PubChem, T3DB, UNPD, MINE, STOFF.
The third method was MetFusion (https://msbi.ipb-halle.de/MetFusion/). We used METLIN as the spectral database and ChemSpider as the molecular database; MetFusion combines both results based on the Tanimoto coefficient.
The last method is CFM-ID (http://cfmid.wishartlab.com/). Specifically, we used its Compound Identification tool, which ranks a list of candidate structures according to how well they match the input spectra. The input is a peak list at three energy levels (10 V, 20 V, 40 V).
Once we had the candidate structures (for MSFINDER, MetFusion and CFM-ID) and the candidate neighbors (for iMet), we used a trained Naive Bayes model that combines the solutions of the four methods and outputs a list of compound structures sorted by score. The reported score reflects how confident the meta-model is about that particular structure. The meta-model was trained on spectra of known compounds.
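The final meta-model step can be pictured as a two-class Gaussian naive Bayes classifier whose features are the four methods' scores for a candidate, with candidates ranked by the log-odds of being correct. This is a minimal illustration only: the feature set, the Gaussian likelihood and the toy training data are all assumptions, since the abstract does not specify them.

```python
import math
from statistics import mean, pvariance
from typing import Dict, List, Sequence, Tuple


class TinyGaussianNB:
    """Two-class Gaussian naive Bayes: class 1 = correct structure, 0 = not.
    Each feature is one method's score for a candidate (HYPOTHETICAL features)."""

    def fit(self, X: List[Sequence[float]], y: List[int]) -> "TinyGaussianNB":
        self.params: Dict[int, Tuple[float, List[float], List[float]]] = {}
        for c in (0, 1):
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            self.params[c] = (
                math.log(len(rows) / len(X)),            # log prior
                [mean(col) for col in cols],              # per-feature mean
                [pvariance(col) + 1e-6 for col in cols],  # variance, smoothed
            )
        return self

    def _log_joint(self, x: Sequence[float], c: int) -> float:
        log_prior, mus, variances = self.params[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - mu) ** 2 / (2 * v)
            for xi, mu, v in zip(x, mus, variances))

    def score(self, x: Sequence[float]) -> float:
        """Log-odds that the candidate is the correct structure."""
        return self._log_joint(x, 1) - self._log_joint(x, 0)


# Toy training rows: [iMet, MSFINDER, MetFusion, CFM-ID] scores (made up).
X = [[0.9, 0.8, 0.7, 0.9], [0.8, 0.9, 0.8, 0.7],
     [0.2, 0.1, 0.3, 0.2], [0.1, 0.3, 0.2, 0.1]]
y = [1, 1, 0, 0]
model = TinyGaussianNB().fit(X, y)
candidates = {"cand_A": [0.85, 0.85, 0.75, 0.8], "cand_B": [0.15, 0.2, 0.25, 0.15]}
ranked = sorted(candidates, key=lambda c: model.score(candidates[c]), reverse=True)
print(ranked)  # ['cand_A', 'cand_B']
```

Naive Bayes treats the four method scores as conditionally independent given the class, which keeps training cheap but ignores the fact that methods built on overlapping databases tend to agree.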
ParticipantID: kai_iso
Category: category1
Authors: Dührkop, Kai (1) and Ludwig, Marcus (1) and Böcker, Sebastian (1) and Bach, Eric (2) and Brouard, Céline (2) and Rousu, Juho (2)
Affiliations: (1) Chair of Bioinformatics, Friedrich-Schiller University, Jena; (2) Department of Computer Science, Aalto University Developmental Biology, Halle, Germany
Automatic pipeline: yes
Spectral libraries: no
Abstract: We processed the peak lists in MGF format using an in-house version of CSI:FingerID. Fragmentation trees were computed with SIRIUS 3.1.5 using the Q-TOF instrument settings. As the spectra were measured in MSe mode, we expected to see isotope peaks in the MS/MS spectra, so we used an experimental SIRIUS feature that detects isotope patterns in MS/MS and incorporates them into the fragmentation tree scoring. We used the standard workflow of the SIRIUS+CSI:FingerID (version 3.5) software: we computed trees for all candidate formulas obtained by searching the given precursor mass in PubChem (with the list of common adduct types from Category 4). Only the top-scoring trees were selected for further processing: trees with a score below 75% of the score of the optimal tree were discarded. Each remaining tree was processed with CSI:FingerID as described in [1]: we predicted a molecular fingerprint for each tree (with Platt probability estimates) and compared it against the fingerprints of all structure candidates with the same molecular formula. For the fingerprint comparison, we used the new maximum likelihood scoring function implemented since SIRIUS 3.5. The resulting hits were merged into one list and sorted by score. A constant value was added to all scores to make them positive (as stated in the CASMI rules). Compounds tied on score were ordered randomly. If a compound could not be processed (e.g. because of multiple charges), its score was set to zero.
[1] Kai Dührkop, Huibin Shen, Marvin Meusel, Juho Rousu and Sebastian Böcker. Searching molecular structure databases with tandem mass spectra using CSI:FingerID. Proc Natl Acad Sci U S A, 112(41):12580-12585, 2015.
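Three mechanical parts of the kai_iso workflow are easy to state in code: the 75% tree-score threshold, comparing a predicted probabilistic fingerprint with a candidate's binary fingerprint, and random tie-breaking followed by a constant shift to positive scores. The sketch below is illustrative only: the Bernoulli log-likelihood is a simplified stand-in, not the SIRIUS 3.5 maximum likelihood scorer, and the shift constant (lowest score becomes 1.0) is an assumption.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple


def keep_formulas(tree_scores: Dict[str, float], ratio: float = 0.75) -> List[str]:
    """Discard formulas whose tree score is below `ratio` x the best score
    (assumes positive tree scores, as the 75% rule implies)."""
    best = max(tree_scores.values())
    return [f for f, s in tree_scores.items() if s >= ratio * best]


def fingerprint_loglik(probs: Sequence[float], bits: Sequence[int],
                       eps: float = 1e-6) -> float:
    """Log-likelihood of a candidate's binary fingerprint under predicted
    per-bit probabilities. A simplified stand-in, NOT the SIRIUS 3.5 scorer."""
    total = 0.0
    for p, bit in zip(probs, bits):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += math.log(p if bit else 1 - p)
    return total


def rank_positive(hits: List[Tuple[str, float]],
                  seed: Optional[int] = None) -> List[Tuple[str, float]]:
    """Sort hits by score, break ties randomly, and add one constant so that
    every score is positive (ASSUMPTION: the lowest becomes 1.0)."""
    if not hits:
        return []
    items = list(hits)
    random.Random(seed).shuffle(items)             # random order among ties...
    items.sort(key=lambda h: h[1], reverse=True)   # ...kept by the stable sort
    shift = 1.0 - min(score for _, score in items)
    return [(name, score + shift) for name, score in items]


print(keep_formulas({"C10H12O4": 40.0, "C9H8O5": 31.0, "C8H20O3": 20.0}))
# ['C10H12O4', 'C9H8O5']
```

Shuffling before a stable sort is a simple way to get an unbiased order among equal scores without a custom comparator.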
ParticipantID: kai112
Category: category1
Authors: Dührkop, Kai (1) and Ludwig, Marcus (1) and Böcker, Sebastian (1) and Bach, Eric (2) and Brouard, Céline (2) and Rousu, Juho (2)
Affiliations: (1) Chair of Bioinformatics, Friedrich-Schiller University, Jena; (2) Department of Computer Science, Aalto University Developmental Biology, Halle, Germany
Automatic pipeline: yes
Spectral libraries: no
Abstract: We processed the peak lists in MGF format using an in-house version of CSI:FingerID. Fragmentation trees were computed with SIRIUS 3.1.5 using the Q-TOF instrument settings. The preliminary results showed that we missed many compounds because we did not always identify the correct molecular formula among the top ranks. This might be because no isotope patterns were given for the precursor. We therefore prepared a second submission, kai112, which no longer uses a hard threshold but instead considers all molecular formulas for the CSI:FingerID search and adds the SIRIUS score to the CSI:FingerID score. To prevent empty trees (which a hard threshold would have discarded) from receiving high scores by chance, we add a penalty of 1000 if a tree does not explain a single fragment peak. Furthermore, for the kai112 submission we trained CSI:FingerID on a larger dataset that also contains spectra from NIST. Apart from removing the hard threshold, the kai112 submission follows the standard SIRIUS+CSI:FingerID protocol: we computed trees for all candidate formulas obtained by searching the given precursor mass in PubChem (with the list of common adduct types from Category 4). Each of these trees was processed with CSI:FingerID as described in [1]: we predicted a molecular fingerprint for each tree (with Platt probability estimates) and compared it against the fingerprints of all structure candidates with the same molecular formula. For the fingerprint comparison, we used the new maximum likelihood scoring function implemented since SIRIUS 3.5. Trees with only one node receive a penalty of 1000; for all other trees, the SIRIUS score is added to the CSI:FingerID score. The resulting hits were merged into one list and sorted by score. A constant value was added to all scores to make them positive (as stated in the CASMI rules). Compounds tied on score were ordered randomly. If a compound could not be processed (e.g. because of multiple charges), its score was set to zero.
[1] Kai Dührkop, Huibin Shen, Marvin Meusel, Juho Rousu and Sebastian Böcker. Searching molecular structure databases with tandem mass spectra using CSI:FingerID. Proc Natl Acad Sci U S A, 112(41):12580-12585, 2015.
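The kai112 scoring change replaces the hard threshold with a combined score. A minimal sketch, assuming "penalty of 1000" means subtracting 1000 from the CSI:FingerID score and that one-node trees receive no SIRIUS bonus (as "for all other trees" suggests); the `Hit` type and candidate names are hypothetical.

```python
from typing import List, NamedTuple, Tuple

EMPTY_TREE_PENALTY = 1000.0  # from the abstract


class Hit(NamedTuple):
    candidate: str       # HYPOTHETICAL candidate identifier
    csi_score: float     # CSI:FingerID score for the candidate
    sirius_score: float  # SIRIUS score of the candidate's formula tree
    tree_nodes: int      # number of nodes in that fragmentation tree


def combined_score(hit: Hit) -> float:
    """One-node (empty) trees get the penalty instead of the SIRIUS bonus;
    all other trees add the SIRIUS score to the CSI:FingerID score."""
    if hit.tree_nodes <= 1:
        return hit.csi_score - EMPTY_TREE_PENALTY
    return hit.csi_score + hit.sirius_score


def rank_all(hits: List[Hit]) -> List[Tuple[str, float]]:
    """Merge hits from all molecular formulas (no hard threshold) and sort."""
    scored = [(h.candidate, combined_score(h)) for h in hits]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored


hits = [Hit("x", -20.0, 15.0, 5), Hit("y", -5.0, 0.0, 1), Hit("z", -12.0, 2.0, 3)]
print(rank_all(hits))  # [('x', -5.0), ('z', -10.0), ('y', -1005.0)]
```

In this toy example, "y" has the best raw CSI:FingerID score but an empty tree, so the penalty pushes it to the bottom, which is exactly the failure mode the penalty is meant to prevent once the hard threshold is gone.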