How the Molecules Gateway was generated

The generation of the Molecules Gateway can be summarized in three distinct steps:

sample preparation
sample analysis
data processing

1. Sample preparation

This step involved the parallel processing of 80 different actinomycete cultures. To a 2-mL sample of each culture in a 15-mL centrifuge tube, 4 mL of ethanol was added. The tube was shaken for 1 h at 30 °C and centrifuged at 4000 rpm for 8 min. Then, 125 µL of each supernatant was transferred into each of 24 identical 96-well microtiter plates. One of the plates was used for analysis, while the others were dried for storage.
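The volumes in this protocol can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the variable names are illustrative, and the 6 mL total is an upper bound on the supernatant, since the pellet and any volume contraction on mixing are ignored.

```python
# Figures taken from the protocol above, in microlitres.
CULTURE_UL = 2_000        # culture sample per tube
ETHANOL_UL = 4_000        # ethanol added per tube
ALIQUOT_UL = 125          # supernatant transferred per well
N_PLATES = 24             # identical plates prepared
N_CULTURES = 80           # cultures processed in parallel
WELLS_PER_PLATE = 96

total_ul = CULTURE_UL + ETHANOL_UL          # upper bound on available extract
ethanol_fraction = ETHANOL_UL / total_ul    # nominal v/v, ignoring contraction
aliquoted_ul = ALIQUOT_UL * N_PLATES        # volume drawn from each supernatant
unused_wells = WELLS_PER_PLATE - N_CULTURES # wells not occupied by samples

print(f"nominal ethanol fraction: {ethanol_fraction:.0%}")
print(f"supernatant used per culture: {aliquoted_ul} of at most {total_ul} µL")
print(f"wells without sample per plate: {unused_wells}")
```

So each culture supplies 3 mL of supernatant across the 24 plates, about half of the nominal extract volume, and one well per culture fits comfortably on a 96-well plate.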

2. Analysis

Samples were analyzed using a Vanquish UHPLC system (Thermo Fisher Scientific) with a YMC-Triart ODS column (3.0 × 100 mm, S-1.9 μm, 12 nm) coupled to an Orbitrap Exploris™ 120 high-resolution mass spectrometer (Thermo Fisher Scientific). The mobile phase, which was delivered at 0.8 mL min−1 at 40 °C, consisted of 0.1% formic acid in H2O (A), LCMS-grade acetonitrile (B) and LCMS-grade isopropyl alcohol (C). The runtime for each sample analysis was 23 minutes, and the mass-to-charge (m/z) ratio (MS1 scan) was measured over the range from 150 to 2000. Further details can be found here.

3. Data processing

This consisted of four automated steps:

Pre-processing. Each molecule can be detected by the mass spectrometer as several ion species (adducts) at the same retention time (RT). Signals therefore need to be consolidated by identifying the adducts that originate from the same molecule. This is done with the “Align retention times”, “Detect compounds” and “Group compounds” nodes of the Compound Discoverer workflow. Next, signals that fall below fixed thresholds for peak area and peak shape are filtered out.
Internal dereplication. This involves comparing the newly acquired, consolidated and filtered signals with a library of previously annotated molecules, using the “Search mzVault” node of the Compound Discoverer workflow. A signal that matches a molecule in the library automatically receives that molecule’s annotation and enters the library with the associated metadata. Signals without matches enter the annotation workflow.
Annotation. Three annotation tools are used in parallel: Compound Discoverer, MolDiscovery and MS2Query. Each tool may or may not predict the identity of a molecule, and different tools may give different predictions. Therefore, after the individual analyses, the outputs from the three tools are harmonized, linked and compared. Cases without a prediction by any tool are labelled as “unknown” and enter the Molecules Gateway with the lowest scoring level; cases with one or more predictions enter the “Consistency and Ranking” step.
Consistency and Ranking. To choose and score the most likely annotation, the different predictions are compared for agreement (same InChIKey) or name similarity, and checked for whether the predicted molecule has been reported in the literature to originate from actinomycetes. If the predictions disagree, two additional criteria are applied: i) whether the chemical formula of the predicted molecule is identical to the formula predicted by SIRIUS; and ii) whether the calculated RT of the predicted molecule is within the confidence limits of the observed RT. Finally, a qualitative confidence level is assigned to the chosen prediction according to the table below.
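To illustrate the adduct consolidation described in the pre-processing step, the following is a minimal sketch and not the algorithm used by Compound Discoverer. It assumes singly charged positive ions, three common adducts, and hypothetical tolerances (0.05 min in RT, 5 ppm in mass); the `Signal` class and both function names are invented for this example.

```python
from dataclasses import dataclass

# Mass shifts (Da) for singly charged positive adducts: m/z = M + shift.
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
}

@dataclass
class Signal:
    mz: float
    rt: float  # retention time in minutes

def same_molecule(a, b, rt_tol=0.05, ppm=5.0):
    """True if a and b co-elute and two different adducts give one neutral mass."""
    if abs(a.rt - b.rt) > rt_tol:
        return False
    for name_a, shift_a in ADDUCT_SHIFTS.items():
        for name_b, shift_b in ADDUCT_SHIFTS.items():
            if name_a == name_b:
                continue
            mass_a = a.mz - shift_a
            mass_b = b.mz - shift_b
            if abs(mass_a - mass_b) <= mass_a * ppm * 1e-6:
                return True
    return False

def group_signals(signals):
    """Greedily place each signal in the first group holding a matching adduct."""
    groups = []
    for s in sorted(signals, key=lambda x: x.rt):
        for g in groups:
            if any(same_molecule(s, member) for member in g):
                g.append(s)
                break
        else:
            groups.append([s])
    return groups

# Hypothetical molecule with neutral mass 500.0 Da seen as [M+H]+ and [M+Na]+,
# plus two unrelated signals.
signals = [
    Signal(501.007276, 5.00),
    Signal(610.200000, 5.00),
    Signal(522.989218, 5.01),
    Signal(501.007276, 9.00),
]
groups = group_signals(signals)
print([len(g) for g in groups])
```

In this example the two co-eluting signals that differ by the Na/H mass difference end up in one group, while the signal with the same m/z at a different RT stays separate.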

Annotation level	Predictions by	Source from Actinomycetes
★★★★	agreement by ≥ 2 tools	✓
★★★	agreement by ≥ 2 tools
★★	single tool or disagreement	✓
★	single tool or disagreement
–	No prediction
★★★★★	Matching reference standards
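
The table can be read as a small decision rule. The sketch below is one possible encoding, under stated assumptions: agreement is reduced to identical InChIKeys, name similarity is not modelled, and the SIRIUS formula and RT criteria used to choose among disagreeing predictions are omitted. The function names and the placeholder keys `KEY-A` and `KEY-B` are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

def consensus(predictions: Dict[str, Optional[str]]) -> Tuple[Optional[str], int]:
    """Return the most frequent InChIKey and the number of tools predicting it."""
    keys = [key for key in predictions.values() if key]
    if not keys:
        return None, 0
    return Counter(keys).most_common(1)[0]

def annotation_level(predictions: Dict[str, Optional[str]],
                     from_actinomycetes: bool,
                     matches_standard: bool = False) -> str:
    """Map tool predictions to the confidence levels listed in the table."""
    if matches_standard:
        return "★★★★★"          # matching reference standards
    key, votes = consensus(predictions)
    if key is None:
        return "–"               # no prediction by any tool
    if votes >= 2:               # agreement by at least two tools
        return "★★★★" if from_actinomycetes else "★★★"
    # single tool, or tools that disagree
    return "★★" if from_actinomycetes else "★"

example = {"Compound Discoverer": "KEY-A", "MolDiscovery": "KEY-A", "MS2Query": "KEY-B"}
print(annotation_level(example, from_actinomycetes=True))
```

Here two of the three tools agree and the molecule is assumed to be reported from actinomycetes, so the example falls in the four-star row.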
