Decoding the Complexities of COVID-19 with Multi-omics Analysis
Sameh, Mahmoud, et al. “Integrated multiomics analysis to infer COVID-19 biological insights.” Scientific Reports 13.1 (2023): 1802.
A fundamental question we have heard in several conversations, for example with Ocean Genomics, is how to integrate data from more than one omics level. Scientific Reports, a Nature Portfolio journal, published an article titled “Integrated multiomics analysis to infer COVID-19 biological insights.” The focus was on the severe COVID-19 response in a cohort of 57 patients and controls, analyzed by blood-based proteomics and metabolomics.
Proteins were measured by LC-MS/MS, and metabolites by a second method (UHP-LC-MS/MS). The analysis suite used for each data type is different.
The researchers compared 4 different approaches to analyzing the data:
Single omics: analyze each level by itself, from raw data to the list of implicated genes/proteins/pathways
Multi-omics analysis based on a knowledge-based network (KBN)
Multi-omics analysis based on a statistical-based network (SBN)
They compared the 4 approaches at the final output level, the “findings”: what is unique to each approach, and how a specific approach increased the significance of shared (or carried-over) findings.
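To make this kind of output-level comparison concrete, here is a minimal sketch in Python. It is not the authors' code: the pathway names, the FDR values, and the helper names `significant`, `shared_fraction` and `gained_significance` are all hypothetical, and only the FDR < 0.05 cutoff comes from the paper's figure.

```python
from typing import Dict, Set

def significant(pathway_fdr: Dict[str, float], threshold: float = 0.05) -> Set[str]:
    """Return the pathways whose FDR is below the threshold."""
    return {name for name, fdr in pathway_fdr.items() if fdr < threshold}

def shared_fraction(reference: Set[str], other: Set[str]) -> float:
    """Fraction of the reference pathways that also appear in the other set."""
    return len(reference & other) / len(reference) if reference else 0.0

def gained_significance(ref: Dict[str, float], other: Dict[str, float],
                        shared: Set[str]) -> Set[str]:
    """Shared pathways whose FDR is lower (more significant) in `other`."""
    return {name for name in shared if other[name] < ref[name]}

# Hypothetical toy results (pathway -> FDR), not values from the paper.
single_omics = {"pw_a": 0.01, "pw_b": 0.04, "pw_c": 0.03, "pw_d": 0.20}
kbn = {"pw_a": 0.001, "pw_b": 0.002, "pw_c": 0.30, "pw_e": 0.01}

sig_single = significant(single_omics)   # pw_a, pw_b, pw_c
sig_kbn = significant(kbn)               # pw_a, pw_b, pw_e
shared = sig_single & sig_kbn            # found by both approaches
unique_to_kbn = sig_kbn - sig_single     # found only by the KBN
fraction = shared_fraction(sig_single, sig_kbn)
improved = gained_significance(single_omics, kbn, shared)

print(f"shared: {sorted(shared)} ({fraction:.0%} of single omics)")
print(f"unique to KBN: {sorted(unique_to_kbn)}")
print(f"more significant in KBN: {sorted(improved)}")
```

Plain set operations are enough here because the comparison happens on the final pathway lists, not on the underlying networks.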
For all possible analysis options, the first step was to create the “ground truth” data set from the raw data: proteins and metabolites that passed the missing-value criteria (found in more than 20% of samples), coupled with proteins and metabolites uniquely expressed in the COVID-19 cohort (excluding drugs and other non-relevant metabolites). This is crucial to avoid ending up with a data set so large that network analysis becomes impossible. For the single-omics analysis, each omics level was analyzed separately all the way through to the final list, and the outputs were then compared for shared values.
For the knowledge-based and statistical approaches, cleaning the data was especially important in order to avoid the “hairball” effect, where the network becomes too dense to interpret.
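The missing-value filter described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' pipeline: the function name `filter_by_detection`, the use of `None` for a missing value, and the toy abundances are hypothetical. Only the "more than 20% of samples" rule comes from the text.

```python
from typing import Dict, List, Optional

Abundances = Dict[str, List[Optional[float]]]  # feature -> one value per sample

def filter_by_detection(abundances: Abundances, min_fraction: float = 0.2) -> Abundances:
    """Keep features detected (non-missing) in more than `min_fraction` of samples."""
    kept: Abundances = {}
    for feature, values in abundances.items():
        if not values:
            continue  # a feature with no samples cannot pass the criterion
        detected = sum(value is not None for value in values)
        if detected / len(values) > min_fraction:
            kept[feature] = values
    return kept

# Hypothetical toy data: 5 samples per feature, None marks a missing value.
toy = {
    "protein_A": [1.2, None, 0.9, 1.1, None],       # detected in 3 of 5 samples
    "protein_B": [None, None, None, None, 0.4],     # detected in 1 of 5 samples
    "metabolite_X": [5.0, 4.8, None, None, None],   # detected in 2 of 5 samples
}

filtered = filter_by_detection(toy)
print(sorted(filtered))
```

In this toy case `protein_B` is dropped because it is detected in exactly 20% of samples, which does not satisfy "more than 20%". Whether the paper treats the boundary the same way is an assumption.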
When comparing the methods, the researchers showed that 71% of the pathway signatures found in the single-omics analysis were also found in the KBN, but at higher significance levels. The same held for the SBN, which shared 71% (20 pathways) with the single-omics analysis and 25 pathways with the KBN. To integrate the KBN and SBN, they took the proteins (10) and metabolites (4) shared between the two networks and added their immediate neighbors in both the KBN and the SBN (53).
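The integration step (shared nodes plus their immediate neighbors in each network) can be sketched as follows. This is a simplified illustration, assuming each network is an undirected adjacency map. The `Graph` type, the function name `integrate_networks`, and the node labels are hypothetical, and the paper's actual implementation may differ.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected adjacency: node -> set of neighbors

def integrate_networks(kbn: Graph, sbn: Graph) -> Set[str]:
    """Return nodes shared by both networks plus their direct neighbors in either."""
    seeds = set(kbn) & set(sbn)      # proteins/metabolites present in both networks
    integrated = set(seeds)
    for node in seeds:
        integrated |= kbn[node]      # immediate neighbors in the KBN
        integrated |= sbn[node]      # immediate neighbors in the SBN
    return integrated

# Hypothetical toy networks: P = protein, M = metabolite.
toy_kbn: Graph = {"P1": {"M1", "P2"}, "M1": {"P1"}, "P2": {"P1"}}
toy_sbn: Graph = {"P1": {"P3"}, "P3": {"P1", "P4"}, "P4": {"P3"}}

nodes = integrate_networks(toy_kbn, toy_sbn)
print(sorted(nodes))
```

Here `P1` is the only shared node, so the result contains `P1` and its direct neighbors `M1`, `P2` and `P3`. `P4` is two steps away and is left out, which is what keeps the integrated network small.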
In summary, I think the diagram (above), which compares the significant pathways (FDR < 0.05) uncovered by each method, indicates that 1) the methodology depends heavily on proper data wrangling and cleanup tailored to the question in mind (which defines the knowledge in the KBN); and 2) there is no one method to rule them all, so there is value in running several methodologies and reporting concisely.
It is unclear which knowledge base the authors used to build the KBN, as they provide only a non-exhaustive version of the KBN and SBN interaction tables in the supplementary materials.