Reproducibility of Genetic Risk Factors Identified for Long COVID using Combinatorial Analysis Across US and UK Patient Cohorts.., 2025, Sardell et al

Reproducibility of Genetic Risk Factors Identified for Long COVID using Combinatorial Analysis Across US and UK Patient Cohorts with Diverse Ancestries

Jason Sardell, Matt Pearson, Karolina Chocian, Sayoni Das, Krystyna Taylor, Mark Strivens, Rohit Gupta, Amy Rochlin, Steve Gardner

Background
Long COVID is a major public health burden causing a diverse array of debilitating symptoms in tens of millions of patients globally. In spite of this overwhelming disease prevalence and staggering cost, its severe impact on patients' lives and intense global research efforts, study of the disease has proved challenging due to its complexity. Genome-wide association studies (GWAS) have identified only four loci potentially associated with the disease, although these results did not statistically replicate between studies. A previous combinatorial analysis study identified a total of 73 genes that were highly associated with two long COVID cohorts in the predominantly (>91%) white European ancestry Sano GOLD population, and we sought to reproduce these findings in the independent and ancestrally more diverse All of Us (AoU) population.

Methods
We assessed the reproducibility of the 5,343 long COVID disease signatures from the original study in the AoU population. Because the very small population sizes provide very limited power to replicate findings, we initially tested whether we observed a statistically significant enrichment of the Sano GOLD disease signatures that are also positively correlated with long COVID in the AoU cohort after controlling for population substructure.

Results
For the Sano GOLD disease signatures that have a case frequency greater than 5% in AoU, we consistently observed a significant enrichment (77% - 83%, p < 0.01) of signatures that are also positively associated with long COVID in the AoU cohort. These encompassed 92% of the genes identified in the original study. At least five of the disease signatures found in Sano GOLD were also shown to be individually significantly associated with increased long COVID prevalence in the AoU population. Rates of signature reproducibility are strongest among self-identified white patients, but we also observe significant enrichment of reproducing disease associations in self-identified black/African-American and Hispanic/Latino cohorts. Signatures associated with 11 out of the 13 drug repurposing candidates identified in the original Sano GOLD study were reproduced in this study.

Conclusion
These results demonstrate the reproducibility of long COVID disease signal found by combinatorial analysis, broadly validating the results of the original analysis. They provide compelling evidence for a much broader array of genetic associations with long COVID than previously identified through traditional GWAS studies. This strongly supports the hypothesis that genetic factors play a critical role in determining an individual's susceptibility to long COVID following recovery from acute SARS-CoV-2 infection. It also lends weight to the drug repurposing candidates identified in the original analysis. Together these results may help to stimulate much needed new precision medicine approaches to more effectively diagnose and treat the disease. This is also the first reproduction of long COVID genetic associations across multiple populations with substantially different ancestry distributions. Given the high reproducibility rate across diverse populations, these findings may have broader clinical application and promote better health equity. We hope that this will provide confidence to explore some of these mechanisms and drug targets and help advance research into novel ways to diagnose the disease and accelerate the discovery and selection of better therapeutic options, both in the form of newly discovered drugs and/or the immediate prioritization of coordinated investigations into the efficacy of repurposed drug candidates.

Link | PDF (Preprint)
 
So from what I can understand, they are claiming this is a replication of an earlier study, which seems like a big deal.

And again Precision Life are talking about drug targets and repurposed drugs. Have they ever been specific about these in LC or ME?

I would be interested to compare this to their findings in ME.
 
It sounds encouraging. I have looked through the paper rather quickly and cannot find any account of what the genes pulled out actually are. I think that the usefulness of the data will depend directly on what the genes are and what the combinations are.

Hopefully someone can tell us.
 
The numbers before the rs IDs are which "disease signature" in Table 7 they are part of. Links go to dbSNP.

Severe cohort
Four high-frequency disease signatures from the Severe Sano GOLD analysis were significantly associated with increased prevalence of long COVID in AoU, one of which was still significant after applying the more conservative Bonferroni FDR correction (Table 6) [Signature 1 below]. All four signatures are comprised of five SNP genotypes, each of which contributes to the overall association with disease in AoU (i.e., removing any of the SNP genotypes from the signature results in a lower odds ratio). This observation highlights the utility of the combinatorial analysis approach for identifying genetic disease associations.
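To make the "removing any of the SNP genotypes from the signature results in a lower odds ratio" point concrete, here is a minimal sketch of a leave-one-out check. Everything in it is an assumption for illustration: the genotype labels, the counts, and the 0.5 continuity correction are made up, and this is not the authors' pipeline (which also controls for population substructure).

```python
def odds_ratio(people, signature):
    """Odds ratio for carrying every genotype in `signature`.

    `people` is a list of (genotypes, is_case) pairs. A 0.5 continuity
    correction is added to each cell (an assumption, to avoid division by zero).
    """
    a = b = c = d = 0
    for genotypes, is_case in people:
        carrier = signature <= genotypes  # carries all genotypes in the signature
        if carrier and is_case:
            a += 1
        elif carrier:
            b += 1
        elif is_case:
            c += 1
        else:
            d += 1
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def leave_one_out(people, signature):
    """Recompute the odds ratio with each genotype removed in turn."""
    return {snp: odds_ratio(people, signature - {snp}) for snp in signature}


def make(genotypes, n_cases, n_controls):
    """Build toy individuals sharing the same genotype set."""
    g = frozenset(genotypes)
    return [(g, True)] * n_cases + [(g, False)] * n_controls


# Hypothetical three-genotype signature and toy cohort (not data from the paper).
signature = frozenset({"snpA", "snpB", "snpC"})
people = (
    make({"snpA", "snpB", "snpC"}, 8, 1)
    + make({"snpA", "snpB"}, 2, 6)
    + make({"snpA", "snpC"}, 2, 6)
    + make({"snpB", "snpC"}, 2, 6)
    + make(set(), 6, 21)
)

print("full signature:", odds_ratio(people, signature))
for snp, value in sorted(leave_one_out(people, signature).items()):
    print("without", snp, "->", value)
```

In this toy cohort the full signature has a higher odds ratio than any version with one genotype dropped, which is the pattern the paper describes for its replicating signatures.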

Two of the replicating disease signatures from the Severe analysis mapped to the gene CCDC146 and one mapped to D2HGDH. These genes have different functions and affect different potential mechanism of action hypotheses for their role in the development of long COVID. CCDC146 is a ubiquitous centriole and microtubule-associated protein linked to cognitive functioning and type 2 diabetes. D2HGDH is an enzyme involved in mitochondrial functioning that also exhibits anti-inflammatory effects.

1,2,3: rs17035343 - No gene
1,2,3: rs1872513 - LOC105377488 : Intron Variant
1,2,3: rs9312595 - ASB5 : Intron Variant
1,2,3: rs4936114 - No gene

1: rs12454570 - LOC105371973 : Intron Variant
2: rs58438895 - CCDC146 : Intron Variant, LOC102723791 : Non Coding Transcript Variant
3: rs1109968 - CCDC146 : Missense Variant, LOC102723791 : Intron Variant

4: rs6716743 - D2HGDH : Non Coding Transcript Variant
4: rs2010874 - LINC00903 : 2KB Upstream Variant
4: rs13096228 - No gene
4: rs67844017 - LINC02520 : Intron Variant
4: rs79853277 - No gene

Fatigue dominant cohort
Two disease signatures from the Fatigue Dominant Sano GOLD analysis were significantly associated with increased prevalence of long COVID in AoU, one of which was still significant after applying the more conservative Bonferroni FDR correction (Table 6) [Signature 5 below]. The latter is comprised of two SNP genotypes, while the other is comprised of five SNP genotypes. Each of the individual SNP genotypes contribute to the signatures’ association with disease in AoU.

5: rs9515203 - COL4A2 : Intron Variant
5: rs11633336 - SLC12A1 : Intron Variant

6: rs10914896 - No gene
6: rs10229643 - GLCCI1 : Intron Variant
6: rs9960341 - No gene
6: rs2076584 - RIN2 : Synonymous Variant
6: rs17702926 - LINC02885 : Intron Variant

Finally, if we pool the signatures from the Severe and Fatigue Dominant cohorts into a single analysis (excluding the large number of signatures from the General cohort to avoid the need for stringent FDR correction), then 5 of the signatures in Table 7 remain significant under the combined Benjamini-Hochberg FDR correction. These include all four significant signatures from the Severe analysis and the top signature from the Fatigue Dominant analysis.
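For anyone wondering why more signatures survive Benjamini-Hochberg than Bonferroni, here is a rough sketch of the two procedures. Bonferroni is usually described as controlling the family-wise error rate (every p-value must beat alpha divided by the number of tests), while Benjamini-Hochberg controls the false discovery rate with a threshold that loosens as you go down the ranked list. The p-values below are hypothetical, chosen only to show the difference, and are not taken from the paper.

```python
def bonferroni(pvals, alpha=0.05):
    """Flag p-values that pass the Bonferroni threshold alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Flag p-values declared significant by the Benjamini-Hochberg step-up rule."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Everything ranked at or below the cutoff is a discovery.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical p-values for six signatures (illustration only).
example_pvals = [0.001, 0.012, 0.02, 0.03, 0.04, 0.2]

print(sum(bonferroni(example_pvals)))          # 1 passes Bonferroni
print(sum(benjamini_hochberg(example_pvals)))  # 5 pass Benjamini-Hochberg
```

With these made-up numbers only the smallest p-value survives Bonferroni, while five survive Benjamini-Hochberg, so it is not surprising that the paper reports more signatures as significant under the less conservative correction.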


-----

Of the 13 repurposing gene candidates identified in Taylor et al. (2023), 11 (85%) map to at least one disease signature that reproduces in AoU (see Supplemental Table 9). These genes include TLR4 which Taylor et al. (2023) noted has been shown to protect against long-term cognitive impairment pathology caused by SARS-CoV-2 [45]. Inhibition of TLR4 in a mouse model was shown to prevent long term cognitive pathology including synapse elimination and memory deficits that are caused by the SARS-CoV-2 Spike protein. Previous clinical studies have shown that antagonizing TLR4 signaling has the effect of dampening the pathological cytokine storm observed in patients with severe acute COVID-19 and reduces mortality rates in hospitalized COVID-19 patients [46,47].

From Supplemental Table 9 (links to GeneCards):
CETP
GUCY1A2
HPN
MAPK9
PDE4D
POR
PRODH
RRBP1
SLC12A1
TLR4
TNIK

-----

Importantly, the results of this paper provide strong supporting evidence for a much broader range of genetic associations with long COVID than has been uncovered by GWAS studies to date. This provides evidence highly consistent with a strong biological basis of the disease and the hypothesis that patients’ genetics influence their susceptibility to developing long COVID (and their predominant symptoms) following recovery from acute SARS-CoV-2 infection.

The AoU ancestry distribution differs significantly from the mainly (>91%) white British patient cohort used in the original combinatorial analysis. Disease signature reproducibility rates are very strong in the sub-cohort of self-identified white patients, as expected given the similarity in ancestry between that cohort with the original Sano GOLD dataset. Signature reproducibility rates are lower in sub-cohorts of self-identified black/African-Americans and Hispanic/Latinos, but we still observe significant enrichment of disease signatures despite very small sample sizes.

Edit: Added GeneCards links to more genes.
 
Would you be able to translate it into layman's terms?

Proteins associated with microtubules are likely to be the basis of a cell's internal 'structural memory'. For nerve cells that probably relates closely to memory as we normally understand it. It says the protein is linked to cognitive function so that would fit. I had a thought that in ME/CFS there might be an inability to forget input clutter from the previous day (which is what sleep is for). It could link in to a complement issue since that also seems to be related to (useful) forgetting.

More generally it would be an example of what I have previously called the 'writing on the wall' - i.e. regulatory signals that you do not find in solution but painted on to cell or matrix structure in a hidden form.

The relation to type 2 diabetes is obscure but insulin resistance (type 2) seems to keep coming up on proteomic trawls.

It is all too easy to try to tie together things when, biology being so complex, you could probably tie any two things together (and wrongly). But it is worth a try.
 
Proteins associated with microtubules are likely to be the basis of a cell's internal 'structural memory'.
Thank you!

If something external (to the cell) ‘instructed’ the cell to enter e.g. a ‘defensive’ state, could ‘deviations’ in this gene lead to a reduction in the ability of the cell to get out of that state?

Or could it be more like a bug in a code that leads to an incorrect deployment of an ‘instruction’? Like the cell doesn’t ‘remember’ how or what to do?
 
If something external (to the cell) ‘instructed’ the cell to enter e.g. a ‘defensive’ state, could ‘deviations’ in this gene lead to a reduction in the ability of the cell to get out of that state?

Quite plausibly. That would be a bit similar to the amyloid-related disease stories like BSE.
But like BSE you might not need to have anything wrong with your gene. The gene product RNA would get caught up in a cycle of mistaken epigenetic change.

I chatted to the pain insensitivity guy about this a bit and he kept nodding and saying exactly. So I think I am probably not misrepresenting things.

I am thinking particularly that the cell may not be able to 'forget' what it was programmed to do earlier. Like an office where the shredder just piled the old stuff back on your desk until you couldn't see for the piles of paper.
 
I am thinking particularly that the cell may not be able to 'forget' what it was programmed to do earlier. Like an office where the shredder just piled the old stuff back on your desk until you couldn't see for the piles of paper.
So when something interacts with a cell, it stores the ‘instruction’ in an internal memory structure? And that structure gets stuck in one state, like not being able to overwrite the memory of a computer? So you keep running the same code on repeat.

Or are the memories of the instructions more like contained physical entities that are created on demand, and the problem is that they don’t get destroyed or flushed out after use? Clogging the system, essentially.
 
"CCDC146 is a ubiquitous centriole and microtubule-associated protein linked to cognitive functioning and type 2 diabetes."

That might be of interest.
I’m really brainfoggy, did the study find people with the gene related to synthesising this protein was more common in people who developed “severe” long COVID? Or am I misunderstanding, I don’t have the energy to dig deep in the paper sorry.
 
So when something interacts with a cell, it stores the ‘instruction’ in an internal memory structure? And that structure gets stuck in one state, like not being able to overwrite the memory of a computer? So you keep running the same code on repeat.

Or are the memories of the instructions more like contained physical entities that are created on demand, and the problem is that they don’t get destroyed or flushed out after use? Clogging the system, essentially.

Take your pick. Could be anything like this. But it opens up new possibilities. And, as I say, the guy who works on the things kept nodding. He works on absent pain but clearly thinks about unexplained pain too and I suspect he had already had similar thoughts, if maybe not so much about ME/CFS per se.
 
I’m really brainfoggy, did the study find people with the gene related to synthesising this protein was more common in people who developed “severe” long COVID? Or am I misunderstanding, I don’t have the energy to dig deep in the paper sorry.

Someone else may have a better take on this but we may need Chris Ponting to explain.

My reading is that the paper is talking about a particular variant of this gene being linked to Long Covid of an ME/CFS flavour. The gene would come in two sequence variants (probably a more common one and a less common one, but they might be much the same) and they may not know how the two variants differ in terms of how much protein is made or how well it works. There is just likely to be a difference in the effect on the protein's action.

Also, an important part of this approach is that they are not saying that this gene variant on its own is linked to Long Covid but this gene variant, combined with a couple of other gene variants seems to pop up more often. That seems to suggest that what matters is some interacting combination of things. A bit like getting a full house at Poker. Winning the game does not depend on having a queen of hearts but on having another queen and three jacks as well. And the king of spades can do the same but only if there is another king.

It seems like a plausible and important way to look at genetic traits that could explain a strong genetic basis that need not show up on a simple single gene linkage search.
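The poker analogy can be put into numbers with a toy example. In the sketch below, two variants (called Q and J here, purely hypothetical) each show no association with disease when tested alone, yet carrying both together does. The counts are invented to make the arithmetic clean and have nothing to do with the actual data in the paper.

```python
from collections import Counter

# Hypothetical counts of people keyed by (has_variant_Q, has_variant_J).
cases = Counter({
    (True, True): 40, (True, False): 10,
    (False, True): 10, (False, False): 40,
})
controls = Counter({
    (True, True): 25, (True, False): 25,
    (False, True): 25, (False, False): 25,
})


def odds_ratio(cases, controls, is_carrier):
    """Plain 2x2 odds ratio; `is_carrier` decides who counts as a carrier."""
    a = sum(n for key, n in cases.items() if is_carrier(key))
    c = sum(n for key, n in cases.items() if not is_carrier(key))
    b = sum(n for key, n in controls.items() if is_carrier(key))
    d = sum(n for key, n in controls.items() if not is_carrier(key))
    return (a * d) / (b * c)


or_q = odds_ratio(cases, controls, lambda key: key[0])                # Q alone
or_j = odds_ratio(cases, controls, lambda key: key[1])                # J alone
or_both = odds_ratio(cases, controls, lambda key: key[0] and key[1])  # Q and J together

print(or_q, or_j, or_both)  # 1.0 1.0 2.0
```

Half of the cases and half of the controls carry Q, and the same holds for J, so a one-variant-at-a-time search sees nothing. Only the combination is over-represented in cases, which is the kind of signal a single gene linkage search would miss.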
 
My reading is that the paper is talking about a particular variant of this gene being linked to Long Covid of an ME/CFS flavour.
I'm not sure it's necessarily related to ME/CFS type long COVID. I think this study looked at people with any kind of long COVID. They tried to replicate gene signatures from their 2023 study which grouped people into severe and fatigue dominant groups. The CCDC146 SNPs here appear to replicate from the severe long COVID group.

This study's inclusion criteria:
The baseline long COVID cohort was created by selecting all 458 individuals with GDA genotyping data who have a diagnosis of long COVID, using ICD-10 code U09.9 (post-acute COVID-19). We note that this criterion, which implies a prevalence of long COVID less than 0.2%, almost certainly excludes many patients with long COVID based on published estimates of long COVID prevalence of between 6.9% and 14% [36,37,38].

The control cohort was generated by selecting individuals with GDA genotyping data who have evidence of SARS-CoV-2 infection, either based on a reported positive COVID-19 test in the COPE COVID-19 survey (n=3,615) or presence of ICD-10 codes B97.21 or U07.1 (n=17,024). We excluded individuals with long COVID based on ICD-10 code U09.9 as well as any individual with a history of symptomatic phenotypes consistent with long COVID or other post-viral fatigue syndromes (see Supplemental Table 1). Applying these criteria, our maximum control population included 9,774 individuals.

What they're replicating:
We previously identified long COVID associated disease signatures in two patient cohorts derived from the Sano GOLD study cohort, as described in the original Taylor et al. (2023) paper, and a third unreported patient cohort using a broader definition of the disease (Supplemental Table 4). This resulted in:
  1. 1,188 signatures mapped to 43 genes, from a ‘Severe’ cohort of patients who reported the greatest variety and severity of long COVID symptoms.
  2. 1,435 signatures mapped to 35 genes, from a ‘Fatigue Dominant’ cohort of patients who reported predominantly fatigue-associated long COVID symptoms.
  3. 6,445 signatures mapped to 165 genes, from a ‘General’ cohort of patients who reported they were still suffering continuation or development of new symptoms 12 weeks after the initial SARS-CoV-2 infection, with these symptoms lasting for at least 2 months with no other explanation.

Looking at Supplementary File 2 from the 2023 paper, both the SNPs listed here for this gene, as they say, are from the severe cohort. Though there is another SNP listed for CCDC146 in the fatigue dominant cohort of that study: rs57954141

Edit: Fixed file number.
 
COL4A1 is of interest to me, being a location of my rare genetic disorder, Axenfeld-Rieger syndrome. FOXO1 is another location, as is FOXC1.
This paper mentions rapamycin

https://pmc.ncbi.nlm.nih.gov/articles/PMC3459649/
COL4A1 and COL4A2 mutations and disease: insights into pathogenic mechanisms and potential therapeutic targets

COL4A1/A2-Related Disorders

https://rarediseases.org/rare-diseases/col4a1-a2-related-disorders/

https://en.m.wikipedia.org/wiki/Sirolimus
reduces the sensitivity of T cells and B cells to interleukin-2 (IL-2), inhibiting their activity.
 
We hope that this will provide confidence to explore some of these mechanisms and drug targets and help advance research into novel ways to diagnose the disease and accelerate the discovery and selection of better therapeutic options, both in the form of newly discovered drugs and/or the immediate prioritization of coordinated investigations into the efficacy of repurposed drug candidates.

Are any of these findings viable drug targets? I would have thought it was far too early for anything like that but very happy to be wrong.

Their ME/CFS press release from last year claimed that they had identified viable repurposed drugs iirc but I don't think they've ever expanded on it. Perhaps not elaborating is for business reasons, but I worry it could all be puff.

Precision Life are doing very interesting work but this community has had enough false hope.

If anyone knows any more I'd be interested to hear, even/especially if it is simply that they are not making drug candidates public yet.
 
Just found this thread quite informative/refreshed my memory.

https://www.s4me.info/threads/genet...analysis-2023-taylor-et-al.34243/#post-484434

Also found this video from UnitetoFight last year, haven't watched yet though.



At the end of the UnitetoFight talk Das says that clinical trials will begin after replication on the LOCOME project (with DecodeME data) is complete!

At Metrodora in Salt Lake City. I guess it remains to be seen how much of this is hot air but it's certainly exciting!
 