Borody et al
Article
Borody et al is a recurring person in the Astral Codex Ten archive, appearing 2 times across 2 issues between November 17, 2021 and February 01, 2023. The archive places it in contexts such as “Borody et al: Our last paper!”; “rejecting Borody et al, whose control group was”. It most often appears alongside Alexandros Marinos, Aref, Argentina.
Metadata
- Category: People
- Mention count: 2
- Issue count: 2
- First seen: November 17, 2021
- Last seen: February 01, 2023
Appears In
Related Pages
- Alexandros Marinos (2 shared issues)
- Aref (2 shared issues)
- Argentina (2 shared issues)
- Australia (2 shared issues)
- azithromycin (2 shared issues)
- Biber (2 shared issues)
- Biber et al (2 shared issues)
- BMJ (2 shared issues)
- Brazil (2 shared issues)
- Cadegiani (2 shared issues)
- Carvallo (2 shared issues)
External Links
Source Context
Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.
On the one hand, I have immense contempt for ivmmeta for letting all those other awful studies pass and then pulling out all the stops to try to nitpick this one. I have no idea if their proposed randomization failure really happened. And no doubt the reason they’re even able to investigate this is that this study is really careful and transparent - most of them don’t tell you anything about their randomization method. I would be shocked if other studies don’t have all these problems and worse. On the other hand, the point isn’t to be fair, it’s to be right. And this is a potential confounder. Not a huge one. But a potential one. I guess all we can do is try to bound the damage. Even if the confounding is 100% real and bad, there’s no way to make this study consistent with the crazy super-pro-ivermectin results of studies like Espitia-Hernandez and Aref. And even if we deny any confounding, we see the same slight pro-ivermectin trend - 86 hospitalizations vs. 95 - that we’ve seen in so many other studies. Nothing is going to make me believe that this isn’t in the top 33% of studies we’ve been looking at, so let’s add it as grist for the meta-analysis (though maybe not quite as much grist as its vast size indicates) and move on, angrily. Buonfrate et al: An Italian RCT. Patients were randomized into low-dose ivermectin (32), placebo (29), or high-dose ivermectin (32). Primary outcome was viral load on day 7. There was no significant difference (average of 2 in ivermectin groups, 2.2 in placebo group). They admit that they failed to reach the planned sample size, but did a calculation to show that even if they had, the trial could not have returned a positive result. Clinically, an average of 2 patients were hospitalized in each of the ivermectin arms, compared to 0 in the placebo arm - which bucks our previously-very-constant pro-ivermectin trend. Mayer et al: Not an RCT. 
Patients in an Argentine province were offered the opportunity to try ivermectin; 3266 said yes and became the experimental group, 17966 said no and became the control group. There were many obvious differences between the groups, but they all seemed to handicap ivermectin. There was a nonsignificant trend toward less hospitalization and significantly less mortality (1.5% vs. 2.1%, p = 0.03). While looking into this study, I learned the term “immortal time bias”. This means a period in between selection for the study and the beginning of study recording where patient outcomes are not counted. I think the problem here is that if you signed up for the system on Day X, and if you got sick before they could give you ivermectin, you were in the control group. See this Twitter thread; I have not confirmed everything he says. This only hardens my resolve to stay away from non-RCTs. Borody et al: Our last paper! …is it a paper? I can’t find it published anywhere. It mostly seems to be on news sites. Doesn’t look peer-reviewed. And it starts with “Note that views expressed in this opinion article are the writer’s personal views”. Whatever. 600 Australians were treated with ivermectin, doxycycline, and zinc. The article compares this to an “equivalent control group” made of “contemporary infected subjects in Australia obtained from published Covid Tracking Data”; this is not how you control group, @#!% you. Then it gets excited about the fact that most patients had better symptoms at the end of the ten-day study period than the beginning (untreated COVID resolves in about ten days). Why are these people wasting my time with this? Let’s move on. The Analysis If we remove all fraudulent and methodologically unsound studies from the table above, we end up with this: Gideon Meyerowitz-Katz, who investigated many of the studies above for fraud, tried a similar exercise. 
I learned about his halfway through, couldn’t help seeing it briefly, but tried to avoid remembering it or using it when generating mine (also, I did take the result of his fraud investigations into account), so they should be considered not quite independent efforts. His looks like this: He nixed Chowdhury, Babaloba, Ghauri, Faisal, and Aref, but kept Szenta Fonseca, Biber (?), and Mayer. There was a correlation of 0.45, which I guess is okay. I asked him about his decision-making, and he listed a combination of serious statistical errors and small red flags adding up. I was pretty uncomfortable with most of these studies myself, so I will err on the side of severity, and remove all studies that either I or Meyerowitz-Katz disliked. We end up with the following short list: We’ve gone from 29 studies to 11, getting rid of 18 along the way. For the record, we eliminated 2/19 for fraud, 1/19 for severe preregistration violations, 10 for methodological problems, and 6 because Meyerowitz-Katz was suspicious of them. …but honestly this table still looks pretty good for ivermectin, doesn’t it? Still lots of big green boxes. Meyerowitz-Katz accuses ivmmeta of cherry-picking what statistic to use for their forest plot. That is, if a study measures ten outcomes, they sometimes take the most pro-ivermectin outcome. Ivmmeta.com counters that they used a consistent and reasonable (if complicated) process for choosing their outcome of focus, that being: If studies report multiple kinds of effects then the most serious outcome is used in calculations for that study. For example, if effects for mortality and cases are both reported, the effect for mortality is used, this may be different to the effect that a study focused on. If symptomatic results are reported at multiple times, we used the latest time, for example if mortality results are provided at 14 days and 28 days, the results at 28 days are used. Mortality alone is preferred over combined outcomes. 
Outcomes with zero events in both arms were not used (the next most serious outcome is used — no studies were excluded). For example, in low-risk populations with no mortality, a reduction in mortality with treatment is not possible, however a reduction in hospitalization, for example, is still valuable. Clinical outcome is considered more important than PCR testing status. When basically all patients recover in both treatment and control groups, preference for viral clearance and recovery is given to results mid-recovery where available (after most or all patients have recovered there is no room for an effective treatment to do better). If only individual symptom data is available, the most serious symptom has priority, for example difficulty breathing or low SpO2 is more important than cough. I’m having trouble judging this, partly because Meyerowitz-Katz says ivmmeta has corrected some earlier mistakes, and partly because there really is some reasonable debate over how to judge studies with lots of complicated endpoints. By this point I had completely forgotten what ivmmeta did, so I independently coded all 11 remaining studies following something in between my best understanding of their procedure and what I considered common sense. The only exception was that when the most severe outcome was measured in something other than patients (ie average number of virus copies per patient), I defaulted to one that was measured in patients instead, to keep everything with the same denominator. My results mostly matched ivmmeta’s, with one or two exceptions that I think are within the scope of argument or related to my minor deviations from their protocol. Placebo vs. ivermectin groups sometimes differed in size, which I’ve adjusted for and rounded off. Probably I’m forgetting some reason I can’t just do simple summary statistics to this, but whatever. It is p = 0.15, not significant. 
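The outcome-selection rules quoted above can be read as a small procedure. The sketch below is a hypothetical Python rendering of a few of them (drop outcomes with zero events in both arms, prefer the most serious outcome, then the latest time point), plus the per-patient-denominator exception described above. The severity ranking, class, and function names are assumptions made up for illustration, not ivmmeta's actual code, and the mid-recovery and individual-symptom rules are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical severity ranking (higher = more serious). Illustrative only;
# this is not ivmmeta's actual ordering.
SEVERITY = {
    "mortality": 5,
    "ventilation": 4,
    "hospitalization": 3,
    "clinical recovery": 2,
    "viral clearance": 1,
}


@dataclass
class Outcome:
    name: str                 # a key of SEVERITY
    day: int                  # time point at which the outcome was reported
    events_treated: int
    events_control: int
    per_patient: bool = True  # False for e.g. mean viral copies per patient


def select_outcome(outcomes: list[Outcome]) -> Optional[Outcome]:
    """Choose one outcome per study, loosely following the quoted rules."""
    usable = [
        o for o in outcomes
        # Outcomes with zero events in both arms carry no comparison.
        if o.events_treated + o.events_control > 0
        # Scott's exception: keep a per-patient denominator.
        and o.per_patient
    ]
    if not usable:
        return None
    # Most serious outcome first; ties broken by the latest time point.
    return max(usable, key=lambda o: (SEVERITY[o.name], o.day))


study = [
    Outcome("mortality", 28, 0, 0),  # dropped: no events in either arm
    Outcome("hospitalization", 14, 2, 5),
    Outcome("hospitalization", 28, 3, 6),
    Outcome("viral clearance", 7, 10, 4),
]
print(select_outcome(study))  # the day-28 hospitalization outcome
```

Encoding the preferences as a single sort key makes the ordering explicit, which is exactly the part Meyerowitz-Katz and ivmmeta disagree about: change the ranking and a different outcome gets picked.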
This is maybe unfair, because there aren’t a lot of deaths in the sample, so by focusing on death rather than more common outcomes we’re pointlessly throwing away sample size. What happens if I unprincipledly pick whatever I think the most reasonable outcome to use from each study is? I’ve chosen “most reasonable” as a balance between “is the most severe” and “has a lot of data points”: Now it’s p = 0.04, seemingly significant, but I had to make some unprincipled decisions to get there. I don’t think I specifically replaced negative findings with positive ones, but I can’t prove that even to myself, let alone to you. [UPDATE 5/31/22: A reader writes in to tell me that the t-test I used above is overly simplistic. A DerSimonian-Laird test is more appropriate for meta-analysis, and would have given 0.03 and 0.005 on the first and second analysis, where I got 0.15 and 0.04. This significantly strengthens the apparent benefit of ivermectin from ‘debatable’ to ‘clear’. I discuss some reasons below why I am not convinced by this apparent benefit.] (how come I’m finding a bunch of things on the edge of significance, but the original ivmmeta site found a lot of extremely significant things? Because they combined ratios, such that “one death in placebo, zero in ivermectin” looked like a nigh-infinite benefit for ivermectin, whereas I’m combining raw numbers. Possibly my way is statistically illegitimate for some reason, but I’m just trying to get a rough estimate of how convinced to be.) So we are stuck somewhere between “nonsignificant trend in favor” and “maybe-significant trend in favor, after throwing out some best practices”. This is normally where I would compare my results to those of other meta-analyses made by real professionals. But when I look at them, they all include studies later found to be fake, like Elgazzar, and unsurprisingly come up with wildly positive conclusions. There are about six in this category. 
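The 5/31/22 update above mentions the DerSimonian-Laird random-effects method. As a rough illustration of what it involves, here is a minimal Python sketch. It assumes each study is summarized as a 2x2 table (events and patients per arm), pools on the log risk ratio scale, and adds 0.5 to every cell when any cell is zero; those choices and the function names are assumptions, not necessarily what the reader's analysis used, and no data from the post is reproduced here. The continuity correction also shows why a ratio-based method does not literally treat “one death in placebo, zero in ivermectin” as infinite, although such a study still gets a large log ratio with a large variance.

```python
import math
from statistics import NormalDist


def log_risk_ratio(a, n1, c, n2):
    """Log risk ratio and its variance for one study.

    a/n1: events/patients in the treated arm; c/n2: events/patients in control.
    If any cell of the 2x2 table is zero, 0.5 is added to every cell (a common
    continuity correction), so "0 deaths vs. 1 death" stays finite.
    """
    b, d = n1 - a, n2 - c  # non-events in each arm
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    n1, n2 = a + b, c + d
    y = math.log((a / n1) / (c / n2))
    v = 1 / a - 1 / n1 + 1 / c - 1 / n2
    return y, v


def dersimonian_laird(tables):
    """Random-effects pooled log RR, between-study variance tau^2, two-sided p."""
    if len(tables) < 2:
        raise ValueError("need at least two studies")
    ys, vs = zip(*(log_risk_ratio(*t) for t in tables))
    w = [1 / v for v in vs]  # inverse-variance (fixed-effect) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / scale)  # DL moment estimator
    w_star = [1 / (v + tau2) for v in vs]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, ys)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    p = 2 * (1 - NormalDist().cdf(abs(pooled / se)))
    return pooled, tau2, p
```

When the studies agree, tau^2 is estimated as zero and the result reduces to an ordinary inverse-variance (fixed-effect) meta-analysis; when they disagree more than chance allows, tau^2 grows and the pooled estimate gets wider uncertainty.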
One of them later revised their results to exclude Elgazzar and still found strong efficacy for ivermectin, but they still included Niaee and some other dubious studies. The only meta-analysis that doesn’t make these mistakes is Popp (a Cochrane review), which is from before Elgazzar was found to be fraudulent, but coincidentally excludes it for other reasons. It also excludes a lot of good studies like Mahmud and Ravakirti because they give patients other things like HCQ and azithromycin - I chose to include them, because I don’t think they either work or have especially bad side effects, so they’re basically placebo - but Cochrane is always harsh like this. They end up with a point estimate where ivermectin cuts mortality by 40% - but say the confidence intervals are too wide to draw any conclusion. I think this basically agrees with my analyses above - the trends really are in ivermectin’s favor, but once you eliminate all the questionable studies there are too few studies left to have enough statistical power to reach significance. Except that everyone is still focusing on deaths and hospitalizations just because they’re flashy. Mahmud et al, which everyone agrees is a great study, found that ivermectin decreased days until clinical recovery, p = 0.003? So what do you do? This is one of the toughest questions in medicine. It comes up again and again. You have some drug. You read some studies. Again and again, more people are surviving (or avoiding complications) when they get the drug. It’s a pattern strong enough to common-sensically notice. But there isn’t an undeniable, unbreachable fortress of evidence. The drug is really safe and doesn’t have a lot of side effects. So do you give it to your patients? Do you take it yourself? Here this question is especially tough, because, uh, if you say anything in favor of ivermectin you will be cast out of civilization and thrown into the circle of social hell reserved for Klan members and 1/6 insurrectionists. 
All the health officials in the world will shout “horse dewormer!” at you and compare you to Josef Mengele. But good doctors aren’t supposed to care about such things. Your only goal is to save your patient. Nothing else matters. I am telling you that Mahmud et al is a good study and it got p = 0.003 in favor of ivermectin. You can take the blue pill, and stay a decent respectable member of society. Or you can take the horse dewormer pill, and see where you end up. In a second, I’ll tell you my answer. But you won’t always have me to answer questions like this, and it might be morally edifying to observe your thought process in situations like this. So take a second, and meet me on the other side of the next section heading. … … … … … The Synthesis Hopefully you learned something interesting about yourself there. But my answer is: worms! As several doctors and researchers have pointed out (h/t especially Avi Bitterman and David Boulware), the most impressive studies come from places that are teeming with worms. Mahmud from Bangladesh, Ravakirti from East India, Lopez-Medina from Colombia, etc. Here’s the prevalence of roundworm infections by country (source). But alongside roundworms, there are threadworms, hookworms, blood flukes, liver flukes, nematodes, trematodes, all sorts of worms. Add them all up and somewhere between half and a quarter of people in the developing world have at least one parasitic worm in their body. Being full of worms may impact your ability to fight coronavirus. Gluchowska et al write: Helminth [ie worm] infections are among the most common infectious diseases. Bradbury et al. highlight the possible negative interactions between helminth infection and COVID-19 severity in helminth-endemic regions and note that alterations in the gut microbiome associated with helminth infection appear to have systemic immunomodulatory effects. 
It has also been proposed that helminth co-infection may increase the morbidity and mortality of COVID-19, because the immune system cannot efficiently respond to the virus; in addition, vaccines will be less effective for these patients, but treatment and prevention of helminth infections might reduce the negative effect of COVID-19. During millennia of parasite-host coevolution helminths evolved mechanisms suppressing the host immune responses, which may mitigate vaccine efficacy and increase severity of other infectious diseases. Treatment of worm infections might reduce the negative effect of COVID-19! And ivermectin is a deworming drug! You can see where this is going… The most relevant species of worm here is the roundworm Strongyloides stercoralis. Among the commonest treatments for COVID-19 are corticosteroids, a type of immunosuppressant drug. The types of immune responses they suppress do more harm than good in coronavirus, so turning them off limits collateral damage and makes patients better on net. But these are also the types of immune responses that control Strongyloides. If you turn them off even very briefly, the worms multiply out of control, you get what’s called “Strongyloides hyperinfection”, and pretty often you die. According to the WHO: The current COVID-19 pandemic serves to highlight the risk of using systemic corticosteroids and, to a lesser extent, other immunosuppressive therapy, in populations with significant risk of underlying strongyloidiasis. Cases of strongyloidiasis hyperinfection in the setting of corticosteroid use as COVID-19 therapy have been described and draw attention to the necessity of addressing the risk of iatrogenic strongyloidiasis hyperinfection syndrome in infected individuals prior to corticosteroid administration. 
Although this has gained importance in the midst of a pandemic where corticosteroids are one of few therapies shown to improve mortality, its relevance is much broader given that corticosteroids and other immunosuppressive therapies have become increasingly common in treatment of chronic diseases (e.g. asthma or certain rheumatologic conditions). So you need to “address the risk” of strongyloides infection during COVID treatment in roundworm-endemic areas. And how might you address this, WHO? Treatment of chronic strongyloidiasis with ivermectin 200 µg/kg per day orally x 1-2 days is considered safe with potential contraindications including possible Loa loa infection (endemic in West and Central Africa), pregnancy, and weight <15kg. Given ivermectin’s safety profile, the United States has utilized presumptive treatment with ivermectin for strongyloidiasis in refugees resettling from endemic areas, and both Canada and the European Centre for Disease Prevention and Control have issued guidance on presumptive treatment to avoid hyperinfection in at risk populations. Screening and treatment, or where not available, addition of ivermectin to mass drug administration programs should be studied and considered. This is serious and common enough that, if you’re not going to screen for it, it might be worth “add[ing] ivermectin to mass drug administration programs” in affected areas! Dr. Avi Bitterman carries the hypothesis to the finish line: First two images are with all relevant studies; second two are a sensitivity analysis that removes some of the most dubious. The good ivermectin trials in areas with low Strongyloides prevalence, like Vallejos in Argentina, are mostly negative. The good ivermectin trials in areas with high Strongyloides prevalence, like Mahmud in Bangladesh, are mostly positive. Worms can’t explain the viral positivity outcomes (ie PCR), but Dr. 
Bitterman suggests that once you remove low quality trials and worm-related results, the rest looks like simple publication bias: This is still just a possibility. Maybe I’m over-focusing too hard on a couple positive results and this will all turn out to be nothing. Or who knows, maybe ivermectin does work against COVID a little - although it would have to be very little, fading to not at all in temperate worm-free countries. But this theory feels right to me. It feels right to me because it’s the most troll-ish possible solution. Everybody was wrong! The people who called it a miracle drug against COVID were wrong. The people who dismissed all the studies because they F@#king Love Science were wrong. Ivmmeta.com was wrong. Gideon Meyerowitz-Katz was…well, he was right, actually, I got the worm-related meta-analysis graphic above from his Twitter timeline. Still, an excellent troll. Also, the best part is that I ignorantly asked, in my description of Mahmud et al above: And it was! It was a fluke! A literal, physical, fluke! For my whole life, God has been placing terrible puns in my path to irritate me, and this would be the worst one ever! So it has to be true! The Scientific Takeaway About ten years ago, when the replication crisis started, we learned a certain set of tools for examining studies. Check for selection bias. Distrust “adjusting for confounders”. Check for p-hacking and forking paths. Make teams preregister their analyses. Do forest plots to find publication bias. Stop accepting p-values of 0.049. Wait for replications. Trust reviews and meta-analyses, instead of individual small studies. These were good tools. Having them was infinitely better than not having them. But even in 2014, I was writing about how many bad studies seemed to slip through the cracks even when we pushed this toolbox to its limits. We needed new tools. 
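The toolbox above includes looking for publication bias, and Dr. Bitterman suggests that what remains after the worm adjustment looks like publication bias. One standard formal check is Egger's regression test for funnel-plot asymmetry. The minimal pure-Python sketch below assumes each study is summarized by an effect estimate (for example a log risk ratio) and its standard error; it returns only the intercept and its t statistic, since turning that into a p-value needs a t distribution with k - 2 degrees of freedom (e.g. from SciPy), which is left out to keep the block dependency-free.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardized effect (y / se) on precision (1 / se) by
    ordinary least squares and returns the intercept and its t statistic
    (df = k - 2). An intercept far from zero means small, imprecise studies
    report systematically different effects than large ones, one signature
    of publication bias.
    """
    k = len(effects)
    if k < 3:
        raise ValueError("need at least three studies")
    x = [1 / s for s in std_errors]                        # precision
    z = [y / s for y, s in zip(effects, std_errors)]       # standardized effect
    x_bar, z_bar = sum(x) / k, sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    residuals = [zi - (intercept + slope * xi) for xi, zi in zip(x, z)]
    sigma2 = sum(r ** 2 for r in residuals) / (k - 2)      # residual variance
    se_intercept = math.sqrt(sigma2 * (1 / k + x_bar ** 2 / sxx))
    return intercept, intercept / se_intercept
```

With only a handful of studies, as in the short list above, this test has little power, so a non-significant intercept is weak evidence that there is no bias.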
I think the methods that Meyerowitz-Katz, Sheldrake, Heathers, Brown, Lawrence and others brought to the limelight this year are some of the new tools we were waiting for. Part of this new toolset is to check for fraud. About 10-15% of the seemingly-good studies on ivermectin ended up extremely suspicious for fraud: Elgazzar, Carvallo, Niaee, Cadegiani, Samaha. There are ways to check for this even when you don’t have the raw data. Like: The Carlisle-Stouffer-Fisher method: Check some large group of comparisons, usually the Table 1 of an RCT where they compare the demographic characteristics of the control and experimental groups, for reasonable p-values. Real data will have p-values all over the map; about one in every ten comparisons should have a p-value of 0.1 or less. Fakers seem bad at this and usually give everything a nice safe p-value like 0.8 or 0.9.
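The Table 1 check just described can be sketched as follows. This is a simplified illustration, not Carlisle's published procedure (which derives p-values from the reported summary statistics): it assumes you already have one p-value per baseline comparison, treats them as independent and Uniform(0, 1) under honest randomization, and combines them with Stouffer's and Fisher's methods to ask whether they pile up near 1 more than chance allows. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_EPS = 1e-12  # keeps p strictly inside (0, 1) so the transforms stay finite


def _clip(p: float) -> float:
    return min(max(p, _EPS), 1 - _EPS)


def stouffer_balance(p_values):
    """Stouffer's method: Z = sum(Phi^-1(p_i)) / sqrt(k).

    Under genuine randomization Z is roughly standard normal. A large positive
    Z means the groups look more alike than chance predicts. Returns Z and the
    probability of balance at least this good.
    """
    nd = NormalDist()
    z = sum(nd.inv_cdf(_clip(p)) for p in p_values) / math.sqrt(len(p_values))
    return z, 1 - nd.cdf(z)


def fisher_balance(p_values):
    """Fisher's method applied to 1 - p, so p-values near 1 score high.

    X = -2 * sum(ln(1 - p_i)) is chi-square with 2k degrees of freedom under
    the null; for even degrees of freedom the upper tail has the closed form
    exp(-X/2) * sum_{j<k} (X/2)^j / j!.
    """
    k = len(p_values)
    x = -2 * sum(math.log(1 - _clip(p)) for p in p_values)
    half = x / 2
    tail = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))
    return x, tail
```

Feeding in a Table 1 where every comparison sits around 0.8 to 0.95 gives a tiny tail probability from both functions, which is the "too good to be true" pattern described above. Correlated baseline variables (age and comorbidities, say) violate the independence assumption, so a flagged result is a prompt for scrutiny rather than proof of fraud.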
Inline links: Buonfrate et al:, Mayer et al:, immortal time bias, this Twitter thread, Borody et al:, https://substackcdn.com/image/fetch/$s_!Wpjs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2d8a451b-b1fc-44e5-ae67-b1506e491762_914x657.png, https://substackcdn.com/image/fetch/$s_!DOjA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F17d5827a-38da-4a99-beb3-c3018df5c633_920x604.png, https://substackcdn.com/image/fetch/$s_!GX1n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc692fec8-a450-4579-b337-c72bec060970_912x298.png, https://substackcdn.com/image/fetch/$s_!YcH4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36db98e-e653-44da-906c-20312b1689a3_468x205.png, https://substackcdn.com/image/fetch/$s_!jbcL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd189a844-daf2-4199-bb2e-830d4fc64415_468x206.png, later revised their results to exclude Elgazzar, Popp, https://substackcdn.com/image/fetch/$s_!2B6r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F505c5ac4-3fe8-47a4-8505-dab80601b44d_416x198.png, Avi Bitterman, David Boulware, https://substackcdn.com/image/fetch/$s_!JWWh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fac9e4f34-f9cc-40f2-9d83-da4e7178fad7_772x330.png, source, Gluchowska et al, the WHO, carries, 
https://substackcdn.com/image/fetch/$s_!xExE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5da21781-249c-4e59-b616-9f23d83cc044_2048x1184.jpeg, https://substackcdn.com/image/fetch/$s_!4SMr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdcd6e4b2-37f7-4602-93d5-2581c3b27a60_700x432.png, https://substackcdn.com/image/fetch/$s_!-6n2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7fd6e8f4-093e-4e02-bce7-363615146c9c_2228x1346.jpeg, https://substackcdn.com/image/fetch/$s_!CPZs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0425847-198a-4bd3-a63b-149f15d147ba_700x432.png, https://substackcdn.com/image/fetch/$s_!H3rK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F9972491b-25b0-4c06-8aca-86fce102ae63_666x147.png, even in 2014, The Carlisle-Stouffer-Fisher method
Carvallo said that zero people in the treatment group of his study got COVID, compared to 58% of people in the control group. This is a pretty implausibly big effect, even by the standards of other pro-ivermectin studies, although I don’t know if anyone else tried the exact same preventative protocol as Carvallo. I think this is a more nuanced story than Alexandros’ version where Buzzfeed just doesn’t know that sometimes studies happen at more than one hospital. Is fraud the best explanation? I think Alexandros thinks of Carvallo as just not keeping very good records, so he doesn’t have raw data, and probably mixed up his numbers a few times or gave false numbers, and didn’t have anything to send his collaborators when they asked. I think this is maybe possible, although it seems suspicious that he falsely said Dr. Lombardo was involved, falsely claimed the hospital involved was doing a different trial, and got very implausible results. I can imagine weird chains of events that would cause all of these things through honest misunderstandings. But they don’t seem like the best explanation. After discussing this with Alexandros, he objects to my use of the term “known fraudster”. Perhaps I should have said “highly credibly suspected fraudster” instead, although in a Bayesian sense nothing can ever be 100% and at some point plausibility shades imperceptibly into knowledge. Still, I feel like my description here was more accurate than Alexandros’, which just mentions the hospital approval issue and says nothing about any of the rest of this in a thousand word subsection about this study in particular. I did err in saying the Carvallo paper was retracted. 
According to the article: After BuzzFeed News raised questions about how the study’s data was collected and analyzed, a representative from the Journal of Biomedical Research and Clinical Investigation, which published the results, said late Monday, “We will remove the paper temporarily.” A link was removed from the table of contents — but was reinstated by Thursday. The journal’s explanation, provided after this story was published, was that the author “informed us that he has already provided the evidence of his study to the media.” I apologize for the error. Elalfy et al (still disagree with Alexandros) I described this as: As best I can tell, this is some kind of Egyptian trial. It might or might not be an RCT; it says stuff like “Patients were self-allocated to the treatment groups; the first 3 days of the week for the intervention arm while the other 3 days for symptomatic treatment”. Were they self-allocated in the sense that they got to choose? Doesn’t that mean it’s not random? Aren’t there seven days in a week? These are among the many questions that Elalfy et al do not answer for us. The control group (which they seem to think can also be called “the white group”) took zinc, paracetamol, and maybe azithromycin. The intervention group took zinc, nitazoxanide, ribavirin, and ivermectin. There were very large demographic differences between the groups of the sort which make the study unusable […] There is no primary outcome assigned, but viral clearance rates on day seven were 58% in the yellow group compared to 0% in the white group, which I guess is a strong positive result. This table looks very impressive, in terms of the experimental group doing better than the control, except that they don’t specify whether it was before the trial or after it, and at least one online commentator thinks it might have been before, in which case it’s only impressive how thoroughly they failed to randomize their groups. Overall I don’t feel bad throwing this study out. 
I hope it one day succeeds in returning to its home planet. In the summary post, Alexandros’ entire criticism of my coverage of this trial, one of the seven trials he focuses on as most unfairly covered and uses as the lynchpin of his argument that I am morally culpable for disastrously bad reporting, is: [Elalfy et al] are accused of incompetence for failing to randomize their groups multiple times in Scott’s piece. The paper writes in six separate places that it is not reporting on a randomized trial, amongst them on a diagram that Scott included in his own essay. Hard to imagine how else they could have made it clear. In his full post on this, he goes line by line to point out all the places they say they are non-randomized, pausing to snark about how dumb I am for not noticing each time[4]. But he never addresses the actual source of my confusion, which is the part of the paper where it says that: Patients were self-allocated to the treatment groups; the first 3 days of the week for the intervention arm while the other 3 days for symptomatic treatment. If this was done as described, it should be an (almost) random trial; patients who come in on Wednesdays shouldn’t systematically differ from patients who come in on Thursdays[5]. But in fact, it looks (assuming I am understanding a very ambiguous table correctly) like there are very large pre-existing differences between the groups, sufficient to explain the entire result. If they in fact followed their days-of-the-week protocol, and it was random as expected, then I’m misunderstanding the table seeming to show very large differences, and they have indeed found evidence for ivermectin’s efficacy. If they didn’t follow their day-of-the-week protocol and it’s non-random, then maybe I’m understanding the table correctly and their groups had large differences to begin with and the fact that they had large differences at the end of the trial doesn’t demonstrate anything about ivermectin. 
This is all I was trying to say in the post, and instead of having any opinion on it Alexandros just makes fun of me for saying it. I think our actual crux is that Alexandros thinks a table of big differences between the groups has to be post-treatment (based on how big the differences are), whereas I’m not sure (because it’s unclear in the study, and also because the authors describe what could be a randomization method but also go on and on about how nonrandom they are). This is why I thought it mattered how random it was! Maybe instead of mocking me for this, you can admit it’s an important and relevant question! Ghauri et al (still disagree with Alexandros) I describe this as: Pakistan, 95 patients. Nonrandom; the study compared patients who happened to be given ivermectin (along with hydroxychloroquine and azithromycin) vs. patients who were just given the latter two drugs. There’s some evidence this produced systematic differences between the two groups - for example, patients in the control group were 3x more likely to have had diarrhea (this makes sense; diarrhea is a potential ivermectin side effect, so you probably wouldn’t give it to people already struggling with this problem). Also, the control group was twice as likely to be getting corticosteroids, maybe a marker for illness severity. Primary outcome was what percent of both groups had a fever: on day 7 it was 21% of ivermectin patients vs. 65% of controls, p < 0.001. No other outcomes were reported. I don’t hate this study, but I think the nonrandom assignment (and observed systematic differences) is a pretty fatal flaw. Alexandros notes that these are three differences between experimental/control groups, out of 33 listed characteristics that could have been different. There is approximately a 23% chance (he calculates) that you could get these differences by chance. 
He accuses me of failing to do a formal Carlisle test - the usual test you would use to determine whether weird differences between randomized groups are because of fraud - instead eyeballing it and getting it wrong. Here I do want to defend myself: I am not accusing Ghauri et al of fraud. In fact, this would be nonsensical: they admit they are assigning patients nonrandomly. Carlisle tests are usually done to show that something about group assignment is impossible (and therefore fraudulent) in a fair random assignment. But these people aren’t claiming to have done a fair random assignment, so I’m not sure what a Carlisle test would prove. My argument is more like: this is nonrandom, therefore we should expect it to be unfair. It is unnecessary, but helpful, to note an actual apparent unfairness - there’s some evidence they gave the ivermectin to less severe patients (as measured by corticosteroid use). Therefore, we can’t necessarily trust this to be a fair trial (which it was never really claiming to be). In the end I kept Ghauri as an okay study, although GMK didn’t, so it ended up trashed in the final analysis anyway. I think my thinking was that I never claimed to be only looking at RCTs, so this non-RCT, whose between-group differences confirmed that it was indeed a non-RCT with all the risk of bias that entails, didn’t necessarily need to be ruled out. Still, I don’t think I was wrong to mention this possibility, and I think Alexandros was wrong to suggest that I needed to do extra tests for this to be fair. Borody et al (still disagree with Alexandros) I described this as: Our last paper! …is it a paper? I can’t find it published anywhere. It mostly seems to be on news sites. Doesn’t look peer-reviewed. And it starts with “Note that views expressed in this opinion article are the writer’s personal views”. Whatever. 600 Australians were treated with ivermectin, doxycycline, and zinc. 
The article compares this to an “equivalent control group” made of “contemporary infected subjects in Australia obtained from published Covid Tracking Data”; this is not how you control group, @#!% you. Then it gets excited about the fact that most patients had better symptoms at the end of the ten-day study period than the beginning (untreated COVID resolves in about ten days). Why are these people wasting my time with this? Let’s move on.
Alexandros lists his full concerns here. My summary: Scott is being incredibly disrespectful to the authors, who are in fact a legendary gastroenterologist who invented life-saving H. pylori therapy and a brilliant immunologist who invented a well-regarded bronchitis vaccine (in particular, in describing their control group, I said “this is not how you control group, @#!% you”).
“Synthetic control groups” - i.e. comparing people in a trial to some previously-known understanding of how a disease progresses - are a standard practice, and basically fine.
Borody et al indeed have had amazing careers with many things they can be proud of. But I continue to believe that this paper is not among them.
Synthetic control groups are more common in the social sciences, but have occasionally been used in pharmacology when it would be unethical or extremely difficult to use a real control group. The most common use case is rare cancers, where it takes years to get enough patients to test a drug and it also seems kind of unethical to delay. Another good thing about rare cancers is that they're pretty discrete; you don't have to worry about things like "well, 90% of leukemias never make it to a doctor anyway, so maybe we're only seeing the serious leukemias" or "these guys counted the leukemias that get dealt with by the local doctor's office, but those other guys counted the leukemias that have to go to the hospital". More importantly, studies with synthetic control groups usually go above and beyond to justify why their synthetic control group should be a fair comparison to the treatment group. Here's an example, from a paper about a rare leukemia. They start by getting a synthetic control group from a previous randomized controlled trial of leukemia drugs (not the general population!). Then they throw out more than half their patients for not being a good match for the selection criteria of the current study. Then they investigate whether there are significant differences on five important demographic factors, and find a few. Then they re-weight the patients in the historical comparator study to adjust out the differences between the previous population and the current population. Then they do some analyses to check if they re-weighted everything correctly.
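The re-weighting step can be sketched in a few lines. This is a simplified stand-in, assuming a single categorical prognostic factor (a hypothetical "early"/"advanced" stage label) and made-up counts; the leukemia paper itself used weighting and propensity-score methods across several covariates. Each historical patient gets weight (share of their stratum in the current trial) / (share of that stratum in the historical data), so the weighted historical group has the trial's stratum mix. The helper names are hypothetical.

```python
from collections import Counter


def stratum_weights(historical, trial):
    """Weight historical patients so their stratum mix matches the trial's."""
    hist_counts, trial_counts = Counter(historical), Counter(trial)
    n_hist, n_trial = len(historical), len(trial)
    weights = []
    for stratum in historical:
        trial_share = trial_counts[stratum] / n_trial  # 0 if the stratum is absent from the trial
        hist_share = hist_counts[stratum] / n_hist
        weights.append(trial_share / hist_share)
    return weights


def weighted_rate(outcomes, weights):
    """Weighted mean of 0/1 outcomes."""
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


# Hypothetical historical comparator: 80 early-stage (8 deaths), 20 advanced-stage (10 deaths).
historical = ["early"] * 80 + ["advanced"] * 20
deaths = [1] * 8 + [0] * 72 + [1] * 10 + [0] * 10
# Hypothetical current trial: half of its patients are advanced-stage.
trial = ["early"] * 50 + ["advanced"] * 50

w = stratum_weights(historical, trial)
print(sum(deaths) / len(deaths))            # unweighted historical death rate: 0.18
print(round(weighted_rate(deaths, w), 3))   # reweighted to the trial's stage mix: 0.3
```

Without the weights, the historical group looks healthier than the trial population simply because it contains proportionally fewer advanced-stage patients.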
Then they apologize profusely for having to use this vastly inferior methodology at all:
In special cases when a disease is rare, prognosis is very poor, and there are limited therapeutic options available, single-arm clinical trials may be used as evidence for accelerated drug approvals. Comprehensive evaluation of historical comparator or reference data can provide an additional approach for putting the efficacy of a new therapy into perspective [11, 12]. In this study, we applied different statistical methods and sensitivity analyses to evaluate the clinical efficacy of blinatumomab against historical data. Concerns often raised regarding the use of historical comparator data are the influence of potential biases related to selection, misclassification and confounding [12]. The requirement of rigorous eligibility criteria in the blinatumomab clinical study—such as Eastern Cooperative Oncology Group status of two or lower and absence of abnormal lab values during screening—may increase the chance of better outcomes in the clinical study than the historical data. While it may be possible to use unadjusted historical data when patient populations are sufficiently similar [27], the disproportionate number of advanced-stage patients in the blinatumomab trial required methods applied to individual-level data to minimize bias. Selection bias was minimized by use of stringent inclusion criteria into the historical data set and by weighting or adjusting for known prognostic factors. In addition, the historical data set represented adult R/R patients who received standard of care (excluding palliative care patients where possible), without any restrictions to any patient subgroups. Residual confounding may still remain and be difficult to control for, particularly in data sets where differences in important prognostic factors are unknown or not measured in one data set.
In this study, nearly all known important prognostic factors were adjusted for in the weighted or propensity score analyses. Missing data on key covariates lead to exclusion of some records from the analyses (Figure 1), which may theoretically bias the overall results. However, our examination of records with missing covariates did not identify significant differences by patient demographic characteristics compared with patients who had complete data (data not shown). Misclassification bias was limited by harmonization of patient-level data in the pooled analysis, which employed common data definitions for disease classification and outcomes characterization.
Compare this to how the Borody study discusses its synthetic control group:
The control data was from contemporary infected subjects in Australia obtained from published Covid Tracking Data.
I hesitate to say “they didn’t even say which tracking data”, because in the past I’ve said things like that and just missed it. But I can’t find them saying which tracking data. In Borody et al’s synthetic control group, 70/600 (11.5%) patients required hospitalization. But the US hospitalization rate appears to be about 1% for unvaccinated individuals. So Borody et al’s synthetic control group had roughly 10x the expected hospitalization rate. This seems very relevant to this study finding that ivermectin decreases hospitalization by 90%! I’m not claiming this is fraudulent, or impossible, or means the study couldn’t have been good. And Borody et al claim to have used an “equivalent” control group, so maybe there was some adjustment done for this. But this is why we usually use more than one word to describe our control groups! Or use real control groups that don’t ruin your study if you do a finicky adjustment slightly wrong! I feel like these are the kinds of questions Alexandros needs to be asking, instead of just giving a link to a Stat News article about how sometimes synthetic control groups are okay.
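The arithmetic behind that worry is short. A minimal sketch, assuming purely for illustration that ivermectin did nothing and treated patients were hospitalized at the roughly 1% background rate cited above: measured against a control group hospitalized at 11.5%, the treatment would still appear to cut hospitalization by about 90%. The function name is hypothetical.

```python
def relative_risk_reduction(treated_rate: float, control_rate: float) -> float:
    """Fractional reduction in event rate, treated vs. control."""
    return 1 - treated_rate / control_rate


background = 0.01          # approximate unvaccinated hospitalization rate cited above
synthetic_control = 0.115  # hospitalization rate in Borody et al's synthetic control group
treated = background       # hypothetical: a drug with no effect at all

print(f"{relative_risk_reduction(treated, synthetic_control):.0%}")  # vs. the synthetic control: 91%
print(f"{relative_risk_reduction(treated, background):.0%}")         # vs. a fair control: 0%
```

An inflated control rate alone is enough to manufacture an apparent 90% effect, which is why the control group's provenance matters so much here.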
Also other questions, like “how come this found a 90% decrease in hospitalization and mortality, but lots of other studies found smaller decreases, and the biggest and best studies found none at all?” I know Alexandros’ answer is to find lots of flaws with the biggest and best studies, but these flaws wouldn’t be enough to cover up a 90% cure rate. And if you’re in the business of calling out flaws in studies, I genuinely think having your control group be “we used some group of people somewhere in Australia, they had 10x the normal hospitalization rate, we won’t tell you anything else” would be the sort of flaw you would call out! Thomas Borody is a genuinely brilliant gastroenterologist and I am very grateful for his life-saving discoveries. But Elon Musk is a genuinely brilliant engineer and I am very grateful for his low-cost reusable rockets - and this doesn’t mean he never does crazy inexplicable things. Maybe Borody and his collaborators have a point from this study, but I don’t feel like it makes sense as written. If they ever explain what they were doing in more detail and it’s some sort of amazing 4D-chess move that makes total sense, I will apologize to them. Otherwise, stick to inventing amazing life-saving digestive therapies.
In response to this section, Alexandros stresses that he is not necessarily saying Borody et al is incorrect or challenging my decision to leave it out. He writes:
I will repeat that my strong objection, is that you wrote " this is not how you control group, @#!% you". I therefore pointed to stat news to support my case that, yes, this can indeed be how you control group. That's all. In the article I even noted that this aversion towards disrespect to elders may even be a cultural difference between us. To be clear, if I were making a case for ivermectin, I would not be relying on this study as my starting point.
III. Hokey Meta-Analysis
Alexandros points out that I used the wrong statistical test when analyzing the overall picture gleaned from these studies. He’s right. The right statistical test would make ivermectin look stronger, without changing the sign of the conclusion.
After getting a core group of potentially trustworthy studies, I tried to see whether ivermectin still had a statistically significant positive effect in them. I tried to be honest that I didn’t really know how to do formal meta-analyses:
Probably I’m forgetting some reason I can’t just do simple summary statistics to this, but whatever. It is p = 0.15, not significant . . . What happens if I unprincipledly pick whatever I think the most reasonable outcome to use from each study is? . . . Now it’s p = 0.04, seemingly significant
I in fact could not do simple summary statistics to this. Alexandros describes the test I should have used, a DerSimonian-Laird test, and applies it to the same data. Now the numbers are p = 0.03 and p < 0.0001. I accept that I was wrong, he is right, and this is more accurate. My original conclusion to this section was that although you couldn’t be absolutely sure from the numbers, eyeballing things it definitely looked like ivermectin had an effect. I then went on to try to explain that effect. With Marinos’ corrections, you can be sure from the numbers, but the rest of the post - an attempt to explain the effect - still stands.
IV. Worms
Alexandros brings up issues with the Strongyloides hypothesis; Dr. Bitterman graciously responds. I find the issues real enough to lower my credence in the idea, but not to completely rule it out. Even if it is true, I probably overestimated how important it was.
My original explanation for the effect was Dr. Avi Bitterman’s theory of Strongyloides hyperinfection. Many people in certain tropical regions are infected with the parasitic worm Strongyloides.
Usually a person’s immune system keeps this worm under control, and the parasites cause only limited problems. But under certain situations - especially when people take immune-suppressing corticosteroids - the immune system fails, the worms multiply, and the patient can potentially die of sudden worm overgrowth (“hyperinfection”). Corticosteroids are a common COVID treatment. So plausibly some people in tropical areas fighting COVID are at risk of dying from worm hyperinfection. Ivermectin was originally an anti-parasitic-worm medication before being repurposed to fight COVID, and everyone agrees it is very good at this. So if many people in COVID trials are dying of worm infections, then ivermectin could help them. This would look like ivermectin reducing mortality in COVID trials, and make people wrongly conclude that ivermectin treats COVID. Alexandros responds to this theory here, again I’ll try to summarize: The original Bitterman paper concludes that ivermectin trials show stronger results in high-Strongyloides-prevalence regions. But it mixes prevalence data from two different papers with different methodologies. Correcting for this, the findings no longer clear a formal bar for statistical significance, and don’t really look significant either.
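Both the DerSimonian-Laird correction Alexandros applied to the meta-analysis and this regional comparison come down to pooling study results. A minimal sketch of DerSimonian-Laird random-effects pooling, plus a simple z-test for whether two subgroups of trials (say, high- and low-Strongyloides-prevalence regions) differ. It assumes the inputs are per-study log risk ratios with their variances; the function names and example numbers are hypothetical, not Bitterman's or Marinos' data.

```python
from math import sqrt
from statistics import NormalDist


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate via DerSimonian-Laird.

    effects: per-study effect sizes (e.g. log risk ratios)
    variances: their within-study variances
    Returns (pooled_effect, standard_error, tau_squared).
    """
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = sqrt(1 / sum(w_re))
    return pooled, se, tau2


def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def subgroup_difference(group_a, group_b):
    """z-test for a difference between two independently pooled subgroups."""
    ma, sa, _ = dersimonian_laird(*group_a)
    mb, sb, _ = dersimonian_laird(*group_b)
    z = (ma - mb) / sqrt(sa ** 2 + sb ** 2)
    return z, two_sided_p(z)


# Hypothetical (log risk ratios, variances), for illustration only.
high = ([-0.9, -0.6, -1.1], [0.20, 0.15, 0.25])
low = ([-0.1, 0.05, -0.2], [0.05, 0.04, 0.10])
pooled, se, tau2 = dersimonian_laird(*high)
z, p = subgroup_difference(high, low)
print(f"high-prevalence pooled log RR {pooled:.2f} (SE {se:.2f}), subgroup p = {p:.3f}")
```

Whether a regional difference survives depends heavily on which prevalence estimate puts each trial in which group, which is exactly the point Alexandros raises about mixing prevalence data from two papers.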
When John Ioannidis attacks funnel plots, I am fine with this because Dr. Ioannidis is known to be unusually rigorous and this is part of his pro-rigor crusade. But when Alexandros gets angry at me for rejecting Borody et al, whose control group was “we got a control group from somewhere, it had 10x the normal hospitalization rate, don’t ask questions” - or thinks it’s offensive to suspect Carvallo, whose statistical analysis was “the person I claim was my statistician denies ever having been associated with me and explicitly accuses me of lying, but whatever, here are some numbers proving that zero people who took ivermectin died” - then I don’t think he can fairly demand Ioannidean levels of rigor when it serves him.